
TIME SERIES

Project Submitted to

The Controller of Examinations,

St Berchmans College (Autonomous), Changanassery,

in partial fulfillment of the requirements for the award of the Bachelor's Degree in Mathematics

Submitted By:

ALEN SUNNY

Reg. No: 12100003

Under the guidance of

MS. DEVI N

Assistant Professor

Department of Mathematics

St. Berchmans College (Autonomous), Changanassery

DEPARTMENT OF MATHEMATICS
ST. BERCHMANS COLLEGE, CHANGANASSERY
2021-2024
CERTIFICATE

This is to certify that Mr. ALEN SUNNY has undergone the Bachelor of Science in Mathematics

course at St. Berchmans College, Changanassery, during the period 2021-2024 and has undertaken

the dissertation under the guidance of Ms. Devi N, Assistant Professor, Department of

Mathematics, St. Berchmans College (Autonomous), Changanassery. He is permitted to submit

this dissertation to the Controller of Examinations of the College.

Changanassery

19/02/2024                                Fr. John J Chavara

Assistant Professor and Head

Department of Mathematics
CERTIFICATE

This is to certify that this project entitled TIME SERIES is a record of bonafide project work

done by ALEN SUNNY (Reg. No: 12100003) under my guidance and supervision, in partial

fulfilment of the requirements for the award of the Bachelor of Science Degree in Mathematics, and that

this project has not been previously submitted for the award of any Degree, Diploma, Fellowship,

Title or Recognition.

MS. DEVI N                                Fr. John J Chavara

Assistant Professor Assistant Professor and Head

Department of Mathematics Department of Mathematics

Changanassery

19 / 02 / 2024
DECLARATION

I, ALEN SUNNY (Reg. No: 12100003, 2021-2024), do hereby declare that the

dissertation entitled TIME SERIES is a bonafide record of project work done by me under

the guidance and supervision of Ms. DEVI N , Assistant Professor, Department of Mathematics,

St Berchmans College (Autonomous), Changanassery, and that this dissertation or any part

thereof has not previously formed the basis for the award of any degree, diploma, associateship,

fellowship or any other similar title of any University or Institution.

Changanassery ALEN SUNNY

19 / 02 / 2024
ACKNOWLEDGEMENT

First and foremost, praises and thanks to the Almighty God for guiding me to complete this project

successfully.

I take this opportunity to express my profound gratitude and deep regards to my guide

Ms. DEVI N, Assistant Professor, Department of Mathematics, for her exemplary guidance,

monitoring and constant encouragement throughout the course of this project. The blessing, help

and guidance given by her from time to time shall carry me a long way in the journey of life on which

I am about to embark.

I would like to thank Fr. John J Chavara, Assistant Professor and Head of the Department of

Mathematics for his kind hearted support.

I may add that I am indebted to my family for their valuable encouragement. I express my sincere

thanks to all my colleagues and friends for their help and suggestions to improve this report.

Finally, my thanks go to all the people who have supported me to complete the project work

directly or indirectly.

ALEN SUNNY
Contents

1 Basic Concepts of Time Series 7

1.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.2 Long Term Variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.2.1 Secular Trend . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.2.2 Cyclical Variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.3 Short-term Variations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

1.3.1 Seasonal Variations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

1.3.2 Irregular Variations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

1.4 Objective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2 Estimation of Trend 11
2.1 Method for Measurement of Secular Trend . . . . . . . . . . . . . . . . . . . . . . . 11

2.1.1 Freehand curve Method (Graphical Method) . . . . . . . . . . . . . . . . . 12

2.1.2 Method of selected points: . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.1.3 Method of semi- average: . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.1.4 Method of Least Squares: . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2.2 Methods for Measurement of seasonal variations . . . . . . . . . . . . . . . . . . . 19

2.2.1 Method of Simple Average . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

2.2.2 Ratio To Trend Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

2.2.3 Ratio to Moving Average Method . . . . . . . . . . . . . . . . . . . . . . . 22

2.3 Method of Cyclic Variation: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

2.4 Measurement of Irregular Component: . . . . . . . . . . . . . . . . . . . . . . . . . 25

3 Identifying Time series Models 26

3.1 Component of Time Series: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

3.1.1 Additive Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

3.1.2 Multiplicative Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

3.2 Forecasting related models of Time Series: . . . . . . . . . . . . . . . . . . . . . . 27

3.2.1 Exponential smoothing methods . . . . . . . . . . . . . . . . . . . . . . . . 28

3.2.2 Single – Equation Regression Models: . . . . . . . . . . . . . . . . . . . . . 28

3.2.3 ARIMA Models: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

4 FITTING 34

4.1 Fitting ARIMA models : The Box-Jenkins approach . . . . . . . . . . . . . . . . . 34

4.1.1 Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

4.1.2 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

4.1.3 Verification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

4.2 Application For Time Series: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

4.2.1 Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
INTRODUCTION

Time series analysis serves as a powerful tool in understanding and deciphering patterns within

sequential data. In the first chapter, we delve into the fundamental definition of time series and

explore its various manifestations, unraveling the intricate variations that make it a captivating

field of study.

Moving forward, the second chapter focuses on the pivotal methods employed for measuring sec-

ular trends and deciphering seasonal variations within time series data. This chapter acts as a

crucial foundation for comprehending the nuances of temporal patterns inherent in the datasets

under scrutiny.

The third chapter takes a comprehensive approach, exploring the landscape of time series mod-

els. From additive to multiplicative models, and delving into the intricacies of Autoregressive

(AR), Moving Average (MA), Autoregressive Integrated Moving Average (ARIMA), and more,

this section equips the reader with a diverse toolkit for analyzing and interpreting time-dependent

phenomena.

Our journey culminates in the fourth chapter, where the Box-Jenkins approach takes center stage.

This chapter unfolds the methodology behind fitting time series models, providing a systematic

and robust framework for practitioners to apply in real-world scenarios.

PRELIMINARIES

What is a Time Series? A time series refers to the values of a variable recorded in chronological

order over successive periods of time. Equivalently, it is a set of data collected and arranged in

order of time. The analysis of a time series means separating out the different components which

influence the values of the series.

Examples

• Weather Data

• Rainfall Measurements

• Stock Prices, etc.

Mean

It is the average of the given numbers, calculated by dividing the sum of the given numbers by

the total number of numbers.

Mean = Σx / n

where Σx is the sum of all the observations and n is the total number of observations.

Covariance

Covariance is a measure of how much two random variables change together. It provides insight

into the direction of the relationship between two variables: whether they tend to increase or

decrease together.

Cov(X, Y) = Σᵢ₌₁ⁿ (Xᵢ − X̄)(Yᵢ − Ȳ) / (n − 1)

• Xᵢ and Yᵢ are the individual data points,

• X̄ and Ȳ are the means of X and Y, respectively,

• n is the number of data points.

Method of least squares It is the process of finding the best-fitting curve or line of best fit for

a set of data points by minimizing the sum of the squares of the offsets (residuals) of the points

from the curve.

m = (n Σxy − Σx Σy) / (n Σx² − (Σx)²)

b = (Σy − m Σx) / n

y = mx + b

This method is used to fit a straight line of the form y = mx + b, where y and x are variables, m

is the slope, and b is the y-intercept.

Pearson correlation coefficient It is a number between −1 and 1 that measures the strength

and direction of the relationship between two variables.

ρ(X, Y) = cov(X, Y) / (σ_X σ_Y)

Chapter 1

Basic Concepts of Time Series

1.1 Definition

Assume that the series X_t runs throughout time, that is (X_t), t = 0, ±1, ±2, . . ., but is only observed

at times t = 1, . . . , n, so we observe (X_1, . . . , X_n). Theoretical properties refer to the underlying

process (X_t), t ∈ Z.

The notations X_t and X(t) are interchangeable.

The theory of time series is based on the assumption of 'second-order stationarity' (a second-order

stationary time series has a constant mean, a constant variance and an autocovariance that does not

change with time).

The variations in the time series can be divided into two parts: long term variations

and short term variations.

1.2 Long Term Variation

Long-term variations refer to patterns or changes in a phenomenon that occur over extended peri-

ods of time, typically spanning years, decades, or even centuries. These variations are characterized

by their slow and gradual nature.

Long-term variations are divided into two :

1. Secular Trend

2. Cyclical Variation

1.2.1 Secular Trend

Secular trend shows the definite and basic tendency of the statistical data with the passage of

time. The data often shows a consistent upward or downward direction.

example

1. Changes in productivity

2. Increase in rate of capital

3. Growth of population

1.2.2 Cyclical Variation

Cyclical variations refer to recurring, periodic fluctuations that extend beyond a year. These

variations are often linked to economic or business cycles. Series relating to prices, production,

demand, etc. undergo cyclical variation.

examples

1. Stock Prices

2. Medical Data

3. Population growth etc . . .

1.3 Short-term Variations

Short-term variations refer to changes that occur over relatively brief periods of time, typically

ranging from seconds to months.

It is divided into two :

1. Seasonal Variations

2. Irregular Variations

1.3.1 Seasonal Variations

Seasonal Variations are those variations which occur with some degree of regularity within a

specific period of one year or shorter. These variations are associated with recurring events,

climate changes, holidays, etc.

examples

• In summers the sale of ice-cream increases

• During the Diwali season, the sale of crackers increases

1.3.2 Irregular Variations

Irregular variations, also known as random variations, are caused by unusual, unexpected and

accidental events like natural calamities. They are unpredictable, non-patterned fluctuations.

examples

1. Crime Rates

2. Transportation Trends

3. Energy Consumption etc. . .

1.4 Objective

To understand the underlying structure of a time series, represented by a sequence of observations,

by breaking it down into its components, and to fit a mathematical model and proceed to forecast

the future.

Chapter 2

Estimation of Trend

As we know, a time series consists of data arranged chronologically. In forecasting (an application of

time series) it is important to analyse the characteristic movement of the variations in the given time

series. The following methods serve as tools for this analysis:

2.1 Method for Measurement of Secular Trend

• Freehand curve Method (Graphical Method)

• Method of selected points

• Method of semi-average

• Method of Least Squares

2.1.1 Freehand curve Method (Graphical Method)

This is the simple method of studying trend. In this method the given time series data are plotted

on graph paper by taking time on X-axis and the other variable on Y-axis. The graph obtained will

be irregular as it would include short-run oscillations. We may observe the up and down movement

of the curve, and if a smooth freehand curve is drawn passing approximately through all the points of the curve

previously drawn, it would eliminate the short-run oscillations (seasonal, cyclical and irregular

variations) and show the long-period general tendency of the data. However, it is very difficult to

draw a freehand smooth curve and different persons are likely to draw different curves from the

same data. The following points must be kept in mind in drawing a freehand smooth curve:

1. The curve is smooth

2. The numbers of points above the line or curve are equal to the points below it.

3. The sum of vertical deviations of the points above the smoothed line is equal to the sum

of the vertical deviations of the points below the line. In this way the positive deviations

will cancel the negative deviations. These deviations are the effects of seasonal, cyclical and

irregular variations and by this process they are eliminated.

4. The sum of the squares of the vertical deviations from the trend line curve is minimum.

Example

The table below shows the sales data for nine years:

Year 1990 1991 1992 1993 1994 1995 1996 1997 1998
Sales (in Lakhs ) 65 95 115 63 120 100 150 135 172

If we draw a graph taking the year on the x-axis and sales on the y-axis, it will be irregular as shown

below. Drawing a freehand curve passing approximately through all these points will represent the

trend line (trend line = blue line).

Merits

1. It is a simple method of estimating trend which requires no mathematical calculations.

2. It is a flexible method as compared to rigid mathematical trends and therefore a better

representative of the trend of the data.

3. This method can be used even if the trend is not linear.

4. If the observations are relatively stable, the trend can easily be approximated by this method.

5. Being a non-mathematical method, it can be applied even by a common man.

Demerits

1. It is a subjective method. The values of trend obtained by different statisticians would be

different and hence not reliable.

2. Predictions made on the basis of this method are of little value.

2.1.2 Method of selected points:

In this method, two points considered to be the most representative or normal are joined by a

straight line to get the secular trend. This, again, is a subjective method since different persons may

have different opinions regarding the representative points. Further, only a linear trend can be

determined by this method.

2.1.3 Method of semi- average:

In this method, as the name itself suggests, semi-averages are calculated to find out the trend values.

By semi-averages is meant the averages of the two halves of a series. The given series is thus

divided into two equal parts (halves) and the arithmetic mean of the values of each

part (half) is calculated. The computed means are termed semi-averages. Each semi-average is

paired with the centre of the time period of its part. The two pairs are then plotted on a graph paper

and the points are joined by a straight line to get the trend. It should be noted that if the data are

for an even number of years, they can easily be divided into two halves. But if they are for an odd number of

years, we leave out the middle year of the time series and the two halves constitute the periods on each

side of the middle year.

Merits

1. It is a simple method of measuring trend.

2. It is an objective method because anyone applying it to the given data would get identical

trend values.

Demerits

1. This method can give only a linear trend of the data, irrespective of whether it exists or not.

2. This is only a crude method of measuring trend, since we do not know whether the effects

of the other components are completely eliminated or not.

Example

Fit a trend line by the method of semi averages for the given data

Year 2000 2001 2002 2003 2004 2005 2006


Production 105 115 120 100 110 125 135

Solution

Since the number of years is odd (seven), we will leave out the middle year's production value and

obtain the averages of the first three years and the last three years.

Year   Production   Semi-average

2000      105
2001      115       (105 + 115 + 120)/3 = 113.33
2002      120
2003      100       (left out)
2004      110
2005      125       (110 + 125 + 135)/3 = 123.33
2006      135

Figure 2.1: Trend line
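The semi-average computation in the solution above can be sketched in Python (illustrative code using the production figures from the worked solution):

```python
# Sketch of the semi-average method for an odd number of years: leave out
# the middle year and average each half, as in the example above.
years = [2000, 2001, 2002, 2003, 2004, 2005, 2006]
prod  = [105, 115, 120, 100, 110, 125, 135]

mid = len(prod) // 2                       # index of the left-out middle year
first_half  = prod[:mid]                   # [105, 115, 120]
second_half = prod[mid + 1:]               # [110, 125, 135]

avg1 = sum(first_half) / len(first_half)   # semi-average of the first half
avg2 = sum(second_half) / len(second_half) # semi-average of the second half
print(round(avg1, 2), round(avg2, 2))      # 113.33 123.33
```

Plotting the two semi-averages against the centres of their halves (2001 and 2005) and joining them gives the trend line of Figure 2.1.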

2.1.4 Method of Least Squares:

This is one of the most popular methods of fitting a mathematical trend. The fitted trend is

termed as the best in the sense that the sum of squares of deviations of observations, from it, is

minimized. The method of least squares may be used either to fit a linear trend or a nonlinear

trend (parabolic or exponential).

Fitting of Linear Trend:

Given the data (y_t, t) for n periods, where t denotes the time period such as year, month, day, etc.,

we have to find the values of the two constants, 'a' and 'b', of the linear trend equation:

y_t = a + bt

The value of 'a' is merely the y-intercept, or the height of the line above the origin; that is,

when t = 0, y_t = a. The other constant 'b' represents the slope of the trend line. When b is positive

the slope is upward, and when b is negative the slope is downward. This line is termed the
line of best fit because it is so fitted that the total of the squared deviations of the given data from

the line is minimum. The total is calculated by squaring the difference between the trend value

and the actual value of the variable; thus the term "least squares" is attached to this method.

Using the least squares method, the normal equations for obtaining the values of a and b are:

Σy_t = na + b Σt

Σt·y_t = a Σt + b Σt²

Let X = t − A, where A denotes the year of origin, chosen such that ΣX = 0. The above equations can

then be written as

ΣY = na + b ΣX

ΣXY = a ΣX + b ΣX²

Since ΣX = 0, i.e. the deviations from the actual mean sum to zero, we can write

a = ΣY / n ;  b = ΣXY / ΣX²

Merits

1. Given the mathematical form of the trend to be fitted, the least squares method is an

objective method.

2. Unlike the moving average method, it is possible to compute trend values for all the periods

and predict the value for a period lying outside the observed data

3. The results of the method of least squares are most satisfactory because the fitted trend

satisfies the two most important properties, i.e. (1) Σ(y₀ − y_t) = 0 and (2) Σ(y₀ − y_t)² is

minimum. Here y₀ denotes the observed values and y_t denotes the calculated trend values.

The first property implies that the position of the fitted trend equation is such that the sum of the

deviations of observations above and below it is equal to zero. The second property implies that the

sum of squares of deviations of the observations about the trend equation is minimum.

Demerits

1. As compared with the moving average method, it is a cumbersome method.

2. It is not flexible like the moving average method. If some observations are added, then the

entire calculation has to be done once again.

3. It can predict or estimate values only in the immediate future or past.

4. This method cannot be used to fit growth curves, the pattern followed by most economic

and business time series.

Example

Given below are the data relating to the production of sugarcane in a district.

Fit a straight-line trend by the method of least squares and tabulate the trend values.

Year 2000 2001 2002 2003 2004 2005 2006


Prod. of Sugarcane 40 45 46 42 47 50 46

Table 2.1: Production of Sugarcane Over the Years

Solution

Computation of trend values by the method of least squares (odd number of years):

a = ΣY/n = 316/7 = 45.143 ;  b = ΣXY/ΣX² = 29/28 = 1.036

Year (x)   Production of Sugarcane (Y)   X = x − 2003   X²   XY    Trend Values (y_t)
2000          40                            −3           9   −120      42.04
2001          45                            −2           4    −90      43.07
2002          46                            −1           1    −46      44.11
2003          42                             0           0      0      45.14
2004          47                             1           1     47      46.18
2005          50                             2           4    100      47.21
2006          46                             3           9    138      48.25
n = 7      ΣY = 316                       ΣX = 0     ΣX² = 28  ΣXY = 29  Σy_t = 316

Therefore, the required equation of the straight-line trend is given by

Y = a + bX

Y = 45.143 + 1.036(x − 2003)

The trend values can be obtained by

When x = 2000, y_t = 45.143 + 1.036(2000 − 2003) = 42.035

When x = 2001, y_t = 45.143 + 1.036(2001 − 2003) = 43.071
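The shortcut formulas a = ΣY/n and b = ΣXY/ΣX² from the worked example can be sketched in Python (illustrative code reproducing the sugarcane fit):

```python
# Sketch of the least-squares trend fit: shift the origin to 2003 so that
# the deviations X sum to zero, then apply a = ΣY/n and b = ΣXY/ΣX².
years = [2000, 2001, 2002, 2003, 2004, 2005, 2006]
Y = [40, 45, 46, 42, 47, 50, 46]

A = 2003                      # year of origin, chosen so that ΣX = 0
X = [t - A for t in years]    # [-3, -2, -1, 0, 1, 2, 3]

n = len(Y)
a = sum(Y) / n                                                 # 316 / 7
b = sum(x * y for x, y in zip(X, Y)) / sum(x * x for x in X)   # 29 / 28

trend = [a + b * x for x in X]
print(round(a, 3), round(b, 3))     # 45.143 1.036
print([round(t, 2) for t in trend])
```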

2.2 Methods for Measurement of seasonal variations

2.2.1 Method of Simple Average

This is the easiest and simplest method of studying seasonal variations. It is used

when the time series variable consists of only the seasonal and random components. The effect

of taking the average of data corresponding to the same period (say, the first quarter of each year) is to

eliminate the effect of the random component; thus the resulting averages consist of only the seasonal

component. These averages are then converted into seasonal indices.

It involves the following steps:

If figures are given on a monthly basis:

1. Arrange the raw data month-wise for each year.

2. Find the sum of all the figures relating to a month, i.e. add all the January values for

all the years. Repeat the process for all the months.

3. Find the average of the monthly figures, i.e. divide each monthly total by the number of years.

4. Obtain the average of the monthly averages by dividing the sum of the averages by 12.

5. Taking the average of the monthly averages as 100, find the percentage of each monthly average.

For the January average (X₁) this percentage would be: (Monthly Average (for Jan) /

Grand Average) × 100

Merits and Demerits

This is the simplest method of measuring seasonal variations. However, it is based on the

unrealistic assumption that the trend and cyclical variations are absent from the data.

Example

Calculate the seasonal indices for the monthly sales of a product using the method of simple averages.

Months
Years
Jan Feb Mar Apr May June July Aug Sep Oct Nov Dec
2001 15 41 25 31 29 47 41 19 35 38 40 30
2002 20 21 27 19 17 25 29 31 35 39 30 44
2003 18 16 20 28 24 25 30 34 30 38 37 39

Solution

Computation of seasonal indices by method of simple averages.

S.I. for Jan = (Monthly Average (for Jan) / Grand Average) × 100

Grand Average = 355.67/12 ≈ 29.64

S.I. for Jan = (17.67/29.64) × 100 ≈ 59.6

S.I. for Feb = (26/29.64) × 100 ≈ 87.7

Similarly, the other seasonal index values can be obtained.

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
59.62 87.77 80.99 87.77 78.74 109.12 112.49 94.49 112.49 129.36 120.36 126.89

Table 2.2: Seasonal Indices (S.I.) by month
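The steps above can be sketched in Python (illustrative code using the table of monthly sales from the example):

```python
# Sketch of the simple-average method: average each month over the three
# years, then express each monthly average as a percentage of the grand
# average of all twelve monthly averages.
data = {
    2001: [15, 41, 25, 31, 29, 47, 41, 19, 35, 38, 40, 30],
    2002: [20, 21, 27, 19, 17, 25, 29, 31, 35, 39, 30, 44],
    2003: [18, 16, 20, 28, 24, 25, 30, 34, 30, 38, 37, 39],
}

monthly_avg = [sum(data[y][m] for y in data) / len(data) for m in range(12)]
grand_avg = sum(monthly_avg) / 12
seasonal_index = [100 * m / grand_avg for m in monthly_avg]

print(round(grand_avg, 2))            # about 29.64
print(round(seasonal_index[0], 1))    # January: about 59.6
```

By construction the twelve indices average to exactly 100, which is a useful check on the arithmetic.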

2.2.2 Ratio To Trend Method

This method is used when the cyclical variations are absent from the data, i.e. the time series

variable Y consists of the trend, seasonal and random components. Using symbols, we can write

Y = T·S·R

Various steps for the computation:

1. Obtain the trend values for each month or quarter, etc, by the method of least squares

2. Divide the original values by the corresponding trend values. This eliminates the trend

from the data.

3. To get figures in percentages, the quotients are multiplied by 100.

Thus, we have:

(Y/T) × 100 = (T·S·R/T) × 100 = S·R × 100

Merits and Demerits

It is an objective method of measuring seasonal variations. However, it is quite complicated and

does not work if cyclical variations are present.

2.2.3 Ratio to Moving Average Method

The ratio to moving average is the most commonly used method of measuring seasonal variations.

This method assumes the presence of all the four components of a time series. Various steps in

the computation are as follows:

1. Compute the moving averages with period equal to the period of seasonal variations. This

would eliminate the seasonal components and minimize the effect of random component.

The resulting moving averages would consist of trend, cyclical and random components.

2. The original values for each month are divided by the respective moving average figures and

the ratio is expressed as a percentage, i.e. S·R″ = Y / M.A. = TCSR / TCR′, where R′ and

R″ denote the changed random components.

3. Finally, the random component R” is eliminated by the method of simple averages

Merits and Demerits

This method assumes that all the four components of a time series are present and, therefore,

widely used for measuring seasonal variations. However, the seasonal variations are not completely

eliminated if the cycles of these variations are not of a regular nature. Further, some information is

always lost at the ends of the time series.

Example

Calculate the seasonal indices by the ratio to moving average method from the following data:

Year 1st Quarter 2nd Quarter 3rd Quarter 4th Quarter


2005 68 62 61 63
2006 65 58 66 61
2007 68 63 63 67

Solution

Year  Quarter  Given (Y)  4-Qtr Total  Centered Total  Centered Moving Avg  % of Moving Avg
2005    I         68
        II        62
        III       61         254           505             63.125              96.63
        IV        63         251           498             62.250             101.20
2006    I         65         247           499             62.375             104.21
        II        58         252           502             62.750              92.43
        III       66         250           503             62.875             104.97
        IV        61         253           511             63.875              95.50
2007    I         68         258           513             64.125             106.04
        II        63         255           516             64.500              97.67
        III       63         261
        IV        67

Year 1st Quarter 2nd Quarter 3rd Quarter 4th Quarter


2005 - - 96.63 101.20
2006 104.21 92.43 104.97 95.50
2007 106.04 97.67 - -
Total 210.25 190.10 201.60 196.70
Average 105.125 95.05 100.80 98.35
Seasonal Index 105.30 95.21 100.97 98.52

Table 2.3: Seasonal Index Table

Arithmetic average of the quarterly averages = 399.33/4 = 99.83

By expressing each quarterly average as percentage of 99.83 , we will obtain seasonal indices.

Seasonal index of 1st Quarter = (105.125/99.83) × 100 = 105.30

Seasonal index of 2nd Quarter = (95.05/99.83) × 100 = 95.21

Seasonal index of 3rd Quarter = (100.80/99.83) × 100 = 100.97

Seasonal index of 4th Quarter = (98.35/99.83) × 100 = 98.52
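The whole ratio-to-moving-average calculation above can be sketched in Python (illustrative code using the quarterly data of the example):

```python
# Sketch of the ratio-to-moving-average method: centred 4-quarter moving
# averages, ratios to them in percent, quarterly means of the ratios, then
# a rescaling so that the four seasonal indices average to 100.
y = [68, 62, 61, 63,  65, 58, 66, 61,  68, 63, 63, 67]  # 2005-2007 quarterly

# centred moving average: mean of two adjacent 4-quarter averages
cma = {}
for t in range(2, len(y) - 2):
    first  = sum(y[t - 2:t + 2]) / 4
    second = sum(y[t - 1:t + 3]) / 4
    cma[t] = (first + second) / 2

ratios = {t: 100 * y[t] / cma[t] for t in cma}   # % of moving average

# average the ratios quarter by quarter, then rescale to mean 100
quarter_avg = []
for q in range(4):
    vals = [r for t, r in ratios.items() if t % 4 == q]
    quarter_avg.append(sum(vals) / len(vals))
scale = 100 / (sum(quarter_avg) / 4)
indices = [round(v * scale, 2) for v in quarter_avg]
print(indices)   # [105.3, 95.21, 100.97, 98.52], matching Table 2.3
```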

2.3 Method of Cyclic Variation:

Cyclic variation exists in the data when the tendency of the data increases and decreases over a given

period, but the time period is not fixed for cyclic variation.

For the measurement of cyclic variation, first calculate the seasonal and trend components, then remove

the seasonal, trend and irregular components. The irregular component is just like an error term

which cannot be directly eliminated. To eliminate the irregular component, the moving average

method is used. This elimination of the irregular component is known as smoothing of the

irregular component.

Steps for the computation of cyclic variation:

1. First estimate the trend (T) and seasonal (S) values of the given time series.

2. Divide the time series values (Y) by the trend (T) and seasonal (S) estimates to get the cyclic (C)

and random (R) components:

Y / (T·S) = T·C·S·R / (T·S) = C·R

3. Now eliminate the random component from the result of step 2 by using a moving average of 3 or 5

periods, obtaining the cyclic component.

2.4 Measurement of Irregular Component:

The irregular component is the last component, also known as the error term of the time series. An error

term cannot be eliminated fully from any time series because it arises from natural forces.

There are no methods available to measure this component exactly, but it can be reduced

somewhat by averaging the indices. In a multiplicative model of the time series, it can be

removed by dividing the series by all the other components; in an additive model,

it can be removed by subtracting the other components from the series.

Chapter 3

Identifying Time series Models

In the previous chapters we have seen what a time series is, its variables, and so on. Now we are

going to look at time series models.

Modelling of time series can be classified into two:

1. Components of Time Series

2. Forecasting

3.1 Component of Time Series:

Component-wise we can classify time series models into two:

• Additive Model

• Multiplicative Model

3.1.1 Additive Model

A data model in which the effects of individual factors are differentiated and added together to

model the data.

Let Y = original observation, T = trend component , S = seasonal component, C= cyclical

component , and I = irregular component

Assume that the value Y of a composite series is the sum of the four components. That is,

Y = T + S + C + I

3.1.2 Multiplicative Model

This model assumes that as the data increase, so does the seasonal pattern. Most time series plots

exhibit such a pattern. In this model, the trend, seasonal, cyclical and irregular components are

multiplied.

Let Y = original observation, T = trend component , S = seasonal component, C= cyclical

component , and I = irregular component

It is assumed that the value Y of a composite series is the product of the four components. That

is,

Y = T·S·C·I
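The difference between the two models can be shown with a toy sketch (the component values below are invented for illustration):

```python
# Toy contrast of the two models: the same four components combine by
# multiplication in one model and by addition in the other.
T, S, C, I = 120.0, 0.9, 1.05, 1.01    # multiplicative: S, C, I vary around 1
Y_mult = T * S * C * I                  # Y = T·S·C·I

T, S, C, I = 120.0, -12.0, 6.0, 1.2    # additive: S, C, I in the units of Y
Y_add = T + S + C + I                   # Y = T + S + C + I

print(round(Y_mult, 2), Y_add)          # 114.53 115.2
```

Note the convention this illustrates: in the multiplicative model the seasonal, cyclical and irregular components are ratios around 1, while in the additive model they are deviations measured in the units of Y itself.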

3.2 Forecasting related models of Time Series:

For forecasting, there are mainly four approaches based on time series data.

1. Exponential smoothing methods

2. Single-equation regression models

3. Simultaneous-equation regression models

4. Autoregressive integrated moving average (ARIMA) models

3.2.1 Exponential smoothing methods

Exponential smoothing is a time series method for forecasting univariate time series data. Time

series methods work on the principle that a prediction is a weighted linear sum of past observations

or lags. The exponential smoothing method works by assigning exponentially decreasing weights

to past observations, so the weight assigned to each demand observation decreases exponentially

with its age.

The model assumes that the future will be somewhat the same as the recent past. The only pattern

that exponential smoothing learns from demand history is its level: the average value around

which the demand varies over time.

Exponential smoothing is generally used to make forecasts of time series data based on prior

assumptions by the user, such as seasonality or systematic trends.
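The level-only (simple) form described above can be sketched in a few lines (illustrative code; the demand figures are invented):

```python
# Minimal sketch of simple exponential smoothing: each smoothed value is
# alpha times the latest observation plus (1 - alpha) times the previous
# smoothed value, so older observations get exponentially smaller weights.
def exponential_smoothing(series, alpha):
    smoothed = [series[0]]                 # initialise with the first observation
    for y in series[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed

demand = [100, 110, 105, 115, 120]         # hypothetical monthly demand
print(exponential_smoothing(demand, 0.5))
# [100, 105.0, 105.0, 110.0, 115.0]
```

The last smoothed value serves as the forecast for the next period; a larger alpha makes the forecast react faster to recent demand.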

3.2.2 Single – Equation Regression Models:

Single-equation regression models in time series analysis involve using a single equation to

explain and predict the behaviour of a dependent variable. Typically, these models express the

dependent variable as a linear function of one or more independent variables, including time.

The most basic form is the autoregressive (AR) model, where the current value of the variable

depends on its past values. Another common model is the moving average (MA) model, where the

current value is expressed as a linear combination of past error terms. ARIMA models can also

be considered single-equation regression models.

ACF (Auto Correlation Function):

The autocorrelation function takes into consideration all the past observations, irrespective of their

effect on the future or present time period. It calculates the correlation between the time periods

t and t − k, including all the lags or intervals between them. The correlation is always calculated

using the Pearson correlation formula:

ρ(X, Y) = cov(X, Y) / (σ_X σ_Y)

PACF (Partial Correlation Function):

The PACF determines the partial correlation between time periods t and t − k. It does not take into

consideration all the time lags between t and t − k.

e.g.: today's stock price may depend on the stock price 3 days prior without taking into

consideration yesterday's closing price. Therefore we consider only the time lags having a direct

impact on the future time period, neglecting the insignificant time lags in between the two time

slots t and t − k.
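The sample ACF described above can be sketched directly from its definition (illustrative code with a made-up series):

```python
# Sketch of the sample autocorrelation at lag k: the correlation between
# the series and itself shifted back by k periods, normalised by the
# total sum of squared deviations.
def acf(series, k):
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n))
    return num / var

x = [1, 2, 3, 4, 5, 4, 3, 2, 1, 2, 3, 4]   # a series with a repeating rise and fall
print(acf(x, 0))   # 1.0 by definition
print(acf(x, 1))   # positive: neighbouring values move together
```

Plotting acf(x, k) against k for k = 0, 1, 2, ... gives the correlogram used in the identification stage of model fitting.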

3.2.3 ARIMA Models:

The model which is a forecasting algorithm based on the assumption that previous values carry

inherent information and can be used to predict future values. In order to understand ARIMA,

we first have to separate it into its foundational constituents

1. AR

2. I

3. MA

The ARIMA model takes in three parameters:

1. p is the order of the AR term

2. q is the order of the MA term

3. d is the order of differencing

Autoregressive (AR) and Moving Average (MA)

AR (Auto-Regressive) Model

The AR model depends only on past values to estimate future values. The generalized form of the AR model is

AR(p): x_t = α + Σ_{i=1}^{p} β_i x_{t−i} + ε_t

The value p determines how many past values are taken into account for the prediction. The higher the order of the model, the more past values are used.

Consider a milk distribution company that produces milk every month across the country. We want to decide how much milk to produce in the current month, given the milk produced over the last twelve months.

We begin by calculating the PACF values of all 12 lags with respect to the current month. Only the lags whose PACF exceeds a significance threshold are retained in the model.

e.g.: the PACF values at lags 1, 2, ..., 12 show the direct effect of the milk production at each lag on the current month. If two values lie above the significance threshold, the model is termed AR(2).

(The AR model can simply be thought of as the linear combination of p past values.)
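To illustrate this idea (a sketch with hypothetical coefficients 0.5 and 0.3), an AR(2) series can be simulated and its coefficients recovered by regressing the series on its first two lags:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an AR(2) process: x_t = 0.5 x_{t-1} + 0.3 x_{t-2} + eps_t
n = 2000
x = np.zeros(n)
eps = rng.normal(size=n)
for t in range(2, n):
    x[t] = 0.5 * x[t - 1] + 0.3 * x[t - 2] + eps[t]

# Least-squares regression of x_t on its first two lags recovers
# estimates close to the true coefficients (0.5, 0.3)
X = np.column_stack([x[1:-1], x[:-2]])   # lag-1 and lag-2 columns
beta, *_ = np.linalg.lstsq(X, x[2:], rcond=None)
print(np.round(beta, 2))
```

The recovered estimates land close to the true values 0.5 and 0.3, which is precisely the sense in which the AR model is a linear combination of p past values.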

MA (Moving Average) Model

The moving-average model depends on past forecast errors to make predictions. The generalised form of the MA(q) model is

y_t = ε_t + a_1 ε_{t−1} + a_2 ε_{t−2} + ... + a_q ε_{t−q}

Consider cake distribution at a birthday function. Suppose a person asks you to bring pastries to the party. Every year you misjudge the number of invitees and end up bringing more or fewer cakes than required. The difference between the actual and expected quantity is the error. To avoid this error in the current year, we apply the moving average model to the time series and calculate the number of pastries needed based on the past collective errors. Next, we calculate the ACF values of all the lags in the time series. Only the lags whose ACF exceeds a significance threshold are retained in the model.

e.g.: the ACF values at lags 1, 2, ..., 12 show the total error at each lag on the current month's pastry count, taking all the in-between lags into account. If two values lie above the significance threshold, the model is termed MA(2).

( The MA model can simply be thought of as the linear combination of q past forecast errors.)
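The cut-off property used above can be checked against the theoretical ACF of a small MA model. As a sketch, for an MA(1) process written in the standard form y_t = ε_t + a ε_{t−1} (hypothetical coefficient a = 0.5):

```python
def ma1_acf(a, k):
    # Theoretical ACF of the MA(1) process y_t = eps_t + a * eps_{t-1}:
    # rho_0 = 1, rho_1 = a / (1 + a^2), and zero at every lag beyond 1,
    # which is exactly the "negligible after lag q" identification rule.
    if k == 0:
        return 1.0
    if k == 1:
        return a / (1 + a * a)
    return 0.0

print([ma1_acf(0.5, k) for k in range(4)])   # [1.0, 0.4, 0.0, 0.0]
```

The abrupt drop to zero after lag 1 is what the ACF-based identification of q looks for in the sample autocorrelogram.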

ARMA (Auto Regressive Moving Average) Model:

This model is a combination of the AR and MA models. In this model, the impact of previous lags along with the residuals is considered for forecasting the future values of the time series. Here β represents the coefficients of the AR model and α represents the coefficients of the MA model:

Y_t = ε_t + β_1 y_{t−1} + α_1 ε_{t−1} + β_2 y_{t−2} + α_2 ε_{t−2} + ... + β_k y_{t−k} + α_k ε_{t−k}

Integrated ( I ) :

1. The integrated part refers to differencing the time series data to make it stationary.

2. Stationarity means that the statistical properties of the time series, such as mean and vari-

ance, remain constant over time.

3. The differencing parameter d represents the number of times differencing is needed to achieve stationarity.
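A minimal sketch of this point, using a hypothetical trending series: one round of differencing (d = 1) removes a linear trend and leaves a series with constant mean.

```python
import numpy as np

# A series with a deterministic linear trend is non-stationary in mean;
# first differencing replaces each value with x_t - x_{t-1}.
trend = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
diff1 = np.diff(trend)
print(diff1.tolist())   # [2.0, 2.0, 2.0, 2.0]
```

If one difference is not enough (for example, a quadratic trend), differencing the differenced series corresponds to d = 2.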

ARIMA (Auto-Regressive Integrated Moving Average) Model:

The ARIMA model is quite similar to the ARMA model except that it includes one more component, Integrated (I), i.e. differencing, which stands for the I in ARIMA. In short, the ARIMA model combines the number of differences applied to make the series stationary with the previous lags and the residual errors in order to forecast future values.

Chapter 4

FITTING

One of the main approaches is the Box-Jenkins method.

4.1 Fitting ARIMA models : The Box-Jenkins approach

The Box-Jenkins procedure is concerned with fitting an ARIMA model to data. It has three parts:

• Identification

• Estimation

• Verification

The data may require pre-processing to make it stationary. To achieve stationarity we may do any

of the following:

• Look at the time series

• Re-scale it (for instance, by a logarithmic or exponential transform)

• Remove deterministic components

• Difference it until stationary. In practice d = 1 or 2 should be sufficient.

4.1.1 Identification

For the moment we will assume that our series is stationary. The initial model identification is

carried out by estimating the sample autocorrelations and partial autocorrelations and comparing

the resulting sample autocorrelograms and partial autocorrelograms with the theoretical ACF and

PACF derived already.

• An MA(q) process has negligible ACF after the qth term

• An AR(p) process has negligible PACF after the pth term

As we have noted, very approximately, both the sample ACF and PACF have standard deviation of around 1/√T, where T is the length of the series. A rule of thumb is that ACF and PACF values are negligible when they lie within ±2/√T. An ARMA(p, q) process has kth order sample ACF and PACF decaying geometrically for k > max(p, q).

4.1.2 Estimation

Estimation: ARMA processes

Now we consider an ARMA(p, q) process. If we assume a parametric model for the white noise (here, Gaussian white noise), we can use maximum likelihood.

We rely on the prediction error decomposition. That is, X_1, ..., X_n have joint density

f(X_1, ..., X_n) = f(X_1) ∏_{t=2}^{n} f(X_t | X_1, ..., X_{t−1})

Suppose the conditional distribution of X_t given X_1, ..., X_{t−1} is normal with mean X̄_t and variance P_{t−1}, and suppose that X_1 ∼ N(X̄_1, P_0). Then for the log likelihood we obtain

−2 log L = Σ_{t=1}^{n} [ log(2π) + log P_{t−1} + (X_t − X̄_t)² / P_{t−1} ]

Here X̄_t and P_{t−1} are functions of the parameters α_1, ..., α_p, β_1, ..., β_q, and so maximum likelihood estimators can be found (numerically) by minimising −2 log L with respect to these parameters. The matrix of second derivatives of −2 log L, evaluated at the maximum likelihood estimate, is the observed information matrix, and its inverse is an approximation to the covariance matrix of the estimators. Hence we can obtain approximate standard errors for the parameters from this matrix.

Estimation: AR processes

For the AR(p) process

X_t = Σ_{i=1}^{p} α_i X_{t−i} + ε_t    (4.1.1)

we have the Yule-Walker equations

ρ_k = Σ_{i=1}^{p} α_i ρ_{|i−k|}, for k > 0    (4.1.2)

We fit the parameters α_1, ..., α_p by solving

r_k = Σ_{i=1}^{p} α_i r_{|i−k|}, k = 1, ..., p    (4.1.3)

These are p equations for the p unknowns α_1, ..., α_p which, as before, can be solved using a Levinson-Durbin recursion. The Levinson-Durbin recursion gives the residual variance

σ̂_p² = (1/n) Σ_{t=p+1}^{n} ( X_t − Σ_{j=1}^{p} α̂_j X_{t−j} )²    (4.1.4)

This can be used to select the appropriate order p. Define an approximate log likelihood by

−2 log L = n log σ̂_p²    (4.1.5)

Then this can be used for likelihood ratio tests.
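As a sketch of equations (4.1.3) for p = 2, with hypothetical sample autocorrelations r_1 = 0.6 and r_2 = 0.4, the system can be solved directly (a full implementation would use the Levinson-Durbin recursion):

```python
import numpy as np

# Hypothetical sample autocorrelations for an AR(2) fit
r1, r2 = 0.6, 0.4

# The system r_k = sum_i alpha_i r_{|i-k|} for k = 1, 2 in matrix form;
# the coefficient matrix is Toeplitz with entries r_{|i-k|}
R = np.array([[1.0, r1],
              [r1, 1.0]])
alpha = np.linalg.solve(R, np.array([r1, r2]))
print([round(v, 4) for v in alpha])   # [0.5625, 0.0625]
```

The second coefficient being small relative to the first already hints, via the PACF interpretation, that an AR(1) model might suffice for this series.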

4.1.3 Verification

The third step is to check whether the model fits the data. Two main techniques for model verification are

• Overfitting: add extra parameters to the model and use likelihood ratio or t tests to check

that they are not significant.

• Residual analysis: calculate residuals from the fitted model and plot their acf, pacf, ‘spectral

density estimates’, etc, to check that they are consistent with white noise.

White noise : white noise refers to a random signal with a constant mean and constant variance.

It is a type of stochastic process where each data point is independent and identically distributed

(i.i.d.), meaning that there is no correlation between successive observations.

Characteristics of white noise

• Constant Mean

• Constant Variance

• Independence

• Randomness

Tests for white noise

The Box-Pierce test is based on the statistic

Q_m = T Σ_{k=1}^{m} r_k²    (4.1.6)

where r_k is the kth sample autocorrelation coefficient of the residual series, and p + q < m ≪ T. It is called a 'portmanteau test' because it is based on an all-inclusive statistic. If the model is correct then Q_m ∼ χ²_{m−p−q} approximately.

In fact, r_k has variance (T − k)/(T(T + 2)), so an improved test is the Box-Ljung procedure, which replaces Q_m by

Q̄_m = T(T + 2) Σ_{k=1}^{m} (T − k)^{−1} r_k²    (4.1.7)

where again Q̄_m ∼ χ²_{m−p−q} approximately.
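A sketch of the Ljung-Box statistic computed from first principles follows (the trending input series is hypothetical; in practice `acorr_ljungbox` from `statsmodels.stats.diagnostic` performs this test):

```python
import numpy as np

def ljung_box_q(x, m):
    # Ljung-Box statistic: Q = T(T+2) * sum_{k=1}^m r_k^2 / (T - k),
    # where r_k is the kth sample autocorrelation of the series.
    x = np.asarray(x, dtype=float)
    T = len(x)
    d = x - x.mean()
    denom = np.sum(d * d)
    q = 0.0
    for k in range(1, m + 1):
        r_k = np.sum(d[k:] * d[:-k]) / denom
        q += r_k * r_k / (T - k)
    return T * (T + 2) * q

# A strongly trending (hence autocorrelated) series gives a large Q,
# far above the chi-square critical value, so white noise is rejected
print(ljung_box_q(np.arange(50.0), 5) > 11.07)   # True (chi2 at 0.95, 5 df)
```

For residuals from a correctly specified model, Q should instead be comparable to the χ² critical value, and the white-noise hypothesis is not rejected.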

4.2 Applications of Time Series

4.2.1 Forecasting

The following produces a price forecast for Bitcoin, using data obtained from Yahoo Finance via the Python programming language.

pip install yfinance
pip install statsmodels
pip install pmdarima
pip install matplotlib

import warnings
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import yfinance as yf
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA
from pmdarima.arima import auto_arima
from sklearn.metrics import mean_squared_error, mean_absolute_error
warnings.filterwarnings('ignore')

# Download the full BTC-USD price history from Yahoo Finance
df = yf.download('BTC-USD')

df

plt.plot(df.index, df['Adj Close'])
plt.show()

# Train/test split: first 90% for training, last 10% for testing
to_row = int(len(df) * 0.9)
training_data = list(df[0:to_row]['Adj Close'])
testing_data = list(df[to_row:]['Adj Close'])

# Plot the train and test portions of the series
plt.figure(figsize=(10, 6))
plt.grid(True)
plt.xlabel('Dates')
plt.ylabel('Closing Prices')
plt.plot(df[0:to_row]['Adj Close'], 'green', label='Train data')
plt.plot(df[to_row:]['Adj Close'], 'blue', label='Test data')
plt.legend()

model_predictions = []
n_test_obser = len(testing_data)

Figure 4.1: Train/test split of the BTC-USD closing price

arima_model = auto_arima(testing_data, suppress_warnings=True, seasonal=False)
p, d, q = arima_model.order
print(f'Best ARIMA model parameters: p={p}, d={d}, q={q}')

for i in range(n_test_obser):
    model = ARIMA(training_data, order=(0, 1, 0))
    model_fit = model.fit()
    output = model_fit.forecast()
    print(output)
    yhat = output[0]

# Use auto_arima to find the best ARIMA parameters
model = auto_arima(testing_data, suppress_warnings=True, m=5, trace=True,
                   error_action='ignore', seasonal=False)
order = model.get_params()['order']

# Print the found ARIMA order (p, d, q)
print('ARIMA Order (p, d, q):', order)

# Fit ARIMA model using the found parameters
arima_model = ARIMA(training_data, order=order)
arima_result = arima_model.fit()

# Display the summary of the ARIMA model
print(arima_result.summary())

# Walk-forward forecasting: refit on the growing history, forecast one
# step ahead, then append the actual observation to the training data
for i in range(n_test_obser):
    model = ARIMA(training_data, order=(0, 1, 0))
    model_fit = model.fit()
    output = model_fit.forecast()
    yhat = output[0]
    model_predictions.append(yhat)
    actual_test_value = testing_data[i]
    training_data.append(actual_test_value)
print(model_fit.summary())

plt.figure(figsize=(11, 7))
plt.grid(True)

data_range = df[to_row:].index
plt.plot(data_range, model_predictions, color='blue', marker='o',
         linestyle='dashed', label='BTC Predicted Price')
plt.plot(data_range, testing_data, color='red', label='BTC Actual Price')
plt.title('Bitcoin Price Prediction')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.show()
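The sklearn metrics imported at the start (mean_squared_error, mean_absolute_error) can quantify forecast accuracy. As a sketch with hypothetical actual and predicted prices, the mean squared error and mean absolute percentage error follow directly from numpy:

```python
import numpy as np

# Hypothetical actual vs. predicted prices, for illustration only
actual = np.array([100.0, 102.0, 101.0, 105.0])
predicted = np.array([101.0, 101.0, 103.0, 104.0])

mse = float(np.mean((actual - predicted) ** 2))
mape = float(np.mean(np.abs((actual - predicted) / actual))) * 100

print(mse)              # 1.75
print(round(mape, 2))   # 1.23
```

Comparing such error measures across candidate orders (p, d, q) gives an objective way to choose between fitted ARIMA models.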

CONCLUSION

In the introductory chapter, the fundamental concepts of time series were explored. A clear

definition of time series was provided, emphasizing its sequential nature and the inherent patterns

within. The discussion delved into the distinctions between long-term and short-term variations,

illustrating these concepts through practical examples. By understanding the classifications of

variations within time series, readers gained insight into the complexity and diversity of temporal

data. This groundwork set the stage for an in-depth exploration of techniques to analyse and

model time-dependent phenomena. The second chapter focused on estimating trends in time se-

ries data, employing various methodologies for measuring secular trends and seasonal variations.

The analysis included methods such as the Method of Selected Points, Least Squares, Ratio to

Trend, and Moving Average Method. The exploration extended to cyclic and irregular varia-

tions, with concrete examples elucidating the application of each approach. By comprehensively

evaluating different trend estimation methods, the chapter provided a robust foundation for sub-

sequent modeling and forecasting endeavours. The third chapter delved into the crucial task of

identifying time series models, employing a two-fold classification based on components and fore-

casting. The first classification distinguished between additive and multiplicative models, offering

a nuanced understanding of their respective applications. Subsequently, the discussion expanded

to forecasting-related models, encompassing exponential smoothing methods, single-equation re-

gression models, simultaneous-equation regression models, and Autoregressive Integrated Moving

Average (ARIMA) models. The incorporation of graph representations enhanced the conceptual

clarity, facilitating a more intuitive grasp of the modeling approaches presented. The final chap-

ter culminated in the application of the ARIMA model, as outlined in Chapter three , through

the Box-Jenkins approach. The process of model fitting was intricately detailed, emphasizing the

systematic steps involved. To exemplify the practical utility of the developed models, a real-world

case study involving Bitcoin data was undertaken. The application side showcased the forecasting

capabilities of the ARIMA model, demonstrating its adaptability to dynamic and volatile datasets.

The chapter provided a conclusive demonstration of the theoretical concepts discussed throughout

the project, offering valuable insights into the applicative potential of time series modeling tech-

niques.

BIBLIOGRAPHY

Books

1. Damodar N. Gujarati, Dawn C. Porter, Sangeetha Gunasekar, Basic Econometrics

2. Dennis J. Sweeney, Thomas Arthur Williams, David R. Anderson, Jeffrey D. Camm, James J. Cochran, Statistics for Business & Economics

3. A. C. Harvey, Time Series Models

Websites

1. data.world

2. finance.yahoo.com

3. towardsdatascience.com
