
Machine learning on inflation in India
Vibha Yadav & Sandeep Yadav

“Even a bad AI/ML model will perform better than a good model given a much larger data size”

We were intrigued by the possibility of predicting Indian inflation (CPI), given the small dataset (only about 100 monthly data points) and the large set of features (300+). Such data is a recipe for overfitting, and we wanted to see whether we could construct a model with reasonable predictive power using newer approaches.
BACKGROUND
Relatively few studies have been conducted on inflation in India, and we could not find any that are directly relevant to traders (though we may have missed some). Most of them fall into three categories:
1. Prediction of inflation a few months in advance: Volatile food inflation is the primary driver of CPI in India, unlike in developed markets, which makes forecasts made months in advance error prone.
2. Endogenous models that predict inflation from its own history: The CPI time series in India is short (monthly data only from 2012, when the benchmark changed), so unlike in developed countries there is insufficient information for a purely endogenous analysis.
3. Exogenous models that use macroeconomic inputs: These models take inputs such as money supply and currency to forecast inflation, but such inputs have low predictive power.

The primary drawback of the above models is that they are not able to replicate the performance seen in developed economies. They typically do not account for (i) the volatile composition of CPI in India, (ii) differences in the transmission of monetary and fiscal policy, and (iii) ambiguous and incomplete public data in India.
METHODOLOGY
As stated earlier, the series has few data points and a considerable number of features, so any model is prone to overfitting. It was therefore critical to preprocess the data and reduce the feature dimensions. Although we tested all the major model families (DNN, CNN, RNN, LSTM and their combinations), we found that simpler ones gave a better fit.

We initially identified the hyperparameters with basic hyperparameter tuning. However, because of the volatile loss and the small dataset, we had to go beyond this and write customized search routines to identify suitable model parameters.
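As an illustration of the kind of customized search we mean, the sketch below averages the validation loss over several random initialisations before comparing hyperparameter settings, which damps the noise that a single fit produces on so few observations. The grid values, layer sizes and epoch counts are our own illustrative assumptions, not the exact routine used for the model.

```python
import itertools
import numpy as np
import tensorflow as tf

def build_model(units, learning_rate):
    # hypothetical small dense regressor; sizes are illustrative only
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(units, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate), loss="mae")
    return model

def average_val_loss(units, lr, X, y, repeats=5):
    # average the best validation loss over several seeds, because a single
    # fit on ~100 points gives a very noisy estimate of model quality
    losses = []
    for seed in range(repeats):
        tf.keras.utils.set_random_seed(seed)
        hist = build_model(units, lr).fit(
            X, y, validation_split=0.2, epochs=200, batch_size=1, verbose=0)
        losses.append(min(hist.history["val_loss"]))
    return float(np.mean(losses))

# simple grid over hidden-layer width and learning rate
# best = min(itertools.product([8, 16, 32], [1e-3, 1e-4]),
#            key=lambda p: average_val_loss(p[0], p[1], X_train, y_train))
```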
Output shape
The output of the model is a Gaussian distribution with a mean and a standard deviation. For the prediction we consider only the values within one standard deviation of the mean, thus ignoring the outliers. Examples of output predictions are shown in the accompanying charts.
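A minimal sketch of how such a two-parameter (mean, standard deviation) output could be expressed as a Keras loss is shown below; the softplus transform and the final layer shape are our assumptions about the implementation, not the exact production code.

```python
import tensorflow as tf

def gaussian_nll(y_true, y_pred):
    # y_pred carries two columns: the predicted mean and an unconstrained
    # parameter that is mapped to a positive standard deviation
    mean = y_pred[:, 0:1]
    std = tf.nn.softplus(y_pred[:, 1:2]) + 1e-6
    # negative log-likelihood of y_true under N(mean, std), up to a constant
    return tf.reduce_mean(
        tf.math.log(std) + 0.5 * tf.square((y_true - mean) / std))

# the network's final layer would then be Dense(2), e.g.:
# model.add(tf.keras.layers.Dense(2))
# model.compile(optimizer="adam", loss=gaussian_nll)
```

At prediction time the first output gives the point forecast and the second gives the one-standard-deviation band referred to above.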

Performance
The historical performance on the blind test data (the past nine months) versus the average economist forecast is as follows:

Mean absolute error:
  Survey average      25.6 bp
  Model prediction    21.5 bp

As can be seen, the model compares favourably with the average survey forecast. It is also worth noting that these predictions are made using only a small amount of public data.
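For reference, the error metric above is a plain mean absolute error expressed in basis points; a minimal sketch, assuming inflation readings quoted in per cent year on year:

```python
import numpy as np

def mae_bp(actual, predicted):
    # mean absolute error in basis points; inputs are % YoY readings
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return 100.0 * np.mean(np.abs(actual - predicted))
```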
While the model can be improved with straightforward optimizations that we have already identified, even in its current form it gives a good indication with minimal effort spent on data collection, manipulation or calculation.

SWOT Analysis
A SWOT analysis is quite relevant in this case and highlights the key features of the approach.
Way forward

Going forward, we have the following optimizations in mind:
1. Predict seasonal, non-public data using RNN models and use those predictions as inputs to this model (see the sketch after this list).
2. Extend the predictions to longer time horizons.
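A rough sketch of the kind of RNN we have in mind for the first item, assuming a monthly seasonal series fed in as 12-month windows; the layer sizes and window length are placeholders:

```python
import tensorflow as tf

def seasonal_rnn(window: int = 12) -> tf.keras.Model:
    # small LSTM that maps the last `window` monthly readings of a seasonal
    # series to its next value; the output would feed the main CPI model
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(window, 1)),
        tf.keras.layers.LSTM(8),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mae")
    return model
```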

These are fairly simple extensions to the current model, and we intend to include them in the next iteration.
Appendix 1: Input data
We have taken the monthly CPI components as the features and the actual monthly CPI reading as the label. As expected, a large part of the data is surrogate data, and quite often we have clubbed related series together.

The dimensions of correlated data have been reduced. We did not use PCA or any other formal dimensionality-reduction algorithm, as the features have a strict relationship (the published weights) and their correlation and impact are deterministic. There are exceptions where it makes economic sense to keep correlated series: for example, PMI services and PMI manufacturing have been retained separately.
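A sketch of what this weight-based reduction could look like in pandas; the component names and weights here are illustrative placeholders rather than the actual CPI basket:

```python
import pandas as pd

# illustrative sub-component weights (placeholders, not the official basket)
WEIGHTS = {"cereals": 9.7, "milk": 6.6, "vegetables": 6.0}

def combine_components(df: pd.DataFrame, groups: dict) -> pd.DataFrame:
    # collapse each group of correlated sub-indices into one weighted feature,
    # using the known weights instead of a statistical method such as PCA
    out = pd.DataFrame(index=df.index)
    for name, cols in groups.items():
        w = pd.Series({c: WEIGHTS[c] for c in cols})
        out[name] = (df[cols] * w).sum(axis=1) / w.sum()
    return out

# e.g. merge the food sub-indices into a single "food" feature, while PMI
# services and PMI manufacturing remain separate columns:
# reduced = combine_components(cpi_df, {"food": ["cereals", "milk", "vegetables"]})
```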
Appendix 2: Mitigating overfits
We have tried to make the model more representative of the underlying momentum. In doing so, the major hurdles we faced were:
1. Lack of data: Any ML model relies heavily on a large dataset. In our case we had close to 120 monthly data points, which included unreliable imputed data from the covid-19 lockdown phase.
2. The large set of input features: The feature set (around 300 components in the CPI, and 100+ in the surrogates) carried a significant risk of overfitting. About 30 input features were carefully selected, reducing the dimensionality issues. Quite often we have knowingly accepted a higher error in the model to ensure it does not overfit.
3. Imputed data in 2020: During the 2020 lockdown the government published imputed data rather than actual observations. Moreover, the data during that period was highly volatile and unprecedented. Since we were already struggling with a small dataset, there was little scope to fit this anomalous data within the model.
4. Cyclicality: CPI is quite seasonal. However, to keep the model simple we have not used explicit time lags (which would have enlarged the input matrix) and have instead used a simpler momentum surrogate for seasonality (see the sketch after this list).
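One plausible construction of such a momentum surrogate, assuming a monthly CPI index series in pandas; the exact surrogate used in the model may differ:

```python
import pandas as pd

def momentum_surrogate(cpi: pd.Series, window: int = 3) -> pd.DataFrame:
    # two compact momentum/seasonality features instead of 12 explicit lags
    feats = pd.DataFrame(index=cpi.index)
    # short-run momentum: average month-on-month change over `window` months
    feats["mom_momentum"] = cpi.pct_change().rolling(window).mean()
    # seasonal anchor: change versus the same calendar month a year earlier
    feats["yoy_change"] = cpi.pct_change(12)
    return feats
```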

Appendix 3: Model structure


We experimented with several model families:
- Linear regression
- Dense neural network
- Multi-step dense neural network
- Convolutional neural network
- Recurrent neural network (LSTM)

In each case we preferred simpler versions of the models, with minimal hidden layers, to reduce the risk of overfitting.
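As an illustration, a deliberately small dense network of the kind we favoured might look like the sketch below; the layer width and loss are placeholders rather than the final configuration:

```python
import tensorflow as tf

def simple_dense_model(n_features: int) -> tf.keras.Model:
    # one narrow hidden layer keeps the parameter count small relative
    # to the ~100 monthly observations available
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss="mae")
    return model
```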

Appendix 4: Learning rate (1e-04)


Performance improved significantly as the learning rate was increased beyond 1e-06, and the improvement tapered off around 1e-04. Depending on the model, we therefore chose either 1e-03 or 1e-04 as the learning rate.
Appendix 5: Batch size (1)
As can be seen in the charts below, a batch size of 1 gives strikingly better performance on both the training and validation sets (the y-axis on the right-hand chart is five times smaller). Since the dataset is small, the computational cost is low, which lets us use such a small batch size. The small batch size does, however, lead to a more volatile gradient descent.
(Charts: training and validation loss for batch size 10 and batch size 1.)
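Combining the choices from Appendices 4 and 5, the compile-and-fit step would look roughly like this; the model object is any of the small networks sketched earlier, and the epoch count and validation split are illustrative assumptions:

```python
import tensorflow as tf

# learning rate 1e-04 (Appendix 4) and batch size 1 (Appendix 5)
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)

# model.compile(optimizer=optimizer, loss="mae")
# history = model.fit(X_train, y_train, batch_size=1, epochs=300,
#                     validation_split=0.2, verbose=0)
```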
Appendix 6: Examples of overfitting
The more complex models overfit quite often, as can be seen in the chart below: the validation set's error keeps increasing while the training set's error continues to decrease.

This can also be seen in the following graph, where the green line is the actual CPI and the blue line is the prediction. The training set is predicted very well (up to roughly 80 readings), but the prediction on the blind test set is noticeably worse.
The chart below shows a model where the fit on the training set has been deliberately compromised, but the predictions on the test set are much more representative:
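Divergence of this kind can also be caught automatically during training; a minimal sketch using Keras early stopping, with the patience value being our own assumption:

```python
import tensorflow as tf

# stop training once validation error starts climbing, i.e. the overfit
# pattern shown in the charts above, and roll back to the best weights
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=20, restore_best_weights=True)

# history = model.fit(X_train, y_train, validation_split=0.2, epochs=500,
#                     batch_size=1, callbacks=[early_stop], verbose=0)
```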
