Code != 000), which means the transaction was rejected for some reason. Table II shows the response codes and their descriptions, along with the counts of approved and not-approved transactions. Regression predicts the dependent variable from the independent variables; in our case the output variable is the transaction amount (we are predicting the amount needed on the next day) and the rest of the parameters are input variables.
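As a sketch of this setup, the input/output split might look as follows; the field names here are hypothetical illustrations, not the actual schema of the bank's data set:

```python
# Illustrative sketch: splitting transaction records into input features (X)
# and the output variable y (the transaction amount to be predicted).
records = [
    {"amount": 120.0, "response_code": "000", "terminal": 7},
    {"amount": 95.5,  "response_code": "004", "terminal": 3},
]

# Output variable: transaction amount; inputs: all remaining fields.
y = [r["amount"] for r in records]
X = [{k: v for k, v in r.items() if k != "amount"} for r in records]
```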
TABLE II: Response codes and descriptions of transactions

Response Code   Response Description     Count
000             Approved                 74951
001             Limit Exceeded           1731
002             Account Not Found        254
003             Account Inactive         62
004             Low Balance              2879
007             Card not found           6
009             Error in input data      10
010             Duplicate Transaction    4
014             Warm Card                326
015             Stolen/Lost Card         14
016             Bad Card Status          23
017             Customer Cancellation    163
020             Invalid response         31
024             Bad PIN                  542
025             Expired Date             24
028             Account not linked       199
029             Internal Error           3
039             No Credit account        1
041             Expired date mismatch    5
045             Unable to process        272
049             Internal message Error   27
050             Host status unknown      100
051             Host not processing      569
053             No saving account        31
054             Safe transmit mode       33
055             Host link down           894
056             Sent to issuer           263
058             Timed out                1018
060             PIN retries exhausted    127
061             HSM not responding       24
079             Honor with ID            17
080             Message format found     3
083             No Comms Key             1
091             Issuer reversal          263
094             TXN is not allowed       3
096             Transaction rejected     10
097             Cash has expired         21
104             Account Blocked          148
216             Routing not found        2
968             ATM reversed             15
975             Faulty Dispense          9
976             Cash Retracted           29

Fig. 1: Relationship between transaction amount and date

Figure 1 shows the relationship between transaction amount and date. The green points mark dates in 2013, while the blue and red points mark dates in 2014 and 2015, respectively. It can be noticed that most daily transaction totals fall between 50k and 100k, which shows a clear pattern. The next plot, figure 2, shows the relationship between transaction amount and transaction count. The Pearson coefficient value of 0.94 clearly indicates that these two variables are highly correlated. The maximum transaction count encountered in our data is 200 and the maximum transaction amount is 250k. The relationship is linear, which means that as the transaction count increases the withdrawal amount also increases, as shown in figure 2.
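The Pearson coefficient reported above can be reproduced in a few lines. This is a generic sketch with made-up daily (count, amount) pairs for illustration; the 0.94 figure comes from the authors' data set, which is not reproduced here:

```python
def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    std_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (std_x * std_y)

# Daily counts and amounts that move together give a coefficient near 1.
counts = [40, 80, 120, 160, 200]
amounts = [60, 110, 150, 210, 250]  # in thousands
r = pearson(counts, amounts)
```

A coefficient near 1 is what makes the linear trend between the two variables in figure 2 plausible.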
In mathematical terms, if

y(w, x)   (1)

is the output value, then we can write

y(w, x) = w0 + w1 x1 + ... + wp xp   (2)

The above formula designates the vector w = (w1, ..., wp) as coef_ and w0 as intercept_. Linear regression fits a linear model with coefficients w = (w1, ..., wp) to minimize the residual sum of squares between the observed responses in the dataset and the responses predicted by the linear approximation. Mathematically, it solves a problem of the form:

min_w ||Xw - y||^2   (3)

A. Linear Regression Model

With the Linear Regression Model under its default settings we obtained the following results: mean squared error 0.05 and variance score 0.79. It can be noticed that the error has decreased drastically; an MSE of 0.05 means the predictions deviate only slightly from the actual values. The ideal variance score is 1; in our case it is 0.79, which is acceptable.

B. Ridge Linear Regression Model

The ridge regression technique is used when the data suffers from multicollinearity. Under multicollinearity, even though the ordinary least squares (OLS) estimates are unbiased, their variances are large, which can push the estimated values far from the actual values. By adding a degree of bias to the regression estimates, ridge regression reduces the standard errors. Ridge regression addresses the multicollinearity problem through the shrinkage parameter λ (lambda). Some important points about ridge: its assumptions are the same as those of least squares regression, except that normality need not be assumed; it shrinks the coefficient values but never all the way to 0, so it performs no feature selection; and it regularizes using the l2 penalty. After running ridge we obtained the following results: mean squared error 0.03 and variance score 0.89. It can be noticed that both the MSE and the variance score improved.

C. Experiment with Lasso Model

The Least Absolute Shrinkage and Selection Operator (LASSO) is a regression method that constrains the absolute size of the regression coefficients. By constraining the sum of the absolute values of the estimates, we end up in a condition where some of the parameter estimates may be exactly 0. The larger the penalty applied, the further the estimates are shrunk towards 0. This is convenient for automatic variable selection, or when dealing with highly correlated predictors, where standard regression will usually have 'too large' coefficients. After running lasso we obtained the following results: mean squared error 0.03 and variance score 0.90. Again, both the MSE and the variance score improved.

D. Experiment with Bayesian Ridge Regression

Bayesian linear regression is a method of linear regression in which the statistical analysis is carried out within the context of Bayesian inference. After running Bayesian Ridge Regression (BRR) we obtained the following results: mean squared error 0.03 and variance score 0.90.

E. Time Series Regression

Time series modeling is a method for forecasting and prediction. It makes its decisions by working over time (minutes, hours, days, years) and finds hidden insights in the data. It works well when the data is correlated. A time series is basically a set of data points gathered at constant time intervals. Two things make time series special and different from linear regression. First, it is time dependent, unlike linear regression, which assumes that observations are independent. Second, it identifies seasonal trends in the data; for example, most transactions occur just before gazetted holidays. In order to run a time series model we have assumed that the time series (TS) is stationary, i.e. its statistical properties, such as mean and variance, remain the same over time. This is important because there is then a very high probability that the series will follow the same pattern in the future. To check stationarity we did two things. The first is to plot rolling statistics: this plot contains the moving average and moving variance, and we analyze whether they vary over time. The second test for stationarity is the Dickey-Fuller test, which consists of a 'Test Statistic' and 'Critical Values' at different confidence levels. If the 'Critical Value' is greater than the 'Test Statistic' then we can say that the series is stationary, as shown in figure 3.

Fig. 3: Analyzing stationarity of the time series with rolling statistics

Following are the results of the Dickey-Fuller test:

Test Statistic                 -4.880407
p-value                         0.000038
Lags Used                      21.000000
Number of Observations Used   889.000000
Critical Value 5%              -2.864797
Critical Value 1%              -3.437727
Critical Value 10%             -2.568504

From figure 3 we concluded that the difference in standard deviation is very small, while the mean is clearly increasing and decreasing with time, and thus we can say that it is not a stationary series. Also, the test statistic is well below the critical values; note that the signed values should be compared, not the absolute values. Hence, from the above test we concluded that we have to make the time series stationary. To do so, we need to model the trend and seasonality in the distribution and remove them from the series to obtain a stationary distribution. One of the methods to make a series stationary is to compute its moving average and check the result for stationarity.
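A minimal sketch of the rolling-statistics check described above, assuming a plain list of daily transaction totals (for the Dickey-Fuller test itself one would typically call statsmodels' adfuller, which is not shown here):

```python
def rolling_stats(series, window):
    """Rolling mean and rolling (population) standard deviation.

    If both stay roughly constant over time, the series is a candidate
    for stationarity; a drifting rolling mean suggests a trend.
    """
    means, stds = [], []
    for i in range(window - 1, len(series)):
        win = series[i - window + 1 : i + 1]
        m = sum(win) / window
        var = sum((v - m) ** 2 for v in win) / window
        means.append(m)
        stds.append(var ** 0.5)
    return means, stds

# A constant series has a flat rolling mean and zero rolling deviation.
means, stds = rolling_stats([5.0] * 10, window=3)
```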
In this method, we take the average of 'k' prior values, depending on the frequency of the time series. Here we take the average over the past one month, i.e. the last thirty values. After running the Dickey-Fuller test, we found the plot and distribution shown in figure 4.

After normalization of the RSS we obtained a normalized root squared sum of 0.30. We also ran the ARMA model on weekly data, but the results were not satisfactory either.

F. Recurrent Neural Network - Long Short-Term Memory (LSTM) Model

The long short-term memory network is an RNN that trains itself with the help of back-propagation. An LSTM contains memory blocks, rather than neurons, which are connected via layers. To set up our neural network experiment we used the transaction amount feature, because we are predicting the transaction amount. That is, given the amount of transactions (in units of thousands) on a day, what is the amount of transactions on the next day? We created three variables: the first contains the transaction amount of the day (X), the second contains the transaction amount of the next day (Y), and the third contains the transaction amount of the day after next (Z). Figure 6 contains the plot of the test with the LSTM.
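The three lagged variables (X, Y, Z) described above can be sketched as shifted copies of the daily amount series; an LSTM (e.g. a Keras model) would then be trained to map X to Y. This is an illustrative sketch with made-up amounts, not the authors' exact code:

```python
def make_lagged(amounts):
    """Build three lagged views of a daily amount series:
    X = amount on day t, Y = amount on day t+1, Z = amount on day t+2."""
    X = amounts[:-2]
    Y = amounts[1:-1]
    Z = amounts[2:]
    return X, Y, Z

# Example daily amounts (in thousands); each X entry is paired with its
# next-day value in Y and the day after that in Z.
X, Y, Z = make_lagged([110, 95, 130, 120, 140])
```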
H. Discussion

All the information about the features is discussed in section 3; we have also filled in the missing values in the data set. After running the experiments we found the following MSE and variance for each experiment.

TABLE III: Best Accuracy per Algorithm

Algorithm                               Mean Squared Error   Variance
Linear Regression Model                 0.05                  0.79
Ridge Linear Regression Model           0.03                  0.89
Lasso Model                             0.03                  0.90
RidgeCV Model                           0.03                  0.90
LassoLAR                                0.26                 -0.01
Bayesian Ridge Regression               0.03                  0.90
Recurrent Neural Network (LSTM Model)   0.028                 0.91
LSTM for Regression with Time Series    0.029                 0.91

Note that the MSE (mean squared error) indicates how far we are from the actual predictions, while the variance score concerns inference in the data; a good variance score is close to one. We can see from the table above that almost all of the regression techniques provided impressive results, but the best came from the Bayesian Ridge Regression and RidgeCV models, with a variance score of 0.90 and an MSE of 0.03. One reason for the better result may be that RidgeCV cross-validates on the data, so the model is trained and tested more effectively. The next experiment we conducted on our data is time series regression. In this experiment we kept only two features, the transaction amount and the date, and created a stationary series in order to identify trends and seasonality in the data. We tested the stationarity of the time series with the Dickey-Fuller test and found the series stationary at 95% confidence. After making the time series stationary we experimented with ARIMA (Auto-Regressive Integrated Moving Average). ARIMA forecasting for a stationary time series is essentially a linear equation (like a linear regression). The predictors depend on the parameters (p, d, q) of the ARIMA model: the number of AR (auto-regressive) terms (p), the number of MA (moving average) terms (q), and the number of differencing terms (d). The RSS (residual squared sum) indicates an error of 351, which is not acceptable. After normalizing the RSS we obtained a normalized root squared sum of 0.30; we therefore conclude that the ARIMA approach does not work well on our data. Figure 3 shows the stationarity of the time series obtained after running the experiments. The reasons for these results may be: too few features, asymmetric data, improperly selected samples, under-fitting, or over-fitting.

V. CONCLUSIONS

In this paper we have attempted to solve the ATM-out-of-cash problem. The data set used in this research contains ATM withdrawal transaction data from one of the largest banks of Pakistan. The proposed study implemented various algorithms such as linear regression, Ridge Linear Regression, the Lasso model, the RidgeCV model, Lasso LAR, Bayesian Ridge Regression, Random Forest Regression, time series prediction, RNN, RNN with time series, and ARIMA. In the final analysis, we found that linear regression still provides the optimal solution. We report 98% accuracy with that approach on this original data set.

ACKNOWLEDGEMENTS

We are thankful to Yameen M. Malik, Zaid M. Memon and Farrukh H. Syed for the discussions and a review of the manuscript.