Abstract—Competition in the business environment has been increasing with developing technology. There are various alternative service providers for customers, and those providers would like to retain their current customers. To reach that goal, churn prediction is a convenient method. However, predicting churning customers is not a trivial task. It is more demanding for some sectors like fast food, since there can be numerous reasons why a customer stops using a service. In this study, the challenging situation of customer churn in the fast-food industry is analyzed, and a fast-food chain's data is used. The data is formed sequentially according to customers' personal churn periods. Several recurrent neural network models, such as gated recurrent units and long short-term memory, are built on the sequential data to predict the churn stages of customers, and they are compared with standard classification methods. Apart from the recurrent neural network models, a hybrid model consisting of a convolutional network and long short-term memory is also applied.

Index Terms—churn analysis, sequential data, deep learning, recurrent neural network, convolutional neural network

I. INTRODUCTION

With developing technology, business competition has increased dramatically. Nowadays, there are many companies that offer similar services to customers. Therefore, when customers are not satisfied with a service or find more suitable products, they may turn to other options. For this reason, companies would like to retain their customers, and churn analysis has become more crucial than ever [1]. Even a modest increase in churn rate might reduce a company's ability to grow. The analysis has an essential role in business-driven data analytics [2]. Thanks to churn analysis, companies can predict which customers they will lose before losing them. As a result, it is possible to examine the causes of customer dissatisfaction and prevent the loss of these customers [3].

Churn analysis is critical to the fast-food industry, where competition is extremely high. The analysis is considerably difficult in this industry, as people's habits can change independently of the service they receive. Besides the traditional, rule-based methods, numerous advanced prediction models have been applied to customer churn analysis. Xi et al. build a recurrent neural network (RNN) using customer behavior as part of customer churn prediction; they also treat the problem as a regression problem [4]. Alternatively, Adwan et al. estimate churning customers via a multilayer perceptron (MLP) with backpropagation, using data from a telecommunications company [5]. Similarly, Ullah et al. work on telecom data and use the random forest (RF) method, then compare it with other classification algorithms [6], whereas Coussement and Van den Poel implement an SVM model to predict customer churn [7]. Wei and Chiu also use telecom data [8]. Using sequential data with long short-term memory (LSTM) is another popular approach to churn analysis [9]. Çelik and Osmanoğlu, on the other hand, develop customer churn models using data from different sectors and compare them [10]. As a different approach, Bose and Chen build a hybrid model in which unsupervised clustering and supervised models are combined [11]. Tama addresses the problem as a classification problem using data from the fast-food industry, calculating customer satisfaction and the probability of customer churn with decision tree and artificial neural network models [12]. Similarly, Bayrak and Bahadır work on fast-food chain data and apply LSTM to sequential data [13]–[15]. Miguéis et al., in contrast, create a sequence model using the Markov method and compare it with other models [16]. In a different domain, Günther et al. work on insurance data to build a churn model [17]. Burez and Van den Poel study class imbalance in customer churn prediction; their results indicate that under-sampling might lead to improved prediction accuracy [18].

In this study, we use a dataset that contains transactional data from a fast-food chain. The transaction data is formed as sequential data to be able to catch the patterns. Instead of a rule-based model, we use RNN methods such as gated recurrent units and LSTM.
Authorized licensed use limited to: Unitec Library. Downloaded on September 17,2022 at 05:57:48 UTC from IEEE Xplore. Restrictions apply.
TABLE I: PATTERN SAMPLES

Samples                                         Regular order frequency   Customer pattern
[9, 3, 7, 5, 6, 19, 12, 10, 15, 2, 5, 7, 3]     15                        110111001110
[111, 13, 9, 4, 0, 21, 1, 10, 6, 3, 19, 19]     21                        110111101111
[2, 2, 7, 4, 11, 49, 27, 16, 1, 141, 55, 30]    55                        111111111110
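The exact rule that maps gap sequences and the regular order frequency to binary customer patterns is defined in the paper's data-preparation section, which is not part of this excerpt. One plausible family of rules is a threshold on each gap relative to the customer's regular order frequency; the sketch below is illustrative only and is not guaranteed to reproduce the exact patterns in Table I:

```python
def churn_pattern(gaps, regular_freq, slack=1.0):
    """Mark each inter-order gap as regular (1) or irregular (0),
    depending on whether it stays within the customer's regular
    order frequency scaled by a slack factor. Illustrative rule
    only -- an assumption, not the paper's exact definition."""
    return "".join("1" if g <= slack * regular_freq else "0" for g in gaps)

print(churn_pattern([9, 3, 7, 5, 6, 19, 12], regular_freq=15))
# '1111101'
```

Under such a rule, runs of zeros in the pattern would correspond to periods in which the customer's ordering behavior deviates from their norm.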
Fig. 3: Sample phase structure
III. EXPERIMENTS

The experiments are executed in the Python programming language using the scikit-learn and Keras libraries. The number of rows for each label is 212453, and the labeled data set contains 637359 rows in total. 10% of the data is separated as the test set; after this split, the training set includes 573623 rows. The parameters used in the models can be viewed in Table IV. For the non-sequential models, 10-fold cross-validation is executed during training, and grid search is applied for parameter tuning. The layers of the CNN-LSTM model are listed in Table III.

Fig. 4: Phases and sequences
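Based on the layer details in Table III, the CNN-LSTM model can be sketched in Keras as follows. The input shape (sequence length and features per step) and the loss function are assumptions, since they are not specified in this excerpt; the optimizer listed in the table is applied at compile time:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, LSTM, Dense

SEQ_LEN = 12      # assumed sequence length (matches the pattern width in Table I)
N_FEATURES = 1    # assumed one value per time step

model = Sequential([
    # Conv1D: filters=32, kernel_size=3, activation='relu' (Table III)
    Conv1D(filters=32, kernel_size=3, activation="relu",
           input_shape=(SEQ_LEN, N_FEATURES)),
    # MaxPooling1D: pool_size=2
    MaxPooling1D(pool_size=2),
    # LSTM: 100 hidden nodes, dropout=0.1, sigmoid activation
    LSTM(100, dropout=0.1, activation="sigmoid"),
    # Dense: 3 units for the three churn-stage labels
    Dense(3, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",  # assumed loss
              metrics=["accuracy"])
```

The convolutional and pooling layers extract local patterns from the sequence before the LSTM models the longer-range dependencies, following the hybrid design of [20].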
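The grid search with 10-fold cross-validation used for the non-sequential models can be sketched with scikit-learn as below. The data and the parameter grid are hypothetical stand-ins; the paper's chosen values appear in Table IV (e.g. max_depth=5 for the decision tree):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in for the labeled churn data (three churn-stage labels).
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=3, random_state=10)

# Hypothetical grid around the value reported in Table IV.
grid = GridSearchCV(DecisionTreeClassifier(random_state=10),
                    param_grid={"max_depth": [3, 5, 7]},
                    cv=10, scoring="f1_macro")
grid.fit(X, y)
print(grid.best_params_)
```

The same pattern applies to the other non-sequential models (KNN, RF, LR, MLP, SVM) by swapping the estimator and the parameter grid.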
TABLE III: CNN-LSTM LAYER DETAILS

Layers         Parameters
Conv1D         filters=32, kernel_size=3, activation='relu'
MaxPooling1D   pool_size=2
LSTM           hidden nodes=100, dropout=0.1, activation='sigmoid', optimizer='adam'
Dense          units=3, activation='sigmoid'

TABLE IV: MODEL PARAMETERS

Model   Parameters
DT      max_depth=5
KNN     algorithm='auto', metric='minkowski', n_neighbors=17, p=2, weights='distance'
RF      bootstrap=False, max_depth=70, max_features=10, min_samples_leaf=5, min_samples_split=8, n_estimators=100, random_state=10
LR      C=0.001, penalty='l2'
MLP     alpha=0.05, hidden_layer_sizes=150, dropout=0.1, activation='relu', solver='adam', max_iter=100
SVM     C=10, gamma=0.01, kernel='rbf', random_state=10
GRU     hidden nodes=250, dropout=0.1, recurrent_dropout=0.1, activation='sigmoid', optimizer='adam'
LSTM    hidden nodes=250, dropout=0.2, recurrent_dropout=0.2, activation='relu', optimizer='adam'

The results of the models can be seen in Table V. They show that the RNN models are more successful than the traditional, non-sequential models. Among the sequential models, the hybrid CNN-LSTM model is the most successful. Since a relatively large data set is used for training, the more advanced sequential models outperform the non-sequential ones.

TABLE V: MODEL RESULTS

Method      Precision (%)   Recall (%)   F-1 Score (%)
DT          63.21           61.15        62.16
RF          70.03           68.19        69.09
KNN         65.72           62.64        64.14
SVC         70.89           71.64        71.26
LR          67.38           68.24        67.80
MLP         72.31           73.14        72.72
GRU         75.09           75.24        75.16
LSTM        77.05           77.12        77.08
BI-LSTM     77.68           77.24        77.45
CNN-LSTM    78.02           78.06        78.03

IV. CONCLUSION

In this study, we use a fast-food chain's transaction data to predict churning customers. We label the data whenever a churn occurs so that we do not miss any potential label. We treat the task as a classification problem with three labels representing the stages of being churned. We build RNN models using sequential data and compare them with traditional classification methods. As mentioned in the Experiments section, the RNN models are more successful than the traditional, non-sequential methods since they can detect the relations in sequential data better. Also, among the sequential models, CNN-LSTM is the most successful one. The pipeline in this study is straightforward and can be applied to other sequential data in different sectors by defining the churn rules.

As a future study, this work can also be used as an input for a recommender system by detecting customers' products before or after a churn period. Apart from that, permitted customer demographic data can also be added to the feature set.

REFERENCES

[1] A. Sharma and D. P. K. Panigrahi, "A neural network based approach for predicting customer churn in cellular network services," International Journal of Computer Applications, vol. 27, no. 11, pp. 26–31, August 2011.
[2] S. Nalchigar and E. Yu, "Business-driven data analytics: A conceptual modeling framework," Data & Knowledge Engineering, vol. 117, pp. 359–372, 2018.
[3] M. Farquad, V. Ravi, and S. B. Raju, "Churn prediction using comprehensible support vector machine: An analytical CRM application," Applied Soft Computing, vol. 19, pp. 31–40, 2014.
[4] M. Xi, Z. Luo, N. Wang, and J. Yin, "A latent feelings-aware RNN model for user churn prediction with behavioral data," ArXiv, vol. abs/1911.02224, 2019.
[5] O. Adwan, H. Faris, K. Jaradat, O. Harfoushi, N. Ghatasheh, and K. Abdullah, "Predicting customer churn in telecom industry using multilayer perceptron neural networks: Modeling and analysis," 2014.
[6] I. Ullah, B. Raza, A. K. Malik, M. Imran, S. U. Islam, and S. W. Kim, "A churn prediction model using random forest: Analysis of machine learning techniques for churn prediction and factor identification in telecom sector," IEEE Access, vol. 7, pp. 60134–60149, 2019.
[7] K. Coussement and D. Van den Poel, "Churn prediction in subscription services: An application of support vector machines while comparing two parameter-selection techniques," Expert Systems with Applications, vol. 34, no. 1, pp. 313–327, 2008.
[8] C.-P. Wei and I.-T. Chiu, "Turning telecommunications call details to churn prediction: a data mining approach," Expert Systems with Applications, vol. 23, no. 2, pp. 103–112, 2002.
[9] C. Mena, A. D. Caigny, K. Coussement, K. W. D. Bock, and S. Lessmann, "Churn prediction with sequential data and deep neural networks: A comparative analysis," ArXiv, vol. abs/1909.11114, 2019.
[10] Ö. Çelik and U. Osmanoğlu, "Comparing to techniques used in customer churn analysis," Journal of Multidisciplinary Developments, vol. 4, no. 1, pp. 30–38, 2019.
[11] I. Bose and X. Chen, "Hybrid models using unsupervised clustering for prediction of customer churn," Journal of Organizational Computing and Electronic Commerce, vol. 19, no. 2, pp. 133–151, 2009.
[12] B. A. Tama, "Data mining for predicting customer satisfaction in fast-food," 2015.
[13] A. T. Bayrak, A. A. Aktaş, O. Susuz, and O. Tunalı, "Churn prediction with sequential data using long short term memory," in 2020 4th International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT), 2020, pp. 1–4.
[14] A. T. Bayrak, A. A. Aktaş, O. Tunalı, O. Susuz, and N. Abbak, "Personalized customer churn analysis with long short-term memory," in 2021 IEEE International Conference on Big Data and Smart Computing (BigComp), 2021, pp. 79–82.
[15] M. B. Bahadır, A. T. Bayrak, G. Yücetürk, and P. Ergun, "A comparative study for employee churn prediction," in 2021 29th Signal Processing and Communications Applications Conference (SIU), 2021, pp. 1–4.
[16] V. L. Miguéis, D. Van den Poel, A. Camanho, and J. F. e Cunha, "Predicting partial customer churn using Markov for discrimination for modeling first purchase sequences," Working Papers of Faculty of Economics and Business Administration, Ghent University, Belgium 12/806, Aug. 2012.
[17] C.-C. Günther, I. F. Tvete, K. Aas, G. I. Sandnes, and Ø. Borgan, "Modelling and predicting customer churn from an insurance company," Scandinavian Actuarial Journal, vol. 2014, no. 1, pp. 58–71, 2014.
[18] J. Burez and D. Van den Poel, "Handling class imbalance in customer churn prediction," Expert Systems with Applications, vol. 36, no. 3, Part 1, pp. 4626–4636, 2009.
[19] S. B. Kotsiantis, I. D. Zaharakis, and P. E. Pintelas, "Machine learning: a review of classification and combining techniques," Artificial Intelligence Review, vol. 26, no. 3, pp. 159–190, Nov 2006.
[20] T. N. Sainath, O. Vinyals, A. Senior, and H. Sak, "Convolutional, long short-term memory, fully connected deep neural networks," in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015, pp. 4580–4584.