Research Paper

JID: EOR
ARTICLE IN PRESS [m5G;March 20, 2019;13:13]

European Journal of Operational Research xxx (xxxx) xxx

journal homepage: www.elsevier.com/locate/ejor
Merging anomalous data usage in wireless mobile telecommunications: Business analytics with a strategy-focused data-driven approach for sustainability
Yi-Ting Chen ᵃ,ᶜ, Edward W. Sun ᵇ,∗, Yi-Bing Lin ᵃ
ᵃ College of Computer Science, National Chiao Tung University (NCTU), Hsinchu, Taiwan
ᵇ KEDGE Business School, France
ᶜ School of Business Informatics and Mathematics, University of Mannheim, Germany
Article info

Article history: Received 20 June 2017; Accepted 23 February 2019; Available online xxx

Keywords: Analytics; Data mining; Decision support systems; Artificial intelligence; OR in telecommunications; Validation of OR Computations

Abstract

Mobile internet usage has exploded with the mass popularity of smartphones that offer more convenient and efficient ways of doing anything from watching movies and playing games to streaming music. Understanding the patterns of data usage is thus essential for strategy-focused data-driven business analytics. However, data usage has several unique stylized facts (such as high dimensionality, heteroscedasticity, and sparsity) due to a great variety of user behaviour. To manage these facts, we propose a novel density-based subspace clustering approach (i.e., a three-stage iterative optimization procedure) for intelligent segmentation of consumer data usage/demand. We discuss the characteristics of the proposed method and illustrate its performance in both simulation with synthetic data and business analytics with real data. In a field experiment of wireless mobile telecommunications for data-driven strategic design and managerial implementation, we show that our method is adequate for business analytics and plausible for sustainability in search of business value.

© 2019 Published by Elsevier B.V.
1. Introduction

4G LTE, the fourth-generation Long-Term Evolution (LTE) mobile telecommunications technology, standardizes high-speed wireless communications for mobile phones and data terminals, and has spread globally. According to a 2016 study by the Global mobile Suppliers Association (GSA), 503 operators have commercially launched LTE networks in 167 countries.¹ Ericsson (2012) forecast that global mobile data traffic would increase 12-fold and that the number of mobile subscriptions would reach 9.3 billion by the end of 2018. Such a dramatic and rapid expansion of wireless demand calls for sustainable spectrum management that systematizes the managerial integration of the radio-frequency spectrum and the telecommunications infrastructure (e.g., transceiver architectures, multi-channel interference, transmission, and core networks) for efficient utilization.

On the other hand, strong usage anomalies have been observed in relation to demographic clusters, educational clusters, geographic clusters, and professional clusters. For cellular communications, Jain, Muller, and Vilcassim (1999) show that cellular service usage levels differ by customer segments (i.e., business/professional and personal). However, such segments have not been substantiated in mobile data usage. Amdocs’ 2015 State of the RAN (Radio Access Network) report found that 10 percent of mobile users consumed 80 percent of the world’s mobile data traffic and that power users often used as much as 10 times more data than the average mobile subscriber, based on 25 million voice and data connections (all with lots of smartphone usage) in major cities around the world.² Cisco’s Visual Networking Index of February 2017, which focuses on overall global carrier trends, reported that as of September 2016 the top 20 percent of mobile users generated 56 percent of mobile data traffic and the top 5 percent of users consumed 25 percent of mobile data traffic.³ Both business professionals and private customers may demand high data usage owing to individual preferences for data-intensive activities, such as videos, games, virtual reality, and augmented reality. Traditional demographic variables, education background, and career
∗ Corresponding author.
E-mail addresses: yitchen@mail.uni-mannheim.de (Y.-T. Chen), edward.sun@kedgebs.com (E.W. Sun), linyb@nctu.edu.tw (Y.-B. Lin).
¹ http://gsacom.com/press-release/4glte-networks-passes-500-milestone-says-gsa/
² See http://solutions.amdocs.com/SOTR2015.html?LeadSource=OnlineBanner&Publication=AmdocsPR.
³ See https://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/mobile-white-paper-c11-520862.html.
https://doi.org/10.1016/j.ejor.2019.02.046
0377-2217/© 2019 Published by Elsevier B.V.
Please cite this article as: Y.-T. Chen, E.W. Sun and Y.-B. Lin, Merging anomalous data usage in wireless mobile telecommunications: Business analytics with strategy-focused data-driven approach for sustainability, European Journal of Operational Research, https://doi.org/10.1016/j.ejor.2019.02.046
experience in conventional customer segmentation are no longer explicitly efficient. Therefore, a data-driven approach in business analytics for sustainable data usage is greatly desired.

Sustainable operations refer to an enterprise’s pursuit of sustainability in its corresponding systems while taking into account the economic, environmental, and social implications. In general, the goal of all businesses is to act economically, but the social aspects of sustainable operations should not be neglected (Sodhi, 2015). Jaehn (2016) highlights the “triple bottom line”, i.e., the 3Ps (profit, people and planet) ecosystem coined by Elkington (1997), of sustainable operations on “quantitative aspects of business administration, which in addition to economic objectives aims equally at sustainability in the environmental and/or social sense”. Tang and Zhou (2012) illustrate that a 3Ps ecosystem comprises five core elements and various flows.

Enterprises can improve their economic sustainability (such as values and vision) by designing and producing products in an environmentally and socially responsible way. Ulhøi (1995) stresses the requirement of corporate sustainable development (CSD), which remoulds consumer preferences (wants) and steers consumption patterns towards environmentally benign activities by reducing throughput per unit of final products/services. Jenkin, Webster, and McShane (2011) discuss green information technologies and systems that involve initiatives and programs directly or indirectly contributing to environmental sustainability. Tang and Zhou (2012) use the term “cultivate future consumers” to stress economic sustainability in creating a new market for new values (i.e., re-manufacture/recycle, eco-friendly, ethical, socially responsible, etc.).

With dramatically increasing data traffic, mobile telecommunications operators could profit sustainably from charging for consumable services. A profitable mobile tariff (i.e., billing plan) based on data/service demands is therefore vital (see Lin, Lin, Wu, & Wang, 2016). Besides generating ultimate profit, the operator should guarantee the Quality of Service (QoS) for all services provided in order to maintain and attract subscribers and reduce customer churn. Satisfactory service provision requires operators to enable (1) higher download and upload data rates, (2) lower packet latencies, and (3) support for new multimedia services. Business analytics is an interdisciplinary context of transforming data into insight for making better decisions, accomplishing business goals, and creating value (see Mortenson, Doherty, and Robinson (2015), Royston (2013), and Vidgen, Shaw, and Grant (2017), for example). In a comprehensive review of sustainable operations, Jaehn (2016) summarizes fields of sustainable operations and classes of systems considered⁴ and discusses a number of studies in many sectors. Kunc and O’Brien (2018) address how organizations could include business analytics in their strategy processes and consider the potential role of business analytics in strategic decision support. We would like to add our study on business analytics of sustainable operations for mobile telecommunications services (i.e., Information and Communications Technology, ICT) to the OR community. We hereby present our business analytics by proposing a density-based subspace clustering method for merging anomalous data usage to reach the desirable business value.

⁴ Table 1 in Jaehn (2016), p. 258.

Behaviour-based pricing is widely adopted by mobile operators in designing their tariffs: consumers are charged different prices depending on their consumption patterns (i.e., data usage) (see Lin et al., 2016; Siebert, 2015, for example). An increasing number of studies show that consumers do not always make optimal decisions (Kalayci, 2015), particularly when price obfuscation exists. In practice, we generally observe two cases: (1) consumers pay for a data plan that is above their actual demand (i.e., the specified plan is overestimated), and (2) consumers pay for a data plan that is below their actual demand (i.e., the specified plan is underestimated). Case (1) increases resource wastage, while in case (2) consumers are required to pay an add-on, which increases future mobile operator reserves. Both cases are unsustainable, as waste and costs increase. Operators therefore need to employ precautionary measures against misestimating data usage/demand.

The reasons for considering a sustainable data plan in mobile services are as follows. First, a sustainable data plan is key to behaviour-based pricing, since all profitable tariffs/rates are based on data usage. Second, sustainable plans could remould consumer preferences (wants) and steer consumption patterns towards environmentally benign activities. Third, sustainable plans could eliminate price obfuscation and engender efficiency. Chioveanu and Zhou (2013) point out that the “strategic choice of price presentation formats, or simply, price framing, has received relatively little attention in the economics literature in spite of its prevalence”. Fourth, sustainable data plans require sophisticated segmentation in dealing with consumer heterogeneity that interacts with consumer satisfaction. Therefore, sustainable data plans will help operators reach their profitability goals with efficient price framing and steer consumer behaviour towards environmentally benign activities in achieving sustainability.

In search of a sustainable data plan, it is essential to find an intelligent segmentation method that captures (or characterizes) consumer heterogeneity of data usage/demand. Segmentation refers to distinct consumer patterns evoked by contextual differences (Schlager & Maas, 2013). Elaborating the requirements of sustainable data plans requires carefully investigating consumer heterogeneity. In this paper, we propose an intelligent segmentation based on a novel density-based subspace clustering method, i.e., a three-stage optimization procedure for data usage that exhibits (1) heterogeneity (i.e., heteroscedasticity) among clusters, and (2) intensive perturbation (i.e., anomalies) and sparsity within clusters. In our work, a penalized likelihood estimation of parameters is undertaken with a modified EM algorithm for the Multivariate Gaussian Mixture (MGM) model. Our approach can simultaneously decide the number of components to be mixed for the underlying Gaussian mixture model, the mixing weights, and the parameters of the Gaussian distribution components. Our method makes two contributions. First, we apply a penalized likelihood method to identify the MGM model that combines finite multivariate Gaussian distributions for the modified EM algorithm. Second, we propose an entropy-based initialization algorithm for the modified EM algorithm. Therefore, all three stages of the modified EM algorithm (i.e., initialization, expectation, and maximization) are heuristically optimal. Our method enhances the capabilities of the classic EM algorithm framework for mixture models and clustering analysis through three characteristics: (1) robustness to perturbation and sparsity, (2) fast convergence to the global optimum, and (3) computational efficiency in heuristic searching.

We then provide a simulation study based on three conventional data sets to verify the accuracy of the proposed method when the ground truth is known, and apply the proposed method to real mobile usage data (which exhibit strong heterogeneity) to transform data into insight for business intelligence (e.g., sustainability and value creation). Our framework of business analytics for wireless mobile telecommunications can be summarized as follows:

$$\text{Data} \xrightarrow[\text{Mobile Data Usage}]{\text{Features}} \text{Descriptive} \xrightarrow[\text{Method Verification}]{\text{MGM Clustering}} \text{Predictive} \xrightarrow[\text{Machine Learning \& Simulation}]{\text{Strategic Options}} \text{Prescriptive}.$$

We organize this article as follows. Section 2 introduces the industry background and relevant literature focusing on current
challenges for mobile telecommunications in the ICT industry with respect to sustainability and our proposed method, i.e., intelligent segmentation that fosters the sustainability of behaviour-based pricing. In Section 3, we describe the proposed clustering methodology for intelligent segmentation of data. In Section 4, we present a simulation study that verifies the accuracy of the proposed clustering method for synthetic data with different characteristics. In Section 5, we perform business analytics with primary data from a large telecom operator in Taiwan. We show that our proposed method indeed meets the goal of business analytics in search of business value. We summarize and highlight future work in Section 6.

2. Industry background and related studies

2.1. ICT and sustainable development

Information and Communications Technology (ICT) enables users to access, store, transmit, and utilize information by unifying sophisticated systems for telecommunications. Davenport and Prusak (1997) proposed a human-centred approach based on “Information Ecology” to design and manage information environments encompassing: (1) information strategy, (2) information politics, behaviour, and culture, (3) information staff and management processes, and (4) information architecture. Eryomin (1998) points out that the hybrid “information ecology” science could shape our lives in terms of the social and economic implications of computer and communication technologies. For example, Germany initiated an integration of cyber-physical systems, the Internet of Things, and cloud computing named “Industry 4.0” for intelligent manufacturing in the fourth industrial revolution (see Schwab, 2016).

Digital ecosystems deriving from ICT have been a major contributor to the evolution and growth of the global economy. The Ericsson (2013)⁵ report illustrates that (1) GDP increases by approximately 1% for every 10% increase in the broadband penetration rate, and 80 new jobs are created for every 1000 new broadband users; (2) doubling the broadband speed increases GDP by 0.3%; (3) broadband access increases household income by USD 2100 per year in OECD countries with 4 megabits per second broadband and by USD 800 per year in BIC countries with 0.5 megabits per second broadband; and (4) upgrading broadband speed from 0.5 megabits per second to 4 megabits per second increases household income by around USD 322 per month in OECD countries and by USD 46 per month in BIC countries. UN Secretary-General Ban Ki-moon stated, “ICTs can be an engine for achieving the Sustainable Development Goals. They can power this global undertaking”.⁶

⁵ Socioeconomic Effects of Broadband Speed, Ericsson, 2013.
⁶ “Countries adopt plan to use Internet in implementation of Sustainable Development Goals”. Retrieved 15 March 2016, from Sustainable Development Goals: 17 Goals to Transform our World. United Nations, 16 December, 2015.

The mobile industry is the first sector to commit as a whole to the United Nations Sustainable Development Goals (SDGs). In September 2008, the Global System for Mobile Communications Association (GSMA)⁷ launched the Green Power for Mobile (GPM) programme to guide the mobile industry to contribute systematically to environmental issues. With the publication of the 2017 Mobile Industry Impact Report, GSMA highlights its increasing impact on sustainable development.⁸ Frisiani, Jubas, Lajous, and Nattermann (2017) point out that mobile operators can increase their sales to existing customers by making timely, appealing offers on services or hardware after analysing the customer data.

⁷ www.gsma.com
⁸ https://www.gsma.com/betterfuture/2017sdgimpactreport

Value creation for business through ICT has been considered a primary way of contributing to society (see Lee, 2016). Several studies have examined how ICT can change society and how policies about ICT can influence society and business. For example, both Green of IT and Green by IT significantly influence national carbon emission policies and international agreements (see Jenkin et al., 2011). Corporate engagement with communal societies on ICT provides a constitution of the technology-enabled confines of corporate strategy and community influence (see Fjeldstad, Snow, Miles, & Lettl, 2012). Racherla and Mandviwalla (2013) integrate a multilevel sociotechnical framework involving both micro and macro factors that are connected to the ICT infrastructure, universal access, and socioeconomics. Uratnik (2016) explains the interaction between corporations leveraging sustainable innovation and virtual communities in the co-production and co-creation of value. Based on a sample of 139 countries, Gouvea, Kapelianis, and Kassicieh (2018) show that ICT has a significantly positive effect on environmental sustainability.

2.2. Industrial challenges

The profitability of ICT service providers calls for continuous investment in their infrastructure and improvement of service quality (including matching consumption propensity, i.e., affordability). Installing, upgrading, and maintaining their infrastructure requires ICT service providers to pay more attention to sustainability issues, not only in terms of the environmental impact, but also in order to efficiently reallocate scarce resources by avoiding information congestion (Anderson & De Palma, 2009). Service charge tariffs or billing policies consequently aid ICT operators in these tasks. On the other hand, affordability remains a major constraint (see Crémer, Rey, & Tirole, 2000). Other tasks challenge ICT service providers when determining the level of investment and expenditure for customer acquisition and retention in competitive and dynamic markets with respect to customer affordability. Min, Zhang, Kim, and Strivastava (2016) empirically show that in wireless telecommunications markets, a firm’s acquisition cost per customer is more sensitive to market position and competition than its retention cost per customer. In addressing this issue, an accurate grouping/segmentation of user data demand for an appropriate billing strategy is critical.

In analysing data demand, and particularly the heterogeneity of data usage, we face several difficulties. Consumers generally have a degree of uncertainty about their future usage patterns. This usage uncertainty derives from hidden behaviour, in the sense that usage cannot be anticipated. As online product content changes rapidly over time, consumers are unable to recognize the effects of their current usage on their future usage. Data demand may suddenly be driven by certain unpredictable social media phenomena (e.g., a surge in popularity), and users may consequently change their behaviour as well as their data consumption without prior indications. For example, the Pokémon Go mania (Pokémon Go is a location-based augmented reality mobile game), or simply Pokémania,⁹ had about 21 million daily active users¹⁰ with an average playing time of 33 minutes¹¹ when the game was released in July 2016. Verizon Communications Inc., the largest wireless telecommunications provider in the US, reported that at around 10 megabytes of data usage per hour it takes around 7 hours a day for 30 days straight to consume 2 gigabytes of data.¹² Other activities caused by the surge in popularity, such as watching online high-definition videos (that usually consume as much as 350 megabits data usage per hour), increase data demand without any prior indications. In addition, what we call hidden information implies that we cannot obtain sufficient information in our analysis due to (1) consumers’ considerable lack of ICT literacy and corresponding willingness to pay;¹³ (2) the fact that demographic variables, education background, and career experience used in traditional consumer clustering analysis no longer indicate the heterogeneity of data usage; and (3) the regulator’s laws/rules for computer-processed personal data protection. This leads to an urgent need to find an appropriate method to analyse (e.g., characterize and forecast) consumer data usage/demand.

⁹ See “Times Reporter Descends Into Pokémania”. The New York Times. Retrieved July 19, 2016.
¹⁰ https://www.surveymonkey.com/business/intelligence/pokemon-go-biggest-mobile-game-ever/.
¹¹ https://sensortower.com/blog/pokemon-go-usage-data.
¹² http://www.businessfinancenews.com/29596-will-verizon-communications-inc-lose-pokmon-go-players-amid-data-consumptio/.
¹³ “Customers typically do not know the marginal price until after they decide to consume” (Bushnell & Mansur, 2005).

Another challenge in analysing data usage is overuse (see Suissa, 2015; Yan, 2015), which the World Health Organization defines as a behavioural addiction/dependence syndrome irrespective of age, gender, ethnicity, career, or economic status. Such behavioural addictions clearly lead to abnormally high data demand. In some surveys, over 30% of users perceived themselves as addicted, and these users consume nearly twice the data compared to non-addicted users (see Billieux, 2012). Some addicted users report data usage beyond three standard deviations from the upper hinge (see Tossell, Kortum, Shepard, Rahmati, & Zhong, 2015).

2.3. Our method and related studies

Commercial mobile telecommunications operators typically employ billing plans that classify customers into clusters of different amounts of mobile data consumption. In this paper, we propose a novel density-based method of demand segmentation to appropriately classify the mobile data usage clusters. In machine learning, our clustering task problem (see Santi, Aloise, & Blanchard, 2016) stems from learning a finitely valued function (i.e., the classifier) on a compact metric space. Without any prior knowledge of the data structure, clustering aims to reveal the natural data structure using similarity measures. Hence, objects that are similar should be placed in the same cluster while objects that are not should be placed in different clusters (see Meyer & Olteanu, 2013). We face the same clustering problem as Santi et al. (2016) such that all available dissimilarity matrices are used to deal with heterogeneity.

Numerous studies use the axiomatic fuzzy set (AFS) clustering methodology (see Bagirov & Yearwood, 2006; Xie, Gao, Xie, Liu, & Grant, 2016; Xu, Liu, & Chen, 2009). Kim, Lee, Lee, and Lee (2005) conduct a kernel-based classification with four clustering algorithms (i.e., K-means, Fuzzy C-means, average linkage, and mountain algorithm) and evaluate their performance on various datasets. Their results indicate that each kernel clustering algorithm performs better than its conventional counterpart. Carrizosa, Mladenović, and Todosijević (2014) propose a variable neighbourhood search method for clustering networks. Bai, Dhavale, and Sarkis (2016) propose a hybrid methodology of fuzzy C-means for decision modelling in green supply chains. However, data usage does not conform to the stereotypical data investigated and entails two stylized facts: (1) features are entangled due to the heterogeneity of many dissimilarity matrices (possibly driven by hidden behaviours and hidden information), and (2) many anomalies exist due to overuse. Density-based clustering therefore constitutes an appropriate clustering method to deal with these stylized facts (see Sander, Ester, Kriegel, & Xu, 1998).

The density-based clustering method separates the feature space into high-density and low-density regions (with nonlinear separating hyper-surfaces). The connected dense regions in the feature space are defined as clusters. Such connectedness (equivalent to similarity) can be characterized by different algorithms. Jain (2010) summarizes several methods that define connectedness in the feature space, such as (1) the Jarvis–Patrick algorithm, which defines the connectedness between a pair of points as the number of common neighbours they share, where neighbours are the points in the region of a predetermined radius around the point; (2) the DBSCAN algorithm, which applies the Parzen window method in search of connected dense regions; and (3) spectral (or graph theoretic) clustering, which represents the data points as nodes and weighted pairwise similarity as edges that connect the nodes. These methods rely on (1) the neighbourhood size measured by distance, and (2) the minimum number of points that a cluster can include in its neighbourhood. When heteroscedasticity¹⁴ exists among clusters, altering these two parameters leads to inadmissible separation in the feature space, because only a single density threshold is used (see Campello, Moulavi, & Sander, 2013). As a result, probabilistic mixture models have been introduced.

¹⁴ Heteroscedasticity (or heteroskedasticity) here refers to the circumstance in which the variability (or inconsistency) of clusters is unequal across the range of values of a feature that describes it, namely, clusters with different variability quantified by the variance or any other measure of statistical dispersion.

The probabilistic mixture model approach assumes the existence of an underlying data generator driven by a mixture distribution function such that a cluster is defined by one or more mixture components. The EM algorithm is therefore an ideal choice to infer the parameters of these mixture models (see Lin, 2009; Ray & Ren, 2012; Yao, 2010). In addition, the Bayesian approach has been proposed to improve performance when employing mixture models and shows its superiority in terms of data analysis (see Filippone, Camastra, Masulli, & Rovetta, 2008; Jain, 2010). Alternatively, hidden Markov models (HMMs) can be applied by first mapping each data sequence into an HMM, then defining a suitable distance among HMMs, and finally clustering the HMMs based on the distance matrix. As HMMs make explicit use of the distance, they are remarkably geared toward continuous-valued time series (possibly multi-dimensional); see Dias, Vermunt, and Ramos (2015) and references therein. Although HMMs allow variant structures to be modelled directly, algorithms for HMMs do not estimate the number of hidden states. In order to train HMMs, a (large) set of seed sequences is generally required. When the given seed sequence is long, there are many possible HMMs for it, and choosing one can be difficult. On the other hand, HMMs are not particularly meaningful for short data sequences, because even a small number of hidden states is likely to require estimating more parameters than there are observations; see Khreich, Granger, Miri, and Sabourin (2012) and references therein.

In the next section, we propose a three-stage optimization procedure for a density-based subspace clustering analysis in line with the classic expectation-maximization (EM) algorithm (see Bishop, 2006). In our work, a penalized likelihood estimation of parameters is carried out to modify the EM algorithm for the Multivariate Gaussian Mixture (MGM) model inference. Our approach can simultaneously decide the number of components to be mixed for the underlying Gaussian mixture model, the mixing weights, and the parameters of the Gaussian distribution components. Our framework consists of three main procedures, i.e., initialization, expectation, and maximization, as in the classic EM algorithm framework. In contrast to other studies (see Nguyen, Wu, & Zhang, 2014; Yang, Lai, & Lin, 2012), the main contributions of our work are two-fold. First, we consider the entropy of distances between data in the initialization stage rather than using K-means or K-means++. Our experimental results show that this initialization is insensitive to extreme values in the dataset. Second, we employ a decision function driven by efficiency criteria (i.e., AIC, BIC and HQC) that simultaneously consider the likelihood and the penalty.
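As an illustration of how criteria such as AIC, BIC and HQC weigh likelihood against a complexity penalty, the following minimal Python sketch evaluates their textbook forms for a full-covariance Gaussian mixture. It is not the decision function used in this paper: the function names, the parameter count for a full-covariance mixture, and the example log-likelihood values are all assumptions made for illustration.

```python
import math


def gmm_free_parameters(n_components, n_features):
    """Free parameters of a full-covariance Gaussian mixture (assumed model)."""
    weights = n_components - 1                      # weights sum to one
    means = n_components * n_features
    covariances = n_components * n_features * (n_features + 1) // 2
    return weights + means + covariances


def information_criteria(log_likelihood, n_params, n_samples):
    """Textbook AIC, BIC and HQC; smaller values are preferred.

    HQC uses ln(ln(N)), so it needs n_samples > 1.
    """
    fit = -2.0 * log_likelihood
    return {
        "aic": fit + 2.0 * n_params,
        "bic": fit + n_params * math.log(n_samples),
        "hqc": fit + 2.0 * n_params * math.log(math.log(n_samples)),
    }


def select_n_components(log_likelihoods, n_features, n_samples, criterion="bic"):
    """Return the K whose fitted log-likelihood minimises the chosen criterion."""
    scores = {
        k: information_criteria(
            ll, gmm_free_parameters(k, n_features), n_samples
        )[criterion]
        for k, ll in log_likelihoods.items()
    }
    return min(scores, key=scores.get)


# Hypothetical maximised log-likelihoods for K = 1, 2, 3 (illustrative numbers only).
fits = {1: -250.0, 2: -180.0, 3: -178.0}
best_k = select_n_components(fits, n_features=2, n_samples=100)
```

In this toy input the gain in log-likelihood from K = 2 to K = 3 is too small to offset the six additional parameters, so the penalised criteria prefer K = 2.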
3. Multivariate Gaussian mixture model

This section reports the three stages of the multivariate Gaussian mixture model and shows that all are heuristically optimal.

3.1. Initialization: the first stage optimization

For $1 \le m \le M$ and $1 \le n \le N$, let $\mathcal{X} = \{x_1, \cdots, x_n, \cdots, x_N\}$ be a set of measured data usages from $N$ users, where $x_n = [x_{n,1}, \cdots, x_{n,M}]$ is a vector and $x_{n,m}$ is the data usage of the $n$th user quantified by the $m$th measure¹⁵. We assume that the measured data usages follow a $K$-component multivariate Gaussian mixture model:

$$f(x_n) = \sum_{k=1}^{K} w_k f_k(x_n) = \sum_{k=1}^{K} w_k \left[ \frac{\exp\left( -\frac{1}{2} (x_n - \mu_k)^T C_k^{-1} (x_n - \mu_k) \right)}{\sqrt{(2\pi)^M |C_k|}} \right], \tag{1}$$

where $w_k$ and $f_k(x_n)$ denote the weight and probability density of the $k$th Gaussian distribution. Let $\mu_k$ and $C_k$ be the mean vector and the covariance matrix of the $k$th Gaussian component, where $1 \le k \le K$. Given $\mathcal{X}$, the log-likelihood $L(\mathcal{X})$ for the $K$-component multivariate Gaussian mixture model can be expressed as:

$$L(\mathcal{X}) = \sum_{n=1}^{N} \ln \sum_{k=1}^{K} w_k f_k(x_n) = \sum_{n=1}^{N} \sum_{k=1}^{K} \left[ \ln w_k - \frac{M}{2} \ln(2\pi) - \frac{1}{2} \ln |C_k| - \frac{(x_n - \mu_k)^T C_k^{-1} (x_n - \mu_k)}{2} \right], \tag{2}$$

then the Expectation Maximization (EM) algorithm can be derived for the above model-based clustering (elaborated in Section 3.2).

The EM algorithm is easy to implement and numerically stable under certain conditions that suffice for reliable global convergence. However, when multiple maxima exist, it may converge slowly, without guaranteed convergence to the global maximum. In addition, the EM algorithm is sensitive to the initial values of $w_k$, $\mu_k$ and $C_k$ that start the search on the likelihood surface (Bishop, 2006). The main approach to initializing the EM algorithm is K-means, where each cluster is represented by its centre of gravity, but K-means is likely to be dominated by extreme values in the dataset and can therefore lead to a singularity problem in the EM algorithm. To resolve this issue, we propose a stochastic

other data in the dataset into $B$ disjoint bins, and we therefore have a sequence of distance intervals $a_0, a_1, \ldots, a_B$. For every $x_n$, we construct a distance vector $d_n = [d_{n,1}, \cdots, d_{n,N}]$, and then partition these distances into a histogram $p_n = [p_{n,1}, \cdots, p_{n,B}]$, where $0 \le p_{n,b} \le 1$ and $\sum_{b=1}^{B} p_{n,b} = 1$ for $1 \le b \le B$. The discrete entropy of $d_n$ is defined as:

$$H(d_n) = -\sum_{b=1}^{B} p_{n,b} \log p_{n,b}. \tag{4}$$

Eq. (4) is used in this paper to estimate the relative positions of data in the dataset.

If $x_n$ is in a dense area, the data belonging to the same cluster or lying in adjacent areas contribute shorter distances, while the data not in adjacent areas, i.e., anomalies, contribute longer distances. The distribution of $p_n$ is therefore diverse and consequently $H(d_n)$ is relatively large. On the other hand, if $x_n$ is an anomaly, the distances from $x_n$ to all other data are distributed only among a few specific ranges with large values, and $H(d_n)$ is relatively small. If we use order statistics to sort the entropy values and directly select the data with the top $K$ entropy values as initial values for $\mu_k$, data from the same cluster are very likely to be chosen, as data with similar entropy values are usually clustered in neighbouring regions. Therefore, we use a weighted random selection approach to initialize $\mu_k$ and give higher priority to data with larger entropy values. Imagine a queue with unequal-sized slots. The size of the slot for each data point $s_n$ to occupy is proportional to the entropy value of the data point relative to the total entropy of the dataset, $H(d_n) / \sum_{m=1}^{N} H(d_m)$, and each slot is numbered with the corresponding index. In every iteration, we generate a random number $r \in [0, 1)$, find out which slot $r$ falls into, and select the data point with that index as $\mu_k$.

By not choosing anomalies as initial parameters, we speed up the convergence of the EM algorithm in fitting the Gaussian mixtures. A good initial estimate for the covariance matrix and the weight is the within-cluster covariance matrix and the fraction of the data allocated to each cluster. For $1 \le k \le K$, we define $K$ indicator functions $\delta_k$, apply these to $\mathcal{X}$, and denote $\delta_{k,n} = \delta_k(x_n)$, which takes the value 1 when $x_n$ is the closest to $\mu_k$ among the $K$ clusters, and 0 otherwise. Let $\theta_k = \bigcup_{\delta_{k,n}=1} \{x_n\}$ be the subset of data which are assigned to the $k$th cluster, where $\mu_k$ yields the smallest distance. Then, the initial values for $C_k$ and $w_k$ can be expressed as follows:

$$w_k = \frac{|\theta_k|}{N} \tag{5}$$
initialization such that the multiple starting points are chosen and
1
evaluated by their entropy with respect to their ambient values. Ck = (xn − μk )(xn − μk )T . (6)
Anomalies are those data with extreme values or abnormal dis-
|θ k | ∀x ∈θ
n k
tances from most data in the dataset and are not necessarily re-
Using Eqs. (3)–(6), the initialization process is described in
lated to or clustered with each other. Our experience in commer-
Algorithm 1 in Appendix 2.
cial mobile operations indicates that anomalies do not occupy the
main proportion of the dataset. We use the Euclidean distances di,j
3.2. The EM algorithm
to be the measurement between any two data in the dataset. For
1 ≤ i = j ≤ N, we have
After initialization, we obtain the initial parameters for each
di, j = xi − x j 2 . (3) cluster, i.e., μk , Ck and wk for 1 ≤ k ≤ K. Starting from the initial
parameters, the EM algorithm iteratively updates the parameters
With Eq. (3), we identify anomalies in the dataset by discrete
until it yields the largest likelihood given the data, i.e., the con-
entropy. Specifically, we use the histogram approach to approxi-
vergence is reached. In the EM algorithm, each iteration includes
mate the probability density function of the distances from one
two steps E and M. The E step computes the membership weights
data to all other data in the dataset and estimate the discrete en-
wk,n of data xn in the kth cluster, which reflect the certainty of xn
tropy. The histogram partitions the distances from one data to all
belonging to the kth cluster. For ≤ k ≤ K and 1 ≤ n ≤ N,
w f ( xn )
15
Such measure is a feature used to characterize data usage, for example, maxi- wk,n = K k k . (7)
mum, minimum, or average usage, depending on the service provider. k=1 wk f k (xn )
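As an illustration of the E step in Eq. (7), the following minimal sketch computes the membership weights for the univariate case (M = 1), so that only the Python standard library is needed. The function names `gaussian_pdf` and `e_step`, the toy data, and the parameter values are assumptions made for this example; they are not taken from the authors' implementation or from Algorithm 2.

```python
import math


def gaussian_pdf(x, mean, var):
    """Univariate Gaussian density, i.e. the component density for M = 1."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def e_step(data, weights, means, variances):
    """Return memberships[k][n], the weight of point n in component k (Eq. (7))."""
    memberships = [[0.0] * len(data) for _ in weights]
    for n, x in enumerate(data):
        # Numerator of Eq. (7) for every component: w_k * f_k(x_n)
        joint = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(joint)  # denominator: sum over all components
        for k, value in enumerate(joint):
            memberships[k][n] = value / total
    return memberships


# Hypothetical toy data and parameters, chosen only for illustration
data = [0.4, 0.6, 5.8, 6.3]
w = e_step(data, weights=[0.5, 0.5], means=[0.5, 6.0], variances=[0.5, 7.5])
```

Each column of `w` sums to one, so `w[k][n]` can be read as the certainty that point n belongs to component k. A practical implementation would work with log-densities to avoid underflow for points far from every component; that refinement is omitted here.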
org/10.1016/j.ejor.2019.02.046
For 1 ≤ k ≤ K, the M step uses the new membership weights to update the parameters in the following order:

\[ w_k = \frac{1}{N} \sum_{n=1}^{N} w_{k,n} , \qquad (8) \]

\[ \mu_k = \frac{\sum_{n=1}^{N} w_{k,n} x_n}{\sum_{n=1}^{N} w_{k,n}} , \qquad (9) \]

and

\[ C_k = \frac{\sum_{n=1}^{N} w_{k,n} (x_n - \mu_k)(x_n - \mu_k)^T}{\sum_{n=1}^{N} w_{k,n}} . \qquad (10) \]

Eqs. (7)–(10) are iteratively executed until convergence. Convergence is guaranteed because the likelihood is proven to increase monotonically in each iteration and is bounded. When the difference of likelihoods in two consecutive iterations is smaller than ε (ε = 10^{-10} in our implementation), convergence is reached and we obtain the parameters w_k, μ_k and C_k. The procedure is summarized in Algorithm 2 in Appendix 2.

3.3. Penalized likelihood

To determine the number of MGM components that best approximates the true distribution of the dataset, we first specify the range [K_min, K_max] for the number of components in the mixture model, construct a candidate pool for all possible numbers K of MGM components where K_min ≤ K ≤ K_max, and use a model selection technique to evaluate the performance of all candidate models. This section considers some frequently used information criteria that trade off accuracy and complexity in the model construction. The likelihood of the candidate model expresses the accuracy of the fitted model: the higher the likelihood, the better the fit. The total number of free parameters in the candidate model defines the complexity. At the same likelihood level, the simpler candidate model should be selected.

Akaike (1974) proposed the Akaike information criterion (AIC) that is generally considered the first model criterion. The AIC is defined as:

\[ \mathrm{AIC} = (K - 1) + K \left( M + \frac{1}{2} M (M + 1) \right) - 2 \log (L(\Theta)) \qquad (11) \]

and some authors prefer using AIC in practice. Schwarz (1978) proposed the Bayesian information criterion (BIC) that is a popular measure to determine the number of mixture components:

\[ \mathrm{BIC} = \left[ (K - 1) + K \left( M + \frac{1}{2} M (M + 1) \right) \right] \times \log (N) - 2 \log (L(\Theta)) . \qquad (12) \]

Claeskens and Hjort (2008) suggested that in practice the criterion should be much more important than the number of parameters. However, in the BIC criterion, the term log(N) may enlarge the penalty term depending on the value of N. Hannan and Quinn (1979) modified BIC and proposed the Hannan–Quinn information criterion (HQC) that reduces the penalty from log(N) to log(log(N)), namely:

\[ \mathrm{HQC} = \left[ (K - 1) + K \left( M + \frac{1}{2} M (M + 1) \right) \right] \times \log (\log (N)) - 2 \log (L(\Theta)) . \qquad (13) \]

We consider AIC, BIC and HQC simultaneously and select the model with the best performance from all candidate models.

In addition to using the information criteria to evaluate the performance of candidate models, we also use the F-fold cross validation technique to examine the stability of the candidate models, which randomly partitions the original dataset into F equal-sized subsets Θ_f for 1 ≤ f ≤ F. That is, Θ = ∪_{1≤f≤F} {Θ_f}. For every possible K in MGM, the cross validation process is repeatedly executed in F rounds, where every data is tested once in the process as all data are equally important. Every round of cross validation involves training with F − 1 subsets Θ_T and validation with the remaining subset Θ_V, where Θ = Θ_T ∪ Θ_V. In the training stage, we use Θ_T to estimate the K sets of parameters, including w_k, μ_k and C_k for 1 ≤ k ≤ K. Then in the validation stage, we use the aforementioned criteria to evaluate the fitness of the K-Gaussian mixture model on Θ_V. If we obtain similar scores during the F rounds, then we can be fairly confident about the number of mixture components. On the other hand, if the scores vary widely in the F rounds, then we should be sceptical about the mixture model's fit and clustering performance. After all candidates have been iterated, the model that minimizes all testing criteria in F rounds will be selected eventually. This process is described in Algorithm 3 in Appendix 2.

4. Accuracy evaluation

This section compares MGM with other previously proposed methods, including K-means, K-means++, Fuzzy C-means (FCM), and normalized cuts (N-Cuts) (Bai et al., 2016; Santi et al., 2016; Shi & Malik, 2000). We use three datasets to examine the clustering accuracy under different cluster characteristics. Since these datasets are generated from the given mixture model information, including the data and the labels that mark the true cluster of each data, it is easy to evaluate the accuracy of our proposed method by checking whether the clustered labels are consistent with the true labels. The first dataset is the stereotype Gaussian mixture model composed of two multivariate Gaussian distributions as per Fränti, Virmajoki, and Hautamäki (2006) (Dataset-I). The second dataset (Dataset-II) used for clustering analysis follows Yang et al. (2012).16 The third dataset (Dataset-III) consists of a Gamma mixture model.

16 Clustering benchmark datasets: http://cs.uef.fi/sipu/datasets/.

4.1. Synthetic Dataset-I

As Jain (2010) and Zimek, Schubert, and Kriegel (2012) pointed out, clustering high-dimensional data is a big challenge due to the distance concentration effect, such that irrelevant features conceal relevant information when data dimension increases. Moreover, the vast majority of real world clustering datasets have the overlapping characteristic. In Dataset-I, we check the clustering accuracy of our method under the effects of (1) varying dimensions, and (2) overlapping clusters. The Gaussian distribution dimension varies from 2, 4 to 16 (see Fränti et al., 2006). Taking the bivariate Gaussian mixture model as an example, there are 10 different test cases as shown in Figure 1, where μ_1 and μ_2 are always located in the same positions but the value of the covariance matrix gradually increases from 10 to 100 as the index increases from 1 to 10. The overlapping region of the two clusters also increases. For each test case, there are 2048 paired data including the data values and the true labels. With 3 different Gaussian distribution dimensions, there are 30 test cases in Dataset-I.

As the dataset has been labelled for the underlying true clusters, it is easy to identify errors. We report the clustering error rates with different methods in Table 1 for Dataset-I, indicating that our method has the lowest error rates for the three experiments (i.e., with 2-dimensional, 4-dimensional, and 16-dimensional data). We observe two interesting facts. First, K-means, K-means++ and FCM show almost identical error rates. Second, when the dimension increases, the errors decrease for K-means, K-means++, FCM and MGM. The Dataset-I benchmark data shows strong distance-dependent characteristics, particularly
for higher dimensions, namely, by increasing the dimension, the μ_k values depart further from each other. N-Cuts do not depend on distance to separate clusters. On the other hand, K-means, K-means++ and FCM are distance-based clustering methods, and therefore benefit from μ_k departure in clustering. MGM is not influenced by such departure of cluster centroids as it searches the density of features.

Fig. 1. Illustration of 10 different cases of the bivariate Gaussian mixture model in Dataset-I.

Table 1
Error rate (%) of different methods performed with the synthetic Dataset-I (i.e., the benchmark G2-dataset) under predetermined dimension (M) and covariance matrices (C_k).

M     C_k    K-Means    K-Means++  FCM        N-Cuts     MGM
2     10     0.0000     0.0000     0.0000     0.1953     0.0000
      20     0.5859     0.5859     0.0000     1.0742     0.1465
      30     5.0781     5.0781     0.9277     5.8594     0.9766
      40     9.4727     9.4238     3.6132     10.0100    1.9531
      50     16.9430    16.9430    8.2031     17.9690    2.0166
      60     18.9940    19.3360    11.8652    20.0680    5.8594
      70     24.0230    24.0230    14.8925    24.7560    5.8594
      80     27.0020    27.0020    18.7988    28.0270    6.0547
      90     29.6390    29.6390    21.9726    29.7360    9.8633
      100    30.7130    30.7130    25.5371    31.0060    12.1094
      Mean   16.2451    16.2744    10.5810    16.8701    4.4839
      Std    11.7976    11.8101    9.0140     11.8289    4.1773
4     10     0.0000     0.0000     0.0000     0.0488     0.0000
      20     0.0000     0.0000     0.0000     0.7324     0.2100
      30     0.0000     0.0000     0.0000     5.1758     0.2930
      40     0.5859     0.5859     0.5859     10.6450    1.1230
      50     2.2461     2.2461     2.2460     14.9410    1.9040
      60     4.3945     4.3945     4.3457     21.1910    3.6133
      70     7.6660     7.6660     7.9589     23.7300    6.0547
      80     9.4727     9.4727     9.4238     26.8550    7.1777
      90     12.3540    12.3540    12.3535    28.4180    7.1777
      100    15.6740    15.6740    15.6738    29.6390    7.1777
      Mean   5.2393     5.2393     5.2587     16.1376    3.4731
      Std    5.7445     5.7445     5.4604     11.4459    3.1387
16    10     0.0000     0.0000     0.0000     0.0977     0.0000
      20     0.0000     0.0000     0.0000     0.9277     0.0000
      30     0.0000     0.0000     0.0000     5.5664     0.0000
      40     0.0000     0.0000     0.0000     10.5960    0.0000
      50     0.0000     0.0000     0.0000     15.9670    0.0000
      60     0.0000     0.0000     0.0000     20.1660    0.0000
      70     0.0977     0.1465     0.0976     24.2190    0.0000
      80     0.7324     0.7324     0.7812     27.3440    0.0000
      90     1.5137     1.5137     1.5136     29.4920    0.0000
      100    2.3438     2.3438     2.3437     31.5430    0.0977
      Mean   0.4688     0.4736     0.4736     16.5919    0.0098
      Std    0.8255     0.8232     0.7849     11.7937    0.0309

4.2. Dataset-II

The accuracy of clustering methods is easily dominated by the anomalies in the dataset, particularly for the distance-based methods. For Dataset-II, we follow the simulation method used by Yang et al. (2012) to test the clustering performance under the effects of (1) different levels of perturbation (anomalies), and (2) overlapping clusters. To examine the robustness of the proposed method, we generate the parameters of the bivariate Gaussian mixture model with different K. We set K_min = 3 and K_max = 5 and therefore have three test cases (i.e., K = 3, 4 and 5) in Dataset-II. We first assign K = 3 and apply Algorithm 1 to derive the corresponding parameters, including the mean vectors:

\[ \mu_1 = (0.5 \;\; 0.5)^T, \quad \mu_2 = (6 \;\; 6)^T, \quad \mu_3 = (10 \;\; 10)^T, \]

and the covariance matrices:

\[ C_1 = \begin{pmatrix} 0.5 & 0.1 \\ 0.1 & 0.5 \end{pmatrix}, \quad C_2 = \begin{pmatrix} 7.5 & 4 \\ 4 & 7.5 \end{pmatrix}, \quad C_3 = \begin{pmatrix} 7.5 & 4 \\ 4 & 7.5 \end{pmatrix}. \]

To model the distribution of the actual mobile data usages with four Gaussian models, we obtain the mean vectors:

\[ \mu_1 = (0.5 \;\; 0.5)^T, \quad \mu_2 = (3 \;\; 3)^T, \quad \mu_3 = (10 \;\; 10)^T, \quad \mu_4 = (25 \;\; 25)^T, \]

and the covariance matrices:

\[ C_1 = \begin{pmatrix} 0.4 & 0.1 \\ 0.1 & 0.4 \end{pmatrix}, \quad C_2 = \begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix}, \quad C_3 = \begin{pmatrix} 16 & 10 \\ 10 & 16 \end{pmatrix}, \quad C_4 = \begin{pmatrix} 64 & 36 \\ 36 & 64 \end{pmatrix}. \]

If we use a five-Gaussian mixture model to approximate the distribution of the actual mobile data usages, we derive the mean vectors as:

\[ \mu_1 = (0.5 \;\; 0.5)^T, \quad \mu_2 = (2.5 \;\; 2.5)^T, \quad \mu_3 = (6 \;\; 6)^T, \quad \mu_4 = (16 \;\; 16)^T, \quad \mu_5 = (40 \;\; 40)^T, \]

and the covariance matrices:

\[ C_1 = \begin{pmatrix} 0.5 & 0.1 \\ 0.1 & 0.5 \end{pmatrix}, \quad C_2 = \begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix}, \quad C_3 = \begin{pmatrix} 7.5 & 4 \\ 4 & 7.5 \end{pmatrix}, \quad C_4 = \begin{pmatrix} 36 & 20 \\ 20 & 36 \end{pmatrix}, \quad C_5 = \begin{pmatrix} 100 & 64 \\ 64 & 100 \end{pmatrix}. \]

For each K, we sample 2500 paired data, including the data and the true labels. In each test case, we consider the perturbation of anomalies, since extremely large data usages indeed exist in our measured data of the commercial mobile operation dataset. Anomalies are sampled from a uniform distribution in the range [50, 200]. We gradually increase the occupied ratio of anomalies from 1% to 25% in each test case, following our observation of actual consumers' mobile data usages where anomalies did not occupy the main proportion of the dataset. Each ratio of anomalies is run 200 times and we evaluate the clustering accuracy under the predetermined perturbations.

We compare our method with other algorithms by mixing 3, 4 and 5 bivariate Gaussian distributions with Dataset-II. For K = 3, the true number of clusters in the data is 3. Table 2 reports the results with different perturbation rates ranging from 1% to 25%. With relatively low perturbation rates, the anomalies occupied from 1% to 5% of the total 2500 samples for each experiment with different K. To compare MGM with other methods, we use the formula (Error_MGM − Error_others)/Error_others in the following comparisons. Compared with K-means, the MGM reduces the error rates by 82.7631%, 86.0226%, 82.7115%, 83.4223%, and 82.7116% respectively when K = 3 and the perturbation increases from 1% to 5%,
Table 2
Error rate (%) of different methods performed on the synthetic Dataset-II (generated by mixing 3, 4 and 5 bivariate Gaussian distributions) with different perturbation rates.

                  Perturbations
K   Method        1%        2%        3%        4%        5%        7.5%      10%       15%       20%       25%
3   K-means       34.0009   34.7136   35.5409   35.9666   36.7649   34.0326   34.0387   35.0340   36.5857   37.0646
    K-means++     66.9967   67.3203   67.6375   67.6375   68.2540   85.4167   84.6667   87.6667   86.3799   84.8485
    FCM           11.5426   37.9788   38.5847   39.1516   40.1590   38.3139   41.8413   45.0953   46.9849   49.2195
    N-Cuts        5.5459    5.7121    7.0794    7.6664    7.7573    7.4333    8.6907    13.8027   17.8437   29.8175
    GMM           5.8607    4.8521    6.1445    5.9624    5.9317    8.6347    7.0680    6.9280    7.3355    6.8727
4   K-means       19.8497   51.0813   51.6519   52.1181   46.4023   51.2955   51.5364   51.7869   51.0458   54.3030
    K-means++     75.2475   75.4902   75.7282   75.9615   76.1905   93.2500   91.4141   97.0238   91.6667   93.0000
    FCM           14.9932   52.4821   52.9361   53.8604   55.7072   58.5635   56.1000   58.2667   58.3542   62.3260
    N-Cuts        5.1005    6.7080    8.1145    9.5736    12.0056   7.4840    7.5273    11.2560   25.0000   25.0000
    GMM           5.3049    6.3574    7.3500    8.0821    8.5988    4.9370    5.0439    5.1190    5.6333    4.8425
5   K-means       34.2401   34.7204   37.3595   38.2639   40.4997   39.2176   42.1685   43.1204   45.3263   52.2900
    K-means++     80.1980   80.3922   80.5825   80.7692   80.9524   96.6000   96.5657   96.0000   94.7917   96.6000
    FCM           22.7408   41.1577   62.2457   62.7032   63.1704   61.4856   61.8937   62.6944   62.6475   63.6332
    N-Cuts        4.0758    5.8328    6.7382    9.6332    11.7797   6.8528    8.0097    17.4948   20.0000   20.4000
    GMM           4.4593    5.4796    6.3944    7.4455    8.1185    4.5944    4.2065    4.0520    4.3404    5.1020
and therefore the MGM reduces the error rate by 83.5262% on average. In comparison with N-Cuts, when K = 3 and the perturbation is 1%, N-Cuts performs better than MGM by 5.6756%. However, MGM reduces the errors by 15.0559%, 13.2058%, 22.2265%, and 23.5339% respectively, as the perturbation increases from 2% to 5%, and therefore MGM reduces the error rate by 13.6693% on average. In general, when K = 3 and the perturbation is relatively low, the average error rates the MGM reduced are 83.5262%, 91.4121%, 77.8743%, and 13.6693% compared with K-means, K-means++, FCM, and N-Cuts, respectively. Similarly, we see an improvement in accuracy on average of 82.7607%, 91.0197%, 81.538%, and 7.8851% for K = 4 and 82.7115%, 90.9156%, 84.0753%, and 13.2058% for K = 5 when comparing the MGM with the other four methods, respectively.

Table 2 also indicates that MGM outperforms other methods when the perturbation rates are high, i.e., the occupation rate ranges from 7.5% to 25% of the total observations. Compared with K-means, when K = 3 and the perturbation gradually increases from 7.5% to 25%, the MGM reduces the error rates by 74.6281%, 79.2354%, 80.2249%, 79.9498% and 81.4575%, respectively, and the MGM reduces the error rate by 79.0991% on average. For K = 3, average accuracy improves with the MGM by 79.0991%, 91.4096%, 83.1264%, and 37.6314% compared with K-means, K-means++, FCM, and N-Cuts, respectively. Similarly, we can see an average improvement in accuracy of 90.15%, 94.5119%, 91.274%, and 55.9284% for K = 4 and 89.9159%, 95.3613%, 92.8644%, and 62.1131% for K = 5 when comparing our method with the other four methods, respectively.

We find that when the perturbation crosses over 5%, e.g., 7.5%, the advantage of our proposed method over N-Cuts becomes very significant. Along with the increase in perturbation, our method shows consistently superior performance.

4.3. Dataset-III

For Dataset-III, we consider the clustering performance under the effects of (1) the Gamma mixture model, (2) different levels of perturbation (anomalies), and (3) overlapping clusters. The most important property of the Gamma distribution is its skewness and kurtosis, which allows a skew center in one cluster and relaxes the constraint that the distribution in one cluster should always be symmetric. The shape parameter α and the scale parameter β determine the Gamma distribution, where the mean is αβ and the variance is αβ^2.

Like Dataset-II, we determine all parameters of the Gamma mixture model with different K values. When K = 3, we obtain the shape parameters α_1 = 0.5, α_2 = 4.8, α_3 = 40/3, and the scale parameters β_1 = 1, β_2 = 1.25, β_3 = 0.75. When K = 4, the shape parameters are α_1 = 0.5, α_2 = 3, α_3 = 6.25, α_4 = 9.75 and the scale parameters are β_1 = 1, β_2 = 1, β_3 = 1.6, β_4 = 2.56. When K = 5, the shape parameters are α_1 = 0.5, α_2 = 25/12, α_3 = 4.8, α_4 = 64/9, α_5 = 16, and the scale parameters are β_1 = 1, β_2 = 1.2, β_3 = 1.25, β_4 = 2.25, β_5 = 2.5, respectively.

Like Dataset-II, we consider the perturbation of anomalies sampled from the uniform distribution over the range [50, 200]. We gradually increase the occupied ratio of anomalies from 1% to 25% in each test case and check whether the proposed method is robust against perturbations. Our method is compared with other algorithms using 3, 4, and 5 Gamma mixture distributions with Dataset-III. Table 3 summarizes the results with different perturbation rates ranging from 1% to 25%.

In comparison with K-means, when K = 3 and the perturbation increases from 1% to 5%, MGM reduces the error rates by 82.2718%, 80.2254%, 80.2771%, 82.8381%, and 84.2248%; therefore, MGM reduces the error rate by 81.9675% on average. Compared with N-Cuts, MGM reduces the error rates by 17.5153%, 14.4056%, 8.9411%, 21.6312%, and 18.9445% when K = 3 and the perturbation increases from 1% to 5%, and the MGM therefore reduces the error rate by 16.2875% on average.

Overall, when K = 3 and the perturbation is relatively low, the error rates reduced by MGM are 81.9675%, 90.9060%, 80.6375%, and 16.2875% compared to K-means, K-means++, FCM, and N-Cuts, respectively. Similarly, we can see an average improvement of accuracy of 89.4245%, 93.5029%, 85.8516%, and 10.9469% for K = 4 and 89.2690%, 95.1978%, 91.8338%, and 15.8988% for K = 5 when comparing our method with the other four methods, respectively.

When the perturbation is relatively high, i.e., the occupation rate ranges from 7.5% to 25% of the total observations, the average accuracy has improved with the MGM by 80.8424%, 89.8340%, 86.6415%, and 32.8704% compared with K-means, K-means++, FCM, and N-Cuts respectively when K = 3. When K = 4, comparing our method with the aforementioned four methods, the average improvements of accuracy are 89.4987%, 92.6798%, 91.7409%, and 39.8774% respectively. Similarly, for K = 5, the accuracy improvements are 90.3343%, 94.6990%, 93.4242%, and 54.3619%. We find that when the perturbation crosses over 5%, e.g., 7.5%, the advantage of MGM over N-Cuts becomes very significant.

As previously mentioned, K-means, K-means++ and FCM are distance-based clustering methods, and they benefit when the data belonging to different clusters depart further from each other. On the other hand, if there are some anomalies in the dataset, the performance is easily influenced by anomalies.
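The relative comparisons quoted for Tables 2 and 3 follow the formula (Error_MGM − Error_others)/Error_others. The short sketch below evaluates it for the K = 3, 1% perturbation entries of both tables (the MGM row is labelled GMM there). The function name `relative_change` is an assumption of this example. With the formula taken literally, a negative value means MGM has the lower error, so the percentages quoted in the text correspond to the magnitudes.

```python
def relative_change(error_mgm, error_other):
    """(Error_MGM - Error_others) / Error_others, expressed in percent."""
    return 100.0 * (error_mgm - error_other) / error_other


# Table 2 (Gaussian mixture), K = 3, 1% perturbation
vs_kmeans_t2 = relative_change(5.8607, 34.0009)   # MGM against K-means
vs_ncuts_t2 = relative_change(5.8607, 5.5459)     # MGM against N-Cuts

# Table 3 (Gamma mixture), K = 3, 1% perturbation
vs_kmeans_t3 = relative_change(5.9180, 33.3820)   # MGM against K-means

for name, value in [("Table 2, K-means", vs_kmeans_t2),
                    ("Table 2, N-Cuts", vs_ncuts_t2),
                    ("Table 3, K-means", vs_kmeans_t3)]:
    print(f"{name}: {value:+.4f}%")
```

The two K-means comparisons come out negative (MGM is better), while the N-Cuts comparison in Table 2 is positive, which matches the statement that N-Cuts performs slightly better than MGM at 1% perturbation.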
Table 3
Error rate (%) of different methods performed on the synthetic Dataset-III (generated by mixing 3, 4 and 5 bivariate Gamma distributions) with different levels of perturbation.

                  Perturbations
K   Method        1%        2%        3%        4%        5%        7.5%      10%       15%       20%       25%
3   K-means       33.3820   33.3527   34.0113   33.3687   34.0113   34.3693   33.7053   34.7247   36.3533   38.6893
    K-means++     66.6667   66.6667   66.6667   66.6667   66.6667   66.6667   66.6667   66.6667   66.6667   66.6667
    FCM           18.7893   36.6460   37.0020   37.2840   38.7947   46.7007   50.2940   51.2087   53.6500   53.5947
    N-Cuts        7.1747    7.7053    7.3667    7.3073    6.6193    7.6007    7.9927    7.9260    18.5167   22.1640
    GMM           5.9180    6.5953    6.7080    5.7267    5.3653    7.7240    7.1720    6.5620    6.1053    6.3233
4   K-means       36.6595   50.0250   50.2780   50.5170   50.2755   50.7585   51.7875   52.5591   53.0270   53.5025
    K-means++     75.0000   75.0000   75.0000   75.0000   75.0000   75.0000   75.0000   75.0000   75.0000   75.0000
    FCM           16.3830   51.5205   52.0625   53.6895   57.7125   60.7060   65.1480   68.6576   68.0410   71.2780
    N-Cuts        6.9630    4.8710    5.0430    5.4890    4.9835    5.9520    4.9910    12.8444   17.5600   22.3750
    GMM           5.9995    3.5800    4.5380    5.4050    4.8415    5.9485    5.0440    5.9394    5.1885    5.3300
5   K-means       36.5772   37.3772   34.3964   36.3632   34.6036   40.5408   39.5564   43.7424   47.5100   47.9176
    K-means++     80.0000   80.0000   80.0000   80.0000   80.0000   80.0000   80.0000   80.0000   80.0000   80.0000
    FCM           22.4052   53.3804   61.3180   61.3100   61.5568   62.1312   62.8688   64.3000   65.4080   67.2132
    N-Cuts        4.1956    4.1320    4.7156    5.3688    4.4284    5.3512    5.4772    12.0500   17.1580   19.8392
    GMM           3.0188    4.0132    4.6584    4.3320    3.1864    4.3428    3.3924    4.3216    3.9704    5.1768
Table 4
Summary of attributes for all datasets investigated in the simulation study.

Attributes             Dataset-I          Dataset-II         Dataset-III
Perturbation           0%                 1%-25%             1%-25%
Clusters overlapping   Moderate           Increased          Increased
Number of clusters     2                  3, 4, 5            3, 4, 5
Dimensionality         2, 4, 16           2                  1
Probability density    Gaussian mixture   Gaussian mixture   Gamma mixture
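To make the perturbed Dataset-III setting in Table 4 concrete, the sketch below draws a univariate Gamma mixture with the K = 3 shape and scale parameters from Section 4.3 and mixes in anomalies from the uniform distribution on [50, 200]. Several details are assumptions of this example rather than facts from the paper: equal mixing weights, a sample size of 2500 (stated in the text only for Dataset-II), anomalies replacing part of the sample instead of being added to it, the label −1 for anomalies, and the function name `sample_gamma_mixture`.

```python
import random


def sample_gamma_mixture(n, shapes, scales, anomaly_ratio, seed=0):
    """Draw n labelled values: a Gamma mixture plus a share of uniform anomalies."""
    rng = random.Random(seed)
    n_anomalies = round(n * anomaly_ratio)
    values, labels = [], []
    for _ in range(n - n_anomalies):
        k = rng.randrange(len(shapes))  # assumed equal mixing weights
        # random.gammavariate takes (shape, scale), so the mean is shape * scale
        values.append(rng.gammavariate(shapes[k], scales[k]))
        labels.append(k)
    for _ in range(n_anomalies):
        values.append(rng.uniform(50.0, 200.0))  # anomalies from U[50, 200]
        labels.append(-1)  # -1 marks an anomaly (a convention of this sketch)
    return values, labels


# K = 3 parameters from Section 4.3, with a 5% perturbation rate
shapes = [0.5, 4.8, 40 / 3]
scales = [1.0, 1.25, 0.75]
values, labels = sample_gamma_mixture(2500, shapes, scales, anomaly_ratio=0.05)
```

With these parameters the component means αβ are 0.5, 6 and 10 and the variances αβ^2 are 0.5, 7.5 and 7.5, so the Gamma components mirror the means and the diagonal covariance entries of the K = 3 Gaussian case in Dataset-II.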
Compared with distance-based clustering methods, N-Cuts is much closer to spectral clustering methods. N-Cuts does not cluster data directly on distances but by measuring the total dissimilarity between different clusters and the total similarity within the clusters. When there are some elongated clusters in the dataset, N-Cuts becomes sensitive to anomalies. Along with the increase in perturbation, our method shows consistently superior performance.

4.4. Accuracy evaluation

Clustering is fundamental when dealing with high-dimensional data; however, there is distressingly little general theory on it for application to particular data. When we conduct clustering for big data, difficulties arise due to (1) an undeterminable number of clusters, (2) perturbation (or anomalies), and (3) non-spherical and overlapping clusters in a dataset. In practice, we usually observe that when perturbation increases the data points are either elongated in certain directions (i.e., the data shape is non-spherical) or entangled (i.e., clusters are overlapping). In our simulation, we assume that the ground-truth number of clusters is determinable. Each clustering algorithm makes specific structural assumptions about the dataset that need to be considered, that is, the shape of the clusters. Table 4 summarizes the attributes of each dataset investigated in our study.

Dataset-I serves as a benchmark for checking the impact of dimensionality. In Table 1, we can see that the proposed MGM outperforms other methods. For Datasets II and III, we evaluate all methods with respect to the impact of perturbation and overlapping clusters. We shift the perturbation rate from 1% to 25% of the total data size to verify the sensitivity of each method. Figs. 2 and 3 illustrate the performance of all methods. It is clear that N-Cuts and MGM outperform the others. N-Cuts is an unbiased measure of disassociation between subgroups that simultaneously evaluates the total dissimilarity between groups and the total similarity within groups. Comparing N-Cuts and MGM, we can see that when increasing the perturbation and number of clusters, MGM is significantly better. The advantage of MGM is that it uses variations in data density to define clusters.

K-means and K-means++ are sensitive to perturbation as they assume that each cluster has roughly equal numbers of observations and clusters are spherical. FCM is also distance based, such that it assigns membership to objects corresponding to each cluster center determined by the distance between the centroid and the object. Objects on the boundaries between clusters are assigned membership degrees between 0 and 1 indicating their partial association. It is sensitive to perturbation and assigns outliers a low (or even no) membership degree. The drawback of N-Cuts comes from the minimum cut criteria applied, because it occasionally supports cutting isolated objects due to the small values achieved by partitioning them.

We investigate the Gaussian mixture distribution and the Gamma mixture distribution in our simulation. They are members of the exponential family of distributions. The Gaussian distribution is fundamental and the Gamma model illustrates rich characteristics with different combinations of shape and scale parameters. Other models can be used to evaluate the performance. It should be noted that the exponential distribution, Erlang distribution, and chi-squared distribution are all special cases of the Gamma distribution.

We do not consider some synthetic cluster shapes such as rings or spirals because they are not realistic descriptions of human behaviour. Caution is needed when applying the proposed method to data with such artificial cluster shapes.

5. Business analytics of real mobile usages

Business analytics is a process of analysing data to support decisions and evaluate operations. By means of applying descriptive, predictive, and prescriptive analytics, it transforms data to valuable strategic design and provides the logical back end of searching the operational optimisation, see Kunc and O'Brien (2018) and references therein. In this section, we conduct a business analytics study that attempts to provide strategic options of revenue
Fig. 2. Comparison of the error rates of five methods performed on the Dataset-II (Gaussian mixture) that contains different perturbation rates ranged from 1% to 25%.
Fig. 3. Comparison of the error rates performed by five methods on the Dataset-III (Gamma mixture) that contains different perturbation rates ranged from 1% to 25%.
Table 5
Overview of related studies on pricing methods of communications in the literature.

Authors                                        Object            Pricing scheme       Network congestion   Linearity   Evaluation
Bagh and Bhargava (2013)                       Bundle service    Two and three parts  No                   Nonlinear   Numerical
Brito, P., and Vereda (2010)                   Not specified     Two parts            No                   Linear      –
Chen and Huang (2016)                          Data usage        Two parts            Yes                  Nonlinear   Numerical
Ferrer, Mora, and Olivares (2010)              Service           Two parts            No                   –           Numerical
Fibich, Klein, Koeigsberg, and Muller (2017)   Voice usage       Three parts          No                   Nonlinear   –
Iyengar, Ansari, and Gupta (2007)              Voice usage       Three parts          No                   Nonlinear   Empirical
Lahiri, Dewan, and Marshall (2013)             Data and traffic  –                    No                   Nonlinear   –
Lee, Mo, Jin, and Park (2012)                  Data usage        Flat and two parts   Yes                  Nonlinear   Numerical
Ma, Deng, Xue, Shen, and Lan (2017)            Data usage        Three parts          Yes                  Nonlinear   Numerical
Masuda and Whang (2006)                        Voice usage       Two parts            No                   Nonlinear   –
Sumantam, Chakraborty, and Sharma (2015)       Cloud service     Flat and two parts   No                   Nonlinear   Numerical
Wang, Ma, and Xu (2017)                        Data usage        Two parts            Yes                  Nonlinear   Numerical
Yang and Ng (2010)                             Bundle service    –                    No                   –           Empirical
This study                                     Data usage        Two parts            No                   Nonlinear   Empirical
management for sustainability. Following the 3Ps in Jaehn (2016), we hereby interpret the goals of sustainability as follows: stimulating the operator's continuous investments (profit), fostering the consumer's personalized adoption to avoid smartphone addiction (person), and mitigating inefficiency caused by reckless decisions (planet). Laffont and Tirole (1999) explain the economic theory of the pricing schemes applied by the telecommunications industry. Our research is built on the nonlinear pricing literature and we list some related works in Table 5. We first exemplify a descriptive analytics in Section 5.1 based on the method we proposed in Section 3 with the real usage data provided by a telecom company (TEL). Based on the outcome of descriptive analytics, we perform a predictive analytics in Section 5.2 with machine learning and simulation for understanding the current business situation and formulating strategic options of revenue management. After that we present our prescriptive analytics in Section 5.3 that attempts to advise possible outcomes and actions.

5.1. Descriptive analytics

Descriptive analytics identifies patterns and trends in data and categorises, characterises, and classifies them into useful information in order to understand past and current business performance, see Kunc and O'Brien (2018) and references therein. Mobile operators analyze their users' behaviour on the product/service and time frequency in order to guide business growth. Behavioural description of mobile data builds a fine, complete user's portrait through the study of usage data.

In this section, we apply descriptive analytics on 1000 anonymous users' mobile data usages of twelve months randomly
org/10.1016/j.ejor.2019.02.046
Table 6
Descriptive statistics of the TEL data (N = 1000) and augmented TEL data (N = 10,000).
Dimension | N = 1000: Mean Median Min Max Std Skewness Kurtosis | N = 10,000: Mean Median Min Max Std Skewness Kurtosis
1 | 6.8691 2.9763 0.0000 208.9725 15.1112 7.7173 81.4790 | 6.8704 3.0128 0.0000 208.9725 15.1022 7.7459 81.8090
2 | 7.6146 3.1054 0.0000 286.0771 16.1661 9.0802 126.6288 | 7.5941 3.0421 0.0000 286.0771 16.3833 9.1649 125.7435
3 | 7.6432 3.5732 0.0000 308.2917 15.3870 10.1080 166.8029 | 7.6512 3.5760 0.0000 308.2917 15.8783 10.5355 172.6348
4 | 8.7162 3.5263 0.0000 490.9261 20.6669 14.0048 302.5758 | 8.7782 3.5694 0.0000 490.9261 21.6824 14.2757 297.9158
5 | 8.5685 3.8581 0.0001 357.7498 17.1135 10.6560 187.3824 | 8.6579 3.8407 0.0001 357.7498 17.8552 10.9050 187.2890
6 | 8.2242 3.8639 0.0000 340.6736 16.5748 10.2731 174.7519 | 8.2904 3.7665 0.0000 340.6736 17.2143 10.5709 177.4677
7 | 9.0768 4.2893 0.0000 316.7339 17.6045 8.2498 111.8587 | 9.0099 4.2607 0.0000 316.7339 17.4286 8.0504 107.5642
8 | 8.7176 4.1756 0.0006 298.7147 16.3958 8.6045 121.5529 | 8.6676 4.1508 0.0006 298.7147 16.1949 8.4358 118.0090
9 | 9.1443 4.4917 0.0000 312.6903 15.8742 8.7407 142.3905 | 9.0710 4.3027 0.0000 312.6903 15.6446 8.4458 136.9362
10 | 9.9090 4.3173 0.0001 199.7685 17.3113 5.3027 43.2591 | 9.8749 4.2621 0.0001 199.7685 17.2277 5.1767 41.4716
11 | 8.8965 4.3180 0.0009 266.3538 16.2707 7.8973 99.3210 | 8.7540 4.2794 0.0009 266.3538 16.0402 7.8624 98.7368
12 | 9.8905 4.4845 0.0000 242.0820 18.8164 6.9141 69.9945 | 9.5572 4.2786 0.0000 242.0820 18.2699 7.1233 74.5377
sampled from the contractual period from April 2014 to June 2016. Following the Computer-Processed Personal Data Protection Law in Taiwan (footnote 17), the users were randomly picked by TEL from its primary databank and are regarded as representative of the new 4G service, without considering any specific demographic, psychographic, or sociographic features. We refer to this sample as the TEL data in this study and treat each monthly usage as a specific feature of usage. In addition, the 12 dimensional features used to characterize the cumulative data consumption are synchronized. Due to the legislation in force, we were not given any labels for these 12 features (e.g., the concrete months of the contractual period). Our experiment is therefore a double-blind test. From a practitioner's viewpoint, it is a misconception that organizations need "perfect" data to proceed with analytics. James Guszcza, the chief data scientist of Deloitte Consulting, pointed out that there is no absolute standard that describes what data is sufficient vs. insufficient, and that a limited data set can help make valuable classifications or predictions (footnote 18).
In order to improve the statistical significance, we apply the bootstrapping method of Sun and Meinl (2012) to enlarge the sample to 10,000 observations with the original 12 features in our back-testing. We show the complete procedure in Algorithm 4 in Appendix 2. We refer to this bootstrapped dataset as the augmented TEL data in our study. The advantage of bootstrapping, particularly its contribution to both natural and social sciences, has been addressed in several studies (see the Statistical Science 2003 special issue, vol. 18, no. 2). In addition, bootstrapping techniques can be used for efficiency measurement (Fallah-Fini, Triantis, & Seaver, 2012) and to obtain a good estimate of the statistical properties of the original population; see Cerquet, Falbo, and Pelizzari (2017) and references therein.
The descriptive statistics of the TEL data and augmented TEL data are shown in Table 6. We recognize that there is heterogeneity in the usage data. In order to characterize it well, we apply the density-based subspace clustering proposed in Section 3. We specify the range of K between 3 and 6, as most operators generally provide customers with at least three different data plans (footnote 19) and TEL offers six data plans for 4G services. By running the proposed method, we obtain the optimal mixture model with four (K = 4) multivariate Gaussian distributions, i.e., four distinguishable clusters. Table 8 reports the fitting measures (i.e., AIC, BIC, and HQC), showing that grouping consumers into 4 clusters leads to the best-fit solution, in that all these criteria are steadily minimized in F rounds. In order to illustrate the clustering results graphically, we apply the t-Distributed Stochastic Neighbour Embedding (t-SNE) proposed by van der Maaten and Hinton (2008) for dimensionality reduction, employing the Barnes-Hut approximation. Fig. 4 illustrates four different clustering results, i.e., K = 3, 4, 5, and 6, respectively.
Our descriptive analytics reveals an imperfection that is likely to be overlooked: six different data plans are being matched to four user groups. We shall conduct predictive analytics to investigate what could happen to revenue if TEL relocates the users clustered by the MGM method to its data plans.

5.2. Predictive analytics for formulating strategic options

Predictive tools (e.g., machine learning and simulation) strengthen the formulation of strategic options, as they provide the decision maker with actionable insights from the data (see Kunc & O'Brien, 2018). Our descriptive analytics indicates that the current TEL data plans match the user clusters imperfectly. In this section, we run a predictive analytics, related to sensitivity analysis, to predict the revenue changes from introducing the MGM based data plan.
The current data plans are obtained directly from TEL and listed in Table 9, where the six data plans $\Lambda = \bigcup_{k=1}^{6}\{\lambda_k\}$ are bracketed from 0.55 gigabytes to 16 gigabytes. Table 9 shows the two-part tariff (a lump-sum fee and a per-unit charge) applied by TEL. The tariff rates $\Gamma = \bigcup_{k=1}^{6}\{\gamma_k\}$ indicate the basic subscription fees, and there are two options $a_1$ and $a_2$ for the data overage surcharge: NT$ 100 per 0.2 gigabytes and NT$ 250 per gigabyte, i.e., $p_1 = 100$ and $p_2 = 250$. Customers are informed by TEL (footnote 20) before their subscribed plan is completely exhausted. The consumer can then voluntarily choose or decline the overage plan.
In our predictive analytics, we aim to identify the revenue changes for TEL after using the new brackets of basic data plans determined by the mean (or median) of the MGM components, see Section 5.2. The MGM method clusters the actual usage into K plans after confirming K 12-dimensional MGM distributions. Each distribution has a mean vector ($\mu_k$). The mean or median of each mean vector $\mu_k$ then serves as the bracket for the basic data plan. Revenue management decides the fees for the basic plan and the overage surcharge for each bracket. For the real data we investigated, TEL does not disclose either the basic plan initially subscribed to or the overage plan the user chose. We assume a rational consumer will always choose the most beneficial plan. For example, when the subscribed data allowance is almost exhausted,

Footnote 17: Taiwan has adopted the Personal Information Protection Act (PIPA) since 01 October 2012, and the EU adopted the General Data Protection Regulation (GDPR) on 25 May 2018.
Footnote 18: https://deloitte.wsj.com/cio/2016/02/04/you-dont-need-perfect-data-for-analytics-analytics/.
Footnote 19: For example, Verizon Wireless in the US offers three plans for 4G LTE data, AT&T in the US and Deutsche Telekom in Germany offer four, Hutchison 3G in the UK offers five, and NTT Docomo in Japan offers six.
Footnote 20: Developments in charging and billing architectures have been discussed in the literature. See, among others, Koutsopoulou, Kaloxylos, Alonistioti, Merakos, and Kawamura (2004), de Reuver, de Koning, Bouwman, and Lemstra (2009), and Lee, Murray, and Qiao (2015).
Table 7
Descriptive statistics of the implied overage above subscription derived from the TEL data (N = 1000) and augmented TEL data (N = 10,000).
Dimension | N = 1000: Mean Median Min Max Std Skewness Kurtosis | N = 10,000: Mean Median Min Max Std Skewness Kurtosis
1 | 2.8220 0.0000 0.0000 193 12.9021 9.9027 117.9707 | 2.8399 0.0000 0.0000 193 12.6216 9.8716 120.6259
2 | 3.2660 0.0000 0.0000 271 13.8845 11.9203 191.8892 | 3.4289 0.0000 0.0000 271 14.3825 11.6562 183.8105
3 | 3.0940 0.0000 0.0000 293 13.0648 13.9856 271.1667 | 3.1419 0.0000 0.0000 293 12.6859 13.3342 255.3645
4 | 3.9670 0.0000 0.0000 475 18.5219 17.6938 424.7556 | 4.0760 0.0000 0.0000 475 17.6721 16.8397 412.3893
5 | 3.6120 1.0000 0.0000 342 14.6243 14.7881 304.6351 | 3.8037 1.0000 0.0000 342 14.4670 13.3417 262.5732
6 | 3.4350 0.0000 0.0000 325 14.1245 14.1576 284.0267 | 3.4697 1.0000 0.0000 325 13.3955 13.7251 282.8218
7 | 4.0490 1.0000 0.0000 301 15.0061 11.1144 177.3771 | 4.1322 1.0000 0.0000 301 15.8288 11.7612 189.8056
8 | 3.7190 1.0000 0.0000 283 13.7789 12.0550 202.6173 | 3.8793 1.0000 0.0000 283 15.0718 12.0091 191.1792
9 | 4.1010 1.0000 0.0000 297 13.0932 13.0618 261.2189 | 4.3339 1.0000 0.0000 297 14.2825 13.3495 253.2136
10 | 4.8350 1.0000 0.0000 184 14.3499 7.1183 68.1558 | 4.9018 1.0000 0.0000 184 14.5343 7.2149 70.1434
11 | 4.0100 1.0000 0.0000 250 13.7045 10.7776 158.0708 | 4.1484 1.0000 0.0000 250 14.0061 11.0021 166.6570
12 | 4.9280 1.0000 0.0000 226 16.1705 8.9093 102.4978 | 5.0668 1.0000 0.0000 226 16.2198 8.5499 96.8315
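Tables 6 and 7 compare the original sample with its bootstrapped counterpart. The paper's own augmentation procedure is Algorithm 4 (following Sun and Meinl, 2012) and is not reproduced here. As a minimal sketch of the general idea only, assuming a plain row-wise bootstrap (whole users resampled with replacement, so each user's 12 monthly features stay together), the augmentation could look like this; the function name `bootstrap_rows` and the toy data are hypothetical:

```python
import random


def bootstrap_rows(rows, target_size, seed=0):
    """Resample whole user rows with replacement (plain i.i.d. bootstrap).

    Assumption: this is NOT the Sun and Meinl (2012) procedure, only the
    simplest bootstrap that keeps each user's 12 monthly usages together.
    """
    rng = random.Random(seed)
    return [list(rng.choice(rows)) for _ in range(target_size)]


# Toy stand-in for the TEL data: 5 users x 12 monthly usages (made-up numbers).
toy = [[round(0.5 * (u + 1) + 0.1 * m, 2) for m in range(12)] for u in range(5)]

# Enlarge the toy sample tenfold, mirroring the 1000 -> 10,000 augmentation.
augmented = bootstrap_rows(toy, target_size=50)
```

Because rows are resampled as a unit, the minimum and maximum of each dimension in the augmented sample can never exceed those of the original, which matches the identical Min and Max columns in the two panels of the tables.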
Table 8
Cross validation measured by different information criteria (i.e., AIC, BIC, and HQC) to verify the number of clusters in the empirical study.
Iteration K=3 K=4 K=5 K=6
AIC BIC HQC AIC BIC HQC AIC BIC HQC AIC BIC HQC
1 1.4796 5.2255 2.9106 1.1679 4.1646 2.3127 1.7873 6.2823 3.5044 2.1005 7.3447 4.1038
2 1.4798 5.2256 2.9107 1.1878 4.1845 2.3326 1.8048 6.2998 3.5219 2.1051 7.3493 4.1084
3 1.4799 5.2258 2.9109 1.1756 4.1723 2.3204 1.8145 6.3096 3.5317 2.0978 7.3421 4.1012
4 1.4802 5.2261 2.9112 1.1682 4.1649 2.3130 1.8024 6.2974 3.5195 2.1031 7.3473 4.1064
5 1.4794 5.2253 2.9104 1.1905 4.1872 2.3353 1.7904 6.2854 3.5075 2.1036 7.3478 4.1070
6 1.4956 5.2415 2.9265 1.1740 4.1707 2.3188 1.8120 6.3070 3.5291 2.1109 7.3551 4.1142
7 1.4911 5.2370 2.9221 1.1919 4.1886 2.3367 1.7882 6.2832 3.5053 2.1057 7.3499 4.1090
8 1.4949 5.2408 2.9259 1.1912 4.1879 2.3359 1.7984 6.2935 3.5156 2.0971 7.3413 4.1004
9 1.4956 5.2414 2.9265 1.1933 4.1900 2.3381 1.8005 6.2955 3.5176 2.1041 7.3483 4.1074
10 1.5007 5.2466 2.9317 1.1817 4.1784 2.3265 1.7935 6.2885 3.5106 2.1177 7.3620 4.1211
Mean 1.4877 5.2336 2.9186 1.1822 4.1789 2.3270 1.7992 6.2942 3.5163 2.1046 7.3488 4.1079
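The model selection behind Table 8 can be illustrated with the textbook definitions of the three criteria. This is a sketch under assumptions: the paper does not state how the values in Table 8 are scaled, so the formulas below are the standard unnormalized forms, and the helper names are hypothetical. The last step simply picks, for each criterion, the K with the smallest mean value reported in Table 8.

```python
import math


def gmm_param_count(k, d):
    """Free parameters of a k-component, d-dimensional Gaussian mixture
    with full covariance matrices: weights + means + covariances."""
    return (k - 1) + k * d + k * d * (d + 1) // 2


def information_criteria(log_lik, n_params, n_obs):
    """Standard AIC, BIC, and Hannan-Quinn criterion (lower is better)."""
    aic = 2 * n_params - 2 * log_lik
    bic = n_params * math.log(n_obs) - 2 * log_lik
    hqc = 2 * n_params * math.log(math.log(n_obs)) - 2 * log_lik
    return aic, bic, hqc


# "Mean" row of Table 8: K -> (AIC, BIC, HQC).
TABLE8_MEAN = {
    3: (1.4877, 5.2336, 2.9186),
    4: (1.1822, 4.1789, 2.3270),
    5: (1.7992, 6.2942, 3.5163),
    6: (2.1046, 7.3488, 4.1079),
}

# For each criterion, choose the K that minimizes it.
best_k = {
    name: min(TABLE8_MEAN, key=lambda k, i=i: TABLE8_MEAN[k][i])
    for i, name in enumerate(("AIC", "BIC", "HQC"))
}
```

All three criteria select K = 4 on the reported means, in line with the conclusion drawn from Table 8.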
Table 9
The Telecom (TEL) 4G data plans (March 2017).
Plan i ii iii iv v vi Mean Median
Basic data plan (gigabytes) 0.55 2 3 4 8 16 5.5917 3.5
Tariff rate (NT$) 636 936 1,136 1,336 1,736 2,636 1,402.67 1,236
Unit rate (NT$ per gigabyte) 1,156.36 468 378.67 334 217 164.75 450.13 356
Data overage fee p_i / a_i (i = 1, 2): NT$100 per 0.2 gigabytes; NT$250 per 1 gigabyte
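The tariff in Table 9 can be turned into a small billing routine that applies the minimum principle, i.e., a rational consumer ends up with the cheapest combination of basic plan and overage. This is a sketch under a stated assumption: the worked examples in the text combine overage packs (NT$250 + NT$100 for 1.1 gigabytes of overage), so the routine searches over mixes of the two pack sizes, whereas Eq. (14) compares the two options separately. The function names are hypothetical.

```python
import math

# Table 9: (basic allowance in gigabytes, monthly tariff in NT$).
PLANS = [(0.55, 636), (2, 936), (3, 1136), (4, 1336), (8, 1736), (16, 2636)]
SMALL_GB, SMALL_FEE = 0.2, 100   # overage option a1 with rate p1
LARGE_GB, LARGE_FEE = 1.0, 250   # overage option a2 with rate p2
EPS = 1e-9                       # guards ceil() against floating-point noise


def packs_needed(gb, pack_gb):
    """Number of packs of size pack_gb needed to cover gb (0 if gb <= 0)."""
    return max(0, math.ceil(gb / pack_gb - EPS))


def overage_fee(excess_gb):
    """Cheapest mix of large and small packs covering the excess usage.

    Assumption: packs may be combined, as in the 5.1-gigabyte example.
    """
    if excess_gb <= 0:
        return 0
    best = None
    for n_large in range(packs_needed(excess_gb, LARGE_GB) + 1):
        rest = excess_gb - n_large * LARGE_GB
        fee = n_large * LARGE_FEE + packs_needed(rest, SMALL_GB) * SMALL_FEE
        best = fee if best is None else min(best, fee)
    return best


def monthly_charge(usage_gb):
    """Minimum principle: cheapest basic plan plus overage for this usage."""
    return min(fee + overage_fee(usage_gb - allowance) for allowance, fee in PLANS)
```

With these definitions, `monthly_charge(5.1)` evaluates to 1686 (plan iv plus NT$350 overage) and `monthly_charge(5.5)` to 1736 (plan v with no overage), which are the two totals quoted in the text.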
TEL will remind the customer to choose an overage plan or to stop. With the fees in Table 9, if a consumer knows that her excess usage will end up above 0.4 gigabytes but no more than 1 gigabyte (which would be charged at least NT$300 under the first option), she will choose the 1-gigabyte overage plan at the beginning, which only charges NT$250. We therefore take this into account for the overage and apply the minimum principle to minimize the total payment. For example, if a consumer's usage is 5.1 gigabytes, the total charge is NT$ 1686, since "billing plan iv" applies NT$ 1336 (with a 4-gigabyte data plan) plus an overage charge of NT$ 350 (i.e., NT$250 + NT$100). If a consumer's usage is 5.5 gigabytes, the total payment is NT$ 1736, charged by "billing plan v", because the total charge of NT$ 1836 (i.e., NT$1,336 + NT$250 × 2) under "billing plan iv" is eliminated when applying the minimum principle. We illustrate the machine learning procedure for determining the tariff for the MGM data plans in Algorithm 5 in Appendix 2.
The total revenue (Rev) of the TEL data plans with the rates in Table 9 and the estimated overage in Table 7 can be calculated as follows (see Algorithm 6 in Appendix 2):

$$\mathrm{Rev}(\Lambda, \Gamma, \bar{\Theta}, a_i, p_i) = \sum_{k=1}^{6} \sum_{n=1}^{|\bar{\theta}_k|} \sum_{m=1}^{12} \left[ \gamma_k + \max\left\{ 0,\ \min\left\{ p_1\, \mathrm{ceil}\!\left( \frac{x_{n,m} - \lambda_k}{a_1} \right),\ p_2\, \mathrm{ceil}\!\left( \frac{x_{n,m} - \lambda_k}{a_2} \right) \right\} \right\} \right], \tag{14}$$

where $a_i$ is the overage plan and $p_i$ is its rate for $i = 1, 2$. $X = (x_{n,m})$ is the actual data usage whose characteristics are shown in Table 6, $|\bar{\theta}_k|$ stands for the corresponding estimated number of users for each data plan, and $\bar{\Theta} = \bigcup_{1 \le k \le 6} \{\bar{\theta}_k\}$.
With the proposed MGM method, after identifying the number of clusters K, the new data plan brackets $\tilde{\Lambda} = \bigcup_{k=1}^{K} \{\tilde{\lambda}_k\}$ can be determined by the mean or median of $\mu_k$. The total revenue based on the new basic plan $(K, \tilde{\Lambda}, \Gamma)$ can be calculated by

$$\mathrm{Rev}(K, \tilde{\Lambda}, \Gamma, \Theta, a_i, p_i) = \sum_{k=1}^{K} \sum_{n=1}^{|\theta_k|} \sum_{m=1}^{12} \left[ \gamma_k + \max\left\{ 0,\ \min_{i \in \{1,2\}} p_i\, \mathrm{ceil}\!\left( \frac{x_{n,m} - \tilde{\lambda}_k}{a_i} \right) \right\} \right], \tag{15}$$

where $|\theta_k|$ is the number of users for each MGM based data plan and $\Theta = \bigcup_{1 \le k \le K} \{\theta_k\}$.
We propose new data plans by using 3, 4, 5, and 6 multivariate Gaussian distributions based on the descriptive results. In addition, we reduce the 12 dimensions to one dimension with the scalar mean or median of the underlying K univariate mixture Gaussian distributions. Table 10 shows the new data plans for different numbers of clusters based on the original data (1000 users) and the augmented data (10,000 users). For example, when the number of clusters is 3 with the original 12 dimensions, we obtain two different data plans based on the MGM-mean bracket and the MGM-median bracket. The mean-based bracket sets the data plan into three tiers bracketed by 0.7617 gigabytes, 4.4972 gigabytes, and
Fig. 4. Empirical results of clustering the TEL data based on our method with different K.
Table 10
New data plan calculated with the proposed clustering method and the corresponding revenue increase (RI). The clustering is based on the original dimension (M = 12) and one reduced feature (M = 1).
Setting | K | Number of users = 1000: data plan brackets (in gigabytes) | R.I. | Number of users = 10,000: data plan brackets (in gigabytes) | R.I.
M = 12, mean | 3 | 0.7617 4.4972 17.9662 | 0.0503 | 0.8025 4.5538 17.6432 | 0.0533
M = 12, mean | 4 | 0.8508 3.0034 7.9632 21.2971 | 0.1327 | 0.5874 2.2425 7.3196 23.8056 | 0.1249
M = 12, mean | 5 | 0.2450 1.3972 4.5755 14.5199 40.8662 | 0.1311 | 0.2827 1.2857 3.5918 9.1414 21.2559 | 0.1001
M = 12, mean | 6 | 0.2390 1.1003 2.7219 5.8098 13.6533 33.8925 | 0.0840 | 0.0741 0.7106 2.1130 5.9044 14.6535 40.8879 | 0.0939
M = 12, median | 3 | 1.4528 6.8454 20.5755 | 0.2004 | 0.9216 4.538 18.4899 | 0.1916
M = 12, median | 4 | 0.7994 3.8532 12.1280 37.5210 | 0.2278 | 0.8363 3.7845 21.0881 36.8400 | 0.2369
M = 12, median | 5 | 0.2883 1.9322 5.6408 14.3727 37.1780 | 0.1879 | 0.7044 2.7712 6.3432 14.1087 40.0204 | 0.1898
M = 12, median | 6 | 0.1678 0.9956 3.0018 7.7811 17.7030 43.9669 | 0.1800 | 0.3033 1.2250 3.2402 6.3300 14.3716 39.0178 | 0.1824
M = 1, mean | 3 | 1.7927 6.4637 18.0513 | 0.1229 | 1.7199 6.2972 18.1898 | 0.1188
M = 1, mean | 4 | 0.8508 3.0034 10.8944 24.4235 | 0.1370 | 1.2666 4.1088 10.3088 22.8327 | 0.1201
M = 1, mean | 5 | 1.0976 3.0753 6.4436 12.9213 25.4689 | 0.1307 | 0.9916 2.5965 5.8860 12.9092 26.2471 | 0.1152
M = 1, mean | 6 | 0.3224 1.4532 3.1604 5.9455 12.5409 25.6623 | 0.1215 | 0.3177 1.6737 3.6236 5.4941 11.9352 25.1715 | 0.1132
M = 1, median | 3 | 1.2328 5.2256 15.7015 | 0.1208 | 1.4704 6.2850 17.3413 | 0.1160
M = 1, median | 4 | 1.0013 3.5062 9.4141 21.2359 | 0.1298 | 0.9172 3.2555 8.8094 20.5245 | 0.1186
M = 1, median | 5 | 0.2710 1.2917 3.8044 9.8720 22.2558 | 0.1200 | 0.2566 1.2323 3.6015 9.5470 22.2420 | 0.0989
M = 1, median | 6 | 0.2640 1.1319 2.7882 6.0388 11.9438 23.9125 | 0.1270 | 0.2626 1.1691 2.9419 6.0435 12.0787 24.7496 | 0.0940
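The allowance increases quoted for the K = 3 plans, and the revenue-increase ratio reported in the R.I. columns, can be checked with a few lines of arithmetic. This is a sketch under one assumption taken from the text: the three MGM tiers are priced at NT$636, NT$1336, and NT$2636, i.e., at the tariffs of TEL plans i, iv, and vi in Table 9, whose allowances are 0.55, 4, and 16 gigabytes. The helper names are hypothetical.

```python
# Allowances (gigabytes) of the TEL plans whose tariffs the K = 3 tiers inherit.
TEL_ALLOWANCE = [0.55, 4.0, 16.0]          # plans i, iv, vi in Table 9
MGM_MEAN = [0.7617, 4.4972, 17.9662]       # Table 10, M = 12, mean, K = 3
MGM_MEDIAN = [1.4528, 6.8454, 20.5755]     # Table 10, M = 12, median, K = 3


def pct_increase(new, old):
    """Relative increase of new over old, in percent."""
    return 100.0 * (new - old) / old


def revenue_increase(revenue_mgm, revenue_tel):
    """Revenue increase (RI) as defined in the text: (MGM - TEL) / TEL."""
    return (revenue_mgm - revenue_tel) / revenue_tel


mean_gain = [pct_increase(n, o) for n, o in zip(MGM_MEAN, TEL_ALLOWANCE)]
median_gain = [pct_increase(n, o) for n, o in zip(MGM_MEDIAN, TEL_ALLOWANCE)]
```

Up to rounding, `mean_gain` reproduces 38.49%, 12.43%, and 12.29%, and `median_gain` reproduces 164.15%, 71.14%, and 28.60%.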
Table 11
New data plan calculated with the proposed clustering method and the corresponding revenue increase (RI). The clustering is based on 6 randomly chosen months (M = 6) to obtain the data plan brackets. Revenue increase (RI) is based on the remaining 6 months. We report the mean and standard deviation (in parentheses) over the 924 combinations.
Number of users = 1000 Number of users = 10,000
K Data Plan Brackets (in gigabyte) R.I. Data Plan Brackets (in gigabyte) R.I.
3 1.4955 6.1195 19.5279 0.1156 1.4972 6.1164 19.5640 0.1066

(0.1560) (0.7645) (1.2251) (0.0445) (0.1461) (0.6373) (1.2602) (0.0538)
M=6 4 0.7096 2.9312 8.7970 22.7584 0.1495 0.7130 2.9167 8.7512 23.0141 0.1768
(0.2640) (0.8691) (2.2868) (1.9966) (0.0547) (0.3653) (0.9090) (1.5362) (1.7717) (0.0514)
Mean 5 0.4517 1.9111 4.9451 11.7496 25.6691 0.1220 0.4770 2.1867 4.7469 11.0775 25.2655 0.0893
(0.2673) (0.5255) (1.3394) (2.7717) (2.3490) (0.0652) (0.2523) (1.3238) (1.7881) (2.7709) (2.4681) (0.0643)
6 0.3792 1.6731 3.3439 6.5244 12.5147 27.7587 0.0377 0.3729 1.7555 3.3331 5.4655 12.0112 26.6707 0.0644
(0.1447) (0.6436) (1.5282) (3.1568) (2.1484) (2.7157) (0.0403) (0.1033) (0.5742) (1.8894) (1.7716) (1.8653) (1.9918) (0.0552)
3 1.5273 6.2586 19.7754 0.1015 1.5326 6.2581 19.8436 0.1206
(0.1876) (0.9243) (1.2290) (0.0443) (0.1948) (0.8053) (1.2993) (0.0530)
M=6 4 0.6740 2.8949 8.9170 23.2823 0.1224 0.7054 2.9431 8.8624 23.3501 0.1840
(0.2128) (0.5222) (0.9976) (1.7363) (0.0502) (0.2107) (0.9169) (1.0732) (1.8730) (0.0998)
Median 5 0.4302 1.8867 5.1086 10.9515 25.2566 0.1185 0.4486 1.9417 5.0805 11.0197 25.5093 0.0998
(0.1928) (0.7356) (2.4799) (1.6414) (2.1936) (0.0537) (0.1475) (0.5055) (2.3630) (1.9262) (2.4386) (0.0586)
6 0.3655 1.5382 3.1571 5.6327 12.1168 26.7001 0.1127 0.3527 1.4277 3.5199 6.0603 12.1305 26.8293 0.0776
(0.1501) (0.5355) (1.3621) (1.9490) (1.7206) (2.4094) (0.0561) (0.1327) (0.3602) (1.1217) (2.6929) (1.9452) (2.2707) (0.0489)
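Table 11 rests on every way of choosing six of the twelve months for training and using the other six for testing. A minimal sketch of that enumeration, assuming months are simply indexed 0 to 11 (the real month labels are not available in the TEL data), confirms the count of 924 unordered splits without replacement:

```python
from itertools import combinations
from math import comb

MONTHS = range(12)  # anonymous month indices; the actual labels are unknown

# Each split: six training months and the complementary six test months.
splits = [
    (train, tuple(m for m in MONTHS if m not in train))
    for train in combinations(MONTHS, 6)
]

n_splits = len(splits)  # equals C(12, 6)
```

In a full experiment, the brackets would be fitted on the `train` months of each split and the revenue evaluated on the corresponding `test` months; that step depends on Algorithms 5 and 6 and is not sketched here.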
17.9662 gigabytes, while the median-based brackets are 1.4528 gigabytes, 6.8454 gigabytes, and 20.5755 gigabytes. Although the corresponding basic data rates are NT$ 636, NT$ 1336, and NT$ 2636 for both the mean-based and median-based plans, the new plans increase the data allowance by 38.49%, 12.43%, and 12.29% for the mean-based plans and by 164.15%, 71.14%, and 28.60% for the median-based plans, respectively. Since the tariff rates for the TEL data plans remain unchanged, the unit rates of the MGM based plans are reduced.
We then compare the operator's revenue increase after applying the new billing plans by (Revenue_MGM − Revenue_TEL)/Revenue_TEL. We summarize the results in Table 10. We find that the MGM revenue increases most when K = 4, and the median-based data plan made up of four brackets leads to the highest revenue increase (i.e., 22.78% for 1000 users and 23.69% for 10,000 users). This coincides with the best result obtained from the MGM clustering (i.e., K = 4).

5.3. Prescriptive analytics

Prescriptive analytics in our study focuses on improving prediction accuracy and prescribing better decision options that bring new "values". The values here could be the company's future expected revenues and profits, or intangible assets ranging from intellectual capital and revealed opportunity to future growth potential; such a process can be seen as finding, measuring, creating, and protecting value across functional areas. Heyns (2015) shows that analytics enables organizations to understand and embrace emerging opportunities and to align products and services with changing customer needs, creating additional value for stakeholders in the process and effectively growing, optimising, and protecting value. Ransbotham and Kiron (2017) highlight that companies experienced in analytics are increasingly gaining competitive advantage, and that analytics for innovation leads to new products, services, and processes or improves existing ones.
In Section 5.2, we obtained the new brackets of data plans by setting M = 1 and M = 12. When M = 12, the classification is well developed because it covers all observable patterns (or variety) that the machine learning algorithm can identify. Unfortunately, this is not feasible in real business, as the data plans are determined before all actual usage has been collected. On the other hand, when M = 1 the bias in the data increases, which leads to predictive inaccuracy. Therefore, we divide the data set into training data (used to determine the data plan brackets) and test data (used to evaluate the revenue of these plans). We randomly pick six months for training and use the remaining six months for testing. There are 924 combinations when order does not matter and replacement is not allowed. We apply Algorithm 6 to the 924 different training/testing settings and present the results in Table 11. Unsurprisingly, the result from Table 10 is confirmed: when K = 4, the proposed MGM leads to the highest revenue increase. Figs. 5 and 6 plot the revenue of all 924 combinations based on the MGM data plans (i.e., K = 3, 4, 5, and 6) in comparison with the TEL revenue.
Laffont and Tirole (1999) consider the two-part tariff as two complementary services (connection and consumption). Many studies have shown that these two prices should be coordinated (see Bagh and Bhargava, 2013, for example). It is worth losing some revenue on connections to boost variable consumption, and conversely. Even though the revenue from the basic plan charges decreases at the beginning, the revenue from the overage surcharge increases and the total revenue eventually increases. The more data are charged as overage, the more the total revenue increases. On the other hand, the revenue from the basic plan can increase after more users with large data consumption are assigned to higher rates. The proposed MGM method separates adjacent components efficiently, as we have shown in Section 4. For members of the same cluster, the mean vector describes the average consumption and the covariance matrix of the MGM allows for discrepancy in consumption. The covariance structure of the MGM accommodates discrepant membership, see Fig. 4. Components subject to such discrepancy are completely separated into different clusters by the underlying Gaussian distributions. We show the distributions of users clustered by the TEL billing plans and by the MGM method in Table 12.
In our predictive analytics, we assume other things are constant and simply adopt the current TEL tariff. We then match the MGM based data plan to the current TEL tariff (see Algorithm 5 in Appendix 2). Decision makers can determine the fees for basic and overage plans in different ways. Based on the proposed MGM method, we draw advantages from the mixture of multivariate normality by increasing the mean and accommodating skewness, which can be used to preserve the revenue from the basic plan and at the same time increase the overage charge.
Once rational consumers recognize that, when they remain in the regime of a low-rate basic plan, their overage surcharge under the new data plan in fact increases, they will adjust their behaviour by either subscribing to an adequate basic data plan at the beginning (footnote 21)

Footnote 21: An extra charge against a consumer who alters the initial contractual subscription will apply in practice. For example, TEL provides a time-varying regime of sur-
Fig. 5. Comparing revenue with different data plan options for the TEL data (N = 1000). Panel (a) shows the revenue when the data plan brackets are determined by the mean; panel (b) shows the revenue when they are determined by the median. Each panel plots the log mean revenue (vertical axis, 16.2 to 16.45) against the number of combinations (horizontal axis, 0 to 900) for TEL and for MGM with K = 3, 4, 5, and 6.
Fig. 6. Comparing revenue with different data plan options for the augmented TEL data (N = 10,000). Panel (a) shows the revenue when the data plan brackets are determined by the mean; panel (b) shows the revenue when they are determined by the median. Each panel plots the log mean revenue (vertical axis, 18.5 to 18.85) against the number of combinations (horizontal axis, 0 to 900) for TEL and for MGM with K = 3, 4, 5, and 6.
or moderating their consumption to avoid excessive usage. Both choices lead to efficient data usage, and congestion of data traffic could be reduced. The operators can allocate data flows efficiently; therefore, the 3Ps triplet of sustainability is tenable.

5.4. Discussion and limitations

Frisiani et al. (2017) point out that mobile operators can increase their sales to existing customers by making timely, appealing offers on services or hardware after analysing the customer data. We show that the operator's total revenue can increase as a result of analysing the user data. In our study, the users who choose to consume an extremely large volume of add-on data should pay more, since this volume was not subscribed to in the initial contract. We should note that TEL requires compensation for early termination, which is calculated individually as (received monthly discount + terminal equipment subsidy) × (number of remaining days of contract/total length of contract) (footnote 22). It is very hard to confirm whether all sampled users have sufficient ICT literacy to estimate their usage well. In addition, some consumers may continuously pay for the add-on because of price inelasticity effects, e.g., telecommunication subsidies paid by the employer. On the other hand, many consumers benefit from the new plan, as they could move to a lower tariff class for the basic plan. If the net change between the revenue decrease caused by down-shifting billing plans and the revenue increase caused by charging add-ons remains positive, the operator can still benefit.

charge for the consumer who alters or terminates the contractual subscription before the predetermined maturity date.
Footnote 22: See Jullien, Rey, and Sand-Zantman (2013) for example.
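The early-termination compensation described in Section 5.4 is a simple pro-rata formula. As a small illustration, assuming the contract length and remaining time are both measured in days and using made-up amounts (the function name and all numbers are hypothetical), it can be written as:

```python
def early_termination_fee(monthly_discount_received, equipment_subsidy,
                          remaining_days, contract_days):
    """(received discount + equipment subsidy) x (remaining days / contract length)."""
    if contract_days <= 0 or not 0 <= remaining_days <= contract_days:
        raise ValueError("remaining_days must lie within the contract length")
    return (monthly_discount_received + equipment_subsidy) * remaining_days / contract_days


# Hypothetical case: NT$2,400 discount received, NT$6,000 handset subsidy,
# contract terminated exactly halfway through a 730-day term.
fee = early_termination_fee(2400, 6000, remaining_days=365, contract_days=730)
```

Because the fee scales with the remaining share of the contract, it differs from person to person even under the same tariff, which is the point made in the text.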
Table 12
Comparing distributions (in percentage) of users clustered by TEL data plans and MGM method (M=1, 6, and 12) for the TEL data and augmented TEL data.
N = 1000 N = 10,000
TEL 0.0860 0.2580 0.1090 0.0850 0.2160 0.2460 0.0860 0.2554 0.1063 0.0811 0.2209 0.2503
K = 3 0.5270 0.4250 0.0480 0.3453 0.4426 0.2121

M=12 K = 4 0.1150 0.3490 0.3640 0.1720 0.1522 0.3690 0.3231 0.1557
Mean K = 5 0.0810 0.3190 0.2820 0.1300 0.1880 0.0930 0.0320 0.2940 0.0980 0.4830
K = 6 0.0980 0.1230 0.2330 0.2640 0.0590 0.2230 0.0960 0.1862 0.2365 0.3190 0.0301 0.1322
K = 3 0.4860 0.4350 0.0790 0.4770 0.4401 0.0829
M=12 K = 4 0.1680 0.4170 0.3470 0.0680 0.1683 0.4090 0.3516 0.0711
Median K = 5 0.0840 0.3330 0.2940 0.2170 0.0720 0.1141 0.3257 0.2940 0.2245 0.0417
K = 6 0.1060 0.2560 0.2340 0.2490 0.1260 0.0290 0.1049 0.2564 0.2262 0.2457 0.1383 0.0285
K = 3 0.3730 0.4100 0.2170 0.1382 0.4626 0.3992
M=6 K = 4 0.1470 0.3890 0.3230 0.1410 0.1640 0.4011 0.3173 0.1176
Mean K = 5 0.0620 0.1440 0.3420 0.2819 0.1701 0.0970 0.2680 0.2391 0.2771 0.1188
K = 6 0.0640 0.1810 0.2940 0.0430 0.2710 0.1470 0.0811 0.1301 0.3991 0.1185 0.1280 0.1432
K = 3 0.3730 0.4100 0.2170 0.2705 0.4521 0.2774
M=6 K = 4 0.1470 0.3890 0.3230 0.1410 0.0881 0.3650 0.3843 0.1626
Median K = 5 0.0640 0.1850 0.3110 0.3040 0.1360 0.1180 0.1430 0.2850 0.3210 0.1330
K = 6 0.0640 0.1740 0.3040 0.1470 0.1670 0.1440 0.1010 0.1528 0.2280 0.3205 0.1370 0.0607
K = 3 0.4540 0.3610 0.1740 0.4010 0.3780 0.2210
M=1 K = 4 0.3450 0.3090 0.2440 0.1020 0.3086 0.3370 0.2528 0.1016
Mean K = 5 0.0860 0.2640 0.3020 0.2450 0.1030 0.1308 0.2547 0.3534 0.2210 0.0401
K = 6 0.0850 0.2170 0.2130 0.2100 0.1900 0.0850 0.0960 0.2261 0.1938 0.2075 0.2012 0.0754
K = 3 0.4390 0.3810 0.1690 0.4940 0.3740 0.1320
M=1 K = 4 0.1120 0.3870 0.3400 0.1610 0.1103 0.3240 0.3722 0.1935
Median K = 5 0.1090 0.2680 0.2310 0.1420 0.2500 0.0991 0.2461 0.3130 0.2774 0.0644
K = 6 0.1030 0.2320 0.2510 0.2230 0.1220 0.0690 0.1071 0.1991 0.2963 0.2360 0.0870 0.0745
Besides the ICT industry, multipart tariff schemes have been widely adopted for behaviour-based pricing by service providers such as insurers, car rental companies, and membership organizations. Consumers are required to pay a variable per-unit fee for usage beyond a fixed allowance. We propose a method for business analytics that uses density-based subspace clustering to assess the contribution of these pricing schemes to revenue management for heterogeneous consumers with anomalous usage. An important aspect of our method is that it allows for inference with entangled (or mixed) behaviour measured by multiple dimensions. It can be applied not only to log data but also to other types of data at different frequencies and metrics, such as big data. We have highlighted several technical features of our method in Section 2.3 in comparison with other established methods. With this empirical study, we show its applicability and rigidity, particularly for clustering anomalous behaviours.
Nonetheless, our study does not come without limitations. Changes in consumption will affect the total revenue. We take the billing plans as given and work with them in a static way, without considering any sequential interaction between the operator and the consumer. Obviously, the operator can formulate the best billing plans by altering the current rates after comprehensive market research, but we cannot say whether sustainability is still achievable under another price regime without careful calculation. The method is data-driven and relies on the distribution of the samples. Applying an optimal sampling method (see Thompson, 2012, for example) might yield the most information-rich representatives that suffice for robust parameter estimates. However, randomization inference based on a small number of samples can be overly concise and obscure important points. In this respect, searching for a large sample is a nontrivial effort. In addition, researchers should attempt further exploration of the consumers' log data in accordance with privacy protection legislation.

6. Conclusion

The methods used to cluster heterogeneous demand differ in the choice of objective function, the underlying probabilistic assumptions, the data generating models, and heuristics. This paper provides a tool based on the density-based subspace clustering approach (i.e., the three-stage iterative optimization procedure) for intelligent segmentation of heterogeneous consumer demand with anomalies. This approach is particularly practical as it (1) mitigates dependency on theoretical assumptions, (2) relies on easily recognized features and benefits from recent developments in big data technologies, and (3) improves computational efficiency. In this paper, we discuss how this method was developed, taking into account the recognized data features, with a focus on the performance of the method under business engineering sustainability scenarios. In addition, as a decision-supporting system for business analytics, the algorithmic optimization identifies the best combination of pricing strategies that jointly achieves managerial goals and sustainability. This study demonstrates not only the proposed method's ability as an optimization tool in supporting managerial decision-making, but also that the analytic technique can be implemented for sustainable management in various industries involving a high variety of consumer demand.
In this study, our method attempts to mitigate the effect of anomalies in clustering heterogeneous demand. We illustrate the critical components of the method and several innovative techniques that bridge existing studies on heterogeneous demand with anomalies and the evaluation of sustainable business engineering operations. We first calibrate and validate our method based on simulated stylized data representing different behavioural patterns of demand heterogeneity. We then carry out business analytics by implementing the proposed method with real data to formulate strategic options in search of business value. The results highlight the superior performance of our method in comparison with several alternative methods, and the analytics performance shows that our method increases revenue and strengthens the sustainability goal compared to its conventional counterpart.
The preliminary results from this study pave the way for subsequent investigations using both simulation and machine learning techniques for business analytics (e.g., descriptive, predictive, and prescriptive analytics). Future studies could consider billing plans with sequential interaction between the operator and the consumer. In addition, billing plans altering the current rates shall be
investigated for their achievability. A particular case of analysing how the proposed method can be achieved with practical constraints (such as the telecommunications operator cost structure) can be considered together with a cost-and-benefit analysis.

Acknowledgement

The authors would like to thank the three anonymous reviewers and the guest editors for providing valuable comments. This work was financially supported by the Center for Open Intelligent Connectivity from The Featured Areas Research Center Program within the framework of the Higher Education Sprout Project by the Ministry of Education (MOE) in Taiwan. Chen was supported by the research project funded by InfoTech Frankfurt am Main, Germany under USt-IdNr. DE320245686. Lin was supported in part by the Ministry of Science and Technology (MOST) under Grant 106-2221-E-009-006 and Grant 106-2221-E-009-049-MY2, in part by the "Aiming for the Top University Program" of National Chiao Tung University and the Ministry of Education, Taiwan, and in part by Academia Sinica AS-105-TP-A07 and Ministry of Economic Affairs (MOEA) 106-EC-17-A-24-0619.

Appendix A. Symbols and notations

| Symbol | Description |
| --- | --- |
| $N$ | number of users |
| $M$ | number of features that are used to characterize data usages |
| $K$ | number of clusters of users |
| $x_n$ | data usages of the $n$th user measured in the observation period, where $x_n$ is a row vector with $M$ elements and $n = 1, \ldots, N$ |
|  | the set of measured data usages, equal to $\bigcup_{n=1}^{N} \{x_n\}$ |
| $\mu_k$ | mean vector of the $k$th Gaussian distribution, where $\mu_k$ is a row vector with $M$ elements and $k = 1, \ldots, K$ |
| $C_k$ | covariance matrix of the $k$th Gaussian distribution, where $k = 1, \ldots, K$ |
| $L(\cdot)$ | the measured likelihood, given the set of measured data usages |
| $f_k(x_n)$ | probability density function of $x_n$ from the $k$th Gaussian distribution with parameters $\mu_k$ and $C_k$, where $n = 1, \ldots, N$ and $k = 1, \ldots, K$ |
| $w_k$ | the weight for the $k$th Gaussian distribution, where $1 \le k \le K$, $0 \le w_k \le 1$ and $\sum_{k=1}^{K} w_k = 1$ |
| $\delta_{k,n}$ | the indicator function for the $k$th cluster, which takes the value 1 when $x_n$ is the closest to $\mu_k$ among $K$ clusters and 0 otherwise |
| $s_n$ | probability of $x_n$ to be selected as the initial $\mu_k$ of the cluster, where $0 \le s_n \le 1$ and $1 \le n \le N$ |
| $d_{i,j}$ | Euclidean distance between $x_i$ and $x_j$, where $x_i$, $x_j$ belong to the set of measured data usages and $1 \le i \ne j \le N$ |
| $d_n$ | distance vector of $x_n$ that contains $d_{n,j}$, where $1 \le j \ne n \le N$ |
| $B$ | number of disjoint bins in the histogram to approximate the probability density function of the distances from $x_n$ to all other data in the set of measured data usages |
| $a_b$ | a distance interval partitioned by the $B$-bin histogram, where $0 \le b \le B$ |
| $p_{n,b}$ | a value between 0 and 1 representing the proportion of data in the set of measured data usages whose distances $d_{n,j}$ lie in the range of $a_{b-1}$ and $a_b$, where $1 \le n \ne j \le N$, $1 \le b \le B$, $0 \le p_{n,b} \le 1$ and $\sum_{b=1}^{B} p_{n,b} = 1$ |
| $P_n$ | vector of $p_{n,b}$, where $1 \le n \le N$ |
| $H(d_n)$ | discrete entropy of $d_n$ |
| $F$ | number of folds in the cross validation |
| $T$ | training set used in the cross validation, a subset of the set of measured data usages |
| $V$ | validation subset used in the cross validation, a subset of the set of measured data usages |
|  | basic data plan given by TEL |
|  | tariff rate for the basic data plan given by TEL |
| $a_i$, $p_i$ | TEL fees |
|  | data plan derived by MGM |
|  | tariff rate for the data plan derived by MGM |

Appendix B. Algorithms

Algorithm 1 Initialization algorithm.
Input: the dataset; the number of clusters $K$; the number of histogram bins $B$
Output: initial parameters for Gaussian mixture model, including $\mu_k$, $C_k$ and $w_k$ for $1 \le k \le K$
1: for $i = 1$ to $N$ do
2:   for $j = 1$ to $N$ do
3:     $d_{i,j} = \|x_i - x_j\|_2$
4:   end for
5:   for $b = 1$ to $B$ do
6:     $p_{i,b} = \frac{1}{N} \sum_{j=1}^{N} \mathbf{1}(a_{b-1} \le d_{i,j} \le a_b)$
7:   end for
8:   $H(d_i) = \sum_{b=1}^{B} p_{i,b} \log p_{i,b}$
9: end for
10: for $n = 1$ to $N$ do
11:   $s_n = \dfrac{H(d_n)}{\sum_{i=1}^{N} H(d_i)}$
12: end for
13: for $k = 1$ to $K$ do
14:   $r = \mathrm{Random}(0, 1)$
15:   $i^{*} = \arg\min_{1 \le n \le N} \{s_n - r \ge 0\}$
16:   $\mu_k \leftarrow x_{i^{*}}$
17: end for
18: for $k = 1$ to $K$ do
19:   $\theta_k = \{x : \|x - \mu_k\|_2 \le \|x - \mu_j\|_2 \ \forall j \ne k,\ 1 \le j \le K\}$, for $x$ in the dataset
20:   $w_k = \dfrac{|\theta_k|}{N}$
21:   $C_k = \dfrac{1}{|\theta_k|} \sum_{\forall x \in \theta_k} (x - \mu_k)(x - \mu_k)^{T}$
22: end for

Algorithm 2 EM algorithm.
Input: the dataset; the initial parameters derived from Algorithm 1, including $\mu_k$, $C_k$ and $w_k$ for $1 \le k \le K$
Output: the refined parameters for the mixture Gaussian model: $\mu_k$, $C_k$, $w_k$; and $\theta_k$ for $1 \le k \le K$
1: repeat
2:   Compute $w_{k,m}$ for $1 \le k \le K$ and $1 \le m \le$ (size of the dataset) with Eq. (7)
3:   Update $w_k$, $\mu_k$ and $C_k$ for $1 \le k \le K$ with Eqs. (8)–(10)
4:   Compute the likelihood $L$ with Eq. (2)
5: until $L$ has converged.
6: for $m = 1$ to (size of the dataset) do
7:   $\delta_m \leftarrow \arg\max_{1 \le k \le K} w_{k,m}$ with Eq. (7)
8: end for
9: for $k = 1$ to $K$ do
10:   for $m = 1$ to (size of the dataset) do
11:     $\theta_k = \bigcup_{\delta_m = k} \{x_m\}$
12:   end for
13: end for
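The initialization in Algorithm 1 can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `entropy_init`, the equal-width bin edges spanning the observed distances, and the reading of step 15 as roulette-wheel sampling with probabilities $s_n$ are all assumptions introduced here.

```python
import numpy as np

def entropy_init(X, K, B, rng=None):
    """Sketch of Algorithm 1: entropy-weighted seeding of a Gaussian mixture."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    N, M = X.shape

    # Steps 1-4: pairwise Euclidean distances d[i, j].
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

    # Assumption: bin edges a_0..a_B are equally spaced over [0, max distance].
    edges = np.linspace(0.0, d.max(), B + 1)

    # Steps 5-8: histogram proportions p[i, b] and H(d_i) = sum_b p log p.
    H = np.empty(N)
    for i in range(N):
        counts, _ = np.histogram(d[i], bins=edges)
        p = counts / N
        nz = p > 0                      # treat 0 * log 0 as 0
        H[i] = np.sum(p[nz] * np.log(p[nz]))

    # Steps 10-12: normalise to selection probabilities s_n.
    # The sign of H cancels in the ratio, so s_n is non-negative.
    total = H.sum()
    s = H / total if total != 0 else np.full(N, 1.0 / N)

    # Steps 13-17 (assumed reading): draw K distinct seeds with probability s_n.
    seeds = rng.choice(N, size=K, replace=False, p=s)
    mu = X[seeds]

    # Steps 18-22: nearest-seed partition, weights and covariances.
    dist_to_mu = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
    labels = dist_to_mu.argmin(axis=1)
    w = np.bincount(labels, minlength=K) / N
    C = np.empty((K, M, M))
    for k in range(K):
        diff = X[labels == k] - mu[k]
        C[k] = diff.T @ diff / len(diff)
    return mu, C, w
```

Sampling without replacement avoids duplicate seeds, which would otherwise leave a cluster empty in step 19; that choice is a convenience of the sketch rather than something the algorithm states.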
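Algorithm 2 refers to Eqs. (7)–(10) and Eq. (2), which are not reproduced in this appendix. The sketch below assumes they are the usual responsibilities, weight, mean and covariance updates, and log-likelihood of a Gaussian mixture. The names `log_gauss` and `em_gmm`, the tolerance, the iteration cap and the small ridge `reg` added to each covariance are hypothetical choices made for illustration.

```python
import numpy as np

def log_gauss(X, mu, C):
    """Log-density of each row of X under N(mu, C), via a Cholesky factor."""
    M = X.shape[1]
    L = np.linalg.cholesky(C)
    z = np.linalg.solve(L, (X - mu).T)          # whitened residuals, shape (M, N)
    log_det = 2.0 * np.log(np.diag(L)).sum()
    return -0.5 * (M * np.log(2.0 * np.pi) + log_det + (z ** 2).sum(axis=0))

def em_gmm(X, mu, C, w, tol=1e-6, max_iter=200, reg=1e-6):
    """Sketch of Algorithm 2 with standard Gaussian-mixture EM updates (assumed)."""
    X = np.asarray(X, dtype=float)
    mu, C, w = (np.array(a, dtype=float) for a in (mu, C, w))
    N, M = X.shape
    K = len(w)
    prev = -np.inf
    for _ in range(max_iter):
        # Step 2: responsibilities w_{k,m}, computed in log space for stability.
        logp = np.stack([
            np.log(w[k] + 1e-300) + log_gauss(X, mu[k], C[k] + reg * np.eye(M))
            for k in range(K)
        ])                                       # shape (K, N)
        top = logp.max(axis=0)
        log_mix = top + np.log(np.exp(logp - top).sum(axis=0))
        resp = np.exp(logp - log_mix)

        # Step 3: update weights, means and covariances.
        Nk = resp.sum(axis=1) + 1e-12
        w = Nk / N
        mu = (resp @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            C[k] = (resp[k][:, None] * diff).T @ diff / Nk[k]

        # Steps 4-5: stop when the log-likelihood no longer changes.
        ll = log_mix.sum()
        if abs(ll - prev) < tol:
            break
        prev = ll

    # Steps 6-13: hard assignment to the component with the largest responsibility.
    labels = resp.argmax(axis=0)
    return mu, C, w, labels
```

The returned `labels` play the role of $\delta_m$; the clusters $\theta_k$ are then the rows of `X` with `labels == k`.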
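Algorithm 3 Estimating the optimal number of components in MGM and their parameters.
Input: the dataset; the number of partitions in cross validation $F$; the range of candidate components for MGM $[K_{\min}, K_{\max}]$
Output: the parameters for Gaussian mixture models, including the number of components $K$, $\mu_k$, $C_k$ and $w_k$ for $1 \le k \le K$
1: Randomly partition the dataset into $F$ disjoint subsets, indexed by $f$ for $1 \le f \le F$
2: for $K = K_{\min}$ to $K_{\max}$ do
3:   for $f = 1$ to $F$ do
4:     $T \leftarrow$ the dataset without subset $f$
5:     $V \leftarrow$ subset $f$
6:     Using $T$ to initialize parameters $\mu_k$, $C_k$ and $w_k$ with Algorithm 1
7:     Using $T$ to obtain parameters $\mu_k$, $C_k$ and $w_k$ with Algorithm 2
8:     $\mathrm{AIC}_{f,K} \leftarrow$ using $V$ to evaluate AIC, see Eq. (11)
9:     $\mathrm{BIC}_{f,K} \leftarrow$ using $V$ to evaluate BIC, see Eq. (12)
10:    $\mathrm{HQC}_{f,K} \leftarrow$ using $V$ to evaluate HQC, see Eq. (13)
11:  end for
12:  $\mathrm{AIC}_K \leftarrow \frac{1}{F} \sum_{f=1}^{F} \mathrm{AIC}_{f,K}$
13:  $\mathrm{BIC}_K \leftarrow \frac{1}{F} \sum_{f=1}^{F} \mathrm{BIC}_{f,K}$
14:  $\mathrm{HQC}_K \leftarrow \frac{1}{F} \sum_{f=1}^{F} \mathrm{HQC}_{f,K}$
15: end for
16: $K \leftarrow \arg\min_{K_{\min} \le K \le K_{\max}} \mathrm{AIC}_K + \mathrm{BIC}_K + \mathrm{HQC}_K$
17: $\mu_k \leftarrow \mu_k$ for $1 \le k \le K$
18: $C_k \leftarrow C_k$ for $1 \le k \le K$
19: $w_k \leftarrow w_k$ for $1 \le k \le K$

Algorithm 6 Total revenue calculation.
Input: the dataset; the TEL data allowance; the TEL tariff; the TEL overage plan $a_i$ and its rate $p_i$ for $i = 1, 2$
Output: the total revenue Rev, as a function of the dataset, the TEL data allowance, the TEL tariff, $a_i$ and $p_i$
1: for $n = 1$ to $N$ do
2:   $\delta_n \leftarrow \arg\min_{k} \sum_{m=1}^{12} \gamma_k + V(x_{n,m}, \lambda_k)$
3: end for
4: for $k = 1$ to 6 do
5:   for $n = 1$ to $N$ do
6:     $\bar{\theta}_k = \bigcup_{\delta_n = k} \{x_n\}$
7:   end for
8: end for
9: $\mathrm{Rev} = \sum_{k=1}^{6} \sum_{n=1}^{|\bar{\theta}_k|} \sum_{m=1}^{12} \gamma_k + V(x_{n,m}, \lambda_k)$
10: function $V(x, \lambda)$   ▷ Calculating the charge of add-on $a_1$ and $a_2$
11:   $n_1 = 0$   ▷ Initializing the number of $a_1$ (0.2 gigabytes)
12:   $n_2 = 0$   ▷ Initializing the number of $a_2$ (1 gigabyte)
13:   if $x > \lambda$ then
14:     if $(x - \lambda) - \lfloor x - \lambda \rfloor > 2 a_1$ then   ▷ only consider $a_2$ because of $3 p_1 > p_2$
15:       $n_2 = \lceil x - \lambda \rceil$
16:     else   ▷ comprising $a_1$ and $a_2$
17:       $n_1 = \left\lceil \dfrac{(x - \lambda) - \lfloor x - \lambda \rfloor}{a_1} \right\rceil$
18:       $n_2 = \lfloor x - \lambda \rfloor$
19:     end if
20:   end if
21:   $V = p_1 n_1 + p_2 n_2$
22: end function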
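The overage function $V$ and the revenue sum in Algorithm 6 can be illustrated with a short Python sketch. It assumes that the brackets lost in typesetting are floors and ceilings (the fractional part of the excess decides between add-ons $a_1$ and $a_2$), that the monthly fee $\gamma_k$ is charged in each of the 12 months, and that each user is billed under the plan that minimises their annual cost. The function names, the rounding guard against floating-point noise, and any prices passed in are hypothetical.

```python
import math

def overage_charge(x, lam, p1, p2, a1=0.2):
    """Charge for usage x (GB) above allowance lam, using add-ons a1 and a2 = 1 GB."""
    n1 = n2 = 0
    if x > lam:
        excess = round(x - lam, 9)          # guard against binary rounding noise
        whole = math.floor(excess)
        frac = round(excess - whole, 9)
        if frac > 2 * a1:
            # Three or more a1 packs would cost more than one a2 (3*p1 > p2).
            n2 = math.ceil(excess)
        else:
            n1 = math.ceil(round(frac / a1, 9))
            n2 = whole
    return p1 * n1 + p2 * n2

def total_revenue(usage, allowances, tariffs, p1, p2):
    """Sum, over users, of the cheapest annual bill across the available plans.

    usage: one sequence of 12 monthly usages (GB) per user.
    allowances, tariffs: lambda_k and gamma_k for each plan k.
    """
    total = 0.0
    for monthly in usage:
        annual_costs = [
            sum(gamma + overage_charge(x, lam, p1, p2) for x in monthly)
            for lam, gamma in zip(allowances, tariffs)
        ]
        total += min(annual_costs)
    return total
```

With hypothetical prices `p1 = 1.0` and `p2 = 2.5`, a user 0.3 GB over the allowance buys two $a_1$ packs, while a user 0.5 GB over buys one $a_2$ pack instead of three $a_1$ packs.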
Algorithm 4 Bootstrapping algorithm.
Input: the dataset; the number of bootstraps $N_B$; the sampling threshold
Output: the set of bootstrapped data
1: Initialization: the bootstrapped set $\leftarrow \emptyset$
2: while the size of the bootstrapped set $< N_B$ do
3:   for $n = 1$ to (size of the dataset) do
4:     Generating uniformly distributed random number: $r = \mathrm{Random}(0, 1)$
5:     if $r \ge$ the sampling threshold and the size of the bootstrapped set $< N_B$ then
6:       add $x_n$ to the bootstrapped set
7:     end if
8:   end for
9: end while

Algorithm 5 Determination of the tariff for MGM
Input: the number of clusters $K$; the TEL data allowance $\bigcup_{1 \le i \le 6} \{\lambda_i\}$; the TEL tariff $\bigcup_{1 \le i \le 6} \{\gamma_i\}$; the data allowance determined by MGM, with elements indexed by $1 \le i \le K$
Output: the tariff for MGM, with elements indexed by $1 \le i \le K$
1: for $i = 1$ to $K$ do
2:   $j \leftarrow$ the index $1 \le j \le 6$ that minimises the absolute difference between the $i$th MGM data allowance and $\lambda_j$
3:   the $i$th MGM tariff $\leftarrow \gamma_j$
4: end for

References

Akaike, H. (1974). Information theory and an extension of the maximum likelihood principle. IEEE Transactions on Automatic Control, 19(6), 716–723.
Anderson, S., & De Palma, A. (2009). Information congestion. The RAND Journal of Economics, 40(4), 688–709.
Bagh, A., & Bhargava, H. K. (2013). How to price discriminate when tariff size matters. Marketing Science, 32(1), 111–126.
Bagirov, A., & Yearwood, J. (2006). A new nonsmooth optimization algorithm for minimum sum-of-squares clustering problems. European Journal of Operational Research, 170, 578–596.
Bai, C., Dhavale, D., & Sarkis, J. (2016). Complex investment decisions using rough set and fuzzy c-means: An example of investment in green supply chains. European Journal of Operational Research, 248(2), 507–521.
Billieux, J. (2012). Problematic use of the mobile phone: A literature review and a pathways model. Current Psychiatry Reviews, 8(4), 1–9.
Bishop, C. M. (2006). Pattern recognition and machine learning. Springer.
Brito, D., P., P., & Vereda, J. (2010). Can two-part tariffs promote efficient investment on next generation networks. International Journal of Industrial Organization, 28, 323–333.
Bushnell, J., & Mansur, E. (2005). Consumption under noisy price signals: A study of electricity retail rate deregulation in San Diego. The Journal of Industrial Economics, 53(4), 493–513.
Campello, R., Moulavi, D., & Sander, J. (2013). Density based clustering based on hierarchical density estimates. In Proceedings of the 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) (pp. 160–172). Australia: Gold Coast.
Carrizosa, E., Mladenović, N., & Todosijević, R. (2014). Variable neighborhood search for minimum sum-of-squares clustering on networks. European Journal of Operational Research, 230(2), 356–363.
Cerquet, R., Falbo, P., & Pelizzari, C. (2017). Relevant states and memory in Markov chain bootstrapping and simulation. European Journal of Operational Research, 219, 134–145.
Chen, Y.-J., & Huang, K.-W. (2016). Pricing data services: pricing by minutes, by gigs or by megabytes per second? Information Systems Research, 27(3), 596–617.
Chioveanu, I., & Zhou, J. (2013). Price competition with consumer confusion. Management Science, 59(11), 2450–2469.
Claeskens, G., & Hjort, N. (2008). Model selection and model averaging. Cambridge University Press.
Crémer, J., Rey, P., & Tirole, J. (2000). Connectivity in the commercial internet. The Journal of Industrial Economics, 48(4), 433–472.
Davenport, T. H., & Prusak, L. (1997). Information ecology. Oxford University Press.
de Reuver, M., de Koning, T., Bouwman, H., & Lemstra, W. (2009). How new billing processes reshape the mobile industry. Info, 11(1), 78–93.
Dias, J., Vermunt, J., & Ramos, S. (2015). Clustering financial time series: New insights from an extended hidden Markov model. European Journal of Operational Research, 243, 852–864.
Elkington, J. (1997). Cannibals with forks: The triple bottom line of the 21st century. Oxford Press.
Ericsson (2012). Ericsson mobility report: On the pulse of the networked society. Whitepaper 2012.
Eryomin, A. L. (1998). Information ecology – a viewpoint. International Journal of Environmental Studies, 54(3–4), 241–253.
Fallah-Fini, S., Triantis, K., & Seaver, W. (2012). Measuring the efficiency of highway maintenance contracting strategies: A bootstrapped non-parametric meta-frontier approach. European Journal of Operational Research, 219, 134–145.
Ferrer, J.-C., Mora, H., & Olivares, F. (2010). On pricing of multiple bundles of products and services. European Journal of Operational Research, 206, 197–208.
Fibich, G., Klein, R., Koeigsberg, O., & Muller, E. (2017). Optimal three-part tariff plans. Operations Research, 65(5), 1177–1189.
Filippone, M., Camastra, F., Masulli, F., & Rovetta, S. (2008). A survey of kernel and spectral methods for clustering. Pattern Recognition, 41(1), 176–190.
Fjeldstad, O., Snow, C., Miles, R., & Lettl, C. (2012). The architecture of collaboration. Strategic Management Journal, 33, 734–750.
Fränti, P., Virmajoki, O., & Hautamäki, V. (2006). Fast agglomerative clustering using a k-nearest neighbor graph. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28, 1875–1881.
Frisiani, G., Jubas, J., Lajous, T., & Nattermann, P. (2017). A future for mobile operators: The keys to successful reinvention. McKinsey & Company Report, 1–10.
Gouvea, R., Kapelianis, D., & Kassicieh, S. (2018). Assessing the nexus of sustainability and information and communications technology. Technological Forecasting and Social Change, 130, 39–44.
Hannan, E. J., & Quinn, B. G. (1979). The determination of the order of an autoregression. Journal of the Royal Statistical Society, 41(2), 490–495.
Heyns, H. (2015). Becoming an analytics-driven organisation to create value. Ernst & Young LLP, UK.
Iyengar, R., Ansari, A., & Gupta, S. (2007). A model of consumer learning for service quality and usage. Journal of Marketing Research, 44(4), 529–544.
Jaehn, F. (2016). Sustainable operations. European Journal of Operational Research, 253, 243–264.
Jain, A. K. (2010). Data clustering: 50 years beyond k-means. Pattern Recognition Letters, 31(8), 651–666.
Jain, D., Muller, E., & Vilcassim, N. (1999). Pricing patterns of cellular phones and phonecalls: A segment-level analysis. Management Science, 45(2), 131–141.
Jenkin, T. A., Webster, J., & McShane, L. (2011). An agenda for 'green' information technology and systems research. Information and Organization, 21(1), 17–40.
Jullien, B., Rey, P., & Sand-Zantman, W. (2013). Termination fees revisited. International Journal of Industrial Organization, 31, 738–750.
Kalayci, K. (2015). Price complexity and buyer confusion in markets. Journal of Economic Behavior & Organization, 111, 154–168.
Khreich, W., Granger, E., Miri, A., & Sabourin, R. (2012). A survey of techniques for incremental learning of HMM parameters. Information Sciences, 197, 105–130.
Kim, D.-W., Lee, K., Lee, D., & Lee, K. H. (2005). Evaluation of the performance of clustering algorithms in kernel induced feature space. Pattern Recognition, 38, 607–611.
Koutsopoulou, M., Kaloxylos, A., Alonistioti, A., Merakos, L., & Kawamura, K. (2004). Charging, accounting and billing management schemes in mobile telecommunication networks and the internet. IEEE Communications Surveys & Tutorials, 6(1), 50–58.
Kunc, M., & O'Brien, F. A. (2018). The role of business analytics in supporting strategy processes: Opportunities and limitations. Journal of the Operational Research Society. doi:10.1080/01605682.2018.1475104.
Laffont, J.-J., & Tirole, J. (1999). Competition in telecommunications. MIT Press.
Lahiri, A., Dewan, R., & Marshall, F. (2013). Pricing of wireless services: service pricing vs. traffic pricing. Information Systems Research, 24(2), 418–435.
Lee, B., Murray, N., & Qiao, Y. (2015). Active accounting and charging for programmable wireless networks. Mobile Networks and Applications, 20(1), 111–120.
Lee, D., Mo, J., Jin, G., & Park, J. (2012). Price of simplicity under congestion. IEEE Journal on Selected Areas in Communications, 30(11), 2158–2168.
Lee, J.-K. (2016). Reflections on ICT-enabled bright society research. Information Systems Research, 27(1), 1–5.
Lin, T. (2009). Maximum likelihood estimation for multivariate skew normal mixture models. Journal of Multivariate Analysis, 100(2), 257–269.
Lin, Y.-B., Lin, Y.-W., Wu, R.-C., & Wang, Y.-C. (2016). An investigation of telecom mobile data billing plans. Journal of Internet Services and Information Security, 6(3), 1–26.
Ma, X., Deng, T., Xue, M., Shen, Z.-J., & Lan, B. (2017). Optimal dynamic pricing of mobile data plans in wireless communications. Omega, 66, 91–105.
Masuda, Y., & Whang, S. (2006). On the optimality of fixed-up-to tariff for telecommunications service. Information Systems Research, 17, 247–253.
Meyer, P., & Olteanu, A.-L. (2013). Formalizing and solving the problem of clustering in MCDA. European Journal of Operational Research, 227(3), 494–502.
Min, S., Zhang, X., Kim, N., & Strivastava, R. K. (2016). Customer acquisition and retention spending: An analytical model and empirical investigation in wireless telecommunications markets. Journal of Marketing Research, 1–53. doi:10.1509/jmr.14.0170.
Mortenson, M., Doherty, N., & Robinson, S. (2015). Operational research from taylorism to terabytes: A research agenda for the analytics age. European Journal of Operational Research, 241(3), 583–595.
Nguyen, T. M., Wu, J. Q. M., & Zhang, H. (2014). Bounded generalized Gaussian mixture model. Pattern Recognition, 47(9), 3133–3142.
Racherla, P., & Mandviwalla, M. (2013). Moving from access to use of the information infrastructure: a multilevel sociotechnical framework. Information Systems Research, 24(3), 709–730.
Ransbotham, S., & Kiron, D. (2017). Analytics as a source of business innovation. MIT Sloan Management Review, 1–19.
Ray, S., & Ren, D. (2012). On the upper bound of the number of modes of a multivariate normal mixture. Journal of Multivariate Analysis, 108, 41–52.
Royston, G. (2013). Operational research for the real world: big questions from a small island. Journal of the Operational Research Society, 64, 793–804.
Sander, J., Ester, M., Kriegel, H.-P., & Xu, X. (1998). Density based clustering in spatial databases: the algorithm GDBSCAN and its applications. Data Mining and Knowledge Discovery, 2(2), 169–194.
Santi, É., Aloise, D., & Blanchard, S. (2016). A model for clustering data from heterogeneous dissimilarities. European Journal of Operational Research, 253(3), 659–672.
Schlager, T., & Maas, P. (2013). Fitting international segmentation for emerging markets: Conceptual development and empirical illustration. Journal of Marketing Research, 21(2), 39–61.
Schwab, K. (2016). The fourth industrial revolution. World Economic Forum.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6(2), 461–464.
Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888–905.
Siebert, R. (2015). Entering new markets in the presence of competition: Price discrimination versus cannibalization. Journal of Economics & Management Strategy, 24(2), 369–389.
Sodhi, M. (2015). Conceptualizing social responsibility in operations via stakeholder resource-based view. Production and Operations Management, 24(9), 1375–1389.
Suissa, A. J. (2015). Cyber addictions: toward a psychosocial perspective. Addictive Behaviors, 43, 28–32.
Sumantam, B., Chakraborty, S., & Sharma, M. (2015). Pricing cloud services- the impact of broadband quality. Omega, 50, 96–114.
Sun, E., & Meinl, M. (2012). A new wavelet-based denoising algorithm for high-frequency financial data mining. European Journal of Operational Research, 217, 589–599.
Tang, C. S., & Zhou, S. (2012). Research advances in environmentally and socially sustainable operations. European Journal of Operational Research, 223, 589–599.
van der Maaten, L., & Hinton, G. (2008). Visualizing high-dimensional data using t-SNE. Journal of Machine Learning Research, 9, 2579–2605.
Thompson, S. K. (2012). Sampling. Wiley.
Tossell, C., Kortum, P., Shepard, C., Rahmati, A., & Zhong, L. (2015). Exploring smartphone addiction: Insights from long-term telemetric behavioral measures. iJIM, 9(2), 37–43.
Ulhøi, J. P. (1995). Corporate environmental and resource management: In search of a new managerial paradigm. European Journal of Operational Research, 80, 2–15.
Uratnik, M. (2016). Interactional service innovation with social media users. Service Science, 8(3), 300–319.
Vidgen, R., Shaw, S., & Grant, D. (2017). Management challenges in creating value from business analytics. European Journal of Operational Research, 261, 626–639.
Wang, X., Ma, R., & Xu, Y. (2017). The role of data cap in optimal two-part network pricing. IEEE/ACM Transactions on Networking, 25(6), 3602–3615.
Xie, J., Gao, H., Xie, H., Liu, X., & Grant, P. (2016). Robust clustering by detecting density peaks and assigning points based on fuzzy weighted k-nearest neighbors. Information Sciences, 354, 19–40.
Xu, X., Liu, X., & Chen, Y. (2009). Applications of axiomatic fuzzy set clustering method on management strategic analysis. European Journal of Operational Research, 198, 297–304.
Yan, Z. (2015). Encyclopedia of mobile phone behavior. IGI Global.
Yang, B., & Ng, C. (2010). Pricing problem in wireless telecommunication product and service bundling. European Journal of Operational Research, 207, 473–480.
Yang, M.-S., Lai, C.-Y., & Lin, C.-Y. (2012). A robust EM clustering algorithm for Gaussian mixture models. Pattern Recognition, 45, 3950–3961.
Yao, W. (2010). A profile likelihood method for normal mixture with unequal variance. Journal of Statistical Planning and Inference, 40(7), 2089–2098.
Zimek, A., Schubert, E., & Kriegel, H.-P. (2012). A survey on unsupervised outlier detection in high-dimensional numerical data. Statistical Analysis and Data Mining, 5, 363–387.