Abstract- One of the major concepts in business analytics is
identifying anomalies over time, also called trend analysis. This can
be done easily in pivot tables by using time as one of the dimensions,
usually across columns. However, the trending information by itself is
insufficient for quick and insightful observations. Ranking the time
series to identify similar units of information can accelerate the
analysis process. Similarly, projecting the time series into the future
helps the decision maker to proactively build alternate plans for
differing scenarios. Such analysis and modeling require aggregated data
on demand. The current breed of row store databases has limited
capability to provide the response time required for such analysis and
modeling. Hence, column store databases are expected to be better
alternatives for these types of problems. This work presents the
authors' attempt to develop such a system, named ‘rePivot’. The
proposed framework consists of three modules: a column store database
to provide quick access to data, a time series ranking module and a
probabilistic forecasting module. A case study of the proposed
framework in churn analysis and modeling in telecom has been carried
out to test the suitability of the framework for industrial
applications. Application of the framework has shown promising results.
Work is in progress to develop additional modules for survival
analytics of individual entities in the database.

Keywords- Business Analytics, Telecom Churn, multidimensional data,
SQL, prediction model.

I. INTRODUCTION

Today's businesses have sophisticated data analysis requirements. The
metrics or analyses of a business's data can be difficult to obtain. To
calculate a meaningful metric, business analysts often use spreadsheets
to analyze data manually. Manual analysis is a tedious and
time-consuming process. Most applications fail to deliver useful
metrics that provide unique insights into an organization's
performance. Useful metrics highlight significant performance measures
of the business. Typically, business analysts must execute multiple
queries and perform other time-consuming manual interventions to
produce these metrics. Then, despite this effort, analysts must start
the process again from the start to obtain follow-up information, such
as an explanation of a particular anomaly in a metric.

Typically, a business's data is stored on one or more databases. These
databases are operated with associated database servers, which manage
the storage and retrieval of records from the databases. Analytical
servers have additionally been provided to format database queries or
information requests sent from a client user interface to the database
server for handling. The analytical servers can be used to improve the
efficiency of database accesses and to provide metrics of interest to
the user from the records retrieved from the database.

One of the major approaches to business analytics is to identify an
impending change in trend before it accelerates. Early warning systems
of this kind are very important and are useful in various scenarios
such as trading financial securities, predicting sales performance and
analyzing churn. However, using only a scalar value to compare multiple
series, rank them and project the series to the future is not
appropriate, even though it is practically possible.

In this research work an attempt has been made to develop a system for
analyzing multidimensional data, ranking the time series and projecting
the time series into the future. The framework consists of a crosstab
query that provides the transposed time series data for selected
dimensions, a similarity search module that ranks the time series data
and a prediction module based on a probabilistic forecasting model.

II. ARCHITECTURE

The proposed framework consists of three major components: (i) an
analytical query system based on an advanced column store database
system, which provides the aggregated time series data from a star
schema data mart, (ii) a time series ranking module, which ranks the
time series with an adaptive algorithm, and (iii) a prediction module,
which provides simple but effective parametric model building
capabilities. A simplified architecture diagram of the proposed system
is shown in figure 1. These components are integrated on a
Once the time series data for the user-defined criteria has been
extracted from the query system, it is processed by the time series
ranking module.
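
As a rough illustration of the extraction step, the sketch below aggregates fact rows into one time series per dimension value, which is the shape of data a crosstab query hands to the ranking module. It is plain Python rather than the column store query used by the authors; the function name `crosstab`, the field names and the sample rows are all hypothetical and chosen only for this example.

```python
from collections import defaultdict


def crosstab(rows, dim_key, time_key, measure_key):
    """Aggregate fact rows into one time series per dimension value."""
    periods = sorted({row[time_key] for row in rows})
    dims = sorted({row[dim_key] for row in rows})
    totals = defaultdict(float)
    for row in rows:
        totals[(row[dim_key], row[time_key])] += row[measure_key]
    # One entry per dimension value, one column per period; empty cells are 0.
    table = {d: [totals.get((d, p), 0.0) for p in periods] for d in dims}
    return periods, table


# Hypothetical churn fact rows (values invented for illustration only).
facts = [
    {"region": "North", "month": "2009-01", "churned": 12},
    {"region": "North", "month": "2009-02", "churned": 15},
    {"region": "South", "month": "2009-01", "churned": 7},
    {"region": "South", "month": "2009-02", "churned": 4},
    {"region": "North", "month": "2009-02", "churned": 3},
]
periods, table = crosstab(facts, "region", "month", "churned")
```

Each value in `table` is a time series aligned with `periods`, so series for different dimension values can be compared period by period.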
A trend, i.e. a time series, can show prominent patterns that are of
interest to business analysts. Some of these are [11]:
1. Vary considerably over the past few periods
2. Increase greatly
3. Drop drastically
4. Increase greatly and then drop drastically
5. Perform differently than the total trend
series. Large variances suggest a very different development, while
small variances indicate a similar development pattern.

Since the values of each series are very different, it is not possible
to compare the series values directly. To make the series comparable,
each series is normalized by dividing its individual values by the
series mean.

Once the data is normalized, the square of a sum of differences between
the individual values of the time series and the values of the overall
mean vector is computed, which results in a scalar value for each
series. Ranking these scalars provides statistically valid ranks for
the time series. The algorithm for computing the ranking of time series
is shown in figure 3.

automatically adapts to the underlying data, it also delivers
trustworthy conclusions for a completely different set of time series.

C. Prediction with a Probabilistic Projection Model

Customer defection is a prominent issue in subscription-based
industries such as mobile phones, credit cards and internet services.

The major characteristics of customer behavior prediction are the
contractual agreement between the company and its customers, the
acquisition cost for new customers and the availability of large
datasets of behavior at the customer level. These can provide an
ability to predict the defection point of an individual customer.
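
As a minimal sketch of the normalization and ranking procedure described earlier for the time series ranking module, the Python below divides each series by its own mean, builds an overall mean vector across the normalized series, and scores each series by its deviation from that vector. Two points are assumptions rather than the authors' algorithm (figure 3 is not reproduced here): the score is taken as the sum of squared differences, because a plain sum of differences is always zero for mean-normalized series, and series are ordered from most to least deviant. The function names and sample data are hypothetical, and every series is assumed to have a nonzero mean.

```python
def normalize(series):
    """Divide each value by the series mean (assumes a nonzero mean)."""
    mean = sum(series) / len(series)
    return [value / mean for value in series]


def rank_series(table):
    """Rank series by squared deviation from the overall mean vector."""
    normalized = {name: normalize(vals) for name, vals in table.items()}
    n_periods = len(next(iter(normalized.values())))
    # Overall mean vector: per-period average of the normalized series.
    mean_vector = [
        sum(s[t] for s in normalized.values()) / len(normalized)
        for t in range(n_periods)
    ]
    # Assumption: score = sum of squared differences from the mean vector.
    scores = {
        name: sum((v - m) ** 2 for v, m in zip(s, mean_vector))
        for name, s in normalized.items()
    }
    # Assumption: the most deviant series is ranked first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical series: A and B are flat at different scales, C is rising.
sample = {"A": [10, 10, 10], "B": [100, 100, 100], "C": [1, 2, 3]}
ranked = rank_series(sample)
```

In this example A and B receive identical scores despite their different magnitudes, which is the effect the normalization step is meant to achieve, and C is ranked first because its shape differs most from the overall trend.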
This analysis can be done in two modes: manual and automatic. The
process is the same in both modes; only the space in which the analysis
is carried out differs. In manual mode, the user selects the dimensions
of interest. In automatic mode, a predefined structure for hierarchical
analysis is followed.
The data is drawn from the data warehouse into the churn data mart.
Both manual and automated ranking analysis of the projection are
carried out.
A sample analysis report for the case study is shown in figure 7. It
shows the regions for which churn trends have been identified. In
manual mode, the decision maker can choose the graphs of interest for
further analysis; in automatic mode, all the analysis is done at the
back end.
Regression (MLR). Table 1 shows the results for different queries
executed in manual mode. It can be observed that the proposed system
achieves better MAE and RMSE than regression. A statistical test for
change in the average error has confirmed that the proposed approach
significantly outperforms the other methods on aggregate projection.
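
For reference, the two error measures used in this comparison can be computed as in the short Python sketch below, using their standard definitions. The function names and the sample values are hypothetical and are not taken from Table 1.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical actual and projected values (invented for illustration).
actual = [100, 120, 90, 110]
predicted = [110, 115, 100, 100]
mae_value = mae(actual, predicted)
rmse_value = rmse(actual, predicted)
```

RMSE is never smaller than MAE on the same data, and it penalizes large individual errors more heavily, so reporting both gives a sense of how uneven the projection errors are.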
Figure 8. Projections for contract type with actual values.

 3   51  186  400  1000
 5   37   73  480  1400
 7    9   37   85   320
 8   29   75  120   350
10   15   59   75   405
IV. CONCLUSIONS
REFERENCES
[1] Abadi D.J., Madden S.R., Hachem N., “Column-Stores vs. Row-Stores:
How Different Are They Really?”, SIGMOD 2008, June 2008, Vancouver,
Canada.
[2] Berry M.J.A., Linoff G.S., Data Mining Techniques: For Marketing,
Sales, and Customer Relationship Management, O’Reilly, 2004.
[3] Fader P.S., Hardie B.G.S., “Probability Models for Customer-Base
Analysis”, Journal of Interactive Marketing, 23, 2009, 61-69.
[4] Fader P.S., Hardie B.G.S., “How to Project Customer Retention”,
May 2006, available at SSRN.
[5] Ivanova M., Nes N., Goncalves R., Kersten M.L., “MonetDB/SQL Meets
SkyServer: the Challenges of a Scientific Database”, in Proceedings of
the International Conference on Scientific and Statistical Database
Management (SSDBM), July 2007.
[6] Celko J., Joe Celko’s SQL for Smarties: Advanced SQL Programming,
Morgan Kaufmann, 2005.
[7] Kimball R., The Data Warehouse Toolkit, Second Edition, Wiley,
2004.
[8] Parr Rud O., Data Mining: Modeling Data for Marketing, Risk and
CRM, Wiley-Dreamtech, 2003.
[9] R Manual, Cran.r-project.org/manual.html
[10] Tien P.L., Lin T., McGranaghan M., “Some tips and examples for
using SAS@PROCTABULATE”, Proceedings of the SUGI22.
[11] http://blog.bissantz.com/imetrics-1, last accessed on 29 August,
2009.

Dr. Rajappa Velur is working as a Professor and Dean Academics at
Cambridge Institute of Technology, Bangalore, INDIA. He has received
Bachelor and Master Degrees in the Electronics Discipline from Gulbarga
University and a Ph.D. Degree in Graph Theory and its Computer
Applications from Magadh University. His main research interests are in
Graph Theory, Data Mining, Ad-Hoc Networks, and Signals and Systems.