Keywords. Rule-based system, CDR loss rate, event filtering and correlation.
M.Gh. Negoita et al. (Eds.): KES 2004, LNAI 3215, pp. 16–23, 2004.
© Springer-Verlag Berlin Heidelberg 2004
A Robust Rule-Based Event Management Architecture for Call-Data Records 17
subset of rules that can match most usage patterns and data formats while allowing
unknown patterns to be learnt via user intervention. Such a system would avoid the
tedious, near-impossible task of categorizing every vendor-specific CDR format and
hard-coding for all present and future usage patterns. Although commercial software
such as Hansen Software’s CASH Call Accounting Software or TelSoft Solutions [1] is
available, these products are closed-source and are not ideal platforms for
customizing to the user’s needs.
In this paper, we propose an effective architecture for filtering and learning CDRs
from correct, partially-correct, and unknown PABX data formats through the use of
an embedded forward-chaining rule-based engine. In addition, the proposed
architecture also provides web-based customizable reports for trend and historical
analysis of phone-line usage. This research is based on our experience in
implementing similar systems for Banks, Credit Collection Agencies and
Manufacturing research centers [2].
Fig. 2.1. CDR Generation for Normal Calls. The figure shows the generation of a CDR for the
entire event comprising call commencement, call duration and call termination of a normal call.
A more complex scenario occurs when calls are transferred and CDRs have to be
correlated together to form a single record in the system. This is shown in Fig 2.2.
After the variables are identified, they can be classified and used within rules for
CDR filtering and identification.
In traditional fraud monitoring practices, the administrator would only be aware of the
case after it has occurred and by then it would be too late to take measures against the
18 C.W. Ong and J.C. Tay
perpetrator. Usually the reports will have to be viewed by the administrator and
flagged for suspicious activity, all of which is time-consuming and prone to errors.
The model presented here is intended as a first step towards improving fraud
detection efficiency and effectiveness. The expert system model once developed and
introduced will allow for line activity analysis while continuously publishing action
choices in real-time. Specifically, after every interval of CDR arrival, the newly arrived
CDRs can be checked against fraud rules and reports can immediately be sent to the administrator.
The administrator can then take further action by flagging specific lines for further
analysis and monitoring. The action will usually be recommended (based on similar
past actions) by the system to minimize the administrative load.
Ideally the process of modeling fraud activity involves collecting historical line
activity data, and then applying statistical techniques (such as discriminant analysis)
on the dataset to obtain a predictive model that is used to distinguish between fraud
and non-fraud activity [5][6]. In our case however, it would be more difficult to
distinguish fraud and non-fraud activity as CDRs are only issued after a call is made.
Instead, a rule-based approach is used to correlate variables which form conjunctive
patterns in fraudulent phone-line usage. A fraud variable datum (or FVD) is a data
field in a CDR which is used as part of a rule condition to detect fraudulent
phone-line usage.
Some examples of FVDs (in order of significance) are: Duration of Call,
Frequency of Call, Time of Call, Day of Call and Destination of Call. The FVDs
represent the signs which a human operator should take note of when detecting fraud.
Duration of Call is often the prime suspect: an international (IDD) call that lasts
longer than a certain duration incurs great cost, is unlikely to be used for
meetings, and is therefore cause for alarm. Frequency of Call can also indicate fraud, since a
high frequency detected within a certain time period could be indicative of redial
activity in an attempt to gain unauthorized access to systems. By monitoring Time and
Day of Call, unusual activity can also be caught during periods when calls are not
expected to be placed from most extensions. The last category, Destination of Call,
can be used to monitor lines on which IDD calls are not allowed. Some examples of
Fraud Detection Rules are:
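As a rough illustration of how FVDs can be combined into rule conditions, the following Python sketch evaluates three simple predicates over a single CDR. The record fields, thresholds and rule names are all hypothetical and are not the rules used in the deployed system; Frequency of Call is left out because it needs a window of CDRs rather than a single record.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CDR:
    extension: str      # originating extension (hypothetical field)
    start: datetime     # call start time
    duration_s: int     # call duration in seconds
    is_idd: bool        # True for an international (IDD) call


# Hypothetical thresholds, chosen only for illustration.
MAX_IDD_DURATION_S = 30 * 60          # 30 minutes
OFFICE_HOURS = range(8, 19)           # 08:00 to 18:59
IDD_BARRED_EXTENSIONS = {"2001"}      # lines not allowed to place IDD calls


def long_idd_call(cdr: CDR) -> bool:
    # Duration of Call: an IDD call longer than the threshold.
    return cdr.is_idd and cdr.duration_s > MAX_IDD_DURATION_S


def after_hours_call(cdr: CDR) -> bool:
    # Time and Day of Call: weekend, or outside office hours.
    is_weekend = cdr.start.weekday() >= 5
    return is_weekend or cdr.start.hour not in OFFICE_HOURS


def barred_destination(cdr: CDR) -> bool:
    # Destination of Call: IDD call from a line where IDD is not allowed.
    return cdr.is_idd and cdr.extension in IDD_BARRED_EXTENSIONS


RULES = {
    "long_idd_call": long_idd_call,
    "after_hours_call": after_hours_call,
    "barred_destination": barred_destination,
}


def triggered_rules(cdr: CDR) -> list:
    """Return the names of all rules whose condition holds for this CDR."""
    return [name for name, rule in RULES.items() if rule(cdr)]
```

A CDR that triggers at least one rule would then be tagged and queued for an alert, while a CDR for which `triggered_rules` returns an empty list passes through unflagged.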
be well-formed (as verified by CDR Processor); however, the rule engine still
performs a verification check against the schema database for any data range changes
if they have been specified by the administrator. This two-check process ensures that
the CDRs that enter into the database are valid. Fraud rules are also applied at this
step and any fraud triggers are then queued for job execution. CDRs which trigger the
fraud rules are tagged upon insertion into the CDR database and the appropriate alert
action (using E-Mail or SMS) is taken.
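The second check described above, verifying field values against administrator-specified data ranges, might look roughly like the sketch below. The schema layout, field names and ranges are assumptions made for illustration; the text does not describe how the schema database is organised.

```python
# Hypothetical schema: field name -> (minimum, maximum), both inclusive.
SCHEMA_RANGES = {
    "duration_s": (0, 86_400),   # at most one day, in seconds
    "extension": (1000, 9999),   # four-digit extensions
}


def out_of_range_fields(cdr: dict, ranges: dict = SCHEMA_RANGES) -> list:
    """Return the fields that are missing or fall outside the allowed range."""
    bad = []
    for field, (low, high) in ranges.items():
        value = cdr.get(field)
        if value is None or not (low <= value <= high):
            bad.append(field)
    return bad
```

Under this sketch a CDR would be inserted into the database only when `out_of_range_fields` returns an empty list.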
A reporting framework provides the system with a means to generate different user
specified reports in HTML, E-Mail or Excel spreadsheet formats. The system also
allows report templates to be customized using the web-based GUI. The major modules
in the Report Framework are the Template Builder, the User Query Interface and the
Report Presenter. Each module represents one stage of the work, from creating report
templates to gathering user queries and finally presenting the results in HTML or Excel.
These results can then be viewed in a browser or sent as E-mail or SMS alerts.
The Template Builder gathers attributes from the Attributes and Schema database
and provides a web-based interface for building report templates. This allows
customisation to suit departmental needs and data analysis requirements. Each
user-query is then manipulated for each report. Each query is built as a SQL statement
whose results can be in graphical format or raw data format. A test SQL function is
provided to ensure the query executes correctly against the database. The User Query
Interface obtains Report Templates from the Report Template Database and builds the
user interface for presentation to the user (using HTML for display on a browser).
Finally, the Report Presenter formats the raw data returned by the User Query into
reports that suit user needs. Drill-down reporting allows a more detailed view of the
data.
5 Performance Testing
From Fig 5.1, there were a total of 103574 records for the month of January 2003.
The implementation of hard-coded IF-Else statements shown in Fig 1.1 produced
5867 error records, which meant there was a 5.36% error rate. The rule-based
approach, in which wizards are used to modify rules, still produced 1352 error records
after rule adjustment because some CDRs could not be filtered. This translates to a 1.28%
error rate. This is an improvement over the old system, with the error rate reduced to
roughly a quarter. The disadvantages of the naive approach are that the hard-coded rules
are difficult to change and can usually only be modified by shutting down the server and
examining the error logs. A rule-based system does not require a shutdown, since new
rules can be compiled by the CDR Processor as soon as they are added. CDRs with
recurring errors are also accumulated and presented to the user with the option to add
a new CDR filter rule based on the closest rule match.
In this section an approximation to the CDR record loss rate at different simulated
call traffic intensities will be calculated. This approximation is made to investigate the
limitations of using the serial interface for output of CDR data. The approximation is
based on a simple model of a queuing system: an M/M/1/B system. This system
assumes exponentially distributed interarrival times and exponentially distributed
service times, using only one server and having a limited buffer space of B buffers.
Here, the arrival process represents the arrival of newly generated CDRs at the
output buffer and the service process represents the transmission of a CDR
over the serial connection. Exponentially distributed interarrival times are often used
in telephony-based queuing systems to model the arrival process and have been shown to
often be a very good approximation (see [8]). The time to transmit the data over the
serial line is intuitively constant in this case, since the size of each CDR and the
transmitting rate are constant. However, as also mentioned in [8], systems with
general and deterministic service times can often be very closely approximated using
exponentially distributed service times in the model. By using exponential
distributions rather than general, the calculations can be simplified but still be
accurate enough to be used for approximating the limitations of the system.
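For reference, the standard blocking probability of an M/M/1/B queue can be computed as in the sketch below. It assumes that B counts the total number of CDRs the system can hold, including the one being transmitted. The service rate and buffer size used in the example values are hypothetical, since neither is stated here.

```python
def blocking_probability(lam: float, mu: float, B: int) -> float:
    """Probability that an arriving CDR finds an M/M/1/B system full.

    lam: arrival rate (CDRs per second)
    mu:  service rate (CDRs per second)
    B:   total capacity, including the CDR in transmission (assumption)
    """
    rho = lam / mu
    if abs(rho - 1.0) < 1e-12:
        # Limiting case rho = 1: all B + 1 states are equally likely.
        return 1.0 / (B + 1)
    return (1.0 - rho) * rho**B / (1.0 - rho ** (B + 1))


def loss_rate(lam: float, mu: float, B: int) -> float:
    """Expected number of CDRs lost per second."""
    return lam * blocking_probability(lam, mu, B)


# Hypothetical parameters, for illustration only.
MU = 5.0   # CDRs per second the serial line can transmit
B = 10     # buffer capacity in CDRs

light_load = loss_rate(1.0, MU, B)   # low traffic intensity
heavy_load = loss_rate(4.0, MU, B)   # higher traffic intensity
```

With parameters of this kind the loss rate is vanishingly small at low arrival rates and rises steeply as the arrival rate approaches the service rate, which is the qualitative behaviour the approximation is meant to capture.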
The CDR loss rate was calculated for different arrival intensities and plotted in a
graph (see Fig 5.2). From the graph it can be determined that the CDR loss rate may
be neglected when the CDR arrival rate is below roughly 4 CDRs per second. When
the arrival rate reaches 4 CDRs per second, the output buffer starts to fill up and CDRs
are lost. Under stress testing, call traffic generates a maximum arrival intensity of
approximately 1 CDR per second, which is far lower than the critical arrival
intensity at which call information records begin to get lost. Even if the traffic load
increases to three times today’s load, there is no immediate risk of losing
CDRs due to saturated output buffers.
From Fig 5.3 we can see that this arrival rate, at the point when the output buffer
starts to fill up, corresponds to a traffic intensity of about 80%.
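As a worked check, taking the traffic intensity to be the usual single-server ratio of arrival rate to service rate (an assumption, since the definition is not spelled out here), the values quoted in this section imply a service rate of about 5 CDRs per second. That rate is derived here and is not a figure reported in the text.

```latex
\[
\rho = \frac{\lambda}{\mu}
\quad\Longrightarrow\quad
\mu \approx \frac{4\ \text{CDR/s}}{0.8} = 5\ \text{CDR/s}
\]
\[
\rho_{\text{stress}} \approx \frac{1}{5} = 0.2,
\qquad
\rho_{3\times} \approx \frac{3}{5} = 0.6 < 0.8
\]
```

Under this reading, even a threefold increase over the stress-test load stays below the traffic intensity at which the output buffer begins to fill.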
The CDR transaction model we have assumed in our study is one in which the CDR is
produced only after a call has been placed. The first step in automatic CDR filtering is to
identify the data fields that comprise the CDR format and which can be used to identify
the type of CDR being produced. In the particular case of the Nortel Meridian One PABX
[5], five different call data were identified that are critical for call reporting. The
proposed architecture allows for line activity analysis while continuously
publishing available actions in real-time. For performance evaluation, an approximation
to the CDR record loss rate at different simulated call traffic intensities was calculated.
From the results, we observe that the CDR loss rate is negligible when the CDR arrival
rate is less than 4 CDR per second. At stress testing, call traffic generates a maximum
arrival intensity of approximately only 1 CDR record per second, which is far lower than
the critical arrival intensity when call information records begin to get lost.
[Figure: ρ(λ) plotted against λ]
References
[1] TelSoft Solutions for Call Accounting, http://telsoftsolutions.com/callaccount.html
(verified on 15 Jan 2004).
[2] Nguyen A. T., J. B. Zhang, J. C. Tay, “Intelligent management of manufacturing event &
alarms”, technical report, School of Computer Engineering, Nanyang Technological
University, Jan 2004.
[3] Reference for Nortel Meridian 1 PBX/Meridian Link Services,
http://callpath.genesyslab.com/docs63/html/nortsl1/brsl1m02.htm#ToC (verified on 15 Jan 2004).
[4] Nortel Networks. (2002) CDR Description and Formats. Document Number:
553-2631-100 Document Release: Standard 9.00.
[5] Nikbakht, E. and Tafti, M.H.A, Application of Expert Systems in evaluation of credit card
borrowers. Managerial Finance 15/5, 19-27, 1989.
[6] Peter B., John S., Yves M., Bart P., Christof S., Chris C., Fraud Detection and Management in
Mobile Telecommunications Networks, Proceedings of the European Conference on Security
and Detection ECOS 97, pp. 91-96, London, April 28-30, 1997. ESAT-SISTA TR97-41.
[7] Java Expert System Shell or JESS, website at http://herzberg.ca.sandia.gov/jess/
[8] Jain, Raj, The Art of Computer Systems Performance Analysis. ISBN 0-471-50336-3, USA:
John Wiley & Sons, Inc., 1991.