
Electrical Power and Energy Systems 71 (2015) 42–50

Contents lists available at ScienceDirect

Electrical Power and Energy Systems


journal homepage: www.elsevier.com/locate/ijepes

Fraud detection in registered electricity time series


Josif V. Spirić a,⇑, Miroslav B. Dočić b, Slobodan S. Stanković b
a I. Strele 6/27, 16000 Leskovac, Serbia
b Company for Electric Power Supply “Jugoistok” Niš, Branch “Elektrodistribucija Leskovac”, Stojana Ljubića 16, 16000 Leskovac, Serbia

ARTICLE INFO

Article history:
Received 13 February 2014
Received in revised form 8 February 2015
Accepted 20 February 2015

Keywords:
Time series
The process
Fraud
Registered consumption drop
Control limits
Rules

ABSTRACT

This paper analyses time series, without the seasonal component, of consumers’ power consumption at low voltage in order to detect fraud and illogical consumption by customers. Statistical process control is used, where the process represents the process of using electricity. XMR charts are used to indicate major changes (decreases) in registered customers’ consumption. The method was verified on the time series of a set of customers who were caught stealing during the period covered by the series. It shows that symptoms of non-random factors in the time series of customers are revealed in a high percentage, which indirectly confirms the method’s ability to successfully detect electricity fraud.
© 2015 Elsevier Ltd. All rights reserved.

⇑ Corresponding author. Tel.: +381 64 836 7600.
E-mail address: josif.vspiric@gmail.com (J.V. Spirić).

http://dx.doi.org/10.1016/j.ijepes.2015.02.037
0142-0615/© 2015 Elsevier Ltd. All rights reserved.

Introduction

Electricity fraud can be committed by illegal, i.e. unregistered, users. They are therefore only consumers, not customers. Fraud may also be committed at the measuring points of legal, i.e. registered, consumers who become customers by signing a contract. This paper deals only with electricity fraud committed by registered customers.

Electricity fraud is a source of large non-technical losses and is a serious problem for the functioning of power distribution systems. This problem is especially visible in developing countries, countries in transition and generally in countries with low national income per capita. Furthermore, inspections of customers’ measuring points are usually unprepared and unsophisticated, which results in a low rate of fraud discovery. The purpose of this paper is to promote a statistical process control (SPC) strategy for the detection of suspicious electricity customers.

Electricity fraud by registered customers is a criminal activity that must be manifested in the time series data of monthly registered energy. The time series can then be termed anomalous. An anomaly can be defined as behavior (a structural template) that is not in line with the expected normal behavior. How can an anomaly be detected on the basis of available data? To solve the problem of anomaly detection, the following techniques are given in [1]:

• Detection technique based on classification.
• The nearest neighbor technique.
• Clustering technique.
• Statistical technique.
• Information theoretic technique.
• Spectral technique.
• Handling contextual anomalies.
• Handling collective anomalies.

The same source also lists several anomaly detection applications, including: intrusion detection, fraud detection, fault/damage detection, medical and public health anomaly detection, industrial damage detection, image processing, anomaly detection in text data, sensor networks and other domains. Here are a few papers that deal specifically with the detection of electricity fraud.

In [2], detection of fraud and other non-technical losses in distribution companies is based on the use of Pearson’s coefficient, Bayesian networks and decision trees. The key idea of these methods is the identification of a customer pattern with a drastic drop in consumption and subsequent stabilization, but a gradual (significant) drop with subsequent stabilization of further consumption is also taken into consideration.

In [3], MIDAS is the name of the project which has developed two methodologies for fraud detection (the dominant part of non-technical losses). One is based on neural networks and the other on statistical techniques. The first methodology uses neural networks due to the problem conditions and works with the Kohonen network structure. The second methodology is based on the detection of values outside the tolerated range (outliers).

In [4], the monthly development of a customer’s selected variable is called the pattern and is represented by a 12-dimensional vector for a period of one year. Assessment of the density distribution requires the selection of a representative set of sample patterns, which is represented by a sample matrix. The procedure that follows leads to a pattern’s degree of normality, defined as the measure of the pattern’s frequency in the considered group of customers. High values of this coefficient correspond to common (normal) behavior, whereas low values reveal illogical situations.

In 1982, the Polish mathematician Zdzisław Pawlak postulated the theory of rough sets as a tool for knowledge discovery in databases (KDD); it is based on the indiscernibility relation [5].

Based on customer fraud results, a class of customers with anomalous series is formed and is characterized by corresponding patterns according to their consumption. After the discretization of conditional attributes, it is possible to find customers with consumption profiles (patterns) that are identical to those of the class of customers with anomalous series patterns. Such customers belong to a boundary region and may, although they do not have to, belong to a group of thieves. All customers in the boundary region are considered suspicious and they are the basis for forming a list according to which on-site inspections should be performed [6,7].

To create a list of suspects, it is possible to use “fuzzy” set theory. First, at least two criteria should be defined that express the relationship of a specific customer’s consumption characteristic to the appropriate average value of that characteristic at the site the customer belongs to. A criterion can also be the ratio of the customer’s consumption characteristic to that characteristic’s average over an identified period for the same customer. In [8,9], based on the selected criteria, their membership functions in “fuzzy” sets are formed, the membership functions in “fuzzy” sets for suspicion assessment are determined, and “fuzzy” rules of the “if–then” type are set. After the “fuzzy” reasoning procedure is finished, defuzzification is performed, which turns a “fuzzy” conclusion into a real number that represents the suspicion evaluation. The values of the suspicion evaluation (usually in %) make up the list of priorities for on-site testing.

In this paper, the performance of the XMR charts method for fraud detection is tested on the time series of customers whose fraud has already been detected. The statistical method of XMR charts was chosen for the detection of anomalies in time series of registered consumed energy. Wheeler’s article “The Chart for Individual Values” states that this method was introduced by Jennett in 1942. In the middle of the last century (more precisely, from 1943 to 1953), XMR charts were frequently used at the General Electric Co. in Wembley. In 1980, Wheeler started to use this chart again, in the chemical industry. The formation and interpretation of these charts are described in more detail in the third and fourth chapters.

For a better overview of the position of the XMR charts method, the basic features of its use in SPC are given. Statistical process control is done by control charts, which are based on statistical characteristics of the process. The X chart follows a basic (single) characteristic of the process and, in that sense, it is an individuals chart. This chart follows the process with a sample size of n = 1. The X chart is frequently used in non-manufacturing situations. One of the characteristics of this chart is a long interval between observations. The range chart, or R chart, follows the ranges between successive values of the process characteristic. The ranges found in this way are the basis for estimating process variability. The XMR chart follows both the process characteristic and its range simultaneously, so it is a combined (simultaneous) chart. Process parameters can be known, or “standard known”, based on external specifications or long-term experience. Processes with unspecified parameters are “standard unknown”. There are two phases in process control and monitoring. Phase I involves a retrospective study of the process and can be considered a preparatory stage for Phase II, which focuses on ongoing monitoring [10]. By definition, X and R charts are most often used in Phase I of the process, which is characterized by larger changes of the followed characteristics. A series anomaly on the chart is determined by the location of observations in regions within and outside the control limits, and by the existence of long sequences of consecutive observations. A characteristic of these charts is that they are relatively insensitive to small changes in the process, on the order of about $1.5\sigma$ [11].

In this paper, fraud detection is based on the analysis of monthly energy time series. Provided that the level of household electrification and the process of using a device which monitors that level are not changed, the main indicator of a series anomaly in terms of potential fraud is a significant change in the consumption value compared to the average consumption in the previous period. The analysis will be conducted on approximately uniform load diagrams of customers at low voltage in the household category.

The usual approach to anomaly detection is to define the region that represents normal behavior. Anomalies are then instances where some observed data do not belong to the region of normal behavior. Much more rarely, when only anomalous instances are available, the region of anomalous behavior is built first and then the region of normal behavior is defined [1].

In this particular case, time series of fraudulent customers are available and these series are anomalous. The result of the analysis of these series should enable the identification of customer profiles with anomalous series. However, it is very important to choose an analytical method for this purpose. A suitable method should have a high success rate in assessing as anomalous the time series of customers who were caught stealing. Taking into account the overall performance of XMR control charts as one of the statistical methods, and the properties of the electricity consumption process, the XMR control charts method is chosen as the method for anomaly testing of the time series of registered electricity customers. In the further text, a customer with an anomalous time series will be a synonym for a suspicious customer.

Registered energy time series

The values of a customer’s monthly electricity consumption change constantly over time and these changes are the result of a series of factors, many of which are random. Most random variables in nature are distributed according to the law of normal distribution.

According to the central limit theorem, the observed random variable has a normal distribution if it is affected by many factors, each acting as an independent (or weakly dependent) variable which is also distributed according to normal distribution [12]. This position may be extended, under certain conditions, to a large number of factors, i.e. random variables distributed according to any law of distribution, with no restrictions regarding the dependence between a factor and the observed random variable. Based on the above, it can be considered that a series of customers’ monthly energy, as a side effect of the process of electricity consumption, is distributed according to normal distribution.

The concept of a random process results from an extension of the concept of a random variable, by associating each possible outcome $s_i$ of a phenomenon or an experiment with an appropriate time function $X(t, s_i)$ instead of a number. The random process $X(t, s_i)$ is defined as a function that maps the event space $S(s_1, \ldots, s_i, \ldots, s_n)$ into the family

of time functions. A random process can be denoted as $X(t)$ and the realization that corresponds to the i-th outcome $s_i$ as $X_i(t)$ [13].

Energy measured in a time interval can be one of the characteristics of its usage process. That process is the result of the effects of many factors and can be considered random. Due to the adopted integer quantification of energy and the frequency of electricity meter readings, it can be considered a process with discrete states and discrete time. The set of values $X_i(t)$ is called a time sequence or time series.

The values of monthly energy billings for each customer during one year form an annual registered energy diagram. From year to year, these diagrams vary in the average annual value and in the standard deviation at the level of the observed year. Annual diagrams also differ in the values observed in particular months, quarters and seasons. A continuous series of annual diagrams represents an energy time series. A time series consists of three deterministic components: the trend, seasonal and cyclical components. The fourth component is random and can be reached by removing the previous three components.

The trend component expresses a long-term tendency in the development of a series. In this case, it may indicate a customer’s systematic efforts to legally reduce energy consumption, or an improvement in the standard of living and a consumption increase, but the duration of the trend is then brief.

The cyclical component is the result of sequential alternation between a perennial deviation of the observed values above the mean and a perennial deviation below it. The duration of the cycle is two or more years. The influence of this component on energy values can be related to the cycles in the intensity of economic activities. The cyclical component is not perceived in average annual values and is rarely present in a series of annual values of registered energy consumption.

The seasonal component is the result of energy variations that are repeated every year at the same time and in the same sense. In this case it is generated by a change in average monthly, quarterly or semi-annual temperatures.

The random component is related to energy variations generated under the influence of random factors, such as random weather variations and the duration of holidays.

The analysis of energy time series will consist of the description of the development and the control of time series values over time. At the same time, that implies monitoring the trend of the development of energy values in characteristic periods.

At first sight, a natural and intuitive indicator of potential electricity fraud should be a significant drop in electricity consumption in the time series of registered monthly consumption. However, practice shows that there are “sophisticated” methods of fraud. One of them is characterized by drawing significant amounts of energy into a customer’s electrical installation before the measuring point, with a certain simultaneous intentional increase of billed energy at the measuring point. In this case, there is even an illusion of an increase in the customer’s consumption, which camouflages the removal of energy, i.e. energy fraud, from the measuring device at the measuring point. A change in consumption needs to be established. One of the natural ways to achieve that is by analyzing the time series chart of each customer’s consumption. This includes the automatic generation of time series, based on the local or remote reading of electricity meters.

XMR charts

Monitoring and analyzing a time series of customers’ registered electricity consumption using an X chart of individual values (or measures) and an MR chart of moving ranges will be discussed in this paper in detail.

The X chart is used to monitor processes in which one measurement is sufficient to evaluate the process characteristic whose value changes over time. In our case, it is the measurement of electric energy consumption for the previous calendar month. As a rule, this chart combination is used when there are not enough reliable data for the standard deviation $\sigma$ of the process around its average value. In that case, this parameter is estimated by introducing a range $(MR)_i$ as a new value, defined as the difference between two consecutive values of $X_i$ in time. The average of the measured monthly electricity values $X_i$, where $i = 1, 2, \ldots, n$, is calculated according to:

$$\mu = \frac{\sum_{i=1}^{n} X_i}{n}. \qquad (1)$$

Since there are n values of measured energy consumption for the period on which the basic statistical characteristics of the consumption process are assessed, the problem of the inability to obtain the range of a sample of size one is overcome by calculating the range $(MR)_i$ on the basis of two consecutive data $X_i$ and $X_{i-1}$, where $i = 2, \ldots, n$:

$$(MR)_i = |X_i - X_{i-1}|. \qquad (2)$$

The range defined by relation (2) is called a moving (mobile) range and there are $n-1$ of them. The average moving range is defined according to:

$$\overline{MR} = \frac{\sum_{i=2}^{n} (MR)_i}{n-1}. \qquad (3)$$

Control limits are calculated according to the standard deviation estimate $\hat{\sigma}$ and $d_2 = 1.128$, the constant for ranges of two consecutive values, used here with a sample size of one [11]:

$$\hat{\sigma} = \sigma = \frac{\overline{MR}}{d_2}. \qquad (4)$$

The upper control limit in the X chart is:

$$UCL = \mu + 3 \cdot \frac{\overline{MR}}{1.128} = \mu + 2.66 \cdot \overline{MR}. \qquad (5)$$

The mean in the X chart is:

$$CL = \mu. \qquad (6)$$

The lower control limit in the X chart is:

$$LCL = \mu - 3 \cdot \frac{\overline{MR}}{1.128} = \mu - 2.66 \cdot \overline{MR}. \qquad (7)$$

The MR moving range chart can also be formed and it is defined by:

$$(UCL)_{MR} = 3.267 \cdot \overline{MR} \qquad (8)$$
$$(CL)_{MR} = \overline{MR} \qquad (9)$$
$$(LCL)_{MR} = 0. \qquad (10)$$

The important datum in this chart is the value of the average moving range $\overline{MR}$ defined by relation (3). Both the X and MR charts should be analyzed simultaneously and, depending on the boundary control values, it is also possible and desirable to draw all control lines in the same chart, which could then be called the XMR chart.

Forming these charts requires checking the amount of available data or, in the considered case, the length of the time series. In order to get accurate control limits, it is necessary to have a sufficient number of observations. The number of observations that provides a false alarm rate lower than 1% with 95% confidence is at least 100 [14]. For Phase I of the process, in [15,16] it is considered that 100 observations are a rough estimate of the number of required observations. However, it is often argued that 20–25 points are enough to set control limits [18,24,26]. A small number of observations can lead to wide control limits, which

may result in the absence of alarm signals. An insufficient number of observations may also produce tight control limits, which may lead to false alarms. The acceptable number of observations also depends on the time interval between successive observations [24].

Determination of the X chart’s control limits is based on the assumption that the process data are distributed according to normal distribution. When this assumption is satisfied, a deviation larger than $\pm 3\sigma$ from the centerline CL is possible with a probability of 0.0027. Such a deviation at the same time indicates an alarm, which in some automated processes can be converted into a response signal. A deviation of the real data distribution from normal distribution gives slightly different values according to relations (5) and (7), which can increase the number of false alarms.

In [17] it is stated that, according to Wheeler (1995), the XMR chart is the most appropriate for time series and that it can be used regardless of the type of distribution, as well as for single-parameter distributions. Furthermore, it is emphasized in [18] that it is not necessary to check whether data are distributed according to normal distribution, because the XMR chart is insensitive enough to deviations from normal distribution. The previous statement refers to total insensitivity of XMR charts to deviation from normal distribution.

In a study [14] performed on 10,000 data, on the example of individual X charts (n = 1), it is shown that the percentage of false alarms (observations above UCL or below LCL) for data distributed according to the $\chi^2$ distribution (n = 3) is 5 times higher than for normally distributed data. Moreover, the “heavy-tailed” distribution (t, 3) has a 4 times higher percentage of false alarms than normal distribution.

These studies suggest that normality needs to be checked, but only when the UCL and LCL limits in individual X charts are exceeded. There is no normality testing in this paper, but information regarding the sensitivity of the used XMR charts to this circumstance will be obtained by checking how effectively non-random factors are disclosed in the time series of customers whose fraud has been detected. The following part of the paper provides a broader interpretation of the XMR chart method, which takes into consideration the position of points in the chart between the UCL and LCL lines.

To analyze the dynamic structure of the process characteristics over time, it is necessary to examine the degree and direction of the interdependence of members of the same time series spaced one or more periods apart, as well as to form the analytical expression which displays it. The degree and direction of the relationship between observations of the same series spaced k periods apart is measured by the autocorrelation coefficient $r_k$, which is referred to in time series as the serial autocorrelation coefficient. The value of $r_k$ can be positive or negative and lies in the closed interval $[-1, 1]$.

Positively autocorrelated series are called persistent, because high values tend to be followed by high values and low values by low values. Negatively autocorrelated series are characterized by a shift from large to small values or from small to large values (from month to month). Positive autocorrelation is shown as a long series of successive time intervals above or below the series mean value. Negative autocorrelation indicates the absence or an unusually low frequency of such sequences, i.e. a frequent change of successive observations in relation to the average value of the series. The type of autocorrelation assessed in this way is easily visible in a diagram of the time series observations with the mean value drawn.

On the basis of research performed on a large sample with a step of k = 1, it can be concluded that an increase in autocorrelation between two consecutive points increases the percentage of false alarms in the chart [14]. Furthermore, for an autocorrelation coefficient larger than or equal to 0.4, it is considered that the control chart indicates an increased percentage of false alarms, which casts doubt on the justification of using the chart.

Tests for unnatural patterns identification

Variability which is the result of the cumulative effects of many small, basic and common causes is called natural. When natural variability is relatively small, it is considered an acceptable level of process performance (the process is under control or “in-control”). In studies of SPC, this variability is called a “stable system of random causes”. A process affected only by random causes is considered to be under statistical control and is described as influenced by the usual causes.

Another type of variability in manufacturing processes is caused by poorly tuned machines, operator errors and bad materials, and in this case by illegal acts of customers. Such variability is, as a rule, much higher than natural variability and is usually qualified as unacceptable process performance (the process is “out-of-control”). The major sources of variability are then non-random and are known as special or attributable causes.

Unlike EWMA (Exponentially Weighted Moving Average) and CUSUM (Cumulative Sum), which are methods used to identify minor changes, the main objective of the XMR charts method is to identify large, abnormal variability caused by special causes. In the X chart, the most important lines are UCL and LCL. Also important are three regions of the same width, A, B and C, between CL and LCL. Field C is bounded by the lines $CL$ and $CL - \hat{\sigma}$, field B by the lines $CL - \hat{\sigma}$ and $CL - 2\hat{\sigma}$, and field A by the lines $CL - 2\hat{\sigma}$ and $CL - 3\hat{\sigma}$. The same sequence of field labels and the same widths apply to the areas above the CL line.

A natural pattern is based on the logic of observations being distributed in accordance with the law of normal distribution. This means that 68.27% of observations are in the total C region, 27.18% in the total B region, and 4.28% in the total A region. Expressed descriptively, the number of points decreases with the distance from the centerline and a point can very rarely be found outside the control limits. Based on the above, the first and primary criterion is reached. According to this criterion, if any point of the observed series is outside the control lines in the X chart, the process will be deemed out of control. The X chart is characterized as a rapid and effective means of detecting large changes in the process environment. When the previous assumptions are not met, the behavior pattern is characterized as an abnormal pattern, which is at the same time a symptom of special causes.

An unnatural pattern is not conditioned only by Shewhart’s limit $|X_i - \mu_0| > 3\sigma$, but also by series of points that are characterized by their number, the field in which they lie and the probability with which they occur.

In many processes, changes in the process environment on the order of $\sigma$ to $2\sigma$ or smaller appear, for which the Shewhart type of chart is insufficiently sensitive. The sensitivity to small changes is then increased by testing the properties of series of points in the chart. Table 1 shows unnatural (non-random) patterns with their descriptions and the symptoms according to which they are identified in the control chart.

To simplify the recognition of unnatural patterns, there are rules, which in the literature are equivalent to tests; the term test series is also in use. The rules are based on the probability that the position of points described in a given rule will occur. The definition of these rules is given in Table 2, with a graphic view of the position of the points [11,20–22].

When a method uses two control charts, in this case the X and R charts, testing begins with the R chart. The process can be tested on the R chart, but only according to rules RT1, RT4, RT7 and RT8 [23]. If any range in the R chart is larger than UCL, it is stated

that the process is out of control. The X chart is then not analyzed until new control lines in the R and X charts are formed [24].

According to Nelson [25], if any range is more than 3.5 times larger than the average range obtained when the average moving range is first computed, this range is removed and a new average range is calculated, as well as new control limits in the X chart.

Determining the root cause of an unnatural pattern is important for its recognition. To determine the root cause, a thorough knowledge of the process and a lot of experience are needed.

When applying these rules to solve a variety of problems, there may be situations in which it is necessary to remove one or more rules, because their application leads to false conclusions that the process is “out of control”. Furthermore, there are situations when the process reaches the “out-of-control” state without the rules recognizing a specific cause.

The article [14] states that the RT4 test, in addition to the RT1 test, significantly increases the chart’s sensitivity in detecting small changes in the process environment. The same source recommends the usage of only the RT1 and RT4 tests for R charts. The other rules are considered not to uniquely identify situations with special causes that are common in practice.

Table 1
Most recommended unnatural (non-random) patterns, their descriptions and symptoms in control charts [19].

No. | Unnatural pattern | Pattern description | Symptoms in control chart
I | Large shifts (strays, freaks) | Sudden and high changes | Points near or beyond control limits
II | Smaller sustained shifts | Sustained smaller changes | A series of points on the same side of the central line
III | Trends | Continuous changes in one direction | Steadily increasing or decreasing run of points
IV | Stratification | Small differences between values in a long run, absence of points near control limits | A long run of points near the central line on both sides
V | Mixture | Saw tooth effect, absence of points near the central line | A run of consecutive points on both sides of the central line, all far from the central line
VI | Systematic variation | Regular alternation of high and low values | A long run of consecutive points alternating up and down
VII | Cycle | Recurring periodic movement | Cyclic recurring patterns of points

Table 2
Rules for identifying unnatural patterns with graphics.
(In the original, each rule is illustrated by a chart with the horizontal lines UCL, +2, +1, Average, −1, −2 and LCL; the graphics are not reproduced here.)

Rule | Pattern description
RT1 | 1 point beyond a control limit (±3σ)
RT2 | 2 out of 3 points in a row beyond (±2σ)
RT3 | 4 out of 5 points in a row beyond (±1σ)
RT4 | 8 consecutive points above or below the centerline
RT5 | 8 points in a row on both sides of the centerline avoiding the ±1σ area
RT6 | 15 points in a row within the ±1σ area
RT7 | 14 alternating up and down points in a row
RT8 | 6 points in a row steadily increasing or decreasing

Billed energy time series specificities in terms of potential fraud intent

The roles of a supplier and a customer need to be distinguished when an energy consumption process is created. The customer is the one who directly forms the consumption process and dictates the non-random factors, such as fraud. The supplier has only an observer role. If the supplier follows the time series of consumption and comes to the conclusion that a non-random factor is present in the process, it assigns the customer the unnatural pattern treatment, i.e. it considers that the customer’s process of energy consumption is out of control. This completes the first stage of control. Based on fraud indications and possible additional analyses, the supplier makes a decision to check the situation by direct on-site inspection. After checking, if the suspicion of fraud is confirmed, it can be said that fraud was discovered/detected.

There are several reasons for consumption reduction:

(a) Illegal use of part of the energy.
(b) Changing ways of housing use in terms of the length of stay and/or the number of occupants, as well as changing space function.
(c) Substituting a part of electricity by other energy sources.
(d) Incorrect display of a measuring device in case of failure of one or more of the device’s measuring systems.

(e) Transition from one to two or more connections.
(f) Customers with registered consumption of 0 kWh.
(g) Unread customers.

The mentioned reasons must be taken into consideration when deciding whether to perform an on-site control of a registered customer’s electricity usage. Since a registered consumption drop is one of the major leads to illegal electricity usage, the lower control limit LCL and fields A, B and C below CL in the X chart are important for the analysis of the position of points $X_i$. Taking into account the observation given in the last paragraph of the second chapter, the need for monitoring UCL and fields A, B and C above CL in the X chart must not be excluded. In this manner, complete monitoring of the process of a customer’s electricity usage is achieved. It also allows full use of the annotated rules. From the standpoint of fraud, an increase in invoiced energy is still a rare occurrence, but it should not be ignored. Nevertheless, further series analysis will be performed on examples of a consumption drop, i.e. a drop in registered energy.

Abrupt changes (shifts), the emergence of trends (gradual changes) and stratification (variability that is too small) can be considered good indicators of the impact of non-random factors on the development of energy time series.

In a time series of original $X_i$ values of customers who are known to be fraudulent, the changes can most often be clearly seen as a reduction of the value precisely at the time when the fraud begins.

Within energy distribution activities, where meters are read in the field (not remotely), irregular meter readings are a real occurrence. One failed reading between two realized ones causes a sudden increase in the range. In case of two or more failed readings, zero-value ranges appear. Both of these situations lead to incorrect determination of control limits.

In case of irregular readings it is necessary:

(a) Not to take into account the abnormal ranges.
(b) To normalize the unread meter showing using the values of two adjacent (in time) meter showings.

Because of the above, unlike in the X chart, it is the upper control limit that is important in the corresponding range chart.

Simultaneous monitoring of idealized successive values in the X and R chart is given in Figs. 1 and 2 [21]. Fig. 1A shows an energy diagram where a sharp decline turns into a gradual decline of energy in time. Fig. 1B shows a sudden increase which turns into a state of gradual growth. In Fig. 1C the process is more stable in relation to the centerline, after a period of strong destabilization. Fig. 2A shows an energy diagram where a gradual decline turns into a sharp decline of energy. In Fig. 2B a gradual increase becomes a sharp rise. In Fig. 2C, after a stable condition in the beginning, there is a turn into a period of lower stability in relation to the centerline. The range diagram in Fig. 1D, in which a range reduction can be observed, rapid initially and then gradual, corresponds to the possible situations in Fig. 1. The range diagram in Fig. 2D, in which a range increase can be observed, rapid at first and then gradual, corresponds to the possible situations in Fig. 2.

Fig. 1. Successive range decrease display. Panels (A), (B), (C): X charts with zones A, B and C on either side of CL, between LCL and UCL; panel (D): R chart.

Fig. 2. Successive range increase display. Panels (A), (B), (C): X charts with zones A, B and C on either side of CL, between LCL and UCL; panel (D): R chart.

The subject of this paper is analyzing the energy diagram of customers with approximately uniform consumption in the absence of a significant trend (Fig. 3a) and without a seasonal component (Fig. 3b).

A decline of monthly energy values can happen in just a few months and then represents a sharp decline. A decline of monthly energy values that occurs over two to three seasonal periods is called a gradual decline.

Based on the above, it can be concluded that in a time series, from a fraud detection standpoint, the following should be taken into account:

(a) The seasonal component, or rather the disappearance of this component in the event of fraud in a seasonal series.
(b) The trend component with a shorter duration, if the series consists of successive diagrams of uniform consumption.

In Fig. 4, the registered energy time series with an approximately uniform load diagram is given. Point 1 should be distinguished as the time of the first registered energy of the series, and point τ as the time of the last registered energy. The analysis of the time series then starts with point 1 and ends with point τ. For customers whose time of connecting to the network coincides with point τ, the connection time, the beginning of the series and the start of the series analysis coincide.

If a series that began at point 1 is analyzed from point τ, then a retrospective analysis of data from period 1–τ will be performed
to define control limits based on which it can be seen whether the process was under control (without non-random factors) or whether it was out of control (with non-random factors). If the process was under control, it will be deemed that the control limits at this stage (Phase I) may be the basis for a follow-up process, which begins at point τ + 1.

If the process is found not to have been under control, this claim is verified in the field and, if it is true, the customer is disconnected, which leads to the situation that (after a possible reconnection) a new time series of the customer is monitored.

Fig. 3. The registered energy time series: (a) a diagram with approximately uniform consumption; (b) a diagram with seasonal consumption.

From the perspective of fraud occurrence, the following can be distinguished:

• Fraud that begins at the start of a series due to an unprofessional connection of a customer to the electrical grid and, as a rule, can only be revealed accidentally.
• Fraud which starts at some point in a series and is revealed immediately in a small number of cases, or usually later.
• Fraud which starts at some point in a series and is not revealed until the end of the series.

How long should a creditable period last in which a consumption drop analysis is conducted? In [1,2], a 24-month period is used. The same sources provide a model of a diagram before and after fraud, the ratio of average consumption before and after fraud, and the duration of rapid and gradual decline of energy. This is convenient for implementation when the start of a time series and the start of the analysis of the same series match.

This paper analyses regions with a total loss of electricity in the distribution network at low voltage of about 19%. Inspections made in order to minimize losses were not based on customer data analysis, and the database has only been updated since 2003.

Fig. 4. A display of a time series with a drop at point t.

The present paper, therefore, deals primarily with cases of a significant shift between points 1 and τ. An invoiced energy drop in a month compared to the middle/average energy until then is clearly visible in a time series (it is about p·σ, where coefficient p is larger than zero). Fig. 4 gives a simplified representation of a time series where there has been a significant drop at point t. The shorter the duration of new lower values after point t, the greater the probability that new lower value points will be close to or below the value of LCL with updated data. With a longer duration of new values after point
t, new values will shift to a new middle, away from the new LCL lines, creating the impression that the process is under control.

Obviously, there is a time period during which one has to decide whether the process is in the out-of-control state and then react in terms of a field check. Clearly, it would be useful to perform sequential computation of new regions and new control limits during this period. New series can also be created during this period, which may increase the security of decision making.

The logic of determining control limits in the X chart is based on a normal distribution of the treated monthly energy observations. When abnormal profiles appear under the influence of non-random causes, a distortion of the assumed distribution may happen and false alarms, i.e. signs for reaction, may occur. Contradictory opinions regarding the issue have previously been stated, referring primarily to the use of XMR charts in cases of normal distribution distortion. The authors believe that the consequences of a possible increase in the number of false alarms do not present a problem in terms of disturbing the process of the supplier’s continued network presence, because the decision to control the measuring point is followed by the inspection result.

Case study

The success of the XMR charts method for fraud detection is analyzed on examples of 106 electricity customers connected to the distribution system in the region of Leskovac, Serbia. Electricity fraud was detected for all of these customers during the period between 2005 and 2012. Fraud was discovered during regular inspections or due to anonymous tips, without any sophisticated system based on energy values over time. Their energy time series are characterized by the absence of a seasonal component and, in the presence of only random factors, can be considered diagrams of approximately uniform energy consumption. Monthly readings were regularly performed for all customers.

Each time series starts at point 1 and ends at point τ. The time of fraud detection t is determined in it. A series from 1 to t is extracted as part of the total series and then the X and MR charts are formed from $\mu_t$ and $\sigma_t$. In this manner, the knowledge that a customer is recognized as fraudulent is abstracted. Then, according to the rules for identifying unnatural patterns, it is investigated whether the time series contains symptoms of non-random factors. The rules from RT1 to RT8 were tested.

In the period of energy fraud detection, the monthly energy value at the previous reading in t − 1 can be:

(a) Lower than the mid-series value during the period from 1 to t, i.e. $W_t < \mu_t - 0.3\,\sigma_t$.
(b) Approximately equal to the mid-series value during the period from 1 to t, i.e. $\mu_t - 0.3\,\sigma_t \le W_t < \mu_t + 0.3\,\sigma_t$.
(c) Higher than the mid-series value during the period from 1 to t, i.e. $W_t \ge \mu_t + 0.3\,\sigma_t$.

Analysis of the 106 time series leads to the conclusion that only 7 customers, after testing according to the rules in Table 2, do not raise suspicion by any of the rules. Characteristically, these are short series with durations of 26, 28, 35, 35, 44 and 47 months, and only one has a length of 95 months. This indicates that it is easier to detect non-random factors in longer series. A total of 99 customers could be considered suspicious, i.e. it is necessary to control their measuring units. Of these 99 customers, 97 are suspicious according to the RT1 test on both the X and the R chart, and 2 customers are suspicious according to the RT6 test (there is stratification). An abrupt change of energy greater than 1.5σ in the last month in comparison with the middle is detected in 59 cases.

Out of the 99 customers that are considered suspicious, an energy drop is noted in 67 cases (67.7%), 20 customers (20.2%) have the last energy before fraud detection approximately equal to the mean energy, and the remaining 12 customers (12.1%) have last energy values even higher than the upper value of mean energy. This finding shows that the energy drop before the anomaly is discovered is not the only indicator of fraud. The other two possibilities point to a part of the total energy time series which is registered at the measuring point, but not to the energy that entered the customer’s installation prior to the measuring point. Moreover, an accidental or intentional increase in energy values in the last month or several months before fraud detection is possible. However, considering the percentages found, it can be concluded that a value drop before detecting fraud is the dominant indicator for its detection. At the same time, the rules in Table 2 are shown to significantly increase the sensitivity of the XMR charts method.

On the MR chart of 97 customers, one or more points are outside the control limit (UCL)MR. This confirms the thesis that there is no need to explain the X chart when the process in the MR chart is out of control. For customers who are not suspicious according to the rules in Table 2, there are no points outside (UCL)MR. For the same customers, none of the other rules for the MR chart or rules for the X chart is fulfilled. One of the reasons for failure in fraud detection has already been noted as the small amount of data, and the other one is energy stealing before the meters. According to RT4, series of 8 or more points below the MR line, and only 4 series above the MR line, are registered for 70 customers in the MR charts of suspicious customers.

For 97 suspicious customers, one or more points outside the control limits UCL or LCL are registered in the X chart. For 53 customers, points are recorded outside both boundaries. In the upper half of the X chart, above the CL line, the RT2 test is satisfied 78 times and the RT3 test 75 times. At least one of these two tests is satisfied for 87 customers. Below the CL line, the RT2 test is satisfied 72 times and the RT3 test 80 times. At least one of these two tests is satisfied for 86 customers. When the RT2 and RT3 tests are considered separately, 88 customers are under suspicion according to the RT2 test, and 90 customers according to the RT3 test.

According to the RT4 test, 159 series with 8 or more points below the CL line are found for 91 customers, or 1.75 per customer on average. According to the same test, 106 series are above the CL line for 64 customers, or 1.5 per customer on average. The presence of at least one series above and one below the CL is found for 61 customers. According to the RT5, RT6, RT7 and RT8 tests, there are 23, 2, 5 and 25 suspicious customers, respectively. For 6 suspicious customers, 11–14 consecutive points within the region $\pm 1\sigma_t$ in relation to $\mu_t$ are recorded. The appearance of alternating positive autocorrelation of time series data is observed for 61 customers.

Based on the preceding analysis of the X charts, it can be concluded that the RT1, RT2, RT3 and RT4 tests were successful in 91.5%, 83%, 84.9% and 85.8% of cases, respectively. The success rate of the RT5 test is 21.7% and that of the RT6 test is only 1.9%. The success rate of the RT7 test is 4.7% and that of the RT8 test is 23.6%. Based on the analysis of the R charts, it can be concluded that the RT1 test is successful in 91.5% of cases and the RT4 test has a success rate of 66%. These data are also shown in Table 3.

The set of customers with detected stratification has no intersection with the set of customers for which at least one point is out of the control limits. This means that the total number of all suspicious customers is 2 + 97 = 99. The success rate of detecting suspicious customers using all tests, based on the total number of analyzed customers, is 100 · 99/106 = 93.4%. Table 3 gives an overview of the number of suspicious customers in absolute and percentage values according to the particular tests.

Table 3
Absolute and percentage values of the number of suspicious customers according to tests.

        X chart                                R chart
Test    Suspicious    Participation of the     Suspicious    Participation of the
        customers     test in confirming       customers     test in confirming
                      suspicion (%)                          suspicion (%)
RT1     97            91.5                     97            91.5
RT2     88            83
RT3     90            84.9
RT4     91            85.8                     70            66
RT5     23            21.7
RT6     2             1.9
RT7     5             4.7
RT8     25            23.6
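The percentage columns of Table 3 follow directly from the customer counts and the 106 analysed series. The short Python check below recomputes them; it is only an arithmetic illustration, and the names `share`, `x_chart` and `r_chart` are hypothetical.

```python
TOTAL = 106  # customers analysed in the case study

# Suspicious-customer counts per test, as listed in Table 3.
x_chart = {"RT1": 97, "RT2": 88, "RT3": 90, "RT4": 91,
           "RT5": 23, "RT6": 2, "RT7": 5, "RT8": 25}
r_chart = {"RT1": 97, "RT4": 70}


def share(count, total=TOTAL):
    """Percentage of customers, rounded to one decimal place as in Table 3."""
    return round(100 * count / total, 1)


x_share = {test: share(n) for test, n in x_chart.items()}
r_share = {test: share(n) for test, n in r_chart.items()}

# The RT1 set (97 customers) and the RT6 stratification set (2 customers)
# do not intersect, so the counts can simply be added.
overall = share(97 + 2)
```

Under these inputs `overall` reproduces the 93.4% success rate quoted in the text.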

Alternating positive autocorrelation, present in as many as 57.6% of cases, appeared as an important symptom of the presence of non-random factors. The success rate of 93.4% in detecting suspicious customers is evidently high. Considering this result, a possible deviation of the data distribution from the normal distribution in part of the analyzed series did not disturb the performance of suspicious customer detection. Overall, it can be concluded that testing of the XMR charts method has shown that it can be a good tool for detecting customers suspected of electricity fraud.

Conclusions

Checking the registered energy time series of customers from the beginning of the series to the fraud detection time by using the XMR charts method shows significant success. Specifically, in 93.4% of the time series tested, the rules indicate the presence of a non-random factor, i.e. reveal an unnatural customer pattern.

Based on the results of the conducted tests, it can be concluded that tests RT1, RT2, RT3 and RT4 should be conducted on X charts, as well as RT1 and RT4 on R charts, as they have the highest success rates.

The XMR charts method does not see electricity fraud before the measuring point if that fraud starts on the date of a customer’s unprofessional connection. The term unprofessional connection implies the electricity supplier’s inability to observe the measuring point bypass at first connection.

Electricity suppliers must form registered electrical energy time series of their customers and monitor these series. After each new monthly energy reading, the characteristic data required for forming the XMR charts should be calculated again. The difficulty of testing the rules on-line on each new chart can be overcome by introducing a first-step analysis which consists of registering a range larger than the upper limit (UCL)MR on the range chart. In this manner, the registered range should be an alarm for a more detailed series analysis.

The previous circumstance requires placing strong emphasis on the regularity of measuring point readings, as it affects moving ranges.

An energy drop in the last reading prior to fraud detection was recorded in only 67.7% of tests. This suggests that the energy drop, even though primary, is not the only indicator of the presence of a customer’s illegal activities. Although the XMR chart method is considered suitable for large and sudden changes higher than 1.5σ, the preceding analysis shows that this method detects unnatural patterns even for changes smaller than 1.5σ.

References

[1] Chandola V, Banerjee A, Kumar V. Anomaly detection: a survey. ACM Computing Surveys. Minnesota, USA: University of Minnesota; 2009.
[2] Monedero I, Biscarri F, León C, Guerrero JI, Biscarri J, Millán R. Detection of frauds and other non-technical losses in a power utility using Pearson coefficient, Bayesian networks and decision trees. Int J Electr Power Energy Syst 2012;34:90–8.
[3] Monedero I, Biscarri F, León C, Biscarri J, Millán R. MIDAS: detection of non-technical losses in electrical consumption using neural networks and statistical techniques. Escuela Técnica Superior de Ingeniería Informática, Departamento de Tecnología Electrónica, Avda, Seville, Spain; 2012.
[4] Galván JR, Elices A, Muñoz A, Czernichow T, Sanz-Bobi MA. System for detection of abnormalities and fraud in customer consumption. In: 12th Conference on the electric power supply industry, November 2–6, Pattaya, Thailand; 1998.
[5] Pawlak Z. Rough sets. Int J Comput Inform Sci 1982;11:341–56.
[6] Cabral Junior JE, Pinto JOP, Gontijo EM, Filho JR. Rough sets based fraud detection in electrical energy consumers. 6th WSEAS Int. Conf. on Mathematics and Computers in Physics (MCP ’04), Cancun, Mexico; 2004.
[7] Spirić JV, Stanković SS, Dočić MB, Popović TD. Using the rough set theory to detect fraud committed by electricity customers. Electr Power Energy Syst 2014;62:727–34.
[8] Spirić JV, Janjić A. Using of Fuzzy Logic in the struggle with the unauthorized consumption of the electrical energy. Regional conference and exhibition on electricity distribution, October 5–8, Herceg Novi, Montenegro; 2004.
[9] Spirić JV, Janjić A. Application of fuzzy logic in detection of unauthorized electricity consumption by customers with single-rate tariff meters. Second regional conference and exhibition on electricity distribution, October 17–20, Zlatibor, Serbia; 2006.
[10] McCracken AK, Chakraborti S. Control chart for joint monitoring of mean and variance. An overview. Qual Technol Quant Manage 2013;10(1):17–36.
[11] Montgomery DC. Introduction to statistical quality control. 6th ed. Arizona, USA: Arizona State University, John Wiley & Sons, Inc.; 2009.
[12] Korn GA, Korn TM. Mathematical handbook. New York, San Francisco, Toronto, London, Sydney: McGraw-Hill Book Company; 1968.
[13] Spirić JV, Jović A, Lončarević F, Spirić JJ. Gubici električne energije u distributivnom sistemu Elektroprivrede Srbije. Tehnika, br. 4, Beograd; 2012.
[14] Khan RM. Problem solving and data analysis using Minitab. Chichester, UK: John Wiley & Sons; 2013.
[15] Rigdon SE, Cruthis EN, Champ CW. Design strategies for individuals and moving range. J Qual Technol 1994;26(4):274–87.
[16] Quesenberry CP. The effect of sample size on estimated limits for X̄ and X control charts. J Qual Technol 1993;25(4):237–47.
[17] Shenoy RR. Misuse and performance of individuals process control for single parameter distributions of unknown stability. Boston, Massachusetts, USA: North-eastern University; 2008.
[18] Mohammed MA, Worthington P, Woodall WH. Plotting basic control charts: tutorial notes for healthcare practitioners. Birmingham, UK: Department of Public Health and Epidemiology, University of Birmingham; 2008.
[19] Noskievičová D. Complex control chart interpretation. Int J Eng Business Manage, vol. 5, FMMI, VŠB-TU Ostrava, Czech Republic; 2013.
[20] Nelson LS. The Shewhart control chart – tests for special causes. J Qual Technol 1984;16(4):237–9.
[21] Baldassarre MT, Boffoli N, Caivano D. Statistical process control for software: fill the gap. Abdurrahman Coskun. Bari, Italy: University of Bari; 2010.
[22] Jamali AS, JinLin L. False alarm rates for the Shewhart control chart with interpretation rules. Beijing, China: Beijing Institute of Technology; 2006.
[23] Nelson LS. Technical aids. J Qual Technol 1984;16(4):238–9.
[24] George E, Janiszewski S. Applying six sigma and statistical quality control to optimizing software inspections. ITRA, PS&J Software Six Sigma; 2002.
[25] Nelson LS. Control charts for individual measurements. J Qual Technol 1982;14(34).
[26] Wheeler DJ. Advanced topics in statistical process control: the power of Shewhart’s charts. Knoxville, TN: SPC Press Inc.; 2004.
