
2016 IEEE PES Transmission & Distribution Conference and Exposition - Latin America (PES T&D-LA).

Morelia, Mexico

A Big Data Analytics Design Patterns to Select Customers for Electricity Theft Inspection
Adriano Galindo Leal, Member, IEEE
IPT – Institute for Technological Research
São Paulo - Brazil
e-mail: leal@ipt.br or leal@ieee.org

Marcel Boldt
Technology at Munich University of Applied Sciences
Germany
e-mail: marcel@boldt.co.za or de.m.boldt@ieee.org

Abstract – The complexity of Big Data originates not exclusively from high volume, but also from high velocity and high variety. These characteristics are caused by technological advancements, mainly those that (1) allow storing huge quantities of data and facilitate easy access and data manipulation; and (2) those that now allow processing these data for more comprehensive insight. In particular, these are techniques that facilitate the incorporation and combined analysis of heterogeneous data sources. The first category gave rise to the concepts of open databases, interactive web services, and social media. These are by themselves huge sources of data from which insight may be generated if the obstacles of their highly heterogeneous, dynamic and decentralized nature – common characteristics of socio-technical systems – can be overcome. Building upon systems engineering principles, this paper presents a design pattern that addresses the interdisciplinary challenges of the Big Data environment. At its core are a functional architecture and a phased advancement model. These are illustrated by outlining the development of a Big Data analytics approach to select suspicious customers from the customer database of a power grid operator for inspections on electricity theft, based on customer profiling.

Keywords – Power distribution, power system analysis computing, decision support systems, pattern analysis, smart grids

I. INTRODUCTION

A. Context, literature review, rationale, structure

“Big Data starts with large-volume, heterogeneous, autonomous sources with distributed and decentralized control, and seeks to explore complex and evolving relationships among data.” [1] Harvesting data in social networks is a trending Big Data topic. Techniques have been developed across disciplines, in information studies, computer science, applied mathematics and statistics in conjunction with sociology, economics, political science, linguistics, communication studies, culture-related and philosophical sciences. Information generated by social media analysis is used for example in business and marketing [2], finance [3], or crisis management [4].

Publications can also be found on crime prevention and detection through social network data analysis. [5] provides the scientific context: The field of computational forensics (CF) is here defined as “hypothesis-driven investigation of a specific forensic problem using computers, with the primary goal of discovery and advancement of forensic knowledge”. The discipline of CF explores the use of computational methods in criminology. In particular, it provides tools that may (1) analyze evidence beyond the limitations of human capability, (2) establish a scientific basis by analyzing large volumes of data, and ultimately (3) represent human expert knowledge and reasoning in the context of crime investigation and prevention. In the same publication, [6] presents an approach providing automated content analysis for detecting crimes like copyright infringement, identity theft and child sexual abuse in digital communities (e.g. peer-to-peer file sharing systems, chat applications and social networking sites). [7] examines the computer forensic process of obtaining digital evidence from social media and its related legal aspects. [8] proposes measures and a methodology to extract conclusions on the attitude of individual social network users towards a predisposition against law-enforcement and authorities. Applying a psychosocial perspective, the paper’s intention is to use the information shared by users in social media for proactive insider threat detection and prevention within corporate cybersecurity.

It is thus likely that there is also information on social media that is relevant to energy theft detection. Further studies would be necessary to prove that information from social media can be used as a sole indicator to predict an individual’s likelihood of illegally obtaining electrical energy. However, it can be used in combination with other methods of detection (e.g. [9]–[13]) to increase the success rate of inspections. In this paper, a Big Data design pattern to aid the development of a predictive analytics solution is proposed and elucidated using the example of an analytics approach for leveraging a grid operator’s customer data, by enriching these data with personal information generated from further sources (see Table I). First, an exemplary exploration of the problem of electricity fraud is given in Section II, to provide a foundation for a set of hypotheses. After solution requirements are stipulated in Section III, a general solution principle and architecture are introduced and delimited in Section IV, which involves a phase-based advancement model (see Fig. 1) and a functional structure (see Fig. 2).

B. Big Data Design Patterns

In Software Engineering, design patterns enable better quality and lower total cost of ownership, due to their ability to capture the best practices in software architecture design and
The authors also wish to acknowledge the generous financial support for
the congress participation from IEEE/PES Region 9 (Latin America &
Caribbean) and the South Brazil Section IEEE/PES Chapter.

978-1-5090-2875-7/16/$31.00 ©2016 IEEE



how to avoid expensive traps and pitfalls. In contrast to a rigid method or technique, a methodology as well as a design pattern could be defined as a “set of ongoing principles which can be adapted for use in a way which suits the specific nature of each situation in which it is used” [14]. It should not be understood as a sequence of steps, but rather as a ‘toolbox’ that can be adapted flexibly. The superordinate (more abstract and more universal) meta-methodology that is adopted here is a system design process described within the Hall-BWI Systems Engineering (SE) concept [15]. This recommends a pre-study in which (1) the problem is further analyzed and delimited from its environment, (2) general solution requirements are stipulated, and (3) finally, as a milestone, the architecture is decided upon. The ISO/IEC/IEEE standard no. 42010:2011 defines architecture and provides a framework for its description [16]; it therefore constitutes the understanding that underlies this paper.

The process of Big Data solution development is made up of five phases (see Fig. 1). However, the focus of this paper is the pre-study phase that defines a solution’s architecture. The first step towards this is to define and analyze the problem – adequate methods can be borrowed from strategic planning. A formal Statement of Requirements should subsume the stakeholders’ expectations. It may serve to fix a contractual basis but also helps to shift the focus from problem to solution orientation. It also helps to define a common solution rationale – and / or cause-effect relations, e.g. by sensitivity models [17] – to identify critical factors and measures.

[Fig. 1. Development Phases. Items recoverable from the figure: problem definition; stakeholders & requirements; architectural elements’ definition; data access concept; analytics concept; business integration concept; DECISION: analytics aims & rationale.]

Design space analysis (DSA) [18] helps explore alternatives for designing architecture elements. It furthermore depicts the design space considered during solution design, comprising alternative solution design options and the reasoning for choosing among these. There are two main advantages: a suitable solution is chosen instead of the most obvious one, and it supports a later re-design or re-use of parts by making the design process comprehensible throughout the solution’s lifecycle. DSA recommends a semi-formal notation QOC (questions, options, criteria), with questions being the key issues in the design, options being the alternative answers and criteria being the reasons to evaluate the alternatives’ adequateness [18].

II. PROBLEM

A. The phenomenon of electricity theft

Estimations on costs caused by power theft range from £100m to £400m per annum in the UK, according to the national Revenue Protection Association (UKRPA) [19]. The Brazilian Association of Electrical Energy Distributors (ABRADEE) estimated the losses of 63 electric utilities in 2013 to be 5.6 % due to commercial losses and 8.39 % due to technical losses [20]. Grid operators try to uncover pilferage by performing inspections, typically based on insight gained from consumption pattern analysis. In practice, the success rate, i.e. the share of inspections that actually discover fraud, is inconsistent over time: whenever a new method is introduced, the detection rate may initially be high, but its effectiveness decreases over time as fraudsters adapt their methods to avoid detection.

A typical household is billed according to its power consumption, measured by a meter that is read at an interval (typically monthly / yearly). Electricity theft, therefore, entails forging the demand being reported to the grid operator. This can be done in three ways: (1) interrupting measurement by disconnecting / bypassing the meter (stealing electricity); (2) tampering with the meter so that it measures incorrectly or “forgets” (manipulation); (3) disturbing transmission of a measured consumption – in a conventional grid by bribing an employee (corruption); in smart grids also by manipulating communication. Manipulating or bypassing a meter requires some level of technical skill and involves a severe risk of electrocution for the individuals committing it. These and several other factors favor service offerings by organized crime. [21] [22]

There are three categories of methodical approaches against electricity theft: technical / engineering methods, managerial methods, and system change. Technical solutions involve upgrades of hardware, e.g. power lines and transformers, as well as applications of metering and monitoring systems. New hardware will not only reduce technical losses but also make the grid more secure against attempts of illegal energy abstraction. Smart grid technologies, like smart metering, already play an important role – yet the full potential of these is still not entirely explored. In addition, conventional technology, e.g. profoundly sealed meters, can help. Managerial methods aim at improving processes within utility organizations to ensure compliance. This not only involves surveillance of customers but also providing employees with incentives to reduce theft and prosecuting corruption. System change relates to the operating principles of the utility. For example, in principle, privatization of a utility operator

may shift the focus towards optimizing profits, which may be reached by trying to minimize losses from theft [21].

B. Etiology of fraud

As electricity theft, no matter how performed, aims at making the grid operator assume and charge a demand that is less than actual, it is essentially fraudulent conduct: “obtaining something of value or avoiding an obligation by means of deception” [23].

Factors inducing fraud are rooted in a motivated, capable offender and the offender’s environment, with opportunities being created by available targets and threats by guardians / control [24]. The individual’s perceptions and moralities develop constantly through interactions with the utility, surrounding social relationships and the technologies involved [25]. Profiling publications state motivations to be financial strain arising from lifestyle choices (e.g. drug abuse, gambling), threats of loss (of wealth, power, status) due to misfortune or imprudence, and typically (financial / social) implications of relationship breakdowns, or greed. They can also be personality / ego-related, e.g. someone taking delight in the act of fraud. This can be due to personality disorders (narcissistic / antisocial), but can also be the result of fraudulent behavior that provided gratification of successfully being able to solve a problem. Another ego challenge for offenders is to perceive personal superiority from ‘fooling’ their victims [26]. Complementary to motivations, psychological processes that neutralize internal moral objections take place. ‘Vocabularies of adjustment’ [27] that trivialize a crime involve a ‘victimless crime’ as the victim ‘can afford it’ – or ‘had it coming’ (responsibility is shifted to a victim who is itself greedy or involved in immoral actions). Another strategy comes from a misanthropic view of humankind: everyone would do it, except those who are naive. Offenses also appear less contemptible if a victim is disliked or despised. [23] [28]

Regarding the relationship between a customer and an organization, there are two general motivational themes. The first is an offender who feels to have been treated unfairly; the fraud is then an act of retaliation or self-justice. The second is territorial ownership: the offender feels entitled by regular use, access, or occupation [23]. Typical neutralization processes can relate to the trustworthiness of an organization. For example, in a study regarding non-payment of public service charges, citizens’ compliance was affected by “(1) trust in the local government to use revenues to provide expected services; (2) trust in the authorities to establish fair procedures for revenue collection and distribution of services; and (3) trust in other citizens to pay their share.” [29]

III. SOLUTION REQUIREMENTS

The objective is a concept for a customized analytics approach. The level of detail should be adequate for making a system build decision, i.e. a system definition / specification. The following stakeholders must be considered when formulating requirements for the solution:

• The utility grid operator management – the customer whose interest is to deploy a Big Data approach to increase the success rate of inspections. On approval, this stakeholder group will make a strategic investment decision on an implementation. This involves resolving the conflict between the estimated costs and the expected benefit of the implementation. Management will hereby rely on the IT department’s recommendations.
• The grid operator’s IT & Organization department – most likely the contractor to a solution development project, which will be in charge of implementing the solution into the utility’s systems and business processes, and of maintaining the operational service that makes the analytics outcomes available to decision makers. The IT department will tend to demand a solution that is reliably effective and allows effortless implementation, maintenance, and extensibility.
• Regulatory bodies and the public interest demand compliance with laws on the use and protection of customer data.

These justified stakeholder claims substantiate the following functional requirements:

• Delivery of (factual and timely) relevant, household-based information on the probability of electricity theft.
• Local scalability (regions, boroughs, network segments).

Apart from the functional requirements (a solution that identifies suspicious customers), several non-functional requirements are typical of Big Data analytics solutions and should, therefore, be taken into consideration:

• Allowing adaptation to new business use cases.
• Extensibility to further data sources.
• Open interfaces towards applications.
• Adaptability to fluctuating workload (scalability).
• Resilience even when exposed to external disturbances.
• Transparent lifecycle costs / TCO.
• Compliance with relevant laws and regulations.

IV. SOLUTION PRINCIPLE & ARCHITECTURE

The architecture must entail (see also Fig. 2): (1) a data access concept – which raw data from which sources must be made available; (2) an analytics concept – how relevant information is distilled from data; (3) a business integration concept – how analytics outcomes may be employed within operations.

A. Data access concept

The first question (Q1) here is: Which raw data from which sources must be made available? Problem analysis (Section II) made it possible to stipulate eleven abstract criteria of a person that are possibly related to a disposition towards electricity theft, along with several indicators and sources where information could hypothetically be acquired (see Table I). In this context, Ishikawa diagrams were helpful to guide the search for causes. Main categories explored were the individual (ability / capability and mindset / willingness) and the environment (opportunities / threats).
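
One way to make the QOC notation of design space analysis concrete for Q1 is the minimal sketch below. It records the question, a few candidate data sources taken from Table I as options, and the paper's three evaluation criteria (relevance, difficulty of acquisition, veracity). The class names, the weighted-sum ranking, and every weight and score are assumptions made purely for illustration; the paper prescribes neither a scoring scheme nor these values. Difficulty of acquisition is inverted into a hypothetical `ease_of_acquisition` so that higher is always better.

```python
from dataclasses import dataclass, field


@dataclass
class Option:
    """One alternative answer to a design question (here: a data source)."""
    name: str
    scores: dict  # criterion name -> score in [0, 1], higher is better


@dataclass
class Question:
    """A QOC design question with weighted criteria and candidate options."""
    text: str
    weights: dict  # criterion name -> relative weight (hypothetical)
    options: list = field(default_factory=list)

    def total(self, option):
        # Weighted sum over all criteria; a missing score counts as 0.
        return sum(w * option.scores.get(c, 0.0) for c, w in self.weights.items())

    def ranked(self):
        # Best-scoring option first.
        return sorted(self.options, key=self.total, reverse=True)


# All weights and scores below are made-up illustration values.
q1 = Question(
    text="Which raw data from which sources must be made available?",
    weights={"relevance": 0.5, "ease_of_acquisition": 0.25, "veracity": 0.25},
    options=[
        Option("social media",
               {"relevance": 1.0, "ease_of_acquisition": 0.25, "veracity": 0.25}),
        Option("internal contract data",
               {"relevance": 0.5, "ease_of_acquisition": 1.0, "veracity": 1.0}),
        Option("public statistics",
               {"relevance": 0.25, "ease_of_acquisition": 0.75, "veracity": 0.75}),
    ],
)

for opt in q1.ranked():
    print(f"{opt.name}: {q1.total(opt):.3f}")
```

Keeping the question, options and criteria in one record is what makes a later re-design traceable: changing a weight or adding a data source re-ranks the options without losing the reasoning behind the earlier choice.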

Fig. 2. Functional architecture (see also [30])

TABLE I - Q1: DSA DATA OPTION

Individual criteria         Indicator                           Data source              Data type
--------------------------  ----------------------------------  -----------------------  ---------------
consumption's deviation     consumption pattern analysis        consumption data         structured
technical knowledge         job                                 social media             semi-structured
                            education                           social media             semi-structured
                            technical expressions               social media             unstructured
relation to utility         QOS perceived                       internal service         structured
                            sentiment                           internal, social net     unstructured
                            trust in public org./auth.          social media             unstructured
ecological responsibility   use of renewable energy             internal contract data   structured
social responsibility       associations                        social media             semi-structured
economic conduct            credit rating                       agency                   structured
                            contractual behavior (changes)      internal contract data   structured
                            number of contracts                 internal contract data   structured
lifestyle red flags         gambling                            social media             unstructured
                            drug abuse                          social media             unstructured
                            relationship breakdown              social media             unstructured
personality                 personality disorders               social media             unstructured
difficulty of conduct       age / type of meter                 internal contract data   structured
support available           technical / friends with knowledge  social network           semi-structured
                            mental / criminal friends           social network           semi-structured
                            mental / regional crime rate        public statistics        structured
probability of inspections  last inspections                    internal customer data   structured
                            inspections in the neighborhood     internal customer data   structured
                            inspections at friends              social network/internal  structured
damage / harm caused        electrical accidents                social network           unstructured
                            the appearance of power cuts        internal service data    structured

The resulting DSA options on Q1 were evaluated by a set of criteria:

Relevance: A preliminary, hypothetical, intuitive assessment of the relatedness towards a disposition to electricity theft.

Difficulty of acquisition: This subsumes several factors, like the general availability and completeness of data, the complexity of its processing and the costs incurred. Least difficult to handle is structured data from internal sources, as these are typically in the proper format for analysis, or can be evaluated with established business analytics tools. Structured data from external sources can be more difficult to treat due to its format or granularity. Most problematic in this sense are unstructured sets of data, in particular from external sources like social networks, as getting hold of the necessary information may require complicated and expensive data processing, and data will not be available for all customers.

Veracity: This involves data being free from errors, not outdated and credible in general. Internal sources will, in general, be credible, while information processed from social networks may be unreliable. Furthermore, information generated from unstructured sources, e.g. by text analysis, may be of varying quality due to insufficiencies of interpretation algorithms. Especially outcomes of new approaches like sentiment analysis should be considered critically.

A further consideration is how ETL (extract, transform, and load) is concretely performed to make the data available. After all, one of the innovations of Big Data is that various data sources can be incorporated, be it because of new interfaces to structured data sources that were formerly inaccessible (which also incorporates data collected from sensors, discussed as ‘Internet of Things’), or because research provided new methods to access information from unstructured – often plain text or multimedia – sources. Information can be extracted by linguistic techniques, e.g. extracting entities (dates, names, and addresses), inferring the author’s sentiment, or classifying data into topical categories in order to filter those that potentially contain relevant information. The concrete procedures must be specific to each data source; this problem will thus be considered on a level below architecture. The general logic will be (1) establishing an interface to the data source, (2) extracting chunks of data, (3) perhaps storing them temporarily – e.g. in graph database structures for social relations of individuals, spatial databases for locations and distances, key-value database concepts for unstructured text, caching of real-time streaming data; (4) applying a generic algorithm to extract the information, and finally (5) loading it into the main analytics environment.

At this point, it should be obvious that even a thoroughly conducted evaluation of all solution alternatives is unlikely to result in a perfect solution, or even to deliver practically relevant results on the first attempt. The first analysis outcomes will rather provide the foundation for improvements – by adding / altering data sources and ETL methods, and by improvements on the analytics algorithm itself. An iterative advancement model should thus be taken into consideration, as further versions will be necessary to increase the reliability of predictions. It is also the reason why the set of hypotheses created is, in accordance with the literature, “not rigorous, as in the case of statistical approaches” [31], but rather a first entry into a Big Data implementation, which should be perceived as a dynamically evolving system following an evolutionary rather than a merely technical logic.

B. Analytics concept

The next issue to be decided upon within Big Data architecture (Q2) is: How is relevant information distilled from data? The solution’s functional requirements call for a custom-developed predictive analytics approach. The process applied to this project phase is nevertheless generic (see also [32]):

Exploring & preparing data for analytics: A predictive model requires input data structured in continuous or categorical formats [33] – which must be the

outcomes of the ETL processes, which may also have added missing values or removed noise. Data quality should be assessed through exploratory data analysis (EDA) prior to performing analytics [34]. This procedure may also offer further insight into relationships within the data that might deliver starting points for model development.

Developing and training the model: A generic machine-learning algorithm must be chosen and trained with data. The algorithm identifies relationships and patterns within the data sets that can be generalized to a set of rules – the analytics model – and used for prediction. Common algorithms are e.g. regression (linear / logistic), instance-based and clustering methods (k-nearest neighbors / k-means), stochastic methods (e.g. naive Bayes), decision trees / random forests / bagged trees, and black box approaches (artificial neural networks, support vector machines) [34], [35]. Three things should be considered. First, it may be necessary to break the problem down into parts that can be solved with generic methods. Second, not all identified patterns may be helpful: integrating too many rules will overfit the model to the training data, while too few rules make the model unreliable. Third, applicable algorithms may tend to bias depending on how well they can learn from the concrete data.

Evaluating actual model performance: Effectiveness and efficiency of a model’s performance are determined by the proper output format (necessary criterion), the reliability of predictions, and processing speed (comparative criterion). The model must furthermore allow adding more variables, provide continuous / iterative learning capability, and be able to process large amounts of data, ideally by scaling out in an extreme parallel processing environment.

A confusion matrix (see Table II) is helpful to assess the quality of predictions. There are several key metrics [34]:

TABLE II - Q2: CONFUSION MATRIX

                     actually positive     actually negative
predicted positive   true positive (TP)    false positive (FP)
predicted negative   false negative (FN)   true negative (TN)

Accuracy is the ratio of correct predictions to total predictions.

\[ \text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \tag{1} \]

Precision is the ratio of correctly reported fraud to all indicated fraud, i.e. the probability that suggested fraud is actually fraud.

\[ \text{Precision} = \frac{TP}{TP + FP} \tag{2} \]

Recall is the probability of fraud being identified.

\[ \text{Recall} = \frac{TP}{TP + FN} \tag{3} \]

The output format is highly pre-defined by the problem in this case. Given a classification and prediction problem (high / low likelihood of fraud) and considering the uncertainty on underlying cause-effect relationships, the output may be a relative figure, with a business rule setting a threshold on when to perform an inspection. Alternatively, a binary output could flag suspicious individuals, but the quality of insight would be lower.

The following conclusions can be drawn with regard to the solution architecture: (1) There is no single solution; various approaches are conceivable even for well-defined problems. (2) Model development and improvement will be a continuous process – on the one hand, room for improvement will be discovered during use; on the other hand, fraudsters may adapt and improve their strategies. Prototyping within solution development is highly recommended. It is also advisable to reconsider the model for future versions. (3) The credibility of information must always be taken into consideration. Information of doubtful origin may lead to valuable insight, but might also cause trouble.

C. Business integration concept

The architecture design question (Q3) to be answered here is: How may analytics outcomes be employed within operations? In a narrow sense, the key issue here is the degree of decision automation. Analytics outcomes are, ideally, a relevant resource of information that has been extracted from data. When it comes to the use of this information, a high-volume process with many decisions of low individual value calls for automation by adding a software-based rules engine. In the current case, inspections can be automatically triggered based on a defined threshold probability. Alternatively, if the individual value of the decisions is high, a manual decision by an analyst or a manager, informed by the available information, might be more suitable.

Further aspects relate to integration in a wider sense. One aspect of the operationalization of analytics is organizational, namely workflow / business process planning. Another one is cultural: in order to adopt new practices, awareness of the problem needs to be raised and capabilities must be built by training, in order to foster acceptance by the staff involved in the processes. Even if the new approach has been accepted in principle by the personnel, intense assistance and supervision will initially be necessary in order to sustain the application until it has settled as a routine in the processes and a part of the employees’ mindset. Finally, it is highly advisable to use standards like PMBOK, ISO, CMM, ITIL, SOX to implement change management concepts, since this is a key process for success.

V. CONCLUSION

Detecting electricity theft in energy grids by customer profiling is a complex problem, among other things because of the manifoldness of root causes and their ambiguity. Traditionally, systems for fraud detection are built to support a single methodology. Their lack of flexibility leads to new costs whenever a strategy that had great initial success must be modernized because its hit rate has decreased as fraudsters adapted.

Big Data technology used with the proposed design pattern, although it requires a larger initial investment, allows such flexibility and a lower total cost of ownership. It thus allows the detection methodology to evolve in a continuous process that analyses the problem, identifies potentially relevant data sources and makes data accessible, develops and

improves analytics models to distil information, and provides means to employ it within the organization.

The approach has been depicted in an attempt to support multiple methodologies to predict the likelihood of electricity theft by customer data analysis. Both the solution and the methodology still need to be validated in practice. We, together with another M.Sc. student, are in the early stages of validating the proposed approach in a pilot case that compares the results obtained in practice by a local electric power distributor with the results of this new approach.

ACKNOWLEDGMENT

The authors would like to express their deepest gratitude and appreciation to Fernando José Gomes Landgraf, Carlos Daher Padovezi, Zehbour Panossian, Alessandro Santiago dos Santos, Pedro Chinelato, Dirce Apparecida Rosaboni, Maria Aparecida Leal, Herbert Palm, Edith Wagner, Maria Celeste Cetra and Kornelia Muller. Additionally, the authors would like to extend their deepest gratitude and appreciation to the affiliates of IPT and Munich University of Applied Sciences.

REFERENCES

[1] X. Wu, X. Zhu, G.-Q. Wu, and W. Ding, “Data mining with big data,” Knowledge and Data Engineering, IEEE Transactions on, vol. 26, no. 1, pp. 97–107, Jan 2014.
[2] C.-Y. Lin, L. Wu, Z. Wen, H. Tong, V. Griffiths-Fisher, L. Shi, and D. Lubensky, “Social network analysis in the enterprise,” Proceedings of the IEEE, vol. 100, no. 9, pp. 2759–2776, Sept 2012.
[3] J. Bollen and H. Mao, “Twitter mood as a stock market predictor,” Computer, vol. 44, no. 10, pp. 91–94, 2011.
[4] F. Johansson, J. Brynielsson, and M. Quijano, “Estimating citizen alertness in crises using social media monitoring and analysis,” in European Intelligence and Security Informatics Conference (EISIC). IEEE Computer Society, Aug 2012, pp. 189–196.
[5] K. Franke and S. N. Srihari, “Computational forensics: An overview,” in Computational Forensics: Second International Workshop, IWCF 2008, Washington, DC, USA, August 7-8, 2008, Proceedings, ser. Lecture Notes in Computer Science / Image Processing, Computer Vision, Pattern Recognition, and Graphics, S. Srihari and K. Franke, Eds. Springer, 2008.
[6] D. Hughes, P. Rayson, J. Walkerdine, K. Lee, P. Greenwood, A. Rashid, C. May-Chahal, and M. Brennan, “Supporting law enforcement in dig-

Power Systems Conference and Exposition (PSCE), 2011 IEEE/PES. IEEE, 2011, pp. 1–8.
[12] R. Jiang, R. Lu, Y. Wang, J. Luo, C. Shen, and X. S. Shen, “Energy-theft detection issues for advanced metering infrastructure in smart grid,” Tsinghua Science and Technology, vol. 19, 2014.
[13] S. McLaughlin, B. Holbert, A. Fawaz, R. Berthier, and S. Zonouz, “A multi-sensor energy theft detection framework for advanced metering infrastructures,” Selected Areas in Communications, IEEE Journal on, vol. 31, no. 7, pp. 1319–1330, 2013.
[14] P. Checkland and J. Poulter, Learning for action: a short definitive account of soft systems methodology and its use for the practitioner, teachers, and students. Chichester: Wiley, 2006.
[15] R. Haberfellner, O. L. de Weck, E. Fricke, and S. Vössner, Eds., Systems Engineering: Grundlagen und Anwendung, 12th ed. Zürich: Orell Füssli, 2012.
[16] “ISO/IEC/IEEE systems and software engineering – architecture description,” ISO/IEC/IEEE 42010:2011(E) (Revision of ISO/IEC 42010:2007 and IEEE Std 1471-2000), December 2011.
[17] F. Vester, The Art of Interconnected Thinking: Ideas and Tools for a New Approach to Tackling Complexity, 2nd ed. Munich: MCB, 2012.
[18] A. Maclean and D. McKerlie, “Design space analysis and use representations,” in Scenario-Based Design: Envisioning Work and Technology in System Development, J. Carrol, Ed., Rank Xerox Ltd. New York: John Wiley & Sons, Inc., 1995, pp. 183–207.
[19] “Strategy 2014/15,” UKRPA, London, Tech. Rep., April 2014.
[20] ABRADEE, “Furto e fraude de energia.” [Online]. Available: http://www.abradee.com.br/setor-de-distribuicao/perdas/furto-e-fraude-de-energia
[21] T. B. Smith, “Electricity theft: a comparative analysis,” Energy Policy, vol. 32, no. 18, pp. 2067–2076, 2004.
[22] S. McLaughlin, D. Podkuiko, and P. McDaniel, “Energy theft in the advanced metering infrastructure,” in Critical Information Infrastructures Security. Springer, 2010, pp. 176–187.
[23] G. M. Duffield and P. N. Grabosky, The psychology of fraud. Australian Institute of Criminology, 2001.
[24] L. E. Cohen and M. Felson, “Social change and crime rate trends: A routine activity approach,” American sociological review, pp. 588–608, 1979.
[25] T. Winther, “Electricity theft as a relational issue: a comparative look at Zanzibar, Tanzania, and the Sunderban Islands, India,” Energy for Sustainable Development, vol. 16, no. 1, pp. 111–119, 2012.
[26] E. Stotland, “White collar criminals,” Journal of social issues, vol. 33, no. 4, pp. 179–196, 1977.
[27] D. R. Cressey, “Other people’s money; a study of the social psychology of embezzlement.” 1953.
[28] A. Kapardis and M. Krambia-Kapardis, “Enhancing fraud prevention
ital communities through natural language analysis,” in Computational
and detection by profiling fraud offenders.” Criminal behavior and
Forensics: Second International Workshop, IWCF 2008, Washington,
mental health: CBMH, vol. 14, no. 3, pp. 189–201, 2004
DC, USA, August 7-8, 2008.
[29] O.-H. Fjeldstad, “What’s trust got to do with it? non-payment of service
[7] M. Taylor, J. Haggerty, D. Gresty, P. Almond, and T. Berry, “Forensic
charges in local authorities in south Africa,” The Journal of Modern
investigation of social networking applications,” Network Security, vol.
African Studies, vol. 42, pp. 539–562, 12 2004.
2014, no. 11, pp. 9 – 16, 2014. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S1353485814701126 [30] M. Barlow, Real-Time Big Data Analytics: Emerging Architecture.
Sebastopol, CA: O’Reilly, 2013.
[8] M. Kandias, V. Stavrou, N. Bozovic, and D. Gritzalis, “Proactive insider
threat detection through social media: The youtube case,” in Proceedings [31] W. Raghupathi and V. Raghupathi, “Big data analytics—architectures,
of the 12th ACM Workshop on Workshop on Privacy in the Electronic implementation methodology, and tools,” in Big Data, Mining, and
Society, ser. WPES ’13. New York, NY, USA: ACM, 2013, pp. 261– Analytics: Components of Strategic Decision Making, S. Kudyba, Ed.
266. [Online]. Available: http://doi.acm.org/10.1145/2517840.2517865 Boca Raton, FL: Taylor & Francis, 2014, ch. 3, pp. 49–70.
[9] A. Chauhan and S. Rajvanshi, “Non-technical losses in power system: A [32] B. Lantz, Machine Learning with R. Birmingham: Packt, October 2013.
review,” in Power, Energy and Control (ICPEC), 2013 International [33] M. S. Brown, “Transforming unstructured data into useful information,”
Conference on. IEEE, 2013, pp. 558–561. in Big Data, Mining, and Analytics: Components of Strategic Decision
[10] T. Cazes, F. Macedo, E. Costa, L. Rios, A. Mendonc¸a, and L. Ribeiro, Making, S. Kudyba, Ed. Boca Raton, FL: Taylor & Francis, 2014.
“Detecção de anomalias de consumo por meio de análise baseada em [34] R. Schutt and C. O’Neil, Doing Data Science: Straight Talk from the
wavelet,” in VII Congresso de Inovação Tecnológica em Energia Frontline. Sebastopol, CA: O’Reilly, 2013.
Elétrica, Rio de Janeiro, 2013. [35] R. Caruana and A. Niculescu-Mizil, “An empirical comparison of
[11] S. S. S. R. Depuru, L. Wang, and V. Devabhaktuni, “Support vector supervised learning algorithms,” in Proceedings of the 23rd international
machine-based data classification for detection of electricity theft,” in conference on Machine learning. ACM, 2006, pp. 161–168.
