Anomaly Detection Using Visualisation

See discussions, stats, and author profiles for this publication at: https://www.researchgate.
net/publication/259096260
Anomaly Detection for Visual Analytics of Power Consumption Data
Article in Computers & Graphics · February 2014

DOI: 10.1016/j.cag.2013.10.006
CITATIONS READS
41 1,017
4 authors, including:
Halldor Janetzko Florian Stoffel

Lucerne University of Applied Sciences and Arts LKA NRW
51 PUBLICATIONS 619 CITATIONS 31 PUBLICATIONS 281 CITATIONS
SEE PROFILE SEE PROFILE
Some of the authors of this publication are also working on these related projects:
BigGIS View project
Visual Analytics View project
All content following this page was uploaded by Halldor Janetzko on 29 August 2018.
The user has requested enhancement of the downloaded file.

Anomaly Detection for Visual Analytics of Power Consumption Data
Halldór Janetzko, Florian Stoffel, Sebastian Mittelstädt, Daniel A. Keim

University of Konstanz. Email: firstname.lastname@uni-konstanz.de
Abstract
Commercial buildings are significant consumers of electrical power. Also, energy expenses are an increasing cost factor. Many
companies therefore want to save money and reduce their power usage. Building administrators have to first understand the power
consumption behavior, before they can devise strategies to save energy. Secondly, sudden unexpected changes in power consump-
tion may hint at device failures of critical technical infrastructure. The goal of our research is to enable the analyst to understand
the power consumption behavior and to be aware of unexpected power consumption values. In this paper, we introduce a novel
unsupervised anomaly detection algorithm and visualize the resulting anomaly scores to guide the analyst to important time points.
Different possibilities for visualizing the power usage time series are presented, combined with a discussion of the design choices
to encode the anomaly values. Our methods are applied to real-world time series of power consumption, logged in a hierarchical
sensor network.
Keywords: importance-driven, pixel-based visualization, anomaly detection, visual analytics
1. Introduction on a time-dependent energy consumption model. We have ex-

plored two different anomaly discovery methods. In the begin-
Commercial buildings consume a significant amount of elec- ning, we estimate the error rate using prediction. Then, we use
tricity. According to the Energy Information Administration’s clustering-based anomaly detection. Both methods have bene-
2010 statistics [1], the United States alone consumed an esti- fits and drawbacks and are complementing each other.
mated 1.3 trillion kW. It is about 37% of the total electricity The last step in our pipeline is the visualization being capa-
generated. How power is used in a commercial building has ble of effectively displaying large amounts of data and, at the
a large effect on energy efficiency strategies. The most impor- same time, allowing quick recognition of anomalous regions
tant energy usage is lighting. Then heating and cooling are next in the data. We integrated the three most common time series
in importance [2]. Current approaches for reducing the power visualization techniques (line charts, spiral visualizations, and
consumption for example integrate motion detection sensors for Recursive Patterns) presented in Aigner et al.’s book about time
each lamp switching them on and off. series [3]. Besides giving an appropriate overview of the data,
There is a growing interest in understanding how energy is the visualization is also able to support the administrator in a
spent in the commercial buildings. Furthermore, building ad- more detailed examination of the data, for example areas with
ministrators want to know how to reduce the failure rate and unusual power consumptions by interaction facilities. In ad-
detect anomalies. In addition, they want to know how to vi- dition, the visualization is capable of showing the hierarchical
sualize large volumes of energy consumption data collected by nature of the data set. This is necessary, because commonly
power meters (sensors) in a building to find patterns, trends, the energy consumption of different floors or buildings is inde-
and anomalies. In the end, our goal is to find how to automat- pendently monitored resulting in an inherent hierarchy in the
ically discover the anomaly, like unusual power consumption recorded data.
measurements highly differing from old observed patterns, and Our methods rely purely on the recorded power consump-
to reduce the energy cost of a building. For this task, anoma- tion data, which we were not cleaning in any way as the data
lies are of special interest, because they can be caused either by was in very good shape. There are many external influences
faulty equipment or potentially misconfigured devices consum- to the power consumption, like the environmental conditions
ing significantly more or less energy than required for proper or the number of people working in an office building. The
operation. large number and high complexity of external factors prohibit
In this paper, we present an analytical and visual approach the fully automatic diagnosis of anomalies. Hence, a human
to support the building administrators in detecting anomalies subject matter expert is needed to validate found anomalies and
and examining energy consumption data as shown in Figure investigate the interesting ones. Even though it is possible to
1. Our input data consists of a tree of time series reflecting think of extensions for an automatic analysis of anomalies like
the hierarchical nature of the power meters, e.g., one meter for incorporating external factors as weather data and holidays.
the whole building and one for each power outlet. The an- It is important to note that our methods are applicable not
alytical part is the automatic anomaly detection and is based only to power consumption time series data sets, although they
Preprint submitted to Computers & Graphics October 4, 2013
Time series Anomaly Pixel-based time series &
detection anomaly visualization
Figure 1: The input set of hierarchical time series is processed by anomaly detection methods. The resulting anomaly values are visualized together with the time
series values by pixel-based techniques. The visualization combines the raw time series with boosting techniques like highlighting and blurring for the anomaly
scores.
have been developed with a particular application in mind. This time-of-week and piecewise-linear modeling approach to ana-
is caused by the general nature of time series data and the gen- lyze commercial and industrial electric load data. To our knowl-
erality of both, the analytical and the visual methods presented edge, the unsupervised anomaly detection algorithms from pre-
in this paper. The most application-dependent part of this work diction and clustering described in this paper differ from the
is the anomaly detection being designed for daily patterns. Mathieu et al. method in two aspects: finer granularity and
weighted by time distance (recent data weights more than old
data).
2. Related Work
The review of several prediction methods for power data
Reading energy consumption statistics shows that commer- performed by Zhao et al. in [10] investigates the effectivity
cial buildings have a high energy usage, which motivates many and efficiency. Neural networks and Support Vector Machines
research projects developed to improve power efficiency. Within were performing better than statistical approaches. We though
the context of our work two main categories can be distinguished: decided to use the prediction technique developed in [11] as
analysis of power consumption data (detecting whether the en- peak-preservation is one of the main strengths of this technique.
ergy consumption performs normally or abnormally over dif-
ferent locations and time) and visual analysis (visualizing simi- 2.2. Visual Analysis
larities and anomalies with appropriate interaction techniques). Visualization of building energy consumption has not yet
been a major focus of research thus far. Most of the energy
2.1. Analysis of Power Consumption Data consumption visualizations have been time series line charts,
Applying data mining techniques for power consumption scatter plots, and maps [12, 13, 14, 15]. Recently, Many Eyes
data is a known approach for identifying abnormal usage be- [16] allows analysts to choose a visualization type for analyz-
havior. Agarwal et al. [4] examined 6 months of data from ing public building electricity consumption. The Google Pow-
the UCSD campus, including aggregate power consumption of erMeter [17] recently provides a free energy monitoring tool for
four buildings. Agarwal et al. focus more on the setup of power people to view home energy usage.
meters and provide only simple visualization methods like line In addition to these existing tools, improving visualization
charts. Catterson et al. [5] used an approach to monitor old techniques for time series data is ongoing research work. In
power transformers. Their goal is to proactively search for ab- SAVE [18], Lei Shi et al. presented a sensor anomaly visualiza-
normal behavior that may indicate the transformer is about to tion engine that guides the user to diagnose sensor network fail-
fail. Similarly, McArthur et al. [6] searched for anomalies to ures and faults using multiple coordinated views. In this paper,
detect problems with power generation equipment. Jakkula and we map multiple sensors’ calendar time series in a single view
Cook [7] compared several outlier detection methods to find to enable users to visually analyze energy usage and identify
which is better at identifying abnormal power consumption. anomalies. In SAGA Dashboard [19], Buevich et al. provided
Seem [8] used outlier detection to determine if the energy con- a visual interface for interaction with the sensor network. They
sumption for a day is significantly different from previous days’ require the user to use a device that tracks and visualizes home
energy consumption. This is a known approach for identifying energy usages. We extend the home energy consumption vi-
abnormal system behavior. sual analysis to large commercial buildings with dozens of sen-
The work conducted at Lawrence Berkeley National Labo- sors. We therefore restricted ourself to space-efficient visualiza-
ratory [9] focuses on demand response. Mathieu et al. used a tions like pixel-based Recursive Patterns. Furthermore, no pre-
2
defined devices and sensor types in our methods are required. main usage scenario abnormal behavior is defined as a differ-
Another related work being capable of visualizing hierarchical ence from the expected daily pattern. Both methods described
time series data are the TimeEdgeTrees introduced by Burch below assume a daily power usage pattern which, of course, can
and Weiskopf in [20]. The technique shows the time series as be different for each day of the week. Both techniques are not
one-dimensional, color-coded timelines instead of drawing the limited to daily patterns, but can be easily adapted to the peri-
graph edges. The hierarchy is preserved better by this approach odicity of the underlying data set. The first described method
while the space-efficiency is worse compared to the pixel-based is based on a weighted prediction, where recent measurements
approaches we use. We chose the pixel-based techniques as pe- have a higher impact than older measurements. The latter ap-
riodic patterns are easier perceivable. Additional discussions proach is transforming the observed daily pattern in the fre-
on related work concerning anomalies detection and boosting quency domain and looking for dissimilarity in a transformed
methods can be found in sections 3 and 4. space.
2.3. Our contribution 3.1. Prediction-based Anomaly Detection

To leverage the prior work and to support analysts in un- The basis for prediction is an observed pattern and the as-
derstanding power consumption data, we combine automated sumption that it is reoccurring (with slight modifications) in the
anomaly detection algorithms with interactive time series vi- future. If this assumption does not hold true, the predicted val-
sualizations. The resulting anomaly score is used to highlight ues may be far off the measured values. Considering this fact
unusual power usages in the time series visualizations. Our con- the other way round, observed values far distant from the ex-
tributions in the visual analysis process of power consumption pected ones tell us the model used does not explain the observed
data are: values. There might be two reasons, the first one is that the
model quality is not good enough and the second one is that the
1. In the anomaly detection process, we values are really differing from the expected and explainable be-
• detect power consumption anomalies based on ei- havior. We assume our data follows a regular underlying pattern
ther a clustering-based approach or a time-weighted and therefore also assume that the model describes the usual be-
prediction. havior well. Detecting anomalies using prediction follows this
idea and is related to the statistical measure of residuals.
• compare the prediction-based method with a simi- The prediction method used is crucial for the reliability and
larity based anomaly computation.
expressiveness of the computed anomaly scores. As already
2. In the time series and anomaly visualization process, we: stated above we assumed daily patterns and include develop-
• map the hierarchical time series onto a Treemap. ments over time into the prediction process. We decided to
use a prediction method developed and introduced in our ear-
• embed in each Treemap cell the corresponding me- lier works [11]. Basically, this method predicts a value for
ter’s time series visualization. each minute of the day by taking all previous measurement at
• provide different time series visualization techniques the same time of the day. As an example, assume we predict
dependent on the analysis purpose. the value for a Tuesday at 11:05am. We would now average
• visualize the anomaly score by visual boosting of all previous observed values of a Tuesday at 11:05am. Tak-
the raw time series representation. ing just an average would have the disadvantage of neglecting
recent developments in the time series. We therefore used a
Furthermore, we have provided an advanced visual inter- weighted averaging scheme with higher factors for recent val-
face enabling the user to visually analyze the power usage. His- ues and linearly decreasing influence weights for older values.
tograms for viewing the frequency and power usages of impor- Further detailed explanations can be found in [11]. This pre-
tant meters; visual queries for analyzing correlation and similar- diction method works very well for weekly patterns and will
ity; and various options on visualization types, Treemap layout, neglect holidays or other external events. The prediction model
colormappings, and anomaly score computations enable the an- will adjust to seasonal changes, but alternating behaviors can-
alysts to tailor the visualization to their needs. not be modeled by this approach. Furthermore, power usage
patterns randomly distributed over a day will negatively influ-
3. Anomalies Detection ence the prediction quality.
After predicting for each point in a time series the expected
Detecting and exploring of anomalies in time series is a values based on all values occurring before this point in the
very important aspect, especially when dealing with power con- time series, we can compute the difference between predicted
sumption data of physical infrastructure. Saving cost and en- and observed values. The difference is an indicator for the ab-
ergy as well are one of the main motivations for observing and normality of the point in a time series but needs for higher ex-
analyzing consumption data. But when dealing with infrastruc- pressiveness some kind of normalization. From the choice and
ture that may be even system-critical the number of failures the design of the prediction method we are assuming a model
must be reduced to an absolute minimum. Early signs of fail- which may not being applicable to all observed time series. We
ure should be visible in abnormal power usage patterns. In our counterbalance for this fact by calculating the average fitting of
our model. More in detail, we compute the average deviation
3
from the predicted values for the whole time series. If a whole
0 am max max
time series is highly unpredictable, the differences between pre-
1st Monday
6 am
dicted and actual values are less meaningful compared to a case 12 pm
when a time series follows perfect daily patterns with small de- 6 pm
viations. Computation of the anomaly score is summarized by 0 am
power consumption
2nd Monday
anomaly score
6 am
the following equation: 12 pm
6 pm
0 am
3rd Monday
|predVal[time] − obsVal[time]|
anomaly[time] =
6 am
12 pm
avgt∈T ime (|predVal[t] − obsVal[t]|)
6 pm
0 am
The variable time is the point in a time series for which
4th Monday
6 am
the anomaly score is calculated. At this position the difference 12 pm
between the predicted and observed value is computed and af- 6 pm min min
terwards normalized by the average deviation from the model. a) actual values b) prediction-based
anomaly score
c) clustering-based
anomaly score
3.2. Clustering-based Anomaly Detection

Figure 2: Comparison of the resulting anomaly scores based on the proposed
The second approach for detecting anomalies in time series methods. The third Monday shows an unusual behavior being reflected in the
data is similarity-based. We assume often-observed patterns to anomaly scores.
be the usual behavior and rarely occurring patterns to be abnor-
mal. Following this idea, we first have to define and compute
The second essential difference can be seen in the complex-
the similarity of patterns in order to detect whether a pattern
ity of the methods. Transforming the dataset in the frequency
occurs more than once. The approach described in this section
domain and applying MDS results in a data space with axes
is proposed and presented by Bellala et al. in [21, 22]. The
hard to interpret. But the frequency domain is typically less
time series is first partitioned into days and afterwards trans-
prone to noise and induces some robustness to the observed
formed by a Fourier transformation into the frequency domain.
time series. Though the transformed data space is complex,
Each day of the time series is resulting in a k-dimensional vec-
there exists the possibility to extract models of typical behavior
tor in the frequency domain with k being a parameter of the
by computing cluster representatives. Furthermore, the cluster-
transformation process. The next step described by Bellala et
ing approach allows supporting several typical ’behaviors’ of
al. is a dimension reduction by multi-dimensional scaling into a
a time series. Just assume a time series alternating between a
two-dimensional space. The density distribution in the reduced
day-work and a night-work pattern. The clustering approach
MDS space is now interpreted as an anomaly score. Points
will assign both low anomaly scores as both patterns are ob-
(time series of a single day) being in a high-density area with
served often, whereas the prediction-based method will assign
many (similar) neighbors are assumed to reflect the usual be-
each day a very high anomaly score as averages are computed.
havior. Outliers in the 2D space can be seen as days with un-
In Figure 2, we present a visual comparison between both
usual values and are assigned a high anomaly score. This tech-
anomaly computation methods. The first column shows a vi-
nique only takes the frequency domain into account and does
sualization of the observed values for four consecutive Mon-
not integrate external effects like weather data or week of the
days. The exceptional behavior of the third Monday is obvi-
day.
ous. This anomaly is reflected in all computed anomaly scores,
while the higher temporal resolution for the prediction-based
3.3. Comparison of Anomaly Detection Methods
method is visible. Altogether, the clustering-based approach
We previously described two methods for computing and is good for cases when time series switches between different
detecting anomalies and both come with their advantages and typical behaviors and the prediction-based approach is good for
drawbacks. Comparing both methods the most obvious differ- cases when the behavior slightly changes over time following a
ence is the resolution of the anomaly score. The prediction- (seasonal) trend.
based method computes for each point in a time series one
anomaly score, whereas the clustering-based method returns
only one anomaly value per day. It is of course possible to ex- 4. Anomalies Visualization
tend Bellala’s technique to cope with hours or even minutes of a The anomaly scores computed in the previous section are
day, but noise might influence the clustering approach. This be- used to highlight important time intervals of the input time se-
havior is inherited from the computation of the anomaly scores. ries. The visualization for the time series is influencing the de-
The clustering-based technique uses daily time series and uses sign possibilities depending on the visual variable encoding the
them as one data item in the clustering process. An anomaly numerical values. We implemented for comparison three well
value is assigned to each data item based on the density dis- known, state-of-the-art methods to visualize time series data:
tribution. Therefore, there is no possibility to assign different Recursive Patterns [23, 24], Spirals [25], and the traditional
anomaly scores to temporal sub-units of a day (i.e., minutes, line chart. These techniques are configured to visualize only
hours). one time series.
4
We will discuss the different design alternatives and moti- space is an extension to the HSV color space that allows mono-
vate our design decisions in the following sections. We focus tonical changes in lightness.
hereby on the possibilities to encode the time series and the The two proposed color encodings for the anomaly values
anomaly values simultaneously. We describe all state-of-the art can be seen in Figure 3. The first row depicts the original time
techniques visualizing time series, namely Recursive Patterns, series without any anomaly scores. We use different intensity
spiral visualizations, and line charts. levels to encode the anomaly scores and highlight important
areas. The effect of the intensity boosting can be seen in the
4.1. Recursive Pattern second row of Figure 3. For further visual boosting we com-
The Recursive Pattern is a pixel technique using coloring to bined blurring and intensity highlighting shown in the last row
visually encode numerical values using a parameterized space- of Figure 3.
filling layout. They are capable of displaying large amounts We added another highlighting technique, in order to direct
of time series data in a space-efficient way. By setting proper the analyst to the anomalous regions of the time series. This
parameters the resulting display can be calendar-like pixel vi- highlighting imitates the human perception regarding a focus
sualizations. The layout for one day is shown in Figure 2 and and the context area, where usually the focus area is sharp and
the values of the series are clearly recognizable and furthermore the context area is blurry. We used a similar approach to Kosara
patterns or exceptions are easy to identify. et al. and Giusti et al. in [31, 32]. Since we have the anomaly
In order to incorporate the anomaly score in the Recursive score for every element of the visualization, we are capable to
Patterns, we present a color boosting technique that highlights determine the important areas of the time series analytically
data points by manipulating the intensity of color values ac- corresponding to the focus area of the analyst. The implementa-
cording to the anomaly score. The boosting techniques applied tion adapts locally the blurring according to the anomaly value
in this paper are a subset of the techniques presented by Oelke of each element in the Recursive Pattern. Low anomaly val-
et al. in [26]. Out of the presented techniques, we used only ues are more blurry than areas with a high anomaly score. This
color highlighting as the visualization will not be overcrowded adaptive blurring technique utilizes the human depth intuition
with visual cues. Color boosting techniques bias the visual im- guiding the analyst to the interesting areas first in a pre-attentive
pression, which might lead to misinterpretations of the visu- way, depicted in the bottom row of Figure 3. The blurring will
alization. To keep these artifacts at a minimum, we created a affect the visibility of pixel borders, and it influences the com-
perceptual uniform colormap that only varies over hues with- parability between highlighted and non-highlighted areas. We
out variation in intensity. Since the change in intensity does though believe that the preattentive focusing on anomalies helps
only minimally shift the hue, the original color tone can be re- the analyst in assessing interesting points in time at a glance.
constructed mentally.
It is known that RGB and HSV are not perceptually uniform 4.2. Spiral Visualization
and that linear interpolations within these models do not pro- The spiral visualization is a technique to display recurring
duce color scales with equal or monotonically changing light- time series data with a fixed periodicity. Our implementation is
ness [27]. CIE LUV and CIE LAB have already been proven based on an Archimedean spiral, where the radius grows pro-
useful in former visual analytics research [28, 29]. By varying portionally to the spiral angle, which leads to a uniform expan-
over the color opponents (a and b) but maintaining the same sion of the spiral over time. In our implementation, each round
lightness value L, a perceptually uniform colormap can be cre- of the spiral is used to display one day of data. The proportional
ated in the CIE LAB color space. However, interpolations in growth of radius and spiral angle, combined with the absence
CIE LAB can lead to undefined RGB signals and thus, this color of any border between each circle makes it possible to build a
space cannot be used in the final application. Therefore, we use space-efficient visualization. Comparing the value of the same
the HSI color space [30] for intensity manipulation. This color time span on different days is possible, because these values are
Figure 3: Different methods to display the anomaly value. Top row: the time
series values without anomaly values. Second row: the intensity of the color
is adapted to the anomaly value. Third row: color intensity representing the Figure 4: Spiral visualization of time series. The left spiral shows the actual
anomaly score combined with adaptive Gaussian blurring. time series data, the right spiral shows the time series data with brightness and
saturation value adapted to the anomaly score of the corresponding polygon.
5
on a straight line going from the center of the spiral to the out- of over-plotting and visual clutter. It may be fine for line chart
ermost part of the spiral. Each polygon along this line displays displayed on a large screen, but as soon several line charts are
the same time span of different days. displayed the technique does not work anymore.
To show the anomaly score of each of the displayed time To show the anomaly value simultaneously with the time
spans, we apply the same color manipulations as described for series values, we used the empty space in the background of the
the Recursive Pattern above. The right spiral in Figure 4 shows line chart as shown in Figure 5. For each data point, we plot a
the described color saturation and brightness adjustment to high- red stripe in the background. The anomaly value is mapped to
light the anomalous values of the time series. By comparing the opacity of the stripe in a way, that for the lowest anomaly
the left with the right spiral the highlight of the outer ring of value it is completely transparent and therefore not visible. In
the right spiral is clearly visible. There is a time range with contrast, the highest anomaly score causes the stripe to have the
unusual numerical values beginning after one fourth of the day highest opacity resulting in a clearly visible, red stripe.
and lasting for one quarter of a day. Besides that, some little To reduce the visual clutter introduced by coloring the back-
colorful spots are visible in the right visualization, which were ground, we also support a minimized view. In this view, the
not that visible when applying only the brightness or saturation anomaly stripes are only plotted above and below the line chart,
modification technique. which keeps the visualization distraction-free, but still shows
the anomaly values. A comparison of both anomaly visualiza-
4.3. Line chart tion techniques for line charts can be seen in Figure 6.
The most common visualization of time series data is un-
doubtedly the line chart. The main difference to the Recursive
Pattern or spiral-based visualization can be found in the encod-
ing of the actual time series value. In the latter two, the series
value is shown by colored polygons, which have a spatial ex-
tent. In contrast, encoding the value in a line chart is done
by the position on the y-axis. The brightness and saturation-
based techniques to add the anomaly value into the visualiza-
tion makes no sense in such a positional encoding, having only
a very small area available for the coloring. Coloring segments
of the line and applying the same techniques to enrich the line Figure 6: Comparison of the anomaly visualization technique for line charts.
with anomaly score information as before is not helpful as line On the left, the whole background is used to show the anomaly scores, whereas
segments are very hard to see. To use coloring a larger line on the right, only a small stripe on the top and bottom of the chart background
stroke would be necessary, which would introduce high amount is used to display the anomaly score, which reduces the clutter from the back-
ground coloring.
4.4. Treemap Integration

We integrated all visualizations in a Treemap display [33,
34, 35, 36] (see Figures 5 and 7). In that way, the hierarchical
nature of our time series dataset is reflected in the visualiza-
tion. Treemaps are showing the leaves of each selected branch
and the nesting depth by borders. The selection of visualized
nodes can be achieved twofold, either by interactive roll-up or
drill-down operations in the Treemap visualization or by an ad-
ditional vertical tree representation. Our design choice using
Treemaps, though they visualize only the leaves of each branch,
was implied by the application needs. The analysts are mainly
interested in finding the root-causes of anomalies first and later
on in analyzing the impacts by traversing the hierarchy to the
root node. Further details concerning the used time series can
be seen in the application section 5.
Each cell of the Treemap contains the visualizations of the
time series building one branch of the hierarchy. The border
of each of the cells is furthermore drawn in white to allow a
clear distinction in terms of the hierarchy. The caption of each
Figure 5: The line chart visualization in a Tree Map with a horizontal strip lay- Treemap cell is used to display the numerical value used for
out using the value weighted anomaly to determine the cell size. The effect of
the layout score is clearly visible enlarging the time series not with the high-
layout and the cell label.
est anomaly score, but with the highest anomaly level regarding the time series The numerical value is used by the layout manager to com-
values. pute the final Treemap layout and directly influences the size of
6
a single Treemap cell. The computation of the numerical values this leads to the conclusion that a squarified layout is not the
is critical for the expressiveness of the visualization since the best choice. Instead, we implemented a so called strip lay-
size of a cell has a large influence on the perception. The size of out [38], which makes sure that the line charts are getting more
a Treemap cell can be computed by different measures. Given space on the horizontal axis than on the vertical (see Figure
the interest of an analyst to quickly recognize unusual or highly 5). Otherwise, the line charts would be very hard to interpret
anomalous time series, the Treemap layout can be adjusted to and this would be an unfair comparison to the pixel-based tech-
support these tasks by computing the layout score in different niques. Note that the size of each Treemap cell still reflects the
ways. For example, the analyst can choose between the sta- numerical value used for layout.
tistical variance, sum, or the arithmetic mean of the anomaly
score. To incorporate the level of the anomaly, there is also the 4.5. Comparison of Anomaly Visualizations
possibility to compute the layout based on the product of the We have presented three different state-of-the-art visualiza-
anomaly score and the time series value. In addition to anomaly tion approaches for time series and visual extensions to show
score-based layouts, the sum and the statistical variance of the time series and anomaly score simultaneously. All techniques
time series values can be used to compute the layout. Having have their own advantages and disadvantages. The Recursive
those choices, the visualization can be adapted to the priorities Patterns presented first have the ability to visualize large amounts
of the analyst independently of the visualization technique. We of data in a very compact and space efficient way. Regardless of
also added the possibility to assign the same importance value the shown time range, lasting from weeks and months to years,
to each node resulting in a regular layout enabling easy compar- the Recursive Patterns are always capable of showing the data
isons. Beside the general layout the actual width and height (the in a readable fashion revealing patterns. The visualization is
aspect ratio) of a single cell is an important factor when using designed in such a way that the value representation by color
different time series visualization techniques. For that reason, enables the analyst to easily spot interesting areas or regular
we implemented different layout algorithms for the previously patterns, nearly independent of the actual size of the visualiza-
described visualization methods. tion. In Figure 7, patterns and outstanding time spans are visi-
ble, even in the compact Treemap representation of 19 different
time series. Having spotted regular patterns Recursive Patterns
enable also the cross-comparison in different time series, since
the relative position of one point in time is well-aligned. Using
Recursive Pattern in Treemap is more difficult to compare the
same hour of a day, for example, as the position of the same
hour varies through the visualization.
Comparing the same hour is an advantage of the spiral vi-
sualization as the periodicity was set to daily patterns. The an-
gular encoding of the time of a day enables these comparisons
as a straight line from the spiral center to the outer spiral con-
nects these data values. With such visualizations, it is easy to
Figure 7: Treemap visualization of 19 time series, each time series has four
weeks of data. Interesting spots or patterns in the data are highlighted and can
be therefore easily detected.
A Recursive Pattern has a rectangular shape and, therefore,

a squarified layout [37] is applied to the Treemap. This layout
algorithm results in a square-like cell, which obviously leads
to an efficient space usage of the overall display. In addition,
we framed the Treemap cells to improve the overall structure
perception of the Treemap and the hierarchical representation.
The circular shape of the spiral graphs combined with the
squarified Treemap layout leads to the best readability and space
efficiency. We hereby maximize the size of visualization and at
the same time use as much space of the Treemap cell as pos-
sible. Creating the layout for Treemaps containing line charts
comes with a fundamental difference to the Recursive Pattern
Figure 8: Screenshot of our prototype showing the hierarchical and temporal
and the spirals: the width of a line chart is much larger than its selection possibilities together with the visualization panel.
height as our observed time span is quite long. Consequently,
7
Figure 9: Overview of the power consumption data from 28 sensors during 48 weeks. Despite the huge amount of data, patterns are still clearly visible. On the
right, the same visualization with adaptive blurring highlighting unusual power consumptions can be seen.
explore the value of the time series over time. In addition, com- ure 8. The left panel allows the navigation through the hierar-
paring time ranges and/or spot longer lasting trends is a simple chy of the sensor graph by selecting the nodes being visualized.
task, since the analyst has only to follow the continuous spiral The visualization panel in the center consists of the Treemap
over time. This is an advantage compared to the non-continuous visualization together with a colormap legend. The panel at the
time display of the Recursive Patterns, where layout breaks are bottom of the window allows to navigate in time and select the
needed, as with any space-filling curve. Line charts are great for time range that should be visualized. This timeline visualiza-
detailed visual explorations of continuous data for single time tion shows the total amount of power usage over time in order
series. For the usage scenario of anomaly visualization, there to give the analysts additional hints.
exist only a small number of application possibilities, since con- We implemented beside animation also interaction techniques
densed visualizations are needed as limited screen space is an like dragging the selected time range (blue rectangle in the time-
issue. The low space efficiency of line charts leads to our pro- line visualization) left and right causing immediate updates to
posed solution to re-use the empty space in the background to the visualization. The visualization allows basically three inter-
visually encode the anomaly value. We avoid the arising vi- action possibilities. The first is a tooltip allowing to inspect the
sual clutter by applying the stripe based anomaly visualization, underlying data values invoked by mouse hovering. We further-
which keeps the anomaly information but reduces the colored more directly support drill-down and roll-up operations in the
area distracting the analyst. Treemap visualization, allowing the analyst to keep his focus on
the visualization during traversing the sensor graph. Finally, the
analyst is able to select a region in the visualization and query
5. Applications
the system for similar time series sharing the selected behav-
The prototype integrating all the presented analytic and vi- ior by means of distance or correlation calculations. Switch-
sual techniques focuses especially on the detection of anoma- ing the visualization technique, colormap, value normalization,
lies and their temporal occurrence. With this task in mind, two anomaly calculation, or the weights for the Treemap layout is
general use cases can be identified. First, general browsing and possible by selecting the respective option.
exploration of the data is important to get an overall impres-
sion of the power usage. All different visualization techniques 5.2. Visual Inspection of Anomalies
presented above can be applied to gain from their individual In this use case, the building administrator gets the infor-
strengths. The second task is the examination of a specific is- mation, that in February 2012 the overall power consumption
sue, like unusually high or low power consumption. Our system and energy costs of a building was higher than expected. The
can provide the analytical and visual insights necessary to find investigation starts by getting an overview and some contex-
the source of the unusual energy consumption. All visualiza- tual information about the general energy consumption of the
tions are integrated in the same analytical framework, but use building. Undoubtedly, the most suitable visualization for this
different methods of displaying the power consumption and the task is the Recursive Pattern Visualization, which can be seen
anomaly values. in Figure 9. The blurring approach at the right side highlights
the anomalies further compared to the left figure, where we vi-
5.1. Analytical Framework sualized anomalies only by color intensity. The resulting visu-
Our prototype consists basically out of three parts reflecting alization points directly to one time series, which can be seen
the different dimensions in the data set and can be seen in Fig-
8
Figure 10: Sensor readings of sensor AE3 from 6th February to 4th of March 2012. On the left, the power consumption is visualized. On the right, the intensity of
the colors reflects the anomaly score. Due to the high intensity, an area in the fifth column of the third row stands out.
in Figure 10 on the right. Both, the left and the right visualiza-
tion show the power consumption data beginning on 6 February
2012. Each of the bigger rectangles contains the data from one
day, starting with Monday on the left. In total, there are four
weeks of data visible, starting on 6th February and ending on
4th of March.
In the visualization, there are some single, outstanding spots.
Those look relatively random and last only one pixel, which
stands for a time span of five minutes. Although the color is
quite intense and reddish, they are far too few and do not last
long enough to have a large influence on the power consump-
tion. Besides these spots, an area in the fifth column of the third
row stands out. The intensity seems to increase from pixel to
pixel over a long time. Having in mind, that one small black-
framed rectangle of the Recursive Pattern stands for one hour,
the anomaly score seems to increase over ten hours, until sud-
denly the anomaly score drops again. Due to the long dura- Figure 11: The time series query result window. On the top left, the query time
tion of the anomaly and the intense red color, the actual energy series is displayed, on the right the top-n query results are shown. The query
range is highlighted.
consumption in this time frame is very high. This makes this
anomaly a candidate for the cause of the higher energy costs in
February. this additional knowledge, the building administrator can con-
The building administrator found an anomaly in the given clude that the anomaly affected not only one, but at least four
time frame with the Recursive Pattern visualization. To identify parts of the building, where the sensors have been installed.
potentially correlated time series, our prototype implements a The quality of the conclusions drawn from the visualiza-
top-n time series similarity search. The query can be created by tions and analytical methods depends heavily on the sensor de-
clicking on a part of the visualization and selecting the query ployment. If each of the sensors monitors a single machine
area with the mouse. Afterwards, the desired similarity mea- or office, the building administrator has a concrete subject of
sure can be selected. The system supports the standard Eu- further examination. When they are deployed in a more gen-
clidean distance and positive, negative, and unsigned Pearson eral way, for example per building floor or even per building,
Correlation for different analysis tasks. In this case, selecting the shown analysis allows narrowing down the investigation of
the positive Pearson Correlation and the Euclidean Distance is power consumption to the affected units.
appropriate. The result of the query can be seen in Figure 11.
The query results show three very similar series: AE4, AE5,
6. Evaluation
and AE6. All three sensors are part of the same subtree of the
sensor hierarchy. This means they are located in the same build- We showed the applicability of our proposed technique in
ing as sensor AE3, which logged the time series identified as the previous application section, but it is very important that
anomalous by the Recursive Pattern visualization before. With real expert users rate our approach effective and helpful. We
9
therefore presented our approach to the target user group in a found some unexpected anomalies they applied further analysis
big company. We had contact to two analysts and interviewed techniques.
them first about their state of the art technology. The com- The experts very much appreciated the possibility to select
pany develops sensor networks measuring the power consump- a region in the time series and query for other similar time se-
tion for large buildings and is experienced with power manage- ries. When they selected a leaf in the hierarchy of time series
ment. The current state of the art technology they are using is a they would look for the impacts of the anomaly on the parent
line chart based visualization. They are able to select arbitrary nodes. The other way around, querying for anomalies on higher
time frames and inspect the temporal power distribution. Fur- levels would show the root-causes for the unusual power con-
ther analysis steps are yet impossible to perform. In later meet- sumption.
ings we explained our approach to the experts and afterwards A possibility for improvement mentioned by them is the in-
let them interact with our system and investigate the time series tegration of external events into the application. Sometimes
data. We asked them to describe their typical way of analyzing managers know in advance of extraordinary events that will
data and furthermore to comment on our proposed technique by cause unusual power consumptions. It should be possible to
thinking aloud using our prototype. We got very valuable and include this information whenever available and to reflect the
interesting feedback from the experts regarding the benefits and additional events in the visualization. Overall they found the in-
room for improvement. tegration of different time series visualization techniques com-
First of all, they validated the temporal patterns shown in bined with an anomaly representation very helpful and wanted
the pixel-oriented visualization techniques with their knowl- to integrate our techniques in their management tools.
edge of typical power consumption patterns. Their proof-of-
concept was that the daily periodic patterns were visible at a
7. Conclusion
glance, at the same time reflecting their expectations for the
time series. After they found the patterns like low power con- Analyzing and interpreting unusual patterns in time series
sumption at nighttime and weekends they started to look for data is a very important task. In this paper, we applied novel
anomalies using our visual boosting techniques. At first, obvi- analysis and proven visualization techniques to a system, which
ous patterns like holidays or the Christmas vacation have been supports analysts finding those patterns in a visual way. We
found. Afterwards, less obvious patterns have been investi- supported the analysis process by computing anomaly scores
gated. During their analysis, we asked the experts to comment of the given time series data with an anomaly detection algo-
on our techniques and give feedback related to visualization and rithm which produces very fine grained results. This also allows
analysis methods. the creation of detailed visualizations resulting in a fine grained
The first point they commented on was the helpfulness of pixel-based date representation. Furthermore, the algorithm is
the overview visualization in the form of the Recursive Pat- very efficient in terms of required computing power, because it
terns. Compared to line chart based visualizations they are very does not require expensive transformations nor does it rely on
familiar with, the calendar-like representation of the power con- elaborate analysis of the time series data.
sumption was highly appreciated. Furthermore, the possibility Having the anomaly scores, different visualizations can be
to interactively change the visualization type helped them a lot used to get deep insight into the time series and the anomaly
to get familiar with the pixel-oriented techniques. The color- scores, depending on the task to fulfill. Recursive Patterns gen-
ing of pixels was intuitive to them and they could interpret the erate overviews of large time spans and large amounts of data.
visualization easily. Spiral views provide the possibility to quickly detect and ana-
From an analysis point of view, a very interesting point was lyze periodic patterns. If the actual data values are of interest,
their comment on our prediction-based anomaly computation. the classical line charts are also available for further investiga-
They agreed with our definition of anomaly: ”The anomalous tions of the data set.
day is likely to deviate from the daily pattern in some way.“ As The double encoding of time series values and anomaly
shown above, our anomaly method is very fine grained, but to scores is solved in different ways. The novel adaptive blurring,
the experts a single time spot with a high anomaly score is not which generates a focus and a context area by blurring the vi-
important. They were more interested in longer periods of un- sualization according to the anomaly scores, guides the analyst
usual behavior, starting at approximately one hour duration. On directly to interesting spots of the visualization. This makes
the other hand the related anomaly computation method based the technique a particular advantage in overview visualizations,
on days was too coarse-grained for them for this kind of anal- where irrelevant areas of the time series are losing their level of
yses. An aggregation of the anomaly values might help to let detail by a strong blur, whereas interesting, high anomalous ar-
the analyst focus on the severe anomalies. The visualization eas are clearly visible and attract the focus of the human eye. To
of the anomaly scores together with the time series was men- support the display of multiple visualizations, the well-known
tioned very positive, especially with respect to the Recursive Treemap approach is extended by layouts based on space effi-
Pattern. The overview calendar-like visualization with inten- ciency and specific visual properties of the visualization. Since
sity highlighting and adaptive blurring let them focus on the the anomaly scores determining the layout can be selected de-
interesting spots. They had the impression that their attention pending on the analysis task, the resulting Treemap layout also
was guided to the anomalies, while the unimportant, common supports further analyses.
daily patterns were pushed in the background. As soon as they
10
The use case of power consumption data shows the appli- [16] IBM Research and the IBM Cognos software group . Many eyes: Public
cability of the methods shown in this paper. The general nature building energy consumptions. 2013.
[17] Google PowerMeter. viewed 6/17/13. http://www.google.com/
of the analysis and visualization methods makes it possible to powermeter/about/.
apply these techniques to time series not only from the applica- [18] Shi L, Liao Q, He Y, Li R, Striegel A, Su Z. SAVE: Sensor anomaly visu-
tion domain of power consumption data. In the future, we want alization engine. In: Visual Analytics Science and Technology (VAST),
to integrate external knowledge like known events influencing 2011 IEEE Conference on. IEEE; 2011, p. 201–10.
[19] Buevich M, Rowe A, Rajkumar R. SAGA: Tracking and Visualization
the time series like weather information. It would be also in- of Building Energy. In: Embedded and Real-Time Computing Systems
teresting to automatically determine the visualization method, and Applications (RTCSA), 2011 IEEE 17th International Conference on;
colormap, and possible enhancements like the adaptive blurring vol. 2. IEEE; 2011, p. 31–6.
based on the displayed data. [20] Burch M, Weiskopf D. Visualizing dynamic quantitative data in hierar-
chies. In: Proceedings of International Conference on Information Visu-
alization Theory and Applications. 2011, p. 177–86.
Acknowledgment [21] Bellala G, Marwah M, Arlitt M, Lyon G, Bash C. Following the electrons:
methods for power management in commercial buildings. In: Proceed-
We are thankful to Ming Hao of Hewlett-Packard Labs for ings of the 18th ACM SIGKDD international conference on Knowledge
fruitful collaboration on the analysis of power consumption data. discovery and data mining. ACM; 2012, p. 994–1002.
[22] Bellala G, Marwah M, Arlitt M, Lyon G, Bash CE. Towards an un-
This work has been partly funded by the German Research So- derstanding of campus-scale power consumption. In: Proceedings of
ciety (DFG) under the grant GK-1042, Explorative Analysis the Third ACM Workshop on Embedded Sensing Systems for Energy-
and Visualization of Large Information Spaces, Konstanz. Efficiency in Buildings. ACM; 2011, p. 73–8.
[23] Keim DA, Ankerst M, Kriegel HP. Recursive pattern: A technique for
visualizing very large amounts of data. In: Proceedings of the 6th confer-
References ence on Visualization’95. IEEE Computer Society; 1995, p. 279–86.
[24] Lammarsch T, Aigner W, Bertone A, Gartner J, Mayr E, Miksch S, et al.
[1] United States Energy Information Administration . Annual energy review. Hierarchical temporal patterns and interactive aggregated views for pixel-
2010. based visualizations. In: Information Visualisation, 2009 13th Interna-
[2] US Department of Energy . Energy efficiency trends in residen- tional Conference. IEEE; 2009, p. 44–50.
tial and commercial buildings. 2008. http://apps1.eere. [25] Weber M, Alexa M, Müller W. Visualizing time-series on spirals. In:
energy.gov/buildings/publications/pdfs/corporate/bt_ proceedings of the IEEE Symposium on Information Visualization. 2001,
stateindustry.pdf. p. 7–13.
[3] Aigner W, Miksch S, Schumann H, Tominski C. Time & Time-Oriented [26] Oelke D, Janetzko H, Simon S, Neuhaus K, Keim DA. Visual boosting in
Data. Human-Computer Interaction Series; Springer London; 2011. pixel-based visualizations. Computer Graphics Forum 2011;30(3):871–
ISBN 978-0-85729-078-6. 80.
[4] Agarwal Y, Weng T, Gupta RK. The energy dashboard: improving the [27] Keim DA. Designing pixel-oriented visualization techniques: Theory and
visibility of energy consumption at a campus-wide scale. In: Proceedings applications. Visualization and Computer Graphics, IEEE Transactions
of the First ACM Workshop on Embedded Sensing Systems for Energy- on 2000;6(1):59–78.
Efficiency in Buildings. ACM; 2009, p. 55–60. [28] Healey CG. Choosing effective colours for data visualization. In: Visual-
[5] Catterson VM, McArthur SD, Moss G. Online conditional anomaly de- ization’96. Proceedings. IEEE; 1996, p. 263–70.
tection in multivariate data for transformer monitoring. Power Delivery, [29] Wang L, Kaufman A. Importance driven automatic color design for direct
IEEE Transactions on 2010;25(4):2556–64. volume rendering. Computer Graphics Forum 2012;31(3pt4):1305–14.
[6] McArthur SD, Booth CD, McDonald J, McFadyen IT. An agent-based [30] Keim DA, Kriegel HP. Issues in visualizing large databases. In: Proc.
anomaly detection architecture for condition monitoring. Power Systems, Conf. on Visual Database Systems (VDB’95), Lausanne, Schweiz. 1995,
IEEE Transactions on 2005;20(4):1675–82. p. 203–14.
[7] Jakkula V, Cook D. Outlier detection in smart environment structured [31] Kosara R, Miksch S, Hauser H. Semantic depth of field. In: Proceed-
power datasets. In: Intelligent Environments (IE), 2010 Sixth Interna- ings of: IEEE Symposium on Information Visualization 2001 (INFOVIS
tional Conference on. IEEE; 2010, p. 29–33. 2001). IEEE Computer Society Press; 2001, p. 97–104.
[8] Seem JE. Using intelligent data analysis to detect abnormal energy con- [32] Giusti A, Taddei P, Corani G, Gambardella L, Magli C, Gianaroli L. Ar-
sumption in buildings. Energy and buildings 2007;39(1):52–8. tificial defocus for displaying markers in microscopy z-stacks. Visualiza-
[9] Mathieu JL, Price PN, Kiliccote S, Piette MA. Quantifying changes in tion and Computer Graphics, IEEE Transactions on 2011;17(12):1757–
building electricity use, with application to demand response. Smart Grid, 64.
IEEE Transactions on 2011;2(3):507–18. [33] Johnson B, Shneiderman B. Tree-maps: A space-filling approach to the
[10] Zhao Hx, Magoulès F. A review on the prediction of building energy con- visualization of hierarchical information structures. In: Visualization,
sumption. Renewable and Sustainable Energy Reviews 2012;16(6):3586– 1991. Visualization’91, Proceedings., IEEE Conference on. IEEE; 1991,
92. p. 284–91.
[11] Hao MC, Janetzko H, Mittelstdt S, Hill W, Dayal U, Keim DA, et al. A [34] Shneiderman B. Tree visualization with tree-maps: 2-d space-filling ap-
visual analytics approach for peak-preserving prediction of large seasonal proach. ACM Transactions on graphics (TOG) 1992;11(1):92–9.
time series. Computer Graphics Forum 2011;30(3):691–700. [35] Schreck T, Keim DA, Mansmann F. Regular TreeMap Layouts for Visual
[12] http://en.wikipedia.org/wiki/Energy_in_the_United_ Analysis of Hierarchical Data. In: Proceedings of the Spring Conference
States. viewed 6/17/13. on Computer Graphics (SCCG’2006). Casta Papiernicka, Slovak Repub-
[13] IBM TRIRIGA Energy Optimization: Integrated software solu- lic: ACM Siggraph; 2006, p. 184–91.
tion for improving buildings management and facilities opera- [36] Hao MC, Dayal U, Keim DA, Schreck T. Importance-driven visualization
tion. viewed 6/17/13. http://pic.dhe.ibm.com/infocenter/ layouts for large time series data. In: Information Visualization, 2005.
tivihelp/v57r1/topic/com.ibm.iteo.doc/infocenter.pdf. INFOVIS 2005. IEEE Symposium on. IEEE; 2005, p. 203–10.
[14] UCEI (University of California Energy Institute), Berke- [37] Bruls M, Huizing K, van Wijk J. Squarified treemaps. In: Proceedings
ley, California, CA . New york city building energy map. of the Joint Eurographics and IEEE TCVG Symposium on Visualization
2007. http://www.visualizing.org/visualizations/ (VisSym 00). Eurographics Association; 2000, p. 33–42.
new-york-city-building-energy-map. [38] Baker MJ, Eick SG. Space-filling software visualization. Journal of Vi-
[15] Granderson J, Piette MA, Ghatikar G, Price P. Building energy informa- sual Languages and Computing 1995;6(2):119–33.
tion systems: State of the technology and user case studies. Handbook of
Web Based Energy Information and Control Systems 2009;.
11
View publication stats

Anomaly Detection Using Visualisation

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Anomaly Detection Using Visualisation

Uploaded by

Copyright:

Available Formats

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

Anomaly Detection for Visual Analytics of Power Consumption Data

Article in Computers & Graphics · February 2014

Halldor Janetzko Florian Stoffel

SEE PROFILE SEE PROFILE

BigGIS View project

Visual Analytics View project

The user has requested enhancement of the downloaded file.

Halldór Janetzko, Florian Stoffel, Sebastian Mittelstädt, Daniel A. Keim

1. Introduction on a time-dependent energy consumption model. We have ex-

2.3. Our contribution 3.1. Prediction-based Anomaly Detection

viations. Computation of the anomaly score is summarized by 0 am

3.2. Clustering-based Anomaly Detection

4.4. Treemap Integration

A Recursive Pattern has a rectangular shape and, therefore,

View publication stats

You might also like