
BROUGHT TO YOU IN PARTNERSHIP WITH

Advanced Time Series

WRITTEN BY DANIELLA PONTES, PRODUCT MARKETING, INFLUXDATA

CONTENTS

• What is time series data and where is it taking us?
• Time series data: use cases across industries
• Time series data case studies
• Purpose-built databases are better
• Getting started with InfluxDB Cloud 2.0
• Cloud 2.0 hands-on learning

More than four years ago, InfluxDB — an open source time series platform — was launched. In the years since, time series technology has become increasingly popular; according to DB-Engines, over the last 24 months, time series has been the fastest growing database category. This popularity is fueled by the "sensorification" of the physical world (i.e., IoT) and the rapidly increasing instrumentation requirements of the next generation of software. InfluxDB has millions of downloads, an expanding list of enterprise customers, and a growing community that is always finding new ways to build on the platform — and we are just scratching the surface. Whether the data comes from humans, sensors, or machines, InfluxData empowers developers to build next-generation monitoring, analytics, and IoT applications faster, easier, and at scale — delivering real business value quickly.

As we enter the era of workflow automation, machine learning, and artificial intelligence, it is time for time series data.

What is time series data and where is it taking us?

In a previous Refcard, we talked about how time series has been used broadly as a tool to understand change and behavior. For instance, we use time series to generate and observe economic indexes, market performance, environmental degradation, the growth rate of social media, and so on. So, what's new? Why the growing interest in understanding something that we have already used for so long?

Behind much of the interest in understanding time series better is the volume at which we are collecting time series data of all sorts. From the physical world, we have sensors in manufacturing and energy generating plants, as well as fleets of personal devices, all generating tons of data. From the virtual world, we have been instrumenting software metrics, events, and logs. With the containerization of applications, the number of collected measurements exploded. To make matters worse, the sampling is increasingly done at very fine intervals, all the way down to

nanosecond granularity. Although this is an eye opener, volume alone doesn't fully explain the renewed focus on time series data. The fundamental question persists: why have we gotten into this "frenzy" mode of collecting time series data about everything to which we have access?

The urge to collect time series data comes from the realization of the unlimited use and value that can be extracted from it. A timestamp is like a data "stain" to guide our observation over anything that we want to understand over time. And not only that! Timestamps provide the most fundamental metadata with which one can correlate, aggregate, transform, and, ultimately, predict the combined effect of outcomes.

For instance, how valuable is it to be free of output inconsistencies and downtime in a manufacturing plant? How much more cost-effective is preemptive maintenance of remote deployments? How lucrative is energy generation overflow when re-aligned to be where demand exists? And, now, with the trend of cloud-native elastic capacity, how can we act efficiently and effectively in computing resource assignments, while keeping up service performance? Process outcomes, infrastructure utilization, service delivery, user experience, and other factors such as state, demand, and environment can all be correlated in time to provide a more comprehensive view of what is going on.

Time series provides an understanding of the present in absolute and relative terms, giving a more acute sense of "consciousness." Furthermore, it provides a means to learn from past behavior, thereby increasing the odds of a predictable outcome.

As a concrete example of what this could mean in our lives, let's take the case of alerts in intensive care units. Data shows that nurses respond to an alarm every 90 seconds, two-thirds of which turn out to be false positives. This often causes a phenomenon known as "alarm fatigue." The high false positive rate can be associated with absolute thresholds used in isolation. Hospitals are now applying machine learning and AI solutions to the time series metrics they are collecting from intensive care equipment and devices to make more educated decisions from the readings.

Machine learning and AI systems are fed by time series data. The larger the data set, the more accurate the outcome. Therefore, time series data is being collected and kept as an invaluable asset. The volume at which it is generated — and the current and future value associated with collecting timestamped sample data — is driving the growing interest in time series.

So exactly what, then, is time series? Time series is a data set comprised of timestamped measurement values of the same thing collected over time. It means that samples don't replace previous ones; instead, they are accumulated, with each value adding to the chronological knowledge base for the observed target.

Classical and intuitive examples of time series can be found in the financial market. For instance, below we have a graph of the Dow Jones Industrial Average (DJIA) index performance for the year to date, with each data point representing the day's closing value.

[Figure: Dow Jones - DJIA - YTD value. Source: Macrotrends, Dow Jones DJIA - 100 Year Historical Chart]

Regarding the Dow Jones index example above, the historical data is inflation-adjusted using the headline CPI, and each data point represents the month-end closing value. See the graph below.

[Figure: Dow Jones - DJIA - 100 Year Historical Chart. Source: Macrotrends]

The Dow Jones example also highlights additional aspects that characterize time series, which go beyond buckets of samples of the same thing separated by timestamps. Time series also has some characteristics that are important to consider when designing or choosing solutions to handle this type of data. A time series often goes through some data transformation in its lifecycle. Depending on the use case, the value of a single sample of raw data at a fine-grained sampling rate decreases with time and the size of the bucket. Therefore, the data set can go through downsampling, aggregation, and mathematical operations over time, thus generating new buckets of transformed time series.

So, what do time series give you besides a gigantic collection of data consuming your resources? For one, the agility to cope with the increasing speed and complexities in our highly instrumented environments. If you can handle time series data properly, you have distributed containerized application environments with CI/CD and zero downtime, and accurate and timely resource predictions for optimum cost-effectiveness at



your fingertips. When combined with machine learning and AI, delivering more intelligent alerts, workflow automation, better customer experiences, happier users, and, ultimately, more valuable services becomes easier.

Time series data: use cases across industries

Time series projects in support of applications usually start with ITOps, engineering, development, and site reliability — with monitoring of production and pre-production code as part of application development, deployment, and operations. However, as containers, Kubernetes, and CI/CD get adopted, full-stack monitoring becomes critical in order to cope with constant updates in a reliable manner.

But organizations should not only collect time series on technical aspects. In order to evaluate the current and trending state of an organization, monitoring business key performance indicators (KPIs), which capture the composite effect of multiple measurements taken together, is usually a good way to detect signs or trends early on. Therefore, business KPIs will also create demand for time series collection.

Common time series measurements collected in DevOps use cases are:

• System: CPU load, disk usage and disk IO, memory usage, interface load.
• Network: latency, jitter, packet loss, error rate.
• Containers/Kubernetes: resource utilization (CPU, memory, disk, network bandwidth), container status, pod health (ready, status, restarts, eviction), mounted volume stats, K8s state.
• Application: health checks, up/down checks, HTTP requests (queue, delay, response code, etc.), connections (e.g., database), number of threads.

Besides the typical use case of DevOps monitoring, we also have the IoT segment — both industrial IoT (IIoT) and consumer IoT — generating tons of timestamped data. IIoT covers industrial organizations, whether large or small, working with solutions to digitally transform their manufacturing processes. As a legacy, IIoT segments use data historians to store their monitoring data and show trends per machine or across a collection of machines. Examples of industries that historically monitor their production are energy producers, manufacturers, food and beverage production plants, etc. A common set of collected data is:

• Instrument readings (flow rate, valve position, temperature).
• Performance monitoring (units/hour, machine utilization vs. capacity, scheduled vs. unscheduled outages).
• Environmental readings (weather, atmospheric conditions, groundwater contamination).
• Production status (machine up/down, downtime reason tracking).

A more recent segment is consumer IoT. Here, we have consumer devices and services for utilities consumption, biometrics and lifestyle data, logistics, retail, and healthcare, among other use cases. In consumer IoT, time series serves two main purposes:

1. Time series data is collected to provide a valued service, such as optimum energy consumption, optimum exercise for fitness goals, physical traffic monitoring (cars and people), etc. This is the case for utilities consumption services, like Nest in the United States and Tado in Europe. Fitbit and Apple Watch are good examples in the fitness market.
2. Time series data also provides visibility into service operations, encompassing data about the devices and sensors themselves, such as battery life and geolocation, as well as physical traffic and environmental variables that could impact services. An example in this space is Worldsensing.

By 2020, it is expected that 11.5 Zettabytes (zetta = 10 to the twenty-first power) of data will be generated, and this only from the instrumentation of the physical world. When we look at the virtual world, we see fragmentation of applications into microservices, ephemeral container deployments, and additional logical layers (such as orchestration and service mesh) to handle the complexity of a fragmented distributed environment. This has all contributed to the available time series data increasing by an order of magnitude.

Time series data case studies

Here are some case studies that illustrate specific scenarios where adopting a time series platform for DevOps and IoT was fundamental to the success of the services provided.

Veritas Technologies, a leader in multi-cloud data management with a 360-degree approach to data management, has more than 10,000 NetBackup appliances deployed in the wild. These appliances run the leading backup and recovery software for enterprises, commonly used for backing up data centers. The platform uses artificial intelligence (AI) and machine learning (ML) to deliver predictive support services for Veritas appliance customers. The availability of a vast amount of time series data (collected for internal use by Veritas' Auto Support capabilities) enabled forecasting for a multitude of use cases, from application performance optimization to workload anomaly detection.

Capital One Financial Corporation is a bank holding company specializing in credit cards, auto loans, banking, and savings products, headquartered in McLean, Virginia. Capital One is ranked eleventh on the list of largest banks in the United States by assets. Time series data at Capital One consists of infrastructure, application, and business process metrics. The combination of these metrics is what internal stakeholders rely on for observability, which allows them to deliver better services and uptime for their customers. Protecting this critical data with a proven and tested recovery plan is therefore not a "nice to have" but a "must have."

Tado° GmbH is a manufacturer of intelligent home climate solutions and produces thermostats and smart air conditioning controls. These smarter technologies allow for more efficient energy control and can save up to 31% more energy. The value proposition of their technologies, products, and services is to enable a modern and sustainable lifestyle without



sacrificing comfort. They collect sensor data to operate the service, but also provide customers with visibility into service metrics. When their customer base reached a six-digit figure (hundreds of thousands), they had to rethink their time series strategy and therefore adopted the purpose-built InfluxDB time series database.

Siemens sells wind and gas turbines to large corporations and municipalities, providing monitoring as a service for these customers to track the health, usage, and performance of the turbines. Key to these businesses is the ability to implement a scalable time series store, allowing customers to get their information via a REST API, all in real time via data streaming from each turbine.

Spiio offers irrigation systems consisting of wireless sensors, apps, and integration with smart irrigation controllers. According to their CTO, "having permanent access to time series data and plant analytics was like the blinds fell off to eliminate guesswork, reveal trends, and enable data-driven decisions not only for green wall maintenance but also for its design — for green walls built to perform."

PayPal provides online financial services to people and businesses. Their digital payments platform gives PayPal's 277 million active account holders the confidence to connect and transact in new and powerful ways, whether they are online, on a mobile device, in an app, or in person. PayPal built and uses these monitoring solutions to help improve their operational efficiencies while mitigating incidents involving multiple teams.

In the time of time series, professionals are getting insights into the physical and virtual worlds to provide valuable and stellar services. They are also using time series to learn and predict, and, by doing so, they can be more productive and cost-effective while maintaining service performance. But, in order to leverage the benefits of time series to their fullest extent, it is necessary to be able to collect, store, and analyze data in real time and at scale. Therefore, picking the right solution is critical.

Open source is key

Open source means, among other things, free to use; most importantly, it means that ideas and information are shared openly, and the community is encouraged to collaborate transparently. Innovation happens at a much faster pace, and because of the many brains continuously testing and applying it in different use cases, open source is more reliable, secure, and awesome.

Cloud is a good match for time series because it also provides the elasticity to assign resources only when necessary, eliminating the need to commit to large investments upfront. However, what you will most likely have to deal with is a hybrid, multi-cloud world. Open source provides the necessary freedom to interact with multiple platforms. Data is not locked in. It can flow to different domains and frameworks, such as ML and AI.

The power of the open source community to drive innovation is unsurpassed by any proprietary software solution because it is not about the license. It is about collaboration that makes the whole greater than the sum of its parts. And it is about transparency, where nothing is hidden, giving you a chance to make educated decisions. Open source also keeps your options open and your data "fluid" by avoiding vendor lock-in.

Purpose-built databases are better

Time series data platforms pose intrinsic challenges in scalability, high availability, and usability. A few properties that make them very different from other data stores include automated data lifecycle management, summarization, cross-measurement operations, continuous queries, and large range scans over many records.

Those embarking on time series data projects face a major decision: try to adapt an existing relational database to manage time series data, or adopt a time series database with purpose-built storage and query engines? When professionals try to use legacy databases and query engine models to handle time series, they eventually have to deal with architectural limitations impacting write and read speed, query processing time, and storage requirements, among other system performance specifications. The volume and speed required to handle production time series will demand a different architecture — workarounds with layered processing and custom configurations will not buy much time before a true purpose-built time series database is required. Addressing the challenges arising from pervasive collection of time series, coupled with the need to handle time series workloads, led to the rise of time series databases.

As for comparing the performance of time series databases to other types of databases that have been adapted to handle time series, there's no need to rely on opinions — the answer is always in the data. Purpose-built, well-designed time series database engines reach benchmarks on the order of hundreds of millions of time series, millions of writes per second, a thousand queries per second, and just 2.15 bytes of storage per record. The latter is very important for keeping long-term expenses low, since, for some records, as in healthcare and IIoT, the retention period for device data is forever.

Although time must be a central point in the overall platform architecture design of the database, just being able to query on time doesn't cover all the requirements of an effective and efficient solution, as mentioned above. To achieve its performance and functionality goals, it is necessary to devise a combined strategy across the data model, functional query language, and storage engine design. Add to that a query language that is also scriptable and that eases the selection of series, continuous queries, and transformation of queried data, and you have a complete time series platform, such as InfluxData's InfluxDB platform.

Getting started with InfluxDB Cloud 2.0

INFLUXDB DATA MODEL DESIGN

Efficiency and effectiveness have to start in the data structure, ensuring that time-stamped values are collected with the necessary precision and metadata to provide flexibility and speed in graphing, querying, and alerting. The InfluxDB data model has a flexible schema that accommodates the needs of diverse time series use cases. The data format takes the following form:



<measurement name>,<tag set> <field set> <timestamp>

The measurement name is a string, the tag set is a collection of key/value pairs where all values are strings, and the field set is a collection of key/value pairs where the values can be int64, float64, bool, or string. There are no hard limits on the number of tags and fields.

Being able to have multiple fields and multiple tags under the same measurement optimizes the transmission of the data, avoiding multiple retransmissions, which can render network protocols bloated when transmitting data with shared tag sets. This design choice is also particularly important for IoT use cases, where the agent on the monitored remote devices sending the metrics has to be energy-efficient for a longer device lifespan.

Furthermore, support for multiple types of data encoding beyond float64 values means that metadata can be collected along with the time series, rather than being limited to monitoring only numeric values. Precision is another parameter that must be taken into account when defining the data model. Timestamps in InfluxDB can be precise to the second, millisecond, microsecond, or nanosecond. This scale makes InfluxDB a good choice for use cases in finance and scientific computing where other solutions would be excluded.

See the example of CPU metrics mapped in InfluxDB line protocol below:

cpu,host=serverA,region=uswest idle=23,user=42,system=12 1464623548s

INFLUXDB TIME SERIES DATABASE SPECIALTIES

A good strategy when adopting a platform for time series should take into account the management of all types of time series data — not only metrics (numeric values regularly collected or pulled), but also events (pushed at irregular time intervals), such as faults, peaks, and human-triggered events like clicks.

Given the constant influx of granular data from a number of data sources in modern architectures, a performant database solution for time series has to handle high write rates with fast queries at scale, and that is where most other types of databases stumble when used for time series. To reach the bar set by time series, InfluxData implemented an architecture design with high compression, super-fast storage and query engines, and a purpose-built stack.

InfluxDB uses: an append-only file for newly arriving data, to make new points easily ingestible and durable; columnar on-disk storage for efficient queries and aggregations over time; a time-bound file structure that facilitates the management of data in shard-sized chunks; a reverse index mapping measurements to tags, fields, and series for quick access to targeted data; and per-type data compaction and compression for read optimization and volume control.

In time series, it's common to keep high-precision data around for a short period of time, and then aggregate and downsample it into longer-term trend data. This kind of data lifecycle management is difficult for application developers to implement on top of regular databases but is fundamental for time series.

A good starting point for time series projects is InfluxDB Cloud 2.0 — a hosted environment with an easy-to-use UI. InfluxDB Cloud 2.0 supports the Flux query and scripting language, which eases operations on the data set, such as aggregations, cross-measurement operations, and downsampling, among others. InfluxDB Cloud 2.0 offers a rate-limited free plan with no InfluxDB deployment needed. You go straight to collecting data and beginning your time series monitoring.

Cloud 2.0 hands-on learning

The fastest way to get a feel for what time series data can do for you is to try it.

Go to https://cloud2.influxdata.com/signup, register, and sign in.

Once signed in, it is time to generate your Telegraf collector agent configuration.

In InfluxDB Cloud 2.0, there is no need to install a database; you just need to create Buckets to which your data can be sent.

When creating a Bucket, you must define the retention policy for the data in that bucket.
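If you prefer scripting to clicking through the UI, a bucket with a retention rule can also be created through the InfluxDB 2.0 HTTP API. A minimal sketch in Python (the org ID, token, and URL are placeholders; only the request body is built and printed here):

```python
import json

def bucket_payload(name, org_id, retention_seconds):
    """Build the JSON body for POST /api/v2/buckets.

    A retention rule of type "expire" tells InfluxDB to drop points
    older than `retention_seconds`; 0 means keep data forever.
    """
    return {
        "name": name,
        "orgID": org_id,
        "retentionRules": [
            {"type": "expire", "everySeconds": retention_seconds}
        ],
    }

# 30-day retention for a bucket of system metrics
payload = bucket_payload("system-metrics", "0000000000000000", 30 * 24 * 3600)
print(json.dumps(payload))

# To actually send it (requires a running InfluxDB 2.0 instance and a
# valid token), something along these lines would follow:
#   import urllib.request
#   req = urllib.request.Request(
#       "https://cloud2.influxdata.com/api/v2/buckets",
#       data=json.dumps(payload).encode(),
#       headers={"Authorization": "Token <your-token>",
#                "Content-Type": "application/json"},
#   )
#   urllib.request.urlopen(req)
```

The `bucket_payload` helper and the credentials above are hypothetical; check the InfluxDB v2 API reference for the authoritative request shape before relying on it.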



Once you create the Bucket, it is time to load it. You can do so in two ways:

1. Via the Telegraf agent.
2. Via the Line Protocol.

Loading data via the Line Protocol is as easy as uploading a file or simply copy-pasting its contents into the UI. Please make sure that you select the correct precision to be applied to your metrics' timestamps.

Now, to load the Bucket via a collector agent, it is first necessary to create a Token for communication with the Bucket. Make sure that you select the type of access to grant: a Read/Write or an All Access Token. This Token will later be used in the Telegraf agent setup.

The next step is to configure Telegraf for the measurements you want to collect. To create a new Telegraf configuration, just click the blue "+ Create Configuration" button in the upper right-hand corner.

On the configuration page, you are presented with pre-canned configurations for some popular plugins, such as Systems, Docker, Kubernetes, and NGINX. For this example, we will use the Systems configuration, which collects system metrics: CPU, memory, disk, etc.

Let's now go back to the Telegraf view for the final setup. Select "Setup Instructions" to go to the agent setup instructions page.

On the Telegraf Setup Instructions page, you will find directions on how to install Telegraf with the required token — before this step, one must create a token — to be able to send data to the respective Bucket.

That's it! Now data will start flowing to your Bucket in InfluxDB Cloud 2.0.

Go to the Data Explorer icon and start composing queries to visualize the data in your buckets. The UI will easily guide you through the field, tag, and function options for your data.

This Refcard only shows an example of how to collect system time series metrics from a host using Telegraf's Systems pre-canned configuration. However, this is just to get you started. The InfluxDB platform has a vast



number of Telegraf plugins (over 200!) for use in your environments. See a list below of some common plugins:

1. apache
2. consul
3. docker
4. elasticsearch
5. haproxy
6. iis
7. influxdb
8. kubernetes
9. memcached
10. mesos
11. mysql
12. nginx
13. nsq
14. phpfpm
15. ping
16. postgresql
17. rabbitmq
18. redis
19. riak
20. system
21. varnish
22. win_system

You can find the full Telegraf plugin list here.
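Telegraf and the client libraries emit line protocol for you, but it helps to see what they produce. Below is a small Python sketch (a hypothetical helper, not part of any InfluxDB client) that serializes the cpu example from the data model section; note that full line protocol has stricter rules, e.g. integer fields normally carry an `i` suffix, which this sketch and the Refcard's example omit:

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Serialize one point as: <measurement>,<tag set> <field set> <timestamp>.

    Escaping covers only commas and spaces in tag values, which is
    enough for this sketch; client libraries handle the full rules.
    """
    def esc(value):
        return str(value).replace(",", "\\,").replace(" ", "\\ ")

    tag_set = ",".join(f"{k}={esc(v)}" for k, v in sorted(tags.items()))
    field_set = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_set} {field_set} {timestamp_ns}"

line = to_line_protocol(
    "cpu",
    {"host": "serverA", "region": "uswest"},
    {"idle": 23, "user": 42, "system": 12},
    1464623548000000000,  # nanosecond-precision timestamp
)
print(line)
# cpu,host=serverA,region=uswest idle=23,system=12,user=42 1464623548000000000
```

Sorting the tag keys is a deliberate touch: InfluxDB's docs recommend lexicographically sorted tags for best write performance.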

Now that you are all set to start exploring the world of time series data,
what's next? Read more time series case studies in various industry seg-
ments: telecom and service providers, e-commerce, financial markets,
IoT, research, manufacturing, telemetry, and, of course, the horizontal
case of DevOps and NetOps in any organization.

Written by Daniella Pontes, Product Marketing, InfluxData


Daniella Pontes is part of the product marketing team at InfluxData, San Francisco. She started her career in telecommunications, wireless technology, and global Internet service provisioning. As security became a major concern for enterprises, she worked on enterprise policy management, SaaS, and data encryption solutions. Prior to joining InfluxData, she spent some years living in Japan, Germany, and Brazil, working for an online agency developing and managing the Brazilian market.

Devada, Inc.
600 Park Offices Drive
Suite 150
Research Triangle Park, NC
888.678.0399 / 919.678.0300

DZone communities deliver over 6 million pages each month to more than 3.3 million software developers, architects, and decision makers. DZone offers something for everyone, including news, tutorials, cheat sheets, research guides, feature articles, source code, and more. "DZone is a developer's dream," says PC Magazine.

Copyright © 2019 Devada, Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

