
A LITTLE BEE BOOK

How it Works
Streaming Analytics
Adapted from a variety of sources by Bob Yelland
With thanks to Avi Patwardhan & Kimberly Madia

For more copies of this book, or to read others in the series, visit: littlebeelibrary.com
Sometimes two minutes is too late.

Organisations need to spot risks and opportunities in
high-velocity data, opportunities that can often be
detected and acted on only at a moment's notice.

For time-sensitive processes such as thwarting
fraud, mitigating security threats or responding to
natural disasters, time is of the essence.

Real-time analytics (or stream computing) enables
continuous processing of data streams and can be
used to maximise the time value of data.

A key difference between stream computing and
online analytical processing (OLAP) is that the
latter requires data to be at rest before running
analytics.

Stream computing is a processing paradigm that
brings the analytics to the data, rather than storing
the data first.
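
As a rough illustration of analysing data without storing it first, the sketch below keeps a running mean that is updated as each reading arrives. The function name and the sample readings are hypothetical, and the code is a minimal sketch rather than the approach of any particular streaming product.

```python
from typing import Iterable, Iterator


def running_mean(readings: Iterable[float]) -> Iterator[float]:
    """Yield the mean of all readings seen so far, one value per reading."""
    count = 0
    mean = 0.0
    for value in readings:
        count += 1
        # Incremental update: no reading is kept after it has been processed.
        mean += (value - mean) / count
        yield mean


if __name__ == "__main__":
    # Hypothetical sensor readings standing in for a live feed.
    for current in running_mean([10.0, 20.0, 30.0]):
        print(current)
```

Only the count and the current mean are held in memory, so the same loop works whether the feed delivers three readings or runs indefinitely.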

The ability to analyse data in real time
shifts the conversation from how to manage big
data to how to make sense of, analyse and act on it
at high velocities.

Analysing data in motion leads to immediate and
accurate decision making.

Stream computing can deliver a rapid return on
investment.

A healthcare firm realised 95% faster insight into
patient health by accelerating the execution of
complex algorithms. This saves lives by flagging the
risk of serious medical conditions. It also enables
effective targeting of patient care, thereby optimising
healthcare resources.

A utility company saved more than 700,000 gallons of
fuel and lowered costs for consumers by $24 million
by analysing the data from 2.3 million smart meters.

A telecommunications company improved marketing
effectiveness by 70% using behaviour-based
segmentation to create dynamic offers.

There are four broad ways that stream computing is
being used today:

Streaming Extract, Transform and Load (ETL)
Data is continuously cleaned and aggregated before
being pushed into data stores.

Triggers
Anomalous behaviour is detected in real time, and
further downstream actions are triggered accordingly.

Data enrichment
Live data is enriched with more information by joining
it with a static dataset, allowing for a more complete
real-time analysis.

Complex sessions and continuous learning
Events related to a live session (e.g. website activity)
are grouped together and analysed.
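
The four uses above can be sketched as small stages chained into one pipeline. Everything in this example is an assumption made for illustration: the event fields, the CUSTOMERS lookup table, the fixed alert limit, and the rule that a session ends after a gap in timestamps. A real platform would offer far richer operators.

```python
from typing import Any, Dict, Iterable, Iterator, List

Event = Dict[str, Any]

# Hypothetical static reference data used for enrichment.
CUSTOMERS = {"c1": "gold", "c2": "standard"}

# Hypothetical live events; the second one is malformed (no customer).
SAMPLE_EVENTS: List[Event] = [
    {"customer": "c1", "amount": "20", "ts": 0},
    {"amount": "5", "ts": 1},
    {"customer": "c2", "amount": "900", "ts": 2},
    {"customer": "c3", "amount": "15", "ts": 60},
]


def clean(events: Iterable[Event]) -> Iterator[Event]:
    """Streaming ETL: drop malformed events and convert amounts to floats."""
    for event in events:
        if "customer" in event and "amount" in event:
            yield {**event, "amount": float(event["amount"])}


def enrich(events: Iterable[Event]) -> Iterator[Event]:
    """Data enrichment: join each live event with the static customer table."""
    for event in events:
        yield {**event, "tier": CUSTOMERS.get(event["customer"], "unknown")}


def trigger(events: Iterable[Event], limit: float) -> Iterator[Event]:
    """Triggers: mark events whose amount exceeds the limit as they pass."""
    for event in events:
        yield {**event, "alert": event["amount"] > limit}


def sessions(events: Iterable[Event], gap: float) -> Iterator[List[Event]]:
    """Sessions: start a new group when timestamps are more than `gap` apart."""
    current: List[Event] = []
    for event in events:
        if current and event["ts"] - current[-1]["ts"] > gap:
            yield current
            current = []
        current.append(event)
    if current:
        yield current


if __name__ == "__main__":
    pipeline = trigger(enrich(clean(SAMPLE_EVENTS)), limit=500.0)
    for session in sessions(pipeline, gap=30):
        print(session)
```

Each stage is a generator, so an event flows through cleaning, enrichment and the trigger check as soon as it arrives, rather than waiting for the whole dataset.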

These uses have given rise to a number of industry
applications. For example:

Telecommunications
Call detail processing
Customer churn prediction
Device geomapping.

Travel and Transportation
Intelligent traffic management
Automotive telematics.

Energy and Utilities
Usage forecasting
Equipment monitoring.

Financial Services
Fraud detection & prevention
Targeted marketing
Cybersecurity monitoring.

Event Stream Processing (ESP) and Complex Event
Processing (CEP) are very similar concepts, but there
are some important differences:

Speed through Parallelism
CEP is often centralised. Stream applications
are deployed across many nodes to maximise
parallelism and scalability.

Deeper Analysis
CEP uses a rules engine to evaluate if-then-else style
rules, or an in-memory SQL database to perform
continuous simple queries. ESP provides more
options to analyse data through a comprehensive
programming language (SPL).

Broader Data Types
CEP engines handle structured data. Streams
have been designed to analyse all manner of data,
including image, video and acoustic data types.
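
One common way to spread a stream across many nodes, as described under Speed through Parallelism, is to partition events by key so that each worker sees every event for the keys it owns. The sketch below is only an illustration of that idea: the function names, the meter identifiers and the choice of a CRC32 hash are assumptions, not a description of how any specific product assigns work.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Reading = Tuple[str, float]


def partition_for(key: str, workers: int) -> int:
    """Map a key to a worker index using a hash that is stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % workers


def distribute(events: Iterable[Reading], workers: int) -> Dict[int, List[Reading]]:
    """Assign each (key, value) event to the worker that owns its key."""
    assigned: Dict[int, List[Reading]] = defaultdict(list)
    for key, value in events:
        assigned[partition_for(key, workers)].append((key, value))
    return dict(assigned)


if __name__ == "__main__":
    # Hypothetical smart-meter readings keyed by meter identifier.
    readings = [("meter-1", 3.2), ("meter-2", 1.1), ("meter-1", 3.4)]
    for worker, batch in sorted(distribute(readings, workers=4).items()):
        print(worker, batch)
```

Because the same key always maps to the same worker, per-key state such as a running total can live on one node, and adding workers spreads the keys more thinly.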

Most stream computing platforms include two
core components: an application development
environment to build applications that ingest and
process data streams, and a runtime capability
designed to process data streams with low latency at
massive scale seamlessly across infrastructure.

In addition, streaming analytics toolkits improve the
productivity of developers and data scientists in
crafting complex analytics, such as natural language
processing, voice analytics and facial recognition.

There are a number of open source offerings that
can support streaming analytics:
Apache Storm, written in Clojure, was open-sourced
by Twitter and is composed of other open source
components, especially ZooKeeper for cluster
management, ZeroMQ for multicast messaging, and
Kafka for queued messaging.

Apache Spark, written in Scala, is a general
framework for large-scale data processing that
supports many programming languages
and concepts such as MapReduce, in-memory
processing, stream processing, graph processing
and machine learning.

Akka is a toolkit and runtime for building
highly concurrent, distributed, and resilient
message-driven applications on the Java Virtual
Machine (JVM).

IBM InfoSphere Streams is an open platform that
blends the best elements of shareware, open source
software and open standards with powerful vendor-
developed technology.

IBM Streams is highly efficient, using 14.2 times fewer
hardware resources and delivering 12.3 times more
throughput compared to open source offerings.

It offers a highly scalable event server, integration
capabilities, and other typical features required for
implementing stream processing use cases.

IBM Streams is a leader in the Forrester Wave for
Big Data Analytics Platforms.

The latest IBM Streams update focuses on developer
productivity.

The Integrated Development Environment (IDE) is
based on Eclipse and offers visual development and
configuration.

It speeds up application delivery by allowing
streaming applications to be written in Java.

A developer with no prior Streams knowledge can
create applications in under an hour using Java
APIs for streaming analytic libraries such as natural
language processing, spatial, temporal, acoustic,
image recognition and more.

Time is of the essence, so why not trial IBM Streams
today?

Copyright IBM Corporation 2017. All Rights Reserved.
IBM, the IBM logo and ibm.com are trademarks or registered trademarks of International
Business Machines Corporation in the United States, other countries, or both.
Other product, company or service names may be trademarks or service marks of others.