
Real Time Analytics Platform

By: JYOTI JHA
What is Real Time Analytics?
 What is it?
• Real-time analytics is the process of delivering information about events as they occur

 Some Examples
• Financial Industry - Fraud Detection, Trading
• E-commerce - Recommendations
• Telecom Industry - Machine to Machine communication
• Supply Chain Management
• Business Activity Monitoring
Why is it needed?

 Time is money
• Intra-day risk analysis in real time could translate into increased profits

 Helps organizations to stay ahead of competition

• E-commerce – surfacing information based on what a user is browsing or interested in could lead to better sales and a better experience
• Content creators could produce relevant, high-quality content
Case Study – Telecommunications Industry
The Company, Challenge & Benefits

 Company
• Telecom firm providing a wireless network service, designed to deliver Machine-to-Machine (M2M) communications to millions of devices.

 Challenge
• Design a near real time solution for predicting patterns based on data generated by Machine-to-Machine (M2M) communication and sent over the wireless network.
• Solution should be able to support the addition of near real time streams without much of a change.
• Enable customers to get real time alerts for business-critical situations.

 Benefits
• Enabled customers to react to their critical business needs in real time.
• Improved customer experience.
• Reduced operating cost.
Examples

 Machine to Machine Communication


• Vineyard watering
o Spread over a huge area
o Critical to maintain water-level thresholds

• Vehicle Tracking & Geo-fencing


o Mark the radius of vehicle movement (in case of valet parking)
Incoming Data Attributes
 Continuous input streams
• Events as they happen

 High data volume


• 1,000–100,000 events per second

 Varied sources
• Data coming from multiple sources
Expected Goals
 Identify patterns
• Devices sending incorrect or duplicate data
 Reliability
• Events are processed as they happen
• Events are not missed in case of failure
 Scalability
• Should be able to support an increase in data volume
 Capability to Add more Queries
• Should be able to add more queries for a particular type of
incoming stream
 Notification / Alerts System
Technology Stack – What is needed?
 Event Processing capability
• Esper
o Processing engine for data streams
o SQL-Like Support – run queries on data stream
o Sliding windows (time or length)
o Pattern Matching
o Executes a large number of queries simultaneously
Technology Stack – Esper
 Esper - Simple steps to get started

• Get an Esper instance
• Create a statement (Esper Query Language)
• Register the statement with the Esper engine
• Create a Listener
• Attach the listener to the statement (see the sketch below)
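
A minimal sketch of these steps in Java, assuming the older Esper 5.x/6.x client API (which matches the win:time syntax shown on the next slide) and a simple StockTickEvent POJO defined here for illustration:

import com.espertech.esper.client.Configuration;
import com.espertech.esper.client.EPServiceProvider;
import com.espertech.esper.client.EPServiceProviderManager;
import com.espertech.esper.client.EPStatement;
import com.espertech.esper.client.EventBean;
import com.espertech.esper.client.UpdateListener;

public class EsperQuickStart {

    // Simple event POJO assumed for this sketch; getters expose the fields used in the EPL query
    public static class StockTickEvent {
        private final String symbol;
        private final double price;
        public StockTickEvent(String symbol, double price) { this.symbol = symbol; this.price = price; }
        public String getSymbol() { return symbol; }
        public double getPrice() { return price; }
    }

    public static void main(String[] args) {
        // 1. Get an Esper instance and register the event type
        Configuration config = new Configuration();
        config.addEventType("StockTickEvent", StockTickEvent.class);
        EPServiceProvider engine = EPServiceProviderManager.getDefaultProvider(config);

        // 2. Create a statement (Esper Query Language): 30-second sliding time window
        String epl = "select avg(price) as avgPrice from StockTickEvent.win:time(30 sec)";

        // 3. Register the statement with the Esper engine
        EPStatement statement = engine.getEPAdministrator().createEPL(epl);

        // 4. Create a listener and 5. attach it to the statement
        statement.addListener(new UpdateListener() {
            public void update(EventBean[] newEvents, EventBean[] oldEvents) {
                if (newEvents != null) {
                    System.out.println("Average price: " + newEvents[0].get("avgPrice"));
                }
            }
        });

        // Push a couple of events through the engine to exercise the query
        engine.getEPRuntime().sendEvent(new StockTickEvent("ACME", 10.0));
        engine.getEPRuntime().sendEvent(new StockTickEvent("ACME", 12.0));
    }
}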
Technology Stack – Esper
 Esper – Sample Queries

 Time based window


select avg(price) from StockTickEvent.win:time(30 sec)

 Length based window


select symbol, avg(price) as averagePrice from
StockTickEvent.win:length(100) group by symbol
Technology Stack - Storm
 Data Carrier for Esper
• Storm
o Facilitates data transfer
o Continuous Computation
o Distributed, Fault tolerant
o Scalable, No Data Loss
o Provides parallelism
o Acking & Replay capability
Technology Stack - Storm
 Basic concept of Storm
• Streams, Spouts & Bolts
• A stream is an unbounded sequence of tuples
• Spouts are data emitters; they retrieve data from outside the Storm cluster
• Bolts are data processors; they receive one or more streams and (potentially) emit one or more new streams (see the sketch below)
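
A minimal bolt sketch in Java, assuming the pre-Apache backtype.storm package names (newer Storm releases use org.apache.storm instead); the bolt itself, the incoming field name "raw", and the message format are assumptions for illustration:

import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

// Hypothetical bolt: parses raw device readings emitted by a spout and
// emits (deviceId, value) tuples for downstream processing (e.g. an Esper bolt)
public class ParseReadingBolt extends BaseRichBolt {

    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        // Assumed message format: "deviceId,value"
        String[] parts = input.getStringByField("raw").split(",");

        // Anchor the emitted tuple to the input so Storm can replay it on failure
        collector.emit(input, new Values(parts[0], Double.parseDouble(parts[1])));

        // Ack to tell the spout this tuple has been fully processed
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("deviceId", "value"));
    }
}

The anchored emit plus the explicit ack is what gives Storm its "no data loss" and replay behaviour mentioned on the previous slide.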
Technology Stack - Storm
 Storm Cluster
• Topology - A graph of spouts and bolts connected with stream groupings (see the sketch below)
• Master Node – Runs a daemon called Nimbus
o Distributes code across the cluster
o Assigns tasks to machines
o Monitors for failures
• Worker Node - Runs a daemon called Supervisor
o Listens for work assigned to its machine
o Starts/stops worker processes
o Executes a subset of the topology
• Coordination between Nimbus and the Supervisors is done with ZooKeeper
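
A hedged sketch of wiring a topology and submitting it to Nimbus, again using the pre-Apache backtype.storm packages; DeviceEventSpout and EsperQueryBolt are hypothetical components (ParseReadingBolt is the sketch above), and the parallelism hints and topology name are arbitrary:

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;

public class AnalyticsTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        // DeviceEventSpout and EsperQueryBolt are hypothetical; ParseReadingBolt is sketched earlier
        builder.setSpout("device-events", new DeviceEventSpout(), 2);
        builder.setBolt("parse", new ParseReadingBolt(), 4)
               .shuffleGrouping("device-events");
        // Group by deviceId so all readings from one device reach the same Esper bolt instance
        builder.setBolt("esper", new EsperQueryBolt(), 4)
               .fieldsGrouping("parse", new Fields("deviceId"));

        Config conf = new Config();
        conf.setNumWorkers(4);

        // Nimbus distributes the topology code and tasks across the Supervisor nodes
        StormSubmitter.submitTopology("m2m-analytics", conf, builder.createTopology());
    }
}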
Technology Stack - Flume
 Log Data Collection
• Flume
o Stream-oriented data flow
o Log streaming from various sources
o Collect, aggregate & move data to centralized data
store
o Distributed, Reliable
o Failover and recovery mechanism
Technology Stack - Flume
 Flume
• Agent - Receives data from
an application
• Collector – Writes data to permanent storage
• Master – Separate service
controlling all the other
nodes
Technology Stack - Messaging
 Bridging the gap between Flume & Storm
• Queue Messaging System
o Robust messaging
o Flexible routing
o Highly available
o Makes Flume & Storm integration loosely coupled
• RabbitMQ fits the requirement
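
A minimal sketch of consuming Flume-published events from RabbitMQ with the RabbitMQ Java client; the broker host, queue name, and message encoding are assumptions, and in the actual platform this logic would sit inside a Storm spout rather than a standalone main method:

import java.nio.charset.StandardCharsets;

import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DefaultConsumer;
import com.rabbitmq.client.Envelope;

public class FlumeEventConsumer {
    public static void main(String[] args) throws Exception {
        // Broker host and queue name are assumptions for this sketch
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");

        Connection connection = factory.newConnection();
        final Channel channel = connection.createChannel();

        // Durable queue: messages survive a broker restart
        channel.queueDeclare("flume-events", true, false, false, null);

        // Manual acks (autoAck = false): acknowledge only after the event is processed,
        // mirroring Storm's own ack/replay model
        channel.basicConsume("flume-events", false, new DefaultConsumer(channel) {
            @Override
            public void handleDelivery(String consumerTag, Envelope envelope,
                                       AMQP.BasicProperties properties, byte[] body) throws java.io.IOException {
                String event = new String(body, StandardCharsets.UTF_8);
                System.out.println("Received event: " + event);
                channel.basicAck(envelope.getDeliveryTag(), false);
            }
        });
    }
}

Declaring the queue as durable and acknowledging manually is what makes the Flume-to-Storm hand-off loosely coupled: either side can restart without losing messages.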
Fitting it all together

[Architecture diagram: event sources in the Data Center → Flume → RabbitMQ → Storm → Esper]