
Big Data Analytics

Otto Medin & Louise Parberry


Sales Engineers
How big is big?
Response time requirements
Scalability requirements
Budget
Big Data Analytics Overview
Big Data By The Numbers
Data load limit ~400 MB/sec (commodity server)
3 terabyte data load
$180 hard drive
~7860 sec (~2.2 hrs)
1 exabyte
$63 million
87 years
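The figures above follow from simple arithmetic. A rough check in Python is sketched below. It assumes binary units, that the $180 drive holds 3 TB, and that a single server sustains 400 MB/sec; all function and constant names are illustrative.

```python
MB = 1024 ** 2
TB = 1024 ** 4
EB = 1024 ** 6

LOAD_RATE = 400 * MB          # bytes per second, one commodity server
DRIVE_PRICE = 180             # dollars; assumed to buy one 3 TB drive
DRIVE_SIZE = 3 * TB
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def load_seconds(size_bytes: int) -> float:
    """Time to load size_bytes through a single server."""
    return size_bytes / LOAD_RATE


def storage_cost(size_bytes: int) -> float:
    """Cost of enough drives to hold size_bytes, at the assumed drive price."""
    return size_bytes / DRIVE_SIZE * DRIVE_PRICE


if __name__ == "__main__":
    print(f"3 TB load: {load_seconds(3 * TB):.0f} sec")
    print(f"1 EB load: {load_seconds(EB) / SECONDS_PER_YEAR:.1f} years")
    print(f"1 EB storage: ${storage_cost(EB) / 1e6:.1f} million")
```

Under these assumptions the single-server load time grows linearly with data size, which is why an exabyte is out of reach without splitting the work.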
Big Data Analytics
The key to Big Data Analytics:
PARALLELIZE!
(if you want a quick result, that is)
Big Data Analytics Overview
Parallelization
Academy Model
Exercises
Agenda
Splitting data over multiple servers
Domain or functional decomposition
Academy will concentrate on domain model
Partitioning
By design
Search engine
By evolution
Corporate acquisitions
Our example!
Domain Decomposition
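As a minimal sketch of domain decomposition, the Python below splits a set of orders so that each regional server holds only its own region's data. The country-to-region mapping and the order records are made up for illustration and are not taken from the academy code.

```python
from collections import defaultdict

# Hypothetical mapping from country to the regional server that owns its data.
REGION_OF = {
    "Germany": "Europe",
    "Japan": "Asia",
    "Brazil": "Americas",
}


def partition_by_region(orders):
    """Group orders so that each region's server holds only its own orders."""
    partitions = defaultdict(list)
    for order in orders:
        partitions[REGION_OF[order["country"]]].append(order)
    return dict(partitions)


# Made-up sample orders.
orders = [
    {"id": 1, "country": "Germany"},
    {"id": 2, "country": "Japan"},
    {"id": 3, "country": "Germany"},
    {"id": 4, "country": "Brazil"},
]
partitions = partition_by_region(orders)
```

Because every order belongs to exactly one region, a query about one region never has to touch another region's server, which keeps communication to a minimum.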
Considerations
Minimize communication
Compare to ECP
Server architecture
High availability requirements
Optimal number of threads
Task distribution
Split and delegate task - Map
Aggregate partial results - Reduce
Result has same format 1 to N
Aggregation should not be bottleneck
MapReduce Pattern
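The pattern can be sketched in a few lines of Python. This is an illustration of the idea only, not the Ensemble implementation used in the academy: the regional data, the product names, and the function names are all made up. Each map task returns a partial result, and the reduce step merges partial results into a result of the same format.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Made-up per-region data: product names of orders that hit an empty shelf.
REGIONAL_DATA = {
    "Europe": ["Bagels", "Bagels", "Tortellini"],
    "Asia": ["Bagels", "Ziti"],
    "Americas": ["Ziti", "Ziti"],
}


def map_task(region: str) -> Counter:
    """Map step: one node computes a partial result over its own data."""
    return Counter(REGIONAL_DATA[region])


def reduce_task(left: Counter, right: Counter) -> Counter:
    """Reduce step: merge two partial results into one of the same format."""
    return left + right


def map_reduce(regions) -> Counter:
    """Run the map tasks in parallel, then fold the partial results together."""
    with ThreadPoolExecutor(max_workers=len(regions)) as pool:
        partials = list(pool.map(map_task, regions))
    return reduce(reduce_task, partials, Counter())


totals = map_reduce(list(REGIONAL_DATA))
```

Since the merged result has the same shape as each partial result, the reduce step works unchanged for one node or for N, and it stays cheap relative to the map work.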
How much set up is required?
Despite what you may read about other technologies,
development work is necessary for all
implementations
Do I need to install additional software?
No!
MapReduce Questions
Multiple web shops
Regional warehouses
Europe, Asia, Americas
Big Web Shop
Outsources all orders to web shops
Academy Scenario
Big shop category managers want to know
about any of their products being frequently
out-of-stock
Measure of unhappiness
Product is out of stock at time of order too often
Product will still be delivered but might be late
The Problem
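One way to turn "out of stock too often" into a number is the fraction of a product's orders that arrived while it was out of stock. The Python sketch below assumes each order records the product and an in-stock flag; the 20% threshold and the sample orders are hypothetical, not values from the academy.

```python
from collections import defaultdict

# Hypothetical cut-off: flag a product when more than 20% of its orders
# arrived while it was out of stock.
THRESHOLD = 0.20


def out_of_stock_rates(orders):
    """Return {product: fraction of orders placed while out of stock}."""
    total = defaultdict(int)
    missing = defaultdict(int)
    for product, in_stock in orders:
        total[product] += 1
        if not in_stock:
            missing[product] += 1
    return {product: missing[product] / total[product] for product in total}


def frequently_out_of_stock(orders, threshold=THRESHOLD):
    """List the products whose out-of-stock rate exceeds the threshold."""
    rates = out_of_stock_rates(orders)
    return sorted(p for p, rate in rates.items() if rate > threshold)


# Made-up sample: (product, was it in stock at time of order?)
orders = [
    ("Bagels", True), ("Bagels", False), ("Bagels", False), ("Bagels", True),
    ("Ziti", True), ("Ziti", True), ("Ziti", True), ("Ziti", True),
]
```

With this sample, `frequently_out_of_stock(orders)` flags only the product whose orders were out of stock half the time.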
Web shop simulator
Business service
Big web shop order distribution
Business process and business operation
Warehouses
Data model and pivot table
Web service
Initial Infrastructure
HoleFoods Web Shop
HoleFoods Data Model
Outlet: Population, Country, City, Type
Country: Name, Region
Region: Name
Product: Name, Category, Price, SKU
Transaction: Actual, Date Of Sale, Product, Outlet, Channel, AmountOfSale, Units Sold, InStock
DeepSee Data Model
Cubes
Defines dimensions and
measures
Subject Areas
Views on cubes
Provides automatic filtering
KPIs
Makes more sophisticated
computations available to
dashboards
Can make use of DeepSee, SQL,
or custom logic
DeepSee Performance and Scalability
Multi-level, incremental caching to
support large data models (100M+
facts)
Support for parallel execution of
queries to exploit multi-core
architectures:
Queries are split by # of facts
Queries are split by # of cells
Subqueries and joins
Logic for updates to Data Model is
streamlined
Academy setup
BigData
Asia
Europe
Americas
Order
Distributor
Web shop
Simulator
Four Ensemble instances:
In this exercise you will familiarize yourself
with a regional warehouse (DeepSee) and use
the web shop simulator.
Exercise 1
MDX
MDX (MultiDimensional eXpressions): the standard query language
for OLAP (online analytical processing)
Provides standard syntax to execute queries against a cube
When you create a pivot table, DeepSee generates and uses an
MDX query, which you can view directly
Analyzer provides an option for directly running MDX queries
You can run MDX queries in the DeepSee shell
DeepSee provides an API that you can use to run MDX queries
on your DeepSee cubes
MDX Example

SELECT NON EMPTY [OUTLET].%TOPMEMBERS ON 0,
       NON EMPTY [CHANNEL].%TOPMEMBERS ON 1
FROM [SALES]
WHERE [MEASURES].[AMOUNT SOLD]


In this exercise you will access your
warehouse analytics programmatically, using
MDX, and publish the results as a web service.
Exercise 2
Ens.CallStructure
Holds a request object and a target name
Also has a slot for the Response
Ens.Host.SendRequestSyncMultiple
Accepts a list of Ens.CallStructure
Makes calls in parallel
Adds response objects to Ens.CallStructure
How to parallelize dynamically
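// Build one call structure per target
set tCall = ##class(Ens.CallStructure).%New()
set tCall.TargetDispatchName = "MyBusinessHostClass"
set tCall.Request = ##class(MyRequestClass).%New()

// Append it to the request list (count at the root, calls at subscripts 1..n)
set tRequestList = $get(tRequestList) + 1
set tRequestList(tRequestList) = tCall

// Send all calls in parallel; each response is stored on its call structure
set tSC = ..SendRequestSyncMultiple(.tRequestList)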
How to parallelize dynamically
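For readers without an Ensemble instance at hand, the same idea can be sketched in plain Python. This is an analogy only, not the Ensemble API: `CallStructure`, `send_request_sync_multiple`, and the `hosts` dictionary are hypothetical stand-ins, and the canned responses are made up.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class CallStructure:
    """Pairs a request with a target name and leaves room for the response."""
    target: str
    request: Any
    response: Optional[Any] = None


def send_request_sync_multiple(calls: List[CallStructure],
                               hosts: Dict[str, Callable[[Any], Any]]) -> None:
    """Dispatch every call in parallel, then store each response on its call."""
    if not calls:
        return
    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        futures = [pool.submit(hosts[call.target], call.request) for call in calls]
        for call, future in zip(calls, futures):
            call.response = future.result()  # blocks until that host replies


# Hypothetical hosts: each regional warehouse answers with a canned count.
hosts = {
    "Europe": lambda request: {"region": "Europe", "out_of_stock": 4},
    "Asia": lambda request: {"region": "Asia", "out_of_stock": 2},
    "Americas": lambda request: {"region": "Americas", "out_of_stock": 7},
}
calls = [CallStructure(target=name, request={"category": "Pasta"}) for name in hosts]
send_request_sync_multiple(calls, hosts)
```

The list of calls is built at runtime, so the number of targets can vary from one request to the next. The caller blocks until every response has been stored.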
In this exercise you will retrieve statistics from
the relevant regional warehouses, using
parallel calls.
Exercise 3
Dashboards
Widgets
In this exercise you will aggregate the results
from Exercise 3 and monitor the aggregated
results using a dashboard.
Exercise 4
Warehouse problem simulator
Business Rule
Creates a decision point in a business process
Can be changed at runtime
In this exercise you will force a product
category to be out-of-stock and watch the
results deteriorate
Exercise 5
With InterSystems technology:
When does big data become big data?
When distributing data:
DeepSee (and perhaps iKnow) on the nodes
ECP useful for maintaining code
Conclusion
Questions?
Thank you
Developer Connection
developer.intersystems.com



Your Global Summit Every Day
We want your feedback
We'd love your feedback on the academy
you just attended. Go to:
intersystems.com/survey

Select the date, time, and academy you attended
and complete the short evaluation form.
