
Cubes

Lightweight pluggable data warehouse

Synopsis: Python framework and set of tools for building a heterogeneous, pluggable data warehouse, for multidimensional data access and for online
analytical processing (OLAP) of categorical data. Lightweight.
Authors: Robin Thomas and Stefan Urbanek

Overview

Pluggable Analytical Workspace

Take your Google Analytics and your SQL database, and give your users a
single way to access all of them. There is no need to grant everyone
account access to each particular data source.

The workspace manages the model, data stores and model providers.

[Figure: a Workspace with its Store and Browser components. Cubes shown: sales, churn, activations, events. Stores shown: BI Data (Postgres), BI Data 2 (Mongo), Events (API).]

Stores configuration:

[store_data]
type: sql
url: postgres://localhost/data

[store_data2]
type: mongo
host: localhost

[store_events]
type: mixpanel
api_key: 123456
api_secret: 123456

from cubes import Workspace

workspace = Workspace()
workspace.import_model("model.json")
workspace.register_default_store("sql", url="postgres://localhost/data")
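The stores configuration above is plain INI text, so it can be inspected with the Python standard library alone. The following is a minimal sketch of that idea, not the way Cubes itself loads its configuration; the helper name `read_stores` and the convention of stripping the `store_` prefix are assumptions made for illustration.

```python
import configparser

STORES_INI = """
[store_data]
type: sql
url: postgres://localhost/data

[store_data2]
type: mongo
host: localhost

[store_events]
type: mixpanel
api_key: 123456
api_secret: 123456
"""


def read_stores(text):
    """Return {store_name: options} for every [store_*] section.

    Hypothetical helper: it only illustrates the file layout.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    stores = {}
    for section in parser.sections():
        if section.startswith("store_"):
            name = section[len("store_"):]
            stores[name] = dict(parser[section])
    return stores


stores = read_stores(STORES_INI)
for name, options in stores.items():
    print(name, options["type"])
```

Each store keeps its backend type next to the backend-specific options, which is what lets one workspace mix SQL, MongoDB and API-based stores.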

Cubes:
- multi-dimensional data modeling
- browsing of aggregated data
- unified interface for analytical data

Server:
- easy to use JSON over HTTP API
- can be integrated as Flask Blueprint
- collect other (external) Cubes Slicers

[Figure: architecture of the workspace and the Slicer server. Labels: Workspace, Stores, Model Providers (Static Model Provider, API Model Provider), Server, Slicer, Authenticator, Authorizer, User Interface, analytical data, source data.]

Model

Metadata: logical description of data: cubes, dimensions, measures and aggregates.

Cubes: logical data structure, a collection of measurable facts (invoices, phone calls, events, ...).

Dimensions: provide context for facts; they are used to filter queries or reports and to control the scope of aggregation of facts. A dimension might contain concept hierarchies such as category-subcategory-product, a date hierarchy (year-month-day or year-quarter-month-day) or geographical hierarchies.

The model also provides information about the mapping to the physical data store.

[Figure: physical model, denormalized. Labels: sales (id, sale_date, product_id, store_id, amount); product (id, code, name); store (code, address); date dimension (year, month).]

SQL schema example:

{
    "cubes": [
        {
            "name": "sales",
            "measures": ["amount"],
            "dimensions": ["date", "product", "store"],
            "joins": [
                {"master": "date_id", "detail": "date.id"},
                {"master": "product_id", "detail": "product.id"},
                {"master": "store_id", "detail": "store.id"}
            ]
        }
    ],
    "dimensions": [
        {"name": "product", "attributes": ["code", "name"]},
        {"name": "store", "attributes": ["code", "address"]}
    ]
}

Slicing and Dicing

Cell: provides the context of interest and is composed of cuts. There are three kinds of cuts: point, set and range. A cut can also be inverted using invert=True, which yields the cells outside of the cut.

from cubes import Cell, PointCut, SetCut, RangeCut

- point cut: a single dimension member; PointCut(dimension, path)

cut1 = PointCut("status", ["open"])

- set cut: multiple dimension members; SetCut(dimension, paths)

cut2 = SetCut("region", [["sk", "ba"], ["hu"]])

- range cut: members between two values of an ordered dimension (such as date); RangeCut(dimension, from, to)

cut3 = RangeCut("date", [2010, 1], [2012])

cell = Cell(cube, [cut1, cut2, cut3])

Cell as user interface element: multi-dimensional breadcrumbs, filter UI.
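To make the three kinds of cuts concrete, here is a small stand-alone sketch that filters in-memory facts the way a cell composed of a point, a set and a range cut would. It does not use the Cubes classes: the functions `point_match`, `set_match`, `range_match` and `in_cell` are hypothetical, and the range semantics (each bound is compared only down to the level it specifies, so an upper bound of [2012] includes all of 2012) is an assumption.

```python
def point_match(member, path):
    """Point cut: the member path starts with the cut path."""
    return list(member[:len(path)]) == list(path)


def set_match(member, paths):
    """Set cut: the member matches any of the listed paths."""
    return any(point_match(member, path) for path in paths)


def range_match(member, from_path, to_path):
    """Range cut on an ordered dimension, bounds inclusive."""
    member = list(member)
    if from_path is not None and member[:len(from_path)] < list(from_path):
        return False
    if to_path is not None and member[:len(to_path)] > list(to_path):
        return False
    return True


def in_cell(fact, cuts):
    """cuts: (dimension, test, invert) tuples; every cut must hold."""
    for dimension, test, invert in cuts:
        if test(fact[dimension]) == invert:
            return False
    return True


facts = [
    {"status": ["open"], "region": ["sk", "ba"], "date": [2009, 12], "amount": 10},
    {"status": ["open"], "region": ["hu"], "date": [2010, 1], "amount": 20},
    {"status": ["closed"], "region": ["sk", "ke"], "date": [2011, 6], "amount": 30},
    {"status": ["open"], "region": ["sk", "ba"], "date": [2012, 12], "amount": 40},
    {"status": ["open"], "region": ["cz"], "date": [2011, 3], "amount": 50},
]

status_cut = ("status", lambda m: point_match(m, ["open"]), False)
date_cut = ("date", lambda m: range_match(m, [2010, 1], [2012]), False)


def region_test(member):
    return set_match(member, [["sk", "ba"], ["hu"]])


cell = [status_cut, ("region", region_test, False), date_cut]
inverted = [status_cut, ("region", region_test, True), date_cut]

selected = [f["amount"] for f in facts if in_cell(f, cell)]
outside_region = [f["amount"] for f in facts if in_cell(f, inverted)]
print(selected, outside_region)
```

Inverting only the region cut keeps the other two conditions and selects the facts whose region is outside the listed members.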

Browsing and Aggregation


Get aggregated data for a cell, get dimension members within a cell or get a list of facts within a cell, if available. Uses backend-specific aggregation or an
interface to another aggregation engine.

browser = workspace.browser("contracts")
result = browser.aggregate(cell, drilldown=["sector"])

[Figure: aggregation. The Browser uses the model (logical) and the Store (physical) to aggregate facts into an AggregationResult; a SQL or other backend-specific query is generated.]

result.summary: summary
result.cells: iterable, backend-specific, might hold a database cursor

Drill-down:
cell[level.key]: for URL
cell[level.label_attribute]: for table cell or link label

for cell in result:
    print("%s: %s" % (cell[level.label_attribute],
                      cell[aggregate.name]))

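[Figure: measures and aggregates. The measure price comes from the facts in the physical data store (database or API); the aggregate price_sum is derived from price and is read from a result cell as cell["price_sum"].]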

Drill-down: get more details by year, by product, by store, ...

cell = Cell(cube)
browser.aggregate(cell)                        # result.summary

browser.aggregate(cell, drilldown=["date"])    # result.cells

cut = PointCut("date", [2010])
cell = cell.slice(cut)
browser.aggregate(cell, drilldown=["date"])    # result.cells
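The drill-down sequence above (aggregate everything, drill down by date, then slice by a year and drill down again) can be imitated on a plain list of facts. This is a toy sketch under stated assumptions, not the Cubes browser: the function `aggregate`, its `cut` dictionary and the `record_count` and `amount_sum` names are illustrative only.

```python
from collections import defaultdict

facts = [
    {"year": 2009, "month": 11, "amount": 100},
    {"year": 2010, "month": 1, "amount": 200},
    {"year": 2010, "month": 1, "amount": 50},
    {"year": 2010, "month": 7, "amount": 25},
]


def aggregate(facts, cut=None, drilldown=None):
    """Toy aggregation.

    cut: dict of attribute -> required value (a stand-in for point cuts).
    drilldown: list of attributes to group by.
    """
    cut = cut or {}
    selected = [f for f in facts if all(f[k] == v for k, v in cut.items())]
    summary = {
        "record_count": len(selected),
        "amount_sum": sum(f["amount"] for f in selected),
    }
    cells = []
    if drilldown:
        groups = defaultdict(list)
        for fact in selected:
            groups[tuple(fact[a] for a in drilldown)].append(fact)
        for key in sorted(groups):
            cell = dict(zip(drilldown, key))
            cell["record_count"] = len(groups[key])
            cell["amount_sum"] = sum(f["amount"] for f in groups[key])
            cells.append(cell)
    return {"summary": summary, "cells": cells}


whole = aggregate(facts)
by_year = aggregate(facts, drilldown=["year"])
months_of_2010 = aggregate(facts, cut={"year": 2010}, drilldown=["month"])

print(whole["summary"])
for cell in months_of_2010["cells"]:
    print(cell["month"], cell["amount_sum"])
```

The summary always describes the whole cell, while the cells list holds one row per member of the drilled-down attribute.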

Slicer Server

Unified aggregation interface to a variety of data stores and services.
JSON interface over HTTP. Built using the Flask web micro-framework. Can
be used as a stand-alone server or integrated in another application
to serve as an analytical module.

List cubes:

GET /cubes

Get cube model (metadata):

GET /cube/sales/model

Aggregate:

GET /cube/sales/aggregate?cut=date:2010&drilldown=date,region&split=status:1&page=10&page_size=100

Slicer JSON response:

{
    "cell": [],
    "total_cell_count": 2,
    "drilldown": [
        {
            "record_count": 31,
            "amount_sum": 550840,
            "date.year": 2009
        },
        {
            "record_count": 31,
            "amount_sum": 566020,
            "date.year": 2010
        }
    ],
    "summary": {
        "record_count": 62,
        "amount_sum": 1116860
    }
}

/facts    get list of facts within a cell (if available)
/members  list dimension members within a cell
/cell     get multi-dimensional breadcrumbs information, browsing context or "where am I looking at?"

Results can be paginated with page=, ordered with order= and also
formatted as CSV or newline-separated JSON records using format=.

[Figure: the Slicer serves the model, aggregates and facts (details) of Cubes for a cell (point of view).]

Visualizers

Turning JSON data into reports, charts, tables. It is very easy to build
a custom visualization on top of the Cubes analytical data with the
framework of your choice.

Use either a generic visualizer and reporting application such as Cubes
Visualizer or Cubes Viewer, or create one that suits your reporting
needs.

[Figure: generic visualizers and reporting applications; specific-purpose reports.]
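As an illustration of how little is needed to turn a Slicer JSON response into a report, the sketch below formats the sample aggregate response from this poster as a plain-text table. It uses only the standard library and an embedded copy of the response instead of a live server; the function `drilldown_table` is a hypothetical helper, not part of Cubes.

```python
import json

RESPONSE = """
{
    "cell": [],
    "total_cell_count": 2,
    "drilldown": [
        {"record_count": 31, "amount_sum": 550840, "date.year": 2009},
        {"record_count": 31, "amount_sum": 566020, "date.year": 2010}
    ],
    "summary": {"record_count": 62, "amount_sum": 1116860}
}
"""


def drilldown_table(response, label, aggregates):
    """Return text lines: header, one line per drill-down row, total."""
    header = [label] + list(aggregates)
    rows = [header]
    for row in response["drilldown"]:
        rows.append([str(row[label])] + [str(row[a]) for a in aggregates])
    rows.append(["total"] + [str(response["summary"][a]) for a in aggregates])
    # Pad every column to the width of its longest value
    widths = [max(len(r[i]) for r in rows) for i in range(len(header))]
    return [
        "  ".join(value.ljust(width) for value, width in zip(r, widths)).rstrip()
        for r in rows
    ]


lines = drilldown_table(
    json.loads(RESPONSE), "date.year", ["record_count", "amount_sum"]
)
print("\n".join(lines))
```

The same function works for any drill-down as long as the caller names the label key and the aggregate keys present in the response.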

Ways of Deployment

Quick ways of creating an analytical data server or adding an analytical
module into your application or on top of your system:

- Python web framework using the Cubes Python module
- plug-in for a Flask application
- stand-alone server with an HTML+JS front-end, or stand-alone server with an external application

[Figure: deployment options. Labels: Django, Flask, ... with the Cubes Python API (HTML); Flask with the Slicer Blueprint (HTML, JSON reply); Slicer server with a Web Application in HTML+JS, RoR, ... (HTTP request, JSON reply); each option with its model and store.]

Serving with the slicer tool:

bash$ slicer serve slicer.ini

Simple deployment with uWSGI:

[uwsgi]
socket = 127.0.0.1:5000
module = cubes.server.app
callable = application

Flask blueprint integration:

from flask import Flask
from cubes.server import slicer

app = Flask(__name__)
app.register_blueprint(slicer, url_prefix="/slicer")

Authentication and Authorization

Authorization: manage access to the cubes or to part of a cube using
access rights. A user might have a right only to a limited set of cubes or
might have access only to a particular cell in a cube. For example,
engineers might not have access to the financial cube, and stores might
have access to financials only for their own store.

The built-in authorizer uses a JSON rights configuration file:

{
    "lidia": {
        "allowed_cubes": ["sales"],
        "cube_restrictions": {
            "sales": ["store:3"]
        }
    },
    "martin": {
        "allowed_cubes": ["sales"],
        "cube_restrictions": {
            "sales": ["store:5"]
        }
    }
}

Custom authorizer:

from cubes import Authorizer, Cell, PointCut

class CustomAuthorizer(Authorizer):
    def authorize(self, cubes):
        # Authorize with a database (the lookup that produces
        # authorized_cubes is not shown here)
        return authorized_cubes

    def restricted_cell(self, identity, cube, cell):
        # Restriction with user identity dimension
        cut = PointCut("users", [identity])
        restriction = Cell(cube, [cut])
        if cell:
            return cell & restriction
        else:
            return restriction

Authentication: a server-side, plug-in based action that, based on the user's
credentials or any other relevant information, provides a user identity
which is passed to the workspace.

There are two built-in authenticators:
- pass_parameter: permissive method that passes the identity as a URL parameter
- http_basic_proxy: permissive authentication using the HTTP Basic method
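The JSON rights configuration shown in this section can be read as a simple lookup: which cubes an identity may see, and which cuts must be applied to each of them. The following sketch illustrates that reading with the standard library only. It is not the built-in authorizer; the function names and the parsing of "dimension:path" strings are assumptions.

```python
import json

RIGHTS = json.loads("""
{
    "lidia": {
        "allowed_cubes": ["sales"],
        "cube_restrictions": {"sales": ["store:3"]}
    },
    "martin": {
        "allowed_cubes": ["sales"],
        "cube_restrictions": {"sales": ["store:5"]}
    }
}
""")


def authorize(identity, cubes, rights=RIGHTS):
    """Return the requested cubes that the identity is allowed to see."""
    allowed = rights.get(identity, {}).get("allowed_cubes", [])
    return [cube for cube in cubes if cube in allowed]


def restrictions(identity, cube, rights=RIGHTS):
    """Return the cuts for a cube as (dimension, path) pairs.

    Assumes each restriction is written as "dimension:path" with the
    path levels separated by commas.
    """
    user = rights.get(identity, {})
    raw = user.get("cube_restrictions", {}).get(cube, [])
    cuts = []
    for item in raw:
        dimension, _, path = item.partition(":")
        cuts.append((dimension, path.split(",")))
    return cuts


print(authorize("lidia", ["sales", "churn"]))
print(restrictions("martin", "sales"))
```

An unknown identity gets no cubes at all, which matches the idea that access is granted explicitly per user.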

Backends

Bring your own aggregation engine.

Backend modules:

- Model Provider: a live cubes concept mapper; maps foreign cubes or foreign cube-like structures into the Cubes model.
- Aggregation Browser: provides the core functionality of aggregation or delegates the aggregation to an external aggregator.
- Store: manages access to the data, establishes and maintains database connections, or generates appropriate external API calls (pretends to be a store).

[Figure: a backend consists of a Model Provider (model), a Browser and a Store.]

Cubes concepts in the supported backends:

Backend           Cubes / Facts  Dimensions      Model
SQL               table          column (table)  required
MongoDB           collection     key/attribute   required
Mixpanel          event          property        automatic
Google Analytics  metric         dimension       automatic
Slicer            cube           dimension       automatic

SQL Backend

Built-in backend for ROLAP (Relational OLAP).

Features:
- star and snowflake schema support
- joins are executed only if needed for a given query
- mapping of the DATE data type without a date dimension table
- simple support of non-additive/semi-additive dimensions and aggregates
- split cell dimension: marks cells as within or outside of a split cell
- support for outer joins

[Figure: snowflake schema and its denormalization. Labels: contract; subject dimension (subject, subject category); supplier dimension (supplier, supplier type); date; city, region.]
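To show what relational OLAP over a star schema amounts to, here is a small sketch using the sqlite3 module from the standard library: a fact table, one dimension table, a summary query that needs no join and a drill-down query that joins only the dimension it uses. The table layout and the SQL are illustrative assumptions; the queries that the Cubes SQL backend actually generates may look different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, code TEXT, name TEXT);
    CREATE TABLE sales (
        id INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES product(id),
        amount INTEGER
    );
    INSERT INTO product VALUES (1, 'A', 'apples'), (2, 'B', 'bananas');
    INSERT INTO sales VALUES (1, 1, 10), (2, 1, 15), (3, 2, 7);
""")

# Summary of the whole cube: only the fact table is needed, no join
summary = conn.execute(
    "SELECT COUNT(*), SUM(amount) FROM sales"
).fetchone()

# Drill-down by product: the product dimension table is joined
# because the query asks for one of its attributes
cells = conn.execute("""
    SELECT p.name, COUNT(*) AS record_count, SUM(s.amount) AS amount_sum
    FROM sales AS s
    JOIN product AS p ON p.id = s.product_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()

print(summary)
for name, record_count, amount_sum in cells:
    print(name, record_count, amount_sum)

conn.close()
```

Leaving the join out of the summary query is the same idea as the "joins are executed only if needed" feature listed above, reduced to its simplest case.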

databrewery.org
http://cubes.databrewery.org
https://github.com/stiivi/cubes
#databrewery at irc.freenode.net
Published for PyCon, April 2014, based on Cubes v1.0
