
Cubes

– lightweight pluggable data warehouse

Synopsis: Python framework and set of tools for building a heterogeneous pluggable data warehouse, multidimensional data access and online analytical processing (OLAP) of categorical data. Lightweight.
Authors: Robin Thomas and Stefan Urbanek

Overview

Pluggable Analytical Workspace

Plug in your Google Analytics and your SQL database and give your users
a single way to access all of them. No need to grant everyone account
access to each particular data source.

Manages the model, data stores and model providers.

[Diagram: Workspace with Store and Browser components. Cubes shown: sales, churn, activations, events. Stores shown: BI Data (Postgres), BI Data 2 (Mongo), Events (API).]

Stores configuration:

[store_data]
type: sql
url: postgres://localhost/data
[store_data2]
type: mongo
host: localhost
[store_events]
type: mixpanel
api_key: 123456
api_secret: 123456

from cubes import Workspace

workspace = Workspace()
workspace.import_model("model.json")
# store options such as the URL are passed as keyword arguments
workspace.register_default_store("sql", url="postgres://localhost/data")
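The stores configuration shown here is a plain INI-style file, so it can be inspected with the Python standard library alone. The following is a minimal sketch, assuming that every section whose name starts with `store_` describes one store; the helper `list_stores` is hypothetical and is not part of Cubes.

```python
import configparser

# Same content as the stores configuration on the poster.
STORES_INI = """
[store_data]
type: sql
url: postgres://localhost/data

[store_data2]
type: mongo
host: localhost

[store_events]
type: mixpanel
api_key: 123456
api_secret: 123456
"""


def list_stores(text, prefix="store_"):
    """Return {store name: (store type, remaining options)}.

    Hypothetical helper: it only reads the file, it does not register
    anything with a Cubes workspace.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    stores = {}
    for section in parser.sections():
        if section.startswith(prefix):
            options = dict(parser[section])
            store_type = options.pop("type")
            stores[section[len(prefix):]] = (store_type, options)
    return stores


stores = list_stores(STORES_INI)
for name, (store_type, options) in stores.items():
    print(name, store_type, sorted(options))
```

Each store keeps its own backend type and connection options, which is what lets one workspace serve cubes from several data sources.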

Cubes
– browsing of aggregated data
– multi-dimensional data modeling
– unified interface for analytical data

Server
– easy to use JSON over HTTP API
– can be integrated as Flask Blueprint

[Diagram: architecture of User Interface, Server, Workspace, Model Providers and Stores. Components shown: Authenticator, Authorizer, static model provider, API model provider, Slicer (collects other, external Cubes Slicers). Supported backends are grouped as analytical data and source data.]

Model

Metadata – Logical description of data: cubes, dimensions, measures
and aggregates.
Cubes – logical data structure, a collection of measurable facts (invoices,
phone calls, events, …)
Dimensions – provide context for facts; they are used to filter queries or
reports and to control the scope of aggregation of facts. A dimension might
contain concept hierarchies such as category–subcategory–product, a date
hierarchy (year–month–day or year–quarter–month–day) or geographical
hierarchies.
The model also provides information about mapping to the physical data
store.

Physical model

[Diagram: physical model. Labels: sales (id, sale_date, product_id, store_id, amount); product (id, code, name); store (code, address); date (year, month); denormalized.]

SQL schema example:
{
    "cubes": [
        {
            "name": "sales",
            "measures": ["amount"],
            "dimensions": ["date", "product", "store"],
            "joins": [
                {"master": "date_id", "detail": "date.id"},
                {"master": "product_id", "detail": "product.id"},
                {"master": "store_id", "detail": "store.id"}
            ]
        }
    ],
    "dimensions": [
        { "name": "product", "attributes": ["code", "name"] },
        { "name": "store", "attributes": ["code", "address"] }
    ]
}

Slicing and Dicing

Cell – Provides context of interest, composed of cuts. There are three
kinds of cuts: point, set and range. Cuts can also be inverted using
invert=True, which yields cells outside of the cut.

point cut – single dimension member: PointCut(dimension, path)
cut1 = PointCut("status", ["open"])

set cut – multiple dimension members: SetCut(dimension, paths)
cut2 = SetCut("region", [["sk", "ba"], ["hu"]])

range cut – members between two values of an ordered dimension (such as
date): RangeCut(dimension, from, to)
cut3 = RangeCut("date", [2010, 1], [2012])

# cube is the cube object the cell belongs to
cell = Cell(cube, [cut1, cut2, cut3])

Cell as user interface element: multi-dimensional breadcrumbs, filter UI
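To make the three kinds of cuts concrete, here is a small pure-Python sketch of how a cell could decide whether a single fact lies inside it. The classes `PointCutSketch`, `SetCutSketch` and `RangeCutSketch` and the function `cell_contains` are illustrative stand-ins, not the Cubes classes; the sketch assumes that each fact stores a dimension member as a full path (for example `[2011, 5, 3]` for a date) and that a shorter cut path matches every member below it.

```python
from dataclasses import dataclass


@dataclass
class PointCutSketch:
    dimension: str
    path: list
    invert: bool = False

    def matches(self, fact):
        member = list(fact[self.dimension])
        # the cut path has to be a prefix of the member path
        inside = member[:len(self.path)] == list(self.path)
        return inside != self.invert


@dataclass
class SetCutSketch:
    dimension: str
    paths: list
    invert: bool = False

    def matches(self, fact):
        member = list(fact[self.dimension])
        inside = any(member[:len(p)] == list(p) for p in self.paths)
        return inside != self.invert


@dataclass
class RangeCutSketch:
    dimension: str
    from_path: list
    to_path: list
    invert: bool = False

    def matches(self, fact):
        member = list(fact[self.dimension])
        # compare only as many levels as each boundary specifies
        inside = (member[:len(self.from_path)] >= list(self.from_path)
                  and member[:len(self.to_path)] <= list(self.to_path))
        return inside != self.invert


def cell_contains(cuts, fact):
    """A fact is inside the cell when it satisfies every cut."""
    return all(cut.matches(fact) for cut in cuts)


cuts = [
    PointCutSketch("status", ["open"]),
    SetCutSketch("region", [["sk", "ba"], ["hu"]]),
    RangeCutSketch("date", [2010, 1], [2012]),
]

inside = {"status": ["open"], "region": ["hu"], "date": [2012, 12, 31]}
too_early = {"status": ["open"], "region": ["hu"], "date": [2009, 12, 31]}
closed = {"status": ["closed"], "region": ["sk", "ba"], "date": [2011, 5, 3]}

print(cell_contains(cuts, inside))
print(cell_contains(cuts, too_early))
print(cell_contains(cuts, closed))
```

A real backend would normally translate the cuts into a query condition rather than test facts one by one; the sketch only illustrates the meaning of the cuts.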

Browsing and Aggregation

Get aggregated data for a cell, get dimension members within a cell or get a list of facts within a cell, if available. Uses backend-specific aggregation or an interface to another aggregation engine.

browser = workspace.browser("contracts")
result = browser.aggregate(cell, drilldown=["sector"])

AggregationResult

[Diagram: browser.aggregate() goes from the logical model (Browser) to the physical data store (Store: database or API), where SQL or another backend-specific query is generated over the facts. The returned AggregationResult provides result.summary and result.cells.]

result.summary – summary for the whole cell
result.cells – iterable; backend-specific, might hold a database cursor

for cell in result:
    print("%s: %s" % (cell[level.label_attribute],
                      cell[aggregate.name]))

cell[level.label_attribute] – for table cell or link label
cell[level.key] – for URL

[Diagram: an aggregate is computed from a measure over the facts, for example the aggregate price_sum from the measure price, read as cell["price_sum"].]

Drill-down – Get more details: by year, by product, by store, …

# whole cube: result.summary
cell = Cell(cube)
result = browser.aggregate(cell)

# drill down by date: result.cells
result = browser.aggregate(cell, drilldown=["date"])

# slice to the year 2010, then drill down by date: result.cells
cut = PointCut("date", [2010])
cell = cell.slice(cut)
result = browser.aggregate(cell, drilldown=["date"])
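What an aggregation with and without drill-down returns can be illustrated on a list of in-memory facts. This is only a sketch of the idea: `aggregate_sketch` is a hypothetical function, the facts are made-up sample data, and a real Cubes browser delegates the work to the backend instead of looping in Python.

```python
# Made-up sample facts; "amount" plays the role of the measure.
facts = [
    {"year": 2009, "product": "tea", "amount": 10},
    {"year": 2009, "product": "coffee", "amount": 20},
    {"year": 2010, "product": "tea", "amount": 5},
    {"year": 2010, "product": "tea", "amount": 7},
]


def aggregate_sketch(rows, drilldown=None):
    """Return a summary for all rows and one cell per drill-down member."""
    def summarize(group):
        return {
            "record_count": len(group),
            "amount_sum": sum(row["amount"] for row in group),
        }

    result = {"summary": summarize(rows), "cells": []}
    if drilldown:
        groups = {}
        for row in rows:
            groups.setdefault(row[drilldown], []).append(row)
        for member in sorted(groups):
            cell = {drilldown: member}
            cell.update(summarize(groups[member]))
            result["cells"].append(cell)
    return result


# Whole cube: only the summary is of interest.
whole = aggregate_sketch(facts)

# Drill down by year: one cell per year.
by_year = aggregate_sketch(facts, drilldown="year")

# Slice to 2010 (a point cut), then drill down by product.
sliced = [row for row in facts if row["year"] == 2010]
by_product_2010 = aggregate_sketch(sliced, drilldown="product")

print(whole["summary"])
for cell in by_year["cells"]:
    print(cell)
```

The summary always covers the whole cell, while the cells break the same aggregates down by the members of the drilled-down dimension.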

Slicer Server

Unified aggregation interface to a variety of data stores and services.
JSON interface over HTTP. Built using the Flask web micro-framework. Can
be used as a stand-alone server or integrated into another application
to serve as an analytical module.

[Diagram: Slicer exposes the Cubes model, the cell (point of view), aggregates and facts (details).]

List cubes:
GET /cubes

Get cube model (metadata):
GET /cube/sales/model

Aggregate:
GET /cube/sales/aggregate?cut=date:2010&drilldown=date,region&split=status:1&page=10&page_size=100

Slicer JSON response:
{
    "cell": [],
    "total_cell_count": 2,
    "drilldown": [
        {
            "record_count": 31,
            "amount_sum": 550840,
            "date.year": 2009
        },
        {
            "record_count": 31,
            "amount_sum": 566020,
            "date.year": 2010
        }
    ],
    "summary": {
        "record_count": 62,
        "amount_sum": 1116860
    }
}

Other requests:
/facts – get list of facts within a cell (if available)
/members – list dimension members within a cell
/cell – get multi-dimensional breadcrumbs information,
browsing context or "where am I looking at?"

Results can be paginated with page=, ordered with order= and
formatted as CSV or newline-separated JSON records using format=.

Visualizers

Turning JSON data into reports, charts, tables. It is very easy to build
a custom visualization on top of the Cubes analytical data with the
framework of your choice.

[Diagram: generic visualizers and reporting applications; specific purpose reports.]

Use either a generic visualizer and reporting application such as Cubes
Visualizer or Cubes Viewer, or create one that suits your reporting
needs.
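As a rough illustration of a custom visualizer, the sketch below builds an aggregate request URL and turns a Slicer JSON response into a small text table. The base URL `http://localhost:5000` and the helpers `aggregate_url` and `render_table` are assumptions made for this example; the response text is the sample response from the poster, so no server is contacted.

```python
import json
from urllib.parse import urlencode

# Sample response copied from the poster; a real client would fetch it over HTTP.
RESPONSE_TEXT = """
{
    "cell": [],
    "total_cell_count": 2,
    "drilldown": [
        {"record_count": 31, "amount_sum": 550840, "date.year": 2009},
        {"record_count": 31, "amount_sum": 566020, "date.year": 2010}
    ],
    "summary": {"record_count": 62, "amount_sum": 1116860}
}
"""


def aggregate_url(base, cube, **params):
    """Build an aggregate request URL; ':' and ',' stay readable in the query."""
    query = urlencode(params, safe=":,")
    return "%s/cube/%s/aggregate?%s" % (base, cube, query)


def render_table(response, label_key, value_key):
    """Render drill-down rows and the summary as aligned text lines."""
    lines = []
    for row in response["drilldown"]:
        lines.append("%-10s %10d" % (row[label_key], row[value_key]))
    lines.append("%-10s %10d" % ("total", response["summary"][value_key]))
    return "\n".join(lines)


# Hypothetical server address, used only to show the URL shape.
url = aggregate_url("http://localhost:5000", "sales",
                    cut="date:2010", drilldown="date,region")
table = render_table(json.loads(RESPONSE_TEXT), "date.year", "amount_sum")

print(url)
print(table)
```

The same response could equally feed a chart library or an HTML template; the table is just the smallest possible report.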

Ways of Deployment

Quick ways of creating an analytical data server or adding an analytical
module into your application or on top of your system.

➊ Python web framework using the Cubes Python module
➋ Plug-in for a Flask application
➌ Stand-alone server with an HTML+JS front-end, or stand-alone server with an external application

[Diagram: ➊ Django, Flask, … application using the Cubes Python API and serving HTML; ➋ Flask application with the Slicer Blueprint, serving HTML; ➌ Slicer server answering HTTP requests from a web application (HTML+JS, RoR, …) with JSON replies. Each variant works with a model and a store.]

Serving with the slicer tool:
bash$ slicer serve slicer.ini

Simple deployment with UWSGI:
[uwsgi]
socket = 127.0.0.1:5000
module = cubes.server.app
callable = application

Flask blueprint integration:
from flask import Flask
from cubes.server import slicer

app = Flask(__name__)
app.register_blueprint(slicer,
                       url_prefix="/slicer")

Authentication and Authorization

Authorization – Manage access to the cubes or to part of a cube using
access rights. A user might have a right only to a limited set of cubes or
might have access only to a particular cell in a cube. For example,
engineers might not have access to the financial cube and stores might
have access to financials only for their own store.

Built-in authorizer uses a
JSON rights configuration file:
{
    "lidia": {
        "allowed_cubes": ["sales"],
        "cube_restrictions": {
            "sales": ["store:3"]
        }
    },
    "martin": {
        "allowed_cubes": ["sales"],
        "cube_restrictions": {
            "sales": ["store:5"]
        }
    }
}

Custom authorizer:
class CustomAuthorizer(Authorizer):
    def authorize(self, cubes):
        # … authorize with a database …
        return authorized_cubes

    def restricted_cell(self, identity, cube, cell):
        # Restriction with 'user identity' dimension
        cut = PointCut("users", [identity])
        restriction = Cell(cube, [cut])
        if cell:
            return cell & restriction
        else:
            return restriction

Authentication – Server-side, plug-in based action that, based on the user's
credentials or any other relevant information, provides a user identity
which is passed to the workspace.
There are two built-in authenticators: pass_parameter passes the identity
as a URL parameter (a permissive method) and http_basic_proxy provides
permissive authentication using the HTTP Basic method.
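The effect of the JSON rights file can be sketched in a few lines of plain Python. The functions `authorize_sketch` and `restriction_cuts` are hypothetical and do not reproduce the built-in authorizer; the sketch assumes that a restriction such as `store:3` reads as dimension `store` with the path `3`, and that unknown identities get access to nothing.

```python
# Rights as in the JSON rights configuration on the poster.
RIGHTS = {
    "lidia": {
        "allowed_cubes": ["sales"],
        "cube_restrictions": {"sales": ["store:3"]},
    },
    "martin": {
        "allowed_cubes": ["sales"],
        "cube_restrictions": {"sales": ["store:5"]},
    },
}


def authorize_sketch(rights, identity, cubes):
    """Return only those cube names the identity is allowed to see."""
    allowed = rights.get(identity, {}).get("allowed_cubes", [])
    return [cube for cube in cubes if cube in allowed]


def restriction_cuts(rights, identity, cube):
    """Return (dimension, path) pairs that restrict the identity within a cube."""
    restrictions = rights.get(identity, {}).get("cube_restrictions", {})
    cuts = []
    for text in restrictions.get(cube, []):
        # assumption: "dimension:level1,level2" describes a point cut
        dimension, path = text.split(":", 1)
        cuts.append((dimension, path.split(",")))
    return cuts


print(authorize_sketch(RIGHTS, "lidia", ["sales", "churn"]))
print(restriction_cuts(RIGHTS, "martin", "sales"))
print(authorize_sketch(RIGHTS, "unknown", ["sales"]))
```

An authorizer would then combine such restriction cuts with the cell requested by the user, as the `restricted_cell` method does with `cell & restriction`.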

Backends

Bring your own aggregation engine. Plug in your Google Analytics and
your SQL database and give your users a single way to access all of
them. No need to grant everyone account access to each particular
data source.

Backend modules:
Model Provider – A live cubes concept mapper: maps foreign
cubes or foreign cube-like structures into the Cubes model.
Aggregation Browser – provides the core functionality of
aggregation or delegates the aggregation to an external
aggregator.
Store – manages access to the data, establishes and
maintains database connections, generates appropriate
external API calls ("pretends to be a store").

[Diagram: a backend consists of a Model Provider (model), a Browser and a Store.]

Cubes concepts:

Backend            Cubes / Facts   Dimensions       Model
SQL                table           column (table)   required
MongoDB            collection      key/attribute    required
Mixpanel           event           property         automatic
Google Analytics   metric          dimension        automatic
Slicer             cube            dimension        automatic

SQL Backend

Built-in backend for ROLAP (Relational OLAP)
Features:
■ star and snowflake schema support
■ joins are executed only if needed for a given query
■ mapping of DATE data type without the date dimension table
■ simple support of non-additive/semi-additive dimensions and
aggregates
■ "split" cell dimension – mark cells as within or outside of a split cell
■ support for outer joins

[Diagram: star (★) or snowflake (❄) schema around the contract fact table. Subject dimension: subject category, subject. Supplier dimension: supplier type, supplier. Further labels: date, city, region, denormalization.]
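The SQL backend feature "joins are executed only if needed for a given query" can be illustrated with a small sketch that picks, from the joins declared in a model, only those required by the attributes a query refers to. The function `required_joins` is hypothetical and is not the Cubes implementation; it assumes attribute references of the form `table.column`, with bare column names belonging to the fact table, and it follows snowflake chains by also pulling in the table on the master side of a selected join.

```python
def table_of(reference, fact_table):
    """Table part of 'table.column'; bare columns live in the fact table."""
    return reference.split(".", 1)[0] if "." in reference else fact_table


def required_joins(joins, attributes, fact_table="sales"):
    """Select the joins needed to reach every referenced attribute."""
    needed = {table_of(attribute, fact_table) for attribute in attributes}
    selected = []
    changed = True
    while changed:
        changed = False
        for join in joins:
            if join in selected:
                continue
            if table_of(join["detail"], fact_table) in needed:
                selected.append(join)
                # snowflake: the master key may itself sit in a joined table
                needed.add(table_of(join["master"], fact_table))
                changed = True
    return selected


# Joins as in the poster's model, plus one made-up snowflake join.
joins = [
    {"master": "date_id", "detail": "date.id"},
    {"master": "product_id", "detail": "product.id"},
    {"master": "store_id", "detail": "store.id"},
    {"master": "product.category_id", "detail": "category.id"},
]

star = required_joins(joins, ["amount", "product.name"])
snowflake = required_joins(joins, ["amount", "category.name"])

print([join["detail"] for join in star])
print([join["detail"] for join in snowflake])
```

A query that touches only the measure and one dimension therefore joins a single table, while an attribute deeper in a snowflake pulls in the intermediate table as well.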

databrewery.org
http://cubes.databrewery.org
https://github.com/stiivi/cubes
#databrewery at irc.freenode.net
Published for PyCon, April 2014, based on Cubes v1.0
