
data brewery

Bubbles – operations

Bubbles Operations

For Bubbles v0.1, June 2013

Operation

Arguments

Description

Signatures

Metadata operations

field_filter

obj, keep, drop, rename

Filters the fields of an object. keep: keep only the listed fields; drop: keep all fields except those in the drop list; rename: new field names.

rows

sql
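The keep/drop/rename semantics can be illustrated with a minimal sketch. It assumes rows are plain Python dicts and that rename is a dict mapping old names to new ones; both are assumptions for illustration, and this is not the Bubbles implementation.

```python
from typing import Dict, Iterable, Iterator, Optional, Sequence


def field_filter(rows: Iterable[dict],
                 keep: Optional[Sequence[str]] = None,
                 drop: Optional[Sequence[str]] = None,
                 rename: Optional[Dict[str, str]] = None) -> Iterator[dict]:
    """Illustrative field_filter over dict rows (hypothetical stand-in)."""
    rename = rename or {}
    for row in rows:
        # keep: only listed fields survive; drop: listed fields are removed
        fields = [f for f in row
                  if (keep is None or f in keep)
                  and (drop is None or f not in drop)]
        # rename: map old field names to new ones, leave the rest unchanged
        yield {rename.get(f, f): row[f] for f in fields}


rows = [{"id": 1, "name": "Acme", "tmp": 0}]
print(list(field_filter(rows, drop=["tmp"], rename={"name": "label"})))
# [{'id': 1, 'label': 'Acme'}]
```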

Row operations

filter_by_value

obj, field, value

Get rows where field is equal to value.

rows

sql

filter_by_set

obj, field, set

Get rows where field is one of the values from the set.

rows

filter_by_range

obj, field, from, to

Get rows where field is within the given range.

(not yet)

filter_by_predicate

obj, fields, predicate

Get rows selected by the predicate. The predicate receives the values of the given fields.

rows

records
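The value, set, and predicate filters can be sketched as generator expressions over dict rows (an assumed stand-in for the rows representation, not the Bubbles code). The set argument is renamed to values here so it does not shadow Python's built-in set, and passing field values to the predicate positionally is an assumption.

```python
from typing import Any, Callable, Iterable, Iterator, Sequence


def filter_by_value(rows: Iterable[dict], field: str, value: Any) -> Iterator[dict]:
    # keep rows whose field equals the value
    return (row for row in rows if row[field] == value)


def filter_by_set(rows: Iterable[dict], field: str,
                  values: Iterable[Any]) -> Iterator[dict]:
    allowed = set(values)  # hash-based membership test per row
    return (row for row in rows if row[field] in allowed)


def filter_by_predicate(rows: Iterable[dict], fields: Sequence[str],
                        predicate: Callable[..., bool]) -> Iterator[dict]:
    # assumption: the predicate gets the field values as positional arguments
    return (row for row in rows if predicate(*(row[f] for f in fields)))
```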

distinct

obj[, key]

Distinct values for key fields

rows

sql

first_unique

obj[, key][, discard]

The first row for every distinct value of the key fields

rows
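The difference between distinct (key values only) and first_unique (whole first rows) can be shown with a small sketch over dict rows. It assumes that a missing key means "all fields" and does not model the discard argument, whose meaning the table does not describe.

```python
from typing import Iterable, Iterator, Optional, Sequence


def _key_value(row: dict, key: Optional[Sequence[str]]) -> tuple:
    # without a key, the whole row (all fields) acts as the key
    return tuple(row[f] for f in (key or row))


def distinct(rows: Iterable[dict], key: Optional[Sequence[str]] = None) -> Iterator[tuple]:
    seen = set()
    for row in rows:
        value = _key_value(row, key)
        if value not in seen:
            seen.add(value)
            yield value  # only the key values


def first_unique(rows: Iterable[dict], key: Optional[Sequence[str]] = None) -> Iterator[dict]:
    seen = set()
    for row in rows:
        value = _key_value(row, key)
        if value not in seen:
            seen.add(value)
            yield row  # the whole first row for this key value
```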

sample

obj, value[, mode]

Provide a sample of the object’s rows based on mode. The mode can be one of: first, nth, random.

rows

sql
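A sketch of the three sampling modes follows. The interpretation of value per mode is an assumption: first takes the first value rows, nth takes every value-th row starting with the first, and random picks value rows at random.

```python
import itertools
import random
from typing import Iterable, List


def sample(rows: Iterable[dict], value: int, mode: str = "first") -> List[dict]:
    if mode == "first":
        # the first `value` rows
        return list(itertools.islice(rows, value))
    if mode == "nth":
        # every `value`-th row, starting with the first one
        return list(itertools.islice(rows, 0, None, value))
    if mode == "random":
        # `value` rows drawn without replacement (needs all rows in memory)
        pool = list(rows)
        return random.sample(pool, min(value, len(pool)))
    raise ValueError(f"unknown sample mode: {mode!r}")
```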

sort

obj, order

Returns an object with rows ordered by order. Order is a list of (field, order) tuples.

rows

sql
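Ordering by a list of (field, order) tuples can be sketched with Python's stable sort, assuming the order element is the string "asc" or "desc" (an assumption; the table does not name the values). Sorting by the least significant field first and the most significant last yields the correct multi-field order.

```python
from typing import Iterable, List, Sequence, Tuple


def sort(rows: Iterable[dict], order: Sequence[Tuple[str, str]]) -> List[dict]:
    result = list(rows)
    # list.sort is stable, so applying the keys in reverse order keeps
    # ties from earlier (more significant) keys correctly broken
    for field, direction in reversed(order):
        result.sort(key=lambda row: row[field], reverse=(direction == "desc"))
    return result
```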

aggregate

obj, keys, measures, include_count

Aggregate measures by keys

rows
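A minimal sketch of aggregation by keys, assuming the aggregate function is a sum and that output fields are named `<measure>_sum` and `record_count`; both names are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, List, Sequence


def aggregate(rows: Iterable[dict], keys: Sequence[str], measures: Sequence[str],
              include_count: bool = True) -> List[dict]:
    groups: Dict[tuple, dict] = {}
    for row in rows:
        group_key = tuple(row[k] for k in keys)
        acc = groups.get(group_key)
        if acc is None:
            # first row of a new group: start from key values and zero sums
            acc = dict(zip(keys, group_key))
            acc.update({f"{m}_sum": 0 for m in measures})
            if include_count:
                acc["record_count"] = 0
            groups[group_key] = acc
        for m in measures:
            acc[f"{m}_sum"] += row[m]
        if include_count:
            acc["record_count"] += 1
    # dicts keep insertion order, so groups appear in first-seen order
    return list(groups.values())
```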

Field operations

text_substitute

obj, field, substitutions

Perform substitutions (pattern, value) on field.

rows
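A sketch of text substitution, assuming each pattern is a regular expression and the pairs are applied in the listed order; the dict-row representation is likewise an assumption.

```python
import re
from typing import Iterable, Iterator, Sequence, Tuple


def text_substitute(rows: Iterable[dict], field: str,
                    substitutions: Sequence[Tuple[str, str]]) -> Iterator[dict]:
    # compile each (pattern, value) pair once, not once per row
    compiled = [(re.compile(pattern), value) for pattern, value in substitutions]
    for row in rows:
        text = row[field]
        for pattern, value in compiled:
            text = pattern.sub(value, text)  # applied in order
        yield {**row, field: text}
```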

string_strip

obj, [fields, [chars]]

Strip whitespace (or chars) from the given fields, or from all string and text fields.

rows
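A sketch of the stripping rule over dict rows. It assumes "string and text fields" can be approximated by values of type str, since plain dicts carry no field-type metadata.

```python
from typing import Iterable, Iterator, Optional, Sequence


def string_strip(rows: Iterable[dict], fields: Optional[Sequence[str]] = None,
                 chars: Optional[str] = None) -> Iterator[dict]:
    for row in rows:
        # without a field list, every str value counts as a string/text field
        targets = fields if fields is not None else [
            f for f, v in row.items() if isinstance(v, str)]
        # str.strip(None) removes whitespace; otherwise only the given chars
        yield {f: (v.strip(chars) if f in targets and isinstance(v, str) else v)
               for f, v in row.items()}
```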

append_constant_fields

obj, fields, values

Appends fields with the specified constant values to the object.

rows

sql

dates_to_dimension

obj, [fields, [unknown_date]]

Changes the specified fields (or all date fields) to a date dimension key in the form YYYYMMDD. The unknown_date value is used for empty date fields.

rows

sql
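The YYYYMMDD key is plain arithmetic on the date parts. The sketch below assumes an integer key, a hypothetical default unknown_date of 0, and requires an explicit field list, because dict rows have no metadata from which "all date fields" could be detected.

```python
import datetime
from typing import Iterable, Iterator, Sequence


def dates_to_dimension(rows: Iterable[dict], fields: Sequence[str],
                       unknown_date: int = 0) -> Iterator[dict]:
    for row in rows:
        out = dict(row)
        for f in fields:
            value = row[f]
            if value is None:
                out[f] = unknown_date  # empty dates map to the unknown key
            else:
                # YYYYMMDD as an integer, e.g. 2013-06-05 -> 20130605
                out[f] = value.year * 10000 + value.month * 100 + value.day
        yield out


rows = [{"d": datetime.date(2013, 6, 5)}, {"d": None}]
keys = list(dates_to_dimension(rows, ["d"]))
```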


Compositions

append

objects[]

Append objects with the same fields

rows

sql
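For the rows representation, appending amounts to chaining iterators. This sketch adds a field check, assuming "same fields" means the same field names in the same order; the dict-row model is an assumption.

```python
from typing import Iterable, Iterator, List, Optional


def append(objects: Iterable[Iterable[dict]]) -> Iterator[dict]:
    """Chain the rows of several objects, checking that their fields match."""
    fields: Optional[List[str]] = None
    for obj in objects:
        for row in obj:
            if fields is None:
                fields = list(row)  # the first row defines the expected fields
            elif list(row) != fields:
                raise ValueError(f"fields {list(row)} differ from {fields}")
            yield row
```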

join_details

master, detail, master_key, detail_key

Composes master and detail objects using a left (inner) join by matching the master_key field(s) with the detail_key field(s).

rows, rows

sql, sql
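A hash-join sketch for the rows, rows signature. It takes the inner-join reading (master rows without a matching detail are dropped) and lets master fields win on name clashes; both choices are assumptions, not documented Bubbles behaviour.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Sequence


def join_details(master: Iterable[dict], detail: Iterable[dict],
                 master_key: Sequence[str], detail_key: Sequence[str]) -> Iterator[dict]:
    # index detail rows by their key so each master row is a dict lookup
    index: Dict[tuple, List[dict]] = defaultdict(list)
    for d in detail:
        index[tuple(d[k] for k in detail_key)].append(d)
    for m in master:
        # .get does not insert missing keys into the defaultdict
        for d in index.get(tuple(m[k] for k in master_key), []):
            yield {**d, **m}
```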

added_keys

dimension, source, dimension_key, source_key

Get keys that were added to the source when compared with the dimension. Comparison is done on the specified keys.

sql, sql

added_rows

dimension, source, dimension_key, source_key

Get whole rows that were added to the source when compared with the dimension. Comparison is done on the specified keys.

sql, sql

sql, rows
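Both added_keys and added_rows reduce to a set-difference test on key tuples. This is a sketch over dict rows (an assumption); the SQL signatures would typically express the same test as an outer join or NOT EXISTS.

```python
from typing import Iterable, Iterator, Sequence


def _keys(rows: Iterable[dict], key: Sequence[str]) -> set:
    return {tuple(row[k] for k in key) for row in rows}


def added_keys(dimension: Iterable[dict], source: Iterable[dict],
               dimension_key: Sequence[str], source_key: Sequence[str]) -> Iterator[tuple]:
    existing = _keys(dimension, dimension_key)
    for row in source:
        value = tuple(row[k] for k in source_key)
        if value not in existing:
            yield value  # key present in the source but not in the dimension


def added_rows(dimension: Iterable[dict], source: Iterable[dict],
               dimension_key: Sequence[str], source_key: Sequence[str]) -> Iterator[dict]:
    existing = _keys(dimension, dimension_key)
    for row in source:
        if tuple(row[k] for k in source_key) not in existing:
            yield row  # the whole new source row
```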

changed_rows

dimension, source, dimension_key, source_key, fields, version_field

Get rows that were changed in the source (the listed fields are compared for change). Row matching is done on the specified keys.

sql, sql
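A sketch of change detection: rows are matched on keys and then compared field by field. It assumes the compared fields have the same names in both objects and does not model version_field, whose role the table leaves unspecified.

```python
from typing import Iterable, Iterator, Sequence


def changed_rows(dimension: Iterable[dict], source: Iterable[dict],
                 dimension_key: Sequence[str], source_key: Sequence[str],
                 fields: Sequence[str]) -> Iterator[dict]:
    current = {tuple(r[k] for k in dimension_key): r for r in dimension}
    for row in source:
        match = current.get(tuple(row[k] for k in source_key))
        # unmatched rows are "added", not "changed", so they are skipped
        if match is not None and any(row[f] != match[f] for f in fields):
            yield row
```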

Auditing

distinct_count

obj[, fields]

Count the number of rows for each distinct value of fields (or of all fields)

sql
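The counting rule corresponds to a GROUP BY with COUNT(*) in SQL; for dict rows (an assumed representation) it can be sketched with collections.Counter.

```python
from collections import Counter
from typing import Iterable, Optional, Sequence


def distinct_count(rows: Iterable[dict],
                   fields: Optional[Sequence[str]] = None) -> Counter:
    # count rows per distinct combination of values (all fields if none given)
    return Counter(tuple(row[f] for f in (fields or row)) for row in rows)
```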

Assertions

assert_unique

obj[, key]

There should be no duplicate rows (or keys) in the object.

sql
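The uniqueness assertion can be sketched as a single pass with a set of seen keys. The ProbeAssertionError class below is a local stand-in defined only so the sketch runs on its own; it is not imported from Bubbles.

```python
from typing import Iterable, Optional, Sequence


class ProbeAssertionError(Exception):
    """Local stand-in for the exception named in the notes."""


def assert_unique(rows: Iterable[dict], key: Optional[Sequence[str]] = None) -> None:
    seen = set()
    for row in rows:
        # compare key values, or whole rows when no key is given
        value = tuple(row[f] for f in (key or row))
        if value in seen:
            raise ProbeAssertionError(f"duplicate value {value!r}")
        seen.add(value)
```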

Conversions

as_dict

obj, key, value

Converts the object to a Python dictionary.

rows

as_records

obj

Returns an object with records representation

rows

sql

fetch_all

obj

Fetches (consumes) all rows into a list and returns an object with rows representation.

rows

Output

pretty_print

obj, target

Produces textual output to target (or stdout), formatted as a table.

rows
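A sketch of table-style text output over dict rows (an assumed representation). The border and padding layout is one possible format chosen for illustration, not the format Bubbles prints.

```python
import sys
from typing import Iterable, Optional, TextIO


def pretty_print(rows: Iterable[dict], target: Optional[TextIO] = None) -> None:
    if target is None:
        target = sys.stdout
    rows = list(rows)  # column widths need all rows, so the input is consumed
    if not rows:
        return
    fields = list(rows[0])
    # each column is as wide as its longest header or value
    widths = [max(len(str(f)), *(len(str(r[f])) for r in rows)) for f in fields]
    border = "+" + "+".join("-" * (w + 2) for w in widths) + "+"

    def line(values) -> str:
        return "| " + " | ".join(str(v).ljust(w) for v, w in zip(values, widths)) + " |"

    print(border, file=target)
    print(line(fields), file=target)
    print(border, file=target)
    for r in rows:
        print(line([r[f] for f in fields]), file=target)
    print(border, file=target)
```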

Notes

All objects with sql representation currently also provide rows representation: the statements are executed (not necessarily fetched) and the objects are handled as iterators. Therefore all rows operations can be used.
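The note above can be illustrated with Python's standard sqlite3 module: the statement is executed up front, and rows are then pulled from the cursor only as a consumer iterates. The sql_rows helper, the table, and the dict-row shape are hypothetical, for illustration only.

```python
import sqlite3
from typing import Iterator


def sql_rows(conn: sqlite3.Connection, statement: str) -> Iterator[dict]:
    cursor = conn.execute(statement)  # the statement is executed here...
    fields = [column[0] for column in cursor.description]
    # ...but rows are fetched from the cursor only as the caller iterates
    return (dict(zip(fields, values)) for values in cursor)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, " Acme "), (2, "Bolt")])

# any rows-style operation can now consume the SQL result
names = [row["name"].strip()
         for row in sql_rows(conn, "SELECT id, name FROM customer ORDER BY id")]
```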

Revision 1, June 2013, Bubbles 0.1 prototype


Assertions raise ProbeAssertionError on failure. They can be used in Pipelines to stop the process when a condition is not met.

Most keys may be either a single field or a list of fields (composite keys).
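Accepting either form of key usually means normalizing it to a list of field names first. The helper names below are hypothetical; the sketch only shows the normalization idea.

```python
from typing import List, Sequence, Union


def key_fields(key: Union[str, Sequence[str]]) -> List[str]:
    # a plain string is a single field; any other sequence is a composite key
    return [key] if isinstance(key, str) else list(key)


def key_value(row: dict, key: Union[str, Sequence[str]]) -> tuple:
    # the same tuple shape for single and composite keys
    return tuple(row[f] for f in key_fields(key))
```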
