data brewery

Bubbles – operations

Bubbles Operations
For Bubbles v0.1, June 2013

Metadata operations

field_filter
    Arguments: obj, keep, drop, rename
    Description: Filters the fields of an object. keep: keep only the listed fields; drop: keep all fields except those in the drop list; rename: new field names.
    Signatures: ‣rows ‣sql
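The keep, drop and rename semantics of field_filter can be illustrated in plain Python. This is a minimal sketch and not the Bubbles implementation: the function below is a hypothetical stand-in that works on a list of field names and a list of tuples, and rejecting keep and drop together is an assumption of the sketch.

```python
def field_filter(fields, rows, keep=None, drop=None, rename=None):
    """Hypothetical stand-in for illustration; not the Bubbles API."""
    if keep and drop:
        # Assumption of this sketch: keep and drop are mutually exclusive.
        raise ValueError("use either keep or drop, not both")
    if keep:
        selected = [f for f in fields if f in keep]
    elif drop:
        selected = [f for f in fields if f not in drop]
    else:
        selected = list(fields)

    # Positions of the surviving fields in each row tuple.
    indexes = [fields.index(f) for f in selected]
    rename = rename or {}
    new_fields = [rename.get(f, f) for f in selected]
    new_rows = [tuple(row[i] for i in indexes) for row in rows]
    return new_fields, new_rows


fields = ["id", "name", "amount", "note"]
rows = [(1, "apple", 10, "x"), (2, "pear", 20, "y")]

new_fields, new_rows = field_filter(
    fields, rows, drop=["note"], rename={"amount": "total"}
)
print(new_fields)  # ['id', 'name', 'total']
print(new_rows)    # [(1, 'apple', 10), (2, 'pear', 20)]
```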

Row operations
filter_by_value
    Arguments: obj, field, value
    Description: Get rows where field is equal to value.
    Signatures: ‣rows ‣sql

filter_by_set
    Arguments: obj, field, set
    Description: Get rows where field is one of values from the set.
    Signatures: ‣rows

filter_by_range
    Arguments: obj, field, from, to
    Description: Get rows where field is within given range.
    Signatures: (not yet)

filter_by_predicate
    Arguments: obj, fields, predicate
    Description: Get rows selected by the predicate. Predicate receives values for given fields.
    Signatures: ‣rows ‣records

distinct
    Arguments: obj[, key]
    Description: Distinct values for key fields.
    Signatures: ‣rows ‣sql

first_unique
    Arguments: obj[, key][, discard]
    Description: Every first row with distinct value for key fields.
    Signatures: ‣rows

sample
    Arguments: obj, value[, mode]
    Description: Provide a sample of object’s rows based on mode. The mode might be: first, nth, random.
    Signatures: ‣rows ‣sql

sort
    Arguments: obj, order
    Description: Returns object with rows ordered based on order. Order is a list of tuples (field, order).
    Signatures: ‣rows ‣sql

aggregate
    Arguments: obj, keys, measures, include_count
    Description: Aggregate measures by keys.
    Signatures: ‣rows
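A rough sketch of what several of the row operations compute, written in plain Python over a list of dictionaries. None of these functions come from Bubbles; they are hypothetical stand-ins that share the operation names. The "asc"/"desc" direction strings used for sort are an assumption, since the sheet only says that order is a list of (field, order) tuples.

```python
def filter_by_value(rows, field, value):
    return [r for r in rows if r[field] == value]


def filter_by_set(rows, field, values):
    values = set(values)
    return [r for r in rows if r[field] in values]


def filter_by_predicate(rows, fields, predicate):
    # The predicate receives the values of the given fields as arguments.
    return [r for r in rows if predicate(*(r[f] for f in fields))]


def first_unique(rows, keys):
    # Keep the first row seen for every distinct key value.
    seen, result = set(), []
    for r in rows:
        key = tuple(r[f] for f in keys)
        if key not in seen:
            seen.add(key)
            result.append(r)
    return result


def sort(rows, order):
    # Stable sorts applied from the last key to the first give a
    # multi-key ordering. Direction strings are an assumption.
    result = list(rows)
    for field, direction in reversed(order):
        result.sort(key=lambda r: r[field], reverse=(direction == "desc"))
    return result


rows = [
    {"city": "Oslo", "year": 2012, "amount": 10},
    {"city": "Bern", "year": 2013, "amount": 30},
    {"city": "Oslo", "year": 2013, "amount": 20},
]

oslo = filter_by_value(rows, "city", "Oslo")
in_2013 = filter_by_set(rows, "year", {2013})
large = filter_by_predicate(
    rows, ["year", "amount"], lambda year, amount: year == 2013 and amount > 25
)
firsts = first_unique(rows, ["city"])
ordered = sort(rows, [("city", "asc"), ("amount", "desc")])

print([r["amount"] for r in oslo])     # [10, 20]
print([r["amount"] for r in in_2013])  # [30, 20]
print([r["amount"] for r in large])    # [30]
print([r["amount"] for r in firsts])   # [10, 30]
print([r["amount"] for r in ordered])  # [30, 20, 10]
```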

Field Operations
text_substitute
    Arguments: obj, field, substitutions
    Description: Perform substitutions (pattern, value) on field.
    Signatures: ‣rows

string_strip
    Arguments: obj, [fields, [chars]]
    Description: Strip whitespace (or chars) from fields or all string and text fields.
    Signatures: ‣rows

append_constant_fields
    Arguments: obj, fields, values
    Description: Appends fields to the object with specified constant values.
    Signatures: ‣rows ‣sql

dates_to_dimension
    Arguments: obj, [fields, [unknown_date]]
    Description: Changes specified fields (or all date fields) to a date dimension key in the form YYYYMMDD. The unknown_date value is used for empty date fields.
    Signatures: ‣rows ‣sql
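The two less obvious field operations, text_substitute and dates_to_dimension, can be sketched on single values. This is an illustration only, with hypothetical helper names. It assumes that patterns are regular expressions, that the dimension key is an integer, and that the unknown_date default is 0; the sheet states none of these.

```python
import re
from datetime import date


def text_substitute(value, substitutions):
    # Apply each (pattern, replacement) pair in turn.
    # Assumption: patterns are regular expressions.
    for pattern, replacement in substitutions:
        value = re.sub(pattern, replacement, value)
    return value


def date_to_dimension_key(value, unknown_date=0):
    # Empty dates map to unknown_date; others become a YYYYMMDD integer.
    # Assumption: integer key and a default of 0.
    if value is None:
        return unknown_date
    return value.year * 10000 + value.month * 100 + value.day


cleaned = text_substitute("Hello   World", [(r"\s+", " "), ("World", "Bubbles")])
key = date_to_dimension_key(date(2013, 6, 5))
missing = date_to_dimension_key(None)

print(cleaned)  # Hello Bubbles
print(key)      # 20130605
print(missing)  # 0
```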

Revision 1, June 2013, Bubbles 0.1 prototype

Compositions
append
    Arguments: objects[]
    Description: Append objects with same fields.
    Signatures: ‣rows ‣sql

join_details
    Arguments: master, detail, master_key, detail_key
    Description: Composes master and detail objects using left (inner) join by matching master_key field(s) with detail_key field(s).
    Signatures: ‣rows,rows ‣sql,sql

added_keys
    Arguments: dimension, source, dimension_key, source_key
    Description: Get keys that were added to the source if compared with dimension. Comparison is done on specified keys.
    Signatures: ‣sql,sql

added_rows
    Arguments: dimension, source, dimension_key, source_key
    Description: Get whole rows that were added to the source if compared with dimension. Comparison is done on specified keys.
    Signatures: ‣sql,sql ‣sql,rows

changed_rows
    Arguments: dimension, source, dimension_key, source_key, fields, version_field
    Description: Get rows that were changed in the source (fields are compared for change). Row matching is done on specified keys.
    Signatures: ‣sql,sql
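The comparison logic behind added_keys, added_rows and changed_rows can be sketched in plain Python over lists of dictionaries. These are hypothetical stand-ins, not the Bubbles operations, which per the sheet work on sql objects. The sketch handles single-field keys only and leaves out version_field.

```python
def added_keys(dimension, source, dimension_key, source_key):
    existing = {row[dimension_key] for row in dimension}
    return [row[source_key] for row in source if row[source_key] not in existing]


def added_rows(dimension, source, dimension_key, source_key):
    existing = {row[dimension_key] for row in dimension}
    return [row for row in source if row[source_key] not in existing]


def changed_rows(dimension, source, dimension_key, source_key, fields):
    # Match rows on the keys, then compare the listed fields.
    current = {row[dimension_key]: row for row in dimension}
    changed = []
    for row in source:
        match = current.get(row[source_key])
        if match is not None and any(row[f] != match[f] for f in fields):
            changed.append(row)
    return changed


dimension = [
    {"id": 1, "name": "apple", "price": 10},
    {"id": 2, "name": "pear", "price": 20},
]
source = [
    {"code": 1, "name": "apple", "price": 12},  # price changed
    {"code": 2, "name": "pear", "price": 20},   # unchanged
    {"code": 3, "name": "plum", "price": 5},    # new key
]

new_keys = added_keys(dimension, source, "id", "code")
new_rows = added_rows(dimension, source, "id", "code")
changed = changed_rows(dimension, source, "id", "code", ["name", "price"])

print(new_keys)                      # [3]
print([r["name"] for r in new_rows])  # ['plum']
print([r["name"] for r in changed])   # ['apple']
```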

Auditing
distinct_count
    Arguments: obj[, fields]
    Description: Count number of rows for distinct values of fields (or all fields).
    Signatures: ‣sql
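One reading of distinct_count is a row count per distinct combination of field values. The sketch below follows that reading in plain Python; it is a hypothetical stand-in over a list of dictionaries, not the Bubbles operation, which is listed for sql objects only.

```python
from collections import Counter


def distinct_count(rows, fields=None):
    counts = Counter()
    for row in rows:
        # With no fields given, use all fields of the row in sorted order.
        names = fields or sorted(row)
        counts[tuple(row[name] for name in names)] += 1
    return counts


rows = [
    {"city": "Oslo", "year": 2013},
    {"city": "Oslo", "year": 2013},
    {"city": "Bern", "year": 2013},
]

by_city = distinct_count(rows, ["city"])
by_all = distinct_count(rows)

print(by_city[("Oslo",)])       # 2
print(by_city[("Bern",)])       # 1
print(by_all[("Oslo", 2013)])   # 2
```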

Assertions
assert_unique
    Arguments: obj[, key]
    Description: There should be no row (or key) duplicates in the object.
    Signatures: ‣sql
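The uniqueness check can be sketched as follows. This is an illustration under assumptions: ProbeAssertionError is defined locally as a stand-in for the error named in the notes rather than imported from Bubbles, and the function works on a list of dictionaries rather than an sql object.

```python
class ProbeAssertionError(Exception):
    """Local stand-in for illustration; not imported from Bubbles."""


def assert_unique(rows, key=None):
    seen = set()
    for row in rows:
        # Compare on the key fields, or on the whole row when no key is given.
        if key:
            value = tuple(row[f] for f in key)
        else:
            value = tuple(sorted(row.items()))
        if value in seen:
            raise ProbeAssertionError("duplicate found: %r" % (value,))
        seen.add(value)
    return rows


unique_rows = [{"id": 1, "name": "apple"}, {"id": 2, "name": "apple"}]

assert_unique(unique_rows, key=["id"])  # passes silently

try:
    assert_unique(unique_rows, key=["name"])
    failed = False
except ProbeAssertionError:
    failed = True

print(failed)  # True
```

Raising an exception, rather than returning a flag, is what lets a failed assertion stop a longer process.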

Conversions
as_dict
    Arguments: obj, key, value
    Description: Converts object to a python dictionary.
    Signatures: ‣rows

as_records
    Arguments: obj
    Description: Return an object with records representation.
    Signatures: ‣rows ‣sql

fetch_all
    Arguments: obj
    Description: Fetches (consumes) all rows into a list and returns an object with rows representation.
    Signatures: ‣rows
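A small sketch of the three conversions over a one-shot row iterator. The functions are hypothetical stand-ins, not the Bubbles operations. The sketch assumes that a record is a dictionary keyed by field name, which the sheet does not spell out.

```python
fields = ["code", "name"]


def make_rows():
    # A generator: it can be iterated only once, like a consumed result set.
    yield ("no", "Norway")
    yield ("ch", "Switzerland")


def as_records(fields, rows):
    # Assumption: a record is a dict keyed by field name.
    return (dict(zip(fields, row)) for row in rows)


def as_dict(fields, rows, key, value):
    k, v = fields.index(key), fields.index(value)
    return {row[k]: row[v] for row in rows}


def fetch_all(rows):
    # Consumes the iterator and keeps the rows in a list.
    return list(rows)


records = list(as_records(fields, make_rows()))
lookup = as_dict(fields, make_rows(), "code", "name")

iterator = make_rows()
fetched = fetch_all(iterator)
leftover = list(iterator)  # the iterator is already exhausted

print(records[0])    # {'code': 'no', 'name': 'Norway'}
print(lookup["ch"])  # Switzerland
print(len(fetched))  # 2
print(leftover)      # []
```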

Output
pretty_print
    Arguments: obj, target
    Description: Produces textual output to target (or stdout) formatted as table.
    Signatures: ‣rows
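Table-formatted output to a target stream might look like the sketch below. The function is a hypothetical stand-in and the layout (column separators, padding) is invented for illustration; the actual Bubbles output format is not described in the sheet.

```python
import io
import sys


def pretty_print(fields, rows, target=None):
    if target is None:
        target = sys.stdout
    text_rows = [tuple(str(v) for v in row) for row in rows]
    # Each column is as wide as its longest value or its header.
    widths = [
        max([len(str(f))] + [len(r[i]) for r in text_rows])
        for i, f in enumerate(fields)
    ]

    def line(values):
        return "|".join(" %s " % str(v).ljust(w) for v, w in zip(values, widths))

    target.write(line(fields) + "\n")
    target.write("+".join("-" * (w + 2) for w in widths) + "\n")
    for r in text_rows:
        target.write(line(r) + "\n")


buffer = io.StringIO()
pretty_print(["id", "name"], [(1, "apple"), (2, "pear")], target=buffer)
print(buffer.getvalue(), end="")
```

Passing no target writes the same table to stdout.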

Notes
■ All objects with an sql representation currently also provide a rows representation. The statements are executed (not necessarily fetched) and the objects are handled as iterators, so all rows operations can be used on them.

■ Assertions raise ProbeAssertionError on failure. They can be used in Pipelines to stop the process when a condition is not met.
■ Most keys may be either a single field or a list of fields (composite keys).
