
AB INITIO

( Day -2 )
A Practical
Introduction to
Ab Initio Software:
Part 2
Simple
Components
Component Organizer

Click on Component
Organizer Button
The Graph Model
The Graph Model: Naming the Pieces
[Figure: a graph with its pieces labeled Components, Datasets, and Flows]
The Graph Model: Some Details
[Figure: a graph with labels for Ports, Record format metadata, and Expression metadata]
Components
• Components may run on any computer running the Co>Operating System.
• Different components do different jobs.
• The particular work a component accomplishes depends on its parameter settings.
• Some parameters are data transformations, that is, business rules applied to one or more inputs to produce the required output.
Datasets
• A dataset is a source or destination of data. It can be a simple file, a database table, a SAS dataset, ...
• Datasets may reside on any machine running the Co>Operating System.
• Datasets may reside on other machines if connected by FTP or database middleware.
• Data is always described by record format metadata (termed “DML”).
Dataset : Simple components
• These are the main dataset components, used to read source data from files and to store results in files.
Input File
Input File represents data records read as input to a graph from one or multiple serial files or from a multifile.

• Description tab:
    • URL: path of the file where the data is stored (file: / mfile:)
    • Partition: ad hoc multifile (changing depth)
• Access tab:
    • File Handling
    • File Protection
• Ports: DML
Dataset: Records and Fields
A dataset is made up of records; a record consists of fields. Analogous database terms are rows and columns.

Records (one per line, each made up of fields):
0345John Smith
0212Sam Spade
0322Elvis Jones
0492Sue West
0121Mary Forth
0221Bill Black
Sources of Record Format
Metadata
• Record formats can be generated from:
    • Database catalogs
    • COBOL copybooks
    • Other third-party products
    • SAS datasets
• One can always resort to manual entry!
Viewing Component Properties

Double click on a
component to bring
up its Properties Page
Viewing Port Properties

DML
Record Format Metadata in
Graphical Form
0345John Smith
0212Sam Spade
0322Elvis Jones
0492Sue West
0121Mary Forth
0221Bill Black
DML Types
• Fixed length
• Delimited
• Mixed
Editing Types in GDE

Field name Field type Field length


DML : Fixed length
record
decimal(4) id;
string(6) first_name;
string(6) last_name;
string(1) new_line;
end
DML : Delimited
record
decimal("|") id;
string("|") first_name;
string("|") last_name;
string("\n") new_line;
end
DML : Mixed
record
decimal(4) id;
string("|") first_name;
string("|") last_name;
string(1) new_line;
end
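The difference between fixed-length and delimited fields can be illustrated outside Ab Initio. The following is a minimal Python sketch, not DML and not how the Co>Operating System is implemented. The helper names (`parse_fixed`, `parse_delimited`) and the padded sample lines are assumptions made for illustration; the field names and widths follow the record formats above.

```python
# Illustrative simulation only: how a record format carves a line into fields.
# parse_fixed / parse_delimited are hypothetical helper names.

def parse_fixed(line, layout):
    """Split one fixed-length record using (field_name, width) pairs."""
    record, pos = {}, 0
    for name, width in layout:
        record[name] = line[pos:pos + width]
        pos += width
    return record


def parse_delimited(line, layout):
    """Split one record in which each field ends at its own delimiter."""
    record, pos = {}, 0
    for name, delim in layout:
        end = line.index(delim, pos)  # raises ValueError if the delimiter is missing
        record[name] = line[pos:end]
        pos = end + len(delim)
    return record


# decimal(4) id; string(6) first_name; string(6) last_name; string(1) new_line;
FIXED_LAYOUT = [("id", 4), ("first_name", 6), ("last_name", 6), ("new_line", 1)]

# decimal("|") id; string("|") first_name; string("|") last_name; string("\n") new_line;
DELIMITED_LAYOUT = [("id", "|"), ("first_name", "|"), ("last_name", "|"), ("new_line", "\n")]

# Sample lines are assumptions: the fixed-length one is padded with spaces to full width.
fixed = parse_fixed("0345John  Smith \n", FIXED_LAYOUT)
delimited = parse_delimited("0345|John|Smith|\n", DELIMITED_LAYOUT)

print(fixed)
print(delimited)
```

Note that the fixed-length fields keep their padding, while the delimited fields contain only the characters before each delimiter.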
Field Names
• Names consist of letters, digits, and underscores: a … z, A … Z, 0 … 9, _
• Note: no spaces, hyphens, $’s, #’s, or %’s.
• Case matters! ABC and abc are different!
• Some words are reserved (record, end, date, …).
Field Type and Field Length
• There are several built-in types available via the drop-down menu. This course uses three types: string, decimal (for all numbers), and date.
• A date type requires a format specifier that is an exact representation of the date (e.g., “MM-DD-YYYY”).
• A field length is either a number for fixed-length fields, or the delimiter that terminates the field for variable-length fields.
What Data Can Be Described?
• There are both fixed-size and variable-length types.
• ASCII, EBCDIC, and UNICODE character sets are supported.
• Supported types can represent strings, numbers, binary numbers, packed decimals, dates …
• Complex data formats can consist of nested records, vectors, ...
Access to Field Characteristics
• Some aspects of field descriptions (e.g., date formats) must be accessed via the attribute pane.
• To see additional attributes, use the ‘Attributes’ item on the Record Format Editor’s View menu or use the Attributes button.
More Record Format Editing
[Figure: Record Format Editor with callouts: View… Attributes; the Field Type drop-down; length can be a delimiter string; the date format goes here]
Expressions in DML
• Computations are expressed in the algebraic syntax of C.
• Field names act as variables.
• Arithmetic operators: +, -, *, ...
• Comparison operators: >, <, ==, !=, ...
• Many built-in functions: string_concat, string_trim, today, date_day_of_week, …
(but field sequence dependency)
Output File
Output File represents data records written as output from a graph into one or multiple serial files or a multifile.

• Description tab:
    • URL: path of the file where the data is stored (file: / mfile:)
    • Partition: ad hoc multifile (changing depth)
• Access tab:
    • File Handling
    • File Protection
• Port: DML
Intermediate File
Intermediate File represents one or multiple serial files or a multifile of intermediate results that a graph writes during execution, and saves for your review after execution.

• Description tab:
    • URL: path of the file where the data is stored (file: / mfile:)
    • Partition: ad hoc multifile (changing depth)
• Access tab:
    • File Handling
    • File Protection
• Port: DML
Viewing Data

1. Right click on dataset.

2. Select “View Data...”


The View Data Panel
Evaluating Expressions from
View Data

Type in an expression...

…or use the expression editor


Expression Editor
Fields Functions Operators

Expression text
Exercise : Writing DML
• Open a new graph and create an Input File.
• The data file data1.dat contains the following data:
Rao,Sunita,20031223,24000,\n
Shinde,Sachin,19931029,32000,\n
Sharma,Sunil,19941102,19000,\n
• Use the Record Format Editor to create a description of this data:
    last_name
    first_name
    joining_date
    salary
• Then use View Data to verify the description is correct.
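As a cross-check outside the GDE, the sample data can be parsed with a short Python sketch. This is not the DML answer to the exercise. It assumes that each field is terminated by a comma, that the `\n` shown in the listing is a real newline character, and that joining_date uses a YYYYMMDD layout; `parse_line` is a hypothetical helper name.

```python
from datetime import datetime

# Sample records from the exercise, with "\n" taken to be a real newline (assumption).
SAMPLE = (
    "Rao,Sunita,20031223,24000,\n"
    "Shinde,Sachin,19931029,32000,\n"
    "Sharma,Sunil,19941102,19000,\n"
)


def parse_line(line):
    """Split one comma-terminated record into typed fields."""
    # Every field ends with a comma, so the split leaves an empty trailing piece.
    last_name, first_name, joining_date, salary, _trailing = line.split(",")
    return {
        "last_name": last_name,
        "first_name": first_name,
        # Assumed date layout: YYYYMMDD
        "joining_date": datetime.strptime(joining_date, "%Y%m%d").date(),
        "salary": int(salary),
    }


records = [parse_line(line) for line in SAMPLE.splitlines()]
for rec in records:
    print(rec)
```

If a record does not have exactly four comma-terminated fields, the unpacking fails, which plays a role similar to View Data showing that a description does not fit the data.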
Simple components
• These components do not have any parameters.
Trash
• Trash ends a flow by accepting all the data records in it and discarding them.

Replicate
• Replicate arbitrarily combines all the data records it receives into a single flow and writes a copy of that flow to each of its output flows.
Component: Gather Logs
• Reads logging records from multiple flows connected to the input port and writes them to the specified ‘log file’ outside of the application’s transactional context.
Database Components
• These components work with third-party databases: they read, manipulate, and save data in database tables.

Note: The parameters vary depending on the database and on the utility used to connect to it.
Database Configuration (.dbc)
dbms: oracle          ## Required. Do not change
db_version: 9.2.0.1   ## Required. Enter the Oracle version number
db_home: /etl_test/u01/app/oracle/product/9.2.0.1   ## ORACLE_HOME
db_name: @RDMETL      ## Connect string
db_nodes: localhost
user: abinitio1       ## Or use a variable to avoid hard coding - ${MY_USER}
password: abinitio    ## Can be encrypted from 2.12 onwards
Input Table
• Input Table unloads data records from a database into an Ab Initio graph, allowing you to specify as the source either a database table or an SQL statement that selects data records from one or more tables.
Output Table
• Output Table loads data records from a graph into a database, letting you specify the records' destination either directly as a single database table, or through an SQL statement that inserts data records into one or more tables.
Run SQL
• Run SQL executes SQL statements in a database and writes confirmation messages to the log port.
• You can use Run SQL to perform database operations such as table or index creation.
Exercise: Input Table / Run SQL
• Create a .dbc file to connect to the database.
• Create table temp1 with columns:
    id number
    first_name varchar2(10)
    last_name varchar2(10)
• Create an index on the id column.
• Insert dummy data into the table using database insert statements.
• View the data in the GDE using Input Table.
Update Table
• Update Table executes UPDATE, INSERT, or DELETE statements in embedded SQL format to modify a table in a database, and writes status information to the log port.
• Port SQL is associated with the parameter updateSqlOnceOnly:
    • true: executed only once, at the start of the component's execution.
    • false: executed at the start, and re-executed immediately after each commit.
Update Table : Working
• It works like a merge (upsert) command. The statements are applied to the incoming records as follows. For each record:
    • The statement referenced by updateSqlFile is attempted first. If the statement can be successfully applied to the current record, it is executed, and the statement referenced by insertSqlFile is skipped.
    • If the updateSqlFile statement cannot be applied to the current record, the statement referenced by insertSqlFile is attempted.
Note that updateSqlFile and insertSqlFile need not be files: the SQL statements can be embedded in the component directly.
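The update-then-insert order described above can be sketched with Python's built-in sqlite3 module. This is an illustration under stated assumptions, not the Update Table component: it treats "cannot be applied" as "the UPDATE matched zero rows", it uses an in-memory SQLite database instead of Oracle, and the `upsert` helper and the sample rows are hypothetical. The table layout follows temp1 from the earlier exercise.

```python
import sqlite3

# Stand-ins for the statements referenced by updateSqlFile and insertSqlFile.
UPDATE_SQL = "UPDATE temp1 SET first_name = ?, last_name = ? WHERE id = ?"
INSERT_SQL = "INSERT INTO temp1 (id, first_name, last_name) VALUES (?, ?, ?)"


def upsert(conn, rec):
    """Try the update first; fall back to the insert if no row was updated."""
    cur = conn.execute(UPDATE_SQL, (rec["first_name"], rec["last_name"], rec["id"]))
    if cur.rowcount == 0:  # assumption: zero rows updated means "could not be applied"
        conn.execute(INSERT_SQL, (rec["id"], rec["first_name"], rec["last_name"]))
        return "inserted"
    return "updated"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp1 (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)")
conn.execute("INSERT INTO temp1 VALUES (345, 'John', 'Smith')")

incoming = [
    {"id": 345, "first_name": "Jon", "last_name": "Smith"},  # id exists: update
    {"id": 212, "first_name": "Sam", "last_name": "Spade"},  # id is new: insert
]
actions = [upsert(conn, rec) for rec in incoming]
rows = conn.execute("SELECT id, first_name, last_name FROM temp1 ORDER BY id").fetchall()

print(actions)
print(rows)
```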
Simple Components

• In these components the record format metadata does not change from input to output.
The Filter by Expression
• For each record on the input port the ‘select_expr’ parameter is evaluated.
    • If ‘select_expr’ evaluates true (non-zero), the input record is written to the ‘out’ port exactly as the input was read.
    • If ‘select_expr’ evaluates false (zero), the record is written to the ‘deselect’ port.
• The ‘out’ port, which carries the records meeting the ‘select_expr’ criteria, must be connected downstream.
• Using the ‘deselect’ port is optional.
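The routing rule above can be sketched in Python as a split of the input into two lists. This is a simulation for illustration only; `filter_by_expression` is a hypothetical helper, and the sample ids and the `id < 100` expression are assumptions.

```python
def filter_by_expression(records, select_expr):
    """Route each record to 'out' when select_expr is true, else to 'deselect'."""
    out, deselect = [], []
    for rec in records:
        # Records are passed through unchanged; only the destination differs.
        (out if select_expr(rec) else deselect).append(rec)
    return out, deselect


records = [{"id": 345}, {"id": 12}, {"id": 99}, {"id": 100}]
out, deselect = filter_by_expression(records, lambda rec: rec["id"] < 100)

print(out)
print(deselect)
```

Every input record ends up on exactly one of the two ports, and the order of records within each port matches the input order.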
Filter Data (Selection)
1. Push “Run” button.

2. View monitoring information.


3. View output data.
Expression Parameter
The Sort Component
• Reads records from the input port, sorts them by key, and writes the result to the output port.
Keys
• A key identifies a single field or set of fields (a composite key) used to organize a dataset in some way.
• Single field: {id}
• Multiple fields: {last_name; first_name}
• Modifiers: {id descending}
• Used for sorting, grouping, partitioning.
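The three key forms above map naturally onto sort keys in Python. The sketch below is an analogy rather than Ab Initio behavior; the last sample record is made up to show how the second field of a composite key breaks ties.

```python
from operator import itemgetter

records = [
    {"id": 345, "first_name": "John", "last_name": "Smith"},
    {"id": 212, "first_name": "Sam", "last_name": "Spade"},
    {"id": 322, "first_name": "Elvis", "last_name": "Jones"},
    {"id": 121, "first_name": "Adam", "last_name": "Smith"},  # hypothetical record
]

# {id}: single-field key
by_id = sorted(records, key=itemgetter("id"))

# {last_name; first_name}: composite key, compared field by field
by_name = sorted(records, key=itemgetter("last_name", "first_name"))

# {id descending}: single-field key with a modifier
by_id_desc = sorted(records, key=itemgetter("id"), reverse=True)

print([rec["id"] for rec in by_id])
print([rec["id"] for rec in by_name])
print([rec["id"] for rec in by_id_desc])
```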
Sorting
Sorting - The Key Specifier Editor
Exercise : Sorting & Filtering
• Read data1.dat.
• Sort the data according to the id field (asc. / desc.).
• Save records having id < 100 in outfile1.dat.
• Save records having id >= 100 in table temp1.
• Run the application and examine the resulting data.
More Complex Components
• In these components the record format metadata typically changes (goes through a transformation) from input to output.
Reformat
• Reads records from the input port, reformats each according to a transform function (optional in the case of the Reformat component), and writes the result records to the output (out0) port.
• Additional output ports (out1, ...) can be created by adjusting the count parameter.
Transformation Functions
• A transform function specifies the business rules used to create the output record.
• Each field of the output record must successfully be assigned a value. Partial output records are not allowed!
• The Transform Editor is used to create a transform function in a graphical manner.
Data Transformation
Input fields: id, last_name, first_name, j_date, salary

Reformat rules:
• id: id+1000000
• last_name, first_name: combine
• j_date: change format to DD/MM/YYYY
• salary: remove

Output fields: n_id, full_name, n_date
The Transform Function Editor
Text DML: Transform Function Syntax
• Transform functions look like:
output-variables :: name ( input-variables ) =
begin
assignments;
end;
• Assignments look like:
output-variable.field :: expression;
The Transform Function in Text
Format

out :: reformat (in) =
begin
out.id :: in.id + 1000000;
out.last_name :: string_concat("Mac", in.last_name);
end;
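For readers more familiar with general-purpose languages, the transform function above behaves roughly like the Python function below. This is an analogy and not DML: the dictionary representation of a record, the `OUT_FIELDS` tuple, and the sample input are assumptions, and the explicit check stands in for the rule that partial output records are not allowed.

```python
# Fields of the output record format assumed for this sketch.
OUT_FIELDS = ("id", "last_name")


def reformat(in_rec):
    """Build one output record from one input record."""
    out = {}
    out["id"] = in_rec["id"] + 1000000              # out.id :: in.id + 1000000;
    out["last_name"] = "Mac" + in_rec["last_name"]  # out.last_name :: string_concat("Mac", in.last_name);

    # Every output field must have been assigned; partial records are rejected.
    missing = [field for field in OUT_FIELDS if field not in out]
    if missing:
        raise ValueError(f"unassigned output fields: {missing}")
    return out


result = reformat({"id": 345, "last_name": "Smith"})
print(result)
```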
A Look Inside the Reformat Component

a b c

x y z
A Record arrives at the input port
9 45 QF

out :: trans(in) =
begin
out.x :: in.b - 1;
out.y :: in.a;
out.z :: fn(in.c);
end;
The Record is read into the component

9 45 QF

out :: trans(in) =
begin
out.x :: in.b - 1;
out.y :: in.a;
out.z :: fn(in.c);
end;
The Transformation Function is evaluated

9 45 QF

out :: trans(in) =
begin
out.x :: in.b - 1;
out.y :: in.a;
out.z :: fn(in.c);
end;
Since every rule within the transform function succeeds, a result record is produced

out :: trans(in) =
begin
out.x :: in.b - 1;
out.y :: in.a;
out.z :: fn(in.c);
end;

44 9 RG
The result record is written to the output port
of the component

out :: trans(in) =
begin
out.x :: in.b - 1;
out.y :: in.a;
out.z :: fn(in.c);
end;

44 9 RG
Exercise : Reformat Data
• Create a new graph (use the input file data1.dat):
    • id|city|name|salary
• Remove records for employees having id > 500.
• Add a new field named city to the output (default value ‘Mumbai’).
• Change the delimiter from “|” to “;”.
• Increase salary by 25% for employees having id < 100.
• Run the graph and examine the results.
Rollup
• Rollup generates data records that summarize groups of data records.
• By default, Rollup reads grouped (sorted) records from the input port, aggregates them as indicated by the key and transform parameters, and writes the resulting aggregate record to the out port.
Data Aggregation

Input records:
0345Smith Bristol 56
0212Spade London 8
0322Jones Compton 12
0492West London 23
0121Forth Bristol 7
0221Black New York 42

Output records (total per city):
Bristol 63
Compton 12
London 31
New York 42
Data Aggregation of Sorted/Grouped Input

Input records (grouped by city):
0345Smith Bristol 56
0121Forth Bristol 7
0322Jones Compton 12
0212Spade London 8
0492West London 23
0221Black New York 42

Output records (total per city):
Bristol 63
Compton 12
London 31
New York 42
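Aggregating input that is already grouped by key can be sketched with Python's `itertools.groupby`, which, like the default behavior described for Rollup, relies on records with the same key being adjacent. This is an illustration, not the Rollup component; the tuple layout (id, name, city, amount) is an assumption about how the sample records split into fields, and `rollup_sum` is a hypothetical helper.

```python
from itertools import groupby
from operator import itemgetter

# Sample records from the slide, already grouped by city.
records = [
    ("0345", "Smith", "Bristol", 56),
    ("0121", "Forth", "Bristol", 7),
    ("0322", "Jones", "Compton", 12),
    ("0212", "Spade", "London", 8),
    ("0492", "West", "London", 23),
    ("0221", "Black", "New York", 42),
]


def rollup_sum(grouped_records):
    """Emit one (city, total) record per run of adjacent records sharing a city."""
    city = itemgetter(2)
    return [
        (key, sum(rec[3] for rec in group))
        for key, group in groupby(grouped_records, key=city)
    ]


totals = rollup_sum(records)
print(totals)
```

If the input were not grouped, the same city would appear in several runs and produce several output records, which is why grouped or sorted input matters here.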
Built-in Functions for Rollup
• The following aggregation functions are predefined and are only available in the rollup component:
avg, count, first, last, max, min, product, sum
Rollup Wizard

Note the use of an aggregation function in the expression


Exercise : Rollup Data
• For the data above, find the maximum and minimum points for each city name.
• Save the aggregation results in separate fields (max_pt, min_pt).
• Run the application and examine the results.
The Join Component
• Join performs a join of inputs.
• Join types are inner, outer, and semi-joins with multiple flows of data records.
• By default, the inputs to join must be sorted and an inner join is computed.
Joining Data
in0:
0345Smith Bristol 56
0212Spade London 8
0322Jones Compton 12
0492West London 23
0121Forth Bristol 7
0221Black New York 42

in1:
0322970402 1242.50
0345970924 923.75
0121961211 12392.00
0492971123 234.12
0666950616 2312.10

out:
0345Bristol 561997/09/24
0212London 81900/01/01
0322Compton 121997/04/02
0492London 231997/11/23
0121Bristol 71996/12/11
0221New York 421900/01/01
Joining Sorted Data on the ‘id’ field
in0 (sorted by id):
0121Forth Bristol 7
0212Spade London 8
0221Black New York 42
0322Jones Compton 12
0345Smith Bristol 56
0492West London 23

in1 (sorted by id):
0121961211 12392.00
0322970402 1242.50
0345970924 923.75
0492971123 234.12
0666950616 2312.10

out:
0121Bristol 71996/12/11
0212London 81900/01/01
...
Building the Output Record
in0:
record
decimal(4) id;
string(6) name;
string(8) city;
decimal(3) amount;
end

in1:
record
decimal(4) id;
date("YYMMDD") dt;
decimal(9.2) cost;
end

out:
record
decimal(4) id;
string(8) city;
decimal(3) amount;
date("YYYY/MM/DD") dt;
end
What if the in1 record is missing?
in0:
record
decimal(4) id;
string(6) name;
string(8) city;
decimal(3) amount;
end

in1: ???
record
decimal(4) id;
date("YYMMDD") dt;
decimal(9.2) cost;
end

out:
record
decimal(4) id;
string(8) city;
decimal(3) amount;
date("YYYY/MM/DD") dt;
end
Prioritized Assignment
Destination   Priority   Source
out.dt        :1:        in1.dt;
out.dt        :2:        "1900/01/01";
• In DML, a missing value (say, if there is no ‘in1’ record) causes an assignment to fail.
• If an assignment for a left-hand side fails, the next-priority assignment is tried. There must be one successful assignment for each output field.
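The fall-through behavior of prioritized assignment can be imitated in Python by trying candidate rules in priority order. This is a sketch of the idea only: it models a missing ‘in1’ record as `None`, it ignores the date format conversion that the real assignment would involve, and `first_successful` and `build_dt` are hypothetical helper names.

```python
def first_successful(*rules):
    """Return the value of the first rule that succeeds, in priority order."""
    for rule in rules:
        try:
            value = rule()
        except (KeyError, TypeError):
            continue  # this assignment failed (e.g. the input record is missing)
        if value is not None:
            return value
    # Mirrors "there must be one successful assignment for each output field".
    raise ValueError("no assignment succeeded")


def build_dt(in1):
    return first_successful(
        lambda: in1["dt"],     # out.dt :1: in1.dt;
        lambda: "1900/01/01",  # out.dt :2: "1900/01/01";
    )


with_in1 = build_dt({"dt": "1997/09/24"})
without_in1 = build_dt(None)  # no in1 record: priority 1 fails, priority 2 is used

print(with_in1, without_in1)
```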
Assigning Priorities to Business
Rules
Resulting display when out.dt is
selected
Joining
A Look Inside the Join Component*
*join-type = Full Outer join

in0 fields: a b c        in1 fields: a q r

Align inputs by key

out :: fname(in0, in1) =
begin
...
end;

out fields: a x q
Records arrive at the inputs of the Join
G 234 42 G NY 4

Align inputs by a

out :: join(in0, in1) =


begin
out.a :: in0.a;
out.x :1: in1.r + 20;
out.x :2: in0.b + 10;
out.q :1: in1.q;
out.q :2: "XX";
end;
The input records are read into the Join

G 234 42 G NY 4

Align inputs by a

out :: join(in0, in1) =


begin
out.a :: in0.a;
out.x :1: in1.r + 20;
out.x :2: in0.b + 10;
out.q :1: in1.q;
out.q :2: "XX";
end;
The input Key fields are compared

G 234 42 G NY 4

Align inputs by a

out :: join(in0, in1) =


begin
out.a :: in0.a;
out.x :1: in1.r + 20;
out.x :2: in0.b + 10;
out.q :1: in1.q;
out.q :2: "XX";
end;
The aligned records are passed to the
transformation function

Align inputs by a

G 234 42 G NY 4

out :: join(in0, in1) =


begin
out.a :: in0.a;
out.x :1: in1.r + 20;
out.x :2: in0.b + 10;
out.q :1: in1.q;
out.q :2: "XX";
end;
The transformation engine evaluates based
on the inputs

Align inputs by a

G 234 42 G NY 4

out :: join(in0, in1) =


begin
out.a :: in0.a;
out.x :1: in1.r + 20;
out.x :2: in0.b + 10;
out.q :1: in1.q;
out.q :2: "XX";
end;
A result record is emitted and written out

Align inputs by a

out :: join(in0, in1) =


begin
out.a :: in0.a;
out.x :1: in1.r + 20;
out.x :2: in0.b + 10;
out.q :1: in1.q;
out.q :2: "XX";
end;

G 24 NY
New records arrive at the inputs of the Join
H 79 23 K IL 8

Align inputs by a

out :: join(in0, in1) =


begin
out.a :: in0.a;
out.x :1: in1.r + 20;
out.x :2: in0.b + 10;
out.q :1: in1.q;
out.q :2: "XX";
end;
Again, they are read into the Join
component
H 79 23 K IL 8

Align inputs by a

out :: join(in0, in1) =


begin
out.a :: in0.a;
out.x :1: in1.r + 20;
out.x :2: in0.b + 10;
out.q :1: in1.q;
out.q :2: "XX";
end;
The input key fields are compared

H 79 23 K IL 8

Align inputs by a

out :: join(in0, in1) =


begin
out.a :: in0.a;
out.x :1: in1.r + 20;
out.x :2: in0.b + 10;
out.q :1: in1.q;
out.q :2: "XX";
end;
The aligned records are passed to the
transformation function

K IL 8

Align inputs by a

H 79 23

out :: join(in0, in1) =


begin
out.a :: in0.a;
out.x :1: in1.r + 20;
out.x :2: in0.b + 10;
out.q :1: in1.q;
out.q :2: "XX";
end;
The transformation engine evaluates based
on the inputs

K IL 8

Align inputs by a

H 79 23

out :: join(in0, in1) =


begin
out.a :: in0.a;
out.x :1: in1.r + 20;
out.x :2: in0.b + 10;
out.q :1: in1.q;
out.q :2: "XX";
end;
A result record is generated and written out

K IL 8

Align inputs by a

out :: join(in0, in1) =


begin
out.a :: in0.a;
out.x :1: in1.r + 20;
out.x :2: in0.b + 10;
out.q :1: in1.q;
out.q :2: "XX";
end;

H 89 XX
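The two result records in this walkthrough can be reproduced with a small Python analogy of the join transform. It is not DML: records are modeled as dictionaries, a missing in1 record is modeled as `None`, and the field names follow the a/b/c and a/q/r labels from the slides. The case where in0 is missing is deliberately left out, because the walkthrough does not show it.

```python
def join_transform(in0, in1):
    """Build one output record from an aligned pair; in1 may be None."""
    out = {"a": in0["a"]}                                 # out.a :: in0.a;
    # Priority 1 uses in1 when it is present; priority 2 falls back to in0.
    out["x"] = in1["r"] + 20 if in1 else in0["b"] + 10    # out.x :1: / :2:
    out["q"] = in1["q"] if in1 else "XX"                  # out.q :1: / :2:
    return out


# First pair from the walkthrough: both inputs have key G.
matched = join_transform({"a": "G", "b": 234, "c": 42}, {"a": "G", "q": "NY", "r": 4})

# Second pair: in0 has key H, and no in1 record carries that key.
unmatched = join_transform({"a": "H", "b": 79, "c": 23}, None)

print(matched)
print(unmatched)
```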
Exercise: Join Data
• Study different joins:
    • Inner
    • Full Outer
    • Explicit
• The record-required parameter: if a port does not have a record with a key value that matches the current key value, and you set the record-required parameter for that port to:
    • false: Join calls the transform function with NULL for the corresponding argument.
    • true: Join does not call the transform function at all for the current key value.
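The effect of the record-required settings can be explored with a Python sketch that aligns two sorted inputs by key and then drops the pairs that a required port cannot supply. This is a simplified model, not the Join component: it assumes unique, non-null keys within each input, models NULL as `None`, and uses hypothetical helper names (`align`, `join_pairs`). The sample keys G, H, and K come from the walkthrough above.

```python
def align(in0, in1, key):
    """Full outer alignment of two inputs that are already sorted on key."""
    i = j = 0
    while i < len(in0) or j < len(in1):
        k0 = in0[i][key] if i < len(in0) else None
        k1 = in1[j][key] if j < len(in1) else None
        if k1 is None or (k0 is not None and k0 < k1):
            yield in0[i], None        # key present only on in0
            i += 1
        elif k0 is None or k1 < k0:
            yield None, in1[j]        # key present only on in1
            j += 1
        else:
            yield in0[i], in1[j]      # key present on both inputs
            i += 1
            j += 1


def join_pairs(in0, in1, key, record_required0, record_required1):
    """Yield the aligned pairs for which the transform function would be called."""
    for rec0, rec1 in align(in0, in1, key):
        if (rec0 is None and record_required0) or (rec1 is None and record_required1):
            continue  # a required record is missing: transform is not called
        yield rec0, rec1


in0 = [{"a": "G"}, {"a": "H"}]
in1 = [{"a": "G"}, {"a": "K"}]


def keys(pairs):
    """Show each pair as (in0 key, in1 key), with None for a missing record."""
    return [(r0 and r0["a"], r1 and r1["a"]) for r0, r1 in pairs]


full_outer = keys(join_pairs(in0, in1, "a", False, False))
inner = keys(join_pairs(in0, in1, "a", True, True))
in0_required = keys(join_pairs(in0, in1, "a", True, False))

print(full_outer)
print(inner)
print(in0_required)
```

Setting both flags to false corresponds to a full outer join and setting both to true to an inner join; mixed settings give the in-between cases that an explicit join lets you choose.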
The GDE Debugger
• The GDE has a built-in debugger capability.
• To enable the Debugger, choose Debugger : Enable Debugger.
• The Debugger Toolbar buttons: Enable Debugger, Add Watcher File, Remove All Watchers, Isolate Components.


The GDE Debugger
• To add a Watcher File, select a flow and click Add Watcher.
• To remove a Watcher File, click Remove All Watchers.
• To isolate a set of components, select the components to be isolated; Watcher Files will automatically be placed into the graph by the Debugger.
END
( Day -2 )
