
What Does Ab Initio Mean?

 Ab Initio is a Latin phrase that means:
 Of, relating to, or occurring at the beginning; first
 From first principles, in scientific circles
 From the beginning, in legal circles
About Ab Initio

 Ab Initio is a general-purpose data processing platform for
enterprise-class, mission-critical applications such as data warehousing,
clickstream processing, data movement, data transformation and
analytics.
 Supports integration of arbitrary data sources and programs, and
provides complete metadata management across the enterprise.
 Proven best of breed ETL solution.
 Applications of Ab Initio:
– ETL for data warehouses, data marts and operational data sources.
– Parallel data cleansing and validation.
– Parallel data transformation and filtering.
– High performance analytics
– Real time, parallel data capture.
Ab Initio Platforms
 No problem is too big or too small for Ab Initio.
Ab Initio runs on a few processors or a few
hundred processors. Ab Initio runs on virtually
every kind of hardware:
 SMP (Symmetric Multiprocessor) systems
 MPP (Massively Parallel Processor) systems
 Clusters
 PCs
Ab Initio runs on many operating systems:
 Compaq Tru64 UNIX
 Digital UNIX
 Hewlett-Packard HP-UX
 IBM AIX
 NCR MP-RAS
 Red Hat Linux
 IBM/Sequent DYNIX/ptx
 Siemens Pyramid Reliant UNIX
 Silicon Graphics IRIX
 Sun Solaris
 Windows NT and Windows 2000
Ab Initio base software
consists of three main pieces:

 Ab Initio Co>Operating System and core components
 Graphical Development Environment (GDE)
 Enterprise Metadata Environment (EME)
Ab Initio Architecture

Layers, from top to bottom:
 Applications
 Application Development Environments (Graphical, C++, Shell) and the Ab Initio Metadata Repository
 Component Library, User-defined Components, Third Party Components
 Ab Initio Co>Operating System
 Native Operating System (UNIX, Windows NT)
What is Graph Programming?
Ab Initio has based the GDE on the Data Flow Model.
 Data flow diagrams allow you to think in terms of
meaningful processing steps, not microscopic
lines of code
 Data flow diagrams capture the movement of
information through the application.
 Ab Initio calls this development method Graph
Programming
Graph Programming?
 The process of constructing Ab Initio
applications is called Graph Programming.
 In Ab Initio’s Graphical Development
Environment, you build an application by
manipulating components, the building blocks of
the graph.
 Ab Initio Graphs are based on the Data Flow
Model. Even the symbols are similar. The basic
parts of Ab Initio graphs are shown below.
Symbols
 Boxes for processing and data transforms
 Arrows for data flows between processes
 Cylinders for serial I/O files
 Divided cylinders for parallel I/O files
 Grid boxes for database tables


Graph Programming
 Working with the GDE on your desktop is easier
than drawing a data flow diagram on a white board.
You simply drag and drop functional modules
called Components and link them with a swipe of
the mouse. When it’s time to run the application,
the Ab Initio Co>Operating System turns the diagram
into a collection of processes running on servers.
 The Ab Initio term for a running data flow diagram is
a Graph. The inputs and outputs are dataset
components; the processing steps are program
components; and the data conduits are flows.
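
The vocabulary above (dataset components, program components, flows) can be pictured with a small model. This is a minimal Python sketch, not Ab Initio code: the `Graph` class, its method names, and the component names are all hypothetical, and the ordering it computes only illustrates the direction of data flow, not how the Co>Operating System schedules processes.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Graph:
    """Toy model of a graph: dataset components, program components, flows."""
    datasets: set = field(default_factory=set)
    programs: set = field(default_factory=set)
    flows: list = field(default_factory=list)  # (source, target) pairs

    def connect(self, source, target):
        """Add a flow; both ends must already be components of the graph."""
        known = self.datasets | self.programs
        if source not in known or target not in known:
            raise ValueError(f"unknown component in flow {source} -> {target}")
        self.flows.append((source, target))

    def data_flow_order(self):
        """List components so that every flow points from earlier to later."""
        sorter = TopologicalSorter(
            {name: set() for name in self.datasets | self.programs}
        )
        for source, target in self.flows:
            sorter.add(target, source)  # target depends on source
        return list(sorter.static_order())


# Hypothetical example: one input file, two processing steps, one output file.
graph = Graph(datasets={"input.dat", "output.dat"}, programs={"filter", "sort"})
graph.connect("input.dat", "filter")
graph.connect("filter", "sort")
graph.connect("sort", "output.dat")
order = graph.data_flow_order()
```

For this linear chain the order is the input dataset, the two program components, and then the output dataset.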
Anatomy of a Running Job

What happens when you push the “Run” button?
 Your graph is translated into a script that can be executed in
the Shell Development Environment.
 This script and any metadata files stored on the GDE client
machine are shipped (via FTP) to the server.
 The script is invoked (via REXEC or TELNET) on the server.
 The script creates and runs a job that may run across many
nodes.
 Monitoring information is sent back to the GDE client.
Anatomy of a Running Job
 Host Process Creation
 Pushing the “Run” button generates a script.
 The script is transmitted to the Host node.
 The script is invoked, creating the Host process.

[Figure: GDE client, Host, and processing nodes]


Anatomy of a Running Job
 Agent Process Creation
 Host process spawns Agent processes.

[Figure: GDE client, Host, processing nodes, and two Agents]


Anatomy of a Running Job
 Component Process Creation
 Agent processes create Component processes on each processing
node.

[Figure: GDE client, Host, processing nodes, and two Agents]


Anatomy of a Running Job
 Component Execution
 Component processes do their jobs.
 Component processes communicate directly with datasets and each
other to move data around.

[Figure: GDE client, Host, processing nodes, and two Agents]


Anatomy of a Running Job
 Successful Component Termination
 As each Component process finishes with its data, it exits with
success status.

[Figure: GDE client, Host, processing nodes, and two Agents]


Anatomy of a Running Job
 Agent Termination
 When all of an Agent’s Component processes exit, the Agent informs
the Host process that those components are finished.
 The Agent process then exits.

[Figure: GDE client, Host, and processing nodes]


Anatomy of a Running Job
 Host Termination
 When all Agents have exited, the Host process
informs the GDE that the job is complete.
 The Host process then exits.

[Figure: GDE client, Host, and processing nodes]
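
The lifecycle described in the slides above (Host process, Agents, Component processes, then termination in reverse) can be traced with a small simulation. This is only an illustrative Python sketch: no real processes are started, the function name and log wording are made up, and it assumes one Agent per processing node, which the slides do not state explicitly.

```python
def simulate_job(components_by_node):
    """Return an ordered event log for a simulated job.

    components_by_node maps a processing-node name to the list of
    component names that run on that node (all names are hypothetical).
    """
    log = ["host: started by invoked script"]

    # Agent process creation: the Host process spawns the Agents.
    for node in components_by_node:
        log.append(f"host: spawned agent on {node}")

    # Component process creation: each Agent creates its Component processes.
    for node, components in components_by_node.items():
        for component in components:
            log.append(f"agent[{node}]: created component {component}")

    # Component execution and successful termination.
    for node, components in components_by_node.items():
        for component in components:
            log.append(f"component {component}@{node}: exited with success")

    # Agent termination: report to the Host process, then exit.
    for node in components_by_node:
        log.append(f"agent[{node}]: reported components finished, exited")

    # Host termination: inform the GDE, then exit.
    log.append("host: informed GDE that the job is complete, exited")
    return log


events = simulate_job({"node1": ["reformat", "sort"], "node2": ["reformat"]})
```

Printing `events` line by line gives a trace that follows the same order as the slides: creation from the Host downward, termination from the components upward.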


Ab Initio Software Versions & File Extensions
 Software Versions
– Co>Operating System Version => 2.8.32
– GDE Version => 1.8.22

 File Extensions
– .mp Stored Ab Initio graph or graph component
– .mpc Program or custom component
– .mdc Dataset or custom dataset component
– .dml Data Manipulation Language file or record type definition
– .xfr Transform function file
– .dat Data file (either serial file or multifile)
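
As a quick reference, the extension list above can be turned into a lookup table. The following Python sketch is a convenience helper written for this summary, not part of Ab Initio; the function name and the example file names are hypothetical.

```python
from pathlib import PurePath

# Extension descriptions copied from the list above.
AB_INITIO_EXTENSIONS = {
    ".mp": "Stored Ab Initio graph or graph component",
    ".mpc": "Program or custom component",
    ".mdc": "Dataset or custom dataset component",
    ".dml": "Data Manipulation Language file or record type definition",
    ".xfr": "Transform function file",
    ".dat": "Data file (either serial file or multifile)",
}


def describe_file(filename):
    """Return the description for a file name, based only on its extension."""
    suffix = PurePath(filename).suffix.lower()
    return AB_INITIO_EXTENSIONS.get(suffix, "Unknown file type")


graph_description = describe_file("load_customers.mp")
transform_description = describe_file("rules.xfr")
```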
Versions
 To find the GDE version, select
Help >> About Ab Initio from the
GDE window.
 To find the Co>Operating
System version, select Run >>
Settings from the GDE window and
look for the Detected Base
System Version.
Connecting to Co>op Server from GDE
Host Profile Setting

1. Choose Settings from the Run menu.
2. Check the Use Host Profile Setting checkbox.
3. Click the Edit button to open the Host Profile dialog.
4. If running Ab Initio on your local NT system, check the Local
Execution (NT) checkbox and go to step 6.
5. If running Ab Initio on a remote UNIX system, fill in the
path to the Host and the Host Login and Password.
6. Type the full path of the Host directory.
7. Select the Shell Type from the pull-down menu.
8. Test the login and, if necessary, make changes.
Host Profile

Enter Host, Login, Password & Host directory
Select the Shell Type
Ab Initio Components

Components provided by Ab Initio. Datasets,
Partition, Transform, Sort, and Database are
frequently used.
Creating Graph

Type the Label

Specify the Input .dat file
Creating Graph - dml
Specify the .dml file
 Propagate from Neighbors: Copy record formats from the connected flow.
 Same As: Copy the record format from a specific component’s port.
 Path: Store record formats in a Local file, Host File, or in the Ab
Initio repository.
 Embedded: Type the record format directly in a string.
Creating Graph - dml
 DML is Ab Initio’s Data Manipulation Language.
 DML describes data in terms of
– Record Formats that list the fields and format of input,
output, and intermediate records.
– Expressions that define simple computations, for
example, selection.
– Transform Functions that control reformatting,
aggregation, and other data transformations.
– Keys that specify groupings, ordering, and partitioning
relationships between records.
Editing .dml file through Record Format Editor – Grid View
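
To make the roles of record formats, expressions, and keys more concrete, here is a rough analogy in Python. It is deliberately not DML syntax: the field list, the `|` delimiter, the helper name, and the sample data are assumptions made only for illustration.

```python
# "Record format": field names and the type of each field (hypothetical layout).
RECORD_FORMAT = [("id", int), ("name", str), ("amount", float)]


def parse_record(line, record_format=RECORD_FORMAT, delimiter="|"):
    """Split one delimited text line into a dict that follows the record format."""
    values = line.rstrip("\n").split(delimiter)
    if len(values) != len(record_format):
        raise ValueError(f"expected {len(record_format)} fields, got {len(values)}")
    return {
        name: convert(value)
        for (name, convert), value in zip(record_format, values)
    }


records = [parse_record(line) for line in ["2|Bob|5.0", "1|Alice|12.5"]]

# "Expression": a simple computation used for selection.
selected = [record for record in records if record["amount"] > 10]

# "Key": the field that defines ordering between records.
ordered = sorted(records, key=lambda record: record["id"])
```

The record format drives parsing, the expression picks out records, and the key decides how records relate to each other when sorting; grouping and partitioning would use a key in the same way.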
Creating Graph - Transform
Specify the .xfr file
 A transform function is either a
DML file or a DML string that
describes how you manipulate
your data.
 Ab Initio transform functions
mainly consist of a series of
assignment statements. Each
statement is called a business
rule.
 When Ab Initio evaluates a
transform function, it performs
the following tasks:
– Initializes local variables
– Evaluates statements
– Evaluates rules
 Transform function files have the
.xfr extension.
Creating Graph - xfr
 Transform functions: A set
of rules that compute
output values from input
values.
 Business rule: Part of a
transform function that
describes how you
manipulate one field of
your output data.
 Variable: Optional part of a
transform function that
provides storage for
temporary values.
 Statement: Optional part of
a transform function that
assigns values of variables
in a specific order.
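
The parts of a transform function listed above (variables, statements, business rules) and the evaluation order given earlier (initialize local variables, evaluate statements, evaluate rules) can be mirrored in ordinary code. The sketch below is written in Python rather than in DML, and the function name, field names, and the 1000 threshold are invented for the example.

```python
def reformat_customer(in_record):
    """Compute one output record from one input record."""
    # 1. Initialize local variables (temporary storage).
    full_name = ""

    # 2. Evaluate statements, in order: assign values to the variables.
    full_name = f"{in_record['first_name']} {in_record['last_name']}".strip()

    # 3. Evaluate rules: each business rule sets one field of the output.
    out_record = {}
    out_record["id"] = in_record["id"]
    out_record["name"] = full_name.upper()
    out_record["is_large_order"] = in_record["amount"] >= 1000
    return out_record


result = reformat_customer(
    {"id": 7, "first_name": "Ada", "last_name": "Lovelace", "amount": 1500.0}
)
```

Each assignment to `out_record` plays the role of one business rule, since it describes how a single output field is derived from the input values.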
