
INTRODUCTION

Trainer Introduction

Pre-Requisites

Lab Setup Discussion

Participant Introduction

3
4/9/17 All Rights Reserved
TRAINER INTRODUCTION
Facilitator: Pranay Kumar P
Email: pranay.pothuganti@gmail.com
Phone: +91-9160619276

PRE-REQUISITES

All participants are expected to have knowledge of:

Oracle SQL
General relational databases (Oracle)
Data warehousing

LAB SETUP DETAILS

All machines run WinXP with a minimum of 4 GB RAM

MS-Office & Acrobat Reader

Within LAN

PARTICIPANT INTRODUCTION
About Yourself

Education Background

If you have any experience in IT, kindly specify

SESSION I
INTRODUCTION

AB-INITIO

HISTORY OF AB INITIO
Ab Initio Software Corporation was founded in the mid-1990s by Sheryl Handler, the former CEO of Thinking Machines Corporation (TMC), after TMC filed for bankruptcy.

In addition to Handler, other former TMC people involved in the founding of Ab Initio included Cliff Lasser, Angela Lordi, and Craig Stanfill.

Ab Initio is known for being very secretive in the way it runs its business, but its software is widely regarded as top-notch.

HISTORY OF AB INITIO
Ab Initio software is a fourth-generation, GUI-based parallel processing tool for data analysis, batch processing, and data manipulation. It is used mainly to extract, transform, and load (ETL) data.

Ab Initio software is a suite of products that together provide a platform for robust data processing applications.

HISTORY OF AB INITIO

Core Ab Initio products are:

The Co>Operating System
The Component Library
The Graphical Development Environment

WHAT DOES THE WORD AB INITIO MEAN?
Ab Initio is Latin for "from the beginning".

From the beginning, our software was designed to support a complete range of business applications, from simple to the most complex, with capabilities like parallelism.

WHAT DOES THE WORD AB INITIO MEAN?
The Graphical Development Environment and a powerful set of components allow our customers to get valuable results from the beginning.

AB INITIO'S FOCUS
Moving data
Move small and large volumes of data in an efficient manner
Deal with the complexity associated with business data

High performance
Scalable solutions

Better productivity

AB INITIO PLATFORMS
No process is too big or too small for Ab Initio. It runs on a few processors or a few hundred processors, and on virtually every kind of hardware:

SMP (Symmetric Multiprocessor) systems

MPP (Massively Parallel Processor) systems

Clusters and PCs

AB INITIO RUNS ON MANY OPERATING SYSTEMS
Compaq Tru64 UNIX
Digital Unix
Hewlett-Packard HP-UX
IBM AIX
NCR MP-RAS
Red Hat Linux
IBM/Sequent DYNIX/ptx
Siemens Pyramid Reliant UNIX
Silicon Graphics IRIX
Sun Solaris
Windows NT and Windows 2000

AB INITIO BASE SOFTWARE - TWO MAIN PIECES:

Ab Initio Co>Operating System and core components

Graphical Development Environment (GDE)

AB INITIO PRODUCT ARCHITECTURE
[Diagram: layers from top to bottom]
User Applications
Development Environments: GDE, Shell
Component Library, User-defined Components, 3rd Party Components, Ab Initio EME
The Ab Initio Co>Operating System
Native Operating System (Unix, Windows, OS/390)

ANATOMY OF A RUNNING JOB
What happens when you push the RUN button?
Your graph is translated into a script that can be executed in the Shell Development Environment.
This script and any metadata files stored on the GDE client machine are shipped (via FTP) to the server.
The script is invoked (via REXEC or TELNET) on the server.
The script creates and runs a job, which in turn may run ...

ANATOMY OF A RUNNING JOB
Host Process Creation
Pushing the RUN button generates a script.
The script is transmitted to the Host node.
The script is invoked, creating the Host process.
[Diagram: GDE on the Client, Host process on the Host, Processing nodes]

ANATOMY OF A RUNNING JOB
Agent Process Creation
The Host process spawns Agent processes.
[Diagram: Client, Host process on the Host, Agent processes on the Processing nodes]

ANATOMY OF A RUNNING JOB
Component Process Creation
Agent processes create Component processes on each processing node.
[Diagram: Client, Host process on the Host, Agent processes on the Processing nodes]

ANATOMY OF A RUNNING JOB
Component Execution
Component processes do their jobs.
They communicate directly with datasets and with each other to move the data around.
[Diagram: GDE on the Client, Host process on the Host, Agent processes on the Processing nodes]

ANATOMY OF A RUNNING JOB
Successful Component Termination
As soon as each Component process finishes with its data, it exits with success status.
[Diagram: GDE on the Client, Host process on the Host, Agent processes on the Processing nodes]

ANATOMY OF A RUNNING JOB
Agent Termination
When all of an Agent's Component processes exit, the Agent informs the Host process that those components are finished.
The Agent process then exits.
[Diagram: Client, Host process on the Host, Processing nodes]

ANATOMY OF A RUNNING JOB
Host Termination
When all Agents exit, the Host process informs the GDE that the job is complete.
The Host process then exits.
[Diagram: Client, Host process on the Host, Processing nodes]

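The Host, Agent, and Component lifecycle above can be pictured with a toy simulation. This is only a sketch in plain Python, not Ab Initio code: the class names, the `layout` dictionary, and the status strings are all hypothetical, and real processes run concurrently on separate nodes.

```python
# Toy model of the job lifecycle; every name here is hypothetical.

class Component:
    def __init__(self, name):
        self.name = name

    def run(self):
        # A real component would move data; this one only reports success.
        return 0  # exit status 0 means success


class Agent:
    def __init__(self, node, components):
        self.node = node
        self.components = components

    def run(self):
        # The Agent creates its Component processes and collects their statuses.
        statuses = [component.run() for component in self.components]
        # It reports to the Host only after all of its components have exited.
        return all(status == 0 for status in statuses)


class Host:
    def __init__(self, layout):
        # layout maps a processing node name to the components placed on it.
        self.agents = [
            Agent(node, [Component(name) for name in names])
            for node, names in layout.items()
        ]

    def run(self):
        # The Host spawns one Agent per node and waits for all of them.
        results = [agent.run() for agent in self.agents]
        return "complete" if all(results) else "failed"


job_status = Host({"node1": ["reformat", "sort"], "node2": ["reformat", "sort"]}).run()
```

The nesting mirrors the slides: the Host finishes only after every Agent has reported, and each Agent reports only after its own Components have exited.
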
LAB EXERCISE

EXERCISE_QUESTIONS\DAY 1 LAB EXERCISES.txt

Simple Extract and Load

DAY II
COMPONENTS

REFORMAT COMPONENT

Reformat: Changes the record format of data records:
By dropping fields
By using DML expressions to add fields, combine fields, or transform the data in the records

SAMPLE DATA TO BE REFORMATTED
DEPTNO  DNAME       LOC
10      ACCOUNTING  NEW YORK
20      RESEARCH    DALLAS
30      SALES       CHICAGO
40      OPERATIONS  BOSTON

Reformatted Output
DEPTNO  DNAME
10      ACCOUNTING
20      RESEARCH
30      SALES
40      OPERATIONS

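The sample above drops the LOC field. As a rough illustration of that transform, here is a Python sketch (not DML); the function name `reformat` and the dictionary representation of records are assumptions made for this example.

```python
# Sample department records from the slide, one dict per record.
departments = [
    {"deptno": 10, "dname": "ACCOUNTING", "loc": "NEW YORK"},
    {"deptno": 20, "dname": "RESEARCH", "loc": "DALLAS"},
    {"deptno": 30, "dname": "SALES", "loc": "CHICAGO"},
    {"deptno": 40, "dname": "OPERATIONS", "loc": "BOSTON"},
]


def reformat(record):
    """Hypothetical transform: keep deptno and dname, drop loc."""
    return {"deptno": record["deptno"], "dname": record["dname"]}


# One output record per input record, in the same order.
reformatted = [reformat(record) for record in departments]
```
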
LAB EXERCISE

EXERCISE_QUESTIONS\REFORMAT.txt

Reformat

FILTER BY EXPRESSION COMPONENT (FBE)

Filter by Expression: Filters data records according to a specified DML expression.

select_expr: Filter by Expression applies the expression in the select_expr parameter to each data record to decide where that record goes.

CONT.,
If the expression returns:

A non-zero value: Filter by Expression writes the record to the out port.
0: Filter by Expression writes the record to the deselect port.

If you do not connect a flow to the deselect port, Filter by Expression discards the records.

SAMPLE DATA TO BE FILTERED

select-expr: deptno == 10;

Filtered Output

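The out/deselect routing can be sketched in Python as follows. This is an illustration only: the function name, the `deselect_connected` flag, and the use of a Python lambda in place of a DML expression are assumptions. The data is the department sample from the Reformat slides.

```python
def filter_by_expression(records, select_expr, deselect_connected=True):
    """Route each record to `out` or `deselect` based on select_expr."""
    out, deselect = [], []
    for record in records:
        if select_expr(record):
            out.append(record)        # non-zero result: out port
        elif deselect_connected:
            deselect.append(record)   # zero result: deselect port
        # With no flow on the deselect port, the record is discarded.
    return out, deselect


departments = [
    {"deptno": 10, "dname": "ACCOUNTING", "loc": "NEW YORK"},
    {"deptno": 20, "dname": "RESEARCH", "loc": "DALLAS"},
    {"deptno": 30, "dname": "SALES", "loc": "CHICAGO"},
    {"deptno": 40, "dname": "OPERATIONS", "loc": "BOSTON"},
]

# Python stand-in for the slide's expression: deptno == 10
selected, deselected = filter_by_expression(departments, lambda r: r["deptno"] == 10)
```
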
LAB EXERCISE

EXERCISE_QUESTIONS\FILTER.txt
Filter By Expression

EXERCISE_QUESTIONS\EXP.txt

Reformat with Filter for simple calculations, Variables, Handling Null

BROADCAST COMPONENT

Broadcast: Arbitrarily combines all the data records it receives into a single flow and writes a copy of that flow to each of its output flow partitions.

Use Broadcast to increase data parallelism when you have connected a single fan-out flow to the out port, or to increase component parallelism when you have connected multiple straight flows to the out port.

PARTITION BY EXPRESSION COMPONENT

Partition by Expression: Distributes data records to its output flow partitions according to a specified DML expression.

The Partition by Expression component:

Reads records in arbitrary order from the flows connected to the in port
Distributes the records to the flows connected to the out port, according to the expression in the function parameter

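A minimal Python sketch of this behaviour is shown below. It assumes the expression evaluates to a zero-based index of the output flow; the slides do not state the numbering convention, so treat that as an assumption. The function and variable names are hypothetical.

```python
def partition_by_expression(records, function, n_partitions):
    """Send each record to the output flow whose index `function` returns."""
    partitions = [[] for _ in range(n_partitions)]
    for record in records:
        partitions[function(record)].append(record)
    return partitions


records = [{"deptno": d} for d in (10, 20, 30, 40)]

# Hypothetical expression: departments below 30 go to flow 0, the rest to flow 1.
parts = partition_by_expression(records, lambda r: 0 if r["deptno"] < 30 else 1, 2)
```
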
TRASH COMPONENT

Trash: Ends a flow by accepting all the data records in it and discarding them.

Trash is a Broadcast component without an out port.

The Trash component:

Reads records from the in port
Discards the records

LAB EXERCISE

EXERCISE_QUESTIONS\MULTIPLE TARGETS.txt

Working with multiple targets

DAY III
COMPONENTS

WORKING WITH FILES
Types of files:
Delimited
Fixed width
Mixed type

The location can be on the Co>op, a shared drive, and so on.

Data can be either overwritten or appended to the target file (by default it is overwritten).

Provide the record format for flat files (DML).

LAB EXERCISE

EXERCISE_QUESTIONS\SRC_TRG_FILES.txt

Working with source and target as flat files

JOIN COMPONENT

Join: Performs inner, outer, and semi-joins with multiple flows of data records.

JOIN TYPES
Inner Join: Sets the record-required parameters for all ports to True.

Outer Join: Sets the record-required parameters for all ports to False.

Explicit: Allows you to set the record-required parameter for each port individually.

JOIN TYPES - CONT.,
Case 1: Inner Join join-type

Case 2: Full Outer Join join-type

Case 3: Explicit join-type: record-required0: false, record-required1: true

Case 4: Explicit join-type: record-required0: true, record-required1: false
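
The four cases can be reproduced with a small two-input join written in Python. This is a sketch under assumptions, not the component's implementation: `join`, the tuple output, and the employee and department records are invented for illustration, and the flags follow the reading that record-requiredN set to true means an output record needs a record from port inN.

```python
from collections import defaultdict


def join(in0, in1, key, record_required0=True, record_required1=True):
    """Join two lists of dicts on `key`; unmatched sides appear as None."""
    by_key1 = defaultdict(list)
    for r1 in in1:
        by_key1[r1[key]].append(r1)

    out = []
    matched_keys = set()
    for r0 in in0:
        matches = by_key1.get(r0[key], [])
        if matches:
            matched_keys.add(r0[key])
            out.extend((r0, r1) for r1 in matches)
        elif not record_required1:
            out.append((r0, None))       # in0 record with no in1 partner
    if not record_required0:
        for r1 in in1:
            if r1[key] not in matched_keys:
                out.append((None, r1))   # in1 record with no in0 partner
    return out


# Hypothetical sample inputs.
employees = [{"deptno": 10, "ename": "CLARK"}, {"deptno": 50, "ename": "SMITH"}]
departments = [{"deptno": 10, "dname": "ACCOUNTING"}, {"deptno": 20, "dname": "RESEARCH"}]

inner = join(employees, departments, "deptno")                    # Case 1
full_outer = join(employees, departments, "deptno", False, False)  # Case 2
case3 = join(employees, departments, "deptno",
             record_required0=False, record_required1=True)        # Case 3
case4 = join(employees, departments, "deptno",
             record_required0=True, record_required1=False)        # Case 4
```

Under this reading, Case 3 keeps every department even when no employee matches, and Case 4 keeps every employee even when no department matches.
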
SOME KEY JOIN PARAMETERS
Key:
Name(s) of the field(s) in the input records that must have matching values for Join to call the transform function.

Driving:
Number of the port to which you want to connect the driving input. The driving input is the largest input. All other inputs are read into memory.

The driving parameter is only available when the sorted-input parameter is set to "In memory: Input need not be sorted".
SOME KEY JOIN PARAMETERS
Dedupn:
Set the dedupn parameter to true to remove duplicates from the corresponding inn port (the nth in port) before joining.
This allows you to choose only one record from a group with matching key values as the argument to the transform function.
The default is false, which does not remove duplicates.
Override-keyn:
Alternative name(s) for the key field(s) for a particular in port.
LAB EXERCISE

EXERCISE_QUESTIONS\JOIN.txt

Working with join

CONCATENATE COMPONENT

Not key-based
Result ordering is by partition
Serializes pipelined computation

Useful for:
Creating serial flow from partitioned data
Appending headers and trailers
Writing DML
CONT.,
The Concatenate component:

Reads all the data records from the first flow connected to the in port (counting from top to bottom on the graph) and copies them to the out port.

Then reads all the data records from the second flow connected to the in port and appends them to those of the first flow, and so on.

SORT COMPONENT

Sort: Sorts and merges data records.

Key: Name(s) of the key field(s) and the sequence specifier(s) you want Sort to use when it orders data records.

Use Sort to order data records before you send them to a component that requires grouped or sorted records.

CONT.,
The Sort component:

Reads the records from all the flows connected to the in port until it reaches the number of bytes specified in the max-core parameter.
Sorts the records and writes the results to a temporary file on disk.
Repeats this procedure until it has read all records.
Merges all the temporary files, maintaining the sort order.
Writes the result to the out port.

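The read, sort, spill, and merge steps above follow the general pattern of an external merge sort. The Python sketch below illustrates that pattern only; it is not how the component is implemented. As simplifying assumptions, `max_core` counts records rather than bytes, and the sorted runs are read back whole before merging, whereas a real external sort would stream them from disk.

```python
import heapq
import pickle
import tempfile


def external_sort(records, key, max_core):
    """Sort a list by spilling sorted runs of at most max_core records to temp files."""
    run_files = []
    for start in range(0, len(records), max_core):
        # Sort one chunk in memory and write it to a temporary file.
        run = sorted(records[start:start + max_core], key=key)
        run_file = tempfile.TemporaryFile()
        pickle.dump(run, run_file)
        run_file.seek(0)
        run_files.append(run_file)

    # Read the runs back and merge them, preserving the sort order.
    runs = [pickle.load(run_file) for run_file in run_files]
    for run_file in run_files:
        run_file.close()
    return list(heapq.merge(*runs, key=key))


sorted_values = external_sort([5, 3, 9, 1, 7, 2], key=lambda x: x, max_core=2)
```
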
DEDUP SORTED COMPONENT

Dedup Sorted: Separates one specified data record in each group of data records from the rest of the records in the group.

Dedup Sorted requires grouped input.

Key: Name(s) of the key field(s) you want Dedup Sorted to use when determining groups of data records.

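A minimal Python sketch of the idea, assuming the input is already grouped by key. The `keep` argument, which chooses whether the first or last record of each group is the one separated out, is a hypothetical parameter for this example; the slides do not name the component's own option.

```python
from itertools import groupby


def dedup_sorted(records, key, keep="first"):
    """Split grouped records into one kept record per group and the rest."""
    kept, rest = [], []
    for _, group in groupby(records, key=key):
        group = list(group)
        if keep == "first":
            kept.append(group[0])
            rest.extend(group[1:])
        else:
            kept.append(group[-1])
            rest.extend(group[:-1])
    return kept, rest


records = [(10, "a"), (10, "b"), (20, "c")]
kept, rest = dedup_sorted(records, key=lambda r: r[0])
```

Because `groupby` only groups adjacent records, ungrouped input would produce several "groups" for the same key, which is why grouped input is required.
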
LAB EXERCISE

EXERCISE_QUESTIONS\CONCAT.txt

Concatenate with Sort and Dedup Sorted

ROLLUP COMPONENT

Rollup: Generates data records that summarize groups of data records.

Rollup gives you more control over record selection, grouping, and aggregation than Aggregate.

ROLLUP COMPONENT

In the file specified in the transform parameter:

Create a DML type named temporary_type
Create the required transform functions

CONT.,
At runtime, Rollup executes the following steps:

1. Input selection
2. Temporary initialization
3. Computation
4. Finalization
5. Output selection

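The five runtime steps can be mapped onto a short Python sketch. This is an analogy only: the callback names (`select`, `initialize`, `rollup_fn`, `finalize`, `output_select`) are hypothetical stand-ins for the transform functions, the running `temp` value plays the role of temporary_type, and the salary data is invented.

```python
from itertools import groupby


def rollup(records, key, select, initialize, rollup_fn, finalize, output_select):
    """Summarize grouped records, one output record per group at most."""
    out = []
    selected = [r for r in records if select(r)]       # 1. input selection
    for k, group in groupby(selected, key=key):        # input must be grouped by key
        temp = initialize()                            # 2. temporary initialization
        for record in group:
            temp = rollup_fn(temp, record)             # 3. computation
        result = finalize(k, temp)                     # 4. finalization
        if output_select(result):                      # 5. output selection
            out.append(result)
    return out


# Hypothetical (deptno, salary) records, grouped by deptno.
salaries = [(10, 1000), (10, 1500), (20, 800)]

totals = rollup(
    salaries,
    key=lambda r: r[0],
    select=lambda r: r[1] > 0,
    initialize=lambda: 0,
    rollup_fn=lambda temp, r: temp + r[1],
    finalize=lambda deptno, temp: (deptno, temp),
    output_select=lambda result: result[1] >= 1000,
)
```
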
LAB EXERCISE

EXERCISE_QUESTIONS\MFS.txt

MFS with Rollup Component

FUSE COMPONENT

Fuse:
Combines multiple input flows into a single output flow by applying a transform function to corresponding records of each flow.

Fuse applies a transform function to corresponding records of each input flow.

The first time the transform function executes, it uses the first record of each flow, and so on.

CONT.,
Fuse sends the result of the transform function to the out port.

The component works as follows. The component tries to read from each of its input flows.

If all of its input flows are finished, Fuse exits.
Otherwise, Fuse reads one record from each still-unfinished input port and a NULL from each finished input port.

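This record-by-record pairing, with NULL standing in for finished inputs, resembles Python's `itertools.zip_longest`. The sketch below uses it to imitate the behaviour; the `fuse` function and the sample flows are hypothetical, and Python's `None` plays the role of NULL.

```python
from itertools import zip_longest


def fuse(transform, *flows):
    """Apply `transform` to corresponding records; finished flows supply None."""
    # zip_longest stops only when every flow is exhausted.
    return [transform(*records) for records in zip_longest(*flows, fillvalue=None)]


names = ["ACCOUNTING", "RESEARCH", "SALES"]
locations = ["NEW YORK", "DALLAS"]

fused = fuse(lambda name, loc: (name, loc), names, locations)
```
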
SCAN COMPONENT

Scan: Generates a series of cumulative summary records, such as successive year-to-date totals, for groups of data records.

Scan produces intermediate summary records.

In the file specified in the transform parameter:

Create a DML type named temporary_type
Create a transform function named scan
Create the required transform functions

CONT.,
At runtime, Scan executes the following steps:

1. Input selection
2. Temporary initialization
3. Computation
4. Finalization
5. Output selection

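Where Rollup emits one summary per group, Scan emits a running summary for every input record. The Python sketch below shows year-to-date totals in that style; the callback names and the sales figures are hypothetical, and the input and output selection steps are left out to keep it short.

```python
from itertools import groupby


def scan(records, key, initialize, scan_fn, finalize):
    """Emit one cumulative summary record per input record, restarting per group."""
    out = []
    for k, group in groupby(records, key=key):   # input must be grouped by key
        temp = initialize()                      # temporary initialization
        for record in group:
            temp = scan_fn(temp, record)         # computation
            out.append(finalize(k, temp))        # finalization, once per record
    return out


# Hypothetical (year, amount) records, grouped by year.
monthly_sales = [(2016, 100), (2016, 50), (2017, 70), (2017, 30)]

year_to_date = scan(
    monthly_sales,
    key=lambda r: r[0],
    initialize=lambda: 0,
    scan_fn=lambda temp, r: temp + r[1],
    finalize=lambda year, temp: (year, temp),
)
```
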
LAB EXERCISE

EXERCISE_QUESTIONS\SANDBOX_FUSE_SCAN.txt

Creation of Sandbox and Working with Fuse and Scan

REDEFINE FORMAT COMPONENT

Redefine Format: Copies data records from its input to its output without changing the values in the data records.
Use Redefine Format to change or rename fields in a record format without changing the values in the records.

CONT.,
The Redefine Format component:

Reads the data records arriving at the in port

Writes the data records to the out port with the fields renamed according to the ...

If you use Redefine Format to change a record format, make sure you specify an output record format compatible with the input record format.

DAY IV
COMPONENTS

GATHER COMPONENT

Gather: Combines data records from multiple flow partitions arbitrarily.

The Gather component:

Reads data records from the flows connected to the in port
Combines the records arbitrarily
Writes the combined records to the out port

Not key-based; used most frequently

CONT.,
Result ordering is unpredictable

Neither serializes nor synchronizes pipelined computation

Useful for efficient collection of data from multiple partitions and for repartitioning

MERGE COMPONENT

Merge: Combines data records from multiple flow partitions that have been sorted according to the same key specifier, and maintains the sort order.

Key-based; result ordering is sorted if each input is sorted

Possibly synchronizes pipelined computation; may even serialize

Useful for creating ordered data flows; used more often than Concatenate, but still infrequently

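Merging partitions that are each sorted on the same key is what Python's `heapq.merge` does, so it makes a convenient illustration. The wrapper function and the sample partitions below are assumptions for this sketch.

```python
import heapq


def merge(key, *sorted_flows):
    """Combine flows that are each sorted on `key`, preserving the order."""
    return list(heapq.merge(*sorted_flows, key=key))


# Two partitions, each already sorted by department number.
partition0 = [(10, "ACCOUNTING"), (30, "SALES")]
partition1 = [(20, "RESEARCH"), (40, "OPERATIONS")]

merged = merge(lambda r: r[0], partition0, partition1)
```

If an input is not sorted, the output is not guaranteed to be sorted either, which matches the condition stated on the slide.
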
INTERLEAVE COMPONENT

Interleave: Combines blocks of data records from multiple flow partitions in round-robin fashion.

The Interleave component:

1. Reads the number of data records specified in the block size parameter from the first flow connected to the in port
2. Reads the number of data records specified in the block size parameter from the next flow, and so on
3. Writes the records to the out port

CONT.,
Not key-based

Result ordering is inverse of round-robin

Synchronizes pipelined computation

Useful for restoring original order following a record-independent parallel computation partitioned by round-robin

Used in rare circumstances

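The block-by-block reading can be sketched in Python as below. The function name and the in-memory lists are assumptions; the example partitions are what a round-robin split of the numbers 1 to 7 over three flows would look like.

```python
def interleave(flows, block_size=1):
    """Take block_size records from each flow in turn until all are exhausted."""
    flows = [list(flow) for flow in flows]
    positions = [0] * len(flows)
    out = []
    while any(pos < len(flow) for pos, flow in zip(positions, flows)):
        for i, flow in enumerate(flows):
            block = flow[positions[i]:positions[i] + block_size]
            out.extend(block)
            positions[i] += len(block)
    return out


# Partitions produced by a round-robin split of 1..7 with block size 1.
restored = interleave([[1, 4, 7], [2, 5], [3, 6]])

# The same idea with block size 2.
restored_blocks = interleave([[1, 2, 7], [3, 4], [5, 6]], block_size=2)
```

Both calls return the records in their original order, which is the "inverse of round-robin" property listed above.
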
LOOKUP FILE

Lookup File:
Represents one or more serial files or a multifile

The amount of data is small enough to be held in main memory

This allows a transform function to retrieve records much more ...

LAB EXERCISE

EXERCISE_QUESTIONS\PARAMETERS.txt

Parameters

EXERCISE_QUESTIONS\LOOKUP.txt

Lookup file

PARTITION BY KEY

Partition by Key: Distributes data records to its output flow partitions according to key values.

The Partition by Key component:

Reads records in arbitrary order from the in port

Distributes them to the flows connected to the out port, according to the key parameter, writing records with the same key to the same output flow

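One common way to guarantee that equal keys land in the same flow is to hash the key and take the result modulo the number of flows. The Python sketch below does that with `zlib.crc32` as a stand-in; the slides do not say which hash the component uses, so the hashing choice and the function name are assumptions.

```python
import zlib


def partition_by_key(records, key, n_partitions):
    """Send records with equal keys to the same output flow."""
    partitions = [[] for _ in range(n_partitions)]
    for record in records:
        # crc32 of the key's text form is a stable, repeatable stand-in hash.
        index = zlib.crc32(repr(key(record)).encode()) % n_partitions
        partitions[index].append(record)
    return partitions


records = [(10, "a"), (20, "b"), (10, "c"), (30, "d"), (20, "e")]
parts = partition_by_key(records, key=lambda r: r[0], n_partitions=3)
```

Which flow a given key lands in depends on the hash, so only the "same key, same flow" property should be relied on.
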
PARTITION BY PERCENTAGE

Partition by Percentage: Distributes a specified percentage of the total number of input data records to each output flow.

The Partition by Percentage component:

Reads records from the in port

Writes a specified percentage of the input records to each flow of the out port

CONT.,
You can supply the percentages that Partition by Percentage uses to distribute data records in either of two ways:

By specifying the percentages in the percentages parameter.

By connecting the output of any component that produces a list of percentages to the pct port of Partition by Percentage.

Use decimal('\n') as the record format for the pct port of Partition by Percentage

CONT.,
You can assign a different percentage to each output flow

Express percentages as integers from 1 to 100

Make the count of the percentages one less than the number of flows on the out port

PARTITION BY RANGE

Partition by Range: Distributes data records to its output flow partitions according to the ranges of key values specified for each partition.

The Partition by Range component:

Reads splitter records from the split port, and assumes that these records are sorted according to the key parameter.

CONT.,
Determines whether the number of flows connected to the out port is equal to n (where n-1 represents the number of splitter records).

If not, Partition by Range writes an error message and stops the execution of the graph.

Reads data records from the flows connected to the in port in arbitrary order.

CONT.,
Distributes the data records to the flows connected to the out port according to the values of the key field(s), as follows:

Assigns records with key values less than or equal to the first splitter record to the first output flow.

Assigns records with key values greater than ...

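A range partitioner of this kind can be sketched with Python's `bisect` module. Only the first rule (keys less than or equal to the first splitter go to the first flow) is stated in full on the slide; the sketch assumes the same "less than or equal" rule for each later splitter and sends anything above the last splitter to the last flow. The function name and sample values are hypothetical.

```python
from bisect import bisect_left


def partition_by_range(records, key, splitters):
    """Distribute records over len(splitters) + 1 flows using sorted splitters."""
    partitions = [[] for _ in range(len(splitters) + 1)]
    for record in records:
        # bisect_left returns 0 for keys <= splitters[0], 1 for keys in
        # (splitters[0], splitters[1]], and so on.
        partitions[bisect_left(splitters, key(record))].append(record)
    return partitions


# Two splitters give three output flows.
parts = partition_by_range([5, 10, 11, 20, 25], key=lambda r: r, splitters=[10, 20])
```
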
PARTITION BY ROUND-ROBIN

Partition by Round-robin: Distributes blocks of data records evenly to each output flow in round-robin fashion.

The Partition by Round-robin component:

Reads records from the in port

Distributes them in block size chunks to its output flows according to the order in which the flows are connected

The effect is like dealing a deck of cards

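The card-dealing effect is easy to see in a few lines of Python. The function name and the `block_size` default are assumptions for this sketch.

```python
def partition_by_round_robin(records, n_partitions, block_size=1):
    """Deal records to the output flows in blocks of block_size, in flow order."""
    partitions = [[] for _ in range(n_partitions)]
    for i, record in enumerate(records):
        # Block number i // block_size decides which flow receives the record.
        partitions[(i // block_size) % n_partitions].append(record)
    return partitions


hands = partition_by_round_robin(range(1, 8), n_partitions=3)
hands_in_pairs = partition_by_round_robin(range(1, 8), n_partitions=3, block_size=2)
```
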
RUN PROGRAM

Run Program: Runs an executable program.

The Run Program component:
Reads data records from the in port if you connect a flow to the in port.

Runs the data records through the program named in the command line parameter.

Writes the data records to the out port if you connect a flow ...

DAY V
COMPONENTS

INFORMATION REGARDING PORTS

Gather Logs: Collects the output from the log ports of components for analysis of a graph after execution.

CONT.,
The Gather Logs component:

Collects log records generated by components through their log ports

Writes a record containing the text from the StartText parameter to the file specified in the LogFile parameter

Writes any log records from its in port to the file specified in the ...