
Accenture Ab Initio Training

Introduction to Ab Initio
Prepared by: Ashok Chanda
Ab Initio Session 1
Introduction to Data Warehousing (DWH)
Explanation of Data Warehouse (DW) Architecture
Operating System / Hardware Support
Introduction to the ETL Process
Introduction to Ab Initio
Explanation of Ab Initio Architecture
What Is a Data Warehouse?
A data warehouse is a copy of transaction data specifically structured for querying and reporting.
A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process.
A data warehouse is a central repository for all or significant parts of the data that an enterprise's various business systems collect.
Data Warehouse Definitions
A data warehouse is a database geared towards the business intelligence requirements of an organization. The data warehouse integrates data from the various operational systems and is typically loaded from these systems at regular intervals. Data warehouses contain historical information that enables analysis of business performance over time.
A data warehouse can also be described as a collection of databases combined with a flexible data extraction system.
Data Warehouse
A data warehouse can be normalized or denormalized. It can be a relational database, multidimensional database, flat file, hierarchical database, object database, etc. Data warehouse data often gets changed, and data warehouses often focus on a specific activity or entity.
Why Use a Data Warehouse?
Data exploration and discovery
Integrated and consistent data
Quality-assured data
Easily accessible data
Production and performance awareness
Access to data in a timely manner
Simplified Data Warehouse Architecture
Data Warehouse Architecture
Data Warehouses can be architected in many different ways, depending on the specific needs of a business. The model described here is the "hub-and-spokes" Data Warehousing architecture that is popular in many organizations.
In short, data is moved from databases used in operational systems into a data warehouse staging area, then into a data warehouse, and finally into a set of conformed data marts. Data is copied from one database to another using a technology called ETL (Extract, Transform, Load).
The ETL Process
Capture
Scrub or data cleansing
Transform
Load and index
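The four steps above can be sketched as a small Python pipeline. This is an illustrative sketch only, not how Ab Initio or any particular ETL tool implements them: the source rows, the hypothetical COUNTRY_CODES mapping and the function names are assumptions, the "target table" is an in-memory list, and a dictionary stands in for an index.

```python
# Hypothetical rows as captured from an operational system (all values are strings).
RAW_ROWS = [
    {"cust_id": " 101 ", "country": "us", "amount": "25.50"},
    {"cust_id": "102", "country": "SG", "amount": "n/a"},  # "bad" amount
    {"cust_id": "103", "country": "Usa", "amount": "10"},
]

# Assumed mapping used to make inconsistent country values consistent.
COUNTRY_CODES = {"US": "US", "USA": "US", "SG": "SG"}


def capture(source):
    """Capture: copy rows out of the source without modifying them."""
    return [dict(row) for row in source]


def scrub(rows):
    """Scrub: trim whitespace and reject rows whose amount is not numeric."""
    clean, rejected = [], []
    for row in rows:
        row = {key: value.strip() for key, value in row.items()}
        try:
            float(row["amount"])
        except ValueError:
            rejected.append(row)
            continue
        clean.append(row)
    return clean, rejected


def transform(rows):
    """Transform: convert types and map inconsistent country values to one code."""
    return [
        {
            "cust_id": int(r["cust_id"]),
            "country": COUNTRY_CODES.get(r["country"].upper(), "UNKNOWN"),
            "amount": float(r["amount"]),
        }
        for r in rows
    ]


def load_and_index(rows, table, index):
    """Load: append rows to the target table. Index: map cust_id to row position."""
    for r in rows:
        index[r["cust_id"]] = len(table)
        table.append(r)


table, index = [], {}
clean, rejected = scrub(capture(RAW_ROWS))
load_and_index(transform(clean), table, index)
print(len(table), len(rejected))  # 2 1
print(table[index[103]]["country"])  # US
```

Keeping the rejected rows, rather than silently dropping them, lets someone review the "bad" data that scrubbing removed.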
ETL Technology

ETL Technology is an important component of the Data Warehousing Architecture. It is used to copy data from Operational Applications to the Data Warehouse Staging Area, from the DW Staging Area into the Data Warehouse, and finally from the Data Warehouse into a set of conformed Data Marts that are accessible by decision makers.
The ETL software extracts data, transforms values of inconsistent data, cleanses "bad" data, filters data and loads data into a target database. The scheduling of ETL jobs is critical. Should there be a failure in one ETL job, the remaining ETL jobs must respond appropriately.
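One simple way for the remaining jobs to "respond appropriately" is to run jobs in dependency order and skip anything downstream of a failure. The Python sketch below assumes jobs are zero-argument callables and that dependencies are declared by hand; the job names are hypothetical, and real schedulers typically add retries, alerting and restarts.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_jobs(jobs, deps):
    """Run ETL jobs in dependency order; skip any job whose upstream did not succeed.

    jobs: job name -> zero-argument callable (raises an exception on failure)
    deps: job name -> set of upstream job names
    Returns job name -> "ok" | "failed" | "skipped".
    """
    status = {}
    for name in TopologicalSorter(deps).static_order():
        if any(status.get(up) != "ok" for up in deps.get(name, ())):
            status[name] = "skipped"  # an upstream job failed or was skipped
            continue
        try:
            jobs[name]()
            status[name] = "ok"
        except Exception:  # any error marks the job as failed
            status[name] = "failed"
    return status


def succeed():
    pass


def fail():
    raise RuntimeError("source system unavailable")


jobs = {"extract_sales": succeed, "extract_finance": fail,
        "load_warehouse": succeed, "refresh_sales_mart": succeed}
deps = {"extract_sales": set(), "extract_finance": set(),
        "load_warehouse": {"extract_sales", "extract_finance"},
        "refresh_sales_mart": {"load_warehouse"}}

status = run_jobs(jobs, deps)
print(status["extract_sales"], status["extract_finance"], status["load_warehouse"])
# ok failed skipped
```

Because the failed finance extract blocks the warehouse load, the sales data mart is not refreshed from an incomplete warehouse.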

Data Warehouse Staging Area

The Data Warehouse Staging Area is a temporary location where data from source systems is copied. A staging area is mainly required in a Data Warehousing Architecture for timing reasons. In short, all required data must be available before data can be integrated into the Data Warehouse.
Due to varying business cycles, data processing cycles, hardware and network resource limitations and geographical factors, it is not feasible to extract all the data from all Operational databases at exactly the same time.
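The timing rule above (integrate only when all required data is available) can be sketched as a small Python class. The source names are hypothetical and the class is a minimal illustration, assuming each extract lands as a list of rows; it is not a description of any particular product's staging mechanism.

```python
# Hypothetical sources, each extracted on its own business or time-zone cycle.
EXPECTED_SOURCES = {"sales_daily", "finance_month_end",
                    "customers_singapore", "customers_chicago"}


class StagingArea:
    """Temporary holding area: sources land independently, integration waits for all."""

    def __init__(self, expected):
        self.expected = set(expected)
        self.landed = {}  # source name -> list of extracted rows

    def land(self, source, rows):
        """Record that a source's extract has arrived in staging."""
        self.landed[source] = list(rows)

    def missing(self):
        """Sources whose extracts have not arrived yet."""
        return self.expected - set(self.landed)

    def integrate(self):
        """Return all staged rows for the warehouse load, but only when every source is present."""
        if self.missing():
            raise RuntimeError(f"not ready; waiting for {sorted(self.missing())}")
        rows = [row for src in sorted(self.landed) for row in self.landed[src]]
        self.landed.clear()  # transient staging: emptied after a successful integration
        return rows


staging = StagingArea(EXPECTED_SOURCES)
staging.land("sales_daily", [{"sale_id": 1}])
staging.land("customers_singapore", [{"cust_id": 7}])
print(sorted(staging.missing()))  # ['customers_chicago', 'finance_month_end']

staging.land("customers_chicago", [{"cust_id": 9}])
staging.land("finance_month_end", [{"gl_account": "4000"}])
rows = staging.integrate()
print(len(rows))  # 4
```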
Examples: Staging Area

For example, it might be reasonable to extract sales data on a daily basis; however, daily extracts might not be suitable for financial data that requires a month-end reconciliation process. Similarly, it might be feasible to extract "customer" data from a database in Singapore at noon Eastern Standard Time (the middle of the night in Singapore), but this would not be feasible for "customer" data in a Chicago database, where noon Eastern Standard Time falls in the middle of the business day.
Data in the Data Warehouse can be either persistent (i.e., remains around for a long period) or transient (i.e., only remains around temporarily).
Not all businesses require a Data Warehouse Staging Area. For many businesses it is feasible to use ETL to copy data directly from operational databases into the Data Warehouse.
Data Warehouse
The purpose of the Data Warehouse in the overall Data Warehousing Architecture is to integrate corporate data. It contains the "single version of truth" for the organization that has been carefully constructed from data stored in disparate internal and external operational databases.
The amount of data in the Data Warehouse is massive. Data is stored at a very granular level of detail. For example, every "sale" that has ever occurred in the organization is recorded and related to dimensions of interest. This allows data to be sliced and diced, summed and grouped in almost any combination.
Data Warehouse
Contrary to popular opinion, the Data Warehouse does not contain all the data in the organization. Its purpose is to provide key business metrics that are needed by the organization for strategic and tactical decision making.
Decision makers don't access the Data Warehouse directly. Instead, they use various front-end Data Warehouse Tools that read data from subject-specific Data Marts.
The Data Warehouse can be either "relational" or "dimensional". This depends on how the business intends to use the information.
Data Warehouse Environment
In addition to a relational/multidimensional database, a data warehouse environment often includes an ETL solution, an OLAP engine, client analysis tools, and other applications that manage the process of gathering data and delivering it to business users.
Data Mart
A subset of a data warehouse, for use by a single department or function.
A repository of data gathered from operational data and other sources that is designed to serve a particular community of knowledge workers.
A subset of the information contained in a data warehouse.
Data marts have the same definition as the data warehouse (see above), but data marts have a more limited audience and/or data content.
Data Mart
ETL (Extract Transform Load) jobs extract data from the Data Warehouse and populate one or more Data Marts for use by groups of decision makers in the organization. The Data Marts can be dimensional (star schemas) or relational, depending on how the information is to be used and what "front end" Data Warehousing Tools will be used to present the information.
Each Data Mart can contain different combinations of tables, columns and rows from the Enterprise Data Warehouse. For example, a business unit or user group that doesn't require a lot of historical data might only need transactions from the current calendar year in the database. The Personnel Department might need to see all details about employees, whereas data such as "salary" or "home address" might not be appropriate for a Data Mart that focuses on Sales.
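The row and column selection described above can be sketched in Python. The table contents, column names, the year 2023 and the build_mart helper are all hypothetical; in practice this selection would usually be done by ETL jobs or SQL against the Enterprise Data Warehouse rather than with in-memory lists.

```python
from datetime import date

# Hypothetical Enterprise Data Warehouse tables, rows as dictionaries.
EDW_SALES = [
    {"sale_id": 1, "sale_date": date(2022, 11, 3), "emp_id": 10, "amount": 120.0},
    {"sale_id": 2, "sale_date": date(2023, 2, 14), "emp_id": 11, "amount": 75.0},
    {"sale_id": 3, "sale_date": date(2023, 6, 1), "emp_id": 10, "amount": 40.0},
]
EDW_EMPLOYEES = [
    {"emp_id": 10, "name": "Lee", "region": "East", "salary": 90000, "home_address": "1 Main St"},
    {"emp_id": 11, "name": "Diaz", "region": "West", "salary": 85000, "home_address": "2 Oak Ave"},
]


def build_mart(rows, columns, keep=lambda row: True):
    """Copy the chosen rows (keep) and columns (projection) into a data mart table."""
    return [{col: row[col] for col in columns} for row in rows if keep(row)]


CURRENT_YEAR = 2023  # assumed "current calendar year" for this example

# Sales mart: only this year's transactions and no sensitive employee columns.
sales_txns = build_mart(EDW_SALES, ["sale_id", "emp_id", "amount"],
                        keep=lambda row: row["sale_date"].year == CURRENT_YEAR)
sales_reps = build_mart(EDW_EMPLOYEES, ["emp_id", "name", "region"])

# Personnel mart: all employee details, including salary and home address.
personnel = build_mart(EDW_EMPLOYEES, ["emp_id", "name", "region", "salary", "home_address"])

print([row["sale_id"] for row in sales_txns])  # [2, 3]
```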
Star Schema
The star schema is perhaps the simplest data warehouse schema.
It is called a star schema because the entity-relationship diagram of this schema resembles a star, with points radiating from a central table.
The center of the star consists of a large fact table and the points of the star are the dimension tables.
Star Schema continued
A star schema is characterized by one or more very large fact tables that contain the primary information in the data warehouse, and a number of much smaller dimension tables (or lookup tables), each of which contains information about the entries for a particular attribute in the fact table.
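To make the fact/dimension split concrete, here is a minimal Python sketch of a typical star query: each fact row is joined to its dimension rows through foreign keys, and the facts are summed by dimension attributes. The table names, keys and values are hypothetical; a real warehouse would express this as a SQL join with GROUP BY.

```python
from collections import defaultdict

# Dimension (lookup) tables: small, keyed by surrogate key. Contents are hypothetical.
DIM_PRODUCT = {1: {"name": "Laptop", "category": "Computers"},
               2: {"name": "Mouse", "category": "Accessories"},
               3: {"name": "Monitor", "category": "Computers"}}
DIM_STORE = {10: {"city": "Chicago"}, 20: {"city": "Singapore"}}

# Fact table: one row per sale, with foreign keys into each dimension plus numeric facts.
FACT_SALES = [
    {"product_key": 1, "store_key": 10, "quantity": 2, "revenue": 2000.0},
    {"product_key": 2, "store_key": 10, "quantity": 5, "revenue": 100.0},
    {"product_key": 3, "store_key": 20, "quantity": 1, "revenue": 300.0},
    {"product_key": 1, "store_key": 20, "quantity": 1, "revenue": 1000.0},
]


def revenue_by_category_and_city(facts):
    """Star query: join each fact row to its dimensions, then sum revenue per group."""
    totals = defaultdict(float)
    for row in facts:
        category = DIM_PRODUCT[row["product_key"]]["category"]  # fact -> product dimension
        city = DIM_STORE[row["store_key"]]["city"]  # fact -> store dimension
        totals[(category, city)] += row["revenue"]
    return dict(totals)


print(revenue_by_category_and_city(FACT_SALES))
# {('Computers', 'Chicago'): 2000.0, ('Accessories', 'Chicago'): 100.0, ('Computers', 'Singapore'): 1300.0}
```

Each dimension is only one lookup away from the fact table, which is why star queries stay simple even when many dimensions are involved.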
Advantages of Star Schemas
Provide a direct and intuitive mapping between the business entities being analyzed by end users and the schema design.
Provide highly optimized performance for typical star queries.
Are widely supported by a large number of business intelligence tools, which may anticipate or even require that the data-warehouse schema contain dimension tables.
Star schemas are used for both simple data marts and very large data warehouses.
Star Schema
[Figure: Diagrammatic representation of a star schema]
Snowflake Schema
The snowflake schema is a more complex data warehouse model than a star schema, and is a type of star schema.
It is called a snowflake schema because the diagram of the schema resembles a snowflake.
Snowflake schemas normalize dimensions to eliminate redundancy.
Snowflake Schema - Example
In a snowflake schema, the dimension data has been grouped into multiple tables instead of one large table. For example, a product dimension table in a star schema might be normalized into a products table, a product_category table, and a product_manufacturer table in a snowflake schema. While this saves space, it increases the number of dimension tables and requires more foreign key joins. The result is more complex queries and reduced query performance.
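The product example above can be sketched with plain Python dictionaries standing in for tables. The table names follow the text (products, product_category, product_manufacturer); the keys and values are hypothetical. The sketch shows where the extra joins come from: each category and manufacturer name is stored once, but resolving a product's attributes needs one additional lookup per normalized table.

```python
# Hypothetical snowflaked product dimension: category and manufacturer in their own tables.
PRODUCTS = {
    1: {"name": "Laptop", "category_id": 100, "manufacturer_id": 7},
    2: {"name": "Mouse", "category_id": 200, "manufacturer_id": 8},
    3: {"name": "Monitor", "category_id": 100, "manufacturer_id": 7},
}
PRODUCT_CATEGORY = {100: {"category": "Computers"}, 200: {"category": "Accessories"}}
PRODUCT_MANUFACTURER = {7: {"manufacturer": "Acme"}, 8: {"manufacturer": "Globex"}}


def product_attributes(product_key):
    """Resolve one product's descriptive attributes: three lookups (joins) instead of one."""
    product = PRODUCTS[product_key]
    category = PRODUCT_CATEGORY[product["category_id"]]  # extra join no. 1
    maker = PRODUCT_MANUFACTURER[product["manufacturer_id"]]  # extra join no. 2
    return {"name": product["name"], "category": category["category"],
            "manufacturer": maker["manufacturer"]}


# Pre-joining (denormalizing) the three tables gives back a star-schema product dimension,
# in which "Computers" and "Acme" are repeated on every matching product row.
STAR_PRODUCT_DIM = {key: product_attributes(key) for key in PRODUCTS}
print(STAR_PRODUCT_DIM[3])  # {'name': 'Monitor', 'category': 'Computers', 'manufacturer': 'Acme'}
```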
[Figure: Diagrammatic representation of a snowflake schema]
Fact Table
The centralized table in a star schema is called the fact table. A fact table typically has two types of columns: those that contain facts and those that are foreign keys to dimension tables. The primary key of a fact table is usually a composite key that is made up of all of its foreign keys.
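The composite primary key can be illustrated with a short Python check that no two fact rows share the same combination of foreign keys. The grain (one row per day, product and store), the column names and the sample rows are hypothetical assumptions for this sketch.

```python
# Hypothetical grain: one row per (day, product, store); the key columns are all foreign keys.
FOREIGN_KEYS = ("date_key", "product_key", "store_key")


def composite_key(row):
    """The fact table's primary key: the tuple of all its foreign-key values."""
    return tuple(row[col] for col in FOREIGN_KEYS)


def duplicate_keys(fact_rows):
    """Return composite keys that occur more than once (a valid fact table has none)."""
    seen, dupes = set(), set()
    for row in fact_rows:
        key = composite_key(row)
        if key in seen:
            dupes.add(key)
        seen.add(key)
    return dupes


facts = [
    {"date_key": 20230101, "product_key": 1, "store_key": 10, "quantity": 2, "revenue": 2000.0},
    {"date_key": 20230101, "product_key": 2, "store_key": 10, "quantity": 5, "revenue": 100.0},
    {"date_key": 20230101, "product_key": 1, "store_key": 10, "quantity": 1, "revenue": 1000.0},
]
print(duplicate_keys(facts))  # {(20230101, 1, 10)}
```

A duplicate usually means the rows were loaded at a finer grain than the key allows; they would need to be aggregated, or another dimension key added, before the load.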
What Happens During the ETL Process?

During extraction, the desired data is identified and extracted from many different sources, including database systems and applications. Depending on the source system's capabilities (for example, operating system resources), some transformations may take place during this extraction process. The size of the extracted data varies from hundreds of kilobytes up to gigabytes, depending on the source system and the business situation. After extraction, the data has to be physically transported to the target system or an intermediate system for further processing.
Examples of Second-Generation ETL Tools
Powermart 4.5 (Informatica Corporation): pioneer due to market share
Ardent DataStage (Ardent Software, Inc.): general-purpose tool oriented to data marts
Sagent Data Mart Solution 3.0 (Sagent Technology): progressively integrated with Microsoft
Ab Initio 2.2 (Ab Initio Software): a kit of tools that can be used to build applications
Tapestry 2.1 (D2K, Inc.): end-to-end data warehousing solution from a single vendor
What to Look for in ETL Tools
Use an optional data cleansing tool to clean up source data
Use an extraction/transformation/load tool to retrieve, cleanse, transform, summarize, aggregate, and load data
Use modern, engine-driven technology for fast, parallel operation
Goal: define 100% of the transform rules with a point-and-click interface
Support development of logical and physical data models
Generate and manage a central metadata repository
Open metadata exchange architecture to integrate central metadata with local metadata
Support metadata standards
Provide end users access to metadata in business terms
Operating System / Hardware Support
This section discusses how a DBMS utilizes OS/hardware features such as parallel functionality, SMP/MPP support, and clustering. These OS/hardware features greatly extend the scalability of the DBMS and improve its performance. However, managing an environment with these features is difficult and expensive.
Parallel Functionality
The introduction and maturation of parallel processing environments are key enablers of growing database sizes, and of acceptable response times for storing, retrieving, and administering data. DBMS vendors are continually bringing products to market that take advantage of multi-processor hardware platforms. These products can perform table scans, backups, loads, and queries in parallel.
Parallel Features
An overview of typical parallel functionality is given below:
Queries: Parallel queries can enhance scalability for many query operations.
Data load: Performance is always a serious issue when loading large databases. Meeting response time requirements is the overriding factor for determining the best load method and should be a key part of a performance benchmark.
Create table as select: This feature makes it possible to create aggregated tables in parallel.
Index creation: Parallel index creation exploits the benefits of parallel hardware by distributing the workload of building a large index across a large number of processors.
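Building an aggregated table in parallel, as in "create table as select", can be sketched as: partition the input, aggregate each partition independently, then merge the partial results. The sketch uses Python's concurrent.futures.ThreadPoolExecutor so it runs anywhere; CPython threads do not execute CPU-bound Python code in parallel, so for a real speed-up you would swap in ProcessPoolExecutor (under an `if __name__ == "__main__":` guard). The (region, amount) row layout and the partition count are assumptions, and this is not how any particular DBMS implements the feature.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n):
    """Round-robin split of the input rows into n partitions."""
    return [rows[i::n] for i in range(n)]


def partial_totals(part):
    """Aggregate one partition: total amount per region."""
    totals = Counter()
    for region, amount in part:
        totals[region] += amount
    return totals


def parallel_aggregate(rows, workers=4):
    """Aggregate every partition concurrently, then merge the partial results."""
    result = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(partial_totals, partition(rows, workers)):
            result.update(partial)  # Counter.update adds counts instead of replacing them
    return dict(result)


sales = [("East", 10), ("West", 5), ("East", 7), ("North", 3), ("West", 1), ("East", 2)]
print(parallel_aggregate(sales))  # {'East': 19, 'West': 6, 'North': 3}
```

The merge step works because a sum of partial sums equals the total sum; aggregates such as averages need their partial sums and counts carried separately.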
Which Parallel Processor Configuration, SMP or MPP?
SMP and clustered SMP environments have the flexibility and ability to scale in small increments.
SMP environments are often useful for the large but static data warehouse, where the data cannot be easily partitioned due to the unpredictable nature of how the data is joined over multiple tables for complex searches and ad-hoc queries.
Which Parallel Processor Configuration, SMP or MPP? continued
MPP works well in environments where growth is potentially unlimited, access patterns to the database are predictable, and the data can be easily partitioned across different MPP nodes with minimal data accesses crossing between them. This often occurs in large OLTP environments, where transactions are generally small and predictable, as opposed to decision support and data warehouse environments, where multiple tables can be joined in unpredictable ways.
In fact, data warehousing and decision support are the areas most vendors of parallel hardware platforms and DBMSs are targeting.
MPP does not scale well if heavy data warehouse database accesses must cross MPP nodes, causing I/O bottlenecks over the MPP interconnect, or if multiple MPP nodes are continually locked for concurrent record updates.
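The point about data accesses crossing MPP nodes can be illustrated by hash-partitioning two tables and counting how many join lookups would have to go over the interconnect. The sketch assumes a four-node cluster and uses a simple modulo hash on integer keys (real systems use stronger hash functions); the tables are hypothetical.

```python
NODES = 4  # assumed cluster size


def node_for(key: int, nodes: int = NODES) -> int:
    """Simple modulo hash on an integer key: decides which node stores the row."""
    return key % nodes


def distribute(rows, key, nodes=NODES):
    """Place each row on the node chosen by its partitioning key."""
    parts = [[] for _ in range(nodes)]
    for row in rows:
        parts[node_for(row[key], nodes)].append(row)
    return parts


def cross_node_lookups(order_parts, customer_parts):
    """Count orders whose customer row lives on a different node (interconnect traffic)."""
    home = {c["cust_id"]: n for n, part in enumerate(customer_parts) for c in part}
    return sum(1 for n, part in enumerate(order_parts) for o in part if home[o["cust_id"]] != n)


customers = [{"cust_id": c} for c in range(8)]
orders = [{"order_id": i, "cust_id": i // 2} for i in range(16)]

cust_parts = distribute(customers, "cust_id")
print(cross_node_lookups(distribute(orders, "cust_id"), cust_parts))   # 0: join stays local
print(cross_node_lookups(distribute(orders, "order_id"), cust_parts))  # 12: most lookups cross nodes
```

When both tables are partitioned on the join key, every join is node-local; an unpredictable join on a different column would force most rows across the interconnect, which is the bottleneck described above.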
A Multi-CPU Computer (SMP)
A Network of Multi-CPU Nodes
A Network of Networks
Parallel Computer Architecture
Computers come in many shapes and sizes:
Single-CPU, multi-CPU
Network of single-CPU computers
Network of multi-CPU computers
Multi-CPU machines are often called SMPs (for Symmetric Multi-Processors).
Specially built networks of machines are often called MPPs (for Massively Parallel Processors).
Introduction to Ab Initio
History of Ab Initio
Ab Initio Software Corporation was founded in the mid-1990s by Sheryl Handler, the former CEO at Thinking Machines Corporation (TMC), after TMC filed for bankruptcy. In addition to Handler, other former TMC people involved in the founding of Ab Initio included Cliff Lasser, Angela Lordi, and Craig Stanfill.
Ab Initio is known for being very secretive in the way that it runs its business, but its software is widely regarded as top-notch.
History of Ab Initio continued
The Ab Initio software is a fourth-generation, graphical user interface (GUI)-based parallel processing tool for data analysis, batch processing and data manipulation that is used mainly to extract, transform and load data.
The Ab Initio software is a suite of products that together provide a platform for robust data processing applications. The core Ab Initio products are: the Co>Operating System, the Component Library, and the Graphical Development Environment.
What Does What Does Ab Initio Ab Initio Mean? Mean?
Ab Initio is Latin for "from the beginning."
From the beginning, our software was designed to support a complete range of business applications, from the simple to the most complex. Crucial capabilities like parallelism and checkpointing can't be added after the fact.
The Graphical Development Environment and a powerful set of components allow our customers to get valuable results from the beginning.
Ab Initio's Focus
Moving Data
Move small and large volumes of data in an efficient manner
Deal with the complexity associated with business data
High Performance
Scalable solutions
Better productivity
Ab Initio's Software
Ab Initio software is a general-purpose data processing platform for mission-critical applications such as:
Data warehousing
Batch processing
Click-stream analysis
Data movement
Data transformation
Applications of Ab Initio Software
Processing just about any form and volume of data.
Parallel sort/merge processing.
Data transformation.
Rehosting of corporate data.
Parallel execution of existing applications.
Ab Initio Provides For:
Distribution: a platform for applications to execute across a collection of processors, within the confines of a single machine or across multiple machines.
Reduced run-time complexity: the ability for applications to run in parallel on any combination of computers where the Ab Initio Co>Operating System is installed, all from a single point of control.
Applications of Ab Initio Software in a Data Warehouse
Front end of the data warehouse:
Transformation of disparate sources
Aggregation and other preprocessing
Referential integrity checking
Database loading
Back end of the data warehouse:
Extraction for external processing
Aggregation and loading of data marts
Ab Initio or Informatica: Powerful ETL
1. Both Informatica and Ab Initio support parallelism, but Informatica supports only one type of parallelism, whereas Ab Initio supports three. In Informatica, the developer has to define partitions in the Server Manager to achieve parallelism; in Ab Initio, the tool itself takes care of parallelism. The three types of parallelism in Ab Initio are (1) component parallelism, (2) data parallelism, and (3) pipeline parallelism. This is the difference in their parallelism concepts.
2. Ab Initio has no built-in scheduler like Informatica's; you need to schedule jobs through a script or run them manually.
3. Ab Initio supports different types of text files, meaning you can read the same file with different structures, which is not possible in Informatica. Ab Initio is also more user-friendly than Informatica. These are some of the many differences between Informatica and Ab Initio.
4. Ab Initio doesn't need a dedicated administrator; a UNIX or NT administrator will suffice, whereas other ETL tools do require administrative work.
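The three types of parallelism listed above can be illustrated outside Ab Initio. The following is a minimal Python sketch using only the standard library; it is not Ab Initio code and does not show how the Co>Operating System implements parallelism. Component parallelism runs two independent components at the same time, data parallelism hash-partitions the records and processes every partition concurrently, and pipeline parallelism lets a downstream component consume records while the upstream one is still producing them. The function names and the `transform` business rule are hypothetical.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

def transform(record):
    """Hypothetical business rule: uppercase the name field."""
    return {**record, "name": record["name"].upper()}

def component_parallelism(records_a, records_b):
    """Two independent components run at the same time.

    Neither depends on the other's output, so both can execute concurrently.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        fut_a = pool.submit(lambda: [transform(r) for r in records_a])
        fut_b = pool.submit(lambda: [transform(r) for r in records_b])
        return fut_a.result(), fut_b.result()

def data_parallelism(records, n_partitions=4):
    """One component runs on several data partitions at once."""
    # Hash-partition by key so each partition can be processed independently.
    partitions = [[] for _ in range(n_partitions)]
    for r in records:
        partitions[hash(r["id"]) % n_partitions].append(r)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        results = pool.map(lambda part: [transform(r) for r in part], partitions)
        # Gather the partitions back into a single flow.
        return [r for part in results for r in part]

_END = object()  # end-of-flow marker

def pipeline_parallelism(records):
    """Upstream and downstream components run concurrently on one flow."""
    flow = queue.Queue(maxsize=2)  # bounded buffer between the two components
    output = []

    def upstream():
        for r in records:
            flow.put(transform(r))  # hand each record on as soon as it is ready
        flow.put(_END)

    def downstream():
        while (r := flow.get()) is not _END:
            output.append({**r, "name_len": len(r["name"])})

    workers = [threading.Thread(target=upstream), threading.Thread(target=downstream)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return output
```

Threads keep the sketch short. Because of CPython's global interpreter lock, CPU-bound work would need processes (for example `ProcessPoolExecutor`) to run truly in parallel.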
Ab Initio or Informatica: Powerful ETL (continued)
Error handling: In Ab Initio you can attach error and reject files to each transformation, and capture and analyze the messages and the data separately. Informatica has one huge log, which is very inefficient when working on a large process with numerous points of failure.
Robust transformation language: Informatica is very basic as far as transformations go. Without going into a function-by-function comparison, Ab Initio's transformation language appears much more robust.
Instant feedback: On execution, Ab Initio tells you how many records have been processed, rejected, and so on, and reports detailed performance metrics for each component. Informatica has a debug mode, but it is slow and difficult to adapt to.
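Per-transformation reject and error capture, together with record counts, can be sketched as follows. This is a minimal Python illustration of the idea, not Ab Initio's reject/error port mechanism: each record that makes the hypothetical business rule `parse_amount` raise is routed to a reject list, its error message goes to a separate error list, and `counts()` returns per-component statistics similar to the feedback described above. All names here are assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class TransformResult:
    output: list = field(default_factory=list)   # records that passed
    rejects: list = field(default_factory=list)  # records that failed
    errors: list = field(default_factory=list)   # (record index, message)

    def counts(self):
        """Per-component statistics, like the run-time feedback described above."""
        return {
            "read": len(self.output) + len(self.rejects),
            "written": len(self.output),
            "rejected": len(self.rejects),
        }

def run_transform(records, rule):
    """Apply rule to each record; route failures to the reject and error lists."""
    result = TransformResult()
    for i, rec in enumerate(records):
        try:
            result.output.append(rule(rec))
        except (ValueError, KeyError) as exc:
            result.rejects.append(rec)  # the data, kept separately...
            result.errors.append((i, f"{type(exc).__name__}: {exc}"))  # ...from the message
    return result

def parse_amount(rec):
    """Hypothetical business rule: convert the amount field to whole cents."""
    return {**rec, "amount_cents": round(float(rec["amount"]) * 100)}
```

Keeping rejects and error messages in separate collections mirrors the point above: the bad data and the reason it failed can be analyzed independently, per transformation, instead of being searched for in one shared log.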
Both tools are fundamentally different
Which one to use depends on the work at hand and on the existing infrastructure and resources available.
Informatica is an engine-based ETL tool: the power of this tool is in its transformation engine, and the code that it generates after development cannot be seen or modified. Ab Initio is a code-based ETL tool: it generates ksh, bat, or similar script code, which can be modified to achieve any goals that cannot be handled through the ETL tool itself.
Ab Initio doesn't need a dedicated administrator; a UNIX or NT administrator will suffice, whereas other ETL tools do require administrative work.
Ab Initio Product Architecture
Figure: Ab Initio product architecture. Layers shown: the native operating system (Unix, Windows, OS/390); the Ab Initio Co>Operating System; the Component Library, 3rd party components, and user-defined components; development environments (GDE, Shell); user applications; and the Ab Initio EME.
Ab Initio Architecture: Explanation
The Ab Initio Co>Operating System unites a network of computing resources (CPUs, storage disks, programs, and datasets) into a production-quality data processing system with scalable performance and mainframe-class reliability.
The Co>Operating System is layered on top of the native operating systems of the collection of servers. It provides a distributed model for process execution, file management, debugging, process monitoring, and checkpointing. A user may perform all these functions from a single point of control.
Co>Operating System Services
Parallel and distributed application execution
Control
Data transport
Transactional semantics at the application level
Checkpointing
Monitoring and debugging
Parallel file management
Metadata-driven components
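Checkpointing, one of the services listed above, means a failed job can restart from the last completed checkpoint instead of from the beginning. The Python sketch below illustrates only that general idea; it does not reproduce how the Co>Operating System actually implements checkpoints. A job is divided into hypothetical named phases, each phase's output is saved to a JSON checkpoint file, and a rerun loads existing checkpoints instead of re-executing those phases.

```python
import json
import os
import tempfile

def run_with_checkpoints(phases, data, checkpoint_dir):
    """Run (name, function) phases in order, skipping already completed ones.

    After each phase, its output is written to <checkpoint_dir>/<name>.json.
    On a rerun, a phase whose checkpoint exists is loaded, not executed.
    Returns the final data and the names of the phases actually executed.
    """
    executed = []
    for name, func in phases:
        path = os.path.join(checkpoint_dir, f"{name}.json")
        if os.path.exists(path):
            with open(path) as f:
                data = json.load(f)  # resume from this phase's checkpoint
            continue
        data = func(data)
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(data, f)
        os.replace(tmp, path)  # atomic rename: never a half-written checkpoint
        executed.append(name)
    return data, executed

def demo():
    """Hypothetical job: phase 'scale' fails once, then the job is rerun."""
    attempts = {"scale": 0}

    def scale(d):
        attempts["scale"] += 1
        if attempts["scale"] == 1:
            raise RuntimeError("simulated failure in phase 'scale'")
        return [x * 10 for x in d]

    phases = [("clean", lambda d: [x for x in d if x > 0]), ("scale", scale)]
    with tempfile.TemporaryDirectory() as ckpt:
        try:
            run_with_checkpoints(phases, [3, -1, 2], ckpt)
        except RuntimeError:
            pass  # the job failed after phase 'clean' had been checkpointed
        return run_with_checkpoints(phases, [3, -1, 2], ckpt)

if __name__ == "__main__":
    print(demo())  # ([30, 20], ['scale'])
```

On the rerun, only `scale` executes: the output of `clean` is read back from its checkpoint.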
Ab Initio: What We Do
Ab Initio software helps you build large-scale data processing applications and run them in parallel environments. Ab Initio software consists of two main programs:
Co>Operating System: which your system administrator installs on a host Unix or Windows NT server, as well as on processing computers.
The Graphical Development Environment (GDE): which you install on your PC (the GDE computer) and configure to communicate with the host.
The Ab Initio Co>Operating System
The Co>Operating System:
Runs across a variety of operating systems and hardware platforms, including OS/390 on the mainframe, Unix, and Windows.
Supports distributed and parallel execution.
Can provide scalability proportional to the hardware resources provided.
Supports platform-independent data transport.
The Ab Initio Co>Operating System (continued)
The Ab Initio Co>Operating System depends on parallelism to connect (i.e., cooperate with) diverse databases. It extracts, transforms, and loads data to and from Teradata and other data sources.
Figure: GDE clients on the top layer connect to the Co>Operating System layer, which runs on any OS (Solaris, AIX, NT, Linux, NCR). The same Co>Operating System commands work on any OS, so graphs can be moved from one OS to another without any changes.
The Ab Initio Co>Operating System runs on:
Sun Solaris
IBM AIX
Hewlett-Packard HP-UX
Siemens Pyramid Reliant UNIX
IBM DYNIX/ptx
Silicon Graphics IRIX
Red Hat Linux
Windows NT 4.0 (x86)
Windows 2000 (x86)
Compaq Tru64 UNIX
IBM OS/390
NCR MP-RAS
Connectivity to Other Software
Common, high-performance database interfaces:
IBM DB2, DB2/PE, DB2 EEE, UDB, IMS
Oracle, Informix XPS, Sybase, Teradata, MS SQL Server 7
OLE DB
ODBC
Other software packages:
Connectors to many other third-party products
Trillium, ERwin, Siebel, etc.
Ab Initio Co>Operating System
Ab Initio Software Corporation, headquartered in Lexington, MA, develops software solutions that process vast amounts of data (well into the terabyte range) in a timely fashion by employing many (often hundreds of) server processors in parallel. Major corporations worldwide use Ab Initio software in mission-critical, enterprise-wide data processing systems. Together, Teradata and Ab Initio deliver:
End-to-end solutions for integrating and processing data throughout the enterprise
Software that is flexible, efficient, and robust, with unlimited scalability
Professional and highly responsive support
The Co>Operating System executes your application by creating and managing the processes and data flows that the components and arrows represent.
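The idea that an application is a graph of components connected by arrows can be pictured as a dependency graph. The Python sketch below illustrates that model only; it is not the Co>Operating System's scheduler. Components are hypothetical functions, arrows are `(upstream, downstream)` pairs, and Python's standard `graphlib.TopologicalSorter` (Python 3.9+) orders execution so that each component runs only after the components that feed it.

```python
from graphlib import TopologicalSorter

def run_graph(components, flows, sources):
    """Run a dataflow graph of components connected by flows (arrows).

    components: name -> function taking a list of inputs (one per incoming flow)
    flows:      (upstream, downstream) pairs, listed in input-port order
    sources:    name -> initial data for components with no incoming flow
    Returns each component's output, keyed by component name.
    """
    upstreams = {name: [] for name in components}
    for up, down in flows:
        upstreams[down].append(up)
    outputs = {}
    # static_order() yields every component after all of its upstream components.
    for name in TopologicalSorter(upstreams).static_order():
        inputs = [outputs[u] for u in upstreams[name]] or [sources[name]]
        outputs[name] = components[name](inputs)
    return outputs

def example_graph():
    """Hypothetical graph: two inputs feed a join, which feeds an output."""
    components = {
        "read_customers": lambda ins: ins[0],
        "read_orders": lambda ins: ins[0],
        # Total each customer's orders (input 0: customers, input 1: orders).
        "join": lambda ins: [
            {**c, "total": sum(o["amt"] for o in ins[1] if o["cust"] == c["id"])}
            for c in ins[0]
        ],
        "write": lambda ins: sorted(ins[0], key=lambda r: r["id"]),
    }
    flows = [("read_customers", "join"), ("read_orders", "join"), ("join", "write")]
    sources = {
        "read_customers": [{"id": 1}, {"id": 2}],
        "read_orders": [{"cust": 1, "amt": 5}, {"cust": 1, "amt": 7}, {"cust": 2, "amt": 1}],
    }
    return run_graph(components, flows, sources)
```

This sketch runs one component at a time. As the text above says, the Co>Operating System instead creates processes and data flows for the components and arrows, so independent components and successive pipeline stages can run concurrently.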
Graphical Development Environment (GDE)
The GDE
The Graphical Development Environment (GDE) provides a graphical user interface to the services of the Co>Operating System. The GDE enables you to create applications by dragging and dropping components, and lets you perform point-and-click operations on executable flowcharts. The Co>Operating System can execute these flowcharts directly. Graphical monitoring of running applications allows you to quantify data volumes and execution times, helping you spot opportunities to improve performance.
The Graph Model
The Component Library:
The Component Library: reusable software modules for sorting, data transformation, database loading, etc. The components adapt at run time to the record formats and business rules that control their behavior.
Ab Initio products have helped reduce a project's development and research time significantly.
Components
Components may run on any computer running the Co>Operating System.
Different components do different jobs.
The particular work a component accomplishes depends on its parameter settings.
Some parameters are data transformations, that is, business rules to be applied to one or more inputs to produce a required output.
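A component whose behavior comes from its parameters, adapting at run time to a record format and applying a transformation passed in as a business rule, can be sketched as follows. This minimal Python illustration is not Ab Initio DML or a real Ab Initio component: the record format is a hypothetical list of `(field name, width, type)` entries for fixed-width text, and `reformat_component` is a made-up name.

```python
from typing import Callable

# Hypothetical record format: a list of (field name, width in characters, type).
RecordFormat = list[tuple[str, int, type]]

def parse_fixed_width(line: str, fmt: RecordFormat) -> dict:
    """Split one fixed-width line into typed fields as described by fmt."""
    record, pos = {}, 0
    for name, width, ftype in fmt:
        record[name] = ftype(line[pos:pos + width].strip())
        pos += width
    return record

def reformat_component(lines, fmt: RecordFormat,
                       transform: Callable[[dict], dict]) -> list:
    """Generic component: what it does depends only on its two parameters.

    fmt is the record format (metadata) interpreted at run time;
    transform is the business rule applied to each parsed record.
    """
    return [transform(parse_fixed_width(line, fmt)) for line in lines]
```

Nothing about the data layout is hard-coded: passing a different `fmt` to the same component reads the same lines with a different structure, and passing a different `transform` changes the business rule without changing the component.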
3rd Party Components
EME
The Enterprise Meta>Environment (EME) is a high-performance, object-oriented storage system that inventories and manages various kinds of information associated with Ab Initio applications. It provides storage for all aspects of your data processing system, from design information to operations data.
The EME also provides a rich store for the applications themselves, including data formats and business rules. It acts as a hub for data and definitions. Integrated metadata management provides a global, consolidated view of the structure and meaning of applications and data, information that is usually scattered throughout your business.
Benefits of EME

The Enterprise Meta>Environment provides a rich store for applications and all of their associated information, including:
Technical metadata: application-related business rules, record formats, and execution statistics
Business metadata: user-defined documentation of job functions, roles, and responsibilities
Metadata is data about data and is critical to understanding and driving your business processes and computational resources. Storing and using metadata is as important to your business as storing and using data.
EME-Ab Initio Relevance
By integrating technical and business metadata, you can grasp the entirety of your data processing, from operational to analytical systems.
The EME is a completely integrated environment. The following figure shows how it fits into the high-level architecture of Ab Initio software.
Stepwise Explanation of Ab Initio Architecture
You construct your application from building blocks called components, manipulating them through the Graphical Development Environment (GDE).
You check your applications in to the EME.
The EME and GDE use the underlying functionality of the Co>Operating System to perform many of their tasks. The Co>Operating System unites the distributed resources into a single virtual computer to run applications in parallel.
Ab Initio software runs on Unix, Windows NT, and MVS operating systems.
Stepwise Explanation of Ab Initio Architecture (continued)
Ab Initio connector applications extract metadata from third-party metadata sources into the EME, or extract it from the EME into a third-party destination.
You view the results of project and application dependency analysis through a web user interface. You also view and edit your business metadata through a web user interface.
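Dependency analysis answers questions such as "if this dataset changes, which graphs and datasets downstream are affected?" The sketch below shows the underlying idea as a breadth-first search over a lineage graph in Python. It is not the EME's data model or API, and every dataset and graph name in it is hypothetical.

```python
from collections import deque

def downstream_impact(lineage, changed):
    """Return every object reachable downstream of `changed`, in BFS order.

    lineage maps each object (dataset or graph) to the objects that
    consume it directly.
    """
    seen = {changed}
    order = []
    todo = deque([changed])
    while todo:
        node = todo.popleft()
        for consumer in lineage.get(node, []):
            if consumer not in seen:  # visit each object only once
                seen.add(consumer)
                order.append(consumer)
                todo.append(consumer)
    return order

# Hypothetical lineage: datasets feed graphs, and graphs write datasets.
LINEAGE = {
    "customers.dat": ["load_customers.mp"],
    "load_customers.mp": ["dw.customer_dim"],
    "dw.customer_dim": ["build_sales_mart.mp"],
    "orders.dat": ["build_sales_mart.mp"],
    "build_sales_mart.mp": ["mart.sales"],
}
```

For example, `downstream_impact(LINEAGE, "orders.dat")` lists the graph that reads the file and the mart that graph writes. Those are the objects a developer would review before changing the file's format.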
EME: Various User Constituencies Served
The EME addresses the metadata needs of three different constituencies:
Business users
Developers
System administrators
EME: Various User Constituencies Served
Business users are interested in exploiting data for analysis, in particular with regard to databases, tables, and columns.
Developers tend to be oriented toward applications, needing to analyze the impact of potential program changes.
System administrators and production personnel want job status information and run statistics.
EME Interfaces
We can create and manage the EME through three interfaces:
GDE
Web User Interface
Air Utility
Thank You
End of Session 1
