Professional Documents
Culture Documents
5 IBM Presentation 268018 7 PDF
5 IBM Presentation 268018 7 PDF
order of Teradata
Stewart Hanna
Product Manager
Agenda
Platform Overview
Appetizer - Productivity
First Course Extensibility
Second Course Scalability
Desert Teradata
2
Information Management Software
Agenda
Platform Overview
Appetizer - Productivity
First Course Extensibility
Second Course Scalability
Desert Teradata
3
Information Management Software
Business
Value
Maturity of
Information Use
4
Information Management Software
5
Information Management Software
6
Information Management Software
InfoSphere DataStage
Provides codeless visual design of
data flows with hundreds of built-in
transformation functions
Optimized reuse of integration Developers Architects
objects
Supports batch & real-time Transform InfoSphere DataStage
operations Transform and aggregate any volume
of information in batch or real time
Produces reusable components that through visually designed logic
can be shared across projects
Deliver
Complete ETL functionality with
metadata-driven productivity
Supports team-based development
and collaboration
Provides integration from across the
broadest range of sources Hundreds of Built-in
Transformation Functions
7
Information Management Software
Agenda
Platform Overview
Appetizer - Productivity
First Course Extensibility
Second Course Scalability
Desert Teradata
8
Information Management Software
DataStage Designer
Complete
development
environment
Graphical, drag and
drop top down design
metaphor
Develop sequentially,
deploy in parallel
Component-based
architecture
Reuse capabilities
9
Information Management Software
Transformation Components
Lookup
Sort, Aggregation,
Transformer
Pivot, CDC
Join, Merge
Filter, Funnel, Switch, Modify
Remove Duplicates
Restructure stages
10
Information Management Software
Combining Data:
- Lookup, Joins, Merge
- Aggregator
Transform Data:
- Transformer, Remove Duplicates
Ancillary:
- Row Generator, Peek, Sort
11
Information Management Software
Deployment
12
Information Management Software
13
Information Management Software
InfoSphere FastTrack
To reduce Costs of Integration Projects through Automation
Business analysts and IT
collaborate in context to
create project specification
Leverages source
analysis, target models, Specification
and metadata to facilitate
mapping process
Auto-generation of
data transformation
jobs & reports
Generate historical
documentation for tracking
Supports data governance Auto-generates
DataStage jobs
Flexible Reporting
14
Information Management Software
Agenda
Platform Overview
Appetizer - Productivity
First Course Extensibility
Second Course Scalability
Desert Teradata
15
Information Management Software
16
Information Management Software
17
Information Management Software
18
Information Management Software
19
Information Management Software
Agenda
Platform Overview
Appetizer - Productivity
First Course Extensibility
Second Course Scalability
Desert Teradata
20
Information Management Software
ing
g
ing
in
Divide the laundry into different loads (whites, darks, ion
on
ion
linens)
i
rt i t
rtit
rt i t
Wash
Pa
pa
Dry Fold
Pa
Work on each piece independently
Re
Whites, Darks, Linens Delicates, Lingerie, Outside Shirts on Hangers, pants
3 Times faster with 3 Washing machines, continue with 3 Workclothes Linens folded
dryers and so on
21
Information Management Software
Data Partitioning
Transform Processor 1
A-F
G- M Transform Processor 2
Source
Data
N-T
Transform Processor 3
U-Z
Transform Processor 4
Partitioning
available
Types of
options
Break up big data into partitions
Run one partition on each processor
4X times faster on 4 processors; 100X faster on 100 processors
Partitioning is specified per stage meaning partitioning can change
between stages
Information Management Software
Data Pipelining
Records 9,001 1,000 records 1,000 records 1,000 records 1,000 records 1,000 records 1,000 records 1,000 records 1,000 records
to 100,000,000 eighth chunk seventh chunk sixth chunk fifth chunk fourth chunk third chunk second chunk first chunk
Archived Data
Load
Eliminate the write to disk and the read from disk
Source between processes Target
Start a downstream process while an upstream
process is still running.
This eliminates intermediate staging to disk, which is
critical for big data.
This also keeps the processors busy.
Information Management Software
Parallel Dataflow
g
ionin
it
Part
Source
Data
Pipelining
Customer last name Customer Postcode Credit card number
ing
ing
g
U-Z
nin
ion
ion
N-T
itio
G- M
rtit
Source
rtit
rt
Data A-F
pa
pa
Pa
Re
Re
Application Assembly: One Dataflow Graph Created With the DataStage GUI
Source Data
Data Warehouse
Source Target
Source Data
Data Warehouse
Source Target
Sequential U-Z
ing
in g
in g
N-T
32 4-way nodes
io n
on
on
Source G- M Data
rtit
ti
ti
rti
rti
Data A-F Warehouse
Pa
pa
pa
..
..
Re
Re
..
Customer last name Cust zip code Credit card number
..
..
Target
..
Source
4-way parallel U-Z
ing
N-T
in g
in g
io n
G- M
on
on
Source Data
rtit
ti
ti
Data Warehouse
rti
rti
A-F
Pa
pa
pa
Re
Re
U-Z
ing
N-T
in g
in g
io n
on
G- M
on
Source Data
rtit
ti
ti
rti
rti
Data A-F Warehouse
Pa
pa
pa
Re
Re
128-way parallel
26
Information Management Software
File Set stage: allows you to read data from or write data to a
file set. It only executes in parallel mode.
Sequential File stage: reads data from or writes data to one or more
flat files. It usually executes in parallel, but can be configured to execute
sequentially.
27
Information Management Software
(One object)
. . .
SAS stage:
Executes part or all of a SAS application in parallel by processing
parallel streams of data with parallel instances of SAS DATA and
PROC steps.
A parallel SAS Data Set is a set of one or more sequential SAS data
sets, with a header file specifying the names and locations of all of the
component files.
Information Management Software
30
Information Management Software
31
Information Management Software
32
Information Management Software
33
Information Management Software
DataStage
Designer
c design
job original
DataStage d compile
job & run
e verify
i manually review/ job
f optimize Balanced Optimization edit optimized job results
job
rewritten
optimized g compile
job & run
34
Information Management Software
35
Information Management Software
Agenda
Platform Overview
Appetizer - Productivity
First Course Extensibility
Second Course Scalability
Desert Teradata
36
Information Management Software
Frictionless Connectivity
37
Information Management Software
Teradata Connectivity
7+ highly optimized interfaces for
Teradata leveraging Teradata Tools
FastLoad
and Utilities: FastLoad, FastExport,
TPUMP, MultiLoad, Teradata FastExport
Parallel Transport, CLI and ODBC
MulitLoad
You use the best interfaces
specifically designed for your TPUMP
Teradata
integration requirements: EDW
Real-time/transactional trickle
feeds without table locking
38
Information Management Software
Teradata Connectivty
RequestedSessions determines the total number of distributed
connections to the Teradata source or target
When not specified, it equals the number of Teradata VPROCs
(AMPs)
Can set between 1 and number of VPROCs
39
Information Management Software
Example Settings
Configuration File Sessions Per Player Total Sessions
16 nodes 1 16
8 nodes 2 16
8 nodes 1 8
4 nodes 4 16
40
Information Management Software
32 Gb
Interconnect
24 Port Switch 24 Port Switch
TeradataTeradataTeradataTeradataTeradataTeradata TeradataTeradataTeradataTeradataTeradataTeradata
Node Node Node Node Node Node Node Node Node Node Node Node
Teradata
Legend
Stacked Switches
Linux Grid
Teradata
41
Information Management Software
42
Information Management Software
43