V. BEYET
03/07/2006
Presentation ...
Who am I ?
Summary
General presentation
DataStage : What is it ?
A graphic environment
Data transformation :
Select,
Format,
Combine,
Aggregate,
Sort.
Development is done :
in client-server mode,
with a graphical design of flows,
with simple and basic elements,
with a simple language (BASIC).
Processing jobs are :
Compiled and run by an engine,
Stored in a UniVerse database.
Diagram : the Designer, Manager, Director and Administrator clients, and the Server
The programs :
Called Jobs : first as source code and then as executable programs, stored in the UniVerse database.
Data :
May be stored in the UniVerse database, but preferably in server directories.
Server
UniVerse Database :
A Hash file is an indexed file; it's the central element for using all the possibilities of the DataStage engine.
A Hash file with incorrectly defined keys may cause disastrous problems.
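Why the keys matter can be illustrated with a rough Python sketch (not DataStage code). It assumes a hash file behaves like a dictionary keyed on its key columns, so that a row written with an existing key replaces the previous row. The function name, column names and sample rows are hypothetical.

```python
def write_hash_file(rows, key_columns):
    """Store rows in a dict keyed on key_columns.

    Assumption: a later row with the same key replaces the earlier one.
    """
    hash_file = {}
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        hash_file[key] = row
    return hash_file


# Hypothetical sample data
films = [
    {"film_id": 1, "type_code": "A", "title": "Film one"},
    {"film_id": 2, "type_code": "A", "title": "Film two"},
    {"film_id": 3, "type_code": "B", "title": "Film three"},
]

good = write_hash_file(films, ["film_id"])    # unique key: every row is kept
bad = write_hash_file(films, ["type_code"])   # non-unique key: rows are silently lost

print(len(good), len(bad))
```

With the badly chosen key, two films share the key "A" and only the last one survives, without any error being raised.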
Designer
The designer
Passive stages : a place for data storage (the data flows from the stage or to the stage)
(DataStage engine).
link.
Active stages
An active stage is a representation of a transformation on the dataflow :
Sort : of a file
Aggregator : calculations
links
DataStage Designer :
Each job has :
- one or more sources of data
- one or more transformations
- one or more destinations for the data
The toolbar contains the stage icons used to design the jobs.
The jobs have to be compiled to create executable programs.
The repository
The toolbar with stage icons (palette)
Can be read,
Can be written,
Can be read and written in the same job,
Can be written with or without a cache,
Can be a DOS file or a Unix file …
Can be read by two jobs at the same time
Can't be written by two jobs at the same time
Sequential File :
Stage name
File Type
Stage description
Sequential File :
Output link
Sequential File :
Different columns of the file (Output) : type, length
Size to display (for View Data)
To test the connection and view the data in the file
Sequential File :
View Data
Transformer Stage :
Can process data with :
native BASIC functions or functions created in the Manager,
DataStage functions or DataStage macros,
routines (before/after type)
Or only propagate columns.
Transformer Stage :
Output data
Input data
Right click : propagate all the columns
Exercise n°1 :
Objective : Read a sequential file and create a new one (save the file)
The catalogue.in file has to be read and the catalogue_save.tmp file has to be written
Steps :
1- Create a table definition (structure of the Catalogue table)
2- Design the job with 2 Sequential Files and 1 Transformer
3- Create the links (data flow)
4- Save and Compile the job
5- Run the job
6- Look at the performance statistics (right click)
Exercise n°2 :
Hash File :
Stage name
Account name
(DataStage project)
File path
Principal Flow
(horizontal)
Exercise n°3 :
Objective : make a lookup between the Catalog file and the Film Type file to put the film type in the output file.
Steps :
1- Create a table definition (structure of the FilmType table)
2- Modify your job to create a Hash File from the FilmType.in file
3- Create the link to show the lookup (data flow)
4- Save and Compile the job
5- Run the job
6- Look at the performance statistics (right click)
Exercise n°4 :
Objective : put the director name and the film name together
separated by a “>”. If the film type is not found, put “unknown
type” in the output file. What happens when the director name is
empty ? Find a solution.
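The logic of this exercise can be sketched in Python (an illustration only, not the Transformer derivation itself). The column names and sample data are hypothetical, and replacing a missing director name with an empty string before concatenating is only one possible solution.

```python
def build_output_row(film, film_types):
    """Look up the film type and build the "director>title" label.

    film_types plays the role of the lookup Hash File (type code -> type label).
    """
    type_label = film_types.get(film["type_code"])
    if type_label is None:
        # Lookup failed: use the default text required by the exercise
        type_label = "unknown type"
    # Assumption: an empty or missing director is replaced by "" before concatenation
    director = film["director"] or ""
    return {"label": director + ">" + film["title"], "type": type_label}


# Hypothetical sample data
film_types = {"A": "Adventure"}
row_ok = build_output_row({"director": "Smith", "title": "Film one", "type_code": "A"}, film_types)
row_ko = build_output_row({"director": None, "title": "Film two", "type_code": "Z"}, film_types)
print(row_ok)
print(row_ko)
```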
Exercise n°5 :
Objective : If the film type is not found (use a constraint), put the film in a rejects file (first a Sequential File, then a Hash File)
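A constraint routes each row to one output link or another. The Python sketch below only illustrates that routing idea, under the assumption that a row goes to the rejects output when its type code is absent from the lookup. All names and sample data are hypothetical.

```python
def split_on_lookup(films, film_types):
    """Return (accepted, rejected) rows depending on whether the lookup succeeds."""
    accepted, rejected = [], []
    for film in films:
        if film["type_code"] in film_types:
            # Lookup OK: enrich the row with the type label
            accepted.append({**film, "type": film_types[film["type_code"]]})
        else:
            # Lookup failed: the row goes to the rejects output unchanged
            rejected.append(film)
    return accepted, rejected


# Hypothetical sample data
film_types = {"A": "Adventure"}
films = [
    {"title": "Film one", "type_code": "A"},
    {"title": "Film two", "type_code": "Z"},
]
accepted, rejected = split_on_lookup(films, film_types)
print(len(accepted), len(rejected))
```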
Don't forget : a lookup can be designed with an ORAOCI stage or a UV stage, but Hash Files are the better choice.
Exercise n°6 :
Objective : Select only the films for which the type is known
(that means that the lookup is OK)
Exercise n°7 :
Objective : Select all the clients who are female to put them in
an output file
The SEXE column contains M (Male) or F (female)
And then create an annotation for this job (all the jobs must have annotations)
The director Director
Run jobs
Immediately or later, with more options than in the Designer
Job monitoring
To check the number of rows processed by each active stage of a job.
Rows limit : the job stops after x rows (on each flow)
Warnings limit : the job stops after x warnings
The status :
• "Not compiled"
• "Compiled"
• "Failed validation"
• "Validated ok"
• "Aborted"
• "Finished"
• "Running"
Example of a log :
To look at error messages,
choose the job and click on the
“log” button
Green : OK – No problem
Yellow : warning
Red : blocking problem
Don’t forget : Clear the log from time to time (Job>Clear log).
The manager Manager
• Jobs
• Routines
• Table definitions
• Jobs
• Table definitions
IMPORT
This will create/modify elements in
the DataStage Project
With the Manager, you can compile many jobs at the same time (multiple job compile) :
select the type of jobs you want to compile, select “Show manual selection page” and click on the “Next” button
Designer
The designer
Sort Stage :
Exercise n°8 :
Objective : Once you have selected all the women, sort the file in alphabetical order.
Aggregator Stage :
Group by
Different
functions
Exercise n°9 :
Exercise n°10 :
Exercise n°11 :
Objective : With the job from exercise 10 (use the 2 methods in
the same job), create a Hash File to put the different results in the
same Hash File.
Column 1 : “AVERAGE METHOD 1” or “AVERAGE
METHOD 2”
Column 2 : the result of each method
In the Hash file, you must have 2 lines.
Stage Variables :
Simple processing can easily be done with stage variables.
- A stage variable is a value that stays “active” for the whole duration of the stage, so you can find a max (if the data is sorted), calculate a sum or count something.
- In the Transformer, right-click and then select “Show Stage variables”. Example :
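As an illustration only (a Python sketch, not a Transformer definition), stage variables can be pictured as values that are kept from one row to the next while the stage runs. The variable names and the sample input are hypothetical.

```python
def run_transformer(amounts):
    """Process rows one by one, keeping running values between rows."""
    # These three names play the role of stage variables
    sv_count = 0
    sv_sum = 0
    sv_max = None
    output = []
    for amount in amounts:
        sv_count += 1
        sv_sum += amount
        if sv_max is None or amount > sv_max:
            sv_max = amount
        # Each output row can use the current value of every stage variable
        output.append((amount, sv_count, sv_sum, sv_max))
    return output


result = run_transformer([3, 5, 2])
print(result[-1])  # values of the variables after the last row
```

After the last row, the variables hold the row count, the total and the maximum, which is also enough to derive an average (sum divided by count).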
Another example :
Exercise n°12 :
Objective : Try to calculate the average with stage variables.
Exercise n°13 :
Objective : Create a job that creates a file with all the clients (key) and, in a second column, the list of their films (separated by a dot).
DataStage Variables :
Different variables are defined by DataStage :
- @NULL
- @INROWNUM, @OUTROWNUM
- @DATE
- @TRUE, @FALSE
- @PATH
Link Variables :
The most useful is : NOTFOUND
Routines :
- Source code (written in the BASIC language)
- It is external to the jobs and can be reused many times at many levels
- It can be a Transform function or a Before/After Function :
a transform function is called for each row
a before subroutine is called before the first row (example : empty a file)
an after subroutine is called when all the rows have been processed
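The calling order described above can be pictured with a small Python sketch (an illustration, not DataStage BASIC). The function names are hypothetical. It assumes one before call, one transform call per row, then one after call.

```python
def run_stage(rows, transform, before=None, after=None):
    """Call before once, transform once per row, then after once."""
    if before is not None:
        before()                      # before the first row, e.g. empty a file
    output = [transform(row) for row in rows]
    if after is not None:
        after()                       # once all the rows have been processed
    return output


calls = []
output = run_stage(
    ["a", "b"],
    transform=lambda row: calls.append("transform") or row.upper(),
    before=lambda: calls.append("before"),
    after=lambda: calls.append("after"),
)
print(calls)
print(output)
```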
Routines (3/3)
Code : use Argument names
Save
Compile
Test of the routine
Routines :
Call DSLogInfo("Information", "RoutineName")
Call DSLogWarn("Warning", "RoutineName")
Call DSLogFatal("Abort", "RoutineName")
For i= … To
Next i
Routines : Test
Exercise n°14 :
Step 1 :
Exercise n°14 :
Step 2
Exercise n°15 :
Objective : With a routine (use CASE), calculate the amount for the cassette hire (number of days * hire price * coefficient).
The coefficient depends on the number of days :
< 5 days : days * hire price
>= 5 and < 10 days : days * hire price * 1.20
>= 10 and < 30 days : days * hire price * 1.50
>= 30 days : days * hire price * 3
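The rule can be checked against a short Python sketch (an illustration of the logic, not the BASIC routine asked for in the exercise). The function and argument names are hypothetical, and a coefficient of 1 is assumed for hires of less than 5 days, since the rule applies no multiplier there.

```python
def hire_amount(days, hire_price):
    """Return days * hire price * coefficient, following the exercise's rule."""
    if days < 5:
        coefficient = 1.0    # assumption: no multiplier below 5 days
    elif days < 10:
        coefficient = 1.20
    elif days < 30:
        coefficient = 1.50
    else:
        coefficient = 3.0
    return days * hire_price * coefficient


for days in (4, 5, 10, 30):
    print(days, hire_amount(days, 2.0))
```

The loop exercises each boundary of the rule (4, 5, 10 and 30 days), which is where an off-by-one mistake in the CASE conditions would show up.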
UV Stage :
– works with internal hash files (in the DataStage project)
– makes a Cartesian product
– uses SQL queries (select … from … where … order by …)
Step 3 :
The normalization :
Normalization : the single row
12 A|B|C|D|E
becomes five rows :
12 A
12 B
12 C
12 D
12 E
Un-normalization :
Normalization :
A multi-valued file must have :
1- a key
2- char(253) or @VM as the separator
3- the “Normalize On” field of the Hash File checked
4- the column(s) to normalize
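What normalization does to the data can be sketched in Python (an illustration only, the Hash File stage does this itself once “Normalize On” is set). The function names are hypothetical. The separator is char(253), as stated above.

```python
VM = chr(253)  # value mark, the separator between the values of a multi-valued field


def normalize(key, multivalued_field):
    """Turn one multi-valued row into one row per value."""
    return [(key, value) for value in multivalued_field.split(VM)]


def unnormalize(rows):
    """Reverse operation: group the values of each key back into one row."""
    grouped = {}
    for key, value in rows:
        grouped.setdefault(key, []).append(value)
    return [(key, VM.join(values)) for key, values in grouped.items()]


rows = normalize("12", VM.join(["A", "B", "C", "D", "E"]))
print(rows)
```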
Query generated by DataStage or user-defined query
Selection of the columns
“Group by” clause
Enter custom SQL statement : when you want to add something specific
Important parameters
Number of rows between two commits
Process rows one by one
Number of rows between two commits
On the repository,
The administrator
Administrator
The Administrator :
And click on the Command button
Create a project
Location for the Project (jobs,
routines, UV hash files, table