Data Stage1

Published by mukesh

Published by: mukesh on Aug 23, 2013
Copyright:Attribution Non-commercial

03/19/2014

File Stages
There are 8 file stages.
1) Complex Flat File (CFF)
- The Complex Flat File (CFF) stage is a file stage.
- You can use the stage to read a file or write a file, but you cannot use the same stage to do both.
- The stage can have a single input link or a single output link, as well as a single reject link.
- When used as a source, the stage allows you to read data from one or more complex flat files, including MVS datasets with QSAM and VSAM files. A complex flat file may contain one or more GROUPs, REDEFINES, OCCURS, or OCCURS DEPENDING ON clauses. Complex Flat File source stages execute in parallel mode when they are used to read multiple files, but you can configure the stage to execute sequentially if it is only reading one file with a single reader.
- When used as a target, the stage allows you to write data to one or more complex flat files.
- It does not write to MVS datasets.
- To use the CFF stage:
• In the File Options Tab, specify the stage properties.
If reading a file or files:
– Specify the type of file you are reading.
– Give the name of the file or files you are going to read.
– Specify the record type of the files you are reading.
– Define what action to take if files are missing from the source.
– Define what action to take with records that fail to match the expected metadata.
If writing a file or files:
– Specify the type of file you are writing.
– Give the name of the files you are writing.
– Specify the record type of the files you are writing.
– Define what action to take if records fail to be written to the target file(s).
• In the Record Options Tab, describe the format of the data you are reading or writing.
• In the Stage page Columns Tab, define the column definitions for the data you are reading or writing using this stage.
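To make the OCCURS DEPENDING ON idea and the reject link more concrete, here is a minimal Python sketch. It is not how the CFF stage is implemented; the record layout, field names, and the functions parse_record and read_records are all hypothetical and exist only for illustration. A count field in the record header decides how many times the repeating group occurs, and records that do not match the expected layout are set aside, much as a reject link would collect them.

```python
import struct

# Hypothetical layout, loosely mirroring a COBOL copybook (an assumption,
# not taken from the notes):
#   CUST-ID     PIC X(4)
#   ITEM-COUNT  PIC 9(2)
#   ITEMS OCCURS 0 TO 99 TIMES DEPENDING ON ITEM-COUNT
#       ITEM-CODE  PIC X(3)
#       ITEM-QTY   PIC 9(2)
HEADER = struct.Struct("4s2s")
ITEM = struct.Struct("3s2s")


def parse_record(raw: bytes) -> dict:
    """Parse one record whose repeating group length depends on ITEM-COUNT."""
    cust_id, count_text = HEADER.unpack_from(raw, 0)
    count = int(count_text.decode("ascii"))
    expected = HEADER.size + count * ITEM.size
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(raw)}")
    items = []
    offset = HEADER.size
    for _ in range(count):
        code, qty = ITEM.unpack_from(raw, offset)
        items.append({"code": code.decode("ascii"), "qty": int(qty.decode("ascii"))})
        offset += ITEM.size
    return {"cust_id": cust_id.decode("ascii"), "items": items}


def read_records(raw_records):
    """Return (parsed, rejected); rejected plays the role of a reject link."""
    parsed, rejected = [], []
    for raw in raw_records:
        try:
            parsed.append(parse_record(raw))
        except (ValueError, struct.error):
            rejected.append(raw)
    return parsed, rejected
```

In this sketch, a record such as b"C00102ABC05XYZ12" declares two items and carries exactly two, so it parses; a record that declares three items but carries one is rejected.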
2) Data Set
- The Data Set stage is a file stage.
- What is a data set? DataStage parallel extender jobs use data sets to manage data within a job. You can think of each link in a job as carrying a data set. The Data Set stage allows you to store data being operated on in a persistent form, which can then be used by other DataStage jobs. Data sets are operating system files, each referred to by a control file, which by convention has the suffix .ds. Using data sets wisely can be key to good performance in a set of linked jobs.
- If you open the file directly in the operating system, it is shown in a different format.
- A data set comprises a descriptor file and a number of other files that are added as the data set grows. These files are stored on multiple disks in your system. A data set is organized in terms of partitions and segments. Each partition of a data set is stored on a single processing node. Each data segment contains all the records written by a single DataStage job. So a segment can contain files from many partitions, and a partition has files from many segments.
- The descriptor file for a data set contains the following information:
• Data set header information.
• Creation time and date of the data set.
• The schema of the data set.
• A copy of the configuration file used when the data set was created.
- For each segment, the descriptor file contains:
• The time and date the segment was added to the data set.
• A flag marking the segment as valid or invalid.
• Statistical information such as number of records in the segment and number of bytes.
• Path names of all data files, on all processing nodes.
This information can be accessed through the Data Set Manager.
- A data set can also act as a lookup.
- You can delete a data set from UNIX with the command orchadmin rm <<path>>/<<dataset name>>.
- You can append to or overwrite a data set.
- Through the Data Set management utility you can delete a partition, a segment, or the entire data set.
- The stage can have a single input link or a single output link. It does not support a reject link.
- It can be configured to execute in parallel or sequential mode.
- As a source, you cannot specify the partition from which to read data.
- As a target, you can specify the partitioning technique to be used.
- A data set does not enforce unique records. If you send duplicate records to it, it stores the duplicates as they are.
- You can change the configuration file between runs and still load data into the same data set. With Append, the descriptor file keeps the details of the configuration file used when the data set was first created. With Overwrite, the descriptor file holds only the new configuration details.
- If you set the update policy to Append and send data with more columns than the data set contains, the stage loads the data for the existing matching columns into the data set and logs a warning.
- Questions:
1) How do you count the number of records in a data set?
Ans) dsrecords <<DataSet Name>>
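The append and overwrite behaviour described above can be pictured with a small in-memory model. This is only a toy sketch under stated assumptions: ToyDataSet, Segment, and write are hypothetical names, real data sets are binary files managed by DataStage, and the caller is assumed to pass rows that are already partitioned by node. It shows a descriptor that keeps its original configuration on append, takes the new one on overwrite, drops unmatched columns with a warning, and keeps duplicates.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    valid: bool
    partitions: dict  # node name -> list of row dicts written by one job run


@dataclass
class ToyDataSet:
    schema: list        # column names recorded in the descriptor
    config_nodes: list  # copy of the configuration used at creation time
    segments: list = field(default_factory=list)
    warnings: list = field(default_factory=list)

    def record_count(self) -> int:
        return sum(len(rows) for seg in self.segments
                   for rows in seg.partitions.values())


def write(dataset, rows_by_node, config_nodes, schema, policy):
    """Append to or overwrite a toy data set and return the resulting object."""
    if dataset is None or policy == "overwrite":
        # Overwrite (or first creation): descriptor holds only the new details.
        dataset = ToyDataSet(schema=list(schema), config_nodes=list(config_nodes))
    elif policy != "append":
        raise ValueError(f"unknown update policy: {policy}")
    extra = [col for col in schema if col not in dataset.schema]
    if extra:
        # Append with extra columns: keep matching columns, log a warning.
        dataset.warnings.append(f"dropped columns not in data set: {extra}")
    partitions = {
        node: [{col: row[col] for col in dataset.schema} for row in rows]
        for node, rows in rows_by_node.items()
    }
    # Each job run adds one segment; duplicate rows are stored as they are.
    dataset.segments.append(Segment(valid=True, partitions=partitions))
    return dataset
```

Each call to write adds one segment, so a data set that was created once and appended to once has two segments, each of which can hold rows for several partitions.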
Note: You cannot use the same Data Set name as a source and target. You cannot update the Data Set.
3) External Source
- The External Source stage is a file stage.
- It allows you to read data that is output from one or more source programs.
- The stage calls the program and passes appropriate arguments.
- The stage can have a single output link and a single rejects link.
- It can be configured to execute in parallel or sequential mode.
- It cannot be used as a target.
- It does not take an input link.
- The External Source stage allows you to perform actions such as interfacing with databases not currently supported by the DataStage Enterprise Edition.
- When reading output from a program, DataStage needs to know something about its format. The information required is how the data is divided into rows and how rows are divided into columns. You specify this on the Format tab.
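As a rough analogy for what the External Source stage does, the following Python sketch calls a program, captures its output, and splits it into rows and columns. The function read_from_program, the comma delimiter, and the stand-in SOURCE_PROGRAM are assumptions made for illustration; in DataStage the row and column format is configured on the Format tab rather than in code.

```python
import subprocess
import sys


def read_from_program(argv, column_names, delimiter=","):
    """Run a source program and split its output into rows and columns.

    Rows are separated by newlines and columns by `delimiter`. Lines with
    the wrong number of columns are returned separately, in the spirit of
    a rejects link.
    """
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    rows, rejects = [], []
    for line in result.stdout.splitlines():
        fields = line.split(delimiter)
        if len(fields) == len(column_names):
            rows.append(dict(zip(column_names, fields)))
        else:
            rejects.append(line)
    return rows, rejects


# Stand-in source program (hypothetical): a Python one-liner that prints
# two well-formed rows and one malformed row.
SOURCE_PROGRAM = [
    sys.executable,
    "-c",
    "print('1,alice'); print('2,bob'); print('broken')",
]
```

Calling read_from_program(SOURCE_PROGRAM, ["id", "name"]) yields the two well-formed rows as dictionaries and the malformed line as a reject.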
4) External Target
- The External Target stage is a file stage. It allows you to write data to one or more programs. The stage can have a single input link and a single rejects link. It can be configured to execute in parallel or sequential mode.
- It cannot act as a source.
- The External Target stage allows you to perform actions such as interfacing with databases not currently supported by the DataStage Parallel Extender.
- When writing to a program, DataStage needs to know something about how to format the data. The information required is how the data is divided into rows and how rows are divided into columns. You specify this on the Format tab. Settings for individual columns can be overridden on the Columns tab using the Edit Column Metadata dialog box.
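The mirror image of reading from a program is formatting rows and piping them into one. The Python sketch below is an illustration only, not the stage's implementation; write_to_program, the comma delimiter, and the stand-in TARGET_PROGRAM are hypothetical. Rows are formatted as delimited lines and passed to the program on its standard input.

```python
import subprocess
import sys


def write_to_program(argv, rows, column_names, delimiter=","):
    """Format rows as delimited lines and feed them to a target program.

    Returns whatever the program prints on standard output.
    """
    payload = "".join(
        delimiter.join(str(row[col]) for col in column_names) + "\n"
        for row in rows
    )
    result = subprocess.run(
        argv, input=payload, capture_output=True, text=True, check=True
    )
    return result.stdout


# Stand-in target program (hypothetical): counts the lines it receives.
TARGET_PROGRAM = [
    sys.executable,
    "-c",
    "import sys; print(sum(1 for _ in sys.stdin))",
]
```

With two input rows, the stand-in program reports that it received two lines.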
5) File Set
- The File Set stage is a file stage.
- It allows you to read data from or write data to a file set.
- The stage can have a single input link or a single output link, and a single rejects link.
- It only executes in parallel mode.
- What is a file set? DataStage can generate and name exported files, write them to their destination, and list the files it has generated in a file whose extension is, by convention, .fs. The data files and the file that lists them are called a file set. This capability is useful because some operating systems impose a 2 GB limit on the size of a file and you need to distribute files among nodes to prevent overruns.
- The amount of data that can be stored in each destination data file is limited by the characteristics of the file system and the amount of free disk space available.
- The number of files created by a file set depends on:
• The number of processing nodes in the default node pool
• The number of disks in the export or default disk pool connected to each processing node in the default node pool
• The size of the partitions of the data set
- Unlike data sets, file sets carry formatting information that describes the format of the files to be read or written.
- The file name has the suffix .fs.
- It creates the data files under each node, in the directories specified in the configuration file.
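The idea of a file set, several data files plus one file that lists them, can be sketched in a few lines of Python. The on-disk layout here is an assumption for illustration: the actual contents of a DataStage .fs file are not shown in these notes, and write_file_set, read_file_set, and the .partNNNN naming are hypothetical. Rows are spread round-robin across one data file per node directory, and the listing file records the paths.

```python
import os
import tempfile


def write_file_set(fs_path, node_dirs, rows, delimiter=","):
    """Spread rows over one data file per node directory and list them in fs_path."""
    base = os.path.splitext(os.path.basename(fs_path))[0]
    # Round-robin split: partition i takes rows i, i + n, i + 2n, ...
    partitions = [rows[i::len(node_dirs)] for i in range(len(node_dirs))]
    data_paths = []
    for index, (directory, part) in enumerate(zip(node_dirs, partitions)):
        path = os.path.join(directory, f"{base}.part{index:04d}")
        with open(path, "w", encoding="utf-8") as handle:
            for row in part:
                handle.write(delimiter.join(row) + "\n")
        data_paths.append(path)
    # The listing file plays the role of the .fs file: one data file path per line.
    with open(fs_path, "w", encoding="utf-8") as listing:
        listing.write("\n".join(data_paths) + "\n")
    return data_paths


def read_file_set(fs_path, delimiter=","):
    """Read every data file named in the listing file and return all rows."""
    with open(fs_path, encoding="utf-8") as listing:
        data_paths = [line.strip() for line in listing if line.strip()]
    rows = []
    for path in data_paths:
        with open(path, encoding="utf-8") as handle:
            rows.extend(line.rstrip("\n").split(delimiter) for line in handle)
    return rows
```

Because each data file holds only part of the rows, no single file has to grow to the full size of the data, which is the point of distributing files among nodes.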
6) FTP Plug-in
- It allows only one input link or one output link.
- You can load data to, or get data from, a remote server.
As a source:
- It cannot produce a reference or reject link.
As a target:
- It cannot have a reference link.
