
DATASET

A Dataset stores data in the native format, in a file with the .ds extension. Dataset is a file stage used for staging data when we design dependent jobs. It supports either 1 input link or 1 output link, and the Dataset stage has no reject links. By default, a Dataset is processed in parallel.

A Dataset stores its data inside the Repository (i.e. inside DataStage), and it consists of multiple files:
a) Descriptor file
b) Data file
c) Control file
d) Header files

In the descriptor file, we can see the schema details and the address of the data. In the data file, we can see the data in native format. The control and header files reside in the operating system.

PIPELINE AND PARTITIONING
Pipeline parallelism means that as soon as data is available between stages (in pipes or links), it can be passed from one stage to the next without waiting for the entire record set to be read.

Partitioning parallelism means that the entire record set is partitioned into smaller sets, which are processed on different nodes (logical processors).
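The two ideas can be illustrated with a minimal Python sketch. This is not DataStage code: the function names (`read_rows`, `add_tax`, `partition_by_key`) and the sample columns are hypothetical, generators only imitate the row-by-row flow of a pipeline (they do not run stages concurrently), and the modulus rule is just one possible partitioning method.

```python
def read_rows(n):
    """Hypothetical source stage: yields rows one at a time."""
    for i in range(n):
        yield {"id": i, "amount": i * 10}


def add_tax(rows):
    """Hypothetical downstream stage: handles each row as soon as it arrives."""
    for row in rows:
        yield {**row, "tax": row["amount"] * 0.1}


def partition_by_key(rows, key, node_count):
    """Split the record set into one subset per node using key modulo node_count."""
    partitions = [[] for _ in range(node_count)]
    for row in rows:
        partitions[row[key] % node_count].append(row)
    return partitions


# Pipeline: the stages are chained, so the first row reaches add_tax
# before read_rows has produced the rest of the record set.
pipeline = add_tax(read_rows(6))
first = next(pipeline)

# Partitioning: the whole record set is split into smaller sets,
# one per node (logical processor).
parts = partition_by_key(read_rows(6), "id", node_count=2)
part_ids = [[row["id"] for row in part] for part in parts]
```

Here `first` is available after only one source row has been read, and `part_ids` holds the row ids assigned to each of the two nodes.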

FILE SET

1) It stores data in a format similar to a sequential file.
2) The only advantage of using a file set over a sequential file is that it preserves the partitioning scheme.
3) You can view the data, but only in the order defined by the partitioning scheme.
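What "preserving the partitioning scheme" means can be sketched in Python as follows. This is an illustration under stated assumptions, not the real File Set implementation: the JSON descriptor, the file names, and the helpers `write_file_set` and `read_file_set` are all hypothetical. The idea shown is that each partition is written to its own plain-text file and a small descriptor records the files in partition order, so reading the data back returns it partition by partition.

```python
import json
import tempfile
from pathlib import Path


def write_file_set(partitions, directory, name):
    """Write one text file per partition plus a descriptor listing them in order."""
    directory = Path(directory)
    data_files = []
    for index, rows in enumerate(partitions):
        path = directory / f"{name}.part{index}.txt"
        path.write_text("".join(f"{row}\n" for row in rows))
        data_files.append(path.name)
    # Hypothetical descriptor format, used only for this sketch.
    descriptor = directory / f"{name}.fs.json"
    descriptor.write_text(json.dumps({"data_files": data_files}))
    return descriptor


def read_file_set(descriptor):
    """Read the partitions back in the order recorded in the descriptor."""
    descriptor = Path(descriptor)
    names = json.loads(descriptor.read_text())["data_files"]
    return [(descriptor.parent / n).read_text().splitlines() for n in names]


with tempfile.TemporaryDirectory() as tmp:
    descriptor = write_file_set([["a", "c"], ["b", "d"]], tmp, "customers")
    restored = read_file_set(descriptor)

# Viewing the data follows partition order, not the original row order.
viewed = [row for part in restored for row in part]
```

A single sequential file would keep the rows but lose the information about which partition each row belonged to.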
