
1. Hi, I have a CSV file. I need to find out whether there are any duplicates in the file.

If any duplicate records are present, I need to reject them and write them to a sequential file. Can anybody let me know how I can do this in DataStage 8.1? My requirement is like this:

sno,sname
1,A
2,B
3,C
1,D
2,E

I want the desired output to be:

sno,sname
1,D
2,E

Ans 1: Sorry, to capture rejects you need to use a Transformer and then use stage variables to find the duplicates and capture them using constraints. In the Sequential File stage there is also an option called Filter, in which you can use a Unix command (uniq -d); with it you will get the duplicates.
The command given, uniq -d, identifies the list of repeated items. It works properly only when the data is sorted, so combine it with the sort command, e.g.:

cat <filename> | sort | uniq -d

The above will list each repeated item once.
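
To make the behaviour of the pipeline above concrete, here is a minimal Python sketch of what sort | uniq -d reports. It assumes the default behaviour of uniq, which compares whole lines; the function names repeated_lines and repeated_keys are hypothetical helpers for illustration, not part of DataStage or of uniq.

```python
from collections import Counter

# Sample data from the question (header line omitted).
rows = ["1,A", "2,B", "3,C", "1,D", "2,E"]

def repeated_lines(lines):
    """Mimic `sort | uniq -d`: one copy of each line that occurs more than once."""
    counts = Counter(lines)
    return sorted(line for line, n in counts.items() if n > 1)

def repeated_keys(lines, sep=","):
    """Same idea, but compare only the first field (sno) instead of the whole line."""
    counts = Counter(line.split(sep)[0] for line in lines)
    return sorted(key for key, n in counts.items() if n > 1)

print(repeated_lines(rows))  # [] because no whole line repeats
print(repeated_keys(rows))   # ['1', '2']
```

On the sample data no complete line is repeated, so a whole-line comparison finds nothing; the duplicates only show up when the comparison is restricted to the sno field.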

Ans 2: The Remove Duplicates stage does not support a reject link.

Ans 3:
Step 1: Sort the records on the column in which you are looking for duplicates.
Step 2: Suppose that column is SL_NO. In the Transformer, define two stage variables, Var1 and Var2, in this order:
Var1 = If SL_NO = Var2 Then 0 Else SL_NO
Var2 = SL_NO
Stage variables are evaluated from top to bottom, so when Var1 is computed, Var2 still holds the SL_NO of the previous row.
Step 3: Put the constraint Var1 = 0 on the reject file link and Var1 <> 0 on the target table link. You will then get no rejected records in the target, and all the rejected records will go to the file.

Ans 4: According to your requirement, it is better to use the Remove Duplicates stage. It has an option to retain the last duplicate, so you will get only the latest of the duplicate values in your output. The flow is SeqFile --> RemoveDuplicate --> SeqFile.
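
For the stage-variable approach in Ans 3, the following is a minimal Python sketch of the same logic, not DataStage code. It assumes the rows are (sno, sname) tuples and that a stable sort is used, so the first record seen for each key goes to the target; split_duplicates and its parameter names are hypothetical.

```python
_NO_PREVIOUS = object()  # marker for "no previous row yet"

def split_duplicates(rows, key_index=0):
    """Send the first record of each key to target and later ones to reject."""
    # Step 1: sort on the key. Python's sort is stable, so records that share
    # a key keep their original relative order.
    ordered = sorted(rows, key=lambda r: r[key_index])
    target, reject = [], []
    prev_key = _NO_PREVIOUS  # plays the role of Var2 (key of the previous row)
    for row in ordered:
        # Plays the role of the Var1 = 0 test: same key as the previous row.
        is_duplicate = row[key_index] == prev_key
        (reject if is_duplicate else target).append(row)
        prev_key = row[key_index]  # update after the comparison, like Var2
    return target, reject

rows = [(1, "A"), (2, "B"), (3, "C"), (1, "D"), (2, "E")]
target, reject = split_duplicates(rows)
print(target)  # [(1, 'A'), (2, 'B'), (3, 'C')]
print(reject)  # [(1, 'D'), (2, 'E')]
```

With the sample data from the question, the reject list contains exactly the 1,D and 2,E records that the question asks for.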

This works only when you need the most recently entered of the duplicate values in your output. There is no need to sort the data.

Assuming you have to get the last record for each key:

sno,sname
1,A
2,B
3,C
1,D
2,E

and the desired output is:

sno,sname
1,D
2,E

In the Sort stage, do a stable sort, so that records with the same key keep their original order, and in Remove Duplicates specify that the last record is to be retained. You can also try the other options mentioned by everyone.
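
The stable sort plus retain-last combination can be sketched in Python as follows. This is only an illustration of the idea under the assumption that rows are (sno, sname) tuples; retain_last is a hypothetical helper, not a DataStage function.

```python
def retain_last(rows, key_index=0):
    """Keep only the last record of each key after a stable sort on that key."""
    # Stable sort: records that share a key stay in their original input order.
    ordered = sorted(rows, key=lambda r: r[key_index])
    kept = []
    for i, row in enumerate(ordered):
        # A row is the last of its key group if it is the final row overall
        # or the next row has a different key.
        is_last = i + 1 == len(ordered) or ordered[i + 1][key_index] != row[key_index]
        if is_last:
            kept.append(row)
    return kept

rows = [(1, "A"), (2, "B"), (3, "C"), (1, "D"), (2, "E")]
print(retain_last(rows))  # [(1, 'D'), (2, 'E'), (3, 'C')]
```

Note that in this sketch 3,C is also kept, because a key that occurs only once is its own last record. If only 1,D and 2,E are wanted, some further step to drop non-repeated keys would be needed.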
