DataStage Hints
Various
Allow DataStage to see a UniData account
Need to activate Uniserver on the UniData account (free licence). This allows Objectcall to work from DataStage.
An Oracle null is defined by a bit being set, and as such cannot be tested for directly from DataStage. However, you can use NVL in the Oracle stage to set a null value to something else, and then use that same value in the lookup derivation.

ORAOCI stage, user-defined SQL:

   NVL(SOURCE_SYSTEM_K4_VALUE,-999)=:4

Key field derivation in the lookup stage:
IF lu.ANIMAL_PRODUCTION_PK = "" THEN -999 ELSE lu.ANIMAL_PRODUCTION_PK
ODBCs
Server details

From DataStage Manager, to enable a new ODBC driver you need to update the following DataStage config files:

   .odbc.ini
   uvodbc.config

Note that a project can also have its own local uvodbc.config file; if there isn't one, DataStage uses the default file. The default files are located in $DSHOME, the DSEngine directory (e.g. /dsadm/DataStage/DSEngine). To get to $DSHOME type:

   cd `cat /.dshome`
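The steps above can be sketched as a small shell fragment; the fallback path below is an illustrative placeholder, not a guaranteed install location:

```shell
# /.dshome on a DataStage server holds the path to the engine directory
dshome=$(cat /.dshome 2>/dev/null)      # empty when not on a DataStage server
: "${dshome:=/opt/dstage/DSEngine}"     # placeholder fallback, for illustration only
echo "$dshome"
# The ODBC config files then live at:
#   $dshome/.odbc.ini  and  $dshome/uvodbc.config
```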
16/05/11 /opt/scribd/conversion/tmp/scratch6111/58361818.doc
Page 1 of 20
Command to find the number of files that can be opened, and to increase that number (place the increased limit after the umask command):
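On most UNIX systems this is done with ulimit; a minimal sketch (4096 is an example value, not a recommendation):

```shell
# Show the current (soft) limit on open files for this shell:
ulimit -n
# To raise it for DataStage, a line like the following is typically placed
# in the startup script after the umask command:
# ulimit -n 4096
```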
Copy RANGE2, RANGE.COUNT and CD.VOC from JCFILE into VOC. Then run these commands to show how many items in the file are greater than 1000, 2000, 3000 etc. bytes:

   CD CD.VOC RANGE2 RANGE.COUNT
   LIST {file name} BY RANGE2 BREAK.ON RANGE2 TOTAL RANGE.COUNT DET.SUP
Output from RETRIEVE command to UNIX or DOS file
Use the COMO command to spool screen output to a file in the {Project}/&COMO&/ directory
Q
[n] S [n] S*
T
[line] W
Displays the next 10 lines of source code
Subtracts n from the current line
Displays the value of a variable
Changes the value of a variable to STRING
DataStage shortcut
<CTRL> E to edit the highlighted stage derivation
<20> = Nullable
<21> = Key
<22> = Display length
<23> = actual derivation code
<25> = copy of <23> with any transforms applied
<26> = something to do with input (output?) columns
<45> = Constraint?
<100> = Constraint?

J\nn\ROOT item in file DS_JOBOBJECTS
<2> called OLETYPE = CJobDefn
<3> called NAME = {job name}
<4> = Short job description
<7> = Full job description
<9> = Before-job subroutine and argument list
<11> = Job control code, if any, CR/LF delimited
<13> = {text}/number of parameters
<14> = parameter name, multi-valued
<15> = ??, multi-valued
<16> = prompt, multi-valued
<17> = default value, multi-valued
<18> = help text, multi-valued
<19> = type code, multi-valued
<31> = List of other jobs called (not used?)
<94> to <96> = Job parameter details

J\nn\V0S item in file DS_JOBOBJECTS
<2> called OLETYPE = CTransformerStage
<3> called NAME = {transform name}
<37> = CStageVar/{n}
<38> = Stage variable names, multi-valued
<39> = Stage variable description fields, multi-valued
<40> = Stage variable derivations, multi-valued
<41> = Stage variable initial values, multi-valued
<43> = Expanded copy of <40>, rebuilt when transform OK'd in Designer
<45> = Stage variable?

J\nn\V0S item in file DS_JOBOBJECTS
<2> called OLETYPE = CCustomStage
<3> called NAME = {stage name}
<7> = Id of CCustomOutput output link
<8> = stage type, e.g. ORAOCI8
<17> = parameters (with #s), multi-valued

J\nn\V0S item in file DS_JOBOBJECTS
<2> called OLETYPE = CCustomOutput
<3> called NAME = {output link name}
<14> = multi-valued details of passive stage: table name, options, SQL code
<16> = input column names
<23> = key

J\nn\V0S item in file DS_JOBOBJECTS
<2> called OLETYPE = CCustomInput
<3> called NAME = {input link name}
<15> = multi-valued details of passive stage: table name, options, SQL code, create/drop table
<16> = input column names
J\nn\V0S item in file DS_JOBOBJECTS
<2> called OLETYPE = CHashedFileStage
<3> called NAME = {hash stage name}
<8> = directory path

J\nn\V0S item in file DS_JOBOBJECTS
<2> called OLETYPE = CHashedInput
<3> called NAME = {input link name}
<6> = Hash file name
<7> = Clear file option (1 or 0)
<8> = Backup file option (1 or 0)
<16> = Allow stage write cache option (1 or 0)
<17> = Create file options; if blank, file not created
<18> = Delete file option (1 or 0)

J\nn\V0S item in file DS_JOBOBJECTS
<2> called OLETYPE = CHashedOutput
<3> called NAME = {link name}
<6> = Hash file name
<7> = Select criteria (input or look-up file to this job)
<9> = {Normalised field name} or the text "Un-Normalised"

J\nn\V0S item in file DS_JOBOBJECTS for Sequencer
<2> called OLETYPE = CJSJobActivity
<3> called NAME = {stage name}
<11> = Execution action: 0 = Run; 1 = Reset if required then run; 2 = Validate; 3 = Reset
<12> = Called job name
<13> to <18> = internal parameters for job running
<15> = invocation id expression
<16> = 4 if invoking multiple-instance job
<17> = <15>
<19> to <24> = parameters
<20> = parameter name in job being called
<21> = actual value fed through, calling job parameter name or hard-coded value
<23> = <21>?

J\nn\V0S item in file DS_JOBOBJECTS
<2> called OLETYPE = CJSSequencer
<3> called NAME = {blue sequencer name}
<8> = output mode: 0 for ALL; 1 for ANY

J\nn\V0S item in file DS_JOBOBJECTS
<2> called OLETYPE = CJSActivityOutput
<3> called NAME = {output link name}
<6> = Trigger expression type: 1 Otherwise; 2 Executed OK; 3 Failed; 4 Warnings; 5 User Status; 6 Conditional
<8> = Trigger expression text: N/A; "Executed OK"; "Executed Failed"; "Executed finished with warnings"; = {user status}; {LHS} = {RHS}

DS_METADATA
<4> = Short description
<5> = Long description
<9> = 0 for normal file, 1 for associated (mvd) file
<11> = text / number of fields
<12> = Column name
<13> = Description with CR/LFs embedded
<14> = Data element
<15> = SQL type code, defined as above
<16> = Length
<17> = Scale
<18> = Nullable (0 or 1)
<19> = Key (1 or 0)
<20> = Display length
<21> = Association (only displayed if <9> = 1)
<22> = Position/attribute (only displayed if <9> = 1)
<23> = Attribute type (only displayed if <9> = 1)
<102> = Dunno; set to multi-valued zeros (v7.5: Level? default=0)
<103> = text: v7.5
<114> = set to multi-valued zeros
<115> = ditto
<116> = ditto

DS_CONTAINERS
Similar to DS_JOBS but for containers
<5> = JOBNO; recs exist in DS_JOBOBJECTS as above

RT_STATUSnnnn where nnnn = JOBNO
Job status file; id is JobName.Instance and JobName.Instance.1
In the .1 record you can change status attribute <2> that appears in Director (for instance if the status file cannot be cleared normally):
3 = Aborted
2 = Finished
0 or 1 = Running?
See JCFILE JC_TRANS for routine to duplicate derivations and many other useful bits. Also see JC_TRANSALL, JC_TABLES and JC_METADATA
At the top of the routine:

   Deffun FindMatchingContracts(InFile, OutFile, DBSOURCE) Calling "DSU.FindMatchingContracts"

Later in the code:

   VAR = FindMatchingContracts(InFile, OutFile, DBSOURCE)
The wait option waits for the job to finish. The local option is only really necessary if changing environment variables. Other dsjob options are:
dsjob [-server servername] [-user userid] [-password pwd] -run [-mode NORMAL|RESET|VALIDATE] [-param name=value ...] [-wait] ProjectName JobName
The routine writes the ftp commands to a script file, one per line:

   "user f803163 phant0m"
   "lcd /radf/sinay/uat/temp"
   "cd ../erdp"
   "mdelete ":WILDCARD
   "put ":FILENAME:VERSION
   "mput ":WILDCARD
   "quote site chmod 646 ":FILENAME:VERSION
   "bye"

It then runs the script and tidies up:

   command = "ftp -i -n < ":ScriptName
   Call DSU.ExecSH(command, SystemReturnCode)
   Delete Fv, ScriptName On Error Null Else Null
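The same idea sketched as a plain shell fragment, for illustration; the login details, host and file names below are placeholders, and the ftp call itself is commented out since it needs a live server:

```shell
# Build the ftp command script, one command per line, as the routine does
script=$(mktemp)
cat > "$script" <<'EOF'
user someuser somepassword
lcd /radf/sinay/uat/temp
cd ../erdp
mput *.txt
quote site chmod 646 somefile.txt
bye
EOF
lines=$(wc -l < "$script")
echo "$lines"                      # number of command lines written
# ftp -i -n somehost < "$script"   # -i: no per-file prompts, -n: no auto-login
rm -f "$script"
```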
DataStage Macros
The following can be used within the derivations of a transform to return the information:

Return the name of this job:

   DSGetJobInfo(DSJ.ME, DSJ.JOBNAME)

Set the DataStage internal variable UserStatus, for instance with the output from an Oracle COUNT command:

   SetUserStatus(inputlinkname.ROW_COUNT)

Feed through the contents of UserStatus as a parameter. From a sequencer: click on Insert parameter value, find the job name where the user status was set, and click on $UserStatus.

Return the number of rows processed in a link:

   DSGetLinkInfo(DSJ.ME, "TransformName", "LinkName", DSJ.LINKROWCOUNT)

Return the number of rows processed in a link in another job. Call a DataStage routine which does the following:

   JOBHANDLE = DSAttachJob(JobName, DSJ.ERRWARN)
   Ans = DSGetLinkInfo(JOBHANDLE, TransformName, LinkName, DSJ.LINKROWCOUNT)
Ensure job is in a runnable state from within Job Control (Batch job)
* Ensure job is in a runnable state
* Status codes: Finished = 1; RUNFAILED (Aborted) = 3; Reset = 21; CRASHED = 96; NOTRUNNABLE = 98; Compiled = 99

hJob1 = DSAttachJob("JOBNAME", DSJ.ERRFATAL)
If NOT(hJob1) Then
   Call DSLogFatal("Job Attach Failed: JOBNAME", "JobControl")
   Abort
End
Status = DSGetJobInfo(hJob1, DSJ.JOBSTATUS)
If Status = DSJS.RUNFAILED Or Status = DSJS.CRASHED Then
   ErrCode = DSRunJob(hJob1, DSJ.RUNRESET)
   ErrCode = DSWaitForJob(hJob1)
End
SELECT Account_Number, Lessee_Name, etc FROM STH_Customers

SELECT C.Account_Number, C.Lessee_Name, etc, B.Insurance_Indicator
FROM STH_Customers C, STH_Bookings B
WHERE C.Account_Number = B.Account_Number
Note the addition of the second file name, the aliases B and C in lieu of the full file names, and the WHERE clause.
Using an ODBC stage to access, select, look-up on a txt file (on NT)
This allows SQL statements on a sequential (.csv or .txt) file!

Save the csv file as a tab-delimited text file.
Use Control Panel to set up an ODBC driver on the (NT) server:
   o System DSN
   o Microsoft text driver
   o Select the directory
   o .TAB
   o Define format
   o Tick Column Name Header
   o Select OEM box
   o Click on GUESS box (if you change a name, click Modify)
   o Doing this will generate a Schema.ini file
   o If you ever need to modify anything, delete this Schema.ini file first
DataStage Manager:
   o Import table definitions from the driver just created
DataStage job:
   o Use ODBC stage
   o Quote = 000
   o Load columns, remove prefixes in column derivations
You can then do selects on the file, or lookups etc.
Read on HASH file when correct key doesn't find existing record

And other similar problems where the data looks like it's in the wrong column when viewed in the DataStage hash file stage. This is often caused by having loaded the column definitions from DataStage Manager Table Definitions. The resolution is to delete all the columns in the hash file stage and enter them manually. If there are too many, try just deleting the ones where it goes out of step and re-entering them manually. Alternatively, re-import the metadata in Manager, then in the job delete the file stage, recreate it and load the columns from the re-imported metadata (having removed I-types).
Warning Messages
Warning message in log regarding Phantom processes
For example:

   DataStage Job 270 Phantom 1364
   Program "JOB.1215067440.DT.1362629138.TRANS1": Line 301, Variable previously undefined. Zero length string used.

The job number is 270, meaning that under that project directory there will be a subdirectory called RT_BP270. Under this directory will be the source code for JOB.1215067440.DT.1362629138.TRANS1. Each transformer will have a program; in this case you have a transformer called TRANS1. It should be possible to work out line 301 by looking at the transformer.
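Pulling a given line number out of the generated source is a one-liner with sed. A stand-in demonstration (the real file would be {project}/RT_BP270/JOB.1215067440.DT.1362629138.TRANS1; here a fake numbered file is used so the commands can run anywhere):

```shell
src=$(mktemp)
seq 1 400 > "$src"               # fake 400-line "source" file
line301=$(sed -n '301p' "$src")  # grab the line number flagged in the log
echo "$line301"
rm -f "$src"
```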
DSBrowser..ORA_RefVendor: ORA-00932: inconsistent datatypes
This is due to the Oracle table having a datatype different from that defined in the stage Columns. For instance, the stage might say TIMESTAMP whereas the Oracle table says DATE.
Message: Data has been truncated
To identify the field that is too long, change the output to a fixed-length flat file and run the job; the log then identifies the column!
Killing a process
Find the pid of the process, possibly by doing a LIST.READU. Then kill it using:

   LOGOUT {pid}

LISTU shows users logged on.
UNLOCK ALL (or other option) from the UV account clears all locks.
UNLOCK USER 61234 ALL for one specific user (Userno column in LIST.READU).
Get Unix to finish typing for you (it will fill the unique bit of the file name): <ESC>\ and the file name is completed automatically.

Changing the UNIX prompt:
   export PS1='$PWD > '
   export PS1='$ '

Sort output from ls -al by the fifth column:
   ls -al | sort -k5
   sort {-o outputfilename} {-t~} -k1.1 {-k3} {inputfilename}
where -t defines the delimiter and -k field.column is the sort key.

Script to remove a list of files or directories (called H_*):
   for i in `ls | grep H_`    note the backquote is the top left-hand key
   do
      rm -rf $i               -rf: all contents, and don't prompt
   done

Display files that match a OR b: create a file with the text to search for, one item per line, then:
   ls -l | fgrep -f {that file}
Display only directories in the current directory:
   ls -l | grep '^d'          ^d means the first character is a d

Display lines in a file that do NOT have 0 to 9 as the first character:
   grep -v '^[0-9]' filename

Display directories excluding DataStage ones:
   ls -l | grep -v 'RT_' | grep -v 'DS_' | grep '^d' | pg

Search for a particular text string in a mass of files:
   find . -exec grep 'string to find' {} \; -print
The argument '{}' inserts each found file into the grep command line. The \; argument indicates the exec command line has ended.

Display all hash files excluding dictionary levels and log files:
   ls -l ./HashFiles/ | grep -v _D | grep -v LOG | pg

Display lines containing abc OR xyz:
   grep -E "abc|xyz" file1

Count number of rows in INPUTFILE_FF starting with 20:
   grep '^20' /usr/dstage/data/Basel2/work/DA1842/INPUTFILE_FF | wc -l

Show used and free space in kilobytes:
   df -k .
Save and compress a bunch of files, then reverse the process:
   tar cvf {newname}.tar *{sel criteria}*
   compress {newname}.tar
   uncompress {newname}.tar.Z
   tar xvf {newname}.tar
ORACLE performance
For stats on processes that are running use (or pick out the relevant columns):

   select * from v$sqlarea order by cpu_time desc

Useful columns include SQL_TEXT, EXECUTIONS, CPU_TIME, ELAPSED_TIME, DISK_READS, BUFFER_GETS, ROWS_PROCESSED.
SQL
The wildcard character is % (matches zero or more characters); underscore _ matches one character. There is also the NOT operand. Comparisons allowed are: =, <>, <, <=, >, >=.
/* used to include comments on an SQL query page */
Summarising data:
SELECT AVG(100 * (SALES / QUOTA)) FROM SALESREPS;

Average size of an order in the database:
   SELECT AVG(AMOUNT) FROM ORDERS WHERE CUST = 213423;

Total amount of orders by customer (group orders by customer and total):
   SELECT CUST, SUM(AMOUNT) FROM ORDERS GROUP BY CUST;
   SELECT REP, CUST, SUM(AMOUNT) FROM ORDERS GROUP BY REP, CUST;

Minimum and maximum quota:
   SELECT MIN(QUOTA), MAX(QUOTA) FROM SALESREPS;

Count how many orders in the database (counts number of rows):
   SELECT COUNT(*) FROM ORDERS;
   SELECT COUNT(AMOUNT) FROM ORDERS WHERE AMOUNT > 25000.00;

Count distinct number of titles:
   SELECT COUNT(DISTINCT TITLE) FROM SALESREPS;

For 3 particular sinds count and group by year (APDATE = DD^MMM^YY):
   SELECT SUBSTR(APDATE,8,2), COUNT(SUBSTR(APDATE,8,2))
   FROM COMPOSITE_FWPS_AGREEMENT
   WHERE SIND IN (8,12,29)
   GROUP BY SUBSTR(APDATE,8,2);
Pure Inner Join: merges only those rows with the same key values in both input files
Complete Set: merges all rows from both files
Right and Left Only: merges all rows from both files except those rows with the same key values
Left Outer Join: merges all rows from the first file (A) with rows from the second file (B) with the same key
Right Outer Join: merges all rows from the second file (B) with rows from the first file (A) with the same key
Left Only: merges all rows from the first file (A) except rows with the same key in the second file (B)
Right Only: merges all rows from the second file (B) except rows with the same key in the first file (A)
NULL handling
In a WHERE clause, any condition that does not explicitly mention NULL automatically fails if one of the values is NULL. Hence SALMONELLA_STATUTORY_SPP_FLAG <> 'T' (or 'F') will yield FALSE if SALMONELLA_STATUTORY_SPP_FLAG is null. If you want to cater for nulls then you should use:

   WHERE SALMONELLA_STATUTORY_SPP_FLAG IS NULL

(it's quicker to write too). By contrast, GROUP BY does include a grouping for the NULL value of any term, so the NULL fields are included and counted. If you want to include nulls in a different way you can also use the function NVL(field_which_could_be_null, value_to_replace_NULL_with), e.g.

   WHERE NVL(SALMONELLA_STATUTORY_SPP_FLAG, 'X') = 'X'