1.

How do you connect EME to Ab Initio Server?

2.

What does dependency analysis mean in Ab Initio?

Ans: Dependency analysis answers questions about data lineage, that is, where the data comes from, which applications produce and depend on this data, and so on. We can retrieve the maximum (surrogate key) from the existing data, then by using scan or next_in_sequence/reformat we can generate the further sequence for new records.
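The surrogate key idea mentioned in the answer above can be sketched outside Ab Initio. The following is only a minimal Python illustration, not Ab Initio code: the function name `assign_surrogate_keys` and the field name `sk` are hypothetical, and the logic simply takes the current maximum key and numbers the new records from there.

```python
def assign_surrogate_keys(existing_keys, new_records):
    """Continue a surrogate key sequence after the current maximum.

    existing_keys: iterable of integer keys already present in the target.
    new_records:   list of dicts without a key yet.
    The "sk" field name is a hypothetical choice for this sketch.
    """
    # Highest key already in use; 0 when the target is empty.
    start = max(existing_keys, default=0)
    # Number the new records start+1, start+2, ...
    return [
        {**record, "sk": start + offset}
        for offset, record in enumerate(new_records, start=1)
    ]


if __name__ == "__main__":
    rows = assign_surrogate_keys([1, 5, 3], [{"name": "a"}, {"name": "b"}])
    for row in rows:
        print(row)
```

In a real graph the maximum would come from the existing dataset and the numbering from scan or next_in_sequence, as the answer says; the sketch only shows the arithmetic.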

3.

How to create a repository in Ab Initio for a stand-alone system (LOCAL NT)?

Ans: If you are trying to install Ab Initio on a stand-alone machine, then it is not necessary to create the repository. While installing, it creates one automatically for you under the abinitio folder (where you are installing Ab Initio). If you are still not clear, please ask your question on the same portal.

4.

What is the difference between rollup and scan?

Ans: By using rollup we can't generate cumulative summary records; for that we use scan.

5.

How can I run the 2 GUI merge files?

Ans: Do you mean by merging GUI map files in WR? If so, merging GUI map files in the GUI map editor won't create a corresponding test script. Without a test script you can't run a file. So it is impossible to run a file by merging 2 GUI map files.

6.

Describe how you would ensure that database object definitions (Tables, Indices, Constraints, Triggers, Users, Logins, Connection Options, and Server Options etc) are consistent and repeatable between multiple database instances (i.e.: a test and production copy of a database).

Ans: Take an entire database backup and restore it in a different instance. Take statistics of all valid and invalid objects and match them. Periodically refresh.

7.

Describe the "Grant/Revoke" DDL facility and how it is implemented?

Ans: Basically, this is a part of the D.B.A.'s responsibilities. GRANT gives permissions, for example GRANT CREATE TABLE, CREATE VIEW and many more. REVOKE cancels the grant (permissions). So both the Grant and Revoke commands depend upon the D.B.A.

8.

Explain the difference between the "truncate" and "delete" commands?

Ans: Truncate: It is a DDL command, used to remove all rows from tables or clusters. Since it is a DDL command, it is auto-committed and a rollback can't be performed. It is faster than delete.
Delete: It is a DML command, generally used to delete rows from tables or clusters. A rollback can be performed in order to retrieve the earlier deleted rows. To make the deletion permanent, the "commit" command should be used.
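The rollback behaviour of DELETE described above can be tried with Python's built-in sqlite3 module. This is only a sketch under stated assumptions: SQLite has no TRUNCATE statement, so only the DELETE side is shown, and the table name `emp` and the function name `delete_then_rollback` are made up for the illustration.

```python
import sqlite3


def delete_then_rollback():
    """Delete every row, roll back, and report the row counts.

    Returns (rows_before, rows_after_delete, rows_after_rollback).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (id INTEGER)")  # hypothetical table
    conn.executemany("INSERT INTO emp VALUES (?)", [(1,), (2,), (3,)])
    conn.commit()  # make the three rows permanent

    def count():
        return conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]

    before = count()
    conn.execute("DELETE FROM emp")  # DML: runs inside an open transaction
    after_delete = count()
    conn.rollback()                  # undo the uncommitted DELETE
    after_rollback = count()
    conn.close()
    return before, after_delete, after_rollback


if __name__ == "__main__":
    print(delete_then_rollback())
```

Had `conn.commit()` been called instead of `conn.rollback()`, the deletion would have become permanent, which matches the point made in the answer.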

9.

When running a stored procedure definition script how would you guarantee the definition could be "rolled back" in the event of problems?

Ans: There are quite a few factors that determine the approach, such as what type of version control is used, what the size of the change is, what the impact of the change is, whether it is a new procedure or a replacement for an existing one, and so on. If it is a new procedure, then just drop the wrong one.

If it is a replacement, then consider how big the change is and what the possible impact will be. Depending on that, you can have the entire database backed up, or just create a script for your original procedure before changing it, or you can just do an ed, change the file back to the original and reapply. You may rename the old procedure as old and then work on the new one, and so on. A few issues to keep in mind are synonyms, dependencies, grants, any job calling the procedure at the time of the change, and so on. In a nutshell, the scenario can vary and so can the solution.

10. Describe the process steps you would perform when defragmenting a data table. This table contains mission critical data.

Ans: There are several ways to do this:
1) We can move the table in the same or another tablespace and rebuild all the indexes on the table.
   alter table <table_name> move tablespace <tablespace_name>
   This activity reclaims the fragmented space in the table.
   analyze table table_name compute statistics
   This captures the updated statistics.
2) A reorg could be done by taking a dump of the table, truncating the table and importing the dump back into the table.

11. Describe the elements you would review to ensure multiple scheduled "batch" jobs do not "collide" with each other?

Ans: Review the dependencies between the jobs, because every job depends upon another job. For example, if the first job's result is successful then the next job will execute; otherwise it does not run.

12. When using multiple DML statements to perform a single unit of work, is it preferable to use implicit or explicit transactions, and why?

Ans: Explicit transactions are preferable, because all the statements then commit or roll back together as one unit. Implicit handling is used for internal processing, where each statement is treated on its own.

13. How would you find out whether a SQL query is using the indices you expect?

Ans: The explain plan can be reviewed to check the execution plan of the query. This shows whether the expected indexes are used or not.

14. What is a cursor? Within a cursor, how would you update fields on the row just fetched?
Ans: The Oracle engine uses work areas for internal processing in order to execute a SQL statement; such a work area is called a cursor. There are two types of cursors: implicit and explicit. An implicit cursor is used for internal processing, and an explicit cursor is opened by the user for the data required. To update the row just fetched, declare the cursor FOR UPDATE and use UPDATE ... WHERE CURRENT OF <cursor_name>.

15. What are Cartesian joins?

Ans: A Cartesian join will get you a Cartesian product. A Cartesian join is when you join every row of one table to every row of another table. You can also get one by joining every row of a table to every row of itself.

16. What are primary keys and foreign keys?

Ans: In an RDBMS the relationship between two tables is represented as a primary key and foreign key relationship. The primary key table is the parent table and the foreign key table is the child table. The criterion for both tables is that there should be a matching column.
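The Cartesian join from question 15 can be illustrated in a few lines of Python. This is a sketch only: the function name `cartesian_join` and the sample rows are invented for the example, and `itertools.product` from the standard library plays the role of a join with no join condition.

```python
from itertools import product


def cartesian_join(left_rows, right_rows):
    """Pair every row of left_rows with every row of right_rows."""
    return [(left, right) for left, right in product(left_rows, right_rows)]


if __name__ == "__main__":
    employees = ["Ann", "Bob"]            # hypothetical sample data
    departments = ["HR", "IT", "Sales"]
    pairs = cartesian_join(employees, departments)
    # The result size is always len(left) * len(right).
    print(len(pairs), pairs)
    # Joining a table to itself also gives a Cartesian product.
    print(len(cartesian_join(employees, employees)))
```

The row count grows as the product of the two input sizes, which is why an accidental Cartesian join on large tables is usually a sign of a missing join condition.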

17. Have you used the rollup component? Describe how.

Ans: If the user wants to group the records on particular field values then rollup is the best way to do that. Rollup is a multi-stage transform function and it contains the following mandatory functions:
1. initialise
2. rollup
3. finalise
You also need to declare one temporary variable if you want to get counts for a particular group. For each group, it first calls the initialise function once, then calls the rollup function for each of the records in the group, and finally calls the finalise function once after the last rollup call.

18. Have you worked with packages?

Ans: Packages are nothing but reusable blocks of objects like transforms, user defined functions, dmls etc. These packages are to be included in the transform where you use them. For example, consider a user defined function like

    /*string_trim.xfr*/
    out::trim(input_string) =
    begin
      let string(35) trimmed_string = string_lrtrim(input_string);
      out::trimmed_string;
    end

Now, the above xfr can be included in the transform where you call the above function as

    include "~/xfr/string_trim.xfr";

But this should be included ABOVE your transform function. For more details see the help file on "packages".

19. Explain what is lookup?

Ans: A lookup is basically a specific dataset which is keyed. It can be used for mapping values as per the data present in a particular file (serial/multi file). The dataset can be static as well as dynamic (in case the lookup file is generated in a previous phase and used as a lookup file in the current phase). Sometimes hash-joins can be replaced by a reformat and a lookup if one of the inputs to the join contains a small number of records with a slim record length. Ab Initio has built-in functions to retrieve values using the key for the lookup.
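The point in question 19 about replacing a join with a reformat plus lookup can be sketched in Python. This is an illustration under assumptions, not Ab Initio code: the small input is loaded into an in-memory dictionary keyed on the lookup key, and each record of the large input is enriched one at a time. The names `build_lookup`, `enrich`, and the sample fields are hypothetical.

```python
def build_lookup(records, key):
    """Index the small dataset by its key, like a keyed lookup file."""
    return {record[key]: record for record in records}


def enrich(transactions, lookup, key, field, default=None):
    """Reformat-style pass: add `field` from the lookup to each record."""
    return [
        {**txn, field: lookup.get(txn[key], {}).get(field, default)}
        for txn in transactions
    ]


if __name__ == "__main__":
    countries = [  # small, slim dataset: a good lookup candidate
        {"code": "IN", "name": "India"},
        {"code": "US", "name": "United States"},
    ]
    transactions = [
        {"id": 1, "code": "US"},
        {"id": 2, "code": "IN"},
        {"id": 3, "code": "ZZ"},  # no match in the lookup
    ]
    lookup = build_lookup(countries, "code")
    for row in enrich(transactions, lookup, "code", "name", default="UNKNOWN"):
        print(row)
```

The design choice mirrors the answer: the approach only pays off when the keyed dataset is small enough to sit in memory, since every record of the large input probes it.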
20. How to handle it if the DML changes dynamically in Ab Initio?

Ans: If the DML changes dynamically then both the dml and the xfr have to be passed as graph level parameters at runtime.

21. How many components in your most complicated graph?

Ans: This is a tricky question; the number of components in a graph has nothing to do with the level of knowledge a person has. On the contrary, a proper standardized, modular and parametric approach will reduce the number of components to a very few. In a well thought out modular and parametric design, most graphs will have 3 or 4 components, which do a particular task and then call other sets of graphs to do the next, and so on. This way the total number of distinct graphs will come down drastically, and support and maintenance will be much simpler. The bottom line is that there are a lot more things to plan than adding components.

22. What is the difference between look-up file and look-up, with a relevant example?

Ans: A lookup is a component of an Ab Initio graph where we can store data and retrieve it by using a key parameter. A lookup file is the physical file where the data for the lookup is stored.

23. Do you know what a local lookup is?

Ans: A Lookup File consists of data records which can be held in main memory. This lets the transform function retrieve the records much faster than retrieving them from disk. It allows the transform component to process the data records of multiple files quickly.

24. How do you add default rules in transformer?

Ans: Double click on the transform parameter on the parameter tab page of the component properties; it will open the transform editor. In the transform editor click on the Edit menu and then select Add Default Rules from the dropdown. It will show two options: 1) Match Names 2) Wildcard.

25. What kinds of layouts does Ab Initio support?

Ans: Basically there are serial and parallel layouts supported by Ab Initio. A graph can have both at the same time. The parallel one depends on the degree of data parallelism. If the multi-file system is 4-way parallel then a component in a graph can run 4-way parallel if its layout is defined to match that degree of parallelism.

26.
What is the use of aggregation when we have rollup? As we know, the rollup component in Ab Initio is used to summarize groups of data records, so where will we use aggregation?

Ans: Aggregation and Rollup can both summarise data, but rollup is much more convenient to use. When it comes to understanding how a particular summarisation is done, rollup is much more explanatory than aggregate. Rollup can also do some other things, like input and output filtering of records.

27. What is the relation between EME, GDE and Co-operating system?

Ans: EME stands for enterprise metadata environment, GDE for graphical development environment, and the Co-operating system can be described as the Ab Initio server. The relation between the Co-op, EME and GDE is as follows. The Co-operating system is the Ab Initio server. This Co-op is installed on a particular O.S. platform that is called the NATIVE O.S. The EME is just like the repository in Informatica; it holds the metadata, transformations, db config files, and source and target information. The GDE is the end user environment where we can develop the graphs (a graph is like a mapping in Informatica). The designer uses the GDE to design the graphs and saves them to the EME or sandbox. The GDE is on the user side, whereas the EME is on the server side.

28. How to retrieve data from a database to the source? Which component is used for this?

Ans: To unload (retrieve) data from a database such as DB2, Informix, or Oracle we have components like Input Table and Unload DB Table. By using these two components we can unload data from the database.

29. What are the continuous components in Ab Initio?

Ans: Continuous components are used to create graphs that produce useful output files while running continuously. Ex: Continuous rollup, Continuous update, batch subscribe.

30. Which one is faster for processing, fixed length dmls or delimited dmls, and why?

Ans: Fixed length DMLs are faster, because the data of that length is read directly without any comparisons, whereas in delimited ones every character has to be compared against the delimiter, which causes delays.

31. How will you test a dbc file from the command prompt?

Ans: Try "m_db test myfile.dbc".

32. What is meant by Co-Operating system and why is it special for Ab Initio?

Ans: It converts the Ab Initio specific code into a format which UNIX/Windows can understand and feeds it to the native operating system, which carries out the task.

33. What is the latest version that is available in Ab Initio?

Ans: The latest version of GDE is 1.15 and of the Co-operating system is 2.14.

34. What is the AB_LOCAL expression and where do you use it in Ab Initio?

Ans: ablocal_expr is a parameter of the itable component of Ab Initio. ABLOCAL() is replaced by the contents of ablocal_expr, which we can make use of in parallel unloads. There are two forms of the AB_LOCAL() construct, one with no arguments and one with a single argument, a table name (the driving table). The AB_LOCAL() construct is useful because some complex SQL statements contain grammar that is not recognized by the Ab Initio parser when unloading in parallel. You can use the ABLOCAL() construct in this case to prevent the Input Table component from parsing the SQL (it will get passed through to the database). It also specifies which table to use for the parallel clause.

35. How do you convert 4-way MFS to 8-way MFS?

Ans: To convert a 4-way to an 8-way partition we need to change the layout in the partitioning component. There will be separate parameters for each type of partitioning, e.g. AI_MFS_HOME, AI_MFS_MEDIUM_HOME, AI_MFS_WIDE_HOME etc.
The appropriate parameter needs to be selected in the component layout for the type of partitioning.

36. What is $mpjret? Where is it used in Ab Initio?

Ans: You can use $mpjret in the end script, like

    if [ 0 -eq $mpjret ]
    then
      echo "success"
    else
      mailx -s "[graphname] failed" mailid
    fi

36. How does MAXCORE work?

Ans: Maxcore is a value (it will be in Kb). Whenever a component is executed it will take as much memory as we specified for execution.

37. What is the difference between DML Expression and XFR Expression?

Ans: The main difference between dml and xfr is that DML represents the format of the metadata, while XFR represents the transform functions, which contain the business rules.

38. What are the different versions and releases of Ab Initio (GDE and Co-op version)?

Ans: For GDE there are 1.10, 1.11, 1.12, 1.13, and 1.15 is the latest one. For Co-op the latest one is 2.14.

39. How to run the graph without GDE?

Ans: Choose RUN ==> Deploy >> As script. It creates a .bat file in your host directory; then run the .bat file from the command prompt.

40. What are the differences between the different GDE versions (1.10, 1.11, 1.12, 1.13 and 1.15)? What are the differences between different versions of Co-op?

Ans: 1.10 is a non-key version and the rest are key versions. A lot of components were added and revised in the following versions.

41. Can anyone give me an example of a realtime start script in the graph?

Ans: Here is a simple example of using a start script in a graph. In the start script give:

    export DT=`date '+%m%d%y'`

Now this variable DT will have today's date before the graph is run. Somewhere in the graph transform we can then use this variable as

    out.process_dt :: $DT;

which provides the value from the shell.

42. What is the syntax of the m_dump command?

Ans: The general syntax is "m_dump metadata data [action]".

43. What is the importance of EME in Ab Initio?

Ans: EME is a repository in Ab Initio. It is used for checkin and checkout of graphs and also maintains graph versions.

44. What is m_dump?

Ans: The m_dump command prints the data in a formatted way.

    m_dump <dml> <file.dat>

46. What is BROADCASTING and REPLICATE?

Ans: Broadcast: Takes data from multiple inputs, combines it and sends it to all the output ports. E.g. you have 2 incoming flows (this can be data parallelism or component parallelism) on a Broadcast component, one with 10 records and the other with 20 records. Then all the outgoing flows (it can be any number of flows) will have 10 + 20 = 30 records.
Replicate: It replicates the data of a particular partition and sends it out to multiple out ports of the component, but maintains the partition integrity. E.g. your incoming flow to Replicate has a data parallelism level of 2, with one partition having 10 records and the other having 20 records. Now suppose you have 3 output flows from Replicate. Then each flow will have 2 data partitions with 10 and 20 records respectively.

37. What is skew and skew measurement?

Ans: Skew is the measure of data flow to each partition. Suppose the input is coming from 4 files and the size is 1 GB:
1 GB = (100 MB + 200 MB + 300 MB + 500 MB)
1000 MB / 4 = 250 MB
(100 - 250) / 500 = -150 / 500
Calculate it yourself; it will come out as a negative value. Calculate the same for 200, 500 and 300. A positive value of skew is always desirable. Skew is an indirect measure of the graph.

38. What is local and formal parameter?

Ans: Both are graph level parameters, but for a local parameter you need to initialize the value at the time of declaration, whereas for a formal parameter there is no need to initialize it; the graph will prompt for that parameter at the time of running.

39. What is driving port? When do you use it?

Ans: When you set the sorted-input parameter of the "JOIN" component to "In memory: Input need not be sorted", you can find the driving port. Generally the driving port is used to improve performance in a graph. The driving input is the largest input. All other inputs are read into memory.
For example, suppose the largest input to be joined is on the in1 port. Specify a port number of 1 as the value of the driving parameter. The component reads all other inputs to the join (for example, in0 and in2) into memory.

The default is 0, which specifies that the driving input is on port in0. Join also improves performance by loading all records from all inputs except the driving input into main memory.

40. How to get DML using utilities in UNIX?

Ans: If your source is a COBOL copybook, then there is a command in Unix which generates the required DML in Ab Initio. Here it is: cobol-to-dml.

41. What is semi-join?

Ans: In Ab Initio there are 3 types of join: 1. inner join, 2. outer join and 3. semi join. For an inner join the 'record_requiredn' parameter is true for all in ports. For an outer join it is false for all in ports. If you want a semi join, set 'record_requiredn' to true for the required input and false for the other inputs.

42. How would you do performance tuning for an already built graph? Can you let me know some examples?

Ans: Examples:
1) Suppose a sort is used in front of a merge component. There is no use in that sort, because sorting is built into merge.
2) We use lookup instead of the JOIN and Merge components.
3) Suppose we want to join the data coming from 2 files and we don't want duplicates. We will use the union function instead of adding an additional component for duplicate removal.

43. How do we run sequences of jobs, where the output of job A is the input to job B? How do we co-ordinate the jobs?

Ans: By writing wrapper scripts we can control the sequence of execution of more than one job.

44. What is the difference between .dbc and .cfg file?

Ans: The .cfg file is for the remote connection and the .dbc file is for connecting to the database.
The .cfg file contains:
1. The name of the remote machine.
2. The username/pwd to be used while connecting to the db.
3. The location of the operating system on the remote machine.
4. The connection method.
The .dbc file contains:
1. The database name.
2. The database version.
3. Userid/pwd.
4. Database character set, and some more.

45. 1) Soft links to MFS files on Unix for Ab Initio: what is this? 2) $#: what is this? 3) $?: what is it used for? 4) Types of loading. 5) Overwrite: when is it used?

Ans: Link: ln is the Unix command for creating a link to a file (ln -s creates a soft link). With a hard link, the data still exists under the link name when the original name is deleted. Example: ln file1 file2
$#: total number of positional parameters.
$?: exit status of the last executed command.
The types of loading are conventional loading and direct loading.
It is override, not overwrite. It is used in join and in assign keys (alternate key). In join it is used when the input need not be sorted.

46. How to find the number of arguments defined in a graph?

Ans: $# is the number of positional parameters. $? is the exit status of the last executed command.

47. Difference between conventional loading and direct loading? When is it used in real time?

Ans: Conventional load: before loading the data, all the table constraints are checked against the data.
Direct load (faster loading): all the constraints are disabled and the data is loaded directly. Later the data is checked against the table constraints, and the bad data won't be indexed.
API mode corresponds to conventional loading and utility mode to direct loading.

48. Can anyone tell me what happens when the graph runs? I.e. the Co-operating System will be at the host and we are running the graph at some other place. How does the Co-operating System interact with the native OS?

Ans: Whenever you press the Run button on your GDE, the GDE generates a script, and the generated script is transferred to the host specified in your GDE run settings. The Co-operating system then interprets this script and executes it on different machines (if required) as sub-processes (threads). After completion, each sub-process returns a status code to the main process, and the main process in turn returns the error or success code of the job to the GDE.

49. How to work with parameterized graphs?
Ans: One of the main purposes of parameterized graphs is this: if we need to run the same graph n times for different files, we set up graph parameters like $INPUT_FILE, $OUTPUT_FILE etc. and supply the values for these in Edit > Parameters. These parameters are substituted at run time. We can set different types of parameters like positional, keyword, local etc.

The idea here is that, instead of maintaining different versions of the same graph, we can maintain one version for different files.

50. What are the most commonly used components in an Ab Initio graph? Can anybody give me a practical example of a transformation of data, say customer data in a credit card company, into meaningful output based on business rules?

Ans: The most commonly used components in any Ab Initio project are input file/output file, input table/output table, lookup file, reformat, gather, join, runsql, join with db, compress components, sort, trash, partition by expression, partition by key, and concatenate.

51. How to improve performance of graphs in Ab Initio? Give some examples or tips.

Ans: There are many ways to improve the performance of graphs in Ab Initio. Here are a few points from my side:
1. Use an MFS system with Partition by Round-robin.
2. If needed, use lookup local rather than lookup when there is a large amount of data.
3. Take out unnecessary components like filter by expression; instead provide the filter in reformat/Join/Rollup.
4. Use gather instead of concatenate.
5. Tune Max_core for optimal performance.
6. Try to avoid extra phases.

52. How to schedule graphs in Ab Initio, like workflow Schedule in Informatica? And where must we use Unix shell scripting in Ab Initio?

Ans: Unlike in Informatica, the scheduling cannot be done completely in Ab Initio. In the Ab Initio GDE versions 2.14.X there is Plan>It, which replaced DTM plans. Through Plan>It plans we can have different methods for pre and post processes and also triggers. There are separate methods to carry out a particular process on success and on failure of the graphs. To my knowledge, the scheduling of these plans is done through a cron job only.

53. What are the graph parameters?

Ans: There are 2 types of graph parameters in Ab Initio: 1. local parameters 2. formal parameters (those parameters working at runtime).

54. Explain the differences between api and utility mode?
Ans: API and UTILITY are the two possible interfaces for connecting to databases to perform certain user specific tasks. These interfaces allow the user to access or use certain functions (provided by the database vendor) to perform operations on the databases. The functionality of each of these interfaces depends on the database. API has more flexibility but is often considered a slower process compared to UTILITY mode. The trade-off is between their performance and usage.

55. Can anyone please explain the environment variables with an example?

Ans: Environment variables serve as global variables in a Unix environment. They are used for passing values from one shell/process to another.

They are inherited by Ab Initio as sandbox variables/graph parameters like AI_SORT_MAX_CORE, AI_HOME, AI_SERIAL, AI_MFS etc. To know which variables exist, find out the naming convention and type a command like "env | grep AI" in your Unix shell. This will give you a list of all the variables set in the shell. You can refer to the graph parameters/components to see how these variables are used inside Ab Initio.

56. What is the difference between sandbox and EME? Can we perform checkin and checkout through the sandbox? Can anybody explain checkin and checkout?

Ans: Sandboxes are work areas used to develop, test or run code associated with a given project. Only one version of the code can be held within the sandbox at any time. The EME Datastore contains all versions of the code that have been checked into it. A particular sandbox is associated with only one project, whereas a project can be checked out to a number of sandboxes.

57. How can we test Ab Initio manually and with automation?

Ans: As far as I know, Ab Initio testing is carried out manually only; no automation is available as of now. We need to integrate the graphs, run them manually and carry on the process.

58. Can we load multiple files?

Ans: Loading multiple files, from my perspective, means writing into more than one file at a time. If that is your case, Ab Initio provides a component called Write Multiple Files (in the Dataset component group) which can write multiple files at a time. But the files to be written must be local files, i.e. they should reside on your local PC. For more information on this component read the help file.

59. What is data mapping and data modelling?

Ans: Data mapping deals with the transformation of the extracted data at FIELD level, i.e. the transformation of the source field to the target field is specified by the mapping defined on the target field. The data mapping is specified during the cleansing of the data to be loaded.
For example:

    source;
    string(35) name = "Siva Krishna ";
    target;
    string("01") nm = NULL(""); /*(maximum length is string(35))*/

Then we can have a mapping like: straight move, trim the leading or trailing spaces. The above mapping specifies the transformation of the field nm.

60. What do you mean by .profile in Ab Initio and what does it contain?

Ans: .profile is a file which gets executed automatically when that particular user logs in. You can change your .profile file to include any commands that you want to execute whenever you log in. You can even put commands in your .profile file that override settings made in /etc/profile (this file is set up by the system administrator). You can set the following in your .profile:
- Environment settings
- aliases
- path variables
- name and size of your history file
- primary and secondary command prompts

61. What is .abinitiorc and what does it contain?

Ans: .abinitiorc is the config file for Ab Initio. It is found in the user's home directory. Generally it is used to hold the Ab Initio home path and login information, like id, encrypted password and login method, for the hosts the graph connects to at execution time. It may also contain information like the EME host and others.

62. How to execute the graph from start to end stages? Tell me, and how to run a graph on a non-Ab Initio system?

Ans: There are many ways to do this; I am giving one example due to time constraints. You can run components according to the phases you defined. You can also run the graph by creating ksh or sh scripts.

63. For data parallelism, we can use partition components. For component parallelism, we can use the replicate component. Like this, which component(s) can we use for pipeline parallelism?

Ans: When a connected sequence of components on the same branch of a graph executes concurrently, it is called pipeline parallelism. Components like reformat, where we distribute the input flow to multiple output flows using the output index depending on some selection criteria and process those output flows simultaneously, create pipeline parallelism. But components like sort, where the entire input must be read before a single record is written to the output, cannot achieve pipeline parallelism.
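The contrast between a record-at-a-time component and a blocking component such as sort can be sketched with Python generators. This is only an analogy under stated assumptions: generators interleave work rather than run on separate processes, and the names `run_pipeline`, `reader`, `reformat` and `sort_stage` are invented for the illustration. The event log shows when each stage touches each record.

```python
def run_pipeline(records, blocking=False):
    """Run reader -> (reformat | sort) and log the order of events.

    Returns (output_records, event_log).
    """
    log = []

    def reader():
        for record in records:
            log.append(("read", record))
            yield record

    def reformat(stream):
        # Handles each record as soon as it arrives from upstream.
        for record in stream:
            log.append(("reformat", record))
            yield record * 10

    def sort_stage(stream):
        # sorted() must consume the whole input before emitting anything.
        for record in sorted(stream):
            log.append(("sort_out", record))
            yield record

    source = reader()
    stage = sort_stage(source) if blocking else reformat(source)
    return list(stage), log


if __name__ == "__main__":
    print(run_pipeline([3, 1, 2]))                 # reads and reformats interleave
    print(run_pipeline([3, 1, 2], blocking=True))  # all reads happen first
```

With the reformat stage the log alternates read and reformat events, so the downstream stage is busy while the upstream stage is still producing. With the sort stage every read appears before the first output, which is why sort breaks the pipeline.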

64. There are 2 tables, Employee and Department. There are a few records in the employee table for which the department is not assigned. The output of the query should contain all the employees' names and their corresponding departments if the department is assigned, otherwise the employee names and a null value in place of the department name. What is the query?

Ans: Use an outer join to get your query.

    Select E.ENAME, D.DNAME from Employee E, Dept D where D.DEPTNO (+)= E.DEPTNO

65. What is the function you would use to transfer a string into a decimal?

Ans: For converting a string to a decimal we need to typecast it using the following syntax:

    out.decimal_field :: ( decimal( size_of_decimal ) ) string_field;

The above statement converts the string to a decimal and populates the decimal field in the output.

66. How many parallelisms are in Ab Initio? Please give a definition of each.

Ans: TYPES OF PARALLELISM: There are 3 types of parallelism in Ab Initio.
1) Data parallelism: Data is processed on different servers at the same time.
2) Pipeline parallelism: The records are processed in a pipeline, i.e. the components do not have to wait for all the records to be processed. The records that have been processed are passed to the next component in the pipeline.
3) Component parallelism: Two or more components process the records in parallel.

Component parallelism: A graph with multiple processes running simultaneously on separate data uses component parallelism.
Data parallelism: A graph that deals with data divided into segments and operates on each segment simultaneously uses data parallelism. Nearly all commercial data processing tasks can use data parallelism. To support this form of parallelism, Ab Initio provides Partition components to segment data, and Departition components to merge segmented data back together.
Pipeline parallelism: A graph with multiple components running simultaneously on the same data uses pipeline parallelism. Each component in the pipeline continuously reads from upstream components, processes data, and writes to downstream components. Since a downstream component can process records previously written by an upstream component, both components can operate in parallel.
NOTE: To limit the number of components running simultaneously, set phases in the graph.

68. What is the difference between a DB config and a CFG file?

Ans: A .dbc file has the information required for Ab Initio to connect to the database to extract or load tables or views, while the .CFG file is the table configuration file created by db_config when using components like Load DB Table.

69. Have you ever encountered an error called "depth not equal"?
(This occurs when you extensively create graphs; it is a trick question.)

Ans: When two components are linked together and their layouts do not match, this problem can occur during the compilation of the graph. A solution would be to use a partitioning component in between if there was a change in layout.

70. How do you truncate a table? (Each candidate would say only 1 of the several ways to do this.)

Ans: There are many ways to do it.
1. Probably the easiest way is to use Truncate Table.
2. Run Sql or Update Table can be used to do the same thing.
3. Run Program.

71. What is the difference between partitioning with key and round robin?

Ans: PARTITION BY KEY: Here we have to specify the key on which the partitioning will occur. Since it is key based, it results in well balanced data provided the key values are evenly distributed. It is useful for key dependent parallelism.
PARTITION BY ROUND ROBIN: Here the records are partitioned in a sequential way, distributing data evenly in blocksize chunks across the output partitions. It is not key based and results in well balanced data, especially with a blocksize of 1. It is useful for record independent parallelism.

72. What is a ramp limit?

Ans: The ramp and limit are the variables that are used to set the reject tolerance for a particular graph. The graph stops executing when the number of rejected records exceeds the following formula: limit + (ramp * no_of_records_processed). The default value will be set to 0.0.
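The reject tolerance formula above is simple arithmetic and can be checked with a short Python sketch. The function names `reject_threshold` and `should_abort` are hypothetical; only the formula limit + (ramp * no_of_records_processed) comes from the answer.

```python
def reject_threshold(limit, ramp, records_processed):
    """Tolerated number of rejects: limit + (ramp * records processed)."""
    return limit + ramp * records_processed


def should_abort(rejected, limit, ramp, records_processed):
    """True when the rejected count exceeds the tolerated threshold."""
    return rejected > reject_threshold(limit, ramp, records_processed)


if __name__ == "__main__":
    # With limit 0 and ramp 0.0 the very first reject stops the graph.
    print(should_abort(rejected=1, limit=0, ramp=0.0, records_processed=100))
    # limit 10 plus half a reject per processed record: threshold is 60.
    print(reject_threshold(limit=10, ramp=0.5, records_processed=100))
    print(should_abort(rejected=60, limit=10, ramp=0.5, records_processed=100))
    print(should_abort(rejected=61, limit=10, ramp=0.5, records_processed=100))
```

The limit acts as a fixed allowance and the ramp as a per-record allowance, so the tolerance grows as more records are processed.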