What is a source qualifier?

The Source Qualifier represents the rows that the PowerCenter Server reads from a relational or flat file source when it runs a session. When a relational or flat file source definition is added to a mapping, it is connected to a Source Qualifier transformation. The PowerCenter Server generates a query for each Source Qualifier transformation whenever it runs the session. The default query is a SELECT statement containing all the source columns. The Source Qualifier can override this default query through its transformation properties; however, the list of selected ports and the order in which they appear in the default query must not be changed in the overridden query.
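As a minimal sketch (the table and column names are hypothetical), the default query and a legal override might look like this:

```sql
-- Default query generated by the Source Qualifier (all connected ports, in port order)
SELECT CUSTOMERS.CUST_ID, CUSTOMERS.CUST_NAME, CUSTOMERS.CITY
FROM CUSTOMERS

-- Overridden query: adds a filter and ordering, but keeps exactly the same
-- select list and column order as the default query
SELECT CUSTOMERS.CUST_ID, CUSTOMERS.CUST_NAME, CUSTOMERS.CITY
FROM CUSTOMERS
WHERE CUSTOMERS.CITY = 'LONDON'
ORDER BY CUSTOMERS.CUST_ID
```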

What is a surrogate key?
A surrogate key is a substitute for the natural primary key. It is simply a unique identifier or number for each row that can be used as the primary key of the table. The only requirement for a surrogate primary key is that it is unique for each row in the table. Data warehouses typically use a surrogate key (also known as an artificial or identity key) for the primary keys of dimension tables. It can be generated with an Informatica Sequence Generator, an Oracle sequence, or SQL Server identity values.

A surrogate key is useful because the natural primary key (for example, Customer Number in a Customer table) can change, which makes updates more difficult. Some tables have columns such as AIRPORT_NAME or CITY_NAME which are stated as the primary keys (according to the business users), but not only can these change, indexing on a numeric value is usually better, so you could create a surrogate key called, say, AIRPORT_ID. This would be internal to the system, and as far as the client is concerned you might display only the AIRPORT_NAME.

Another benefit of surrogate keys (SIDs) is tracking slowly changing dimensions (SCDs). A simple, classical example: on 1 January 2002, employee 'E1' belongs to business unit 'BU1' (that is what would be in your Employee dimension), and turnover is allocated to him under 'BU1'. On 2 June the employee 'E1' is moved from business unit 'BU1' to 'BU2'. All new turnover has to belong to the new business unit 'BU2', but the old turnover should still belong to 'BU1'. If you used the natural business key 'E1' for the employee within your data warehouse, everything would be allocated to 'BU2', even what actually belongs to 'BU1'. If you use surrogate keys, you can create a new record for employee 'E1' in the Employee dimension on 2 June with a new surrogate key. This way, in your fact table, the old data (before 2 June) carries the SID of 'E1' + 'BU1', and all new data (after 2 June) takes the SID of 'E1' + 'BU2'.

You can think of a slowly changing dimension as an enlargement of the natural key: the natural key of the employee was the employee code 'E1', but it effectively becomes employee code + business unit, i.e. 'E1' + 'BU1' or 'E1' + 'BU2'. The difference from a simple natural-key enlargement is that you might not have every part of the new key in your fact table, so you might not be able to join on the enlarged key; hence you need another identifier.
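A minimal SQL sketch of this Type 2 scenario, with hypothetical table and column names:

```sql
-- Hypothetical Type 2 employee dimension keyed by a surrogate key
CREATE TABLE DIM_EMPLOYEE (
    EMP_SID        NUMBER PRIMARY KEY,   -- surrogate key (sequence/identity)
    EMP_CODE       VARCHAR2(10),         -- natural key, e.g. 'E1'
    BUSINESS_UNIT  VARCHAR2(10),
    EFFECTIVE_FROM DATE,
    EFFECTIVE_TO   DATE
);

-- Before 2 June: one row for E1 in BU1
INSERT INTO DIM_EMPLOYEE VALUES (1001, 'E1', 'BU1', DATE '2002-01-01', DATE '2002-06-01');

-- From 2 June: a new row (new surrogate key) for E1 in BU2
INSERT INTO DIM_EMPLOYEE VALUES (1002, 'E1', 'BU2', DATE '2002-06-02', NULL);

-- Fact rows reference EMP_SID, so old facts keep pointing at 1001 (BU1)
-- while new facts point at 1002 (BU2).
```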

What is difference between Mapplet and reusable transformation?
A mapplet consists of a set of transformations that is reusable, whereas a reusable transformation is a single transformation that can be reused. Variables or parameters created in a mapplet cannot be used in another mapping or mapplet, whereas variables created in a reusable transformation can be used in any other mapping or mapplet. We cannot include source definitions in reusable transformations, but we can add sources to a mapplet. The whole transformation logic is hidden in the case of a mapplet, but it is transparent in the case of a reusable transformation. We cannot use COBOL Source Qualifier, Joiner, or Normalizer transformations in a mapplet, whereas we can make them reusable transformations.

What is DTM session?
The DTM process is the second process associated with a session run. Its primary purpose is to create and manage the threads that carry out the session tasks: it creates threads to initialize the session, read, write and transform data, and handle pre- and post-session operations. The DTM allocates process memory for the session and divides it into buffers; this is also known as buffer memory. It creates the main thread, called the master thread, which creates and manages all other threads. If you partition a session, the DTM creates a set of threads for each partition to allow concurrent processing. When the Informatica Server writes messages to the session log, it includes the thread type and thread ID.

What is a lookup function? What is the default transformation for the lookup function?
Lookup is a transformation in Informatica that is mainly used for obtaining the "key" values from dimension tables. A lookup can be connected or unconnected. If it is unconnected, it can be called like a function, but it can return only one value, whereas a connected lookup can return more than one value. A lookup contains: 1. input column(s)/value(s), 2. output column(s), 3. a condition. The default transformation for the lookup function is the Source Qualifier.
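As an illustration only (the dimension and column names are hypothetical), a lookup on a customer dimension is roughly equivalent to the following queries:

```sql
-- Hypothetical lookup on a customer dimension: the lookup condition compares
-- the input port IN_CUST_ID with CUST_ID, and the surrogate key CUST_KEY is
-- the output/return port.

-- Cached lookup: the cache is built from a query equivalent to
SELECT CUST_KEY, CUST_ID
FROM DIM_CUSTOMER;

-- Uncached lookup: a query of this shape is issued for each input row
SELECT CUST_KEY
FROM DIM_CUSTOMER
WHERE CUST_ID = :IN_CUST_ID;
```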

What is difference between a connected look up and unconnected look up?
Connected lookup vs. unconnected lookup:

• A connected lookup receives input values directly from the pipeline; an unconnected lookup receives input values from the result of a :LKP expression in another transformation.
• A connected lookup can use a dynamic or a static cache; an unconnected lookup can use only a static cache.
• The connected lookup cache includes all lookup columns used in the mapping; the unconnected lookup cache includes only the lookup output ports used in the lookup condition and the lookup/return port.
• A connected lookup supports user-defined default values; an unconnected lookup does not.

What is update strategy and what are the options for update strategy?
Informatica processes the source data row by row. By default, every row is marked to be inserted into the target table. If a row has to be updated or inserted based on some logic, an Update Strategy transformation is used; the condition is specified in the Update Strategy to mark the processed row for update or insert. The following options are available for the update strategy:
• DD_INSERT: the Update Strategy flags the row for insertion. The equivalent numeric value of DD_INSERT is 0.
• DD_UPDATE: the Update Strategy flags the row for update. The equivalent numeric value of DD_UPDATE is 1.
• DD_DELETE: the Update Strategy flags the row for deletion. The equivalent numeric value of DD_DELETE is 2.
• DD_REJECT: the Update Strategy flags the row for rejection. The equivalent numeric value of DD_REJECT is 3.
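As an illustration only (the target table and columns below are hypothetical), a row flagged with each option is applied to the target roughly as follows:

```sql
-- DD_INSERT (0): the row is written as an insert
INSERT INTO T_CUSTOMER (CUST_ID, CUST_NAME) VALUES (101, 'Smith');

-- DD_UPDATE (1): the row is written as an update keyed on the target key
UPDATE T_CUSTOMER SET CUST_NAME = 'Smith' WHERE CUST_ID = 101;

-- DD_DELETE (2): the row is written as a delete
DELETE FROM T_CUSTOMER WHERE CUST_ID = 101;

-- DD_REJECT (3): the row is not written to the target; it goes to the reject file
```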

What is subject area?
In terms of Erwin, we can break a huge schema into smaller parts for easier management, and each part is known as a subject area. There is always a main subject area which holds the entire schema, and there can be any number of user-defined subject areas as per our requirements.

What is the difference between truncate and delete statements?
The DELETE command is used to remove rows from a table. A WHERE clause can be used to only remove some rows. If no WHERE condition is specified, all rows will be removed. After performing a DELETE operation you need to COMMIT or ROLLBACK the transaction to make the change permanent or to undo it.

TRUNCATE removes all rows from a table. The operation cannot be rolled back. As such, TRUNCATE is faster and doesn't use as much undo space as a DELETE. The DROP command removes a table from the database. All of the table's rows, indexes, and privileges will also be removed. The operation cannot be rolled back. DROP and TRUNCATE are DDL commands, whereas DELETE is a DML command. Therefore DELETE operations can be rolled back (undone), while DROP and TRUNCATE operations cannot.
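A short illustration with a hypothetical ORDERS table:

```sql
-- DELETE is DML: it can be restricted with a WHERE clause and rolled back
DELETE FROM ORDERS WHERE ORDER_DATE < DATE '2001-01-01';
ROLLBACK;            -- the deleted rows are restored

-- TRUNCATE is DDL: it removes all rows and cannot be rolled back
TRUNCATE TABLE ORDERS;

-- DROP removes the table itself, along with its indexes and privileges
DROP TABLE ORDERS;
```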

What are bitmap indexes and how and why are they used?
A bitmap index is a special kind of database index that uses bitmaps. Bitmap indexes have traditionally been considered to work well for data such as gender, which has a small number of distinct values, e.g., male and female, but many occurrences of those values. This would happen if, for example, you had gender data for each resident in a city. Bitmap indexes have a significant space and performance advantage over other structures for such data. However, some researchers argue that bitmap indexes are also useful for unique-valued data which is not updated frequently.[1]

Bitmap indexes use bit arrays (commonly called "bitmaps") and answer queries by performing bitwise logical operations on these bitmaps. Bitmap indexes are also useful in data warehousing applications for joining a large fact table to smaller dimension tables[2], such as those arranged in a star schema. In other scenarios, a B-tree index would be more appropriate.
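A minimal sketch using Oracle syntax (the table and column names are hypothetical):

```sql
-- A bitmap index on a low-cardinality column of a dimension table
CREATE BITMAP INDEX IDX_CUST_GENDER
    ON DIM_CUSTOMER (GENDER);

-- A B-tree index remains the usual choice for high-cardinality,
-- frequently updated columns
CREATE INDEX IDX_CUST_ID
    ON DIM_CUSTOMER (CUST_ID);
```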

What are the different ways to filter rows using Informatica transformations?
Method 1: Sorter. Send all the data to a Sorter transformation and sort by all the fields from which you want to remove duplicates; in the properties tab, select the Distinct option so that only unique rows are passed forward. Method 2: Aggregator. Use an Aggregator transformation and group by the keys/fields from which you want to remove duplicates. Method 3: Use DISTINCT in the Source Qualifier SQL override, as sketched below.
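A sketch of the third method, assuming a hypothetical CUSTOMERS source:

```sql
-- SQL override in the Source Qualifier: only distinct combinations
-- of the selected columns are read into the pipeline
SELECT DISTINCT CUSTOMERS.CUST_ID, CUSTOMERS.CUST_NAME, CUSTOMERS.CITY
FROM CUSTOMERS
```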

What is a referential integrity error? How do you rectify it? What is the DTM process? What is target load order?
If there is more than one target table in a mapping, we need to specify the order in which the target tables should be loaded. For example, suppose our mapping has two target tables: 1. Customer, 2. Audit.

The Customer table should be populated first and then the Audit table; to enforce that, we use the target load order.

What exactly is a shortcut and how do you use it?
A shortcut is a reference (link) to an object in a shared folder; shortcuts are commonly used for sources and targets that are to be shared between different environments or projects. A shortcut is created by assigning 'Shared' status to a folder within the Repository Manager and then dragging objects from this folder into another open folder. This provides a single point of control/reference for the object, so multiple projects do not all have to import the sources and targets into their local folders. A reusable transformation, by contrast, is usually something kept local to a folder; an example would be a reusable Sequence Generator for allocating warehouse customer IDs, which would be useful if you were loading customer details from multiple source systems and allocating unique IDs to each new source key. Many mappings could use the same sequence, and the sessions would all draw from the same continuous pool of generated sequence numbers.

What is a shared folder?

A shared folder is like any other folder, but it can be accessed by all users (the access can be changed). It is mainly used to share objects between folders for reusability. For example, you can create a shared folder for keeping all the common mapplets, sources, targets and transformations, which can then be used across folders by creating shortcuts to them. By doing this we increase the reusability of the code; changes can also be made in one place and will be reflected in all the shortcuts.

What are the different transformations where you can use a SQL override? What is the difference between a Bulk and Normal mode and where exactly is it defined?
If you enable bulk loading, the PowerCenter Server bypasses the database log, which improves session performance. The disadvantage is that the target database cannot perform a rollback, as there is no database log. In normal load, the database log is not bypassed, and therefore the target database can recover from an incomplete session, but the session performance is not as high as in the case of a bulk load. The mode is defined in the session properties as the target load type for the relational target.

What is the difference between Local & Global repository?
• Global repository: This is a centralized repository in a domain. This repository can contain objects shared across the repositories in the domain. The objects are shared through global shortcuts.
• Local repository: A local repository is within a domain and is not the global repository. A local repository can connect to a global repository using global shortcuts and can use objects in its shared folders.

What are data driven sessions?
The Informatica server follows the instructions coded into the Update Strategy transformations within the session mapping to determine how to flag records for insert, update, delete, or reject. If you do not choose the data driven option setting, the Informatica server ignores all Update Strategy transformations in the mapping.

What are the common errors while running an Informatica session? What are worklets and what is their use?
A worklet is a set of tasks. If a certain set of tasks has to be reused in many workflows, we use a worklet. To execute a worklet, it has to be placed inside a workflow. The use of a worklet in a workflow is similar to the use of a mapplet in a mapping.

What is change data capture?
Changed Data Capture (CDC) helps identify the data in the source system that has changed since the last extraction. With CDC, data extraction takes place at the same time the insert, update, or delete operations occur in the source tables, and the change data is stored inside the database in change tables. The change data thus captured is then made available to the target systems in a controlled manner.
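A minimal sketch of the idea, assuming a hypothetical change table for a CUSTOMERS source:

```sql
-- Hypothetical change table populated as rows in CUSTOMERS are changed
CREATE TABLE CUSTOMERS_CT (
    CUST_ID      NUMBER,
    CUST_NAME    VARCHAR2(100),
    OPERATION    CHAR(1),        -- 'I' = insert, 'U' = update, 'D' = delete
    CAPTURE_TIME TIMESTAMP
);

-- The extraction then reads only the changes captured since the last run
SELECT CUST_ID, CUST_NAME, OPERATION
FROM CUSTOMERS_CT
WHERE CAPTURE_TIME > TIMESTAMP '2002-06-01 00:00:00';
```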

What exactly is tracing level?
The tracing level in Informatica specifies the level of detail of the information recorded in the session log file while executing the workflow. Four tracing levels are supported:
1. Normal: logs initialization and status information, a summarization of the successful and target rows, and information about the rows skipped due to transformation errors.
2. Terse: logs initialization information and notification of rejected data.
3. Verbose Initialization: in addition to Normal tracing, logs the location of the data cache and index cache files that are created, and detailed transformation statistics for each transformation within the mapping.
4. Verbose Data: in addition to Verbose Initialization, logs each and every record processed by the Informatica server.
For better performance of mapping execution the tracing level should be set to Terse; Verbose Initialization and Verbose Data are used for debugging purposes.

What is the difference between constraints based load ordering and target load plan?
The target load order is a Designer property: click the Mappings menu in the Designer and then Target Load Plan. It shows all the target load groups in the particular mapping; you specify the order there, and the server will load the targets accordingly. A target load group is a set of source, source qualifier, transformations and target. Constraint-based loading, on the other hand, is a session property. Here the multiple targets must be generated from one source qualifier, and the target tables must possess primary key/foreign key relationships, so that the server loads them according to the key relationships irrespective of the target load order plan.
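An illustration of the key relationship that constraint-based loading relies on, using hypothetical target tables:

```sql
-- Parent target
CREATE TABLE T_CUSTOMER (
    CUST_ID   NUMBER PRIMARY KEY,
    CUST_NAME VARCHAR2(100)
);

-- Child target with a foreign key back to the parent; with constraint-based
-- loading enabled, the server loads T_CUSTOMER before T_ORDERS
CREATE TABLE T_ORDERS (
    ORDER_ID  NUMBER PRIMARY KEY,
    CUST_ID   NUMBER REFERENCES T_CUSTOMER (CUST_ID),
    AMOUNT    NUMBER
);
```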

What is a deployment group and what is its use?

When and how is a partition defined using Informatica?
The partitioning option increases PowerCenter's performance through parallel data processing. It provides a thread-based architecture and automatic data partitioning that optimizes parallel processing on multiprocessor and grid-based hardware environments. Partitions are used to optimize session performance and are selected in the session properties. The partition types are: pass-through (the default), key range, round-robin, and hash partitioning.

How do you improve performance in an Update strategy?
We can use the Update Strategy transformation in two ways: 1. at the mapping level, 2. at the session level. The importance of the Update Strategy transformation in each case is as follows. If you want to update the existing record with the same source data, you can go for session-level update logic. If you want to apply a different set of rules for updating or inserting a record even when that record already exists in the warehouse table, you can go for a mapping-level Update Strategy transformation, for example when you are using a Router transformation to perform different activities.

How do you validate all the mappings in the repository at once?
Yes. We can validate all mappings using the Repository Manager.

How can you join two or more tables without using the source qualifier override SQL or a Joiner transformation?
You can use the Lookup transformation to perform the join. You may not get the same results as a SQL override (data coming from the same source) or a Joiner (same source or different sources), because a lookup returns a single record in the case of multiple matches. If one of the tables contains a single record, you don't need a SQL override or a Joiner to join the records; you can let it perform a Cartesian product. If the sources have the same structure, we can use a Union transformation.

How can you define a transformation? What are the different types of transformations in Informatica? How many types of repositories can be created in Informatica?
• Standalone repository: A repository that functions individually and is unrelated to any other repositories.
• Global repository: This is a centralized repository in a domain. This repository can contain objects shared across the repositories in the domain. The objects are shared through global shortcuts.
• Local repository: A local repository is within a domain and is not the global repository. A local repository can connect to a global repository using global shortcuts and can use objects in its shared folders.
• Versioned repository: This can be either a local or a global repository, but it allows version control for the repository. A versioned repository can store multiple copies, or versions, of an object. This feature allows metadata to be developed, tested and deployed into the production environment efficiently.

What is the minimum number of groups that can be defined in a Router transformation? How do you define partitions in Informatica?
http://www.scribd.com/doc/8359658/Pipeline-Partitioning-Overview-informatica

How can you improve performance in an Aggregator transformation?
a) Use sorted input to decrease the use of aggregate caches. b) Limit connected input/output or output ports c) Filter before aggregating (if you are using any filter condition)

How does Informatica know that the input is sorted?
How many worklets can be defined within a workflow?
How do you define a parameter file? Give an example of its use.
If you join two or more tables, pull about two columns from each table into the source qualifier, and then pull only one column from the source qualifier into an Expression transformation, how many columns will show up when you do a 'Generate SQL' in the source qualifier?
In a Type 1 mapping with one source and one target table, what is the minimum number of Update Strategy transformations to be used?
At what levels can you define parameter files, and what is the order?
In a session log file, where can you find the reader and the writer details?
For joining three heterogeneous tables, how many Joiner transformations are required?
Can you look up a flat file using Informatica?
While running a session, what default files are created?
Describe the use of materialized views and how they differ from a normal view.
What is the difference between a partition at the database level and a partition at the Informatica level? Is there a concept called database-level partitioning?

Informatica partitioning addresses how to load the data efficiently. When you configure the partitioning information for a pipeline, you must define a partition type at each partition point in the pipeline; the partition type determines how the Integration Service redistributes data across partition points. Database partitioning addresses how to store the data efficiently and how to retrieve it. Informatica can also use database partitioning: with the database partitioning type, the Integration Service queries the IBM DB2 or Oracle database system for table partition information and reads partitioned data from the corresponding nodes in the database. You can use database partitioning with Oracle or IBM DB2 source instances on a multi-node tablespace, and with DB2 targets. 1. When you use source database partitioning, the Integration Service queries the database system catalog for partition information and distributes the data from the database partitions among the session partitions. 2. If the session has more partitions than the database, the Integration Service generates SQL for each database partition and redistributes the data to the session partitions at the next partition point. Note that partitioning will not always increase performance: session performance with partitioning depends on the data distribution in the database partitions. The Integration Service generates SQL queries against the database partitions; these queries perform union or join commands, which can result in large query statements that have a performance impact.
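A brief sketch of database-level partitioning in Oracle syntax, with a hypothetical SALES table:

```sql
-- A range-partitioned sales table: the database stores each partition
-- separately, and the Integration Service can read the partitions in parallel
CREATE TABLE SALES (
    SALE_ID   NUMBER,
    SALE_DATE DATE,
    AMOUNT    NUMBER
)
PARTITION BY RANGE (SALE_DATE) (
    PARTITION P_2001 VALUES LESS THAN (DATE '2002-01-01'),
    PARTITION P_2002 VALUES LESS THAN (DATE '2003-01-01'),
    PARTITION P_MAX  VALUES LESS THAN (MAXVALUE)
);
```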