
FINAL INTERVIEW QUESTIONS (ETL - INFORMATICA)

Data Warehousing Basics

1. Definition of data warehousing?
A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision-making process.

Subject Oriented: Data warehouses are designed to help you analyze data. For example, to learn more about your company's sales data, you can build a warehouse that concentrates on sales. Using this warehouse, you can answer questions like "Who was our best customer for this item last year?" This ability to define a data warehouse by subject matter (sales, in this case) makes the data warehouse subject oriented.

Integrated: Integration is closely related to subject orientation. Data warehouses must put data from disparate sources into a consistent format. They must resolve such problems as naming conflicts and inconsistencies among units of measure. When they achieve this, they are said to be integrated.

Nonvolatile: Nonvolatile means that, once entered into the warehouse, data should not change. This is logical because the purpose of a warehouse is to enable you to analyze what has occurred.

Time Variant: In order to discover trends in business, analysts need large amounts of data. This is very much in contrast to online transaction processing (OLTP) systems, where performance requirements demand that historical data be moved to an archive. A data warehouse's focus on change over time is what is meant by the term time variant.

2. How many stages are there in data warehousing?
A data warehouse generally involves two stages: ETL and report generation.

ETL
Short for extract, transform, load: three database functions that are combined into one tool.
Extract -- the process of reading data from a source database.
Transform -- the process of converting the extracted data from its previous form into the required form.
Load -- the process of writing the data into the target database.
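The extract/transform/load steps above can be sketched in a few lines of Python. This is a toy illustration, not Informatica itself; the source rows, the name/amount transform, and the in-memory target list are all invented for the example:

```python
# Toy ETL pipeline: extract rows, transform them, load into a target.
# All data here is invented for illustration.

def extract():
    # Extract: read rows from a source (hard-coded here in place of a database).
    return [{"name": "alice", "amount": "10"}, {"name": "bob", "amount": "25"}]

def transform(rows):
    # Transform: convert each row into the required form
    # (proper-case the name, cast the amount to an integer).
    return [{"name": r["name"].title(), "amount": int(r["amount"])} for r in rows]

def load(rows, target):
    # Load: write the transformed rows into the target
    # (a list standing in for a warehouse table).
    target.extend(rows)

warehouse = []
load(transform(extract()), warehouse)
print(warehouse)  # [{'name': 'Alice', 'amount': 10}, {'name': 'Bob', 'amount': 25}]
```

In a real tool each of these three functions is a configurable stage rather than hand-written code, but the data flow is the same.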

ETL is used to migrate data from one database to another, to build data marts and data warehouses, and to convert databases from one format to another. It retrieves data from various operational databases, transforms it into useful information, and finally loads it into the data warehousing system.

Widely used ETL tools include:
1. Informatica
2. Ab Initio
3. DataStage
4. BODI (Business Objects Data Integrator)
5. Oracle Warehouse Builder

Report Generation
In report generation, OLAP (online analytical processing) is used. It is a set of specifications which allows client applications to retrieve data for analytical processing. It is a specialized tool that sits between a database and the user in order to provide various analyses of the data stored in the database. An OLAP tool is a reporting tool which generates reports that are useful for decision support for top-level management.

Widely used OLAP tools include:
1. Business Objects
2. Cognos
3. MicroStrategy
4. Hyperion
5. Oracle Express
6. Microsoft Analysis Services

Differences between OLTP and OLAP

1. OLTP: Application oriented (e.g., a purchase order is functionality of an application). OLAP: Subject oriented (subject in the sense of customer, product, item, time).
2. OLTP: Used to run the business. OLAP: Used to analyze the business.
3. OLTP: Detailed data. OLAP: Summarized data.
4. OLTP: Repetitive access. OLAP: Ad-hoc access.
5. OLTP: Few records accessed at a time (tens), simple queries. OLAP: Large volumes accessed at a time (millions), complex queries.
6. OLTP: Small database. OLAP: Large database.
7. OLTP: Current data. OLAP: Historical data.
8. OLTP: Clerical user. OLAP: Knowledge user.
9. OLTP: Row-by-row loading. OLAP: Bulk loading.
10. OLTP: Time invariant. OLAP: Time variant.
11. OLTP: Normalized data. OLAP: De-normalized data.
12. OLTP: E-R schema. OLAP: Star schema.
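The query-pattern difference above can be made concrete with Python's sqlite3 module (the sales table and its rows are invented for illustration): an OLTP-style point lookup touches one current record, while an OLAP-style query aggregates many historical rows.

```python
import sqlite3

# In-memory database with a small invented sales table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (order_id INTEGER, region TEXT, year INTEGER, amount REAL)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [(1, "East", 2022, 100.0), (2, "East", 2023, 150.0),
     (3, "West", 2022, 80.0), (4, "West", 2023, 120.0)],
)

# OLTP-style: simple query, few records accessed, current data.
row = con.execute("SELECT amount FROM sales WHERE order_id = 2").fetchone()
print(row)  # (150.0,)

# OLAP-style: aggregate query over the full history, grouped by a subject.
summary = con.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(summary)  # [('East', 250.0), ('West', 200.0)]
```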

3. What are the types of data warehousing?

EDW (Enterprise Data Warehouse)
- Provides a central database for decision support throughout the enterprise.
- It is a collection of data marts.

Data Mart
- A subset of the data warehouse.
- A subject-oriented database which supports the needs of individual departments in an organization.
- Called a high-performance query structure.
- Supports a particular line of business, such as sales, marketing, etc.

ODS (Operational Data Store)
- Defined as an integrated view of operational databases, designed to support operational monitoring.
- A collection of operational data sources designed to support transaction processing.
- Data is refreshed near real-time and used for business activity.
- An intermediate layer between OLTP and OLAP which helps to create instant reports.

4. What are the models involved in Data Warehouse Architecture?

5. What are the types of approach in DWH?

Bottom-up approach: first we develop data marts, then we integrate these data marts into an EDW.
Top-down approach: first we develop the EDW, then from that EDW we develop data marts.

Bottom up: OLTP -> ETL -> Data mart -> DWH -> OLAP
Top down: OLTP -> ETL -> DWH -> Data mart -> OLAP

Top down:
- Cost of initial planning and design is high.
- Takes a longer duration, often more than a year.

Bottom up:
- Planning and designing the data marts proceeds without waiting for the global warehouse design.
- Immediate results from the data marts.
- Tends to take less time to implement.
- Errors in critical modules are detected earlier.
- Benefits are realized in the early phases.
- It is considered the best approach.

Data Modeling Types:
1. Conceptual Data Modeling
2. Logical Data Modeling
3. Physical Data Modeling
4. Dimensional Data Modeling

1. Conceptual Data Modeling
A conceptual data model includes all major entities and relationships but does not contain much detail about attributes; it is often used in the INITIAL PLANNING PHASE. A conceptual data model is created by gathering business requirements from various sources such as business documents, discussions with functional teams, business analysts, subject matter experts, and end users who do the reporting on the database. Data modelers create the conceptual data model and forward it to the functional team for review. Conceptual data modeling gives the functional and technical teams an idea of how the business requirements will be projected into the logical data model.

2. Logical Data Modeling
This is the actual implementation and extension of the conceptual data model. A logical data model includes all required entities, attributes, key groups, and relationships that represent business information and define business rules.

3. Physical Data Modeling
A physical data model includes all required tables, columns, relationships, and database properties for the physical implementation of the database. Database performance, indexing strategy, physical storage, and denormalization are important parameters of a physical model.

Logical vs. Physical Data Modeling
A logical data model represents business information and defines business rules; a physical data model represents the physical implementation of the model in a database. The constructs map as follows:

Entity -> Table
Attribute -> Column
Primary Key -> Primary Key Constraint
Alternate Key -> Unique Constraint or Unique Index
Inversion Key Entry -> Non-Unique Index
Rule -> Check Constraint, Default Value
Relationship -> Foreign Key
Definition -> Comment

4. Dimensional Data Modeling
A dimensional model consists of fact and dimension tables. It is an approach to developing schema designs for the database.

Types of dimensional modeling:
- Star schema
- Snowflake schema
- Star flake schema (or) Hybrid schema

- Multi star schema

What is Star Schema?
A star schema is a logical database design which contains a centrally located fact table surrounded by one or more dimension tables. Since the database design looks like a star, it is called a star schema.
- The dimension table contains primary keys and the textual descriptions. It contains de-normalized business information.
- A fact table contains a composite key and measures.
- The measures are key performance indicators used to evaluate the enterprise's performance in terms of success and failure. Eg: total revenue, product sales, discount given, number of customers.
- To generate a meaningful report, the report should draw on at least one dimension table and one fact table.

Advantages of the star schema:
- Fewer joins
- Improved query performance
- Slicing down
- Easy understanding of data

Disadvantage:
- Requires more storage space
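A minimal star schema can be tried out with sqlite3 (the table names and rows below are invented for illustration): the fact table holds keys and measures, the dimension table holds the textual descriptions, and a report joins the two.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Dimension table: surrogate key plus textual descriptions.
con.execute("CREATE TABLE dim_product "
            "(product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT)")
# Fact table: foreign key to the dimension plus numeric measures.
con.execute("CREATE TABLE fact_sales (product_key INTEGER, quantity INTEGER, revenue REAL)")

con.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                [(1, "Pen", "Stationery"), (2, "Notebook", "Stationery")])
con.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                [(1, 10, 20.0), (2, 5, 25.0), (1, 3, 6.0)])

# A meaningful report joins at least one dimension to the fact table:
# total revenue by product name (a single join in a star schema).
report = con.execute("""
    SELECT d.product_name, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product d ON d.product_key = f.product_key
    GROUP BY d.product_name
    ORDER BY d.product_name
""").fetchall()
print(report)  # [('Notebook', 25.0), ('Pen', 26.0)]
```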

Example of Star Schema:

Snowflake Schema
In a snowflake schema, the dimension tables of the star schema are split into one or more additional dimension tables; that is, the de-normalized dimension tables are split into normalized dimension tables.

Example of Snowflake Schema:

In the snowflake schema, the example diagram shown below has 4 dimension tables, 4 lookup tables, and 1 fact table. The reason is that the hierarchies (category, branch, state, and month) are broken out of the dimension tables (PRODUCT, ORGANIZATION, LOCATION, and TIME respectively) into separate tables. This increases the number of joins and gives poorer performance when retrieving data. A few organizations normalize the dimension tables this way to save space; but since dimension tables hold comparatively little data, the snowflake schema approach is often avoided. Bitmap indexes also cannot be effectively utilized.

Important aspects of star schema and snowflake schema:
- In a star schema, every dimension has a primary key.
- In a star schema, a dimension table does not have any parent table, whereas in a snowflake schema a dimension table has one or more parent tables.
- Hierarchies for the dimensions are stored in the dimension table itself in a star schema, whereas hierarchies are broken into separate tables in a snowflake schema. These hierarchies help to drill down the data from the topmost level to the lowermost level.

Star flake schema (or) Hybrid schema
A hybrid schema is a combination of star and snowflake schemas.

Multi star schema
Multiple fact tables sharing a set of dimension tables.
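The extra join a snowflake schema introduces can be seen by normalizing a product dimension's category hierarchy into its own lookup table (table names and rows invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Snowflaked dimension: the category hierarchy is broken out of the
# product dimension into its own normalized lookup table.
con.execute("CREATE TABLE lkp_category (category_key INTEGER PRIMARY KEY, category_name TEXT)")
con.execute("CREATE TABLE dim_product "
            "(product_key INTEGER PRIMARY KEY, product_name TEXT, category_key INTEGER)")
con.execute("CREATE TABLE fact_sales (product_key INTEGER, revenue REAL)")

con.executemany("INSERT INTO lkp_category VALUES (?, ?)", [(1, "Stationery")])
con.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", [(1, "Pen", 1)])
con.executemany("INSERT INTO fact_sales VALUES (?, ?)", [(1, 20.0), (1, 6.0)])

# Reporting by category now needs TWO joins (fact -> product -> category),
# where a star schema would need only one.
report = con.execute("""
    SELECT c.category_name, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product d ON d.product_key = f.product_key
    JOIN lkp_category c ON c.category_key = d.category_key
    GROUP BY c.category_name
""").fetchall()
print(report)  # [('Stationery', 26.0)]
```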

Conformed dimensions are nothing but reusable dimensions: the dimensions which you use multiple times or in multiple data marts. They are common across different data marts.

Measure Types (or) Types of Facts

- Additive: measures that can be summed up across all dimensions.
  Ex: sales revenue
- Semi-additive: measures that can be summed up across some dimensions but not others.
  Ex: current balance
- Non-additive: measures that cannot be summed up across any of the dimensions.
  Ex: student attendance
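The additive/semi-additive distinction can be seen in a toy Python example (the daily snapshots are invented): sales revenue sums meaningfully over the time dimension, while summing an account's current balance across days double-counts it; across time, a balance is usually taken as the latest (or an average) value instead.

```python
# Invented daily snapshots for one account: (day, sales_revenue, current_balance)
rows = [
    ("Mon", 100.0, 500.0),
    ("Tue", 150.0, 650.0),
    ("Wed", 50.0, 700.0),
]

# Additive fact: summing revenue across the time dimension is meaningful.
total_revenue = sum(r[1] for r in rows)
print(total_revenue)  # 300.0

# Semi-additive fact: summing balances across time (1850.0) is meaningless;
# across the time dimension, take the latest snapshot instead.
balance_over_time = rows[-1][2]
print(balance_over_time)  # 700.0
```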

Surrogate Key
- Joins between fact and dimension tables should be based on surrogate keys.
- Users should not be able to obtain any information by looking at these keys.
- These keys should be simple integers.
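A minimal sketch of surrogate key assignment in Python (the natural keys and the sequence logic are invented for illustration): each natural key gets a meaningless, monotonically increasing integer the first time it is seen, and the same integer thereafter.

```python
# Assign simple-integer surrogate keys to natural keys as rows arrive.
surrogate_of = {}  # natural key -> surrogate key
next_key = 1

def get_surrogate(natural_key):
    # Hand out the next integer the first time a natural key is seen;
    # the surrogate itself carries no business meaning.
    global next_key
    if natural_key not in surrogate_of:
        surrogate_of[natural_key] = next_key
        next_key += 1
    return surrogate_of[natural_key]

print(get_surrogate("CUST-0042"))  # 1
print(get_surrogate("CUST-0007"))  # 2
print(get_surrogate("CUST-0042"))  # 1  (same natural key, same surrogate)
```

In a real warehouse load this role is typically played by a database sequence or, in Informatica, a Sequence Generator transformation.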

A sample data warehouse schema

Why do we need a staging area for a DWH?
The staging area is where operational data is cleaned before loading into the data warehouse. Cleaning here means merging data which comes from different sources. It is the area where most of the ETL work is done.

Data Cleansing
- It is used to remove duplicates.
- It is used to correct wrong email addresses.
- It is used to identify missing data.
- It is used to convert data types.
- It is used to capitalize names and addresses.

Types of Dimensions:
- Conformed dimensions
- Junk (garbage) dimensions
- Degenerate dimensions
- Slowly changing dimensions

A conformed dimension is one which can be shared by multiple fact tables or multiple data marts.
A junk dimension is a grouping of flagged values.
A degenerate dimension is something dimensional in nature but existing in the fact table (e.g., invoice number). It is neither a fact nor strictly a dimension attribute, yet useful for some kinds of analysis; such attributes are kept in the fact table and called degenerate dimensions.

Degenerate dimension: a column of the key section of the fact table that does not have an associated dimension table but is used for reporting and analysis; such a column is called a degenerate dimension or line-item dimension. For example, suppose we have a fact table with customer_id, product_id, branch_id, employee_id, bill_no, and date in the key section and price, quantity, and amount in the measure section. In this fact table, bill_no from the key section is a single value; it has no associated dimension table. Instead of creating a separate dimension table for that single value, we can include it in the fact table to improve performance. So here the column bill_no is a degenerate dimension or line-item dimension.

Informatica Architecture

The PowerCenter Domain
- It is the primary unit of administration; there can be a single domain or multiple domains.
- It is a collection of nodes and services.

Nodes
- A node is the logical representation of a machine in a domain.
- One node in the domain acts as a gateway node to receive service requests from clients and route them to the appropriate service and node.

Integration Service: the Integration Service does all the real work. It extracts data from sources, processes it as per the business logic, and loads data to targets.

Repository Service: the Repository Service fetches data from the repository and sends it back to the requesting components (mostly the client tools and the Integration Service).

PowerCenter Repository: the repository is a relational database which stores all the metadata created in PowerCenter.

PowerCenter Client Tools: the PowerCenter client consists of multiple tools.

PowerCenter Administration Console: a web-based administration tool used to administer the PowerCenter installation.

Q. How can you define a transformation? What are the different types of transformations available in Informatica?
A. A transformation is a repository object that generates, modifies, or passes data. The Designer provides a set of transformations that perform specific functions. For example, an Aggregator transformation performs calculations on groups of data. Below are the various transformations available in Informatica:
- Aggregator
- Custom
- Expression
- External Procedure
- Filter
- Input
- Joiner
- Lookup
- Normalizer
- Rank
- Router
- Sequence Generator
- Sorter
- Source Qualifier
- Stored Procedure
- Transaction Control
- Union
- Update Strategy
- XML Generator
- XML Parser

- XML Source Qualifier

Q. What is a Source Qualifier? What is meant by Query Override?
A. A Source Qualifier represents the rows that the PowerCenter Server reads from a relational or flat file source when it runs a session. When a relational or flat file source definition is added to a mapping, it is connected to a Source Qualifier transformation. The PowerCenter Server generates a query for each Source Qualifier transformation whenever it runs the session. The default query is a SELECT statement containing all the source columns. The Source Qualifier can override this default query through the transformation properties (Query Override). The list of selected ports, and the order in which they appear in the default query, should not be changed in the overridden query.

Q. What is the Aggregator transformation?
A. The Aggregator transformation allows performing aggregate calculations, such as averages and sums. Unlike the Expression transformation, the Aggregator transformation can be used to perform calculations on groups; the Expression transformation permits calculations on a row-by-row basis only. The Aggregator transformation contains group-by ports that indicate how to group the data. While grouping the data, the Aggregator transformation outputs the last row of each group unless otherwise specified in the transformation properties. The aggregate functions available in Informatica are: AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE, STDDEV, SUM, VARIANCE.

Q. What is Incremental Aggregation?
A. Whenever a session is created for a mapping with an Aggregator transformation, the session option for Incremental Aggregation can be enabled. When PowerCenter performs incremental aggregation, it passes new source data through the mapping and uses historical cache data to perform the new aggregation calculations incrementally.

Q. How is the Union transformation used?
A. The Union transformation is a multiple input group transformation that can be used to merge data from various sources (or pipelines). This transformation works just like the UNION ALL statement in SQL, which is used to combine the result sets of two SELECT statements.

Q. Can two flat files be joined with the Joiner transformation?
A. Yes, the Joiner transformation can be used to join data from two flat file sources.
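The UNION ALL behavior that the Union transformation mimics — concatenating pipelines without removing duplicates — can be sketched in Python (the two source row lists are invented):

```python
from itertools import chain

# Two invented source pipelines with one overlapping row.
pipeline_a = [("Alice", 100), ("Bob", 200)]
pipeline_b = [("Bob", 200), ("Carol", 300)]

# UNION ALL semantics: simple concatenation, duplicates preserved.
union_all = list(chain(pipeline_a, pipeline_b))
print(union_all)
# [('Alice', 100), ('Bob', 200), ('Bob', 200), ('Carol', 300)]

# (A plain SQL UNION, by contrast, would drop the duplicate ('Bob', 200) row.)
```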

Q. What is a Lookup transformation?
A. This transformation is used to look up data in a flat file or a relational table, view, or synonym. It compares Lookup transformation ports (input ports) to the lookup source column values based on the lookup condition. The returned values can then be passed to other transformations.

Q. Can a lookup be done on flat files?
A. Yes.

Q. What is a mapplet?
A. A mapplet is a reusable object that is created using the Mapplet Designer. A mapplet contains a set of transformations and allows us to reuse that transformation logic in multiple mappings.

Q. What does "reusable transformation" mean?
A. Reusable transformations can be used multiple times in a mapping. A reusable transformation is stored as metadata separate from any mapping that uses it. Whenever any changes are made to a reusable transformation, all the mappings where the transformation is used will be invalidated.

Q. What is the Update Strategy, and what are its options?
A. Informatica processes the source data row by row. By default every row is marked to be inserted into the target table. If a row has to be updated or inserted based on some logic, the Update Strategy transformation is used; the condition can be specified in the Update Strategy to mark the processed row for update or insert. The following options are available:
DD_INSERT: flags the row for insertion. Equivalent numeric value: 0.
DD_UPDATE: flags the row for update. Equivalent numeric value: 1.
DD_DELETE: flags the row for deletion. Equivalent numeric value: 2.

DD_REJECT: flags the row for rejection. Equivalent numeric value: 3.

Q. What are the types of loading in Informatica?
A. There are two types of loading: normal loading and bulk loading. In normal loading, records are loaded one at a time and a log entry is written for each; it takes comparatively longer to load data to the target. In bulk loading, many records are loaded at a time to the target database, which takes less time.

Q. What is the aggregate cache in the Aggregator transformation?
A. The Aggregator stores data in the aggregate cache until it completes the aggregate calculations. When you run a session that uses an Aggregator transformation, the Informatica server creates index and data caches in memory to process the transformation. If the Informatica server requires more space, it stores overflow values in cache files.

Q. What types of repositories can be created using the Informatica Repository Manager?
A. Informatica PowerCenter includes the following types of repositories:
Standalone Repository: a repository that functions individually, unrelated to any other repositories.
Global Repository: a centralized repository in a domain. This repository can contain objects shared across the repositories in the domain; the objects are shared through global shortcuts.
Local Repository: a repository within a domain that is not the global repository. A local repository can connect to the global repository using global shortcuts and can use objects in its shared folders.
Versioned Repository: either a local or a global repository that enables version control. A versioned repository can store multiple copies, or versions, of an object. This feature allows efficient development, testing, and deployment of metadata into the production environment.

Q. What is a code page?
A. A code page contains the encoding to specify characters in a set of one or more languages.
The code page is selected based on the source of the data. For example, if the source contains Japanese text, then the code page should be selected to support Japanese text. When a code page is chosen, the program or application for which the code page is set refers to a specific set of data that describes the characters the application recognizes. This influences the way the application stores, receives, and sends character data.
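Python's codec machinery gives a quick feel for why the code page matters (a generic illustration, not Informatica-specific): the same characters produce different bytes under different encodings, and decoding with the wrong one fails or garbles the text.

```python
text = "日本語"  # Japanese text

# The same characters encode to different byte sequences per encoding.
utf8_bytes = text.encode("utf-8")
sjis_bytes = text.encode("shift_jis")  # a common Japanese code page
print(len(utf8_bytes), len(sjis_bytes))  # 9 6

# Decoding with the matching code page round-trips correctly...
print(sjis_bytes.decode("shift_jis"))  # 日本語

# ...while decoding with the wrong one fails.
try:
    sjis_bytes.decode("utf-8")
except UnicodeDecodeError:
    print("wrong code page")
```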

Q. Which databases can the PowerCenter Server on Windows connect to?
A. The PowerCenter Server on Windows can connect to the following databases:
- IBM DB2
- Informix
- Microsoft Access
- Microsoft Excel
- Microsoft SQL Server
- Oracle
- Sybase
- Teradata

Q. Which databases can the PowerCenter Server on UNIX connect to?
A. The PowerCenter Server on UNIX can connect to the following databases:
- IBM DB2
- Informix
- Oracle
- Sybase
- Teradata

Q. How do you execute a PL/SQL script from an Informatica mapping?
A. The Stored Procedure (SP) transformation can be used to execute PL/SQL scripts. The PL/SQL procedure name is specified in the SP transformation; whenever the session is executed, the session calls the PL/SQL procedure.

Q. What is Data Driven?
A. The Informatica server follows the instructions coded into Update Strategy transformations within the session mapping, which determine how to flag records for insert, update, delete, or reject. If the Data Driven option is not chosen, the Informatica server ignores all Update Strategy transformations in the mapping.
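The data-driven flagging described above can be sketched in Python. The DD_* numeric values match the ones listed for the Update Strategy transformation; the routing rule and the rows are invented for illustration:

```python
# Update Strategy flag values (DD_INSERT=0, DD_UPDATE=1, DD_DELETE=2, DD_REJECT=3).
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3

def flag_row(row, existing_keys):
    # Invented rule: reject rows with no amount, update rows whose key
    # already exists in the target, insert the rest.
    if row["amount"] is None:
        return DD_REJECT
    return DD_UPDATE if row["key"] in existing_keys else DD_INSERT

existing = {"K1"}
rows = [
    {"key": "K1", "amount": 10},    # known key   -> update
    {"key": "K2", "amount": 20},    # unknown key -> insert
    {"key": "K3", "amount": None},  # bad row     -> reject
]
flags = [flag_row(r, existing) for r in rows]
print(flags)  # [1, 0, 3]
```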

Q. What are the types of mapping wizards provided in Informatica?
A. The Designer provides two mapping wizards:
1. Getting Started Wizard - creates mappings to load static fact and dimension tables as well as slowly growing dimension tables.
2. Slowly Changing Dimensions Wizard - creates mappings to load slowly changing dimension tables based on the amount of historical dimension data we want to keep and the method we choose for handling historical dimension data.

Q. What is the Load Manager?
A. While running a workflow, the PowerCenter Server uses the Load Manager process and the Data Transformation Manager (DTM) process to run the workflow and carry out workflow tasks. When the PowerCenter Server runs a workflow, the Load Manager performs the following tasks:
1. Locks the workflow and reads workflow properties.
2. Reads the parameter file and expands workflow variables.
3. Creates the workflow log file.
4. Runs workflow tasks.
5. Distributes sessions to worker servers.
6. Starts the DTM to run sessions.
7. Runs sessions from master servers.
8. Sends post-session email if the DTM terminates abnormally.

When the PowerCenter Server runs a session, the DTM performs the following tasks:
1. Fetches session and mapping metadata from the repository.
2. Creates and expands session variables.
3. Creates the session log file.
4. Validates session code pages if data code page validation is enabled. Checks

query conversions if data code page validation is disabled.
5. Verifies connection object permissions.
6. Runs pre-session shell commands.
7. Runs pre-session stored procedures and SQL.
8. Creates and runs mapping, reader, writer, and transformation threads to extract, transform, and load data.
9. Runs post-session stored procedures and SQL.
10. Runs post-session shell commands.
11. Sends post-session email.

Q. What is the Data Transformation Manager?
A. After the Load Manager performs validations for the session, it creates the DTM process. The DTM process is the second process associated with the session run. The primary purpose of the DTM process is to create and manage the threads that carry out the session tasks. The DTM allocates process memory for the session and divides it into buffers; this is also known as buffer memory. It creates the main thread, which is called the master thread. The master thread creates and manages all other threads. If we partition a session, the DTM creates a set of threads for each partition to allow concurrent processing. When the Informatica server writes messages to the session log, it includes the thread type and thread ID. The DTM creates the following types of threads:

Master thread - the main thread of the DTM process; creates and manages all other threads.

Mapping thread - one thread for each session; fetches session and mapping information.
Pre- and post-session thread - one thread each to perform pre- and post-session operations.
Reader thread - one thread for each partition for each source pipeline.
Writer thread - one thread for each partition, if a target exists in the source pipeline, to write to the target.
Transformation thread - one or more transformation threads for each partition.

Q. What are sessions and batches?
A. Session - a session is a set of instructions that tells the Informatica server how and when to move data from sources to targets. After creating a session, we can use either the Server Manager or the command line program pmcmd to start or stop the session.
Batches - a batch provides a way to group sessions for either serial or parallel execution by the Informatica server. There are two types of batches:
1. Sequential - runs sessions one after the other.
2. Concurrent - runs sessions at the same time.

Q. In how many ways can you update a relational source definition, and what are they?
A. Two ways:
1. Edit the definition.
2. Reimport the definition.

Q. What is a transformation?

A. It is a repository object that generates, modifies, or passes data.

Q. What are the Designer tools for creating transformations?
A.
- Mapping Designer
- Transformation Developer
- Mapplet Designer

Q. In how many ways can you create ports?
A. Two ways:
1. Drag the port from another transformation.
2. Click the Add button on the Ports tab.

Q. What are reusable transformations?
A. A transformation that can be reused is called a reusable transformation. They can be created using two methods:
1. Using the Transformation Developer.
2. Creating a normal transformation and promoting it to reusable.

Q. Is there an aggregate cache in the Aggregator transformation?
A. The Aggregator stores data in the aggregate cache until it completes the aggregate calculations. When you run a session that uses an Aggregator transformation, the Informatica server creates index and data caches in memory to process the transformation. If the Informatica server requires more space, it stores overflow values in cache files.

Q. What are the settings that you use to configure the Joiner transformation?
A.
- Master and detail source
- Type of join
- Condition of the join

Q. What are the join types in the Joiner transformation?
A. Normal (default) -- only matching rows from both master and detail.
Master outer -- all detail rows and only matching rows from master.

Detail outer -- all master rows and only matching rows from detail.
Full outer -- all rows from both master and detail (matching or non-matching).

Q. What are the Joiner caches?
A. When a Joiner transformation occurs in a session, the Informatica server reads all the records from the master source and builds index and data caches based on the master rows. After building the caches, the Joiner transformation reads records from the detail source and performs the joins.

Q. What are the types of lookup caches?
Static cache: You can configure a static, or read-only, cache for any lookup table; by default the Informatica server creates a static cache. It caches the lookup table and looks up values in the cache for each row that comes into the transformation. When the lookup condition is true, the Informatica server does not update the cache while it processes the Lookup transformation.
Dynamic cache: If you want to cache the target table and insert new rows into both the cache and the target, you can create a Lookup transformation that uses a dynamic cache; the Informatica server dynamically inserts data into the target table.
Persistent cache: You can save the lookup cache files and reuse them the next time the Informatica server processes a Lookup transformation configured to use the cache.
Recache from database: If the persistent cache is not synchronized with the lookup table, you can configure the Lookup transformation to rebuild the lookup cache.
Shared cache: You can share the lookup cache between multiple transformations. You can share an unnamed cache between transformations in the same mapping.

Q. What is a transformation?
A. A transformation is a repository object that generates, modifies, or passes data; each transformation performs a specific function. There are two types of transformations:
1. Active - rows are affected during the transformation, or the transformation can change the number of rows that pass through it.
Eg: Aggregator, Filter, Joiner, Normalizer, Rank, Router, Source Qualifier, Update Strategy, ERP Source Qualifier, Advanced External Procedure.
2. Passive

Does not change the number of rows that pass through it.
Eg: Expression, External Procedure, Input, Lookup, Stored Procedure, Output, Sequence Generator, XML Source Qualifier.

Q. What are the options/types for running a Stored Procedure?
A.
Normal: During a session, the stored procedure runs where the transformation exists in the mapping, on a row-by-row basis. This is useful for calling the stored procedure for each row of data that passes through the mapping, such as running a calculation against an input port. Connected stored procedures run only in normal mode.
Pre-load of the Source: Before the session retrieves data from the source, the stored procedure runs. This is useful for verifying the existence of tables or performing joins of data in a temporary table.
Post-load of the Source: After the session retrieves data from the source, the stored procedure runs. This is useful for removing temporary tables.
Pre-load of the Target: Before the session sends data to the target, the stored procedure runs. This is useful for verifying target tables or disk space on the target system.
Post-load of the Target: After the session sends data to the target, the stored procedure runs. This is useful for re-creating indexes on the database.
The transformation must contain at least one input and one output port.

Q. What kinds of sources and targets can be used in Informatica?
A. Sources may be flat files, relational databases, or XML. Targets may be relational tables, XML, or flat files.

Q. What is the session process?
A. The Load Manager process starts the session, creates the DTM process, and sends post-session email when the session completes.

Q. What is the DTM process?
A. The DTM process creates threads to initialize the session, read, write, and transform data, and handle pre- and post-session operations.

Q. What are the different types of tracing levels?

A. The tracing level represents the amount of information that the Informatica server writes to the log file. Tracing levels store information about the mapping and transformations. Four tracing levels are supported:
1. Normal: logs initialization and status information, a summary of the successful rows and target rows, and information about rows skipped due to transformation errors.
2. Terse: logs initialization information, error messages, and notification of rejected data.
3. Verbose Initialization: in addition to Normal tracing, logs the location of the data cache and index cache files that are created, and detailed transformation statistics for each transformation in the mapping.
4. Verbose Data: along with Verbose Initialization, logs each and every record processed by the Informatica server.

Q. What are the types of dimensions?
A. A dimension table consists of the attributes about the facts; dimensions store the textual descriptions of the business.

Conformed dimension: a conformed dimension means the exact same thing with every possible fact table to which it is joined. Eg: the date dimension table connected to the sales facts is identical to the date dimension connected to the inventory facts.

Junk dimension: a junk dimension is a collection of random transactional codes, flags, and/or text attributes that are unrelated to any particular dimension. The junk dimension is simply a structure that provides a convenient place to store the junk attributes. Eg: assume that we have a gender dimension and a marital status dimension. In the fact table we would need to maintain two keys referring to these dimensions. Instead, create a junk dimension which has all the combinations of gender and marital status (cross join the gender and marital status tables to create the junk table). Now we can maintain only one key in the fact table.

Degenerate dimension: a degenerate dimension is a dimension which is derived from the fact table and does not have its own dimension table.

Eg: A transactional code in a fact table.
Slowly changing dimension: Slowly changing dimensions are dimension tables that have slowly increasing data as well as updates to existing data.
Q. What are the output files that the Informatica Server creates during a session run?
Informatica server log: The Informatica Server (on UNIX) creates a log for all status and error messages (default name: pm.server.log). It also creates an error log for error messages. These files are created in the Informatica home directory.
Session log file: The Informatica Server creates a session log file for each session. It writes information about the session into the log file, such as the initialization process, creation of SQL commands for reader and writer threads, errors encountered and the load summary. The amount of detail in the session log depends on the tracing level that you set.
Session detail file: This file contains load statistics for each target in the mapping, such as table name and the number of rows written or rejected. You can view this file by double-clicking the session in the Monitor window.
Performance detail file: This file contains session performance details which help you see where performance can be improved. To generate this file, select the performance detail option in the session property sheet.
Reject file: This file contains the rows of data that the writer does not write to targets.
Control file: The Informatica Server creates a control file and a target file when you run a session that uses the external loader. The control file contains information about the target flat file, such as data format and loading instructions for the external loader.
Post-session email: Post-session email allows you to automatically communicate information about a session run to designated recipients. You can create two different messages: one if the session completed successfully, the other if the session fails.
Indicator file: If you use a flat file as a target, you can configure the Informatica Server to create an indicator file. For each target row, the indicator file contains a number to indicate whether the row was marked for insert, update, delete or reject.
Output file: If a session writes to a target file, the Informatica Server creates the target file based on the file properties entered in the session property sheet.
Cache files: When the Informatica Server creates a memory cache, it also creates cache files. The Informatica Server creates index and data cache files for the following transformations: Aggregator, Joiner, Rank and Lookup.
Q. What is meant by lookup caches?
A. The Informatica Server builds a cache in memory when it processes the first row of data in a cached Lookup transformation. It allocates memory for the cache based on the amount you configure in the transformation or session properties. The Informatica Server stores condition values in the index cache and output values in the data cache.
Q. How do you identify existing rows of data in the target table using a Lookup transformation?
A. There are two ways to look up the target table to verify whether a row exists:
1. Use a connected dynamic-cache Lookup and check the value of the NewLookupRow output port to decide whether the incoming record already exists in the table/cache.
2. Use an unconnected Lookup, call it from an Expression transformation, and check the lookup condition port value (null / not null) to decide whether the incoming record already exists in the table.
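The dynamic-cache behaviour described above can be sketched in plain Python. This is an illustrative model of the NewLookupRow flag, not actual Informatica code; the `cust_id` key and the sample rows are hypothetical:

```python
def process_row(row, cache, key="cust_id"):
    """Model of dynamic-cache lookup behaviour: returns the
    NewLookupRow flag (0 = no change, 1 = insert, 2 = update)."""
    k = row[key]
    if k not in cache:
        cache[k] = dict(row)
        return 1          # row not in cache: insert it
    if cache[k] != row:
        cache[k] = dict(row)
        return 2          # row found but changed: update it
    return 0              # row found and identical: no change

cache = {}
print(process_row({"cust_id": 7, "name": "John"}, cache))  # 1 (insert)
print(process_row({"cust_id": 7, "name": "John"}, cache))  # 0 (no change)
print(process_row({"cust_id": 7, "name": "Jon"}, cache))   # 2 (update)
```

Downstream logic would route rows with flag 1 to an insert path and flag 2 to an update path, which is exactly how the NewLookupRow port is used with an Update Strategy.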

Q. What are Aggregate tables?
Aggregate tables contain summaries of existing warehouse data, grouped to certain levels of dimensions. Retrieving the required data from the actual table, which may have millions of records, takes more time and also affects server performance. To avoid this, we can aggregate the table to the required level and use it. These tables reduce the load on the database server, improve query performance and return results quickly.

Q. What is the level of granularity of a fact table?
Level of granularity means the level of detail that you put into the fact table in a data warehouse. For example, based on the design you can decide to store the sales data for each individual transaction, or aggregate it up to the minute, hour or day. The level of granularity defines what detail you are willing to keep for each transactional fact.
Q. What is a session?
A session is a set of instructions to move data from sources to targets.
Q. What is a worklet?

A worklet is an object that represents a set of workflow tasks, allowing you to reuse a set of workflow logic in several workflows.
Use of a worklet: You can group many tasks in one place so that they can easily be identified and serve a specific purpose.
Q. What is a workflow?
A workflow is a set of instructions that tells the Informatica Server how to execute the tasks.
Q. Why can't we use the sorted input option for incremental aggregation?
In incremental aggregation, the aggregate calculations are stored in a historical cache on the server, and the data in this historical cache need not be in sorted order. If you give sorted input, the records come in presorted for that particular run, but the data in the historical cache may not be in sorted order. That is why this option is not allowed.
Q. What is a target load order plan?
You specify the target load order based on the source qualifiers in a mapping. If you have multiple source qualifiers connected to multiple targets, you can designate the order in which the Informatica Server loads data into the targets. The target load order plan defines the order in which data is extracted from each source qualifier transformation.
Set it under Mappings (tab) > Target Load Plan.

Q. What is constraint based loading?
Constraint based load order defines the order of loading the data into multiple targets based on primary and foreign key constraints.
To set the option: double-click the session, then Config Object > check Constraint Based Loading.
Q. What is the status code in the Stored Procedure transformation?
The status code provides error handling for the Informatica Server during the session. The stored procedure issues a status code that notifies whether or not the stored procedure completed successfully. This value cannot be seen by the user; it is only used by the Informatica Server to determine whether to continue running the session or stop.
Q. Define the Informatica repository.

The Informatica repository is a relational database that stores information, or metadata, used by the Informatica Server and client tools. Metadata can include information such as mappings describing how to transform source data, sessions indicating when you want the Informatica Server to perform the transformations, and connect strings for sources and targets.
The repository also stores administrative information such as usernames and passwords, permissions and privileges, and product version.
Use the Repository Manager to create the repository. The Repository Manager connects to the repository database and runs the code needed to create the repository tables. These tables store metadata in a specific format that the Informatica Server and client tools use.
Q. What is metadata?
Designing a data mart involves writing and storing a complex set of instructions. You need to know where to get data (sources), how to change it, and where to write the information (targets). PowerMart and PowerCenter call this set of instructions metadata. Each piece of metadata (for example, the description of a source table in an operational database) can contain comments about it.
In summary, metadata can include information such as mappings describing how to transform source data, sessions indicating when you want the Informatica Server to perform the transformations, and connect strings for sources and targets.
Q. What is the Metadata Reporter?
It is a web-based application that enables you to run reports against repository metadata. With the Metadata Reporter you can access information about your repository without knowledge of SQL, the transformation language or the underlying tables in the repository.

Q. What are the types of metadata stored in the repository?
Source definitions: Definitions of database objects (tables, views, synonyms) or files that provide source data.
Target definitions: Definitions of database objects or files that contain the target data.
Multi-dimensional metadata: Target definitions that are configured as cubes and dimensions.
Mappings: A set of source and target definitions along with transformations containing the business logic that you build into the transformation. These are the instructions that the Informatica Server uses to transform and move data.
Reusable transformations: Transformations that you can use in multiple mappings.
Mapplets: A set of transformations that you can use in multiple mappings.
Sessions and workflows: These store information about how and when the Informatica Server moves data. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming and loading data. A session is a type of task that you can put in a workflow; each session corresponds to a single mapping.
The repository also stores: database connections, global objects, multidimensional metadata, reusable transformations, shortcuts and transformations.
Q. How can we store previous session logs?
Go to Session Properties > Config Object > Log Options and set the properties as follows:
Save session log by > Session Runs
Save session log for these runs > the number of log files you want to keep (default is 0)
If you want to save all of the log files created by every run, select Save session log for these runs > Session TimeStamp.
You can find these properties in the session/workflow properties.
Q. What is Changed Data Capture?

Changed Data Capture (CDC) helps identify the data in the source system that has changed since the last extraction. With CDC, data extraction takes place at the same time the insert, update or delete operations occur in the source tables, and the change data is stored inside the database in change tables. The change data thus captured is then made available to the target systems in a controlled manner.
Q. What is an indicator file and how is it used?
An indicator file is used for event-based scheduling when you don't know when the source data will be available. A shell command, script or batch file creates and sends this indicator file to a directory local to the Informatica Server. The server waits for the indicator file to appear before running the session.
Q. What is an audit table and what are the columns in it?
An audit table contains information about your workflow and session names, their status and their details. Typical columns:
WKFL_RUN_ID
WKFL_NME
START_TMST
END_TMST
ROW_INSERT_CNT
ROW_UPDATE_CNT
ROW_DELETE_CNT
ROW_REJECT_CNT
Q. If a session fails after loading 10,000 records into the target, how can we load from the 10,001st record the next time we run the session?
Select the recovery strategy in the session properties as "Resume from the last checkpoint". Note: set this property before running the session.
Q. Informatica reject file: how to identify the rejection reason
D - Valid data or good data. The writer passes it to the target database, which accepts it unless a database error occurs, such as finding a duplicate key while inserting.

O - Overflowed numeric data. Numeric data exceeded the specified precision or scale for the column. Bad data, if you configured the mapping target to reject overflow or truncated data.
N - Null value. The column contains a null value. Good data; the writer passes it to the target, which rejects it if the target database does not accept null values.
T - Truncated string data. String data exceeded the specified precision for the column, so the Integration Service truncated it. Bad data, if you configured the mapping target to reject overflow or truncated data.
Note also that the second column contains the column indicator flag value D, which signifies that the row indicator (the first column) is valid.
A row in a bad file looks like this:
0,D,7,D,John,D,5000.375,O,,N,BrickLand Road Singapore,T
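The bad-file row format above (a row indicator, then alternating value/indicator pairs) can be parsed with a short script to find out why each row was rejected. This is a sketch based on the layout shown in the sample line; field names and mappings follow the indicator table above:

```python
ROW_TYPES = {"0": "INSERT", "1": "UPDATE", "2": "DELETE", "3": "REJECT"}
FLAGS = {"D": "valid", "O": "overflow", "N": "null", "T": "truncated"}

def parse_bad_file_row(line):
    """Split one bad-file row into (row type, [(value, flag), ...])."""
    fields = line.rstrip("\n").split(",")
    row_type = ROW_TYPES.get(fields[0], "UNKNOWN")
    # fields[1] is the indicator for the row indicator itself;
    # the remaining fields alternate column value, column flag.
    rest = fields[2:]
    columns = [(value, FLAGS.get(flag, flag))
               for value, flag in zip(rest[0::2], rest[1::2])]
    return row_type, columns

row_type, cols = parse_bad_file_row(
    "0,D,7,D,John,D,5000.375,O,,N,BrickLand Road Singapore,T")
print(row_type)                                     # INSERT
print([v for v, f in cols if f in ("overflow", "truncated")])
# ['5000.375', 'BrickLand Road Singapore']
```

Note this simple split assumes column values contain no embedded commas; a production parser would have to honour the file's quoting rules.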

Q. What are Insert Else Update and Update Else Insert?
These options are used when dynamic cache is enabled.
Insert Else Update applies to rows entering the Lookup transformation with the row type of insert. When this option is enabled, the Integration Service inserts new rows in the cache and updates existing rows. When disabled, the Integration Service does not update existing rows.
Update Else Insert applies to rows entering the Lookup transformation with the row type of update. When this option is enabled, the Integration Service updates existing rows and inserts a row if it is new. When disabled, the Integration Service does not insert new rows.

Q. What are the different methods of loading dimension tables?
Conventional load: Before loading the data, all table constraints are checked against the data.
Direct load (faster): All constraints are disabled and the data is loaded directly. Later the data is checked against the table constraints and bad data is not indexed.
Q. What are the different types of commit intervals?
Source-based commit: The Informatica Server commits data based on the number of source rows. The commit point is the commit interval you configure in the session properties.
Target-based commit: The Informatica Server commits data based on the number of target rows and the key constraints on the target table. The commit point also depends on the buffer block size and the commit interval.
Q. How do you add the source flat file header into the target file?
Edit Task > Mapping > Target > Header Options > Output field names
Q. How do you load the name of the file into a relational target?
Source Definition > Properties > Add currently processed file name port
Q. How do you return multiple columns through an unconnected lookup?
Suppose your lookup table has f_name, m_name and l_name and you are using an unconnected lookup. In the lookup SQL override, concatenate the columns: f_name||'~'||m_name||'~'||l_name. You can then retrieve this single value with the unconnected lookup in an Expression transformation, and use substring functions to separate the three columns into individual ports for the downstream transformation/target.
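The split step in the Expression transformation (SUBSTR/INSTR on the '~'-delimited string) is easy to illustrate outside Informatica. A minimal sketch, assuming the '~' delimiter from the answer above and hypothetical name values:

```python
def split_lookup_return(concatenated, sep="~"):
    """Split the single string returned by the unconnected lookup
    back into individual port values (mirrors the SUBSTR/INSTR
    logic done in an Expression transformation)."""
    return concatenated.split(sep)

# The lookup override returned one concatenated string:
f_name, m_name, l_name = split_lookup_return("John~Allen~Smith")
print(f_name, m_name, l_name)  # John Allen Smith
```

The delimiter must be a character guaranteed not to occur in the data, which is why an unusual character such as '~' is chosen.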

Q. What is a factless fact table? For what purpose do we use it in DWH projects?
It is a fact table which does not contain any measurable data.
Eg: A student attendance fact (it contains only Boolean values: whether the student attended class or not, yes or no).

A factless fact table contains only the keys and no measures; in other words, it contains no facts. Generally it is used to integrate fact tables.

A factless fact table contains only foreign keys. We can apply two kinds of aggregate functions to a factless fact: count and distinct count.
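Those two aggregations over a factless fact can be sketched with plain Python over a toy attendance fact (the key values are hypothetical; each row is just foreign keys, no measures):

```python
# Factless attendance fact: rows of (date_key, student_key, class_key).
attendance = [
    (20240101, 1, 10),
    (20240101, 2, 10),
    (20240102, 1, 10),
]

# count: how many attendance events per class?
per_class = {}
for _date, _student, cls in attendance:
    per_class[cls] = per_class.get(cls, 0) + 1

# distinct count: how many distinct students ever attended class 10?
distinct_students = len({s for _d, s, c in attendance if c == 10})

print(per_class[10], distinct_students)  # 3 2
```

Any richer measure (average score, revenue) is impossible here because the table stores no numeric facts, only the event itself.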

Two purposes of a factless fact table:

1. Coverage: to indicate what did NOT happen. Eg: which product did not sell well in a particular region?
2. Event tracking: to know whether an event took place. Eg: a fact for tracking student attendance will not contain any measures.
Q. What is a staging area?
A staging area is an intermediate area where we apply our logic to extract data from the source, cleanse it, and shape it into meaningful, summarized data for the data warehouse.

Q. What is constraint based loading?
Constraint based load order defines the order of loading the data into multiple targets based on primary and foreign key constraints.
Q. Why is the Union transformation an active transformation?
The condition for a transformation to become active is that the row number changes. A row number can change in two ways:
1. The number of rows coming in and going out differs, e.g. in a Filter transformation. Suppose the input is:
id name dept row_num
1  aa   4    1
2  bb   3    2
3  cc   4    3
With a filter condition of dept = 4, the output would be:
id name dept row_num
1  aa   4    1
3  cc   4    2
The row number changed, so Filter is an active transformation.
2. The order of the rows changes, e.g. when a Union transformation pulls in data. Suppose we have two sources:
source1:
id name dept row_num
1  aa   4    1
2  bb   3    2
3  cc   4    3
source2:
id name dept row_num
4  aaa  4    4
5  bbb  3    5
6  ccc  4    6
The Union transformation never restricts the data from any source, so the rows can arrive in any order:
id name dept row_num old_row_num
1  aa   4    1       1
4  aaa  4    2       4
5  bbb  3    3       5
2  bb   3    4       2
3  cc   4    5       3
6  ccc  4    6       6
The row numbers change, which is why we say Union is an active transformation.
Q. What is the use of a batch file in Informatica? How many types of batch file are there?
With a batch file, we can run sessions either sequentially or concurrently. A grouping of sessions is known as a batch. There are two types of batches:
1) Sequential: runs sessions one after another.
2) Concurrent: runs the sessions at the same time.

If you have sessions with source-target dependencies, use a sequential batch to start the sessions one after another. If you have several independent sessions, you can use concurrent batches, which run all the sessions at the same time.

Q. What is the joiner cache?
When we use the Joiner transformation, the Integration Service maintains a cache; all the records are stored in the joiner cache. Joiner caches are of two types: 1. Index cache 2. Data cache.

The index cache stores all the port values which participate in the join condition, and the data cache stores all the ports which do not participate in the join condition.
Q. What is the location of the parameter file in Informatica?
$PMBWPARAM
Q. How can you display only hidden files in UNIX?
$ ls -a | grep "^\."
Q. How do you delete the data in the target table after it is loaded?
SQ > Properties tab > Post SQL: delete from target_tablename
Post SQL statements are executed using the source database connection after a pipeline is run. Alternatively, write a post SQL on the target (truncate table_name), or use the session's Truncate target table option.
Q. What is polling in Informatica?

It displays updated information about the session in the Monitor window. The Monitor window displays the status of each session when you poll the Informatica Server.
Q. How will I stop my workflow after 10 errors?
Set the session-level error handling property "Stop on errors" to 10: Config Object > Error Handling > Stop on errors: 10.
Q. How can we calculate fact table size?
A fact table's size is a multiple of the combined dimension cardinalities. For example, to find the fact table size for 3 years of daily history with 200 products and 200 stores:
3 * 365 * 200 * 200 = fact table row count
Q. Without using an Email task, how will you send a mail from Informatica?
By using the 'mailx' command in a Unix shell script.
Q. How will you compare two mappings in two different repositories?
In the Designer client, go to the Mapping tab and use the Compare option. We can compare two folders within the same repository or within different repositories.
Q. What is constraint based load order?
Constraint based load order defines the order in which data loads into multiple targets based on primary key and foreign key relationships.
Q. What is a target load plan?

Suppose I have 3 pipelines in a single mapping:
emp source > SQ > tar1
dept source > SQ > tar2
bonus source > SQ > tar3
If the requirement is to load tar2 first, then tar1, and finally tar3, we use the target load plan to control the order in which the source qualifiers extract data.
Q. What is meant by data driven? In which scenario do we use it?
Data driven is available at the session level. It instructs the Integration Service to follow the update strategy flagged on each source row, i.e. which action to take on the target (insert, update, delete or reject). If we use an Update Strategy transformation in a mapping, we select the Data Driven option in the session.
Q. How do you run a workflow in Unix?
Syntax:
pmcmd startworkflow -sv <service name> -d <domain name> -u <user name> -p <password> -f <folder name> <workflow name>
Example:
pmcmd startworkflow -sv ${INFA_SERVICE} -d ${INFA_DOMAIN} -uv xxx_PMCMD_ID -pv PSWD -f ${ETLFolder} -wait ${ETLWorkflow}
Q. What is the main difference between a Joiner transformation and a Union transformation?
A Joiner transformation merges data horizontally; a Union transformation merges data vertically.
A Joiner transformation combines data records horizontally based on a join condition, and can combine data from two different sources having different metadata. It supports heterogeneous and homogeneous data sources.
A Union transformation combines data records vertically from multiple sources having the same metadata. It also supports heterogeneous data sources, and it functions like the UNION ALL set operator.
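The pmcmd startworkflow syntax shown earlier is easy to get wrong in scripts, so some teams assemble it programmatically. A minimal sketch using only the flags from the syntax line above (-sv, -d, -u, -p, -f); the service, folder and workflow names are hypothetical:

```python
def build_pmcmd_startworkflow(service, domain, user, password, folder, workflow):
    """Assemble a pmcmd startworkflow command line from its parts."""
    return ("pmcmd startworkflow -sv {0} -d {1} -u {2} -p {3} "
            "-f {4} {5}").format(service, domain, user, password,
                                 folder, workflow)

cmd = build_pmcmd_startworkflow(
    "IS_DEV", "Domain_Dev", "etl_user", "secret",
    "SALES_FLD", "wf_load_sales")
print(cmd)
```

In practice you would pass credentials via the -uv/-pv environment-variable options (as in the example above) rather than embedding a plain-text password in the command.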

Q. What is constraint based loading exactly, and how do you enable it? Is it used when we have a primary key-foreign key relationship?
Yes. Constraint based load order loads the data into multiple targets according to the primary key-foreign key relationship.
To set the option: double-click the session, then Config Object > check Constraint Based Loading.

Q. What is the difference between the top-down (W. H. Inmon) and bottom-up (Ralph Kimball) approaches?
Top-down approach: As per W. H. Inmon, first we build the data warehouse, and after that we build the data marts; the drawback is that the DWH is somewhat difficult to maintain.
Bottom-up approach: As per Ralph Kimball, first we build the data marts and then we build the data warehouse. This approach is the more common one in real projects.

Q. What are the different caches used in Informatica?
Static cache
Dynamic cache
Shared cache
Persistent cache
Q. What is the command to get the list of files in a directory in Unix?
$ ls -lrt
Q. How do you import multiple flat files into a single target when there is no common column in the flat files?
In the workflow session properties, on the Mapping tab under properties, choose Source filetype: Indirect and give the source filename <file_path>. This <file_path> file should list all the files you want to load.
Q. How do you connect two or more tables with a single source qualifier?
Create a relational source with however many columns you want and write the join query in the SQL query override. The column order and data types must match the SQL query.
Q. How do you call an unconnected lookup in an Expression transformation?
:LKP.LKP_NAME(PORTS)
Q. What is the difference between connected and unconnected lookup?
Connected lookup: It participates in the data pipeline. It can be used to join two tables and return multiple rows. It takes multiple inputs and provides multiple outputs. It supports both static and dynamic cache, and sequence values can be generated by enabling a dynamic lookup cache.
Unconnected lookup: It does not participate in the data pipeline. It takes multiple inputs but returns a single output through the return port. It acts as a lookup function (:LKP) called from another transformation, and it is connected to neither source nor target. It supports static cache only.

>> It will not participated in data pipeline >> It contains multiple inputs and single output. >> It supported static cache only. Q. Types of partitioning in Informatica? Partition 5 types 1. 2. 3. 4. 5. Simple pass through Key range Hash Round robin Database

Q. Which transformations use a cache?
1. Lookup transformation
2. Aggregator transformation
3. Rank transformation
4. Sorter transformation
5. Joiner transformation
Q. Explain the Union transformation.
A Union transformation is a multiple-input-group transformation used to merge data from multiple sources, similar to a UNION ALL SQL statement combining the results of two or more SELECT statements. Like UNION ALL, the Union transformation does not remove duplicate rows. It is an active transformation.
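The UNION ALL semantics described above (rows concatenated, duplicates kept) can be shown in a few lines of Python; the sample rows are hypothetical:

```python
def union_all(*pipelines):
    """UNION ALL semantics like the Union transformation:
    concatenate rows from every input group without removing
    duplicates."""
    merged = []
    for rows in pipelines:
        merged.extend(rows)
    return merged

src1 = [(1, "aa"), (2, "bb")]
src2 = [(2, "bb"), (3, "cc")]
print(union_all(src1, src2))  # duplicate (2, 'bb') is kept
```

A plain SQL UNION would deduplicate the merged rows; that extra distinct step is exactly what the Union transformation does NOT do.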

Q. Explain the Joiner transformation.
The Joiner transformation is used to join source data from two related heterogeneous sources, though it can also join data from the same source. It joins sources with at least one matching column, using a condition that matches one or more pairs of columns between the two sources.
To configure a Joiner transformation, the main settings are:
1) Master and detail source
2) Type of join
3) Join condition

Q. Explain the Lookup transformation.
A Lookup transformation is used in a mapping to look up data in a relational table, flat file, view or synonym. The Informatica Server queries the lookup source based on the lookup ports in the transformation, and compares the lookup port values to the lookup source column values according to the lookup condition. A Lookup transformation is used to:
1) Get a related value.
2) Perform a calculation.
3) Update slowly changing dimension (SCD) tables.

Q. How do you identify whether a row is for insert or for update with a dynamic lookup cache?
Based on the NewLookupRow port, the Informatica Server indicates which row is an insert and which is an update:
NewLookupRow = 0 ... no change
NewLookupRow = 1 ... insert
NewLookupRow = 2 ... update
Q. In how many ways can we implement SCD type 2?
1) Date range
2) Flag
3) Versioning
Q. How will you check for bottlenecks in Informatica? Where do you start checking?
You check in this order:
1. Target
2. Source
3. Mapping
4. Session
5. System
Q. What is incremental aggregation?
When the Aggregator transformation executes, the output data is stored in a temporary location called the aggregator cache. The next time the mapping runs, the Aggregator transformation processes only the new records loaded after the first run, and their output values are incremented against the values in the aggregator cache. This is called incremental aggregation, and it improves performance.
In other words, incremental aggregation means applying only the captured changes in the source to the aggregate calculations in a session. When the source changes only incrementally, and we can capture those changes, we can configure the session to process only those changes. This allows the Informatica Server to update the target table incrementally, rather than forcing it to process the entire source and recalculate the same calculations each time the session runs; this obviously increases session performance.
Q. How can I explain my project architecture in an interview? Describe the project flow from source to target.
A typical project architecture looks like this:
1. Source systems: e.g. mainframe, Oracle, PeopleSoft, DB2.
2. Landing tables: tables that act as the source; used for easy access, for backup, and for reuse in other mappings.
3. Staging tables: from the landing tables we extract the data into staging tables after all validations are done on the data.
4. Dimensions/facts: the tables used for analysis and decision-making.
5. Aggregation tables: tables with summarized data, useful for managers who want to view sales by month, by year, etc.
6. Reporting layer: phases 4 and 5 are used by reporting developers to generate reports.

Q. What types of transformation are not supported by mapplets?
Normalizer transformation
COBOL sources
Joiner transformation
XML Source Qualifier transformation
XML sources
Target definitions
Pre- and post-session stored procedures
Other mapplets
Q. How does Informatica execute a mapping?
Everything is organized by the Integration Service: PowerCenter talks to the Integration Service, the Integration Service runs the session, and the session holds the mapping structure. That is the flow of execution.

Q. Can every transformation be made reusable? How?
Except for the Source Qualifier transformation, all transformations support the reusable property. Reusable transformations are developed in two ways:
1. In a mapping, select the transformation you want to reuse and double-click it; there you get the option to convert the non-reusable transformation into a reusable one.
2. By using the Transformation Developer.

Q. What are Pre SQL and Post SQL?
Pre SQL means that the Integration Service runs SQL commands against the source database before it reads the data from the source.
Post SQL means that the Integration Service runs SQL commands against the target database after it writes to the target.

Q. In which situation do we use the Insert Else Update option?

This option applies when the Lookup transformation uses a dynamic cache and the incoming rows have the row type of insert. If the Integration Service does not find the record in the lookup cache, it inserts it into the cache; if it does find the record, it updates the data in the associated ports.

----------------------
We set this property when the Lookup transformation uses a dynamic cache and the session property Treat Source Rows As is set to "Insert".
--------------------
We use this option when we want to maintain history: if a record is not available in the target table it is inserted, and if it is available the record is updated.
Q. What is incremental loading? In which situations do we use it?
Incremental loading is an approach. Suppose you have a mapping that loads data from an employee table to an employee_target table on the basis of hire date, and you have already moved the employee data for hire dates up to 31-12-2009. When the organization now wants to load employee_target again, the target already has the data for employees hired up to 31-12-2009, so you pick up only the source records with hire dates from 1-1-2010 to the current date. There is no need to take the data before that date; doing so would be the overhead of reloading data that already exists in the target. So in the Source Qualifier you filter the records by hire date, and you can also parameterize the hire date so you control from which date to load data into the target. This is the concept of incremental loading.
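The incremental-load filter described above can be sketched in Python; this models the parameterized Source Qualifier filter, with hypothetical employee rows and a hypothetical last-loaded date:

```python
from datetime import date

def incremental_filter(rows, last_loaded):
    """Keep only rows hired after the last successful load date
    (equivalent to the parameterized filter in the Source Qualifier)."""
    return [r for r in rows if r["hire_date"] > last_loaded]

employees = [
    {"emp_id": 1, "hire_date": date(2009, 6, 1)},   # already loaded
    {"emp_id": 2, "hire_date": date(2010, 3, 15)},  # new since last run
]
new_rows = incremental_filter(employees, date(2009, 12, 31))
print([r["emp_id"] for r in new_rows])  # [2]
```

After a successful run, the "last loaded" value would be advanced (e.g. as a mapping variable stored in the repository) so the next run again picks up only the delta.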

Q. What is a target update override?
By default the Integration Service updates the target based on key columns. But we might want to update non-key columns as well; in that case we can override the UPDATE statement for each target in the mapping. The target override takes effect only when the source rows are marked as update by an Update Strategy in the mapping.

Q. What are mapping parameters and mapping variables?
Mapping parameter: A mapping parameter is a constant value that is defined before the mapping runs. A mapping parameter lets you reuse the mapping for various constant values.

Mapping variable: A mapping variable represents a value that can change during the mapping run. The value is stored in the repository; the Integration Service retrieves it from the repository and increments it for the next run.

Q. What is rank and dense rank in informatica with any examples and give sql query for this both ranks for eg: the file contains the records with column 100 200(repeated rows) 200 300 400 500 the rank function gives output as 1 2 2 4 5 6 and dense rank gives 1 2 2 3 4 5 for eg: the file contains the records with column empno sal 100 1000 200(repeated rows) 2000 200 3000 300 4000 400 5000 500 6000 Rank : select rank() over (partition by empno order by sal) from emp 1 2 2 4 5 6

Dense rank: SELECT sal, DENSE_RANK() OVER (ORDER BY sal) FROM emp; -- 1 2 2 3 4 5

Q. What is incremental aggregation? With incremental aggregation, the Integration Service applies captured source changes to the existing aggregate calculations instead of re-aggregating the entire source: it saves the index and data cache files from the previous session run and updates them with the new source data, so only the incremental rows are processed. (Note: the first time you run an upgraded session using incremental aggregation, the Integration Service upgrades the index and data cache files, and if you partition a session using a mapping with incremental aggregation, it realigns the index and data cache files.)
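The RANK vs DENSE_RANK behavior above can be reproduced in plain Python, which makes the gap-vs-no-gap distinction easy to see:

```python
# Pure-Python sketch of RANK vs DENSE_RANK over a salary list,
# matching SELECT RANK()/DENSE_RANK() OVER (ORDER BY sal).
sals = [1000, 2000, 2000, 3000, 4000, 5000]
ordered = sorted(sals)

def rank(values):
    # RANK: ties share a rank; the next rank skips (1 2 2 4 ...).
    return [1 + sum(1 for w in values if w < v) for v in values]

def dense_rank(values):
    # DENSE_RANK: ties share a rank; no gaps (1 2 2 3 ...).
    distinct = sorted(set(values))
    return [1 + distinct.index(v) for v in values]

print(rank(ordered))        # [1, 2, 2, 4, 5, 6]
print(dense_rank(ordered))  # [1, 2, 2, 3, 4, 5]
```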

Q. What is a session parameter? Session parameters are used to assign values such as database connections at the session level. A parameter file is a text file where we define the values for these parameters.

Q. What is a mapping parameter? A mapping parameter represents a constant value that is defined before the mapping runs. Its value is supplied through a parameter file, which is saved with the extension .prm. A mapping parameter lets a mapping be reused with various constant values.

Q. What is a parameter file? A parameter file is a text file, created with an editor such as WordPad or Notepad, that defines the values for the parameters and variables used in a session. In a parameter file you can define mapping parameters, mapping variables, and session parameters.

Q. What is session override? Session override is an option in Informatica at the session level. Here we can manually supply a SQL query that is issued to the database when the session runs. It is nothing but overriding the default SQL generated by a particular transformation at the mapping level.

Q. What are the differences between Informatica versions 8.1.1 and 8.6.1?

There is a small change in the Administrator Console. In 8.1.1 we can do all the creation of Integration Service, Repository Service, web service, domain, node, and grid (with a licensed version). In 8.6.1 the Informatica Admin Console manages both a Domain page and a Security page. The Domain page covers all of the above (creation of Integration Service, Repository Service, web service, domain, node, grid, etc.). The Security page covers creation of users and privileges, LDAP configuration, export/import of users and privileges, etc.

Q. What are the uses of a parameter file? A parameter file is one which contains the values of mapping parameters and variables. Type the entries in Notepad and save the file, for example:
[FolderName.SessionName]
$$InputValue1=value
Parameter files are created with an extension of .prm.

These are created to pass values those can be changed for Mapping Parameter and Session Parameter during mapping run.

Mapping Parameters: a parameter is defined in the parameter file for a mapping parameter that has already been created in the mapping with a data type, precision, and scale.

The mapping parameter file syntax (xxxx.prm):
[FolderName.WF:WorkFlowName.ST:SessionName]
$$ParameterName1=Value
$$ParameterName2=Value

After that we have to select the properties Tab of Session and Set Parameter file name including physical path of this xxxx.prm file.

Session Parameters: the session parameter file syntax (yyyy.prm):
[FolderName.SessionName]
$InputFileValue1=Path of the source flat file

After that we have to select the properties Tab of Session and Set Parameter file name including physical path of this yyyy.prm file.

Then do the following changes to the attribute values in the Source Qualifier's Properties section on the Mapping tab of the session:

Source File Type ---------> Direct
Source File Directory ----> (empty)
Source File Name ---------> $InputFileValue1
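The .prm layout shown in the two syntax examples above is a simple sectioned key/value format, so a small parser illustrates how the values line up with sessions. This is a sketch only; the folder, workflow, and parameter names below are invented for the example and are not from a real project.

```python
# Minimal sketch of parsing the .prm layout shown above; section headings
# and parameter names are illustrative, not a real Informatica API.
def parse_prm(text):
    sections = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]          # e.g. Folder.WF:wf.ST:session
            sections[current] = {}
        elif "=" in line and current is not None:
            name, value = line.split("=", 1)
            sections[current][name.strip()] = value.strip()
    return sections

prm = """\
[MyFolder.WF:wf_load.ST:s_m_load]
$$BusinessDate=2010-01-01
$InputFileValue1=/data/src/emp.txt
"""
parsed = parse_prm(prm)
print(parsed["MyFolder.WF:wf_load.ST:s_m_load"]["$$BusinessDate"])  # 2010-01-01
```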

Q. What is the default Data Driven operation in Informatica? Data Driven is the default option when the mapping contains an Update Strategy transformation. The Integration Service follows the instructions coded in the Update Strategy within the session mapping to determine how to flag records for insert, delete, update, or reject. If you do not set the Data Driven option, the Integration Service ignores Update Strategy transformations in the mapping.

Q. What is a threshold error in Informatica? When the Update Strategy flags rows with DD_REJECT or DD_UPDATE and a reject-row limit is configured, and the number of rejected records exceeds that count, the session ends with a failed status. This error is called a threshold error.

Q. So many times I saw "$PM parser error". What is meant by PM? PM: Power Mart.
1) A parsing error will come from the input parameter to the lookup.
2) Informatica is not able to resolve the input parameter CLASS for your lookup.
3) Check that the port CLASS exists as either an input port or a variable port in your expression.
4) Check the data type of CLASS against the data type of the input parameter for your lookup.

Q. What is a candidate key? A candidate key is a combination of attributes that can be used to uniquely identify a database record without any extraneous data. Each table may have one or more candidate keys. One of these candidate keys is selected as the table's primary key; the others are called alternate keys.

Q. What is the difference between Bitmap and Btree index?

A bitmap index is used for low-cardinality, repeating values, e.g. Gender (male/female) or Account Status (Active/Inactive). A B-tree index is used for mostly unique values, e.g. empid.

Q. What is throughput in Informatica? Throughput is the rate at which the PowerCenter server reads rows in bytes from the source, or writes rows in bytes into the target, per second.

You can find this in the Workflow Monitor: right-click the session, choose Properties, and the Source/Target Statistics tab shows throughput details for each instance of source and target.

Q. What are the set operators in Oracle? UNION, UNION ALL, MINUS, INTERSECT.

Q. How i can Schedule the Informatica job in "Unix Cron scheduling tool"?

Crontab: the crontab command (cron derives from chronos, Greek for time; tab stands for table), found in Unix and Unix-like operating systems, is used to schedule commands to be executed periodically. To see what crontabs are currently running on your system, open a terminal and run: sudo crontab -l. To edit the list of cron jobs, run: sudo crontab -e. This will open the default editor (could be vi or pico; if you want, you can change the default editor) to let us manipulate the crontab. If you save and exit the editor, all your cron jobs are saved into the crontab. Cron jobs are written in the following format:

* * * * * /bin/execute/this/script.sh

Scheduling explained: as you can see, there are 5 stars. The stars represent different date parts in the following order:
1. minute (from 0 to 59)
2. hour (from 0 to 23)
3. day of month (from 1 to 31)
4. month (from 1 to 12)
5. day of week (from 0 to 6) (0 = Sunday)

Execute every minute: if you leave the star, or asterisk, it means every. Maybe that's a bit unclear; let's use the previous example again:

* * * * * /bin/execute/this/script.sh

They are all still asterisks! So this means execute /bin/execute/this/script.sh:
1. every minute
2. of every hour
3. of every day of the month
4. of every month
5. and every day in the week.

In short: this script is executed every minute, without exception.

Execute every Friday 1AM: if we want to schedule the script to run at 1AM every Friday, we need the following cron job:

0 1 * * 5 /bin/execute/this/script.sh

Get it? The script now executes when the system clock hits:
1. minute: 0
2. of hour: 1
3. of day of month: * (every day of month)
4. of month: * (every month)
5. and weekday: 5 (= Friday)

Execute on weekdays at 1AM: if we want to schedule the script to run at 1AM on every weekday, we need the following cron job:

0 1 * * 1-5 /bin/execute/this/script.sh

Get it? The script now executes when the system clock hits:
1. minute: 0
2. of hour: 1
3. of day of month: * (every day of month)
4. of month: * (every month)
5. and weekday: 1-5 (= Monday till Friday)

Execute at 10 past every hour on the 1st of every month: here's another one, just for practice:

10 * 1 * * /bin/execute/this/script.sh

Fair enough, it takes some getting used to, but it offers great flexibility.
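The five-field matching rule described above can be sketched as a small checker. This is an illustration only: it supports `*`, single numbers, and N-M ranges, not the full crontab grammar (no lists or step values).

```python
# Minimal sketch of matching a 5-field cron spec against a timestamp;
# supports '*', single numbers, and N-M ranges (no lists or steps).
from datetime import datetime

def field_matches(spec, value):
    if spec == "*":
        return True
    if "-" in spec:
        lo, hi = map(int, spec.split("-"))
        return lo <= value <= hi
    return int(spec) == value

def cron_matches(expr, when):
    minute, hour, dom, month, dow = expr.split()
    return (field_matches(minute, when.minute)
            and field_matches(hour, when.hour)
            and field_matches(dom, when.day)
            and field_matches(month, when.month)
            # cron: 0=Sunday..6=Saturday; Python: Monday=0..Sunday=6
            and field_matches(dow, (when.weekday() + 1) % 7))

friday_1am = datetime(2010, 1, 1, 1, 0)   # 1 Jan 2010 was a Friday
print(cron_matches("0 1 * * 5", friday_1am))    # True  (Friday 1AM)
print(cron_matches("0 1 * * 1-5", friday_1am))  # True  (weekday 1AM)
print(cron_matches("10 * 1 * *", friday_1am))   # False (minute is 0, not 10)
```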

Q. Can anyone tell me the difference between persistent and dynamic caches? Under which conditions do we use these caches?
Dynamic: 1) When you use a dynamic cache, the Informatica Server updates the lookup cache as it passes rows to the target. 2) With a dynamic cache, we can also update the cache with new data. 3) A dynamic cache is not reusable. (When we need updated cache data, we need a dynamic cache.)

Persistent: 1) A Lookup transformation can use a non-persistent or persistent cache; the PowerCenter Server saves or deletes lookup cache files after a successful session based on the Lookup Cache Persistent property. 2) With a persistent cache, we are not able to update the cache with new data. 3) A persistent cache is reusable.

(When we need the previous cache data, we need a persistent cache.)
A few more additions to the above answer:
1. A dynamic lookup allows modifying the cache, whereas a persistent lookup does not allow us to modify the cache.
2. A dynamic lookup uses NewLookupRow, a default port in the cache, but a persistent cache does not use any default ports.
3. As the session completes, the dynamic cache is removed, but the persistent cache is saved on the Informatica PowerCenter server.

Q. How to obtain performance data for individual transformations? There is a session-level property, Collect Performance Data; select that property and it gives you performance details for all the transformations.
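The dynamic-cache behavior described above (update the cache as rows pass through, flag each row) can be sketched as a dictionary update. The numeric flags mimic the NewLookupRow idea (0 = no change, 1 = insert, 2 = update), but the function itself is illustrative, not the real transformation API.

```python
# Sketch of how a dynamic lookup cache behaves: as each source row passes
# through, the cache is updated and the row is flagged, in the spirit of
# the NewLookupRow port (0 = no change, 1 = insert, 2 = update).
def process_rows(rows, cache):
    flags = []
    for key, value in rows:
        if key not in cache:
            cache[key] = value
            flags.append(1)            # insert: key not yet in cache
        elif cache[key] != value:
            cache[key] = value
            flags.append(2)            # update: key present, data changed
        else:
            flags.append(0)            # no change: row matches cache
    return flags

cache = {100: "Smith"}
flags = process_rows([(100, "Smith"), (100, "Jones"), (200, "Brown")], cache)
print(flags)   # [0, 2, 1]
print(cache)   # {100: 'Jones', 200: 'Brown'}
```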

Q. List of active and passive transformations in Informatica?
Active Transformation - an active transformation can change the number of rows that pass through the mapping:
Source Qualifier Transformation
Sorter Transformation
Aggregator Transformation
Filter Transformation
Union Transformation
Joiner Transformation
Normalizer Transformation
Rank Transformation
Router Transformation
Update Strategy Transformation
Advanced External Procedure Transformation

Passive Transformation - passive transformations do not change the number of rows that pass through the mapping:
Expression Transformation
Sequence Generator Transformation
Lookup Transformation
Stored Procedure Transformation
XML Source Qualifier Transformation
External Procedure Transformation

Q. Eliminating duplicate records without using dynamic lookups? You can find the duplicates with a simple one-line SQL query: SELECT id, COUNT(*) FROM seq1 GROUP BY id HAVING COUNT(*) > 1; Below are the ways to eliminate duplicate records: 1. By enabling the Select Distinct option in the Source Qualifier transformation. 2. By enabling the Distinct option in the Sorter transformation. 3. By marking all ports as Group By in the Aggregator transformation.

Q. Can anyone give an idea of how we perform a test load in Informatica? What do we test as part of a test load? With a test load, the Informatica Server reads and transforms data without permanently writing to targets. The Informatica Server does everything as if running the full session: it writes data to relational targets, but rolls the data back when the session completes. So you can enable the Collect Performance Details property and analyze how efficient your mapping is. If the session runs for a long time, you may want to find the bottlenecks that exist; they may be of type target, source, mapping, etc.

The basic idea behind a test load is to see the behavior of the Informatica Server with your session.

Q. What is ODS (Operational Data Store)? A collection of operational or base data that is extracted from operational databases and standardized, cleansed, consolidated, transformed, and loaded into an enterprise data architecture. An ODS is used to support data mining of operational data, or as the store for base data that is summarized for a data warehouse. The ODS may also be used to audit the data warehouse to assure that summarized and derived data is calculated properly. The ODS may further become the enterprise shared operational database, allowing operational systems that are being reengineered to use the ODS as their operational database.

Q. How many tasks are there in Informatica? Session, Email, Command, Assignment, Control, Decision, Event-Raise, Event-Wait, Timer, and Link tasks.

Q. What are business components in Informatica?

Services.

Q. What is versioning? It is used to keep a history of changes made to mappings and workflows. 1. Check in: you check in when you are done with your changes so that everyone can see them. 2. Check out: you check out from the main stream when you want to make any change to the mapping/workflow. 3. Version history: shows all the changes made and who made them.

Q. Difference between $$$SessStartTime and SESSSTARTTIME? $$$SessStartTime returns the session start time as a string value (String datatype). SESSSTARTTIME returns the session start date along with the timestamp (Date datatype).

Q. Difference between $, $$, $$$ in Informatica? 1. $ refers to system variables/session parameters like $BadFile, $InputFile, $OutputFile, $DBConnection, $Source, $Target, etc. 2. $$ refers to user-defined variables/mapping parameters like $$State, $$Time, $$Entity, $$Business_Date, $$SRC, etc. 3. $$$ refers to system parameters like $$$SessStartTime, which returns the session start time as a string value; the format of the string depends on the database you are using.

Q. Finding duplicate rows based on multiple columns? SELECT firstname, COUNT(firstname), surname, COUNT(surname), email, COUNT(email) FROM employee GROUP BY firstname, surname, email HAVING COUNT(firstname) > 1 AND COUNT(surname) > 1 AND COUNT(email) > 1;

Q. Finding the Nth highest salary in Oracle? Pick out the Nth highest salary, say the 4th highest: SELECT * FROM (SELECT ename, sal, DENSE_RANK() OVER (ORDER BY sal DESC) emp_rank FROM emp) WHERE emp_rank = 4;

Q. Find out the third highest salary? (SQL Server syntax:) SELECT MIN(sal) FROM emp WHERE sal IN (SELECT DISTINCT TOP 3 sal FROM emp ORDER BY sal DESC);
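The duplicate-detection and Nth-highest ideas above can be demonstrated with SQLite. Since older SQLite builds lack DENSE_RANK, this sketch uses DISTINCT with LIMIT/OFFSET for the 3rd highest distinct salary; the table and data are invented for the example.

```python
import sqlite3

# Sketch of both queries above in SQLite: finding duplicate values with
# GROUP BY/HAVING, and the Nth (here 3rd) highest distinct salary using
# ORDER BY with LIMIT/OFFSET instead of Oracle's DENSE_RANK.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [("A", 1000), ("B", 2000), ("C", 2000),
                  ("D", 3000), ("E", 4000)])

dups = conn.execute(
    "SELECT sal, COUNT(*) FROM emp GROUP BY sal HAVING COUNT(*) > 1"
).fetchall()
print(dups)  # [(2000, 2)]  -- the repeated salary

(third,) = conn.execute(
    "SELECT DISTINCT sal FROM emp ORDER BY sal DESC LIMIT 1 OFFSET 2"
).fetchone()
print(third)  # 2000  -- distinct salaries desc: 4000, 3000, 2000, ...
```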

Q. How do you handle error logic in Informatica? What transformations did you use while handling errors? How did you reload those error records into the target?
Row indicator: this generally happens when working with an Update Strategy transformation; the writer/target rejects the rows going to the target.
Column indicators:
D - Valid
O - Overflow
N - Null
T - Truncated
When the data contains nulls or overflows, it is rejected instead of being written to the target. The rejected data is stored in reject files. You can check the data and reload it into the target using the reject-reload utility.

Q. Difference between STOP and ABORT? Stop - if the Integration Service is executing a Session task when you issue the stop command, it stops reading data but continues processing, writing, and committing data to the targets. If the Integration Service cannot finish processing and committing data, you can issue the abort command. Abort - the Integration Service handles the abort command for the Session task like the stop command, except that it has a timeout period of 60 seconds. If the Integration Service cannot finish processing and committing data within the timeout period, it kills the DTM process and terminates the session.

Q. What is an inline view? An inline view is the term given to a subquery in the FROM clause of a query, which can be used as a table; an inline view is effectively a named subquery. Example:

SELECT tab1.col1, tab1.col2, inview.col1, inview.col2
FROM tab1, (SELECT statement) inview
WHERE tab1.col1 = inview.col1;

SELECT d.dname, e.ename, e.sal
FROM emp e, (SELECT dname, deptno FROM dept) d
WHERE e.deptno = d.deptno;

In the query above, (SELECT dname, deptno FROM dept) d is the inline view. Inline views are evaluated at runtime and, in contrast to normal views, are not stored in the data dictionary. The disadvantages of a normal view are: 1. a separate view has to be created, which is an overhead; 2. extra time is taken in parsing the view. An inline view solves both problems by using a SELECT statement in a subquery and using that as a table.

Advantages of using inline views: 1. better query performance; 2. better visibility of code. Practical uses of inline views: 1. joining grouped data with non-grouped data; 2. getting data to use in another query.
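The first practical use listed above, joining grouped data with non-grouped data, is exactly where an inline view shines. A sketch in SQLite (table and data invented for the example): the inline view aggregates per department, and the outer query joins it back to the row-level data.

```python
import sqlite3

# Sketch of an inline view in SQLite: a subquery in the FROM clause
# joined like a table (grouped data joined with non-grouped rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, sal INTEGER, deptno INTEGER)")
conn.execute("CREATE TABLE dept (dname TEXT, deptno INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [("SMITH", 800, 20), ("ALLEN", 1600, 30), ("FORD", 3000, 20)])
conn.executemany("INSERT INTO dept VALUES (?, ?)",
                 [("RESEARCH", 20), ("SALES", 30)])

# The inline view d computes each department's max salary; the outer
# query joins it back to emp to find the top earner per department.
rows = conn.execute("""
    SELECT e.ename, e.sal, d.deptno
    FROM emp e,
         (SELECT deptno, MAX(sal) AS maxsal FROM emp GROUP BY deptno) d
    WHERE e.deptno = d.deptno AND e.sal = d.maxsal
    ORDER BY d.deptno
""").fetchall()
print(rows)  # [('FORD', 3000, 20), ('ALLEN', 1600, 30)]
```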

Q. What are the generated key and generated column ID in the Normalizer transformation? The Integration Service increments the generated key (GK) sequence number each time it processes a source row. When the source row contains a multiple-occurring column or a multiple-occurring group of columns, the Normalizer transformation returns a row for each occurrence, and each row contains the same generated key value. The Normalizer transformation has a generated column ID (GCID) port for each multiple-occurring column. The GCID is an index for the instance of the multiple-occurring data. For example, if a column occurs 3 times in a source record, the Normalizer returns a value of 1, 2, or 3 in the generated column ID.
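The GK/GCID behavior above can be sketched by pivoting a multiple-occurring column into rows. The `store`/`qtys` record layout is invented for the illustration; only the GK and GCID numbering mirrors the Normalizer's behavior.

```python
# Sketch of Normalizer behavior for a multiple-occurring column: each
# source row with three quantity occurrences becomes three target rows
# sharing one generated key (GK), with GCID 1..3 indexing the occurrence.
def normalize(source_rows, start_gk=1):
    out, gk = [], start_gk
    for row in source_rows:
        store = row["store"]
        for gcid, qty in enumerate(row["qtys"], start=1):
            out.append({"GK": gk, "GCID": gcid, "store": store, "qty": qty})
        gk += 1  # one generated key per source row
    return out

rows = normalize([{"store": "S1", "qtys": [10, 20, 30]},
                  {"store": "S2", "qtys": [5, 6, 7]}])
for r in rows:
    print(r)
# GK=1 for all three S1 rows (GCID 1, 2, 3); GK=2 for the S2 rows.
```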

Q. What is the difference between SUBSTR and INSTR? INSTR searches a string for a substring and returns an integer indicating the position, within the string, of the first character of that occurrence. SUBSTR returns a portion of a string, beginning at a given character position and extending substring_length characters. SUBSTR calculates lengths using characters as defined by the input character set.
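The two functions map cleanly onto Python string operations, which makes the 1-based positions easy to see. This is a sketch of the basic two- and three-argument forms only, not a full emulation of either Oracle function.

```python
# Python equivalents of the basic forms of Oracle's INSTR and SUBSTR
# (1-based positions, as in Oracle); a sketch, not a full emulation.
def instr(s, sub):
    pos = s.find(sub)           # str.find is 0-based, -1 if absent
    return pos + 1              # so this yields 0 when not found, like Oracle

def substr(s, start, length=None):
    i = start - 1               # convert 1-based start to 0-based
    return s[i:] if length is None else s[i:i + length]

print(instr("informatica", "format"))  # 3
print(substr("informatica", 3, 6))     # 'format'
```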

Q. What are the different Oracle database objects? TABLES, VIEWS, INDEXES, SYNONYMS, SEQUENCES, TABLESPACES.

Q. What is @@ERROR?

The @@ERROR automatic variable returns the error code of the last Transact-SQL statement. If there was no error, @@ERROR returns zero. Because @@ERROR is reset after each Transact-SQL statement, it must be saved to a variable if it is needed for further processing after checking it.

Q. What is the difference between a correlated subquery and a nested subquery? A correlated subquery runs once for each row selected by the outer query; it contains a reference to a value from the row selected by the outer query. A nested subquery runs only once for the entire nesting (outer) query; it does not contain any reference to the outer query's rows. For example:
Correlated subquery: SELECT e1.empname, e1.basicsal, e1.deptno FROM emp e1 WHERE e1.basicsal = (SELECT MAX(basicsal) FROM emp e2 WHERE e2.deptno = e1.deptno)
Nested subquery: SELECT empname, basicsal, deptno FROM emp WHERE (deptno, basicsal) IN (SELECT deptno, MAX(basicsal) FROM emp GROUP BY deptno)

Q. How does one escape special characters when building SQL queries? The LIKE keyword allows string searches; the _ wildcard matches exactly one character, and % matches zero or more occurrences of any characters. These characters can be escaped in SQL. Example: SELECT name FROM emp WHERE id LIKE '%\_%' ESCAPE '\';
Use two quotes for every one displayed. Examples:
SELECT 'Frank''s Oracle site' FROM DUAL;
SELECT 'A ''quoted'' word.' FROM DUAL;
SELECT 'A "double quoted" word.' FROM DUAL;
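The correlated vs nested distinction above can be demonstrated with SQLite (table and data invented for the example). For portability the nested version below uses a single-column IN rather than Oracle's (deptno, basicsal) tuple form, which is a simplification: it can over-match if a department's max salary also appears as a non-max salary elsewhere.

```python
import sqlite3

# Sketch of the correlated vs nested subquery examples: both find the
# highest-paid employee(s) per department.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empname TEXT, basicsal INTEGER, deptno INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [("A", 100, 10), ("B", 300, 10), ("C", 200, 20)])

# Correlated: the inner query references e1.deptno from the outer row,
# so it conceptually runs once per outer row.
correlated = conn.execute("""
    SELECT e1.empname FROM emp e1
    WHERE e1.basicsal = (SELECT MAX(e2.basicsal) FROM emp e2
                         WHERE e2.deptno = e1.deptno)
    ORDER BY e1.empname
""").fetchall()

# Nested: the inner query is independent and runs once in total
# (single-column simplification of the tuple IN shown above).
nested = conn.execute("""
    SELECT empname FROM emp
    WHERE basicsal IN (SELECT MAX(basicsal) FROM emp GROUP BY deptno)
    ORDER BY empname
""").fetchall()
print(correlated)  # [('B',), ('C',)]
print(nested)      # [('B',), ('C',)]
```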

Q. Difference between Surrogate key and Primary key? Surrogate key: 1. Query processing is fast. 2. It is only numeric 3. Developer develops the surrogate key using sequence generator transformation.

4. E.g. 12453.
Primary key: 1. Query processing is slower. 2. Can be alphanumeric. 3. The source system supplies the primary key. 4. E.g. C10999.

Q. How does one eliminate duplicate rows in an Oracle table?
Method 1: DELETE FROM table_name a WHERE rowid > (SELECT MIN(rowid) FROM table_name b WHERE a.key_values = b.key_values);
Method 2: CREATE TABLE table_name2 AS SELECT DISTINCT * FROM table_name1; DROP TABLE table_name1; RENAME table_name2 TO table_name1; (in this method, all the indexes, constraints, triggers, etc. have to be re-created).
Method 3: DELETE FROM table_name t1 WHERE EXISTS (SELECT 'x' FROM table_name t2 WHERE t1.key_value = t2.key_value AND t1.rowid > t2.rowid);
Method 4: DELETE FROM table_name WHERE rowid NOT IN (SELECT MAX(rowid) FROM table_name GROUP BY key_value);

Q. Query to retrieve the Nth row from an Oracle table?
SELECT * FROM my_table WHERE rownum <= n
MINUS
SELECT * FROM my_table WHERE rownum < n;
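Method 4 above also works verbatim in SQLite, which exposes the same rowid pseudo-column, so it makes a convenient runnable check (table and data invented for the example):

```python
import sqlite3

# Sketch of duplicate-row elimination (Method 4 above) in SQLite:
# keep only the newest rowid per key value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (key_value TEXT)")
conn.executemany("INSERT INTO t VALUES (?)",
                 [("a",), ("a",), ("b",), ("b",), ("b",), ("c",)])

conn.execute("""
    DELETE FROM t WHERE rowid NOT IN
        (SELECT MAX(rowid) FROM t GROUP BY key_value)
""")
rows = [r[0] for r in conn.execute(
    "SELECT key_value FROM t ORDER BY key_value")]
print(rows)  # ['a', 'b', 'c']  -- one row left per key
```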

Q. How does the server recognize the source and target databases? If the source is relational - by using an ODBC connection. If it is a flat file - by using an FTP connection.

Q. What are the different types of indexes supported by Oracle? 1. B-tree index 2. B-tree cluster index 3. Hash cluster index 4. Reverse key index 5. Bitmap index 6. Function-based index

Q. Types of Normalizer transformation? There are two types of Normalizer transformation. VSAM Normalizer transformation: a non-reusable transformation that is a Source Qualifier transformation for a COBOL source. The Mapping Designer creates VSAM Normalizer columns from a COBOL source in a mapping; the column attributes are read-only. The VSAM Normalizer receives a multiple-occurring source column through one input port. Pipeline Normalizer transformation: a transformation that processes multiple-occurring data from relational tables or flat files. You might choose this option when you want to process multiple-occurring data from another transformation in the mapping. A VSAM Normalizer transformation has one input port for a multiple-occurring column; a pipeline Normalizer transformation has multiple input ports for a multiple-occurring column. When you create a Normalizer transformation in the Transformation Developer, you create a pipeline Normalizer transformation by default, and you define the columns based on the data the transformation receives from another type of transformation such as a Source Qualifier transformation. The Normalizer transformation has one output port for each single-occurring input port.

Q. What are all the transformations you use if the source is an XML file? XML Source Qualifier,

XML Parser, XML Generator.

Q. How do you list files in order in UNIX?
ls -lt (sort by last modified date)
ls -ltr (reverse order)
ls -lS (sort by file size)

Q. How do you identify empty lines in a flat file in UNIX, and how do you remove them?
grep -v '^$' filename

Q. How do you send the session report (.txt) to a manager after the session completes? Use the email variables %a (attach a file) and %g (attach the session log file).

Q. How do you check all the running processes in UNIX?
ps -ef

Q. How can I display only hidden files in the current directory?
ls -a | grep "^\."

Q. How to display the first 10 lines of a file?
head -10 logfile

Q. How to display the last 10 lines of a file?
tail -10 logfile

Q. How did you schedule sessions in your project? 1. Run once: set parameters for the date and time when the session should start. 2. Run every: the Informatica server runs the session at a regular interval as configured, with parameters for days, hours, minutes, end on, end after, forever. 3. Customized repeat: repeat every 2 days, with daily frequency in hours or minutes, every week, every month.

Q. What is lookup override? This feature is similar to entering a custom query in a Source Qualifier transformation. When entering a lookup SQL override, you can enter the entire override, or generate and edit the default SQL statement. The lookup query override can include a WHERE clause.

Q. What is SQL override? The Source Qualifier provides the SQL Query option to override the default query. You can enter any SQL statement supported by your source database. You might enter your own SELECT statement, have the database perform aggregate calculations, or call a stored procedure or stored function to read the data and perform some tasks.

Q. How to get a sequence value using an Expression transformation?
v_temp = v_temp + 1
o_seq = IIF(ISNULL(v_temp), 0, v_temp)

Q. How to get unique records?
Source > SQ > SRT > EXP > FLT or RTR > TGT
In the Expression (after sorting by eid):
flag = DECODE(TRUE, eid = pre_eid, 'Y', 'N')
flag_out = flag
pre_eid = eid

Q. What are the different transaction levels available in the Transaction Control transformation (TCL)? The following are the transaction levels, or built-in variables:

TC_CONTINUE_TRANSACTION: the Integration Service does not perform any transaction change for this row. This is the default value of the expression.
TC_COMMIT_BEFORE: the Integration Service commits the transaction, begins a new transaction, and writes the current row to the target. The current row is in the new transaction.
TC_COMMIT_AFTER: the Integration Service writes the current row to the target, commits the transaction, and begins a new transaction. The current row is in the committed transaction.
TC_ROLLBACK_BEFORE: the Integration Service rolls back the current transaction, begins a new transaction, and writes the current row to the target. The current row is in the new transaction.
TC_ROLLBACK_AFTER: the Integration Service writes the current row to the target, rolls back the transaction, and begins a new transaction. The current row is in the rolled-back transaction.

Q. What is the difference between grep and find? Grep is used for finding any string in a file. Syntax: grep <string> <filename>. Example: grep 'compu' details.txt displays every line in which the string 'compu' is found.

Find is used to find files or directories under a given path. Syntax: find <path> -name <pattern>. Example: find . -name 'compu*' displays all file names starting with compu.
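Going back to the "How to get unique records" expression pattern shown earlier (flag / pre_eid variable ports after a Sorter), the same logic can be sketched in a few lines. The eid values are invented for the example; the point is how the variable port carries the previous row's value.

```python
# Sketch of the sorted "unique record" flag logic: after sorting by eid,
# compare each row with the previous one (the variable-port pattern) and
# keep only rows flagged 'N' (first occurrence of each eid).
def flag_unique(sorted_eids):
    flags, pre_eid = [], None
    for eid in sorted_eids:
        flags.append("Y" if eid == pre_eid else "N")
        pre_eid = eid            # variable port keeps the previous value
    return flags

eids = [100, 200, 200, 300]
flags = flag_unique(eids)
unique = [e for e, f in zip(eids, flags) if f == "N"]   # Filter/Router step
print(flags)   # ['N', 'N', 'Y', 'N']
print(unique)  # [100, 200, 300]
```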

Q. What are the differences between DDL, DML, and DCL commands?
DDL (Data Definition Language) statements:
CREATE - create objects in the database
ALTER - alter the structure of the database
DROP - delete objects from the database
TRUNCATE - remove all records from a table, including all space allocated for the records
COMMENT - add comments to the data dictionary
DML (Data Manipulation Language) statements:
SELECT - retrieve data from a database
INSERT - insert data into a table
UPDATE - update existing data within a table
DELETE - delete records from a table; the space for the records remains
CALL - call a PL/SQL or Java subprogram
EXPLAIN PLAN - explain the access path to data
LOCK TABLE - control concurrency
DCL (Data Control Language) and transaction statements:
GRANT - give users access privileges to the database
REVOKE - withdraw access privileges given with the GRANT command
COMMIT - save work done
SAVEPOINT - identify a point in a transaction to which you can later roll back
ROLLBACK - restore the database to its state at the last COMMIT
SET TRANSACTION - change transaction options, like which rollback segment to use

Q. What is Stored Procedure? A stored procedure is a named group of SQL statements that have been previously created and stored in the server database. Stored procedures accept input parameters so that a single procedure can be used over the network by several clients using different input data. And when the procedure is modified, all clients automatically get the new version. Stored procedures reduce network traffic and improve performance. Stored procedures can be used to help ensure the integrity of the database. Q. What is View? A view is a tailored presentation of the data contained in one or more tables (or other views). Unlike a table, a view is not allocated any storage space, nor does a view actually contain data; rather, a view is defined by a query that extracts or derives data from the tables the view references. These tables are called base tables. Views present a different representation of the data that resides within the base tables. Views are very powerful because they allow you to tailor the presentation of data to different types of users. Views are often used to: Provide an additional level of table security by restricting access to a predetermined set of rows and/or columns of a table Hide data complexity Simplify commands for the user Present the data in a different perspective from that of the base table Isolate applications from changes in definitions of base tables Express a query that cannot be expressed without using a view Q. What is Trigger? A trigger is a SQL procedure that initiates an action when an event (INSERT, DELETE or UPDATE) occurs. Triggers are stored in and managed by the DBMS. Triggers are used to maintain the referential integrity of data by changing the data in a systematic fashion. A trigger cannot be called or executed; the DBMS automatically fires the trigger as a result of a data modification to the associated table. 
Triggers can be viewed as similar to stored procedures in that both consist of procedural logic that is stored at the database level. Stored procedures, however, are not event-driven and are not attached to a specific table as triggers are. Stored procedures are explicitly executed by invoking a CALL to the procedure, while triggers are implicitly executed. In addition, triggers can also execute stored procedures.

Nested Trigger: a trigger can also contain INSERT, UPDATE, and DELETE logic within itself, so when the trigger is fired because of a data modification it can cause another data modification, thereby firing another trigger. A trigger that contains data modification logic within itself is called a nested trigger.

Q. What is View? A simple view can be thought of as a subset of a table. It can be used for retrieving data, as well as updating or deleting rows. Rows updated or deleted in the view are updated or deleted in the table the view was created from. It should also be noted that as data in the original table changes, so does data in the view, as views are the way to look at part of the original table. The results of using a view are not permanently stored in the database. The data accessed through a view is actually constructed using a standard T-SQL SELECT command and can come from one to many different base tables or even other views.

Q. What is Index? An index is a physical structure containing pointers to the data. Indexes are created on an existing table to locate rows more quickly and efficiently. It is possible to create an index on one or more columns of a table, and each index is given a name. Users cannot see the indexes; they are just used to speed up queries. Effective indexes are one of the best ways to improve performance in a database application. A table scan happens when there is no index available to help a query; in a table scan, SQL Server examines every row in the table to satisfy the query results. Table scans are sometimes unavoidable, but on large tables, scans have a terrific impact on performance. Clustered indexes define the physical sorting of a database table's rows in the storage media; for this reason, each database table may have only one clustered index. Non-clustered indexes are created outside of the database table and contain a sorted list of references to the table itself.

Q.
What is the difference between clustered and a non-clustered index? A clustered index is a special type of index that reorders the way records in the table are physically stored. Therefore table can have only one clustered index. The leaf nodes of a clustered index contain the data pages. A nonclustered index is a special type of index in which the logical order of the index does not match the physical stored order of the rows on disk. The leaf node of a nonclustered index does not consist of the data pages. Instead, the leaf nodes contain index rows. Q. What is Cursor? Cursor is a database object used by applications to manipulate data in a set on a row-by row basis, instead of the typical SQL commands that operate on all the rows in the set at one time.

In order to work with a cursor we need to perform these steps in the following order: declare the cursor, open the cursor, fetch a row from the cursor, process the fetched row, close the cursor, and deallocate the cursor.

Q. What is the difference between a HAVING clause and a WHERE clause? 1. HAVING specifies a search condition for a group or an aggregate, and can be used only with the SELECT statement. 2. HAVING is typically used with a GROUP BY clause; when GROUP BY is not used, HAVING behaves like a WHERE clause. 3. HAVING is basically used only with the GROUP BY function in a query, whereas the WHERE clause is applied to each row before the rows become part of the GROUP BY function in a query.
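The WHERE-before-grouping vs HAVING-after-grouping distinction above is easy to see in a runnable query (table and data invented for the example):

```python
import sqlite3

# Sketch of WHERE vs HAVING: WHERE filters rows before grouping,
# HAVING filters the groups after aggregation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (deptno INTEGER, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [(10, 800), (10, 3000), (20, 1500), (20, 1600), (30, 500)])

rows = conn.execute("""
    SELECT deptno, SUM(sal)
    FROM emp
    WHERE sal >= 1000          -- row-level filter, applied before grouping
    GROUP BY deptno
    HAVING SUM(sal) > 2000     -- group-level filter, applied after
    ORDER BY deptno
""").fetchall()
print(rows)  # [(10, 3000), (20, 3100)] -- dept 30 never survives the WHERE
```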

RANK CACHE
[Figure: sample Rank mapping]
When the PowerCenter Server runs a session with a Rank transformation, it compares an input row with rows in the data cache. If the input row out-ranks a stored row, the PowerCenter Server replaces the stored row with the input row. Example: PowerCenter caches the first 5 rows if we are finding the top 5 salaried employees. When the 6th row is read, it is compared with the 5 rows in the cache and placed in the cache if needed.
1) RANK INDEX CACHE: the index cache holds group information from the group-by ports. If we are using Group By on DEPTNO, then this cache stores values 10, 20, 30, etc. All group-by columns are in the rank index cache. Ex: DEPTNO.
2) RANK DATA CACHE: it holds row data until the PowerCenter Server completes the ranking, and is generally larger than the index cache. To reduce the data cache size, connect only the necessary input/output ports to subsequent transformations. All variable ports (if any), the rank port, and all ports going out from the Rank transformation are stored in the rank data cache. Example: all ports except DEPTNO in our mapping example.

Aggregator Caches 1. The Power Center Server stores data in the aggregate cache until it completes the aggregate calculations. 2. It stores group values in an index cache and row data in the data cache. If the Power Center Server requires more space, it stores overflow values in cache files. Note: The Power Center Server uses memory to process an Aggregator transformation with sorted ports. It does not use cache memory, so we do not need to configure cache memory for Aggregator transformations that use sorted ports. 1) Aggregator Index Cache: The index cache holds group information from the group by ports. If we are using Group By on DEPTNO, then this cache stores the values 10, 20, 30, etc. All Group By columns are in the AGGREGATOR INDEX CACHE. Ex. DEPTNO 2) Aggregator Data Cache: The DATA CACHE is generally larger than the AGGREGATOR INDEX CACHE. Columns in the Data Cache: variable ports (if any), non-group-by input/output ports, non-group-by input ports used in a non-aggregate output expression, and ports containing an aggregate function.
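The split between the index cache (group keys) and the data cache (per-group running values) can be sketched in Python (an illustration of the idea only, not Informatica internals):

```python
# Index cache holds the GROUP BY keys; data cache accumulates the running
# aggregate (here SUM(SAL)) for each group. Rows are invented for illustration.
def aggregate_by_dept(rows):
    index_cache = set()   # group keys seen so far (DEPTNO values)
    data_cache = {}       # per-group running SUM(SAL)
    for deptno, sal in rows:
        index_cache.add(deptno)
        data_cache[deptno] = data_cache.get(deptno, 0) + sal
    return index_cache, data_cache

rows = [(10, 800), (20, 1600), (10, 5000), (30, 1250)]
index_cache, data_cache = aggregate_by_dept(rows)
print(sorted(index_cache))   # [10, 20, 30]
print(data_cache[10])        # 5800
```

With sorted input, a group is complete as soon as its key changes, which is why sorted ports let the server aggregate without building these caches.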

JOINER CACHES The Joiner always caches the MASTER table; we cannot disable caching. It builds an index cache and a data cache based on the MASTER table. 1) Joiner Index Cache: All columns of the MASTER table used in the join condition are in the JOINER INDEX CACHE. Example: DEPTNO in our mapping. 2) Joiner Data Cache: Master columns not in the join condition that are used as output to another transformation or the target table are in the Data Cache. Example: DNAME and LOC in our mapping example.
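Caching the master side amounts to a hash join: build an index on the master rows keyed by the join column, then stream the detail rows against it. A minimal Python sketch (not Informatica code; the DEPT/EMP rows are invented):

```python
from collections import defaultdict

# Build the "index cache" on the MASTER source, then stream the detail rows.
def join_master_detail(master_rows, detail_rows):
    # index cache: join-condition column -> master values with that key
    index = defaultdict(list)
    for deptno, dname in master_rows:          # master: DEPT(DEPTNO, DNAME)
        index[deptno].append(dname)
    # stream the detail source; only matching rows survive (a normal join)
    joined = []
    for ename, deptno in detail_rows:          # detail: EMP(ENAME, DEPTNO)
        for dname in index.get(deptno, []):
            joined.append((ename, deptno, dname))
    return joined

dept = [(10, "ACCOUNTING"), (20, "RESEARCH")]
emp = [("KING", 10), ("SMITH", 20), ("TEMP", 99)]
print(join_master_detail(dept, emp))
# [('KING', 10, 'ACCOUNTING'), ('SMITH', 20, 'RESEARCH')]
```

This is why the smaller source should be the master: only the master is held in memory while the detail streams through.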

Lookup Cache Files 1. Lookup Index Cache: Stores data for the columns used in the lookup condition. 2. Lookup Data Cache: For a connected Lookup transformation, stores data for the connected output ports, not including ports used in the lookup condition. For an unconnected Lookup transformation, stores data from the return port.

OLTP and OLAP
Logical Data Modeling Vs Physical Data Modeling

Router Transformation And Filter Transformation
Source Qualifier And Lookup Transformation
Mapping And Mapplet
Joiner Transformation And Lookup Transformation
Dimension Table And Fact Table

Connected Lookup and Unconnected Lookup

Connected Lookup:
- Receives input values directly from the pipeline.
- We can use a dynamic or static cache.
- Cache includes all lookup columns used in the mapping.
- If there is no match for the lookup condition, the Power Center Server returns the default value for all output ports.
- If there is a match for the lookup condition, the Power Center Server returns the result of the lookup condition for all lookup/output ports.
- Passes multiple output values to another transformation.
- Supports user-defined default values.

Unconnected Lookup:
- Receives input values from the result of a :LKP expression in another transformation.
- We can use a static cache only.
- Cache includes all lookup/output ports in the lookup condition and the lookup/return port.
- If there is no match for the lookup condition, the Power Center Server returns NULL.
- If there is a match for the lookup condition, the Power Center Server returns the result of the lookup condition into the return port.
- Passes one output value to another transformation.
- Does not support user-defined default values.

Cache Comparison: Persistent and Dynamic Caches Dynamic 1) When you use a dynamic cache, the Informatica Server updates the lookup cache as it passes rows to the target. 2) With a dynamic cache, we can update the cache with new data. 3) A dynamic cache is not reusable. (Use a dynamic cache when we need the cache to reflect updated data.) Persistent 1) A Lookup transformation can use a non-persistent or persistent cache. The PowerCenter Server saves or deletes lookup cache files after a successful session based on the Lookup Cache Persistent property. 2) With a persistent cache, we are not able to update the cache with new data. 3) A persistent cache is reusable. (Use a persistent cache when we need the previous session's cache data.)
View And Materialized View
Star Schema And Snow Flake Schema
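The dynamic-cache behavior — the lookup cache learning new keys as rows flow toward the target — can be sketched in Python (an illustration of the idea only, not Informatica code; the rows are invented):

```python
# With a dynamic cache, a later row in the same run can see a key that an
# earlier row in the run inserted, so duplicates within one load become updates.
def load_with_dynamic_cache(existing_target_rows, incoming_rows):
    cache = dict(existing_target_rows)   # lookup cache built from the target
    actions = []
    for key, value in incoming_rows:
        if key in cache:
            actions.append(("update", key))
        else:
            actions.append(("insert", key))
            cache[key] = value           # dynamic part: cache learns new keys
    return actions

target = [(1, "A"), (2, "B")]
incoming = [(2, "B2"), (3, "C"), (3, "C2")]   # key 3 appears twice in one run
print(load_with_dynamic_cache(target, incoming))
# [('update', 2), ('insert', 3), ('update', 3)]
```

A static cache would report ("insert", 3) twice, because it never learns about rows loaded during the current run.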

Informatica - Transformations In Informatica, transformations help to transform the source data according to the requirements of the target system and ensure the quality of the data being loaded into the target. Transformations are of two types: Active and Passive.

Active Transformation An active transformation can change the number of rows that pass through it from source to target (i.e., it can eliminate rows that do not meet the condition in the transformation).

Passive Transformation A passive transformation does not change the number of rows that pass through it (i.e., it passes all rows through the transformation).

Transformations can be Connected or Unconnected.

Connected Transformation Connected transformation is connected to other transformations or directly to target table in the mapping.

Unconnected Transformation An unconnected transformation is not connected to other transformations in the mapping. It is called within another transformation, and returns a value to that transformation.

Following are the list of Transformations available in Informatica: Aggregator Transformation Expression Transformation Filter Transformation Joiner Transformation Lookup Transformation Normalizer Transformation Rank Transformation Router Transformation Sequence Generator Transformation Stored Procedure Transformation

Sorter Transformation Update Strategy Transformation XML Source Qualifier Transformation

In the following pages, we will explain all the above Informatica Transformations and their significance in the ETL process in detail. ============================================================================== Aggregator Transformation Aggregator transformation is an Active and Connected transformation. This transformation is useful to perform calculations such as averages and sums (mainly to perform calculations on multiple rows or groups). For example, to calculate the total of daily sales or the average of monthly or yearly sales. Aggregate functions such as AVG, FIRST, COUNT, PERCENTILE, MAX, SUM, etc. can be used in the Aggregator transformation. ============================================================================== Expression Transformation Expression transformation is a Passive and Connected transformation. It can be used to calculate values in a single row before writing to the target. For example, to calculate the discount of each product, to concatenate first and last names, or to convert a date to a string field. ============================================================================== Filter Transformation Filter transformation is an Active and Connected transformation. It can be used to filter rows in a mapping that do not meet the condition. For example, to find all the employees who are working in Department 10, or to find the products that fall between the rate category $500 and $1000. ============================================================================== Joiner Transformation Joiner Transformation is an Active and Connected transformation. It can be used to join two sources coming from two different locations or from the same location. For example, to join a flat file and a relational source, to join two flat files, or to join a relational source and an XML source. In order to join two sources, there must be at least one matching port. While joining two sources, it is a must to

specify one source as master and the other as detail. The Joiner transformation supports the following types of joins: 1) Normal 2) Master Outer 3) Detail Outer 4) Full Outer. A normal join discards all the rows of data from the master and detail source that do not match, based on the condition. A master outer join discards all the unmatched rows from the master source and keeps all the rows from the detail source and the matching rows from the master source. A detail outer join keeps all rows of data from the master source and the matching rows from the detail source; it discards the unmatched rows from the detail source. A full outer join keeps all rows of data from both the master and detail sources. ============================================================================== Lookup Transformation Lookup transformation is Passive, and it can be both Connected and Unconnected. It is used to look up data in a relational table, view, or synonym. The lookup definition can be imported either from source or from target tables. For example, suppose we want to retrieve all the sales of a product with ID 10, and the sales data resides in another table. Instead of using the sales table as one more source, we use a Lookup transformation to look up the data for the product with ID 10 in the sales table. A connected lookup receives input values directly from the mapping pipeline, whereas an unconnected lookup receives values from a :LKP expression in another transformation. A connected lookup returns multiple columns from the same row, whereas an unconnected lookup has one return port and returns one column from each row.

Connected lookup supports user-defined default values whereas Unconnected lookup does not support user defined values. ============================================================================== Normalizer Transformation Normalizer Transformation is an Active and Connected transformation. It is used mainly with COBOL sources where most of the time data is stored in de-normalized format. Also, Normalizer transformation can be used to create multiple rows from a single row of data. ==============================================================================
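The Joiner's four join types described above can be sketched in Python (a rough illustration, not Informatica code; DEPT is the master and EMP the detail, joined on DEPTNO, with invented rows):

```python
# Normal keeps only matches; master outer additionally keeps unmatched DETAIL
# rows; detail outer additionally keeps unmatched MASTER rows; full outer keeps both.
def joiner(master, detail, join_type="normal"):
    master_keys = {k for k, _ in master}
    detail_keys = {k for k, _ in detail}
    rows = [(k, m, d) for k, m in master for k2, d in detail if k == k2]
    if join_type in ("master outer", "full outer"):
        # keep unmatched DETAIL rows (master side padded with None)
        rows += [(k, None, d) for k, d in detail if k not in master_keys]
    if join_type in ("detail outer", "full outer"):
        # keep unmatched MASTER rows (detail side padded with None)
        rows += [(k, m, None) for k, m in master if k not in detail_keys]
    return sorted(rows, key=lambda r: r[0])

dept = [(10, "ACCOUNTING"), (40, "OPERATIONS")]   # master
emp = [(10, "KING"), (20, "SMITH")]               # detail
print(joiner(dept, emp, "full outer"))
# [(10, 'ACCOUNTING', 'KING'), (20, None, 'SMITH'), (40, 'OPERATIONS', None)]
```

Note the naming is from the master's point of view: "master outer" discards unmatched master rows, keeping the full detail source.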

Rank Transformation Rank transformation is an Active and Connected transformation. It is used to select the top or bottom rank of data. For example, to select the top 10 regions where the sales volume was very high, or to select the 10 lowest-priced products. ============================================================================== Router Transformation Router is an Active and Connected transformation. It is similar to the Filter transformation. The only difference is that the Filter transformation drops the data that does not meet the condition, whereas the Router has an option to capture the data that does not meet the condition. It is useful to test multiple conditions. It has input, output, and default groups. For example, if we want to filter data like State=Michigan, State=California, State=New York, and all other states, it's easy to route the data to different tables. ============================================================================== Sequence Generator Transformation Sequence Generator transformation is a Passive and Connected transformation. It is used to create unique primary key values, cycle through a sequential range of numbers, or replace missing keys. It has two output ports to connect to transformations. By default it has two fields, CURRVAL and NEXTVAL (you cannot add ports to this transformation). The NEXTVAL port generates a sequence of numbers when connected to a transformation or target. CURRVAL is NEXTVAL plus the Increment By value (by default, NEXTVAL plus one). ============================================================================== Sorter Transformation Sorter transformation is a Connected and an Active transformation. It allows sorting data either in ascending or descending order according to a specified field. It can also be configured for case-sensitive sorting, and to specify whether the output rows should be distinct.
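The NEXTVAL/CURRVAL relationship can be sketched in Python (an illustration of the idea only, not Informatica internals):

```python
# NEXTVAL hands out the next number in the sequence; CURRVAL is the most
# recently generated NEXTVAL plus the Increment By value.
class SequenceGenerator:
    def __init__(self, start=1, increment_by=1):
        self._next = start
        self.increment_by = increment_by

    def nextval(self):
        value = self._next
        self._next += self.increment_by
        return value

    def currval(self):
        # CURRVAL = last NEXTVAL + Increment By (the value NEXTVAL would give next)
        return self._next

seq = SequenceGenerator(start=1)
print([seq.nextval() for _ in range(3)])  # [1, 2, 3]
print(seq.currval())                      # 4
```

This is why connecting only CURRVAL without NEXTVAL is rarely useful: CURRVAL just trails the sequence by one increment.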
============================================================================== Source Qualifier Transformation Source Qualifier transformation is an Active and Connected transformation. When adding a relational or a flat file source definition to a mapping, it is a must to connect it to a Source Qualifier transformation. The Source Qualifier performs various tasks such as overriding the default SQL query,

filtering records, joining data from two or more tables, etc. ============================================================================== Stored Procedure Transformation Stored Procedure transformation is a Passive transformation that can be Connected or Unconnected. It is useful to automate time-consuming tasks, and it is also used in error handling, to drop and recreate indexes, to determine the space in a database, for specialized calculations, etc. The stored procedure must exist in the database before creating a Stored Procedure transformation, and the stored procedure can exist in a source, target, or any database with a valid connection to the Informatica Server. A stored procedure is an executable script with SQL statements, control statements, user-defined variables, and conditional statements. ============================================================================== Update Strategy Transformation Update Strategy transformation is an Active and Connected transformation. It is used to update data in a target table, either to maintain a history of data or to capture recent changes. You can specify how to treat source rows: insert, update, delete, or data driven. ============================================================================== XML Source Qualifier Transformation XML Source Qualifier is a Passive and Connected transformation. XML Source Qualifier is used only with an XML source definition. It represents the data elements that the Informatica Server reads when it executes a session with XML sources. ==============================================================================

Constraint-Based Loading In the Workflow Manager, you can specify constraint-based loading for a session. When you select this option, the Integration Service orders the target load on a row-by-row basis. For every row generated by an active source, the Integration Service loads the corresponding transformed row first to the primary key table, then to any foreign key tables. Constraint-based loading depends on the following requirements: Active source: Related target tables must have the same active source. Key relationships: Target tables must have key relationships. Target connection groups: Targets must be in one target connection group. Treat rows as insert. Use this option when you insert into the target. You cannot use updates with constraint based loading. Active Source: When target tables receive rows from different active sources, the Integration Service reverts to normal loading for those tables, but loads all other targets in the session using constraint-based loading when possible. For example, a mapping contains three distinct pipelines. The first two contain a source, source qualifier, and target. Since these two targets receive data from different active sources, the Integration Service reverts to normal loading for both targets. The third pipeline contains a source, Normalizer, and two targets. Since these two targets share a single active source (the Normalizer), the Integration Service performs constraint-based loading: loading the primary key table first, then the foreign key table. Key Relationships: When target tables have no key relationships, the Integration Service does not perform constraint-based loading. Similarly, when target tables have circular key relationships, the Integration Service reverts to a normal load. For example, you have one target containing a primary key and a foreign key related to the primary key in a second target. The second target also contains a foreign key that references the primary key in the first target. 
The Integration Service cannot enforce constraint-based loading for these tables. It reverts to a normal load. Target Connection Groups: The Integration Service enforces constraint-based loading for targets in the same target connection group. If you want to specify constraint-based loading for multiple targets that receive data from the same active source, you must verify the tables are in the same target connection group. If the tables with the primary key-foreign key relationship are in different target connection groups, the Integration Service cannot enforce constraint-based loading when you run the workflow. To verify that all targets are in the same target connection group, complete the following tasks: Verify all targets are in the same target load order group and receive data from the same active source. Use the default partition properties and do not add partitions or partition points. Define the same target type for all targets in the session properties. Define the same database connection name for all targets in the session properties. Choose normal mode for the target load type for all targets in the session properties. Treat Rows as Insert: Use constraint-based loading when the session option Treat Source Rows As is set to insert. You might get inconsistent data if you select a different Treat Source Rows As option and you configure the session for constraint-based loading. When the mapping contains Update Strategy transformations and you need to load data to a primary key table first, split the mapping using one of the following options: Load primary key table in one mapping and dependent tables in another mapping. Use constraint-based loading to load the primary table. Perform inserts in one mapping and updates in another mapping. Constraint-based loading does not affect the target load ordering of the mapping. Target load ordering defines the order the Integration Service reads the sources in each target load order group in the mapping. 
A target load order group is a collection of source qualifiers, transformations, and targets linked together in a mapping. Constraint based loading establishes the order in which the Integration Service loads individual targets within a set of targets receiving data from a single source qualifier.

Example The following mapping is configured to perform constraint-based loading: In the first pipeline, target T_1 has a primary key, and T_2 and T_3 contain foreign keys referencing the T_1 primary key. T_3 has a primary key that T_4 references as a foreign key. Since these tables receive records from a single active source, SQ_A, the Integration Service loads rows to the targets in the following order: 1. T_1 2. T_2 and T_3 (in no particular order) 3. T_4 The Integration Service loads T_1 first because it has no foreign key dependencies and contains a primary key referenced by T_2 and T_3. The Integration Service then loads T_2 and T_3, but since T_2 and T_3 have no dependencies on each other, they are not loaded in any particular order. The Integration Service loads T_4 last, because it has a foreign key that references a primary key in T_3. After loading the first set of targets, the Integration Service begins reading source B. If there are no key relationships between T_5 and T_6, the Integration Service reverts to a normal load for both targets. If T_6 has a foreign key that references a primary key in T_5, then since T_5 and T_6 receive data from a single active source, the Aggregator AGGTRANS, the Integration Service loads rows to the tables in the following order: T_5 T_6 T_1, T_2, T_3, and T_4 are in one target connection group if you use the same database connection for each target and you use the default partition properties. T_5 and T_6 are in another target connection group together if you use the same database connection for each target and you use the default partition properties. The Integration Service includes T_5 and T_6 in a different target connection group because they are in a different target load order group from the first four targets. Enabling Constraint-Based Loading: When you enable constraint-based loading, the Integration Service orders the target load on a row-by-row basis. To enable constraint-based loading: 1.
In the General Options settings of the Properties tab, choose Insert for the Treat Source Rows As property. 2. Click the Config Object tab. In the Advanced settings, select Constraint Based Load Ordering. 3. Click OK.
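The primary-key-before-foreign-key ordering is essentially a topological sort of the FK dependency graph, as this Python sketch illustrates (not Informatica code; the table names follow the example above):

```python
from collections import defaultdict, deque

# Load a parent (primary key) table before any child that holds a foreign
# key referencing it: a topological sort of the FK dependency graph.
def load_order(tables, fk_edges):
    # fk_edges: (parent, child) pairs -- child has an FK referencing parent
    children = defaultdict(list)
    pending_parents = {t: 0 for t in tables}
    for parent, child in fk_edges:
        children[parent].append(child)
        pending_parents[child] += 1
    ready = deque(t for t in tables if pending_parents[t] == 0)
    order = []
    while ready:
        table = ready.popleft()
        order.append(table)
        for child in children[table]:
            pending_parents[child] -= 1
            if pending_parents[child] == 0:
                ready.append(child)
    return order  # shorter than `tables` if the FK graph is circular

tables = ["T_1", "T_2", "T_3", "T_4"]
fks = [("T_1", "T_2"), ("T_1", "T_3"), ("T_3", "T_4")]
print(load_order(tables, fks))  # ['T_1', 'T_2', 'T_3', 'T_4']
```

A circular key relationship leaves some tables with unresolved parents, which mirrors the Integration Service reverting to a normal load in that case.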

Target Load Plan When you use a mapplet in a mapping, the Mapping Designer lets you set the target load plan for sources within the mapplet. Setting the Target Load Order You can configure the target load order for a mapping containing any type of target definition. In the Designer, you can set the order in which the Integration Service sends rows to targets in different target load order groups in a mapping. A target load order group is the collection of source qualifiers, transformations, and targets linked together in a mapping. You can set the target load order if you want to maintain referential integrity when inserting, deleting, or updating tables that have the primary key and foreign key constraints. The Integration Service reads sources in a target load order group concurrently, and it processes target load order groups sequentially. To specify the order in which the Integration Service sends data to targets, create one source qualifier for each target within a mapping. To set the target load order, you then determine in which order the Integration Service reads each source in the mapping. The following figure shows two target load order groups in one mapping: In this mapping, the first target load order group includes ITEMS, SQ_ITEMS, and T_ITEMS. The second target load order group includes all other objects in the mapping, including the TOTAL_ORDERS target. The Integration Service processes the first target load order group, and then the second target load order group. When it processes the second target load order group, it reads data from both sources at the same time. To set the target load order: Create a mapping that contains multiple target load order groups. Click Mappings > Target Load Plan. The Target Load Plan dialog box lists all Source Qualifier transformations in the mapping and the targets that receive data from each source qualifier. Select a source qualifier from the list. 
Click the Up and Down buttons to move the source qualifier within the load order. Repeat steps 3 to 4 for other source qualifiers you want to reorder. Click OK.

Mapping Parameters & Variables Mapping parameters and variables represent values in mappings and mapplets. When we use a mapping parameter or variable in a mapping, we first declare the mapping parameter or variable for use in each mapplet or mapping. Then, we define a value for the mapping parameter or variable before we run the session. Mapping Parameters A mapping parameter represents a constant value that we can define before running a session. A mapping parameter retains the same value throughout the entire session. Example: When we want to extract records of a particular month during the ETL process, we will create a Mapping Parameter of a date/time data type and use it in the query to compare it with the timestamp field in the SQL override. After we create a parameter, it appears in the Expression Editor. We can then use the parameter in any expression in the mapplet or mapping. We can also use parameters in a source qualifier filter, user-defined join, or extract override, and in the Expression Editor of reusable transformations. Mapping Variables Unlike mapping parameters, mapping variables are values that can change between sessions. The Integration Service saves the latest value of a mapping variable to the repository at the end of each successful session. We can override a saved value with the parameter file. We can also clear all saved values for the session in the Workflow Manager. We might use a mapping variable to perform an incremental read of the source. For example, we have a source table containing time-stamped transactions, and we want to evaluate the transactions on a daily basis. Instead of manually entering a session override to filter source data each time we run the session, we can create a mapping variable, $$IncludeDateTime.
In the source qualifier, create a filter to read only rows whose transaction date equals $$IncludeDateTime, such as: TIMESTAMP = $$IncludeDateTime In the mapping, use a variable function to set the variable value to increment one day each time the session runs. If we set the initial value of $$IncludeDateTime to 8/1/2004, the first time the Integration Service runs the session, it reads only rows dated 8/1/2004. During the session, the Integration Service sets $$IncludeDateTime to 8/2/2004. It saves 8/2/2004 to the repository at the end of the session. The next time it runs the session, it reads only rows from August 2, 2004. Used in the following transformations: Expression, Filter, Router, Update Strategy. Initial and Default Value: When we declare a mapping parameter or variable in a mapping or a mapplet, we can enter an initial value. When the Integration Service needs an initial value, and we did not declare an initial value for the parameter or variable, the Integration Service uses a default value based on the data type of the parameter or variable:
Numeric -> 0
String -> Empty String
Date/Time -> 1/1/1
Variable Values: Start value and current value of a mapping variable. Start Value: The start value is the value of the variable at the start of the session. The Integration Service looks for the start value in the following order: 1) value in the parameter file, 2) value saved in the repository, 3) initial value, 4) default value. Current Value:

The current value is the value of the variable as the session progresses. When a session starts, the current value of a variable is the same as the start value. The final current value for a variable is saved to the repository at the end of a successful session. When a session fails to complete, the Integration Service does not update the value of the variable in the repository. Note: If a variable function is not used to calculate the current value of a mapping variable, the start value of the variable is saved to the repository. Variable Data Type and Aggregation Type: When we declare a mapping variable in a mapping, we need to configure the data type and aggregation type for the variable. The Integration Service uses the aggregation type of a mapping variable to determine the final current value of the mapping variable. Aggregation types are: Count: Only integer and small integer data types are valid. Max: All transformation data types except the binary data type are valid. Min: All transformation data types except the binary data type are valid. Variable Functions: Variable functions determine how the Integration Service calculates the current value of a mapping variable in a pipeline. SetMaxVariable: Sets the variable to the maximum value of a group of values. It ignores rows marked for update, delete, or reject. Aggregation type is set to Max. SetMinVariable: Sets the variable to the minimum value of a group of values. It ignores rows marked for update, delete, or reject. Aggregation type is set to Min. SetCountVariable: Increments the variable value by one. It adds one to the variable value when a row is marked for insertion, and subtracts one when the row is marked for deletion. It ignores rows marked for update or reject. Aggregation type is set to Count. SetVariable: Sets the variable to the configured value. At the end of a session, it compares the final current value of the variable to the start value of the variable.
Based on the aggregate type of the variable, it saves a final value to the repository. Creating Mapping Parameters and Variables Open the folder where we want to create the parameter or variable. In the Mapping Designer, click Mappings > Parameters and Variables. -or- In the Mapplet Designer, click Mapplet > Parameters and Variables. Click the add button. Enter the name. Do not remove $$ from the name. Select Type and Data type. Select Aggregation type for mapping variables. Give the Initial Value. Click OK. Example: Use of Mapping Parameters and Variables EMP will be the source table. Create a target table MP_MV_EXAMPLE having columns: EMPNO, ENAME, DEPTNO, TOTAL_SAL, MAX_VAR, MIN_VAR, COUNT_VAR and SET_VAR. TOTAL_SAL = SAL + COMM + $$BONUS (Bonus is a mapping parameter that changes every month) SET_VAR: We will add one month to the HIREDATE of every employee. Create shortcuts as necessary. Creating Mapping 1. Open the folder where we want to create the mapping. 2. Click Tools -> Mapping Designer. 3. Click Mapping -> Create -> Give name. Ex: m_mp_mv_example 4. Drag EMP and the target table. 5. Transformation -> Create -> Select Expression from list -> Create -> Done. 6. Drag EMPNO, ENAME, HIREDATE, SAL, COMM and DEPTNO to the Expression. 7. Create parameter $$Bonus and give the initial value as 200. 8. Create variable $$var_max of MAX aggregation type and initial value 1500. 9. Create variable $$var_min of MIN aggregation type and initial value 1500. 10. Create variable $$var_count of COUNT aggregation type and initial value 0. COUNT is visible only when the datatype is INT or SMALLINT.

11. Create variable $$var_set of MAX aggregation type. 12. Create 5 output ports: out_TOTAL_SAL, out_MAX_VAR, out_MIN_VAR, out_COUNT_VAR and out_SET_VAR. 13. Open the expression editor for TOTAL_SAL. Do the same as we did earlier for SAL + COMM. To add $$BONUS to it, select the variable tab and select the parameter from the mapping parameters. SAL + COMM + $$Bonus 14. Open the Expression editor for out_max_var. 15. Select the variable function SETMAXVARIABLE from the left side pane. Select $$var_max from the variable tab and SAL from the ports tab as shown below. SETMAXVARIABLE($$var_max,SAL) 16. Validate the expression. 17. Open the Expression editor for out_min_var and write the following expression: SETMINVARIABLE($$var_min,SAL). Validate the expression. 18. Open the Expression editor for out_count_var and write the following expression: SETCOUNTVARIABLE($$var_count). Validate the expression. 19. Open the Expression editor for out_set_var and write the following expression: SETVARIABLE($$var_set,ADD_TO_DATE(HIREDATE,'MM',1)). Validate. 20. Click OK. Expression Transformation below: 21. Link all ports from the expression to the target, then validate the mapping and save it. 22. See the mapping picture on the next page.
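The row-by-row behavior of the variable functions used in steps 15-19 can be sketched in Python (a rough illustration of the semantics, not Informatica code; the rows are invented):

```python
# SetMax/SetMinVariable ignore update/delete/reject rows; SetCountVariable
# adds one on insert, subtracts one on delete, and ignores update/reject.
def run_session(rows, start_max, start_min, start_count):
    var_max, var_min, var_count = start_max, start_min, start_count
    for sal, row_type in rows:
        if row_type in ("update", "delete", "reject"):
            if row_type == "delete":
                var_count -= 1          # SETCOUNTVARIABLE subtracts on delete
            continue                    # SETMAX/SETMINVARIABLE ignore these rows
        var_max = max(var_max, sal)     # SETMAXVARIABLE($$var_max, SAL)
        var_min = min(var_min, sal)     # SETMINVARIABLE($$var_min, SAL)
        var_count += 1                  # SETCOUNTVARIABLE($$var_count)
    return var_max, var_min, var_count

rows = [(800, "insert"), (5000, "insert"), (1250, "insert"), (0, "delete")]
print(run_session(rows, start_max=1500, start_min=1500, start_count=0))
# (5000, 800, 2)
```

Note the variables start from their repository/initial values (1500 here), which is why the final max can come from the start value when no row exceeds it.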

PARAMETER FILE A parameter file is a list of parameters and associated values for a workflow, worklet, or session. Parameter files provide flexibility to change these variables each time we run a workflow or session. We can create multiple parameter files and change the file we use for a session or workflow. We can create a parameter file using a text editor such as WordPad or Notepad. Enter the parameter file name and directory in the workflow or session properties. A parameter file contains the following types of parameters and variables: Workflow variable: References values and records information in a workflow. Worklet variable: References values and records information in a worklet. Use predefined worklet variables in a parent workflow, but we cannot use workflow variables from the parent workflow in a worklet. Session parameter: Defines a value that can change from session to session, such as a database connection or file name. Mapping parameter and mapping variable. USING A PARAMETER FILE Parameter files contain several sections preceded by a heading. The heading identifies the Integration Service, Integration Service process, workflow, worklet, or session to which we want to assign parameters or variables. Make the session and workflow, give connection information for the source and target tables, then run the workflow and see the result. Sample parameter file for our example (in the parameter file, folder and session names are case sensitive). Create a text file in Notepad with the name Para_File.txt:
[Practice.ST:s_m_MP_MV_Example]
$$Bonus=1000
$$var_max=500
$$var_min=1200
$$var_count=0
CONFIGURING THE PARAMETER FILE We can specify the parameter file name and directory in the workflow or session properties. To enter a parameter file in the workflow properties: 1. Open a workflow in the Workflow Manager. 2. Click Workflows > Edit. 3. Click the Properties tab. 4. Enter the parameter directory and name in the Parameter Filename field. 5. Click OK.
To enter a parameter file in the session properties: 1. Open a session in the Workflow Manager. 2. Click the Properties tab and open the General Options settings. 3. Enter the parameter directory and name in the Parameter Filename field. 4. Example: D:\Files\Para_File.txt or $PMSourceFileDir\Para_File.txt 5. Click OK.
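The section-heading format of the parameter file can be parsed with a few lines of Python (an illustrative sketch only; real PowerCenter parameter file handling covers more cases):

```python
# Parse a parameter file of the [Folder.Workflow:session] form shown above
# into a dict of {section heading: {parameter name: value}}.
def parse_parameter_file(text):
    sections, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]             # e.g. Practice.ST:s_m_MP_MV_Example
            sections[current] = {}
        elif "=" in line and current is not None:
            name, value = line.split("=", 1)
            sections[current][name] = value  # values are kept as strings
    return sections

para_file = """[Practice.ST:s_m_MP_MV_Example]
$$Bonus=1000
$$var_max=500
"""
parsed = parse_parameter_file(para_file)
print(parsed["Practice.ST:s_m_MP_MV_Example"]["$$Bonus"])  # 1000
```

Splitting on the first "=" only matters if a parameter value itself contains an equals sign.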

Mapplet
A mapplet is a reusable object that we create in the Mapplet Designer (in the Designer tool). It contains a set of transformations and lets us reuse that transformation logic in multiple mappings.
Suppose we need to use the same set of 5 transformations in, say, 10 mappings. Instead of building the 5 transformations in each of the 10 mappings, we create a mapplet of these 5 transformations and use that mapplet in all 10 mappings.
Example: To create a surrogate key in a target, we create a mapplet that uses a stored procedure to generate the primary key for the target table. We give the target table name and key column name as input to the mapplet and get the surrogate key as output.
Mapplets help simplify mappings in the following ways:
Include source definitions: Use multiple source definitions and source qualifiers to provide source data for a mapping.
Accept data from sources in a mapping.
Include multiple transformations: A mapplet can contain as many transformations as we need.
Pass data to multiple transformations: We can create a mapplet to feed data to multiple transformations. Each Output transformation in a mapplet represents one output group in the mapplet.
Contain unused ports: We do not have to connect all mapplet input and output ports in a mapping.
Mapplet Input: Mapplet input can originate from a source definition and/or from an Input transformation in the mapplet. We can create multiple pipelines in a mapplet. We use the Mapplet Input transformation to give input to a mapplet; its use is optional.
Mapplet Output: The output of a mapplet is not connected to any target table. We must use a Mapplet Output transformation to pass on mapplet output. A mapplet must contain at least one Output transformation with at least one connected port in the mapplet.
Example 1: We will join the EMP and DEPT tables, then calculate total salary and give the output to a Mapplet Output transformation. EMP and DEPT will be the source tables; the output will be given to the transformation Mapplet_Out.
Steps:
Open the folder where we want to create the mapping.
Click Tools -> Mapplet Designer.
Click Mapplets -> Create -> Give name. Ex: mplt_example1
Drag the EMP and DEPT tables.
Use a Joiner transformation as described earlier to join them.
Transformation -> Create -> Select Expression from list -> Create -> Done
Pass all ports from the Joiner to the Expression and then calculate total salary as described in the Expression transformation.
Now Transformation -> Create -> Select Mapplet Out from list -> Create -> Give name and then Done.
Pass all ports from the Expression to the Mapplet Output.
Mapplet -> Validate
Repository -> Save
Use of a mapplet in a mapping: We can use a mapplet in a mapping by just dragging the mapplet from the mapplet folder in the left pane, the same way we drag source and target tables. When we use the mapplet in a mapping, the mapplet object displays only the ports from the Input and Output transformations. These are referred to as the mapplet input and mapplet output ports. Make sure to give correct connection information in the session.
Making a mapping: We will use mplt_example1, and then create a Filter transformation to filter records whose total salary is >= 1500. mplt_example1 will be the source. Create a target table with the same structure as the Mapplet_Out transformation.
Creating the mapping:
Open the folder where we want to create the mapping.

Click Tools -> Mapping Designer.
Click Mapping -> Create -> Give name. Ex: m_mplt_example1
Drag mplt_example1 and the target table.
Transformation -> Create -> Select Filter from list -> Create -> Done.
Drag all ports from mplt_example1 to the Filter and give the filter condition.
Connect all ports from the Filter to the target. We can add more transformations after the Filter if needed.
Validate the mapping and save it.
Make the session and workflow. Give connection information for the mapplet source tables. Give connection information for the target table. Run the workflow and see the result.

Indirect Loading For Flat Files
Suppose you have 10 flat files of the same structure: all the flat files have the same number of columns and data types. We need to transfer all 10 files to the same target. The files are named, say, EMP1, EMP2 and so on.
Solution 1:
1. Import one flat file definition and make the mapping as per need.
2. Now in the session, give the Source Filename and Source File Directory of one file.
3. Make the workflow and run it.
4. Now open the session after the workflow completes. Change the filename and directory to point to the second file. Run the workflow again.
5. Do the above for all 10 files.
Solution 2:
1. Import one flat file definition and make the mapping as per need.
2. Now in the session, give the Source File Directory of the files.
3. In the Source Filename field, use $InputFileName. This is a session parameter.
4. Now make a parameter file and give the value of $InputFileName:
$InputFileName=EMP1.txt
5. Run the workflow.
6. Now edit the parameter file and give the value of the second file. Run the workflow again.
7. Do the same for the remaining files.
Solution 3:
1. Import one flat file definition and make the mapping as per need.
2. Now make a notepad file that contains the location and name of each of the 10 flat files.

Sample:
D:\EMP1.txt
E:\EMP2.txt
E:\FILES\DWH\EMP3.txt
and so on
3. Now make a session and, in the Source Filename and Source File Directory fields, give the name and location of the file created above.
4. In the Source Filetype field, select Indirect.
5. Click Apply.
6. Validate the session.
7. Make the workflow. Save it to the repository and run.
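The indirect file list is just a plain text file with one fully qualified file name per line, so it can be regenerated by a small script before each run. A sketch in Python (the directory and pattern names here are assumptions for illustration):

```python
import glob
import os

def build_file_list(source_dir, pattern, list_path):
    """Write one fully qualified flat-file name per line -- the layout
    the session expects when the Source Filetype is set to Indirect."""
    files = sorted(glob.glob(os.path.join(source_dir, pattern)))
    with open(list_path, "w") as handle:
        for path in files:
            handle.write(path + "\n")
    return files
```

Calling build_file_list("/data/staging", "EMP*.txt", "/data/staging/emp_list.txt") before the workflow starts (for example, from a pre-session command) keeps the list in step with whatever files have arrived.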

Incremental Aggregation
When we enable the session option Incremental Aggregation, the Integration Service performs incremental aggregation: it passes source data through the mapping and uses historical cache data to perform the aggregation calculations incrementally.
When using incremental aggregation, you apply captured changes in the source to aggregate calculations in a session. If the source changes incrementally and you can capture the changes, you can configure the session to process those changes. This allows the Integration Service to update the target incrementally, rather than forcing it to process the entire source and recalculate the same data each time you run the session.
For example, you might have a session using a source that receives new data every day. You can capture those incremental changes because you have added a filter condition to the mapping that removes pre-existing data from the flow of data. You then enable incremental aggregation. When the session runs with incremental aggregation enabled for the first time on March 1, you use the entire source. This allows the Integration Service to read and store the necessary aggregate data. On March 2, when you run the session again, you filter out all the records except those time-stamped March 2. The Integration Service then processes the new data and updates the target accordingly.
Consider using incremental aggregation in the following circumstances:
You can capture new source data: Use incremental aggregation when you can capture new source data each time you run the session. Use a Stored Procedure or Filter transformation to process only the new data.
Incremental changes do not significantly change the target: Use incremental aggregation when the changes do not significantly change the target. If processing the incrementally changed source alters more than half the existing target, the session may not benefit from using incremental aggregation.
In this case, drop the table and recreate the target with the complete source data.
Note: Do not use incremental aggregation if the mapping contains percentile or median functions. The Integration Service uses system memory to process these functions in addition to the cache memory you configure in the session properties. As a result, the Integration Service does not store incremental aggregation values for percentile and median functions in disk caches.
Integration Service Processing for Incremental Aggregation
(i) The first time you run an incremental aggregation session, the Integration Service processes the entire source. At the end of the session, the Integration Service stores the aggregate data from that session run in two files, the index file and the data file. The Integration Service creates the files in the cache directory specified in the Aggregator transformation properties.
(ii) Each subsequent time you run the session with incremental aggregation, you use the incremental source changes in the session. For each input record, the Integration Service checks historical information in the index file for a corresponding group. If it finds a corresponding group, the Integration Service performs the aggregate operation incrementally, using the aggregate data for that group, and saves the incremental change. If it does not find a corresponding group, the Integration Service creates a new group and saves the record data.
(iii) When writing to the target, the Integration Service applies the changes to the existing target. It saves modified aggregate data in the index and data files to be used as historical data the next time you run the session.
(iv) If the source changes significantly and you want the Integration Service to continue saving aggregate data for future incremental changes, configure the Integration Service to overwrite existing aggregate data with new aggregate data.
Each subsequent time you run a session with incremental aggregation, the Integration Service creates a backup of the incremental aggregation files. The cache directory for the Aggregator transformation must contain enough disk space for two sets of the files.
(v) When you partition a session that uses incremental aggregation, the Integration Service creates one set of cache files for each partition.
The Integration Service creates new aggregate data, instead of using historical data, when you perform one of the following tasks:
Save a new version of the mapping.
Configure the session to reinitialize the aggregate cache.
Move the aggregate files without correcting the configured path or directory for the files in the session properties.
Change the configured path or directory for the aggregate files without moving the files to the new location.
Delete cache files.

Decrease the number of partitions.
When the Integration Service rebuilds incremental aggregation files, the data in the previous files is lost.
Note: To protect the incremental aggregation files from file corruption or disk failure, periodically back up the files.
Preparing for Incremental Aggregation
When you use incremental aggregation, you need to configure both mapping and session properties:
Implement mapping logic or a filter to remove pre-existing data.
Configure the session for incremental aggregation and verify that the file directory has enough disk space for the aggregate files.
Configuring the Mapping
Before enabling incremental aggregation, you must capture changes in source data. You can use a Filter or Stored Procedure transformation in the mapping to remove pre-existing source data during a session.
Configuring the Session
Use the following guidelines when you configure the session for incremental aggregation:
(i) Verify the location where you want to store the aggregate files. The index and data files grow in proportion to the source data. Be sure the cache directory has enough disk space to store historical data for the session. When you run multiple sessions with incremental aggregation, decide where you want the files stored. Then, enter the appropriate directory for the process variable, $PMCacheDir, in the Workflow Manager. You can enter session-specific directories for the index and data files. However, by using the process variable for all sessions using incremental aggregation, you can easily change the cache directory when necessary by changing $PMCacheDir. Changing the cache directory without moving the files causes the Integration Service to reinitialize the aggregate cache and gather new aggregate data. In a grid, Integration Services rebuild incremental aggregation files they cannot find. When an Integration Service rebuilds incremental aggregation files, it loses aggregate history.
(ii) Verify the incremental aggregation settings in the session properties. You can configure the session for incremental aggregation in the Performance settings on the Properties tab. You can also configure the session to reinitialize the aggregate cache. If you choose to reinitialize the cache, the Workflow Manager displays a warning indicating the Integration Service overwrites the existing cache and a reminder to clear this option after running the session.
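The group lookup described in step (ii) above can be pictured with a short sketch (illustrative Python only, not the actual cache format: a dict stands in for the index and data files, and SUM for the aggregate function):

```python
def incremental_aggregate(cache, new_rows):
    """Sketch of incremental aggregation: 'cache' maps a group key to its
    running SUM, standing in for the index and data files on disk."""
    for group, value in new_rows:
        if group in cache:
            # Corresponding group found in the "index file": aggregate incrementally.
            cache[group] += value
        else:
            # No corresponding group: create it and save the record data.
            cache[group] = value
    return cache

# First run (e.g. March 1): the entire source is processed.
cache = incremental_aggregate({}, [("D10", 100), ("D20", 50), ("D10", 25)])

# Next run (March 2): only the newly captured rows are processed.
cache = incremental_aggregate(cache, [("D10", 10), ("D30", 5)])
assert cache == {"D10": 135, "D20": 50, "D30": 5}
```

Only the groups present in the new rows are touched on the second run; every other group's stored aggregate is reused unchanged, which is exactly why the session avoids rereading the full source.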

When should we go for hash partitioning?
Scenarios for choosing hash partitioning:
There is not enough knowledge about how much data maps into a given range.
The sizes of range partitions differ substantially, or are difficult to balance manually.
Range partitioning would cause data to be clustered undesirably.
Features such as parallel DML, partition pruning and joins are important.
We can define the following partition types in the Workflow Manager:
1) Database Partitioning: The Integration Service queries the IBM DB2 or Oracle system for table partition information. It reads partitioned data from the corresponding nodes in the database. Use database partitioning with Oracle or IBM DB2 source instances on a multi-node table space. Use database partitioning with DB2 targets.
2) Hash Partitioning: Use hash partitioning when you want the Integration Service to distribute rows to the partitions by group. For example, you need to sort items by item ID, but you do not know how many items have a particular ID number.
3) Key Range: You specify one or more ports to form a compound partition key. The Integration Service passes data to each partition depending on the ranges you specify for each port. Use key range partitioning where the sources or targets in the pipeline are partitioned by key range.
4) Simple Pass-Through: The Integration Service passes all rows at one partition point to the next partition point without redistributing them. Choose pass-through partitioning where you want to create an additional pipeline stage to improve performance, but do not want to change the distribution of data across partitions.
5) Round-Robin: The Integration Service distributes data evenly among all partitions. Use round-robin partitioning where you want each partition to process approximately the same number of rows.
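The three content-based strategies above differ in how they decide which partition a row lands in; pass-through does no redistribution at all. A minimal sketch in Python (illustrative only; the Integration Service's actual hash function and range handling are internal):

```python
def hash_partition(row_key, n_partitions):
    """Hash partitioning: rows with the same key value always land in the
    same partition, keeping groups together without knowing group sizes."""
    return hash(row_key) % n_partitions

def key_range_partition(value, ranges):
    """Key range partitioning: 'ranges' is a list of (low, high) bounds,
    one per partition; the row goes to the partition whose range holds it."""
    for i, (low, high) in enumerate(ranges):
        if low <= value < high:
            return i
    raise ValueError("value falls outside every configured range")

def round_robin_partition(row_number, n_partitions):
    """Round-robin partitioning: rows are dealt out in turn, so each
    partition processes roughly the same number of rows."""
    return row_number % n_partitions
```

Note that hash partitioning balances well only when there are many distinct key values, while key range requires knowing the data's distribution up front; round-robin needs neither, but splits groups across partitions.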

Partition Types Overview
Creating Partition Tables
To create a range-partitioned table, give the following statement:
create table sales (year number(4),
product varchar2(10),
amt number(10))
partition by range (year)
(partition p1 values less than (1992),
partition p2 values less than (1993),
partition p5 values less than (MAXVALUE));
The following example creates a table with list partitioning:
create table customers (custcode number(5),
name varchar2(20),
addr varchar2(50),
city varchar2(20),
bal number(10,2))
partition by list (city)
(partition north_india values ('DELHI','CHANDIGARH'),
partition east_india values ('KOLKATA','PATNA'),
partition south_india values ('HYDERABAD','BANGALORE','CHENNAI'),
partition west_india values ('BOMBAY','GOA'));
Adding, dropping and merging partitions:
alter table sales add partition p6 values less than (1996);
alter table customers add partition central_india values ('BHOPAL','NAGPUR');
alter table sales drop partition p5;
alter table sales merge partitions p2, p3 into partition p23;

The following statement adds a new set of cities ('KOCHI', 'MANGALORE') to an existing partition list:
ALTER TABLE customers MODIFY PARTITION south_india ADD VALUES ('KOCHI', 'MANGALORE');
The statement below drops a set of cities ('KOCHI' and 'MANGALORE') from an existing partition value list:
ALTER TABLE customers MODIFY PARTITION south_india DROP VALUES ('KOCHI','MANGALORE');
SPLITTING PARTITIONS
You can split a single partition into two partitions. For example, to split the partition p5 of the sales table into two partitions, give the following command:
ALTER TABLE sales SPLIT PARTITION p5 INTO
(PARTITION p6 VALUES LESS THAN (1996),
PARTITION p7 VALUES LESS THAN (MAXVALUE));
TRUNCATING PARTITIONS
Truncating a partition will delete all rows from the partition. To truncate a partition, give the following statement:
ALTER TABLE sales TRUNCATE PARTITION p5;
LISTING INFORMATION ABOUT PARTITION TABLES
To see how many partitioned tables there are in your schema, give the following statement:
SELECT * FROM user_part_tables;
To see partition-level partitioning information:
SELECT * FROM user_tab_partitions;
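Since SELECT * on these dictionary views returns many columns, a narrower query is usually more readable. For example, to list just the partitions of the sales table created above (column names as in the Oracle data dictionary):

```sql
SELECT partition_name, high_value, num_rows
FROM   user_tab_partitions
WHERE  table_name = 'SALES'
ORDER  BY partition_position;
```

Note that table_name is stored in upper case, and num_rows is populated only after the table's statistics have been gathered.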

TASKS
The Workflow Manager contains many types of tasks to help you build workflows and worklets. We can create reusable tasks in the Task Developer.
Types of tasks:

Task Type    Tool where task can be created                       Reusable or not
Session      Task Developer, Workflow Designer, Worklet Designer  Yes
Email        Task Developer, Workflow Designer, Worklet Designer  Yes
Command      Task Developer, Workflow Designer, Worklet Designer  Yes
Event-Raise  Workflow Designer, Worklet Designer                  No
Event-Wait   Workflow Designer, Worklet Designer                  No
Timer        Workflow Designer, Worklet Designer                  No
Decision     Workflow Designer, Worklet Designer                  No
Assignment   Workflow Designer, Worklet Designer                  No
Control      Workflow Designer, Worklet Designer                  No

SESSION TASK
A session is a set of instructions that tells the Power Center Server how and when to move data from sources to targets. To run a session, we must first create a workflow to contain the Session task. We can run as many sessions in a workflow as we need, sequentially or concurrently, depending on our needs. The Power Center Server creates several files and in-memory caches depending on the transformations and options used in the session.
EMAIL TASK
The Workflow Manager provides an Email task that allows us to send email during a workflow. It is usually created by the Administrator, and we just drag and use it in our workflow.
Steps:
1. In the Task Developer or Workflow Designer, choose Tasks-Create.
2. Select an Email task and enter a name for the task. Click Create.
3. Click Done.
4. Double-click the Email task in the workspace. The Edit Tasks dialog box appears.
5. Click the Properties tab.
6. Enter the fully qualified email address of the mail recipient in the Email User Name field.
7. Enter the subject of the email in the Email Subject field. Or, you can leave this field blank.
8. Click the Open button in the Email Text field to open the Email Editor.
9. Click OK twice to save your changes.
Example: To send an email when a session completes:
Steps:
1. Create a workflow wf_sample_email.
2. Drag any session task to the workspace.
3. Edit the Session task and go to the Components tab.
4. See the On Success Email option there and configure it.
5. In Type, select Reusable or Non-reusable.
6. In Value, select the email task to be used.
7. Click Apply -> OK.
8. Validate the workflow and Repository -> Save.
9. We can also drag the email task and use it as per need.
10. We can set the option to send email on success or failure in the Components tab of a session task.
COMMAND TASK

The Command task allows us to specify one or more shell commands (UNIX) or DOS commands (Windows) to run during the workflow. For example, we can specify shell commands in the Command task to delete reject files, copy a file, or archive target files.
Ways of using the Command task:
1. Standalone Command task: We can use a Command task anywhere in the workflow or worklet to run shell commands.
2. Pre- and post-session shell command: We can call a Command task as the pre- or post-session shell command for a Session task. This is done in the COMPONENTS tab of a session. We can run it as a Pre-Session Command, a Post-Session Success Command or a Post-Session Failure Command. Select the Value and Type options as we did in the Email task.
Example: To copy a file sample.txt from the D drive to E (Windows):
Command: COPY D:\sample.txt E:\
Steps for creating a Command task:
1. In the Task Developer or Workflow Designer, choose Tasks-Create.
2. Select Command Task for the task type.
3. Enter a name for the Command task. Click Create. Then click Done.
4. Double-click the Command task. Go to the Commands tab.
5. In the Commands tab, click the Add button to add a command.
6. In the Name field, enter a name for the new command.
7. In the Command field, click the Edit button to open the Command Editor.
8. Enter only one command in the Command Editor.
9. Click OK to close the Command Editor.
10. Repeat steps 5-9 to add more commands in the task.
11. Click OK.
Steps to create the workflow using the Command task:
1. Create a task using the above steps to copy a file in the Task Developer.
2. Open the Workflow Designer. Workflow -> Create -> Give name and click OK.
3. Start is displayed. Drag a session, say s_m_Filter_example, and the Command task.
4. Link Start to the Session task and the Session to the Command task.
5. Double-click the link between the Session and the Command task and give the condition in the editor as: $S_M_FILTER_EXAMPLE.Status=SUCCEEDED
6. Workflow -> Validate
7. Repository -> Save
WORKING WITH EVENT TASKS
We can define events in the workflow to specify the sequence of task execution.
Types of events:
Pre-defined event: A pre-defined event is a file-watch event. This event waits for a specified file to arrive at a given location.
User-defined event: A user-defined event is a sequence of tasks in the workflow. We create events and then raise them as per need.
Steps for creating a user-defined event:
1. Open any workflow where we want to create an event.
2. Click Workflow -> Edit -> Events tab.
3. Click the Add button to add events and give the names as per need.
4. Click Apply -> OK. Validate the workflow and save it.
Types of event tasks:
EVENT RAISE: The Event-Raise task represents a user-defined event. We use this task to raise a user-defined event.
EVENT WAIT: The Event-Wait task waits for a file-watcher event or a user-defined event to occur before executing the next session in the workflow.
Example 1: Use an Event-Wait task and make sure that session s_filter_example runs when the abc.txt file is present in the D:\FILES folder.
Steps for creating the workflow:

1. Workflow -> Create -> Give name wf_event_wait_file_watch -> Click OK.
2. Task -> Create -> Select Event Wait. Give name. Click Create and Done.
3. Link Start to the Event-Wait task.
4. Drag s_filter_example to the workspace and link it to the Event-Wait task.
5. Right-click the Event-Wait task and click EDIT -> EVENTS tab.
6. Select the Pre Defined option there. In the blank space, give the directory and filename to watch. Example: D:\FILES\abc.txt
7. Workflow -> Validate and Repository -> Save.
Example 2: Raise a user-defined event when session s_m_filter_example succeeds. Capture this event in an Event-Wait task and run session S_M_TOTAL_SAL_EXAMPLE.
Steps for creating the workflow:
1. Workflow -> Create -> Give name wf_event_wait_event_raise -> Click OK.
2. Workflow -> Edit -> Events tab and add event EVENT1 there.
3. Drag s_m_filter_example and link it to the START task.
4. Click Tasks -> Create -> Select EVENT RAISE from list. Give name ER_Example. Click Create and then Done. Link ER_Example to s_m_filter_example.
5. Right-click ER_Example -> EDIT -> Properties tab -> Open Value for User Defined Event and select EVENT1 from the list displayed. Apply -> OK.
6. Click the link between ER_Example and s_m_filter_example and give the condition $S_M_FILTER_EXAMPLE.Status=SUCCEEDED
7. Click Tasks -> Create -> Select EVENT WAIT from list. Give name EW_WAIT. Click Create and then Done.
8. Link EW_WAIT to the START task.
9. Right-click EW_WAIT -> EDIT -> EVENTS tab.
10. Select User Defined there. Select EVENT1 by clicking the Browse Events button.
11. Apply -> OK.
12. Drag S_M_TOTAL_SAL_EXAMPLE and link it to EW_WAIT.
13. Workflow -> Validate
14. Repository -> Save. Run the workflow and see.
TIMER TASK
The Timer task allows us to specify the period of time to wait before the Power Center Server runs the next task in the workflow. The Timer task has two types of settings:
Absolute time: We specify the exact date and time, or we can choose a user-defined workflow variable to specify the exact time.
The next task in the workflow will run as per the date and time specified.
Relative time: We instruct the Power Center Server to wait for a specified period of time after the Timer task, the parent workflow, or the top-level workflow starts.
Example: Run session s_m_filter_example 1 min after the Timer task starts.
Steps for creating the workflow:
1. Workflow -> Create -> Give name wf_timer_task_example -> Click OK.
2. Click Tasks -> Create -> Select TIMER from list. Give name TIMER_Example. Click Create and then Done.
3. Link TIMER_Example to the START task.
4. Right-click TIMER_Example -> EDIT -> TIMER tab.
5. Select the Relative Time option, give 1 min, and select the From start time of this task option.
6. Apply -> OK.
7. Drag s_m_filter_example and link it to TIMER_Example.
8. Workflow -> Validate and Repository -> Save.
DECISION TASK
The Decision task allows us to enter a condition that determines the execution of the workflow, similar to a link condition. The Decision task has a pre-defined variable called $Decision_task_name.condition that represents the result of the decision condition. The Power Center Server evaluates the condition in the Decision task and sets the pre-defined condition variable to True (1) or False (0). We can specify one decision condition per Decision task.

Example: The Command task should run only if either s_m_filter_example or S_M_TOTAL_SAL_EXAMPLE succeeds. If either s_m_filter_example or S_M_TOTAL_SAL_EXAMPLE fails, then S_m_sample_mapping_EMP should run.
Steps for creating the workflow:
1. Workflow -> Create -> Give name wf_decision_task_example -> Click OK.
2. Drag s_m_filter_example and S_M_TOTAL_SAL_EXAMPLE to the workspace and link both of them to the START task.
3. Click Tasks -> Create -> Select DECISION from list. Give name DECISION_Example. Click Create and then Done. Link DECISION_Example to both s_m_filter_example and S_M_TOTAL_SAL_EXAMPLE.
4. Right-click DECISION_Example -> EDIT -> GENERAL tab.
5. Set Treat Input Links As to OR. Default is AND. Apply and click OK.
6. Now edit the Decision task again and go to the PROPERTIES tab. Open the Expression Editor by clicking the VALUE section of the Decision Name attribute and enter the following condition: $S_M_FILTER_EXAMPLE.Status = SUCCEEDED OR $S_M_TOTAL_SAL_EXAMPLE.Status = SUCCEEDED
7. Validate the condition -> Click Apply -> OK.
8. Drag the Command task and the S_m_sample_mapping_EMP task to the workspace and link them to the DECISION_Example task.
9. Double-click the link between S_m_sample_mapping_EMP and DECISION_Example and give the condition: $DECISION_Example.Condition = 0. Validate and click OK.
10. Double-click the link between the Command task and DECISION_Example and give the condition: $DECISION_Example.Condition = 1. Validate and click OK.
11. Workflow -> Validate and Repository -> Save. Run the workflow and see the result.
CONTROL TASK
We can use the Control task to stop, abort, or fail the top-level workflow or the parent workflow based on an input link condition. A parent workflow or worklet is the workflow or worklet that contains the Control task. We give the condition to the link connected to the Control task.

Control Option      Description
Fail Me             Fails the control task.
Fail Parent         Marks the status of the WF or worklet that contains the Control task as failed.
Stop Parent         Stops the WF or worklet that contains the Control task.
Abort Parent        Aborts the WF or worklet that contains the Control task.
Fail Top-Level WF   Fails the workflow that is running.
Stop Top-Level WF   Stops the workflow that is running.
Abort Top-Level WF  Aborts the workflow that is running.

Example: Drag any 3 sessions, and if any one fails, then abort the top-level workflow.
Steps for creating the workflow:
1. Workflow -> Create -> Give name wf_control_task_example -> Click OK.
2. Drag any 3 sessions to the workspace and link all of them to the START task.
3. Click Tasks -> Create -> Select CONTROL from list. Give name cntr_task.
4. Click Create and then Done.
5. Link all sessions to the control task cntr_task.
6. Double-click the link between cntr_task and any session, say s_m_filter_example, and give the condition: $S_M_FILTER_EXAMPLE.Status = FAILED.
7. Repeat the above step for the remaining 2 sessions also.
8. Right-click cntr_task -> EDIT -> GENERAL tab. Set Treat Input Links As to OR. Default is AND.
9. Go to the PROPERTIES tab of cntr_task and select the value Abort Top-Level Workflow for the Control Option. Click Apply and OK.
10. Workflow -> Validate and Repository -> Save. Run the workflow and see the result.

ASSIGNMENT TASK
The Assignment task allows us to assign a value to a user-defined workflow variable. (See the Workflow Variable topic on how to add user-defined variables.) To use an Assignment task in the workflow, first create and add the Assignment task to the workflow. Then configure the Assignment task to assign values or expressions to user-defined variables. We cannot assign values to pre-defined workflow variables.
Steps to create an Assignment task:
1. Open any workflow where we want to use the Assignment task.
2. Edit the workflow and add user-defined variables.
3. Choose Tasks-Create. Select Assignment Task for the task type.
4. Enter a name for the Assignment task. Click Create. Then click Done.
5. Double-click the Assignment task to open the Edit Task dialog box.
6. On the Expressions tab, click Add to add an assignment.
7. Click the Open button in the User Defined Variables field.
8. Select the variable for which you want to assign a value. Click OK.
9. Click the Edit button in the Expression field to open the Expression Editor.
10. Enter the value or expression you want to assign.
11. Repeat steps 7-10 to add more variable assignments as necessary.
12. Click OK.

Scheduler
We can schedule a workflow to run continuously, repeat at a given time or interval, or we can manually start a workflow. The Integration Service runs a scheduled workflow as configured. By default, the workflow runs on demand. We can change the schedule settings by editing the scheduler. If we change schedule settings, the Integration Service reschedules the workflow according to the new settings.
A scheduler is a repository object that contains a set of schedule settings. A scheduler can be non-reusable or reusable. The Workflow Manager marks a workflow invalid if we delete the scheduler associated with the workflow.
If we choose a different Integration Service for the workflow or restart the Integration Service, it reschedules all workflows. If we delete a folder, the Integration Service removes the workflows from the schedule.
The Integration Service does not run the workflow if:
The prior workflow run fails.
We remove the workflow from the schedule.
The Integration Service is running in safe mode.

Creating a Reusable Scheduler
For each folder, the Workflow Manager lets us create reusable schedulers so we can reuse the same set of scheduling settings for workflows in the folder. Use a reusable scheduler so we do not need to configure the same set of scheduling settings in each workflow.
When we delete a reusable scheduler, all workflows that use the deleted scheduler become invalid. To make the workflows valid, we must edit them and replace the missing scheduler.

Steps:
1. Open the folder where we want to create the scheduler.
2. In the Workflow Designer, click Workflows > Schedulers.
3. Click Add to add a new scheduler.
4. In the General tab, enter a name for the scheduler.
5. Configure the scheduler settings in the Scheduler tab.
6. Click Apply and OK.

Configuring Scheduler Settings
Configure the Schedule tab of the scheduler to set run options, schedule options, start options, and end options for the schedule. There are 3 run options:
1. Run on Demand
2. Run Continuously
3. Run on Server initialization

1. Run on Demand: The Integration Service runs the workflow when we start the workflow manually.
2. Run Continuously: The Integration Service runs the workflow as soon as the service initializes. The Integration Service then starts the next run of the workflow as soon as it finishes the previous run.
3. Run on Server initialization: The Integration Service runs the workflow as soon as the service is initialized. The Integration Service then starts the next run of the workflow according to the settings in Schedule Options.
Schedule options for Run on Server initialization:
Run Once: Run the workflow just once.
Run every: Run the workflow at regular intervals, as configured.
Customized Repeat: The Integration Service runs the workflow on the dates and times specified in the Repeat dialog box.
Start options for Run on Server initialization:

End options for Run on Server initialization:
End on: the Integration Service stops scheduling the workflow on the selected date.
End After: the Integration Service stops scheduling the workflow after the set number of workflow runs.
Forever: the Integration Service schedules the workflow as long as the workflow does not fail.

Creating a Non-Reusable Scheduler
1. In the Workflow Designer, open the workflow.
2. Click Workflows > Edit.
3. In the Scheduler tab, choose Non-reusable. Select Reusable if we want to select an existing reusable scheduler for the workflow.
   Note: If we do not have a reusable scheduler in the folder, we must create one before we choose Reusable.
4. Click the right side of the Scheduler field to edit scheduling settings for the non-reusable scheduler.
5. If we select Reusable, choose a reusable scheduler from the Scheduler Browser dialog box.
6. Click OK.

Points to Ponder:
To remove a workflow from its schedule, right-click the workflow in the Navigator window and choose Unschedule Workflow.
To reschedule a workflow on its original schedule, right-click the workflow in the Navigator window and choose Schedule Workflow.

Active and Idle Databases

During pushdown optimization, the Integration Service pushes the transformation logic to one database, which is called the active database. A database that does not process transformation logic is called an idle database. For example, a mapping contains two sources that are joined by a Joiner transformation. If the session is configured for source-side pushdown optimization, the Integration Service pushes the Joiner transformation logic to the source in the detail pipeline, which is the active database. The source in the master pipeline is the idle database because it does not process transformation logic.

The Integration Service uses the following criteria to determine which database is active or idle:
1. When using full pushdown optimization, the target database is active and the source database is idle.
2. In sessions that contain a Lookup transformation, the source or target database is active, and the lookup database is idle.
3. In sessions that contain a Joiner transformation, the source in the detail pipeline is active, and the source in the master pipeline is idle.
4. In sessions that contain a Union transformation, the source in the first input group is active. The sources in other input groups are idle.

To push transformation logic to an active database, the database user account of the active database must be able to read from the idle databases.

Working with Databases

You can configure pushdown optimization for the following databases:
IBM DB2
Microsoft SQL Server
Netezza
Oracle
Sybase ASE
Teradata
Databases that use ODBC drivers

When you push transformation logic to a database, the database may produce different output than the Integration Service. In addition, the Integration Service can usually push more transformation logic to a database if you use a native driver instead of an ODBC driver.

Comparing the Output of the Integration Service and Databases

The Integration Service and databases can produce different results when processing the same transformation logic. The Integration Service sometimes converts data to a different format when it reads data. The Integration Service and database may also handle null values, case sensitivity, and sort order differently. The database and Integration Service produce different output when the following settings and conversions differ:

Nulls treated as the highest or lowest value. The Integration Service and a database can treat null values differently. For example, you want to push a Sorter transformation to an Oracle database. In the session, you configure nulls as the lowest value in the sort order, but Oracle treats null values as the highest value in the sort order.

Sort order. The Integration Service and a database can use different sort orders. For example, you want to push the transformations in a session to a Microsoft SQL Server database, which is configured to use a sort order that is not case sensitive. You configure the session properties to use the binary sort order, which is case sensitive. The results differ based on whether the Integration Service or the Microsoft SQL Server database processes the transformation logic.

Case sensitivity. The Integration Service and a database can treat case sensitivity differently. For example, the Integration Service uses case-sensitive queries and the database does not.
A Filter transformation uses the following filter condition: IIF(col_varchar2 = 'CA', TRUE, FALSE). You need the database to return rows that match 'CA'. However, if you push this transformation logic to a Microsoft SQL Server database that is not case sensitive, it returns rows that match the values Ca, ca, cA, and CA.

Numeric values converted to character values. The Integration Service and a database can convert the same numeric value to a character value in different formats, and the database can convert numeric values to an unacceptable character format. For example, a table contains the number 1234567890. When the Integration Service converts the number to a character value, it inserts the characters 1234567890; a database might instead convert the number to 1.2E9. The two sets of characters represent the same value, but if you require the characters in the format 1234567890, you can disable pushdown optimization.

Precision. The Integration Service and a database can have different precision for particular datatypes. Transformation datatypes use a default numeric precision that can vary from the native datatypes. For example, a transformation Decimal datatype has a precision of 1-28, while the corresponding Teradata Decimal datatype has a precision of 1-18. The results can vary if the database uses a different precision than the Integration Service.

Using ODBC Drivers

When you use native drivers for all databases except Netezza, the Integration Service generates SQL statements using native database SQL. When you use ODBC drivers, the Integration Service usually cannot detect the database type, so it generates SQL statements using ANSI SQL. The Integration Service can generate more functions when it generates SQL statements in the native language than in ANSI SQL.
Note: Although the Integration Service uses an ODBC driver for the Netezza database, the Integration Service detects that the database is Netezza and generates native database SQL when pushing the transformation logic to the Netezza database. In some cases, ANSI SQL is not compatible with the database syntax. The following sections describe problems that you can encounter when you use ODBC drivers. When possible, use native drivers to prevent these problems.

Working with Dates

The Integration Service and the database can process dates differently. When you configure the session to push date conversion to the database, you can receive unexpected results or the session can fail. The database can produce different output than the Integration Service when the following date settings and conversions differ:

Date values converted to character values. The Integration Service converts the transformation Date/Time datatype to the native datatype that supports subsecond precision in the database. The session fails if you configure the datetime format in the session to a format that the database does not support. For example, when the Integration Service performs the ROUND function on a date, it stores the date value in a character column using the format MM/DD/YYYY HH:MI:SS.US. When the database performs this function, it stores the date in its default date format; if the database is Oracle, it stores the date in the default format DD-MON-YY. If you require the date to be in the format MM/DD/YYYY HH:MI:SS.US, you can disable pushdown optimization.

Date formats for TO_CHAR and TO_DATE functions. The Integration Service uses the date format in the TO_CHAR or TO_DATE function when it pushes the function to the database, and the database converts each date string to a datetime value supported by the database. For example, the Integration Service pushes the following expression to the database:
TO_DATE( DATE_PROMISED, 'MM/DD/YY' )
The database interprets the date string in the DATE_PROMISED port based on the specified date format string MM/DD/YY, converting each date string, such as 01/22/98, to the supported date value, such as Jan 22 1998 00:00:00. If the Integration Service pushes a date format to an IBM DB2, Microsoft SQL Server, or Sybase database that the database does not support, the Integration Service stops pushdown optimization and processes the transformation itself.
The Integration Service converts all dates before pushing transformations to an Oracle or Teradata database. If the database does not support the date format after the date conversion, the session fails.

HH24 date format. You cannot use the HH24 format in the date format string for Teradata. When the Integration Service generates SQL for a Teradata database, it uses the HH format string instead.

Blank spaces in date format strings. You cannot use blank spaces in the date format string in Teradata. When the Integration Service generates SQL for a Teradata database, it substitutes the space with B.

Handling subsecond precision for a Lookup transformation. If you enable subsecond precision for a Lookup transformation, the database and Integration Service perform the lookup comparison using the subsecond precision, but return different results. Unlike the Integration Service, the database does not truncate the lookup results based on subsecond precision. For example, you configure the Lookup transformation to show subsecond precision to the millisecond. If the lookup result is 8:20:35.123456, a database returns 8:20:35.123456, but the Integration Service returns 8:20:35.123.

SYSDATE built-in variable. When you use the SYSDATE built-in variable, the Integration Service returns the current date and time for the node running the service process. However, when you push the transformation logic to the database, the SYSDATE variable returns the current date and time for the machine hosting the database. If the time zone of the machine hosting the database is not the same as the time zone of the machine running the Integration Service process, the results can vary.

1. How do I eliminate duplicate rows?
Ans:
delete from table_name
where rowid not in (select max(rowid) from table_name group by duplicate_values_field_name);
or
delete from table_name ta
where rowid > (select min(rowid) from table_name tb
               where ta.duplicate_values_field_name = tb.duplicate_values_field_name);

2. How do I display row numbers with records?

Ans: select rownum, emp.* from emp;

3. Display the records between two ranges?
Ans:
select rownum, empno, ename from emp
where rowid in (select rowid from emp where rownum <= &upto
                minus
                select rowid from emp where rownum < &Start);
Enter value for upto: 10
Enter value for Start: 7

4. The NVL function only allows the same data type (i.e. number or char or date, as in Nvl(comm, 0)). If commission is null, how do I write the query so that the text 'NA' (Not Applicable) is displayed instead of a blank space?
Ans: select nvl(to_char(comm),'NA') from emp;

5. Find out the nth highest salary from the emp table?
Ans:
SELECT DISTINCT (a.sal) FROM EMP A
WHERE &N = (SELECT COUNT (DISTINCT (b.sal)) FROM EMP B WHERE a.sal <= b.sal);
or
SELECT * FROM (SELECT DISTINCT(SAL), DENSE_RANK() OVER (ORDER BY SAL DESC) AS RNK FROM EMP) WHERE RNK = &N;
or
select min(sal) from (select distinct sal from emp order by sal desc) where rownum <= &n;

6. Find out the nth highest salary DEPT wise from the emp table?
Ans: SELECT * FROM (SELECT DEPTNO, SAL, DENSE_RANK() OVER (PARTITION BY DEPTNO ORDER BY SAL DESC) AS RNK FROM EMP) WHERE RNK = &N;

7. Display odd/even numbered records?
Ans:
Odd-numbered records: select * from emp where (rowid,1) in (select rowid, mod(rownum,2) from emp);
Even-numbered records: select * from emp where (rowid,0) in (select rowid, mod(rownum,2) from emp);

8. What are the more common pseudo-columns?
Ans: SYSDATE, USER, UID, CURRVAL, NEXTVAL, ROWID, ROWNUM

9. How to display the last 5 records in a table?
Ans: select * from (select rownum r, emp.* from emp) where r between (select count(*)-5 from emp) and (select count(*) from emp);
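The nth-highest-salary answers above rely on Oracle features (ROWNUM, SQL*Plus substitution variables). As a portable sanity check of the same logic, here is a minimal sketch in Python with SQLite, using DISTINCT with LIMIT/OFFSET as the equivalent of the DENSE_RANK approach; the emp table and its rows are made up for illustration:

```python
import sqlite3

# Hypothetical emp table; columns loosely mirror Oracle's EMP sample schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, ename TEXT, sal REAL)")
conn.executemany("INSERT INTO emp VALUES (?,?,?)",
                 [(1, "A", 5000), (2, "B", 3000), (3, "C", 5000),
                  (4, "D", 4000), (5, "E", 2000)])

def nth_highest(n):
    # Distinct salaries sorted descending; skip n-1 of them and take one.
    # Duplicates collapse, just as DISTINCT(sal) does in the Oracle answers.
    row = conn.execute(
        "SELECT DISTINCT sal FROM emp ORDER BY sal DESC LIMIT 1 OFFSET ?",
        (n - 1,)).fetchone()
    return row[0] if row else None

print(nth_highest(1))  # 5000.0 (the two 5000 rows count once)
print(nth_highest(2))  # 4000.0
```

Note that `nth_highest(10)` returns None here, whereas the Oracle ROWNUM variant would simply return no rows.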

10. How to display the last record in a table?
Ans: select * from (select rownum r, emp.* from emp) where r in (select count(*) from emp);

11. How to display a particular nth record in a table?
Ans: select * from (select rownum r, emp.* from emp) where r in (2) or r = 2;

12. How to display even or odd records in a table?
Ans: select * from (select emp.*, rownum r from emp) where mod(r,2) = 0;

13. What is the difference between a HAVING clause and a WHERE clause?
Ans: HAVING specifies a search condition for a group or an aggregate. HAVING can be used only with the SELECT statement and is typically used with a GROUP BY clause; when GROUP BY is not used, HAVING behaves like a WHERE clause. The HAVING clause is basically used only with the GROUP BY function in a query, whereas the WHERE clause is applied to each row before the rows become part of the GROUP BY function in a query.

14. What is a sub-query? Explain the properties of a sub-query?
Ans: Sub-queries are often referred to as sub-selects, as they allow a SELECT statement to be executed arbitrarily within the body of another SQL statement. A sub-query is executed by enclosing it in a set of parentheses. Sub-queries are generally used to return a single row as an atomic value, though they may be used to compare values against multiple rows with the IN keyword.
A subquery is a SELECT statement that is nested within another T-SQL statement. A subquery SELECT statement, if executed independently of the statement in which it is nested, will return a result set; that is, a subquery SELECT statement can stand alone and is not dependent on the statement in which it is nested. A subquery SELECT statement can return any number of values, and can appear in the column list of a SELECT statement, or in the FROM, GROUP BY, HAVING, and/or ORDER BY clauses of a T-SQL statement. A subquery can also be used as a parameter to a function call. Basically, a subquery can be used anywhere an expression can be used.
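The WHERE-versus-HAVING distinction in question 13 can be shown concretely. This is a small runnable sketch using Python's sqlite3; the orders table and its values are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (cust TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?,?)",
                 [("A", 100), ("A", 200), ("B", 50), ("B", 60), ("C", 500)])

# WHERE filters individual rows BEFORE grouping;
# HAVING filters whole groups AFTER aggregation.
rows = conn.execute("""
    SELECT cust, SUM(amount) AS total
    FROM orders
    WHERE amount > 40            -- row-level filter (all rows pass here)
    GROUP BY cust
    HAVING SUM(amount) > 150     -- group-level filter (drops B, total 110)
    ORDER BY cust
""").fetchall()
print(rows)  # [('A', 300.0), ('C', 500.0)]

# The same HAVING pattern finds duplicated values, as in question 33 below:
dups = conn.execute("""
    SELECT cust, COUNT(*) FROM orders
    GROUP BY cust HAVING COUNT(*) > 1
    ORDER BY cust
""").fetchall()
print(dups)  # [('A', 2), ('B', 2)]
```

Swapping the HAVING condition into the WHERE clause would be an error, because the aggregate SUM(amount) is not defined until the rows have been grouped.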
15. Properties of a sub-query?
Ans:
A subquery must be enclosed in parentheses.
A subquery must be placed on the right-hand side of the comparison operator.
A subquery cannot contain an ORDER BY clause.
A query can contain more than one subquery.

16. What are the types of sub-queries?
Ans: Single-row subquery, where the subquery returns only one row; multiple-row subquery, where the subquery returns multiple rows; and multiple-column subquery, where the subquery returns multiple columns.

17. What is the output of the query select * from emp where rownum <= 3?
Ans: It displays the first 3 records.

18. What is the output of the query select * from emp where rownum = 1?

Ans: It displays the first record in the table.

19. What is the output of the query select * from emp where rownum = 2?
Ans: It will not display any record.

20. What is the output of the query select * from emp where rownum > 1?
Ans: This also will not display any records. When Oracle fetches the first candidate row, its rownum is 1, so the condition fails and the row is rejected; when it fetches the second candidate row, rownum is again 1 (because no row has been returned yet), so the condition fails again, and so on for every row.

21. How to display the top N salaries in emp?
Ans: select * from (select distinct sal from emp order by sal desc) where rownum <= &n;

22. How to display the last record in the emp table?
Ans: select * from (select rownum as rn, emp.* from emp) where rn in (select count(*) from emp);

23. How to display the first and last records in the emp table?
Ans: select * from (select rownum as rn, emp.* from emp) where rn in (1, (select count(*) from emp));

24. How to display records 1, 5 and 8 in the emp table?
Ans: select * from (select rownum as rn, emp.* from emp) where rn in (1,5,8);

25. In Oracle, can we add a NOT NULL column to a table with data? If "No", then how can we do it?
Ans: No, we cannot add a NOT NULL column to a table with data; Oracle throws error ORA-01758. See the example below.
Eg: alter table EMP add comm2 number not null
Error: ORA-01758: table must be empty to add mandatory (NOT NULL) column.
Workaround: provide a default value for the column being added, along with the NOT NULL constraint. The column will then be added with the default value for all existing rows.
Eg: alter table EMP add comm2 number default 100 not null -- comm2 will have 100 for all rows

26. While doing an ascending-order sort on a column having NULL values, where do the NULLs show up in the result set: at the beginning or at the end?
Ascending-order sort: NULLs come last, because Oracle treats NULLs as the largest possible values.
Descending-order sort: NULLs come first.
* How to make NULLs come last in a descending-order sort?

Add NULLS LAST to the ORDER BY ... DESC clause.
Eg: select col1 from table1 order by col1 desc NULLS LAST

27. How to see the execution time of an SQL statement?
First run this at the SQL prompt: set timing on
After the execution of each query we then get the time taken for it. If you don't want this, run: set timing off

28. What is the datatype of NULL in Oracle?
Ans: The datatype of NULL is char(0) and its size is 0.

29. Oracle functions: REPLACE versus TRIM
SQL> select replace('jose. antony@ yahoo.com', ' ', null) as Replace1 from dual;
REPLACE1
--------------------
jose.antony@yahoo.com
--Removes all spaces from in-between
SQL> select trim('jose. antony@ yahoo.com') as Trim1 from dual;

TRIM1
----------------------
jose. antony@ yahoo.com
--Removes spaces from both sides only

30. Explain ROWID in Oracle?
ROWID is a unique hexadecimal value which Oracle assigns to identify each record being inserted. It is used for all full table scans.
Structure: OOOOOOFFFBBBBBBRRR
OOOOOO - the first six characters are the object number, which identifies the data segment
FFF - the next 3 characters are the database file number
BBBBBB - the next 6 characters are the data block number
RRR - the last 3 characters identify the row within the block

31. What is the difference between a correlated sub-query and a nested sub-query?
A correlated subquery runs once for each row selected by the outer query. It contains a reference to a value from the row selected by the outer query. A nested subquery runs only once for the entire nesting (outer) query; it does not contain any reference to the outer query row.
For example, correlated subquery:
select e1.empname, e1.basicsal, e1.deptno from emp e1
where e1.basicsal = (select max(basicsal) from emp e2 where e2.deptno = e1.deptno)

Nested subquery:
select empname, basicsal, deptno from emp
where (deptno, basicsal) in (select deptno, max(basicsal) from emp group by deptno)

32. What is the difference between the TRUNCATE and DELETE commands?
Ans: Both result in deleting all the rows in the table. TRUNCATE is a DDL command, cannot be rolled back, and releases all the memory space for the table back to the server, so it is much faster; DELETE is a DML command and can be rolled back.

33. How to find a column's duplicate values?
Ans: select column_name, count(*) from table_name group by column_name having count(*) > 1;
If the query returns any rows, we can say that the column has duplicate values.

34. How to find the 2nd max salary from emp?
Ans: select max(sal) from emp where sal not in (select max(sal) from emp);

35. How to find the max salary department wise in the emp table?
Ans: select deptno, max(sal) from emp group by deptno;

36. How to find the 2nd max salary department wise in the emp table?
Ans: select deptno, max(sal) from emp where (deptno, sal) not in (select deptno, max(sal) from emp group by deptno) group by deptno;

37. Table1 has 10 records and table2 has 10 records, and the two tables have 5 matching records. How many records will be displayed by 1. an equi join, 2. a left outer join, 3. a right outer join, 4. a full outer join?
Ans:
1. An equi join displays the matching records, so 5 records are displayed.
2. A left outer join displays the 5 matching records plus the 5 non-matching records of the left table, so 10 in total.
3. A right outer join displays the 5 matching records plus the 5 non-matching records of the right table, so 10 in total.
4. A full outer join displays the 5 matching records plus the 5 non-matching records of the left table and the 5 non-matching records of the right table, so 15 in total.

38. In the EMP table, for those employees whose hiredate is the same, update their sal by sal+500, and for the others keep the sal as it is. How to do it with a SQL query?
Ans: UPDATE emp SET sal = sal + 500 WHERE hiredate IN (SELECT hiredate FROM emp GROUP BY hiredate HAVING COUNT(*) > 1);

1) What is a Data warehouse?
A data warehouse is a relational database used for query analysis and reporting. By definition, a data warehouse is subject-oriented, integrated, non-volatile and time-variant.
Subject oriented: the data warehouse is maintained for a particular subject.
Integrated: data collected from multiple sources is integrated into a user-readable, consistent format.
Non volatile: it maintains historical data.
Time variant: data can be displayed weekly, monthly, yearly.

2) What is a Data mart?
A subset of a data warehouse is called a data mart.

3) Difference between Data warehouse and Data mart?
A data warehouse maintains the data of the total organization, and multiple data marts can be used within a data warehouse, whereas a data mart maintains only a particular subject.

4) Difference between OLTP and OLAP?
OLTP is Online Transaction Processing; it maintains current transactional data, which means inserts, updates and deletes must be fast. OLAP is Online Analytical Processing; it maintains historical data for analysis and reporting.

5) Explain ODS?
An operational data store is a part of the data warehouse architecture that maintains only current transactional data. An ODS is subject-oriented, integrated, volatile, and holds current data.

6) Difference between Power Center and Power Mart?
PowerCenter receives all product functionality, including the ability to register multiple servers, share metadata across the repository and partition data: one repository, multiple Informatica servers. PowerMart receives all features except multiple registered servers and data partitioning.

7) What is a staging area?
A staging area is a temporary storage area used for data cleansing and integration rather than transaction processing. Whenever you put data into the data warehouse, you need to clean and process it first.

8) Explain additive, semi-additive and non-additive facts?
Additive fact: an additive fact can be aggregated by simple arithmetical addition across all dimensions.
Semi-additive fact: a semi-additive fact can be aggregated by simple arithmetical addition along only some of the dimensions.
Non-additive fact: a non-additive fact cannot be added at all.

9) What is a factless fact table, with an example?
A fact table which has no measures. An example is a student-attendance fact table that contains only foreign keys to the date and student dimensions.

10) Explain Surrogate Key?
A surrogate key is a series of sequential numbers assigned to be the primary key for the table.

11) How many types of approaches in DWH?
Two approaches: top-down (Bill Inmon's approach) and bottom-up (Ralph Kimball's approach).

12) Explain Star Schema?
A star schema consists of one or more fact tables and one or more dimension tables that are related through foreign keys. Dimension tables are de-normalized, fact tables are normalized.
Advantages: less database space and simpler queries.
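The star schema, surrogate key and additive fact ideas from questions 8, 10 and 12 fit together in one small sketch. The following uses Python's sqlite3, with invented table and column names for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Dimension table: de-normalized, with a surrogate key (question 10).
conn.execute("""CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,   -- surrogate key, not a business key
    product_name TEXT,
    category TEXT)""")

# Fact table: additive measures plus foreign keys to the dimension rows.
conn.execute("""CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    qty INTEGER,
    amount REAL)""")

conn.executemany("INSERT INTO dim_product VALUES (?,?,?)",
                 [(1, "Pen", "Stationery"), (2, "Ink", "Stationery"),
                  (3, "Desk", "Furniture")])
conn.executemany("INSERT INTO fact_sales VALUES (?,?,?)",
                 [(1, 10, 20.0), (2, 5, 25.0), (3, 1, 150.0), (1, 4, 8.0)])

# A typical star query: join fact to dimension on the surrogate key and
# aggregate the additive measure by a dimension attribute.
rows = conn.execute("""
    SELECT d.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product d ON f.product_key = d.product_key
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()
print(rows)  # [('Furniture', 150.0), ('Stationery', 53.0)]
```

SUM works here because amount is an additive fact; a semi-additive measure such as an account balance could not be summed across the time dimension this way.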
13) Explain Snowflake schema?
A snowflake schema normalizes the dimensions to eliminate redundancy: the dimension data is split into additional related tables instead of being kept in one large de-normalized table. Both dimension and fact tables are normalized.

14) What is a conformed dimension?
If multiple data marts use the same dimension, that is called a conformed dimension; equivalently, a dimension that can be used with multiple fact tables is a conformed dimension.

15) Explain the DWH architecture?

16) What is a slowly growing dimension?
Slowly growing dimensions are dimensions whose data grows without updates to the existing dimension rows; that means new data is appended to the existing dimension.

17) What is a slowly changing dimension?
Slowly changing dimensions are dimensions whose data grows with updates to the existing dimension rows.
Type 1: rows containing changes to existing dimensions are updated in the target by overwriting the existing dimension. In the Type 1 mapping, all rows contain current dimension data. Use the Type 1 dimension mapping to update a slowly changing dimension table when you do not need to keep any previous versions of dimensions in the table.
Type 2: the Type 2 dimension mapping inserts both new and changed dimensions into the target. Changes are tracked in the target table by versioning the primary key and creating a version number for each dimension in the table.

Use the Type 2 Dimension/Version Data mapping to update a slowly changing dimension when you want to keep a full history of dimension data in the table; version numbers and versioned primary keys track the order of changes to each dimension.
Type 3: the Type 3 dimension mapping filters source rows based on user-defined comparisons and inserts only those found to be new dimensions into the target. Rows containing changes to existing dimensions are updated in the target: when updating an existing dimension, the Informatica server saves the existing data in different columns of the same row and replaces the existing data with the updates.

18) When do you use a dynamic cache?
When your target table is also the lookup table, you go for a dynamic cache. In a dynamic cache, multiple matches return an error, and you can use only the = operator.

19) What is lookup override?
It overrides the default SQL statement; you can join multiple sources using a lookup override. By default the Informatica server adds the ORDER BY clause.

20) Can we pass a null value in a lookup transformation?
The lookup transformation returns the null value or a value equal to the null value.

21) What is the target load order?
You specify the target load order based on the source qualifiers in a mapping. If you have multiple source qualifiers connected to multiple targets, you can designate the order in which the Informatica server loads data into the targets.

22) What is the default join that the source qualifier provides?
Inner equi join.

23) What are the differences between the Joiner transformation and the Source Qualifier transformation?
You can join heterogeneous data sources in the Joiner transformation, which you cannot achieve in the Source Qualifier transformation. You need matching keys to join two relational sources in the Source Qualifier transformation, whereas you do not need matching keys to join two sources in the Joiner transformation.
Also, the two relational sources must come from the same data source for the Source Qualifier, whereas with the Joiner you can join relational sources coming from different data sources.

24) What is the Update Strategy transformation?
Whenever you load the target table, it determines whether you store historical data or only the current transactional data in the target table.

25) Describe the two levels at which the update strategy can be set?
Within a session, and within a mapping.

26) What is the default source option for the Update Strategy transformation?
Data driven.

27) What is data driven?
The Informatica server follows the instructions coded into the Update Strategy transformations within the session mapping to determine how to flag records for insert, update, delete or reject. If you do not choose the data-driven option setting, the Informatica server ignores all Update Strategy transformations in the mapping.

28) What are the options in the target session of the Update Strategy transformation?
Insert
Delete
Update
Update as update
Update as insert
Update else insert
Truncate table

29) Difference between the source filter and the Filter transformation?
The source filter filters data only from relational sources, whereas the Filter transformation can filter data from any type of source.

30) What is a tracing level?
The amount of information sent to the log file.
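The Type 2 slowly-changing-dimension behaviour described in question 17 can be sketched in plain Python. This is only an illustration of the versioning idea, not Informatica's actual implementation; all names here are invented:

```python
# Minimal sketch of Type 2 SCD logic: new rows are inserted, changed rows
# get a new versioned record instead of an in-place update.

dimension = []   # the target dimension table, as a list of row dicts
next_key = [1]   # surrogate-key counter (a sequence generator would do this)

def apply_scd2(natural_key, attributes):
    current = [r for r in dimension
               if r["natural_key"] == natural_key and r["current"]]
    if current and current[0]["attributes"] == attributes:
        return  # unchanged row: nothing to do
    if current:
        current[0]["current"] = False  # expire the old version, keep history
    version = current[0]["version"] + 1 if current else 1
    dimension.append({"surrogate_key": next_key[0],
                      "natural_key": natural_key,
                      "attributes": attributes,
                      "version": version,
                      "current": True})
    next_key[0] += 1

apply_scd2("CUST1", {"city": "Pune"})    # new dimension: version 1 inserted
apply_scd2("CUST1", {"city": "Pune"})    # same attributes: no change
apply_scd2("CUST1", {"city": "Mumbai"})  # changed: version 2, version 1 expired
print([(r["version"], r["current"]) for r in dimension])  # [(1, False), (2, True)]
```

A Type 1 mapping would instead overwrite the single existing row, and a Type 3 mapping would keep the previous value in an extra column of the same row.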

-- What are the types of tracing levels?
Normal, Terse, Verbose Data, Verbose Initialization.

-- Explain the Sequence Generator transformation?
The Sequence Generator transformation generates numeric values through its NEXTVAL and CURRVAL ports and is commonly used to create surrogate primary keys.

-- Can you connect multiple ports from one group to multiple transformations?
Yes.

31) Can you connect more than one group to the same target or transformation?
No.

32) What is a reusable transformation?
A reusable transformation is a single transformation that can be used in multiple mappings. When you need to incorporate this transformation into a mapping, you add an instance of it to the mapping. Later, if you change the definition of the transformation, all instances of it inherit the changes, since each instance of a reusable transformation is a pointer to that transformation. You can change the transformation in the Transformation Developer and its instances automatically reflect these changes. This feature can save you a great deal of work.

-- What are the methods for creating a reusable transformation?
Two methods:
1) Design it in the Transformation Developer.
2) Promote a standard transformation from the Mapping Designer. After you add a transformation to a mapping, you can promote it to the status of reusable transformation. Once you promote a standard transformation to reusable status, you can demote it to a standard transformation at any time. If you change the properties of a reusable transformation in a mapping, you can revert to the original reusable transformation properties by clicking Revert.

33) What are mapping parameters and mapping variables?
A mapping parameter represents a constant value that you can define before running a session. A mapping parameter retains the same value throughout the entire session. When you use a mapping parameter, you declare and use the parameter in a mapping or mapplet, then define the value of the parameter in a parameter file for the session. Unlike a mapping parameter, a mapping variable represents a value that can change throughout the session.
The Informatica server saves the value of a mapping variable to the repository at the end of a session run and uses that value the next time you run the session.

34) Can you use the mapping parameters or variables created in one mapping in another mapping?
No. We can use mapping parameters or variables only in the transformations of the same mapping or mapplet in which we have created them.

35) Can you use the mapping parameters or variables created in one mapping inside a reusable transformation?
Yes, because a reusable transformation is not contained within any mapplet or mapping.

36) How does the Informatica server sort string values in a Rank transformation?
When the Informatica server runs in the ASCII data movement mode, it sorts session data using a binary sort order. If you configure the session to use a binary sort order, the Informatica server calculates the binary value of each string and returns the specified number of rows with the highest binary values for the string.

37) What is the rank index in the Rank transformation?
The Designer automatically creates a RANKINDEX port for each Rank transformation. The Informatica server uses the rank index port to store the ranking position for each record in a group. For example, if you create a Rank transformation that ranks the top 5 salespersons for each quarter, the rank index numbers the salespeople from 1 to 5.

38) What is a mapplet?
A mapplet is a set of transformations that you build in the Mapplet Designer and can use in multiple mappings.

39) Difference between a mapplet and a reusable transformation?
A reusable transformation is a single transformation, whereas a mapplet uses multiple transformations.

40) What is a parameter file?
A parameter file defines the values for parameters and variables.
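To make question 40 concrete, here is a sketch of what a PowerCenter parameter file can look like. The folder, workflow, session, and parameter names are hypothetical; the section-header syntax follows the common [FolderName.WF:workflow.ST:session] convention:

```ini
[Global]
; values here apply to all workflows and sessions
$PMRootDir=/opt/informatica/server

[MyFolder.WF:wf_load_sales.ST:s_m_load_sales]
; $$ prefixes mapping parameters/variables; $ prefixes session parameters
$$START_DATE=01/01/2010
$$REGION_CODE=WEST
$InputFile_Src=/data/in/sales.csv
```

The session then references this file through its parameter file path setting, and $$START_DATE and $$REGION_CODE resolve inside the mapping at run time.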

WORKFLOW MANAGER

41) What is a server?
The PowerCenter server moves data from sources to targets based on a workflow and the mapping metadata stored in a repository.

42) What is a workflow?
A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming and loading data.

-- What is a session?
A session is a set of instructions that describes how to move data from source to target using a mapping.

-- What is the Workflow Monitor?
Use the Workflow Monitor to monitor workflows and stop the PowerCenter server.

43) Explain the workflow process?
The PowerCenter server uses both process memory and system shared memory to perform these tasks.
Load Manager process: stores and locks the workflow tasks and starts the DTM to run the sessions.
Data Transformation Manager (DTM) process: performs session validations, creates threads to initialize the session, reads, writes and transforms data, and handles pre- and post-session operations.
The default memory allocation is 12,000,000 bytes.

44) What are the types of threads in the DTM?
The main DTM thread is called the master thread. The others are the mapping thread, transformation thread, reader thread, writer thread, and pre- and post-session thread.

45) Explain the Workflow Manager tools?
1) Task Developer
2) Workflow Designer
3) Worklet Designer

46) Explain the workflow schedule.
You can schedule a workflow to run continuously, repeat at a given time or interval, or manually start a workflow. By default the workflow runs on demand.

47) Explain stopping or aborting a session task?
If the PowerCenter server is executing a session task when you issue the stop command, it stops reading data but continues processing, writing and committing data to the targets. If the PowerCenter server cannot finish processing and committing data, you issue the abort command. You can also abort a session by using the ABORT() function in the mapping logic.

48) What is a worklet?
A worklet is an object that represents a set of tasks. It can contain any task available in the Workflow Manager. You can run worklets inside a workflow, and you can also nest a worklet in another worklet. The Workflow Manager does not provide a parameter file for worklets. The PowerCenter server writes information about worklet execution in the workflow log.

49) What is a commit interval, and what are its types?
A commit interval is the interval at which the PowerCenter server commits data to targets during a session. The commit interval is the number of rows you want to use as a basis for the commit point.
Target-based commit: the PowerCenter server commits data based on the number of target rows and the key constraints on the target table. The commit point also depends on the buffer block size and the commit interval.
Source-based commit: the PowerCenter server commits data based on the number of source rows that pass through the mapping.
User-defined commit: the PowerCenter server commits data based on transactions defined in the mapping.

50) Explain bulk loading?
You can use bulk loading to improve the performance of a session that inserts a large amount of data into a DB2, Sybase, Oracle or MS SQL Server database.

When bulk loading, the PowerCenter server bypasses the database log, which speeds performance. Without writing to the database log, however, the target database cannot perform rollback. As a result you may not be able to perform recovery.

51) What is constraint-based loading?
When you select this option the PowerCenter server orders the target load on a row-by-row basis only.
Edit Tasks -> Properties -> select treat source rows as insert.
Edit Tasks -> Config Object tab -> select constraint based.
If a session is configured for constraint-based loading and a target table receives rows from different sources, the PowerCenter server reverts to normal loading for those tables, but loads all other targets in the session using constraint-based loading when possible, loading the primary key table first and then the foreign key table. Use constraint-based loading only when the session option "Treat source rows as" is set to Insert. Constraint-based load ordering allows developers to read the source once and populate parent and child tables in a single process.

52) Explain incremental aggregation?
With incremental aggregation you apply captured changes in the source to aggregate calculations in a session. If the source changes only incrementally and you can capture those changes, you can configure the session to process only the changes. This allows the PowerCenter server to update your target incrementally rather than forcing it to process the entire source and recalculate the same data each time you run the session. Use incremental aggregation when you can capture new source data each time you run the session; use a stored procedure or filter transformation to process only new data.
Use incremental aggregation when incremental changes do not significantly change the target. If processing the incrementally changed source alters more than half the existing target, the session may not benefit from incremental aggregation; in this case, drop the table and recreate the target with the complete source data.

53) Processing of incremental aggregation.
The first time you run an incremental aggregation session the PowerCenter server processes the entire source. At the end of the session the PowerCenter server stores aggregate data from the session run in two files, the index file and the data file, which it creates in a local directory.

TRANSFORMATIONS

-- What is a transformation?
A transformation is a repository object that generates, modifies or passes data.

54) What are the types of transformations?
2 types: 1) active 2) passive.

-- Explain active and passive transformations?
An active transformation can change the number of rows that pass through it; the number of output rows is less than or equal to the number of input rows. A passive transformation does not change the number of rows; the number of output rows always equals the number of input rows.

55) Difference between Filter and Router transformations.
A Filter transformation filters data on a single condition and drops the rows that do not meet the condition. Dropped rows are not stored anywhere, not even in the session log file. A Router transformation filters data based on multiple conditions and gives you the option to route rows that do not match any condition to a default group.

56) What are the types of groups in a Router transformation?
A Router transformation has 2 kinds of groups: 1. input group 2. output groups. Output groups are of 2 types: 1. user-defined groups 2. the default group.

57) Difference between Expression and Aggregator transformations?
An Expression transformation calculates single-row values before writing to the target; it executes on a row-by-row basis only.
An Aggregator transformation allows you to perform aggregate calculations such as MAX, MIN and AVG.

An Aggregator transformation performs calculations on groups.

58) How can you improve session performance with an Aggregator transformation?
Use sorted input.

59) What is the aggregate cache in an Aggregator transformation?
The Aggregator stores data in the aggregate cache until it completes the aggregate calculations. When you run a session that uses an Aggregator transformation, the Informatica server creates index and data caches in memory to process the transformation. If the Informatica server requires more space, it stores overflow values in cache files.

60) Explain the Joiner transformation?
A Joiner transformation joins two related heterogeneous sources residing in different locations or files.

-- What are the types of joins in the Joiner transformation?
Normal, master outer, detail outer, and full outer.

61) Difference between connected and unconnected transformations.
A connected transformation is connected to another transformation within a mapping; an unconnected transformation is not connected to any transformation within the mapping.

62) In which conditions can we not use a Joiner transformation (limitations of the Joiner transformation)?
Both pipelines begin with the same original data source.
Both input pipelines originate from the same Source Qualifier transformation.
Both input pipelines originate from the same Normalizer transformation.
Both input pipelines originate from the same Joiner transformation.
Either input pipeline contains an Update Strategy transformation.
Either input pipeline contains a Sequence Generator transformation.

63) What are the settings used to configure the Joiner transformation?
Master and detail sources, the type of join, and the join condition.

64) What is a Lookup transformation?
A Lookup transformation is used to look up data in a relational table or view based on a lookup condition. By default the lookup is a left outer join.

65) Why use the Lookup transformation?
To perform the following tasks:
Get a related value: for example, if your table includes an employee ID but you want to include the employee name.
Perform a calculation: many tables include values used in a calculation, such as gross sales per invoice or sales tax, but not the calculated value (such as net sales).
Update slowly changing dimension tables: you can use a Lookup transformation to determine whether records already exist in the target.

66) What are the types of lookup?
Connected and unconnected.

67) Difference between connected and unconnected lookup?
Connected lookup:
Receives input values directly from the pipeline.
You can use a dynamic or static cache.
The cache includes all lookup columns used in the mapping (that is, lookup table columns included in the lookup condition and lookup table columns linked as output ports to other transformations).
Can return multiple columns from the same row, or insert into a dynamic lookup cache.
If there is no match for the lookup condition, the Informatica server returns the default value for all output ports. If you configure dynamic caching, the Informatica server inserts rows into the cache.
Passes multiple output values to another transformation; link lookup/output ports to another transformation.
Supports user-defined default values.

Unconnected lookup:
Receives input values from the result of a :LKP expression in another transformation.
You can use only a static cache.
The cache includes all lookup/output ports in the lookup condition and the lookup/return port.
You designate one return port (R); it returns one column from each row.
If there is no match for the lookup condition, the Informatica server returns NULL.
Passes one output value to another transformation; the lookup/output/return port passes the value to the transformation calling the :LKP expression.
Does not support user-defined default values.

68) Explain the index cache and data cache?
The Informatica server stores condition values in the index cache and output values in the data cache.

69) What are the types of lookup cache?
Persistent cache: you can save the lookup cache files and reuse them the next time the Informatica server processes a Lookup transformation configured to use the cache.
Static cache: you can configure a static, or read-only, lookup cache. By default the Informatica server creates a static cache. It caches the lookup table and lookup values in the cache for each row that comes into the transformation. When the lookup condition is true, the Informatica server does not update the cache while it processes the Lookup transformation.
Dynamic cache: if you want to cache the target table and insert new rows into the cache and the target, you can configure the Lookup transformation to use a dynamic cache. The Informatica server dynamically inserts data into the target table.
Shared cache: you can share the lookup cache between multiple transformations. You can share an unnamed cache between transformations in the same mapping.

70) Difference between static cache and dynamic cache?
Static cache: you cannot insert or update the cache. The Informatica server returns a value from the lookup table or cache when the condition is true; when the condition is not true, it returns the default value for connected transformations.
Dynamic cache: you can insert rows into the cache as you pass rows to the target. The Informatica server inserts rows into the cache when the condition is false, which indicates that the row is not in the cache or target table; you can pass these rows to the target table.

ORACLE:

71) Difference between primary key and unique key?
A primary key is NOT NULL and unique; a unique key accepts null values.

72) Difference between INSTR and SUBSTR?
INSTR returns the position of a substring within a string; SUBSTR extracts a portion of a string.

73) What is referential integrity?

74) Difference between a view and a materialized view?

75) What is a redo log file?
The set of redo log files for a database is collectively known as the database's redo log.

76) What is a rollback segment?
A database contains one or more rollback segments to temporarily store undo information. Rollback segments are used to generate read-consistent database information during database recovery and to roll back uncommitted transactions for users.

-- What is a tablespace?
A database is divided into logical storage units called tablespaces. A tablespace is used to group related logical structures together.

-- How to delete the duplicate records?
DELETE FROM emp a WHERE a.rowid > (SELECT MIN(b.rowid) FROM emp b WHERE a.empno = b.empno);

-- What are the different types of joins in Oracle?
Self-join, equi-join, outer join.

77) What is an outer join?
An outer join returns rows from one table including those that do not match rows in the common column of the other table.

78) Write a query for the top 5 salaries.
SELECT * FROM emp e WHERE 5 > (SELECT COUNT(*) FROM emp WHERE sal > e.sal);

79) What is a synonym?

82) What is a bitmap index, with an example?

83) What is a stored procedure and its advantages?

84) Explain cursors, and how many types of triggers are there in Oracle?
A trigger is like a stored procedure, but a trigger is executed automatically.

85) Difference between a function and a stored procedure?
A function returns a value; a procedure does not return a value (but it can return values through OUT parameters).

86) Difference between REPLACE and TRANSLATE?

87) Write a query for the nth maximum salary.
SELECT DISTINCT a.sal FROM emp a WHERE &n = (SELECT COUNT(DISTINCT b.sal) FROM emp b WHERE a.sal <= b.sal);

88) Write a query for odd and even numbered rows.
Odd rows: SELECT * FROM emp WHERE (rowid, 1) IN (SELECT rowid, MOD(rownum, 2) FROM emp);
Even rows: SELECT * FROM emp WHERE (rowid, 0) IN (SELECT rowid, MOD(rownum, 2) FROM emp);

DataWareHousing - ETL Project Life Cycle ( Simple to understand )

-> Datawarehousing projects are categorized into 4 types:
1) Development Projects
2) Enhancement Projects
3) Migration Projects
4) Production Support Projects
-> The following are the different phases involved in an ETL project development life cycle:
1) Business Requirement Collection ( BRD )
2) System Requirement Collection ( SRD )
3) Design Phase
a) High Level Design Document ( HLD )
b) Low Level Design Document ( LLD )
c) Mapping Design
4) Code Review
5) Peer Review

6) Testing
a) Unit Testing
b) System Integration Testing
c) User Acceptance Testing ( UAT )
7) Pre-Production
8) Production ( Go-Live )

Business Requirement Collection :
---------------------------------------------
-> Business requirement gathering is carried out by the business analyst, the onsite technical lead and the client business users.
-> In this phase, a business analyst prepares the Business Requirement Document ( BRD ) (or) Business Requirement Specifications ( BRS ).
-> BR collection takes place at the client location.
-> The outputs from BR analysis are:
-> BRS :- the business analyst gathers the business requirements and documents them in the BRS.
-> SRS :- senior technical people (or) the ETL architect prepare the SRS, which contains the s/w and h/w requirements.
The SRS includes:
a) O/S to be used ( Windows or Unix )
b) RDBMS required to build the database ( Oracle, Teradata etc. )
c) ETL tools required ( Informatica, DataStage )
d) OLAP tools required ( Cognos, BO )
The SRS is also called the Technical Requirement Specifications ( TRS ).

Designing and Planning the Solutions :
------------------------------------------------
-> The outputs from the design and planning phase are:
a) HLD ( High Level Design ) Document
b) LLD ( Low Level Design ) Document
HLD ( High Level Design ) Document :
An ETL architect and a DWH architect participate in designing a solution to build a DWH. An HLD document is prepared based on the business requirements.
LLD ( Low Level Design ) Document :
Based on the HLD, a senior ETL developer prepares the Low Level Design document. The LLD contains the more technical details of an ETL system: a data flow diagram ( DFD ) and details of the sources and targets of each mapping. An LLD also contains information about full and incremental loads. After the LLD, the development phase starts.

Development Phase ( Coding ) :
--------------------------------------------------
-> Based on the LLD, the ETL team creates the mappings ( ETL code ).
-> After designing the mappings, the code ( mappings ) is reviewed by developers.
Code Review :
-> Code review is done by a developer.
-> In code review, the developer reviews the code and the logic, but not the data.
-> The following activities take place in code review:
-> Check the naming standards of transformations, mappings, etc.
-> Check the source-to-target mapping ( whether the correct logic is placed in the mapping ).

Peer Review :
-> The code is reviewed by a team member ( third-party developer ).

Testing :
--------------------------------

The following types of testing are carried out in the testing environment.
1) Unit Testing
2) Development Integration Testing
3) System Integration Testing
4) User Acceptance Testing

Unit Testing :
-> A unit test for the DWH is a white-box test; it should check the ETL procedures and mappings.
-> The following test cases can be executed by an ETL developer:
1) Verify data loss
2) No. of records in the source and target
3) Data load / insert
4) Data load / update
5) Incremental load
6) Data accuracy
7) Verify naming standards
8) Verify column mapping
-> The unit test is carried out by the ETL developer in the development phase.
-> The ETL developer also has to do the data validations in this phase.

Development Integration Testing :
-> Run all the mappings in sequence order.
-> First run the source-to-stage mappings.
-> Then run the mappings related to dimensions and facts.

System Integration Testing :
-> After the development phase, we have to move our code to the QA environment.
-> In this environment, we give read-only permission to the testing people.
-> They will test all the workflows.
-> And they will test our code according to their standards.

User Acceptance Testing ( UAT ) :
-> This test is carried out in the presence of client-side technical users to verify the data migration from source to destination.

Production Environment :
---------------------------------
-> Migrate the code into the Go-Live environment from the test environment ( QA environment ).

What is Event-Based Scheduling?
When you use event-based scheduling, the Informatica Server starts a session when it locates the specified indicator file. To use event-based scheduling, you need a shell command, script, or batch file to create an indicator file when all sources are available. The file must be created in or sent to a directory local to the Informatica Server. The file can be of any format recognized by the Informatica Server operating system. The Informatica Server deletes the indicator file once the session starts.

Use the following syntax to ping the Informatica Server on a UNIX system:
pmcmd ping [{user_name | %user_env_var} {password | %password_env_var}] [hostname:]portno

Use the following syntax to start a session or batch on a UNIX system:
pmcmd start {user_name | %user_env_var} {password | %password_env_var} [hostname:]portno [folder_name:]{session_name | batch_name} [:pf=param_file] session_flag wait_flag

Use the following syntax to stop a session or batch on a UNIX system:
pmcmd stop {user_name | %user_env_var} {password | %password_env_var} [hostname:]portno [folder_name:]{session_name | batch_name} session_flag
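The shell-script side of event-based scheduling might look like the minimal sketch below; the directory layout and file names are invented for illustration, and a real script would point at a directory local to the Informatica Server:

```shell
# Create an indicator file once all expected source files have arrived;
# a session using event-based scheduling would start when the server
# finds it (and would then delete the indicator file itself).
SRC_DIR=$(mktemp -d)                                  # stand-in for the watch directory
touch "$SRC_DIR/sales.dat" "$SRC_DIR/customers.dat"   # pretend the sources arrived
if [ -f "$SRC_DIR/sales.dat" ] && [ -f "$SRC_DIR/customers.dat" ]; then
    touch "$SRC_DIR/session.ind"                      # the indicator file
    echo "indicator created"
fi
```

A cron job or the tail end of an upstream extract script would typically run a check like this.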

Use the following syntax to stop the Informatica Server on a UNIX system:
pmcmd stopserver {user_name | %user_env_var} {password | %password_env_var} [hostname:]portno

2. Explain the following commands.
$ ls > file1
$ banner hi-fi > message
$ cat par.3 par.4 par.5 >> report
$ cat file1 > file1
$ date ; who
$ date ; who > logfile
$ (date ; who) > logfile

3. What is the significance of the "tee" command?
It reads the standard input and sends it to the standard output while redirecting a copy of what it has read to the file specified by the user.

4. What does the command $ who | sort - logfile > newfile do?
The input from a pipe can be combined with the input from a file. The trick is to use the special symbol - (a hyphen) for those commands that recognize the hyphen as standard input. In the above command the output from who becomes the standard input to sort; meanwhile sort opens the file logfile, the contents of this file are sorted together with the output of who (represented by the hyphen), and the sorted output is redirected to the file newfile.

5. What does the command $ ls | wc -l > file1 do?
ls becomes the input to wc, which counts the number of lines it receives as input; instead of displaying this count, the value is stored in file1.

6. Which of the following commands is not a filter: (a) man, (b) cat, (c) pg, (d) head?
Ans: man. A filter is a program which can receive a flow of data from standard input, process (or filter) it and send the result to the standard output.

7. How is the command $ cat file2 different from $ cat > file2?
cat file2 displays the contents of file2, whereas cat > file2 takes whatever is typed at standard input and writes it into file2 (creating or overwriting it).

8. Explain the > and >> redirection operators.
> is the output redirection operator; when used it overwrites the file, while the >> operator appends to the file.

9. Explain the steps that a shell follows while processing a command.
After the command line is terminated by the Enter key, the shell goes ahead with processing the command line in one or more passes. The sequence is well defined and assumes the following order:
Parsing: the shell first breaks up the command line into words, using spaces and tabs as delimiters, unless quoted. All consecutive occurrences of a space or tab are replaced here with a single space.
Variable evaluation: words preceded by a $ are evaluated as variables, unless quoted or escaped.
Command substitution: any command surrounded by backquotes is executed by the shell, which then replaces the standard output of the command into the command line.
Wild-card interpretation: the shell finally scans the command line for wild-cards (the characters *, ?, [, ]). Any word containing a wild-card is replaced by a sorted list of filenames that match the pattern. The list of these filenames then forms the arguments to the command.
PATH evaluation: the shell finally looks at the PATH variable to determine the sequence of directories it has to search in order to hunt for the command.

12. What is the difference between the cat and more commands?
cat displays file contents. If the file is large the contents scroll off the screen before we can view them. The command 'more' is like a pager which displays the contents page by page.

13. Write a command to kill the last background job.
kill $!

14. Which command is used to delete all files in the current directory and all its sub-directories?
rm -r *

15. Write a command to display a file's contents in various formats.
$ od -cbd file_name
c - character, b - binary (octal), d - decimal; od = Octal Dump.

16. What will the following command do?

$ echo *
It is similar to the 'ls' command and displays all the files in the current directory.

17. Is it possible to create a new file system in UNIX?
Yes, mkfs is used to create a new file system.

18. Is it possible to restrict incoming messages?
Yes, using the mesg command.

19. What is the use of the command "ls -x chapter[1-5]"?
ls stands for list; it displays the list of files that start with 'chapter' followed by a digit from '1' to '5': chapter1, chapter2, and so on.

20. Is du a command? If so, what is its use?
Yes, it stands for disk usage. With the help of this command you can find the disk space consumed by files and directories.

21. Is it possible to count the number of characters and lines in a file? If so, how?
Yes, wc stands for word count.
wc -c counts the number of characters in a file.
wc -l counts the lines in a file.

22. Name the data structure used to maintain file identification.
The inode; each file has a separate inode and a unique inode number.

23. How many prompts are available in a UNIX system?
Two prompts, PS1 (primary prompt) and PS2 (secondary prompt).

24. How does the kernel differentiate device files and ordinary files?
The kernel checks the 'type' field in the file's inode structure.

25. How do you switch to superuser status to gain privileges?
Use the su command. The system asks for the password, and when a valid entry is made the user gains superuser (admin) privileges.

26. What are shell variables?
Shell variables are special variables, name-value pairs created and maintained by the shell. Examples: PATH, HOME, MAIL and TERM.

27. What is redirection?
Directing the flow of data to a file, or from a file, for input or output. Example: ls > filelist

28. How do you terminate a running process, and what is special about the command kill 0?
With the help of the kill command we can terminate a process. Syntax: kill pid
kill 0 kills all processes in your session except the login shell.

29. What is a pipe? Give an example.
A pipe is two or more commands separated by the pipe character '|'.
It tells the shell to arrange for the output of the preceding command to be passed as input to the following command.
Example: ls -l | pr
The output of the command ls -l becomes the standard input of pr. When a sequence of commands is combined using pipes, it is called a pipeline.

30. Explain kill() and its possible return values.
There are four possible results from this call:
kill() returns 0: this implies that a process exists with the given PID, and the system would allow you to send signals to it. It is system-dependent whether the process could be a zombie.
kill() returns -1, errno == ESRCH: either no process exists with the given PID, or security enhancements are causing the system to deny its existence. (On some systems, the process could be a zombie.)
kill() returns -1, errno == EPERM: the system would not allow you to kill the specified process. This means that either the process exists (again, it could be a zombie) or draconian security enhancements are present (e.g. your process is not allowed to send signals to *anybody*).
kill() returns -1, with some other value of errno: you are in trouble!
The most-used technique is to assume that success or failure with EPERM implies that the process exists, and any other error implies that it doesn't.

An alternative exists, if you are writing specifically for a system (or all those systems) that provide a /proc filesystem: checking for the existence of /proc/PID may work.

31. What are a relative path and an absolute path?
Absolute path: the exact path from the root directory.
Relative path: the path relative to the current directory.

32. Construct pipes to execute the following jobs.
1. The output of who should be displayed on the screen, with the total number of users who have logged in displayed at the bottom of the list.
2. The output of ls should be displayed on the screen, and from this output the lines containing the word "poem" should be counted and the count stored in a file.
3. The contents of file1 and file2 should be displayed on the screen and this output should be appended to a file. From the output of ls the lines containing "poem" should be displayed on the screen along with the count.
4. Names of cities should be accepted from the keyboard. This list should be combined with the list present in a file. The combined list should be sorted and the sorted list stored in a file newcity.
5. All files present in a directory dir1 should be deleted, and any error while deleting should be stored in a file errorlog.

11. What is the use of the grep command?
grep is a pattern-search command. It searches for the pattern, specified on the command line with the appropriate options, in a file(s).
Syntax: grep pattern file(s)
Example: grep 99mx mcafile

10. What is the difference between the cmp and diff commands?
cmp - compares two files byte by byte and displays the first mismatch.
diff - tells the changes to be made to make the files identical.

How do you send the session report (.txt) to the manager after a session is completed?
Use the email variables: %a (attach the file), %g (attach the session log file).
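The grep example from question 11 above can be tried end to end; the sample file contents here are invented:

```shell
# Build a small sample file, then search it for the pattern 99mx.
cd "$(mktemp -d)"
printf 'alpha\nbeta 99mx\ngamma\n' > mcafile
grep 99mx mcafile        # prints the matching line: "beta 99mx"
grep -c 99mx mcafile     # -c prints the count of matching lines: 1
```
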

How do you identify the empty lines in a flat file in Unix, and how do you remove them?
grep -v '^$' filename
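A quick self-contained check of that command, using a made-up three-line file with one empty line:

```shell
cd "$(mktemp -d)"
printf 'first\n\nsecond\n' > flat.txt
grep -c '^$' flat.txt    # identify: prints 1, the number of empty lines
grep -v '^$' flat.txt    # remove: prints only "first" and "second"
```

The pattern '^$' matches a line whose start (^) is immediately followed by its end ($), i.e. an empty line; -v inverts the match and -c counts matches.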

List the files in time order in Unix?
ls -lt (sort by last modified date)
ls -ltr (reverse order)
ls -lS (sort by size of the file)

How to connect to a database using a Unix command?
mysql -u uname -h hostname -p

What is the command to check disk space in Unix?
Ans: df -k

What is the command to kill the last background job?
Ans: kill $!

How will you list all hidden files?
Ans: ls -a | grep '^\.'

How to kill a process forcibly?

Ans: kill -9 PID (PID = Process Identification Number)

How to print/display the first line of a file?
There are many ways to do this. However the easiest way to display the first line of a file is using the [head] command.
$> head -1 file.txt
No prize for guessing that if you specify [head -2] then it would print the first 2 records of the file.
Another way is by using the [sed] command. [sed] is a very powerful text editor which can be used for various text manipulation purposes like this.
$> sed '2,$ d' file.txt
How does the above command work? The 'd' parameter basically tells [sed] to delete all the records from the display from line 2 to the last line of the file (the last line is represented by the $ symbol). Of course it does not actually delete those lines from the file; it just does not display those lines on the standard output screen. So you only see the remaining line, which is the first line.

How to print/display the last line of a file?
The easiest way is to use the [tail] command.
$> tail -1 file.txt
If you want to do it using the [sed] command, here is what you should write:
$> sed -n '$ p' file.txt
From our previous answer, we already know that '$' stands for the last line of the file. So '$ p' basically prints (p for print) the last line on the standard output screen. The '-n' switch puts [sed] into silent mode so that [sed] does not print anything else in the output.

How to display the n-th line of a file?
The easiest way to do it is by using [sed], I guess. Based on what we already know about [sed] from our previous examples, we can quickly deduce this command:
$> sed -n '<n> p' file.txt
You need to replace <n> with the actual line number. So if you want to print the 4th line, the command will be
$> sed -n '4 p' file.txt
Of course you can do it by using the [head] and [tail] commands as well, like below:
$> head -<n> file.txt | tail -1
You need to replace <n> with the actual line number.
So if you want to print the 4th line, the command will be
$> head -4 file.txt | tail -1

How to remove the first line / header from a file?
We already know how [sed] can be used to delete a certain line from the output by using the 'd' switch. So if we want to delete the first line the command should be:
$> sed '1 d' file.txt
But the issue with the above command is that it just prints out all the lines except the first line of the file on the standard output; it does not really change the file in-place. So if you want to delete the first line from the file itself, you have two options. Either you can redirect the output to some other file and then rename it back to the original file, like below:
$> sed '1 d' file.txt > new_file.txt
$> mv new_file.txt file.txt
Or you can use the inbuilt [sed] switch '-i' which changes the file in-place. See below:
$> sed -i '1 d' file.txt

How to remove the last line / trailer from a file in a Unix script?
Always remember that the [sed] address '$' refers to the last line. So using this knowledge we can deduce the below command:
$> sed -i '$ d' file.txt
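The two removals above can be exercised together on a throwaway file; the redirect-and-rename form is used in this sketch instead of sed -i, whose behaviour differs between GNU and BSD sed:

```shell
cd "$(mktemp -d)"
printf 'header\ndata1\ndata2\ntrailer\n' > file.txt
sed '1 d' file.txt > tmp && mv tmp file.txt   # strip the header line
sed '$ d' file.txt > tmp && mv tmp file.txt   # strip the trailer line
cat file.txt                                  # only data1 and data2 remain
```
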

How to remove certain lines from a file in Unix?
If you want to remove lines <m> to <n> from a given file, you can accomplish the task in a similar way to the method shown above. Here is an example:
$> sed -i '5,7 d' file.txt
The above command will delete lines 5 to 7 from the file file.txt.

How to remove the last n lines from a file?
This is a bit tricky. Suppose your file contains 100 lines and you want to remove the last 5 lines. Now if you know how many lines are in the file, then you can simply use the above-shown method and remove lines 96 to 100 like below:
$> sed -i '96,100 d' file.txt # alternative to the command [head -95 file.txt]
But you will not always know the number of lines present in the file (the file may be generated dynamically, etc.). In that case there are many different ways to solve the problem. There are some ways which are quite complex and fancy, but let's first do it in a way that we can understand and remember easily. Here is how it goes:
$> tt=`wc -l file.txt | cut -f1 -d' '`; sed -i "`expr $tt - 4`,$tt d" file.txt
As you can see there are two commands. The first one (before the semicolon) calculates the total number of lines present in the file and stores it in a variable called tt. The second command (after the semicolon) uses the variable and works in the exact way shown in the previous example.

How to check the length of any line in a file?
We already know how to print one line from a file:
$> sed -n '<n> p' file.txt
where <n> is to be replaced by the actual line number that you want to print. Once you know that, it is easy to print out the length of this line by using the [wc] command with the '-c' switch:
$> sed -n '35 p' file.txt | wc -c
The above command will print the length of the 35th line in file.txt.

How to get the n-th word of a line in Unix?
Assuming the words in the line are separated by spaces, we can use the [cut] command. [cut] is a very powerful and useful command and it's really easy.
All you have to do to get the n-th word from the line is issue the following command:
cut -f<n> -d' '
The '-d' switch tells [cut] what the delimiter (or separator) is, which is a space ' ' in this case. If the separator were a comma, we would have written -d',' instead.
So, suppose I want to find the 4th word in the string "A quick brown fox jumped over the lazy cat"; we will do something like this:
$> echo "A quick brown fox jumped over the lazy cat" | cut -f4 -d' '
And it will print fox.

How to reverse a string in Unix?
Pretty easy. Use the [rev] command.
$> echo "unix" | rev
xinu

How to get the last word from a line in a Unix file?
We will make use of two commands that we learnt above to solve this. The commands are [rev] and [cut]. Here we go.
Let's imagine the line is: "C for Cat". We need "Cat". First we reverse the line: we get "taC rof C". Then we cut the first word: we get 'taC'. And then we reverse it again.
$> echo "C for Cat" | rev | cut -f1 -d' ' | rev
Cat

How to get the n-th field from a Unix command output?
We know we can do it with [cut]. The command below extracts the first field from the output of the [wc -c] command:
$>wc -c file.txt | cut -d' ' -f1 109 But I want to introduce one more command to do this here. That is the [awk] command. [awk] is a very powerful command for text pattern scanning and processing. Here we will see how we may use [awk] to extract the first field (or first column) from the output of another command. Like above, suppose I want to print the first column of the [wc -c] output. Here is how it goes: $>wc -c file.txt | awk '{print $1}' 109 The basic syntax of [awk] is like this: awk 'pattern {action}' The pattern can be left blank or omitted, as in the example above. In the action space, we have asked [awk] to take the action of printing the first column ($1). More on [awk] later. How to replace the n-th line in a file with a new line in Unix? This can be done in two steps. The first step is to remove the n-th line. And the second step is to insert a new line in the n-th line position. Here we go. Step 1: remove the n-th line $>sed -i'' '10 d' file.txt # d stands for delete Step 2: insert a new line at the n-th line position $>sed -i'' '10 i This is the new line' file.txt # i stands for insert How to show the non-printable characters in a file? Open the file in the VI editor. Go to VI command mode by pressing [Escape] and then [:]. Then type [set list]. This will show you all the non-printable characters, e.g. Ctrl-M characters (^M) etc., in the file. How to zip a file in Linux? Use the inbuilt [zip] command in Linux. How to unzip a file in Linux? Use the inbuilt [unzip] command in Linux. $> unzip -j file.zip How to test if a zip file is corrupted in Linux? Use the -t switch with the inbuilt [unzip] command $> unzip -t file.zip How to check if a file is zipped in Unix? In order to know the file type of a particular file use the [file] command like below: $> file file.txt file.txt: ASCII text If you want to know the technical MIME type of the file, use the -i switch.
$>file -i file.txt file.txt: text/plain; charset=us-ascii If the file is zipped, the following will be the result $> file -i file.zip file.zip: application/x-zip How to connect to an Oracle database from within a shell script? You will be using the same [sqlplus] command to connect to the database that you use normally even outside the shell script. To understand this, let's take an example. In this example, we will connect to the database, fire a query and get the output printed from the unix shell. Ok? Here we go
$>res=`sqlplus -s username/password@database_name <<EOF SET HEAD OFF; select count(*) from dual; EXIT; EOF` $> echo $res 1 If you connect to the database in this method, the advantage is that you will be able to pass Unix-side shell variable values to the database. See below: $>res=`sqlplus -s username/password@database_name <<EOF SET HEAD OFF; select count(*) from customer where last_name='$1'; EXIT; EOF` $> echo $res 12 How to execute a database stored procedure from a shell script? $> SqlReturnMsg=`sqlplus -s username/password@database <<EOF BEGIN Proc_Your_Procedure( your-input-parameters ); END; / EXIT; EOF` $> echo $SqlReturnMsg How to check the command line arguments in a UNIX command in a shell script? In a bash shell, you can access the command line arguments using the $0, $1, $2, etc. variables, where $0 prints the command name, $1 prints the first input parameter of the command, $2 the second input parameter of the command and so on. How to fail a shell script programmatically? Just put an [exit] command in the shell script with a return value other than 0. This is because the exit code of a successful Unix program is zero. So, if you write exit -1 inside your program, then your program will throw an error and exit immediately. How to list file/folder lists alphabetically? Normally the [ls -lt] command lists the file/folder list sorted by modified time. If you want to list them alphabetically, then you should simply specify: [ls -l] How to check if the last command was successful in Unix? To check the status of the last executed command in UNIX, you can check the value of the inbuilt bash variable [$?]. See the below example: $> echo $? How to check if a file is present in a particular directory in Unix? We can do it in many ways. Based on what we have learnt so far, we can make use of the [ls] command and [$?] to do this. See below: $> ls -l file.txt; echo $? If the file exists, the [ls] command will be successful. Hence [echo $?] will print 0.
If the file does not exist, then the [ls] command will fail and hence [echo $?] will print a non-zero value.
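Besides the [ls] + [$?] trick above, the shell's built-in [test] command (usually written with square brackets, [ -f ... ]) checks for a file's existence directly. A minimal sketch — the file name here is just an example:

```shell
# work in a scratch directory so we don't disturb anything
cd "$(mktemp -d)"
touch file.txt
# -f is true when the path exists and is a regular file
if [ -f file.txt ]; then echo "exists"; else echo "missing"; fi   # prints: exists
rm -f file.txt
if [ -f file.txt ]; then echo "exists"; else echo "missing"; fi   # prints: missing
```

Related switches: [ -d dir ] tests for a directory, and [ -s file ] is true only when the file exists and is non-empty.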

How to check all the running processes in Unix? The standard command to see this is [ps]. But [ps] only shows you a snapshot of the processes at that instant. If you need to monitor the processes for a certain period of time and need to refresh the results in each interval, consider using the [top] command. $> ps -ef If you wish to see the % of memory usage and CPU usage, then consider the below switches $> ps aux If you wish to use this command inside some shell script, or if you want to customize the output of the [ps] command, you may use the -o switch like below. By using -o, you can specify the columns that you want [ps] to print out. $>ps -e -o stime,user,pid,args,%mem,%cpu How to tell if my process is running in Unix? You can list all the running processes using the [ps] command. Then you can grep your user name or process name to see if the process is running. See below: $>ps -e -o stime,user,pid,args,%mem,%cpu | grep "opera" 14:53 opera 29904 sleep 60 0.0 0.0 14:54 opera 31536 ps -e -o stime,user,pid,arg 0.0 0.0 14:54 opera 31538 grep opera 0.0 0.0 How to get the CPU and Memory details in a Linux server? In Linux based systems, you can easily access the CPU and memory details from /proc/cpuinfo and /proc/meminfo, like this: $>cat /proc/meminfo $>cat /proc/cpuinfo Just try the above commands in your system to see how it works.
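Building on the /proc files above, [grep] can pull out just the interesting lines instead of dumping the whole file — for example, counting logical CPU cores and showing the total installed memory. This is Linux-specific (the /proc layout does not exist on other Unixes):

```shell
# each logical core has its own "processor : N" stanza in cpuinfo
grep -c '^processor' /proc/cpuinfo
# the MemTotal line reports total installed memory in kB
grep '^MemTotal' /proc/meminfo
```

The actual numbers depend on the machine, which is why no sample output is shown here.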

How to display the top 10 logged-in users? $> who | head -10 (pipe to [wc -l] to count them instead) How to display the 10 most recently modified files? $> ls -lrt | tail -10 What is ls -ltd? ls -ltd lists information about the current directory. Options: -l: long listing, -t: sort by time stamp, -d: information about the directory itself (used with the -l option). Which transformation should we use to normalize COBOL and relational sources? Normalizer Transformation. When we drag a COBOL source into the Mapping Designer workspace, the Normalizer transformation automatically appears, creating input and output ports for every column in the source. Difference between static cache and dynamic cache? In case of Dynamic cache, when you are inserting a new row it looks at the lookup cache to see if the row exists or not; if not, it inserts it into the target and the cache as well. In case of Static cache, when you are inserting a new row it
checks the cache and writes to the target but not the cache. If you cache the lookup table, you can choose to use a dynamic or static cache. By default, the lookup cache remains static and does not change during the session. With a dynamic cache, the Informatica Server inserts or updates rows in the cache during the session. When you cache the target table as the lookup, you can look up values in the target and insert them if they do not exist, or update them if they do. What are the join types in joiner transformation? The join types are Normal, Master Outer, Detail Outer, and Full Outer. In which conditions can we not use a joiner transformation (limitations of joiner transformation)? 1. Both input pipelines originate from the same Source Qualifier transformation. 2. Both input pipelines originate from the same Normalizer transformation. 3. Both input pipelines originate from the same Joiner transformation. 4. Either input pipeline contains an Update Strategy transformation. 5. We connect a Sequence Generator transformation directly before the Joiner transformation. What is the lookup transformation? Lookup is a passive transformation used to look up data in a flat file or in a relational table or view. What are the differences between joiner transformation and source qualifier transformation? 1. Source Qualifier operates only with relational sources within the same schema; Joiner can have either heterogeneous sources or relational sources in different schemas. 2. Source Qualifier requires at least one matching column to perform a join; Joiner joins based on matching ports. 3. Additionally, Joiner requires two separate input pipelines and should not have an Update Strategy or Sequence Generator before it (this is no longer true from Infa 7.2).
1) Joiner can join relational sources which come from different data sources, whereas in Source Qualifier the relational sources should come from the same data source. 2) We need matching keys to join two relational sources in a Source Qualifier transformation, whereas we do not need matching keys to join two sources with a Joiner.
Why use the lookup transformation? Lookup is a transformation to look up values from a relational table/view or a flat file. The developer defines the lookup match criteria. There are two types of Lookups in PowerCenter Designer, namely: 1) Connected Lookup 2) Unconnected Lookup. Different caches can also be used with lookup, like static, dynamic, persistent, and shared (the dynamic cache cannot be used while creating an unconnected lookup). The Lookup transformation is passive and it can be both connected and unconnected. It is used to look up data in a relational table, view, or synonym. Lookup is used to perform one of the following tasks: to get a related value, to perform a calculation, to update a slowly changing dimension table, or to check whether a record already exists in the table.
What is source qualifier transformation? SQ is an active transformation. It performs one of the following tasks: join data from the same source database, filter rows when PowerCenter reads source data, perform an outer join, or select only distinct values from the source. In a source qualifier transformation a user can define join conditions, filter the data and eliminate duplicates. The default source qualifier query can be overridden with the above options; this is known as SQL Override. The source qualifier transformation represents the records that the Informatica server reads when it runs a session. When we add a relational or a flat file source definition to a mapping, we need to connect it to a source qualifier transformation.
How does the Informatica server increase session performance through partitioning the source? Partitioning the session improves session performance by creating multiple connections to sources and targets and loading data in parallel pipelines.
What are the rank caches? When the server runs a session with a Rank transformation, the Informatica server stores group information in an index cache and row data in a data cache. It compares an input row with rows in the data cache; if the input row out-ranks a stored row, the Informatica server replaces the stored row with the input row.
What is Code Page Compatibility? When two code pages are compatible, the characters encoded in the two code pages are virtually identical. Compatibility between code pages is used for accurate data movement when the Informatica Server runs in the Unicode data movement mode. If the code pages are identical, then there will not be any data loss. One code page can be a subset or superset of another. For accurate data movement, the target code page must be a superset of the source code page. How can you create or import a flat file definition into the Warehouse Designer? Create the file in Warehouse Designer, or import the file from the location where it exists, or modify the source if the structure is one and the same
First create it in the Source Analyzer, then drag it into the Warehouse Designer; you can't create a flat file target definition directly. There is no way to import a target definition as a file in Informatica Designer. So while creating the target definition for a file in the Warehouse Designer, it is created considering it as a table, and then in the session properties of that mapping it is specified as a file. You cannot create or import a flat file definition into the Warehouse Designer directly. Instead you must analyze the file in the Source Analyzer, then drag it into the Warehouse Designer. When you drag the flat file source definition into the Warehouse Designer workspace, the Warehouse Designer creates a relational target definition, not a file definition. If you want to load to a file, configure the session to write to a flat file. When the Informatica server runs the session, it creates and loads the flat file. What is the aggregate cache in aggregator transformation? The aggregator transformation contains two caches, namely the data cache and the index cache. The data cache holds aggregate values or detail records; the index cache holds grouped column values (the unique values of the group-by ports). The PowerCenter Server stores data in the aggregate cache until it completes the aggregate calculations. When you run a session that uses an aggregator transformation, the Informatica server creates the index and data caches in memory to process the transformation. If the Informatica server requires more space, it stores overflow values in cache files. How can you recognize whether or not the newly added rows in the source get inserted in the target?
In the Type-2 mapping we have three options to recognize the newly added rows: i) Version Number ii) Flag Value iii) Effective Date Range. Also, from the session, SrcSuccessRows can be compared with TgtSuccessRows; or check the session log or the target table. What are the types of lookup? Based on connection: 1. Connected 2. Unconnected. Based on source type: 1. Flat file 2. Relational. Based on cache: 1. Cached 2. Uncached. Based on cache type: 1. Static 2. Dynamic. Based on reuse: 1. Persistent 2. Non-persistent. Based on input: 1. Sorted 2. Unsorted. The main types are connected and unconnected: a connected lookup participates in the pipeline (through its ports), while an unconnected lookup is called from an expression condition.

What are the types of metadata stored in the repository? The repository stores metadata that describes how to transform and load source and target data. Metadata can include information such as mappings describing how to transform source data, sessions indicating when you want the Informatica Server to perform the transformations, and connect strings for sources and targets. The types of metadata stored in the repository are: database connections, global objects, mappings, mapplets, multidimensional metadata, reusable transformations, sessions and batches, shortcuts, source definitions, target definitions, and transformations. What happens if the Informatica server doesn't find the session parameter in the parameter file? The workflow will fail. Can you access a repository created in a previous version of Informatica? We have to migrate the repository from the older version to the newer version; then you can use that repository. Without using an ETL tool, can you prepare and maintain a data warehouse? Yes, we can do that using PL/SQL or stored procedures when all the data are in the same database. If you have sources as flat files you can't do it through PL/SQL or stored procedures. How do you identify the changed records in operational data? In my project the source system itself sends us the new records and changed records from the last 24 hrs. Why couldn't you go for a snowflake schema? Snowflake gives lower performance compared to a star schema, because it requires more joins while retrieving the data. Snowflake is preferred in two cases: 1) If you want to load the data into more hierarchical levels of information, for example yearly, quarterly, monthly, daily, hourly, minutes of information, prefer snowflake. 2) Whenever the input data contains more low-cardinality elements, prefer a snowflake schema.
Low cardinality examples: sex, marital status, etc. Low cardinality means the number of distinct values is very small compared to the total number of records. Name some measures in your fact table? Sales amount. How many dimension tables did you have in your project, and name some dimensions (columns)?
Product Dimension: Product Key, Product Id, Product Type, Product Name, Batch Number. Distributor Dimension: Distributor Key, Distributor Id, Distributor Location. Customer Dimension: Customer Key, Customer Id, CName, Age, Status, Address, Contact. Account Dimension: Account Key, Acct Id, Acct Type, Location, Balance. How many Fact and Dimension tables are there in your project? In my module (Sales) we have 4 dimensions and 1 fact table. How many Data marts are there in your project? There are 4 data marts: Sales, Marketing, Finance and HR. In my module we are handling only the sales data mart. What is the daily data volume (in GB/records)? What is the size of the data extracted in the extraction process? Approximately an average of 40k records per file per day. Daily we get 8 files from 8 source systems. What is the size of the database in your project? Based on the client's database, it might be in GBs. What is meant by clustering? Clustering stores related rows from two (or more) tables together in the same physical blocks, so joined data can be retrieved easily. How is it determined whether or not a session has a heterogeneous target? A session is considered to have heterogeneous targets when it writes to targets of different types or in different databases (there is no primary key-foreign key relationship between them). Under what circumstance can a target definition be edited from the Mapping Designer, within the mapping where that target definition is being used? We can't edit the target definition in the Mapping Designer; we can edit the target in the Warehouse Designer only. But in our projects, we haven't edited any of the targets. If any change is required to the target definition, we inform the DBA to make the change and then we import it again. We don't have any permission to edit the source and target tables. Can a source qualifier be used to perform an outer join when joining 2 databases? No, we can't join two different databases in a SQL Override. If your source is a flat file with a delimiter, where can you change that delimiter the next time?
In the session properties, go to Mappings, click on the target instance, click Set File Properties; there we can change the delimiter option.
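The session property above changes how Informatica writes the file; if you ever need to change the delimiter inside an existing flat file itself, standard Unix tools from the earlier sections can do it. A small sketch — the file names and sample data are hypothetical:

```shell
# create a small comma-delimited sample file
printf 'id,name,amount\n1,abc,100\n' > in.csv
# swap every comma for a pipe; tr maps one character to another
tr ',' '|' < in.csv > out.txt
cat out.txt
# prints:
# id|name|amount
# 1|abc|100
```

For multi-character replacements, [sed] works where [tr] cannot: sed 's/,/||/g' in.csv.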

If the index cache file capacity is 2 MB and the data cache is 1 MB, and you enter a capacity of 3 MB for the index and 2 MB for the data, what will happen? Nothing will happen; based on the buffer size that exists on the server, we can change the cache sizes. The max size of a cache is 2 GB. Difference between the NEXTVAL and CURRVAL ports in a sequence generator, assuming they are both connected to the input of another transformation? It gives values like NEXTVAL = 1, CURRVAL = 0. How does the dynamic cache handle duplicate rows?
The dynamic cache assigns flags to the records while inserting them into the cache: a new record is assigned the insert flag "0", an updated record is assigned the update flag "1", and a no-change record is assigned the reject flag "2". How will you find whether your mapping is correct or not without connecting a session? Through the debugging option. If you are using an aggregator transformation in your mapping, does your source contain a dimension or a fact at that time? According to requirements, we can use the aggregator transformation with either; there is no limitation that the source must be a dimension or a fact. My input is Oracle and my target is a flat file; can I load it? How? Yes. Create a flat file whose structure matches the Oracle table in the Warehouse Designer, then develop the mapping according to the requirement and map to that target flat file. The target file is created in the TgtFiles directory on the server system. For a session, can I use 3 mappings? No, for one session there should be only one mapping. We have to create a separate session for each mapping. Types of loading procedures? At the Informatica level, load procedures are of two types: 1) normal load and 2) bulk load. At the project level, load procedures are based on the project requirement: daily loads or weekly loads. Are you involved in high-level or low-level design? What is meant by high-level design and low-level design? Low-level design: requirements should be in Excel format describing field-to-field validations and the business logic that needs to
present. Mostly the onsite team does this low-level design. High-level design: describes the Informatica flow chart from source qualifier to target; simply put, the flow chart of the Informatica mapping. The developer does this design document. What are the dimension load methods? Daily loads or weekly loads, based on the project requirement. Where do we use a lookup: between source to stage or stage to target? It depends on the requirement; there is no rule that we have to use it in one particular stage only. How will you do SQL tuning? We can do SQL tuning using the Oracle Optimizer and the TOAD software. Did you use any other tools for scheduling purposes other than Workflow Manager or pmcmd? Yes, third-party tools like "Control-M". What is SQL mass updating? A) Update (select hs1.col1 as hs1_col1, hs1.col2 as hs1_col2, hs1.col3 as hs1_col3, hs2.col1 as hs2_col1, hs2.col2 as hs2_col2, hs2.col3 as hs2_col3 from hs1, hs2 where hs1.sno = hs2.sno) set hs1_col1 = hs2_col1, hs1_col2 = hs2_col2, hs1_col3 = hs2_col3; What is the unbound field exception in source qualifier? "TE_7020 Unbound field in Source Qualifier" when running a session. Problem description: when running a session, the session fails with the following error: TE_7020 Unbound field <field_name> in Source Qualifier <SQ_name>. Solution: this error occurs when there is an inconsistency between the Source Qualifier and the source table. Either there is a field in the Source Qualifier that is not in the physical table, or there is a column of the source object that has no link to the corresponding port in the Source Qualifier. To resolve this, re-import the source definition into the Source Analyzer in Designer and bring the new source definition into the mapping; this will also re-create the Source Qualifier. Connect the new Source Qualifier to the rest of the mapping as before. Using an unconnected lookup, how do you remove nulls and duplicates?
We can't handle nulls and duplicates in the unconnected lookup; we can handle them in a dynamic connected lookup. I have 20 lookups, 10 joiners, 1 normalizer; how will you improve the session performance? We have to calculate the lookup and joiner cache sizes. What is version controlling? It is the method to differentiate the old build from the new build after changes are made to the existing code: for the old code v001, and the next time you increase the version number, as in v002. In my last company we didn't use any version controlling in Informatica; we just deleted the old build and replaced it with the new code. We maintained the code in VSS (Visual SourceSafe), software that maintains the code with versioning. Whenever a client change request comes after production starts, we have to create another build. How is the Sequence Generator transformation different from other transformations? The Sequence Generator is unique among all transformations because we cannot add, edit, or delete its default ports (NEXTVAL and CURRVAL). Unlike other transformations, we cannot override the Sequence Generator transformation properties at the session level. This protects the integrity of the sequence values generated. What are the advantages of Sequence Generator? Is it necessary, and if so why? We can make a Sequence Generator reusable and use it in multiple mappings. We might reuse a Sequence Generator when we perform multiple loads to a single target. For example, if we have a large input file that we separate into three sessions running in parallel, we can use a Sequence Generator to generate primary key values. If we used different Sequence Generators, the Informatica Server might accidentally generate duplicate key values. Instead, we can use the same reusable Sequence Generator for all three sessions to provide a unique value for each target row. What is a Data warehouse?
A Data warehouse is a denormalized database, which stores historical data in summary level format. It is specifically meant for heavy duty querying and analysis. What is cube? Cube is a multidimensional representation of data. It is used for analysis purpose. A cube gives multiple views of data. What is drill-down and drill-up?
Both drill-down and drill-up are used to explore different levels of dimensionally modeled data. Drill-down allows the users to view a lower (i.e. more detailed) level of data, and drill-up allows the users to view a higher (i.e. more summarized) level of data. What is the need of building a data warehouse? A data warehouse acts as a storage facility for a large amount of data. It also provides end users access to a wide variety of data, helps in analyzing data more effectively and in generating reports. It acts as a huge repository of integrated information. What is Data Modeling? What are the different types of Data Modeling? Data modeling is a process of creating data models. In other words, it is structuring and organizing data in a uniform manner where constraints are placed within the structure. The data structures formed are maintained in a database management system. The different types of data modeling are: 1. Dimensional Modeling 2. E-R Modeling. What are the different types of OLAP technology? Online analytical processing is of three types: MOLAP, HOLAP and ROLAP. MOLAP: Multidimensional online analytical processing. It is used for fast retrieval of data and also for slicing and dicing operations. It plays a vital role in easing complex calculations. ROLAP: Relational online analytical processing. It has the ability to handle a large amount of data. HOLAP: Hybrid online analytical processing. It is a combination of both ROLAP and MOLAP. What is the difference between a Database and a Datawarehouse? A database is a place where data is taken as a base for data access, to retrieve and load data, whereas a data warehouse is a place where application data is managed for analysis and reporting services. A database stores data in the form of tables and columns. On the contrary, in a data warehouse, data is subject oriented and stored in the form of dimensions and packages which are used for analysis purposes.
In short, we must understand that a database is used for running an enterprise, but a data warehouse helps in deciding how to run an enterprise. What is the use of tracing levels in transformation? The tracing level, in the case of Informatica, specifies the level of detail of the information recorded in the session log file while executing the workflow. 4 types of tracing levels are supported; by default, the tracing level for every transformation is Normal. 1. Normal: logs initialization and status information, summarization of the success rows and target rows, and information about rows skipped due to transformation errors. 2. Terse: logs initialization information, error messages, and notification of rejected data (less detail than Normal). 3. Verbose Initialization: in addition to the Normal tracing, logs the location of the data cache files and index
cache files that are created, and detailed transformation statistics for each and every transformation within the mapping. 4. Verbose Data: along with Verbose Initialization, records each and every row processed by the Informatica server. For better performance of mapping execution, the tracing level should be specified as Terse. Verbose Initialization and Verbose Data are used for debugging purposes. Tracing levels store information about the mapping and transformations. What are the various types of transformations? The various types of transformation are: Aggregator Transformation, Expression Transformation, Filter Transformation, Joiner Transformation, Lookup Transformation, Normalizer Transformation, Rank Transformation, Router Transformation, Sequence Generator Transformation, Stored Procedure Transformation, Sorter Transformation, Update Strategy Transformation, XML Source Qualifier Transformation, Advanced External Procedure Transformation, External Transformation. What is the difference between active transformation and passive transformation? An active transformation can change the number of rows that pass through it, but a passive transformation cannot change the number of rows that pass through it. What is the use of control break statements? They execute a set of code within a loop and end-loop. What are the types of loading in Informatica? There are two types of loading: normal loading and bulk loading. In normal loading, it loads record by record and writes a log for each; it takes comparatively longer to load data to the target. But in bulk loading, it loads a number of records at a time to the target database, so it takes less time to load data to the target. What is the difference between source qualifier transformation and application source qualifier transformation? The source qualifier transformation extracts data from an RDBMS or from a single flat file system.
The application source qualifier transformation extracts data from application sources like ERP. How do we create a primary key only with odd numbers? To create such a primary key, we use a Sequence Generator, set the 'Start Value' to 1 (an odd number) and set the 'Increment By' property to 2.
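The same start-at-1, step-by-2 idea behind the odd-numbered key sequence can be sanity-checked with the Unix [seq] command (GNU coreutils):

```shell
# odd surrogate-key values: start 1, increment 2, stop at 9
seq 1 2 9
# prints:
# 1
# 3
# 5
# 7
# 9
```

Any start value with the same parity as the increment's offset works the same way; starting at 2 with increment 2 would give only even keys instead.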

What is an authenticator? It validates the user name and password to access the PowerCenter repository. What is the use of auxiliary mapping? An auxiliary mapping reflects a change in one table whenever there is a change in the other table. What is a mapplet? A mapplet is a set of reusable transformations. Which ETL tool is more preferable, Informatica or DataStage, and why? Preference of an ETL tool depends on affordability and functionality. It is mostly a tradeoff between price and features. While Informatica has been a market leader for many years, DataStage is beginning to pick up momentum. What is a worklet? A worklet is an object that represents a set of tasks. What is a workflow? A workflow is a set of instructions that tells the Informatica server how to execute the tasks. What is a session? A session is a set of instructions to move data from sources to targets. Why do we need SQL overrides in Lookup transformations? In order to customize the lookup query, for example to join more than one table or to filter the lookup rows, we go for SQL overrides in Lookups.

Q. What type of indexing mechanism do we need to use for a typical data warehouse?
A. On the fact table it is best to use bitmap indexes. Dimension tables can use bitmap indexes and/or the other types of clustered/non-clustered, unique/non-unique indexes.

Q. How are the dimension tables designed?

A. Most dimension tables are designed using normalization principles up to 2NF. In some instances they are further normalized to 3NF.

Q. What is the difference between E-R modeling and dimensional modeling?
A. The basic difference is that E-R modeling has both a logical and a physical model, while a dimensional model has only a physical model. E-R modeling is used for normalizing the OLTP database design; dimensional modeling is used for de-normalizing the ROLAP/MOLAP design.

Q. Why is the fact table in normal form?
A. The fact table consists of the index keys of the dimension/lookup tables and the measures. Whenever a table contains only such keys and fully dependent measures, the table is in normal form.

Q. What are the advantages of data mining over traditional approaches?
A. Data mining is used for estimating the future. For example, for a company or business organization, data mining can predict the future of the business in terms of revenue, employees, customers, orders, etc. Traditional approaches use simple algorithms for estimating the future, but they do not give results as accurate as data mining.

Q. What are the various ETL tools in the market?
A. Widely used ETL tools include Informatica PowerCenter and DataStage.

Q. What is a CUBE in the data warehousing concept?
A. Cubes are logical representations of multidimensional data. The edges of the cube contain dimension members and the body of the cube contains data values.

Q. What are the data validation strategies for data mart validation after the loading process?
A. Data validation is performed to make sure that the loaded data is accurate and meets the business requirements. Strategies are the different methods followed to meet those validation requirements.

Q. What is a data warehousing hierarchy?
A. Hierarchies are logical structures that use ordered levels as a means of organizing data. A hierarchy can be used to define data aggregation. For example, in a time dimension, a hierarchy might aggregate data from the month level

to the quarter level to the year level. A hierarchy can also be used to define a navigational drill path and to establish a family structure.
Within a hierarchy, each level is logically connected to the levels above and below it. Data values at lower levels aggregate into the data values at higher levels. A dimension can be composed of more than one hierarchy. For example, in the product dimension there might be two hierarchies: one for product categories and one for product suppliers.
Dimension hierarchies also group levels from general to granular. Query tools use hierarchies to enable you to drill down into your data to view different levels of granularity. This is one of the key benefits of a data warehouse.
When designing hierarchies, you must consider the relationships in business structures, for example a divisional multilevel sales organization. Hierarchies impose a family structure on dimension values. For a particular level value, a value at the next higher level is its parent, and values at the next lower level are its children. These familial relationships enable analysts to access data quickly.

Q. What is the difference between a view and a materialized view?
A. View: stores the SQL statement in the database and lets you use it as a table. Every time you access the view, the SQL statement executes.
Materialized view: stores the result of the SQL statement in table form in the database. The SQL statement executes only once, and after that, every time you run the query the stored result set is used. Pros include quick query results.

Q. What is the difference between Data Warehousing and Business Intelligence?
A.
Data warehousing deals with all aspects of managing the development, implementation and operation of a data warehouse or data mart, including metadata management, data acquisition, data cleansing, data transformation, storage management, data distribution, data archiving, operational reporting, analytical reporting, security management, backup/recovery planning, etc.
Business intelligence, on the other hand, is a set of software tools that enable an organization to analyze measurable aspects of its business, such as sales performance, profitability, operational efficiency, effectiveness of marketing campaigns, market penetration among certain customer groups, cost trends, anomalies and exceptions, etc. Typically, the term business intelligence is used to encompass OLAP, data visualization, data mining and query/reporting tools.
Think of the data warehouse as the back office and business intelligence as the entire business, including the back office. The business needs the back office in order to function, but a back office without a business to support makes no sense.

What is the difference between a data warehouse and BI?
A. Simply speaking, BI is the capability of analyzing the data of a data warehouse to the advantage of the business. A BI tool analyzes the data of a data warehouse and comes to some business decision depending on the result of the analysis.
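The view vs. materialized view distinction described above can be sketched with SQLite, which supports plain views but not materialized views, so the materialized view is simulated here with a one-time snapshot table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("east", 100), ("west", 200)])

# View: only the SQL statement is stored; it re-executes on every access.
conn.execute("CREATE VIEW v_sales AS "
             "SELECT region, SUM(amount) AS total FROM sales GROUP BY region")

# Simulated materialized view: the result set is stored once as a table.
conn.execute("CREATE TABLE mv_sales AS "
             "SELECT region, SUM(amount) AS total FROM sales GROUP BY region")

# New data arrives after both were defined.
conn.execute("INSERT INTO sales VALUES ('east', 50)")

view_total = conn.execute("SELECT total FROM v_sales WHERE region = 'east'").fetchone()[0]
mv_total = conn.execute("SELECT total FROM mv_sales WHERE region = 'east'").fetchone()[0]

print(view_total, mv_total)  # 150 100 -- the view is live, the snapshot is stale
```

In databases with real materialized views (e.g. Oracle), a refresh operation re-runs the stored SQL to bring the snapshot up to date.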

What is the difference between a clustered and a non-clustered index?
A clustered index is a special type of index that reorders the way records in the table are physically stored; therefore a table can have only one clustered index. The leaf nodes of a clustered index contain the data pages. A non-clustered index is a special type of index in which the logical order of the index does not match the physical order of the rows on disk.

What is the purpose of Number of Cached Values in the Sequence Generator transformation?
For a non-reusable Sequence Generator:
If the value is 0, it does not cache values. The Informatica server reads the start value from the repository and then keeps generating the sequence values. At the end of the session it updates the current value in the repository to the start value or to the last generated sequence value + 1 (based on the Reset option).
If the value is greater than 0, the Informatica server reads the start value and caches the number of values specified by Number of Cached Values, then updates the current value in the repository. It goes back to the repository again once all the cached values are used. At the end of the session it discards any remaining cached sequence numbers that were not used.
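The caching behaviour described above can be sketched as a small simulation of the repository round-trips (a hypothetical illustration, not PowerCenter internals): with a cache size of N, the repository is consulted only once per N values instead of being the source of every value.

```python
class Repository:
    """Stands in for the repository's stored current value."""
    def __init__(self, start=1):
        self.current = start
        self.trips = 0  # how many times the repository was consulted

    def reserve(self, n):
        """Reserve a block of n sequence values and advance the stored current value."""
        self.trips += 1
        block = list(range(self.current, self.current + n))
        self.current += n
        return block

class SequenceGenerator:
    def __init__(self, repo, cached_values=0):
        self.repo = repo
        # Cache size 0 behaves like fetching one value at a time.
        self.block_size = cached_values if cached_values > 0 else 1
        self.cache = []

    def next_value(self):
        if not self.cache:  # cache exhausted: go back to the repository
            self.cache = self.repo.reserve(self.block_size)
        return self.cache.pop(0)

repo = Repository()
gen = SequenceGenerator(repo, cached_values=10)
values = [gen.next_value() for _ in range(12)]  # needs two blocks of 10

print(values[:3], repo.trips)  # [1, 2, 3] 2 -- the 8 unused reserved values are discarded
```

The discarded remainder of the second block is why sequences with a large Number of Cached Values can show gaps between sessions.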