
Data warehousing concepts

By PenchalaRaju.Yanamala

What is Dimensional Modelling


Latest Answer: It is a logical design technique and a visual technique; a dimensional model can contain aggregate tables, dimension tables and fact tables ...

What is the Difference between OLTP and OLAP

Answered by swetha on 2005-03-30 12:00:33: OLTP: current data; short database transactions; online update/insert/delete; normalization is promoted; high volume of transactions.

What is a surrogate key? Where do we use it? Explain with examples.


Latest Answer: We can say a "surrogate key" is a system-generated substitute for the natural primary key.

What are Data Marts

Data Mart is a segment of a data warehouse that can provide data for reporting
and analysis on a section, unit, department or operation in the company, e.g.
sales, payroll, production. Data marts are sometimes
Latest Answer: A Data Mart is a subset of the data warehouse that caters to the needs of a specific functional domain; examples of functional domains are Sales, Finance, Marketing, HR, etc. ...

What are the methodologies of Data Warehousing.

Latest Answer: There are four methods in which one can build a datawarehouse:
1. Top-Down (emphasizes the DW)
2. Bottom-Up (emphasizes data marts)
3. Hybrid (emphasizes DW and data marts; blends "top-down" and "bottom-up" methods)
4. Federated (emphasizes the need to ...

What is a Data Warehouse?

A Data Warehouse is a repository of integrated information, available for queries and analysis. Data and information are extracted from heterogeneous sources as they are generated.... This makes it much
Latest Answer: Data Warehousing is a relational database which is specially designed for query and analysis processing rather than for transaction processing. ...

What are the various ETL tools in the Market

Latest Answer: By far, the best ETL tool on the market is Hummingbird Genio. Hummingbird is a division of OpenText; they make, among other things, connectivity and ETL software. ...

What is Fact table

Answer posted by Chintan on 2005-05-22 18:46:03: A fact table in a data warehouse contains the measures of a business process; the dimension tables joined to it contain the descriptive data from which dimensions are created.
Latest Answer: A fact table is the one which contains measures of interest at the most granular level. These values are numeric; e.g. sales amount would be a measure. Each dimension table has a single-part primary key which exactly corresponds to one of the components of the multipart ...

What is ODS

Latest Answer: ODS means Operational Data Store. The ODS and the staging layer are the two layers between the source and the target databases in the data warehouse. The ODS is used to store the recent data. ...

What are conformed dimensions

Latest Answer: A dimension which can be shared with multiple fact tables is known as a conformed dimension. ...

What is a lookup table


Latest Answer: Hi, if the data is not available in the source systems, then we have to get the data from reference tables which are present in the database; these tables are called lookup tables. For example, while loading the data from OLTP to OLAP, we have ...

What is ER Diagram
Answered by Puneet on 2005-05-07 04:21:07: ER stands for entity relationship diagrams. It is the first step in the design of a data model, which will later lead to a physical database design of possible
Latest Answer: Entity Relationship Diagrams are a major data modelling tool and will help organize the data in your project into entities and define the relationships between them. There are three basic elements in ER models: entities are the "things" about which we seek ...

What is ETL

Answered by sunitha on 2005-04-28 21:17:53: ETL is extraction, transformation and loading; ETL technology is used for extracting the information from the source database and loading it to the target database
Latest Answer: The data acquisition technique is now called ETL (Extraction, Transformation and Loading). Extraction: the process of extracting the data from various sources; sources can be file systems, databases, XML files, COBOL files, ERP, etc. Transformation: transforming the ...

What are conformed dimensions

Latest Answer: In integrated schema design, a dimension which can be shared across multiple fact tables is called a conformed dimension. ...

What is conformed fact?

Latest Answer: A fact which can be used across multiple data marts is called a conformed fact. ...

Can a dimension table contains numeric values?

Latest Answer: Absolutely! For example, a perishable product in a grocery store might have SHELF_LIFE (in days) as part of the product dimension. This value may, for example, be used to calculate optimum inventory levels for the product. Too much inventory, ...

What is a Star Schema

Answer posted by Chintan on 2005-05-22 18:34:55: A relational database schema organized around a central table (fact table) joined to a few smaller tables (dimension tables) using foreign key references.
Latest Answer: A data warehouse design that enhances the performance of
multidimensional queries on traditional relational databases. One fact table is
surrounded by a series of related tables. Data is joined from one of the points to
the center, providing a so-called ...
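For illustration, a minimal DDL sketch of a star schema (the SALES_FACT, DATE_DIM and PRODUCT_DIM names are hypothetical):

CREATE TABLE date_dim    (date_key NUMBER PRIMARY KEY, cal_date DATE);
CREATE TABLE product_dim (product_key NUMBER PRIMARY KEY, product_name VARCHAR2(50));

-- Central fact table: a foreign key to each dimension plus numeric measures
CREATE TABLE sales_fact (
  date_key    NUMBER REFERENCES date_dim(date_key),
  product_key NUMBER REFERENCES product_dim(product_key),
  sales_amt   NUMBER,
  qty_sold    NUMBER
);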

What is a Snowflake Schema

Answered by Girinath.S.V.S on 2005-03-17 06:40:48: Snowflake schemas normalize dimensions to eliminate redundancy. That is, the dimension data has been grouped into multiple tables instead of one large
Latest Answer: Any schema with extended dimensions (i.e., dimensions with one or more extensions) is known as a snowflake schema ...
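A minimal sketch of snowflaking such a dimension (PRODUCT_DIM and CATEGORY_DIM are hypothetical names): the category attributes that would sit denormalized inside a star-schema product dimension are split into their own table.

CREATE TABLE category_dim (
  category_key  NUMBER PRIMARY KEY,
  category_name VARCHAR2(50)
);

-- The product dimension references the category table instead of
-- repeating the category columns on every product row
CREATE TABLE product_dim (
  product_key  NUMBER PRIMARY KEY,
  product_name VARCHAR2(50),
  category_key NUMBER REFERENCES category_dim(category_key)
);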

What is a dimension table

Answer posted by Riaz Ahmad on 2005-06-09 14:45:26: A dimension table is a collection of hierarchies and categories along which the user can drill down and drill up. It contains only the textual attributes.
Latest Answer: A dimension table contains detail values/data and is short and wide (i.e., fewer rows and more columns than the fact table). In datawarehousing, analysis is always done along dimensions. ...

What is data mining

Answered by Puneet on 2005-05-07 04:24:28: Data mining is a process of extracting hidden trends within a datawarehouse. For example, an insurance data warehouse can be used to mine data for the most high
Latest Answer: Data Mining: in a simpler way we can define it as DWH (Data Warehouse) + AI (Artificial Intelligence), used in DSS (Decision Support Systems) ...

What type of Indexing mechanism do we need to use for a typical datawarehouse
Answered on 2005-03-23 01:45:54: bitmap index
Latest Answer: Space requirements for indexes in a warehouse are often significantly larger than the space needed to store the data, especially for the fact table and particularly if the indexes are B*trees. Hence, you may want to keep indexing on the fact table to a ...
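In Oracle, for example, a bitmap index on a low-cardinality fact table column might be created like this (SALES_FACT and PRODUCT_KEY are hypothetical names):

-- Bitmap indexes suit low-cardinality columns and star-schema joins
CREATE BITMAP INDEX sales_fact_prod_bix ON sales_fact(product_key);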
Differences between star and snowflake schemas

Answered by sudhakar on 2005-05-09 18:32:18: a star schema uses denormalized dimension tables, but a snowflake schema uses normalized dimensions to avoid redundancy...

What is the Difference between E-R Modeling and Dimensional Modeling?

Latest Answer: E-R Modeling is a model for OLTP, optimized for operational databases (insert, update, delete) and stressing relational data integrity. Dimensional Modeling is a model for OLAP, optimized for retrieving data, because it is uncommon to update ...

Why fact table is in normal form?

Latest Answer: The fact table is the central table in a star schema. The fact table is kept normalized because it is very big, so we should avoid redundant data in it. That is why we create separate dimensions, thereby making a normalized star schema model, which helps in query ...

What is junk dimension? What is the difference between a junk dimension and a degenerated dimension?

Latest Answer: A junk dimension is also called a garbage dimension. A garbage dimension is a dimension that consists of low-cardinality columns such as codes, indicators, statuses and flags. Attributes in a garbage ...

What are slowly changing dimensions

Latest Answer: The definition of a slowly changing dimension is in its name: a dimension which changes slowly with time. A customer dimension table represents customers. When creating a customer, the normal assumption is that it is independent of time. But what if the address ...

How do you load the time dimension


Latest Answer: Create a procedure to load data into the Time Dimension. The procedure needs to run only once to populate all the data. For example, the code below fills it up till 2015. You can modify the code to suit the fields in your table: CREATE OR REPLACE PROCEDURE ...
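A minimal PL/SQL sketch of such a procedure, assuming a TIME_DIM table with hypothetical DATE_KEY, CAL_DATE, CAL_YEAR, CAL_MONTH and DAY_NAME columns:

CREATE OR REPLACE PROCEDURE load_time_dim AS
  v_date DATE := TO_DATE('01-01-2000','DD-MM-YYYY');
BEGIN
  -- Insert one row per calendar day until the end of 2015
  WHILE v_date <= TO_DATE('31-12-2015','DD-MM-YYYY') LOOP
    INSERT INTO time_dim (date_key, cal_date, cal_year, cal_month, day_name)
    VALUES (TO_NUMBER(TO_CHAR(v_date,'YYYYMMDD')), v_date,
            TO_NUMBER(TO_CHAR(v_date,'YYYY')),
            TO_NUMBER(TO_CHAR(v_date,'MM')),
            TO_CHAR(v_date,'DY'));
    v_date := v_date + 1;
  END LOOP;
  COMMIT;
END;
/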

Difference between Snow Flake and Star Schema. What are situations where Snow Flake Schema is better than Star Schema to use, and when is the opposite true?

What is a linked cube?

A cube can be stored on a single analysis server and then defined as a linked
cube on other Analysis servers. End users connected to any of these analysis
servers can then access the cube. This arrangement
Latest Answer: Hi all, could you please let me know what Replicate Cube and Transparent Cube are? Thanks and regards, Amit Sagpariya ...

What is the datatype of the surrogate key

Latest Answer: It is a system-generated sequence number, an artificial key used in maintaining history. It comes up while handling slowly changing dimensions ...

For an 80GB Datawarehouse, how many records are there in the Fact Table? There are 25 Dimension and 12 Fact Tables.

How is data in the datawarehouse stored after it has been extracted and transformed from heterogeneous sources, and where does the data go from the datawarehouse?

What is the role of surrogate keys in the data warehouse, and how will you generate them?
Latest Answer: A surrogate key is a substitution for the natural primary key. We
tend to use our own Primary keys (surrogate keys) rather than depend on the
primary key that is available in the source system. When integrating the data,
trying to work with ...
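One common way to generate them in Oracle is with a sequence (a sketch; CUSTOMER_DIM and its columns are hypothetical):

CREATE SEQUENCE customer_sk_seq START WITH 1 INCREMENT BY 1;

-- The surrogate key comes from the sequence, independent of the source system's key
INSERT INTO customer_dim (customer_sk, customer_id, customer_name)
VALUES (customer_sk_seq.NEXTVAL, 'C1001', 'John Smith');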

What are the various Reporting tools in the Market

Answered by Hemakumar on 2005-04-12 05:40:50: Cognos, BusinessObjects, MicroStrategy, Actuate
Latest Answer: Dear friends, you have mentioned so many reporting tools but missed one open-source (Java-based) tool, JasperReports; unfortunately I am working on that. ...

What is Normalization, First Normal Form, Second Normal Form , Third Normal
Form

Answer posted by Badri Santhosh on 2005-05-18 09:40:29: Normalization: the process of decomposing tables to eliminate data redundancy is called normalization. 1NF: the table should contain
Latest Answer: Normalization: it is the process of efficiently organizing data in a database. There are two goals of the normalization process: 1. eliminate redundant data; 2. ensure data dependencies make sense (only storing related data in a table). First Normal ...
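A small sketch of what normalization does in practice (a hypothetical ORDERS example):

-- Unnormalized: customer details repeated on every order row
CREATE TABLE orders_flat (
  order_id  NUMBER,
  cust_id   NUMBER,
  cust_name VARCHAR2(50),
  cust_city VARCHAR2(50),
  product   VARCHAR2(50)
);

-- Normalized: the redundant customer data moves to its own table
CREATE TABLE customers (
  cust_id   NUMBER PRIMARY KEY,
  cust_name VARCHAR2(50),
  cust_city VARCHAR2(50)
);

CREATE TABLE orders (
  order_id NUMBER PRIMARY KEY,
  cust_id  NUMBER REFERENCES customers(cust_id),
  product  VARCHAR2(50)
);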

What does level of Granularity of a fact table signify

Latest Answer: Granularity is the level of detail at which measures and metrics are represented. The lowest level is called detailed data and the highest level is called summary data. The grain we choose for the fact table depends on the project. ...

What are non-additive facts

Latest Answer: Non-additive facts are facts that do not participate in arithmetic calculations. For example, in a stock fact table there will be opening and closing balances along with quantity sold, amount, etc., but opening and closing balances are never used in arithmetic ...

What is VLDB
Answered by Kiran on 2005-05-06 20:12:19: The perception of what constitutes a
VLDB continues to grow. A one terabyte database would normally be considered
to be a VLDB.
Latest Answer: Very Large Database (VLDB): the term is sometimes used to describe databases occupying magnetic storage in the terabyte range and containing billions of table rows. Typically, these are decision support systems or transaction processing applications serving large ...

What is SCD1 , SCD2 , SCD3

Latest Answer: SCD1, SCD2 and SCD3 are also called TYPE1, TYPE2 and TYPE3 dimensions. Type 1: it never maintains history in the target table; it keeps the most recently updated record in the database. Type 2: it maintains full history in the target; it maintains history by ...

Why are OLTP database designs not generally a good idea for a Data
Warehouse

Answer posted by Shri Dana on 2005-04-06 19:04:05: OLTP cannot store historical information about the organization. It is used for storing the details of daily transactions, while a datawarehouse is a huge
Latest Answer: OLTP databases are generally volatile in nature, which makes them unsuitable for datawarehouses, which we use to store historic data ...

What is a CUBE in datawarehousing concept?


Latest Answer: A CUBE is used in a DWH for representing multidimensional data logically. Using the cube, it is easy to carry out activities such as drill down/drill up and slice and dice, which enable the business users to understand the trend of the business. ...

What is the main difference between schemas in an RDBMS and schemas in a DataWarehouse?

Latest Answer: Difference between OLTP and OLAP:
OLTP schema: normalized; more transactions; less time for query execution; more users; has insert, delete and update transactions.
OLAP (DWH) schema: denormalized; fewer transactions; ...
What is meant by metadata in context of a Datawarehouse and how it is
important?

Latest Answer: Metadata is stored in the repository only, not in the data warehouse itself. But since we place our repository in a database, in that sense you are correct; it is just not stored directly in the data warehouse. Please check it. ...

What are the data types present in BO? And what happens if we implement a view in the designer and in a report?

Latest Answer: Hi Venkatesh, dimension, measure and detail are object types; data types are character, date and numeric ...

What is the definition of normalized and denormalized view and what are the
differences between them


What is the main difference between Inmon and Kimball philosophies of data
warehousing?

Latest Answer: Ralph Kimball follows a bottom-up approach, i.e., first create individual data marts from the existing sources and then create the data warehouse. Bill Inmon follows a top-down approach, i.e., first create the data warehouse from the existing ...

Explain degenerated dimension in detail.

Latest Answer: A degenerate dimension is a dimension which has only a single attribute. This dimension is typically represented as a single field in a fact table. Degenerate dimensions are the fastest way to group similar transactions. Degenerate dimensions are used when ...

What is the need of surrogate key;why primary key not used as surrogate key

Latest Answer: Datawarehousing depends on the surrogate key, not the primary key. For example, if you take the product price, it will change over time, while the product number will not; to maintain the full historical data ...
How do you connect two fact tables ? Is it possible ?

Latest Answer: The only way to connect two fact tables is by using a conformed dimension. ...
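A sketch of such a "drill-across" query through a conformed date dimension (SALES_FACT, RETURNS_FACT and DATE_KEY are hypothetical names):

-- Aggregate each fact table separately to the shared grain,
-- then join the results on the conformed dimension key
SELECT s.date_key, s.total_sales, r.total_returns
FROM  (SELECT date_key, SUM(sales_amt) AS total_sales
       FROM sales_fact GROUP BY date_key) s
JOIN  (SELECT date_key, SUM(return_amt) AS total_returns
       FROM returns_fact GROUP BY date_key) r
  ON s.date_key = r.date_key;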

Explain the flow of data starting with OLTP to OLAP, including staging, summary tables, facts and dimensions.

What are the Different methods of loading Dimension tables


Latest Answer: The answer to this depends on what kind of dimension we are loading. If it is not changing, then simply insert. If it is a slowly changing dimension of Type 1, update else insert (50% of the time); Type 2, only insert (50% of the time); Type 3, rarely used, as we ...

What are modeling tools available in the Market

Latest Answer: There is one more data modelling tool available in the market, and that is "KALIDO". This is an end-to-end data warehousing tool; it is a unique and user-friendly tool. ...

What is real time data-warehousing

Latest Answer: Real-time data warehousing means combining heterogeneous databases for query and analysis, decision-making and reporting purposes. ...

What are semi-additive and factless facts, and in which scenario would you use such kinds of fact tables?

What is degenerate dimension table?

Latest Answer: Degenerate dimensions: if a table contains values which are neither dimensions nor measures, they are called degenerate dimensions. Ex: invoice id, empno ...
What is a Data Warehousing Hierarchy?

Latest Answer: A hierarchy is an ordered series of related dimension objects grouped together to perform multidimensional analysis. Multidimensional analysis is a technique to modify the data so that it can be viewed from different perspectives and at different ...

What is the difference between view and materialized view

Latest Answer: A view is a logical reference to a database table, but a materialized view is an actual table whose data we can refresh at time intervals. If you make any change in the database table, that change will be reflected in the view, but not in the materialized view. ...
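A minimal Oracle sketch of the two (assuming the usual EMP table):

-- A view is just a stored query; it always reflects the base table
CREATE OR REPLACE VIEW emp_v AS
  SELECT empno, ename, sal FROM emp;

-- A materialized view physically stores the result and is refreshed on a schedule
CREATE MATERIALIZED VIEW dept_sal_mv
  REFRESH COMPLETE NEXT SYSDATE + 1
  AS SELECT deptno, SUM(sal) AS total_sal FROM emp GROUP BY deptno;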

What are the different architecture of datawarehouse

Latest Answer: Architecture 1: Source => Staging => DWH. Architecture 2: Source => Staging => Data marts ...

What is hybrid slowly changing dimension

Latest Answer: Hybrid SCDs are a combination of both SCD 2 and SCD 3. Whatever changes are done in the source, for each and every record there is a new entry on the target side, whether it is an UPDATE or an INSERT. A new column is added to provide the previous record's info (generally ...

What is the difference between star schema and snowflake schema, and when do we use those schemas?

Can you convert a snowflake schema into a star schema?
Latest Answer: Star -----> Snowflake and vice versa is possible. In a star schema, when we try to access many attributes or few attributes from a single dimension table, the performance of the query falls. So we normalize this dimension table into two or more sub-dimensions. ...

Explain the situations where snowflake is better than star schema

Latest Answer: A snowflake schema is a way to handle problems that do not fit within the star schema. It consists of outrigger tables which relate to dimensions rather than to the fact table. The amount of space taken up by dimensions is so small compared to the ...

What are Aggregate tables

Latest Answer: An aggregate table contains the summary of existing warehouse data, grouped to certain levels of dimensions. Retrieving the required data from the actual table, which has millions of records, will take more time and also affects the server

What is a general purpose scheduling tool

Latest Answer: A scheduling tool is a tool which is used to schedule the datawarehouse jobs. All the jobs which do some processing are scheduled using this tool, which eliminates manual intervention. ...

Which columns go to the fact table and which columns go the dimension table

Answered by Satish on 2005-04-29 08:20:29: The aggregation or calculated-value columns will go to the fact table, and detailed information will go to the dimension table.
Latest Answer: Before being broken into columns, data goes to the fact; after being broken down, it goes to dimensions ...

Why should you put your data warehouse on a different system than your
OLTP system

Latest Answer: A DW is typically used most often for intensive querying. Since the primary responsibility of an OLTP system is to faithfully record ongoing transactions (inserts/updates/deletes), these operations will be considerably slowed down by the heavy querying ...

What is the main FUNCTIONAL difference between ROLAP,MOLAP,HOLAP?

(NOT AS A RELATIONAL,MULTI, HYBRID?)


Latest Answer: The FUNCTIONAL difference between these is how the information is stored. In all cases, the users see the data as a cube of dimensions and facts. ROLAP: detailed data is stored in a relational database in 3NF, star, or snowflake form. Queries ...

Is it correct/feasible develop a Data Mart using an ODS?

The ODS is technically designed to be used as the feeder for the DW and other DMs -- yes. It is to be the source of truth. Read the complete thread at http://asktom.oracle.com/pls/ask/f?p=4950:8:16165205144590546310::NO::F4950_P8_DISPLAYID,F4950_P8_CRITERIA:30801968442845,
Latest Answer: Hi, according to Bill Inmon's paradigm, an enterprise can have one data warehouse, and data marts source their information from the data warehouse. In the data warehouse, information is stored in 3rd normal form. This data warehouse is built on the ODS. You ...

What are the possible data marts in Retail sales.?

Latest Answer: product information, store, time ...




What is BUS Schema?

Latest Answer: Bus Schema: let us consider/explain this on x,y axes. Dimension tables: A, B, C, D, E, F ...

What are the steps to build the datawarehouse


Latest Answer: 1. Understand the business requirements. 2. Once the business requirements are clear, identify the grains (levels). 3. Once grains are defined, design the dimension tables with the lowest-level grains. 4. Once the dimensions are designed, design the fact table ...

What is rapidly changing dimension?

Latest Answer: A rapidly changing dimension is a result of poor decisions during the requirements analysis and data modeling stages of the data warehousing project. If the data in the dimension table is changing a lot, it is a hint that the design should be revisited. ...

What is data cleaning? how is it done?

Latest Answer: It is a process of identifying and correcting inconsistencies and inaccuracies ...

Do you need separate space for the Datawarehouse and Data mart?


Latest Answer: I think the comments made earlier are not specific. We don't require any separate space for the data mart and the data warehouse unless the marts are too big or the client requires it. We can maintain both in the same schema. ...

What is source qualifier?

Latest Answer: The source qualifier is a transformation which extracts data from the source. The source qualifier acts as a SQL query when the source is a relational database, and it acts as a data interpreter if the source is a flat file. ...

Explain ODS and ODS types.

Latest Answer: It is designed to support operational monitoring. It is a subject-oriented, integrated database which holds current, detailed data; the data here is volatile ...

What is a level of Granularity of a fact table

Latest Answer: It means that we can have (for example) data aggregated for a year for a given product, and the same data can be drilled down to monthly, weekly and daily levels. The lowest level is known as the grain; going down to details is granularity ...

How are the Dimension tables designed

Latest Answer: Find where data for this dimension are located. Figure out how to
extract this data. Determine how to maintain changes to this dimension (see
more on this in the next section). Change fact table and DW population routines.
...

1. What is incremental loading? 2. What is batch processing? 3. What is a cross reference table? 4. What is an aggregate fact table?

Give examples of degenerated dimensions

Latest Answer: A degenerated dimension is a dimension key without a corresponding dimension table. Example: in the PointOfSale transaction fact table, we have: Date Key (FK), Product Key (FK), Store ...
What is the difference between Datawarehouse and Datawarehousing
Latest Answer: A data warehouse is a container to store the historical data, whereas data warehousing is a process or technique to analyze the data in the warehouse ...

Summarize the difference between OLTP, ODS and Data Warehouse?

Latest Answer: ODS: this is the operational data store, which means the real-time transactional databases. In data warehousing, we extract the data from the ODS, transform it in the staging area and load it into the target data warehouse. I think the earlier comments on the ODS are a little ...

What is the purpose of a "Factless Fact Table"? How is it involved in a many-to-many relationship?

What is the difference between Data modelling and Dimensional modelling?

Latest Answer: Dimensional modelling is the analysis of transactional data (facts) based on master data (dimensions). Data modeling is the process of creating a data model by applying a data model theory to create a data model instance. Regards, Sridhar Tirukovela ...

Explain the advantages of RAID 1, 1/0, and 5. On what type of RAID setup would you put your TX logs?

Latest Answer: RAID 0: makes several physical hard drives look like one hard drive. No redundancy but very fast; may be used for temporary spaces where loss of the files will not result in loss of committed data. RAID 1: mirroring. Each hard drive in the ...

What is the life cycle of data warehouse projects

Latest Answer: STRATEGY & PROJECT PLANNING: definition of scope, goals, objectives, purpose and expectations; establishment of an implementation strategy; preliminary identification of project resources; assembling of the project team; estimation of the project schedule. REQUIREMENTS ...
What is slicing and dicing? Explain with real time usage and business reasons
of it's use

Latest Answer: Hi, slicing and dicing is a feature that helps us see more detailed information about a particular thing. For example: you have a report which shows the quarterly performance of a particular product, but you want to see it ...

What is meant by an Aggregate Fact Table?

A fact table having aggregated calculations like sum, avg, sum(sal)+sum(comm); these are aggregate fact tables. Cheers, Padhu
Latest Answer: An aggregate fact table stores information that has been aggregated, or summarized, from a detail fact table. Aggregate fact tables are useful in improving query performance. Often an aggregate fact table can be maintained through the use of ...
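A sketch of building one with plain SQL (table and column names are hypothetical):

-- Summarize the detail fact to a coarser, monthly-per-product grain
CREATE TABLE sales_month_agg AS
SELECT d.cal_year,
       d.cal_month,
       f.product_key,
       SUM(f.sales_amt) AS total_sales
FROM   sales_fact f
JOIN   date_dim d ON d.date_key = f.date_key
GROUP  BY d.cal_year, d.cal_month, f.product_key;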

What is difference between BO, Microstrategy and Cognos

Latest Answer: BO is a ROLAP tool, Cognos is a MOLAP tool and MicroStrategy is a HOLAP tool ...

What is data validation strategies for data mart validation after loading process

Latest Answer: Data validation is to make sure that the loaded data is accurate and meets the business requirements. Strategies are the different methods followed to meet the validation requirements ...

Which automation tool is used in data warehouse testing?

Latest Answer: No tool testing is done in DWH; only manual testing is done.

What are the advantages of data mining over traditional approaches?

Latest Answer: Data mining is used for estimating the future. For example, if we take a company/business organization, by using the concept of data mining we can predict the future of the business in terms of revenue (or) employees (or) customers (or) orders ...

What is the difference between static and dynamic caches?

Latest Answer: A static cache stores the lookup values in memory, and it won't change throughout the run of the session, whereas a dynamic cache stores the values in memory and changes dynamically during the run of the session; it is used in SCD types, where the target ...

What is a cube, and why are we creating a cube? What is the difference between ETL and OLAP cubes? Can anybody answer, please?

What are the various attributes in the time dimension, if this dimension has to consider only the date of birth of a citizen of a country?

What are late arriving Facts and late arriving dim ? How does it impacts DW?

Latest Answer: Late arriving fact: this rarely happens in practice. For example, an HDFC credit card transaction happened on 25th Mar 2005, but we received this record on 14th Aug 2007. During this period there is a possibility of change ...

What are the various techniques in ER modelling?

Latest Answer: ER modelling is the first step for any database project, like Oracle or DB2. 1. Conceptual modelling 2. Logical modelling 3. Physical modelling ...

Explain Bill Inmon's versus Ralph Kimball's Approach to Data Warehousing.

Bill Inmon vs Ralph Kimball In the data warehousing field, we often hear about
discussions on where a person / organization's philosophy falls into Bill Inmon's
camp or into Ralph Kimball's
Latest Answer: Bill Inmon: Data warehouse -> Data mart. Ralph Kimball: Data mart -> Data warehouse. Cheers, Sithu, sithusithu@Hotmail.com ...

I want to know how to protect my data over the network. Which software should be used?

Information Packages (IP) are advanced by some authors as a way of building dimensional models, e.g. star schemas. Explain what IPs are and give an example of their use in building a dimensional model.
What are Replicate, Transparent and Linked cubes?
1) What is Data warehouse?

Data warehouse is relational database used for query analysis and reporting. By
definition data warehouse is Subject-oriented, Integrated, Non-volatile, Time
variant.

Subject oriented: the data warehouse is organized around particular subjects.

Integrated: data collected from multiple sources is integrated into a user-readable, unique format.

Non volatile: it maintains historical data.

Time variant: data is displayed weekly, monthly, yearly.

2) What is Data mart?

A subset of data warehouse is called Data mart.

3) Difference between Data warehouse and Data mart?

The data warehouse maintains the data of the total organization; multiple data marts are used in a data warehouse, whereas a data mart maintains only a particular subject.

4) Difference between OLTP and OLAP?

OLTP is Online Transaction Processing; it maintains current transactional data, which means inserts, updates and deletes must be fast. OLAP is Online Analytical Processing; it maintains historical data for analysis.

5) Explain ODS?

The operational data store is a part of the data warehouse architecture. It maintains only current transactional data; the ODS is subject-oriented, integrated, volatile and current.

6) Difference between Power Center and Power Mart?

Power Center has all product functionality, including the ability to register multiple servers, share metadata across repositories, and partition data: one repository, multiple Informatica servers. Power Mart has all features except multiple registered servers and data partitioning.

7) What is a staging area?

The staging area is a temporary storage area used for cleansing, transformation and integration rather than transaction processing. Whenever you put data into the data warehouse, you need to clean and process your data there.

8) Explain Additive, Semi-additive, Non-additive facts?

Additive fact: an additive fact can be aggregated by simple arithmetical addition across all dimensions.

Semi-additive fact: a semi-additive fact can be aggregated by simple arithmetical addition along some dimensions but not others (e.g. a balance can be summed across accounts but not across time).

Non-additive fact: a non-additive fact can't be added at all.

9) What is a Fact less Fact and example?

A fact table which has no measures, only dimension keys; for example, a table recording attendance events.

10)Explain Surrogate Key?

A surrogate key is a series of sequential numbers assigned to be the primary key for the table.

11)How many types of approaches are there in DWH?

Two approaches: Top-down (Inmon approach) and Bottom-up (Ralph Kimball approach).

12) Explain Star Schema?

A star schema consists of one or more fact tables and one or more dimension tables related through foreign keys.
Dimension tables are de-normalized; the fact table is normalized.

Advantages: less database space and simpler queries.

13) Explain Snowflake schema?

A snowflake schema normalizes dimensions to eliminate redundancy: the dimension data is grouped into multiple tables instead of one large table. Both dimension and fact tables are normalized.

14) What is a conformed dimension?

If multiple data marts use the same type of dimension, it is called a conformed dimension: a dimension that can be used with multiple fact tables.

15) Explain the DWH architecture?

Typically: source systems => staging area => data warehouse => data marts => reporting/analysis tools, with ETL moving the data between layers.

16) What is a slowly growing dimension?

Slowly growing dimensions are dimensions whose data grows over time without updates to existing rows; that means new data is appended to the existing dimension.

17) What is a slowly changing dimension?

Slowly changing dimensions are dimensions whose data grows over time with updates to existing rows.
Type 1: rows containing changes to existing dimensions are updated in the target by overwriting the existing dimension. In the Type 1 dimension mapping, all rows contain current dimension data.

Use the Type 1 dimension mapping to update a slowly changing dimension table when you do not need to keep any previous versions of dimensions in the table.

Type 2: the Type 2 dimension mapping inserts both new and changed dimensions into the target. Changes are tracked in the target table by versioning the primary key and creating a version number for each dimension in the table.

Use the Type 2 dimension/version data mapping to update a slowly changing dimension when you want to keep a full history of dimension data in the table. Version numbers and versioned primary keys track the order of changes to each dimension.

Type 3: the Type 3 dimension mapping filters source rows based on user-defined comparisons and inserts only those found to be new dimensions into the target. Rows containing changes to existing dimensions are updated in the target. When updating an existing dimension, the Informatica server saves the existing data in different columns of the same row and replaces the existing data with the updates.
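As a rough SQL sketch of the Type 2 pattern (a hypothetical CUSTOMER_DIM with surrogate key, current-flag and date columns; this illustrates the idea, not Informatica's internal logic):

-- Expire the current version of the changed customer
UPDATE customer_dim
SET    current_flag = 'N', end_date = SYSDATE
WHERE  customer_id = 'C1001' AND current_flag = 'Y';

-- Insert the new version with a fresh surrogate key
INSERT INTO customer_dim
  (customer_sk, customer_id, customer_city, current_flag, start_date)
VALUES
  (customer_sk_seq.NEXTVAL, 'C1001', 'New City', 'Y', SYSDATE);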

18) When do you use a dynamic cache?

When your target table is also the lookup table, you go for a dynamic cache. In a dynamic cache, multiple matches return an error; use only the = operator.

19) what is lookup override?

It overrides the default SQL statement. You can join multiple sources using a lookup override. By default, the Informatica server adds an ORDER BY clause.

20) Can we pass null values to the lookup transformation?

The lookup transformation returns the null value or values equal to null.

21) what is the target load order?

You specify the target load order based on the source qualifiers in a mapping. If you have multiple source qualifiers connected to multiple targets, you can designate the order in which the Informatica server loads data into the targets.

22) what is the default join that the source qualifier provides?

Inner equi join.

23) what are the differences between the joiner transformation and the source qualifier transformation?

You can join heterogeneous data sources in the joiner transformation, which we cannot achieve in the source qualifier transformation.

You need matching keys to join two relational sources in the source qualifier transformation, whereas you don't need matching keys to join two sources in the joiner.

Two relational sources must come from the same data source for the source qualifier; in the joiner you can join relational sources which come from different sources.

24) what is update strategy transformation?

Whenever you load the target table, the update strategy determines whether you store historical data or only current transaction data in the target table.

25) Describe the two levels at which the update strategy is set?

Within the session (the "treat source rows as" property) and within the mapping (the update strategy transformation flags rows for insert, update, delete or reject).

26) what is the default source option for the update strategy transformation?

Data driven.

27) What is data driven?

The Informatica server follows instructions coded into the update strategy transformations within the session mapping to determine how to flag records for insert, update, delete or reject. If you do not choose the data driven option setting, the Informatica server ignores all update strategy transformations in the mapping.

28) what are the options in the target session for the update strategy transformation?

Insert

Delete

Update

Update as update

Update as insert

Update else insert

Truncate table.

29) Difference between the source filter and the filter transformation?

The source filter filters data only from relational sources, whereas the filter transformation filters data from any type of source.

30) what is a tracing level?

Amount of information sent to log file.

-- What are the types of tracing levels?

Normal, terse, verbose data, verbose initialization.

-- Explain sequence generator transformation?

It generates numeric values through its NEXTVAL and CURRVAL output ports; it is commonly used to generate surrogate/primary key values.

-- Can you connect multiple ports from one group to multiple transformations?

Yes.

31) Can you connect more than one group to the same target or transformation?

No.

32) what is a reusable transformation?

A reusable transformation is a single transformation that can be used in multiple mappings. When you need to incorporate this transformation into a mapping, you add an instance of it to the mapping. Later, if you change the definition of the transformation, all instances of it inherit the changes, since each instance of a reusable transformation is a pointer to that transformation. You can change the transformation in the transformation developer, and its instances automatically reflect these changes. This feature can save you a great deal of work.

-- what are the methods for creating reusable transformation?

Two methods

1) Design it in the transformation developer.

2) Promote a standard transformation from the mapping designer. After you add a transformation to the mapping, you can promote it to the status of a reusable transformation.

Once you promote a standard transformation to reusable status, you cannot demote it to a standard transformation.

If you change the properties of a reusable transformation in a mapping, you can revert to the original reusable transformation properties by clicking the Revert button.

33)what are mapping parameters and mapping variables?

A mapping parameter represents a constant value that you can define before running a session. A mapping parameter retains the same value throughout the entire session.

When you use a mapping parameter, you declare and use the parameter in a mapping or mapplet, then define the value of the parameter in a parameter file for the session.

Unlike a mapping parameter, a mapping variable represents a value that can change throughout the session. The Informatica server saves the value of a mapping variable to the repository at the end of the session run and uses that value the next time you run the session.
34)Can you use the mapping parameters or variables created in one mapping in another mapping?

No; we can use mapping parameters or variables only in the transformations of the same mapping or mapplet in which we have created them.

35)Can you use the mapping parameters or variables created in one mapping in a reusable transformation?

Yes, because a reusable transformation is not contained within any mapplet or mapping.

36)How does the Informatica server sort string values in the rank transformation?

When the Informatica server runs in the ASCII data movement mode, it sorts session data using a binary sort order. If you configure the session to use a binary sort order, the Informatica server calculates the binary value of each string and returns the specified number of rows with the highest binary values for the string.

37)What is the rank index in rank transformation?

The designer automatically creates a RANKINDEX port for each rank transformation. The Informatica server uses the rank index port to store the ranking position for each record in a group. For example, if you create a rank transformation that ranks the top 5 salespersons for each quarter, the rank index numbers the salespeople from 1 to 5.

38)what is the mapplet?

A mapplet is a set of transformations that you build in the mapplet designer and can use in multiple mappings.

39)Difference between mapplet and reusable transformation?

A reusable transformation is a single transformation, whereas a mapplet uses multiple transformations.

40)what is a parameter file?

A parameter file defines the values for parameters and variables.

WORKFLOW MANAGER

41)what is a server?

The power center server moves data from source to targets based on a
workflow and mapping metadata stored in a repository.

42)what is a work flow?

A workflow is a set of instructions that describes how and when to run tasks related to the extraction, transformation and loading of data.

-- what is session?

A session is a set of instructions that describes how to move data from source to
target using a mapping.

-- what is workflow monitor?

Use the workflow monitor to monitor workflows and to stop the Power Center server.

43)explain a work flow process?

The power center server uses both process memory and system shared
memory to perform these tasks.

Load manager process: stores and locks the workflow tasks and starts the DTM to run the sessions.

Data Transformation Manager (DTM) process: performs session validations, creates threads to initialize the session, reads, writes and transforms data, and handles pre- and post-session operations.

The default memory allocation is 12,000,000 bytes.

44)What are types of threads in DTM?

The main DTM thread is called the master thread.

Mapping thread.

Transformation thread.

Reader thread.

Writer thread.

Pre-and-post session thread.

45)Explain work flow manager tools?

1) Task developer.

2) Work flow designer.

3) Worklet designer.

46)Explain work flow schedule.

You can schedule a workflow to run continuously, repeat at a given time or interval, or you can manually start a workflow. By default the workflow runs on demand.
47)Explain stopping or aborting a session task?

If the Power Center server is executing a session task when you issue the stop command, it stops reading data but continues processing, writing and committing data to the targets.

If the Power Center server can't finish processing and committing data, you issue the abort command.

You can also abort a session by using the Abort() function in the mapping
logic.

48)What is a worklet?

A worklet is an object that represents a set of tasks. It can contain any task available in the workflow manager. You can run worklets inside a workflow, and you can also nest a worklet in another worklet. The worklet manager does not provide a parameter file for worklets.

The Power Center server writes information about worklet execution in the workflow log.

49)what is a commit interval and explain the types?

A commit interval is the interval at which the Power Center server commits data to the targets during a session. The commit interval is the number of rows you want to use as a basis for the commit point.

Target Based commit: The power center server commits data based on
the number of target rows and the key constraints on the target table. The
commit point also depends on the buffer block size and the commit
interval.

Source-based commit: the Power Center server commits data based on the number of source rows that pass through active sources.

User-defined commit: the Power Center server commits data based on transaction boundaries defined in the mapping.

50)Explain bulk loading?

You can use bulk loading to improve the performance of a session that inserts a large amount of data into a DB2, Sybase, Oracle or MS SQL Server database.

When bulk loading, the Power Center server bypasses the database log, which speeds performance.

Without writing to the database log, however, the target database can't perform rollback. As a result, you may not be able to perform recovery.

51)What is constraint-based loading?

When you select this option the power center server orders the target load
on a row-by-row basis only.
Edit tasks->properties->select treat source rows as insert.

Edit tasks->config object tab->select constraint based

If the session is configured for constraint-based loading and a target table receives rows from different sources, the Power Center server reverts to normal loading for those tables, but loads all other targets in the session using constraint-based loading when possible, loading the primary key table first and then the foreign key table.

Use constraint-based loading only when the session option "treat source rows as" is set to insert.

Constraint-based load ordering is functionality which allows developers to read the source once and populate parent and child tables in a single process.

52)Explain incremental aggregation?

When using incremental aggregation, you apply captured changes in the source to the aggregate calculations in a session. If the source changes only incrementally and you can capture the changes, you can configure the session to process only those changes. This allows the Power Center server to update your target incrementally, rather than forcing it to process the entire source and recalculate the same data each time you run the session.

Use incremental aggregation when you can capture new source data each time you run the session; use a stored procedure or filter transformation to process only the new data.

Use incremental aggregation when the changes do not significantly change the target. If processing the incrementally changed source alters more than half the existing target, the session may not benefit from using incremental aggregation; in this case, drop and recreate the target with the complete source data.

53)Processing of incremental aggregation

The first time you run an incremental aggregation session, the Power Center server processes the entire source. At the end of the session, the Power Center server stores the aggregate data from the session run in two files, the index file and the data file. The Power Center server creates these files in a local directory.

Transformations.

--- what is transformation?

A transformation is a repository object that generates, modifies or passes data.

54)what are the type of transformations?


2 types:

1) active

2) passive.

-- explain active and passive transformation?

An active transformation can change the number of rows that pass through it; the number of output rows is less than or equal to the number of input rows.

A passive transformation does not change the number of rows; the number of output rows always equals the number of input rows.

55)Difference between the filter and router transformations.

The filter transformation filters the data on a single condition and drops the rows that don't meet the condition.

Dropped rows are not stored anywhere, such as the session log file.

The router transformation filters the data based on multiple conditions and gives you the option to route rows that don't match any condition to a default group.

56)what are the types of groups in the router transformation?

The router transformation has 2 kinds of groups: 1. input group 2. output groups.

Output groups are of 2 types: 1. user-defined groups 2. the default group.

57)difference between expression and aggregator transformation?

The expression transformation calculates single-row values before writing to the target; it executes on a row-by-row basis only.

The aggregator transformation allows you to perform aggregate calculations like max, min, avg, etc., and performs its calculations on groups.

58)How can you improve the session performance in the aggregator transformation?

Use sorted input.

59)what is aggregate cache in aggregate transformation?

The aggregator stores data in the aggregate cache until it completes the aggregate calculations. When you run a session that uses an aggregator transformation, the Informatica server creates index and data caches in memory to process the transformation. If the Informatica server requires more space, it stores overflow values in cache files.

60)explain joiner transformation?


Joiner transformation joins two related heterogeneous sources residing in
different locations or files.

-- What are the types of joins in the joiner transformation?

Normal

Master outer

Detail outer

Full outer

61)Difference between connected and unconnected transformations.

A connected transformation is connected to another transformation within a mapping.

An unconnected transformation is not connected to any transformation within a mapping.

62)In which conditions can we not use the joiner transformation (limitations of the joiner transformation)?

Both pipelines begin with the same original data source.

Both input pipelines originate from the same source qualifier transformation.

Both input pipelines originate from the same normalizer transformation

Both input pipelines originate from the same joiner transformation.

Either input pipelines contains an update strategy transformation

Either input pipelines contains sequence generator transformation.

63)what are the settings that you use to configure the joiner transformation?

Master and detail source.

Type of join

Condition of the join

64)what is the lookup transformation?

A lookup transformation is used to look up data in a table or view based on a condition; by default the lookup is a left outer join.

65)why use the lookup transformation?

To perform the following tasks:

Get a related value. For example, if your source table includes an employee ID but you want to include the employee name in the target.

Perform a calculation. Many tables include values used in a calculation, such as gross sales per invoice or sales tax, but not the calculated value (such as net sales).

Update slowly changing dimension tables. You can use a lookup transformation to determine whether records already exist in the target.

66)what are the types of lookup?

Connected and unconnected

67)difference between connected and unconnected lookup?

Connected lookup:
Receives input values directly from the pipeline.
You can use a dynamic or static cache.
The cache includes all lookup columns used in the mapping (that is, lookup table columns included in the lookup condition and lookup table columns linked as output ports to other transformations).
Can return multiple columns from the same row, or insert into the dynamic lookup cache.
If there is no match for the lookup condition, the Informatica server returns the default value for all output ports. If you configure dynamic caching, the Informatica server inserts rows into the cache.
Passes multiple output values to another transformation: link the lookup/output ports to another transformation.
Supports user-defined default values.

Unconnected lookup:
Receives input values from the result of a :LKP expression in another transformation.
You can use a static cache.
The cache includes all lookup/output ports in the lookup condition and the lookup/return port.
Designate one return port (R); returns one column from each row.
If there is no match for the lookup condition, the Informatica server returns NULL.
Passes one output value to another transformation: the lookup/output/return port passes the value to the transformation calling the :LKP expression.
Does not support user-defined default values.
68)explain index cache and data cache?

The Informatica server stores condition values in the index cache and output values in the data cache.

69)What are the types of lookup cache?

Persistent cache: you can save the lookup cache files and reuse them the next time the Informatica server processes a lookup transformation configured to use the cache.

Static cache: you can configure a static, or read-only, lookup cache. By default the Informatica server creates a static cache. It caches the lookup table and lookup values in the cache for each row that comes into the transformation. When the lookup condition is true, the Informatica server does not update the cache while it processes the lookup transformation.

Dynamic cache: if you want to cache the target table and insert new rows into the cache and the target, you can create a lookup transformation that uses a dynamic cache. The Informatica server dynamically inserts data into the target table.

Shared cache: you can share the lookup cache between multiple transformations. You can share an unnamed cache between transformations in the same mapping.

70)Difference between static cache and dynamic cache?

Static cache:
You cannot insert into or update the cache.
The Informatica server returns a value from the lookup table or cache when the condition is true. When the condition is not true, the Informatica server returns the default value (for connected transformations).

Dynamic cache:
You can insert rows into the cache as you pass rows to the target.
The Informatica server inserts rows into the cache when the condition is false. This indicates that the row is not in the cache or target table. You can pass these rows to the target table.

ORACLE:

71) Difference between primary key and unique key?

A primary key is NOT NULL and unique; a unique key accepts null values.

72) Difference between INSTR and SUBSTR?

INSTR returns the position of a substring within a string; SUBSTR extracts a portion of a string.

73) What is referential integrity?

Referential integrity ensures that a foreign key value always references an existing primary or unique key value in the parent table.

74) Difference between view and materialized view?

75) What is Redolog file?

The set of redo log files for a database is collectively known as the database's redo log.

76) What is RollBack statement?

A database contains one or more rollback segments to temporarily store undo information. Rollback segments are used to generate read-consistent database information, and during database recovery to roll back uncommitted transactions for users.

-- what is table space?

A database is divided into logical storage units called tablespaces. A tablespace is used to group related logical structures together.

-- How to delete the duplicate records.
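One common Oracle approach (a sketch, assuming duplicates are rows sharing the same empno):

DELETE FROM emp a
WHERE a.rowid > (SELECT MIN(b.rowid) FROM emp b WHERE b.empno = a.empno);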

-- What are the different types of joins in Oracle?

Self-join, equi-join, outer join.

77) What is outer join?

An outer join returns rows from one table even when they don't match rows in the common column of the other table.

78) Write a query for the top 5 salaries?

SELECT * FROM emp e WHERE 5 > (SELECT COUNT(*) FROM emp WHERE sal > e.sal);

79) what is a synonym?

A synonym is an alias for a table, view, sequence or other schema object.


82) What is a bitmap index, with an example?

A bitmap index stores a bitmap for each distinct key value; it suits low-cardinality columns, e.g. CREATE BITMAP INDEX idx_emp_gender ON emp(gender);

83) What is a stored procedure, and its advantages?

A stored procedure is a named PL/SQL block stored in the database; its advantages include reuse, reduced network traffic and precompiled execution.

84) Explain cursors, and how many types of triggers are there in Oracle?

A cursor is a pointer to the result set of a query. A trigger is like a stored procedure that is executed automatically when a triggering DML event (insert, update or delete) occurs; combining BEFORE/AFTER timing, row/statement level and the three DML events gives 12 trigger types.

85) Difference between function and stored procedure?


A function returns a value; a procedure does not return a value (but it can return values through IN OUT parameters!).

86) Difference between REPLACE and TRANSLATE?

REPLACE substitutes one string with another string, while TRANSLATE substitutes characters one for one.

87) Write the query for the nth max salary

SELECT DISTINCT a.sal FROM emp a WHERE &n = (SELECT COUNT(DISTINCT b.sal) FROM emp b WHERE a.sal <= b.sal);

88) Write the queries for odd- and even-numbered rows?

SELECT * FROM emp WHERE (rowid, 1) IN (SELECT rowid, MOD(ROWNUM, 2) FROM emp); -- odd rows
SELECT * FROM emp WHERE (rowid, 0) IN (SELECT rowid, MOD(ROWNUM, 2) FROM emp); -- even rows


Interview Questions

1. What are the different types of joins


1. self join
2. equi-join
3. non equi-join
4. cross join
5. natural join
6. full outer join
7. outer join
8. left outer join
9. right outer join

2. what is a sub-query? types of sub-queries? use of sub-queries?

A sub-query is nothing but a query inside a query, which most commonly appears in the WHERE clause of a select statement.
There are two types of sub-query:
a) Co-related sub-query
b) Non-co-related sub-query.
The use of a sub-query is to run or execute your sub-query according to the best execution plan available with Oracle.

3. What is a view? Types of views? Use of views? How to create a view (syntax)?

A view is nothing but a parsed SQL statement which fetches records at the time of execution.
There are mainly two types of views:
a) Simple views
b) Complex views
Apart from that, we can also subdivide views into updatable views and read-only views.
Lastly, there is another kind of view, the materialized view.
A view is used for the purposes stated below:
a) Security
b) Faster response
c) Solving complex queries
Syntax is:
CREATE OR REPLACE VIEW view_name ([column1],[column2]...)
AS
SELECT column1, column2...
FROM table_name
[WHERE condition]
[WITH READ ONLY],[WITH CHECK OPTION]

4. Briefly explain the difference between first, second, third and fourth normal forms?
First Normal Form: attributes should be atomic.
Second Normal Form: non-key attributes should be fully functionally dependent on the key attribute.
Third Normal Form: there is no transitive dependency between attributes. Suppose 'y' is dependent on 'x' (x->y) and 'z' is dependent on 'y' (y->z); this is a transitive dependency, so we can split the table into two tables to remove it.
Fourth Normal Form: a relation should have no multi-valued dependencies. (Related is BCNF: a determinant is any attribute, simple or composite, on which some other attribute is fully functionally dependent; a relation is in BCNF if, and only if, every determinant is a candidate key.)

5. Difference between Two tier architecture and Three tier architecture?


Following are the tier types in a client-server application:
a. 1-tier application: all the processing is done on one machine, and a number of clients are attached to this machine (mainframe applications).
b. 2-tier application: clients and database are on different machines. Clients are thick clients, i.e. processing is done at the client side; the application layer is on the clients.
c. 3-tier application: clients are partially thick. Apart from that, there are two more layers: the application layer and the database layer.
d. 4-tier application: some clients may be totally thin, some clients may be partially thick, and there are 3 further layers: web layer, application layer and database layer.

6. There are eno & gender columns in a table. Eno is the primary key and gender
has a check constraint for the values 'M' and 'F'.
While inserting the data into the table M was misspelled as F and F as M.
What is the update statement to replace F with M and M with F?
CREATE TABLE temp(
eno NUMBER CONSTRAINT pk_eno PRIMARY KEY,
gender CHAR(1) CHECK (gender IN ('M','F')));

INSERT INTO temp VALUES ('01','M');


INSERT INTO temp VALUES ('02','M');
INSERT INTO temp VALUES ('03','F');
INSERT INTO temp VALUES ('04','M');
INSERT INTO temp VALUES ('05','M');
INSERT INTO temp VALUES ('06','F');
INSERT INTO temp VALUES ('07','M');
INSERT INTO temp VALUES ('08','F');

COMMIT;
UPDATE temp SET gender =DECODE(gender,'M','F','F','M');

Commit;

7. What is difference between Co-related sub query and nested sub
query?
In a nested sub-query the inner query is evaluated only once and the outer query
is evaluated using that result.
In a co-related sub-query the inner query references the outer query, so it is
evaluated once for each candidate row of the outer query.
ex. A query using IN (SELECT ...) with no reference to the outer table is a nested
sub-query; a query whose inner SELECT refers to the outer table's alias is a
co-related sub-query.

8. How to find out the database name from SQL*PLUS command prompt?
SELECT INSTANCE_NAME FROM V$INSTANCE;
SELECT * FROM V$DATABASE;
SELECT * FROM GLOBAL_NAME;

9. What is Normalization?
Normalization is the process of removing redundant data from your tables in
order to improve storage efficiency, data integrity and scalability.

10. Difference between Store Procedure and Trigger


A stored procedure is a PL/SQL programming block stored in the database for
repeated execution, whereas a trigger is a PL/SQL programming block that is
executed implicitly by a data manipulation statement.

11. What is the difference between Single row sub-Query and Scalar sub-
Query
Single row sub-queries return only one row of results. A single row sub-query
uses a single row operator; the most common operator is the equality operator (=).
A scalar sub-query returns exactly one column value from one row. Scalar sub-
queries can be used in most places where you would use a column name or
expression, such as inside a single row function as an argument, in the INSERT
statement, ORDER BY clause, WHERE clause and CASE expressions, but not in
the GROUP BY or HAVING clauses.
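
For example, a scalar sub-query used in place of a column (a sketch against the
emp and dept tables):

SELECT e.ename,
       (SELECT d.dname FROM dept d WHERE d.deptno = e.deptno) AS dname
FROM   emp e;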

12. TRUNCATE TABLE EMP; DELETE FROM EMP; Will the outputs of the
above two commands be the same?
Delete command:
1. It's a DML command.
2. Data can be rolled back.
3. It's slower than the Truncate command because it logs each row deletion.
4. With the Delete command, triggers can fire.
Truncate command:
1. It's a DDL command.
2. Data cannot be rolled back.
3. It is faster than Delete because it does not log rows.
4. With the Truncate command, triggers cannot fire.
In both cases only the table data is removed, not the table structure.

13. What is the use of the DROP option in the ALTER TABLE command
The DROP option in the ALTER TABLE command is used to drop columns you no
longer need from the table.
• The column may or may not contain data.
• With the DROP COLUMN clause only one column can be dropped at a time; to
drop several columns at once use DROP (column1, column2, ...).
• The table must have at least one column remaining in it after it is altered.
• Once a column is dropped, it cannot be recovered.
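
A sketch (assuming an emp table with comm, hiredate and job columns):

ALTER TABLE emp DROP COLUMN comm;      -- drop a single column
ALTER TABLE emp DROP (hiredate, job);  -- drop several columns at once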

14. What will be the output of the following query


SELECT REPLACE(TRANSLATE(LTRIM(RTRIM('!! ATHEN !!','!'), '!'), 'AN',
'**'),'*','TROUBLE') FROM DUAL;
TROUBLETHETROUBLE
Explanation :
RTRIM and LTRIM strip the '!' characters, TRANSLATE turns 'A' and 'N' into '*',
and REPLACE then expands each '*' into the string 'TROUBLE'.

15. What will be the output of the following query


SELECT DECODE(TRANSLATE('A','1234567890','1111111111'), '1','YES',
'NO' ) FROM DUAL;
NO
Explanation :
The query checks whether a given string is a numerical digit.

16. How can one transfer LOB and user defined data from oracle to
warehouse using ETL informatica because whenever you select the source
data in informatica it shows it can take only character data.
LOBs can be transferred as text in Informatica 7.1.2.

17. what is data validation strategies for data mart validation after loading
process
Data validation strategies are often heavily influenced by the architecture for the
application. If the application is already in production it will be significantly harder
to build the optimal architecture than if the application is still in a design stage. If
a system takes a typical architectural approach of providing common services
then one common component can filter all input and output, thus optimizing the
rules and minimizing efforts.
There are three main models to think about when designing a data validation
strategy.
• Accept Only Known Valid Data
• Reject Known Bad Data
• Sanitize Bad Data
We cannot emphasize strongly enough that "Accept Only Known Valid Data" is
the best strategy. We do, however, recognize that this isn't always feasible for
political, financial or technical reasons, and so we describe the other strategies
as well.
All three methods must check:
• Data Type
• Syntax
• Length

18. In which situation context and alias are going to use?


Aliases
Aliases are logical pointers to an alternate table name. This command is dimmed
until you select a table within the Structure window. You can define aliases to
resolve the loops that Designer detected in the universe structure. This feature
works only if you have defined at least one join and all the cardinalities in the
joins have been detected.
Contexts
Contexts can be used to resolve loops in the universe. You can create contexts
manually, or have them detected by Designer. When contexts are useful,
Designer suggests a list of contexts that you can create.

19. what is the difference between ETL tool and OLAP tools?
ETL tools are used to extract, transform and load the data into the data
warehouse / data mart.
OLAP tools are used to create cubes/reports for business analysis from the data
warehouse / data mart.

20. What is Data warehousing Hierarchy?


Hierarchies
Hierarchies are logical structures that use ordered levels as a means of
organizing data. A hierarchy can be used to define data aggregation. For
example, in a time dimension, a hierarchy might aggregate data from the month
level to the quarter level to the year level. A hierarchy can also be used to define
a navigational drill path and to establish a family structure.

Within a hierarchy, each level is logically connected to the levels above and
below it. Data values at lower levels aggregate into the data values at higher
levels. A dimension can be composed of more than one hierarchy. For example,
in the product dimension, there might be two hierarchies--one for product
categories and one for product suppliers.

Dimension hierarchies also group levels from general to granular. Query tools
use hierarchies to enable you to drill down into your data to view different levels
of granularity. This is one of the key benefits of a data warehouse.

When designing hierarchies, you must consider the relationships in business
structures. For example, a divisional multilevel sales organization.

Hierarchies impose a family structure on dimension values. For a particular level
value, a value at the next higher level is its parent, and values at the next lower
level are its children. These familial relationships enable analysts to access data
quickly.

Levels
A level represents a position in a hierarchy. For example, a time dimension might
have a hierarchy that represents data at the month, quarter, and year levels.
Levels range from general to specific, with the root level as the highest or most
general level. The levels in a dimension are organized into one or more
hierarchies.

Level Relationships
Level relationships specify top-to-bottom ordering of levels from most general
(the root) to most specific information. They define the parent-child relationship
between the levels in a hierarchy.
Hierarchies are also essential components in enabling more complex rewrites.
For example, the database can aggregate an existing sales revenue on a
quarterly base to a yearly aggregation when the dimensional dependencies
between quarter and year are known.

21. what are the data types present in BO? what happens if we implement
view in the designer and report?
Three different data types: Dimensions, Measure and Detail.
View is nothing but an alias and it can be used to resolve the loops in the
universe.

22. What is surrogate key ? where we use it explain with examples


A surrogate key is a substitution for the natural primary key.

It is just a unique identifier or number for each row that can be used for the
primary key to the table. The only requirement for a surrogate primary key is that
it is unique for each row in the table.

Data warehouses typically use a surrogate key (also known as an artificial or
identity key) for the dimension tables' primary keys. They can use an Informatica
sequence generator, or an Oracle sequence, or SQL Server identity values for
the surrogate key.

It is useful because the natural primary key (i.e. Customer Number in Customer
table) can change and this makes updates more difficult.

Some tables have columns such as AIRPORT_NAME or CITY_NAME which are
stated as the primary keys (according to the business users), but not only can
these change, indexing on a numerical value is probably better, so you could
consider creating a surrogate key called, say, AIRPORT_ID. This would be
internal to the system and, as far as the client is concerned, you may display only
the AIRPORT_NAME.

Another benefit you can get from surrogate keys (SID) is :

Tracking the SCD - Slowly Changing Dimension.

Let me give you a simple, classical example:

On the 1st of January 2002, Employee 'E1' belongs to Business Unit 'BU1' (that's
what would be in your Employee Dimension). This employee has a turnover
allocated to him on the Business Unit 'BU1'. But on the 2nd of June the Employee
'E1' is moved from Business Unit 'BU1' to Business Unit 'BU2'. All the new
turnover has to belong to the new Business Unit 'BU2', but the old one should
belong to the Business Unit 'BU1'.

If you used the natural business key 'E1' for your employee within your
data warehouse, everything would be allocated to Business Unit 'BU2', even what
actually belongs to 'BU1'.
If you use surrogate keys, you could create on the 2nd of June a new record for
the Employee 'E1' in your Employee Dimension with a new surrogate key.

This way, in your fact table, you have your old data (before 2nd of June) with the
SID of the Employee 'E1' + 'BU1.' All new data (after 2nd of June) would take the
SID of the employee 'E1' + 'BU2.'

You could consider a Slowly Changing Dimension as an enlargement of your
natural key: the natural key of the Employee was Employee Code 'E1', but for you
it becomes Employee Code + Business Unit - 'E1' + 'BU1' or 'E1' + 'BU2'. But the
difference with the natural key enlargement process is that you might not have all
parts of your new key within your fact table, so you might not be able to do the
join on the new enlarged key -> so you need another id.

23. What is a linked cube?


A cube can be stored on a single Analysis Server and then defined as a linked
cube on other Analysis Servers. End users connected to any of these Analysis
Servers can then access the cube. This arrangement avoids the more costly
alternative of storing and maintaining copies of a cube on multiple Analysis
Servers. Linked cubes can be connected using TCP/IP or HTTP. To end users a
linked cube looks like a regular cube.

24. What is meant by metadata in context of a Data warehouse and how is it
important?
In the context of a data warehouse, metadata means information about the data.
This information is stored in the designer repository.

25. What is the main difference between schema in RDBMS and schemas in
Data Warehouse....?
RDBMS Schema
* Used for OLTP systems
* Traditional and old schema
* Normalized
* Difficult to understand and navigate
* Cannot easily solve complex analytical (extraction) problems
* Poorly modeled

DWH Schema
* Used for OLAP systems
* New generation schema
* De Normalized
* Easy to understand and navigate
* Complex analytical (extraction) problems can be easily solved
* Very good model

26. What is Dimensional Modelling?


Dimensional Modelling is a design concept used by many data warehouse
designers to build their data-warehouse. In this design model all the data is
stored in two types of tables - Facts table and Dimension table. Fact table
contains the facts/measurements of the business and the dimension table
contains the context of measurements i.e., the dimensions on which the facts are
calculated.

27. What is real time data-warehousing?


In real-time data warehousing, your warehouse contains completely up-to-date
data and is synchronized with the source systems that provide the source data.
In near-real-time data warehousing, there is a minimal delay between source
data being generated and being available in the data warehouse. Therefore, if
you want to achieve real-time or near-real-time updates to your data warehouse,
you’ll need to do three things:

1. Reduce or eliminate the time taken to get new and changed data out of
your source systems.
2. Eliminate, or reduce as much as possible, the time required to cleanse,
transform and load your data.
3. Reduce as much as possible the time required to update your aggregates.
Starting with version 9i, and continuing with the latest 10g release, Oracle has
gradually introduced features into the database to support real-time, and near-
real-time, data warehousing. These features include:

• Change Data Capture
• External tables, table functions, pipelining, and the MERGE command, and
• Fast refresh materialized views

28. What is a lookup table?


When a table is used to check for some data for its presence prior to loading of
some other data or the same data to another table, the table is called a LOOKUP
Table.

29. What type of Indexing mechanism do we need to use for a typical data-
warehouse?
On the fact table it is best to use bitmap indexes. Dimension tables can use
bitmap and/or the other types of clustered/non-clustered, unique/non-unique
indexes.

30. What does level of Granularity of a fact table signify?


In simple terms, level of granularity defines the extent of detail. As an example,
let us look at geographical level of granularity. We may analyze data at the levels
of COUNTRY, REGION, TERRITORY, CITY and STREET. In this case, the level
with the finest granularity (the most detail) is STREET.

31. What is data mining?


Data mining is a process of extracting hidden trends within a data-warehouse.
For example an insurance data warehouse can be used to mine data for the
most high risk people to insure in a certain geographical area.

32. What is degenerate dimension table?


The values of a dimension which are stored in the fact table are called degenerate
dimensions. These dimensions don't have dimension tables of their own.
e.g. Invoice_no, Invoice_line_no in a fact table will be degenerate dimension
(columns), provided you don't have a dimension called invoice.

33. How do you load the time dimension?


Every data warehouse maintains a time dimension. It would be at the most
granular level at which the business runs (e.g. week day, day of the month and
so on). Depending on the data loads, these time dimensions are updated: a
weekly process gets updated every week and a monthly process every month.
The time dimension in a DWH must be loaded manually; we load data into the
time dimension using PL/SQL scripts, as in the sketch below.
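
A minimal sketch (assuming a time_dim table with day_key, cal_date, day_name,
mth, qtr and yr columns, and Oracle 10g's CONNECT BY LEVEL row generator):

INSERT INTO time_dim (day_key, cal_date, day_name, mth, qtr, yr)
SELECT TO_NUMBER(TO_CHAR(d, 'YYYYMMDD')),  -- surrogate key, e.g. 20050101
       d,
       TO_CHAR(d, 'DY'),                   -- day of week
       EXTRACT(MONTH FROM d),
       TO_NUMBER(TO_CHAR(d, 'Q')),         -- quarter
       EXTRACT(YEAR FROM d)
FROM  (SELECT DATE '2005-01-01' + LEVEL - 1 AS d
       FROM   dual
       CONNECT BY LEVEL <= 365);           -- one row per day of 2005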

34. What is ER Diagram ?


ER stands for entity relationship diagrams. It is the first step in the design of a
data model which will later lead to a physical database design of a possible
OLTP or OLAP database.

35. Difference between Snow flake and Star Schema?


Star schema contains the dimension tables mapped around one or more fact
tables.
It is a denormalised model.
There is no need to use complicated joins.
Queries return results quickly.

Snowflake schema
It is the normalised form of the star schema.
It contains in-depth joins, because the tables are split into many pieces. We
can easily make modifications directly in the tables.
We have to use complicated joins, since we have more tables.
There will be some delay in processing the query.

36. What is a CUBE in data warehousing concept?


Cubes are logical representation of multidimensional data. The edge of the cube
contains dimension members and the body of the cube contains data values.

37. What are non-additive facts?


Fact table typically has two types of columns: those that contain numeric facts
(often called measurements), and those that are foreign keys to dimension
tables.
A fact table contains either detail-level facts or facts that have been aggregated.
Fact tables that contain aggregated facts are often called summary tables. A fact
table usually contains facts with the same level of aggregation.
Though most facts are additive, they can also be semi-additive or non-additive.
Additive facts can be aggregated by simple arithmetical addition. A common
example of this is sales. Non-additive facts cannot be added at all.
An example of this is averages. Semi-additive facts can be aggregated along
some of the dimensions and not along others. An example of this is inventory
levels, where you cannot tell what a level means simply by looking at it.

38. How are the Dimension tables designed?


Most dimension tables are designed using normalization principles up to 2NF. In
some instances they are further normalized to 3NF.
39. What are Semi-additive and factless facts and in which scenario will
you use such kinds of fact tables?
Semi-Additive: Semi-additive facts are facts that can be summed up for some of
the dimensions in the fact table, but not the others. For example:
Current_Balance and Profit_Margin are the facts. Current_Balance is a semi-
additive fact, as it makes sense to add them up for all accounts (what's the total
current balance for all accounts in the bank?), but it does not make sense to add
them up through time (adding up all current balances for a given account for
each day of the month does not give us any useful information).

A factless fact table captures the many-to-many relationships between
dimensions, but contains no numeric or textual facts. They are often used to
record events or coverage information. Common examples of factless fact tables
include:
- Identifying product promotion events (to determine promoted products that
didn’t sell)
- Tracking student attendance or registration events
- Tracking insurance-related accident events
- Identifying building, facility, and equipment schedules for a hospital or university

40. What are the Different methods of loading Dimension tables?


Conventional Load:
Before loading the data, all the Table constraints will be checked against the
data.
Direct load:(Faster Loading)
All the Constraints will be disabled. Data will be loaded directly. Later the data
will be checked against the table constraints and the bad data won't be indexed.

41. What are Aggregate tables?


An aggregate table contains the summary of existing warehouse data, grouped to
certain levels of dimensions. Retrieving the required data from the actual table,
which may have millions of records, will take more time and also affect the
server performance. To avoid this we can aggregate the table to a certain
required level and use it. These tables reduce the load on the database server,
increase the performance of the query and retrieve the result very quickly.

42. What is active data warehousing?


An active data warehouse provides information that enables decision-makers
within an organization to manage customer relationships nimbly, efficiently and
proactively. Active data warehousing is all about integrating advanced decision
support with day-to-day (even minute-to-minute) decision making in a way that
increases the quality of those customer touches, which encourages customer
loyalty and thus secures an organization's bottom line. The marketplace is coming
of age as we progress from first-generation "passive" decision-support systems to
current- and next-generation "active" data warehouse implementations.

43. Why Denormalization is promoted in Universe Designing?


In a relational data model, for normalization purposes, some lookup tables are
not merged into a single table. In a dimensional data model (star schema), these
tables are merged into a single table called a DIMENSION table, for performance
and for slicing data. Merging the tables into one large dimension table eliminates
the complex intermediate joins: dimension tables are directly joined to fact tables.
Though redundancy of data occurs in the DIMENSION table, its size is typically
only about 15% of the FACT table. That is why denormalization is promoted in
Universe designing.

44. what is the metadata extension?


Informatica allows end users and partners to extend the metadata stored in the
repository by associating information with individual objects in the repository. For
example, when you create a mapping, you can store your contact information
with the mapping. You associate information with repository metadata using
metadata extensions.
Informatica Client applications can contain the following types of metadata
extensions:
Vendor-defined. Third-party application vendors create vendor-defined
metadata extensions. You can view and change the values of vendor-defined
metadata extensions, but you cannot create, delete, or redefine them.
User-defined. You create user-defined metadata extensions using
PowerCenter/PowerMart. You can create, edit, delete, and view user-defined
metadata extensions. You can also change the values of user-defined extensions

45. What is a Metadata?


Data that is used to describe other data. Data definitions are sometimes referred
to as metadata. Examples of metadata include schema, table, index, view and
column definitions.

46. What are the types of metadata that stores in repository?


• Source definitions. Definitions of database objects (tables, views,
synonyms) or files that provide source data.
• Target definitions. Definitions of database objects or files that contain the
target data.
• Multi-dimensional metadata. Target definitions that are configured as
cubes and dimensions.
• Mappings. A set of source and target definitions along with
transformations containing business logic that you build into the
transformation. These are the instructions that the Informatica Server uses
to transform and move data.
• Reusable transformations. Transformations that you can use in multiple
mappings.
• Mapplets. A set of transformations that you can use in multiple mappings.
• Sessions and workflows. Sessions and workflows store information
about how and when the Informatica Server moves data. A workflow is a
set of instructions that describes how and when to run tasks related to
extracting, transforming, and loading data. A session is a type of task that
you can put in a workflow. Each session corresponds to a single mapping.

47. What is Informatica Metadata and where is it stored?


Informatica Metadata contains all the information about the source tables, target
tables, the transformations, so that it will be useful and easy to perform
transformations during the ETL process.
The Informatica Metadata is stored in Informatica repository.

48. Define informatica repository?


The Informatica repository is a relational database that stores information, or
metadata, used by the Informatica Server and Client tools. Metadata can include
information such as mappings describing how to transform source data, sessions
indicating when you want the Informatica Server to perform the transformations,
and connect strings for sources and targets.
The repository also stores administrative information such as usernames and
passwords, permissions and privileges, and product version.
Use the Repository Manager to create the repository. The Repository Manager
connects to the repository database and runs the code needed to create the
repository tables. These tables store metadata in a specific format that the
Informatica server and client tools use.

49. What is power center repository?


The PowerCenter repository allows you to share metadata across repositories to
create a data mart domain. In a data mart domain, you can create a single global
repository to store metadata used across an enterprise, and a number of local
repositories to share the global metadata as needed.

50. What is metadata reporter?


It is a web-based application that enables you to run reports against repository
metadata. With a metadata reporter, you can access information about your
repository without having knowledge of SQL, the transformation language or the
underlying tables in the repository.

51. What does the Metadata Application Programming Interface (API) allow
you to do?
A. Repair damaged data dictionary entries.
B. Delete data dictionary information about database objects you no longer need.
C. Extract data definition commands from the data dictionary in a variety of
formats.
D. Prepare pseudocode modules for conversion to Java or PL/SQL programs
with a Metadata code generator.
Answer: C. The Metadata API (the DBMS_METADATA package) extracts data
definition (DDL) commands from the data dictionary in a variety of formats.

52. Why you use repository connectivity?


Each time you edit or schedule a session, the Informatica server directly
communicates with the repository to check whether or not the session and users
are valid. All the metadata of sessions and mappings is stored in the repository.

53. If I make any modifications to my table in the back end, does it reflect in
the Informatica warehouse or mapping designer or source analyzer?
No. Informatica is not at all concerned with the back-end database. It displays all
the information that is stored in its repository. If you want to reflect back-end
changes in the Informatica screens, you have to import from the back end into
Informatica again, using a valid connection, and replace the existing definitions
with the imported ones.
54. What’s the diff between Informatica, powercenter server, repository
server and repository?
The Powercenter server contains the scheduled runs at which time data should
load from source to target.
The repository contains all the definitions of the mappings done in the Designer.

55. What are the tasks that the Load Manager process will do?
Manages session and batch scheduling: when you start the Informatica server,
the load manager launches and queries the repository for a list of sessions
configured to run on the Informatica server. When you configure the session, the
load manager maintains a list of sessions and session start times. When you
start a session, the load manager fetches the session information from the
repository to perform validations and verifications prior to starting the DTM
process.
Locking and reading the session: when the Informatica server starts a session,
the load manager locks the session in the repository. Locking prevents you from
starting the same session again and again.
Reading the parameter file: if the session uses a parameter file, the load manager
reads the parameter file and verifies that the session-level parameters are
declared in the file.
Verifies permissions and privileges: when the session starts, the load manager
checks whether or not the user has privileges to run the session.
Creating log files: the load manager creates a log file containing the status of the
session.

56. What are the mapping parameters and mapping variables?


Mapping parameter represents a constant value that you can define before
running a session. A mapping parameter retains the same value throughout the
entire session. When you use a mapping parameter, you declare and use the
parameter in a mapping or mapplet, then define the value of the parameter in a
parameter file for the session.
Unlike a mapping parameter, a mapping variable represents a value that can
change throughout the session. The informatica server saves the value of
mapping variable to the repository at the end of session run and uses that value
next time you run the session.

57. What are the rank caches?


During the session, the informatica server compares an input row with rows in
the datacache. If the input row out-ranks a stored row, the informatica server
replaces the stored row with the input row. The informatica server stores group
information in an index cache and row data in a data cache.

58. What is the status code?


The status code provides error handling for the Informatica server during the
session. The stored procedure issues a status code that notifies whether or not
the stored procedure completed successfully. This value cannot be seen by the
user; it is only used by the Informatica server to determine whether to continue
running the session or to stop.

59. What are the tasks that source qualifier performs?


1. Join data originating from the same source database.
2. Filter records when the informatica server reads source data.
3. Specify an outer join rather than the default inner join
4. Specify sorted records.
5. Select only distinct values from the source.
6. Creating custom query to issue a special SELECT statement for the
informatica server to read source data.

60. What are parameter files ? Where do we use them?


A parameter file is any text file where you can define a value for a parameter
defined in the Informatica session. This parameter file can be referenced in the
session properties; when the Informatica session runs, the values for the
parameters are fetched from the specified file.
For e.g.: $$ABC is defined in the Informatica mapping and the value for this
variable is defined in a file called abc.txt as:
[foldername_session_name]
$$ABC=hello world
In the session properties you can give abc.txt in the parameter file name field.

61. What is a mapping, session, worklet, workflow, mapplet?


Mapping - represents the flow and transformation of data from source to target.
Mapplet - a group of transformations that can be called within a mapping.
Session - a task associated with a mapping to define the connections and other
configurations for that mapping.
Workflow - controls the execution of tasks such as commands, emails and
sessions.
Worklet - a workflow that can be called within a workflow.

62. What is the difference between Power Center & Power Mart?
Power Center: we can connect to single and multiple repositories; generally used
in big enterprises. It also provides ERP source support.
Power Mart: we can connect to only a single repository.

63. Can Informatica load heterogeneous targets from heterogeneous


sources?
Yes

64. What are snapshots?


A snapshot is a table that contains the results of a query of one or more tables or
views, often located on a remote database.

65. What are materialized views ?


A materialized view is a view in which the data is also physically stored, i.e. with
the ordinary view concept in the DB we only store the query, and once we call the
view it extracts data from the DB. But in a materialized view the data is stored in
its own table and is refreshed periodically or on demand.
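
A sketch (assuming the emp table; REFRESH COMPLETE ON DEMAND is one of
several refresh options):

CREATE MATERIALIZED VIEW dept_sal_mv
BUILD IMMEDIATE
REFRESH COMPLETE ON DEMAND
AS
SELECT deptno, SUM(sal) AS total_sal
FROM   emp
GROUP  BY deptno;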

66. What is partitioning?


Partitioning is a part of physical data warehouse design that is carried out to
improve performance and simplify stored-data management. Partitioning is done
to break up a large table into smaller, independently-manageable components
because it:
1. reduces work involved with addition of new data.
2. reduces work involved with purging of old data.

67. What are the types of partitioning?


Two types of partitioning are:
1. Horizontal partitioning.
2. Vertical partitioning (reduces efficiency in the context of a data warehouse).
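
For example, horizontal partitioning by range in Oracle (a sketch):

CREATE TABLE sales_fact (
  sale_date DATE,
  amount    NUMBER
)
PARTITION BY RANGE (sale_date) (
  PARTITION p2004 VALUES LESS THAN (DATE '2005-01-01'),
  PARTITION p2005 VALUES LESS THAN (DATE '2006-01-01')
);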

68. What is Full load & Incremental or Refresh load?


Full Load is the entire data dump load taking place the very first time.
Gradually to synchronize the target data with source data, there are further 2
techniques:-
Refresh load - Where the existing data is truncated and reloaded completely.
Incremental - Where delta or difference between target and source data is
dumped at regular intervals. Timestamp for previous delta load has to be
maintained.

69. What are the modules in Power Mart?


1. PowerMart Designer
2. Server
3. Server Manager
4. Repository
5. Repository Manager

70. What is a staging area? Do we need it? What is the purpose of a staging
area?
A staging area is a place where you hold temporary tables on the data warehouse
server. Staging tables are connected to the work area or fact tables. We basically
need a staging area to hold the data and perform data cleansing and merging
before loading the data into the warehouse.

71. How to determine what records to extract?


When addressing a table, some dimension key must reflect the need for a record
to get extracted. Mostly it will be from the time dimension (e.g. date >= 1st of
current month) or a transaction flag (e.g. Order Invoiced Status). A foolproof
method would be adding an archive flag to each record, which gets reset when
the record changes.

72. What are the various transformations available?


Aggregator Transformation
Expression Transformation
Filter Transformation
Joiner Transformation
Lookup Transformation
Normalizer Transformation
External Transformation

73. What is a three tier data warehouse?


A three tier data warehouse contains three tiers: bottom tier, middle tier and top
tier.
The bottom tier deals with retrieving related data or information from various
information repositories by using SQL.
The middle tier contains two types of servers:
1. ROLAP server
2. MOLAP server
The top tier deals with presentation or visualization of the results.

74. How can we use mapping variables in Informatica? Where do we use
them?
After creating a variable, we can use it in any expression in a mapping or a
mapplet. They can also be used in a source qualifier filter, in user-defined joins or
extract overrides, and in the expression editor of reusable transformations.
Their values can change automatically between sessions.

75. Techniques of Error Handling - Ignore, Rejecting bad records to a flat
file, loading the records and reviewing them (default values)
Records are rejected either at the database, due to constraint key violations, or
by the Informatica server when writing data into the target table. These rejected
records can be found in the badfiles folder, where a reject file is created for each
session; there we can check why a record has been rejected. This bad file
contains a row indicator in the first column and a column indicator in the second
column.
The row indicators are of four types:
D - valid data,
O - overflowed data,
N - null data,
T - truncated data.
Depending on these indicators we can make changes to load the data
successfully into the target.

76. How do we call shell scripts from informatica?


You can use a Command task to call the shell scripts, in the following ways:
1. Standalone Command task. You can use a Command task anywhere in the
workflow or worklet to run shell commands.
2. Pre- and post-session shell command: you can call a Command task as the
pre- or post-session shell command for a Session task.

77. What are active transformation / Passive transformations?


An active transformation can change the number of rows as output after a
transformation, while a passive transformation does not change the number of
rows and passes through the same number of rows that was given to it as input.

78. How to use mapping parameters and what is their use?


In the Designer you will find the mapping parameters and variables options; you
can assign a value to them in the Designer. Coming to their uses: suppose you
are doing incremental extractions daily, and suppose your source system
contains a day column. Every day you would have to go to that mapping and
change the day so that the particular data will be extracted; doing that by hand is
a layman's work. There comes the concept of mapping parameters and variables:
once you assign a value to a mapping variable, it will change between sessions.
(See the sketch below.)
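
For illustration (a sketch; $$RUN_DATE is a hypothetical mapping parameter
defined in the parameter file), the Source Qualifier source filter could be written
as:

day_col = TO_DATE('$$RUN_DATE', 'YYYY-MM-DD')
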
79. How to delete duplicate rows in flat file sources - is there any option in
informatica?
Use a Sorter transformation; it has a "Distinct" option - make use of it to remove
duplicate rows.
