DWH Interview Questions

15.10.2022, 11:54 DWH Interview Questions – Data Warehousing
Data Warehousing
dwhbeginners.wordpress.com
DWH Interview Questions
DWH INTERVIEW QUESTIONS (http://sqlage.blogspot.in/2013/07/dwh-interview-questions.html)
1. What is Data Warehouse?
Different people have different definitions for a data warehouse. The most popular definition came
from Bill Inmon, who provided the following:
A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in
support of management’s decision making process.
Subject-Oriented: A data warehouse can be used to analyze a particular subject area. For example,
“sales” can be a particular subject.
Integrated: A data warehouse integrates data from multiple data sources. For example, source A and
source B may have different ways of identifying a product, but in a data warehouse, there will be
only a single way of identifying a product.
Time-Variant: Historical data is kept in a data warehouse. For example, one can retrieve data from 3
months, 6 months, 12 months, or even older data from a data warehouse. This contrasts with a
transaction system, where often only the most recent data is kept. For example, a transaction system
may hold the most recent address of a customer, whereas a data warehouse can hold all addresses
associated with a customer.
Non-volatile: Once data is in the data warehouse, it will not change. So, historical data in a data
warehouse should never be altered.
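The integrated, time-variant and non-volatile properties can be illustrated with a small sketch. Everything here is an assumption made for illustration: the product codes, the mapping table, and the class names are hypothetical and not taken from any real system.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Hypothetical mapping from source-specific product codes to one warehouse key.
PRODUCT_KEY_MAP = {
    ("source_a", "SKU-001"): 1,
    ("source_b", "P1"): 1,  # same product, identified differently in source B
    ("source_b", "P2"): 2,
}


def warehouse_product_key(source: str, source_code: str) -> int:
    """Integrated: resolve a source-specific code to the single warehouse key."""
    return PRODUCT_KEY_MAP[(source, source_code)]


@dataclass(frozen=True)  # frozen: a stored row is never modified (non-volatile)
class AddressRow:
    customer_id: int
    address: str
    valid_from: date


class AddressHistory:
    """Time-variant: every address of a customer is kept, not just the latest."""

    def __init__(self) -> None:
        self._rows: List[AddressRow] = []

    def record(self, customer_id: int, address: str, valid_from: date) -> None:
        # Append-only: existing rows are left untouched.
        self._rows.append(AddressRow(customer_id, address, valid_from))

    def all_addresses(self, customer_id: int) -> List[str]:
        rows = [r for r in self._rows if r.customer_id == customer_id]
        return [r.address for r in sorted(rows, key=lambda r: r.valid_from)]

    def address_as_of(self, customer_id: int, day: date) -> Optional[str]:
        # Pick the latest address that was already valid on the given day.
        rows = [r for r in self._rows
                if r.customer_id == customer_id and r.valid_from <= day]
        return max(rows, key=lambda r: r.valid_from).address if rows else None
```

A transaction system would typically overwrite the address in place; here a change of address adds a row, so a question like "where did this customer live last year?" can still be answered.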
Ralph Kimball provided a more concise definition of a data warehouse:
A data warehouse is a copy of transaction data specifically structured for query and analysis.
This is a functional view of a data warehouse. Unlike Inmon, Kimball did not address how the data
warehouse is built; rather, he focused on the functionality of a data warehouse.
Or
Data warehouse or enterprise data warehouse (DW, DWH, or EDW) is a database
(http://en.wikipedia.org/wiki/Database) used for reporting
(http://en.wikipedia.org/wiki/Business_reporting) and data analysis
(http://en.wikipedia.org/wiki/Data_analysis). It is a central repository of data which is created by
integrating data from one or more disparate sources. Data warehouses store current as well as
historical data and are used for creating trending reports for senior management reporting such as
annual and quarterly comparisons.
2. Why do we need a Data Warehouse?
It is a central repository of data which is created by integrating data from one or more disparate
sources. Data warehouses store current as well as historical data and are used for creating trending
reports for senior management reporting such as annual and quarterly comparisons for decision
support and business intelligence.
Or
A lot of times people question the value of data warehousing. Why do we spend 1 year building a
data warehouse? We can’t wait that long. Let’s just install QlikView/Spotfire and feed the transaction
system direct to it and we have a BI!
Absolutely! You can. You can buy BO, MicroStrategy, QlikView, Spotfire or any BI tool you like, and
then report straight from the transaction system. Or, if you fancy, you can create a cube first (SSAS,
Cognos or Hyperion), then install appropriate client tool (Tableau, Strategy Companion, etc). Of
course you can. And this is the best way to learn about Data Warehousing: by not doing it.
What you will experience is:
a) Data quality issues
b) Low confidence level from users
c) Quick turn around of report but data is unstable
d) Issues with data consistency
e) Issues with performance
The whole year spent on building a data warehouse is essentially spent on providing a quality data source. A
data warehouse has the following characteristics:
a) Integrated
b) Consistent
c) Contains historical data
d) Tested and verified
e) Performant
A data warehouse integrates data from multiple sources correctly. This integration doesn’t happen
overnight. A Business Analyst spent weeks analysing the sources and wrote down a specification of
how the data should be integrated. A Data Architect looked at that spec and designed a performant
star schema to host the data. An ETL Architect looked at the star schema design and wrote an ETL
population spec. An ETL developer studied the ETL spec and built the workflows. And finally, a
tester verified the data.
That takes months, but as a result, we have an integrated, consistent, clean data source containing the
correct and valid data. And it is performant. Your query doesn’t need to join 15 tables in a horrible
way. All the data is in a centralised place, ready for you to query.
Top Five Benefits of a Data Warehouse (http://spotfire.tibco.com/blog/?p=7597)
According to The Data Warehousing Institute (http://tdwi.org/portals/data-warehousing.aspx), a data
warehouse is the foundation for a successful BI program. The concept of data warehousing is pretty
easy to understand—to create a central location and permanent storage space for the various data
sources needed to support a company’s analysis, reporting and other BI functions.
And it’s really important for your business.
But a data warehouse also costs money — big money. The problem is when big money is involved it’s
tough to justify spending it on any project, especially when you can’t really quantify the benefits
upfront. When it comes to a data warehouse, it’s not easy to know what the benefits are until it’s up
and running. According to BI-Insider.com (http://bi-insider.com/portfolio/benefits-of-a-data-warehouse/),
here are the key benefits of a data warehouse once it’s launched.
A Data Warehouse Delivers Enhanced Business Intelligence
By providing data from various sources, managers and executives will no longer need to make
business decisions based on limited data or their gut. In addition, “data warehouses and related BI
can be applied directly to business processes including marketing segmentation, inventory
management, financial management, and sales.”
A Data Warehouse Saves Time
Since business users can quickly access critical data from a number of sources—all in one place—they
can rapidly make informed decisions on key initiatives. They won’t waste precious time retrieving
data from multiple sources.
Not only that but the business execs can query the data themselves with little or no support from IT—
saving more time and more money. That means the business users won’t have to wait until IT gets
around to generating the reports, and those hardworking folks in IT can do what they do best—keep
the business running.
A Data Warehouse Enhances Data Quality and Consistency
A data warehouse implementation includes the conversion of data from numerous source
systems into a common format. Since the data from the various departments is standardized, each
department will produce results that are in line with all the other departments. So you can have more
confidence in the accuracy of your data. And accurate data is the basis for strong business decisions.
A Data Warehouse Provides Historical Intelligence
A data warehouse stores large amounts of historical data so you can analyze different time periods
and trends in order to make future predictions. Such data typically cannot be stored in a transactional
database or used to generate reports from a transactional system.
A Data Warehouse Generates a High ROI
Finally, the piece de resistance—return on investment. Companies that have implemented data
warehouses and complementary BI systems have generated more revenue
(http://searchsqlserver.techtarget.com/tip/The-IDC-data-warehousing-ROI-study-An-analysis) and
saved more money than companies that haven’t invested in BI systems and data warehouses.
And that should be reason enough for senior management to jump on the data warehouse
bandwagon.
3. What are the goals of a data warehouse?

Goals of a Data Warehouse
Make an organization’s information easily accessible
The contents of the data warehouse must be understandable and be intuitive and obvious to the
business user. The contents of the data warehouse need to be labeled meaningfully. The tools that
access the data warehouse must be simple and easy to use. They also must return query results to the
user with minimal wait times.
Present the organization’s information consistently
Consistent information means high-quality information. It means that all the data is accounted for
and complete. Consistency also implies that common definitions for the contents of the data
warehouse are available for users.
Be adaptive and resilient to change
We simply can’t avoid change. User needs, business conditions, data, and technology are all subject
to the shifting sands of time. The data warehouse must be designed to handle this inevitable change.
Be a secure bastion that protects our information assets
The data warehouse must effectively control access to the organization’s confidential information.
Serve as the foundation for improved decision making
The data warehouse must have the right data in it to support decision making.
4. What are the challenges and issues of a data warehouse?
Configuration and change management
Configuration and change management is probably the largest single issue affecting data
warehouse implementation and maintenance. It operates at every level of the organization and is often
the “elephant in the room”: we all know it is there, and we know that we don’t do enough about it,
but nobody talks about it.
Managing and improving data quality
Data quality is often considered a major issue because of the garbage-in garbage-out principle. Most
data warehouses faithfully reproduce any data quality issues in the source system, even amplifying
some of them.
Engagement with the enterprise architecture
According to Peter Weill, Director of the MIT Center for Information Systems Research, Enterprise
Architecture is the organizing logic for business processes and IT infrastructure, reflecting the
integration and standardization requirements of the firm’s operating model.
In practice, Enterprise Architecture depends on how an organization’s strategy and architecture
teams define the current state of processes and systems, how they define the future state of those
processes and systems, and how they build a migration path between the current and future states.
Enhancing return on investment
The on-going cost of running a data warehouse, especially in times of economic hardship, is often
questioned. It is therefore common to look for ways to improve the return on investment. This can be
done in one of two ways: by gaining more financial benefit from the output, or by reducing the cost to
manage and maintain the system.
Or
Carving out a data warehouse can look like a straightforward task on the surface. The path of least
resistance would seem to be to replicate the parent’s environment as-is in the carve-out organization,
using the exact same software and architecture. However, this is not always going to be the best
solution. In most cases, the new organization is much smaller than the parent organization and will
have significantly fewer capabilities to support the data warehouse than the parent. In addition,
changes within both the parent and the carve-out organizations during the transition period can lead
to an implementation in the new organization that needs to be different than the parent’s
implementation.
Change can be a significant component during carve-outs. The capabilities and needs of the carved
out organization will not always match up with those of the parent. Therefore, the implementation
needs to adapt to match up with those needs and capabilities. Here are a few major challenges that
you need to be prepared to tackle:
High software licensing costs: Large organizations with mature BI platforms usually have expensive
database, ETL and reporting tools to power their data warehouses and reporting systems. However,
chances are that the budget of the new organization is much smaller than the parent’s budget and the
licensing and support cost of those tools can exceed what a smaller organization can afford. With this
in mind, make sure to analyze what the carved out organization needs to do business and whether
any lower-cost alternatives are feasible. The market for BI tools is more mature than it was just a few
years ago, and lower-cost alternatives can prove to be just as capable as the tools provided by the
traditional high-cost vendors. While moving to a new set of tools may lengthen the time needed for
the carve-out effort or increase the amount of resources needed to perform the implementation, the
effort can pay itself off over the long run in lower licensing and ongoing support costs.
Complex architecture: This can be a great opportunity to rework the architecture without significant
additional impact to the business or the IT staff. Usually the parent company does business in
vertical or geographic markets that the new company will not be participating in, and complexity
related to this can be eliminated. In addition, the parent may have gone through acquisitions and
mergers that resulted in additional complexity while trying to adapt an acquisition target’s data into
an existing data warehouse. Simplifying the architecture where possible can ease the transition of
responsibility to production support staff, as the IT staff may be much more limited in skills and
manpower than the parent. A simplified architecture can reduce the timeline of the implementation
as well.
Code changes at the parent organization: While the carve-out is occurring, the parent organization
will continue conducting its day-to-day operations. This includes their IT operations, as the parent’s
IT staff continues maintenance of code in the data warehouse. This can lead to issues during building
and testing as developers attempt to hit a moving target and users attempt to validate results in the
new environment against the parent’s environment. Make sure to have a process in place to get logic
and code changes communicated to the new organization, as well as a change control process to
prioritize logic and code changes for evaluation as they come in.
As the case usually is for implementation projects, the right amount of planning will help ensure a
successful carve-out. Well-calculated changes in the right places can help ensure a solution that
meets the needs and capabilities of the new organization, and the result will be a data warehouse that
can well serve the organization.
5. What are the different architectural components of a data warehouse?
Different data warehousing systems have different structures. Some may have an ODS (operational
data store), while some may have multiple data marts. Some may have a small number of data
sources, while some may have dozens of data sources. In view of this, it is far more reasonable to
present the different layers of data warehouse architecture rather than discussing the specifics of any
one system.
In general, all data warehouse systems have the following layers:
Data Source Layer
Data Extraction Layer
Staging Area
ETL Layer
Data Storage Layer
Data Logic Layer
Data Presentation Layer
Metadata Layer
System Operations Layer
The picture below shows the relationships among the different components of the data warehouse
architecture:
(https://dwhbeginners.files.wordpress.com/2013/09/data-warehouse-architecture1.jpg)
Each component is discussed individually below:
Data Source Layer
This represents the different data sources that feed data into the data warehouse. The data source can
be of any format — plain text file, relational database, other types of database, Excel file, etc., can all
act as a data source.
Many different types of data can be a data source:
Operations — such as sales data, HR data, product data, inventory data, marketing data,
systems data.
Web server logs with user browsing data.
Internal market research data.
Third-party data, such as census data, demographics data, or survey data.
All these data sources together form the Data Source Layer.
Data Extraction Layer
Data gets pulled from the data source into the data warehouse system. There is likely some minimal
data cleansing, but a major data transformation is unlikely at this point.
Staging Area
This is where data sits prior to being scrubbed and transformed into a data warehouse / data mart.
Having one common area makes it easier for subsequent data processing / integration.
ETL Layer
This is where data gains its “intelligence”, as logic is applied to transform the data from a
transactional nature to an analytical nature. This layer is also where data cleansing happens. The ETL
design phase (http://www.1keydata.com/datawarehousing/etl.html) is often the most time-
consuming phase in a data warehousing project, and an ETL tool
(http://www.1keydata.com/datawarehousing/tooletl.html) is often used in this layer.
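The path from extraction through staging to the ETL layer can be sketched as three small functions. This is only a minimal illustration, not how any particular ETL tool works: the CSV content, the column names, and the cleansing rule (dropping rows without an amount) are all assumptions.

```python
import csv
import io

# Hypothetical source extract; in practice this would come from a file or database.
RAW_CSV = """order_id,customer,amount,order_date
1, alice ,10.50,2022-01-05
2,BOB,,2022-01-06
3,alice,4.25,2022-02-01
"""


def extract(text):
    """Extraction: pull rows as-is into a staging list, with no transformation."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(staged_rows):
    """ETL: cleanse values and reshape rows from transactional to analytical form."""
    clean = []
    for row in staged_rows:
        if not row["amount"].strip():
            continue  # assumed cleansing rule: skip rows with a missing amount
        clean.append({
            "order_id": int(row["order_id"]),
            "customer": row["customer"].strip().lower(),  # standardize names
            "amount": float(row["amount"]),
            "month": row["order_date"][:7],  # derive a reporting attribute
        })
    return clean


def load(rows, storage):
    """Load: append the cleansed rows to the data storage layer."""
    storage.extend(rows)
    return storage


staging_area = extract(RAW_CSV)
warehouse = load(transform(staging_area), [])
```

Keeping the staged rows untouched makes it possible to rerun the transformation after a rule changes without going back to the source system.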
Data Storage Layer
This is where the transformed and cleansed data sit. Based on scope and functionality, 3 types of
entities can be found here: data warehouse, data mart, and operational data store (ODS). In any given
system, you may have just one of the three, two of the three, or all three types.
Data Logic Layer
This is where business rules are stored. Business rules stored here do not affect the underlying data
transformation rules, but do affect what the report looks like.
Data Presentation Layer
This refers to the information that reaches the users. This can be in the form of a tabular / graphical
report in a browser, an emailed report that gets automatically generated and sent every day, or an
alert that warns users of exceptions, among others. Usually an OLAP tool
(http://www.1keydata.com/datawarehousing/toololap.html) and/or a reporting tool
(http://www.1keydata.com/datawarehousing/toolreporting.html) is used in this layer.
Metadata Layer
This is where information about the data stored in the data warehouse system is stored. A logical data
model would be an example of something that’s in the metadata layer. A metadata tool
(http://www.1keydata.com/datawarehousing/toolmetadata.html) is often used to manage
metadata.
System Operations Layer
This layer includes information on how the data warehouse system operates, such as ETL job status,
system performance, and user access history.
6. What are the different interconnected layers of a data warehouse?
There are four different interconnected layers:
• Operational database layer
• Informational access layer
• Data access layer
• Metadata layer
7. Explain process flow in data warehouse?
8. Why should you put your data warehouse on a different system than your OLTP system?
An OLTP system is basically “data oriented” (ER model) and not “subject oriented” (dimensional
model). That is why we design a separate system that will host a subject oriented OLAP system.
Moreover, a complex query fired on an OLTP system causes heavy overhead on the OLTP
server, which directly affects the day-to-day business.
Or
The loading of a warehouse will likely consume a lot of machine resources. Additionally, users may
create queries or reports that are very resource intensive because of the potentially large amount of
data available. Such loads and resource needs will conflict with the needs of the OLTP systems for
resources and will negatively impact those production systems.
9. What are the steps to build the data warehouse? (http://www.questions-interviews.com/data-warehouse/data-warehousing-2.aspx#What_are_the_steps_to_build_the_data_warehouse)
Gathering business requirements >> Identifying Sources >> Identifying Facts >> Defining
Dimensions >> Define Attributes >> Redefine Dimensions / Attributes >> Organize Attribute
Hierarchy >> Define Relationship >> Assign Unique Identifiers
10. What is the standard procedure for creating a data warehouse?
This is also one of the most crucial data warehouse interview questions when learning about data
warehouses. The standard procedure used to make a data warehouse is very similar to that of the
majority of database projects. Below are the common steps:
Identification and collection of requirements
Conducting dimensional modeling
Development of the data warehouse architecture, which includes the ODS or Operational Data Store
Designing OLAP cubes and the relational database
Development of applications to be used for maintaining stored data
Development of applications to be utilized for analysis
Testing and deploying the completed data warehouse system
11. What are the general stages of use of a data warehouse?
These are the general stages of use:
Offline Operational Databases – Data warehouses in this initial stage are developed by simply
copying the database of an operational system to an off-line server where the processing load of
reporting does not impact the operational system’s performance.
Offline Data Warehouse – Data warehouses in this stage of evolution are updated on a regular time
cycle (usually daily, weekly or monthly) from the operational systems and the data is stored in an
integrated reporting-oriented data structure.
Real Time Data Warehouse – Data warehouses at this stage are updated on a transaction or event
basis, every time an operational system performs a transaction (e.g. an order, a delivery or a
booking).
Integrated Data Warehouse – Data warehouses at this stage are used to generate activity or
transactions that are passed back into the operational systems for use in the daily activity of the
organization.
12. Hierarchy of DWH?
Hierarchies are logical structures that use ordered levels as a means of organizing data. A hierarchy
can be used to define data aggregation. For example, in a time dimension, a hierarchy might
aggregate data from the month level to the quarter level to the year level. A hierarchy can also be
used to define a navigational drill path and to establish a family structure.
Within a hierarchy, each level is logically connected to the levels above and below it. Data values at
lower levels aggregate into the data values at higher levels. A dimension can be composed of more
than one hierarchy. For example, in the product dimension, there might be two hierarchies–one for
product categories and one for product suppliers.
Dimension hierarchies also group levels from general to granular. Query tools use hierarchies to
enable you to drill down into your data to view different levels of granularity. This is one of the key
benefits of a data warehouse.
When designing hierarchies, you must consider the relationships in business structures, for example a
divisional multilevel sales organization.
Hierarchies impose a family structure on dimension values. For a particular level value, a value at the
next higher level is its parent, and values at the next lower level are its children. These familial
relationships enable analysts to access data quickly.
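The month to quarter to year roll-up described above can be sketched in a few lines. The sales figures and function names below are made up for illustration; a real warehouse would usually express this through a date dimension and a query tool rather than hand-written code.

```python
from collections import defaultdict

# Hypothetical monthly sales facts: (year, month) -> amount.
MONTHLY_SALES = {
    (2022, 1): 100, (2022, 2): 150, (2022, 3): 50,
    (2022, 4): 200, (2022, 12): 75,
    (2023, 1): 30,
}


def parent_quarter(year, month):
    """Parent of a month in the time hierarchy: the quarter it belongs to."""
    return (year, (month - 1) // 3 + 1)


def roll_up(values, parent_of):
    """Aggregate values at one level into totals at the next higher level."""
    totals = defaultdict(int)
    for key, amount in values.items():
        totals[parent_of(*key)] += amount
    return dict(totals)


def drill_down(values, parent_of, parent):
    """Return the child-level values that belong to one parent value."""
    return {k: v for k, v in values.items() if parent_of(*k) == parent}


quarterly = roll_up(MONTHLY_SALES, parent_quarter)
yearly = roll_up(quarterly, lambda year, quarter: year)
first_quarter_months = drill_down(MONTHLY_SALES, parent_quarter, (2022, 1))
```

The same `roll_up` function serves both levels because each level only needs to know its parent, which is the family structure the text refers to.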
13. Define schema hierarchy? (http://www.atoziq.com/2012/10/define-schema-hierarchy-data.html)
A concept hierarchy that is a total (or) partial order among attributes in a database schema is called a
schema hierarchy.
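As a rough illustration of a partial order among attributes, consider a time schema in which a day belongs to both a week and a month, but weeks do not nest inside months. The attribute names and the parent table below are assumptions chosen for the example.

```python
# Hypothetical schema hierarchy: each attribute maps to the attributes directly above it.
DIRECT_PARENTS = {
    "day": {"week", "month"},
    "week": {"year"},
    "month": {"quarter"},
    "quarter": {"year"},
    "year": set(),
}


def rolls_up_to(lower, higher):
    """Return True if `lower` can be aggregated to `higher` along the hierarchy."""
    stack = list(DIRECT_PARENTS[lower])
    seen = set()
    while stack:
        attr = stack.pop()
        if attr == higher:
            return True
        if attr not in seen:
            seen.add(attr)
            stack.extend(DIRECT_PARENTS[attr])
    return False
```

Here `week` and `month` cannot be compared in either direction, which is what makes the order partial; a chain such as street, city, state, country would be a total order.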
14. Differences between systems (OLTP, OLAP, ODS, DWH, DSS)
(http://satishmsbi.blogspot.in/2011/10/differences-between-systemsoltpolapodsd.html)
The differences between the systems are listed below.
OLTP | OLAP
1. It is dynamic. | 1. It is static [unchanged].
2. It follows normalization. | 2. It follows denormalization.
3. It contains current data. | 3. It contains historical data.
4. It is designed to support transactional process. | 4. It is designed to support decision making process.
5. It contains detailed data. | 5. It contains summarized information.
ODS | DWH
operational Process
Similarities
1. Integrated database. | 1. Integrated database.
2. Enterprise data. | 2. Enterprise data.
3. Subject oriented database. | 3. Subject oriented database.
Differences
1. Contains current information. | 1. Contains historical information.
2. Data is volatile. | 2. Data is non-volatile.
3. Contains detail information. | 3. Contains summary information.

ODS | OLTP
1. Subject oriented database. | 1. Application oriented database.

OLTP | DWH
2. It contains current data. | 2. It contains historical data.
3. It is application oriented database. | 3. It is subject oriented database.
4. It is not flexible. | 4. It is flexible.
5. It stores all data. | 5. It stores relevant data.
OLTP | DSS
1. It is designed to support operational process. | 1. It is designed to support decision making process.
3. Data is in inconsistent form. | 3. It is in consistent form.
4. It stores recent data for approximately 4 to 6 months. | 4. It stores one year of data.
5. It follows normalized schema. | 5. It follows star schema.
DWH | DM
1. It is about the entire organization. | 1. It is about an individual department in the organization.
2. It is created on RDBMS. | 2. It is created on RDBMS & MDDB.
3. It follows integrated schema design. | 3. It follows star schema design.
4. It is an integrated database. | 4. Subject oriented databases.

15. What is a data mart? What is the difference between a data mart and a data warehouse?
A data mart is a simple form of a data warehouse that is focused on a single subject (or functional
area), such as Sales or Finance or Marketing. Data marts are often built and controlled by a single
department within an organization.
DATA MART | DATA WAREHOUSE
A scaled-down version of the data warehouse that addresses only one subject, like Sales Department, HR Department, etc. | It is a database management system that facilitates on-line analytical processing by allowing the data to be viewed in different dimensions or perspectives to provide business intelligence.
One fact table with multiple dimension tables. | More than one fact table and multiple dimension tables.
[Sales Department] [HR Department] [Manufacturing Department] | [Sales Department, HR Department, Manufacturing Department]
Small organizations prefer a data mart. | Bigger organizations prefer a data warehouse.
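The "one fact table with multiple dimension tables" layout of a sales data mart can be sketched with Python's built-in sqlite3 module. The table names, columns, and rows are hypothetical and only meant to show the shape of a star schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe the "who/what/when" of each fact.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
-- The single fact table holds the measures and points at the dimensions.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key    INTEGER REFERENCES dim_date(date_key),
    amount      REAL
);
INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_date    VALUES (20220105, 2022, 1), (20220210, 2022, 2);
INSERT INTO fact_sales  VALUES (1, 20220105, 10.0), (1, 20220210, 5.0), (2, 20220210, 7.5);
""")

# A typical mart query: join the fact table to one dimension and aggregate.
sales_by_product = conn.execute("""
    SELECT p.product_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
```

A data warehouse in the sense of the table above would add further fact tables (for example HR or manufacturing facts) that share dimensions such as `dim_date`.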
16. What is the difference between OLTP and OLAP?
OLTP | OLAP
On Line Transaction Processing | On Line Analytical Processing
Continuously updates data | Read-only data
Tables are in normalized form | Partially normalized / denormalized tables
Single record access | Multiple records for analysis purposes
Holds current data | Holds current and historical data
Records are maintained using a primary key field | Records are based on a surrogate key field
Tables or records can be deleted | Records cannot be deleted
Complex data model | Simplified data model
Source data is operational data. This data is the source of truth. | Data comes from various OLTP data sources as shown in the above diagram.
Data is inserted via short inserts and updates. The data is normally captured via user actions via web based applications. | Periodic (i.e. scheduled) and long running (i.e. during off-peak) batch jobs refresh the data. Also known as the ETL process, as shown in the diagram.
Regular backup of data is required to prevent any loss of data, monetary loss, and legal liability. | Data can be reloaded from the OLTP systems if required. Hence, stringent backup is not required.
Transactional data older than a certain period can be archived and purged based on the compliance requirements. | The volume of this data will be higher as well due to its requirement to maintain historical data.
The typical users are operational staff. | The typical users are management and executives to make business decisions.
The space requirement is relatively small if the historical data is archived. | The space requirement is larger due to the existence of aggregation structures and historical data. Also requires more indexes than OLTP.
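Two of the rows above (single record access versus multiple records for analysis, and short updates versus batch refresh) can be shown side by side in a small sqlite3 sketch. The table names and data are invented for the example, and the two in-memory databases merely stand in for separate OLTP and OLAP systems.

```python
import sqlite3

# OLTP side: a normalized orders table accessed by primary key.
oltp = sqlite3.connect(":memory:")
oltp.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, status TEXT)"
)
oltp.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "alice", 10.0, "new"), (2, "bob", 20.0, "new"), (3, "alice", 5.0, "new")],
)

# OLTP pattern: a short update and lookup touching a single record.
oltp.execute("UPDATE orders SET status = 'shipped' WHERE order_id = ?", (2,))
single_record = oltp.execute(
    "SELECT status FROM orders WHERE order_id = ?", (2,)
).fetchone()

# Batch refresh (assumed to run on a schedule): copy data into the analytical store.
olap = sqlite3.connect(":memory:")
olap.execute("CREATE TABLE fact_orders (customer TEXT, amount REAL)")
olap.executemany(
    "INSERT INTO fact_orders VALUES (?, ?)",
    oltp.execute("SELECT customer, amount FROM orders").fetchall(),
)

# OLAP pattern: a read-only query that scans many records to answer one question.
totals_by_customer = olap.execute(
    "SELECT customer, SUM(amount) FROM fact_orders GROUP BY customer ORDER BY customer"
).fetchall()
```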
17. Explain the integrated data warehouse stage?
At this stage the data warehouse gets updated every time a transaction occurs. The transactions
performed during this time are passed back to the operational systems, which record the
transactions.
18. What is Virtual Data Warehousing?
A virtual or point-to-point data warehousing strategy means that end-users are allowed to get at
operational databases directly using whatever tools are enabled to the “data access network”.
Or
A virtual data warehouse provides a compact view of the data inventory. It contains metadata. It
uses middleware to build connections to different data sources. It can be fast, as it allows users
to filter the most important pieces of data from different legacy applications.
19. What is real time data-warehousing?
Data warehouses are updated on a transaction or event basis, every time an operational system
performs a transaction.
Or
Real-time data warehousing is a combination of two things: 1) real-time activity and 2) data
warehousing. Real-time activity is activity that is happening right now. The activity could be
anything such as the sale of widgets. Once the activity is complete, there is data about it.
Data warehousing captures business activity data. Real-time data warehousing captures business
activity data as it occurs. As soon as the business activity is complete and there is data about it, the
completed activity data flows into the data warehouse and becomes available instantly. In other
words, real-time data warehousing is a framework for deriving information from data as the data
becomes available.
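The contrast between a periodically loaded warehouse and a real-time one comes down to when a completed activity becomes visible. The two classes below are a toy sketch with hypothetical names, not a description of any product.

```python
class BatchWarehouse:
    """Events wait in a pending queue until the next scheduled load."""

    def __init__(self):
        self.rows = []
        self._pending = []

    def on_transaction(self, event):
        self._pending.append(event)  # not yet visible to queries

    def run_scheduled_load(self):
        self.rows.extend(self._pending)
        self._pending.clear()


class RealTimeWarehouse:
    """Each completed activity flows into the warehouse as it occurs."""

    def __init__(self):
        self.rows = []

    def on_transaction(self, event):
        self.rows.append(event)  # available for analysis immediately


batch = BatchWarehouse()
real_time = RealTimeWarehouse()
sale = {"item": "widget", "quantity": 3}  # made-up business activity
batch.on_transaction(sale)
real_time.on_transaction(sale)
visible_before_load = (len(batch.rows), len(real_time.rows))
batch.run_scheduled_load()
visible_after_load = (len(batch.rows), len(real_time.rows))
```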
20. What is active data warehouse?
DEFINITION
An active data warehouse (ADW) is a data warehouse implementation that supports near-time or
near-real-time decision making. It is characterized by event-driven actions triggered by a continuous
stream of queries (generated by people or applications) against a broad, deep granular set of
enterprise data.
Other Definitions
An active data warehouse aims to capture data continuously and deliver real time data. It
provides a single integrated view of a customer across multiple business lines. It is associated with
Business Intelligence Systems.
Wingspan Technology
What is an ADW? An ADW is a relational data warehouse environment that supports real-time
updates, fast response times, aggregated data queries with detailed drill-down capabilities and
dynamic mutability to support changing business needs. It places more control and power into the
hands of the decision makers who use the system most and know the business best.
Teradata
An active data warehouse (ADW) is a traditional data warehouse extended to provide operational
intelligence based on historical data combined with today’s up-to-date data. The ADW supports
mixed workloads from an enterprise data warehouse that serves as a single source of truth for
decision making with predictable service levels for query response times, near real-time data
freshness, and mission critical data availability. Moreover, the ADW integrates into the overall
enterprise architecture to deliver decision services throughout an organization.
21. What is VLDB?
VLDB is an abbreviation of Very Large Database. A one terabyte database would normally be
considered to be a VLDB. Typically, these are decision support systems or transaction processing
applications serving large numbers of users.
22. What is Data Mining?
Data Mining is the process of analyzing data from different perspectives and summarizing it into
useful information.
23. What is ODS?
An ODS is also a small DWH which helps analysts analyze the business. It holds data for a smaller
number of days, generally around 1 month to 6 months. As in a DWH, surrogate
keys are generated and error and reject handling is done here as well.
An Operational Data Store is used by many organizations for analysis purposes as well as for data backup
and data recovery. Data stored in an ODS is usually in normalized form, as in transactional DBs,
while data in a DWH is denormalized. An ODS is actually a replica of a transactional database,
collecting the data of two or more business functions.
Or
An Operational Data Store (ODS) integrates data from
multiple business operation sources to address operational
problems that span one or more business functions. An ODS
has the following features:
• Subject-oriented — Organized around major subjects
of an organization (customer, product, etc.), not specific
applications (order entry, accounts receivable, etc.).
• Integrated — Presents an integrated image of
subject-oriented data which is pulled from fragmented
operational source systems.
• Current — contains a snapshot of the current
content of legacy source systems. History is not kept, and
might be moved to the data warehouse for analysis.
• Volatile — Since ODS content is kept current, it
changes frequently. Identical queries run at different
times may yield different results.
• Detailed — ODS data is generally more detailed than
data warehouse data. Summary data is usually not stored in
an ODS; the exact granularity depends on the subject that
is being supported.
An Operational Data Store is a subject-oriented, integrated, current, volatile collection of data
used to support the tactical decision making process for the enterprise. It is the central point of data
integration for business management, delivering a common view of enterprise data.
ODS means operational data store, which supports operational monitoring. Its data is volatile, current,
detailed, subject-oriented and integrated.
ODS is an abbreviation of Operational Data Store: a database structure that is a repository for near
real-time operational data rather than long-term trend data. The ODS may further become the enterprise
shared operational database, allowing operational systems that are being re-engineered to use the
ODS as their operational databases.
Or
A collection of operational or base data that is extracted from operational databases and standardized,
cleansed, consolidated, transformed, and loaded into the enterprise data architecture. An ODS is used to
support data mining of operational data, or as the store for base data that is summarized for a data
warehouse. The ODS may also be used to audit the data warehouse to assure that summarized and
derived data is calculated properly. The ODS may further become the enterprise shared operational
database, allowing operational systems that are being reengineered to use the ODS as their operational
databases.
24. What is the difference between ODS and data warehouse?
ODS | DWH

operational Process
Similarities
1. Integrated database. | 1. Integrated database.
2. Enterprise data. | 2. Enterprise data.
3. Subject oriented database. | 3. Subject oriented database.
Differences
1. Contains current information. | 1. Contains historical information.
3. Contains detail information. | 3. Contains summary information.

25. What is Business Intelligence?
Business Intelligence (BI) – the technology infrastructure for gaining maximum information from
available data for the purpose of improving business processes
(http://datawarehouse4u.info/What-is-Business-Intelligence.html). Typical BI infrastructure components
are software solutions for gathering, cleansing, integrating, analyzing and sharing data. Business
Intelligence produces analysis and provides credible information to help make effective, high-quality
business decisions.
The most common kinds of Business Intelligence systems are:
* EIS – Executive Information Systems
* DSS – Decision Support Systems
* MIS – Management Information Systems
* GIS – Geographic Information Systems
* OLAP – Online Analytical Processing and multidimensional analysis
* CRM – Customer Relationship Management
Business Intelligence systems are based on Data Warehouse technology. A Data Warehouse (DW) gathers
information from a wide range of a company’s operational systems, and Business Intelligence systems
are built on it. Data loaded into the DW is usually well integrated and cleaned, which allows it to
produce credible information that reflects the so-called ‘one version of the truth’.
26. What is difference between business intelligence and data warehousing?
Data warehousing helps you store the data, while business intelligence helps you use the data
for decision making, forecasting, etc.
Data warehousing, using ETL jobs, stores data in a meaningful form. Business intelligence tools were
created to query that data for reporting and forecasting.
Data warehousing deals with the management of different aspects of a data warehouse, such as its
development, implementation and operation. It also covers metadata management, data cleansing, data
transformation, data acquisition, persistence management and data archiving.
In business intelligence, the organization analyses measurements of aspects of the business such as
sales, marketing, efficiency of operations, profitability, and market penetration within customer
groups. Business intelligence typically encompasses OLAP, data visualization, data mining and
reporting tools.
27. What is the difference between dependent data warehouse and independent data warehouse?
A dependent data warehouse stores its data in a central data warehouse. An independent data
warehouse, on the other hand, does not make use of a central data warehouse.
Or
A dependent data warehouse is built on an ODS, whereas an independent data warehouse does not depend
on an ODS.
28. What are dependent and independent data marts?
(http://www.atoziq.com/2012/10/what-are-dependent-and-independent-data.html)
Dependent data marts are sourced directly from enterprise data warehouses.
Independent data marts hold data captured from one or more operational systems or external
information providers, or data generated locally within a particular department or geographic area.
29. What is XMLA?
XMLA is XML for Analysis, which can be considered a standard for accessing data in OLAP, data
mining or data sources on the internet. It is based on the Simple Object Access Protocol (SOAP). XMLA
uses the Discover and Execute methods. Discover obtains information and metadata from the data
source, while Execute allows applications to execute commands against the data source.
XMLA stands for XML for Analysis. It is an industry standard for accessing data in analytical
systems, such as OLAP. XMLA is based on XML, SOAP and HTTP.
30. What is the difference between OLAP, ROLAP, MOLAP and HOLAP?
31. What is a cube in data warehousing concept?
(http://www.questions-interviews.com/data-warehouse/data-warehousing-2.aspx#What_is_a_cube_in_data__warehousing_concept)
Cubes are logical representation of multidimensional data. The edge of the cube contains dimension
members and the body of the cube contains data values.
Or
Cubes are data processing units composed of fact tables and dimensions from the data warehouse.
They provide multidimensional views of data, querying and analytical capabilities to clients.
Or
Multi dimensional data is logically represented by Cubes in data warehousing. The dimension and
the data are represented by the edge and the body of the cube respectively. OLAP environments view
the data in the form of hierarchical cube. A cube typically includes the aggregations that are needed
for business intelligence queries.
32. What is a linked cube?
A linked cube is a cube in which a subset of the data can be analysed in great detail. The linking
ensures that the data in the cubes remains consistent.
33. What is cube grouping?
A set of similar cubes built by a transformer is known as a cube group. Each cube group is related
to a single level in one dimension of the model. Cube groups are generally used to create smaller
cubes that are based on the data in that level of the dimension.
34. What is data cube technology used for?
A data cube is a multi-dimensional structure. It is a data abstraction for viewing aggregated data
from a number of perspectives. The dimension to be aggregated is known as the ‘measure’ attribute,
while the remaining dimensions are known as the ‘feature’ attributes.
Data is viewed on a cube in a multidimensional manner. The aggregated and summarized facts of
variables or attributes can be viewed. This is the requirement where OLAP plays a role.
Data cubes are commonly used for easy interpretation of data. They represent data along
dimensions together with measures of business interest. Each dimension of the cube represents some
attribute of the database, e.g. sales per day, month or year.
Or
A data cube is a multi-dimensional structure: a data abstraction that allows one to view aggregated
data from a number of perspectives. Conceptually, the cube consists of a core or base cuboid,
surrounded by a collection of sub-cubes/cuboids that represent the aggregation of the base cuboid
along one or more dimensions. We refer to the dimension to be aggregated as the measure attribute,
while the remaining dimensions are known as the feature attributes.
35. What are offline OLAP cubes?
These are OLAP cubes created by clients, end users or third-party applications accessing a data
warehouse, relational database or OLAP cube through the Microsoft® PivotTable® Service. E.g.
Microsoft® Excel™ is very popular as a client for creating offline local OLAP cubes from relational
databases for multidimensional analysis. These cubes have to be maintained and managed by the end
users who have to manually refresh their data.
36. What are virtual cubes?
These are combinations of one or more real cubes and require no disk space to store them. They store
only the definitions and not the data of the referenced source cubes. They are similar to views in
relational databases.
Or
A virtual or point-to-point data warehousing strategy means that end-users are allowed to get at
operational databases directly using whatever tools are enabled to the “data access network”.
37. What is an OLAP cube?
An OLAP cube will connect to a data source to read and process the raw data to perform
aggregations and calculations for its associated measures. Cubes are the core components of OLAP
systems. They aggregate facts from every level in a dimension provided in a schema. For example,
they could take data about products, units sold and sales value, and then add them up by month, by
store, by month and store and all other possible combinations. They’re called cubes because the end
data structure resembles a cube.
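The "every possible combination" idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fact rows, the dimension names and the `build_cube` helper are hypothetical, and a real OLAP engine stores and indexes its aggregates very differently.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (month, store, product, units_sold, sales_value)
FACTS = [
    ("2022-01", "North", "Pen", 10, 15.0),
    ("2022-01", "South", "Pen", 4, 6.0),
    ("2022-02", "North", "Ink", 2, 9.0),
    ("2022-02", "North", "Pen", 6, 9.0),
]
DIMENSIONS = ("month", "store", "product")


def build_cube(facts, dimensions=DIMENSIONS):
    """Sum units and sales value for every subset of the dimensions."""
    n = len(dimensions)
    cube = {}
    for size in range(n + 1):
        for group in combinations(range(n), size):
            totals = defaultdict(lambda: [0, 0.0])
            for row in facts:
                key = tuple(row[i] for i in group)
                totals[key][0] += row[n]      # units sold
                totals[key][1] += row[n + 1]  # sales value
            names = tuple(dimensions[i] for i in group)
            cube[names] = {key: tuple(value) for key, value in totals.items()}
    return cube


cube = build_cube(FACTS)
print(cube[("month",)])          # totals by month
print(cube[("month", "store")])  # totals by month and store
print(cube[()])                  # grand total
```

With three dimensions the sketch produces 2^3 = 8 groupings, from the grand total up to the full month/store/product detail.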
38. What are the types of OLAP?
ROLAP (Relational OLAP) – Users see their data organized in cubes and dimensions, but the data is
really stored in an RDBMS. It is a storage mode that uses tables in a relational database to store
multidimensional structures. Query performance is slow.
MOLAP (Multidimensional OLAP) – Users see their data organized in cubes and dimensions, but
the data is really stored in an MDBMS. Query performance is fast.
HOLAP (Hybrid OLAP) – It is a combination of ROLAP and MOLAP, e.g. HOLOs. It supports queries
on aggregated data as well as on detailed data.
39. What are MOLAP cubes?
MOLAP Cubes: stands for Multidimensional OLAP. In MOLAP cubes the data aggregations and a
copy of the fact data are stored in a multidimensional structure on the Analysis Server computer. It is
best when extra storage space is available on the Analysis Server computer and the best query
performance is desired. MOLAP local cubes contain all the necessary data for calculating aggregates
and can be used offline. MOLAP cubes provide the fastest query response time and performance but
require additional storage space for the extra copy of data from the fact table.
40. What are ROLAP cubes?
ROLAP Cubes: stands for Relational OLAP. In ROLAP cubes a copy of data from the fact table is not
made and the data aggregates are stored in tables in the source relational database. A ROLAP cube is
best when there is limited space on the Analysis Server and query performance is not very important.
ROLAP local cubes contain the dimensions and cube definitions but aggregates are calculated when
they are needed. ROLAP cubes require less storage space than MOLAP and HOLAP cubes.
41. What are HOLAP cubes?
HOLAP Cubes: stands for Hybrid OLAP. A HOLAP cube has a combination of the ROLAP and
MOLAP cube characteristics. It does not create a copy of the source data; however, data aggregations
are stored in a multidimensional structure on the Analysis Server computer. HOLAP cubes are best
when storage space is limited but faster query responses are needed.
42. What is the PivotTable® Service? (http://www.paretoanalysts.com/)
This is the primary component that connects clients to the Microsoft® SQL Server™ 2000 Analysis
Server. It also provides the capability for clients to create local offline cubes, using it as an OLAP
server. PivotTable® Service does not have a user interface; the clients using its services have to
provide their own user interface.
43. What is difference between Bill Inmon and Ralph Kimball approaches of Data Warehouse
architecture
Bill Inmon Approach: According to Bill Inmon, a data warehouse needs to fulfill the needs of all
categories of users. In an organization there are different types of users, such as
· Marketing
· Supply Chain Management
· Operations
Each department has its own way of interpreting data, so the data warehouse should be able to
answer each department’s queries. This can be achieved by designing tables in 3NF. According to
him, data in the data warehouse should be in 3NF and at the lowest level of granularity. The data
should be accessible at detailed atomic levels by drilling down or at summarized levels by drilling up.
He stressed that data should be organized into subject-oriented, integrated, non-volatile and
time-variant structures. According to him, an organization has one data warehouse, and data marts
source their information from the data warehouse. The Inmon approach is also called the top-down
approach.
In this methodology data is brought into the staging area from the OLTP system or ODS (Operational
Data Store) and then summarized and aggregated. After this process, data marts source their data from
the data warehouse and apply a new set of transformations and aggregations according to their needs.
Key points to be noted about this approach
1. Data should be organized into subject-oriented, integrated, non-volatile and time-variant structures
2. Data in third normal form (3NF)
3. Top-down approach
4. Data marts source from the data warehouse
Pros of Bill Inmon approach
1. Easy to maintain
2. Well integrated
Cons of Bill Inmon approach
Difficult to implement
Ralph Kimball Approach:
Kimball views the data warehouse as a combination of data marts connected to a data warehouse bus
structure. Data marts are focused on delivering the business objectives of different departments, and
the data warehouse bus consists of conformed dimensions and measures defined for the whole
organization. Users can query all data marts together using conformed dimensions.
In this approach the data warehouse is not a physical storage of the data as in the Inmon approach. It
is virtual: it is a combination of data marts, each having a star schema design.
In this approach data is always stored in the dimensional model.
Key points to be noted about this approach are
1. Data is always stored in the dimensional model.
2. Data warehouse is virtual
3. Bottom-up approach
Pros of Ralph Kimball approach
Fast to build
Cons of Ralph Kimball approach
Difficult to maintain because of redundancy of data across data marts
Conclusion: There is no right or wrong between these two approaches. In reality, the methodology
actually implemented is a combination of both.
There are two major design methodologies followed in data warehousing: Ralph Kimball’s and Bill
Inmon’s. The table below compares the two.
Pros and cons of both the approaches
Ralph Kimball | Bill Inmon
Build business process oriented small data marts which are joined to each other using common dimensions between business processes. | One centralized data warehouse which will act as an enterprise-wide data warehouse, and then build data marts as per need for a specific department or process
It is known as bottom-up approach | It is known as top-down approach
Data marts should be built on dimensional modelling approach | Central data warehouse to follow ER modelling approach

44. What is data modeling?
Data modeling is the process of representing real-world data structures or entities and their
relationships in the data models required for a database. Data modeling consists of various types, such as:
Conceptual data modeling
Logical data modeling
Physical data modeling
Enterprise data modeling
Relational data modeling
Dimensional data modeling
45. Can you explain the general data modeling lifecycle?
46. Difference between data modeling and data mining?
Data modeling aims to identify all entities that have data. It then defines the relationships between
these entities. Data models can be conceptual, logical or physical. Conceptual models are
typically used to explore high-level business concepts with stakeholders. Logical models are
used to explore domain concepts, while physical models are used to explore database design.
Data mining is used to examine or explore the data using queries. These queries can be run against the
data warehouse. Data mining helps in reporting, planning strategies, finding meaningful patterns, etc.
It can be used to convert a large amount of data into a sensible form.
47. What is Dimensional Model?
A Dimensional Model is a database structure that is optimized for online queries and Data
Warehousing tools. It is comprised of “fact” and “dimension” tables.
A “fact” is a numeric value that a business wishes to count or sum. A dimension table stores
attributes, or dimensions, that describe the objects in a fact table.
Dimensional Models are designed for reading, summarizing and analyzing numeric information,
whereas Relational Models are optimized for adding and maintaining data using real-time
operational systems.
Or
A dimensional model consists of dimension and fact tables. Fact tables store different transactional
measurements and the foreign keys from dimension tables that qualify the data. The goal of a
dimensional model is not to achieve a high degree of normalization but to facilitate easy and faster
data retrieval.
Ralph Kimball is one of the strongest proponents of this very popular data modeling technique which
is often used in many enterprise level data warehouses.
48. Can you explain the dimensional data modeling lifecycle?
49. What are the different architectural choices available in dimensional modeling?
50. What is hierarchy and how it is handled in dimensional model?
51. What is dimensional approach for storing data in data warehouse?
In dimensional approach transaction data are partitioned into either “facts”, which are generally
numeric transaction data, or “dimensions”, which are the reference information that gives context to
the facts.
52. How we organize sales transactions data into dimensional approach?
A sales transaction can be broken up into facts such as the number of products ordered and the price
paid for the products, and into dimensions such as order date, customer name, product number,
order ship-to and bill-to locations, and salesperson responsible for receiving the order.
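As a rough sketch of the split described above, the following Python snippet uses the standard `sqlite3` module to build a tiny star schema in memory. All table names, column names and rows are hypothetical and chosen only for illustration; only three of the dimensions mentioned above are modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions: reference information that gives context to the facts.
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, product_number TEXT);
CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, order_date TEXT);

-- Fact table: numeric measures plus foreign keys to the dimensions.
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity_ordered INTEGER,
    price_paid REAL
);

INSERT INTO dim_customer VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO dim_product VALUES (1, 'P-100'), (2, 'P-200');
INSERT INTO dim_date VALUES (20220101, '2022-01-01'), (20220102, '2022-01-02');
INSERT INTO fact_sales VALUES
    (20220101, 1, 1, 3, 30.0),
    (20220101, 2, 2, 1, 25.0),
    (20220102, 1, 2, 2, 50.0);
""")

# Summarize the facts by one dimension attribute.
sales_by_customer = conn.execute("""
    SELECT c.customer_name, SUM(f.quantity_ordered), SUM(f.price_paid)
    FROM fact_sales AS f
    JOIN dim_customer AS c ON c.customer_key = f.customer_key
    GROUP BY c.customer_name
    ORDER BY c.customer_name
""").fetchall()
print(sales_by_customer)
```

The same fact rows can be summarized by product or by date simply by joining a different dimension table.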
53. What is the main advantage in using dimensional approach?
The main advantage in using dimensional approach is that the data warehouse is easier for the user
to understand and to use. The retrieval of data from data warehouse tends to operate very quickly.
54. What are the main disadvantages of using dimensional approach?
There are mainly two disadvantages of using the dimensional approach to store data in a data
warehouse.
1. To maintain the integrity of facts and dimensions, the process of loading data into the data
warehouse from different operational systems gets complicated.
2. If the organization adopting the dimensional approach changes the way in which it does
business, it is difficult to modify the data warehouse structure.
55. What is normalization approach for storing data in data warehouse?
In normalization approach the data in the data warehouse are stored following, to a degree, database
normalization rules. Tables are grouped together by subject areas that reflect general data categories
(e.g., data on customers, products, finance, etc.).
56. What are the advantages and disadvantage of using normalization approach for storing data in
data warehouse?
Advantages:
1. The main advantage is that it is easy to add information into the database.
Disadvantages:
1. It is difficult to join data from different sources into meaningful information
2. It is also difficult to access the information without a precise understanding of the source of data
and the data structure of the data warehouse.
57. If denormalization improves data warehouse processes, why is the fact table in normal form?
The foreign keys of a fact table are the primary keys of the dimension tables. Because the fact table
contains columns that are primary keys of other tables, it is itself in normal form.
58. What are Normalization, First Normal Form, Second Normal Form, And Third Normal Form?
1. Normalization is a process for assigning attributes to entities. It reduces data redundancies, helps
eliminate data anomalies, and produces controlled redundancies to link tables.
2. Normalization is the analysis of functional dependencies between attributes / data items of
user views. It reduces a complex user view to a set of small and stable subgroups of fields / relations.
1NF: Repeating groups must be eliminated, dependencies can be identified, all key attributes are
defined, and there are no repeating groups in the table.
2NF: The table is already in 1NF and includes no partial dependencies (no attribute is dependent on a
portion of the primary key). It is still possible to exhibit transitive dependency: attributes may be
functionally dependent on non-key attributes.
3NF: The table is already in 2NF and contains no transitive dependencies.
59. What is Data Normalization and denormalization
Database normalization is the process of organizing the fields
(https://en.wikipedia.org/wiki/Field_(computer_science)) and tables
(https://en.wikipedia.org/wiki/Table_(database)) of a relational database
(https://en.wikipedia.org/wiki/Relational_database) to minimize redundancy
(https://en.wikipedia.org/wiki/Data_redundancy) and dependency. Normalization usually involves
dividing large tables into smaller (and less redundant) tables and defining relationships between
them. The objective is to isolate data so that additions, deletions, and modifications of a field can be
made in just one table and then propagated through the rest of the database using the defined
relationships.
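The effect of isolating data so that a change is made in just one table can be sketched in plain Python. The rows, field names and the `normalize` / `join_back` helpers below are hypothetical and stand in for real tables and joins.

```python
# Hypothetical denormalized rows: the customer's city is repeated on every order.
orders_flat = [
    {"order_id": 1, "customer_id": 10, "customer_city": "Pune", "amount": 120},
    {"order_id": 2, "customer_id": 10, "customer_city": "Pune", "amount": 80},
    {"order_id": 3, "customer_id": 11, "customer_city": "Delhi", "amount": 50},
]


def normalize(rows):
    """Split flat rows into a customers table and an orders table."""
    customers = {}
    orders = []
    for row in rows:
        # Each customer attribute is stored once, keyed by customer_id.
        customers[row["customer_id"]] = {"customer_city": row["customer_city"]}
        # Orders keep only a reference (foreign key) to the customer.
        orders.append(
            {
                "order_id": row["order_id"],
                "customer_id": row["customer_id"],
                "amount": row["amount"],
            }
        )
    return customers, orders


def join_back(customers, orders):
    """Rebuild the flat view by following the customer_id relationship."""
    return [{**order, **customers[order["customer_id"]]} for order in orders]


customers, orders = normalize(orders_flat)
customers[10]["customer_city"] = "Mumbai"  # a single change in a single table
flat_view = join_back(customers, orders)
print(flat_view)
```

After the single update, every order of customer 10 shows the new city when the tables are joined back, which is the propagation through defined relationships that the paragraph above describes.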
60. Why would you like to denormalize your design?
Faster retrieval of data.
Minimize the number of joins.
61. What do you mean by Dimension Attributes?
62. What is difference between Dimensional Model and Entity-Relationship model
Entity-Relationship model | Dimensional Model
One table per entity | One fact table for data organization
Minimize data redundancy | Maximize understandability
Data is normalized and used for OLTP. Optimized for OLTP processing | Data is denormalized and used in data warehouse and data mart. Optimized for OLAP
The Transaction Processing Model | The data warehousing model or analytical model
Tables are units of storage | Cubes are units of storage
Several tables and chains of relationships among them | Few tables, and fact tables are connected to dimension tables
63. What are the differences b/w dimension table and fact table?
Dimension Table | Fact Table
It provides the context / descriptive information for a fact table measurement. | It provides measurement of an enterprise.
Structure of Dimension – Surrogate key, one or more other fields that compose the natural key (nk) and set of Attributes. | Measurement is the amount determined by observation.
Size of Dimension Table is smaller than Fact Table. | Structure of Fact Table – foreign key (fk), Degenerated Dimension and Measurements.
In a schema more number of dimensions is presented than Fact Table. | Size of Fact Table is larger than Dimension Table.
Surrogate Key is used to prevent the primary key (pk) violation (store historical data). | In a schema less number of Fact Tables observed compared to Dimension Tables.
Provides entry points to data. | Compose of Degenerate Dimension fields act as Primary Key.
Values of fields are in numeric and text representation. | Values of the fields always in numeric or integer form.

64. What is fact table and types of Fact Table?
A fact table is the central table in a star schema of a data warehouse. A fact table stores quantitative
information.
A fact table typically has two types of columns: facts or measures, and foreign keys to
dimension tables.
Types of Fact Tables
There are two types of fact tables:
Cumulative: This type of fact table describes what has happened over a period of time. For example,
this fact table may describe the total sales by product by store by day. The facts for this type of fact
tables are mostly additive facts. The first example presented here is a cumulative fact table.
Snapshot: This type of fact table describes the state of things in a particular instance of time, and
usually includes more semi-additive and non-additive facts. The second example presented here is a
snapshot fact table.
(Or)
Transactional – A transactional fact table is the most basic type. Its grain is “one row per line in
a transaction”, e.g., every line item that appears on an invoice. A transaction fact table stores data
at the most detailed level and therefore has a high number of dimensions associated with it.
Periodic snapshots – A periodic snapshot fact table stores data that is a snapshot over a period of
time. Its source data comes from a transaction fact table, from which you choose a period to get the
output.
Accumulating snapshots – An accumulating snapshot fact table describes the activity of a business
process that has a clear beginning and end. This type of fact table therefore has multiple date columns
to represent milestones in the process. A good example of an accumulating snapshot fact table is the
processing of a material. As steps towards handling the material are finished, the corresponding
record in the accumulating snapshot fact table gets updated.
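The three grains can be contrasted with a small Python sketch. The transaction rows, the monthly period, the milestone column names and both helper functions are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical transaction-grain fact rows: (date, product, quantity)
transactions = [
    ("2022-01-03", "Pen", 5),
    ("2022-01-17", "Pen", 2),
    ("2022-01-20", "Ink", 1),
    ("2022-02-02", "Pen", 4),
]


def monthly_snapshot(rows):
    """Periodic snapshot: roll transaction rows up to one row per month and product."""
    totals = defaultdict(int)
    for date, product, quantity in rows:
        month = date[:7]  # 'YYYY-MM'
        totals[(month, product)] += quantity
    return dict(totals)


# Accumulating snapshot: one row per process instance, one date column per milestone.
material_row = {
    "material_id": 1,
    "received_on": "2022-01-03",
    "inspected_on": None,
    "stored_on": None,
}


def record_milestone(row, milestone, date):
    """Update the existing row in place when a milestone is reached."""
    row[milestone] = date
    return row


snapshot = monthly_snapshot(transactions)
record_milestone(material_row, "inspected_on", "2022-01-05")
print(snapshot)
print(material_row)
```

Note the difference in write pattern: transaction and periodic snapshot rows are only inserted, while the accumulating snapshot row is revisited and updated as the process moves forward.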
65. What is a Factless Fact Table and where do we need to create this table?
A fact table which does not contain numeric fact columns is called a factless fact table.
Or
In the real world, it is possible to have a fact table that contains no measures or facts. These tables are
called “Factless Fact tables”.
Eg: A fact table which has only a product key and a date key is a factless fact table. There are no
measures in this table, but you can still get the number of products sold over a period of time.
(Or)
By definition, a factless fact table is a fact table (http://www.zentut.com/data-warehouse/fact-table/)
that does not contain any facts. There are two kinds of factless fact tables:
A factless fact table that describes an event or activity.
A factless fact table that describes a condition, eligibility or coverage.
Both kinds of factless fact tables play a very important role in your dimensional model
(http://www.zentut.com/data-warehouse/dimensional-modeling/) design. Let’s examine each of them
in detail and see the situations when you can apply them to make your design more robust.
Factless fact table for event or activity
When designing a dimensional model, you often find that you want to track events or activities that
occur in your business process but you can’t find measures to track. In these situations, you can
create a transaction-grained fact table that has no facts to describe those events or activities. Even
though there are no facts stored in the fact table, the events can be counted to produce very
meaningful process measurements.
Factless fact table for event or activity example
For example, you may want to track employee leaves. How often and why your employees take leave is
very important for planning your daily activities and resources.
At the center of the star schema is the FACT_LEAVE table, which has no facts at all. However, the
FACT_LEAVE table is used to measure the employee leave event when it occurs.
The following SQL statement is used to count the number of leaves that an employee has taken:
SELECT employee_name AS name, COUNT (leave_type_id) AS leave FROM fact_leave
INNER JOIN dim_employee
WHERE employee_id =
Executing the SQL query above would give you the following result:
name leave
Doe, John 7
Doe, Sam 9
Walker Mike 8
…
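A runnable version of this kind of count can be sketched with Python's built-in `sqlite3` module. The statement printed above is cut off after `WHERE employee_id =`, so the join condition, the `GROUP BY`, the `date_id` column and all sample rows here are assumptions, not the original author's query or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_employee (employee_id INTEGER PRIMARY KEY, employee_name TEXT);
-- Factless fact table: only dimension keys, no numeric measures.
CREATE TABLE fact_leave (employee_id INTEGER, leave_type_id INTEGER, date_id INTEGER);

INSERT INTO dim_employee VALUES (1, 'Doe, John'), (2, 'Doe, Sam');
INSERT INTO fact_leave VALUES
    (1, 1, 20220103),
    (1, 2, 20220110),
    (2, 1, 20220105);
""")

# Counting rows of the factless fact table yields the measurement.
leave_counts = conn.execute("""
    SELECT e.employee_name AS name, COUNT(f.leave_type_id) AS leave_count
    FROM fact_leave AS f
    INNER JOIN dim_employee AS e ON e.employee_id = f.employee_id
    GROUP BY e.employee_name
    ORDER BY e.employee_name
""").fetchall()
print(leave_counts)
```

The measure never exists as a stored column; it appears only when the event rows are counted.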

Factless fact table for condition, eligibility or coverage
A factless fact table can also be used in these situations:
Tracking the salesperson assigned to each prospect or customer
Logging the eligibility of employees for a compensation program
Capturing the promotion campaigns that are active at specific times such as holidays.
The examples above describe conditions, eligibility or coverage, which a factless fact table can
model. Typically, the information captured by this star is not studied alone but is used with other
business processes to produce meaningful information.
Let’s take a look at the sales star. By looking only at that star, we cannot tell which products had a
promotion but did not sell.
Sales Star Schema
In order to track this kind of information, we can create a star that has a factless fact table, which
is known as a coverage table (according to Kimball).
Factless Fact Table – Example 2
In order to answer the question of which products had a promotion but did not sell, we proceed as
follows:
Look at the second star to find the products that have promotions.
Look at the first star to find the products that have promotions and that sold.
The difference between the two is the list of products that had a promotion but did not sell.
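The three steps above amount to a set difference between the coverage table and the sales fact table. A minimal sketch with Python's `sqlite3` module follows; the table names, key columns and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Coverage (factless) table: which product was on which promotion on which day.
CREATE TABLE fact_promotion_coverage (product_key INTEGER, promotion_key INTEGER, date_key INTEGER);
-- Sales fact table: what was actually sold under a promotion.
CREATE TABLE fact_sales (product_key INTEGER, promotion_key INTEGER, date_key INTEGER, quantity INTEGER);

INSERT INTO fact_promotion_coverage VALUES
    (1, 7, 20221224), (2, 7, 20221224), (3, 7, 20221224);
INSERT INTO fact_sales VALUES
    (1, 7, 20221224, 5), (3, 7, 20221224, 2);
""")

# Promoted combinations minus sold combinations = promoted but not sold.
promoted_not_sold = conn.execute("""
    SELECT product_key, promotion_key, date_key FROM fact_promotion_coverage
    EXCEPT
    SELECT product_key, promotion_key, date_key FROM fact_sales
""").fetchall()
print(promoted_not_sold)
```

In this sample data only product 2 was promoted without producing a sale.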
The factless fact table is crucial in many complex business processes. By applying the concepts and
techniques about factless fact tables in this tutorial, you can design a dimensional model for
processes that have no clear facts and still produce meaningful information for your business.
Or
A fact table that does not contain any measure is called a fact-less fact. This table will only contain
keys from different dimension tables. This is often used to resolve a many-to-many cardinality issue.
Explanatory Note:
Consider a school, where a single student may be taught by many teachers and a single teacher may
have many students. To model this situation in a dimensional model, one might introduce a fact-less
fact table joining teacher and student keys. Such a fact table can then answer queries like:
1. Who are the students taught by a specific teacher?
2. Which teacher teaches the maximum number of students?
3. Which student has the highest number of teachers?
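As a rough illustration of the three queries above, the sketch below treats the fact-less fact table as a list of (teacher_key, student_key) pairs. The keys T1..T3 and S1..S3 are made-up sample data.

```python
from collections import defaultdict

# Fact-less fact rows: only dimension keys, no measures (hypothetical sample data).
teaches = [("T1", "S1"), ("T1", "S2"), ("T1", "S3"), ("T2", "S1"), ("T3", "S1")]

students_of = defaultdict(set)  # teacher -> students taught
teachers_of = defaultdict(set)  # student -> teachers
for teacher, student in teaches:
    students_of[teacher].add(student)
    teachers_of[student].add(teacher)

# 1. Students taught by a specific teacher.
students_of_t1 = sorted(students_of["T1"])
# 2. Teacher who teaches the maximum number of students.
busiest_teacher = max(students_of, key=lambda t: len(students_of[t]))
# 3. Student who has the highest number of teachers.
most_taught_student = max(teachers_of, key=lambda s: len(teachers_of[s]))
```

Every answer is obtained by counting rows, which is why a table without measures is still useful.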
66. What is the purpose of Factless Fact Table?
Factless fact tables are so called because they contain only keys that refer to the dimension tables.
They hold no measures of their own and are commonly used to track the occurrence of
an event.
Eg: To find the number of leaves taken by an employee in a month, count the rows recorded for that
employee and month.
A tracking process or status collection can be performed by using factless fact tables. The fact table
has no numeric values to aggregate, hence the name. It holds only the key values that reference
the dimensions from which the status is collected.
67. What is a coverage fact?
A fact-less fact table can only answer ‘optimistic’ (positive) queries; it cannot answer a
negative query. Consider again the teacher and student example above. A fact-less fact table
containing the keys of teachers and students cannot answer queries like these:
1. Which teacher did not teach any student?
2. Which student was not taught by any teacher?
Why not? Because a fact-less fact table stores only the positive scenarios (such as a student being
taught by a teacher). If a student is not being taught by any teacher, that student’s key does not
appear in the table, which reduces the coverage of the table.
Coverage fact table attempts to answer this – often by adding an extra flag column. Flag = 0 indicates
a negative condition and flag = 1 indicates a positive condition. To understand this better, let’s
consider a class where there are 100 students and 5 teachers. So coverage fact table will ideally store
100 X 5 = 500 records (all combinations) and if a certain teacher is not teaching a certain student, the
corresponding flag for that record will be 0.
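A small sketch of the 100 x 5 example above, assuming the positive teacher/student pairs are known in advance. The pairs in `taught` are invented sample data; everything else follows the description.

```python
from itertools import product

teachers = [f"T{i}" for i in range(1, 6)]    # 5 teachers
students = [f"S{i}" for i in range(1, 101)]  # 100 students
taught = {("T1", "S1"), ("T1", "S2"), ("T2", "S1")}  # hypothetical positive rows

# Coverage fact: one row per combination, flag = 1 if taught, else 0.
coverage = [(t, s, 1 if (t, s) in taught else 0) for t, s in product(teachers, students)]

# Negative queries can now be answered from the coverage table alone.
taught_students = {s for _, s, flag in coverage if flag == 1}
untaught_students = [s for s in students if s not in taught_students]
idle_teachers = [t for t in teachers
                 if all(flag == 0 for tt, _, flag in coverage if tt == t)]
```

The cost of this design is visible in the row count: the table grows with the product of the dimension sizes, not with the number of real events.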
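68. What is fact constellation?
A fact constellation is a schema in which two or more fact tables share one or more dimension
tables, so that the fact tables can be joined through those common dimensions.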
69. What are incident and snapshot facts?
A fact table stores some kind of measurements. Usually these measurements are captured
against a specific time, and they vary with respect to time. It may happen
that the business is not able to capture all of its measures for every point in time. The
unavailable measurements can then be left empty (NULL) or filled with the last available
measurements. The first case is an example of an incident fact and the second is an example of a
snapshot fact.
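The difference between the two cases can be sketched as follows, using made-up daily readings in which `None` stands for a missing (NULL) measurement.

```python
# Hypothetical daily measurements; None marks a day with no captured value.
readings = {
    "2022-01-01": 100,
    "2022-01-02": None,
    "2022-01-03": 120,
    "2022-01-04": None,
}

# Incident fact: missing measurements are simply left empty.
incident = dict(readings)

# Snapshot fact: missing measurements are filled with the last available value.
snapshot = {}
last_value = None
for day in sorted(readings):
    if readings[day] is not None:
        last_value = readings[day]
    snapshot[day] = last_value
```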
70. What are the steps involved in designing a fact table?
Here is an overview of the four steps to design a fact table, as described by Kimball:
Choose the business process to model – The first step is to decide which business process to model by
gathering and understanding business needs and available data.
Declare the grain – Declaring the grain means describing exactly what a fact table record represents.
Choose the dimensions – Once the grain of the fact table is stated clearly, it is time to determine the
dimensions for the fact table.
Identify the facts – Identify carefully which facts will appear in the fact table.
71. What is a fact, and what are the types of measures?
A fact is something that is quantifiable (Or measurable). Facts are typically (but not always)
numerical values that can be aggregated.
A “fact” is a numeric value that a business wishes to count or sum.
Types of Facts
There are three types of facts:
Additive: Additive facts are facts that can be summed up through all of the dimensions in the fact
table.
Semi-Additive: Semi-additive facts are facts that can be summed up for some of the dimensions in
the fact table, but not the others.
Non-Additive: Non-additive facts are facts that cannot be summed up for any of the dimensions
present in the fact table.
Eg: Facts that are calculated percentages or ratios.
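To make the three types concrete, here is a small sketch over an invented fact table with one row per day and store. The column choice (sales amount as additive, closing stock as semi-additive, margin ratio as non-additive) is an assumption made for illustration.

```python
# Hypothetical fact rows: (day, store, sales_amount, cost, closing_stock)
rows = [
    ("d1", "A", 100.0, 60.0, 10),
    ("d1", "B", 200.0, 150.0, 20),
    ("d2", "A", 300.0, 150.0, 8),
    ("d2", "B", 100.0, 90.0, 25),
]

# Additive: sales_amount can be summed across every dimension (days and stores).
total_sales = sum(sales for _, _, sales, _, _ in rows)

# Semi-additive: closing_stock can be summed across stores for one day,
# but summing it across days would count the same goods repeatedly.
stock_on_d2 = sum(stock for day, _, _, _, stock in rows if day == "d2")

# Non-additive: a margin ratio cannot be summed or averaged row by row;
# it has to be recomputed from its additive components.
total_cost = sum(cost for _, _, _, cost, _ in rows)
overall_margin = (total_sales - total_cost) / total_sales
naive_margin = sum((sales - cost) / sales for _, _, sales, cost, _ in rows) / len(rows)
```

The last two values differ, which is why ratios are usually stored as their numerator and denominator.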
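72. What is conformed fact?
Conformed facts are measures that have the same definition and units in every fact table or data
mart in which they appear, so they can be compared and combined across marts. By analogy,
conformed dimensions are the dimensions which can be used across multiple Data Marts in
combination with multiple fact tables.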
73. What is a level of Granularity of a fact table?
Level of granularity means the level of detail that you put into the fact table in a data warehouse,
that is, how much detail you are willing to keep for each transactional fact.
Or
Level of granularity means the level of detail that you put into the fact table in a data warehouse. For
example, based on the design you can decide to store the sales data for each transaction. The level of
granularity is then the detail you are willing to keep for each transactional fact: product sales
recorded for each individual transaction, or aggregated up to the minute and stored at that level.
Or
Granularity
The first step in designing a fact table is to determine the granularity of the fact table. By granularity,
we mean the lowest level of information that will be stored in the fact table.
This constitutes two steps:
Determine which dimensions will be included.
Determine where along the hierarchy of each dimension the information will be kept.
The determining factor usually goes back to the requirements.
74. What does level of Granularity of a fact table signify?
In simple terms, the level of granularity defines the extent of detail. As an example, let us look at the
geographical level of granularity. We may analyze data at the levels of COUNTRY, REGION,
TERRITORY, CITY and STREET. In this case, the highest level of granularity (the finest detail) is STREET.
75. What is dimension table?
A dimension table is a table in a star schema of a data warehouse. A dimension table stores attributes,
or dimensions, that describe the objects in a fact table.
In data warehousing, a dimension is a collection of reference information about a measurable event.
These events are known as facts and are stored in a fact table. Dimensions categorize and describe
data warehouse facts and measures in ways that support meaningful answers to business questions.
They form the very core of dimensional modeling.
76. What are different types of Dimension Table?
Conformed Dimension:
Conformed dimensions mean the exact same thing with every possible fact table to which they are
joined.
Eg: The date dimension table connected to the sales facts is identical to the date dimension connected
to the inventory facts.
Junk Dimension:
Contains low-cardinality flags or indicators. It is generated by cross joining two or more
low-cardinality dimensions. Example: Cross join the gender and marital status dimensions to
generate a junk dimension.
Degenerate Dimension:
A degenerate dimension is a dimension which is derived from the fact table and doesn’t have its own
dimension table.
Eg: A transactional code in a fact table.
Or
Degenerate Dimension: Keeping the control information in the fact table. Eg: Consider a dimension
table with fields like order number and order line number that has a 1:1 relationship with the fact table.
In this case the dimension is removed and the order information is stored directly in the fact table
in order to eliminate unnecessary joins while retrieving order information.
Role-playing dimension:
Dimensions which are often used for multiple purposes within the same database are called
role-playing dimensions. For example, a date dimension can be used for “date of sale”, as well as
“date of delivery”, or “date of hire”.
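The junk dimension described above (a cross join of low-cardinality attributes) can be sketched as below. The attribute values and the `junk_key` column name are assumptions chosen for the example.

```python
from itertools import product

genders = ["M", "F"]
marital_statuses = ["Single", "Married", "Divorced"]

# Cross join the two low-cardinality attributes and assign a surrogate key to each combination.
junk_dimension = [
    {"junk_key": key, "gender": gender, "marital_status": status}
    for key, (gender, status) in enumerate(product(genders, marital_statuses), start=1)
]

# During the fact load, a pair of flags is translated into a single junk key.
junk_key_lookup = {(row["gender"], row["marital_status"]): row["junk_key"]
                   for row in junk_dimension}
```

The fact table then carries one foreign key instead of one column per flag.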
77. What is a core dimension?
Core dimension is a dimension table which is dedicated for single fact table or data mart.
78. What is a mini dimension?
Mini dimensions can be used to handle the rapidly changing dimension scenario. If a dimension has a
large number of rapidly changing attributes, it is better to separate those attributes into a different table
called a mini dimension. This is done because if the main dimension table is designed as SCD type 2,
the table will soon grow very large and create performance issues. Segregating the rapidly
changing attributes into a different table keeps the main dimension table small and
performant.
79. What is a static dimension?
80. What are garbage dimensions?
81. What are multi-valued dimensions?
82. What are hot swappable dimensions?
83. When should we create separate fact tables?
84. Can’t we store degenerate dimension in dimensions table instead of fact table?
85. What is lookup table in dimension model?
86. What are the different methods of loading Dimension tables?
There are two different ways to load data in dimension tables.
Conventional (Slow):
All the constraints and keys are validated against the data before it is loaded; this way data integrity
is maintained.
Direct (Fast):
All the constraints and keys are disabled before the data is loaded. Once the data is loaded, it is validated
against all the constraints and keys. If data is found invalid or dirty, it is not included in the index and all
future processes skip this data.
87. What is slowly changing dimension table?
Dimensions that change over time are called slowly changing dimensions. For instance, a product
price changes over time, people change their names for some reason, and country, state and city names
may change over time. These are a few examples of slowly changing dimensions.
(or)
Slowly Changing Dimensions (SCD) – dimensions that change slowly over time, rather than changing on a
regular, time-based schedule. In a data warehouse there is a need to track changes in dimension attributes
in order to report historical data. In other words, implementing one of the SCD types should enable users
to assign the proper dimension attribute value for a given date. Examples of such dimensions could be:
customer, geography, employee.
There are many approaches to dealing with SCD. The most popular are:
* Type 0 – The passive method
* Type 1 – Overwriting the old value
* Type 2 – Creating a new additional record
* Type 3 – Adding a new column
* Type 4 – Using historical table
* Type 6 – Combine approaches of types 1,2,3 (1+2+3=6)
Type 0 – The passive method. In this method no special action is performed upon dimensional changes.
Some dimension data (http://datawarehouse4u.info/SCD-Slowly-Changing-Dimensions.html) can remain the
same as it was when first inserted; other data may be overwritten.
Type 1 – Overwriting the old value. In this method no history of dimension changes is kept in the database.
The old dimension value is simply overwritten by the new one. This type is easy to maintain and is often
used for data whose changes are caused by processing corrections (e.g. removing special characters,
correcting spelling errors).
Before the change:
Customer_ID  Customer_Name  Customer_Type
1            Cust_1         Corporate
After the change:
Customer_ID  Customer_Name  Customer_Type
1            Cust_1         Retail
Type 2 – Creating a new additional record. In this methodology all history of dimension changes is kept in
the database. You capture an attribute change by adding a new row with a new surrogate key to the
dimension table. Both the prior and new rows contain as attributes the natural key (or other durable
identifier). Also ‘effective date’ and ‘current indicator’ columns are used in this method. There can be
only one record with the current indicator set to ‘Y’. For the ‘effective date’ columns, i.e. start_date and
end_date, the end_date for the current record is usually set to the value 9999-12-31. Introducing changes
to the dimensional model in type 2 can be a very expensive database operation, so it is not recommended
for dimensions where a new attribute could be added in the future.
Before the change:
Customer_ID  Customer_Name  Customer_Type  Start_Date  End_Date    Current_Flag
1            Cust_1         Corporate      22-07-2010  31-12-9999  Y
After the change:
Customer_ID  Customer_Name  Customer_Type  Start_Date  End_Date    Current_Flag
1            Cust_1         Corporate      22-07-2010  17-05-2012  N
2            Cust_1         Retail         18-05-2012  31-12-9999  Y
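The Type 2 change shown in the tables above can be sketched in Python as follows. This is only an illustration: it assumes Customer_ID acts as the surrogate key and Customer_Name as the natural key, and the function name `apply_scd2_change` is made up for the example.

```python
from datetime import date, timedelta

HIGH_DATE = date(9999, 12, 31)  # conventional end_date of the current record

def apply_scd2_change(rows, natural_key, new_type, change_date):
    """Expire the current row for natural_key and append a new current row."""
    current = next(r for r in rows
                   if r["customer_name"] == natural_key and r["current_flag"] == "Y")
    current["end_date"] = change_date - timedelta(days=1)
    current["current_flag"] = "N"
    rows.append({
        "customer_id": max(r["customer_id"] for r in rows) + 1,  # new surrogate key
        "customer_name": natural_key,
        "customer_type": new_type,
        "start_date": change_date,
        "end_date": HIGH_DATE,
        "current_flag": "Y",
    })

dim_customer = [{
    "customer_id": 1, "customer_name": "Cust_1", "customer_type": "Corporate",
    "start_date": date(2010, 7, 22), "end_date": HIGH_DATE, "current_flag": "Y",
}]
apply_scd2_change(dim_customer, "Cust_1", "Retail", date(2012, 5, 18))
```

After the call, `dim_customer` holds the same two rows as the "After the change" table.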
Type 3 – Adding a new column. In this type usually only the current and previous values of the dimension
are kept in the database. The new value is loaded into the ‘current/new’ column and the old one into the
‘old/previous’ column. Generally speaking, the history is limited to the number of columns created for
storing historical data. This is the least commonly needed technique.
Before the change:
Customer_ID  Customer_Name  Current_Type  Previous_Type
1            Cust_1         Corporate     Corporate
After the change:
Customer_ID  Customer_Name  Current_Type  Previous_Type
1            Cust_1         Retail        Corporate
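A minimal sketch of the Type 3 update above. The helper name `apply_scd3_change` and the second change to "Other" are assumptions added to show how the history is limited to one previous value.

```python
def apply_scd3_change(row, new_type):
    """Shift the current value into the 'previous' column, then overwrite it."""
    row["previous_type"] = row["current_type"]
    row["current_type"] = new_type
    return row

customer = {"customer_id": 1, "customer_name": "Cust_1",
            "current_type": "Corporate", "previous_type": "Corporate"}

apply_scd3_change(customer, "Retail")
after_first_change = dict(customer)   # matches the "After the change" table

apply_scd3_change(customer, "Other")  # hypothetical second change
after_second_change = dict(customer)  # "Corporate" is no longer recoverable
```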

Type 4 – Using historical table. In this method a separate historical table is used to track all of the
dimension’s attribute historical changes for each dimension. The ‘main’ dimension table keeps only the
current data, e.g. customer and customer_history tables.
Current table (http://datawarehouse4u.info/SCD-Slowly-Changing-Dimensions.html):
Customer_ID  Customer_Name  Customer_Type
1            Cust_1         Corporate
Historical table:
Customer_ID  Customer_Name  Customer_Type  Start_Date  End_Date
1            Cust_1         Retail         01-01-2010  21-07-2010
1            Cust_1         Other          22-07-2010  17-05-2012
1            Cust_1         Corporate      18-05-2012  31-12-9999
Type 6 – Combine approaches of types 1,2,3 (1+2+3=6). In this type the dimension table has such additional
columns as:
* current_type – for keeping the current value of the attribute. All history records for a given item of the
attribute have the same current value.
* historical_type – for keeping the historical value of the attribute. All history records for a given item of
the attribute could have different values.
* start_date – for keeping the start date of the ‘effective date’ of the attribute’s history.
* end_date – for keeping the end date of the ‘effective date’ of the attribute’s history.
* current_flag – for keeping information about the most recent record.
In this method, to capture an attribute change we add a new record as in type 2. The current_type
information is overwritten with the new one, as in type 1. We store the history in a historical column
(historical_type), as in type 3.
Customer_ID  Customer_Name  Current_Type  Historical_Type  Start_Date  End_Date    Current_Flag
1            Cust_1         Corporate     Retail           01-01-2010  21-07-2010  N
2            Cust_1         Corporate     Other            22-07-2010  17-05-2012  N
3            Cust_1         Corporate     Corporate        18-05-2012  31-12-9999  Y
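The Type 6 table above can be reproduced with a short sketch that combines the three behaviours. The function name `apply_scd6_change` is hypothetical, and the starting row (a customer first recorded as Retail on 01-01-2010) is inferred from the first line of the table.

```python
from datetime import date, timedelta

HIGH_DATE = date(9999, 12, 31)

def apply_scd6_change(rows, name, new_type, change_date):
    """Type 1 + 2 + 3: overwrite current_type, expire the old row, add a new row."""
    for row in rows:
        if row["customer_name"] != name:
            continue
        row["current_type"] = new_type       # type 1: same current value on every row
        if row["current_flag"] == "Y":       # type 2: expire the previous current row
            row["end_date"] = change_date - timedelta(days=1)
            row["current_flag"] = "N"
    rows.append({
        "customer_id": len(rows) + 1, "customer_name": name,
        "current_type": new_type,
        "historical_type": new_type,         # type 3: value in effect for this row
        "start_date": change_date, "end_date": HIGH_DATE, "current_flag": "Y",
    })

dim_customer = [{
    "customer_id": 1, "customer_name": "Cust_1",
    "current_type": "Retail", "historical_type": "Retail",
    "start_date": date(2010, 1, 1), "end_date": HIGH_DATE, "current_flag": "Y",
}]
apply_scd6_change(dim_customer, "Cust_1", "Other", date(2010, 7, 22))
apply_scd6_change(dim_customer, "Cust_1", "Corporate", date(2012, 5, 18))
```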
88. What are fast changing dimensions?
89. What is hybrid slowly changing dimension?
Hybrid SCDs are a combination of both SCD 1 and SCD 2. It may happen that in a table some columns
are important and we need to track changes for them, i.e. capture their historical data, whereas
for other columns we don’t care even if the data changes.
90. How different is SCD1 from SCD2?
SCD1 stores only current data, whereas SCD2 also stores history records.
91. What are the columns available in SCD2 that help to track changes?
Current_Flag, Start_Date and End_Date
Eg:
Customer_ID  Customer_Name  Customer_Type  Start_Date  End_Date    Current_Flag
1            Cust_1         Corporate      22-07-2010  17-05-2012  N
2            Cust_1         Retail         18-05-2012  31-12-9999  Y

92. Explain in brief about critical column.
A critical column in a warehouse is a column whose value changes over a period of time. Eg: the city
of a user. Suppose a user resides in city ‘abc’ and the warehouse keeps track of his daily expenses.
When the user changes city, the data warehouse becomes inconsistent, since the city has changed
and all the expenses are now shown under the new city.
Or
A column (usually granular) whose values change over a period of time is called a critical column.
For example, a customer named ‘Anirudh’ resided in Bangalore for 4 years and then shifted
to Pune. While in Bangalore, he made purchases worth Rs. 30 Lakhs. If the CITY is simply changed
in the data warehouse, those purchases will now be shown under Pune only. This
makes the data warehouse inconsistent. In this example, CITY is the critical column.
A surrogate key can be used as a solution for this.
93. Define indexing? (http://www.atoziq.com/2012/10/define-indexing-data-warehousing.html)
Indexing is a technique used for efficient data retrieval, that is, for accessing data faster.
When a table grows in volume, its indexes also increase in size and require more storage.
94. Which kind of index is preferred in DWH?
A bitmap index is often preferred, because a B-tree index is suited to columns with unique or nearly
unique values (eg: empid), while a bitmap index is best for columns with few, repeated values
(eg: gender m/f).
95. What type of Indexing mechanism do we need to use for a typical datawarehouse?
On the fact table it is best to use bitmap indexes. Dimension tables can use bitmap and/or the other
types of clustered/non-clustered, unique/non-unique indexes.
To my knowledge, SQL Server does not support user-created bitmap indexes, whereas Oracle does.
96. Explain about the role of bitmap indexes to solve aggregation problems?
Bitmap indexes are very useful in a star schema for joining large tables to small tables. Queries are
answered by performing bitwise logical operations on the bit arrays. Bitmap indexes are very
efficient on low-cardinality columns such as gender, and repetitive operations such as counting are
performed with much greater efficiency.
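To illustrate how logical operations on bit arrays answer a query, the sketch below builds one bitmap per distinct value using plain Python integers. It is a toy model with invented columns, not how any particular database stores its bitmap indexes.

```python
# Hypothetical column values for five rows (row positions 0..4).
genders = ["M", "F", "F", "M", "F"]
regions = ["East", "East", "West", "West", "East"]

def build_bitmaps(values):
    """Return one integer bitmap per distinct value; bit i is set if row i has the value."""
    bitmaps = {}
    for position, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << position)
    return bitmaps

gender_index = build_bitmaps(genders)
region_index = build_bitmaps(regions)

# WHERE gender = 'F' AND region = 'East' becomes a single bitwise AND.
matches = gender_index["F"] & region_index["East"]
matching_rows = [i for i in range(len(genders)) if (matches >> i) & 1]

# COUNT(*) for the same predicate is just the number of set bits.
match_count = bin(matches).count("1")
```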
97. Explain about Encoding technique used in bitmaps indexes?
Bitmap indexes commonly use one bitmap for every distinct value. The number of bitmaps used can be
reduced by opting for a different type of encoding. This saves space, but more bitmaps may have to be
accessed to answer a query.
98. How many clustered indexes can you create for a table in DWH? In case of truncate and delete
commands, what happens to a table which has a unique id?
You can have only one clustered index per table. If you use the delete command, you can roll back, but
it fills your redo log files. If you do not want the records, you may use the truncate command, which will
be faster and does not fill your redo log file.
99. List out the OLAP operations in multidimensional data model?
(http://www.atoziq.com/2012/10/list-out-olap-operations-in.html)
=> Roll-up
=> Drill-down
=> Slice and dice
=> Pivot (or) rotate
100. What is roll-up operation? (http://www.atoziq.com/2012/10/what-is-roll-up-operation-data.html)
The roll-up operation, also called the drill-up operation, performs aggregation on a data cube
either by climbing up a concept hierarchy for a dimension (or) by dimension reduction.
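Both forms of roll-up can be sketched on a tiny, made-up cube with a city-to-country hierarchy and a product dimension.

```python
from collections import defaultdict

# Hypothetical cells: (city, country, product, sales)
sales = [
    ("Berlin", "Germany", "Bread", 10),
    ("Munich", "Germany", "Bread", 5),
    ("Paris", "France", "Bread", 7),
    ("Paris", "France", "Milk", 3),
]

# Roll-up by climbing the hierarchy: city -> country.
by_country_product = defaultdict(int)
for city, country, product, amount in sales:
    by_country_product[(country, product)] += amount

# Roll-up by dimension reduction: drop the product dimension altogether.
by_country = defaultdict(int)
for (country, product), amount in by_country_product.items():
    by_country[country] += amount
```

Drill-down is the reverse navigation, from `by_country` back towards the city-level cells.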
101. What is slicing-dicing?
Slicing means showing a slice of the data, given a certain dimension (e.g. Product), a value
(e.g. Brown Bread) and measures (e.g. sales).
Dicing means viewing the slice with respect to different dimensions and at different levels of
aggregation.
Slicing and dicing operations are part of pivoting.
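Following the definitions above, a slice fixes one dimension value and a dice then views that slice along other dimensions. The sketch uses an invented cube of (product, region, quarter, sales) cells.

```python
# Hypothetical cube cells: (product, region, quarter, sales)
cube = [
    ("Brown Bread", "East", "Q1", 120),
    ("Brown Bread", "West", "Q1", 80),
    ("Brown Bread", "East", "Q2", 100),
    ("Milk", "East", "Q1", 60),
    ("Milk", "West", "Q2", 90),
]

# Slice: fix the Product dimension to the value "Brown Bread".
brown_bread_slice = [cell for cell in cube if cell[0] == "Brown Bread"]

# Dice: view the slice by region, aggregated over quarters.
sales_by_region = {}
for _, region, _, amount in brown_bread_slice:
    sales_by_region[region] = sales_by_region.get(region, 0) + amount

# Dice again: view the same slice by quarter, aggregated over regions.
sales_by_quarter = {}
for _, _, quarter, amount in brown_bread_slice:
    sales_by_quarter[quarter] = sales_by_quarter.get(quarter, 0) + amount
```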
102. Difference between slicing and dicing with example?
103. What is drill-through?
Drill-through is the process of going from summary data to the detail-level data.
Consider the above example on retail shops. If the CEO finds out that sales in East Europe have
declined this year compared to last year, he might want to know the root cause of the decrease.
For this, he may start drilling through his report to a more detailed level and eventually find out that
even though individual shop sales have actually increased, the overall sales figure has decreased
because a certain shop in Turkey has stopped operating. The detail-level data, which the CEO was not
much interested in earlier, has this time helped him to pinpoint the root cause of the declined sales.
The method he followed to obtain the details from the aggregated data is called drill-through.
104. What is drilling across?
Drill across corresponds to switching from one classification in one dimension to a different
classification in a different dimension.
105. What is difference between drill & scope of analysis?
Drilling can be done in drill down, up, through, and across; scope is the overall view of the drill
exercise.
106. What is a Date Dimension and how will you load it?
Date (time) dimensions are usually loaded by a program that loops through all possible dates that may
appear in the data. 100 years may be represented in a date dimension, with one row per day.
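Such a loading program can be very short. The sketch below generates one row per day for a single year; the column names (date_key, quarter and so on) are assumptions, and a real date dimension usually carries many more attributes.

```python
from datetime import date, timedelta

def build_date_dimension(start, end):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "date_key": day.year * 10000 + day.month * 100 + day.day,  # e.g. 20220101
            "full_date": day,
            "year": day.year,
            "quarter": (day.month - 1) // 3 + 1,
            "month": day.month,
            "day_of_week": day.isoweekday(),  # 1 = Monday ... 7 = Sunday
        })
        day += timedelta(days=1)
    return rows

date_dim = build_date_dimension(date(2022, 1, 1), date(2022, 12, 31))
```

Widening the range to 100 years only changes the two arguments; the result is still a small table of roughly 36,500 rows.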
107. What is star Schema?
A star schema is the simplest form of a dimensional model, in which data is organized
into facts and dimensions. A fact is an event that is counted or measured, such as a sale or login. A
dimension contains reference information about the fact, such as date, product, or customer. A star
schema is diagrammed by surrounding each fact with its associated dimensions. The resulting diagram
resembles a star.
108. What is Snowflake schema
The snowflake schema represents a dimensional model which is also composed of a central fact table
and a set of constituent dimension tables which are further normalized into sub-dimension tables. In
a snowflake schema implementation, Warehouse Builder uses more than one table or view to store
the dimension data. Separate database tables or views store data pertaining to each level in the
dimension.
109. When do you snowflake and when not?
110. What is BUS Schema?
A BUS schema is composed of a master suite of conformed dimensions and standardized definitions of
facts.
Or
A BUS schema identifies the common dimensions across business processes, i.e. the
conformed dimensions. It has conformed dimensions and standardized definitions of facts.
111. What is galaxy schema?
Galaxy Schema:
Galaxy schema contains many fact tables with some common dimensions (conformed dimensions).
This schema is a combination of many data marts.
112. Fact Constellation Schema:
The dimensions in this schema are segregated into independent dimensions based on the levels of
hierarchy. For example, if geography has five levels of hierarchy like territory, region, country, state
and city, a constellation schema would have five dimensions instead of one.
113. What are the differences between snowflake schema and star schema?
Comparison chart (Snowflake Schema vs. Star Schema)
Ease of maintenance/change:
  Snowflake Schema: No redundancy, hence easier to maintain and change.
  Star Schema: Has redundant data, hence less easy to maintain/change.
Ease of use:
  Snowflake Schema: More complex queries, hence less easy to understand.
  Star Schema: Less complex queries, easy to understand.
Query performance:
  Snowflake Schema: More foreign keys, hence more query execution time.
  Star Schema: Fewer foreign keys, hence less query execution time.
Type of data warehouse:
  Snowflake Schema: Good to use for the data warehouse core, to simplify complex relationships (many:many).
  Star Schema: Good for data marts with simple relationships (1:1 or 1:many).
Joins:
  Snowflake Schema: Higher number of joins.
  Star Schema: Fewer joins.
Dimension table:
  Snowflake Schema: May have more than one dimension table for each dimension.
  Star Schema: Contains only a single dimension table for each dimension.
When to use:
  Snowflake Schema: When the dimension table is relatively big in size, snowflaking is better as it reduces space.
  Star Schema: When the dimension table contains fewer rows, we can go for a star schema.
Normalization/De-normalization:
  Snowflake Schema: Dimension tables are in normalized form but the fact table is still in de-normalized form.
  Star Schema: Both dimension and fact tables are in de-normalized form.
Data model:
  Snowflake Schema: Bottom up approach.
  Star Schema: Top down approach.
114. When should you use a Star and when should you use Snowflake Schema?
Star Schema: If performance is the priority, then go for a star schema, since here the dimension tables
are denormalized.
When the dimension table contains fewer rows, we can go for a star schema.
Snowflake Schema: If storage space is the priority, then go for a snowflake schema, since here the
dimension tables are normalized.
When the dimension table is relatively big in size, snowflaking is better as it reduces space.
115. What are advantages/disadvantages of Star Schema?
Advantage of Star Schema:
1. Provides a direct mapping between the business entities and the schema design.
2. Provides highly optimized performance for star queries.
3. It is widely supported by a lot of business intelligence tools.
Disadvantage of Star Schema:
There are some requirements which cannot be met by a star schema. For example, the relationship
between customer and bank account cannot be represented purely as a star schema, as the relationship
between them is many to many.
116. What is advantage/Disadvantage of snowflake schema?
Advantage of Snowflake Schema
The main advantage of the snowflake schema is the improvement in query performance due to
minimized disk storage requirements and joining smaller lookup tables.
There is no redundancy, so it is easier to maintain.
It increases flexibility: it provides greater flexibility in the interrelationships between dimension
levels and components.
Disadvantage of Snowflake Schema
The main disadvantage of the snowflake schema is the additional maintenance effort needed due to the
increased number of lookup tables.
It makes queries much more difficult to create because more tables need to be joined.
117. How can you implement many relations in star schema model?
(http://www.questions-interviews.com/data-warehouse/data-warehousing-3.aspx#How_can_you_implement_many_relations_in_star_schema_model)
Many-to-many relations can be implemented by using a snowflake schema, with a maximum of n dimensions.
118. What are Aggregate tables?
An aggregate table contains a summary of existing warehouse data, grouped to certain levels
of dimensions. Retrieving the required data from the actual table, which may have millions of records,
takes more time and also affects server performance. To avoid this, we can aggregate the table to
the required level and query the aggregate instead. Such tables reduce the load on the database server,
improve query performance and return results very quickly.
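A minimal sketch of building an aggregate table, using Python's sqlite3 module with an in-memory database. The table and column names (sales_fact, sales_monthly_agg and so on) and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_fact (date_key TEXT, product_key TEXT, amount REAL);
    INSERT INTO sales_fact VALUES
        ('2022-01-01', 'bread', 10),
        ('2022-01-02', 'bread', 15),
        ('2022-02-01', 'bread', 20),
        ('2022-01-05', 'milk', 5);

    -- Aggregate the daily facts to the month/product level once, up front.
    CREATE TABLE sales_monthly_agg AS
        SELECT substr(date_key, 1, 7) AS month,
               product_key,
               SUM(amount) AS total_amount,
               COUNT(*)    AS row_count
        FROM sales_fact
        GROUP BY substr(date_key, 1, 7), product_key;
""")

# Reports then read the small aggregate table instead of scanning the fact table.
monthly = conn.execute(
    "SELECT month, product_key, total_amount FROM sales_monthly_agg "
    "ORDER BY month, product_key"
).fetchall()
```

The aggregate has to be rebuilt or incrementally refreshed whenever new facts are loaded.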
119. What is the difference between aggregate table and materialized view?
(http://www.questions-interviews.com/data-warehouse/data-warehousing-3.aspx#What_is_the_difference_between_aggregate_table_and_materialized_view)
Aggregate tables are pre-computed totals in the form of a hierarchical multidimensional structure,
whereas a materialized view is a database object which caches a query result in a concrete table and
updates it from the original database tables from time to time. Aggregate tables are used to speed up
query computation, whereas materialized views speed up data retrieval.
120. What are the steps to load Data Warehouse/Data Mart by using any ETL Tool?
ETL process
ETL (http://datawarehouse4u.info/ETL-process.html) (Extract, Transform and Load) is a process
in data warehousing responsible for pulling data out
of the source systems and placing it into a data warehouse. ETL involves the following tasks:
– extracting the data from source systems (SAP, ERP, and other operational systems); data from the
different source systems is converted into one consolidated data warehouse format which is
ready for transformation processing.
– transforming the data, which may involve the following tasks:
* applying business rules (so-called derivations, e.g., calculating new measures and dimensions),
* cleaning (e.g., mapping NULL to 0 or “Male” to “M” and “Female” to “F” etc.),
* filtering (e.g., selecting only certain columns to load),
* splitting a column into multiple columns and vice versa,
* joining together data from multiple sources (e.g., lookup, merge),
* transposing rows and columns,
* applying any kind of simple or complex data validation (e.g., if the first 3 columns in a row are
empty then reject the row from processing)
– loading the data into a data warehouse, a data repository or other reporting applications
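A few of the transformation tasks listed above (derivation, cleaning, filtering, splitting and validation) are sketched below on invented source rows. The field names and the `transform` helper are hypothetical; a real ETL tool would express the same rules in its own mapping language.

```python
GENDER_CODES = {"Male": "M", "Female": "F"}

def transform(row):
    """Return a cleaned row, or None if the row fails validation."""
    # Validation: reject the row if the first three columns are all empty.
    if not any(row[column] for column in ("full_name", "gender", "quantity")):
        return None
    # Splitting: one source column becomes two target columns.
    first_name, _, last_name = (row["full_name"] or "").partition(" ")
    # Cleaning: map NULL to 0 and long gender labels to codes.
    quantity = row["quantity"] or 0
    gender = GENDER_CODES.get(row["gender"], row["gender"])
    # Derivation: calculate a new measure. Filtering: unit_price is not loaded.
    return {
        "first_name": first_name,
        "last_name": last_name,
        "gender": gender,
        "total": quantity * row["unit_price"],
    }

# Extract: rows as they might arrive from a source system (made-up data).
source_rows = [
    {"full_name": "Ann Lee", "gender": "Female", "quantity": 2, "unit_price": 1.5},
    {"full_name": "Bob Ray", "gender": "Male", "quantity": None, "unit_price": 4.0},
    {"full_name": None, "gender": None, "quantity": None, "unit_price": 9.0},
]

# Load: keep only the rows that passed validation.
warehouse_rows = [clean for clean in map(transform, source_rows) if clean is not None]
```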
121. What is data cleaning? How can we do that?
Data cleaning is the process of identifying erroneous data. The data is checked for accuracy,
consistency, typos etc.
Data cleaning Methods:
Parsing – Used to detect syntax errors.
Data Transformation – Confirms that the input data matches in format with expected data.
Duplicate elimination – This process gets rid of duplicate entries.
Statistical Methods – Values of mean, standard deviation and range, or clustering algorithms etc., are
used to find erroneous data.
122. What is data scrubbing?
A process to upgrade the quality of data before it is moved into a data warehouse.
123. What is Data purging?
Deleting data from data warehouse is known as data purging. Usually junk data like rows with null
values or spaces are cleaned up.
Data purging is the process of cleaning this kind of junk values.
124. Which tables do you load first while loading DWH (Dimensions or Facts)?
Dimensions, because the fact records are loaded by joining with the dimension tables.
125. What is an early arriving fact OR late arriving dimension? Explain how you would handle
those records.
An early arriving fact occurs when the activity measurement arrives at the data warehouse
without its full context. In other words, the statuses of the dimensions attached to the activity
measurement are ambiguous or unknown for some period of time.
Handling late arriving dimensions:
Normally we first process the dimension records and insert them into the dimension table. Next,
the fact records are processed by joining with the dimension table. In the case of a late arriving
dimension, when the fact table is joined with the dimension, the affected fact records are not inserted
into the fact table, as there is no corresponding dimension row for them. To handle this, we create
another table into which we insert the fact records that failed to load into the original fact table.
When we process the data the next time, we use this table along with the fact stage table to join with
the dimension table and insert into the fact table.
The following sample SQL queries explain this process.
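Code:
-- Load the dimension rows first.
INSERT INTO dimension_table
SELECT * FROM dim_stg;
-- Re-queue the facts that failed in the previous run, then clear the failure table.
INSERT INTO fact_stg
SELECT * FROM fact_failed_records;
TRUNCATE TABLE fact_failed_records;
-- Load the staged facts that have a matching dimension row.
INSERT INTO fact_table
SELECT fact_stg.* FROM fact_stg
JOIN dimension_table
ON [join condition];
-- Park the staged facts that still have no matching dimension row.
INSERT INTO fact_failed_records
SELECT * FROM fact_stg
WHERE NOT EXISTS (SELECT * FROM dimension_table WHERE [join condition]);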
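The same cycle can be tried end to end with Python's sqlite3 module. This is only a sketch under stated assumptions: the cust_id join column, the two-column tables and the `run_load` helper are invented, and DELETE stands in for TRUNCATE because SQLite has no TRUNCATE statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dimension_table (cust_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_stg (cust_id INTEGER, amount REAL);
    CREATE TABLE fact_table (cust_id INTEGER, amount REAL);
    CREATE TABLE fact_failed_records (cust_id INTEGER, amount REAL);
""")

def run_load(dim_rows, fact_rows):
    """One load cycle: dimensions first, then facts, parking unmatched facts."""
    conn.executemany("INSERT INTO dimension_table VALUES (?, ?)", dim_rows)
    conn.execute("DELETE FROM fact_stg")
    conn.executemany("INSERT INTO fact_stg VALUES (?, ?)", fact_rows)
    # Re-queue the facts that failed in the previous cycle.
    conn.execute("INSERT INTO fact_stg SELECT * FROM fact_failed_records")
    conn.execute("DELETE FROM fact_failed_records")
    # Load the facts that have a matching dimension row.
    conn.execute("""INSERT INTO fact_table
                    SELECT f.cust_id, f.amount FROM fact_stg f
                    JOIN dimension_table d ON d.cust_id = f.cust_id""")
    # Park the facts whose dimension row has not arrived yet.
    conn.execute("""INSERT INTO fact_failed_records
                    SELECT f.cust_id, f.amount FROM fact_stg f
                    WHERE NOT EXISTS (SELECT 1 FROM dimension_table d
                                      WHERE d.cust_id = f.cust_id)""")

def count(table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

# Cycle 1: the fact for customer 2 arrives before its dimension row.
run_load([(1, "Cust_1")], [(1, 10.0), (2, 20.0)])
first_cycle = (count("fact_table"), count("fact_failed_records"))

# Cycle 2: the late dimension row arrives and the parked fact is loaded.
run_load([(2, "Cust_2")], [])
second_cycle = (count("fact_table"), count("fact_failed_records"))
```

An alternative not covered here is to insert a placeholder dimension row for the unknown key and update it when the real dimension data arrives.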
126. In which situations are contexts and aliases used?
127. Which technology should be used for interactive data querying across multiple
dimensions for a decision making for a DW?
MOLAP
128. What is the difference between metadata and data dictionary?
Metadata is:
The data about data;
The structured data about data;
A set of independent assertions about a resource.
Data Dictionary is:
An integrated set of system tables;
Contains definitions of and information about all objects in the system;
Is “data about the data” or “metadata”;
Is entirely maintained by the RDBMS.
129. What is Natural Key?
A natural key is a set of one or more columns in the dimension table that uniquely identifies a record
in the table. The values of the natural key column(s) are provided by the source system. Ideally the
natural keys would be defined as the primary key of the dimension table, but we refrain from this for
the following reasons:
The natural key could contain non-numeric data type (timestamp or char) columns. Joining large
fact tables on non-numeric data types like timestamps could lead to performance issues.
The natural key could consist of more than one column, increasing the size of the fact table, as we
would need more than one column to join the fact table with the dimension table.
The format and structure of the natural keys could change in the future. This could happen when
new source systems are added.
Having natural keys will make the process of Slowly Changing Dimensions
(http://www.dwhinfo.com/Technical/DWHETLSlowlyChangingDimensionProcess.html) very
complex.
130. What is Surrogate Key?
Surrogate keys are keys that have no “business” meaning and are solely used to identify a record in
the table. Such keys are either database generated (example: Identity in SQL Server, Sequence in
Oracle, Sequence/Identity in DB2 UDB etc.) or system generated values (like generated via a table in
the schema).
131. What is Primary key?
A primary key is a unique identifier for a database (http://pc.net/glossary/definition/database) record.
When a table is created, one of the fields is typically assigned as the primary key. While the primary
key is often a number, it may also be a text field or other data type
(http://pc.net/glossary/definition/datatype).
132. What is Foreign Key?
In the context of relational databases, a foreign key is a field
(http://en.wikipedia.org/wiki/Field_(computer_science)) (or collection of fields) in one table
(http://en.wikipedia.org/wiki/Table_(database)) that uniquely identifies a row of another table. In
other words, a foreign key is a column or a combination of columns that is used to establish and
enforce a link between the data in two tables.
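The "enforce a link" part can be seen in a small sketch using SQLite through Python's sqlite3 module. The tables dim_product and fact_sales and their rows are hypothetical. Note that SQLite only enforces foreign keys when the foreign_keys pragma is switched on for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute("CREATE TABLE dim_product (product_sk INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE fact_sales ("
    " sale_id INTEGER PRIMARY KEY,"
    " product_sk INTEGER NOT NULL REFERENCES dim_product(product_sk),"
    " amount REAL)"
)

conn.execute("INSERT INTO dim_product VALUES (1, 'Widget')")
conn.execute("INSERT INTO fact_sales VALUES (100, 1, 9.5)")  # valid: product 1 exists

try:
    # Product 99 does not exist in dim_product, so the link cannot be established.
    conn.execute("INSERT INTO fact_sales VALUES (101, 99, 4.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True
```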
133. In your Data Warehouse, do you prefer to use Natural Keys or Surrogate Keys, and why?
Surrogate keys.
The reasons for this choice are:
The natural key could contain non-numeric columns (timestamp or char). Joining large
fact tables on non-numeric data types such as timestamps could lead to performance issues.
A natural key could consist of more than one column, which increases the size of the fact table because
more than one column is needed to join the fact table with the dimension table.
The format and structure of the natural keys could change in the future. This could happen when
new source systems are added.
Using natural keys makes the Slowly Changing Dimensions process
(http://www.dwhinfo.com/Technical/DWHETLSlowlyChangingDimensionProcess.html) very
complex.
134. What is transitive dependency?
135. Which kind of index is preferred in DWH?
136. How many clustered indexes can you create for a table in DWH?
By definition, a clustered index physically arranges all data in a table in a sequential manner. Since
you cannot have more than one physical arrangement of data in a table, you can have just one
clustered index per table.
137. What is a data profile?
Data profiling is a way to find out the profile of the information contained in the source.
For example, a column in a table may be defined as alphanumeric even though the majority of its data
is numeric. Profiling tools provide statistical information about how many records hold purely
numeric values as against the number of records with alphanumeric data. Before a data migration
exercise, these tools provide vital clues about whether the exercise is going to be a success or a
failure. This can help in changing the target schema or applying cleansing at the source level so
that most of the records can get into the destination database. In DW these tools are used at the
design stage for the same purpose. Some tool vendors who sell this as a product call this the data
discovery phase.
138. Transient data is which of the following?
Data in which changes to existing records cause the previous version of the records to be eliminated.
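The column check described under data profiling (question 137) can be sketched in a few lines of Python. The function name profile_column and the sample values are assumptions for illustration; real profiling tools report many more statistics.

```python
def profile_column(values):
    """Count how many values are purely numeric, alphanumeric, or empty."""
    counts = {"numeric": 0, "alphanumeric": 0, "empty": 0}
    for value in values:
        text = value.strip()
        if not text:
            counts["empty"] += 1
        elif text.isdigit():
            counts["numeric"] += 1
        else:
            counts["alphanumeric"] += 1
    return counts


# A column declared as alphanumeric whose content is mostly numeric.
sample = ["10021", "10022", "A-778", "", "10023"]
profile = profile_column(sample)
print(profile)  # {'numeric': 3, 'alphanumeric': 1, 'empty': 1}
```

A result like this suggests the target column could be numeric if the one exceptional value is cleansed at the source.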
139. What is loop in Data warehousing?
In DWH, loops may exist between the tables. If loops exist, query generation will take more time
because more than one path is available. Loops also create ambiguity. Loops can be avoided by creating
aliases of the table or by using contexts.
Example: 4 tables – Customer, Product, Time, Cost – forming a closed loop. Create an alias for Cost to
avoid the loop.
140. What is Data Cardinality?
Cardinality is the term used in database relations to denote the occurrences of data on either side of
the relation.
There are 3 basic types of cardinality:
High data cardinality:
Values of a data column are very uncommon or unique.
e.g.: email IDs and user names
Normal data cardinality:
Values of a data column are somewhat uncommon but never unique.
e.g.: A data column containing LAST_NAME (there may be several entries of the same last name)
Low data cardinality:
Values of a data column are very common and repeat often.
e.g.: flag statuses: 0/1
Determining data cardinality is a substantial aspect of data modeling. It is used to determine
the relationships.
Types of cardinalities:
The Link Cardinality – 0:0 relationship
The Sub-type Cardinality – 1:0 relationship
The Physical Segment Cardinality – 1:1 relationship
The Possession Cardinality – 0:M relationship
The Child Cardinality – 1:M mandatory relationship
The Characteristic Cardinality – 0:M relationship
The Paradox Cardinality – 1:M relationship
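To make the high, normal and low column cardinality levels concrete, here is a small sketch that measures the share of distinct values in a column. The helper cardinality_ratio and the sample columns are hypothetical, and the text does not define numeric thresholds, so the ratios are only a rough guide.

```python
def cardinality_ratio(values):
    """Distinct values divided by total values (1.0 means every value is unique)."""
    return len(set(values)) / len(values)


emails = [f"user{i}@example.com" for i in range(8)]                      # all unique
last_names = ["Rossi", "Smith", "Rossi", "Khan", "Smith", "Lee", "Khan", "Rossi"]
flags = [0, 1, 1, 0, 1, 0, 0, 1]                                         # only two values

ratios = {
    "email": cardinality_ratio(emails),
    "last_name": cardinality_ratio(last_names),
    "flag": cardinality_ratio(flags),
}
print(ratios)  # {'email': 1.0, 'last_name': 0.5, 'flag': 0.25}
```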
141. What is snapshot with reference to data warehouse?
Snapshot refers to a complete visualization of data at the time of extraction. It occupies less space and
can be used to back up and restore data quickly.
Or
A snapshot is a process of knowing about the activities performed. A snapshot is stored in a report
format from a specific catalog. The report is generated soon after the catalog is disconnected.
Or
You can disconnect the report from the catalog to which it is attached by saving the report with a
snapshot of the data.
142. What are Critical Success Factors?
Key areas of activity in which favorable results are necessary for a company to reach its goal.
There are four basic types of CSFs which are:
Industry CSFs
Strategy CSFs
Environmental CSFs
Temporal CSFs
A few CSFs are:
Money
Your future
Customer satisfaction
Quality
Product or service development
Intellectual capital
Strategic relationships
Employee attraction and retention
Sustainability
The advantages of identifying CSFs are:
they are simple to understand;
they help focus attention on major concerns;
they are easy to communicate to coworkers;
they are easy to monitor;
and they can be used in concert with strategic planning methodologies.
143. What is reconciled data?
Current data intended to be the single source for all decision support systems.
144. What is Chained Data Replication?
In Chain Data Replication, the non-official data set distributed among many disks provides for load
balancing among the servers within the data warehouse.
Blocks of data are spread across clusters and each cluster can contain a complete set of replicated
data. Every data block in every cluster is a unique permutation of the data in other clusters.
When a disk fails, all calls made to the data on that disk are redirected to the other disks where
the data has been replicated.
At times replicas and disks are added online without having to move around the data in the existing
copy or affect the arm movement of the disk.
In load balancing, Chain Data Replication has multiple servers within the data warehouse share data
request processing since data already have replicas in each server disk.
145. What is BROADCAST and REPLICATE?
Broadcast – Takes data from multiple inputs, combines it and sends it to all the output ports.
E.g. – You have 2 incoming flows (this can be data parallelism or component parallelism) on the Broadcast
component, one with 10 records and the other with 20 records. Then every outgoing flow (there can be
any number of flows) will have 10 + 20 = 30 records.
Replicate – Replicates the data for a particular partition and sends it out to multiple out ports of the
component, but maintains the partition integrity.
E.g. – Your incoming flow to Replicate has a data parallelism level of 2, with one partition having 10
records and the other having 20 records. Now suppose you have 3 output flows from Replicate. Then each flow
will have 2 data partitions with 10 and 20 records respectively.
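The record counts in the two examples above can be reproduced with a small Python model. This is only a sketch of the counting behavior described in the text, using plain lists as stand-ins for flows and partitions; the function names are hypothetical and it is not how the actual components are implemented.

```python
def broadcast(input_partitions, n_outputs):
    """Combine all inputs and send the combined data to every output flow."""
    combined = [rec for part in input_partitions for rec in part]
    return [list(combined) for _ in range(n_outputs)]


def replicate(input_partitions, n_outputs):
    """Copy the input to every output flow, keeping each partition separate."""
    return [[list(part) for part in input_partitions] for _ in range(n_outputs)]


# Two incoming partitions with 10 and 20 records, three outgoing flows.
flows = [list(range(10)), list(range(10, 30))]

broadcast_counts = [len(out) for out in broadcast(flows, 3)]
replicate_counts = [[len(part) for part in out] for out in replicate(flows, 3)]

print(broadcast_counts)  # [30, 30, 30]
print(replicate_counts)  # [[10, 20], [10, 20], [10, 20]]
```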
146. Which automation tool is used in data warehouse testing?
No tool-based testing is done in DWH; only manual testing is done.
147. What is the difference between a Scan component and a RollUp component?
Rollup is for group-by aggregation and Scan is for successive (running) totals. Basically, when we need
one summary record per group we use Rollup; when we need a cumulative total on every record we use Scan.
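The difference between a group-by aggregate and a successive total can be illustrated with Python's standard library. This is an analogy using itertools, with made-up sample data already sorted by the grouping key; it is not the components' own syntax.

```python
from itertools import accumulate, groupby
from operator import itemgetter

# Hypothetical input, sorted by region (the grouping key).
sales = [("east", 10), ("east", 5), ("west", 7), ("west", 3), ("west", 1)]

# Rollup-style: one summary record per group.
rollup = {
    region: sum(amount for _, amount in rows)
    for region, rows in groupby(sales, key=itemgetter(0))
}

# Scan-style: one output record per input record, carrying the running total
# within its group.
scan = []
for region, rows in groupby(sales, key=itemgetter(0)):
    amounts = [amount for _, amount in rows]
    scan.extend(zip([region] * len(amounts), accumulate(amounts)))

print(rollup)  # {'east': 15, 'west': 11}
print(scan)    # [('east', 10), ('east', 15), ('west', 7), ('west', 10), ('west', 11)]
```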
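148. What is skew and skew measurement?
Skew is a measure of how evenly data flows to each partition.
Skew = (partition size - average partition size) / size of the largest partition.
Suppose the input comes from 4 files of 100 MB, 200 MB, 300 MB and 500 MB (1100 MB, roughly 1 GB, in total).
Average partition size = 1100 MB / 4 = 275 MB.
Skew for the 100 MB partition = (100 - 275) / 500 = -0.35, a negative value.
Calculate the skew for the 200 MB, 300 MB and 500 MB partitions in the same way.
A positive value of skew is always desirable.
Skew is an indirect measure of the graph.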
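The calculation can be carried out for all four partitions with a short Python sketch. It assumes the formula implied by the worked example, (partition size - average size) / largest partition size, and computes the average directly from the four listed file sizes; the function name partition_skew is hypothetical.

```python
def partition_skew(sizes_mb):
    """Skew per partition: (size - average size) / largest size."""
    average = sum(sizes_mb) / len(sizes_mb)
    largest = max(sizes_mb)
    return [(size - average) / largest for size in sizes_mb]


sizes = [100, 200, 300, 500]  # MB per partition; the average is 275 MB
skews = [round(s, 2) for s in partition_skew(sizes)]
print(skews)  # [-0.35, -0.15, 0.05, 0.45]
```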
149. What are parallel queries and query hints?
150. What is parallelism?
Parallelism is when different processors share the same memory and
processes work on the same memory resources concurrently
or conditionally, as in an RDBMS.
151. What is Bit Mapped Index? (http://www.techieinterview.com/QuestionAnswersView.asp?Category=357&Qid=183&Question=What%20is%20Bit%20Mapped%20Index)
Bitmap indexes make use of bit arrays (bitmaps) to answer queries by performing bit-wise logical
operations. They work well with data that has a lower cardinality, that is, data that takes
few distinct values. Bitmap indexes are useful in data warehousing applications, where they
have a significant space and performance advantage over other structures for such data.
Tables with few insert or update operations are good candidates.
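A toy version of the idea, assuming Python integers as the bit arrays: each distinct column value gets one integer whose bit i is set when row i holds that value, and a two-condition filter becomes a single bitwise AND. The rows and the helper build_bitmap_index are made up for illustration; real database engines also compress the bitmaps.

```python
rows = [
    {"region": "east", "active": 1},
    {"region": "west", "active": 0},
    {"region": "east", "active": 0},
    {"region": "east", "active": 1},
]


def build_bitmap_index(rows, column):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    index = {}
    for i, row in enumerate(rows):
        index[row[column]] = index.get(row[column], 0) | (1 << i)
    return index


region_idx = build_bitmap_index(rows, "region")
active_idx = build_bitmap_index(rows, "active")

# WHERE region = 'east' AND active = 1 is answered with one bitwise AND.
hits = region_idx["east"] & active_idx[1]
matching_rows = [i for i in range(len(rows)) if (hits >> i) & 1]
print(matching_rows)  # [0, 3]
```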
130. What are the advantages of Bitmap indexes?
# They have a highly compressed structure, making them fast to read.
# Their structure makes it possible for the system to combine multiple indexes together so that they
can access the underlying table faster.
The Disadvantage of Bitmap indexes is:
# the overhead on maintaining them is enormous.
152. Explain regression in predictive modeling? (http://www.atoziq.com/2012/10/explain-regression-in-predictive.html)
=> Regression definition
=> linear regression
=> multiple regressions
=> Non-linear regression
=> other regression models
153. Explain statistical perspective in data mining? (http://www.atoziq.com/2012/10/explain-statistical-perspective-in-data.html)
=> Point estimation
=> Data summarization
=> Bayesian techniques
=> Hypothesis testing
=> Regression
=> Correlation
154. Explain Bayesian classification. (http://www.atoziq.com/2012/10/explain-bayesian-classification-data.html)
=> Bayesian theorem
=> Naïve Bayesian classification
=> Bayesian belief networks
=> Bayesian learning
155. What is Bi-directional Extract?
In hierarchical, networked or relational databases, the data can be extracted, cleansed and transferred
in two directions. The ability of a system to do this is referred to as bidirectional extracts.
This functionality is extremely useful in data warehousing projects.
Data Extraction
The source systems from which the data is extracted vary widely, from their structures and
file formats to the department and business segment they belong to. Common source formats
include flat files, relational databases and non-relational database structures such as IMS,
VSAM or ISAM.
Data transformation
The extracted data may undergo transformation, with possible addition of metadata, before it is
exported to another large storage area.
In the transformation phase, various functions related to business needs, requirements, rules and policies
are applied to the data. During this process some values are also translated and encoded. Care is also
taken to avoid redundancy of data.
Data cleansing
In data cleansing, incorrect or corrupted data is scrutinized and the inaccuracies are
removed. Data cleansing thus ensures data consistency.
It involves activities like
– removing typographical errors and inconsistencies
– comparing and validating data entries against a list of entities
Data loading
This is the last process of bidirectional extracts. The cleansed, transformed source data is
then loaded into the data warehouse.
Advantages
– Updates and data loading become very fast due to bidirectional extracting.
– As timely updates are received in a useful pattern companies can make good use of this data to
launch new products and formulate market strategies.
Disadvantage
– More investment in advanced and faster IT infrastructure.
– Not being able to come up with fault tolerance may mean unexpected stoppage of operations when
the system breaks.
– Skilled data administrator needs to be hired to manage the complex process.
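The extraction, transformation, cleansing and loading steps described in this answer can be sketched in one direction as a tiny Python pipeline. Everything here is an illustrative assumption: the record layout, the KNOWN_COUNTRIES reference list, and the in-memory list standing in for the warehouse. It does not show the bidirectional part.

```python
KNOWN_COUNTRIES = {"IT": "Italy", "FR": "France"}  # hypothetical list of valid entities


def extract():
    """Stand-in for reading a flat file or a source table."""
    return [
        {"id": "1", "name": " alice ", "country": "IT"},
        {"id": "2", "name": "Bob", "country": "XX"},      # unknown country code
        {"id": "1", "name": " alice ", "country": "IT"},  # redundant copy
    ]


def transform(records):
    """Translate coded values and drop redundant records."""
    seen, out = set(), []
    for rec in records:
        if rec["id"] in seen:
            continue
        seen.add(rec["id"])
        out.append({
            "id": int(rec["id"]),
            "name": rec["name"],
            "country": KNOWN_COUNTRIES.get(rec["country"]),  # None when unknown
        })
    return out


def cleanse(records):
    """Fix formatting inconsistencies and reject entries that failed validation."""
    return [
        {**rec, "name": rec["name"].strip().title()}
        for rec in records
        if rec["country"] is not None
    ]


def load(records, target):
    """Append the prepared records to the target store."""
    target.extend(records)


warehouse = []
load(cleanse(transform(extract())), warehouse)
print(warehouse)  # [{'id': 1, 'name': 'Alice', 'country': 'Italy'}]
```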
156. Who are the Data Stewards and what's their role?
Key Points
USGS needs good data stewardship at all levels of the organization.
Data stewards ensure official agency records requirements are met, and data documentation is developed and maintained.
Data stewards create data standards, establish data access security requirements, and are active in all levels of data management.
Project managers and field supervisors appoint data stewards and determine what data will be maintained.
Project managers ensure adherence to USGS requirements, and that resources are available for data management.
Specialists are GIS specialists, resource specialists, database/system administrators.
Specialists work with data stewards, implement data standards, create metadata, and manage databases.
ALL are responsible for the integrity and quality of the data.
A Data Steward is one who manages another’s facts or information to ensure that they can be used to
draw conclusions or make decisions. Data Stewards are “keepers of the flame” in terms of data
quality. They are responsible as stewards to serve and protect the customers’ needs or assets
(consider an airline steward or a trustee).
Stewardship equals taking responsibility for a set of data for the well being of the larger
organization, and operating in service to, rather than in control of, those around us.
Data stewardship is primarily the job of the professionals who create and maintain data. Although
they have significant support roles to play, stewardship cannot simply be delegated to
the IT or GIS shops.
For example, for a spatially-enabled dataset, the GIS person may be responsible for maintaining the
data but the decision on what information to collect and what format to keep it in belong to the
“ologists” and business area leads they are working with.
USGS cannot accomplish data management without people taking on the roles of data stewardship at
all levels of the organization. We are looking for people to embrace those data steward roles and
responsibilities. People with knowledge about the business needs of the organization are necessary at
all levels to define and manage data content and quality to ensure that the data collected and
maintained meet those business needs.
Data Steward Roles and Responsibilities
Many of the responsibilities of Data Stewards are the same, regardless of where the person falls
within the organization.
Be accountable for integrity and quality of data personally created/updated.
Data stewards are responsible for establishing requirements and assessing the quality of the data
in a database or a portion of a database used to make any official decision.
Data quality means fitness for intended use.
Create data standards and business rules. Follow formal established process.
Data stewards are responsible for leading or supporting the data standards efforts. These efforts
should follow the DOI/USGS process and include all documentation.
Ensure that information meets customer needs.
Can the data be relied on to be correct?
Are they in a format that is readable and understandable?
Is there current documentation on the data such as when they were collected, where, how, by
whom, and under what conditions?
Data Access:
Data access rules relate to both internal and external access. As a data steward you are required to
take into consideration things like FOIA, Privacy Act, and IT Security Issues that could impact your
data. Data Stewards should assess their data early in the data collection process to determine if
anything they are collecting is sensitive and might be restricted from access either inside or outside
the organization.
Data Stewards should coordinate with Privacy, FOIA and IT Security officials in their local or state
organizations.
Establish data access security requirements.
Ensure official agency records requirements are being met.
National Archives and Records Administration (NARA) rules regulate the disposal of all types of
records, including alphanumeric and spatial datasets.
Always involve your Records Manager/Administrator early in the data collection planning process.
Ensure data documentation is developed and maintained including FGDC metadata.
Metadata, which is defined as “data about data”, describes the content, quality, condition, and other
characteristics of data.
Metadata is to be collected from the beginning of the data collection process for both alphanumeric
and spatial data. [see Describe > Metadata (http://www.usgs.gov/datamanagement/describe/metadata.php)
for more information]
Participate in the data management team for your geographic area (national, state, local).
Data management is going on all around you. Teamwork is very important to assure that duplicate data
are not being collected. When you determine a need to collect data for a project, proposal, or decision
in your area, work with the team to identify existing data stores or data collection parameters.
Employees who have roles and responsibilities for data management need to work together.
Be active advocates of data management.
Endorse good data management practices, use them, and share them.

Knowledge/Skills & Abilities Required:
Knowledge of basic data management principles and concepts
Knowledge of how to create data standards, determine business data requirements, and business
rules
Management Responsibilities
Management includes Project Managers and Field Supervisors. Management responsibilities include:
Ensure resources are available for data management activities for their respective program areas.
Determine what data will be maintained, consistent with the objectives of the USGS.
Appoint and support data stewards for their areas of responsibility.
Be accountable for all aspects of data within their program or geographic area. Includes responsibility for
quality, accessibility, completeness, timeliness, accuracy, and standards.
Be accountable for integrity and quality of business data personally created/updated.
Provide oversight during development of projects to ensure the data needs and requirements are documented.
Ensure adherence to Bureau requirements for metadata and data standards.
[see Plan > Data Standards (http://www.usgs.gov/datamanagement/plan/datastandards.php)]
Specialist Responsibilities
Specialists include GIS Specialists, Resource Specialists, and Database/System Administrators.
Specialist responsibilities include:
Be aware of resource data requirements, standards, access rules, and training.
Work with data stewards to interpret business needs into applications and derive data requirements.
Implement State/Bureau data standards; and may participate in the development of standards.
[see Plan > Data Standards (http://www.usgs.gov/datamanagement/plan/datastandards.php)]
Facilitate educational opportunities for the treatment, application, and value of spatial data.
Create and maintain metadata to quality specifications.
Provide consistent interpretation and application of Bureau/State policies to their respective State Offices.
Manage databases containing spatial data.
Be accountable for integrity and quality of business data personally created/updated.
References
Chatfield, T., Selbach, R. February, 2011. Data Management for Data Stewards. Data Management
Training Workshop. Bureau of Land Management (BLM).
157. What is the easiest way to build a corporate-specific time dimension?
158. What are conformed dimensions? We always give date as a conformed dimension, but if it
has a different format for different countries, say YYMMDD for Italy and MM-DD-YYYY for
France, then is it not conformed?
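One way to look at question 158, offered as a hedged sketch rather than an answer from the source: if both country formats are parsed into the same canonical date key during loading, the stored date dimension is identical for both sources and only the presentation format differs. The format table SOURCE_FORMATS, the function to_date_key and the YYYYMMDD integer key are assumptions for illustration.

```python
from datetime import datetime

# Hypothetical source formats from the question: YYMMDD (Italy), MM-DD-YYYY (France).
SOURCE_FORMATS = {"IT": "%y%m%d", "FR": "%m-%d-%Y"}


def to_date_key(raw, country):
    """Parse a country-specific date string into one shared YYYYMMDD integer key."""
    parsed = datetime.strptime(raw, SOURCE_FORMATS[country])
    return int(parsed.strftime("%Y%m%d"))


italy_key = to_date_key("221015", "IT")
france_key = to_date_key("10-15-2022", "FR")
print(italy_key, france_key)  # 20221015 20221015
```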
Conceptual Data Model
During the Planning phase of the project, the conceptual data model is created to capture the high-
level data requirements for the project. Since the model captures the highlights of the client’s
information needs, it is the only model that effectively reflects the enterprise level.
Depending on the requirements, the enterprise-wide vision may need to be emphasized to help guide
the client in the development of an overall data warehousing strategy. Detail models that reflect the
project’s scope will be created during logical and physical data modeling. The conceptual data model
is the precursor to the logical data model; it is not tied to any particular solution or technology.
Entities, relationships, major attributes, and metadata across functional areas are included. During
successive releases, the conceptual data model should be validated and updated if necessary. An
enterprise should have only one conceptual data model.
Logical Data Model
During the design phase of the project, the logical data model is created for the scope of the complete
project. A portion of the conceptual data model will be fully attributed and completed as the logical
data model. The logical data model reflects the technology to be used. In today’s environment, this
typically means either a relational DBMS or a multidimensional tool. But if the client should be using
an older DBMS such as IMS or IDMS, the logical model will be quite different than if an RDBMS is to
be used. The logical data model reflects a logical data design that can be used by the developers on
the project. For an RDBMS, that means logical tables (views) and columns.
Physical Data Model
Like the logical data model, the physical data model is created during the design phase. This
modeling activity should reflect the scope of the specific release of the project. The model’s final
design will be highly dependent on the technical solution for the data warehouse. The purpose of this
model is to capture all the technical details required to produce the final tables, and
physical constructs such as indexes and table partitions. The logical data model will serve as a
blueprint to the project team while the physical data model is a blueprint for the DBAs. All the
functionality reflected in the logical data model should be preserved while creating the physical
data model. The generated table schemas will be identical to the physical data model.
http://www.allinterview.com/Interview-Questions/Data-Warehouse-General/page12.html
http://satishmsbi.blogspot.in/2011/10/differences-between-systemsoltpolapodsd.html
http://hussain-msbi.blogspot.in/search/label/SSIS
http://blog.stevienova.com/2008/11/22/ssis-slowly-changing-dimensions-with-checksum/
http://srikanthtechnologies.com/books/orabook/oraclebook.html
http://www.dwhinfo.com/Technical/DWHTechnicalMain.html
DATA WAREHOUSE CONCEPTS
A fundamental concept of a data warehouse is the distinction between data and information. Data is
composed of observable and recordable facts that are often found in operational or transactional
systems.
At Rutgers, these systems include the registrar’s data on students (widely known as the SRDB), human
resource and payroll databases, course scheduling data, and data on financial aid. In a data warehouse
environment, data only comes to have value to end-users when it is organized and presented as
information. Information is an integrated collection of facts and is used as the basis for
decision making. For example, an academic unit needs to have diachronic information about the extent of
instructional output of its different faculty members to gauge if it is becoming more or less reliant on
part-time faculty.
DATA WAREHOUSE DEFINITIONS
The data warehouse is that portion of an overall Architected Data Environment that serves as the single
integrated source of data for processing information. The data warehouse has specific characteristics that
include the following:
Subject-Oriented: Information is presented according to specific subjects or areas of interest, not
simply as computer files. Data is manipulated to provide information about a particular subject. For
example, the SRDB is not simply made accessible to end-users, but is provided structure and organized
according to the specific needs.
Integrated: A single source of information for and about understanding multiple areas of interest. The
data warehouse provides one-stop shopping and contains information about a variety of subjects. Thus
the OIRAP data warehouse has information on students, faculty and staff, instructional workload, and
student outcomes.
Non-Volatile: Stable information that doesn’t change each time an operational process is executed.
Information is consistent regardless of when the warehouse is accessed.
Time-Variant: Containing a history of the subject, as well as current information. Historical
information is an important component of a data warehouse.
Accessible: The primary purpose of a data warehouse is to provide readily accessible information to
end-users.
Process-Oriented: It is important to view data warehousing as a process for delivery of information.
The maintenance of a data warehouse is ongoing and iterative in nature.
Other Definitions
Data Warehouse: A data structure that is optimized for distribution. It collects and stores integrated
sets of historical data from multiple operational systems and feeds them to one or more data marts. It
may also provide end-user access to support enterprise views of data.
Data Mart: A data structure that is optimized for access. It is designed to facilitate end-user analysis of
data. It typically supports a single, analytic application used by a distinct set of workers.
Staging Area: Any data store that is designed primarily to receive data into a warehousing
environment.
Operational Data Store: A collection of data that addresses operational needs of various operational
units. It is not a component of a data warehousing architecture, but a solution to operational needs.
OLAP (On-Line Analytical Processing): A method by which multidimensional analysis occurs.
Multidimensional Analysis: The ability to manipulate information by a variety of relevant categories
or “dimensions” to facilitate analysis and understanding of the underlying data. It is also sometimes
referred to as “drilling-down”, “drilling-across” and “slicing and dicing”
Hypercube: A means of visually representing multidimensional data.
Star Schema: A means of aggregating data based on a set of known dimensions. It stores data
multidimensionally in a two dimensional Relational Database Management System (RDBMS), such as
Oracle.
Snowflake Schema: An extension of the star schema by means of applying additional dimensions to the
dimensions of a star schema in a relational environment.
Multidimensional Database: Also known as MDDB or MDDBS. A class of proprietary, non-relational
database management tools that store and manage data in a multidimensional manner, as opposed to the
two dimensions associated with traditional relational database management systems.
OLAP Tools: A set of software products that attempt to facilitate multidimensional analysis. Can
incorporate data acquisition, data access, data manipulation, or any combination thereof.
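To illustrate the Star Schema definition above, the sketch below builds one fact table and two dimension tables in SQLite (via Python's sqlite3 module) and aggregates the facts by two dimensions. All table names, column names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_product (product_sk INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date    (date_sk INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE fact_sales  (product_sk INTEGER, date_sk INTEGER, amount INTEGER);

    INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
    INSERT INTO dim_date    VALUES (20210101, 2021), (20220101, 2022);
    INSERT INTO fact_sales  VALUES (1, 20210101, 10), (1, 20220101, 20),
                                   (2, 20220101, 5),  (2, 20220101, 7);
    """
)

# The fact table sits in the middle; each dimension joins to it on its key.
totals = conn.execute(
    """
    SELECT p.category, d.year, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_sk = f.product_sk
    JOIN dim_date    AS d ON d.date_sk = f.date_sk
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
    """
).fetchall()
print(totals)  # [('books', 2021, 10), ('books', 2022, 20), ('games', 2022, 12)]
```

Grouping by different dimension columns over the same fact table is the relational counterpart of the "slicing and dicing" mentioned under Multidimensional Analysis.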
COMPARISON OF DATA WAREHOUSE AND OPERATIONAL DATA
HOW IS THE WAREHOUSE DIFFERENT?
The data warehouse is distinctly different from the operational data used and maintained by day-to-day
operational systems. Data warehousing is not simply an “access wrapper” for operational data, where
data is simply “dumped” into tables for direct access. Among the differences:
OPERATIONAL DATA | DW DATA
application oriented | subject oriented
detailed | summarized, otherwise refined
accurate, as of the moment of access | represents values over time, snapshots
serves the clerical community | serves the managerial community
can be updated | is not updated
run repetitively and nonreflectively | run heuristically
requirements for processing understood before initial development | requirements for processing not completely understood before development
compatible with the Software Development Life Cycle | completely different life cycle
performance sensitive (immediate response required when entering a transaction) | performance relaxed (immediacy not required)
accessed a unit at a time (limited number of data elements for a single record) | accessed a set at a time (many records of many data elements)
transaction driven | analysis driven
control of update a major concern in terms of ownership | control of update no issue
high availability | relaxed availability
managed in its entirety | managed by subsets
nonredundancy | redundancy is a fact of life
static structure; variable contents | flexible structure
small amount of data used in a process | large amount of data used in a process
The Data Warehousing Process – Part 1
Determine Informational Requirements
• Identify and analyze existing informational capabilities.
• Identify from key users the significant business questions and key metrics that the target user
group regards as their most important requirements for information.
• Decompose these metrics into their component parts with specific definitions.
• Map the component parts to the informational model and systems of record.
The Data Warehousing Process – Part 2
Evolutionary and Iterative Development Process
When you begin to develop your first data warehouse increment, the architecture is new and fresh. With
the second and subsequent increments, the following is true:

• Start with one subject area (or subset or superset) and one target user group.
• Continue and add subject areas, user groups and informational capabilities to the architecture
based on the organization’s requirements for information, not technology.
• Improvements are made from what was learned from previous increments.
• Improvements are made from what was learned about warehouse operation and support.
• The technical environment may have changed.
• Results are seen very quickly after each iteration.
• The end user requirements are refined after each iteration.
http://oirap.rutgers.edu/dwbasics.pdf