Data Warehousing
dwhbeginners.wordpress.com
Different people have different definitions for a data warehouse. The most popular definition came
from Bill Inmon, who provided the following:
Subject-Oriented: A data warehouse can be used to analyze a particular subject area. For example,
“sales” can be a particular subject.
Integrated: A data warehouse integrates data from multiple data sources. For example, source A and
source B may have different ways of identifying a product, but in a data warehouse, there will be
only a single way of identifying a product.
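As a rough illustration of that integration step (all identifiers below are invented), a conformed product key can map each source system's native identifier onto a single warehouse key:

```python
# Hypothetical sketch: source A identifies products by SKU, source B by a
# numeric code; the warehouse conforms both to one product key.
source_a = [{"sku": "WID-001", "units": 10}, {"sku": "WID-002", "units": 5}]
source_b = [{"prod_code": 9001, "units": 7}, {"prod_code": 9002, "units": 3}]

# Conformed mapping, normally maintained in the warehouse's product dimension.
conformed = {"WID-001": "P1", "WID-002": "P2", 9001: "P1", 9002: "P2"}

def integrate(rows, id_field):
    """Rewrite each source row to use the single warehouse product key."""
    return [{"product_key": conformed[r[id_field]], "units": r["units"]}
            for r in rows]

facts = integrate(source_a, "sku") + integrate(source_b, "prod_code")
# Both sources now identify the same product in the same way.
```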
Time-Variant: Historical data is kept in a data warehouse. For example, one can retrieve data from 3
months, 6 months, 12 months, or even older from a data warehouse. This contrasts with a
transaction system, where often only the most recent data is kept. For example, a transaction system
may hold the most recent address of a customer, whereas a data warehouse can hold all addresses
associated with a customer.
Non-volatile: Once data is in the data warehouse, it will not change. So, historical data in a data
warehouse should never be altered.
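The time-variant and non-volatile properties can be sketched together with a slowly-changing-dimension-style table (a hypothetical example; real warehouses implement this in the database, not in Python lists):

```python
from datetime import date

# Hypothetical sketch: a transaction system keeps only the latest customer
# address, while the warehouse appends every version with effective dates;
# historical rows are retained rather than overwritten.
customer_addresses = []  # warehouse table: one row per address version

def load_address(customer_id, address, effective):
    """Close the current version (if any) and append the new one."""
    for row in customer_addresses:
        if row["customer_id"] == customer_id and row["end_date"] is None:
            row["end_date"] = effective  # close out, but keep the history
    customer_addresses.append(
        {"customer_id": customer_id, "address": address,
         "start_date": effective, "end_date": None})

load_address(42, "1 Old Street", date(2020, 1, 1))
load_address(42, "2 New Avenue", date(2022, 6, 1))
# The warehouse now holds both addresses; an OLTP system would hold only the latest.
```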
Ralph Kimball offered a shorter definition: a data warehouse is a copy of transaction data specifically structured for query and analysis.
This is a functional view of a data warehouse. Kimball did not address how the data warehouse is
built, as Inmon did; rather, he focused on the functionality of a data warehouse.
Or
https://dwhbeginners.wordpress.com/interview-questions-and-answers/dwh-interview-questions/ 1/57
15.10.2022, 11:54 DWH Interview Questions – Data Warehousing
It is a central repository of data created by integrating data from one or more disparate
sources. Data warehouses store current as well as historical data and are used for creating trending
reports for senior management, such as annual and quarterly comparisons for decision
support and business intelligence.
Or
A lot of times people question the value of data warehousing. Why do we spend a year building a
data warehouse? We can't wait that long. Let's just install QlikView/Spotfire, feed it directly from the
transaction system, and we have BI!
Absolutely! You can. You can buy BO, MicroStrategy, QlikView, Spotfire or any BI tool you like, and
then report straight from the transaction system. Or, if you fancy, you can create a cube first (SSAS,
Cognos or Hyperion), then install an appropriate client tool (Tableau, Strategy Companion, etc.). Of
course you can. And this is the best way to learn about Data Warehousing: by not doing it.
The whole year spent building a data warehouse is essentially for providing a quality data source. A
data warehouse has the following characteristics:
a) Integrated
b) Consistent
c) Performant
A data warehouse integrates data from multiple sources correctly. This integration doesn’t happen
overnight. A Business Analyst spent weeks analysing the sources and wrote down a specification of
how the data should be integrated. A Data Architect looked at that spec and designed a performant
star schema to host the data. An ETL Architect looked at the star schema design and wrote an ETL
population spec. An ETL developer studied the ETL spec and built the workflows. And finally, a
tester verified the data.
That takes months, but as a result, we have an integrated, consistent, clean data source containing
correct and valid data. And it is performant. Your query doesn't need to join 15 tables in a horrible
way. All the data is in a centralised place, ready for you to query.
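To make the payoff concrete, here is a toy star schema and query (table names and figures are invented), sketched with Python's built-in sqlite3:

```python
import sqlite3

# Hypothetical star schema: one fact table joined to two dimension tables.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE fact_sales  (product_key INTEGER, date_key INTEGER, amount REAL);

    INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO dim_date    VALUES (20230101, 2023), (20240101, 2024);
    INSERT INTO fact_sales  VALUES (1, 20230101, 100.0), (1, 20240101, 150.0),
                                   (2, 20240101, 75.0);
""")

# One simple star join instead of a sprawling 15-table OLTP join.
rows = con.execute("""
    SELECT p.name, d.year, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date    d ON d.date_key    = f.date_key
    GROUP BY p.name, d.year
    ORDER BY p.name, d.year
""").fetchall()
```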
But a data warehouse also costs money — big money. The problem is that when big money is involved, it's
tough to justify spending it on any project, especially when you can't really quantify the benefits
upfront. When it comes to a data warehouse, it's not easy to know what the benefits are until it's up
and running. According to BI-Insider.com (http://bi-insider.com/portfolio/benefits-of-a-data-warehouse/), here are the key benefits of a data warehouse once it's launched.
By providing data from various sources, managers and executives will no longer need to make
business decisions based on limited data or their gut. In addition, “data warehouses and related BI
can be applied directly to business processes including marketing segmentation, inventory
management, financial management, and sales.”
Since business users can quickly access critical data from a number of sources—all in one place—they
can rapidly make informed decisions on key initiatives. They won’t waste precious time retrieving
data from multiple sources.
Not only that, but business execs can query the data themselves with little or no support from IT —
saving more time and more money. That means the business users won't have to wait until IT gets
around to generating the reports, and those hardworking folks in IT can do what they do best — keep
the business running.
A data warehouse stores large amounts of historical data so you can analyze different time periods
and trends in order to make future predictions. Such data typically cannot be stored in a transactional
database or used to generate reports from a transactional system.
Finally, the pièce de résistance — return on investment. Companies that have implemented data
warehouses and complementary BI systems have generated more revenue
(http://searchsqlserver.techtarget.com/tip/The-IDC-data-warehousing-ROI-study-An-analysis) and
saved more money than companies that haven't invested in BI systems and data warehouses.
And that should be reason enough for senior management to jump on the data warehouse
bandwagon.
The contents of the data warehouse must be understandable and be intuitive and obvious to the
business user. The contents of the data warehouse need to be labeled meaningfully. The tools that
access the data warehouse must be simple and easy to use. They also must return query results to the
user with minimal wait times.
Consistent information means high-quality information. It means that all the data is accounted for
and complete. Consistency also implies that common definitions for the contents of the data
warehouse are available for users.
We simply can’t avoid change. User needs, business conditions, data, and technology are all subject
to the shifting sands of time. The data warehouse must be designed to handle this inevitable change.
The data warehouse must effectively control access to the organization’s confidential information.
The data warehouse must have the right data in it to support decision making.
Configuration and change management is probably the largest single issue affecting data
warehouse implementation and maintenance. It operates at every level of the organization and is often
the "elephant in the room" — we all know it is there, and we know that we don't do enough about it,
but nobody talks about it.
Data quality is often considered a major issue because of the garbage-in, garbage-out principle. Most
data warehouses faithfully reproduce any data quality issues in the source systems, even amplifying
some of them.
According to Peter Weill, Director of the MIT Center for Information Systems Research, Enterprise
Architecture is the organizing logic for business processes and IT infrastructure, reflecting the
integration and standardization requirements of the firm's operating model.
The on-going cost of running a data warehouse, especially in times of economic hardship, is often
questioned. It is therefore common to look for ways to improve the return on investment. This can be
done in one of two ways: by gaining more financial benefit from the output, or by reducing the cost to
manage and maintain the system.
Or
Carving out a data warehouse can look like a straightforward task on the surface. The path of least
resistance would seem to be to replicate the parent’s environment as-is in the carve-out organization,
using the exact same software and architecture. However, this is not always going to be the best
solution. In most cases, the new organization is much smaller than the parent organization and will
have significantly fewer capabilities to support the data warehouse than the parent. In addition,
changes within both the parent and the carve-out organizations during the transition period can lead
to an implementation in the new organization that needs to be different than the parent’s
implementation.
Change can be a significant component during carve-outs. The capabilities and needs of the carved
out organization will not always match up with those of the parent. Therefore, the implementation
needs to adapt to match up with those needs and capabilities. Here are a few major challenges that
you need to be prepared to tackle:
High software licensing costs: Large organizations with mature BI platforms usually have expensive
database, ETL and reporting tools to power their data warehouses and reporting systems. However,
chances are that the budget of the new organization is much smaller than the parent’s budget and the
licensing and support cost of those tools can exceed what a smaller organization can afford. With this
in mind, make sure to analyze what the carved out organization needs to do business and whether
any lower-cost alternatives are feasible. The market for BI tools is more mature than it was just a few
years ago, and lower-cost alternatives can prove to be just as capable as the tools provided by the
traditional high-cost vendors. While moving to a new set of tools may lengthen the time needed for
the carve-out effort or increase the amount of resources needed to perform the implementation, the
effort can pay itself off over the long run in lower licensing and ongoing support costs.
Complex architecture: This can be a great opportunity to rework the architecture without significant
additional impact to the business or the IT staff. Usually the parent company does business in
vertical or geographic markets that the new company will not be participating in, and complexity
related to this can be eliminated. In addition, the parent may have gone through acquisitions and
mergers that resulted in additional complexity while trying to adapt an acquisition target’s data into
an existing data warehouse. Simplifying the architecture where possible can ease the transition of
responsibility to production support staff, as the IT staff may be much more limited in skills and
manpower than the parent. A simplified architecture can reduce the timeline of the implementation
as well.
Code changes at the parent organization: While the carve-out is occurring, the parent organization
will continue conducting its day-to-day operations. This includes their IT operations, as the parent’s
IT staff continues maintenance of code in the data warehouse. This can lead to issues during building
and testing as developers attempt to hit a moving target and users attempt to validate results in the
new environment against the parent’s environment. Make sure to have a process in place to get logic
and code changes communicated to the new organization, as well as a change control process to
prioritize logic and code changes for evaluation as they come in.
As is usually the case for implementation projects, the right amount of planning will help ensure a
successful carve-out. Well-calculated changes in the right places can help ensure a solution that
meets the needs and capabilities of the new organization, and the result will be a data warehouse that serves the new organization well.
Different data warehousing systems have different structures. Some may have an ODS (operational
data store), while some may have multiple data marts. Some may have a small number of data
sources, while some may have dozens of data sources. In view of this, it is far more reasonable to
present the different layers of data warehouse architecture rather than discussing the specifics of any
one system.
Data Source Layer
Data Extraction Layer
Staging Area
ETL Layer
Data Storage Layer
Data Logic Layer
Data Presentation Layer
Metadata Layer
System Operations Layer
The picture below shows the relationships among the different components of the data warehouse
architecture:
(https://dwhbeginners.files.wordpress.com/2013/09/data-warehouse-architecture1.jpg)
Data Source Layer
This represents the different data sources that feed data into the data warehouse. The data source can
be of any format — a plain text file, relational database, other types of database, Excel file, etc. can all
act as a data source.
Operations data — such as sales data, HR data, product data, inventory data, marketing data, and
systems data.
All these data sources together form the Data Source Layer.
Data Extraction Layer
Data gets pulled from the data source into the data warehouse system. There is likely some minimal
data cleansing, but there is unlikely to be any major data transformation.
Staging Area
This is where data sits prior to being scrubbed and transformed into a data warehouse / data mart.
Having one common area makes it easier for subsequent data processing / integration.
ETL Layer
This is where data gains its “intelligence”, as logic is applied to transform the data from a
transactional nature to an analytical nature. This layer is also where data cleansing happens. The ETL
design phase (http://www.1keydata.com/datawarehousing/etl.html) is often the most time-
consuming phase in a data warehousing project, and an ETL tool
(http://www.1keydata.com/datawarehousing/tooletl.html) is often used in this layer.
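A minimal sketch of what this layer does (field names and rules are invented; real ETL tools route rejects to error tables rather than lists):

```python
# Hypothetical ETL sketch: extract raw transactional rows, apply cleansing
# and transformation logic, and keep rejected rows for error handling.
raw_rows = [
    {"customer": "  Alice ", "amount": "100.50", "currency": "usd"},
    {"customer": "Bob", "amount": "bad-value", "currency": "USD"},
]

def transform(row):
    """Cleanse one row; return None to reject it."""
    try:
        amount = float(row["amount"])
    except ValueError:
        return None  # an ETL tool would route this to a reject file
    return {"customer": row["customer"].strip(),
            "amount": round(amount, 2),
            "currency": row["currency"].upper()}

loaded  = [t for r in raw_rows if (t := transform(r)) is not None]
rejects = [r for r in raw_rows if transform(r) is None]
```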
Data Storage Layer
This is where the transformed and cleansed data sit. Based on scope and functionality, 3 types of
entities can be found here: data warehouse, data mart, and operational data store (ODS). In any given
system, you may have just one of the three, two of the three, or all three types.
Data Logic Layer
This is where business rules are stored. Business rules stored here do not affect the underlying data
transformation rules, but do affect what the report looks like.
Data Presentation Layer
This refers to the information that reaches the users. This can be in the form of a tabular / graphical
report in a browser, an emailed report that gets automatically generated and sent every day, or an
alert that warns users of exceptions, among others. Usually an OLAP tool
(http://www.1keydata.com/datawarehousing/toololap.html) and/or a reporting tool
(http://www.1keydata.com/datawarehousing/toolreporting.html) is used in this layer.
Metadata Layer
This is where information about the data stored in the data warehouse system is stored. A logical data
model would be an example of something that's in the metadata layer. A metadata tool
(http://www.1keydata.com/datawarehousing/toolmetadata.html) is often used to manage
metadata.
System Operations Layer
This layer includes information on how the data warehouse system operates, such as ETL job status,
system performance, and user access history.
7. Explain the process flow in a data warehouse?
8. Why should you put your data warehouse on a different system than your OLTP system?
An OLTP system is basically "data oriented" (ER model) and not "subject oriented" (dimensional
model). That is why we design a separate, subject-oriented OLAP system.
Moreover, if a complex query is fired on an OLTP system, it will cause a heavy overhead on the OLTP
server that directly affects the day-to-day business.
Or
The loading of a warehouse will likely consume a lot of machine resources. Additionally, users may
create queries or reports that are very resource intensive because of the potentially large amount of
data available. Such loads and resource needs will conflict with the needs of the OLTP systems for
resources and will negatively impact those production systems.
9. What are the steps to build the data warehouse? (http://www.questions-interviews.com/data-warehouse/data-warehousing-2.aspx#What_are_the_steps_to_build_the_data_warehouse)
This is also one of the most crucial data warehouse interview questions when learning about data
warehouses. The standard procedure used to build a data warehouse is very similar to that of the
majority of database projects. Below are the common steps:
Development of the data warehouse architecture, which includes the ODS or Operational Data Store
Offline Operational Databases – Data warehouses in this initial stage are developed by simply
copying the database of an operational system to an offline server, where the processing load of
reporting does not impact the operational system's performance.
Offline Data Warehouse – Data warehouses in this stage of evolution are updated on a regular time
cycle (usually daily, weekly or monthly) from the operational systems, and the data is stored in an
integrated, reporting-oriented data structure.
Real Time Data Warehouse – Data warehouses at this stage are updated on a transaction or event
basis, every time an operational system performs a transaction (e.g. an order or a delivery or a
booking etc.)
Integrated Data Warehouse – Data warehouses at this stage are used to generate activity or
transactions that are passed back into the operational systems for use in the daily activity of the
organization.
Hierarchies are logical structures that use ordered levels as a means of organizing data. A hierarchy
can be used to define data aggregation. For example, in a time dimension, a hierarchy might
aggregate data from the month level to the quarter level to the year level. A hierarchy can also be
used to define a navigational drill path and to establish a family structure.
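The month-to-quarter-to-year roll-up described above can be sketched as follows (the figures are invented for illustration):

```python
# Hypothetical time hierarchy: aggregate monthly sales up to quarters, then years.
monthly_sales = {"2023-01": 10, "2023-02": 12, "2023-03": 8,
                 "2023-04": 20, "2023-05": 15, "2023-06": 5}

def month_to_quarter(month_key):
    """Map a 'YYYY-MM' key to its parent quarter, e.g. '2023-02' -> '2023-Q1'."""
    year, month = month_key.split("-")
    return f"{year}-Q{(int(month) - 1) // 3 + 1}"

def roll_up(data, parent_of):
    """Aggregate child-level values into their parent level."""
    totals = {}
    for child, value in data.items():
        parent = parent_of(child)
        totals[parent] = totals.get(parent, 0) + value
    return totals

quarterly = roll_up(monthly_sales, month_to_quarter)       # month -> quarter
yearly    = roll_up(quarterly, lambda q: q.split("-")[0])  # quarter -> year
```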
Within a hierarchy, each level is logically connected to the levels above and below it. Data values at
lower levels aggregate into the data values at higher levels. A dimension can be composed of more
than one hierarchy. For example, in the product dimension, there might be two hierarchies–one for
product categories and one for product suppliers.
Dimension hierarchies also group levels from general to granular. Query tools use hierarchies to
enable you to drill down into your data to view different levels of granularity. This is one of the key
benefits of a data warehouse.
When designing hierarchies, you must consider the relationships in business structures, e.g. a
divisional multilevel sales organization.
Hierarchies impose a family structure on dimension values. For a particular level value, a value at the
next higher level is its parent, and values at the next lower level are its children. These familial
relationships enable analysts to access data quickly.
A concept hierarchy that is a total (or) partial order among attributes in a database schema is called a
schema hierarchy.
The following pairs of systems are typically compared: OLTP vs. OLAP, ODS vs. DWH, ODS vs. OLTP, OLTP vs. DWH, OLTP vs. DSS, and DWH vs. DM.
15. What is a data mart? And what is the difference between a data mart and a data warehouse?
A data mart is a simple form of a data warehouse that is focused on a single subject (or functional
area), such as Sales or Finance or Marketing. Data marts are often built and controlled by a single
department within an organization.
At this stage, the data warehouse gets updated every time a transaction occurs. The transactions
performed during this time are passed back to the operational systems, which record the
transactions.
A virtual or point-to-point data warehousing strategy means that end-users are allowed to get at
operational databases directly, using whatever tools are enabled on the "data access network".
Or
A virtual data warehouse provides a compact view of the data inventory. It contains metadata and
uses middleware to build connections to different data sources. It can be fast, as it allows users
to filter the most important pieces of data from different legacy applications.
Data warehouses are updated on a transaction or event basis, every time an operational system
performs a transaction.
Or
Real-time data warehousing is a combination of two things: 1) real-time activity and 2) data
warehousing. Real-time activity is activity that is happening right now. The activity could be
anything such as the sale of widgets. Once the activity is complete, there is data about it.
Data warehousing captures business activity data. Real-time data warehousing captures business
activity data as it occurs. As soon as the business activity is complete and there is data about it, the
completed activity data flows into the data warehouse and becomes available instantly. In other
words, real-time data warehousing is a framework for deriving information from data as the data
becomes available.
DEFINITION
An active data warehouse (ADW) is a data warehouse implementation that supports near-time or
near-real-time decision making. It features event-driven actions triggered by a continuous
stream of queries (generated by people or applications) against a broad, deep, granular set of
enterprise data.
Other Definitions
An active data warehouse aims to capture data continuously and deliver real-time data. It
provides a single integrated view of a customer across multiple business lines and is associated with
Business Intelligence systems.
Wingspan Technology
What is an ADW? An ADW is a relational data warehouse environment that supports real-time
updates, fast response times, aggregated data queries with detailed drill-down capabilities and
dynamic mutability to support changing business needs. It places more control and power into the
hands of the decision makers who use the system most and know the business best.
Teradata
An active data warehouse (ADW) is a traditional data warehouse extended to provide operational
intelligence based on historical data combined with today’s up-to-date data. The ADW supports
mixed workloads from an enterprise data warehouse that serves as a single source of truth for
decision making with predictable service levels for query response times, near real-time data
freshness, and mission critical data availability. Moreover, the ADW integrates into the overall
enterprise architecture to deliver decision services throughout an organization.
VLDB is an abbreviation of Very Large Database. A one-terabyte database would normally be
considered a VLDB. Typically, these are decision support systems or transaction processing
applications serving large numbers of users.
Data Mining is the process of analyzing data from different perspectives and summarizing it into
useful information.
An ODS is also a small DWH which helps analysts analyze the business. It holds data for a shorter
period, generally around 1 to 6 months. As in a DWH, surrogate keys are generated here, and error
and reject handling is done.
An Operational Data Store is used by many organizations for analysis purposes as well as for data backup
and data recovery. Data stored in an ODS is usually in normalized form, as in transactional DBs,
while in a DWH data will be denormalized. An ODS is essentially a replica of transactional databases:
a collection of data from two or more business functions.
Or
An ODS is used to support the tactical decision-making process for the enterprise. It is the central point of data
integration for business management, delivering a common view of enterprise data.
ODS means Operational Data Store, which supports operational monitoring; its data is volatile, current,
detailed, subject-oriented and integrated.
ODS is an abbreviation of Operational Data Store: a database structure that is a repository for near real-
time operational data rather than long-term trend data. The ODS may further become the enterprise
shared operational database, allowing operational systems that are being re-engineered to use the
ODS as their operational database.
Or
A collection of operational or base data that is extracted from operational databases and standardized,
cleansed, consolidated, transformed, and loaded into an enterprise data architecture. An ODS is used to
support data mining of operational data, or as the store for base data that is summarized for a data
warehouse. The ODS may also be used to audit the data warehouse to assure summarized and
derived data is calculated properly. The ODS may further become the enterprise shared operational
database, allowing operational systems that are being re-engineered to use the ODS as their
operational database.
Business Intelligence (BI) – technology infrastructure for gaining maximum information from
available data for the purpose of improving business processes (http://datawarehouse4u.info/What-
is-Business-Intelligence.html). Typical BI infrastructure components are software solutions
for gathering, cleansing, integrating, analyzing and sharing data. Business Intelligence
(http://datawarehouse4u.info/What-is-Business-Intelligence.html) produces analyses and provides
believable information to help make effective, high-quality business decisions.
* DSS – Decision Support Systems: Business Intelligence systems (http://datawarehouse4u.info/What-is-Business-Intelligence.html) based on Data Warehouse technology. A Data Warehouse (DW) gathers information from across the enterprise.
Data warehousing helps you store the data, while business intelligence helps you use the data
for decision making, forecasting, etc.
Data warehousing, using ETL jobs, stores data in a meaningful form. However, in order to query
that data for reporting and forecasting, business intelligence tools were born.
The management of different aspects such as the development, implementation and operation of a data
warehouse is dealt with by data warehousing. It also manages metadata, data cleansing, data
transformation, data acquisition, persistence management, and archiving of data.
In business intelligence, the organization analyzes measurements of aspects of the business such as
sales, marketing, efficiency of operations, profitability, and market penetration within customer
groups. The typical usage of business intelligence encompasses OLAP, visualization of data,
data mining and reporting tools.
27. What is the difference between dependent data warehouse and independent data warehouse?
A dependent data warehouse stores the data in a central data warehouse. On the other hand, an
independent data warehouse does not make use of a central data warehouse.
Or
Dependent data warehouses are built on an ODS, whereas independent data warehouses do not depend
on an ODS.
Dependent data marts are sourced directly from enterprise data warehouses.
Independent data marts capture data from one or more operational systems, external
information providers, or data generated locally within a particular department or geographic area.
XMLA is XML for Analysis, which can be considered a standard for accessing data in OLAP, data
mining or other data sources on the internet. It is based on the Simple Object Access Protocol (SOAP).
XMLA uses the Discover and Execute methods: Discover fetches information from the data source,
while Execute allows applications to run commands against the data sources.
XMLA stands for XML for Analysis. It is an industry standard for accessing data in analytical
systems, such as OLAP. XMLA is based on XML, SOAP and HTTP.
30. What is the difference between OLAP, ROLAP, MOLAP and HOLAP?
Cubes are logical representation of multidimensional data. The edge of the cube contains dimension
members and the body of the cube contains data values.
Or
Cubes are data processing units composed of fact tables and dimensions from the data warehouse.
They provide multidimensional views of data, querying and analytical capabilities to clients.
Or
Multidimensional data is logically represented by cubes in data warehousing. The dimensions and
the data are represented by the edges and the body of the cube, respectively. OLAP environments view
the data in the form of a hierarchical cube. A cube typically includes the aggregations that are needed
for business intelligence queries.
A linked cube is one in which a subset of the data can be analysed in great detail. The linking ensures that
the data in the cubes remain consistent.
A set of similar cubes built by Transformer is known as a cube group. Each cube group is related to a
single level in one dimension of the model. Cube groups are generally used to create smaller cubes
that are based on the data at that level of the dimension.
A data cube is a multi-dimensional structure: a data abstraction used to view aggregated data
from a number of perspectives. The dimension to be aggregated is known as the 'measure' attribute,
while the remaining dimensions are known as the 'feature' attributes.
Data is viewed on a cube in a multidimensional manner. The aggregated and summarized facts of
variables or attributes can be viewed. This is the requirement where OLAP plays a role.
Data cubes are commonly used for easy interpretation of data. They are used to represent data along
dimensions as measures of business interest. Each dimension of the cube represents some
attribute of the database, e.g. sales per day, month or year.
Or
A data cube is a multi-dimensional structure: a data abstraction that allows one to view aggregated
data from a number of perspectives. Conceptually, the cube consists of a core or base cuboid,
surrounded by a collection of sub-cubes/cuboids that represent the aggregation of the base cuboid
along one or more dimensions. We refer to the dimension to be aggregated as the measure attribute,
while the remaining dimensions are known as the feature attributes.
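The base cuboid and its lattice of aggregated cuboids can be sketched in a few lines of Python. The sales rows and dimension names below are hypothetical, chosen only to illustrate the idea:

```python
from itertools import combinations
from collections import defaultdict

# Hypothetical base cuboid: (product, store, month) -> units sold.
base = [
    ("bread", "s1", "jan", 10),
    ("bread", "s2", "jan", 5),
    ("milk",  "s1", "feb", 8),
]
dims = ("product", "store", "month")

def cuboid(rows, keep):
    """Aggregate the base cuboid along every dimension NOT in `keep`."""
    idx = [dims.index(d) for d in keep]
    agg = defaultdict(int)
    for *coords, measure in rows:
        agg[tuple(coords[i] for i in idx)] += measure  # sum the measure attribute
    return dict(agg)

# All 2^3 cuboids, from the apex (no dimensions kept) up to the base (all kept).
lattice = {keep: cuboid(base, keep)
           for r in range(len(dims) + 1)
           for keep in combinations(dims, r)}

print(lattice[()])            # apex cuboid: the grand total
print(lattice[("product",)])  # roll-up to the product level
```

Each entry of `lattice` is one cuboid; the apex holds the grand total, and the base cuboid keeps all three feature attributes.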
These are OLAP cubes created by clients, end users or third-party applications accessing a data
warehouse, relational database or OLAP cube through the Microsoft® PivotTable® Service. For
example, Microsoft® Excel™ is a very popular client for creating offline local OLAP cubes from
relational databases for multidimensional analysis. These cubes have to be maintained and managed
by the end users, who have to refresh their data manually.
These are combinations of one or more real cubes and require no disk space to store them. They store
only the definitions and not the data of the referenced source cubes. They are similar to views in
relational databases.
Or
A virtual or point-to-point data warehousing strategy means that end-users are allowed to get at
operational databases directly using whatever tools are enabled to the “data access network”.
37. What is an OLAP cube?
An OLAP cube will connect to a data source to read and process the raw data to perform
aggregations and calculations for its associated measures. Cubes are the core components of OLAP
systems. They aggregate facts from every level in a dimension provided in a schema. For example,
they could take data about products, units sold and sales value, and then add them up by month, by
store, by month and store and all other possible combinations. They’re called cubes because the end
data structure resembles a cube.
ROLAP (Relational OLAP) – Users see their data organized in cubes and dimensions, but the data is
really stored in an RDBMS, so query performance is slower. It is a storage mode that uses tables in a
relational database to store multidimensional structures.
MOLAP (Multidimensional OLAP) – Users see their data organized in cubes and dimensions, and
the data is really stored in an MDBMS. Query performance is fast.
HOLAP (Hybrid OLAP) – A combination of ROLAP and MOLAP. In this mode one will find queries
on aggregated data as well as on detailed data.
MOLAP Cubes: MOLAP stands for Multidimensional OLAP. In MOLAP cubes the data aggregations
and a copy of the fact data are stored in a multidimensional structure on the Analysis Server
computer. This is best when extra storage space is available on the Analysis Server computer and the
best query performance is desired. MOLAP local cubes contain all the necessary data for calculating
aggregates and can be used offline. MOLAP cubes provide the fastest query response time and
performance but require additional storage space for the extra copy of data from the fact table.
ROLAP Cubes: stands for Relational OLAP. In ROLAP cubes a copy of data from the fact table is not
made and the data aggregates are stored in tables in the source relational database. A ROLAP cube is
best when there is limited space on the Analysis Server and query performance is not very important.
ROLAP local cubes contain the dimensions and cube definitions but aggregates are calculated when
they are needed. ROLAP cubes require less storage space than MOLAP and HOLAP cubes.
HOLAP Cubes: HOLAP stands for Hybrid OLAP. A HOLAP cube combines ROLAP and MOLAP
cube characteristics. It does not create a copy of the source data; however, data aggregations are
stored in a multidimensional structure on the Analysis Server computer. HOLAP cubes are best
when storage space is limited but faster query responses are needed.
This is the primary component that connects clients to the Microsoft® SQL Server™ 2000 Analysis
Server. It also provides the capability for clients to create local offline cubes using it as an OLAP
server. PivotTable® Service does not have a user interface; the clients using its services have to
provide their own user interface.
43. What is the difference between the Bill Inmon and Ralph Kimball approaches to Data Warehouse
architecture?
Bill Inmon Approach: According to Bill Inmon, a data warehouse needs to fulfil the needs of all
categories of users. In an organization there are different types of users, such as:
· Marketing
· Operations
Each department has its own way of interpreting data, so the data warehouse should be able to
answer each department's queries. This can be achieved by designing tables in 3NF. According to
him, data in the data warehouse should be in 3NF and at the lowest granularity level. The data
should be accessible at detailed atomic levels by drilling down, or at summarized levels by drilling
up. He stressed that data should be organized into subject-oriented, integrated, non-volatile and
time-variant structures. According to him, an organization has one data warehouse, and data marts
source their information from the data warehouse. The Inmon approach is also called the top-down
approach.
In this methodology, data is brought into the staging area from OLTP systems or an ODS
(Operational Data Store) and then summarized and aggregated. After this process, data marts source
their data from the data warehouse and apply a new set of transformations and aggregations
according to their needs.
Key points:
1. Data should be organized into subject-oriented, integrated, non-volatile and time-variant
structures.
Advantages:
1. Easy to maintain
2. Well integrated
Disadvantages:
1. Difficult to implement
Kimball views the data warehouse as a combination of data marts connected to a data warehouse bus
structure. Data marts are focused on delivering the business objectives of different departments, and
the data warehouse bus consists of conformed dimensions and measures defined for the whole
organization. Users can query all data marts together using conformed dimensions.
In this approach the data warehouse is not a physical storage of the data as in the Inmon approach; it
is virtual, a combination of data marts, each having a star schema design.
Kimball approach characteristics:
1. Bottom-up approach
2. Fast to build
Conclusion: In reality there is no right or wrong between these two approaches; the methodology
actually implemented is usually a combination of both.
There are two major design methodologies followed in data warehousing: Ralph Kimball's and Bill
Inmon's. We will discuss both of these in detail.
Kimball: Build business-process-oriented small data marts which are joined to each other using
common dimensions between business processes. This is known as the bottom-up approach. Data
marts are built following the dimensional modelling approach.
Inmon: One centralized data warehouse which acts as the enterprise-wide data warehouse, with data
marts then built as needed for specific departments or processes. This is known as the top-down
approach. The central data warehouse follows the ER modelling approach.
Data modeling is representing the real-world set of data structures or entities, and their
relationships, in data models required for a database. Data modelling comes in various types:
Data modeling aims to identify all entities that hold data, and then defines the relationships between
these entities. Data models can be conceptual, logical or physical. Conceptual models are typically
used to explore high-level business concepts with stakeholders. Logical models are used to explore
domain concepts, while physical models are used to explore database design.
Data mining is used to examine or explore the data using queries, which can be fired on the data
warehouse. Data mining helps in reporting, planning strategies, finding meaningful patterns, etc.;
it can be used to convert a large amount of data into a sensible form.
A Dimensional Model is a database structure that is optimized for online queries and data
warehousing tools. It is composed of "fact" and "dimension" tables.
A “fact” is a numeric value that a business wishes to count or sum. A dimension table stores
attributes, or dimensions, that describe the objects in a fact table.
Dimensional Models are designed for reading, summarizing and analyzing numeric information,
whereas Relational Models are optimized for adding and maintaining data using real-time
operational systems
Or
The dimensional model consists of dimension and fact tables. Fact tables store different transactional
measurements and the foreign keys from the dimension tables that qualify the data. The goal of the
dimensional model is not to achieve a high degree of normalization but to facilitate easy and faster
data retrieval.
Ralph Kimball is one of the strongest proponents of this very popular data modeling technique which
is often used in many enterprise level data warehouses.
In the dimensional approach, transaction data is partitioned into either "facts", which are generally
numeric transaction data, or "dimensions", which are the reference information that gives context to
the facts.
A sales transaction can be broken up into facts such as the number of products ordered and the price
paid for the products, and into dimensions such as order date, customer name, product number,
order ship-to and bill-to locations, and salesperson responsible for receiving the order.
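As a minimal sketch of such a dimensional model (all table and column names are illustrative, and SQLite is used only for brevity):

```python
import sqlite3

# Hypothetical star schema for the sales example: one fact table,
# two dimension tables, with facts summed and dimensions giving context.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, order_date TEXT);
CREATE TABLE fact_sales  (                  -- facts + foreign keys to dimensions
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key    INTEGER REFERENCES dim_date(date_key),
    units_sold  INTEGER,                    -- additive fact
    amount      REAL                        -- additive fact
);
INSERT INTO dim_product VALUES (1, 'bread'), (2, 'milk');
INSERT INTO dim_date    VALUES (1, '2022-10-01'), (2, '2022-10-02');
INSERT INTO fact_sales  VALUES (1, 1, 10, 25.0), (1, 2, 5, 12.5), (2, 1, 8, 20.0);
""")

# Typical dimensional query: aggregate the facts, group by a dimension attribute.
rows = con.execute("""
    SELECT p.product_name, SUM(f.units_sold), SUM(f.amount)
    FROM fact_sales f JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
print(rows)
```

Note how the query never touches raw operational records: it joins the fact table to a dimension and sums the additive facts.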
The main advantage of using the dimensional approach is that the data warehouse is easier for the
user to understand and to use, and the retrieval of data from the data warehouse tends to operate
very quickly.
There are mainly two disadvantages of using the dimensional approach to storing data in a data
warehouse:
1. To maintain the integrity of facts and dimensions, the process of loading the data warehouse
from different operational systems is complicated.
2. If an organization adopting the dimensional approach changes the way in which it does business,
it is difficult to modify the data warehouse structure.
In normalization approach the data in the data warehouse are stored following, to a degree, database
normalization rules. Tables are grouped together by subject areas that reflect general data categories
(e.g., data on customers, products, finance, etc.).
56. What are the advantages and disadvantage of using normalization approach for storing data in
data warehouse?
Advantages:
1. The main advantage is that it is easy to add information into the database.
Disadvantages:
1. It is difficult to access the information without a precise understanding of the source of the data
and the data structure of the data warehouse.
57. If de-normalization improves data warehouse processes, why is the fact table in normal form?
The foreign keys of a fact table are the primary keys of the dimension tables. A fact table mostly
contains columns that are primary keys in other tables, which itself makes it a normal-form table.
58. What are Normalization, First Normal Form, Second Normal Form, and Third Normal Form?
Normalization is the process of organizing data to minimize redundancy.
1NF: Repeating groups must be eliminated, dependencies can be identified, and all key attributes are
defined; there are no repeating groups in the table.
2NF: The table must be in 1NF, and every non-key attribute must depend on the whole primary key
(no partial dependencies).
3NF: The table must be in 2NF, and non-key attributes must not depend transitively on the primary
key (no transitive dependencies).
60.
61.
OLTP system: one table per entity; minimizes data redundancy; data is normalized and used for
OLTP, optimized for OLTP processing; the transaction-processing model; tables are the units of
storage; several tables with chains of relationships among them.
Data warehouse: one fact table for data organization; maximizes understandability; data is
de-normalized and used in the data warehouse and data marts, optimized for OLAP; the data
warehousing or analytical model; cubes are the units of storage; few tables, with fact tables
connected to dimension tables.
63. What are the differences between a dimension table and a fact table?
A fact table is the central table in a star schema of a data warehouse. A fact table stores quantitative
information.
A fact table typically has two types of columns: facts or measures, and foreign keys to dimension
tables.
Cumulative: This type of fact table describes what has happened over a period of time. For example,
this fact table may describe the total sales by product by store by day. The facts for this type of fact
tables are mostly additive facts. The first example presented here is a cumulative fact table.
Snapshot: This type of fact table describes the state of things in a particular instance of time, and
usually includes more semi-additive and non-additive facts. The second example presented here is a
snapshot fact table.
(Or)
Transactional – The transaction fact table is the most basic one; its grain is "one row per line in a
transaction", e.g. every line item appearing on an invoice. A transaction fact table stores data at the
most detailed level and therefore has a high number of rows.
Periodic snapshots – A periodic snapshot fact table stores data that is a snapshot of a period of time.
The source data of a periodic snapshot fact table is the data in a transaction fact table, from which
you choose the period to get the output.
Accumulating snapshots – The accumulating snapshot fact table describes the activity of a business
process that has a clear beginning and end. This type of fact table therefore has multiple date columns
to represent milestones in the process. A good example of an accumulating snapshot fact table is the
processing of a material: as the steps towards handling the material are finished, the corresponding
record in the accumulating snapshot fact table gets updated.
65. What is a factless fact table and where do we need to create it?
A fact table which does not contain numeric fact columns is called a factless fact table.
Or
In the real world, it is possible to have a fact table that contains no measures or facts. These tables are
called “Factless Fact tables”.
Eg: A fact table which has only a product key and a date key is a factless fact table. There are no
measures in this table, but you can still get the number of products sold over a period of time.
(Or)
Both kinds of factless fact tables play a very important role in your dimensional model
(http://www.zentut.com/data-warehouse/dimensional-modeling/) design. Let’s examine each of them
in detail and see the situations when you can apply them to make your design more robust.
When designing a dimensional model, you often find that you want to track events or activities that
occur in your business process but you cannot find measures to track. In these situations, you can
create a transaction-grained fact table that has no facts to describe those events or activities. Even
though there are no facts stored in the fact table, the events can be counted to produce very
meaningful process measurements.
For example, you may want to track employee leave. How often and why your employees take leave
is very important for planning your daily activities and resources.
At the center of the diagram below is the FACT_LEAVE table, which has no facts at all. However,
the FACT_LEAVE table is used to measure an employee leave event when it occurs.
The following SQL statement counts the number of leaves each employee has taken (DIM_EMPLOYEE
here stands for the employee dimension table in the diagram):
SELECT e.name, COUNT(f.employee_id) AS leave
FROM FACT_LEAVE f
JOIN DIM_EMPLOYEE e ON e.employee_id = f.employee_id
GROUP BY e.name;
Executing the SQL query above would give you the following result:
name          leave
Doe, John     7
Doe, Sam      9
Walker, Mike  8
…
One example is capturing the promotion campaigns that are active at specific times, such as holidays.
Examples like this describe conditions, eligibility or coverage. The factless fact table can be used to
model conditions, eligibility or coverage. Typically the information captured by this star will not be
studied alone, but will be used with other business processes to produce meaningful information.
Let's take a look at the sales star below. By looking only at this star, we don't know which promoted
products did not sell.
In order to track this kind of information, we can create a star that has a factless fact table, known as
a coverage table (according to Kimball).
In order to answer the question "which promoted products did not sell?", we need to do the
following:
Look at the second star to find out which products have promotions.
Look at the first star to find out which promoted products sold.
The difference between the two is the list of products that had a promotion but did not sell.
The factless fact table is crucial in many complex business processes. By applying the concepts and
techniques about factless fact tables in this tutorial, you can design a dimensional model that has no
clear facts yet produces meaningful information for your business processes.
Or
A fact table that does not contain any measure is called a fact-less fact. This table will only contain
keys from different dimension tables. This is often used to resolve a many-to-many cardinality issue.
Explanatory Note:
Consider a school, where a single student may be taught by many teachers and a single teacher may
have many students. To model this situation in a dimensional model, one might introduce a factless
fact table joining teacher and student keys. Such a fact table will then be able to answer queries like,
Factless tables are so called because they simply contain keys which refer to the dimension tables;
they don't really hold facts of their own. They are most commonly used for tracking some
information about an event.
A tracking process, or collecting status, can be performed using factless fact tables. The fact table
does not have aggregatable numeric values, hence the name; it holds mere key values, referenced by
the dimensions, from which the status is collected.
A factless fact table can only answer 'optimistic' (positive) queries, but cannot answer a negative
query. Again consider the illustration in the above example: a factless fact containing the keys of
tutors and students cannot answer a query like the one below.
Why not? Because the factless fact table only stores positive scenarios (like a student being taught by
a tutor); if there is a student who is not being taught by a teacher, then that student's key does not
appear in this table, thereby reducing the coverage of the table.
A coverage fact table attempts to answer this, often by adding an extra flag column: flag = 0 indicates
a negative condition and flag = 1 indicates a positive condition. To understand this better, consider a
class with 100 students and 5 teachers. The coverage fact table will ideally store 100 × 5 = 500 records
(all combinations), and if a certain teacher is not teaching a certain student, the corresponding flag
for that record will be 0.
68. What is the difference between incident and snapshot facts?
A fact table stores some kind of measurements. Usually these measurements are stored (or captured)
against a specific time, and they vary with respect to time. Now it might so happen that the business
is not able to capture all of its measures for every point in time. Those unavailable measurements can
either be kept empty (Null) or be filled with the last available measurement. The first case is an
example of an incident fact and the second is an example of a snapshot fact.
1. Choose the business process to model – the first step is to decide what business process to model,
by gathering and understanding business needs and available data.
2. Declare the grain – declaring the grain means describing exactly what a fact table record
represents.
3. Choose the dimensions – once the grain of the fact table is stated clearly, it is time to determine
the dimensions for the fact table.
4. Identify the facts – identify carefully which facts will appear in the fact table.
A fact is something that is quantifiable (Or measurable). Facts are typically (but not always)
numerical values that can be aggregated.
Types of Facts
Additive: Additive facts are facts that can be summed up through all of the dimensions in the fact
table.
Semi-Additive: Semi-additive facts are facts that can be summed up for some of the dimensions in
the fact table, but not the others.
Non-Additive: Non-additive facts are facts that cannot be summed up for any of the dimensions
present in the fact table.
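A short sketch of semi-additivity, using a hypothetical account-balance fact (balances are levels, not flows, so they add across accounts but not across time):

```python
# Hypothetical account-balance snapshots: balance is a semi-additive fact.
snapshots = [
    # (day, account, balance)
    ("mon", "A", 100),
    ("mon", "B", 50),
    ("tue", "A", 120),
    ("tue", "B", 30),
]

def total_balance(day):
    """Additive across the account dimension for a fixed day."""
    return sum(b for d, _, b in snapshots if d == day)

print(total_balance("mon"))  # meaningful total across accounts
print(total_balance("tue"))  # also meaningful

# Summing balances across the time dimension would double-count money;
# use an average (or the latest value) over time instead:
days = {d for d, _, _ in snapshots}
avg_over_time = sum(total_balance(d) for d in days) / len(days)
print(avg_over_time)
```

An additive fact like units sold could be summed over both dimensions; a non-additive fact like a ratio could be summed over neither.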
Conformed dimensions are the dimensions which can be used across multiple Data Marts in
combination with multiple facts tables accordingly.
Level of granularity means the level of detail that you put into the fact table in a data warehouse. For
example, based on the design you can decide to store the sales data for each transaction. The level of
granularity then determines what detail you are willing to store for each transactional fact: product
sales recorded per transaction, or aggregated up to the minute.
Or
Granularity
The first step in designing a fact table is to determine the granularity of the fact table. By granularity,
we mean the lowest level of information that will be stored in the fact table.
Determine where along the hierarchy of each dimension the information will be kept.
74.
In simple terms, the level of granularity defines the extent of detail. As an example, let us look at the
geographical level of granularity. We may analyze data at the levels of COUNTRY, REGION,
TERRITORY, CITY and STREET. In this case, the lowest (finest) level of granularity is STREET.
A dimension table is a table in a star schema of a data warehouse. A dimension table stores attributes,
or dimensions, that describe the objects in a fact table.
Conformed Dimension:
Conformed dimensions mean the exact same thing with every possible fact table to which they are
joined.
Eg: The date dimension table connected to the sales facts is identical to the date dimension connected
to the inventory facts.
Junk Dimension:
Contains low-cardinality flags or indicators. It is generated by cross-joining two or more
low-cardinality dimensions. Example: cross-join the gender and marital status dimensions to
generate a junk dimension.
Degenerated Dimension:
A degenerate dimension is a dimension which is derived from the fact table and doesn’t have its own
dimension table.
Or
Degenerate Dimension: keeping the control information on the fact table. For example, consider a
dimension table with fields like order number and order line number that has a 1:1 relationship with
the fact table. In this case the dimension is removed and the order information is stored directly in
the fact table, in order to eliminate unnecessary joins while retrieving order information.
Role-playing dimension:
Dimensions which are often used for multiple purposes within the same database are called
role-playing dimensions. For example, a date dimension can be used for "date of sale", as well as
"date of delivery" or "date of hire".
Core dimension is a dimension table which is dedicated for single fact table or data mart.
Mini-dimensions can be used to handle rapidly changing dimension scenarios. If a dimension has a
huge number of rapidly changing attributes, it is better to separate those attributes into a different
table called a mini-dimension. This is done because if the main dimension table is designed as SCD
Type 2, the table will soon outgrow in size and create performance issues. It is better to segregate the
rapidly changing members into a different table, thereby keeping the main dimension table small and
performant.
84. Can’t we store degenerate dimension in dimensions table instead of fact table?
Conventional (Slow):
All the constraints and keys are validated against the data before it is loaded; this way data integrity
is maintained.
Direct (Fast):
All the constraints and keys are disabled before the data is loaded. Once the data is loaded, it is
validated against all the constraints and keys. If data is found to be invalid or dirty, it is not included
in the index and all future processes are skipped on this data.
Dimensions that change over time are called slowly changing dimensions. For instance, a product
price changes over time; people change their names for some reason; country, state and city names
may change over time. These are a few examples of slowly changing dimensions.
(or) Slowly Changing Dimensions (SCD) – dimensions that change slowly over time, rather than
changing on a regular, time-based schedule. In a data warehouse there is a need to track changes in
dimension attributes in order to report historical data. In other words, implementing one of the SCD
types should enable users to assign a proper dimension attribute value for a given date. Examples of
such dimensions could be: customer, geography, employee. There are many approaches to dealing
with SCD. The most popular are:
* Type 0 – The passive method
* Type 1 – Overwriting the old value
* Type 2 – Creating a new additional record
Type 0 – The passive method. In this method no special action is performed upon dimensional
changes. Some dimension data (http://datawarehouse4u.info/SCD-Slowly-Changing-Dimensions.html)
can remain the same as it was when first inserted; other data may be overwritten.
Type 1 – Overwriting the old value. In this method no history of dimension changes is kept in the
database; the old dimension value is simply overwritten by the new one. This type is easy to maintain
and is often used for changes caused by processing corrections (e.g. removal of special characters,
correcting spelling errors).
Current table (http://datawarehouse4u.info/SCD-Slowly-Changing-Dimensions.html) columns:
* current_type – for keeping the current value of the attribute. All history records for a given item of
the attribute have the same current value.
* historical_type – for keeping the historical value of the attribute. All history records for a given
item of the attribute may have different values.
Hybrid SCDs are a combination of both SCD 1 and SCD 2. It may happen that in a table some
columns are important and we need to track changes for them, i.e. capture their historical data,
whereas for other columns, even if the data changes, we don't care.
SCD1 stores only current data, whereas SCD2 also stores history records.
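A minimal sketch of a Type 2 update, assuming a hypothetical customer dimension with start/end dates and a current-record flag (SQLite used for brevity):

```python
import sqlite3

# Hypothetical Type-2 customer dimension: history is kept via the
# start_date / end_date tracking columns plus a current-record flag.
con = sqlite3.connect(":memory:")
con.execute("""
CREATE TABLE dim_customer (
    customer_sk INTEGER PRIMARY KEY,   -- surrogate key
    customer_id TEXT,                  -- natural key
    city        TEXT,
    start_date  TEXT,
    end_date    TEXT,
    is_current  INTEGER
)""")
con.execute("INSERT INTO dim_customer VALUES "
            "(1, 'C1', 'Bangalore', '2018-01-01', '9999-12-31', 1)")

def scd2_update(natural_key, new_city, change_day):
    """Expire the current row, then insert a new current row (Type 2)."""
    con.execute("""UPDATE dim_customer SET end_date = ?, is_current = 0
                   WHERE customer_id = ? AND is_current = 1""",
                (change_day, natural_key))
    con.execute("""INSERT INTO dim_customer
                   (customer_id, city, start_date, end_date, is_current)
                   VALUES (?, ?, ?, '9999-12-31', 1)""",
                (natural_key, new_city, change_day))

scd2_update("C1", "Pune", "2022-06-01")
print(con.execute("SELECT city, start_date, end_date, is_current "
                  "FROM dim_customer ORDER BY customer_sk").fetchall())
# A Type 1 update would instead simply overwrite:
#   UPDATE dim_customer SET city = 'Pune' WHERE customer_id = 'C1'
```

After the update, both the expired Bangalore row and the new current Pune row exist, so historical reports can still attribute old purchases to Bangalore.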
91. What are the two columns available in SCD2 that help to track changes?
Eg: the effective start date and end date columns (e.g. Start_Date and End_Date), sometimes
accompanied by a current-record flag.
A critical column in a warehouse is a column whose value changes over a period of time, for example
the city of a user. If a user resides in city 'abc' and the warehouse keeps track of his per-day
expenses, then when the user changes city the data warehouse becomes inconsistent, since the city
has changed and the expenses are shown under the new city.
Or
A column (usually granular) whose values change over a period of time is called a critical column.
For example, there is a customer named 'Anirudh' who resided in Bangalore for 4 years and then
shifted to Pune. While in Bangalore, he made Rs. 30 lakhs worth of purchases. Now the CITY changes
in the data warehouse, and all his purchases are shown under the city Pune only. This kind of process
makes the data warehouse inconsistent. In this example, CITY is the critical column. A surrogate key
can be used as a solution for this.
Indexing is a technique used for efficient data retrieval, i.e. accessing data in a faster manner. When a
table grows in volume, the indexes also increase in size, requiring more storage.
A bitmap index is the best one here. Why? Because a B-tree is suited for unique values (e.g. empid),
while a bitmap is best for repeated values (e.g. gender m/f).
95. What type of Indexing mechanism do we need to use for a typical datawarehouse?
On the fact table it is best to use bitmap indexes. Dimension tables can use bitmap and/or the other
types of clustered/non-clustered, unique/non-unique indexes.
To my knowledge, SQLServer does not support bitmap indexes. Only Oracle supports bitmaps.
96.
Bitmaps are very useful in a star schema for joining large tables to small tables. To answer queries,
bit arrays are used to perform logical operations on the data. Bitmap indexes are very efficient in
handling low-cardinality columns such as gender; repetitive operations are also performed with
much greater efficiency.
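A toy sketch of the idea, using Python integers as bit arrays over a hypothetical gender column:

```python
from collections import defaultdict

# Toy bitmap index: one bit array (here a Python int) per distinct value
# of a low-cardinality column, with bit i set if row i holds that value.
rows = ["m", "f", "f", "m", "f"]          # hypothetical gender column

bitmaps = defaultdict(int)
for i, value in enumerate(rows):
    bitmaps[value] |= 1 << i

# WHERE gender = 'f': just read the bitmap.
print(bin(bitmaps["f"]))                  # rows 1, 2 and 4 are set

# Combined predicates become bitwise AND/OR on the bitmaps:
active = 0b01011                          # hypothetical "is_active" bitmap
matches = bitmaps["f"] & active           # gender = 'f' AND is_active
print([i for i in range(len(rows)) if matches >> i & 1])
```

Because each distinct value needs only one bit per row, low-cardinality columns compress extremely well, and ANDing bitmaps replaces row-by-row predicate evaluation.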
Bitmaps commonly use one bitmap for every single distinct value. The number of bitmaps used can
be reduced by opting for a different type of encoding. Space can be optimized, but when a query is
generated the bitmaps have to be accessed.
98. How many clustered indexes can you create for a table in DWH? In the case of truncate and
delete commands, what happens to a table which has a unique id?
You can have only one clustered index per table. If you use the delete command, you can roll back;
it fills your redo log files. If you do not want the records, you may use the truncate command, which
is faster and does not fill your redo log file.
99. List the OLAP operations in the multidimensional data model.
=> Roll-up
=> Drill-down
=> Slice
=> Dice
The roll-up operation, also called drill-up, performs aggregation on a data cube either by climbing
up a concept hierarchy for a dimension or by dimension reduction.
Slicing means showing a slice of the data, given a certain dimension (e.g. Product), value (e.g. Brown
Bread) and measures (e.g. sales).
Dicing means viewing the slice with respect to different dimensions and at different levels of
aggregation.
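Roll-up, slice and dice can be sketched over a toy cube (all dimension values and figures below are hypothetical):

```python
from collections import defaultdict

# Toy cube as (product, city, month) -> sales.
cube = {
    ("bread", "pune",      "jan"): 10,
    ("bread", "bangalore", "jan"): 4,
    ("milk",  "pune",      "jan"): 6,
    ("bread", "pune",      "feb"): 7,
}

# Slice: fix one dimension to a single value (product = 'bread').
slice_ = {k: v for k, v in cube.items() if k[0] == "bread"}

# Dice: select a sub-cube on several dimensions at once.
dice = {k: v for k, v in cube.items()
        if k[0] in ("bread",) and k[2] in ("jan",)}

# Roll-up: aggregate away the city dimension (climb the location hierarchy).
rollup = defaultdict(int)
for (product, _city, month), sales in cube.items():
    rollup[(product, month)] += sales

print(rollup[("bread", "jan")])   # city-level sales summed to product/month
```

Drill-down is simply the inverse of the roll-up step: going from the `(product, month)` totals back to the per-city cells.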
Drill through is the process of going to the detail level data from summary data.
Consider the above example on retail shops. If the CEO finds out that sales in East Europe have declined this year compared to last year, he might then want to know the root cause of the decrease. For this, he may start drilling through his report to a more detailed level and eventually find out that even though individual shop sales have actually increased, the overall sales figure has decreased because a certain shop in Turkey has stopped operating. The detail-level data, which the CEO was not much interested in earlier, has this time helped him pinpoint the root cause of the declined sales. The method he followed to obtain the details from the aggregated data is called drill-through.
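A minimal sketch of roll-up and slice on an in-memory cube may make the operations concrete (the data and dimension names here are hypothetical, not from the Q&A):

```python
from collections import defaultdict

# Tiny "cube": (product, region, month) -> sales
cube = {
    ("Brown Bread", "East Europe", "Jan"): 100,
    ("Brown Bread", "East Europe", "Feb"): 80,
    ("White Bread", "East Europe", "Jan"): 50,
    ("Brown Bread", "West Europe", "Jan"): 120,
}

def roll_up(cube, dim_index):
    """Aggregate away one dimension (dimension reduction)."""
    out = defaultdict(int)
    for key, sales in cube.items():
        reduced = key[:dim_index] + key[dim_index + 1:]
        out[reduced] += sales
    return dict(out)

def slice_(cube, dim_index, value):
    """Fix one dimension to a single value."""
    return {k: v for k, v in cube.items() if k[dim_index] == value}

# Roll up over month: (product, region) -> total sales
by_product_region = roll_up(cube, 2)
# Slice on product = "Brown Bread"
brown = slice_(cube, 0, "Brown Bread")
```

Dicing would be the same filtering idea applied to several dimensions at once.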
104. Drilling can be done down, up, through, and across; scope is the overall view of the drill exercise.
106. What is a Date Dimension and how will you load it?
Time dimensions are usually loaded by a program that loops through all possible dates that may appear in the data. It is not unusual for 100 years to be represented in a time dimension, with one row per day.
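A loader of this kind can be sketched as follows (the column choices are illustrative, not a fixed standard):

```python
from datetime import date, timedelta

def build_date_dimension(start, end):
    """One row per day between start and end, inclusive."""
    rows = []
    d = start
    while d <= end:
        rows.append({
            "date_key": int(d.strftime("%Y%m%d")),  # smart-key style, e.g. 20220101
            "full_date": d.isoformat(),
            "year": d.year,
            "quarter": (d.month - 1) // 3 + 1,
            "month": d.month,
            "day_of_week": d.strftime("%A"),
            "is_weekend": d.weekday() >= 5,
        })
        d += timedelta(days=1)
    return rows

rows = build_date_dimension(date(2022, 1, 1), date(2022, 1, 31))
```

Real date dimensions typically add many more attributes (fiscal periods, holiday flags), but the loop-over-all-dates shape is the same.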
A star schema is the simplest form of a dimensional model, in which data is organized into facts and dimensions. A fact is an event that is counted or measured, such as a sale or login. A dimension contains reference information about the fact, such as date, product, or customer. A star schema is diagrammed by surrounding each fact with its associated dimensions; the resulting diagram resembles a star.
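A star schema like the one described can be sketched in SQL; the table and column names below are illustrative only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables surround the central fact table.
cur.executescript("""
CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, full_date TEXT);
CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);

CREATE TABLE fact_sales (
    date_key     INTEGER REFERENCES dim_date(date_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    quantity     INTEGER,
    amount       REAL
);
""")

cur.execute("INSERT INTO dim_date VALUES (20220101, '2022-01-01')")
cur.execute("INSERT INTO dim_product VALUES (1, 'Brown Bread')")
cur.execute("INSERT INTO dim_customer VALUES (1, 'Acme')")
cur.execute("INSERT INTO fact_sales VALUES (20220101, 1, 1, 2, 5.0)")

# A typical star-join: fact joined to a dimension, grouped by a dimension attribute.
row = cur.execute("""
    SELECT p.name, SUM(f.amount)
    FROM fact_sales f JOIN dim_product p USING (product_key)
    GROUP BY p.name
""").fetchone()
```

Every query follows the same pattern: the fact table in the middle, one join per dimension needed.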
The snowflake schema represents a dimensional model which is also composed of a central fact table
and a set of constituent dimension tables which are further normalized into sub-dimension tables. In
a snowflake schema implementation, Warehouse Builder uses more than one table or view to store
the dimension data. Separate database tables or views store data pertaining to each level in the
dimension.
A BUS schema is composed of a master suite of conformed dimensions and standardized definitions of facts.
Or
A BUS schema identifies the common dimensions across business processes, i.e. the conformed dimensions. It has conformed dimensions and standardized definitions of facts.
Galaxy Schema:
Galaxy schema contains many fact tables with some common dimensions (conformed dimensions).
This schema is a combination of many data marts.
The dimensions in this schema are segregated into independent dimensions based on the levels of hierarchy. For example, if geography has five levels of hierarchy, such as territory, region, country, state, and city, the constellation schema would have five dimensions instead of one.
113. What are the differences between a snowflake schema and a star schema?
Star schema: if performance is the priority, go for the star schema, since its dimension tables are denormalized.
Snowflake schema: if memory space is the priority, go for the snowflake schema, since its dimension tables are normalized.
1. Provide a direct mapping between the business entities and the schema design.
There are some requirements which cannot be met by a star schema. For example, the relationship between customer and bank account cannot be represented purely as a star schema, because the relationship between them is many-to-many.
The main advantage of the snowflake schema is the improvement of query performance due to minimized disk storage requirements and joins against smaller lookup tables.
It is easier to maintain.
Increased flexibility: it provides greater flexibility in the interrelationships between dimension levels and components.
The main disadvantage of the snowflake schema is the additional maintenance effort needed for the increased number of lookup tables.
It also makes queries much more difficult to create, because more tables need to be joined.
117. How can you implement many relations in star schema model? (http://www.questions-
interviews.com/data-warehouse/data-warehousing-
3.aspx#How_can_you_implement_many_relations_in_star_schema_model)
Many-to-many relations can be implemented by using a snowflake schema, with a maximum of n dimensions.
An aggregate table contains a summary of existing warehouse data, grouped to certain levels of dimensions. Retrieving the required data from the actual table, which may have millions of records, takes more time and also affects server performance. To avoid this, we can aggregate the table to the required level and use it. These tables reduce the load on the database server, increase query performance, and allow results to be retrieved very quickly.
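Building an aggregate table from a detail fact table can be sketched like this (table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE fact_sales (product TEXT, day TEXT, amount REAL)")
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", [
    ("Bread", "2022-01-01", 10.0),
    ("Bread", "2022-01-02", 20.0),
    ("Milk",  "2022-01-01", 5.0),
])

# Pre-compute totals at the product level once, so later queries can
# hit the small aggregate table instead of scanning the detail rows.
cur.execute("""
    CREATE TABLE agg_sales_by_product AS
    SELECT product, SUM(amount) AS total_amount
    FROM fact_sales
    GROUP BY product
""")

totals = dict(cur.execute("SELECT product, total_amount FROM agg_sales_by_product"))
```

In a real warehouse the aggregate would be refreshed by the ETL process whenever the detail fact table is loaded.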
119. What is the difference between aggregate table and materialized view?
(http://www.questions-interviews.com/data-warehouse/data-warehousing-
3.aspx#What_is_the_difference_between_aggregate_table_and_materialized_view)
Aggregate tables are pre-computed totals in the form of a hierarchical multidimensional structure, whereas a materialized view is a database object which caches a query result in a concrete table and refreshes it from the original database tables from time to time. Aggregate tables are used to speed up query computation, whereas materialized views speed up data retrieval.
120. What are the steps to load a Data Warehouse/Data Mart using an ETL tool?
ETL process
* Cleaning (e.g., mapping NULL to 0, or "Male" to "M" and "Female" to "F", etc.).
Data cleaning is the process of identifying erroneous data. The data is checked for accuracy, consistency, typos, etc.
Data transformation confirms that the input data matches the expected format.
Statistical methods: values such as the mean, standard deviation, and range, or clustering algorithms, are used to find erroneous data.
Data cleansing is a process to upgrade the quality of data before it is moved into a data warehouse.
Deleting data from the data warehouse is known as data purging. Usually junk data, such as rows with null values or spaces, is cleaned up.
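The mappings mentioned above (NULL to 0, "Male" to "M") can be sketched as a simple cleaning step; the field names are hypothetical:

```python
GENDER_MAP = {"Male": "M", "Female": "F"}

def clean_row(row):
    """Apply simple cleaning rules to a row before loading it."""
    cleaned = dict(row)
    # Map NULL (None) amounts to 0.
    if cleaned.get("amount") is None:
        cleaned["amount"] = 0
    # Standardize gender codes; leave unknown values untouched.
    if cleaned.get("gender") in GENDER_MAP:
        cleaned["gender"] = GENDER_MAP[cleaned["gender"]]
    # Trim stray whitespace, which often signals junk data.
    if isinstance(cleaned.get("name"), str):
        cleaned["name"] = cleaned["name"].strip()
    return cleaned

row = clean_row({"name": " Alice ", "gender": "Female", "amount": None})
```

An ETL tool expresses the same rules declaratively, but the row-by-row mapping logic is equivalent.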
124. Which tables you load first while loading DWH ( Dimensions or Fact)
Dimension
125. What is an early-arriving fact (or late-arriving dimension)? Explain how you would handle those records.
An early-arriving fact occurs when the activity measurement arrives at the data warehouse without its full context. In other words, the statuses of the dimensions attached to the activity measurement are ambiguous or unknown for some period of time.
We all know that first we process the dimension records and insert them into the dimension table. Next, the fact records are processed by joining with the dimension table. In the case of a late-arriving dimension, when you join the fact table with the dimension, the fact records are not inserted into the fact table, as there is no corresponding dimension for those records. To handle this, we create another table into which we insert the fact records that failed to insert into the original fact table. When we process the data the next time, we use this table along with the fact stage table to join with the dimension table and insert into the fact table.
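The park-and-retry approach described in the answer can be sketched like this (the key and table names are hypothetical):

```python
def process_facts(facts, dimension_keys, retry_queue):
    """Insert facts whose dimension exists; park the rest for the next run."""
    inserted, parked = [], []
    for fact in facts:
        if fact["customer_id"] in dimension_keys:
            inserted.append(fact)   # normal path: dimension already loaded
        else:
            parked.append(fact)     # early-arriving fact: no dimension row yet
    retry_queue.extend(parked)
    return inserted

dimension_keys = {"C1", "C2"}
retry_queue = []
loaded = process_facts(
    [{"customer_id": "C1", "amount": 10},
     {"customer_id": "C9", "amount": 99}],   # C9's dimension row hasn't arrived
    dimension_keys, retry_queue)

# Next run: the dimension row for C9 has arrived; reprocess the parked facts.
dimension_keys.add("C9")
loaded += process_facts(retry_queue[:], dimension_keys, [])
```

Another common option, not shown here, is to insert the fact immediately against a placeholder ("unknown") dimension row and correct it later.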
126. In which situations are context and alias used?
127. Which technology should be used for interactive data querying across multiple
dimensions for a decision making for a DW?
MOLAP
Metadata is "data about data": it describes the content, quality, condition, and other characteristics of the data.
A natural key is a set of one or more columns in the dimension table that uniquely identifies a record in the table. The values of the natural key column(s) are provided by the source system. Ideally the natural key would be defined as the primary key of the dimension table, but we refrain from this for the following reasons:
The natural key could contain non-numeric data types (timestamp or char). Joining large fact tables on non-numeric data types such as timestamps could lead to performance issues.
There could be more than one natural key column, increasing the size of the fact table, as we would need more than one column to join the fact table with the dimension table.
The format and structure of the natural keys could change in the future. This could happen when new source systems are added.
Having natural keys makes the process of Slowly Changing Dimensions
(http://www.dwhinfo.com/Technical/DWHETLSlowlyChangingDimensionProcess.html) very complex.
Surrogate keys are keys that have no "business" meaning and are used solely to identify a record in the table. Such keys are either database generated (for example, Identity in SQL Server, Sequence in Oracle, Sequence/Identity in DB2 UDB) or system generated values (e.g. generated via a table in the schema).
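A system-generated surrogate key can be sketched as a counter plus a natural-key lookup (illustrative only; in practice the database's Identity/Sequence feature does this):

```python
import itertools

class SurrogateKeyMap:
    """Assign a meaningless, ever-increasing integer key to each new natural key."""
    def __init__(self):
        self._seq = itertools.count(1)
        self._by_natural = {}

    def key_for(self, natural_key):
        # Known natural keys keep their surrogate; new ones get the next number.
        if natural_key not in self._by_natural:
            self._by_natural[natural_key] = next(self._seq)
        return self._by_natural[natural_key]

keys = SurrogateKeyMap()
k1 = keys.key_for(("CUST-001", "2022-01-01T00:00:00"))  # multi-column natural key
k2 = keys.key_for(("CUST-002", "2022-01-05T12:30:00"))
k1_again = keys.key_for(("CUST-001", "2022-01-01T00:00:00"))
```

Note how a two-column, partly non-numeric natural key collapses to a single small integer, which is exactly the join-performance argument made above.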
133. In your data warehouse, do you like to use natural keys or surrogate keys, and why?
Surrogate keys, for the reasons listed above: natural keys may contain non-numeric data types, may require more than one column to join the fact table to the dimension table, may change format when new source systems are added, and make the Slowly Changing Dimension process very complex.
136. How many clustered indexes can you create for a table in DWH?
By definition, a clustered index physically arranges all data in a table in a sequential manner. Since you cannot have more than one physical arrangement of data in a table, you can have just one clustered index per table.
Data in which changes to existing records cause the previous version of the records to be eliminated
In a DWH, loops may exist between tables. If loops exist, query generation will take more time, because more than one path is available; it also creates ambiguity. Loops can be avoided by creating aliases of the table or by using contexts.
Example: four tables (Customer, Product, Time, Cost) forming a closed loop. Create an alias for Cost to avoid the loop.
Cardinality is the term used in database relations to denote the number of occurrences of data on either side of the relation.
E.g.: a data column containing LAST_NAME (there may be several entries of the same last name).
Determining data cardinality is a substantial aspect of data modeling; it is used to determine the relationships.
Types of cardinalities: one-to-one, one-to-many, and many-to-many.
A snapshot refers to a complete visualization of data at the time of extraction. It occupies less space and can be used to back up and restore data quickly.
Or
A snapshot is a way of recording the activities performed. The snapshot is stored in a report format from a specific catalog; the report is generated soon after the catalog is disconnected.
Or
You can disconnect the report from the catalog to which it is attached by saving the report with a snapshot of the data.
Key areas of activity in which favorable results are necessary for a company to reach its goal.
Industry CSFs
Strategy CSFs
Environmental CSFs
Temporal CSFs
Money
Your future
Customer satisfaction
Quality
Intellectual capital
Strategic relationships
Sustainability
Current data intended to be the single source for all decision support systems.
In Chain Data Replication, the non-official data set distributed among many disks provides load balancing among the servers within the data warehouse.
Blocks of data are spread across clusters, and each cluster can contain a complete set of replicated data. Every data block in every cluster is a unique permutation of the data in the other clusters.
When a disk fails, all calls made to the data on that disk are redirected to the other disks where the data has been replicated.
At times, replicas and disks are added online without having to move around the data in the existing copy or affect the arm movement of the disks.
For load balancing, Chain Data Replication lets multiple servers within the data warehouse share data request processing, since the data already has replicas on each server's disk.
Broadcast – Takes data from multiple inputs, combines it, and sends it to all the output ports.
E.g. – You have 2 incoming flows (this can be data parallelism or component parallelism) on the Broadcast component, one with 10 records and the other with 20 records. Then all the outgoing flows (any number of flows) will have 10 + 20 = 30 records.
Replicate – Replicates the data for a particular partition and sends it out to multiple output ports of the component, but maintains partition integrity.
E.g. – Your incoming flow to Replicate has a data parallelism level of 2, with one partition having 10 records and the other having 20 records. Now suppose you have 3 output flows from Replicate; then each flow will have 2 data partitions with 10 and 20 records respectively.
147. What is the difference between a Scan component and a RollUp component?
Rollup is for group-by aggregation and Scan is for successive (running) totals. Basically, when we need to produce running summaries we use Scan; Rollup is used to aggregate data per group.
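The distinction can be sketched in plain Python; the component names come from the Q&A, the code is only an analogy:

```python
from collections import defaultdict
from itertools import accumulate

amounts = [10, 20, 30]

# Scan: successive (running) totals - one output per input record.
running = list(accumulate(amounts))

# Rollup: one aggregated output per group.
records = [("A", 10), ("B", 20), ("A", 30)]
totals = defaultdict(int)
for key, amount in records:
    totals[key] += amount
```

Scan preserves the record count while carrying a cumulative value; Rollup collapses each group to a single record.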
1 GB ≈ 100 MB + 200 MB + 300 MB + 500 MB
Average partition size = 1000 MB / 4 = 250 MB
Skew of the first partition = (100 − 250) / 500 = −0.3, a negative value (the partition is smaller than average).
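Assuming the arithmetic above is a partition-skew calculation (partition size minus average size, divided by the largest partition), it can be checked in code:

```python
def partition_skew(size_mb, avg_mb, largest_mb):
    """Skew = (partition size - average size) / largest partition size."""
    return (size_mb - avg_mb) / largest_mb

sizes = [100, 200, 300, 500]
avg = 1000 / len(sizes)      # the example rounds the total down to 1 GB
skew = partition_skew(sizes[0], avg, max(sizes))   # (100 - 250) / 500
```

A negative skew means the partition holds less than its fair share of the data; a positive skew, more.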
Bitmap indexes make use of bit arrays (bitmaps) to answer queries by performing bit-wise logical operations. They work well with low-cardinality data, i.e. data that takes few distinct values. Bitmap indexes are useful in data warehousing applications, where they have a significant space and performance advantage over other structures for such data. Tables that have a small number of insert or update operations can be good candidates.
Their structure makes it possible for the system to combine multiple indexes together so that the underlying table can be accessed faster.
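A bitmap index over low-cardinality columns can be sketched like this: one bit array per distinct value, with bitwise AND/OR combining several indexes to answer a query (the data is hypothetical):

```python
genders = ["M", "F", "F", "M", "F"]   # low-cardinality Gender column
regions = ["E", "E", "W", "W", "E"]   # low-cardinality Region column

def bitmap(values):
    """One bit array (here: a list of 0/1) per distinct value."""
    return {v: [1 if x == v else 0 for x in values] for v in set(values)}

gender_idx = bitmap(genders)
region_idx = bitmap(regions)

# WHERE gender = 'F' AND region = 'E'  ->  bitwise AND of two bitmaps
hits = [a & b for a, b in zip(gender_idx["F"], region_idx["E"])]
matching_rows = [i for i, bit in enumerate(hits) if bit]
```

Real databases pack and compress these bit arrays, but the AND/OR evaluation is exactly this.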
=> Regression definition
=> linear regression
=> multiple regressions
=> Non-linear regression
=> Point estimation
=> Data summarization
=> Bayesian techniques
=> Hypothesis testing
=> Regression
=> Correlation
=> Bayesian theorem
=> Bayesian learning
In hierarchical, networked, or relational databases, the data can be extracted, cleansed, and transferred in two directions. The ability of a system to do this is referred to as bidirectional extracts.
Data Extraction
The source systems from which data is extracted vary in many ways, from their structures and file formats to the department and business segment they belong to. Common source formats include flat files, relational databases, and other non-relational database structures such as IMS, VSAM, or ISAM.
Data transformation
The extracted data may undergo transformation, with possible addition of metadata, before it is exported to another large storage area.
In the transformation phase, various functions related to business needs, requirements, rules, and policies are applied to the data. During this process some values even get translated and encoded. Care is also taken to avoid redundancy of data.
Data cleansing
In data cleansing, incorrect or corrupted data is scrutinized and those inaccuracies are removed; thus data consistency is ensured.
Data loading
This is the last process of bidirectional extracts. The cleansed, transformed source data is then loaded into the data warehouse.
Advantages
– Updates and data loading become very fast due to bidirectional extracting.
– As timely updates are received in a useful pattern, companies can make good use of this data to launch new products and formulate market strategies.
Disadvantage
– Without fault tolerance, a break in the system may mean an unexpected stoppage of operations.
156. Who are the Data Stewards and what is their role?
Key Points
Data stewards ensure official agency records requirements are met, and data documentation is
developed and maintained.
Data stewards create data standards, establish data access security requirements, and are
active in all levels of data management.
Project managers and field supervisors appoint data stewards and determine what data will be
maintained.
Project managers ensure adherence to USGS requirements, and that resources are available for
data management.
Specialists work with data stewards, implement data standards, create metadata, and manage
databases.
ALL are responsible for the integrity and quality of the data.
A Data Steward is one who manages another’s facts or information to ensure that they can be used to
draw conclusions or make decisions. Data Stewards are “keepers of the flame” in terms of data
quality. They are responsible as stewards to serve and protect the customers’ needs or assets
(consider an airline steward or a trustee).
Stewardship equals taking responsibility for a set of data for the well being of the larger
organization, and operating in service to, rather than in control of, those around us.
Data stewardship is primarily the job of the professionals who create and maintain data. Although
they have significant support roles to play, stewardship cannot simply be delegated to
the IT or GIS shops.
For example, for a spatially-enabled dataset, the GIS person may be responsible for maintaining the
data but the decision on what information to collect and what format to keep it in belong to the
“ologists” and business area leads they are working with.
USGS cannot accomplish data management without people taking on the roles of data stewardship at
all levels of the organization. We are looking for people to embrace those data steward roles and
responsibilities. People with knowledge about the business needs of the organization are necessary at
all levels to define and manage data content and quality to ensure that the data collected and
maintained meet those business needs.
Many of the responsibilities of Data Stewards are the same, regardless of where the person falls
within the organization.
Be accountable for data they personally created or updated.
Data stewards are responsible for establishing requirements and for data quality.
Data stewards should ask: Is there current documentation on the data, such as when they were collected, where, how, by whom, and under what conditions?
Data Access:
Data access rules relate to both internal and external access. As a data steward you are required to take into consideration things like FOIA, the Privacy Act, and IT security issues that could impact your data. Data stewards should assess their data early in the data collection process to determine if anything they are collecting is sensitive and might be restricted from access either inside or outside the organization.
NARA (National Archives and Records Administration) rules regulate the disposal of all types of records, including alphanumeric and spatial datasets.
Always involve your Records Manager/Administrator early in the data collection planning process.
Metadata, which is defined as "data about data", describes the content, quality, condition, and other characteristics of data.
Metadata is to be collected from the beginning of the data collection process for both alphanumeric and spatial data.
Participate in the data management team for your geographic area (national, state, local).
Data management: when you determine a need to collect data for a project, proposal, or decision in your area, work with the team to identify existing data stores or data collection parameters. Employees who have roles and responsibilities for data management need to work together.
Endorse data standards; this requires knowledge of how to create data standards, determine business data requirements, and define business rules.
Management Responsibilities
Be accountable for all aspects of data within their program or geographic area.
Provide oversight during the development of projects to ensure the data needs and requirements are documented.
Ensure adherence to metadata and data standards.
Specialist Responsibilities
Work with data stewards to interpret business needs into applications and derive data requirements.
Implement State/Bureau data standards.
Facilitate educational opportunities for the treatment, application, and value of spatial data.
Create metadata.
Manage databases.
References
Chatfield, T., and Selbach, R. (February 2011). Data Management for Data Stewards. Data Management Training Workshop, Bureau of Land Management (BLM).
157. What is the easiest way to build a corporate-specific time dimension?
158. What are conformed dimensions? We always give date as a conformed dimension, but if it has different formats for different countries, say YYMMDD for Italy and MM-DD-YYYY for France, are they not conformed?
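On the conformed-date question: a conformed date dimension stores one canonical date value; the country-specific strings are only display formats of that same value, so the dimension remains conformed. A sketch, using the formats named in the question:

```python
from datetime import date

canonical = date(2022, 10, 15)   # the single conformed value in the dimension

# Country-specific renderings of the same conformed date:
italy = canonical.strftime("%y%m%d")      # YYMMDD
france = canonical.strftime("%m-%d-%Y")   # MM-DD-YYYY
```

As long as every fact table joins on the same canonical date key, formatting differences in reports do not break conformance.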
During the Planning phase of the project, the conceptual data model is created to capture the high-
level data requirements for the project. Since the model captures the highlights of the client’s
information needs, it is the only model that effectively reflects the enterprise level.
Depending on the requirements, the enterprise-wide vision may need to be emphasized to help guide
the client in the development of an overall data warehousing strategy. Detail models that reflect the
project’s scope will be created during logical and physical data modeling. The conceptual data model
is the precursor to the logical data model; it is not tied to any particular solution or technology.
Entities, relationships, major attributes, and metadata across functional areas are included. During
successive releases, the conceptual data model should be validated and updated if necessary. An
enterprise should have only one conceptual data model.
During the design phase of the project, the logical data model is created for the scope of the complete
project. A portion of the conceptual data model will be fully attributed and completed as the logical
data model. The logical data model reflects the technology to be used. In today’s environment, this
typically means either a relational DBMS or a multidimensional tool. But if the client should be using
an older DBMS such as IMS or IDMS, the logical model will be quite different than if an RDBMS is to
be used. The logical data model reflects a logical data design that can be used by the developers on
the project. For an RDBMS, that means logical tables (views) and columns.
Like the logical data model, the physical data model is created during the design phase. This
modeling activity should reflect the scope of the specific release of the project. The model’s final
design will be highly dependent on the technical solution for the data warehouse. The purpose of this
model is to capture all the technical details required to produce the final tables, and
physical constructs such as indexes and table partitions. The logical data model will serve as a
blueprint to the project team while the physical data model is a blueprint for the DBAs. All the
functionality reflected in the logical data model should be preserved while creating the physical
data model. The generated table schemas will be identical to the physical data model.
http://www.allinterview.com/Interview-Questions/Data-Warehouse-General/page12.html
http://satishmsbi.blogspot.in/2011/10/differences-between-systemsoltpolapodsd.html
http://hussain-msbi.blogspot.in/search/label/SSIS
http://blog.stevienova.com/2008/11/22/ssis-slowly-changing-dimensions-with-checksum/
http://srikanthtechnologies.com/books/orabook/oraclebook.html
http://www.dwhinfo.com/Technical/DWHTechnicalMain.html
A fundamental concept of a data warehouse is the distinction between data and information. Data is
composed of observable and recordable facts that are often found in operational or transactional
systems.
At Rutgers, these systems include the registrar's data on students (widely known as the SRDB), human resource and payroll databases, course scheduling data, and data on financial aid. In a data warehouse environment, data only comes to have value to end-users when it is organized and presented as information. Information is an integrated collection of facts and is used as the basis for decision-making. For example, an academic unit needs diachronic information about the extent of the instructional output of its different faculty members to gauge whether it is becoming more or less reliant on part-time faculty.
The data warehouse is that portion of an overall Architected Data Environment that serves as the single integrated source of data for processing information. The data warehouse has specific characteristics that distinguish it from operational systems.
Subject-Oriented: Information is presented according to specific subjects or areas of interest, not simply as computer files. Data is manipulated to provide information about a particular subject. For example, the SRDB is not simply made accessible to end-users, but is provided structure and organized by subject.
Integrated: A single source of information for and about understanding multiple areas of interest. The data warehouse provides one-stop shopping and contains information about a variety of subjects. Thus the OIRAP data warehouse has information on students, faculty and staff, instructional workload, and student outcomes.
Non-Volatile: Stable information that doesn’t change each time an operational process is executed.
Accessible: The primary purpose of a data warehouse is to provide readily accessible information to
end-users.
Other Definitions
Data Warehouse: A data structure that is optimized for distribution. It collects and stores integrated sets of historical data from multiple operational systems and feeds them to one or more data marts.
Data Mart: A data structure that is optimized for access. It is designed to facilitate end-user analysis of data. It typically supports a single, analytic application used by a distinct set of workers.
Staging Area: Any data store that is designed primarily to receive data into a warehousing environment.
Operational Data Store: A collection of data that addresses the operational needs of various operational units. It is not a component of a data warehousing architecture, but a solution to operational needs.
Star Schema: A means of aggregating data based on a set of known dimensions. It stores data as facts and "dimensions" to facilitate analysis and understanding of the underlying data, typically in a relational DBMS such as Oracle.
Snowflake Schema: An extension of the star schema by means of applying additional dimensions to the dimensions of the star schema.
Multidimensional Database: Database management tools that store and manage data in a multidimensional manner, as opposed to the two dimensions of relational systems.
OLAP Tools: A set of software products that attempt to facilitate multidimensional analysis. Can incorporate data acquisition, data access, data manipulation, or any combination thereof.
The data warehouse is distinctly different from the operational data used and maintained by day-to-day operational systems. Data warehousing is not simply an "access wrapper" for operational data, where data is simply "dumped" into tables for direct access. Among the differences:
(A comparison table appeared here, contrasting operational data with warehouse data on dimensions such as initial development cycle, data elements, ownership, and volume: a small amount of data is used in an operational process, while a large amount of data is used in a warehouse process.)
• Identify, from key users, the significant business questions and key metrics that target users need.
• Decompose these metrics into their component parts with specific definitions.
• Map the component parts to the informational model and systems of record.
When you begin to develop your first data warehouse increment, the architecture is new and fresh. With each increment:
• Start with one subject area (or subset or superset) and one target user group.
• Continue and add subject areas, user groups, and informational capabilities to the architecture.
• Improvements are made from what was learned from previous increments.
• Improvements are made from what was learned about warehouse operation and support.
http://oirap.rutgers.edu/dwbasics.pdf