Data Warehouse:
It is a centrally managed and integrated database containing data from the operational sources
in an organization.
A data warehouse is a powerful database model that significantly enhances the user’s ability
to quickly analyze large multidimensional data sets.
It cleanses and organizes data to allow users to make business decisions based on facts.
The data in a data warehouse must therefore have strong analytical characteristics.
A data warehouse is a decisional database system.
Data Warehouse Delivers Enhanced Business Intelligence.
Data Warehouse Saves Time.
Data Warehouse Enhances Data Quality and Consistency.
Data Warehouse Provides Historical Intelligence.
Knowledge Discovery and Decision Support
Subject Oriented Data: - A data warehouse is organized around major subjects, such as
customer, vendor, product, and sales. It focuses on the modeling and analysis of data rather
than day-to-day business operations.
Integrated Data: -A data warehouse is constructed by integrating data from multiple
heterogeneous data sources.
Time Referenced Data: - A data warehouse is a repository of historical data. It gives the view
of the data for a designated time frame.
Non-Volatile Data: - A data warehouse is always a physically separate store of data
transformed from the application data found in the operational environment.
Due to this separation, a data warehouse does not require transaction processing, recovery,
and concurrency control mechanisms.
The non-volatility of data enables users to dig deep into history and arrive at specific business decisions based on facts.
Operational Database:-
Operational databases are often used for on-line transaction processing (OLTP). They deal with day-to-day operations such as banking, purchasing, manufacturing, registration, accounting, etc.
These systems typically get data into the database. Each transaction processes information
about a single entity. Following are some examples of OLTP queries:
– What is the price of 2GB Kingston Pen drive?
– What is the email address of the president?
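As a minimal illustration (the table and column names here are hypothetical, not from any specific schema), an OLTP query of this kind touches a single entity through a key lookup:

-- Hypothetical OLTP lookup: fetch the price of one product by name.
SELECT unit_price
FROM   products
WHERE  product_name = '2GB Kingston Pen Drive';

Such queries read or write a handful of rows, in contrast to the large scans and aggregations typical of warehouse queries.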
Subject-oriented:
A data warehouse is organized around major subjects, such as customer, vendor, product, and
sales. It focuses on the modeling and analysis of data rather than day-to-day business
operations.
Integrated:
A data warehouse is constructed by integrating data from multiple heterogeneous data sources.
Time variant:
A data warehouse is a repository of historical data. It gives the view of the data for a designated
time frame.
Non-volatile:
A data warehouse is always a physically separate store of data transformed from the application
data found in the operational environment.
Due to this separation, a data warehouse does not require transaction processing, recovery,
and concurrency control mechanisms.
Q3. Draw and explain data warehouse architecture.
Or
Explain the framework of data warehouse.
Data Warehouse Architecture:-
A Data Warehouse Architecture (DWA) is a way of representing the overall structure of data,
communication, processing and presentation that exists for end-user computing within the
enterprise.
The architecture is made up of a number of interconnected parts as follows:
Source System
Source data transport layer
Data Quality control and data profiling layer
Metadata management layer
Data integration layer
Data processing layer
End user reporting layer
The figure below shows the architecture of a data warehouse, which consists of various layers interconnected with each other.
Source System
Operational systems process data to support critical operational needs. In order to do this,
operational databases have been historically created to provide an efficient processing structure
for a relatively small number of well-defined business transactions.
The goal of data warehousing is to free the information locked up in the operational systems
and to combine it with information from other external sources of data.
Large organizations are acquiring additional data from outside databases. This information
includes demographic, econometric, competitive and purchasing trends.
Q4. Write any five significant differences between OLTP database and Data warehouse
database.
Q6. What are the various levels of data redundancy in data warehouse?
Or
Describe virtual data warehouse and central data warehouse
Virtual Data Warehouse:-
End users are allowed to access the operational databases directly, using whatever tools the data-access network enables. That is, it provides on-the-fly data for decision support purposes.
This approach is flexible and has a minimal amount of redundant data. However, it can put an unplanned query load on the operational systems.
The advantages of this approach are:
Flexibility
No data redundancy
Provides end-users with the most current corporate information
Virtual data warehouses often provide a starting point for organizations to learn what end
users are really looking for.
Granularity of Facts:-
The granularity of a fact is the level of detail at which it is recorded. If data is to be analyzed
effectively, it must be all at the same level of granularity.
The more granular the data, the more we can do with it. Excessive granularity brings needless
space consumption and increases complexity. We can always aggregate details.
Granularity is not absolute or universal across industries. This has two implications. First, grains
are not predetermined; second, what is granular to one business may be summarized for
another.
Adding elements to an existing key always increases the granularity of the data; removing any
part of an existing key decreases its granularity.
Granularity is also determined by the inherent cardinality of the entities participating in the
primary key. Data at the order level is more granular than data at the customer level, and data
at the customer level is more granular than data at the organization level.
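As a sketch of these ideas (table and column names are hypothetical), order-level data can always be rolled up to the coarser customer grain by aggregation, but the reverse is impossible:

-- Order grain (finest): one row per order.
-- Rolling up to the customer grain: one row per customer.
SELECT customer_id,
       SUM(order_amount) AS total_amount
FROM   orders
GROUP  BY customer_id;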
Q8. Explain the various types of additivity of facts with examples.
A fact is something that is measurable, typically a numeric value that can be aggregated.
Facts that are additive across all dimensions are referred to as Full Additive or Additive.
Example of Additive Fact:- Units sold in a retail sales fact table can be summed across the product, store, and time dimensions, so the measure is additive across all of them.
Semi-Additive:
Facts that are additive across some of the dimensions, but not all, are referred to as Semi-Additive.
Example of Semi-Additive Fact:-
Suppose a bank stores the current balance of each account at the end of each day.
The balance cannot be summed across the Time dimension. It does not make sense to sum the current balance across dates.
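A hedged SQL sketch of this semi-additivity (the daily_balance table is hypothetical): balances can be summed across accounts for a single day, but across time we must use something like an average instead of a sum:

-- Additive across accounts: total balance held on one day.
SELECT SUM(balance) AS total_balance
FROM   daily_balance
WHERE  balance_date = DATE '2005-06-30';

-- Not additive across time: use an average per account instead.
SELECT account_id,
       AVG(balance) AS avg_balance
FROM   daily_balance
GROUP  BY account_id;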
Non-Additive:
Facts that are not additive across any dimension are referred to as Non-Additive.
Non-additive facts are usually the result of ratios or other calculations; for example, a profit-margin percentage cannot be meaningfully summed across any dimension.
A star schema consists of a central fact table containing measures and a set of dimension
tables.
In star schema model a fact table is at the center of the star and the dimension tables as points
of the star.
A star schema represents one central set of facts. The dimension tables contain descriptions
about each of the aspects.
For example, in a warehouse that stores sales data, there is a sales fact table that stores facts about sales, while dimension tables store data about locations, clients, items, times, and branches.
Examples of sales facts are unit sales, dollar sales, sale cost etc. Facts are numeric values
which enable users to query and understand business performance metrics by summarizing
data.
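A minimal sketch of such a star schema in Oracle DDL (all names are hypothetical and reduced to a few columns):

-- Dimension tables: descriptive attributes.
CREATE TABLE dim_product (
  product_id   NUMBER PRIMARY KEY,
  product_name VARCHAR2(100)
);
CREATE TABLE dim_store (
  store_id     NUMBER PRIMARY KEY,
  store_name   VARCHAR2(100)
);
CREATE TABLE dim_time (
  time_id      NUMBER PRIMARY KEY,
  day_date     DATE
);
-- Fact table at the center: measures plus foreign keys to each dimension.
CREATE TABLE fact_sales (
  product_id   NUMBER REFERENCES dim_product (product_id),
  store_id     NUMBER REFERENCES dim_store (store_id),
  time_id      NUMBER REFERENCES dim_time (time_id),
  unit_sales   NUMBER,
  dollar_sales NUMBER
);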
Take a situation where a household can own many insurance policies, yet any policy could be
owned by multiple households.
The simple approach to this is the traditional resolution of the many-to-many relationship, called
an associative entity.
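A minimal sketch of that associative entity in SQL (hypothetical names): a bridge table whose composite key resolves the many-to-many relationship:

CREATE TABLE household (
  household_id NUMBER PRIMARY KEY
);
CREATE TABLE policy (
  policy_id    NUMBER PRIMARY KEY
);
-- Associative entity: one row per household-policy pairing.
CREATE TABLE household_policy (
  household_id NUMBER REFERENCES household (household_id),
  policy_id    NUMBER REFERENCES policy (policy_id),
  PRIMARY KEY (household_id, policy_id)
);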
Recursive relationships come in two forms: “self” relationships (1:M) and “bill of materials” relationships (M:M). A self-relationship involves one table whereas a bill of materials involves two.
These structures have been handled in ER modeling from its inception. They are generally
supported as a self-relating entity or as an associative entity in which an associate entity
(containing parent-child relationships) has dual relationships to a primary entity.
Step1:-
Asks for your email address and Oracle Support password to configure security updates.
Step2:-
Following are the installation options:
Create and configure a database
Install database software only
Upgrade an existing database
Step3:-
Here you can select the type of installation you want to perform.
The following are the installation types:
Single Instance database installation
Real Application Cluster database installation (RAC)
Step4:-
Select the language in which your product will run.
Step5:-
We can choose the edition of the database to install, Enterprise, Standard, Standard Edition
One, or Personal Edition.
Step6:-
This step asks us to specify the installation location for storing Oracle configuration files and
software files.
Step9:-
The actual installation happens in step 9. A progress bar proceeds to the right as the installation
happens and steps for Prepare, Copy Files, and Setup Files are checked off as they are done.
Step10:-
Shows the success or failure of database installation.
Q2. What are the hardware and software requirements for installing oracle warehouse
builder?
Configuring Listener:-
Run Net Configuration Assistant to configure a listener.
Step1:-
The first screen is a welcome screen. Select Listener Configuration option from it and then click
next button.
Step3:-
The third screen asks you to enter a name for the listener. The default name is “LISTENER”.
Enter a new name or continue with the default and then click next button to proceed.
Step5:-
The fifth and final screen asks for the TCP/IP port number on which the listener will run. The default port number is 1521; continue with the default.
It will ask us if we want to configure another listener. Select no to finish the listener
configuration.
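For reference, the assistant records the choices above in the listener.ora file. A minimal sketch of the resulting entry (the host name will vary by machine):

LISTENER =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = localhost)(PORT = 1521))
  )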
Step1:-
The first step is to specify what action to take. Since we do not have a database created, we'll
select the Create a Database option.
Step2:-
This step will offer the following three options for a database template to select:
General Purpose or Transaction Processing
Custom Database
Data Warehouse
We are going to choose the Data Warehouse option for our purposes
On the final Database Configuration Screen, there is a button in the lower right corner labeled
Password Management.
We’ll scroll down until we see the OWBSYS schema and click on the check box to uncheck it
(indicating we want it unlocked) and then type in a password and confirm it as shown in the
following image:
Design Center:-
The Design Center is the primary graphical user interface for creating the logical design of the data warehouse.
Design Center is used to:
import source objects
design ETL processes
define the integration solution
Control Center Manager:-
It is a part of the Design Center. It manages communication between the target schema and the Design Center. As soon as we define a new object in the Design Center, the object is listed in the Control Center Manager under its deployment location.
Repository Browser:-
The Host Name is the name assigned to the computer on which we've installed the database,
and we can just leave it at LOCALHOST.
The Port Number is the one we assigned to the listener back when we had installed it. It
defaults to the standard 1521.
The Oracle Service Name is the name of the database, i.e. ACMEDW.
Q8. What is design center? Explain the functions of project explorer and connection
explorer windows.
Design Center:-
The Design Center is the main graphical interface used for the logical design of the data
warehouse. Through Design Center we define our sources and targets and design our ETL
processes to load the target from the source. The logical design will be stored in a workspace in
the Repository on the server.
The diagram below shows the Design Center, which consists of the Project Explorer, Connection Explorer, and Global Explorer windows.
Project Explorer:-
In the Project Explorer window we can create objects that are relevant to our project. It has nodes for
each of the design objects we'll be able to create.
We need to design an object under the Databases node to model the source database. If we
expand the Databases node in the tree, we will notice that it includes both Oracle and Non-
Oracle databases.
It also has an option to pull data from flat files. The Project Explorer can also be used for defining
the target structure.
Connection Explorer:-
The Connection Explorer is where the connections are defined to our various objects in the
Project Explorer. The workspace has to know how to connect to the various databases, files,
and applications we may have defined in our Project Explorer.
As we begin creating modules in the Project Explorer, it will ask for connection information and
this information will be stored and be accessible from the Connection Explorer window.
Connection information can also be created explicitly from within the Connection Explorer.
Global Explorer
The Global Explorer is used to manage global objects that are shared across projects. It includes objects such as Public Transformations or Public Data Rules.
A transformation is a function, procedure, or package defined in the database in Oracle's
procedural SQL language called PL/SQL. Data rules are rules that can be implemented to
enforce certain formats in our data.
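As a minimal sketch of such a transformation (a hypothetical function, not one of OWB's predefined public transformations), a PL/SQL function could enforce an upper-case, trimmed format on incoming names:

CREATE OR REPLACE FUNCTION clean_name (p_name IN VARCHAR2)
  RETURN VARCHAR2
IS
BEGIN
  -- Enforce a consistent format: trim whitespace, upper-case the value.
  RETURN UPPER(TRIM(p_name));
END clean_name;
/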
Q9. Write a procedure to create new project in OWB. What is difference between a
module and a project?
The following steps are used to create a new project in OWB.
Step1: Launch the Design Center
Step2: Right-click on the project name in the Project Explorer and select Rename from the
resulting pop-up menu. Alternatively, we can select the project name, then click on the Edit
menu entry, and then on Rename.
Module:
Modules are grouping mechanisms in the Projects Navigator that correspond to locations in the
Locations Navigator. A single location can correspond to one or more modules. However, a
given module can correspond to only one metadata location and data location at a time.
The association of a module to a location enables you to perform certain actions more easily in
Oracle Warehouse Builder. For example, group actions such as creating snapshots, copying,
validating, generating, deploying, and so on, can be performed on all the objects in a module by
choosing an action on the context menu when the module is selected.
Project:-
A project contains one or more modules. There are two types of modules: Oracle and non-Oracle.
The steps involved in creating the module and importing the metadata for a flat file are:
1. The first task is to create a new module to contain our file definition. If we look in the
Project Explorer under our project, we'll see that there is a Files node right below the Databases
node. Right-click on the Files node and select New from the pop-up menu to launch the wizard.
3. We need to edit the connection in Step 2, so we'll click on the Edit button. As we see in the following image, it only asks us for a name, a description, and the path to the folder where the files are.
4. The Name field is prefilled with the suggested name based on the module name. As it did for
the database module location names, it adds that number 1 to the end. So, we'll just edit it to
remove the number and leave it set to ACME_FILES_LOCATION.
5. Notice the Type drop-down menu. It has two entries: General and FTP. If we select FTP (File
Transfer Protocol used for getting a file over the network), it will ask us for slightly more
information.
6. The simplest option is to store the file on the same computer on which we are running the
database. This way, all we have to do is enter the path to the folder that contains the file. We
should have a standard path we can use for any files we might need to import in the future. So
we create a folder called Getting Started with OWB_files, which we'll put in the D: drive. Choose
any available drive with enough space and just substitute the appropriate drive letter. We'll click
on the Browse button on the Edit File System Location dialog box, choose the file path, and
click on the OK button.
7. We'll then check the box for Import after finish and click on the Finish button.
Step2:-
The screen starts by suggesting a connection name based on the name we gave the module.
Click on the Edit button beside the Name field to fill in the details. This will display the following
screen:
Here, sales indicates data about products sold and to be sold in a company.
The dimensions become the business characteristics about the sales, for example:
• A time dimension—users can look back in time and check various time periods
• A store dimension—information can be retrieved by store and location
• A product dimension—various products for sale can be broken out
Think of the dimensions as the edges of a cube, and the intersection of the dimensions as the
measure we are interested in for that particular combination of time, store, and product.
A picture is worth a thousand words, so let's look at what we're talking about in the following
image:
Think of the width of the cube, or a row going across, as the product dimension. Every piece of
information or measure in the same row refers to the same product, so there are as many rows
in the cube as there are products.
Think of the height of the cube, or a column going up and down, as the store dimension. Every
piece of information in a column represents one single store, so there are as many columns as
there are stores.
Finally, think of the depth of the cube as the time dimension, so any piece of information in the
rows and columns at the same depth represent the same point in time. The intersection of each
of these three dimensions locates a single individual cube in the big cube, and that represents
the measure amount we're interested in. In this case, it's dollar sales for a single product in a
single store at a single point in time.
In a relational implementation, data is organized into dimension tables, fact tables and materialized
views. A multidimensional implementation requires a database with special features that allow it
to store cubes as actual objects in the database.
The data required for the analysis is extracted from relational data warehouse or other data
sources and loaded in a multidimensional database which looks like a hypercube. Hypercube is
a cube with many dimensions.
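In a purely relational implementation, the cube's aggregations can be approximated with SQL. A hedged sketch (reusing the hypothetical fact_sales table from the star schema example above): GROUP BY CUBE computes the measure for every combination of the listed dimensions, including subtotals and the grand total:

SELECT store_id,
       product_id,
       SUM(dollar_sales) AS dollar_sales
FROM   fact_sales
GROUP  BY CUBE (store_id, product_id);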
The association of a module to a location enables you to perform certain actions more easily in
Oracle Warehouse Builder. For example, group actions such as creating snapshots, copying,
validating, generating, deploying, and so on, can be performed on all the objects in a module by
choosing an action on the context menu when the module is selected.
All modules, including their source and target objects, must have locations associated with them
before they can be deployed. You cannot view source data or deploy target objects unless there
is a location defined for the associated module.
Source Module:-
A source module contains the definitions of the source objects from which data is extracted.
It accepts no input from the data stream because it is used at the start of a workflow.
It is the place where the source data is stored.
Target Module:-
A target module contains the definitions of the target objects of the data warehouse.
It accepts input from the data stream.
It is the place into which the data is loaded.
Q5. List and explain the functionalities that can be performed by OWB in order to create
data warehouse.
The Oracle Warehouse Builder is a tool provided by Oracle, which can be used at every stage
of the implementation of a data warehouse, from initial design and creation of the table structure
to the ETL process and data-quality auditing. So, the answer to the question of where it fits in is
everywhere.
List of Functions:
Data modeling
Extraction, Transformation, and Load (ETL)
Data profiling and data quality
Metadata management
Business-level integration of ERP application data
Integration with Oracle business intelligence tools for reporting purposes
Advanced data lineage and impact analysis
Oracle Warehouse Builder is also an extensible data integration and data quality solutions
platform. Oracle Warehouse Builder can be extended to manage metadata specific to any
application, and can integrate with new data source and target types, and implement support for
new data access mechanisms and platforms, enforce your organization's best practices, and
foster the reuse of components across solutions.
Target schema:-
A target schema contains the data objects that contain your data warehouse data. The target
schema is going to be the main location for the data warehouse. When we talk about our "data
warehouse" after we have it all constructed and implemented, the target schema is what we will
be referring to. You can design a relational target schema or a dimensional target schema.
Every target module must be mapped to a target schema.
Step1
The Time/Date dimension provides the time series information to describe warehouse data.
Most of the data warehouses include a time dimension.
Also the information it contains is very similar from warehouse to warehouse. It has levels such
as days, weeks, months, etc. The Time dimension enables the warehouse users to retrieve data
by time period.
Step5: Summary of Time Dimension before creation of the Sequence and Map
Dimension Attributes
Attributes are the actual data items stored in the dimension that can be found at more than one level. For example, the Time dimension has the following attributes at each level: an ID (identifies that level), start and end dates (designating the time period of that level), time span (the number of days in the time period), description, etc.
Level Attributes
Each level has Level Attributes associated with it that provide descriptive information about the
value in that level. For example, Day level has level attributes such as day of week, day of
month, day of quarter, day of year etc.
Hierarchies
It is composed of certain levels in order. There can be one or more hierarchies in a dimension. For example, month, quarter, and year can form a hierarchy. The data can be viewed at each of these levels and rolled up to the next level.
Example:-
The source tables have columns such as AIRPORT_NAME or CITY_NAME, which are stated as the primary keys (according to the business users); but these can change, and we could consider creating a surrogate key called, say, AIRPORT_ID.
This would be internal to the warehouse system; as far as the client is concerned, you may display only the AIRPORT_NAME. Surrogate keys are numeric values, and hence indexing is faster.
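A minimal sketch of this surrogate-key approach in Oracle (names are hypothetical): a sequence generates AIRPORT_ID values that stay stable even if AIRPORT_NAME changes:

CREATE SEQUENCE airport_seq START WITH 1 INCREMENT BY 1;

CREATE TABLE dim_airport (
  airport_id   NUMBER PRIMARY KEY,   -- surrogate key, internal to the warehouse
  airport_name VARCHAR2(100),        -- business key shown to the client; may change
  city_name    VARCHAR2(100)
);

INSERT INTO dim_airport (airport_id, airport_name, city_name)
VALUES (airport_seq.NEXTVAL, 'Heathrow', 'London');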
ETL stands for Extract, Transform, and Load. The ETL process transforms the data from an application-oriented structure into a corporate data structure. Once the source and target structures are defined, we can move on to the following activities in constructing a data warehouse.
The data warehouse architect builds a source-to-target data map before ETL processing starts. The source-to-target map specifies how data flows from the source elements to the target elements.
The data mapping is the input needed to feed the ETL process. Mappings are visual
representations of the flow of data from source to target and the operations that need to be
performed on the data.
Q2. What is staging? What are its benefits? Explain the situation where staging is
essential.
Staging:-
Staging is the process of copying the source data temporarily into tables in target database. The
purpose is to perform any cleaning and transformations before loading the source data into the
final target tables. Staging stores the results of each logical step of transformation in staging
tables. The idea is that in case of any failure you can restart your ETL from the last successful
staging step.
This process will take a lot longer if we directly access the remote database to pull and transform data. We'll also be doing all of the manipulations and transformations in memory, and if anything fails, we'll have to start all over again.
Benefits:-
• Source database connection can be freed immediately after copying the data to the staging
area. The formatting and restructuring of the data happens later with data in the staging area.
• If the ETL process needs to be restarted, there is no need to go back to disturb the source
system to retrieve the data.
This is the layer where the cleansed and transformed data is temporarily stored. The data is held in the staging database until it is ready to be loaded into the warehouse. The advantage of using the staging database is that it adds a point in the ETL flow from which we can restart the load. The other advantages of using a staging database are that we can directly utilize the bulk-load utilities provided by the databases and ETL tools while loading the data into the warehouse/mart, and that it provides a point in the data flow where we can audit the data.
In the absence of a staging area, the data load would have to go from the OLTP system to the OLAP system directly, which would severely hamper the performance of the OLTP system. This is the primary reason for the existence of a staging area. Pushing data into staging without applying any business rules takes less time, because no business rules or transformations are applied at that point.
Disadvantages:
Staging takes more space in the database, which may not be cost-effective for the client.
The main disadvantage of staging is disk space, as we have to dump the data into a local area.
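To make the idea concrete, here is a hedged sketch of a staging copy in plain SQL, using the POS_TRANS_STAGE table that the next answer builds (the source table name and the database link source_db are hypothetical, and OWB would generate its own code for this):

-- Create the staging table with the same structure as the source,
-- without copying any rows yet.
CREATE TABLE pos_trans_stage AS
  SELECT * FROM pos_transactions@source_db WHERE 1 = 0;

-- Bulk-copy the source rows unchanged; transformations happen later,
-- reading from the staging table instead of the remote source.
INSERT /*+ APPEND */ INTO pos_trans_stage
  SELECT * FROM pos_transactions@source_db;
COMMIT;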
Q3. Write the steps for building staging area table using Data Object Editor.
It is explained here with an example.
STEP 1:-Navigate to the Databases | Oracle | ACME_DATA WAREHOUSE module. We will
create our staging table under the Tables node, so let’s right-click on that node and select
New.... from the pop-up menu.
STEP 2:-Upon selecting New.... we are presented with the Data Object Editor screen. However,
instead of looking at an object that’s been created already, we’re starting with a brand-new one.
STEP 3:-The first tab is the Name tab, where we'll give our new table a name. Let's call it
POS_TRANS_STAGE for Point-of-Sale transaction staging table. We’ll just enter the name into
the Name field, replacing the default TABLE_1 that it suggested for us.
STEP 4:-Let’s click on the Columns tab next and enter the information that describes the
columns of our new table. We have listed the key data elements that we will need for creating
the columns. We didn’t specify any properties of those data elements other than the name, so
we’ll need to figure that out.
Q4. What are mapping operators? Explain any two source target mapping operators in
detail.
Mapping operators
These are the basic design elements used to construct an ETL mapping. They represent sources and targets in the data flow, and also how the data is transformed on its way from source to target.
Q5. List and explain the use of various windows available in mapping editor.
Mapping-The mapping window is the main working area on the right where we will design the mapping. This window is also referred to as the canvas.
Explorer-This window is similar to the Project Explorer in the Design Center. It has two tabs: the Available Objects tab and the Selected Objects tab.
Mapping properties-The Mapping properties window displays the various properties that can be set for objects in our mapping. When an object is selected in the canvas, its properties are displayed in this window.
Palette-The palette contains each of the objects that can be used in our mapping. We can click on the object we want to place in the mapping and drag it onto the canvas.
Bird’s Eye View-This window displays a miniature version of the entire canvas and allows us to scroll around the canvas without using the scroll bars.
Cube Operator: An operator that represents a cube. This operator will be used to represent
cube in our mapping.
Dimension Operator: An operator that represents dimensions. This operator will be used in our
mapping to represent them.
Joiner
• This operator will implement a SQL join on two or more input sets of data and produce a single output row set.
• A join takes records from one source and combines them with the records from another source
using some combination of values that are common between the two.
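A sketch of the SQL that a Joiner effectively produces (hypothetical source tables joined on a common key):

SELECT o.order_id,
       c.customer_name,
       o.order_amount
FROM   orders o
JOIN   customers c
  ON   c.customer_id = o.customer_id;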
Q8. What are data flow operators? Explain the concept of pivot operator with example.
Data flow operators
A data warehouse requires restructuring of the source data into a format that is congenial for the
analysis of data. The data flow operators are used for this purpose. These operators are
dragged and dropped into our mapping between our sources and targets. Then they are
connected to those sources and targets to indicate the flow of data and the transformations that
will occur on that data as it is being pulled from the source and loaded into the target structure.
Pivot
The pivot operator enables you to transform a single row of attributes into multiple rows.
Suppose we have source records of sales data for the year that contain a column for each quarter's sales in a single row.
We wish to transform the data set to the following, with a row for each quarter:
2005 Q2 15000
2005 Q3 14000
2005 Q4 25000
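Assuming the source row carries one column per quarter (a hypothetical layout, and note that OWB's pivot operator is configured graphically rather than written by hand), the same reshaping can be expressed in Oracle 11g SQL with UNPIVOT:

SELECT sales_year,
       quarter,
       amount
FROM   yearly_sales
UNPIVOT (amount FOR quarter IN (q1_sales AS 'Q1',
                                q2_sales AS 'Q2',
                                q3_sales AS 'Q3',
                                q4_sales AS 'Q4'));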
• It has two groups defined—an input group, INGRP1—and an output group, OUTGRP1.
• Link the SALE_DATE attribute of source table to the INGRP1 of the EXPRESSION operator.
• Right-click on OUTGRP1 and select Open Details... from the pop-up menu.
• This will display the Expression Editor window for the expression.
• Click on the Output Attributes tab and add a new output attribute OUTPUT1 of number type
and click OK.
• Click on OUTPUT1 output attribute in the EXPRESSION operator and turn our attention to the
property window of the Mapping Editor.
Partition Tab:
A partition is a way of breaking down the data stored in a table into subsets that are stored
separately. This can greatly speed up data access for retrieving random records, as the
database will know the partition that contains the record being searched for based on the
partitioning scheme used.
It can directly home in on a particular partition to fetch the record by completely ignoring all the
other partitions that it knows won't contain the record.
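A minimal sketch of range partitioning in Oracle DDL (hypothetical table): each year's rows land in their own partition, so a query for a given sale date only touches one partition:

CREATE TABLE sales_part (
  sale_date    DATE,
  dollar_sales NUMBER
)
PARTITION BY RANGE (sale_date) (
  PARTITION p2004 VALUES LESS THAN (DATE '2005-01-01'),
  PARTITION p2005 VALUES LESS THAN (DATE '2006-01-01')
);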
Q2. What is the purpose of main attribute group in a cube? Discuss about dimension
attributes and measures in the cube.
The first group represents the main attributes for the cube and contains the data elements to which we will need to map. The other groups represent the dimensions that are linked to the cube. As far as the dimensions are concerned, we create separate mappings for them prior to the cube mapping.
The data we map for the dimensions will be to attributes in the main cube group, which will
indicate to the cube which record is applicable from each of the dimensions.
• Cube has attributes for surrogate and business identifiers defined for each dimension of the
cube.
• All business identifiers are prefixed with the name of the dimension.
• The name of a dimension is used as the surrogate identifier for that dimension.
• Say for example, if SKU and NAME are two business identifiers in PRODUCT dimension, then
the main attribute group will have three PRODUCT related identifiers; PRODUCT_SKU,
PRODUCT_NAME, PRODUCT.
Q7. Write the steps for validating and generating in Data Object Editor
(I) Validating in the Data Object Editor:
Consider that we have the POS_TRANS_STAGE table, i.e. the staging table, defined.
Let's double-click on the POS_TRANS_STAGE table name in the Design Center to launch the Data Object Editor so that we can discuss validation in the editor.
(i) We can right-click on the object displayed on the Canvas and select Validate from the pop-up menu.
(ii) We can select Validate from the Object menu on the main editor menu bar.
(iii) To validate every object currently loaded into the Data Object Editor, select Validate All from the Diagram menu entry on the main editor menu bar. We can also press the Validate icon on the General Toolbar, which is circled in the following image of the toolbar icons:
Unlike when validating from the Design Center, here we get another window created in the editor, the Generation window, which appears below the Canvas window.
When we validate from the Data Object Editor, it is on an object-by-object basis for objects
appearing in the editor canvas. But when we validate a mapping in the Mapping editor, the
mapping as a whole is validated all at once. Let's close the Data Object Editor and move on to
discuss validating in the Mapping Editor.
(i) Launch the Data Object Editor and open our POS_TRANS_STAGE table in the editor by double-clicking on it in the Design Center.
(ii) The Generate entry on the pop-up menu when we right-click on an object,
(iii)Generate icon on the general toolbar right next to the Validate icon as shown in the following
image:
Q8. What is object deployment? Explain the functions of control center manager.
Deployment is the process of creating physical objects in the target schema based on the
logical definitions created using the Design Center.
If no attribute group is selected when we select the intermediate option in the drop-down menu,
we'll immediately get a message in the Script tab saying the following:
Please select an attribute group.
When we click on an attribute group in any operator on the mapping, the Script window
immediately displays the code for setting the values of that attribute group.
When we selected the Intermediate generation style, the drop-down menu and buttons on the right-hand side of the window became active. We have a number of options for further investigation of the generated code.
If we need to make modifications later and something goes wrong, or we just want to reproduce
a system from an earlier point in time, we have a ready-made copy available for use. We won't
have to try to manually back out of any changes we might have made.
We would also be able to make comparisons between that saved version of the object and the
current version to see what has been changed.
The Warehouse Builder has a feature called the Recycle Bin for storing deleted objects and mappings for later retrieval.
It allows us to make copies of objects by including a clipboard for copying and pasting to and
from, which is similar to an operating system clipboard. It also has a feature called Snapshots,
which allows us to make a copy (or snapshot) of our objects at any point during the cycle of
developing our data warehouse that can later be used for comparisons.
Q2. What is recycling bin? Describe the features of warehouse builder recycle bin
window.
Recycle Bin
The Recycle Bin stores deleted objects and mappings so that they can be retrieved later if needed.
Full Snapshots:
Full snapshots provide complete metadata of an object that you can use to restore it later.
They are therefore suitable for making backups of objects.
Full snapshots take longer to create and require more storage space than signature snapshots.
Signature Snapshots:
Signature snapshots capture only the signatures of the selected objects. A signature is sufficient to detect and report changes in metadata, but it cannot be used to restore an object. Signature snapshots are therefore quicker to create and require less storage space than full snapshots.
Q4. What are the different operations that can be performed on a snapshot of an object
that is created?
Workspace objects can be exported and saved to a file. We can export anything from an entire project down to a single data object or mapping. Following are the benefits of Metadata Loader exports and imports:
Backup
To transport metadata definitions to another repository for loading
If we choose an entire project or a collection such as a node or module, it will export all objects
contained within it. If we choose any subset, it will also export the context of the objects so that
it will remember where to put them on import.
Say for example if we export a table, the metadata also contains the definition for:
The module in which it resides
The project the module is in.
We can also choose to export any dependencies on the object being exported if they exist.
Following steps are used to export object in Design Center:-
Select the project by clicking on it and then select Design | Export |Warehouse Builder
Metadata from the main menu.
Accept the default file names and locations, and click on the Export button.
1. Metadata Snapshots
2. The Import Metadata Wizard
Metadata Snapshots:-
A snapshot captures all the metadata information about the selected objects and their
relationships at a given point in time. While an object can only have one current definition in a
workspace, it can have multiple snapshots that describe it at various points in time.
Snapshots are stored in the Oracle Database, in contrast to Metadata Loader exports, which are
stored as separate disk files. You can, however, export snapshots to disk files. Snapshots are
also used to support the recycle bin, providing the information needed to restore a deleted
metadata object.
When you take a snapshot, you capture the metadata of all or specific objects in your
workspace at a given point in time. You can use a snapshot to detect and report changes in
your metadata. You can create snapshots of any objects that you can access from the Projects
Navigator.
A snapshot of a collection is not a snapshot of just the shortcuts in the collection but a snapshot
of the actual objects.
Import Metadata Wizard:-
The Import Metadata Wizard automates importing metadata from a database into a module in
Oracle Warehouse Builder. You can import metadata from Oracle Database and non-Oracle
databases.
Each module type that stores source or target data structures has an associated Import Wizard,
which automates the process of importing the metadata to describe the data structures.
Importing metadata saves time and avoids keying errors, for example, by bringing metadata
definitions of existing database objects into Oracle Warehouse Builder.
The Welcome page of the Import Metadata Wizard lists the steps for importing metadata from
source applications into the appropriate module. The Import Metadata Wizard for
Oracle Database supports importing of tables, views, materialized views, dimensions, cubes,
external tables, sequences, user-defined types, and PL/SQL transformations directly or through
object lookups using synonyms.
When you import an external table, Oracle Warehouse Builder also imports the associated
location and directory information for any associated flat files.
Synchronizing means maintaining consistency between a workspace object and its mapping operator.
It can be achieved with the help of:-
Inbound Synchronization
Outbound Synchronization
Inbound Synchronization:-
Inbound synchronization uses the specified repository object to update the matching operator in our mapping. It means that changes in the workspace object will be reflected in the mapping operator.
Outbound Synchronization:-
The Outbound option updates the workspace object with the changes we've made to the operator in the mapping. It means that changes in the mapping operator will be reflected in the workspace object.
Following are the three matching strategies:-
Match by Object Identifier
Each source attribute is identified with a uniquely created ID internal to the Warehouse Builder
metadata. The unique ID stored in the operator for each attribute is exactly the same as that of the corresponding attribute in the workspace object with which the operator is synchronized. This matching strategy compares the unique object identifier of an operator attribute with that of a workspace object.
Match by Object Name
This strategy matches the bound names of the operator attributes to the physical names of the
workspace object attributes.
OLAP Terminologies:
Cube:-
Data in OLAP databases is stored in cubes. Cubes are made up of dimensions and measures.
A cube may have many dimensions.
Dimensions:-
In an OLAP database cube, categories of information are called dimensions. Some dimensions
could be Location, Products, Stores, and Time.
ROLAP stands for Relational Online Analytical Processing. ROLAP tools provide a multidimensional view of relational data. The diagram below shows the architecture of ROLAP.
All of the relational OLAP vendors store the data in a special way known as a star or snowflake
schema. The most common form of these stores the data values in a table known as the fact table.
One dimension is selected as the fact dimension and this dimension forms the columns of the
fact table. The other dimensions are stored in additional tables with the hierarchy defined by
child-parent columns.
Since SQL was designed as an access language to relational databases, it is not necessarily
optimal for multidimensional queries.
The vast majority of ROLAP applications are for simple analysis of large volumes of information.
Retail sales analysis is the most common one.
The complexity of setup and maintenance has resulted in relatively few applications of ROLAP
to financial data warehousing applications such as financial reporting or budgeting.
Q11. Explain RAP (Real-Time Analytical Processing).
RAP takes the approach that derived values should be calculated on demand, not pre-
calculated. This avoids both the long calculation time and the data explosion that occur with the
pre-calculation approach used by most OLAP vendors.
In order to calculate on demand quickly enough to provide fast response, data must be stored in
memory. This greatly speeds calculation and results in very fast response to the vast majority of
requests.
Another refinement of this would be to calculate numbers when they are requested but to retain
the calculations (as long as they are still valid) so as to support future requests. This has two
compelling advantages.
First, only those aggregations which are needed are ever performed. In a database with a
growth factor of 1,000 or more, many of the possible aggregations may never be requested.
Second, in a dynamic, interactive update environment, budgeting being a common example, calculations are always up to date.
There is no waiting for a required pre-calculation after each incremental data change. It is
important to note that since RAP does not pre-calculate, the RAP database is typically 10 per
cent to 25 per cent the size of the data source.
Data Explosion:-
Consider what happens when a 200 MB source file explodes to 10 GB. The database no longer
fits on any laptop for mobile computing. When a 1 GB source file explodes to 50 GB it cannot be
accommodated on typical desktop servers.
In both cases, the time to pre-calculate the model for every incremental data change will likely
be many hours. So even though disk space is cheap, the full cost of pre-calculation can be
unexpectedly large.
Therefore it is critically important to understand what data sparsity and data explosion are, what causes them, and how they can be avoided, for the consequences of ignoring data explosion can be very costly and, in most cases, result in project failure.
Data Sparsity:-
Input data or base data in OLAP applications is typically sparse (not densely populated). Also,
as the number of dimensions increases, data will typically become sparser (less dense). Data
explosion is the phenomenon that occurs in multidimensional models where the derived or
calculated values significantly exceed the base values.
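As a hypothetical worked example of these two effects: a cube with 100 products, 50 stores, and 365 days has 100 x 50 x 365 = 1,825,000 potential cells; if only about 18,000 base facts actually exist, the data is roughly 1 per cent dense, i.e. sparse. If pre-calculation then stores aggregates at every combination of hierarchy levels, the stored values can exceed the base values many times over, which is exactly the growth factor described above.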