
Data Warehouse and Data Mart

A data mart is a simple form of a data warehouse that is focused on a single subject (or functional
area), such as Sales, Finance or Marketing.
A data warehouse is a collection of multiple subject areas. It is the central unit formed by
combining all the data marts.

ETL Plan
What is ETL Plan?
ETL stands for Extract, Transform and Load. An ETL plan designs the flow of data from the source to the target.
Extract (Source) ----------> Transform (Transformation Rule) ----------> Load (Target)
Source Definition: The structure of the source table from which the data is extracted.
Target Definition: The structure of the target table to which the data is loaded.
Transformation Rule: The business logic used for transforming the data.
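
As a minimal sketch, a transformation rule is often just SQL applied between the extract and the load.
All table and column names below are hypothetical examples:

    -- Extract rows from a source table, apply a transformation rule, load the target.
    INSERT INTO dw_orders (order_id, customer_id, amount_usd)
    SELECT order_id,
           customer_id,
           amount * exchange_rate            -- transformation rule: convert to USD
    FROM   src_orders
    WHERE  order_date >= DATE '2024-01-01';  -- extract only the new data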

OLTP and OLAP


What is OLTP and OLAP?
OLTP stands for OnLine Transaction Processing. It is designed for business transaction processing
and for fast storage of data. The data here is in normalized form, without duplicates.
OLAP stands for OnLine Analytical Processing. It is designed for analyzing the business and for
fast retrieval of data. The data here is in de-normalized form, with duplicates.
OLTP ----------> ETL ----------> OLAP
OLTP | OLAP
To support business transaction processing | To support the decision-making process
Volatile data | Non-volatile data
Current data | Historical data
Detailed data | Summary data
Designed for running the business | Designed for analyzing the business
Normalization | De-normalization
Application-oriented data | Subject-oriented data
ER modelling | Dimensional modelling
NOTE: ETL stands for Extract, Transform and Load
http://prashanthobiee.blogspot.in/2012/12/oltp-and-olap.html
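
To make the normalization difference concrete, here is a minimal sketch (all names are hypothetical)
of the same sales data stored both ways:

    -- OLTP (normalized): customer attributes live once, in their own table
    CREATE TABLE customers (
      customer_id   INT PRIMARY KEY,
      customer_name VARCHAR(50),
      city          VARCHAR(50)
    );
    CREATE TABLE orders (
      order_id    INT PRIMARY KEY,
      customer_id INT REFERENCES customers (customer_id),
      amount      NUMERIC(10,2)
    );

    -- OLAP (de-normalized): customer attributes are repeated on every row
    -- so that queries can be answered quickly, without joins
    CREATE TABLE sales_flat (
      order_id      INT,
      customer_name VARCHAR(50),
      city          VARCHAR(50),
      amount        NUMERIC(10,2)
    );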

Star Schema and Snowflake Schema


What is Star schema and SnowFlake schema?
Star Schema: A star schema is a schema in which a fact table is connected to multiple dimensions and
the dimension tables do not have any parent tables.
Snowflake Schema: A snowflake schema is a schema in which a fact table is connected to multiple
dimensions and the dimension tables have one or more parent tables. In other words, a snowflake schema is
"a star schema with dimensions connected to some more dimensions"

Star Schema vs Snowflake schema


Star Schema | Snowflake Schema
Has data redundancy (duplicate data) and is difficult to maintain | Has no data redundancy and is easy to maintain
Has de-normalized tables | Has normalized tables
Suitable for a large data warehouse | Suitable for a small data warehouse
Dimension tables are not connected to other dimension tables | Dimension tables are connected to other dimension tables
Queries are less complex and easy to understand | Queries are more complex and difficult to understand
Has fewer joins | Has more joins
Less query execution time | More query execution time because of the complex queries

ODBC and OCI


What is ODBC and OCI?
ODBC stands for Open Database Connectivity and is also known as the Universal Data Connector.
ODBC can be used to connect to any type of data source.
OCI stands for Oracle Call Interface and is used to connect only to Oracle data sources.

Slowly Changing Dimension


What is a SLOWLY CHANGING DIMENSION?
A Slowly Changing Dimension is a dimension that changes over time.
Example:
An employee's location can change over time.
Slowly Changing Dimensions can be categorized into 3 types.
SCD Type 1
Slowly Changing Dimension Type 1
In SCD Type 1 the old data is overwritten by the new data. The old data is permanently deleted.
Let us consider an example:
EmpNo Name Location
1001 Mark London
Consider that Mark moves to Manchester. In SCD Type 1 the old data completely vanishes, so the data
will be stored as:
EmpNo Name Location
1001 Mark Manchester
There are no traces of the old location.
Advantages:
SCD Type 1 can be easily implemented, as the data is simply overwritten and we don't store the
historical data.
Disadvantages:
The disadvantage of SCD Type 1 is that the organisation doesn't know where Mark worked before, i.e. there
is no trace that Mark worked in London.
NOTE: SCD Type 1 stores only current data
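
In SQL, a Type 1 change is a plain overwrite. A minimal sketch against a hypothetical dim_employee
table matching the example above:

    -- SCD Type 1: overwrite in place; the old location is gone for good
    UPDATE dim_employee
    SET    location = 'Manchester'
    WHERE  emp_no   = 1001;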
SCD Type 2
Slowly Changing Dimension Type 2
In SCD Type 2 a new record is inserted in addition to the old record. In SCD Type 2 historical
data is maintained, so you will have both the old record and the new record.
Let us consider the example we used in SCD Type 1:
EmpKey Name Location
1001 Mark London
Now, Mark moves from London to Manchester. In SCD Type 2 a new record is created in addition to the old record.
So, now you have:
EmpKey Name Location
1001 Mark London
2001 Mark Manchester
If he moves from Manchester to Dublin then you have:
EmpKey Name Location
1001 Mark London
2001 Mark Manchester
3001 Mark Dublin
So, historical data is maintained.
Advantages:
As SCD Type 2 contains historical data, the organisation can track where Mark worked before moving to
Manchester.
Disadvantages:
As it maintains historical data, the size of the table grows gradually.
Note: SCD Type 2 stores current data + historical data
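
A sketch of the same change in SQL, assuming a hypothetical dim_employee table. Real Type 2
implementations usually also carry effective-date or current-flag columns, which the example above omits:

    -- SCD Type 2: keep the old row and insert a new row with a new surrogate key
    INSERT INTO dim_employee (emp_key, name, location)
    VALUES (2001, 'Mark', 'Manchester');
    -- With an (assumed) current_flag column you would also expire the old row:
    -- UPDATE dim_employee SET current_flag = 'N' WHERE emp_key = 1001;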

SCD Type 3
Slowly Changing Dimension Type 3
In SCD Type 3, a new column is added to the original data, which holds the partial historical data.
Let us consider the same example that we used in SCD Type 1 and SCD Type 2:
EmpNo Name Location
1001 Mark London
Consider that Mark moves from London to Manchester; then a new column is added to the previous data as
shown below:
EmpNo Name CurrentLocation PreviousLocation
1001 Mark Manchester London
Consider that Mark moves again from Manchester to Dublin; then you have:
EmpNo Name CurrentLocation PreviousLocation
1001 Mark Dublin Manchester
So, SCD Type 3 maintains partial history.
Advantages:
It maintains partial history, and with this the size of the table doesn't increase as much as it does in SCD Type 2.
Disadvantages:
The organisation doesn't have complete information about the employee, as it cannot track the complete history
of the employee, i.e. from the above example, when Mark moves to Dublin the organisation doesn't have
any trace of his working in London.
Note: SCD Type 3 maintains current data + partial history
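
In SQL, a Type 3 change shifts the current value into the history column before overwriting it.
A sketch with hypothetical column names:

    -- SCD Type 3: push the current value into PreviousLocation, then overwrite
    UPDATE dim_employee
    SET    previous_location = current_location,
           current_location  = 'Dublin'
    WHERE  emp_no = 1001;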

Import Excel Worksheet


How do we import data from an Excel worksheet?
To import metadata from an Excel sheet we need to create an ODBC data source (DSN) for it. This can
be achieved by using the following steps:
Open Control Panel > Administrative Tools > Data Sources (ODBC)
Click on the System DSN tab
Click on Add
Select the Excel driver from the given list
Click on OK
A new window opens
Enter the data source name
Select the Excel workbook from which you want to import metadata
Now open the BI Administration tool
Go to File > Import Metadata
Select the data source that you have just created
Now you can import the required data from your Excel worksheet

Define repository in terms of OBIEE


The repository stores the metadata information. The OBIEE repository is a file, and the extension of
the repository file is .rpd.
· All the rules needed for security, data modeling, aggregate navigation, caching, and connectivity are
stored in the metadata repository.
· Each metadata repository can store multiple business models, but the BI Server cannot access multiple
repositories at the same time.

OBIEE 10g vs OBIEE 11g?


What is the difference between OBIEE 10g and OBIEE 11g?
In OBIEE 11g, a database repository must be created before installing OBIEE by using the Repository Creation
Utility (RCU) tool
OBIEE 11g uses a WebLogic server as the application server whereas OBIEE 10g uses OC4J
Many configuration settings (such as uploading a repository into the BI Server) can be done using EM
OBIEE 11g displays table names and column names while mapping whereas OBIEE 10g displays
only table names
In OBIEE 11g the join is done from fact to dimension whereas in 10g the join is done from dimension to fact
In 10g Users and Groups (or Roles) are created in the repository whereas in 11g Users and Groups are
created in EM
Groups no longer exist and are replaced by Application Roles. Data-level security is implemented by
using the Application Roles to which users belong.
In the Presentation Catalog, the AuthenticatedUser role is used instead of the Everyone group
LTS priority ordering is introduced
We can model a lookup table in the repository
Presentation layer hierarchies are introduced
New time series functions PERIOD ROLLING and AGGREGATE AT are introduced
Time series functions can be created in the front end also
The Aggregate Persistence Wizard creates indexes automatically
Session variables are initialized only when they are used
In addition to the existing views, the Map View is introduced in 11g
Ragged and skip-level hierarchies are supported in 11g
The parent-child hierarchy is introduced
A presentation variable can hold multiple values
KPIs and Scorecards are introduced in 11g
Action Links, master-detail reports and selection steps are introduced
SELECT_PHYSICAL is supported
Requests are renamed as Analyses
iBots are renamed as Agents
Charts are renamed as Graphs

Conformed Dimension
What is a CONFORMED DIMENSION?
A dimension which exists in (is shared by) more than one fact table is known as a conformed dimension

Factless fact table


What is a Factless Fact table?
A fact table is a table which consists of the measurements, metrics or facts of a business process. A
factless fact table is a fact table that doesn't contain any facts: it captures events or relationships
but holds no measures for calculation.
Example:
Tracking student or employee attendance, as sketched below.
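
A minimal sketch with hypothetical names: the fact table has only foreign keys, and attendance
is measured by counting rows rather than summing a measure:

    -- Factless fact table: only keys, no measure columns
    CREATE TABLE fact_attendance (
      date_key    INT,   -- references a date dimension
      student_key INT    -- references a student dimension
    );

    -- "How many days did each student attend?" is answered by counting rows
    SELECT student_key, COUNT(*) AS days_attended
    FROM   fact_attendance
    GROUP  BY student_key;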

Degenerate Dimension
What is a Degenerate Dimension?
A degenerate dimension is a dimension which is stored in the fact table itself, with no dimension table of its own.
Ex:
Consider the SH schema: if a Sale_id column is added to the Sales fact table, then Sale_id is
known as a degenerate dimension, as sketched below.
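
A sketch of the idea, with hypothetical column names:

    CREATE TABLE sales_fact (
      sale_id     INT,            -- degenerate dimension: no dim_sale table exists
      product_key INT,
      amount_sold NUMERIC(10,2)
    );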

Alias Table
What is an Alias table and why is it used?
An alias table (alias) is a physical table that references a different physical table as its source.
Advantages of using an alias table:
It allows you to reuse an existing table more than once, without having to import it
several times
It can be used to avoid circular joins by creating multiple tables, each with different keys,
names or joins. For example, Order date and Shipping date may reference the same
column in the time dimension table; by using aliases you can create two different
tables, OrderDate and ShippingDate, as sketched below
It can be used for best-practice naming conventions, as you can rename the alias while
leaving the original physical table as it is
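
The physical SQL the BI Server would generate for two such aliases looks roughly like this sketch
(table and column names are hypothetical):

    -- OrderDate and ShippingDate are two aliases over the same dim_time table
    SELECT f.order_id,
           od.calendar_date AS order_date,
           sd.calendar_date AS ship_date
    FROM   fact_orders f
    JOIN   dim_time od ON f.order_date_key = od.date_key   -- alias "OrderDate"
    JOIN   dim_time sd ON f.ship_date_key  = sd.date_key;  -- alias "ShippingDate"
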
Connection Pool
What is a connection pool in OBIEE?
The connection pool contains the information about the connection between the data source and the Oracle
BI Server. It is used for importing the metadata and for queries initiated by the initialization blocks.
In general we have separate connection pools for importing the metadata and for the initialization
blocks.
Note: In OBIEE we never import data, we just import metadata
Connection pool properties
Connection pool Properties:
A connection pool has 3 tabs:
1. General
2. XML
3. Write Back
1. General
Name: This is to give a name for the connection pool
Call Interface: This specifies the driver that will be used to connect to the data source. A
list of options is available, such as ODBC, OCI, etc. ODBC can be used for any kind of
data source whereas OCI is used only with Oracle data sources.
Maximum Connections: This defines the maximum number of concurrent connections through the pool,
and is generally specified by the DBA. The default value is 10. Once the limit is
reached, the BI Server checks for other available connection pools; if none exist, it waits until
a connection becomes available.
Requires fully qualified table names: When this option is selected, all the requests sent through
this connection pool use fully qualified table names, i.e. DB.schema.tablename.
Data Source Name: As the name suggests, the name of the data source to which the queries will be routed.
Shared logon: If this option is checked all the requests through this connection pool use the
username and password specified in the connection pool. If this option is unchecked all the
connections through this connection pool use the database user ID and password specified in the
DSN
Enable Connection Pooling: It allows multiple concurrent query requests to share a single
database connection. This reduces the overhead of connecting to a database because it doesn't open
and close a new connection for every query. If this option is unchecked each query sent to the
database opens a new connection
Timeout: It is the idle time (after the request completes) for the connection to be closed. During
the idle time new requests use this connection instead of opening a new connection. If this is set to
0(zero) then it means that the connection pooling is disabled.
Use multithreaded connections: If this option is checked, the Oracle BI Server terminates idle
physical connections; if it is unchecked, one thread stays tied to one database connection, and these
idle threads consume memory.
Execute queries asynchronously: As the name suggests if this option is checked then the
queries run asynchronously else they run synchronously. By default this option is unchecked
Execute on Connect: This is used to specify a command that the server issues each time a connection is
established with the database. This can be any command accepted by the database.
Parameters supported: If this option is checked, that means all the database parameters
mentioned in the database features are supported by the server. This option is checked by default
Isolation level: This option controls the transaction locking for all the requests issued by this
connection. These are of 4 types
a. Dirty Read: This is known as 0 (zero) locking. It can read uncommitted or dirty data, and values
in the data can change during the read process in a transaction. Least restrictive of all types.
b. Committed Read: Locks are held while the data is read, to avoid dirty reads. Data can be
changed before the transaction ends on that connection.
c. Repeatable Read: This places locks on all data used in a query so that nobody can update the
data. However, new rows can be inserted by other users and will be available in later reads in the
current transaction.
d. Serializable: This places a range lock on the data set, preventing other users from inserting or updating
rows in the data set until the transaction is complete. It is the most restrictive of all types.
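
These options map onto the standard SQL isolation levels, which a database sets with statements like
the following (Dirty Read corresponds to READ UNCOMMITTED):

    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;  -- Dirty Read
    SET TRANSACTION ISOLATION LEVEL READ COMMITTED;    -- Committed Read
    SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;   -- Repeatable Read
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;      -- Serializable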
NQSConfig.ini
What is NQSConfig.ini? Where is it located?
NQSConfig.ini is the initialization file used by the Oracle BI Server to set parameters on start-up. Each
instance of the Oracle BI Server has its own NQSConfig.ini file.
The Oracle BI Server reads the NQSConfig.ini file each time it is started.
The path for NQSConfig.ini is:
In OBIEE 10g:
Drive:\OracleBI\Server\Config\
In OBIEE 11g:
Drive:\OBIEE11G\instances\instance1\config\OracleBIServerComponent\coreapplication_obis1\

Can you change the location of your rpd file in your OBIEE configuration? If yes, where would you mention the new
location of this rpd file for the BI Server?
In OBIEE 10g you specify it in NQSConfig.ini
In OBIEE 11g the repository is managed using EM (Enterprise Manager), where we can directly specify the
path of the rpd to load
If you have more than 3 repository files mentioned in your NQSConfig.ini file as default,
which one gets loaded into memory when the BI Server is started, and why?
Ex:
Star = SamplerRepository1.rpd, DEFAULT;
Star = SamplerRepository2.rpd, DEFAULT;
Star = SamplerRepository3.rpd, DEFAULT;
SamplerRepository3.rpd will be loaded into memory: the server reads the file from the top, and each
default entry it reads overrides the previous one, so the last rpd listed ends up loaded into the
server's memory
What are the minimum services needed to load a repository file onto memory and view
a dashboard which has reports that have been refreshed on a scheduled basis?
OC4J (10g) or WebLogic (11g), BI Server, Presentation Server, Scheduler Server
What are the different places (files) to view the physical SQL generated by an Answers
report?
NQQuery.log
Administration > Manage Sessions > View Log
Where does the BI Server log its start, stop and restart times in the file system?
NQServer.log
You have two tables, Table 1 and Table 2, joined by a foreign key in the database. They
are imported together from the database into your physical layer. Is this relationship
still preserved in the OBIEE physical layer?
Yes
Same as the above question, but what happens if you import each table separately?
The keys are imported, but the joins are not preserved
If Table 1 and Table 2 are dragged from the physical layer to the BMM layer, which table
becomes a fact table and which table becomes a dimension table?
The table with the primary key becomes the dimension table and the table with the foreign key becomes the fact table
What if the tables (Table 1 and Table 2) are not joined, then what happens in the BMM
layer?
Both tables act as fact tables
Does OBIEE store physical SQL? How is physical SQL generated in OBIEE
environments?
Yes; physical SQL is generated by the query compiler during the query processing of the logical SQL
Are there any occasions where physical SQL is not generated when running against a
backend database like Oracle, SQL Server or any other relational database?
When the request is answered from the BI Server cache, no physical SQL is issued against the database.
(At logging level 0 (zero), or when the query log file reaches its maximum size, physical SQL is still
generated but is simply not logged.)

Complex join
What is a Complex join?
A Complex Join is used in the Business Model & Mapping layer (BMM layer) of the repository.
Logical tables in the BMM layer can have multiple logical table sources (LTS). A complex join is
an intelligent join between the LTSs of two logical tables in the BMM layer. When two columns are selected
for a query in Answers/Dashboards, the BI Server reads the complex join between the two logical tables and
then intelligently and dynamically selects the LTSs to join. This means that the BI Server is able to
select the most efficient join in the physical layer and translate the query into physical SQL.

Is it mandatory to have hierarchies defined in your repository? If yes, where do they
help? If no, what happens in the reports?
In OBIEE 10g it is not mandatory. Hierarchies are used for drill-down and for level-based measures.
In OBIEE 11g it is mandatory; otherwise you get warnings and errors

How do you create outer joins in physical layer?


We cannot create outer joins in the physical layer; they are defined in the BMM layer on the logical table source

What does consistency checking perform? What are the minimum criteria to pass
consistency checking for a given repository?
Consistency checking validates that the repository can be loaded by the BI Server, flagging problems such as
logical tables without keys or sources. The minimum criteria: one dimension table and one fact table, joined together

How do we upload a new repository file into the BI Server in 10g and 11g?


In 10g, by editing NQSConfig.ini
In 11g, using EM (EM manages NQSConfig.ini)

How to bypass server authentication?


You can bypass server authentication by setting BYPASS SERVER AUTHENTICATION = YES in
NQSConfig.ini and instanceconfig.xml
What happens when a foreign key join is defined in the BMM layer?
If a foreign key join is used in the BMM layer, the BI Server will always select that physical join to
create the SQL, even when it is not the most efficient join
When is a complex join used in the physical layer?
A complex join is used in the physical layer when we want to use join expressions, for example a
greater-than or less-than operator.

Implicit Fact Column


What is an Implicit Fact Column?
An implicit fact column is used when we have multiple fact tables and a report is generated only
from dimension tables. When a report is built only from dimension tables (with multiple
facts present), the BI Server has multiple join paths available between the dimensions and the facts,
which results in an error or ambiguous results. To overcome this we specify an implicit fact column,
which fixes the join path for the BI Server.
It is set on the presentation catalog in the presentation layer of the Admin tool

Chronological Key
What is a Chronological key?
A chronological key is a key which uniquely identifies the data at a particular level and defines the
chronological order of its members. Chronological keys are mostly used in time dimensions where time
series functions are used

What is the best default logging level for production users?


Log Level 0

What is the difference between logging level 1 and 2?


Level 1 logs the SQL statement issued from the client application and logs elapsed times for query
compilation, query execution, query cache processing, and back-end database processing. It logs
the query status (success, failure, termination, or timeout), and logs the user ID, session ID, and request
ID for each query.
Level 2 logs everything logged in Level 1. Additionally, for each query, it logs the repository name,
business model name, presentation catalog (called Subject Area in Answers) name, SQL for the
queries issued against physical databases, queries issued against the cache, the number of rows returned
from each query against a physical database and from queries issued against the cache,
and the number of rows returned to the client application.
Query Repository Tool

What is Query Repository Tool?


It is a utility of the Siebel/OBIEE Admin tool.
It allows you to examine the repository metadata.
For example: searching for objects based on name or type, or
examining relationships between metadata objects, such as which column in the presentation layer maps to which table in
the physical layer.
What are the different utilities available? Explain them in detail.

How can I export all the available tables into an Excel sheet?
Using the Repository Documentation Utility

Which variable(s) do not require an initialization block?


Static Repository Variable

How do you model a presentation layer so that all the dimension tables come under a Dimension
folder and the fact tables come under a Fact folder?
To give the appearance of nested folders in Oracle BI Answers, prefix the name of the presentation
folder to be nested with a hyphen and a space, and place it after the folder in which it nests, as shown below.
Alternatively, you can enter a hyphen and a greater-than sign in the description field.
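
For example, with hypothetical folder names, the presentation layer folders would be arranged like this
(the hyphen-and-space prefix nests a folder under the folder above it):

    Dimension
    - Customers
    - Products
    Fact
    - Sales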
If I change a column name in the presentation layer from 'A' to 'A1', does a query
which includes this column still work?
Yes, it will work, unless you change the column formula

Where can we apply a filter in the repository?


In the 'where clause' on the Content tab of the logical table source

What is Authentication and how many types of authentication are available in
OBIEE?
Authentication is the process by which a system verifies, through the use of a user ID and password,
that a user has the necessary permissions and authorizations to log in and access data. The BI Server
authenticates each connection request it receives. The different authentication types available in OBIEE
are:
Operating system authentication
External table authentication
Database authentication
LDAP authentication

Cache Management
Cache Management in OBIEE
1. Admin tool: Manage > Cache > Purge
You can purge all of the cache, the cache for a subject area, or the cache for a query
2. Physical layer:
Physical Table Properties > Cacheable, Cache persistence time
3. Event polling table
4. EM (Enterprise Manager) > Capacity Management > Performance > Cache Enabled
5. Analysis > Advanced tab > Bypass presentation cache
6. Analysis > Advanced tab >
Prefix: SET VARIABLE DISABLE_CACHE_HIT=1;
7. When a dynamic variable value changes, the cache associated with that subject area is purged
8. Creating a batch file and scheduling it, as sketched below
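
For point 8, the batch file typically runs a small script against the BI Server that calls its cache-purge
procedure (for example through the nqcmd utility). A minimal sketch; the file name is a hypothetical example:

    -- purge_cache.sql, executed against the BI Server, e.g. via nqcmd
    call SAPurgeAllCache();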
