Analyzing User Query Needs
Types of Users
  Executives
  Casual users or managers
  Business analysts or power users

User Access
(Figure: types of users placed along a range from structured to unstructured access.)
Gathering User Requirements
  Areas to focus on:
  How users do business and what the business drivers are
  What attributes users need (required versus good to have)
  What the business hierarchies are
  What data users use and what they would like to have
  What levels of detail or summary are needed
  What types of front-end data access tools are used
  How users expect to see the query results
Gathering User Requirements: Possible Obstacles
The following are some of the possible obstacles:
  The business objective of the data warehouse has not been specifically defined
  The scope of the data warehouse is too broad
  The purpose and function of decision support systems and operational systems are misunderstood
User Query Progression

 Starts simple
 Becomes more analytical
  Requires different techniques and flexible tools

(Figure: queries progress from "What?" to repeated "Why?" questions.)
Training
 Methods
- Informal: one-to-one or small class
- Formal: larger class
- Self-study
 Basic topics
- Logging on
- Accessing metadata
- Creating and submitting a query
- Interpreting results
- Saving queries and storing results
- Utilizing resources
- Learning warehouse fundamentals
Query Efficiency

User considerations
 Successful completion
 Faster query execution
 Less CPU used
 More opportunity for further analysis
Query Efficiency

Designer considerations
 Use indexes
 Select minimum data
  Employ resource governors
 Minimize bottlenecks
 Develop metrics
  Use prepared and tested queries
 Use quiet periods
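The first two designer considerations (use indexes, select minimum data) can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module rather than Oracle, and the `sales` table, its columns, and the index name are hypothetical. The query plan is inspected before and after the index is created.

```python
import sqlite3

# Hypothetical sales table, used only to illustrate the effect of an index.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (store_id INTEGER, day_id INTEGER, sales_dollars REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(store, day, 100.0 * store + day)
     for store in range(1, 51) for day in range(1, 31)],
)


def query_plan(sql):
    """Return SQLite's query plan for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Select only the column that is needed, restricted to one store.
query = "SELECT sales_dollars FROM sales WHERE store_id = 7"

plan_before = query_plan(query)  # no index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_sales_store ON sales (store_id)")
plan_after = query_plan(query)   # the plan now searches via the index

print(plan_before)
print(plan_after)
```

The exact wording of the plan text varies between SQLite versions, so it is best read rather than parsed.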
Charge Models

 Examples of charge models:


- Flat allocation model
- Transaction-based model
- Telephone service model
- Cable TV model
 Develop your own unique model
 Avoid a charge model that discourages users from using the warehouse
Query Scheduling and Monitoring
 Query scheduling
- Manages information usage
- Directs queries
- Executes queries
- Sets job queue priorities
 Query monitoring
- Track resource-intensive queries
- Detect unused queries
- Catch queries that use summary data inefficiently
- Catch queries that perform regular summary
calculations at the time of query execution
- Detect illegal access
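As a rough sketch of two of the monitoring tasks above (tracking resource-intensive queries and detecting unused queries), the following assumes a hypothetical usage log with one record per saved query. The record fields, query names, and CPU limit are illustrative assumptions, not part of any particular monitoring tool.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    name: str           # identifier of a saved query (hypothetical)
    cpu_seconds: float  # CPU consumed over the monitoring period
    runs: int           # number of executions in the period


def resource_intensive(stats, cpu_limit):
    """Names of queries whose CPU use exceeds the limit, heaviest first."""
    heavy = [s for s in stats if s.cpu_seconds > cpu_limit]
    heavy.sort(key=lambda s: s.cpu_seconds, reverse=True)
    return [s.name for s in heavy]


def unused(stats):
    """Names of saved queries that never ran in the period."""
    return [s.name for s in stats if s.runs == 0]


query_log = [
    QueryStats("monthly_sales_by_region", cpu_seconds=12.0, runs=40),
    QueryStats("full_detail_scan", cpu_seconds=950.0, runs=3),
    QueryStats("old_promo_report", cpu_seconds=0.0, runs=0),
    QueryStats("daily_recalculated_totals", cpu_seconds=400.0, runs=30),
]

heavy_queries = resource_intensive(query_log, cpu_limit=300.0)
unused_queries = unused(query_log)
print(heavy_queries, unused_queries)
```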
Query Management and Monitoring Tools
 Use tools, schedulers, Oracle Enterprise Manager
 Consider
- Automation levels
- Technology interfaces
- Cost
Security

 Do not overlook
 Subject area sponsors:
- Review and authorize requests for access rights
- Identify enhancements
 Transparent security
 Easy to implement, maintain, and manage
Security Plan

 Define a strategy:
- Allocate business area owners
- Ensure invisibility
 Ensure easy management
 Consider auditing
 Manage passwords
Role-Based Security

 Subject area access:


- Summary data for new users
- All data for experienced users
 Departmental access
 Limited object access
 Access during load
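A minimal sketch of role-based access along these lines follows. The role names, object names, and the rule that all access is refused while a load is running are assumptions made for illustration. A real warehouse would normally enforce this with database roles and privileges.

```python
# Hypothetical roles mapped to the warehouse objects they may query.
ROLE_OBJECTS = {
    "new_user": {"sales_by_month"},                        # summary data only
    "experienced_user": {"sales_by_month", "sales_fact"},  # summary and detail
}


def can_query(role, table, load_in_progress=False):
    """Decide whether a role may query a table.

    Assumption: every query is refused while a load is in progress.
    """
    if load_in_progress:
        return False
    return table in ROLE_OBJECTS.get(role, set())


print(can_query("new_user", "sales_fact"))          # detail data refused
print(can_query("experienced_user", "sales_fact"))  # detail data allowed
```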
Application Context and Fine-Grained Access Control in Oracle8i

(Figure labels: "Who am I?", "Where am I?", table, access policy, application context.)
Comparing OLAP and DSS

  OLAP is used for multidimensional analysis.
  DSS provides a system enabling decision making.
  OLAP tools provide a DSS capability.
  OLAP for the warehouse provides analytical power.
 Other terms:
- EIS
- KBS
The Functionality of OLAP

  Rotate and drill down to successive levels of detail.
  Create and examine calculated data interactively on large volumes of data.
  Determine comparative or relative differences.
  Perform exception and trend analysis.
  Perform advanced analytical functions, for example, forecasting, modeling, and regression analysis.
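Rotating and drilling down can be sketched over a small in-memory cube. The cells, dimension order (year, quarter, region), and figures below are hypothetical. An OLAP tool would perform the same operations interactively on far larger volumes.

```python
from collections import defaultdict

# Hypothetical cells keyed by (year, quarter, region) -> sales.
cells = {
    (1999, "Q1", "North"): 30,
    (1999, "Q1", "South"): 20,
    (1999, "Q2", "North"): 25,
    (1999, "Q2", "South"): 15,
}


def roll_up(cube, keep):
    """Sum cells, keeping only the dimension positions listed in `keep`."""
    totals = defaultdict(int)
    for key, value in cube.items():
        totals[tuple(key[i] for i in keep)] += value
    return dict(totals)


by_year = roll_up(cells, keep=(0,))            # highest level of summary
by_year_quarter = roll_up(cells, keep=(0, 1))  # drill down one level
by_region_year = roll_up(cells, keep=(2, 0))   # rotate: region leads, then year

print(by_year)
print(by_year_quarter)
print(by_region_year)
```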
Original OLAP Rules

 Multidimensional conceptual view


 Transparency
 Accessibility
 Consistent reporting performance
 Client-server architecture
Original OLAP Rules

 Generic dimensionality
 Dynamic sparse matrix handling
 Multiuser support
 Unrestricted cross-dimensional operations
 Intuitive data manipulation
 Flexible reporting
 Unlimited dimensions and aggregation levels
Relational Database Model

        Attribute 1  Attribute 2  Attribute 3  Attribute 4
        Name         Age          Gender       Emp No.
Row 1   Anderson     31           F            1001
Row 2   Green        42           M            1007
Row 3   Lee          22           M            1010
Row 4   Ramos        32           F            1020

The table above illustrates the employee relation.
Multidimensional Database Model

(Figure: SALES and FINANCE cubes, with dimension labels Customer, Store, Time, Product, and GL-Line.)

The data is found at the intersection of dimensions.
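The idea of data sitting at the intersection of dimensions can be sketched with a dictionary keyed by one member from each dimension. The dimension members and values here are hypothetical, and only populated cells are stored.

```python
# Each cell is keyed by one member from every dimension:
# (store, time, product) -> units sold. All values are hypothetical.
sales_cube = {
    ("Store 1", "Jan 99", "Product A"): 120,
    ("Store 1", "Feb 99", "Product A"): 95,
    ("Store 2", "Jan 99", "Product B"): 60,
}


def cell(cube, store, time, product):
    """Return the value at the intersection, or None for an empty cell."""
    return cube.get((store, time, product))


print(cell(sales_cube, "Store 1", "Jan 99", "Product A"))
print(cell(sales_cube, "Store 2", "Feb 99", "Product A"))  # empty cell
```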


Relational Server
 Benefits:
- Well-known environment with many experts in most
organizations able to support the product
- Can be used with data warehousing and operational
systems
- Many tools available with advanced features including
improvements made to performance with report servers
 Disadvantages:
- Does not have any of the complex functions or analysis capabilities provided by OLAP tools
- These products may also be restricted to the volumes of
Multidimensional Server
 Benefits:
- Quick access to very large volumes of data
- Extensive and comprehensive libraries of complex
functions specifically for analysis
- Strong modeling and forecasting capabilities
- Can access multidimensional and relational database
structures
 Disadvantages:
- Difficulty of changing dimensions without reaggregating
to time
- Lack of support for very large volumes of data
Modeling the Data Warehouse
Data Warehouse Database Design Phases

  Defining the business model (conceptual model)
  Creating the dimensional model (logical model)
  Modeling summaries
  Creating the physical model

(Figure labels: Select a business process; Physical model.)
Performing Strategic Analysis

Phase 1: Defining the Business Model

  Performing strategic analysis
  Creating the business (conceptual) model

(Figure label: Select a business process.)
Creating the Business Model
Phase 1: Defining the Business Model
  Performing strategic analysis
  Creating the business (conceptual) model
- Defining business requirements
- Identifying the business measures
- Identifying the dimensions
- Identifying the grain
- Identifying the business definitions and rules
- Verifying data sources
Business Requirements Drive the Design Process

Primary input
  Business requirements

Other inputs
  Existing metadata
  Production ERD model
  Research
  Nonrelational legacy systems
Identifying Measures and Dimensions

Measures
The attribute varies continuously:
• Balance
• Units Sold
• Cost
• Sales

Dimensions
The attribute is perceived as a constant or discrete value:
• Description
• Location
• Color
• Size
Determining Granularity

YEAR?

QUARTER?

MONTH?

WEEK?

DAY?
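The choice of grain can be illustrated with a short sketch. The daily rows are hypothetical. Facts stored at the day level can be summarized to any coarser grain, whereas facts stored only by month could not be broken back down into days.

```python
from collections import defaultdict
from datetime import date

# Hypothetical facts stored at the finest grain considered here: one row per day.
daily_sales = [
    (date(1999, 1, 4), 100),
    (date(1999, 1, 18), 150),
    (date(1999, 2, 2), 200),
    (date(1999, 4, 9), 50),
]


def grain_key(day, grain):
    """Map a calendar day to the label of its period at the chosen grain."""
    if grain == "day":
        return day.isoformat()
    if grain == "week":
        iso = day.isocalendar()
        return f"{iso[0]}-W{iso[1]:02d}"
    if grain == "month":
        return f"{day.year}-{day.month:02d}"
    if grain == "quarter":
        return f"{day.year}-Q{(day.month - 1) // 3 + 1}"
    if grain == "year":
        return str(day.year)
    raise ValueError(f"unknown grain: {grain}")


def summarize(rows, grain):
    """Total the sales for each period at the chosen grain."""
    totals = defaultdict(int)
    for day, amount in rows:
        totals[grain_key(day, grain)] += amount
    return dict(totals)


for grain in ("year", "quarter", "month", "week", "day"):
    print(grain, summarize(daily_sales, grain))
```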
Identifying Business Rules

Location
  Geographic proximity: 0 - 1 miles, 1 - 5 miles, > 5 miles

Product
  Type: PC, Server
  Monitor: 15 inch, 17 inch, 19 inch, None
  Status: New, Rebuilt, Custom

Time
  Month > Quarter > Year

Store
  Store > District > Region
Creating the Dimensional Model

  Identify fact tables
- Translate business measures into fact tables
- Analyze source system information for additional measures
- Identify base and derived measures
- Document additivity of measures
 Identify dimension tables
 Link fact tables to the dimension tables
 Create views for users
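A small sketch shows why the additivity of measures is worth documenting. The rows are hypothetical. Units and sales dollars are treated as base measures that can be summed, while average unit price is a derived measure that has to be recomputed from the totals.

```python
# Hypothetical fact rows with two base measures.
rows = [
    {"units": 10, "sales_dollars": 200.0},
    {"units": 30, "sales_dollars": 300.0},
]

# Base measures are additive: summing them across rows is meaningful.
total_units = sum(r["units"] for r in rows)
total_sales = sum(r["sales_dollars"] for r in rows)

# Derived measure: average unit price, recomputed from the additive totals.
avg_unit_price = total_sales / total_units

# Averaging the per-row prices weights both rows equally and gives a
# different figure, which is why the derived measure is not additive.
naive_avg_price = sum(r["sales_dollars"] / r["units"] for r in rows) / len(rows)

print(total_units, total_sales, avg_unit_price, naive_avg_price)
```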
Dimension Tables
Dimension tables have the following characteristics:
 Contain textual information that represents the
attributes of the business
 Contain relatively static data
 Are joined to a fact table through a foreign key
reference

(Figure: a central Facts table (units, price) joined to the Product, Channel, Time, and Customer dimension tables.)
Fact Tables

Fact tables have the following characteristics:
  Contain numeric measures (metrics) of the business
  May contain summarized (aggregated) data
  May contain date-stamped data
  Are typically additive
  Have a key that is typically a concatenated key composed of the primary keys of the dimensions
  Are joined to dimension tables through foreign keys that reference primary keys in the dimension tables
(Figure: the fact table, Facts (units, price), at the center, with the Product, Channel, Customer, and Time dimension tables around it.)
Star Schema Model
  Central fact table
  Radiating dimensions
  Denormalized model

Product Table: Product_id, Product_desc
Store Table: Store_id, District_id
Sales Fact Table: Product_id, Store_id, Item_id, Day_id, Sales_dollars, Sales_units
Time Table: Day_id, Month_id, Year_id
Item Table: Item_id, Item_desc
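The star layout can be sketched with Python's built-in sqlite3 module. Table and column names follow the slide, except that the time table is called `time_dim` here. The sample rows are hypothetical. The fact table's key is the concatenation of its dimension keys, and a typical query joins the fact table directly to each dimension it needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product  (product_id INTEGER PRIMARY KEY, product_desc TEXT);
CREATE TABLE store    (store_id INTEGER PRIMARY KEY, district_id INTEGER);
CREATE TABLE item     (item_id INTEGER PRIMARY KEY, item_desc TEXT);
CREATE TABLE time_dim (day_id INTEGER PRIMARY KEY, month_id INTEGER, year_id INTEGER);

-- Central fact table: foreign keys to every dimension form the concatenated key.
CREATE TABLE sales_fact (
    product_id    INTEGER REFERENCES product(product_id),
    store_id      INTEGER REFERENCES store(store_id),
    item_id       INTEGER REFERENCES item(item_id),
    day_id        INTEGER REFERENCES time_dim(day_id),
    sales_dollars REAL,
    sales_units   INTEGER,
    PRIMARY KEY (product_id, store_id, item_id, day_id)
);

-- Hypothetical sample rows.
INSERT INTO product  VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO store    VALUES (1, 10);
INSERT INTO item     VALUES (1, 'Blue');
INSERT INTO time_dim VALUES (1, 1, 1999), (2, 2, 1999);
INSERT INTO sales_fact VALUES
    (1, 1, 1, 1, 100.0, 5),
    (1, 1, 1, 2, 150.0, 7),
    (2, 1, 1, 1,  80.0, 2);
""")

# One join per dimension, straight from the fact table.
sales_by_product_year = conn.execute("""
    SELECT p.product_desc, t.year_id, SUM(f.sales_dollars)
    FROM sales_fact f
    JOIN product  p ON p.product_id = f.product_id
    JOIN time_dim t ON t.day_id = f.day_id
    GROUP BY p.product_desc, t.year_id
    ORDER BY p.product_desc
""").fetchall()

print(sales_by_product_year)
```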
Star Schema Model

 Easy for users to understand


 Fast response to queries
 Simple metadata
  Supported by many front-end tools
 Less robust to change
 Slower to build
 Does not support history
Snowflake Schema Model
Product Table: Product_id, Product_desc
Store Table: Store_id, District_id
District Table: District_id, District_desc
Sales Fact Table: Product_id, Store_id, Item_id, Day_id, Sales_dollars, Sales_units
Time Table: Day_id, Month_id, Year_id
Item Table: Item_id, Item_desc
Dept Table: Dept_id, Dept_desc, Mgr_id
Mgr Table: Dept_id, Mgr_id, Mgr_name
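In a snowflake schema the district description lives in its own table, so reporting sales by district takes one more join than it would against a denormalized store dimension. The sketch below uses Python's built-in sqlite3 module with a reduced set of the tables above and hypothetical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The store dimension is normalized: district attributes move to their own table.
CREATE TABLE district (district_id INTEGER PRIMARY KEY, district_desc TEXT);
CREATE TABLE store (
    store_id    INTEGER PRIMARY KEY,
    district_id INTEGER REFERENCES district(district_id)
);
CREATE TABLE sales_fact (
    store_id      INTEGER REFERENCES store(store_id),
    day_id        INTEGER,
    sales_dollars REAL
);

-- Hypothetical sample rows.
INSERT INTO district VALUES (10, 'East'), (20, 'West');
INSERT INTO store VALUES (1, 10), (2, 10), (3, 20);
INSERT INTO sales_fact VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 1, 70.0);
""")

# Fact -> store -> district: the second join is the cost of normalizing.
sales_by_district = conn.execute("""
    SELECT d.district_desc, SUM(f.sales_dollars)
    FROM sales_fact f
    JOIN store    s ON s.store_id = f.store_id
    JOIN district d ON d.district_id = s.district_id
    GROUP BY d.district_desc
    ORDER BY d.district_desc
""").fetchall()

print(sales_by_district)
```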
Snowflake Schema Model
 Direct use by some tools
 More flexible to change
 Provides for speedier data loading
 May become large and unmanageable
 Degrades query performance
 More complex metadata

(Figure: hierarchy of Country, State, County, City.)


Using Summary Data

Phase 3: Modeling summaries


 Provides fast access to precomputed data
 Reduces use of I/O, CPU, and memory
 Is distilled from source systems and precalculated summaries
 Usually exists in summary fact tables
Designing Summary Tables

  Average
  Maximum
  Total
  Percentage

(Figure: a summary table with Units, Sales($), and Store columns and a Total row under each of Product A, Product B, and Product C.)
Summary Tables Example

SALES FACTS
Sales$   Region  Month
10,000   North   Jan 99
12,000   North   Feb 99
11,000   South   Jan 99
15,000   West    Mar 99
18,000   South   Feb 99
20,000   North   Jan 99
10,000   East    Jan 99
2,000    West    Mar 99

SALES BY MONTH/REGION
Month    Region  Tot_Sales$
Jan 99   North   41,000
Jan 99   East    10,000
Feb 99   South   40,000
Mar 99   West    17,000

SALES BY_MONTH
Month    Tot_Sales
Jan 99   51,000
Feb 99   40,000
Mar 99   17,000
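How summary tables are derived from a fact table can be sketched as a simple grouping step. The rows are the SALES FACTS rows as listed. The totals are computed from those rows, so they may not match every figure printed in the summary tables above. The function and variable names are illustrative.

```python
from collections import defaultdict

# SALES FACTS rows as listed: (sales dollars, region, month).
sales_facts = [
    (10_000, "North", "Jan 99"),
    (12_000, "North", "Feb 99"),
    (11_000, "South", "Jan 99"),
    (15_000, "West", "Mar 99"),
    (18_000, "South", "Feb 99"),
    (20_000, "North", "Jan 99"),
    (10_000, "East", "Jan 99"),
    (2_000, "West", "Mar 99"),
]


def summarize(facts, key):
    """Precompute total sales per group; `key` maps (region, month) to a group."""
    totals = defaultdict(int)
    for dollars, region, month in facts:
        totals[key(region, month)] += dollars
    return dict(totals)


sales_by_month_region = summarize(sales_facts, lambda region, month: (month, region))
sales_by_month = summarize(sales_facts, lambda region, month: month)

print(sales_by_month_region)
print(sales_by_month)
```

Queries that only need monthly totals can then read the small precomputed table instead of rescanning the detail rows.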
Summary Management in Oracle8i
(Figure: a Sales summary with Region, State, City, Product, and Time dimensions; the summary advisor covers summary usage, space requirements, and summary recommendations.)
Using Time in the Data Warehouse
The Time Dimension

 Time is critical to the data warehouse


 A consistent representation of time is required for extensibility

(Figure: the Time dimension linked to the Sales fact table.)
Where should the element of time be stored?
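One common answer is a separate time dimension table with one row per day, which gives every fact table the same representation of time. The sketch below generates such rows. The column names and key formats (for example `day_id` as YYYYMMDD) are assumptions chosen for illustration.

```python
from datetime import date, timedelta


def build_time_dimension(start, end):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "day_id": day.year * 10000 + day.month * 100 + day.day,  # YYYYMMDD
            "month_id": day.year * 100 + day.month,                  # YYYYMM
            "quarter": (day.month - 1) // 3 + 1,
            "year_id": day.year,
            "iso_weekday": day.isoweekday(),  # 1 = Monday ... 7 = Sunday
        })
        day += timedelta(days=1)
    return rows


time_dim = build_time_dimension(date(1999, 1, 1), date(1999, 12, 31))
print(len(time_dim), time_dim[0], time_dim[-1])
```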
Creating the Physical Model

Phase 4: Creating the Physical Model


 Translate the dimensional design to a physical
model for implementation
 Define storage strategy for tables and indexes
 Perform database sizing
 Define initial indexing strategy
 Define partitioning strategy
 Update metadata document with physical
information
Physical Model Design Tasks

 Define naming and database standards


 Perform database sizing
 Design tablespaces
 Develop initial indexing strategy
 Develop data partition strategy
 Define storage parameters
 Set initialization parameters
 Use parallel processing
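Database sizing is often started with a back-of-the-envelope estimate. The sketch below assumes a simple rule of thumb: multiply the dimension cardinalities, scale by the fraction of combinations that actually occur, and multiply by an average row size plus an allowance for indexes. The function names, the formula, and all numbers are hypothetical, and real sizing depends on the database's storage parameters.

```python
import math


def estimate_fact_rows(cardinalities, density):
    """Estimate fact rows as the product of dimension sizes times a density factor."""
    return round(math.prod(cardinalities) * density)


def estimate_bytes(rows, row_bytes, index_factor=1.5):
    """Estimate storage as rows x average row size, scaled up for indexes."""
    return int(rows * row_bytes * index_factor)


# Hypothetical figures: 100 products, 50 stores, 365 days, and
# 10% of the possible combinations actually recording a sale.
fact_rows = estimate_fact_rows([100, 50, 365], density=0.1)
fact_bytes = estimate_bytes(fact_rows, row_bytes=40)

print(fact_rows, fact_bytes)
```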
Using Data Modeling Tools

  Tools with a GUI enable definition, modeling, and reporting
  Avoid a mix of modeling techniques caused by:
- Development pressure
- Developers who lack knowledge
- No strategy
  Determine a strategy
  Write and publish formally
  Make available electronically

(Figure labels: spreadsheets, CASE tools, paper and pencil.)
Summary

This lesson discussed the following topics:
  Creating a business model
  Creating a dimensional model
  Modeling the summaries
  Creating a physical model

(Figure: select among business processes, then business model, dimensional model, physical model.)
