Data Warehouse Appliances Evaluation Criteria

Krishna Manoharan krishmanoh@gmail.com http://dsstos.blogspot.com

Data Warehouse Appliances
There are many appliances on the market, with differing capabilities. Prospective customers commonly compare them by running performance benchmarks. In a typical real-world deployment, however, performance is only one of many requirements. This presentation is a customer-centric attempt to outline the other requirements that can influence the outcome of a deployment in the long run. Ultimately, the underlying technology is of academic interest and should not be the sole criterion for evaluating an appliance.

Grading an Appliance
In the following pages, I have listed typical customer requirements. The criteria listed are not all-encompassing, but they can serve as a foundation for expanding your own requirements.
As a customer, it is important that you rate these criteria based on your own needs, and then compare and grade appliances on how well their capabilities meet these requirements.
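One way to turn your ratings and each vendor's capabilities into a grade is a weighted scoring matrix. The Python sketch below is a minimal illustration under assumed conventions: the weight per rating (3 for Required, 1 for Optional, 0 for NA) and the 0-2 capability scale are hypothetical choices, not part of this evaluation, and the example ratings are illustrative only. It also flags any Required criterion an appliance cannot meet at all.

```python
# Hypothetical weights per customer rating; adjust to your own priorities.
WEIGHTS = {
    "Required": 3,
    "Required (Performance Test)": 3,
    "Optional": 1,
    "NA": 0,
}


def grade(ratings: dict[str, str], scores: dict[str, int]) -> tuple[int, list[str]]:
    """Return (weighted total, unmet Required criteria) for one appliance.

    ratings maps criterion -> customer rating (a key of WEIGHTS).
    scores maps criterion -> vendor capability on an assumed 0-2 scale
    (0 = absent, 1 = partial, 2 = full); a missing criterion counts as 0.
    """
    total = 0
    gaps = []
    for criterion, rating in ratings.items():
        score = scores.get(criterion, 0)
        total += WEIGHTS[rating] * score
        if rating.startswith("Required") and score == 0:
            gaps.append(criterion)
    return total, gaps


# Illustrative ratings for three criteria (hypothetical).
ratings = {
    "Be Schema Agnostic": "Required (Performance Test)",
    "Support Indexes": "Optional",
    "DR Capabilities": "NA",
}
print(grade(ratings, {"Be Schema Agnostic": 2, "Support Indexes": 1}))  # (7, [])
print(grade(ratings, {"Support Indexes": 2}))  # (2, ['Be Schema Agnostic'])
```

Keeping the gap list separate from the total means an appliance with a high score but an unmet Required criterion still stands out, rather than being hidden by strong Optional scores.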

As I typically do, I will use a real-world example to illustrate the process. The evaluation of specific appliances is not part of this exercise.

A Business Case: High Level

A business unit within the organization needs to analyze and run reports on its transaction data. Currently no DW exists. The business model changes rapidly due to the nature of the business, and the DW needs to keep pace with these changes; a traditional dimensional model may not suffice. The reporting mix includes dashboards, canned reports and ad-hoc reports. A rigid SLA carries financial penalties. Reports need to be consistent and available 24x7 (no downtime while data is loaded). There are no real-time reporting requirements. 90% of the reports churn the last 1 year's worth of data, and some reports churn the entire data set every day. Data grows at 2X per year and is loaded daily. ETL should not affect reporting performance.
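The 2X/year growth rate has a direct consequence for sizing, and for how much data the "last 1 year" reports touch. A minimal Python sketch, assuming the total volume doubles every year and nothing is archived (the 2 TB starting size is a hypothetical figure): after n years the volume is the initial size times 2^n, and the data loaded in the most recent year is half of the total.

```python
def projected_volume_tb(initial_tb: float, years: int, growth: float = 2.0) -> float:
    """Total volume after `years` of compound growth (2.0 = doubling per year)."""
    return initial_tb * growth ** years


def last_year_share(growth: float = 2.0) -> float:
    """Fraction of the total that was loaded during the most recent year.

    If V_n = V_{n-1} * growth, the newest year's data is V_n - V_{n-1},
    i.e. a share of 1 - 1/growth of V_n (assumes no archiving or purging).
    """
    return 1.0 - 1.0 / growth


initial = 2.0  # hypothetical starting size in TB
for year in range(4):
    # volumes 2.0, 4.0, 8.0, 16.0 (TB) for years 0-3
    print(year, projected_volume_tb(initial, year))
print(last_year_share())  # 0.5
```

Under this assumption, the reports that churn the last year scan about half of the data set, while the full-scan reports face a data set that doubles every year, so a benchmark sized on today's volume alone understates next year's load.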

Some Relevant Facts in our example


Existing Transaction System: Oracle 10g R2 RAC. The DW needs to be built from scratch.
ETL Engine: Informatica
Reporting Engine: Cognos/Business Objects
Development Tools: SQL Developer/Toad/Erwin
Backup: NetBackup
Monitoring: BMC Patrol
Admin Skills: primarily an Oracle shop.
No DR capabilities yet; however, HA is a requirement.
Limited data center space is available.

Criteria 1: Developer Requirements

Area | Requirement | Comments | Rating
Developer Requirements | ETL Integration Capabilities | Integrate with Informatica without complicated workarounds. | Required
Developer Requirements | Be Schema Agnostic | Can reports be run directly against normalized transaction schemas, or is a traditional dimensional model with summary layers needed for performance? This determines development time and the flexibility to meet business requirements rapidly. | Required (Performance Test)
Developer Requirements | Data Consistency During Loading | What methods are available for ensuring data consistency during reporting, apart from typical ACID capabilities? For example, Oracle offers Partition Exchange as an option. | Required (Performance Test)
Developer Requirements | Support for Stored Procedures | Support for stored procedures. | NA
Developer Requirements | Predefined Complex Views | Help with Cognos integration/reporting by defining views. | Required (Performance Test)
Developer Requirements | Support Analytical/Window Functions at DB Layer | Support analytical/window functions at the DB layer. | Required (Performance Test)
Developer Requirements | ERWIN + SQL Developer + Toad Compatibility | To ensure compatibility with current data modelling/development tools. | Optional
Developer Requirements | BI Tool Integration | Ability to integrate with Cognos/Business Objects without workarounds. | Required
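In Oracle, partition exchange lets a day's data be loaded and indexed in a separate staging table and then swapped into the partitioned table in a single data-dictionary operation, so reports never see a half-loaded partition. The Python helper below is only a sketch that builds the DDL string; the table and partition names are hypothetical, and the INCLUDING INDEXES / WITHOUT VALIDATION defaults are assumptions to weigh against your own constraints (WITHOUT VALIDATION skips checking that the staged rows fall within the partition bounds).

```python
import re

# Plain unquoted Oracle-style identifier: a letter, then letters, digits, _ $ #
# (30 characters at most, as in Oracle 10g).
_IDENT = re.compile(r"^[A-Za-z][A-Za-z0-9_$#]{0,29}$")


def exchange_partition_ddl(table: str, partition: str, staging_table: str,
                           including_indexes: bool = True,
                           validate: bool = False) -> str:
    """Build an ALTER TABLE ... EXCHANGE PARTITION statement.

    The staging table must have the same columns as the partitioned table;
    it is loaded (and indexed) first, then swapped in, so readers see either
    the old or the new partition contents, never a partial load.
    """
    for name in (table, partition, staging_table):
        if not _IDENT.match(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    parts = [
        f"ALTER TABLE {table}",
        f"EXCHANGE PARTITION {partition}",
        f"WITH TABLE {staging_table}",
        "INCLUDING INDEXES" if including_indexes else "EXCLUDING INDEXES",
        "WITH VALIDATION" if validate else "WITHOUT VALIDATION",
    ]
    return " ".join(parts)


# Hypothetical names: a SALES_FACT table with a June 2009 partition.
print(exchange_partition_ddl("SALES_FACT", "P_2009_06", "SALES_STAGE"))
```

Validating identifiers before interpolating them keeps the helper from being used to splice arbitrary SQL into a DDL statement.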

Criteria 1: Developer Requirements (contd.)

Area | Requirement | Comments | Rating
Developer Requirements | Partitioning and Supported Types of Partitioning | Date/time-based partitioning or equivalent is preferable due to the nature of the reports and the potential archiving capabilities. Other partitioning methods will be considered if they can meet the performance criteria. | Required (Performance Test)
Developer Requirements | Support Constraints on Database | Enforcing constraints (PK, FK, Unique, Not Null) in the DB helps reduce ETL development time, but increases database overhead. Constraints are typically enforced through indexes (more overhead). | Optional
Developer Requirements | Data Lifecycle Management (Archival Strategies) | Archiving, offloading and compressing seldom-used data. | NA
Developer Requirements | Support Various Datatypes | Support for existing Oracle datatypes. Currently using RAW columns. | Required
Developer Requirements | Support Indexes | The nature of the business requires updating older data. Indexes typically speed up such DML operations and the enforcement of constraints. If the capability is absent, it needs to be tested against the performance criteria. | Optional (Performance Test)
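Date/time-based partitioning suits this workload because most reports filter on the last year: with monthly partitions, a date-range predicate only needs to scan the partitions it overlaps (partition pruning). A minimal sketch, assuming hypothetical monthly partitions named P_YYYY_MM on a transaction date column:

```python
from datetime import date


def partitions_for_range(start: date, end: date) -> list[str]:
    """Monthly partitions (named P_YYYY_MM) overlapping start <= txn_date <= end."""
    if start > end:
        return []
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"P_{year:04d}_{month:02d}")
        month += 1
        if month == 13:
            year, month = year + 1, 1
    return names


# A "last 1 year" report touches 12 monthly partitions, whatever the total history.
last_year = partitions_for_range(date(2008, 7, 1), date(2009, 6, 30))
print(len(last_year), last_year[0], last_year[-1])  # 12 P_2008_07 P_2009_06
```

As history accumulates, the number of partitions a last-year report scans stays at 12 (although each one grows with the data), which is the reasoning behind preferring date/time-based partitioning here.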

Criteria 3: Standards, HA etc.

Area | Requirement | Comments | Rating
Standards | Be HW Agnostic | Future-proof: no lock-in to proprietary HW. | Optional
Standards | SQL Standard | Support for ANSI SQL and/or Oracle SQL. | Required
Operational Data Store | Support Replication (SharePlex/GoldenGate/etc.) | Support replication from transaction systems using GoldenGate/SharePlex/Informatica CDC etc. | NA
Operational Data Store | Ability to Support Transaction-Level Activities | Function as a hybrid (single-row select/insert/update/delete capabilities). | NA
Availability | HA Capabilities | HA capabilities (continuously active): full HW/SW redundancy. | Required
Availability | DR Capabilities | DR mechanisms (sync/async replication etc.). | NA

Criteria 4: Performance

Area | Requirement | Comments | Rating
Performance Management | Minimal Performance Management of Reports (Ad-hoc/Canned) and Loads | Do loads/reports need to be tuned for performance? How are bad queries handled? | Required (Performance Test)
Performance Management | High Performance Density (Performance/TB + Performance/U) | How small a configuration will meet performance needs? This affects the overall price of the appliance. | Required (Performance Test)
Performance Management | Linear and Seamless Scalability | Linear scalability? Does scale-out/up require downtime? Supports performance/capacity planning. | Required (Performance Test)
Performance Management | Database Resource Management | Resource management capabilities: control CPU/IO per user or through other mechanisms. | Required (Performance Test)
Performance Management | Consistent Reporting Performance (even during loading activities) | Consistent timings for reports for a fixed number of users. Do ETL activities affect reporting performance? | Required (Performance Test)
Performance Management | Aggressive Compression Capabilities and Minimal Compression Restrictions | Saves on storage. Reduces IO overhead and improves IO performance, but pushes the burden to the CPU. Consider compression and its side effects on DML activities (Insert/Update/Delete). | Required (Performance Test)
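Two of these criteria lend themselves to simple ratios during the performance test. For linear scalability, divide the throughput speedup after scaling out by the increase in hardware (1.0 means perfectly linear). For consistent reporting performance, compare the median report runtime while ETL loads are running with the median when the system is idle. A sketch with hypothetical figures and function names; what counts as "reports completed per hour" or a representative report set is up to your own test plan.

```python
from statistics import median


def scaling_efficiency(base_units: int, base_throughput: float,
                       scaled_units: int, scaled_throughput: float) -> float:
    """Speedup divided by the hardware ratio; 1.0 = perfectly linear scaling."""
    if base_units <= 0 or base_throughput <= 0:
        raise ValueError("base configuration must be positive")
    speedup = scaled_throughput / base_throughput
    hardware_ratio = scaled_units / base_units
    return speedup / hardware_ratio


def load_impact(idle_runtimes: list[float], loading_runtimes: list[float]) -> float:
    """Median report runtime during ETL loads divided by the idle median.

    1.0 means loads do not slow reports down; higher values mean they do.
    """
    return median(loading_runtimes) / median(idle_runtimes)


# Hypothetical benchmark: 1 unit runs 100 reports/hour, 2 units run 180.
print(scaling_efficiency(1, 100.0, 2, 180.0))  # 0.9
# Hypothetical runtimes (seconds) for the same report set, idle vs. during a load.
print(load_impact([10.0, 12.0, 11.0], [13.0, 14.0, 12.0]))
```

Using medians rather than means keeps a single outlier query from dominating the comparison; tracking the spread of runtimes as well would speak more directly to "consistent timings".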

Criteria 5: Maintenance

Area | Requirement | Comments | Rating
Maintenance | Simplified, Accurate and Fast Statistics Gathering | Statistics gathering on large objects can be expensive in terms of resources and time. | Required (Performance Test)
Maintenance | Simplified Partition Maintenance | Partition maintenance consumes significant DBA time. | Required
Maintenance | Simplified Space Management | Space management consumes significant DBA time. | Required
Maintenance | Performance and Database Stats (not statistics for objects) | Does the appliance report performance and database stats, e.g. number of concurrent sessions, memory utilization and such? | Required
Maintenance | Archiving Based on Date Predicates | Older data is typically archived/deleted based on a date predicate. | Optional
Maintenance | Easy Backup Capabilities | Backing up large volumes of data can be challenging. Easy integration into the existing NetBackup infrastructure would be preferred. | Required
Maintenance | Debug/Trace Capabilities | Troubleshoot performance issues using debug/trace capabilities. | Required
Maintenance | Well-Rounded Admin Console | A well-designed admin interface would make management efficient. | Required
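When data is date-partitioned, archiving based on a date predicate can become a partition-level operation: rather than deleting rows, whole partitions older than the retention window are archived or dropped. A minimal sketch, assuming hypothetical monthly partitions named P_YYYY_MM and a retention window that counts the current month:

```python
from datetime import date


def month_index(year: int, month: int) -> int:
    """Months since year 0, so month arithmetic becomes integer arithmetic."""
    return year * 12 + (month - 1)


def partitions_to_archive(partitions: list[str], today: date,
                          retain_months: int) -> list[str]:
    """Return partitions entirely older than the retention window.

    The window covers the current month plus the previous retain_months - 1
    months; names are assumed to follow a hypothetical P_YYYY_MM convention.
    """
    oldest_kept = month_index(today.year, today.month) - (retain_months - 1)
    old = []
    for name in partitions:
        _, year, month = name.split("_")
        if month_index(int(year), int(month)) < oldest_kept:
            old.append(name)
    return sorted(old)


existing = ["P_2008_05", "P_2008_06", "P_2008_07", "P_2009_06"]
print(partitions_to_archive(existing, date(2009, 6, 15), retain_months=12))
# ['P_2008_05', 'P_2008_06']
```

Each returned partition could then be exchanged out to an archive table or dropped, instead of running a large DELETE with a date predicate, which on Oracle would generate substantial undo and redo.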

Criteria 5: Maintenance (contd.)

Area | Requirement | Comments | Rating
Maintenance | Easy Restore Capabilities | If data from an earlier period needs to be restored, can it be done in an automated and seamless way (without requiring an outage)? | Required
Maintenance | Consistent and Easy Recovery Capabilities | Recovery from crash/HW failures without disruption to running loads/reports. | Required
Maintenance | Easy Upgrades/Patching | Seamless and easy upgrades/patching without downtime. | Required
Maintenance | Easy to Learn and Manage (Admin Skills) | Do we need specialized skills? Is it easily learnable? | Required
Maintenance | Low Complexity | Is it complex in terms of HW/SW/components etc.? | Required
Maintenance | Quick Infrastructure Ramp-up Time | Can it be up and running within a day or two? | Required
Maintenance | Responsive Support Model/Economical Licensing | How is it licensed? Are HW and SW supported by a single vendor? | Required
Maintenance | Integrate with Existing Monitoring Infrastructure | Can the appliance be monitored by BMC Patrol? Can alerts be set up? SNMP capabilities? | Required

Criteria 6: Infrastructure Footprint

Area | Requirement | Comments | Rating
Infrastructure Footprint | Moderate Power/Cooling Requirements | Power consumption, cooling etc. (affects long-term cost). The current data center has limitations. | Required
Infrastructure Footprint | Minimal Space Requirements | Space consumed in the data center (affects long-term cost). The current data center has limitations. | Required
Infrastructure Footprint | No Appliance Localization | When scaling out by adding an additional appliance unit, does the new unit need to be in close proximity to the existing one? With limited data center space, this is important. | Required
Infrastructure Footprint | Customer Appliance Modifications | Can the appliance be re-assembled in a customer-provided rack? What other modifications can be made by the customer? | Optional
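Because power and cooling affect long-term cost, it can help to convert each vendor's stated power draw into an annual figure. A sketch of the arithmetic, where the PUE (power usage effectiveness: total facility power divided by IT power, used here to fold in cooling) and the electricity price are hypothetical inputs you would replace with your own data center's numbers:

```python
HOURS_PER_YEAR = 24 * 365


def annual_power_cost(it_load_kw: float, pue: float, price_per_kwh: float) -> float:
    """Yearly energy cost for an appliance, including cooling overhead via PUE."""
    facility_kw = it_load_kw * pue  # IT load plus cooling/distribution overhead
    return facility_kw * HOURS_PER_YEAR * price_per_kwh


# Hypothetical appliance drawing 10 kW, with a PUE of 2.0 and 0.10 per kWh.
print(round(annual_power_cost(10.0, 2.0, 0.10), 2))  # 17520.0
```

Running the same calculation for the configuration each vendor proposes, at the volume expected a few years out, makes the long-term cost difference between appliances directly comparable.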

Summarizing
As you can see, there are many aspects to evaluating an appliance. No appliance is likely to fit all of your requirements perfectly; it comes down to striking a middle ground. If you take these criteria, together with your ratings, and ask each appliance vendor to fill in their comments, it becomes easy to compare vendors even before attempting a benchmark. Benchmarks are time- and resource-intensive and require significant up-front planning. Ideally, this evaluation matrix should help you narrow down the list of candidates significantly.