Microsoft - Data Warehousing With FastTrack PDW - TechEd Australia 2010

SESSION CODE: #DAT314
Data Warehousing with FastTrack and PDW

Nicholas Dritsas, Principal Program Manager, SQL Server Customer Advisory Team, Microsoft

Agenda
- SQL Server Data Warehouse Offering Overview
- Fast Track Offering
  - Motivation
  - Balanced Architecture Approach for DW
  - Example FastTrack Reference Architectures
  - Optimizing Storage, Load and Maintenance
  - Case Studies
- Parallel Data Warehouse Offering Overview

Microsoft Data Warehousing - Products Positioning

[Figure: four offerings positioned along the axes Scale, Complexity, HA by default, SW-HW integration]
1. SQL Server 2008 R2: minimal HW tune-up/optimization. Supports mixed workloads.
2. SQL Server 2008 R2 with Fast Track Reference Architecture: balanced solution for mostly scan-centric workloads.
3. PDW: max HW tune-up for most DW scenarios.
4. PDW with Hub-and-spoke: most flexible architecture for handling all DW scenarios.

New in SQL Server 2008 Data Warehousing Enablers

- High speed Adapters
- Data Compression
- Star Join Query Optimization
- MERGE SQL Statement
- Backup Compression
- Parallel Query Enhancements
- Change Data Capture (CDC)
- Resource Governor
- Scale-out Shared Databases
- Persistent Lookups
- Policy Based Administration
- Partition-Aligned Indexed Views
- Data Mining Improvements
- Data Profiling
- New - Report Builder 2.0

Included at no charge! No fee-based options:
Compression, Partitioning, Advanced Security, Manageability, ETL, Business Intelligence

SQL Server Relational Data Warehouses Today

- Hundreds of deployments > 1 TB
- Dozens of deployments > 5 TB
- A wide variety of approaches
- Synergy with the SQL Server BI Stack
- Momentum!
- Steady stream of enabling features: Resource Governor, Compression, Star Query, ...
- Next scale breakthrough coming with Parallel Data Warehouse this year

Some SQL Data Warehouses Today

Big SAN and the biggest 64-core server, connected together!
What's wrong with this picture?

System out of balance

- This server can consume 16 GB/sec of IO, but the SAN can only deliver 2 GB/sec
  - Even when the SAN is dedicated to the SQL Data Warehouse, which it often isn't
  - Lots of disks for random IOPS, BUT limited controllers and limited IO bandwidth
- System is typically IO bound and queries are slow
  - Despite significant investment in both server and storage

The Alternative: A Balanced System

- Design a server + storage configuration that can deliver all the IO bandwidth that the CPUs can consume when executing a SQL Relational DW workload
- Avoid sharing storage devices among servers
- Avoid overinvesting in disk drives
  - Focus on scan performance, not IOPS
- Lay out and manage data to maximize range scan performance and minimize fragmentation

What is FastTrack Data Warehouse?

- A method for designing a cost-effective, balanced system for Data Warehouse workloads
- Reference hardware configurations developed in conjunction with hardware partners using this method
- Best practices for data layout, loading and management
- Relational database only: not SSAS, IS, RS

Data Warehouse Workload Characteristics

- Scan intensive
- Hash joins
- Aggregations

Example query:

    SELECT L_RETURNFLAG, L_LINESTATUS,
           SUM(L_QUANTITY) AS SUM_QTY,
           SUM(L_EXTENDEDPRICE) AS SUM_BASE_PRICE,
           SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) AS SUM_DISC_PRICE,
           SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)*(1+L_TAX)) AS SUM_CHARGE,
           AVG(L_QUANTITY) AS AVG_QTY,
           AVG(L_EXTENDEDPRICE) AS AVG_PRICE,
           AVG(L_DISCOUNT) AS AVG_DISC,
           COUNT(*) AS COUNT_ORDER
    FROM LINEITEM
    GROUP BY L_RETURNFLAG, L_LINESTATUS
    ORDER BY L_RETURNFLAG, L_LINESTATUS

Balanced Architecture Components
Balanced System - CPU

Determine your data consumption rate, per CPU core, for your query mix.
Simple example:
- Assume TPCH query 2 is your average query
- Run the query on a test server with data fully cached in memory
- Execute parallel query using MAXDOP 4
- Observe 100% CPU on 4 cores
- Time the query and observe # pages read (SET STATISTICS IO ON; SET STATISTICS TIME ON)
- Per Core Consumption = (# Logical Reads * 8K) / (CPU Time)

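The per-core formula above can be worked through with a small Python sketch. The measurement values are hypothetical, and the function name `per_core_mb_per_sec` is only illustrative. It assumes the CPU time reported for a parallel query is the total across all cores used, which is why dividing by it gives a per-core rate.

```python
PAGE_BYTES = 8 * 1024  # SQL Server data page size (8 KB)


def per_core_mb_per_sec(logical_reads: int, cpu_time_ms: float) -> float:
    """Per-core consumption = (logical reads * 8 KB) / CPU time, in MB/s."""
    if cpu_time_ms <= 0:
        raise ValueError("cpu_time_ms must be positive")
    megabytes = logical_reads * PAGE_BYTES / (1024 * 1024)
    return megabytes / (cpu_time_ms / 1000.0)


# Hypothetical measurement: 1,000,000 logical reads, 40,000 ms total CPU time.
rate = per_core_mb_per_sec(1_000_000, 40_000)
print(f"{rate:.1f} MB/s per core")
```

With these made-up inputs the result is roughly 195 MB/s per core. Real numbers would come from the STATISTICS IO and STATISTICS TIME output of your own query.
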
You can get more sophisticated

- Realize that queries performing complex calculations, format conversions, multi-dimension hash joins, etc. will be more CPU-intensive than others
  - Complex queries will consume data at a slower per-core rate than simpler queries
- Alternative: measure per-core data consumption for a variety of queries, and take the weighted average
  - A standard approach to capacity planning

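As a sketch of the weighted-average approach, the Python below combines per-query rates using relative query frequency as the weight. The rates and weights are invented for illustration; the deck does not say how the weights should be chosen.

```python
def weighted_consumption(measurements):
    """Weighted average of per-core rates.

    measurements: iterable of (mb_per_sec_per_core, weight) pairs, where
    weight is the relative frequency of that query type in the workload.
    """
    measurements = list(measurements)
    total_weight = sum(weight for _, weight in measurements)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(rate * weight for rate, weight in measurements) / total_weight


# Hypothetical mix: simple scans dominate, complex joins are rarer.
mix = [(250, 5), (200, 3), (100, 2)]
print(weighted_consumption(mix))  # 205.0
```
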
Or you can leave it to us

- We've measured a mix of TPCH queries that reflect a prototype Data Warehouse workload
- Concluded that SQL Server 2008 R2 on current x64 cores consumes ~200 MB/sec per core on average for this workload
- We use this as a basis for the published reference architectures
- Your mileage will vary!
  - For precise system sizing, measure your own workload

Balanced System - Determine Storage Sizing

- CPU core count and consumption rate for the workload will determine the # of controllers and enclosures needed to provide aggregate throughput
- # of controllers will determine the minimum disk count for delivering the scan bandwidth
- Determine desired per-disk capacity based on expected data volume
  - Leave enough room for TempDB and for extra copies of the largest tables in the system, for maintenance activities

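A rough Python sketch of this sizing logic follows. It simplifies the chain described above by going straight from required bandwidth to a RAID-1 pair count and skipping the controller step. It uses the ~200 MB/s per core and the 130 MB/s per RAID pair figures quoted elsewhere in this deck. The data volumes and the function `size_storage` are hypothetical.

```python
import math


def size_storage(cores, mb_s_per_core, mb_s_per_raid1_pair,
                 data_gb, tempdb_gb, largest_table_gb, extra_copies=1):
    """Scan bandwidth fixes the disk count; data volume fixes the disk size."""
    required_mb_s = cores * mb_s_per_core
    # Enough RAID-1 pairs to deliver the aggregate scan bandwidth.
    pairs = math.ceil(required_mb_s / mb_s_per_raid1_pair)
    # Usable space: data + TempDB + working copies of the largest table.
    usable_gb = data_gb + tempdb_gb + extra_copies * largest_table_gb
    return {
        "required_mb_s": required_mb_s,
        "raid1_pairs": pairs,
        "disks": 2 * pairs,  # RAID-1 mirrors every disk
        "min_gb_per_disk": math.ceil(usable_gb / pairs),
    }


# Hypothetical 8-core system holding 4 TB of data.
plan = size_storage(cores=8, mb_s_per_core=200, mb_s_per_raid1_pair=130,
                    data_gb=4000, tempdb_gb=400, largest_table_gb=1000)
print(plan)
```

In this example the bandwidth requirement, not the data volume, sets the disk count (13 pairs), which is the point of focusing on scan performance first.
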
Balanced System - IO Stack

- Use a 2x quad-core server as a building block / starting point
- Ensure that the per-core data consumption rate can be delivered by all elements of the IO stack

[Figure: maximum theoretical throughput for IO stack components sized for an 8 CPU core Fast Track system with two 4-core CPU sockets (assumes 200 MB/s per core)]

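One way to check that every element of the IO stack can keep up is to compare each component's aggregate limit with the required rate and report the slowest one. The sketch below does this in Python. The component names and throughput numbers are placeholders, not the values from the slide's figure.

```python
def find_bottleneck(required_mb_s, stack_mb_s):
    """Return (component, throughput, is_sufficient) for the slowest element.

    stack_mb_s maps a component name to its aggregate throughput in MB/s.
    """
    component = min(stack_mb_s, key=stack_mb_s.get)
    throughput = stack_mb_s[component]
    return component, throughput, throughput >= required_mb_s


required = 8 * 200  # 8 cores at an assumed 200 MB/s each

# Hypothetical aggregate limits per component, in MB/s.
stack = {
    "hba_ports": 1600,
    "fiber_switch": 3200,
    "storage_processors": 1600,
    "raid1_pairs": 1430,
}
print(find_bottleneck(required, stack))
```

Here the disks would be the limiting element, so adding HBAs or switch ports alone would not raise scan throughput.
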
Balanced System - Determine Storage Sizing (2)

- Keep in mind theoretical maximums are just that: theoretical
- Some testing/validation may be needed

[Figure: observed bandwidth realized on an 8 core Fast Track system (two 4-core CPU sockets) running SQLIO]

Balanced System - Scaling the IO Stack

[Figure: a server with eight 4-core CPU sockets and multiple HBAs, connected through a fiber switch to storage processors and storage enclosures built from RAID-1 pairs]

Using a Preconfigured FastTrack Reference Architecture

- Guesstimate of 200 MB/sec per core for an average DW workload
  - Equates to 800 MB/sec of enclosure throughput per quad-core CPU
- Estimate total bandwidth needed under query concurrency
  - Derives CPU count
  - Derives total storage profile

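The derivation from total bandwidth to CPU count can be sketched as below. The 200 MB/s per core and quad-core sockets come from the slide. The 3,000 MB/s target and rounding up to whole sockets are assumptions made for the example.

```python
import math


def size_from_bandwidth(total_mb_s, mb_s_per_core=200, cores_per_socket=4):
    """Derive socket and core counts from the total scan bandwidth needed."""
    cores_needed = math.ceil(total_mb_s / mb_s_per_core)
    sockets = math.ceil(cores_needed / cores_per_socket)  # whole sockets only
    cores = sockets * cores_per_socket
    return {
        "sockets": sockets,
        "cores": cores,
        "storage_mb_s": cores * mb_s_per_core,  # bandwidth the storage must deliver
    }


# Hypothetical target: 3,000 MB/s under the expected query concurrency.
print(size_from_bandwidth(3000))
```

For this target the sketch lands on 4 quad-core sockets (16 cores) and 3,200 MB/s of storage throughput, that is 800 MB/s per socket.
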
Published Reference Architectures

Balanced System Examples -- HP / Dell / IBM, 8 to 48 core
Optimizing Storage Layout for Scan Intensive Workloads

- LUN configuration is based on RAID1 pairs
  - Optimal for scan type access patterns
- Striping across storage is accomplished via SQL Server data files
- Observed throughput for a single RAID pair >= 130 MB/s

[Figure: one storage enclosure behind storage processors SP A and SP B, showing disks 01-10 in RAID groups GP01-GP05, data LUNs LUN1-LUN8, LUN0 (Logs) and a hot spare (HS)]

Storage Layout Implications for SQL Server

- Create a SQL data file per LUN, for every filegroup
- TempDB filegroups share the same LUNs as other databases
- Log on separate disks, within each enclosure
  - Striped using SQL striping
  - Log may share these LUNs with load files, backup targets

Storage Layout Implications for SQL Server (cont.)

Data LUNs: LUN 1, LUN 2, LUN 3, ... LUN 16
- Permanent_DB, Permanent FG: Permanent_1.ndf, Permanent_2.ndf, Permanent_3.ndf, ... Permanent_16.ndf
- Stage Database, Stage FG: Stage_1.ndf, Stage_2.ndf, Stage_3.ndf, ... Stage_16.ndf
- TempDB: TempDB.mdf (25GB), TempDB_02.ndf (25GB), TempDB_03.ndf (25GB), ... TempDB_16.ndf (25GB)
Local Drive 1
Log LUN 1: Permanent DB Log, Stage DB Log

How Scans are Optimized

- SQL Server issues a large number of asynchronous read-ahead requests when performing scans
  - Attempts to issue I/O at the rate needed to keep the CPUs busy
- Size of I/O issued depends on the contiguity of the underlying data pages
  - I/O size can be any multiple of 8K up to 512K
- Average request size issued by read-ahead operations can be determined by looking at avg_fragment_size_in_pages, exposed by sys.dm_db_index_physical_stats
  - Values >= 64 pages mean the I/O sizes issued by read-ahead should be at or near 512K

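To make the relationship between fragment size and I/O size concrete, here is a small Python estimate. It is only a rule-of-thumb sketch based on the 8K page size and 512K ceiling stated above. The function name is illustrative and the real engine's behaviour may differ from this simple model.

```python
PAGE_KB = 8      # size of one data page
MAX_IO_KB = 512  # largest read-ahead request (64 pages)


def expected_read_ahead_io_kb(avg_fragment_size_in_pages: float) -> float:
    """Estimate the typical read-ahead I/O size from the average fragment size."""
    if avg_fragment_size_in_pages < 0:
        raise ValueError("fragment size cannot be negative")
    return min(avg_fragment_size_in_pages * PAGE_KB, MAX_IO_KB)


for pages in (10, 64, 200):
    print(pages, "pages ->", expected_read_ahead_io_kb(pages), "KB")
```

A table whose fragments average 10 pages would see requests of about 80K under this model, well short of the 512K that a well-laid-out table allows.
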
Read-Ahead in Action

Clustered index: Key Order
1. Next range of pages to request is determined by looking at the B-Tree for the next range of key values
2. Pages for the range are sorted
3. I/O issued for each contiguous range of pages (up to 64 pages in a single request)

Heap: Allocation Order
1. Scan GAM pages to determine next range of pages
2. I/O issued for each contiguous range of pages (up to 64 pages in a single request)

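The sort-then-group step can be illustrated with a short Python model. It takes a set of page numbers, sorts them, and groups them into contiguous runs of at most 64 pages, one run per I/O request. This is a simplified model for intuition, not SQL Server's actual implementation.

```python
def read_ahead_requests(page_ids, max_pages=64):
    """Group page ids into contiguous runs, capped at max_pages per request.

    Returns a list of (first_page, page_count) tuples.
    """
    requests = []
    for page in sorted(set(page_ids)):
        last = requests[-1] if requests else None
        # Extend the current run only if the page is adjacent and the run has room.
        if last and page == last[0] + last[1] and last[1] < max_pages:
            last[1] += 1
        else:
            requests.append([page, 1])
    return [(start, count) for start, count in requests]


# Contiguous pages collapse into few large requests...
print(read_ahead_requests(range(100, 170)))        # [(100, 64), (164, 6)]
# ...while fragmented pages need many small ones.
print(read_ahead_requests([1, 2, 3, 10, 11, 20]))  # [(1, 3), (10, 2), (20, 1)]
```
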
Techniques to Maximize Scan Throughput

- Use the -E startup parameter
- Minimize use of NonClustered indexes on Fact Tables
- Load techniques to avoid fragmentation
  - Load in Clustered Index order (e.g. date) when possible
- Index Creation always MAXDOP 1, SORT_IN_TEMPDB
- Isolate volatile tables in separate filegroup
- Isolate staging tables in separate filegroup or DB
- Periodic maintenance

Conventional data loads lead to fragmentation

- Bulk Inserts into Clustered Index using a moderate batchsize parameter
  - Each batch is sorted independently
  - Overlapping batches lead to page splits

[Figure: page IDs 1:31 through 1:40 shown against the key order of the index]

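Whether a set of load batches will collide in the index can be checked before loading by comparing their key ranges. The Python sketch below flags pairs of batches whose ranges overlap. The batches and the helper `overlapping_batches` are hypothetical, and every batch is assumed to be non-empty.

```python
def overlapping_batches(batches):
    """Return index pairs (i, j) of batches whose key ranges overlap.

    Each batch is sorted on its own, so two batches interleave in the
    clustered index whenever their [min, max] key ranges intersect.
    """
    ranges = [(min(batch), max(batch)) for batch in batches]
    return [
        (i, j)
        for i in range(len(ranges))
        for j in range(i + 1, len(ranges))
        if ranges[i][0] <= ranges[j][1] and ranges[j][0] <= ranges[i][1]
    ]


# Batch 1 (keys 4..12) overlaps batch 0 (keys 1..9); batch 2 does not overlap.
batches = [[1, 5, 9], [4, 6, 12], [13, 15]]
print(overlapping_batches(batches))  # [(0, 1)]
```

An empty result means the batches can be appended one after another in key order without rows landing in pages that are already full.
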
Alternatives for loading

- Use a heap
  - Practical if queries need to scan whole partitions
- or use a batchsize = 0
  - Fine if no parallelism is needed during load
- or use a Two-Step Load
  1. Load to a Staging Table (heap)
  2. INSERT-SELECT from Staging Table into Target CI
  - Resulting rows are not fragmented
  - Can use parallelism in step 1: essential for large data volumes

Two-Step Load Variations

To achieve high parallelism during historical load:
- Typically into a partitioned table
- Use a Staging Table (heap) that is partitioned identically to the Target Table
- Use multiple concurrent streams to load the Staging Table with moderate batchsize (SSIS, Bulk Insert, etc.)
- INSERT-SELECT separate partitions into the Target Table, potentially in parallel
  - Use ALTER TABLE SET (LOCK_ESCALATION = AUTO)
- Note: If memory is limited, TempDB could be heavily used for sorting

Two-Step Load Variations (cont.)

To avoid most TempDB space and TempDB IO during load:
- Use a partitioned Staging Table that is also indexed identically to the Target Table
- Load Staging Table using moderate batchsize (< 1M rows)
- Final INSERT-SELECTs will avoid any sort!
- However, the staging loads will be logged
- Note: Parallelism will be limited if load batches overlap

Other fragmentation best practices

- Avoid Autogrow of filegroups
  - Pre-allocate filegroups to desired long-term size
  - Manually grow in large increments when necessary
- Keep volatile tables in a separate filegroup
  - Tables that are frequently rebuilt or loaded in small increments
- If historical partitions are loaded in parallel, consider separate filegroups for separate partitions to avoid extent fragmentation

Sometimes fragmentation can't be avoided

- If incremental loads overlap data already present in the Clustered Index, page splits will occur anyway
- Periodic table maintenance can reduce the fragmentation
- Partitioning on history (date key) can help minimize needed maintenance operations

Maintenance considerations

- Use ALTER INDEX REBUILD WITH (MAXDOP = 1, SORT_IN_TEMPDB)
  - Single threaded: avoids creating new extent fragmentation
  - Can rebuild just the current partition
- Avoid ALTER INDEX REORGANIZE
  - Pages will become physically ordered, but significant extent fragmentation may occur

Handling long-term accumulation of fragmentation

Sometimes it may be best to start fresh:
1. Create a new filegroup to replace the old
2. Create a new copy of the table in the new filegroup
   - With matching Partitions and Clustered Index
3. INSERT-SELECT from old to new (avoids a sort)
4. Build secondary indexes
5. Drop original table and rename the new
All but the final step can be performed online

Case 1: Insurance Claims - High-volume loads in a short load window

- Example: Load and enrich 50 GB of incremental data in less than 1 hour
- Only possible with a highly parallel load design
- Use partitioned destination table
  - Partitioned by equal ranges of customer key
  - But a Clustered Index on Date
  - # partitions = # cores
- Parallel loading to staging table first
- Separate filegroups per partition prevent interleaving during load

System Design

[Figure: MSA2000 DAE with drives labelled Pri_A, Pri_B, Pri_C, Pri_D, Log, Hot Spare, Hot Spare]
- Primary Storage: 8 Drives (4 RAID1 Pairs)
- Logs: 2 Drives (1 RAID1 Pair)
- Spares: 2 Drives

Results

Measure                                       Existing Appliance   SQL Server Fast Track DW   Comparison
Loading Subject Area 1 (total time)           5:10:21              51:31                      SQL Server 6x faster
Loading Subject Area 2 (total time)           4:36:08              1:50:01                    SQL Server 2.5x faster
Query times Subject Area 1 (avg, 9 queries)   3:03                 0:15                       SQL Server 12x faster
Query times Subject Area 2 (avg, 4 queries)   56:44                8:09                       SQL Server 7x faster

Price per TB (8TB) Cal: $22K / TB
Price per TB (16TB) Cal: $13K / TB

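The comparison factors quoted in these results can be reproduced from the timings with a few lines of Python. The second Fast Track load time is printed as "1:50.01" in the source; the sketch assumes it means 1:50:01.

```python
def to_seconds(clock: str) -> int:
    """Convert 'h:mm:ss' or 'm:ss' into seconds."""
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def speedup(existing: str, fast_track: str) -> float:
    """How many times faster the second timing is than the first."""
    return to_seconds(existing) / to_seconds(fast_track)


timings = {
    "Loading Subject Area 1": ("5:10:21", "51:31"),
    "Loading Subject Area 2": ("4:36:08", "1:50:01"),  # assumed reading of 1:50.01
    "Query times Subject Area 1": ("3:03", "0:15"),
    "Query times Subject Area 2": ("56:44", "8:09"),
}
for name, (old, new) in timings.items():
    print(f"{name}: {speedup(old, new):.1f}x faster")
```

The computed ratios (about 6.0, 2.5, 12.2 and 7.0) agree with the rounded factors shown on the slide.
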
Case 2: Telecom -- Initial Data Load

- Load 400 GB to a new Clustered Index on an 8-core server in under 7 hours
- Target table designed with 8 partitions of evenly spaced historical ranges
- 3-step load process leveraging partitioning
  - Load, Index, Switch
  - All steps use parallelism
  - Minimal logging

Case 2: Telecom -- Initial Data Load (cont.)

- Data size: 400G (50G * 8)
- Bulk Insert 8 files to match core count, and partition the final table according to core count
- 1 Heap Table per destination partition; the final table is assumed to be empty
- Create Clustered Index on the Heap Tables, and 1:1 switch each into the final Partitioned Table
- SSIS Package Attributes/MaxConcurrentExecutables: 8
- Use MAXDOP=1: minimal fragmentation

Steps:
1. Bulk Insert
2. Create Clustered Index
3. Switch

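Evenly spaced historical ranges, one per core, need boundary values for the partition function. A minimal Python sketch for computing such boundaries is shown below. The date range is invented, and `partition_boundaries` is an illustrative helper rather than anything from the case study.

```python
from datetime import date, timedelta


def partition_boundaries(start: date, end: date, partitions: int):
    """Return partitions - 1 boundary dates that split [start, end) evenly."""
    if partitions < 1 or end <= start:
        raise ValueError("need at least one partition and end after start")
    total_days = (end - start).days
    return [
        start + timedelta(days=total_days * i // partitions)
        for i in range(1, partitions)
    ]


# Hypothetical 80-day history split into 8 partitions (one per core).
start = date(2010, 1, 1)
bounds = partition_boundaries(start, start + timedelta(days=80), 8)
print(bounds[0], "...", bounds[-1], f"({len(bounds)} boundaries)")
```

Eight partitions need seven boundaries. Each heap table in the load then covers exactly one of the resulting ranges before being switched into the final table.
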
SQL Server Parallel Data Warehouse

A data warehouse appliance with massive scalability
- Massive scale-out of SQL Server through a Massively Parallel Processing (MPP) system: 10s TB, 100s TB, PB
- Choice of hardware vendor: Reference Architectures from HP, Bull, EMC, Dell, IBM
- Low cost of ownership through industry standard hardware
- Simplified deployment & maintenance via appliance model
- Integration with existing SQL Server 2008 data warehouses via Hub & Spoke Architecture
- Deep integration with Microsoft BI

Parallel Data Warehouse Appliance - Hardware Architecture

[Figure: appliance components, with nodes labelled SQL]
- Control Nodes (Active / Passive): Client Drivers
- Management Servers: Data Center Monitoring
- Landing Zone: ETL Load Interface
- Backup Node: Corporate Backup Solution
- Database Servers, Spare Database Server
- Storage Nodes
- Networks: Corporate Network, Private Network, Dual Fiber Channel, Dual Infiniband

Question & Answer Session
2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
Resources
- www.msteched.com/Australia: Sessions On-Demand & Community
- www.microsoft.com/australia/learning: Microsoft Certification & Training Resources
- http://technet.microsoft.com/en-au: Resources for IT Professionals
- http://msdn.microsoft.com/en-au: Resources for Developers
