
Irish SQL Academy 2008.

Level 300

Bob Duffy
DTS 2000 SSIS 2005

1.75 Developers

*Figures are only approximations and should not be referenced or quoted


Optimize and stabilize the basics
Minimize staging (else use RawFiles if possible)
Hardware infrastructure: disks, RAM, CPU, network
SQL infrastructure: file groups, indexing, partitioning

Measure
Replace destinations with RowCount
Source->RowCount throughput
Source->Destination throughput
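The two measurements above can be illustrated outside SSIS. The following is a minimal Python sketch, not SSIS code: `measure_throughput` and `RowCountSink` are hypothetical names, and the in-memory list is an assumed stand-in for a real destination. Running the same source once into a counting sink and once into the destination separates the cost of reading from the cost of writing.

```python
import time
from typing import Callable, Iterable


def measure_throughput(source: Iterable, sink: Callable[[object], None],
                       clock: Callable[[], float] = time.perf_counter):
    """Push every row from source into sink; return (rows, rows_per_second)."""
    start = clock()
    rows = 0
    for row in source:
        sink(row)
        rows += 1
    elapsed = clock() - start
    # Guard against a zero-length interval on very small inputs.
    rate = rows / elapsed if elapsed > 0 else float("inf")
    return rows, rate


class RowCountSink:
    """Stand-in for the RowCount transform: counts rows and discards them."""

    def __init__(self):
        self.count = 0

    def __call__(self, row):
        self.count += 1


if __name__ == "__main__":
    source_rows = [(i, f"name-{i}") for i in range(100_000)]

    baseline = RowCountSink()
    rows, rate = measure_throughput(source_rows, baseline)
    print(f"Source->RowCount:    {rows} rows at {rate:,.0f} rows/s")

    destination = []  # hypothetical destination; a real one would write to a table
    rows, rate = measure_throughput(source_rows, destination.append)
    print(f"Source->Destination: {rows} rows at {rate:,.0f} rows/s")
```

If the two rates are close, the source is the bottleneck; a large gap points at the destination.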

OVAL performance tuning strategy


Tune
Data flow bag of tricks
Lookup patterns
Script vs custom transform
Parallelize

The Three Ss
Sharpen: increase the efficiency of every aspect
Share: parallelize, partition, pipeline
Spend: buy faster, bigger, better hardware
But be aware of limitations
Row based (synchronous)
Partially blocking (asynchronous)
Blocking (asynchronous)
http://msdn.microsoft.com/en-us/library/ms345346.aspx
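The difference between the three transformation types is how much input each must consume before it can emit output. The sketch below uses Python generators as an analogy only; the function names are hypothetical, and the batching function is an assumed stand-in for a partially blocking component, not a model of any specific SSIS transform.

```python
from typing import Iterable, Iterator, List


def traced(rows: Iterable[str], log: List[str]) -> Iterator[str]:
    """Record each row as it is pulled, so consumption can be observed."""
    for row in rows:
        log.append(row)
        yield row


def row_based_upper(rows: Iterable[str]) -> Iterator[str]:
    """Row based (synchronous): each input row yields one output row at once."""
    for row in rows:
        yield row.upper()


def partially_blocking_batches(rows: Iterable[str], size: int) -> Iterator[List[str]]:
    """Partially blocking: output appears only after some input has accumulated."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def blocking_sort(rows: Iterable[str]) -> Iterator[str]:
    """Blocking: every input row must be read before the first output row."""
    yield from sorted(rows)


if __name__ == "__main__":
    for make in (row_based_upper,
                 lambda r: partially_blocking_batches(r, 2),
                 blocking_sort):
        log = []
        first = next(make(traced(["b", "a", "c"], log)))
        print(f"first output {first!r} after reading {len(log)} row(s)")
```

The row-based function produces its first output after one input row, the batching function after two, and the sort only after all three.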
[Diagram: source data (EMC CX600) -> 2 Gb Fiber Channel -> source servers (run SSIS) -> 1 Gb Ethernet connections -> destination server (runs SQL Server) -> 4 Gb Fiber Channel -> database (EMC CX3-80)]

Source servers:                  Destination server:
Unisys ES3220L                   Unisys ES7000/One
2 sockets each with 4 core       32 sockets each with dual core
Intel 2 GHz CPUs                 Intel 3.4 GHz CPUs
4 GB RAM                         256 GB RAM
Windows Server 2008              Windows Server 2008
SQL Server 2008                  SQL Server 2008
Make: Unisys
Model: ES7000/one Enterprise Server
OS: Microsoft Windows Server 2008 x64 Datacenter Edition
CPU: 32 socket dual core Intel Xeon 3.4 GHz (7140M)
RAM: 256 GB
HBA: 8 dual port 4Gbit FC
NIC: Intel PRO/1000 MT Server Adapter
Database: Pre-release build of SQL Server 2008 Enterprise Edition (V10.0.1300.4)
Storage: EMC Clariion CX3-80 (Qty 1)
11 trays of 15 disks; 165 spindles x 146 GB 15Krpm; 4Gbit FC

Quantity: 4
Make: Unisys
Model: ES3220L
OS: Windows Server 2008 x64 Enterprise Edition
CPU: 2 socket quad core Intel Xeon processors @ 2.0GHz
RAM: 4 GB
HBA: 1 dual port 4Gbit Emulex FC
NIC: Intel PRO/1000 PT dual port
Database: Pre-release build of SQL Server 2008 Integration Services (V10.0.1300.4)
Storage: 2x EMC CLARiiON CX600 (ea: 45 spindles, 4 2Gbit FC)
[Diagram: the Orders table has 56 partitions (Partition 1 ... Partition 56). Each partition has its own table (Orders_1 ... Orders_56), and each table is loaded by a separate SSIS data flow from its own input file (orders.tbl.1 ... orders.tbl.56).]
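The one-file-per-partition layout in the diagram lends itself to independent, parallel loads. The sketch below is a hedged illustration in Python rather than an SSIS package: `build_plan`, `load_partition`, and `run_parallel` are hypothetical names, the loader only returns a description instead of moving data, and the worker count is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def build_plan(partitions: int):
    """Pair each input file with its own target table, one pair per partition."""
    return [(f"orders.tbl.{n}", f"Orders_{n}") for n in range(1, partitions + 1)]


def load_partition(task):
    """Hypothetical loader: a real one would stream the file into the table."""
    source_file, target_table = task
    return f"{source_file} -> {target_table}"


def run_parallel(plan, workers: int):
    """Run the independent loads concurrently; results keep the plan's order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(load_partition, plan))


if __name__ == "__main__":
    plan = build_plan(56)
    for line in run_parallel(plan, workers=4)[:3]:
        print(line)
```

Because no two loads touch the same file or the same table, they need no coordination beyond waiting for all of them to finish.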
(Package details removed to protect the innocent)
People & Processes
Follow Microsoft Development Guidelines
Iterative design, development & testing

Understand the Business
Kimball's ETL and SSIS books are an excellent reference

Get the big picture
Resource contention, processing windows
SSIS does not forgive bad database design
Old principles still apply, e.g. load with/without indexes?

Platform considerations
Will this run on IA64 / X64?
No BIDS on IA64: how will I debug?
Is OLE-DB driver XXX available on IA64?
Memory and resource usage on different platforms
Process Modularity
Break complex ETL into logically distinct packages (vs monolithic design)
Improves development & debug experience

Package Modularity
Separate sub-processes within package into separate Containers
More elegant, easier to develop
Simple to disable whole Containers when debugging

Component Modularity
Use Script Task/Transform for one-off problems
Build custom components for maximum re-use
Concise naming conventions

Conformed blueprint design patterns

Presentable layout

Annotations

Error Logging

Configurations
Get as close to the data as possible
Limit number of columns
Filter number of rows

Don't be afraid to leverage TSQL
Type conversions, null coercing, coalescing, data type sharpening
select nullif(name, '') from contacts order by 1
select convert(tinyint, code) from sales
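The effect of pushing null coercion and type conversion into the source query can be tried without a SQL Server instance. The sketch below is an assumption-laden stand-in: it uses Python's built-in `sqlite3` module with made-up sample rows, and because SQLite has no `tinyint` or `convert`, it substitutes `cast(code as integer)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Made-up sample data; table and column names follow the queries on the slide.
conn.execute("create table contacts (name text)")
conn.executemany("insert into contacts values (?)", [("Ann",), ("",), ("Bob",)])
conn.execute("create table sales (code text)")
conn.executemany("insert into sales values (?)", [("7",), ("42",)])

# Empty strings become NULL before the rows ever reach the data flow.
names = [row[0] for row in conn.execute(
    "select nullif(name, '') from contacts order by 1")]

# SQLite stand-in for convert(tinyint, code): the column arrives already numeric.
codes = [row[0] for row in conn.execute(
    "select cast(code as integer) from sales order by 1")]

print(names)
print(codes)
```

The downstream pipeline then receives clean, correctly typed columns and needs no separate conversion step.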

Performance Testing & Tuning


Connect Output to RowCount transform
See Performance Best Practices

FastParse for text files


BEFORE:
select
    dbo.Tbl_Dim_Store.SK_Store_ID
    , Tbl_Dim_Store.Store_Num
    , isnull(dbo.Tbl_Dim_Merchant_Division.SK_Merch_Div_ID, 0) as SK_Merch_Div_ID
from dbo.Tbl_Dim_Store
left outer join dbo.Tbl_Dim_Merchant_Division
    on dbo.Tbl_Dim_Store.Merch_Div_Num = dbo.Tbl_Dim_Merchant_Division.Merch_Div_Num
where Current_Row = 1

AFTER:
select * from etl.uf_FactStoreSales(@Date)
Use the power of TSQL to clean the data 'on the fly'
Avoid over-design
Too many moving parts is inelegant and likely slow
But don't be afraid to experiment: there are many ways to solve a problem

Maximize Parallelism
Allocate enough threads
EngineThreads property on DataFlow Task
See Performance Talk

Minimize blocking
Synchronous vs. Asynchronous components
Memcopy is expensive

Minimize ancillary data
For example, minimize data retrieved by LookupTx
Three Modes of Operation
Full Cache for small lookup datasets
No Cache for volatile lookup datasets
Partial Cache for large lookup datasets

Tradeoff memory vs. performance
Full Cache is optimal, but uses the most memory, also takes time to load
Partial Cache can be expensive since it populates on the fly using singleton SELECTs
No Cache uses no memory, but takes longer
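The memory versus performance tradeoff between the three cache modes comes down to how often the reference table is queried. The following Python sketch is an analogy, not the Lookup transform's implementation: `CountingSource` is a hypothetical reference table that counts its queries, and the three factory functions are assumed simplifications of each mode.

```python
from typing import Dict


class CountingSource:
    """Hypothetical reference table that counts how often it is queried."""

    def __init__(self, table: Dict[int, str]):
        self.table = table
        self.queries = 0

    def fetch_one(self, key):
        self.queries += 1          # one singleton SELECT
        return self.table.get(key)

    def fetch_all(self):
        self.queries += 1          # one full scan
        return dict(self.table)


def full_cache_lookup(source):
    cache = source.fetch_all()     # load everything up front: most memory
    return lambda key: cache.get(key)


def partial_cache_lookup(source):
    cache = {}

    def lookup(key):
        if key not in cache:       # miss: populate on the fly
            cache[key] = source.fetch_one(key)
        return cache[key]

    return lookup


def no_cache_lookup(source):
    return source.fetch_one        # every row queries the source; nothing is held


if __name__ == "__main__":
    keys = [1, 2, 1, 1, 2]
    for factory in (full_cache_lookup, partial_cache_lookup, no_cache_lookup):
        source = CountingSource({1: "a", 2: "b", 3: "c"})
        lookup = factory(source)
        values = [lookup(k) for k in keys]
        print(factory.__name__, values, "queries:", source.queries)
```

For the five input keys in the example, the full cache issues one query, the partial cache one per distinct key, and no cache one per row.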

Can use Merge Join component instead
Catch is that it requires Sorted inputs
See SSIS Performance white paper for more details
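Why a merge join needs sorted inputs is easiest to see in code. This is a minimal Python sketch of a generic merge join, not the SSIS component: `merge_join` is a hypothetical name, it performs an inner join only, and it assumes keys are unique on the right-hand (reference) side.

```python
def merge_join(left, right):
    """Inner-join two iterables of (key, value) pairs, both sorted by key.

    Assumes keys are unique on the right side (a lookup-style reference input).
    """
    right_iter = iter(right)
    current = next(right_iter, None)
    for key, value in left:
        # Advance the right side until it catches up with the left key.
        while current is not None and current[0] < key:
            current = next(right_iter, None)
        if current is not None and current[0] == key:
            yield key, value, current[1]


if __name__ == "__main__":
    facts = [(1, "x"), (2, "y"), (2, "z"), (4, "w")]
    reference = [(1, "a"), (2, "b"), (3, "c")]
    for row in merge_join(facts, reference):
        print(row)
```

Each input is read once, front to back, and only one reference row is held at a time, which is why the join is cheap on memory. The same property is the catch: if the left input is not sorted, keys that arrive after the right side has already moved past them are silently dropped.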
Custom components
Can be written in any .NET language
Must be signed, registered and installed, but can be widely re-used
Quite fiddly for a single task

Scripts
Can be written in VisualBasic.Net or C#
Are persisted within a package and have limited reuse
Have template methods already created for you
http://sqlcat.com

http://technet.microsoft.com/en-us/library/bb961995.aspx

http://blogs.msdn.com/sqlperf/archive/2008/02/27/etl-world-record.aspx
