You are on page 1of 47

Irish SQL Academy 2008.

Level 300

Bob Duffy

DTS 2000

SSIS 2005

1.75 Developers

*Figures are only approximations and should not be referenced or quoted

Optimize and
Stabilize the basics

Measure
Tune
Parallelize

Minimize staging (else use RawFiles if possible)


Hardware Infrastructure: Disks, RAM, CPU, Network
SQL Infrastructure: File Groups, Indexing, Partitioning

Replace destinations with RowCount


Source->RowCount throughput
Source->Destination throughput
OVAL performance tuning strategy
The Three Ss
Data Flow Bag of Tricks

Lookup patterns
Script vs custom transform

Sharpen

Increase the efficiency of


every aspect

Share

Parallelize, partition,
pipeline

Spend

Buy faster, bigger, better


hardware

But be aware of limitations

Row
Based
(synchronous)

Partially
Blocking
(asynchronous)

Blocking
(asynchronous)

http://msdn.microsoft.com/en-us/library/ms345346.aspx

Source data Source servers


EMC CX600
run SSIS
2 Gb Fiber Channel

Destination
server runs SQL
Server

Database
EMC CX3-80

1 Gb Ethernet
connections

4 Gb Fiber Channel
Source servers:
Unisys ES3220L
2 sockets each with 4 core
Intel 2 GHz CPUs
4 GB RAM
Windows Server 2008
SQL Server 2008

Destination server:
Unisys ES7000/One
32 sockets each with dual core
Intel 3.4 GHz CPUs
256 GB RAM
Windows Server 2008
SQL Server 2008

Make:
Model:
OS:
CPU:
RAM:
HBA:
NIC:
Database:
Storage:

Unisys
ES7000/one Enterprise Server
Microsoft Windows Server 2008 x64 Datacenter Edition
32 socket dual core Intel Xeon 3.4 GHz (7140M)
256 GB
8 dual port 4Gbit FC
Intel PRO/1000 MT Server Adapter
Pre-release build of SQL Server 2008 Enterprise Edition (V10.0.1300.4)
EMC Clariion CX3-80 (Qty 1)
11 trays of 15 disks; 165 spindles x 146 GB 15Krpm; 4Gbit FC

Quantity:
Make:
Model:
OS:
CPU:
RAM:
HBA:
NIC:
Database:
Storage:

4
Unisys
ES3220L
Windows2008 x64 Enterprise Edition
2 socket quad core Intel Xeon processors @ 2.0GHz
4 GB
1 dual port 4Gbit Emulex FC
Intel PRO1000/PT dual port
Pre-release build of SQL Server 2008 Integration Services (V10.0.1300.4)
2x EMC CLARiiON CX600 (ea: 45 spindles, 4 2Gbit FC)

C1

C1

C1

C1

Orders Table
Partition
1

Orders_1

Partition
2

Orders_2

Partition
3

Orders_3

Partition
4

Orders_4

Partition
5

Orders_5

Partition
6

Orders_6

SSIS

SSIS

SSIS

SSIS

orders.tbl.3

orders.tbl.4

orders.tbl.5

orders.tbl.6

...

Orders_55

orders.tbl.55

Partition
56

Orders_56

SSIS

SSIS
orders.tbl.2

...

SSIS

SSIS
orders.tbl.1

Partition
55

orders.tbl.56

(Package details
removed to protect
the innocent)

Follow Microsoft
Development Guidelines

Iterative design, development & testing

Understand the Business

People & Processes


Kimballs ETL and SSIS books are an excellent reference

Get the big picture

Platform considerations

Resource contention, processing windows,


SSIS does not forgive bad database design
Old principles still apply e.g. load with/without indexes?
Will this run on IA64 / X64?
No BIDS on IA64 how will I debug?
Is OLE-DB driver XXX available on IA64?
Memory and resource usage on different platforms

Process
Modularity

Break complex ETL into logically distinct packages (vs


monolithic design)
Improves development & debug experience

Package
Modularity

Separate sub-processes within package into separate Containers


More elegant, easier to develop
Simple to disable whole Containers when debugging

Component
Modularity

Use Script Task/Transform for one-off problems


Build custom components for maximum re-use

Concise naming conventions


Conformed blueprint design patterns
Presentable layout

Annotations
Error Logging
Configurations

Get as close to the data as possible


Limit number of columns
Filter number of rows

Dont be afraid to leverage TSQL


Type conversions, null coercing, coalescing, data type sharpening
select nullif(name, ) from contacts order by 1
select convert(tinyint, code) from sales

Performance Testing & Tuning


Connect Output to RowCount transform
See Performance Best Practices

FastParse for text files

BEFORE:

AFTER:

select
select * from etl.uf_FactStoreSales(@Date)
dbo.Tbl_Dim_Store.SK_Store_ID
, Tbl_Dim_Store.Store_Num
,isnull(dbo.Tbl_Dim_Merchant_Division.SK_Merch_Di
v_ID, 0) as SK_Merch_Div_ID
from dbo.Tbl_Dim_Store
left outer join dbo.Tbl_Dim_Merchant_Division
on dbo.Tbl_Dim_Store.Merch_Div_Num =
dbo.Tbl_Dim_Merchant_Division.Merch_Div_N
um
where Current_Row = 1

Use the power of TSQL to clean the data 'on the fly'

Avoid overdesign
Maximize
Parallelism
Minimize
blocking
Minimize
ancillary data

Too many moving parts is inelegant and likely slow


But dont be afraid to experiment there are many ways to
solve a problem
Allocate enough threads
EngineThreads property on DataFlow Task
See Performance Talk

Synchronous vs. Asynchronous components


Memcopy is expensive

For example, minimize data retrieved by LookupTx

Three Modes of
Operation

Tradeoff
memory vs.
performance
Can use Merge
Join component
instead

Full Cache for small lookup datasets


No Cache for volatile lookup datasets
Partial Cache for large lookup datasets

Full Cache is optimal, but uses the most memory, also takes time to
load
Partial Cache can be expensive since it populates on the fly using
singleton SELECTs
No Cache uses no memory, but takes longer

Catch is that it requires Sorted inputs


See SSIS Performance white paper for more details

Custom
components

Scripts

Can written in any .Net language


Must be signed, registered and installed but
can be widely re-used
Quite fiddly for single task

Can be written in VisualBasic.Net or C#


Are persisted within a package and have
limited reuse
Have template methods already created for you

http://sqlcat.com

http://technet.microsoft.com/en-us/library/bb961995.aspx

http://blogs.msdn.com/sqlperf/archive/2008/02/27/etl-world-record.aspx

You might also like