You are on page 1of 47

Irish SQL Academy 2008.

Level 300

Bob Duffy
DTS 2000 SSIS 2005

1.75 Developers

*Figures are only approximations and should not be referenced or quoted


Optimize and • Minimize staging (else use RawFiles if possible)
• Hardware Infrastructure: Disks, RAM, CPU, Network
Stabilize the basics • SQL Infrastructure: File Groups, Indexing, Partitioning

• Replace destinations with RowCount


Measure • Source->RowCount throughput
• Source->Destination throughput

• OVAL performance tuning strategy


• The Three S‟s
Tune • Data Flow Bag of Tricks

• Lookup patterns
Parallelize • Script vs custom transform
• Increase the efficiency of
Sharpen every aspect

• Parallelize, partition,
Share pipeline

• Buy faster, bigger, better


Spend hardware
But be aware of limitations
Row
Based
(synchronous)

Partially
Blocking
(asynchronous)

Blocking
(asynchronous)
http://msdn.microsoft.com/en-us/library/ms345346.aspx
Source data Source servers
EMC CX600 run SSIS Destination Database
server runs SQL EMC CX3-80
2 Gb Fiber Channel Server
1 Gb Ethernet
connections

4 Gb Fiber Channel

Source servers: Destination server:


Unisys ES3220L Unisys ES7000/One
2 sockets each with 4 core 32 sockets each with dual core
Intel 2 GHz CPUs Intel 3.4 GHz CPUs
4 GB RAM 256 GB RAM
Windows Server 2008 Windows Server 2008
SQL Server 2008 SQL Server 2008
Make: Unisys
Model: ES7000/one Enterprise Server
OS: Microsoft Windows Server 2008 x64 Datacenter Edition
CPU: 32 socket dual core Intel® Xeon 3.4 GHz (7140M)
RAM: 256 GB
HBA: 8 dual port 4Gbit FC
NIC: Intel® PRO/1000 MT Server Adapter
Database: Pre-release build of SQL Server 2008 Enterprise Edition (V10.0.1300.4)
Storage: EMC Clariion CX3-80 (Qty 1)
11 trays of 15 disks; 165 spindles x 146 GB 15Krpm; 4Gbit FC

Quantity: 4
Make: Unisys
Model: ES3220L
OS: Windows2008 x64 Enterprise Edition
CPU: 2 socket quad core Intel® Xeon processors @ 2.0GHz
RAM: 4 GB
HBA: 1 dual port 4Gbit Emulex FC
NIC: Intel PRO1000/PT dual port
Database: Pre-release build of SQL Server 2008 Integration Services (V10.0.1300.4)
Storage: 2x EMC CLARiiON CX600 (ea: 45 spindles, 4 2Gbit FC)
C1

C1

C1

C1
Orders Table
Partition Partition Partition Partition Partition Partition Partition Partition
1 2 3 4 5 6 55 56

Orders_1 Orders_2 Orders_3 Orders_4 Orders_5 Orders_6 ... Orders_55 Orders_56

...
SSIS

SSIS

SSIS

SSIS

SSIS

SSIS

SSIS

SSIS
orders.tbl.1 orders.tbl.2 orders.tbl.3 orders.tbl.4 orders.tbl.5 orders.tbl.6 orders.tbl.55 orders.tbl.56
(Package details
removed to protect
the innocent)
Follow Microsoft
• Iterative design, development & testing
Development Guidelines

• People & Processes


Understand the Business • Kimball‟s ETL and SSIS books are an excellent reference

• Resource contention, processing windows, …


Get the big picture • SSIS does not forgive bad database design
• Old principles still apply – e.g. load with/without indexes?

• Will this run on IA64 / X64?


• No BIDS on IA64 – how will I debug?
Platform considerations • Is OLE-DB driver XXX available on IA64?
• Memory and resource usage on different platforms
Process • Break complex ETL into logically distinct packages (vs
monolithic design)

Modularity • Improves development & debug experience

Package • Separate sub-processes within package into separate Containers


• More elegant, easier to develop

Modularity • Simple to disable whole Containers when debugging

Component • Use Script Task/Transform for one-off problems

Modularity
• Build custom components for maximum re-use
Concise naming conventions

Conformed “blueprint” design patterns

Presentable layout

Annotations

Error Logging

Configurations
Get as close to the data as possible
• Limit number of columns
• Filter number of rows

Don‟t be afraid to leverage TSQL


• Type conversions, null coercing, coalescing, data type sharpening
• select nullif(name, „‟) from contacts order by 1
• select convert(tinyint, code) from sales

Performance Testing & Tuning


• Connect Output to RowCount transform
• See Performance Best Practices

„FastParse‟ for text files


BEFORE: AFTER:
select
dbo.Tbl_Dim_Store.SK_Store_ID
select * from etl.uf_FactStoreSales(@Date)
, Tbl_Dim_Store.Store_Num
,isnull(dbo.Tbl_Dim_Merchant_Division.SK_Merch_Di
v_ID, 0) as SK_Merch_Div_ID
from dbo.Tbl_Dim_Store
left outer join dbo.Tbl_Dim_Merchant_Division
on dbo.Tbl_Dim_Store.Merch_Div_Num =
dbo.Tbl_Dim_Merchant_Division.Merch_Div_N
um
where Current_Row = 1
Use the power of TSQL to clean the data 'on the fly'
Avoid over- • Too many moving parts is inelegant and likely slow
• But don‟t be afraid to experiment – there are many ways to
design solve a problem

Maximize • Allocate enough threads


• EngineThreads property on DataFlow Task
Parallelism • See Performance Talk

Minimize • Synchronous vs. Asynchronous components

blocking • Memcopy is expensive

Minimize • For example, minimize data retrieved by LookupTx


ancillary data
Three Modes of • Full Cache – for small lookup datasets
• No Cache – for volatile lookup datasets
Operation • Partial Cache – for large lookup datasets

Tradeoff • Full Cache is optimal, but uses the most memory, also takes time to
load
memory vs. • Partial Cache can be expensive since it populates on the fly using
singleton SELECTs
performance • No Cache uses no memory, but takes longer

Can use Merge • Catch is that it requires Sorted inputs


Join component • See SSIS Performance white paper for more details

instead
• Can written in any .Net language
Custom • Must be signed, registered and installed – but

components can be widely re-used


• Quite fiddly for single task

• Can be written in VisualBasic.Net or C#

Scripts • Are persisted within a package – and have


limited reuse
• Have template methods already created for you
http://sqlcat.com

http://technet.microsoft.com/en-us/library/bb961995.aspx

http://blogs.msdn.com/sqlperf/archive/2008/02/27/etl-world-record.aspx

You might also like