Msbi - Ssis (2.0)
ETL Training
Microsoft SQL Server Integration Services 2008
Agenda
1. Introduction
2. SSIS Architecture
3. Control flow
4. Data flow
5. Labs
6. Emerging scenarios (Fuzzy, SCD, RSS)
7. More Hands-on Lab
8. SQL 2000 – SQL 2005 Migration
9. Resources
What You Learn
• Reporting Services
• Analysis Services: OLAP & Data Mining
• Integration Services: ETL
• SQL Server: Relational Engine
• Development Tools and Management Tools
BI Stack Usage (ETL – Cube – Reporting)
[Figure: BI stack usage diagram with the labels Databases, Flat Files, Excel, ETL, Warehouse, Data Cube, Standard Reports, and Adhoc Reports.]
Introduction to SSIS and Product History
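[Figure: SSIS package overview. Control Flow contains Tasks, Loops & Sequences, and Event Handlers (examples shown: Send Mail, Loop, Data Flow). Data Flow contains Data Source Adapters, transformations (examples shown: Merge, Split), and Data Destination Adapters (examples shown: SQL Server, Flat File).]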
SSIS Package Architecture Overview
Microsoft SQL Server 2005 Integration Services (SSIS) consists of four key parts:
Integration Services service
Integration Services object model
Integration Services runtime and run-time executables
Data Flow task
SQL Server Import and Export Wizard
– Launching BIDS
– Project templates – creating and opening SSIS projects
A Guided tour of BI Development Studio (Cont..)
[Figure: BI Development Studio screenshot labeling the package designer (Control Flow and Data Flow tabs), Package Explorer, Solution Explorer, the Toolbox, and the Properties window.]
SSIS Packages, Control Flow & Task
• Package
Unit of Execution
One Or More Tasks
Data Flow
Control Flow
Connection Managers
SSIS Packages, Control Flow & Task (Cont..)
1. Precedence Constraints
2. Tasks
3. Containers
Control Flow – Loop Containers
• Sequence Container
Control Flow – Sequence Containers
Sequence containers group the package into multiple separate control flows, each
containing one or more tasks and containers that run within the overall package control
flow. As each Sequence container is treated as a single unit of operation, it enables the
execution and tracking of parallel operations.
Lab: Creating a Simple SSIS Package
Exercise 1: Move data from one table to another using a simple SSIS package
Provider: Select the provider to use for the connection.
Property pane: Sets the provider-specific connection string properties necessary to create a connection.
Test connection: Uses the current settings in the property pane to try to establish a connection and display whether the attempt succeeded.
Data Flow
Introduction to Data Flow
1. Data Extraction
2. Data Routing
3. Data Transformations
4. Data Cleansing
5. Data Warehouse Support
6. Business Intelligence Integration
Introduction to Data Flow - Sources
Sources are the data flow components that make data from different types of
data sources available to a data flow. Sources have one regular output and many
have one error output.
OLE DB Source Editor Flat File Editor and XML Source Editor
Introduction to Data Flow – Destination
Destinations are the data flow components that load the data in a data
flow into different types of data sources or create an in-memory dataset.
Destinations have one input and one error output.
The following is the list of destinations that SQL Server
Integration Services (SSIS) provides.
Data Mining Model Training Destination
DataReader Destination
Dimension Processing Destination
Excel Destination
Flat File Destination
OLE DB Destination
Partition Processing Destination
Raw File Destination
Recordset Destination
Script Component
SQL Server Compact Edition Destination
SQL Server Destination
Introduction to Data Flow – Transformations
SQL Server 2008 Integration Services (SSIS) transformations are the components in the
data flow of a package that aggregate, merge, distribute, and modify data.
The Copy Column transformation creates new columns by copying input columns and adding
the new columns to the transformation output. Later in the data flow, different transformations
can be applied to the column copies.
Transformations - Data Conversion Transformation
The Data Conversion transformation converts the data in an input column to a different data type and then
copies it to a new Output column.
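The behavior described above can be illustrated with a minimal sketch in plain Python (not SSIS itself). It converts one input column to a different type and writes the result to a new output column, leaving the original column untouched. The function name `data_conversion`, the column names, and the sample rows are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator


def data_conversion(
    rows: Iterable[Dict[str, object]],
    input_column: str,
    output_column: str,
    convert: Callable[[object], object],
) -> Iterator[Dict[str, object]]:
    """Yield each row with an extra column holding the converted value."""
    for row in rows:
        new_row = dict(row)  # copy so the input row is not modified
        new_row[output_column] = convert(row[input_column])
        yield new_row


# Hypothetical sample data: quantities arrive as text.
source_rows = [{"OrderID": 1, "Qty": "10"}, {"OrderID": 2, "Qty": "25"}]
converted = list(data_conversion(source_rows, "Qty", "QtyInt", int))
print(converted)
```

The original `Qty` column stays in each row as text, and the converted value appears alongside it in `QtyInt`.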
Transformations - Conditional Split Transformation
The Conditional Split transformation can route data rows to different outputs depending on the
content of the data.
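As a rough illustration of this routing in plain Python (not SSIS itself), the sketch below sends each row to the first output whose condition is true, and to a default output when no condition matches. The function name `conditional_split`, the output names, and the sample rows are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def conditional_split(
    rows: List[Row],
    conditions: List[Tuple[str, Callable[[Row], bool]]],
    default_output: str = "Default",
) -> Dict[str, List[Row]]:
    """Route each row to the first output whose condition is true."""
    outputs: Dict[str, List[Row]] = {name: [] for name, _ in conditions}
    outputs[default_output] = []
    for row in rows:
        for name, predicate in conditions:
            if predicate(row):
                outputs[name].append(row)
                break
        else:
            # No condition matched: send the row to the default output.
            outputs[default_output].append(row)
    return outputs


# Hypothetical sample rows and conditions, evaluated in the order listed.
orders = [
    {"OrderID": 1, "Amount": 50},
    {"OrderID": 2, "Amount": 500},
    {"OrderID": 3, "Amount": 5000},
]
split = conditional_split(
    orders,
    [
        ("Large", lambda r: r["Amount"] >= 1000),
        ("Medium", lambda r: r["Amount"] >= 100),
    ],
    default_output="Small",
)
print({name: [r["OrderID"] for r in rows] for name, rows in split.items()})
```

The order of the conditions matters: a row with an amount of 5000 also satisfies the "Medium" condition, but it is routed only to "Large" because that condition is evaluated first.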
Transformations - Sort Transformation
A data viewer displays data that is moving between two data flow
components. The data flow components can be sources,
transformations and destinations.
To read data from flat files, SSIS needs to know the character encoding of the
file. Flat files are generally delimited or fixed width, and a text file also has
to encode things like line feeds and carriage returns.
The ASCII character set has only 128 characters (e.g. 0-9, A-Z, a-z).
Many flat files contain other characters too, such as Greek letters. The difficulty
is that, to read a text file properly, the reading program needs to know which
encoding was used when the file was created; otherwise it might display the wrong
character or substitute a default "unknown character".
When you create a connection to a text file, SSIS needs the character
encoding of the file to be specified. SSIS calls this a code page.
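The effect of choosing the wrong code page can be shown with a small Python sketch (not SSIS itself). It encodes three Greek letters with Windows code page 1253 and then decodes the same bytes with the correct code page, with the wrong one (1252), and as ASCII with replacement. The sample text is hypothetical.

```python
# Three Greek letters (alpha, beta, gamma), written with escapes so the
# example does not depend on the encoding of this source file.
greek_text = "\u03b1\u03b2\u03b3"
raw_bytes = greek_text.encode("cp1253")  # Windows code page 1253 (Greek)

right = raw_bytes.decode("cp1253")  # correct code page: original text
wrong = raw_bytes.decode("cp1252")  # wrong code page: different characters

# ASCII cannot represent these bytes, so a strict decode would fail;
# errors="replace" substitutes the "unknown character" U+FFFD instead.
replaced = raw_bytes.decode("ascii", errors="replace")

print(right == greek_text)
print(wrong == greek_text)
print(replaced == "\ufffd" * 3)
```

Decoding with code page 1252 does not raise an error; it silently yields accented Latin letters, which is why the code page has to be specified rather than guessed.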
Package Encryption
ProtectionLevel is an SSIS package property that is used to specify how
sensitive information is saved within the package and also whether to encrypt
the package or the sensitive portions of the package.
ServerStorage: SSIS does not encrypt any part of the package. Protection relies on
SQL Server security.
Synchronous and Asynchronous Transformations
Synchronous:
A synchronous transformation processes incoming rows and passes
them on in the data flow one row at a time. Output is synchronous with input.
Ex: Data Conversion
Asynchronous:
An asynchronous transformation cannot pass each row along in the
data flow as it is processed.
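The difference can be sketched with Python generators (an analogy, not the SSIS engine). The synchronous function emits each row as soon as it is read, while the asynchronous one, a sort chosen here purely as an illustration, has to read every row before it can emit any. All names and sample values are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]


def synchronous_convert(rows: Iterable[Row]) -> Iterator[Row]:
    """Synchronous style: each output row is produced as its input row arrives."""
    for row in rows:
        yield {**row, "Qty": float(row["Qty"])}


def asynchronous_sort(rows: Iterable[Row]) -> Iterator[Row]:
    """Asynchronous style: nothing can be emitted until all input has been read."""
    buffered: List[Row] = list(rows)  # must consume the whole input first
    buffered.sort(key=lambda r: r["Qty"])
    yield from buffered


def source(log: List[str]) -> Iterator[Row]:
    """Hypothetical source that records when each row is read."""
    for qty in [3, 1, 2]:
        log.append(f"read {qty}")
        yield {"Qty": qty}


sync_log: List[str] = []
for row in synchronous_convert(source(sync_log)):
    sync_log.append(f"out {row['Qty']}")

async_log: List[str] = []
for row in asynchronous_sort(source(async_log)):
    async_log.append(f"out {row['Qty']}")

print(sync_log)
print(async_log)
```

In the synchronous log, reads and outputs alternate. In the asynchronous log, all reads come before the first output.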
SSIS 2008 provides a package execution logging feature. The following are the
five types of log providers.
1) Text File
2) SQL Server
3) SQL Server Profiler
4) XML File
5) Windows Event Log
A checkpoint restarts package execution from the point of failure in SSIS. To
configure checkpoints in a package, the following properties need to be
configured at the package level: CheckpointFileName, CheckpointUsage, and SaveCheckpoints.
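The restart idea can be sketched in plain Python as a loose analogy, not the SSIS implementation. Completed tasks are recorded in a checkpoint file, a rerun skips them, and the file is removed after a successful run. The function name, the JSON file format, and the task names are all hypothetical.

```python
import json
import os
import tempfile
from typing import Callable, List, Tuple


def run_with_checkpoint(
    tasks: List[Tuple[str, Callable[[], None]]], checkpoint_path: str
) -> List[str]:
    """Run tasks in order, skipping those a previous failed run completed."""
    completed: List[str] = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            completed = json.load(f)

    executed: List[str] = []
    for name, action in tasks:
        if name in completed:
            continue  # finished in an earlier run
        action()  # an exception here leaves the checkpoint file in place
        executed.append(name)
        completed.append(name)
        with open(checkpoint_path, "w") as f:
            json.dump(completed, f)

    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)  # successful run: restart data not needed
    return executed


attempts = {"count": 0}


def flaky_load() -> None:
    """Hypothetical task that fails on its first attempt only."""
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise RuntimeError("simulated failure")


tasks = [("extract", lambda: None), ("load", flaky_load), ("cleanup", lambda: None)]
path = os.path.join(tempfile.mkdtemp(), "package.chk")

try:
    run_with_checkpoint(tasks, path)
except RuntimeError:
    pass  # first run fails at "load"; "extract" is already recorded

second_run = run_with_checkpoint(tasks, path)
print(second_run)
```

The second run starts at the task that failed instead of repeating the work that had already succeeded.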
The TransactionOption property exists at the package level, container level and
task level. TransactionOption can be set to one of the following: Required, Supported, or NotSupported.
Variables store values that an Integration Services (SSIS) package and its
containers, tasks, and event handlers can use at run time. The scripts in the
Script task and the Script component can also use variables.
You can use variables in Integration Services packages for the following
purposes:
System Variables
SSIS provides a set of system variables that store information about the running
package and its objects. These variables can be used in expressions and property
expressions to customize packages, containers, tasks, and event handlers.
Ex: PackageName
• CreateDeploymentUtility - True
• AllowConfigurationChanges - True
• DeploymentOutputPath - <Path>
Package Deployment Wizard
Advanced Data Flow
Advanced Data Flow - Lookup Transformation
Fuzzy Lookup
This transformation differs from the Lookup transformation in its use of fuzzy
matching.
The Fuzzy Lookup transformation uses fuzzy matching to return one or more
close matches from the reference table.
The Fuzzy Lookup transformation tries to find an exact match. If it fails, the
Fuzzy Lookup transformation provides close matches from the reference
table.
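Advanced Data Flow – Fuzzy Lookup Transformation
[Figure: flowchart. Each input record first goes through an Exact Lookup against the Reference Table. Matched records go on to further processing as clean data. Failed records go to Fuzzy Lookup: those with Score > 0.70 go on to further processing as clean data, and the rest go to manual review.]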
If no exact match to a given input record is found in the reference table, try to identify a
fuzzy match. If the resulting fuzzy match has high similarity to the input, consider it
clean and route for further processing or loading.
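The exact-then-fuzzy flow described above can be sketched in plain Python. This is only an illustration: `difflib.SequenceMatcher` stands in for the similarity measure, which is an assumption and not the algorithm SSIS uses. The 0.70 threshold is the score mentioned above; the function name and the reference values are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

SIMILARITY_THRESHOLD = 0.70  # score threshold mentioned in the text above


def lookup(value: str, reference: List[str]) -> Tuple[Optional[str], float, str]:
    """Return (match, score, route) for one input value."""
    if value in reference:
        return value, 1.0, "clean"  # exact match found

    # Fuzzy step: pick the most similar reference entry.
    best: Optional[str] = None
    best_score = 0.0
    for candidate in reference:
        score = SequenceMatcher(None, value.lower(), candidate.lower()).ratio()
        if score > best_score:
            best, best_score = candidate, score

    route = "clean" if best_score > SIMILARITY_THRESHOLD else "manual review"
    return best, best_score, route


# Hypothetical reference table and input values.
reference_table = ["Microsoft Corporation", "Contoso Ltd", "Adventure Works"]
for name in ["Contoso Ltd", "Microsft Corporation", "Zzz"]:
    print(name, "->", lookup(name, reference_table))
```

The misspelled name fails the exact lookup but scores well above the threshold against its intended entry, so it is routed as clean. The unrelated value scores too low and goes to manual review.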
Advanced Data Flow – Fuzzy Grouping Transformation
Fuzzy Grouping
[Figure: Fuzzy Grouping example. Input relation with columns RID, Organization Name, and Address; post-processing output with columns RID, Name, and RID.]
Merge transformation
– Merge data from two data sources, such as tables and files.
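As an illustration in plain Python (not SSIS itself), merging two inputs that are already sorted on the same key can be sketched with `heapq.merge`. The SSIS Merge transformation likewise requires sorted inputs. The function name, column names, and sample rows are hypothetical.

```python
import heapq
from typing import Dict, Iterable, Iterator

Row = Dict[str, object]


def merge_sorted(
    left: Iterable[Row], right: Iterable[Row], key_column: str
) -> Iterator[Row]:
    """Interleave two inputs, each already sorted on key_column, into one sorted output."""
    return heapq.merge(left, right, key=lambda row: row[key_column])


# Hypothetical rows from a table and from a file, both sorted by ID.
table_rows = [{"ID": 1, "Source": "table"}, {"ID": 4, "Source": "table"}]
file_rows = [{"ID": 2, "Source": "file"}, {"ID": 3, "Source": "file"}]

merged = list(merge_sorted(table_rows, file_rows, "ID"))
print([(row["ID"], row["Source"]) for row in merged])
```

Because both inputs are sorted, the merge only ever needs to compare the next row from each side. It never has to buffer a whole input.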
You can configure the Merge Join transformation in the following ways:
The SCD is used to track dimensional changes. There are six types of SCDs. Of the
six, the first three types are the most commonly used.
Type 0: Data never gets modified.
Type 1: Overwrites old data with new data, and therefore does not track historical data.
Type 2: Tracks historical data by creating multiple records for a given natural key in the
dimensional tables with separate surrogate keys and/or different version numbers.
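The difference between Type 1 and Type 2 can be sketched in plain Python with an in-memory dimension table. This is a simplified illustration, not the SSIS transformation. The column names `SurrogateKey`, `Version`, and `IsCurrent`, and the sample customer rows, are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]


def apply_type1(dimension: List[Row], key: str, incoming: Row) -> None:
    """Type 1: overwrite the existing row in place; no history is kept."""
    for row in dimension:
        if row[key] == incoming[key]:
            row.update(incoming)
            return
    dimension.append(dict(incoming))


def apply_type2(dimension: List[Row], key: str, incoming: Row) -> None:
    """Type 2: expire the current row and add a new version with a new surrogate key."""
    version = 1
    for row in dimension:
        if row[key] == incoming[key] and row["IsCurrent"]:
            row["IsCurrent"] = False  # keep the old row as history
            version = row["Version"] + 1
    new_row = dict(incoming)
    new_row.update(SurrogateKey=len(dimension) + 1, Version=version, IsCurrent=True)
    dimension.append(new_row)


# Hypothetical customer who moves from one city to another.
type1_dim: List[Row] = [{"CustomerID": "C1", "City": "Pune"}]
apply_type1(type1_dim, "CustomerID", {"CustomerID": "C1", "City": "Mumbai"})

type2_dim: List[Row] = []
apply_type2(type2_dim, "CustomerID", {"CustomerID": "C1", "City": "Pune"})
apply_type2(type2_dim, "CustomerID", {"CustomerID": "C1", "City": "Mumbai"})

print(type1_dim)
print(type2_dim)
```

After the same change, the Type 1 table holds one row with only the new city. The Type 2 table holds two rows for the same natural key: the expired one and the current one.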
SCD (Slowly Changing Dimension)
– Matching incoming rows with rows in the lookup table to identify new and existing rows.
– Identifying incoming rows that contain changes when changes are not permitted.
– Identifying incoming rows that contain historical changes that require insertion of new
records and the updating of expired records.
– Detecting incoming rows that contain changes that require the updating of existing
records, including expired ones.
Advanced Data Flow – Slowly Changing Dimension Transformation
1. Choose the connection manager to access the data source that contains the dimension table that you
want to update. You can select from a list of connection managers that the package includes.
2. Choose the dimension table or view you want to update. After you select the connection manager, you
can select the table or view from the data source.
3. Set key attributes on columns and map input columns to columns in the dimension table. You must
choose at least one business key column in the dimension table and map it to an input column.
– Fixed attribute indicates that the column value must not change.
Advanced Data Flow – Slowly Changing Dimension Transformation (Cont..)
Steps to create Slowly Changing Dimension transformation outputs
• You can access data by using technologies that are not supported by built-in connection types.
• You can modify the data in the source with different data from the destination.
• You can validate important columns in the source data and skip records that contain invalid data to
prevent them from being copied to the destination.
SSIS Package Deployment
Integration Services (SSIS) includes tools and wizards for deploying packages,
including the Package Installation Wizard, which installs the packages to the file
system or to an instance of SQL Server 2005 or later.
Before you build a deployment utility for the packages, you can create package
configurations that update properties of package elements at run time.
The configurations are automatically included when you deploy the packages.
SSIS Package Deployment
How to: Create an Integration Services Package Deployment Utility
• DTExec Utility
• DTExecUI Utility
SSIS Package Execution
2. Type dtexec / followed by the DTS, SQL, or File option and the package path,
including the package name.
6. Optionally, view logging and reporting information before closing the Command
Prompt window.
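The command line described in step 2 can be assembled as in the Python sketch below. It only builds and prints the command; it does not run anything. The helper name `build_dtexec_command` and the package path are hypothetical.

```python
from typing import List

# The three source options named in step 2.
SOURCE_OPTIONS = {"DTS", "SQL", "File"}


def build_dtexec_command(source_option: str, package_path: str) -> List[str]:
    """Return the dtexec argument list for one package."""
    if source_option not in SOURCE_OPTIONS:
        raise ValueError(f"unknown source option: {source_option}")
    return ["dtexec", f"/{source_option}", package_path]


# Hypothetical package stored in the file system.
command = build_dtexec_command("File", r"C:\Packages\LoadCustomers.dtsx")
print(" ".join(command))
```

On a machine where dtexec is installed, the resulting list could be passed to `subprocess.run` to execute the package.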
SSIS Package Execution
4. Expand the Stored Packages folder and its subfolders to locate the package to run,
right-click the package, and then click Run Package.
5. In the Execute Package Utility dialog box, optionally, specify a different package to run.
7. To review the command line that the utility uses, click Command Line.
8. Click Execute.
9. To stop the running package, click Stop in the Package Execution
Progress dialog box.
10. When the package finishes, click Close to exit the Package Execution
Progress dialog box.
11. Click Close.
Scheduling Packages
You can extend the ETL functionality of SSIS by implementing RSS Feed
aggregation.
Read multiple RSS feeds.
Filter the content based on user preferences.
Create a single RSS feed that contains the combined feed.
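The three steps above can be sketched in plain Python with the standard library XML parser. This is an illustration of the idea, not an SSIS package. The feeds are given as in-memory strings rather than fetched, and the function name, keyword filter, and sample feeds are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List


def combine_feeds(feed_xml_list: List[str], keyword: str, title: str) -> str:
    """Merge items from several RSS 2.0 documents, keeping titles that contain keyword."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title

    for feed_xml in feed_xml_list:  # read multiple feeds
        root = ET.fromstring(feed_xml)
        for item in root.iter("item"):
            item_title = item.findtext("title", default="")
            if keyword.lower() in item_title.lower():  # filter on a preference
                channel.append(item)

    return ET.tostring(rss, encoding="unicode")  # single combined feed


# Hypothetical sample feeds.
FEED_A = """<rss version="2.0"><channel><title>A</title>
<item><title>SSIS tips</title><link>http://example.com/a1</link></item>
<item><title>Cooking</title><link>http://example.com/a2</link></item>
</channel></rss>"""
FEED_B = """<rss version="2.0"><channel><title>B</title>
<item><title>New SSIS release</title><link>http://example.com/b1</link></item>
</channel></rss>"""

combined = combine_feeds([FEED_A, FEED_B], "ssis", "Combined feed")
titles = [i.findtext("title") for i in ET.fromstring(combined).iter("item")]
print(titles)
```

Here the user preference is reduced to a single keyword match on the item title. A real filter could use categories, dates, or authors instead.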
SSIS also opens up the programming environment to target data sources, such as Active
Directory or any other custom data source.