SSIS Tutorial Transcript for bbv Techday 2006


© September, 2006 by Urs Gehrig

SQL Server Integration Services


Goal and Scenario
This tutorial shows what SSIS is for: it is Microsoft’s new ETL (Extract, Transform and Load) tool. SSIS is part of SQL Server 2005 and the successor to SQL Server 2000’s DTS. The tutorial is not a summary of best practices; it is a demonstration of what SSIS can do and should give you some ideas for your next project. You will see an example of how to load address data from a legacy system while adding some missing data and auditing information. The SSIS package doesn’t overwrite existing data. Instead it writes a history of all changes, so you always know who changed what and when. What you will learn here is useful, but it is only the tip of the iceberg. Therefore, at the end of this tutorial you will find some ideas for your own exercises as well as web links and books about SSIS. Have fun!

Prerequisite
Reading this tutorial is one thing; building it on your own machine is another. You don’t need much to follow along, only:
• SQL Server 2005 SP1
• Access to bbv’s ftp server. The ftp server plays the role of a legacy system.
• Internet connection. You will query geonames.org.
• And last but not least this transcript. You can get a copy from wiki.bbv.ch. There you will also find a BIDS project containing two packages: one is the result of this tutorial, the other sets up the DB.

Some Words about SQL Server Integration Services
“It took a unique team to build SQL Server 2005 Integration Services. If I had told you in 2000 when we started the Integration Services project that we would assemble a team of almost 30 people at Microsoft who were utterly passionate about ETL, you would have been skeptical.” – Bill Baker, General Manager for SQL Server BI at Microsoft

“All are correct because Integration Services is a set of utilities, applications, designers, components, and services all wrapped up into one powerful software application suite. SQL Server Integration Services (SSIS) is many things to many people.” – Kirk Haselden, Development Manager of SSIS team at Microsoft


Step by Step Tutorial

1 Create a new SSIS project and solution.

2 Add user variables to hold working directory etc.
• The working directory will contain all of our input files. NOTE: Variables are case sensitive.
1. From the “SSIS” menu choose “Variables”.
2. Click the “Add new Variable” button.
3. Name it “WorkingDir”.
4. Scope should be the topmost.
5. Data Type is String.
6. For “Value” enter “D:\Work\bbv TechDay 2006\SSIS Tutorial\Work” (no quotes).
7. Do the same for the variable “InputFiles” of type String and value “AD*.tab”.
The Variables window looks like:
[Screenshot: Variables window]

3 Add and configure an FTP task.
• We will download the address files from there.
1. Make the Control Flow canvas (vs. Data Flow) active.
2. From the toolbox, add an “FTP” task and open it.
3. On the “General” page set “Name” to “Load Input Files from Legacy”.
4. For “FtpConnection” select “<New Connection…>”.
5. For “Server name” type “ftp.bbv.ch”, “User name” is “bbvftp” and “Password” is “???” (for security reasons the password is not shown here).
6. Click “OK” to close “FTP Connection Manager Editor”.
7. On the “File Transfer” page set “IsLocalPathVariable” to “True”, “LocalVariable” to “User::WorkingDir”, “OverwriteFileAtDest” to “True” and “Operation” to “Receive Files”.
8. On the “Expressions” page enter the expression “"/System/SSIS_Tutorial/" + @[User::InputFiles]” for the property “RemotePath”.
9. Click “OK” to close “FTP Task Editor”.

4 Add and configure a For Each container.
• We will loop over the address files.
• The container will return the full file name into a mapped variable.
1. Add a “Foreach Loop” container to the control flow, connect it and open it.
2. On the “General” page set “Name” to “Process Input Files”.
3. On the “Collection” page use the default enumerator type “Foreach File Enumerator”. It will loop once for each file in the specified folder.
4. Enter the expression “@[User::WorkingDir]” for the property “Directory” and “@[User::InputFiles]” for “FileSpec”.
5. Leave the default “Fully Qualified” for “Retrieve file name”.
6. Change to the “Variable Mappings” page.
7. In the “Variable” column, drop the list down and choose “Add a new variable” that is scoped to the topmost container, which is the package itself.
8. Name the variable “InputFile” and set “Value Type” to “String”.
9. Click “OK” twice to close the “Add Variable” dialog and the “For Each Loop” editor.
10. After you close the editor a warning icon may appear on the task. If you hover your mouse over it, a tool tip will mention an empty path.
11. We need to delay the validation until runtime. Set the property “DelayValidation” to “True”.
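
To make the loop semantics concrete, here is a minimal sketch in Python (not SSIS code) of what the Foreach File Enumerator does with “Directory”, “FileSpec” and the “Fully Qualified” setting. The function names `enumerate_input_files` and `demo`, as well as the temporary demo directory, are assumptions made for illustration only.

```python
import glob
import os
import tempfile


def enumerate_input_files(working_dir: str, file_spec: str) -> list[str]:
    """Return the fully qualified names of all files in working_dir that match file_spec."""
    pattern = os.path.join(working_dir, file_spec)
    return sorted(os.path.abspath(path) for path in glob.glob(pattern))


def demo() -> list[str]:
    # Hypothetical working directory with two address files and one unrelated file.
    with tempfile.TemporaryDirectory() as working_dir:
        for name in ("AD20060920.tab", "AD20060921.tab", "readme.txt"):
            open(os.path.join(working_dir, name), "w").close()
        # Each element plays the role of User::InputFile for one loop iteration.
        matches = enumerate_input_files(working_dir, "AD*.tab")
        return [os.path.basename(path) for path in matches]


if __name__ == "__main__":
    for input_file in demo():
        print("processing", input_file)
```

Only the files matching “AD*.tab” are visited; anything else in the working directory is ignored, which is why the same file specification is used for the FTP download and for the loop.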

5 Add a new user variable to hold row counts.
• Later, inside the Data Flow task, we will map the Row Count transform to this variable, in effect storing the row count of the data flow path.
1. From the “SSIS” menu choose “Variables”. You should see “InputFile” already there.
2. Click the “Add new Variable” button.
3. Name should be “LineCount”.
4. Scope should be the topmost.
5. Data Type is Int32.
6. Value can be left at 0.

6 Add a Data Flow task.
• The task will be processed once per iteration of the loop. Therefore, in our case, the Data Flow task will be executed once for each file in the folder.
1. From the Toolbox add a “Data Flow Task” to the inside of the loop container and open it.
2. Rename it to “Read Input File”.
3. The data flow looks like:
[Screenshot]

7 Add Flat File Source and Connection Manager.
• We define a single, specific file in this step; it could be any of our existing files. After this step we will define a property expression on the new connection manager to load a different file per iteration of the loop.
1. Open the Data Flow Task.
2. From the Toolbox add a Flat File Source to the Data Flow Task and open the Flat File Source.
3. Click “New” to create a new Flat File Connection Manager.
4. For “Connection Manager Name” enter “Load Address Data”.
5. For the “File Name” point to our first file “D:\Work\bbv TechDay 2006\SSIS Tutorial\Work\AD20060920.tab” (no quotes).
6. Select “Tab {t}” as “Column delimiter”.
7. In “Preview rows 1 - 100” you can see three dummy rows and some strange characters.
8. After you select “10000 (MAC – Roman)” for “Code page” and “3” for “Header rows to skip” everything is OK.
9. Rename the columns as follows: ID, Phone, Company, Title, Surname, Name, Address, Country, Zip and City.
10. Click “OK” twice to close the “Connection Manager” dialog and the “Flat File Source” editor.

8 Modify Connection String to dynamically change with loop iteration.
• Remember the variable “InputFile” we created earlier. We need it to feed our connection string per iteration of the loop via a property expression.
1. In the Connection Managers window select (but do not open) the “Load Address Data” connection manager. We want to view its properties in the property sheet, not in the editor window.
2. In the property pane click in the empty row for “Expressions” and then click the ellipsis button.
3. Choose the “ConnectionString” property and click the ellipsis button in the “Expression” column to open the expression builder.
4. Expand the variables folder and drag the “InputFile” variable down to the expression.
5. Click “OK” twice to close “Expression Builder” and “Property Expression Editor”.
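
The effect of the three connection manager settings (code page, header rows to skip, tab delimiter) can be sketched outside SSIS. The following Python fragment is an illustration only: `read_address_rows` and the `SAMPLE` bytes are hypothetical and do not come from the real input files, while the column names are the ones chosen in step 7.

```python
import csv
import io

COLUMNS = ["ID", "Phone", "Company", "Title", "Surname", "Name",
           "Address", "Country", "Zip", "City"]

# Hypothetical file content: three dummy rows followed by one address row,
# encoded as Mac Roman like the legacy files.
SAMPLE = (
    "dummy header 1\ndummy header 2\ndummy header 3\n"
    "1\t044 000 00 00\tExample AG\tMr\tMuster\tHans\tBahnhofstrasse 1\t\t8000\tZürich\n"
).encode("mac_roman")


def read_address_rows(raw: bytes, header_rows_to_skip: int = 3) -> list[dict]:
    """Decode Mac Roman bytes, skip the header rows and name the ten columns."""
    text = raw.decode("mac_roman")  # corresponds to code page 10000
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="\t")
    rows = list(reader)[header_rows_to_skip:]
    return [dict(zip(COLUMNS, row)) for row in rows if row]


if __name__ == "__main__":
    for row in read_address_rows(SAMPLE):
        print(row["ID"], row["City"], repr(row["Country"]))
```

Decoding with the wrong code page is what produces the “strange characters” in the preview; skipping the first three rows removes the dummy rows.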


9 Add Column to hold name of the file processed.
• This adds a new column to each data row in our data flow, containing the name of the file we are processing. Nice for auditing.
1. With the Flat File Source selected, view the properties window.
2. Set the “FileNameColumnName” property to “SourceFilename”.
3. After you hit Enter a warning icon may appear on the source. If you hover your mouse over it, a tool tip will mention a metadata error.
4. We need to refresh the metadata. Right click the source and choose “Advanced Editor”.
5. Select the “Refresh” button at the bottom and click “OK” to close the editor.

10 Add a Row Count Transform.
• To capture the number of rows processed to a variable.
• An anomaly of the “Row Count Transform” is that you have to type in the variable name manually; it will not let you pick from a list. Remember that variable names are case sensitive.
1. Add, connect and open a “Row Count Transform” in the data flow.
2. Enter “LineCount” in the “VariableName” property (the variable we created earlier).
3. Click “OK” to close “Row Count Transform”.

11 Add Audit Transform.
• To add useful metadata to our data stream for capturing in log data, such as package name and start time.
1. Add, connect and open an “Audit Transform” in the data flow.
2. In the first blank row, click the “Audit Type” column and select “Execution Start Time”.
3. Note the name is automatically filled in for you.
4. Now keep adding audit types “Machine Name” and “User Name”.
5. Click “OK” to close “Audit Transform”.
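
What steps 9 to 11 add to each row can be summarised in a few lines of Python. This is a sketch under stated assumptions, not how SSIS implements the transforms: `add_audit_columns` is a hypothetical helper, and the machine and user names are passed in explicitly instead of being read from the execution context.

```python
import getpass
import socket
from datetime import datetime


def add_audit_columns(rows, source_filename, start_time, machine_name, user_name):
    """Return (audited_rows, line_count); the input rows are left unchanged."""
    audited = [
        {
            **row,
            "SourceFilename": source_filename,   # step 9: file name column
            "Execution Start Time": start_time,  # step 11: audit columns
            "Machine Name": machine_name,
            "User Name": user_name,
        }
        for row in rows
    ]
    # Step 10: the row count that SSIS would store in User::LineCount.
    return audited, len(audited)


if __name__ == "__main__":
    rows = [{"ID": "1"}, {"ID": "2"}]
    audited, line_count = add_audit_columns(
        rows, "AD20060920.tab", datetime.now(), socket.gethostname(), getpass.getuser()
    )
    print(line_count, audited[0])
```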

12 Split off all addresses without any country info.
• In the next step we will try to guess the missing country code.
1. Add, connect and open a “Conditional Split Transform” in the data flow.
2. Enter “Without Country” for “Output Name” and “TRIM(Country) == ""” for “Condition”.
3. Enter “With Country” for “Default output name”.
4. Click “OK” to close “Conditional Split Transform”.

13 Try to find out the missing country codes.
• For this we make a call to the net, e.g. http://ws.geonames.org/postalCodeSearch?postalcode=9011&placename=Irnsum&maxRows=1. So we need an HTTP Connection Manager.
• The answer looks like:
    <?xml version="1.0" encoding="UTF-8" ?>
    <geonames>
      <totalResultsCount>1</totalResultsCount>
      <code>
        <postalcode>9011</postalcode>
        <name>Irnsum (Jirnsum)</name>
        <countryCode>NL</countryCode>
        <lat>53.09166665</lat>
        <lng>5.75</lng>
      </code>
    </geonames>
1. Right click in the “Connection Managers” window and select “New Connection...”.
2. Select Type “HTTP” and press “Add…”.
3. Point the “Server URL” to “http://ws.geonames.org/postalCodeSearch?postalcode={Zip}&placename={City}&maxRows=1” and verify it by pressing “Test Connection”.
4. Press “OK” to close “HTTP Connection Manager Editor”.
5. Rename it to “geonames.org”.
6. Add a “Script Component” to the data flow, connect it to the “Without Country” path and open it.
7. Rename it to “Add Country”.
8. From “Available Input Columns” select “Country”, “Zip” and “City”.
9. For “Country” change “Usage Type” to “ReadWrite”.
10. Go to tab “Connection Managers” and add the connection manager “geonames.org”. Rename it to “Geonames”.
11. Go to tab “Script” and press “Design Script…”.
12. In the “Project Explorer” right click on the node “References” and select “Add Reference…”.
13. Select the component “System.Xml” and press “Add”. Do the same for “Microsoft.SqlServer.ManagedDTS”.
14. Press “OK” to close “Add Reference”.
15. Add the following two import lines at the very beginning of the source code: “Imports System.Text” and “Imports Microsoft.SqlServer.Dts.Runtime”.
16. Type in the whole code as shown in Appendix A.
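
The logic of steps 12 and 13 (split on an empty country, fill the URL template, read the country code from the answer) can be sketched in Python. This is an illustration that mirrors the idea of the Appendix A script and is not a replacement for it; the function names are hypothetical, and the sketch parses the sample answer shown above instead of calling the web service.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

URL_TEMPLATE = ("http://ws.geonames.org/postalCodeSearch"
                "?postalcode={Zip}&placename={City}&maxRows=1")

# The sample answer from the tutorial, as bytes like a downloaded response.
SAMPLE_RESPONSE = b"""<?xml version="1.0" encoding="UTF-8" ?>
<geonames>
  <totalResultsCount>1</totalResultsCount>
  <code>
    <postalcode>9011</postalcode>
    <name>Irnsum (Jirnsum)</name>
    <countryCode>NL</countryCode>
    <lat>53.09166665</lat>
    <lng>5.75</lng>
  </code>
</geonames>"""


def split_by_country(rows):
    """Conditional split: rows with an empty Country go to the second list."""
    with_country, without_country = [], []
    for row in rows:
        target = without_country if row["Country"].strip() == "" else with_country
        target.append(row)
    return with_country, without_country


def build_url(zip_code: str, city: str) -> str:
    """Replace the {Zip} and {City} placeholders with URL-escaped row values."""
    return (URL_TEMPLATE.replace("{Zip}", quote(zip_code))
                        .replace("{City}", quote(city)))


def country_from_response(xml_bytes: bytes) -> str:
    """Return the countryCode of the first hit, or '<unknown>' if there is none."""
    node = ET.fromstring(xml_bytes).find("./code/countryCode")
    return node.text if node is not None and node.text else "<unknown>"


if __name__ == "__main__":
    print(build_url("9011", "Irnsum"))
    print(country_from_response(SAMPLE_RESPONSE))
```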


14 Merge both data paths together.
1. Add a “Union All” component to the data flow and connect it to the “With Country” path.
2. Add a path from the Script Component “Add Country” to the “Union All” component.

15 Add two Data Viewers to see the effect of “Add Country”.
1. Right click the path “Without Country” and select the menu “Data Viewers…”.
2. Select the tab “Data Viewers” and click the button “Add…”.
3. Name it “Without Country: before adding” and for Type select “Grid”.
4. Under the tab “Grid” remove all columns from “Displayed Columns” except “ID”, “Name”, “Address”, “Country”, “Zip” and “City”.
5. Click “OK” twice to close “Configure Data Viewer” and “Data Flow Path Editor”.
6. Do the same for the path connecting “Add Country” and “Union All” but name it “Without Country: after adding”.
7. Note the two little icons indicating a Data Viewer is on the path:
[Screenshot]
8. Run the package. Both Data Viewers appear, but only one shows data. A Data Viewer always pauses the package processing.
9. To continue press the green arrow button at the top of the “Data Viewer” window.
10. The package pauses again and the second Data Viewer now shows data too. Have a look at the “Country” column, where you will now see the country code.
11. Continue the package processing again and stop the debugger.

16 Have a look at the distribution of the countries.
• Add an additional Data Viewer but this time of type “Column Chart”.
1. Add a “Data Viewer” to the path from “Conditional Split” to “Union All”.
2. Name it “Distribution of Country” and make it of Type “Column Chart”.
3. On the tab “Chart Column” select “Country” as “Visualized column”.
4. Click “OK” twice to close “Configure Data Viewer” and “Data Flow Path Editor”.
5. Run the package again and have a look at the chart.
6. Stop the debugger.

17 Cleansing the columns.
• Have a look at one of the two Data Viewers of type Grid and you will see that some string values have leading white space. We have to remove all of it.
1. Add, connect and open a “Derived Column” transform in the data flow and name it “Trim all values”.
2. For “Derived Column” select “Replace Phone” and set “Expression” to “TRIM(Phone)”.
3. Do the same for the derived columns “Replace Company” through “Replace City”.
4. Click “OK” to close “Derived Column Transformation Editor”.

18 Prepare the columns for saving to database.
• In our DB we will save all strings as Unicode, but the data from the files is encoded as “Macintosh (Roman)”. So we have to convert the strings.
1. Add, connect and open a “Data Conversion” transform in the data flow and name it “Macintosh -> Unicode”.
2. Convert the input column “Phone” to “Unicode string” and name it “nPhone”.
3. Do the same for “Company”, “Title”, “Surname”, “Name”, “Address”, “Country”, “Zip” and “City”.
4. Convert “Execution Start Time” to “database timestamp” and name it “nExecution Start Time”.
5. Click “OK” to close “Data Conversion Transformation Editor”.

19 It’s time to save the data to the DB.
• Because we will maintain a history of all changes we will add an SCD (Slowly Changing Dimension) transform to the data flow.
1. Add, connect and open a “Slowly Changing Dimension” transform in the data flow and name it “Save Addresses”.
2. For “Table or view” select “dbo.Addresses”.
3. For “Input Columns” select the corresponding column, like “nAddress” for “Address”.
4. Don’t assign an input column to “ValidFrom” and “ValidTo”.
5. “ID” is the only column of key type “Business key”. Click “Next >”.
6. The dimension columns “Address” to “Zip” are of change type “Historical attribute”. Click “Next >”.
7. Select “Use start and end dates to identify current and expired records”, choose “ValidFrom” and “ValidTo” as the date columns and select “System::StartTime” as the value to set. Click “Next >”.
8. Deselect “Enable inferred member support”. Click “Next >” and “Finish” to close the wizard.
9. The wizard adds several data flow components to the path:
[Screenshot]
10. Let’s have a look at the SQL statement of the “OLE DB Command” component:
[Screenshot]
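
The history behaviour configured in step 19 (business key “ID”, historical attributes, “ValidFrom” and “ValidTo”) follows the usual pattern for historical attributes: expire the current record and insert a new one. Below is a minimal in-memory sketch in Python, assuming that an open “ValidTo” (here `None`) marks the current record; `apply_scd2` and the reduced attribute list `TRACKED` are hypothetical and stand in for the components the wizard generates.

```python
from datetime import datetime

# Subset of the historical attributes, chosen only to keep the example short.
TRACKED = ("Address", "Zip", "City")


def apply_scd2(history: list, incoming: dict, now: datetime) -> None:
    """Insert a new version of the record and expire the old one if it changed."""
    current = next(
        (r for r in history if r["ID"] == incoming["ID"] and r["ValidTo"] is None),
        None,
    )
    if current is not None:
        if all(current[c] == incoming[c] for c in TRACKED):
            return  # nothing changed, keep the current record
        current["ValidTo"] = now  # expire the old version instead of overwriting it
    new_record = {c: incoming[c] for c in ("ID",) + TRACKED}
    new_record.update({"ValidFrom": now, "ValidTo": None})
    history.append(new_record)


if __name__ == "__main__":
    history = []
    apply_scd2(history, {"ID": 1, "Address": "Old Street 1", "Zip": "8000", "City": "Zürich"},
               datetime(2006, 9, 20))
    apply_scd2(history, {"ID": 1, "Address": "New Street 2", "Zip": "8000", "City": "Zürich"},
               datetime(2006, 9, 21))
    for record in history:
        print(record)
```

Because old versions are only closed and never updated in place, the table always shows which address was valid at which time.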


20 Last but not least – delete the processed input file.

1. Make the Control Flow canvas (vs. Data Flow) active.
2. Add a “File System Task”, connect it to the success task flow and open it.
3. Name it “Delete Input File”.
4. Select “Delete file” as “Operation”.
5. Set “IsSourcePathVariable” to “True” and “SourceVariable” to “User::InputFile”.
6. Click “OK” to close “File System Task Editor”.
7. The control flow looks like the following:
[Screenshot]


21 Have a look at the running package.
[Screenshot]

22 Save environment specific parameters to a configuration file.
• Because the path to the configuration file is embedded in the package, the whole package is no longer independent of the environment. Therefore we will use an indirect configuration, i.e. the path to the configuration file is placed in an environment variable.
1. Select “Run…” from the Windows Start Menu and start the program “sysdm.cpl”.
2. Go to the “Advanced” page and click “Environment Variables”.
3. Under “System variables” click “New”.
4. Define the variable “SSISTutorial_Configuration” and set its value to “D:\Work\bbv TechDay 2006\SSIS Tutorial\SSISTutorial_Configuration.dtsconfig”.
5. Click “OK” three times to close the “System Properties” dialog.
6. Go back to BIDS; close and reopen it so that it picks up the new environment variable. Open the project and go to the package.
7. Select from the menu “SSIS / Package Configurations…”.
8. Select “Enable package configurations”.
9. Click “Add” and “Next”.
10. As “Configuration type” select “XML configuration file”.
11. Select “Configuration location is stored in an environment variable”.
12. Select the environment variable “SSISTutorial_Configuration”.
13. Click “Next >”.
14. Name it “Configuration” and click “Finish”.
15. Click “Add” and “Next” again.
16. This time select “XML configuration file” and “Specify configuration settings directly”.
17. Name the configuration file “D:\Work\bbv TechDay 2006\SSIS Tutorial\SSISTutorial_Configuration.dtsconfig” and click “Next >”.
18. Select the following objects:
• Executables / Load Input Files from Legacy / Properties / RemotePath
• Connection Managers / FTP Connection Manager / Properties / ServerName
• Connection Managers / FTP Connection Manager / Properties / ServerPassword
• Connection Managers / FTP Connection Manager / Properties / ServerUserName
• Connection Managers / geonames.org / Properties / ConnectionString
• Connection Managers / SSIS_Tutorial / Properties / ConnectionString
• Connection Managers / SSIS_Tutorial / Properties / InitialCatalog
• Connection Managers / SSIS_Tutorial / Properties / ServerName
• Variables / InputFiles / Properties / Value
• Variables / WorkingDir / Properties / Value
19. Click “Next >”.
20. Name it “XML File” and click “Finish”.
21. Click “Close” to close “Package Configurations Organizer”.

23 Inspect the configuration file.
1. In the “Solution Explorer” right click on the project node and select “Add / Existing Item…”.
2. Select the file “D:\Work\bbv TechDay 2006\SSIS Tutorial\SSISTutorial_Configuration.dtsconfig” and click “Add”.
3. Open the file and press “Ctrl+K, Ctrl+D” to format the configuration file.
4. Go to the line for the ftp server password. For security reasons the password was not exported. Type it in again.
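
The idea of an indirect configuration can be shown in a short Python sketch: the environment variable holds only the path, and the file at that path holds the values. The `SAMPLE_CONFIG` below is a simplified approximation of the configuration file layout, written from memory as an assumption; the real file generated by BIDS contains more elements. `load_configuration` and `demo` are hypothetical names.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

ENV_VAR = "SSISTutorial_Configuration"

# Simplified, assumed layout of the XML configuration file.
SAMPLE_CONFIG = r"""<?xml version="1.0"?>
<DTSConfiguration>
  <Configuration ConfiguredType="Property"
                 Path="\Package.Variables[User::WorkingDir].Properties[Value]"
                 ValueType="String">
    <ConfiguredValue>D:\Work\bbv TechDay 2006\SSIS Tutorial\Work</ConfiguredValue>
  </Configuration>
  <Configuration ConfiguredType="Property"
                 Path="\Package.Variables[User::InputFiles].Properties[Value]"
                 ValueType="String">
    <ConfiguredValue>AD*.tab</ConfiguredValue>
  </Configuration>
</DTSConfiguration>
"""


def load_configuration(environ=os.environ) -> dict:
    """Look up the file path in the environment, then map property paths to values."""
    config_path = environ[ENV_VAR]  # the package stores only the variable name
    root = ET.parse(config_path).getroot()
    return {
        node.get("Path"): node.findtext("ConfiguredValue", default="")
        for node in root.iter("Configuration")
    }


def demo() -> dict:
    # Write the sample to a temporary file and point a fake environment at it.
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "SSISTutorial_Configuration.dtsconfig")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(SAMPLE_CONFIG)
        return load_configuration({ENV_VAR: path})


if __name__ == "__main__":
    for property_path, value in demo().items():
        print(property_path, "=", value)
```

Moving the package to another machine then only requires setting the environment variable there; the package itself stays unchanged.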


Are you ready for some more interesting stuff?
Following are some ideas for your own exercises:
• Error Flow: Many data flow components can redirect erroneous data to a different path. Try to redirect addresses with truncated columns to an error file for later investigation.
• Package logging: Enable logging for the package and play around with the many logging options and log providers (see BIDS menu “SSIS / Logging…”).
• Transactions: Start looking in the BOL; search for “transaction [Integration Services]”.
• Checkpoints: You can configure Integration Services packages to restart from a point of failure, instead of rerunning the entire package, by setting the properties that apply to checkpoints. Insert a checkpoint after the task “Load Input Files from Legacy”. Start looking in the BOL; search for “checkpoints [Integration Services]”.

Are you looking for more information?
Try these books, web sites etc.:
• Microsoft’s SSIS product site: http://www.microsoft.com/sql/technologies/integration/default.mspx
• Project REAL: Business Intelligence ETL Design Practices: http://www.microsoft.com/technet/prodtechnol/sql/2005/realetldp.mspx
• The ultimate SSIS book (in my opinion), from the Development Manager on the Integration Services team (you will find it in the bbv library): HASELDEN, Kirk: Microsoft SQL Server 2005 Integration Services. Sams Publishing 2006.


Appendix A – Code for Script Component “Add Country”
' Microsoft SQL Server Integration Services user script component
' This is your new script component in Microsoft Visual Basic .NET
' ScriptMain is the entrypoint class for script components
Imports System
Imports System.Data
Imports System.Math
Imports System.Text
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime

Public Class ScriptMain
    Inherits UserComponent

#Region "Private declarations..."
    Private httpConnection As Microsoft.SqlServer.Dts.Runtime.HttpClientConnection
#End Region

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        Dim xmlResult As Xml.XmlDocument = New Xml.XmlDocument()
        Dim node As Xml.XmlNode
        Dim URL As String

        'change the URL according to the actual row parameters and load the page
        'Remark: Because you can change DTS variables only in 'PreExecute' and 'PostExecute'
        '        we can't use an expression for the 'geonames.org' connection string. So we do it this way.
        URL = Me.Connections.Geonames.ConnectionString.Replace("{Zip}", Uri.EscapeUriString(Row.Zip))
        URL = URL.Replace("{City}", Uri.EscapeUriString(Row.City))
        httpConnection.ServerURL = URL
        'the answer declares encoding="UTF-8", so decode it as UTF-8 rather than the system default
        xmlResult.LoadXml(Encoding.UTF8.GetString(httpConnection.DownloadData()))

        'xmlResult example:
        '<?xml version="1.0" encoding="UTF-8" ?>
        '<geonames>
        '  <totalResultsCount>1</totalResultsCount>
        '  <code>
        '    <postalcode>9011</postalcode>
        '    <name>Irnsum (Jirnsum)</name>
        '    <countryCode>NL</countryCode>
        '    <lat>53.09166665</lat>
        '    <lng>5.75</lng>
        '  </code>
        '</geonames>

        'parse result and set country
        node = xmlResult.DocumentElement.SelectSingleNode("/geonames/code/countryCode")
        Row.Country = CStr(IIf(node Is Nothing, "<unknown>", node.InnerText))
    End Sub

    Public Overrides Sub AcquireConnections(ByVal Transaction As Object)
        MyBase.AcquireConnections(Transaction)
        'wrap the connection manager "Geonames" in an HTTP client connection
        httpConnection = New HttpClientConnection(Me.Connections.Geonames.AcquireConnection(Nothing))
    End Sub

    Public Overrides Sub ReleaseConnections()
        MyBase.ReleaseConnections()
        httpConnection = Nothing
    End Sub
End Class


Appendix B – Addresses Table Definition
