
Declaration: The information below is collected from internet blogs as well as from my personal experience.

There is no hard and fast rule that must be followed while developing SSIS packages, but these are generally tried and tested methods.

How To: Configure a For Loop Container

1. Start in the package Control Flow, with no object selected (press ESC in the main window).
2. Right-click the background in the package, and select Variables.
3. Add a variable called LoopIteration with the Int32 data type in the package.
4. Add a For Loop Container to the package.
5. Edit the For Loop Container by double-clicking it, or right-clicking it and choosing Edit.
6. Set the InitExpression to @[User::LoopIteration] = 0
7. Set the EvalExpression to @[User::LoopIteration] < 5, where 5 is the number of loops you want to run.
8. Set the AssignExpression to @[User::LoopIteration] = @[User::LoopIteration] + 1
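The three expressions behave like the clauses of a C-style for loop. As a minimal sketch (Python standing in for the container's expression evaluator; the function name is mine, not part of SSIS):

```python
# Mimics a For Loop Container: InitExpression runs once, EvalExpression is
# checked before each iteration, AssignExpression runs after each iteration.
def run_for_loop_container(max_loops=5):
    iterations = []
    loop_iteration = 0                      # InitExpression: @[User::LoopIteration] = 0
    while loop_iteration < max_loops:       # EvalExpression: @[User::LoopIteration] < 5
        iterations.append(loop_iteration)   # the tasks inside the container run here
        loop_iteration += 1                 # AssignExpression: LoopIteration = LoopIteration + 1
    return iterations

print(run_for_loop_container())  # [0, 1, 2, 3, 4]
```

Note the container runs five times, with LoopIteration taking the values 0 through 4.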

Converting SSIS Data Types to .Net Data Types

If you use Scripting tasks, the CLR, or build components for SSIS, you will need to convert values from their SSIS types into .Net types. Here's a quick list of the SSIS data types and their .Net companions.

Integration Services Data Type    Managed Data Type

DT_BYTES                          System.Byte[]
DT_DBTIMESTAMP, DT_DBDATE         System.DateTime
DT_NUMERIC, DT_DECIMAL            System.Decimal
DT_I1                             System.SByte
DT_I2                             System.Int16
DT_I4                             System.Int32
DT_I8                             System.Int64
DT_BOOL                           System.Boolean
DT_R4                             System.Single
DT_R8                             System.Double
DT_UI1                            System.Byte
DT_UI2                            System.UInt16
DT_UI4                            System.UInt32
DT_UI8                            System.UInt64
DT_GUID                           System.Guid
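The same mapping in programmatic form, useful when a Script Task needs to know which .Net type to expect. The dictionary below is a Python sketch of the Books Online mapping table, not an SSIS API:

```python
# SSIS pipeline data types and the .Net types they marshal to,
# per the SQL Server Books Online mapping table.
SSIS_TO_NET = {
    "DT_BYTES": "System.Byte[]",
    "DT_DBTIMESTAMP": "System.DateTime",
    "DT_NUMERIC": "System.Decimal",
    "DT_I1": "System.SByte",
    "DT_I2": "System.Int16",
    "DT_I4": "System.Int32",
    "DT_I8": "System.Int64",
    "DT_BOOL": "System.Boolean",
    "DT_R4": "System.Single",
    "DT_R8": "System.Double",
    "DT_UI1": "System.Byte",
    "DT_UI2": "System.UInt16",
    "DT_UI4": "System.UInt32",
    "DT_UI8": "System.UInt64",
    "DT_GUID": "System.Guid",
}

print(SSIS_TO_NET["DT_I4"])  # System.Int32
```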

Generating a new SSIS Package GUID with the dtutil Utility

If you make a copy of a .dtsx file, the Package ID property remains the same. This value is logged as the SourceID, so to ensure it is unique you should generate a new value for any copy of a package:

dtutil /FILE "Copy of sample.dtsx" /IDRegenerate

Or the shorthand version:

dtutil /FILE "Copy of sample.dtsx" /I

Simon Sabin's SSIS Raw File Reader

Ever used a Raw File destination in SSIS? They're FAST, writing rows from the pipeline to disk as quickly as possible. Raw Files are useful for increasing the throughput speed of a data flow task by using them in Extract / Transform procedures. The reason they are so efficient is that they store the data in a big binary file. Try to read your data from these optimised files, though, and you'll have a lot of trouble figuring out what's what. Simon's saved us all the hassle! Download his new tool and simply point it at your raw file. It will load the data so you can browse through it in a data grid. Much easier! If you are having trouble selecting your files (I've been using the file extension .dtsxraw on mine), just type *.* and hit enter in the open file box to see all the files in your folder. Simon, that might be your first bug to fix!

Using Expressions with SSIS Maintenance Tasks and SelectedDatabases

It's been noted by a few people that Expressions cannot be used to configure the Shrink Database Task at runtime. The problem is that the SelectedDatabases property is a collection, which cannot be created by an expression. The same problem occurs in most of the Maintenance Task items of SSIS. Maintenance Plan Task components are simple wrappers around existing SQL commands. The SQL that is run by the Shrink Database Task is similar to:

USE [scratch]
GO
DBCC SHRINKDATABASE(N'scratch', 1, TRUNCATEONLY)

Instead of using the Shrink Database Task, I prefer using an Execute SQL Task and applying an expression to that. This also gives you greater control of the shrinking process: you can shrink just an individual data or log file if you wish.

Replacing the Shrink Database Task with the more flexible Execute SQL Task

Remove the Shrink Database Task from the package, and add an Execute SQL Task in its place.

Assuming that the name of the database you want to shrink is in the variable DBName, you would set the Expression for the SqlStatementSource property of the Execute SQL Task to: "DBCC SHRINKDATABASE(N'" + @[User::DBName] + "', 1)"

So simple. If you want to shrink multiple databases, you can place the Execute SQL Task inside a For Each Container. You could use the DBCC SHRINKFILE command if you want to shrink just the log or an individual data file; DBCC SHRINKDATABASE shrinks all the log and data files for a database. If you must use the Maintenance Plan Task component, Kirk Haselden has written an example of a package that modifies itself at runtime. You could use this method, however in this case it might make the problem more complex.
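To see what the expression produces, here is a small Python sketch of the string concatenation the SSIS expression performs at runtime (the function name is illustrative only):

```python
# Builds the same statement as the SSIS expression:
# "DBCC SHRINKDATABASE(N'" + @[User::DBName] + "', 1)"
def shrink_statement(db_name):
    return "DBCC SHRINKDATABASE(N'" + db_name + "', 1)"

print(shrink_statement("scratch"))  # DBCC SHRINKDATABASE(N'scratch', 1)
```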

No!!! Undo, Undo, Undo!!

Or rather, why is there no "undo" when working with Integration Services packages? This has been something that has puzzled me since working with Beta 2 of Integration Services. I guess I figured it would be fixed by RTM, and just worked around the issue by regularly checking in finalised changes, and saving only when I'm sure of the changes I've made. But it hardly seems logical that while Business Intelligence Development Studio is built on the Visual Studio IDE, undo doesn't work. Even simple undos, such as undoing changes to Annotation Textboxes, or undoing layout changes of diagrams in Control / Data Flows. Gah! I meant to double-click on that task, not move it! Undo! Please?

Under the covers, SSIS packages are saved as XML files. So why can't Business Intelligence Development Studio simply keep a history of delta changes to that XML document? Even if it's too difficult to undo the delta changes between versions, it should be simple enough to refresh (or open) from the change history and reset the working environment. In the meantime, I guess I'll just have to keep using the "undo checkout" feature of SourceSafe. If you want to vote on having "undo", or other changes included in the tools for the next version of SQL Server, have a look at the MSDN Product Feedback Centre for SQL Server.

SSIS Lookup with value range

Joe Salvatore called out for help on an interesting use of the SSIS Lookup component: can you look up a row using a value within a range? Joe specified his criteria as:

1. DataStagingSource.ModifyDate < DataWarehouseDimension.RowExpiredDate AND
2. DataStagingSource.ModifyDate >= DataWarehouseDimension.RowEffectiveDate AND
3. DataStagingSource.NaturalKey = DataWarehouseDimension.NaturalKey

Easy! To show how it is done I've created a test database "SCRATCH" on my local machine, and created two tables and some data with the following script:

CREATE TABLE dbo.sourcedata (naturalkey varchar(32), modifydate smalldatetime)

insert into dbo.sourcedata (naturalkey, modifydate) values ('a','1 Jan 2006')

CREATE TABLE dbo.lookupdimension (naturalkey varchar(32), roweffectivedate smalldatetime, rowexpireddate smalldatetime, surrogatekey int)

insert into dbo.lookupdimension (naturalkey, roweffectivedate, rowexpireddate, surrogatekey) values ('a', '11 dec 2005', '28 feb 2006', 1)

For an example of what we want to get in our output, run:


select sourcedata.naturalkey, sourcedata.modifydate, lookupdimension.surrogatekey
from sourcedata
left outer join lookupdimension
on sourcedata.naturalkey = lookupdimension.naturalkey
and sourcedata.modifydate >= lookupdimension.roweffectivedate
and sourcedata.modifydate < lookupdimension.rowexpireddate
Create an SSIS package with a connection to the database. Add a Data Flow task, an OLE DB Source, a Lookup, and some destination for the data (I used a Flat File Destination as it requires the least configuration when you just want a proof of concept). Here's how my Data Flow task looks:

Make sure the OLE DB Source connector selects all the columns from the dbo.sourcedata table. Hook up the OLE DB Source and Lookup. Then open the properties of the Lookup component by double-clicking it, or right-click and choose Properties. Select dbo.lookupdimension as the lookup source.

In the Columns tab, make sure the naturalkey has a relationship between the tables. Drag and drop the modifydate column to roweffectivedate. Select surrogatekey as the output.

Now here is where the trick comes in. Jump to the Advanced tab, tick Enable memory restriction, then tick Modify the SQL statement. You can now modify the select statement that is used by the component to perform lookups. Let's set it up with the same rules that Joe wants:

select * from (select * from [dbo].[lookupdimension]) as refTable where [refTable].[naturalkey] = ? and ? >= [refTable].[roweffectivedate] and ? < [refTable].[rowexpireddate]

Now click the Parameters button. You should be able to set three parameters here. naturalkey should be the input column for the first parameter, with modifydate for the other two.

Configure your destination, and add a Data Viewer to the flow if you want to be able to see the results.

Your lookup should be able to locate the row in the dbo.lookupdimension table and return the surrogatekey value.
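The lookup logic above can be sketched in plain Python to check the expected result against the sample data (a simulation only; the real work is done by the Lookup component):

```python
# Range lookup: match on naturalkey, and require modifydate to fall inside
# the [roweffectivedate, rowexpireddate) window. Data mirrors the SCRATCH script.
from datetime import date

lookup_dimension = [
    {"naturalkey": "a", "roweffectivedate": date(2005, 12, 11),
     "rowexpireddate": date(2006, 2, 28), "surrogatekey": 1},
]

def range_lookup(naturalkey, modifydate):
    for row in lookup_dimension:
        if (row["naturalkey"] == naturalkey
                and row["roweffectivedate"] <= modifydate < row["rowexpireddate"]):
            return row["surrogatekey"]
    return None  # a lookup miss

print(range_lookup("a", date(2006, 1, 1)))  # 1
```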

How to use OUTPUT parameters with SSIS Execute SQL Task

Yesterday while trying to get OUTPUT parameters to work with SSIS Execute SQL Task I encountered a lot of problems, which I'm sure other people have experienced. BOL Help is very light on this subject, so consider this the lost page in help.

The problem comes about because different providers expect parameters to be declared in different ways. OLEDB expects parameters to be marked in the SQL statement with ? (a question mark) and use ordinal positions (0, 1, 2...) as the Parameter name. ADO.Net expects you to use the parameter name in both the SQL statement and the Parameters page. In order to use OUTPUT parameters to return values, you must follow these steps while configuring the Execute SQL Task: For OLEDB Connection Types:

1. You must select the OLEDB connection type. The IsQueryStoredProcedure option will be greyed out.
2. Use the syntax EXEC ? = dbo.StoredProcedureName ? OUTPUT, ? OUTPUT, ? OUTPUT, ? OUTPUT. The first ? will give the return code. You can use the syntax EXEC dbo.StoredProcedureName ? OUTPUT, ? OUTPUT, ? OUTPUT, ? OUTPUT if you do not want to capture the return code.
3. Ensure a compatible data type is selected for each parameter in the Parameters page. Set your parameters' Direction to Output. Set the Parameter Name to the parameter marker's ordinal position; that is, the first ? maps to Parameter Name 0, the second ? maps to Parameter Name 1, etc.
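The ordinal-naming rule in step 3 is mechanical: the Nth ? marker in the statement becomes Parameter Name N-1. A throwaway Python illustration (not an SSIS API):

```python
# Count the ? markers in an OLEDB statement and produce the ordinal
# Parameter Names SSIS expects: '0', '1', '2', ...
def oledb_parameter_names(sql):
    return [str(i) for i in range(sql.count("?"))]

print(oledb_parameter_names("EXEC ? = dbo.StoredProcedureName ? OUTPUT, ? OUTPUT"))
# ['0', '1', '2']
```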

For ADO.Net Connection Types:

1. You must select the ADO.Net connection type.
2. You must set IsQueryStoredProcedure to True.
3. Put only the stored procedure's name in SQLStatement.

Ensure the data type for each parameter in Parameter Mappings matches the data type you declared the variable as in your SSIS package. Set your parameters' Direction to Output. Set the Parameter Name to the same name as the parameter declared in the stored procedure.

For other connection types, check out the table on this page



Note: if you choose the ADO/ADO.Net connection type, parameters will not have data types like LONG, ULONG, etc.; the data types will change to Int32, etc. Make sure that the data type is EXACTLY the same type as the variable in your package is defined. If you choose a different data type (bigger/smaller/different type) you will get the error:

Error: 0xC001F009 at Customers: The type of the value being assigned to variable "User::Result_CustomerID" differs from the current variable type. Variables may not change type during execution. Variable types are strict, except for variables of type Object.

Error: 0xC002F210 at Add New Customer, Execute SQL Task: Executing the query "dbo.AddCustomer" failed with the following error: "The type of the value being assigned to variable "User::Result_CustomerID" differs from the current variable type. Variables may not change type during execution. Variable types are strict, except for variables of type Object. ". Possible failure reasons: Problems with the query, "ResultSet" property not set correctly, parameters not set correctly, or connection not established correctly.

To fix this error, make sure the data type you select for each parameter in the Parameters page exactly matches the data type of the variable. If you have attempted to use a connection type other than ADO.Net with named parameters, you will receive this error:


Error: 0xC002F210 at Add New customer, Execute SQL Task: Executing the query "exec dbo.AddCustomer" failed with the following error: "Value does not fall within the expected range.". Possible failure reasons: Problems with the query, "ResultSet" property not set correctly, parameters not set correctly, or connection not established correctly.

Named parameters can only be used with the ADO.Net connection type. Use ordinal position numbering (0, 1, 2, 3, etc.) in order to use OUTPUT parameters with the OLEDB connection type.

OUTPUT parameters are extremely useful for returning small fragments of data from SQL Server, instead of having a recordset returned. You might use OUTPUT parameters when you want to load a value into an SSIS package variable so that the value can be reused in many places. The data that is output might be used for configuring / controlling other Control Flow items, instead of being part of a data flow task. If you were using output parameters in Management Studio, your SQL statement might look something like:

DECLARE @Name nvarchar(125)
DECLARE @DOB smalldatetime
DECLARE @CustomerID int
EXEC dbo.AddCustomer @CustomerName = @Name, @CustomerDOB = @DOB, @CustomerID = @CustomerID OUTPUT
PRINT @CustomerID

If you attempt to use the same named-parameter syntax with an Execute SQL Task, you could end up with the error message:

Error: 0xC002F210 at Add New customer, Execute SQL Task: Executing the query "EXEC dbo.AddCustomer @CustomerName = @Name" failed with the following error: "Must declare the scalar variable "@Name".". Possible failure reasons: Problems with the query, "ResultSet" property not set correctly, parameters not set correctly, or connection not established correctly.

The only hint SQL Server 2005 Books Online gives is:

IsQueryStoredProcedure: Indicates whether the specified SQL statement to be run is a stored procedure. This property is read/write only if the task uses the ADO connection manager. Otherwise the property is read-only and its value is false. (from SSIS Designer F1 Help > Task Properties UI Reference > Execute SQL Task Editor (General Page))

There are a number of pages in Books Online that address parameter use with the Execute SQL Task, but none adequately address using output parameters. Articles which could do with updating:

How to: Map Query Parameters to Variables in an Execute SQL Task Execute SQL Task Editor (Parameter Mapping Page) Execute SQL Task Editor (General Page) Execute SQL Task Execute SQL Task (Integration Services)


Response to SSIS: Be wary of using SourceSafe

Jamie's post touches on one potential problem of having SSIS packages that are too big: errors with SourceSafe while checking in. SSIS packages are saved as XML documents, so SourceSafe is going to put in a lot of effort to save the delta changes of the XML between saves. While I haven't encountered this problem in my environment, Jamie does make a suggestion which is very valuable: "Keep your packages as small as can be. A modular approach to development can be of real benefit". I'd definitely agree with Jamie on this approach, especially when using SourceSafe in a team development environment.

Because SSIS packages are XML, you cannot assume that SourceSafe is going to safely merge changes from shared checkouts. In short: you cannot have multiple developers working on the same package at once. When designing/developing a package, ensure only a single task / dataflow is put into a Data Flow task. Group Control Flow tasks with a Sequence Container when you can. When you find your package is becoming too big, or you need to have more than one developer working on it at once, you can easily move the group to a new package, and use an Execute Package Task in the parent to keep the control flow. In addition to enabling your project to have more developers working at once, you also gain finer control of packages and tracking of changes.

In an article on Event Handler Issues, Daniel Read takes issue that Event Handlers fire too often. If his packages had been more modular, with a parent package calling multiple (sub)packages, enabling events only in the parent package would suffice. Event Handlers in the child packages would not be necessary.

So some quick tips on using SSIS with SourceSafe:

1. Use exclusive checkouts (not shared checkouts) for SSIS package files.
2. Group Control Flow tasks with Sequence Containers in case you want to break up a package later.
3. Use the Execute Package Task to call other packages from a parent package.
4. Log the Package Version with your custom Event Handlers so you know what version ran when.

On Jamie's other note to back up your SourceSafe: that's a definite. I would back up any resource that is part of your work; source code is especially precious. SourceSafe can be used for tracking changes and for little incremental backups of daily work. But backing up SourceSafe helps you recover from detrimental changes to SourceSafe (like permanently deleting a SourceSafe project) or from corruption of SourceSafe itself.

SSIS: Handling Lookup Misses

Let me direct you to an article by Ash on speed differences between two approaches to handling failed lookups in SQL Server Integration Services. Failed lookups are likely to happen at some point, and in most cases you won't want your package to fail as a result; rather, a default or derived value or other handling routine should be used. There are plenty of ways to handle rows redirected as a result of a failure. Ash provides a good reason why ignoring the failures (instead of redirecting the rows) is a better option: memcopy. In order for your row to go down a different path, the data must be copied into that path's buffer. Ignoring the failure means no copying of the row. While the timings aren't significantly different for a single lookup, if you were loading a huge amount of data into multiple fact tables, with lots of lookups, you could gain a lot of performance overall. Ash's test shows why benchmarking different methods can often find a more efficient method.

Lookups on a Range
Sometimes it is necessary to look up a value against a range of values in a table rather than an exact match. SSIS can do a range lookup in the Lookup component itself. On the Advanced tab you can check the "Modify the SQL statement" box and manually give it any SQL statement you like. Set the parameterMap property of the Lookup to the lineage IDs of the parameter columns. The lineage IDs are in the Lookup advanced editor under Input and Output Properties.

Here is a published help link: it is for SSIS 2005, but SSIS 2008 is similar.
See also the example in the DateRangeLookup.dtsx package in SSIS_EXAMPLES.

http://svn/corp/CBI/Enterprise Data Warehouse ETL/trunk/SSISPackages/SSIS_EXAMPLES/Packages

Conditional SQL Statements in Expressions

Expressions can be utilized to create conditional SQL statements. For example, this statement in an expression determines which SELECT statement to use based on INSelectSourceDataFlag:

@[User::INSelectSourceDataFlag] == True ?
"SELECT lot_id, current_mfg_facility_oid, staged_datetime, trav_step_oid, mfg_part_code, lot_in_qty FROM fabps.fab_fab_lot_status WHERE (CURRENT_FLAG = 'Y') and staged_datetime >= to_date('" + @[User::INStartExtDate] + "','yyyy-mm-dd hh24:mi:ss') and staged_datetime <= to_date('" + @[User::INEndExtDate] + "','yyyy-mm-dd hh24:mi:ss') and rownum < 100" :
"SELECT lot_id, current_mfg_facility_oid, staged_datetime, trav_step_oid, mfg_part_code, lot_in_qty
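The pattern is the SSIS expression language's conditional operator (condition ? value_if_true : value_if_false). Here is a Python sketch of the same decision, with shortened placeholder statements rather than the full SELECTs above:

```python
# Pick between a date-filtered SELECT and a plain SELECT, the way the
# SSIS expression does with @[User::INSelectSourceDataFlag].
def build_statement(select_source_data_flag, start_date, end_date):
    base = "SELECT lot_id, staged_datetime FROM fabps.fab_fab_lot_status"
    if select_source_data_flag:
        return (base
                + " WHERE staged_datetime >= to_date('" + start_date + "','yyyy-mm-dd hh24:mi:ss')"
                + " AND staged_datetime <= to_date('" + end_date + "','yyyy-mm-dd hh24:mi:ss')")
    return base

print(build_statement(True, "2006-01-01 00:00:00", "2006-01-02 00:00:00"))
```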

Title: What is BIDS error "ORA-12154: TNS:could not resolve the connect identifier specified"?

Answer: Error "ORA-12154: TNS:could not resolve the connect identifier specified" occurs when defining a new Oracle Connection Manager that uses a Configuration File to retrieve its login credentials. The package only reads the Configuration File on opening, so the fix is to close and re-open the package, which causes it to refresh its database connection credentials from the config file.


Title: Which SSIS dataflow transformations are synchronous or cause pipeline blocking?

Answer: Blocking in transformations can adversely affect package performance. The link below classifies the SSIS transformations into synchronous vs. asynchronous and groups them into the levels of blocking that they cause. Note, it is specifically written for SSIS 2005 but likely applies to 2008 as well. Help Link

Title: What causes BIDS error "You cannot debug or run this project, because the required version of the Microsoft Office application is not installed"?

Answer: You can't set breakpoints in the Script components of BIDS. Here's what MSDN has to say about it. Excerpt:

The Script component does not support the use of breakpoints. Therefore, you cannot step through your code and examine values as the package runs. You can monitor the execution of the Script component by using the following methods:

1. Interrupt execution and display a modal message by using the MessageBox.Show method in the System.Windows.Forms namespace. (Remove this code after you complete the debugging process.)
2. Raise events for informational messages, warnings, and errors. The FireInformation, FireWarning, and FireError methods display the event description in the Visual Studio Output window. However, the FireProgress method, the Console.Write method, and the Console.WriteLine method do not display any information in the Output window. Messages from the FireProgress event appear on the Progress tab of SSIS Designer. For more information, see Raising Events in the Script Component.
3. Log events or user-defined messages to enabled logging providers. For more information, see Logging in the Script Component.
4. If you just want to examine the output of a Script component configured as a source or as a transformation, without saving the data to a destination, you can stop the data flow with a Row Count transformation and attach a data viewer to the output of the Script component. For information about data viewers, see Debugging Data Flow.

Hat Tip: Justin Kuttler


Hat Tip: Arthur Delight

SSIS - Lookup Cache Modes - Full, Partial, None
Posted by Phil Brammer under SSIS, SSIS Advanced Techniques, SSIS Data flow

There are many, many resources out on the Net regarding SSIS and the Lookup component, what each of its cache modes is, and how to implement them in your own package. This is going to be a technical post, for those of you interested in what each cache mode does behind the scenes. For this post, use the following schema and data:
create table fact_sales (
  id int identity(1,1),
  sales_rep_id int,
  sales_dollars decimal(18,2)
)

create table dim_sales_rep (
  id int identity(1,1),
  first_name varchar(30),
  last_name varchar(50)
)

insert into fact_sales (sales_rep_id, sales_dollars) values (1,120.99);
insert into fact_sales (sales_rep_id, sales_dollars) values (2,24.87);
insert into fact_sales (sales_rep_id, sales_dollars) values (3,98.11);
insert into fact_sales (sales_rep_id, sales_dollars) values (4,70.64);
insert into fact_sales (sales_rep_id, sales_dollars) values (4,114.19);
insert into fact_sales (sales_rep_id, sales_dollars) values (4,37.00);
insert into fact_sales (sales_rep_id, sales_dollars) values (5,401.50);

insert into dim_sales_rep (first_name, last_name) values ('John','Doe');
insert into dim_sales_rep (first_name, last_name) values ('Jane','Doe');
insert into dim_sales_rep (first_name, last_name) values ('Larry','White');
insert into dim_sales_rep (first_name, last_name) values ('Carrie','Green');
insert into dim_sales_rep (first_name, last_name) values ('Adam','Smith');

FULL Cache Mode

First, it is always advisable to build a query for the lookup, instead of choosing a table in the Table/View drop-down. The primary reason is so that you can limit the resultset to only the columns needed to perform the lookup as well as return any columns needed downstream, and to have the ability to add a WHERE clause if needed. The full cache mode will run the specified query (or its own, depending on how you assigned the lookup table) and attempt to cache all of the results. It will execute this query very early in the package execution, to ensure the cache is ready before the first set of rows comes out of the source(s). If SSIS runs out of memory on the machine, though, the data flow will fail, as the lookup component will not spool its memory overflow to disk. Be cautious of this fact. Once the data is cached, the lookup component will not go back to the database to retrieve its records, so long as the data flow is not restarted. (In SQL Server 2008, you can now reuse lookup caches.) Using SQL Profiler, you can see that only one database call is made:
declare @p1 int
set @p1=1
exec sp_prepare @p1 output,NULL,N'select sales_rep_id, sales_dollars from fact_sales',1
select @p1
go
exec sp_execute 1
go
SET NO_BROWSETABLE ON
go
declare @p1 int
set @p1=1
exec sp_prepare @p1 output,NULL,N'select id, first_name, last_name from dim_sales_rep',1
select @p1
go
exec sp_execute 1
go
exec sp_unprepare 1
go
exec sp_unprepare 1
go

PARTIAL Cache Mode

Partial cache mode will not execute a query immediately at package execution. Instead, it will wait until its first input row arrives. Once the row arrives, whatever lookup value (in this case, sales_rep_id) is being passed in will get substituted for a parameter, and then SSIS will send the query to the database for retrieval. At this point, all of the data returned will be cached for future lookups. If a new sales_rep_id is encountered, the query will have to be re-executed, and the new resultset will get added to the lookup cache. In other words, with the above data, if my source is select sales_rep_id, sales_dollars from fact_sales, we should have five database calls made by the lookup component. Even though for sales_rep_id = 4 we have three entries, in partial cache mode the first time we retrieve the lookup records for sales_rep_id = 4, the results will be cached, allowing future occurrences of sales_rep_id = 4 to be retrieved from cache. This is illustrated in the SQL Profiler data:
exec sp_executesql N'select * from (select id, first_name, last_name from dim_sales_rep) [refTable] where [refTable].[id] = @P1',N'@P1 int',1
go
exec sp_executesql N'select * from (select id, first_name, last_name from dim_sales_rep) [refTable] where [refTable].[id] = @P1',N'@P1 int',2
go
exec sp_executesql N'select * from (select id, first_name, last_name from dim_sales_rep) [refTable] where [refTable].[id] = @P1',N'@P1 int',3
go
exec sp_executesql N'select * from (select id, first_name, last_name from dim_sales_rep) [refTable] where [refTable].[id] = @P1',N'@P1 int',4
go
exec sp_executesql N'select * from (select id, first_name, last_name from dim_sales_rep) [refTable] where [refTable].[id] = @P1',N'@P1 int',5
go
exec sp_unprepare 1
go

In the above data, you can see at the end each sales_rep_id being passed in. Note that we only have one line for sales_rep_id = 4. That's because the remaining two records were bounced against the lookup cache, avoiding a trip to the database.

NO Cache Mode

Using the NO cache mode will essentially tell SSIS that you want each incoming row (from fact_sales in this case) to be bounced against the database. Since we have seven fact_sales rows, we will see seven calls to the database - MOST of the time. It is important to note that even though we are telling the lookup component to avoid caching rows, it will keep the last match in memory and use it for the next comparison. If the next comparison's key value matches the value still in memory, a database call is avoided, and the value is carried forward. In our example data above, if we sort by sales_rep_id, we will still only have five calls to the database, because after we look up our first value of sales_rep_id = 4, it will be reused for the subsequent sales_rep_id = 4 lookups. If we sort our data by sales_dollars, we will have six database calls, because only two sales_rep_id = 4 records are together and hence the first lookup is only used once. Here is a simple table illustrating each no-cache example mentioned above:
Sorted by sales_rep_id:

SALES_REP_ID   SALES_DOLLARS   LOOKUP DATABASE CALL (Y or N)
1              120.99          Y
2              24.87           Y
3              98.11           Y
4              70.64           Y
4              114.19          N
4              37.00           N
5              401.50          Y

Sorted by sales_dollars:

SALES_REP_ID   SALES_DOLLARS   LOOKUP DATABASE CALL (Y or N)
2              24.87           Y
4              37.00           Y
4              70.64           N
3              98.11           Y
4              114.19          Y
1              120.99          Y
5              401.50          Y
The SQL Profiler data for the second example above is here:
exec sp_executesql N'select * from (select id, first_name, last_name from dim_sales_rep) [refTable] where [refTable].[id] = @P1',N'@P1 int',2
go
exec sp_executesql N'select * from (select id, first_name, last_name from dim_sales_rep) [refTable] where [refTable].[id] = @P1',N'@P1 int',4
go
exec sp_executesql N'select * from (select id, first_name, last_name from dim_sales_rep) [refTable] where [refTable].[id] = @P1',N'@P1 int',3
go
exec sp_executesql N'select * from (select id, first_name, last_name from dim_sales_rep) [refTable] where [refTable].[id] = @P1',N'@P1 int',4
go
exec sp_executesql N'select * from (select id, first_name, last_name from dim_sales_rep) [refTable] where [refTable].[id] = @P1',N'@P1 int',1
go
exec sp_executesql N'select * from (select id, first_name, last_name from dim_sales_rep) [refTable] where [refTable].[id] = @P1',N'@P1 int',5
go
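The call counts for all three modes can be checked with a small simulation (plain Python, using the sample fact_sales rows above):

```python
# fact_sales rows as (sales_rep_id, sales_dollars), per the insert script above.
fact_sales = [(1, 120.99), (2, 24.87), (3, 98.11), (4, 70.64),
              (4, 114.19), (4, 37.00), (5, 401.50)]

def partial_cache_calls(rows):
    # Partial cache: one database call per previously unseen key.
    cache, calls = set(), 0
    for rep_id, _ in rows:
        if rep_id not in cache:
            calls += 1
            cache.add(rep_id)
    return calls

def no_cache_calls(rows):
    # No cache: one call per row, except the last match is kept in memory
    # and reused when the next row has the same key.
    last, calls = None, 0
    for rep_id, _ in rows:
        if rep_id != last:
            calls += 1
            last = rep_id
    return calls

print(partial_cache_calls(fact_sales))                         # 5
print(no_cache_calls(sorted(fact_sales)))                      # 5 (sorted by sales_rep_id)
print(no_cache_calls(sorted(fact_sales, key=lambda r: r[1])))  # 6 (sorted by sales_dollars)
```

Full cache mode, by contrast, makes a single up-front call regardless of the row order.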

Get all from Table A that isn't in Table B

A common requirement when building a data warehouse is to be able to get all rows from a staging table where the business key is not in the dimension table. For example, I may want to get all rows from my STG_DATE table where the DateID is not in DIM_DATE.DateID. There are 2 ways to do this in conventional SQL.
For example:

SELECT s.* FROM STG_DATE s LEFT OUTER JOIN DIM_DATE d ON s.DateID = d.DateID WHERE d.DateID IS NULL

SELECT * FROM STG_DATE WHERE DateID NOT IN (SELECT DateID FROM DIM_DATE)





There are many cases where using conventional T-SQL may not be an option for achieving this. Perhaps the data is on different servers. Or perhaps STG_DATE isn't even a table; it may be a text file. In these cases you may have to use DTS to achieve your required results. There are 2 methods of doing this in DTS 2005. The first method is analogous to the first SQL statement above, whereas the second method builds on some of the new functionality in DTS 2005. We're going to need a source and destination table for demonstration purposes. Let's create them now. For simplicity we are going to place the tables into the same database.



, DayOfQuarter INT
)


And we're going to need some data in STG_DATE. If you don't edit it, the following script will create a data set from 1st Jan 1900 to 31st Dec 2050, which results in 55152 rows.

SET DATEFIRST 1
DECLARE @startdate DATETIME
DECLARE @enddate DATETIME
DECLARE @date DATETIME
DECLARE @id INT
SET @startdate = '1900-01-01' --Change these to
SET @enddate = '2050-12-31'   --whatever you want
SET @id = 0
SET @date = DATEADD(dd, @id, @startdate)
WHILE @date <= @enddate
BEGIN
    INSERT INTO STG_DATE
    VALUES (@id --DateID
    , @date --TheDate
    , DATEPART(dd, @date) --DayOfMonth
    , DATEPART(dy, @date) --DayOfYear
    , DATEPART(dw, @date) --DayOfWeek
    , DATENAME(dw, @date) --DayName
    , DATEPART(ww, @date) --WeekOfYear
    , 'Week ' + RIGHT('0' + DATENAME(ww, @date), 2) --WeekName
    , DATEPART(mm, @date) --MonthOfYear
    , DATENAME(mm, @date) --MonthName
    , DATEPART(qq, @date) --Quarter
    , 'Q' + DATENAME(qq, @date) + ' ' + DATENAME(yy, @date) --QuarterName
    , DATEPART(yy, @date) --Year
    , DATEPART(hh, @date) --Hour
    , DATEPART(mi, @date) --Minute
    , DATEPART(ss, @date) --Second
    , CASE WHEN DATEPART(dw, @date) IN (6,7) THEN 0 ELSE 1 END --IsWeekday
    , (DATEDIFF(DAY, DATEADD(qq, DATEDIFF(qq,0,@date), 0), @date) + 1) --DayOfQuarter
    )
    SET @id = @id + 1
    SET @date = DATEADD(dd, @id, @startdate)
END

Our aim is quite simply to get all the data from STG_DATE into DIM_DATE using DTS 2005.

Method 1
This method is an implementation of the LEFT OUTER JOIN with an IS NULL clause SQL statement that you see at the top of this article. Let's have a look at the data flow.

The STG_DATE source and DIM_DATE source OLE DB Source adapters point to the two tables we have just created. I am assuming that you are familiar with OLE DB Source adapters and know how to set up connections to the two tables. The Order by DateID Sort transformations do exactly what they say on the tin. Now we are going to join the data sourced from the STG_DATE and DIM_DATE tables.

- Drag a Merge Join transformation onto the designer and rename it Left Outer Join on DateID
- Drag the green connector from Order by DateID STG_DATE to Left Outer Join on DateID
- The Input Output Selection dialog will appear. From the "Input:" combo box select "Merge Join Left Input". We are modeling a SQL Left Outer Join, hence it is important that the correct data flow is applied to the correct Left Outer Join on DateID input
- Double click Left Outer Join on DateID to enter the Merge Join Editor
- Change the Join Type to "Left Outer Join"
- Select all the check boxes next to the columns in the left input. This will add them to the output data flow from Left Outer Join on DateID
- Select the DateID column from the right input and give it an alias of DIM_DATE_DateID. This will add the DateID column to the output data flow from Left Outer Join on DateID

Your Merge Join Editor should now look something like the following.


The Conditional Split transformation is used to split a data flow into multiple data flows based on the state of the data. In this case we are going to use it to identify all the data that has a NULL value in the DIM_DATE_DateID column.

- Drag a Conditional Split transformation onto the data flow designer
- Rename the Conditional Split transformation as Get New Dates
- Drag the output connector from Left Outer Join on DateID to Get New Dates
- Double click Get New Dates to view the Conditional Split Editor
- Expand Columns in the top left treeview and drag the DIM_DATE_DateID column into the first row in the bottom half of the editor. This will enable us to build a condition on this column that allows us to define what data goes into this output
- Change the Condition to "ISNULL([DIM_DATE_DateID])". This will ensure that only rows that have the value NULL in the DIM_DATE_DateID column will be included in this output
- Change the Output Name to "New Rows"

Your Conditional Split Editor should now look like this.

Finally, we need to put the data somewhere at the end of the data flow.

- Drag an OLE DB Destination adapter onto the designer
- Rename it DIM_DATE destination
- Drag the output connector from Get New Dates Conditional Split to DIM_DATE destination
- The Input Output Selection dialog will appear. From the "Output:" combo box select "New Rows"
- Double click DIM_DATE destination
- On the Connection tab select the DIM_DATE table
- Click on the Mappings tab. The correct mappings will be created automatically by joining fields with identical names

And that's it! Executing this data flow will result in all 55152 rows being copied from STG_DATE to DIM_DATE.
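A quick way to confirm the load, assuming the 1900-2050 data set generated earlier:

```sql
-- Both counts should report 55152 rows
SELECT COUNT(*) AS StageRows FROM STG_DATE
SELECT COUNT(*) AS DimRows FROM DIM_DATE
```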

Method 2
This method uses the new DTS Lookup transformation. Before running this you will have to empty the destination table that you have just populated with method 1.

Execute TRUNCATE TABLE DIM_DATE in SQL Server management Studio

Let's have a look at the data flow.

The STG_DATE OLE DB Source adapter points to the STG_DATE table that we created earlier. Again I will assume you can configure this yourself. Now drag a Lookup transformation onto your data flow designer.

- Rename the Lookup as Lookup DIM_DATE
- Drag the green connector from STG_DATE to Lookup DIM_DATE
- Double click Lookup DIM_DATE to open the Lookup Editor
- In the "Connection" combo box select the connection that points to the DIM_DATE table
- From the "Use a table or a view" combo box select the DIM_DATE table. This is specifying that we are going to use the DIM_DATE table as a lookup
- Click the Columns tab
- Mappings between the input columns and the lookup columns will have been created automatically. Delete all mappings except the one between the DateID columns. This ensures that we will use values from the lookup table where the values in the DateID columns are equal

The Lookup Editor Columns tab should look like the following.


The error output from Lookup DIM_DATE will contain all the rows for which there was no matching record in the lookup table. In other words, it will contain all the rows that are not already in the lookup table. This is not an error per se, but it achieves our aim.

- Drag a new OLE DB Destination adapter onto the data flow designer
- Rename the adapter DIM_DATE
- Drag the red Error output from Lookup DIM_DATE to DIM_DATE
- The "Configure Error Output" dialog will be displayed. In the Error combo box change the selection to "Redirect Row" and click OK. This will ensure that no error is raised when rows get added to the Error output
- Double click DIM_DATE to display the OLE DB Destination Editor
- Configure DIM_DATE to point to the appropriate table and click on the Mappings tab. The mappings will be configured automatically

And that is it! Again, executing this data flow will result in all 55152 rows being copied from STG_DATE to DIM_DATE.



There are 2 methods that you can use to accomplish this requirement. Method 2 is quicker to build, although some people may still prefer the first method if they are uncomfortable using the error output for something it was not designed for. Experiment to see which method works best for your data set.



Generating Surrogate Keys

Surrogate keys are generally considered fundamental building blocks of a data warehouse. They are used as identifiers for dimensional members and enable us to manage slowly changing dimensions. SSIS does not contain a built-in component for generating surrogate keys, but there is still a mechanism for doing it: the Script Component. The Script Component allows us to modify the data in a data flow path using managed code, and we can use it to generate surrogate keys. The Row Number Transformation can also be used to help generate surrogate keys.

1. Creating the transformation
When you drag a script component from the toolbox to the design surface you will be prompted as to whether the component is to be a source adapter, a destination adapter or a transformation. Select Transformation.


2. Configuring the transformation metadata

- Your script component should contain 1 input and 1 output
- This is to be a synchronous transformation, so the SynchronousInputID property of the output must be the same as the ID property of the input
- The output should have 1 column, which in this instance I have called SK
- In the case of a synchronous transformation no output buffer is required; the input buffer is used by the output. Therefore, any columns that we create on our output actually appear as if they are part of the input buffer, as we will see later in the script
- A good rule of thumb for synchronous script transformations is that you should only add columns to the output which do not already exist in your input, because the input columns will flow through the script transformation in any event


Note in the screenshot above the SynchronousInputID property of the output is set to the ID of the Input thus indicating that this is a synchronous transformation.

3. Building the script

We are going to use managed code to populate SK with an integer that acts as a surrogate key. The code looks like this:
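The script in the original screenshot is not reproduced here; below is a minimal sketch of what it contains, assuming the component's input is named Input (which gives the Input_ProcessInputRow method) and the output column is SK, as described in the four points that follow:

```vb
Public Class ScriptMain
    Inherits UserComponent

    ' Holds the value that will be output as the surrogate key
    Private counter As Integer

    Public Sub New()
        MyBase.New()
        ' Called just once on each execution - initialise the key value here
        counter = 0
    End Sub

    Public Overrides Sub Input_ProcessInputRow(ByVal Row As InputBuffer)
        ' Called once per row in the input buffer - increment and assign the key.
        ' SK appears on the input buffer because this is a synchronous transformation.
        counter = counter + 1
        Row.SK = counter
    End Sub
End Class
```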


There are four things to note about what is going on here:

- We have declared a variable called counter that will contain a value to be outputted as the surrogate key
- We have initialized counter within the New() method, which gets called just once on each execution
- We increment counter within the Input_ProcessInputRow() method, which gets called for each row in the input buffer
- We are outputting the contents of the input buffer to the component output along with our surrogate key value from counter

And that's it! It's all very simple. Naturally your surrogate key values won't always start from 1 as they do here, but you can build on this method by passing in the current maximum value of the surrogate key in question in order to initialize counter. The example package that has been illustrated above can be downloaded (7KB). It is very simple to run as it does not require any configuration. The synchronous nature of the script transformation is demonstrated by passing through 100000 rows of data. Download it and try it out!
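One way to seed counter as suggested above is to capture the current maximum key with an Execute SQL Task (single-row resultset mapped to an Int32 package variable), then expose that variable to the script via ReadOnlyVariables. A sketch, where the table and column names are illustrative rather than from the example package:

```sql
-- Map MaxSK to a package variable, e.g. User::MaxSK,
-- and initialise counter from it inside New()
SELECT ISNULL(MAX(SK), 0) AS MaxSK
FROM dbo.DIM_CUSTOMER
```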


Easy Package Configuration.

One of the age-old problems in DTS is moving packages between your development, test and production environments. Typically a series of manual edits needs to be made to all the packages to make sure that all the connection objects are pointing to the correct physical servers. This is time consuming and gives rise to the possibility of human error, particularly if the solution incorporates many DTS packages. Many companies have provided their own custom solutions for managing this problem, but these are still workarounds for a problem that was inherent in DTS. Happily, Integration Services (IS) now provides a solution to this problem: Package Configurations. Package configurations are a mechanism for dynamically changing properties of your IS objects and components at run-time using values that are stored externally to the package. There are a number of methods to store and pass these values to a package:

- An XML file
- Environment variable
- Registry settings
- Parent Package Variable

The beauty of an XML file is that each of your packages can point their package configurations at the same XML file which means settings that are pertinent to all packages only have to be changed in one place. In the case of using an XML file to store the values, as long as the XML files are stored in the same place on each environment, (e.g. C:\PackageConfigs\Environment.dtsConfig) it is not necessary to do any editing of the packages when they are moved between environments. This is an example of a direct configuration. An indirect configuration uses environment variables to store the configuration values.


The simplest usage of package configurations would be to store the name of your database server that will be the destination for all your data flows and it is this situation that will be demonstrated herein by use of an XML configuration file. Another method of achieving the same aim would be to store the name of your database server in an environment variable. The environment variable simply needs to be edited on each separate environment.
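For the environment variable approach, the variable only needs creating once per machine, for example from a Windows command prompt (the variable name DW_SERVER and value are illustrative):

```shell
setx DW_SERVER "PRODSQL01"
```

The package configuration then reads the server name from DW_SERVER at run-time, so nothing in the package itself changes between environments.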

Setting up your package

First we will need an IS package to which we will add a package configuration. This demonstration will use the simple concept of importing a text file into a SQL Server table. The text file is a comma-separated-value (CSV) file and should contain the data below, with a file name of PersonAge.txt.

1,Joe Bloggs,27
2,Mary Smith,26
3,Fred Jones,28

The destination table will be called dbo.PersonAge. It will reside in a database called DataStore. Use the following script to create the database and table:

CREATE DATABASE DataStore
GO
USE [DataStore]
GO
SET ANSI_NULLS ON
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[PersonAge](
    [PersonAgeKey] [int] NULL,
    [Person] [varchar](35) NULL,
    [PersonAge] [int] NULL
) ON [PRIMARY]
GO
It is assumed that you are familiar enough with IS to build this package so this article will not explain how to do this in detail except to ask that you build your package thus:

- An Execute SQL task, named Truncate destination. The SQL command is TRUNCATE TABLE dbo.PersonAge.
- A Data Flow task, named Import File.
- In your control flow, drag a precedence constraint arrow from Truncate destination to Import File.

Your control flow should look like this:

Now build the data flow inside our Import File task:


- A Flat File connection called Source pointing at your recently created CSV
- An OLE DB Connection called Destination pointing at your recently created database DataStore
- Import File data flow requires a Flat File Source component pointing at Source. In the advanced editor, Input and Output Properties tab, change the DataType property of external columns Column 0 & Column 2 to DT_I4. Do the same to the output columns Column 0 & Column 2 as well.
- Import File data flow requires an OLE DB Destination pointing at Destination. In the editor select the dbo.PersonAge table as the destination. The output from the Flat File Source is the input to the OLE DB Destination.

Your data flow should look like this:

At this stage you should be able to run the package successfully.

Setting up a package configuration

Now for setting up the package configuration. It's very easy to do but brings tremendous flexibility to your packages. With a package configuration you can edit your package properties, variables, connections and the properties of your control flow tasks (termed executables) at run-time. Note that you cannot edit the properties of your data flow components.

- On the menu bar, point to DTS, Package Configurations, or right-click on the control flow design surface and select Package Configurations
- In the Package Configurations Organizer check Enable package configurations and click Add
- Click through the welcome screen and in the Configuration Type combo select XML Configuration File. In the space for Configuration file name type C:\PackageConfigurations\Environment.dtsConfig and click Next
- In the object tree browse to Connections.Destination.Properties and check the InitialCatalog & ServerName properties. Click Next
- Give your configuration a name and click Finish. See, easy!

Your package will now pick up the values for the 2 properties at run-time. If you open the XML file in a text editor you will be able to see that the properties are currently set to whatever they were set to prior to building the package configuration. Now you could easily move this package (and any other packages in your application) to a completely new environment and all you would have to do is change 1 property in C:\PackageConfigurations\Environment.dtsConfig. Pretty nifty! If you wanted to you could even dynamically populate the name and location of your source file at run-time.
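Opened in a text editor, the generated file follows the .dtsConfig layout. A trimmed sketch is shown below; the server and database values are placeholders, and the generated header element has been omitted for brevity:

```xml
<?xml version="1.0"?>
<DTSConfiguration>
  <Configuration ConfiguredType="Property"
                 Path="\Package.Connections[Destination].Properties[ServerName]"
                 ValueType="String">
    <ConfiguredValue>MYSERVER</ConfiguredValue>
  </Configuration>
  <Configuration ConfiguredType="Property"
                 Path="\Package.Connections[Destination].Properties[InitialCatalog]"
                 ValueType="String">
    <ConfiguredValue>DataStore</ConfiguredValue>
  </Configuration>
</DTSConfiguration>
```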

This is a short introduction to package configurations in order to demonstrate the use of them. Package configurations are used to alter the state of your package at run-time therefore enabling the IS developer to build dynamic packages without having to write custom code.


Download the pre-built demonstration material (7KB).



The Execute SQL Task

In this article we are going to take you through the Execute SQL Task in SQL Server Integration Services for SQL Server 2005 (although it applies just as well to SQL Server 2008). We will be covering all the essentials that you will need to know to effectively use this task and make it as flexible as possible. The things we will be looking at are as follows:

- A tour of the Task
- The properties of the Task

After looking at these introductory topics we will then get into some examples. The examples will show different types of usage for the task:

- Returning a single value from a SQL query with two input parameters
- Returning a rowset from a SQL query
- Executing a stored procedure and retrieving a rowset, a return value, an output parameter value, and passing in an input parameter
- Passing in the SQL Statement from a variable
- Passing in the SQL Statement from a file

Tour Of The Task

Before we can start to use the Execute SQL Task in our packages we are going to need to locate it in the toolbox. Let's do that now. Whilst in the Control Flow section of the package expand your toolbox and locate the Execute SQL Task. Below is how we found ours.


Now drag the task onto the designer. As you can see from the following image, a validation error appears telling us that no connection manager has been assigned to the task. This can be easily remedied by creating a connection manager. Only certain types of connection manager are compatible with this task, so we cannot just create any connection manager; the compatible types are detailed a few graphics later.

Double click on the task itself to take a look at the custom user interface provided to us for this task. The task will open on the General tab as shown below. Take a bit of time to have a look around here, as throughout this article we will be revisiting this page many times.


Whilst on the general tab, drop down the combobox next to the ConnectionType property. In here you will see the types of connection manager which this task will accept.

As with SQL Server 2000 DTS, SSIS allows you to output values from this task in a number of formats. Have a look at the combobox next to the Resultset property. The major difference here is the ability to output into XML.


If you drop down the combobox next to the SQLSourceType property you will see the ways in which you can pass a SQL Statement into the task itself. We will have examples of each of these later on but certainly when we saw these for the first time we were very excited.

If you click in the empty box next to the SQLStatement property you will see an ellipsis appear. Click on it and you will see the very basic query editor that becomes available to you.

Alternatively after you have specified a connection manager for the task you can click on the Build Query button to bring up a completely different query editor. This is slightly inconsistent.


Once you've finished looking around the general tab, move on to the next tab which is the parameter mapping tab. We shall, again, be visiting this tab throughout the article but to give you an initial heads up this is where you define the input, output and return values from your task. Note this is not where you specify the resultset.


If however you now move on to the ResultSet tab this is where you define what variable will receive the output from your SQL Statement in whatever form that is.


Property Expressions are one of the most amazing things to happen in SSIS and they will not be covered here as they deserve a whole article to themselves. Watch out for this as their usefulness will astound you.


For a more detailed discussion of what should be the parameter markers in the SQL Statements on the General tab and how to map them to variables on the Parameter Mapping tab see Working with Parameters and Return Codes in the Execute SQL Task.
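As a quick taste before that article: the parameter marker syntax depends on the connection type you chose on the General tab. The sketch below uses the AdventureWorks HumanResources.Employee table that features later in this article; the @StartDate/@EndDate names are illustrative.

```sql
-- OLE DB or ODBC connection: positional markers.
-- On the Parameter Mapping tab, name the parameters by ordinal
-- (0, 1, ... for OLE DB; 1, 2, ... for ODBC).
SELECT * FROM HumanResources.Employee WHERE HireDate BETWEEN ? AND ?

-- ADO.NET connection: named markers.
-- On the Parameter Mapping tab, use the names (StartDate, EndDate).
SELECT * FROM HumanResources.Employee WHERE HireDate BETWEEN @StartDate AND @EndDate
```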

Task Properties
There are two places where you can specify the properties for your task. One is in the task UI itself and the other is in the property pane which will appear if you right click on your task and select Properties from the context menu. We will be doing plenty of property setting in the UI later so let's take a moment to have a look at the property pane. Below is a graphic showing our properties pane.


Now we shall take you through all the properties and tell you exactly what they mean. A lot of these properties you will see across all tasks, as well as on the package itself, because everything shares the same base structure: the Container.

BypassPrepare: Should the statement be prepared before being sent to the connection manager destination? (True/False)

Connection: Simply the name of the connection manager that the task will use. We can get this from the connection manager tray at the bottom of the package.

DelayValidation: A really interesting property; it tells the task not to validate until it actually executes. One use for this may be that you are operating on a table yet to be created, but at runtime you know the table will be there.

Description: Very simply, the description of your task.

Disable: Should the task be disabled? You can also set this through a context menu by right-clicking on the task itself.

DisableEventHandlers: Should the event handlers for the container fire as a result of events that happen in the task?

ExecValueVariable: The variable assigned here will get or set the execution value of the task.

Expressions: Expressions, as we mentioned earlier, are a really powerful tool in SSIS, and the graphic below gives us a small peek at what you can do. We select a property on the left and assign an expression to the value of that property on the right, causing the value to be changed dynamically at runtime. One of the most obvious uses of this is that the property value can be built dynamically from within the package, allowing you a great deal of flexibility.

FailPackageOnFailure: If this task fails, does the package fail?

FailParentOnFailure: If this task fails, does the parent container fail? A task can be hosted inside another container, e.g. the For Each Loop Container, and this would then be the parent.

ForcedExecutionValue: This property allows you to hard-code an execution value for the task.

ForcedExecutionValueType: The data type of the ForcedExecutionValue.

ForceExecutionResult: Force the task to return a certain execution result. This could then be used by the workflow constraints. Possible values are None, Success, Failure and Completion.

ForceExecutionValue: Should we force the execution value?

IsolationLevel: The transaction isolation level of the task.

IsStoredProcedure: Certain optimisations are made by the task if it knows that the query is a stored procedure invocation. The docs say this will always be false unless the connection is an ADO connection.

LocaleID: Gets or sets the LocaleID of the container.

LoggingMode: Should we log for this container, and what settings should we use? The value choices are UseParentSetting, Enabled and Disabled.

MaximumErrorCount: How many times can the task fail before we call it a day?

Name: Very simply, the name of the task.

ResultSetType: How do you want the results of your query returned? The choices are ResultSetType_None, ResultSetType_SingleRow, ResultSetType_Rowset and ResultSetType_XML.

SqlStatementSource: Your query/SQL statement.

SqlStatementSourceType: The method of specifying the query. Your choices here are DirectInput, FileConnection and Variable.

TimeOut: How long should the task wait to receive results?

TransactionOption: How should the task handle being asked to join a transaction?

Usage Examples
As we move through the examples we will only cover in them what we think you must know and what we think you should see. This means that some of the more elementary steps, like setting up variables, will be covered in the early examples but skipped and simply referred to in later ones. All these examples use the AdventureWorks database that comes with SQL Server 2005.

Returning a Single Value, Passing in Two Input Parameters

So the first thing we are going to do is add some variables to our package. The graphic below shows us those variables having been defined. Here the CountOfEmployees variable will be used as the output from the query and EndDate and StartDate will be used as input parameters. As you can see all these variables have been scoped to the package. Scoping allows us to have domains for variables. Each container has a scope and remember a package is a container as well. Variable values of the parent container can be seen in child containers but cannot be passed back up to the parent from a child.

Our following graphic has had a number of changes made. The first of those changes is that we have created an OLE DB connection manager and assigned it to this task (ExecuteSQL Task Connection). The next thing is we have made sure that the SQLSourceType property is set to Direct Input, as we will be writing in our statement ourselves. We have also specified that only a single row will be returned from this query. The statement we typed in was:

SELECT COUNT(*) AS CountOfEmployees FROM HumanResources.Employee WHERE (HireDate BETWEEN ? AND ?)


Moving on now to the Parameter Mapping tab, this is where we are going to tell the task about our input parameters. We add them to the window, specifying their direction and data type. A quick word here about the structure of the variable name. As you can see, SSIS has preceded the variable with the word User. This is a default namespace for variables, but you can create your own. When defining your variables, if you look at the variables window title bar you will see some icons. If you hover over the last one on the right you will see it says "Choose Variable Columns". If you click the button you will see a list of checkbox options, and one of them is Namespace. After checking this you will see where you can define your own namespace.
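The mapping screenshot is not reproduced here, but for the COUNT query above (an OLE DB connection, where parameters are named by ordinal) the entries would be along these lines; the DATE data type shown is an assumption based on the HireDate column:

```sql
-- Parameter Mapping tab (OLE DB connection, parameters named by ordinal):
-- Variable Name     Direction   Data Type   Parameter Name
-- User::StartDate   Input       DATE        0
-- User::EndDate     Input       DATE        1
```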


The next tab, result set, is where we need to get back the value(s) returned from our statement and assign to a variable which in our case is CountOfEmployees so we can use it later perhaps. Because we are only returning a single value then if you remember from earlier we are allowed to assign a name to the resultset but it must be the name of the column (or alias) from the query.


A really cool feature of Business Intelligence Development Studio being hosted by Visual Studio is that we get breakpoint support for free. In our package we set a breakpoint so we can pause the package and look in a watch window at the variable values as they appear to our task, and at what the variable value of our result set is after the task has done the assignment. Here's that window now.

As you can see, the count of employees that matched the date range was 2.

Returning a Rowset
In this example we are going to return a result set back to a variable after the task has executed, rather than just a single value from a single row. There are no input parameters required, so the variables window is nice and straightforward: one variable of type Object.


Here is the statement that will form the source for our result set.

SELECT p.ProductNumber, pc.Name AS ProductCategoryName
FROM Production.ProductCategory pc
JOIN Production.ProductSubcategory psc ON pc.ProductCategoryID = psc.ProductCategoryID
JOIN Production.Product p ON psc.ProductSubcategoryID = p.ProductSubcategoryID
We need to make sure that we have selected Full result set as the ResultSet as shown below on the task's General tab.

Because there are no input parameters we can skip the Parameter Mapping tab and move straight to the Result Set tab. Here we need to add our variable defined earlier and map it to the Result Name of 0 (remember, we covered this earlier).


Once we run the task we can again set a breakpoint and have a look at the values coming back from the task. In the following graphic you can see the result set returned to us as a COM object. We can do some pretty interesting things with this COM object and in later articles that is exactly what we shall be doing.

Return Values, Input/Output Parameters and Returning a Rowset from a Stored Procedure
This example is pretty much going to give us a taste of everything. We have already covered in the previous example how to specify the ResultSet to be a Full result set so we will not cover it again here. For this example we are going to need 4 variables. One for the return value, one for the input parameter, one for the output parameter and one for the result set. Here is the statement we want to execute. Note how much cleaner it is than if you wanted to do it using the current version of DTS.
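The graphic with the statement is not reproduced here; the procedure name and parameters below are hypothetical, but a call exercising a return value, an input parameter, an output parameter and a rowset would, with an OLE DB connection, take this shape:

```sql
-- Hypothetical procedure; ? markers because this is an OLE DB connection.
-- Parameter Mapping tab: 0 = ReturnValue, 1 = Input, 2 = Output.
-- Any rowset the procedure SELECTs is caught on the Result Set tab (Result Name 0).
EXEC ? = dbo.uspGetEmployeesByDate ?, ? OUTPUT
```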


In the Parameter Mapping tab we are going to Add our variables and specify their direction and datatypes.


In the Result Set tab we can now map our final variable to the rowset returned from the stored procedure.


It really is as simple as that and we were amazed at how much easier it is than in DTS 2000.

Passing in the SQL Statement from a Variable

SSIS, as we have mentioned, is hugely more flexible than its predecessor, and one of the things you will notice when moving around the tasks and the adapters is that a lot of them accept a variable as an input for something they need. The Execute SQL Task is no different: it will allow us to pass in a string variable as the SQL statement. This variable value could have been set earlier on from inside the package, or it could have been populated from outside using a configuration. The ResultSet property is set to single row, and we'll show you why in a second when we look at the variables. Note also the SQLSourceType property. Here's the General tab again.


Looking at the variables we have in this package, you can see we have only two: one for the return value from the statement, and one which is obviously for the statement itself.
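The variables window is only shown as a graphic, so the statement variable's exact value is not visible; given the result described below (the row count of Person.Contact), it was presumably something like this (the alias is illustrative):

```sql
-- Hypothetical value of the string variable holding the statement;
-- the single-row result maps to the other variable on the Result Set tab.
SELECT COUNT(*) AS CountOfContacts FROM Person.Contact
```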

Again we need to map the Result Name to our variable, and this time it can be a named Result Name (the column name or alias returned by the query) rather than 0.


The expected result in our variable should be the number of rows in the Person.Contact table, and if we look in the watch window we see that it is.

Passing in the SQL Statement from a File

The final example we are going to show is a really interesting one. We are going to pass in the SQL statement to the task by using a file connection manager; the file itself contains the statement to run. The first thing we need to do is create our file connection manager to point to our file. Click in the connections tray at the bottom of the designer, right-click and choose "New File Connection".


As you can see in the graphic below we have chosen to use an existing file and have passed in the name as well. Have a look around at the other "Usage Type" values available whilst you are here.
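The contents of the file are not shown in the article; any single valid statement will do, for example a file named SQLStatement.sql (hypothetical name) containing:

```sql
-- Entire contents of the file pointed to by the file connection manager;
-- the task reads and executes this text at runtime.
SELECT COUNT(*) AS CountOfEmployees FROM HumanResources.Employee
```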

Having set that up we can now see in the connection manager tray our file connection manager sitting alongside our OLE-DB connection we have been using for the rest of these examples.

Now we can go back to the familiar General Tab to set up how the task will accept our file connection as the source.


All the other properties in this task are set up exactly as we have been doing for other examples depending on the options chosen so we will not cover them again here.

We hope you will agree that the Execute SQL Task has changed considerably in this release from its DTS predecessor. It has a lot of options available but once you have configured it a few times you get to learn what needs to go where. We hope you have found this article useful.






10/31/2008 6:03:05 PM # Steve:
These are great examples, but do you have articles that take the next step with these same examples? For instance, once I have a result set back from the package, what can I do with it? Thanks, Steve

10/31/2008 9:13:07 PM # Allan Mitchell:
Hi Steve. The resultset can, as you know, take a number of forms, but will also be read into a variable. Once in the variable you can do with it as you will. If you use this transform then it is very possible/probable that you will want to seed it with the MAX(Existing Value), which you could retrieve using an ExecuteSQL task and read into a variable. Using a property expression you would assign this variable to the Seed property of the transform (we expose this property through the parent Data Flow task). Another use would be to read a resultset into a variable, of type object, and use that variable in a ForEachLoop enumerator. Hope this makes sense. Allan Mitchell

11/5/2008 11:27:13 AM # Jorge:
You have saved many hours of my time!! Thanks.

PedroCGD:
Hi Allan! How are you? I hope you are good! And how can I get the message returned from a SQL statement, like: "Msg 208, Level 16, State 1, Line 1. Invalid object name 't_data_notificacaso'."


OR "(13 row(s) affected)"? PedroCGD

11/5/2008 6:17:10 PM # Allan Mitchell:
Pedro, for your first request, have a look in the Output Window and it will give you a big hint as to where to find this (Event Handler): Error: 0xC002F210 at Execute SQL Task, Execute SQL Task: Executing the query "select * from INoThere" failed with the following error: "Invalid object name 'INoThere'". For your second request, imagine the statement UPDATE dbo.dimCurrency SET CurrencyName = CurrencyName. Now add SELECT @@ROWCOUNT AS rtn, set your resultset to Single Row and read it into an Int32 variable. Allan Mitchell

11/7/2008 7:37:18 PM # PedroCGD:
Thanks Allan... What I need is not the error message or the count of rows... what I need is exactly the message inside SQL Server... the first line... Msg 208, Level 16, State 1, Line 1. Regards! See you at the next SQLBits! Pedro

11/19/2008 7:18:16 AM # Sriwantha Sri Aravinda Attanayake:
Great article, keep up the work! Contains all the information. Thanks buddy. By the way, your country list does not have my country, Sri Lanka. hic hic

12/6/2008 3:07:34 PM # Manikandan:
Hi, this is really very helpful. What I want to know is: in the case of a Script Task following the Execute SQL Task, if the Execute SQL Task is returning a full result set, how can I use it in the Script Task?


Let me know as soon as possible. Thanks, Manikandan

1/1/2009 9:23:55 AM # Sanjeev:
Can we have more than one result set with an XML result set?

1/19/2009 3:19:23 PM # Brian Cooper:
How do you pass a variable into a SQL Task that executes inline SQL within the task?

1/22/2009 1:10:39 PM # Allan Mitchell:
Personally, Brian, I would be inclined to create a variable that holds the expression (EvaluateAsExpression) of the statement you want to run. In the ExecSQL task you use that instead of a statement.

2/7/2009 7:40:21 PM # Mani V S:
Excellent article... Great job!!!! Thanks, Mani V S

2/12/2009 2:25:16 AM # Bob Fidelman:
Hello. What I would like to do is handle a full resultset (created by the Execute SQL Task in the control flow) in a data flow, which has more sophisticated transformation items and variable definitions. Currently there is no way to 'bridge' the gap from control flow to data flow without writing the resultset to disk. What I would like to see is the equivalent of a 'dataset' source in a dataflow. I realize that saving the data as an 'object' loses any metadata, so each column would have to be manually defined, if possible. Also, using a resultset as a datasource would allow me to use the 'decimal' variable value, which is not defined in the control flow foreach control. Any suggestions would be appreciated. Bob F.

2/18/2009 9:57:37 AM # DesC:


Hi, great article, but how did you set the watch variables? I can only get the watch window to appear (there are 4 of them) but I cannot set anything. Neither can I see anything in the Locals window. DesC

2/18/2009 10:03:33 AM # DesC:
Also, if instead of CountOfEmployees in the above example I create a datetime variable and try to populate it with (for simplicity) a statement like 'SELECT GETDATE() 'MyDate'', it does not work. I think the datetime format must be somehow intrinsically wrong or incompatible between SQL and SSIS, but I'm not sure how.

3/17/2009 4:57:51 AM # SDangol:
Great article. For some reason Returning a Rowset from Stored Procedures didn't work.

3/18/2009 11:15:05 PM # Mamata:
How can I change the value of the input variable to the stored procedure at run time without opening the package? The input variable I need to pass is a varchar field and I'm not sure how to set this in the expression. Can you help?

3/26/2009 8:14:44 PM # Jerry:
Thank you for the fine discussion of the workings of this container. It was very helpful.

4/7/2009 2:12:06 AM # diedie:
Amazing, you covered it all. I was in the middle of reading and couldn't stand not giving you my appreciation.

4/17/2009 1:08:33 PM # Laurent:
Thanks - Merci

5/22/2009 8:22:50 AM #
Good work, really helpful, thanks a million


Jeswanth

5/28/2009 8:18:40 PM # Puran:
Great!!! Thanks, made my life very easy... Though I am still facing an issue. I run the query to get a single row; while executing the SQL I have added breakpoints and a watch. I see it returns the right value, but just before the task turns green it initializes to -1. Anybody any idea???

6/10/2009 1:27:56 PM # Rashmi Patankar:
I found the article quite useful, however... I couldn't see any output in the Watch window. Probably I am missing something on the breakpoint part. Can you please brief me on it?

6/11/2009 4:57:00 PM # Pat:
This example was exactly what I needed. Well done and clear. I'll use this site a lot in the next few months, I think. Thanks

6/15/2009 6:26:43 PM # Hemant patel:
Just what I was looking for. Thanks

6/16/2009 8:47:42 PM # Herbert Rogelj:
Hi, great article - but my "execute sql task" doesn't work. Do you have an idea? I just want the query to return -1 or null, true or false. My query: SELECT (CASE WHEN do_export = 0 AND data_controlling = 1 THEN -1 ELSE 0 END) AS IMP FROM Tablexxx WHERE (LEFT(userid, 2) = ?). In the Result Set I mapped IMP to the variable name; is that correct? IMP always shows -1. I would be so glad if you or anyone could help me. Thanks for the answer!! Herb

6/17/2009 7:50:16 AM # Allan Mitchell:
Herbert


Does it return a resultset in Management Studio? In your example you will need a parameter in "Parameter Mapping" for the "?" in your statement. You will need to define the "Resultset" as "Full Result set". You will need a variable of type "Object" to hold the resultset. On the Resultset tab you map the "Result Name" of "0" (zero) to the Object variable. Allan Mitchell

7/14/2009 2:54:46 PM # Amit goyal:
Hi, I just wanted to assign the maximum of one of my columns to a variable. I did the following: created a variable activity_max; wrote the SQL task query: select max(id) as id from dbo.activities; nothing in parameter mapping; Result set: Result Name = ID, Variable Name = User::activity_Max; my result set = single row. I ran the query, which runs fine, but the value of my variable is still zero?? Help please!

7/21/2009 9:44:22 PM # Marianne Daye:
I had to pass parameters to a stored procedure and an update statement within a ForEach Loop Container. This article helped me figure that out. Thanks!

9/18/2009 11:11:17 PM # c_kenth:
How do I capture the result set from the following SQL task? I'm running multiple statements, but the Execute SQL Task wants to grab the result set from the first SQL statement, not the last. create table #Temp ( value int ); insert into #Temp select top 1 col from Permanent; select value from #Temp;

10/22/2009 8:05:46 AM # Fabien Lonardi:
Very good document. But how do I call an Oracle stored procedure? When I put "exec my_proc" in "sql statement", it is not working. Do you have an answer or URL links for "ssis-oracle"?


12/8/2009 9:38:18 PM # Adela:
Excellent article! Unfortunately, although I followed the instructions to the letter, I am receiving the following error: [Execute SQL Task] Error: Executing the query "EXEC usp_helloworld ?,? output" failed with the following error: "Value does not fall within the expected range.". Possible failure reasons: Problems with the query, "ResultSet" property not set correctly, parameters not set correctly, or connection not established correctly. I have created a dummy stored proc: create table mytest (stmt as varchar(50), stat as int); CREATE procedure dbo.usp_helloworld @proj_task_key as bigint, @proc_status as int output as begin declare @myproj as bigint; set @myproj = @proj_task_key; set @proc_status = 1; insert into mytest values('helloworld' + cast(isnull(@myproj,'empty') as varchar), @proc_status); end; go. Then, in SSIS, I configured an Execute SQL Task as follows: ResultSet: None; ConnectionType: OLE DB; Connection: to the db with usp_helloworld and the table needed for the stored proc; SQLSourceType: Direct Input; SQLStatement: EXEC usp_helloworld ?,? output. Parameters (Var Name, Direction, Datatype, Parm Name, Parm Size): User::proj_task_key, Input, LARGE_INTEGER, @proj_task_key, 0; User::proc_status, Output, LONG, @proc_status, 1. As variables, User::proj_task_key is declared as Int64, value 0, and User::proc_stat is Int32, value 0. I do not know where I have gone wrong. Can you help? Adela

12/8/2009 10:15:04 PM # Adela:
I figured it out! My version of SSIS has Parameter Name and Parameter Size. I was putting name strings in the Parameter Name column when I should have been putting the parameter numbers (0 & 1)! Yay!

12/10/2009 6:29:30 PM # Scott C:
I made the same mistake as Adela, putting names in the Parameter Names column instead of 0, 1, 2... My next mistake was putting anything in Parameters when what I really wanted was the user variables in the Result Set tab, not the Parameter tab. Once I cleared out Parameters, it ran like a charm and I used my variables to build dynamic flat file names.

12/15/2009 2:06:51 AM #


Nice one!! It helped me a lot. Now I am thinking of redesigning my data conversion SSIS packages. Thanks again for the efforts. Neel

12/28/2009 9:43:24 PM # Nits:
When I assign my result set to a user variable (Int32) which was defined at the package level, the value of the variable became {-1} once the execution was done. I did see the correct value in the same variable during execution when it stopped at a breakpoint. Am I missing something here??

1/6/2010 5:09:28 PM # vidyasagar:
Hi Allan Mitchell, this article is great and helped a lot. I have a small doubt: I am now using the ResultSet as FullResultSet, where I am able to get multiple rows into an object variable as a COM object, but how do I read the data from the COM object?

1/13/2010 11:16:55 PM # Christian:
Gracias, thank you, the tutorial is very cool and easy. Saludos.

1/20/2010 11:01:16 AM # Darren Green:
vidyasagar, the output of using FullResultSet is an ADO Recordset object. Use the ADO library to access it, or try something like the ForEach Loop which has an ADO enumerator built in. An example of this is shown in the post.

1/21/2010 12:13:50 PM # Bruno Pimenta:
Hello Allan, I have a question about the Execute SQL Task regarding the TIMEOUT. My objective is to unpack a data file from a receive data folder, read its content into a DB and then compact the result to an archive folder (in case of error a message is set). So far I have been able to do all the steps; if you need details feel free to ask. The problem starts when the file becomes large (I'm using one with 2 GB of size) because the SSIS Execute task does not wait for the full unpacking and sets the error task. I think this is a timeout problem and I have set the value to 3600 (being in seconds this would mean an hour, and it does not take that long), but without any success; it keeps failing. If you can help me, or if you have had a similar problem, please respond. Thanks in advance,


Bruno Pimenta

1/21/2010 3:05:31 PM # Javier Mora:
Hello everybody... I have a problem when I execute this query in an Execute SQL Task: [Execute SQL Task] Error: Executing the query "select * from SRVCP.SerivaCapaDatos.dbo.CDH_ADMINISTRACIONPORFORMAPAGO" failed with the following error: "Error no especificado (Excepción de HRESULT: 0x80004005 (E_FAIL))". Possible failure reasons: Problems with the query, "ResultSet" property not set correctly, parameters not set correctly, or connection not established correctly. I used a linked server to connect to the other server (linked server name: SRVCP). When I execute the statement (select * from SRVCP.SerivaCapaDatos.dbo.CDH_ADMINISTRACIONPORFORMAPAGO) in SQL Query Analyzer it works great... Thanks for your time...

1/28/2010 8:53:07 AM # Darren Green:
Javier, the error you quote is the generic one the task raises. I would expect there to be another error message that gives more detail. It is common for a single problem to be reported across several messages in SSIS. The source will of course be the task. Start by simplifying the issue: don't use any input or output variables or any result set options, just try to get the SQL to execute. Since you mentioned linked servers, security is an obvious candidate. Think about what is different in the manual query test compared to when using SSIS. Is the location or user different, for example?

2/21/2010 6:57:15 AM # tabrez:
In the package I have the following steps (assuming that the Excel sheet is already in location): 1. a SQL task in which I am using connection type Excel and SQLStatement as Drop table 'Asd' (Asd refers to a table in the Excel sheet); 2. a SQL task in which I am creating a table, connection type Excel and SQLStatement as Create table 'Asd' (columns...).
My problem is: if for some reason the second step fails, then when I re-run the package I get an error at the first step that the Asd table does not exist (because when the second step failed the Excel sheet was empty and did not contain the table, so when I re-run, the package throws an error that the table does not exist). Is there any way to make the first step (drop table 'Asd') succeed even if the table does not exist in the Excel sheet? tabrez

Looping over files with the Foreach Loop

In SQL Server 2000 Data Transformation Services (DTS) it was a bit of a hack to be able to loop over files of a given type in a certain directory and import them into your destination. It involved a lot of "glue code" and a certain amount of fooling the package into going back to a previous task because it still had work to do. Well, thankfully, in SQL Server 2005 Integration Services (SSIS) that has all changed, and this article is going to show you how.


The image below shows us how incredibly simple and clean the package will look when finished. There are some things worth pointing out at this stage. In the centre of the screen we see the Foreach Enumerator container and inside that we see the Data Flow task which houses the pipeline. At the bottom in the Connection Managers tray we see our Flat File Connection Manager (My Source File) and our OLEDB Connection Manager (My Destination). The Flat File Connection Manager is the one in which we are most interested for this article. Both of these managers are used in the Data Flow behind the DataFlow task. We will not be detailing the pipeline behind the DataFlow task in this article but it consists of a Flat File Source moving data to an OLEDB destination.

Let's begin then by opening up the Foreach enumerator and moving straight to the Collection node in the tree on our left. Below we see our information already populated.


What we see on the screen is pretty self-explanatory, but let's go through it anyway. We have chosen to enumerate over a file collection, and this is indicated by the value next to the Enumerator property at the top. We need to specify a folder over which to loop and which type of files to look for, and we do that in the centre of the form. At the bottom of the form we are given three options as to what is returned when the loop finds a file in the folder: we can return the whole filename including extension and path, the name and extension, or simply the name of the file found. Because our connection manager is going to need to know exactly where to find the file and its name, we have chosen the first option. The final thing we see on this screen is the ability to traverse subfolders; in our example we do not need to do this. When the Foreach enumerator finds a file it needs to tell us about what it found, and it does this by populating a variable. Click on to the Variable Mappings node now. Our package currently has no variables able to accept the name of the file, so we are going to create a new one.


The next screen we see allows us to set the values of the variable. As we can see variables can be scoped in SSIS to certain executables in the package or to the package itself.


Here is how our variable looks with all its properties set.

Because the enumerator will only return us at most one value on every iteration we map our variable to an index of 0.


We have now configured everything as far as the Foreach enumerator is concerned. We now need to set the properties of the Flat File Connection Manager. Highlight the manager in the tray at the bottom, right-click and choose Properties.


The important part of this dialog is highlighted and that is "Expressions". Click on the ellipses and we will be taken through to the next screen where we can start to create the expression. In the screen that follows, from the Property column drop the list down and choose ConnectionString


Now hit the ellipses button to the right and we are taken through to the expression editor where we will build the actual expression itself.


Our requirements are pretty simple here: all we want to do is retrieve the variable we defined earlier. To do this, simply drag the variable from the list at the top to the expression text box at the bottom. Property expressions can become very complex and we shall no doubt be seeing more of them in future articles. After you have chosen the variable click OK.


We now see that our expression is mapped to our ConnectionString property. Click OK. Finally, we can see our file manager's ConnectionString property being mapped to an expression in the properties of the manager.
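The expression itself is only visible in the screenshots; assuming the variable created earlier was named FileName (the name here is an assumption), the entire expression assigned to the ConnectionString property is simply:

```
@[User::FileName]
```

At runtime the enumerator writes the found file's full path into the variable, and the property expression pushes that value into the connection manager on each iteration.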


That's all there is to it. When the enumerator finds a file matching our requirements it will set the correct property on the connection manager and this will be used by the pipeline at runtime.






kaushik saha (11/11/2008 7:31:02 AM): Excellent example, but I tried the same with SQL 2008 IS and Excel 2007; the same thing works in SQL 2005 IS and Office 2003 with an Excel file. I can see the provider difference between Office 2003 (Jet.OLEDB.4.0) and Office 2007 (Jet.OLEDB.12.0); the rest is all the same, nothing changed. Can you help me sort out why it isn't working with SQL 2008/Office 2007?

Allan Mitchell (11/11/2008 8:51:31 AM): Here is what I do: use the Microsoft Office 12.0 Access Database Engine OLE DB Provider. ServerName is the full path to your Excel file (and will be the expression you set). On the [All] tree item I set the extended properties to Excel 12.0;HDR=YES. Looping over .xlsx files in itself is not an issue. Works for me.

Crash Burke (12/4/2008 5:30:08 PM): Set DelayValidation = True on the connection manager and the data flow.

susheel (12/10/2008 8:54:56 AM): Can anyone help me out with the For Loop and Foreach Loop containers? I am unable to understand them.

STeve Tahmosh (12/17/2008 7:36:39 AM): Hi, this is a fantastic example. I am new to MS DB tools (a long-time Oracle person), and this type of capability is excellent. I have a developer working under my direction who has been chartered to do just what is described above. I extended the requirement so that the value for the "My Source File Connection" ConnectionString property needs to come from a value in a database table. I would think a simple Transact-SQL statement would do the trick. This would be so that we can either: 1) transparently move the folder to another location, 2) have multiple environments on the same server, or 3) use the same code, unchanged (except for data), in dev and prod. My developer said that this would require some ".NET code" and is not a simple SQL statement; the challenge is in getting the value selected by SQL into the property. My question: is there a technique on this site which demonstrates how to make a property such as ConnectionString populated from a SQL statement? I just learned of this site through the Wrox book "Professional SQL Server 2005 Integration Services". Thanks, STeve

Allan Mitchell (12/17/2008 7:55:37 PM): OK, so you read the value into a variable (easy). You now need to look at either Property Expressions or Configurations. You can set the property on the connection manager from there.

STeve Tahmosh (12/18/2008 4:55:13 AM): Hi Allan, thanks, but you are speaking to someone who knows SSIS, and I am not that person. I will make another attempt to ask my question. I can figure out how to query a table with a T-SQL script; that is not why I wrote this post. I thought the original post was extremely helpful. STeve

STeve Tahmosh (12/18/2008 6:05:25 AM): Hi Allan, you must be the same Allan Mitchell that co-authored the Wrox book "Professional SQL Server 2005 Integration Services". I am on Chapter 1, and simultaneously on Chapter 4 of the Wrox book on SSRS. Basically, I would be interested in an overall view of where I should focus in order to accomplish my goal. Given the large number of features in the tools, including scripting languages that one might assume familiarity with, how does one architect a system for flexibility (defined below)? Chapter 1 says: "Variables allow you to dynamically configure a package at runtime. Without variables, each time you wanted to deploy a package from development to production, you'd have to open the package and change all the hard-coded... Now with variables you can just change the variables at deployment time...." My goal is not to even have to change the variables at deployment time, but to write a package that queries database data at run time, and therefore would work appropriately in dev or production. For example, in one environment I have source files in c:\folder1, in a second environment in c:\folder2, in production in d:\folder1, etc. My original question was (perhaps better stated here): what overall architectural components do I need (using your excellent tutorial on looping over files) to accomplish this goal? Do I need an ActiveX script (is that even the proper tool in SSIS any more) which invokes an Execute SQL Task to read the database variables and assign them to SSIS properties, and are those properties set with some kind of scripting tool? In any case, if this question is too basic for this forum, I will continue along with Chapter 1 of Professional SQL Server 2005 Integration Services (I'm almost done!), and may be able to answer this myself in no time. I would welcome a pointer to a chapter in the book to focus on (I am not looking to be spoon-fed the code). Keep up the good work. STeve

STeve Tahmosh (12/18/2008 7:41:10 AM): Hey, I'm continuing to plug along; it seems like the approach is in Chapter 3 of the Wrox Professional SQL Server 2005 Integration Services book. Use an Execute SQL task and store the result set into variables of (for a novice) package scope. Use a Script task to set the value of the ConnectionString property of (using your example) the "My Source File" connection to the value of the variable containing the folder where the files would live. Am I getting warmer? If so, it seems I've got one thing remaining: dynamically setting the ConnectionString property. Would this be done in the script? Is there a script command to set, for example, the ConnectionString property to a value stored in a variable?

Ray (1/8/2009 8:58:19 PM): Steve, you are looking at combining this tutorial with the Shredding a Recordset tutorial.

lekfir (1/9/2009 8:28:13 AM): In SQL 2008, the second image (Foreach Loop Editor) looks different. In order to see it, you have to select another enumerator type from the list (e.g. Foreach Item Enumerator) and then select the Foreach File Enumerator again. That will display the correct Enumerator Configuration field as shown in this tutorial. Good luck!

Kenady (1/13/2009 7:37:12 PM): Hi there, I did the above example and it works perfectly with text files, but I actually need to loop through tables in multiple Access files. I repeated the same steps from above with a connection to an Access file instead of a text file and end up with this error: SSIS Error Code DTS_E_CANNOTACQUIRECONNECTIONFROMCONNECTIONMANAGER. The error comes up as soon as I set the ConnectionString property expression in the OLE DB connection manager. The whole package runs smoothly when I set up multiple connections to all the Access files, but I cannot seem to get it to loop... any suggestions? Thanks in advance for your help!

JesseLewis (2/16/2009 10:02:34 PM): Kenady, I am having the same exact problem as you: trying to do the same Foreach file loop, but with MS Access databases. Did you ever figure this out? As soon as I change the ConnectionString in the property expression I get the same SSIS error code, DTS_E_CANNOTACQUIRECONNECTIONFROMCONNECTIONMANAGER. I have tried it multiple times and have had no luck. I hope you or someone else has a solution to this. Jesse

Kathleen (2/24/2009 5:11:35 PM): Hi! Thanks for your advice above. I have a related issue. I have four Foreach loops in series which all successfully loop through multiple Excel files of four different layouts, one loop for each layout. My problem comes in when I don't have a file for all four loops. The first, which is the only one that does not use the first row for column headers, will successfully "skip" the loop and move on without error. The remaining three give me a "Package Validation Error ... code 0x80040E37 ... Opening rowset for 'Sheet1$' failed. Check that the object exists in the database." All four are set to DelayValidation = True. All four delete the input file if the remaining steps within the loop run successfully. Is there a workaround for this? One of my customer's main requirements is to be able to run any combination of the four file types without all four necessarily being present. Many thanks!

Kathleen (2/24/2009 7:25:13 PM): Never mind, we found it! I needed to change the DelayValidation property of the Import task to True, as well as the one on the connection manager. Thanks anyway!

praxy (3/5/2009 7:32:04 AM): I agree with this, which takes all the files' data and loads it into a single table. But what if the scenario is like this: I have 10 CSV files in a folder, and I need to load the data of the 10 CSVs into different tables. What I have tried is a Foreach Loop container to read all 10 CSV files, but it loads all the data of the 10 CSV files into a single table.

Rahsaan Pringle (3/13/2009 5:43:12 PM): Amazingly enough, I am unable to get this type of loop working. I am familiar with, yet new to, SSIS. I have been all around the web looking for either tutorials or working code that shows me how to do what I need to do. My basic task is to loop through the files in a directory (CSV or XLS), pull the data (same structure in each file), and export it somewhere (CSV or XLS). If anyone out there could take the time to look at it, I would love to send you an example of my project, or some screenshots. The project is very small, and it only attempts to do what has been described here.

Binoj (4/22/2009 3:55:55 PM): I have a requirement where I need to loop through all the .sql files in a particular order and execute them. So far I am successful in looping through a folder and executing all the .sql files, but how can I execute them one by one in a preferred order? Any ideas? I was thinking of renaming the files with numeric prefixes, e.g. 1_filename, 2_filename. Now the issue is how do I loop in this order in SSIS? I'm using a For Loop container.

Charkra (4/23/2009 1:52:38 PM): This example definitely works for me. I do, however, have a question about the variable that I created. I want to store this filename in the database to indicate that the file has been processed, but whenever I reference the variable, it attempts to define it as Unicode string [DT_WSTR] with Length = 0. Why is that? Being length 0, when I try to reference it in my Derived Column it is blank. As it is a variable, how do I get this into my columns? Any ideas or pointers will be greatly appreciated. Thanks.

rakesh (4/28/2009 7:38:54 PM): Hello guys, can anyone let me know 1) how to enable Unicode in a SQL database, and 2) how to change OLE DB connections to connections in config files directly in the script?

Alvin (4/30/2009 4:26:19 PM): Good afternoon everyone. I'm looking to get some help on a particular Foreach Loop container that I'm attempting to create. What I need is to loop through a collection of .txt files, but after each enumeration I want to attach a number to the data within the file. An example would be: File1.txt (all records get a 1 appended to the end of each row), File2.txt (all records get a 2 appended to the end of each row), File3.txt (all records get a 3 appended to the end of each row), etc.

MC (5/11/2009 3:29:46 AM): This example was great! Thank you so much.

Richard Cranston (5/12/2009 4:21:44 PM): This was a great example. I was trying to use the File Watcher, which had to be installed, but this was easier to use and it worked the first time. Thanks.

gj (5/19/2009 9:15:03 AM): This is very helpful. We have a requirement to do something similar, except I was considering using raw files. How would the raw file source connection need to be set up to loop through the files?

Darren Green (5/19/2009 1:36:02 PM): GJ, the Raw File Source does not use a connection manager. The source wants the file name directly, and it can accept variables, so skip the section about setting a property expression. On your Raw File Source change the AccessMode property to "File name from variable", and set the FileNameVariable property to the User::FileWeJustFound variable.

gj (5/20/2009 4:47:05 AM): Thanks Darren. I'm new to SSIS and I appreciate the help. I've followed the above steps as suggested and the raw file source seems good now. The raw file source connects to an OLE DB destination, and I'd like to load each file into a different table in the Foreach loop. The file names are the same as the table names, other than the file extension. How should the OLE DB destination be set up for this?

Darren Green (5/20/2009 8:45:55 AM): The OLE DB Destination has a similar AccessMode property; select one of the "Table or view name variable" options depending on whether you want fast load or not. You need a second variable for this, the table name, which can be derived from the filename as you suggest. Use an expression on the variable to do this; there is an example expression for getting the filename minus the extension. Similar to how you can set an expression on a connection manager in the example above, you can do this on a variable, so the value of the variable actually comes from the expression. Set the EvaluateAsExpression property to true, and set the expression itself on the Expression property of your variable.

gj (5/20/2009 11:52:57 PM): I created a table variable, setting the variable's expression property to the "filename minus the extension" expression. My raw file source is set to use the FileWeJustFound variable and the OLE DB destination is set to use the table variable. When I try to run the package without a value for the FileWeJustFound variable, it errors saying the file name hasn't been properly specified. If I run with the path to one of the files as the value for the FileWeJustFound variable, it seems to make it through the first table and then fails. Does there need to be a value specified for the file variable? The error message from when I set a value for the FileWeJustFound variable is: [Raw File Source [410]] Error: The output column "colum_name" (453) is mapped to an external metadata column that does not exist. Thanks again for your help on this, Darren.

Abhinay B (6/2/2009 4:40:13 AM): Brilliant article, just what I needed. I'm also using a script to check if the file exists, so I didn't have to make many changes to that, just passed in the new variable. But one problem with this approach is that if we want to change the configuration or column length of the file, we have to remove the expression, choose a file, make the changes, and put the expression back.

James Baird-Kerr (6/4/2009 9:45:57 PM): All I can say is thank you, thank you, thank you! Your excellent example was exactly what I was looking for and it let me implement my solution easily. James.

tarun (6/18/2009 8:45:26 PM): Hi, I have a requirement of looping through x files as explained above. But in case the load of a particular file fails, the package should not stop, but rather should continue with the execution of the other files. Please suggest how I can do that. Thanks, Tarun

sps (7/15/2009 5:03:28 PM): First off, thanks for the example. It is very useful for what I need to do. Question 1: I haven't seen this mentioned explicitly; is it true that the Foreach loop will always go through the files in alphanumeric order? I have file names which have a datestamp component to them (i.e. file20090714, file20090715, etc.) and wish to process them in this order. Question 2: I have file names which have both a datestamp and a timestamp component (i.e. newfile20090715105307). I'm only supposed to receive and process one file per day. If there is more than one file in the directory then I wish to abort the processing. Any thoughts on how I could do this? Any help you can provide will be very much appreciated. Thank you.

Nacho (7/20/2009 6:21:45 PM): Very good article. Great job!

vyas (8/31/2009 8:28:48 AM): Hi, I am getting the file name inserted into a table, but not in order. I am inserting 100 files. What I need is to insert the name of the file as one record in the table for each file I am inserting. Can anybody help me?

Mike (10/5/2009 8:32:19 AM): Thanks man! That was really helpful!

Rostand (11/9/2009 11:38:07 AM): Thanks to Allan Mitchell and (U.S.) Kathleen's accurate comment on setting DelayValidation on the Data Flow task object and the source object (inside the task object). I was finally able to automatically parse submitted Excel files! Before, I wasted hours of my life setting them manually and therefore redoing things when I made mistakes. Thanks a lot!

TennesseePaul (11/25/2009 7:04:29 PM): Hi, this post has been very helpful, thank you. Unfortunately I still seem to be running into an issue. I have a process which loops through files in a given folder. The file name is read and validated to determine the appropriate Data Flow task to execute (as there are currently four types of files requiring unique data flows). The process runs great for the first file. However, it appears that the Foreach Loop container is not updating the package variable (in relation to this post's example, the variable called "FileWeJustFound") with the next file name. I have delayed all validation on the connections and the data flow tasks with no success. Any ideas?

TennesseePaul (11/30/2009 3:57:16 PM): Found it, just needed some turkey. The precedence constraint was set to "OR" when it should have been "AND" for Constraint and Expression.

John Fuhrman (12/10/2009 7:14:09 PM): How can I do the same thing but with MDB files, and keep from importing records that already exist in the destination tables?

Deepak (12/27/2009 11:19:06 AM): Hi, I need to read the files in the loop in ascending order of the time the files were created. Is there a way to do it?

ColinR (1/22/2010 3:39:11 PM): This example can be further extended with the Multiple Flat Files connection (right-click the Connections window) to process multiple input files in parallel! Create a multiple flat file source, set the flat file data source to the multi-path connection, then copy and paste your data flow to create a second instance of it. Run the package and you should see it processing both data flows! (You may need to set concurrent threads etc. on the package.)

Ronald Bijlhouwer (1/26/2010 3:09:54 PM): Hi, interesting comment about the Multiple Flat Files connection. However, I think a disadvantage of this is the missing possibility of skipping to the next file when an error occurs, like you can when using a Foreach Loop container. Or am I mistaken?

Koen Verbeeck (2/11/2010 1:55:45 PM): Hi, great article. Is it possible in some way to go to the next iteration of the loop (like the 'continue' keyword in C#)? For example: I loop over some files and process their data in the data flow. In the data flow, I have a component that checks the validity of the data. If errors are found, the file is skipped and the Foreach loop goes to the next file.

Giuseppe (3/12/2010 3:54:55 PM): This was very helpful. What connection string is needed for SSIS to SQL Server 2000? I created a source OLE DB connection to SQL Server 2000, then have a Foreach loop changing the connection. I was using: Provider=sqloledb;Data Source=SQLServerName;Integrated Security=SSPI;Initial Catalog=master. Then I tried: Data Source=SQLServerName;Initial Catalog=master;Provider=SQLNCLI10.1;Integrated Security=SSPI;Auto Translate=False;


Backup or transfer SSIS Packages

How can you back up your SSIS packages? I've been asked several times, and the answer is: it depends.

Where do you store your packages?

SSIS Package Store

The SSIS package store is just a folder on disk, so regular file system backups should suffice, or you can back up that folder specifically. By default it is %ProgramFiles%\Microsoft SQL Server\90\DTS\Packages. It is possible that multiple folders are used, or that the default has been changed. This can be explored further by reviewing the SSIS service configuration file %ProgramFiles%\Microsoft SQL Server\90\DTS\Binn\MsDtsSrvr.ini.xml. For more information see Configuring the Integration Services Service. Restoration will depend on the capabilities of your file backup method and software.

SQL Server (MSDB)

The SQL Server store uses a table in the msdb database. For SQL Server 2005 this is dbo.sysdtspackages90, and for SQL Server 2008 it is dbo.sysssispackages. No extra work is required, as msdb should already be included in your regular database backup routine. Regular backup and restore options apply, but bear in mind that msdb is a system database. Knowing the tables means we can manipulate the rows of data directly, which offers some useful options such as moving rows via T-SQL, or via other data access technologies including SSIS itself.

File System

This is perhaps the most common storage location used, and with good reason. It is easy to use and fits well with the code-project style of development we have with SSIS, compared to traditional SQL Server object deployment and management. SSIS certainly fits better into this model, and we can use regular file system backups again.
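As a sketch of the "moving rows via T-SQL" idea above, the following query lists the packages held in the MSDB store together with their folders. It assumes the SQL Server 2008 object names; on 2005, substitute dbo.sysdtspackages90 and dbo.sysdtspackagefolders90.

```sql
-- List packages in the msdb store with their folder and version
-- (SQL Server 2008 table names; 2005 uses the *90 equivalents).
SELECT f.foldername,
       p.[name],
       p.[description],
       p.vermajor, p.verminor, p.verbuild
FROM msdb.dbo.sysssispackages AS p
JOIN msdb.dbo.sysssispackagefolders AS f
     ON p.folderid = f.folderid
ORDER BY f.foldername, p.[name];
```

The packagedata column in the same table holds the package XML itself, which is what makes direct row-level copies between servers possible.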

SSIS Backup Tool

For those of you who remember DTS, I wrote a rather handy tool called DTSBackup. Whilst it was widely used by myself and many others, you could get by quite happily without it; it gave people comfort, and more importantly it gave them more control and certainly faster, more granular recovery options. With SSIS there was a major change in the development process, led by the new tools, towards a code-project style of development. This means that your primary backup should be your source code repository, along with your release management process. The change in development paradigm between DTS and SSIS meant I didn't see a strong need for an SSIS port of the tool. Whilst I haven't written anything as polished or accomplished for SSIS, I have put together a sample application. Although I don't see a need for it as a backup tool, the code is still a useful example of transferring packages as part of a deployment or maintenance process. Here is one of the core methods implemented in the application. The application is a simple Windows Forms application and is aimed more at getting you started than at being a polished product.
/// <summary>
/// Import all packages from a file system folder into a SQL Server (MSDB) store.
/// </summary>
/// <param name="folder">The source file system folder path.</param>
/// <param name="server">The target SQL Server name.</param>
/// <returns>The number of packages transferred.</returns>
public static int ImportToSqlServer(string folder, string server)
{
    // Validate parameters
    if (string.IsNullOrEmpty(folder))
    {
        throw new ArgumentNullException("folder");
    }

    if (string.IsNullOrEmpty(server))
    {
        throw new ArgumentNullException("server");
    }

    int counter = 0;
    Application application = new Application();

    // Get package files
    string[] files = Directory.GetFiles(folder, "*.dtsx");
    foreach (string file in files)
    {
        // Load and save package
        using (Package package = application.LoadPackage(file, null))
        {
            application.SaveToSqlServer(package, null, server, null, null);
            counter++;
        }
    }

    return counter;
}
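For the reverse direction, a hypothetical companion sketch (not taken from the sample application) could mirror the method above using the same Microsoft.SqlServer.Dts.Runtime Application API, enumerating the MSDB store with GetPackageInfos and writing each package out with SaveToXml. It only walks the root folder of the store and is presented untested, since it needs a live SQL Server instance.

```csharp
/// <summary>
/// Export all packages from the root of a SQL Server (MSDB) store
/// into a file system folder. A sketch only; a full tool would
/// recurse into sub-folders of the store.
/// </summary>
public static int ExportToFileSystem(string server, string folder)
{
    if (string.IsNullOrEmpty(server)) { throw new ArgumentNullException("server"); }
    if (string.IsNullOrEmpty(folder)) { throw new ArgumentNullException("folder"); }

    int counter = 0;
    Application application = new Application();

    // Enumerate items at the root of the MSDB store ("\\")
    foreach (PackageInfo info in application.GetPackageInfos("\\", server, null, null))
    {
        // Skip sub-folders; only export package entries
        if (info.Flags != DTSPackageInfoFlags.Package)
        {
            continue;
        }

        // Load from SQL Server and save as a .dtsx file on disk
        using (Package package = application.LoadFromSqlServer(
            "\\" + info.Name, server, null, null, null))
        {
            application.SaveToXml(Path.Combine(folder, info.Name + ".dtsx"), package, null);
            counter++;
        }
    }

    return counter;
}
```

Together with ImportToSqlServer this gives both halves of a simple transfer utility: pull packages out of MSDB to files, or push files back into MSDB.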

The simple form allows you to import and export packages from SQL Server:

Sample Code Project (8KB)