MSBI Interview Question
What is ETL?
ETL is a process that extracts the data from different source systems, then transforms the data (like applying
calculations, concatenations, etc.) and finally loads the data into the Data Warehouse system. Full form of ETL is
Extract, Transform and Load.
It's tempting to think that creating a data warehouse is simply a matter of extracting data from multiple sources
and loading it into the warehouse database. This is far from the truth: it requires a complex ETL process. The ETL
process is technically challenging and requires active input from various stakeholders, including developers,
analysts, testers and top executives.
To maintain its value as a tool for decision-makers, a data warehouse system needs to change as the
business changes. ETL is a recurring activity (daily, weekly, monthly) of a data warehouse system and needs to
be agile, automated, and well documented.
BI (Business Intelligence) is a set of processes, architectures, and technologies that convert raw data into
meaningful information that drives profitable business actions. It is a suite of software and services that
transforms data into actionable intelligence and knowledge.
BI has a direct impact on an organization's strategic, tactical and operational business decisions. BI supports
fact-based decision making using historical data rather than assumptions and gut feeling.
BI tools perform data analysis and create reports, summaries, dashboards, maps, graphs, and charts to provide
users with detailed intelligence about the nature of the business.
SSIS - How to Find The Version Of SSIS Package From Dtsx File
Scenario:
Let's say you have just started working for a company and they point you to a folder that holds SSIS packages.
You need to find out the version of these SSIS packages and schedule them on SQL Server 2008 or SQL Server
2012 according to their version.
Solution:
To find out the version of an SSIS package, we need to read the .dtsx file itself. We can open the file with
different programs such as Internet Explorer, Notepad or WordPad. The .dtsx files are XML files, and the
property we need to look for is "PackageFormatVersion".
<DTS:Property DTS:Name="PackageFormatVersion">3</DTS:Property>
<DTS:Property DTS:Name="PackageFormatVersion">6</DTS:Property>
Fig 1: SSIS Package with different versions
As Fig 1 shows, I have two SSIS packages but cannot tell from the file names whether they are SSIS 2008 or
SSIS 2012. Right-click Package.dtsx and go to Open With, then choose the program you want to use. I
opened it with Notepad. A PackageFormatVersion of 3 means the package is in SSIS 2008 format, and 6 means
SSIS 2012 format.
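When a folder holds many packages, opening each one in Notepad gets tedious. The following is a minimal sketch, not part of the original post, that reads the PackageFormatVersion property with Python's standard XML parser. The function name, the lookup table and the sample XML are illustrative assumptions. The table covers only the commonly cited format values (2, 3, 6, 8).

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace URI that .dtsx files bind to the "DTS" prefix.
DTS_NS = "www.microsoft.com/SqlServer/Dts"

# Commonly cited mapping of PackageFormatVersion to SSIS release.
FORMAT_TO_RELEASE = {
    "2": "SSIS 2005",
    "3": "SSIS 2008 / 2008 R2",
    "6": "SSIS 2012",
    "8": "SSIS 2014 or later",
}


def package_format_version(dtsx_xml: str) -> Optional[str]:
    """Return the PackageFormatVersion value from .dtsx XML text, or None."""
    root = ET.fromstring(dtsx_xml)
    name_attr = f"{{{DTS_NS}}}Name"
    for prop in root.iter(f"{{{DTS_NS}}}Property"):
        if prop.get(name_attr) == "PackageFormatVersion":
            return (prop.text or "").strip()
    return None


# Hypothetical, heavily trimmed package used only to exercise the function.
SAMPLE_2008 = (
    '<DTS:Executable xmlns:DTS="www.microsoft.com/SqlServer/Dts">'
    '<DTS:Property DTS:Name="PackageFormatVersion">3</DTS:Property>'
    "</DTS:Executable>"
)

if __name__ == "__main__":
    version = package_format_version(SAMPLE_2008)
    print(version, FORMAT_TO_RELEASE.get(version, "unknown"))
```

For a real file, read its text first (for example with `open(path, encoding="utf-8").read()`) and pass it to the function; an unmapped value is reported as unknown rather than guessed.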
SSIS - What Is The Difference Between Control Flow and Data Flow In SSIS ?
Control Flow:
Control Flow is the part of a SQL Server Integration Services package where you handle the flow of operations
or tasks.
Let's say you are reading a text file from a folder by using a Data Flow Task. If the Data Flow Task completes
successfully, you want to run a File System Task to move the file from the source folder to the archive folder.
If the Data Flow Task fails, you want to send an email to your users by using a Send Mail Task. Precedence
constraints are used to control the execution flow.
Data Flow:
Data Flow is the part of a SQL Server Integration Services package where data is extracted by using Data Flow
Sources (OLE DB Source, Raw File Source, Flat File Source, Excel Source etc.). After the data is extracted, Data
Flow Transformations such as Data Conversion, Derived Column, Lookup, Multicast and Merge are used to
implement business logic, and finally the data is written to Data Flow Destinations (OLE DB Destination, Flat
File Destination, Excel Destination, DataReader Destination, ADO NET Destination etc.).
What Is Parallel Execution In SSIS, How Many Tasks A SSIS Package Can Execute In Parallel?
In simple words, if you place more than one task on the Control Flow pane and do not connect them with
precedence constraints, the tasks will run in parallel.
This can help speed up the process when we load data from a source database to a staging database and
there is no dependency on which table should be loaded first.
This is great, so if I need to load 100 staging tables from the source database, can I run all of them in parallel?
Not quite. With the default settings, which I am assuming in this post, our SSIS package will only be able to
execute at the same time
Total Tasks = Number of processors of the machine + 2.
1-Connect to SQL Server by using SSMS if it is installed on the machine, right-click the instance name, go to
Properties and then General, and you will see the number of processors.
Fig 1: Find Number of Processors from SQL Server Instance
2-Click Start, type "Device Manager" in the search box to open Device Manager, then click Processors
and you will see them listed there.
Fig 2: Find out the Number of Processors on Computer by using Device Manager
My machine has 4 processors, so the maximum number of tasks that an SSIS package can execute in parallel on
my machine is 4 (processors) + 2 = 6 with the default setting.
MaxConcurrentExecutables, a package level property in SSIS determines the number of control flow items that
can be executed in parallel. The default value is -1. This is equivalent to number of processors (logical and
physical) plus 2.
For example, in the below package running on my machine with 4 processors and MaxConcurrentExecutables =
-1, you can see 6 tasks have completed execution and 6 are currently running. It’s executing 6 at a time because
4 processors + 2 = 6 threads.
This applies to all versions of SSIS. Parallelism is powerful when your goal is to complete a process as quickly as
possible, especially when the tasks in a control flow are independent of each other.
If you’re thinking of increasing this setting to infinity hoping to achieve a Nobel prize in performance tuning…
slow down. If the words throughput, threading and multi-tasking scare you, you should be careful with this
property. In most cases, the default setting gets the job done just fine.
Parallel execution improves performance on computers that have multiple physical or logical processors. To
support parallel execution of different tasks in the package, SSIS uses two properties:
MaxConcurrentExecutables and EngineThreads.
MaxConcurrentExecutables Property
The MaxConcurrentExecutables property is a property of the package. It defines how many tasks can run
simultaneously by specifying the maximum number of SSIS threads that can execute in parallel per
package. The default value is -1, which equates to the number of physical or logical processors plus 2. Take a
package that calls other packages through 10 Execute Package tasks (the same applies to other task types). If
MaxConcurrentExecutables keeps its default value of -1 and the server running the package has 8 processors,
all 10 tasks are executed in parallel, as shown below:
If MaxConcurrentExecutables is changed to 4 in the above package and the package is run on the same server,
then only 4 tasks run in parallel at a time. (The image below shows the tasks executing in batches of 4: once 4
tasks have finished, another batch of 4 starts.)
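The arithmetic behind both examples can be written down in a few lines. This is only a sketch of the rule described above (-1 means processors plus 2, any positive value is used as given). The function name is an assumption, and treating 0 or other negative values as invalid is an assumption of this sketch.

```python
def effective_max_concurrent(setting: int, processors: int) -> int:
    """Number of executables that may run in parallel for a given
    MaxConcurrentExecutables setting and processor count."""
    if processors < 1:
        raise ValueError("processors must be at least 1")
    if setting == -1:
        # Default: number of (logical) processors plus 2.
        return processors + 2
    if setting < 1:
        # Assumption of this sketch: only -1 or a positive number is allowed.
        raise ValueError("setting must be -1 or a positive integer")
    return setting


if __name__ == "__main__":
    print(effective_max_concurrent(-1, 4))   # 4 processors, default setting
    print(effective_max_concurrent(-1, 8))   # 8 processors, default setting
    print(effective_max_concurrent(4, 8))    # explicit limit of 4
```

With the numbers from the text, 4 processors give 6, 8 processors give 10 (so all 10 Execute Package tasks run together), and an explicit setting of 4 caps the package at 4 regardless of processor count.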
What are the Precedence Constraints in SSIS, and where and why have you used them?
The executables in SSIS are tasks and containers. A precedence constraint links two executables: the
precedence executable and the constrained executable. See an example below.
Green arrow - Success - Returns true only if the precedence executable runs successfully.
Red arrow - Failure - Returns true only if the precedence executable fails.
Blue arrow - Completion - Always returns true, no matter what result the precedence executable returns.
The result of a precedence constraint can also be determined by an expression you define, or by both the
constraint and the expression. See the example below.
1. Create a new package in the project LearnSSIS1 and rename the package to
PrecedenceContraint.dtsx.
2. Open the package and define a variable V of type Int32 with a default value of 1.
3. Drag and drop a Script Task onto the package and rename it to "Source". Then copy and paste the
task and rename the copy to "Destination". Finally, link them with a precedence constraint as
follows.
5. Click "Stop Debugging" and open the source or the destination script task; you will find the default source
code below.
6. Change the code in line 95 of the "Source" task to the following and click the "OK" button to save it.
Dts.TaskResult = (int)ScriptResults.Failure;
7. Run the package again and you will get the result below. Then click "Stop Debugging".
The "Source" task returns "Failure", but the precedence constraint was defined to continue only if the
source returns "Success", so the package stopped running after the "Source" task executed.
9. Run the package again. You will see the "Destination" task run successfully because "Source"
returns Failure, which meets the precedence constraint setting.
10. Right-click the red failure arrow and choose "Edit..." to open the Precedence Constraint Editor. The
evaluation operation can be one of:
o Constraint
o Expression
o Expression and Constraint
o Expression or Constraint
The default is Constraint, which we already tried in the previous steps. The value is the constraint
value we set; the current value is Failure. That means we run the "Destination" task only if the
"Source" task returns failure.
11. Select "Expression and Constraint" and click the "..." button next to Expression to open Expression
Builder. Create the expression @[User::V] == 1 and click the OK button.
12. Leave "Multiple constraints" as the default "Logical AND" and click OK to save the setting. Then run the
package.
You can see that the "Destination" task ran because both the constraint and the expression return true. The
"fx" in the above picture means the precedence constraint contains an expression condition.
15. Right-click the precedence constraint, change the evaluation operation to "Expression or Constraint"
and click OK.
16. Run the package again and you will find the "Destination" task runs successfully because the constraint
and the expression are combined with logical OR ( True || False = True ).
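The four evaluation operations reduce to a small truth table. The sketch below models them in Python to make the walkthrough easier to follow. It is an illustration of the logic only, not SSIS code, and the enum and function names are assumptions.

```python
from enum import Enum


class EvalOp(Enum):
    CONSTRAINT = "Constraint"
    EXPRESSION = "Expression"
    EXPRESSION_AND_CONSTRAINT = "Expression and Constraint"
    EXPRESSION_OR_CONSTRAINT = "Expression or Constraint"


def constraint_met(required: str, actual: str) -> bool:
    """required is 'Success', 'Failure' or 'Completion';
    actual is the precedence executable's result ('Success' or 'Failure')."""
    if required == "Completion":
        return True  # Completion is satisfied by either outcome.
    return required == actual


def should_run(op: EvalOp, required: str, actual: str, expression: bool) -> bool:
    """Decide whether the constrained executable runs."""
    met = constraint_met(required, actual)
    if op is EvalOp.CONSTRAINT:
        return met
    if op is EvalOp.EXPRESSION:
        return expression
    if op is EvalOp.EXPRESSION_AND_CONSTRAINT:
        return met and expression
    return met or expression


if __name__ == "__main__":
    v = 1  # stands in for the package variable User::V
    # "Source" fails; the constraint requires Failure and the expression is V == 1.
    print(should_run(EvalOp.EXPRESSION_AND_CONSTRAINT, "Failure", "Failure", v == 1))
```

In the model, a failing "Source" under a Success constraint blocks "Destination", while the same failure under a Failure constraint lets it run. The OR operation lets it run as soon as either side is true.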
What is the difference between the Success and the Completion value of Precedence Constraint?
Precedence constraints are the arrows that we use in the Control Flow pane to connect tasks. They control
the execution flow of tasks, that is, under what condition execution control passes to which task.
The default constraint is Success, which is represented by a green arrow between tasks.
Fig 1: Precedence Constraint on Success
In Fig 1, the Execute SQL Task has to execute successfully to pass execution control to the Data Flow Tasks. If
the Execute SQL Task fails, control does not pass to the Data Flow Tasks and they do not execute, as shown in
Fig 3.
There can be requirements where we always want to execute the Data Flow Tasks, whether the Execute SQL
Task succeeds or fails. For this requirement, we need to configure the precedence constraint to Completion.
Double-click the green arrow between the tasks and then configure it as shown below in Fig 4.
Fig 4: Configure Precedence Constraint for Completion
The Data Flow Tasks will execute on completion of the Execute SQL Task (completion can be a success or a
failure status).
DelayValidation Property:
The DelayValidation property is available at the task, connection manager, container and package level. By
default the value of this property is False, which means that when the package starts execution, it validates all
the tasks, containers, connection managers and the objects (tables, views, stored procedures etc.) used by them.
If any object, such as a table or destination file, is not available, package validation fails and the package stops
execution.
By setting this property to True, we tell the SSIS package not to validate that task, connection manager or
entire package at the start, but to validate at run time. Let me explain with some real-world examples.
Let's say that instead of creating permanent staging tables we decided to use temp tables in our SSIS package.
We want to load data from a flat file source into a temp table and then use this temp table in other tasks. Before
we use the temp table in a Data Flow Task, we have to create it. The temp table is created by an Execute
SQL Task that runs before the Data Flow Task, so if we leave DelayValidation=False, the package will try to
validate the temp table in the Data Flow Task at start-up. As the temp table is not available at that point, the
package will fail. To avoid this, we can set the DelayValidation property to True so the package skips validation
at the start. By the time the package reaches the Data Flow Task, the Execute SQL Task in the step above has
created the temp table, and the Data Flow Task validates and loads successfully.
As another example, to create an Excel file with the datetime in its name, you have to create an empty Excel
destination file and keep it as a template. The steps involved are:
1-- Copy the template file to the required folder; while copying, you can rename it with the datetime.
As the file will not be available when the package starts, the package will fail to validate the connection manager
and the Data Flow Task. You can set DelayValidation=True for both in their properties. By doing that you skip
the pre-validation. By the time the package reaches the Data Flow Task to load the data, the File System Task
will have created the file with the datetime.
Example 3: How to Create Multiple Files Dynamically From a SQL Server Table
I have a post in which I read the data from a table and then create the file name dynamically by using data from
the table. As the file name is created later and no file is available when the package starts, I used the
DelayValidation property.
To set the DelayValidation property, right-click the task or connection manager, go to Properties and
set it to True. You can also click any task, connection manager or container and press the F4 key to open its
properties.
Fig 1: Package Level DelayValidation Property
We can set the RetainSameConnection property to True; the package then opens just one OLE DB connection
to the server and keeps it alive until the end of the package execution. The property can be set in the
Properties window of the OLE DB Connection Manager.
Here is an example that uses this property. I am using a Foreach Loop container. A folder holds more than
1,400 files, and I want to insert a record into a table for each file, containing the file name.
Add a Foreach Loop container.
Set the enumerator to the Foreach File enumerator.
Configure the collection: select the folder and the file extension, and check "Name only" because we
are storing only the name. Select "Traverse subfolders", which means that if the folder contains subfolders
they are traversed as well.
Next we need to map the variable.
Click OK.
Now I add an Execute SQL Task to insert the file name into the table.
Now I configure the Execute SQL Task.
When we set the property to True, the task takes less time to complete. In the previous case the connection
is opened and closed on every iteration; in the second case the connection is opened once and closed after
the work is done. For large transactions it is better to set RetainSameConnection to True. For small
transactions, set it to False.
If we create a temp table in SSIS Package and want to use it in other tasks, which properties
do we need to use?
We have to create an SSIS package for upsert (insert/update). We get a csv file with millions
of records (Id, Name, Address columns). If a record comes with a new Id, we need to
insert it into the dbo.Customer table (id, name, address); for existing IDs we need
to update the records.
After doing some analysis, we learned that at least 100,000 records need to be
updated per day. To perform the above task we can use a Lookup transformation to
separate existing and non-existing records. Any non-existing IDs can be inserted
directly into the dbo.Customer table, but for updates we would have to use the
OLE DB Command transformation. The OLE DB Command transformation is slow: it
updates one row at a time, and for 100,000 records it will take a long time.
How about inserting the records into a staging table and writing a T-SQL statement to
insert/update the records? Good idea! It will be fast and easy to do. But my architect does
not want to create a new table :(
Solution:
OK, how about we create a temp table, use it in our package to perform the
above task, and once we are done the temp table will be gone!
Step 1:
Create a Source.csv file with Id, Name and Address columns.
Step 2:
Create dbo.Customer Table by using below script
USE TestDB
GO
CREATE TABLE dbo.Customer
(
ID INT,
Name VARCHAR(100),
Address VARCHAR(100)
)
Step 3:
Create SSIS Package to load csv file into dbo.Customer Table.( Insert new records and
update existing)
Create OLE DB Connection to the database where your dbo.Customer table exists.
Right Click on Connection and then click properties or Click on Connection and press F4
to go to properties.
Set RetainSameConnection=True.
Step 4:
Create the ##Temp table by using an Execute SQL Task, as shown below, with the statement
Create Table ##Temp(ID INT, Name VARCHAR(100),ADDRESS VARCHAR(100))
Fig 2: Create ##Temp table by using Execute SQL Task
Step 5:
Bring a Data Flow Task to the Control Flow surface and connect the Execute SQL Task to it.
Inside the Data Flow Task, add a Flat File Source and connect it to the Source.csv file that
you created in Step 1.
Drag in a Lookup transformation and configure it as shown below. Our goal is to insert any
record whose Id does not exist in the dbo.Customer table and, if the ID exists, to update
that record. Instead of using the OLE DB Command transformation, we will insert the records
that need to be updated into the ##Temp table inside the Data Flow Task.
Fig 3: Configure Lookup Transformation ( Redirect rows to no match output)
Fig 4: Choose Id from dbo.Customer for lookup
Fig 5: Map the Source Id to dbo.Customer.ID for lookup
Step 6:
Bring an OLE DB Destination from Data Flow Items as shown. Join the No
Match Output (new records) of the Lookup to the OLE DB Destination and choose the
destination table (dbo.Customer).
Fig 6: Insert new records by using No Match Output of Lookup Transformation
As we do not want to use the OLE DB Command transformation for updates inside the Data
Flow Task, let's write all records that need to be updated into the ##Temp table by using an
OLE DB Destination. We will not be able to see the ##Temp table in the drop-down of the
OLE DB Destination. Here are the two steps we need to take:
i) Create a variable named TableName with the value ##Temp, as shown below
Fig 7: TableName variable holding Temp Table Name
ii) Go to SSMS and create the ##Temp table (if you do not create this table, you will not
be able to map the columns in the OLE DB Destination)
Create Table ##Temp(ID INT, Name VARCHAR(100),ADDRESS VARCHAR(100))
Bring the OLE DB Destination and map to TableName Variable as shown below.
Fig 8: Configure OLE DB Destination to use TableName variable for Destination Table
Name.
Fig 9: Map the Source Columns to ##Temp Table Columns
After all the configuration our Data Flow will look like the figure below. I renamed the
transformations to give a better picture of what we are doing in this Data Flow Task.
Go to the Control Flow surface and drag in an Execute SQL Task to run the update statement.
UPDATE DST
SET DST.Name=SRC.Name
,DST.ADDRESS=SRC.ADDRESS
FROM dbo.Customer DST
INNER JOIN ##Temp SRC
ON DST.ID=SRC.ID
If we try to run the SSIS package, it might complain that ##Temp does not exist. Go to the
package properties by right-clicking in the Control Flow pane and set DelayValidation=True.
By setting DelayValidation we ask the package not to validate objects up front, as the
##Temp table does not exist at this point and will be created later in the package.
Run the package a couple of times and check the data in the dbo.Customer table. The data
should be loaded. Now go to the Source.csv file, change some values in the Name and
Address columns, and run the package one more time to make sure the update logic is
working.
Here is the data after update.
Id,Name,Address
1,Aamir1,Test ADDRESS
2,Raza1,Test Address
3,July, 123 River Side CA USA
4,Robert,540 Rio Rancho NM
As we can see, the records are updated wherever we made changes to the Name and
Address values.
Fig 16: dbo.Customer data after Upsert
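To make the data movement in this package easier to picture, here is a small in-memory model of it in Python. It is only a sketch of the logic: the dictionary stands in for dbo.Customer, the staged list stands in for ##Temp, and the function names and the "before" rows are illustrative assumptions rather than anything from the package.

```python
def split_by_lookup(source_rows, existing_ids):
    """Mimic the Lookup: unknown Ids go to the no-match output,
    known Ids go to the match output."""
    no_match, match = [], []
    for row in source_rows:
        (match if row["Id"] in existing_ids else no_match).append(row)
    return no_match, match


def upsert(customer, source_rows):
    """customer maps Id -> (Name, Address) and is modified in place.
    Returns (rows inserted, rows updated)."""
    new_rows, staged = split_by_lookup(source_rows, set(customer))
    # No Match Output -> OLE DB Destination on dbo.Customer (plain inserts).
    for row in new_rows:
        customer[row["Id"]] = (row["Name"], row["Address"])
    # Match Output -> ##Temp, then one set-based UPDATE joined on ID.
    for row in staged:
        customer[row["Id"]] = (row["Name"], row["Address"])
    return len(new_rows), len(staged)


# Hypothetical "before" state and a source file with two changed rows.
customer = {1: ("Aamir", "Test Address"), 2: ("Raza", "Test Address")}
source = [
    {"Id": 1, "Name": "Aamir1", "Address": "Test ADDRESS"},
    {"Id": 2, "Name": "Raza1", "Address": "Test Address"},
    {"Id": 3, "Name": "July", "Address": "123 River Side CA USA"},
    {"Id": 4, "Name": "Robert", "Address": "540 Rio Rancho NM"},
]
counts = upsert(customer, source)
```

The point of the design is that matched rows are only staged inside the data flow and applied afterwards in one statement, instead of issuing one UPDATE per row as the OLE DB Command transformation would.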
What is data Viewer in SSIS? Is data viewer available in ControlFlow or Data Flow?
Scenario:
Let's say we are developing a package that extracts some records from a source,
implements some business logic by using different transformations and finally loads them
into a destination (table/file). When we look at the destination, a record is incorrect but we
are not sure what happened to the source record. We want to see how the records change
after each transformation to find out which logic is not working correctly.
Solution:
SQL Server Integration Services (SSIS) provides the Data Viewer in the Data Flow Task. A
Data Viewer can be placed between two transformations to see the data. When we execute
our package, the Data Viewer pop-up window shows the data so we can see what changed
from input to output.
In this example we are extracting a few records from the source, and we want to see what
we are extracting. We have used an Aggregate transformation that groups by CountryName
and sums SaleAmount. We can create a second Data Viewer after the
Aggregate transformation to see the data.
To use a Data Viewer between transformations, double-click the green connection
between the two transformations; it will open the Data Viewer editor (Data Flow Path
Editor).
SSIS 2008 R2 and earlier versions offered several Data Viewer types (Grid, Histogram,
Scatter Plot and Column Chart). These other options were removed in SSIS 2012 and later
versions; only Grid is left, and it is no longer called Grid but simply Data Viewer.
Data Viewer Configuration window in SSIS 2008 R2 and older
Once data viewers are created, we can execute our package. We will be able to see
data at different stages of execution.
How to use Data Viewer in SSIS Package to view data while debugging SSIS Package
We can hit Play button in Data Viewer Output window to go to next Data Viewer. The
data can also be copied from Data Viewer and used for testing.
Problem
We have a number of SSIS packages that routinely fail for various reasons such as a
particular file is not found, an external FTP server is unavailable, etc. In most cases
these error conditions are just a temporary situation and we can simply rerun the
package at a later time and it will be successful. The issue, however, is that we do
not want to rerun the tasks in the package that have already completed
successfully. Is there a way that we can restart an SSIS package at the point of
failure and skip any tasks that were successfully completed in the previous execution
of the package?
Solution
SSIS provides a Checkpoint capability which allows a package to restart at the point
of failure. The Checkpoint implementation writes pertinent information to an XML file
(i.e. the Checkpoint file) while the package is executing to record tasks that are
completed successfully and the values of package variables so that the package's
"state" can be restored to what it was when the package failed. When the package
completes successfully, the Checkpoint file is removed; the next time the package
runs it starts executing from the beginning since there will be no Checkpoint file
found. When a package fails, the Checkpoint file remains on disk and can be used
the next time the package is executed to restore the values of package variables and
restart at the point of failure.
The starting point for implementing Checkpoints in a package is with the SSIS
package properties. You will find these properties in the Properties window under the
Checkpoints heading:
CheckpointFileName - Specify the full path to the Checkpoint file that the package
uses to save the value of package variables and log completed tasks. Rather than
using a hard-coded path as shown above, it's a good idea to use an expression that
concatenates a path defined in a package variable and the package name.
CheckpointUsage - Determines if/how checkpoints are used. Choose from these
options: Never (default), IfExists, or Always. Never indicates that you are not using
Checkpoints. IfExists is the typical setting and implements the restart at the point of
failure behavior. If a Checkpoint file is found it is used to restore package variable
values and restart at the point of failure. If a Checkpoint file is not found the package
starts execution with the first task. The Always choice raises an error if the
Checkpoint file does not exist.
SaveCheckpoints - Choose from these options: True or False (default). You must
select True to implement the Checkpoint behavior.
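After setting the Checkpoint SSIS package properties, you need to set
FailPackageOnFailure or FailParentOnFailure to True under the Execution heading at
the individual task level, so that a failing task fails the package and the restart
point is recorded.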
Before wrapping up the discussion on Checkpoints, let's differentiate the restart from
the point of failure behavior with that of a database transaction. The typical behavior
in a database transaction where we have multiple T-SQL commands is that either
they all succeed or none of them succeed (i.e. on failure any previous commands
are rolled back). The Checkpoint behavior, essentially, is that each command (i.e.
task in the SSIS package) is committed upon completion. If a failure occurs the
previous commands are not rolled back since they have already been committed
upon completion.
Let's wrap up this discussion with a simple example to demonstrate the restart at the
point of failure behavior of Checkpoints. We have an SSIS package with Checkpoint
processing setup to restart at the point of failure as described above. The package
has two Execute SQL tasks where the first will succeed and the second will fail. We
will see the following output when running the package in BIDS:
Notice that Task 1 is neither green nor red; in fact it was not executed. The package
began execution with Task 2; Task 1 was skipped because it ran successfully the
last time the package was run. The first run ended when Task 2 failed. The second
run demonstrates the restart at the point of failure behavior.
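The two runs described above can be imitated with a toy model. The sketch below is not how SSIS stores Checkpoints (the real file is XML with its own layout and also holds variable values). It uses a small JSON file and hypothetical names purely to show the restart-at-failure behavior of the IfExists setting.

```python
import json
import os
import tempfile


def run_package(tasks, checkpoint_path):
    """Run (name, callable) tasks in order, skipping any task recorded as
    completed in the checkpoint file. Returns (succeeded, tasks_run)."""
    completed = []
    if os.path.exists(checkpoint_path):          # CheckpointUsage = IfExists
        with open(checkpoint_path) as handle:
            completed = json.load(handle)
    tasks_run = []
    for name, action in tasks:
        if name in completed:
            continue                             # finished in an earlier run
        try:
            action()
        except Exception:
            # Keep the file so the next run restarts at this task.
            with open(checkpoint_path, "w") as handle:
                json.dump(completed, handle)
            return False, tasks_run
        completed.append(name)
        tasks_run.append(name)
    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)               # clean finish: start fresh next time
    return True, tasks_run


def demo():
    """First run fails in Task 2; second run skips Task 1."""
    path = os.path.join(tempfile.mkdtemp(), "demo.chk")
    state = {"fail": True}

    def task1():
        pass

    def task2():
        if state["fail"]:
            raise RuntimeError("simulated failure")

    tasks = [("Task 1", task1), ("Task 2", task2)]
    first = run_package(tasks, path)
    state["fail"] = False                        # the temporary problem is gone
    second = run_package(tasks, path)
    return first, second, os.path.exists(path)
```

Calling `demo()` shows the first run completing only Task 1, and the second run executing only Task 2 and then removing the file. As in the transaction comparison above, nothing that Task 1 did is undone when Task 2 fails.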
Caveats:
SSIS does not persist the value of Object variables in the Checkpoint file.
When you are running an SSIS package that uses Checkpoints, remember that when
you rerun the package after a failure, the values of package variables will be restored
to what they were when the package failed. If you make any changes to package
configuration values the package will not pickup these changes in a restart after
failure. Where the failure is caused by an erroneous package configuration value,
correct the value and remove the Checkpoint file before you rerun the package.
For a Data Flow task you set the FailPackageOnFailure or FailParentOnFailure
properties to True as discussed above. However, there is no restart capability for the
tasks inside of the Data Flow; in other words you can restart the package at the Data
Flow task but you cannot restart within the Data Flow task.
Breakpoints in SSIS
A breakpoint is an intentional stop marked in the code of an application where execution pauses for
debugging. This allows the programmer to inspect the internal state of the application at that point.
When we develop a package in SSIS we need to test it and troubleshoot issues. It is helpful to know
the status of the data at certain points in the execution of the package.
In other words, breakpoints let us debug the SSIS package and view the values of its
variables. They enable us to stop a package during execution and view the status of these items.
We can see the value of a variable immediately before or after a task executes.
Open SSDT.
Now we need to configure the for Loop properties.
Click ok.
Now I am taking script task in the For loop container to display the values.
Assign the values
Edit Script
Each option in the Set Breakpoints window will stop the package execution at a different point during
the task:
The most commonly used events in breakpoints are OnPreExecute, OnPostExecute, OnError,
The other properties in the Set Breakpoints window are Hit Count and Hit Count Type. These
properties
Hit Count Type: as discussed above, you can select the hit count type and the hit count as per your
need.
The example above shows how to use breakpoints on a container. Similarly, we can set breakpoints on
a Script Task as well as on Control Flow tasks and the Data Flow Task.
Click OK.
To view values at execution time in the Data Flow pane we use the Data Viewer.
Will my package run successfully by using SQL Server Agent if I have data viewers and
Breakpoint enabled?
Breakpoints and Data Viewers are only artifacts that have meaning within the debugger.
If running your package from SQL Agent fails, then there's a whole host of things that
could be wrong, generally permission related, but a data viewer or a breakpoint will not
be one of them.
This is a short post to answer one of the interview questions: "What are the different ways to
execute an SSIS Package?"
An SSIS package can be executed in multiple ways; here are some of them.
2) DtExecUI
Execute Package Utility (DtExecUI) is a graphical interface for running SSIS packages. The
utility can run packages from different locations such as a SQL Server database, the SSIS
Package Store or the file system.
When you connect to an SSIS instance by using SSMS and then run a package, it
launches DtExecUI. The graphical interface provides options to change the
values of variables, connection managers etc.
If your packages are stored in the file system and you double-click the .dtsx file, it
opens with DtExecUI. It is a stand-alone utility.
3) Dtexec.exe
Dtexec.exe is command line way to run your package. You have to provide information
such as package path to run the package from command line. You can also provide the
values of variables or Connection managers from command line to run the package with
specific requirements.
4) SQL Server Agent Job
SQL Server Agent can be used to create a job that runs the SSIS package on demand
or on a schedule. The SQL Server Agent job can be a single step calling one SSIS package,
or it can consist of multiple steps calling more than one SSIS package. In most
companies the packages are scheduled by using SQL Server Agent. SQL Server Agent
can access packages that are stored in SQL Server or in folder storage.
I have written a post on how to run an SSIS package from Excel. In that post I call
dtexec.exe to execute the SSIS package on the click of a button that I created in Excel
by using VBA. You can use any program of your choice that can start dtexec.exe to run
your SSIS package, including a custom application.
There are basically two ways to execute an SSIS package from a user stored procedure.
First, you can create a SQL Server Agent job with a step that executes the SSIS package,
and wherever you want to execute it, you can call the sp_start_job system stored
procedure to kick off the job, which in turn executes the package. If the package should
run only when the stored procedure calls it, and not at a defined time or interval, the
job does not need any schedule attached. The disadvantage of this approach is that there
is no easy way to pass values to SSIS package variables at run time. You might need a
metadata table from which the SSIS package retrieves its runtime values.
Second, you can enable the xp_cmdshell extended stored procedure and use it to run the
DTEXEC utility, which executes your SSIS package. The disadvantage of this approach is
that enabling xp_cmdshell poses a security threat (operating system level access), which
is why it is disabled by default. However, this approach gives finer control, because
runtime values for SSIS package variables can be passed easily. In this article, I am
assuming the first approach is simple and straightforward, so I will jump directly to
the second approach.
Please make sure you have enabled the xp_cmdshell extended stored procedure, or else the
call will fail with an error stating that SQL Server blocked access to xp_cmdshell
because the component is turned off.
Enabling xp_cmdshell
To enable xp_cmdshell, use the sp_configure system stored procedure with advanced
options, like this:
USE master
GO
-- To allow advanced options to be changed.
EXEC sp_configure 'show advanced options', 1
GO
-- To update the currently configured value for advanced options.
-- WITH OVERRIDE disables the validity check on the configuration value
RECONFIGURE WITH OVERRIDE
GO
-- To enable the xp_cmdshell component.
EXEC sp_configure 'xp_cmdshell', 1
GO
RECONFIGURE WITH OVERRIDE
GO
-- Revert the 'show advanced options' setting
EXEC sp_configure 'show advanced options', 0
GO
RECONFIGURE WITH OVERRIDE
GO
Executing a SSIS package stored in SQL Server from the user stored procedure
Once the xp_cmdshell extended stored procedure is enabled, you can execute any operating
system command as a command string. In our case, we will use the DTEXEC utility of SSIS
to execute the package. Since the SSIS package is stored in SQL Server, we need to use
the /SQL switch with the DTEXEC command, and the /SET switch to set the values of SSIS
variables.
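The switches described above can be sketched as a small Python helper that assembles the DTEXEC command line for a package stored in SQL Server. The helper names, the package name LoadSales, the server name MYSERVER and the variable BatchId are hypothetical placeholders, not values from the original article.

```python
def quote_if_needed(text):
    """Wrap text in double quotes only when it contains whitespace."""
    return f'"{text}"' if any(ch.isspace() for ch in text) else text


def dtexec_sql_command(package_path, server, variables):
    """Build a dtexec command line for a package stored in SQL Server."""
    parts = ["dtexec", "/SQL", quote_if_needed(package_path),
             "/SERVER", quote_if_needed(server)]
    for name, value in variables.items():
        # /SET takes "propertyPath;value"; this path targets a user variable.
        property_path = f"\\Package.Variables[User::{name}].Properties[Value]"
        parts += ["/SET", f"{property_path};{quote_if_needed(str(value))}"]
    return " ".join(parts)


# Hypothetical package, server and variable names.
print(dtexec_sql_command("\\LoadSales", "MYSERVER", {"BatchId": 42}))
# prints: dtexec /SQL \LoadSales /SERVER MYSERVER /SET \Package.Variables[User::BatchId].Properties[Value];42
```

The resulting string is what would be handed to xp_cmdshell as its command string.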
Executing a SSIS package stored in file system from the user stored procedure
As our SSIS package is now stored in the file system, we need to use the /FILE or /F
switch with the DTEXEC command, and the /SET switch to set the values of SSIS variables.
When xp_cmdshell runs a command, it returns any output of that command as rows of text.
If you don't want this output to be returned, you can use the second parameter
(NO_OUTPUT) of this stored procedure.
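To illustrate the call shape, here is a hedged Python sketch that wraps a DTEXEC command in the T-SQL statement a stored procedure would run. The function name, the package path and the variable name are hypothetical; only the xp_cmdshell call with its optional no_output argument comes from the text above.

```python
def xp_cmdshell_statement(command, no_output=False):
    """Return a T-SQL statement that runs `command` through xp_cmdshell."""
    # Single quotes inside a T-SQL string literal are escaped by doubling them.
    escaped = command.replace("'", "''")
    statement = f"EXEC master..xp_cmdshell '{escaped}'"
    if no_output:
        # Second parameter: suppress the rows of text returned by the command.
        statement += ", no_output"
    return statement


# Hypothetical file-system package and variable; /F is the short form of /FILE.
command = ("dtexec /F C:\\Packages\\LoadSales.dtsx "
           "/SET \\Package.Variables[User::BatchId].Properties[Value];42")
print(xp_cmdshell_statement(command, no_output=True))
```

The sketch keeps double quotes out of the command string, on the assumption that xp_cmdshell handles strings with several quoted sections poorly; paths containing spaces would need extra care.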
Notes
Conclusion
In this article I talked about executing an SSIS package from a user-defined stored
procedure. In the first approach, we create a job with the SSIS package call as a job
step and execute it by calling the sp_start_job system stored procedure from the
user-defined stored procedure. In the second approach, we enable xp_cmdshell to execute
the DTEXEC command line utility from the user-defined stored procedure.
What types of deployment are available for a SSIS Package? Explain all.
Prior to SSIS 2012, all versions (SSIS 2005, 2008 and 2008 R2) had only the ‘Package
Deployment Model’. SQL Server 2012 introduced a new deployment model named the
‘Project Deployment Model’.
Let’s see both deployment models in detail.
Package Deployment Model
In this model, the package is the unit of deployment. Only one package can be
deployed at a time, unlike the project-level deployment introduced in SSIS 2012.
Packages and configurations are saved in the file system: packages with the ‘.dtsx’
extension and configuration files with the ‘.dtsConfig’ extension.
Packages are validated just before execution. You can also validate a package
with dtexec or managed code.
During execution, events that are produced by a package are not captured
automatically. A log provider must be added to the package to capture events.
Under this model, a package configuration is required for each and every package
in the project.
When we deploy the same package again, it overwrites the old one, so there is no
way to check the deployment history or how many times a package has been
deployed.
The following example shows how to configure a package created in SSIS 2008 for
deployment on another computer.
In this deployment model, deploying a package takes the following four steps:
Step 1: To create a package configuration file
Open the package configurations dialog from the SSIS menu to get the Package
Configurations Organizer. Check the ‘Enable package configurations’ option, which
enables the Add button for adding your configuration.
Click on the Add button and you’ll get the Package Configuration Wizard.
Click on Next and select the type of configuration you want in your package.
I’m selecting an XML configuration file and saving it at the desired location.
Click on Next and select the properties you want to be exported to the configuration file.
Click on Next and you’ll get a summary of your configuration file.
Click on Finish to complete the creation of your configuration file.
Now we’re done with Step 1, where we created the package configuration file.
Step 2: To create a deployment utility
At this step, we require a package deployment utility for the project that contains the
package we want to deploy.
To create the deployment utility, right click on your project and select Properties to
open the project properties window.
On this window, set CreateDeploymentUtility to ‘True’ and click on OK.
Now build your project.
On a successful build, you’ll get the following message in the output window.
------ Build started: Project: Project1_BIDS (package deployment model), Configuration: Development------
Build started: SQL Server Integration Services project: Incremental...
Creating deployment utility...
Deployment Utility created.
Build complete -- 0 errors, 0 warnings
========== Build: 1 succeeded or up-to-date, 0 failed, 0 skipped ==========
If you go to the \bin folder of your project, you’ll find a newly created directory named
‘Deployment’ with some files.
These files are your configuration file, the manifest file and the packages you want
to deploy.
Step 3: Copy Deployment folder
Copy the Deployment folder created by the build to the target computer on which you
want to deploy your package.
Step 4: Package Installation
Install your package with help of the ‘Package Installation Wizard’ to the file system or to
an instance of SQL Server.
As we can see, we’ve added 3 packages in our project and if we want to deploy those
packages, we need to do it one-by-one.
So, this was some information regarding package deployment model. Now, let’s move to the
next deployment model.
Project Deployment Model
In this model, the project is the unit of deployment. This means we can deploy the
whole project at once.
During execution, events that are produced by the package are captured
automatically and saved in the catalog.
One disadvantage of this deployment model, as introduced in SSIS 2012, is that you
cannot deploy one or more packages without deploying the whole project. SSIS 2016
introduced an Incremental Package Deployment feature that allows you to deploy
one or more packages without deploying the whole project.
When you build a project under this deployment model, it creates a deployment file
with the .ispac extension.
The project deployment file is a self-contained unit of deployment that includes
only the essential information about the packages and the parameters of the project.
The Project Deployment Model doesn’t require a package configuration file as the
Package Deployment Model did. Here you can build your project and deploy it to your catalog.
Let’s deploy the package that we created under the Project Deployment Model.
After a successful build, you’ll get an ‘Integration Services Project Deployment File (.ispac)’
under a bin/Deployment folder. Double click on that file and it’ll open the Deployment
Wizard.
Click Next and select your Project Deployment File as the Source.
Click Next to choose the Destination where you want to deploy your SSIS packages.
I’ve already created my Integration Services Catalog and I’ll deploy my package there.
Click Next and you’ll get the summary of the deployment, i.e. Source, Destination, etc.
Click on Deploy to finalize the setup. On successful deployment, the wizard shows a
results window.
We’ve successfully deployed our package in this model, which is very simple
compared to the Package Deployment Model.
Now, if you open the Integration Services Catalog under SQL Server, you’ll find your
deployed package.
So with this, we’re done with the deployment models available in SSIS.
Project Deployment Model: Packages are executed with T-SQL. Parameters and environments
can be set with T-SQL.
Package Deployment Model: Packages are executed with dtexec and dtexecui. Command
parameters can be passed to the command prompt.
Which version of SSIS can track versions of a SSIS Package deployed to the Server?
To check the package format version, you have to open the .dtsx file itself. If
PackageFormatVersion is 3, the package was created in SSIS 2008; if it is 6, in SSIS 2012;
and if it is 8, in SSIS 2014.
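The check can also be scripted. The following Python sketch reads the PackageFormatVersion property from the XML of a .dtsx file and maps it using the values listed above. The function names and the inline sample document are illustrative assumptions; a real package contains many more elements.

```python
import xml.etree.ElementTree as ET

# XML namespace used by the DTS elements and attributes in .dtsx files.
DTS_NS = "www.microsoft.com/SqlServer/Dts"

# Mapping taken from the text above; any other value is reported as unknown.
FORMAT_TO_SSIS = {3: "SSIS 2008", 6: "SSIS 2012", 8: "SSIS 2014"}


def package_format_version(dtsx_xml):
    """Return PackageFormatVersion as an int from the text of a .dtsx file."""
    root = ET.fromstring(dtsx_xml)
    name_attribute = f"{{{DTS_NS}}}Name"
    for prop in root.iter(f"{{{DTS_NS}}}Property"):
        if prop.get(name_attribute) == "PackageFormatVersion":
            return int(prop.text.strip())
    raise ValueError("PackageFormatVersion property not found")


def describe_package(dtsx_xml):
    """Translate the format version into an SSIS release name."""
    version = package_format_version(dtsx_xml)
    return FORMAT_TO_SSIS.get(version, f"unknown (format {version})")


# Cut-down sample document, for illustration only.
SAMPLE = (
    '<DTS:Executable xmlns:DTS="www.microsoft.com/SqlServer/Dts">'
    '<DTS:Property DTS:Name="PackageFormatVersion">6</DTS:Property>'
    "</DTS:Executable>"
)
print(describe_package(SAMPLE))  # prints: SSIS 2012
```

For a folder of packages, the same function could be applied to the text of each .dtsx file in turn.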
What are the different ways to run your SSIS package on a schedule?
An SSIS package can be executed in multiple ways; here are some of them.
2) DtExecUI
The Execute Package Utility (DtExecUI) is a graphical interface for running SSIS packages.
The utility can run packages from different locations, such as a SQL Server database, the
SSIS Package Store or the file system.
When you connect to an SSIS instance by using SSMS and then run a package, it
starts DtExecUI. The graphical interface provides options to change the
values of variables, connection managers etc.
If your packages are stored in the file system and you double click a .dtsx file, it
opens with DtExecUI. It is a standalone utility.
3) Dtexec.exe
Dtexec.exe is the command-line way to run your package. You have to provide information
such as the package path to run the package from the command line. You can also provide
the values of variables or connection managers on the command line to run the package
with specific requirements.
I have written a post on how to run an SSIS package from Excel. In that post, I call
dtexec.exe to execute the SSIS package on the click of a button that I created in Excel
by using VBA. You can use any program of your choice that can start dtexec.exe to run
your SSIS package, including a custom application.
Let’s say you have configured an Event Handler to send an email reporting an error for a
Data Flow Task inside a For Each Loop. If an error occurs in the Data Flow Task, you get
multiple emails. Why is that? How can we prevent a series of emails for one error?
Scenario:
I have created an SSIS package and configured an Event Handler at the package level.
If an error occurs in any task, I want to send an email with the error code, error
description etc. It is working great, but each time the SSIS package fails, it sends more
than one email. How can I get only a single email when an error occurs in the SSIS package?
Solution:
If the Event Handler is configured on the package-level OnError event and any task fails,
a sequence of calls is sent to the Event Handler. Let's say we have a Data Flow Task
inside a Sequence Container. If the Data Flow Task fails, the first call is sent to the
Event Handler by the Data Flow Task and the second by the Sequence Container. That is
why the tasks inside the Event Handler, in our case Send Email, run multiple times.
To handle this, we will create a variable "ErrorCnt" of Integer type with an initial value
of 0. After sending the email in the Event Handler, we will change the value of the
variable from 0 to 1. We will use this variable in the Precedence Constraint so that the
Send Email task, or any other task, runs only if the value is 0. Because the value
changes from 0 to 1 after the first call, the tasks will not run on later calls.
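The guard logic can be sketched outside SSIS. The small Python simulation below is not SSIS code: the class and method names are made up for illustration, and appending to a list stands in for the task that sends the email.

```python
class ErrorMailer:
    """Simulates a package-level OnError handler guarded by ErrorCnt."""

    def __init__(self):
        self.error_cnt = 0  # mirrors the ErrorCnt package variable
        self.sent = []      # stands in for emails that were sent

    def on_error(self, source, description):
        # Precedence constraint: continue only while ErrorCnt is 0.
        if self.error_cnt != 0:
            return False
        self.sent.append(f"{source}: {description}")  # the "send email" step
        self.error_cnt = 1  # last task in the handler: set ErrorCnt to 1
        return True


mailer = ErrorMailer()
# One failure reaches the handler twice: once from the task, once from its container.
for source in ["Data Flow Task", "Sequence Container"]:
    mailer.on_error(source, "sample failure")
print(len(mailer.sent))  # prints: 1
```

Only the first call passes the guard, so a single email is recorded even though the handler is invoked twice.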
Step 1:
Create an SSIS package. I have created an SSIS package with two Data Flow Tasks inside
a Sequence Container.
Step 3:
Let's go to the Event Handler pane now and configure it. Bring a Script Task to the Event
Handler pane. We will use the Script Task as a dummy; its only purpose is to carry the
Precedence Constraint so that the tasks below it run only when ErrorCnt=0. I have used an
Execute SQL Task to send the email. Connect the Script Task to this Execute SQL Task. The
last task is another Execute SQL Task, in which we change the value of the ErrorCnt
variable from 0 to 1.
Fig 3 shows how your Event Handler would look. Now let's see how I configured the tasks.
The very first item that I configured is the Precedence Constraint between the Script
Task and the Execute SQL Task that sends the email. I am putting the condition
ErrorCnt=0 on it, which means the task will run only when the value of ErrorCnt is 0.
Here is how it is configured.
Fig 4: Configuring Precedence Constraint to use ErrorCnt Variable
Let's change the value of ErrorCnt variable from 0 to 1 so multiple calls can be handled.
Fig 5: Set the value of variable ErrorCnt to 1
You are all done. This sounds like a lengthy package, as I had to create some Data Flow
Tasks etc., but it is pretty short and simple. You create a variable with value 0, use it
in the Precedence Constraint, and let the other tasks run only when the value is 0. Once
the tasks have run in the Event Handler, you set the value of the variable to 1, so on the
next call the expression is false and no task will run. If you need to see how to
send email