
Q1 Explain the architecture of SSIS?

SSIS architecture consists of four key parts:

a) Integration Services service: monitors running Integration Services packages and manages the storage of packages.
b) Integration Services object model: a managed API for accessing Integration Services tools, command-line utilities, and custom applications.
c) Integration Services runtime and run-time executables: the runtime saves the layout of packages, runs packages, and provides support for logging, breakpoints, configuration, connections, and transactions. The run-time executables are the packages, containers, tasks, and event handlers that Integration Services includes, plus custom tasks.
d) Data flow engine: provides the in-memory buffers that move data from source to destination.

1. What is a package?
a) A discrete executable unit of work composed of a collection of control flow and other objects, including data sources, transformations, process sequence and rules, error and event handling, and data destinations.

2. What is a workflow in SSIS?

a) A workflow is a set of instructions on how to execute tasks.
(It is a set of instructions on how to execute tasks such as sessions, emails and shell commands; a workflow is created from the Workflow Manager.)
3. What is the difference between Control Flow items and Data Flow items?
a) The control flow is the highest-level control process. It allows you to manage the run-time activities of data flows and other processes within a package.
When you want to extract, transform and load data within a package, you add an SSIS Data Flow task to the package control flow.

4. What are the different components in an SSIS package?

1. Control flow
2. Data flow
3. Event handlers
4. Package explorer

5. How to deploy the package?

a) To deploy the package, first configure some properties:
go to the Project menu -> package properties -> in the window that opens, set CreateDeploymentUtility to "true" and mention the path as "bin/Deployment".

7. Connection manager?
a) It is a bridge between a package object and physical data. It provides a logical representation of a connection at design time; the properties of the connection manager describe the physical connection that Integration Services creates when the package is run.

8. Which utilities can execute (run) the package?

a) In BIDS, a package can be executed in debug mode by using the Debug menu or toolbar, or from Solution Explorer.
In production, the package can be executed from the command line, from a Microsoft Windows utility, or it can be scheduled for automated execution by using SQL Server Agent.

i) Go to the Debug menu and select Start Debugging
ii) Press the F5 key
iii) Right-click the package and choose Execute Package
iv) Command prompt utilities

a) DTExecUI
1. Open the command prompt -> Run -> type dtexecui -> press Enter.
2. The Execute Package Utility dialog box opens.
3. In it, click Execute to run the package.
Wait until the package has executed successfully.

b) DTExec utility
1. Open the command prompt window.
2. In the command prompt window, type dtexec followed by the /DTS, /SQL, or /File option and the package path, including the package name.
3. If the package encryption level is EncryptSensitiveWithPassword or EncryptAllWithPassword, use the /Decrypt option to provide the password. If no password is included, dtexec will prompt you for the password.
4. Optionally, provide additional command-line options.
5. Press Enter.
6. Optionally, view logging and reporting information before closing the command prompt window.
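For example, a command line of the kind described above might look like this (the package name, path and password below are placeholders, not from the original):

rem run a package stored in the file system, decrypting with a password
dtexec /FILE "C:\Packages\LoadSales.dtsx" /DECRYPT myPassword
rem run a package stored in msdb on the local server
dtexec /SQL "\LoadSales" /SERVER localhost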
v) Using SQL Server Management Studio to execute a package
1. In SSMS, right-click a package, and then click Run Package. The Execute Package Utility opens.
2. Execute the package as described previously.

9. How can you design SCD in SSIS?

a) Definition: SCD (Slowly Changing Dimension) explains how to capture and preserve changes in dimension data over a period of time. (This is sometimes loosely called change data capture.)

Type 1: keeps the most recent values in the target. It does not maintain history.
Type 2: keeps the full history in the target database. For every update in the source, a new record is inserted in the target.
Type 3: keeps current & previous information in the target.

In SSIS:
Type 1: can require re-creating any aggregation that would be affected by the change.
Type 2: changes can cause a serious inflation in the number of members of a dimension.
Type 3: as with a Type 1 change, a Type 3 change requires a dimension update, so you need to reprocess all aggregations affected after the change.
9) Slowly Changing Dimension?
The Slowly Changing Dimension transformation coordinates the updating and inserting of records in data warehouse dimension tables. For example, you can use this transformation to configure the transformation outputs that insert and update records in the DimProduct table of the AdventureWorksDW2012 database with data from the Production.Product table in the AdventureWorks OLTP database.
The Slowly Changing Dimension transformation provides the following functionality for managing
slowly changing dimensions:

 Matching incoming rows with rows in the lookup table to identify new and existing rows.

 Identifying incoming rows that contain changes when changes are not permitted.

 Identifying inferred member records that require updating.

 Identifying incoming rows that contain historical changes that require insertion of new records
and the updating of expired records.

 Detecting incoming rows that contain changes that require the updating of existing records,
including expired ones.
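To make the Type 2 behavior the transformation automates concrete, here is a minimal T-SQL sketch, assuming hypothetical dbo.DimCustomer (dimension) and dbo.StgCustomer (staging) tables keyed on a CustomerKey business key, with City as the tracked historical attribute:

-- expire the current dimension row when an incoming row carries a historical change
UPDATE d
SET    d.EndDate   = GETDATE(),
       d.IsCurrent = 0
FROM   dbo.DimCustomer AS d
JOIN   dbo.StgCustomer AS s
       ON s.CustomerKey = d.CustomerKey
WHERE  d.IsCurrent = 1
  AND  s.City <> d.City;

-- insert a new current version for every changed or brand-new customer
INSERT INTO dbo.DimCustomer (CustomerKey, City, StartDate, EndDate, IsCurrent)
SELECT s.CustomerKey, s.City, GETDATE(), NULL, 1
FROM   dbo.StgCustomer AS s
LEFT JOIN dbo.DimCustomer AS d
       ON d.CustomerKey = s.CustomerKey
      AND d.IsCurrent = 1
WHERE  d.CustomerKey IS NULL;  -- no current row exists (new member, or just expired above)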
10. How can you handle errors with the help of logging in SSIS?
a) Create an OnError event handler to which you add a Log Error Execute SQL task.

11. What is a log file and how to send a log file to the manager?
a) It is especially useful when the package has been deployed to the production environment and you cannot use BIDS and VSA to debug the package.
SSIS enables you to implement logging code through the Dts.Log method. When the Dts.Log method is called in the script, the SSIS engine routes the message to the log providers that are configured in the containing package.

12. What is an environment variable in SSIS?

An environment variable configuration sets a package property equal to the value in an environment variable.
Environment variable configurations are useful for configuring properties that depend on the computer that is executing the package.

13. What about multiple configurations?

SSIS supports multiple configuration types: XML configuration file, environment variable, registry entry, parent package variable, and SQL Server table, applied directly or indirectly.

14. How to provide security to packages?

In two ways:
1. Package encryption
2. Password protection

15. As per error handling in transformations, which option gives better performance: Fail Component, Redirect Row, or Ignore Failure?
Redirect Row provides better performance for error handling.

16. Staging area?

It is a temporary data storage location where various data transformation activities take place. A staging area is the kitchen of the data warehouse.

17. Task?
a) An individual unit of work.
1. ActiveX Script task
2. Analysis Services Execute DDL task
3. Analysis Services Processing task
4. Bulk Insert task *
5. Data Flow task *
6. Data Mining Query task
7. Execute DTS 2000 Package task
8. Execute Package task *
9. Execute Process task
10. Execute SQL task *
11. File System task
12. FTP task
13. Message Queue task
14. Script task *
15. Send Mail task *
16. Web Service task
17. WMI Data Reader task
18. WMI Event Watcher task
19. XML task

18. Event handling & logging?

a) You can select whether the transformation fails and exits upon an error, or the bad rows can be redirected to a failed data flow branch (Ignore Failure, Redirect Row).
Logging is also improved: there are more than 12 events that can be logged for each task or package. You can enable partial logging for one task and enable much more detailed logging for billing tasks.
Ex: OnError
OnPostValidate
OnProgress
OnWarning
---> The log can be written to virtually any connection:
SQL Profiler
Text files
SQL Server
Windows Event Log
XML file

19. Import & Export Wizard?

a) The easiest method to move data from sources like Oracle, DB2 or SQL Server:
Right-click on the database name -> Tasks -> Import and Export Wizard
Select the source
Select the destination
Choose query or copy of tables
Execute
Finish

20. Solution Explorer?
After creating a project, Solution Explorer shows:
project name
- Data Sources
- Data Source Views
- Packages
- Miscellaneous

21. Precedence constraints?

a) Constraints that link executables, containers, and tasks within the package control flow and specify the conditions that determine the sequence in which they run, and whether an executable runs at all.

22. Data pipeline?

a) The memory-based, multithreaded, buffered transformation process through which data flows in an SSIS Data Flow task during package execution.

23. TRANSFORMATIONS?
A transformation is an object that generates, modifies, or passes data.
1. AGGREGATE: applies an aggregate function to grouped records and produces new output records from the aggregated results.
2. AUDIT: adds the value of a system variable, such as machine name or execution instance GUID, to a new output column.
3. CHARACTER MAP: makes string data changes, such as changing data from lower case to upper case.
4. CONDITIONAL SPLIT: separates input rows into separate output data pipelines based on the Boolean expressions configured for each output.
5. COPY COLUMN: adds a copy of a column to the transformation output; we can later transform the copy, keeping the original for auditing purposes.
6. DATA CONVERSION: converts a column's data type to another data type.
7. DATA MINING QUERY: performs a data mining query against Analysis Services.
8. DERIVED COLUMN: creates a new derived column calculated from an expression.
9. EXPORT COLUMN: allows you to export a column from the data flow to a file.
10. FUZZY GROUPING: performs data cleansing by finding rows that are likely duplicates.
11. FUZZY LOOKUP: matches and standardizes data based on fuzzy logic.
e.g. transform the name Jon to John.
12. IMPORT COLUMN: reads the data from a file & adds it into a data flow.
13. LOOKUP: performs a lookup of data to be used later in a transformation, e.g. to look up a city based on zip code (a T-SQL analogue appears after this list). Typical uses:
1. Getting a related value from a table using a key column value
2. Updating a slowly changing dimension table
3. Checking whether records already exist in the table
14. MERGE: merges two sorted data sets into a single data flow.
15. MERGE JOIN: merges two data sets into a single dataset using a join.
16. MULTICAST: sends a copy of the data to an additional path in the workflow.
17. ROW COUNT: stores the row count from the data flow into a variable.
18. ROW SAMPLING: captures a sample of data from the data flow by using a row count of the total rows in the data flow.
19. UNION ALL: merges multiple data sets into a single dataset.
20. PIVOT: converts rows into columns.
21. UNPIVOT: converts columns into rows.
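As a rough T-SQL analogue of the Lookup example above (zip code to city), assuming hypothetical dbo.Orders and dbo.ZipCodes tables:

-- getting a related value (City) from a reference table using a key column (ZipCode)
SELECT o.OrderID, o.ZipCode, z.City
FROM   dbo.Orders AS o
LEFT JOIN dbo.ZipCodes AS z
       ON z.ZipCode = o.ZipCode;  -- unmatched rows surface City as NULL, the "no match" case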

24. Batch?
a) A batch is defined as a group of sessions. There are 2 types:
1. Parallel batch processing
2. Sequential batch processing

----- For executing the package we can use the "Execute Package Utility" -----

----- For deploying the package we can use the "Package Deployment Utility" -----

25. How would you do logging in SSIS?

Logging configuration provides an inbuilt feature which can log the details of various events like OnError, OnWarning etc. to various targets, say a flat file, a SQL Server table, XML or SQL Profiler.
Q3 How would you do Error Handling?
An SSIS package can mainly have two types of errors:
a) Procedure Error: can be handled in the control flow through precedence constraints and redirecting the execution flow.
b) Data Error: is handled in the Data Flow task by redirecting the data flow using the Error Output of a component.

Q4 How to pass a property value at run time? How do you implement Package Configuration?
A property value like the connection string for a Connection Manager can be passed to the package using package configurations. Package Configuration provides different options like XML file, environment variables, SQL Server table, registry value or parent package variable.

Q5 How would you deploy a SSIS Package on production?

A) Through the manifest:
1. Create the deployment utility by setting its property to true.
2. It will be created in the bin folder of the solution as soon as the package is built.
3. Copy all the files in the utility and use the manifest file to deploy it on production.
B) Using the DTExec.exe utility
C) Import the package directly into MSDB from SSMS by logging in to Integration Services.

Q6 Difference between DTS and SSIS?

Everything, except that both are products of Microsoft :-). (A fuller comparison table appears later in this document.)

Q7 What are new features in SSIS 2008?


Explained in another post:
http://sqlserversolutions.blogspot.com/2009/01/new-improvementfeatures-in-ssis-2008.html

Q9 What is Execution Tree?

Execution trees demonstrate how a package uses buffers and threads. At run time, the data flow engine breaks down Data Flow task operations into execution trees. These execution trees specify how buffers and threads are allocated in the package. Each tree creates a new buffer and may execute on a different thread. When a new buffer is created, such as when a partially blocking or blocking transformation is added to the pipeline, additional memory is required to handle the data transformation, and each new tree may also give you an additional worker thread.

Q10 What are the points to keep in mind for performance improvement of the package?
http://technet.microsoft.com/en-us/library/cc966529.aspx

Q11 You may get a question stating a scenario and then asking you how would you create a
package for that e.g. How would you configure a data flow task so that it can transfer data to
different table based on the city name in a source table column?

Q13 Difference between Union All and Merge Join?

a) The Merge transformation can accept only two inputs, whereas Union All can take more than two inputs.

b) Data has to be sorted before the Merge transformation, whereas Union All doesn't have any condition like that.

Q14 You may get questions regarding what transformation X does. Lookup, Fuzzy Lookup and Fuzzy Grouping transformations are the favorites.

Q15 How would you restart a package from the previous failure point? What are Checkpoints and how can we implement them in SSIS?
When a package is configured to use checkpoints, information about package execution is written to a checkpoint file. When the failed package is rerun, the checkpoint file is used to restart the package from the point of failure. If the package runs successfully, the checkpoint file is deleted, and then re-created the next time that the package is run.

Q16 Where are SSIS packages stored in SQL Server?

msdb.dbo.sysdtspackages90 stores the actual content, and sysdtscategories, sysdtslog90, sysdtspackagefolders90, sysdtspackagelog, sysdtssteplog, and sysdtstasklog play supporting roles.

Q17 How would you schedule SSIS packages?

Using SQL Server Agent. Read about scheduling a job with SQL Server Agent.

Q18 Difference between asynchronous and synchronous transformations?

Asynchronous transformations have different input and output buffers, and it is up to the component designer of an async component to provide a column structure for the output buffer and hook up the data from the input. Synchronous transformations operate on the same buffer, row by row.

Q19 How to achieve parallelism in SSIS?

Parallelism is achieved using the MaxConcurrentExecutables property of the package. Its default is -1, which is calculated as the number of processors + 2.

-More questions added-Sept 2011

Q20 How do you do incremental load?

The fastest way to do an incremental load is by using a timestamp column in the source table and storing the last ETL timestamp. In the ETL process, pick all the rows having a timestamp greater than the stored timestamp, so as to pick only new and updated records.
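A minimal T-SQL sketch of that pattern, assuming a hypothetical dbo.ETLControl watermark table and a ModifiedDate timestamp column on the source:

DECLARE @LastTS datetime;

-- read the timestamp saved by the previous ETL run
SELECT @LastTS = LastExtractTS
FROM   dbo.ETLControl
WHERE  TableName = 'Orders';

-- extract only rows added or updated since then
SELECT *
FROM   dbo.Orders
WHERE  ModifiedDate > @LastTS;

-- after a successful load, move the watermark forward
UPDATE dbo.ETLControl
SET    LastExtractTS = (SELECT MAX(ModifiedDate) FROM dbo.Orders)
WHERE  TableName = 'Orders';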

Q21 How to handle Late Arriving Dimensions or Early Arriving Facts?

Late arriving dimensions are sometimes unavoidable because of a delay or error in the dimension ETL, or due to the logic of the ETL. To handle early arriving facts, we can create a dummy dimension member with the natural/business key and keep the rest of the attributes as null or default. As soon as the actual dimension row arrives, the dummy dimension is updated with a Type 1 change. These are also known as Inferred Members (inferred dimensions).

1) How to find out deployed packages?

C:\Program Files\Microsoft SQL Server\100\DTS\Packages

4) Sequence Container?
The Sequence container defines a control flow that is a subset of the package control flow. Sequence
containers group the package into multiple separate control flows, each containing one or more tasks
and containers that run within the overall package control flow.

5) For Loop Container?

The For Loop container defines a repeating control flow in a package. The loop implementation is similar to the For looping structure in programming languages. In each repeat of the loop, the For Loop container evaluates an expression and repeats its workflow until the expression evaluates to False.
The For Loop container uses the following elements to define the loop:

 An optional initialization expression that assigns values to the loop counters.

 An evaluation expression that contains the expression used to test whether the loop should stop
or continue.

 An optional iteration expression that increments or decrements the loop counter.

6) For Each Loop Container?


The Foreach Loop container defines a repeating control flow in a package. The loop implementation is
similar to Foreach looping structure in programming languages. In a package, looping is enabled by
using a Foreach enumerator. The Foreach Loop container repeats the control flow for each member of a
specified enumerator.

IV. Create the job, schedule the job and run the job
In SQL Server Management Studio, highlight SQL Server Agent -> Start. Highlight Jobs -> New Job..., and name it myJob.
Under Steps, click New Step and name it Step1.
Type: SQL Server Integration Services Package
Run as: myProxy
Package source: File System
Browse to select your package file xxx.dtsx
Click OK
Schedule your job and enable it

Now you can run your job.
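The same job can also be scripted rather than clicked through. A hedged T-SQL sketch using msdb's stored procedures (the job, proxy and package path are the placeholders from the steps above; exact proxy setup varies by environment):

USE msdb;

EXEC dbo.sp_add_job      @job_name = N'myJob';
EXEC dbo.sp_add_jobstep  @job_name   = N'myJob',
                         @step_name  = N'Step1',
                         @subsystem  = N'SSIS',              -- SQL Server Integration Services Package
                         @command    = N'/FILE "C:\Packages\xxx.dtsx"',
                         @proxy_name = N'myProxy';
EXEC dbo.sp_add_jobserver @job_name = N'myJob';              -- target the local server
EXEC dbo.sp_start_job     @job_name = N'myJob';              -- run the job now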

8) Script Task?
The Script Task provides code, through the scripting engine, to perform functions that are not available in the built-in tasks, such as custom logging or file manipulation.

10) Fuzzy Lookup?


The Fuzzy Lookup transformation performs data cleaning tasks such as standardizing data, correcting
data, and providing missing values.
The Fuzzy Lookup transformation differs from the Lookup transformation in its use of fuzzy matching.
The Lookup transformation uses an equi-join to locate matching records in the reference table. It returns
either an exact match or nothing from the reference table. In contrast, the Fuzzy Lookup transformation
uses fuzzy matching to return one or more close matches from the reference table.

11) Fuzzy Grouping?
The Fuzzy Grouping transformation performs data cleansing by finding rows that are likely duplicates and standardizing them.

12) Merge

13) Merge JOIN


The Merge Join transformation provides an output that is generated by joining two sorted datasets using
a FULL, LEFT, or INNER join. For example, you can use a LEFT join to join a table that includes
product information with a table that lists the country/region in which a product was manufactured. The
result is a table that lists all products and their country/region of origin.

You can configure the Merge Join transformation in the following ways:

 Specify whether the join is a FULL, LEFT, or INNER join.

 Specify the columns the join uses.


 Specify whether the transformation handles null values as equal to other nulls.
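The LEFT join described above corresponds to an ordinary T-SQL join; a sketch with hypothetical dbo.Product and dbo.Country tables:

-- all products, with the manufacturing country where one is known
SELECT p.ProductName, c.CountryName
FROM   dbo.Product AS p
LEFT JOIN dbo.Country AS c
       ON c.CountryID = p.ManufacturedInCountryID
ORDER BY p.ProductName;  -- Merge Join additionally requires both inputs sorted on the join key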

Q4: SSIS includes logging features that write log entries when run-time events occur and can also write custom messages.
Integration Services supports a diverse set of log providers, and gives you the ability to create custom
log providers. The Integration Services log providers can write log entries to text files, SQL Server
Profiler, SQL Server, Windows Event Log, or XML files.

Logs are associated with packages and are configured at the package level. Each task or container in a
package can log information to any package log. The tasks and containers in a package can be enabled
for logging even if the package itself is not.

To customize the logging of an event or custom message, Integration Services provides a schema of
commonly logged information to include in log entries. The Integration Services log schema defines the
information that you can log. You can select elements from the log schema for each log entry.

To enable logging in a package


1. In Business Intelligence Development Studio, open the Integration Services project that contains the
package you want.
2. On the SSIS menu, click Logging.
3. Select a log provider in the Provider type list, and then click Add.
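Once the SQL Server log provider is configured, the entries can be inspected with a plain query; a sketch assuming the default SSIS 2005 log table mentioned earlier in this document:

-- most recent errors and warnings written by the SQL Server log provider
SELECT TOP (50) starttime, source, event, message
FROM   dbo.sysdtslog90
WHERE  event IN ('OnError', 'OnWarning')
ORDER BY starttime DESC;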
Q5:

SQL Server 2005 Integration Services (SSIS) makes it simple to deploy packages to any computer.
There are two steps in the package deployment process:
-The first step is to build the Integration Services project to create a package deployment utility.
-The second step is to copy the deployment folder that was created when you built the Integration
Services project to the target computer, and then run the Package Installation Wizard to install the
packages.
Q9:

Variables store values that a SSIS package and its containers, tasks, and event handlers can use at run
time. The scripts in the Script task and the Script component can also use variables. The precedence
constraints that sequence tasks and containers into a workflow can use variables when their constraint
definitions include expressions.

Integration Services supports two types of variables: user-defined variables and system variables. User-
defined variables are defined by package developers, and system variables are defined by Integration
Services. You can create as many user-defined variables as a package requires, but you cannot create
additional system variables.

Scope :

A variable is created within the scope of a package or within the scope of a container, task, or event
handler in the package. Because the package container is at the top of the container hierarchy, variables
with package scope function like global variables and can be used by all containers in the package.
Similarly, variables defined within the scope of a container such as a For Loop container can be used by
all tasks or containers within the For Loop container.
Question 5 - Can you name some of the core SSIS components in the Business Intelligence
Development Studio you work with on a regular basis when building an SSIS package?
Connection Managers
Control Flow
Data Flow
Event Handlers
Variables window
Toolbox window
Output window
Logging
Package Configurations

Question 3 - Can you name 5 or more of the native SSIS connection managers?
OLEDB connection - Used to connect to any data source requiring an OLEDB connection (i.e., SQL
Server 2000)
Flat file connection - Used to make a connection to a single file in the File System. Required for reading
information from a File System flat file
ADO.Net connection - Uses the .Net Provider to make a connection to SQL Server 2005 or other
connection exposed through managed code (like C#) in a custom task
Analysis Services connection - Used to make a connection to an Analysis Services database or project.
Required for the Analysis Services DDL Task and Analysis Services Processing Task
File connection - Used to reference a file or folder. The options are to either use or create a file or folder
Excel
FTP
HTTP
MSMQ
SMO
SMTP
SQLMobile
WMI


Question 5 - Can you name 5 or more of the main SSIS tool box widgets and their functionality?
For Loop Container
Foreach Loop Container
Sequence Container
ActiveX Script Task
Analysis Services Execute DDL Task
Analysis Services Processing Task
Bulk Insert Task
Data Flow Task
Data Mining Query Task
Execute DTS 2000 Package Task
Execute Package Task
Execute Process Task
Execute SQL Task
etc.
Question 2 - Can you explain how to setup a checkpoint file in SSIS?
The following items need to be configured on the properties tab for SSIS package:
CheckpointFileName - Specify the full path to the Checkpoint file that the package uses to save the value of package variables and log completed tasks. Rather than using a hard-coded path, it's a good idea to use an expression that concatenates a path defined in a package variable and the package name.
CheckpointUsage - Determines if/how checkpoints are used. Choose from these options: Never
(default), IfExists, or Always. Never indicates that you are not using Checkpoints. IfExists is the typical
setting and implements the restart at the point of failure behavior. If a Checkpoint file is found it is used
to restore package variable values and restart at the point of failure. If a Checkpoint file is not found the
package starts execution with the first task. The Always choice raises an error if the Checkpoint file
does not exist.
SaveCheckpoints - Choose from these options: True or False (default). You must select True to
implement the Checkpoint behavior.

Question 3 - Can you explain different options for dynamic configurations in SSIS?
Use an XML file
Use custom variables
Use a database per environment with the variables
Use a centralized database with all variables

Question 4 - How do you upgrade an SSIS Package?

Depending on the complexity of the package, one of two techniques is typically used:
Recode the package based on the functionality in SQL Server DTS
Use the Migrate DTS 2000 Package wizard in BIDS, then recode any portion of the package that is not accurate

Question 5 - Can you name five of the Perfmon counters for SSIS and the value they provide?
SQLServer:SSIS Service
SSIS Package Instances - Total number of simultaneous SSIS Packages running
SQLServer:SSIS Pipeline
BLOB bytes read - Total bytes read from binary large objects during the monitoring period.
BLOB bytes written - Total bytes written to binary large objects during the monitoring period.
BLOB files in use - Number of binary large objects files used during the data flow task during the
monitoring period.
Buffer memory - The amount of physical or virtual memory used by the data flow task during the
monitoring period.
Buffers in use - The number of buffers in use during the data flow task during the monitoring period.
Buffers spooled - The number of buffers written to disk during the data flow task during the monitoring
period.
Flat buffer memory - The total number of blocks of memory in use by the data flow task during the
monitoring period.
Flat buffers in use - The number of blocks of memory in use by the data flow task at a point in time.
Private buffer memory - The total amount of physical or virtual memory used by data transformation
tasks in the data flow engine during the monitoring period.
Private buffers in use - The number of blocks of memory in use by the transformations in the data flow
task at a point in time.
Rows read - Total number of input rows in use by the data flow task at a point in time.
Rows written - Total number of output rows in use by the data flow task at a point in time
New improvements / features in SSIS 2008

With the release of SQL SERVER 2008 comes improved SSIS 2008. I will try to list down the improved
and new features in SSIS 2008

1) Improved Parallelism of Execution Trees:

The biggest performance improvement in SSIS 2008 is the incorporation of parallelism in the processing of execution trees. In SSIS 2005, each execution tree used a single thread, whereas in SSIS 2008 the data flow engine is redesigned to utilize multiple threads and take advantage of dynamic scheduling to execute multiple components in parallel, including components within the same execution tree.

2) Any .NET language for Scripting:

SSIS 2008 incorporates the new Visual Studio Tools for Applications (VSTA) scripting engine. The advantage of VSTA is that it enables users to use any .NET language for scripting.

3) New ADO.NET Source and Destination Component:

SSIS 2008 gets a new Source and Destination Component for ADO.NET Record sets.

4) Improved Lookup Transformation:

In SSIS 2008, the Lookup transformation has faster cache loading and lookup operations. It has new caching options, including the ability for the reference dataset to use a cache file (.caw) accessed by the Cache Connection Manager. In addition, the same cache can be shared between multiple Lookup transformations.

5) New Data Profiling Task:

SSIS 2008 has a new debugging aid, the Data Profiling task, that can help the user analyze the data flows occurring in the package. In many cases, execution errors are caused by unexpected variations in the data that is being transferred. The Data Profiling task can help users discover the source of these errors by giving better visibility into the data flow.

6) New Connections Project Wizard:

One of the main usability enhancements in SSIS 2008 is the new Connections Project Wizard. The Connections Project Wizard guides the user through the steps required to create sources and destinations.

Q: What are the tools associated with SSIS?

We use Business Intelligence Development Studio (BIDS) and SQL Server Management Studio (SSMS) to work with the development of SSIS projects.
We use SSMS to manage SSIS packages and projects.
Q: What are the differences between DTS and SSIS?

Data Transformation Services (DTS)        SQL Server Integration Services (SSIS)
Limited error handling                    Complex and powerful error handling
Message boxes in ActiveX scripts          Message boxes in .NET scripting
No Deployment Wizard                      Interactive Deployment Wizard
Limited set of transformations            Good number of transformations
No BI functionality                       Complete BI integration


Question: Have you used an SSIS framework?

Comment: This is a common term in the SSIS world which just means that you have templates set up to perform routine tasks like logging, error handling etc. A yes answer would usually indicate an experienced person; a no answer is still fine if your project is not very mission critical.


• What is the use of the Bulk Insert task in SSIS?

The Bulk Insert task is used to upload large amounts of data from flat files into SQL Server. It supports only OLE DB connections for the destination database.
• What is the Conditional Split transformation in SSIS?
This is just like an IF condition which checks the given condition, and based on the condition evaluation, the output is sent to the appropriate output path. It has ONE input and MANY outputs. The Conditional Split transformation is used to send rows to different outputs based on some conditions. For example, we can route the students in a class who have marks greater than 40 to one path and the students who score 40 or less to another path (a rough T-SQL equivalent follows).
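A T-SQL sketch of that split, assuming a hypothetical dbo.Students table:

-- the two output paths of the split, expressed as plain queries
SELECT * FROM dbo.Students WHERE Marks > 40;   -- "greater than 40" output path
SELECT * FROM dbo.Students WHERE Marks <= 40;  -- default output path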
• How do you eliminate quotes from being uploaded from a flat file to SQL Server?
This can be done using the TEXT QUALIFIER property. In the SSIS package, on the Flat File Connection Manager Editor, enter quotes into the Text qualifier field, then preview the data to ensure the quotes are not included.
• What are the different values you can set for CheckpointUsage property ?
There are three values, which describe how a checkpoint file is used during package execution:
1) Never: The package will not use a checkpoint file and therefore will never restart.
2) If Exists: If a checkpoint file exists in the place you specified for the CheckpointFilename property,
then it will be used, and the package will restart according to the checkpoints written.
3) Always: The package will always use a checkpoint file to restart, and if one does not exist, the
package will fail.
• What is the ONLY Property you need to set on TASKS in order to configure CHECKPOINTS
to RESTART package from failure?
The one property you have to set on the task is FailPackageOnFailure. This must be set for each task
or container that you want to be the point for a checkpoint and restart. If you do not set this property to
true and the task fails, no file will be written, and the next time you invoke the package, it will start from
the beginning again.
• Where can we set the CHECKPOINTS, in DataFlow or ControlFlow ?
Checkpoints only happen at the Control Flow; it is not possible to checkpoint transformations or restart
inside a Data Flow. The Data Flow Task can be a checkpoint, but it is treated as any other task.
• What is the use of the Percentage Sampling transformation in SSIS?
The Percentage Sampling transformation is generally used for data mining. This transformation builds a random sample of output rows by choosing a specified percentage of input rows. For example, if the input has 1000 rows and I specify 10 as the percentage sample, then the transformation returns 10% of the RANDOM records from the input data.
• What is the use of the Term Extraction transformation in SSIS?
The Term Extraction transformation is used to extract nouns, noun phrases, or both, from English text. It extracts terms from text in a transformation input column and then writes the terms to a transformation output column. It can also be used to find out the content of a dataset.
• What is a Data Viewer and what are the different types of Data Viewers in SSIS?
A Data Viewer allows viewing data at a point in time at runtime. If a data viewer is placed before and after the Aggregate transform, we can see the data flowing into the transformation at runtime and how it looks after the transformation has occurred. The different types of data viewers are:
1. Grid
2. Histogram
3. Scatter Plot
4. Column Chart

• What is the Ignore Failure option in SSIS?

With the Ignore Failure option, the error is ignored and the data row is directed to continue on to the next transformation. With the related Redirect Row option, if you have some JUNK data (wrong type of data) flowing from the source, you can REDIRECT the junk records to another output instead of FAILING the package. This helps to MOVE only valid data to the destination, while the JUNK can be captured into a separate file.
• Which are the different types of Control Flow components in SSIS?
The different types of Control Flow components are: Data Flow tasks, SQL Server tasks, Data Preparation tasks, Workflow tasks, Scripting tasks, Analysis Services tasks, Maintenance tasks, and Containers.
• What are containers? What are the different types of containers in SSIS?
Containers are objects that provide structures to packages and extra functionality to tasks. There are four
types of containers in SSIS, they are: Foreach Loop Container, For Loop Container, Sequence Container
and Task Host Container.
• What are the different types of Data flow components in SSIS?
There are 3 data flow components in SSIS.
1. Sources
2. Transformations
3. Destinations
• What are the different types of data sources available in SSIS?
There are 7 types of data sources provided by SSIS: a.) Data Reader source b.) Excel source c.) Flat file
source d.) OLEDB source e.) Raw file source f.) XML source g.) Script component
• What is SSIS Designer?
It is a graphical tool for creating packages. It has 4 tabs: Control Flow, Data Flow, Event Handlers and
Package Explorer.
• What is the function of the Event Handlers tab in SSIS?
On the Event Handlers tab, workflows can be configured to respond to package events.
For example, we can configure a workflow to run when ANY task fails, stops or starts.
• What is the function of Package explorer tab in SSIS?
This tab provides an explorer view of the package. You can see what is happening in the package. The
Package is a container at the top of the hierarchy.
• What is Solution Explorer?
It is a place in SSIS Designer where all the projects, Data Sources, Data Source Views and other
miscellaneous files can be viewed and accessed for modification.
• How do we convert data type in SSIS?
The Data Conversion Transformation in SSIS converts the data type of an input column to a different
data type.
• How are variables useful in ssis package?
Variables can provide communication among objects in the package. Variables can provide
communication between parent and child packages. Variables can also be used in expressions and
scripts. This helps in providing dynamic values to tasks.
• Explain the Aggregate Transformation in SSIS?
It aggregates data, similar to applying T-SQL functions like Group By, Min, Max, Avg, and Count. For example, you can get the total quantity and total line items for each product in the Aggregate Transformation Editor (a T-SQL analogue is sketched below). First you determine the input columns, then the output column name in the Output Alias column of the data grid, and also the operation for each output alias in the Operation column of the same grid. Some of the operations are listed below:
• Group By
• Average
• Count
• Count Distinct: counts distinct, non-null column values
• Min, Max, Sum
In the Advanced tab you can do some optimization, such as setting the Key Scale option (low, medium, high), the Count Distinct Scale option (low, medium, high), the Auto Extend Factor and Warn On Division By Zero. If you check Warn On Division By Zero, the component gives a warning instead of an error. The Key Scale option optimizes the transformation cache to a certain key threshold: if you set it low, optimization targets 500,000 keys written to cache, medium can handle up to 5 million keys, and high can handle up to 25 million keys, or you can specify a particular number of keys; the default value is unspecified. The Count Distinct Scale option is similar, used to optimize the number of distinct values written to memory; its default is also unspecified. The Auto Extend Factor is used when you want some portion of memory to be reserved for this component; the default value is 25% of memory.
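The T-SQL analogue of the product example above, assuming a hypothetical dbo.OrderLines table:

-- total quantity and total line items for each product
SELECT ProductID,
       SUM(Quantity) AS TotalQuantity,
       COUNT(*)      AS TotalLineItems
FROM   dbo.OrderLines
GROUP BY ProductID;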
• Explain the Audit Transformation?
It allows you to add auditing information, as required in the auditing world specified by HIPAA and Sarbanes-Oxley (SOX). Auditing options that you can add to the transformed data through this transformation are:
1. Execution Instance GUID: ID of the execution instance of the package
2. PackageID: ID of the package
3. PackageName
4. VersionID: GUID version of the package
5. ExecutionStartTime
6. MachineName
7. UserName
8. TaskName
9. TaskID: unique identifier of the data flow task that contains the Audit transformation.
• Explain the Character Map Transformation?
It transforms character data. It gives options for whether the output result will override the existing column or be added as a new column. If you define it as a new column, specify the new column name. Operations available here are:
1. Uppercase
2. Lowercase
3. Byte reversal: such as from 0x1234 to 0x4321
4. Full width
5. Half width
6. Hiragana/katakana/traditional Chinese/simplified Chinese
7. Linguistic casing
• Explain the Conditional Split Transformation?
It functions as an if…then…else construct. It enables sending input data to a satisfied conditional branch. For example, you may want to split product quantity between less than 500 and greater than or equal to 500. You can give each condition a name that easily identifies its purpose; the else section is covered by the Default Output Column name.
After you configure the component and connect it to a subsequent transformation/destination, a dialog box pops up to let you choose which conditional output applies to that transformation/destination.
• Explain the Copy Column Transformation?
This component simply copies a column to another new column, just like an ALIAS column in T-SQL.
• Explain the Data Conversion Transformation?
This component converts a data type, similar to the T-SQL functions CAST or CONVERT. If you wish to convert the data from one type to another, this is the best bet. But please make sure that you have COMPATIBLE data in the column.
• Explain the Data Mining Query Transformation?
This component does prediction on the data or fills gaps in it. Some good scenarios for this component are:
1. Taking input columns such as number of children, domestic income, and marital status to predict whether someone owns a house or not.
2. Predicting what a customer would buy, based on analysis of buying patterns from their shopping cart.
3. Filling blank data or default values when a customer doesn't fill in some items in a questionnaire.
• Explain the Derived Column Transformation?
Derived Column creates a new column or puts a manipulation of several columns into a new column. You can directly replace an existing column or create a new one, using more than one input column if needed.
• Explain the Merge Transformation?
The Merge transformation merges two paths into a single path. It is useful when you want to break out data into a path that handles errors; after the errors are handled, the data is merged back into the downstream path, or when you want to merge 2 data sources. It is similar to the Union All transformation, but Merge has some restrictions:
1. Data should be in sorted order
2. Data type, data length and other metadata attributes must be similar before being merged.
• Explain the Merge Join Transformation?
The Merge Join transformation merges the output of 2 inputs, doing an INNER or OUTER join on the data. But if the data comes from 1 OLE DB data source, it is better to join through a SQL query rather than using Merge Join; Merge Join is intended to join 2 different data sources.
• Explain the Multicast Transformation?
This transformation sends output to multiple output paths with no conditions, unlike Conditional Split. It takes ONE input, makes COPIES of the data and passes the same data through many outputs. In short: give one input and take many outputs of the same data.
• Explain the Percentage and Row Sampling Transformations?
These transformations take data from the source and randomly sample it. Each gives you 2 outputs: the first is the selected data and the second is the unselected data. They are used in situations where you train a data mining model; both are used to take a SAMPLE of data from the input.
• Explain the Sort Transformation?
This component sorts data, similar to the T-SQL ORDER BY command. Some transformations need sorted data.
• What are the possible locations to save an SSIS package?
You can save a package in:
SQL Server
Package Store
File System
• What is Design Time Deployment in SSIS?
When you run a package from within BIDS, it is built and temporarily deployed to a folder. By default the package is deployed to the BIN folder in the package's project folder, and you can configure a custom folder for deployment. When the package's execution is completed and stopped in BIDS, the deployed package is deleted; this is called Design Time Deployment.

1. ODBC Support
ODBC support is becoming first class now, I guess because of the future full integration with Hadoop and an increased demand to integrate more easily with various open source platforms. So I guess the days when you will be able to easily connect to a Linux machine from SQL Server are coming. Attunity connectors are also getting more readily available and covering more vendors.

2. Change Data Capture for SSIS


Change Data Capture (CDC) is not new to SQL Server, but it is a new kind of animal to SSIS.

Now with CDC one can easily capture the changes in data sources and provide them for reporting, data analysis, or feeding into the Data Warehouse.

3. Support for a Variable Number of Columns in a Flat File

This is a productivity enhancement that potentially pays for a good portion of the upgrade fee (IMHO). I just happen to see how many developers stumble upon such a reality, unable to overcome this barrier, resorting to various online forums or blogs. No longer!

If you have a flat file with a variable number of columns (screenshots omitted), it will be understood by the SSIS engine and handled without incident. Hooray! No more time wasted scratching your head!

4. Revamped Configurations
This is another big improvement.

Did you ever wonder why you deployed a package and it took the design-time parameters? Did you struggle to deploy your config files or a database along with the package?
No longer! You can now have several configurations, for Dev and Prod, no problem. If you envied your fellow C# or VB .NET developer being able to store parameters right in Visual Studio, no more: now you can, too. As an aside, there is no more BIDS, there is the new Data Tools, but to me it is Visual Studio 2010, I just develop special projects in it, and it is a 1st class tool! And how about this: you can even add parameters after the package has been deployed. Do you feel as thrilled as me? Not yet? Then how about the possibility of sharing parameters across many packages within a project?

5. Script Component - you can debug it, finally!

If your heart is not beating faster by now, then let's recall how much you struggled to find out why a Script Component does not work as expected. A value, or worse yet, three, are not right?

Remember? No? I do. I remember how I needed to build a console app till 10 PM just to solve the mystery of why the values were wrong, sitting alone in the office biting my nails because at midnight a package just had to load the latest flight data. I wished I could just debug the mysterious component with 400 lines of code. Sigh and smile, now I can.

Better yet, all my runtime values are captured. Did I say it is Visual Studio?

6. SSIS Package Format Changed and the Specs are Open Source!
Bye-bye lineage IDs and cryptic, long XML! Hello comparable, mergeable packages!

Easily compare packages with diff tools now! Full specs are at: http://msdn.microsoft.com/en-us/library/gg587140.aspx

7. Built-in Reporting
Yes, there are three canned reports provided for you, dear developer, to benchmark, troubleshoot and just better support a live implementation.
8. Data Taps
This is totally new: have you ever been asked to fix a package with no rights to access the data source? I had such an "opportunity"; their DBA just shrugged off my requests to provide a read-only account. But now you are more in control: you can turn on and off small data dumps to a CSV file for ad-hoc analysis. Those, most often, are instrumental in finding metadata differences and thus allow a real quick fix to many issues. More on this topic is here: http://goo.gl/AUBP5

9. Deploying Projects from Visual Studio

Yes, like I said, Visual Studio is the centerpiece of developing and deploying a SSIS solution. You need to think more project-oriented as a result, so there is a bit of a paradigm shift; I would say you need to think of a project as the unit more than a package now in SSIS 2012 (for those not ready for the change, the old deployment model still works, so not to worry).

So what is different? Actually, all is simpler: you just deploy with a right-click on the project, no more fiddling around with the deployment manifest or manual copy and paste, import, etc.

The configurations are taken care of automatically!


(picture is taken from Rafael Salas blog http://www.rafael-salas.com/2012/01/ssis-2012-project-
deployment-model-and.html).

10. Manage with PowerShell

Did I mention the PowerShell book at the beginning of the post? I did this on purpose. SSIS 2012 provides first-class support for managing SSIS entities such as the SSIS catalog, package deployment and maintenance. You can craft and automate most tasks using an editor; just reference the needed module.

There are also APIs to validate, configure and deploy a package.

Oh, I have already covered 10 improvements, but wait, there are more:

 Undo and Redo are now possible (I can hear the wow!);
 New designer surface (AKA canvas) with adorners
 Shared (across a project) Connection Managers (no more click and copy-paste)!
 Shared (across packages in a project) Cache Managers
 Do you remember the dreaded errors all over the package after some metadata changed? Now you can resolve them all upstream with a single click!
 Group items to reduce clutter without resorting to sequence containers
 The ability to roll back to an older (and working) version of a package
I can hear the applause…

OK for now!

I hope I whetted your appetite enough to go and explore the features yourself. And to stay tuned, do not forget to bookmark the aggregated SSIS Resources page: http://goo.gl/2WZxp

Data Warehouse

1. What are fixed measures and calculated measures?
a) Normally we use fixed measures in SSIS, mainly for calculating measures, whereas calculated measures are used in SSAS: while creating a cube we can define the calculated measures in the OLAP layer.

2. What are measures?

a) Measures are numeric data based on columns in a fact table.

3. What are cubes?

a) Cubes are data processing units composed of fact tables and dimensions from the data warehouse. They provide multidimensional analysis.

4. What are virtual cubes?

These are combinations of one or more real cubes and require no disk space to store them. They store only the definition and not the data.

DATA WAREHOUSE CONCEPTS:-

1. Difference between OLTP and OLAP?

OLTP                                    OLAP
1. Transactional processing            1. Query processing
2. Time-sensitive                      2. History-oriented
3. Operator & clerk views              3. Manager, CEO, PM views
4. Organized by transaction            4. Organized by subject
   (order, input, inventory)              (product, customer)
5. Relatively smaller DB               5. Large DB size
6. Volatile data                       6. Non-volatile data
7. Stores all data                     7. Stores relevant data
8. Not flexible                        8. Flexible

2. Difference between star schema and snowflake schema?

Star schema: a centrally located fact table surrounded by denormalized dimensions; all dimensions link directly with the fact table; it is easy for end users and tech people to understand; data retrieval is easy when parsing a query against the facts and dimensions, because it involves fewer joins.

Snowflake schema: a centrally located fact table surrounded by normalized dimensions; dimensions also link with each other in 1-N relationships; it is more difficult to understand; data retrieval is harder because queries against the facts and dimensions involve more joins.

3. What are fact tables?

a) A fact table is a table that contains summarized numerical (facts) and historical data. The fact table has a foreign key-primary key relation with the dimension tables and maintains the information in 3rd normal form.

Types of facts:
1. Additive: can be added along all the dimensions
- discrete numerical measures
- Ex: retail sales in $
2. Semi-additive: a snapshot taken at a point in time
- a measure of intensity
- not additive along the time dimension
- Ex: account balance, inventory balance
3. Non-additive: numerical measures that can't be added across any dimension
- intensity measures arranged across all dimensions
- Ex: room temperatures, averages

4. Data warehouse?
a) A data warehouse is a collection of data marts representing historical data from different operational data sources (OLTP).
The data from these OLTP sources is structured and optimized for querying and data analysis in a data warehouse.
5. Data mart?
a) A data mart is a subset of a data warehouse that can provide data for reporting and analysis on a section, unit or a department like the sales dept or HR dept.

6. What is OLAP?
a) OLAP stands for online analytical processing. It uses database tables (fact and dimension tables) to enable multidimensional viewing, analysis and querying of large amounts of data.

7. What is OLTP?
a) OLTP stands for online transactional processing. Except for data warehouse databases, the other databases are OLTP.
These OLTP databases use a normalized schema structure and are designed for recording the daily operations and transactions of a business.

8. What are dimensions?

Dimensions are categories by which summarized data can be viewed. For example, a profit summary fact table can be viewed by a time dimension.

9. What are conformed dimensions?

a) The dimensions which are reusable and fixed in nature, for example customer, time and geography dimensions.

10. Staging area?

a) It is a temporary data storage location where various data transformation activities take place.

11. Fact grain (granularity)?

a) The grain of a fact is defined as the level at which the fact information is stored in a fact table.

12. What is a factless fact table?

a) A fact table which does not contain facts is called a factless fact table.
Generally, when we need to combine two data marts, one data mart will have a factless fact table and the other one a common fact table.

13. SCDs?
a)
Type I (current data)
Type II (full historical information & current data)
Type III (current data & recent data)

SQL-SERVER-2005:-

1. Surrogate key?
a) It is an artificial or synthetic key that is used as a substitute for a natural key.
It is just a unique identifier or number for each row that can be used as the primary key of the
table.
(It is a sequence-generated key which is assigned to be the primary key in the system (table)).
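In SQL Server the surrogate key is typically generated with an IDENTITY column (a minimal sketch with hypothetical names):

CREATE TABLE DimCustomer (
    CustomerKey  INT IDENTITY(1,1) PRIMARY KEY,  -- surrogate key
    CustomerID   VARCHAR(20) NOT NULL,           -- natural (business) key from the source system
    CustomerName VARCHAR(100) NULL
);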

What is data mart?

Data marts are generally designed for a single subject area. An organization may have data pertaining to
different departments like Finance, HR, Marketing etc. stored in the data warehouse, and each department
may have separate data marts. These data marts can be built on top of the data warehouse.

What is ER model?

ER model or entity-relationship model is a particular methodology of data modeling wherein the goal of
modeling is to normalize the data by reducing redundancy. This is different from dimensional modeling,
where the main goal is to improve the data retrieval mechanism.

What is dimensional modeling?

A dimensional model consists of dimension and fact tables. Fact tables store different transactional
measurements and the foreign keys from dimension tables that qualify the data. The goal of
dimensional modeling is not to achieve a high degree of normalization but to facilitate easy and faster
data retrieval.

Ralph Kimball is one of the strongest proponents of this very popular data modeling technique which is
often used in many enterprise level data warehouses.

What is dimension?

A dimension is something that qualifies a quantity (measure).

For example, consider this: if I just say "20kg", it does not mean anything. But if I say, "20kg of
Rice (product) was sold to Ramesh (customer) on 5th April (date)", then that gives a meaningful sense.
Product, customer and date are dimensions that qualify the measure - 20kg.

Dimensions are mutually independent. Technically speaking, a dimension is a data element that
categorizes each item in a data set into non-overlapping regions.

What is Fact?

A fact is something that is quantifiable (Or measurable). Facts are typically (but not always) numerical
values that can be aggregated.

What are additive, semi-additive and non-additive measures?

Non-additive Measures

Non-additive measures are those which cannot be used inside any numeric aggregation function (e.g.
SUM(), AVG() etc.). One example of a non-additive fact is any kind of ratio or percentage, e.g. a 5%
profit margin or a revenue-to-asset ratio. Non-numerical data can also be a non-additive measure when
that data is stored in fact tables, e.g. some kind of varchar flags in the fact table.

Semi Additive Measures

Semi-additive measures are those where only a subset of aggregation functions can be applied. Take
account balance: a SUM() of balances over time does not give a useful result, but the MAX() or MIN()
balance might be useful. Consider a price rate or currency rate: SUM is meaningless on a rate; however,
the average function might be useful.
Additive Measures

Additive measures can be used with any aggregation function like Sum(), Avg() etc. Example is Sales
Quantity etc.
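A short T-SQL illustration of the three behaviors, against hypothetical FactSales and FactAccountBalance tables:

-- Additive: SUM is valid across every dimension
SELECT ProductKey, SUM(SalesQuantity) AS TotalQty
FROM FactSales
GROUP BY ProductKey;

-- Semi-additive: summing balance snapshots over time is meaningless,
-- but AVG (or the closing balance) can be useful
SELECT AccountKey, AVG(Balance) AS AvgBalance
FROM FactAccountBalance
GROUP BY AccountKey;

-- Non-additive: a ratio is recomputed from its additive parts, never summed directly
SELECT ProductKey, SUM(Profit) / SUM(Revenue) AS ProfitMargin
FROM FactSales
GROUP BY ProductKey;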

SSAS

SSAS - SQL Server Analysis Services


Q: What is Analysis Services? List out the features?
Microsoft SQL Server 2005 Analysis Services (SSAS) delivers online analytical processing (OLAP) and
data mining functionality for business intelligence applications. Analysis Services supports OLAP by
letting you design, create, and manage multidimensional structures that contain data aggregated from
other data sources, such as relational databases. For data mining applications, Analysis Services lets you
design, create, and visualize data mining models that are constructed from other data sources by using a
wide variety of industry-standard data mining algorithms.
Analysis Services is a middle tier server for analytical processing, OLAP, and Data mining. It manages
multidimensional cubes of data and provides access to heaps of information including aggregation of
data. One can create data mining models from data sources and use it for Business Intelligence also
including reporting features.
Analysis Services provides a combined view of the data used in OLAP or data mining ("services" here
refers to OLAP and data mining). Analysis Services assists in creating, designing and managing
multidimensional structures containing data from varied sources, and provides a wide array of data
mining algorithms for specific trends and needs.
Some of the key features are:
• Ease of use with a lot of wizards and designers
• Flexible data model creation and management
• Scalable architecture to handle OLAP
• Provides integration of administration tools, data sources, security, caching, reporting etc.
• Provides extensive support for custom applications
Q: What is UDM? Its significance in SSAS?
The role of a Unified Dimensional Model (UDM) is to provide a bridge between the user and the data
sources. A UDM is constructed over one or more physical data sources, and then the end user issues
queries against the UDM using one of a variety of client tools, such as Microsoft Excel. At a minimum,
when the UDM is constructed merely as a thin layer over the data source, the advantages to the end user
are a simpler, more readily understood model of the data, isolation from heterogeneous backend data
sources, and improved performance for summary type queries. In some scenarios a simple UDM like
this is constructed totally automatically. With greater investment in the construction of the UDM,
additional benefits accrue from the richness of metadata that the model can provide.
The UDM provides the following benefits:
• Allows the user model to be greatly enriched.
• Provides high performance queries supporting interactive analysis, even over huge data volumes.
• Allows business rules to be captured in the model to support richer analysis.
Q: What is the need for SSAS component?
• Analysis Services is the only component in SQL Server using which we can perform Analysis
and Forecast operations.
• SSAS is very easy to use and interactive.
• Faster Analysis and Troubleshooting.
• Ability to create and manage Data warehouses.
• Apply efficient Security Principles.
Q: Explain the TWO-Tier Architecture of SSAS?
• SSAS uses both server and client components to supply OLAP and data mining functionality to BI
applications.
• The server component is implemented as a Microsoft Windows service. Each instance of
Analysis Services is implemented as a separate instance of the Windows service.
• Clients communicate with Analysis Services using the standard XMLA (XML for Analysis)
protocol for issuing commands and receiving responses, exposed as a web service.
Q: What are the components of SSAS?
• An OLAP engine is used for enabling fast ad hoc queries by end users. A user can interactively
explore data by drilling, slicing or pivoting.
• Drilling refers to the process of exploring details of the data.
• Slicing refers to the process of placing data in rows and columns.
• Pivoting refers to switching categories of data between rows and columns.
• In OLAP, we will be using what are called Dimensional Databases.
Q: What is FASMI ?
A database is called an OLAP database if it satisfies the FASMI rules:
• Fast Analysis – query responses are delivered in five seconds or less.
• Shared – must support access to data by many users, with appropriate handling of sensitivity and
write-backs.
• Multidimensional – the data inside the OLAP database must be multidimensional in structure.
• Information – the OLAP database must support large volumes of data.

Q: What languages are used in SSAS ?
• Structured Query Language (SQL)
• Multidimensional Expressions (MDX) - an industry-standard query language oriented towards
analysis
• Data Mining Extensions (DMX) - an industry-standard query language oriented toward data
mining
• Analysis Services Scripting Language (ASSL) - used to manage Analysis Services database
objects
Q: How Cubes are implemented in SSAS ?
• Cubes are multidimensional models that store data from one or more sources.
• Cubes can also store aggregations.
• SSAS cubes are created using the Cube Wizard.
• We also build dimensions when creating cubes.
• Cubes can see only the DSV (logical view).
Q: What is the difference between a derived measure and a calculated measure?
The difference between a derived measure and a calculated measure is when the calculation is
performed. A derived measure is calculated before aggregations are created, and the values of the
derived measure are stored in the cube. A calculated measure is calculated after aggregations are
created, and the values of a calculated measure aren’t stored in the cube. The primary criterion for
choosing between a derived measure and a calculated measure is not efficiency, but accuracy.
Q: What is a partition?
A partition in Analysis Services is the physical location of stored cube data. Every cube has at least one
partition by default. Each time we create a measure group, another partition is created. Queries run
faster against a partitioned cube because Analysis Services only needs to read data from the partitions
that contain the answers to the queries. Queries run even faster when the partition also stores
aggregations, the precalculated totals for additive measures. Partitions are a powerful and flexible means of managing
cubes, especially large cubes.
Q: While creating a new calculated member in a cube what is the use of property
called non-empty behavior?
Non-empty behavior is an important property for ratio calculations. If the denominator is empty, an MDX
expression will return an error, just as it would if the denominator were equal to zero. By selecting one
or more measures for the Non-Empty Behavior property, we are establishing a requirement that each
selected measure first be evaluated before the calculation expression is evaluated. If each selected
measure is empty, then the expression is also treated as empty and no error is returned.
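For instance, a ratio calculated member in the cube's MDX script might set the property like this (the measure names here are hypothetical):

CREATE MEMBER CURRENTCUBE.[Measures].[Profit Margin]
AS [Measures].[Profit] / [Measures].[Sales Amount],
FORMAT_STRING = 'Percent',
NON_EMPTY_BEHAVIOR = { [Measures].[Sales Amount] };
// If [Sales Amount] is empty, the whole expression is treated as empty
// instead of raising a divide error.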

Q: What is a RAGGED hierarchy?


Under normal circumstances, each level in a hierarchy in Microsoft SQL Server 2005 Analysis Services
(SSAS) has the same number of members above it as any other member at the same level. In a ragged
hierarchy, the logical parent member of at least one member is not in the level immediately above the
member. When this occurs, the hierarchy descends to different levels for different drilldown paths.
Expanding through every level for every drilldown path is then unnecessarily complicated.

Q: What are the roles of an Analysis Services Information Worker?


The role of an Analysis Services information worker is the traditional "domain expert" role in business
intelligence (BI): someone who understands the data employed by a solution and is able to translate the
data into business information. The role of an Analysis Services information worker often has one of the
following job titles: Business Analyst (Report Consumer), Manager (Report Consumer), Technical
Trainer, Help Desk/Operation, or Network Administrator.
Q: What are the different ways of creating Aggregations?
We can create aggregations for faster MDX statements using the Aggregation Wizard or through UBO –
Usage Based Optimization. Always prefer the UBO method in real-time performance troubleshooting.
Q: What is WriteBack? What are the pre-conditions?
The Enable/Disable Writeback dialog box enables or disables writeback for a measure group in a cube.
Enabling writeback on a measure group defines a writeback partition and creates a writeback table for
that measure group. Disabling writeback on a measure group removes the writeback partition but does
not delete the writeback table, to avoid unanticipated data loss.
Q: What is processing?
Processing is a critical and resource intensive operation in the data warehouse lifecycle and needs to be
carefully optimized and executed. Analysis Services 2005 offers a high performance and scalable
processing architecture with a comprehensive set of controls for database administrators.
We can process an OLAP database, individual cube, Dimension or a specific Partition in a cube.
Q: Name few Business Analysis Enhancements for SSAS?
The following table lists the business intelligence enhancements that are available in Microsoft SQL
Server 2005 Analysis Services (SSAS). The table also shows the cube or dimension to which each
business intelligence enhancement applies, and indicates whether an enhancement can be applied to an
object that was created without using a data source and for which no schema has been
generated.

Enhancement                               Type        Applied to                           No data source
Time Intelligence                         Cube        Cube                                 No
Account Intelligence                      Dimension   Dimension or cube                    No
Dimension Intelligence                    Dimension   Dimension or cube                    Yes
Custom Aggregation                        Dimension   Dimension (unary operator) or cube   No
Semiadditive Behavior                     Cube        Cube                                 Yes
Custom Member Formula                     Dimension   Dimension or cube                    No
Custom Sorting and Uniqueness Settings    Dimension   Dimension or cube                    Yes
Dimension Writeback                       Dimension   Dimension or cube                    Yes
Q: What MDX functions do you most commonly use?
This is a great question because you only know this answer by experience.  If you ask me this question,
the answer practically rushes out of me.  “CrossJoin, Descendants, and NonEmpty, in addition to Sum,
Count, and Aggregate. My personal favorite is CrossJoin because it allows me to identify non-contiguous
slices of the cube and aggregate even though those cube cells don’t roll up to a natural ancestor.”
Indeed, CrossJoin has easily been my bread and butter.
Q: Where do you put calculated members?
The reflexive answer is “in the Measures dimension” but this is the obvious answer.  So I always follow
up with another question.  “If you want to create a calculated member that intersects all measures, where
do you put it?”  A high percentage of candidates can’t answer this question, and the answer is “In a
dimension other than Measures.”  If they can answer it, I immediately ask them why.  The answer is
“Because a member in a dimension cannot intersect its own relatives in that dimension.”
Q: How do I find the bottom 10 customers with the lowest sales in 2003 that were not null?

A: Simply using bottomcount will return customers with null sales. You will have to combine it with
NONEMPTY or FILTER.

SELECT { [Measures].[Internet Sales Amount] } ON COLUMNS ,


BOTTOMCOUNT(
NONEMPTY(DESCENDANTS( [Customer].[Customer Geography].[All Customers]
, [Customer].[Customer Geography].[Customer] )
, ( [Measures].[Internet Sales Amount] ) )
, 10
, ( [Measures].[Internet Sales Amount] )
) ON ROWS
FROM [Adventure Works]
WHERE ( [Date].[Calendar].[Calendar Year].&[2003] ) ;
Q: How in MDX query can I get top 3 sales years based on order quantity?

By default Analysis Services returns members in an order specified during attribute design. Attribute
properties that define ordering are "OrderBy" and "OrderByAttribute". Let's say we want to see order
counts for each year. In Adventure Works, the MDX query would be:

SELECT {[Measures].[Reseller Order Quantity]} ON 0


, [Date].[Calendar].[Calendar Year].Members ON 1
FROM [Adventure Works];

Same query using TopCount:


SELECT
{[Measures].[Reseller Order Quantity]} ON 0,
TopCount([Date].[Calendar].[Calendar Year].Members,3, [Measures].[Reseller Order Quantity]) ON 1
FROM [Adventure Works];
Q: How do you extract the first tuple from a set?

You could use the function Set.Item(0).

Example:

SELECT {{[Date].[Calendar].[Calendar Year].Members}.Item(0)} ON 0
FROM [Adventure Works]
Q: How can I set up a default dimension member in the Calculation script?

You can use the ALTER CUBE statement. Syntax:

ALTER CUBE CurrentCube | YourCubeName UPDATE DIMENSION <dimension name>, DEFAULT_MEMBER = '<default member>';
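For example, to make calendar year 2003 the default member of the [Date].[Calendar Year] attribute hierarchy in Adventure Works (assuming that hierarchy and member exist in the cube):

ALTER CUBE CurrentCube
UPDATE DIMENSION [Date].[Calendar Year],
DEFAULT_MEMBER = '[Date].[Calendar Year].&[2003]';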
Question: What is the XMLify component?
Comment: It is a free third-party component, used rather frequently, to output errors into an XML field,
which saves development time.

Question: What command line tools do you use with SSIS ?


Comment: dtutil (deployment), dtexec (execution), dtexecui (generation of execution code)
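For illustration, typical invocations look like this (package names and paths are hypothetical):

REM Run a package stored in the file system
dtexec /F "C:\Packages\MyPackage.dtsx"

REM Copy a file-system package into the msdb package store
dtutil /FILE "C:\Packages\MyPackage.dtsx" /COPY SQL;MyPackage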

2) How to pass parameters from Parent to Child Package?


When using “Execute Package Task” in SSIS control flow, you can’t pass in any configuration
options or variables to the child package. Here is a workaround to do that:
1. In the parent package, create a variable for each variable you want to set in child package. It
must be defined as a global variable.

2. In the child package, create variables to correspond to the ones you created in the Parent
package (e.g. “ChildVar” to correspond to “ParentVar”). Important: assign these variables valid
default values, otherwise you will not be able to run the child package standalone.

3. In the child package, open the package Configurations. Add a new configuration, and select the
“Parent package variable” type. Enter the variable name you added to the parent package (in our
example “ParentVar”). Note that variable names are case sensitive. Assign this to the variable of the
child package (“ChildVar”).


Now when you run the parent package, it will pass the value of the “ParentVar” to the “ChildVar”.
When you run the child package directly (without a parent package), it will use the default value of the
“ChildVar”.
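Related: when running the child package standalone from the command line, its variable default can also be overridden with dtexec's /SET option (the path, variable name and value below are illustrative):

dtexec /F "C:\Packages\ChildPackage.dtsx" /SET \Package.Variables[User::ChildVar].Properties[Value];"NewValue"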

3) How to log Events?


• In SQL Server Data Tools, open the Integration Services project that contains the package you want.

• On the SSIS menu, click Log Events. You can optionally display the Log Events window by
mapping the View.LogEvents command to a key combination of your choosing on the Keyboard page
of the Options dialog box.

• On the Debug menu, click Start Debugging.

As the runtime encounters the events and custom messages that are enabled for logging, log entries for
each event or message are written to the Log Events window.

• On the Debug menu, click Stop Debugging.

The log entries remain available in the Log Events window until you rerun the package, run a different
package, or close SQL Server Data Tools.

• View the log entries in the Log Events window.

• Optionally, select the log entries to copy, right-click, and then click Copy.

• Optionally, double-click a log entry, and in the Log Entry dialog box, view the details for a single
log entry.

• In the Log Entry dialog box, click the up and down arrows to display the previous or next log entry,
and click the copy icon to copy the log entry.

• Open a text editor, paste, and then save the log entry to a text file.
