
Here are some SSIS-related interview questions with answers. Hope they help.

1) What is the control flow?
2) What is a data flow?
3) How do you do error handling in SSIS?
4) How do you do logging in SSIS?
5) How do you deploy SSIS packages?
6) How do you schedule SSIS packages to run on the fly?
7) How do you run a stored procedure and get data back?
8) A scenario: You want to insert a text file into a database table, but during the upload
you want to change a column called Months (January, Feb, etc.) to a code (1, 2, 3, ...).
This code can be read from another database table called Months. After converting the
data, upload the file. If there are any errors, write them to an error table. Then read all
errors from the database, create a file, and mail it to the supervisor.
How would you accomplish this task in SSIS?
9) What are variables, and what is variable scope?
Answers
For Q 1 and 2:
Q1) What is the control flow
Q2) what is a data flow
In SSIS, a workflow is called a control flow. A control flow links together our modular
data flows as a series of operations in order to achieve a desired result.
Control Flow:
• Process Oriented
• Doesn’t manage or pass data between components.
• It functions as a task coordinator
• In the control flow, tasks require completion (success, failure, or completion).
• Synchronous in nature: a task requires completion before moving to the next task.
Even tasks that are not connected to each other are still synchronous in nature.
• Tasks can be executed both in parallel and serially.

What are the three types of control flow elements in SSIS 2005?
Answer: - Containers: provide structure in the packages
Tasks: provide functionality in the packages
Precedence constraints: connect containers, executables, and tasks
into an ordered control flow.
We can control the sequence in which tasks execute and also specify the conditions under
which tasks and containers run.
It is possible to include nested containers, because the SSIS architecture supports nesting
of containers. A control flow can include multiple levels of nested containers.
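To make the Success / Failure / Completion behavior concrete, here is a minimal Python sketch of how a serial chain of tasks might be gated by precedence constraints. It is an illustration under simplifying assumptions (one linear chain, no expression-based constraints), not how the SSIS runtime works internally; the task functions are hypothetical stand-ins.

```python
from enum import Enum
from typing import Callable, List, Optional


class Result(Enum):
    SUCCESS = "Success"
    FAILURE = "Failure"


class ConstraintValue(Enum):
    SUCCESS = "Success"
    FAILURE = "Failure"
    COMPLETION = "Completion"


def constraint_allows(value: ConstraintValue, upstream: Result) -> bool:
    # Completion fires whatever the outcome; Success/Failure must match it.
    if value is ConstraintValue.COMPLETION:
        return True
    return value.value == upstream.value


def run_sequence(tasks: List[Callable[[], bool]],
                 links: List[ConstraintValue]) -> List[str]:
    """Run tasks serially; links[i] connects tasks[i] to tasks[i + 1].
    Each task returns True on success. Returns the names of tasks that ran."""
    ran: List[str] = []
    previous: Optional[Result] = None
    for i, task in enumerate(tasks):
        if previous is not None and not constraint_allows(links[i - 1], previous):
            break  # constraint not satisfied, so the downstream task never starts
        previous = Result.SUCCESS if task() else Result.FAILURE
        ran.append(task.__name__)
    return ran


def load_staging() -> bool:
    return False  # hypothetical task that simulates a failure


def send_failure_mail() -> bool:
    return True  # hypothetical task


def cleanup() -> bool:
    return True  # hypothetical task


print(run_sequence([load_staging, send_failure_mail, cleanup],
                   [ConstraintValue.FAILURE, ConstraintValue.COMPLETION]))
# ['load_staging', 'send_failure_mail', 'cleanup']
```

With `ConstraintValue.SUCCESS` on the first link instead, only `load_staging` would run, because the failed task does not satisfy a Success constraint.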
Data Flow
• Streaming in nature
• Information oriented
• Passes data between other components
• Transformations work together to manage and process data. This means the first set
of data from the source may already be in the final destination while, at the same
time, another set of data is still flowing. All the transformations are doing work at the
same time.
What are the three types of data flow components?
Sources: - Extract data from various sources (databases, text files, etc.)
Transformations: - Clean, modify, merge, and summarize the data
Destinations: - Load data into destinations such as databases, files, or in-memory datasets
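The streaming behavior described above (early rows reaching the destination while later rows are still being read) can be imitated with Python generators. This is only an analogy: the real data flow engine moves rows in memory buffers across multiple threads, whereas this sketch processes one row at a time on one thread. The component names and sample data are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, str]
trace: List[str] = []  # records the order in which components touch rows


def flat_file_source(lines: Iterable[str]) -> Iterator[Row]:
    # Source: yields one row at a time instead of reading the whole file first.
    for line in lines:
        name, amount = line.split(",")
        trace.append(f"read {name}")
        yield {"name": name, "amount": amount}


def upper_case_name(rows: Iterable[Row]) -> Iterator[Row]:
    # Transformation: processes each row as soon as it arrives.
    for row in rows:
        yield {**row, "name": row["name"].upper()}


def list_destination(rows: Iterable[Row], target: List[Row]) -> int:
    # Destination: writes each row while later rows are still upstream.
    count = 0
    for row in rows:
        target.append(row)
        trace.append(f"write {row['name']}")
        count += 1
    return count


loaded: List[Row] = []
list_destination(upper_case_name(flat_file_source(["ann,10", "bob,20"])), loaded)
print(trace)  # ['read ann', 'write ANN', 'read bob', 'write BOB']
```

The trace shows the first row being written before the second row is read, which is the property the bullet list describes.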

Q3: How do you do error handling in SSIS?


When a data flow component applies a transformation to column data, extracts data
from sources, or loads data into destinations, errors can occur. Errors frequently occur
because of unexpected data values.

For example, a data conversion fails because a column contains a string instead of a
number, an insertion into a database column fails because the data is a date and the
column has a numeric data type, or an expression fails to evaluate because a column
value is zero, resulting in a mathematical operation that is not valid.

Errors typically fall into one of the following categories:

-Data conversion errors, which occur if a conversion results in loss of significant digits,
the loss of insignificant digits, and the truncation of strings. Data conversion errors also
occur if the requested conversion is not supported.
-Expression evaluation errors, which occur if expressions that are evaluated at run time
perform invalid operations or become syntactically incorrect because of missing or
incorrect data values.
-Lookup errors, which occur if a lookup operation fails to locate a match in the lookup
table.

Many data flow components support error outputs, which let you control how the
component handles row-level errors in both incoming and outgoing data. You specify
how the component behaves when truncation or an error occurs by setting options on
individual columns in the input or output.

For example, you can specify that the component should fail if customer name data is
truncated, but ignore errors on another column that contains less important data.
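A minimal Python sketch of the per-column behavior described above, assuming a component that converts string columns to integers. The three options mirror the Fail component / Ignore failure / Redirect row settings. The `ErrorColumn` field is a simplified stand-in for the extra columns SSIS adds to an error output (ErrorCode and a numeric ErrorColumn); all names here are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical labels mirroring the three error-output choices in SSIS.
FAIL_COMPONENT = "Fail component"
IGNORE_FAILURE = "Ignore failure"
REDIRECT_ROW = "Redirect row"


class ComponentFailed(Exception):
    """Raised when a column configured as 'Fail component' hits an error."""


def convert_to_int(rows: List[Dict[str, str]],
                   dispositions: Dict[str, str]) -> Tuple[List[dict], List[dict]]:
    """Convert each configured column to int, honoring its error disposition.
    Returns (rows for the normal output, rows for the error output)."""
    normal: List[dict] = []
    error_output: List[dict] = []
    for row in rows:
        out: Dict[str, Optional[object]] = dict(row)
        redirected = False
        for column, disposition in dispositions.items():
            try:
                out[column] = int(row[column])
            except ValueError:
                if disposition == FAIL_COMPONENT:
                    raise ComponentFailed(f"{column}: cannot convert {row[column]!r}")
                if disposition == IGNORE_FAILURE:
                    out[column] = None  # row continues with NULL in this column
                    continue
                # Redirect row: send the original row, tagged with the failing column.
                error_output.append({**row, "ErrorColumn": column})
                redirected = True
                break
        if not redirected:
            normal.append(out)
    return normal, error_output


rows = [{"id": "1", "qty": "5"}, {"id": "2", "qty": "x"}, {"id": "y", "qty": "3"}]
normal, errors = convert_to_int(rows, {"id": REDIRECT_ROW, "qty": IGNORE_FAILURE})
print(normal)  # [{'id': 1, 'qty': 5}, {'id': 2, 'qty': None}]
print(errors)  # [{'id': 'y', 'qty': '3', 'ErrorColumn': 'id'}]
```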

Q4: How do you do logging in SSIS?


SSIS includes logging features that write log entries when run-time events occur and can
also write custom messages.

Integration Services supports a diverse set of log providers, and gives you the ability to
create custom log providers. The Integration Services log providers can write log entries
to text files, SQL Server Profiler, SQL Server, Windows Event Log, or XML files.

Logs are associated with packages and are configured at the package level. Each task
or container in a package can log information to any package log. The tasks and
containers in a package can be enabled for logging even if the package itself is not.

To customize the logging of an event or custom message, Integration Services provides
a schema of commonly logged information to include in log entries. The Integration
Services log schema defines the information that you can log. You can select elements
from the log schema for each log entry.

To enable logging in a package


1. In Business Intelligence Development Studio, open the Integration Services project
that contains the package you want.
2. On the SSIS menu, click Logging.
3. Select a log provider in the Provider type list, and then click Add.
Q5: How do you deploy SSIS packages?
SQL Server 2005 Integration Services (SSIS) makes it simple to deploy
packages to any computer.
There are two steps in the package deployment process:
-The first step is to build the Integration Services project to create a package
deployment utility.
-The second step is to copy the deployment folder that was created when you
built the Integration Services project to the target computer, and then run the
Package Installation Wizard to install the packages.
Q9:

Variables store values that an SSIS package and its containers, tasks, and event
handlers can use at run time. The scripts in the Script task and the Script
component can also use variables. The precedence constraints that sequence
tasks and containers into a workflow can use variables when their constraint
definitions include expressions.

Integration Services supports two types of variables: user-defined variables and
system variables. User-defined variables are defined by package developers,
and system variables are defined by Integration Services. You can create as
many user-defined variables as a package requires, but you cannot create
additional system variables.

Scope:

A variable is created within the scope of a package or within the scope of a
container, task, or event handler in the package. Because the package container
is at the top of the container hierarchy, variables with package scope function like
global variables and can be used by all containers in the package. Similarly,
variables defined within the scope of a container such as a For Loop container
can be used by all tasks or containers within the For Loop container.
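The scope rule above amounts to "look in my own scope, then walk up the container hierarchy". A minimal Python sketch of that lookup, assuming a simplified container tree; the container and variable names are hypothetical, and this is not the SSIS object model.

```python
from typing import Dict, Optional


class Container:
    """A package, container, or task that can own variables (simplified)."""

    def __init__(self, name: str, parent: Optional["Container"] = None):
        self.name = name
        self.parent = parent
        self.variables: Dict[str, object] = {}

    def resolve(self, variable: str) -> object:
        # Look in this scope first, then walk up toward the package.
        scope: Optional[Container] = self
        while scope is not None:
            if variable in scope.variables:
                return scope.variables[variable]
            scope = scope.parent
        raise KeyError(f"{variable} is not visible from {self.name}")


package = Container("Package")
package.variables["User::BatchId"] = 42
loop = Container("For Loop Container", parent=package)
loop.variables["User::Counter"] = 0
task = Container("Execute SQL Task", parent=loop)
sibling = Container("Data Flow Task", parent=package)

print(task.resolve("User::BatchId"))  # 42 (package scope acts like a global)
print(task.resolve("User::Counter"))  # 0 (defined on the enclosing loop)
```

`sibling.resolve("User::Counter")` raises `KeyError`, because the Data Flow Task sits outside the For Loop container that owns that variable.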

Question 1 - True or False - Using a checkpoint file in SSIS is just like
issuing the CHECKPOINT command against the relational engine. It commits
all of the data to the database.
False. SSIS provides a Checkpoint capability which allows a package to restart
at the point of failure.

Question 2 - Can you explain what the Import\Export tool does and the
basic steps in the wizard?
The Import\Export tool is accessible via BIDS or by executing the dtswizard
command.
The tool identifies a data source and a destination to move data either within one
database, between instances, or even from a database to a file (or vice versa).

Question 3 - What are the command line tools to execute SQL Server
Integration Services packages?
DTEXECUI - When this tool is run, a user interface is loaded in
order to configure each of the applicable parameters to execute an SSIS
package.
DTEXEC - This is a pure command-line tool where all of the needed switches
must be passed into the command for successful execution of the SSIS package.

Question 4 - Can you explain the SQL Server Integration Services
functionality in Management Studio?
You have the ability to do the following:
Log in to the SQL Server Integration Services instance
View the SSIS log
View the packages that are currently running on that instance
Browse the packages stored in MSDB or the file system
Import or export packages
Delete packages
Run packages

Question 5 - Can you name some of the core SSIS components in the
Business Intelligence Development Studio you work with on a regular basis
when building an SSIS package?
Connection Managers
Control Flow
Data Flow
Event Handlers
Variables window
Toolbox window
Output window
Logging
Package Configurations

Question Difficulty = Moderate

True or False: SSIS has a default means to log all records updated, deleted
or inserted on a per table basis.
False, but a custom solution can be built to meet these needs.

What is a breakpoint in SSIS? How is it set up? How do you disable it?

A breakpoint is a stopping point in the code. The breakpoint can give the
Developer\DBA an opportunity to review the status of the data, variables, and the
overall status of the SSIS package.
10 unique conditions exist for each breakpoint.
Breakpoints are set up in BIDS. In BIDS, navigate to the control flow interface.
Right-click the object where you want to set the breakpoint and select the 'Edit
Breakpoints...' option.

Can you name 5 or more of the native SSIS connection managers?


OLEDB connection - Used to connect to any data source requiring an OLEDB
connection (e.g., SQL Server 2000)
Flat file connection - Used to make a connection to a single file in the file
system. Required for reading information from a flat file
ADO.Net connection - Uses a .NET provider to make a connection to SQL
Server 2005 or another connection exposed through managed code (like C#) in a
custom task
Analysis Services connection - Used to make a connection to an Analysis
Services database or project. Required for the Analysis Services DDL Task and
Analysis Services Processing Task
File connection - Used to reference a file or folder. The options are to either use
or create a file or folder
Excel
FTP
HTTP
MSMQ
SMO
SMTP
SQLMobile
WMI

How do you eliminate quotes from being uploaded from a flat file to SQL
Server?
In the SSIS package, in the Flat File Connection Manager Editor, enter a double quote (")
into the Text qualifier field, then preview the data to ensure the quotes are not
included.
Additional information: How to strip out double quotes from an import file in SQL
Server Integration Services
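Python's `csv` module has an equivalent of the Text qualifier setting (`quotechar`), so the effect can be shown outside SSIS. This is a small sketch with a hypothetical file; it illustrates the idea rather than what the Flat File Connection Manager does internally.

```python
import csv
import io
from typing import List

# Hypothetical flat file: names contain commas and are wrapped in double quotes.
raw = 'id,name,city\n1,"Smith, John","New York"\n2,"Doe, Jane",Boston\n'


def read_without_qualifier(text: str) -> List[List[str]]:
    # No text qualifier: quotes load as data and embedded commas split fields.
    return [line.split(",") for line in text.splitlines()]


def read_with_qualifier(text: str) -> List[List[str]]:
    # Text qualifier set to ": quotes delimit fields and are stripped from values.
    return list(csv.reader(io.StringIO(text), delimiter=",", quotechar='"'))


print(read_without_qualifier(raw)[1])  # ['1', '"Smith', ' John"', '"New York"']
print(read_with_qualifier(raw)[1])     # ['1', 'Smith, John', 'New York']
```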

Can you name 5 or more of the main SSIS tool box widgets and their
functionality?
For Loop Container
Foreach Loop Container
Sequence Container
ActiveX Script Task
Analysis Services Execute DDL Task
Analysis Services Processing Task
Bulk Insert Task
Data Flow Task
Data Mining Query Task
Execute DTS 2000 Package Task
Execute Package Task
Execute Process Task
Execute SQL Task
etc.

Question Difficulty = Difficult

Question 1 - Can you explain one approach to deploy an SSIS package?

One option is to build a deployment manifest file in BIDS, copy the directory to the
applicable SQL Server, and then work through the steps of the Package Installation Wizard.
A second option is to use the dtutil utility to copy, move, rename, or delete an SSIS
package.
A third option is to log in to SQL Server Integration Services via SQL Server
Management Studio, navigate to the 'Stored Packages' folder, and then right-click
one of the child folders or an SSIS package to access the 'Import Packages...' or
'Export Packages...' option.
A fourth option in BIDS is to navigate to File | Save Copy of Package and complete the
interface.

Question 2 - Can you explain how to set up a checkpoint file in SSIS?

The following properties need to be configured in the Properties window for the SSIS package:
CheckpointFileName - Specify the full path to the checkpoint file that the package uses
to save the value of package variables and log completed tasks. Rather than using a
hard-coded path, it's a good idea to use an expression that
concatenates a path defined in a package variable and the package name.
CheckpointUsage - Determines if/how checkpoints are used. Choose from these
options: Never (default), IfExists, or Always. Never indicates that you are not using
checkpoints. IfExists is the typical setting and implements the restart at the point of
failure behavior. If a checkpoint file is found, it is used to restore package variable values
and restart at the point of failure. If a checkpoint file is not found, the package starts
execution with the first task. The Always choice raises an error if the checkpoint file
does not exist.
SaveCheckpoints - Choose from these options: True or False (default). You must
select True to implement the checkpoint behavior.
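A minimal Python sketch of the restart-at-the-point-of-failure behavior configured above. Assumptions: the checkpoint is stored as JSON listing completed task names (the real SSIS checkpoint file is XML and also stores variable values), the `usage` argument mirrors the CheckpointUsage options, and the tasks are hypothetical. As in SSIS, the file is removed after a successful run.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List


def run_with_checkpoint(tasks: Dict[str, Callable[[], None]],
                        checkpoint_file: str,
                        usage: str = "IfExists",
                        save_checkpoints: bool = True) -> List[str]:
    """Run tasks in order, skipping ones a previous failed run already completed.
    Returns the names of the tasks executed in this run."""
    completed: List[str] = []
    exists = os.path.exists(checkpoint_file)
    if usage == "Always" and not exists:
        raise FileNotFoundError("CheckpointUsage=Always but no checkpoint file")
    if usage in ("IfExists", "Always") and exists:
        with open(checkpoint_file) as f:
            completed = json.load(f)["completed"]
    ran: List[str] = []
    for name, task in tasks.items():
        if name in completed:
            continue  # finished in an earlier run; restart after it
        task()  # an exception here leaves the checkpoint file in place
        ran.append(name)
        completed.append(name)
        if save_checkpoints:
            with open(checkpoint_file, "w") as f:
                json.dump({"completed": completed}, f)
    if os.path.exists(checkpoint_file):
        os.remove(checkpoint_file)  # successful run: checkpoint no longer needed
    return ran


attempts = {"load": 0}


def extract() -> None:
    pass


def load() -> None:
    attempts["load"] += 1
    if attempts["load"] == 1:
        raise RuntimeError("destination unavailable")  # fail only on the first run


def archive() -> None:
    pass


path = os.path.join(tempfile.mkdtemp(), "pkg.chk")
tasks = {"extract": extract, "load": load, "archive": archive}
try:
    run_with_checkpoint(tasks, path)  # first run fails in "load"
except RuntimeError:
    pass
second = run_with_checkpoint(tasks, path)
print(second)  # ['load', 'archive']
```

The second run skips `extract` because the checkpoint records it as completed, which is the IfExists behavior described above.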

Question 3 - Can you explain different options for dynamic configurations in
SSIS?
Use an XML file
Use custom variables
Use a database per environment with the variables
Use a centralized database with all variables

Question 4 - How do you upgrade an SSIS Package?


Depending on the complexity of the package, one or two techniques are typically used:
Recode the package based on the functionality in SQL Server DTS
Use the Migrate DTS 2000 Package wizard in BIDS then recode any portion of the
package that is not accurate

Question 5 - Can you name five of the Perfmon counters for SSIS and the value
they provide?
SQLServer:SSIS Service
SSIS Package Instances - Total number of simultaneous SSIS Packages running
SQLServer:SSIS Pipeline
BLOB bytes read - Total bytes read from binary large objects during the monitoring
period.
BLOB bytes written - Total bytes written to binary large objects during the monitoring
period.
BLOB files in use - Number of binary large object files used by the data flow task
during the monitoring period.
Buffer memory - The amount of physical or virtual memory used by the data flow task
during the monitoring period.
Buffers in use - The number of buffers in use during the data flow task during the
monitoring period.
Buffers spooled - The number of buffers written to disk during the data flow task during
the monitoring period.
Flat buffer memory - The total number of blocks of memory in use by the data flow task
during the monitoring period.
Flat buffers in use - The number of blocks of memory in use by the data flow task at a
point in time.
Private buffer memory - The total amount of physical or virtual memory used by data
transformation tasks in the data flow engine during the monitoring period.
Private buffers in use - The number of blocks of memory in use by the transformations
in the data flow task at a point in time.
Rows read - Total number of input rows in use by the data flow task at a point in time.
Rows written - Total number of output rows in use by the data flow task at a point in
time.

How would you set up an SQL Agent


What is the difference between Multicast and Conditional Split?
Can I have more than two outputs from the Conditional Split component?
You want to insert a text file into a database table, but during the upload you want to change
a column called Months (January, Feb, etc.) to a code (1, 2, 3, ...). This code can be
read from another database table called Months. After converting the data,
upload the file. If there are any errors, write them to an error table. Then read all
errors from the database, create a file, and mail it to the supervisor.
How would you accomplish this task in SSIS?

For question 3:
Pivot transformation
For question 2:
The Multicast transformation distributes its input to one or more outputs. This
transformation is similar to the Conditional Split transformation. Both transformations
direct an input to multiple outputs. The difference between the two is that the Multicast
transformation directs every row to every output, and the Conditional Split directs a
row to a single output.
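The difference can be sketched in Python, treating rows as dictionaries. This is an illustration, not SSIS code: the output names are hypothetical, and "Default Output" is a simplified stand-in for the Conditional Split default output. Like the Conditional Split, the sketch sends each row to the first output whose condition is true, and it also shows that more than two outputs are possible.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def multicast(rows: List[Row], n_outputs: int) -> List[List[Row]]:
    # Multicast: every row goes to every output (each output gets its own copy).
    return [[dict(r) for r in rows] for _ in range(n_outputs)]


def conditional_split(rows: List[Row],
                      conditions: List[Tuple[str, Callable[[Row], bool]]],
                      default: str = "Default Output") -> Dict[str, List[Row]]:
    # Conditional Split: each row goes to exactly one output, the first whose
    # condition is true, or to the default output if none match.
    outputs: Dict[str, List[Row]] = {name: [] for name, _ in conditions}
    outputs[default] = []
    for row in rows:
        for name, condition in conditions:
            if condition(row):
                outputs[name].append(row)
                break
        else:
            outputs[default].append(row)
    return outputs


rows = [{"region": "East", "amount": 50}, {"region": "West", "amount": 500},
        {"region": "North", "amount": 5}]
copies = multicast(rows, 2)
split = conditional_split(rows, [
    ("Large", lambda r: r["amount"] >= 100),
    ("East", lambda r: r["region"] == "East"),
])
print(len(copies[0]), len(copies[1]))         # 3 3
print({k: len(v) for k, v in split.items()})  # {'Large': 1, 'East': 1, 'Default Output': 1}
```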
For question 1:

Can I put a breakpoint on the components in the Data Flow tab?

SSIS QUESTIONS AND ANSWERS


1. What is a package?
a) A discrete executable unit of work composed of a collection of control flow and other
objects, including data sources, transformations, process sequence, and rules, error
and event handling, and data destinations.

2. What is a workflow in SSIS?

a) A workflow is a set of instructions on how to execute tasks.
(It is a set of instructions on how to execute tasks such as sessions, emails, and shell
commands. A workflow is created from the Workflow Manager.)

3. What is the difference between control flow items and data flow items?
a) The control flow is the highest-level control process. It allows you to manage the
run-time activities of data flows and other processes within a
package.
When we want to extract, transform, and load data within a package, we add an SSIS
Data Flow task to the package control flow.

4. What are the main components of SSIS (project architecture)?

a) SSIS architecture has 4 main components:
1. SSIS service
2. SSIS runtime engine & runtime executables
3. SSIS data flow engine & data flow components
4. SSIS clients

5. What are the different components in an SSIS package?

a) 1. Control flow
2. Data flow
3. Event handler
4. Package explorer

Containers: provide structure and scope to your package.

Types of containers:
i. Task Host container: the TaskHost container services a single task.
ii. Sequence container: it can handle the flow of a subset of a package and can help you
divide a package into smaller, more manageable processes.
Uses: 1. Grouping tasks so that you can disable a part of the package that's no longer
needed.
2. Narrowing the scope of a variable to a container.
3. Managing the properties of multiple tasks in one step by setting the properties of the
container.
iii. For Loop container: evaluates an expression and repeats its workflow until the
expression evaluates to false.
iv. Foreach Loop container: defines a repeating control flow by using an enumerator.
The Foreach Loop container repeats the control flow for each member of a specified
enumerator.
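The two loop containers can be sketched in Python to show how they differ. The For Loop sketch mirrors the container's InitExpression / EvalExpression / AssignExpression properties; the Foreach sketch maps each enumerated item to a variable before running the body. This is an illustration only; the variable names and file names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List

Vars = Dict[str, object]


def for_loop_container(variables: Vars,
                       init_expression: Callable[[Vars], None],
                       eval_expression: Callable[[Vars], bool],
                       assign_expression: Callable[[Vars], None],
                       body: Callable[[Vars], None]) -> None:
    # Initialize once, then repeat the body until the evaluation expression is false.
    init_expression(variables)
    while eval_expression(variables):
        body(variables)
        assign_expression(variables)


def foreach_loop_container(variables: Vars, enumerator: Iterable[object],
                           variable_name: str, body: Callable[[Vars], None]) -> None:
    # Map each enumerated item into a variable, then run the body for it.
    for item in enumerator:
        variables[variable_name] = item
        body(variables)


log: List[str] = []
v: Vars = {}
for_loop_container(
    v,
    init_expression=lambda s: s.update({"User::Counter": 1}),
    eval_expression=lambda s: s["User::Counter"] <= 3,
    assign_expression=lambda s: s.update({"User::Counter": s["User::Counter"] + 1}),
    body=lambda s: log.append(f"batch {s['User::Counter']}"),
)
foreach_loop_container(v, ["a.csv", "b.csv"], "User::FileName",
                       lambda s: log.append(f"load {s['User::FileName']}"))
print(log)  # ['batch 1', 'batch 2', 'batch 3', 'load a.csv', 'load b.csv']
```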

Tasks: provide the functionality to your package.

A task is an individual unit of work.

Event handlers: respond to raised events in your package.

Precedence constraints: provide an ordinal relationship between various items in your package.


6. How do you deploy the package?
a) To deploy the package, we first need to configure some properties.
Go to Project -> Properties; in the window that opens, set CreateDeploymentUtility to
"True" and set the deployment output path to "bin\Deployment".

7. Connection manager?
a) It is a bridge between a package object and physical data. It provides a logical
representation of a connection at design time; the properties of the connection manager
describe the physical connection that Integration Services creates when the package is run.

8. Name the utilities used to execute (run) the package.

a) In BIDS, a package can be executed in debug mode by using the Debug menu or
toolbar, or from Solution Explorer.
In production, the package can be executed from the command line or from a Microsoft
Windows utility, or it can be scheduled for automated execution by using SQL Server
Agent.
i) Go to the Debug menu and select Start Debugging.
ii) Press the F5 key.
iii) Right-click the package and choose Execute Package.
iv) Command prompt utilities
a) DTExecUI
1. To open it: Start -> Run -> type dtexecui -> press Enter.
2. The Execute Package Utility dialog box opens.
3. In that dialog box, click Execute to run the package.
Wait until the package has executed successfully.
b) DTExec utility
1. Open a command prompt window.
2. In the command prompt window, type dtexec followed by the /DTS, /SQL, or /FILE option and
the package path, including the package name.
3. If the package encryption level is EncryptSensitiveWithPassword or EncryptAllWithPassword,
use the /DECRYPT option to provide the password.
If no password is included, dtexec will prompt you for the password.
4. Optionally, provide additional command-line options.
5. Press Enter.
6. Optionally, view logging and reporting information before closing the command prompt
window.
v) Using SQL Server Management Studio to execute the package
1. In SSMS, right-click a package, and then click Run Package.
The Execute Package Utility opens.
2. Execute the package as described previously.
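As a sketch, the dtexec steps above can be scripted from Python by building the argument list. The /FILE, /DECRYPT, and /SET options exist in dtexec, but verify the exact /SET property-path syntax against the dtexec documentation for your SSIS version; the package path, password, and variable name below are hypothetical. The function only builds the command and does not run it.

```python
from typing import Dict, List, Optional


def build_dtexec_command(package_path: str,
                         password: Optional[str] = None,
                         variables: Optional[Dict[str, str]] = None) -> List[str]:
    """Build (but do not run) a dtexec argument list for a package stored as a .dtsx file."""
    cmd = ["dtexec", "/FILE", package_path]
    if password is not None:
        # For packages saved with EncryptSensitiveWithPassword or EncryptAllWithPassword.
        cmd += ["/DECRYPT", password]
    for name, value in (variables or {}).items():
        # Override a package variable for this run only.
        cmd += ["/SET", f"\\Package.Variables[{name}].Value;{value}"]
    return cmd


cmd = build_dtexec_command(r"C:\packages\LoadSales.dtsx",
                           password="secret",
                           variables={"User::BatchId": "42"})
print(" ".join(cmd))
# dtexec /FILE C:\packages\LoadSales.dtsx /DECRYPT secret /SET \Package.Variables[User::BatchId].Value;42
```

On a machine with SSIS installed, the list could be passed to `subprocess.run(cmd, check=True)`; dtexec returns exit code 0 when the package runs successfully.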

9. How can you design an SCD in SSIS?

a) Definition: an SCD (slowly changing dimension) explains how to capture changes over a
period of time. It is closely related to change data capture, which detects the source
changes that an SCD then records.
Type 1: keeps the most recent values in the target. It does not maintain history.
Type 2: keeps the full history in the target database. For every update in the source,
a new record is inserted in the target.
Type 3: keeps current and previous information in the target.
In SSIS:
-------
Type 1: may require re-creating any aggregations that would be affected by the
change.
Type 2: changes can cause a serious inflation in the number of members of a dimension.
Type 3: as with a Type 1 change, a Type 3 change requires a dimension update, so you need
to re-process all aggregations affected after the change.
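A minimal Python sketch of what each SCD type does to a customer dimension when a customer's city changes. The column names (`sk` for the surrogate key, `start_date`, `end_date`, `previous_city`) are hypothetical. In SSIS itself, the Slowly Changing Dimension wizard calls Type 1 columns "changing attributes" and Type 2 columns "historical attributes"; this sketch only shows the resulting row changes.

```python
from datetime import date
from typing import Dict, List

Dim = Dict[str, Dict[str, object]]


def scd_type1(dim: Dim, customer_id: str, new_city: str) -> None:
    # Type 1: overwrite the attribute in place; no history is kept.
    dim[customer_id]["city"] = new_city


def scd_type2(rows: List[Dict[str, object]], customer_id: str,
              new_city: str, as_of: date) -> None:
    # Type 2: expire the current row, then insert a new current row
    # with a new surrogate key, so the full history is preserved.
    for row in rows:
        if row["customer_id"] == customer_id and row["end_date"] is None:
            row["end_date"] = as_of
    next_key = max(int(row["sk"]) for row in rows) + 1
    rows.append({"sk": next_key, "customer_id": customer_id, "city": new_city,
                 "start_date": as_of, "end_date": None})


def scd_type3(dim: Dim, customer_id: str, new_city: str) -> None:
    # Type 3: keep the current value plus one previous value in its own column.
    dim[customer_id]["previous_city"] = dim[customer_id]["city"]
    dim[customer_id]["city"] = new_city


t1: Dim = {"C1": {"city": "Austin"}}
scd_type1(t1, "C1", "Dallas")

t2: List[Dict[str, object]] = [{"sk": 1, "customer_id": "C1", "city": "Austin",
                                "start_date": date(2020, 1, 1), "end_date": None}]
scd_type2(t2, "C1", "Dallas", date(2024, 6, 1))

t3: Dim = {"C1": {"city": "Austin", "previous_city": None}}
scd_type3(t3, "C1", "Dallas")

print(t1["C1"], len(t2), t3["C1"])
```

Type 2 is the only one that adds rows, which is why it can inflate the number of dimension members, as noted above.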

10. How can you handle errors with the help of logging in SSIS?
a) Create an OnError event handler and add to it an Execute SQL task that logs the error.
11. What is a log file, and how do you send a log file to a manager?
a) Logging is especially useful when the package has been deployed to the production
environment and you cannot use BIDS and VSA to debug the package.
SSIS enables you to implement logging code through the Dts.Log method.
When the Dts.Log method is called in the script, the SSIS engine routes the message
to the log providers that are configured in the containing package.
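The OnError-handler idea above, combined with the "read the errors, create a file, mail it to the supervisor" part of the earlier scenario, can be sketched in Python. Assumptions: an in-memory SQLite database stands in for the SQL Server error table, the `ErrorLog` table and its columns are hypothetical, and the message is only built, not sent (in SSIS this last step would be a Send Mail task).

```python
import csv
import io
import sqlite3
from email.message import EmailMessage

# In-memory SQLite stands in for the SQL Server error table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ErrorLog (PackageName TEXT, SourceName TEXT, ErrorDescription TEXT)")


def on_error(package: str, source: str, description: str) -> None:
    # Roughly what an Execute SQL task inside an OnError event handler would insert.
    conn.execute("INSERT INTO ErrorLog VALUES (?, ?, ?)", (package, source, description))
    conn.commit()


def build_error_report(to_addr: str) -> EmailMessage:
    # Read every logged error, write it out as CSV text, and attach it to a message.
    rows = conn.execute(
        "SELECT PackageName, SourceName, ErrorDescription FROM ErrorLog").fetchall()
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["PackageName", "SourceName", "ErrorDescription"])
    writer.writerows(rows)
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["Subject"] = f"{len(rows)} SSIS error(s)"
    msg.set_content("The attached file lists the errors logged during the last run.")
    msg.add_attachment(buffer.getvalue(), subtype="csv", filename="errors.csv")
    return msg


on_error("LoadMonths", "Lookup Month", "No match for month 'Janury'")
report = build_error_report("supervisor@example.com")
print(report["Subject"])  # 1 SSIS error(s)
```

Against a real mail server, the message could be sent with `smtplib.SMTP(host).send_message(report)`, where `host` is whatever SMTP server your environment uses.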

12. What is environment variable in SSIS?


a) An environment variable configuration sets a package property equal to the value in
an environment variable.
Environment variable configurations are useful for configuring properties that depend on
the computer that is executing the package.
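A minimal Python sketch of what an environment variable configuration does: a design-time property value is replaced by the value of an environment variable on the machine running the package. The property path, connection name `Sales`, and variable `SSIS_SALES_SERVER` are hypothetical; SSIS applies configurations itself when it loads the package.

```python
import os
from typing import Dict


def apply_environment_configuration(properties: Dict[str, str],
                                    mapping: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the design-time property values in which each mapped
    property takes the value of its environment variable, when that variable is set."""
    configured = dict(properties)
    for property_path, env_var in mapping.items():
        value = os.environ.get(env_var)
        if value is not None:
            configured[property_path] = value
    return configured


SERVER_NAME = r"\Package.Connections[Sales].Properties[ServerName]"
design_time = {SERVER_NAME: "DEVSQL01"}
mapping = {SERVER_NAME: "SSIS_SALES_SERVER"}

os.environ["SSIS_SALES_SERVER"] = "PRODSQL01"  # as it might be set on the production machine
configured = apply_environment_configuration(design_time, mapping)
print(configured[SERVER_NAME])  # PRODSQL01
```

If the environment variable is not set, the design-time value is kept, so the same package runs unchanged on a development machine.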

13. What about multiple configurations?

a) It means using several configuration types together, including the XML configuration file,
environment variable, registry entry, parent package variable, SQL Server table, and direct
and indirect configuration types.

14. How do you provide security for packages?


a) In two ways
1. Package encryption
2. Password protection.

15. For error handling in transformations (T/R), which option gives better performance:
fail component, redirect row, or ignore failure?
a) Redirect row provides better performance for error handling.

16. Staging area?

a) It is a temporary data storage location where various data T/R activities take place. A
staging area is the kitchen of a data warehouse.

17. Task?
a) An individual unit of work.
Types:
1. ActiveX Script task
2. Analysis Services Execute DDL task
3. Analysis Services Processing task
4. Bulk Insert task *
5. Data Flow task *
6. Data Mining Query task
7. Execute DTS 2000 Package task
8. Execute Package task *
9. Execute Process task
10. Execute SQL task *
11. File System task
12. FTP task
13. Message Queue task
14. Script task *
15. Send Mail task *
16. Web Service task
17. WMI Data Reader task
18. WMI Event Watcher task
19. XML task

18. Event handling & logging?

a) You can choose whether a T/R fails and exits upon an error, or whether the bad rows are
redirected to a failed data flow branch (ignore failure, redirect row).
Logging is also improved: more than 12 events can be logged for each task
or package. You can enable partial logging for one task and much more detailed
logging for billing tasks.
Ex: OnError
OnPostValidate
OnProgress
OnWarning
---> Log entries can be written to the following log providers:
SQL Server Profiler
Text files
SQL Server
Windows Event Log
XML files

19. Import & Export Wizard?

a) The easiest method to move data from sources like Oracle, DB2, and SQL Server.
Right-click the database name -> Tasks -> Import and Export Wizard
Select the source
Select the destination
Copy tables or write a query
Execute
Finish

20. Solution Explorer?
After creating a project, Solution Explorer shows:
project name
-data source
-data source views
-packages
-miscellaneous

21. Precedence constraints?


a) Constraints that link executables, containers, and tasks within the package control flow
and specify conditions that determine the sequence in which executables run
and whether they run at all.

22. Data pipeline?


a) The memory-based, multithreaded, buffered transformation process that moves data through
an SSIS Data Flow task during package execution.

23. TRANSFORMATIONS?
A transformation is an object that generates, modifies, or passes data.
1. AGGREGATE T/R: applies an aggregate function to grouped records and produces
new output records from the aggregated results.
2. AUDIT T/R: adds the value of a system variable, such as machine name or
execution instance GUID, to a new output column.
3. CHARACTER MAP T/R: makes string data changes, such as changing data
from lower case to upper case.
4. CONDITIONAL SPLIT: separates input rows into separate output data pipelines
based on the Boolean expressions configured for each output.
5. COPY COLUMN: adds a copy of a column to the T/R output; we can later transform the
copy, keeping the original for auditing purposes.
6. DATA CONVERSION: converts a column's data type to another data type.
7. DATA MINING QUERY: performs a data mining query against Analysis Services.
8. DERIVED COLUMN: creates a new derived column calculated from an expression.
9. EXPORT COLUMN: allows you to export a column from the data flow to a file.
10. FUZZY GROUPING: performs data cleansing by finding rows that are likely
duplicates.
11. FUZZY LOOKUP: matches and standardizes data based on fuzzy logic.
Eg: transform the name jon to john
12. IMPORT COLUMN: reads data from a file and adds it to the data flow.
13. LOOKUP: performs a lookup of data to be used later in a transformation.
Ex: a T/R to look up a city based on zip code.
Uses:
1. Getting a related value from a table using a key column value
2. Updating a slowly changing dimension table
3. Checking whether records already exist in the table
14. MERGE: merges two sorted data sets into a single data flow.
15. MERGE JOIN: merges two data sets into a single data set using a join.
16. MULTICAST: sends a copy of the data to additional paths in the workflow.
17. ROW COUNT: stores the row count from the data flow in a variable.
18. ROW SAMPLING: captures a sample of the data from the data flow by using a row
count of the total rows in the data flow.
19. UNION ALL: merges multiple data sets into a single data set.
20. PIVOT: converts rows into columns.
21. UNPIVOT: converts columns into rows.
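For the month-code scenario asked about earlier, the Lookup transformation is the natural fit: look each month name up in the Months table and send rows without a match to the error table. A minimal Python sketch of that logic; the Months contents and input rows are hypothetical, and the case-insensitive matching is a design choice of this sketch (an SSIS Lookup is case-sensitive by default). In SSIS 2005, unmatched rows go to the Lookup's error output; later versions add a separate no-match output.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def lookup_month_codes(rows: List[Row],
                       months: Dict[str, int]) -> Tuple[List[Row], List[Row]]:
    """Replace each row's Months name with its numeric code.
    Rows without a match are returned separately, as an error output would."""
    matched: List[Row] = []
    no_match: List[Row] = []
    for row in rows:
        key = str(row["Months"]).strip().lower()  # normalization chosen for this sketch
        if key in months:
            matched.append({**row, "Months": months[key]})
        else:
            no_match.append(row)  # candidate for the error table
    return matched, no_match


# Reference data as it might be cached from the Months table (hypothetical contents).
months_table = {"january": 1, "jan": 1, "february": 2, "feb": 2, "march": 3, "mar": 3}

rows = [{"Customer": "A", "Months": "January"},
        {"Customer": "B", "Months": "Feb"},
        {"Customer": "C", "Months": "Janury"}]
matched, errors = lookup_month_codes(rows, months_table)
print(matched)  # [{'Customer': 'A', 'Months': 1}, {'Customer': 'B', 'Months': 2}]
print(errors)   # [{'Customer': 'C', 'Months': 'Janury'}]
```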

24. Batch?
a) A batch is defined as a group of sessions. There are 2 types:
1. Parallel batch processing
2. Sequential batch processing

-----For executing the package, we can use the "Execute Package Utility"-----

-----For deploying the package, we can use the "Package Deployment Utility"-----

SSRS:--

1. What are the main components of reporting services?


a) Report designer, report server, report manager, report user.

2. Where can you publish the report?

a) By using Report Designer, or by publishing reports on the report server.
3. What are the necessary elements for creating a matrix report?
a) Page, column, row, details

4. For generating reports, which is used: an RDBMS or a cube?

a) It depends on the data.

5. What is assembly code in SSRS?


a)

6. What is Rdl file?


a) RDL stands for Report Definition Language. Every report is saved with the .rdl extension.

7. How can you create a job?

a) A job is a group of reports.
The reports can be processed either sequentially or in parallel.

SSAS:-

1. What are fixed measures and calculated measures?

a) Normally we use fixed measures in SSIS, mainly for calculating measures,
whereas calculated measures are used in SSAS: while creating a cube, we can define
the calculated measure in OLAP.

2. What are measures?


a) Measures are numeric data based on columns in a fact table.

3. What are cubes?


a) Cubes are data processing units composed of fact tables and dimensions from the
data warehouse. They provide multidimensional analysis.

4. What are virtual cubes?


These are a combination of one or more real cubes and require no disk space to store
them. They store only the definition and not the data.

DATA WAREHOUSE CONCEPTS:-


1. Difference between OLTP and OLAP?
a)

OLTP                              OLAP
Transactional processing          Query processing
Time sensitive                    History oriented
Operators' and clerks' view       Managers', CEOs', and PMs' view
Organized by transaction          Organized by subject
Relatively small DB               Large DB
Volatile data                     Non-volatile data
Stores all data                   Stores relevant data
Not flexible                      Flexible

2. Difference between star schema and snowflake schema?

a) STAR SCHEMA
1. A centrally located fact table surrounded by denormalized dimensions.
2. All dimensions link directly with the fact table.
3. It is easy to understand for end users or technical people.
4. We can easily retrieve data by passing simple queries.
5. It increases query performance because it involves fewer joins.

SNOWFLAKE SCHEMA
1. A centrally located fact table surrounded by normalized dimension tables.
2. Dimensions link with each other (or have 1-N relationships with other tables).
3. It is difficult to understand.
4. It is difficult to retrieve data while parsing the query against the facts and dimensions.
5. More joins.

What are fact tables?


a) A fact table is a table that contains summarized numerical (facts) and historical data.
The fact table has a foreign key-primary key relationship with a dimension table. The fact
table maintains the information in 3rd normal form.

3. Types of facts?
a)
1. Additive: able to add the facts along all the dimensions
- discrete numerical measures
- Ex: retail sales in $
2. Semi-additive: snapshot taken at a point in time
- measure of intensity
- not additive along the time dimension
- Ex: account balance, inventory balance
3. Non-additive: numerical measures that can't be added across any dimension
- intensity measure arranged across all dimensions
- Ex: room temperatures, averages
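A small Python sketch of why the three fact types aggregate differently, using hypothetical sales and daily balance snapshots. Sales can be summed across every dimension; balances can be summed across accounts for one day but not across days (the closing balance takes the last snapshot); an average has to be recomputed from the detail rather than summed.

```python
from typing import List, Tuple

Fact = Tuple[str, str, float]  # (day, store or account, amount)

# Hypothetical facts.
sales: List[Fact] = [("2024-01-01", "store1", 100.0), ("2024-01-02", "store1", 150.0),
                     ("2024-01-02", "store2", 50.0)]
balances: List[Fact] = [("2024-01-01", "acct1", 500.0), ("2024-01-02", "acct1", 520.0),
                        ("2024-01-01", "acct2", 300.0), ("2024-01-02", "acct2", 280.0)]


def total_sales(rows: List[Fact]) -> float:
    # Additive: summing across days and stores is meaningful.
    return sum(amount for _, _, amount in rows)


def balance_on(rows: List[Fact], day: str) -> float:
    # Semi-additive: summing across accounts for a single day is meaningful...
    return sum(bal for d, _, bal in rows if d == day)


def closing_balance(rows: List[Fact]) -> float:
    # ...but across time we take the latest snapshot instead of adding days together.
    last_day = max(d for d, _, _ in rows)
    return balance_on(rows, last_day)


def average_balance(rows: List[Fact]) -> float:
    # Non-additive: an average is recomputed from the detail rows, never summed.
    return sum(bal for _, _, bal in rows) / len(rows)


print(total_sales(sales))          # 300.0
print(closing_balance(balances))   # 800.0 (a naive sum over all days would give 1600.0)
print(average_balance(balances))   # 400.0
```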

4. Data warehouse?
a) A data warehouse is a collection of data marts representing historical data from different
operational data sources (OLTP).
The data from these OLTP sources is structured and optimized for querying and data analysis
in a data warehouse.

5. Data mart?
a) A data mart is a subset of a data warehouse that can provide data for reporting and
analysis on a section, unit, or department, such as the sales or HR department.

6. What is OLAP?
a) OLAP stands for online analytical processing. It uses database tables (fact and
dimension tables) to enable multidimensional viewing, analysis, and querying of large
amounts of data.

7. What is OLTP?
a) OLTP stands for online transaction processing. Except for data warehouse databases,
the other databases are OLTP.
These OLTP databases use a normalized schema structure.
These OLTP databases are designed for recording the daily operations and transactions
of a business.

8. What are dimensions?


Dimensions are categories by which summarized data can be viewed. For example, a
profit summary fact table can be viewed by a time dimension.

9. What are conformed dimensions?

a) Dimensions that are reusable and fixed in nature, for example customer, time, and
geography dimensions.

10. Staging area?


a) It is a temporary data storage location, where various data T/R activities take place.

11. Fact grain (granularity)?


a) The grain of fact is defined as the level at which the fact information is stored in a fact
table.

12. What is a factless fact table?

a) A fact table that does not contain any facts is called a factless fact table.
Generally, when we need to combine two data marts, one data mart will have a factless
fact table and the other one a common fact table.

13. What are measures?


a) Measures are numeric data based on columns in a fact table.

14. What are cubes?


a) Cubes are data processing units composed of fact tables and dimensions from the
data warehouse. They provide multidimensional analysis.

15. What are virtual cubes?


a) These are combinations of one or more real cubes and require no disk space to store
them. They store only the definition, not the data.
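16. SCDs (slowly changing dimensions)?
a)
type-I (current data only: the old value is overwritten)
type-II (full historical information & current data: a new row is added for each change)
type-III (current data & recent data: a "previous value" column keeps the prior value)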
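Type II is the one usually asked about in detail. The plain-Python sketch below uses a hypothetical customer dimension with `city` as the tracked attribute. When the city changes, it closes the current row and appends a new row with a new surrogate key; a Type I version would simply overwrite `city`, and a Type III version would keep a `previous_city` column instead.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass
class CustomerRow:
    surrogate_key: int
    customer_id: str          # natural (business) key
    city: str                 # tracked attribute (hypothetical)
    valid_from: date
    valid_to: Optional[date]  # None means "current row"

def apply_scd2(dim: List[CustomerRow], customer_id: str, city: str, as_of: date) -> None:
    """Type II: expire the current row and add a new one when the tracked attribute changes."""
    current = next((r for r in dim if r.customer_id == customer_id and r.valid_to is None), None)
    if current is not None and current.city == city:
        return  # no change, nothing to do
    if current is not None:
        current.valid_to = as_of  # close out the old version, keeping it as history
    next_key = max((r.surrogate_key for r in dim), default=0) + 1
    dim.append(CustomerRow(next_key, customer_id, city, as_of, None))

dim: List[CustomerRow] = []
apply_scd2(dim, "C1", "Pune", date(2023, 1, 1))
apply_scd2(dim, "C1", "Delhi", date(2024, 6, 1))
for row in dim:
    print(row.surrogate_key, row.city, row.valid_from, row.valid_to)
# 1 Pune 2023-01-01 2024-06-01
# 2 Delhi 2024-06-01 None
```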

SQL-SERVER-2005:-

1. Surrogate key?
a) It is an artificial or synthetic key that is used as a substitute for a natural key.
It is just a unique identifier or number for each row that can be used as the primary key
of the table.
(It is a sequence-generated key that is assigned as the primary key of the table.)
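A quick way to see a surrogate key next to a natural key: in the sketch below (Python's sqlite3 module, hypothetical `dim_customer` table), SQLite fills the INTEGER PRIMARY KEY column automatically, roughly the way an IDENTITY column does in SQL Server, while the business code from the source system is kept as an ordinary column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dim_customer (
        customer_sk   INTEGER PRIMARY KEY,  -- surrogate key, generated by the database
        customer_code TEXT NOT NULL,        -- natural/business key from the source system
        name          TEXT
    )
    """
)
conn.execute("INSERT INTO dim_customer (customer_code, name) VALUES ('CUST-001', 'Asha')")
conn.execute("INSERT INTO dim_customer (customer_code, name) VALUES ('CUST-002', 'Ravi')")
rows = conn.execute(
    "SELECT customer_sk, customer_code FROM dim_customer ORDER BY customer_sk"
).fetchall()
print(rows)  # [(1, 'CUST-001'), (2, 'CUST-002')]
```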

2. Primary key?
a) A primary key uniquely identifies every row of the table.
------unique + not null-------

3. Foreign key?
a) It is a column or combination of columns whose values must match values in the primary
key (or a unique key) of another table.
It may be NULL and need not be unique.
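The rules above can be checked directly. This sketch uses Python's sqlite3 module with hypothetical `dept` and `emp` tables; note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, while SQL Server enforces them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.execute("CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, dept_id INTEGER REFERENCES dept(dept_id))"
)
conn.execute("INSERT INTO dept VALUES (1, 'Sales')")

conn.execute("INSERT INTO emp VALUES (100, 1)")     # matches a dept primary key value
conn.execute("INSERT INTO emp VALUES (101, 1)")     # FK values need not be unique
conn.execute("INSERT INTO emp VALUES (102, NULL)")  # FK may be NULL

errors = []
for sql in ("INSERT INTO dept VALUES (1, 'HR')",    # duplicate primary key
            "INSERT INTO emp VALUES (103, 99)"):    # no dept 99, so the FK check fails
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError as exc:
        errors.append(str(exc))
print(errors)
```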

4. Composite key?
a)It is a primary key consisting of more than one column.

4. Indexes?
a) An index is an access structure that keeps key values sorted so that records in the table
can be searched quickly.
Indexes are essential to improve the speed with which records can be located and
retrieved from a table.
Types: - clustered index (only one per table, because it determines the physical order of the rows)
Non-clustered index (up to 249 per table in SQL Server 2005)
Unique index
Composite index
(Other types, mostly Oracle terminology: simple index, reverse key index, bitmap index, function-based index)
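The effect of an index shows up in the query plan. In this sketch (Python's sqlite3 module, hypothetical `emp` table), SQLite scans the whole table until a secondary index, which plays the role of a non-clustered index here, is created on the filtered column. SQL Server exposes the same idea through its execution plans, with different wording.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, dept TEXT, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp (dept, sal) VALUES (?, ?)",
    [("d%d" % (i % 50), i) for i in range(1000)],
)

def plan(sql: str) -> str:
    # The last column of each EXPLAIN QUERY PLAN row is a human-readable description.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM emp WHERE dept = 'd7'"
before = plan(query)                                      # full table scan
conn.execute("CREATE INDEX idx_emp_dept ON emp (dept)")  # secondary (non-clustered) index
after = plan(query)                                       # search using idx_emp_dept
print(before)
print(after)
```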

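5. View?
a) A view is a stored query that acts as a virtual table; only its definition is stored, not the data.
It is used for data security (users see only the columns and rows the view exposes)
and to avoid keeping redundant copies of the data.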
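A small illustration of the security use: the view below exposes employee details but hides the salary column. It uses Python's sqlite3 module and hypothetical table names; SQLite has no GRANT statement, so in SQL Server you would additionally grant SELECT on the view rather than on the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp (name, dept, sal) VALUES (?, ?, ?)",
    [("Asha", "Sales", 50000), ("Ravi", "HR", 42000)],
)

# The view stores only the query text; the sensitive sal column is not exposed.
conn.execute("CREATE VIEW emp_public AS SELECT emp_id, name, dept FROM emp")

cur = conn.execute("SELECT * FROM emp_public ORDER BY emp_id")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(columns)  # ['emp_id', 'name', 'dept']
print(rows)     # [(1, 'Asha', 'Sales'), (2, 'Ravi', 'HR')]
```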

6. Cluster?
a) 1-many access path.
Clusters (an Oracle feature) are used to store data from different tables in the same physical data blocks.

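7. Sequences?
a) A sequence is a database object that generates a series of unique numbers, typically used to
populate surrogate or primary key values (an Oracle feature; SQL Server added sequences only in SQL Server 2012).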

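8. Cursors?
a) A cursor is a pointer to the result set of a query that lets you process its rows one at a time.
Types (PL/SQL terminology):
Implicit cursor
Explicit cursor
Parameter cursor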

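9. Triggers?
a) A trigger is code that runs automatically when an INSERT, UPDATE or DELETE happens on a table.
Row trigger (fires once for each affected row)
Statement trigger (fires once per statement)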
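An audit trigger is the classic example. SQLite, used here through Python's sqlite3 module with hypothetical table names, supports only row-level (FOR EACH ROW) triggers, so one UPDATE touching two rows writes two audit rows. SQL Server DML triggers, by contrast, fire once per statement and see all affected rows in the inserted/deleted tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, sal INTEGER);
    CREATE TABLE sal_audit (emp_id INTEGER, old_sal INTEGER, new_sal INTEGER);

    -- Row trigger: fires once for every row the UPDATE touches.
    CREATE TRIGGER trg_emp_sal AFTER UPDATE OF sal ON emp
    FOR EACH ROW
    BEGIN
        INSERT INTO sal_audit VALUES (OLD.emp_id, OLD.sal, NEW.sal);
    END;

    INSERT INTO emp VALUES (1, 100), (2, 200);
    """
)
conn.execute("UPDATE emp SET sal = sal + 10")  # one statement, two rows, two audit rows
audit = conn.execute("SELECT * FROM sal_audit ORDER BY emp_id").fetchall()
print(audit)  # [(1, 100, 110), (2, 200, 210)]
```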
10. Transactions?
a) A transaction is a unit of work that either completes entirely or not at all.
Save point
Commit & rollback.
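Save points, commit and rollback work together as shown in this sketch (Python's sqlite3 module, hypothetical table `t`). The SQLite syntax is SAVEPOINT / ROLLBACK TO; the SQL Server equivalents are SAVE TRANSACTION name / ROLLBACK TRANSACTION name.

```python
import sqlite3

# isolation_level=None puts the driver in autocommit mode, so the BEGIN, SAVEPOINT
# and COMMIT statements below are sent exactly as written.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (x INTEGER)")

conn.execute("BEGIN")
conn.execute("INSERT INTO t VALUES (1)")
conn.execute("SAVEPOINT before_second")
conn.execute("INSERT INTO t VALUES (2)")
conn.execute("ROLLBACK TO before_second")  # undo only the work done after the save point
conn.execute("COMMIT")                      # make the remaining work permanent

conn.execute("BEGIN")
conn.execute("INSERT INTO t VALUES (3)")
conn.execute("ROLLBACK")                    # undo the whole transaction

result = conn.execute("SELECT x FROM t").fetchall()
print(result)  # [(1,)]
```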
11. Security?
a) Encryption
Locking
Lock levels (granularity): row level, page level, table level
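12. Constraints?
a) primary key -> unique and not null; identifies each row
foreign key (references) -> values must exist in the referenced table's key
check -> values must satisfy a condition, e.g. sal > 0
unique -> no duplicate values in the column(s)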
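Each constraint rejects a different kind of bad row. A minimal sketch with Python's sqlite3 module and a hypothetical `emp` table (the primary and foreign key cases follow the same pattern):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE emp (
        emp_id INTEGER PRIMARY KEY,
        email  TEXT NOT NULL UNIQUE,    -- not null + unique constraints
        sal    INTEGER CHECK (sal > 0)  -- check constraint
    )
    """
)
conn.execute("INSERT INTO emp (email, sal) VALUES ('a@x.com', 100)")

rejected = []
for email, sal in [("a@x.com", 200),   # duplicate email, so UNIQUE fails
                   ("b@x.com", -5),    # negative salary, so CHECK fails
                   (None, 300)]:       # missing email, so NOT NULL fails
    try:
        conn.execute("INSERT INTO emp (email, sal) VALUES (?, ?)", (email, sal))
    except sqlite3.IntegrityError as exc:
        rejected.append(str(exc))
print(len(rejected), "rows rejected")  # 3 rows rejected
```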
13. Diff b/w having and where?
a) 'where' filters individual rows before the 'group by' is applied, so it cannot use
aggregate functions.
'having' filters the groups produced by 'group by', so its condition can use aggregates
such as count(*) or sum(sal).
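Both clauses can appear in one query. In this sketch (Python's sqlite3 module, hypothetical `emp` data), WHERE drops a low-salary row before grouping and HAVING then drops the departments whose total is too small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", [
    ("A", "Sales", 300), ("B", "Sales", 500), ("C", "HR", 400),
    ("D", "HR", 100), ("E", "IT", 900),
])

rows = conn.execute(
    """
    SELECT dept, SUM(sal) AS total
    FROM emp
    WHERE sal >= 200          -- row filter: drops D before grouping
    GROUP BY dept
    HAVING SUM(sal) > 500     -- group filter: keeps only the larger departments
    ORDER BY dept
    """
).fetchall()
print(rows)  # [('IT', 900), ('Sales', 800)]
```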
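14. Joins?
a) A join combines information from two tables into a single result set.
inner join:-
matches the records of both tables based on one or more common fields (matched
records only).
outer join:-
returns the matched records plus the unmatched records from one side (left/right outer
join) or both sides (full outer join), filling the missing columns with NULL.
full join:- combines all rows from both sides of the join.
cross join:- returns the Cartesian product: every row of the first table paired with every
row of the second.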
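The sketch below contrasts inner, left outer and cross joins using Python's sqlite3 module and hypothetical `emp`/`dept` tables. It sticks to these three because RIGHT and FULL OUTER JOIN are only available in SQLite 3.39 and later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dept (dept_id INTEGER, dname TEXT);
    CREATE TABLE emp  (ename TEXT, dept_id INTEGER);
    INSERT INTO dept VALUES (1, 'Sales'), (2, 'HR');
    INSERT INTO emp  VALUES ('Asha', 1), ('Ravi', 3);
    """
)

inner = conn.execute(
    "SELECT e.ename, d.dname FROM emp e INNER JOIN dept d ON e.dept_id = d.dept_id"
).fetchall()
left = conn.execute(
    "SELECT e.ename, d.dname FROM emp e LEFT OUTER JOIN dept d "
    "ON e.dept_id = d.dept_id ORDER BY e.ename"
).fetchall()
cross = conn.execute("SELECT COUNT(*) FROM emp CROSS JOIN dept").fetchone()[0]

print(inner)  # [('Asha', 'Sales')]
print(left)   # [('Asha', 'Sales'), ('Ravi', None)]
print(cross)  # 4  (2 employees x 2 departments)
```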
15. Union & union-all?
a) Both require the same number of columns, with compatible data types, in each select.
Union:- returns distinct values only
duplicates are removed
Union-all:- returns all rows, including duplicates (no duplicate-removal step)
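A two-table example makes the difference concrete (Python's sqlite3 module, hypothetical tables `a` and `b` that share the value 2):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (2), (3);
    """
)
union = conn.execute("SELECT x FROM a UNION SELECT x FROM b ORDER BY x").fetchall()
union_all = conn.execute("SELECT x FROM a UNION ALL SELECT x FROM b ORDER BY x").fetchall()
print(union)      # [(1,), (2,), (3,)]
print(union_all)  # [(1,), (2,), (2,), (3,)]
```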
16. Diff b/w drop, delete & truncate?
delete:- can delete all rows at a time
can delete selected rows based on a condition (where clause)
the storage allocated to the table is not released
table structure remains
truncate:- deletes all rows at a time
can't delete selected rows (no where clause)
the data pages allocated to the table are released
table structure remains
drop:- deletes all rows at a time
can't delete selected rows
the allocated storage is released
table structure is also deleted
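SQLite has no TRUNCATE statement (a DELETE without a WHERE clause is its closest equivalent), but the sketch below, using Python's sqlite3 module and a hypothetical `emp` table, still shows the key distinction: after DELETE the table structure remains, after DROP it is gone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (emp_id INTEGER, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", [(1, 100), (2, 200), (3, 300)])

def table_exists(name: str) -> bool:
    return conn.execute(
        "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()[0] == 1

conn.execute("DELETE FROM emp WHERE sal < 200")  # delete: only the selected rows
after_delete = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]

conn.execute("DELETE FROM emp")                  # all rows (closest thing to TRUNCATE here)
after_delete_all = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]
still_there = table_exists("emp")                # structure remains

conn.execute("DROP TABLE emp")                   # drop: rows and structure both go
print(after_delete, after_delete_all, still_there, table_exists("emp"))  # 2 0 True False
```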
----------------queries---------------
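1. Nth highest salary?
Oracle (&N is a SQL*Plus substitution variable):
select distinct sal from (select sal, dense_rank() over (order by sal desc) as rnk from emp)
where rnk = &N; (OR)
SQL Server (here N = 10; replace 10 with the required N):
select top 1 sal from (select distinct top 10 sal from emp order by sal desc) A order by sal asc;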
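A third, portable formulation uses DISTINCT with LIMIT/OFFSET, supported by SQLite, MySQL and PostgreSQL but not by SQL Server 2005. The sketch wraps it in a small hypothetical helper using Python's sqlite3 module; tied salaries count once.

```python
import sqlite3
from typing import Optional

def nth_highest_salary(conn: sqlite3.Connection, n: int) -> Optional[int]:
    # DISTINCT so tied salaries count once; OFFSET skips the n-1 higher salaries.
    row = conn.execute(
        "SELECT DISTINCT sal FROM emp ORDER BY sal DESC LIMIT 1 OFFSET ?", (n - 1,)
    ).fetchone()
    return row[0] if row else None  # None when there are fewer than n distinct salaries

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [("A", 500), ("B", 700), ("C", 700), ("D", 300)])
print(nth_highest_salary(conn, 2))  # 500
```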

Answers to Advanced SSIS Interview Questions


Last week I wrote a set of Advanced SSIS Interview Questions. Here are the answers I
came up with. Of course your answers may differ from mine, and I would
love to hear how you would answer them; feel free to email me.
I’ll skip the demonstrate/whiteboard part of the questions.
1. Demonstrate or whiteboard how you would suggest using configuration files in
packages. Would you consider it a best practice to create a configuration file for
each connection manager or one for the entire package?
There should be a single configuration file for each connection manager in your
packages that stores their connection string information. So if you have 6 connection
managers then you have 6 config files. You can use the same config file across all your
packages that use the same connections.
If you have a single config file that stores all your connection managers, then every
package that uses it must contain all the connection managers stored in that config file.
This means you may have to put connection managers in your package that you don't
even need.
2. Demonstrate or whiteboard how checkpoints work in a package.
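When checkpoints are enabled on a package (SaveCheckpoints set to True, a CheckpointFileName,
and CheckpointUsage set to IfExists), SSIS records in the checkpoint file which tasks completed
successfully. If the package fails, you can correct the problem and then rerun it from the point
of failure instead of rerunning the entire package. Set FailPackageOnFailure to True on the tasks
you want to be able to restart from. The obvious benefit is that if you loaded a million-record
file just before the package failed, you don't have to load it again.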
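The restart behaviour can be sketched in plain Python as a conceptual analog. This is not how SSIS implements checkpoints internally: the JSON file stands in for the checkpoint file, and `run_with_checkpoints`, `load_big_file` and `transform` are hypothetical names. Completed task names are saved after each success, a rerun skips them, and the file is deleted once the whole run succeeds.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, List, Tuple

def run_with_checkpoints(tasks: List[Tuple[str, Callable[[], None]]], checkpoint: Path) -> None:
    """Run tasks in order, skipping any that a previous failed run recorded as done."""
    done = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    for name, action in tasks:
        if name in done:
            continue                                   # finished in an earlier run: don't redo it
        action()                                       # may raise; earlier progress is already saved
        done.add(name)
        checkpoint.write_text(json.dumps(sorted(done)))
    checkpoint.unlink(missing_ok=True)                 # full success: next run starts from scratch

# Usage: the second task fails on the first run, then succeeds on the rerun.
calls: List[str] = []
state = {"fail": True}

def load_big_file() -> None:
    calls.append("load")

def transform() -> None:
    if state["fail"]:
        raise RuntimeError("simulated failure")
    calls.append("transform")

cp = Path(tempfile.mkdtemp()) / "package.checkpoint.json"
tasks = [("load", load_big_file), ("transform", transform)]
try:
    run_with_checkpoints(tasks, cp)
except RuntimeError:
    pass                                               # package failed after "load" completed
state["fail"] = False
run_with_checkpoints(tasks, cp)                        # resumes at "transform"
print(calls)  # ['load', 'transform']
```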
3. Demonstrate or whiteboard using a loop in a package so each file in a directory
with the .txt extension is loaded into a table. Before demonstrating this tell which
task/container accomplishes this and which enumerator will be used. (Big hint on
which task/container to use is that it requires an enumerator.)
This would require a Foreach Loop Container using the Foreach File Enumerator. In the
Foreach Loop Editor, set the Folder to the directory that holds the files and Files to *.txt.
On the Variable Mappings page, map a variable that receives the fully qualified path of the
current file on each iteration. Next, select the connection manager used to load the files
and add an expression on its ConnectionString property that uses this variable.
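The same pattern (enumerate *.txt files, feed each path into the load) can be sketched in plain Python as an analog of the Foreach Loop, not SSIS itself. The `staging` table, the `load_txt_files` helper and the sample file names are hypothetical, and an in-memory SQLite database stands in for the destination.

```python
import sqlite3
import tempfile
from pathlib import Path

def load_txt_files(folder: Path, conn: sqlite3.Connection) -> int:
    """Load every *.txt file in folder into a staging table; return the number of rows loaded."""
    conn.execute("CREATE TABLE IF NOT EXISTS staging (file_name TEXT, line TEXT)")
    total = 0
    for path in sorted(folder.glob("*.txt")):  # the "enumerator": one file per iteration
        # 'path' plays the role of the mapped variable that drives the connection string.
        lines = path.read_text().splitlines()
        conn.executemany("INSERT INTO staging VALUES (?, ?)",
                         [(path.name, line) for line in lines])
        total += len(lines)
    conn.commit()
    return total

folder = Path(tempfile.mkdtemp())
(folder / "jan.txt").write_text("a\nb\n")
(folder / "feb.txt").write_text("c\n")
(folder / "notes.csv").write_text("ignored\n")  # not matched by *.txt
conn = sqlite3.connect(":memory:")
loaded = load_txt_files(folder, conn)
print(loaded)  # 3
```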
4. Demonstrate or whiteboard how transactions work in a package.
If transactions are enabled on your package and tasks, then when the package fails it will
roll back everything that occurred during the package. First make sure MSDTC (Microsoft
Distributed Transaction Coordinator) is enabled in Control Panel -> Administrative
Tools -> Component Services. Transactions must be enabled not only at the package
level but also on each task you want included in the transaction. To have the
entire package in one transaction, set TransactionOption to Required at the package level
and to Supported on each task.
5. If you have a package that runs fine in Business Intelligence Development
Studio (BIDS) but fails when running from a SQL Agent Job what would be your
first guess on what the problem is?
The account that runs SQL Agent Jobs likely doesn’t have the needed permissions for
one of the connections in your package. Either elevate the account permissions or
create a proxy account.
To create a proxy account, first create a new credential with the appropriate
permissions. Next, create a proxy account that uses that credential and give it access to
the SSIS package subsystem. Finally, select the new proxy account in the job step's Run As setting.
6. What techniques would you consider to add auditing to your packages? You’re
required to log when a package fails and how many rows were extracted and
loaded in your sources and destinations.
I like to create a database dedicated to package auditing. Track the row counts
coming from each source and how many actually make it to each destination. Row counts
and package execution results should all be in one location, and you can then optionally
report off that database.
There are also third party tools that can accomplish this for you (Pragmatic Works BI
xPress).
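A minimal sketch of such an audit table, assuming hypothetical table, column and package names and using Python's sqlite3 module in place of the audit database. In a package, the counts would typically come from Row Count transformations writing to variables; here they are passed in directly. The report query flags failed runs and runs where rows went missing between source and destination.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")  # stands in for a dedicated audit database
conn.execute(
    """
    CREATE TABLE package_audit (
        package_name   TEXT,
        run_at         TEXT,
        status         TEXT,     -- 'Succeeded' or 'Failed'
        rows_extracted INTEGER,
        rows_loaded    INTEGER
    )
    """
)

def log_run(package_name: str, status: str, extracted: int, loaded: int) -> None:
    conn.execute(
        "INSERT INTO package_audit VALUES (?, ?, ?, ?, ?)",
        (package_name, datetime.now(timezone.utc).isoformat(), status, extracted, loaded),
    )

log_run("LoadCustomers", "Succeeded", 1000, 1000)
log_run("LoadOrders", "Failed", 5000, 4200)

# Report: failed runs, or runs where rows were lost between source and destination.
problems = conn.execute(
    "SELECT package_name, rows_extracted - rows_loaded FROM package_audit "
    "WHERE status = 'Failed' OR rows_extracted <> rows_loaded"
).fetchall()
print(problems)  # [('LoadOrders', 800)]
```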
7. What techniques would you consider to add notification to your packages?
You’re required to send emails to essential staff members immediately after a
package fails.
This can be set either in the SQL Agent job that runs the package (a job notification that
emails an operator when the job fails) or inside the package, by adding a Send Mail Task to
the package's OnError event handler so staff are notified as soon as the package fails.
There are also third party tools that can accomplish this for you (Pragmatic Works BI
xPress).
8. Demonstrate or whiteboard techniques you would use for CDC (Change Data
Capture). Tell how you would write a package that loads data but first detects whether
the data already exists, exists but has changes, or is brand new data for the
destination.
If for some reason you've avoided using a whiteboard to show your ideas up to this point,
make sure you start on this question! For small amounts of data I may use the
Slowly Changing Dimension transform.
More often than not, the data is too large for such a slow transform. I prefer to do a
Lookup on the key of the target table: rows that don't match are new rows that can be
inserted. Rows that do match are either updates or unchanged duplicates; determine which by
using a Conditional Split that compares the target columns with the incoming columns. Send
updates to a staging table, which can then be applied to the target with an Execute SQL
Task.
Explain that staging the updates instead of updating through the OLE DB Command is much
better for performance: the OLE DB Command runs once per row, whereas the Execute SQL Task
applies all the updates in one set-based statement.
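The lookup, split and staged-update flow can be mirrored outside SSIS. This sketch uses Python's sqlite3 module with hypothetical `target` and `stage_updates` tables: a dictionary plays the Lookup, two list filters play the Conditional Split, and a single UPDATE from the staging table plays the Execute SQL Task. In SQL Server the final statement would more likely be written as UPDATE ... FROM ... JOIN.

```python
import sqlite3
from typing import Dict, List, Tuple

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE stage_updates (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    INSERT INTO target VALUES (1, 'Asha', 'Pune'), (2, 'Ravi', 'Delhi');
    """
)
incoming: List[Tuple[int, str, str]] = [
    (1, "Asha", "Pune"),      # exists, unchanged: ignore
    (2, "Ravi", "Mumbai"),    # exists, changed: update
    (3, "Meena", "Chennai"),  # no match: insert
]

# "Lookup": cache the target rows by key.
lookup: Dict[int, Tuple[str, str]] = {
    row[0]: (row[1], row[2]) for row in conn.execute("SELECT id, name, city FROM target")
}

# "Conditional Split": route each incoming row.
new_rows = [r for r in incoming if r[0] not in lookup]
changed = [r for r in incoming if r[0] in lookup and lookup[r[0]] != (r[1], r[2])]

conn.executemany("INSERT INTO target VALUES (?, ?, ?)", new_rows)
conn.executemany("INSERT INTO stage_updates VALUES (?, ?, ?)", changed)

# "Execute SQL Task": one set-based UPDATE driven by the staging table.
conn.execute(
    """
    UPDATE target
    SET name = (SELECT s.name FROM stage_updates s WHERE s.id = target.id),
        city = (SELECT s.city FROM stage_updates s WHERE s.id = target.id)
    WHERE id IN (SELECT id FROM stage_updates)
    """
)
conn.commit()
final = conn.execute("SELECT * FROM target ORDER BY id").fetchall()
print(final)  # [(1, 'Asha', 'Pune'), (2, 'Ravi', 'Mumbai'), (3, 'Meena', 'Chennai')]
```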
9. Explain what breakpoints are and how you would use them.
Breakpoints pause your package during execution. They are a great tool for debugging a
package because you can place a breakpoint on a task or container and it will pause the
package when a chosen execution event occurs (for example OnPreExecute or OnPostExecute).
One case where I have used breakpoints is a looping container where I want to see how my
variables change on each iteration. Set a breakpoint on the container to break at the
beginning of every iteration of the loop; then, while the package is paused, open a Watch
window and type in the variable name.