
Interview Topics for SQL & MSBI

Table of Contents
Recovery Models
Simple Recovery Model
Full Recovery Model
Bulk-Logged
Back-ups
Back-up Scopes
A) Database backups
B) Partial Back-ups
C) File Back-ups
Back-Up Types
A) Full Backups
B) Differential backups
SQL SERVER REPLICATION
A) Load Balancing
B) Offline Processing
C) Redundancy
A) Publishers
B) Subscribers
A) Snapshot Replication
B) Transactional Replication
C) Merge Replication
A) Express edition
B) Workgroup edition
C) Standard edition
D) Enterprise edition
Difference between Temp tables and Table variables in SQL Server
Suggestion for choosing between these two
Stored Procedures
Advantages of Stored Procedures

Author: Vinay Kotha


Differences between User Defined Functions and Stored Procedures
SSAS
Different Dimension types by Microsoft available in Analysis Services
Different Types of Dimensions
Conformed Dimension
Junk Dimension
Degenerate Dimension
Slowly Changing Dimensions
There are 10 types of dimension Tables
Differences between Analysis Services 2005 and 2008
Define temporary and extended stored procedure
Differences between SSRS 2005 and SSRS 2008
Performance Tuning of SSRS: Handling a Large Workload
Steps to Improve Performance
Control the Size of your Reports
Use Cache Execution
Configure and Schedule Your Reports
Deliver Rendered Reports for Non-browser Formats
Populate the Report Cache by Using Data-Driven Subscriptions for Parameterized Reports
Back to Report Catalogs
Tuning with Web Service
Memory Limits in SQL Server Reporting Services 2008
Memory Limit
Maximum Memory Limit
Performance Tuning of SQL Server
Section A
Microsoft Tips on Performance Tuning
- Not knowing the performance and scalability characteristics of your system
- Retrieving too much data
- Misuse of Transactions
- Misuse of Indexes
- Mixing OLTP, OLAP and reporting workloads
- Inefficient Schemas
- Using an inefficient disk sub-system
SSIS 10 Best Practices
SSIS Performance tuning
- Data Flow Optimization Modes
- Buffers
- Buffer Sizing
- Buffer Tuning
- Parallelism
- Extraction Tuning
- Transformation Tuning
- Merge-Join Transformation
- Slowly Changing Dimensions
- Data Types
- Miscellaneous
- Load Tuning
Differences between SSIS 2005 and SSIS 2008
Look-up
Cache Transformation
Data Profiling Task
Script Task and Transformation

Recovery Models:
There are three recovery models in SQL Server:

1) Simple
2) Full
3) Bulk-Logged

Simple Recovery Model: Simple recovery model allows you to recover data only to the most
recent full database or differential back-up. Transaction log back-ups are not available because the
contents of the transaction log are truncated each time a checkpoint is issued for the database.

Or

The simple recovery model is just that: simple. In this approach, SQL Server maintains only a minimal amount
of information in the transaction log. SQL Server truncates the transaction log each time the database
reaches a transaction checkpoint, leaving no log entries for disaster recovery purposes.

In databases using the simple recovery model, you may restore full or differential back-ups only. It is not
possible to restore such a database to a given point in time; you may only restore it to the exact time
when a full or differential back-up occurred. Therefore, you will automatically lose any data
modifications made between the time of the most recent full/differential back-up and the time of
failure.

Full Recovery Model: Full recovery model uses database back-ups and transaction log back-ups to
provide complete protection against failure. Along with being able to restore a full or differential back-
up, you can recover the database to the point of failure or to a specific point in time. All operations,
including bulk operations such as SELECT INTO, CREATE INDEX and bulk-loading data, are fully logged
and recoverable.

Or

Full Recovery model also bears a self-descriptive name. In this model, SQL Server preserves the
transaction log until you back it up. This allows you to design a disaster back-up in conjunction with
transaction log back-ups.

In the event of a database failure, you have the most flexibility restoring databases using the full
recovery model. In addition to preserving data modifications stored in the transaction log, the full
recovery model allows you to restore a database to a specific point in time. For example, if an erroneous
modification corrupted your data at 2:36 AM on Monday, you could use SQL Server's point-in-time restore
to roll your database back to 2:35 AM, wiping out the effects of the erroneous modification.
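
The point-in-time scenario above can be sketched in T-SQL; the database name, backup file paths, and timestamp here are placeholders:

```sql
-- Restore the most recent full backup without recovering,
-- so transaction log backups can still be applied.
RESTORE DATABASE Sales
    FROM DISK = N'D:\Backups\Sales_Full.bak'
    WITH NORECOVERY;

-- Roll the log forward, stopping at 2:35 AM, just before
-- the erroneous 2:36 AM modification.
RESTORE LOG Sales
    FROM DISK = N'D:\Backups\Sales_Log.trn'
    WITH STOPAT = N'2024-01-01T02:35:00', RECOVERY;
```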

Bulk-Logged: Bulk-logged recovery model provides protection against failure combined with the
best performance. In order to get better performance, the following operations are minimally logged
and not fully recoverable: SELECT INTO, bulk load operations.

Or

The bulk-logged recovery model is a special-purpose model that works in a similar manner to the full recovery
model. The only difference is in the way it handles bulk data modification operations. The bulk-logged
model records these operations in the transaction log using a technique known as minimal logging. This
saves significantly on processing time, but prevents you from using the point-in-time restore option.

Microsoft recommends that the bulk-logged recovery model only be used for short periods of time. Best
practice dictates that you switch a database to the bulk-logged recovery model immediately before
conducting bulk operations and switch it back to the full recovery model when those operations complete.
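
As a quick sketch, switching recovery models around a bulk operation looks like this (the database name is a placeholder):

```sql
-- Inspect the current recovery model of each database.
SELECT name, recovery_model_desc FROM sys.databases;

-- Switch to bulk-logged immediately before the bulk operation...
ALTER DATABASE Sales SET RECOVERY BULK_LOGGED;

-- ... perform the bulk load here (e.g. BULK INSERT, SELECT INTO) ...

-- ...then switch back to full once it completes, and take a log backup.
ALTER DATABASE Sales SET RECOVERY FULL;
```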

Back-ups
One of the major advantages that enterprise-class databases offer over their desktop counterparts is a
robust back-up and recovery feature set. Microsoft SQL Server provides database administrators with
the ability to customize a database backup and recovery plan to the business and technical
requirements of an organization.

In this article, we explore the process of backing up data with Microsoft SQL Server. When you create a
backup plan, you will need to create an appropriate mix of backups with varying backup scopes and
backup types that meet the recovery objectives of your organization and are suitable for your technical
environment.

Back-up Scopes: The scope of a back-up defines the portion of the database covered by the
backup: the database, file, and/or file-group that SQL Server will back up. There are three
different back-up scopes available in Microsoft SQL Server:

A) Database backups: These cover the entire database including all structural schema
information, the entire data contents of the database and any portion of the transaction log
necessary to restore the database from scratch to its state at the time of the backup. Database
backups are the simplest way to restore your data in the event of a disaster, but they consume a
large amount of disk space and time to complete.
B) Partial Back-ups: These are good alternatives to database back-ups for very large
databases that contain significant quantities of read-only data. If you have read-only file-groups
in your database, it probably doesn’t make sense to back them up frequently, as they do not
change. Therefore, the scope of a partial back-up includes all files in the primary file-group, all
read/write file-groups, and any read-only file-groups that you explicitly specify.
C) File Back-ups: This allows you to individually back-up files and/or file-groups from your
database. They may be used to complement partial back-ups by creating one-time-only backups
of your read-only file-groups. They may also play a role in complex back-up models.

Back-Up Types
The second decision you need to make when planning a SQL Server database backup model is the type
each backup included in your plan. The backup type describes the temporal coverage of the database
backup. SQL Server supports two different back-up types:

A) Full Backups: This includes all data within the backup scope. For example, a full database
backup will include all data in the database, regardless of when it was last created or modified.
Similarly, a full partial backup will include the entire contents of every file and file-group within
the scope of the partial backup.
B) Differential backups: This includes only the portion of data that has changed since the
last full backup. For example, if you perform a full database backup on Monday morning and
then perform a differential backup on Monday evening, the differential backup will be a much
smaller file and will take much less time to create, because it includes only the data changed
during the day on Monday.

You should keep in mind that the scope and type of a backup are two independent decisions made
when creating your backup plan. As described above, each type and scope allows you to customize
the amount of data included in the backup and, therefore, the amount of time required to back up
and restore the database in the event of a disaster.
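
The Monday full-plus-differential plan described above can be sketched as follows (database name and paths are placeholders):

```sql
-- Monday morning: full database backup.
BACKUP DATABASE Sales
    TO DISK = N'D:\Backups\Sales_Full.bak';

-- Monday evening: differential backup, containing only the
-- data changed since the last full backup.
BACKUP DATABASE Sales
    TO DISK = N'D:\Backups\Sales_Diff.bak'
    WITH DIFFERENTIAL;

-- Under the full recovery model, back up the transaction log as well.
BACKUP LOG Sales
    TO DISK = N'D:\Backups\Sales_Log.trn';
```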

SQL SERVER REPLICATION


SQL Server replication allows database administrators to distribute data to various servers
throughout an organization. You may wish to implement replication in your organization for a
number of reasons, such as

A) Load Balancing: Replication allows you to disseminate your data to a number of servers
and then distribute the query load among those servers.
B) Offline Processing: You may wish to manipulate data from your database on a machine
that is not always connected to the network.
C) Redundancy: Replication allows you to build a fail-over database server that’s ready to pick
up the processing load at a moment’s notice.

In any replication scenario there are two main components:

A) Publishers have data to offer to the other servers. Any given replication scheme may have
one or more publishers.
B) Subscribers are database servers that wish to receive updates from the publisher when the
data is modified

There’s nothing preventing a single system from acting in both of these capacities. In fact, this is often
done in large-scale distributed database systems. Microsoft SQL Server supports three types of database
replication:

A) Snapshot Replication: It acts in the manner its name implies. The publisher simply takes a
snapshot of the entire replicated database and shares it with the subscribers. Of course, this is a
very time and resource-intensive process. For this reason, most administrators don’t use
snapshot replication on a recurring basis for databases that change frequently. There are two
scenarios where snapshot replication is commonly used. First, it is used for databases that rarely
change. Second, it is used to set a baseline to establish replication between systems while future
updates are propagated using transactional or merge replication.
B) Transactional Replication: This offers a more flexible solution for databases that
change on a regular basis. With transactional replication, the replication agent monitors the
publisher for changes to the database and transmits those changes to the subscribers. This
transmission can take place immediately or on a periodic basis.

C) Merge Replication: It allows the publisher and subscriber to independently make changes
to the database. Both entities can work without an active network connection. When they are
reconnected, the merge replication agent checks for changes on both sets of data and modifies
each database accordingly. If changes conflict with each other, it uses a predefined conflict
resolution algorithm to determine the appropriate data. Merge replication is commonly used by
laptop users and others who cannot be constantly connected to the publisher.

Each one of these replication techniques serves a useful purpose and is well-suited to particular
database scenarios.

If you are working with SQL Server 2005, you’ll need to choose your edition based upon your
replication needs. Each edition has differing capabilities.

A) Express edition has extremely limited replication capabilities. It is able to act as a
replication client only.
B) Workgroup edition adds limited publishing capabilities. It is able to serve five clients using
transactional replication and up to 25 clients using merge replication. It can also act as a
replication client.
C) Standard edition has full, unlimited replication capabilities with other SQL Server
databases.
D) Enterprise edition adds a powerful tool for those operating in mixed database
environments: it is capable of replication with Oracle databases.

As you have undoubtedly recognized by this point, SQL Server’s replication capabilities offer
database administrators a powerful tool for managing and scaling databases in an enterprise
environment.

Difference between Temp tables and Table variables in SQL Server


1) Transaction log activity is not recorded for table variables, so they are transaction-neutral, or you
can say that they are out of the scope of the transaction mechanism, whereas temp tables
participate in transactions just like normal tables.
2) Table variables cannot be altered, meaning no DDL action is allowed on them, whereas temp
tables can be altered.
3) Stored procedures with a temporary table cannot be pre-compiled, while an execution plan of
procedures with table variables can be statically compiled in advance. Pre-compiling a script
gives a major advantage to its speed of execution. This advantage can be dramatic for long
procedures, where recompilation can be too costly.
4) Table variables are memory-resident, but not always: under memory pressure, the pages
belonging to a table variable can be pushed out to tempdb.
5) There can be big performance differences between using table variables and temporary tables.
In most cases, temporary tables are faster than table variables. Queries using table variables
did not generate parallel query plans on a large SMP box, while similar queries using
temporary tables (local or global) running under the same circumstances did generate parallel
plans.
6) Table variables use internal metadata in a way that prevents the engine from using a table
variable with a parallel query. SQL Server maintains statistics for queries that use temporary
tables but not for queries that use table variables. Without statistics, SQL Server might choose a
poor processing plan for a query that contains a table variable.
No statistics are maintained on a table variable, which means that any data changes impacting
the table variable will not cause recompilation of queries accessing it. Queries involving
table variables don’t generate parallel plans.

Suggestion for choosing between these two:


1) Use a table variable where you want to pass a table to a stored procedure as a parameter,
because there is no other choice.
2) Table variables have been found to be slower in SQL Server 2005 than in 2000 on similar data
and circumstances, so if you have used table variables extensively in your database and are
planning to migrate from 2000 to 2005, make your choice carefully.
3) Table variables are OK if used in small queries and for processing small amounts of data;
otherwise go for temp tables.
4) If you are using very complex business logic in your SP, it is better to use temp tables than table
variables.
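
A minimal side-by-side sketch of the two options (the Orders table and its columns are hypothetical):

```sql
-- Table variable: no DDL after declaration, transaction-neutral,
-- no statistics; best for small row counts.
DECLARE @RecentOrders TABLE (OrderID int PRIMARY KEY, Amount money);
INSERT INTO @RecentOrders (OrderID, Amount)
SELECT OrderID, Amount FROM dbo.Orders WHERE OrderDate >= '20240101';

-- Temporary table: supports ALTER TABLE and additional indexes,
-- and the optimizer keeps statistics on it; better for large or
-- complex workloads.
CREATE TABLE #RecentOrders (OrderID int PRIMARY KEY, Amount money);
INSERT INTO #RecentOrders (OrderID, Amount)
SELECT OrderID, Amount FROM dbo.Orders WHERE OrderDate >= '20240101';
CREATE INDEX IX_RecentOrders_Amount ON #RecentOrders (Amount);
DROP TABLE #RecentOrders;
```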

Stored Procedures
A stored procedure is a group of SQL statements that form a logical unit and perform a particular
task. Stored procedures are used to encapsulate a set of operations or queries to execute on a
database server. For example, operations on an employee database (hire, fire, promote, lookup)
could be coded as stored procedures executed by application code. Stored procedures can be
compiled and executed with different parameters and results, and they may have any combination
of input, output, and input/output parameters.

Advantages of Stored Procedures:


A) Precompiled execution: SQL Server compiles each stored procedure once and then reutilizes the
execution plan. This results in tremendous performance boosts when stored procedures are
called repeatedly.
B) Reduced client/server traffic: If network bandwidth is a concern in your environment, you’ll be
happy to learn that stored procedures can reduce long SQL queries to a single line that is
transmitted over the wire.
C) Efficient re-use of code and programming abstraction: Stored Procedures can be used by
multiple users and client programs. If you utilize them in a planned manner, you’ll find the
development cycle takes less time.
D) Enhanced security controls: You can grant users permission to execute a stored procedure
independently of underlying table permissions.
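
A minimal sketch of a procedure with input and output parameters, plus the security pattern from point D (all object names are hypothetical):

```sql
-- Procedure with an input and an output parameter.
CREATE PROCEDURE dbo.usp_GetEmployeeCount
    @DepartmentID int,
    @EmployeeCount int OUTPUT
AS
BEGIN
    SELECT @EmployeeCount = COUNT(*)
    FROM dbo.Employee
    WHERE DepartmentID = @DepartmentID;
END;
GO

-- Callers need only EXECUTE rights, not SELECT on dbo.Employee.
GRANT EXECUTE ON dbo.usp_GetEmployeeCount TO ReportingRole;

DECLARE @Count int;
EXEC dbo.usp_GetEmployeeCount @DepartmentID = 10, @EmployeeCount = @Count OUTPUT;
SELECT @Count AS EmployeeCount;
```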

Differences between User Defined Functions and Stored Procedures
Stored procedures are very similar to user-defined functions, but there are subtle differences.
Both allow you to create bundles of SQL statements that are stored on the server for future use. This
offers you a tremendous efficiency benefit, as you can save programming effort by

A) Reusing code from one program to another, cutting down on program development time
B) Hiding the SQL details, allowing database developers to worry about SQL and application
developers to deal only in higher-level languages
C) Centralizing maintenance, allowing you to make business logic changes in a single place that
automatically affect all dependent applications

At first glance, functions and stored procedures seem identical. However, there are several subtle, yet
important differences between the two:

A) Stored procedures are called independently, using the EXEC command, while functions are
called from within another SQL statement
B) Stored procedures allow you to enhance application security by granting users and applications
permission to use stored procedures, rather than permission to access the underlying tables.
Stored procedures provide the ability to restrict user actions at a much more granular level than
standard SQL Server permissions. For example, if you have an inventory table that cashiers must
update each time an item is sold (to decrement the inventory for that item by 1 unit), you can
grant cashiers permission to use a decrement-item stored procedure, rather than allowing
them to make arbitrary changes to the inventory table.
C) Functions always must return a value (either a scalar value or a table). Stored procedures may
return a scalar value, a table value or nothing at all.
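
The calling-convention difference in point A, and the cashier example from point B, can be sketched like this (table, role, and object names are hypothetical):

```sql
-- A procedure is invoked independently with EXEC; cashiers get
-- EXECUTE rights instead of UPDATE rights on the table itself.
CREATE PROCEDURE dbo.usp_DecrementItem @ItemID int
AS
BEGIN
    UPDATE dbo.Inventory
    SET QuantityOnHand = QuantityOnHand - 1
    WHERE ItemID = @ItemID;
END;
GO
GRANT EXECUTE ON dbo.usp_DecrementItem TO CashierRole;
EXEC dbo.usp_DecrementItem @ItemID = 42;

-- A function must return a value and is called from within
-- another SQL statement.
CREATE FUNCTION dbo.ufn_LineTotal (@Price money, @Qty int)
RETURNS money
AS
BEGIN
    RETURN @Price * @Qty;
END;
GO
SELECT OrderID, dbo.ufn_LineTotal(UnitPrice, Quantity) AS LineTotal
FROM dbo.OrderDetail;
```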

Overall, stored procedures are one of the greatest treasures available to SQL Server developers. The
efficiency and security benefits are well worth the upfront investment in time.

SSAS
Different Dimension types by Microsoft available in Analysis Services
1) Regular
2) Time
3) Organization
4) Geography
5) Bill of Materials
6) Accounts
7) Customers
8) Products
9) Scenario
10) Quantitative

11) Utility
12) Currency
13) Rates
14) Channel
15) Promotion

Regular: A dimension whose type has not been set to a special dimension type.

Time: A dimension whose attributes represent time periods, such as years, semesters, quarters,
months and days.

Organization: A dimension whose attributes represent organizational information, such as
employees or subsidiaries.

Geography: A dimension whose attributes represent geographic information, such as cities or postal
codes.

Bill of Materials: A dimension whose attributes represent inventory or manufacturing information,
such as parts lists for products.

Accounts: A dimension whose attributes represent a chart of accounts for financial reporting
purposes.

Customers: A dimension whose attributes represent customer or contact information.

Products: A dimension whose attributes represent product information.

Scenario: A dimension whose attributes represent planning or strategic analysis information.

Quantitative: A dimension whose attributes represent quantitative information.

Utility: A dimension whose attributes represent miscellaneous information.

Currency: A dimension whose attributes represent currency information.

Rates: A dimension whose attributes represent currency rate information.

Channel: A dimension whose attributes represent channel information.

Promotion: A dimension whose attributes represent marketing promotion information.

Different Types of Dimensions


1) Conformed Dimension
2) Junk Dimension
3) Degenerate Dimension
4) Slowly changing dimensions

Conformed Dimension: These dimensions are built once in your model and can be reused
multiple times with different fact tables. For example, consider a model containing
multiple fact tables, representing different data-marts. Now look for a dimension that is common to
these fact tables. In this example, let’s say the product dimension is common, and hence it
can be reused by creating shortcuts and joining the different fact tables. Examples are the
time dimension, customer dimension, and product dimension.

Junk Dimension: When you consolidate lots of small dimensions, instead of having hundreds
of small dimension tables with only a few records each cluttering your database, all records from
these small dimension tables are loaded into ONE dimension table, and we call this a JUNK
dimension table (since we are storing all the junk in this one table). For example, a company might
have a handful of manufacturing plants, a handful of order types, and so forth, and we can
consolidate them into one junk dimension table.

Degenerate Dimension: An item that is in the fact table but is stripped of its
description, because the description belongs in a dimension table, is referred to as a degenerate
dimension. It looks like a dimension, but it really lives in the fact table and has been degenerated
of its description; hence it is called a degenerate dimension.

Slowly Changing Dimensions: These are dimensions in which the key value remains
static but the description might change over time.
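
One common way to handle this (a Type 2 slowly changing dimension) is to close out the current row and insert a new version; the DimCustomer table and its versioning columns here are hypothetical:

```sql
-- Customer 1001 moved city: expire the current row...
UPDATE dbo.DimCustomer
SET EndDate = GETDATE(), IsCurrent = 0
WHERE CustomerID = 1001 AND IsCurrent = 1;

-- ...and insert a new versioned row; the business key (CustomerID)
-- stays static while the description changes.
INSERT INTO dbo.DimCustomer (CustomerID, City, StartDate, EndDate, IsCurrent)
VALUES (1001, 'Hyderabad', GETDATE(), NULL, 1);
```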

There are 10 types of dimension tables (though this is not the case in most instances):
1) Primary Dimensions
2) Secondary Dimensions
3) Degenerate Dimensions
4) Conformed Dimensions
5) Slowly Changing Dimensions
6) Rapidly Changing Dimensions
7) Large Dimensions
8) Rapidly Changing Monster Dimensions
9) Junk Dimensions
10) Role-Playing Dimensions

Differences between Analysis Services 2005 and 2008


A) Real-time best practice design warnings. These warnings are implemented in AMO, exposed in
the UI via blue squiggly lines, and can be dismissed individually (a single occurrence) or turned
off altogether. To disable or re-enable a warning, build the project, select the warning message
in the warning window, and right-click to choose disable or enable.
B) New Dimension Design Wizard

Author: Vinay Kotha Page 12


C) New Cube Design Wizard
D) Attribute relationship tab in the dimension designer. Makes attribute relationships easier to
define and understand.
E) CREATE MEMBER syntax extensions to support defining caption, display folders and associated
measure group.
F) CREATE SET syntax extensions to support defining caption and display folders as well as the
ability to define dynamic named sets.
G) CREATE KPI command is added
H) Backup performance improvements. In SSAS 2005, backup time for big databases grew
exponentially; in SSAS 2008, backup time grows linearly. The redesigned backup storage also
removes backup size limits.
I) Write-back to MOLAP. Analysis Services 2008 removes the requirement to query ROLAP
partitions when performing write-backs, which results in huge performance gains.
J) Scale-out Analysis Services. A single read-only copy of Analysis Services database can be shared
between many Analysis Services through a virtual IP address. This creates a highly scalable
deployment option for an Analysis Services Solution.
K) New UPDATE MEMBER statement. The UPDATE MEMBER statement updates an existing
calculated member while preserving the relative precedence of that member with respect to
other calculations. Consequently, you cannot use the UPDATE MEMBER statement to change
SOLVE_ORDER. An UPDATE MEMBER statement cannot be specified in the MDX script for a cube.
L) Block Computation. This eliminates unnecessary aggregation calculations (for example, when
the values to be aggregated are NULL) and provides a significant improvement in analysis cube
performance, which enables users to increase the depth of their hierarchies and complexity of
computations.
M) Aggregation Designer changes. The algorithm that builds aggregations is improved, there is
support for manually editing, creating, and deleting aggregations, and you can see which
aggregations were designed. The Aggregation Designer also has built-in validations for optimal
design assistance.
N) Dynamic Management Views (DMVs). These DMVs allow writing SELECT-style statements against
an SSAS instance to get performance and statistics information.
O) SSAS database attach/detach
P) Analysis Services Personalization Extensions
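The MDX syntax extensions in items E) through G) can be illustrated with a hedged sketch (the cube, measure, measure group, and folder names below are hypothetical):

```mdx
CREATE MEMBER CURRENTCUBE.[Measures].[Profit Margin] AS
    [Measures].[Profit] / [Measures].[Sales Amount],
    FORMAT_STRING = 'Percent',
    CAPTION = 'Profit Margin',            -- caption support is new in 2008
    DISPLAY_FOLDER = 'Ratios',            -- display folder support is new in 2008
    ASSOCIATED_MEASURE_GROUP = 'Sales';   -- measure group association is new in 2008
```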

Define temporary and extended stored procedures.

Answer - A temporary stored procedure is stored in the tempdb database. It is volatile and is deleted
once the connection is terminated or the server is restarted. An extended stored procedure is
implemented in an external DLL, is typically prefixed with xp_, and runs inside the SQL Server process.
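A minimal illustration of a local temporary stored procedure (the procedure name is arbitrary):

```sql
-- The # prefix makes this a local temporary procedure: it is created in
-- tempdb and dropped automatically when the session ends.
CREATE PROCEDURE #GetServerTime
AS
    SELECT GETDATE() AS CurrentTime;
GO

EXEC #GetServerTime;
```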

Differences between SSRS 2005 and SSRS 2008
1) SSRS 2005 required Internet Information Services (IIS) to run, whereas SSRS 2008 no longer
requires IIS. 2008 uses the http.sys driver and listens for report requests through http.sys.
Not only does this reduce deployment headaches, it also reduces server overhead.
2) SSRS 2005 used more memory and was extremely resource intensive, so much so that many
companies would install it on a machine separate from SQL Server. SSRS 2008 utilizes memory
more efficiently, especially when working with reports that contain large sets of data.
Additionally, SSRS 2008 will often load the first page of a report much faster than 2005.

Performance Tuning of SSRS: Handling a Large workload


To get the highest performance when handling large workloads that include user requests for large
reports, implement the following recommendations.

Steps to Improve Performance


1) Control the size of your reports
2) Use Cache Execution
3) Configure and Schedule your reports
4) Deliver Rendered Reports for Non-browser Formats
5) Populate the Report Cache by Using Data-Driven Subscriptions for Parameterized Reports
6) Back to the Report Catalogs
7) Tuning the Web Service

Control the Size of your Reports


You will first want to determine the purpose of these reports and whether a large multi-page report is
even necessary. If a large report is necessary, how frequently will it be used? If you provide users with
smaller summary reports, can you reduce the frequency with which users attempt to access this large
multi-page report? Large reports place a significant processing load on the report server, the report
server catalog, and the report data, so it is necessary to evaluate each report on a case-by-case basis.

Some common problems with these large reports are that they contain data fields that are not used in
the report, or that they contain duplicate datasets. Often users retrieve more data than they really need.
To significantly reduce the load placed on your Reporting Services environment, create summary reports
that use aggregates created at the data source, and include only the necessary columns. If you want to
provide data feeds, you can do this asynchronously using more appropriate tools, such as SSIS, to provide
the data feed.
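As a hedged sketch of what "aggregates created at the data source" can look like (the table and column names are illustrative):

```sql
-- Aggregate at the data source instead of shipping detail rows to the
-- report: only summary rows and the needed columns cross the wire.
SELECT Region,
       YEAR(OrderDate)  AS OrderYear,
       SUM(SalesAmount) AS TotalSales
FROM   dbo.FactSales
GROUP BY Region, YEAR(OrderDate);
```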

Use Cache Execution


If the reports do not need live execution, enable the cache execution setting for each of the
appropriate reports. This setting causes the report server to cache a temporary copy of those reports
in memory.

Configure and Schedule Your Reports


For your large reports, use the report execution timeout setting to control how long a report can
execute before it times out. Some reports simply need a long time to run, so timeouts will not help
there; but if reports are based on bad or runaway queries, execution timeouts ensure that resources
are not inappropriately utilized.

If you have large reports that create data processing bottlenecks, you can mitigate resource contention
issues by using scheduled snapshots. Instead of the report data itself, a regularly scheduled report
execution snapshot is used to render the report. The scheduled snapshot can be executed during
off-peak hours, leaving more resources available for live reports for users during peak hours.

Deliver Rendered Reports for Non-browser Formats


Rendering performance of non-browser formats such as PDF and XLS has improved in SQL Server 2008
Reporting Services. Nevertheless, to reduce the load on your SQL Server Reporting Services
environment, you can place non-browser-format reports onto a file share and/or SharePoint, so users
can access the file directly instead of continually regenerating the report.

Populate the Report Cache by Using Data-Driven Subscriptions for Parameterized Reports


For your large parameterized reports, you can improve performance by pre-populating the report cache
using data-driven subscriptions. Data-driven subscriptions enable easier population of the cache for the
combinations of parameter values that are frequently used when the parameterized report is executed.
Note that if you choose a set of parameters that is not used, you take on the cost of running the cache
with little value in return. Therefore, to identify the most frequent parameter value combinations,
analyze the ExecutionLog2 view. Ultimately, when a user opens the report, the report server can then
use a cached copy of the report instead of creating the report on demand.
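A hedged sketch of the ExecutionLog2 analysis mentioned above, run against the ReportServer catalog database (the catalog database name can differ per installation):

```sql
-- Find the most frequently used report/parameter combinations, to decide
-- which parameter sets are worth pre-caching via data-driven subscriptions.
SELECT ReportPath,
       Parameters,
       COUNT(*) AS Executions
FROM   ReportServer.dbo.ExecutionLog2
GROUP BY ReportPath, Parameters
ORDER BY Executions DESC;
```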

Back to the Report Catalogs


You can also increase the size of your report server catalogs, which allows the database to store more of
the snapshot data.

Tuning the Web Service


IIS and Http.Sys tuning helps get the last incremental performance out of the report server computer.
The low-level options allow you to change the length of the HTTP request queue, the duration that
connections are kept alive, and so on. For large concurrent reporting loads, it may be necessary to
change these settings to allow your server computer to accept enough requests to fully utilize the server
resources.
you should consider this only if your servers are at maximum load and you do not see full resource
utilization or if you experience connection failures to the Reporting Services.

Memory Limits in SQL Server Reporting Services 2008
Memory Limit
This configuration is similar to “WorkingSetMinimum” in SSRS 2008. Its default is 60% of physical
memory. Increasing the value helps Reporting Services handle more requests. After this threshold is
reached, no new requests are accepted.

Maximum Memory Limit


This configuration is similar to “WorkingSetMaximum” in SSRS 2008. Its default is 80% of physical
memory. But unlike the SSRS 2008 setting, when its threshold is reached, it starts aborting in-progress
requests instead of rejecting new requests.

Performance Tuning of SQL Server


Section A:
 Increase the ‘min memory per query’ option to improve the performance of queries that use
hashing or sorting operations, if your SQL Server has a lot of memory available and many
queries are running concurrently on the server. The default ‘min memory per query’ value is
1,024 KB.
 Increase the ‘max async IO’ option if SQL Server works on a high-performance server with a
high-speed intelligent disk subsystem (such as hardware-based RAID with more than 10 disks).
 Change the ‘network packet size’ option to an appropriate value. By default the packet size is
4,096 bytes; for queries that transfer large amounts of data, the packet size can be increased
accordingly.
 You can increase the ‘recovery interval’ value.
 Increase the ‘priority boost’ SQL Server option to 1. By default it is set to 0.
 Set the ‘max worker threads’ option to the maximum number of user connections to your SQL
Server box.
The default setting for the ‘max worker threads’ option is 255. If the number of user
connections is less than the ‘max worker threads’ value, a separate operating system thread
is created for each client connection; but if the number of user connections exceeds this
value, thread pooling is used. For example, if the maximum number of user connections to
your SQL Server box is 50, you can set the ‘max worker threads’ option to 50; this frees up
resources for SQL Server to use elsewhere. If the maximum number of user connections to
your SQL Server box is 500, you can set the ‘max worker threads’ option to 500; this can
improve SQL Server performance because thread pooling will not be used.
 Specify the ‘min server memory’ and ‘max server memory’ options.
 Specify the ‘set working set size’ SQL Server option to reserve the amount of physical memory
for SQL Server.
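Most of the options above are set through sp_configure. A minimal sketch (the value shown is illustrative, not a recommendation):

```sql
-- 'min memory per query' is an advanced option, so advanced options must
-- be shown first; RECONFIGURE applies the new value.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'min memory per query', 2048;  -- value in KB
RECONFIGURE;
```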

Microsoft Tips on Performance Tuning:
 Not knowing the performance and scalability characteristics of your
system: If performance and scalability of a system are important to you, the biggest
mistake that you can make is to not to know the actual performance and scalability
characteristics of important queries, and the effect the different queries have on each other
in a multiuser system. You achieve performance and scalability when you limit resource use
and handle contention for those resources. Contention is caused by locking and by physical
contention. Resource use includes CPU utilization, network I/O, disk I/O and memory use.
 Retrieving too much data: A common mistake is to retrieve more data than you
actually require. Retrieving too much data leads to increased network traffic and increased
server and client resource use. This applies to both columns and rows.
 Misuse of Transactions: Long-running transactions, transactions that depend on user
input to commit, transactions that never commit because of an error, and non-transactional
queries inside transactions cause scalability and performance problems because they lock
resources longer than needed.
 Misuse of Indexes: if you do not create indexes that support the queries that are issued
against your server, the performance of your application suffers as a result. However, if you
have too many indexes, then insert and update performance of your application suffers. You
have to find a balance between the indexing needs of the writes and reads that is based on
how your application is used.
 Mixing OLTP, OLAP and reporting workloads: OLTP workloads are
characterized by many small transactions, with an expectation of very quick response time
for the user. OLAP and reporting workloads are characterized by a few long-running
operations that might consume more resources and cause more contention. The contention
is caused by locking and by the underlying physical sub-system. You must resolve this
conflict to achieve a scalable system.
 Inefficient Schemas: Adding indexes can help improve performance, however their
impact may be limited if your queries are inefficient because of poor table design that
results in too many join operations or in inefficient join operations. Schema design is a key
performance factor. It also provides information to the server that may be used to optimize
query plans. Schema design is largely a tradeoff between good read performance and good
write performance. Normalization helps write performance. De-normalization helps read
performance
 Using an inefficient disk sub-system: the physical disk sub-system must provide a
database server with sufficient I/O processing power to permit the database server to run
without disk queuing or long I/O waits.

SSIS 10 Best Practices:


1) SSIS is an in-memory pipeline, so ensure all transformations occur in memory
2) Plan for capacity by understanding resource utilization

3) Baseline source system extract speed
4) Optimize SQL data source, lookup transformations and destination
5) Tune your network
6) Use data types wisely – yes, back to data types
7) Change the design
8) Partition the problem
9) Minimize logged operations
10) Schedule and distribute it correctly

SSIS Performance tuning


The SSIS architecture has two engines: the run-time engine and the data-flow engine. The run-time
engine is a highly parallel control-flow engine that coordinates the execution of tasks (units of work)
within SSIS and manages the engine threads that carry out those tasks. The data-flow engine manages
the data pipeline within a data-flow task.

 Data Flow Optimization Modes


The data-flow task has a property called “RunInOptimizedMode”. When this property is enabled,
any downstream component that doesn’t use any of the source component columns is
automatically disabled, and unused columns are also automatically disabled. The net result of
enabling the “RunInOptimizedMode” property is that the performance of the entire data-flow task
is improved.
SSIS projects also have a “RunInOptimizedMode” property. It overrides the
“RunInOptimizedMode” property of all the data-flow tasks in the project at design time, so that
all data-flow tasks in the project run in optimized mode during debugging.

 Buffers:
A buffer is an in-memory dataset object used by the data-flow engine to transform data. The
data-flow task has a configurable property called “DefaultBufferMaxRows”, which is set to 10,000
by default. The data-flow task also has a configurable property called “DefaultBufferSize”, which is
set to 10 MB by default. Additionally, the data-flow task has an internal property called
“MaxBufferSize”, which is set to 100 MB and cannot be changed.

 Buffer Sizing:
When performance tuning a data-flow task, the goal should be to pass as many records as
possible through a single buffer while efficiently utilizing memory. This raises the question: what
does “efficiently utilizing memory” mean? SSIS estimates the size of a buffer row from the
data source metadata at design time. Optimally, the buffer row size should be as small as
possible, which can be accomplished by employing the smallest possible data type for each
column. SSIS automatically multiplies the estimated buffer row size by the
“DefaultBufferMaxRows” setting to determine how much memory to allocate to each buffer in
the data-flow engine. If this amount of memory exceeds MaxBufferSize (100 MB), SSIS
automatically reduces the number of buffer rows to fit within the 100 MB boundary.

The data-flow task has another internal property called “MinBufferSize”, which is 64 KB and cannot
be changed. If the amount of memory SSIS estimates to allocate per buffer is below 64 KB, SSIS
will automatically increase the number of rows per buffer so that the buffer exceeds the
MinBufferSize memory boundary.

 Buffer Tuning:
The data-flow task has a logging event called “BufferSizeTuning”. When it is enabled, SSIS adds
information to the SSIS log indicating where SSIS adjusted the buffer size. When buffer tuning, the
goal should be to fit as many rows into a buffer as possible; thus, the value of
“DefaultBufferMaxRows” should be as large as possible without exceeding a total buffer size
of 100 MB.

 Parallelism:
SSIS natively supports the parallel execution of packages, tasks, and transformations. Therefore,
parallelism can greatly improve the performance of a package when it is configured within the
constraints of system resources. A package has a property called “MaxConcurrentExecutables”,
which can be configured to set the maximum number of threads that can execute in parallel per
package. By default this is set to -1, which translates to the number of logical machine
processors plus 2. All or some of the operations in a package can execute in parallel.

Additionally, the data-flow task has a property called “EngineThreads”, which defines how many
threads the data-flow engine can create and run in parallel. This property applies equally to both
the source threads that the data-flow engine creates for sources and the worker threads that
the engine creates for transformations and destinations. For example, setting the EngineThreads
property to 10 indicates that the data-flow engine can create up to 10 source threads and 10
worker threads.

 Extraction Tuning
a) Increase the connection manager’s packet size property: use separate connection
managers for bulk loading, and a smaller packet size for OLE DB Command transformations.
b) Affinitize network connections: this can be accomplished if a machine has multiple
cores and multiple NICs.
c) Tune queries:
-- Select only the needed columns
-- Use a hint to specify that no shared locks be used during the select (the query can
potentially read uncommitted data); use this only if the query must have the best performance
d) Look-ups
-- Select only needed columns
-- Use the “Shared Look-up Cache” (available in 2008)

e) Sorting
Merge and Merge Join transformations require sorted inputs. Source data for these
transformations that is already sorted obviates the need for an upstream Sort transformation
and improves data-flow performance. The following properties must be configured on a source
component if the source data is already sorted:
a) IsSorted: The outputs of a source component have a property called IsSorted. The value of
this property must be set to true.
b) SortKeyPosition: Each output column of a source component has this property, which
indicates whether a column is sorted, the column’s sort order, and the sequence in which
multiple columns are sorted. This property must be set for each column of sorted data.
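Point c) above can be sketched as follows (the table and column names are hypothetical; NOLOCK permits dirty reads, so it is appropriate only where that is acceptable):

```sql
-- Narrow extraction query: only the needed columns, and a NOLOCK hint so
-- the SELECT takes no shared locks (at the cost of possible dirty reads).
SELECT CustomerKey, OrderDate, SalesAmount
FROM   dbo.FactSales WITH (NOLOCK)
WHERE  OrderDate >= '20240101';
```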
 Transformation Tuning
Partially blocking (asynchronous) transformations: Merge, Merge Join, and Union All can often
be replaced by equivalent logic in the source query.
Use SSIS 2008:
-- Improved data-flow task scheduler
-- Union All transforms are no longer necessary to split up and parallelize execution trees
Blocking (asynchronous) transformations: Aggregate, Sort, Pivot, and Unpivot should be limited
to one per data flow on the same data.
Aggregate transformation: This transformation includes the Keys, KeyScale, CountDistinctKeys,
and CountDistinctScale properties, which improve performance by enabling the transformation
to pre-allocate the amount of memory that it needs for the data that it caches. If the exact or
approximate number of groups expected to result from a Group By operation is known, set the
Keys and KeyScale properties respectively. If the exact or approximate number of distinct values
expected to result from a Distinct Count operation is known, set the CountDistinctKeys and
CountDistinctScale properties respectively.
If the creation of multiple aggregations in a data flow is necessary, then consider the creation of
multiple aggregations that use one Aggregate transformation instead of creating multiple
transformations. Performance is improved with this approach because when one aggregation is
a subset of another aggregation, the transformation’s internal storage is optimized by scanning
the incoming data only once. For example, if an aggregation uses a Group By clause and an AVG
aggregation, then performance can be improved by combining them into one transformation.
However, aggregation operations are serialized when multiple aggregations are performed
within one aggregation transformation. Therefore, performance might not be improved when
multiple aggregations must be computed independently.
 Merge-Join Transformation
Max Buffers Per Input: this property specifies the maximum number of buffers that can be active
for each input at one time. This property can be used to tune the amount of memory that the
buffers consume, and consequently the performance of the transformation. As the number of
buffers increases, the transformation uses more memory, which improves performance. The
default value of this property is 5, which is the number of buffers that works well in most
scenarios. Performance can be tuned by using a slightly different number of buffers, such as 4 or
6. Using a very small number of buffers should be avoided if possible; for example, there is a
significant impact on performance when MaxBuffersPerInput is set to 1 instead of 5.
Additionally, MaxBuffersPerInput should not be set to 0 or less: throttling does not occur in
that range and, depending on the data load and the amount of memory available, the package
may not complete.
 Slowly Changing Dimensions
The SCD wizard creates a set of data-flow transformation components that work together with
the Slowly Changing Dimension transformation. The wizard creates OLE DB Command
transformations that perform UPDATEs against a single row at a time. Performance can be
improved by replacing these components with destination components that save all rows to be
updated to a staging table. An Execute SQL Task can then be added that performs a single
set-based T-SQL UPDATE statement against all rows at the same time.
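The staging-table pattern described above can be sketched as a single set-based UPDATE (all table and column names are hypothetical):

```sql
-- Instead of row-by-row OLE DB Command updates, land the changed rows in
-- a staging table and apply them in one set-based statement.
UPDATE d
SET    d.CustomerName = s.CustomerName,
       d.City         = s.City
FROM   dbo.DimCustomer          AS d
JOIN   staging.CustomerUpdates  AS s
       ON d.CustomerKey = s.CustomerKey;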
 Data Types
1) Use the smallest possible data-types in the data flow.
2) Use the CAST or CONVERT functions in the source query if possible
 Miscellaneous
1) Sort in the query if possible
2) If possible, use the T-SQL MERGE statement instead of the SCD transformation
3) If possible, use a T-SQL INSERT INTO … SELECT statement instead of the data-flow task
4) A full data reload may perform better than a delta refresh
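Point 2 above can be sketched with a minimal Type 1-style MERGE (names are hypothetical; a real SCD implementation would also handle effective dates and history rows):

```sql
-- Set-based alternative to the SCD transformation: update matched rows,
-- insert new ones, all in one statement.
MERGE dbo.DimCustomer AS target
USING staging.Customer AS source
      ON target.CustomerBusinessKey = source.CustomerBusinessKey
WHEN MATCHED THEN
    UPDATE SET target.City = source.City
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerBusinessKey, City)
    VALUES (source.CustomerBusinessKey, source.City);
```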
 Load Tuning
Use the SQL Server Destination
1) Only helps if the data flow and the destination database are on the same machine
2) Weaker error handling than the OLE DB Destination
3) Set Commit Size = 0
Use the OLE DB Destination
1) Set Commit Size = 0
Drop indexes based on the expected % load growth
1) Don’t drop an index if it’s the only clustered index: data in a table is sorted by its clustered
index, and primary keys are clustered by default. Loading will always be faster than dropping and
recreating a primary key, and will usually be faster than dropping and recreating a clustered index
2) Drop a non-clustered index if the load will cause a 100% increase in rows: this is a rule of thumb
3) Don’t drop a non-clustered index if the load increase is under 10%: not a rule of thumb;
experiment to find the optimal value.
Use partitions if necessary
1) Use SQL Server Profiler to trace the performance
2) See “The Data Load Performance Guide”
3) Use the TRUNCATE statement instead of the T-SQL DELETE statement: DELETE is a fully logged
operation, which performs slower than TRUNCATE

4) Affinitize the network
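Point 3 of the list above, as a minimal illustration (the table name is hypothetical):

```sql
-- Minimally logged: deallocates the table's pages rather than logging
-- each row deletion, so it is much faster for clearing a whole table.
TRUNCATE TABLE staging.FactSalesLoad;

-- Fully logged alternative: every row deletion is written to the log.
DELETE FROM staging.FactSalesLoad;
```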

Differences between SSIS 2005 and SSIS 2008


Architecturally, SSIS 2005 and SSIS 2008 are the same. 2008 has some additional features that 2005
did not have, so 2008 can be seen as an enhancement of the 2005 version.

Look-up
In 2005, look-ups had only three options for error output: Fail Component, Ignore Failure, and
Redirect Row. 2008 adds an additional “No Match Output”.
2005 did not have cache modes, while 2008 has three: Full Cache, Partial Cache, and No Cache.
2005 did not have look-up connection manager types, while 2008 has the OLE DB Connection
Manager and the Cache Connection Manager.

Cache Transformation
2005 did not have this transformation; it was introduced in 2008. This is a data-flow
transformation. The Cache transformation writes data from a connected data source in the data
flow to a Cache Connection Manager; the Look-up transformation in a package can then perform
look-ups on that cached data.
In a single package, only one Cache transformation can write data to the same Cache Connection
Manager. If the package contains multiple Cache transformations, the first one called when the
package runs writes the data to the connection manager; the write operations of subsequent
Cache transformations fail.
The Cache transformation is configured as follows:
1) Specify the Cache Connection Manager
2) Map the input columns of the Cache transformation to destination columns in the Cache
Connection Manager

Data Profiling Task


2005 did not have this task; it was introduced in 2008. This is a control-flow task. It lets you analyze
data in a SQL Server database and, from the results of that analysis, generate XML reports that can be
saved to a file or to an SSIS variable. By configuring one or more of the task’s profile types, you can
generate a report that provides details such as a column’s minimum and maximum values, or the
number and percentage of null values.

Script Task and Transformation


2008 gives the option of writing scripts in either VB or C#, whereas 2005 allowed users to write
scripts only in VB.
