
Interview Practice
Technical Questions

Made By :
The Most Awesome-est People

Non-Clustered Index Page


- Cover Page
- Index Page
- Model 1) Data Modeling
- Model 2) TSQL
- Model 3) Data Warehouse Design
- Model 4) SSIS
- Model 5) SSAS
- Model 6) SSRS

**** IMPORTANT NOTES THAT YOU SHOULD READ FIRST ****

Note 1: Feel free to add more questions with accurate and well-explained answers. Also, if you
see any errors, please fix them. Please try to keep the same format!

Note 2: Do not edit the cover and index pages unless you are a moderator.

Note 3: All the keywords should be underlined.

Note 4: Problems in red are the ones that have a possible error and need to be reviewed.

Note 5: Some of the questions were answered as if the questions are scenario-based.

Model 1 - Database Designing

1. What is the difference between Top-down design and Bottom-up design?


○ Top-down: A design method that starts from the high level
(overview) and works down to the low level (implementation). It is
the general approach for designing a new DB.
○ Bottom-up: A design method that starts from the low level
(implementation) and works up to the high level (overview). It is
generally used when you redesign an application, when you need to
reuse some of the features of an existing application, and when you
don’t need to start from scratch.
2. Describe a primary key.
○ A column (or set of columns) that uniquely identifies a row in a table
○ Cannot contain repeating (duplicate) values
○ Cannot be NULL
○ There can be only one primary key in a table
○ A primary key can be a composite key made up of multiple columns
○ By default, a primary key creates a unique clustered index
4. Describe a unique key.
○ A column that uniquely identifies a row in a table
○ Cannot contain repeating (duplicate) values
○ Can accept only one NULL value
○ There can be more than one unique key in one table
○ A unique key can be a composite key made up of multiple columns
○ By default, a unique key creates a unique non-clustered index
5. Describe a foreign key.
○ A column in a table that points to the primary key in another
table
○ In a unary (self-referencing) relationship, the foreign key points
to the primary key of its own table.
○ It can accept NULLs, but that is not recommended
○ It enforces referential integrity between the related tables.
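○ Ex (added sketch, not from the original answers; the Department and Employee tables, their columns,
and the constraint names are hypothetical):
CREATE TABLE Department (
    DeptID    INT         NOT NULL,
    DeptName  VARCHAR(50) NOT NULL,
    CONSTRAINT PK_Department PRIMARY KEY (DeptID),     -- primary key -> unique clustered index
    CONSTRAINT UQ_Department_Name UNIQUE (DeptName))   -- unique key -> unique non-clustered index

CREATE TABLE Employee (
    EmpID     INT     NOT NULL,
    SSN       CHAR(9) NOT NULL,
    DeptID    INT     NULL,   -- foreign key column; NULLs allowed but not recommended
    ManagerID INT     NULL,   -- unary relationship: points back to this table's own primary key
    CONSTRAINT PK_Employee PRIMARY KEY (EmpID),
    CONSTRAINT UQ_Employee_SSN UNIQUE (SSN),
    CONSTRAINT FK_Employee_Department FOREIGN KEY (DeptID) REFERENCES Department(DeptID),
    CONSTRAINT FK_Employee_Manager FOREIGN KEY (ManagerID) REFERENCES Employee(EmpID))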
6. Describe a candidate key.
○ Any key that has a potential to be a primary key is a candidate
key
7. What are the design levels of a database?
○ Conceptual level
■ Input: business requirements
■ Get an idea of what the DB will include based on the
business requirements
■ Identify entities and attributes
■ Output: an accurate ER diagram
○ Logical level

■ Input: an accurate ER diagram
■ Identify default constraints
■ Identify keys and key attributes
■ Identify relationships and their cardinality
■ The last step of the Logical level is normalization
■ Output: a normalized ER diagram
○ Physical level
■ Using data modeler tools such as ERWin or Microsoft Visio
to digitize an accurate and normalized ER diagram. In other
words, you transfer the ER diagram on a paper drawing to a
computer file format.
■ Select a particular platform to implement the ER diagram
with.
■ It is possible that the technological platform to be used
may be determined by a business requirement.
■ Finally, forward engineer your ER diagram.
8. Terminologies comparison Conceptual and Logical <-> Physical.
○ entities <-> table
○ attributes <-> column
○ relationship <-> physical relationship
○ key attribute <-> primary key
○ tuples <-> rows
9. What are the different types of attributes?
○ Simple vs. Composite
■ Simple: cannot be divided; Ex: SSN
■ Composite: can be split into components; Ex: Address

○ Single-valued vs. Multi-valued


■ Single-valued: can take only one value for each entity
instance; Ex: SSN
■ Multi-valued: can take multiple values; Ex: Skill set of
employees

○ Stored vs. Derived
■ Stored: an attribute that is entered manually into the database;
Ex: Name
■ Derived: an attribute that can be calculated based on other
attributes; Ex: Number of years in the company

10. What is a relationship?


○ An association (physical relationship) between two or more entities
11. What are the types of relationship?
○ one-to-one
■ Ex: husband - wife
○ one-to-many
■ Ex: mother - children
○ many-to-many
■ Ex: students - teachers
○ one-to-fixed
■ Ex: employee - three phone numbers
○ one-to-one, one-to-many, and one-to-fixed are subsets of many-to-
many.
12. What does cardinality mean?
○ It is the maximum number of times an instance of one entity can be
associated with instances of the related entity. For example, the
cardinality of a one-to-one relationship is one. The cardinality of
one-to-many and many-to-many relationships is many. The cardinality
of one-to-fixed (one-to-four, for example) is four.
13. What is degree and what are the three cases of degree?
○ Number of entity types that participate in a relationship
○ Unary: a relationship between two instances of one entity
■ Ex: [Person] - <Is_married_to>
○ Binary: a relationship between the instances of two entity types
■ Ex: [Wife] - <Is_married_to> - [Husband]

○ Ternary: a simultaneous relationship among the instances of three
entity types
■ Ex: [Part]
|
[Vendor]---<Ships>---[Warehouse]
13. How do you represent total and partial relationship on an ER
diagram?
○ Total: a double line connecting from an entity to its
relationship
○ Partial: just a solid line connecting from an entity to its
relationship
14. What is the difference between Strong Entity and Weak Entity?
○ Strong: can exist on its own
○ Weak: needs a strong entity to depend on
○ Ex: tax payer is a strong entity and his/her dependent is a weak
entity.
○ Weak entities always participate totally.
○ Know how to represent strong and weak entities on ER diagram!
15. What is Integrity?
○ Integrity is a way to keep the consistency and quality of data.
○ For example, if an employee is entered with an employee ID value
of 123, the database should not permit another employee to have
an ID with with the same value 123.
○ There are different types of integrity:
■ Entity Integrity: integrity at the row level; every row must be
uniquely identifiable.
● Primary Key and Unique Key
■ Referential Integrity: integrity between related tables.
● Foreign Key
■ User-defined Integrity: integrity defined by users.
● Triggers, user-defined data types, check constraints.
16. What is an ER diagram?
○ a blueprint, or pictorial analysis, of a database.
○ a diagram that analyzes how a database can look on the physical
level.
○ components of an ER diagram would be:
■ entity
■ attribute
■ relationship
○ ER diagram is platform independent
○ English form of an ER diagram is called ‘relational schema’.
■ e.g. Empdetails( E#, Project#, Role, Number_Of_shares,
Share_worth)

17. What are the 4 golden rules to sketch an ER Diagram? (something
that David told us...)
○ Identify the business process to track.
○ Identify all the entities and attribute that interact with the
business process.
○ Identify the relationship between the entities.
○ Perform normalization.
18. What are OLTP and OLAP and what’s the difference between OLTP
and OLAP?
○ OLTP stands for Online Transactional Processing
○ OLAP stands for Online Analytical Processing

<OLTP & OLAP Difference Table>

○ Normalization Level: OLTP is highly normalized; OLAP is highly denormalized.
○ Usage: OLTP is used for a database; OLAP is used for a data warehouse.
○ Data type: OLTP holds current data; OLAP holds historical data.
○ Processing Speed: OLTP is fast for delta (DML) operations; OLAP is fast for read operations.
○ Purpose: OLTP is used to control and run business tasks; OLAP is used to help with analyzing
business information.
○ Size: OLTP has many small tables; OLAP has few big flat tables.
○ Operation: OLTP performs delta operations (UPDATE, INSERT, DELETE), aka DML; OLAP performs read
operations (SELECT).
○ Terms Used: OLTP uses table, columns and relationships; OLAP uses dimension table and fact table.

19. What is Normalization?


○ A step-by-step process to reduce the degree of data redundancy.
○ Breaking down one big flat table into multiple tables based on
normalization rules (see the sketch below).
○ It optimizes memory (storage), but not in terms of performance.
○ Normalization will get rid of insert, update and delete
anomalies.
○ Normalization will improve the performance of the delta operations
(aka DML operations): UPDATE, INSERT, DELETE

○ Normalization will reduce the performance of the read operation;
SELECT
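○ Ex (added sketch, not from the original answer; the table and column names are hypothetical,
written in the relational-schema form used elsewhere in this guide):
-- Before: one big flat table where department data repeats on every employee row
Employee(EmpID, EmpName, DeptName, DeptLocation)
-- After normalization: the repeating department data moves into its own table
Department(DeptID, DeptName, DeptLocation)
Employee(EmpID, EmpName, DeptID)   -- DeptID is a foreign key to Department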
20. What does functional dependency mean?
○ Functional dependency explains the relationship between a non-key
attribute and a key attribute within an entity.
○ If X and Y are attributes of the same entity and it is known
that Y is determined by X, we say X determines Y, or Y is functionally
dependent on X.
○ X -> Y is how you show that X determines Y, or that Y is functionally
dependent on X.
■ Ex: Marks -> Grade
○ X -> Y does not necessarily mean that Y -> X.
21. What are the different types of functional dependencies?
○ 1) Full: Y is fully functionally dependent on X if Y is
functionally dependent on X and not functionally dependent on any
subset of X.
○ Another definition: When a non-key attribute is dependent on all
the prime attributes.
■ Ex: Report(Student#, Course#, Lname, Room#, Marks)
Student#, Course# -> Marks shows full dependency.
Student#, Course# -> Lname does not show full dependency
because Lname is functionally dependent only on Student#.
○ 2) Partial: Y is partially dependent on X if Y is functionally
dependent only on a subset of X.
○ Another definition: When a non-key attribute is dependent on only
some of the prime attributes.
■ Ex: Report(Student#, Course#, Lname, Room#, Marks)
Student#, Course# -> Lname shows partial dependency because
Lname only depends on one prime attribute, Student#.
○ 3) Transitive: Z is transitively dependent on X if Y is
functionally dependent on X and Z is functionally dependent on Y.
○ Another definition: When a non-key attribute is dependent on a
non-key attribute, which is dependent on a key attribute.
■ X -> Y, Y -> Z gives X -> Z
■ BookTitle -> Author -> Author Nationality
BookTitle does not directly determine Author Nationality;
it does so only through Author.
22. What are the three degrees of normalization and how is
normalization done in each degree?
○ 1NF:
■ A table is in 1NF when:
● All the attributes are single-valued.

● With no repeating columns (in other words, there
cannot be two different columns with the same
information).
● With no repeating rows (in other words, the table
must have a primary key).
● All the composite attributes are broken down into its
minimal component.
● There should be SOME (full, partial, or transitive)
kind of functional dependencies between non-key and
key attributes.
■ 99% of times, it’s usually 1NF.
○ 2NF
■ A table is in 2NF when:
● It is in 1NF.
● There should not be any partial dependencies so they
must be removed if they exist.
○ 3NF
■ A table is in 3NF when:
● It is in 2NF.
● There should not be any transitive dependencies so
they must be removed if they exist.
○ BCNF
■ A stronger form of 3NF so it is also known as 3.5NF
■ We do not need to know much about it. Just know that here
you compare between a prime attribute and a prime attribute
and a non-key attribute and a non-key attribute.
23. What are the different types of design level you can work on in
ERWin?
○ Logical
■ For reverse engineering
■ Cannot forward engineer on this level
○ Physical
■ For forward engineering
■ Can select what kind of platform you want to use
○ Logical/Physical
■ Can do both logical and physical
■ Always pick this option!
24. What are the kinds of relationship?
○ Identifying relationship: Foreign key being part of its primary
key
○ Non-identifying relationship: Foreign key not being part of its
primary key
25. Explain about ERwin.

○ ERwin is a data modeling tool that can be used to create an ER
diagram in the conceptual and logical level.
○ ERwin has the features of...
■ Logical Design
■ Physical Design
■ Logical-to-Physical Transformation
■ Forward Engineering (top-down)
■ Reverse Engineering (bottom-up)
○ Some notes about ERWin:
■ Always try to use non-identifying relationships unless you
have to create a weak entity.
■ When creating a many-to-many relationship in the logical
design, you will be able to see its junction table when
you transform it into the physical design.
■ You should also check and change the data types when you
reach the physical level.

Model 2 - TSQL

● Some Notes about TSQL
○ TSQL stands for Transact Structured Query Language.
○ TSQL is not case-sensitive but you can make it so.
○ Every database has the same folders: Database Diagrams, Tables,
Views, Synonyms, Programmability, Service Broker, Storage,
Security
○ TSQL servers are parallel, which means you can have a lot of
users at the same time (See Q3 for more information).
○ Every time you run a query, make sure that you have selected the
correct database.
○ There are two ways to make a comment.
■ 1. -- comment
■ 2. /* block comment */
○ DDL stands for Data Definition Language: CREATE, ALTER, DROP
○ DML stands for Data Manipulation Language: INSERT, UPDATE, DELETE
○ To access the log file:
■ Right click on the database whose log file you want to
access to
■ Click on Properties
■ Go to Files
○ Something + NULL = NULL
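○ Ex (added sketch, not from the original notes; a quick illustration of the NULL rule above):
SELECT 10 + NULL              -- returns NULL
SELECT 'abc' + NULL           -- returns NULL
SELECT ISNULL(NULL, 0) + 10   -- use ISNULL()/COALESCE() to substitute a value; returns 10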

1. What are the two types of authentication on SQL Server 2008 ?


○ Windows: Maps to the current Windows user. In other words, you can
get on the server as long as you are logged into Windows. Uses the
Windows credentials.
○ SQL Server: Available when mixed mode authentication is enabled. The
server then accepts both Windows credentials and SQL Server logins (a
username and password stored in SQL Server).
2. What are the different database objects ?
○ There are total seven database objects (6 permanent database
object + 1 temporary database object)
■ Permanent DB objects
● Table
● Views
● Stored procedures
● User-defined Functions
● Triggers
● Indexes
■ Temporary DB object
● Cursors
3. What is the difference between a connection and session ?

○ Connection: It is the number of instances connected to the
database. An instance is established as soon as the application is
opened. A maximum of 49 connections is allowed in SQL Server 2008.
○ Session: A session runs queries. SQL Server 2008 allows 10 million
sessions in one connection.
4. What is the architecture of SQL Servers in terms of its hierarchy?
Server
|
Database
|
Schema
|
Tables
|
Columns
○ A server has multiple databases and a database has multiple
schemas and so on and on...
5. What is a constraint and what are the seven constraint ?
○ Constraint: a rule that limits the values that can be stored in a column or table.
○ 1. Primary key
○ 2. Foreign key
○ 3. Check
■ Ex: check if the salary of employees is over 40,000
○ 4. Default
■ Ex: If the salary of an employee is missing, place it with
the default value.
○ 5. Nullability
■ NULL or NOT NULL
○ 6. Unique Key
○ 7. Surrogate Key
■ mainly used in data warehouse
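○ Ex (added sketch, not from the original answer; the Employee table, its columns, and the
constraint names are hypothetical):
CREATE TABLE Employee (
    EmpID  INT IDENTITY(1,1) NOT NULL,   -- nullability constraint (surrogate key column)
    Name   VARCHAR(50)       NOT NULL,
    Salary MONEY             NOT NULL
        CONSTRAINT DF_Employee_Salary DEFAULT (40000)         -- default constraint
        CONSTRAINT CK_Employee_Salary CHECK (Salary >= 40000) -- check constraint
)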
6. Give some commonly used data types in SQL Server
○ INT
■ 4 bytes
○ BIGINT
■ 8 bytes
○ SMALLINT
■ 2 bytes
○ TINYINT
■ 1 byte
○ CHAR()
■ 1 byte/character
■ fixed memory allocation

○ VARCHAR()
■ 1 byte/character
■ variable memory allocation
○ DATETIME
■ 8 bytes
■ stores DATE and TIME information
○ MONEY
■ 8 bytes
■ stores monetary information
○ NCHAR/NVARCHAR
■ 2 bytes/character
■ Unicode data types (e.g. for data coming from Excel files)
○ TEXT
■ whole chunk of information
○ FLOAT/NUMERIC/DECIMAL
■ 8 bytes for FLOAT; NUMERIC/DECIMAL storage varies with precision
■ Ex: NUMERIC(6,2)
● 6 is the precision (total number of digits) and 2 is the
scale (number of digits after the decimal point).
● Ex: 1234.56
7. How differently would you use CHAR() and VARCHAR()?
○ VARCHAR() is used when you are not sure about the maximum length
of its value (for variable length). VARCHAR() is also used for
saving memories as well.
■ Ex: first name, last name, email...
○ CHAR() is used when you know the fixed length of the values.
■ Ex: SSN, Phone Number...
○ You can also use VARCHAR(MAX), which can contain up to 2 GB of data.
8. What is a lightweight primary key ?
○ It is a unique key with NOT NULL constraint.
○ More efficient than a primary key because a primary key creates a
unique clustered index, which takes a lot of maintenance. On
the other hand, a lightweight primary key, being a unique key,
creates a unique non-clustered index, which does not require as much
maintenance.
○ Can be used as an alternate of regular primary keys and used for
small and simple DB.
9. What is the command GO in SQL 2008 ?
○ GO is used to separate a script into batches.
○ It is an internal way of refreshing SQL Server with the help of a
TSQL script.
○ It is highly recommended to use a GO statement after every DDL
statement.
○ It can also be used as a loop. For example:

■ INSERT INTO Table1 VALUES (‘Jason’)
■ GO 5
■ This will insert value ‘Jason’ into Table1 five times.
10. What are the three different types of methods of creating new
tables?
○ Method 1 (Best): Developers have full control over constraint
names by creating a table first and adding constraints later using
ALTER TABLE. No particular order of table creation is required
(no code dependency). ERWin uses this method.
○ Method 2 (Not as good): This method lets SQL Server assign constraint
names automatically, so once you have a lot of
constraints, it will be hard to keep track of them. The order of
creating tables is important (code dependency). Since constraint
names are auto-generated, you have to specifically find the
constraints in the Object Explorer and delete them.
○ Method 3 (Better): This method is a hybrid of method 1 and 2. The
developers have control over constraint names, but the order of
table creation is still important (code dependency exists here as
well). Constraints are created at the end of column definition.
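○ Ex (added sketch, not from the original answer; the Dept/Emp table names and constraint names
are hypothetical):
-- Method 1: create the tables first, then add named constraints with ALTER TABLE
CREATE TABLE Dept (DeptID INT NOT NULL, DeptName VARCHAR(50))
CREATE TABLE Emp  (EmpID INT NOT NULL, DeptID INT)

ALTER TABLE Dept ADD CONSTRAINT PK_Dept PRIMARY KEY (DeptID)
ALTER TABLE Emp  ADD CONSTRAINT PK_Emp PRIMARY KEY (EmpID)
ALTER TABLE Emp  ADD CONSTRAINT FK_Emp_Dept FOREIGN KEY (DeptID) REFERENCES Dept(DeptID)

-- Method 3: named constraints defined with the columns; Dept must already exist (code dependency)
CREATE TABLE Emp2 (
    EmpID  INT NOT NULL CONSTRAINT PK_Emp2 PRIMARY KEY,
    DeptID INT CONSTRAINT FK_Emp2_Dept FOREIGN KEY REFERENCES Dept(DeptID))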
11. How can we change the name of a column with SQL request ?
○ Using the system stored procedure sp_rename.
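○ Ex (added sketch, not from the original answer; the table and column names are hypothetical):
-- rename column EmpName of dbo.Employee to FullName
EXEC sp_rename 'dbo.Employee.EmpName', 'FullName', 'COLUMN'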
12. What is a Surrogate Key ?
○ ‘Surrogate’ means ‘Substitute’.
○ Surrogate key is always implemented with a help of an identity
column.
○ An identity column is a column in which the values are automatically
generated by SQL Server based on a seed value and an increment
value.
○ Identity columns are usually INT (any exact numeric type with a scale
of 0, such as TINYINT, SMALLINT, INT, BIGINT or DECIMAL, can be used),
so surrogate keys are typically INT.
○ Identity columns cannot have any NULL and cannot have repeated
values.
○ Surrogate key is a logical key.
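○ Ex (added sketch, not from the original answer; the Customer table and its columns are hypothetical):
CREATE TABLE Customer (
    CustomerSK INT IDENTITY(1,1) NOT NULL,  -- surrogate key: seed 1, increment 1
    CustomerID VARCHAR(10),                 -- natural/business key
    Name       VARCHAR(50))

INSERT INTO Customer (CustomerID, Name) VALUES ('C001', 'Ally')  -- CustomerSK is generated automatically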
13. What is a derived column , hows does it work , how it affects
the performance of a database and how can it be improved?
○ A derived column is a new column that is generated on the fly by
applying expressions to input columns.
Ex: FirstName + ' ' + LastName AS 'Full name'
○ Derived columns affect the performance of the database due to
the creation of a temporary new column.
○ The execution plan can save the new column to give better performance
next time.
14. What is a LIKE operator and what are the four wildcard?

○ The LIKE operator is used to search for a specified pattern in a
string column.
1. % - matches zero, one or many characters or digits
2. _ - matches exactly one character or digit
3. [^ ] - matches any one character or digit that is NOT in the set or
range inside the brackets
○ e.g. '[^a]bc' matches three-character strings ending in 'bc' whose
first character is not 'a'
4. [ ] - matches any one character or digit in the set or range inside the
brackets.
○ '[abc][d-f][g-hxyz]' will look for 3 characters where
the first character can only be either a, b, or c and
the second character will be d through f and the third
character has to be g through h or x, y, or z.
○ Ex:
-- Employees whose title starts with 'E'
SELECT * FROM HumanResources.Employee
WHERE Title LIKE 'E%'

-- Employees whose 2nd character is 'u' and have at least


two 's's in the title with third last character 'a'
SELECT * FROM HumanResources.Employee
WHERE Title LIKE '_u%a__' AND
Title LIKE '%s%s%'

15. What is an ESCAPE key and how would you use it?
○ While using the keyword LIKE you might want to search a wild
card. In order to search the actual wildcard character in
a string you have to use an escape character before it. In
SQL Server you must define your own escape key (which can be
character)
○ Ex:
-- Get me info of all employees who has '%' in their title
(of course nobody has % in their title)
SELECT * FROM HumanResources.Employee
WHERE Title LIKE '%#%%' ESCAPE '#'

17. What are some DATE/TIME functions to remember?


GETDATE(): gets the current date and time
SELECT GETDATE() -- current datetime
SELECT YEAR(GETDATE()) -- get only the year part of it
SELECT YEAR('1988-08-13') -- 1988
SELECT MONTH(GETDATE()) -- get only the month number part
SELECT DAY(GETDATE()) -- get only the day

DATEPART(interval, date): allows you to extract any portion of
the date
SELECT DATEPART(hour, GETDATE())
SELECT DATEPART(minute, GETDATE())
SELECT DATEPART(second, GETDATE())

DATEDIFF(interval, start_date, end_date): returns the difference


of two date portions
SELECT EmployeeID, HireDate, DATEDIFF(YEAR, HireDate, GETDATE())
AS Experience
FROM HumanResources.Employee

DATEADD(interval, increment, date): adds a number to a date portion
SELECT DATEADD(MONTH, 5, GETDATE())
SELECT DATEADD(YEAR, 8, GETDATE())

17. What are some String functions to remember?


LEN(string): returns the length of string
SELECT LastName, LEN(LastName)

UPPER(string) & LOWER(string): returns its upper/lower string


SELECT LastName, UPPER(LastName), LOWER(LastName)
FROM Person.Contact

LTRIM(string) & RTRIM(string): remove empty string on either ends


of the string
SELECT LTRIM(' xxx') -- left side
SELECT RTRIM('xxx ') -- right side
SELECT LTRIM(RTRIM(' xxx ')) -- both sides

LEFT(string, n): extracts the first n characters from the left side of the string
SELECT LastName, LEFT(LastName,2) -- will return first 2
characters of LastName
FROM Person.Contact

RIGHT(string, n): extracts the last n characters from the right side of the string
SELECT LastName, RIGHT(LastName,2) -- - will return last 2
characters of LastName
FROM Person.Contact

SUBSTRING(string, starting_position, length): returns the
substring of the string
SELECT LastName, SUBSTRING(LastName, 2, 3)
FROM Person.Contact
-- Remember that the last argument is the length of the
substring, -- not the ending character position!!!

REVERSE(string): returns the reverse string of the string


SELECT LastName, REVERSE(LastName)
FROM Person.Contact

Concatenation: Just use + sign for it


SELECT FirstName + ' ' + ISNULL(MiddleName,'') + ' ' + LastName
AS FullName
FROM Person.Contact

REPLACE(string, string_replaced, string_replace_with)


SELECT LastName, REPLACE(LastName, 'a', 'X') AS Temp <-- derived
column so no changes done to the table
FROM Person.Contact
PATINDEX('%pattern%', expression): Returns the starting position
of the first occurrence of a pattern in a specified expression,
or zeros if the pattern is not found, on all valid text and
character data types.
-PAT index helps find patterns within a chunk of a string by
allowing the use of wildcards
SELECT DocumentID, DocumentSummary,
PATINDEX('%bicycle%',DocumentSummary)
FROM Production.Document

CHARINDEX(expressionToFind, expressionToSearch [, start_location]


): Searches an expression for another expression and returns its
starting position if found.
-CHARINDEX has to find an exact match as it doesn’t accept
wildcards
SELECT DocumentID, DocumentSummary, CHARINDEX('bicycle',
DocumentSummary)
FROM Production.Document

18. What are the two data conversion functions?


CAST(<column to convert> AS datatype)

SELECT CAST(123 AS VARCHAR(10)) + 'a' -- makes 123 as a string so
the datatypes of 123 and 'a' are matching.

CONVERT(datatype, <column>)
SELECT CONVERT(VARCHAR(10), 123) + 'a'
SELECT CONVERT(INT, '3214') + 1

CONVERT function has an optional third parameter which is the


format code.
SELECT CONVERT(VARCHAR(50), GETDATE()) -- convert date to string
SELECT CONVERT(VARCHAR(50), GETDATE(), 101) -- convert date to
string
SELECT CONVERT(VARCHAR(50), GETDATE(), 107) -- convert date to
string

19. What are the five aggregate functions and what are the rules of
them?
● SUM, MIN, MAX, AVG, COUNT
● 1. If a non-aggregate column is selected along with aggregate
function then all non-aggregate columns MUST be included in GROUP
BY clause.
● 2. All the columns included in GROUP BY clause may/may not be in
SELECT predicate.
● 3. Generally, HAVING is used in conjunction with GROUP BY.
● 4. If I am using GROUP BY clause I may/may not use HAVING clause.
● 5. HAVING clause is used to filter data based on aggregate
functionality.
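● Ex (added sketch, not from the original answer; it reuses the Sales.SalesOrderHeader table
referenced elsewhere in this guide, and the 1000000 threshold is arbitrary):
SELECT SalesPersonID, COUNT(*) AS NumOrders, SUM(TotalDue) AS TotalSales
FROM Sales.SalesOrderHeader
WHERE SalesPersonID IS NOT NULL        -- non-aggregate filter
GROUP BY SalesPersonID                 -- every non-aggregate column in SELECT is here
HAVING SUM(TotalDue) > 1000000         -- aggregate filter
ORDER BY TotalSales DESC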

20. How do you create a Computed Column on SQL?


Format: CREATE TABLE Tablename (
Name_of_Computed_Column AS Formula)
Ex: CREATE TABLE Human (
Name VARCHAR(50),
DOB DATE,
AGE AS DATEDIFF(YEAR, DOB, GETDATE()))
21. What is the SQL server query execution sequence?
○ FROM -> goes to secondary files via the primary file
○ WHERE -> applies filter condition (non-aggregate column)
○ GROUP BY -> groups data according to grouping predicate
○ HAVING -> applies filter condition (aggregate function)
○ SELECT -> dumps data in the tempDB system database
○ ORDER BY -> sorts data ascending/descending
22. What does ORDER BY do?

○ ORDER BY sorts your result based on some columns
23. What does TOP do?
○ TOP gets a list of first n records.
24. What does DISTINCT do?
○ DISTINCT gets rid of duplicates and returns only unique data
entries.
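○ Ex (added sketch, not from the original answers; it combines ORDER BY, TOP and DISTINCT on a
table used elsewhere in this guide):
SELECT DISTINCT TOP 5 Title
FROM HumanResources.Employee
ORDER BY Title ASC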
25. What is a SELECT INTO statement and how does it work?
○ It is a way to copy a table to a new table.
○ It first creates a table with the schema same as the result set.
○ Secondly, it loads a table with the result set.
○ Ex:
SELECT SalesPersonID, SUM(TotalDue) AS 'Total Sales'
INTO NewDB.dbo.Sales_Aggregation -- New table here
FROM Sales.SalesOrderHeader
WHERE SalesPersonID IS NOT NULL
GROUP BY SalesPersonID
ORDER BY SalesPersonID
26. How do you copy just the structure of a table?
SELECT *
INTO NewDB.dbo.Emp_Structure
FROM HumanResources.Employee
WHERE 1=0 -- Put any condition that does not make any
sense.
27. What are the different types of Joins?
○ INNER JOIN: Gets all the matching records from both the left and
right tables based on joining columns.
○ LEFT OUTER JOIN: Gets all non-matching records from the left table
and one copy of matching records from both tables based on
the joining columns.
○ RIGHT OUTER JOIN: Gets all non-matching records from the right table
and one copy of matching records from both tables based on
the joining columns.
○ FULL OUTER JOIN: Gets all non-matching records from the left table,
all non-matching records from the right table, and one copy of matching
records from both tables.
○ CROSS JOIN: returns the Cartesian product.
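○ Ex (added sketch, not from the original answer; it reuses the Employee and Contact tables
referenced elsewhere in this guide):
-- INNER JOIN: only employees that have a matching contact
SELECT E.EmployeeID, C.FirstName, C.LastName
FROM HumanResources.Employee AS E
INNER JOIN Person.Contact AS C
    ON E.ContactID = C.ContactID

-- LEFT OUTER JOIN: every contact, with employee info where it exists (NULLs otherwise)
SELECT C.ContactID, C.LastName, E.EmployeeID
FROM Person.Contact AS C
LEFT OUTER JOIN HumanResources.Employee AS E
    ON E.ContactID = C.ContactID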
28. What are the different types of Restricted Joins?
○ SELF JOIN: joining a table to itself
○ RESTRICTED LEFT OUTER JOIN: gets all non-matching records from
left side
○ RESTRICTED RIGHT OUTER JOIN - gets all non-matching records from
right side

○ RESTRICTED FULL OUTER JOIN - gets all non-matching records from
left table & gets all non-matching records from right table.
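○ Ex (added sketch, not from the original answer; a restricted LEFT OUTER JOIN keeps only the
non-matching rows from the left side):
SELECT C.ContactID, C.LastName
FROM Person.Contact AS C
LEFT OUTER JOIN HumanResources.Employee AS E
    ON E.ContactID = C.ContactID
WHERE E.EmployeeID IS NULL   -- keeps only contacts that are not employees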
29. What is a sub-query?
○ It is a query within a query
○ Syntax:
SELECT <column_name> FROM <table_name>
WHERE <column_name> IN/NOT IN
(
<another SELECT statement>
)
○ Everything that we can do using sub queries can be done using
Joins, but anything that we can do using Joins may/may not be
done using Subquery.
○ Sub-Query consists of an inner query and outer query. Inner query
is a SELECT statement the result of which is passed to the outer
query. The outer query can be SELECT, UPDATE, DELETE. The result
of the inner query is generally used to filter what we select
from the outer query.
○ We can also have a subquery inside of another subquery and so
on. This is called a nested Subquery. Maximum one can have is 32
levels of nested Sub-Queries.
31. What are the SET Operators?
○ SQL set operators allows you to combine results from two or more
SELECT statements.
○ Syntax:
SELECT Col1, Col2, Col3 FROM T1
<SET OPERATOR>
SELECT Col1, Col2, Col3 FROM T2
○ Rule 1: The number of columns in first SELECT statement must be
same as the number of columns in the second SELECT statement.
○ Rule 2: The metadata of all the columns in first SELECT statement
MUST be exactly same as the metadata of all the columns in second
SELECT statement accordingly.
○ Rule 3: An ORDER BY clause can be used only once, after the last
SELECT statement; it sorts the combined result.
○ UNION, UNION ALL, INTERSECT, EXCEPT
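○ Ex (added sketch, not from the original answer; it reuses the Person.Contact table referenced
elsewhere in this guide):
SELECT FirstName FROM Person.Contact
UNION           -- removes duplicates
SELECT LastName FROM Person.Contact

SELECT FirstName FROM Person.Contact
UNION ALL       -- keeps duplicates (faster)
SELECT LastName FROM Person.Contact
ORDER BY FirstName   -- ORDER BY appears once, at the very end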
31. What is a derived table?
○ SELECT statement that is given an alias name and can now
be treated as a virtual table and operations like joins,
aggregations, etc. can be performed on it like on an actual
table.
○ Scope is query bound, that is a derived table exists only in the
query in which it was defined.
SELECT temp1.SalesOrderID, temp1.TotalDue FROM

(SELECT TOP 3 SalesOrderID, TotalDue
FROM Sales.SalesOrderHeader
ORDER BY TotalDue DESC) AS temp1
LEFT OUTER JOIN
(SELECT TOP 2 SalesOrderID, TotalDue
FROM Sales.SalesOrderHeader
ORDER BY TotalDue DESC) AS temp2
ON temp1.SalesOrderID = temp2.SalesOrderID
WHERE temp2.SalesOrderID IS NULL
32. What is a View?
○ Views are database objects which are virtual tables whose
structure is defined by underlying SELECT statement and is mainly
used to implement security at rows and columns levels on the base
table.
○ One can create a view on top of other views.
○ View just needs a result set (SELECT statement).
○ We use views just like regular tables when it comes to query
writing. (joins, subqueries, grouping....)
○ We can perform DML operations (INSERT, DELETE, UPDATE) on a view.
They actually affect the underlying tables; only the columns that are
visible in the view can be affected.
33. What are the types of views?
○ Regular View: It is a type of view in which you are free to make
any DDL changes on the underlying table.
○ Ex:
CREATE TABLE T1 (
ID INT,
Name VARCHAR(20))

INSERT INTO T1 VALUES (1,'a'),(2,'b'),(3,'c'),(4,'d')

-- create a regular view


CREATE VIEW v_regular AS
SELECT * FROM T1

-- display a view
SELECT * FROM v_regular

-- use a DROP statement (one of the DDL operations) to the


-- underlying table
DROP TABLE T1

-- display a view again

SELECT * FROM v_regular -- but it will throw an error

○ Schemabinding View: It is a type of view in which the schema


of the view (column) are physically bound to the schema of the
underlying table. We are not allowed to perform any DDL changes
to the underlying table for the columns that are referred by the
schemabinding view structure.
■ All objects in the SELECT query of the view must
be specified in two part naming conventions
(schema_name.tablename).
■ You cannot use * operator in the SELECT query inside the
view (individually name the columns)
■ All rules that apply for regular view.
○ Ex:
CREATE TABLE T2 (
ID INT,
Name VARCHAR(20))

INSERT INTO T2 VALUES (1,'a'),(2,'b'),(3,'c'),(4,'d')

CREATE VIEW v_schemabound WITH SCHEMABINDING AS


SELECT ID, Name
FROM dbo.T2 -- remember to use two part naming convention

DROP TABLE T2 -- will throw an error since it’s a


-- schemabinding view
○ Indexed View: See Problem 89.
35. What does WITH CHECK do?
○ WITH CHECK is used with a VIEW.
○ It is used to restrict DML operations on the view according to the
search predicate (WHERE clause) specified when creating the view.
○ Users cannot perform any DML operations that do not satisfy the
conditions in WHERE clause while creating a view.
○ WITH CHECK OPTION has to have a WHERE clause.
○ Ex:
CREATE VIEW v_check AS
SELECT * FROM Test
WHERE ID BETWEEN 1 AND 10
WITH CHECK OPTION

INSERT INTO Test VALUES (10,'hh','NY')


-- following will throw an error
INSERT INTO Test VALUES (99,'Jihoon','Norfolk')

35. What is a RANKING function and what are the four RANKING
functions?
○ Ranking functions are used to give some ranking numbers to each
row in a dataset based on some ranking functionality.
○ Every ranking function creates a derived column which has integer
value.
○ Different types of RANKING function:
■ ROW_NUMBER(): assigns an unique number based on the
ordering starting with 1. Ties will be given different
ranking positions.
■ RANK(): assigns an unique rank based on value. When the set
of ties ends, the next ranking position will consider how
many tied values exist and then assign the next value a
new ranking with consideration the number of those previous
ties. This will make the ranking position skip placement
position numbers based on how many of the same values
occurred (ranking not sequential).
■ DENSE_RANK(): same as rank, however it will maintain its
consecutive order nature regardless of ties in values;
meaning if five records have a tie in the values, the
next ranking will begin with the next ranking position.
(sequential)
See result set table for example:
Name Tips Made Row Number Rank Dense Rank

Ally $10 1 1 1

Ben $10 2 1 1

Cathy $20 3 3 2

David $20 4 3 2

Edward $30 5 5 3

Frank $30 6 5 3

■ Syntax:
<Ranking Function>() OVER(condition for ordering)
-- always have to have an OVER clause
■ Ex:
SELECT SalesOrderID,
SalesPersonID,
TotalDue,
ROW_NUMBER() OVER(ORDER BY TotalDue),

RANK() OVER(ORDER BY TotalDue),
DENSE_RANK() OVER(ORDER BY TotalDue)
FROM Sales.SalesOrderHeader
■ NTILE(n): Distributes the rows in an ordered partition into
a specified number of groups.
36. What is PARTITION BY?
○ Creates partitions within the same result set and each partition
gets its own ranking. That is, the rank starts from 1 for each
partition.
○ Ex:
SELECT *, DENSE_RANK() OVER(PARTITION BY Country ORDER BY
Sales DESC) AS DenseRank
FROM SalesInfo
37. What is Temporary Table and what are the two types of it?
○ They are tables just like regular tables, but the main difference
is their scope.
○ The scope of temp tables is temporary, whereas regular tables
reside permanently.
○ Temporary tables are stored in tempDB.
○ We can do all kinds of SQL operations with temporary tables just
like regular tables like JOINs, GROUPING, ADDING CONSTRAINTS,
etc.
○ Two types of Temporary Table
■ Local
● #LocalTempTableName -- single pound sign
● Only visible in the session in which they are
created.
● It is session-bound.
■ Global
● ##GlobalTempTableName -- double pound sign
● Global temporary tables are visible to all sessions
after they are created, and are deleted when
the session in which they were created in is
disconnected.
● It is last logged-on user bound. In other words, a
global temporary table will disappear when the last
user on the session logs off.
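○ Ex (added sketch, not from the original answer; the table names are hypothetical):
-- Local temp table: visible only in the session that creates it
CREATE TABLE #Emp (ID INT, Name VARCHAR(50))
INSERT INTO #Emp VALUES (1,'a'),(2,'b')
SELECT * FROM #Emp

-- Global temp table: visible to all sessions until the creating session disconnects
CREATE TABLE ##EmpGlobal (ID INT, Name VARCHAR(50))

DROP TABLE #Emp
DROP TABLE ##EmpGlobal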
38. Explain Variables.
○ Variable is a memory space (place holder) that contains a scalar
value EXCEPT table variables, which is 2D data.
○ Variable in SQL Server are created using DECLARE Statement.
○ Variables are BATCH-BOUND.
○ Variables that start with @ are user-defined variables.

○ Variables that start with @@ are system variables.
○ Syntax: DECLARE @var INT
○ Ex:
DECLARE @var INT -- declare an integer
SET @var = 10 -- insert a value into a variable
SELECT @var -- reading a variable

○ Assigning values using a SELECT statement:


DECLARE @sum MONEY, @avg MONEY, @max MONEY  -- variables must be declared first
SELECT @sum = SUM(TotalDue),
       @avg = AVG(TotalDue),
       @max = MAX(TotalDue)
FROM Sales.SalesOrderHeader

-- I can do this as well with variables.


SELECT @sum + @avg + @max AS SumOfEverything

○ Ex: Write a select statement that will display the entire


transaction record where total due is equal to the maximum
account.
SELECT *
FROM Sales.SalesOrderHeader
WHERE TotalDue = @max
39. Explain Dynamic SQL (DSQL).
○ Dynamic SQL refers to code/script which can be used to operate
on different data-sets based on some dynamic values supplied by
front-end applications. It can be used to run a template SQL
query against different tables/columns/conditions.
○ Declare variables: which makes SQL code dynamic
○ Main disadvantage of D-SQL is that we are opening SQL Server for
SQL Injection attacks.
○ You should build the SQL script by concatenating strings and
variable.
○ Ex:
DECLARE @list VARCHAR(100),
@table VARCHAR(100),
@where VARCHAR(100),
@startingRange INT,
@endingRange INT,
@sql VARCHAR(300)
SET @table = 'HumanResources.Employee'
SET @list = 'EmployeeID, Title, HireDate'
SET @where = 'EmployeeID'
SET @startingRange = 100
SET @endingRange = 200
SET @sql = 'SELECT ' + @list +
' FROM ' + @table +
' WHERE ' + @where +
' BETWEEN ' + CONVERT(VARCHAR(5), @startingRange) +
' AND ' + CONVERT(VARCHAR(5), @endingRange)
-- PRINT @sql
EXEC(@sql)
40. What is SQL Injection Attack?
○ Moderator’s definition: when someone is able to write code at
the front end using DSQL, he/she could use malicious code to
drop, delete, or manipulate the database. There is no perfect
protection from it, but we can check whether certain commands
such as 'DROP' or 'DELETE' are included in the command line.
○ SQL Injection is a technique used to attack websites by inserting
SQL code in web entry fields.
41. What is SELF JOIN?
○ JOINing a table to itself
○ When it comes to SELF JOIN, the foreign key of a table points to
its primary key.
○ Ex: Employee(Eid, Name, Title, Mid)
○ Know how to implement it!!!
42. What is Correlated Subquery?
○ It is a type of subquery in which the inner query depends on
the outer query. This means that the subquery is executed
repeatedly, once for each row of the outer query.
○ In a regular subquery, inner query generates a resultset that is
independent of the outer query.
○ Ex:
SELECT *
FROM HumanResources.Employee E
WHERE 5000 IN (SELECT S.Bonus
FROM Sales.SalesPerson S
WHERE S.SalesPersonID = E.EmployeeID)
○ The performance of Correlated Subquery is very slow because its
inner query depends on the outer query. So the inner subquery
goes through every single row of the result of the outer
subquery.
43. What is the difference between Regular Subquery and Correlated
Subquery?
○ Based on the explanation of Problem 42, an inner subquery is
independent from its outer subquery in Regular Subquery. On the
other hand, an inner subquery depends on its outer subquery in

Correlated Subquery.

44. What are the differences between DELETE and TRUNCATE?


○ Statement (or Operation): DELETE is a DML statement that deletes rows
from a table and can specify rows using a WHERE clause; TRUNCATE is a
DDL statement that wipes out the entire table, and you cannot delete
specific rows.
○ Logging: DELETE logs every row deleted in the log file; TRUNCATE does
minimal logging (it does not log every row) and removes the pointers
that point to the table's pages, which are deallocated.
○ Performance: DELETE is slower since it records every row that is
deleted; TRUNCATE is faster since it does not record each row in the
log file.
○ Identity Column: DELETE continues using the earlier max value of the
identity column; TRUNCATE resets the identity column.
○ Triggers: you can have triggers on DELETE; you cannot have triggers
on TRUNCATE.

45. What are the three different types of Control Flow statements?
○ 1. WHILE
■ Syntax:
WHILE (condition)
BEGIN
statement
END
■ Ex: Print the counter variable
DECLARE @counter INT
SET @counter = 0

WHILE (@counter <= 10)


BEGIN
PRINT @counter
SET @counter = @counter + 1
END
○ 2. IF-ELSE
■ Syntax:
IF (condition)
BEGIN

statement
END
ELSE IF (condition)
BEGIN
statement
END
ELSE
BEGIN
statement
END
■ Ex: Check if the variable is even or odd
DECLARE @x INT = 3

IF (@x % 2 = 0)
BEGIN
PRINT 'even'
END
ELSE
BEGIN
PRINT 'odd'
END
○ 3. CASE
■ Evaluates a list of conditions and returns one of multiple
possible result expression.
■ Used in a SELECT statement
■ Syntax:
CASE
WHEN <condition 1> THEN <output1>
WHEN <condition 2> THEN <output2>
...
WHEN <condition n-1> THEN <output n-1>
ELSE <output n>
END
■ Ex: Defining levels of employees based on their
experiences.
SELECT E.EmployeeID,
E.HireDate,
DATEDIFF(YEAR, E.HireDate, GETDATE()) AS
Experience,
CASE
WHEN DATEDIFF(YEAR, E.HireDate, GETDATE())
BETWEEN 0 AND 5 THEN 'Level 1'

WHEN DATEDIFF(YEAR, E.HireDate, GETDATE())
BETWEEN 6 AND 10 THEN 'Level 2'
ELSE 'Level 3' -- WHEN DATEDIFF(YEAR,
E.HireDate, GETDATE()) > 10 THEN 'Level 3'
END AS 'Level'
FROM HumanResources.Employee AS E
46. What is Table Variable? Explain its advantages and
disadvantages.
○ If we want to store tabular data in the form of rows and columns
into a variable then we use a table variable.
○ It is able to store and display 2D data (rows and columns).
○ We cannot perform DDL (CREATE, ALTER, DROP).
○ Advantages:
■ Table variables can be faster than permanent tables.
■ Table variables need less locking and logging resources.
○ Disadvantages:
■ Scope of Table variables is batch bound.
■ Table variables cannot have constraints.
■ Table variables cannot have indexes.
■ Table variables do not generate statistics.
■ Cannot ALTER once declared (Again, no DDL statements).
○ Ex:
-- Declare a table variable
DECLARE @tbl TABLE(id INT, name VARCHAR(50))

-- DML operations are allowed on a table variable!


INSERT INTO @tbl VALUES (1,'a'),(2,'b'),(3,'c')

-- declare a table variable again


DECLARE @emp TABLE (eid INT, cid INT)

-- inserting into a table variable


INSERT INTO @emp
SELECT EmployeeID, ContactID
FROM HumanResources.Employee

-- can do JOINs as well!


SELECT *
FROM @emp AS E
INNER JOIN Person.Contact AS C
ON E.cid = C.ContactID

47. What are the differences between Temporary Table and Table
Variable?
○ Statement (or Operation): a temporary table can be used with both DML
and DDL statements; a table variable can be used only with DML, not DDL.
○ Scope: a temporary table is session bound; a table variable is batch
bound.
○ Syntax: CREATE TABLE #temp vs. DECLARE @var TABLE(...)
○ Index: a temporary table can have indexes; a table variable cannot
have indexes.

48. Explain Execution Plan.


○ Query optimizer is a part of SQL server that models the way
in which the relational DB engine works and comes up with the
most optimal way to execute a query. Query Optimizer takes into
account amount of resources used, I/O and CPU processing time
etc. to generate a plan that will allow the query to execute in the most
efficient and fastest manner. This is known as the EXECUTION PLAN.
○ The Optimizer evaluates a number of available plans before choosing
the best and fastest one available.
○ Every query has an execution plan.
○ Definition by the mod: Execution Plan is a plan to execute a
query with the most optimal way which is generated by Query
Optimizer. Query Optimizer analyzes statistics, resources used,
I/O and CPU processing time and etc. and comes up with a number
of plans. Then it evaluates those plans and the most optimized
plan out of the plans is Execution Plan. It is shown to users as
a graphical flow chart that should be read from right to left and
top to bottom.
49. What is Stored Procedure (SP)?
○ It is one of the permanent DB objects that is precompiled set of TSQL
statements that can accept and return multiple variables.
○ It is used to implement the complex business process/logic. In other
words, it encapsulates your entire business process.
○ The compiler breaks the query into tokens, which are passed on to the
query optimizer, where the execution plan is generated the very first
time we execute a stored procedure after creating/altering it; the same
execution plan is utilized for subsequent executions.
○ The database engine runs the machine-language query and executes the
code in 0's and 1's.
○ When a SP is created, all TSQL statements that are part of the SP are
pre-compiled and the execution plan is stored in the DB, which is
referred to for the following executions.
○ Explicit DDL requires recompilation of SPs.

○ Syntax:
-- Syntax for creating SP
CREATE PROC <proc_name> (@para1 INT, @para2 ...) AS
<any TSQL statements>

-- Syntax for executing SP


EXEC <proc_name> ‘para1’,’para2’ -- again, when you first
-- execute this SP for
-- the first time, it will
-- create its execution
-- plan.
50. What are the four types of SP?
○ System Stored Procedures (SP_****): built-in stored procedures
that were created by Microsoft.
○ User Defined Stored Procedures: stored procedures that are
created by users. Common naming convention (usp_****)
○ CLR (Common Language Runtime): stored procedures that are
implemented as public static methods on a class in a
Microsoft .NET Framework assembly.
○ Extended Stored Procedures (XP_****): stored procedures that are
implemented outside SQL Server in external DLLs (typically written in
C/C++).
51. What is a nested SP?
○ Executing a SP inside a SP.
52. Show the Five Examples of SP.
○ SP with no parameters:
CREATE PROC usp_emp_list AS -- you can also use ALTER
SELECT E.EmployeeID, C.FirstName, C.LastName
FROM HumanResources.Employee AS E
INNER JOIN Person.Contact AS C
ON E.ContactID = C.ContactID

EXEC usp_emp_list

○ SP with a single input parameter:


ALTER PROC usp_emp_list(@eid INT) AS
SELECT E.EmployeeID, C.FirstName, C.LastName
FROM HumanResources.Employee AS E
INNER JOIN Person.Contact AS C

ON E.ContactID = C.ContactID
WHERE E.EmployeeID = @eid

EXEC usp_emp_list 189

○ SP with multiple parameters:


ALTER PROC usp_emp_list(@start_eid INT, @end_eid INT) AS
SELECT E.EmployeeID, C.FirstName, C.LastName
FROM HumanResources.Employee AS E
INNER JOIN Person.Contact AS C
ON E.ContactID = C.ContactID
WHERE E.EmployeeID BETWEEN @start_eid AND @end_eid

EXEC usp_emp_list 15,20

○ SP with output parameters: extracting data from a stored


procedure based on an input parameter and outputting them using
output variables.
CREATE PROC usp_getInfo (@eid int, @t VARCHAR(50) OUT, @DOB
DATE OUT) AS
SELECT @t = Title, @DOB = BirthDate
FROM HumanResources.Employee
WHERE EmployeeID = @eid
-----------------------END OF SP---------------------
DECLARE @t VARCHAR(50)
DECLARE @b DATE

EXEC usp_getInfo 200, @t OUT, @b OUT

PRINT @t
PRINT @b

○ SP with RETURN statement (the return value is always single and


integer value)
CREATE PROC usp_rowcount AS
DECLARE @c INT
SELECT @c = COUNT(*) FROM HumanResources.Employee
RETURN @c

DECLARE @count INT


EXEC @count = usp_rowcount
PRINT @count

53. What are the characteristics of SP?
○ SP can have any kind of DML and DDL statements.
○ SP can have error handling (TRY ... CATCH).
○ SP can use all types of table.
○ SP can output multiple values of any data type using OUT parameters,
but can return only one scalar INT value.
○ SP can take any input except a table variable.
○ SP can set default inputs.
○ SP can use DSQL.
○ SP can have nested SPs.
○ SP cannot output 2D data (cannot return and output table
variables).
○ SP cannot be called from a SELECT statement. It can be executed
using only a EXEC/EXECUTE statement.
○ REMEMBER:
■ RETURN statement can return ONLY INT value.
■ Once RETURN statement is executed, Execution Control
is returned to the next statement after EXEC/EXECUTE
statement.
■ If you want to return non-numeric values from SP, use
output parameters (users can have any number of input and
output parameters).
■ All the parameters must be passed in accordance with their
definition while creating SP (there has to be one-to-one
mapping between parameters).
54. What are the advantages of SP?
○ Precompiled code hence faster.
○ They allow modular programming, which means they allow you to
break down a big chunk of code into smaller pieces of code. This
way the code will be more readable and easier to manage.
○ Reusability.
○ Can enhance security of your application. Users can be granted
permission to execute SP without having to have direct
permissions on the objects referenced in the procedure.
○ Can reduce network traffic. An operation of hundreds of lines of
code can be performed through single statement that executes the
code in procedure rather than by sending hundreds of lines of
code over the network.
○ SPs are pre-compiled: the Execution Plan created the first time they
are executed is reused for subsequent executions, which can save up to
70% of execution time. Without this, SPs would be just like any regular
TSQL statements.
55. What is Default Input and how do you use it?

○ In case users don’t provide any input values for a SP, you can
set up a default values for your input.
○ Ex:
ALTER PROC usp_temp (@start_eid INT = 1, @end_eid INT = 290) AS
SELECT E.EmployeeID, C.FirstName, C.LastName
FROM HumanResources.Employee AS E
INNER JOIN Person.Contact AS C
ON E.ContactID = C.ContactID
WHERE E.EmployeeID BETWEEN @start_eid AND @end_eid
ORDER BY E.EmployeeID

EXEC usp_temp 100, 120 -- you can still provide values


EXEC usp_temp -- this will use the default inputs.
EXEC usp_temp @end_eid = 100 -- you have to specify the
-- variable and input its value
56. Explain about Recompiling.
○ Query Optimizer generates a new Execution Plan the first time SPs
are executed after being CREATEd or ALTERed. Thereafter, the
same execution plan is used for the following executions.
○ We can also force the Query Optimizer to generate a new execution
plan for SP. This is called Recompiling.
○ One of the main reasons for Recompiling is that we create a new
index or alter existing indexes on the TABLEs or VIEWs being used
in SPs (Or when any DDL operation is performed on the tables
or views). This means that there is a faster plan than the one
already being used.
○ Parameters passed may also affect the SP’s execution plan when
SPs take parameters whose values differ in such a way that it
causes different optimized execution plans to be created often.
57. What are the three different ways to Recompile?
○ 1. WITH RECOMPILE option during CREATE time: A SP with this
option gets recompiled every time that it is executed. Not
commonly used because it slows down the SP execution because of
recompilation before every single execution.
○ Syntax:
CREATE PROC usp_info(...) WITH RECOMPILE AS

○ 2. WITH RECOMPILE option during EXECUTION time: A SP gets


recompiled only for this execution. It will use the saved plan
for the next executions.
○ Syntax:
EXEC usp_info ... WITH RECOMPILE

○ 3. SP_Recompile (System SP): forces recompilation of a SP the
next time that it is run after executing the sp_recompile
statement. In other words, this system SP will give a flag (aka
mark) which indicates that this SP will be recompiled next time.
Once it is recompiled, the SP will be unflagged. You would use it
when you don’t want to execute but still recompile later.
○ Syntax:
sp_recompile <name_of_SP>
○ Ex:
sp_recompile ‘dbo.testproc1’ -- marking for recompilation
EXEC dbo.testproc1

58. What is User Defined Functions (UDF)?


○ UDFs are database objects: precompiled sets of TSQL
statements that can accept parameters, perform complex business
calculations, and return the result of the action as a value.
○ The return value can be either a single scalar value or a result set
(2D data).
○ UDFs are also pre-compiled and their execution plan is saved.
○ PASSING INPUT PARAMETER(S) IS/ARE OPTIONAL, BUT MUST HAVE A
RETURN STATEMENT.
59. What is the difference between SP and UDF?
○ Calling/Execution: a SP must be executed explicitly with an EXECUTE
statement; a UDF must be called implicitly from SELECT/WHERE/HAVING
clause(s).
○ Return Value: a SP may or may not return any value, and when it does,
it must be a scalar INT; a UDF must return something, which can be
either scalar or table-valued.
○ OUT Value: a SP can have OUT parameters; a UDF cannot have OUT
parameters.
○ Temporary Table: a SP can create temporary tables; a UDF cannot
access temporary tables.
○ Error handling: a SP can have robust error handling (TRY/CATCH,
transactions); no robust error handling like TRY/CATCH and transactions
is available in a UDF.
○ Purpose: a SP is used to implement complex business logic; a UDF is
used to implement a complex business formula.
○ Operations: a SP can include any DDL and DML statements; a UDF cannot
have any DDL and can do DML only with table variables.
○ Nested calling: a SP can call other SPs and UDFs; a UDF can call
other UDFs but not SPs.
○ Syntax: CREATE/ALTER PROC <usp_*> and EXEC <usp_*> for a SP; a UDF
must have a BEGIN .. END block except an in-line UDF.

60. What are the types of UDF?


○ 1. Scalar
■ 1.1 Deterministic UDF: UDF in which particular input
results in particular output. In other words, the output
depends on the input.
■ 1.2 Non-deterministic UDF: UDF in which the output does not
directly depend on the input.
○ Ex:
-- Scalar: Deterministic
CREATE FUNCTION UDF_Addition(@no1 INT, @no2 INT)
RETURNS INT -- UDF must return something!
AS
BEGIN
RETURN @no1 + @no2
END

SELECT dbo.UDF_Addition(10,12)
PRINT dbo.UDF_Addition(10,12)

-- Scalar: Non-deterministic
CREATE FUNCTION UDF_Random()
RETURNS INT -- UDF must return something!
AS
BEGIN
DECLARE @var INT
-- display current time in ms
SET @var = DATEPART(MS, GETDATE()) +
DATEPART(SECOND, GETDATE()) * 1000 +
DATEPART(MINUTE, GETDATE()) * 60 * 1000 +
DATEPART(HOUR, GETDATE()) * 60 * 60 * 1000
RETURN @var
END

SELECT dbo.UDF_Random()
PRINT dbo.UDF_Random()

○ 2. In-line UDF: UDFs that do not have any function
body(BEGIN...END) and has only a RETURN statement. In-line UDF
must return 2D data.
○ Ex:
CREATE FUNCTION Emp_Function(@EmpID INT)
RETURNS TABLE AS
RETURN (SELECT *
FROM AdventureWorks.HumanResources.Employee
WHERE EmployeeID = @EmpID)

-- since the returned value is a table, you have to use


FROM
SELECT * FROM dbo.Emp_Function(10)
-- which is same as this derived table form
SELECT * FROM (SELECT *
FROM AdventureWorks.HumanResources.Employee
WHERE EmployeeID = @EmpID) AS Temp

-- You can also give a default value for a parameter


ALTER FUNCTION Emp_Function(@EmpID INT = 1)
RETURNS TABLE AS
RETURN (SELECT *
FROM AdventureWorks.HumanResources.Employee
WHERE EmployeeID = @EmpID)
-- Then use the word, DEFAULT.
SELECT * FROM dbo.Emp_Function(DEFAULT)

○ 3. Multi-line or Table Valued Functions: It is an UDF that has


its own function body (BEGIN ... END) and can have multiple TSQL
statements that return a single output. Also must return 2D data
in the form of table variable.
○ Ex:
CREATE FUNCTION Emp1(@empID INT)
RETURNS @TabVar TABLE(ID INT, -- returns a table variable
FirstName VARCHAR(100),
Title VARCHAR(50))
AS
BEGIN
INSERT INTO @TabVar
SELECT E.EmployeeID, C.FirstName, E.Title
FROM AdventureWorks.HumanResources.Employee AS E
INNER JOIN AdventureWorks.Person.Contact AS C
ON E.ContactID = C.ContactID

WHERE EmployeeID = @empID

DELETE FROM @TabVar WHERE ID = @empID


RETURN
END

SELECT * FROM dbo.Emp1(10)

61. What is the difference between a nested UDF and recursive UDF?
○ Nested UDF: calling an UDF within an UDF
○ Ex:
CREATE FUNCTION F1()
RETURNS TABLE AS
RETURN (SELECT * FROM F2())
○ Recursive UDF: calling an UDF within itself
○ Ex:
CREATE FUNCTION F1()
RETURNS TABLE AS
RETURN (SELECT * FROM F1())
Ex: <
Anchor Member Statement (set at Level =0)
UNION ALL
Recursive Member Statement (set next level at Level +1)
>
62. What is a Trigger?
○ It is a precompiled set of TSQL statements that are automatically
executed on a particular DDL, DML or log-on event.
○ Triggers do not have any parameters or return statement.
○ Triggers are the only way to access the INSERTED and DELETED
tables (aka. Magic Tables).
○ You can DISABLE/ENABLE Triggers instead of DROPPING them:
DISABLE TRIGGER <name> ON <table/view name>/DATABASE/ALL SERVER
ENABLE TRIGGER <name> ON <table/view name>/DATABASE/ALL SERVER
63. What are the types of Triggers?
○ 1. DML Trigger
■ DML Triggers are invoked when a DML statement such as
INSERT, UPDATE, or DELETE occur which modify data in a
specified TABLE or VIEW.
■ A DML trigger can query other tables and can include
complex TSQL statements.
■ They can cascade changes through related tables in the
database.

■ They provide security against malicious or incorrect DML
operations and enforce restrictions that are more complex
than those defined with constraints.
■ Two options:
● 1) AFTER/FOR DML Trigger: AFTER TRIGGERs are executed
after the DML action is performed. The first DML
statement gets executed then the trigger body gets
executed. AFTER TRIGGERs can be specified only on
tables. A table can have several AFTER TRIGGERs for
each triggering DML action.
● Syntax:
CREATE TRIGGER t1 ON <table/view_name>
AFTER/FOR <DML_action> AS
BEGIN
...
END
Ex: --NOTE: table is called test_trigger to test it out
CREATE TABLE test_trigger (
ID INT,
Name VARCHAR(50))

INSERT INTO test_trigger VALUES


(1,'a'),(2,'b'),(3,'c'),(4,'d')

CREATE TRIGGER tg_t1 ON test_trigger


AFTER INSERT,DELETE
AS
BEGIN
PRINT 'Inside after trigger.'
END

INSERT INTO test_trigger VALUES (5,'e') -- the


trigger fires upon execution
DELETE FROM test_trigger WHERE ID = 5 -- the
trigger fires upon execution

DROP TRIGGER tg_t1 -- how you drop a trigger

● INSTEAD OF TRIGGER: triggers are fired in place


of the triggering DML action. In other words, it
performs an alternative action for the given DML.
Can be specified for VIEWs and TABLEs. Each TABLE
and VIEW can have only one INSTEAD OF TRIGGER for

each triggering DML action. The syntax is the same
as AFTER/FOR except you use INSTEAD OF rather than
AFTER/FOR.
● Ex:
ALTER TRIGGER tg_t1 ON test_trigger
INSTEAD OF INSERT,DELETE
AS
BEGIN
PRINT 'Inside instead of trigger.'
END

INSERT INTO test_trigger VALUES (6,'f') --


trigger fired and this INSERT statement will
not be executed
DELETE FROM test_trigger -- trigger fired and
this DELETE statement will not be executed

○ 2. DDL Trigger
■ Pretty much the same as DML Triggers but DDL Triggers are
for DDL operations.
■ DDL Triggers are at the database or server level (or
scope).
■ DDL Trigger only has AFTER. It does not have INSTEAD OF.
■ Syntax:
CREATE TRIGGER <trg_name> ON DATABASE/ALL SERVER
AFTER <DDL_event> AS
BEGIN
...
END
■ Example of DDL events: create_table, drop_table,
alter_table, create_index, drop_index...
■ Ex:
CREATE TRIGGER trg_drop ON DATABASE
AFTER drop_table AS
BEGIN
PRINT 'Table has been dropped.'
ROLLBACK -- rollbacks current transaction
including DROP table
END
■ How to drop a DDL Trigger:
DROP TRIGGER <name> ON DATABASE/ALL SERVER
○ 3. Logon Trigger
■ Logon triggers fire in response to a logon event.

■ This event is raised when a user session is established
with an instance of SQL server.
■ Logon TRIGGER has server scope.
■ Syntax:
CREATE TRIGGER <name> ON ALL SERVER
AFTER LOGON AS....
■ Not really recommended to use a Logon Trigger because it
fires for every single connection, adding overhead to each
login, and a faulty logon trigger can block every user from
connecting to the instance.
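■ A minimal logon trigger sketch (illustrative only; the login name
and the business-hours rule are assumptions):
CREATE TRIGGER trg_logon_demo ON ALL SERVER
AFTER LOGON
AS
BEGIN
    -- deny the hypothetical login 'test_login' outside business hours
    IF ORIGINAL_LOGIN() = 'test_login'
       AND DATEPART(HOUR, GETDATE()) NOT BETWEEN 9 AND 17
        ROLLBACK -- rolling back cancels the logon attempt
END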
64. What are ‘inserted’ and ‘deleted’ tables (aka. magic tables)?
○ They are pseudo-tables through which the trigger body can see the
rows affected by the DML statement that fired the trigger.
○ The structure of inserted and deleted magic tables depends upon
the structure of the table in a DML statement.
○ UPDATE is a combination of INSERT and DELETE, so its old record
will be in the deleted table and its new record will be stored in
the inserted table.
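○ A small sketch of using them inside a trigger (reusing the
test_trigger table from question 63; the audit logic is only
illustrative):
CREATE TRIGGER tg_audit ON test_trigger
AFTER UPDATE
AS
BEGIN
    -- old values come from 'deleted', new values from 'inserted'
    SELECT d.ID, d.Name AS OldName, i.Name AS NewName
    FROM deleted AS d
    INNER JOIN inserted AS i ON d.ID = i.ID
END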
65. What is a Transaction?
○ It is a set of TSQL statement that must be executed together as a
single logical unit.
○ Has ACID properties:
■ 1. Atomicity: Transactions on the DB should be all or
nothing. So transactions make sure that any operations in
the transaction happen or none of them do.
■ 2. Consistency: Values inside the DB should be consistent
with the constraints and integrity of the DB before and
after a transaction has completed or failed.
■ 3. Isolation: Ensures that each transaction is separated
from any other transaction occurring on the system.
■ 4. Durability: After successfully being committed to the
RDBMS, the transaction will not be lost in the event
of a system failure or error.
○ Actions performed on explicit transaction:
■ BEGIN TRANSACTION: marks the starting point of an explicit
transaction for a connection.
■ COMMIT TRANSACTION (transaction ends): used to end a
transaction successfully if no errors were encountered. All
DML changes made in the transaction become permanent.
■ ROLLBACK TRANSACTION (transaction ends): used to erase a
transaction in which errors were encountered. All DML changes
made in the transaction are undone.
■ SAVE TRANSACTION (transaction is still active): sets a
savepoint in a transaction. ROLLBACK TRANSACTION <savepoint_name>
rolls the transaction back to the named savepoint instead of
undoing the whole transaction. A transaction can contain more
than one savepoint as long as each savepoint has a distinct
name; reusing a name simply replaces the earlier savepoint.
○ Ex:
BEGIN TRAN Tran1
DROP TABLE dbo.whatever1
COMMIT TRAN Tran1

BEGIN TRAN Tran2
DROP TABLE dbo.whatever2
SAVE TRAN Tran2 -- will rollback to here
DROP TABLE dbo.whatever3
ROLLBACK TRAN Tran2
66. What are the different types of Error Handling?


○ 1. TRY CATCH
■ Syntax:
BEGIN TRY
<set of TSQL statements that might generate an error>
END
BEGIN CATCH
<set of TSQL statements that should be executed if
an
error occurs>
END
■ The first error encountered in a TRY block transfers control to
its CATCH block, skipping the rest of the code in the TRY block
whether or not that code would generate an error.
■ Ex:
BEGIN TRY
SELECT 1/0 -- error here
SELECT * FROM T2 -- so this will never get executed
END TRY
BEGIN CATCH
PRINT 'Error occurred.'
END CATCH
○ 2. @@error
■ stores the error code for the last executed SQL statement.
■ If there is no error, then it is equal to 0.
■ If there is an error, then it has another number (error
code).

■ Ex:
SELECT 1/0
IF @@error > 0
BEGIN
    PRINT 'error!!!'
END
ELSE IF @@error = 0
    PRINT 'success!!!'
○ 3. RAISERROR() function
■ A system defined function that is used to return messages
back to applications using the same format which SQL Server
uses for errors or warning message.
■ Format: RAISERROR(<error_message>, severity, state)
■ severity 0 ~ 10: info message or warning message
■ severity 11 ~ 18: errors
■ ‘state’ is used to identify the location of an error faster
in case you have multiple RAISERROR statements with the
same error description and severity.
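■ Ex (a minimal sketch; the variable, message text, and severity
are only illustrative):
DECLARE @qty INT = 999
IF @qty > 100
    RAISERROR('Quantity %d is out of range.', 16, 1, @qty)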
○ 4. Transaction
○ 5. @@rowcount
67. Explain about Cursors.
○ Cursors are a temporary database object which are used to loop
through a table on row-by-row basis.
○ There are five types of cursors:
■ 1. Static: shows a static view of the data with only the
changes done by session which opened the cursor.
■ 2. Dynamic: shows data in its current state as the cursor
moves from record-to-record.
■ 3. Forward Only: move only record-by-record
■ 4. Scrolling: moves anywhere.
■ 5. Read Only: prevents data manipulation to cursor data
set.
○ Syntax:
-- declare a cursor
DECLARE <cursor_name> CURSOR FORWARD_ONLY STATIC
FOR
<any SELECT statement>
-- reading from a cursor
OPEN <cursor_name>

FETCH NEXT/FIRST/PRIOR/ABSOLUTE FROM <cursor_name> INTO <variables>

CLOSE <cursor_name>

DEALLOCATE <cursor_name>
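○ A minimal working sketch of the usual cursor loop (reusing the
test_trigger table from question 63; the variable name is only
illustrative):
DECLARE @name VARCHAR(50)

DECLARE cur_names CURSOR FORWARD_ONLY STATIC FOR
    SELECT Name FROM test_trigger

OPEN cur_names
FETCH NEXT FROM cur_names INTO @name

WHILE @@FETCH_STATUS = 0
BEGIN
    PRINT @name
    FETCH NEXT FROM cur_names INTO @name
END

CLOSE cur_names
DEALLOCATE cur_names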
68. What is an Index in TSQL?
http://www.youtube.com/watch?v=R5RJlgQTI38&feature=related
○ It is the second step to optimize TSQL. (partition is the first
step actually)
○ It is a database object that is used to optimize the performance
of the Read operation, not the delta operations.
○ Think of ‘Indexes’ as an index page in a book.
69. What is the difference between scan and seek?
○ Scan: reading the pages from the first to the last, offset by
offset (row by row).
○ Seek: navigating directly to the specific node and fetching only
the information needed.
○ Seek is the fastest way to find and fetch data. So if every
operator in your Execution Plan is a seek, the query is well
optimized.
70. Why are DML operations slower on indexed tables?
○ It is because the sorting of indexes and the order of sorting has
to be always maintained.
○ When inserting or deleting a value that is in the middle of the
range of the index, everything has to be rearranged again. It
cannot just insert a new value at the end of the index.
71. What is a heap (table on a heap)?
○ When there is a table that does not have a clustered index, that
means the table is on a heap.
○ Ex: Following table ‘Emp’ is a table on a heap.
SELECT * FROM Emp
WHERE ID BETWEEN 2 AND 4 -- This will do scanning.
72. What is the architecture in terms of a hard disk, extents, and
pages?
○ A hard disk is divided into extents.
○ Every extent has eight pages.
○ Every page is 8 KB (8,192 bytes), of which roughly 8,060 bytes are
available for row data.

73. How is a table on a heap stored on a hard disk?
○ The rows of a table are stored on a hard disk in a random and
scattered manner.
○ Each data segment (offset) of a table is linked like a linked
list. In other words, the rows in a heap are not stored
contiguously.
○ This is the worst way of storing data because you have to jump
from an offset to an offset.
74. What is a RID and what is it used for?
○ RID stands for ‘Row Identifier’.
○ It is used to pinpoint a data on a hard disk.
○ Format:
(Extent #, Page #, Offset #)
74. What are the ranges of a page number and offset number?
○ Page #: 0 ~ 7
○ Offset #: 0 ~ 8059
75. What is Table Scan?
○ It is a method of getting information by scanning a table row-by
-row. It exists only on a table on a heap. This is the worst way
of getting information from a table.
76. What are the nine different types of Indexes?
○ 1. Clustered
○ 2. Non-clustered
○ 3. Covering
○ 4. Full Text Index
○ 5. Spatial
○ 6. Unique
○ 7. Filtered

○ 8. XML
○ 9. Indexed View
77. What is a Clustering Key?
○ A column on which you create any type of index is called the
Clustering Key for that particular index.
78. Explain about a Clustered Index.
○ Unique Clustered Indexes are automatically created when a PK is
created on a table.
○ But that does not mean that a column is a PK only because it has
a Clustered Index.
○ Clustered Indexes store data in a contiguous manner. In other
words, they cluster the data into a certain spot on a hard disk
continuously.
○ The clustered data is ordered physically.
○ You can only have one CI on a table.
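○ Ex (a minimal sketch; it assumes a simple Emp table with an ID
column, like the one used in question 87):
CREATE CLUSTERED INDEX cix_Emp_ID ON Emp(ID)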
79. What is a B-Tree?
○ It is a data structure that dynamically rebalances itself based
on how much data it contains.
○ In SQL Server, a B-Tree is automatically generated in the
background when you create an index. So every index has its own
B-Tree.
○ The root and intermediate nodes have key values and pointers to
their child nodes no matter what kind of index you have.
○ A B-Tree of CI has actual data on its leaf nodes. A B-Tree of
NCI-H has key values of the NCI and RID (row identifier) on its
leaf nodes. A B-Tree of NCI-CI has key values of the NCI and the
key values of CI on its leaf nodes.
○ Each node of a B-Tree is a page, which means each node can
contain 8KB = 8060 Bytes of data.
80. What happens when Clustered Index is created?
○ First, a B-Tree of a CI will be created in the background.
○ Then it will physically pull the data from the heap memory and
physically sort the data based on the clustering key.
○ Then it will store the data in the leaf nodes.
○ Now the data is stored in your hard disk in a continuous manner.
81. Why is there only one CI on a table?
○ When a CI is created, it will store the actual data on the leaf
nodes. If there is one more CI created on the same table, it will
have the same data on the leaf node, which will be just useless
duplicated data. Also, data is physically ordered only in one way
so it wouldn’t make sense if there is one more CI.
82. What are the four different types of searching information in a
table?

○ 1. Table Scan -> the worst way
○ 2. Table Seek -> only theoretical, not possible
○ 3. Index Scan -> scanning leaf nodes
○ 4. Index Seek -> getting to the node needed, the best way
83. What is the difference between NCI-CI and NCI-H?
○ NCI-H has key values of the NCI and RID on the leaf nodes. Pretty
much the pointer to the data in your heap memory.
○ NCI-CI has key values of the NCI and the key values of the CI.
Pretty much the pointer to the root node of the CI.
84. What happens when you drop a CI?
○ The table becomes a table on a heap.
○ The data on a B-Tree won’t go away and it will be moved to the
heap memory. Just the B-Tree structure will be dropped.
○ If there is a NCI-CI that is pointing to the root node of the CI,
it will become a NCI-H.
85. What happens when you disable a CI?
○ ALTER INDEX index_name ON T9 DISABLE
○ The B-Tree of CI is still there but you cannot access any data in
the B-Tree.
○ Meaning that the NCIs pointing to that CI will be disabled as
well.
○ It is used when maintaining indexes to take care of index
fragment issues.
86. What is a Covering Index?
○ It is an extended and enhanced version of a NCI. It is also known
as a light weight NCI. Only defined on a NCI. Cannot be defined
on a CI.
○ It is a type of index that allows you to get requested columns in
a SELECT predicate without performing key lookup.
○ It is implemented with the help of the INCLUDE keyword, and the
columns in INCLUDE do not have to be sorted. They are just
appended to the matching data in the leaf nodes.
○ So it gives you higher performance.
○ The index key itself can have up to 16 columns or 900 bytes,
whichever comes first; the INCLUDE list can hold up to 1023
non-key columns, which do not count toward that limit.
○ A table can have up to 999 nonclustered (and therefore covering)
indexes in SQL Server 2008 and later.
○ Covering Index helps you with only SELECT predicates, not search
predicates.
○ Syntax:
CREATE INDEX ind ON T9 (Address)
INCLUDE (Phone, Name)
87. How will you simulate a CI with covering index functionality?
ex) Emp(id, name, address, phone, salary)

CREATE CLUSTERED INDEX idx1 on Emp(id)

CREATE NONCLUSTERED INDEX idx2 on Emp(name)


INCLUDE (address, phone, salary)
88. How would you avoid key lookups using Wide NCI and Covering
Index? (See the example of the problem #86)
○ 1. Using Wide NCI
CREATE NONCLUSTERED INDEX idx3 ON Emp (name, address, phone,
salary)
-- nothing wrong with it but since it has more columns, it
-- will do more sorting. So this will be slower.
○ 2. Using Covering Index
CREATE NONCLUSTERED INDEX idx4 ON Emp (name)
INCLUDE (address, phone, salary)
89. What is a Filtered Index?
○ Like a Covering Index, it is an advanced version of NCI, in which
users have control over the number of entries that can be added
onto the leaf node for B-Tree structure of NCI.
○ Only those records which fall in the condition specified by WHERE
clause while creating a Filtered Index will be added to the leaf
node of the B-Tree.
○ A Filtered Index is more optimized than a normal NCI which
has all the entries for all the rows in case if your search
predicates would always be from specific range.
○ Syntax:
CREATE NONCLUSTERED INDEX ind_fnci_t9_name ON T9(Name)
WHERE Name >= 'AAA' AND Name <= 'DZZ' AND ID >= 10 AND ID <= 20
INCLUDE (Address) -- can be a filtered covering index!!
90. What is an Indexed View?
○ It is technically one of the types of View, not Index.
○ Using Indexed Views, you can have more than one clustered index
on the same table if needed.
○ All the indexes created on a View and underlying table are shared
by Query Optimizer to select the best way to execute the query.
○ Both the Indexed View and Base Table are always in sync at any
given point.
○ Indexed Views cannot have NCI-H, always NCI-CI, therefore a
duplicate set of the data will be created.
○ Steps of creating an Indexed View:
■ 1. Check the switches (recommended but not mandatory)
SET ANSI_NULLS ON (default: OFF) -- where ID = NULL
SET ANSI_PADDING ON -- VARCHAR is not trail trimmed

SET CONCAT_NULL_YIELDS_NULL ON -- String + NULL = NULL
SET QUOTED_IDENTIFIER ON (default: OFF) -- can use “
■ 2. Create Schemabinding View
■ 3. Create Unique Clustered Index on View
○ Once the schemabinding view has an unique CI, then it will hold
the actual data so it won’t be a virtual table anymore. It
becomes an actual physical table.
○ You have to maintain fragmentation on both the underlying table
and the view because each will have each B-Tree.
○ Using an Indexed View, you have an unique CI on multiple tables
joined by JOINs.
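○ A minimal sketch of steps 2 and 3 (the view, table, and column
names are only illustrative; the SET options from step 1 must be
ON when the view and index are created):
CREATE VIEW dbo.v_EmpSummary
WITH SCHEMABINDING
AS
SELECT Name, COUNT_BIG(*) AS NameCount
FROM dbo.Emp
GROUP BY Name
GO
CREATE UNIQUE CLUSTERED INDEX ucix_v_EmpSummary
ON dbo.v_EmpSummary(Name)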
91. So what are the ways to have more than one CI virtually on a
same table?
○ using Covering Indexes, Wide Indexes, or Indexed Views
92. What is a HINT technology?
○ HINT technology is used to override the query optimizer.
○ In other words, the developer decides which Execution Plan is
used.
○ Not recommended, because to benefit from it you would have to know
better than the Query Optimizer.
○ Three Types of HINT:
■ 1. Join
■ 2. Query
■ 3. Table
93. What is Fragmentation?
○ Fragmentation is a phenomenon in which storage space is used
inefficiently.
○ In SQL Server, Fragmentation occurs in case of DML statements on
a table that has an index.
○ When any record is deleted from the table which has any index, it
creates a memory bubble which causes fragmentation.
○ Fragmentation can also be caused due to page split, which is the
way of building B-Tree dynamically according to the new records
coming into the table.
○ Taking care of fragmentation levels and maintaining them is the
major problem for Indexes.
○ Since Indexes slow down DML operations, we do not have a lot of
indexes on OLTP, but it is recommended to have many different
indexes in OLAP.
94. What are the two types of fragmentation?
○ 1. Internal Fragmentation

■ It is the fragmentation in which leaf nodes of a B-Tree
is not filled to its fullest capacity and contains memory
bubbles.
■ If there are many NCIs on a CI, internal fragmentation on the
CI cascades to all the NCIs, as the corresponding key values of
the NCIs must also be deleted.
■ Internal fragmentation is mainly at the leaf nodes but it’s
possible for intermediate and root nodes to have it as
well.
○ 2. External Fragmentation
■ It is fragmentation in which the logical ordering of the
pages does not match the physical ordering of the pages on
the secondary storage device.
■ In other words, external fragmentation occurs when most of
the pages of the B-Tree are scattered in your hard disk.
■ More serious than Internal Fragmentation since it drops the
performance of the read operation sharply.
95. What are the two ways to take care of those fragmentation?
○ REBUILD
■ same as dropping and recreating the index from scratch.
■ more costly as it may take long time to rebuild the index.
■ ONLINE option is available to make data accessible to the
users while the index is being rebuilt.
○ REORGANIZE
■ Rearranges the data contiguously to the front part of the
leaf nodes.
■ Pushes the memory bubbles to the end.
■ A temporary solution and eventually the index still needs
to be rebuilt.
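○ Syntax sketch (the index and table names are placeholders; ONLINE
rebuilds are an Enterprise edition feature):
ALTER INDEX ix_t9_name ON dbo.T9 REBUILD WITH (ONLINE = ON)
ALTER INDEX ix_t9_name ON dbo.T9 REORGANIZE
ALTER INDEX ALL ON dbo.T9 REBUILD -- rebuild every index on the table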
96. What is DMF?
○ Dynamic Management Function
○ sys.dm_db_index_physical_stats(database object id, table or view
object id, index object id, partition number, flag)
○ The two columns to look at:
■ 1. avg_page_space_used_in_percent -> Internal Fragmentation
● You want to keep the percentage of it as high as
possible.
■ 2. avg_fragmentation_in_percent -> External Fragmentation
● You want to keep the percentage of it as low as
possible.
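○ A usage sketch (NULL arguments mean "all"; the 'DETAILED' mode is
needed to populate the page-space column):
SELECT OBJECT_NAME(ips.object_id) AS TableName,
       ips.index_id,
       ips.avg_page_space_used_in_percent, -- internal fragmentation
       ips.avg_fragmentation_in_percent    -- external fragmentation
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'DETAILED') AS ips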
97. What are the five System Server Databases?
○ 1. master

■ It is a starting point of your SQL Server that is linked to
everything.
■ It holds server level information like users, login names,
passwords, linked servers, server level settings...
■ It holds all the pointers to all the user defined
databases.
■ If 'master' is corrupted or unavailable, the whole SQL Server
instance fails to start as well.
○ 2. model
■ holds templates that are implemented by Microsoft that will
help developers develop a database from scratch.
○ 3. tempDB
■ holds temporary data such as grouping, sorting, temporary
tables data
■ think of it as a scratch paper for TSQL
■ it is recreated from scratch every time the SQL Server service
restarts, so nothing stored in it survives a restart
○ 4. msdb
■ holds schedules of TSQL scripts, SSIS, MDX, or etc. that
should be executed without human interaction. SQL Server
Agent will execute them according to the schedule.
○ 5. resource
■ read-only DB that contains all the system objects for SQL
server.
■ holds metadata
■ not visible to users
■ exposes DMVs (Dynamic Management Views) to users.
98. Explain about File Groups.
○ How data files are grouped.
■ 1. Primary Data Files (.mdf = master data file)
● Holds all the pointers to your database objects in
that particular database. Plus, it has the database
level information.
● It is a starting point of a database.
■ 2. Secondary Data Files (.ndf)
● Holds actual data.
■ 3. Log Files (.ldf)
● Holds logging information.
● DBA can manage Log Files using three Recovery Models:
a. Full: fully logs every operation and supports
point-in-time restore
b. Bulk Logged: fully logs everything except bulk
operations, which are minimally logged
c. Simple: logging still occurs, but the log is truncated
automatically at checkpoints, so log backups and
point-in-time restore are not possible
99. What is Table Partitioning?

○ A process of physically dividing a table into smaller partitions
based on boundary values, while it still behaves as one logical
table.
○ It improves the performance of your READ operation.
○ Every partition is treated internally as a smaller table.
○ Each partition can be mapped to its own filegroup and secondary
data files.
○ Normalization breaks down columns and Table Partitioning breaks
down rows.
100. What are the steps in partitioning a table?
○ 1. Create a partition Function:
■ it is a first step of partitioning a table. It defines a
boundary point for partitioning of data along with the data
type on which the partition needs to be done.
■ Syntax:
CREATE PARTITION FUNCTION partFunc (INT) AS
RANGE LEFT FOR VALUES (1000, 2000, 3000, 4000)
/*
Range LEFT
1. -inf to 1000
2. 1001 to 2000
3. 2001 to 3000
4. 3001 to 4000
5. 4001 to +inf

Range Right
1. -inf to 999
2. 1000 to 1999
3. 2000 to 2999
4. 3000 to 3999
5. 4000 to + inf */

○ 2. Create Partition Scheme


■ A partition scheme maps the partitions defined by the partition
function to physical filegroups (each partition to a filegroup),
so it must reference an existing partition function.
■ Syntax:
CREATE PARTITION SCHEME partscheme AS
PARTITION partFunc TO
([FG1], [FG2], [FG3], [FG4], [FG5])
○ 3. Create a table with Partition Scheme
■ Syntax:
CREATE TABLE Emppp (
    ID INT IDENTITY (1,1),
    Name VARCHAR(100))
ON partscheme(ID)
101. How do you partition a table that is already created?
○ If there is a CI on a table, you have to drop it.
○ And you have to create a CI again on the scheme.
○ Then every partition will have a separate B-Tree from scratch.
○ Then you can further optimize each partition.
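○ A sketch of those steps (the index name and Emp table are only
illustrative; partscheme is the partition scheme from question 100):
DROP INDEX cix_Emp_ID ON dbo.Emp

CREATE CLUSTERED INDEX cix_Emp_ID
ON dbo.Emp(ID)
ON partscheme(ID)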
102. What are Statistics?
○ Statistics allow the Query Optimizer to choose the optimal path
in getting the data from the underlying table.
○ Statistics are histograms of max 200 sampled values from columns
separated by intervals.
○ Every statistic holds the following info:
■ 1. The number of rows and pages occupied by a table’s data
■ 2. The time that statistics was last updated
■ 3. The average length of keys in a column
■ 4. Histogram showing the distribution of data in column
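○ Statistics can also be refreshed manually; a minimal sketch (the
table name is illustrative):
UPDATE STATISTICS dbo.Emp               -- refresh all statistics on the table
UPDATE STATISTICS dbo.Emp WITH FULLSCAN -- scan every row instead of sampling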
103. What is the very last stop/tool to optimize a query?
○ SQL Profiler and DTA (Database Engine Tuning Advisor)
○ SQL Profiler generates a trace file (.trc), which is a zoomed-in
and very detailed version of Execution Plan. Using SQL Profiler,
you can capture the query you want to optimize.
○ Input of SQL Profiler: a slow running query
○ Output of SQL Profiler: a trc file
○ DTA is a trc file translator. It goes through a trace file that is
generated by SQL Profiler and displays how to optimize the query
further.
○ Input of Database Engine Tuning Advisor: a trace file
○ Output of Database Engine Tuning Advisor: Microsoft
recommendation on how to improve your query in English
○ Still does not mean that the query is optimized to its maximum
point.
104. What is the User-defined Data Types?
○ As the name says, it is a data type that is defined by a user so
you give an alias name to a data type.
○ For example, you can use PHONE instead of typing NVARCHAR(25)
every time.
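○ Ex (a minimal sketch of the PHONE alias mentioned above):
CREATE TYPE PHONE FROM NVARCHAR(25)

CREATE TABLE Contacts (
    ID INT PRIMARY KEY,
    HomePhone PHONE)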
105. What are some optimization techniques in TSQL?
○ 1. Build indexes. Using indexes on a table will dramatically
increase the performance of your read operation because it will
allow you to perform index scan or index seek depending on your
search predicates and select predicates instead of table scan.

Building non-clustered indexes, you could also increase the
performance further.
○ 2. You could also use an appropriate covering index for your non-
clustered index because it can avoid performing a key lookup.
○ 3. You could also use a filtered index for your non-clustered
index since it allows you to create an index on a particular part
of a table that is accessed more frequently than other parts.
○ 4. You could also use an indexed view, which is a way to create
one or more clustered indexes on the same table. In that way,
the query optimizer will consider even the clustered keys on
the indexed views so there might be a possible faster option to
execute your query.
○ 5. Do table partitioning. When a particular table has billions of
records, it is practical to partition it to increase read
operation performance. Every partition is treated internally as a
smaller physical table.
○ 6. Update statistics for TSQL so that the query optimizer will
choose the most optimal path in getting the data from the
underlying table. Statistics are histograms of maximum 200 sample
values from columns separated by intervals.
○ 7. Use stored procedures because when you first execute a stored
procedure, its execution plan is stored and the same execution
plan will be used for the subsequent executions rather than
generating an execution plan every time.
○ 8. Use the 3 or 4 naming conventions. If you use the 2 naming
convention, table name and column name, the SQL engine will take
some time to find its schema. By specifying the schema name or
even server name, you will be able to save some time for the SQL
server.
○ 9. Avoid using SELECT *. Because you are selecting everything, it
will decrease the performance. Try to select columns you need.
○ 10. Avoid using CURSOR because it is an object that goes over a
table on a row-by-row basis, which is similar to the table scan.
It is not really an effective way.
○ 11. Avoid using unnecessary TRIGGER. If you have unnecessary
triggers, they will be triggered needlessly. Not only slowing the
performance down, it might mess up your whole program as well.
○ 12. Manage indexes using REORGANIZE or REBUILD.
Internal fragmentation happens when there are a lot of memory
bubbles on the leaf nodes of the b-tree and the leaf nodes are
not used to their fullest capacity. By reorganizing, you can push
the actual data on the b-tree to the front of the leaf level
and push the memory bubbles to the end. But it is still a
temporary solution because the memory bubbles still exist.
The external fragmentation occurs when the logical ordering of
the b-tree pages does not match the physical ordering on the
hard disk. By rebuilding, you can cluster them all together,
which will solve not only the internal but also the external
fragmentation issues.
You can check the status of the fragmentation by using Data
Management Function, sys.dm_db_index_physical_stats(db_id,
table_id, index_id, partition_num, flag), and looking at
the columns, avg_page_space_used_in_percent for the internal
fragmentation and avg_fragmentation_in_percent for the external
fragmentation.
○ 13. Try to use JOIN instead of SET operators or SUB-QUERIES
because set operators and sub-queries are slower than joins and
you can implement the features of sets and sub-queries using
joins.
○ 14. Avoid using the LIKE operator where possible; it performs string
matching and, especially with a leading wildcard, it is slow.
○ 15. Avoid using blocking operations such as order by or derived
columns.
○ 16. For the last resort, use the SQL Server Profiler. It
generates a trace file, which is a really detailed version of
execution plan. Then DTA (Database Engine Tuning Advisor) will
take a trace file as its input and analyzes it and gives you the
recommendation on how to improve your query further.
106. How do you present the following tree in a form of a table?
A
/ \
B C
/ \ / \
D E F G

CREATE TABLE tree (


node CHAR(1),
parentNode CHAR(1),
[level] INT)

INSERT INTO tree VALUES ('A', null, 1),


('B', 'A', 2),
('C', 'A', 2),
('D', 'B', 3),
('E', 'B', 3),

('F', 'C', 3),
('G', 'C', 3)

SELECT * FROM tree

Result: A NULL 1
B A 2
C A 2
D B 3
E B 3
F C 3
G C 3

107. How do you reverse a string without using REVERSE(‘string’)?


CREATE PROC rev (@string VARCHAR(50)) AS
BEGIN
DECLARE @new_string VARCHAR(50) = ''
DECLARE @len INT = LEN(@string)

WHILE (@len <> 0)


BEGIN
DECLARE @char CHAR(1) = SUBSTRING(@string, @len, 1)
SET @new_string = @new_string + @char
SET @len = @len - 1
END

PRINT @new_string
END

EXEC rev 'jihoon'


108. How do you create a primary key without a clustered index?
○ You can define a primary key index as NONCLUSTERED to prevent the
table rows from being ordered according to the primary key, but
you cannot define a primary key without some associated index.
○ There are three methods to achieve this:
○ Method 1
Create Table tblTest
(
Field1 int Identity not null primary key nonclustered,
Field2 varchar(30),
Field3 int null
)
○ Method 2

Create Table tblTest
(
Field1 int Identity not null,
Field2 varchar(30),
Field3 int null,
Constraint pk_parent primary key nonclustered (Field1)
)
Go
○ Method 3
step 1) Find the constraint name
sp_helpconstraint tblTest

/*
This way we could find out the constraint name. Let’s
assume that our constraint name is PK_tblTest_74794A92
*/

step 2) Drop the existing constraint


Alter table tblTest drop constraint PK_tblTest_74794A92

step 3) Add the new nonclustered index to that field now


Alter table tblTest add constraint PK_parent1 primary key
nonclustered (Field1)
109. What are the different isolation levels in TSQL?
○ Isolation Level: different levels of how locking works between
transactions.
○ There are 5 isolation levels in SQL Server 2008
■ 1) Read Uncommitted: The lowest isolation level. Higher
concurrency. It causes no shared lock, which means you
can read data that is currently being modified in other
transactions.
■ 2) Read Committed (Default): The default level of the
isolation level. It reads only committed data. So when
you do a select statement, there will be shared lock on
the data you are querying on. So if there is other user
querying on the same data, you have to wait until that
transaction finishes.
■ 3) Repeatable Read: This is similar to Read Committed but
with the additional guarantee that if you issue the same
select twice in a transaction you will get the same results
both times. It does this by holding on to the shared locks
it obtains on the records it reads until the end of the
transaction. This means any transactions that try to modify
these records are forced to wait for the read transaction to
complete.
■ 4) Serializable: An enhanced version of Repeatable Read. This
also gets rid of Phantom Reads by placing range locks on
the queried data. So any other transactions trying to
modify or insert data touched on by this transaction have
to wait until it finishes.
■ 5) Snapshot: Guarantees the same thing as Serializable. But
the way it works is different from Serializable because
it creates its own snapshot of the data being read at
the time. So if you read that data again in the same
transaction, it reads it from its snapshot.
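○ The level is set per session; a minimal sketch (the table name is
only illustrative):
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE

BEGIN TRAN
    SELECT * FROM dbo.Emp WHERE ID = 1
    -- range locks are now held until the transaction ends
COMMIT TRAN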
110. When would you use NOLOCK in your code?
○ The same thing as Read Uncommitted.
○ So if you are fine and you think it is safe to read uncommitted
rows, you could use it.
○ If you are only reading data or if you just want a faster READ
operation, you could use NOLOCK.
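○ Ex (a minimal sketch; the table name is only illustrative):
SELECT * FROM dbo.Emp WITH (NOLOCK)
-- same effect as READ UNCOMMITTED, but only for this table reference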
111. What is dirty read and phantom read? What are the differences?
○ Dirty Read
■ Reading uncommitted modifications is called a Dirty Read.
Values in the data can be changed and rows can appear
or disappear in the data set before the end of the
transaction, thus getting you incorrect or wrong data.
■ This happens at READ UNCOMMITTED transaction isolation
level, the lowest level. Here transactions running do
not issue SHARED locks to prevent other transactions from
modifying data read by the current transaction. It also
does not prevent the current transaction from reading rows that
have been modified but not yet committed by other transactions.
■ To prevent Dirty Reads, READ COMMITTED or SNAPSHOT
isolation level should be used.
○ Phantom Read
■ Data getting changed in current transaction by other
transactions is called Phantom Reads. New rows can be added
by other transactions, so you get different number of rows
by firing same query in current transaction.
■ In REPEATABLE READ isolation levels Shared locks are
acquired. This prevents data modification when other
transaction is reading the rows and also prevents data read
when other transaction are modifying the rows. But this
does not stop INSERT operation which can add records to a
table getting modified or read on another transaction. This

leads to PHANTOM reads.
■ PHANTOM reads can be prevented by using SERIALIZABLE
isolation level, the highest level. This level acquires
RANGE locks thus preventing READ, Modification and INSERT
operation on other transaction until the first transaction
gets completed.
112. What is Deadlock?
○ Deadlock is a situation where two (or more) transactions are each
waiting for the other to release its locks, so neither can
proceed.
○ SQL Server automatically picks one transaction to be killed (the
deadlock victim), rolls back its changes, and throws an error
message to that session.

Model 3 - Data Warehouse Designing


Think in a high level perspective!!!

<Review of the differences between OLTP and OLAP>

1. So....what is a Data Warehouse?
○ It is a repository of an organization’s historical data that is
designed to facilitate reporting and analysis of all the business
process.
○ It contains data from different sources stored in a central
location specifically for analysis and decision making purposes.
2. What is a Data Mart?
○ It is a subset of an organizational data warehouse, designed
for the needs of a specific business process from among all the
processes that exist.
○ Data marts are often derived from subsets of data in a data
warehouse, though in the bottom-up data warehouse design
methodology the data warehouse is created from the union of
organizational data marts.
3. What are some advantages of a Data Warehouse?

○ Broad Analysis: Data warehouses facilitate decision support
system applications such as trend reports, exception reports, and
reports that show actual performance versus goals, because they
contain a huge, comprehensive repository of information.
○ Common Data Model: A Data Warehouse provides a single data model
for all data regardless of the data's source. This makes it easier
to report and analyze information than it would be if multiple
data models were used to retrieve information.
○ No Anomalies: Prior to loading data into the Data Warehouse,
inconsistencies are identified and resolved. This greatly
simplifies reporting and analysis.
○ Very Efficient: Very efficient at querying data as heavily
indexed.
○ OLTP Not Affected: It is because they are separated from OLTP,
Data Warehouses provide retrieval of data without slowing down an
OLTP.
4. What are some disadvantages of a Data Warehouse?
○ Data warehouses are not the optimal environment for unstructured
data (no data model).
○ Because data must be extracted, transformed and loaded into the
warehouse, there is an element of latency in data warehouse data.
○ Maintenance costs are high.
○ There is a cost of delivering suboptimal information to the
organization.
○ There is often a fine line between data warehouses and
operational systems. Duplicate, expensive functionality may
be developed. Or, functionality may be developed in the data
warehouse that, in retrospect, should have been developed in the
operational systems and vice versa.

5. What are the components of a Data Warehouse?

<Diagram from The Data Warehouse Toolkit by Kimball>

<Our Own In-Class Data Warehouse Diagram>

1) Operational Source Systems


○ This is where all the transactions of the business are captured.
○ This area maintains very little historical data or just current
data.

○ The source doesn’t have to be necessarily a SQL Server DB. It
could be an Oracle DB, IBM DB2 or even flat files or Excel files
can be the source.
2) Data Staging Area
○ This area is where the ETL occurs.
○ There are two DBs in the data staging area: Pre-staging DB and
Staging DB.
○ Pre-staging Database
○ First, Extraction takes place. Extraction is the process
of reading and understanding the source data and moving
the data from the different sources into the pre-staging
database.
○ When you extract, you just move the data as it is into
separate table for each extraction source.
○ Then the data profiling is done, which is a task of
analyzing the source data more efficiently, understanding
the source better and preventing data quality problems
before they are introduced into the data warehouse.
■ You could write stored procedures or use the Data
Profiling Task. For example...
● Is there any null value?
● What is the maximum and minimum value?
● What is the length of a string?
● What is the general statistics and distribution
of the data?
○ Once you find some anomalies using the data profiling, flag
the anomalies and send them to your data analyst.
○ Then you want to perform the Data Cleansing tasks using
the combination of different techniques such as the Fuzzy
Lookup and Fuzzy Groupings.
○ Data Cleansing is a process of correcting misspellings,
dealing with missing values, deduplicating data and such.
○ Sometimes, the data cleansing cannot be done perfectly at
once. You might want to perform it multiple times to make
it perfectly clean.
○ Once you finish everything in the Pre-staging Database,
denormalize the tables and aggregate the data based on the grain
level and transfer them into the Staging Database using the SSIS
packages (Transform).
○ Staging Database
○ The Staging Database has the structure of the data
warehouse, which is either the star schema or snowflake
schema.

○ Using SSIS packages, that transformed the data accordingly,
you transfer the data into the Staging Database.
○ This is where you perform Data Verification to check the
logic of the SSIS packages.
○ Finally, Loading takes place. Loading is a process of moving the
data into the actual data warehouse.
○ Do not forget to always backup the data warehouse before loading.
3) Data Presentation Area
○ The data presentation area is where data is organized, stored,
and made available for direct querying by users, report writers,
and other analytical applications.
○ The data presentation area is the actual data warehouse because
the backroom staging area is off-limits.
○ This area can consist of either one single data warehouse or a
series of integrated data marts.
4) Data Access Tools
○ The data access tools is the final major component of the data
warehouse environment.
○ The data access tools are the variety of capabilities that can be
provided to business users to leverage the presentation area for
analytic decision making.
○ SSAS to create cubes for faster analysis.
○ SSRS to create reports.
6. What is Data Profiling?
○ Data Profiling is a process of analyzing the source data more
effectively, understanding the source data better, and preventing
data quality problems before they are introduced into the data
warehouse.
7. What is a Fact Table?
○ The primary table in a dimensional model where the numerical
performance measurements (or facts) of the business are stored so
they can be summarized to provide information about the history
of the operation of an organization.
○ Ex:
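(The original example figure is not reproduced here; below is an
illustrative sketch of a fact table, with assumed names.)
CREATE TABLE FactSales (
    DateKey      INT NOT NULL,   -- FK to DimDate
    ProductKey   INT NOT NULL,   -- FK to DimProduct
    StoreKey     INT NOT NULL,   -- FK to DimStore
    SalesAmount  MONEY NOT NULL, -- additive measure
    QuantitySold INT NOT NULL)   -- additive measure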

○ We use the term fact to represent a business measure.


○ The level of granularity defines the grain of the fact table.

○ The most useful facts are numeric and additive, such as dollar
sales amount. Additivity is crucial because data warehouse
applications almost never retrieve a single fact table row.
Rather, they bring back hundreds or thousands of facts rows at
a time, and the most useful thing to do with the rows is to add
them up.
○ A fact table contains, as foreign keys, the primary keys of its
related dimension tables; those dimension tables contain the
descriptive attributes of the fact records.
○ A fact table is also known as a deep table because it contains a
lot of historical data.
8. What is a Dimension Table?
○ Dimension tables are highly denormalized tables that contain the
textual descriptions of the business and facts in their fact
table.

○ Since it is not uncommon for a dimension table to have 50 to 100


attributes and dimension tables tend to be relatively shallow in
terms of the number of rows, they are also called a wide table.
○ A dimension table has to have a surrogate key as its primary key
and has to have a business/alternate key to link between the OLTP
and OLAP.
○ Dimension tables often represent hierarchical relationships in
the business. In the sample dimension table above, products can
roll up into brands and then into categories, for example.
○ According to Kimball, improving storage efficiency by normalizing
or snowflaking has virtually no impact on the overall database
size.
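○ An illustrative dimension table to go with the fact table sketch
above (all names, including the surrogate and business key columns,
are assumptions):
CREATE TABLE DimProduct (
    ProductKey    INT IDENTITY(1,1) PRIMARY KEY, -- surrogate key
    ProductAltKey INT NOT NULL,                  -- business/alternate key from the OLTP
    ProductName   VARCHAR(100),
    Brand         VARCHAR(50),
    Category      VARCHAR(50))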
9. What are the types of Measures?
○ Additive: measures that can be added across all dimensions (cost,
sales).
○ Semi-Additive: measures that can be added across some dimensions
but not others (e.g., an account balance can be summed across
accounts but not across time).

○ Non-Additive: measures that cannot be added across all dimensions
(stock rates).
10. What is a Star Schema?
○ It is a data warehouse design where all the dimensions tables in
the warehouse are directly connected to the fact table.
○ The number of foreign keys in the fact table is equal to the
number of dimensions.
○ It is a simple design and hence faster query.
11. What is a Snowflake Schema?
○ It is a data warehouse design where at least one or more multiple
dimensions are further normalized.
○ Number of dimensions > number of fact table foreign keys
○ Normalization reduces redundancy so storage wise it is better but
querying can be affected due to the excessive joins that need to
be performed.
12. What is granularity?
○ The lowest level of information that is stored in the fact table.
○ Usually determined by the time dimension table.
○ The best granularity level would be per transaction, but it would
require a lot of storage.
13. What is a Surrogate Key?
○ It is a system generated key that is an identity column with the
initial value and incremental value and ensures the uniqueness of
the data in the dimension table.
○ Every dimension table must have a surrogate key to identify each
record!!!
14. What are some advantages of using the Surrogate Key in a Data
Warehouse?
○ 1. Using a SK, you can separate the Data Warehouse and the OLTP:
to integrate data coming from heterogeneous sources, we need to
differentiate between similar business keys from the OLTP. The
keys in OLTP are the alternate key (business key).
○ 2. Performance: The fact table will have a composite key. If
surrogate keys are used, then in the fact table, we will have
integers for its foreign keys.
■ This requires less storage than VARCHAR.
■ The queries will run faster when you join on integers
rather than VARCHAR.
■ The partitioning done on SK will be faster as these are in
sequence.
○ 3. Historical Preservation: A data warehouse acts as a repository
of historical data so there will be various versions of the same
record and in order to differentiate between them, we need a SK

then we can keep the history of data.
○ 4. Special Situations (Late Arriving Dimension): Fact table has
a record that doesn’t have a match yet in the dimension table.
Surrogate key usage enables the use of such a ‘not found’ record
as a SK is not dependent on the ETL process.
15. What is a Data Mapping Document?
○ Some kind of document that is in the form of spreadsheet, a
diagram, or text document that identifies matching OLAP columns
and OLTP columns. The OLAP does not have to have every column of
the OLTP.
16. What is a Business (or Natural) Key?
○ A Business Key is a key that links between the tables in OLTP and
dimension tables in OLAP.
○ The primary key from the OLTP becomes the Business Key in the
Dimension Tables while the surrogate key acts as a unique
identifier for the row in the Dimension Table.
17. What is the datatype difference between a fact and dimension
tables?
○ 1. Fact Tables
■ They hold numeric data.
■ They contain measures.
■ They are deep.
○ 2. Dimensional Tables
■ They hold textual data.
■ They contain attributes of their fact tables.
■ They are wide.
18. What is the cardinality of a relationship between a dimension
table and a fact table?
○ mostly one-to-many relationship (many on the fact table side).
○ it is possible that the relationship can be many-to-many. In
that case, you have to create a factless fact table (same as a
conjunction table but just different terminologies in OLTP and
OLAP).
19. Can you connect from a dimension table to a dimension table?
○ Yes, but it will be a snowflake schema.
20. Can you connect from a fact table to a fact table?
○ No, because measures are different.
21. What are the steps of creating the structure of a Data
Warehouse?
○ 1. Identify and understand the business process to analyze
through two types of meetings:

67
■ 1. GRD: a meeting where top business people get together
and discuss business requirements and what they want to
analyze from information they can gather.
■ 2. JAD: an internal meeting where IT guys get together
and convert business requirements to functional/technical
requirements.
○ 2. Identify and understand the data source such as OLTPs and DB
objects in OLTPs (OLTP should be created first, of course).
○ 3. Identify and understand dimensions and fact tables along with
measures that would be a part of the fact table.
○ 4. Decide the appropriate grain or the level of granularity
according to the business process.
○ 5. Create a Data Mapping Document (DMD) or ETL Mapping Documents
that will have appropriate mapping from the source tables from
the OLTP to destination tables in OLAP.
○ 6. Debate on the star or snowflake schema that could be the
design structure of your data warehouse.
○ 7. Debate on Kimball and/or Inmon methodology that you would use
to design your DW.
○ 8. Create a prototype (POC = Proof Of Concept).
○ 9. Start designing DW using ERWin, MS Visio or manually with
scripts on your Development Server.
22. What are the types of dimension tables?
○ 1. Conformed Dimensions
■ when a particular dimension is connected to one or more
fact tables. ex) time dimension
○ 2. Parent-child Dimensions
■ A parent-child dimension is distinguished by the fact that
it contains a hierarchy based on a recursive relationship.
■ when a particular dimension points to its own surrogate key
to show an unary relationship.
○ 3. Role Playing Dimensions
■ when a particular dimension plays different roles in the
same fact table. ex) dim_time and orderDateKey,
shippedDateKey...usually a time dimension table.
■ Role-playing dimensions conserve storage space, save
processing time, and improve database manageability.
○ 4. Slowly Changing Dimensions: A dimension table that have data
that changes slowly that occur by inserting and updating of
records.
■ 1. Type 0: columns where changes are not allowed - no
change. ex) DOB, SSN

■ 2. Type 1: columns where its values can be replaced without
adding its new row - replacement
■ 3. Type 2: for any change to the value in a column, a new
record is added - historical data. Previous values are kept in
records marked as outdated. Even a single Type 2 column
requires StartDate, EndDate, and Status columns on the row.
■ 4. Type 3: keeps only a limited history by adding extra
columns (for example, a PreviousValue column) instead of extra
rows, so only the most recent prior value(s) are preserved.
■ Type 0 ~ 2 are implemented on the column level.
○ 5. Degenerate Dimensions: a particular dimension that has a
one-to-one relationship between itself and the fact table.
■ When a particular dimension table grows at the same rate as
the fact table, the actual dimension table can be removed and
its attributes can be inserted into the fact table itself.
■ You see this mostly when the granularity level of the facts
is per transaction.
■ E.g. the dimension SalesOrderDate (or other attributes in
DimSalesOrder) would grow every time a sale is made; therefore
the attributes would be moved into the fact table.
○ 6. Junk Dimensions: holds all miscellaneous attributes that may
or may not necessarily belong to any other dimensions. It could
be yes/no, flags, or long open-ended text data.
23. What is your strategy for the incremental load?
○ I used the combination of different techniques for the
incremental load in my previous projects; time stamps, CDC
(Change Data Capture), MERGE statement and CHECKSUM() in TSQL,
LEFT OUTER JOIN, TRIGGER, the Lookup Transformation in SSIS.
24. What is CDC?
CDC (Change Data Capture) is a method to capture data changes, such as
INSERT, UPDATE and DELETE, happening in a source table by reading transaction
log files. Using CDC in the process of an incremental load, you are going to
be able to store the changes in a SQL table, enabling us to apply the changes
to a target table incrementally.

In data warehousing, CDC is used for propagating changes in the source system
into your data warehouse, updating dimensions in a data mart, propagating
standing data changes into your data warehouse and such.

The advantages of CDC are:

- It is almost real-time ETL.
- It is well suited to small, frequent volumes of changes.
- It can be more efficient than replication.
- It is auditable.
- It has configurable cleanup.

-- Create a Change Data Capture Practice DB


CREATE DATABASE CDC_Practice
GO
USE CDC_Practice
GO

-- System stored procedure to enable CDC


EXEC sys.sp_cdc_enable_db -- 'EXEC sys.sp_cdc_disable_db' to disable
CDC

-- Check if CDC is enabled on the CDC_Practice database.


-- If the value is 1, it means it's enabled.
-- If the value is 0, it means it's disabled.
SELECT is_cdc_enabled FROM sys.databases where name = 'CDC_Practice'

-- Create a source table and populate it.


CREATE TABLE source_table(
ID INT NOT NULL,
Name VARCHAR(50))
-- Must have a primary key.
ALTER TABLE source_table
ADD CONSTRAINT pk_source_table_ID
PRIMARY KEY (ID)

INSERT INTO source_table VALUES (1,'A'),(2,'B'),(3,'C')

SELECT * FROM source_table

-- Enable CDC on the table 'source_table'.


EXEC sys.sp_cdc_enable_table
@source_schema = 'dbo',
@source_name = 'source_table',
@role_name = 'jihoon',
@supports_net_changes = 1
-- Verify it (should be 1).
SELECT is_tracked_by_cdc FROM sys.tables WHERE name = 'source_table'

-- Update the source table.


UPDATE source_table

SET Name = 'Jihoon'
WHERE ID = 3

DELETE FROM source_table


WHERE ID = 2

INSERT INTO source_table VALUES (100, 'whatever')

SELECT * FROM source_table

-- Retrieve the changes.


SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_source_table
(sys.fn_cdc_get_min_lsn('dbo_source_table'),
sys.fn_cdc_get_max_lsn(), 'all')

SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_source_table


(sys.fn_cdc_get_min_lsn('dbo_source_table'),
sys.fn_cdc_get_max_lsn(), 'all')

You can see the products of CDC on the Object Explorer.

Disadvantages of CDC are:


- Lots of change tables and functions
- Bad for big changes e.g. truncate & reload

Optimization of CDC:
- Stop the capture job during load
- When applying changes to target, it is ideal to use merge.

25. What is Late Arriving Dimensions/Early Arriving Facts?


○ Sometimes when you implement an incremental load for your ETL
strategy, you have to include late arriving dimensions or early
arriving facts in your consideration.
○ For example in a banking scenario, the banking data warehouse
might receive just the amount of a transaction first then its
attribute later, such as where the transaction occurred, who is
the owner of the transaction and such.
○ There are several options to handle this problem:
■ 1. You could just hold onto the fact in the staging area if
it is expected that the late arriving dimension data to be
coming in soon.
■ 2. You could also create an empty row in the dimension
called ‘Unknown’. If you are processing orders and no

information about a promotion comes in, then it would be
safe to link the fact to a special row in the dimension
that denotes that no promotional information was available.
Depending on your dimension and business requirements, you
could have many different levels of unknowns. For example:
● -1, ‘none’
● -2, ‘unknown’
● -3, ‘not applicable’
■ 3. Another option is to use an inferred dimension. Simply
insert a new row into dimension with all of the information
you know to be true about the dimension. In order to do
this, you need to make another attribute in your dimension
called “inferred” and its value with true or false.
■ 4. The last method is to use the Slowly Changing Dimension
Transformation in SSIS. Using it, you can enable inferred
member support on the SCD Wizard.
26. What are Time Stamps?

Timestamp is a synonym for the rowversion data type and is subject to
the behavior of data type synonyms. Its values are generated
automatically: each rowversion value is a unique 8-byte binary number
within a database, and it is usually used for version-stamping table
rows. To record an actual date or time, you would use a data type such
as datetime2 instead.

-- Create an example table with a time stamp column.


CREATE TABLE ExampleTable (ID int PRIMARY KEY, Name VARCHAR(10),
timestamp)

-- Populate the table.


INSERT INTO ExampleTable VALUES (1, 'A',DEFAULT)
INSERT INTO ExampleTable VALUES (2, 'B',DEFAULT)
INSERT INTO ExampleTable VALUES (3, 'C',DEFAULT)
INSERT INTO ExampleTable VALUES (4, 'D',DEFAULT)
INSERT INTO ExampleTable VALUES (5, 'E',DEFAULT)
SELECT * FROM ExampleTable

/* -- Result
1 A 0x00000000000007DD
2 B 0x00000000000007DE
3 C 0x00000000000007DF
4 D 0x00000000000007E0
5 E 0x00000000000007E1
*/

-- Do some changes.
UPDATE ExampleTable
SET Name = 'Jihoon'
WHERE ID = 4

UPDATE ExampleTable
SET Name = 'AA'
WHERE ID = 1

DELETE FROM ExampleTable


WHERE Name = 'E'

INSERT INTO ExampleTable VALUES (77, 'oo', DEFAULT), (88, 'ooo',


DEFAULT)

SELECT * FROM ExampleTable


/* -- Result after changes
1 AA 0x00000000000007E5
2 B 0x00000000000007DE
3 C 0x00000000000007DF
4 Jihoon 0x00000000000007E4
77 oo 0x00000000000007E2
88 ooo 0x00000000000007E3
*/

As you can see, each row has a unique rowversion value. If a change occurs
on a row, it is given a new rowversion value, so you are able to capture
the changes that occurred.

27. What is MERGE in TSQL?


○ It is a new feature introduced in SQL 2008.
○ It is a way to manage update/delete/insert in a dimension table.
○ It is commonly used when loading data and allows you to perform
multiple DML operations.
○ It can be used as one of the methods of incremental load.
○ In order to use MERGE, you first need to provide a source table
and a destination table. Then, you need to give a method of
matching rows between the two tables such as on the primary key
or the business key.
○ Then, you can specify a number of actions depending on whether
a match is found or not. If a match is found, you may want to
update the existing record. If a match is not found, then it is

likely that we will want to insert a new record.
MERGE Customer AS [Target]
USING StagingCustomer AS [Source]
ON Target.Email = Source.Email
WHEN MATCHED AND
(
Target.FirstName <> Source.FirstName
OR Target.LastName <> Source.LastName
OR Target.Title <> Source.Title
OR Target.DoB <> Source.DoB)
THEN UPDATE SET
FirstName = Source.FirstName
,LastName = Source.LastName
,Title = Source.Title
,DoB = Source.DoB
,DTUpdated = GETDATE()
WHEN NOT MATCHED BY TARGET
THEN INSERT (
Email
,FirstName
,LastName
,Title
,DoB
,IsActive
,DTInserted
,DTUpdated)
VALUES (
Source.Email
,Source.FirstName
,Source.LastName
,Source.Title
,Source.DoB
,1
,GETDATE()
,GETDATE()
)
WHEN NOT MATCHED BY SOURCE
AND Target.IsActive=1
THEN UPDATE SET
IsActive = 0
,DTUpdated = GETDATE()
;

Model 4 - SSIS
1. What is SSIS?
○ SQL Server Integration Services
○ It is a platform for building enterprise-level data integration
and data transformations solutions.
○ Simply put, it is a component of the Microsoft SQL Server
database software that can be used for a broad range of data
migration tasks, data integration and workflow applications. It
also features a fast and flexible data warehousing tool used for
data extraction, transformation and loading (ETL).
○ The Integration Services is used to solve complex business
problems by copying or downloading files, sending e-mail messages
in response to events, updating data warehouses, cleaning and
mining data, and managing SQL Server objects and data.
○ The packages can work alone or along with other packages to
address complex business needs.
○ Integration Services can extract and transform data from a
wide variety of sources such as XML data files, flat files, and
relational data sources, and then load the data into one or more
destinations.
○ Integration Services includes a rich set of built-in tasks
and transformations; tools for constructing packages; and the
Integration Services service for running and managing packages.
○ You can use the graphical Integration Services tools to create
solutions without writing a single line of code; or you can
program the extensive Integration Services object model to create
packages programmatically and code custom tasks and other package
objects.

2. Are variables that are declared in SSIS case-sensitive?
○ YES!!! (not case-sensitive in TSQL)
3. What are the three scopes of variables in SSIS?
○ 1. package bound
○ 2. container bound
○ 3. task bound
4. What is Control Flow?
○ The highest point of your package from where the execution of
package starts.
○ Three executables of Control Flow:
■ 1. package
■ 2. container
■ 3. tasks
○ Arrows in Control Flow represent Precedence Constraints.
■ Green Arrow: a path to take on success.
■ Red Arrow: a path to take on failure.
■ Blue Arrow: a path to take on completion.
5. What is Data Flow?
○ A flow that is used to actually move data from point A to point
B.
○ Data flow exists only when you have at least one Data Flow Task
in Control Flow.
○ For every data flow task, which is one of the components of the
Control Flow, we have a separate Data Flow tab.
○ There won't be any Data Flow Tab if we don't have any data flow
tasks in Control Flow.
○ Three components of Data Flow:
■ 1. Sources Adapters -> (E)
■ 2. Transformations -> (T)
■ 3. Destination Adapters -> (L)
○ Arrows on Data Flow represent Data Pipeline.
■ Green Arrow: a path to take on success.
■ Red Arrow: a path to take on failure.
6. What is Connection Manager?
○ It connects SSIS to the outside world by pointing to different data sources.
○ It can be used to connect to a source of data or to a
destination.
○ Integration Services includes a variety of connection managers
for connecting to different data sources, such as relational
databases, Analysis Services databases, and files in CSV and XML
formats.
○ A connection manager can be created at the package level or at
the project level. The connection manager created at the project

76
level is available to all the packages in the project, whereas a
connection manager created at the package level is available only
to that specific package.
7. List the Control Flow Items and their description.
○ Containers
■ For Loop Container
● A SSIS control flow item that defines a repeating
control flow in a package.
● Similar to a for loop in any other programming
languages.
● In order to use a For Loop container, you have to
specify following elements to define the loop:
○ 1. An optional initialization expression that
assigns values to the loop counters. (Ex.
@counter = 1)
○ 2. An evaluation expression that contains the
expression used to test whether the loop should
stop or continue. (Ex. @counter <= 4)
○ 3. An optional iteration expression that
increments or decrements the loop counter. (Ex.
@counter = @counter + 1)
● Any expression that uses the assignment operator must
have the syntax @Var = <expression>.
● You can create multiple different files using a For
Loop container. For example, you extract data from
a certain table and load the data to five different
flat files. In order to do so, you have to make the
ConnectionString of the destination's connection manager
dynamic by configuring the Expressions under the
Properties of the Connection Manager.
■ Foreach Loop Container
● A SSIS control flow item that defines a repeating
control flow in a package but it is an advanced
version of the For Loop Container.
● But the difference is that looping of the Foreach
Loop Container is enabled by using a Foreach
enumerator. So the Container repeats the control flow
for each member of a specified enumerator.
● It is more dynamic than the For Loop Container
because you don’t have to specify how many times the
Foreach Loop Container will loop. On the other hand,
you have to specify the condition using a counter
variable in the For Loop Container.

77
● Commonly used Enumerator types:
○ 1. Foreach File Enumerator
○ 2. Foreach Item Enumerator: Define the items in
the Foreach Item collection, including columns
and column data types.
○ 3. Foreach From Variable Enumerator: Specify
the variable that contains the objects to
enumerate.
● Three ‘Retrieve File Name’ Options:
○ 1. Fully qualified (recommended since you are
specifying the full path, so SSIS doesn't have
to look the file up by just its name)
○ 2. Name and extension
○ 3. Name only
■ Sequence Container
● A SSIS Control Flow item that defines a control flow
that is a subset of the package control flow.
● A sequence container logically groups the package
into multiple separate control flows, each containing
one or more tasks and containers that run within the
overall package control flow.
● A sequence container can be used to implement a
transaction on the SSIS level.
○ Tasks
■ Bulk Insert Task
● An SSIS Control Flow item that provides an efficient way to copy
a large amount of data into a SQL Server table or
view.
● The same concept as BULK INSERT in TSQL.
● Bulk Insert Task needs two connections:
○ 1. Source connection: a file that you want to
extract data from.
○ 2. Destination connection: a table/view that
you want to dump the data to.
● The table/view you BULK INSERT into must already exist.
● Edit -> Options -> BatchSize: the number of rows to copy
per batch. The default value is 0, which means the whole
dataset is committed to the table/view in a single batch.
If you set the value to 10,000, for example, the data is
committed in batches of 10,000 rows.

78
● Edit -> Options -> MaxErrors: the number of error rows to
tolerate before the task fails. With the default value of
0, the task fails on the first error encountered. If the
value is 5, for example, the first five rows that produce
an error are ignored.
● Edit -> Options -> FirstRow initially starts from 1,
not from 0.
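● As a rough illustration, the T-SQL statement the Bulk
Insert Task essentially wraps might look like the sketch
below (file path, table name, and option values are
assumptions, not taken from a real package):

BULK INSERT dbo.StagingCustomer          -- destination table must already exist
FROM 'C:\Loads\Customer.txt'             -- source flat file
WITH (
    FIELDTERMINATOR = ',',               -- column delimiter
    ROWTERMINATOR   = '\n',              -- row delimiter
    FIRSTROW        = 2,                 -- skip the header row
    BATCHSIZE       = 10000,             -- rows per committed batch
    MAXERRORS       = 5                  -- error rows tolerated before failing
);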
■ Data Flow Task
● A SSIS Control Flow item that encapsulates the data
flow engine that moves data between sources and
destinations providing the facility to transform,
cleanse, and modify data as it is moved.
● A data flow consists of at least one data flow
component, but it is typically a set of connected
data flow components: sources that extract data;
transformations that modify, route, or summarize
data; and destinations that load data.
● At run time, the Data Flow task builds an execution
plan from the data flow, and the data flow engine
executes the plan. You can create a Data Flow task
that has no data flow, but the task executes only if
it includes at least one data flow.
● Each Data Flow Task in the Control Flow gets its own
corresponding Data Flow tab and canvas.
● If you right click on it and go to Edit, it will
direct you to its corresponding Data Flow canvas.
■ Execute Package Task
● A SSIS Control Flow item that lets you run other
packages in a package as part of a workflow.
● This task is mainly used to make master-child
packages.
● Each Execute Package Task needs its own connection to
point to the package.
● One Execute Package Task for one package, which means
the task cannot have multiple packages in it.
● Using master-child packages you can do modular
programming (just like SPs in TSQL), which will
improve the manageability and readability of your
SSIS packages.
● Edit -> Package -> ExecuteOutOfProcess

79
○ False (default): the child package runs in the same
process as the master package, so they share the same
memory in RAM.
○ True: a separate process is created for the child
package, so the master and child packages each have
their own memory in RAM. This isolates the child from
the parent (a crash in the child will not bring down
the master) at the cost of extra memory and the
overhead of starting another process.
■ Execute Process Task
● A SSIS Control Flow item that runs an application
or .exe file (outside of SSIS) as part of a SSIS
package workflow.
● You can run any type of application including
Microsoft Word or Excel.
■ Execute SQL Task
● A SSIS Control Flow item that runs SQL statements
(not necessarily TSQL statements) or stored
procedures from a package.
● The task can contain either a single SQL statement or
multiple SQL statements that run sequentially.
● Generally you can use the Execute SQL task for the
following purposes:
○ Truncate a table or view in preparation for
inserting data.
○ Create, alter, and drop database objects such
as tables and views.
○ Re-create fact and dimension tables before
loading data into them.
○ Run stored procedures.
○ Save the rowset returned from a query into a
variable.
● You can also pass variables to the Execute SQL Task as
parameters with the help of ? placeholders (with an OLE DB
connection). In that case, you have to configure the
'Parameter Name' values on the Parameter Mapping page to
map the variables to the parameters, as in the sketch below.
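● A minimal sketch of a parameterized statement you might put
in an Execute SQL Task over an OLE DB connection (table,
column, and variable names are assumptions):

DELETE FROM dbo.StagingCustomer
WHERE LoadDate < ?;   -- ? maps to a package variable, e.g. User::CutoffDate,
                      -- via Parameter Name 0 on the Parameter Mapping page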
■ File System Task
● A SSIS Control Flow item that performs operations on
files and directories in the file system.
● The list of the operations of a File System Task:
○ Copy directory
○ Copy file
○ Create directory

80
○ Delete directory
○ Delete directory content
○ Delete file
○ Move directory
○ Move file
○ Rename file
○ Set attributes: sets attributes on files and
folders. Attributes include Archive, Hidden,
Normal, ReadOnly, and System.
■ FTP Task
● A SSIS Control Flow item that performs FTP operations
such as sending and receiving files.
● Besides downloading and uploading files, you can also
manage files on servers.
● You can use the FTP task for the following purposes:
○ 1. Copying directories and data files from one
directory to another, before or after moving
data, and applying transformations to the data.
○ 2. Logging into a source FTP location and
copying files or packages to a destination
directory.
○ 3. Downloading files from an FTP location and
applying transformations to column data before
loading the data into a database.
● Predefined FTP Task operations:
○ 1. Send files
○ 2. Receive files
○ 3. Create local directory
○ 4. Create remote directory
○ 5. Remove local directory
○ 6. Remove remote directory
○ 7. Delete local files
○ 8. Delete remote files
■ Script Task
● A SSIS Control Flow item that provides interactive
custom task authoring.
● It allows you to write and edit scripts using VB.NET
or C# and configure the task's properties.
● You can select read-only and read-write variables.
● You would use it to implement a feature that is not
provided by SSIS.
■ Send Email Task
● A SSIS Control Flow item that sends an email.

81
● By using the Send Mail task, a package can send
messages if tasks in the package workflow succeed or
fail, or send messages in response to an event that
the package raises at run time.
8. List the Data Flow Items and their description.
○ Source Adaptors
■ Excel Source
● A SSIS Data Flow item that extracts data from
worksheets or ranges in Microsoft Excel workbooks.
■ Flat File Source
● A SSIS Data Flow item that reads data from a text
file.
● The text file can be in delimited, fixed width, or
mixed format.
■ OLE DB Source (Object Linking & Embedding Database)
● A SSIS Data Flow item that extracts data from a
variety of OLE DB-compliant relational databases by
using a database table, a view, or an SQL command.
● For example, the OLE DB source can extract data from
tables in Microsoft Office Access or SQL Server
databases.
● The OLE DB source provides four different data access
modes for extracting data:
○ 1. A table or view.
○ 2. A table or view specified in a variable.
○ 3. The results of an SQL statement.
○ 4. The results of an SQL statement stored in a
variable.
○ Transformations
■ Aggregate Transformation
● A SSIS Data Flow item that aggregates or groups
values in a dataset.
● It applies aggregate functions, such as Average, to
column values and copies the results to the
transformation output.
● Besides aggregate functions, the transformation
provides the GROUP BY clause, which you can use to
specify groups to aggregate across.
● Aggregate Functions:
○ 1. Group by
○ 2. Sum
○ 3. Average
○ 4. Count

82
○ 5. Count Distinct: Returns the number of unique
non-null values in a group.
○ 6. Minimum (= MIN in TSQL)
○ 7. Maximum (= MAX in TSQL)
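● As a rough T-SQL analogue, an Aggregate transformation
configured with 'Group by' on a key and 'Sum' on an
amount behaves like the query below (table and column
names are assumptions):

SELECT CustomerKey,
       SUM(SalesAmount) AS TotalSalesAmount
FROM dbo.FactInternetSales
GROUP BY CustomerKey;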
■ Audit Transformation
● A SSIS Data Flow item that enables the data flow in
a package to include data about the environment in
which the package runs.
● You can think of it as a highly specific version of
the Derived Column transformation. It performs the
same function, adding a new column to the dataset.
However, it's limited to a few package and task-
specific options such as Package ID, Package Name,
User Name, Task Name and so on.
● For example, the name of the package, computer, and
operator can be added to the data flow. Microsoft SQL
Server Integration Services includes system variables
that provide this information.
● http://www.mssqlsage.com/content/audit-transformation
■ Character Map Transformation
● A SSIS Data Flow item that applies string functions,
such as converting from lowercase to uppercase, to
character data.
● You can set the destination of it to either a new
column (New Column) or just replacing the existing
column (In-place change).
● Some operations of the transformation:
○ 1. Byte reversal: Reverses byte order.
○ 2. Lowercase: Converts characters to lowercase.
○ 3. Uppercase: Converts characters to uppercase.
■ Conditional Split Transformation
● A SSIS Data Flow item that can route data rows to
different outputs depending on the content of the
data and the conditional expression you specify.
● The same concept as a CASE/switch decision structure in
programming languages and a WHERE clause in TSQL.
● You can also specify the default output so if a row
does not match any of the expressions specified, it
will be directed to the default output.
■ Copy Column Transformation
● A SSIS Data Flow item that creates new columns by
copying input columns and adding the new columns to
the transformation output.

83
● For example, you can use the Copy Column
transformation to create a copy of a column and then
convert the copied data to uppercase characters by
using the Character Map transformation, or apply
aggregations to the new column by using the Aggregate
transformation.
■ Data Conversion Transformation
● A SSIS Data Flow item that converts the data in an
input column to a different data type and then copies
it to a new output column.
● The similar concept as CONVERT or CAST in TSQL.
■ Derived Column Transformation
● A SSIS Data Flow item that creates new column values
by applying expressions to transformation input
columns.
● The same concept as a derived column in TSQL, which
is a column that is generated on the fly.
● An expression can contain any combination of
variables, functions, operators, and columns from the
transformation input.
■ Export Column Transformation
● An SSIS Data Flow item that allows you to export data
from a column that holds binary values (such as images
or documents) out to files of a certain format.
■ Fuzzy Lookup
● An SSIS Data Flow item that is for data cleansing,
such as standardizing data, correcting data, and
providing missing values using a complex string-matching
algorithm; this work should be done in the data staging
area.
● It is similar to the Lookup Transformation. However,
instead of exact matching, the Fuzzy Lookup will do
fuzzy matching.
● How close a match must be is controlled by the
similarity threshold value of the Fuzzy Lookup. If you
set the threshold to 1.00, it behaves the same as the
Lookup. If you set the threshold to 0.00, every dirty
value will be mapped to some clean value.
● It has two inputs:
○ Reference Table: the clean data (static) in the
copy of the dimension table in the data staging
area.
○ Actual Table: the dirty data from OLTP.

84
● You can specify tokens that should not be considered
as dirty data. For example, ‘ should not be
considered as dirty data when someone’s name is
William O’neil.
● You can set up an error-tolerant index (ETI)
to improve the performance of the Fuzzy Lookup
Transformation (and the Fuzzy Grouping Transformation).
Instead of building the match tokens on the fly every
time, the tokens are stored in the index for faster
lookups.
● _Similarity and _Confidence columns will be created
automatically in the Lookup destination.
○ Similarity: shows how similar the dirty value is
to the matched clean value.
○ Confidence: shows how confident the SSIS engine is
about the fuzzy match.
■ Fuzzy Grouping
● A SSIS Data Flow item that performs data cleaning
tasks by identifying rows of data that are likely to
be duplicates and selecting a canonical row of data
to use in standardizing the data.
● Unlike the Fuzzy Lookup, it just takes one input,
which is dirty data.
● It is for grouping a similar type of dirty data.
● The Fuzzy Grouping transformation requires a
connection to an instance of SQL Server to create the
temporary SQL Server tables that the transformation
algorithm requires to do its work.
● key_in, key_out, and score columns will be created
automatically.
○ key_in: identifier in the Fuzzy Group.
○ key_out: the value that shows which group the
row belongs to.
○ score: the score that shows how matching the
data is to its canonical row.
○ When key_in = key_out, that means it is
canonical (the best matching row in a group).
○ For example, if two rows have the same key_out
value, they are in the same fuzzy group. A
non-canonical row points to its canonical row: the
key_out of the non-canonical row equals the key_in
of the canonical row.

85
■ Import Column Transformation
● An SSIS Data Flow item that allows you to import data
from files into a column of a binary format in the data
flow (the reverse of the Export Column transformation).
■ Lookup Transformation
● A SSIS Data Flow item that performs lookup, which is
mainly used to implement the functionality of ‘equi-
join’.
● Equi-join, as used here, is a join that returns only
the first matching value and skips the rest of the
subsequent matching values from the reference table,
based on the joining columns.
● Ex:
Emp                     Phone
ID  Name                ID  Phone
1   A                   1   1111
2   B                   1   2222
3   C                   2   3333
4   D                   3   4444
                        3   5555

If you do an equi-join on the Emp and Phone tables based on the ID column by
using the Lookup Transformation...

First (green) pipeline:         Second pipeline (green no-match or red):
1   A   1111                    4   D   NULL
2   B   3333
3   C   4444
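A rough T-SQL analogue of the Lookup's "first match only" behavior, using the
hypothetical Emp and Phone tables above (OUTER APPLY with TOP 1 is just one way
to sketch it, not what SSIS does internally):

SELECT e.ID, e.Name, p.Phone
FROM Emp AS e
OUTER APPLY (
    SELECT TOP (1) ph.Phone
    FROM Phone AS ph
    WHERE ph.ID = e.ID
) AS p;
-- matched rows go to the first pipeline; the unmatched row (ID = 4)
-- returns NULL, like the Lookup's no-match / error output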

● The Lookup Transformation takes two inputs:


○ 1. Reference table: not coming through a data
pipeline (hard drive). Dimension tables in DW
are always a reference table.
○ 2. Actual table: coming through a data pipeline
(RAM). Tracking tables are always an actual
table.
● This transformation is mainly used to implement
incremental load from OLTP to OLAP.
● Edit -> General -> Cache Mode
○ 1. Full cache: faster matching
○ 2. Partial cache

86
○ 3. No cache: slower matching, since the reference
table is queried row by row instead of being cached
in memory.
■ Merge Transformation
● An SSIS Data Flow item that allows you to perform a
UNION ALL.
● You can think of it as a little brother of the Union All
transformation, but it can accept at most two inputs and
the inputs must be sorted.
■ Merge Join Transformation
● A SSIS Data Flow item that provides an output that
is generated by joining two sorted datasets using a
FULL, LEFT, or INNER join.
● The same concept as JOINs in TSQL.
● You can specify if the Merge Join transformation uses
INNER, LEFT, or FULL join.
○ If you want to implement the feature of RIGHT
OUTER JOIN, you can use LEFT join and just swap
the two tables.
● You can specify the joining columns.
● Requirements of Merge Join transformation:
○ 1. The joining columns of the two data must
have the same metadata (data type).
○ 2. The joining columns must be sorted in the
ascending order.
● In order to let SSIS know that a certain column is
sorted, you can use Sort transformation (which is not
recommended because it slows down the performance)
or you can order it when extracting data from a
database table using ORDER BY. In that case, you have
to go to 'Show Advanced Editor...' of the source, set
the output's IsSorted property to True, and set
'SortKeyPosition' on the sorted columns, as in the
sketch below.
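● As a rough illustration (table and column names are
assumptions), the SQL command of an OLE DB source feeding
a Merge Join might sort on the joining column like this:

SELECT CustomerKey, FirstName, LastName
FROM dbo.DimCustomer
ORDER BY CustomerKey;   -- joining column sorted ascending; then set
                        -- IsSorted = True on the output and
                        -- SortKeyPosition = 1 on CustomerKey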
■ Multicast Transformation
● A SSIS Data Flow item that distributes its input to
one or more outputs.
● Similar to the Conditional Split but the difference
is that the Multicast transformation directs every
row to every output, and the Conditional Split
directs a row to a single output.
■ OLE DB Command Transformation
● A SSIS Data Flow item that runs an SQL statement for
each row in a data flow in the Data Flow level.

87
● You will need a separate connection manager to push
data into.
● Practical use:
○ performing incremental loads on dimension tables
or fact tables.
○ implementing cursor-like, row-by-row logic as in TSQL.
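● For example (hypothetical table and columns), the
statement inside an OLE DB Command might be a
parameterized UPDATE that runs once per incoming row:

UPDATE dbo.DimCustomer
SET    EndDate  = ?,        -- Param_0: mapped to an input column
       IsActive = 0
WHERE  CustomerAK = ?;      -- Param_1: mapped to the business key column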
■ Percentage Sampling Transformation
● An SSIS Data Flow item that gives you roughly the
feature of TOP ... PERCENT in TSQL: it takes a random
sample of the specified percentage of rows.
■ Row Count Transformation
● A SSIS Data Flow item that counts rows as they pass
through a data flow and stores the final count in a
variable.
● The variable the Row Count Transformation writes to must
already exist and must be in the scope of the task.
● The Row Count transformation has only one input and
output. It does not have an error output.
■ Script Component Transformation
● A SSIS Data Flow item that allows you to write any VB
or C# scripts on the Data Flow level.
■ Slowly Changing Dimension Transformation
● A SSIS Data Flow item that coordinates the updating
and inserting of records in data warehouse dimension
tables.
● It is the only transformation that has its
own ‘wizard’.
● Step 1) You need to specify which column is going
to be a business key and which columns are not a
key column. You don’t need to specify anything for
historical attributes such as StartDate, EndDate and
Status.
● Step 2) Then you have to specify types, type 0 (fixed
attribute), type 1 (changing attribute), or type 2
(historical attribute), for each column specified in
the first step. You cannot specify type 3 unless you
write a separate script for it.
● Step 3) This is where you can change the options of
fixed attributes and changing attributes.
○ Check or uncheck ‘Fail transformation if
changes are detected in a fixed attribute’
(Checked recommended).
○ Check or uncheck ‘Change all the records,
including outdated records, when changes are

88
detected in a changing attribute’ (Unchecked
recommended).
● Step 4) The step where you specify historical
attributes to record. You can pick either to use a
single column (Status -> Current or Expired) or two
columns (StartDate and EndDate).
● Step 5) The last step, where you can enable inferred
members for the case where a fact table references
dimension members that are not yet loaded (early
arriving facts or late arriving dimensions).
○ Inferred Dimension Members: dummy records
that are used in a dimension table just for
the purpose of inserting into the fact table
and the dummy records will be replaced later
manually.
● It is convenient but not recommended for large loads,
because the Slowly Changing Dimension transformation is
broken down into multiple components, such as Derived
Columns and OLE DB Commands, that slow down the
performance.
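● As a rough T-SQL sketch of what the generated components
do for a type 2 (historical) change: expire the current
row, then insert the new version (table, column, and
variable names are assumptions):

DECLARE @CustomerAK INT = 12345,            -- hypothetical business key
        @FirstName  VARCHAR(50) = 'Jane',
        @LastName   VARCHAR(50) = 'Doe';

-- expire the current version of the changed member
UPDATE dbo.DimCustomer
SET    EndDate = GETDATE()
WHERE  CustomerAK = @CustomerAK
  AND  EndDate IS NULL;

-- insert the new version as the current row
INSERT INTO dbo.DimCustomer (CustomerAK, FirstName, LastName, StartDate, EndDate)
VALUES (@CustomerAK, @FirstName, @LastName, GETDATE(), NULL);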
■ Sort Transformation
● A SSIS Data Flow item that sorts its input in an
ascending or descending order and copies the sorted
data to its output.
● You can specify ‘Sort Order’ if you want to sort by
multiple columns. The similar concept as ORDER BY in
TSQL. For example, ORDER BY BirthDate, Salary will
sort by BirthDate first, which will be Sort Order 1
in SSIS, and sort by Salary next, which will be Sort
Order 2 in SSIS.
● It is a blocking operation.
■ Union All Transformation
● The name says everything :p
● Just that it can accept multiple inputs.
○ Destination Adaptors
■ Excel Destination
● An SSIS Data Flow item that loads data into worksheets
or ranges in Microsoft Excel workbooks.
■ Flat File Destination
● A SSIS Data Flow item that loads data into a flat
file.
● It writes data into a text file. The text file can
be in delimited, fixed width, fixed width with row
delimiter, or ragged right format.

89
■ OLE DB Destination (Object Linking & Embedding Database)
● An SSIS Data Flow item that loads data into a
relational DB by using an OLE DB provider.
● It loads data into a variety of OLE DB-compliant
databases using a database table or view or an SQL
command.
● The OLE DB destination provides five different data
access modes for loading data:
○ 1. A table or view.
○ 2. A table or view using fast-load options.
○ 3. A table or view specified in a variable.
○ 4. A table or view specified in a variable
using fast-load options.
○ 5. The results of an SQL statement.
9. What are the error handling methods in SSIS?
○ Control Flow level
■ 1. Using precedence constraints
● On success, failure, or completion
● When multiple precedence constraints point to the same
executable, you can configure whether all of them have
to be met (Logical AND, a solid line) or just one of
them (Logical OR, a dotted line) in order for that
executable to run.
○ Ex: Two tasks point to a third task, which produces
two constraints. With Logical OR, the third task
runs as soon as either of the two constraints is
satisfied.
● A Precedence constraint can be configured to evaluate
a passing data flow by a constraint, expression, or
combination of the two.
● So you can also set up an expression to check through
on a precedence constraint. Then you have four
evaluations here:
○ 1. Constraint
○ 2. Expression
○ 3. Constraint AND Expression
○ 4. Constraint OR Expression
■ 2. Using Event Handlers: an event handler is a trigger
that is executed on a particular event. You can add
executables to the event handler canvas, which acts like
a small package that runs when the event fires.
NOTE: Event Handlers can only be executed on the control
flow level.

90
● OnError
● OnTaskFailed (most commonly used)
● OnError and OnTaskFailed are the actual error
handling events.
● OnPostExecute
● OnPreExecute
● OnPostValidate
● OnPreValidate
■ When handling errors, always set the system variable
Propagate to false. If it is not set to false, the same
events will also be fired for the parent control flow
objects. This is problematic because error handling on
the package level may contain a different set of tasks
to execute than on the data flow level. If Propagate is
set to true for a particular task, it will fire both the
parent's event of the same type and its own event.
○ Data Flow level (Right click on an executable -> Edit -> Error
Output)
■ 1. Ignore Failure
■ 2. Redirect Row (to the red/error data pipeline)
■ 3. Fail Component (default)
○ You should also change the values of some properties:
■ MaxErrorCount
■ DelayValidation
10. What are transactions in SSIS and how can they be useful?
○ Transactions in SSIS are similar to transactions in TSQL in that
they have the ACID properties: if one control flow executable
fails, then all the other executables in the transaction roll back.
○ Transactions are placed on the Control Flow level.
○ Each executable has a 'TransactionOption' property with three
possible values:
■ 1. Required
● Starts a transaction; if a transaction already exists in
a parent executable, the executable joins it (it acts as
if it were set to Supported).
■ 2. Supported
● Indicates that the executable joins its parent's
transaction if one exists, but does not start one itself.
■ 3. Not Supported

91
● The executable does not participate in the transaction;
if an executable with 'Not Supported' fails, nothing is
rolled back because of it.
11. What is the Checkpoint and what are the steps of implementing
the Checkpoints in SSIS?
○ As the name ‘checkpoint’ says, it allows you to save a checkpoint
on a certain spot of the workflow. So if you create a checkpoint
on a package, the package can be restarted from the latest point
of failure.
○ You use transactions in combination with checkpoints to create a
performance enhancing ETL Strategy.
○ Whenever an executable fails, a checkpoint file will be created
to let the SSIS package know where to restart the package upon
where it had previously failed (Starts from the latest checkpoint
of failure). Without a checkpoint file, the workflow will just
start from scratch.
○ Steps:
■ 1. Package -> Properties -> CheckPointFileName
● Where you save your checkpoint file that logs the
checkpoints.
■ 2. Package -> Properties -> CheckPointUsage
● Where you set up the usage of the checkpoint
○ never: always start from scratch.
○ ifExist: if the checkpoint file exists, start
from wherever that file points to.
○ always: you must have the checkpoint file or it
will throw an error.
■ 3. Set ‘SaveCheckpoints’ to TRUE
■ 4. Set ‘FailPackageOnFailure’ to TRUE on an executable on
which you want to save a checkpoint.
12. What are the benefits to using Checkpoints?
● Avoid repeating the downloading and uploading of large files.
● Avoid repeating the loading of large amounts of data.
● Avoid repeating the aggregation of values.
13. Why is using StartDate and EndDate as historical attributes
better than Status as a historical attribute?
○ Using StartDate and EndDate, you can tell exactly during which
period each old record was current, whereas a Status column only
tells you whether a record is current or expired.
14. What is the difference between the Merge Transformation and
Union All Transformation?
● Merge Transformation can accept only 2 inputs and the inputs have
to be sorted before the merge.

92
● Union All Transformation can accept multiple inputs, and they do
not have to be sorted.
15. What is the difference between the Data Flow Task and the Bulk
Insert Task?
Source
○ Data Flow Task: you can have any type of source, such as Excel,
Flat File, OLE DB Table/View, XML, etc.
○ Bulk Insert Task: you can have only a Flat File as its source.

Number of Connections
○ Data Flow Task: you can have one or more source connections and
destination connections.
○ Bulk Insert Task: you must have two connections, a source
connection for the flat file in the hard disk and a destination
connection for the table in SQL Server.

Transformation
○ Data Flow Task: you can make transformations between the source
and the destination.
○ Bulk Insert Task: you cannot make transformations between the
source and the destination.

Destination
○ Data Flow Task: you can have any type of destination, such as
Excel, Flat File, OLE DB Table/View, XML, etc.
○ Bulk Insert Task: you can have only an OLE DB Table/View as its
destination.

16. What is the difference between the For Loop Container and
Foreach Loop Container?
Description
○ For Loop Container: the same concept as a for-loop in programming
languages such as C or Java.
○ Foreach Loop Container: an advanced version of the For Loop
Container that allows you to loop over files in a folder, items,
variables and so on.

Characteristic
○ For Loop Container: static - you have to set up a defined number
of loops with a counter variable.
○ Foreach Loop Container: dynamic - the number of loops changes
depending on its enumerator.

93
Configuration
○ For Loop Container: after setting up a counter variable, you have
to enter its initial expression (@counter = 1), evaluation
expression (@counter <= 5) and assignment expression
(@counter = @counter + 1).
○ Foreach Loop Container: example of the Foreach File enumerator -
after setting a variable for the dynamic change of the file path,
you have to map it when configuring the Foreach Loop container,
choosing 'Fully qualified', 'Name and extension', or 'Name only'.

Looping Method
○ For Loop Container: an integer counter variable.
○ Foreach Loop Container: an enumerator. The most commonly used
enumerators are the Foreach File, Foreach Item, and Foreach From
Variable enumerators.

17. What are the types of Precedence Constraints (in terms of
constraint options)?
○ Evaluation Operation: Constraint, Expression, Constraint and
Expression, Constraint or Expression.
○ Precedence Constraint Values: success, failure, completion.
○ For multiple Precedence Constraints coming into one executable:
Logical AND (default and straight line) and Logical OR (dotted
line).
18. Explain about Master-Child packages.
○ Just like stored procedures in TSQL, using master-child packages
allow you to do modular programming, which gives you better
manageability and readability of your SSIS packages.
○ I created master-child packages using the Execute Package Task
on the control flow level so I could control the sequence of the
package execution accordingly and conditionally. Sometimes I
had to even disable a particular package to exclude it from the
package execution, if required.
○ The Execute Package Task can run child packages that are
contained in the same project as the parent package, that are
in different projects, or even packages deployed to the SQL
Server.
○ Sometimes I had a problem to access an encrypted package since
it was password protected. And sometimes, I needed to pass a
variable from a parent package to a child package. In order to
do that, I created a variable on the child package to store the
parent package variable. Then, I configured the child package
using the Parent Package Variable configuration.

94
19. What is the difference between OnError event handler and
OnTaskFailed event handler?
○ ‘OnError’ gets triggered every time there is an error in your
package. For example, it will be triggered 5 times if there are 5
errors in the package.
○ ‘OnTaskFailed’ gets triggered when a task fails, no matter how
many errors occurred in that task. For example, it will be
triggered once even though there are 5 errors in the task.
20. What is SSIS Package Logging?
○ When your package is deployed, you might have some issues. If you
do not have any logging enabled, you won’t be able to find where
the errors are coming from.
○ So Package Logging allows you to capture log-enabled events at
run time on the package level.
○ Your logging output will have a collection of information about
the package that is collected when the package runs. For example,
a logging output can provide the start and finish times for a
package run.
○ There are two types of logging:
■ 1. SSIS logging (GUI)
● Package -> SSIS on the menu -> Logging...
● You can specify what kind of event you want to log.

■ 2. Custom logging (you have to write your own code)

95
● As the name specifies, it is a way to customize your
logging using the Event Handlers in SSIS. So whenever
a particular event occurs, your logging information
about the package will be recorded at run time.
● Example:
● The way I did it in a past project was to first build
an SSIS log table in SQL Server.
CREATE TABLE SSISLog (
EventID INT IDENTITY (1,1) NOT NULL,
EventType VARCHAR(20) NOT NULL, -- e.g. OnPostExecute, OnError
PackageName VARCHAR(50) NOT NULL,
TaskName VARCHAR(50) NOT NULL,
EventCode INT NULL,
EventDescription VARCHAR(1000),
PackageDuration INT,
ContainerDuration INT,
InsertCount INT,
UpdateCount INT,
DeleteCount INT,
Host VARCHAR(50),
CONSTRAINT PK_SSISLog_EventID
PRIMARY KEY CLUSTERED (EventID DESC))

● Then I would configure the Event Handler on SSIS,


particularly for the following events: OnError,
OnPostExecute, OnTaskFailed, OnWarning.
● So in each event handler, I created an ‘Execute SQL Task’
and sometimes a ‘Script Task’ to insert the log
information, as in the sketch below.
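● A minimal sketch of the kind of parameterized INSERT such an
Execute SQL Task might run; the exact variable-to-parameter
mapping below is an assumption, not taken from a real package:

INSERT INTO dbo.SSISLog (EventType, PackageName, TaskName, EventDescription, Host)
VALUES (?, ?, ?, ?, ?);
-- Parameter 0: a literal event name, e.g. 'OnError'
-- Parameter 1: System::PackageName
-- Parameter 2: System::SourceName (the task that raised the event)
-- Parameter 3: System::ErrorDescription (for OnError events)
-- Parameter 4: System::MachineName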
○ There are five provider (logging output) types:
■ 1. Text File
■ 2. XML
■ 3. SQL server (secure)
■ 4. SQL server profiler
■ 5. Windows event log
21. What is SSIS Package Configuration?
○ SSIS Package configuration allows you to configure the parameters
of your package that change from machine to machine.
○ You cannot just copy and paste the package file and send it from
a developing server to a testing server because of...
■ different connection managers
■ different files, dbs,..
■ different machine names, computer names, passwords,
operating systems...

96
○ So you need to configure your package with a configuration file
before sending it.
○ SSIS package configuration increases the portability of your
package.
○ So this will help you to distribute your package into different
servers in a more stable way.
○ Package -> SSIS option -> Package Configurations
○ There are five different configuration types:
■ 1. XML (best since XML files are machine-compatible)
■ 2. Environment variable (variable at the Windows level)
■ 3. Registry entry (Windows registry level, very secure)
■ 4. Parent package variable (a variable passing
configuration info, such as a ConnectionString, from a parent
package to a child package)
■ 5. SQL Server
○ The order of the configuration is from top to bottom. So the
configuration at the very bottom will override the upper ones.
22. What are the different types of configuration methods?
○ Direct Configuration and Indirect Configuration
○ Direct Configuration: when you deploy your package to an external
environment, you include the configuration file in the deployment
package.
■ Pros
● You don’t need environment variables creation or
maintenance.
● Changes can be made to the configuration files
(.dtsconfig) when deployment is made using SSIS
deployment utility.
● Scales well when multiple databases are used on the
same server.
■ Cons
● Need to specify configuration file that we want to
use when the package is triggered with DTExec.
● If multiple layers of packages are used (parent/child
packages), you need to transfer configured values
from the parent to the child package using parent
package variables.
○ Indirect Configuration: when you deploy your package to an
external environment, you do not include the configuration file.
Instead, you make your configuration file available somewhere the
users can fetch it from such as using the environment variable.
■ Pros

97
● All packages can reference the configuration file(s)
via environment variable.
● Packages can be deployed simply using copy/paste. No
need to mess with the SSIS Deployment Utility.
● Packages are not dependent on the configuration
switches when triggered with DTExec.
■ Cons
● Need creation and maintenance of environment
variables.
● Not easy to support multiple databases to be used on
the same server.
23. What is SSIS Package Security?
○ Security on SSIS packages that can be implemented on the package
level and on the properties of a package.
○ So there are five different types of ProtectionLevel:
■ 1. EncryptSensitiveWithUserKey (default)
● Encrypts only the sensitive information such as
passwords, connection strings, server names, etc.
● The key is derived from the current user's Windows
profile.
● The lowest level of protection.
● Only the same user using the same profile can load
the package.
■ 2. EncryptSensitiveWithPassword
● The same as the first one except that the key is a
password instead of the user key.
■ 3. EncryptAllWithPassword
● Using a password to encrypt the whole package.
● The users won’t be able to see the XML code of the
package.
■ 4. EncryptAllWithUserKey
● The same as the third one except that this one uses the
user key to encrypt the whole package.
■ 5. ServerStorage
● The highest level of protection.
● Rely on server storage for encryption.
● This protection saves everything in MSDB on the
Integration Services Server.
24. Explain about deploying a SSIS package.
○ After Developing the SSIS Package in the Business Intelligence
Development Studio (BIDS), we need to create Deployment Utility
to deploy the package in Server (Deployment Utility makes the
SSIS package portable).
○ Before deploying, set up the appropriate security of the package.

98
○ You then have to change deployment properties of the package:
■ Right click on the most top hierarchy on Solution Explorer
-> Properties
■ Set AllowConfigurationChanges (which allows package
configurations to be updated when packages are deployed) to True.
■ Set CreateDeploymentUtility to True.
■ We can keep DeploymentOutputPath as it is or we can update
it to another location.
○ Build the package; a *.ssisdeploymentmanifest file will then be
created in the location you specified.
○ If you double-click on the *.ssisdeploymentmanifest file, the
Package Installation Wizard will open up.
○ There are two types of deployment methods :
■ 1. Push Deployment (By selecting SQL Server Deployment):
This installs the SSIS packages in SQL Server. In other
words, this sends the packages to the server.
■ 2. Pull Deployment (By selecting File System Deployment):
This installs the SSIS packages to the specified folder in
the file system. In other words, this creates a file folder
where people can access to get the packages.
25. What are the different ways of executing packages?
○ 1. Execute on the SSIS level using the debug mode.
○ 2. Using SQL Server Agent
■ Automated process and used mostly.
○ 3. dtexecui
■ From Command Prompt.
■ Opens up the Execute Package Utility.
○ 4. dtexec /f <fully qualified path of your deploying SSIS
package>
■ Can also be executed from a SQL Server Agent job step
(for example, a command-line or PowerShell step).
■ Will not open UI.
■ Also from Command Prompt.
○ 5. xp_cmdshell
■ An extended stored procedure.
■ You can do on SQL Server:
● xp_cmdshell ‘dtexecui’
● xp_cmdshell ‘dtexec /f ...’
○ 6. Import/Export Wizard
■ Not commonly used.
■ Importing is for push deployment and exporting is for pull
deployment.

99
26. What are some ways to optimize SSIS packages on the Control Flow
level?
○ 1. Use parallel execution by changing precedence constraints. If
there are two executables that can be executed independently,
don't connect them with a precedence constraint. In order to
let more executables run in parallel, increase the value of
the package property named ‘MaxConcurrentExecutables’.
■ The default value of MaxConcurrentExecutables is -1, which
means the number of logical processors plus 2.
○ 2. Use the Event Handler instead of failure precedence
constraints (the red arrows). This is because all the executables
in the package, even the ones the red arrows point to, will be
validated and compiled, whereas the executables in an Event
Handler won't be unless the event is triggered. So prefer event
handlers where possible.
○ 3. Avoid ‘Bulk Insert Task’. Use ‘Data Flow Task’ instead.
○ 4. Use master-child packages using ‘Execute Package Tasks’. You
can also change the ExecuteOutOfProcess property to true so
that the child package runs in its own separate process.
○ 5. Use ‘Execute SQL Task’ instead of doing everything on the SSIS
level.
○ 6. If you have the logging configured, log only things that are
necessary instead of everything.
○ 7. Avoid unnecessary redundant event handlers.
○ 8. Set the Propagate variable to false so that events do not get
escalated to the executable's parent.
○ 9. Lastly, optimize each component of the Control Flow.
27. What are some ways to optimize SSIS packages on the Data Flow
level?
○ 1. Optimize the sources. Use SQL command for the OLE DB Source
and use JOINs and ORDER BY here instead of doing them using the
SSIS transformations.
○ 2. Avoid the blocking transformations.
○ 3. Avoid the Merge Join and Merge Transformations because the
inputs of them must be sorted.
○ 4. When you do the Lookup Transformation, use the Full Cache.
○ 5. When you use the Fuzzy Lookup Transformation, use the ETI
(Error Tolerance Index).
○ 6. You can increase the buffer size of the data pipelines
(Default = 10MB).
○ 7. You can also increase the size of the DefaultBufferMaxRows.
○ 8. Use SQLServerDestination since it’s specifically for the local
server.

100
○ 9. If you have a flat file source that loads huge flat file,
use ‘Fast Parse’ in the Advanced Options in the columns where you
are sure of the data integrity.
○ 10. Increase the value of EngineThreads, which is the same as
MaxConcurrentExecutables in the Control Flow level.
○ 11. Lastly, choose right method of execution and optimize
individual components.

Model 5 - SSAS
1. Okay. So what is SSAS?
○ SQL Server Analysis Services
○ SSAS delivers online analytical processing (OLAP) and data mining
functionality for business intelligence applications.
○ Analysis Services supports OLAP by letting you design, create,
and manage multidimensional structures that contain data
aggregated from other data sources, such as relational databases.
2. What are the objects in SSAS?
○ Data Source
■ See the Question #9 in Model 5 - SSAS
○ Data Source View
■ See the Question #11 in Model 5 - SSAS
○ Cube
■ See the Question #4 in Model 5 - SSAS
○ Dimension
■ See the Question #3 in Model 5 - SSAS
○ Mining Structure
■ Eh...not really important for now.
○ Roles
■ Used to manage security: a role defines which Windows
users or groups have which permissions on the SSAS
database, cubes, and dimensions.
○ Assemblies
■ .NET or COM assemblies you can register on the server to
add custom functions (for example, MDX stored procedures).
3. What is a Dimension?
○ Dimensions are a fundamental component of cubes.
○ Dimensions organize data with relation to an area of interest,
such as customers, stores, or employees, to users.

101
○ Dimensions in Analysis Services contain attributes that
correspond to columns in dimension tables. These attributes
appear as attribute hierarchies and can be organized into
user-defined hierarchies, or can be defined as parent-child
hierarchies based on columns in the underlying dimension table.
4. What is a Cube?
○ A cube is a set of related measures and dimensions that is used
to analyze data.
○ The measures and dimensions in a cube are derived from the tables
and views in the data source view on which the cube is based, or
which is generated from the measure and dimension definitions.
○ A cube is an object where possible aggregated information is
already calculated and stored in a multidimensional format so the
users can pull the data faster.
○ Using cubes, you can combine descriptive dimension data with
numeric measures, and you can create aggregated measures by
summing numeric values along the hierarchies contained in the
dimensions.
○ file extension = *.xmla

5. What is the scope of cubes?


○ Data Source View (DSV) bound.
6. What are the two ways to create a cube?
○ Top-down
○ Bottom-up (recommended)
7. What is a Measure Group?

102
○ A group of measures.
○ The same as a fact table in data warehousing; just a different
terminology in SSAS.
8. What are the types of Measures?
○ 1. Additive: measures that could be added over any dimensions
(ex. Sales Amount, Qty Sold) -- SUM
○ 2. Semi-Additive: measures that could be added by some
dimensions, but not all the dimensions (ex. Stock inventory,
Balances) -- FIRST NON EMPTY CHILD, LAST NON EMPTY CHILD
○ 3. Non-Additive: measures that cannot be added to any dimension
(ex. rates, percentages) -- AVERAGE
9. What is a Data Source?
○ It is a source of your data that contains the information that
Analysis Services uses to connect to the source database.
○ The source data must be contained in a relational database such
as SQL Server 2008, Oracle, IBM DB2…
10. What is Impersonation Information and what are the different
options for it?
○ Impersonation Information is Windows credentials that SSAS will
use to connect to the data source.
○ There are four options for Impersonation Information:
■ 1. The Use A Specific Windows User Name And Password option
lets you enter the username and password of a Windows user
account.
■ 2. The Use The Service Account option will have Analysis
Services use its service logon user ID to connect to the
data source.
■ 3. The Use The Credentials Of The Current User option is
only used for some very specialized circumstances. It is
important to note that when you use this option, Analysis
Services will not use the Windows user name and password
of the current user for most processing and query tasks.
■ 4. The Inherit option causes this data source to use
the impersonation information contained in the Analysis
Services ‘DataSourceImpersonationInfo’ database property.
11. What is a Data Source View?
○ A data source view is a logical data model that exists between
your physical source database and Analysis Services dimensions
and cubes.
○ A data source view retrieves metadata for your data source
objects you use in the project.
○ The same concept as a VIEW in TSQL.

103
○ Using a DSV, you can select what kind of information you want to
retrieve from the data source.
12. What is a Named Calculation?
○ It is a computed column in a data source view.
○ You use a named calculation when you need to apply
transformations to your source data using an expression and add
the named calculation as a new attribute. For example, you may
want to concatenate customers' first and last names.
○ Right Click on the head of a table -> New Named Calculation
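○ For instance, a minimal sketch of the expression for such a
named calculation (column names are assumed from a typical
DimCustomer table):

FirstName + ' ' + LastName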
13. What is a Named Query?
○ Sometimes you may need to apply transformations that are more
complicated than just applying an expression. You may want to
filter, group, or join data from multiple tables. For example,
you want to get geographical information about the customers in
Dim Customer. Then you can get Dim Geography using a Named Query
and add the appropriate columns to Dim Customer. You can solve
this problem using a named query.
○ Right-click on the header of a table -> Replace Table -> With New
Named Query
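○ A rough sketch of such a named query (table and column names
follow the AdventureWorksDW-style tables mentioned above, so
treat them as assumptions):

SELECT c.CustomerKey,
       c.FirstName,
       c.LastName,
       g.City,
       g.EnglishCountryRegionName
FROM dbo.DimCustomer AS c
JOIN dbo.DimGeography AS g
    ON g.GeographyKey = c.GeographyKey;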
14. What is a Hierarchy in SSAS and what are the different types of
it?
○ Hierarchies in SSAS are logical entities that an end user can use
to analyze fact data.
○ The SSAS engine doesn't know the natural hierarchy of attributes
(for example, that days roll up to weeks and weeks to months), so
you have to specify it in SSAS.
○ A hierarchy is also a way of arranging attributes into levels to
view aggregated fact data at multiple levels.
○ A hierarchy improves the performance of browsing and querying.
○ For example, the DimSalesTerritory table includes the
SalesTerritoryRegion, SalesTerritoryCountry, and
SalesTerritoryGroup columns. If you were to define a hierarchy
in the Territory database dimension based on this natural
relationship, you could then drill down into the data first by
sales group, then by country, and finally by region. And each
time you drilled deeper, you would view aggregated data at a more
granular level.
○ Generally, there are two types of hierarchies:
■ 1. Attribute hierarchy
● There are only two levels for an attribute hierarchy.
○ All level
○ Member level: contains actual column values.

104
● When you create an attribute, an attribute hierarchy
is automatically created.
■ 2. User-Defined hierarchy
● There can be multiple levels.
● It is a hierarchy pre-defined by users to help them
browse or query a dimension.
○ Based on the relationships between levels...
■ 1. Natural hierarchy
● One-to-many relationship between levels.
● Attribute hierarchies are a natural hierarchy but
user-defined hierarchy is not usually a natural
hierarchy.
● Time dimension would be an example of a natural
hierarchy.
■ 2. Unnatural hierarchy
● Not necessarily a one-to-many relationship. The
relationships of an unnatural hierarchy can be one-
to-one or many-to-many.
● User-defined hierarchies are usually an unnatural
hierarchy.
● Unnatural hierarchies are usually for reporting
purposes.
○ Based on the structure of the hierarchy...
■ 1. Balanced
● The same depth for every leaf node to the root.
■ 2. Unbalanced
● The different depths for leaf nodes to the root.
■ 3. Ragged
● Where a certain node's parent is missing or is not an
immediate level.
15. Explain about the Cube Wizard.
○ First of all, you need to select a data source view that your
cube will be based on. All of the fact and dimension tables for
the cube must be in that data source view since the scope of a
cube is its DSV.
○ You then identify the fact table columns that will be used to
create measures in the cube.
○ A cube requires at least one measure group that must contain
at least one measure, but you most likely will choose to have
several measure groups in your cube.
○ After you have selected the measures that will be in your cube,
you then select the dimensions. The dimensions you include in the
cube must be based on the dimensions tables that are related to

105
the fact tables in the cube.
○ Finally, you give the cube a name and the wizard will create the
cube.
16. What happens in the background when you deploy a cube?
○ When you deploy your cube, an XML file will be created and copied
to the Analysis Services server. The cube structure will then be
created, and the source data is processed into the cube.
17. Explain about the Browse tab on SSAS.
○ The Browse tab is the main reporting tool in SSAS. It is very
convenient to use while you develop your cube, but it has not
been designed for end users and cannot be deployed as a
stand-alone application, so you should use Excel or Reporting
Services for end-user reporting.
○ The Browser Tab has three panes: Metadata Pane, Sub-cube Pane,
and Report Pane.
○ The Sub-Cube Pane allows you to create more complex filters using
operators such as Equal, Not Equal, In, Not In, and so on.
○ You can create a simple filter by dropping a hierarchy on the
filter area of the Report Pane.

18. What are the different types of hierarchy relationship?


○ Flexible
○ Rigid
19. What is the maximum number of dimensions a single cube can have?
○ 28 dimensions in 2008 R2.
20. What is the difference between a table and a matrix?
○ Table: has values in columns.
○ Matrix: has values in the combination of rows and columns.
21. A little basic syntax of MDX...

106
SELECT
<axis/dimension specification> ON COLUMNS,
<axis/dimension specification> ON ROWS
FROM [Cube_Name]
WHERE <slicer specification>
-- if you don't specify a measure in the slicer,
-- the cube's default measure is used.
22. What is an Axis in MDX?
○ An axis is a numbered dimension of the query result set.
○ Columns = 0
○ Rows = 1
○ Pages = 2
○ Sections = 3
○ Chapters = 4
○ ...
○ There are a total of 128 axes in MDX, numbered from 0 to 127.
23. What is the difference between DESC and BDESC?
○ Both are used with the ORDER() function in MDX. DESC sorts
members in descending order while preserving the hierarchy
(members are sorted within their parent), whereas BDESC breaks
the hierarchy and sorts all members together regardless of their
parents.
24. What is a Tuple?
○ A combination of members, one from each of one or more different
hierarchies; it identifies a slice (or a single cell) of the cube.
○ Ex:
([Dim Product].[Product Category].Member,
[Dim Currency].[Currency].Members) ON ROWS
25. What is a Set?
○ It is a combination of tuples that share the same dimensionality
and hierarchy.
○ Ex:
SELECT
{(tuple1),(tuple2),(tuple3)} ON COLUMNS,
{(tuple1),(tuple2),(tuple3)} ON ROWS
FROM Cube
WHERE {(tuple1),(tuple2),(tuple3)}
26. What is a Calculated Member (MDX)?
○ It is the same concept as a derived column in TSQL.
○ It is a calculation performed on two or more members and gives a
single member as a result.
○ It is also called a calculated measure because it is mostly
defined on the members of a measure group.
○ Ex:
WITH MEMBER [Measures].[<Member_Name>]
AS <expression, e.g. [Measures].[A] - [Measures].[B]>
SELECT
... ON COLUMNS,
... ON ROWS
FROM [Cube_Name]
WHERE ...

107
27. What is a Named Set?
○ A named set is a set expression that is defined once (with WITH
SET) and returns one or more members as a result, so it can be
reused in the query.
28. What is Partition in SSAS?
○ Partition in SSAS is dividing a measure group into different
parts and each part will store a part of data for a measure
group.
○ Each partition can pull its data from the same table or from
separate table as long as the structure matches.
○ Each partition can have its own aggregation and storage scheme.
○ Most commonly, you will do a partition on time basis.
○ Partitions can speed processing because the engine may have to
process only a small subset of data.
○ By default, the SSAS will make a partition for each group defined
within the cube structure (table binding).
○ You can store the partition of a single measure group at
different storage locations.
○ By default, every partition is table-bound.
○ When you create partitions, you typically change the binding
from table binding to query binding.
○ Ex:
SELECT * FROM [dbo].[FactInternetSales]
WHERE OrderDateKey <= 20051231
-- 2006
SELECT * FROM [dbo].[FactInternetSales]
WHERE OrderDateKey >= 20060101 AND OrderDateKey <= 20061231
-- 2007
SELECT * FROM [dbo].[FactInternetSales]
WHERE OrderDateKey >= 20070101 AND OrderDateKey <= 20071231
-- 2008
SELECT * FROM [dbo].[FactInternetSales]
WHERE OrderDateKey >= 20080101 AND OrderDateKey <= 20081231
-- 2009
SELECT * FROM [dbo].[FactInternetSales]
WHERE OrderDateKey >= 20090101
29. What is Merging Partitions?
○ Merging combines multiple partitions into one.
○ There are certain criteria that must be met:
■ They should be in the same cube.

108
■ They should have the same structure.
■ They have the same storage modes.
■ They contain identical aggregation designs.
30. What is Aggregation in SSAS?
○ Aggregations are higher-level summaries of the data.
○ Aggregations are precalculated summaries of data for different
dimension combinations. Specifically, an aggregation contains
the summarized values of all measures in a measure group by a
combination of different dimensions.
○ Aggregations are most useful for speeding up queries by returning
pre-calculated values instead of figuring them out at runtime.
○ You cannot control exactly which aggregations are created; you
can only control the percentage of possible combinations the
wizard should consider.
○ You can assign only one aggregation design to any measure group
or partition.
31. What are different AggregationUsage settings of aggregations?
<AggregationUsage Settings Table>
○ Full: every aggregation in this cube must include this attribute
or a related attribute at a lower level of the attribute chain.
For example, if you have Month->Quarter->Year attribute
relationships and you set the AggregationUsage property of Year
to Full, the server might favor Quarter instead of Year because
the Year totals can be derived from Quarter totals.
○ None: no aggregations will include this attribute. Consider using
this option for infrequently used attributes.
○ Unrestricted: leaves it to the Aggregation Design Wizard to
consider the attribute when designing aggregations.
○ Default: escalates the All attribute, the dimension key
attribute, and the attributes participating in the user
hierarchies to Unrestricted. The aggregation usage of attributes
participating in Many-To-Many, Referenced, and Data Mining
dimensions will be set to None.

32. What is One Third Rule?


○ The aggregation wizard will only create aggregations if the size
of the aggregation is 33% or less of the size of the fact table.
○ It is recommended not to go beyond 20%. Create aggregations only
up to 20% and do the rest later using Usage-Based Optimization.
33. What is Usage Based Optimization?

109
○ An aggregation contains the summarized values of all members
in a measure group by a combination of different attributes.
At design time, you can use the Aggregation Design Wizard to
define aggregations based on your dimensional design and data
statistics.
○ After the cube is in production and representative query
statistics are available, you should consider running the Usage-
Based Optimization Wizard to fine-tune the aggregation design
based on the actual queries submitted to the server.
○ Properties to check on the Analysis Server:
■ CreateQueryLogTable: set it to TRUE to allow the AS to
create a table to log the queries.
■ QueryLogSampling: the frequency for query sampling. By
default the value is 10, which means every tenth query is
logged.
■ QueryLogConnectionString: Connection string specifying the
server and database to be used to log the queries.
■ QueryLogTableName: Name of the table in which you log the
queries that are run against the cube. The query log table
will have following columns:
● MSOLAP_Database
● MSOLAP_ObjectPath
● MSOLAP_User
● Dataset
● StartTime
● Duration
○ The server does not log the actual MDX queries. Instead, it
logs certain query information, which includes the dimensions
requested by the query and the attribute that the server used to
satisfy the query. The server logs these statistics in the query
log table.
○ The most important column for the Usage Based Optimization Wizard
is the Dataset Column, which captures the attributes used to
resolve the query.
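○ For example, once query logging is enabled you can inspect the
log with plain T-SQL. Below is a minimal sketch; it assumes the
default log table name (OlapQueryLog) and the columns listed
above.

-- Hedged sketch: inspect the SSAS query log that feeds the
-- Usage-Based Optimization Wizard (assumes the default table name).
SELECT MSOLAP_ObjectPath,   -- cube / measure group the query hit
       MSOLAP_User,         -- Windows user who issued the query
       Dataset,             -- attribute vector used to resolve the query
       StartTime,
       Duration
FROM   dbo.OlapQueryLog
ORDER BY Duration DESC;     -- longest-running queries first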
34. What are the Storage Modes in SSAS?
○ The cube metadata is always stored on the SSAS server, but as an
administrator, you can control the storage locations of the cube
data and any aggregation.
○ Different types of Storage/Caching Mode:
■ MOLAP: metadata, data, and aggregations are all stored in
SSAS. Data is stored in a compressed format on the server;
the storage size is roughly 20% ~ 25% of the relational data.
The administrator can enable proactive caching on a MOLAP
partition to implement real-time data refreshing. It is
like a snapshot in time.
■ HOLAP: stores metadata and aggregations. The actual data
is stored in the relational database. HOLAP is the most
efficient mode in terms of disk space because detail-level
data is not duplicated, as it is with MOLAP, and HOLAP
requires less space to store aggregations than ROLAP does.
■ ROLAP: stores only metadata. Both the cube data and the
cube aggregations remain in the relational database. So
the SSAS Server must create additional relational tables to
hold the cube aggregations. Actually it stores aggregations
in indexed views in the relational database.
35. What are the differences between MOLAP and ROLAP?
<MOLAP and ROLAP Difference Table>
○ Stands For: MOLAP = Multidimensional OLAP; ROLAP = Relational
OLAP.
○ Usage: MOLAP is for a partition that is accessed frequently;
ROLAP is for a partition that is rarely accessed.
○ Data: MOLAP stores metadata, aggregations, and the actual data;
ROLAP stores only metadata.
○ Latency: MOLAP has high latency (not up to date); ROLAP has low
latency (always up to date).
○ Performance: MOLAP gives high performance since it contains the
actual data; ROLAP gives low performance since it does not.

36. What are the different Partition Storage settings?
<SQL Server Standard Partition Storage Settings Table>
○ Real-time ROLAP: As with standard ROLAP, partition data and
aggregations are stored in the relational database. The server
maintains an internal cache to improve query performance. When a
change notification is received, the server drops the ROLAP
cache to ensure that data is not out of sync with the data in
the relational database.
○ Real-time HOLAP: As with standard HOLAP, partition data is
stored in the relational database, and aggregations are stored
in the cube. Aggregations are rebuilt as soon as a data change
notification is received.
○ Low-latency MOLAP: The MOLAP cache expires in 30 minutes.
○ Medium-latency MOLAP: Similar to Automatic MOLAP except that the
MOLAP cache expires in 4 hours.
○ Automatic MOLAP: The default silence interval is set to 10
seconds. As a result, the server will not react if the data
change batches are fewer than 10 seconds apart. If there is not
a period of silence, the server will start processing the cache
in 10 minutes.
○ Scheduled MOLAP: Same as MOLAP except that the server will
process the partition on a daily schedule.
○ MOLAP: The partition storage mode is standard MOLAP. Proactive
caching is disabled. You need to process the partition to
refresh data.

37. What is Proactive Caching?


○ With MOLAP, the server brings the cube data from the data source
into the cube when the cube is processed. The data is duplicated
because it exists in both the data source and the cube.
○ MOLAP data latency is high because new data is available only
when the partition is processed.
○ However, the administrator can enable proactive caching on a
MOLAP partition to implement real-time data refreshing.
○ Proactive caching is especially useful when the relational
database is transaction oriented and data changes at random.
○ When data changes are predictable, such as when you use an
extract, transformation, and load (ETL) process to load data,
consider processing the cube explicitly.
○ When data source is transaction oriented and you want to minimize
latency, consider configuring the cube to process automatically
by using proactive caching.
38. What are the two properties of Proactive Caching?
○ 1. Silence Interval: how long the server waits for a period of
quiet (no further changes) after a data change before it starts
rebuilding the MOLAP cache.
○ 2. Silence Override Interval: the maximum amount of time the
server will wait for such a period of silence before it starts
rebuilding the cache anyway.
NOTE: These settings do not process the cube themselves; they
merely control when the data inside the cube is refreshed so
that it stays available for reporting services.
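A minimal sketch of how these two properties appear in a
partition's ASSL/XMLA definition; the duration values below are
arbitrary examples, not recommendations.

<!-- Hedged sketch of a proactive-caching fragment; durations use the
     XSD format (PT10S = 10 seconds, PT10M = 10 minutes). -->
<ProactiveCaching>
  <SilenceInterval>PT10S</SilenceInterval>
  <SilenceOverrideInterval>PT10M</SilenceOverrideInterval>
  <Latency>PT30M</Latency>
</ProactiveCaching>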
39. What are the disadvantages of MOLAP?
○ Since MOLAP uses data stored inside of a cube, it has to first
grab the data from the data warehouse. The disadvantage of this
is that the data stored in the cube is not always the most
recent; the cube has to be processed again to reflect the most
current data in the data warehouse, so it is subject to high
latency depending on the caching mode.
○ It is possible to use SSAS to process this partition of the cube
through the storage option properties (Drop Outdated Cache, or
Update the Cache Periodically); however, this is not recommended
because it will constantly reprocess the cube, making it
unavailable.
○ This is a problem for current reports, but using Proactive
Caching can be the solution to a user’s reporting needs.
40. What are the disadvantages of Proactive Caching?
○ The disadvantage of Proactive Caching is that the cube will be
processed frequently because it is being updated frequently.
While the cube is being processed, it cannot be accessed by
Reporting Services or any other services.
○ Depending on the size of the cube, it could take many hours to
process.
○ It is usually better to process a cube as part of an SSIS ETL
strategy than to rely on SSAS to update or completely reprocess
the cube.
41. What kind of storage/caching mode does a dimension table use?
○ MOLAP or ROLAP
○ Dimension tables do not have aggregations so HOLAP is not
possible.
42. What are the different Processing Options?
<Processing Options for OLAP Objects>
○ Process Default: Performs the minimum number of tasks required
to fully initialize the object. The server converts this option
to one of the other options based on the object state. (Applies
to: all objects)
○ Process Full: Drops the object stores and rebuilds the object.
Metadata changes, such as adding a new attribute to a dimension,
require Process Full. (Applies to: all objects)
○ Process Update: Applies member inserts, deletes, and updates
without invalidating the affected cubes. (Applies to: dimension)
○ Process Add: Adds only new data. (Applies to: dimension,
partition)
○ Process Data: Loads the object with data without building
indexes and aggregations. (Applies to: dimension, cube, measure
group, partition)
○ Process Index: Retains data and builds only indexes and
aggregations. (Applies to: dimension, cube, measure group,
partition)
○ Unprocess: Deletes the object data or the data in the containing
objects. (Applies to: all objects)
○ Process Structure: Deletes the partition data and applies
Process Default to the cube dimensions. (Applies to: cube)
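○ These options can be applied from SSMS or by sending an XMLA
Process command to the server. A minimal hedged sketch (the
database and cube IDs are placeholders for your own objects):

<!-- Hedged sketch: full process of a cube via XMLA; IDs are placeholders. -->
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Process>
    <Object>
      <DatabaseID>Adventure Works DW</DatabaseID>
      <CubeID>Adventure Works</CubeID>
    </Object>
    <Type>ProcessFull</Type>
  </Process>
</Batch>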

43. What is an Action in SSAS?


○ Actions help us to extend the scope of the cube. If we have to
provide additional information from the cube or outside of the
cube, we can make use of Actions in SSAS.
○ 1. Standard/URL Actions
○ 2. Report Actions
○ 3. Drill Through Actions
○ URL and Report Actions extend the scope of your cube to external
environments, and Drill Through Actions give your cube detailed
information from within the cube.
○ The Drill Through Action is the only type of action that provides
information from within the cube.
○ The rest of the actions redirect the users to an external
environment.
44. What is a KPI?
○ KPIs are quantifiable measures that represent critical success
factors, and analysts use them to measure company performance,
over time, against a predefined goal for helping you with
comparative analysis.
○ For example: Sales Profit, Revenue Growth, and Growth In Customer
Base are good KPI candidates.
○ KPIs are typically used as part of a strategic performance
measurement framework, commonly known as a business scorecard.
○ There are KPI templates already available also.
○ There are four expression configurations to consider while
creating a KPI (a minimal MDX sketch follows this list):
■ Value Expression: an MDX expression that returns the current
value of the business metric (for example, Sales Amount).
■ Goal Expression: an MDX expression that returns the target
for the metric. For example, by next year I need to have a
40% increase in my Sales Amount compared to this year.
■ Status Expression: shows the progress toward the goal by
comparing the current value with the goal.
● -1 0 1 (visual icons)
■ Trend Expression: indicates the present trend of the metric
over time (the trend expression compares the current value
with the value of the metric at a previous point in time).
● -1 0 1 (visual icons)
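A minimal MDX sketch of the four expressions. It assumes the
AdventureWorks [Measures].[Sales Amount] measure, a [Date].[Calendar]
hierarchy with a Calendar Year level, and a KPI named "Sales KPI";
the 40% goal and the 0.8 threshold are arbitrary examples.

-- Value Expression
[Measures].[Sales Amount]

-- Goal Expression: 40% more than the same period last year
1.4 * ( [Measures].[Sales Amount],
        ParallelPeriod([Date].[Calendar].[Calendar Year], 1) )

-- Status Expression: returns -1 / 0 / 1 for the visual indicator
CASE
  WHEN KpiValue("Sales KPI") >= KpiGoal("Sales KPI")       THEN 1
  WHEN KpiValue("Sales KPI") >= 0.8 * KpiGoal("Sales KPI") THEN 0
  ELSE -1
END

-- Trend Expression: compare with the value one year earlier
CASE
  WHEN [Measures].[Sales Amount] >
       ( [Measures].[Sales Amount],
         ParallelPeriod([Date].[Calendar].[Calendar Year], 1) ) THEN 1
  ELSE -1
END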
45. What is Perspective in SSAS?
○ Perspectives are used to provide access to a subset of the
Cube which contains the data that a particular group/users are
interested in.
○ In other words, a Perspective defines a subset of an existing
cube to limit the scope of the cube for users and provide only
the subset of information that those users are interested in.
You have to define which objects fall under a certain
perspective.
○ The perspectives will help the users to just use the information
that they are interested in or that is related to their business
analysis.
○ You need to have a clear understanding of the data structure
and business process before you implement the perspectives on
the cubes so that you would identify proper/appropriate objects
related to that perspective.
46. What does an SSAS Administrator have to do and why are they
important?
○ An SSAS administrator performs the day-to-day tasks that protect
cube data.
○ The SSAS user security architecture is layered on top of Windows
security.
47. How are users authenticated on an SSAS server?
○ Users are authenticated by their Windows account and Authorized
by their assigned Roles.
48. What are the two main roles in SSAS?
○ Administrator Role
■ A user who is a member of this role has unrestricted access
to the entire server.
■ For example, members of the Administrators role can create
SSAS databases and change server properties.
■ You cannot delete the Administrators role.
○ Database Roles
■ By default, users who are not members of the Administrators
role are denied access to SSAS.
■ To grant users access to SSAS objects, you must create one
or more database roles and assign the users to a role that
gives them the required permissions.
NOTE:
○ To simplify security management, SSAS lets you group Windows
users and groups into roles.
○ The security policies you define in a role restrict the cube
space that the user is authorized to access.
○ Understanding Permissions: when you configure a role, you
specify a set of security policies that are saved as a
collection of permissions inside the object metadata.
○ Cell Security is the most detailed level of SSAS Security.
49. What are the ways to optimize your cube?
○ Partition large measure groups so that queries scan only the
relevant data.
○ Design aggregations (staying within the one-third / 20%
guideline) and refine them later with Usage-Based Optimization.
○ Choose an appropriate storage mode for each partition (for
example, MOLAP for frequently accessed partitions) and suitable
proactive-caching settings.
○ Define proper attribute relationships and user hierarchies so
that higher-level totals can be derived from lower-level
aggregations.

Model 6 - SSRS
1. What is SSRS?
○ SQL Server Reporting Services
○ SSRS is a component of MS SQL Server that adds a server-based
reporting solution to the Microsoft BI framework.
○ SSRS allows you to generate a report, which is a structured
arrangement of information that answers business questions by
presenting data in matrices, charts, tables, maps and such.
○ SSRS allows you not only to create but also to manage and share
reports in your organization.
○ From a technical point of view, a report is nothing but a
metadata layer in the form of an XML file.
○ ReportServer and ReportServerTempDB in the database engine
are the databases that SSRS depends on and in which SSRS stores
its information.
2. What is a Report?
○ A report is a structured arrangement of information to answer
business questions by presenting data in matrices, charts,
tables, maps and such. It is nothing but a metadata layer in an
XML format.
○ .rdl = report definition language
3. What is a Report Model?
○ A report model is a semantic description of business data. It is
also a metadata layer, like a report, that serves as the basis
for building ad hoc reports.
○ Non-technical users and users who do not have a good
understanding about the underlying data structure will use the
report model as the data source for the ad hoc reports instead of
directly accessing the database or the cube.
○ A Report Model is an intermediate layer between the business user
and the source data that generates queries based on the objects
that the users select.
○ With this layer in place, the user does not need to know the
query language to retrieve data successfully for a report (ad hoc
report).
○ .smdl = semantic model definition language
○ A report model has three objects:
■ 1. entities = tables in the DSV
■ 2. attributes = columns in the DSV
■ 3. roles = defines a relationship between two entities
4. What is the Reporting Lifecycle?
○ Authoring
■ Authoring is where you define what data is to be
presented, organize the data in a well-structured format,
and apply formatting to enhance the report’s appearance.
■ SSRS transforms the design of the report into a report
definition, which has the .rdl file extension and is nothing
but an XML file that contains the structure of the report,
its metadata and such.
○ Management
■ Management deals with other contents on the server and the
performance of other administrative tasks, such as setting
report properties, managing report execution, and applying
security.
■ A report author or an administrator sets up report
execution schedules in the Management stage, as well as
building shared schedules and shared data sources.
○ Accessing/Delivery
■ Delivery includes all the activities related to the
distribution of reports such as accessing reports online,
rendering reports to various formats, saving and printing
reports, and subscribing to reports.
■ There are two methods to access/deliver a report.
● 1. On-Demand Access: allows users to select a
particular report they need on a report viewing tool
such as Report Manager.
● 2. Subscription-Based Access: automatically generates
and delivers a rendered report, in a format such as
XML, HTML, PDF, or Excel, to an e-mail address or a
shared folder on Windows.

5. Explain the Reporting Services Configuration Manager.
○ It is a configuration window for SSRS.
○ You can see the following things:
■ Report Server Status: shows the current status of your
Report Server, such as the SQL Server instance, Report
Server database name, Report Server mode, and status. You
can start or stop the Report Server.
■ Service Account: where you specify a built-in account or
Windows domain user account to run RS.
■ Web Service URL: where you configure the URL of your Report
Server.
■ Database: where you specify in which database the RS will
store its content and application data.
■ Report Manager URL: where you configure the URL of your
Report Manager.
■ E-mail Settings: where you specify the SMTP server and
e-mail address to use the report server e-mail features.
■ Execution Account: where you specify an account to enable
the use of report data sources that do not require any
credential or to connect to remote servers to store
external images used in reports.
■ Encryption Keys: where you can backup, restore, change and
delete encryption keys for SSRS.
■ Scale-out Deployment: where you can view information about
scale-out deployment.
6. What are the four main components in SSRS?
○ 1. Report Designer
■ The report design environment (in BIDS) and its project
templates:
● 1) Report Model Project: in which you create models
that support Adhoc reports.
● 2) Report Server Project: in which you create managed
reports.
○ 2. Report Builder
■ The component/tool we use to design Adhoc reports from
report models.
■ Takes a model as its input and outputs Adhoc reports.
○ 3. Report Server
■ The web interface which acts as a gateway to the underlying
report server DB.
■ http://localhost/ReportServer
○ 4. Report Manager
■ The web user interface to the report server where you can
manage and administrate the report items.
■ http://localhost/Reports
7. What is a Data Region?
○ Data region is a report item that displays data from datasets in
a table, matrix, chart, list and such. You can also nest data
regions within other data regions.
8. Can you create a report formatted as a chart using the Report Wizard?
○ No. The Report Wizard automatically generates only a tabular
or matrix report for you. You must create a chart data region
directly on your report.
9. Can you use a stored procedure to provide data for your report?
○ Yes, you can use a stored procedure in your dataset. It can also
perform better than an ad hoc SQL query because the stored
procedure’s execution plan can be cached and reused, and the
query logic is maintained in one place.
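○ A minimal sketch of a parameterized stored procedure used as a
report dataset; the procedure name and the @OrderYear parameter
are hypothetical, and the dataset’s query type would be set to
Stored Procedure with the report parameter mapped to @OrderYear.

-- Hedged sketch: stored procedure used as an SSRS dataset.
CREATE PROCEDURE dbo.uspGetSalesByYear
    @OrderYear INT
AS
BEGIN
    SET NOCOUNT ON;
    -- OrderDateKey uses the YYYYMMDD format shown earlier in this document
    SELECT ProductKey,
           OrderDateKey,
           SalesAmount
    FROM   dbo.FactInternetSales
    WHERE  OrderDateKey BETWEEN (@OrderYear * 10000 + 101)
                            AND (@OrderYear * 10000 + 1231);
END;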
10. What are different types of reports?
○ Managed Reports
■ drill down report
■ drill through report
■ sub report
■ chart report
■ map report
■ parameterized report
■ cascaded parameterized report
■ multi-valued parameterized report
○ Ad hoc Report
○ Embedded Report
○ Linked Report
○ Snapshot Report
○ Cached Report
11. You want a report to display Sales by Category, SubCategory, and
Product. You want users to see only summarized information initially
but to be able to display the details as necessary. How would you
create the report?
○ You can use a drill down report to initially hide the subcategory
and product information.
○ First I would set up a proper data source to get the product,
sales, and order date information.
○ Then I would use a proper query to retrieve the information
needed.
○ Then I would create a matrix data region and add Category,
Subcategory, and Product as groups, keeping the correct order of
the groups. You can specify the parent or child group by clicking
on the column or row level, or you can define the groups in the
Row Groups and Column Groups panes.
○ Then you would hide the product and subcategory group and make
them visible by toggling their parent group.

12. What is the main reason for adding a parameter to a report?


○ I would add a parameter mainly to increase the user
interactivity of my report. By doing so, I can let users change
the report behavior based on the selection of the parameter.
○ For example, I have a parameter for the order year. Then you can
interactively select 2005, 2006, 2007 and such to see the certain
year’s sales information.
○ So this is where parameterized reports come in.
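○ A minimal sketch of a dataset query driven by a report
parameter; the @OrderYear parameter name and the YYYYMMDD
date-key format are assumptions carried over from the earlier
partition queries.

-- Hedged sketch: SSRS dataset query with a report parameter.
SELECT SalesOrderNumber, OrderDateKey, SalesAmount
FROM   dbo.FactInternetSales
WHERE  OrderDateKey BETWEEN (@OrderYear * 10000 + 101)
                        AND (@OrderYear * 10000 + 1231);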
13. What are the actions in SSRS?
○ Actions are used on a report to extend the scope of a report.
○ You can configure the type of actions on properties
named ‘Action’.
○ You can also specify which data will be a bookmark for an action.
○ There are...
■ URL
■ Bookmark
■ Report
14. What is Conditional Formatting in SSRS?
○ It is changing the format of report items based on the data in
the report.
○ At run time, SSRS evaluates the expression and substitutes the
result for the property value.
○ When the report is rendered, the run-time value is used.
○ Using Conditional Formatting, you can change the background color
of the cells of a matrix based on the field value or change the
color of the font in a field and such.
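○ For example (a hedged sketch; the SalesAmount field name and the
thresholds are placeholders), the BackgroundColor property of a
cell could be set to an expression such as:

=IIf(Fields!SalesAmount.Value < 10000, "Tomato",
     IIf(Fields!SalesAmount.Value < 50000, "Khaki", "LightGreen"))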
15. How would you add page numbers, execution time, userID and such
to your report?
○ I would first create a text box and place it at the bottom of
the report. I would create a page footer (or page header) if
necessary.
○ Using expression and built-in fields, I can add page number,
total page number, execution time and userID in a text box.
○ I would have to apply conversions if needed.
<Expressions how it’s displayed on the preview>
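Typical footer text box expressions using the built-in fields (the
concatenation and format strings are just examples):

="Page " & Globals!PageNumber & " of " & Globals!TotalPages
=Format(Globals!ExecutionTime, "g")
=User!UserID
=Globals!ReportName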

16. What is a Document Map and how do you create it?


○ A Document Map is a navigation pane in a report that lets users
jump from a value to the particular location on the report.
○ It is like an explorer where you can drill down and select a
particular record you want to navigate to.
○ You can create a document map by configuring the property of a
particular field of your data.
<You can see the document map on the left side of the Preview>
17. What is Interactive Sorting?
○ Interactive Sorting is for increasing the user interactivity of
your report by allowing them to sort by a column of their choice.
○ You don’t know what the users are going to sort by, so I add the
Interactive Sorting property to those columns that the users will
potentially sort by.
○ Let’s say there is a tabular report that has columns Student ID,
Last Name, GPA and such. And I don’t know what the users are
going to sort by so I would give the Interactive Sorting property
to the Student ID, Last Name and GPA. It all depends on the
business requirements I am given.
○ Once you enable interactive sorting on a particular field, the
users can sort it in either an ascending or descending order.
<Interactive Sorting on City. The cities of Germany are sorted in
ascending order. Notice the little arrow next to City>
18. How do you deploy a report?
○ When you deploy your report, the data source needs to be deployed
on the server as well as the report.
○ First, you need to define the connection of the project.
○ Properties of the project
■ Specify ...
● TargetDataSourceFolder
● TargetReportFolder
● TargetServerURL (http://jihoon-pc/reportserver)
■ By configuring these project properties, you can deploy
your report to different locations.
■ Then you can deploy the whole project or just deploy the
data source and report individually.
■ When you deploy, the SSRS engine will automatically ‘build’
the project.
19. What are the two deployment modes of a report server?
○ Native Mode
○ SharePoint Integrated Mode
20. Tell me about security in SSRS.
○ SSRS uses role-based security to allow individual users or groups
of users to perform specific tasks.
○ Roles are used to establish groups of tasks based on the
functional needs of users.
○ The following are the default user roles:
■ Browser: It is the most limited role. Browsers are able
to only navigate through the folder hierarchy and open
reports.
■ Report Builder: Report builder has the same permissions
as Browsers, except Report Builders can load report
definitions from the Report Server into a local instance of
Report Builder.
■ My Reports: It allows users to manage their own reports
separately from the main folder hierarchy.
■ Publisher: It allows users to add content to the Report
Server.
■ Content Manager: It is the broadest role, which allows
users to take ownership of items, including the ability to
manage security.
○ You can also apply system security using system roles, which
allow selected users or groups to perform system administration
tasks that are independent of content managed on the server.
System roles provide access only to server activities.
■ System Administrator: users who can always access the
Report Server to change the site settings.
■ System User: users who can access the site settings so
that role members can view the server properties and shared
schedules.
21. What is a Linked Report?
○ A linked report is a report server item that provides an access
point to an existing report.
○ You can think of it as a shortcut on your desktop to run a
program.
○ To create a linked report...
■ 1. In Report Manager, navigate to the folder containing the
report that you want to link to, and then open the options
menu of the file. Then you can click Create Linked Report.
■ 2. Type a name for the new linked report. Optionally type a
description.
■ 3. To select a different folder for the report, click
Change Location. Click the folder you want to use, or type
the folder name in the Location box. Click OK. If you do
not select a different folder, the linked report is created
in the current folder (where the report it is based on is
stored).
■ 4. Click OK. The linked report opens.
○ A linked report is derived from an existing report and retains
the original's report definition. A linked report always inherits
report layout and data source properties of the original report.
○ All other properties and settings, however, can be different from
those of the original report, including security, parameters,
location, subscriptions, and schedules.
22. What is an Embedded Report?
○ It is a type of report that is generated using .NET or C#.
23. What is a Subreport?
○ Like a nested procedure, a subreport is a report called from
within another report. It is also possible to pass parameter
values from the main report to the subreport to make it more
dynamic.
24. What is a Cached Report?
○ Caching a report is helpful when you want to strike a balance
between having current data in the report and having faster
access to the online report. The first time that a user clicks
the link for a report configured to cache, the report execution
takes place. However, the report is flagged as a cached instance
and stored in ReportServerTempDB until the time specified by the
cache settings expires.
25. What is a Snapshot Report?
○ A report snapshot executes the query and produces the
intermediate format in advance of the user’s request to view
the report. It can be generated on demand, or you can set up a
recurring schedule. It is stored in the ReportServer database as
part of permanent storage (it keeps a history of that report).
26. What is a Subscription on SSRS and how and why did you use it?
○ Standard Subscriptions
You can render a report to multiple users in a single rendering format.
All the users/subscribers will receive the same rendering format and
the subscriber information is hardcoded into the subscription.
○ Data Driven Subscriptions
We can deliver multiple reports to multiple users in multiple rendering
formats. The subscriber information is not hard-coded into the
subscription; instead, it is retrieved from an underlying database at
run time, and the subscription is formatted accordingly.
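A minimal sketch of the query a data-driven subscription might run;
dbo.ReportSubscribers and its columns are hypothetical and would be
mapped to the delivery settings (To address, render format, report
parameters) in the subscription definition.

-- Hedged sketch: subscriber query for a data-driven subscription.
SELECT EmailAddress,
       PreferredFormat,   -- e.g. 'PDF', 'EXCEL', 'MHTML'
       OrderYear          -- mapped to the report's @OrderYear parameter
FROM   dbo.ReportSubscribers
WHERE  IsActive = 1;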
27. What is the difference between a dashboard and scorecard?
○ A dashboard is a container for various types of reports,
including scorecards. It might consist of one or more pages,
and it might have more than one module on each page. A typical
dashboard might contain a scorecard, an analytic report, and an
analytic chart and so on.
○ A scorecard measures performance against goals. It displays a
collection of KPIs together with performance targets for each
KPI.
28. What are some command line utilities for SSRS?
○ rsconfig.exe – assists in managing the SSRS instance connection
to the repository database.
○ rskeymgmt.exe – assists in the management of the encryption keys
for operations such as backup, restore, and create.
○ rs.exe – assists in the .NET scripting of report management
operations.
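○ A few hedged usage sketches; the server names, file paths,
passwords, and accounts below are placeholders:

rem Run an .rss script (for example, to publish reports) against a report server
rs.exe -i PublishReports.rss -s http://localhost/ReportServer

rem Back up (extract) the symmetric encryption key to a file
rskeymgmt.exe -e -f C:\Backup\rs_key.snk -p "StrongPassword!"

rem Point the report server at its repository database
rsconfig.exe -c -s localhost -d ReportServer -a Windows -u CONTOSO\svcRS -p "Password123"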
29. What is the Report Service Configuration File and What did you
use it for?
○ Located in...
■ C:\Program Files\Microsoft SQL
Server\MSRS10_50.MSSQLSERVER\Reporting
Services\ReportServer\rsreportserver.config
○ Report Service Configuration File is an XML file that contains
all the configuration information of SSRS.
○ Once you open the file and go to the ‘Render’ block, you can see,
by default, the hidden rendering options such as ATOM, RPL, and
such. These options are not available on your BIDS.
○ The same applies to the ‘Data’ block, which represents the
available data source options.
○ By putting Visible=”false” at the end of each option, you can
hide it.
○ I would want to do this when I want to limit the available
options for the users. And it all depends on the business
requirements at the end.
