You are on page 1of 9

License Agreement

Microsoft Corporation Technical Documentation License Agreement (Standard)


READ THIS! THIS IS A LEGAL AGREEMENT BETWEEN MICROSOFT CORPORATION
("MICROSOFT") AND THE RECIPIENT OF THESE MATERIALS, WHETHER AN
INDIVIDUAL OR AN ENTITY ("YOU"). IF YOU HAVE ACCESSED THIS AGREEMENT IN
THE PROCESS OF DOWNLOADING MATERIALS ("MATERIALS") FROM A
MICROSOFT WEB SITE, BY CLICKING "I ACCEPT", DOWNLOADING, USING OR
PROVIDING FEEDBACK ON THE MATERIALS, YOU AGREE TO THESE TERMS. IF
THIS AGREEMENT IS ATTACHED TO MATERIALS, BY ACCESSING, USING OR
PROVIDING FEEDBACK ON THE ATTACHED MATERIALS, YOU AGREE TO THESE
TERMS.
1. For good and valuable consideration, the receipt and sufficiency of which are
acknowledged, You and Microsoft agree as follows:
(a) If You are an authorized representative of the corporation or other entity designated
below ("Company"), and such Company has executed a Microsoft Corporation NonDisclosure Agreement that is not limited to a specific subject matter or event ("Microsoft
NDA"), You represent that You have authority to act on behalf of Company and agree that
the Confidential Information, as defined in the Microsoft NDA, is subject to the terms and
conditions of the Microsoft NDA and that Company will treat the Confidential Information
accordingly;
(b) If You are an individual, and have executed a Microsoft NDA, You agree that the
Confidential Information, as defined in the Microsoft NDA, is subject to the terms and
conditions of the Microsoft NDA and that You will treat the Confidential Information
accordingly; or
(c)If a Microsoft NDA has not been executed, You (if You are an individual), or Company
(if You are an authorized representative of Company), as applicable, agrees: (a) to refrain
from disclosing or distributing the Confidential Information to any third party for five (5)
years from the date of disclosure of the Confidential Information by Microsoft to
Company/You; (b) to refrain from reproducing or summarizing the Confidential
Information; and (c) to take reasonable security precautions, at least as great as the
precautions it takes to protect its own confidential information, but no less than
reasonable care, to keep confidential the Confidential Information. You/Company,
however, may disclose Confidential Information in accordance with a judicial or other
governmental order, provided You/Company either (i) gives Microsoft reasonable notice
prior to such disclosure and to allow Microsoft a reasonable opportunity to seek a
protective order or equivalent, or (ii) obtains written assurance from the applicable judicial
or governmental entity that it will afford the Confidential Information the highest level of
protection afforded under applicable law or regulation. Confidential Information shall not
include any information, however designated, that: (i) is or subsequently becomes
publicly available without Your/Companys breach of any obligation owed to Microsoft; (ii)
became known to You/Company prior to Microsofts disclosure of such information to
You/Company pursuant to the terms of this Agreement; (iii) became known to
You/Company from a source other than Microsoft other than by the breach of an
obligation of confidentiality owed to Microsoft; or (iv) is independently developed by
You/Company. For purposes of this paragraph, "Confidential Information" means
nonpublic information that Microsoft designates as being confidential or which, under the

Copyright 2006 by Microsoft Corporation. All rights reserved. By using or providing


feedback on these materials, you agree to the attached license agreement. Please provide
feedback at BI FeedBack Alias.

circumstances surrounding disclosure ought to be treated as confidential by Recipient.


"Confidential Information" includes, without limitation, information in tangible or intangible
form relating to and/or including released or unreleased Microsoft software or hardware
products, the marketing or promotion of any Microsoft product, Microsoft's business
policies or practices, and information received from others that Microsoft is obligated to
treat as confidential.
2. You may review these Materials only (a) as a reference to assist You in planning and
designing Your product, service or technology ("Product") to interface with a Microsoft
Product as described in these Materials; and (b) to provide feedback on these Materials
to Microsoft. All other rights are retained by Microsoft; this agreement does not give You
rights under any Microsoft patents. You may not (i) duplicate any part of these Materials,
(ii) remove this agreement or any notices from these Materials, or (iii) give any part of
these Materials, or assign or otherwise provide Your rights under this agreement, to
anyone else.
3. These Materials may contain preliminary information or inaccuracies, and may not
correctly represent any associated Microsoft Product as commercially released. All
Materials are provided entirely "AS IS." To the extent permitted by law, MICROSOFT
MAKES NO WARRANTY OF ANY KIND, DISCLAIMS ALL EXPRESS, IMPLIED AND
STATUTORY WARRANTIES, AND ASSUMES NO LIABILITY TO YOU FOR ANY
DAMAGES OF ANY TYPE IN CONNECTION WITH THESE MATERIALS OR ANY
INTELLECTUAL PROPERTY IN THEM.
4. If You are an entity and (a) merge into another entity or (b) a controlling ownership
interest in You changes, Your right to use these Materials automatically terminates and
You must destroy them.
5. You have no obligation to give Microsoft any suggestions, comments or other feedback
("Feedback") relating to these Materials. However, any Feedback you voluntarily provide
may be used in Microsoft Products and related specifications or other documentation
(collectively, "Microsoft Offerings") which in turn may be relied upon by other third parties
to develop their own Products. Accordingly, if You do give Microsoft Feedback on any
version of these Materials or the Microsoft Offerings to which they apply, You agree: (a)
Microsoft may freely use, reproduce, license, distribute, and otherwise commercialize
Your Feedback in any Microsoft Offering; (b) You also grant third parties, without charge,
only those patent rights necessary to enable other Products to use or interface with any
specific parts of a Microsoft Product that incorporate Your Feedback; and (c) You will not
give Microsoft any Feedback (i) that You have reason to believe is subject to any patent,
copyright or other intellectual property claim or right of any third party; or (ii) subject to
license terms which seek to require any Microsoft Offering incorporating or derived from
such Feedback, or other Microsoft intellectual property, to be licensed to or otherwise
shared with any third party.
6. Microsoft has no obligation to maintain confidentiality of any Microsoft Offering, but
otherwise the confidentiality of Your Feedback, including Your identity as the source of
such Feedback, is governed by Your NDA.
7. This agreement is governed by the laws of the State of Washington. Any dispute
involving it must be brought in the federal or state superior courts located in King County,
Washington, and You waive any defenses allowing the dispute to
be litigated elsewhere. If there is litigation, the losing party must
pay the other partys reasonable attorneys fees, costs and other

Copyright 2006 by Microsoft Corporation. All rights reserved. By using or providing


feedback on these materials, you agree to the attached license agreement. Please provide
feedback at BI FeedBack Alias.

expenses. If any part of this agreement is unenforceable, it will be considered modified to


the extent necessary to make it enforceable, and the remainder shall continue in effect.
This agreement is the entire agreement between You and Microsoft concerning these
Materials; it may be changed only by a written document signed by both You and
Microsoft.

Copyright 2006 by Microsoft Corporation. All rights reserved. By using or providing


feedback on these materials, you agree to the attached license agreement. Please provide
feedback at BI FeedBack Alias.

5
Data Staging
In this Chapter

Factors for staging or not staging

Different staging techniques

Overview
Data extracted, transformed and loaded (ETL) from the source to destination may require
temporary staging for various reasons such as addressing failures, reducing load on the
source system, data cleansing, auditing purposes etc. The following section will iterate
through various scenarios to help identify if staging is required in first place, if you do
need staging then selecting the most appropriate model that fits the criteria becomes
critical for an optimal ETL solution.

Decision to Stage or Not to Stage


As described in the Transformation guide, it is possible to load data directly from the
extraction process into the data warehouse, without a separate staging area. The
opportunity to avoid a staging database is even greater when using SSIS, which allows
you to apply multiple transformations to data as it passes through memory between the
source and the destination. Whether you need to stage the data or not as part of your
ETL process is common design decision that you have to make.

Factors for Not Staging


In most simple scenario, the ETL process copies data from the source directly to the
data warehouse. In this case, if there is a failure, you restart the job, again extracting
data from the source system. Assuming that the source is continually available in case
the job needs to restart, and the transformations are simple enough that they can be
performed in memory, this is an extremely efficient way to load the data warehouse
Also, if the source is simple and clean enough that it does not need any data cleansing
or transformations before loading to warehouse then it is a simple mapping without the
need of a separate transformation process. The required data is extracted, mapped
and loaded to warehouse, is the scenario here.

Factors for Staging


Following are reasons you would need a staging database:

Copyright 2006 by Microsoft Corporation. All rights reserved. By using or providing


feedback on these materials, you agree to the attached license agreement. Please provide
feedback at BI FeedBack Alias.

There is a time lag between Extraction and Loading process. If the extraction
occurs at 7 AM and the loading to the warehouse happens at midnight then the
extracted data has to be stored somewhere. A staging database is a reasonable
option..

Restart on failure:. If various parts of the ETL process are vulnerable to fail,
recovery is easier if a staging database is available as the starting point. For
example, Out of 100 tables, the extraction process is failed after extracting 60 files to
staging area successfully. When the extraction process is restarted it does not need
to extract all but the failed 40 as the data is already available for successful 60 in
staging. When the source system resources are constrained, it is particularly
important to avoid repeating an extract unnecessarily. The staging database allows
the Extraction process to be decoupled from the subsequent processes.

Memory Constraints. Some transformations particularly those that require sorting,


can consume substantial memory resources. In many cases, those transformations
can be done more efficiently by making use of a disk-based staging. Depending
entirely on memory availability limits the ability of the ETL process to scale. When
there are memory intensive transformations, having a disk-based process allows for
more growth in the volume of data you can process.

Auditing Requirements. Depending on the specifics of the extraction and


transformation process, a staging area can provide a level of auditing support. For
example, if it turns out that a specific customer is missing from the data warehouse,
you could examine the staging database to help identify why that customers records
were lost.

Multiple Source systems: When the data is to be consolidated from multiple source
systems, a staging database allows the consolidating to take place before processing
the transformation.

Types of Staging
Assuming that there is a need for staging the extracted data, there are various types of
staging repository patterns available. Consider the following common staging
repository architectures when designing the ETL process.

Staging Data from Multiple Sources

Staggered Staging

Persisted Staging

Accumulated Staging

Chunked Accumulated Staging

Copyright 2006 by Microsoft Corporation. All rights reserved. By using or providing


feedback on these materials, you agree to the attached license agreement. Please provide
feedback at BI FeedBack Alias.

Staging Data from multiple sources


In this case the data arriving from different source systems are temporally stored in a
staging location before the ETL process. As illustrated, the data from the external
sources is extracted through the firewall to a staging database before the
transformation process begins. This is the consolidation of data from multiple sources
at a conceptual level as other lower level tasks such as data merging, prioritizing,
timing and cleaning is also required once after the data consolidation (explained in data
transformation chapter)

ETL Back
office

Productio
n

E, T
L
Target Db
Source
Db

Staging
Db
Fire Wall

Staggered Staging
In this case, you create multiple staging databases for different stages of the ETL
process. The reason for using this approach is if different steps of the process are very
expensive to repeat and are also vulnerable to failure. There is clearly a cost in disk
space for this option, but it allows multiple levels of restart capability, without having to
repeat earlier stages of the process.

Extract

Exception
Source
Db

Transfor
m

Load

Exception
Staging
Db

Exception

Target Db

Copyright 2006 by Microsoft Corporation. All rights reserved. By using or providing


feedback on these materials, you agree to the attached license agreement. Please provide
feedback at BI FeedBack Alias.

Persisted Staging
In this case an archive copy of the staging database is created routinely. The primary
reason for using this alternative is to allow auditing of the extract and transformation
processes for more than the current cycle.
ETL

Archive
Extracted
Data
Source
Db

Staging
Db

Target Db

Accumulated Staging
Ideally, the source systems provide delta data having the corresponding transaction
details (Inserts, Updates, Deletes).
Accumulated staging approach can be adopted, when
1. If the source system does not have inbuilt delta detection mechanism
2. If the source system does not provide the kind of transaction applied for each
delta detected record
If the source system does not have inbuilt delta detection, all the records from the
required tables are extracted to staging then compared against the accumulated
staging.
If the souce system does have delta detection mechanism and extracted only the
changed/added records the next step is to determine the transaction type that occurred
for these changed records (Inserts/Updates/Deletes). Following example would give
more insight on how the comparison is made to determine the transaction type.
Incoming Extraction (Table Employee):
EmpId

EmpName

EmpLocation

1011

John

Redmond

1021

Dave

Seattle

Accumulated Staging (Table Employee):


EmpId

EmpName

EmpLocation

1011

John

New York

In the above tables, when compared the incoming employee table with accumulated
staging, the record EmpId 1011 can be determined as Modified where as the EmpId
1021 can be determined as Inserted.

Copyright 2006 by Microsoft Corporation. All rights reserved. By using or providing


feedback on these materials, you agree to the attached license agreement. Please provide
feedback at BI FeedBack Alias.

T1 L1
E1
T2 L2
E1+E2
T3 L3
E1+E2+E3

Source
Db

Staging
Db

Target Db

Chunked Accumulated Staging


This is a case were you need to extract huge volume of data and you spread the
extract load across the day on the source system. The accumulated information is then
transformed and loaded to the destination warehouse as required by business needs.
For example extraction could happen every hour throughout the day and
transformation and loading could be done once at the end of the day.

11AM
3 PM

Transform &
Load

9 PM
Chunks of data
extracted at
different times
Source
Db

Staging
Db

Target Db

Destination Considerations
The previous sections described various ways of staging data before loading the data
warehouse. In some BI database implementations, there may be an additional step
that involves distributing the data to separate data marts. .

Copyright 2006 by Microsoft Corporation. All rights reserved. By using or providing


feedback on these materials, you agree to the attached license agreement. Please provide
feedback at BI FeedBack Alias.

Transform &
Load

Source
Db

Staging
Db

Warehous
e

Marts

For example, the data from the central data warehouse may need to be distributed to
data marts at different geographical locations. In some ways, the data warehouse is
functioning as a staging area for the distributed data marts. This is not technically a
stage, it is still taken into consideration for choosing appropriate conditions to load the
warehouse to ensure it is appropriate populated to destination marts.

Copyright 2006 by Microsoft Corporation. All rights reserved. By using or providing


feedback on these materials, you agree to the attached license agreement. Please provide
feedback at BI FeedBack Alias.

You might also like