Pivot and Unpivot on SSIS

Published by sergiotarrillo
Guides on how to use the Pivot/Unpivot item in Integration Services


Published by: sergiotarrillo on May 12, 2009
Copyright: Attribution Non-commercial


11/15/2012

SQL Server 2005 Integration Services - Part 38 - Pivot Transformation
Extensive ETL (Extraction, Transformation, and Loading) capabilities, on which SQL Server 2005 Integration Services is based, deliver such essential functionality as the combination and cleanup of data originating from heterogeneous sources, as well as the scheduling and coordination of activities that frequently take place beyond the boundaries of database management systems. While a substantial number of these features help with traditional database administration tasks, a few are intended primarily to assist with data analysis. In this article, we will cover the Pivot transformation, which is one of the more popular choices in this category.

The Pivot operation (just like its T-SQL equivalent) modifies the way in which a recordset is presented, typically by rotating row data into columns (SSIS also offers the Unpivot transformation, which reverses this process). Even though these changes do not introduce any new data, they tend to enhance the ability to analyze existing content by simplifying comparisons and uncovering less apparent trends. To help you understand this concept, we will present an example of the pivot operation. As our data source, we will use the sample spreadsheets available on the Microsoft Web site as Excel 2002 Sample: PivotTable Reports, which you need to extract to an arbitrary folder by running the downloadable Report.exe. The target location will then host the SampleSalespersonReports.xls (which we will manipulate throughout this article), SampleProductReports.xls, SampleOrderReports.xls, and SampleCustomerReports.xls Excel workbooks. Even though they were designed with Excel PivotTable functionality in mind, we can leverage them for the purpose of our demonstration.

The spreadsheet serving as our data source ('Source Data' in SampleSalespersonReports.xls) contains an inventory of orders handled between July 2003 and May 2005 by nine salespeople located in the USA and the UK.
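To make the row-to-column rotation described above concrete outside of SSIS, here is a minimal Python sketch of a pivot and its inverse over a list of dictionaries. It is not how SSIS implements either transformation: the `pivot`/`unpivot` function names and the sample rows are hypothetical, and because this version groups rows through a dictionary, it does not depend on input order the way the SSIS Pivot does.

```python
from collections import defaultdict


def pivot(rows, set_keys, pivot_key, value_col):
    """Rotate rows into columns.

    Produces one output row per distinct combination of set_keys and one
    output column per distinct pivot_key value; missing cells become None.
    Rows are grouped through a dict, so input order does not matter here.
    """
    pivot_values = sorted({r[pivot_key] for r in rows})
    cells = defaultdict(dict)  # set-key tuple -> {pivot value: value}
    for r in rows:
        key = tuple(r[k] for k in set_keys)
        cells[key][r[pivot_key]] = r[value_col]
    wide = []
    for key, by_pivot in cells.items():
        out = dict(zip(set_keys, key))
        for pv in pivot_values:
            out[pv] = by_pivot.get(pv)
        wide.append(out)
    return wide


def unpivot(rows, set_keys, pivot_key, value_col):
    """Reverse of pivot: each non-key, non-None cell becomes its own row."""
    narrow = []
    for r in rows:
        for col, val in r.items():
            if col in set_keys or val is None:
                continue
            rec = {k: r[k] for k in set_keys}
            rec[pivot_key] = col
            rec[value_col] = val
            narrow.append(rec)
    return narrow


if __name__ == "__main__":
    sales = [  # hypothetical sample rows
        {"Salesperson": "Seller A", "Year": 2003, "Amount": 10},
        {"Salesperson": "Seller A", "Year": 2004, "Amount": 20},
        {"Salesperson": "Seller B", "Year": 2003, "Amount": 5},
    ]
    wide = pivot(sales, ["Salesperson"], "Year", "Amount")
    print(wide)
    print(unpivot(wide, ["Salesperson"], "Year", "Amount") == sales)
```

The sketch's `unpivot` skips `None` cells so that the gaps filled in by `pivot` do not come back as extra rows; with the sample data, pivoting and then unpivoting returns the original three rows.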
Our intention is to convert the 'Source Data' sheet into a recordset that would allow us to easily determine the total value of orders for each salesperson during each year. More specifically, we want the outcome to consist of five columns: Salesperson, Country, 2003 Orders Amount, 2004 Orders Amount, and 2005 Orders Amount. Since the values stored in the last three need to be calculated by adding individual order amounts on a per-salesperson and per-year basis, we will use the Derived Column and Aggregate transformations for this purpose. Once the summarized data is available (still in the original format), we will reorganize it by applying Pivot. The final result will be saved in a spreadsheet by using the Excel Destination Data Flow component.

To accomplish this, start by creating a new Integration Services project in Business Intelligence Development Studio. Add a Data Flow task to the newly created project (by dragging its icon from the Toolbox onto the Designer interface) and double-click on it to switch to its tabbed area. Create an Excel Source (listed under Data Flow Sources in the Toolbox) and display its Editor (by selecting the Edit... entry from its context-sensitive menu). Within the Editor window, click on the New... command button to provide parameters for the Connection Manager, pointing to SampleSalespersonReports.xls. Ensure that the "Table or view" option appears in the "Data access mode" listbox and pick 'Source Data$' as the name of the Excel sheet. Switch to the Columns section within the Editor window and mark Country, Salesperson, Order Date, and Order Amount in the Available External Columns listing. Once you complete these steps, click on the OK button to close the Editor window.

Next, drag a Derived Column transformation from the Toolbox and connect the output of our Excel Source with its input. Launch its Editor window and define a new derived column named Order Year, calculated using the
YEAR([Order Date])
expression (set its data type to two-byte unsigned integer). Accept the changes by clicking on the OK button.

Once back on the Data Flow tab, add an Aggregate transformation to the Data Flow task area and drag the green arrow originating from the Derived Column to its input. On the Aggregations tab of its Editor window, select Country, Salesperson, Order Year, and Order Amount in the Available Input Columns box and ensure that the first three are listed with the "Group by" operation and the last one has "Sum" applied to it. Close the Editor window and return to the Data Flow tab.

The next transformation that needs to be included in our package is Pivot. Once you have dropped it onto the Data Flow area from the Toolbox, connect the output of Aggregate to its input and select Edit or Show Advanced Editor; interestingly, both present you with the same Advanced Editor for Pivot window. Once there, review the Component Properties tab and switch to the Input Columns tab. Ensure that all available input columns (Salesperson, Order Amount, Country, and Order Year) are selected and switch to the Input and Output Properties tab. This is where the majority of the configuration takes place.

As mentioned before, our goal is to display the outcome in a specific format, with three extra columns (2003 Orders Amount, 2004 Orders Amount, and 2005 Orders Amount) in addition to the two original ones, Salesperson and Country. With the assistance of Derived Column and Aggregate, we have so far managed to create a recordset with Salesperson, Country, Order Year, and Order Amount fields, which contains the total amount of orders for a specific salesperson in a given year, giving us 27 rows (9 salespeople times 3 years), down from 799 rows in the SampleSalespersonReports.xls spreadsheet.
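The work done so far by the Derived Column and Aggregate transformations amounts to "compute the year, then group and sum". The Python sketch below reproduces that on a few made-up rows; the seller names, dates, and amounts are hypothetical (not taken from SampleSalespersonReports.xls), the function names are illustrative, and Python's `date.year` stands in for the SSIS `YEAR()` function.

```python
from collections import defaultdict
from datetime import date


def add_order_year(rows):
    """Derived Column step: add 'Order Year', the counterpart of YEAR([Order Date])."""
    return [{**r, "Order Year": r["Order Date"].year} for r in rows]


def aggregate(rows):
    """Aggregate step: group by Country, Salesperson and Order Year, summing Order Amount."""
    totals = defaultdict(float)
    for r in rows:
        totals[(r["Country"], r["Salesperson"], r["Order Year"])] += r["Order Amount"]
    return [
        {"Country": c, "Salesperson": s, "Order Year": y, "Order Amount": amount}
        for (c, s, y), amount in totals.items()
    ]


if __name__ == "__main__":
    orders = [  # hypothetical rows shaped like the 'Source Data' sheet
        {"Country": "USA", "Salesperson": "Seller A", "Order Date": date(2003, 7, 16), "Order Amount": 250.0},
        {"Country": "USA", "Salesperson": "Seller A", "Order Date": date(2003, 9, 1), "Order Amount": 100.5},
        {"Country": "UK", "Salesperson": "Seller B", "Order Date": date(2004, 1, 5), "Order Amount": 75.0},
    ]
    for row in aggregate(add_order_year(orders)):
        print(row)
```

Three input orders collapse into two rows here; applied to the full sheet, the same grouping is what yields the 27 rows mentioned above.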
At this point, we want to rearrange the records in such a way that, instead of the Order Year and Order Amount columns, we have three columns, one for each year covered by our sales inventory (giving us a table with 9 rows and 5 columns, with a single row for each salesperson), listing the amount of sales for an individual salesperson in that year. According to pivot nomenclature, Salesperson and Country function as SetKeys (values in these input columns identify records that need to be grouped together in the same output row), Order Year serves as the PivotKey (the column whose values determine the additional columns in the resulting recordset), and Order Amount contains the PivotedValues (which are copied to the new columns created by the pivot). Keep in mind that each combination of SetKey and PivotKey values has to be unique across the input rows (which is the case here, since the data has been aggregated prior to applying the pivot).

Continue the configuration by expanding the Pivot Default Input node, which lists all input columns. For each, you need to define its role in the pivot process by setting the PivotUsage custom property, which can take on one of the following values:
0 - indicates that the content of the column is simply copied to the output,
1 - designates a column participating in the SetKey (this value should be assigned for Salesperson and Country),
2 - identifies the PivotKey column (Order Year in our case),
3 - intended for PivotedValues (Order Amount).
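Because PivotUsage is just an integer code per input column, the role assignment above can be modelled as a small enumeration. This Python sketch is illustrative only: the `PivotUsage` member names, `resolve_roles`, and its validation rule are hypothetical, chosen to match this example's single-PivotKey setup rather than to reproduce the checks SSIS itself performs.

```python
from enum import IntEnum


class PivotUsage(IntEnum):
    """The PivotUsage custom property values listed above."""
    PASS_THROUGH = 0   # content copied to the output unchanged
    SET_KEY = 1        # groups input rows into one output row
    PIVOT_KEY = 2      # its values select the new output columns
    PIVOTED_VALUE = 3  # copied into the new columns


def resolve_roles(usage):
    """Split a {column name: PivotUsage} mapping into pivot roles.

    Sketch-level check (an assumption, matching this article's setup):
    at least one SetKey, exactly one PivotKey and one PivotedValue column.
    """
    by_role = {role: [col for col, u in usage.items() if u == role] for role in PivotUsage}
    if (not by_role[PivotUsage.SET_KEY]
            or len(by_role[PivotUsage.PIVOT_KEY]) != 1
            or len(by_role[PivotUsage.PIVOTED_VALUE]) != 1):
        raise ValueError("need >= 1 SetKey, exactly 1 PivotKey and 1 PivotedValue column")
    return (by_role[PivotUsage.SET_KEY],
            by_role[PivotUsage.PIVOT_KEY][0],
            by_role[PivotUsage.PIVOTED_VALUE][0],
            by_role[PivotUsage.PASS_THROUGH])


# The assignments made for our package, keyed by input column name.
ARTICLE_USAGE = {
    "Salesperson": PivotUsage.SET_KEY,
    "Country": PivotUsage.SET_KEY,
    "Order Year": PivotUsage.PIVOT_KEY,
    "Order Amount": PivotUsage.PIVOTED_VALUE,
}
```

For our package, `resolve_roles(ARTICLE_USAGE)` yields Salesperson and Country as SetKeys, Order Year as the PivotKey, Order Amount as the PivotedValue column, and no pass-through columns.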
 
For all input columns, make a note of the values of their LineageID property, since you will need them in the next step. Once completed, switch to the Pivot Default Output node and create the following output columns:
Salesperson - set its SourceColumn custom property to match the LineageID property of the Salesperson input column,
Country - set its SourceColumn custom property to match the LineageID property of the Country input column,
2003 Orders - set its SourceColumn custom property to match the LineageID property of the Order Amount input column and its PivotKeyValue to the number 2003 (it must equal the integer value 2003 in the Order Year column),
2004 Orders - set its SourceColumn custom property to match the LineageID property of the Order Amount input column and its PivotKeyValue to the number 2004 (it must equal the integer value 2004 in the Order Year column),
2005 Orders - set its SourceColumn custom property to match the LineageID property of the Order Amount input column and its PivotKeyValue to the number 2005 (it must equal the integer value 2005 in the Order Year column).

Confirm your choices by clicking on the OK button and return to the Data Flow tab area. To capture the results, create an Excel Destination, connect its input with the output of the Pivot transformation, and specify the target spreadsheet by assigning appropriate values in its Excel Connection Manager. The outcome should contain five columns and nine rows, listing aggregate order values for each salesperson in each of the three years covered by SampleSalespersonReports.xls.

It is important to remember that correct output requires SetKey entries containing identical values to appear in adjacent input rows. In our example, this was handled by the Aggregate component (grouping all records by salesperson); however, in cases where this operation is not needed, make sure you introduce a Sort transformation prior to performing the pivot. Otherwise, you will end up with a separate output row for each non-adjacent group of identical values in the SetKey columns (and NULL entries in some of the pivoted columns of that row). For example, if the three rows for a given salesperson were not grouped together in our Pivot input data, we would end up with three output rows sharing the same SetKey value (for a total of 11 rows in the output recordset). One of them would contain total sales in the 2003 Orders column as well as two NULLs under 2004 and 2005 Orders, while the remaining two would have a single value in the 2004 Orders and 2005 Orders columns, respectively (and NULLs in the other two columns).
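The adjacency requirement is easiest to see in a streaming implementation. The Python sketch below is a hypothetical model of the behavior described above, not the SSIS engine: `OutputColumn` loosely mirrors the SourceColumn and PivotKeyValue settings but refers to input columns by name instead of by LineageID, the pivot opens a new output row whenever the SetKey values change between consecutive input rows, and Python's `sorted()` plays the role of the Sort transformation.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class OutputColumn:
    """Hypothetical model of one Pivot output column."""
    name: str
    source: str                            # input column name (SSIS links by LineageID)
    pivot_key_value: Optional[Any] = None  # None = not a pivoted column


def streaming_pivot(rows, set_keys, pivot_key, outputs):
    """Pivot rows in arrival order.

    A new output row is opened whenever the SetKey values differ from the
    previous input row, so SetKey groups must be adjacent to merge.
    """
    result = []
    current, current_key = None, None
    for r in rows:
        key = tuple(r[k] for k in set_keys)
        if key != current_key:          # SetKey changed: start a new output row
            current = {c.name: None for c in outputs}
            result.append(current)
            current_key = key
        for c in outputs:
            if c.pivot_key_value is None:
                current[c.name] = r[c.source]      # SetKey / pass-through column
            elif r[pivot_key] == c.pivot_key_value:
                current[c.name] = r[c.source]      # value lands in its year column
    return result


OUTPUTS = [
    OutputColumn("Salesperson", "Salesperson"),
    OutputColumn("Country", "Country"),
    OutputColumn("2003 Orders", "Order Amount", 2003),
    OutputColumn("2004 Orders", "Order Amount", 2004),
    OutputColumn("2005 Orders", "Order Amount", 2005),
]

if __name__ == "__main__":
    def row(person, country, year, amount):
        return {"Salesperson": person, "Country": country,
                "Order Year": year, "Order Amount": amount}

    # Hypothetical aggregated input where Seller A's rows are not adjacent.
    unsorted = [row("Seller A", "USA", 2003, 10.0), row("Seller B", "UK", 2003, 5.0),
                row("Seller A", "USA", 2004, 20.0), row("Seller A", "USA", 2005, 30.0)]
    keys = ["Salesperson", "Country"]
    print(len(streaming_pivot(unsorted, keys, "Order Year", OUTPUTS)))  # prints 3
    ordered = sorted(unsorted, key=lambda r: (r["Salesperson"], r["Country"]))
    print(len(streaming_pivot(ordered, keys, "Order Year", OUTPUTS)))   # prints 2
```

With the unsorted input, Seller A is split across two output rows padded with `None` (the NULLs mentioned above); after sorting on the SetKey columns, each salesperson collapses into a single row.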
