By Marcin Policht
The extensive ETL (Extraction, Transformation, and Loading) capabilities on which SQL Server 2005 Integration Services is based deliver essential functionality such as combining and cleansing data that originates from heterogeneous sources. They also handle scheduling and coordinating activities that frequently take place beyond the boundaries of database management systems. While many of these features help with traditional database administration tasks, a few are intended primarily to assist with data analysis. In this article, we will cover the Pivot transformation, one of the more popular choices in this category.
The Pivot operation (just like its T-SQL equivalent) modifies the way in which a recordset is presented, typically by rotating row data into columns. SSIS also offers the Unpivot transformation, which reverses this process. Even though these changes do not introduce any new data, they tend to make existing content easier to analyze by simplifying comparisons and uncovering less apparent trends.
To help you understand this concept, we will present an example of a pivot operation. As our data source, we will use the sample spreadsheets available on the Microsoft Web site as Excel 2002 Sample: PivotTable Reports. Extract them to an arbitrary folder by running the downloadable Report.exe. The target location will then host four Excel workbooks: SampleSalespersonReports.xls (which we will manipulate throughout this article), SampleProductReports.xls, SampleOrderReports.xls, and SampleCustomerReports.xls. Even though they were designed with Excel PivotTable functionality in mind, we can use them for our demonstration. The spreadsheet serving as our data source (the 'Source Data' sheet in SampleSalespersonReports.xls) contains an inventory of orders handled between July 2003 and May 2005 by nine salespeople located in the USA and the UK.
Our intention is to convert it into a recordset that lets us easily determine the total value of orders for each salesperson during each year. More specifically, we want the outcome to consist of five columns: Salesperson, Country, 2003 Orders Amount, 2004 Orders Amount, and 2005 Orders Amount. The values in the last three are calculated by adding individual order amounts on a per-salesperson, per-year basis, so we will use the Derived Column and Aggregate transformations for this purpose. Once the summarized data is available (still in the original format), we will reorganize it by applying Pivot. The final result will be saved in a spreadsheet by using the Excel Destination Data Flow component.
To accomplish this, start by creating a new Integration Services project in Business Intelligence Development Studio. Add a Data Flow task to the new project by dragging its icon from the Toolbox onto the Designer surface, then double-click it to switch to its tab. Create an Excel Source (listed under Data Flow Sources in the Toolbox) and display its editor by selecting the Edit... entry from its context menu. Within the Editor window, click the New... command button to provide parameters for the Connection Manager, pointing it to SampleSalespersonReports.xls. Ensure that the "Table or view" option appears in the "Data access mode" listbox and pick 'Source Data$' as the name of the Excel sheet. Switch to the Columns section within the Editor window and mark Country, Salesperson, Order Date, and Order Amount in the Available External Columns list. Once you complete these steps, click the OK button to close the Editor window.
Next, drag a Derived Column transformation from the Toolbox and connect the output of our Excel Source to its input. Launch its Editor window and define a new derived column named Order Year, calculated using the YEAR([Order Date]) expression (set its data type to two-byte unsigned integer). Accept the changes by clicking the OK button. Back on the Data Flow tab, add an Aggregate transformation to the Data Flow task area and drag the green arrow originating from the Derived Column to its input. On the Aggregation tab of its Editor window, select Country, Salesperson, Order Year, and Order Amount in the Available Input Columns box. Ensure that the first three use the "Group by" operation and the last one has "Sum" applied to it. Close the Editor window and return to the Data Flow tab.
The next transformation to include in our package is Pivot. Once you have dropped it onto the Data Flow area from the Toolbox, connect the output of Aggregate to its input and select Edit or Show Advanced Editor. Interestingly, both present you with the same Advanced Editor for Pivot window. Once there, review the Component Properties tab and switch to the Input Columns tab. Ensure that all available input columns (Salesperson, Order Amount, Country, and Order Year) are selected and switch to the Input and Output Properties tab. This is where most of the configuration takes place. As mentioned before, our goal is to display the outcome in a specific format. It should have three extra columns (2003 Orders Amount, 2004 Orders Amount, and 2005 Orders Amount) in addition to the two original ones, Salesperson and Country.
With the help of Derived Column and Aggregate, we have so far created a recordset with Salesperson, Country, Order Year, and Order Amount fields. It contains the total amount of orders for a specific salesperson in a given year, which gives us 27 rows (9 salespeople times 3 years), down from 799 rows in the SampleSalespersonReports.xls spreadsheet. At this point, we want to rearrange the records so that the Order Year and Order Amount columns are replaced by three columns, one for each year covered by our sales inventory. Each of these lists an individual salesperson's sales amount for that year, giving us a table with 9 rows and 5 columns: a single row for each salesperson.
In pivot nomenclature, the roles are as follows. Salesperson and Country function as SetKeys: values in these input columns identify records that need to be grouped together in the same output row. Order Year serves as the PivotKey, the column whose values determine the additional columns in the resulting recordset. Order Amount contains the PivotedValues, which are copied into the new columns created by the pivot. Keep in mind that each combination of SetKey and PivotKey values has to be unique across the input rows. That is the case here, since the data was aggregated before the pivot is applied.
Continue the configuration by expanding the Pivot Default Input node, which lists all input columns. For each of them, define its role in the pivot process by setting the PivotUsage custom property, which can take one of the following values:
• 0 - the content of the column is simply copied to the output,
• 1 - the column is part of the set key (assign this value to Salesperson and Country),
• 2 - the column is the PivotKey (Order Year in our case),
• 3 - the column contains the PivotedValues (Order Amount).
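The Derived Column and Aggregate steps that produced this summarized input can be sketched in Python for readers who want to reason about the data outside the designer. This is a minimal illustration, not SSIS code. The order rows and salesperson names are hypothetical stand-ins for the 'Source Data' sheet, and plain dictionaries play the role of the data flow buffer.

```python
from collections import defaultdict
from datetime import date


def derive_order_year(rows):
    """Derived Column step: add Order Year = YEAR([Order Date]) to each row."""
    return [{**row, "Order Year": row["Order Date"].year} for row in rows]


def aggregate_orders(rows):
    """Aggregate step: group by Country, Salesperson and Order Year,
    summing Order Amount within each group."""
    totals = defaultdict(float)
    for row in rows:
        key = (row["Country"], row["Salesperson"], row["Order Year"])
        totals[key] += row["Order Amount"]
    # Sorting by the group-by columns is a choice of this sketch; it keeps all
    # rows for one salesperson adjacent, which the Pivot step relies on.
    return [
        {"Country": c, "Salesperson": s, "Order Year": y, "Order Amount": amount}
        for (c, s, y), amount in sorted(totals.items())
    ]


# Hypothetical order rows; the real 'Source Data' sheet has 799 of them.
orders = [
    {"Country": "USA", "Salesperson": "Smith", "Order Date": date(2003, 7, 16), "Order Amount": 100.0},
    {"Country": "USA", "Salesperson": "Smith", "Order Date": date(2003, 9, 2), "Order Amount": 50.0},
    {"Country": "UK", "Salesperson": "Jones", "Order Date": date(2004, 1, 5), "Order Amount": 75.0},
    {"Country": "USA", "Salesperson": "Smith", "Order Date": date(2004, 3, 1), "Order Amount": 20.0},
]

summary = aggregate_orders(derive_order_year(orders))
for row in summary:
    print(row)
```

Smith's two 2003 orders collapse into a single 150.0 row, which mirrors how the 799 source rows shrink to 27.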
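The PivotUsage assignments and the uniqueness requirement for SetKey/PivotKey combinations can be expressed as a small check. The helpers `split_roles` and `check_unique_keys` below are hypothetical, written only to make the rules concrete; they are not part of SSIS. The column names and usage codes follow the list above.

```python
# PivotUsage codes as listed above:
# 0 = pass-through, 1 = set key, 2 = pivot key, 3 = pivoted value.
PIVOT_USAGE = {"Salesperson": 1, "Country": 1, "Order Year": 2, "Order Amount": 3}


def split_roles(usage):
    """Group input column names by their PivotUsage code."""
    roles = {0: [], 1: [], 2: [], 3: []}
    for column, code in usage.items():
        roles[code].append(column)
    return roles


def check_unique_keys(rows, usage):
    """Raise ValueError if two rows share the same set key and pivot key values,
    i.e. if a (SetKey, PivotKey) combination appears more than once."""
    roles = split_roles(usage)
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in roles[1]) + tuple(row[c] for c in roles[2])
        if key in seen:
            raise ValueError(f"duplicate set key/pivot key combination: {key}")
        seen.add(key)


rows = [
    {"Salesperson": "Smith", "Country": "USA", "Order Year": 2003, "Order Amount": 150.0},
    {"Salesperson": "Smith", "Country": "USA", "Order Year": 2004, "Order Amount": 20.0},
]
check_unique_keys(rows, PIVOT_USAGE)  # passes: the data was aggregated first
print(split_roles(PIVOT_USAGE))
```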
For all input columns, take note of the values of their LineageID property, since you will need them in the next step. Once completed, switch to the Pivot Default Output node and create the following output columns:
• Salesperson - set its SourceColumn custom property to match the LineageID of the Salesperson input column,
• Country - set its SourceColumn custom property to match the LineageID of the Country input column,
• 2003 Orders - set its SourceColumn custom property to match the LineageID of the Order Amount input column and its PivotKeyValue to 2003 (it must equal the integer value 2003 in the Order Year column),
• 2004 Orders - set its SourceColumn custom property to match the LineageID of the Order Amount input column and its PivotKeyValue to 2004 (it must equal the integer value 2004 in the Order Year column),
• 2005 Orders - set its SourceColumn custom property to match the LineageID of the Order Amount input column and its PivotKeyValue to 2005 (it must equal the integer value 2005 in the Order Year column).
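The output column definitions above can be sketched as data, with a function that applies them the way the article describes the transformation's behavior. A new output row starts whenever the set key changes, and each pivoted column takes the Order Amount of the input row whose Order Year equals its PivotKeyValue. This is an illustrative model only. SSIS links columns by LineageID, whereas this sketch uses plain names, and the input amounts are hypothetical.

```python
# (output column, source input column, PivotKeyValue or None for non-pivoted columns)
OUTPUT_COLUMNS = [
    ("Salesperson", "Salesperson", None),
    ("Country", "Country", None),
    ("2003 Orders", "Order Amount", 2003),
    ("2004 Orders", "Order Amount", 2004),
    ("2005 Orders", "Order Amount", 2005),
]
SET_KEY = ("Salesperson", "Country")
PIVOT_KEY = "Order Year"


def pivot(rows):
    """Start a new output row whenever the set key differs from the previous
    input row; pivoted cells with no matching input stay None (NULL)."""
    output, current, current_key = [], None, None
    for row in rows:
        key = tuple(row[c] for c in SET_KEY)
        if key != current_key:
            current = {name: None for name, _, _ in OUTPUT_COLUMNS}
            output.append(current)
            current_key = key
        for name, source, pivot_key_value in OUTPUT_COLUMNS:
            if pivot_key_value is None:
                if current[name] is None:  # keep the first input value
                    current[name] = row[source]
            elif row[PIVOT_KEY] == pivot_key_value:
                current[name] = row[source]
        # Rows whose Order Year matches no output column are simply ignored
        # here; this sketch does not model the transformation's error output.
    return output


aggregated = [
    {"Salesperson": "Smith", "Country": "USA", "Order Year": 2003, "Order Amount": 150.0},
    {"Salesperson": "Smith", "Country": "USA", "Order Year": 2004, "Order Amount": 20.0},
    {"Salesperson": "Smith", "Country": "USA", "Order Year": 2005, "Order Amount": 30.0},
    {"Salesperson": "Jones", "Country": "UK", "Order Year": 2004, "Order Amount": 75.0},
]
for out_row in pivot(aggregated):
    print(out_row)
```

Jones has no 2003 or 2005 orders in this made-up input, so those cells remain None, the analogue of NULL in the Pivot output.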
Confirm your choices by clicking the OK button and return to the Data Flow tab. To capture the results, create an Excel Destination and connect its input to the output of the Pivot transformation. Then specify the target spreadsheet by assigning appropriate values in its Excel Connection Manager. The outcome should contain five columns and nine rows, listing aggregate order values for each salesperson in each of the three years covered by SampleSalespersonReports.xls.
It is important to remember that correct output requires SetKey entries with identical values to appear in adjacent input rows. In our example, this was handled by the Aggregate component (grouping all records by salesperson). In cases where that operation is not needed, add a Sort transformation before performing the pivot. Otherwise, you will end up with a separate output row every time the SetKey value changes, with NULL entries in some of the pivoted columns of those rows. For example, suppose the three rows for a given salesperson were not grouped together in our Pivot input data. We would then end up with three output rows sharing the same SetKey value, for a total of 11 rows in the output recordset. One of them would contain total sales in the 2003 Orders column, with NULLs under 2004 Orders and 2005 Orders. The remaining two would each have a single value, in the 2004 Orders and 2005 Orders columns respectively, and NULLs in the other two columns.
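The adjacency requirement is easy to reproduce in a short Python sketch. `itertools.groupby` groups only consecutive items with equal keys, which matches the behavior described above: a new output row starts each time the salesperson changes. The three salespeople and their amounts are hypothetical.

```python
from itertools import groupby

YEARS = (2003, 2004, 2005)


def pivot(rows):
    """Group *consecutive* rows with the same Salesperson into one output row;
    non-adjacent rows for the same salesperson produce separate output rows."""
    output = []
    for person, group in groupby(rows, key=lambda r: r["Salesperson"]):
        out = {"Salesperson": person, **{f"{y} Orders": None for y in YEARS}}
        for r in group:
            if r["Order Year"] in YEARS:
                out[f"{r['Order Year']} Orders"] = r["Order Amount"]
        output.append(out)
    return output


# Hypothetical aggregated rows in which Smith's three rows are not adjacent.
raw = [
    ("Smith", 2003, 10.0), ("Jones", 2003, 1.0), ("Jones", 2004, 2.0),
    ("Jones", 2005, 3.0), ("Smith", 2004, 20.0), ("Brown", 2003, 4.0),
    ("Brown", 2004, 5.0), ("Brown", 2005, 6.0), ("Smith", 2005, 30.0),
]
rows = [{"Salesperson": s, "Order Year": y, "Order Amount": a} for s, y, a in raw]

unsorted_result = pivot(rows)
sorted_result = pivot(sorted(rows, key=lambda r: r["Salesperson"]))
print(len(unsorted_result), len(sorted_result))  # 5 3
```

Without sorting, Smith is spread over three output rows with NULL-like gaps, the small-scale analogue of the 11-row result above. Sorting first (Python's sort is stable, so each salesperson's years keep their order) restores one row per salesperson.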
SQL Server 2005 Books Online (November 2008)
Pivot Transformation
Updated: 14 April 2006
The Pivot transformation makes a normalized data set into a less normalized but more compact version by pivoting the input data on a column value. For example, a normalized Orders data set that lists customer name, product, and quantity purchased typically has multiple rows for any customer who purchased multiple products, with each row for that customer showing order details for a different product. By pivoting the data set on the product column, the Pivot transformation can output a data set with a single row per customer. That single row lists all the purchases by the customer, with the product names shown as column names, and the quantity shown as a value in the product column. Because not every customer purchases every product, many columns may contain null values.
When a dataset is pivoted, input columns perform different roles in the pivoting process. A column can participate in the following ways:
• The column is passed through unchanged to the output. Because many input rows can result only in one output row, the transformation copies only the first input value for the column.
• The column acts as the key or part of the key that identifies a set of records.
• The column defines the pivot. The values in this column are associated with columns in the pivoted dataset.
• The column contains values that are placed in the columns that the pivot creates.
The following diagram shows a data set before the data is pivoted on the Product column.
The following diagram shows a data set after the data has been pivoted on the Product column.
To pivot data efficiently, which means creating as few records in the output dataset as possible, the input data must be sorted on the pivot column. If the data is not sorted, the Pivot transformation might generate multiple records for each value in the set key, which is the column that defines set membership. For example, if the dataset is pivoted on a Name column but the names are not sorted, the output dataset could
have more than one row for each customer, because a pivot occurs every time that the value in Name changes.
The input data might contain duplicate rows, which will cause the Pivot transformation to fail. "Duplicate rows" means rows that have the same values in the set key columns and the pivot columns. For example, take the data set shown in the diagram before it is pivoted on the Product column, and add a row with Kate in the Cust column and Soda in the Product column. These duplicate values would cause the Pivot transformation to fail, regardless of the quantity in the Qty column. To avoid failure, you can either configure the transformation to redirect error rows to an error output or pre-aggregate values to ensure there are no duplicate rows. For example, in the sample data set, you could sum the values in the Qty column by customer and product.
The Pivot transformation uses the properties on its input and output columns to define the pivot operation.
The Pivot transformation includes the PivotKeyValue custom property. This property can be updated by a property expression when the package is loaded. For more information, see Integration Services Expression Reference, Using Property Expressions in Packages, and Transformation Custom Properties.
This transformation has one input, one regular output, and one error output.
Configuring the Sample Dataset
The sample dataset shown in the diagram was configured as follows. The PivotUsage property of the Cust column was set to 1, to indicate that it is a set key column. The PivotUsage property of the Product input column was set to 2, to indicate that a column must be created for each product. The PivotUsage property of the Qty input column was set to 3, to indicate that quantity values are placed into the pivot column. The transformation output was configured to include six columns.
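The duplicate-row failure and the pre-aggregation fix can be illustrated with a small Python sketch. The rows are hypothetical and are not the data from the Books Online diagram. Two rows share the (Kate, Soda) pair, so they would be duplicates in the sense defined above.

```python
from collections import Counter, defaultdict

# Hypothetical (Cust, Product, Qty) rows; not the Books Online diagram data.
rows = [
    ("Kate", "Soda", 6),
    ("Kate", "Ham", 2),
    ("Kate", "Soda", 4),   # same (Cust, Product) pair again: a duplicate row
    ("Fred", "Beer", 12),
]


def duplicate_keys(rows):
    """Return the (set key, pivot key) pairs that occur more than once."""
    counts = Counter((cust, product) for cust, product, _ in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def pre_aggregate(rows):
    """Sum Qty per (Cust, Product) so each pair appears exactly once, and
    return the rows sorted by customer, ready for the Pivot step."""
    totals = defaultdict(int)
    for cust, product, qty in rows:
        totals[(cust, product)] += qty
    return sorted((cust, product, qty) for (cust, product), qty in totals.items())


print(duplicate_keys(rows))   # [('Kate', 'Soda')]
print(pre_aggregate(rows))    # [('Fred', 'Beer', 12), ('Kate', 'Ham', 2), ('Kate', 'Soda', 10)]
```

Summing is only one way to resolve duplicates. Whether it is the right one depends on what Qty means in your data.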
The columns, which can be added by using the Advanced Editor dialog box, were named Cust, Ham, Soda, Milk, Beer, and Chips. The PivotKeyValue property of the Ham column was set to Ham, to indicate that the transformation should look for that value in the input column. Similarly, the PivotKeyValue property of the Soda column was set to Soda, and so on. Columns in the transformation input were then mapped to columns in the output. The SourceColumn property of the Cust column was configured to use the lineage identifier of the Cust input column. The SourceColumn properties of the Ham, Soda, Milk, Beer, and Chips columns were configured to use the lineage identifier of the Qty input column. Another way to configure this would be to set the SourceColumn property of the Ham, Soda, Milk, Beer, and Chips columns to -1, which would insert the value True instead of the data value. For example, instead of the values 12 and 24, the Beer column would then contain the value True, to indicate only that the customer had purchased the product, instead of showing the quantity purchased. The rows in the transformation output contain the values from the Cust and Qty input columns.
Pivot Options
You set the PivotUsage property of the input columns to specify the role each column performs in the pivoting process. The valid values of PivotUsage are 0, 1, 2, and 3. The following table describes the PivotUsage options.
Option  Description
0       The column is not pivoted, and the column values are passed through to the transformation output.
1       The column is part of the set key that identifies one or more rows as part of one set. All input rows with the same set key are combined into one output row.
2       The column is a pivot column. At least one column is created from each column value.
3       The values from this column are placed in columns that are created as a result of the pivot.
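The two ways of filling the product columns described in the sample configuration can be compared in a short Python sketch. Copying Qty corresponds to SourceColumn set to the Qty lineage ID, and storing True corresponds to SourceColumn set to -1. This only models the described results, not SSIS internals. The Beer values 12 and 24 echo the example in the text; which customer bought them, and the remaining rows, are made up.

```python
from itertools import groupby

PRODUCTS = ("Ham", "Soda", "Milk", "Beer", "Chips")


def pivot_customers(rows, flag_only=False):
    """Pivot (Cust, Product, Qty) rows, assumed sorted by Cust, into one row
    per customer. flag_only=False copies Qty into the product column;
    flag_only=True stores True instead, mimicking SourceColumn = -1."""
    result = []
    for cust, group in groupby(rows, key=lambda r: r[0]):
        out = {"Cust": cust, **{p: None for p in PRODUCTS}}
        for _, product, qty in group:
            if product in PRODUCTS:
                out[product] = True if flag_only else qty
        result.append(out)
    return result


# Hypothetical input, already sorted by customer.
rows = [("Fred", "Beer", 12), ("Fred", "Milk", 1), ("Kate", "Beer", 24), ("Kate", "Soda", 6)]
print(pivot_customers(rows))
print(pivot_customers(rows, flag_only=True))
```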
Configuring the Pivot Transformation
You can set properties through SSIS Designer or programmatically.
For more information about the properties that you can set in the Advanced Editor dialog box or programmatically, click one of the following topics:
• Common Properties
• Transformation Custom Properties
For more information about how to set the properties, click one of the following topics:
• How to: Set the Properties of a Data Flow Component in the Properties Window
• How to: Set the Properties of a Data Flow Component Using the Advanced Editor
Unpivot Transformation
Updated: 14 April 2006
The Unpivot transformation makes an unnormalized dataset into a more normalized version by expanding values from multiple columns in a single record into multiple records with the same values in a single column. For example, a dataset that lists customer names has one row for each customer, with the products and the quantity purchased shown in columns in the row. After the Unpivot transformation normalizes the data set, the data set contains a different row for each product that the customer purchased.
The following diagram shows a data set before the data is unpivoted on the Product column.
The following diagram shows a data set after it has been unpivoted on the Product column.
Under some circumstances, the unpivot results may contain rows with unexpected values. For example, if the sample data to unpivot shown in the diagram had null values in all the Qty columns for Fred, then the output would include only one row for Fred, not five. The Qty column would contain either null or zero, depending on the column data type.
The Unpivot transformation includes the PivotKeyValue custom property. This property can be updated by a property expression when the package is loaded. For more information, see Integration Services Expression Reference, Using Property Expressions in Packages, and Transformation Custom Properties.
This transformation has one input and one output. It has no error output.
Configuring the Unpivot Transformation
You can set properties through SSIS Designer or programmatically.
For more information about the properties that you can set in the Unpivot Transformation Editor dialog box, click one of the following topics:
• Unpivot Transformation Editor
For more information about the properties that you can set in the Advanced Editor dialog box or programmatically, click one of the following topics:
• Common Properties
• Transformation Custom Properties
For more information about how to set the properties, click one of the following topics:
• How to: Set the Properties of a Data Flow Component Using a Component Editor
• How to: Set the Properties of a Data Flow Component in the Properties Window
• How to: Set the Properties of a Data Flow Component Using the Advanced Editor
Pivot and UnPivot with SSIS
By: Dinesh Asanka, Nov 28, 2007
Next, we need to derive the Quarter. Even though we could modify the initial T-SQL to return the Quarter, I have used the Derived Column data flow transformation. The following expression is used to derive the Quarter:
MONTH(OrderDate) >= 1 && MONTH(OrderDate) <= 3 ? 1 :
MONTH(OrderDate) >= 4 && MONTH(OrderDate) <= 6 ? 2 :
MONTH(OrderDate) >= 7 && MONTH(OrderDate) <= 9 ? 3 :
MONTH(OrderDate) >= 10 && MONTH(OrderDate) <= 12 ? 4 : 0
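The nested conditional maps months 1-3, 4-6, 7-9, and 10-12 to quarters 1-4 and anything else to 0. The same mapping can be written arithmetically, as in the Python sketch below, which makes it easy to check the expression's intent. In T-SQL, which the author mentions as an alternative, DATEPART(quarter, OrderDate) returns the same 1-4 value for a valid date.

```python
def quarter_from_month(month: int) -> int:
    """Return 1-4 for months 1-12 and 0 otherwise, matching the nested
    conditional expression above."""
    if 1 <= month <= 12:
        return (month - 1) // 3 + 1
    return 0


print([quarter_from_month(m) for m in range(1, 13)])
# [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
```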
We now need to group the above data by Category and Quarter. We can use an Aggregate transformation configured to group by Name and intQtr.
Next, we need to add a Sort transformation; here I have used the category to sort. The key column must be sorted, otherwise the pivot will not work properly. To see the data up to this point, you can add a data viewer. The screenshot below shows the data set you should be getting, which is the data set we need to pivot.
We have now reached the core part of this article: pivoting. The configuration of the Pivot transformation is not exactly straightforward. On the Input Columns tab of the Pivot transformation, you need to select the columns that you will use in the pivot operation, which in this case are all three available columns. The next most important tab is the Input and Output Properties tab, pictured below.
For input columns, we need to configure the PivotUsage property:
Option  Description
0       The column is not pivoted, and the column values are passed through to the transformation output.
1       The column is part of the set key that identifies one or more rows as part of one set. All input rows with the same set key are combined into one output row.
2       The column is a pivot column. At least one column is created from each column value.
3       The values from this column are placed in columns that are created as a result of the pivot.
Source: Books Online, SQL Server 2005
According to the above table, the Name column should have option 1, intQtr should have option 2, and OrderQty should have option 3 as its PivotUsage value.
SSIS: UNPIVOT Transformation
Turning some columns into rows was one of the tasks I had to do recently. Even though I couldn't use the SSIS UNPIVOT transformation for that, I had a chance to play with it. As it is a really useful data flow item for some operations, I thought I would write a post on it. Given below is the data contained in a text file:
ProjectName  equipments  transportation  rental  software  hardware
CCN          54632.56    78433.00        9876    0         0
FX5          100547.55   205465.00       99526   78000     45465
Assume we need to load the above information structured as below:
ProjectName  ExpenseType     Amount
CCN          equipments      54632.56
CCN          hardware        0.00
CCN          rental          9876.00
CCN          software        0.00
CCN          transportation  78433.00
FX5          equipments      100547.55
FX5          hardware        45465.00
FX5          rental          99526.00
FX5          software        78000.00
FX5          transportation  205465.00
Simply create an SSIS package and add the necessary source (for the text file) and destination items. Then add an UNPIVOT transformation item and connect the source output path to it. Open the Unpivot Transformation Editor and select equipments, transportation, rental, software, and hardware as Input Columns. Do not select ProjectName. Set the Destination Column to "Amount" for all Input Columns. The Pivot Key Value will be the same as the Input Column name. Enter "ExpenseType" as the Pivot key value column name. Connect the output of the UNPIVOT transformation item to the destination. It is done!
Since my requirement was a little different, I had to load the data into a SQL Server temp table and use the T-SQL UNPIVOT command. For a scenario like this one, though, this method can easily be applied. You may need to add a Data Conversion item to convert the data if the destination is SQL Server.
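The UNPIVOT configuration above can be mirrored in a short Python sketch. Each non-key column becomes a row, its column name becomes the ExpenseType value (the Pivot Key Value), and the cell value goes into Amount. The sketch assumes the text file is whitespace-delimited, which the post does not state. Decimal is used so that currency values print exactly.

```python
from decimal import Decimal

# Contents of the text file from the post; whitespace-delimited columns are an
# assumption of this sketch (the post does not state the file's delimiter).
TEXT = """ProjectName equipments transportation rental software hardware
CCN 54632.56 78433.00 9876 0 0
FX5 100547.55 205465.00 99526 78000 45465"""


def unpivot(text, key_column="ProjectName", pivot_key_column="ExpenseType", value_column="Amount"):
    """Emit one row per (project, expense column) cell; the input column name
    becomes the pivot key value, as in the UNPIVOT settings described above."""
    lines = text.splitlines()
    header = lines[0].split()
    result = []
    for line in lines[1:]:
        record = dict(zip(header, line.split()))
        for column in header:
            if column != key_column:
                result.append({
                    key_column: record[key_column],
                    pivot_key_column: column,
                    value_column: Decimal(record[column]),
                })
    return result


for row in sorted(unpivot(TEXT), key=lambda r: (r["ProjectName"], r["ExpenseType"])):
    print(row["ProjectName"], row["ExpenseType"], f'{row["Amount"]:.2f}')
```

Sorted by project and expense type, the printed lines reproduce the target listing above, from "CCN equipments 54632.56" to "FX5 transportation 205465.00".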
How to convert a row to a column
Thursday, August 02, 2007 1:21 PM
Hi, I have the requirement stated below: convert one row in a table to a column.
Input:
Name     John      Cary
ID       12        1
Date     1/1/1900  1/1/1900
Profile  Admin     Admin
Manager  12        12
Output:
Name  ID  Date      Profile  Manager
John  12  1/1/1900  Admin    12
Cary  1   1/1/1900  Admin    12
I think I have to use the Pivot transformation in the data flow, but I am not sure how to configure it. Can anyone suggest how to implement or configure this? Thanks in advance!
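What the question asks for is a plain transpose: rows become columns and columns become rows. There is no set key and no aggregation involved, unlike a pivot in the SSIS sense. A minimal Python sketch using the Input grid from the question:

```python
# The "Input" grid from the question: one attribute per row, one person per column.
grid = [
    ["Name", "John", "Cary"],
    ["ID", "12", "1"],
    ["Date", "1/1/1900", "1/1/1900"],
    ["Profile", "Admin", "Admin"],
    ["Manager", "12", "12"],
]

# zip(*grid) yields the columns of the grid, i.e. its transpose.
transposed = [list(column) for column in zip(*grid)]
for row in transposed:
    print(row)
```

The result matches the requested Output layout, with the first row holding the column names.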
Eric Wisdahl (Moderator)
Thursday, August 02, 2007 5:53 PM
What you appear to be trying to do doesn't make much sense. Would you continue to add columns to the table if you had more than two records? Do you know what type of records you would like to get out? What would tie the records together so that you know all of the values in the "Column" are related? It appears that you really just want to present the data differently, which is a job for the front-end application (C# or VB). If I am misreading your question, here is a brief overview of what a pivot (denormalize) and an unpivot (normalize) would do:
Pivot:
Name,Type,Amount
name1,typeA,2
name1,typeB,3
name2,typeB,1
name3,typeC,4
===>
Name,typeA,typeB,typeC
name1,2,3,0
name2,0,1,0
name3,0,0,4
Notice that the value in the column "Type" is treated as the column name when pivoted. These column names can be set up to differ from the values that cause them to pivot, but most people leave them the same so as not to confuse themselves later.
The value that was stored in the column "Amount" is transferred to the column named after its "Type" value. The record is identified by the key value of Name.
UnPivot:
Name,typeA,typeB,typeC
name1,2,3,0
name2,0,1,0
name3,0,0,4
==>
Name,Type,Amount
name1,typeA,2
name1,typeB,3
name2,typeB,1
name3,typeC,4
Notice that the column name is transferred to the pivoting column "Type", and the value that was stored in that column is moved into the column "Amount". The record is identified by the key value of Name.
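The pivot/unpivot pair above can be checked as a round trip in Python. Following the post, missing cells are filled with 0 when pivoting and zero cells are skipped when unpivoting. That filler convention is the post's illustration: the SSIS Pivot transformation would leave those cells NULL. It also means a genuine zero amount would be lost on the way back, which is a limitation of this sketch.

```python
TYPES = ("typeA", "typeB", "typeC")

# Data from the post above: (Name, Type, Amount).
long_rows = [("name1", "typeA", 2), ("name1", "typeB", 3), ("name2", "typeB", 1), ("name3", "typeC", 4)]


def pivot(rows):
    """One entry per Name; each Type value becomes a column. Missing cells are
    filled with 0 as in the post (SSIS Pivot would leave them NULL)."""
    wide = {}
    for name, type_, amount in rows:
        wide.setdefault(name, {t: 0 for t in TYPES})[type_] = amount
    return wide


def unpivot(wide):
    """Turn every non-zero cell back into a (Name, Type, Amount) row; the
    column name becomes the Type value."""
    return [(name, t, amount) for name, cols in wide.items() for t, amount in cols.items() if amount != 0]


wide = pivot(long_rows)
print(wide)
print(unpivot(wide) == long_rows)  # True: the round trip restores the input
```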
<Rant>
NOTE: There have been a few questions lately on how you would pivot multiple rows, which usually look something like the following:
Name,T1,D1,T2,D2
a,txt1,01/01/2000,txt2,01/01/2007
b,txt7,02/02/2010,txt3,08/08/2004
and they would like to "Pivot" to:
Name,T,D
a,txt1,01/01/2000
a,txt2,01/01/2007
b,txt7,02/02/2010
b,txt3,08/08/2004
This IS NOT pivoting! This is subselecting into a generic category. A pivot or unpivot operation must include the column name that is being transferred.
</Rant>
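For comparison, the reshaping described in the rant can be sketched in Python. Each (T, D) column pair becomes its own row, and the column names T1, D1, T2, D2 are simply discarded rather than carried into the data, which is the point the rant makes about why this is not an unpivot. The function name `split_pairs` is hypothetical.

```python
# Rows from the rant above: Name followed by repeated (T, D) column pairs.
wide_rows = [
    ("a", "txt1", "01/01/2000", "txt2", "01/01/2007"),
    ("b", "txt7", "02/02/2010", "txt3", "08/08/2004"),
]


def split_pairs(rows):
    """Emit one (Name, T, D) row per (T, D) pair; no source column name is
    kept in the output. A trailing unpaired value would be ignored."""
    out = []
    for name, *rest in rows:
        for i in range(0, len(rest) - 1, 2):
            out.append((name, rest[i], rest[i + 1]))
    return out


for row in split_pairs(wide_rows):
    print(",".join(row))
```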
Thursday, August 02, 2007 6:56 PM
I have an Excel workbook with many sheets, and each sheet has a table with a different structure. The first sheet has a list of all the tables (like a table of contents). I will get one table name at a time from the contents sheet and go to the respective sheet, which holds the record information.
The first row of that sheet contains the column information, and based on it I have to create a table dynamically.
What I am trying to do is get the column names (the number of columns and their names change frequently) and build a table dynamically in the database.
For this, I need to get the column names from the first row of the Excel sheet and convert them into a single column, so that I can loop over them one by one and dynamically generate a DDL script for the table. The diagram above illustrates how I want to unpivot the table.
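Once the header names have been turned into a single column, the loop that builds the DDL script could look like the Python sketch below. The table name `ImportedSheet` and the uniform NVARCHAR(255) column type are hypothetical assumptions, since a header row carries no type information. Bracket-quoting the identifiers, with any "]" doubled as SQL Server requires, keeps spreadsheet names that contain spaces or special characters usable as column names.

```python
from typing import Sequence


def quote_identifier(name: str) -> str:
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def create_table_ddl(table_name: str, column_names: Sequence[str],
                     column_type: str = "NVARCHAR(255)") -> str:
    """Build a CREATE TABLE statement with one column per header name.
    Giving every column the same type is an assumption of this sketch."""
    if not column_names:
        raise ValueError("at least one column name is required")
    columns = ",\n".join(f"    {quote_identifier(c)} {column_type}" for c in column_names)
    return f"CREATE TABLE {quote_identifier(table_name)} (\n{columns}\n);"


# Header names from the example output in the question; the table name is hypothetical.
print(create_table_ddl("ImportedSheet", ["Name", "ID", "Date", "Profile", "Manager"]))
```

The generated script would still need to be executed against the database, for example from an Execute SQL Task, which this sketch does not cover.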