NESUG 2007 Foundations & Fundamentals

Taming the PROC TRANSPOSE


Matt Taylor, Carolina Analytical Consulting, LLC

ABSTRACT
PROC TRANSPOSE is often misunderstood and seldom used. SAS® users are unsure of the results it will
give and puzzled by syntax that is not particularly intuitive. Many programmers resort to the DATA
step to achieve transposition, believing it gives them better control. This paper is intended to demystify the
procedure by explaining the syntax and providing useful examples of how to use it.

INTRODUCTION
PROC TRANSPOSE is a part of the SAS language that does not get used as much as it should. It is very
helpful when you need to shift data from rows to columns or vice versa. Achieving the same result in a DATA step
can be much more cumbersome to code. Once it is properly explained, PROC TRANSPOSE can save time and
reduce complexity.

TRANSPOSE SYNTAX:
The syntax of PROC TRANSPOSE is somewhat misunderstood among SAS users. It is a little unorthodox
compared with other procedures that analysts use. This section describes the main statements and options of
PROC TRANSPOSE.

BY
This statement transposes data within each combination of the BY variables; the BY variables themselves
are not transposed. The source data must already be ordered by the BY variables, typically by running PROC
SORT first. The DESCENDING option indicates that the data are sorted in descending order, and the NOTSORTED
option accepts data that are grouped but not sorted.
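
As a minimal sketch of the NOTSORTED case, the step below transposes data whose records are grouped by state but not in sorted order. The data set names GROUPED and GROUPED_T and the values are made up for illustration; they are not part of the paper's examples.

```sas
/* Hypothetical data: records for each state are adjacent,
   but the states are not in alphabetical order. */
data grouped;
   input state $ popflag $ count;
   datalines;
NC Pop1 22
NC Pop2 16
GA Pop2 8
GA Pop3 8
;
run;

/* NOTSORTED treats each run of identical BY values as one group,
   so no PROC SORT is needed here. */
proc transpose data=grouped out=grouped_t;
   by state notsorted;
   var count;
   id popflag;
run;
```

If the same state appeared in two separate runs of records, NOTSORTED would treat them as two BY groups, so this approach suits data that are already grouped.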

COPY
This statement copies the values of a variable from the source data set to the data set produced by the
procedure, without transposing them. Because of this, the output data set has the same number of records as
the input data set. Where the transposed data fill fewer records than that, the remaining records show
missing values.
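
A small sketch of the COPY behavior follows. The data set SCORES and its variables are hypothetical and chosen only to make the padding visible: three input records, but only two variables to transpose.

```sas
/* Hypothetical data: three records, two numeric variables. */
data scores;
   input student $ test1 test2;
   datalines;
Ann 90 85
Bob 70 75
Cal 60 65
;
run;

/* TEST1 and TEST2 are transposed into two records;
   STUDENT is copied down unchanged, record by record. */
proc transpose data=scores out=scores_t;
   var test1 test2;
   copy student;
run;

proc print data=scores_t;
run;
```

Under these assumptions the output should have three records, matching the input, with the third record carrying only the copied STUDENT value and missing values elsewhere.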

ID
This statement identifies the variable whose values become the names of the transposed variables. If the
variable in the ID statement is numeric, an underscore is put at the beginning of each name, because SAS
variable names cannot begin with a digit. Without an ID statement, the transposed variables receive the
default names COL1, COL2, etc.
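
To illustrate the underscore rule, here is a brief sketch with a numeric ID variable. The data set BYYEAR and the variable YEAR are assumptions made for this illustration, and the sketch assumes the default variable naming rules are in effect.

```sas
/* Hypothetical data with a numeric ID variable (YEAR). */
data byyear;
   input state $ year count;
   datalines;
GA 2005 8
GA 2006 4
NC 2005 16
NC 2006 9
;
run;

/* Numeric ID values cannot be used as names directly, so the
   transposed variables are named _2005 and _2006. */
proc transpose data=byyear out=byyear_t;
   by state;
   var count;
   id year;
run;
```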

IDLABEL
This statement names a variable in the input data set whose values become the labels of the transposed
variables. In order for this statement to work properly, it must follow the ID statement.
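
The following sketch shows ID and IDLABEL working together. The data set LABELED and the descriptive variable POPDESC are hypothetical; the descriptions are invented text used only to show where the labels come from.

```sas
/* Hypothetical data: POPDESC holds descriptive text for each popflag. */
data labeled;
   infile datalines dlm=',';
   length state $2 popflag $4 popdesc $20;
   input state popflag popdesc count;
   datalines;
GA,Pop2,Segment two,8
GA,Pop3,Segment three,4
;
run;

/* POPFLAG supplies the new variable names,
   POPDESC supplies their labels. */
proc transpose data=labeled out=labeled_t;
   by state;
   var count;
   id popflag;
   idlabel popdesc;
run;

/* PROC CONTENTS lists the labels attached to Pop2 and Pop3. */
proc contents data=labeled_t;
run;
```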

VAR
This statement lists the variables to be transposed. If you do not include a VAR statement, the
procedure transposes all numeric variables that are not listed in another statement, such as BY or ID. If you
want to transpose a character variable, a VAR statement is required.
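
As a short sketch of the character case, the step below transposes a character variable, which only happens because it is listed in the VAR statement. The data set RATINGS and the variable GRADE are illustrative assumptions.

```sas
/* Hypothetical data: GRADE is a character variable. */
data ratings;
   input state $ popflag $ grade $;
   datalines;
GA Pop2 A
GA Pop3 B
NC Pop2 C
NC Pop3 A
;
run;

/* Without the VAR statement nothing would be transposed here,
   because the data set has no numeric variables. */
proc transpose data=ratings out=ratings_t;
   by state;
   var grade;
   id popflag;
run;
```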

DATA=
This option specifies the input data set.

LABEL=
This option allows you to choose a name for the automatic variable _LABEL_ created by the procedure. In
many cases this variable is dropped from the final results.

NAME=
This option allows you to choose a name for the automatic variable _NAME_ created by the procedure. When a
problem involves multiple variables in the VAR statement, this variable becomes important because it
identifies which variable each output record represents. Otherwise it can generally be
dropped.

OUT=
This option names the output data set that holds your results. If you do not specify an output data set in the
code, the results are put into a default data set such as DATA1, following the SAS DATAn naming convention.

PREFIX=
This option adds a string to the beginning of each transposed variable name. By default the prefix is
COL, as mentioned in the ID description. The option can be used together with an ID variable or with the
default numbering.
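
A minimal sketch of PREFIX= combined with a numeric ID variable is shown below. The data set MONTHLY, the variable MONTH and the prefix MON are assumptions chosen for the illustration.

```sas
/* Hypothetical data with a numeric ID variable (MONTH). */
data monthly;
   input state $ month count;
   datalines;
GA 1 8
GA 2 4
NC 1 16
NC 2 9
;
run;

/* The prefix is placed in front of each ID value,
   so the transposed variables are named mon1 and mon2. */
proc transpose data=monthly out=monthly_t prefix=mon;
   by state;
   var count;
   id month;
run;
```

Because the prefix already starts the name with a letter, no leading underscore is needed for the numeric ID values.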

EXAMPLE 1 – SIMPLE TRANSPOSE


We start with a simple example of the transpose procedure. The data we begin with look like the results
of a typical PROC FREQ. They include a state, a popflag value and a count for each combination.

State Popflag COUNT

DC Pop2 6
DC Pop3 2
DC Pop4 3
DE Pop2 6
DE Pop3 5
DE Pop4 6
FL Pop2 8
FL Pop3 6
FL Pop4 6
GA Pop2 8
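
To try the example, the listing above can be read into a data set with a DATA step such as the sketch below. Only the ten records shown are included; the paper's results contain more states, so the listing appears to be an excerpt. The label on COUNT is an assumption that mimics PROC FREQ output.

```sas
/* Recreates the records listed above as data set TR1. */
data tr1;
   input state $ popflag $ count;
   label count = 'Frequency Count';   /* assumed, as PROC FREQ would assign */
   datalines;
DC Pop2 6
DC Pop3 2
DC Pop4 3
DE Pop2 6
DE Pop3 5
DE Pop4 6
FL Pop2 8
FL Pop3 6
FL Pop4 6
GA Pop2 8
;
run;
```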

Our desired output is to transpose the value of count for each state. We would also like the columns to be titled
with the values of popflag so that the data is clearly labeled. The following code illustrates how this would be done:

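/* Sort first: BY-group processing expects the data ordered by state. */
proc sort data=tr1;
   by state;
run;

/* One output record per state, one column per popflag value. */
proc transpose data=tr1 out=tr2;
   by state;      /* transpose within each state */
   var count;     /* values to spread across the columns */
   id popflag;    /* values that name the new columns */
run;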

The variable we desire to transpose is count and therefore goes into the VAR statement. The title desired for the
transposed columns is the popflag field, which goes in the ID statement. Because we would like the transpose to
occur for each value of state, state goes into the BY statement. The results of this code look like the following:

Obs  State  _NAME_  _LABEL_          Pop2  Pop3  Pop4  Pop1

1    DC     COUNT   Frequency Count     6     2     3     .
2    DE     COUNT   Frequency Count     6     5     6     .
3    FL     COUNT   Frequency Count     8     6     6     .
4    GA     COUNT   Frequency Count     8     8     4     .
5    NC     COUNT   Frequency Count    16     9     2    22
6    PA     COUNT   Frequency Count     2     .     .     .
7    SC     COUNT   Frequency Count    12    14     3     .
8    VA     COUNT   Frequency Count    25    12     5    10

Note that the default variables _NAME_ and _LABEL_ were created by the procedure to indicate which variable was
transposed. If your ID variable were numeric, SAS would automatically put an underscore in front of each value
to conform to SAS rules on naming conventions.
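
For comparison, here is a rough sketch of how the same result might be built in a DATA step, the alternative mentioned in the introduction. It assumes TR1 is already sorted by state, and the output name TR2_DS is hypothetical. Note that every popflag value has to be known in advance and written into the code.

```sas
/* DATA step version of the transpose: one record per state. */
data tr2_ds(keep=state pop1-pop4);
   set tr1;
   by state;
   array pop{4} pop1-pop4;          /* defines numeric pop1..pop4 */
   retain pop1-pop4;                /* hold values across records */
   if first.state then call missing(of pop{*});
   select (popflag);                /* values must be hardcoded */
      when ('Pop1') pop1 = count;
      when ('Pop2') pop2 = count;
      when ('Pop3') pop3 = count;
      when ('Pop4') pop4 = count;
      otherwise;
   end;
   if last.state then output;       /* write once per state */
run;
```

A new popflag value would require editing this step, whereas the ID statement in PROC TRANSPOSE picks up new values on its own.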

EXAMPLE 2 – A MORE COMPLEX TRANSPOSE

Our beginning data for this example has another dimension. It includes the state, the popflag, a count of accounts
and a sum of the balances those accounts have. Here is its appearance:

State Popflag count balance

DC Pop2 6 23861
DC Pop3 2 3544
DC Pop4 3 15485
DE Pop2 6 24388
DE Pop3 5 19154
DE Pop4 6 26540
FL Pop2 8 42289
FL Pop3 6 33745
FL Pop4 6 32695
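
The records listed above can be loaded with a DATA step like the following sketch. Only the nine records shown are included; the results later in this example also contain GA, so the listing appears to be an excerpt. The label on COUNT is an assumption, added so that the _LABEL_ variable exists to be dropped.

```sas
/* Recreates the records listed above as data set TEST1. */
data test1;
   input state $ popflag $ count balance;
   label count = 'Frequency Count';   /* assumed label */
   datalines;
DC Pop2 6 23861
DC Pop3 2 3544
DC Pop4 3 15485
DE Pop2 6 24388
DE Pop3 5 19154
DE Pop4 6 26540
FL Pop2 8 42289
FL Pop3 6 33745
FL Pop4 6 32695
;
run;
```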

The desired output for this procedure is to keep the state value on the left. We wish to transpose all of the numeric
variables in the data set, those being count and balance. The ID variable will be the popflag variable in the data
set. In this example we will be able to utilize the _NAME_ variable to keep the transposed variables straight. The
code would look like this:

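/* Sort first so that BY-group processing works. */
proc sort data=test1;
   by state;
run;

/* Two output records per state: one for count, one for balance. */
proc transpose data=test1 out=test2(drop=_label_) name=metrics;
   by state;
   var count balance;   /* both numeric variables are transposed */
   id popflag;          /* values that name the new columns */
run;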

As with the previous example, the transposed variables are listed in the VAR statement. In this example we use
the NAME= option to title the name column and drop the _LABEL_ field. The resulting data looks like this:

Obs State metrics Pop2 Pop3 Pop4 Pop1

1 DC count 6 2 3 .
2 DC balance 23861 3544 15485 .
3 DE count 6 5 6 .
4 DE balance 24388 19154 26540 .
5 FL count 8 6 6 .
6 FL balance 42289 33745 32695 .
7 GA count 8 8 4 .
8 GA balance 54868 59833 25787 .

Note that the name variable is now titled metrics, while the _LABEL_ field has been dropped. The popflag values
now run across as columns and the transposed variables run down the rows.

EXAMPLE 3 – THE DOUBLE TRANSPOSE

In theory, transposing the same data twice should return you to exactly the same data. However, the procedure
has a few quirks that a programmer can use to advantage. For this example, our starting data has different
popflag values for each state and the corresponding counts.

Obs State Popflag COUNT

1 GA Pop2 8
2 GA Pop3 8
3 GA Pop4 4
4 NC Pop1 22
5 NC Pop2 16
6 NC Pop3 9
7 NC Pop4 2
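
A DATA step such as the sketch below would recreate the seven records listed above. The label on COUNT is an assumption, added so that the _LABEL_ variable dropped in the code that follows actually exists.

```sas
/* Recreates the records listed above as data set TR1. */
data tr1;
   input state $ popflag $ count;
   label count = 'Frequency Count';   /* assumed label */
   datalines;
GA Pop2 8
GA Pop3 8
GA Pop4 4
NC Pop1 22
NC Pop2 16
NC Pop3 9
NC Pop4 2
;
run;
```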

One note about our beginning data is that not all values of popflag are represented in each state. On some
occasions, you would like to report on all values in all states, whether they had population or not. This is where the
double transpose can come in handy. The first transpose is similar to the previous examples. We are transposing the
count variable by state, with popflag as the title of each column.

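/* Sort first so that BY-group processing works. */
proc sort data=tr1;
   by state;
run;

/* First transpose: popflag values become columns. _NAME_ is kept. */
proc transpose data=tr1 out=tr2(drop=_label_);
   by state;
   var count;
   id popflag;
run;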

As can be seen, the resulting data has a placeholder for the popflags without data, therefore achieving the result
we desired. Also note that we kept the _NAME_ variable this time because we will need it in the second transpose.

Obs State _NAME_ Pop2 Pop3 Pop4 Pop1

1 GA COUNT 8 8 4 .
2 NC COUNT 16 9 2 22

The second transpose is an attempt to restore the data that we had originally. The syntax is designed to reverse
the previous procedure. However, since the first transpose added placeholders for the missing popflag values,
they will be kept in the resulting data.

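/* TR2 is already in state order; the sort is kept as a safeguard. */
proc sort data=tr2;
   by state;
run;

/* Second transpose: the Pop columns go back to records. */
proc transpose data=tr2 out=tr3(drop=_label_) name=popflag;
   by state;
   var pop1 pop2 pop3 pop4;   /* columns created by the first transpose */
   id _name_;                 /* its value, COUNT, names the new column */
run;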

The second transpose returns the data to its previous structure but adds the records that were missing
from the original data. This gives the user a complete look at the data for all values of popflag and all states.

Obs State popflag COUNT

1 GA Pop1 .
2 GA Pop2 8
3 GA Pop3 8
4 GA Pop4 4
5 NC Pop1 22
6 NC Pop2 16
7 NC Pop3 9
8 NC Pop4 2

CONCLUSIONS
PROC TRANSPOSE can be a very useful procedure for SAS users. Once you pick up the syntax, it earns a place
in your coding arsenal and makes your life easier whenever you need to shift data between rows and columns.

ACKNOWLEDGMENTS
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS
Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are
trademarks of their respective companies.

CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author at:
Matt Taylor
Carolina Analytical Consulting, LLC
8511 Davis Lake Parkway Ste # C6-285
Charlotte, NC 28269
704-947-8882
taylor_matthew@yahoo.com
www.CACanalytics.com
