
SAS Techies

info@sastechies.com
http://www.sastechies.com
 When you store your data
in a SAS data file, you use
the sum of the data
storage space that is
required for the following:
◦ the descriptor portion
◦ the observations
◦ any storage overhead
◦ any associated indexes.

Techniques:
◦ Eliminate wasted space
◦ Compress datasets
◦ Use Views

SAS Techies 2009 11/13/09 2
 LENGTH variable(s) $ length;
where
◦ variable(s) specifies the name of one or more SAS variables, separated by spaces.
◦ length is an integer from 1 to 32,767 that specifies the length of the variable(s).
 SAS assigns a default length of 8 bytes to character variables.
 SAS character variables store data as 1 character per byte. A SAS character variable can be from 1 to 32,767 bytes in length.
 One way to reduce the
amount of data storage
space that you need is to
reduce the length of
character data, thereby
eliminating wasted space.
Instead of recording the
complete name in the data
set, you could assign a
code/abbreviation.
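
As a minimal sketch of this idea, the step below stores a two-character state code instead of a full state name. The data set WORK.CUSTOMERS and the variable State are hypothetical names chosen for illustration; they are not from the original slides.

```sas
/* Sketch only: WORK.CUSTOMERS and State are hypothetical names. */
data work.customers;
   length State $ 2;   /* 2 bytes per value instead of a longer name field */
   input State $;
   datalines;
NC
CA
TX
;
run;

/* PROC CONTENTS lists the stored length of each variable. */
proc contents data=work.customers;
run;
```

The LENGTH statement is placed before the INPUT statement so that the length is set when the variable is first defined.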

 The default length for a numeric
variable is 8 bytes.

 SAS stores all numeric values using
double-precision floating-point
representation. SAS stores the value
of a numeric variable as multiple
digits per byte. A SAS numeric
variable can be from 2 to 8 bytes or
3 to 8 bytes in length, depending on
your operating environment.

 LENGTH variable(s) length <DEFAULT=n>;
where
◦ the optional DEFAULT=n argument
changes the default number of
bytes that SAS uses to store the
values of any newly created
numeric variables. If you use the
DEFAULT= argument, you do not
need to list any variable(s).
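
A short sketch of reducing numeric length follows. The data set WORK.ORDER_ITEMS and its variables are hypothetical. Note that a shorter numeric length reduces the precision that can be stored, so this approach is best suited to variables that hold small integer values.

```sas
/* Sketch only: WORK.ORDER_ITEMS and its variables are hypothetical. */
data work.order_items;
   length Quantity 4;        /* store Quantity in 4 bytes instead of 8 */
   input Order_ID Quantity;  /* Order_ID keeps the 8-byte default */
   datalines;
1001 3
1002 12
1003 250
;
run;

/* PROC CONTENTS shows the stored length of each variable. */
proc contents data=work.order_items;
run;
```

To apply a shorter length to every newly created numeric variable in a step, LENGTH DEFAULT=4; could be used instead of naming each variable.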

 Compressing a data file is a process that reduces the
number of bytes that are required in order to represent
each observation in a data file.

 Reading from or writing to a compressed file during data
processing requires fewer I/O operations because there
are fewer data set pages in a compressed data file.

 However, in order to read a compressed file, each
observation must be uncompressed. This requires more
CPU resources than reading an uncompressed file.

 Also, in some cases, compressing a file might actually
increase its size rather than decreasing it.

 By default, a SAS data file is not compressed. In uncompressed data files,
◦ each data value of a variable occupies the same number of bytes as any other data value of that variable, and each observation occupies the same number of bytes as any other observation.
◦ character values are padded with blanks.
◦ numeric values are padded with binary zeros.
◦ there is a 16-byte overhead at the beginning of each page.
◦ there is a 1-bit per observation overhead (rounded up to the nearest byte) at the end of each page; this bit denotes an observation's status as deleted or not deleted.
◦ new observations are added at the end of the file. If a new observation won't fit on the current last page of the file, a whole new data set page is added.
◦ the descriptor portion of the data file is stored at the end of the first page of the file.

 Compressed data files
◦ treat an observation as a single string of bytes by ignoring variable types and boundaries.
◦ collapse consecutive repeating characters and numbers into fewer bytes.
◦ contain a 28-byte overhead at the beginning of each page.
◦ contain a 12-byte or 24-byte per-observation overhead following the page overhead. This space is used for deletion status, compressed length, pointers, and flags.

 A data file is not a good candidate for compression if it has
◦ few repeated characters
◦ small physical size
◦ few missing values
◦ short text strings.

 Compression can be beneficial when the data file has one or more of the following properties:
◦ It is large.
◦ It contains many long character values.
◦ It contains many values that have repeated characters or binary zeros.
◦ It contains many missing values.
◦ It contains repeated values in variables that are physically stored next to one another.

◦ To compress a data file, you use
either the COMPRESS= data set
option or the COMPRESS= system
option.

◦ You use the COMPRESS= system
option to compress all data files that
you create during a SAS session.

◦ Similarly, you use the COMPRESS=
data set option to compress an
individual data file.

◦ CHAR or YES uses the Run Length
Encoding (RLE) compression
algorithm, which compresses
repeating consecutive bytes such as
trailing blanks or repeated zeros.

◦ BINARY uses Ross Data Compression
(RDC), which combines run-length
encoding and sliding-window
compression.  
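
The options above can be sketched as follows. The data sets WORK.NOTES and WORK.NOTES_C are hypothetical; the long, mostly blank Comment variable is chosen only because trailing blanks are the kind of repeated bytes that run-length encoding targets.

```sas
/* Sketch only: WORK.NOTES and WORK.NOTES_C are hypothetical names. */

/* Build a test file whose character values are mostly trailing blanks. */
data work.notes;
   length Comment $ 200;
   do ID = 1 to 1000;
      Comment = 'short text';
      output;
   end;
run;

/* Data set option: compress only this output file with RLE. */
data work.notes_c (compress=char);
   set work.notes;
run;

/* System option: compress data files created from this point on. */
options compress=yes;
```

After the second step runs, the SAS log includes a note that reports how much compression changed the size of the data set, which is a quick way to check whether compression was worthwhile for that file.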

◦ Another way to save disk
space is to leave your data
in its original location and
use a SAS data view to
access it.

◦ A SAS data file and a SAS
data view are both types of
SAS data sets. The first
type, a SAS data file,
contains both descriptor
information about the data
and the data values. The
second type, a SAS data
view, contains only
descriptor information
about the data and
instructions on how to
retrieve data values that
are stored elsewhere.
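
As an illustration, a DATA step view could be defined as in the sketch below. The names WORK.ORDERS and WORK.LARGE_ORDERS are hypothetical, and the small source file is created here only so that the example is self-contained.

```sas
/* Sketch only: WORK.ORDERS and WORK.LARGE_ORDERS are hypothetical names. */
data work.orders;
   input Order_ID Total;
   datalines;
1 250
2 1800
3 4200
;
run;

/* The view stores the instructions, not a second copy of the data. */
data work.large_orders / view=work.large_orders;
   set work.orders;
   where Total > 1000;
run;

/* The view's DATA step executes when the view is read. */
proc print data=work.large_orders;
run;
```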

 You can use system options and a statement to control the size and number of data buffers, which in turn can affect your programs' execution times by reducing the number of I/O operations that SAS must perform.

 When you create a SAS data set
using a DATA step,
◦ SAS copies the data from the input
data set to a buffer in memory
◦ one observation at a time is loaded
into the program data vector
◦ each observation is written to an
output buffer when processing is
complete
◦ the contents of the output buffer
are written to the disk when the
buffer is full

◦ Choosing a page/buffer size that is larger than the default can speed up execution time by reducing the number of times that SAS must read from or write to the storage medium.

◦ You can use the BUFNO= system or data set option to control the number of buffers that are available for reading or writing a SAS data set. By increasing the number of buffers, you can control how many pages of data are loaded into memory with each I/O transfer.

◦ The product of BUFNO= and BUFSIZE=, rather than the specific value of either option, determines how much data can be transferred in one I/O operation. Increasing the value of either option increases the amount of data that can be transferred in one I/O operation.

   options bufsize=30720 bufno=10;
   filename orders 'c:\orders.dat';
   data company.orders_fact;
      infile orders;
      <more SAS code>
   run;

 Another way of improving performance is to use the SASFILE statement to hold a SAS data file in memory so that the data is available to multiple program steps. Keeping the data file open reduces open/close operations, including the allocation and freeing of memory for buffers.

   sasfile company.sales load;
   proc print data=company.sales;
      var Customer_Age_Group;
   run;
   proc tabulate data=company.sales;
      class Customer_Age_Group;
      var Customer_BirthDate;
      table Customer_Age_Group,Customer_BirthDate*(mean median);
   run;
   sasfile company.sales close;

 It is important to note that I/O processing is reduced only if there is sufficient real memory. If there is not sufficient real memory, the operating environment might
◦ use virtual memory
◦ use the default number of buffers.
If SAS uses virtual memory, there might be a degradation in performance.

 Before you test the programming techniques, turn on the
SAS system options that report resource usage.

 Execute the code for each programming technique in a
separate SAS session.

 In each programming technique that you are testing, include
only the SAS code that is essential for performing the task.

 Run your benchmarking tests under the conditions in which
your final program will run.

 After testing is finished, consider turning off the options that
report resource usage.
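
The guidelines above might be applied as in the following sketch. It assumes that FULLSTIMER is the resource-reporting option being used (the slides do not name a specific option), and WORK.BENCHMARK is a hypothetical data set standing in for the technique under test.

```sas
/* Sketch only: WORK.BENCHMARK is a hypothetical test data set. */

/* Turn on detailed resource reporting in the SAS log. */
options fullstimer;

/* Include only the code that is essential to the technique being tested. */
data work.benchmark;
   do i = 1 to 1000000;
      x = i * 2;
      output;
   end;
run;

/* Turn detailed reporting off again once testing is finished. */
options nofullstimer;
```

Each step's resource statistics are written to the log after the step completes, so the figures for competing techniques can be compared across separate sessions.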
