SAS: Managing Memory and Optimizing System Performance
Jacek Czajkowski
09/29/2008

Optimizing System Performance

Optimizing System Performance consists of managing the interplay of the following three critical computer resources:
- I/O
- Memory
- CPU time

Definitions

- Performance Statistics are measurements of the total input and output operations (I/O), memory, and CPU time that your system uses to process SAS programs. You can obtain these statistics by using SAS system options that help you measure your job's initial performance and determine how to improve it.
- System Performance is measured by the overall amount of I/O, memory, and CPU time used to process individual DATA or PROC steps. By using certain techniques and SAS system options you can reduce or reallocate your usage of these three critical resources to improve system performance. While you may not be able to take advantage of every technique for every situation, you can choose the ones that are best suited for a particular situation.

System Options

- Options STIMER;

NOTE: The SAS System used:
      real time                      0.16 seconds
      cpu time                       0.09 seconds

- Options FULLSTIMER;

NOTE: DATA statement used:
      real time                      1.16 seconds
      user cpu time                  0.01 seconds
      system cpu time                0.02 seconds
      Memory                         1162k
      Page Faults                    0
      Page Reclaims                  2619
      Page Swaps                     0
      Voluntary Context Switches     81
      Involuntary Context Switches   6
      Block Input Operations         0
      Block Output Operations        0
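A minimal sketch of how these statistics might be requested in a program. The data set name work.demo and the loop that fills it are hypothetical and exist only to give the log a step to report on.

```sas
/* Turn on the detailed resource statistics for every step that follows. */
options fullstimer;

/* Hypothetical workload: build a small data set so the log has a step to report. */
data work.demo;
  do i = 1 to 100000;
    x = ranuni(12345);
    output;
  end;
run;

/* Return to the shorter STIMER report (real time and cpu time only). */
options nofullstimer stimer;
```

The statistics appear in the SAS log after each DATA or PROC step, so the two report styles can be compared by running the same step under each setting.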

Interpreting the Performance Statistics

- Real time represents the clock time it took to execute a job or step; it is heavily dependent on the capacity of the system and the current load. As more users share a particular resource, less of that resource is available to you.
- CPU time represents the actual processing time required by the CPU to execute the job, exclusive of capacity and load factors. If you must wait longer for a resource, your CPU time will not increase, but your real time will increase.
- It is not advisable to use real time as the only criterion for the efficiency of your program because you cannot always control the capacity and load demands on your system. A more accurate assessment of system performance is CPU time, which decreases more predictably as you modify your program to become more efficient.

Interpreting the Performance Statistics

Description of FULLSTIMER Statistics

Statistic                      Description
Real Time                      the amount of time spent to process the SAS job. Real time is also referred to as elapsed time.
User CPU                       the CPU time spent to execute your SAS code.
System CPU                     the CPU time spent to perform system overhead tasks on behalf of the SAS process.
Memory                         the amount of memory required to run a step.
Page Faults                    the number of pages that SAS tried to access but were not in main memory and required I/O activity.
Page Reclaims                  the number of pages that can be accessed without I/O activity.
Page Swaps                     the number of times a process was swapped out of main memory.
Voluntary Context Switches     the number of times that the SAS process had to give up the CPU because of a resource constraint such as a disk drive.
Involuntary Context Switches   the number of times that the operating system forced a process into an inactive state.
Block Input Operations         the number of I/O operations performed to read the data into memory.
Block Output Operations        the number of I/O operations performed to write the data to file.

Optimizing I/O

- I/O is one of the most important factors for optimizing performance. Most SAS jobs consist of repeated cycles of reading a particular set of data to perform various data analysis and data manipulation tasks. To improve the performance of a SAS job, you must reduce the number of times SAS accesses disk or tape devices.
- Process only the necessary variables and observations by:
  - using WHERE processing
  - using DROP and KEEP statements
  - using LENGTH statements
  - using the OBS= and FIRSTOBS= data set options
- Reduce the number of times SAS processes the data internally by:
  - creating SAS data sets
  - using indexes
  - accessing data through views
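The techniques listed above can be sketched as follows. This is an illustration only: it assumes the SASHELP.CLASS sample data set that ships with SAS, and the output names (work.teens, work.teens_v) and the variable tall are hypothetical.

```sas
/* Read only the needed variables and observations from the sample data set. */
data work.teens;
  set sashelp.class(keep=name sex age height where=(age >= 13));
  length tall 4;            /* shorter numeric storage length for a 0/1 flag */
  tall = (height > 60);
run;

/* Process only observations 1 through 10 of the input. */
proc print data=sashelp.class(firstobs=1 obs=10);
run;

/* Create a simple index on NAME so that lookups by name can avoid a full scan. */
proc datasets library=work nolist;
  modify teens;
  index create name;
quit;

/* Define a DATA step view: the subset is computed when the view is read,
   not stored as a second copy of the data. */
data work.teens_v / view=work.teens_v;
  set work.teens;
  where sex = 'F';
run;
```

Whether an index or a view actually reduces I/O depends on the size of the data and how often it is read, so it is worth checking the FULLSTIMER output before and after.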

Optimizing I/O

Process more data each time a device is accessed by:

- BUFNO=
  SAS uses the BUFNO= option to adjust the number of open page buffers when it processes a SAS data set. Increasing this option's value can improve your application's performance by allowing SAS to read more data with fewer passes; however, your memory usage increases. Experiment with different values for this option to determine the optimal value for your needs.

- BUFSIZE=
  When the Base SAS engine creates a data set, it uses the BUFSIZE= option to set the permanent page size for the data set. The page size is the amount of data that can be transferred for an I/O operation to one buffer. The default value for BUFSIZE= is determined by your operating environment. Note that the default is set to optimize the sequential access method. To improve performance for direct (random) access, you should change the value for BUFSIZE=.

  Whether you use your operating environment's default value or specify a value, the engine always writes complete pages regardless of how full or empty those pages are. If you know that the total amount of data is going to be small, you can set a small page size with the BUFSIZE= option, so that the total data set size remains small and you minimize the amount of wasted space on a page. In contrast, if you know that you are going to have many observations in a data set, you should optimize BUFSIZE= so that as little overhead as possible is needed. Note that each page requires some additional overhead.

  Large data sets that are accessed sequentially benefit from larger page sizes because sequential access reduces the number of system calls that are required to read the data set. Note that because observations cannot span pages, typically there is unused space on a page.
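As a rough sketch, both options can be given as data set options. The values 64k and 20 and the name work.big are hypothetical and are not tuning recommendations; suitable values depend on the operating environment.

```sas
/* BUFSIZE= is fixed when the data set is created: request a 64K page size. */
data work.big(bufsize=64k);
  set sashelp.class;
run;

/* BUFNO= can be changed each time the data set is read: use 20 page buffers here. */
proc means data=work.big(bufno=20);
  var height weight;
run;

/* PROC CONTENTS reports the page size that was actually used. */
proc contents data=work.big;
run;
```

Running the same step with several BUFNO= values under FULLSTIMER is one way to carry out the experimentation the slide suggests.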

Optimizing I/O

- SASFILE global statement
  The SASFILE global statement opens a SAS data set and allocates enough buffers to hold the entire data set in memory. Once it is read, data is held in memory, available to subsequent DATA and PROC steps, until either a second SASFILE statement closes the file and frees the buffers or the program ends, which automatically closes the file and frees the buffers.

  Using the SASFILE statement can improve performance by:
  - reducing multiple open/close operations (including allocation and freeing of memory for buffers) to process a SAS data set to one open/close operation
  - reducing I/O processing by holding the data in memory
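A minimal sketch of the open, reuse, close pattern described above. The data set work.class_mem is a hypothetical copy of the SASHELP.CLASS sample data; in practice the technique pays off for data sets that are read many times and fit comfortably in memory.

```sas
/* Hypothetical working copy of the sample data. */
data work.class_mem;
  set sashelp.class;
run;

/* Open the data set and read it into memory buffers once. */
sasfile work.class_mem load;

/* These steps read the in-memory copy instead of reopening the file each time. */
proc means data=work.class_mem;
  var height weight;
run;

proc freq data=work.class_mem;
  tables sex;
run;

/* Close the file and free the buffers. */
sasfile work.class_mem close;
```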

Optimizing I/O

- COMPRESS=
  One further technique that can reduce I/O processing is to store your data as compressed data sets by using the COMPRESS= data set option. However, storing your data this way means that more CPU time is needed to decompress the observations as they are made available to SAS. But if your concern is I/O, and not CPU usage, compressing your data may improve the I/O performance of your application.

Long Character Values Dataset
Resource   Uncompressed   Compressed   Change
CPU        4.27 sec       27.46 sec    +23.19 sec
Space      235 MB         54 MB        -181 MB

Mostly Numeric Values Dataset
Resource   Uncompressed   Compressed   Change
CPU        1.17 sec       14.68 sec    +13.51 sec
Space      52 MB          39 MB        -13 MB
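A small sketch of how such a comparison might be set up. The data set names and the generated data are hypothetical; the long, mostly blank character variable is chosen only because that kind of data tends to compress well.

```sas
options fullstimer;

/* Uncompressed version: one long character variable that is mostly blank. */
data work.notes_plain;
  length note $ 200;
  do id = 1 to 50000;
    note = 'short comment';
    output;
  end;
run;

/* Compressed version of the same data. */
data work.notes_small(compress=yes);
  set work.notes_plain;
run;

/* Compare the page counts and file sizes of the two data sets. */
proc contents data=work.notes_plain;
run;

proc contents data=work.notes_small;
run;
```

The CPU cost shows up in the FULLSTIMER statistics of the steps that write and later read the compressed data set.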

Optimizing Memory Usage

- If memory is a critical resource, several techniques can reduce your dependence on increased memory.
- However, most of them also increase I/O processing or CPU usage.
- However, by increasing memory available to SAS, you can decrease processing time because the amount of time that is spent on paging, or reading pages of data into memory, is reduced.

- MEMSIZE=
  Specifies the limit on the total amount of memory to be used by the SAS System. SAS does not automatically reserve or allocate the amount of memory that you specify in the MEMSIZE option. SAS will only use as much memory as it needs to complete a process. For example, a DATA step might only require 20M of memory, so even though MEMSIZE is set to 500M, SAS will use only 20M of memory.

  Setting MEMSIZE to 0 is not recommended except for debugging and testing purposes. To determine the optimal value, run the SAS procedure or DATA step with MEMSIZE set to 0 and the FULLSTIMER option. Note the amount of memory used by the process and then set MEMSIZE to a larger amount.
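The procedure above might look like the following sketch. It assumes a UNIX-style command line, where MEMSIZE is supplied when SAS starts; the program name myjob.sas and the data set work.copy are hypothetical.

```sas
/* MEMSIZE is set when SAS starts, for example on the command line
   (hypothetical program name):
       sas -memsize 0 -fullstimer myjob.sas                              */

/* Inside the session: display the MEMSIZE value currently in effect. */
proc options option=memsize;
run;

/* With FULLSTIMER on, the "Memory" line in the log shows what the step used. */
options fullstimer;

data work.copy;
  set sashelp.class;
run;
```

After noting the Memory figure for the heaviest step, a later run would set MEMSIZE somewhat above that amount rather than leaving it at 0.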

Optimizing Memory Usage

- SORTSIZE=
  Specifies the amount of memory that is available to the SORT procedure.
- SUMSIZE=
  Specifies a limit on the amount of memory that is available for data summarization procedures such as the MEANS, OLAP, REPORT, SUMMARY, SURVEYFREQ, SURVEYLOGISTIC, SURVEYMEANS, and TABULATE procedures.
- MVARSIZE=
  Specifies the maximum size for in-memory macro variable values.
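A short sketch of setting these limits for a session. The sizes shown are hypothetical placeholders and the output name work.class_sorted is an assumption; it uses the SASHELP.CLASS sample data only so that the steps run.

```sas
/* Session-wide limits for sorting and summarization (hypothetical values). */
options sortsize=64M sumsize=32M;

/* SORTSIZE= can also be given on PROC SORT itself to override the session value. */
proc sort data=sashelp.class out=work.class_sorted sortsize=16M;
  by age;
run;

/* A summarization procedure that is subject to the SUMSIZE= limit. */
proc means data=work.class_sorted;
  class age;
  var height;
run;
```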

Optimizing Memory Usage

- MEXECSIZE=
  Specifies the maximum macro size that can be executed in memory. Use the MEXECSIZE option to control the maximum size macro that will be executed in memory as opposed to being executed from a file. The MEXECSIZE option value is the compiled size of the macro. Memory is allocated only when the macro is executed. After the macro completes, the memory is released. If memory is not available to execute the macro, an out-of-memory message is written to the SAS log. Use the MCOMPILENOTE option to write to the SAS log the size of the compiled macro. The MEMSIZE option does not affect the MEXECSIZE option.

- MSYMTABMAX=
  Specifies the maximum amount of memory available to the macro variable symbol table. Once the maximum value is reached, additional macro variables are written out to disk. A value of 0 causes all macro symbol tables to be written to disk. If this option is set too low and the application frequently reaches the specified memory limit, then disk I/O increases. If this option is set too high (on some operating environments) and the application frequently reaches the specified memory limit, then less memory is available for the application, and CPU usage increases.

- REALMEMSIZE=
  Indicates the amount of real memory SAS can expect to allocate. Use the REALMEMSIZE system option to optimize the performance of SAS procedures that alter their algorithms and memory usage. Setting the value of REALMEMSIZE too low might result in less than optimal performance. For better performance, set REALMEMSIZE to the amount of memory (excluding swap space) that is available to the SAS session at invocation.
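The macro-related options can be tried out with a sketch like the one below. The option values and the macro name hello are hypothetical, and MEXECSIZE= is assumed to be available in the SAS release in use.

```sas
/* Report compiled macro sizes in the log and set hypothetical macro memory limits. */
options mcompilenote=all mexecsize=128k msymtabmax=8M;

/* A trivial macro: the note written at compile time reports its compiled size. */
%macro hello(who);
  %put Hello, &who;
%mend hello;

/* The macro runs in memory as long as its compiled size is within MEXECSIZE. */
%hello(world)
```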

SAS System Options Listing

- PROC OPTIONS;
  Lists the current values of all SAS system options. For example, on our system:

  MEMSIZE=67108864 (64 MB)     Specifies the limit on the total amount of memory to be used by SAS (maximum was found to be 4 GB)
  SUMSIZE=0                    Upper limit for data-dependent memory usage during summarization
  SORTSIZE=50331648 (48 MB)    Specifies the amount of memory that is available to the SORT procedure
  MEXECSIZE=65536 (64 KB)      Maximum size for a macro to execute in memory
  MSYMTABMAX=4194304 (4 MB)    Maximum amount of memory allocated for the macro table
  MVARSIZE=32768 (32 KB)       Maximum length of value of macro variable
  REALMEMSIZE=0                Limit on the total amount of real memory to be used by the SAS System
  BUFNO=1                      Number of buffers for each SAS data set
  BUFSIZE=0                    Size of buffer for page of SAS data set

- Setting a value of 0 for some of the options causes the maximum allowable memory to be set.
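A few ways to look up individual settings instead of reading the full listing, shown as a sketch. The option names are the ones discussed in this deck; the values printed will differ from system to system.

```sas
/* Show a single option and its current value. */
proc options option=memsize;
run;

/* Show the memory-related system options as a group. */
proc options group=memory;
run;

/* Retrieve a value in code, for example to write it to the log. */
%put SORTSIZE is %sysfunc(getoption(sortsize));
```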

SAS Typical Memory/CPU Usage

- Creating a copy of a dataset

data basic;
set indat.ganon3_1;
run;

NOTE: There were 982 observations read from the data set GANON3_1.
NOTE: The data set WORK.BASIC has 982 observations and 42068 variables.
NOTE: DATA statement used (Total process time):
      real time                      1.86 seconds
      user cpu time                  0.56 seconds
      system cpu time                0.34 seconds
      Memory                         15820k
      Page Faults                    0
      Page Reclaims                  7215
      Page Swaps                     0
      Voluntary Context Switches     1275
      Involuntary Context Switches   4
      Block Input Operations         0
      Block Output Operations        0

Optimizing CPU Performance

- One can reduce CPU time by:
  - Using More Memory
    Executing a single stream of code takes approximately the same amount of CPU time each time that code is executed. Optimizing CPU performance in these instances is usually a tradeoff. For example, you might be able to reduce CPU time by using more memory, because more information can be read and stored in one operation, but less memory is available to other processes.
  - Reducing I/O
    Because the CPU performs all the processing that is needed to perform an I/O operation, an option or technique that reduces the number of I/O operations can also have a positive effect on CPU usage.

Optimizing CPU Performance

- Other techniques to improve CPU performance:
  - Storing a Compiled Program for Computation-Intensive DATA Steps
    Another technique that can improve CPU performance is to store a DATA step that is executed repeatedly as a compiled program rather than as SAS statements. This is especially true for large DATA step jobs that are not I/O-intensive. A stored compiled DATA step program is a SAS file that contains a DATA step program that has been compiled and then stored in a SAS data library. You can execute stored compiled programs as needed, without having to recompile them. Stored compiled DATA step programs are of member type PROGRAM.

    To store a compiled DATA step:
      DATA data-set-name(s) / PGM=stored-program-name;

    To load a compiled DATA step:
      DATA PGM=stored-program-name;
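The two statements above could be used as in this sketch. The names work.scored and work.score_pgm and the BMI calculation are hypothetical; the input is the SASHELP.CLASS sample data set, where height is in inches and weight in pounds.

```sas
/* Compile the DATA step and store it; the step is not executed at this point. */
data work.scored / pgm=work.score_pgm;
  set sashelp.class;
  bmi = (weight / (height * height)) * 703;   /* hypothetical computation */
run;

/* Later, possibly in another session where the library is assigned:
   execute the stored program without recompiling it. */
data pgm=work.score_pgm;
run;

/* The output data set named at compile time now exists. */
proc print data=work.scored(obs=5);
run;
```

A permanent library would normally be used instead of WORK so that the stored program survives the end of the session.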