Understanding the SORT Procedure
Sorting data is useful for reordering data for reporting, reducing data retrieval time, and enabling BY-group processing. However, PROC SORT is resource-intensive: it uses considerable disk space, memory, I/O, and CPU time. You can use options or techniques with PROC SORT to minimize resource usage.
SAS supports PROC SORT in all operating environments, so PROC SORT can't take advantage of any platform-specific sort enhancements. PROC SORT executes in memory up to the limit imposed by the SORTSIZE= option. It tries to sort entirely in memory, if possible, which minimizes the use of external storage.
By default, PROC SORT executes in parallel using multiple threads. Taking advantage of threaded processing in SAS can help you reduce the elapsed (real) time of a sort. These are some useful terms related to threaded processing:
Term                 Definition
thread               a single, independent flow of execution through a program or within a process
                     multiple units of work that the operating system schedules for concurrent execution
                     computers with multiple CPUs that share the same memory and a thread-enabled operating system; can spawn and process multiple threads simultaneously
parallel processing
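The memory limit and threaded execution described above can both be stated explicitly on the PROC SORT statement. The following is a minimal sketch, not a tuning recommendation: the input table SASHELP.CLASS, the output name WORK.CLASS_SORTED, the BY variables, and the 256M limit are all assumptions chosen for illustration.

```sas
/* Sketch: sort a sample table with an explicit memory limit and threading. */
/* SASHELP.CLASS, WORK.CLASS_SORTED, and the 256M value are illustrative.   */
proc sort data=sashelp.class
          out=work.class_sorted
          sortsize=256M   /* upper bound on the memory PROC SORT may use */
          threads;        /* request threaded sorting explicitly         */
  by age name;
run;
```

If the data does not fit within the SORTSIZE= limit, the sort can be expected to fall back on temporary utility files on disk, so a suitable value depends on the memory actually available to the session.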
You can determine how many CPUs are available in your SAS session by running a PROC OPTIONS step that specifies OPTION=CPUCOUNT. The SAS log then displays the number of available processors.
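A short sketch of that check follows. The second statement is an alternative not mentioned in the text: it reads the same setting with the GETOPTION function and writes it to the log with %PUT.

```sas
/* Write the current CPUCOUNT setting to the SAS log. */
proc options option=cpucount;
run;

/* Alternative (not from the text): read the same value with GETOPTION. */
%put NOTE: CPUCOUNT=%sysfunc(getoption(cpucount));
```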
Improving Sort Performance
When you use the SAS sort, a quick rule of thumb for sort space is four times the size of the SAS data set. Even when you sort in place (sort a data set back to the same name), you need enough space in the library for two copies of the data.
Sorting takes place in the PROC SORT utility work space, which is split between memory and disk. If you can sort the data entirely in memory, the sort runs faster because you avoid writing and reading temporary utility swap files.
Determining how much sort space you need is not an exact science. The amount of space that the SAS sort needs depends on four conditions:
Setting the Sort Indicator and the Validation Indicator
When SAS sorts a data set, it sets a sort indicator. When the sort indicator is YES and you try to re-sort the data by the same BY variables, SAS doesn't perform another sort. Even when PROC SORT creates a separate output data set, if the data is already sorted, the procedure only copies the data set.
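One way to observe this behavior is to sort a table, inspect its descriptor, and then request the same sort again. This is a sketch under assumptions: SASHELP.CLASS is used as sample input, and the WORK data set names are hypothetical. The exact wording of the log notes may vary by SAS release, so none is quoted here.

```sas
/* First sort: SAS sorts the data and sets the sort indicator on the output. */
proc sort data=sashelp.class out=work.class_by_age;
  by age;
run;

/* Inspect the sort information stored in the data set descriptor. */
proc contents data=work.class_by_age;
run;

/* Same BY variable again: SAS should skip the sort and only copy the data. */
proc sort data=work.class_by_age out=work.class_copy;
  by age;
run;
```

The PROC CONTENTS output should include a sort information section for WORK.CLASS_BY_AGE, and the log for the second PROC SORT step should indicate that the input was already sorted.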
Controlling the Sort Order
When you sort data, you can control the sort order in two ways: by specifying a collating sequence, and by specifying whether the observations in a BY group keep their original relative order in the output data set. Controlling the order of observations is also a potential way to improve sort performance.
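Both controls correspond to options on the PROC SORT statement. The sketch below assumes SASHELP.CLASS as sample input and a hypothetical output name. SORTSEQ=ASCII selects a collating sequence, and NOEQUALS tells SAS that observations within a BY group do not have to keep their input order.

```sas
/* Sketch: choose a collating sequence and relax within-group ordering. */
/* SASHELP.CLASS and WORK.CLASS_BY_SEX are illustrative names.          */
proc sort data=sashelp.class
          out=work.class_by_sex
          sortseq=ascii   /* collating sequence used to compare character values */
          noequals;       /* BY-group members need not keep their input order    */
  by sex;
run;
```

Whether NOEQUALS yields a measurable saving depends on the data and the environment. Specify EQUALS instead when later steps rely on the original order within each BY group.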