
SAS Notes

Cognizant Technology Solutions


Introduction
  1. ORIGIN OF SAS
  2. WHY SAS?
  3. CHARACTERISTICS OF SAS
  4. DATA WAREHOUSING
Basics of SAS Software
  1. SAS DATA SET
  2. A SAMPLE SAS PROGRAM
  3. THE DATA STEP
  3. THE PROC STEP
  4. THE PARSE / EXECUTE CYCLE
  5. DATA IN MEMORY
  6. THE OBSERVATION LOOP
  8. FILE STRUCTURE
  9. SAS DATE & TIME
Rules of SAS Language
  1. KEYWORDS
  2. AUTOMATIC VARIABLES
  3. VARIABLE ATTRIBUTES
  4. VARIABLE LISTS
  5. THE NUMERIC DATA TYPE
  6. OPTIONS
SAS Programming concepts
  I. THE DATA STEP
    Types of DATA steps
      1. Data from an external file
      2. Data in job stream
      3. Data in existing SAS data set
      4. Writing reports
  II. WORKING WITH SAS DATASETS
    Dataset Options
  III. INPUT / OUTPUT STYLES
    1. List Input
    2. Column Input
    3. Formatted Input
    4. Named Input
  IV. WORKING WITH EXTERNAL FILES
    1. INPUT operation
      INFILE
    2. OUTPUT operation
      FILE
    Pointer Controls & Line-Hold Specifiers
  V. DATA STEP STATEMENTS
    1. File handling Statements
    2. Action Statements
    3. Control Statements
    4. Information Statements
  VI. OPERATORS IN SAS
  VII. COMBINING DATA SETS
    Introduction
    1. Concatenating data sets
    2. Interleaving
    3. One to One Reading
    4. One to One Merging
    5. Match-Merging
      Duplicate values of BY variable
      Nonmatching observations
    6. Updating Data sets
      Duplicate values of BY variable
      Non-matched observations
  VIII. ARRAYS
  IX. FUNCTIONS
Procedures
  Append
  Compare
  Contents
  Datasets
  Formats
  Summary or Means
  Print
  SQL
SAS Macro Language
  Macro Variables
  Macros
Some other SAS Products


Introduction
The SAS System began as a software system for data analysis & statistical work. Since then, SAS has
evolved and established its presence in diverse fields. Today, the SAS System's analysis tools range from
simple statistics to specialized analysis for econometrics & forecasting, statistical design, computer
performance evaluation & operations research. SAS finds its highest application in the field of Data
Warehousing & Data Mining.

1. Origin of SAS
SAS originally stood for “statistical analysis system” and many of the characteristics of SAS can be traced
back to its statistical background.
In statistical experiments, a measuring process can be repeated at many different times. Each instance of
measurement is called an observation and different qualities that are measured are called variables.
That’s the source of these two SAS terms and the form of a SAS dataset.
Ideally, these observations are independent, i.e. different observations do not depend on each other. The
data from each observation can be processed independently, without reference to data from other
observations, and the order in which observations are processed does not affect the conclusion. This makes
possible the concept of an observation loop: a computer program repeatedly reads one observation at a
time into memory and extracts the information needed from it. This observation loop is a central part of
the design of the SAS system.
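
The observation loop described above can be sketched with a small DATA step. This is only an illustration: the dataset name READINGS and its variables are made up, and the automatic variable _N_ (covered later in these notes) is used here simply to show that the body of the step runs once per input line.

```sas
/* Sketch only: READINGS, SUBJECT and TEMP are hypothetical names. */
data readings;
    input subject $ temp;   /* read one observation per pass        */
    iteration = _n_;        /* _N_ counts passes through the loop   */
    cards;
A 36.6
B 37.1
C 36.9
;
run;

/* Each input line became one observation; ITERATION shows the pass number. */
proc print data=readings;
run;
```

Because each pass handles one observation independently, the step needs no explicit loop statement.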

2. Why SAS?
SAS System is an integrated system of software products.

Its power, flexibility & ease of use enable you to gain strategic control of all your data processing
needs. The SAS System has a collection of ready-to-use programs called procedures. Combined with
other features of the SAS System, they make possible a variety of applications, from general-purpose
data processing to specialized analysis in many application areas.

It facilitates applications that run on more than one computing environment. SAS applications work
the same, look the same and produce the same results irrespective of your hardware or OS. This is
possible because the SAS System has a layered structure called Multi Vendor Architecture (MVA). This
consists of a host-specific component, which is written specifically for each environment, and a
portable component, which gives it a universal ‘feel’. You can develop SAS applications in one
environment and run them in other environments without any changes.

It can accommodate the skill level of potential users. SAS provides a flexible user interface in the form
of menu-driven or task-oriented interfaces. Through these interfaces, new users can develop applications
practically without learning the syntax of the SAS language.

It provides an exhaustive inventory of application development tools.

3. Characteristics of SAS

The SAS System has a modular design. It involves a large collection of several programs that are
coordinated by a central program called the supervisor.

SAS is an interpreted language but has some characteristics of a compiled one. Rather than being
interpreted and executed one at a time, most SAS statements are grouped into segments called steps
before execution.
SAS is called a step-structured language because it allows only one step to run at a time, one after the
other.
SAS has been called a very high-level language because much of its syntax is even more abstract than
that of most high-level languages. The program code correlates as closely as possible to the ideas of the
programmer and the result he/she seeks to achieve.
SAS has its own storage format, and the SAS language provides high-level access to files in this format.
Data files that the SAS system accesses this way are called SAS datasets. The simplified access to SAS
datasets in SAS syntax eliminates most of the work of programming input & output in SAS
programs. At the same time SAS also provides high-level access to files of other formats through
specialized routines called formats & informats. The input and output capabilities of SAS are still
among the most powerful & flexible of any programming language.
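
As a rough illustration of the informat and format routines mentioned above, the sketch below converts text to a number with an informat and writes it back out with a format. The variable name AMOUNT and the sample value are made up for the example; COMMA and DOLLAR are standard SAS informat/format names.

```sas
/* Sketch only: AMOUNT and the sample text are hypothetical. */
data _null_;
    /* Informat: reads the text '1,234.50' into a numeric value */
    amount = input('1,234.50', comma8.);
    /* Format: controls how the numeric value is written to the log */
    put amount= dollar10.2;
run;
```

The same informats and formats can be used in INPUT and PUT statements when reading from and writing to external files.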

4. Data warehousing
“The goal of data warehousing is to free the information that is locked up in the operational data
bases and to mix it with information from other, often external, sources of data”
Operational systems are systems that help in running the enterprise operations. They are the backbone of
the enterprise, running daily transactions in areas such as “inventory”, “payroll” and “accounting”.
Such systems are indispensable to an organization, as an enterprise cannot operate without them.
These systems are tuned for high performance and quick response time and often need to be extremely
stable and robust.
Informational systems perform the crucial functions of enabling planning, forecasting and other
strategy-related management functions. In a dynamic business environment, the enterprise has to be
geared for the future in order to sustain itself and grow in a healthy manner. The informational systems
are knowledge-based (whereas the operational systems are data-based) and they deal with analyzing data
and helping managers in arriving at decisions.
The significant difference between the operational systems and the informational systems can be seen in
the area of focus of the two systems. An operational system is focussed on a single area while an
informational system has to span a breadth of different areas. This is because an operational system is
concerned with the data and transactions in a particular area while an informational system needs
data from different sources to facilitate decision making. Even if there is an all-encompassing operational
system, it cannot double up as an informational system because its main function is efficiency in
operations. Data used in analysis is typically historical data which is inactive, and this data, if mixed with
operational live data, causes performance degradation of the operational system. Thus informational
systems have to be designed to aid the decision-makers in analyzing and planning for the
future. A Data warehouse effectively performs the function of an informational system on an enterprise
level.
A Data warehouse is a “collection of integrated subject-oriented databases designed to support the DSS
function; where each unit of data is relevant to some moment in time. The data warehouse contains
atomic data and lightly summarized data.”
Most databases are designed to ease data entry, reduce redundancy and speed the retrieval of a single
entity. The data warehouse, on the other hand, is designed for fast retrieval of information & answers.
This means that groups of records will be retrieved, manipulated & analyzed. It may require that data
be accessed from multiple database sources – a collection of integrated subject-oriented
databases.


The data resident in a data warehouse is non-volatile data. Data from the operational systems is triggered to go to the data warehouse when most of the activities on this operational data have been completed.
The data in the data warehouse is usually at a level higher than the data at the operational level, i.e. some analysis or aggregation has already been performed on the operational data. There are certain data items that will be at the same level as that of the operational data (e.g. the grain of the fact in a data warehouse might be the sales on a particular day and this might be the same as the data in an operational system).
Data is typically saved for a long period of time, as the efficiency of analysis improves with the breadth and depth of data available in the data warehouse.

Basics of SAS Software
The core of the SAS system is the Base SAS software. It gives you all the tools you need to make your data useful & meaningful. It consists of
• The SAS language: the programming language used to manage your data.
• Procedures: pre-written computer programs that analyze & process datasets & display results.
• A Macro Facility: to generate & store text strings & communicate information from one program to another.
• A Windowing Environment called the SAS Display Manager System.

The Base SAS software provides tools for:
• Information Storage And Retrieval
• Data Modification And Programming
• Report Writing
• Statistical Analysis
• File Handling

The next few sections discuss the following:
• SAS DATA set
• Data step
• PROC step
• The Parse / Execute Cycle
• Options
• File structure
• SAS date and time

1. SAS DATA set
The data to be used must be in a form understandable to the SAS System. This form is called the SAS Data set. It consists of the Descriptor information & Data values.
• The Descriptor information describes the contents of the SAS dataset to the SAS System.
• The Data values contain the actual data to be analyzed.
The data values are arranged into a rectangular structure of rows (called observations) and columns (called variables). An observation is a collection of data values that usually relate to a single object. A variable is the set of data values that describe a characteristic of the objects.
Missing values are values unavailable to the System. SAS recognizes missing values and has an internal representation for them. This representation is used because SAS requires values for all variables for every observation in the dataset. This ensures the rectangular structure of the data values.
SAS datasets are kept in collections called SAS data libraries. A SAS dataset is identified in a SAS program by a two-level name that identifies the SAS data library & the SAS dataset. A one-level name used for a SAS dataset implies that the default WORK library is being assumed.
All SAS programs consist of a series of statements that, as a group, are designed to accomplish a specific task; these groups are called SAS steps. SAS steps fall into 2 categories: DATA steps and PROC steps.
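
A minimal sketch of dataset naming and missing values, assuming the default setup in which one-level names go to the WORK library. The dataset PEOPLE and its variables are invented for illustration.

```sas
/* Sketch only: PEOPLE, NAME and AGE are hypothetical names. */
data people;                    /* one-level name: stored as WORK.PEOPLE by default */
    input name $ age;
    cards;
Asha 31
Ben .
;
run;

/* Two-level name: library WORK, dataset PEOPLE (the same dataset as above). */
/* Ben's AGE is missing, yet the observation still has a value slot for it,  */
/* which keeps the dataset rectangular.                                      */
proc print data=work.people;
run;
```

In the input data, a single period stands for a missing numeric value.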

DATA steps and PROC steps are the building blocks of all SAS programs. SAS datasets are used to store data between SAS steps.

2. A Sample SAS program

data ht_wt;
input name $ 1-10 sex $ 12 age 14-15 height 17-18 weight 20-22 init_ht 24-25 init_wt 27-29;
wt_loss = init_wt - weight;
ht_loss = init_ht - height;
cards;
John       M 35 76 172 73 175
James      M 32 75 167 73 169
Mary       F 30 68 165 62 154
Ruby       F 32 65 158 58 163
;
run;
proc summary data = ht_wt;
var wt_loss ht_loss;
output out = min_wt min(wt_loss)=;
output out = min_ht min(ht_loss)=;
run;
proc print data = min_wt;
run;
proc print data = min_ht;
run;
endsas;

Note: The variables in SAS dataset ht_wt are name, sex, age, height, weight, init_ht, init_wt, wt_loss and ht_loss.

3. The Data Step
DATA: instructs the SAS system to create a SAS dataset. It signals the beginning of the DATA step and gives a name to the SAS data set you are creating.
INPUT: provides information to the System to organize data into SAS datasets. It describes your input data, giving a name to each variable and identifying its location on the disk or tape file.
CARDS: marks the end of the programming statements & the beginning of data within the same step, as the interpreter stops as soon as it gets to the CARDS statement. Any statement that follows on the same line will be completely ignored. Since there is more than one data line, the data step is executed repeatedly, creating one observation from each input line, until the end of the input data is reached.

RUN: instructs SAS to execute the previous statements.

3. The Proc Step
Once your data are accessible as a SAS dataset, you can analyze the data & write reports using a set of utilities known as SAS Procedures. Procs are specialized application programs that analyze data in a dataset to produce univariate descriptive statistics, frequency tables, cross tabulation tables, tabular reports, charts, plots etc. Other procedures provide ways to manage SAS files. To most SAS users they are the main attraction of the SAS System. There is no compiling feature in PROC steps because PROCs are already compiled programs.
In addition to the data step, the sample program has PROC steps that call two procedures, SUMMARY and PRINT. The first PROC step asks SAS to call a procedure from its library and to execute that procedure, in this case with the SAS data set ht_wt as input. Proc SUMMARY computes the minimum values of ht_loss & wt_loss, the two variables listed in the var statement. Two output SAS datasets are created (min_wt & min_ht), with the minimum value of variable wt_loss in one (min_wt) & the minimum value of variable ht_loss in the other (min_ht). The minimum values of the variables ht_loss & wt_loss are stored in variables of the same names in the corresponding output datasets. The output datasets contain the variables named in the output statement and a few extra identifier variables. _freq_ is an automatic variable created by the summary procedure which gives the number of observations for the current subgroup.
Note: The variables in SAS dataset min_ht are ht_loss, _type_ & _freq_, and in SAS dataset min_wt are wt_loss, _type_ & _freq_.
The PRINT proc produces a report, in table form, of all the data values in the SAS dataset. The report is written to the standard print file. The standard print file is the file ordinarily used to hold text output from programs, whether it is sent to a printer, displayed on screen, or stored for later use. Print files are divided into pages and usually have a title that appears on the first few lines of each page.
Another file that is automatically created and maintained by SAS is the log file. The log file contains the SAS supervisor's step-by-step account of the execution of the program, including program statements, notes, warnings & error messages.

4. The Parse / Execute cycle
1. When the interpreter reads lines from the program file, it parses them into tokens & statements until a complete step is formed.
   • It checks the syntax of each statement.
   • If there is a syntax error, it generates an error message.
   • If it is the kind of statement that is executed immediately, then the statement is executed. Statements executed immediately are called global statements and do not have to be associated with a step.
   • Otherwise, it adds information from the statement to the step being built.
   • It continues processing statements until it reaches the end of the step.
   • Then it executes the step if no errors are reported.
2. Then it parses more lines from the program file until it forms another complete step, and the above process is repeated for the execution of that step.
3. The above process continues until the end of the program is reached.
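
The cycle above can be sketched with a short program that mixes global statements and steps. The dataset name DEMO and the title text are made up; OPTIONS and TITLE are standard global statements.

```sas
/* Sketch only: DEMO and the title text are hypothetical. */
options nocenter;               /* global statement: executed as soon as it is parsed */
title 'Parse/execute sketch';   /* global statement: stays in effect for later steps  */

data demo;                      /* start of a DATA step: statements are collected...  */
    x = 1;
run;                            /* ...and the step executes at this step boundary     */

proc print data=demo;           /* parsing resumes with the next step                 */
run;                            /* the PROC step executes here                        */
```

The global statements take effect immediately, while the statements inside each step wait until the step is complete.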

The scope of the SAS data step begins with the keyword DATA and ends with any one of the following:
• The keyword RUN (or QUIT)
• Another Data step beginning with the keyword DATA
• Another Proc step beginning with the keyword PROC
• End of program code
• The keyword ENDSAS
• CARDS or CARDS4 statement in a data step

A DATA step can also be compiled and stored separately, and then run in a SAS program. The compiled data step is not a machine language program but parsed code called “intermediate code”. Compiling a data step creates a SAS file in a SAS library. The PGM= option specifies a SAS file name where the compiled data step will be stored. The data step is not executed in this case.

DATA ABC;
INFILE ABC;
INPUT A B C D E;
RUN PGM=PROJECT.ABC;

A compiled data step can be executed by using the PGM= option on a DATA statement.

DATA PGM=SAS file;
REDIRECT INPUT compiled SAS dataset name = actual SAS dataset name …;
REDIRECT OUTPUT compiled SAS dataset name = actual SAS dataset name …;
RUN;

You can change the names of the input (SET, MERGE or UPDATE statements) & output (DATA statement) SAS data sets in the compiled data step by using the REDIRECT INPUT & REDIRECT OUTPUT statements. The input SAS datasets that are present when the data step is compiled should have the same variables & attributes as the SAS data sets that will be used when the compiled data step is run, but need not contain any observations. They can be created as:

DATA NEW;
SET ABC;
STOP;
RUN;

where NEW is a dataset with only the Descriptor information of dataset ABC.

5. Data in memory
SAS keeps the values of all variables in a step in a block of memory called the program data vector or PDV. The size of the PDV is fixed, which limits the number of variables a step can have to a few thousand. The PDV and variable attributes represent a modest part of the SAS system's use of memory. In addition, memory can hold:
• Pointers to all the files being used, including the program file, the log file, the standard print file, input files, output files and any libraries being used in the step
• Buffers that contain data read from or written to each file
• The SAS supervisor
• The current step and any procs, functions, CALL routines, informats and formats being used in the step
• File names associated with filerefs or librefs in a FILENAME or LIBNAME statement
• System options
• Array definitions
• Titles
• Macro variable names & values

6. The Observation Loop
The amount of input data read by one repetition of the observation loop is an observation.
During compilation, SAS creates three things:
1. Buffer: Area of memory into which each record of raw data is
   • read when an Input statement executes, or
   • written when a Put statement executes.
2. Program Data Vector: Area of memory where the SAS data set is built, one observation at a time. Values are assigned to the variables in the program data vector during execution. From here the values are written to the output SAS dataset as a single observation. Records coming from another SAS dataset are read directly into the Program Data Vector.
3. Descriptor information: Information the SAS System creates & maintains about each SAS dataset, like data set attributes & variable attributes. The variable attributes include:
   1 Name of the variable
   2 Data type
   3 Length
   4 Label
   5 RETAIN or reinitialize
   6 DROP or KEEP
   7 Initial value assigned, if any
   8 Format or Informat, if any
   9 Position of the variable in the dataset

During execution:
• By default the SAS data step executes once for each observation being created.
• Each time the DATA statement executes, a new iteration of the data step begins. The automatic variable _N_ is set to the next value.
• Statements that read data (INPUT, SET, MERGE, and UPDATE) are executable. They may appear anywhere in the data step, and do not have to be placed right after the DATA statement.
• A raw data reading statement like Input causes a record of data to be read into the input buffer and then into the Program Data Vector.
• Subsequent SAS statements are executed for the current record.
• When the end of the scope of the Data step is detected, the following occur automatically:
  At the bottom of the observation loop:
  1. Output: an observation is written into the new SAS dataset.
  2. Return: the system returns to the top of the Data step in preparation for another iteration.
  At the top of the observation loop:
  3. The observation count _N_ increases by 1.
  4. Values of variables created with Input or assignment statements are reset to missing in the Program Data Vector. Variables that come from a SAS dataset (read with SET, MERGE, or UPDATE statements) are retained and not reset to missing as the program passes through the DATA statement.
  5. The current input file is reset to CARDS and the current output file to LOG.
• The next iteration ensues and the process is repeated.
• The data step terminates when the end-of-file condition is encountered for an INPUT, SET, MERGE or UPDATE on a SAS dataset or raw data file.

The SAS Supervisor does not set variable values to missing at the top of the DATA step for:
1. Variables used on the left side of a sum statement, e.g. variable I is not reset in I + 5. These are initialized to 0.
2. Variables listed in a RETAIN statement.
3. Variables that are array elements, where the array uses temporary variables.
4. Variables used in an I/O statement option for INFILE, SET, MERGE, UPDATE or FILE. These are initialized to 0 or data-dependent values.

Normally, the SET, MERGE, UPDATE and INPUT statements stop the data step when the end of input data is detected. But this does not happen in the following cases:
• The INPUT statement has a trailing @@
• The POINT= option is used with the SET statement for direct access
• The EOF= option is used with the INFILE statement
• The SET, MERGE, UPDATE or INPUT statement executes conditionally
• The RECFM=U option is specified on the INFILE statement
The SAS interpreter does not create an observation loop for steps that do not have INPUT, SET, MERGE, UPDATE or DISPLAY.

8. File Structure
1. Engine: Engines are a set of internal instructions that the SAS System uses to read from and write to files. Engines open files, direct input/output operations and gather descriptive information about files and their contents. The engine uses this information to organize data into the correct logical form: SAS data sets. Every SAS data set and Data Library is accessed through an engine.
2. SAS data Library: It is the logical structure of files accessed by an Engine for processing by the SAS System. SAS Libraries contain SAS data files. They are of two types: Permanent & temporary.
3. Permanent Libraries: They reside on the external storage medium and are not deleted when the SAS session terminates. Permanent libraries are stored till they are specifically deleted. SAS files in permanent libraries are specified by a two-level qualifier where the first qualifier stands for the libref and the second qualifier for the name of the file.

4. Temporary Libraries: They are available only for the current session or job run and are deleted at the end of the session or job run. SAS provides these libraries for files that are created during the session but are not required after the session terminates. The first qualifier need not be specified in this case as it defaults to the temporary WORK library.
5. SAS data Set: It is the logical structure into which the Engine fits data for processing by the SAS System. They are of two types: SAS data files & SAS data views.
6. SAS data file: Contains both the data values and the descriptor information.
7. SAS data View: Obtains the descriptor information or data values or both from other files. Only the information necessary to derive the descriptor information or data values is stored in the file.

9. SAS Date & Time
The SAS System processes calendar date values by converting dates to an integer representing the number of days between January 1st, 1960 and a specified date. Valid SAS dates can be positive or negative numbers and range from 1582 A.D. to 20,000 A.D.

The SAS System processes time by converting it to an integer representing the number of seconds since midnight of the current day. SAS time values are independent of the date.

The SAS System processes datetime by converting it to an integer representing the number of seconds between midnight of January 1st, 1960 and a specified datetime.

The SAS System reads & displays date, time & datetime values through formats & informats. An Informat reads fields according to a specified width and form, while a Format writes or displays values according to a specified width and form.

YEARCUTOFF System Option: This option specifies the first year of a 100-year span used by Informats & functions. Based on this, the SAS System determines the century of dates that are given with a two-digit year.
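The date arithmetic above can be checked with a short sketch. The variable names (d0, d1, t, dt, a, b) and the YEARCUTOFF value of 1950 are chosen here only for illustration and are not part of the original notes.

```sas
/* Date, time and datetime constants are stored as plain numbers. */
data _null_;
  d0 = '01JAN1960'd;            /* day zero of the SAS calendar       */
  d1 = '01JAN2000'd;            /* days elapsed since 01JAN1960       */
  t  = '00:01:00't;             /* seconds since midnight             */
  dt = '01JAN1960:00:01:00'dt;  /* seconds since midnight, 01JAN1960  */
  put d0= d1= t= dt=;           /* log shows: d0=0 d1=14610 t=60 dt=60 */
  put d1= date9. t= time8. dt= datetime18.;  /* same values, formatted */
run;

/* YEARCUTOFF= decides the century of two-digit years. */
options yearcutoff=1950;        /* illustrative span: 1950 to 2049 */
data _null_;
  a = input('01/01/49', mmddyy8.);   /* 49 falls in 2049 */
  b = input('01/01/50', mmddyy8.);   /* 50 falls in 1950 */
  put a= year4. b= year4.;
run;
```

The first PUT statement prints the raw internal values, which is a quick way to confirm that a variable really holds a date and not a character string that merely looks like one.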

Rules of SAS Language
The smallest meaningful unit of a program is a token. The SAS language has three types of tokens: words, symbols & constants. Tokens are grouped into statements.
• A statement ends with ';'.
• Uppercase or lowercase is allowed, as SAS converts everything to uppercase before compiling.
• Statements can begin in any column.
• There are no special continuation characters, and a single statement can flow over to the next line.
• Names in SAS can only be up to 8 characters long (the limit in SAS Version 6; later releases allow up to 32 characters).
• No blanks are allowed in SAS names.
• SAS data set names can only have letters, numbers or '_' in them and cannot start with a number.

1. Keywords
A Keyword is a word that has a particular meaning in SAS syntax. Words that are used as SAS keywords can also be used as names. So, a keyword is identified by its location. SAS statements are named for the words they start with, except for a few statements that do not begin with a keyword, like the assignment & sum statements. Some keywords that begin statements are:

STATEMENT                                       ACTION                    WHERE
ATTRIB, FORMAT, INFORMAT, LABEL, LENGTH         set variable attributes   data & proc steps
BY, WHERE                                       input                     data & proc steps
ABORT, CALL, DELETE, DO, END, ELSE, GOTO, IF,   control flow              data step
  LINK, LOSTCARD, OTHERWISE, RETURN, SELECT,
  STOP, WHEN
INFILE, INPUT, MERGE, SET, UPDATE               input                     data step
DISPLAY, ERROR, FILE, LIST, OUTPUT, PUT         output                    data step
ARRAY, DROP, KEEP, MISSING, RENAME, RETAIN,     miscellaneous             data step
  WINDOW
CLASS, FREQ, ID, MODEL, OUTPUT, VAR, WEIGHT     run proc                  proc step
DATA, PROC                                      begin a step              anywhere
FILENAME, FOOTNOTE, LIBNAME, OPTIONS, TITLE     program parameters        anywhere
DM, PAGE, SKIP, X                               immediate action          anywhere

CARDS, CARDS4, LINES, LINES4, QUIT, RUN         mark end of step          between steps

Some other common keywords are:
AND, BETWEEN, CANCEL, CHARACTER, DEFAULT, DESCENDING, EQ, GE, GROUPFORMAT, GT, IN, LE, LIKE, LT, MAX, MIN, NE, NOBS, NOT, NOTIN, NOTSORTED, NUMERIC, OF, OR, OUT, SAME, TO, UNTIL, WHILE, _PAGE_

Generally, the different classes of objects in SAS programs have different name spaces. That means that you can use the same name for different objects as long as they are not the same kind of object. E.g. you could have a SAS variable TIME in a SAS dataset TIME.
Note: Since variables and arrays share the same name space, you cannot use the same name for a variable & an array in the same step. Also, as only one step runs at a time, you can use the same names for variables, arrays & statement labels in different steps.

2. Automatic Variables
These variables are automatically created by the SAS System in various circumstances. They exist within the Program Data Vector but are not output to the data set being created. Some common automatic variables found in data steps are:
1. _N_ : Denotes the iteration of the Data step. Always created within a Data step with an initial value of 1. Its value is incremented automatically each time the Data step begins a new iteration.
2. _ERROR_ : Initially set to 0. Its value is set to 1 if SAS encounters an error within the Data step.
3. _I_ : If the index variable is omitted from the array definition, then SAS assigns the automatic index variable _I_ by which elements in the array can be accessed.
4. First.<var_name> : Temporary variable created by SAS to identify the beginning of a BY group. When an observation is the first in a BY group, the value of First.<var_name> is set to 1, where <var_name> is the BY variable. For all the other observations in the BY group, its value is set to 0.
5. Last.<var_name> : Temporary variable created by SAS to identify the end of a BY group. When an observation is the last in a BY group, the value of Last.<var_name> is set to 1, where <var_name> is the BY variable. For all the other observations in the BY group, its value is set to 0.
6. _ALL_ : Depending on context, _ALL_ may mean all the datasets or all the variables that are available.
7. _LAST_ : Specifies the most recently created dataset.
8. _NULL_ : Specifies that no SAS dataset is to be created.
9. _DATA_ : Using _DATA_ asks the SAS interpreter to name the new dataset with the next name from the series DATA1, DATA2, DATA3, DATA4 etc.

3. Variable Attributes
SAS variables are of two data types: numeric and character. In addition to their type, SAS variables have these attributes: length, informat, format, and label.

The attributes of the variables are stored in the descriptor information of the SAS dataset.
• Length : The length attribute of a variable is the number of bytes used to store each of its values in a SAS data set. The default length is 8. The LENGTH statement determines the length of a numeric variable only in the data set being created, while for character variables it determines the length of the variable both in the Program Data Vector and in the dataset being created.
• Informat : A variable's informat is the pattern that SAS uses to read data values into the variable. The default informat is w. for numeric variables and $w. for character variables.
• Format : A variable's format is the pattern SAS uses to write each value of a variable. The default format is BEST12. for numeric variables and $w. for character variables.
• Label : The label attribute of a variable is a descriptive label of up to 40 characters that can be printed by certain procedures instead of the variable name.

4. The Numeric Data Type
The SAS programming language has only two data types: numeric and character. This differs from most high-level languages, which have many different numeric data types. By this, the SAS System saves programmers from having to be concerned with the problems of converting between data types. Besides numbers, the numeric data type is used for dates, times and logical values.
This data type corresponds to the double precision or 8-byte real type of other languages. All numeric variables use 8 bytes of memory, but their length attribute determines the amount of storage that will be used for the variable if it is stored in a SAS dataset. Lengths shorter than 8 can be used to save storage space and I/O time, but values might not be precise due to truncation. The shortest length allowed depends on the operating system and can be 2 or 3.

5. Variable lists
Variable lists normally consist of variable names separated by spaces.
• One form of abbreviated variable list uses a hyphen to indicate variable names with a range of numeric suffixes. A1-A12 is the same as A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12.
• A double hyphen indicates a different kind of range of variables. It represents the order in which they appear in memory. If a data step has variables AA BB CC SUM TOTAL defined in that order, then BB--TOTAL represents the variables BB CC SUM TOTAL. This kind of variable list is called a named range.
• A named range can be combined with a special variable list by putting the keyword NUMERIC or CHARACTER between the two hyphens of the named range. BB-NUMERIC-TOTAL includes only the numeric variables located from BB to TOTAL in memory.
• There are three special variable lists:
  _ALL_ represents the list of all variables available, including automatic variables.
  _CHARACTER_ or _CHAR_ represents the list of all character variables only.
  _NUMERIC_ represents the list of all numeric variables only.
• A colon is sometimes used to indicate an alphabetic range of variable names. AQ: would be used to identify all variables whose names begin with the letters AQ.
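The attributes and variable-list forms described above can be tried on a tiny dataset. This is only a sketch: the dataset SCORES and its variables ID, A1-A3 and TOTAL are invented for the illustration.

```sas
data scores;
  length id $ 4;                    /* character length, in bytes        */
  input id $ a1-a3 total;           /* a1-a3 is a numbered range list    */
  format total 6.1;                 /* format attribute                  */
  label total = 'Sum of A1 to A3';  /* label attribute                   */
  cards;
X01 1 2 3 6
X02 4 5 6 15
;
run;

/* The descriptor information lists type, length, format and label. */
proc contents data=scores;
run;

proc print data=scores; var a1-a3;            run;  /* numbered range          */
proc print data=scores; var id--a2;           run;  /* named range by position */
proc print data=scores; var id-numeric-total; run;  /* numeric variables only  */
proc print data=scores; var _numeric_;        run;  /* special variable list   */
proc print data=scores; var a:;               run;  /* names starting with A   */
```

Because a named range follows the order of variables in memory, ID--A2 picks up ID, A1 and A2 here only because ID was declared first by the LENGTH statement.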

6. Options
SAS software uses 3 types of options:
• System options : They are instructions that affect the entire SAS session and control the way it performs operations. These options will be in effect for all DATA and PROC steps in a SAS job or session unless they are re-specified in another OPTIONS statement.
• Statement options : These are specified only in a given SAS statement or statements and affect only that statement or step.
• SAS data set options : These are specified in parentheses following a SAS data set's name and affect only that data set. SAS applies data set options specified with input data sets before it evaluates program statements or applies data set options specified on output data sets.
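As a rough illustration of the three kinds of options, the sketch below uses one of each. The dataset STAFF and its contents are invented for the example.

```sas
options nodate linesize=80;   /* system options: stay in effect for later steps */

data staff;
  input name $ age;
  cards;
Ann 31
Bob 45
Cy 27
;
run;

/* OBS=2 is a data set option (in parentheses after the data set name). */
/* NOOBS is a statement option of the PROC PRINT statement.             */
proc print data=staff(obs=2) noobs;
run;
```

Only the first two observations are printed, yet STAFF itself still holds all three: a data set option on input changes what the step sees, not what is stored.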

SAS Programming concepts
This part discusses the following programming concepts:
• The Data step
• Input styles
• Data step statements
• Operators in SAS
• Combining datasets
• Arrays

I. The Data Step
Before you can use SAS software to prepare your data for analysis or use a SAS procedure to analyze your data, you must first get them into a SAS data set. Once your data are in a SAS data set, you can combine the data set with other SAS data sets in many different ways and use any of the SAS procedures.
The DATA step can include statements asking SAS to create one or more new SAS data sets and programming statements that perform the manipulations necessary to build the data sets. Data analysis, file management, and information retrieval are all handled in DATA steps. You can use the DATA step for these purposes:
• Retrieval - getting your input data into a SAS data set.
• Editing - checking for errors in your data and correcting them, and computing new variables.
• Printing reports according to your specifications and writing disk or tape files.
• Producing new SAS data sets from existing ones by subsetting, merging, and updating the old data sets.

Types of DATA steps
A DATA step is a group of SAS statements that begins with a DATA statement and usually includes all the statements in one of these four groups:
1. Data in job stream
   DATA statement;
   INPUT statement;
   other SAS statements used in the DATA step
   CARDS statement;
   data lines
   ;
2. Data from an external file
   DATA statement;
   INFILE statement;
   INPUT statement;
   other SAS statements used in the DATA step
   RUN;

3. Data in existing SAS data set
   DATA statement;
   SET|MERGE|UPDATE statement;
   other SAS statements used in the DATA step
   RUN;
4. Writing reports
   DATA _NULL_;
   INPUT and CARDS|INFILE statement, or SET|MERGE|UPDATE statement;
   other SAS statements used in the DATA step
   FILE statement;
   PUT statement;
   RUN;
   The FILE statement tells SAS where to print the report or write the file. PUT statements write the lines of the report or file.
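The statement skeletons for the DATA step types can be filled in as follows. This sketch covers types 1, 3 and 4. Type 2 is left out because it needs an external file. The dataset names STAFF and OLDER and the column positions are invented for the example.

```sas
/* Type 1: data in the job stream */
data staff;
  input name $ age;
  cards;
Ann 31
Bob 45
;
run;

/* Type 3: data in an existing SAS data set */
data older;
  set staff;
  if age > 40;          /* subsetting IF keeps only matching observations */
run;

/* Type 4: writing a report, no data set is created */
data _null_;
  set staff;
  file print;           /* send PUT output to the procedure output file */
  put @1 name $8. @10 age 3.;
run;
```

DATA _NULL_ runs the same observation loop as any other DATA step but skips the cost of building an output data set, which is why it is the usual choice for reports.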

II. Working with SAS datasets
• The SAS language supports both sequential and direct access input from SAS datasets. Sequential input is provided by the SET, MERGE and UPDATE statements in the data step, and by the DATA= option and some other options in the proc step. Direct access is done by the SET statement with the POINT= option.
• SAS data steps create output SAS datasets using the DATA and OUTPUT statements. PROC steps create output SAS datasets with the OUT= option on the PROC or OUTPUT statement. All output SAS datasets are new files. If a SAS dataset exists with the same name, the old dataset is deleted after the successful completion of the step and a dataset with the same name is created.

The SAS step first processes the descriptor information of the SAS dataset and then inputs or outputs one observation at a time. After writing the last observation, most Engines also add more information to the descriptor information.

Dataset Options
Dataset options affect input or output processes related to SAS datasets. On input, they affect the way the SAS dataset appears to the step, but they do not actually change the stored data in the input file. On output, they control the data stored in the SAS dataset.

Both for Input & Output datasets
DROP= variable: list of variables not to be kept in the dataset
KEEP= variable: list of variables to be kept in the dataset
LABEL=: creates a label for the SAS dataset
RENAME= old = new: changes the name of a variable
WHERE=: selects observations that meet the specified condition

For Input datasets
FIRSTOBS=: causes processing to begin from the specified observation
OBS=: causes processing to end with the specified observation
CNTLLEV= MEM: locks the entire dataset
CNTLLEV= REC: locks only one observation at a time

For Output datasets
COMPRESS=: specifies whether the SAS dataset is compressed or not
REUSE=: specifies whether space can be reused in compressed SAS datasets
INDEX=: creates indexes
REPLACE=: specifies whether to allow existing SAS datasets to be replaced by new SAS datasets with the same name
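A short sketch of dataset options on both sides of a step, plus direct access with POINT=. The dataset STAFF, its variables and the chosen observation numbers are all invented for the illustration.

```sas
data staff;
  input name $ dept $ age salary;
  cards;
Ann HR 31 4000
Bob IT 45 5200
Cy IT 27 3900
Dee HR 52 6100
;
run;

/* Input options shape what the step sees.            */
/* Output options shape what is stored in IT_STAFF.   */
data it_staff(drop=dept rename=(salary=pay) label='IT staff only');
  set staff(where=(dept = 'IT') keep=name dept salary);
run;

/* Direct access: read observations 2 and 4 by number. */
data picked;
  do p = 2, 4;
    set staff point=p;   /* P is a temporary variable, not written out */
    output;
  end;
  stop;                  /* POINT= never reaches end of file, so stop explicitly */
run;
```

STAFF is unchanged by either step, which matches the rule that input dataset options only change how the data appear to the step.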

III. Input / Output Styles
1. List Input
Data values are required to be separated by at least one blank (the default delimiter) or by the delimiter, if one is specified. They are not required to be aligned in columns. SAS requires you to name the variable and the data type alone (the default is numeric, so give $ for character values). Adding formats or informats to the INPUT statement turns it into Formatted Input style.
   data newfile;
   input name $ age sex $;
   run;
Features :
1. Fields can only be read in order.
2. Data must be in standard numeric or character format.
3. Character values cannot contain embedded blanks.
4. Missing values must be specified by "." only.
5. A maximum (default) length of 8 will be applied to character values unless specifically overridden by a LENGTH statement.

2. Column Input
Data values are required to be aligned in columns. SAS requires you to name the variable, the data type (the default is numeric, $ for character values) and the columns within which the data values are to be located for each record.
   data newfile;
   input name $ 1-8 age 12-14 sex $ 16;
   run;
Features :
1. Data values must be within the same columns on all the input lines.
2. Data must be in standard numeric or character format.
3. Character values can be from 1 - 200 characters long.
4. Character values can contain embedded blanks.
5. Missing values need not be specified by ".".
6. Leading & trailing blanks within the fields are ignored.
7. Fields can be read in any order regardless of their position in the records.

3. Formatted Input
Formatted input is used with pointer controls, which control the position of the input pointer in the input buffer when reading data. It differs from List Input in that it enables you to read nonstandard data for which SAS requires additional instructions.
   data newfile;
   input name $char8. +4 age 2. +1 salary comma5.;
   run;

Features :
1. Can read data stored in non-standard form.
2. Character values can be from 1 - 200 characters long.
3. Character values can contain embedded blanks.
4. Missing values need not be specified by ".".
5. Fields can be read in any order regardless of their position in the records.

4. Named Input
Named input is used when data lines contain variable names followed by an equal sign and a value for the variable. Once the INPUT statement starts reading named inputs, the System expects all remaining values in the input line to be of the same form.
   data newfile;
   informat phone 6.;
   input @1 date julian5. name=$ age= salary=;
   cards;
   99001 age=23 salary=1000 name=Mary phone=232345
   97234 phone=242334 salary=1000 name=Martin age=21
   91210 age=24 salary=2000 phone=223198 name=John
   99001 name=Maggie phone=238971 age=24 salary=1600
   ;
   run;
Features :
1. Cannot read data stored in non-standard form.
2. Cannot switch to another input style for a particular input line once you start reading it with named input.
3. The variables in the INPUT statement do not have to be in the same order in which they occur in the data records.
4. If a variable that appears on the named input lines appears in any other statement, its value is automatically read from the input, whether or not it is explicitly specified in the INPUT statement.
5. If any of the values are not in named input form, the System handles them as invalid data.

All these input styles have a corresponding output style as well.

Informat: specifies the format of the values for variables being read. It is supplied with the INPUT statement or as a data set option of the input data set if required. The SAS System interprets the external form and converts it to an internal form that it understands.

Format: specifies the format of the values for output variables. It is supplied along with the PUT statement or other output procedures. The SAS System converts the value to the form in which it is required to appear in the output.
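The list, column and formatted INPUT statements shown in the notes have no data lines attached. The sketch below adds CARDS data laid out to match each statement, so the three styles can be run and compared. The dataset names and the data values are invented for the example.

```sas
/* List input: blank-separated values, read in order. */
data list_in;
  input name $ age sex $;
  cards;
Mary 23 F
John 41 M
;
run;

/* Column input: name in columns 1-8, age in 12-14, sex in 16. */
data column_in;
  input name $ 1-8 age 12-14 sex $ 16;
  cards;
Mary        23 F
John        41 M
;
run;

/* Formatted input: informats plus the +n pointer control.        */
/* name uses columns 1-8, age 13-14, salary (with comma) 16-20.   */
data formatted_in;
  input name $char8. +4 age 2. +1 salary comma5.;
  cards;
Mary        23 1,000
John        41 2,500
;
run;

/* A format controls how the stored number is written back out. */
proc print data=formatted_in;
  format salary dollar8.;
run;
```

The COMMA5. informat strips the comma on the way in, so SALARY is stored as an ordinary number. The DOLLAR8. format in PROC PRINT changes only the display.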

IV. Working with External files
A text file is a sequence of records. Data positions are usually stated in terms of columns, which represent the distance from the beginning of the record. SAS supports sequential input & output for text files. SAS input / output syntax for text files is more powerful and flexible than that of any of the classic high-level languages.

1. INPUT operation
The syntax for input from text files involves 2 statements:
• INFILE - provides general identification information about the input file
• INPUT - controls the way the input data is interpreted and assigned to variables

A blank INPUT statement (with no arguments) is called a NULL INPUT statement. It can have several uses:
1) To bring an input data line into the input buffer without creating any SAS variables. This data line can be copied as such to the output file.
2) To release an input line held by a trailing @ or double trailing @.

INFILE
The INFILE statement is an executable statement. It sets the current input file, which the INPUT statement then reads. Because the INFILE statement identifies the file to be read, it must execute before the INPUT statement that reads the data lines. The current input file is changed to CARDS at the top of the observation loop, so the INFILE statement has to be executed in every repetition of the observation loop that executes an INPUT statement. As it is executable, you can use it in conditional processing (in an IF-THEN statement, for example).

You can read from several external files within one DATA step. To read from multiple input files in a single iteration of the DATA step, you can use multiple INFILE and INPUT statements. To read from one file, then close it and open another, you can use the FILEVAR= option. (FILEVAR= enables you to dynamically change the current input file within your SAS job.)

When you use more than one INFILE statement for the same fileref and you use options in each INFILE statement, the effect is additive, even within the same data step. That is, the options specified in each INFILE statement are added to the options specified in any previous INFILE statements for that file.

You can use the INFILE statement in combination with the FILE statement to update records in an external file. To do so, follow these steps:
1. Specify the INFILE statement before the FILE statement.
2. Specify the same fileref or physical filename in each statement.
3. Use options that are common to both the INFILE and FILE statements in the INFILE statement instead of the FILE statement. Any such options used in the FILE statement are ignored.
To update individual fields within a record instead of the entire record, use the SHAREBUFFERS option.

Options
LRECL=: The number of characters in a record.
PAD: Pads input records shorter than the LRECL value with trailing blanks. Default is NOPAD.
LINESIZE= / LS=: Limits the number of characters in a record available to the INPUT statement. It prevents the INPUT statement from reading past a certain column.

FIRSTOBS=: The number of the first record to be read from the input file. Use this option to skip records at the beginning of the file.
OBS=: The number of the last record to be read from the input file. Use this option to skip records at the end of an input file.
N=: Specifies the number of lines available to the input pointer.
UNBUFFERED / UNBUF: Tells the SAS supervisor not to look ahead at the next record when reading a record. With this option, the END= variable cannot be used to indicate the last line in the input file.
SHAREBUFFERS / SHAREBUFS: Use this option for text files being edited, to use the same buffer for input & output. This option makes it possible to change some fields in a file without processing other fields. It means that any positions skipped over by the PUT statement will stay the way they were before. Otherwise, the positions in an output record that the PUT statement does not write to are filled with blanks.
DELIMITER= / DLM=: Delimiter used in list input. The default is ' '.
DSD: This option changes the way delimiters are treated when using list input and enables you to read delimiters as characters within quoted strings. By default, consecutive delimiters are treated as a unit. When the DSD option is in effect, consecutive delimiters are treated separately; therefore, a value that is missing between consecutive delimiters is read as a missing value. When you use the DSD option, the delimiter is assumed to be a comma. If the data contain another delimiter, you must specify it with the DELIMITER= option. The DSD option also enables list input to read a character value that contains a delimiter within a quoted string. For example, if data are separated with commas, using the DSD option enables you to place the character string in quotes and read a comma as a valid character. The quotes are not stored as part of the character value.
COLUMN / COL= variable: designates a numeric variable that the INPUT statement sets to the column pointer location.
LINE= variable: designates a numeric variable that the INPUT statement sets to the line pointer location.
LENGTH= variable: designates a numeric variable that contains the length of the input line. The variable can then be used with the $VARYING informat to read varying-length records. Changing the value of the variable between the INPUT & the PUT statement can change the length of the _INFILE_ string.
START= variable: designates the numeric variable that identifies the starting character to be used in the _INFILE_ string. A value can be assigned before the PUT statement to change the extent of the _INFILE_ string.
END= variable: designates a numeric variable that is set to 1 when the INPUT statement reads the last record in the file.
EOF= label: The INPUT statement branches to the statement label indicated if it attempts to read past the end of a file.
FILEVAR= variable: Changing the value of the character variable causes the INFILE statement to close the input file and to open the file whose physical name is the value of the variable.

The following options determine what the INPUT statement does when it gets to the end of a record before it finds values for all the variables in the record.
FLOWOVER: The remaining variables are read from the first column of the next record. This is the default action.

MISSOVER: The remaining variables are assigned missing values.
STOPOVER: An error condition is created and the step stops running.
TRUNCOVER: Using the TRUNCOVER option enables you to read variable-length records when some records are shorter than expected by the INPUT statement.

The _INFILE_ string refers to the last record read from the current input file.
The default INPUT file is CARDS.

2. OUTPUT operation
The syntax for output to text files involves 2 statements:
• FILE - provides general identification information about the output file
• PUT - controls the way variable values are formatted and written to the output record

FILE
The FILE statement is an executable statement. It must be executed before the PUT statement to which it refers. It sets the current output file, which the PUT statement then writes. The current output file is changed to LOG at the top of the observation loop, so the FILE statement has to be executed in every repetition of the observation loop that executes a PUT statement. As it is executable, you can use it in conditional processing (in an IF-THEN statement, for example).
You can write to several external files within one DATA step. To write to multiple output files in a single iteration of the DATA step, you can use multiple FILE and PUT statements.

Options
Many of the options have a counterpart in the INFILE statement.
LRECL=: The number of characters in a record.
PAD: Pads output records shorter than the LRECL value with trailing blanks. Default is NOPAD for variable-length records and PAD for fixed-length records.
LINESIZE= / LS=: Limits the number of characters that can be written to a record by the PUT statement.
PAGESIZE= / PS=: Determines the number of lines per page of output.
PRINT or NOPRINT: Tells whether a file is a print file or a non-print file.
NOTITLES: Tells the supervisor not to put the current titles, defined in a TITLE statement, at the top of each page of a PRINT file.
OLD: This option makes the step write output records at the beginning of the file, replacing any previous contents of the file.
MOD: This option makes the step write output records at the end of the file, adding records to the previous contents of the file.
FIRSTOBS=: The number of the first observation to be written to the output file. Use this option to skip records at the beginning of the file.

OBS=: The number of the last record to be written to the output file. Use this option to skip records at the end of an output file.
N=: Specifies the number of lines available to the output pointer.
SHAREBUFFERS / SHAREBUFS: Use this option for text files being edited, to use the same buffer for input & output. This option makes it possible to change some fields in a file without processing other fields. It means that any positions skipped over by the PUT statement will stay the way they were before. Otherwise, the positions in an output record that the PUT statement does not write to are filled with blanks.
DELIMITER= / DLM=: Delimiter used in list output. The default is ' '.
COLUMN / COL= variable: designates a numeric variable that the PUT statement sets to the column pointer location.
LINE= variable: designates a numeric variable that the PUT statement sets to the line pointer location.
LINESLEFT= / LL= variable: designates a numeric variable that tells the number of lines remaining on the current page, including the current line pointer.
HEADER= label: When the PUT statement writes to the end of a page, it branches to the HEADER= statement label to execute a group of statements there until a RETURN statement is reached.
FILEVAR= variable: Changing the value of the character variable causes the FILE statement to close the output file and to open the file whose physical name is the value of the variable.

When a PUT statement attempts to write beyond the maximum allowed line length (as specified by the LINESIZE= option in the FILE statement), the following options on the FILE statement can cause varying results:
FLOWOVER: The current output line is written to the file and the data item that exceeds the current line length is written to a new line.
DROPOVER: The option discards data items that exceed the output line length as specified by the LINESIZE= option in the FILE statement, and the column pointer remains positioned after the last value written in the current line.
STOPOVER: Stops processing the data step immediately if a PUT statement attempts to write a data item that exceeds the current line length. The System writes the portion of the line built before the error occurred and issues an error message.

The default OUTPUT file is LOG.
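The FILE/PUT and INFILE/INPUT pairs described above can be exercised together in one sketch that writes a small file and reads it back. Everything named here is an assumption made for the illustration: the fileref DEMO, the use of the TEMP device to get a scratch file (available on most hosts), and the data values.

```sas
filename demo temp;   /* assumption: scratch file via the TEMP device */

/* Write a small comma-delimited file. FILE must run before PUT. */
data _null_;
  file demo;
  put 'Ann,31,"Pune, India"';   /* quoted value containing the delimiter */
  put 'Bob,,Chennai';           /* consecutive delimiters: age is missing */
  put 'Cy,27';                  /* short record: no city field            */
run;

/* Read it back. INFILE must run before INPUT. */
data people;
  infile demo dsd missover end=last;
  length name $ 8 city $ 20;
  input name $ age city $;
  if last then put 'Records read: ' _n_;   /* written to the log */
run;

proc print data=people;
run;
```

DSD makes the empty field in the second record a missing AGE and keeps the comma inside the quoted city. MISSOVER sets CITY to missing for the short third record, where the default FLOWOVER would try to read the next line.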

Pointer Controls & Line-Hold Specifiers
As the SAS System reads values from data records in the input buffer, it keeps track of its position with a pointer. Pointer controls are provided on the INPUT statement so that you can reset the position of the pointer to read data values in records at certain positions. Line-hold specifiers allow you to hold a data record in the input buffer to be processed by another INPUT statement.

@ - Column pointer control that moves the pointer to column n. Any decimal portion of variable values is truncated and only integer values are used. If n is 0, the pointer moves to column 1.
+ - Moves the pointer n columns. Any decimal portion of variable values is truncated and only integer values are used. If the resulting position is less than 1, the pointer moves to column 1.
# - Moves the pointer to line n. Any decimal portion of variable values is truncated and only integer values are used.
/ - Advances the pointer to column 1 of the next line.
Trailing @ - Allows the next INPUT statement in the same DATA step to read from the same record. It prevents the next INPUT statement from automatically releasing the current input record and reading the next one into the input buffer. Between INPUT statements the pointer position remains the same.
Trailing @@ - Allows a record to be held for the next INPUT statement, even across iterations of the DATA step. It is used where each input line contains values for several observations.

An input line held by the system is released immediately if the pointer moves past the end of the line or if a NULL INPUT statement executes.

: - A character comparison modifier that changes existing comparison operators so that they compare only the leading characters of the values. It changes the nature of the comparison from an exact match to a "begins with" match. When used to compare 2 text strings, the longer string is truncated to the length of the shorter one for the purpose of evaluation. E.g.
   if 'ABC' =: 'ABCD';
or
   if 'ABC' =: 'AB';
will both evaluate to TRUE, irrespective of their total length. It has no effect when it is used between two variables, only in a comparison with a string constant. E.g.
   if upcase(charvar) =: "SMIT";
will return records where charvar begins with SMIT in any case.
   if charvar >: 'A';
will return records where the values of charvar begin with B or a later character.
   if charvar in: ('SM','Will','aa');
will return values that start with any of these character strings.
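The line-hold specifiers and the colon modifier can be seen in a small runnable sketch. The dataset names, the record-type codes 'A' and 'C', and the column positions are invented for the example.

```sas
/* Trailing @@: each input line holds several observations. */
data pairs;
  input x y @@;
  cards;
1 2 3 4 5 6
;
run;                               /* PAIRS gets three observations */

/* Trailing @: test one field, then decide how to read the rest. */
data adults;
  input @1 type $1. @;             /* hold the record                      */
  if type = 'A';                   /* other record types are dropped       */
  input @3 name $8. @12 age 2.;    /* same record, absolute column pointers */
  cards;
A John     34
C ignored
A Mary     28
;
run;

/* =: compares only up to the length of the shorter operand. */
data j_names;
  set adults;
  if upcase(name) =: 'JO';         /* keeps John */
run;
```

In ADULTS the held record is released when the iteration ends, so dropping a record with the subsetting IF does not leave a stale line in the input buffer.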

V. DATA step Statements

The SAS Statements that can appear in a DATA step fall into several categories: File handling Statements, Action Statements, Control Statements, and Information Statements. Each Statement is either executable, positional, or declarative. Executable Statements (denoted by X) are programming Statements that cause some action. Positional Statements (P) cause no action at execution, but their position in the DATA step is important. Declarative Statements (D) supply additional information to SAS.

1. File handling Statements
• CARDS - precedes card data or lines entered at the terminal, depending on the mode of execution: data that are part of the job stream (P)
• CARDS4 - precedes in-stream data lines containing semicolons (P)
• DATA - tells SAS to begin a DATA step and to start building a SAS data set (P)
• FILE - identifies the data file where lines are to be written by the DATA step (X)
• INFILE - gives the fileref of the control Statement (FILENAME statement). The fileref is a logical name for the physical file. The fileref identifies the external file containing raw data to be read. When the INFILE statement is executed the external file is opened. (X)
• INPUT - describes the records on the external input file. (X)
• MERGE - combines observations from two or more SAS data sets into a new data set. (X)
• PUT - describes the format of the lines to be written by SAS. (X)
• SET - reads observations from one or more existing SAS data sets. (X)
• UPDATE - applies transactions to a master file. Both transaction and master file are SAS data sets. (X)

2. Action Statements
• ABORT - stops the current DATA step or the job.
• Assignment - creates and modifies variables.
• CALL - invokes or calls a routine.
• DELETE - excludes observations from the data set being created.
• ERROR - writes messages on the SAS log.
• LIST - lists the current input lines to the LOG.
• LOSTCARD - corrects for lost data lines when an observation has an incorrect number of data lines.
• MISSING - declares that certain values in the input data represent special missing values for numeric data fields.
• OUTPUT - creates new observations.
• STOP - stops creating the current data set.
• subsetting IF - selects observations for the data set being created.
• SUM - accumulates totals.

3. Control Statements
• DO - sets up a group of statements to be executed as one statement.
• iterative DO
• DO UNTIL
• DO WHILE
• END - signals the end of a DO or SELECT group.
• GO TO - causes SAS to jump to a labeled statement in the step and continue execution at that point.
• IF-THEN/ELSE - conditionally executes a SAS statement.
• LINK-RETURN - causes SAS to jump to a labeled statement in the step and execute statements until it encounters a RETURN Statement.
• RETURN - when not combined with a LINK statement, causes SAS to return to the beginning of the DATA step to begin execution. When combined with a LINK statement, returns to the statement immediately following the most recently executed LINK.
• SELECT - conditionally executes one of several SAS Statements.

4. Information Statements
• ARRAY - defines a set of variables to be processed the same way. (D)
• ATTRIB - specifies a format, informat, label and length for a variable. (D)
• BY - specifies that the data set is to be processed in groups defined by the BY variables. (D)
• DROP - identifies variables to be excluded from a data set or analysis.
• FORMAT - specifies formats for printing variable values.
• INFORMAT - specifies informats for storing variable values.
• KEEP - identifies variables to be included in a data set or analysis.
• LABEL - associates descriptive labels with variable names.
• RENAME - changes the name of the variables in a data set.
• RETAIN - identifies variables whose values are not to be set to missing each time the DATA step is executed, and can give variables an initial value for the first iteration (otherwise the value in the first iteration would have been missing). Variables used in Sum statements are retained by default.
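A minimal sketch of how a few of these statement types work together is given below. The data set name running, the variable names and the three in-stream values are hypothetical and only serve the illustration.

```sas
data running;
   input amount;                   /* executable: reads one record      */
   retain maxamt 0;                /* declarative: MAXAMT keeps its     */
                                   /* value across iterations, start 0  */
   total + amount;                 /* sum statement: TOTAL is retained  */
                                   /* by default and starts at 0        */
   if amount > maxamt then maxamt = amount;
   if amount > 0;                  /* subsetting IF: keep positive rows */
   datalines;
10
-5
30
;
run;
```

Because TOTAL and MAXAMT are retained, the observation with AMOUNT = 30 carries the running total of all three records even though the record with -5 is not written to the output data set.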

VI. Operators in SAS

Priority    Symbol    Mnemonic     Definition                  Example
                      Equivalent
Group I     **                     exponentiation              y=a**2;
            +                      positive prefix             y=+(a*b);
            -                      negative prefix             z=-(a*b);
            ^ or ~    NOT          logical NOT                 if not z then put x;
            ><        MIN          minimum                     x=a><b;
            <>        MAX          maximum                     x=a<>b;
Group II    *                      multiplication              c=a*b;
            /                      division                    f=g/h;
Group III   +                      addition                    f=g+h;
            -                      subtraction                 f=g-h;
Group IV    ||                     concatenate character       name = 'J. ' || 'Smith';
                                   values
Group V     <         LT           less than                   if y < z then put x=;
            <=        LE           less than or equal to       if y le z then put x=;
            =         EQ           equal to                    if y eq (a+b) then output;
            ^=        NE           not equal to                if x ne z then output;
            >         GT           greater than                if z gt a then output;
            >=        GE           greater than or equal to    if z ge a then output;
                      IN           equal to one of a list      if sex in ('m','f') then result='correct';
Group VI    &         AND          logical AND                 if a=b and c=d then x=1;
            |         OR           logical OR                  if a=b or c=d then x=1;
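The priority groups in the table can be seen in a short sketch such as the following. The variable names and values are hypothetical; the step only writes the results to the log.

```sas
data _null_;
   a = 2; b = 3; c = 4;
   x    = a + b * c;             /* Group II before Group III: 2 + 12 = 14 */
   hi   = a <> c;                /* MAX operator: 4                        */
   lo   = a >< c;                /* MIN operator: 2                        */
   flag = (a < b) & (b < c);     /* comparisons, then logical AND: 1       */
   name = 'J. ' || 'Smith';      /* concatenation                          */
   put x= hi= lo= flag= name=;
run;
```

Parentheses, as in the FLAG assignment, are a simple way to make the intended order explicit instead of relying on the priority groups.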

VII. Combining Data Sets

Introduction

The SAS System provides a means for processing observations that are ordered or grouped according to the values of one or more variables read from existing SAS data sets. The SAS System expects observations to be ordered or grouped by the value of the variables specified in the BY statement. The observations can be ordered by sorting or indexing the dataset. The NOTSORTED option in the BY statement is used when the data are not in alphabetical or numeric order but are arranged in groups according to the values of the BY variable.

When processing SET, MERGE & UPDATE statements, the SAS System reads one observation at a time into the program data vector according to the values of the BY variable. After processing all the observations from one BY group, it expects the next observation to be from the next BY group. The SAS System detects the pattern by tracking the values of the temporary variables FIRST.variable & LAST.variable. The most frequent use of BY group processing in the DATA step is to combine two or more SAS data sets.

1. Concatenating data sets

    Data myfile;
    Set ourfile1 ourfile2 ...;
    Run;

Concatenation is combining two or more datasets one after the other into a single dataset. The number of observations in the new dataset is the sum of the observations of the old data sets, and the order is all observations from the first followed by all from the second. If the input data sets contain different variables, observations from one data set have missing values for variables defined only in the other data set.

Note: Using 2 files in a SET statement is equivalent to using PROC APPEND if the output dataset also appears as the first dataset on the SET statement.
E.g.
    Data in2;
    Set in2 in3;
is the same as
    Proc append base=in2 new=in3;

Using more than 1 SET statement is equivalent to a one-to-one MERGE operation.
E.g.
    Data in1;
    Set in2;
    Set in3;
is the same as
    Data in1;
    Merge in2 in3;

Ourfile1
OBS   COMMON   ANIMAL
1     a        ant
2     b        bird
3     c        cat
4     d        dog
5     e        eagle
6     f        frog

Ourfile2
OBS   COMMON   PLANT
1     a        apple
2     b        banana
3     c        coconut
4     d        dewberry
5     e        eggplant
6     f        fig

myfile
OBS   COMMON   ANIMAL   PLANT
1     a        ant
2     b        bird
3     c        cat
4     d        dog
5     e        eagle
6     f        frog
7     a                 apple
8     b                 banana
9     c                 coconut
10    d                 dewberry
11    e                 eggplant
12    f                 fig

2. Interleaving

    Data myfile;
    Set ourfile1 ourfile2 ourfile3 ...;
    By var1 var2 var3 ...;

The observations in the new data set are arranged by the value of the BY variable and, within each BY group, by the order of the old data sets. The number of observations in the new data set is the total of the observations of the old data sets. In the example, the data sets ourfile1 & ourfile2 are SET by the variable COMMON.

Myfile
OBS   COMMON   ANIMAL   PLANT
1     a        ant
2     a                 apple
3     b        bird
4     b                 banana
5     c        cat
6     c                 coconut
7     d        dog
8     d                 dewberry
9     e        eagle
10    e                 eggplant
11    f        frog
12    f                 fig
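Interleaving assumes that every input data set is already ordered by the BY variable. A minimal sketch, assuming ourfile1 and ourfile2 exist as in the tables above and are not yet sorted, could look like this:

```sas
/* Order both inputs by the BY variable first. */
proc sort data=ourfile1; by common; run;
proc sort data=ourfile2; by common; run;

data myfile;
   set ourfile1 ourfile2;
   by common;
   /* FIRST.COMMON is 1 on the first observation of each BY group. */
   if first.common then put 'New BY group: ' common=;
run;
```

The PUT statement is only there to show where each BY group starts in the log; it can be dropped without changing the output data set.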

3. One to One Reading

    Data myfile;
    Set ourfile1;
    Set ourfile2;

The new data set contains all the variables from all the input data sets. The number of observations in the new data set is the number of observations in the smallest original data set. If the data sets contain common variables, the values read from the last data set replace those read from earlier ones.

Myfile
OBS   COMMON   ANIMAL   PLANT
1     a        ant      apple
2     b        bird     banana
3     c        cat      coconut
4     d        dog      dewberry
5     e        eagle    eggplant
6     f        frog     fig

4. One to One Merging

    Data myfile;
    Merge ourfile1 ourfile2 ...;

The SAS System combines the first observation from all the data sets in the MERGE statement into the first observation in the new data set. Similarly, the nth observations from all data sets are merged to form the nth observation in the new data set. The number of observations in the new data set is equal to the number of observations in the largest data set.

Myfile
OBS   COMMON   ANIMAL   PLANT
1     a        ant      apple
2     b        bird     banana
3     c        cat      coconut
4     d        dog      dewberry
5     e        eagle    eggplant
6     f        frog     fig

The result is the same as the result obtained by One to One Reading when the number of observations in the merging data sets is equal. When the numbers of observations are unequal, One to One Reading stops processing before all observations are read from all data sets.

5. Match-Merging

    Data myfile;
    Merge ourfile1 ourfile2;
    By var1 var2 ...;

Match-Merging combines observations from two or more data sets into a single observation in the new dataset according to the values of the common variable. Observations from the different datasets with the same BY variable values are combined. If there are several observations with the same BY variable values, they are matched in a manner similar to the one-to-one merging process. But, in this case, if one SAS dataset has fewer observations in a BY group than the other, the values of its last observation in the BY group are used, instead of missing values, to form the rest of the observations in the BY group. Missing values are used when one SAS dataset has no observations in a BY group.

Each observation in the new data set contains all the variables from all data sets. When there is a conflict in the value of a variable, other than a BY variable, between different input SAS datasets in a MERGE statement, the value from the SAS dataset named later in the MERGE statement is used. The number of observations in the new data set is equal to the total of the largest number of observations in each BY group from among all input data sets.

Duplicate values of BY variable

Ourfile1
OBS   COMMON   ANIMAL
1     a        ant
2     a        ape
3     b        bird
4     c        cat
5     d        dog
6     e        eagle

Ourfile2
OBS   COMMON   PLANT
1     a        apple
2     b        banana
3     c        coconut
4     c        celery
5     d        dewberry
6     e        eggplant

Myfile
OBS   COMMON   ANIMAL   PLANT
1     a        ant      apple
2     a        ape      apple
3     b        bird     banana
4     c        cat      coconut
5     c        cat      celery
6     d        dog      dewberry
7     e        eagle    eggplant
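A common companion to match-merging is the IN= data set option, which records whether a data set contributed to the current observation. The sketch below assumes ourfile1 and ourfile2 are sorted by COMMON; the output data set names (both, only1, only2) and the flag names are hypothetical.

```sas
data both only1 only2;
   merge ourfile1(in=in1) ourfile2(in=in2);
   by common;
   if in1 and in2 then output both;     /* BY value found in both inputs */
   else if in1   then output only1;     /* only in ourfile1              */
   else               output only2;     /* only in ourfile2              */
run;
```

Splitting the result this way makes it easy to inspect non-matching BY values separately instead of finding them by their missing values in the merged data set.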

Nonmatching observations

Ourfile1
OBS   COMMON   ANIMAL
1     a        ant
2     c        cat
3     d        dog
4     e        eagle

Ourfile2
OBS   COMMON   PLANT
1     a        apple
2     b        banana
3     c        coconut
4     e        eggplant
5     f        fig

Myfile
OBS   COMMON   ANIMAL   PLANT
1     a        ant      apple
2     b        .        banana
3     c        cat      coconut
4     d        dog      .
5     e        eagle    eggplant
6     f        .        fig

6. Updating Data sets

    Data myfile;
    Update master trans;
    By common;

The UPDATE statement uses observations from the transaction data set to change values of corresponding observations from the master data set. Observations in the transaction data set are matched with observations in the master data set by the values of the BY variable. The values of the BY variable or combination of BY variables must be unique for each observation in the master data set. The BY variables do not get updated. The number of observations in the new data set is the sum of the observations in the master data set and the number of unmatched observations in the transaction data set.

Master
OBS   COMMON   ANIMAL   PLANT
1     a        ant      apple
2     b        bird     banana
3     c        cat      coconut
4     d        dog      dewberry
5     e        eagle    eggplant
6     f        frog     fig

Trans
OBS   COMMON   PLANT
1     a        apricot
2     b        barley
3     c        cactus
4     d        date
5     e        eucalyptus
6     f        fennel

Myfile
OBS   COMMON   ANIMAL   PLANT
1     a        ant      apricot
2     b        bird     barley
3     c        cat      cactus
4     d        dog      date
5     e        eagle    eucalyptus
6     f        frog     fennel

Duplicate values of BY variable

Master
OBS   COMMON   ANIMAL   PLANT
1     a        ant      apple
2     b        bird     banana
3     b        bird     banana
4     c        cat      coconut
5     d        dog      dewberry
6     e        eagle    eggplant
7     f        frog     fig

Trans
OBS   COMMON   PLANT
1     a        apricot
2     b        barley
3     c        cactus
4     d        date
5     d        dill
6     e        eucalyptus
7     f        fennel

Myfile
OBS   COMMON   ANIMAL   PLANT
1     a        ant      apricot
2     b        bird     barley
3     b        bird     banana
4     c        cat      cactus
5     d        dog      dill
6     e        eagle    eucalyptus
7     f        frog     fennel

If the master data set contains two observations with the same value of the BY variable, the first observation is updated and the second is ignored. Warning messages are also generated, as the BY values are expected to be unique. If the transaction data set contains duplicate values of the BY variable, SAS applies both transactions to the observation, and the last value copied into the program data vector is written into the new data set.

Non-matched observations

Master
OBS   COMMON   ANIMAL   PLANT
1     a        ant      .
2     c        cat      coconut
3     d        dog      dewberry
4     e        eagle    eggplant
5     f        frog     fig

Trans
OBS   COMMON   PLANT     MINERAL
1     a        apricot   amethyst
2     b        barley    beryl
3     c        cactus    .
4     e        .         .
5     f        fennel    .
6     g        grape     garnet

Myfile
OBS   COMMON   ANIMAL   PLANT      MINERAL
1     a        ant      apricot    amethyst
2     b        .        barley     beryl
3     c        cat      cactus     .
4     d        dog      dewberry   .
5     e        eagle    eggplant   .
6     f        frog     fennel     .
7     g        .        grape      garnet

Any observation from the transaction data set that does not correspond to an observation in the master data set is written to the program data vector and becomes the basis for an observation in the new data set. Only non-missing values from the transaction dataset are used when updating values in the master dataset. If necessary, the special missing value ._ can be used to update a value to missing.

There can be more than one input file involved in the creation of an observation through combinations of SET and INPUT statements. In these cases, the step stops after one of the input streams (they operate completely independently of each other) reaches the end of its data. This means that data at the end of the other stream is never reached. However, input files can be coordinated by using the END= option on the INFILE, SET, or MERGE statements and using programming statements to identify the end of an input stream.

The following code does a one-to-one merge of a SAS dataset with a sequential file to create a new SAS dataset.

SEQUENTIAL FILE FLATFILE
NAME     AGE   BIRTHDAY
MOLLY    24    23JAN69
SUSAN    22    23FEB72
ARCHIE   22    01OCT59
BILLY    26    29JUL79
JOHN     22    04NOV75
SARAH    27    11MAY79

SASFILE STRUCTURE
OBS   SEQNO
1     1
2     2
3     3

SAS CODE:
    DATA NEW;
    INFILE FLATFILE END=LAST1;
    /* stop only when both input streams are exhausted */
    IF LAST1 AND LAST2 THEN STOP;
    /* reset SEQNO so that it is missing once SASFILE has run out */
    SEQNO = .;
    IF NOT LAST1 THEN INPUT @1 NAME $10. @12 AGE 2. @15 BIRTHDAY $7.;
    IF NOT LAST2 THEN SET SASFILE (KEEP=SEQNO) END=LAST2;
    RUN;

OBS   SEQNO   NAME     AGE   BIRTHDAY
1     1       MOLLY    24    23JAN69
2     2       SUSAN    22    23FEB72
3     3       ARCHIE   22    01OCT59
4     .       BILLY    26    29JUL79
5     .       JOHN     22    04NOV75
6     .       SARAH    27    11MAY79

VIII. ARRAYS

    Array arrayname [ number-of-elements ] list-of-variables

e.g.
    Array books[4] ref usage intro glossary;
Here books[2] and usage are equivalent. Subscript values can be passed as a number, a variable with a numeric value, or an expression.

The ARRAY statement defining the array must appear in a data step before any reference to that array. An array definition is only in effect for the duration of the data step. To use an array in several data steps, you must redefine the array in each data step.
e.g.
    %let list = ref usage intro glossary;
    data one;
    array books[4] &list;
    -- more SAS statements --
    run;
    data two;
    array journal[4] &list;
    -- more SAS statements --
    run;

To refer to all variables in an array, use the special array subscript asterisk (*).

e.g.
    Data myfile (drop=count);
    Input ref usage intro glossary;
    array books[4] ref usage intro glossary;
    do count = 1 to 4;
    if books[count] = . then books[count] = 0;
    end;
    cards;
    45 46 112 23
    65 53 123 .
    23 134 45 .
    154 32 . 56
    ;
    run;
    proc print data=myfile;
    title 'Data set produced with array processing';
    run;

IX. FUNCTIONS

ABS
Returns a non-negative number equal in magnitude to that of the argument.
e.g. x = abs(-2.4);

BYTE
Returns the nth character in the ASCII or EBCDIC collating sequence.
e.g. x = byte(80); <== will return & in EBCDIC

CEIL
Smallest integer that is greater than or equal to the argument.
e.g. x = ceil(2.1); <== x will hold the value 3.

COMPRESS
Removes specified character(s) from character expressions. If the second argument is not specified then blanks, if any, will be removed.
e.g. x = 'A.B (C=D).';
     y = compress(x, '.()');
     put y; <== will give you AB C=D

DATE
Returns the current date as a SAS date value.

DATEJUL
Converts a Julian date to a SAS date value.
e.g. x = datejul(01001); <== will return the SAS date value 14976

DATEPART
Extracts the date from a SAS datetime value.
e.g. datetm = '01feb98:8:45'dt; <== way to specify constant datetime values.
     thedate = datepart(datetm);
     put thedate worddate.; <== prints the date alone in worddate format.

DATETIME
Returns the current datetime value.

DAY
Returns the day of the month from a SAS date value.
e.g. now = '04May98'd; <== way to specify constant values for dates.
     d = day(now); <== d will hold the value 4.

DHMS
Returns a SAS datetime value from date, hour, minute and second.
Syntax : DHMS(date, hour, minute, second)
e.g. dt = dhms('01jan97'd, 22, 45, 20);

DIM
Returns the number of elements in an array.
Syntax : DIM<n>(arrayname) | DIM(arrayname, bound-n)
Bound-n - Specifies the dimension in a multi-dimension array for which you want to know the number of elements.
e.g. dim(array2, 3) returns the number of elements in the third dimension of a multi-dimension array array2.

EXP
Raises e to the power supplied by the argument.

FLOOR
Largest integer that is less than or equal to the argument.
e.g. x = floor(1.6); <== returns 1.

HMS
Returns a SAS time value from hours, minutes & seconds values.
e.g. sastime = hms(14, 45, 20);

HOUR
Returns the hour from a SAS time or datetime value.

INDEX
Searches the source for the character string specified by the excerpt and returns its position. Returns 0 if the excerpt is not found.
Syntax : index(source, excerpt)

INDEXC
Searches the source for any of the characters specified by the excerpt and returns the position of the first one found. Returns 0 if none of the characters is found.
Syntax : indexc(source, excerpt)

INPUT
The function allows you to read the argument using any informat specified by the second argument. The informat specified determines whether the result is numeric or character. Used to convert character values to numeric values.
e.g. fmtsale = input(sale, comma8.); <== will return 10000 from values like $10,000

INTCK
Gives the number of intervals in a given time span.
Syntax : intck(interval, from, to)
e.g. qart = intck('qtr', '10jan98'd, '01may99'd); <== will return value 5

INTNX
Syntax : INTNX(interval, from, number)
Advances a date, time, or datetime value by a given interval. This function generates a SAS date, time, or datetime value that is a given number of time intervals from a starting value (from). The interval must be a character constant or variable whose value is one of those listed below.

DATE interval   DATETIME interval   TIME interval
DAY             DTDAY               HOUR
WEEK            DTWEEK              MINUTE
MONTH           DTMONTH             SECOND
QTR             DTQTR
YEAR            DTYEAR

e.g. yr = intnx('year', '05jan89'd); <== will return value 89.

e. it returns a portion of an expression you specify as argument.g. e. If the value is missing the length returns a value of 1. p = left(‘sdqwert’). The result of a put function is always a character string. partend = substr(var1. jul = juldate(sdate). var= ‘CATNAP’.g. e.6). replacing n characters starting with the character you specify in position. p = left(sd).g. SUBSTR Syntax : substr(argument.n>) If used on the right side of an assignment statement. it places the value of the expression on the right side of the assignment statement into the argument of SUBSTR. e. Page 43 of 57 .3. <== will return value of ‘KIDNAP’ into variable var. If the argument is an uninitialized numeric variable. e. This is useful for converting a numeric value to a character value. <== will return character string of length 6 from the 3 rd position. sdate = ’01feb99’d. <== will return character string from the 3 rd position till end. p = max(‘sdqwert’).g.position<.g. <== will return value of 7 for variable p MAX Returns largest of the non-missing arguments. If used on the left side of an assignment statement.3). Put var. <== will return value of 7 for variable p PUT Specifies an output format for a value. it returns a value of 12. If the value is missing the length returns a value of 1.SAS Notes JULDATE Returns the Julian date from a SAS date value e. <== will return value of 7 for variable p LENGTH Returns the length of an argument.1. p = lenght(‘sdqwert ’). <== will return value 99032 LEFT Left aligns a SAS character expression. e. substr(var. SYSPARM The SYSPARM function lets you access a character string specified with the SYSPARM = system option in the job control for your job or in an OPTIONS statement. <== will return value with trailing blanks LENGTH Returns the length of an argument as the right-most non-blanks character in the argument. sd = ’ feb day’. part = substr(var1.3) = ‘KID’.g.g.


Procedures

Append

The APPEND proc adds the observations from one dataset to the end of another. The advantage over the SET operation is that it bypasses processing of the data in the BASE= dataset and adds new observations directly to the end of the BASE= dataset.

    Proc append base = myfile new = newfile (where=(x=2)) force;

Options
BASE : names the dataset to which observations are added. If it is not found, a new data set with the name specified is created.
DATA : names the SAS dataset whose observations are to be added to the end of the BASE= data set.
WHERE : limits the observations selected from the DATA= dataset that are to be appended to the BASE= dataset.
FORCE : forces PROC APPEND to concatenate datasets when the DATA= dataset contains variables that either
• are not in the BASE= dataset, or
• do not have the same type as the variables in the BASE= dataset, or
• are longer than the variables in the BASE= dataset.

Compare

This proc compares the contents of two datasets, or compares the values of different variables within a single dataset, to produce a variety of reports.

    Proc compare base = myfile (where=(state = 'NC')) compare = yourfile out = newfile;
    var student birth state major;
    with student1 birth5 state3;

PROC COMPARE compares the following in order:
1. Data set attributes
2. Variables
3. Attributes of variables
4. Observations
5. Values in pairs of observations that match

• Matching variables : variables with the same name or those explicitly paired by VAR and WITH statements.
• Matching observations : observations that have the same values for all ID variables or, when no ID statement is used, that occur in the same position in the datasets.

It produces
1. An output dataset
2. Printed reports
3. A numeric return code stored in the automatic macro variable &SYSINFO

Contents

This Proc provides information about a SAS data library or individual files in a SAS data library.

    libname person 'library-dataset';
    Proc contents data=person._all_ memtype = view nods position out = newfile;
    run;

DATA : specifies the SAS dataset or library whose information is to be determined. The information on all the files in the library having the type specified by the MEMTYPE= option is obtained by using the keyword _all_.
MEMTYPE : specifies one or more types of members in the SAS data library. It can take the following values:
    ACCESS - access files created using SAS/ACCESS software
    ALL - all member types
    CATALOG - catalogs
    DATA - SAS datasets
    PROGRAM - stored compiled SAS programs
    VIEW - views created using SQL procedures
NODS : Only the SAS data library directory is printed if this option is used.
OUT : gives the name of the output SAS dataset. The output dataset contains information similar to that given in the variable description section in the printed output.
POSITION : The default order of listing variable names in the SAS dataset is alphabetical. This option prints a second list of variable names in the order of their position in the dataset.

Datasets

This Proc is used to list, copy, rename and delete SAS files and to manage indexes and append SAS datasets in a data library. It also provides all the capabilities of the APPEND, CONTENTS, and COPY procedures.

Some of the differences in DATASETS compared with other procs:
1. The input library is specified in the LIBRARY= option.
2. Statements are executed in the order of writing.
3. Groups of statements can execute without a RUN statement.
4. There is a dependence of some statements on other statements. E.g. the SELECT & EXCLUDE statements can only be executed immediately after a COPY statement. FORMAT, INFORMAT, LABEL, RENAME, INDEX CREATE, INDEX DELETE can only be used after a MODIFY statement.
5. The DATASETS procedure remains active until you type one of the following:
• QUIT
• RUN CANCEL
• A new PROC or DATA statement
If a syntax error is encountered, the RUN group containing the error is not executed.

E.g.
    libname eat 'dataset 2';
    libname drink 'dataset1';
    Proc datasets library = eat memtype = data;
    Change apple = apricot;
    Delete sandwich;
    Run;
    Copy out = drink move;
    Select custard icecream;
    Run;

    Modify hamburger;
    Rename price = rate;
    quit;

CHANGE : Use the CHANGE <old_name> = <new_name> statement to rename one or more members.
COPY : Use COPY OUT = <to_lib> to copy members from one library to another. The MOVE option is used to delete the members from the input library after copying.
DELETE : Use the DELETE statement to specify members to be deleted from the SAS library.
MODIFY : Use the MODIFY statement to change the attributes of the specified dataset. Only one dataset name is allowed per MODIFY statement.

Formats

This proc is used to create your own formats & informats. Options of the FORMAT procedure can be used to print the contents of a format library, create a control dataset for writing other informats and formats, or read a control dataset to create informats & formats.

Formats & informats give the SAS System information about data that is to be read or written:
1. Data type
2. How many bytes it occupies
3. Decimal placement for numbers
4. How to handle leading, trailing or embedded blanks or zeros, etc.

A word immediately followed by a period indicates a format or informat name. They can have an optional width specification before the period. Numeric formats & informats can also have an optional decimal specification after the period.

User defined informats convert character input values into a different form:
1. Convert a character number to a character string - 1 to YES
2. Convert a character string to a number - YES as 1 (numeric)
3. Convert a character string to another character string - YES as Y

User defined formats convert a value to a different form for output:
1. Convert a number to a character string - 1 to YES
2. Convert from one character string to another character string - YES to Y
3. Specify a template to format the way a numeric value is printed - print in the format of a telephone number.

E.g.
    Proc Format;
    invalue $FRENCH 'OUI' = 'YES' 'NON' = 'NO';
    value SEX 1 = "Male" 2 = "Female";

In the first case, values of OUI & NON are stored as YES and NO by the SAS System when the character informat $FRENCH is used. In the second case, numeric variable values are converted to character form for output when the numeric format SEX is used. The values are stored as numbers but displayed as character values.

There are two types of formats:
Value format : converts output values to a different form.
Picture format : specifies a template for printing numbers, giving specifics like leading zeros, decimal & comma punctuation, fill characters, prefix & negative number representation. Only applicable to numeric values.

E.g.
    Proc Format;
    picture PHONENUM other = '000 / 000 - 000';

    picture FAX other = '0999 ) 999 - 999' ( prefix = '(' );
    run;

Summary or Means

The SUMMARY procedure computes descriptive statistics on numeric variables in a SAS dataset and outputs the results to a new SAS dataset. Each observation in the new dataset contains the statistics for a different subgroup of the observations in the input dataset, representing all possible combinations of the levels of the variables specified in the CLASS statement. The difference between Means & Summary is that Summary does not produce any printed output on its own. The summary output data set is typically printed with PROC PRINT or is input to a DATA step that extracts the desired information.

e.g.
    PROC SUMMARY data = ht_wt;
    class group;
    var wt_loss ht_loss;
    by sl_no;
    id name;
    output out = min_wt min(wt_loss)=;
    output out = min_ht min(ht_loss)=;
    run;

BY : A separate analysis is obtained for the observations in each group specified by the BY variables. The SAS dataset should be sorted by the BY variables if the NOTSORTED option is not used.
CLASS : This specifies the variables used to form sub-groups. Statistics are produced for every level of interaction between the variables specified in this statement.
ID : If additional variables from the input dataset are to be included in the output dataset, they can be given with the ID statement.
VAR : The variables in the dataset for which statistics have to be calculated. If this statement is not used then all numeric variables except those in the BY, CLASS, FREQ, ID and WEIGHT statements are analyzed.

The output contains statistics like
• number of observations,
• Mean,
• Std Deviation,
• minimum &
• maximum value
for each sub-group, for the different values of _TYPE_. If the OUT= option is used, then this forms a single record in the output dataset. If a CLASS variable is taken into account for a certain sub-group then it is assigned a binary value 1, else it is 0. The decimal equivalent of this binary number for a subgroup is the _TYPE_ value for that sub-group.

The following code produces the result shown below.

Input file IN1:
code   age   date    ind   bit
1000   23    99123   Y     1
1001   .     99123   N     1
1003   32    99123   Y     0
1003   22    00123   Y     0
.      25    00123   Y     0

Code:

    DATA IN2;
    SET IN1;
    run;
    PROC SUMMARY;
    BY DATE NOTSORTED;
    CLASS IND BIT;
    VAR AGE CODE;
    OUTPUT OUT=IN3;
    run;

will give :

DATE    IND   BIT   _TYPE_   _FREQ_   _STAT_   AGE      CODE
99123         .     0        3        N        2.00     3.0000
99123         .     0        3        MIN      23.00    1000.0000
99123         .     0        3        MAX      32.00    1003.0000
99123         .     0        3        MEAN     27.50    1001.3333
99123         .     0        3        STD      6.36     1.5275
99123         0     1        1        N        1.00     1.0000
99123         0     1        1        MIN      32.00    1003.0000
99123         0     1        1        MAX      32.00    1003.0000
99123         0     1        1        MEAN     32.00    1003.0000
99123         0     1        1        STD      .        .
99123         1     1        2        N        1.00     2.0000
99123         1     1        2        MIN      23.00    1000.0000
99123         1     1        2        MAX      23.00    1001.0000
99123         1     1        2        MEAN     23.00    1000.5000
99123         1     1        2        STD      .        0.7071
99123   N     .     2        1        N        0.00     1.0000
99123   N     .     2        1        MIN      .        1001.0000
99123   N     .     2        1        MAX      .        1001.0000
99123   N     .     2        1        MEAN     .        1001.0000
99123   N     .     2        1        STD      .        .
99123   Y     .     2        2        N        2.00     2.0000
99123   Y     .     2        2        MIN      23.00    1000.0000
99123   Y     .     2        2        MAX      32.00    1003.0000
99123   Y     .     2        2        MEAN     27.50    1001.5000
99123   Y     .     2        2        STD      6.36     2.1213
99123   N     1     3        1        N        0.00     1.0000
99123   N     1     3        1        MIN      .        1001.0000
99123   N     1     3        1        MAX      .        1001.0000
99123   N     1     3        1        MEAN     .        1001.0000
99123   N     1     3        1        STD      .        .
99123   Y     0     3        1        N        1.00     1.0000
99123   Y     0     3        1        MIN      32.00    1003.0000
99123   Y     0     3        1        MAX      32.00    1003.0000
99123   Y     0     3        1        MEAN     32.00    1003.0000
99123   Y     0     3        1        STD      .        .
99123   Y     1     3        1        N        1.00     1.0000
99123   Y     1     3        1        MIN      23.00    1000.0000
99123   Y     1     3        1        MAX      23.00    1000.0000
99123   Y     1     3        1        MEAN     23.00    1000.0000
99123   Y     1     3        1        STD      .        .
123           .     0        2        N        2.00     1.0000
123           .     0        2        MIN      22.00    1003.0000
123           .     0        2        MAX      25.00    1003.0000
123           .     0        2        MEAN     23.50    1003.0000
123           .     0        2        STD      2.12     .
123           0     1        2        N        2.00     1.0000
123           0     1        2        MIN      22.00    1003.0000
123           0     1        2        MAX      25.00    1003.0000
123           0     1        2        MEAN     23.50    1003.0000
123           0     1        2        STD      2.12     .
123     Y     .     2        2        N        2.00     1.0000
123     Y     .     2        2        MIN      22.00    1003.0000
123     Y     .     2        2        MAX      25.00    1003.0000
123     Y     .     2        2        MEAN     23.50    1003.0000
123     Y     .     2        2        STD      2.12     .
123     Y     0     3        2        N        2.00     1.0000
123     Y     0     3        2        MIN      22.00    1003.0000
123     Y     0     3        2        MAX      25.00    1003.0000
123     Y     0     3        2        MEAN     23.50    1003.0000
123     Y     0     3        2        STD      2.12     .

Print

This Proc prints the observations in the dataset using some or all of the variables. It can also print totals and sub-totals for numeric variables.

    Proc print data = person split = '*' label n;
    label name = 'Associates*in the team';
    var name age team weight;
    id name;
    by weight;
    pageby weight;
    sum age;
    sumby weight;
    title 'player details';
    run;

LABEL : variable labels are used as column headings instead of variable names, if this option is used.
N : prints the number of observations in the dataset at the end of the printed output.
SPLIT : splits labels used as column headings across multiple lines where the split character appears. The split character is not printed.
BY : the procedure prints a separate analysis for each BY group. It is required that the dataset be sorted by the BY variable.
PAGEBY : begins printing on a new page when the value of the specified BY variable changes.
SUM : specifies the variables whose values are to be totaled.
VAR : names the variables to be printed.

SQL

The SQL procedure implements the Structured Query Language (SQL) for SAS version 6 onwards.

Sample code

    PROC SQL;
    CONNECT TO DB2 (SSID=&SYS);

CREATE VIEW TEMP AS
SELECT * FROM CONNECTION TO DB2
( SELECT T.ISCT_DT,
         T.ISCT_CRED_USER_NBR,
         T.ISCT_CRED_STR_ID,
         T.ISCT_NBR,
         T.ISCT_TOTL_AMT,
         T.ISCT_CHRG_TYP_CD,
         T.SLTKT_PUR_DT,
         T.SLTKT_PUR_NBR,
         T.ISCT_CHRG_USER_NBR,
         T.ISCT_CHRG_STR_ID,
         /* more variables …… */
  FROM SI&OWN..STORE_TRSF T,
       SI&OWN..VSTORE S,
       SI&OWN..VSTORE B,
       SI&OWN..STR_DLY_RPT D
  WHERE T.ISCT_CRED_USER_NBR = S.USER_NBR
    AND T.ISCT_CRED_STR_ID = S.STR_ID
    AND T.ISCT_CHRG_TYP_CD = 'WTYRPR'
    AND T.ISCT_CRED_USER_NBR = '01'
    AND T.ISCT_CHRG_USER_NBR IN ('01','22')
    AND T.DRPT_ACPT_DT = &ED
    AND T.DRPT_SALE_DT = D.DRPT_SALE_DT
  UNION ALL
  SELECT T.ISCT_DT,
         T.ISCT_CRED_USER_NBR,
         T.ISCT_CRED_STR_ID,
         T.ISCT_CHRG_STR_ID,
         0, 0, 0,
         ' ', ' ', ' ', ' ',
         CURRENT DATE,
         /* more variables …… */
         ''
  FROM SI&OWN..STORE_TRSF T,
       SI&OWN..VSTORE S,
       SI&OWN..VSTORE B,
       SI&OWN..STR_DLY_RPT D,
       ST&OWN..TRSF_RFND_LN R,
       MI&OWN..SKU_VRSN V
  ORDER BY 22, 2, 3, 18, 39, 43 );
%PUT &SQLXRC;
%PUT &SQLXMSG;
DISCONNECT FROM DB2;

%PUT &SQLXRC;
%PUT &SQLXMSG;
QUIT;

DATA TEMP2;
SET TEMP (RENAME=(
/*rename all the variables from input dataset to a 8 character length name before reading in*/
ISCT_DT=ISCTDT ISCT_CRE=CREDUSR ISCT_CR0=CREDSTR ISCT_NBR=ISCTNBR
ISCT_TOT=ISCTAMT ISCT_CHR=CHRGTYPE SLTKT_PU=ORGTKTDT SLTKT_P1=ORIGTKT
SLTKT_PY=PAYTYPE ISCT_CH2=CHGTOUSR ISCT_CH3=CHGTOSTR JNL_ACCT=JACCTNBR
ICRT_NBR=ICRTNBR SRVC_TKT=SERVINV SRVC_CMP=WARREPDT SLMKR_IN=SLMKINTL
DRPT_ACP=DRACPTDT DRPT_SAL=DRDATE ISCT_RFN=RFNREASN ISCT_CUS=CUSTNAME
ISCT_CMN=FFCOMENT EXPRESSN=SKUNBR EXPRES10=SKUSMDES EXPRES11=VCHRAMT
EXPRES12=SLTKTDT EXPRES13=SLTKTNBR EXPRES14=SLTKTTM EXPRES15=ACCTNBR
EXPRES16=VCHRDESC ACCTG_DI=ACCTDIV ACCTG_RE=ACCTREG ACCTG_17=ACCTDIST
ACCTG_GR=ACCTGRP STR_TELE=STAREACD STR_TE18=STEXCHNO STR_TE19=STSTANO
ISCT_C20=CHRTOUSR ISCT_C21=CHRTOSTR EXPRES22=CLASSCD DRPT_MAN=MANFLAG
)) END=LAST;
IF _N_ = 1 THEN DO;
…
END;
RUN;

Another Example:

PROC SQL;
TITLE "JOIN HOUSEHOLDS";
CREATE TABLE WORK2.CBLHHLD AS
SELECT * FROM HHLD1 AS BASEDS
INNER JOIN WORK1.HHVIEW AS APNDS
ON BASEDS.HHLDID=APNDS.HHLDID
ORDER BY BASEDS.HHLDID;
%PUT &SQLXRC;
%PUT &SQLXMSG;
QUIT;

This code is equivalent to the SAS MERGE step as:

DATA WORK2.CBLHHLD;
MERGE HHLD1(IN=BASEDS) WORK1.HHVIEW (IN=APNDS);
BY HHLDID;
IF BASEDS & APNDS THEN OUTPUT WORK2.CBLHHLD;
RUN;

Note: An inner join in PROC SQL (like a join written with a WHERE clause) retrieves only the records whose key values coincide in the two tables, so it cannot return merged observations for key values that occur in only one dataset (or table). Through MERGE, on the other hand, observations whose BY value occurs in only one dataset are also written to the output, with the variables of the other dataset set to missing. So, in order to simulate the SQL statement, the merge criteria should select only the observations for which both indicator (IN=) variables are true.
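The difference between the two approaches can be sketched with two small tables. This is a minimal illustration under stated assumptions: the datasets HHLD1 and HHVIEW below, their variables INCOME and MEMBERS, and the output names ALL_MERGE and ALL_SQL are hypothetical. It shows that a MERGE without the IN= test behaves like a full outer join rather than an inner join.

```sas
/* Hypothetical household tables: HHLDID 3 and 4 each occur in only one of them. */
data hhld1;
   input hhldid income;
   datalines;
1 100
2 200
3 300
;
run;

data hhview;
   input hhldid members;
   datalines;
1 4
2 2
4 5
;
run;

/* MERGE without an IN= test keeps every BY value from either dataset.
   Both inputs are already in HHLDID order, as BY processing requires. */
data all_merge;
   merge hhld1 hhview;
   by hhldid;
run;

/* A full outer join returns the same rows; COALESCE takes the key from
   whichever table supplied it. ORDER BY 1 sorts on the first column. */
proc sql;
   create table all_sql as
   select coalesce(a.hhldid, b.hhldid) as hhldid,
          a.income,
          b.members
   from hhld1 as a
        full join hhview as b
        on a.hhldid = b.hhldid
   order by 1;
quit;

/* PROC COMPARE reports any difference between the two results. */
proc compare base=all_merge compare=all_sql;
run;
```

Adding IF with both IN= variables to the DATA step, and changing FULL JOIN to INNER JOIN, restricts both results to the households present in both tables. The two methods can still differ when a key value is repeated in both inputs, because MERGE pairs observations in sequence while a join forms every combination.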

SAS Macro Language

SAS Macro language is a language in its own right. It is not part of the SAS language proper but can act on the SAS language. Macro statements act on macro expressions in certain ways. Macro expressions include macro variables, constant text, macro functions and macro operators. The simplest macro language object is the macro variable. Finally, the macro is a stored macro language object.

Macro Variables

Macro variables (or symbolic variables) belong to the SAS macro language and are different from Data step variables. A macro variable is a string of characters that is identified by name. A macro variable is independent of any SAS data set. Also, the value of a dataset variable depends on the observation being processed. A macro variable, on the other hand, contains one value that remains constant until explicitly changed. You can define and use macro variables anywhere in a SAS program, except in data lines.

The value of a macro variable is simply a string of characters. The macro processor does not make a distinction between character and numeric values. A null value assigned to a macro variable has a length of 0. So, the macro processor replaces a reference to a null value with 0 characters.

To refer to the macro variable value, the pattern &name (called a macro variable reference) is used. The macro processor resolves references in double quotes but not in single quotes.

%let dsn= Newdata;
TITLE "Display of dataset &dsn";

The simplest way to display macro variables is to use the %PUT statement as,

%put !!!&dsn!!!;

will be resolved as,

%put !!!Newdata!!!;

It is also possible to create macro variable values that contain sections of a SAS program as,

%let plot= %str( proc print data = Newdata; run; );
&plot

The statements have to be enclosed in the %STR() function so that semicolons within the value are part of the text and not the end of the %let statement.

Macros

A macro is stored text identified by name. A macro is an entry in a utility catalog in a library, usually WORK. The SAS System does not support copying or renaming macros.

The %MACRO statement must begin every macro and must contain a name for the macro. The %MEND statement closes every macro. The macro name can also appear after %MEND for clarity. To invoke a macro, place a % in front of its name. A macro variable defined in parentheses in a %MACRO statement is a macro parameter, as

%macro dsn(lvar,fvar);
proc print data =&lvar..&fvar ;
run;
%mend dsn;

%dsn (work,sasdsn)

This pattern is called a macro invocation or a macro call. The macro processor matches the first value to the first macro variable name, the second to the second, etc. Two periods are needed in &lvar..&fvar: the first ends the reference &lvar and the second is the period of the two-level data set name, so this call generates proc print data =work.sasdsn.
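The points above on quoting and on parameters can be tried out in one short program. This is a sketch only: the macro name PRINTDS and its keyword parameter OBS= are hypothetical, and it assumes the SASHELP.CLASS sample data set that is normally installed with SAS.

```sas
/* Macro variable references resolve inside double quotes only. */
%let dsn = class;
title1 "Display of dataset &dsn";   /* the title reads: Display of dataset class */
title2 'Display of dataset &dsn';   /* the title keeps the text &dsn unchanged   */

/* Two positional parameters followed by one keyword parameter with a default. */
%macro printds(lib, mem, obs=5);
   /* &lib..&mem : the first period ends the reference &lib,
      the second is the period of the two-level data set name. */
   proc print data=&lib..&mem(obs=&obs);
   run;
%mend printds;

/* Positional values are matched in order; OBS= keeps its default here. */
%printds(sashelp, class)

/* A keyword parameter is overridden by name. */
%printds(sashelp, class, obs=3)

title;   /* clear the titles again */
```

Setting OPTIONS MPRINT before the calls writes the generated PROC PRINT statements to the log, which is a convenient way to check how the parameters were substituted.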

Some other SAS Products

SAS/CONNECT software is a cooperative processing product that, through connections between remote SAS sessions, provides the ability to transfer data among operating platforms supported by the SAS System. It also provides remote submission capabilities that allow users to submit SAS code to another host for processing.

SAS/AF software is an interactive facility used to create user-friendly windowing applications. Ready-to-use procedures construct menu screens from fill-in-the-blank panels. The software can be used to develop computer-based training courses and on-line help systems.

SAS/ETS software is a component of the SAS System, an applications system for data access, management, analysis, and presentation. SAS/ETS software is an econometric and time series analysis tool for forecasting, planning and financial modeling. There are procedures for time series analysis, linear and non-linear systems simulation, loan amortization, depreciation and row-and-column financial reporting.

SAS/FSP software, a component of the SAS System, includes procedures for full-screen interactive data entry, editing, query and letter writing. With SAS/FSP software, users can generate personalized form letters and reports, create data entry applications that include cross validation of field values and other data manipulation, customize data presentation screens, and transport screens between operating environments.

SAS/GRAPH software is a component of the SAS System. SAS/GRAPH software offers device-intelligent color graphics for producing charts, maps and plots in a variety of patterns. Users can customize graphs with the software, and present multiple graphs on a page.

SAS/OR software is a component of the SAS System. SAS/OR software, an operations research, project management and decision support tool, handles general assignment problems, performs critical path analysis and linear programming, and determines minimum cost flow, maximum flow, and shortest path through a network.

The ACCESS procedure allows you to create access descriptors and view descriptors that can be used to operate on data in a DBMS (like DB2) table or view through the SAS System procedures. The ACCESS procedure creates SAS files of type ACCESS and VIEW that can be used in SAS programs. These files describe a DB2 table or view to the SAS System so that the data contained in the table or view may be read, used, analyzed, and updated by the SAS System.

There are two basic types of files created by the ACCESS procedure: ACCESS and VIEW. An ACCESS file is a file of descriptive information relating to a DB2 table. The ACCESS file describes the data in the table to the SAS System. VIEW files can identify a subset of the data described by the ACCESS file. The data specified in a VIEW file can be used by the SAS System in much the same way as a SAS data set is used. You can also create ACCESS files for DB2 views, thus allowing you to read data from several DB2 tables at once. However, you cannot modify data read through a DB2 view.

You can also access data from many relational DBMSs using the SQL Procedure Pass-Through Facility. The SQL procedure is a base SAS procedure that works with SAS/ACCESS software to send and receive data directly between a DBMS and the SAS System.

The SQL Procedure Pass-Through Facility consists of three statements and one component:

CONNECT statement establishes a connection with the DBMS.
DISCONNECT statement terminates the connection with the DBMS.
EXECUTE statement sends dynamic, non-query SQL statements to the DBMS.
CONNECTION TO component is used in the FROM clause of a PROC SQL SELECT statement to retrieve data directly from a DBMS.

You can use the Pass-Through Facility statements in a PROC SQL query, or you can store your Pass-Through code in a PROC SQL view for later use. When you create the view, any options that you specify in the corresponding CONNECT statement are also stored. Thus, when the PROC SQL view is used in a SAS program, the SAS System can establish the appropriate connection to the DBMS.
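The three statements and the CONNECTION TO component fit together as in the following sketch. It is an illustration under stated assumptions rather than production code: the subsystem ID, the schema MYSCHEMA, the tables WORK_TABLE and SALES and their columns are all hypothetical, and the program needs SAS/ACCESS to DB2 and a reachable DB2 subsystem to run.

```sas
/* Hypothetical DB2 subsystem ID. */
%let sys = DSN1;

proc sql;
   /* CONNECT opens the connection to the DBMS. */
   connect to db2 (ssid=&sys);

   /* EXECUTE passes a non-query statement to DB2 unchanged. */
   execute (delete from myschema.work_table
             where load_dt < current date - 30 days) by db2;
   %put &sqlxrc &sqlxmsg;   /* DBMS return code and message */

   /* CONNECTION TO returns the rows of a query that DB2 itself runs. */
   create table work.recent as
   select * from connection to db2
      (select store_id, sale_dt, sale_amt
         from myschema.sales
        where sale_dt = current date);
   %put &sqlxrc &sqlxmsg;

   /* DISCONNECT closes the connection. */
   disconnect from db2;
quit;
```

The text inside the parentheses is written in the SQL dialect of the DBMS, not of SAS, which is why DB2 expressions such as CURRENT DATE can appear there.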

SAS/QC software, a component of the SAS System, offers a variety of specialized tools for statistical quality improvement applications. Included are procedures for generating Shewhart, cumulative sum and moving average control charts, for performing process capability analysis, and for experimental design.

SAS/SHARE software is a component of the SAS System. SAS/SHARE software provides concurrent update access to SAS files. "Concurrent update access" means that two or more users simultaneously update the same SAS file or SAS data library.

SAS/STAT software is a component of the SAS System. SAS/STAT software, a comprehensive statistical analysis tool, offers a wide range of capabilities including analysis of variance, regression, categorical analysis, multivariate analysis, psychometric analysis, survival analysis, cluster analysis and nonparametric analysis.