
SASTechies

info@sastechies.com
http://www.sastechies.com
data finance.duejan;
set finance.loans;
Interest=amount*(rate/12);
run;

SAS Data Set Finance.Loans

Account    Amount  Rate    Months  Payment
101-1092    22000  0.1000      60   467.43
101-1731   114000  0.0950     360   958.57
101-1289    10000  0.1050      36   325.02
101-3144     3500  0.1050      12   308.52
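The interest calculation above can be tried end to end with a small sketch. WORK.LOANS and WORK.DUEJAN are assumed stand-ins for Finance.Loans and Finance.DueJan, so no FINANCE libref is needed; the data lines are the four rows shown in the table.

```sas
/* Sketch only: WORK.LOANS stands in for Finance.Loans (assumed name). */
data work.loans;
   input Account $ Amount Rate Months Payment;
   datalines;
101-1092 22000 0.1000 60 467.43
101-1731 114000 0.0950 360 958.57
101-1289 10000 0.1050 36 325.02
101-3144 3500 0.1050 12 308.52
;
run;

data work.duejan;
   set work.loans;
   Interest = Amount * (Rate / 12);   /* one month of interest at the annual rate */
run;

proc print data=work.duejan noobs;
   format Interest 10.2;
run;
```

For the first account, one month of interest is 22000 * (0.10 / 12), about 183.33; for the second it is 114000 * (0.095 / 12) = 902.50.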

SAS Techies 2009 11/02/21 2


• Each time the SET statement is executed, SAS reads one observation
into the program data vector. SET reads all variables and all
observations from the input data sets unless you tell SAS to do
otherwise. A SET statement can contain multiple data sets; a DATA step
can contain multiple SET statements.

• SET <SAS-data-set(s) <(data-set-option(s))>> <options>;



data lab23.drug1h;
set research.cltrials;
if placebo='YES';
run;

data lab23.drug1h;
set research.cltrials;
Where placebo='YES';
run;

data lab23.drug1h;
set research.cltrials(where=(placebo='YES'));
run;

data lab23.drug1h;
set A C;
run;
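The subsetting IF, the WHERE statement, and the WHERE= data set option above select the same observations here, but they are applied at different points. The following is a minimal sketch of the difference; WORK.CLTRIALS and its values are made up for illustration and stand in for research.cltrials.

```sas
/* Hypothetical stand-in for research.cltrials. */
data work.cltrials;
   input id placebo $ uricacid;
   datalines;
1 YES 4.1
2 NO 5.0
3 YES 6.3
;
run;

/* WHERE= filters observations before they are read into the PDV. */
data work.where_subset;
   set work.cltrials(where=(placebo='YES'));
run;

/* A subsetting IF is evaluated after the observation is in the PDV,
   so it can also test a variable created earlier in the same step. */
data work.if_subset;
   set work.cltrials;
   high = (uricacid > 5);
   if placebo='YES' and high;
run;
```

WORK.WHERE_SUBSET keeps observations 1 and 3; WORK.IF_SUBSET keeps only observation 3, because the IF condition also uses the computed variable HIGH.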



• If you don't process certain variables and you don't want them to
appear in the new data set, specify them in the DROP= option in the
SET statement.

data lab23.drug1h(drop=placebo);
set research.cltrials(drop=triglycerides uricacid);
if placebo='YES';
run;

• If you do need to process a variable in the original data set (in a
subsetting IF statement, for example), you must specify the variable
in the DROP= option in the DATA statement. Otherwise, the statement
that is using the variable for processing causes an error.

Incorrect (placebo is dropped in the SET statement, before the
subsetting IF can use it):

data lab23.drug1h(drop=placebo);
set research.cltrials(drop=triglycerides uricacid placebo);
if placebo='YES';
run;
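A small runnable sketch of the DROP= placement described above. WORK.CLTRIALS and its values are assumptions used only for illustration.

```sas
/* Hypothetical input data with the variables named on the slide. */
data work.cltrials;
   input id placebo $ triglycerides uricacid;
   datalines;
1 YES 150 4.1
2 NO 180 5.0
;
run;

data work.drug1h(drop=placebo);                     /* dropped on output, after it is used */
   set work.cltrials(drop=triglycerides uricacid);  /* never read into the PDV */
   if placebo='YES';
run;

/* List the variables that remain in the new data set. */
proc contents data=work.drug1h short;
run;
```

Only ID should remain: the two lab variables were never read, and PLACEBO was available to the subsetting IF but left out of the output.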



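proc sort data=a; by num; run;
proc sort data=b; by num; run;

/* Concatenate: all observations of A, then all observations of B */
data sharad;
set a b;
run;

/* Match-merge: combine observations of A and B that share NUM */
data sharad;
merge a b;
by num;
run;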
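To make the difference between SET and MERGE concrete, here is a minimal sketch with two tiny, already sorted data sets. The names WORK.A and WORK.B and their values are made up for illustration.

```sas
data work.a;
   input num x $;
   datalines;
1 a1
3 a3
;
run;

data work.b;
   input num y $;
   datalines;
2 b2
3 b3
;
run;

data work.stacked;        /* 4 observations: all of A, then all of B */
   set work.a work.b;
run;

data work.interleaved;    /* 4 observations in NUM order: 1, 2, 3, 3 */
   set work.a work.b;
   by num;
run;

data work.merged;         /* 3 observations: NUM = 1, 2, 3 */
   merge work.a work.b;
   by num;
run;
```

In WORK.MERGED the observation with NUM=3 carries both X and Y, while in the SET results each observation has a value for only one of them.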
• The DATA step provides a large number of other programming
features for manipulating data sets. For example, you can

◦ use IF-THEN/ELSE logic to control processing based on one or more
conditions
◦ specify additional data set options
◦ perform calculations
◦ create new variables
◦ process variables in arrays
◦ use SAS functions
◦ use special variables such as FIRST. and LAST. to control
processing.

You can also combine SAS data sets in other ways, including
match merging, interleaving, one-to-one merging, and
updating.

DATA output-SAS-data-set;
MERGE SAS-data-set-1 SAS-data-set-2;
BY variable(s);
RUN;

• Match-merging produces an output data set that contains values
from all observations in all input data sets.
• In DATA step match-merging, all data sets to be merged must be
sorted or indexed by the values of the BY variable.
• The common variable must have the same type and length in all data
sets to be merged.

You can specify any number of input data sets in the MERGE
statement.
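The requirements above (sorted inputs, a common BY variable, any number of data sets) can be sketched as follows. The data set names and values are hypothetical.

```sas
data work.demog;
   input id $ age;
   datalines;
A003 24
A001 21
;
run;

data work.visit;
   input id $ weight;
   datalines;
A001 150
A003 120
;
run;

data work.dose;
   input id $ dose;
   datalines;
A003 10
A001 20
;
run;

/* WORK.VISIT is already in ID order; the other two must be sorted first. */
proc sort data=work.demog; by id; run;
proc sort data=work.dose;  by id; run;

data work.all3;
   merge work.demog work.visit work.dose;   /* three inputs in one MERGE */
   by id;
run;
```

WORK.ALL3 has one observation per ID, with AGE, WEIGHT and DOSE side by side.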
 
 



PROC SORT <DATA=SAS-data-set> <OUT=SAS-data-set> <options>;
BY variable(s);
RUN;

Interesting options
-nodupkey
-noduprecs
-where statement

Note: If you don't use the OUT= option, PROC SORT permanently sorts
the data set specified in the DATA= option.

Sample data (two A001 observations with different values):

Obs  ID    Age  Sex  Date
1    A001  21   m    05/22/75
2    A001  32   m    06/15/63
3    A003  24   f    08/17/72
4    A004  .         03/27/69
5    A005  44   f    02/24/52
6    A007  39   m    11/11/57

Sample data (two identical A001 observations):

Obs  ID    Age  Sex  Date
1    A001  21   m    05/22/75
2    A001  21   m    05/22/75
3    A003  24   f    08/17/72
4    A004  .         03/27/69
5    A005  44   f    02/24/52
6    A007  39   m    11/11/57
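A short sketch of PROC SORT with OUT= and NODUPKEY, using a few of the rows shown above. The names WORK.PATIENTS and WORK.UNIQUE_IDS are assumptions.

```sas
data work.patients;
   input ID $ Age Sex $ Date :mmddyy8.;
   format Date mmddyy8.;
   datalines;
A003 24 f 08/17/72
A001 21 m 05/22/75
A001 32 m 06/15/63
A005 44 f 02/24/52
;
run;

/* OUT= writes the sorted copy to a new data set, so WORK.PATIENTS
   keeps its original order. NODUPKEY keeps one observation per ID. */
proc sort data=work.patients out=work.unique_ids nodupkey;
   by ID;
run;
```

WORK.UNIQUE_IDS should contain three observations (A001, A003, A005). Without OUT=, the sorted and de-duplicated result would replace WORK.PATIENTS itself.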



(RENAME=(old-variable-name=new-variable-name))

where
◦ the RENAME= option, in parentheses, follows the name of each data
set that contains one or more variables to be renamed
◦ old-variable-name names the variable to be renamed
◦ new-variable-name specifies the new name for the variable.

You can rename any number of variables in each occurrence of the
RENAME= option.

data clinic.combined;
merge clinic.demog(rename=(date=BirthDate))
      clinic.visit(rename=(date=VisitDate));
by id;
if BirthDate='05Mar2005'd;
rename BirthDate=somedate;
run;

Note: When you rename a variable with the RENAME= data set option on
an input data set, use the new name in that DATA step.
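The following sketch shows why RENAME= matters when two input data sets share a variable name. WORK.DEMOG, WORK.VISIT, the dates, and the DaysBetween variable are all made up for illustration.

```sas
data work.demog;
   input id $ date :date9.;
   format date date9.;
   datalines;
A001 05MAR2005
A002 17AUG1972
;
run;

data work.visit;
   input id $ date :date9.;
   format date date9.;
   datalines;
A001 01JAN2009
A002 15FEB2009
;
run;

data work.combined;
   merge work.demog(rename=(date=BirthDate))
         work.visit(rename=(date=VisitDate));
   by id;
   /* The new names are the ones available inside the step. */
   DaysBetween = VisitDate - BirthDate;
run;
```

Without the two RENAME= options, both inputs would supply a variable called DATE and the value from WORK.VISIT would overwrite the one from WORK.DEMOG.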



(IN=variable)

where
◦ the IN= option, in parentheses, follows the data set name
◦ variable names the variable to be created.

Use
• the IN= data set option to create and name a variable that
indicates whether the data set contributed data to the current
observation
• the subsetting IF statement to check the IN= values and output
only those observations that appear in the data sets for which
IN= is specified.

data combined;
merge clients(in=A)
      Amounts(in=B);
by Name;
if A and B;
run;

Note: If the expression is true for the observation, the current
observation is written to the output data set.
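The IN= variables can also route matched and unmatched observations to separate data sets. This is a minimal sketch; the data values and the output data set names are hypothetical.

```sas
data work.clients;
   input Name $;
   datalines;
Ann
Bob
Cy
;
run;

data work.amounts;
   input Name $ Amount;
   datalines;
Ann 100
Cy 250
Dee 75
;
run;

data work.both work.clients_only work.amounts_only;
   merge work.clients(in=A) work.amounts(in=B);
   by Name;
   if A and B then output work.both;           /* Name found in both inputs */
   else if A then output work.clients_only;    /* only in WORK.CLIENTS */
   else output work.amounts_only;              /* only in WORK.AMOUNTS */
run;
```

With these rows, Ann and Cy go to WORK.BOTH, Bob to WORK.CLIENTS_ONLY, and Dee to WORK.AMOUNTS_ONLY. The IN= variables themselves are temporary and are not written to any output data set.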

• The Compilation Phase: Setting Up the New Data Set

To prepare to merge data sets, SAS software
◦ reads the descriptor portions of data sets listed in the MERGE
statement
◦ reads the remainder of the DATA step program
◦ creates the program data vector (PDV)
◦ assigns a tracking pointer to each data set listed in the MERGE
statement.

If variables with the same name appear in more than one data set,
the variable from the first data set that contains the variable (in
the order listed in the MERGE statement) determines the length of the
variable.
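The length rule above can lead to silent truncation. The sketch below uses two hypothetical one-row data sets whose CODE variable has different lengths; SAS may also write a warning about multiple lengths to the log.

```sas
data work.short;
   length code $ 3;
   id = 1;
   code = 'abc';
run;

data work.long;
   length code $ 8;
   id = 1;
   code = 'abcdefgh';
run;

/* CODE takes its length (3) from WORK.SHORT, the first data set listed,
   so the value read from WORK.LONG is truncated to 'abc'. */
data work.m;
   merge work.short work.long;
   by id;
run;

/* Declaring the length before MERGE sets it explicitly. */
data work.m_fixed;
   length code $ 8;
   merge work.short work.long;
   by id;
run;
```

In WORK.M_FIXED the value 'abcdefgh' survives, because the LENGTH statement is compiled before the MERGE statement.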

• The Execution Phase:
After compiling the DATA step, SAS software sequentially match-merges
observations by moving the pointers down each observation of each
data set and checking to see whether the BY values match.

◦ If Yes, the observations are read into the PDV in the order the
data sets appear in the MERGE statement. (Remember that values of any
like-named variable are overwritten by values of the like-named
variable in subsequent data sets.) SAS software writes the combined
observation to the new data set and retains the values in the PDV
until the BY value changes in all the data sets.

◦ If No, SAS software determines which of the values comes first and
reads the observation containing this value into the PDV. Then the
observation is written to the new data set.

• When the BY value changes in all the input data sets, the PDV is
initialized to missing.

• The DATA step merge continues to process every observation in each
data set until it exhausts all observations in all data sets.
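The retention of PDV values within a BY group is easiest to see in a one-to-many merge. This is a small sketch with hypothetical data sets WORK.ONE and WORK.MANY.

```sas
data work.one;
   input id name $;
   datalines;
1 Ann
2 Bob
;
run;

data work.many;
   input id visit;
   datalines;
1 1
1 2
2 1
;
run;

data work.onemany;
   merge work.one work.many;
   by id;
run;

proc print data=work.onemany noobs;
run;
```

WORK.ONEMANY has three observations. NAME is read once per BY group from WORK.ONE and stays in the PDV, so both ID=1 observations carry Ann; when ID changes to 2, the PDV is reset and Bob is read.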

• Handling Unmatched Observations and Missing Values

By default, all observations written to the PDV, including
observations with missing data and no matching BY values, are written
to the output data set. (If you specify a subsetting IF statement to
select observations, only those that meet the IF condition are
written.)

• If an observation contains missing values for a variable, the
observation in the output data set contains the missing values as
well. Observations with missing values for the BY variable appear at
the top of the output data set.

• If an input data set doesn't have any observations for a given
value of the common variable, the observation in the output data set
contains missing values for the variables unique to that input data
set.
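A minimal sketch of these defaults, using two hypothetical data sets that are already in ID order (a missing numeric value sorts before any number).

```sas
data work.left;
   input id x;
   datalines;
. 5
1 10
2 20
;
run;

data work.right;
   input id y;
   datalines;
1 100
3 300
;
run;

data work.out;
   merge work.left work.right;
   by id;
run;

proc print data=work.out noobs;
run;
```

WORK.OUT has four observations: the one with a missing ID comes first (X=5, Y missing), ID=1 has both X and Y, ID=2 has a missing Y, and ID=3 has a missing X.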

• The DATA step provides a large number of other programming
features for manipulating data sets during match-merging. For
example, you can
◦ use IF-THEN/ELSE logic to control processing based on one or more
conditions
◦ specify additional data set options
◦ perform calculations
◦ create new variables
◦ process variables in arrays
◦ use SAS functions
◦ use special variables such as FIRST. and LAST. to control
processing.



options pageno=1 nodate linesize=80 pagesize=60;
data testfile;
set some;
by Drug Rx;
if first.Drug then TRx=0;
TRx+Rx;
if last.Drug then output;
run;

Input data set Some

Drug  Rx
A     10
A     11
B     11
B     12

Output Testfile (Drug and TRx shown)

Drug  TRx
A     21
B     23

Values of the FIRST. and LAST. variables for the four input
observations:

FIRST.Drug  FIRST.Rx  LAST.Drug  LAST.Rx
1           1         0          1
0           1         1          1
1           1         0          1
0           1         1          1

• When an observation is the first in a BY group, SAS sets the value
of FIRST.variable to 1 for the variable whose value changed, as well
as for all of the variables that follow in the BY statement. For all
other observations in the BY group, the value of FIRST.variable is 0.

• Likewise, if the observation is the last in a BY group, SAS sets
the value of LAST.variable to 1 for the variable whose value changes
on the next observation, as well as for all of the variables that
follow in the BY statement. For all other observations in the BY
group, the value of LAST.variable is 0. For the last observation in a
data set, the values of all LAST.variable variables are set to 1.
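To see the FIRST. and LAST. values for yourself, a PUT statement can copy them to the log while the step runs. This sketch recreates the four input rows in WORK.SOME (an assumed name) and adds KEEP= so the output holds only Drug and TRx.

```sas
data work.some;
   input Drug $ Rx;
   datalines;
A 10
A 11
B 11
B 12
;
run;

data work.testfile(keep=Drug TRx);
   set work.some;
   by Drug Rx;
   /* FIRST. and LAST. variables are temporary, so write them to the log. */
   put Drug= Rx= first.Drug= first.Rx= last.Drug= last.Rx=;
   if first.Drug then TRx = 0;   /* reset the total at the start of each drug */
   TRx + Rx;                     /* sum statement: TRx is retained across observations */
   if last.Drug then output;     /* one observation per drug */
run;
```

The log should show one line per input observation, and WORK.TESTFILE should hold the totals 21 for drug A and 23 for drug B.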

System options
• Apply to the data sets and output for the entire session.
• Can be overridden by data set options.
• Can be declared anywhere except within the data lines of a
DATALINES/CARDS statement.
• Ex: options compress=yes obs=max;
• Ex:
options compress=no obs=max;
data new(compress=yes);  /* COMPRESS= applies to the output data set */
set cool(obs=10);        /* OBS= applies to the input data set */
run;

Data set options
• Apply to that particular data set only.
• CANNOT be overridden by system options.
• Can be declared only in parentheses after the data set name.
Multiple options are separated by spaces, not commas.
• Ex:
data new(compress=yes);
set cool(obs=10);
run;
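The precedence rule above can be checked with the OBS= option, which exists both as a system option and as a data set option. The sketch below is illustrative; WORK.COOL and the other data set names are assumptions.

```sas
/* Build a 100-observation test data set. */
data work.cool;
   do i = 1 to 100;
      output;
   end;
run;

options obs=5;              /* system option: applies to every step that follows */

data work.sys_limit;        /* reads 5 observations */
   set work.cool;
run;

data work.ds_limit;         /* reads 10: the data set option overrides the system option */
   set work.cool(obs=10);
run;

options obs=max;            /* restore the default for the rest of the session */
```

Resetting OBS= at the end matters because a system option stays in effect until it is changed or the session ends.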

• GOTO label;

◦ The GOTO statement tells SAS to jump immediately to the statement
label that is indicated in the GOTO statement and to continue
executing statements from that point until a RETURN statement is
executed.
◦ A RETURN statement after a GOTO statement returns execution to the
beginning of the next DATA step iteration.

• LINK label;

◦ The LINK statement tells SAS to jump immediately to the statement
label that is indicated in the LINK statement and to continue
executing statements from that point until a RETURN statement is
executed.
◦ The RETURN statement sends program control to the statement
immediately following the LINK statement.



GOTO Statement

data info;
input x;
if 1<=x<=5 then goto add;
put x=;
return;

add:
sumx+x;
return;
datalines;
7
4
323
;
run;

LINK Statement

data hydro;
input type $ depth station $;
if type ='aluv' then link calcu;
date=today();
return;

calcu:
if station='site_1' then elevatn=6650-depth;
else if station='site_2' then elevatn=5500-depth;
return;
datalines;
aluv 523 site_1
uppa 234 site_2
aluv 666 site_2
... more data lines...
;
run;
