
Concatenating SAS Data Sets

Objectives
• Define concatenation.
• Use the SET statement in a DATA step to concatenate two or more SAS data sets.
• Use the RENAME= data set option to change the names of variables.
• Use the SET and BY statements in a DATA step to interleave two or more SAS data sets.

Concatenating SAS Data Sets
Use the SET statement in a DATA step to concatenate
SAS data sets.

DATA SAS-data-set ;
SET SAS-data-set1 SAS-data-set2 . . .;
<other SAS statements>
RUN;

Concatenating SAS Data Sets
You can read any number of SAS data sets with a single
SET statement.

Input SAS data sets: work.jan, work.feb, work.mar

data work.qtr1;
   set work.jan work.feb
       work.mar;
run;

Output SAS data set: work.qtr1 (the observations from jan, then feb, then mar)

Business Task
Two SAS data sets, na1 and na2, contain data for newly
hired navigators. Concatenate the data sets into a new
data set named newhires.

na1
Name    Gender  JobCode
TORRES  M       NA1
LANG    F       NA1
SMITH   F       NA1

na2
Name    Gender  JobCode
LISTER  M       NA2
TORRES  F       NA2

Concatenating SAS Data Sets: Execution
When SAS reaches end-of-file on the last data set,
DATA step execution ends.

newhires
Name Gender JobCode
TORRES M NA1
LANG F NA1
SMITH F NA1
LISTER M NA2
TORRES F NA2
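
The slide shows the result but not the step that produces it. A minimal sketch follows, assuming na1 and na2 live in the WORK library; the DATALINES steps are only a hypothetical setup that recreates the values shown above.

```sas
/* Hypothetical setup: recreate na1 and na2 from the slide values. */
data na1;
   input Name $ Gender $ JobCode $;
   datalines;
TORRES M NA1
LANG F NA1
SMITH F NA1
;
run;

data na2;
   input Name $ Gender $ JobCode $;
   datalines;
LISTER M NA2
TORRES F NA2
;
run;

/* Concatenate: SAS reads every observation from na1, then every
   observation from na2, and stops at end-of-file on na2. */
data newhires;
   set na1 na2;
run;

/* List the combined data set to check the order of observations. */
proc print data=newhires noobs;
run;
```

Because both inputs use the same variable names, newhires keeps exactly three variables.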

Business Task
Two SAS data sets, fa1 and fa2, contain data for newly
hired flight attendants. Concatenate the data sets into a
new data set named newfa.
fa1
Name   Gender  JobCode
KENT   F       FA1
PATEL  M       FA1
JONES  F       FA1

fa2
Name   JCode  Gender
LOPEZ  FA2    F
GRANT  FA2    F

Concatenating SAS Data Sets: Execution
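data newfa;
   set fa1 fa2;
run;

newfa
Name   Gender  JobCode  JCode
KENT   F       FA1
PATEL  M       FA1
JONES  F       FA1
LOPEZ  F                FA2
GRANT  F                FA2

Because the job code is named JobCode in fa1 and JCode in fa2,
newfa contains both variables, each with missing values for the
observations read from the other data set.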

The RENAME= Data Set Option
You can use a RENAME= data set option to change the
name of a variable.

General form of the RENAME= data set option:

SAS-data-set(RENAME=(old-name-1=new-name-1
                     old-name-2=new-name-2
                     . . .
                     old-name-n=new-name-n))

The RENAME= Data Set Option
data newfa;
set fa1 fa2(rename=(JCode=JobCode));
run;

newfa
Name   Gender  JobCode
KENT   F       FA1
PATEL  M       FA1
JONES  F       FA1
LOPEZ  F       FA2
GRANT  F       FA2

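To try the RENAME= option end to end, the sketch below recreates fa1 and fa2 in the WORK library (a hypothetical setup based on the slide values) and then checks the variable list of the result with PROC CONTENTS.

```sas
/* Hypothetical setup: recreate fa1 and fa2 from the slide values. */
data fa1;
   input Name $ Gender $ JobCode $;
   datalines;
KENT M FA1
;
run;

/* Replace the placeholder above with the full slide data. */
data fa1;
   input Name $ Gender $ JobCode $;
   datalines;
KENT F FA1
PATEL M FA1
JONES F FA1
;
run;

/* fa2 stores the job code under a different name, JCode. */
data fa2;
   input Name $ JCode $ Gender $;
   datalines;
LOPEZ FA2 F
GRANT FA2 F
;
run;

/* RENAME= is applied as fa2 is read, so the DATA step sees
   only one job code variable, JobCode. */
data newfa;
   set fa1 fa2(rename=(JCode=JobCode));
run;

/* Confirm that newfa has Name, Gender, and JobCode, with no JCode. */
proc contents data=newfa;
run;
```

The option changes the name only for this DATA step; fa2 itself still contains JCode.
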
Interleaving SAS Data Sets
Use the SET statement with a BY statement in a DATA
step to interleave SAS data sets.
General form of a DATA step interleave:

DATA SAS-data-set;
SET SAS-data-set1 SAS-data-set2 . . .;
BY BY-variable;
<other SAS statements>
RUN;

Interleaving SAS Data Sets
Interleaving combines SAS data sets so that the observations
in the resulting data set are in order by the BY variable.
Each input data set must already be sorted by the BY variable.

ia.miamiemp
ID   Salary
109  36000
171  54000

ia.parisemp
ID   Salary
083  87000
217  42000

ia.romeemp
ID   Salary
059  60000
154  88000

data work.allemp;
   set ia.miamiemp
       ia.parisemp
       ia.romeemp;
   by ID;
run;

work.allemp
ID   Salary
059  60000
083  87000
109  36000
154  88000
171  54000
217  42000

Interleaving SAS Data Sets
fa1
Name   Gender  JobCode
JONES  F       FA1
KENT   F       FA1
PATEL  M       FA1

fa2
Name   JCode  Gender
GRANT  FA2    F
LOPEZ  FA2    F

data newfa;
   set fa1 fa2(rename=(JCode=JobCode));
   by Name;
run;

newfa
Name   Gender  JobCode
GRANT  F       FA2
JONES  F       FA1
KENT   F       FA1
LOPEZ  F       FA2
PATEL  M       FA1

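An interleave only works when every input is already in BY-variable order. The sketch below assumes the inputs start out unsorted and sorts them first; the data values are recreated from the slides, and the names fa1_sorted and fa2_sorted are hypothetical.

```sas
/* Hypothetical setup: fa1 and fa2 in no particular order. */
data fa1;
   input Name $ Gender $ JobCode $;
   datalines;
KENT F FA1
PATEL M FA1
JONES F FA1
;
run;

data fa2;
   input Name $ JCode $ Gender $;
   datalines;
LOPEZ FA2 F
GRANT FA2 F
;
run;

/* Sort each input by the BY variable; OUT= leaves the originals intact. */
proc sort data=fa1 out=fa1_sorted;
   by Name;
run;

proc sort data=fa2 out=fa2_sorted;
   by Name;
run;

/* Interleave: SET plus BY writes the observations in Name order. */
data newfa;
   set fa1_sorted fa2_sorted(rename=(JCode=JobCode));
   by Name;
run;

proc print data=newfa noobs;
run;
```

Without the PROC SORT steps, the DATA step would stop with an error reporting that BY variables are not properly sorted.
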
Merging SAS Data Sets
Objectives
• Prepare data for merging using the SORT procedure and data set options.
• Merge SAS data sets on a single common variable.

Merging SAS Data Sets
Use the MERGE statement in a DATA step to join
corresponding observations from two or more SAS data
sets.

DATA SAS-data-set;
MERGE SAS-data-sets;
BY BY-variable(s);
<other SAS statements>
RUN;

Merging SAS Data Sets
You can read any number of SAS data sets with a single
MERGE statement.

SAS data sets: costs, sales, goals, taxes

data compare;
   merge costs sales goals taxes;
   by Month;
run;

compare contains the variables from costs, sales, goals, and
taxes, matched by Month.

Business Task

International Airlines is comparing monthly sales performance
to monthly sales goals.

The sales and goals data are stored in separate SAS data sets.

Business Task
To calculate the difference between revenues and goals, the
performance and goals data sets must be merged.

ia.performance
Month  Sales
1      2118223
2      1960034

ia.goals
Month  Goal
1      2130000
2      1920000

Match-merge the data sets by Month and compute the difference
between the variable values for Sales and Goal.

ia.compare
Month  Sales    Goal     Difference
1      2118223  2130000  -11777
2      1960034  1920000   40034

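The task above can be sketched as a single DATA step. This sketch assumes Difference is defined as Sales minus Goal, which matches the values in ia.compare, and it builds the inputs in the WORK library (hypothetical stand-ins for ia.performance and ia.goals) so that it runs without the ia libref.

```sas
/* Hypothetical stand-ins for ia.performance and ia.goals. */
data performance;
   input Month Sales;
   datalines;
1 2118223
2 1960034
;
run;

data goals;
   input Month Goal;
   datalines;
1 2130000
2 1920000
;
run;

/* Match-merge by Month, then compute the difference per month. */
data compare;
   merge performance goals;
   by Month;
   Difference = Sales - Goal;   /* assumption: Sales minus Goal */
run;

proc print data=compare noobs;
run;
```

Both inputs are already in Month order here, so no sort is needed before the merge.
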
Business Task
Merge two data sets to acquire the names of the German
crew who are scheduled to fly next week.

ia.gercrew
EmpID   LastName
E00632  STRAUSS
E01483  SCHELL-HAUNGS
E01996  WELLHAEUSSER
E04064  WASCHK

ia.gersched
EmpID   FlightNum
E04064  5105
E00632  5250
E01996  5501

To match-merge the data sets by EmpID, the data sets must be
ordered by EmpID.
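
Of the two data sets, ia.gersched is the one that is not in EmpID order. A minimal sketch of the preparation step follows, assuming the libref ia is already assigned; writing the sorted copy to work.gersched is a choice made here to leave the original data set unchanged.

```sas
/* Sort the schedule by the BY variable before merging.
   OUT= writes the sorted copy to WORK instead of replacing
   ia.gersched (assumes the libref ia is already assigned). */
proc sort data=ia.gersched out=work.gersched;
   by EmpID;
run;
```

ia.gercrew is already ordered by EmpID in the listing above, so it can be merged as it is.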

Merging SAS Data: Execution
ia.gercrew
EmpID   LastName
E00632  STRAUSS
E01483  SCHELL-HAUNGS
E01996  WELLHAEUSSER
E04064  WASCHK

work.gersched
EmpID   FlightNum
E00632  5250
E01996  5501
E04064  5105

data work.nextweek;
   merge ia.gercrew work.gersched;
   by EmpID;
run;

work.nextweek
EmpID   LastName       FlightNum
E00632  STRAUSS        5250
E01483  SCHELL-HAUNGS
E01996  WELLHAEUSSER   5501
E04064  WASCHK         5105

Eliminating Nonmatches
Exclude from the data set crew members who are not
scheduled to fly next week.

ia.gercrew
EmpID   LastName
E00632  STRAUSS
E01483  SCHELL-HAUNGS
E01996  WELLHAEUSSER
E04064  WASCHK

work.gersched
EmpID   FlightNum
E00632  5250
E01996  5501
E04064  5105

The IN= Data Set Option
Use the IN= data set option to determine which data
set(s) contributed to the current observation.

General form of the IN= data set option:

SAS-data-set(IN=variable)

variable is a temporary numeric variable that has two possible
values:
0   indicates that the data set did not contribute to the
    current observation.
1   indicates that the data set did contribute to the current
    observation.

The IN= Data Set Option
ia.gercrew
EmpID   LastName
E00632  STRAUSS
E01483  SCHELL-HAUNGS
E01996  WELLHAEUSSER
E04064  WASCHK

work.gersched
EmpID   FlightNum
E00632  5250
E01996  5501
E04064  5105

data work.combine;
   merge ia.gercrew(in=InCrew)
         work.gersched(in=InSched);
   by EmpID;
run;

PDV (D = dropped: InCrew and InSched are not written to the
output data set)
EmpID   LastName  FlightNum  InCrew (D)  InSched (D)
E00632  STRAUSS   5250       1           1

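The IN= variables can then drive a subsetting IF statement that eliminates nonmatches. The sketch below recreates the two data sets in the WORK library (hypothetical stand-ins for ia.gercrew and work.gersched) and reads FlightNum as a character value, which is an assumption because the slides do not state its type.

```sas
/* Hypothetical stand-ins for ia.gercrew and work.gersched. */
data gercrew;
   input EmpID $ LastName :$13.;
   datalines;
E00632 STRAUSS
E01483 SCHELL-HAUNGS
E01996 WELLHAEUSSER
E04064 WASCHK
;
run;

data gersched;
   input EmpID $ FlightNum $;   /* assumption: character flight number */
   datalines;
E00632 5250
E01996 5501
E04064 5105
;
run;

/* Keep only observations to which both data sets contributed. */
data combine;
   merge gercrew(in=InCrew)
         gersched(in=InSched);
   by EmpID;
   if InCrew=1 and InSched=1;   /* subsetting IF: drops nonmatches */
run;

proc print data=combine noobs;
run;
```

With these values, E01483 appears only in the crew data set, so it is the one observation left out of combine.
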
Other Merges
In addition to one-to-one merges, the DATA step merge
works with many other kinds of data combinations:

one-to-many    unique BY values are in one data set and
               duplicate matching BY values are in the
               other data set.

many-to-many   duplicate matching BY values are in both
               data sets.

One-to-Many Merging
work.one
X  Y
1  A
2  B
3  C

work.two
X  Z
1  A1
1  A2
2  B1
3  C1
3  C2

data work.three;
   merge work.one work.two;
   by X;
run;

work.three
X  Y  Z
1  A  A1
1  A  A2
2  B  B1
3  C  C1
3  C  C2

Many-to-Many Merging
work.one
X  Y
1  A1
1  A2
2  B1
2  B2

work.two
X  Z
1  AA1
1  AA2
1  AA3
2  BB1
2  BB2

data work.three;
   merge work.one work.two;
   by X;
run;

work.three
X  Y   Z
1  A1  AA1
1  A2  AA2
1  A2  AA3
2  B1  BB1
2  B2  BB2

Exercises

1. Concatenating SAS Data Sets
2. Merging SAS Data Sets
3. Identifying Data Set Contributors