
Combining Data Sets
Concatenating, Interleaving, and Merging Data Sets
Combining Data Sets Vertically
Definition of Concatenating

The SET statement is used for concatenation
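A minimal sketch of concatenation with the SET statement. The data values, the variable Name, and the choice of which data set lacks HomePhone are assumptions made for illustration; only the data set names Sales and Customer_support and the variable HomePhone come from the slides.

```sas
/* Hypothetical data: Sales has HomePhone, Customer_support does not */
data sales;
  input Name $ HomePhone $;
  datalines;
Alice 555-0101
Bob 555-0102
;
run;

data customer_support;
  input Name $;
  datalines;
Carol
;
run;

/* Concatenate: all observations from sales, then all from customer_support */
data all_staff;
  set sales customer_support;
run;

proc print data=all_staff;
run;
```

In this sketch the observation that comes from customer_support should show a missing (blank) HomePhone, because that variable does not exist in that data set.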


Definition of Interleaving

Interleaving involves using both SET and BY statements
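A minimal sketch of interleaving, assuming two hypothetical data sets (year1 and year2) that share the variables Project and Hours. The names and values are illustrative only; the use of Project as the BY variable follows the "Sorted by Project" labels in the slides.

```sas
/* Hypothetical data sets with the same variables */
data year1;
  input Project $ Hours;
  datalines;
P03 25
P01 10
;
run;

data year2;
  input Project $ Hours;
  datalines;
P02 15
P03 30
;
run;

/* Each input data set must be sorted by the BY variable first */
proc sort data=year1;
  by Project;
run;

proc sort data=year2;
  by Project;
run;

/* Interleave: SET plus BY keeps the combined data in Project order */
data interleaved;
  set year1 year2;
  by Project;
run;
```

Without the BY statement the same SET statement would simply concatenate the two data sets, placing every year1 observation before every year2 observation.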
Combining Data Sets Side By Side
Important
Definition of Merging

The BY statement ensures that data sets are merged
according to the values of the BY variable, i.e. YEAR

Data sets matched by observation number
Data sets matched by values of YEAR
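The difference between the two kinds of matching can be sketched as follows. The data sets revenue and costs and their values are hypothetical; only the BY variable Year comes from the slides. Both data sets are entered already sorted by Year.

```sas
/* Hypothetical data: costs has no observation for 2002 */
data revenue;
  input Year Revenue;
  datalines;
2001 100
2002 120
2003 150
;
run;

data costs;
  input Year Cost;
  datalines;
2001 80
2003 110
;
run;

/* No BY statement: observations are paired by position (1st with 1st, ...) */
data by_position;
  merge revenue costs;
run;

/* With BY: observations are paired only when Year matches */
data by_year;
  merge revenue costs;
  by Year;
run;
```

In by_position the second observation pairs the 2002 revenue with the 2003 cost, which is rarely what is intended. In by_year the 2002 observation keeps its revenue and gets a missing Cost.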
Sales

Customer_support
Note: The variable ‘HomePhone’ is not in this data set
Observations from data set SECURITY
Missing values
(Both data sets sorted by Project)
Both data sets must be sorted by the same variable before interleaving

Common By-variable
CHAPTER 18

Merging combines observations from two or more SAS data sets into a single observation in a new SAS data set.
Important

Data Company

The only ID variable is Name

Data Finance

The ID variables are Name & IdNumber


* Sort both data sets by the common variable Name;
proc sort data=company;
  by name;
run;
proc sort data=finance;
  by name;
run;

Data Employee_info
Finance

Repertory
(Both data sets are sorted by IdNumber)
Finance, Company, and Repertory have no variable common to all three data sets,
but pairs of data sets do have common variables

Name is the common variable

Same data sorted with a different BY variable in preparation for another merge

IdNumber is the common variable
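The two-step merge described above can be sketched like this. The data set names Company, Finance, Repertory, and Employee_info and the variables Name and IdNumber come from the slides; the other variables (Age, Salary, Play), all data values, and the final data set name all_info are assumptions for illustration.

```sas
/* Hypothetical data */
data company;
  input Name $ Age;
  datalines;
Morgan 34
Avery 41
;
run;

data finance;
  input IdNumber $ Name $ Salary;
  datalines;
A102 Morgan 52000
A101 Avery 61000
;
run;

data repertory;
  input Play $ IdNumber $;
  datalines;
Hamlet A101
Macbeth A102
;
run;

/* Step 1: Company and Finance share Name */
proc sort data=company;
  by Name;
run;

proc sort data=finance;
  by Name;
run;

data employee_info;
  merge company finance;
  by Name;
run;

/* Step 2: re-sort by IdNumber, the variable shared with Repertory */
proc sort data=employee_info;
  by IdNumber;
run;

proc sort data=repertory;
  by IdNumber;
run;

data all_info;
  merge employee_info repertory;
  by IdNumber;
run;
```

The intermediate data set has to be re-sorted because a match-merge expects every input data set to be in order of the BY variable used in that step.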


Sometimes we need to combine one observation from one data set with many observations from another.

Example: A shoe store wants to offer a discount on certain types of shoes

Data Shoes: Style, Type, RegularPrice
Data Discount: Type, Adjustment
Both data sets must first be sorted by the variable Type

proc sort data=Shoes;
  by Type;
run;

proc sort data=Discount;
  by Type;
run;

* One-to-many merge;
data Prices;
  merge Shoes Discount;
  by Type;
  * Apply the discount and round to the nearest cent;
  NewPrice = round(RegularPrice - Adjustment * RegularPrice, 0.01);
run;

Homework: Look up the ROUND function in the SAS Help documentation
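To try the one-to-many merge end to end, a self-contained sketch with made-up data may help. The style names, types, prices, and adjustment rates below are hypothetical; the data set and variable names follow the slides.

```sas
/* Hypothetical data: several styles per type */
data shoes;
  input Style $ Type $ RegularPrice;
  datalines;
Runner1 running 100.00
Walker1 walking 59.95
Runner2 running 79.99
;
run;

/* Hypothetical data: one adjustment per type */
data discount;
  input Type $ Adjustment;
  datalines;
running 0.30
walking 0.20
;
run;

proc sort data=shoes;
  by Type;
run;

proc sort data=discount;
  by Type;
run;

/* Each discount observation is matched to every shoe of that type */
data prices;
  merge shoes discount;
  by Type;
  NewPrice = round(RegularPrice - Adjustment * RegularPrice, 0.01);
run;

proc print data=prices;
run;
```

The second argument of ROUND is the rounding unit, so 0.01 rounds to the nearest cent. For the 79.99 style, for instance, the unrounded discounted price is 55.993, which rounds to 55.99.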


Find the share of each shoe type in total sales

Data Shoes (quarterly shoe sales): Style, Type, Sales


Calculating share in sales


Data ShoeSummary
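One common way to compute such shares is to summarize the data with PROC MEANS and merge the summary back onto the original observations. This is a sketch under assumptions: the data values, the summary data set name summarydata, and the variables Total and Percent are hypothetical, and the calculation shown is each style's share of the sales for its own type.

```sas
/* Hypothetical quarterly sales data */
data shoes;
  input Style $ Type $ Sales;
  datalines;
Runner1 running 1200
Walker1 walking 800
Runner2 running 600
;
run;

proc sort data=shoes;
  by Type;
run;

/* One summary observation per Type, holding the total sales for that type */
proc means data=shoes noprint;
  var Sales;
  by Type;
  output out=summarydata sum(Sales)=Total;
run;

/* One-to-many merge: the type total is attached to every style of that type */
data shoesummary;
  merge shoes summarydata(keep=Type Total);
  by Type;
  Percent = Sales / Total * 100;
run;
```

The KEEP= option drops the automatic _TYPE_ and _FREQ_ variables that PROC MEANS writes to the output data set.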
In this case there is no common variable between data sets

SAS Program syntax

Read only in the first iteration and retained


Data used in example: Style, Type, Sales
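When the summary has a single observation and no variable in common with the original data, a usual pattern is to read it once with a conditional SET statement. The sketch below assumes hypothetical data values and the hypothetical names summarydata, GrandTotal, and Percent.

```sas
/* Hypothetical quarterly sales data */
data shoes;
  input Style $ Type $ Sales;
  datalines;
Runner1 running 1200
Walker1 walking 800
Runner2 running 600
;
run;

/* One observation holding the grand total of Sales (no BY statement) */
proc means data=shoes noprint;
  var Sales;
  output out=summarydata sum(Sales)=GrandTotal;
run;

data shoesummary;
  /* Read the grand total only in the first iteration of the DATA step */
  if _n_ = 1 then set summarydata(keep=GrandTotal);
  set shoes;
  Percent = Sales / GrandTotal * 100;
run;
```

Variables read with a SET statement are retained automatically, so GrandTotal stays available for every observation of shoes even though it is read only once.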
