The workhorses: matched merge & concatenation.
data twothree; merge two three; run;
Forgot the BY statement: this yields a one-to-one merge instead of a matched merge.
A one-to-one merge is caught with options mergenoby=error;
Error messages in the LOG help to find mistakes.
Not all problems yield error messages: a many-to-many merge produces only the note about repeats of BY values.
A serious problem can occur without any message in the LOG, for example data loss by truncation due to unequal variable lengths.
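The points above can be sketched as follows. This is a minimal illustration, assuming both datasets share a key variable named id (as in the example that follows) and are not yet sorted by it.

```sas
/* Turn a MERGE without a BY statement into an error instead of
   a silent one-to-one merge. */
options mergenoby=error;

/* A matched merge needs both inputs sorted by the BY variable. */
proc sort data=two;   by id; run;
proc sort data=three; by id; run;

/* Matched merge: observations are paired on id. */
data twothree;
  merge two three;
  by id;
run;
```

With the option set, dropping the by id; line makes the step fail rather than pair observations by position.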
Example: Concatenation
Concatenation of 2 datasets named two and three. Each input dataset contains:
A numeric variable named id.
A character variable with the same name, var, in both datasets.
two:var has a length of $2; all values begin with the character 2.
three:var has a length of $3; all values begin with the character 3.
data twothree; set two three; run;
Note: the output dataset name indicates the order of the input datasets in the SET statement.
Dataset twothree

id  var  from
 1  21   two
 2  23   two
 3  24   two
 4  25   two
 5  27   two
 6  2g   two
 7  2k   two
 0  3k   three
 1  31   three
 2  35   three
 3  34   three
 4  35   three
 8  37   three
 9  3g   three
In output: twothree: id=1 has var=21 and var=31.
In inputs: two: id=1 var=21; three: id=1 var=31a.
The value of var in twothree that came from three lacks the a. Why?
Explanation:
Variable length in the output is determined by the variable length in the first (leftmost) input dataset in the SET statement, two:var $2.
31a does not fit into a character variable of length $2, so the a is lost through truncation.
The LOG does not mention the data loss.
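The truncation can be reproduced with a short sketch. The values are the ones listed in this example (the full three-character values of three:var are those shown in the threetwo listing). The DATALINES layout is an assumption made for illustration.

```sas
/* two: var is declared with length $2 */
data two;
  length id 8 var $2;
  input id var $;
  datalines;
1 21
2 23
3 24
4 25
5 27
6 2g
7 2k
;
run;

/* three: var is declared with length $3 */
data three;
  length id 8 var $3;
  input id var $;
  datalines;
0 3ks
1 31a
2 35a
3 34a
4 350
5 370
9 3gv
;
run;

/* two is leftmost, so var takes length $2 in the output
   and the third character of every value from three is dropped. */
data twothree;
  set two three;
run;

/* Inspect the stored length of var and the resulting values. */
proc contents data=twothree; run;
proc print data=twothree; run;
```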
What would happen if we reverse the order of the datasets in the SET statement?
data threetwo; set three two; run;
Dataset threetwo

id  var  from
 0  3ks  three
 1  31a  three
 2  35a  three
 3  34a  three
 4  350  three
 8  370  three
 9  3gv  three
 1  21   two
 2  23   two
 3  24   two
 4  25   two
 5  27   two
 6  2g   two
 7  2k   two
Merge: for variables with the same name, data from the last dataset overwrites data from the first dataset.
Usually you rename variables with the same name to avoid overwriting (except BY variables).
Maybe you want matched values to overwrite, so you do not rename.
id  var
 1  31
 2  35
 3  34
 4  35
 6  2g
 7  2k
 8  37
 9  3g
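Renaming before a matched merge might look like the sketch below. The names var_two and var_three are hypothetical, chosen only for illustration, and both inputs are assumed to be sorted by id already.

```sas
/* Rename the like-named variable on the way in so neither value
   overwrites the other; the BY variable id keeps its name. */
data twothree;
  merge two  (rename=(var=var_two))
        three(rename=(var=var_three));
  by id;
run;
```

Because each renamed variable keeps the length from its own dataset, this form also sidesteps the truncation shown in the concatenation example.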
Length of variable:
Determined by the first, or leftmost, dataset in the MERGE statement.
Again, the LOG is silent on data loss through truncation.
What to do?
Solutions:
Complain to SAS.
Until SAS corrects this, use a macro before a merge or concatenation to check for unequal character variable lengths.
If any are found, re-size the variables.
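One common way to re-size is a LENGTH statement placed before the SET (or MERGE) statement. A minimal sketch, assuming the longer of the two lengths in the example ($3) is the one wanted:

```sas
/* Declaring var before SET fixes its length at $3, regardless of
   which input dataset is listed first. */
data twothree;
  length var $3;
  set two three;
run;
```

A side effect of this approach is that var becomes the first variable in the output dataset, since it is the first one the DATA step encounters.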
Macro: verifyVariables.sas
Invoke before merge or concatenation.
%verifyVariables(two,three,set);
data twothree; set two three; run;
Runs PROC CONTENTS on two and three and outputs the results to datasets.
Merges those datasets and uses a DATA step to find like-named character variables.
If they have different lengths, puts an ERROR statement into the log.
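The source of verifyVariables.sas is not shown here. The sketch below is not that macro but a comparable check, and it swaps in a different technique: it reads the DICTIONARY.COLUMNS table through PROC SQL instead of running PROC CONTENTS. It assumes both datasets live in the WORK library; the dataset name length_mismatch is hypothetical.

```sas
/* Find like-named character variables whose lengths differ. */
proc sql;
  create table length_mismatch as
  select a.name,
         a.length as length_two,
         b.length as length_three
  from dictionary.columns as a
       inner join dictionary.columns as b
       on upcase(a.name) = upcase(b.name)
  where a.libname = 'WORK' and a.memname = 'TWO'
    and b.libname = 'WORK' and b.memname = 'THREE'
    and a.type = 'char' and b.type = 'char'
    and a.length ne b.length;
quit;

/* Write one ERROR line to the log per mismatched variable. */
data _null_;
  set length_mismatch;
  put 'ERROR: character variable ' name 'has length ' length_two
      'in two and ' length_three 'in three.';
run;
```

If length_mismatch has no rows, the second step writes nothing and the concatenation or merge can proceed.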