
Functionality of PROC SQL in SAS

Anne Wolfley
Senior Programmer/Analyst
February 9, 2015

Topics for today's presentation

What to expect from this presentation


Why PROC SQL?
Syntax
Aliases
Joins and unions
Case logic
Summary functions
Macro variables
Resources and references

All Rights Reserved, Duke Medicine 2007

What to expect from this presentation


Basic information about several topics
Code to get you started
Resources and references so you can explore on
your own


Why PROC SQL?


The power of SQL with SAS functionality
Simplest way to do a many-to-many merge correctly (a data step MERGE does not produce every combination of matching rows)
Don't need to sort datasets in order to merge (join) them
Quick reporting
Summary functions are quick and easy
Create data-driven macro variables on the fly (great
with summary functions!)


Basic syntax
proc sql;
create table output-dataset as
select comma-separated-variable-list
from input-dataset
where condition
order by variable-list;
quit;

Use CREATE TABLE if you want an output dataset. Omit it if you just want to print to your .lst file.

Quick report
proc sql;
select name, age, height, weight
from sashelp.class
where sex='F'
order by name;
quit;
------------------------------------------------
proc sort data=sashelp.class out=class; by name; run;
proc print data=class noobs;
var name age height weight;
where sex='F';
run;

Quick report - Results


Name      Age  Height  Weight
------------------------------
Alice      13    56.5      84
Barbara    13    65.3      98
Carol      14    62.8   102.5
Jane       12    59.8    84.5
Janet      15    62.5   112.5
Joyce      11    51.3    50.5
Judy       14    64.3      90
Louise     12    56.3      77
Mary       15    66.5     112

Quick note: Using SAS functionality to achieve the same results
proc sql;
select name, age, height, weight
from sashelp.class (where=(sex='F'))
order by name;
quit;

SELECT UNIQUE / SELECT DISTINCT:


Selects only the unique values of variables in the select clause.
proc sql;
select unique sex
from sashelp.class;
quit;

OR
proc sql;
select distinct sex
from sashelp.class;
quit;


SELECT UNIQUE - Results


Sex
---
F
M

Joins
PROC SQL join = SAS data step merge*
Types of joins
Full join: All observations from both datasets
Inner join: Observations matched in both datasets
Left join: All observations from the left dataset +
matching observations from the right dataset
Right join: All observations from the right dataset +
matching observations from the left dataset (same
as a left join, just referencing the dataset listed on
the right side instead of the left side)
* Except for many-to-many merges

Full join: All observations from both datasets


proc sql;
create table patientdata as
select coalesce(ds1.patient, ds2.patient) as patient, ds1.age,
ds2.name
from ds1 full join ds2
on ds1.patient = ds2.patient
order by patient;
quit;
-------------------------------------------------------------
proc sort data=ds1; by patient; run;
proc sort data=ds2; by patient; run;
data patientdata;
merge ds1 ds2;
by patient;
keep patient age name;
proc sort;
by patient;
run;

Inner join: Observations matched in both datasets


proc sql;
create table patientdata as
select ds1.patient, ds1.age, ds2.name
from ds1, ds2
where ds1.patient = ds2.patient
order by ds1.patient;
quit;
-------------------------------------------------------------
proc sort data=ds1; by patient; run;
proc sort data=ds2; by patient; run;
data patientdata;
merge ds1(in=a) ds2(in=b);
by patient;
if a and b;
keep patient age name;
proc sort;
by patient;
run;

Left join: All observations in one dataset + observations matched in other dataset
proc sql;
create table patientdata as
select ds1.patient, ds1.age, ds2.name
from ds1 left join ds2
on ds1.patient = ds2.patient
order by ds1.patient;
quit;

-------------------------------------------------------------
proc sort data=ds1; by patient; run;
proc sort data=ds2; by patient; run;
data patientdata;
merge ds1(in=a) ds2;
by patient;
if a;
keep patient age name;
proc sort;
by patient;
run;
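
The right join is described in the list of join types but has no slide of its own. A minimal sketch, assuming the same ds1 (patient, age) and ds2 (patient, name) datasets used in the examples above, could look like this. Only the join keyword and the side the patient ID is taken from change.

```sas
proc sql;
  /* Keep every row of ds2 (the right-hand dataset),      */
  /* plus age from ds1 wherever the patient ID matches.   */
  create table patientdata as
  select ds2.patient, ds1.age, ds2.name
  from ds1 right join ds2
    on ds1.patient = ds2.patient
  order by ds2.patient;
quit;
```

Patients that appear only in ds2 get a missing value for age.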

Aliases
Aliases are nicknames for datasets, used as a
shortcut
proc sql;
create table patientdata as
select a.patient, a.age, b.name
from sasdata.patientage a, sasdata.patientname b
where a.patient = b.patient
order by a.patient;
quit;
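
Aliases also work with the explicit JOIN ... ON syntax, and the optional AS keyword can make them easier to spot. The sketch below assumes the same sasdata.patientage and sasdata.patientname datasets as the example above and should return the same rows as the comma-and-WHERE form.

```sas
proc sql;
  /* Same inner join as above, written with AS aliases */
  /* and an explicit INNER JOIN ... ON clause.         */
  create table patientdata as
  select a.patient, a.age, b.name
  from sasdata.patientage as a
       inner join sasdata.patientname as b
    on a.patient = b.patient
  order by a.patient;
quit;
```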


Many-to-many join (Cartesian join)


data ds1;
input patient :8. date1 :date9.;
format date1 date9.;
datalines;
12345 09DEC2014
12345 01JAN2015
12345 15JAN2015
;
run;
data ds2;
input patient :8. date2 :date9. visit :8.;
format date2 date9.;
datalines;
12345 08DEC2014 1
12345 01JAN2015 2
12345 09JAN2015 3
12345 16JAN2015 4
;
run;

Many-to-many join (cont'd)


proc sql;
create table manytomany as
select coalesce(a.patient,b.patient) as patient,
date1, date2, visit
from ds1 a, ds2 b
where a.patient = b.patient
order by patient, date1, date2;
quit;
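
For contrast, here is a sketch of what a data step MERGE does with the same ds1 and ds2 (the output name merged is just illustrative). Within a BY group, MERGE pairs rows by position rather than forming every combination, so with 3 rows in ds1 and 4 rows in ds2 it should produce 4 rows instead of 12, and the log carries a note that more than one dataset has repeats of BY values.

```sas
/* Both datasets are already in patient order, so no sort is needed here. */
data merged;
  merge ds1 ds2;
  by patient;
run;

/* Compare the row count and pairings with the manytomany table. */
proc print data=merged noobs;
run;
```

The fourth row reuses the last date1 value from ds1, which is rarely what is wanted in a many-to-many situation.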


Many-to-many join (cont'd)

patient    date1        date2      visit
12345    09DEC2014    08DEC2014      1
12345    09DEC2014    01JAN2015      2
12345    09DEC2014    09JAN2015      3
12345    09DEC2014    16JAN2015      4
12345    01JAN2015    08DEC2014      1
12345    01JAN2015    01JAN2015      2
12345    01JAN2015    09JAN2015      3
12345    01JAN2015    16JAN2015      4
12345    15JAN2015    08DEC2014      1
12345    15JAN2015    01JAN2015      2
12345    15JAN2015    09JAN2015      3
12345    15JAN2015    16JAN2015      4

Union
proc sql;
create table patientdata as
select patient, age, name
from ds1
union
select patient, age, name
from ds2
order by patient;
quit;

(Use UNION ALL instead of UNION to keep duplicate rows. The data step below keeps duplicates, so it matches UNION ALL.)

---------------------------------------------------------------------
data patientdata;
set ds1 ds2;
keep patient age name;
proc sort;
by patient;
run;


Creating a new variable and controlling variable attributes - Syntax
proc sql;
create table newvar as
select name, age format=8.1,
'Rolling Green Elementary' as school length=100
label='School Name'
from sashelp.class
order by name;
quit;

Case Logic - Syntax


case when logical-expression1 then new-variable-value1
when logical-expression2 then new-variable-value2
else new-variable-value3
end as new-variable

proc sql;
create table caselogic as
select name, age, sex,
case when sex='F' then 'Female'
when sex='M' then 'Male'
else ' '
end as sex2
from sashelp.class
order by name;
quit;

Equivalent to an if/then/else statement in a SAS data step.
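
To make the comparison with the data step concrete, the CASE expression above could be written with if/then/else roughly as follows. This is a sketch: the LENGTH statement and the separate PROC SORT are additions needed on the data step side, not part of the original slide.

```sas
data caselogic;
  set sashelp.class (keep=name age sex);
  length sex2 $6;                /* long enough for 'Female' */
  if sex = 'F' then sex2 = 'Female';
  else if sex = 'M' then sex2 = 'Male';
  else sex2 = ' ';
run;

/* PROC SQL sorted with ORDER BY; the data step needs a separate sort. */
proc sort data=caselogic;
  by name;
run;
```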

Case Logic Results


Name       Age    School Name
Alfred     14.0   Rolling Green Elementary
Alice      13.0   Rolling Green Elementary
Barbara    13.0   Rolling Green Elementary
Carol      14.0   Rolling Green Elementary
Henry      14.0   Rolling Green Elementary
James      12.0   Rolling Green Elementary
Jane       12.0   Rolling Green Elementary
Janet      15.0   Rolling Green Elementary
Jeffrey    13.0   Rolling Green Elementary
John       12.0   Rolling Green Elementary
Joyce      11.0   Rolling Green Elementary
Judy       14.0   Rolling Green Elementary
Louise     12.0   Rolling Green Elementary
etc

IFC and IFN - Syntax


ifc(logical-expression, value-if-true, value-if-false,
value-if-missing) as new-variable

proc sql;
create table ifc as
select name, age, sex,
ifc(sex='F','Female','Male',' ') as sex2
from sashelp.class
order by name;
quit;

Similar to case when/then/else, but handles only a single true/false condition.

IFC is for resultant character variables (e.g., sex2 above); IFN
is for resultant numeric variables.
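
The slide shows IFC only. A parallel sketch with IFN, which returns a numeric result, might look like this; the table name ifn_example and the flag name female are illustrative, not from the original.

```sas
proc sql;
  /* IFN returns a number: 1 when sex is 'F', otherwise 0. */
  create table ifn_example as
  select name, age, sex,
         ifn(sex='F', 1, 0) as female
  from sashelp.class
  order by name;
quit;
```

A 0/1 flag like this can then be passed to a summary function, for example sum(female) to count the girls.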

IFC and IFN Results (look familiar?)


Name       Age   Sex   sex2
Alfred      14    M    Male
Alice       13    F    Female
Barbara     13    F    Female
Carol       14    F    Female
Henry       14    M    Male
James       12    M    Male
Jane        12    F    Female
Janet       15    F    Female
Jeffrey     13    M    Male
John        12    M    Male
Joyce       11    F    Female
Judy        14    F    Female
Louise      12    F    Female
etc

Summary functions
Summary functions summarize the data vertically, like PROC
MEANS or PROC UNIVARIATE.
A full list of functions is in the SAS documentation (see Resources and References).
MEDIAN is not an available summary function. You must use
PROC MEANS or PROC UNIVARIATE.

AVG|MEAN: arithmetic mean or average of values


COUNT|FREQ|N: number of non-missing values
MAX: largest value
MIN: smallest value
NMISS: number of missing values
STD: standard deviation
SUM: sum of values
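
As a quick illustration of the functions listed above, the following sketch applies several of them to the age variable in sashelp.class in a single query. The output column names are arbitrary.

```sas
proc sql;
  /* One row of output: each function summarizes age down the column. */
  select count(age) as n_age,
         nmiss(age) as nmiss_age,
         min(age)   as min_age,
         max(age)   as max_age,
         mean(age)  as mean_age,
         std(age)   as std_age,
         sum(age)   as sum_age
  from sashelp.class;
quit;
```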


Summary functions - Basic syntax

proc sql;
create table mean_age as
select mean(age) as mean_age
from sashelp.class;
quit;
------------------------------------------------
proc univariate noprint data=sashelp.class;
var age;
output out=mean_age mean=mean_age;
run;

Summary functions - Results

PROC SQL
mean_age
13.3158
------------------------------------------------
PROC UNIVARIATE
mean_age
13.3158

Summary functions - Using GROUP BY

proc sql;
create table mean_sql as
select sex, mean(age) as mean_sql
from sashelp.class
group by sex;
quit;
------------------------------------------------
proc univariate noprint data=sashelp.class;
class sex;
var age;
output out=mean_uni mean=mean_uni;
run;

Summary functions - Results

PROC SQL
Sex   mean_sql
F     13.2222
M     13.4000
------------------------------------------------
PROC UNIVARIATE
Sex   mean_uni
F     13.2222
M     13.4000

Summary functions - CAUTION!


Be careful that you do not confuse a SQL summary function with a
SAS function! If you list more than one variable within the
parentheses then it is a SAS function.
proc sql;
select mean(height,weight) as mean
from sashelp.class;
Quit;

This gives you the mean of the height and the weight for each
observation in the dataset.


Summary functions - CAUTION! (cont'd)


If you want mean height and mean weight across all
observations, then
proc sql;
select mean(height) as mean_height, mean(weight)
as mean_weight
from sashelp.class;
Quit;


Macro variables
proc sql noprint;
select mean(age)
into :mean_age
from sashelp.class;
quit;

%put MEAN_AGE = &mean_age;

Results in log:
MEAN_AGE = 13.31579


Multiple macro variables


proc sql noprint;
select mean(age), min(age), max(age)
into :mean_age, :min_age, :max_age
from sashelp.class;
quit;
%put MEAN_AGE = &mean_age / MIN_AGE = &min_age / MAX_AGE =
&max_age;

Results in log:
MEAN_AGE = 13.31579 / MIN_AGE =       11 / MAX_AGE =       16

Macro variable tip (SAS v9.3 and up)


Remove leading and trailing blanks with TRIMMED (the same effect as CALL SYMPUTX)
proc sql noprint;
select mean(age), min(age), max(age)
into :mean_age trimmed, :min_age trimmed,
:max_age trimmed
from sashelp.class;
quit;

%put MEAN_AGE = &mean_age / MIN_AGE = &min_age / MAX_AGE =
&max_age;

Results in log:
MEAN_AGE = 13.31579 / MIN_AGE = 11 / MAX_AGE = 16


Macro variable lists


proc sql noprint;
select age
into :ages
separated by ", "
from sashelp.class;
quit;

%put AGES = &ages;

Results in log:
AGES = 14, 13, 13, 14, 14, 12, 12, 15, 13, 12, 11, 14, 12, 15,
16, 12, 15, 11, 15
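
One common use for a comma-separated list like this is to drop it into an IN clause later in the program. The sketch below is an assumption about how you might use it, not part of the original slides: it collects the distinct ages of the female students and then prints every student of any of those ages.

```sas
proc sql noprint;
  /* DISTINCT keeps each age once; the list is built in sorted order. */
  select distinct age
  into :f_ages separated by ", "
  from sashelp.class
  where sex = 'F';
quit;

%put F_AGES = &f_ages;

/* The macro variable expands to a valid IN list. */
proc print data=sashelp.class noobs;
  where age in (&f_ages);
run;
```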


Resources and References


SAS website:
http://support.sas.com/documentation/cdl/en/sqlproc/65065/HTML/default/viewer.htm#n1oihmdy7om5rmn1aorxui3kxizl.htm
Lex Jansen website: www.lexjansen.com
Top 10 Most Powerful Functions for PROC SQL
Ready To Become Really Productive Using PROC SQL?

Summary functions using HAVING clause


You have a dataset with multiple records per
PATIENT, each with different DATE values. You
want to select the most recent date per patient.
patient   date
12345     15JUN2012
12345     18SEP2013
12345     19JUN2014
23456     01FEB2011
23456     03MAR2012
23456     15FEB2013
23456     08FEB2014

Summary functions using HAVING clause (cont'd)
proc sql;
select patient, date
from ds1
group by patient
having date=max(date);
quit;
patient   date
-------------------
12345     19JUN2014
23456     08FEB2014
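
For comparison, a data step route to the same result might look like the sketch below. It assumes the same ds1 with patient and date variables; the output names sorted and latest are illustrative. One difference to keep in mind: the HAVING query returns every row that ties for the latest date, while last.patient keeps exactly one row per patient.

```sas
/* Sort so the most recent date is the last row within each patient. */
proc sort data=ds1 out=sorted;
  by patient date;
run;

data latest;
  set sorted;
  by patient;
  if last.patient;   /* keep only the final (latest) row per patient */
run;
```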

Querying SAS Views (proc contents)


Dictionary tables (information about the dataset, e.g.
creation date, number of observations)
Dictionary columns (information about the variables
in the dataset, e.g. length, format, label)


Querying SAS Views - Example 1

Get dataset creation date and write it to a macro
variable (to use in the file name, for example).
proc sql noprint;
select datepart(crdate) format=date9.
into :datadate
from dictionary.tables
where libname='SASHELP' and memname='CLASS';
quit;
%put &datadate;

Querying SAS Views - Example 2

Figure out whether a variable is numeric or character.
proc sql;
select type
from dictionary.columns
where libname='SASHELP' and memname='CLASS' and
name='Age';
quit;

Results:
Column
Type
------
num
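
The dictionary queries combine naturally with the macro variable list technique shown earlier. As a sketch (the macro variable name numvars is illustrative), this collects the names of all numeric variables in sashelp.class into one space-separated list that can be reused in a VAR statement. Note that libname and memname are stored in uppercase in the dictionary tables, so the comparison values must be uppercase too.

```sas
proc sql noprint;
  /* Build a space-separated list of the numeric variable names. */
  select name
  into :numvars separated by ' '
  from dictionary.columns
  where libname='SASHELP' and memname='CLASS' and type='num';
quit;

%put NUMVARS = &numvars;

/* Reuse the list without typing the variable names by hand. */
proc means data=sashelp.class;
  var &numvars;
run;
```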