
SAS Interview Questions: Base SAS

What is SAS? 

· SAS (Statistical Analysis System) is an integrated set of software products.
· SAS enables developers to perform:
  - Data entry, retrieval, mining and management
  - Report writing and graphics generation
  - Business planning, decision support and forecasting
  - Quality improvement
  - Operations research and project management
  - Statistical analysis
  - Remote and platform-independent computing

What are the special input delimiters? How are they used? 

· DLM= and DSD are the INFILE statement options that control input delimiters.
· DLM= names the delimiter character(s); DSD sets the default delimiter to a comma.
· Comma-separated value (CSV) files are the most common files read with the DSD option.
· With DSD, two consecutive delimiters are treated as a missing value.
· DSD ignores delimiters that are enclosed in quotation marks.
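
The delimiter options above can be tried with a small in-stream example. This is only a sketch: the data set name work.pets, the variable names and the sample records are made up for illustration.

```sas
/* DSD: comma is the default delimiter, ",," is a missing value,  */
/* and a comma inside quotation marks is kept as part of the value */
data work.pets;
    infile datalines dsd truncover;
    length name $20 city $20;
    input name $ age city $;
    datalines;
Rex,4,"Austin, TX"
Tom,,Boston
;
run;

proc print data=work.pets;
run;
```

In the second record AGE is missing because of the two consecutive commas, and in the first record CITY keeps its embedded comma.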

Difference between an informat and a format 

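· A format controls how data values are written, whereas an informat controls how raw data values are read.
· COMMAw.d, DOLLARw.d and DATEw. are examples of informats.
· MMDDYYw., DATEw., TIMEw. and PERCENTw.d are available both as informats and as formats.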
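
A minimal sketch of the difference, assuming a date that arrives as text: the informat is applied when the value is read and the format when it is written. The variable names are illustrative.

```sas
data _null_;
    raw = '03/15/2021';
    d = input(raw, mmddyy10.);  /* informat: text is read into a SAS date value */
    put d=;                     /* unformatted: the number of days since 01JAN1960 */
    put d= date9.;              /* format: the same value written as 15MAR2021 */
run;
```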

Describe any three SAS functions 

· LENGTH: Returns the length of a character argument, not counting trailing blanks.
Ex:
animal = 'my cat';
len = LENGTH(animal);
Result: len = 6
· SUBSTR: Extracts a substring from an argument, starting at a given position for n characters, or until the end if n is not specified.
Ex:
value = '(916)734-6241';
substring = SUBSTR(value,2,3);
Result: substring = '916'
· TRIM: Removes the trailing blanks from a character expression.
Ex:
Str1 = 'my ';
Str2 = 'cat';
Result = TRIM(Str1) || Str2;
Result: 'mycat'

SELECT construct is used instead of IF statements 

· When there is a long series of mutually exclusive conditions
· And the comparison is numeric
· Because a SELECT group then uses slightly less CPU time than IF-THEN/ELSE statements.

SELECT group:

· SELECT: Begins the SELECT group.
· WHEN: Identifies the SAS statements to execute when a condition is true.
· OTHERWISE (optional): Specifies a statement to execute if no WHEN condition is met.
· END: Ends the SELECT group.
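
The four parts of a SELECT group can be sketched as follows. The example assumes the SASHELP.CLASS sample data set that ships with SAS; the output data set name and the age bands are arbitrary.

```sas
data work.age_groups;
    set sashelp.class;
    length age_group $8;
    select;                                   /* begins the SELECT group        */
        when (age < 13) age_group = 'child';  /* first true WHEN is executed    */
        when (age < 16) age_group = 'teen';
        otherwise       age_group = 'older';  /* runs when no WHEN is true      */
    end;                                      /* ends the SELECT group          */
run;
```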

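How to code a merge that will write the matches of both to one data set, the non-matches from the left-most data set to a second data set, and the non-matches from the right-most data set to a third data set.

· Step 1: Name three output data sets in the DATA statement.
· Step 2: Use the IN= data set option on each of the two input data sets to create flag variables.
· Step 3: Test the flag variables with IF statements.
· Step 4: Output the matching observations to the first data set.
· Step 5: Output the non-matching observations to the other two data sets.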
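
The steps above can be sketched as follows. All data set and variable names here (left, right, id, inleft, inright and so on) are hypothetical, and both inputs are assumed to be sorted by the BY variable.

```sas
/* Hypothetical sample data, already sorted by id */
data work.left;
    input id name $;
    datalines;
1 Ann
2 Bob
3 Cat
;
run;

data work.right;
    input id dept $;
    datalines;
2 HR
3 IT
4 Ops
;
run;

/* Three output data sets; IN= flags show which input contributed */
data work.both work.left_only work.right_only;
    merge work.left(in=inleft) work.right(in=inright);
    by id;
    if inleft and inright then output work.both;
    else if inleft then output work.left_only;
    else output work.right_only;
run;
```

With this sample data, ids 2 and 3 go to work.both, id 1 to work.left_only and id 4 to work.right_only.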

What is Program Data Vector (PDV)? What are its functions? 

· The PDV is a logical area in memory.
· SAS creates a data set one observation at a time.
· The input buffer is created at compilation time to hold a record from an external file.
· The PDV is created after the input buffer.
· SAS builds the data set in the PDV area of memory.

Explain about INFILE options 

· FLOWOVER is the default option on the INFILE statement.
· With FLOWOVER, when the INPUT statement reaches the end of the non-blank characters on a line without filling all the variables, a new line is read into the input buffer.
· INPUT then tries to fill the remaining variables, starting from column one of the new line.
· The next time an INPUT statement executes, the input buffer receives another new line.
· MISSOVER: Used when INPUT may read a short line. Variables without values are set to missing and SAS does not go to the next line.
· TRUNCOVER: Lets the INPUT statement read variable-length records. A value that is shorter than expected is kept instead of being set to missing.
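
A small sketch of the difference between the default FLOWOVER behavior and MISSOVER, using list input and made-up records; the data set and variable names are illustrative.

```sas
/* MISSOVER: the short second record gets score2 = . and no line is skipped */
data work.with_missover;
    infile datalines missover;
    input id score1 score2;
    datalines;
1 90 85
2 70
3 60 75
;
run;

/* Default FLOWOVER: for the short record, INPUT goes to the next line */
/* to find score2, so the third record is consumed and lost            */
data work.with_flowover;
    infile datalines;
    input id score1 score2;
    datalines;
1 90 85
2 70
3 60 75
;
run;
```

The first step produces three observations; the second produces two, and the log notes that SAS went to a new line when INPUT reached past the end of a line.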

Explain about PUT statement and give some examples

· PUT is a flexible tool in the DATA step programmer's toolkit.
· Examples:
PUT one two three; - Writes the values of three variables separated by a space
PUT 132*'_'; - Writes 132 underscores
PUT Var 1-5; - Writes the value of Var in columns 1 through 5
PUT _all_; - Writes the values of all variables, including _ERROR_ and _N_
PUT one two three @; - Writes the values of three variables separated by a space and holds the line open, so that the next PUT statement continues on the same line.
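
The PUT variations above can be run together in one DATA _NULL_ step, which writes to the SAS log. The variable values are arbitrary and the underscore line is shortened to 20 characters for readability.

```sas
data _null_;
    one = 1; two = 2; three = 3;
    put one two three;   /* values separated by a space         */
    put 20*'_';          /* a line of 20 underscores            */
    put one 1-5;         /* value of ONE written in columns 1-5 */
    put _all_;           /* all variables, with _ERROR_ and _N_ */
    put one two @;       /* trailing @ holds the line open      */
    put three;           /* continues on the same output line   */
run;
```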

Explain the concepts and capabilities of Business Intelligence. 

Business Intelligence helps manage data by applying a range of skills and technologies while accounting for security and data-quality risks. It also helps in gaining a better understanding of the data. Business intelligence can be viewed as collective information that helps predict business operations using data gathered in a warehouse. Business intelligence applications handle sales, financial, production and other business data. They support better decision making and can also be considered decision support systems.

Explain the concepts and capabilities of Business Intelligence.

Business Intelligence is all about the processes, skills, technologies, practices and applications used to support decision making.
Business Intelligence applications are:
- Centrally initiated by business needs
- Inclusive of decision support systems, query and reporting, OLAP, data mining and forecasting

Name some of the standard Business Intelligence tools in the market. 

Business intelligence tools are used to report, analyze and present data. A few of the tools available in the market are:

· Eclipse BIRT Project: Based on Eclipse. It is open source and mainly used for web applications.
· Freereporting.com: A free web-based reporting tool.
· JasperSoft: A BI tool used for reporting, ETL etc.
· Pentaho: Has data mining, dashboard and workflow capabilities.
· OpenI: A web application used for OLAP reporting.

Name some of the standard Business Intelligence tools in the market.

The following are the standard Business Intelligence tools in the market:
1) Business Objects Crystal Reports
2) MicroStrategy
3) MS OLAP Services
4) Cognos ReportNet

Explain the Dashboard in the business intelligence. 

A dashboard in business intelligence allows large volumes of data and reports to be read in a single graphical interface. Dashboards help in making faster decisions by relying on measurable data seen at a glance. They can also be used to drill into the details of this data to analyze the root cause of any business performance issue. A dashboard represents the business data and business state at a high level. Dashboards can also be used for cost control. Example of the need for a dashboard: banks run thousands of ATMs. They need to know how much cash is deposited, how much is left etc.

Explain the Dashboard in the business intelligence.

A dashboard in business intelligence is used for rapid prototyping, cloning and deployment across all databases, operational applications or spreadsheets throughout an organization.
A dashboard in BI shows an enterprise's current status and where it is heading, using graphs, maps and charts. The drill-down and roll-over capabilities allow information to be organized without revealing important details. It is fully customizable, including free-form design options.
A dashboard consolidates the vital statistics of a business into an easy-to-read page.

Explain the concepts and capabilities of Business Object.

The entities of a business domain are represented as objects in an object-oriented program. For example, an employee business object might represent each employee and the person that employee reports to. Business objects model the domain, with relationships among them. A business object encapsulates the data and the business functionality associated with a business entity.

What is Broadcast Agent?

Broadcast Agent is used to schedule documents, or to refresh them at a particular interval, so that users receive these documents in spreadsheet or PDF format.

What SAS statements would you code to read an external raw data file to a DATA step?
The INFILE statement (to identify the file) and the INPUT statement (to read the values).

· How do you read in the variables that you need?
Using an INPUT statement with column pointer controls and column ranges, such as @5 or 12-17.
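
As a rough sketch of pointer controls and column ranges, the step below reads three fields from fixed positions. The record layout, data set name and variable names are invented for the example.

```sas
data work.emp;
    infile datalines truncover;
    input @1  id     4.      /* @1: move to column 1, read 4 digits     */
          @6  name   $10.    /* @6: move to column 6, read 10 characters */
          salary 17-22;      /* column input: columns 17 through 22      */
    datalines;
1001 Alice      52000
1002 Bob        48000
;
run;
```

Only the fields named in the INPUT statement are read, so unwanted columns in a record can simply be skipped.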

· Are you familiar with special input delimiters? How are they used?
DLM= and DSD are the delimiter options that I've used. They are specified in the INFILE statement. Comma-separated value (CSV) files are a common type of file that is read with the DSD option. The DSD option treats two delimiters in a row as a MISSING value.

DSD also ignores delimiters enclosed in quotation marks.

· If reading a variable length file with fixed input, how would you prevent SAS from reading the next record if the last variable didn't have a value?
By using the MISSOVER option in the INFILE statement. If some data lines are shorter than others, use the TRUNCOVER option in the INFILE statement instead.

· What is the difference between an informat and a format? Name three informats or formats.

Informats read the data. Formats write the data.

Informats: COMMAw.d, DOLLARw.d, DATEw.
Many formats have the same names as informats: MMDDYYw., DATEw., TIMEw., PERCENTw.d
Formats only: WORDDATE18., WEEKDATEw.

· Name and describe three SAS functions that you have used, if any?

LENGTH: Returns the length of an argument, not counting trailing blanks (missing values have a length of 1).
Ex: a='my cat'; x=LENGTH(a); Result: x=6

SUBSTR: SUBSTR(arg,position,n) extracts a substring from an argument, starting at 'position' for 'n' characters, or until the end if 'n' is omitted.
Ex: a='(916)734-6241'; x=SUBSTR(a,2,3); Result: x='916'

TRIM: Removes trailing blanks from a character expression.
Ex: a='my '; b='cat'; x=TRIM(a)||b; Result: x='mycat'

SUM: Returns the sum of the nonmissing values. Ex: x=SUM(3,5,1); Result: x=9

INT: Returns the integer portion of the argument.

· How would you code the criteria to restrict the output to be produced?
Use NOPRINT option.

· What is the purpose of the trailing @ and the @@? How would you use them?
A trailing @ holds the current record for the next INPUT statement within the same iteration of the DATA step. A double trailing @@ holds the record across iterations of the DATA step, until the end of the line is reached.

Double trailing @@: When you have multiple observations per line of raw data, use the double trailing at signs (@@) at the end of the INPUT statement. The line hold specifier is like a stop sign telling SAS, "Stop, hold that line of raw data."

Trailing @: By using @ without specifying a column, it is as if you are telling SAS, "Stay tuned for more information. Don't touch that dial." SAS will hold the line of data until it reaches either the end of the DATA step or an INPUT statement that does not end with a trailing @.
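
Both line-hold specifiers can be sketched with in-stream data. The data set names, variables and records below are illustrative only.

```sas
/* Double trailing @@: several observations on one raw data line */
data work.pairs;
    input id score @@;
    datalines;
1 90 2 85 3 70
;
run;

/* Trailing @: read part of the record, test it, then read the rest */
data work.type_a;
    input type $ @;        /* hold the record for the next INPUT      */
    if type = 'A';         /* subsetting IF: skip other record types  */
    input name $ age;      /* finish reading the held record          */
    datalines;
A Ann 34
C Cat 9
A Al 41
;
run;
```

The first step writes three observations from a single line; the second writes only the two records whose type is A.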

· Under what circumstances would you code a SELECT construct instead of IF statements?
When you have a long series of mutually exclusive conditions and the comparison is numeric, using a SELECT group is slightly more efficient than using IF-THEN or IF-THEN-ELSE statements because CPU time is reduced.
SELECT GROUP:
Select: begins a SELECT group.
When: identifies SAS statements that are executed when a particular condition is true.
Otherwise (optional): specifies a statement to be executed if no WHEN condition is met.
End: ends a SELECT group.

· What statement do you code to tell SAS that it is to write to an external file?

· What statement do you code to write the record to the file?

The FILE statement names the external file, and the PUT statement writes the record to it.
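
A minimal sketch of the FILE and PUT pair, assuming the SASHELP.CLASS sample data set. The fileref outf is arbitrary, and the TEMP device type is used so that no real path has to be assumed.

```sas
filename outf temp;            /* temporary external file */

data _null_;                   /* no output data set is needed */
    set sashelp.class;
    file outf;                 /* FILE: where PUT should write */
    put name $8. +1 age 2.;    /* PUT: writes one record per observation */
run;
```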

· If reading an external file to produce an external file, what is the shortcut to write that record without coding every single variable on the record?

· If you do not want any SAS output from a DATA step, how would you code the DATA statement to prevent SAS from producing a data set?
DATA _NULL_;

· What is the one statement to set the criteria of data that can be coded in any step?
OPTIONS statement: It is part of a SAS program and affects all steps that follow it.

· Have you ever linked SAS code? If so, describe the link and any required statements used to either process the code or the step itself.

· How would you include common or reusable code to be processed along with your statements?
By using SAS macros or the %INCLUDE statement.

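· When looking for data contained in a character string of 150 bytes, which function is the best to locate that data: scan, index, or indexc?
INDEX. It returns the position of a character string within the searched string. INDEXC searches for any one of a set of individual characters, and SCAN extracts the nth word rather than locating text.

· If you have a data set that contains 100 variables, but you need only five of those, what is the code to force SAS to use only those variables?
Using the KEEP= data set option or the KEEP statement.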

· Code a PROC SORT on a data set containing State, District and County as the primary
variables, along with several numeric variables.
Proc sort data=one;
BY State District County ;
Run ;

· How would you delete duplicate observations?
Use the NODUPLICATES (NODUPRECS) option in PROC SORT.

· How would you delete observations with duplicate keys?
Use the NODUPKEY option in PROC SORT.
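
Both options belong on the PROC SORT statement. The sketch below assumes the SASHELP.CLASS sample data set; the output data set names are arbitrary.

```sas
/* NODUPKEY: keep only the first observation for each BY value */
proc sort data=sashelp.class out=work.one_per_age nodupkey;
    by age;
run;

/* NODUPRECS: drop an observation that is identical to the previous one; */
/* sorting by all variables places identical observations next to each other */
proc sort data=sashelp.class out=work.no_dup_rows noduprecs;
    by _all_;
run;
```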

· How would you code a merge that will keep only the observations that have matches from both sets?
Use the IN= data set option on each data set in the MERGE statement, then use a subsetting IF statement that keeps an observation only when both IN= variables are 1.

· How would you code a merge that will write the matches of both to one data set, the non-matches from the left-most data set to a second data set, and the non-matches from the right-most data set to a third data set?

Step 1: Name 3 data sets in the DATA statement.
Step 2: Use the IN= option to assign flag variables for the 2 input data sets.
Step 3: Check the flags with IF statements; output the matches to the first data set and the non-matches to the other data sets.

Ex: data xxx yyy_only zzz_only;
merge yyy(in = inyyy) zzz(in = inzzz);
by aaa;
if inyyy = 1 and inzzz = 1 then output xxx;
else if inyyy = 1 then output yyy_only;
else output zzz_only;
run;

· What is the Program Data Vector (PDV)? What are its functions?
Function: To store the current observation. The PDV (Program Data Vector) is a logical area in memory where SAS creates a data set one observation at a time. When SAS processes a DATA step it has two phases: the compilation phase and the execution phase. During the compilation phase the input buffer is created to hold a record from an external file. After the input buffer is created, the PDV is created. The PDV contains two automatic variables, _N_ and _ERROR_.

The Logical Program Data Vector (PDV) is a set of buffers that includes all variables referenced either explicitly or implicitly in the DATA step. It is created at compile time, then used at execution time as the location where the working values of variables are stored as they are processed by the DATA step program (source: http://www2.sas.com/proceedings/sugi24/Posters/p235-24.pdf).

· Does SAS 'Translate' (compile) or does it 'Interpret'? Explain.
SAS compiles the code.

· At compile time when a SAS data set is read, what items are created?
The input buffer, the PDV (including the automatic variables) and the descriptor information.

· Name statements that are recognized at compile time only?
Declarative statements such as DROP, KEEP, RENAME, LABEL, FORMAT, INFORMAT, LENGTH, ARRAY and RETAIN. (PUT, by contrast, is an executable statement.)

· Name statements that are execution only.
INFILE, INPUT

· Identify statements whose placement in the DATA step is critical.
DATA, INPUT, RUN.

· Name statements that function at both compile and execution time.
INPUT

· In the flow of DATA step processing, what is the first action in a typical DATA Step?
The DATA step begins with a DATA statement. Each time the DATA statement executes, a
new iteration of the DATA step begins, and the _N_ automatic variable is
incremented by 1.

· What is _n_?
It is an automatic counter variable in SAS.
Note: Both the _N_ and _ERROR_ variables are always available to you in the DATA step.

_N_ indicates the number of times SAS has looped through the DATA step. This is not necessarily equal to the observation number, since a simple subsetting IF statement can change the relationship between the observation number and the number of iterations of the DATA step. The _ERROR_ variable has a value of 1 if there is an error in the data for that observation and 0 if there is not.

_N_ is an implicit variable that SAS creates during DATA step processing. It counts the iterations of the DATA step, which usually equals the number of records read so far. It is available only in the DATA step and not in PROCs. For example, to select every third record in a data set we can use _N_ as follows:

data new_sas_data_set;
set old;
if mod(_n_,3) = 1;
run;

Note: If a WHERE clause is used to subset the data, _N_ will not give the required result, because _N_ then counts only the observations that pass the WHERE condition.


SAS Interview Questions:


Under what circumstances would you code a SELECT construct instead of IF statements?

A: I think a SELECT group is used when you test a series of mutually exclusive conditions, for example:
data exam2;
set exam;
length result $4;
select;
when (Physics > 60) result = 'pass';
when (Math > 100) result = 'pass';
when (English = 50) result = 'pass';
otherwise result = 'fail';
end;
run;

What is the one statement to set the criteria of data that can be coded in any step?
A) Options statement.

What is the effect of the OPTIONS statement ERRORS=1?

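A) ERRORS=1 limits SAS to printing the complete data error message for only the first observation that has a data error. It is not the same as the _ERROR_ automatic variable, which has a value of 1 if there is an error in the data for that observation and 0 if there is not.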

What's the difference between VAR A1 - A4 and VAR A1 -- A4?


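A) VAR A1 - A4 is a numbered range list: it refers to the variables A1, A2, A3 and A4. VAR A1 -- A4 is a name range list: it refers to all variables from A1 through A4 in the order in which they are stored in the data set, including any other variables positioned between them. If we submit VAR A1 --- A4 instead, we will see an error message in the log.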
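
The behavior of the single and double dash can be checked with a small data set. In this sketch the variable x is deliberately created between a1 and a2, and all names are made up.

```sas
data work.demo;
    a1 = 1; x = 99; a2 = 2; a3 = 3; a4 = 4;   /* x is stored between a1 and a2 */
run;

proc print data=work.demo;
    var a1-a4;     /* numbered range: a1 a2 a3 a4 */
run;

proc print data=work.demo;
    var a1--a4;    /* name range by position: a1 x a2 a3 a4 */
run;
```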

What do the SAS log messages "numeric values have been converted to character" mean?
What are the implications?

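A) It means that SAS performed an automatic numeric-to-character conversion because a numeric value was used where a character value was expected. The conversion uses the BEST12. format, so the result can contain leading blanks; an explicit conversion with the PUT function is safer.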

Why is a STOP statement needed for the POINT= option on a SET statement?
A) Because POINT= reads only the specified observations, SAS cannot detect an end-of-file
condition as it would if the file were being read sequentially.

How do you control the number of observations and/or variables read or written?

A) The FIRSTOBS= and OBS= options control the observations; the KEEP= and DROP= options control the variables.

Approximately what date is represented by the SAS date value of 730?


A) 31st December 1961
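
The arithmetic can be checked directly: SAS date 0 is 1 January 1960, 1960 is a leap year with 366 days, and 1961 has 365 days, so day 730 is the last day of 1961. A short sketch (the variable name is arbitrary):

```sas
data _null_;
    d = 730;
    put d date9.;   /* writes 31DEC1961 to the log */
run;
```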

Identify statements whose placement in the DATA step is critical.


A) INPUT, DATA and RUN…

Does SAS 'Translate' (compile) or does it 'Interpret'? Explain.


A) Compile

What does the RUN statement do?


A) The RUN statement marks the end of a DATA or PROC step and tells SAS to execute it. If another DATA step or PROC step follows, the next DATA or PROC statement acts as the step boundary, so you can omit the RUN statement, although coding it is good practice.

Why is SAS considered self-documenting?


A) SAS is considered self-documenting because it creates and stores information about the data set inside the data set itself, such as the date and time of creation, the number of variables and the variable labels. You can look at that information using the PROC CONTENTS procedure.

What are some good SAS programming practices for processing very large data sets?
A) Sort them only once, and use FIRSTOBS= and OBS= to test on a subset.

What is the difference between functions and PROCs that calculate the same simple descriptive statistics?
A) Functions are used inside the DATA step and compute the statistic across variables within each observation, whereas PROCs compute it across observations and can write the results to a new data set.

If you were told to create many records from one record, show how you would do this using arrays and with PROC TRANSPOSE?
A) I would use PROC TRANSPOSE if there are few variables and arrays if there are many; it depends.
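
Both approaches can be sketched on a one-record data set. Everything here (work.wide, q1-q3, the output names) is hypothetical and only meant to show the shape of each solution.

```sas
/* One record with three values */
data work.wide;
    id = 1; q1 = 10; q2 = 20; q3 = 30;
run;

/* Array approach: one OUTPUT per array element */
data work.long_array;
    set work.wide;
    array q{3} q1-q3;
    do quarter = 1 to dim(q);
        value = q{quarter};
        output;
    end;
    keep id quarter value;
run;

/* PROC TRANSPOSE approach: each VAR variable becomes a row */
proc transpose data=work.wide
               out=work.long_tr(rename=(col1=value))
               name=source;
    by id;
    var q1-q3;
run;
```

Each step turns the single input record into three records.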

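What is a method for assigning first.VAR and last.VAR to the BY group variable on unsorted data?
A) FIRST. and LAST. variables require a BY statement, which normally needs sorted data. If the data is grouped but not sorted, use the NOTSORTED option on the BY statement; otherwise sort the data first with PROC SORT.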

How do you debug and test your SAS programs?


A) The first thing is to look in the log for errors, warnings and, in some cases, NOTEs, or to use the DATA step debugger.

What other SAS features do you use for error trapping and data validation?
A) Check the log, and for data validation use things like PROC FREQ, PROC MEANS or sometimes PROC PRINT to see what the data looks like.

How would you combine 3 or more tables with different structures?


A) I think I would sort them by their common variables and use a MERGE statement. I am not sure what you mean by different structures.

Other questions:

What areas of SAS are you most interested in?


A) BASE, STAT, GRAPH, ETS

Briefly describe 5 ways to do a "table lookup" in SAS.


A) Match Merging, Direct Access, Format Tables, Arrays, PROC SQL

What versions of SAS have you used (on which platforms)?


A) SAS 9.1.3,9.0, 8.2 in Windows and UNIX, SAS 7 and 6.12

What are some good SAS programming practices for processing very large data sets?
A) Sampling with the OBS= option or subsetting, commenting the code, and using DATA _NULL_ when no output data set is needed.

What are some problems you might encounter in processing missing values? In Data steps?
Arithmetic? Comparisons? Functions? Classifying data?
A) Any arithmetic operation that involves a missing value results in a missing value. Most SAS statistical procedures exclude observations with any missing variable values from an analysis.

How would you create a data set with 1 observation and 30 variables from a data set with 30
observations and 1 variable?
A) Using PROC TRANSPOSE
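
A sketch of that reshaping, with a generated 30-observation input; the data set names and the x prefix are arbitrary choices.

```sas
/* 30 observations, 1 variable */
data work.tall;
    do i = 1 to 30;
        x = i * 10;
        output;
    end;
    drop i;
run;

/* 1 observation, 30 variables named x1-x30 */
proc transpose data=work.tall
               out=work.wide(drop=_name_)
               prefix=x;
    var x;
run;
```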

What is the difference between functions and PROCs that calculate the same simple descriptive statistics?
A) PROCs have a wider scope and their results can be sent to a different data set. Functions usually work on the variables of the existing data set.

If you were told to create many records from one record, show how you would do this using arrays and with PROC TRANSPOSE?

A) With arrays: declare an array over the variables in the record, then use a DO loop with an OUTPUT statement. With PROC TRANSPOSE: list the variables in the VAR statement.

What are _numeric_ and _character_ and what do they do?
A) They are special variable lists that refer to all numeric variables and all character variables in the data set, for reading or writing.

How would you create multiple observations from a single observation?


A) Using double Trailing @@

For what purpose would you use the RETAIN statement?


A) The RETAIN statement is used to hold the values of variables across iterations of the DATA step. Normally, variables created in the DATA step are set to missing at the start of each iteration.

What is the order of evaluation of the arithmetic operators: + - * / ** ()?
A) (), **, * and /, + and -

How could you generate test data with no input data?


A) Using a DATA step with no input data, for example DATA _NULL_ with PUT statements, or a DO loop with an OUTPUT statement.
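
One possible sketch of generating test data without any input, using a DO loop and OUTPUT. The data set name, variable names, seed and number of rows are all arbitrary.

```sas
data work.test;
    call streaminit(42);             /* fixed seed so the run is repeatable */
    do id = 1 to 10;
        x = rand('uniform');         /* random number between 0 and 1 */
        group = mod(id, 3) + 1;      /* cycles through 1, 2, 3 */
        output;                      /* one observation per loop iteration */
    end;
run;
```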

How do you debug and test your SAS programs? 


A) Using OBS=0 and system options to trace the program execution in the log.

What can you learn from the SAS log when debugging?
A) It displays the execution of the whole program and its logic. It also displays each error with its line number so that you can find and edit the program.

What is the purpose of _error_?


A) It has only two values, which are 1 for error and 0 for no error.

How can you put a "trace" in your program?


A) By using ODS TRACE ON

How does SAS handle missing values in: assignment statements, functions, a merge, an
update, sort order, formats, PROCs?
A) A missing value is assigned as missing in an assignment statement. In sort order, the numeric missing value (.) is the second smallest value; only the special missing value ._ (underscore) sorts lower.

How do you test for missing values?


A) Using subsetting statements such as IF-THEN/ELSE, WHERE and SELECT that compare the variable with a period (numeric) or a blank (character).

How are numeric and character missing values represented internally?


A) Character as a blank (' ') and numeric as a period (.).

Which date functions advances a date time or date/time value by a given interval?
A) INTNX.

In the flow of DATA step processing, what is the first action in a typical DATA Step?
A) When you submit a DATA step, SAS processes the DATA step and then creates a new SAS data set (creation of the input buffer and the PDV). There are two phases:
Compilation Phase
Execution Phase

What are SAS/ACCESS and SAS/CONNECT?


A) SAS/ACCESS lets SAS read and write data in databases such as Oracle, SQL Server and MS Access.
SAS/CONNECT provides a connection between SAS sessions on a client and a server.

What is the one statement to set the criteria of data that can be coded in any step?
A) OPTIONS Statement, Label statement, Keep / Drop statements.

What is the purpose of using the N=PS option?


A) The N=PS option creates a buffer in memory which is large enough to store PAGESIZE
(PS) lines and enables a page to be formatted randomly prior to it being printed.

What are the scrubbing procedures in SAS?


A) Proc Sort with nodupkey option, because it will eliminate the duplicate values.

What are the new features included in the new version of SAS i.e., SAS9.1.3?
A) The main advantage of version 9 is faster execution of applications and centralized access to data and support.

Many changes were made in version 9 compared with version 8. A few of them are:
· SAS version 9 supports format and informat names longer than 8 characters, which was not possible in version 8.
· Numeric format names can be up to 32 characters long in version 9 (8 in version 8).
· Character format names can be up to 31 characters long in version 9.
· Numeric informat names can be up to 31 characters long in version 9 (8 in version 8).
· Character informat names can be up to 30 characters long in version 9.
· Three new informats are available in version 9 to convert various date, time and datetime forms of data into a SAS date, time or datetime value:
  · ANYDTDTEw. - Converts to a SAS date value.
  · ANYDTTMEw. - Converts to a SAS time value.
  · ANYDTDTMw. - Converts to a SAS datetime value.
· The CALL SYMPUTX routine is added in version 9. It creates a macro variable at execution time in the DATA step, while:
  · Trimming leading and trailing blanks
  · Automatically converting numeric values to character
· A new ODS column option is included to create multiple columns in the output.
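
Two of the version 9 additions can be tried in one short step. This is a sketch with made-up macro variable names (run_date, n_rows) and an arbitrary input string.

```sas
data _null_;
    /* ANYDTDTEw. recognizes several date layouts without naming one */
    d = input('15MAR2021', anydtdte20.);

    /* CALL SYMPUTX creates macro variables at execution time,     */
    /* trims blanks and converts the numeric argument to character */
    call symputx('run_date', put(d, date9.));
    call symputx('n_rows', 42);
run;

%put run_date=&run_date n_rows=&n_rows;
```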

What difference did you find among versions 6, 8 and 9 of SAS?

A) The SAS 9 architecture is fundamentally different from any prior version of SAS. In the SAS 9 architecture, SAS relies on a new component, the Metadata Server, to provide an information layer between the programs and the data they access. Metadata, such as security permissions for SAS libraries and where the various SAS servers are running, are maintained in a common repository.

What has been your most common programming mistake?


A) Missing semicolons, not checking the log after submitting a program, not using debugging techniques and not making full use of FSVIEW.

Name several ways to achieve efficiency in your program.


Efficiency and performance strategies can be classified into 5 different areas:
· CPU time
· Data storage
· Elapsed time
· Input/Output
· Memory
CPU time and elapsed time are the baseline measurements.

A few examples of efficiency violations:
· Retaining unwanted data sets
· Not subsetting early to eliminate unwanted records

Efficiency improving techniques:
A)
· Using KEEP and DROP statements to retain only the necessary variables.
· Using macros to reduce the amount of code.
· Using IF-THEN/ELSE statements to process data conditionally.
· Using the SQL procedure to reduce the number of programming steps.
· Using LENGTH statements to reduce variable size and so reduce data storage.
· Using DATA _NULL_ steps when no output data set is needed, to save data storage.

What other SAS products have you used and consider yourself proficient in using?
A) DATA _NULL_ steps, PROC MEANS, PROC REPORT, PROC TABULATE, PROC FREQ, PROC PRINT, PROC UNIVARIATE etc.

What is the significance of the 'OF' in X=SUM (OF a1-a4, a6, a9);
A) If we don't use the OF keyword, the argument might not be interpreted as we expect. For example, without OF the function above would calculate the sum of a1 minus a4, plus a6 and a9, and not the sum of a1 through a4 plus a6 and a9. The same is true for the MEAN function.
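
The effect of leaving out OF can be seen with a few arbitrary values. The variable names with_of and without_of are only for illustration.

```sas
data _null_;
    a1 = 1; a2 = 2; a3 = 3; a4 = 4; a6 = 6; a9 = 9;
    with_of    = sum(of a1-a4, a6, a9);  /* 1+2+3+4+6+9 = 25         */
    without_of = sum(a1-a4, a6, a9);     /* (1-4)+6+9 = 12           */
    put with_of= without_of=;
run;
```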

What do the PUT and INPUT functions do?


A) The INPUT function converts character data values to numeric values.
The PUT function converts numeric values to character values.
Ex: for INPUT: INPUT(source, informat)
For PUT: PUT(source, format)
Note that the INPUT function requires an INFORMAT and the PUT function requires a FORMAT.
If we omit the INPUT or the PUT function during the data conversion, SAS will detect the mismatched variables and will try an automatic character-to-numeric or numeric-to-character conversion. But sometimes this doesn't work, for example when a $ sign in the value prevents such a conversion. Therefore it is always advisable to include INPUT and PUT functions in your programs when conversions occur.
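
A short sketch of both directions of conversion; the variable names and values are invented for the example.

```sas
data _null_;
    char_amount = '1,234';
    num_amount  = input(char_amount, comma5.);  /* character -> numeric 1234   */

    num_id  = 42;
    char_id = put(num_id, z5.);                 /* numeric -> character '00042' */

    put num_amount= char_id=;
run;
```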

Which date function advances a date, time or datetime value by a given interval?
INTNX:
The INTNX function advances a date, time, or datetime value by a given interval, and returns a date, time, or datetime value. Ex: INTNX(interval,start-from,number-of-increments,alignment)

INTCK: INTCK(interval,start-of-period,end-of-period) is an interval function that counts the number of intervals between two given SAS dates, times or datetimes.

DATETIME(): returns the current date and time of day.

DATDIF(sdate,edate,basis): returns the number of days between two dates.
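
The interval functions can be compared side by side. The dates below are arbitrary, and 'same' and 'act/act' are standard values for the alignment and basis arguments.

```sas
data _null_;
    start = '15JAN2021'd;
    next_month = intnx('month', start, 1);            /* start of next month: 01FEB2021 */
    same_day   = intnx('month', start, 1, 'same');    /* same day next month: 15FEB2021 */
    months     = intck('month', start, '01MAR2021'd); /* month boundaries crossed: 2    */
    days       = datdif(start, '01MAR2021'd, 'act/act'); /* actual days between: 45     */
    put next_month= date9. same_day= date9. months= days=;
run;
```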

What do the MOD and INT functions do? What do the PAD and DIM functions do?
A) MOD: Returns the remainder after a numeric value is divided by the modulo, which is a constant or numeric variable.

INT: It returns the integer portion of a numeric value truncating the decimal portion.

PAD: it pads each record with blanks so that all data lines have the same length. It is used in
the INFILE statement. It is useful only when missing data occurs at the end of the record.

CATX: concatenate character strings, removes leading and trailing blanks and inserts
separators.

SCAN: it returns a specified word from a character value. Scan function assigns a length of
200 to each target variable.

SUBSTR: Extracts a substring, or replaces character values.
Extraction of a substring: Middleinitial=substr(middlename,1,1);
Replacing character values: substr(phone,1,3)='433';
If the SUBSTR function is on the left side of an assignment statement, it replaces the contents of the character variable.

TRIM: trims the trailing blanks from the character values.

SCAN vs. SUBSTR: SCAN extracts words within a value that is marked by delimiters.
SUBSTR extracts a portion of the value by stating the specific location. It is best used when
we know the exact position of the sub string to extract from a character value.

How might you use MOD and INT on numeric to mimic SUBSTR on character Strings?
A) The first argument to the MOD function is a numeric, the second is a non-zero numeric; the result is the remainder when argument-1 is divided by argument-2. The INT function takes only one argument and returns the integer portion of an argument, truncating the decimal portion. Note that the argument can be an expression.

DATA NEW;
A = 123456;
X = INT( A/1000 );
Y = MOD( A, 1000 );
Z = MOD( INT( A/100 ), 100 );
PUT A= X= Y= Z=;
RUN;
A=123456 X=123 Y=456 Z=34

In ARRAY processing, what does the DIM function do?


A) DIM: It returns the number of elements in the array. When we use the DIM function as the
stop value of an iterative DO statement, we do not have to re-specify the stop value if we
change the dimension of the array.

How would you determine the number of missing or nonmissing values in computations?
A) To determine the number of missing values that are excluded in a computation, use the
NMISS function.

data _null_;
m=.;
y=4;
z=0;
N = N(m , y, z);
NMISS = NMISS (m , y, z);
run;

The above program results in N = 2 (Number of non missing values) and NMISS = 1 (number
of missing values).

Do you need to know if there are any missing values?


A) The MISSING function takes a single argument: missing_value=MISSING(field1);
It returns 1 if the value is missing and 0 if it is not. To check several numeric fields at
once, or to know how many missing values you have, use
num_missing=NMISS(field1,field2,field3);
The result is greater than 0 when at least one of the fields is missing.

You can also find the number of non-missing values with
non_missing=N(field1,field2,field3);

What is the difference between: x=a+b+c+d; and x=SUM (of a, b, c ,d);?


A) Is anyone wondering why you wouldn’t just use total=field1+field2+field3;

First, how do you want missing values handled?


The SUM function returns the sum of the non-missing values. If you choose addition, you will
get a missing value for the result if any of the fields is missing. Which one is appropriate
depends upon your needs.

However, there is an advantage to using the SUM function even if you want the result to be
missing. If you have more than a couple of fields, you can often use shortcuts in writing the
field names. If your fields are not numbered sequentially but are stored together in the
program data vector, you can use: total=SUM(of fielda--zfield); Just make sure you remember
the "of" and the double dashes, or your code will run but you won't get your intended results.
MEAN is another function that calculates differently from the written-out formula if you
have missing values.

There is a field containing a date. It needs to be displayed in the format "ddmonyy" if it's
before 1975, "dd mon ccyy" if it's after 1985, and as 'Disco Years' if it's between 1975 and
1985. How would you accomplish this in data step code?

A) Using only PROC FORMAT.
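data new ;
input date ddmmyy10. ;
cards;
01/05/1955
01/09/1970
01/12/1975
19/10/1979
25/10/1982
10/10/1988
27/12/1991
;
run;

proc format ;
/* a format name in square brackets applies that existing format to the range */
/* date7. prints ddmonyy; date9. (ddmonyyyy) is the closest built-in to "dd mon ccyy" */
value dat low-<'01jan1975'd=[date7.]
          '01jan1975'd-'31dec1985'd="Disco Years"
          '01jan1986'd-high=[date9.];
run;

proc print;
format date dat. ;
run;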

In the following DATA step, what is needed for 'fraction' to print to the log?
data _null_;
x=1/3;
if x=.3333 then put 'fraction';
run;
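A) One possible answer, offered as a sketch rather than as the original author's: 1/3 is stored as 0.3333333..., so the exact comparison with .3333 is never true. Rounding x to four decimal places before comparing is one common way to make the condition hold.

```sas
data _null_;
   x = 1/3;
   /* compare the value rounded to 4 decimal places instead of the raw value */
   if round(x, .0001) = .3333 then put 'fraction';
run;
```

The same idea applies whenever a computed fraction is compared with a literal: decide on the precision that matters and round to it first.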

What is the difference between calculating the 'mean' using the MEAN function and PROC
MEANS?
A) By default PROC MEANS calculates summary statistics such as N, mean, standard deviation,
minimum and maximum for each variable across observations, whereas the MEAN function
computes only the mean of its arguments within a single observation.
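
A short sketch of the contrast, using a made-up data set (scores, test1-test3 and row_mean are hypothetical names for this example only):

```sas
data scores;
   input id test1 test2 test3;
   /* MEAN function: one value per observation, ignoring missing arguments */
   row_mean = mean(test1, test2, test3);
   datalines;
1 80 90 .
2 70 60 50
;
run;

/* PROC MEANS: N, mean, std dev, min and max per variable, across observations */
proc means data=scores;
   var test1 test2 test3;
run;
```

Here row_mean is 85 for the first observation (the missing test3 is ignored) and 60 for the second, while PROC MEANS reports one line of statistics for each of the three test variables.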

What are some differences between PROC SUMMARY and PROC MEANS?
A) PROC MEANS by default prints its results in the output window; you can suppress this with
the NOPRINT option and write the results to a data set with the OUTPUT OUT= statement.
PROC SUMMARY does not print by default; we have to give the OUTPUT statement explicitly,
or specify the PRINT option to see the results.

What is a problem with merging two data sets that have variables with the same name but
different data?
A) Understanding the basic algorithm of MERGE will help you understand how the step
processes. There are still a few common scenarios whose results sometimes catch users
off guard. Here are a few of the most frequent 'gotchas':

1- BY variables have different lengths
It is possible to perform a MERGE when the lengths of the BY variables are different, but if
the data set with the shorter version is listed first on the MERGE statement, the shorter
length will be used for the BY variable during the merge. Due to this shorter length,
truncation occurs and unintended combinations could result. In Version 8, a warning is
issued to point out this data integrity risk. The warning will be issued regardless of which
data set is listed first:
WARNING: Multiple lengths were specified for the BY variable name by input data sets.
This may cause unexpected results.
Truncation can be avoided by naming the data set with the longest length for the BY variable
first on the MERGE statement, but the warning message is still issued. To prevent the
warning, use PROC CONTENTS to check that the BY variables have the same length before
combining them in the MERGE step. You can change the variable length with either a
LENGTH statement in the merge DATA step prior to the MERGE statement, or by recreating
the data sets to have identical lengths for the BY variables.

Note: A MERGE statement and an IF-THEN statement should not appear in the same DATA step
if the IF-THEN statement involves two variables that come from two different data sets being
merged. If it is not completely clear when MERGE and IF-THEN can be used in one DATA step
and when they should not be, it is best to always separate them into different DATA steps.
Following this recommendation helps avoid unexpected merge results.

Which data set is the controlling data set in the MERGE statement?
A) The data set with fewer observations is the controlling data set in the MERGE statement.

How do the IN= variables improve the capability of a MERGE?


A) The IN= variables
What if you want to keep in the output data set of a merge only the matches (only those
observations to which both input data sets contribute)? SAS will set up for you special
temporary variables, called the "IN=" variables, so that you can do this and more. Here's
what you have to do: signal to SAS on the MERGE statement that you need the IN= variables
for the input data set(s), then use the IN= variables in the data step appropriately. So to
keep only the matches in a match-merge, ask for the IN= variables and use them:
data three;
merge one(in=x) two(in=y); /* x & y are your choices of names for the IN= variables */
by id;                     /* for data sets one and two respectively */
if x=1 and y=1;
run;

What techniques and/or PROCs do you use for tables?


A) Proc Freq, Proc univariate, Proc Tabulate & Proc Report.

Do you prefer PROC REPORT or PROC TABULATE? Why?


A) I prefer PROC REPORT unless I have to create cross-tabulation tables, because it gives me
many options to modify the look of my table (e.g., the WIDTH= option changes the width of
each column in the table), whereas PROC TABULATE cannot produce some of the things I need
in my table. E.g., TABULATE doesn't produce n (%) in the desired format.

How experienced are you with customized reporting and use of DATA _NULL_ features?
A) I have very good experience in creating customized reports as well as with the DATA _NULL_
step. It is a DATA step that generates a report without creating a data set, thereby saving
development time. Another advantage of DATA _NULL_ is that when we submit it, any
compilation error in the statements is detected and written to the log, so errors can be
found by checking the log after submitting. It is also used to create macro variables in
the DATA step.

What is the difference between nodup and nodupkey options?


A) NODUP compares all the variables in our dataset while NODUPKEY compares just the
BY variables.

What is the difference between a compiler and an interpreter?
Give any one example (software product) that acts as an interpreter.
A) Both serve similar purposes but differ in how they achieve them. An interpreter
translates instructions one at a time and executes each one immediately. A compiler takes
a program (source) written in a programming language such as SAS and translates it
into object code or machine language. Compiled code generally does the work more
efficiently, because the compiler produces a complete machine language program, which can
then be executed.

Code the table’s statement for a single level frequency?


A) proc freq data=lib.dataset;
tables var; /* name a single variable, or several variables separated by spaces, to get a
one-way frequency table for each */
run;

What is the main difference between rename and label?


A) 1. LABEL is global and RENAME is local, i.e., the LABEL statement can be used in either a
PROC or a DATA step, whereas the RENAME statement should be used only in a DATA step.
2. If we rename a variable, the old name is lost, but if we label a variable, its short name
(old name) still exists along with its descriptive label.

What is Enterprise Guide? What is the use of it? 


A) SAS Enterprise Guide is a point-and-click Windows client for SAS. Among other tasks, it
offers an approach to importing text files with SAS. (It comes free with Base SAS version 9.0.)

What other SAS features do you use for error trapping and data validation?
What are the validation tools in SAS?
A) For data sets: data set-name / debug; data set-name / stmtchk;
For macros: options mprint mlogic symbolgen;

How can you put a "trace" in your program?


A) Use ODS TRACE ON and ODS TRACE OFF; the trace records are written to the log.
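
A minimal sketch of tracing a single procedure, assuming the SASHELP.CLASS sample data set is available:

```sas
ods trace on;                    /* start writing trace records to the log */
proc means data=sashelp.class;
   var height;
run;
ods trace off;                   /* stop tracing */
```

Each trace record names an output object the procedure produced, which is what you need in order to select or exclude that object later.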

How would you code a merge that will keep only the observations that have matches from
both data sets?
A) Using "IN" variable option. Look at the following example.
data three;
merge one(in=x) two(in=y);
by id;
if x=1 and y=1;
run;
or
data three;
merge one(in=x) two(in=y);
by id;
if x and y;
run;

What are input dataset and output dataset options?


A) Input data set options include OBS=, FIRSTOBS=, WHERE= and IN=; output data set options
include COMPRESS= and REUSE=. Options that apply to both input and output data sets include
KEEP=, DROP= and RENAME=.

How can you create a zero-observation data set?


A) Create the data set by using the LIKE clause. Ex:
proc sql;
create table latha.emp like oracle.emp;
quit;
Here the LIKE clause copies the structure of the existing table to the new table, so this
method results in the creation of an empty table.
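
A DATA step alternative to the LIKE clause, sketched here with SASHELP.CLASS as the template (the output name work.empty_class is hypothetical):

```sas
data work.empty_class;
   /* OBS=0 reads no observations but still copies the variable attributes */
   set sashelp.class(obs=0);
run;
```

The result has the same variables as the source table and zero observations.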

Have you ever linked SAS code? If so, describe the link and any required statements used to
either process the code or the step itself.

A) In the editor window we write:
%include 'path of the sas file';
run;
In a non-windowing environment there is no need to give the RUN statement.

How can you import a .CSV file into SAS? Give the syntax.


A) A CSV file can be created in a text editor such as Notepad, with the variable names
declared on the first line. It can then be read with PROC IMPORT:

proc import datafile='E:\age.csv' out=sarath dbms=csv replace;
getnames=yes;
run;
proc print data=sarath;
run;

What is the use of Proc SQl?


A) PROC SQL is a powerful tool in SAS that combines the functionality of DATA and PROC
steps. PROC SQL can sort, summarize, subset, join (merge), and concatenate data sets, create
new variables, and print the results or create a new data set, all in one step. PROC SQL can
use fewer resources than the equivalent DATA and PROC steps. To join files, PROC SQL does not
require the data to be sorted beforehand, which is a must for a DATA step merge.
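
To illustrate the point about sorting, here is a sketch that joins two deliberately unsorted tables (the data set and variable names are made up for this example):

```sas
data one;
   input id name $;
   datalines;
3 Bill
1 John
2 Joe
;
run;

data two;
   input id state $;
   datalines;
2 NY
3 VA
1 MD
;
run;

/* join, subset the columns and order the result in a single step;
   neither input needs to be sorted by id first */
proc sql;
   create table joined as
   select a.id, a.name, b.state
   from one as a inner join two as b
        on a.id = b.id
   order by a.id;
quit;
```

A DATA step MERGE of the same two tables would need a PROC SORT on each of them first.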

What is SAS GRAPH?


A) SAS/GRAPH software creates and delivers accurate, high-impact visuals that enable
decision makers to gain a quick understanding of critical business issues.

Why is a STOP statement needed for the point=option on a SET statement?


A) When you use the POINT= option, you must include a STOP statement to stop DATA step
processing, programming logic that checks for an invalid value of the POINT= variable, or
both. Because POINT= reads only those observations that are specified in the DO statement,
SAS cannot read an end-of-file indicator as it would if the file were being read sequentially.
Because reading an end-of-file indicator ends a DATA step automatically, failure to substitute
another means of ending the DATA step when you use POINT= can cause the DATA step to
go into a continuous loop.
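
A minimal sketch of the pattern, assuming SASHELP.CLASS as input (every_fifth, i and total are hypothetical names):

```sas
data every_fifth;
   /* NOBS= is set at compile time, so total is known before the loop runs */
   do i = 1 to total by 5;
      set sashelp.class point=i nobs=total;
      output;
   end;
   stop;   /* direct access never reaches end-of-file, so end the step explicitly */
run;
```

Without the STOP statement the DATA step would start over after the loop and repeat it indefinitely.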

What is the difference between nodup and nodupkey options?


A) The NODUP option checks for and eliminates duplicate observations. The NODUPKEY
option checks for and eliminates observations with duplicate BY variable values.

Questions
1.   How do I change default SAS System user profile folder in SAS 9.0? For example, I
want to set the working directory to c:\mysas.
2.   How do I check syntax errors in my SAS program before I submit it?
3.   How do I reorder variables' position in a SAS data set?
4.   How do I remove duplicate observations from my data set?
5.   I have a huge data set. Is there a quick way for me to sort my whole data out?
6.   How do I replace missing values with the variable mean in my data set?
7.   How do I accumulate a numeric variable to get a total amount (totfaminc) for each group
(one household id per record)?
8.   How do I create a unique ID for my data set?
9.   How do I count observations per subject in a data set?
10.   How do I count words in SAS?
11.   I have a messy string variable with people's names and addresses. How do I correct
them?
12.   How do I add a mean value of a variable back to my original data set?
13.   How do I add mean values and total count of a categorical variable back to my original
data set? For example: mean values of height for Sex (female and male).
14.   I have a program that I closed and forgot to save but I do have a log file from when I ran
it. Can I take the text from the log file and make it a program again?
15.   I want to save hard drive space. How do I write a code to delete some or all temporary
files once I do not need them any more?
16.   I am using a very huge data set in SAS for windows. How do I minimize space
requirements?
17.   What is the difference between One-to-Many and Many-to-One Matched Merge in SAS?
18.   I have two data sets. I want to do a Matched Merge and output only the observations
found in both files. How do I do it?
19.   I have two data sets. How do I do a Matched Merge and output the observations that are
in file1 but not in file2, or in file2 but not in file1 (id--4 5 6 7)?
20.   I have two data sets. How do I do a Matched Merge and output the observations
from the master file (id--1, 2, 3, 4, 5)?
21.   I have two large data sets. How do I do a Matched Merge and output the observations
in file1 but not in file2 (id--4, 5)?
22.   How do I merge 10 data sets with a common variable and the same prefix data name?
For example: data1 to data10 with common variable of ID.
23.   I have SAS and STATA but I do not have Stat/Transfer program in my computer. Is
there any easy way for me to convert my SAS data set to STATA?
24.   How do I rotate data from long format to wide format?
25.   My string variable MEMO contains two quotes (it's 'ok'). What should I do if I
want to use an if/then statement to check the value of MEMO (if MEMO = it's 'ok' then ...;)?
26.   How do I use Proc SQL to get mean(var), std(var), min(var), max(var), median(var) from
a data set?
27.   How do I create a data set with observations =100, mean 0 and standard deviation 1?
28.   How do I randomly sample a certain proportion of observations from a SAS dataset? For
example, 10%.
29.   How do I create a summarized report without creating a new variable? For example: I
have an AGE variable and I want to generate a summarized report that shows the number of
people in different groups.
30.   Is there a simple way to indicate an integer range in SAS?
31.   How do I know what products in my computer are licensed? What is my site number and
when will my SAS expire?
32.   I have a file that is an activity based file for each person throughout the day. There is a
start time and end time as well as a duration time that shows how many minutes they were
doing the activity. I need to create a file which has a 1440 observation for each person for
every minute of the day. How do I create a time serial data?
33.   How do I create a data set with a permanent variable value label (permanent format)?
34.   I have just received a SAS data file (emp.sas7bdat) with a SAS format file
(formats.sas7bcat). How do I use them?
35.   How do I create a flat text file from a SAS data set?
36.   How do I set ODS to save my output in HTML format by default?
37.   How do I sort the values of two related arrays?
38.   I am using a large data set. How do I eliminate the output in the output window?
39.   How can I change the orientation for printing in my SAS program?
40.   I want to save paper while printing. How can I change the font when printing
my SAS code, output file, and log file?
41.   How can I generate percentile ranks using SAS?
42.   How do I generate a directory file with total observations, total variables, file size and
the last time the file was modified for a defined library name?
43.   How do I direct output a file from output window to a flat file?
44.   How do I automatically output a single txt file that contains both LOG and LST files?
45.   I want to save paper when printing the output file by skipping page breaks. What can I
do?
46.   I have a character variable with regular time values like 01:00:00 pm and 09:00:02
am in my SAS data set. I want to order 9am earlier than 1pm. What should I do?
47.   I have a character variable with regular time values like 04:30:01 or
23:00:00 in my SAS data set. I want to convert them to hours. How do I do that?
48.   How do I convert a character date value to a SAS date value?
49.   I have a character variable with mixed regular date values like 01mar99, 01 mar 99 and
01-mar-1999. I want to convert them to SAS date values. What should I do?
50.   How do I create a categorical variable using a variable's percentage?
51.   I keep getting an error message "out of memory" when I use proc freq with two
categorical variables to try to get a freq report. Why and how do I avoid this problem?
52.   I need to create a comma separated file (.csv) from a SAS data, what should I do?

ANSWERS
1. Q: How do I change default SAS System user profile folder in SAS 9.0? For example, I
want to set the working directory to c:\mysas.

A: The best way to do this is to change the -sasuser parameter in the sasv9.cfg file. Just set
this in sasv9.cfg from your computer and you are ready to go.

-sasuser "c:\mysas"
2. Q: How do I check syntax errors in my SAS program before I submit it?

A:
1. Add "options obs=0 noreplace;" at the top of your SAS program and submit
your SAS program.
2. Check your Log window. If there are no errors in your SAS program, reset the system
options with "options obs=max replace;".
3. Submit your SAS program again.
3. Q: How do I reorder variables' position in a SAS data set?

A: You can add either Length or Retain statement before a Set statement in a data step.

For example:
data a; 
input a1 b1 c1 a2 b2 c2 a3 b3 c3; 
datalines;
1 2 3 4 5 6 7 8 9
;

data b; 
length a1-a3 b1-b3 c1-c3 8;
set a; 
run;

data c; 
retain a1-a3 b1-b3 c1-c3;
set a;
run;
4. Q: How do I remove duplicate observations from my data set?

A:
1. You can use Proc Sort with the Nodup option to remove observations that are duplicates on
all variables, or with the NodupKey option to remove observations that are duplicates on the
By key variables only.
2. Dupout is a new option in Proc Sort. It names a SAS data set that will contain the duplicate
records eliminated from the data set.
3. You can also use FIRST. and LAST. variables in a data step to remove
duplicate observations.
data temp1;
input id x y ;
cards ;
1 20 1
1 20 1
1 20 2
2 20 3
3 20 4
;
proc sort data=temp1 out=temp2 nodup;
by id;
run;

proc print data=temp2 ;


title 'no duplicates for all variables';
run;

proc sort data=temp1 out=temp3 nodupkey dupout=dropped;


by id;
run;

proc print data=temp3 ;


title 'no duplicate for by variable only';
run;
proc print data=dropped;
title 'data set Dropped: the 2 observations with duplicate key values deleted from Temp1';
run;
proc sort data=temp1;
by id;
run;

data nodup;
set temp1;
by id;
if first.id then output;
run;

proc print data=nodup;


run;

data nodup dup;


set temp1;
by id;
if first.id then output nodup;
else output dup;
run;

proc print data=nodup;


run;
proc print data=dup;
run;
5. Q: I have a huge data set. Is there a quick way for me to sort my whole data out?

A: As a general rule of thumb, sorting requires free space of about three times the size of
the data set.
1. The Tagsort option works well on large data sets where the key is small.
2. Use the Noequals option on PROC SORT if you do not need observations with identical BY
values to keep the same relative order they had in the input data.
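
A sketch of both options together; work.big is a hypothetical stand-in for a large data set and is generated here only so the example runs on its own:

```sas
/* hypothetical stand-in for a large data set with a small key */
data work.big;
   do id = 1000 to 1 by -1;
      payload = ranuni(1);
      output;
   end;
run;

/* TAGSORT sorts only the BY values plus observation numbers (the tags);
   NOEQUALS lets SAS ignore the input order of observations that tie on id */
proc sort data=work.big out=work.big_sorted tagsort noequals;
   by id;
run;
```

TAGSORT trades extra processing time for less temporary disk space, so it tends to pay off when observations are wide and the BY key is short.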
6. Q: How do I replace missing values with the variable mean in my data set?

A: The easiest way is to use Proc Standard with Replace option to replace all missing values
with the variable mean. For example:

data raw ;
input var1-var5;
datalines;
1 . 1 6 6
2 5 2 7 7
3 5 . 8 8
4 5 4 9 9
5 5 5 . 10
;

proc standard data=raw out=rep_mean replace; 
  var var1-var5; 
run;
Output:
1 5 1 6.0 6
2 5 2 7.0 7
3 5 3 8.0 8
4 5 4 9.0 9
5 5 5 7.5 10
7. Q: How do I accumulate a numeric variable to get a total amount (totfaminc) for each group
(one hhid per record)?

A: You can use First. and Last. variables in a data step to do it.

data test;
input hhid pn faminc;
datalines;
1 1 30000
1 2 20000
1 3 0
2 1 50000
2 2 0
;
proc sort;
by hhid;
run ;
data new;
set test;
by hhid;
if first.hhid then totfaminc=0;
totfaminc+faminc;
if last.hhid then output;
drop pn;
run;

Output:
hhid   faminc   totfaminc
  1      0        50000
  2      0        50000
8. Q: How do I create a unique ID for my data set?

A: You can either use Sum statement (id+1;) or system variable "_n_" to do so.

data demo;
input name $ v1-v5;
id+1; /* sum statement: id is retained and incremented for each observation */
cards ;
A 10 20 30 40 50
B 11 12 13 14 15
C 15 20 25 30 35
D 22 33 44 55 66
;
data new;
set sashelp.class;
id=_n_;
run;
9. Q: How do I count observations per subject in a data set?

A: Use SUM statement and FIRST. variable to count observations per subject. 


For example:

data temp; 
input id num; 
datalines;
1 1 
1 2 
1 1 
2 1 
2 2 
3 1
;
proc sort data=temp;
by id;
run;

data temp1; 
set temp; 
by id; 
count+1; 
if first.id then count=1; 
run;
proc sort;
by id descending count;
run;
data temp2;
set temp1;
by id; 
retain totc;
if first.id then
totc=count;
output;
run;

proc print;
run;

Output:
id   num count totc
1    1    1       3
1    2    2       3
1    1    3       3
2    1    1       2
2    2    2       2
3    1    1       1
10. Q: How do I count words in SAS? 

A: Use the COUNT function in SAS v9 to count the number of times a substring occurs in a
string.

data a;
x="good good study, day day up";
y=count(x,'day'); /* 'day' occurs twice in x */
run;
11. Q: I have a messy string variable with people's names and addresses. How do I correct
them? 

A: It's very easy to do it with PROPCASE function in SAS V9. For example:

data proper;
input name $char20.;
name = propcase(name);
datalines;
jeff Rust
Amy lee
;
proc print; run;
12. Q: How do I add a mean value of a variable back to my original data set?

A: 
1. Use Proc Means to create a new data set to hold a mean value for the variable.
2. Use a data step with two SET statements to add a mean value back to your original data set.
For example:
proc means data=sashelp.class;
var height;
output out=new mean=avg_height;
run;
data addavg;
if _N_ =1 then set new;
set sashelp.class ;
run;
13. Q: How do I add mean values and total count of a categorical variable back to my original
data set? For example: mean values of height for Sex (female and male).

A:
1. Sort the data by Sex.
2. Use Proc Means with BY to create a new data set to hold a mean and count values by Sex.
3. Use a data step with MERGE statements to merge them by Sex.
proc sort data=ye.class;
by sex;
proc means data=ye.class;
var height;
by sex;
output out=new mean=avg_height n =n_height;
run;
proc sort;
by sex; run;
data addavg;
merge ye.class new;
by sex;
drop _type_ _freq_;
run;

Output:
Name      Sex  Age  Height  Weight  avg_height  n_height
Joyce     F    11   51.3     50.5   61.7500      8
Yeats     F    12   59.8     84.5   61.7500      8
Louise    F    12   56.3     77.0   61.7500      8
Alice     F    13   56.5     84.0   61.7500      8
Barbara   F    13   65.3     98.0   61.7500      8
Carol     F    14   62.8    102.5   61.7500      8
Judy      F    14   64.3     90.0   61.7500      8
Janet     F    15   62.5    112.5   61.7500      8
Mary      F    15   66.5    112.0   61.7500      8
Thomas    M    11   57.5     85.0   62.7636     11
James     M    12   57.3     83.0   62.7636     11
John ye   M    12   59.0     99.5   62.7636     11
Robert    M    12   64.8    128.0   62.7636     11
Jeffrey   M    13   62.5     84.0   62.7636     11
Alfred    M    14   69.0    112.5   62.7636     11
Henry     M    14   63.5    102.5   62.7636     11
William   M    15   66.5      1.0   62.7636     11
Ronald    M    15   67.0    133.0   62.7636     11
Philip    M    16   72.0      1.0   62.7636     11

14. Q: I have a program that I closed and forgot to save but I do have a log file from when I
ran it. Can I take the text from the log file and make it a program again?

A: Yes, You can. In log window, hold ALT, move the cursor to your code and highlight it.
Right click and select edit, then select copy. Go back to SAS Edit Window and paste it.
15. Q: I want to save hard drive space. How do I write a code to delete some or all temporary
files once I do not need them any more? 

A: You can manually delete all temporary files from "Work" library. You can also use the
Datasets procedure with the Delete or Save statement to delete some of temporary files. The
Kill option will delete all SAS files immediately after you submit the statement.

Example:
1. Delete all temporary files from "Work" library
proc datasets library=work memtype=data kill; 
run;
quit;

2. Delete some temporary files from "Work" library


proc datasets library=work memtype=data;
delete test1 test2;
*deletes datasets test1 & test2 and keeps the rest;
quit;

proc datasets library=work memtype=data;


save test3; 
*saves test3 and deletes the rest;
quit;
16. Q: I am using a very huge data set in SAS for windows. How do I minimize space
requirements?

A: When you are working with large data sets, you can do the following steps to reduce space
requirements.

1. Split huge data set into smaller data sets.


2. Clean up your working space as much as possible at each step.
3. Use dataset options (keep= , drop=) or statement (keep, drop) to limit to only the variables
needed. 
4. Use IF statement or OBS = to limit the number of observations. 
5. Use WHERE= or WHERE or index to optimize the WHERE expression to limit the
number of observations in a Proc Step and Data Step. 
6. Use length to limit the bytes of variables.
7. Use _null_ dataset name when you don't need to create a dataset. 
8. Compress dataset using system options or dataset options (COMPRESS=yes or
COMPRESS=binary).
9. Use SQL to do merge, summary, sort etc. rather than a combination of Proc Step and Data
Step with temporary datasets.
17. Q: What is the difference between One-to-Many and Many-to-One Matched
Merge in SAS?

A: One-to-Many merge is the same as a Many-to-One Matched Merge except the order of the
variables in the new data set is different. 
For example:
data d1;
input stud $ teacher $;
datalines;
Barry Sandy
Bill Sue
Ellen Sue
John Sue
;
proc sort ;
by teacher;
run;

data d2;
input teacher $ room;
datalines;
Sandy 101
Sue 103
;
proc sort ;
by teacher; 
run;
data combine1t2 ; 
      merge d1 d2 ; by teacher ; 
run;

data combine2t1 ; 
      merge d2 d1; by teacher ; 
run;

Output (merge d1 d2;):
stud    teacher   room
Barry   Sandy     101
Bill    Sue       103
Ellen   Sue       103
John    Sue       103

Output (merge d2 d1;):
teacher   room   stud
Sandy     101    Barry
Sue       103    Bill
Sue       103    Ellen
Sue       103    John

18. Q: I have two data sets. I want to do a Matched Merge and output only consisting
of observations from both files. How do I do it? 

A: In SAS, there is a Boolean flag called the IN= variable. It is used in a matched merge to
track and select which observations from the data sets on the MERGE statement will go to the
new data set.

data file1;
input id name $;
datalines;
1 John
2 Joe
3 Bill
4 Bob
5 Sandy
;
proc sort data=file1;
by id;
run;
data file2;
input id state $;
datalines;
1 MD
2 NY
3 VA
6 NJ
7 NC
;
proc sort data=file2;
by id;
run;
data in_both;
merge file1(in=infile1) file2(in=infile2);
           by id;
           if infile1= infile2;
run;
Output:
1 John MD
2 Joe   NY
3 Bill     VA
19. Q: I have two data sets. How do I do a Matched Merge and output the observations
that are in file1 but not in file2, or in file2 but not in file1 (id--4 5 6 7)?

A:
data in_both;
merge file1(in=infile1) file2(in=infile2);
           by id;
           if infile1 ne infile2;
run;
Output:
4 Bob
5 Sandy
6                NJ
7                NC
20. Q: I have two data sets. How do I do a Matched Merge and output consisting
of observations from master file (id--1, 2, 3, 4, 5)?

A:
data in_both;
merge file1(in=a) file2;
           by id;
           if a; 
run;
proc print;
run;
Output:
1 john        MD
2 joe          NY
3 bill           VA
4 bob
5 sandy
21. Q: I have two large data sets. How do I do a Matched Merge and output the observations
in file1 but not in file2 (id--4, 5)?

A: Both a Data Step with a Merge statement and Proc SQL will work, but Proc SQL can be
more efficient for large data sets.
data not_in_a;
merge file1(in=x) file2(in=y);
by id;
if not y;
run;

proc sql;
create table not_in_a as
select id
from file1
except
select id
from file2;
quit;
22. Q: How do I merge 10 data sets with a common variable and the same prefix data name?
For example: data1to data10 with common variable of ID.

A: The best way is to use a macro.

options mprint;
data data1;
input id x;
datalines;
1 1
2 2
3 3
4 4
;
data data2;
input id y;
datalines;
1 5
2 6
3 7
4 8
;
data data3;
input id z;
datalines;
1 9
2 10
3 11
4 12
;
....
%macro mymerge (n);
     Data merged;
     Merge
          %do i = 1 %to &n; data&i %end;
          ; /* this semicolon ends the MERGE statement built by the loop */
          By id;
     Run;
%mend;

%mymerge(10)
23. Q: I have SAS and STATA but I do not have Stat/Transfer program in my computer. Is
there any easy way for me to convert my SAS data set to STATA?

A: Yes.
Method1: There is a Stata user-written (ado-file) command "usesas" created by Dr. Dan
Blanchette. The command uses SAS to run the SAS macro savastata.sas and loads
the SAS data set into Stata's memory, provided you have SAS installed on your computer. The
user can then decide to save the data as a Stata data set using Stata's "save" command.
For example:

1. Use Stata's command "ssc install" to download and install it.


     ssc install usesas     

2. Use the "usesas" command to load a SAS data set from your computer into Stata's memory.


     usesas using "c:\data\yoursasdat.sas7bdat"

3. Save your Stata data if needed.


     save datname
 

Method2:

In SAS, create a transport format file.


libname out XPORT "c:\data\class.xpt";
data out.class;
set sashelp.class;
run;

In STATA, use command "fdause" to read the transport format file into Stata's memory:


cd c:\data
fdause class
save class
24. Q: How do I rotate a data from long format to wide format?

A: Restructure data set from long format to wide format with arrays. 

data x;
input tucaseid tulineno perlr earv;
datalines;
1 1 100 200
2 1 300 400
2 2 500 600
3 1 700 800
3 2 900 1000
3 3 1100 1200
4 1 1300 1400 
;
proc sort;
by tucaseid ;
run;

data y ;
set x;
by tucaseid ;
retain perlr1-perlr4 earv1-earv4 i;
array ary_perlr(4) perlr1-perlr4;
array ary_earv(4) earv1-earv4;

if first.tucaseid then do;


           do i = 1 to 4;
              ary_perlr(i)=0;
              ary_earv(i)=0;
           end;
           i=1; 
end;

      ary_perlr(i)=perlr;
      ary_earv(i)=earv;
      i=i+1;
if last.tucaseid then output; 

drop perlr earv i tulineno;


run;

Output:
tucaseid  perlr1  perlr2  perlr3  perlr4  earv1  earv2  earv3  earv4
1          100      0       0       0      200     0      0      0
2          300    500       0       0      400   600      0      0
3          700    900    1100       0      800  1000   1200      0
4         1300      0       0       0     1400     0      0      0

25. Q: My string variable MEMO contains two quotes (it's 'ok'). What should I do if I
want to use an if/then statement to check the value of MEMO (if MEMO = it's 'ok' then ...;)?
A: Check the value of MEMO using if/then statement as follows: 
if MEMO = "it's 'ok'" then y=1;
26. Q: How do I use Proc SQL to get mean(var), std(var), min(var), max(var), median (var)
from a data set? 

A: MEDIAN() is not a "summary" function in PROC SQL, but you can still calculate the
median in PROC SQL with some extra work.
data a; 
input age; 
datalines;






Proc sql; 
Select mean(age), std(age), min(age), max(age) from a; 
Select avg(median) as median from (select x.age as median from a as x, a as y group by x.age
having sum(sign(x.age-y.age)) in (1,0,-1)); 
Quit;
27. Q: How do I create a data set with observations =100, mean 0 and standard deviation 1?

A: data one;
do i = 1 to 100;
           num = 0 + rannor(1) * 1; 
           output; 
end; 
run; 

proc means data = one mean stddev; 


var num; 
run;
28. Q: How do I randomly sample a certain proportion of observations from a SAS dataset?
For example, 10%. 

A: In the DATA step, include the following line to select approximately 10% of the observations
from the original data.
if ranuni(0)<=.1 ; 

You can also use Proc Surveyselect to draw a simple random sample of a fixed size. For example,
a sample of 10 observations:

proc surveyselect data=sashelp.class method=srs n=10 


          out=Sample10; 
run;
29. Q: How do I create a summarized report without creating a new variable? For example: I
have an AGE variable and I want to generate a summarized report that shows the number of
people in different groups. 

A:
1. Create a user defined format for the group. 
2. In Proc Freq, when the format is applied to a variable on the TABLE statement, the
formatted value will be used for the distribution. 

Proc format; 
Value fmtnum 
. = 'missing'
Low -<0 = 'negative'
0 = 'zero'
0 <- high = 'positive';
Run; 

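Proc freq; 
tables age; /* PROC FREQ uses a TABLES statement, not VAR */
format age fmtnum.; /* apply the user-defined format created above */
run;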
30. Q: Is there a simple way to indicate an integer range in SAS? 

A: Yes. In SAS version 9, the IN operator accepts integer ranges.

For example: for integer values of age, you can use "if age in (11, 13:19)" instead of
"if age = 11 or 13 <= age <= 19".
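
The range syntax can be tried on the SASHELP.CLASS sample data. This is only a minimal sketch; the output data set name teens is made up for illustration.

```sas
/* Keep students aged 11, or aged 13 through 19 */
data teens;                    /* hypothetical output data set name */
   set sashelp.class;
   if age in (11, 13:19);      /* 13:19 stands for the integers 13, 14, ..., 19 */
run;

proc print data=teens;
run;
```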
31. Q: How do I know which products on my computer are licensed? What is my site number
and when will my SAS license expire? 

A: Proc setinit shows the products that are currently licensed in your session, their expiration
dates, and your site number. 

Proc setinit; 
run ;
32. Q: I have a file that is an activity based file for each person throughout the
day. There is a start time and end time as well as a duration time that show how many minutes
they were doing the activity. I need to create a file which has 1440 observations for each
person, one for every minute of the day. How do I create time series data? 

A: There are several ways to do it. If your START TIME and END TIME are SAS time
values (stored in seconds), you can create a "startmin" variable as "starttime" divided by 60 and
an "endmin" variable as "endtime" divided by 60, and then create your time series data.
Otherwise, you can just use the duration variable. 

data a; 
input id activity_code starttime endtime; 
datalines;
1 1 1 8 
1 2 9 17 
1 3 18 24 
2 1 1 9 
2 5 10 18 
2 3 19 24 

data b; 
set a; 
hrs=starttime; 
   do until(hrs=endtime); 
    hrs+1; 
  count=hrs-starttime; 
    output; 
   end; 
run; 
33. Q: How do I create a data set with permanent variable value label (permanent format)? 

A. 
1. Create a permanent format using the "library=project" option on the proc format statement.
2. Use "Options FMTSEARCH = (project);" to tell SAS where to look for the format. 
3. Access the permanent format using a format statement.

/*myformat.sas -- Creating a permanent format*/


Libname project 'C:\';
proc format library=project;
value $ sexfmt
'M'='male'
'F'='female'
;
value agefmt
1-17 ='17 and under'
18-high ='18 and up'
;
run; 

/*mysasfile.sas--Accessing a permanent format*/


Options FMTSEARCH = (project); /*Tell SAS where to look for the format */ 
libname project 'c:\'; 

data project.perm_format;
set sashelp.class;
format sex $sexfmt. age agefmt.;
run;
proc print;
run;
34. Q: I have just received a SAS data file (emp.sas7bdat) with a SAS format file
(formats.sas7bcat). How do I use them? 

A: Save both of them in the same directory on your computer, for example C:\. Then write
a SAS program like the one below and submit it. 

libname mylib "c:\"; 
libname library "c:\"; 
options fmtsearch = (mylib); 
proc means data=mylib.emp; 
run;
35. Q: How do I create a flat text file from a SAS data set? 

A: You can use a program such as Stat/Transfer to convert it. You can also use a procedure (Proc
Export or Proc Printto) or FILE and PUT statements in a data step to create it. 

filename rawdat 'c:\output.txt' ; 


data _null_; 
    set sashelp.class ; 
    file rawdat ; 
    put name age; 
run ;
36. Q: How do I set ODS to save my output in HTML format by default? 

A: You can use the SAS Preferences dialog to tell SAS to save all Data
step and Proc output in HTML format. 

1. Click the Tools option, 


2. Click the Options option, 
3. Click the Preferences... option,
4. Click the Results tab, 
5. For HTML output, check the Create HTML box.
37. Q: How do I sort the values of two related arrays? 

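A: You can use a Do loop with two arrays, as in data set b below. CALL SORTN (last step) sorts
all of its arguments together as a single list, so it puts the six values in ascending order but
does not keep each value paired with its key. 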

data temp;
x=3 ;
y=1 ;
z=2 ; 
value_x=100;
value_y=300;
value_z=200;
run ;

data b;
set temp;
array ary1(3) x y z;
array ary2(3) value_x value_y value_z;
max=0;
do i = 1 to 3;
if ary1(i) > max then do;max= ary1(i) ; maxother= ary2(i) ; end;
else 
do;t=max; ary1(i-1)=ary1(i);ary1(i)=t; 
tother=maxother; ary2(i-1)=ary2(i); ary2(i)=tother; 
end;
end;
keep x y z value_x value_y value_z;
run; 

data sortarray; 
set temp; 
CALL SORTN (x, y, z, value_x, value_y, value_z); 
run;
38. Q: I am using a large data set. How do I eliminate the output in output window? 

A: Add the following statement to your code so that SAS reports "No output destinations active"
and writes nothing to the output window:

ODS listing close;
39. Q: How can I change the orientation for printing in my SAS program? 

A: Add the following command to your SAS program to control the printer's orientation setting. 

dm 'dlgprtsetup orient=landscape nodisplay';


40. Q: I want to save paper while printing. How can I change the font when printing
my SAS code, output file, and log file? 

A: The default font size for printing SAS code, output file, and log file is 10. You can use the
SYSPRINTFONT option to change it. 

OPTIONS SYSPRINTFONT='SAS Monospace' 6;
proc print data=sashelp.class; 
run;
41. Q: How can I generate percentile ranks using SAS? 

A: proc rank groups=100 out=dataset; 


var x y; 
ranks prv1 prv2; 
run;
42. Q: How do I generate a directory file with total observations, total variables, file size and
the last time the file was modified for a defined library name? 

A: Use proc datasets to list the total observations, total variables, file size and
the last time each file was modified for a defined "library" in the log window. 

proc datasets library=mylib mt=data details; 


run; 
quit; 

Use ODS to save them to a new data set: 


ods output members=dir_mylib; 
proc datasets library=mylib mt=data details; 
run; 
quit; 
ods output close;
43. Q: How do I direct output from the output window to a flat file?

A: You can use Proc Printto to automatically route output to a file. 


proc printto print='c:\auto.lst' new; 
.... 
run;
44. Q: How do I automatically output a single txt file that contains both LOG and LST files? 
A: filename proj1 'c:\mysas\output\proj1.txt'; 
proc printto log= proj1 print= proj1 new; 

           /* begin your program here */ 


           data test; 
          set sashelp.class; 
           proc reg; 
                      model weight = height; 
          run; 
           /* end your program here */ 

proc printto; 
run;
45. Q: I want to save paper when printing output file by skipping page breaks. What can I do? 

A: To turn off page breaks in the output window, use the FORMDLIM system option. 

To turn off page breaks, set the option to a single blank:

options formdlim=' '; /* a blank */

To turn page breaks back on, set it to a null string:

options formdlim=''; /* no blank */
data new;
time="11:25";
hour=scan(time,1,':');
minute=scan(time,2,':');
sastime=hms(hour,minute,0);
format sastime time5.;
run;
46. Q: I have a character variable with regular time values like 01:00:00 pm and 09:00:02
am in my SAS data set. I want 9 am to sort before 1 pm. What should I do? 

A: Use the INPUT function with a SAS time informat to convert it to a SAS time value, and then
sort in ascending order. 

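data a; 
*input begin time8.; 
input begin $11.; /* "01:00:00 am" is 11 characters long */
datalines;
01:00:00 am 
01:00:00 pm 
09:00:02 am 
12:00:03 pm 
;

data b; 
set a; 
newtime=input(begin, time12.); /* character value to SAS time value */
run;
proc sort data=b; 
by newtime; 
run;
proc print data=b; 
run;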
47. Q: I have a character variable with regular time values like 04:30:01 or
23:00:00 in my SAS data set. I want to convert them to hours. How do I do that? 

A:
1. Use SCAN function to get how many hours, minutes and seconds from your character
variable
2. Use HMS function to convert them to seconds
3. Convert seconds to hours 

For example: 
data new; 
time="04:30:01"; 
hour=scan(time,1,':'); 
minute=scan(time,2,':'); 
se=scan(time,3,':'); 
sastime=hms(hour,minute,se); 
hr24=hms(hour,minute,se)/3600; 
convert24=hms(scan(time,1,':'),scan(time,2,':'),scan(time,3,':'))/3600; 
run;
48. Q: How do I convert character date value to a SAS date value? 

A: You can use input function to do it. 

data one; 
input chardate $8. ; 
datalines; 
19600101 
19591231 
20000101 

data two; 
set one; 
sasdate=input(chardate,yymmdd8.); 
run; 

Output: 
chardate sasdate 
19600101 0 
19591231 -1 
20000101 14610
49. Q: I have a character variable with mixed regular date values like 01mar99, 01 mar 99 and
01-mar-1999. I want to convert them to SAS date values. What should I do? 

A: You can use the INPUT function with a SAS date informat to convert them to SAS date values. 

data aa; 
input @1 var1 $ 20. ; 
datalines;
01mar99 
01 mar 99 
01-mar-1999 

data bb; 
set aa; 
statadate = input(var1,date11.) ; 
run;
50. Q: How do I create a categorical variable using a variable's percentage? 

A: You can use Proc Freq to generate a variable called PERCENT, merge it back to your
original data, and then create a categorical variable using variable PERCENT. 

proc freq data=ye.class; 


table age/out=agefreq(keep=age PERCENT); 
run; 
proc sort data=ye.class; 
by age; 
proc sort data=agefreq; 
by age; 
data all; 
merge ye.class(in=a) agefreq; 
by age; 
if a; 
PERCENT=int(PERCENT); 
if PERCENT > 20 then freqcat=1; 
else freqcat=0; 
run;
51. Q: I keep getting an error message "out of memory" when I use proc freq with two
categorical variables to try to get a freq report. Why, and how do I avoid this problem? 

A: SAS stores all of the value combinations in memory during processing when you use Proc
Freq. When the categorical variables have too many levels, they use up the available memory,
resulting in the error message "out of memory". To prevent this error, use Proc Sort
with a By statement and also include a By statement in your Proc Freq.
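
The BY-group approach described above could look like the sketch below. The data set work.big and the variables region and product are hypothetical names used only for illustration. Note that with a BY statement Proc Freq produces one table per BY group rather than a single crosstab.

```sas
/* Hypothetical names: work.big is the large data set,
   region and product are the two categorical variables. */
proc sort data=work.big out=work.big_sorted;
   by region;
run;

proc freq data=work.big_sorted;
   by region;          /* process one BY group at a time */
   tables product;     /* one-way table within each region */
run;
```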
52. Q: I need to create a comma separated file (.csv) from a SAS data, what should I do? 

A: A quick and easy way is using "Data _null_". 

filename outfile 'c:\class.csv';


data _null_; 
         file outfile dsd; /* DSD writes comma-separated values */
         set sashelp.class; 
         put (_all_) (+0); 
run;

 What SAS statements would you code to read an external raw data file to a DATA
step? INFILE statement: infile "path";
 How do you read in the variables that you need? Input statement
 Are you familiar with special input delimiters? How are they used? Dlm and DSD
 If reading a variable length file with fixed input, how would you prevent SAS from
reading the next record if the last variable didn't have a value?
MISSOVER
 What is the difference between an informat and a format? Name three informats or
formats.

 Name and describe three SAS functions that you have used, if any?
1. COMPRESS removes specified characters (blanks by default) from a
character value; it is often used when concatenating two values
without spaces.

2. INPUT is a conversion function. It converts character values to numeric.

3. PUT is another conversion function. It converts numeric values to character.

SAS functions can be used to convert data and manipulate
character variable values.
Other commonly used functions:
1) TRIM: removes the trailing blanks from character expressions.
   Syntax: trim(argument)
2) SUBSTR: extracts a substring from an argument.
   Syntax: substr(argument,position<,n>)
3) ABS: returns the absolute value of the argument.
   Syntax: abs(argument)
4) SCAN: returns the nth word from a character expression.
   Syntax: scan(argument,n<,delimiters>)

 
 How would you code the criteria to restrict the output to be produced?
 What is the purpose of the trailing @? The @@? How would you use them?
 Under what circumstances would you code a SELECT construct instead of IF
statements?
 What statement do you code to tell SAS that it is to write to an external file? What
statement do you code to write the record to the file?
 If reading an external file to produce an external file, what is the shortcut to write
that record without coding every single variable on the record?
 If you're not wanting any SAS output from a data step, how would you code the
data statement to prevent SAS from producing a set?
 What is the one statement to set the criteria of data that can be coded in any step?
 Have you ever linked SAS code? If so, describe the link and any required
statements used to either process the code or the step itself.
 How would you include common or reuse code to be processed along with your
statements?
 When looking for data contained in a character string of 150 bytes, which function
is the best to locate that data: scan, index, or indexc?
 If you have a data set that contains 100 variables, but you need only five of those,
what is the code to force SAS to use only those variable?
 Code a PROC SORT on a data set containing State, District and County as the
primary variables, along with several numeric variables.
 How would you delete duplicate observations? NODUP (NODUPRECS) option in PROC SORT
 How would you delete observations with duplicate keys? NODUPKEY option in PROC SORT
 How would you code a merge that will keep only the observations that have
matches from both sets.

Data dataname;
Merge test1(in=a) test2(in=b);
By id;
If a and b;
Run;
 How would you code a merge that will write the matches of both to one data set,
the non-matches from the left-most data set to a second data set, and the non-matches
of the right-most data set to a third data set.
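
One possible answer to the last question is sketched below. The data set names left, right, matched, left_only and right_only and the key variable id are hypothetical, and both inputs are assumed to be already sorted by id.

```sas
/* Hypothetical inputs: left and right, both sorted by id */
data matched left_only right_only;
   merge left(in=a) right(in=b);
   by id;
   if a and b then output matched;      /* key found in both data sets   */
   else if a then output left_only;     /* key only in the left-most set */
   else output right_only;              /* key only in the right-most set */
run;
```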
More SAS Interview Questions submitted by Sumit
Very Basic
 What SAS statements would you code to read an external raw data file to a DATA
step?
 How do you read in the variables that you need?
 Are you familiar with special input delimiters? How are they used?
 If reading a variable length file with fixed input, how would you prevent SAS from
reading the next record if the last variable didn't have a value?
 What is the difference between an informat and a format? Name three informats or
formats.
 Name and describe three SAS functions that you have used, if any?
 How would you code the criteria to restrict the output to be produced?
 What is the purpose of the trailing @? The @@? How would you use them?
 Under what circumstances would you code a SELECT construct instead of IF
statements?
 What statement do you code to tell SAS that it is to write to an external file? What
statement do you code to write the record to the file?
 If reading an external file to produce an external file, what is the shortcut to write
that record without coding every single variable on the record?
 If you're not wanting any SAS output from a data step, how would you code the
data statement to prevent SAS from producing a set?
 What is the one statement to set the criteria of data that can be coded in any step?
 Have you ever linked SAS code? If so, describe the link and any required
statements used to either process the code or the step itself.
 How would you include common or reuse code to be processed along with your
statements?
 When looking for data contained in a character string of 150 bytes, which function
is the best to locate that data: scan, index, or indexc?
 If you have a data set that contains 100 variables, but you need only five of those,
what is the code to force SAS to use only those variable?
 Code a PROC SORT on a data set containing State, District and County as the
primary variables, along with several numeric variables.
 How would you delete duplicate observations?
 How would you delete observations with duplicate keys?
 How would you code a merge that will keep only the observations that have
matches from both sets.
 How would you code a merge that will write the matches of both to one data set,
the non-matches from the left-most data set to a second data set, and the non-matches
of the right-most data set to a third data set.
Internals
 What is the Program Data Vector (PDV)? What are its functions?
PDV (Program Data Vector) is a logical area in memory where
SAS builds a data set, one observation at a time.
When SAS processes a DATA step it goes through two phases:
the compilation phase and the execution phase.
During the compilation phase the input buffer is created to
hold a record from an external file. After the input buffer is
created, the PDV is created.
Along with data set variables and computed variables, the
PDV contains two automatic variables, _N_ and _ERROR_. The
_N_ variable counts the number of times the DATA step
begins to iterate. The _ERROR_ variable signals the
occurrence of an error caused by the data during execution.
The value of _ERROR_ is either 0 (indicating no errors
exist), or 1 (indicating that one or more errors have
occurred). SAS does not write these variables to the output
data set.
  
 Does SAS 'Translate' (compile) or does it 'Interpret'? Explain.
 At compile time when a SAS data set is read, what items are created?
 Name statements that are recognized at compile time only?
 Identify statements whose placement in the DATA step is critical.
 Name statements that function at both compile and execution time.
 Name statements that are execution only.
 In the flow of DATA step processing, what is the first action in a typical DATA
Step?
 What is _n_?
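
The automatic variables described in the PDV answer above can be inspected with a PUT statement. This is a small sketch using the SASHELP.CLASS sample data; it writes one line per iteration to the log and creates no data set.

```sas
data _null_;
   set sashelp.class(obs=3);      /* read only the first three observations */
   put _n_= _error_= name=;       /* automatic variables exist in the PDV
                                     but are never written to a data set */
run;
```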
Base SAS
 What is the effect of the OPTIONS statement ERRORS=1?
 What's the difference between VAR A1 - A4 and VAR A1 -- A4?
 What do the SAS log messages "numeric values have been converted to character"
mean? What are the implications?
 Why is a STOP statement needed for the POINT= option on a SET statement?
 How do you control the number of observations and/or variables read or written?
 Approximately what date is represented by the SAS date value of 730?
 How would you remove a format that has been permanently associated with a
variable??
 What does the RUN statement do?
 Why is SAS considered self-documenting?
 What areas of SAS are you most interested in?
 Briefly describe 5 ways to do a "table lookup" in SAS.
 What versions of SAS have you used (on which platforms)?
 What are some good SAS programming practices for processing very large data
sets?
 What are some problems you might encounter in processing missing values? *In
Data steps? Arithmetic? Comparisons? Functions? Classifying data?
 How would you create a data set with 1 observation and 30 variables from a data
set with 30 observations and 1 variable?
Proc transpose
 What is the difference between functions that calculate statistics for one observation at a time,
like MEAN for that observation, and PROCs that process many observations at a time and
calculate the same simple descriptive statistics?
 If you were told to create many records from one record, show how you would do
this using arrays and with PROC TRANSPOSE?
 What are _numeric_ and _character_ and what do they do?
 How would you create multiple observations from a single observation?
 For what purpose would you use the RETAIN statement?
 What is a method for assigning first.VAR and last.VAR to the BY group variable
on unsorted data?
 What is the order of application for output data set options, input data set options
and SAS statements?
 What is the order of evaluation of the arithmetic operators: + - * / ** ( ) ?
Testing, debugging
 How could you generate test data with no input data?
 How do you debug and test your SAS programs?
 What can you learn from the SAS log when debugging?
 What is the purpose of _error_?
 How can you put a "trace" in your program?
 Are you sensitive to code walk-throughs, peer review, or QC review?
 Have you ever used the SAS Debugger?
 What other SAS features do you use for error trapping and data validation?
Missing values
 How does SAS handle missing values in: assignment statements, functions, a
merge, an update, sort order, formats, PROCs?
 How many missing values are available? When might you use them?
 How do you test for missing values?
 How are numeric and character missing values represented internally?
General
 What has been your most common programming mistake?
 What is your favorite programming language and why?
 What is your favorite operating system? Why?
 Do you observe any coding standards? What is your opinion of them?
 What percent of your program code is usually original and what percent copied
and modified?
 Have you ever had to follow SOPs or programming guidelines?
 Which is worse: not testing your programs or not commenting your programs?
 Name several ways to achieve efficiency in your program. Explain trade-offs.
 What other SAS products have you used and consider yourself proficient in using?
Functions
 How do you make use of functions?
 When looking for contained in a character string of 150 bytes, which function is
the best to locate that data: scan, index, or indexc?
 What is the significance of the 'OF' in X=SUM(OF a1-a4, a6, a9);?
 What do the PUT and INPUT functions do?
 Which date function advances a date, time or date/time value by a given interval?
 What do the MOD and INT function do?
 How might you use MOD and INT on numerics to mimic SUBSTR on character
strings?
 In ARRAY processing, what does the DIM function do?
 How would you determine the number of missing or nonmissing values in
computations?
 What is the difference between: x=a+b+c+d; and x=SUM(a,b,c,d);?
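Not the same: with x=a+b+c+d, x is missing if any of the variables is missing, while SUM
ignores missing values and adds the rest.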
 There is a field containing a date. It needs to be displayed in the format "ddmonyy"
if it's before 1975, "dd mon ccyy" if it's after 1985, and as 'Disco Years' if it's between
1975 and 1985. How would you accomplish this in data step code? Using only PROC
FORMAT.
 In the following DATA step, what is needed for 'fraction' to print to the log? data
_null_; x=1/3; if x=.3333 then put 'fraction'; run;
 What is the difference between calculating the 'mean' using the mean function and
PROC MEANS?
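
For the question above about x=a+b+c+d versus x=SUM(a,b,c,d), a short DATA step shows how the two behave when one value is missing. The variable names x_plus and x_sum are made up for this sketch.

```sas
data _null_;
   a=1; b=2; c=.; d=4;
   x_plus = a+b+c+d;         /* missing, because c is missing */
   x_sum  = sum(a,b,c,d);    /* 7, the missing argument is ignored */
   put x_plus= x_sum=;       /* writes both results to the log */
run;
```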
PROCs
 Have you ever used "Proc Merge"? (be prepared for surprising answers..)
 If you were given several SAS data sets you were unfamiliar with, how would you
find out the variable names and formats of each dataset?
 What SAS PROCs have you used and consider yourself proficient in using?
 How would you keep SAS from overwriting a SAS data set with its sorted version?
 In PROC PRINT, can you print only variables that begin with the letter "A"?
 What are some differences between PROC SUMMARY and PROC MEANS?
 PROC FREQ: 
*Code the tables statement for a single-level (most common) frequency. 
*Code the tables statement to produce a multi-level frequency. 
*Name the option to produce frequency line items rather than a table. 
*Produce output from a frequency. Restrict the printing of the table. 
 PROC MEANS: 
*Code a PROC MEANS that shows both summed and averaged output of the data. 
*Code the option that will allow MEANS to include missing numeric data
in the report. 
*Code the MEANS to produce output to be used later. 
 Do you use PROC REPORT or PROC TABULATE? Which do you prefer?
Explain.
Merging/Updating
 What happens in a one-on-one merge? When would you use one?
 How would you combine 3 or more tables with different structures?
 What is a problem with merging two data sets that have variables with the same
name but different data?
 When would you choose to MERGE two data sets together and when would you
SET two data sets?
 Which data set is the controlling data set in the MERGE statement?
 How do the IN= variables improve the capability of a MERGE?
 Explain the message 'MERGE HAS ONE OR MORE DATASETS WITH
REPEATS OF BY VARIABLES'.
Simple statistics
 How would you generate 1000 observations from a normal distribution with a
mean of 50 and standard deviation of 20. How would you use PROC CHART to look
at the distribution? Describe the shape of the distribution.
 How do you generate random samples?
Customized Report Writing
 What is the purpose of the statement DATA _NULL_ ;?
 What is the pound sign used for in the DATA _NULL_?
 What would you use the trailing @ sign for?
 For what purpose(s) would you use the RETURN statement?
 How would you determine how far down on a page you have printed in order to
print out footnotes?
 What is the purpose of using the N=PS option?
Macro
 What system options would you use to help debug a macro?
SYMBOLGEN
MPRINT
MLOGIC
MERROR
SERROR
MEMRPT
 Describe how you would create a macro variable.
 How do you identify a macro variable?
 How do you define the end of a macro?
 How do you assign a macro variable to a SAS variable?
 For what purposes have you used SAS macros?
 What is the difference between %LOCAL and %GLOBAL?
 How long can a macro variable be? A token?
 If you use a SYMPUT in a DATA step, when and where can you use the macro
variable?
 What do you code to create a macro? End one?
 Describe how you would pass data to a macro.
 You have five data sets that need to be processed identically; how would you
simplify that processing with a macro?
 How would you code a macro statement to produce information on the SAS log?
This statement can be coded anywhere.
 How do you add a number to a macro variable?
 If you need the value of a variable rather than the variable itself, what would you
use to load the value to a macro variable?
 Can you execute a macro within a macro? Describe.
 Can you define a macro within another macro? If so, how would SAS know where the
current macro ended and the new one began?
 How are parameters passed to a macro?
Pharmaceutical Industry

 Describe the types of SAS programming tasks that you performed: Tables?
Listings? Graphics? Ad hoc reports? Other?
 Have you been involved in editing the data or writing data queries?
 What techniques and/or PROCs do you use for tables?
 Do you prefer PROC REPORT or PROC TABULATE? Why?
 Are you involved in writing the inferential analysis plan? Tables specifications?
 What do you feel about hardcoding?
 How experienced are you with customized reporting and use of DATA _NULL_
features?
_NULL_ is useful when you want to run a DATA step without actually creating a SAS data set. A
SET statement specifies the SAS data set that you want to read from.

Data _null_;
Set Clinic.stress;
Run;
 
 How do you write a test plan?
 What is the difference between verification and validation?
Difference between Verification and Validation:
Verification takes place before validation, and not vice versa. Verification evaluates
documents, plans, code, requirements, and specifications.
Validation, on the other hand, evaluates the product itself. The inputs of verification
are checklists, issues lists, walkthroughs and inspection meetings, reviews and
meetings. The input of validation, on the other hand, is the actual testing of an actual
product. The output of verification is a nearly perfect set of documents, plans,
specifications, and requirements document. The output of validation, on the other
hand, is a nearly perfect, actual product.
Intangibles
 What was the last computer book you purchased? Why?
 What is your favorite all time computer book? Why?
 For contractors: 
*Will it bother you if the guy at the next desk times the frequency and duration of your
bathroom/coffee breaks on the grounds that 'you are getting paid twice as much as he
is'? 

*How will you react when, while consulting a SAS documentation manual to get an
answer to a problem, someone says: 'hey, I thought you were supposed to know all that
stuff already, and not have to look it up in a book!' 

*Can you continue to write code while the rest of the people on the floor where you
work have a noisy party to which you were not invited?

Non-Technical
 Can you start on Monday?
 Do you think professionally? 
*How do you put a giraffe into the refrigerator? Correct answer: Open the refrigerator
door, put the giraffe in, and close the door. This question tests whether or not the
candidate is doing simple things in a complicated way. 

*How do you put an elephant in the refrigerator? Incorrect answer: Open the
refrigerator door, put in the elephant, and close the door. Correct answer: Open the
refrigerator door, take out the giraffe, put in the elephant, and close the door. This
question tests your foresight. 

*The Lion King is hosting an animal conference. All the animals in the world attend
except one. Which animal does not attend? Correct answer: The elephant. The
elephant is in the refrigerator, remember? This tests if you are capable of
comprehensive thinking. 

*There is a river notoriously known for its large crocodile population. With ease, how
do you safely cross it? Correct answer: Simply swim across. All of the crocodiles are
attending the Lion King's animal conference. This questions your reasoning ability.
Open-ended questions
 Describe a time when you were really stuck on a problem and how you solved it.
 Describe the function and utility of the most difficult SAS macro that you have
written.
 Give me an example of ..
 Tell me how you dealt with ...
 How do you handle working under pressure?
 Of all your work, where have you been the most successful?
 What are the best/worst aspects of your current job?
 If you could design your ideal job, what would it look like?
 How necessary is it to be creative in your work?
 If money were no object, what would you like to do?
 What would you change about your job?

 
INFILE, INPUT
· Identify statements whose placement in the DATA step is critical.
 DATA, INPUT, RUN.
· Name statements that function at both compile and execution time.
 INPUT
· In the flow of DATA step processing, what is the first action in a typical DATA step?
 The DATA step begins with a DATA statement. Each time the DATA statement executes,
a new iteration of the DATA step begins, and the _N_ automatic variable is incremented by 1.
· What is _n_?
 It is a DATA step counter variable in SAS. Note: both the _N_ and _ERROR_ variables are always
available to you in the DATA step.
 _N_ indicates the number of times SAS has looped through the DATA step. This is not necessarily
equal to the observation number, since a simple subsetting IF statement can change the
relationship between the observation number and the number of iterations of the DATA step.
 The _ERROR_ variable has a value of 1 if there is an error in the data for that observation and
0 if there is not.
 Ex: _N_ is an implicit variable created by SAS during data processing. It gives
the number of records SAS has iterated over in a data set. It is available only in the DATA step
and not in PROCs. E.g., if we want to select every third record in a data set, we can use _n_
as follows:
data new; set old; if mod(_n_,3) = 1; run;
 Note: If we use a WHERE clause to subset, _n_ will not yield the required result.
How do I convert a numeric variable to a character variable?
 You must create a differently-named variable using the PUT function.
How do I convert a character variable to a numeric variable?
 You must create a differently-named variable using the INPUT function.
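
Both conversions can be sketched in one DATA step. All data set and variable names here are made up for illustration, and the 8. format and informat widths are arbitrary choices.

```sas
data convert;                        /* hypothetical data set name */
   num = 1234;
   chr = '5678';
   num_as_char = put(num, 8.);       /* numeric to character, right-aligned in 8 columns */
   chr_as_num  = input(chr, 8.);     /* character to numeric */
run;

proc contents data=convert;          /* shows the type of each variable */
run;
```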
How can I compute the age of something?
 Given two SAS date variables born and calc:
age = int(intck('month',born,calc) / 12);
if month(born) = month(calc) then age = age - (day(born) > day(calc));

How can I compute the number of months between two dates?
Given two SAS date variables begin and end:
months = intck('month',begin,end) - (day(end) < day(begin));
 
How can I determine the position of the nth word within a character string?

Use a combination of the INDEXW and SCAN functions:
pos = indexw(string, scan(string, n));
I need to reorder characters within a string... use SUBSTR?
 You can do this with one function call using TRANSLATE, versus two function calls
with SUBSTR. Each of the following lines moves the first character of a 4-character string to the
last position:

reorder = translate('2341', string, '1234');
reorder = substr(string,2,3) || substr(string,1,1);
How can I put my SAS date variable so that December 25, 1995 would appear as '19951225'
(with no separator)?
Use a combination of the YEAR. and MMDDYY. formats to simply display the value:
put sasdate year4. sasdate mmddyy4.;
or use a combination of the PUT and COMPRESS functions to store the value
(YYMMDD10. writes 1995-12-25, so the hyphens are removed):
newvar = compress(put(sasdate,yymmdd10.),'-');
The YYMMDDN8. format writes the value without separators directly:
newvar = put(sasdate,yymmddn8.);
 
How can I put my SAS time variable with a leading zero for hours 1-9?
 Use the TOD. format, which writes a leading zero for a single-digit hour (the TIME.
format writes a leading blank instead):
put sastime tod8.;
 
INFILE OPTIONS
 Prepared by Sreeja E V(sreeja@kreara.com) 
source: kreara.blogspot.com.
 Infile has a number of options available.
FLOWOVER
 FLOWOVER is the default option on the INFILE statement. When the INPUT
statement reaches the end of the non-blank characters without having filled all variables, a new
line is read into the input buffer and INPUT attempts to fill the rest of the variables starting
from column one. The next time an INPUT statement is executed, a new line is brought into
the input buffer. Consider the following text file containing three variables: id, type and
amount.
11101 A
11102 A 100
11103 B 43
11104 C
11105 C 67
 The following SAS code uses the FLOWOVER option, which reads the next non-missing value for
a missing variable.
data B;
infile "External file" flowover;
input id $ type $ amount;
run;
which creates the following dataset
id     type  amount
11101  A     11102
11103  B     43
11104  C     11105
MISSOVER
When INPUT reads a short line, the MISSOVER option on the INFILE statement does not allow it
to move to the next line. MISSOVER sets all the variables without values to missing.
data B;
infile "External file" missover;
input id $ type $ amount;
run;
which creates the following dataset
id     type  amount
11101  A     .
11102  A     100
11103  B     43
11104  C     .
11105  C     67
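The two behaviours can be compared in a self-contained way by reading the same five records in-stream. This is only a sketch: the data set names FLOW and MISS are made up for the illustration, and DATALINES stands in for the external file.

```sas
/* FLOWOVER (the default): a short line borrows values from the next line */
data flow;
   infile datalines flowover;
   input id $ type $ amount;
   datalines;
11101 A
11102 A 100
11103 B 43
11104 C
11105 C 67
;
run;

/* MISSOVER: a short line leaves the remaining variables missing */
data miss;
   infile datalines missover;
   input id $ type $ amount;
   datalines;
11101 A
11102 A 100
11103 B 43
11104 C
11105 C 67
;
run;

proc print data=flow; run;   /* 3 observations */
proc print data=miss; run;   /* 5 observations */
```

With FLOWOVER the log also carries a note that SAS went to a new line when the INPUT statement reached past the end of a line, which is usually the first sign that MISSOVER or TRUNCOVER is needed.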
TRUNCOVER
 Causes the INPUT statement to read variable-length records where some records are
shorter than the INPUT statement expects. Variables which are not assigned values are set
to missing.
Difference between TRUNCOVER and MISSOVER
Both will assign missing values to variables if the data line ends before the variable's field
starts. But when the data line ends in the middle of a variable field, TRUNCOVER will take as
much as is there, whereas MISSOVER will assign the variable a missing value. Consider the
text file below containing a character variable chr.
a
bb
ccc
dddd
eeeee
ffffff
 Consider the following SAS code
data trun;
infile "External file" truncover;
input chr $3.;
run;
When using the TRUNCOVER option we get the following dataset
chr
a
bb
ccc
ddd
eee
fff
data miss;
infile "External file" missover;
input chr $3.;
run;
While using the MISSOVER option we get the output
chr
(missing)
(missing)
ccc
ddd
eee
fff
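A runnable version of this comparison needs a real variable-length file, because in-stream DATALINES records are padded with blanks. The sketch below writes the six records to a temporary file first; the fileref RAW is a hypothetical name chosen for the example, and the behaviour described assumes a platform (such as Windows or UNIX) where the file is written with variable-length records.

```sas
filename raw temp;                 /* hypothetical temporary fileref */

data _null_;                       /* write six variable-length records */
   file raw;
   put 'a' / 'bb' / 'ccc' / 'dddd' / 'eeeee' / 'ffffff';
run;

data trun;                         /* short fields keep what is there */
   infile raw truncover;
   input chr $3.;
run;

data miss;                         /* short fields become missing */
   infile raw missover;
   input chr $3.;
run;

proc print data=trun; run;
proc print data=miss; run;
```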

SAS interview Questions and Answers

1) Which date functions advances a date time or date/time value by a given interval?
A) INTNX.

2) How we can call macros with in data step?


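A) CALL EXECUTE invokes a macro from within a DATA step. CALL SYMPUT does not call a
macro; it creates a macro variable from a DATA step value.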

3) In the flow of DATA step processing, what is the first action in a typical DATA Step?
A) When you submit a DATA step, SAS first compiles it (compilation phase: syntax check and
creation of the input buffer and PDV) and then executes it (execution phase), which creates the
new SAS data set.

4) How do u identify a macro variable


A) Ampersand (&)
5) What are SAS/ACCESS and SAS/CONNECT?
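A) SAS/ACCESS provides interfaces for reading and writing data held in external databases such
as Oracle, SQL Server, MS Access etc.
SAS/CONNECT lets a SAS session connect to SAS sessions on remote servers, to submit code and
transfer data.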

6) How could you generate test data with no input data?

7) What is the one statement to set the criteria of data that can be coded in any step?
A) OPTIONS Statement, Label statement, Keep / Drop statements
8) What is the purpose of using the N=PS option?
A) The N=PS option on the FILE statement creates a buffer in memory which is large enough to
store PAGESIZE (PS) lines, so that the lines of a page can be written in any order before the page
is printed.
9) What are the scrubbing procedures in SAS?
A) Proc Sort with nodupkey option, because it will eliminate the duplicate values.

10) What are the new features included in the new version of SAS i.e., SAS9.1.3?
The main advantage of version 9 is faster execution of applications and centralized access of
data and support.
11) WHAT DIFFERRENCE DID YOU FIND AMONG VERSION 6 8 AND 9 OF SAS.

12) What are the advantages of using SAS in clinical data management? Why should not we
use other software products in managing clinical data?

ADVANTAGES OF USING A SAS®-BASED SYSTEM

Less hardware is required. A Typical SAS®-based system can utilize a standard file server to
store its databases and does not require one or more dedicated servers to handle the
application load. PC SAS® can easily be used to handle processing, while data access is left
to the file server. Additionally, as presented later in this paper, it is possible to use the SAS®
product SAS®/Share to provide a dedicated server to handle data transactions.

Fewer personnel are required. Systems that use complicated database software often require
the hiring of one or more DBAs (Database Administrators) who make sure the database
software is running, make changes to the structure of the database, etc. These individuals
often require special training or background experience in the particular database application
being used, typically Oracle. Additionally, consultants are often required to set up the system
and/or studies since dedicated servers and specific expertise requirements often complicate the
process.

Users with even casual SAS® experience can set up studies. Novice programmers can build
the structure of the database and design screens. Organizations that are involved in data
management almost always have at least one SAS® programmer already on staff. SAS®
programmers will have an understanding of how the system actually works which would
allow them to extend the functionality of the system by directly accessing SAS® data from
outside of the system.

Speed of setup is dramatically reduced. By keeping studies on a local file server and making
the database and screen design processes extremely simple and intuitive, setup time is reduced
from weeks to days.

All phases of the data management process become homogeneous. From entry to analysis,
data reside in SAS® data sets, often the end goal of every data management group.
Additionally, SAS® users are involved in each step, instead of having specialists from
different areas hand off pieces of studies during the project life cycle.

No data conversion is required. Since the data reside in SAS® data sets natively, no
conversion programs need to be written.

Data review can happen during the data entry process, on the master database. As long as
records are marked as being double-keyed, data review personnel can run edit check programs
and build queries on some patients while others are still being entered.

Tables and listings can be generated on live data. This helps speed up the development of
table and listing programs and allows programmers to avoid having to make continual copies
or extracts of the data during testing.

13) What has been your most common programming mistake?


Missing semicolons and not checking the log after submitting a program. Not using
debugging techniques and not using FSVIEW enough were also common programming
errors I made when I started learning SAS and in my initial projects.

14) Have you ever had to follow SOPs or programming guidelines?


An SOP describes the process that assures that standard coding activities, which produce tables,
listings and graphs, functions and/or edit checks, are conducted in accordance with industry
standards and are appropriately documented.
15) Name several ways to achieve efficiency in your program. Explain trade-offs.

Efficiency and performance strategies can be classified into 5 different areas.


· CPU time
· Data Storage
· Elapsed time
· Input/Output
· Memory

CPU Time and Elapsed Time- Base line measurements

Efficiency improving techniques:


Using KEEP and DROP statements to retain only the necessary variables.
Using macros to reduce the amount of code.
Using IF-THEN/ELSE statements instead of a series of IF statements, so that later conditions are
skipped once one is true.
Using the SQL procedure to reduce the number of programming steps.
Using LENGTH statements to reduce the variable size and so the data storage.
Using DATA _NULL_ steps when no output data set is needed, which saves data storage.

16) What other SAS products have you used and consider yourself proficient in using?
Data _NULL_ statement, Proc Means, Proc Report, Proc tabulate, Proc freq and Proc print,
Proc Univariate etc.

17) What is the significance of the 'OF' in X=SUM (OF a1-a4, a6, a9);

OF is a keyword, not a function: it tells SAS that a1-a4 is a variable list. If we don't use OF, the
expression is not interpreted as we expect: a1-a4 is then read as a1 minus a4, so the function
calculates the sum of (a1 minus a4), a6 and a9 and not the whole sum of a1 to a4, a6 and a9.
The same is true for the MEAN function.
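A small sketch makes the difference visible. The values assigned to a1-a4, a6 and a9 are made up for the illustration.

```sas
data _null_;
   a1 = 1; a2 = 2; a3 = 3; a4 = 4; a6 = 6; a9 = 9;
   with_of    = sum(of a1-a4, a6, a9);   /* 1+2+3+4+6+9 = 25 */
   without_of = sum(a1-a4, a6, a9);      /* (1-4)+6+9   = 11 */
   put with_of= without_of=;
run;
```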
18) What do the PUT and INPUT functions do?
INPUT function converts character data values to numeric values (or to other character values)
using an informat.
PUT function converts numeric or character values to character values using a format.
EX: for INPUT: INPUT (source, informat)
For PUT: PUT (source, format)

Note that INPUT function requires INFORMAT and PUT function requires FORMAT.

If we omit the INPUT or the PUT function during the data conversion, SAS will detect the
mismatched variables and will try an automatic character-to-numeric or numeric-to-character
conversion, writing a note to the log. But sometimes this doesn't work: for example, a character
value that contains a $ sign cannot be converted to numeric automatically.
Therefore it is always advisable to include INPUT and PUT functions in your programs when
conversions occur.
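As a hedged illustration of both directions, the step below converts a character amount to numeric with an informat and a numeric id to character with a format. The variable names and values are invented for the example.

```sas
data _null_;
   char_amt = '1,234';
   num_amt  = input(char_amt, comma8.);   /* character -> numeric 1234 */
   num_id   = 42;
   char_id  = put(num_id, z5.);           /* numeric -> character '00042' */
   put num_amt= char_id=;
run;
```

Note that each conversion writes to a new variable, because the type of an existing variable cannot be changed.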

19) Which date function advances a date, time or datetime value by a given interval?
INTNX: INTNX function advances a date, time, or datetime value by a given interval, and
returns a date, time, or datetime value.
Ex: INTNX(interval,start-from,number-of-increments,alignment)

INTCK: INTCK(interval,start-of-period,end-of-period) is an interval function that counts the
number of interval boundaries between two given SAS dates, times or datetimes.
DATETIME () returns the current date and time of day.
DATDIF (sdate,edate,basis): returns the number of days between two dates.
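The following sketch shows how these functions behave on sample dates chosen for the illustration. Two points are easy to miss: INTNX aligns to the beginning of the interval unless an alignment is given, and INTCK counts interval boundaries crossed rather than elapsed time.

```sas
data _null_;
   start      = '15JAN2024'd;
   next_month = intnx('month', start, 1);           /* 01FEB2024 */
   same_day   = intnx('month', start, 1, 'same');   /* 15FEB2024 */
   boundaries = intck('month', '31JAN2024'd, '01FEB2024'd);      /* 1 */
   days       = datdif('01JAN2024'd, '31JAN2024'd, 'act/act');   /* 30 */
   put next_month= date9. same_day= date9. boundaries= days=;
run;
```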

20) What do the MOD and INT function do? What do the PAD and DIM functions do?

MOD: MOD(argument, modulo) returns the remainder after the numeric argument is divided by
the modulo, which is a non-zero numeric constant or variable.
INT: It returns the integer portion of a numeric value truncating the decimal portion.
PAD: PAD is an option on the INFILE statement rather than a function. It pads each record with
blanks so that all data lines have the same length. It is useful only when missing data occurs at
the end of the record.
CATX: concatenates character strings, removes leading and trailing blanks and inserts
separators.
SCAN: it returns a specified word from a character value. In older SAS releases the SCAN
function assigns a length of 200 to a target variable whose length has not been declared.
SUBSTR: extracts a substring or replaces character values.
Extraction of a substring: Middleinitial=substr(middlename,1,1);
Replacing character values: substr(phone,1,3)='433';
If SUBSTR function is on the left side of a statement, the function replaces the contents of the
character variable.
TRIM: trims the trailing blanks from the character values.

SCAN vs. SUBSTR:


SCAN extracts words within a value that is marked by delimiters.
SUBSTR extracts a portion of the value by stating the specific location. It is best used when
we know the exact position of the sub string to extract from a character value.

21) How might you use MOD and INT on numeric to mimic SUBSTR on character
Strings?
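The first argument to the MOD function is a numeric, the second is a non-zero numeric; the
result is the remainder when argument-1 is divided by argument-2. The
INT function takes only one argument and returns the integer portion of the argument,
truncating the decimal portion. Note that the argument can be an
expression. Dividing by a power of 10 and applying INT drops digits from the right, while MOD
by a power of 10 keeps digits from the right, so together they pick out digits by position as
SUBSTR does for characters.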

DATA NEW ;
A = 123456 ;
X = INT( A/1000 ) ;
Y = MOD( A, 1000 ) ;
Z = MOD( INT( A/100 ), 100 ) ;
PUT A= X= Y= Z= ;
RUN ;
A=123456
X=123
Y=456
Z=34

22) In ARRAY processing, what does the DIM function do?


DIM: It is used to return the number of elements in the array. When we use the DIM function
as the stop value of an iterative DO statement, we do not have to re-specify the stop value if we
change the dimension of the array.
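A minimal sketch of DIM as the loop bound follows. The array name SCORE and its initial values are assumptions made for the illustration.

```sas
data _null_;
   array score{4} s1-s4 (10 20 30 40);
   total = 0;
   do i = 1 to dim(score);      /* dim(score) is 4 here */
      total = total + score{i};
   end;
   put total=;                  /* total=100 */
run;
```

If the array is later widened to, say, six elements, only the ARRAY statement changes; the DO loop stays as it is.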

Difference between Proc Report and Proc tabulate:

Proc Tabulate is a way to report statistical relations between variables in up to three
dimensions (rows, columns, pages). You don't have many possibilities to influence single
cells, rows, columns or pages, and not much influence on the layout. The things you influence are
always related to whole dimensions. If you want something like a calculated column,
e.g. one that is the difference of the 3 columns left of it, that is not possible. If you want to do it
anyway, it gets difficult. The main goal is to present summarized data values in cells.

Proc Report is mainly a listing procedure. It has very strong features to influence the layout, also
with ordering and grouping. The simplest form of a REPORT output is not a table but a list,
where the results of statistics are presented in SUMMARY lines while the other lines contain
the details. In addition, you HAVE influence on single cells, rows and columns. You CAN relate
columns and have calculated columns based on those to the left of the new one. You can
influence all rows with a DATA-step-like programming language and you can influence
single cells with that. E.g. "traffic-lighting" dependent on certain limits is possible.

SAS Interview Questions:Base SAS

Very Basic:

· What SAS statements would you code to read an external raw data file to a DATA step?

INFILE statement.

· How do you read in the variables that you need?

Using the INPUT statement with pointer controls and column ranges, such as @5 or 12-17.

· Are you familiar with special input delimiters? How are they used?

DLM and DSD are the delimiters that I’ve used. They should be included in the infile
statement. Comma separated values files or CSV files are a common type of file that can be
used to read with the DSD option. DSD option treats two delimiters in a row as MISSING
value. DSD also ignores the delimiters enclosed in quotation marks.
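Both DSD behaviours can be seen in a short in-stream sketch. The data set name CONTACTS and its records are invented for the illustration.

```sas
data contacts;
   infile datalines dsd;            /* DSD uses a comma as the default delimiter */
   length name $ 20 city $ 15;
   input name $ city $ age;
   datalines;
"Smith, John",Boston,41
Lee,,29
;
run;

proc print data=contacts; run;
/* Obs 1: name=Smith, John  city=Boston  age=41 (comma inside quotes kept) */
/* Obs 2: name=Lee          city=blank   age=29 (two commas = missing)     */
```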

· If reading a variable length file with fixed input, how would you prevent SAS from reading
the next record if the last variable didn't have a value?

By using the option MISSOVER in the infile statement.


If the input of some data lines are shorter than others then we use TRUNCOVER option in the
infile statement.

· What is the difference between an informat and a format? Name three informats or formats.
Informats read the data. Formats write the data.
Informats: COMMAw. DOLLARw. DATEw. Formats can have the same names as informats.
Informats: MMDDYYw. DATEw. TIMEw. PERCENTw.
Formats: WORDDATE18. WEEKDATEw.

· Name and describe three SAS functions that you have used, if any?

LENGTH: returns the length of an argument not counting the trailing blanks (missing values
have a length of 1).
Ex: a='my cat';
x=LENGTH(a); Result: x=6

SUBSTR: SUBSTR(arg,position,n) extracts a substring from an argument starting at
'position' for 'n' characters or until the end if no 'n' is given.
Ex: a='(916)734-6241';
x=SUBSTR(a,2,3); Result: x='916'

TRIM: removes trailing blanks from a character expression.
Ex: a='my '; b='cat';
x=TRIM(a)||b; Result: x='mycat'

SUM: sum of non missing values.


Ex: x=Sum(3,5,1); result: x=9.0

INT: Returns the integer portion of the argument.

· How would you code the criteria to restrict the output to be produced?

Use NOPRINT option.

· What is the purpose of the trailing @ and the @@? How would you use them?
@ holds the input line for the next INPUT statement within the same iteration of the data step.
@@ holds the input line across iterations of the data step, until the end of the line is reached.

Double trailing @@: When you have multiple observations per line of raw data, you should
use double trailing at signs (@@) at the end of the INPUT statement. The line-hold specifier is like
a stop sign telling SAS, "stop, hold that line of raw data".

Trailing @: By using @ without specifying a column, it is as if you are telling SAS, "stay
tuned for more information. Don't touch that dial". SAS will hold the line of data until it
reaches either the end of the data step or an INPUT statement that does not end with a
trailing @.
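The two line-hold specifiers can be sketched as follows. The data set names PAIRS and ADULTS and the sample records are assumptions for the illustration.

```sas
/* @@ : several observations on one raw data line */
data pairs;
   input x y @@;
   datalines;
1 2 3 4 5 6
;
run;
/* PAIRS has 3 observations: (1,2), (3,4), (5,6) */

/* @ : read part of the line, test it, then decide how to read the rest */
data adults;
   input type $ @;
   if type = 'A' then input age;
   else delete;
   datalines;
A 34
C 7
A 51
;
run;
/* ADULTS has 2 observations, with age 34 and 51 */
```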

· Under what circumstances would you code a SELECT construct instead of IF statements?

When you have a long series of mutually exclusive conditions and the comparison is numeric,
using a SELECT group is slightly more efficient than using IF-THEN or IF-THEN-ELSE
statements because CPU time is reduced.
SELECT GROUP:
Select: begins with select group.
When: identifies SAS statements that are executed when a particular condition is true.
Otherwise (optional): specifies a statement to be executed if no WHEN condition is met.
End: ends a SELECT group.
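A SELECT group built from these four parts might look like the sketch below. The variable names and the pass mark of 60 are made up for the example.

```sas
data graded;
   input score;
   length grade $ 4;
   select;
      when (score >= 60) grade = 'pass';
      when (score >= 0)  grade = 'fail';
      otherwise          grade = 'n/a';   /* e.g. a missing score */
   end;
   datalines;
75
40
.
;
run;
/* grade = pass, fail, n/a */
```

The WHEN conditions are tested in order and only the first true one is executed, so the most frequent condition is usually placed first.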

·What statement you code to tell SAS that it is to write to an external file? What statement do
you code to write the record to the file?

The FILE statement tells SAS which external file to write to, and the PUT statement writes the
record to it.

· If reading an external file to produce an external file, what is the shortcut to write that record
without coding every single variable on the record?

· If you're not wanting any SAS output from a data step, how would you code the data
statement to prevent SAS from producing a set?

Data _Null_

· What is the one statement to set the criteria of data that can be coded in any step?

Options statement: This is a part of the SAS program and affects all steps that follow it.

· Have you ever linked SAS code? If so, describe the link and any required statements used to
either process the code or the step itself.

· How would you include common or reuse code to be processed along with your statements?

By using SAS Macros.

· When looking for data contained in a character string of 150 bytes, which function is the
best to locate that data: scan, index, or indexc?

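INDEX, which returns the position of a given substring within the string. INDEXC returns the
position of the first occurrence of any one of a set of characters, and SCAN extracts the nth
word rather than locating data.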

· If you have a data set that contains 100 variables, but you need only five of those, what is the
code to force SAS to use only those variable?
Using KEEP option or statement.

· Code a PROC SORT on a data set containing State, District and County as the primary
variables, along with several numeric variables.
Proc sort data=data-set-name ;
BY State District County ;
Run ;

· How would you delete duplicate observations?

NODUPLICATES

· How would you delete observations with duplicate keys?

NODUPKEY

· How would you code a merge that will keep only the observations that have matches from
both sets.

Use the IN= data set option on each data set in the MERGE statement, and a subsetting IF
statement that checks that both IN= variables are true.

· How would you code a merge that will write the matches of both to one data set, the non-
matches from the left-most data.
Step1: Define the output datasets in the DATA statement
Step2: Assign the IN= data set option to a different variable for each of the 2 datasets
Step3: Check the condition using an IF statement and output the matches to the first dataset and
the non-matches to a different dataset
Ex: data xxx; merge yyy(in = inyyy) zzz(in = inzzz); by aaa; if inyyy = 1 and inzzz = 1; run;
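The three steps can be sketched end to end as below. The data set names LEFT, RIGHT, MATCHED and LEFT_ONLY and their contents are hypothetical, and both inputs are assumed to be sorted by the key already.

```sas
data left;
   input aaa lval $;
   datalines;
1 a
2 b
3 c
;
run;

data right;
   input aaa rval $;
   datalines;
2 x
3 y
4 z
;
run;

data matched left_only;
   merge left(in=inleft) right(in=inright);
   by aaa;
   if inleft and inright then output matched;   /* key in both data sets  */
   else if inleft then output left_only;        /* key only in the left   */
run;
/* MATCHED: aaa = 2, 3    LEFT_ONLY: aaa = 1 */
```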

· What is the Program Data Vector (PDV)? What are its functions?
Function: To store the current obs;
PDV (Program Data Vector) is a logical area in memory where SAS creates a dataset one
observation at a time. When SAS processes a data step it has two phases. Compilation phase
and execution phase. During the compilation phase the input buffer is created to hold a record
from external file. After input buffer is created the PDV is created. The PDV is the area of
memory where SAS builds dataset, one observation at a time. The PDV contains two
automatic variables _N_ and _ERROR_.

· Does SAS 'Translate' (compile) or does it 'Interpret'? Explain.


SAS compiles the code
· At compile time when a SAS data set is read, what items are created?
Automatic variables are created. Input Buffer, PDV and Descriptor Information

· Name statements that are recognized at compile time only?

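Declarative statements such as DROP, KEEP, RENAME, LABEL, FORMAT, INFORMAT, LENGTH,
ATTRIB, ARRAY and RETAIN. (PUT is an executable statement.)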

· Name statements that are execution only.

INFILE, INPUT

· Identify statements whose placement in the DATA step is critical.


DATA, INPUT, RUN.

· Name statements that function at both compile and execution time.

INPUT

· In the flow of DATA step processing, what is the first action in a typical DATA Step?

The DATA step begins with a DATA statement. Each time the DATA statement executes, a
new iteration of the DATA step begins, and the _N_ automatic variable is incremented by 1.

· What is _n_?
It is an automatic counter variable in the DATA step.

Note: Both the _N_ and _ERROR_ variables are always available to you in the data step.
_N_ indicates the number of times SAS has looped through the data step.
This is not necessarily equal to the observation number, since a simple subsetting IF
statement can change the relationship between the observation number and the number of
iterations of the data step.
The _ERROR_ variable has a value of 1 if there is an error in the data for that observation and 0
if there is not.
_N_ is an implicit variable created by SAS during data step processing. It is available only in the
data step and not in PROCs.
Ex: If we want to keep every third record in a dataset, we can use _N_ as follows:
data new; set old; if mod(_n_,3) = 1; run;
Note: If we use a WHERE clause to subset, _N_ will not yield the required result.

SAS interview questions:General

Under what circumstances would you code a SELECT construct instead of IF statements?

A: I think a SELECT group is used when you are testing several mutually exclusive
conditions, like
select;
when (physics > 60) result = 'pass';
when (math > 100) result = 'pass';
when (english = 50) result = 'pass';
otherwise result = 'fail';
end;

What is the one statement to set the criteria of data that can be coded
in any step?
A) Options statement.

What is the effect of the OPTIONS statement ERRORS=1?


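A) ERRORS= is a system option that sets the maximum number of observations for which SAS
prints complete data error messages in the log. With ERRORS=1, the full message is printed only
for the first observation that has a data error.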

What's the difference between VAR A1 - A4 and VAR A1 -- A4 ?


A: VAR A1 - A4 is a numbered range list: it refers to A1, A2, A3 and A4. VAR A1 -- A4 is a
name range list: it refers to all variables from A1 to A4 in the order in which they are defined in
the data set, whatever their names.
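A quick way to see the difference is a data set in which another variable sits between the numbered ones. The data set DEMO and the variable B are invented for this sketch.

```sas
data demo;
   a1 = 1; b = 2; a2 = 3; a3 = 4; a4 = 5;
run;

proc print data=demo; var a1-a4;  run;   /* prints a1 a2 a3 a4   */
proc print data=demo; var a1--a4; run;   /* prints a1 b a2 a3 a4 */
```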

What do the SAS log messages "numeric values have been converted to character" mean?
What are the implications?
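It implies that SAS performed an automatic conversion because a numeric value was used where
a character value was expected, for example as an argument to a character function. The
converted value is right-aligned using the BEST12. format, so it can contain leading blanks and
give unexpected results; use the PUT function to control the conversion.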

Why is a STOP statement needed for the POINT= option on a SET statement?
Because POINT= reads only the specified observations, SAS cannot detect an end-of-file
condition as it would if the file were being read sequentially.

How do you control the number of observations and/or variables read or written?
The FIRSTOBS= and OBS= options control the observations; the KEEP= and DROP= options control
the variables.

Approximately what date is represented by the SAS date value of 730?


31st December 1961

Identify statements whose placement in the DATA step is critical.


A: INPUT, DATA and RUN…

Does SAS 'Translate' (compile) or does it 'Interpret'? Explain.


A) Compile

What does the RUN statement do?


a) The RUN statement marks the end of a DATA or PROC step and tells SAS to execute it. A step
also ends when SAS meets the next DATA or PROC statement, so if a DATA step is followed by
another DATA or PROC step you can omit the RUN statement, but the last step in a program
needs one in order to execute.

Why is SAS considered self-documenting?


A) SAS is considered self-documenting because, when a data set is created, SAS stores
descriptor information inside it, such as the date and time of creation, the number of
variables, and their names and labels. You can look at that information using the PROC
CONTENTS procedure.

What are some good SAS programming practices for processing very large data sets?
A) Sort them once, can use firstobs = and obs = ,

What is the different between functions and PROCs that calculate the
same simple descriptive statistics?
A) Functions are used inside the data step and calculate the statistic across variables within one
observation, whereas PROCs calculate it across observations for a variable and can create a new
data set to output the results.

If you were told to create many records from one record, show how you
would do this using arrays and with PROC TRANSPOSE?
A) I would use TRANSPOSE if the variables are less use arrays if the var are more .................
depends

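What is a method for assigning first.VAR and last.VAR to the BY group variable on unsorted
data?
A) FIRST. and LAST. variables require a BY statement, so the data must first be sorted with PROC
SORT. If the data are unsorted but already grouped, the NOTSORTED option on the BY statement
can be used.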

How do you debug and test your SAS programs?


A) First thing is look into Log for errors or warning or NOTE in some cases or use the
debugger in SAS data step.

What other SAS features do you use for error trapping and data
validation?
A) Check the Log and for data validation things like Proc Freq, Proc means or some times
proc print to look how the data looks like ........

How would you combine 3 or more tables with different structures?


A) I think sort them with common variables and use merge statement. I am not sure what you
mean different structures.

other questions:

What areas of SAS are you most interested in?


BASE, STAT, GRAPH, ETS

Briefly describe 5 ways to do a "table lookup" in SAS.


Match Merging, Direct Access, Format Tables, Arrays, PROC SQL

What versions of SAS have you used (on which platforms)?


SAS 8.2 in Windows and UNIX, SAS 7 and 6.12
What are some good SAS programming practices for processing very large data sets?
Sampling method using OBS option or subsetting, commenting the Lines, Use Data Null

What are some problems you might encounter in processing missing values? In Data steps?
Arithmetic? Comparisons? Functions? Classifying data?
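An arithmetic operation on a missing value results in a missing value, whereas functions such as
SUM and MEAN ignore missing arguments. In comparisons a numeric missing value is smaller
than any number. Most SAS statistical procedures exclude observations with any missing
variable values from an analysis.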

How would you create a data set with 1 observation and 30 variables from a data set with 30
observations and 1 variable?
Using PROC TRANSPOSE

What is the different between functions and PROCs that calculate the same simple descriptive
statistics?
Proc can be used with wider scope and the results can be sent to a different dataset. Functions
usually affect the existing datasets.

If you were told to create many records from one record, show how you would do this using
array and with PROC TRANSPOSE?
Declare array for number of variables in the record and then used Do loop
Proc Transpose with VAR statement

What are _numeric_ and _character_ and what do they do?


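They are special variable lists: _NUMERIC_ refers to all numeric variables and _CHARACTER_ to
all character variables currently defined in the data set, for reading or writing.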

How would you create multiple observations from a single observation?


Using double Trailing @@

For what purpose would you use the RETAIN statement?


The RETAIN statement is used to hold the values of variables across iterations of the data step.
Normally, variables created by INPUT or assignment statements are set to missing at the start of
each iteration of the data step.
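A running total is the usual illustration. The data set name RUNNING and the amounts are assumptions made for this sketch.

```sas
data running;
   retain total 0;            /* keep TOTAL from one iteration to the next */
   input amount;
   total = total + amount;
   datalines;
10
20
30
;
run;
/* total = 10, 30, 60 */
```

Without the RETAIN statement TOTAL would be reset to missing at the top of each iteration, and every sum would be missing.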

What is the order of evaluation of the comparison operators: + - * / ** ()?


(), then **, then * and / (equal priority), then + and - (equal priority)

How could you generate test data with no input data?


Using Data Null and put statement
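Test data can also be built as a data set with a DO loop and an OUTPUT statement, with no input at all. The sketch below is one way to do it; the data set name TEST, the seed and the number of rows are arbitrary choices for the illustration.

```sas
data test;
   call streaminit(12345);            /* fixed seed, so the values repeat */
   do id = 1 to 10;
      score = rand('uniform') * 100;  /* random value between 0 and 100 */
      output;                         /* one observation per iteration */
   end;
run;
```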

How do you debug and test your SAS programs?


Using Obs=0 and systems options to trace the program execution in log.

What can you learn from the SAS log when debugging?
It will display the execution of the whole program and the logic. It will also display each error
with its line number so that you can find it and edit the program.

What is the purpose of _error_?


It has only two values, which are 1 for error and 0 for no error.
How can you put a "trace" in your program?
By using ODS TRACE ON

How does SAS handle missing values in: assignment statements, functions, a merge, an
update, sort order, formats, PROCs?
Missing values will be assigned as missing in an assignment statement. In sort order, the numeric
missing value (.) is the second smallest value, preceded only by the special missing value ._
(underscore).

How do you test for missing values?


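Using IF-THEN/ELSE, WHERE or SELECT conditions that compare the variable with a missing
value (for example, if x = . or if name = ' '), or using the MISSING function.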

How are numeric and character missing values represented internally?


Character as a blank (' ') and numeric as a period (.).

What are the new features included in the new version of SAS i.e., SAS9.1.3?
The main advantage of version 9 is faster execution of applications and centralized access of
data and support.
Many changes have been made in version 9 compared with version 8. The following are a few:
SAS version 9 supports format names longer than 8 bytes, which is not possible with version 8.
The length allowed for numeric format names in version 9 is 32, whereas it is 8 in version 8.
The length allowed for character format names in version 9 is 31.
The length allowed for numeric informat names in version 9 is 31, whereas it is 8 in version 8.
The length allowed for character informat names in version 9 is 30.
3 new informats are available in version 9 to convert various date, time and datetime forms of
data into a SAS date or SAS time:
· ANYDTDTEw. - Converts to a SAS date value.
· ANYDTTMEw. - Converts to a SAS time value.
· ANYDTDTMw. - Converts to a SAS datetime value.
The CALL SYMPUTX routine is added in version 9. It creates a macro variable
at execution time in the data step by:
· Trimming leading and trailing blanks
· Automatically converting a numeric value to character.

A new ODS option (COLUMNS=) is included to create multiple columns in the output.

What differences did you find among versions 6, 8 and 9 of SAS?
The SAS 9 architecture is fundamentally different from any prior version of SAS. In the SAS 9
architecture, SAS relies on a new component, the Metadata Server, to provide an information
layer between the programs and the data they access. Metadata, such as security permissions
for SAS libraries and where the various SAS servers are running, are maintained in a common
repository.

What has been your most common programming mistake?


Missing semicolon and not checking log after submitting program, Not using debugging
techniques and not using Fsview option vigorously.

Name several ways to achieve efficiency in your program. Explain trade-offs.

Efficiency and performance strategies can be classified into 5 different areas:
· CPU time
· Data storage
· Elapsed time
· Input/Output
· Memory

CPU time and elapsed time are the baseline measurements. A few examples of efficiency
violations: retaining unwanted data sets, and not subsetting early to eliminate unwanted records.

Efficiency improving techniques: Use KEEP and DROP statements to retain only the necessary
variables. Use macros to reduce repeated code. Use IF-THEN/ELSE statements rather than a
series of independent IF statements, so that later conditions are skipped once one is true.
Use the SQL procedure to reduce the number of programming steps. Use LENGTH statements to
reduce the variable size and so the data storage.
Use DATA _NULL_ steps when the processing does not need to create an output data set, which
saves data storage.
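
The techniques above can be sketched in a few lines. This is only an illustration: the data set work.claims and its variables are made up for the example.

```sas
/* Hypothetical sample data, created only for this sketch */
data work.claims;
    input id region $ amount;
    datalines;
1 EAST 1500
2 WEST 200
3 EAST 50
;
run;

data work.east (keep=id amount flag);          /* KEEP= limits the variables written */
    length flag $1;                             /* short LENGTH saves storage */
    set work.claims (where=(region='EAST'));    /* subset early, while reading */
    if amount > 1000 then flag = 'H';
    else if amount > 100 then flag = 'M';       /* ELSE skips tests once one matched */
    else flag = 'L';
run;

data _null_;                                    /* no output data set is created */
    set work.east end=last;
    total + amount;
    if last then put 'Total amount: ' total;
run;
```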

What other SAS products have you used and consider yourself proficient in using?

The DATA _NULL_ step, PROC MEANS, PROC REPORT, PROC TABULATE, PROC FREQ, PROC PRINT,
PROC UNIVARIATE, etc.

What is the significance of the 'OF' in X=SUM (OF a1-a4, a6, a9);
If we don't use the OF keyword, the argument list might not be interpreted as we expect. Without
OF, the function above calculates a1 minus a4, plus a6 and a9, and not the whole sum of a1 to a4
plus a6 and a9. The same is true for the MEAN function.
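
A small sketch of the difference, with made-up values, so the two results can be compared in the log:

```sas
data _null_;
    a1=1; a2=2; a3=3; a4=4; a6=6; a9=9;
    with_of    = sum(of a1-a4, a6, a9);   /* 1+2+3+4+6+9 = 25 */
    without_of = sum(a1-a4, a6, a9);      /* (1-4)+6+9   = 12 */
    put with_of= without_of=;
run;
```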

What do the PUT and INPUT functions do?


The INPUT function converts character values to numeric (or other) values by reading them with
an informat. The PUT function converts numeric or character values to character values by
writing them with a format; its result is always character.
EX: for INPUT: INPUT (source, informat)
For PUT: PUT (source, format)
Note that the INPUT function requires an informat and the PUT function requires a format. If we
omit the INPUT or the PUT function during the data conversion, SAS will detect the
mismatched types and will try an automatic character-to-numeric or numeric-to-character
conversion. But sometimes this doesn't work, for example when a character value contains a
$ sign, which prevents such conversion and yields a missing value.
Therefore it is always advisable to include INPUT and PUT functions in your programs when
conversions occur.
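
As a rough illustration of both directions (the values are invented for the example):

```sas
data _null_;
    char_amt = '$1,234.50';
    num_amt  = input(char_amt, comma10.);    /* character to numeric: 1234.5 */
    num_id   = 42;
    char_id  = put(num_id, z5.);             /* numeric to character: '00042' */
    sas_date = input('15MAR2005', date9.);   /* text to a SAS date value */
    date_txt = put(sas_date, mmddyy10.);     /* SAS date to text: '03/15/2005' */
    put num_amt= char_id= sas_date= date_txt=;
run;
```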

Which date function advances a date, time or datetime value by a given interval?

INTNX: The INTNX function advances a date, time, or datetime value by a given interval, and
returns a date, time, or datetime value.
Ex: INTNX(interval, start-from, number-of-increments, alignment)

INTCK: INTCK(interval, start-of-period, end-of-period) is an interval function that counts the
number of interval boundaries between two given SAS dates, times or datetimes.

DATETIME() returns the current date and time of day.

DATDIF(sdate, edate, basis) returns the number of days between two dates.
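
A short sketch of these functions with fixed dates. The 'same' alignment value is assumed to be available (it was introduced in SAS 9).

```sas
data _null_;
    start = '15JAN2020'd;
    next_month_start = intnx('month', start, 1);           /* default alignment: beginning */
    same_day_next    = intnx('month', start, 1, 'same');   /* same day of the next month */
    /* INTCK counts boundaries crossed, so one day apart can still give 1 */
    months_between   = intck('month', '31JAN2020'd, '01FEB2020'd);
    days_between     = datdif('01JAN2020'd, '31JAN2020'd, 'act/act');
    put next_month_start= date9. same_day_next= date9. months_between= days_between=;
run;
```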

What do the MOD and INT functions do? What do PAD and DIM do?

MOD: MOD(value, modulo) returns the remainder after the numeric value is divided by modulo,
where modulo is a non-zero constant or numeric variable.

INT: It returns the integer portion of a numeric value, truncating the decimal portion.

PAD: It is not a function but an option of the INFILE statement. It pads each record with
blanks so that all data lines have the same length. It is useful only when missing data occurs
at the end of the record.

CATX: concatenates character strings, removes leading and trailing blanks and inserts
separators.

SCAN: It returns a specified word from a character value. When the target variable has not
been given a length, older releases assign it a length of 200; newer releases use the length of
the first argument.

SUBSTR: extracts a substring or, on the left side of an assignment, replaces character values.

Extraction of a substring: Middleinitial=substr(middlename,1,1);
Replacing character values: substr(phone,1,3)='433';
If the SUBSTR function is on the left side of a statement, the function
replaces the contents of the character variable.

TRIM: trims the trailing blanks from character values.

SCAN vs. SUBSTR: SCAN extracts words within a value that are marked by delimiters.
SUBSTR extracts a portion of the value by stating the specific location. It is best used when
we know the exact position of the substring to extract from a character value.
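
The character functions above can be tried together in one step. The name and phone values are invented for this sketch.

```sas
data _null_;
    length first last $20 full $41;
    name    = 'Smith, John';
    last    = scan(name, 1, ', ');      /* first word; comma and blank are delimiters */
    first   = scan(name, 2, ', ');      /* second word */
    initial = substr(first, 1, 1);      /* extract by position */
    full    = catx(' ', first, last);   /* strips blanks, inserts one separator */
    phone   = '916-734-6241';
    substr(phone, 1, 3) = '433';        /* left-side SUBSTR replaces in place */
    put last= first= initial= full= phone=;
run;
```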

How might you use MOD and INT on numeric to mimic SUBSTR on character Strings?

The first argument to the MOD function is a numeric value, the second is a non-zero numeric
value; the result is the remainder when argument-1 is divided by argument-2. The
INT function takes only one argument and returns the integer portion of an argument,
truncating the decimal portion. Note that the argument can be an expression.
DATA NEW ;
A = 123456 ;
X = INT( A/1000 ) ; /* leading three digits */
Y = MOD( A, 1000 ) ; /* trailing three digits */
Z = MOD( INT( A/100 ), 100 ) ; /* third and fourth digits */
PUT A= X= Y= Z= ;
RUN ;
The log shows:
A=123456 X=123 Y=456 Z=34

In ARRAY processing, what does the DIM function do?

DIM: It returns the number of elements in an array. When we use the DIM function as the stop
value of an iterative DO statement, we do not have to re-specify the stop value if we change
the dimension of the array.
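
A minimal sketch of DIM as the loop bound; the array and its initial values are made up. If the array is later widened, the loop needs no change.

```sas
data _null_;
    array score{4} s1-s4 (90 . 75 60);   /* one element starts out missing */
    do i = 1 to dim(score);              /* DIM supplies the stop value */
        if score{i} = . then score{i} = 0;
    end;
    put s1= s2= s3= s4=;
run;
```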

How would you determine the number of missing or nonmissing values in computations?
To determine the number of missing values that are excluded in a computation, use the
NMISS function. Use the N function to count the nonmissing values.
data _null_;
m=.; y=4; z=0;
N = N(m, y, z);
NMISS = NMISS(m, y, z);
put N= NMISS=; /* write the two counts to the log */
run;

The above program results in N = 2 (number of nonmissing values) and NMISS = 1 (number
of missing values).

Do you need to know if there are any missing values?

The MISSING function accepts only one argument, so to test several numeric fields at once
use: missing_values=(NMISS(field1,field2,field3) > 0); This expression simply returns 0 if
there aren't any or 1 if there are missing values.
If you need to know how many missing values you have, then use
num_missing=NMISS(field1,field2,field3); You can also find the number of non-missing
values with non_missing=N(field1,field2,field3);

What is the difference between: x=a+b+c+d; and x=SUM (of a, b, c ,d);?

Is anyone wondering why you wouldn't just use total=field1+field2+field3;? First, how do you
want missing values handled? The SUM function returns the sum of the non-missing values. If
you choose addition, you will get a missing value for the result if any of the fields is missing.
Which one is appropriate depends upon your needs.

However, there is an advantage to using the SUM function even if you want the results to be
missing. If you have more than a couple of fields, you can often use shortcuts in writing the
field names. If your fields are not numbered sequentially but are stored in the program data
vector together, then you can use: total=SUM(of fielda--zfield); Just make sure you remember the
"of" and the double dashes, or your code will run but you won't get your intended results.
MEAN is another function that calculates differently from the written-out
formula if you have missing values.
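
The following sketch, with invented field values, shows how one missing field changes each result:

```sas
data _null_;
    field1 = 10; field2 = .; field3 = 5;
    plus_total = field1 + field2 + field3;        /* missing, because field2 is missing */
    sum_total  = sum(of field1-field3);           /* 15: missing values are ignored */
    avg_manual = (field1 + field2 + field3) / 3;  /* missing */
    avg_func   = mean(of field1-field3);          /* 7.5: divides by the 2 nonmissing values */
    put plus_total= sum_total= avg_manual= avg_func=;
run;
```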

There is a field containing a date. It needs to be displayed in the format "ddmonyy" if it's
before 1975, "dd mon ccyy" if it's after 1985, and as 'Disco Years' if it's between 1975 and
1985. How would you accomplish this in data step code? Using only PROC FORMAT.
data new ;
input date ddmmyy10. ;
cards;
01/05/1955
01/09/1970
01/12/1975
19/10/1979
25/10/1982
10/10/1988
27/12/1991
;
run;
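proc format ;
/* a format name in brackets applies that existing format to the range; */
/* date7. prints ddmonyy and date9. is the closest built-in to dd mon ccyy */
value dat low-<'01jan1975'd=[date7.]
'01jan1975'd-'31dec1985'd="Disco Years"
'31dec1985'd<-high=[date9.];
run;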
proc print;
format date dat. ;
run;

In the following DATA step, what is needed for 'fraction' to print to the log?
data _null_;
x=1/3;
if x=.3333 then put 'fraction';
run;
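
The text leaves this question unanswered. One common answer, offered here as a sketch rather than the only solution: 1/3 is stored as a repeating fraction and never equals .3333 exactly, so the comparison needs a rounded value.

```sas
data _null_;
    x = 1/3;
    /* round to four decimals before comparing with the literal */
    if round(x, .0001) = .3333 then put 'fraction';
run;
```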

What is the difference between calculating the 'mean' using the MEAN function and PROC
MEANS?

By default PROC MEANS calculates summary statistics such as N, mean, standard deviation,
minimum and maximum for each variable across observations, whereas the MEAN function computes
only the mean of its arguments within a single observation.

What are some differences between PROC SUMMARY and PROC MEANS?

PROC MEANS by default sends its results to the output window; you can stop this with the
NOPRINT option and write the results to a separate data set with the
OUTPUT OUT= statement. PROC SUMMARY doesn't produce printed output by default; we have to
explicitly give the OUTPUT statement, or add the PRINT option, to see the result.

What is a problem with merging two data sets that have variables with the same name but
different data?

Understanding the basic algorithm of MERGE will help you understand how the step
processes. For same-named variables that are not BY variables, the value read from the data
set listed later on the MERGE statement overwrites the earlier one. There are still a few
common scenarios whose results sometimes catch users off guard. Here is one of the most
frequent 'gotchas':

1- BY variables have different lengths

It is possible to perform a MERGE when the lengths of the BY variables are different,
but if the data set with the shorter version is listed first on the MERGE statement, the
shorter length will be used for the length of the BY variable during the merge. Due to this
shorter length, truncation occurs and unintended combinations could result.

In Version 8, a warning is issued to point out this data integrity risk. The warning will be
issued regardless of which data set is listed first:

WARNING: Multiple lengths were specified for the BY variable name by input data sets.
This may cause unexpected results.

Truncation can be avoided by naming the data set with the longest length for the BY variable
first on the MERGE statement, but the warning message is still issued. To prevent the warning,
check with PROC CONTENTS that the BY variables have the same length before
combining them in the MERGE step. You can change the variable
length with either a LENGTH statement in the merge DATA step prior to the MERGE
statement, or by recreating the data sets to have identical lengths for the BY variables.
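
A minimal sketch of the LENGTH approach, using two made-up data sets whose BY variable has different lengths:

```sas
data one;
    length id $3;
    input id $ x;
    datalines;
A01 1
B02 2
;
run;

data two;
    length id $5;
    input id $ y;
    datalines;
A01 10
B02 20
;
run;

data both;
    length id $5;     /* declared before MERGE, so the longer length is used */
    merge one two;
    by id;
run;
```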

Note: When doing a MERGE, we should not have the MERGE and an IF-THEN statement in one DATA
step if the IF-THEN statement involves two variables that come from two different merging
data sets. If it is not completely clear when MERGE and IF-THEN can be used in one DATA
step and when they should not be, then it is best to simply always separate them into different
DATA steps. Following this recommendation helps ensure an error-free merge result.

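Which data set is the controlling data set in the MERGE statement?
No single data set controls a match-merge. Variable attributes such as length are taken from
the first data set listed that contains the variable, while for same-named non-BY variables the
value from the data set listed last overwrites the others.

How do the IN= variables improve the capability of a MERGE?
The IN= variables identify which input data sets contributed to each observation.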

What if you want to keep in the output data set of a merge only the matches (only those
observations to which both input data sets contribute)?

SAS will set up for you special temporary variables, called the "IN=" variables, so that you
can do this and more. Here's what you have to do: signal to SAS on the MERGE statement
that you need the IN= variables for the input data set(s), then use the IN= variables in the
data step appropriately. So to keep only the matches in the match-merge above, ask for the IN=
variables and use them:
data three;
merge one(in=x) two(in=y); /* x and y are your choices of names for the IN= variables */
by id;
if x=1 and y=1; /* keep only observations found in both one and two */
run;

What techniques and/or PROCs do you use for tables?


Proc Freq, Proc univariate, Proc Tabulate & Proc Report.

Do you prefer PROC REPORT or PROC TABULATE? Why?


I prefer to use PROC REPORT until I have to create cross-tabulation tables, because it gives
me so many options to modify the look of my table (ex: the WIDTH option, with which we can
change the width of each column in the table), whereas PROC TABULATE is unable to produce some
of the things I need in my table. Ex: TABULATE doesn't produce n (%) in the desired format.

How experienced are you with customized reporting and use of DATA _NULL_ features?
I have very good experience in creating customized reports as well as with the DATA _NULL_
step. It is a DATA step that generates a report without creating a data set, so development
time can be saved. Another advantage of DATA _NULL_ is that any compilation error in the
statements is detected and written to the log when we submit it, so the error can be found by
checking the log. It is also used to create macro variables in the DATA step.
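
A small sketch of such a step, assuming the sample table sashelp.class that ships with SAS; the column positions and the macro variable name nobs are arbitrary choices for the example.

```sas
data _null_;
    set sashelp.class end=last;
    file print;                               /* send PUT output to the listing, not the log */
    if _n_ = 1 then put @1 'Name' @12 'Age' @18 'Height';
    put @1 name @12 age 2. @18 height 5.1;    /* one report line per observation */
    n + 1;
    if last then call symputx('nobs', n);     /* macro variable for later steps */
run;
%put Number of students: &nobs;
```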

What is the difference between nodup and nodupkey options?


NODUP compares all the variables in our dataset while NODUPKEY compares just the BY
variables.

What is the difference between compiler and interpreter? Give any one example (software
product) that act as an interpreter?
Both are similar as they achieve similar purposes, but inherently different as to how they
achieve that purpose. The interpreter translates instructions one at a time, and then executes
those instructions immediately. Compiled code takes programs (source) written in SAS
programming language, and then ultimately translates it into object code or machine language.
Compiled code does the work much more efficiently, because it produces a complete machine
language program, which can then be executed.

Code the TABLES statement for a single-level frequency?

proc freq data=lib.dataset;
tables var; /* list one variable, or several separated by spaces, to get one-way frequencies */
run;

What is the main difference between rename and label?


1. The LABEL statement can be used in either a PROC or a DATA step, whereas the RENAME
statement can be used only in a DATA step (the RENAME= data set option is available in both).
2. If we rename a variable, the old name is lost, but if we label a variable, its short name
(old name) exists along with its descriptive label.

What is Enterprise Guide? What is the use of it?


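It is a Windows point-and-click client for SAS that generates SAS code behind the scenes; among
other things it offers an approach to import text files with SAS. (It comes free with Base SAS
version 9.0.)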

What other SAS features do you use for error trapping and data validation? What are the
validation tools in SAS?
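For the DATA step: the DEBUG option (data name / debug;), which starts the DATA step debugger,
and the DATASTMTCHK= system option, which guards against a DATA statement with a missing
semicolon.
For macros: options mprint mlogic symbolgen;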

How can you put a "trace" in your program?


Submit ODS TRACE ON before the step and ODS TRACE OFF after it; the trace record of each
output object is written to the log.

How would you code a merge that will keep only the observations that have matches from
both data sets?
Using "IN" variable option. Look at the following example.
data three;
merge one(in=x) two(in=y);
by id;
if x=1 and y=1;
run;

or
data three;
merge one(in=x) two(in=y);
by id;
if x and y;
run;

What are input dataset and output dataset options?


Input data set options include OBS=, FIRSTOBS=, WHERE= and IN=; output data set options include
COMPRESS= and REUSE=. Options that work on both input and output data sets include KEEP=,
DROP= and RENAME=.

How can you create a zero-observation data set?


Create the data set by using the LIKE clause.
ex: proc sql;
create table latha.emp like oracle.emp;
quit;
Here the LIKE clause causes the existing table structure to be copied to the new table. Using
this method results in the creation of an empty table.
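
A DATA step alternative, sketched with the sample table sashelp.class (the output name is arbitrary): the OBS=0 option copies the variable definitions but reads no observations.

```sas
data work.class_empty;
    set sashelp.class (obs=0);   /* structure only, zero observations */
run;
```
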
Have you ever linked SAS code? If so, describe the link and any required statements used to
either process the code or the step itself.
In the editor window we write
%include 'path of the sas file';
run;

In a non-windowing environment there is no need to give the RUN statement.

How can you import a .CSV file into SAS? Give the syntax.

A CSV file can be created in a text editor such as Notepad, with the variable names declared
on the first line.

proc import datafile='E:\age.csv' out=sarath
dbms=csv
replace;
getnames=yes;
run;
proc print data=sarath;
run;

What is the use of Proc SQl?


PROC SQL is a powerful tool in SAS, which combines the functionality of DATA and PROC
steps. PROC SQL can sort, summarize, subset, join (merge), and concatenate data sets, create
new variables, and print the results or create a new data set, all in one step. PROC SQL can
use fewer resources than the equivalent DATA and PROC steps. To join files, PROC SQL does not
require the data to be sorted beforehand, which is a must for a DATA step merge.
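
As an illustration, the sketch below joins, summarizes and sorts two unsorted tables in a single step. The tables emp and pay are made up for the example.

```sas
data emp;
    input id name $;
    datalines;
3 Cy
1 Ann
2 Bob
;
run;

data pay;
    input id salary;
    datalines;
2 500
1 700
1 300
;
run;

proc sql;
    create table totals as
    select e.id, e.name, sum(p.salary) as total_salary
    from emp as e inner join pay as p
        on e.id = p.id                 /* no prior PROC SORT needed */
    group by e.id, e.name
    order by total_salary desc;
quit;
```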

What is SAS GRAPH?


SAS/GRAPH software creates and delivers accurate, high-impact visuals that enable decision
makers to gain a quick understanding of critical business issues.

Why is a STOP statement needed for the point=option on a SET statement?


When you use the POINT= option, you must include a STOP statement to stop DATA step
processing, programming logic that checks for an invalid value of the POINT= variable, or
both. Because POINT= reads only those observations that are specified in the DO statement,
SAS cannot read an end-of-file indicator as it would if the file were being read sequentially.
Because reading an end-of-file indicator ends a DATA step automatically, failure to substitute
another means of ending the DATA step when you use POINT= can cause the DATA step to
go into a continuous loop.
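
A typical pattern, sketched here with the sample table sashelp.class, reads every third observation directly and ends the step with STOP:

```sas
data every_third;
    do i = 1 to total by 3;
        set sashelp.class point=i nobs=total;   /* direct access by observation number */
        output;
    end;
    stop;     /* no end-of-file is ever read, so the step must be ended explicitly */
run;
```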

What is the difference between nodup and nodupkey options?


The NODUP option checks for and eliminates duplicate observations, comparing all variables.
The NODUPKEY option checks for and eliminates observations with duplicate BY variable values.
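
The two options can be compared on a small made-up data set:

```sas
data visits;
    input id visit $;
    datalines;
1 A
1 A
1 B
2 A
;
run;

proc sort data=visits out=no_dup_recs nodup;      /* removes the repeated "1 A" row only */
    by id;
run;

proc sort data=visits out=no_dup_keys nodupkey;   /* keeps one row per id */
    by id;
run;
```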

Have you used macros? For what purpose you have used?
Yes I have, I used macros in creating analysis datasets and tables where it is necessary to
make a small change through out the program and where it is necessary to use the code again
and again.

2. How would you invoke a macro?


After I have defined a macro I can invoke it by adding the percent sign prefix to its name like
this: %macro-name. A semicolon is not required when invoking a macro, though adding one
generally does no harm.

3. How we can call macros with in data step?


CALL SYMPUT creates a macro variable from within a DATA step; to invoke a macro from a DATA
step, use CALL EXECUTE.

4. How do u identify a macro variable?


Ampersand (&)

5. How do you define the end of a macro?


The end of the macro is defined by %Mend Statement

6. For what purposes have you used SAS macros?


If we want to execute the same PROC step on multiple data sets, a macro lets us accomplish
such repetitive tasks quickly and efficiently. A macro program can be
reused many times. Parameters passed to the macro program customize the results without
having to change the code within the macro program. Macros also let us make a small change in
the program and have SAS echo that change throughout the program.

7. What is the difference between %LOCAL and %GLOBAL?


%LOCAL declares a macro variable that exists only inside the macro in which it is defined.
%GLOBAL declares a macro variable that can be used anywhere; macro variables created in
open code (outside any macro) are global as well.

8. How long can a macro variable be? A token?


A macro variable name can be up to 32 characters long; in SAS 9 its value can be up to 65,534
characters long.
A component of SAS known as the word scanner breaks the program text into fundamental
units called tokens.
· Tokens are passed on demand to the compiler.
· The compiler then requests tokens until it receives a semicolon.
· Then the compiler performs the syntax check on the statement.

9. If you use a SYMPUT in a DATA step, when and where can you use the macro variable?
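The macro variable name is enclosed in quotes inside the CALL SYMPUT call. The macro variable
cannot be referenced in the same DATA step that creates it; it can be used in any later step
or in open code once that DATA step has finished.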

10. What do you code to create a macro? End one?


%MACRO and %MEND

11. What is the difference between %PUT and SYMBOLGEN?


%PUT is a macro statement used to write user-defined messages and macro variable values to the
log, whereas SYMBOLGEN is a system option that makes SAS print the resolved value of each
macro variable reference to the log.

12. How do you add a number to a macro variable?


Using %eval function
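
A short sketch with made-up macro variable names; %SYSEVALF is shown alongside because %EVAL handles integer arithmetic only.

```sas
%let count = 5;
%let count = %eval(&count + 1);          /* integer arithmetic: 6 */
%let rate  = %sysevalf(&count * 1.5);    /* floating-point arithmetic: 9 */
%put count=&count rate=&rate;
```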

13. Can you execute a macro within a macro? Describe.


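Yes, such macros are called nested macros. One macro simply invokes another inside its
definition. SYMGET and CALL SYMPUT are not needed for this; they pass values between the DATA
step and macro variables.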

14. If you need the value of a variable rather than the variable itself what would you use to
load the value to a macro variable?
If we need the value of a macro variable, then we must define it in such terms that we can
reference it everywhere in the program. Define it as global. There are different ways of
assigning a global variable. The simplest method is %LET.
Ex: A is a macro variable. Use the following statements to assign the value of A rather than
the variable itself, e.g.
%let A=xyz;
x="&A";
This will assign "xyz" to x, not the variable xyz to x.

15. Can you execute macro within another macro? If so, how would SAS know where the
current macro ended and the new one began?
Yes, I can execute a macro within a macro, which we call nesting of macros and which is
allowed. Every macro's beginning is identified by the keyword %MACRO and its end by %MEND.

16. How are parameters passed to a macro?


A macro variable defined in parentheses in a %MACRO statement is a macro parameter.
Macro parameters allow you to pass information into a macro. Here is a simple example:
%macro plot(yvar= ,xvar= );
proc plot;
plot &yvar*&xvar;
run;
%mend plot;

17. How would you code a macro statement to produce information on the SAS log?
This statement can be coded anywhere:
OPTIONS MPRINT MLOGIC MERROR SYMBOLGEN;

18. How we can call macros with in data step?


CALL SYMPUT, PROC SQL and the %LET statement create macro variables; to invoke a macro from
within a DATA step, use CALL EXECUTE.

19. Tell me about call symput?


CALL SYMPUT takes a value from a data step and assigns it to a macro variable. I can then
use this macro variable in later steps. To assign a value to a single macro variable, I use CALL
SYMPUT with this general form:

CALL SYMPUT("macro-variable-name", value);


where macro-variable-name, enclosed in quotation marks, is the name of a macro variable,
either new or old, and value is the value I want to assign to that macro variable. Value can be
the name of a variable whose value SAS will use, or it can be a constant value enclosed in
quotation marks.

CALL SYMPUT is often used in IF-THEN statements such as this:


if age>=18 then call symput("status","adult");
else call symput("status","minor");
These statements create a macro variable named &status and assign to it a value of either adult
or minor depending on the variable age.

Caution: We cannot create a macro variable with CALL SYMPUT and use it in the same data
step because SAS does not assign a value to the macro variable until the data step executes.
Data steps execute when SAS encounters a step boundary such as a subsequent DATA, PROC, or
RUN statement.
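
The timing rule can be sketched as follows, using the sample table sashelp.class; the macro variable name n_older is made up. CALL SYMPUTX is used so the number is stored without leading blanks.

```sas
data _null_;
    set sashelp.class end=last;
    if age >= 14 then older + 1;
    if last then call symputx('n_older', older);
run;                         /* step boundary: &n_older exists from here on */

title "Students aged 14 or over: &n_older";
proc print data=sashelp.class;
    where age >= 14;
run;
```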

20. Tell me about % include and % eval?


The %INCLUDE statement, despite its percent sign, is not a macro statement and is always
executed by SAS, though it can be conditionally executed in a macro.

It can be used to set up a macro library, but this is the least efficient approach. The use of
%INCLUDE does not actually set up a library. The %INCLUDE statement points to a file and, when
it executes, the indicated file (be it a full program, macro definition, or a statement
fragment) is inserted into the calling program at the location of the call. When %INCLUDE is
used to build a macro library, the included file will usually contain one or more macro
definitions.

%EVAL is a widely used yet frequently misunderstood SAS macro language function due
to its seemingly simple form. It evaluates integer arithmetic and logical expressions. However,
when its actual argument is a complex macro expression interlaced with special characters,
mixed arithmetic and logical operators, or macro quoting functions, its usage and result become
elusive and problematic. The condition of a %IF statement in a macro is evaluated by %EVAL,
which reduces it to true or false.

21. Describe the ways in which you can create macro variables?
There are 5 ways to create macro variables:
%LET
%GLOBAL
CALL SYMPUT
PROC SQL (INTO clause)
Macro parameters
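
The five ways can be seen side by side in the sketch below. All macro and variable names are invented for the example, and sashelp.class is the sample table shipped with SAS.

```sas
%let city = Boston;                              /* 1. %LET */
%global region;                                  /* 2. %GLOBAL creates it with a null value */
data _null_;
    call symput('today', put(today(), date9.));  /* 3. CALL SYMPUT */
run;
proc sql noprint;
    select count(*) into :nrows                  /* 4. PROC SQL INTO */
    from sashelp.class;
quit;
%macro show(dsn=);                               /* 5. macro parameter */
    %put city=&city today=&today nrows=&nrows dsn=&dsn;
%mend show;
%show(dsn=sashelp.class)
```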

22. Tell me more about the parameters in macro?


Parameters are macro variables whose values you set when you invoke a macro. To add
parameters to a macro, you simply list the macro variable names in parentheses in the %MACRO
statement.
Syntax:
%MACRO macro-name (parameter-1= , parameter-2= , ……parameter-n = );
macro-text
%MEND macro-name;

23. What is the maximum length of the macro variable?


The name of a macro variable can be up to 32 characters long.

24. Automatic variables for macro?


Every time we invoke SAS, the macro processor automatically creates certain macro variables,
e.g. &sysdate and &sysday.

25. What system options would you use to help debug a macro?
Debugging a macro with SAS system options: the SAS System offers users a number of
useful system options to help debug macro issues and problems. The results associated with
using macro options are automatically displayed on the SAS log. Specific options related to
macro debugging are listed below.

MEMRPT: Specifies that memory usage statistics be displayed on the SAS log.
MERROR: SAS issues a warning if we invoke a macro that it cannot find, for example because
of a misspelling or because the macro is undefined.
SERROR: SAS issues a warning if we use a macro variable that it cannot find.
MLOGIC: SAS prints details about the execution of the macros in the log.
MPRINT: Displays the SAS statements generated by macro execution on the SAS log
for debugging purposes.
SYMBOLGEN: SAS prints the values of macro variables, as they resolve, to the SAS log.

31. What are SYMGET and SYMPUT?


SYMPUT puts a value from the DATA step into a macro variable, whereas SYMGET gets the
value of a macro variable into the DATA step at execution time.
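
A minimal sketch of SYMGET, with a made-up macro variable cutoff and the sample table sashelp.class:

```sas
%let cutoff = 13;
data flagged;
    set sashelp.class;
    limit = input(symget('cutoff'), best12.);   /* SYMGET returns character, so convert */
    older = (age > limit);                      /* 1 if older than the cutoff, else 0 */
run;
```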

32. What are the macros you have used in your programs?
Used macros for various purposes; a few of them are:

1) Macros written to determine the list of variables in a dataset:

%macro varlist (dsn);
proc contents data = &dsn out = cont noprint;
run;
proc sql noprint;
select distinct name into :varname1-:varname22
from cont;
quit;
%do i = 1 %to &sqlobs;
%put &i &&varname&i;
%end;
%mend varlist;

%varlist(adverse)

2) Distribution of Missing / Non-Missing Values
%macro missrep(dsn, vars=_numeric_);
/* assumes the formats $missf. and missf. were defined earlier with PROC FORMAT */
proc freq data=&dsn.;
tables &vars. / missing;
format _character_ $missf. _numeric_ missf.;
title1 'Distribution of Missing / Non-Missing Values';
run;
%mend missrep;
%missrep(study.demog, vars=age gender bdate);

3) Written macros for sorting common variables in various datasets

%macro sortit (datasetname, pid, investigator, timevisit);
PROC SORT DATA = &DATASETNAME;
BY &PID &INVESTIGATOR;
RUN;
%mend sortit;

4) Macros written to split the number of observations in a dataset

%macro split (dsnorig, dsnsplit1, dsnsplit2, obs1);


data &dsnsplit1;
set &dsnorig (obs = &obs1);
run;

data &dsnsplit2;
set &dsnorig (firstobs = %eval(&obs1 + 1));
run;
%mend split;
%split(sasuser.admit,admit4,admit5,2)

33. What is auto call macro and how to create a auto call macro? What is the use of it? How to
use it in SAS with macros?
It enables the user to call macros that have been stored as SAS programs. The autocall macro
facility allows users to access the same macro code from multiple SAS programs. Rather than
having the same macro code in each program where the code is required, with an autocall
macro the code is in one location. This permits faster updates and better consistency across
all the programs.

Macro set-up:
The first step is to set up a program that contains a macro to be used in multiple
programs. Although the program may contain other macros and/or open code, it is advised to
include only one macro, and the file should have the same name as the macro.

Set MAUTOSOURCE and SASAUTOS:


Before one can use the autocall macro within a SAS program, the MAUTOSOURCE option
must be turned on and the SASAUTOS option must be assigned. The MAUTOSOURCE
option indicates to SAS that the autocall facility is to be activated. The SASAUTOS option
tells SAS where to look for the macros.
For ex: options mautosource sasautos='g:\busmeas\internal\macro\';

34. What %put do?


It writes text, including macro variable values, to the log when we specify
%put (my first macro variable… is &……..)
%put _automatic_; displays all the automatic macro variables, including
&SYSDATE and &SYSTIME.

What is the difference between a FUNCTION and a PROC?

Example: MEAN function and PROC MEANS

Answer: The function gives an average across an observation (a row) and the procedure
gives an average across many observations (a column).
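
Both directions can be seen on a small made-up table of test scores:

```sas
data scores;
    input name $ test1 test2 test3;
    row_mean = mean(of test1-test3);   /* across variables, within each row */
    datalines;
Ann 80 90 100
Bob 70 . 90
;
run;

proc means data=scores mean;
    var test1 test2 test3;             /* across rows, for each column */
run;
```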

What are some of the differences between a WHERE and an IF statement?

Answer: Major differences include:

o IF can only be used in a DATA step
o Many IF statements can be used in one DATA step
o The record must be read into the program data vector to perform selection with IF
o WHERE can be used in a DATA step as well as a PROC
o A second WHERE statement will replace the first unless WHERE ALSO is used
o With WHERE, data is subset prior to reading the record into the PDV
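
The points in this list can be sketched with the sample table sashelp.class; the output data set names are arbitrary.

```sas
data teens_where;
    set sashelp.class;
    where age >= 13;          /* applied before the row enters the PDV */
    where also sex = 'F';     /* adds to, rather than replaces, the first WHERE */
run;

data by_ratio;
    set sashelp.class;
    ratio = height / weight;
    if ratio > 0.6;           /* IF can test a variable created in this step */
run;

proc print data=sashelp.class;
    where age >= 15;          /* WHERE also works in procedures */
run;
```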

Name some of the ways to create a macro variable

Answer:

o %let
o CALL SYMPUT(....)
o as a parameter when creating a macro, for example: %macro mymacro(a=,b=);
o in SQL using INTO

Questions submitted by Charles Patridge

1. Question: Describe 1 way to avoid using a series of "IF" statements such as 
if branch = 1 then premium = amount; else 
if branch = 2 then premium = amount; else 
if..... ; else
if branch = 20 then premium = amount;

Answer: Use a Format and the PUT function.


Or, use an SQL JOIN with a table of Branch codes.
Or, use a Merge statement with a table of Branch codes.

2. Question: When reading a NON SAS external file what is the best way of
reading the file?
Answer: Use the INPUT statement with pointer control - ex: INPUT @1
dateextr mmddyy8. etc.
3. Question: How can you avoid using a series of "%INCLUDE" statements?
Answer: Use the Macro Autocall Library.
4. Question: If you have 2 sets of Format Libraries, can a SAS program have
access to both during 1 session?
Answer: Yes. Use the FMTSEARCH option.
5. Question: Name some of the SAS Functions you have used.
Answer: Looking to see which ones and how many a potential candidate has
used in the past.
6. Question: Name some of the SAS PROCs you have used.
Answer: Looking to see which ones and how many a potential candidate has
used in the past, as well as what SAS products he/she has been exposed to.
7. Question: Have you ever presented a paper to a SAS user group (Local,
Regional or National)?
Answer: Checking to see if candidate is willing to share ideas, proud of his/her
work, not afraid to stand up in front of an audience, can write/speak in a
reasonable clear and understandable manner, and finally, has mentoring
abilities.
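
For question 1 above, a format used as a lookup table might look like the sketch below. The branch rates and the format name brfmt are purely hypothetical, chosen only to show the PUT-then-INPUT pattern.

```sas
proc format;
    value brfmt  1 = '0.10'       /* hypothetical branch-to-rate table */
                 2 = '0.12'
                20 = '0.20'
             other = '0.00';
run;

data premiums;
    input branch amount;
    rate    = input(put(branch, brfmt.), 8.);   /* look up the label, convert to numeric */
    premium = amount * rate;
    datalines;
1 1000
2 1000
20 1000
7 1000
;
run;
```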

Interview questions for a SAS data support manager by Ron Fehd of CDC Atlanta GA USA
RJF2@cdc.gov

Question: How would you check the uniqueness of a data set, i.e. that the data
set was unique on its primary key (ID)? Suppose there were two identifier
variables: ID1, ID2.

Answer:

1. proc FREQ order = FREQ;
tables ID;
shows multiples at top of listing
2. Or, compare the number of obs of the original data set and of the data after
proc SORT nodupkey
(nodupkey compares only the BY variables, whereas nodups compares whole observations)
3. use first.ID and last.ID in a data step to write non-unique observations to a separate data set
if first.ID and last.ID then output UNIQUE; else output DUPS;
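
The third approach, sketched for the two-identifier case with a made-up data set: after sorting by both keys, a key combination is unique when it is both the first and the last of its BY group.

```sas
data ids;
    input id1 id2 value;
    datalines;
1 1 10
1 2 20
1 2 25
2 1 30
;
run;

proc sort data=ids;
    by id1 id2;
run;

data unique dups;
    set ids;
    by id1 id2;
    if first.id2 and last.id2 then output unique;   /* key combination occurs once */
    else output dups;                               /* every row of a repeated key */
run;
```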

1. Have you used macros? For what purpose have you used them?
Yes I have. I used macros in creating analysis datasets and tables where it is necessary
to make a small change throughout the program and where it is necessary to use the
code again and again.

2. How would you invoke a macro? 


After I have defined a macro I can invoke it by adding the percent sign prefix to its
name like this: %macro-name. A semicolon is not required when invoking a macro,
though adding one generally does no harm.

3. How can you create a macro variable within a data step?
With CALL SYMPUT.

4. How would you identify a macro variable? 


with Ampersand (&) sign 

5. How would you define the end of a macro? 


The end of the macro is defined by %Mend Statement 

6. For what purposes have you used SAS macros? 


If we want use a program step for executing to execute the same Proc step on multiple
data sets. We can accomplish repetitive tasks quickly and efficiently. A macro
program can be reused many times. Parameters passed to the macro program
customize the results without having to change the code within the macro program.
Macros in SAS make a small change in the program and have SAS echo that change
thought that program. 

7. What is the difference between %LOCAL and %GLOBAL? 


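%LOCAL declares a macro variable that exists only inside the macro in which it is
declared and is deleted when that macro finishes. %GLOBAL declares a macro variable
in the global symbol table, so it can be referenced anywhere in the session, in open
code or inside any macro. 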
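
A small sketch of the difference in scope; the macro name %demo and the value ABC123 are made up for illustration.

```sas
%global study;            /* global: visible everywhere in the session */
%let study = ABC123;      /* hypothetical value */

%macro demo;
  %local i;               /* local: exists only while %demo is running */
  %let i = 1;
  %put Inside the macro: study=&study i=&i;
%mend demo;

%demo
%put Outside the macro: study=&study;
```

Referencing &i after %demo has finished would produce an "Apparent symbolic reference not resolved" warning, because the local symbol table no longer exists.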

8. How long can a macro variable be? A token? 


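The value of a macro variable can be up to 65,534 characters long, and its name can
be up to 32 characters long. A token can be up to 32,767 characters long.
A component of SAS known as the word scanner breaks the program text into
fundamental units called tokens.
 · Tokens are passed on demand to the compiler.
 · The compiler then requests tokens until it receives a semicolon.
 · Then the compiler performs the syntax check on the statement. 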

9. If you use a SYMPUT in a DATA step, when and where can you use the macro
variable? 
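A macro variable created by the CALL SYMPUT routine cannot be referenced with an
ampersand in the same data step in which it is created, because the value is not
assigned until that step executes. Once the step has finished, the macro variable
can be used in any later step. 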

10. What do you code to create a macro? End one? 


We create a macro with the %MACRO statement and end it with the %MEND
statement. 

11. What is the difference between %PUT and SYMBOLGEN? 


%PUT writes user-defined text, including the values of macro variables, to the log
when it executes, whereas SYMBOLGEN is a system option (not a macro statement) that
makes SAS print in the log how each macro variable reference resolves. 

12. How do you add a number to a macro variable? 
Using the %EVAL function for integers, or the %SYSEVALF function if the number is a
floating-point value. 
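
For instance, a minimal sketch with made-up macro variable names:

```sas
%let count = 5;
%let count = %eval(&count + 1);         /* integer arithmetic: count becomes 6 */

%let rate = 2.5;
%let rate = %sysevalf(&rate + 1.25);    /* floating-point arithmetic: rate becomes 3.75 */

%put count=&count rate=&rate;
```

Passing 2.5 + 1.25 to %EVAL instead would raise an error, because %EVAL only performs integer arithmetic.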

13. Can you execute a macro within a macro? Describe. 


Yes. A macro can be invoked from inside the definition of another macro; such macros
are called nested macros. The inner macro runs to completion and then control
returns to the macro that called it. 
14. If you need the value of a variable rather than the variable itself what would you
use to load the value to a macro variable? 
If the value comes from a data step variable, use CALL SYMPUT, which copies the
value of the variable, not its name, into the macro variable. If the value is known
text, the simplest method is %LET. A macro variable that must be available
everywhere in the program should be global; a %LET in open code creates a global
macro variable. 

Ex: 
A is a macro variable. The following statements assign the text xyz to A and write
it to the log:
%let A=xyz; %put x="&A"; 

This writes x="xyz" to the log: &A resolves to the text xyz, not to a variable named xyz. 

15. Can you execute macro within another macro? If so, how would SAS know where
the current macro ended and the new one began? 

Yes, I can execute a macro within a macro; this is called nesting of macros and it is
allowed. 
The beginning of every macro definition is identified by the keyword %MACRO and its
end by %MEND. 
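
A minimal sketch of one macro invoking another. The macro names %inner and %outer are illustrative; SASHELP.CLASS and SASHELP.CARS are sample data sets shipped with SAS.

```sas
%macro inner(dsn);
  proc print data=&dsn (obs=5);
  run;
%mend inner;

%macro outer;
  /* each call runs %inner to completion, then control returns to %outer */
  %inner(sashelp.class)
  %inner(sashelp.cars)
%mend outer;

%outer
```

Defining %inner outside %outer, as here, is usually preferred to placing one %MACRO ... %MEND definition inside another, since a nested definition is recompiled every time the outer macro runs.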

16. How are parameters passed to a macro? 


A macro variable defined in parentheses in a %MACRO statement is a macro
parameter. Macro parameters allow you to pass information into a macro. 

Here is a simple example:

%macro plot(yvar= ,xvar= ); 
proc plot; 
plot &yvar*&xvar; 
run; 
%mend plot;
%plot(yvar=age, xvar=sex) 

Because yvar and xvar are defined as keyword parameters, they must be passed by name
when the macro is invoked. 

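17. How would you code a macro statement to produce information on the SAS log? 
This statement can be coded anywhere? 
Use the %PUT statement, which can be coded anywhere, in open code or inside a macro.
To trace macro execution in the log, set the system options:
OPTIONS MPRINT MLOGIC MERROR SYMBOLGEN; 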

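18. How can we call macros within a data step? 

We can invoke a macro from a data step with the CALL EXECUTE routine. The macro
variables that the macro uses can be created with CALL SYMPUT, PROC SQL (INTO
clause), the %LET statement and macro parameters. 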
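
One common way to invoke a macro from within a data step is the CALL EXECUTE routine. The sketch below is an illustration only; the macro %show is hypothetical, and %NRSTR is used so that the macro call is queued and runs after the data step ends.

```sas
%macro show(dsn);
  proc print data=&dsn (obs=3);
  run;
%mend show;

data _null_;
  length dsn $41;
  do dsn = 'sashelp.class', 'sashelp.cars';
    /* builds the text %show(sashelp.class), then %show(sashelp.cars) */
    call execute(cats('%nrstr(%show(', dsn, '))'));
  end;
run;
```

Without %NRSTR the macro would start executing while the data step is still running, which can give surprising results when the macro itself contains macro logic that depends on generated steps.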

19. Tell me about CALL SYMPUT?

CALL SYMPUT takes a value from a data step and assigns it to a macro variable. I
can then use this macro variable in later steps. To assign a value to a single macro
variable, I use CALL SYMPUT with this general form: 
CALL SYMPUT ("macro-variable-name", value); 
where macro-variable-name, enclosed in quotation marks, is the name of a macro
variable, and value is the value I want to assign to that macro variable. Value can be
the name of a variable whose value SAS will use, or it can be a constant value
enclosed in quotation marks. 

CALL SYMPUT is often used in if-then statements such as this: 

if age>=18 then call symput ("status","adult"); 
else call symput ("status","minor"); 

These statements create a macro variable named &status and assign to it a value of
either adult or minor depending on the variable age.

Caution: We cannot create a macro variable with CALL SYMPUT and reference it in the
same data step, because SAS does not assign a value to the macro variable until the
data step executes. A data step executes when SAS encounters a step boundary such
as a subsequent DATA, PROC, or RUN statement. 
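
A short sketch of the timing rule, using a made-up value for age:

```sas
data _null_;
  age = 21;                                   /* hypothetical value */
  if age >= 18 then call symput('status', 'adult');
  else call symput('status', 'minor');
run;                                          /* step boundary: &status now exists */

%put NOTE: status resolves to &status;
```

The %PUT statement comes after the RUN statement, so by the time it executes the data step has run and &status holds the text adult.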

20. Tell me about %INCLUDE and %EVAL? 

The %INCLUDE statement, despite its percent sign, is not a macro statement and is
always executed by SAS, though it can be conditionally executed in a macro. It can be
used to set up a macro library, but this is the least preferred approach. 

The use of %INCLUDE does not actually set up a library. The %INCLUDE statement points
to a file, and when it is executed the indicated file (be it a full program, a macro
definition, or a statement fragment) is inserted into the calling program at the
location of the call. When %INCLUDE is used to build a macro library, the included
file will usually contain one or more macro definitions. 

%EVAL is a widely used yet frequently misunderstood SAS macro language function
because of its seemingly simple form. When its argument is a complex macro
expression interlaced with special characters, mixed arithmetic and logical
operators, or macro quoting functions, its usage and result become elusive and
problematic. The condition in a macro %IF statement is evaluated by %EVAL to reduce
it to true or false. 

21. Describe the ways in which you can create macro variables? 
There are 5 ways to create macro variables: 
%LET 
%GLOBAL 
CALL SYMPUT 
PROC SQL INTO clause 
Macro parameters. 
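
Three of these methods side by side, as a rough sketch. The macro variable names are invented for the example, SASHELP.CLASS is a sample data set shipped with SAS, and the TRIMMED keyword assumes SAS 9.3 or later.

```sas
%let cutoff = 14;                                    /* %LET */

proc sql noprint;
  select count(*) into :n_older trimmed              /* INTO clause */
  from sashelp.class
  where age > &cutoff;
quit;

data _null_;
  call symputx('run_date', put(today(), date9.));    /* CALL SYMPUTX */
run;

%put cutoff=&cutoff n_older=&n_older run_date=&run_date;
```

CALL SYMPUTX is used here instead of CALL SYMPUT because it strips leading and trailing blanks from the value.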

22. Tell me more about the parameters in macro? 


Parameters are macro variables whose values you set when you invoke a macro. To add
parameters to a macro, you simply list the macro variable names in parentheses in
the %MACRO statement. 
Syntax: 
%MACRO macro-name (parameter-1= , parameter-2= , ... parameter-n= ); 
macro-text 
%MEND macro-name;
%macro-name(parameter-1=value-1, parameter-2=value-2, ... parameter-n=value-n) 
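
A sketch that mixes a positional parameter with keyword parameters that carry defaults. The macro name %plotit and its defaults are assumptions chosen for illustration.

```sas
/* dsn is positional; yvar and xvar are keyword parameters with default values */
%macro plotit(dsn, yvar=height, xvar=weight);
  proc sgplot data=&dsn;
    scatter y=&yvar x=&xvar;
  run;
%mend plotit;

%plotit(sashelp.class)                          /* uses both defaults */
%plotit(sashelp.class, yvar=age, xvar=height)   /* overrides both keywords */
```

Positional parameters must come first in both the definition and the call; keyword parameters can then be given in any order.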

23. What is the maximum length of the macro variable? 


The name of a macro variable can be up to 32 characters long; its value can be up to
65,534 characters long. 

24. Automatic variables for macro? 


Every time we invoke SAS, the macro processor automatically creates certain macro
variables, e.g. &sysdate and &sysday. 

25. What system options would you use to help debug a macro? 
The SAS System offers users a number of useful system options to help debug macro
issues and problems. The results associated with using macro options are
automatically displayed on the SAS Log. 

Specific options related to macro debugging are listed below: 

MEMRPT: Specifies that memory usage statistics be displayed on the SAS Log. 
MERROR: SAS issues a warning if we invoke a macro that SAS cannot find, for example
when the name is misspelled or the macro is undefined. 
SERROR: SAS issues a warning if we use a macro variable that SAS cannot find. 
MLOGIC: SAS prints details about the execution of the macros in the log. 
MPRINT: SAS statements generated by macro execution are written to the SAS Log
for debugging purposes. 
SYMBOLGEN: SAS prints in the log the values that macro variable references
resolve to. 
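
A typical debugging session might look like the sketch below; the macro %summarize is hypothetical.

```sas
options mprint mlogic symbolgen;        /* turn tracing on */

%macro summarize(dsn, var);
  proc means data=&dsn n mean;
    var &var;
  run;
%mend summarize;

%summarize(sashelp.class, height)

options nomprint nomlogic nosymbolgen;  /* turn tracing off again */
```

With these options on, the log shows the generated PROC MEANS code (MPRINT), the start and end of the macro with its parameter values (MLOGIC), and each macro variable resolution (SYMBOLGEN).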

27. What are SYMGET and SYMPUT? 


SYMPUT puts a value from a data step into a macro variable, whereas 
SYMGET gets the value of a macro variable into the data step during execution. 
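
A minimal sketch of SYMGET; the macro variable name threshold and the output data set name are assumptions.

```sas
%let threshold = 13;

data work.flagged;
  set sashelp.class;
  /* SYMGET returns character text, so convert it before comparing */
  limit = input(symget('threshold'), best12.);
  older = (age > limit);      /* 1 when age exceeds the threshold, else 0 */
run;
```

Unlike an &threshold reference, which is resolved once when the step is compiled, SYMGET looks the value up each time the statement executes.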

28. What are the macros you have used in your programs? 
I have used macros for various purposes; a few of them are:

1) Macros written to determine the list of variables in a dataset: 
%macro varlist (dsn); 
/* write the variable names of the data set to the data set CONT */
proc contents data = &dsn out = cont noprint; 
run; 

/* store up to 22 variable names in the macro variables varname1-varname22 */
proc sql noprint; 
select distinct name into :varname1-:varname22 from cont; 
quit; 

/* SQLOBS holds the number of rows selected by PROC SQL */
%do i = 1 %to &sqlobs; 
%put &i &&varname&i; 
%end; 
%mend varlist; 
%varlist(adverse) 

2) Distribution of Missing / Non-Missing Values

%macro missrep (dsn, vars=_numeric_); 
/* assumes the formats $missf. and missf. were created earlier with PROC FORMAT */
proc freq data=&dsn.; 
tables &vars. / missing; 
format _character_ $missf. _numeric_ missf.; 
title1 'Distribution of Missing / Non-Missing Values'; 
run; 
%mend missrep; 
%missrep(study.demog, vars=age gender bdate) 

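3) Macros written for sorting common variables in various datasets

%macro sortit (datasetname,pid,investigator);
PROC SORT DATA = &DATASETNAME; 
BY &PID &INVESTIGATOR; 
RUN; 
%mend sortit; 
/* pass the data set name and the names of the BY variables */
/* (the variables are assumed here to be called pid and investigator) */
%sortit (ae,pid,investigator)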

4) Macros written to split the number of observations in a dataset

%macro split (dsnorig, dsnsplit1, dsnsplit2, obs1); 
data &dsnsplit1; 
set &dsnorig (obs = &obs1); 
run;
 data &dsnsplit2; 
set &dsnorig (firstobs = %eval(&obs1 + 1)); 
run;
 %mend split; 
%split(sasuser.admit,admit4,admit5,2) 

29. What is an autocall macro and how do you create one? What is the use of it?
How do you use it in SAS with macros? 

SAS enables the user to call macros that have been stored as SAS programs. 

The autocall macro facility allows users to access the same macro code from multiple
SAS programs. Rather than having the same macro code in each program where
the code is required, with an autocall macro the code is in one location. This permits
faster updates and better consistency across all the programs.

Macro set-up:
The first step is to set up a program that contains a macro that is to be used in
multiple programs. Although the program may contain other macros and/or open code,
it is advised to include only one macro. 

Set MAUTOSOURCE and SASAUTOS: 

Before one can use the autocall macro within a SAS program, the MAUTOSOURCE
option must be turned on and the SASAUTOS option should be assigned. The
MAUTOSOURCE option indicates to SAS that the autocall facility is to be activated.
The SASAUTOS option tells SAS where to look for the macros. 

For ex: options mautosource sasautos='g:\busmeas\internal\macro\'; 
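
Put together, a set-up might look like the following sketch. It assumes a file named checkdata.sas in that folder containing the definition of a macro called %checkdata (both names are hypothetical); the file name has to match the macro name for the autocall facility to find it.

```sas
/* search the project folder first, then the default SAS autocall library */
options mautosource sasautos=('g:\busmeas\internal\macro\', sasautos);

/* the first call compiles the macro from checkdata.sas; later calls reuse it */
%checkdata(sashelp.class)
```

Keeping the SASAUTOS fileref in the list preserves access to the autocall macros supplied by SAS.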

30. Why and How to Use %PUT Statement: 

The %PUT statement is similar to the PUT statement in the data step: it writes
text and the values of macro variables to the SAS log when it executes. If you want
to make sure your macro variable resolves as expected, you can check it with the
%PUT statement. 

The unique advantage of %PUT over PUT is that you can use %PUT outside the data step,
whereas you can't with PUT. 

How to use %PUT: 

%let program=AE; 
%put Program name here as &program; 

The above %PUT statement writes the following to the log: Program name here as AE 

What can you do with %PUT: 

Numerous options are available for the %PUT statement. 

%PUT _all_: 
It prints all macro variables in the log that are available in all environments (global,
local, user and automatic). 

%PUT _automatic_: 
It prints all the SAS defined automatic macro variables in the log (ex: &sysdate,
&systime, &sysdsn, &syserr etc). 

%PUT _global_: 
It prints macro variables that are created by the user and available in all environments. 

%PUT _local_: 
It prints macro variables that are created by the user and available only in the local
environment (those macro variables cannot be used outside the macro in which they
were created). 

%PUT _user_: 
It prints macro variables that are created by the user in each environment.
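
A short sketch that exercises some of these keywords; the macro %demo and the variable step are invented for the example.

```sas
%let program = AE;        /* user-created global macro variable */

%macro demo;
  %local step;
  %let step = 1;
  %put _local_;           /* lists the local variable STEP of %demo */
%mend demo;

%demo
%put _global_;            /* lists user-created global variables, including PROGRAM */
%put _user_;              /* lists user-created variables in every current scope */
```

Each line written by these keywords shows the scope (for example GLOBAL or the macro name), the variable name and its value.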

31) How to know how &&var&i or &&dsn&i resolves?
It is sometimes confusing to tell right away how &&var&i or &&dsn&i gets
resolved, but here is a simple technique you can use.
Ex: We generally use macro variables such as &&var&i or &&dsn&i inside a
%do loop, to execute the same code n times.
You have a dataset and it has 7 variables: patid sex age ethnic race wt ht;
%macro doit;
%do i=1 %to &nvars;
&&var&i
%end;
%mend doit;
So if the nvars value is 7, the loop generates the list of macro variable references
&var1 &var2 &var3 &var4 &var5 &var6 &var7
which further get resolved to
patid sex age ethnic race wt ht
On the first pass && resolves to & and &i resolves to the loop index, giving for
example &var1; that reference is then resolved on the second pass.
You can always use the macro debugging option SYMBOLGEN to see how each macro
variable got resolved.
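
The two-pass resolution can be watched in the log with a cut-down version of the loop. This sketch uses three variables instead of seven and writes the result with %PUT; the values are taken from the example above.

```sas
%let var1 = patid;
%let var2 = sex;
%let var3 = age;
%let nvars = 3;

%macro doit;
  %do i = 1 %to &nvars;
    /* pass 1 turns &&var&i into &var1 (then &var2, &var3); pass 2 resolves it */
    %put var&i resolves to &&var&i;
  %end;
%mend doit;

options symbolgen;
%doit
options nosymbolgen;
```

With SYMBOLGEN on, the log reports that && resolves to &, then the value of I, then the value of VAR1, and so on for each iteration.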
Very Basic
 What SAS statements would you code to read an external raw data file to a DATA
step?
 How do you read in the variables that you need?
 Are you familiar with special input delimiters? How are they used?
 If reading a variable length file with fixed input, how would you prevent SAS from
reading the next record if the last variable didn't have a value?
 What is the difference between an informat and a format? Name three informats or
formats.
 Name and describe three SAS functions that you have used, if any?
 How would you code the criteria to restrict the output to be produced?
 What is the purpose of the trailing @? The @@? How would you use them?
 Under what circumstances would you code a SELECT construct instead of IF
statements?
 What statement do you code to tell SAS that it is to write to an external file? What
statement do you code to write the record to the file?
 If reading an external file to produce an external file, what is the shortcut to write
that record without coding every single variable on the record?
 If you're not wanting any SAS output from a data step, how would you code the
data statement to prevent SAS from producing a set?
 What is the one statement to set the criteria of data that can be coded in any step?
 Have you ever linked SAS code? If so, describe the link and any required
statements used to either process the code or the step itself.
 How would you include common or reuse code to be processed along with your
statements?
 When looking for data contained in a character string of 150 bytes, which function
is the best to locate that data: scan, index, or indexc?
 If you have a data set that contains 100 variables, but you need only five of those,
what is the code to force SAS to use only those variable?
 Code a PROC SORT on a data set containing State, District and County as the
primary variables, along with several numeric variables.
 How would you delete duplicate observations?
 How would you delete observations with duplicate keys?
 How would you code a merge that will keep only the observations that have
matches from both sets.
 How would you code a merge that will write the matches of both to one data set,
the non-matches from the left-most data set to a second data set, and the non-matches
of the right-most data set to a third data set.

Internals

 What is the Program Data Vector (PDV)? What are its functions?
 Does SAS 'Translate' (compile) or does it 'Interpret'? Explain.
 At compile time when a SAS data set is read, what items are created?
 Name statements that are recognized at compile time only?
 Identify statements whose placement in the DATA step is critical.
 Name statements that function at both compile and execution time.
 Name statements that are execution only.
 In the flow of DATA step processing, what is the first action in a typical DATA
Step?
 What is _n_?

Base SAS

 What is the effect of the OPTIONS statement ERRORS=1?


 What's the difference between VAR A1 - A4 and VAR A1 -- A4?
 What do the SAS log messages "numeric values have been converted to character"
mean? What are the implications?
 Why is a STOP statement needed for the POINT= option on a SET statement?
 How do you control the number of observations and/or variables read or written?
 Approximately what date is represented by the SAS date value of 730?
 How would you remove a format that has been permanently associated with a
variable??
 What does the RUN statement do?
 Why is SAS considered self-documenting?
 What areas of SAS are you most interested in?
 Briefly describe 5 ways to do a "table lookup" in SAS.
 What versions of SAS have you used (on which platforms)?
 What are some good SAS programming practices for processing very large data
sets?
 What are some problems you might encounter in processing missing values? *In
Data steps? Arithmetic? Comparisons? Functions? Classifying data?
 How would you create a data set with 1 observation and 30 variables from a data
set with 30 observations and 1 variable?
 What is the difference between functions and PROCs that calculate the same simple
descriptive statistics?
 If you were told to create many records from one record, show how you would do
this using arrays and with PROC TRANSPOSE?
 What are _numeric_ and _character_ and what do they do?
 How would you create multiple observations from a single observation?
 For what purpose would you use the RETAIN statement?
 What is a method for assigning first.VAR and last.VAR to the BY group variable
on unsorted data?
 What is the order of application for output data set options, input data set options
and SAS statements?
 What is the order of evaluation of the comparison operators: + - * / ** ( ) ?

Testing, debugging

 How could you generate test data with no input data?


 How do you debug and test your SAS programs?
 What can you learn from the SAS log when debugging?
 What is the purpose of _error_?
 How can you put a "trace" in your program?
 Are you sensitive to code walk-throughs, peer review, or QC review?
 Have you ever used the SAS Debugger?
 What other SAS features do you use for error trapping and data validation?

Missing values
 How does SAS handle missing values in: assignment statements, functions, a
merge, an update, sort order, formats, PROCs?
 How many missing values are available? When might you use them?
 How do you test for missing values?
 How are numeric and character missing values represented internally?

General

 What has been your most common programming mistake?


 What is your favorite programming language and why?
 What is your favorite operating system? Why?
 Do you observe any coding standards? What is your opinion of them?
 What percent of your program code is usually original and what percent copied
and modified?
 Have you ever had to follow SOPs or programming guidelines?
 Which is worse: not testing your programs or not commenting your programs?
 Name several ways to achieve efficiency in your program. Explain trade-offs.
 What other SAS products have you used and consider yourself proficient in using?

Functions

 How do you make use of functions?


 When looking for data contained in a character string of 150 bytes, which function is
the best to locate that data: scan, index, or indexc?
 What is the significance of the 'OF' in X=SUM(OF a1-a4, a6, a9);?
 What do the PUT and INPUT functions do?
 Which date function advances a date, time or date/time value by a given interval?
 What do the MOD and INT functions do?
 How might you use MOD and INT on numerics to mimic SUBSTR on character
strings?
 In ARRAY processing, what does the DIM function do?
 How would you determine the number of missing or nonmissing values in
computations?
 What is the difference between: x=a+b+c+d; and x=SUM(a,b,c,d);?
 There is a field containing a date. It needs to be displayed in the format "ddmonyy"
if it's before 1975, "dd mon ccyy" if it's after 1985, and as 'Disco Years' if it's between
1975 and 1985. How would you accomplish this in data step code? Using only PROC
FORMAT.
 In the following DATA step, what is needed for 'fraction' to print to the log? data
_null_; x=1/3; if x=.3333 then put 'fraction'; run;
 What is the difference between calculating the 'mean' using the mean function and
PROC MEANS?

PROCs

Have you ever used "Proc Merge"? (be prepared for surprising answers..)

If you were given several SAS data sets you were unfamiliar with, how would you
find out the variable names and formats of each dataset?
What SAS PROCs have you used and consider yourself proficient in using?

How would you keep SAS from overlaying a SAS data set with its sorted version?

In PROC PRINT, can you print only variables that begin with the letter "A"?

What are some differences between PROC SUMMARY and PROC MEANS?

PROC FREQ: 

*Code the tables statement for a single-level (most common) frequency. 


*Code the tables statement to produce a multi-level frequency. 
*Name the option to produce frequency line items rather than a table. 
*Produce output from a frequency. Restrict the printing of the table. 

PROC MEANS: 
*Code a PROC MEANS that shows both summed and averaged output of the data. 
*Code the option that will allow MEANS to include missing numeric data to be
included in the report. 
*Code the MEANS to produce output to be used later. 

Do you use PROC REPORT or PROC TABULATE? Which do you prefer? Explain.
Difference between SQL join and merge. Which is much better and why?

What are the different ways to get unique data from a SAS dataset ?

How to create format from a SAS dataset and its importance?

FIRSTOBS= and OBS=?

PROC CONTENTS?

GROUP BY?

PROC SORT in detail.

Difference between SUBSTR and SCAN.

Difference between CAT, CATT, CATS and CATX.

COMPBL(), FIND(), INDEX(), PUT() and INPUT().

Format and informat.

Difference between where and if.


What is PDV ?

What are the two automatic variables in PDV?

Default stat of proc means?

Output statement in proc means and freq.


Merge is equal to which join?

IN= option in detail.

Prerequisite for data step merge.

DSD and MISSOVER.

@ and @@.

DROP= and KEEP= options.

Difference between Subsetting if and conditional if.

When to use select when ?

Significance of _type_ .

Difference between lib and sql pass through ?

How to add new field in sas dataset?

When to use Calculated keyword in sql?

How to create a new table?

How to update the format of a SAS dataset?

How to merge two dataset that contain 1000000 and 50 obs respectively?

How to sort data by sql ?

How to get total no of obs from a dataset ?

Diff between proc means and proc summary.

ODS destination.

TRANWRD().

Difference between TRIM(), LEFT() and RIGHT().

Merging/Updating

 What happens in a one-on-one merge? When would you use one?


 How would you combine 3 or more tables with different structures?
 What is a problem with merging two data sets that have variables with the same
name but different data?
 When would you choose to MERGE two data sets together and when would you
SET two data sets?
 Which data set is the controlling data set in the MERGE statement?
 How do the IN= variables improve the capability of a MERGE?
 Explain the message "MERGE HAS ONE OR MORE DATASETS WITH
REPEATS OF BY VARIABLES".

Simple statistics

 How would you generate 1000 observations from a normal distribution with a
mean of 50 and standard deviation of 20. How would you use PROC CHART to look
at the distribution? Describe the shape of the distribution.
 How do you generate random samples?
Customized Report Writing

 What is the purpose of the statement DATA _NULL_ ;?


 What is the pound sign used for in the DATA _NULL_?
 What would you use the trailing @ sign for?
 For what purpose(s) would you use the RETURN statement?
 How would you determine how far down on a page you have printed in order to
print out footnotes?
 What is the purpose of using the N=PS option?

Macro

 What system options would you use to help debug a macro?


 Describe how you would create a macro variable.
 How do you identify a macro variable?
 How do you define the end of a macro?
 How do you assign a macro variable to a SAS variable?
 For what purposes have you used SAS macros?
 What is the difference between %LOCAL and %GLOBAL?
 How long can a macro variable be? A token?
 If you use a SYMPUT in a DATA step, when and where can you use the macro
variable?
 What do you code to create a macro? End one?
 Describe how you would pass data to a macro.
 You have five data sets that need to be processed identically; how would you
simplify that processing with a macro?
 How would you code a macro statement to produce information on the SAS log?
This statement can be coded anywhere.
 How do you add a number to a macro variable?
 If you need the value of a variable rather than the variable itself, what would you
use to load the value to a macro variable?
 Can you execute a macro within a macro? Describe.
 Can you execute a macro within another macro? If so, how would SAS know where the
current macro ended and the new one began?
 How are parameters passed to a macro?

proc format ;
Picture mydate other = "%0m-%0d-%0y"(datatype = Date);
run;
data A;
Call symput('DATE',put(today()- 4,mydate.));
run;
%put &Date;

Describe the validation procedure? How would you perform the validation for TLG as well as
analysis data set?
Ans:- The validation procedure is used to check the output of the SAS program written by the source
programmer. In this process the validator independently writes a program and generates the output. If
this output is the same as the output generated by the source programmer, the program is considered
to be valid. We can perform this validation for TLGs by checking the output manually, and for analysis
data sets it can be done using PROC COMPARE.

How would you perform the validation for the listing, which has 400 pages?

Ans:- It is not practical to validate a listing of 400 pages manually. Instead,
we convert the listing into data sets by using ODS and then compare them
using PROC COMPARE.

Can you use PROC COMPARE to validate listings? Why?

Ans:- Yes, we can use PROC COMPARE to validate a listing, because if there are many entries
(pages) in the listing it is not possible to check them manually. In that case we use PROC
COMPARE to validate the listing.
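
A minimal sketch of such a comparison. The librefs PROD and QC, the data set ADSL and the key variable USUBJID are all hypothetical names used for illustration.

```sas
/* PROD and QC are hypothetical librefs for the source and validation outputs */
proc sort data=prod.adsl out=work.prod_adsl; by usubjid; run;
proc sort data=qc.adsl   out=work.qc_adsl;   by usubjid; run;

proc compare base=work.prod_adsl compare=work.qc_adsl listall;
  id usubjid;               /* match observations on the key variable */
run;

/* SYSINFO is 0 only when PROC COMPARE found no differences */
%put NOTE: PROC COMPARE return code = &sysinfo;
```

The %PUT has to follow the PROC COMPARE step immediately, because the next step that runs resets SYSINFO.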

How could you generate test data with no input data?Using Data Null and put statementHow
do you debugand test your SAS programs?Using Obs=0 and systems options to trace the
program execution in log.What can you learn from the SAS log when debugging?It will
display the execution of whole program and the logic. It will also display the error with line
number sothat you can and edit the program.What is the purpose of _error_?It has only to
values, which are 1 for error and 0 for no errorHow can you put a "trace" in your program?By
using ODS TRACE ONHow does SAS handle missing values in: assignment statements,
functions, a merge, an update, sort order,formats, PROCs?Missing values will be assigned as
missing in Assignment statement. Sort order treats missing as secondsmallest followed by
underscore.How do you test for missing values?Using Subset functions like IF then
Else,Where and SelectHow are numeric and character missing values represented internally?
Character as Blank or “ and Numeric as.

Which date function advances a date, time or datetime value by a given interval?
INTNX.

In the flow of DATA step processing, what is the first action in a typical DATA step?
The first action is compilation (creation of the input buffer and the PDV). When you submit a
DATA step, SAS processes it in two phases, the compilation phase followed by the execution
phase, and then creates a new SAS data set.

What are SAS/ACCESS and SAS/CONNECT?
SAS/ACCESS works with external databases such as Oracle, SQL Server and MS Access.
SAS/CONNECT is used for connections between a SAS client session and a server.

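What is the one statement to set the criteria of data that can be coded in any step?

The WHERE statement sets a selection criterion on the data and can be coded in both DATA and
PROC steps. Other statements that can appear in more than one kind of step include OPTIONS
and LABEL; KEEP= and DROP= are available as data set options in any step.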

What is the purpose of using the N=PS option?
The N=PS option creates a buffer in memory that is large enough to store PAGESIZE (PS) lines
and enables the lines of a page to be formatted in any order before the page is printed.

What are the scrubbing procedures in SAS?
PROC SORT with the NODUPKEY option, because it eliminates duplicate values.

What are the new features included in the new version of SAS, i.e. SAS 9.1.3?
The main advantage of version 9 is faster execution of applications and centralized access to
data and support. Many changes were made in version 9 compared with version 8. The following
are a few:
Version 9 supports format names longer than 8 bytes, which is not possible in version 8.
A numeric format name can be up to 32 characters long in version 9, versus 8 in version 8.
A character format name can be up to 31 characters long in version 9, versus 8 in version 8.
A numeric informat name can be up to 31 characters long in version 9, versus 8 in version 8.
A character informat name can be up to 30 characters long in version 9, versus 8 in version 8.
Three new informats are available in version 9 to convert various date, time and datetime
forms of data into a SAS date, time or datetime value:

• ANYDTDTEw. - Converts to a SAS date value.
• ANYDTTMEw. - Converts to a SAS time value.
• ANYDTDTMw. - Converts to a SAS datetime value.

The CALL SYMPUTX routine was added in version 9. It creates a macro variable at execution
time in the DATA step by:
• Trimming leading and trailing blanks
• Automatically converting a numeric value to character

A new ODS option (the COLUMNS= option) is included to create multiple columns in the output.

What differences did you find among versions 6, 8 and 9 of SAS?
The SAS 9 architecture is fundamentally different from any prior version of SAS. In the SAS 9
architecture, SAS relies on a new component, the Metadata Server, to provide an information
layer between the programs and the data they access. Metadata, such as security permissions
for SAS libraries and where the various SAS servers are running, are maintained in a common
repository.

What has been your most common programming mistake?
Missing semicolons, not checking the log after submitting a program, not using debugging
techniques, and not using the FSVIEW option vigorously.

Name several ways to achieve efficiency in your program. Explain trade-offs.
Efficiency and performance strategies can be classified into 5 different areas:

 CPU time

 Data Storage

 Elapsed time

 Input/Output

 Memory

CPU time and elapsed time are the baseline measurements.

A few examples of efficiency violations:
 Retaining unwanted data sets
 Not subsetting early to eliminate unwanted records

Efficiency-improving techniques:
 Use KEEP and DROP statements to retain only the necessary variables.
 Use macros to reduce the amount of code.
 Use IF-THEN/ELSE statements to process data.
 Use the SQL procedure to reduce the number of programming steps.
 Use LENGTH statements to reduce variable size and so reduce data storage.
 Use DATA _NULL_ steps for processing that needs no output data set, to save data storage.

What other SAS products have you used and consider yourself proficient in using?
DATA _NULL_ steps, PROC MEANS, PROC REPORT, PROC TABULATE, PROC FREQ, PROC PRINT,
PROC UNIVARIATE, etc.

What is the significance of the 'OF' in
X=SUM (OF a1-a4, a6, a9);

If you do not use the OF keyword, the argument list might not be interpreted as we expect.
Without OF, the function above calculates a1 minus a4, plus a6 and a9, and not the whole sum
of a1 through a4 plus a6 and a9. The same is true of the MEAN function.

What do the PUT and INPUT functions do?
The INPUT function converts character data values to numeric values. The PUT function
converts numeric values to character values.
Ex. for INPUT: INPUT(source, informat)
For PUT: PUT(source, format)
Note that the INPUT function requires an INFORMAT and the PUT function requires a FORMAT.
If we omit the INPUT or the PUT function during data conversion, SAS will detect the
mismatched variables and will try an automatic character-to-numeric or numeric-to-character
conversion.

But sometimes this does not work because a $ sign prevents such conversion. Therefore it is
always advisable to include INPUT and PUT functions in your programs when conversions occur.

Which date function advances a date, time or datetime value by a given interval?
INTNX: The INTNX function advances a date, time or datetime value by a given interval, and
returns a date, time or datetime value.
Ex: INTNX(interval, start-from, number-of-increments, alignment)
INTCK: INTCK(interval, start-of-period, end-of-period) is an interval function that counts the
number of intervals between two given SAS dates, times or datetimes.
DATETIME() returns the current date and time of day.
DATDIF(sdate, edate, basis) returns the number of days between two dates.

What do the MOD and INT functions do? What do the PAD and DIM functions do?
MOD: The modulo is a constant or numeric variable; the function returns the remainder after a
numeric value is divided by the modulo.
INT: Returns the integer portion of a numeric value, truncating the decimal portion.
PAD: Pads each record with blanks so that all data lines have the same length. It is an option
of the INFILE statement, and it is useful only when missing data occurs at the end of the
record.
DIM: Returns the number of elements in an array.
CATX: Concatenates character strings, removes leading and trailing blanks, and inserts
separators.
SCAN: Returns a specified word from a character value. The SCAN function assigns a length of
200 to each target variable.
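
The functions discussed above can be tried in a single DATA step. The sketch below uses
made-up variable names and literal values; the comments show the values each call is expected
to produce.

```sas
data _null_;
  a1 = 1; a2 = 2; a3 = 3; a4 = 4; a6 = 6; a9 = 9;

  with_of    = sum(of a1-a4, a6, a9);   /* 1+2+3+4+6+9 = 25 */
  without_of = sum(a1-a4, a6, a9);      /* (1-4)+6+9   = 12 */

  num  = input('1234', 8.);             /* character to numeric */
  char = put(1234, 8.);                 /* numeric to character */

  start      = '15JAN2020'd;
  next_month = intnx('month', start, 1);             /* 01FEB2020: default alignment is the beginning */
  months     = intck('month', start, '20MAR2020'd);  /* 2 month boundaries crossed */

  remainder = mod(17, 5);               /* 2 */
  whole     = int(3.75);                /* 3 */

  full  = catx('-', ' my ', 'cat ');    /* my-cat */
  word2 = scan('alpha beta gamma', 2);  /* beta */

  put with_of= without_of= num= char=;
  put next_month= date9. months= remainder= whole= full= word2=;
run;
```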

SAS Technical Interview Questions

You can go into a SAS interview with more confidence if you know that you are prepared to
respond to the kind of technical questions that an interviewer might ask you. I do not provide
the specific answers here, both because these questions can be asked in a variety of ways and
because it is not my objective to help those who have little actual interest in SAS to bluff their
way through a SAS technical interview. The discussion here, though, may give you an idea of
whether you have fully considered the implications contained in the questions.

Key concepts

A SAS technical interview typically starts with a few of the key concepts that are essential in
SAS programming. These questions are intended to separate those who have actual
substantive experience with SAS from those who have used it in only a very limited or
superficial way. If you have spent more than a hundred hours reading and writing SAS
programs, it is safe to assume that you are familiar with topics such as these:

 SORT procedure
 Data step logic
 KEEP=, DROP= dataset options
 Missing values
 Reset to missing, or the RETAIN statement
 Log
 Data types
 FORMAT procedure for creating value formats
 IN= dataset option

Tricky Stuff

After the interviewer is satisfied that you have used SAS to do a variety of things, you are
likely to get some more substantial questions about SAS processing. These questions typically
focus on some of the trickier aspects of the way SAS works, not because the interviewer is
trying to trick you, but to give you a chance to demonstrate your knowledge of the details of
SAS processing. At the same time, you can show how you approach technical questions and
issues, and that is ultimately more important than your knowledge of any specific feature in
SAS.

STOP statement

The processing of the STOP statement itself is ludicrously simple. However, when you
explain the how and why of a STOP statement, you show that you understand:

 How a SAS program is divided into steps, and the difference between a data step and a
proc step
 The automatic loop in the data step
 Conditions that cause the automatic loop to terminate, or to fail to terminate
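
One common case where STOP matters is direct access with the POINT= option. The sketch below
assumes the SASHELP.CLASS sample data set (19 observations); the output data set name sample
is made up.

```sas
data sample;
  do i = 1 to 19 by 6;            /* pick observations 1, 7, 13 and 19 */
    set sashelp.class point=i;    /* direct access never reaches end-of-file */
    output;
  end;
  stop;                           /* without STOP the automatic loop would not end */
run;
```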

RUN statement placement

The output of a program may be different based on whether a RUN statement comes before or
after a global statement such as an OPTIONS or TITLE statement. If you are aware of this
issue, it shows that you have written SAS programs that have more than the simplest of
objectives. At the same time, your comments on this subject can also show that you know:

 The distinction between data step statements, proc step statements, and global
statements
 How SAS finds step boundaries
 The importance of programming style
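
A small sketch of the effect, assuming SASHELP.CLASS is available. In the second step the
TITLE statement appears before RUN, so it is already in effect when that step executes.

```sas
title 'First report';
proc print data=sashelp.class (obs=3);
run;                          /* step boundary reached before the next TITLE */

title 'Second report';
proc print data=sashelp.class (obs=3);
title 'Changed before RUN';   /* global statement takes effect before the step executes */
run;                          /* this output carries the title Changed before RUN */
```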

SUM or +

Adding numbers with the SUM function provides the same result that you get with the +
numeric operator. For example, SUM(8, 4, 3) provides the same result as 8 + 4 + 3.
Sometimes, though, you prefer to use the SUM function, and at other times, the + operator. As
you explain this distinction, you can show that you understand:

 Missing values
 Propagation of missing values
 Treatment of missing values in statistical calculations in SAS
 Why it matters to handle missing values correctly in analytic processing
 The use of 0 as an argument in the SUM function to ensure that the result is not a
missing value
 The performance differences between functions and operators
 Essential ideas of data cleaning
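
The points about missing values can be seen in a short DATA step. This is only a sketch with
made-up variable names.

```sas
data _null_;
  a = 8; b = .; c = 3;
  plus_result = a + b + c;       /* missing: the + operator propagates missing values */
  sum_result  = sum(a, b, c);    /* 11: SUM ignores missing arguments */
  all_missing = sum(., .);       /* missing when every argument is missing */
  safe_total  = sum(0, ., .);    /* 0: the extra 0 guarantees a nonmissing result */
  put plus_result= sum_result= all_missing= safe_total=;
run;
```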

Statistics: functions vs. proc steps

Computing a statistic with a function, such as the MEAN function, is not exactly the same as
computing the same statistic with a procedure, such as the UNIVARIATE procedure. As you
explain this distinction, you show that you understand:

 The difference between summarizing across variables and summarizing across
observations
 The statistical concept of degrees of freedom as it relates to the difference between
sample statistics and population statistics, and the way this is implemented in some
SAS procedures with the VARDEF= option
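
The sketch below contrasts the two approaches on a small made-up data set (scores, test1 to
test3). The MEAN function works across variables within one observation, while PROC MEANS
works across observations, with VARDEF= choosing the divisor for the variance.

```sas
data scores;
  input id test1 test2 test3;
  row_mean = mean(of test1-test3);   /* across variables, within one observation */
  datalines;
1 80 90 70
2 60 . 75
3 95 85 90
;

/* Across observations: sample standard deviation, divisor n-1 (the default) */
proc means data=scores mean std vardef=df;
  var test1 test2 test3;
run;

/* Across observations: population standard deviation, divisor n */
proc means data=scores mean std vardef=n;
  var test1 test2 test3;
run;
```
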

REPLACE= option

Many SAS programmers never have occasion to use the REPLACE= dataset option or system
option, but if you are familiar with it, then you have to be aware of:

 The distinction between the input dataset and the output dataset in a step that makes
changes in a set of data
 The general concept of name conflicts in programming theory
 Issues of programming style related to name conflicts
 How the system option compares to the corresponding dataset option

A question on this topic may also give you the opportunity to mention syntax check mode and
issues of debugging SAS programs.

WHERE vs. IF

Sometimes, it makes no difference whether you use a WHERE statement or a subsetting IF
statement. Sometimes it makes a big difference. In explaining this distinction, you have the
opportunity to discuss:

 The distinction between data steps and proc steps
 The difference between declaration (declarative) statements and executable (action)
statements
 The significance of the sequence of executable statements in a data step
 Some of the finer points of merging SAS datasets
 A few points of efficiency theory (although tests do not seem to bear the theory out in
this case)
 The origin of the WHERE clause in SQL (of course, bring this up only if you’re good
at SQL)
 WHERE operators that are not available in the IF statement or other data step
statements
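
A few of these differences can be sketched as follows, assuming SASHELP.CLASS is available.
The output data set names and the height_cm variable are made up.

```sas
/* WHERE filters observations before they reach the program data vector */
data teens_where;
  set sashelp.class;
  where age between 13 and 15 and name contains 'a';   /* WHERE-only operators */
run;

/* Subsetting IF is executable: it can test values computed in the step */
data tall_if;
  set sashelp.class;
  height_cm = height * 2.54;
  if height_cm > 160;          /* WHERE cannot see height_cm, which is not in the input */
run;

/* WHERE also works in proc steps; subsetting IF does not */
proc print data=sashelp.class;
  where sex = 'F';
run;
```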

Compression

Compressing a SAS dataset is easy to do, so questions about it have more to do with
determining when it is a good idea. You can weigh efficient use of storage space against
efficient use of processing power, for example. Explain how you use representative data and
performance measurements from SAS to test efficiency techniques, and you establish yourself
as a SAS programmer who is ready to deal with large volumes of data. If you can explain why
compression is effective in SAS datasets and observations larger than a certain minimum size
and why binary compression works better than character compression for some kinds of data,
then it shows you take software engineering seriously.
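
As a starting point for such a test, the sketch below builds two made-up data sets with the
two compression methods. The SAS log reports how much each data set shrank (or grew), which
is the kind of measurement the paragraph above refers to.

```sas
/* Character (run-length) compression: suits repeated characters and blanks */
data work.text_heavy (compress=char);
  length comment $200;
  do id = 1 to 10000;
    comment = 'short text';
    output;
  end;
run;

/* Binary compression: suits long observations with many numeric variables */
data work.numeric_heavy (compress=binary);
  array x{50};
  do id = 1 to 10000;
    do j = 1 to 50;
      x{j} = id;
    end;
    output;
  end;
  drop j;
run;

/* PROC CONTENTS reports whether a data set is compressed */
proc contents data=work.text_heavy;
run;
```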

Macro processing

Almost the only reason interviewers ask about macros is to determine whether you appreciate
the distinction between preprocessing and processing. Most SAS programmers are somewhat
fuzzy about this, so if you have it perfectly clear in your mind, that makes you a cut above the
rest — and if not, at least you should know that this is a topic you have to be careful about.
There are endless technical issues with SAS macros, such as the system options that determine
how much shows up in the log; your experience with this is especially important if the job
involves maintaining SAS code written with macros.
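
The sketch below illustrates the preprocessing and processing distinction together with two
of the log-related system options. The macro name subset and its parameters are hypothetical,
and SASHELP.CLASS is assumed to be available.

```sas
options mprint symbolgen;     /* show generated code and macro variable resolution in the log */

%macro subset(ds, minage);
  /* %IF runs while the macro generates code, before any DATA step executes */
  %if &minage > 0 %then %do;
    data subset;
      set &ds;
      if age >= &minage;      /* plain IF runs later, once per observation */
    run;
  %end;
  %else %put NOTE: no subset requested;
%mend subset;

%subset(sashelp.class, 14)

options nomprint nosymbolgen;
```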

SAS macro language is somewhat controversial, so be careful what you say of your opinion of
it. To some managers, macro use is what distinguishes real SAS programmers from the
pretenders, but to others, relying on macros all the time is a sure sign of a lazy, fuzzy-headed
programmer. If you are pressed on this, it is probably safe to say that you are happy to work
with macros or without them, depending on what the situation calls for.

Procedure vs. macro

The question, “What is the difference between a procedure and a macro?” can catch you off
guard if it has never occurred to you to think of them as having anything in common. It can
mystify you in a completely different way if you have thought of procedures and macros as
interchangeable parts. You might mention:

 The difference between generating SAS code, as a macro usually does, and taking
action directly on SAS data, as a procedure usually does
 What it means, in terms of efficiency, for a procedure to be a compiled program
 The drastic differences in syntax between a proc step and a macro call
 The IMPORT and EXPORT procedures, which with some options generate SAS
statements much like a macro
 The %SYSFUNC macro function and %SYSCALL macro statement that allow a
macro to take action directly on SAS data, much like a procedure
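
A brief sketch of %SYSFUNC acting on data from macro code, assuming SASHELP.CLASS is
available. The macro variable names are made up.

```sas
%let today   = %sysfunc(today(), date9.);
%let has_cls = %sysfunc(exist(sashelp.class));   /* 1 if the data set exists */

%put Report date: &today;
%put SASHELP.CLASS exists: &has_cls;

/* Read a data set attribute from macro code, without a DATA or PROC step */
%let dsid = %sysfunc(open(sashelp.class));
%let nobs = %sysfunc(attrn(&dsid, nobs));
%let rc   = %sysfunc(close(&dsid));
%put SASHELP.CLASS has &nobs observations;
```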

Scope of macro variables

If the interviewer asks a question about the scope of macro variables or the significance of the
difference between local and global macro variables, the programming concept of scope is
being used to see how you handle the new ways of thinking that programming requires. The
possibility that the same name could be used for different things at different times is one of
the more basic philosophical conundrums in computer programming. If you can appreciate the
difference between a name and the object that the name refers to, then you can probably
handle all the other philosophical challenges of programming.

Run groups

Run-group procedures are not a big part of base SAS, so a question about run-group
processing and the difference between the RUN and QUIT statements probably has more to
do with:

 What a procedure is
 What a step is
 All the work SAS has to go through as it alternately acquires a part of the SAS
program from the execution queue, then executes that part of the program
 Connecting the program and the log messages
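
PROC DATASETS is one base SAS procedure that supports run groups. The sketch below uses two
throwaway data sets (work.a and work.b) created only for this example.

```sas
data work.a work.b;
  x = 1;
run;

proc datasets library=work nolist;
  change a = a_renamed;
run;                      /* executes the CHANGE group; the procedure stays active */
  delete b;
run;                      /* executes the DELETE group */
quit;                     /* ends the procedure */
```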

SAS date values


Questions about SAS date values have less to do with whether you have memorized the
reference point of January 1, 1960, than with whether you understand the implications of time
data treated as numeric values, such as:

 Using a date format to display the date variable in a meaningful way
 Computing a length of time by subtracting SAS date values
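
Both points can be seen in a few lines. The variable names below are made up for the sketch.

```sas
data _null_;
  ref_dt   = '01JAN1960'd;         /* reference point: stored as 0 */
  start_dt = '15MAR2020'd;
  stop_dt  = '20APR2020'd;
  days     = stop_dt - start_dt;   /* dates are numbers, so subtraction gives 36 days */
  put ref_dt= start_dt= days=;     /* unformatted: plain numbers */
  put start_dt= date9.;            /* formatted: 15MAR2020 */
run;
```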

Efficiency techniques

With today’s bigger, faster computers, efficiency is a major concern only for the very largest
SAS projects. If you get a series of technical questions about efficiency, it could mean one of
the following:

 The employer is undertaking a project with an especially large volume of data
 The designated computer is not one of today’s bigger, faster computers
 The project is weighed down with horrendously inefficient code, and they are hoping
you will be able to clean it all up

On the other hand, the interviewer may just be trying to gauge how well you understand the
way SAS statements correspond to the actions the computer takes or how seriously you take
the testing process for a program you write.

Debugger

Most SAS programmers never use the data step debugger, so questions about it are probably
intended to determine how you feel about debugging — does the debugging process bug you,
or is debugging one of the most essential things you do as a programmer?

Informats vs. formats

If you appreciate the distinction between informats and formats, it shows that:

 You can focus on details
 It doesn’t confuse you that two routines have the same name
 You have some idea of what is going on when a SAS program runs
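
As an illustration, the sketch below defines a user-written format and informat that share
the made-up name yesno. The informat is used for reading (INPUT) and the format for writing
(PUT).

```sas
proc format;
  value   yesno 1 = 'Yes' 0 = 'No';      /* format: number to text */
  invalue yesno 'Yes' = 1 'No' = 0;      /* informat: text to number */
run;

data _null_;
  n = input('Yes', yesno.);   /* informat reads the text as 1 */
  c = put(n, yesno.);         /* format writes 1 as Yes */
  put n= c=;
run;
```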

TRANSPOSE procedure

The TRANSPOSE procedure has a few important uses, but questions about it usually don’t
have that much to do with the procedure itself. The intriguing characteristic of the
TRANSPOSE procedure is that input data values determine the names of output variables.
The implication of this is that if the data values are incorrect, the program could end up with
the wrong output variables. In what other ways does a program depend on having valid or
correct data values as a starting point? What does it take to write a program that will run no
matter what input data values are supplied?
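
A minimal sketch of that behavior, using a made-up data set called long. The values of visit
become the names of the output variables, so a misspelled visit value would produce an
unexpected extra column.

```sas
data long;
  input subject $ visit $ score;
  datalines;
S1 week1 10
S1 week2 12
S2 week1 9
S2 week2 11
;

proc transpose data=long out=wide (drop=_name_);
  by subject;
  id visit;        /* data values week1 and week2 become variable names */
  var score;
run;

proc print data=wide;
run;
```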

_N_

Questions about the automatic variable _N_ (this might be pronounced “underscore N
underscore” or just “N”) are meant to get at your understanding of the automatic actions of the
data step, especially the automatic data step loop, also known as the observation loop.

A possible follow-up question asks how you can store the value of _N_ in the output SAS
dataset. If you can answer this, it may show that you know the properties of automatic
variables and know how to create a variable in the data step.
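
One way to answer the follow-up question, sketched with SASHELP.CLASS and a made-up variable
name obs_no:

```sas
data numbered;
  set sashelp.class;
  obs_no = _n_;     /* _N_ itself is never written out, so copy it to a new variable */
run;
```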

PUT function

A question about the PUT function might seem to be a trick question, but it is not meant to be.
Beyond showing that you aren’t confused by two things as different as a statement and a
function having the same name, your discussion of the PUT function can show:

 An understanding of what formats are
 Your experience in creating variables in data step statements
 A few of the finer points of SQL query optimization

Important SAS trivia

Some SAS trivia may be important to know in a technical interview, even though it may never
come up in your actual SAS programming work.

 MERGE is a data step statement only. There is no MERGE procedure. “PROC
MERGE” is a mythical construction created years ago by Rhena Seidman, and if you
are asked about it in a job interview, it is meant as a trick question.
 It is possible to use the MERGE statement without a BY statement, but this usually
occurs by mistake.
 SAS does not provide an easy way to create a procedure in a SAS program. However,
it is easy to define informats and formats and use them in the same program.
Beginning with SAS 9.2, the same is true of functions.
 The MEANS and SUMMARY procedures are identical except for the defaults for the
PRINT option and VAR statement.
 Much of the syntax of the TABULATE procedure is essentially the same as that of the
SUMMARY procedure.
 CARDS is another name for DATALINES (or vice versa).
 “DATA _NULL_” is commonly used as a code word to refer to data step
programming that creates print output or text data files.
 The program data vector (PDV) is a logical block of data that contains the variables
used in a data step or proc step. Variables are added to the program data vector in
order of appearance, and this is what determines their position (or variable number)
attribute.
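
The MEANS and SUMMARY point can be checked with a short sketch, assuming SASHELP.CLASS (whose
numeric variables are age, height and weight) is available. The two steps should print the
same statistics.

```sas
/* PROC MEANS prints by default and analyzes all numeric variables */
proc means data=sashelp.class;
run;

/* PROC SUMMARY needs PRINT and a VAR statement to do the same */
proc summary data=sashelp.class print;
  var age height weight;
run;
```
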

ABOUT FIRST PROJECT

My most recent work experience was at Genentech, where I worked on a schizophrenia drug in
the neuroscience therapeutic area. I was the primary protocol programmer for the study and
supported analysis for multiple protocols for the drug. I generated and validated multiple
datasets, tables and listings for safety and efficacy analysis. I have used various statistical
procedures such as GLM, MIXED, t-tests, CMH and Fisher's exact tests to generate p-values
and CIs. I worked on SCS/SCE integration analysis for the same drug and was also
involved in preparing SDTM and analysis datasets for the e-submission package. As
part of my regular day-to-day work I reviewed various documents such as the protocol, CRF and
SAP, as well as various company SOPs. I maintained good working relationships with fellow
programmers and statisticians.

2) My team consisted of 2 statisticians, one technical lead and 5 programmers. I was the
primary protocol programmer for the study and supported analysis for the multiple studies
conducted for the same drug.

3) I approached my lead with data management issues. For example, per the protocol the
ophthalmology exam (a safety parameter) had to be performed only at certain sites, but there
were records from sites that were not specified. The discharge procedures should have been
performed between study days 21 and 42, but the timing was not appropriate, and for some
discharged patients the discharge criteria (CGI score <= 3, is not violent) were not met.

I approached the statistician for clarification on the SAP/DPP or on any issues raised during
programming or validation activities, such as:

 Which values should be considered when severity is missing for an AE that started before
or during the treatment period. (This issue arose when I was programming the AE analysis
dataset and deriving the treatment-emergent flag.)

 When imputing missing values with the LOCF approach for efficacy parameters, whether we
should consider records with a missing visit and nonmissing values, and the records
created for the missing visits by LOCF, when deriving the analysis flag.

 Whether the approach of using the MMRM model is correct, i.e. whether the factors included
in the REPEATED statement are correct and whether the ESTIMATE statement (which calculates
the custom means) is constructed properly.

 How to align the digits for the visits when pooling the data from different studies to
create ISS/ISE datasets. (????)
4) My responsibilities as primary protocol programmer were to prepare the initial draft
of the specifications required for analyzing the study, and to coordinate the work
among the team members according to the study delivery plan (created by the lead) in order
to meet the deadlines. (The delivery plan is finalized by the statistician, the lead and
the TA representative.) I was the point of contact for the data management team and
supported the analysis for the clean database. I had to make sure that the validation summary
book (an Excel file listing the source programmer name, validator program name, dates the
programs were developed, purpose, outcome, validator comments, and resolved/not resolved)
was updated after each test run. I acted as liaison between the vendor and the company and
drafted the scheduled timelines for the delivery of the SDTM datasets, which had to be
reviewed and approved by the lead and the statistician. I coordinated the QC of SDTM
work among the programmers and made sure the comments column was updated
in the Excel tracker (which is sent by the vendor for reporting any discrepancies).
(We had a program testing phase with >30% clean data, test run 1 with >60%, and further
runs at intervals of 10 and 5 weeks prior to database close (all queries resolved) and
5) Developed dataset specifications. Generated and validated derived datasets for
analysis, tables and listings.
Generated derived datasets for efficacy parameters (PANSS, CGI-S, BPRS, CGI-I) and
safety parameters (AE, LB, ECG, VS, DS, CM, PE, MH, EPS scales).
Efficacy dataset: Has a structure of one record per parameter per visit per subject. Only
randomized subjects are considered for analysis. Derived variables: study day, baseline flag
(blflg), baseline value, change from baseline, analysis window (week 1 to week 6), analysis
flag, percentage change from baseline (change * 100 / baseline value), responder flag (>20%
from baseline), and observation type flag (obstype = 1 before imputation and 2 after
imputation).
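
A rough sketch of how some of these derivations might be coded. Everything here is an
assumption made for illustration: the input data, the variable names (usubjid, visitnum,
aval, base, chg, pchg, resp), visit 0 as baseline, carrying the baseline value forward under
LOCF, and the responder rule taken to mean a reduction of more than 20% from baseline.

```sas
/* Hypothetical input: one record per subject per visit, visit 0 = baseline */
data eff;
  input usubjid $ visitnum aval;
  datalines;
S1 0 90
S1 1 80
S1 2 .
S1 3 70
S2 0 100
S2 1 .
S2 2 95
;

data eff_derived;
  set eff;
  by usubjid visitnum;
  retain base last_val;
  if first.usubjid then do;
    base = .;
    last_val = .;
  end;
  if visitnum = 0 then base = aval;      /* baseline value */
  blflg = (visitnum = 0);                /* baseline flag */

  obstype = 1;                           /* observed record */
  if missing(aval) and not missing(last_val) then do;
    aval = last_val;                     /* LOCF: carry the last value forward */
    obstype = 2;                         /* imputed record */
  end;
  if not missing(aval) then last_val = aval;

  if visitnum > 0 and not missing(aval) and base not in (., 0) then do;
    chg  = aval - base;                  /* change from baseline */
    pchg = chg * 100 / base;             /* percentage change from baseline */
    resp = (pchg < -20);                 /* assumed rule: more than 20% reduction */
  end;
  drop last_val;
run;

proc print data=eff_derived;
run;
```
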
The values for the efficacy parameters are collected through structured clinical
interviews (by an experienced clinician or trained psychiatric rater) using a standard
questionnaire that has a numeric score for each question (item) for performing the analysis.
The primary outcome of the study is the total score obtained from the PANSS questionnaire
(which measures positive and negative syndrome symptoms). It has 30 items, each
scored on a 7-point scale (1 = absent, 2 = minimal, ..., 7 = extreme). The PANSS total score
parameter is obtained by adding the scores from all the items. All the questionnaires are
administered at scheduled visits in the treatment period.
The primary efficacy parameter is the change in the PANSS total score from
baseline to the last visit assessment in the treatment period. The analysis is based on the
intent-to-treat population. For the analysis of the primary efficacy parameter, a two-way
analysis of covariance (ANCOVA) model is used, with treatment group and study center as
factors and baseline PANSS score as the covariate. (………….need to do)

6) Created submission-ready datasets and Case Report Tabulations (CRTs), which include
Data Definition Tables and transport files for submissions.

Created submission-ready standards for analysis datasets. For these datasets the CDISC
ADaM model was not used; instead we had an internal GUI tool which, when given the
libname reference of the datasets, generated an Excel file with all the variable-level metadata
(memname, variable name, label, new variable name and an empty column for entering
comments) and two Word documents, appendix1.doc (formats) and appendix2.doc for
entering long columns (>200 characters in length). I filled in the comment columns for the
variables (blfl, analflg, treatment-emergent flag, study day, window, observed flag, …).

Then I ran the tool again, giving it the file reference of the CSV file (the Excel file with the
filled comments column), and generated XPORT format files and define.doc (dataset name,
structure, label, variable metadata). Finally, I ran a Word macro (inputs: define.doc,
appendix1.doc, appendix2.doc) which created define.pdf.
7) Reviewed mapping specifications for converting legacy data to SDTM datasets and
define.xml. Performed validation and QC of the created SDTM datasets for compliance with
the SDTM-IG by writing edit check programs.

According to the planned schedule, QC of the SDTM domains was divided among the
programmers. To validate the safety and efficacy domains we did parallel programming and
used PROC COMPARE for a 100% match. For the trial design (TV) datasets and special-purpose
datasets (DM, CO, SV, SE, SUPPQUAL) we developed edit check programs according to the
specifications and checked whether they were compliant with SDTM-IG 3.1.2. If there were
discrepancies, we entered comments in the Excel tracker and sent it to the vendor. The final
doc (????) is updated in the documentation tool. I validated the efficacy parameters domain
(QS), AE, LB and CM.
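
The parallel-programming check with PROC COMPARE might look like the sketch below. The data
sets prod and qc and their contents are made up; in practice qc would come from an
independently written program.

```sas
data prod;
  input usubjid $ aval;
  datalines;
S1 10
S2 20
;

data qc;           /* stands in for the independently programmed version */
  set prod;
run;

proc compare base=prod compare=qc listall;
  id usubjid;
run;

%let cmp_rc = &sysinfo;    /* capture immediately: 0 means an exact match */
%put PROC COMPARE return code: &cmp_rc;
```
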

QS:
1) Verified that all required variables (studyid, usubjid, qsseq, qstest, qstestcd) are present
and do not have missing values, and that variable names are <8 characters, values also <8,
labels <40 bytes and comments <200 (PROC CONTENTS macro; proc means data=&indata nmiss; run;).
2) Verified that dates are in ISO 8601 format (%macro(data= , var= )).
3) Verified that the values of qscat and qsscat are consistent throughout the dataset.
4) Checked whether standard results in character format are converted into numeric format.
5) The total and subtotal parameters should not have values for original results and should
have the derived flag set to Y.
6) The values of blfl are null or Y.
7) The non-standard variables are represented in the supplemental qualifiers dataset (observed
flag, responders, month, year, analysis flag, change from baseline, …).
8) Verified that all qstestcd values have code values and code text in the value-level
metadata of define.xml.
9) Verified whether the formats, variable-level metadata and dataset-level metadata are

8) Developed and tested various project-level and study-level macros for the analysis.
(will write the code)
1) To open a directory and see if a dataset is present.
2) To check if a variable is present in a dataset.
3) Analysis macros: baseline flag macro, PROC FREQ macro, PROC UNIVARIATE macros
(input= dataset, va= , output= statistics dataset) on discrete and continuous variables.
PROC SORT macro for sorting datasets with different BY variables. PROC PRINTTO,
4) Title and page number macros: give the number of titles and footnotes and create macro
variables from the titles and footnotes for a table. Count the number of pages and
5) Macro which generates efficacy tables for the efficacy parameters.
6) Macros for subgroup analysis.
7) Macro which generates a by-visit descriptive statistics table for the lab parameters.
8) Macro for generating graphs for efficacy parameters.
9) Macro which creates a disposition summary table for the people who discontinued
in the treatment period. In this we calculated p-values using chi-square and Fisher's exact
tests for each discontinuation reason to compare the treatment effects.
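
As one possible shape for the second macro in the list above (checking whether a variable is
present in a dataset), here is a sketch. The macro name var_exists and its parameters are
hypothetical and are not the macros actually used on the project.

```sas
%macro var_exists(ds, var);
  %local dsid pos rc;
  %let dsid = %sysfunc(open(&ds));
  %if &dsid %then %do;
    %let pos = %sysfunc(varnum(&dsid, &var));   /* 0 when the variable is absent */
    %let rc  = %sysfunc(close(&dsid));
  %end;
  %else %let pos = 0;                           /* data set could not be opened */
  %eval(&pos > 0)
%mend var_exists;

%put AGE in SASHELP.CLASS: %var_exists(sashelp.class, age);
%put XYZ in SASHELP.CLASS: %var_exists(sashelp.class, xyz);
```
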

9) Involved in monthly meetings with the clinical and data management teams.
We had monthly protocol meetings and biweekly team meetings. In protocol meetings we
(statistician, protocol manager, medical doctor, our team) discussed how the study was
being conducted, i.e. whether the data collected were accurate and compliant with the protocol,
whether any amendments were needed, and whether there were any data issues.
10) Amendments:
1) The CRF inclusion criteria form had two criteria:

INC13A) If female, patient is at least 2 years post-menopausal, surgically sterile, or
practicing a medically acceptable method of contraception (for at least 1
month before entry to study) and the patient agrees to continue to use the
same method throughout the study duration? (1 = Yes, 2 = No, 3 = Not applicable)
During edit checking, for sex = "M" some records had the value "No" and
some "Not applicable", so the form design was modified to consider only 2 choices (Yes, Not
applicable).

INC14A) If female and of childbearing potential, patient has a negative serum βHCG
pregnancy test at Visit 1?
1 ○ Yes
2 ○ No
98 ○ NA

2) Included one more criterion for the patient to be discharged (CGI-I > 2).
3) Ophthalmology test to be assessed at visit 10 as well (in addition to visit 1 and visit 8).

12) In team meetings we discussed programming issues,
the?????????????????????????

13)what kind of challenges you faced dealing with clinical study?

1) To use the repaeated and the estimate stmt s in proc mixed for analyzing the efficacy
parameters. I have a knowledge that Proc Mixed is used for the analysis of mixed
linear models (data exhibiting covariance and non constant variability (i.e) the model
consists of both fixed effects and random effects ) .Since, I never got an opportunity
to work on this procedure wasen’t sure of the construction of repeated and estimate
stmts are appropriate to the study design. So for the clarification approached my
statician and explained her my approach and got her comments and then implemented
the model and generated Pvalues , parameter estimates, lsmeans…

2) QC of SDTM data sets. Since it was my first experience to work on CDISC SDTM’S
standards I took it challenging and made analysis on how the study data collected will
be mapped to SDTM domains by following the standard implementation guide. This
ground work gave me more insights of mappings ,which helped me in giving my
comments ,inputs on the mapping specifications send made by the vendor.

According to SOP the QC of SDTM datsets is performed in sequential steps. In the


initial meetings with the vendors we had issues resolved on legacy data mappings to
SDTM domains.

a) The study collected medical history data in 3 datasets (1 - medical and surgical
history, 2 - psychiatric history, 3 - other psychiatric history). The vendor's
specification was to create one standard SDTM domain for the medical and surgical
history dataset and, for the rest, to create new domains with a structure similar to
the MH domain. But according to the implementation guide, you can map all the MH data
collected in the study into a single MH domain by setting the MHCAT variable to the
appropriate value (General medical history, psychiatric history, other psychiatric
history).
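A minimal sketch of the single-domain approach described above. The raw dataset names (`medhx`, `psychhx`, `othpsyhx`), the variable `term`, and the sample values are hypothetical; only MHCAT and the three category values come from the text.

```sas
/* Hypothetical raw history datasets, one per CRF page */
data medhx;
  length subjid 8 term $40;
  subjid = 101; term = 'Appendectomy'; output;
run;
data psychhx;
  length subjid 8 term $40;
  subjid = 101; term = 'Depression'; output;
run;
data othpsyhx;
  length subjid 8 term $40;
  subjid = 102; term = 'Insomnia'; output;
run;

/* Stack all three sources into one MH domain and tag the source with MHCAT */
data mh;
  length domain $2 mhcat $40 mhterm $40;
  set medhx(in=in1) psychhx(in=in2) othpsyhx(in=in3);
  domain = 'MH';
  mhterm = term;
  if in1 then mhcat = 'GENERAL MEDICAL HISTORY';
  else if in2 then mhcat = 'PSYCHIATRIC HISTORY';
  else if in3 then mhcat = 'OTHER PSYCHIATRIC HISTORY';
  drop term;
run;
```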

After the requirements were finalized, we had a final kickoff meeting in which the
timelines, the delivery schedule for the datasets and the schedule for responding to
comments were determined. It took 3 months to convert the legacy data to SDTM and
9-10 months to do the QC. The work was divided among the team members and it was a
collaborative team effort. (We did parallel programming for the key domains
(efficacy, AE, LB, DS, VS, ECG) and edit checks for the trial design model and
special purpose domains. We used PROC FREQ and PROC UNIVARIATE to get counts for the
variables in each dataset and verified them against the raw dataset, checking that
the total number of records matched.)

13) Ad hoc analysis: analysis performed on the study data that was either not
planned, requested by the FDA, or required by the medical doctors or statisticians.

#1) Generate a listing and a report for the subjects who took concomitant medications
within 15 days before or after an adverse event during the treatment period.

1) Select all the adverse events that have a start date in the treatment period.

2) proc sort data=ae out=ae1 nodupkey;
     by &bygrp aedecod;
   run;

3) proc sql noprint;
     create table aecm as
     select a.subjid, a.aedecod, a.aestdate, b.cmdecod, b.cmstdn
     from ae1 as a inner join cm as b
       on a.subjid = b.subjid
      and b.cmstdn between a.aestdn - 15 and a.aestdn + 15
     order by a.subjid;
   quit;

Report variables: PT, SOC, summarized by treatment group.

Listing variables: &bygrp., sex, ethnic, race, aestdate, soc/pt, cmclass/cmdecod

2) Were two concomitant medications taken at the same time? (resprine, loprazen)

a) Output the records for the two medications into two different datasets.

b) proc sql;
     select a.subjid, a.cmdecod as cm_re, a.cmstdn as st_re, a.cmendn as en_re,
            b.cmdecod as cm_lo, b.cmstdn as st_lo, b.cmendn as en_lo
     from cm1 as a inner join cm2 as b
       on a.subjid = b.subjid
      and b.cmstdn between a.cmstdn and a.cmendn
     order by a.subjid;
   quit;

c) Reports for different ranges (weight group, pansscore >= 90-120).

d) Report and listing of valid CMs in the treatment period.

e) AE table with columns SOC/PT and totals for the safety population.

14) Difference between reports and listings:

1) Reports give an overview of summary statistics (of the variables, by treatment
group, at the periods specified).

2) Listings explicitly display each subject record that meets the criteria we are
listing for.

15) Prioritization/busy days: We had to generate reports and listings for FDA requests
with short deadlines (approx. 30 in 3 days). At that point I had to plan and schedule
my work and put in extra hours, as I always give high priority to my company's and
team's objectives along with my own.

16) Different reports:

IND annual reports: Created RTF and PDF formats for these reports. Generated DM
(baseline summary statistics), DS (randomized but not treated, premature
discontinuations in the treatment period) and AE (summary for all enrolled subjects).
For ongoing and recently locked studies, the last 1 year of data is considered.

Clinical Trial Transparency report for NIH: Created AE (summary of SAEs and non-SAEs
for treated subjects) after the database lock (RTF format); for ongoing studies we
took the most recent 4 months of data and generated DM, AE and DS reports.

Data Monitoring Committee: We had to generate monthly reports for the treated
subjects, considering all the data up to the last day of the month. A DPP lists all
the listings and reports to be generated.

17) Issues with ISS/ISE: We integrated data from 3 phase 2 studies and 2 phase 3
studies.

The phase 2 studies had different doses and different comparators, and one study had
a longer duration. The two phase 3 studies had the same comparator, dose regimen and
duration, and were conducted to evaluate the safety and efficacy of the drug in
people with paranoid type and residual type.
I programmed the integration of the efficacy, safety (AE, VS, LB, ECG) and
demography datasets according to the specifications.

I-ADDM

a) Created a unique integrated study ID = (studyid + siteid + subjid).

b) Integrated ARMCD = to be discussed.

c) Integrated TRTCD = all subjects in the phase 2 studies who were treated with the
study drug are mapped to the same integrated treatment code. For the rest, different
integrated treatment codes were created from the codes in the single studies.

d) Visits from the individual studies are pooled, and for the study with the longer
duration the visit numbers are aligned (visit 11 = 70-80 days, visit 12 = 81-90).
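The visit alignment in item d) can be sketched as a simple windowing step. The dataset `vs_raw` and the variables `studyday` and `ivisit` are hypothetical names; only the two day ranges come from the text.

```sas
/* Hypothetical input: one record per assessment with its study day */
data vs_raw;
  input subjid studyday;
  datalines;
101 72
101 85
102 95
;
run;

/* Assign the integrated visit number from the study-day window */
data vs_win;
  set vs_raw;
  if 70 <= studyday <= 80 then ivisit = 11;
  else if 81 <= studyday <= 90 then ivisit = 12;
  else ivisit = .;   /* outside the two windows quoted in the text */
run;
```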

e) Different subgroups (BMI, weight, age, ethnicity, PANSS score, ...).

f) Created 3 different pool groups:

pool1 = the two phase 3 studies, used for generating integrated efficacy reports

pool2 = the two phase 3 studies and one phase 2 study, used for generating integrated
safety reports for safety parameters (EPS scales)

pool3 = all the study data pooled, used for generating integrated safety tables

I-ADAE

1) The structural elements of the MedDRA dictionary (HLGT → AECAT, HLT → AESCAT,
SOC → AEBODSYS, LLT → AETERM, PT → AEDECOD) needed to be updated to the latest
version.
2) Special characters had to be removed from the AETERMs of the different studies
before merging with the latest version of the coding dictionary.
3) The AEVER variable needed to be updated with the latest version.
4) Variable names that differed between studies (aedecod, ptdecod) were mapped to a
common variable called aedecod.
5) ISTUDYID and ITRTCD come from integrated demography and the rest of the variables
from the individual studies.
6) Epoch=?????????
7) Visitid=?????????

I-ADLB

1) The units ????????
2) ISTUDYID, ITRTCD from integrated demography and the
3) Get all the variables from the respective individual ADLB datasets.
4) The PCS from individual studies.

Under what circumstances would you code a SELECT construct instead of IF statements?
Ans: if there are many mutually exclusive conditions.
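A small sketch of a SELECT group handling mutually exclusive conditions. The dataset `graded` and the variables `severity` and `sevtxt` are hypothetical names chosen for the example.

```sas
data graded;
  input subjid severity;
  length sevtxt $8;
  /* One SELECT group replaces a chain of IF-THEN/ELSE statements */
  select (severity);
    when (1) sevtxt = 'Mild';
    when (2) sevtxt = 'Moderate';
    when (3) sevtxt = 'Severe';
    otherwise sevtxt = 'Unknown';   /* catches any value not listed above */
  end;
  datalines;
101 1
102 3
103 7
;
run;
```

Without an OTHERWISE statement, SAS stops with an error when no WHEN condition is met, so it is usually worth including one.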

What statement do you code to tell SAS that it is to write to an external file?

What statement do you code to write the record to the file?

Ans: The FILE statement names the external file and the PUT statement writes the
record:

data _null_;
  set xxx;
  file 'YYY';
  put vv ii nnoodd;
run;

If reading an external file to produce an external file, what is the shortcut to write that record
without coding every single variable on the record?
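One common shortcut is the `_INFILE_` automatic variable, which holds the current input record. A minimal sketch, where the two file paths are hypothetical placeholders:

```sas
filename rawin  'input.txt';    /* hypothetical source file */
filename rawout 'output.txt';   /* hypothetical target file */

data _null_;
  infile rawin;
  file rawout;
  input;          /* load the record into the input buffer, naming no variables */
  put _infile_;   /* write the whole record back out unchanged */
run;
```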

If you are not wanting any SAS output from a data step, how would you code the data
statement to prevent SAS from producing a set? What is the one statement to set the criteria of
data that can be coded in any step?

Ans: Code DATA _NULL_; to avoid producing a data set. The WHERE statement sets the
criteria and can be coded in any step.

Have you ever linked SAS code? If so, describe the link and any required statements
used to either process the code or the step itself.

How would you include common or reuse code to be processed along with your
statements?

When looking for data contained in a character string of 150 bytes, which function is
the best to locate that data: scan, index, or indexc?

If you have a data set that contains 100 variables, but you need only five of those,
what is the code to force SAS to use only those variables?

Code a PROC SORT on a data set containing State, District and County as the primary
variables, along with several numeric variables.

How would you delete duplicate observations?

How would you delete observations with duplicate keys?

How would you code a merge that will keep only the observations that have matches
from both sets?
Ans:
data xxx;
  merge yyy(in=inyyy) zzz(in=inzzz);
  by aaa;
  if inyyy = 1 and inzzz = 1;
run;

How would you code a merge that will write the matches of both to one data set, the
non-matches from the left-most data set to a second data set, and the non-matches of
the right-most data set to a third data set?

What is the Program Data Vector (PDV)? What are its functions?
Ans: It is the area of memory that stores the current observation.

Does SAS 'translate' (compile) or does it 'interpret'? Explain.
Ans: Compile.

At compile time when a SAS data set is read, what items are created?
Ans: The descriptor portion of the data set and the PDV.

Name statements that are recognized at compile time only.
Ans: Declarative statements such as LENGTH, KEEP, DROP, RETAIN, ARRAY, FORMAT and
LABEL. (PUT is an executable statement.)

Identify statements whose placement in the DATA step is critical.
Ans: DATA, RUN.

Name statements that function at both compile and execution time.
Ans: INPUT.

Name statements that are execution only.

In the flow of DATA step processing, what is the first action in a typical DATA step?
Ans: Set the variables to missing.

What is _n_?
Ans: An automatic SAS variable that counts DATA step iterations, which usually
matches the number of observations read.

5. Mohan, posted 11/22/2006 at 11:12 am
If you have a data set that contains 100 variables, but you need only five of those,
what is the code to force SAS to use only those variables?
Ans: use the data step option OBS=5
Code a PROC SORT on a data set containing State, District and County as the primary
variables, along with several numeric variables.
Ans: PROC SORT data= out= ; BY Country State District; RUN;
How would you delete duplicate observations?
Ans: use PROC SORT with the option NODUP
How would you delete observations with duplicate keys?
Ans: use PROC SORT with the option NODUPKEY

6. Mohan, posted 11/22/2006 at 11:15 am
If you have a data set that contains 100 variables, but you need only five of those,
what is the code to force SAS to use only those variables?
Ans: Sorry, my apologies for the above answer to this question. The correct answer is
to use the KEEP= option in the SET statement of the DATA step.

7. thiyagarajan, posted 12/20/2006 at 8:38 am
Hi All,
1) Can you please tell me the list of compile-time statements?
2) I have 100 observations; from that I want to create 5 datasets like a1, a2, a3,
a4, a5. Each data set should contain 20 observations. For example, a1 would contain
obs 1 to 20 and a2 would contain 21-40, and so on… waiting for your comments.

8. Kishore Kumar.K, posted 12/26/2006 at 12:40 am
13. What is the one statement to set the criteria of data that can be coded in any
step?
Ans: Options

9. srinivas.MP, posted 1/2/2007 at 10:07 am
2) I have 100 observations; from that I want to create 5 datasets like a1, a2, a3,
a4, a5. Each data set should contain 20 observations. For example, a1 would contain
obs 1 to 20 and a2 would contain 21-40, and so on…
data a1 a2 a3 a4 a5;set ;if obs>=1 and =21 and =41 and =61 and

10. kavitha, posted 1/18/2007 at 11:21 am
Hi, is the following code right for the following question?
How would you code a merge that will write the matches of both to one data set, the
non-matches from the left-most data set to a second data set, and the non-matches of
the right-most data set to a third data set?
data one two three;
maerge a (in=ina) an b=(inb) by xx;
if ina=1 and inb=1 then proc print data=one;
if ina=1 and inb=0 then proc print data=two;
if ina=0 and inb=1 then proc print data = three;
run;

11. kavitha, posted 1/18/2007 at 11:31 am
I have 100 observations; from that I want to create 5 datasets like a1, a2, a3, a4,
a5. Each data set should contain 20 observations. For example, a1 would contain obs 1
to 20 and a2 would contain 21-40, and so on…
Let us say x is the data set that had all the 199 observations…
data a1 a2 a3 a4 a5;
set x;
if firstobs=1 and obs=20 then proc print data =a1;
if firstobs=21 and obs=40 then proc print data =a2;
if firstobs=41 and obs=60 then proc print data =a3;
if firstobs=61 and obs=80 then proc print data =a4;
if firstobs=81 and obs=100 then proc print data =a5;
run;
If you do not want the output in the log file you do not have to use proc print.

12. kavitha, posted 1/18/2007 at 11:38 am
At compile time when a SAS data set is read, what items are created?
1. Input buffer
2. Program Data Vector (PDV)
3. Descriptor information

13. kavitha, posted 1/18/2007 at 11:46 am
When looking for data contained in a character string of 150 bytes, which function is
the best to locate that data: scan, index, or indexc?
Ans: INDEX will give you the position.

14. kavitha, posted 1/19/2007 at 11:51 am
Can anyone please tell me if there is a SAS user group or discussion forum near
Fairfax, VA?

15. Kishore Kumar.Kandikatla, posted 3/25/2007 at 2:14 pm
13. What is the one statement to set the criteria of data that can be coded in any
step?
Ans: Options

16. Kishore Kumar.Kandikatla, posted 3/25/2007 at 2:18 pm
How do you read in the variables that you need?
Using the INPUT statement.

17. Arun Mathura.G, posted 4/3/2007 at 6:48 am
I have 100 observations; from that I want to create 5 datasets like a1, a2, a3, a4,
a5. Each data set should contain 20 observations. For example, a1 would contain obs 1
to 20 and a2 would contain 21-40, and so on…
Answer:
data a1 a2 a3 a4 a5;set x;if _N_ >=1 and _N_ =21 and _N_ =41 and _N_ =61 and _N_ =81 and _N_

18. hi, posted 4/4/2007 at 1:04 am
data a1 a2 a3;set x;if _n_ >=1 and _n_ =21 and _n_ =41 and _n_

19. SIVA, posted 4/10/2007 at 7:08 am
data a1 a2 a3 a4 a5;set x;select;when (_n_ le 20) output a1;when (20

20. siva, posted 4/10/2007 at 7:13 am
At compile time when a SAS data set is read, what items are created?
While reading a SAS data set the input buffer will not be created. It is created
only when external data is read. So when a SAS data set is read, only the PDV and
the descriptor portion are created.

21. rao, posted 4/17/2007 at 9:49 pm
1) What is the difference between SAS V8.0 and SAS V9.0?
2) When you are creating a macro variable, where do you have to store it?
3) What is the difference between SAS debugging and macro debugging?
4) What is the feature of SET statements?
5) What is the feature of MERGE statements?

22. Dharsha, posted 5/1/2007 at 8:51 am
How would you code a merge that will keep only the observations that have matches
from both sets?
Ans) Suppose there are two files defined, A and B. Based on the key (ID) you need
only the records that match in both datasets to be written to another file.
data infilea;
  infile infilea;
  input @1 id $char10. @11 datafield $char100.;
run;
proc sort data=infilea; by id; run;
data infileb;
  infile infileb;
  input @1 id $char10. @11 datafield $char100.;
run;
proc sort data=infileb; by id; run;
data _null_;
  merge infilea(in=a) infileb(in=b);
  by id;
  if a = b then do;
    file outfile;
    put @1 id $char10. @11 datafield $char100.;
  end;
run;

23. Kiran, posted 5/2/2007 at 5:42 am
How would you code a merge that will write the matches of both to one data set, the
non-matches from the left-most data set to a second data set, and the non-matches of
the right-most data set to a third data set?
data one two three;
  merge a(in=ina) b(in=inb);
  by xx;
  if ina=1 and inb=1 then output one;
  if ina=1 and inb=0 then output two;
  if ina=0 and inb=1 then output three;
run;

24. Montacer, posted 5/3/2007 at 7:09 am
I would like to estimate a Tobit model on panel data. PROC QLIM is supposed to be
the appropriate procedure, but I did not find any details on how to tell PROC QLIM
that I am using panel data. Would you mind helping me, please?

25. sannidhi, posted 5/14/2007 at 9:04 pm
Hi,
%let A = 3;
%let B = 4;
%let C = &A + &B;
%let C = &C;
What is the value of the %PUT statement? Please give me the answer.

26. ravi, posted 5/21/2007 at 11:44 am
Hi,
%let A = 3;
%let B = 4;
%let C = &A + &B;
%let C = &C;
What is the value of the %PUT statement? Please give me the answer.
c=3+4

27. Paul, posted 5/25/2007 at 4:50 pm
Also, firstobs and obs are dataset control options, not execution-time statements.
This code will never work. A more feasible answer is to do this:
%*sets up a bogus datasource;
data x;do i=1 to 100;xx='Your Data';end;run;
%*accomplishes the task;
data a1 a2 a3 a4 a5;set x;cnt+1; put cnt=;if 0

kavitha said,
I have 100 observations; from that I want to create 5 datasets like a1, a2, a3, a4,
a5. Each data set should contain 20 observations. For example, a1 would contain obs 1
to 20 and a2 would contain 21-40, and so on…
Let us say x is the data set that had all the 199 observations…
data a1 a2 a3 a4 a5;
set x;
if firstobs=1 and obs=20 then proc print data =a1;
if firstobs=21 and obs=40 then proc print data =a2;
if firstobs=41 and obs=60 then proc print data =a3;
if firstobs=61 and obs=80 then proc print data =a4;
if firstobs=81 and obs=100 then proc print data =a5;
run;
If you do not want the output in the log file you do not have to use proc print.

28. Paul, posted 5/25/2007 at 4:51 pm
%*accomplishes the task;
data a1 a2 a3 a4 a5;set x;cnt+1; put cnt=;if 0

29. bharath, posted 7/16/2007 at 12:57 am
Hi, this is bharath. I am learning SAS; will anyone give me the answers to these
questions? Please forward them to me.
1. Name statements that function at both compile and execution time.
2. Name statements that are execution only.
3. What is the difference between var A1-A4 and var A1--A4?
4. What is the order of evaluation of the comparison operators: + - * / ** ( )?
5. What is the significance of the 'OF' in x = sum(OF a1-a4, a6, a9);?

30. kishorekumarkandikatla, posted 7/28/2007 at 2:21 pm
Hi rao, here are my comments:
1) What is the difference between SAS V8.0 and SAS V9.0?
Ans: You can take this as one: SAS v9(x) is a more improved version from SAS with
respect to BI tools, with many advanced features. SAS v8.0 has limited features and
is not that much concentrated on BI tools.
2) When you are creating a macro variable, where do you have to store it?
Ans: It depends on you. If you want to make them reusable components, you can save
them in one particular location and call them by using %include.
3) What is the difference between SAS debugging and macro debugging?
Ans: Debugging is debugging. No idea.
4) What is the feature of SET statements?
Ans: You can apply transformations and you can even load the data. One point of
importance: it is a technique to improve the efficiency of SAS programs.
5) What is the feature of MERGE statements?
One simple answer is that you can perform all types of joins.

31. nvv, posted 8/14/2007 at 7:18 am
How would you code a merge that will write the matches of both to one data set, the
non-matches from the left-most data set to a second data set, and the non-matches of
the right-most data set to a third data set?
THIS NOT WORKING GUYS… PLZ CHECK IT
filename x 'C:\Documents and Settings\vinayak\Desktop\v.txt';
data a1 a2 ;
infile x;
if firstobs=1 and obs=20 then proc print data=a1;
if firstobs=21 and obs=40 then proc print data=a2;
run;
proc print data=a1 a2;run;

32. Gopi, posted 8/19/2007 at 9:15 am
Hi All, can anyone explain PROC LIFETEST and PROC GLM?

33. madhu, posted 9/21/2007 at 8:26 am
Q: What is the difference between DSD and MISSOVER?
Q: Is it possible to create a data set without a data set name?

34. AKA, posted 12/11/2007 at 6:10 pm
Q: What is the difference between DSD and MISSOVER?
MISSOVER prevents a SAS program from going to a new input line if it does not find
values in the current line for all the input variables.
Q: Is it possible to create a data set without a data set name?
data _NULL_;

35. AKA, posted 12/11/2007 at 6:55 pm
# How does SAS handle missing values in: assignment statements, functions, a merge,
an update, sort order, formats, PROCs?
# How many missing values are available? When might you use them?
# How do you test for missing values?
# How are numeric and character missing values represented internally?

36. AKA, posted 12/11/2007 at 6:56 pm
# How do you make use of functions?
# When looking for data contained in a character string of 150 bytes, which function
is the best to locate that data: scan, index, or indexc?
# What is the significance of the 'OF' in X=SUM(OF a1-a4, a6, a9);?
# What do the PUT and INPUT functions do?
# Which date function advances a date, time or date/time value by a given interval?
# What do the MOD and INT functions do?
# How might you use MOD and INT on numerics to mimic SUBSTR on character strings?
# In ARRAY processing, what does the DIM function do?
# How would you determine the number of missing or nonmissing values in computations?
# What is the difference between: x=a+b+c+d; and x=SUM(a,b,c,d);?
# There is a field containing a date. It needs to be displayed in the format
"ddmonyy" if it's before 1975, "ddmonccyy" if it's after 1985, and as 'Disco Years'
if it's between 1975 and 1985. How would you accomplish this in data step code? Using
only PROC FORMAT.
# In the following DATA step, what is needed for 'fraction' to print to the log?
data _null_; x=1/3; if x=.3333 then put 'fraction'; run;
# What is the difference between calculating the 'mean' using the mean function and
PROC MEANS?

37. Gowri, posted 1/10/2008 at 2:55 am
When looking for data contained in a character string of 150 bytes, which function is
the best to locate that data: scan, index, or indexc?
Ans: The INDEX function. SCAN just substrings the data and INDEXC looks for one
character.
# What is the significance of the 'OF' in X=SUM(OF a1-a4, a6, a9);?
Ans: Without OF this will be considered as subtraction. I mean a4 will be subtracted
from a1.
# What do the PUT and INPUT functions do?
Ans: PUT converts numeric to character and INPUT converts character to numeric
explicitly.
# Which date function advances a date, time or date/time value by a given interval?
Ans: INTNX
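The three answers above can be tried out in one short DATA step. This is only a sketch; every variable name and value below is made up for the illustration.

```sas
data _null_;
  a1 = 1; a2 = 2; a3 = .; a4 = 4; a6 = 6; a9 = 9;
  total_of  = sum(of a1-a4, a6, a9);   /* OF makes a1-a4 a variable list; missing a3 is ignored */
  total_sub = sum(a1-a4, a6, a9);      /* without OF: (a1 minus a4) + a6 + a9 */
  added     = a1 + a2 + a3 + a4;       /* the + operator propagates the missing a3 */
  numtxt  = put(1234.5, 8.2);          /* PUT: numeric to character, using a format */
  txtnum  = input('1234.5', 8.);       /* INPUT: character to numeric, using an informat */
  nextmon = intnx('month', '15JAN2008'd, 1);   /* start of the following month */
  put total_of= total_sub= added= numtxt= txtnum= nextmon= date9.;
run;
```

Comparing `total_of` with `added` in the log also illustrates the x=a+b+c+d versus x=SUM(a,b,c,d) question raised in the same post.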

In the following DATA step, what is needed for 'fraction' to print to the log?
data _null_; x=1/3; if x=.3333 then put 'fraction'; run;
Ans: The FRACTw. format can be used to accomplish this.

38. David, posted 5/1/2008 at 10:26 pm
1. What SAS statements would you code to read an external raw data file to a DATA
step?
Answer: The INFILE statement is used to read the external raw data file into a DATA
step.
2. How do you read in the variables that you need?
Answer: List the variable names after the INPUT statement.
3. Are you familiar with special input delimiters? How are they used?
Answer: Yes. Use the INPUT statement with the DLM= option and the DSD option to read
a delimited file.
4. If reading a variable length file with fixed input, how would you prevent SAS
from reading the next record if the last variable didn't have a value?
Answer: Use the MISSOVER option in the INFILE statement.
5. What is the difference between an informat and a format? Name three informats or
formats.
Answer: An informat gives SAS special instructions for reading a variable, and a
format gives SAS special instructions for writing a variable. For example: DATEw.,
w., w.d
6. Name and describe three SAS functions that you have used, if any.
Answer: DAY(date), TODAY(), SUBSTR(ARG,POSITION,N), SUM()
7. How would you code the criteria to restrict the output to be produced?
Answer: Using the WHERE statement.
8. What is the purpose of the trailing @? The @@? How would you use them?
Answer: Both are line-hold specifiers; the difference is how long they hold a line
for input. The trailing @ holds the line of raw data for subsequent INPUT
statements but releases that line when SAS returns to the top of the DATA step. The
trailing @@ holds the line even when SAS starts building a new observation.
9. Under what circumstances would you code a SELECT construct instead of IF
statements?
Answer:
10. What statement do you code to tell SAS that it is to write to an external file?
Answer: Using the _null_ statement.
11. What statement do you code to write the record to the file?
Answer: Using the FILE and PUT statements.
12. If reading an external file to produce an external file, what is the shortcut to
write that record without coding every single variable on the record?
Answer: Using the EXPORT procedure.
13. If you're not wanting any SAS output from a data step, how would you code the
data statement to prevent SAS from producing a set?
Answer: _NULL_
14. What is the one statement to set the criteria of data that can be coded in any
step?
Answer: The WHERE statement.
15. Have you ever linked SAS code? If so, describe the link and any required
statements used to either process the code or the step itself.
16. How would you include common or reuse code to be processed along with your
statements?
Answer: Macro

39. jyotsna.G, posted 6/25/2008 at 5:15 am
1. What is the difference between DSD and MISSOVER?
2. Give some information about _IOSC_.
3. What is the role of _n_ and _null_?

40. ajaysinha, posted 6/27/2008 at 8:36 pm
1) What SAS statements would you code to read an external raw data file to a DATA
step?
ans) INFILE statement
2) How do you read in the variables that you need?
ans) Use VAR, like: proc print data=er; var num; run;
3) Are you familiar with special input delimiters? How are they used?
ans) Yes. dlm=','; or dlm=' '; etc.
4) If reading a variable length file with fixed input, how would you prevent SAS
from reading the next record if the last variable didn't have a value?
ans) Use a specific length or format, like: input num 9. char $5.;

5) What is the difference between an informat and a format? Name three informats or
formats.
ans) An informat is used to print the data from the dataset by omitting some values,
whereas a format reads data differently from what is actually stored in the dataset
by adding our own values. date, ddmmyy and worddate

6) Name and describe three SAS functions that you have used, if any.
ans) a=sum(a,b,c); s=log(a); d=lag(a);

7) How would you code the criteria to restrict the output to be produced?
ans) Use a format.

8) What is the purpose of the trailing @? The @@? How would you use them?
ans) @ is a column pointer to get a specific value, while @@ is used if there are
fewer variables than values and you want to read each value.
input @ 'content:' a 9. b 5.;
input a b c @@;
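The two line-hold specifiers discussed in this thread can be sketched as follows. The dataset names, variable names and data lines are made up for the illustration.

```sas
/* Trailing @: hold the record so a second INPUT can read it conditionally */
data visits;
  input rectype $1. @;
  if rectype = 'V' then do;
    input @3 subjid 3. @7 visit 2.;
    output;
  end;
  /* a record that is not 'V' is released when the step iterates */
  datalines;
H header line
V 101  1
V 102  2
;
run;

/* Trailing @@: hold the record across iterations, several observations per line */
data scores;
  input subjid score @@;
  datalines;
101 88 102 92 103 75
104 60
;
run;
```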

9) Under what circumstances would you code a SELECT construct instead of IF
statements?
ans) When you want to use a group.

10) What statement do you code to tell SAS that it is to write to an external file?
ans) ODS or datafile.

11) What statement do you code to write the record to the file?
ans) DDE
12) If reading an external file to produce an external file, what is the shortcut to
write that record without coding every single variable on the record?
ans) DDE and ODS
13) If you're not wanting any SAS output from a data step, how would you code the
data statement to prevent SAS from producing a set?
ans) data _null_;
14) What is the one statement to set the criteria of data that can be coded in any
step?
ans) set
15) Have you ever linked SAS code? If so, describe the link and any required
statements used to either process the code or the step itself.
ans) Use a macro.
16) How would you include common or reuse code to be processed along with your
statements?
ans) Using a format.
17) When looking for data contained in a character string of 150 bytes, which
function is the best to locate that data: scan, index, or indexc?
ans) scan
18) If you have a data set that contains 100 variables, but you need only five of
those, what is the code to force SAS to use only those variables?
ans) In the INPUT statement use only the names of the variables that you want.
19) Code a PROC SORT on a data set containing State, District and County as the
primary variables, along with several numeric variables.
ans) proc sort data=q; by state; run; and similarly for the other statements.

20) How would you delete duplicate observations?
ans) Using dupkey.

21) How would you code a merge that will keep only the observations that have
matches from both sets?
ans) Using merge and dupkey.
22) How would you code a merge that will write the matches of both to one data set,
the non-matches from the left-most data set to a second data set, and the
non-matches of the right-most data set to a third data set?
ans) merge, dupkey and nodupkey
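For question 22, a minimal sketch using IN= variables and OUTPUT, which is the approach Kiran's post in this thread takes. The toy datasets `a` and `b`, the key `xx` and the values are assumptions made for the example.

```sas
data a;
  input xx va;
  datalines;
1 10
2 20
3 30
;
run;

data b;
  input xx vb;
  datalines;
2 200
3 300
4 400
;
run;

/* MERGE with BY needs both inputs sorted by the key; the toy data already are */
data one two three;
  merge a(in=ina) b(in=inb);
  by xx;
  if ina and inb then output one;   /* matches from both data sets */
  else if ina then output two;      /* only in the left-most data set */
  else output three;                /* only in the right-most data set */
run;
```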

23) What is the Program Data Vector (PDV)? What are its functions?
ans) The memory that SAS creates in the compilation stage while running the SAS
program is called the PDV.

24) Does SAS 'translate' (compile) or does it 'interpret'? Explain.
ans) Compile, with creating the PDV.

25) At compile time when a SAS data set is read, what items are created?
ans) variables

26) Name statements that are recognized at compile time only.
ans) data

27) Identify statements whose placement in the DATA step is critical.
ans) run;

28) Name statements that function at both compile and execution time.
ans) run;

29) Name statements that are execution only.
ans) proc;

30) In the flow of DATA step processing, what is the first action in a typical DATA
step?
ans) Creating the name of the dataset.

31) What is _n_?
ans) new line

41. San, posted 8/4/2008 at 3:32 am
I have given the correct answers for some questions. David and Ajay have already
given the correct answers for the other questions.
4) If reading a variable length file with fixed input, how would you prevent SAS
from reading the next record if the last variable didn't have a value?
Use MISSOVER while the file is read:
INFILE datalines MISSOVER;
5) What is the difference between an informat and a format? Name three informats or
formats.
A format is used to print the output in the desired format, e.g. date9., comma9.,
DDMMYY10. An informat is used with the input file and helps to read the data in the
desired format for use inside the program. Informats can be used with the INPUT()
function, and formats with the PUT() function.
8) What is the purpose of the trailing @? The @@? How would you use them?
SAS provides two line-hold specifiers. The trailing at sign (@) holds the input
record for the execution of the next INPUT statement. The double trailing at sign
(@@) holds the input record for the execution of the next INPUT statement, even
across iterations of the DATA step.
input Name $20. @; or input Name $20. @@;
9) Under what circumstances would you code a SELECT construct instead of IF
statements?
When you have a long series of mutually exclusive conditions and the comparison is
numeric, using a SELECT group is slightly more efficient than using a series of
IF-THEN or IF-THEN/ELSE statements because CPU time is reduced. SELECT groups also
make the program easier to read and debug.
10) What statement do you code to tell SAS that it is to write to an external file?
Using the FILE statement we can write the data to the external file. INFILE is for
reading the data from the external file.
11) What statement do you code to write the record to the file?
Using the PUT statement or OUTPUT statement.
18) If you have a data set that contains 100 variables, but you need only five of
those, what is the code to force SAS to use only those variables?
List input.
19) Code a PROC SORT on a data set containing State, District and County as the
primary variables, along with several numeric variables.
proc sort data=q; by state District County; run;
31) What is _n_?
The number of times the data step has executed.

42. saibabu, posted 9/8/2008 at 3:57 am
Hi, I have 100 observations; from that I want to create 5 datasets like a1, a2, a3,
a4, a5. Each data set should contain 20 observations. For example, a1 would contain
obs 1 to 20 and a2 would contain 21-40, and so on…
data a1 a2 a3 a4 a5;
  set xxx;
  if 1 le _n_ le 20 then output a1;
  if 21 le _n_ le 40 then output a2;
  if 41 le _n_ le 60 then output a3;
  if 61 le _n_ le 80 then output a4;
  if 81 le _n_ le 100 then output a5;
run;
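Several posts in this thread were cut off mid-code, so here is a self-contained sketch of the 100-observation split that can be run as is. The source dataset `x` and its variable `i` are test data invented for the example; the OUTPUT inside the DO loop is what actually writes 100 rows.

```sas
/* Build a 100-row test dataset */
data x;
  do i = 1 to 100;
    output;   /* without this, only one row would be written, at the end of the step */
  end;
run;

/* Route each block of 20 observations to its own dataset using _N_ */
data a1 a2 a3 a4 a5;
  set x;
  if      _n_ <= 20 then output a1;
  else if _n_ <= 40 then output a2;
  else if _n_ <= 60 then output a3;
  else if _n_ <= 80 then output a4;
  else                   output a5;
run;
```

The ELSE chain stops testing once a row has been routed, which is a small saving over five independent IF statements.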
