SAS-INTERVIEW QUESTIONS

1. What SAS statements would you code to read an external raw
data file to a DATA step?
Ans: The INFILE and INPUT statements are used to read an
external raw data file into a DATA step.
2. How do you read in the variables that you need?
Ans: When reading raw data, name only the variables we want
in the INPUT statement. When reading a SAS data set,
use the KEEP= or DROP= data set option.
3. Are you familiar with special input delimiters? How are
they used?
Ans: Yes. DLM= (DELIMITER=) and DSD are options on the INFILE
statement that control how list input splits a record
into fields.
DLM= names the character or characters that separate
values, for example DLM=',' or DLM='|'. If you list
several characters, each one is treated as a delimiter
on its own; to use a multi-character string as a single
delimiter, use DLMSTR= instead.
DSD (delimiter-sensitive data) changes the default
delimiter to a comma, treats two consecutive delimiters
as a missing value, and removes quotation marks from
quoted values, so a delimiter inside quotes is read
as data.
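A minimal sketch of the two options working together. The data set name, variable names and data lines are made up for illustration; the second record has an empty field between two commas.

```sas
/* Hypothetical comma-delimited data; Bob has no test1 score */
data scores;
    infile datalines dsd dlm=',';
    input name :$10. test1 test2;
    datalines;
Ann,90,85
Bob,,70
;
run;

proc print data=scores;
run;
```

With DSD in effect, the two consecutive commas on the second line are read as a missing value for test1 instead of being collapsed into a single delimiter.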
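4. If reading a variable length file with fixed input, how
would you prevent SAS from reading the next record if the
last variable didn't have a value?
Ans: We can use the MISSOVER option in the INFILE statement.
It sets the remaining variables to missing instead of
moving to the next record. With fixed (column or
formatted) input, TRUNCOVER is often the safer choice
because it also keeps a value that is shorter than the
field width.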
5. What is the difference between an informat and a format?
Name three informats or formats.
Ans: An informat is an instruction that SAS uses to read
data values into a variable.
A format is an instruction that SAS uses to write
data values.
Three informats, one from each category:
A) Date informat, e.g. MMDDYY10.
B) Character informat, e.g. $CHAR10.
C) Numeric informat, e.g. COMMA8.
Three formats, one from each category:
A) Date format, e.g. DATE9.
B) Character format, e.g. $UPCASE10.
C) Numeric format, e.g. DOLLAR10.2

6. Name and describe three SAS functions that you have used,
if any.
Ans:
A) SUM function: It adds its arguments together, ignoring
any missing values.
E.g.: Var = SUM(var1, var2, ..., varn);
Var1 = SUM(1, ., 3);   /* Var1 = 4 */
B) MEAN function: This function returns the arithmetic
mean (average) of its arguments and ignores missing values.
E.g.: Var = MEAN(var1, var2, var3, ..., varn);
C) SUBSTR function: The SUBSTR function extracts a
portion of a character value, beginning at a start
position and continuing for a given number of characters.
E.g.: Var = SUBSTR(var, start <, number-of-characters>);
Var1 = SUBSTR('ASHOK', 1, 3);
In the above example the SUBSTR function takes the
string ASHOK, starts at position 1, extracts 3
characters and stores ASH in Var1.

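7. How would you code the criteria to restrict the output
to be produced?
Ans: To restrict which observations are processed, use a
WHERE statement or a subsetting IF. To restrict which
pieces of procedure output are produced, use ODS SELECT
or ODS EXCLUDE. The statement
ods output close;
closes the ODS OUTPUT destination.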

8. What is the purpose of the trailing @? The @@? How would
you use them?
Ans: The trailing @ is a line-hold specifier (it is not a
column pointer). Placing a trailing @ at the end of an
INPUT statement gives you the ability to read a part of
your raw data line, test it, and then decide how to read
additional data from the same record.
The single trailing @ tells SAS to hold the line for the
next INPUT statement in the same iteration of the DATA
step; the line is released when the iteration ends.
The double trailing @@ tells SAS to hold the line across
iterations of the DATA step.
NOTE: An INPUT statement ending with @@ instructs the program
to release the current raw data line only when there are no
data values left to be read from that line. The @@, therefore,
holds the input record even across multiple iterations of the
DATA step.
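A short sketch of both line-hold specifiers. The data set names, variables and data lines are hypothetical. The first step reads a record type, holds the line with @, and then chooses how to read the rest; the second uses @@ to read several observations from one line.

```sas
/* Trailing @: test part of the record before reading the rest */
data visits;
    length code $ 8;
    input type $ @;                     /* hold the current line */
    if type = 'A' then input id amount;
    else input id code $;
    datalines;
A 101 250
B 102 XY
;
run;

/* Trailing @@: several observations per raw data line */
data pairs;
    input x y @@;
    datalines;
1 2 3 4 5 6
;
run;
```

The second step creates three observations from the single data line, one for each x y pair.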
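9. Under what circumstances would you code a SELECT construct
instead of IF statements?
Ans: When there is a long series of mutually exclusive
conditions, especially if you are recoding a variable
into a large number of categories. A SELECT group is
easier to read than a long chain of IF/ELSE IF statements.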
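As an illustration, a SELECT group that recodes a score into grades might look like the sketch below. The input data set exam and its variable score are assumptions, not part of the original answer.

```sas
data graded;
    set exam;                  /* hypothetical data set with a numeric variable score */
    length grade $ 1;
    select;
        when (score >= 90) grade = 'A';
        when (score >= 80) grade = 'B';
        when (score >= 70) grade = 'C';
        otherwise          grade = 'F';   /* includes missing scores */
    end;
run;
```

The WHEN conditions are evaluated in order and only the first true one is executed, which is why the ranges do not need upper bounds.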
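10. What statement do you code to tell SAS that it is to
write to an external file?
Ans: The FILE statement names the external file and the PUT
statement writes to it:
Filename fileref 'path';
File fileref;
Put _all_;
/* will write all the variables as name=value pairs */
Or put only the variables which you require.
11. If reading an external file to produce an external
file, what is the shortcut to write the record without
coding every single variable on the record?
Ans: Put _infile_;
This writes the current input record exactly as it was
read. Put _all_; would instead write every variable as a
name=value pair, including _ERROR_ and _N_.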

12. If you do not want any SAS output from a data step, how
would you code the DATA statement to prevent SAS from
producing a data set?
Ans: By coding DATA _NULL_; the step runs without creating
a SAS data set, for example when the desired output is
a file.
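13. What is the one statement to set the criteria of data
that can be coded in any step?
Ans: The OPTIONS statement, which can appear anywhere in a
program. If the question refers to criteria for
selecting observations, the WHERE statement can be used
in both DATA and PROC steps.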

14. Have you ever linked SAS code? If so, describe the link
and any required statements used to either process the
code or the step itself.
Ans: The LINK statement tells SAS to jump immediately
to the statement label that is named in the LINK
statement and to continue executing statements from
that point until a RETURN statement is executed. The
RETURN statement returns program control to the
statement immediately following the LINK statement.

Note: The LINK statement and the destination must
be in the same DATA step. The destination is
identified by the statement label named in the
LINK statement.
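A minimal sketch of LINK and RETURN. The label fixneg, the variables and the data lines are made up for illustration.

```sas
data checked;
    input id amount;
    if amount < 0 then link fixneg;  /* jump to the label, resume here after RETURN */
    doubled = amount * 2;
    return;                          /* end of the main flow for this iteration */
fixneg:
    amount = 0;                      /* replace a negative amount with zero */
    return;                          /* back to the statement after LINK */
    datalines;
1 50
2 -10
;
run;
```

The first RETURN keeps the main flow from falling through into the labeled section on every iteration.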
15. How would you include common or reusable code to be
processed along with your statements?
Ans: By using %INCLUDE.

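16. When looking for the data contained in a character
string of 150 bytes, which function is the best to
locate that data: SCAN, INDEX or INDEXC?
Ans: INDEX. It searches a character string for a given
substring and returns the position of its first
occurrence (0 if it is not found). INDEXC searches for
the first occurrence of any one of a set of characters,
and SCAN extracts the nth word from a string rather
than locating data.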

17. If you have a data set that contains 100 variables, but
you need only five of those, what is the code to force
SAS to use only those variables?
Ans: Use the KEEP= data set option.
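For example, assuming a data set named big and five variable names chosen purely for illustration:

```sas
data small;
    set big (keep=id name age state salary);  /* only these five variables are read */
run;
```

Placing KEEP= on the input data set means the other 95 variables are never brought into the DATA step.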
18. Code a PROC SORT on a data set containing state,
district and country as the primary variables, along
with several numeric variables.
Ans:
PROC SORT DATA=data-set-name;
BY state district country;
Run;

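19. How would you delete duplicate observations?
Ans: There are three common ways to delete duplicate
observations in a data set:
1) Proc sort data=old out=new noduprecs;
by _all_;
run;
(NODUPRECS removes observations that are identical in
every variable; sorting BY _ALL_ makes sure identical
observations are adjacent. Use NODUPKEY instead to keep
one observation per BY value.)
2) Proc sql;
create table new as
select distinct * from old;
quit;
3) Data clean;
Set temp;     /* temp must already be sorted by group */
By group;
If first.group;   /* keep the first observation in each BY group */
Run;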

20. How would you code a merge that will keep only the
observations that have matches from both sets?
Ans: By using the IN= data set option in the MERGE
statement. Both data sets must be sorted by the BY variable.
DATA NEW;
MERGE ONE_TEMP (IN=ONE) TWO_TEMP (IN=TWO);
BY NAME;
IF ONE=1 AND TWO=1;
RUN;

21. What is the Program Data Vector (PDV)? What are its
functions?
Ans:
The Program Data Vector is a temporary holding area in
memory where SAS builds one observation at a time. It
holds every variable in the DATA step plus the automatic
variables _N_ and _ERROR_; at the end of each iteration
its contents are written to the output data set.
For example, the WHERE statement may be more efficient
than the subsetting IF (especially if you are taking a
very small subset from a large file) because WHERE checks
the condition before the observation is brought into the
PDV, whereas the subsetting IF tests it only after the
observation has been loaded.
22. Does SAS Translate (compile) or does it Interpret?
Explain.
Ans: When you submit a DATA step for execution, SAS
checks the syntax of the SAS statements and compiles
them, that is, automatically translates the
statements into machine code. In this phase, SAS
identifies the type and length of each new variable,
and determines whether a type conversion is
necessary for each subsequent reference to a
variable.

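23. At compile time when a SAS data set is read, what
items are created?
Ans: At compile time SAS creates the following:
A) Input buffer (only when the step reads raw data with
an INPUT statement, not when it reads a SAS data set)
B) Program Data Vector (PDV)
C) Descriptor information

24. Name statements that are recognized at compile time
only.
Ans: DROP, KEEP, RENAME, LABEL, FORMAT, INFORMAT, LENGTH,
ARRAY, RETAIN, etc.

25. Identify a statement whose placement in the DATA step is
critical.
Ans: The INPUT statement: it must appear before any
statement that uses the variables it reads. A LENGTH
statement is another example, since it must precede
the first reference to the variable.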
26. Name statements that function at both compile and
execution time.
27. Name statements that are execution only.
28. In the flow of the DATA step processing, what is the
first action in a typical DATA step?
Ans: SAS first performs Syntax check.

29. What is _N_?
Ans: _N_ is an automatic variable that SAS creates in every
DATA step. It counts the number of times the DATA step
has iterated, which usually matches the number of the
observation being read. It is available only in the
DATA step, not in procedures, and it is not written to
the output data set.
E.g.: If we want to select every third record in a
data set (observations 1, 4, 7, ...) we can use _N_ as
follows:
Data new;
Set old;
If mod(_n_, 3) = 1;
Run;
Note: If we also use a WHERE clause to subset the data,
_N_ counts only the observations that pass the WHERE
condition, so it will not yield the required result.

BASE SAS:
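30. What is the effect of the OPTIONS statement ERRORS=1?
Ans: ERRORS= sets the maximum number of observations for
which SAS prints complete data-error messages in the
log. With ERRORS=1, full messages are printed only for
the first observation that has a data error; processing
does not stop.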

31. What's the difference between VAR A1-A4 and VAR A1--A4?

32. What does the SAS log message "numeric values have been
converted to character" mean?
Ans: It means a numeric value was used where a character
value is required, for example as an argument to a
character function, so SAS automatically converted the
numeric value to character.

33. Why is a STOP statement needed for a POINT= option on a
SET statement?
Ans: Because POINT= reads only the specified observations,
SAS cannot detect an end-of-file condition as it would if the
file were being read sequentially. Because detecting an
end-of-file condition terminates a DATA step automatically,
failure to substitute another means of terminating the DATA
step when you use POINT= can cause the DATA step to go into
a continuous loop.
NOTE:
You cannot use the POINT= option with any of the
following:

BY statement
WHERE statement
WHERE= data set option
transport format data sets
sequential data sets (on tape or disk)
a table from another vendor's relational database
management system.

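34. How do you control the number of observations and/or
variables read or written?
Ans: Use the OBS= and FIRSTOBS= options to control which
observations are read, and the KEEP= and DROP= data
set options to control which variables are read or
written.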
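35. Approximately what date is represented by the SAS date
value of 730?
Ans: SAS date values count days from 1 January 1960
(day 0), so 730 is two years later: 31 December 1961,
i.e. approximately 1 January 1962.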
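The value can be checked by writing it to the log with a date format. This is a small sketch, not part of the original answer.

```sas
data _null_;
    d = 730;
    put d= date9.;            /* writes d=31DEC1961 to the log */
    origin = '01JAN1960'd;    /* the SAS date origin */
    put origin=;              /* writes origin=0 */
run;
```

1960 is a leap year (366 days) and 1961 has 365 days, so day 731 is 1 January 1962 and day 730 is the day before.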
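36. How would you remove a format that has been permanently
associated with a variable?
Ans: By using PROC DATASETS with a FORMAT statement that
names the variable but no format:
Proc datasets library=somelibrary;
Modify sasdataset;
Format variable-name;
Run;
Quit;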

37. What does the RUN statement do?
Ans: The RUN statement marks the end of the preceding DATA
or PROC step and tells SAS to execute it.

38. Why is SAS considered self-documenting?
Ans: When a SAS data set is created, SAS creates the
descriptor portion and the data portion of the
data set. The descriptor portion contains details
such as when the data set was created, the number of
observations, the number of variables, etc. Hence SAS
is considered self-documenting.

39. Briefly describe 5 ways to do a table lookup in SAS.
Ans:
1) Simple table lookup by merging (MERGE with the IN=
option and a subsetting IF statement).
2) Simple table lookup with formats (PROC FORMAT and the
PUT function).
3) Lookup on two variables by merging (MERGE with the
IN= option and a subsetting IF statement).
4) Lookup on two variables with formats (PROC FORMAT and
the PUT and INPUT functions).
5) A two-way lookup table (MERGE statement using two
variables).

40. What are some good SAS programming practices for
processing very large data sets?
Ans: For very large data sets with many variables we can
make use of arrays in the SAS System. It also helps to
read only what is needed, using KEEP=/DROP= and a
WHERE statement.

41. How would you create a data set with 1 observation and 30
variables from a data set with 30 observations and 1
variable?
Ans: Use PROC TRANSPOSE; it can also be done with SAS arrays.
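A sketch of the PROC TRANSPOSE approach. The data set names long and wide, the variable value and the prefix v are assumptions made for the example.

```sas
/* Build a test data set: 30 observations, 1 variable */
data long;
    do i = 1 to 30;
        value = i * 10;
        output;
    end;
    drop i;
run;

/* Turn the 30 rows into 30 columns: v1-v30 in a single observation */
proc transpose data=long out=wide (drop=_name_) prefix=v;
    var value;
run;
```

PROC TRANSPOSE adds a _NAME_ variable holding the name of the transposed variable; it is dropped here so that only the 30 new variables remain.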

44. What are _NUMERIC_ and _CHARACTER_ and what do they do?
Ans: They are special variable lists. _NUMERIC_ refers to
all numeric variables and _CHARACTER_ to all character
variables currently defined, so a task can be applied
to every variable of one type without naming each one.

46. What is the order of application for output data set
options, input data set options and SAS statements?
Ans: Input data set options first, then SAS statements, and
then output data set options.
47. What is the order of evaluation of the arithmetic
operators: + - * / ** ( )?

Missing Value:

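56. How many missing values are available? When might you use
them?
Ans: There are two kinds of missing value in SAS, numeric
and character. Character missing is a blank. Numeric
missing has 28 forms: the ordinary missing value (.)
plus the special missing values ._ and .A through .Z,
which are useful for recording different reasons why
a value is missing.
57. How do you test for missing values?
Ans: Compare with the missing value (if x = . or
if name = ' '), or use the MISSING function, which
works for both types. The NMISS function counts the
missing values among its numeric arguments.
58. How are numeric and character missing values represented
internally?
Ans: Numeric missing values are represented as a dot (.) and
character missing values as a blank.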

FUNCTIONS:
59. What is the significance of the OF in X=SUM (OF a1-a4,
a6, a9);?
60. What do the PUT and INPUT functions do?
Ans: The PUT function writes a value with a format and
returns the result as a character string; its
traditional use is numeric-to-character conversion.
(It should not be confused with the PUT statement, which
writes to the log or a file and is useful for
identifying logic problems: which piece of code is
executed and what the current values of the variables
are.)

INPUT function:
The traditional use is to reread a character variable with a
numeric informat, that is, to perform a character-to-numeric
conversion.
The character-to-numeric conversion function:
INPUT(variable, informat-name)
The INPUT function converts the character variable to numeric:
Salary = input(EMP_SALARY, dollar7.);

Character value (EMP_SALARY)    Numeric value (SALARY)
$85,000                         85000

The result must be assigned to a new variable name, because
the type of an existing variable cannot change. This will
not work:
EMP_SALARY = input(EMP_SALARY, dollar7.);

The numeric-to-character conversion function:
PUT(variable, format-name)
newphone = put(phone, 7.);

Numeric value (PHONE)    Character value (NEWPHONE)
6778000                  6778000

61. Which date function advances a date, time or datetime
value by a given interval?
62. What do the MOD and INT functions do?
Ans: The MOD function returns the remainder when the first
argument is divided by the second. It is very useful
if, for example, you want to select every third
observation from a SAS data set.
Example:
Data third;
Set old;
If mod(_N_,3)=1;
Run;
The INT function returns the integer portion of an
argument. To truncate a number (drop off the
fractional part), you use the INT function.

63. In ARRAY processing, what does the DIM function do?
Ans: DIM is the dimension function. It returns the number
of elements in the array (i.e. the number of variables
in the list).
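DIM is typically used as the upper bound of a DO loop so that the loop still works if the array size changes. In this sketch the data set survey, the variables q1-q10 and the code 9 are hypothetical.

```sas
data recoded;
    set survey;                       /* hypothetical data set containing q1-q10 */
    array q{*} q1-q10;
    do i = 1 to dim(q);               /* dim(q) is 10 here */
        if q{i} = 9 then q{i} = .;    /* treat code 9 as missing */
    end;
    drop i;
run;
```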
64. How would you determine the number of missing or
non-missing values in a computation?
Ans: We can use N for the number of non-missing values and
NMISS for the number of missing values. Both are
available as functions in the DATA step and as
statistics in PROC MEANS.

65. What is the difference between X=a+b+c+d; and
X=SUM(a, b, c, d);?
Ans: SUM(a, b, c, d) ignores any missing values and adds
the rest, whereas the + operator returns missing if
any operand is missing.
E.g.: SUM(1, ., 2, 3) = 6
X = 1 + . + 2 + 3 gives a missing value.
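The difference can be seen by running both forms on the same values. The variable names below are chosen only for the example.

```sas
data _null_;
    a = 1; b = .; c = 2; d = 3;
    x_plus = a + b + c + d;        /* missing, because b is missing */
    x_sum  = sum(a, b, c, d);      /* 6, because SUM ignores the missing value */
    put x_plus= x_sum=;            /* writes x_plus=. x_sum=6 to the log */
run;
```

SAS also writes a note to the log that missing values were generated by the + expression.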

66. There is a field containing a date. It needs to be
displayed in the format "ddmonyy" if it's before 1975,
"dd mon ccyy" if it's after 1985, and as "disco years" if
it's between 1975 and 1985. How would you accomplish this
in DATA step code? Using only PROC FORMAT?
67. In the following DATA step, what is needed for
"fraction" to print to the log?
data _null_;
X=1/3;
if X=.333 then put 'fraction';
run;
Ans: 1/3 is not exactly equal to .333, so the comparison is
false. Round X before comparing:
if round(X, .001)=.333 then put 'fraction';

68. What is the difference between calculating the mean
using the MEAN function and PROC MEANS?
Ans: The MEAN function returns the mean of the non-missing
values in its argument list, working across variables
within one observation. PROC MEANS computes the mean of
a variable across observations. The way the MEAN
function deals with missing values is quite important.
If you calculate SCORE by simply adding up all the
items and dividing by 50 as follows:
SCORE=(item1+item2+item3+...+item50)/50;
you would be in big trouble if any of the items had
missing values. When a SAS statement tries to do an
arithmetic operation on missing values, the result is
always missing.

PROCs:

69. If you were given several SAS data sets you were
unfamiliar with, how would you find out the variable names
and formats of each dataset?
Ans: I can run the CONTENTS procedure on all the data sets
in the library to see the variable names and formats
of each data set.
EG:
PROC CONTENTS DATA=LIBREF._ALL_;
RUN;

70. How would you keep SAS from overlaying the SAS data set
with its sorted version?
Ans: By using the OUT= option in PROC SORT to write the
sorted observations to a new data set:
OUT=new-sas-data-set
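For example, with hypothetical data set and variable names:

```sas
proc sort data=work.sales out=work.sales_sorted;  /* work.sales is left unchanged */
    by region;
run;
```

Without OUT=, PROC SORT replaces the input data set with the sorted version.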
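71. In PROC PRINT, can you print only variables that begin
with the letter "A"?
Ans: Yes. To print only the variables whose names begin
with A, use a name prefix list in the VAR statement:
VAR A:;
To print only the observations in which the value of a
character variable begins with A, use the WHERE
statement in PROC PRINT:
WHERE variable-name LIKE 'A%';
Or
WHERE variable-name =: 'A';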
72. What are some differences between PROC SUMMARY and PROC
MEANS?
Ans:
1) PROC MEANS prints its statistics by default. Both
procedures produce subgroup statistics with a CLASS
statement (no sorting needed) or with a BY statement
(the input data must be previously sorted by the BY
variables, e.g. with PROC SORT). With CLASS, the
output data set contains statistics for all
combinations of the class variables, giving you in
one run all the information that you would get by
repeatedly sorting a data set by the variables that
define each subgroup and running the procedure.
2) PROC SUMMARY does not print anything by default, so
you need either the PRINT option or an OUTPUT
statement to create a new data set, followed by
PROC PRINT to see the computed statistics.

PROC FREQ:
73. Code the TABLES statement for a single-level (most common)
frequency.
Ans:
The statement for single-level:
DATA MAR.FREQTEST;
SET BAS.AMPERS;
RUN;
PROC FREQ DATA=MAR.FREQTEST;
TABLES AGE;
RUN;
74. Code the TABLES statement to produce a multi-level
frequency.
Ans:
The statement for multilevel:
DATA MAR.FREQTEST;
SET BAS.AMPERS;
RUN;
PROC FREQ DATA=MAR.FREQTEST;
TABLES AGE * GENDER;
RUN;

75. Name the option to produce a frequency as line items
rather than a table.
76. Produce output from a frequency. Restrict the printing of
the table.

PROC MEANS:
77. Code a PROC MEANS that shows both summed and averaged
output of the data.
78. Code the option that will allow MEANS to include missing
numeric data to be included in the report.
79. Code the MEANS to produce output to be used later.
80. Do you use PROC REPORT or PROC TABULATE? Which do you
prefer? Explain.

MERGING/UPDATING :
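81. What happens in a one-on-one merge? When would you use
one?
Ans: In a one-to-one merge (MERGE without a BY statement)
SAS joins the first observation of one data set with
the first observation of the other, the second with
the second, and so on, regardless of the values. Use
it when the data sets have different variables and
their observations are known to be in the same order.
If the data sets share a common identifying variable,
a match-merge with a BY statement on that variable is
the safer choice.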
82. How would you combine 3 or more tables with different
structures?

83. What is the problem with merging two data sets that have
variables with the same name but different data?
Ans: For variables that are not BY variables, the value from
the second data set will overwrite the value from the
first data set.

84. When would you choose to MERGE two data sets together and
when would you SET two data sets?
Ans: Use the SET statement to concatenate data sets,
stacking the observations of one after the other (or
to make a copy of a single data set).
Use the MERGE statement to join data sets side by side,
combining variables from matching observations,
usually with a BY statement and IN= variables to
control which data set contributes to the new data set.

85. Which data set is the controlling data set in the MERGE
statement?
Ans: The data set listed last in the MERGE statement, since
its values are read last and overwrite same-named
variables from the earlier data sets.

86. How do the IN= variables improve the capability of a
MERGE?
Ans: IN= creates a temporary variable that is 1 when the
data set contributes to the current observation and 0
otherwise. Testing these variables controls which data
sets need to contribute to the new data set.

87. Explain the message "MERGE HAS ONE OR MORE DATASETS WITH
REPEATS OF BY VARIABLE".

CUSTOMIZED REPORT WRITING:

88. What is the purpose of the statement DATA _NULL_?
Ans:
The keyword _NULL_ gives you the power of the DATA step
without creating a data set.

89. What is the pound sign used for in the DATA _NULL_?

90. What is the purpose of using the N=PS option?
Ans: Specifying N=PS in the FILE statement makes the whole
page available to the output pointer, so it can write
on any line of the current output page.

MACRO:
91. What system options would you use to help debug a macro?
Ans: SYMBOLGEN, MLOGIC and MPRINT.
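A short sketch of turning the three options on around a macro call. The macro show is hypothetical; sashelp.class is a sample data set shipped with SAS.

```sas
options symbolgen mlogic mprint;   /* write macro debugging information to the log */

%macro show(dsn);                  /* hypothetical macro for illustration */
    proc print data=&dsn (obs=5);
    run;
%mend show;

%show(sashelp.class)

options nosymbolgen nomlogic nomprint;   /* switch the extra logging off again */
```

SYMBOLGEN shows how each macro variable reference resolves, MLOGIC traces the macro's execution, and MPRINT shows the SAS statements that the macro generates.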

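92. Describe how you would create a macro variable.
Ans: %let var=value;

93. How do you identify a macro variable?

94. How do you define the end of a macro?
Ans: %mend
95. How do you assign a macro variable to a SAS variable?
Ans: Reference the macro variable in an assignment, for
example x = "&var";, or use the SYMGET function.
CALL SYMPUT works in the other direction: it assigns
the value of a SAS variable to a macro variable.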

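96. What is the difference between %LOCAL and %GLOBAL?
Ans: A %LOCAL macro variable exists only within the macro
in which it is defined, whereas a %GLOBAL macro
variable is available until the end of the SAS session.
97. How long can a macro variable be? A token?
Ans: In SAS 9, a macro variable value can hold up to 65,534
characters. A token lasts until it passes to the word
scanner.
98. If you use a SYMPUT in a DATA step, when and where can you
use the macro variable?
Ans: The macro variable is not available until the DATA
step has finished executing, so it can be referenced
only after the step boundary (RUN), not within the
same DATA step. When CALL SYMPUT runs in a DATA step
in open code, the variable is created in the global
symbol table and is globally available.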
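A minimal sketch of the timing. The macro variable name nobs is an assumption; sashelp.class is a sample data set shipped with SAS.

```sas
data _null_;
    set sashelp.class end=last;
    /* on the last observation, store the row count in macro variable nobs */
    if last then call symput('nobs', strip(put(_n_, 8.)));
run;

/* The macro variable can be used only after the DATA step has run */
%put Number of observations read: &nobs;
```

Referencing &nobs inside the same DATA step would fail or pick up an old value, because macro references are resolved before the step executes.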
100. How would you code a macro statement to produce
information on the SAS log?
Ans: %put Statement
