SAS Interview Questions: Base SAS

Very Basic:
· What SAS statements would you code to read an external raw data file into a DATA step? The INFILE statement, which identifies the file; the INPUT statement then reads the values.
· How do you read in the variables that you need? Use the INPUT statement with pointer controls and column ranges, such as @5 (move to column 5), 12-17 (read columns 12 through 17), or / (advance to the next line).
· Are you familiar with special input delimiters? How are they used? DLM= and DSD are the ones I have used; both are options on the INFILE statement. DLM= names the delimiter character. DSD is commonly used to read comma-separated values (CSV) files: it sets the default delimiter to a comma, treats two delimiters in a row as a missing value, and ignores delimiters enclosed in quotation marks.
· If reading a variable-length file with fixed input, how would you prevent SAS from reading the next record if the last variable didn't have a value? Use the MISSOVER option on the INFILE statement. If some data lines are shorter than others, use the TRUNCOVER option on the INFILE statement instead.
· What is the difference between an informat and a format? Name three informats or formats. An informat tells SAS how to read data; a format tells SAS how to write it. Many names exist both as an informat and as a format. Informats: COMMAw.d, DOLLARw.d, MMDDYYw., DATEw., TIMEw., PERCENTw.d. Formats: WORDDATE18., WEEKDATEw.
· Name and describe three SAS functions that you have used, if any.
    LENGTH: returns the length of a character argument, not counting trailing blanks (a missing value has a length of 1). Ex: a='my cat'; x=length(a); Result: x=6.
    SUBSTR: substr(arg, position, n) extracts a substring from an argument, starting at 'position' and continuing for 'n' characters, or to the end if 'n' is omitted. Ex: a='(916)734-6241'; x=substr(a,2,3); Result: x='916'.
    TRIM: removes trailing blanks from a character expression. Ex: a='my '; b='cat'; x=trim(a)||b; Result: x='mycat'.
    SUM: returns the sum of the non-missing arguments. Ex: x=sum(3,5,1); Result: x=9.
    INT: returns the integer portion of the argument.
· How would you code the criteria to restrict the output to be produced? Use the NOPRINT option.
· What is the purpose of the trailing @ and the @@? How would you use them? Both are line-hold specifiers placed at the end of an INPUT statement. A single @ holds the current line for the next INPUT statement within the same DATA step iteration. A double @@ holds the line across DATA step iterations.
    Trailing @: by ending the INPUT statement with @, without specifying a column, it is as if you are telling SAS, "Stay tuned for more information. Don't touch that dial." SAS holds the line of data until it reaches either the end of the DATA step iteration or an INPUT statement that does not end with a trailing @.
    Double trailing @@: when one line of raw data contains multiple observations, end the INPUT statement with @@. The line-hold specifier is like a stop sign telling SAS, "Stop, hold that line of raw data." The line stays available until SAS runs past the end of the line.
· Under what circumstances would you code a SELECT construct instead of IF statements? When you have a long series of mutually exclusive conditions and the comparison is numeric, a SELECT group is slightly more efficient than IF-THEN or IF-THEN-ELSE statements because CPU time is reduced. A SELECT group has these parts:
    SELECT: begins the SELECT group.
    WHEN: identifies the SAS statements that are executed when a particular condition is true.
    OTHERWISE (optional): specifies a statement to be executed if no WHEN condition is met.
    END: ends the SELECT group.
· What statement do you code to tell SAS that it is to write to an external file? What statement do you code to write the record to the file? The FILE statement names the external file, and the PUT statement writes the record to it.
· If reading an external file to produce an external file, what is the shortcut to write that record without coding every single variable on the record? PUT _INFILE_ writes the current input record exactly as it was read.
· If you're not wanting any SAS output from a data step, how would you code the DATA statement to prevent SAS from producing a set? DATA _NULL_.
· What is the one statement to set the criteria of data that can be coded in any step? The WHERE statement, which subsets data in both DATA steps and PROC steps. By contrast, the OPTIONS statement is a global statement: it is part of the SAS program and affects all steps that follow it.
· Have you ever linked SAS code? If so, describe the link and any required statements used to either process the code or the step itself. The LINK statement branches to a labeled group of statements, and a RETURN statement sends execution back to the statement that follows the LINK.
· How would you include common or reuse code to be processed along with your statements? By using SAS macros, or %INCLUDE for code stored in an external file.
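
The INFILE options and informats discussed above can be tried together in a small sketch. The data set name, variable names, and data values below are hypothetical, and in-stream DATALINES stand in for an external file so that the step is self-contained.

```sas
/* Hypothetical data: name, sale date, amount; some fields are missing */
data sales;
  /* DSD: comma-delimited, two commas in a row mean a missing value,  */
  /* and commas inside quotation marks are kept as data.              */
  /* MISSOVER: a short line sets the remaining variables to missing   */
  /* instead of reading them from the next line.                      */
  infile datalines dsd missover;
  /* Informats (after the colon modifier) control how values are read */
  input name :$20. saledate :mmddyy10. amount :comma10.;
  /* Formats control how values are written */
  format saledate date9. amount dollar12.2;
datalines;
"Smith, J",01/15/2020,"1,250"
Lee,,300
Patel,03/02/2020
;
run;

proc print data=sales;
run;
```

To read a real file, the INFILE statement would name a path or fileref in place of DATALINES; the options are used the same way.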

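The double trailing @@, the SELECT group, and DATA _NULL_ with FILE and PUT can be sketched as follows. The data set, variable names, and grade cutoffs are assumptions made up for illustration.

```sas
/* Hypothetical data: several (id, score) pairs on each line */
data scores;
  input id score @@;        /* @@ holds the line across DATA step iterations */
  length grade $ 1;
  select;                   /* mutually exclusive conditions */
    when (score >= 90) grade = 'A';
    when (score >= 80) grade = 'B';
    otherwise          grade = 'C';
  end;
datalines;
1 95 2 83 3 71
4 88 5 64
;
run;

/* DATA _NULL_ creates no data set; FILE and PUT write the records out */
data _null_;
  set scores;
  file print;               /* procedure output; an external file could be named instead */
  put id= score= grade=;
run;
```

Each iteration of the first step reads one pair, so the two input lines produce five observations.
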
· When looking for data contained in a character string of 150 bytes, which function is the best to locate that data: scan, index, or indexc? INDEX. It returns the position of the first occurrence of a substring. INDEXC searches for any one of a set of individual characters, and SCAN extracts the nth word rather than locating text.
· If you have a data set that contains 100 variables, but you need only five of those, what is the code to force SAS to use only those variables? Use the KEEP= data set option or the KEEP statement.
· Code a PROC SORT on a data set containing State, District and County as the primary variables, along with several numeric variables.
    proc sort data=one;
      by State District County;
    run;
· How would you delete duplicate observations? Use the NODUPLICATES option of PROC SORT (also written NODUP or NODUPRECS).
· How would you delete observations with duplicate keys? Use the NODUPKEY option of PROC SORT, which keeps one observation for each combination of BY values.
· How would you code a merge that will keep only the observations that have matches from both sets? Put the IN= data set option on each data set in the MERGE statement, then test the resulting variables with a subsetting IF statement.
    Ex:
    data xxx;
      merge yyy(in=inyyy) zzz(in=inzzz);
      by aaa;
      if inyyy = 1 and inzzz = 1;
    run;
· How would you code a merge that will write the matches of both to one data set, the non-matches from the left-most data...
    Step 1: Define three data sets in the DATA statement.
    Step 2: Use the IN= data set option to assign a different indicator variable to each of the two input data sets.
    Step 3: Test the indicators with IF statements; output the matches to the first data set and the non-matches to the other data sets.
· What is the Program Data Vector (PDV)? What are its functions? Function: to store the current observation. The PDV is a logical area in memory where SAS builds a data set, one observation at a time. When SAS processes a DATA step, it goes through two phases: the compilation phase and the execution phase. During the compilation phase, the input buffer is created to hold a record from the external file. After the input buffer is created, the PDV is created. The PDV contains two automatic variables, _N_ and _ERROR_.
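
The three-step merge described above can be sketched as follows. The input values and the output data set names (both, only_yyy, only_zzz) are hypothetical; yyy, zzz, and aaa follow the names used in the example.

```sas
/* Hypothetical inputs that share the key variable aaa */
data yyy;
  input aaa y;
datalines;
1 10
2 20
3 30
;
run;

data zzz;
  input aaa z;
datalines;
2 200
3 300
4 400
;
run;

/* MERGE with a BY statement requires both inputs sorted by the key */
proc sort data=yyy; by aaa; run;
proc sort data=zzz; by aaa; run;

/* Step 1: three output data sets; Step 2: IN= indicators; Step 3: route rows */
data both only_yyy only_zzz;
  merge yyy(in=inyyy) zzz(in=inzzz);
  by aaa;
  if inyyy and inzzz then output both;       /* key found in both inputs */
  else if inyyy then output only_yyy;        /* key only in the left-most input */
  else output only_zzz;                      /* key only in the right-most input */
run;
```

With these inputs, keys 2 and 3 go to both, key 1 goes to only_yyy, and key 4 goes to only_zzz.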

The logical Program Data Vector (PDV) is a set of buffers that includes all variables referenced either explicitly or implicitly in the DATA step. It is created at compile time, then used at execution time as the location where the working values of variables are stored as they are processed by the DATA step program (source: http://www2.sas.com/proceedings/sugi24/Posters/p235-24.pdf).
· Does SAS 'translate' (compile) or does it 'interpret'? Explain. SAS compiles the code: each DATA step is compiled first and then executed.
· At compile time when a SAS data set is read, what items are created? Automatic variables are created, along with the input buffer (when raw data is read), the PDV, and the descriptor information.
· Name statements that are recognized at compile time only. Declarative statements such as DROP, KEEP, RENAME, LENGTH, FORMAT, INFORMAT, LABEL, RETAIN, ARRAY, and BY.
· Name statements that are execution only. Executable statements such as INFILE, PUT, and OUTPUT.
· Identify statements whose placement in the DATA step is critical. DATA, INPUT, RUN.
· Name statements that function at both compile and execution time. INPUT: at compile time it defines the variables in the PDV, and at execution time it reads their values.
· In the flow of DATA step processing, what is the first action in a typical DATA step? The DATA step begins with a DATA statement. Each time the DATA statement executes, a new iteration of the DATA step begins, and the _N_ automatic variable is incremented by 1.
· What is _n_? It is a counter variable that SAS creates automatically during DATA step processing. _N_ indicates the number of times SAS has looped through the DATA step. This is not necessarily equal to the observation number, since a simple subsetting IF statement can change the relationship between the observation number and the number of iterations of the DATA step. The _ERROR_ variable has a value of 1 if there is an error in the data for that observation and 0 if there is not.
    Note: Both _N_ and _ERROR_ are always available in the DATA step. They are available only in the DATA step, not in PROCs.
    Ex: To select every third record in a data set, use _N_ as follows:
    data new_sas_data_set;
      set old;
      if mod(_n_, 3) = 1;
    run;
    Note: If a WHERE clause is used to subset the data, _N_ will not yield the required result, because WHERE filters observations before they reach the PDV and _N_ counts only the ones that pass.
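
The behavior of _N_ with a subsetting IF and with WHERE can be checked with a minimal sketch. The ten-observation input and the names every_third, where_case, and n_seen are assumptions made up for illustration.

```sas
/* Hypothetical input data set with ten observations, x = 1 to 10 */
data old;
  do x = 1 to 10;
    output;
  end;
run;

/* _N_ counts DATA step iterations, so this keeps observations 1, 4, 7, 10 */
data every_third;
  set old;
  if mod(_n_, 3) = 1;
run;

/* WHERE filters before observations reach the PDV, so _N_ counts    */
/* only the observations that pass the filter, not their positions  */
/* in the input data set.                                            */
data where_case;
  set old;
  where x > 5;
  n_seen = _n_;     /* 1, 2, 3, 4, 5 for x = 6, 7, 8, 9, 10 */
run;
```

Printing where_case next to old shows why _N_ cannot be used to pick every third original observation once a WHERE clause is in effect.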
