What SAS statements would you code to read an external raw data file to a DATA step?

Use the following statements:
• FILENAME: specifies the location of the file.
• INFILE: identifies the external file to read with an INPUT statement.
• INPUT: describes the variables to be read from each record.

How do you read in the variables that you need?
Use the INPUT statement with column and line pointers, informats, and length specifiers.

Are you familiar with special input delimiters? How are they used?
DLM= (DELIMITER=) and DSD are the INFILE options that control delimiters.
• DELIMITER=delimiter(s) specifies an alternate delimiter (other than a blank) to be used for list input.
• DSD (delimiter-sensitive data) specifies that when data values are enclosed in quotation marks, delimiters within the value are treated as character data. The DSD option changes how SAS treats delimiters when you use list input and sets the default delimiter to a comma. When you specify DSD, SAS treats two consecutive delimiters as a missing value and removes quotation marks from character values.
http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000146932.htm#a000177189

If reading a variable length file with fixed input, how would you prevent SAS from reading the next record if the last variable didn't have a value?
Use the MISSOVER or TRUNCOVER option on the INFILE statement.
• MISSOVER prevents an INPUT statement from reading a new input data record if it does not find values in the current input line for all the variables in the statement. When an INPUT statement reaches the end of the current input data record, variables without any values assigned are set to missing.
• TRUNCOVER overrides the default behavior of the INPUT statement when an input data record is shorter than the INPUT statement expects. By default, the INPUT statement automatically reads the next input data record. TRUNCOVER enables you to read variable-length records when some records are shorter than the INPUT statement expects. Variables without any values assigned are set to missing.
With column or formatted input, TRUNCOVER keeps a value that is shorter than the expected width, whereas MISSOVER sets it to missing, so TRUNCOVER is usually the safer choice for fixed input.
http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000146932.htm#a000177189

What is the difference between an informat and a format? Name three informats or formats.
INFORMAT statement: associates informats with variables. Informats are used in INPUT statements and in PROC SQL CREATE TABLE statements to read raw data from an external file, or data that is not in SAS internal form.
http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000178244.htm
Informat examples: COMMAw.d, DATEw., DOLLARw.d, $VARYINGw.

FORMAT statement: associates formats with variables. Formats are used in the DATA step FORMAT statement, in PROC SQL SELECT statements, and in procedure FORMAT statements to control how SAS data is written to a file or report. Formats can look like informats; the two are distinguished by the statement in which they are used.
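The statements and options discussed above can be sketched together as follows. This is only an illustration: the fileref rawin, the data set work.clients, the variable names, and the sample records are all made up for the example, and the raw file is written to a temporary location first so that the code is self-contained.

```sas
/* Hypothetical sample records written to a temporary file */
filename rawin temp;

data _null_;
   file rawin;
   put 'Smith,"Boston, MA",01JAN2005,1250.50';
   put 'Jones,,15MAR2005,';
   put 'Lee,"Austin, TX",30JUN2005,980';
run;

data work.clients;
   /* DSD: comma delimiter, quoted commas kept, ",," read as missing */
   /* TRUNCOVER: do not jump to the next record when a line is short */
   infile rawin dsd truncover;
   /* informats tell SAS how to READ each raw value */
   input Name :$10. City :$20. Joined :date9. Amount :comma10.;
   /* formats tell SAS how to DISPLAY the stored values */
   format Joined mmddyy10. Amount dollar12.2;
run;

proc print data=work.clients;
run;
```

In the second record the empty field between the two commas and the empty field after the last comma should both come through as missing values rather than pulling data from the next line.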

Format examples: DATEw., WORDDATEw., MMDDYYw.
http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000178212.htm

Name and describe three SAS functions that you have used, if any?
The most commonly used functions are:
• Conversion functions: INPUT, PUT, INT, CEIL, FLOOR
• Character functions: SCAN, SUBSTR, INDEX, LEFT, TRIM, COMPRESS, CAT, CATX, UPCASE, LOWCASE
• Arithmetic functions: SUM, ABS
• Attribute information functions: ATTRN, LENGTH
• Data set functions: OPEN, CLOSE, EXIST
• Directory functions: DOPEN, DCLOSE, DCREATE, DINFO
• File functions: FEXIST, FOPEN, FILENAME, FILEREF
• SQL functions: COALESCE, COUNT, SUM, MEAN
• Date functions: DATE, TODAY, DATDIF, DATEPART, DATETIME, INTCK, MDY
• Array functions: DIM
http://sastechies.com/SASfunctions.php

How would you code the criteria to restrict the output to be produced?
It is not clear what the interviewer is referring to, so several answers are possible:
• Global statement: OPTIONS OBS=;
• Data set option: OBS=
• PROC SQL: NOPRINT option for reporting; INOBS= and OUTOBS= for SQL SELECT

• PROC DATASETS: NOLIST option

What is the purpose of the trailing @ and the @@? How would you use them?
Line-hold specifiers keep the pointer on the current input record when
• a data record is read by more than one INPUT statement (trailing @)
• one input line has values for more than one observation (double trailing @)
• a record needs to be reread on the next iteration of the DATA step (double trailing @).

Use a single trailing @ to allow the next INPUT statement to read from the same record. Use a double trailing @ to hold a record for the next INPUT statement across iterations of the DATA step.

Normally, each INPUT statement in a DATA step reads a new data record into the input buffer. When you use a trailing @, the following occurs:
• The pointer position does not change.
• No new record is read into the input buffer.
• The next INPUT statement for the same iteration of the DATA step continues to read the same record rather than a new one.

SAS releases a record held by a trailing @ when
• a null INPUT statement executes:
     input;
• an INPUT statement without a trailing @ executes
• the next iteration of the DATA step begins.

Normally, when you use a double trailing @ (@@), the INPUT statement for the next iteration of the DATA step continues to read the same record. SAS releases the record that is held by a double trailing @
• immediately if the pointer moves past the end of the input record
• immediately if a null INPUT statement executes:
     input;
• when the next iteration of the DATA step begins, if an INPUT statement with a single trailing @ executes later in the DATA step:
     input @;

A record held by the double trailing at sign (@@) is not released until
• the input pointer moves past the end of the record. Then the input pointer moves down to the next record.
     Sample record: 102 92 78 103 84 23 36 75
• an INPUT statement without a line-hold specifier executes:
     input ID $4. @@;
     . . .
     input Department 5.;

Like the @@, the single trailing @
• enables the next INPUT statement to read from the same record
• releases the current record when a subsequent INPUT statement executes without a line-hold specifier.

Unlike the @@, the single @ also releases a record when control returns to the top of the DATA step for the next iteration.

   data perm.sales97;
      infile data97 missover;
      input ID $4. @;
      do Quarter=1 to 4;
         input Sales : comma. @;
         output;
      end;
   run;

Raw Data File Data97
0734 1,323.34 2,472.85 3,276.65 5,345.52
0943 1,908.34 2,560.38
1009 2,934.12 3,308.41 4,176.18 7,581.81

   data perm.people (drop=type);
      infile census;
      retain Address;
      input type $1. @;
      if type='H' then input @3 Address $15.;
      if type='P';
      /* Age occupies columns 13-15, so Gender is read from column 16 */
      input @3 Name $10. @13 Age 3. @16 Gender $1.;
   run;

   data perm.residnts;
      infile census;
      retain Address;
      input type $1. @;
      if type='H' then do;
         if _n_ > 1 then output;
         Total=0;
         input Address $ 3-17;
      end;
      else if type='P' then total+1;
      /* as shown, the last household is never written, because */
      /* OUTPUT executes only when the next H record is read    */
   run;

Raw Data File Census
H 321 S. MAIN ST
P MARY E     21 F
P WILLIAM M  23 M
P SUSAN K     3 F
H 324 S. MAIN ST
P THOMAS H   79 M
P WALTER S   46 M
P ALICE A    42 F
P MARYANN A  20 F
P JOHN S     16 M
H 325A S. MAIN ST
P JAMES L    34 M
P LIZA A     31 F
H 325B S. MAIN ST
P MARGO K    27 F
P WILLIAM R  27 M
P ROBERT W    1 M
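The programs above all use the single trailing @. For contrast, here is a minimal sketch of the double trailing @@, which holds the record across iterations so that one input line can supply several observations. The data set name work.scores and the variable name Score are invented for the example; the values are the sample record shown earlier, split over two lines.

```sas
data work.scores;
   /* @@ holds the current record across iterations of the DATA step, */
   /* so each iteration reads the next value from the same line       */
   input Score @@;
   datalines;
102 92 78 103
84 23 36 75
;
run;

proc print data=work.scores;
run;
```

Each iteration reads one value, and SAS moves to the next line only when the pointer passes the end of the current record, so the two lines produce eight observations.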

Under what circumstances would you code a SELECT construct instead of IF statements?
The SELECT statement begins a SELECT group. SELECT groups contain WHEN statements that identify SAS statements that are executed when a particular condition is true. Use at least one WHEN statement in a SELECT group. An optional OTHERWISE statement specifies a statement to be executed if no WHEN condition is met. An END statement ends a SELECT group.

Null statements that are used in WHEN statements cause SAS to recognize a condition as true without taking further action. Null statements that are used in OTHERWISE statements prevent SAS from issuing an error message when all WHEN conditions are false.

Using SELECT-WHEN improves processing efficiency and readability in programs that need to check a series of conditions for the same variable. Use IF-THEN/ELSE statements for programs with few conditions. Take care with a subsetting IF statement (an IF without a THEN clause): it continues processing only those records that meet the condition specified in the IF clause and discards the rest.
http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000201966.htm
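A minimal sketch of a SELECT group may make the pieces concrete. The data set work.payroll, the job codes, and the bonus rates are hypothetical and chosen only to show a multi-value WHEN, a null WHEN, and an OTHERWISE.

```sas
data work.payroll;
   input Name $ JobCode $ Salary;
   select (JobCode);
      when ('MGR')        Bonus = Salary * 0.10;
      when ('SLS', 'MKT') Bonus = Salary * 0.05;  /* one WHEN, two values   */
      when ('TMP')        ;                       /* null statement: match, */
                                                  /* but take no action     */
      otherwise           Bonus = 0;              /* no WHEN matched        */
   end;
   datalines;
Ann MGR 60000
Bob SLS 40000
Cy TMP 20000
Di ENG 50000
;
run;

proc print data=work.payroll;
run;
```

Without the OTHERWISE statement, the ENG record would stop the step with an error, because no WHEN condition is true for it.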

What statement do you code to tell SAS that it is to write to an external file?
FILENAME / FILE / PUT
• The FILENAME statement is an optional statement that specifies the location of the external file.
• The FILE statement specifies the current output file for PUT statements in the DATA step. When multiple FILE statements are present, the PUT statement builds and writes output lines to the file that was specified in the most recent FILE statement. If no FILE statement was specified, the PUT statement writes to the SAS log.
• The PUT statement writes the variable values to the external file.
The specified output file must be an external file, not a SAS data library, and it must be a valid access type.

If reading an external file to produce an external file, what is the shortcut to write that record without coding every single variable on the record?
Use the _INFILE_ automatic variable in the PUT statement. A null INPUT statement is enough to load each record into the input buffer.

   filename some 'c:\cool.dat';
   filename cool1 'c:\cool1.dat';
   data _null_;
      infile some;
      input;            /* null INPUT: read the record without naming variables */
      file cool1;
      put _infile_;     /* write the record exactly as it was read */
   run;
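When the output is built from variables rather than copied with _INFILE_, the same FILENAME, FILE, and PUT statements apply. The sketch below is an assumption-laden illustration: the fileref outcsv is hypothetical and points at a temporary file, and the SASHELP.CLASS sample data set that ships with SAS is used as input.

```sas
/* Hypothetical fileref; replace TEMP with a real path as needed */
filename outcsv temp;

data _null_;
   set sashelp.class;
   file outcsv dsd;                      /* DSD: write comma-separated values */
   if _n_ = 1 then put 'Name,Sex,Age';   /* header line, written once         */
   put Name Sex Age;                     /* one delimited line per observation */
run;
```

Because the step is DATA _NULL_, no SAS data set is created; the only result is the external file.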

If you're not wanting any SAS output from a data step, how would you code the data statement to prevent SAS from producing a set?

   data _null_;

_NULL_ specifies that SAS does not create a data set when it executes the DATA step. DATA _NULL_ is mainly used for:

• Creating quick macro variables with the CALL SYMPUT routine, e.g.

   data _null_;
      set somedata;
      call symput('macvar', dsnvariable);
   run;

• Creating a custom report, e.g. the second DATA step in this program produces a custom report and uses the _NULL_ keyword to execute the DATA step without creating a SAS data set:

   data sales;
      input dept : $10. jan feb mar;
      datalines;
   shoes 4344 3555 2666
   housewares 3777 4888 7999
   appliances 53111 7122 41333
   ;
   data _null_;
      set sales;
      qtr1tot=jan+feb+mar;
      put 'Total Quarterly Sales: ' qtr1tot dollar12.;
   run;

What is the one statement to set the criteria of data that can be coded in any step?
The WHERE statement sets the criteria for any data set in a DATA step or a PROC step. In a DATA step it applies to SAS data sets that are read, not to raw data read with an INPUT statement.

Have you ever linked SAS code? If so, describe the link and any required statements used to either process the code or the step itself.
SAS code can be linked using the GO TO or the LINK statement.
GOTO - http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000201949.htm
LINK - http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000201972.htm

The difference between the LINK statement and the GO TO statement is in the action of a subsequent RETURN statement. A RETURN statement after a LINK statement returns execution to the statement that follows LINK. A RETURN statement after a GO TO statement returns execution to the beginning of the DATA step, unless a LINK statement precedes GO TO, in which case execution continues with the first statement after LINK. In addition, a LINK statement is usually used with an explicit RETURN statement, whereas a GO TO statement is often used without a RETURN statement.

When your program executes a group of statements at several points in the program, using the LINK statement simplifies coding and makes program logic easier to follow. If your program executes a group of statements at only one point in the program, using DO-group logic rather than LINK-RETURN logic is simpler.

GO TO example:

   data info;
      input x;
      if 1<=x<=5 then go to add;
      put x=;
      add: sumx+x;
      datalines;
   7
   6
   323
   ;

LINK example:

   data hydro;
      input type $ depth station $;
      /* link to label calcu: */
      if type ='aluv' then link calcu;
      date=today();
      /* return to top of step */
      return;
      calcu: if station='site_1' then elevatn=6650-depth;
             else if station='site_2' then elevatn=5500-depth;
      /* return to date=today(); */
      return;
      datalines;
   aluv 523 site_1
   uppa 234 site_2
   aluv 666 site_2
   ...more data lines...
   ;
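Since the calcu statements in the LINK example run at only one point in the step, the DO-group alternative mentioned above can be sketched as follows. This is an illustrative rewrite, not part of the original material; the output name work.hydro is an assumption and only the three data lines shown in the example are used.

```sas
data work.hydro;
   input type $ depth station $;
   /* DO group replaces LINK calcu ... RETURN */
   if type = 'aluv' then do;
      if station = 'site_1' then elevatn = 6650 - depth;
      else if station = 'site_2' then elevatn = 5500 - depth;
   end;
   date = today();
   datalines;
aluv 523 site_1
uppa 234 site_2
aluv 666 site_2
;
run;
```

The logic is the same as in the LINK version: elevatn is computed only for aluv records, and date is assigned for every record.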

How would you include common or reuse code to be processed along with your statements?
- Using SAS macros.
- Using a %INCLUDE statement.

When looking for data contained in a character string of 150 bytes, which function is the best to locate that data: scan, index, or indexc?
The INDEX function, which searches a character expression for a string of characters.

   a='ABC.DEF (X=Y)';
   b='X=Y';
   x=index(a,b);
   put x;

Result: 10

For learning purposes: the INDEXC function searches for the first occurrence of any individual character that is present within the character string (or strings) supplied, whereas the INDEX function searches for the first occurrence of the character string as a pattern.

   b='have a good day';
   x=indexc(b,'pleasant','very');
   put x;

Result: 2

The INDEXW function searches for strings that are words, whereas the INDEX function searches for patterns as separate words or as parts of other words.

   s='asdf adog dog';
   p='dog ';
   x=indexw(s,p);
   put x;

Result: 11
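The question also mentions SCAN, which extracts a delimited word rather than returning a position. A short sketch of how the two are often combined is given below; the string contents and the variable names text, pos, status, and third are hypothetical.

```sas
data _null_;
   length text $150;
   text = 'ORDER=1042; STATUS=SHIPPED; CARRIER=UPS';

   /* INDEX returns the starting position of the pattern, or 0 if absent */
   pos = index(text, 'STATUS=');

   /* SUBSTR skips past the 7-character key; SCAN takes the text up to ';' */
   if pos > 0 then status = scan(substr(text, pos + 7), 1, ';');

   /* SCAN alone pulls the n-th delimited piece of the string */
   third = scan(text, 3, ';');

   put pos= status= third=;
run;
```

INDEX is the natural choice when the position of a known pattern is needed, while SCAN is more convenient when the string has a regular delimiter.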

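If you have a data set that contains 100 variables, but you need only five of those, what is the code to force SAS to use only those variables?
Use the KEEP= data set option (on the DATA statement or the SET statement) or the KEEP statement in a DATA step. e.g.

   data fewdata;
      /* KEEP= on SET limits what is read into the PDV */
      set fulldata (keep=var1 var2 var3 var4 var5);
      /* alternatives that limit what is written instead:     */
      /*    data fewdata (keep=var1 var2 var3 var4 var5);     */
      /*    keep var1 var2 var3 var4 var5;                    */
   run;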

Code a PROC SORT on a data set containing State, District and County as the primary variables, along with several numeric variables.

   proc sort data=Dist_County;
      by state district county;
   run;

How would you delete duplicate observations?
Use the NODUPRECS option in PROC SORT.

   data cricket;
      input id country $9. score;
      cards;
   1 australia 342
   2 somerset 343
   1 australia 342
   2 somerset 341
   ;
   run;

   proc sort data=cricket noduprecs;
      by id;
   run;

In this example observations 1 and 3 are duplicate records, so observation 1 is retained and observation 3 is deleted. NODUPRECS compares each observation only with the previous one written, so duplicates are removed only when the sort places them next to each other.

How would you delete observations with duplicate keys?
Use the NODUPKEY option in PROC SORT.

   proc sort data=cricket nodupkey;
      by id;
   run;

In the above example observations 1 and 3, and observations 2 and 4, have duplicate key (variable id) values, i.e. 1 and 2 respectively, so observations 3 and 4 are deleted.

How would you code a merge that will keep only the observations that have matches from both sets?

   data mergeddata;
      merge one(in=A) two(in=B);
      by ID;
      if A and B;
   run;

How would you code a merge that will write the matches of both to one data set, the nonmatches from the left-most data set to a second, and the nonmatches from the right-most data set to a third?

   data one two three;
      merge DSN1 (in=A) DSN2 (in=B);
      by ID;
      if A and B then output one;
      if A and not B then output two;
      if not A and B then output three;
   run;

What is the Program Data Vector (PDV)? What are its functions?
The PDV is a logical area in memory where SAS builds a data set, one observation at a time. When a program executes, SAS reads data values from the input buffer or creates them by executing SAS language statements. The data values are assigned to the appropriate variables in the program data vector. From here, SAS writes the values to a SAS data set as a single observation.

Along with data set variables and computed variables, the PDV contains two automatic variables, _N_ and _ERROR_. The _N_ variable counts the number of times the DATA step begins to iterate. The _ERROR_ variable signals the occurrence of an error caused by the data during execution. The value of _ERROR_ is either 0 (indicating no errors exist), or 1 (indicating that one or more errors have occurred). SAS does not write these variables to the output data set.

Does SAS 'Translate' (compile) or does it 'Interpret'? Explain. At compile time when a SAS data set is read, what items are created?
SAS compiles the code that is submitted. When you submit a DATA step for execution, SAS checks the syntax of the SAS statements and compiles them, that is, automatically translates the statements into machine code. In this phase, SAS identifies the type and length of each new variable, and determines whether a type conversion is necessary for each subsequent reference to a variable. During the compile phase, SAS creates the following three items:
• input buffer: a logical area in memory into which SAS reads each record of raw data when SAS executes an INPUT statement. Note that this buffer is created only when the DATA step reads raw data. (When the DATA step reads a SAS data set, SAS reads the data directly into the program data vector.)
• program data vector (PDV): a logical area in memory where SAS builds a data set, one observation at a time. Along with data set variables and computed variables, it contains the automatic variables _N_ and _ERROR_, which SAS does not write to the output data set.
• descriptor information: information that SAS creates and maintains about each SAS data set, including data set attributes and variable attributes. It contains, for example, the name of the data set and its member type, the date and time that the data set was created, and the number, names and data types (character or numeric) of the variables.

The Execution Phase
By default, a simple DATA step iterates once for each observation that is being created. The flow of action in the execution phase of a simple DATA step is as follows:
• The DATA step begins with a DATA statement. Each time the DATA statement executes, a new iteration of the DATA step begins, and the _N_ automatic variable is incremented by 1.
• SAS sets the newly created program variables to missing in the program data vector (PDV). All the variables are assigned missing values (blank for character values, . for numeric values).
• SAS reads a data record from a raw data file into the input buffer, or it reads an observation from a SAS data set directly into the program data vector. You can use an INPUT, MERGE, SET, MODIFY, or UPDATE statement to read a record.
• SAS executes any subsequent programming statements for the current record.
• At the end of the statements, an output, return, and reset occur automatically. SAS writes an observation to the SAS data set, the system automatically returns to the top of the DATA step, and the values of variables created by INPUT and assignment statements are reset to missing in the program data vector. Note that variables that you read with a SET, MERGE, MODIFY, or UPDATE statement are not reset to missing here.
• SAS counts another iteration, reads the next record or observation, and executes the subsequent programming statements for the current observation.
• The DATA step terminates when SAS encounters the end-of-file in a SAS data set or a raw data file.
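One way to watch the execution phase described above is to write the contents of the PDV to the log with PUT _ALL_. The following is a small sketch with invented names (work.pdv_demo, x, twice, total) and made-up data values.

```sas
data work.pdv_demo;
   /* PDV as the iteration starts: x and twice have been reset to missing */
   put 'Top of iteration:    ' _all_;
   input x;
   twice = x * 2;     /* assignment: reset to missing on every iteration */
   total + x;         /* sum statement: total is retained automatically  */
   /* PDV just before the automatic output and return */
   put 'Bottom of iteration: ' _all_;
   datalines;
10
20
30
;
run;
```

The log lines should show _N_ increasing by 1 on each iteration, x and twice starting each iteration as missing, and total carrying its value forward. The first PUT also runs once more after the last data line, because the step ends only when INPUT tries to read past the end of the data.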

Name statements that are recognized at compile time only?
DROP, KEEP, RENAME, LABEL, FORMAT, INFORMAT, ATTRIB, WHERE, BY, RETAIN, LENGTH, ARRAY

Name statements that are execution only.
INFILE, INPUT, OUTPUT, CALL routines

Identify statements whose placement in the DATA step is critical.
DATA, INPUT, RUN, CARDS, INFILE, WHERE, LABEL, SELECT, INFORMAT, FORMAT

Name statements that function at both compile and execution time.
OPTIONS, TITLE, FOOTNOTE

In the flow of DATA step processing, what is the first action in a typical DATA step?
The DATA step begins with a DATA statement. Each time the DATA statement executes, a new iteration of the DATA step begins, and the _N_ automatic variable is incremented by 1.

What is _n_?
The _N_ variable counts the number of times the DATA step begins to iterate. It is one of the two automatic variables (the other being _ERROR_) that SAS provides in the PDV of a DATA step; procedures do not have it. Note that _N_ does not necessarily equal the observation number in a data set.

How do I convert a numeric variable to a character variable?
The data type of an existing variable cannot be changed within a DATA step, but its values can be converted. Create a new character variable, assign it the values of the numeric variable with the PUT function, drop the numeric variable, and rename the character variable to the numeric variable's name. Note: assigning the converted value directly to the original variable does not change its type, because the variable has already been defined as numeric.
http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000199354.htm#a000226452

How do I convert a character variable to a numeric variable?
The same restriction applies. Create a new numeric variable, assign it the values of the character variable with the INPUT function, drop the character variable, and rename the numeric variable to the character variable's name. Note: assigning the converted value directly to the original variable does not change its type, because the variable has already been defined as character.
http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000180357.htm
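Both conversions can be sketched in one short program. This is a variation on the steps described above: the old variables are renamed on the way in with the RENAME= data set option, so that the converted variables can take the original names directly. The data set names work.have and work.want, the variables id and amt, and the widths of the format and informat are assumptions made for the example.

```sas
/* Hypothetical input: id is numeric, amt is character */
data work.have;
   id  = 1042;
   amt = '1,250.75';
run;

data work.want;
   /* rename the originals so their names are free for the new variables */
   set work.have (rename=(id=id_num amt=amt_char));
   id  = left(put(id_num, 8.));       /* numeric -> character with PUT   */
   amt = input(amt_char, comma12.);   /* character -> numeric with INPUT */
   drop id_num amt_char;
run;

proc contents data=work.want;
run;
```

PROC CONTENTS should then list id as character and amt as numeric. The format given to PUT and the informat given to INPUT need to match how the values actually look; 8. and COMMA12. are just plausible choices here.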

find more @ http://sastechies.blogspot.com/
