# Question 1)

We first create a library using “libname acf703 'C:\Users\Nitish\Desktop\GP1';”. The step “data a; set acf703.Grouppresentation; where fyear = 2003; run;” then reads the Grouppresentation data set from that library into a working data set ‘a’, keeping only the observations for fiscal year 2003.

“proc sort data = a; by sich; run;” sorts the data by the desired key, i.e. sich/sic2/sic3.

“proc freq data = a; tables sich / out=counts; run;” counts how many observations each industry code has, so that the industries with more than 10 observations in the sample can be identified. This requires two separate frequency counts. In stage 1, we count the number of observations per industry code (tables sich, or tables sic2 if that key is used) and save the result in the data set ‘counts’. In stage 2, we count the number of counts from stage 1 (tables count), i.e. how many industry codes share each count.

Now we create a new data set ‘b’ to narrow down on the selected industries from the result of the above steps, using “data b; set work.a; where sich = 2834 or sich = 2836; run;”.

We then use ‘proc freq’ to give the frequency of the observations selected in data set b, using “proc freq data = b; tables sich / out = counts; run;”.

Result:
The SAS System

| Standard Industrial Classification Historical | Frequency Count | Percent of Total Frequency |
|---|---|---|
| 2834 | 84 | 58.7413 |
| 2836 | 59 | 41.2587 |
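
The two-stage count described in Question 1 can be sketched as follows. This is only an illustrative sketch: it assumes the data set ‘a’ and the output data set ‘counts’ named in the text, and the final ‘proc print’ step is an added convenience that is not part of the original program.

```sas
/* Stage 1: number of observations per industry code in data set a.  */
/* The OUT= data set holds sich together with COUNT and PERCENT.     */
proc freq data = a noprint;
    tables sich / out = counts;
run;

/* Stage 2: tabulate COUNT itself, i.e. how many industry codes      */
/* have 1 observation, 2 observations, and so on.                    */
proc freq data = counts;
    tables count;
run;

/* Added for illustration: list only the industry codes with more    */
/* than 10 observations.                                             */
proc print data = counts;
    where count > 10;
run;
```

The listing from the last step is one way to read off candidate industries, such as 2834 and 2836, before building data set b.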

# Question 2)

We now create a new data set ‘c’ from the work data set b to refine the data of the two industries. All earnings must be positive, so we add an ‘if’ statement: “data c; set work.b; if mdfy1 > 0 and mdfy2 > 0 and actual > 0 and ni > 0;”. The first three EPS metrics (ieps1, ieps2 and iepsa) are from IBES: “ieps1 = mdfy1; ieps2 = mdfy2; iepsa = actual;”. The last two are based on Compustat data and are adjusted for stock splits: “ceps1 = epspx / ajex; ceps2 = ni / (ajex*cshpri);”. BPS is calculated using “bps = ceq / (ajex*csho);”. We close the data step with “run;” and hit Run.
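
To make the split adjustment concrete, the short sketch below applies the same three formulas to a single made-up observation. The data set name ‘example’ and every input value are hypothetical and serve only to show the arithmetic; they are not taken from the sample.

```sas
/* Hypothetical one-firm example: all input values are invented.     */
data example;
    input epspx ni ceq ajex cshpri csho;
    ceps1 = epspx / ajex;             /* reported EPS divided by the adjustment factor    */
    ceps2 = ni / (ajex * cshpri);     /* net income over adjusted shares used for EPS     */
    bps   = ceq / (ajex * csho);      /* common equity over adjusted shares outstanding   */
    datalines;
3.00 150 600 2 50 50
;
run;

proc print data = example;
run;
```

With an adjustment factor of 2, the sketch gives ceps1 = 3.00 / 2 = 1.5, ceps2 = 150 / (2 × 50) = 1.5 and bps = 600 / (2 × 50) = 6.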

We now use ‘proc univariate’ to check the descriptive statistics: “proc univariate data = c noprint; var ieps1 ieps2 iepsa ceps1 ceps2 bps p4; output out = descript n = n1-n7 mean = mn1-mn7 median = md1-md7 std = std1-std7 min = min1-min7 p1 = p11-p17 q1 = q11-q17 q3 = q31-q37 p99 = p991-p997 max = max1-max7 sum = sum1-sum7; run;” This procedure contains details about both industries from the sample.

Now we sort the above data by industry, wherein we segregate the two merged data, using “proc sort data = c; by sich; run; proc univariate data = c; by sich; var ieps1 ieps2 iepsa ceps1 ceps2 bps p4; output out = descrip n = n1-n7 mean = mn1-mn7 median = md1-md7 std = std1-std7 min = min1-min7 p1 = p11-p17 q1 = q11-q17 q3 = q31-q37 p99 = p991-p997 max = max1-max7 sum = sum1-sum7; run;”