You are on page 1of 9

1) Import all the 4 files in SAS data environment (8 Mark)

/*Importing the file*/


FILENAME REFFILE '/home/u61988860/Agent_Score.csv';

PROC IMPORT DATAFILE=REFFILE


DBMS=CSV
OUT=Agent_S;
GETNAMES=YES;
RUN;

FILENAME REFFILE '/home/u61988860/Third_Party.csv';

PROC IMPORT DATAFILE=REFFILE


DBMS=CSV
OUT=Third_party_S;
GETNAMES=YES;
RUN;

FILENAME REFFILE '/home/u61988860/Roll_Agent.csv';

PROC IMPORT DATAFILE=REFFILE


DBMS=CSV
OUT=Roll_agent_S;
GETNAMES=YES;
RUN;

FILENAME REFFILE '/home/u61988860/Online.csv';

PROC IMPORT DATAFILE=REFFILE


DBMS=CSV
OUT=Online_S;
GETNAMES=YES;
RUN;

Only mode of SAS was utilized in completing this project, thus I uploaded all the datasets to the
online directory and than imported all of them in SAS data environment using the above code.
2) Create one dataset from all the 4 dataset? (8 Mark)

Further instructions regarding this question was to first append three files except
Agent_Score, and then I was supposed to perform a Left join between the appended
dataset and Agent_score, which I did using the below mentioned code.

/* Appending the first three datasets*/

data Combined_data;
set Online_S Roll_agent_S Third_party_S;
run;

/* Doing a Left join for getting fraud score */

proc sql;
create table overall as
select * from Agent_S as x left join Combined_data as y
on x.AgentID = y.agentid;
quit;

Sample output:
3) Remove all unwanted ID variables? (2 Mark)

/* Data cleaning */
/* Droppin unwanted ids */

data overall_new (drop= hhid proposal_num);


set overall;
run;

/*Checking feature of data*/


proc contents data = overall_new varnum;
run;

Using this above mentioned code I dropped hhid and proposal_num.

4) Calculate annual premium for all customers? (4 Mark)

/* Calculate annual premium for all customers */

proc sql;
select sum(Premium) as sum_Premium from overall_new;
quit;

Using sql commands in SAS I calculated the overall sum of premiums for all the
customers.

Output:
5) Calculate age and tenure as of 31 July 2020 for all customers? (4 Marks)

For this calculation several format changes were done for the Dates in the dataset. After
bringing them in appropriate formats YRDIF function of SAS library was used to
calsulate the Age and Tenure. Variables utilized were dob and Policy_date.

/* age calculation */
data overall_new;
set overall_new;
Date_=put(dob,date9.);
PDate_ = put(policy_date,date9.);
run;

data overall_new;
set overall_new;
Date__=input(Date_,date9.);
PDate__=input(PDate_,date9.);
run;

proc sql;
create table age_pool as
select * from overall_new;
quit;

/* Age and Tenure of the Customer */

data age_calc;
set age_pool;
ref = '30Jul2020'd;
Age = floor(yrdif(Date__,ref,"AGE"));
Tenure_IN_Months = floor(yrdif(PDate__,ref,'30/360'))*12;
run;

Output:
Utilizing this code I added two extra columns to
the dataset.

6) Create a product name by using both


level of product information. And
product name should be representable i.e. no code should be present in
final product name? (4 Marks)

For doing this I utilized the SUBSTR function of SAS for removing the product id from
Product_lvl2 and than used CAT function to join the result with product_lvl1, thus
producing the desired Product names.

/* Extracting Product name */


data age_calc;
set age_calc;
Product_name = cat(product_lvl1,substr(product_lvl2,5));
run;

Output:
7) After doing clean up in your data, you have to calculate the distribution
of customers across product and policy status and interpret the result
(4+1 Marks)

For performing these analysis I used PROC SQL, and used the below mentioned code.

/* distribution of customers across product and policy status */

proc sql;
select policy_status,count(custid) as Number_of_Customer from age_calc group by
policy_status;
quit;

proc sql;
select Product_name,count(custid) as Number_of_Customer from age_calc group
by Product_name;
quit;

Output:

Interpration:

Major amount of policies of this company, is


under payment due, which is not a good
condition.

A large amont of policies have also lapsed so that


decreases companies liablities.

On the product based distribution, we can observe


that all products on average are performing similarly, but the best product turns out to be Term
Kishan.
8) Calculate Average annual premium for different payment mode and
interpret the result? (4+1 Marks)

For dealing with this problem, we have to observe that the premium that is mentioned wrt
every custid is not the annual premium, it is the premium that the customer pays at the
end of the due date. Thus for a customer with monthly payment mode, it shows the
monthly premium. Thus based on the payment mode, the premium column needs to be
updated to have annual premium in each row. Which I achieved using UPDATE
command of PROC SQL.

/* Calculate Average annual premium for different payment mode and interpret the
result */

proc sql;
update age_calc set premium = premium*4 where payment_mode= 'Quaterly';
update age_calc set premium = premium*2 where payment_mode= 'Semi Annual';
update age_calc set premium = premium*12 where payment_mode= 'Monthly';
select payment_mode,avg(premium) as Avg_premium from age_calc group by
payment_mode;
quit;

/* Interpration - It shows that annual payment_mode has lowest average premium


*/

Output:

Interpration: We can easily see that, customer has to pay the least anuual premium if the
payment mode is annual as opposed to the two times amount one pays for monthly payment
mode.
9) Calculate Average persistency score, no fraud score and tenure of
customers across product and policy status, and interpret the result?
(4+1 Marks)

Using PROC SQL,

/* Calculate Average persistency score, no fraud score and tenure of customers


across product
and policy status, and interpret the result */

proc sql;
select Product_name,avg(Persistency_score) as Avg_Persistency_score,
avg(NoFraud_score) as Avg_NoFraud_score,avg(Tenure_IN_Months) as
Avg_Tenure_IN_Months
from age_calc group by Product_name;
quit;

proc sql;
select policy_status,avg(Persistency_score) as Avg_Persistency_score,
avg(NoFraud_score) as Avg_NoFraud_score,avg(Tenure_IN_Months) as
Avg_Tenure_IN_Months
from age_calc group by policy_status;
quit;

/*

Output:

Interpration:

According to these measure,


Return 5D Plan and Term Jawan
are the best products based on
different scores.
10) Calculate Average age of customer across acquisition channel and
policy status, and interpret the result? (4+1 Marks)

/* Calculate Average age of customer across acquisition channel


and policy status, and interpret the result */

proc sql;
select acq_chnl,avg(age) as Avg_age from age_calc group by acq_chnl;
select policy_status,avg(age) as Avg_age from age_calc group by
policy_status;
quit;

/*

Using this we calculated the avg age grouped by acquiasation channel and policy status.

Output:

Interception:

Though given all the differences, average age


across every field is approximetely same.

You might also like