Professional Documents
Culture Documents
Only mode of SAS was utilized in completing this project, thus I uploaded all the datasets to the
online directory and than imported all of them in SAS data environment using the above code.
2) Create one dataset from all the 4 dataset? (8 Mark)
Further instructions regarding this question was to first append three files except
Agent_Score, and then I was supposed to perform a Left join between the appended
dataset and Agent_score, which I did using the below mentioned code.
data Combined_data;
set Online_S Roll_agent_S Third_party_S;
run;
proc sql;
create table overall as
select * from Agent_S as x left join Combined_data as y
on x.AgentID = y.agentid;
quit;
Sample output:
3) Remove all unwanted ID variables? (2 Mark)
/* Data cleaning */
/* Droppin unwanted ids */
proc sql;
select sum(Premium) as sum_Premium from overall_new;
quit;
Using sql commands in SAS I calculated the overall sum of premiums for all the
customers.
Output:
5) Calculate age and tenure as of 31 July 2020 for all customers? (4 Marks)
For this calculation several format changes were done for the Dates in the dataset. After
bringing them in appropriate formats YRDIF function of SAS library was used to
calsulate the Age and Tenure. Variables utilized were dob and Policy_date.
/* age calculation */
data overall_new;
set overall_new;
Date_=put(dob,date9.);
PDate_ = put(policy_date,date9.);
run;
data overall_new;
set overall_new;
Date__=input(Date_,date9.);
PDate__=input(PDate_,date9.);
run;
proc sql;
create table age_pool as
select * from overall_new;
quit;
data age_calc;
set age_pool;
ref = '30Jul2020'd;
Age = floor(yrdif(Date__,ref,"AGE"));
Tenure_IN_Months = floor(yrdif(PDate__,ref,'30/360'))*12;
run;
Output:
Utilizing this code I added two extra columns to
the dataset.
For doing this I utilized the SUBSTR function of SAS for removing the product id from
Product_lvl2 and than used CAT function to join the result with product_lvl1, thus
producing the desired Product names.
Output:
7) After doing clean up in your data, you have to calculate the distribution
of customers across product and policy status and interpret the result
(4+1 Marks)
For performing these analysis I used PROC SQL, and used the below mentioned code.
proc sql;
select policy_status,count(custid) as Number_of_Customer from age_calc group by
policy_status;
quit;
proc sql;
select Product_name,count(custid) as Number_of_Customer from age_calc group
by Product_name;
quit;
Output:
Interpration:
For dealing with this problem, we have to observe that the premium that is mentioned wrt
every custid is not the annual premium, it is the premium that the customer pays at the
end of the due date. Thus for a customer with monthly payment mode, it shows the
monthly premium. Thus based on the payment mode, the premium column needs to be
updated to have annual premium in each row. Which I achieved using UPDATE
command of PROC SQL.
/* Calculate Average annual premium for different payment mode and interpret the
result */
proc sql;
update age_calc set premium = premium*4 where payment_mode= 'Quaterly';
update age_calc set premium = premium*2 where payment_mode= 'Semi Annual';
update age_calc set premium = premium*12 where payment_mode= 'Monthly';
select payment_mode,avg(premium) as Avg_premium from age_calc group by
payment_mode;
quit;
Output:
Interpration: We can easily see that, customer has to pay the least anuual premium if the
payment mode is annual as opposed to the two times amount one pays for monthly payment
mode.
9) Calculate Average persistency score, no fraud score and tenure of
customers across product and policy status, and interpret the result?
(4+1 Marks)
proc sql;
select Product_name,avg(Persistency_score) as Avg_Persistency_score,
avg(NoFraud_score) as Avg_NoFraud_score,avg(Tenure_IN_Months) as
Avg_Tenure_IN_Months
from age_calc group by Product_name;
quit;
proc sql;
select policy_status,avg(Persistency_score) as Avg_Persistency_score,
avg(NoFraud_score) as Avg_NoFraud_score,avg(Tenure_IN_Months) as
Avg_Tenure_IN_Months
from age_calc group by policy_status;
quit;
/*
Output:
Interpration:
proc sql;
select acq_chnl,avg(age) as Avg_age from age_calc group by acq_chnl;
select policy_status,avg(age) as Avg_age from age_calc group by
policy_status;
quit;
/*
Using this we calculated the avg age grouped by acquiasation channel and policy status.
Output:
Interception: