
Data workout SAS code

Complete SAS code for workout


Data treatment demonstration
• Converting categorical variables into indicator variables
• Missing value treatment
• Knowing when we don't need to create indicator variables for a categorical variable
(Note: a categorical variable is either a character variable or a numeric variable whose values are codes without numeric meaning, e.g. sector 1, 2, etc.)
/* Proc Means to check the values of variable TargetD for distinct values of variable
TargetB */

proc means data=shan.gift;
var TargetD;
class TargetB;
run;

The MEANS Procedure

Analysis Variable : TargetD Target Gift Amount

Target Gift
Flag           N Obs       N          Mean       Std Dev       Minimum       Maximum
--------------------------------------------------------------------------------------------
0               4843       0             .             .             .             .
1               4843    4843    15.6243444    12.4451371     1.0000000   200.0000000
--------------------------------------------------------------------------------------------
/* Checking for missing value count*/
data chk;
set shan.gift;
where GiftAvgCard36=.;
run;

NOTE: There were 1780 observations read from the data set SHAN.GIFT.
WHERE GiftAvgCard36=.;
NOTE: The data set WORK.CHK has 1780 observations and 28 variables.
NOTE: DATA statement used (Total process time):
real time 0.05 seconds
cpu time 0.01 seconds
/*missing values for age*/
data chk;
set shan.gift;
where DemAge=.;
run;

NOTE: There were 2407 observations read from the data set SHAN.GIFT.
WHERE DemAge=.;
NOTE: The data set WORK.CHK has 2407 observations and 28 variables.
NOTE: DATA statement used (Total process time):
real time 0.09 seconds
cpu time 0.02 seconds
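
The two checks above each write a scratch dataset just to read a count from the log. As a minimal sketch, the same counts can be read in one pass with the N and NMISS statistics of PROC MEANS. The NMISS column should agree with the log notes above (2407 for DemAge, 1780 for GiftAvgCard36).

```sas
/* Count non-missing (N) and missing (NMISS) values for both variables in one pass */
proc means data=shan.gift n nmiss;
    var DemAge GiftAvgCard36;
run;
```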
/*missing value treatment*/

data shan.gift1;
set shan.gift;
if DemAge=. then
do;
DemAge=0;
flagDemAge=1;
end;
else flagDemage=0;
if GiftAvgCard36=. then
do;
GiftAvgCard36=0;
flagGiftAvgCard36=1;
end;
else flagGiftAvgCard36=0;
run;

NOTE: There were 9686 observations read from the data set SHAN.GIFT.
NOTE: The data set SHAN.GIFT1 has 9686 observations and 30 variables.
NOTE: DATA statement used (Total process time):
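
A quick way to confirm the treatment did what was intended, sketched with standard procedures: the flag counts should equal the missing counts found earlier (2407 and 1780), and neither treated variable should have missing values left.

```sas
/* Each flag should be 1 exactly where the original value was missing */
proc freq data=shan.gift1;
    tables flagDemAge flagGiftAvgCard36;
run;

/* NMISS should now be 0 for both treated variables */
proc means data=shan.gift1 n nmiss min;
    var DemAge GiftAvgCard36;
run;
```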
/* Test the association between each character (categorical) variable and the dependent variable with a chi-square test */

proc freq data=shan.gift1;
tables StatusCat96NK*TargetB/chisq;
run;

The FREQ Procedure

Table of StatusCat96NK (Status Category 96NK) by TargetB (Target Gift Flag)
Columns 0 and 1 are the values of TargetB. Freq = frequency, Pct = percent of total, Row% = row percent, Col% = column percent.

Status    Freq 0  Freq 1   Total   Pct 0   Pct 1 Pct Tot  Row% 0  Row% 1  Col% 0  Col% 1
A           3032    2794    5826   31.30   28.85   60.15   52.04   47.96   62.61   57.69
E             95     132     227    0.98    1.36    2.34   41.85   58.15    1.96    2.73
F            403     257     660    4.16    2.65    6.81   61.06   38.94    8.32    5.31
L             17      17      34    0.18    0.18    0.35   50.00   50.00    0.35    0.35
N            310     264     574    3.20    2.73    5.93   54.01   45.99    6.40    5.45
S            986    1379    2365   10.18   14.24   24.42   41.69   58.31   20.36   28.47
Total       4843    4843    9686   50.00   50.00  100.00

Statistics for Table of StatusCat96NK by TargetB

Statistic                        DF       Value      Prob
Chi-Square                        5    117.0430    <.0001
Likelihood Ratio Chi-Square       5    117.6493    <.0001
Mantel-Haenszel Chi-Square        1     50.8320    <.0001
Phi Coefficient                          0.1099
Contingency Coefficient                  0.1093
Cramer's V                               0.1099

Sample Size = 9686

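/* Combine E and S from the cross-tabulation output: their TargetB = 1 row percents are close (58.15 and 58.31) */
/* A, L and N can also be combined (row percents 47.96, 50.00 and 45.99) to reduce the number of distinct
categories in the categorical variable; F gets no indicator and acts as the reference group */
/* Remove the categorical variable from the dataset once the indicator variables are created */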
data shan.gift1;
set shan.gift1;
ind_stcat96nk_E_or_S=0;
ind_stcat96nk_A_L_N=0;
if StatusCat96NK in ('E','S') then ind_stcat96nk_E_or_S=1;
else if StatusCat96NK in ('A','L','N') then ind_stcat96nk_A_L_N=1;
run;

proc freq data=shan.gift1;
tables ind_stcat96nk_E_or_S*StatusCat96NK
ind_stcat96nk_A_L_N*StatusCat96NK/norow nocol nocum
nopercent chisq;
run;

The FREQ Procedure

Table of ind_stcat96nk_E_or_S by StatusCat96NK (Status Category 96NK), frequencies

ind_stcat96nk_E_or_S       A       E       F       L       N       S   Total
0                       5826       0     660      34     574       0    7094
1                          0     227       0       0       0    2365    2592
Total                   5826     227     660      34     574    2365    9686

Statistics for Table of ind_stcat96nk_E_or_S by StatusCat96NK

Statistic                        DF         Value      Prob
Chi-Square                        5     9686.0000    <.0001
Likelihood Ratio Chi-Square       5    11252.4170    <.0001
Mantel-Haenszel Chi-Square        1     6831.6503    <.0001
Phi Coefficient                            1.0000
Contingency Coefficient                    0.7071
Cramer's V                                 1.0000

Sample Size = 9686

Table of ind_stcat96nk_A_L_N by StatusCat96NK (Status Category 96NK), frequencies

ind_stcat96nk_A_L_N        A       E       F       L       N       S   Total
0                          0     227     660       0       0    2365    3252
1                       5826       0       0      34     574       0    6434
Total                   5826     227     660      34     574    2365    9686

Statistics for Table of ind_stcat96nk_A_L_N by StatusCat96NK

Statistic                        DF         Value      Prob
Chi-Square                        5     9686.0000    <.0001
Likelihood Ratio Chi-Square       5    12362.6467    <.0001
Mantel-Haenszel Chi-Square        1     6385.9061    <.0001
Phi Coefficient                            1.0000
Contingency Coefficient                    0.7071
Cramer's V                                 1.0000

Sample Size = 9686
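
Once the cross-tabs confirm the indicators, the original categorical variable can be removed, as the comment before the DATA step asks. A minimal sketch using the DROP= dataset option; the output name work.gift_model is hypothetical and chosen only so the example does not overwrite shan.gift1, which the later steps keep using.

```sas
/* Copy the data without the original categorical variable.          */
/* work.gift_model is a hypothetical name, not part of the workout.  */
data work.gift_model;
    set shan.gift1 (drop=StatusCat96NK);
run;
```

Other categorical variables can be added to the DROP= list in the same way once their indicators exist.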

/* Creating indicator variables for the DemCluster variable */

/* First run the chi-square test */

proc freq data=shan.gift1;
tables Demcluster*targetB/chisq;
run;

/* Output of the chi-square test */

The FREQ Procedure

Table of DemCluster (Demographic Cluster) by TargetB (Target Gift Flag)
Columns 0 and 1 are the values of TargetB. Freq = frequency, Pct = percent of total, Row% = row percent, Col% = column percent.

Cluster   Freq 0  Freq 1   Total   Pct 0   Pct 1 Pct Tot  Row% 0  Row% 1  Col% 0  Col% 1
00           100     140     240    1.03    1.45    2.48   41.67   58.33    2.06    2.89
01            54      67     121    0.56    0.69    1.25   44.63   55.37    1.12    1.38
02            92      99     191    0.95    1.02    1.97   48.17   51.83    1.90    2.04
03            68      85     153    0.70    0.88    1.58   44.44   55.56    1.40    1.76
04            21      30      51    0.22    0.31    0.53   41.18   58.82    0.43    0.62
05            48      47      95    0.50    0.49    0.98   50.53   49.47    0.99    0.97
06            31      22      53    0.32    0.23    0.55   58.49   41.51    0.64    0.45
07            34      44      78    0.35    0.45    0.81   43.59   56.41    0.70    0.91
08           102      80     182    1.05    0.83    1.88   56.04   43.96    2.11    1.65
09            36      34      70    0.37    0.35    0.72   51.43   48.57    0.74    0.70
10           106      69     175    1.09    0.71    1.81   60.57   39.43    2.19    1.42
11           109     127     236    1.13    1.31    2.44   46.19   53.81    2.25    2.62
12           163     160     323    1.68    1.65    3.33   50.46   49.54    3.37    3.30
13           138     171     309    1.42    1.77    3.19   44.66   55.34    2.85    3.53
14           121     127     248    1.25    1.31    2.56   48.79   51.21    2.50    2.62
15            56      52     108    0.58    0.54    1.12   51.85   48.15    1.16    1.07
16            97     104     201    1.00    1.07    2.08   48.26   51.74    2.00    2.15
17            92      86     178    0.95    0.89    1.84   51.69   48.31    1.90    1.78
18           153     168     321    1.58    1.73    3.31   47.66   52.34    3.16    3.47
19            25      25      50    0.26    0.26    0.52   50.00   50.00    0.52    0.52
20            78      93     171    0.81    0.96    1.77   45.61   54.39    1.61    1.92
21            91      74     165    0.94    0.76    1.70   55.15   44.85    1.88    1.53
22            60      65     125    0.62    0.67    1.29   48.00   52.00    1.24    1.34
23            60      71     131    0.62    0.73    1.35   45.80   54.20    1.24    1.47
24           185     216     401    1.91    2.23    4.14   46.13   53.87    3.82    4.46
25            71      64     135    0.73    0.66    1.39   52.59   47.41    1.47    1.32
26            49      51     100    0.51    0.53    1.03   49.00   51.00    1.01    1.05
27           165     166     331    1.70    1.71    3.42   49.85   50.15    3.41    3.43
28            85     109     194    0.88    1.13    2.00   43.81   56.19    1.76    2.25
29            33      40      73    0.34    0.41    0.75   45.21   54.79    0.68    0.83
30           153     109     262    1.58    1.13    2.70   58.40   41.60    3.16    2.25
31            63      62     125    0.65    0.64    1.29   50.40   49.60    1.30    1.28
32            45      27      72    0.46    0.28    0.74   62.50   37.50    0.93    0.56
33            26      26      52    0.27    0.27    0.54   50.00   50.00    0.54    0.54
34            64      68     132    0.66    0.70    1.36   48.48   51.52    1.32    1.40
35           182     202     384    1.88    2.09    3.96   47.40   52.60    3.76    4.17
36           216     185     401    2.23    1.91    4.14   53.87   46.13    4.46    3.82
37            56      43      99    0.58    0.44    1.02   56.57   43.43    1.16    0.89
38            53      65     118    0.55    0.67    1.22   44.92   55.08    1.09    1.34
39           118     124     242    1.22    1.28    2.50   48.76   51.24    2.44    2.56
40           197     235     432    2.03    2.43    4.46   45.60   54.40    4.07    4.85
41           113      84     197    1.17    0.87    2.03   57.36   42.64    2.33    1.73
42            67      73     140    0.69    0.75    1.45   47.86   52.14    1.38    1.51
43           123     104     227    1.27    1.07    2.34   54.19   45.81    2.54    2.15
44           111      74     185    1.15    0.76    1.91   60.00   40.00    2.29    1.53
45           123     105     228    1.27    1.08    2.35   53.95   46.05    2.54    2.17
46            92     104     196    0.95    1.07    2.02   46.94   53.06    1.90    2.15
47            52      34      86    0.54    0.35    0.89   60.47   39.53    1.07    0.70
48            48      48      96    0.50    0.50    0.99   50.00   50.00    0.99    0.99
49           175     148     323    1.81    1.53    3.33   54.18   45.82    3.61    3.06
50            35      35      70    0.36    0.36    0.72   50.00   50.00    0.72    0.72
51           119     101     220    1.23    1.04    2.27   54.09   45.91    2.46    2.09
52            19      13      32    0.20    0.13    0.33   59.38   40.63    0.39    0.27
53            70      88     158    0.72    0.91    1.63   44.30   55.70    1.45    1.82
Total       4843    4843    9686   50.00   50.00  100.00

Statistics for Table of DemCluster by TargetB

Statistic                        DF       Value      Prob
Chi-Square                       53     90.3768    0.0010
Likelihood Ratio Chi-Square      53     90.7359    0.0010
Mantel-Haenszel Chi-Square        1      8.5881    0.0034
Phi Coefficient                          0.0966
Contingency Coefficient                  0.0961
Cramer's V                               0.0966

Sample Size = 9686

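proc freq data=shan.gift1;
tables Demcluster*targetB/nocol nofreq nocum
nopercent chisq;
run;

/* Take this row-percent output to Excel, sort the clusters by row percent, and group clusters with
similar response rates. The indicator groups created next follow that ordering; clusters 00 and 04
(the two highest TargetB = 1 row percents) get no indicator and act as the reference group. */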
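
The sorting step can also be done without leaving SAS. A minimal sketch, assuming TargetB is a numeric 0/1 variable (consistent with the `sum(1*targetB)` used later in the workout), so that its mean within a cluster is the response rate; the column aliases are illustrative.

```sas
/* Response rate per cluster, sorted so that similar clusters sit next to each other */
proc sql;
    select DemCluster,
           count(*)      as n_obs,
           mean(TargetB) as resp_rate format=percent8.2
    from shan.gift1
    group by DemCluster
    order by resp_rate;
quit;
```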

data shan.gift1;
set shan.gift1;
ind_demclus_1=0;
ind_demclus_2=0;
ind_demclus_3=0;
ind_demclus_4=0;
/* Each IN list is one group of clusters with similar response rates */
if DemCluster in ('32','10','47','44','52','06','30') then ind_demclus_1 = 1;
else if DemCluster in ('41','37','08','21','43','49','51','45','36') then ind_demclus_2 = 1;
else if DemCluster in ('25','15','17','09','05','12','31','19','33','48','50','27','26','14','39','34','16','02')
then ind_demclus_3 = 1;
else if DemCluster in ('22','42','18','35','46','11','24','23','20','40','29','38','13','01','03','53','28','07')
then ind_demclus_4 = 1;
run;
/* DemGender variable significance */

proc freq data=shan.gift1;
tables demgender*targetb/ chisq;
run;

The FREQ Procedure

Table of DemGender (Gender) by TargetB (Target Gift Flag)
Columns 0 and 1 are the values of TargetB. Freq = frequency, Pct = percent of total, Row% = row percent, Col% = column percent.

Gender    Freq 0  Freq 1   Total   Pct 0   Pct 1 Pct Tot  Row% 0  Row% 1  Col% 0  Col% 1
M           1963    1962    3925   20.27   20.26   40.52   50.01   49.99   40.53   40.51
U            266     272     538    2.75    2.81    5.55   49.44   50.56    5.49    5.62
Total       4843    4843    9686   50.00   50.00  100.00

Statistics for Table of DemGender by TargetB

Statistic                        DF       Value      Prob
Chi-Square                        2      0.0720    0.9647
Likelihood Ratio Chi-Square       2      0.0720    0.9647
Mantel-Haenszel Chi-Square        1      0.0346    0.8524
Phi Coefficient                          0.0027
Contingency Coefficient                  0.0027
Cramer's V                               0.0027

Sample Size = 9686



/* DemGender is not significant (chi-square p = 0.9647), so converting it to indicator variables is not required */

/* DemHomeOwner variable significance */

proc freq data=shan.gift1;
tables demhomeowner*targetb/ chisq;
run;

The FREQ Procedure

Table of DemHomeOwner (Home Owner) by TargetB (Target Gift Flag)
Columns 0 and 1 are the values of TargetB. Freq = frequency, Pct = percent of total, Row% = row percent, Col% = column percent.

Owner     Freq 0  Freq 1   Total   Pct 0   Pct 1 Pct Tot  Row% 0  Row% 1  Col% 0  Col% 1
U           2174    2135    4309   22.44   22.04   44.49   50.45   49.55   44.89   44.08
Total       4843    4843    9686   50.00   50.00  100.00

Statistics for Table of DemHomeOwner by TargetB

Statistic                        DF       Value      Prob
Chi-Square                        1      0.6359    0.4252
Likelihood Ratio Chi-Square       1      0.6359    0.4252
Continuity Adj. Chi-Square        1      0.6037    0.4372
Mantel-Haenszel Chi-Square        1      0.6358    0.4252
Phi Coefficient                         -0.0081
Contingency Coefficient                  0.0081
Cramer's V                              -0.0081

/* DemHomeOwner is not significant (chi-square p = 0.4252), so converting it to indicator variables is not required */

Multicollinearity treatment

• Randomly dividing the dataset into two parts: one for development and one for validation of the model
• Multicollinearity treatment steps
o Knowing the individual strength of each variable in explaining the dependent variable
o Knowing which variables have high multicollinearity
o Knowing which variables each of them is collinear with
o Deciding which one to keep among collinear variables
/* Dividing the data into development (test) and validation (val) datasets, roughly 70/30 */

data test val ;
set shan.gift1;
if ranuni(1)<=0.7 then output test;
else output val;
run;

/* log file*/

NOTE: There were 9686 observations read from the data set SHAN.GIFT1.
NOTE: The data set WORK.TEST has 6793 observations and 36 variables.
NOTE: The data set WORK.VAL has 2893 observations and 36 variables.
NOTE: DATA statement used (Total process time):
real time 0.07 seconds
cpu time 0.01 seconds
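
The RANUNI split gives approximately 70/30 (6793 and 2893 rows here). If an exact 70% sample is preferred, one alternative is PROC SURVEYSELECT. This is a sketch of a different technique, not the method used in this workout, and it will not reproduce the same rows as `ranuni(1)`. The dataset name `split` is illustrative.

```sas
/* Flag a 70% simple random sample; OUTALL keeps every row and adds Selected = 0/1 */
proc surveyselect data=shan.gift1 out=split method=srs samprate=0.7 seed=1 outall;
run;

/* Route flagged rows to development and the rest to validation */
data test val;
    set split;
    if Selected = 1 then output test;
    else output val;
    drop Selected;
run;
```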
/* Knowing bi-variate strength of the independent variables in explaining the dependent
variable*/
proc logistic data = test ;
model targetB =
DemAge
DemMedHomeValue
DemMedIncome
DemPctVeterans
GiftAvg36
GiftAvgAll
GiftAvgCard36
GiftAvgLast
GiftCnt36
GiftCntAll
GiftCntCard36
GiftCntCardAll
GiftTimeFirst
GiftTimeLast
PromCnt12
PromCnt36
PromCntAll
PromCntCard12
PromCntCard36
PromCntCardAll
StatusCatStarAll
flagDemAge
flagGiftAvgCard36
ind_demclus_1
ind_demclus_2
ind_demclus_3
ind_demclus_4
ind_stcat96nk_A_L_N
ind_stcat96nk_E_or_S/selection = stepwise maxstep=1 details;
ods output EffectNotInModel = log_data ;
run;
/* Multicollinearity treatment, step 01
Note: some insignificant variables were also dropped based on bivariate strength */

proc reg data = test ;
model targetB =
DemAge
DemMedHomeValue
GiftAvg36
GiftAvgAll
GiftAvgLast
GiftCnt36
GiftCntAll
GiftCntCard36
GiftCntCardAll
GiftTimeFirst
GiftTimeLast
PromCnt12
PromCnt36
PromCntAll
PromCntCard12
PromCntCard36
PromCntCardAll
StatusCatStarAll
flagGiftAvgCard36
ind_demclus_1
ind_demclus_2
ind_demclus_4
ind_stcat96nk_A_L_N
ind_stcat96nk_E_or_S
/ vif collin;
ods output CollinDiag = collin_data (drop = intercept) ParameterEstimates = para_data;
run;
/* In the first cycle of VIF we removed PromCntCardAll */
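
To see which variable to examine in each cycle, the saved parameter estimates can be sorted by VIF. A minimal sketch, assuming the ParameterEstimates table stores the VIF in a column named VarianceInflation and the variable name in Variable; running `proc contents data=para_data;` first confirms the actual column names. The dataset name `vif_sorted` is illustrative.

```sas
/* Largest variance inflation factors first */
proc sort data=para_data out=vif_sorted;
    by descending VarianceInflation;
run;

proc print data=vif_sorted (obs=10);
    var Variable VarianceInflation;
run;
```

After removing a variable, PROC REG is rerun on the reduced list and the check repeated until no VIF stands out.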

/* Take a close look at the complete multicollinearity removal output */


Final model development
• Creating a model from the variables left after multicollinearity treatment
o Checking model estimates and variable significance
o Selecting the best variables using stepwise regression
o Generating variable coefficients on the validation dataset
o Checking coefficient stability
/*Model with 10 variables to show the significance of all the variables coming after
multicollinearity test*/
proc logistic data = test outmodel = model_1;
model targetb(event = '1') =
DemAge
DemMedHomeValue
GiftAvg36
GiftCnt36
GiftCntCardAll
GiftTimeLast
ind_demclus_1
ind_demclus_2
ind_demclus_4
ind_stcat96nk_E_or_S;
run;
/*final model variable selection using step wise regression */
proc logistic data = test outmodel = model_1;
model targetb(event = '1') =
DemAge
DemMedHomeValue
GiftAvg36
GiftCnt36
GiftCntCardAll
GiftTimeLast
ind_demclus_1
ind_demclus_2
ind_demclus_4
ind_stcat96nk_E_or_S/
selection = stepwise maxstep=8 details;
ods output EffectNotInModel = log_data ;
run;
/* Re-estimating the final model's variable coefficients on the validation dataset */

proc logistic data = val ;
model targetb(event = '1') =
DemMedHomeValue
GiftAvg36
GiftCnt36
GiftCntCardAll
GiftTimeLast
ind_demclus_1
ind_demclus_2
ind_demclus_4
;
run;

/* Take a look at coefficient stability worksheet */


Knowing model strength
• Obtaining other popular measures of model strength
o Generating the score in the development dataset
o Understanding what the score actually is
o Seeing how the score can also be computed manually
o Calculating the KS statistic on the development data
o Checking the scoring ability of the model by calculating the KS statistic on the validation data
(using the model developed on the development data)
/* Keeping model coefficients in a dataset */

proc logistic data = test outmodel = model_1;
model targetb(event = '1') =
DemMedHomeValue
GiftAvg36
GiftCnt36
GiftCntCardAll
GiftTimeLast
ind_demclus_1
ind_demclus_2
ind_demclus_4
;
run;

/* Generating the score in the test data */

proc logistic inmodel = model_1;
score data= test out = predicted;
run;

/* Proc contents of the scored data, just to see what extra fields were added */

proc contents data=predicted;
run;

/* Understand how proc logistic generates the score in the dataset,
and what the score actually is */

data predicted;
set predicted;
P_0_D = round(P_0*1000,0.1);
log_odds=0.2751 +
DemMedHomeValue*9.425E-7 +
GiftAvg36*-0.00915 +
GiftCnt36*0.0847 +
GiftCntCardAll*0.0273 +
GiftTimeLast*-0.0362 +
ind_demclus_1*-0.3611 +
ind_demclus_2*-0.2279 +
ind_demclus_4*0.1434 ;
prob=exp(log_odds)/(1+exp(log_odds));

run;

/* P_0 = predicted probability of '0' from the model
P_1 = predicted probability of '1'
prob = exp(log_odds)/(1+exp(log_odds)) is the probability derived manually from the equation;
it should be almost equal to P_1 in the predicted dataset (the coefficients typed above are rounded)
P_0_D = P_0 multiplied by 1000 and rounded to 0.1, to make it easier for the user to read */

proc print data=predicted (obs=50);
var DemMedHomeValue
GiftAvg36
GiftCnt36
GiftCntCardAll
GiftTimeLast
ind_demclus_1
ind_demclus_2
ind_demclus_4
P_0
P_1
log_odds
Prob
P_0_D;
run;
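
Rather than comparing prob and P_1 by eye in the printout, the gap can be summarised over all rows. A minimal sketch; the names `score_check` and `abs_diff` are illustrative. Because the coefficients in log_odds are rounded, the maximum difference is expected to be small but not exactly zero.

```sas
/* Absolute gap between the manual probability and the PROC LOGISTIC score */
data score_check;
    set predicted;
    abs_diff = abs(prob - P_1);
run;

proc means data=score_check n mean max;
    var abs_diff;
run;
```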

/* Creates ten deciles (groups 0 to 9) for the score variable in the dataset.
The deciles are in ascending order of P_0.
Note that a lower value of P_0 is the same as a higher value of P_1, hence more outcomes = 1 */

proc rank data=predicted out=practice group=10 ties=low ;
var P_0_D;
ranks P_Final;
run;
/* Check what the actual score and the ranked variable look like */
proc print data=practice(obs=50);
var P_0_D P_Final ;
run;

/* Getting figures to calculate KS and Gini in the development dataset */

proc sql ;
select P_final, min(P_0_D) as Min_score, max(P_0_D) as Max_score,
sum(1*targetB) as responder, count(targetB) as population
from practice
group by P_final
order by P_final
;
quit;
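
The decile figures above are enough to compute KS: it is the largest gap between the cumulative share of responders and the cumulative share of non-responders across the deciles. A minimal sketch of that calculation in SAS; the names `decile`, `ks_table`, `tot_resp` and `tot_nonresp` are illustrative and not part of the original workout.

```sas
/* Decile-level counts, in the same order as the report above */
proc sql;
    create table decile as
    select P_Final,
           sum(targetB) as responder,
           count(*)     as population
    from practice
    group by P_Final
    order by P_Final;

    /* Overall totals, stored in macro variables */
    select sum(responder), sum(population - responder)
        into :tot_resp, :tot_nonresp
    from decile;
quit;

/* Running cumulative shares and their gap per decile */
data ks_table;
    set decile;
    cum_resp    + responder;
    cum_nonresp + (population - responder);
    pct_resp    = cum_resp / &tot_resp;
    pct_nonresp = cum_nonresp / &tot_nonresp;
    ks_gap      = abs(pct_resp - pct_nonresp);
run;

/* KS is the largest gap */
proc sql;
    select max(ks_gap) as KS format=8.4 from ks_table;
quit;
```

This needs to run before `practice` is overwritten by the validation scoring; the same steps then give the validation KS.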

/* Scoring the validation dataset using coefficients obtained on the development data */

proc logistic inmodel = model_1;
score data= val out = predicted;
run;

data predicted;
set predicted;
P_0_D = round(P_0*1000,0.1);
run;

proc rank data=predicted out=practice group=10 ties=low ;
var P_0_D;
ranks P_Final;
run;

proc print data=practice(obs=50);
var P_0_D P_Final ;
run;

proc sql ;
select P_final, min(P_0_D) as Min_score, max(P_0_D) as Max_score,
sum(1*targetB) as responder, count(targetB) as population
from practice
group by P_final
order by P_final
;
quit;
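
As a cross-check on a decile-based KS, the two-sample Kolmogorov-Smirnov statistic can be requested directly with the EDF option of PROC NPAR1WAY. A minimal sketch on the validation scores held in `predicted` at this point. It works on the raw scores rather than ten groups, so the reported D is usually at least as large as a KS computed from deciles.

```sas
/* Two-sample KS test of the score distribution for TargetB = 0 versus TargetB = 1 */
proc npar1way data=predicted edf;
    class targetB;
    var P_1;
run;
```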
