
20 marks objective test

10 marks ppt
20 marks project
UCI website for data mining
google data for data mining
data dictionary where all variables are defined
variable names are in German
case of logistic regression, so the answer will be in the form: is the person a good person (will repay the amount)?
Data is categorical so create a dummy variable
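A minimal sketch of dummy coding, assuming a categorical column with a handful of levels and one level chosen as the reference. The column name `purpose`, its values, and the helper `dummy_code` are hypothetical and only for illustration.

```python
def dummy_code(values, reference):
    """Return one 0/1 column per category, skipping the reference category."""
    categories = sorted(set(values) - {reference})
    return {
        f"is_{cat}": [1 if v == cat else 0 for v in values]
        for cat in categories
    }

# Hypothetical categorical column with three levels; level 0 is the reference.
purpose = [0, 1, 2, 1, 0]
dummies = dummy_code(purpose, reference=0)
print(dummies)
```

A column with k levels gives k - 1 dummy columns; a row of all zeros is the reference level.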
Bi variate profiling next class
duration is scale, amount of credit is continuous
create a new column called random =RANDBETWEEN(1,10)
save and close
spss
Convert duration to a scale variable; hoehe and alter are also scale variables
Analyze > Regression > Binary Logistic
dependent variable is credit
balance and every other variable except random is a covariate (credit is the dependent variable)
random is the selection variable
1) Click on Rule and choose "less than or equal to" 7. Rows whose random number is less than or equal to 7 become the training data and the remaining rows are the validation data.
2) There are two types of data: training data and validation data.
3) Click on Categorical, select all variables, then unselect duration, hoehe and alter.
4) Click on First as the reference category; the other categories are compared to it. The reference is 0, which means NO.
5) Click on Save and, under Predicted Values, click on Probabilities.
6) Click on Options and click on Hosmer-Lemeshow.
7) Method is Forward: LR, which works like the stepwise method in linear regression.
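The random column and the "less than or equal to 7" rule can be sketched outside SPSS as follows. This is only an illustration: the function name `assign_split`, the seed, and the row count of 1000 are assumptions, and `random.randint(1, 10)` stands in for Excel's RANDBETWEEN(1,10).

```python
import random

def assign_split(n_rows, cutoff=7, seed=0):
    """Draw an integer 1..10 per row; rows with draw <= cutoff are training."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    draws = [rng.randint(1, 10) for _ in range(n_rows)]
    training = [i for i, r in enumerate(draws) if r <= cutoff]
    validation = [i for i, r in enumerate(draws) if r > cutoff]
    return draws, training, validation

# Hypothetical data set size; roughly 70% of rows should land in training.
draws, training, validation = assign_split(1000)
print(len(training), len(validation))
```

Because the draws are random, the split is only approximately 70/30, which is why the selected and unselected case counts in the output are not round numbers.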
Analysis
case processing summary
736 cases selected (random <= 7, the training data)
dependent variable is 0 or 1
categorical variable coding: 0 is taken as the reference (dummy) category, which means the model is built without it. The table shows how the different variables are coded in categorical form.
Beginning block
the model predicts 69.6% correctly in selected cases and 66.7% in unselected cases
Variables in the Equation
Block 1: Method = Forward Stepwise. The method tells the computer how to run the logistic regression.
Model summary
-2 log likelihood: the value becomes smaller as the step number increases, so the last model has the lowest value
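To make the -2 log likelihood concrete, here is a small sketch that computes it for a 0/1 outcome and a list of predicted probabilities. The outcomes and probabilities are made-up numbers, not values from the SPSS output, and `minus_two_log_likelihood` is a hypothetical helper name.

```python
import math

def minus_two_log_likelihood(y_true, p_pred):
    """-2 * sum of y*log(p) + (1-y)*log(1-p); p must lie strictly in (0, 1)."""
    log_lik = 0.0
    for y, p in zip(y_true, p_pred):
        log_lik += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -2.0 * log_lik

# Made-up outcomes: 1 = good credit, 0 = bad credit.
y = [1, 0, 1, 1]
# A model with no predictors gives every case the overall share of 1s.
null_model = minus_two_log_likelihood(y, [0.75, 0.75, 0.75, 0.75])
# A model whose probabilities track the outcomes more closely.
better_model = minus_two_log_likelihood(y, [0.9, 0.2, 0.8, 0.7])
print(null_model, better_model)
```

The better-fitting probabilities give the smaller value, which is the pattern to look for across steps in the Model Summary table.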
The Hosmer-Lemeshow test tells how closely the expected cases match the observed cases.
The null hypothesis is observed = expected.
Reject the null hypothesis if p <= alpha.
The significance value is 0.969, so do not reject the null hypothesis.
Variables in the equation
all the significant variables that predict 0 and 1 in the model
Wald = (B / S.E.)^2; a high Wald value is good
now click on PRE_1, which is the answer (the saved predicted probability)
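As a quick check on the Wald column, the sketch below computes the statistic the way SPSS reports it for logistic regression, as the squared ratio of a coefficient to its standard error, and converts it to a p-value using a chi-square distribution with 1 degree of freedom. The coefficient and standard error are made-up numbers and the function names are hypothetical.

```python
import math

def wald_statistic(b, se):
    """Wald statistic as the squared ratio of coefficient to standard error."""
    return (b / se) ** 2

def wald_p_value(wald):
    """Upper-tail probability of a chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(wald / 2.0))

# Made-up coefficient and standard error, for illustration only.
w = wald_statistic(0.5, 0.2)
p = wald_p_value(w)
print(w, p)
```

A larger Wald value gives a smaller p-value, which is why a high Wald value counts as good evidence that the variable belongs in the model.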
to see whether the model has predicted well or not, use the ROC curve
Test variable is predicted prob
state variable is credit
value of state variable is 1
roc curve with diagonal reference line
area under the curve: the closer it is to 1, the better the model has predicted
ROC curve is sensitivity vs 1 - specificity
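A rough sketch of how the ROC curve and its area can be computed by hand, assuming credit = 1 is the positive state and the test variable is the predicted probability. The six rows of data are invented, and the helper names `roc_points` and `area_under_curve` are hypothetical.

```python
def roc_points(y_true, scores):
    """Return (1 - specificity, sensitivity) pairs, one per distinct cutoff."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= cutoff and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= cutoff and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Trapezoid rule over consecutive ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Invented example: state variable (credit) and predicted probability.
credit = [1, 1, 0, 1, 0, 0]
prob = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(credit, prob)
auc = area_under_curve(points)
print(auc)
```

An area of 0.5 matches the diagonal reference line (no better than guessing), and an area of 1 means every good case is scored above every bad case.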
click on File > Save As and save as an Excel file
open the excel file and apply filter
click on random and uncheck all except 8, 9, 10, since we have to apply the test in Excel; SPSS already used 1 to 7
create new sheet after applying random filter
sort predicted value largest to smallest
copy the credit column next to the predicted value column
Create a decile column; deciles divide the data set into 10 buckets.
now give 1 to all first 265 variables
265 is the total count
select credit and decile and create a new pivot table
decile to Row Labels
and credit to Values
Cumulative good
To determine the cutoff for maximum profit, go to cumulative bad avoided %: fourth decile
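The Excel steps above (sort by predicted value, cut into deciles, pivot on credit) can be sketched as below. This is only an illustration under assumptions: the 20 rows are invented, credit = 1 is taken as good, and "cumulative bad avoided %" is assumed to mean the share of all bad cases that fall below the current decile, that is, the bads rejected if the cutoff is placed after that decile. The function name `decile_table` is hypothetical.

```python
def decile_table(credit, prob, n_buckets=10):
    """Sort by predicted probability (largest first) and summarise by bucket."""
    rows = sorted(zip(prob, credit), key=lambda r: r[0], reverse=True)
    n = len(rows)
    total_good = sum(c for _, c in rows)
    total_bad = n - total_good
    table = []
    cum_good = 0
    cum_bad = 0
    for d in range(n_buckets):
        # Integer arithmetic keeps bucket sizes as equal as possible.
        bucket = rows[d * n // n_buckets:(d + 1) * n // n_buckets]
        good = sum(c for _, c in bucket)
        bad = len(bucket) - good
        cum_good += good
        cum_bad += bad
        table.append({
            "decile": d + 1,
            "count": len(bucket),
            "good": good,
            "bad": bad,
            "cum_good_pct": 100.0 * cum_good / total_good,
            # Assumed definition: bads sitting below this decile.
            "cum_bad_avoided_pct": 100.0 * (total_bad - cum_bad) / total_bad,
        })
    return table

# Invented validation rows: predicted probability and actual credit (1 = good).
prob = [(20 - i) / 21 for i in range(20)]
credit = [1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
table = decile_table(credit, prob)
for row in table:
    print(row)
```

Reading down the table, cumulative good % rises while cumulative bad avoided % falls, so the cutoff decile is a trade-off between the two.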