
a) Answer

Y   Xun    X normal   Actual Y   Pcut = 0.4   Pcut = 0.5   Pcut = 0.6
0   20     0          0          0            0            0
0   22     0.125      0          0            0            0
0   24     0.25       0          0            0            0
0   26     0.375      0          0            0            0
0   28     0.5        0          1            1            0
0   28.6   0.5375     0          1            1            0
0   27     0.4375     0          1            0            0
1   27.4   0.4625     1          1            0            0
1   28     0.5        1          1            1            0
1   28.4   0.525      1          1            1            0
1   29     0.5625     1          1            1            0
1   30     0.625      1          1            1            1
1   32     0.75       1          1            1            1
1   34     0.875      1          1            1            1
1   36     1          1          1            1            1

b) Answers

Confusion matrices for a) (cutoffs applied to X normal):

Pcut value = 0.4
Confusion Matrix   Predicted Class   Accuracy      0.80
Actual class       1      0          Sensitivity   1.00
1                  8      0          Specificity   0.57
0                  3      4

Pcut value = 0.5
Confusion Matrix   Predicted Class   Accuracy      0.80
Actual class       1      0          Sensitivity   0.88
1                  7      1          Specificity   0.71
0                  2      5

Pcut value = 0.6
Confusion Matrix   Predicted Class   Accuracy      0.73
Actual class       1      0          Sensitivity   0.50
1                  4      4          Specificity   1.00
0                  0      7

c) Answer

When we chose 5 cases at random, we could expect only a little over 2 successes. But when we took the best 5 (the cases with the highest predicted values), we got almost 5 successes. So the prediction ranks the cases well.

[Figure: Lift chart (training dataset). x-axis: # cases; y-axis: Cumulative; series: "Cumulative Y when sorted using predicted values" and "Cumulative Y using average".]

d) Answer

Y   Xun    X normal   Pi       Actual Y   Pcut = 0.4   Pcut = 0.5   Pcut = 0.6
0   20     0          0.1464   0          0            0            0
0   22     0.125      0.1938   0          0            0            0
0   24     0.25       0.2500   0          0            0            0
0   26     0.375      0.3232   0          0            0            0
0   28     0.5        0.5000   0          1            1            0
0   28.6   0.5375     0.5968   0          1            1            0
0   27     0.4375     0.3750   0          0            0            0
1   27.4   0.4625     0.4032   1          1            0            0
1   28     0.5        0.5000   1          1            1            0
1   28.4   0.525      0.5791   1          1            1            0
1   29     0.5625     0.6250   1          1            1            1
1   30     0.625      0.6768   1          1            1            1
1   32     0.75       0.7500   1          1            1            1
1   34     0.875      0.8062   1          1            1            1
1   36     1          0.8536   1          1            1            1

Confusion matrices for d) (cutoffs applied to Pi):

Pcut value = 0.4
Confusion Matrix   Predicted Class   Accuracy      0.87
Actual class       1      0          Sensitivity   1.00
1                  8      0          Specificity   0.71
0                  2      5

Pcut value = 0.5
Confusion Matrix   Predicted Class   Accuracy      0.80
Actual class       1      0          Sensitivity   0.88
1                  7      1          Specificity   0.71
0                  2      5

Pcut value = 0.6
Confusion Matrix   Predicted Class   Accuracy      0.80
Actual class       1      0          Sensitivity   0.63
1                  5      3          Specificity   1.00
0                  0      7

When Pcut value = 0.5, the two methods show the same sensitivity and specificity. However, at the other Pcut values, d)'s Pi shows better results: at Pcut = 0.4 it has higher specificity (0.71 vs. 0.57) and accuracy (0.87 vs. 0.80), and at Pcut = 0.6 it has higher sensitivity (0.63 vs. 0.50) and accuracy (0.80 vs. 0.73).
Therefore the d) rule would be better.
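The confusion matrices above can be rechecked with a few lines of Python. This is only a sketch: the data is copied from the a) and d) tables, the function and variable names are illustrative, and the rule "predict 1 when the score is at least Pcut" is read off the tables (a score of exactly 0.5 is classified as 1 at Pcut = 0.5).

```python
# Data copied from the a) and d) tables; all names here are illustrative.
Y = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
X_NORMAL = [0, 0.125, 0.25, 0.375, 0.5, 0.5375, 0.4375,
            0.4625, 0.5, 0.525, 0.5625, 0.625, 0.75, 0.875, 1]
PI = [0.1464, 0.1938, 0.2500, 0.3232, 0.5000, 0.5968, 0.3750,
      0.4032, 0.5000, 0.5791, 0.6250, 0.6768, 0.7500, 0.8062, 0.8536]


def confusion(actual, scores, cutoff):
    """Count outcomes when a case is predicted 1 if score >= cutoff."""
    tp = fn = fp = tn = 0
    for y, s in zip(actual, scores):
        pred = 1 if s >= cutoff else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    return {
        "tp": tp, "fn": fn, "fp": fp, "tn": tn,
        "accuracy": (tp + tn) / len(actual),
        "sensitivity": tp / (tp + fn),   # share of actual 1s predicted 1
        "specificity": tn / (tn + fp),   # share of actual 0s predicted 0
    }


for name, scores in (("X normal", X_NORMAL), ("Pi", PI)):
    for cutoff in (0.4, 0.5, 0.6):
        m = confusion(Y, scores, cutoff)
        print(f"{name:8s} Pcut={cutoff}: acc={m['accuracy']:.2f} "
              f"sens={m['sensitivity']:.2f} spec={m['specificity']:.2f}")
```

Run as written, the counts should agree with the matrices above, for example 8/0/3/4 for X normal at Pcut = 0.4 and 5/3/0/7 for Pi at Pcut = 0.6.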
XLMiner : Multiple Linear Regression


Inputs

Data
Training data used for building the model ['2007007723_이승훈_Data Mining_HW#3.xlsx']'data'!$A$34:$D$48
# Records in the training data 15

Variables
# Input Variables 1

Input variables Pi

Output variable Y
Constant term present Yes

Output options chosen


Summary report of scoring on training data
Lift charts on training data

The Regression Model

Input variables   Coefficient   Std. Error   p-value      SS
Constant term     -0.32129854   0.24628934   0.21466681   4.26666689
Pi                 1.69143438   0.44914669   0.00235517   1.94783044

Training Data scoring - Summary Report

Total sum of squared errors   RMS Error      Average Error
1.78550285446                 0.3450123529   4.1354902E-08
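As a cross-check, the coefficients and the scoring summary above can be approximated with an ordinary least-squares fit of Y on Pi. The sketch below is not XLMiner's implementation: it uses the Pi values as printed in the d) table (rounded to four decimals), so the results should agree with the report only to about three decimals. The function name `fit_line` is illustrative.

```python
import math

# Y and Pi copied from the d) table (Pi is rounded to 4 decimals there).
Y = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
PI = [0.1464, 0.1938, 0.2500, 0.3232, 0.5000, 0.5968, 0.3750,
      0.4032, 0.5000, 0.5791, 0.6250, 0.6768, 0.7500, 0.8062, 0.8536]


def fit_line(x, y):
    """Closed-form least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


intercept, slope = fit_line(PI, Y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(PI, Y)]
sse = sum(r * r for r in residuals)          # total sum of squared errors
rms_error = math.sqrt(sse / len(Y))          # RMS error over all 15 records
average_error = sum(residuals) / len(Y)      # zero up to floating-point noise

print(f"constant={intercept:.4f} Pi={slope:.4f}")
print(f"SSE={sse:.4f} RMS={rms_error:.4f} average error={average_error:.1e}")
```

The near-zero average error in the report is what a least-squares fit with a constant term produces; the tiny nonzero value is floating-point noise.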

Elapsed Time

Overall (secs) 4.00


Date: 09-Oct-2013 20:01:05 (Ver: 12.5.3E)


Residual df 13
R-squared 0.5217402941
Std. Dev. estimate 0.37060273
Residual SS 1.78550291
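The summary figures above follow from the sums of squares in the regression table. The sketch below assumes that the SS value reported for Pi (1.94783044) is the regression sum of squares, which holds for a model with a single input; the variable names are illustrative.

```python
import math

ss_regression = 1.94783044   # SS reported for Pi (assumed: regression SS)
ss_residual = 1.78550291     # Residual SS
residual_df = 13             # 15 records minus 2 fitted coefficients
n_records = 15

# R-squared: share of the variation in Y around its mean explained by Pi.
r_squared = ss_regression / (ss_regression + ss_residual)

# Std. Dev. estimate: residual standard error, using the residual df.
std_dev_estimate = math.sqrt(ss_residual / residual_df)

# RMS error in the scoring summary divides by the record count instead.
rms_error = math.sqrt(ss_residual / n_records)

print(f"R-squared={r_squared:.6f}")
print(f"Std. Dev. estimate={std_dev_estimate:.6f}")
print(f"RMS error={rms_error:.6f}")
```

The only difference between the Std. Dev. estimate (0.3706) and the RMS error (0.3450) is the divisor: 13 degrees of freedom versus 15 records.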
$A$33:$D$48
XLMiner : Multiple Linear Regression - Lift chart for training data

[Figure: Lift chart (training dataset). x-axis: # cases; y-axis: Cumulative; series: "Cumulative Y when sorted using predicted values" and "Cumulative Y using average".]

[Figure: Decile-wise lift chart (training dataset). x-axis: Deciles; y-axis: Decile mean / Global mean.]

Decile   Mean           Std.Dev.       Min.   Max.   Decile mean / Global mean
1        1              0              1      1      1.875
2        1              0              1      1      1.875
3        1              0              1      1      1.875
4        1              0              1      1      1.875
5        1              0              1      1      1.875
6        0              0              0      0      0
7        1              0              1      1      1.875
8        0              0              0      0      0
9        1              0              1      1      1.875
10       0.1666666667   0.3726779962   0      1      0.3125

Serial no.   Predicted value   Actual value   Cumulative Y (sorted using predicted values)   Cumulative Y using average
1            1.1224310107      1              1                                              0.5333333333
2            1.0423125458      1              2                                              1.0666666667
3            0.947277245       1              3                                              1.6
4            0.8234248295      1              4                                              2.1333333333
5            0.7358479475      1              5                                              2.6666666667
6            0.6881910802      0              5                                              3.2
7            0.6581382797      1              6                                              3.7333333333
8            0.52441865        0              6                                              4.2666666667
9            0.52441865        1              7                                              4.8
10           0.3606462198      1              8                                              5.3333333333
11           0.3129893525      0              8                                              5.8666666667
12           0.2254124705      0              8                                              6.4
13           0.101560055       0              8                                              6.9333333333
14           0.0065247542      0              8                                              7.4666666667
15           -0.073593711      0              8                                              8

Date: 09-Oct-2013 20:01:06 (Ver: 12.5.3E)
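The two lift-chart series can be rebuilt from the actual values in predicted-value order. A minimal sketch, with the order of actual values copied from the lift data for this model and illustrative variable names:

```python
from itertools import accumulate

# Actual Y values, ordered from highest to lowest predicted value.
actual_sorted = [1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0]

# Series 1: cumulative Y when sorted using predicted values.
cumulative_model = list(accumulate(actual_sorted))

# Series 2: cumulative Y using the average, i.e. what picking cases at
# random would give on average (global mean of Y times the case count).
global_mean = sum(actual_sorted) / len(actual_sorted)
cumulative_average = [global_mean * k
                      for k in range(1, len(actual_sorted) + 1)]

for k, (model, avg) in enumerate(zip(cumulative_model, cumulative_average), 1):
    print(f"{k:2d} cases: model={model} average={avg:.4f}")
```

At 5 cases the model-sorted series reaches 5 successes, while the average series reaches about 2.67, which is the gap the lift chart displays.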
XLMiner : Multiple Linear Regression

DataSource
WorkBook Path D:\2013 2학기 수업\Data Mining\HW3
WorkBook Name 2007007723_이승훈_Data Mining_HW#3.xlsx
Training Range [data]!$A$34:$D$48
#Training Rows 15
#Variables in Data set 4
#Selected Variables 2

Data Dictionary
Variables in Data Set   Y            Xun          X normal     Pi
Variable Type*          Continuous   Continuous   Continuous   Continuous
Variable Data Type      Number       Number       Number       Number
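The source does not state how X normal and Pi were derived from Xun. The sketch below is an inference from the numbers in the d) table, not the author's documented method: X normal matches min-max scaling of Xun, and the printed Pi values are reproduced (to the four decimals shown) by a square-root stretch around 0.5. The function names `min_max` and `pi_guess` are hypothetical.

```python
import math

# Columns copied from the d) table.
XUN = [20, 22, 24, 26, 28, 28.6, 27, 27.4, 28, 28.4, 29, 30, 32, 34, 36]
X_NORMAL = [0, 0.125, 0.25, 0.375, 0.5, 0.5375, 0.4375,
            0.4625, 0.5, 0.525, 0.5625, 0.625, 0.75, 0.875, 1]
PI = [0.1464, 0.1938, 0.2500, 0.3232, 0.5000, 0.5968, 0.3750,
      0.4032, 0.5000, 0.5791, 0.6250, 0.6768, 0.7500, 0.8062, 0.8536]


def min_max(values):
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def pi_guess(x):
    """Inferred formula (an assumption): 0.5 +/- 0.5 * sqrt(|x - 0.5|)."""
    d = x - 0.5
    return 0.5 + math.copysign(0.5 * math.sqrt(abs(d)), d)


scaled = min_max(XUN)
worst_x = max(abs(a - b) for a, b in zip(scaled, X_NORMAL))
worst_pi = max(abs(pi_guess(x) - p) for x, p in zip(X_NORMAL, PI))
print(f"largest X normal difference: {worst_x:.2e}")
print(f"largest Pi difference: {worst_pi:.2e}")
```

If the inference is right, both differences stay within the rounding of the printed table; a larger gap would mean the actual transformation was something else.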

Mining Schema
Selected Variables      Pi      Y
Variable Type           Input   Output
Inputs Normalised No
Constant term present Yes

Model

Input Variables Coefficient


Constant Term -0.32129854
Pi 1.69143438
Date: 09-Oct-2013 20:01:08 (Ver: 12.5.3E)

*This is an indication of how XLMiner stores this variable for later retrieval; it does not necessarily reflect what type of variable was originally input.
XLMiner : Multiple Linear Regression


Inputs

Data
Training data used for building the model ['2007007723_이승훈_Data Mining_HW#3.xlsx']'data'!$A$2:$D$16
# Records in the training data 15

Variables
# Input Variables 1

Input variables X normal

Output variable Y
Constant term present Yes

Output options chosen


Summary report of scoring on training data
Lift charts on training data

The Regression Model

Input variables   Coefficient   Std. Error   p-value      SS
Constant term     -0.14704199   0.22532152   0.52539462   4.26666689
X normal           1.3562299    0.40151784   0.00494938   1.74501586

Training Data scoring - Summary Report

Total sum of squared errors   RMS Error      Average Error
1.988317449163                0.3640803436   -9.8333333E-09
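The Std. Error column above can be reproduced with the textbook formulas for simple linear regression. This sketch uses the X normal values from the a) table; it is a hand calculation under that assumption, not XLMiner's code, and it stops at the t statistics rather than computing p-values.

```python
import math

# Y and X normal copied from the a) table.
Y = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
X_NORMAL = [0, 0.125, 0.25, 0.375, 0.5, 0.5375, 0.4375,
            0.4625, 0.5, 0.525, 0.5625, 0.625, 0.75, 0.875, 1]

n = len(Y)
mean_x = sum(X_NORMAL) / n
mean_y = sum(Y) / n
sxx = sum((x - mean_x) ** 2 for x in X_NORMAL)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X_NORMAL, Y))

slope = sxy / sxx
intercept = mean_y - slope * mean_x
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(X_NORMAL, Y))

# Residual standard error with n - 2 degrees of freedom.
s = math.sqrt(sse / (n - 2))

# Standard errors of the slope and of the constant term.
se_slope = s / math.sqrt(sxx)
se_intercept = s * math.sqrt(1 / n + mean_x ** 2 / sxx)

# t statistics; the p-values in the report come from a t distribution
# with n - 2 = 13 degrees of freedom.
t_slope = slope / se_slope
t_intercept = intercept / se_intercept

print(f"X normal: coef={slope:.6f} se={se_slope:.6f} t={t_slope:.3f}")
print(f"Constant: coef={intercept:.6f} se={se_intercept:.6f} t={t_intercept:.3f}")
```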

Elapsed Time

Overall (secs) 2.00


Date: 09-Oct-2013 20:01:32 (Ver: 12.5.3E)


Residual df 13
R-squared 0.4674149604
Std. Dev. estimate 0.39108503
Residual SS 1.98831749
$A$1:$D$16
XLMiner : Multiple Linear Regression - Lift chart for training data

[Figure: Lift chart (training dataset). x-axis: # cases; y-axis: Cumulative; series: "Cumulative Y when sorted using predicted values" and "Cumulative Y using average".]

[Figure: Decile-wise lift chart (training dataset). x-axis: Deciles; y-axis: Decile mean / Global mean.]

Decile   Mean           Std.Dev.       Min.   Max.   Decile mean / Global mean
1        1              0              1      1      1.875
2        1              0              1      1      1.875
3        1              0              1      1      1.875
4        1              0              1      1      1.875
5        1              0              1      1      1.875
6        0              0              0      0      0
7        1              0              1      1      1.875
8        0              0              0      0      0
9        1              0              1      1      1.875
10       0.1666666667   0.3726779962   0      1      0.3125

Serial no.   Predicted value   Actual value   Cumulative Y (sorted using predicted values)   Cumulative Y using average
1            1.20918791        1              1                                              0.5333333333
2            1.0396591725      1              2                                              1.0666666667
3            0.870130435       1              3                                              1.6
4            0.7006016975      1              4                                              2.1333333333
5            0.6158373288      1              5                                              2.6666666667
6            0.5819315812      0              5                                              3.2
7            0.5649787075      1              6                                              3.7333333333
8            0.53107296        0              6                                              4.2666666667
9            0.53107296        1              7                                              4.8
10           0.4802143388      1              8                                              5.3333333333
11           0.4463085912      0              8                                              5.8666666667
12           0.3615442225      0              8                                              6.4
13           0.192015485       0              8                                              6.9333333333
14           0.0224867475      0              8                                              7.4666666667
15           -0.14704199       0              8                                              8

Date: 09-Oct-2013 20:01:33 (Ver: 12.5.3E)
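The decile table above can be reproduced under one assumption about how 15 records were split into 10 deciles: the first nine deciles hold one record each and the tenth holds the remaining six. That grouping is inferred from the output (decile 10 has mean 1/6) and is not documented in the source; the variable names are illustrative.

```python
from statistics import pstdev

# Actual Y values, ordered from highest to lowest predicted value.
actual_sorted = [1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0]

n = len(actual_sorted)
size = n // 10   # records per decile for deciles 1 to 9 (here: 1)

# Assumed grouping: nine deciles of `size` records, remainder in decile 10.
groups = [actual_sorted[i * size:(i + 1) * size] for i in range(9)]
groups.append(actual_sorted[9 * size:])

global_mean = sum(actual_sorted) / n
for decile, group in enumerate(groups, 1):
    mean = sum(group) / len(group)
    lift = mean / global_mean               # decile mean / global mean
    spread = pstdev(group)                  # population standard deviation
    print(f"{decile:2d}  mean={mean:.4f}  std={spread:.4f}  "
          f"min={min(group)}  max={max(group)}  lift={lift:.4f}")

decile_lift = [sum(g) / len(g) / global_mean for g in groups]
```

With so few records the decile-wise chart is coarse: every single-record decile can only show a lift of 0 or 1.875.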
XLMiner : Multiple Linear Regression

DataSource
WorkBook Path D:\2013 2학기 수업\Data Mining\HW3
WorkBook Name 2007007723_이승훈_Data Mining_HW#3.xlsx
Training Range [data]!$A$2:$D$16
#Training Rows 15
#Variables in Data set 4
#Selected Variables 2

Data Dictionary
Variables in Data Set   Y            Xun          a) answer     X normal
Variable Type*          Continuous   Continuous   Categorical   Continuous
Variable Data Type      Number       Number       String        Number

Mining Schema
Selected Variables      X normal   Y
Variable Type           Input      Output
Inputs Normalised No
Constant term present Yes

Model

Input Variables Coefficient


Constant Term -0.14704199
X normal 1.3562299
Date: 09-Oct-2013 20:01:33 (Ver: 12.5.3E)

*This is an indication of how XLMiner stores this variable for later retrieval; it does not necessarily reflect what type of variable was originally input.
