Cubist Models For Regression
Max Kuhn (max.kuhn@pfizer.com)
Steve Weston
Chris Keefer
Nathan Coulter
December 10, 2016
1 Introduction
Cubist is an R port of the Cubist GPL C code released by RuleQuest at
http://rulequest.com/cubist-info.html
See the last section of this document for information on the porting. The other sections describe the
functionality of the R package.
2 Model Trees
Cubist is a rule–based model that is an extension of Quinlan’s M5 model tree. A tree is grown where
the terminal leaves contain linear regression models. These models are based on the predictors used
in previous splits. Also, there are intermediate linear models at each step of the tree. A prediction
is made using the linear regression model at the terminal node of the tree, but is “smoothed” by
combining it with the prediction from the linear model in the parent node (this smoothing is applied
recursively up the tree). The tree is reduced to a set of rules, which initially are paths
from the top of the tree to the bottom. Rules are eliminated via pruning and/or combined for
simplification.
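As a rough illustration of the smoothing idea (a sketch only, assuming the M5-style weighting from
Quinlan, 1992; the weighting actually used by Cubist is error-based and differs in its details), the
child and parent predictions can be combined as:
> ## M5-style smoothing: n_child is the number of training cases in the child
> ## node and k is a smoothing constant (M5 uses k = 15)
> smooth_prediction <- function(child_pred, parent_pred, n_child, k = 15) {
+   (n_child * child_pred + k * parent_pred) / (n_child + k)
+ }
> ## a child prediction of 21 based on 30 cases is pulled toward a parent
> ## prediction of 24:
> smooth_prediction(21, 24, n_child = 30)  # 22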
This is explained better in Quinlan (1992). Wang and Witten (1997) attempted to recreate this
model using a “rational reconstruction” of Quinlan (1992) that is the basis for the M5P model in
Weka (and the R package RWeka).
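For comparison, that Weka reimplementation can be fit from R via the RWeka package. This sketch is
not part of the original example and assumes RWeka (and a Java runtime) is installed; it uses the
Boston Housing data that is also used in the example below:
> library(RWeka)
> library(mlbench)
> data(BostonHousing)
> m5Fit <- M5P(medv ~ ., data = BostonHousing)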
An example of a model tree can be illustrated using the Boston Housing data in the mlbench
package.
> library(Cubist)
> library(mlbench)
> data(BostonHousing)
> BostonHousing$chas <- as.numeric(BostonHousing$chas) - 1
> set.seed(1)
> inTrain <- sample(1:nrow(BostonHousing), floor(.8*nrow(BostonHousing)))
> trainingPredictors <- BostonHousing[ inTrain, -14]
> testPredictors <- BostonHousing[-inTrain, -14]
> trainingOutcome <- BostonHousing$medv[ inTrain]
> testOutcome <- BostonHousing$medv[-inTrain]
> modelTree <- cubist(x = trainingPredictors, y = trainingOutcome)
> modelTree
Call:
cubist.default(x = trainingPredictors, y = trainingOutcome)
Number of samples: 404
Number of predictors: 13
Number of committees: 1
Number of rules: 4
> summary(modelTree)
Call:
cubist.default(x = trainingPredictors, y = trainingOutcome)
Cubist [Release 2.07 GPL Edition] Sat Dec 10 21:42:00 2016
---------------------------------
Target attribute `outcome'
Read 404 cases (14 attributes) from undefined.data
Model:
Rule 1: [88 cases, mean 13.81, range 5 to 27.5, est err 2.10]
if
nox > 0.668
then
outcome = 2.07 + 3.14 dis - 0.35 lstat + 18.8 nox + 0.007 b
- 0.12 ptratio - 0.008 age - 0.02 crim
Rule 2: [153 cases, mean 19.54, range 8.1 to 31, est err 2.16]
if
nox <= 0.668
lstat > 9.59
then
outcome = 34.81 - 1 dis - 0.72 ptratio - 0.056 age - 0.19 lstat + 1.5 rm
- 0.11 indus + 0.004 b
Rule 3: [39 cases, mean 24.10, range 11.9 to 50, est err 2.73]
if
rm <= 6.23
lstat <= 9.59
then
outcome = 11.89 + 3.69 crim - 1.25 lstat + 3.9 rm - 0.0045 tax
- 0.16 ptratio
Rule 4: [128 cases, mean 31.31, range 16.5 to 50, est err 2.95]
if
rm > 6.23
lstat <= 9.59
then
outcome = -1.13 + 1.6 crim - 0.93 lstat + 8.6 rm - 0.0141 tax
- 0.83 ptratio - 0.47 dis - 0.019 age - 1.1 nox
Evaluation on training data (404 cases):
Average |error| 2.27
Relative |error| 0.34
Correlation coefficient 0.94
Attribute usage:
Conds Model
78% 100% lstat
59% 53% nox
41% 78% rm
100% ptratio
90% age
90% dis
62% crim
59% b
41% tax
38% indus
Time: 0.0 secs
There is no formula method for cubist; the predictors are specified as a matrix or data frame and the
outcome as a numeric vector.
There is a predict method for the model:
> mtPred <- predict(modelTree, testPredictors)
> ## Test set RMSE
> sqrt(mean((mtPred - testOutcome)^2))
[1] 3.337924
> ## Test set R^2
> cor(mtPred, testOutcome)^2
[1] 0.8573504
3 Ensembles By Committees
The Cubist model can also use a boosting-like scheme called committees, where model trees are
created sequentially. The first tree follows the procedure described in the previous section.
Subsequent trees are created using adjusted versions of the training set outcome: if the model
over-predicted a value, the response is adjusted downward for the next model (and so on). Unlike
traditional boosting, stage weights for each committee are not used to average the predictions from
each model tree; the final prediction is a simple average of the predictions from each model tree.
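As a sketch of the adjustment described above (the exact rule is internal to the RuleQuest C code,
so the form shown here is only illustrative):
> ## samples that the current committee member over-predicts have their
> ## pseudo-outcome shifted downward before the next member is fit
> adjust_outcome <- function(y, fitted) y - (fitted - y)
> ## a sample with y = 20 that was fit at 23 is presented to the next
> ## committee member as 17:
> adjust_outcome(20, 23)  # 17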
The committees argument can be used to control the number of model trees:
> set.seed(1)
> committeeModel <- cubist(x = trainingPredictors, y = trainingOutcome,
+ committees = 5)
> summary(committeeModel)
Call:
cubist.default(x = trainingPredictors, y = trainingOutcome, committees = 5)
Cubist [Release 2.07 GPL Edition] Sat Dec 10 21:42:00 2016
---------------------------------
Target attribute `outcome'
Read 404 cases (14 attributes) from undefined.data
Model 1:
Rule 1/1: [88 cases, mean 13.81, range 5 to 27.5, est err 2.10]
if
nox > 0.668
then
outcome = 2.07 + 3.14 dis - 0.35 lstat + 18.8 nox + 0.007 b
- 0.12 ptratio - 0.008 age - 0.02 crim
Rule 1/2: [153 cases, mean 19.54, range 8.1 to 31, est err 2.16]
if
nox <= 0.668
lstat > 9.59
then
outcome = 34.81 - 1 dis - 0.72 ptratio - 0.056 age - 0.19 lstat + 1.5 rm
- 0.11 indus + 0.004 b
Rule 1/3: [39 cases, mean 24.10, range 11.9 to 50, est err 2.73]
if
rm <= 6.23
lstat <= 9.59
then
outcome = 11.89 + 3.69 crim - 1.25 lstat + 3.9 rm - 0.0045 tax
- 0.16 ptratio
Rule 1/4: [128 cases, mean 31.31, range 16.5 to 50, est err 2.95]
if
rm > 6.23
lstat <= 9.59
then
outcome = -1.13 + 1.6 crim - 0.93 lstat + 8.6 rm - 0.0141 tax
- 0.83 ptratio - 0.47 dis - 0.019 age - 1.1 nox
Model 2:
Rule 2/1: [71 cases, mean 13.41, range 5 to 27.5, est err 2.66]
if
crim > 5.69175
dis > 1.4254
then
outcome = 42.13 + 2.45 dis - 0.47 lstat - 0.71 ptratio - 1.8 rm
Rule 2/2: [84 cases, mean 18.75, range 8.1 to 27.5, est err 2.25]
if
crim <= 5.69175
nox > 0.532
dis > 1.4254
tax > 222
ptratio > 17
then
outcome = 44.08 + 1.19 crim - 0.43 lstat - 1.05 ptratio - 0.011 age
Rule 2/3: [15 cases, mean 23.43, range 5 to 50, est err 5.62]
if
dis <= 1.4254
ptratio > 17
then
outcome = 174.86 - 100.95 dis - 1.07 lstat - 0.09 ptratio
Rule 2/4: [77 cases, mean 23.90, range 11.8 to 50, est err 2.37]
if
ptratio <= 17
lstat > 5.12
then
outcome = -3.3 + 8.3 rm - 0.0238 tax - 1.66 dis - 0.063 age - 0.1 lstat
- 0.21 ptratio - 3.8 nox + 0.007 zn
Rule 2/5: [128 cases, mean 25.56, range 14.4 to 50, est err 3.12]
if
crim <= 5.69175
nox <= 0.532
ptratio > 17
then
outcome = -15.58 + 2.43 crim + 7.1 rm - 0.075 age + 0.24 lstat
- 0.41 dis - 0.16 ptratio
Rule 2/6: [16 cases, mean 27.91, range 15.7 to 39.8, est err 5.25]
if
tax <= 222
lstat > 5.12
then
outcome = 274.62 - 12.31 ptratio - 0.212 age - 0.03 lstat
Rule 2/7: [18 cases, mean 30.49, range 22.5 to 50, est err 3.69]
if
rm <= 6.861
lstat <= 5.12
then
outcome = -58.03 + 10.96 crim + 13.3 rm - 0.03 lstat - 0.08 dis
- 0.06 ptratio - 1.1 nox
Rule 2/8: [19 cases, mean 41.54, range 31.2 to 50, est err 3.63]
if
rm > 6.861
age <= 71
lstat <= 5.12
then
outcome = -56.93 + 14.2 rm - 0.07 lstat - 0.2 dis - 2.6 nox
- 0.13 ptratio + 0.006 zn
Rule 2/9: [14 cases, mean 43.48, range 22.8 to 50, est err 5.55]
if
age > 71
lstat <= 5.12
then
outcome = -24.48 + 1.99 crim + 0.467 age + 3.5 rm
Model 3:
Rule 3/1: [88 cases, mean 13.81, range 5 to 27.5, est err 2.32]
if
nox > 0.668
then
outcome = -9 + 5.5 dis + 19.4 nox + 0.014 b - 0.12 lstat - 0.16 ptratio
- 0.04 crim
Rule 3/2: [10 cases, mean 17.64, range 11.7 to 27.5, est err 11.68]
if
nox <= 0.668
b <= 179.36
then
outcome = -2.07 + 0.149 b + 0.77 lstat
Rule 3/3: [156 cases, mean 19.68, range 8.1 to 33.8, est err 2.23]
if
nox <= 0.668
lstat > 9.53
then
outcome = 28.56 - 1.09 dis - 0.27 lstat - 0.068 age + 2.6 rm
- 0.6 ptratio
Rule 3/4: [164 cases, mean 29.68, range 11.9 to 50, est err 3.44]
if
lstat <= 9.53
then
outcome = 6.57 + 4.08 crim - 0.75 lstat + 7.6 rm - 0.0301 tax
- 0.79 ptratio - 0.15 dis - 2.2 nox + 0.001 b
Model 4:
Rule 4/1: [335 cases, mean 19.44, range 5 to 50, est err 2.69]
if
rm <= 7.079
lstat > 5.12
then
outcome = 45.08 - 0.4 lstat + 0.27 rad - 0.0124 tax - 0.2 crim
- 0.6 ptratio - 8.5 nox - 0.36 dis - 0.04 indus
Rule 4/2: [19 cases, mean 20.96, range 5 to 50, est err 6.81]
if
rm <= 7.079
dis <= 1.4261
then
outcome = 163.2 - 85.4 dis - 1.21 lstat - 0.15 crim
Rule 4/3: [111 cases, mean 23.01, range 14.4 to 32, est err 1.92]
if
nox <= 0.51
rm <= 7.079
tax > 193
lstat > 5.12
then
outcome = 9.18 + 12.12 crim + 2.8 rm - 0.031 age - 0.05 lstat + 0.04 rad
- 0.002 tax - 0.1 ptratio - 0.1 dis - 1.6 nox
Rule 4/4: [9 cases, mean 24.33, range 15.7 to 36.2, est err 7.38]
if
rm <= 7.079
tax <= 193
lstat > 5.12
then
outcome = 22.72
Rule 4/5: [18 cases, mean 30.49, range 22.5 to 50, est err 4.91]
if
rm <= 6.861
lstat <= 5.12
then
outcome = 20.95 + 8.16 crim - 0.54 lstat + 0.23 rad + 1.3 rm
Rule 4/6: [35 cases, mean 36.15, range 22.5 to 50, est err 3.61]
if
age <= 71
lstat <= 5.12
then
outcome = -67.4 + 15.9 rm - 1.05 rad - 0.005 b - 0.05 lstat
Rule 4/7: [43 cases, mean 39.37, range 15 to 50, est err 6.37]
if
rm > 7.079
then
outcome = -123.73 + 0.308 b + 8.8 rm - 0.45 rad - 1.38 ptratio
- 0.04 lstat - 0.0016 tax - 0.1 dis - 1.2 nox - 0.02 indus
- 0.01 crim
Rule 4/8: [14 cases, mean 43.48, range 22.8 to 50, est err 5.14]
if
age > 71
lstat <= 5.12
then
outcome = -34.28 + 0.598 age - 0.75 lstat + 6.1 rm - 0.047 b + 0.16 rad
Model 5:
Rule 5/1: [88 cases, mean 13.81, range 5 to 27.5, est err 2.73]
if
nox > 0.668
then
outcome = -35.12 + 8.59 dis + 38.7 nox + 0.017 b - 0.04 lstat
- 0.07 ptratio + 0.01 rad + 0.1 rm
Rule 5/2: [156 cases, mean 19.68, range 8.1 to 33.8, est err 2.53]
if
nox <= 0.668
lstat > 9.53
then
outcome = 44.88 - 1.48 dis - 0.076 age - 0.28 lstat - 0.8 ptratio
+ 0.012 b + 0.1 rad + 0.3 rm - 1.6 nox - 0.0007 tax
Rule 5/3: [189 cases, mean 24.76, range 12.7 to 50, est err 2.41]
if
dis > 3.3175
then
outcome = -24.62 + 1.13 crim + 10.4 rm - 0.0183 tax - 0.69 dis
- 0.19 lstat - 0.043 age - 0.26 ptratio + 0.022 zn
Rule 5/4: [44 cases, mean 35.04, range 11.9 to 50, est err 6.37]
if
dis <= 3.3175
lstat <= 9.53
then
outcome = 32.74 + 6.34 crim - 0.0468 tax - 0.87 lstat + 5.5 rm
- 1.16 ptratio
Evaluation on training data (404 cases):
Average |error| 1.91
Relative |error| 0.29
Correlation coefficient 0.96
Attribute usage:
Conds Model
65% 99% lstat
46% 56% nox
32% 71% rm
18% 88% dis
13% 95% ptratio
12% 65% crim
9% 55% tax
4% 56% age
36% b
34% rad
23% indus
12% zn
Time: 0.0 secs
For this model:
> cmPred <- predict(committeeModel, testPredictors)
> ## RMSE
> sqrt(mean((cmPred - testOutcome)^2))
[1] 2.870122
> ## R^2
> cor(cmPred, testOutcome)^2
[1] 0.8955536
4 Instance–Based Corrections
Another innovation in Cubist is the use of nearest neighbors to adjust the predictions from the
rule-based model. First, a model tree (with or without committees) is created. Once a sample is
predicted by this model, Cubist can find its nearest neighbors in the training set and use the
average of these points to adjust the prediction. See Quinlan (1993a) for the details of the
adjustment.
The development of rules and committees is independent of the choice of using instances. The
original C code allowed the user to choose between using instances, not using them, or letting the
program decide. Our approach is to build a model with the cubist function that is ignorant of the
decision about instances. When samples are predicted, the argument neighbors can be used to
adjust the rule-based model predictions (or not).
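A sketch of the neighbor-based correction (the distance weighting used by Cubist is more involved;
see Quinlan, 1993a, for the exact form):
> ## pred_new: rule-based prediction for the new sample
> ## t_nn:     observed outcomes for its nearest training set neighbors
> ## pred_nn:  rule-based predictions for those same neighbors
> adjust_with_neighbors <- function(pred_new, t_nn, pred_nn) {
+   mean(t_nn + (pred_new - pred_nn))
+ }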
We can add instances to the previously fit committee model:
> instancePred <- predict(committeeModel, testPredictors, neighbors = 5)
> ## RMSE
> sqrt(mean((instancePred - testOutcome)^2))
[1] 2.693565
> ## R^2
> cor(instancePred, testOutcome)^2
[1] 0.9111374
Note that the previous models used the implicit default of neighbors = 0 for their predictions.
To tune the model over different values of neighbors and committees, the train function in the
caret package can be used to optimize these parameters. For example:
> library(caret)
> set.seed(1)
> cTune <- train(x = trainingPredictors, y = trainingOutcome,
+ "cubist",
+ tuneGrid = expand.grid(committees = c(1, 10, 50, 100),
+ neighbors = c(0, 1, 5, 9)),
+ trControl = trainControl(method = "cv"))
> cTune
Cubist
404 samples
13 predictor
No pre-processing
Resampling: Cross-Validated (10 fold)
[Figure 1: The relationship between performance and the two tuning parameters, as estimated using
cross-validation. The panel plots RMSE (cross-validation) against the number of committees for 0,
1, 5, and 9 instances.]
Summary of sample sizes: 363, 363, 363, 363, 362, 365, ...
Resampling results across tuning parameters:
committees neighbors RMSE Rsquared
1 0 4.081800 0.7916640
1 1 4.111955 0.7950087
1 5 3.943515 0.8054412
1 9 3.959522 0.8022459
10 0 3.371765 0.8597818
10 1 3.370218 0.8681521
10 5 3.168392 0.8767757
10 9 3.207153 0.8725973
50 0 3.238911 0.8704658
50 1 3.257555 0.8741483
50 5 3.035711 0.8845178
50 9 3.071004 0.8810091
100 0 3.211165 0.8713608
100 1 3.254918 0.8739276
100 5 3.005851 0.8855715
100 9 3.044205 0.8820627
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were committees = 100 and neighbors = 5.
Figure 1 shows the profiles of the tuning parameters produced using plot(cTune).
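The fitted train object can also be used to predict new samples with the selected values of
committees and neighbors; a brief sketch (not run in the original document):
> tunedPred <- predict(cTune, testPredictors)
> ## test set RMSE for the tuned model
> sqrt(mean((tunedPred - testOutcome)^2))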
It may also be useful to see how different models fit a single predictor:
[Figure 2: Different Cubist models for a single predictor. The panel plots median home value
against lstat for rules only, 100 committees, and rules plus 5 neighbors.]
> lstat <- trainingPredictors[, "lstat", drop = FALSE]
> justRules <- cubist(x = lstat, y = trainingOutcome)
> andCommittees <- cubist(x = lstat, y = trainingOutcome, committees = 100)
Figure 2 shows the model fits for the test data. For these data, there doesn’t appear to be much
of an improvement when committees or instances are added to the base rules.
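A sketch of how a plot along the lines of Figure 2 could be drawn (the plotting code below is an
assumption and is not taken from the original document):
> lstatGrid <- data.frame(lstat = seq(min(testPredictors$lstat),
+                                     max(testPredictors$lstat), length.out = 200))
> plot(testPredictors$lstat, testOutcome, pch = 19, col = "grey",
+      xlab = "lstat", ylab = "Median Home Value")
> lines(lstatGrid$lstat, predict(justRules, lstatGrid), lwd = 2)
> lines(lstatGrid$lstat, predict(andCommittees, lstatGrid), lwd = 2, col = "blue")
> lines(lstatGrid$lstat, predict(justRules, lstatGrid, neighbors = 5),
+       lwd = 2, col = "red")
> legend("topright", c("Rules", "100 Committees", "Rules + 5 Neighbors"),
+        col = c("black", "blue", "red"), lwd = 2)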
5 Variable Importance
The summary method for Cubist shows the usage of each variable in either the rule conditions or
the (terminal) linear model. In actuality, many more linear models are used in prediction than are
shown in the output. Because of this, the variable usage statistics shown at the end of the summary
output will probably be inconsistent with the rules also shown in the output. At each split of the
tree, Cubist saves a linear model (after feature selection) that is allowed to have terms for each
variable used in the current split or any split above it. Quinlan (1992) discusses a smoothing
algorithm where each model prediction is a linear combination of the parent and child models along
the tree. As such, the final prediction is a function of all the linear models from the initial
node to the terminal node. The percentages shown in the Cubist output reflect all of the models
involved in prediction (as opposed to the terminal models shown in the output).
The raw usage statistics are contained in a data frame called usage in the cubist object.
The caret package has a general variable importance method, varImp. When this function is applied
to a cubist object, the variable importance is a linear combination of the usage in the rule
conditions and the model.
For example:
> summary(modelTree)
Call:
cubist.default(x = trainingPredictors, y = trainingOutcome)
Cubist [Release 2.07 GPL Edition] Sat Dec 10 21:42:00 2016
---------------------------------
Target attribute `outcome'
Read 404 cases (14 attributes) from undefined.data
Model:
Rule 1: [88 cases, mean 13.81, range 5 to 27.5, est err 2.10]
if
nox > 0.668
then
outcome = 2.07 + 3.14 dis - 0.35 lstat + 18.8 nox + 0.007 b
- 0.12 ptratio - 0.008 age - 0.02 crim
Rule 2: [153 cases, mean 19.54, range 8.1 to 31, est err 2.16]
if
nox <= 0.668
lstat > 9.59
then
outcome = 34.81 - 1 dis - 0.72 ptratio - 0.056 age - 0.19 lstat + 1.5 rm
- 0.11 indus + 0.004 b
Rule 3: [39 cases, mean 24.10, range 11.9 to 50, est err 2.73]
if
rm <= 6.23
lstat <= 9.59
then
outcome = 11.89 + 3.69 crim - 1.25 lstat + 3.9 rm - 0.0045 tax
- 0.16 ptratio
Rule 4: [128 cases, mean 31.31, range 16.5 to 50, est err 2.95]
if
rm > 6.23
lstat <= 9.59
then
outcome = -1.13 + 1.6 crim - 0.93 lstat + 8.6 rm - 0.0141 tax
- 0.83 ptratio - 0.47 dis - 0.019 age - 1.1 nox
Evaluation on training data (404 cases):
Average |error| 2.27
Relative |error| 0.34
Correlation coefficient 0.94
Attribute usage:
Conds Model
78% 100% lstat
59% 53% nox
41% 78% rm
100% ptratio
90% age
90% dis
62% crim
59% b
41% tax
38% indus
Time: 0.0 secs
> modelTree$usage
Conditions Model Variable
1 78 100 lstat
2 59 53 nox
3 41 78 rm
4 0 100 ptratio
5 0 90 age
6 0 90 dis
7 0 62 crim
8 0 59 b
9 0 41 tax
10 0 38 indus
11 0 0 zn
12 0 0 chas
13 0 0 rad
> library(caret)
> varImp(modelTree)
Overall
lstat 89.0
nox 56.0
rm 59.5
ptratio 50.0
age 45.0
dis 45.0
crim 31.0
b 29.5
tax 20.5
indus 19.0
zn 0.0
chas 0.0
rad 0.0
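These values appear to be the simple average of the two usage percentages; a quick check against
the raw usage data frame (an observation about this particular output, not a documented formula):
> with(modelTree$usage, data.frame(Variable = Variable,
+                                  Overall = (Conditions + Model) / 2))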
It should be noted that this variable importance measure does not capture the influence of the
predictors when using the instance–based correction.
6 Exporting the Model
As previously mentioned, this code is a port of the command–line C code. To run the C code, the
training set data must be converted to a specific file format as detailed on the RuleQuest website.
Two files are created. The file.data file is a header-less, comma-delimited version of the data
(the file part is a name given by the user). The file.names file provides information about the
columns (e.g., levels for categorical data and so on). After running the C program, another text
file called file.model is created, which contains the information needed for prediction.
Once a model has been built with the R cubist package, the exportCubistFiles function can be used
to create the .data, .names and .model files so that the same model can be run at the command line.
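A sketch of exporting the committee model built earlier; the argument names used here (prefix and
path) are assumptions, so check ?exportCubistFiles for the exact interface:
> ## write BostonHousing.data, .names and .model to a temporary directory
> exportCubistFiles(committeeModel, prefix = "BostonHousing", path = tempdir())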
7 Current Limitations
There are a few features in the C code that are not yet operational in the R package:
• only continuous and categorical predictors can be used (the original source code allows for
other data types)
• in the C code there is an option to let the program decide whether or not to use instances; in
this package the choice is explicit
• non-standard predictor names are not currently checked/fixed
• the C code supports binning of predictors; this option is not currently exposed in the package
Many of these features will be implemented in the future.
8 About the Cubist C Code and Our Approach
This section may be interesting or important to those of you who care about the implementation
(if you exist at all).
The cubist sources are written to take specific data files from the file system, pull them into
memory, run the computations, then write the results to a text file that is also saved to the file
system. The code makes use of a lot of global variables (especially for the data). The code has
been around for a while and, after reading it, one can tell that the author put in a lot of time to
catch many special cases. At Pfizer, we have pushed millions of samples through the non–GPL code
without any substantive errors.
The approach taken here is to pass the training data to the C code as strings that mimic the file
formats used by the command-line version, and to get back the textual representation that would
normally be saved to the .model file, also as a string. The prediction function then passes the
model text (and the data text, if instances are used) to the C code for prediction.
We did this for a few reasons. First, this approach only requires us to re-write main() and lets us
touch as little of the original code as possible (otherwise we would have to write a parser for the
data and try to get it into the global variable structures with complete fidelity). Second, most R
modeling functions implicitly assume that the data matrix is all numeric, so factors are converted
to dummy variables and so on. Given how it makes splits, Cubist does not want categorical data
expanded into dummy variables, so we would have to pass in the numeric and categorical predictors
separately unless we wanted to get really fancy.
9 Session Information
• R Under development (unstable) (2016-10-26 r71594), x86_64-apple-darwin13.4.0
• Locale: C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
• Running under: OS X Mavericks 10.9.5
• Base packages: base, datasets, grDevices, graphics, methods, stats, utils
• Other packages: Cubist 0.0.19, caret 6.0-73, ggplot2 2.2.0, lattice 0.20-34, mlbench 2.1-1
• Loaded via a namespace (and not attached): MASS 7.3-45, Matrix 1.2-7.1,
MatrixModels 0.4-1, ModelMetrics 1.1.0, Rcpp 0.12.8, SparseM 1.74, assertthat 0.1,
car 2.1-4, codetools 0.2-15, colorspace 1.3-1, compiler 3.4.0, foreach 1.4.3, grid 3.4.0,
gtable 0.2.0, iterators 1.0.8, lazyeval 0.2.0, lme4 1.1-12, magrittr 1.5, mgcv 1.8-15,
minqa 1.2.4, munsell 0.4.3, nlme 3.1-128, nloptr 1.0.4, nnet 7.3-12, parallel 3.4.0,
pbkrtest 0.4-6, plyr 1.8.4, quantreg 5.29, reshape2 1.4.2, scales 0.4.1, splines 3.4.0,
stats4 3.4.0, stringi 1.1.2, stringr 1.1.0, tibble 1.2, tools 3.4.0
10 References
Quinlan. Learning with continuous classes. Proceedings of the 5th Australian Joint Conference on
Artificial Intelligence (1992), pp. 343-348.

Quinlan. Combining instance-based and model-based learning. Proceedings of the Tenth International
Conference on Machine Learning (1993a), pp. 236-243.

Quinlan. C4.5: Programs for Machine Learning (1993b). Morgan Kaufmann Publishers Inc., San
Francisco, CA.

Wang and Witten. Inducing model trees for continuous classes. Proceedings of the Ninth European
Conference on Machine Learning (1997), pp. 128-137.

http://rulequest.com/cubist-info.html