
Please note: figures have been updated from the previous assignment, as we went back through the code and found some mistakes.


Q1 & 2 - Method Justification

Figure 1: Number of Bags vs Loss/MASE


Figure 2: Node size/ntree/mtry vs MASE/Sales Loss

Figure 3: Nu Learning Rate vs MASE/Sales Loss

To determine the optimal adjustable characteristics of the Boosting, Bagging and Random Forest methods, each characteristic was varied in turn as a sensitivity analysis. Based on Figures 1-3, the optimal characteristics were those that minimised the loss function and MASE without causing excessive processing time. For Bagging, 50 bags was the optimal number for the model: initially, as the number of bags increased, both the loss function and MASE decreased, but with more bags the results fluctuated. Intuitively this makes sense, as the extra randomness introduced into the model pushes the MASE and loss function back up. Because the Random Forest tuning parameters interact, a number of different combinations of ntree, mtry and node size reduced the error to near its optimum, as highlighted in Figure 2. Figure 3 examines the learning rate for Boosting; the inflection point, beyond which significant diminishing returns set in, is 0.1. It is interesting to note that error improves as the learning rate increases; although this seems counterintuitive, it is due to a mismatch between the outputs and the other variables.
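The sensitivity analysis above could be reproduced along the following lines. This is a minimal sketch using scikit-learn as a stand-in for the original tooling; the file path, the grid of bag counts and the naive baseline inside the MASE calculation are assumptions rather than details taken from our code. The same loop pattern extends to Random Forest (ntree, mtry and node size map to n_estimators, max_features and min_samples_leaf) and to the Boosting learning rate.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import train_test_split

wine = pd.read_csv("winequality-red.csv", sep=";")            # hypothetical file path
X, y = wine.drop(columns="quality"), wine["quality"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=200, random_state=1)

def mase(y_true, y_pred, y_train):
    # MASE scaled by the in-sample MAE of a naive training-mean predictor (assumption).
    naive_mae = np.mean(np.abs(y_train - np.mean(y_train)))
    return np.mean(np.abs(y_true - y_pred)) / naive_mae

results = []
for n_bags in [10, 25, 50, 100, 200]:                         # illustrative grid of bag counts
    model = BaggingRegressor(n_estimators=n_bags, random_state=1).fit(X_tr, y_tr)
    results.append((n_bags, mase(y_te, model.predict(X_te), y_tr)))
print(pd.DataFrame(results, columns=["bags", "MASE"]))
```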
Q3- The highest and lowest values are coloured red and green respectively in Figure 4. Random Forest (RF) displays the lowest results across all of the prediction error metrics, indicating it is the preferred model if the goal is to minimise model error. This result is explained by RF using its multiple trees (bags) in a more sophisticated manner than Bagging, reducing variance through the process of bootstrapping multiple times.

Figure 4: Prediction error and loss function for all models.
As described in Assignment 3, the asymmetric loss penalises underprediction five times as much as overprediction. RF had the second lowest loss function, significantly lower than the other ensemble methods of Bagging (2.316) and Boosting (2.431), whilst the SVM models were higher still at 2.821; RF would be the clear choice out of those models. Only KMeansReg has a lower loss function, almost half that of RF. KNN is slightly higher than RF (2.07), whilst the remaining supervised and unsupervised models range from around 2.4 up to the regression models, which are all over 6. These results indicate the regression models had substantially more underprediction than the models with similar error.
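For clarity, a plausible implementation of that asymmetric loss is sketched below. The 5:1 weighting is taken from the description above, but the absolute-error form is an assumption, since Assignment 3's exact definition is not reproduced here.

```python
import numpy as np

def asymmetric_loss(y_true, y_pred, under_weight=5.0):
    # Assumed form: absolute error, weighted 5x when the model underpredicts
    # (prediction below the true quality score) and 1x when it overpredicts.
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    weights = np.where(err > 0, under_weight, 1.0)   # err > 0 means underprediction
    return float(np.mean(weights * np.abs(err)))
```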

After RF, Bagging was the second strongest performer, with 4-5% lower error than Boosting and both SVM models on all four metrics, owing to the bootstrapping and repeated training on subsets of the data discussed above. Bagging's error was 12% higher than RF's, indicating a significant advantage from the more sophisticated tree node functionality of RF. Bagging's loss function was 20% higher than RF's, and closer to Boosting and SVM than to RF, most likely because the asymmetric loss exacerbates the discrepancy in model error.

Boosting and the two SVM models had similar RMSE results yet differed on the remaining metrics, with Boosting scoring 2% higher on MAPE and MASE but 14% lower on the loss function, suggesting it is preferable to the SVM models. Boosting also remains competitive against the models from weeks 1 and 3, with comparable error and loss function results. The two SVM models are identical in their performance, indicating no observable difference between the linear and polynomial kernels. Whilst the SVM models have average error results in comparison to the other models, only the regression models have higher loss function results, indicating the SVMs would be amongst the last to be recommended for predictive purposes in this context.

Figure 5: VIP Bagging
Figure 6: VIP Random Forest
Figure 7: VIP Boosting

The Bagging model shows VA, Alc and Sulphates are the most important variables for predicting the quality score of red wine. The variables' magnitudes changed with the number of bags chosen, and it should be noted that interpretation of the magnitude is not possible with this model type. The Random Forest model shows Alc, Sulphates and VA are the most important variables for predicting the quality score; their magnitudes changed with the number of trees, node size and mtry and, similarly to Bagging, the magnitude cannot be interpreted for this model type. The Boosting model shows Alc, VA and Sulphates as the three variables which reduced the loss the most; although VA had a lower frequency, its magnitude of loss reduction was higher than that of Sulphates. Throughout the new models introduced in task 4, Alc, Sulphates and VA were consistently the most important physicochemical properties when predicting red wine's quality score, although their order changed between the different models.
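The rankings in Figures 5-7 could be extracted programmatically along the lines below. This is a sketch using scikit-learn estimators as stand-ins for the models actually fitted, reusing X_tr and y_tr from the earlier sketch; the tuning values other than 50 bags and the 0.1 learning rate are placeholders.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import (BaggingRegressor, RandomForestRegressor,
                              GradientBoostingRegressor)

models = {
    "Bagging": BaggingRegressor(n_estimators=50, random_state=1),
    "Random Forest": RandomForestRegressor(n_estimators=500, max_features=4,
                                           min_samples_leaf=5, random_state=1),
    "Boosting": GradientBoostingRegressor(learning_rate=0.1, random_state=1),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)                              # training data from the earlier sketch
    if hasattr(model, "feature_importances_"):         # Random Forest and Boosting
        imp = model.feature_importances_
    else:                                              # Bagging: average importance over base trees
        imp = np.mean([t.feature_importances_ for t in model.estimators_], axis=0)
    top3 = pd.Series(imp, index=X_tr.columns).sort_values(ascending=False).head(3)
    print(name, dict(top3.round(3)))
```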

Q4- Overall, the methods did not perform particularly well on either the loss function or the standard predictive metrics. There are several reasons why this may be the case.

Our output metric, the quality score (QS), is an integer and subjective, which places several limitations on the ability to model and predict this value. Firstly, as an integer the granularity of the output is reduced, so the predictor variables may show less sensitivity to a change in QS.

In our opinion, the subjectivity of the QS is the major reason why the models trialled showed poor predictive performance. The QS is based on a person's opinion of the wine, and opinions are shaped by factors other than the wine itself: the assessor's mood, their wine preferences, the score of the previous wine and a host of other factors would all influence the score. This is compounded by the fact that the variables provided were all objective measures (chemical analyses) and no assessor variables were considered. There would be two ways to address this issue that may result in better prediction. Firstly, the QS scores provided would need to be validated, which would mean testing the inter-rater reliability of the score itself through metrics such as Pearson coefficients or kappa statistics; a simpler method would involve having the QS judged by more tasters and using the mean of the scores received. Otherwise, subjective factors of the assessor could be measured, collated and added to the data, although this presents its own issues in terms of statistical reliability.
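As an illustration of the proposed validation, inter-rater agreement could be quantified as below. The two sets of taster scores are hypothetical, since the dataset provides only a single aggregated QS per wine.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

# Hypothetical scores from two tasters for the same ten wines (illustrative only).
taster_a = np.array([5, 6, 5, 7, 4, 6, 5, 8, 6, 5])
taster_b = np.array([5, 6, 6, 7, 5, 6, 5, 7, 6, 4])

kappa = cohen_kappa_score(taster_a, taster_b, weights="quadratic")  # weighted kappa suits ordinal scores
r, _ = pearsonr(taster_a, taster_b)
print(f"weighted kappa = {kappa:.2f}, Pearson r = {r:.2f}")
```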

Another issue that would affect the results is the data itself and its skew away from high-scoring wines. Of the 1600 samples provided, only 18 score 8 or above and 199 score 7. This reduces the effective sample size for predicting those scores, resulting in a less robust understanding of variable importance. Since scores of 7 and above are the ones most valuable to the wine maker, a more robust data set with more wines scoring above 7 would be ideal. This becomes a greater issue in processes such as bootstrapping, as random sampling runs the risk of not all observations being used. In addition, as the evaluation metrics (including the loss function) depend entirely on a test set of 200 data points, any bias in the overall data set will carry through to the model evaluation. More representative data points would increase model robustness and potentially result in better predictions.
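Both sampling concerns can be quantified directly, as in the sketch below. The file path is an assumption, and the second calculation is the standard result that a single bootstrap sample is expected to miss roughly a third of the observations.

```python
import pandas as pd

wine = pd.read_csv("winequality-red.csv", sep=";")      # hypothetical file path
print(wine["quality"].value_counts().sort_index())      # very few wines score 7, 8 or above

n = len(wine)
p_excluded = (1 - 1 / n) ** n                           # chance a given wine is absent from one bootstrap sample
print(f"~{p_excluded:.1%} of wines are missing from any single bootstrap sample")  # about 36.8%
```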

Finally, in terms of the modelling tools used, the automatic and predictive models can lose interpretability because they are highly sensitive to changes in parameters (e.g. the learning rate has a large impact on the frequency variables). This means the models lose an element of intuition that is imperative. Each of the models used has its own characteristics that make it more desirable in certain situations, such as the type of Boosting loss function used (e.g. squared, absolute or Huber). It is important to understand the overall objective of the prediction and ensure the results meet that context.
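For example, the effect of the Boosting loss choice could be compared as below. GradientBoostingRegressor is used as a stand-in for the boosting implementation actually fitted, reusing the data split and asymmetric_loss from the earlier sketches; the loss names are scikit-learn's.

```python
from sklearn.ensemble import GradientBoostingRegressor

for loss in ["squared_error", "absolute_error", "huber"]:
    model = GradientBoostingRegressor(loss=loss, learning_rate=0.1, random_state=1)
    model.fit(X_tr, y_tr)                                # data split from the earlier sketch
    print(loss, round(asymmetric_loss(y_te, model.predict(X_te)), 3))
```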
