After RF, Bagging was the second-strongest performer, with 4-5% lower error than Boosting and both SVM models on all four metrics, owing to the bootstrapping and repeated training on subsets of the data discussed above. Bagging's error was 12% higher than RF's, indicating a significant advantage from RF's more sophisticated tree-node functionality. Bagging's loss function was 20% higher than RF's, sitting closer to Boosting and the SVMs than to RF, most likely because the loss function exacerbates the discrepancy in model error.
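The bootstrap-and-aggregate mechanism described above can be sketched in a few lines. This is a minimal illustration only, assuming a deliberately unstable 1-nearest-neighbour base learner as a stand-in for the regression trees used in the report; averaging predictions over bootstrap resamples smooths out the base learner's variance:

```python
import numpy as np

rng = np.random.default_rng(1)

def one_nn_predict(X_train, y_train, X_test):
    # 1-nearest-neighbour regressor: a deliberately high-variance base learner
    idx = np.abs(X_test[:, None] - X_train[None, :]).argmin(axis=1)
    return y_train[idx]

def bagged_predict(X_train, y_train, X_test, n_bags=50):
    preds = []
    for _ in range(n_bags):
        # draw one bootstrap resample (rows sampled with replacement)
        boot = rng.integers(0, len(X_train), size=len(X_train))
        preds.append(one_nn_predict(X_train[boot], y_train[boot], X_test))
    return np.mean(preds, axis=0)  # aggregate by averaging the base learners
```

RF differs from this plain Bagging loop by also randomising the features considered at each tree split, which is the extra decorrelation step the error gap above points to.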
Boosting and the two SVM models had similar RMSE results yet differed on the remaining metrics: Boosting scored 2% higher on MAPE and MASE but substantially (14%) lower on the loss function, suggesting it is preferable to the SVM models. Boosting also remains competitive against the models from weeks 1 and 3, demonstrating comparable error and loss-function results. The two SVM models performed identically, indicating no observable difference between the linear and polynomial kernels. While the SVM models show average error results relative to the other models, only the regression models have higher loss-function results, so the SVMs would be among the last recommended for predictive purposes in this context.
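For reference, the four evaluation metrics compared above are straightforward to compute. The following is a minimal sketch, assuming the common convention that MASE is scaled by the mean absolute error of a naive previous-value forecast on the training data (the report does not state which scaling it used):

```python
import numpy as np

def rmse(y, y_hat):
    # root mean squared error: penalises large errors quadratically
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mape(y, y_hat):
    # mean absolute percentage error, in percent (undefined if y contains 0)
    return np.mean(np.abs((y - y_hat) / y)) * 100

def mase(y, y_hat, y_train):
    # mean absolute scaled error: MAE relative to a naive forecast's MAE
    scale = np.mean(np.abs(np.diff(y_train)))
    return np.mean(np.abs(y - y_hat)) / scale
```

Because MAPE divides by the true value, low quality scores inflate percentage errors more than high ones, which is one reason the metrics can rank models differently.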
The Bagging model shows VA, Alc and Sulphates are the most important variables for predicting the quality score of red wine (Figure 5: VIP, Bagging). The variables' magnitudes changed with the number of bags chosen, and it should be noted that interpretation of the magnitude itself is not possible with this model type. The Random Forest model shows Alc, Sulphates and VA are the most important variables for predicting the quality score; again, the variables' magnitudes changed with the number of trees chosen.
Q4 - Overall, the methods did not show strong results on the loss function or on the standard predictive metrics. There are several reasons why this may be the case.
Our output metric, the quality score (QS), is an integer and is subjective, which places a number of limitations on our ability to model and predict it. Firstly, as an integer, the output has reduced granularity, so the predictor variables may show less sensitivity to a change in QS.
In our opinion, the subjectivity of the QS is the major reason the trialled models showed poor predictability. The QS is based on a person's opinion of the wine, and opinions depend on many factors beyond the wine itself: the assessor's mood, wine preferences, the score given to the previous wine, and a host of others. This is compounded by the fact that the variables provided were all objective measures (chemical analysis) and no assessor variables were considered. There are two ways to address this issue that may yield better prediction. Firstly, the QS scores provided would need to be validated, which means testing the inter-rater reliability of the score through metrics such as Pearson coefficients or kappa statistics. A simpler method would have the QS judged by more tasters and use the mean of their scores. Otherwise, subjective factors of the assessor could be measured, collated and added to the data, although this presents its own issues in terms of statistical reliability.
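As a sketch of the inter-rater check suggested above, Cohen's kappa for two raters' integer scores can be computed directly from observed versus chance agreement. This is a minimal illustration of the statistic, not the report's actual validation procedure:

```python
import numpy as np

def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters scoring the same items."""
    r1, r2 = np.asarray(rater1), np.asarray(rater2)
    categories = np.union1d(r1, r2)
    p_observed = np.mean(r1 == r2)  # fraction of items where raters agree
    # chance agreement: product of each rater's marginal rate per category
    p_expected = sum(np.mean(r1 == c) * np.mean(r2 == c) for c in categories)
    return (p_observed - p_expected) / (1.0 - p_expected)
```

A kappa near 1 indicates reliable scores worth modelling; a kappa near 0 means agreement is no better than chance, and no model fit to such scores can predict well.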
Another issue that would affect the results is the data itself and its skew away from high-scoring wines. Of the 1600 samples provided, only 18 score 8 or above and 199 score 7. This reduces the relative sample size for predicting those scores, which results in a less robust understanding of variable importance. Since scores of 7 and above are the most valuable to the winemaker, a more robust dataset with more wines scoring above 7 would be ideal. This becomes a greater issue in processes such as bootstrapping, where random sampling runs the risk of some observations never being used. In addition, because the evaluation metrics (including the loss function) depend entirely on a test set of 200 data points, bias in the overall dataset will influence the model evaluation. More representative data points would increase model robustness and potentially result in better predictions.
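The bootstrapping risk noted above is easy to quantify: a bootstrap resample of n observations leaves out roughly a fraction (1 - 1/n)^n ≈ 1/e ≈ 36.8% of the data each time, so rare high-scoring wines can be missing from a resample entirely. A quick simulation, assuming n = 1600 as in the dataset above:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1600       # approximate red-wine sample size
trials = 500
left_out = []
for _ in range(trials):
    # one bootstrap resample: n draws with replacement
    sample = rng.integers(0, n, size=n)
    # fraction of the original observations never drawn
    left_out.append(1.0 - np.unique(sample).size / n)
mean_left_out = float(np.mean(left_out))  # close to 1/e, i.e. about 0.368
```

With only 18 wines scoring 8 or above, each bag therefore misses around a third of an already tiny high-score class, compounding the imbalance problem.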
Finally, in terms of the modelling tools used, the automatic and predictive models can lose interpretability because they are highly sensitive to parameter changes (e.g. the learning rate has a large impact on variable-importance frequencies). As a result, the models lose an element of intuition that is imperative. Each of the models used has its own characteristics that make it more desirable in certain situations, such as the type of Boosting loss function used (e.g. squared, absolute, or Huber). It is important to understand the overall objective of the prediction and ensure the results fit that context.
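To make that last point concrete, the three Boosting loss options mentioned differ in how heavily they weight large residuals. A minimal sketch of their standard forms, with an assumed Huber threshold of delta = 1.0 (the report does not specify one):

```python
import numpy as np

def squared_loss(r):
    return 0.5 * r ** 2          # smooth, but heavily penalises outliers

def absolute_loss(r):
    return np.abs(r)             # robust, but not differentiable at zero

def huber_loss(r, delta=1.0):
    # quadratic for small residuals, linear beyond delta: a compromise
    r = np.asarray(r, dtype=float)
    return np.where(np.abs(r) <= delta,
                    0.5 * r ** 2,
                    delta * (np.abs(r) - 0.5 * delta))
```

On a residual of 3, squared loss gives 4.5 while Huber (delta = 1) gives 2.5, so a few badly mis-scored wines pull a Huber-fitted Boosting model around far less, which matters for a subjective target like QS.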