Dr. Peter Arndt, Dr. Konrad Völkel
Heinrich-Heine-Universität Düsseldorf
Winter Term 2022/23
Machine Learning
Exercise Sheet 10
(3 Exercises, 100 Points)
Due: 20.12.2022, 10:00

Exercise 1: Gradient boosting linear regression (25 Points)

We can use the gradient boosting approach with base models other than regression trees.
Explain under which circumstances gradient-boosted linear regression (with a second linear
regression predicting the residuals of the first) will work better or worse than a single
linear regression on the data, and explain why.
For the linear regressions considered here, do not use any kernels or basis function
expansions (so the model is purely linear).
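
The following is a minimal sketch of the two-stage setup on synthetic data, which you can
use to check your explanation empirically (the data and variable names are illustrative,
not part of the exercise statement):

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

    # Stage 1: an ordinary linear regression.
    first = LinearRegression().fit(X, y)
    residuals = y - first.predict(X)

    # Stage 2: a second linear regression fitted to the residuals.
    second = LinearRegression().fit(X, residuals)

    boosted_pred = first.predict(X) + second.predict(X)
    single_pred = LinearRegression().fit(X, y).predict(X)

    # Compare the two predictors, e.g. via mean squared error.
    print(np.mean((boosted_pred - y) ** 2))
    print(np.mean((single_pred - y) ** 2))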

Exercise 2: The number of trees hyperparameter (15 Points)

Explain what happens to a random forest classification model if, with all other
hyperparameters kept fixed, the number of trees in the ensemble grows (in the limit, to ∞).
Will the classification accuracy on a test set not used for training go up or down? What
about the variance?
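
A minimal sketch for observing this empirically (the synthetic data and the tree counts are
illustrative, not prescribed by the exercise):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=500, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Grow the ensemble while keeping all other hyperparameters fixed.
    for n_trees in (1, 10, 100, 1000):
        clf = RandomForestClassifier(n_estimators=n_trees, random_state=0)
        clf.fit(X_train, y_train)
        print(n_trees, clf.score(X_test, y_test))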

Exercise 3: Tree ensemble methods on penguins (programming task, 60 Points)

(see the notebook ml-forests-companion.ipynb for clues on preparing the dataset)


The goal is to implement both Random Forest and AdaBoost for classification and to evaluate
your implementation by comparison with Scikit-Learn's on the Palmer Penguins dataset, after
finding suitable hyperparameters.
Proceed in these steps:
1. (5 points) Load and prepare the Palmer Penguins dataset for the classification task;
exclude the species 'Chinstrap' from the data to make it a binary classification problem.
2. (10 points) Find good hyperparameters for high accuracy using
sklearn.ensemble.RandomForestClassifier and
sklearn.ensemble.AdaBoostClassifier on the dataset prepared before (a sketch of
steps 1 and 2 follows at the end of the sheet).
3. (20+20 points) Using sklearn.tree.DecisionTreeClassifier, but none of the
sklearn.ensemble methods and classes, implement both the Random Forest and
AdaBoost algorithms (a rough AdaBoost skeleton also follows at the end of the sheet).
4. (5 points) Compare the performance of your own implementations against Scikit-Learn's,
using the hyperparameters found previously. If you see a notable difference, try to
explain it.
If you decide to implement only one of the two algorithms, you can still get up to 40 points
in this exercise.
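
As a starting point for steps 1 and 2, here is a minimal sketch. It assumes the dataset is
loaded via seaborn's load_dataset and that the four numeric measurements are used as
features; the companion notebook may prepare the data differently, and the parameter grid
is illustrative:

    import seaborn as sns
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV, train_test_split

    # Step 1: load, drop missing values, exclude 'Chinstrap'.
    penguins = sns.load_dataset("penguins").dropna()
    penguins = penguins[penguins["species"] != "Chinstrap"]

    features = ["bill_length_mm", "bill_depth_mm",
                "flipper_length_mm", "body_mass_g"]
    X = penguins[features].to_numpy()
    y = (penguins["species"] == "Adelie").astype(int).to_numpy()

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0, stratify=y)

    # Step 2: grid search over an illustrative parameter grid;
    # sklearn.ensemble.AdaBoostClassifier can be tuned analogously.
    grid = GridSearchCV(
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100, 200], "max_depth": [2, 4, None]},
        cv=5)
    grid.fit(X_train, y_train)
    print(grid.best_params_, grid.best_score_)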
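
For the AdaBoost part of step 3, a rough skeleton of the classical binary algorithm; this
is a sketch assuming labels encoded as -1/+1, not a complete solution:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def adaboost_fit(X, y, n_rounds=50):
        """Binary AdaBoost with decision stumps; y must contain -1/+1 labels."""
        n = len(y)
        w = np.full(n, 1.0 / n)              # uniform initial sample weights
        stumps, alphas = [], []
        for _ in range(n_rounds):
            stump = DecisionTreeClassifier(max_depth=1)
            stump.fit(X, y, sample_weight=w)
            pred = stump.predict(X)
            err = np.sum(w * (pred != y))    # weighted training error
            if err >= 0.5:                   # no better than chance: stop
                break
            err = max(err, 1e-10)            # guard against division by zero
            alpha = 0.5 * np.log((1 - err) / err)
            w *= np.exp(-alpha * y * pred)   # up-weight misclassified samples
            w /= w.sum()
            stumps.append(stump)
            alphas.append(alpha)
        return stumps, alphas

    def adaboost_predict(X, stumps, alphas):
        scores = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
        return np.sign(scores)

With the labels from the previous sketch re-encoded as y = 2 * y - 1, this plugs directly
into the same train/test split.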
