
AIDI 1010 –

Introduction to Emerging Technologies

Week 4
Jahanzeb Abbas (JB)
Week Objectives
 (A) auto-sklearn
 What is it?

 (B) auto-sklearn
 Pros/Cons
 How-to-Install
 Hyperparameters

 (C) Example
 (D) References

(A) auto-sklearn (What is it?)
 Auto-sklearn is an out-of-the-box (OOB) solution that automatically searches for a suitable learning algorithm for a new ML dataset and also optimizes its hyperparameters
 It was introduced by Matthias Feurer et al. in 2015; it is available as open source and all details are provided on its GitHub page (https://github.com/automl/auto-sklearn)
 It is a drop-in replacement for scikit-learn classifiers; it includes a total of 15 classification algorithms and 14 feature-preprocessing algorithms
 It is based on Bayesian optimization (which chooses the best hyperparameters)
 It assists with data scaling, encoding of categorical parameters, and missing values
 It wraps a generic ML framework in an efficient global optimization process; it builds an ensemble of all models tested and speeds up the optimization process by using ‘meta-learning’ to identify similar datasets and reuse knowledge gathered in the past
 It addresses regression, classification, and multi-label classification problems; see the additional examples (https://automl.github.io/auto-sklearn/master/examples/index.html) and the regression sketch below
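The regression API mirrors the classification one. A minimal sketch (the diabetes dataset and the time budgets are arbitrary choices for illustration):

import autosklearn.regression
import sklearn.datasets
import sklearn.model_selection
import sklearn.metrics

# Load a small built-in regression dataset and hold out a test split
X, y = sklearn.datasets.load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, random_state=1)

# Search for a regression pipeline; short budgets keep the demo quick
automl = autosklearn.regression.AutoSklearnRegressor(
    time_left_for_this_task=120,  # total search time in seconds
    per_run_time_limit=30)        # time allowed per candidate model
automl.fit(X_train, y_train)

y_hat = automl.predict(X_test)
print("R^2 score", sklearn.metrics.r2_score(y_test, y_hat))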

TANGENT: Ensemble Modeling
 Ensemble modeling improves ML results by combining multiple models as opposed to relying on a single model
 auto-sklearn uses both “Voting” and “Stacking” ensemble methods for combining models; see the sketch after this list
 Voting predicts based on the weighted average of the predicted class probabilities (for classification) or of the predicted regression targets (for regression)
 Stacking combines heterogeneous models and trains a meta-model on the outputs of the individual models
 Heterogeneous ensembles contain a set of classifiers of different types built on the same data; they can use different feature-selection methods with the same training data
 Homogeneous ensembles contain a set of classifiers of the same type built on different data; they use the same feature-selection method with different training data, often distributed over several nodes
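To make the two ideas concrete, here is a minimal sketch in plain scikit-learn (the three base classifiers are arbitrary illustrative choices, not what auto-sklearn itself would pick):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Heterogeneous base models: different algorithm types on the same data
base_models = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("rf", RandomForestClassifier(random_state=1)),
    ("svc", SVC(probability=True, random_state=1)),
]

# Voting: average the predicted class probabilities ("soft" voting)
voting = VotingClassifier(estimators=base_models, voting="soft")

# Stacking: train a meta-model on the base models' outputs
stacking = StackingClassifier(estimators=base_models,
                              final_estimator=LogisticRegression(max_iter=1000))

print("Voting accuracy  :", cross_val_score(voting, X, y, cv=5).mean())
print("Stacking accuracy:", cross_val_score(stacking, X, y, cv=5).mean())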

(B) auto-sklearn: Pros/Cons
PROS:
 Best for small or medium datasets
 Creates a pipeline to optimize a prediction
 Combines three components: meta-learning (used to initialize the search), Bayesian optimization (to construct and evaluate hyperparameter configurations during the optimization process), and ensemble construction

CONS:
 Performs very slowly on large datasets
 Cannot produce modern deep learning systems

(B) auto-sklearn: How-To-Install / Syntax
!pip install auto-sklearn

import autosklearn.classification
import sklearn.model_selection
import sklearn.datasets
import sklearn.metrics

# Load the built-in digits dataset and hold out a test split
X, y = sklearn.datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, random_state=1)

# Search for the best pipeline (with default settings this runs for one hour)
automl = autosklearn.classification.AutoSklearnClassifier()
automl.fit(X_train, y_train)

# Evaluate the final ensemble on the held-out data
y_hat = automl.predict(X_test)
print("Accuracy score", sklearn.metrics.accuracy_score(y_test, y_hat))

(B) auto-sklearn: Hyperparameters
Parameter name | Default value | Description
load_models | True | Whether to show the models after fitting.
time_left_for_this_task | 3600 | Time limit in seconds for the whole search; increasing it increases the chance of better performance.
per_run_time_limit | None | Time limit in seconds for each individual ML model.
initial_configurations_via_metalearning | 25 | How many meta-learned configurations seed the hyperparameter optimization; if set to 0, this option is inactive. Not available in auto-sklearn V2.
ensemble_size | 50 | The number of models in the ensemble; to disable ensembling, set it to 1.
n_jobs | 1 | The number of parallel jobs; to use all processors, set it to -1.
ensemble_nbest | 50 | Number of best models considered when building the ensemble; only works when ensemble_size is more than one.
include_estimators | None | With None, all estimators are used. Not available in auto-sklearn V2.
exclude_estimators | None | Excludes the given estimators from the search space. Not available in auto-sklearn V2.
metric | None | If you don't define a metric, one is selected based on the task (e.g., autosklearn.metrics.roc_auc).
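Putting several of these together (a minimal sketch: the values are arbitrary and only show how the parameters above are passed to the constructor; metric=roc_auc assumes a binary task):

import autosklearn.classification
import autosklearn.metrics

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=600,                 # total budget: 10 minutes
    per_run_time_limit=60,                       # at most 60 s per model
    initial_configurations_via_metalearning=25,  # meta-learning warm start
    ensemble_size=50,                            # models in the final ensemble
    ensemble_nbest=50,                           # candidates considered for it
    n_jobs=-1,                                   # use all processors
    metric=autosklearn.metrics.roc_auc)          # optimize ROC AUC (binary tasks)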

(C) auto-sklearn: Example
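An end-to-end sketch on a real dataset (assumptions: the red-wine CSV from the UCI archive cited in reference 15, treated as a classification task, with short time budgets so it finishes quickly):

import pandas as pd
import autosklearn.classification
import sklearn.model_selection
import sklearn.metrics

# Load the UCI red wine-quality data (reference 15); its field separator is ';'
url = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
       "wine-quality/winequality-red.csv")
df = pd.read_csv(url, sep=";")

# Predict the quality rating from the 11 physico-chemical features
X = df.drop(columns="quality")
y = df["quality"]
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, random_state=1)

# Short budgets so the demo finishes in a few minutes
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=300, per_run_time_limit=30)
automl.fit(X_train, y_train)

y_hat = automl.predict(X_test)
print("Accuracy", sklearn.metrics.accuracy_score(y_test, y_hat))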

Disclaimer
Due to the nature of the course, various materials have been compiled from different open-source
resources with some moderation.
The course designer (slide creator) sincerely acknowledges their hard work and contribution;
credit will be given wherever necessary.

References
1. https://neptune.ai/blog/a-quickstart-guide-to-auto-sklearn-automl-for-machine-learning-practitioners
2. https://automl.github.io/auto-sklearn/master/
3. https://github.com/automl/auto-sklearn
4. https://www.automl.org/automl/auto-sklearn/
5. https://youtu.be/jn-22XyKsgo
6. https://analyticsindiamag.com/complete-guide-to-using-autosklearn-tool-for-faster-machine-learning-implementations/
7. https://www.alibabacloud.com/blog/6-top-automl-frameworks-for-machine-learning-applications-may-2019_595317
8. https://www.kaggle.com/soham1024/know-about-different-automl-frameworks?scriptVersionId=43751621#Pros-and-cons
9. https://www.kaggle.com/soham1024/know-about-different-automl-frameworks?scriptVersionId=43751621#Auto-Sklearn
10. https://www.hindawi.com/journals/mpe/2013/312067/tab2/
11. https://stackoverflow.com/questions/49445446/homogeneous-vs-heterogeneous-ensembles
12. https://adamnovotnycom.medium.com/google-colab-and-automl-auto-sklearn-setup-2dff936372e6
13. https://machinelearningmastery.com/auto-sklearn-for-automated-machine-learning-in-python/
14. https://youtu.be/K490SP-_H0U
15. https://archive.ics.uci.edu/ml/datasets/wine+quality

Thank You Very Much

Any Questions?