
An Empirical Study on the Usage of Automated Machine Learning Tools


Forough Majidi†, Moses Openja†, Foutse Khomh†, Heng Li†
{forough.majidi, openja.moses, foutse.khomh, heng.li}@polymtl.ca

Polytechnique Montréal, Montréal, Quebec, Canada

Abstract—The popularity of automated machine learning (AutoML) tools in different domains has increased over the past few years. Machine learning (ML) practitioners use AutoML tools to automate and optimize the process of feature engineering, model training, hyperparameter optimization, and so on. Recent work performed qualitative studies on practitioners' experiences of using AutoML tools and compared different AutoML tools based on their performance and provided features, but none of the existing work studied the practices of using AutoML tools in real-world projects at a large scale. Therefore, we conducted an empirical study to understand how ML practitioners use AutoML tools in their projects. To this end, we examined the top 10 most used AutoML tools and their respective usages in a large number of open-source project repositories hosted on GitHub. The results of our study show 1) which AutoML tools are mostly used by ML practitioners and 2) the characteristics of the repositories that use these AutoML tools. Also, we identified the purposes of using AutoML tools (e.g., model parameter sampling, search space management, model evaluation/error analysis, data/feature transformation, and data labeling) and the stages of the ML pipeline (e.g., feature engineering) where AutoML tools are used. Finally, we report how often AutoML tools are used together in the same source code files. We hope our results can help ML practitioners learn about different AutoML tools and their usages, so that they can pick the right tools for their purposes. Besides, AutoML tool developers can benefit from our findings to gain insight into the usages of their tools and improve their tools to better fit users' usages and needs.

Index Terms—Empirical, Automated machine learning, AutoML tools, GitHub repository

I. INTRODUCTION

Due to the growth of available data, the usage of machine learning (ML) has increased dramatically over the past few years, and many business communities and researchers are using ML to analyze their data and reach their goals [1]. ML practitioners usually have to follow a set of repetitive tasks including data collection and data preprocessing, feature engineering, hyperparameter optimization, model training, evaluating the performance of the resulting models, and deploying and monitoring the models in the production environment. As a result, a new field of automated machine learning (AutoML) has emerged over the past few years to automate and reduce the effort and time consumed by these repetitive tasks [2], such as model training/retraining and hyperparameter optimization in the ML pipeline. For example, Tpot is an AutoML tool that uses a genetic algorithm to optimize the ML pipeline. In addition, many AutoML tools, such as Autosklearn [3], [4], Autokeras [5], Snorkel [6], and Optuna [7], have been introduced for automating model management, data labeling, and hyperparameter optimization.

The popularity of AutoML tools has attracted the attention of researchers [8]–[10]. For instance, Xin et al. [8] conducted a qualitative study to understand the use of AutoML tools in practice by examining users' experience when using AutoML tools, the integration of AutoML features in ML workflows, and the challenges faced when using the features of AutoML tools in the ML workflow. They suggested that AutoML tools should support interactive user interface design to explore the intermediate execution (i.e., "human-in-the-loop"). Also, through a qualitative study, Wang et al. [9] studied how much automation is required by data scientists and reported that designing a "human-in-the-loop" AutoML tool is better than a tool that completely automates the ML work [9]. Drozdal et al. [10] studied, through qualitative studies, what impacts the trust in the ML models built using AutoML tools. They found that non-functional requirements such as model performance metrics, visualization, and transparency increase the confidence of data scientists in AutoML tools.

Although prior works performed qualitative studies to understand ML practitioners' experiences of using AutoML tools, it remains unclear how real-world projects use AutoML tools in their ML pipelines. Our study attempts to complement the existing studies through a large-scale analysis of real-world open-source projects that use AutoML tools, aiming to get new empirical evidence and insights on the use of AutoML tools. Specifically, we extracted the information of the projects that use AutoML tools from GitHub and followed a mixture of both qualitative and quantitative analyses. We organize our study through the following research questions (RQs):

• RQ1: What are the most used AutoML tools? We examined the usage of AutoML tools in GitHub projects and identified the top 10 most used AutoML tools: Optuna, HyperOpt, Skopt, Featuretools, Tpot, Bayes_opt, Autokeras, Auto-sklearn, AX, and Snorkel. Then, we analyzed the characteristics of the projects that use these tools, including their development history and popularity. ML practitioners can learn from our findings about which AutoML tools are mostly used and their characteristics when searching for the right tools for their ML pipeline. Besides, our findings can help AutoML tool developers gain insights into the projects that use their tools, which could allow them to make better decisions to further improve their tools.
• RQ2: How do ML practitioners use AutoML tools? Through manual analysis, we examined the purposes of using AutoML tools and the stages of the ML pipeline where they are used. We observed that AutoML tools are used mainly during the Hyperparameter optimization, Model training, and Model evaluation stages of the ML pipeline. We also observed 10 main purposes of using AutoML tools, such as Hyperparameter optimization, Model management, Data management, and Visualization. ML practitioners can save time and effort by using AutoML tools to automate their ML pipeline tasks [2] for similar purposes in similar stages of the ML pipeline.

• RQ3: Are different AutoML tools used together? We analyzed the source code of the projects using AutoML tools to examine whether AutoML tools are used together in the same source code files. We found that, among the top 10 AutoML tools, only a few of them were used together, including Hyperopt being used together with other tools, such as Tpot, Skopt, and Optuna. Our findings can help ML practitioners with the need for heterogeneous AutoML features choose the combinations of tools that are mostly used together by other ML practitioners. In addition, developers of AutoML tools can leverage our findings to provide APIs that make the collaboration between their tool and other co-used tools easier.

Organization: The rest of the paper is organized as follows. In Section II, background about AutoML tools and the ML pipeline is provided. In Section III, we provide a summary of the important related works. In Section IV, the experimental setup of the study is explained. The results of the research questions are presented in Section V. A discussion of these results is presented in Section VI. Section VII discusses threats to the validity of our study. In Section VIII, we conclude the paper and outline some avenues for future work.

II. BACKGROUND

AutoML tools are used to automate one or more stages (e.g., hyperparameter optimization) of the ML pipeline. This section provides background information on a typical ML pipeline and AutoML tools. In this article, the term "ML practitioner" refers to anyone who uses AutoML or ML in their project or work. An ML practitioner can be a developer, a researcher, a data scientist, or an ML engineer.

A. Machine learning pipeline

Figure 1 shows the nine stages of a typical ML pipeline for building and deploying ML software systems used at Microsoft, described by Amershi et al. [11]. We consider it as a reference ML pipeline in our study. Similar ML pipelines are also adopted by other commercial companies, such as Google [12] or IBM [13]. In the following, we describe the different stages of the ML pipeline mentioned in [11].

The model requirements stage is where ML practitioners select the type of ML model that fits their problem and products. In the data collection stage, ML practitioners identify the data to collect from different sources, such as database systems or file storage. The stage of data cleaning includes cleaning the data to remove any anomalies that would likely hinder the training of the ML model. The stage of data labeling refers to assigning the proper labels to the data, such as assigning labels to a set of images reflecting the names of the objects present in the images.

In the stage of feature engineering, ML practitioners prepare (e.g., by transforming or normalizing the features/data) and validate the features that should be used in the model training phase. The model training phase refers to training the ML models using the prepared features and the different implemented ML algorithms. The result of this stage is a trained ML model used to predict the labels of new incoming data. In the model evaluation stage, the predictive performance of the trained ML models is evaluated on a validation dataset.

The model deployment stage is where the evaluated ML model and the entire workflow are exported for deployment. Finally, the model monitoring stage is where the model's behavior is observed in the production environment to ensure that the model is doing what it is expected to do.

Fig. 1. The stages of the ML pipeline based on [11]: Model Requirements, Data Collection, Data Cleaning, Data Labeling, Feature Engineering, Model Training, Model Evaluation, Model Deployment, and Model Monitoring.

B. Automated machine learning (AutoML)

The primary goal of automation in ML started from reducing human effort in the loop, specifically during the hyperparameter tuning and model selection stages [14]. However, the success of hyperparameter tuning and model selection also depends on other steps, such as data cleaning and preparation or feature engineering [14]–[17]. More recently, AutoML has been broadly used in automating multiple steps of the machine learning workflow, from data collection and preparation to the deployment and monitoring of the ML model in production.

A variety of AutoML tools have been proposed to automate the computational work of building an ML pipeline. Some of these tools are built upon widely used ML frameworks. For example, Auto-Sklearn [18], [19] and TPOT [20], [21] are built on scikit-learn [22]. These AutoML tools focus largely on supporting data preparation [14], [16], [23], feature engineering [24], hyperparameter tuning [25], model selection [14], [16], [26], and end-to-end development support for machine learning software systems [27].
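To make this concrete, the following is a minimal, hedged sketch of how such a scikit-learn-based tool automates model selection and tuning in a few lines; it assumes the auto-sklearn package and a standard scikit-learn dataset are available, and it is illustrative rather than drawn from the studied projects.

```python
import autosklearn.classification
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Auto-Sklearn searches over scikit-learn pipelines (preprocessing, model
# selection, and hyperparameters) within the given time budget (seconds).
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120)
automl.fit(X_train, y_train)
print(automl.score(X_test, y_test))
```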
III. RELATED WORKS

The related research papers are categorized into two groups and discussed in this section.

A. Qualitative studies on the usage of AutoML tools

Prior work performs qualitative studies (e.g., through interviews) to understand practitioners' experiences of using
AutoML tools [8]–[10], [15], [28]–[30]. For example, Xin et al. [8] investigated the use of AutoML tools through semi-structured interviews with AutoML users of different categories and expertise. They argued that although AutoML tools increase productivity for experts and make ML accessible to beginners, designing completely automated ML tools is impractical, and designers of AutoML tools should focus on improving the collaboration between ML practitioners and AutoML tools. Similarly, Wang et al. [9] argued that there is no need for tools that are fully automated; instead, developing AutoML tools that are explainable and that integrate people in the ML workflow loop is more important. Crisan et al. [28] observed that using AutoML tools in real practice is not easy and suggested that one way to facilitate the interaction between humans and AutoML tools is data visualization. Similarly, Drozdal et al. [10] found that visualization and the inclusion of transparency features build trust in users. Also, they recommended that AutoML tools should be easily customizable to enable users to apply their preferences. Xanthopoulos et al. [30] defined some criteria to evaluate the features of AutoML tools. They concluded that performance does not guarantee the success of an AutoML tool because there are other important factors that should be taken into consideration, such as the ease of use and the interpretability of the results.

B. Comparison of AutoML tools

Ferreira et al. [31] studied eight AutoML tools and compared them by conducting computational experiments based on General Machine Learning (GML), deep learning, and XGBoost scenarios. The results show that current GML AutoML tools can produce close or even better predictive results than human-created ML models. Truong et al. [2] reported the different stages of the ML pipeline covered by some specific AutoML tools. Then, they examined the performance of the AutoML tools on different datasets. They found that none of the studied AutoML tools performs better than the others. They suggest that AutoML tool developers should consider leveraging ideas from other AutoML tools to improve their tools.

The previous works studied 1) the usage of AutoML tools by conducting qualitative studies such as interviews and 2) the comparison between AutoML tools. However, no work studied how ML practitioners use AutoML tools in real-world projects at a large scale. In this work, we attempt to fill this gap by investigating the usage of AutoML tools in real-world GitHub projects. We aim to provide insights for ML practitioners and AutoML tool providers to improve the usage and design of AutoML tools.

IV. EXPERIMENTAL SETUP

This section describes the methodology followed to answer the proposed research questions. Our study design consists of a mixture of qualitative and quantitative (i.e., sequential mixed-methods [32]) approaches. Figure 2 shows an overview of our study design, explained in the following paragraphs.

Fig. 2. An overview of our study design. Cn: configurations of a given AutoML tool (e.g., "import+tpot"), Pn: GitHub projects that are using AutoML (i.e., containing a source code file that imports at least one of the AutoML configurations), IV and V correspond to the Section indices.

1 Identifying AutoML tools:
In this initial step of our methodology, the AutoML tools were identified from two sources: research papers and Google search. The two sources allow us to cover a broader range of AutoML tools proposed in research publications and industry. In the following, we describe the identification steps for each data source.

• AutoML tools from research papers: To extract the list of AutoML tools mentioned in research papers, we used the search keywords "automated" AND "machine learning" to extract the scholarly literature related to AutoML on the Google Scholar [33] and Engineering Village (using the Inspec and Compendex databases) [34] platforms. We limited our search to the literature published between 2010 and 2021. Then, we examined the first 10 research papers from each of the two sources to collect the list of AutoML tools, as the search results of both sources are sorted based on the papers' relevance to the search keywords. Specifically, we were interested in the open-source AutoML tools that host their source code on GitHub, to allow us to study their usage in detail. In total, we extracted 31 open-source AutoML tools from the research papers.

• AutoML tools from Google search: We used Google search to identify the names of open-source AutoML tools mentioned across different websites. We tried different keywords and finally settled on the keywords "(automl OR "automated machine learning") AND tools AND (GitHub OR list OR curated)" and extracted all the possible AutoML tools from the first 100 resulting websites. We created a list of the open-source AutoML tools that have a GitHub repository in the next step. One of the results was the awesomeopensource [35] website, which contains 426 AutoML open-source projects. However, extracting all the possible AutoML tools listed on the website requires significant manual effort. Therefore, to get the most out of the tools listed, we limited our criteria to the most popular AutoML tools listed on the awesomeopensource [35] website by considering the AutoML tools with more than 1,000 stars. In total, we collected the names of 95 AutoML tools from Google search.

2 Filtering AutoML tools:
After extracting the lists of names of AutoML tools from
research papers and Google search results, we merged the two lists and removed the duplicates. Then, using the GitHub API [36], we extracted the repository details of each of these AutoML tools on November 24, 2021. For each AutoML tool's repository, we collected the following metadata: the number of contributors, the number of releases, description, number of forks, size of the repository, creation date, last update, fork status, programming languages, number of stars, number of commits, and last commit date. We removed the tools with only one contributor (8 tools removed) or zero stars (1 tool removed). Also, using the last update date of the tools, we removed the tools that have not been active since 2019 or earlier (12 tools removed), because we are interested in the tools that are still under development due to the growth of the technology. Also, we manually checked all the remaining tools to verify that they are real AutoML tools. At the end of this step, we remained with 57 AutoML tools.
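As an illustration, the following sketch applies these three filters through the GitHub REST API; the token placeholder and the use of the pushed_at field as the "last update" date are our assumptions, not details specified in the study.

```python
import requests

HEADERS = {"Authorization": "token <YOUR_GITHUB_TOKEN>"}  # placeholder token

def keep_tool(full_name: str) -> bool:
    """Keep a tool repository only if it has more than one contributor,
    more than zero stars, and activity after 2019 (approximated here
    by the repository's pushed_at timestamp)."""
    repo = requests.get(f"https://api.github.com/repos/{full_name}",
                        headers=HEADERS).json()
    contributors = requests.get(
        f"https://api.github.com/repos/{full_name}/contributors",
        headers=HEADERS, params={"per_page": 2}).json()
    return (len(contributors) > 1
            and repo["stargazers_count"] > 0
            and repo["pushed_at"][:4] >= "2020")
```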
3 Collecting GitHub projects using AutoML tools:
To find the projects that use open-source AutoML tools, we collected the configurations of each AutoML tool identified in the previous step by examining their documentation, GitHub repository, and sample source code. For example, 'import+optuna' and 'from+optuna' are the configurations that we found for Optuna. We only focused on the tools for which the configurations are in the Python programming language, because Python is known as one of the main programming languages for building ML models and ML software projects [37], [38]. Therefore, we removed the tools for which the configurations are not written in Python and the tools for which we could not find the configurations in their documentation (3 tools removed).

Next, using the AutoML configurations as keywords, we searched the GitHub API for the source code files and the respective GitHub projects that import the configurations. For example, we searched for the keywords 'from+tpot' and 'import+tpot' in the GitHub API to find repositories and the corresponding source code files that use the Tpot AutoML tool. Then, we collected the paths to 97,815 source code files that contain at least one configuration of an AutoML tool and the names of the corresponding repositories (33,568 repository names collected) on December 4th and 5th, 2021. In the third step, we removed duplicates from the resulting repositories (10,985 duplicates removed) and the source code files (39,194 duplicates removed).
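A minimal sketch of this configuration-based search via the GitHub code search API is shown below; the authentication header, pagination, and rate-limit handling are our assumptions, as the paper does not specify them.

```python
import requests

HEADERS = {"Authorization": "token <YOUR_GITHUB_TOKEN>",
           "Accept": "application/vnd.github.v3+json"}

def search_configuration(configuration: str, page: int = 1) -> list:
    """Return (repository, file path) pairs whose code matches an AutoML
    configuration keyword such as 'import tpot' or 'from tpot'."""
    response = requests.get("https://api.github.com/search/code",
                            headers=HEADERS,
                            params={"q": f'"{configuration}" language:python',
                                    "page": page, "per_page": 100})
    response.raise_for_status()
    return [(item["repository"]["full_name"], item["path"])
            for item in response.json().get("items", [])]

# e.g., search_configuration("import tpot") and search_configuration("from tpot")
```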
Some repositories that use AutoML tools are not real projects, such as course assignment repositories. We are not interested in these projects because they may not reflect the actual use of AutoML tools in real projects. Therefore, we identified the keywords that may reflect non-real projects. To do so, we randomly sampled 378 repositories from our dataset with a 95% confidence level and a 5% confidence interval. We manually analyzed these repositories and identified keywords including "assignment", "book", "chapter", "tutorial", and "course" (case insensitive). We used these keywords and removed the repositories with at least one of these keywords appearing in their name (602 repositories removed). Also, we removed the corresponding source code file paths of the non-real projects (2,568 source code file paths removed). Furthermore, following the best practices established by previous studies [39]–[43], we removed the repositories and the corresponding source code file paths of the forked projects. This step ensures that the studied projects are not copies of another project but the mainline projects (190 source code file paths removed). Finally, 21,981 projects and 55,863 source code file paths are retained in our dataset. Then, using the GitHub API, we extracted each repository's metadata, such as the number of contributors, number of commits, description, forks, repository size, creation date, update date, last commit date, fork status, and number of stars. We collected the repositories' details on December 5th and 6th. Finally, we downloaded the source code files using the collected paths on December 8, 2021, for further analysis.

The rest of the data analysis is described in the results (Section V). We share the replication package of this study in [44].

V. RESULTS

A. RQ1: What are the most used AutoML tools?

1) Motivation: Although AutoML tools help ML experts and non-expert practitioners save effort and time by automating repetitive tasks [2], choosing the right AutoML tool is still a challenging task [8], [10], [15], [45]. The goal of RQ1 is to highlight the most used AutoML tools and provide detailed information about the projects using these tools. This information can help ML practitioners select appropriate AutoML tools for their projects. Besides, the insights gained from the projects using the tools can help AutoML tool developers improve their tools in future releases.

2) Approach: To find the most used AutoML tools, we collected the AutoML function calls in the GitHub projects. Also, we analyzed the characteristics of the GitHub projects that are using AutoML tools.

Collecting function calls of AutoML tools in GitHub projects: To understand the popularity of the AutoML tools, we analyzed the source code files of the GitHub projects that use AutoML tools and collected the calls that invoke the functions of each AutoML tool. Usually, a function is designed to perform a specific kind of task, whereby the name of that function often reflects that task. Therefore, calling such a function within a code invokes the task specified in the function. Invoking more functions of an AutoML tool indicates the tool is more engaged in a project. Thus we use the number of function calls to represent the popularity of an AutoML tool. The following steps are taken in order to extract the function calls from the source code files: Step 1) All Python notebook files are converted to Python files using the nbconvert [46] and nbformat [47] Python libraries. Step 2) The Python 2 source code files are converted to Python 3. Step 3) The syntax errors are resolved by removing the lines of source code files that start with !, @, %, $, install, pip, conda. Step 4) The imported AutoML tools' libraries and their corresponding imported functions are extracted from each source code file
using the abstract syntax tree (AST) Python library [48]. Step 5) The AST Python library is used to extract the function calls related to the imported AutoML tools' libraries and the corresponding imported functions from the source code files. Step 6) The functions that are: a) directly mentioned in the code, b) called on the right side of an assignment, c) called in if, while, for loop, try, with, list, assert, return, tuple, or unary operation statements, or d) super functions of a class whose parent is an AutoML tool, are extracted. After collecting the function calls from all source code files, we counted the total number of calls of the functions of each AutoML tool. Then, we sorted the AutoML tools based on the number of total function calls among all projects in our dataset.
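The sketch below illustrates the core of Steps 4 and 5 for a single file and a single tool. It is a simplification of the study's pipeline (Step 1's notebook conversion and the Step 6 statement contexts are omitted), and the helper name is ours.

```python
import ast

def collect_tool_calls(source: str, tool: str = "optuna") -> list:
    """Collect dotted call names rooted at an imported AutoML module,
    e.g. 'optuna.create_study' (a simplified version of Steps 4-5)."""
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):  # Step 4: find the tool's imported names
        if isinstance(node, ast.Import):
            imported.update(a.asname or a.name for a in node.names
                            if a.name.split(".")[0] == tool)
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] == tool:
                imported.update(a.asname or a.name for a in node.names)
    calls = []
    for node in ast.walk(tree):  # Step 5: collect calls on those names
        if isinstance(node, ast.Call):
            parts, target = [], node.func
            while isinstance(target, ast.Attribute):
                parts.append(target.attr)
                target = target.value
            if isinstance(target, ast.Name) and target.id in imported | {tool}:
                calls.append(".".join([target.id] + parts[::-1]))
    return calls

print(collect_tool_calls("import optuna\nstudy = optuna.create_study()"))
# ['optuna.create_study']
```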

Analyzing the characteristics of GitHub projects using AutoML tools: To understand the characteristics of the projects that use AutoML tools, we analyzed the details of the projects that use AutoML tools, such as the number of contributors, number of forks, size, age, stars, and commits.

3) Results: Figure 3 shows the total number of function calls for each AutoML tool. As seen in Figure 3, the top 10 AutoML tools with the largest numbers of function calls are highlighted. The percentage of function calls that are associated with the top 10 most used tools (i.e., 18% of the 57 selected AutoML tools) is 68%; these top tools include Optuna, HyperOpt, Skopt, Featuretools, Tpot, Bayes_opt, Autokeras, Auto-sklearn, AX, and Snorkel. The details of each of these top 10 most used AutoML tools are provided in Table I. As seen in Table I, the most used AutoML tools are popular based on their numbers of forks and stars. Also, they are 67.1 months old on average, which means that they are not young AutoML tools and it took them years to become popular.

Fig. 3. AutoML tools and the number of function calls.

The 18% (10 out of 57) most used AutoML tools (e.g., Optuna) account for 68% of all function calls.

Furthermore, the profile of the projects that are using AutoML tools is shown in Figure 4. For clear visualization, we removed the outliers within the last 5 percent of the data in each metric and showed the distribution of 95% of the data in Figure 4. As seen in Figure 4, 75% of the repositories that are using AutoML tools are less than five months old and have fewer than 65 commits, two contributors, two forks, 35,753 lines of code, and two stars. On the other hand, the projects that use AutoML tools also include some projects that are very mature (up to 10 years in age), popular (with up to 102,983 stars), with many contributors (with up to 433 contributors), and large in size (with up to 17,453,577 lines of code).

Fig. 4. The summary characteristics of the projects that use AutoML tools in terms of their total Age in months (i.e., the difference between the creation date and the latest commit date), number of commits, number of contributors, number of forks, project's size based on lines of code, and the number of stars in each project.

The most used AutoML tools are popular tools according to their number of stars and forks. Furthermore, most of the projects that use AutoML tools are young and not popular (based on the number of stars).

TABLE I
SUMMARY STATISTICS OF THE TOP 10 AUTOML TOOLS.
Function: number of function calls. S-code: number of unique source files importing the tool. Project: number of unique projects importing the tool. Size: size of the tool. Stars: total number of stars. Age: age of the tool in months (i.e., the difference between the latest commit date and the creation date). Forks: number of forks associated with the tool. Releases: total number of releases associated with the tool. Description: the description of the tool.

| # | AutoML repository (short name) | Function | S-code | Project | Size | Stars | Age | Forks | Releases | Description |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | optuna/optuna (Optuna) | 34,916 | 5,482 | 3,140 | 13,577 | 5,553 | 45 | 602 | 41 | A software framework for automatic hyperparameter optimization [49] |
| 2 | hyperopt/hyperopt (Hyperopt) | 20,903 | 5,271 | 4,432 | 6,059 | 5,955 | 122 | 925 | 0 | A library to optimize hyperparameters [50] |
| 3 | scikit-optimize/scikit-optimize (Skopt) | 20,313 | 2,807 | 1,830 | 9,423 | 2,238 | 68 | 424 | 23 | "Sequential model-based optimization" [51] |
| 4 | Featuretools/featuretools (Featuretools) | 14,786 | 1,053 | 548 | 5,888 | 5,864 | 50 | 769 | 94 | A library to automate feature engineering [52] |
| 5 | EpistasisLab/tpot (Tpot) | 13,725 | 2,893 | 1,796 | 79,030 | 8,349 | 72 | 1,437 | 27 | A library to optimize machine learning pipelines [53] |
| 6 | fmfn/BayesianOptimization (Bayes_opt) | 11,582 | 3,128 | 2,605 | 23,290 | 5,543 | 89 | 1,222 | 7 | "Global optimization with gaussian processes" [54] |
| 7 | keras-team/autokeras (Autokeras) | 10,331 | 1,259 | 550 | 44,487 | 8,241 | 48 | 1,334 | 53 | "An AutoML system based on Keras" [55] |
| 8 | automl/auto-sklearn (Auto-sklearn) | 8,909 | 974 | 543 | 74,993 | 5,870 | 76 | 1,092 | 29 | An AutoML toolkit based on sklearn [56] |
| 9 | facebook/Ax (AX) | 8,442 | 722 | 213 | 283,337 | 1,649 | 33 | 178 | 23 | An adaptive platform for adaptive experiments [57] |
| 10 | snorkel-team/snorkel (Snorkel) | 8,237 | 853 | 264 | 292,468 | 4,922 | 68 | 797 | 14 | A tool for generating training data [6] |

B. RQ2: How do ML practitioners use AutoML tools?

1) Motivation: ML practitioners may use AutoML tools for different purposes across their ML pipeline. We analyzed the stages of the ML pipeline in which AutoML tools are used and the purposes of using AutoML tools to help AutoML tools' developers gain more insights on the usage of their tools and enable them to improve their tools in a more efficient way. Understanding the stages of the ML pipeline in which the AutoML tools are most popular can help developers of the tools find the shortcomings of their tools and help improve their tools efficiently. Besides, our results can provide insights for ML practitioners to better leverage AutoML tools in different stages of their ML pipeline and for different purposes.

2) Approach: To understand the purposes of using AutoML tools and the stages of the ML pipeline where they are being used, we used the function calls extracted in Section V-A3 and labeled them. To reduce the required time and effort for manual labeling, we only focused on the function calls of the top 10 most used AutoML tools (identified in RQ1). Besides, we focused on the function calls that are most commonly
used in different projects. Thus, from the set of all unique function calls of the selected AutoML tools, we compute their total frequency and Shannon entropy value, then choose the functions that have more than 80% of 1) the Shannon entropy value and 2) the total frequency for manual labeling. In the following, we explain 1) the process of selecting functions based on Shannon entropy and usage frequency and 2) the process of manual labeling.

Selecting functions based on Shannon entropy and usage frequency: We used the normalized Shannon entropy [58], [59] to compute the distribution of the function calls across the projects:

H_n(F) = -\sum_{i=1}^{n} p_i \log_n(p_i)    (1)

In Equation (1), F is the function whose Shannon entropy value we want to measure, n is the total number of unique projects, i is the index of a unique project, and p_i is the probability of the use of function F (p_i >= 0, and \sum_{i=1}^{n} p_i = 1) in a given project i. p_i is calculated by dividing the total number of occurrences of function F in project i by the total number of occurrences of function F in all the projects. If all the unique projects have the same probability of using function F (i.e., p_i = 1/n, \forall i \in \{1, 2, .., n\}), the Shannon entropy of F is maximal (i.e., H_n(F) = 1). Thus, a function F with higher Shannon entropy is more likely to be used by more ML practitioners compared to functions with lower Shannon entropy.
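For instance, the following is a small sketch of Equation (1); the helper function and the example counts are ours.

```python
import math

def normalized_entropy(counts):
    """Normalized Shannon entropy (Equation 1): counts[i] is the number of
    occurrences of a function F in project i, across n unique projects."""
    n, total = len(counts), sum(counts)
    probs = [c / total for c in counts if c > 0]  # p_i for projects using F
    return -sum(p * math.log(p, n) for p in probs)

print(normalized_entropy([5, 5, 5, 5]))   # 1.0  (used evenly -> maximal)
print(normalized_entropy([20, 0, 0, 0]))  # -0.0 (used in one project only)
```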
The functions with H_n(F) > 0.8 and with their respective counts of function calls above the 80% percentage (sorted in descending order) were chosen for manual labeling. A total of 619 functions from the top 10 AutoML tools were selected in this step.

Manual labeling: Here we try to understand why AutoML tools are used by ML practitioners and what they are used for. To this end, we manually labeled the functions selected in the previous step and formed categories and sub-categories of purposes of using AutoML tools. We also assigned a separate label representing the stage of the ML pipeline for which the respective functions are called, following the stages presented in Figure 1. The first two authors were involved in the initial labeling and construction of the taxonomy. Both individuals are graduate students with strong knowledge in machine learning and software engineering. The primary reference during the labeling is the official documentation of the functions from the tool websites and the publications associated with the AutoML tools. For example, the functions 'autokeras.TextClassifier.evaluate' [60] and 'autokeras.ImageRegressor.evaluate' [61] of AutoKeras are used to evaluate the best model, and their purposes were labeled as 'Model evaluation'. The functions related to data formatting, generating statistics of the input data, or cleaning are assigned the stage of 'Data cleaning'. During the first iteration of labeling, the first author labeled the functions of Optuna, Skopt, Featuretools, and Ax, and the second author labeled the functions of the remaining six tools. In the second iteration, all the initial labels were combined, and the first two authors went through each label together and subsequently further discussed the labels that were not agreed upon until a consensus was achieved. A random sample of 238 functions was selected with a confidence level of 95% and a confidence interval of 5% for further validation by the other two authors (not involved in the initial labeling). They agreed on 98% of the stages of the functions and 96% of the purposes of the functions.

3) Results: Table II details the stages of the ML pipeline (i.e., the table columns) in which ML practitioners use AutoML tools. Each cell value in Table II represents the percentage of the total number of calls of the functions from the respective AutoML tool corresponding to the stages of the ML pipeline. As shown in Table II, the AutoML tools are mostly used for hyperparameter optimization (accounting for 40.9% of the function calls). One of the possible reasons is that hyperparameter optimization involves multiple repetitive tasks and requires a lot of effort and time [2]; similarly, there might be numerous different sub-tasks involved compared to other stages of the ML pipeline.

AutoML tools are mainly used (40.9%) during the Hyperparameter Optimization stage of the ML pipeline, followed by Model training (11.5%), Model evaluation (11.2%), and Feature Engineering (10.1%). AutoML tools mostly focus on providing functions for training ML models effectively, selecting a model, and evaluating the performance of the resulting model.
Both individuals are graduate students with strong knowledge
TABLE II
THE PERCENTAGE USAGE OF AUTOML FUNCTIONS ACROSS THE ML WORKFLOW.

| AutoML | Data collection | Data cleaning | Data labeling | Feature engineering | Hyperparameter optimization | Model training | Model evaluation | Model deployment | Model monitoring | Others |
|---|---|---|---|---|---|---|---|---|---|---|
| Optuna | 1.4 | 2.9 | 0 | 0 | 72.9 | 2.1 | 0.7 | 0.7 | 12.1 | 6.4 |
| Hyperopt | 11.5 | 1.9 | 0 | 5.8 | 75 | 0 | 0 | 0 | 3.8 | 1.9 |
| Skopt | 0 | 0 | 0 | 0 | 90.6 | 3.1 | 3.1 | 0 | 0 | 3.1 |
| Featuretools | 12.8 | 7.7 | 0 | 69.2 | 0 | 0 | 0 | 0 | 2.6 | 7.7 |
| Tpot | 0 | 11.6 | 0 | 9.3 | 9.3 | 23.3 | 18.6 | 18.6 | 4.7 | 4.7 |
| Bayes_opt | 0 | 0 | 0 | 0 | 78.9 | 0 | 0 | 0 | 5.3 | 15.8 |
| AutoKeras | 3.6 | 3.6 | 0 | 8.3 | 8.3 | 40.5 | 25 | 7.1 | 0 | 3.6 |
| Auto-sklearn | 2.7 | 5.4 | 1.4 | 5.4 | 4.1 | 25.7 | 44.6 | 1.4 | 4.1 | 5.4 |
| Facebook-Ax | 0 | 4.2 | 0 | 0 | 68.8 | 4.2 | 4.2 | 2.1 | 4.2 | 12.5 |
| Snorkel | 9.5 | 21.6 | 27 | 2.7 | 1.4 | 16.2 | 16.2 | 1.4 | 0 | 4.1 |
| AVERAGE | 4.2 | 5.9 | 2.8 | 10.1 | 40.9 | 11.5 | 11.2 | 3.1 | 3.7 | 6.5 |
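Notably, Table II shows Snorkel as the only tool with substantial usage in the Data labeling stage (27%). A hedged sketch of its labeling-function API follows; the example data and heuristic are hypothetical.

```python
import pandas as pd
from snorkel.labeling import PandasLFApplier, labeling_function

ABSTAIN, SPAM, HAM = -1, 1, 0

@labeling_function()
def lf_contains_link(x):
    # A hypothetical heuristic: messages containing links are labeled as spam.
    return SPAM if "http" in x.text.lower() else ABSTAIN

df = pd.DataFrame({"text": ["check http://spam.example", "hello friend"]})
applier = PandasLFApplier(lfs=[lf_contains_link])
print(applier.apply(df=df))  # [[1], [-1]]
```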
Figure 5 presents the high-level (grey background) and low-level (white background) categories of the purposes for which ML practitioners use AutoML tools. As shown in Figure 5, the high-level categories of purposes are, in order: Hyperparameter Optimization (30.6%), Model management (26.5%), Data management (15%), Visualization (7.4%), Logging (3.4%), Utility functions (3.2%), Feature Engineering (3.1%), Storage management (2.1%), Pipeline management (2%), and User Interface (0.5%). The following describes some of the categories and sub-categories of purposes.
• Hyperparameter optimization (HPO): This category is about using AutoML to determine the right combination of hyperparameters that maximizes the model performance, usually through running multiple experiments in a single training process. Automating HPO is indeed crucial, particularly for Deep Neural Networks (DNN) [7], [62], which depend on a wide range of hyperparameter choices about function tuning or regularization, parameter sampling, the network's architecture, and optimization. In Figure 5, we report 16 different activities involved in the HPO process, from executing hyperparameter optimization and parameter sampling to HPO documentation (a minimal sketch of these HPO functions appears after this list).

• Model management: This category is about using AutoML to handle the different tasks related to training ML algorithms with some input data and the resulting model that can generate predictions using patterns in the input data. In Figure 5, we highlight the different sub-tasks where ML practitioners use AutoML in ML model management, including model evaluation and error analysis, model training/testing, model definition, model monitoring, and optimizing model training performance.

• Data management: In this category, AutoML is used to handle the tasks related to the data used for building ML models. This process includes a set of steps involving data, including data generation and processing steps, managing data flow to ML models, working with multiple datasets, and other tasks throughout the data pipeline. In particular, we summarize 12 different sub-tasks related to data management in Figure 5, including data transformation, data analysis, data loading, data labeling, and data export.

• Utility functions: These are functions provided by specific AutoML tools to perform some common functionality and are often reusable in the ML pipeline. Utility functions are particularly important to reduce pipeline complexity and increase the readability of the source code. Examples of the tasks in this category include session management (e.g., during model training), handling of input/output operations of ML models, showing training or optimization progress, and handling of thread execution.

• Feature engineering: This category is about using AutoML tools to automate the extraction of features from the given data, usually using domain knowledge of the data and the model being built. The resulting features are used as input to some algorithms to train the ML models. We report in Figure 5 two main sub-categories of Feature engineering, including feature analysis and feature generation/extraction.

• Pipeline management: This category is about using AutoML to automate the end-to-end management of an ML model or a set of multiple models, including the orchestration of data that flows into or out of the model.

• Visualization: This category is about the graphical representation of data and information, such as visualizing feature importance using a chart for interpretability, or the visual representation of the data distribution during data analysis.

• Logging: This category involves using AutoML tools to track and store records related to execution events, data inputs, processes, data outputs, resulting model performance, and other related tasks when developing and deploying ML models. ML practitioners use the resulting information to identify any suspicious activities in ML software projects' privileged operations and assess the impact of state transformations on performance.

• Storage management: In this category, AutoML tools are used for managing data storage resources, aiming at improving and maximizing the efficiency of data storage resources, often in terms of performance, availability, recoverability, and capacity.

• User interface: This category is about the elements of AutoML tools that allow ML practitioners to easily interact with the AutoML features, using concepts such as visual and interactive design and information architecture (e.g., providing a web-based interactive interface for displaying live code).

We used the label Others for the function calls that do not fall into any of the categories.
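As referenced in the HPO category above, the following is a minimal sketch of the HPO-category functions that dominate these counts (optuna.create_study and optuna.create_study.optimize also appear in Table IV); the toy objective is ours.

```python
import optuna

def objective(trial):
    x = trial.suggest_float("x", -10, 10)  # "Sample parameters" sub-category
    return (x - 2) ** 2                    # "Evaluate objective function"

study = optuna.create_study(direction="minimize")  # "Execute optimization"
study.optimize(objective, n_trials=50)
print(study.best_params)                           # "Report/Get best parameters"
```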
We derived 10 high-level categories of the purposes of using AutoML tools, including Hyperparameter optimization (30.6%), Model management (26.5%), Data management (15%), Visualization (7.4%), and Logging (3.4%). ML practitioners can save time and effort by using AutoML tools to automate their ML pipeline tasks [2] for similar purposes.

Fig. 5. The taxonomy of the purposes of using AutoML tools. The high-level categories are highlighted in grey, while the sub-categories are in white boxes.

In Figure 6 we provide the breakdown of the major categories of purposes (x-axis) of using AutoML provided in the top 10 AutoML tools (y-axis). The circle size represents the composition of using the respective AutoML tool for that purpose, computed as the percentage of total function calls
defining that purpose to the total number of function calls from the individual AutoML tool.

Fig. 6. The percentage composition of the high-level categories for purposes of using AutoML across the top 10 most used tools.

We can clearly see in Figure 6 that the functions for Hyperparameter optimization, Model management, and Data management are used the most for most of the top 10 AutoML tools. Specifically, Hyperparameter optimization and Model management dominate at least 40% of the AutoML tools. In contrast, some of the purposes of using AutoML tools are specific to certain tools. For example, we can observe that the function calls for Feature engineering (45%) and Pipeline management (22.7%) tasks are dominantly used from Featuretools and Tpot, respectively, while a minimal percentage (1.3%) of function calls related to User Interface comes from Snorkel and Optuna.
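To illustrate the Featuretools usage that dominates the Feature engineering category, here is a hedged sketch assuming the Featuretools 1.x API; the toy data are ours.

```python
import featuretools as ft
import pandas as pd

orders = pd.DataFrame({"order_id": [1, 2, 3],
                       "customer_id": [1, 1, 2],
                       "amount": [10.0, 25.0, 40.0]})
es = ft.EntitySet(id="shop")
es = es.add_dataframe(dataframe_name="orders", dataframe=orders,
                      index="order_id")
es = es.normalize_dataframe(base_dataframe_name="orders",
                            new_dataframe_name="customers",
                            index="customer_id")
# Deep feature synthesis generates per-customer features, e.g. SUM(orders.amount).
features, feature_defs = ft.dfs(entityset=es, target_dataframe_name="customers")
print(features.head())
```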
C. RQ3: Are different AutoML tools used together?

1) Motivation: From Table II, we observe that at least two of the AutoML tools can be used to automate the functionality in each stage of the ML pipeline. This question aims to understand whether different AutoML tools are used together in the same source code file. For instance, ML practitioners may use AutoSklearn for model training and tune the hyperparameters using the functionality provided by Optuna within the same code. Answering this question could help ML practitioners choose the AutoML tools used together for different purposes. Similarly, the information can help the developers of the AutoML tools improve the integration procedure of the AutoML tools used together, such as providing an API to facilitate the usage of other AutoML tools while getting input or sending output.

2) Approach: We compared the composition of the source code files (of the projects using AutoML tools) where different AutoML tools are used together as follows. For the set of source code files \{Ts_1, Ts_2, Ts_3, \ldots, Ts_n\} importing and using the functions of a given AutoML tool, we computed the percentage composition of the shared source files as:

\frac{Ts_{i \cdot j} \times 100}{Ts_{i_N}}    (2)
where Ts_{i \cdot j} is the total number of source files using the functions of two AutoML tools T_i and T_j in the same file, and Ts_{i_N} is the total number of files using at least one function from the AutoML tool T_i. We define the result of the equation as the correlation between two AutoML tools.
% No.of code files 98.77 0.90 0.32 0.004 0 0
Optuna provide rich functions for optimizing hyperparameters
TABLE IV
E XAMPLES OF THE AUTO ML FUNCTIONS USED TOGETHER . according to Figure 6; however, optuna also offers other
features that are useful during hyperparameter optimization,
Tools combination Example function
hyperopt, tpot
hyperopt.fmin’, ’tpot.TPOTClassifier.fit’, such as Visualization and Logging. This implies that ML
’tpot.TPOTClassifier.score’
’hyperopt.fmin’, ’hyperopt.hp.choice’, practitioners using HyperOpt with Optuna together have more
hyperopt, optuna ’hyperopt.hp.uniform’, ’optuna.create_study’,
’optuna.create_study.optimize’ options for HPO while also utilizing the visualization and
’bayes_opt.BayesianOptimization’,
bayes opt, skopt ’bayes_opt.BayesianOptimization. maximize’, logging functions from Optuna. Similarly, Tpot provides few
’skopt.BayesSearchCV’, ’skopt.BayesSearchCV.fit’
’hyperopt.Trials’, ’hyperopt.fmin’, functions for optimizing hyperparameters but offers more
’optuna.create_study’, ’optuna.create_study.optimize’,
tpot, hyperopt, optuna
’tpot.TPOTClassifier.fit’, ’tpot.TPOTClassifier.score’ functions such as pipeline management and model manage-
ment; the ML practitioners can simultaneously automate these
where T si·j is the total number of source files using the tasks when using Tpot with Hyperopt.
functions of two AutoML tools Ti and Tj in the same file, and
T siN is the total number of files using at least one function Less than 8% of projects use two or more different
from the AutoML tool Ti . We define the results of the equation AutoML tools. Different AutoML tools are rarely used
as the correlation between two AutoML tools. in the same source code files. Among the few AutoML
In addition, we also calculated the distribution of the tools used together, Hyperopt is used with other tools,
projects that use the different number of AutoML tools to such as Tpot, Skopt, and Optuna. ML practitioners
understand how these AutoML tools are used together in the can consider using the combination of functions from
same projects. different AutoML tools together to help automate
3) Results: Table III shows the distribution of the projects multiple tasks across the ML pipeline. Similarly, the
and source codes that use a different number of AutoML tools. AutoML developers can propose an efficient API to
We focus on the top 10 AutoML tools and the projects that use integrate the AutoML tools that are used together.
these tools. We observe that the vast majority of the projects
use only one AutoML tool. There are only fewer than 8% of
the projects that use two or more different AutoML tools in VI. D ISCUSSION AND I MPLICATION
the same project.
Figure 7 shows the correlation matrix of the percentage This section further discusses the implications of our find-
composition of shared source files using different top AutoML ings for the research community, the ML practitioners, and
tools together, computed using Equation (2). We only reported tool developers.
the correlation of the top 10 most used AutoML tools for clear • Researchers: Our study provides a general view of the usage
visualization. of AutoML tools by ML practitioners across the ML pipeline
and highlights the most used AutoML tools. Our initial results
indicate that most of the popular AutoML tools are used in
snorkel 0 0.01 0 0 0 0 0 0 0 1
1 relatively immature and less popular projects, with 75% of the
ax 0 0.01 0.01 0 0 0 0 0 1 0 projects being developed by fewer than two contributors and
autosklearn 0.03 0 0 0 0.16 0.01 0 1 0 0
0.8
being younger than five months old. Also, the result shows
autokeras 0 0 0 0 0.07 0 1 0 0 0 that the top 10 tools (18% of the tools) account for 68% of
Top 10 AutoML

0.6
bayes_opt 0.14 0.18 0.2 0.03 0.01 1 0 0.01 0 0 the usages. Future work can explore the factors that impact
tpot 0.42 0.69 0.1 0.09 1 0.01 0.05 0.1 0 0
0.4
the adoption of the AutoML tools.
featuretools 0.07 0.02 0 1 0.09 0.02 0 0 0 0
Also, regarding the usage of the AutoML tools in the studied
skopt 0.08 0.26 1 0 0.07 0.11 0 0 0 0
0.2 projects, our results indicate that most of the functions used
hyperopt 0.49 1 0.25 0.01 0.45 0.1 0 0 0 0
by the ML practitioners are model-oriented, specifically in
optuna 1 0.29 0.05 0.03 0.16 0.05 0 0.01 0 0

optuna hyperopt skopt featuretools tpot bayes_opt autokeras autosklearn ax snorkel


0
the Hyperparameter optimization (40.9%) and model training
Top 10 AutoML (11.5%) stages of the ML pipeline. These results indicate that
the current AutoML tools focus more on providing effective
model training and the resulting models and pay little attention
Fig. 7. The composition of project’s source code files using different AutoML
tools.
to the data prepossessing process. Indeed data prepossessing
From Figure 7 we observe that, across the top 10 AutoML has been the least focus of AutoML tools, in line with the
tools, only a few of the source files (< 2%) uses the functions previous works [2], [14], [63], yet an ML model is as good
from more than one AutoML tool. This result implies that most as the quality of the data [64]. Researchers can investigate the
of these tools are used independently. Among the most used challenges involved in automating the data-oriented tasks of
AutoML tools, the few AutoML tools used together are Tpot the ML pipeline for a better quality of the data.
• AutoML Tool Developers: The developers of the AutoML AutoML tools, which may affect the results of this study.
tools, on the one hand, can use our results to improve their However, a function is usually designed to perform a specific
tools to support the most used purposes and stages of the task, as reflected by the name of that function. Therefore,
ML pipeline while keeping in mind the target users of their we believe that this threat is minimal. In the future, we will
tools. They can also add more features to support the stages analyze the relation cardinality between functions of different
not currently supported such as data labeling. On the other AutoML tools to understand how the studied tools handle the
hand, the developers of the least used AutoML tools can same situation with the same or different number of function
improve their tools by implementing or reusing the similar calls.
functionalities provided by the most popular AutoML tools. External validity threats: The number of selected AutoML
We also observe that, although not frequent, some AutoML tools for manual labeling affects the generalizability of the
tools are used together in the same source files. For example, results. To minimize this effect, we selected the 10 most used
Hyperopt is one of the tools that is used with other tools, such AutoML tools, which account for 68% of the usage in the
as Tpot, Skopt, and Optuna. The developers of these tools can GitHub projects. We assumed that our findings derived from
propose an efficient API for integrating these tools. Also, Au- these 10 AutoML tools can represent how ML practitioners
toML tools’ developers can propose better tooling/frameworks use AutoML tools and their main purposes for using them.
integrated into their AutoML tools for improving the quality
Construct validity threats: To extract the function calls of
of the data. Moreover, we observed that the functionalities
AutoML tools in GitHub projects, we followed the official
related to documentation and user interface (UI) are among
documentation of AST library. However, it’s still possible that
the least used, indicating that the current AutoML tools may
we may have missed some function calls that are rarely used
suffer from the level of adaption (and possible customization)
and hard to recognize. We provide all our scripts and data in
by the ML practitioners of diverse skills and expertise. Xin et
our replication package [44], to allow replicating our results.
al. [8] in their study found that customization is one of the
lowest-rated quality attributes of AutoML tools, as reported
by practitioners. ML practitioners expressed the need for more VIII. C ONCLUSION AND FUTURE WORK
customization. They claim that this would improve their trust This paper presents an empirical study on the usage of
and transparency in using the AutoML tools [8]. Therefore, AutoML tools in open-source projects hosted on GitHub. As
AutoML tools’ developers should consider providing more the initial step, we examined the most used AutoML tools
visualization functionalities and improving the UI, to increase and the details of the projects using the tools. We reported
the usability of their tool. the top 10 most used AutoML tools and showed that most
• ML Practitioners: We encourage the ML practitioners to of the projects using the tools are young and less popular.
use our findings to learn about the popular AutoML tools and Then, we manually investigated the function calls of ML
the different purposes for which they can be used, in order to tools to understand the purposes of using AutoML tools and
automate similar tasks in their ML pipeline; saving both time their used stages in the ML pipeline. We showed that some
and effort [2]. Also, they can choose the combination of the of the purposes (such as User Interface, Logging, Feature
tools that are mostly used together including Hyperopt with Engineering, and Pipeline management) are only specific to a
other tools, such as Tpot, Skopt, or Optuna. For instance, when few AutoML tools (e.g., Optuna, Tpot and Featuretool) com-
the ML practitioner wants to optimize hyperparameters and pared to other purposes (e.g., Hyperparameter Optimization,
manage the entire ML pipeline, Hyperopt (for hyperparameter Model management, Data management) which span across
optimization) with Tpot (for pipeline management) can be a multiple AutoML tools. We also showed that most AutoML
good choice according to Figures 6 and 7. tools focused on model training and model selection but had
few functions to automate the data-oriented stages of the
VII. THREATS TO VALIDITY

We now discuss threats to the validity of our study.

Internal validity threats: We manually studied a sample of AutoML function calls to understand ML practitioners' purposes for using AutoML tools. However, our labeling results may not truly reflect the actual purposes of the developers; future work can perform developer surveys to confirm our results. Besides, the process of manual labeling affects the results of this study. To reduce the impact of biased labels, two authors labeled the functions, and the other two authors reviewed the labels separately to validate them. Another possible threat is related to our use of function calls to determine the popularity of the AutoML tools: since different numbers of calls may be combined to achieve the same goal across different AutoML tools, the number of function calls may not precisely reflect a tool's popularity.

External validity threats: We studied the top 10 most used AutoML tools, which account for 68% of the usage in the GitHub projects. We assumed that our findings derived from these 10 AutoML tools can represent how ML practitioners use AutoML tools and their main purposes for using them.

Construct validity threats: To extract the function calls of AutoML tools in GitHub projects, we followed the official documentation of Python's AST library [48]. However, it is still possible that we missed some function calls that are rarely used and hard to recognize. We provide all our scripts and data in our replication package [44] to allow others to replicate our results.
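For concreteness, the snippet below is a minimal sketch of this kind of AST-based call extraction, built on Python's standard ast module; it illustrates the approach rather than reproducing our exact analysis script.

import ast

def extract_calls(source):
    """Return the dotted names of all function calls found in `source`."""

    def dotted(node):
        # Rebuild a dotted name such as "optuna.create_study" from the AST.
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Attribute):
            base = dotted(node.value)
            return f"{base}.{node.attr}" if base else None
        return None  # e.g., calls on subscripts or on call results are skipped

    calls = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted(node.func)
            if name:
                calls.append(name)
    return calls

snippet = (
    "import optuna\n"
    "study = optuna.create_study()\n"
    "study.optimize(objective, n_trials=100)\n"
)
print(extract_calls(snippet))  # ['optuna.create_study', 'study.optimize']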
VIII. CONCLUSION AND FUTURE WORK

This paper presents an empirical study on the usage of AutoML tools in open-source projects hosted on GitHub. As the initial step, we examined the most used AutoML tools and the details of the projects using the tools. We reported the top 10 most used AutoML tools and showed that most of the projects using the tools are young and less popular. Then, we manually investigated the function calls of the ML tools to understand the purposes of using AutoML tools and the stages of the ML pipeline in which they are used. We showed that some of the purposes (such as User Interface, Logging, Feature Engineering, and Pipeline management) are specific to only a few AutoML tools (e.g., Optuna, Tpot, and Featuretools), compared to other purposes (e.g., Hyperparameter Optimization, Model management, and Data management) which span multiple AutoML tools. We also showed that most AutoML tools focus on model training and model selection but have few functions to automate the data-oriented stages of the ML pipeline. Moreover, we examined whether different AutoML tools are used together in the same source code and observed that only a low percentage of the files use different AutoML tools together. Our future work will investigate the features that are provided by the AutoML tools but not used by ML practitioners. In addition, we will quantitatively study the factors that impact the successful adoption of AutoML tools in real-world ML projects.

ACKNOWLEDGEMENT

This work is funded by the Fonds de Recherche du Québec (FRQ), the Natural Sciences and Engineering Research Council of Canada (NSERC), and the Canadian Institute for Advanced Research (CIFAR).

REFERENCES

[1] R. Elshawi, M. Maher, and S. Sakr, “Automated machine learning: State-of-the-art and open challenges,” arXiv preprint arXiv:1906.02287, 2019.
[2] A. Truong, A. Walters, J. Goodsitt, K. Hines, C. B. Bruss, and R. Farivar, “Towards automated machine learning: Evaluation and comparison of automl approaches and tools,” in 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI). IEEE, 2019, pp. 1471–1479.
[3] M. Feurer, A. Klein, K. Eggensperger, J. Springenberg, M. Blum, and F. Hutter, “Efficient and robust automated machine learning,” in Advances in Neural Information Processing Systems 28 (2015), 2015, pp. 2962–2970.
[4] M. Feurer, K. Eggensperger, S. Falkner, M. Lindauer, and F. Hutter, “Auto-sklearn 2.0: Hands-free automl via meta-learning,” 2020.
[5] H. Jin, Q. Song, and X. Hu, “Auto-keras: An efficient neural architecture search system,” in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 2019, pp. 1946–1956.
[6] (2021) snorkel. [Online]. Available: https://github.com/snorkel-team/snorkel
[7] T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama, “Optuna: A next-generation hyperparameter optimization framework,” in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019, pp. 2623–2631.
[8] D. Xin, E. Y. Wu, D. J.-L. Lee, N. Salehi, and A. Parameswaran, “Whither automl? understanding the role of automation in machine learning workflows,” in Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, 2021, pp. 1–16.
[9] D. Wang, Q. V. Liao, Y. Zhang, U. Khurana, H. Samulowitz, S. Park, M. Muller, and L. Amini, “How much automation does a data scientist want?” arXiv preprint arXiv:2101.03970, 2021.
[10] J. Drozdal, J. Weisz, D. Wang, G. Dass, B. Yao, C. Zhao, M. Muller, L. Ju, and H. Su, “Trust in automl: exploring information needs for establishing trust in automated machine learning systems,” in Proceedings of the 25th International Conference on Intelligent User Interfaces, 2020, pp. 297–307.
[11] S. Amershi, A. Begel, C. Bird, R. DeLine, H. Gall, E. Kamar, N. Nagappan, B. Nushi, and T. Zimmermann, “Software engineering for machine learning: A case study,” in 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). IEEE, 2019, pp. 291–300.
[12] Google Cloud, “Mlops: Continuous delivery and automation pipelines in machine learning,” https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning, 2021.
[13] IBM, “The machine learning development and operations,” https://ibm-cloud-architecture.github.io/refarch-data-ai-analytics/methodology/MLops/, 2020.
[14] Q. Yao, M. Wang, Y. Chen, W. Dai, Y.-F. Li, W.-W. Tu, Q. Yang, and Y. Yu, “Taking human out of learning applications: A survey on automated machine learning,” arXiv preprint arXiv:1810.13306, 2018.
[15] D. Wang, J. D. Weisz, M. Muller, P. Ram, W. Geyer, C. Dugan, Y. Tausczik, H. Samulowitz, and A. Gray, “Human-ai collaboration in data science: Exploring data scientists’ perceptions of automated ai,” Proceedings of the ACM on Human-Computer Interaction, vol. 3, no. CSCW, pp. 1–24, 2019.
[16] M.-A. Zöller and M. F. Huber, “Benchmark and survey of automated machine learning frameworks,” Journal of Artificial Intelligence Research, vol. 70, pp. 409–472, 2021.
[17] A. Crisan and B. Fiore-Gartland, “Fits and starts: Enterprise use of automl and the role of humans in the loop,” in Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, 2021, pp. 1–15.
[18] M. Feurer, K. Eggensperger, S. Falkner, M. Lindauer, and F. Hutter, “Auto-sklearn 2.0: The next generation,” arXiv preprint arXiv:2007.04074, vol. 24, 2020.
[19] M. Feurer, A. Klein, K. Eggensperger, J. Springenberg, M. Blum, and F. Hutter, “Efficient and robust automated machine learning,” Advances in Neural Information Processing Systems, vol. 28, 2015.
[20] R. S. Olson and J. H. Moore, “Tpot: A tree-based pipeline optimization tool for automating machine learning,” in Workshop on Automatic Machine Learning. PMLR, 2016, pp. 66–74.
[21] R. S. Olson, N. Bartley, R. J. Urbanowicz, and J. H. Moore, “Evaluation of a tree-based pipeline optimization tool for automating data science,” in Proceedings of the Genetic and Evolutionary Computation Conference 2016, 2016, pp. 485–492.
[22] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg et al., “Scikit-learn: Machine learning in python,” The Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
[23] D. J.-L. Lee and S. Macke, “A human-in-the-loop perspective on automl: Milestones and the road ahead,” IEEE Data Engineering Bulletin, 2020.
[24] J. M. Kanter and K. Veeramachaneni, “Deep feature synthesis: Towards automating data science endeavors,” in 2015 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015, Paris, France, October 19-21, 2015. IEEE, 2015, pp. 1–10.
[25] T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama, “Optuna: A next-generation hyperparameter optimization framework,” in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019, pp. 2623–2631.
[26] M. Fernández-Delgado, E. Cernadas, S. Barro, and D. Amorim, “Do we need hundreds of classifiers to solve real world classification problems?” The Journal of Machine Learning Research, vol. 15, no. 1, pp. 3133–3181, 2014.
[27] Y. Li, Y. Shen, W. Zhang, J. Jiang, B. Ding, Y. Li, J. Zhou, Z. Yang, W. Wu, C. Zhang et al., “Volcanoml: speeding up end-to-end automl via scalable search space decomposition,” arXiv preprint arXiv:2107.08861, 2021.
[28] A. Crisan and B. Fiore-Gartland, “Fits and starts: Enterprise use of automl and the role of humans in the loop,” in Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, 2021, pp. 1–15.
[29] Q. Wang, Y. Ming, Z. Jin, Q. Shen, D. Liu, M. J. Smith, K. Veeramachaneni, and H. Qu, “Atmseer: Increasing transparency and controllability in automated machine learning,” in Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 2019, pp. 1–12.
[30] I. Xanthopoulos, I. Tsamardinos, V. Christophides, E. Simon, and A. Salinger, “Putting the human back in the automl loop,” in EDBT/ICDT Workshops, 2020.
[31] L. Ferreira, A. Pilastri, C. M. Martins, P. M. Pires, and P. Cortez, “A comparison of automl tools for machine learning, deep learning and xgboost,” in 2021 International Joint Conference on Neural Networks (IJCNN). IEEE, 2021, pp. 1–8.
[32] N. V. Ivankova, J. W. Creswell, and S. L. Stick, “Using mixed-methods sequential explanatory design: From theory to practice,” Field Methods, vol. 18, no. 1, pp. 3–20, 2006.
[33] (2022) Google scholar. [Online]. Available: https://scholar.google.com/
[34] (2022) Engineering village. [Online]. Available: http://www.engineeringvillage.com/
[35] (2021) The top 426 automl open source projects on github. [Online]. Available: https://awesomeopensource.com/projects/automl
[36] (2021) Github rest api. [Online]. Available: https://developer.github.com/v3/
[37] V. Christina, “What is the best programming language for machine learning?” https://towardsdatascience.com/what-is-the-best-programming-language-for-machine-learning-a745c156d6b7, 2017.
[38] S. Gupta. (October 6, 2021) What is the best language for machine learning? [Online]. Available: https://www.springboard.com/blog/data-science/best-language-for-machine-learning/
[39] N. Munaiah, S. Kroh, C. Cabrey, and M. Nagappan, “Curating github for engineered software projects,” Empirical Software Engineering, vol. 22, no. 6, pp. 3219–3253, 2017.
[40] J. Businge, M. Openja, S. Nadi, E. Bainomugisha, and T. Berger, “Clone-based variability management in the android ecosystem,” in 2018 IEEE International Conference on Software Maintenance and Evolution (ICSME), 2018, pp. 625–634.
[41] J. Businge, M. Openja, D. Kavaler, E. Bainomugisha, F. Khomh, and V. Filkov, “Studying android app popularity by cross-linking github and google play store,” in 2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER), 2019, pp. 287–297.
[42] J. Businge, M. Openja, S. Nadi, and T. Berger, “Reuse and maintenance practices among divergent forks in three software ecosystems,” Empirical Software Engineering, vol. 27, no. 2, pp. 1–47, 2022.
[43] M. Openja, F. Majidi, F. Khomh, B. Chembakottu, and H. Li, “Studying the practices of deploying machine learning projects on docker,” in The International Conference on Evaluation and Assessment in Software Engineering 2022, ser. EASE 2022. New York, NY, USA: Association for Computing Machinery, 2022, pp. 190–200. [Online]. Available: https://doi.org/10.1145/3530019.3530039
[44] Replication package: “An empirical study on the usage of automated machine learning tools,” https://github.com/Empirical-Study-on-AutoML/AutoML-paper, 2022.
[45] S. Passi and S. J. Jackson, “Trust in data science: Collaboration, translation, and accountability in corporate data science projects,” Proceedings of the ACM on Human-Computer Interaction, vol. 2, no. CSCW, pp. 1–28, 2018.
[46] “nbconvert,” 2022. [Online]. Available: https://nbconvert.readthedocs.io/en/latest/
[47] “nbformat,” 2022. [Online]. Available: https://nbformat.readthedocs.io/en/latest/
[48] “Abstract syntax trees,” 2022. [Online]. Available: https://docs.python.org/3/library/ast.html
[49] (2022) Optuna. [Online]. Available: https://github.com/optuna/optuna
[50] (2021) Hyperopt. [Online]. Available: https://github.com/hyperopt/hyperopt
[51] (2021) Scikit-optimize. [Online]. Available: https://github.com/scikit-optimize/scikit-optimize
[52] (2022) Featuretools. [Online]. Available: https://github.com/Featuretools/featuretools
[53] (2021) tpot. [Online]. Available: https://github.com/EpistasisLab/tpot
[54] (2020) Bayesianoptimization. [Online]. Available: https://github.com/fmfn/BayesianOptimization
[55] (2022) autokeras. [Online]. Available: https://github.com/keras-team/autokeras
[56] (2022) auto-sklearn. [Online]. Available: https://github.com/automl/auto-sklearn
[57] (2022) ax. [Online]. Available: https://github.com/facebook/Ax
[58] C. E. Shannon, “A mathematical theory of communication,” ACM SIGMOBILE Mobile Computing and Communications Review, vol. 5, no. 1, pp. 3–55, 2001.
[59] F. Khomh, B. Chan, Y. Zou, and A. E. Hassan, “An entropy evaluation approach for triaging field crashes: A case study of mozilla firefox,” in 2011 18th Working Conference on Reverse Engineering. IEEE, 2011, pp. 261–270.
[60] (2022) Autokeras text classifier. [Online]. Available: https://autokeras.com/text_classifier/
[61] (2022) Autokeras image classifier. [Online]. Available: https://autokeras.com/image_classifier/
[62] M. Feurer and F. Hutter, “Hyperparameter optimization,” in Automated Machine Learning. Springer, Cham, 2019, pp. 3–33.
[63] Anaconda Inc. (2022) The state of data science 2020: Moving from hype toward maturity. [Online]. Available: https://www.anaconda.com/state-of-data-science-2020?utm_medium=press&utm_source=anaconda&utm_campaign=sods-2020&utm_content=report
[64] E. Breck, N. Polyzotis, S. Roy, S. Whang, and M. Zinkevich, “Data validation for machine learning,” in MLSys, 2019.