
Sustainable Computing: Informatics and Systems 38 (2023) 100864


How sustainable is ‘‘common’’ data science in terms of power consumption?


Bjorge Meulemeester a,b,∗, David Martens a

a Prinsstraat 13, 2000 Antwerpen, Belgium
b Ludwig-Eerhard-Allee 2, D-53175, Bonn, Germany

ARTICLE INFO

Keywords: Sustainability, Carbon emission, AI, Data science, Energy consumption, Carbon footprint, Common data science

ABSTRACT

Continuous developments in data science have brought forth an exponential increase in the complexity of machine learning models. Additionally, data scientists have become ubiquitous in the private market and academic environments. All of these trends are on a steady rise, and are associated with an increase in power consumption and associated carbon footprint. The increasing carbon footprint of large-scale advanced data science has already received attention, but the latter trend has not. This work aims to estimate the contribution of the increasingly popular ''common'' data science to the global carbon footprint. To this end, the power consumption of several typical tasks in common data science is measured and compared to: large-scale ''advanced'' data science, common computer-related tasks, and everyday non-computer related tasks. An automated data science project is also run on various hardware architectures. To assess its sustainability in terms of carbon emission, the measurements are converted to gCO2eq and an equivalent unit of ''km driven by car''. Our main findings are: ''common'' data science consumes 2.57 times more power than regular computer usage, but less than some common everyday power-consuming tasks such as lighting or heating; advanced data science consumes substantially more power than common data science, and can either be on par with or vastly surpass common everyday power-consuming tasks, depending on the scale of the project. In addition to reporting these results, this work also aims to inspire researchers to include power usage and estimated carbon emission as a secondary result in their work.

1. Introduction

Data science (DS) and Artificial Intelligence (AI) have become a fundamental aspect of our digital world; be it for marketing strategies [1], uncovering shopping patterns of clients [2], computer-mediated language comprehension [3], detecting cardiac arrhythmias [4] or simply predicting the weather [5]. Deep neural networks have experienced a steep increase in complexity, and correspondingly a steep increase in the hardware required to train these models [6]. As the hardware requirements have grown exponentially, so has the power consumption attributed to them. Large models are commonly trained in big datacenters, powered by various types of energy [7]. It is estimated that these datacenters currently account for about 1% of the worldwide electricity use [8]. Their projected energy consumption and carbon emission for the upcoming years are subject of debate, and estimates range from the status quo [8] up to 8% of global carbon emission by 2030 [9]. Major takeaways from this discussion are that datacenters are gradually shifting towards green energy [10] and are getting increasingly efficient [8], but are simultaneously being used more and more for hyperscale data science [11]; and that, despite the shift towards green energy in datacenters' infrastructure being widely adopted [12], measures targeting other areas, such as inefficiencies in servers, circular economy, and heat reuse, are still lagging [13].

These values represent the steep increase in data center demand and usage, but do not take into account the increase in ''common'' data science jobs that do not require these large data centers. The term common data science is used in this work to denote any data science that can be performed by one (or a team of) data scientist(s) on their home or work computer/laptop.

Data science entails more than huge complex neural networks and big datacenters; common data scientists are omnipresent in the private market in various fields. Data science related jobs often end up in the top 5 fastest growing jobs in terms of demand: according to LinkedIn, Machine learning engineer is the 4th fastest rising job in the U.S. [14], and the 2nd fastest rising job in The Netherlands, Italy and the U.K. [15]. Other data science related jobs such as Data scientist and Data engineer tend to end up high on these lists as well. Does this increase in common ''everyday'' data science also imply an increase in power consumption and carbon emission? Masanet et al. [8] estimate that in 2020, powering digital devices (excluding smartphones) accounted for 20% of all ICT-related greenhouse gas emissions (GHGe),

∗ Corresponding author at: Prinsstraat 13, 2000 Antwerpen, Belgium.


E-mail addresses: bjorge.meulemeester@mpinb.mpg.de (B. Meulemeester), david.martens@uantwerpen.be (D. Martens).

https://doi.org/10.1016/j.suscom.2023.100864
Received 31 May 2022; Received in revised form 6 January 2023; Accepted 20 February 2023
Available online 25 February 2023
2210-5379/© 2023 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

and communication networks account for 25%; the sum of which is comparable to the estimate of data center GHGe of 45%. The contribution to the global GHGe of household ICT devices, such as those used by common data scientists, should not be underestimated.

This work aims to compare the previously mentioned advanced data science, or ''big datacenter data science'', to the latter common data science. The main question is whether common data science has a substantial contribution to the global GHGe. To this end, the carbon emission of common data science is estimated and compared to common computer tasks, common non-computer related everyday tasks and the aforementioned ''big datacenter'' advanced data science. This is done by means of performing a common data science project and various other tasks on a laptop while measuring their power consumption, and comparing these values to one another. Additionally, an automated data science project is run on various hardware architectures, and an empirical law is constructed to extrapolate these results to even more hardware architectures. Further details of these experiments are given in Section 2.

The goal of this work is not to report a large-scale benchmark on the power consumption and environmental impact of common data science; such results are only possible and accurate if these numbers are plentifully present in the literature. This is the only way to accumulate realistic, trustworthy data that covers a broad range of data scientists, data science projects and different hardware architectures. This work does aim to aid the popularization of reporting such numbers as secondary results in order to make this a possibility. Such reporting is not yet standard practice, as is demonstrated by the lack of these numbers in hyper-scale AI projects such as DALL⋅E [16] and GPT-3 [3], but also in many other data science related works.

2. Methods

In order to assess the environmental impact of performing common data science, two different experiments are performed:

1. Performing a real data science project on a single hardware device and comparing it to other common computer related tasks on the same hardware device, common non-computer related tasks, and advanced data science.
2. Automating the previous data science project and running it on multiple devices, comparing the outcomes.

The goal of these experiments is to assess the contribution of common data science to the overall power consumption of one or more devices. This contribution in power consumption is then converted to carbon emission as explained in Section 2.6. As it is not feasible to consider every possible combination of hardware architectures as individual parameters, and such a large-scale benchmark is not within the scope of this work, we instead consider three essential properties that an experiment must have in order to reflect an accurate assessment of the power consumption of common data science:

1. Hardware representativeness
2. Workflow representativeness
3. Workflow realism

How well each experiment covers these properties is shown in Table 1, and revisited in Section 4.5. Note that this work is inherently limited by the workflow of the authors, and no experiment in this work can perfectly simulate or represent the workflow of other data scientists. More details on both experiments are provided below.

Table 1
Overview of considered experiments, showing three criteria for each experiment: hardware representativeness, workflow realism (how well the considered task mirrors a real-world task), and workflow representativeness (compared to other data scientists). The experiment indices follow the enumeration as given in Section 2.

Criterion                  | Experiment 1    | Experiment 2
Hardware: representative   | 1 architecture  | Several architectures
Workflow: realistic        | Real workflow   | Automated
Workflow: representative   | Singular        | Singular & automated

2.1. Experiment №1: Performing a real data science project and comparing to other common tasks

This section provides a detailed overview of each task that was performed in order to compare common data science (CDS) to other common computer related tasks (CR), common non-computer related tasks (NCR) and advanced data science (ADS). During this cross-comparison, all computer related tasks and common data science tasks were performed on the same hardware architecture. The considered hardware architecture is described in Table 2 under index 3.

Only during this experiment is the hardware limited to a single hardware structure. This is done in order to allow for a consistent comparison between these tasks; the aim of this experiment is to assess the relative difference in power consumption of these tasks.

(CR) Baseline computer (idle). This task is done by leaving a computer running with the screen on, and no open programs. The value of this task varies depending on the efficiency of the hardware, background tasks and screen brightness.

(CR) Watching a movie. This task is completed by streaming Avengers: Age of Ultron via Disney+ on Firefox 96.0 for Ubuntu. No other tabs or programs were running. The carbon emission of the datacenter providing the movie is not taken into account due to lack of available data.

(CR) Normal computer usage. Normal usage entails browsing the internet, reading a pdf and opening/closing various programs: Inkscape, Firefox, text editor and file browser. Tasks that involve video rendering (such as gaming, watching Youtube or viewing local video files) are purposely avoided, as these are comparable to the task Watching a movie.

(CR) Working in Excel. This task entails making plots and performing various column transformations on a 4,000 × 20 dataset containing only continuous numerical values. These column transformations aim to reflect common data science operations such as: deleting columns, scaling columns, and power- and log-transformations. Plots are limited to histograms. These Excel operations are performed in LibreOffice Calc version 1:6.4.7-0ubuntu0.20.04.2.

(CDS) Data science project. This task entails doing a common data science project, namely a credit scoring classification problem: identifying defaulters on a loan. This project is split into two main parts:

1. Exploration & preprocessing
2. Gridsearch & fit

Exploration & preprocessing entails constructing plots of features and processing features according to their meaning, type and distribution. This is done by means of column transformations on a 20,000 × 35 dataset with mixed continuous and categorical features. All data exploration and processing is done in Python 3.8 using matplotlib, pandas and scikit-learn. Column transformations include: encoding (one-hot encoding and WOE-encoding), scaling to normal distributions with QuantileTransformer(), filling missing values with means or modes, and over- and undersampling with ADASYN and RandomUnderSampler(). The exact workflow can be found in the automated version of the data science project (see Section 2.2), available on github at [17].

Gridsearch & fit entails performing two hyperparameter gridsearches for the machine learning models Random Forest, KNN and Logistic Regression on the dataset as described in the previous paragraph.
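The operations listed under Exploration & preprocessing can be sketched in a few lines of scikit-learn and pandas. The column names, toy values and the use of SimpleImputer below are illustrative assumptions, not the authors' exact code; their real workflow is the automated project published at [17].

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import QuantileTransformer

# Hypothetical mixed-type credit-scoring frame; the real dataset is 20,000 x 35.
df = pd.DataFrame({
    "income": [1200.0, None, 3400.0, 2100.0],
    "age": [23.0, 45.0, None, 31.0],
    "housing": ["rent", "own", "own", "rent"],
})

# One-hot encode the categorical feature.
df = pd.get_dummies(df, columns=["housing"])

# Fill missing numerical values with column means (modes would be used for categoricals).
num_cols = ["income", "age"]
df[num_cols] = SimpleImputer(strategy="mean").fit_transform(df[num_cols])

# Map numerical features onto a normal distribution, as with QuantileTransformer() in the paper.
qt = QuantileTransformer(output_distribution="normal", n_quantiles=4)
df[num_cols] = qt.fit_transform(df[num_cols])
```

Over- and undersampling (ADASYN, RandomUnderSampler) would follow the same pattern via the imbalanced-learn package, which is omitted here to keep the sketch self-contained.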


Table 2
Overview of all hardware types considered in this work. The indices are ordered to be congruent with the graphs.

Index | Name                                    | CPU                                              | Threads | RAM slots | RAM
3     | Dell Inc. Inspiron 7570 (07EA) Notebook | Intel(R) Core(TM) i7-8550U CPU @ 1.80 GHz        | 8       | 1x        | SODIMM DDR4 Synchronous Unbuffered (Unregistered) 2400 MHz (0.4 ns)
2     | Dell Inc. XPS 15 9560 (07BE) Notebook   | Intel(R) Core(TM) i7-7700HQ CPU @ 2.80 GHz       | 8       | 2x        | SODIMM DDR4 Synchronous Unbuffered (Unregistered) 2400 MHz (0.4 ns)
1     | Dell Inc. XPS 15 9500 (097D) Notebook   | Intel(R) Core(TM) i9-10885H CPU @ 2.40 GHz       | 16      | 2x        | SODIMM DDR4 Synchronous 3200 MHz (0.3 ns)
0     | Dell Inc. Latitude 5421 (0A66) Notebook | 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50 GHz  | 16      | 1x        | SODIMM DDR4 Synchronous 3200 MHz (0.3 ns)

Table 3
Hyperparameter grids for the two gridsearches performed during the task Gridsearch & fit.

Model               | Hyperparameter    | Gridsearch #1                         | Gridsearch #2
Random Forest       | n_estimators      | [100, 500, 1000, 2000]                | [2000, 2500, 3000]
Random Forest       | max_depth         | [10, 50, 100]                         | [50, 70, 90]
Random Forest       | min_samples_split | [2, 10]                               | [2]
KNN                 | n_neighbors       | [10, 20, 50, 100, 500, 1000]          | [10, 20, 50]
KNN                 | p                 | [1, 2]                                | [1]
Logistic Regression | penalty           | [l2, l1, elasticnet]                  | [l2]
Logistic Regression | C                 | [0.001, 0.01, 0.1, 1, 10, 100, 1000]  | [0.1, 0.3162, 1.0, 3.162, 10.0]
Logistic Regression | max_iter          | [10, 100, 500, 1000]                  | [1500, 2000, 3000]
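The coarse grids above map directly onto scikit-learn's GridSearchCV. The snippet below is a sketch using synthetic data in place of the credit-scoring dataset, and restricts the Logistic Regression penalties to a solver-compatible subset so it runs as-is; it is not the authors' exact search code.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Stand-in for the 20,000 x 35 credit-scoring dataset.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Coarse Logistic Regression grid from Table 3 (Gridsearch #1),
# trimmed to the default solver's supported penalty.
param_grid = {
    "penalty": ["l2"],
    "C": [0.001, 0.01, 0.1, 1, 10, 100, 1000],
    "max_iter": [10, 100, 500, 1000],
}

# 5-fold cross-validation, as used in the paper.
search = GridSearchCV(LogisticRegression(), param_grid, cv=5)
search.fit(X, y)
best = search.best_params_
```

Refining the grid for the second pass amounts to replacing `param_grid` with the Gridsearch #2 values around the best first-pass parameters.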

Gridsearches are performed by starting with a coarse gridsearch and refining the grid intervals once with a second gridsearch. The considered hyperparameter grids are shown in Table 3. A 5-fold cross-validation scheme is used.

(NCR) Burning a lightbulb for an hour. Assuming a 10 W light bulb.

(NCR) Streaming season 1 of Friends. The carbon emission of this task was calculated using the same measurement for power consumption as the task Watching a movie and extrapolating this to the duration of the first season of Friends (2 h 28 min).

(NCR) Leaving the office lights on over the weekend. This is calculated by assuming the office is lit by eight 10 W TL lights, burning from Friday 5pm until Monday 9am: the lighting setup at the office of the first author.

(NCR) Heating an office for a day. This is calculated by assuming an office with the following properties:

• 50 m2 area
• One outside wall
• Solid and insulated cavity walls
• A ceiling height of 3 m
• A minimum roof insulation of 75 mm
• Electrically heated at 21 °C (70 °F)

Data was taken from [18].

(NCR) Commercial airflight for 1 passenger (BRU → JFK NY). This carbon emission was calculated assuming a flight from Brussels (BRU) to New York (JFK NY), where all seats are economy seats and occupied. Data taken from [19].

(ADS) Training a CNN on images. This measurement was taken from [6], where more details can be found. This advanced data science task was performed by training the CNN U-NET for 100 epochs on the LIDC medical image dataset. This required 1.25 ± 0.25 kWh, as is visible in Figure 1 of Anthony et al. [6]. Note that:

• This value was measured using different hardware.
• The dataset was already processed; processing does not contribute to the reported value.
• Only one training session is reported, which does not reflect a realistic case of trial and error, with multiple training sessions.
• While computationally challenging, it is feasible to replicate this task on a home computer. This measurement lives in the gray zone between common and advanced data science.

(ADS) Training GPT-3 in the EU. This measurement is an estimate taken from [6], assuming Microsoft's average datacenter PUE (Power Usage Effectiveness) of 2015, re-calculated with updated values for the GHGe associated with energy production (EU28 average from 2020) and GHGe of cars (EU28 average from 2021).

2.2. Experiment №2: Running an automated data science project on multiple hardware architectures

In order to include multiple hardware architectures, and increase the hardware representativeness of the results, an automated version of the data science project as described in Section 2.1 was created [17]. This automated version performs all the steps of the original data science project, albeit at a hyper-efficient tempo. This experiment only includes measurements on successfully run code, and does not include overhead tasks such as (but not limited to) debugging, looking up documentation, other browser-related activities, and idling. The considered hardware architectures are given in Table 2.

The aim of this experiment is not to benchmark a vast number of hardware architectures on some data science project; large-scale benchmarks for various hardware architectures already exist (see, for example, [20]). The goal is rather to report and reduce the bias introduced by considering only one hardware type.

The automated data science project can be defined by the following step-by-step plan:

1. Loading data and transforming to dataframe.
2. Plotting distributions of all features.


3. Performing basic processing on the data to allow for training.
4. Training a logistic regression model on unprocessed data.
5. Iteratively:
   (a) Improving processing on all features.
   (b) Checking the effect of the previous processing step by re-training the logistic regression model.
   (c) Plotting distributions of processed features.
6. Performing a coarse gridsearch.
7. Performing a finer gridsearch.

The code for this automated workflow is available on github at [17]. While the code allows for multiple iterations of improving data processing and testing the effect of these improved processing steps on the resulting AUC score with a logistic regression model, only one iteration is actually performed. Multiple iterations are not included in this work, since any choice for the number of iterations would be arbitrary, and would not increase the realism of the automated data science project by any measurable metric. Additionally, the authors argue that a single iteration of data processing improvement is not too unrealistic for a data science project, albeit a minimum. However, this argument is subject to the specificity of the authors' workflow, as mentioned before at the end of Section 2 and in Table 1, and is open to debate.

2.3. Assumptions

Inevitably, assumptions need to be made in order to compare the measurements in a comprehensive manner. The measurements in this work are only as truthful as the truth value of these assumptions. These will also be revisited in Section 4.5.

1. It suffices to measure the power consumption of the CPU to gain insight into the total power consumption of performing various computer tasks, as we can estimate the total power consumption by assuming a CPU to RAM power consumption ratio of 3∶2, as estimated by Anthony et al. [6].
2. Differences in power consumption of the automated data science project across the measured hardware architectures are representative of differences in power consumption of data science projects for other hardware architectures.
3. The energy consumption of common data science on other hardware structures can, to some extent, be estimated using the Thermal Design Power (TDP) as a proxy for the power consumption of the entire machine, and assuming an inverse relationship between the total consumed energy for a single automated data science task E_AutoDS and TDP of the nature E_AutoDS = a/TDP + b, where a and b, the dropoff rate and intercept respectively, are parameters that are to be estimated.
4. A car emits as much gCO2eq as the EU average from 2019.
5. Energy production emits as much CO2eq as the EU average from 2020.

2.4. Measuring the power: Measurement tools

The previous sections expanded on the details of the different experiments. Both experiments require measurements of the power consumption of the considered hardware device. This section expands upon the methods used to measure this power consumption.

Measuring the real-time power consumption and/or energy consumption of a task performed on the computer is done by using Ubuntu's powerstat package to sample the Intel RAPL interface. The package can be installed with sudo apt install powerstat, after which the measurement is started by running the shell command

sudo powerstat -DRgf -d ⟨s⟩ Δt N > output.txt    (1)

where -D enables the -R option, -R denotes sampling should be done on the RAPL interface, -f enables showing the average CPU frequency, -d denotes a delay before starting the measurement (in seconds), Δt denotes the time interval between samples (in seconds), N denotes the amount of samples and -g enables measuring the GPU power consumption as well. GPU power consumption and GPU programming are not explored in this work, as the authors consider this advanced data science, rather than common data science. The sampling interval should be sufficiently small compared to the duration of the task in order to accurately measure the distribution of its power usage.

Where possible, CarbonTracker [6] is also used to measure energy consumption. CarbonTracker is designed to track and predict the energy consumption of training AIs, but can be used for any piece of code. It must be noted that CarbonTracker was designed for measuring the power consumption of training an AI during one or more epochs, and not for the shorter code snippets prevalent in common data science. Due to this, its accuracy may deviate, especially if the runtime of the code is short. For this reason, interpretation of the data is based on the RAPL measurements; the CarbonTracker measurements are only shown for completeness, and as a benchmark to check whether it produces similar results. Deviating results are also discussed in Section 4.

2.5. A note on the measurement tools

As mentioned before, both sampling the RAPL interface directly and CarbonTracker [6] are used to measure the power/energy consumption. The resulting values should be interpreted differently.

The RAPL interface. This method measures by directly sampling the power consumption of the CPU at each time interval. This measurement is code-independent: whether or not your code is running, this method measures the power consumption. This method gives a good overview of the entire power consumption of some task, including the non-code related parts, e.g. looking up documentation and articles, debugging, running code that finishes with an error, etc. This measurement gives the most accurate picture of a certain task, but will include variability depending on the efficiency, experience and knowledge of the data scientist performing the task. Note that only the power consumption of the CPU is considered. This may deviate from the true power consumption, especially when running code that relies heavily on RAM. This drawback is discussed in Section 4.5, when revisiting the assumptions made earlier (see Section 2.3).

CarbonTracker. This method measures the energy consumption of a single piece of code. This measurement gives a good overview of how much energy was consumed by running the code, and nothing but the code. If a script finishes on an error, no measurement is written out. This measurement has a higher repeatability, but lacks information on the overhead of the tasks: the non-code related parts and running bugged code that finishes with an error.

RAPL vs CarbonTracker. Note that the samples for the CarbonTracker measurements are not the same kind of samples as those for the RAPL measurements. CarbonTracker reports aggregate values at the end of every successful exit of a piece of code, independent of how long it took, while RAPL measurements produce a real-time measurement at each time interval, independent of what the computer is doing. The variance of the CarbonTracker measurements is thus a variance between the total power consumptions of different successfully run code snippets. The variance of the RAPL measurements, on the other hand, is the variance on the real-time power consumption. This difference in sample types should be kept in mind when interpreting the results.
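As an illustration of the arithmetic behind RAPL sampling, the average power over one interval follows from two successive readings of the cumulative energy counter, which Linux exposes in microjoules under /sys/class/powercap/intel-rapl. The wraparound value used below is an assumption for the sketch; the real maximum is hardware-specific and reported in max_energy_range_uj.

```python
def average_power_watts(energy_start_uj: int, energy_end_uj: int,
                        interval_s: float, counter_max_uj: int = 2**32) -> float:
    """Average CPU package power between two RAPL energy readings (microjoules),
    compensating for a single counter wraparound."""
    delta_uj = energy_end_uj - energy_start_uj
    if delta_uj < 0:  # the cumulative counter wrapped during the interval
        delta_uj += counter_max_uj
    return delta_uj / 1e6 / interval_s  # uJ -> J, then J per second = W

# 150 J consumed over a 10 s sampling interval corresponds to a 15 W average draw.
print(average_power_watts(0, 150_000_000, 10.0))  # prints 15.0
```

powerstat performs this bookkeeping internally; the function above only makes explicit what one sample in its output represents.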


Fig. 1. Power consumption distributions of working on common computer tasks and data science, measured by CarbonTracker (dark blue) and sampling the RAPL interface (light teal). RAPL samples are taken every 10 s, while CarbonTracker samples are taken every successful run of a piece of code. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 2. Energy consumption and carbon emission (in km driven by car equivalent) of working on common computer tasks and data science related activities. Only RAPL measurements are shown.

2.6. Converting

While energy usage and carbon emission are inherently linked by the fact that energy production produces greenhouse gas emissions, their exact relation depends on the method that is used to produce energy, and how green these methods are. Different countries employ different methods to produce energy. In order to link energy consumption to carbon emission, the EU-28 (28 countries of the European Union) average greenhouse gas emission associated with electricity generation during 2020 [21] is used, yielding an average emission per energy unit of:

CO2eq/energy = 230.7 g/kWh    (2)

where CO2eq denotes the amount of gram CO2 that would yield a greenhouse effect of equal magnitude as the considered greenhouse gases, i.e. the seven greenhouse gases considered by the Kyoto Protocol [22]: carbon dioxide (CO2), methane (CH4), nitrous oxide (N2O), hydrofluorocarbons (HFCs), perfluorocarbons (PFCs), sulphur hexafluoride (SF6) and nitrogen trifluoride (NF3).

In order to increase the interpretability of the gCO2eq quantity, it is converted to the quantity ''km driven by car'', as suggested by Anthony et al. [6]. To do so, we use the average GHGe of every registered car in the EU up until 2019 [23], yielding a distance per equivalent carbon emission of:

8.17661488144 m/gCO2eq    (3)
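Chaining Eq. (2) and Eq. (3) with the 3∶2 CPU-to-RAM assumption from Section 2.3 (so that total energy is 5/3 of the measured CPU energy) gives a small conversion helper. The example input is arbitrary, chosen only to exercise the chain:

```python
G_PER_KWH = 230.7             # EU-28 electricity GHGe, Eq. (2)
M_PER_GCO2EQ = 8.17661488144  # car-distance equivalent, Eq. (3)

def cpu_kwh_to_km(cpu_kwh: float) -> float:
    """Convert measured CPU energy (kWh) to a 'km driven by car' equivalent.

    A 3:2 CPU-to-RAM split means total energy = (5/3) x CPU energy."""
    total_kwh = 5 / 3 * cpu_kwh
    gco2eq = total_kwh * G_PER_KWH
    return gco2eq * M_PER_GCO2EQ / 1000  # metres -> km

# e.g. 1 kWh measured at the CPU is roughly 3.14 km driven by an average EU car.
print(round(cpu_kwh_to_km(1.0), 2))  # prints 3.14
```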

3. Results

3.1. Experiment №1: Performing a real data science project and comparing to other common tasks

Fig. 3. Carbon emission of common non-computer related tasks and data science (common and large-scale) on a logarithmic scale, expressed in units of km driven by car. Values for common data science and computer-related tasks are reported for hardware type 3 (see Table 2).

Fig. 1 shows the power consumption distribution per computer-related task.

Fig. 2 shows the energy consumption of working on a certain task. For tasks with an unspecified end time, a duration of 8 h is used to reflect a day's worth of work. These are compared to large-scale data science and other everyday tasks in terms of carbon emission in Fig. 3. The total energy consumption (and consequently associated carbon emission) for common


Fig. 4. Power distributions for running an automated data science project on different hardware structures as described in Section 2.2. The experiment index and CPU are given as axis labels, where the indices are congruent with the hardware as shown in Table 2.

Fig. 5. Total energy used for running an automated data science project on different hardware structures as described in Section 2.2. The experiment index and CPU are given as axis labels, where the indices are congruent with the hardware as shown in Table 2.

data science (i.e. running GridSearchCV() for an hour and doing a common data science project) is calculated as

E_total = (5/3) × E_CPU    (4)

to be consistent with a 3∶2 CPU-to-RAM power consumption ratio, as described in Section 2.3.

3.2. Experiment №2: Running an automated data science project on multiple hardware architectures

The power consumption distributions of running an automated data science experiment on multiple hardware architectures are shown in Fig. 4. Their energy consumptions are shown in Fig. 5.

3.3. Estimated extrapolation of energy consumption in function of TDP

Fig. 6 shows an estimated extrapolation of the total energy consumed by a piece of hardware using the TDP as input parameter, where we have assumed an inverse relationship between the two variables of the shape

E_AutoDS = a / TDP + b    (5)

and the best fit yielded parameter values of:

a = 17.175 ± 5.415
b = −0.055 ± 0.208    (6)

Fig. 6. Estimated projection of the consumption of common data science depending on the TDP of the machine. Note that using TDP as sole proxy for power consumption has caveats (see Section 4.5), and that the error only includes variance on the measurements, which does not contain all variance mentioned in Section 2.3.

3.4. Comparison of considered CPUs to other CPUs

The Thermal Design Power (TDP) of all CPUs recorded by Passmark [20] is shown in Fig. 7. The TDP of the CPU considered in Experiment №1 is denoted with a blue line, while all additional CPUs considered in Experiment №2 have a TDP equal to 45 W, as denoted by a gold line.

4. Discussion

4.1. Experiment №1: Performing a real data science project and comparing to other common tasks

Power consumption. We can see in Fig. 1 how CarbonTracker yields similar results on power consumption as directly sampling the RAPL interface for Gridsearch & fit, but there is a notable difference between the two measurements for the task Exploration & preprocessing. This can be explained by the fact that the latter task includes a lot of overhead, such as looking up documentation, scaling columns one by one, programming, debugging and running faulty code. All of the previously


mentioned overhead is not captured by the CarbonTracker measurements; only code that runs without error is measured by CarbonTracker. CarbonTracker can thus be interpreted as the hypothetical power consumption of a hyper-efficient data scientist who makes no mistakes, knows the sklearn library by heart and writes their code instantaneously. Gridsearch & fit, on the other hand, involves little to no programming; a script is simply left running. This explains the similarity between the CarbonTracker measurements and those obtained by directly sampling the RAPL interface.

Energy consumption. Fig. 2 reports the energy consumption of the same tasks as shown in Fig. 1, sorted from low to high. Where no duration is reported in the label, an 8-hour working day is assumed. It is notable that streaming a 1.5 h movie requires less energy than sitting through a 1 h Teams call, despite both entailing a similar task: real-time video rendering. This is possibly because online meetings cannot use the same video compression methods as movie streaming, since their video data is produced in real time. Processing less compressed data requires more computational power and puts the CPU under heavier load, which requires more energy. Note that the carbon emission of the data center providing the movie is not taken into consideration; including it would increase the associated carbon emission.

It is also notable that the two gridsearches make up about one third of the total energy consumption of the data science project, despite taking up only 14% of the time (1 h 17 min out of 9 h 04 min). This makes sense, as a gridsearch runs the CPU in parallel under heavy load, yielding a high power throughput. This was already visible in Fig. 1.

Carbon emission. Fig. 3 shows, on a log scale, the carbon emission of various everyday tasks, common data science tasks, and advanced data science tasks. The carbon emission is expressed in the unit km driven by car, as described in Section 2.6. All considered common data science tasks emit less CO2 eq than, e.g., leaving the office lights on over the weekend. Even an advanced data science task such as training a CNN on images, performed under the same conditions as described by Anthony et al. [6], does not emit as much CO2 eq as common heating or lighting. Note that the relationship between energy consumption and carbon emission is specific to each country, and imposes great variance. For example, training your model in Estonia will yield about 64 times more CO2 eq emission than training it in Iceland, and training a model for twice as many epochs will yield roughly twice the emission. The measurements are supposed to reflect a representative training session.

4.2. Experiment №2: Running an automated data science project on multiple hardware architectures

Figs. 4 and 5 are ordered and indexed consistently with each other and with Table 2. Fig. 4 shows an ascending trend from top to bottom, while Fig. 5 shows a descending trend from top to bottom. This indicates that the CPUs capable of drawing the most power end up consuming the least energy for the same task. Fig. 5 shows similar results for the RAPL measurements and the CarbonTracker measurements, except in the case of hardware №3. Nonetheless, a couple of things stand out.

As a first observation, the 8th generation Intel i7-8550 consumes more energy than the 7th generation Intel i7-7700. Likely, the fact that the latter clocks in at 2.80 GHz, a full 1.00 GHz more than the former, is responsible for this difference.

Secondly, the remaining CPUs (indices 2, 1 and 0) show a declining trend in energy consumption for the same tasks across successive generations (7, 10 and 11, respectively). This indicates that, while a big jump in CPU frequency improves performance and energy consumption (as described above), more recent CPU generations and/or more threads also benefit these metrics, even if the clock speed drops.

Lastly, the overall power consumption varies for the most part between 25 W and 75 W, which is not entirely inconsistent with [24], but appears to be on the higher end. This is likely because more modern machines are generally capable of a higher power throughput, a trend already visible in Fig. 4. The most consistent value is that of hardware structure №3, which was released in 2017, closer to the publication date of [24] than the other hardware architectures (2017, 2020 and 2021 for hardware indices 2, 1 and 0, respectively).

Fig. 7. Thermal Design Power of all CPUs as recorded by Passmark [20]. The TDP of the CPU considered in Experiment №1 is denoted by a vertical blue line, the additional CPUs considered in Experiment №2 by gold ones (all exactly at 45 W), and the median TDP by a black one. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

4.3. Projection

Fig. 6 shows an estimated projection of the total amount of energy some hardware structure would consume while performing an automated data science task (E_AutoDS), using the TDP as an input parameter. Using the TDP as a proxy for power consumption has caveats, as explained further in Section 4.5. The y-axis is in log scale to make the relative error apparent, rather than the absolute error. The x-axis is in log scale to better show the relationship for lower TDP values, which are more common (see Fig. 7). Also note that this figure does not include any estimate of the error due to variance between different data scientists or different workflows. It is clear that any estimate of the power consumption of common data science cannot be truthfully extrapolated to machines with a TDP larger than the ones considered in this manuscript. For values equal to or lower than the ones considered here, the relative error on the estimated E_AutoDS remains steady around 100%, supporting the belief that these extrapolations are very crude estimates at best. Given the low number of datapoints and the relatively high variance even among these sparse datapoints, this is unsurprising. If a better extrapolation is to be made, more data needs to be gathered for various hardware structures. This would also open up the possibility of assessing the accuracy of using the TDP as an input variable for the total energy consumption of some hardware structure.
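The projection above can be sketched in a few lines. The fit below uses the inverse model E_AutoDS = a/TDP + b assumed in this work, which reduces to ordinary linear regression on x = 1/TDP; the datapoints are illustrative placeholders, not the measured values from Table 2:

```python
def fit_inverse(tdp_w, energy_wh):
    """Least-squares fit of E_AutoDS = a / TDP + b.

    Equivalent to ordinary linear regression of the measured energy
    on x = 1 / TDP; returns the coefficients (a, b).
    """
    xs = [1.0 / t for t in tdp_w]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(energy_wh) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, energy_wh))
    a = sxy / sxx            # slope with respect to 1/TDP
    b = mean_y - a * mean_x  # intercept
    return a, b

# Illustrative datapoints only (W, Wh) -- not the measurements of Table 2.
# Three CPUs sharing TDP = 45 W while differing in E_AutoDS mirrors why the
# fit can only be a crude estimate.
tdp = [15.0, 45.0, 45.0, 45.0]
energy = [100.0, 55.0, 60.0, 70.0]
a, b = fit_inverse(tdp, energy)
relative_errors = [abs((a / t + b) - e) / e for t, e in zip(tdp, energy)]
```

With only one distinct TDP value besides 45 W, the fitted curve is forced through the lone 15 W point and the mean of the 45 W points, which is precisely why more hardware structures are needed before such an extrapolation becomes meaningful.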

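For reference, the RAPL measurements used throughout this section come from a monotonically increasing energy counter that, on Linux, is exposed through the powercap sysfs interface. The sketch below assumes the standard Intel RAPL sysfs layout (availability and read permissions vary per system); only the wraparound-safe delta helper is hardware-independent:

```python
from pathlib import Path

# Standard Linux powercap paths for the package-level RAPL domain.
RAPL_ENERGY = Path("/sys/class/powercap/intel-rapl:0/energy_uj")
RAPL_MAX = Path("/sys/class/powercap/intel-rapl:0/max_energy_range_uj")

def read_energy_uj(path: Path = RAPL_ENERGY) -> int:
    """Cumulative package energy in microjoules since an arbitrary origin."""
    return int(path.read_text())

def energy_delta_uj(before: int, after: int, max_range: int) -> int:
    """Energy consumed between two samples of the RAPL counter.

    The counter wraps back to 0 after max_energy_range_uj, so a smaller
    'after' value means exactly one wraparound occurred.
    """
    if after >= before:
        return after - before
    return (max_range - before) + after

# Usage sketch on a machine exposing RAPL:
#   start = read_energy_uj()
#   ...run the task to be measured...
#   delta = energy_delta_uj(start, read_energy_uj(), int(RAPL_MAX.read_text()))
#   wh = delta / 3.6e9  # microjoules -> Wh (1 Wh = 3.6e9 uJ)
```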

4.4. Hardware comparison

Fig. 7 shows the distribution of the Thermal Design Power (TDP) of all CPUs recorded by Passmark [20]. The distribution does not represent the relative frequency of these CPUs; each CPU is a single datapoint, independent of its popularity in modern hardware. It shows how the CPU used in Experiment №1 has a TDP of 15.0 W, compared to the median TDP of 51.0 W. The additional CPUs considered in Experiment №2 have TDP values of 45.0 W. Measurements of the tasks as described in Sections 2.1 and 2.2 may yield different values when performed on other hardware. Previous results indicate that CPUs with a higher TDP consume less energy for the same task, so the results in this work are conservative, likely reporting higher numbers than would be obtained on a CPU with the median TDP of 51 W.

4.5. Assumptions revisited

Let us revisit the assumptions made in Section 2.3 and verify whether they hold true, or in which way they should be adapted. They are numbered in the same way as in Section 2.3.

1. The exact contribution of RAM is not measured in this work, and the estimated power consumption ratio of CPU to RAM of 3:2 remains an estimate, adding additional uncertainty to the reported values. The power consumption of the screen is also not reported. However, considering the power consumption of the task ‘‘Baseline (idle)’’ in Fig. 1, screen use does not contribute substantially to the overall power consumption of a task, contributing about 2 W in total.
2. The differences in energy consumption and associated carbon emission of the automated data science project across the considered hardware architectures (see Fig. 5) could be extrapolated to other hardware architectures by considering the TDP as the relevant parameter (see Fig. 7). However, there is insufficient data to assess the exact nature of this relationship, be it linear, exponential or a power law, for example. In addition, there are some caveats when using the TDP as a metric for the power consumption of a CPU:

   • The TDP values from the manufacturers are often incorrect [25]. The authors are, e.g., sceptical of the fact that 3 out of 4 considered CPUs are reported to have a TDP of exactly 45.0 W.
   • They only apply when the CPU is under full load on all cores, which is rare in most cases, but not uncommon during some stages of data science (e.g. performing a gridsearch).
   • Actual power usage can be altered by power plans while the computer is running (e.g. switching to battery power).
   • They do not take into account the electricity usage of other components, such as the screen, hard drive, RAM, etc.

   For these reasons, our comparison between common data science and other tasks, as visible in Fig. 3, reports the most conservative measured value for performing common data science (i.e. on the least energy-efficient hardware structure).
3. We have assumed an inverse relationship between the TDP and E_AutoDS of the shape E_AutoDS = a/TDP + b. While this inverse relationship is apparent in Fig. 5, and it is sensible to assume that more powerful CPUs are also more efficient, despite drawing more power for a short amount of time, the exact nature of this relationship is an educated guess. Apart from the large range of possible fits, as seen in Fig. 6, the relationship may well not be perfectly inverse. The fact that multiple points have the same TDP but different values for E_AutoDS already suggests that the TDP is not a perfect predictor, as discussed in Section 4.3. A trend is not yet visible, due to the low variance in measured TDP values.
4. The conversion of gCO2 eq to km driven by car is based on the EU average from 2019. The European Environment Agency (EEA) [23] shows a clear declining trend in car emission, so the values expressed in this unit will become increasingly outdated every year. Of course, due to the variance in carbon emission across cars, the values also differ vastly depending on which car you consider.
5. As already mentioned at the end of Section 4.1, the conversion from energy to gCO2 eq depends heavily on which country you consider. Outsourcing heavy computational work to countries with greener energy production can substantially decrease the carbon footprint.

To estimate the additional contribution of common data science to the global GHGe, let us use the results and the revisited assumptions to compare common data science to regular computer usage, using the RAPL measurements, scaling everything to the same duration, and assuming that the relative difference in energy usage between a data science project and normal computer usage in Experiment №1 is representative for other computer users, data scientists and data science in general.

The extra load in global GHGe of common data science compared to normal computer usage for a single device (hardware index 3) is equal to

E_DataScienceProject / E_NormalUsage = (8 hr / 9.1 hr) × (100.00 Wh / 34.72 Wh) = 2.57    (7)

This value can only be extrapolated to other hardware architectures under the condition that any change in hardware architecture has an equal effect on both measurements. It is not improbable that one measurement would benefit more from, e.g., a more efficient processor than the other. This makes this result unsuitable for extrapolation to other hardware types, as the error on such an extrapolation cannot be assessed a priori.

In order to estimate the additional load in GHGe due to common data science on a global scale, rather than for a single device, one would have to link this value to the relative global GHGe of household ICT devices of 45% [10]. To do so, one would need a value for how many of these devices are used for data science (and how often), which is very hard to obtain.

5. Conclusion

The results made clear that common data science (i.e. any form of data science that can be performed on your laptop at home) requires substantially more power than other common computer tasks, yielding an associated carbon footprint that is about 2.57 times higher than that of normal computer usage. Keeping in mind that data science jobs have seen a steady increase in popularity over the last few years, this implies a higher global GHGe associated with common data science.

When comparing common data science to advanced data science, it is clear that the power consumption increases exponentially along with the complexity of the data science. While advanced data science tasks, such as training GPT-3, require a substantial amount of power, this is not the case for common data science.

CPUs with a higher power profile appear to use their power more efficiently, ultimately leading to a lower total energy consumption, and thus a lower associated carbon footprint.

The reported results in this manuscript are not suitable for extrapolation to other hardware architectures. More data is needed to make such an estimate possible.

The results and discussion considered a very conservative estimate of the carbon footprint of common data science. Even with this value in mind, common data science does not appear to contribute substantially to global carbon emissions compared to other everyday tasks.


While the uprising in data science jobs can indeed be associated with an increased carbon footprint, it will prove much more impactful to be mindful of the power consumption and associated carbon emission of everyday tasks. Efficient heaters and lighting, insulation, green energy production and green travel methods will always beat, by a landslide, preferring e.g. a RandomizedSearchCV() over a GridSearchCV() when doing common data science.

That does not mean, however, that efforts to reduce the carbon footprint of common data science are in vain. Mindfulness of the environmental impact of data science remains important, whether it is applied to common or large-scale data science, and the methods described in this work can be used for both. Such mindfulness has not yet been reached, given the absence of such numbers in hyper-scale data science projects such as GPT-3 and DALL⋅E, but also in most other data science related works. If the reporting of the carbon footprint of data science becomes ubiquitous, independent of the scale of the data science project, it would unlock the possibility of assessing the contribution of common and advanced data science to global GHGe to a much more precise and realistic degree.

CRediT authorship contribution statement

Bjorge Meulemeester: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing. David Martens: Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Writing – original draft, Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

How the data on energy usage is obtained is described in detail. We encourage readers to report the same numbers on their own architecture and data science project.

References

[1] M. Hiransha, E.A. Gopalakrishnan, V.K. Menon, K. Soman, NSE stock market prediction using deep-learning models, Procedia Comput. Sci. 132 (2018) 1351–1362.
[2] A. Toth, L. Tan, G. Di Fabbrizio, A. Datta, Predicting shopping behavior with mixture of RNNs, eCOM@SIGIR, 2017.
[3] T.B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D.M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, D. Amodei, Language models are few-shot learners, 2020, http://dx.doi.org/10.48550/ARXIV.2005.14165, URL: https://arxiv.org/abs/2005.14165.
[4] A. Rizwan, A. Zoha, I.B. Mabrouk, H.M. Sabbour, A.S. Al-Sumaiti, A. Alomainy, M.A. Imran, Q.H. Abbasi, A review on the state of the art in atrial fibrillation detection enabled by machine learning, IEEE Rev. Biomed. Eng. 14 (2020) 219–239.
[5] M. Holmstrom, D. Liu, C. Vo, Machine learning applied to weather forecasting, Meteorol. Appl. 10 (2016) 1–5.
[6] L.F.W. Anthony, B. Kanding, R. Selvan, Carbontracker: Tracking and predicting the carbon footprint of training deep learning models, 2020, arXiv:2007.03051.
[7] G. Cook, J. Lee, T. Tsai, A. Kongn, J. Deans, B. Johnson, E. Jardim, B. Johnson, Clicking Clean: Who Is Winning the Race to Build a Green Internet? Technical report, Greenpeace, 2017.
[8] E. Masanet, A. Shehabi, N. Lei, S. Smith, J. Koomey, Recalibrating global data center energy-use estimates, Science 367 (6481) (2020) 984–986, http://dx.doi.org/10.1126/science.aba3758.
[9] Z. Cao, X. Zhou, H. Hu, Z. Wang, Y. Wen, Towards a systematic survey for carbon neutral data centers, 2022, arXiv:2110.09284.
[10] L. Belkhir, A. Elmeligi, Assessing ICT global emissions footprint: Trends to 2040 & recommendations, J. Clean. Prod. 177 (2018) 448–463, http://dx.doi.org/10.1016/j.jclepro.2017.12.239, URL: https://www.sciencedirect.com/science/article/pii/S095965261733233X.
[11] IEA, Global data centre energy demand by data centre type, 2010–2022, 2021, https://www.iea.org/data-and-statistics/charts/global-data-centre-energy-demand-by-data-centre-type-2010-2022. (Last accessed 08 March 2022).
[12] R. Bashroush, A. Lawrence, Beyond PUE: Tackling IT's Wasted Terawatts, Uptime Institute, 2020, Available from: https://uptimeinstitute.com/beyond-puetackling-it's-wasted-terawatts. (Accessed 11 June 2020).
[13] N. Rteil, R. Bashroush, R. Kenny, A. Wynne, Interact: IT infrastructure energy and cost analyzer tool for data centers, Sustain. Comput. Inform. Syst. 33 (2022) 100618, http://dx.doi.org/10.1016/j.suscom.2021.100618, URL: https://www.sciencedirect.com/science/article/pii/S2210537921001062.
[14] LinkedIn News, LinkedIn jobs on the rise 2022: The 25 U.S. roles that are growing in demand, 2022, https://www.linkedin.com/pulse/linkedin-jobs-rise-2022-25-us-roles-growing-demand-linkedin-news. (Last accessed 28 February 2022).
[15] LinkedIn News, LinkedIn jobs on the rise 2022: The roles that are growing in demand, 2022, https://www.linkedin.com/pulse/linkedin-jobs-rise-2022-roles-growing-demand-linkedin-news-europe/. (Last accessed 28 February 2022).
[16] A. Ramesh, M. Pavlov, G. Goh, S. Gray, C. Voss, A. Radford, M. Chen, I. Sutskever, Zero-shot text-to-image generation, 2021, http://dx.doi.org/10.48550/ARXIV.2102.12092, URL: https://arxiv.org/abs/2102.12092.
[17] B. Meulemeester, D. Martens, How sustainable is ‘‘common’’ data science in terms of power consumption? 2022, URL: https://github.com/bgmeulem/EmissionCommonDS.
[18] electricpoint, Electric heating room size calculator, 2022, https://www.electricpoint.com/heating/electric-heating/how-to-calculate-kw-required-to-heat-a-room. (Last accessed 01 March 2022).
[19] ICAO, ICAO carbon emissions calculator, 2016, https://www.icao.int/environmental-protection/CarbonOffset/Pages/default.aspx. (Last accessed 01 March 2022).
[20] Passmark, CPU mega list, 2022, https://www.cpubenchmark.net/CPU_mega_page.html. (Last accessed 01 March 2022).
[21] European Environment Agency EEA, Greenhouse gas emission intensity of electricity generation by country, 2021, https://www.eea.europa.eu/data-and-maps/daviz/co2-emission-intensity-9. (Last accessed 04 February 2022).
[22] Kyoto Protocol, UNFCCC website, 1997, Available online: http://unfccc.int/kyoto_protocol/items/2830.php. (Accessed 1 January 2011).
[23] European Environment Agency EEA, CO2 performance of new passenger cars in Europe, 2021, https://www.eea.europa.eu/ims/co2-performance-of-new-passenger. (Last accessed 04 February 2022).
[24] A. Menezes, A. Cripps, R. Buswell, J. Wright, D. Bouchlaghem, Estimating the energy consumption and power demand of small power equipment in office buildings, Energy Build. 75 (2014) 199–209, http://dx.doi.org/10.1016/j.enbuild.2014.02.011, URL: https://www.sciencedirect.com/science/article/pii/S0378778814001224.
[25] I. Cutress, Why Intel processors draw more power than expected: TDP and turbo explained, 2018, https://www.anandtech.com/show/13544/why-intel-processors-draw-more-power-than-expected-tdp-turbo. (Last accessed 08 March 2022).
