
Analyzing Software Energy Usage via Natural Language Processing of StackOverflow Queries
S. Deepajothi1,*, Nour Ali2, Mithileysh Sathiyanarayanan3
1 SRM Institute of Science and Technology, Kattankulathur Campus, Chengalpattu, 603 203,
*Corresponding Author Email: phddeepajothis@gmail.com
2 Computer Science, College of Engineering, Design and Physical Sciences, Brunel University,
London, UK. Email: nour.ali@brunel.ac.uk
3 University of London, UK MIT Square, London. Email: mithileysh@mitsquare.com
Abstract
The transition to green software engineering requires design decisions that reduce
software's Energy Consumption (EC). Conventionally, EC was associated directly with
electrical and computer engineering and was treated as a matter of hardware optimization in
embedded systems. Though GPUs and other hardware continue to govern EC, more
responsibility is shifting to the developers of operating systems and other applications.
Most of the work in the literature helps developers model the EC of applications and
attempts to reduce energy use with the help of software repositories. Among these
repositories, Q&A websites such as StackOverflow are valuable because they give a
comprehensive picture of the energy-related issues faced by developers and users. Moreover,
several studies of energy-related posts on StackOverflow revealed that programmers needed
clarification in many areas of energy consumption. This study relies mainly on
StackOverflow data, curated with the help of Natural Language Processing (NLP) to pinpoint
queries relevant to software EC. The initial data collection uses the Stack Exchange Data
Explorer with SQL queries filtered by tag and date. Posts created from 01-01-2014 to
31-03-2022 are considered. The curated sample consists of 9484 questions and 10770 answers.
Statistical analysis and NLP techniques, such as topic modeling, are used to mine the text
of the questions and answers. It is observed that the number of queries is increasing. With
the spread of mobile devices, energy consumption issues have diversified and now extend to
mobile platforms. The finding is that developers believe they can learn to improve energy
efficiency in several ways. The desire to learn is apparent in the comments, which indicate
that they use the know-how of other developers, tools, example code, documentation, etc., to
improve their knowledge.
Keywords: Energy Consumption, Mining, StackOverflow, Natural Language Processing,
Energy Efficiency, Software
1. INTRODUCTION
Over the past 10 years, there has been a marked change in the way individuals and
enterprises use computing devices. Everyday users have adopted computing platforms such as
smartphones, watches, and tablets, which has changed how they interact with software.
Stationary platforms such as data centers and desktop environments have also become common.
Most of the computing devices that everyday users rely on are battery-driven; thus, even a
small, well-optimized task, such as texting, consumes energy [1]. Conversely, poorly
optimized software can waste energy and drain the battery much faster. As a result, it is
essential to formulate efficient, innovative solutions with reliable results that span the
OS, runtime systems, and hardware/architecture.
Formerly, the assumption was that hardware use was the primary source of Energy
Consumption (EC), so attention was paid to tasks that consume less energy. Recently, it has
been found that running software can account for as much energy use as the hardware itself
and thus shares responsibility for EC. Over the last decade, many studies have sought to
make green software design a core development concern in order to improve the EC of
software systems overall. Most of these studies examine EC from the higher layers of the
computing stack and application software [2]. These approaches also aim to complement
existing OS/hardware-centric optimizations so that their gains are not lost at the
application level when software applications are improved. Thus, qualitative and
quantitative studies [3, 4, 5, 6] have conducted in-depth surveys of developers' knowledge
of software EC.
Energy efficiency can be influenced by programmers’ knowledge and design choices [1].
Software developers must therefore analyze and apply techniques that help conserve energy
while tackling real-world problems. For instance, software developers/programmers should
come to understand how energy ‘behaves’ at various levels and design prudently to use
energy effectively [18]. Though extensive studies have been done on software EC, and several
remedies have been suggested, programmers' perspective on EC has not received much
attention. There are no clear answers to common and simple questions about software types
and energy usage, or about requirements and EC.
This work aims to gain better clarity on software engineers' views of software EC and how
they deal with it. Many researchers have studied this issue, and qualitative reports
summarizing the experiences of skilled developers have been published [2, 7, 8]. NLP helps
identify sentiments, topics, the location of entities in sentences, and the category of a
blog or article. Text mining involves exploring substantial textual data and locating
patterns, which includes identifying frequent words, sentence length, and the absence or
presence of specific words. NLP identifies co-occurring keywords in order to summarize a
large collection of textual information. This helps to discover hidden topics in a
collection of documents, annotate the documents with these topics, and organize
unstructured data. NLP thus helps in mining queries that are relevant to software energy
consumption [12].
Among software help repositories, StackOverflow is popular and gives a comprehensive
idea of issues faced by developers and users related to energy. Its discussions were
analyzed to obtain better insight into this issue and to evaluate developers' knowledge of
software energy consumption and efficiency. More specifically, the questions that needed
answers include:
RQ1: What questions about software energy consumption have been raised on
StackOverflow?
RQ2: What is the dominant topic of discussion related to energy in StackOverflow?
RQ3: What common solutions are suggested for software EC to the raised
questions/issues?
The study utilized data from StackOverflow, a popular Q&A website in software
development with over 14 million registered users, more than 21 million queries, and 31
million answers [19]. The data was easily accessible through the Stack Exchange Data
Explorer tool, which allows arbitrary SQL queries to be run on the public data. The main
findings of this study are the following:
• It was found that specifying energy requirements directly is complex, and developers tend
to focus on general patterns and high-level designs regarding energy usage.
• The dominant topic of discussion related to energy was software design, and empirical
studies on its impact are needed.
• The study revealed that software engineering positively impacts EC in different devices,
and common solutions suggested included optimizing code for efficiency, reducing
resource usage, and using hardware-level power management features.
• Developers wanted to learn how to improve energy efficiency and used various
resources such as examples, documentation, and tools to enhance their knowledge.
The rest of the paper is organized into 4 sections, presenting the methodology,
findings, related works, and conclusion.
2. RESEARCH METHODOLOGY
In this section, we outline the procedure for collecting data and describe the qualitative
research methodology that we use.
Data from StackOverflow, a popular Q&A website, was used for our study. It is considered
one of the most prominent forums in software development and is widely used in software
engineering studies. Its popularity is shown by the fact that, as of March 2021, it had 14
million registered users and had received more than 21 million queries and 31 million
answers. Moreover, the data is easily accessible through Stack Exchange Data Explorer, an
open-source tool for running arbitrary SQL queries against public data from the Stack
Exchange network.
Another way of identifying appropriate answers for the queries is through the ‘tags’
associated with the questions [20]. Though tags act as a helpful navigator, their
disadvantage is inconsistency: synonyms, hyponyms, acronyms, and spelling variations create
ambiguity, so a tag may not reflect the actual text content of a user post. At times, these
inconsistencies lead to a condition called ‘tag explosion’, where certain tags become
overused.
The flow of the investigation is shown in Figure 1.

Figure 1. Steps in the Investigation


This paper focuses on software-related questions, and the terms “Energy Efficiency (EE)”
and “Energy Consumption (EC)” are used for data collection. In the initial stage, the
questions, answers, tags, and other metadata were extracted from the StackOverflow dump.
This data is filtered to find the questions related to energy consumption, and the tags
attached to these questions are then used to retrieve further related questions from
StackOverflow. The Stack Exchange Data Explorer is used, and data collection is done using
SQL queries filtered by tag and date. The tags include power management, power-saving, CPU
usage, energy, EE, EC, CPU architecture, cuda, and battery. The study period runs from
01-01-2014 to 31-03-2022. Appendix I shows a partial sample of the data collected with the
help of Stack Exchange Data Explorer. The sample consists of 9484 questions and 10770
answers. The data is semi-structured: the question and answer bodies are unstructured text,
accompanied by standard fields such as ID, CreationDate, Score, ViewCount, Title, and Tags
that can be used for filtering questions. Most generated content is untagged, as tags cannot
be attached directly to answers or comments. When tags are misused or applied
inappropriately, they might not represent the posts’ content [11].
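The tag-and-date filtering described above can be sketched as a small query builder. This is a minimal illustration, not the query used in the study: the table and column names (Posts, PostTypeId, CreationDate, Tags) follow the public Stack Exchange Data Explorer schema as we understand it, the storage of tags as '<tag1><tag2>' is an assumption, and the hyphenated tag spellings and the function name build_sede_query are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical spellings of some of the tags named in the text.
ENERGY_TAGS = ["power-management", "cpu-usage", "energy", "cuda", "battery"]


def build_sede_query(tags, start, end):
    """Return SQL text selecting questions that carry any of `tags`
    and were created between `start` and `end` (both inclusive)."""
    # Assumption: Posts.Tags stores tags as '<tag1><tag2>...'.
    tag_filter = " OR ".join(f"p.Tags LIKE '%<{t}>%'" for t in tags)
    # Use an exclusive upper bound so the whole last day is included.
    end_exclusive = end + timedelta(days=1)
    return (
        "SELECT p.Id, p.CreationDate, p.Score, p.ViewCount, p.Title, p.Tags\n"
        "FROM Posts p\n"
        "WHERE p.PostTypeId = 1  -- questions only\n"
        f"  AND p.CreationDate >= '{start.isoformat()}'\n"
        f"  AND p.CreationDate < '{end_exclusive.isoformat()}'\n"
        f"  AND ({tag_filter})\n"
        "ORDER BY p.CreationDate;"
    )


query = build_sede_query(ENERGY_TAGS, date(2014, 1, 1), date(2022, 3, 31))
print(query)
```

The generated text can be pasted into the Data Explorer; answers would need a second query joined on the question identifiers.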
Raw data should be restructured and preprocessed to remove redundancy, inconsistency,
and irregularity in order to improve the efficiency and quality of the classification
process. This is done by removing repeated words, stray punctuation, and non-English
characters, which improves the quality of the data. The text is then tokenized, and
re-tweets, hashtags, and URLs are also removed.
Stop words such as ‘and’, ‘are’, ‘this’, and so on occur frequently and are essential for
forming a meaningful sentence, but they do not contribute to the content or context of the
textual document and should therefore be removed. They contribute little to the
classification of the document. Stop-word lists vary across textual sources, which makes
choosing one challenging, but removing stop words shrinks the text data and enhances the
performance of the system [21].
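The cleaning and stop-word steps above can be illustrated with a short standard-library sketch. The stop-word list here is a tiny hypothetical sample, and the regular expressions are one possible reading of "removing punctuation, non-English characters, and URLs"; the study's actual pipeline is not specified at this level of detail.

```python
import re

# A tiny illustrative stop-word list (hypothetical); a real pipeline
# would use a much fuller list.
STOP_WORDS = {"and", "are", "this", "the", "is", "a", "an", "of", "to",
              "in", "my", "i", "how", "do"}

TAG_RE = re.compile(r"<[^>]+>")         # HTML markup left in post bodies
URL_RE = re.compile(r"https?://\S+")    # links
NON_ALPHA_RE = re.compile(r"[^a-z\s]")  # punctuation, digits, non-English characters


def preprocess(text):
    """Lower-case, strip markup/URLs/punctuation, tokenize, drop stop words."""
    text = text.lower()
    text = TAG_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    text = NON_ALPHA_RE.sub(" ", text)
    tokens = text.split()  # whitespace tokenization
    return [tok for tok in tokens if tok not in STOP_WORDS]


print(preprocess("<p>How do I reduce the battery drain of my app? See https://example.com</p>"))
```

Whitespace tokenization is the simplest possible choice; it is used here only to keep the example self-contained.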
The next essential step is the ‘stemming’ of words. Stemming algorithms reduce the
grammatical forms in the morphological structure of the language. By stemming, a word is
taken to its root level: various grammatical forms, such as adjectives, adverbs, nouns, and
verbs, are reduced to their root form [13]. Stemming can be considered the conflation of
several forms of a word into a single word. To illustrate, the terms close, closed, and
closing can all be stemmed to close, which is the word’s root form. Stemming is usually
done by removing attached prefixes and/or suffixes from the index term. As the stem of a
term represents a broader concept than the original term, the number of retrieved instances
increases. Because the meaning is the same while the word form differs, it is essential to
identify every word with its base form. The basic idea is to reduce the total number of
distinct terms in a document, which reduces the processing time of the final output.
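Suffix removal can be shown with a deliberately naive stemmer. This is not the Porter algorithm or the stemmer used in the study; the suffix list and the minimum stem length are assumptions chosen for illustration. Note that a stem need not be a dictionary word: the sketch maps the close family to "clos", which is enough for conflation.

```python
# Suffixes are tried in this order; the list is an illustrative assumption.
SUFFIXES = ("ing", "ed", "es", "s", "e")
MIN_STEM_LEN = 3  # avoid reducing short words to almost nothing


def naive_stem(word):
    """Strip the first matching suffix, keeping at least MIN_STEM_LEN letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word


forms = ["close", "closed", "closing"]
stems = {naive_stem(w) for w in forms}
print(stems)  # three surface forms collapse into one distinct term
```

Collapsing several surface forms into one term is what shrinks the vocabulary that later steps have to process.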
Another NLP technique is topic modeling, which extracts topics automatically from a
corpus of textual data. Topic models generate interpretable, semantically consistent
topics, each of which can be represented by listing the most probable words that describe
it. A topic model is defined as a generative model for documents, as it describes the
generation of documents through a probabilistic procedure. When a new document is composed,
a distribution over topics is selected. Subsequently, for every word in the document, a
topic is chosen randomly according to this distribution, and a word is selected from that
topic. Standard statistical techniques can then be used to infer the set of topics
responsible for generating a document collection, thus reversing the modeled process of
authoring [14].
A popular topic-modeling technique is Latent Dirichlet Allocation (LDA), which helps
generate topics from a corpus of data [22]. It is commonly used in several domains,
including studies that mine software repositories. The model provides flexibility by
treating a document as a member of multiple topics and treating a topic as a mixture of
words, where a word may belong to several topics. In the StackOverflow setting, a document
is defined as a single thread, including a question with its title and body and the
appropriate answer. LDA can be considered a generative probabilistic model of a corpus. The
basic idea is that documents are represented as random mixtures of latent topics, where
each topic is characterized by a distribution over words. LDA thus represents a topic
through word probabilities. In each topic, the words with the highest probability usually
give a good idea of what the topic is about [23].
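Representing a topic by its most probable words amounts to sorting the topic's word distribution. The sketch below assumes a topic is available as a word-to-probability mapping; the probabilities shown are hypothetical and are not results from the study.

```python
def top_words(topic_word_probs, k=3):
    """Return the k most probable words of a topic, most probable first."""
    ranked = sorted(topic_word_probs.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:k]]


# Hypothetical word distribution for one topic (values sum to 1).
example_topic = {"battery": 0.30, "power": 0.25, "android": 0.20,
                 "drain": 0.15, "wifi": 0.10}
print(top_words(example_topic, k=3))
```

A reader would typically label this topic by inspecting the returned words, which is how topic lists are usually interpreted.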
LDA models a corpus through an unsupervised generative probabilistic method and is
considered the most widely used topic-modeling technique. According to LDA, every document
is considered a probabilistic distribution over latent topics, and all documents share a
common Dirichlet prior. In the LDA model [24], each latent topic is likewise represented
through a probabilistic distribution over words, and the word distributions share a common
Dirichlet prior. Given a corpus D consisting of M documents, with document d having N_d
words (d ∈ {1,..., M}), LDA models D according to the following generative process [16]:
(a) Choose a multinomial distribution ψ_t for topic t (t ∈ {1,..., T}) from a Dirichlet
distribution with parameter β.
(b) Choose a multinomial distribution θ_d for document d (d ∈ {1,..., M}) from a Dirichlet
distribution with parameter α.
(c) For a word w_n (n ∈ {1,..., N_d}) in document d, choose a topic z_n from θ_d and then
choose the word w_n from ψ_{z_n}.
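The generative process can be simulated directly with the standard library. In this sketch, step (c) follows the usual LDA formulation, consistent with the description of topic models earlier in this section: a topic is drawn from the document's distribution, then a word is drawn from that topic's distribution. The function names, vocabulary, and parameter values are hypothetical, and the code generates synthetic documents only; it does not perform inference.

```python
import random


def sample_dirichlet(concentration, size, rng):
    """Draw from a symmetric Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(vocab, num_topics, doc_lengths, alpha, beta, seed=0):
    """Generate one synthetic document per entry of doc_lengths."""
    rng = random.Random(seed)
    # (a) one word distribution psi_t per topic, drawn with parameter beta
    psi = [sample_dirichlet(beta, len(vocab), rng) for _ in range(num_topics)]
    corpus = []
    for n_d in doc_lengths:
        # (b) one topic distribution theta_d per document, drawn with parameter alpha
        theta = sample_dirichlet(alpha, num_topics, rng)
        doc = []
        for _ in range(n_d):
            # (c) draw a topic for this word position, then a word from that topic
            z = rng.choices(range(num_topics), weights=theta)[0]
            w = rng.choices(vocab, weights=psi[z])[0]
            doc.append(w)
        corpus.append(doc)
    return corpus


vocab = ["battery", "gpu", "ble", "kernel"]
corpus = generate_corpus(vocab, num_topics=2, doc_lengths=[5, 3, 8],
                         alpha=0.5, beta=0.5, seed=42)
for doc in corpus:
    print(doc)
```

Fitting LDA to real posts reverses this process: given only the words, the topic and document distributions are estimated.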

3. EMPIRICAL EVALUATION
This section presents our findings from analyzing the data collected from
StackOverflow. At a high level, we are interested in answering three main research questions.
3.1 RQ1 What questions related to software EC have been raised on StackOverflow?
To answer RQ1, the questions related to software EC raised on StackOverflow are
investigated. To this end, we relate each question's score to the number of answers it
received. To answer RQ2, we used the tags of the queries to find the dominant topic of
discussion related to energy in StackOverflow. Topics generated through LDA also help
identify the dominant topics [25]. Finally, for RQ3, the common solutions recommended for
the raised software EC questions/issues are identified from the types of devices and the
solutions offered. The text content of user posts from the StackOverflow website is used to
find technology trends over time and the topics discussed among developers.

Figure 1: Questions and answers per year.


Among the posts from 01-01-2014 to 31-03-2022, the number of queries related to EE is
9,484. Figure 1 presents the yearly distribution of questions and answers related to energy
topics on the StackOverflow platform. The data showcases the engagement of the
StackOverflow community in discussing energy-related queries and providing solutions.
Starting from 2014, the number of energy-related questions has fluctuated, ranging from 1000
to 1392 questions over the years. Correspondingly, the answers provided by the community
have also varied, with a peak of 1666 answers observed in 2021-22. This dataset indicates a
consistent interest in energy-related topics within the developer community, with certain
years witnessing increased activity, potentially reflecting advancements, challenges, and
evolving discussions surrounding energy-related software development practices and
concerns.
The quality of each post, according to users, is collaboratively evaluated using a
voting system. Each question or answer can receive up-votes or down-votes from users, with
the difference (up-votes minus down-votes) acting as its overall voting score. The votes
awarded to a user’s posts are accumulated in their ‘reputation’, another score associated
with individual users that is intended to identify expert users [17]. The higher the score,
the better the post is judged to be. Scores from 1 to 5 account for 90.53%, scores from 6
to 10 for 6.09%, and scores above 10 for 3.48%. The highest scores exceed 200, as seen in
the queries listed in footnotes 1, 2, 3, and 4.
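The score arithmetic and the three score ranges can be written out as follows. This is an illustrative sketch: the bucket for scores below 1 is an assumption (the text does not say how such posts are handled), and the example scores are hypothetical rather than taken from the dataset.

```python
from collections import Counter


def voting_score(up_votes, down_votes):
    """Overall voting score of a post: up-votes minus down-votes."""
    return up_votes - down_votes


def score_bucket(score):
    """Assign a score to one of the ranges used in the text."""
    if score > 10:
        return ">10"
    if score >= 6:
        return "6-10"
    if score >= 1:
        return "1-5"
    return "<1"  # assumption: not covered by the ranges in the text


def bucket_shares(scores):
    """Percentage of posts falling in each score range."""
    counts = Counter(score_bucket(s) for s in scores)
    return {bucket: 100.0 * n / len(scores) for bucket, n in counts.items()}


example_scores = [1, 2, 3, 7, 12]  # hypothetical
print(bucket_shares(example_scores))
```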
The ViewCount refers to the number of times a question has been viewed. Though the
number of queries is more or less in the same range, the views have been much higher in the
last two to three years, showing increased interest in EE. Each question can have any
number of answers or responses. The answers can reflect different methods to handle a
query. This gives an idea of how developers resolve EC issues. The comments are not answers
to the question but opinions on whether the said answer or solution was effective. Some of
the queries have a maximum comment count (footnote 5). The total counts for ViewCount,
Answers, and Comments are shown in Appendix II.
On the platform, only one answer to a question can be marked as accepted. This rule is
used to determine the success of a question, which is categorized as follows: a successful
question has an accepted answer, an ordinary question has answers but none is accepted, and
an unsuccessful question has no answers. Using these definitions, Table 1 compares the
share of questions in each category per year within our dataset.
1 https://stackoverflow.com/questions/31326015
2 https://stackoverflow.com/questions/50622525
3 https://stackoverflow.com/questions/53422407
4 https://stackoverflow.com/questions/25185405
5 https://stackoverflow.com/questions/21826693
Table 1: The success status of questions on StackOverflow
Year  Successful (%)  Ordinary (%)  Unsuccessful (%)
2014  57.929          32.077        9.994
2015  54.131          31.713        14.156
2016  48.641          33.288        18.071
2017  42.308          33.987        23.705
2018  42.761          33.461        23.778
2019  41.932          32.386        25.682
2020  41.291          28.745        29.963
2021  32.496          31.315        36.189
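The success categories behind Table 1 can be expressed as a small classifier plus a per-year tally. The sketch assumes each question is a record with a creation year, an answer count, and an accepted-answer identifier that is None when no answer was accepted; these field names are hypothetical. It also treats any question with at least one unaccepted answer as ordinary, which is our reading of the definition. The sample records are invented for illustration.

```python
from collections import Counter, defaultdict

STATUSES = ("successful", "ordinary", "unsuccessful")


def question_status(accepted_answer_id, answer_count):
    """Classify a question by the definitions used for Table 1."""
    if accepted_answer_id is not None:
        return "successful"
    if answer_count > 0:
        return "ordinary"
    return "unsuccessful"


def status_share_by_year(questions):
    """Return {year: {status: percentage}} for a list of question records."""
    counts = defaultdict(Counter)
    for q in questions:
        status = question_status(q["accepted_answer_id"], q["answer_count"])
        counts[q["year"]][status] += 1
    return {
        year: {s: 100.0 * c[s] / sum(c.values()) for s in STATUSES}
        for year, c in counts.items()
    }


# Hypothetical records, not data from the study.
sample = [
    {"year": 2014, "accepted_answer_id": 101, "answer_count": 2},
    {"year": 2014, "accepted_answer_id": 102, "answer_count": 1},
    {"year": 2014, "accepted_answer_id": None, "answer_count": 3},
    {"year": 2014, "accepted_answer_id": None, "answer_count": 0},
]
print(status_share_by_year(sample))
```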
In our analysis of StackOverflow queries, we examined several key metrics, including
Score, ViewCount, Answers, Comments, and FavoriteCount. These metrics provide valuable
insights into the engagement and popularity of questions within the StackOverflow
community. Table 2 shows the Average, Standard Deviation, and Median for the Score,
ViewCount, Answers, Comments, and FavoriteCount.
The average score, representing the cumulative votes received by questions, indicates
the queries' overall perceived quality and relevance. Questions garnered 2.956 votes on
average, suggesting a generally positive community response. A higher average Answers
count signifies a more engaged and responsive community with a rich pool of knowledge-
sharing. The standard deviation for Score, ViewCount, Answers, Comments, and
FavoriteCount measures the variability or dispersion of these metrics across all questions. A
higher standard deviation indicates a wider range of values, signifying varying degrees of
engagement and interest among questions.
Table 2: The Statistics of Questions
Metric         Average   Std. Dev  Median
Score          2.956     8.189     2
ViewCount      2604.612  11321.85  779.5
Answers        1.136     1.009     1
Comments       2.138     2.873     1
FavoriteCount  0.856     3.374     0
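The three statistics reported in Table 2 can be computed per metric with Python's statistics module. Whether the study used the sample or the population standard deviation is not stated; the sketch uses the sample version (statistics.stdev) as an assumption, and the input values are hypothetical.

```python
import statistics


def summarize(values):
    """Average, standard deviation, and median of one metric."""
    return {
        "average": statistics.mean(values),
        "std_dev": statistics.stdev(values),  # assumption: sample standard deviation
        "median": statistics.median(values),
    }


example_scores = [0, 1, 2, 2, 10]  # hypothetical question scores
print(summarize(example_scores))
```

A mean well above the median, as in the Score and ViewCount rows of Table 2, indicates a skewed distribution in which a few questions attract most of the attention.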

Summary of RQ1: Our analysis of question categories over the years revealed
changing trends. In 2020, 50.27% of questions were successful, 27.32% were ordinary, and
22.4% were unsuccessful. In contrast, 2016 had 37.64% successful, 43.25% ordinary, and
19.11% unsuccessful questions. Notably, 2017 showed a shift with 45.43% successful, 31.2%
ordinary, and 23.37% unsuccessful questions. These figures highlight the dynamic nature of
question categories in energy consumption.
3.2 RQ2 What is the dominant topic of discussion related to energy in StackOverflow?
The top 5 topics discussed are Energy, Bluetooth, cuda, Android, and C++.
Energy. Questions about energy, for example, “My current project is a constant-
presence application (think Tinder or Foursquare), and the battery consumption is through the
roof. We think the main draw on power is the GPS and Wi-Fi antennas. We want to be able
to measure our app's energy usage under several different configurations.”6
“I'm running a few Blockchain-related containers in a cloud environment (Google
Compute Engine). I want to measure the containers' EC or the instance I'm running.”7
6 https://stackoverflow.com/questions/24831813
7 https://stackoverflow.com/questions/61509608
Bluetooth. Questions about Bluetooth, for example, “A Bluetooth low energy device
is uniquely identified by its address (in the Android API, they call this the MAC address and
denote it as colon-separated hex values, e.g., 11:aa:22:bb:33:cc). But to uniquely identify a
BLE address, you need to know whether it's public or private. In essence, 49 bits are
necessary to identify an address, not 48.”8
“I can send data up to 20 bytes by connecting to an external BLE device. How do
I send data greater than 20 bytes? I have read that we must either fragment the data or split
characteristics into required parts. If I assume my data is 32 bytes, could you tell me what
changes I need to make in my code to get this working? Following are the required snippets
from my code:…”9
Cuda. Questions about cuda, for example, “I was just reading: Efficiently dividing the
unsigned value by a power of two, rounding up
(https://stackoverflow.com/questions/40431599/efficiently-dividing-unsigned-value-by-a-power-of-two-rounding-up)
and I was wondering what the fastest way was to do this in CUDA. Of course,
by "fast" I mean in terms of throughput (that question also addressed the case of subsequent
calls depending on each other)….”10
“I'm working on a business project that is done in Java, and it needs huge computation
energy to compute business markets - simple math but with a huge amount of data.
We ordered some CUDA GPUs to try it with, and since CUDA does not support Java,
I'm wondering where to start. Should I build a JNI interface? Should I use JCUDA, or are
there other ways?...”11
Android. Questions about Android, for example, “I have a smartphone connected to a
solar charger. By day, it is powered correctly. But during the night, sometimes, it turns itself
off due to the lack of energy. My question is: Can it turn it back on (programmatically) when
the battery charge exceeds a certain percentage?..”12
“After upgrading to Android version 6.0, Bluetooth Low Energy (BLE) scanning will
only work if Location services are enabled. See here for reference: Bluetooth Low Energy
startScan on Android 6.0 does not find devices
(https://stackoverflow.com/questions/33043582/bluetooth-low-energy-startscan-on-android-6-0-does-not-find-devices/33045489#33045489)…”13
8 https://stackoverflow.com/questions/23471364
9 https://stackoverflow.com/questions/24135682
10 https://stackoverflow.com/questions/43564727
11 https://stackoverflow.com/questions/22866901
12 https://stackoverflow.com/questions/34601041
13 https://stackoverflow.com/questions/33045581
C++. Questions about C++, for example: “I am using C++/winRT UWP to discover
and connect to Bluetooth Low Energy devices. I am using the advertisement watcher to look
for advertisements from devices I can support. This works. Then, I pick one to connect to.
The connection procedure is a little weird by my way of thinking, but according to the
Microsoft docs, one calls this FromBluetoothAddressAsync() with the BluetoothAddress,
and two things happen: one gets the BluetoothLEDevice, AND a connection attempt is made.
One needs to register a handler for the connection status changed event, BUT you can't do
that until you get the BluetoothLEDevice.”14
“We were working on our audio player project on Mac and noticed that the power
usage was so high (about 7x that of Google Chrome doing the same workload.) I used
Xcode's energy profiling tool, and one of the problems was we had too much cpu-wake
overhead.”15
Table 3 presents a breakdown of questions and answers across different themes and
categorizes questions as successful, ordinary, or unsuccessful within each category.
Another key mechanism on the Stack Overflow site is the use of tags to identify the
content or theme of each post. When a user asks a question, the platform prompts them to add
a small number of content tags (at least one and at most five). A total of 480 tags were used
in the 9,484 posts. Figure 9 shows the word cloud of all tags, and Figure 10 shows the
dominant words in titles. The dominant words in tags and titles were found using
statistical methods.
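Counting tag frequencies, the input to a tag word cloud, can be sketched as below. The sketch assumes each question's Tags field is a string of the form '<tag1><tag2>', which is how the Data Explorer has commonly exposed tags; the sample rows are hypothetical.

```python
import re
from collections import Counter

TAG_PATTERN = re.compile(r"<([^<>]+)>")  # assumption: tags stored as '<a><b>'


def count_tags(tag_fields):
    """Count how often each tag appears across all questions."""
    counts = Counter()
    for field in tag_fields:
        counts.update(TAG_PATTERN.findall(field))
    return counts


# Hypothetical Tags values for three questions.
rows = ["<android><bluetooth-lowenergy>", "<cuda><c++>", "<android><battery>"]
tag_counts = count_tags(rows)
print(len(tag_counts), "distinct tags")
print(tag_counts.most_common(2))
```

The same counter applied to preprocessed title tokens would yield the dominant title words.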
Summary of RQ2: Our analysis categorized EC questions into five themes: Energy,
Bluetooth, cuda, Android, and C++. Notably, roughly 30% or fewer of the questions in each
of these categories remained unanswered. Surprisingly, the cuda category, despite having
the highest number of questions, exhibited the highest success rate and the lowest rate of
unanswered questions. In contrast, Android questions had the lowest percentage of
successful questions. These findings shed light on the varying dynamics and engagement
levels within different thematic categories of EC-related queries on StackOverflow.
14 https://stackoverflow.com/questions/55768125
15 https://stackoverflow.com/questions/29310100
Table 3: Common EC themes with their status and popularity
Topic      Q     A     A/Q    Successful  Ordinary  Unsuccessful
Energy     1407  1501  1.067  507         502       398
Bluetooth  2987  3100  1.038  1007        1095      885
Cuda       3406  4267  1.253  2239        902       265
Android    2101  2225  1.059  673         792       636
C++        608   724   1.191  362         167       79

Topic      Score  ViewCount  Answers  Comments  FavoriteCount
Energy     4589   4033547    1501     2753      1479
Bluetooth  2987   7673004    3100     5453      2448
cuda       10904  10718605   4267     8587      3475
Android    6558   5264212    2225     3832      1783
C++        1572   1216161    724      1833      442
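The derived quantities in Table 3 follow from the raw counts by simple arithmetic: A/Q is answers divided by questions, and the success and unanswered rates are the successful and unsuccessful counts divided by the number of questions. The sketch below recomputes them from the table's own figures; only the variable and function names are ours.

```python
# (questions, answers, successful, ordinary, unsuccessful), copied from Table 3.
TABLE3 = {
    "Energy": (1407, 1501, 507, 502, 398),
    "Bluetooth": (2987, 3100, 1007, 1095, 885),
    "Cuda": (3406, 4267, 2239, 902, 265),
    "Android": (2101, 2225, 673, 792, 636),
    "C++": (608, 724, 362, 167, 79),
}


def theme_rates(table):
    """Compute A/Q and the success/unanswered percentages per theme."""
    rates = {}
    for topic, (q, a, successful, ordinary, unsuccessful) in table.items():
        # The three status counts should partition the questions.
        assert successful + ordinary + unsuccessful == q
        rates[topic] = {
            "answers_per_question": round(a / q, 3),
            "success_pct": 100.0 * successful / q,
            "unanswered_pct": 100.0 * unsuccessful / q,
        }
    return rates


rates = theme_rates(TABLE3)
for topic, r in rates.items():
    print(f"{topic}: A/Q={r['answers_per_question']}, "
          f"success={r['success_pct']:.1f}%, unanswered={r['unanswered_pct']:.1f}%")
```

Ranking the themes by success_pct and unanswered_pct gives the ordering discussed in the summary of RQ2.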

Figure 9: Word Cloud of Tags

Figure 10: Word Cloud of Dominant Words in Title


3.3 RQ3 What common solutions are suggested for software EC to the raised
questions/issues?
In the qualitative analysis, the most common themes regarding reducing EC are
distinguished as follows:
Optimizing code for efficiency: Code that lacks optimization, contains unnecessary
loops, or doesn't take advantage of hardware features can lead to inefficient execution and
increased energy consumption.16,17,18
Optimized Algorithms and Data Structures: Reducing resource usage by avoiding
costly operations and data structures, reducing software running frequency, and turning off or
reducing the intensity of background processes.19,20,21
Graphics performance: Graphics performance is a complex interplay of hardware
capabilities, software optimization, display characteristics, and user settings, all of which are
crucial to delivering a seamless and visually pleasing experience in graphics-intensive
applications and tasks. Graphics performance can be improved through software
optimization, including efficient rendering algorithms, culling techniques, and better multi-
threading to use multi-core CPUs.22,23,24
Background Processes and Services: Unnecessary background processes or services
running alongside the main application can drain energy. Developers often recommend
minimizing or optimizing such processes.25,26,27,28
Battery Management: Battery management is suggested through power-aware
scheduling, power capping, and EE algorithms and techniques.29,30,31
Energy-Aware Design Patterns: Power-aware design patterns, such as the
"Adaptive Computing" pattern, allow the software to adapt its behavior based on power
availability and constraints. Designing energy-efficient software architectures through
software componentization, system-level power management optimization, and creating
systems with energy efficiency in mind are also suggested.32,33,34

16 https://stackoverflow.com/questions/24135682
17 https://stackoverflow.com/questions/22817005
18 https://stackoverflow.com/questions/53002816
19 https://stackoverflow.com/questions/23354974
20 https://stackoverflow.com/questions/35978151
21 https://stackoverflow.com/questions/36071573
22 https://stackoverflow.com/questions/34567088
23 https://stackoverflow.com/questions/22128872
24 https://stackoverflow.com/questions/56699763
25 https://stackoverflow.com/questions/22785140
26 https://stackoverflow.com/questions/44215439
27 https://stackoverflow.com/questions/25067341
28 https://stackoverflow.com/questions/28255475
29 https://stackoverflow.com/questions/53002816
30 https://stackoverflow.com/questions/27290375
31 https://stackoverflow.com/questions/29317744
32 https://stackoverflow.com/questions/21237093
33 https://stackoverflow.com/questions/34617061
34 https://stackoverflow.com/questions/34374738
Energy-Aware Configuration: Developers provide users with options to configure
energy-saving settings within applications, such as screen brightness control and power-
saving modes.35,36
Finally, educating developers on energy-efficient coding practices and tools is vital
for promoting sustainability and reducing EC in software applications. This education
involves raising awareness about the environmental impact of energy-hungry software,
providing training on optimizing algorithms and resource usage, and introducing energy
profiling tools and best practices.37,38,39
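As one illustration of the energy-aware design patterns theme above, software can adapt how often it performs background work to the power available. The sketch below is hypothetical: the function name, the thresholds, and the intervals are assumptions chosen for readability, and reading the actual battery state is platform-specific and left out.

```python
def choose_sync_interval(battery_pct, charging):
    """Return the number of seconds to wait between background syncs.

    All thresholds and intervals are illustrative assumptions.
    """
    if charging:
        return 60        # power is plentiful: sync every minute
    if battery_pct >= 50:
        return 300       # healthy battery: every 5 minutes
    if battery_pct >= 20:
        return 900       # getting low: every 15 minutes
    return 3600          # critical: at most once an hour


for level, plugged in [(80, False), (35, False), (10, False), (10, True)]:
    print(level, plugged, choose_sync_interval(level, plugged))
```

The same structure applies to other adjustable costs, such as location polling frequency or rendering quality.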
With the increase in the use of mobile devices, EC has become a crucial concern for
developers. In recent years, there has been a significant increase in the number of mobile
applications; hence, EC issues are also prevalent in mobile devices. Android is a widely used
mobile platform, and the EC issues on this platform are also getting attention from
developers. The study suggests that software engineering practices can have a significant
impact on EC in different devices, including mobile devices. This is supported by the fact
that there have been various efforts to develop EE software and algorithms and design
devices that use less power. Overall, this finding highlights the importance of considering EC
in software engineering practices, especially in the context of mobile devices.
Summary of RQ3: The standard solutions suggested in Stack Overflow to reduce
software EC include optimizing code for efficiency and reducing resource usage by avoiding
costly operations and data structures, reducing software running frequency, and turning off or
reducing the intensity of background processes. Additionally, optimizing graphics
performance, improving battery management through power-aware scheduling and power
capping, and using EE algorithms and techniques are suggested. Designing EE software
architectures through software componentization, system-level power management
optimization, and creating systems with EE in mind are also presented. Finally, it is
recommended that developers be educated on EE coding practices and tools and encouraged
to consider EC during development.
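One of the suggestions above, reducing how often software runs, can be illustrated with a minimal sketch. It compares the number of wake-ups produced by fixed-interval polling with polling whose idle interval grows geometrically up to a cap. The function names and parameter values are hypothetical and chosen only for illustration, and the wake-up count is only a rough proxy for EC, not a measurement.

```python
def wakeups_fixed(duration_s, interval_s):
    """Count wake-ups when polling at a fixed interval for duration_s seconds."""
    return int(duration_s // interval_s)


def wakeups_backoff(duration_s, base_s, factor, max_s):
    """Count wake-ups when the idle polling interval grows geometrically up to max_s."""
    elapsed, interval, count = 0.0, base_s, 0
    while elapsed + interval <= duration_s:
        elapsed += interval          # sleep until the next poll
        count += 1                   # one wake-up per poll
        interval = min(interval * factor, max_s)  # back off while idle
    return count


# Hypothetical one-minute idle window: poll every second vs. back off 1, 2, 4, 8, 16, 16 s.
fixed = wakeups_fixed(60, 1)
backoff = wakeups_backoff(60, 1, 2, 16)
print(fixed, backoff)
```

In this idle scenario the backoff policy wakes the process far less often than fixed polling. Whether that translates into a real energy saving depends on the device and workload and would have to be checked with an energy profiler.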
35 https://stackoverflow.com/questions/52278215
36 https://stackoverflow.com/questions/28758612
37 https://stackoverflow.com/questions/24831813
38 https://stackoverflow.com/questions/22877054
39 https://stackoverflow.com/questions/45946559
4. Related Works
A qualitative study was conducted by [2] in an industrial context. The authors performed
an in-depth analysis of interviews with 10 experienced developers and summarized it into a
set of implications that clarify developers' knowledge in the following areas:
• Developers' understanding of software EC and of designing green software.
• The barriers that prevent the adoption of this concept.
• The tools and support that developers require from the company.
The results gave a better perspective on what is expected from developers, tool creators,
researchers, and the company's management in order to prioritize software design for the
development of green software.
In a survey conducted by [7], the following findings were observed:
• 122 programmers lacked proper training on EC.
• Programmers were not aware of the practices involved in reducing software EC.
• Programmers were unsure about the amount of EC by software.
An empirical study conducted by [8] provided insights from practitioners regarding EC in
software design, construction, and testing:
• The study included a quantitative survey targeting 464 practitioners from IBM, Google,
ABB, and Microsoft.
• The survey was supported by 18 in-depth interviews with Microsoft employees.
• The study's findings emphasized the need to contextualize existing green software
research.
The study by [9] provided directions for researchers to develop tools and strategies for
improving green usage in applications.
• Developers often rely on web-based online platforms, such as Stack Overflow, to seek
solutions for their programming issues.
• Stack Overflow is a popular community website where queries are answered by experts
and community members.
• The platform offers valuable information, insights, and ideas to programmers and
developers.
• Stack Overflow serves as a platform for discussion and keeps developers updated on
changing trends and needs in the programming industry.
Several qualitative studies based on StackOverflow Q&A have sought to identify the areas
of EC that developers consider significant and the solutions they suggest to enhance EE [5,
10, 11]. [5] conducted an empirical study to understand the views of application-oriented
programmers on software EC.
• The study used data from StackOverflow as the primary source, curating samples from
over 300 questions and 550 answers and analyzing 800 users.
• The analysis revealed that while EC was a popular area of discussion, assessing the
quality of answers was challenging.
• Some answers were found to be faulty or ambiguous.
• The study identified five essential areas related to EC: General Knowledge,
Measurement, Code Design, Noise, and Context-Specific.
• Code modification received more focus, with more queries in that area.
• The study identified seven significant causes of EC, ranging from background activities
to synchronization.
An empirical study by [10] involved the following:
• A sample of 5009 StackOverflow questions.
• Manual analysis of 1000 Android-related questions.
• A focus on issues faced by developers.
• Analysis of sensor and radio utilization.
• Examination of improper implementation.
A study by [11] involved the following:
• Exploration of Continuous Software Engineering (CSE) from the perspective of
practitioners.
• Analysis of 12,989 questions from Q&A websites.
• Documentation of answers in StackOverflow.
• Streamlining of topics to derive the dominant subject.
• Qualitative analysis of significant challenges faced.
• Conclusion: increasing queries on websites, technology as the domain of interest, and a
need for expert interpretation.
A few previous studies of energy-related questions on StackOverflow revealed that a few
questions were posed by programmers. To obtain authentic evidence and insight into this
issue, programmers were surveyed to gauge their knowledge of the EC and EE of software. In
the present study, StackOverflow was used as the primary data source, and the data obtained
was curated using NLP in order to find appropriate answers to the queries posed about
software EC.
5. DISCUSSION
For RQ1, ‘What are the questions related to software EC that have been raised on
StackOverflow?’, the finding that EC requirements are often stated in terms other than EC
suggests that energy requirements are challenging to specify directly. The results show that
developers believe that general patterns lead to good or bad EC, and that high-level designs
are impacted by EC concerns more frequently than low-level designs.
For RQ2, ‘What is the dominant topic of discussion related to energy in StackOverflow?’,
the findings show that the dominant topics include software design. Empirical studies of the
impact of design still need to be thoroughly investigated. At the same time, there has been a
significant amount of work on understanding how changes made by developers impact EC.
Programming languages or language features can also help developers during the development
of energy-efficient applications.
For RQ3, ‘What are the common solutions suggested to software EC to the raised
questions/issues?’, the findings show that, with the development of mobile devices, EC issues
now span a wider range of platforms, including mobile devices. Android platforms get as much
attention as computer systems. Software engineering can have a significant impact on EC in
different devices. Recently, there has been an increased focus on developing more EE software
and algorithms and on designing devices that use less power. Some of the common solutions
suggested were: optimizing code for efficiency and reducing resource usage, for example, by
avoiding costly operations and data structures; minimizing the amount of time the software
spends waiting for input or other events; reducing the frequency with which the software
runs; turning off or reducing the intensity of background processes, such as network and disk
activity, that consume energy even when the software is idle; using hardware-level power
management features; and optimizing graphics performance. Overall, the impact of software
engineering on EC in devices has likely been positive over the past several years and is
expected to continue to improve in the future.
6. CONCLUSION
This study aims to gain a clearer understanding of software engineers' perspectives on
software Energy Consumption (EC) and how they address this issue. Mining StackOverflow
questions for software EC using NLP helps to identify common patterns and trends in how
developers discuss software EC, as well as specific questions and answers relevant to the
topic. In this work, keywords were used to search through the text of questions and answers
on StackOverflow to identify relevant posts. The main finding is that developers believe they
can learn how to improve Energy Efficiency (EE) in several ways. Their desire to learn is
apparent in the comments, which indicate that they draw on the know-how of other developers,
tools, example code, documentation, etc., to improve their knowledge.
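The keyword-based search described above can be sketched as follows. This is a minimal illustration only: the keyword list, the function names, and the sample posts are hypothetical and do not reproduce the search terms or the data used in this study.

```python
import re

# Hypothetical keyword list for illustration; not the study's actual search terms.
ENERGY_KEYWORDS = ("energy", "battery", "power consumption")


def is_energy_related(text, keywords=ENERGY_KEYWORDS):
    """Return True if any keyword occurs as a whole word or phrase in the text."""
    lowered = text.lower()
    return any(re.search(r"\b" + re.escape(kw) + r"\b", lowered) for kw in keywords)


def filter_posts(posts, keywords=ENERGY_KEYWORDS):
    """Keep posts whose title or body mentions at least one keyword."""
    return [p for p in posts
            if is_energy_related(p["title"] + " " + p["body"], keywords)]


# Made-up sample posts, not taken from the curated dataset.
sample_posts = [
    {"title": "Reduce battery drain from GPS updates",
     "body": "My Android app polls location very often."},
    {"title": "Sort a list of tuples",
     "body": "How do I sort by the second element?"},
]
relevant = filter_posts(sample_posts)
print([p["title"] for p in relevant])
```

Whole-word matching is used here so that a keyword does not match inside an unrelated longer word. A real pipeline would likely add further preprocessing, such as stemming, before matching.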
References
1. Pinto, G., & Castor, F. (2017). Energy efficiency: a new concern for application software
developers. Communications of the ACM, 60(12), 68-75.
2. Ournani, Z., Rouvoy, R., Rust, P., & Penhoat, J. (2020, October). On reducing the energy
consumption of software: From hurdles to requirements. In Proceedings of the 14th
ACM/IEEE International Symposium on Empirical Software Engineering and
Measurement (ESEM) (pp. 1-12).
3. Ardito, L., Coppola, R., Morisio, M., & Torchiano, M. (2019). Methodological guidelines
for measuring energy consumption of software applications. Scientific
Programming, 2019.
4. Pereira, R., Couto, M., Ribeiro, F., Rua, R., Cunha, J., Fernandes, J. P., & Saraiva, J.
(2017, October). Energy efficiency across programming languages: how do energy, time,
and memory relate?. In Proceedings of the 10th ACM SIGPLAN International
Conference on Software Language Engineering (pp. 256-267).
5. Pinto, G., Castor, F., & Liu, Y. D. (2014, May). Mining questions about software energy
consumption. In Proceedings of the 11th Working Conference on Mining Software
Repositories (pp. 22-31).
6. Siegmund, N., Dorn, J., Weber, M., Kaltenecker, C., & Apel, S. (2022). Green
Configuration: Can Artificial Intelligence Help Reduce Energy Consumption of
Configurable Software Systems?. Computer, 55(3), 74-81.
7. Pang, C., Hindle, A., Adams, B., & Hassan, A. E. (2015). What do programmers know
about software energy consumption?. IEEE Software, 33(3), 83-89.
8. Abdulkader, R., Ghanimi, H. M. A., Dadheech, P., Alharbi, M., El-Shafai, W., Fouda, M.
M., Aly, M. H., Swaminathan, D., & Sengan, S. (2023). Soft Computing in Smart Grid with
Decentralized Generation and Renewable Energy Storage System Planning. Energies, 16(6),
2655. https://doi.org/10.3390/en16062655
9. Shukla, S. (2020). The curious case of posts on Stack Overflow.
10. Malik, H., Zhao, P., & Godfrey, M. (2015, May). Going green: An exploratory analysis of
energy-related questions. In 2015 IEEE/ACM 12th Working Conference on Mining
Software Repositories (pp. 418-421). IEEE.
11. Zahedi, M., Rajapakse, R. N., & Babar, M. A. (2020). Mining questions asked about
continuous software engineering: A case study of Stack Overflow. In Proceedings of the
Evaluation and Assessment in Software Engineering (pp. 41-50).
12. Myilsamy, V., Sengan, S., Alroobaea, R., & Alsafyani, M. (2023). State-of-Health
Prediction for Li-ion Batteries for Efficient Battery Management System Using Hybrid
Machine Learning Model. Journal of Electrical Engineering & Technology.
https://doi.org/10.1007/s42835-023-01564-2
13. Nadkarni, P. M., Ohno-Machado, L., & Chapman, W. W. (2011). Natural language
processing: an introduction. Journal of the American Medical Informatics
Association, 18(5), 544-551.
14. Arumugham, V., Ghanimi, H. M. A., Pustokhin, D. A., Pustokhina, I. V., Ponnam, V. S.,
Alharbi, M., Krishnamoorthy, P., & Sengan, S. (2023). An Artificial-Intelligence-Based
Renewable Energy Prediction Program for Demand-Side Management in Smart Grids.
Sustainability, 15(6), 5453. https://doi.org/10.3390/su15065453
15. Jivani, A. G. (2011). A comparative study of stemming algorithms. Int. J. Comp. Tech.
Appl, 2(6), 1930-1938.
16. Uys, J. W., Du Preez, N. D., & Uys, E. W. (2008). Leveraging unstructured information
using topic modelling. In PICMET'08-2008 Portland International Conference on
Management of Engineering & Technology (pp. 955-961). IEEE.
17. Rajagopalan, A., Swaminathan, D., Alharbi, M., Sengan, S., Montoya, O. D., El-Shafai,
W., Fouda, M. M., & Aly, M. H. (2022). Modernized Planning of Smart Grid Based on
Distributed Power Generations and Energy Storage Systems Using Soft Computing Methods.
Energies, 15(23), 8889. https://doi.org/10.3390/en15238889
18. Zhang, Y., Chen, M., Huang, D., Wu, D., & Li, Y. (2017). iDoctor: Personalized and
professionalized medical recommendations based on hybrid matrix factorization. Future
Generation Computer Systems, 66, 30-35.
19. Moutidis, I., & Williams, H. T. (2021). Community evolution on Stack Overflow. PLoS
ONE, 16(6), e0253010.
20. Ingeno, J. (2018). Software Architect’s Handbook: Become a successful software
architect by implementing effective architecture concepts. Packt Publishing Ltd.
21. de Dieu, M. J., Liang, P., Shahin, M., & Khan, A. A. (2023). Characterizing architecture-
related posts and their usefulness in Stack Overflow. Journal of Systems and Software,
111608.
22. Karn, A. L., Kondamudi, B. R., Gupta, R. K., Pustokhin, D. A., Pustokhina, I. V.,
Alharbi, M., Vairavasundaram, S., Varadarajan, V., & Sengan, S. (2023). An Empirical
Analysis of the Effects of Energy Price Shocks for Sustainable Energy on the Macro-Economy
of South Asian Countries. Energies, 16(1), 363. https://doi.org/10.3390/en16010363
23. Meldrum, S., Licorish, S. A., & Savarimuthu, B. T. R. (2020). Exploring Research
Interest in Stack Overflow--A Systematic Mapping Study and Quality Evaluation. arXiv
preprint arXiv:2010.12282.
24. Luo, X. (2021). Efficient English text classification using selected machine learning
techniques. Alexandria Engineering Journal, 60(3), 3401-3409.
25. Jelodar, H., Wang, Y., Yuan, C., Feng, X., Jiang, X., Li, Y., & Zhao, L. (2019). Latent
Dirichlet allocation (LDA) and topic modeling: models, applications, a
survey. Multimedia Tools and Applications, 78, 15169-15211.
