
Computers & Security 116 (2022) 102661


A study on the use of vulnerabilities databases in software engineering domain

Sultan S. Alqahtani
Computer & Information Sciences College, Al Imam Mohammad Ibn Saud Islamic University, Riyadh, Saudi Arabia

Article history: Received 18 October 2021; Revised 1 February 2022; Accepted 14 February 2022; Available online 16 February 2022.

Keywords: Security; Vulnerability databases; Software development; Software security; Vulnerability analysis.

Abstract: Over the last decade, several software vulnerability databases have been introduced to guide researchers and developers in developing more secure and reliable software. While the Software Engineering (SE) research community is increasingly becoming aware of these vulnerability databases, no comprehensive literature survey exists that studies how they are used in software development. The objective of our survey is to provide insights on how the software vulnerability databases (SVDBs) research landscape has evolved over the past 17 years and to outline some open challenges associated with their use in non-security domains. More specifically, we introduce a semi-automated methodology based on topic modeling to discover relevant topics from our dataset of 99 relevant SE research articles. We find 24 topics discussing the use of SVDBs in the SE domain. The results show that i) topics describing the use of SVDBs range from security empirical (case) studies to tools for generating security test cases; ii) the majority of the surveyed papers cover a limited number of software engineering contributions or activities (e.g., maintenance); and iii) most of the surveyed articles rely on only one SVDB as their knowledge source. Dataset and results are available at https://github.com/isultane/svdbs_dataset

© 2022 Published by Elsevier Ltd.

1. Introduction

In response to the increasing number of software vulnerabilities and attacks, various private and public organizations have introduced Software Vulnerabilities Databases (SVDBs) (e.g., the National Vulnerabilities Database (NVD)1). Each of these databases captures not only different types of vulnerability information but is also of interest to developers and other stakeholders, since these databases contain valuable information about software flaws, causes of defects, details on how vulnerabilities arise, etc. (Thomas, 2011).

Existing SVDBs can be classified into two major categories, private and public vulnerabilities databases: Private SVDBs are typically managed by for-profit organizations, cover vendor specific product (closed source) vulnerabilities, and rarely disclose this information to the public. Public SVDBs are typically organized and maintained by non-profit organizations (e.g., Computer Emergency Response Teams (CERT)2) and disclose their vulnerability information to the public. These public SVDBs can be further divided into two subcategories, specialized and general (i.e., common) SVDBs:

1. Specialized SVDBs are databases covering specific vulnerable aspects (e.g., the Mozilla Foundation Security Advisory (MFSA)3). Many software manufacturers operate their own, highly specialized databases in which publicly known vulnerabilities of their products are documented (Schumacher et al., 2000).
2. Common SVDBs are databases that publish vulnerabilities for a number of software projects across vendor boundaries. These databases disclose vulnerabilities to the public after they have been reviewed by security experts (e.g., the National Vulnerabilities Database (NVD)4 and Common Vulnerabilities and Exposures (CVE)5).

However, software developers are often unaware of or unfamiliar with SVDBs and the fact that known vulnerabilities might already be published in these SVDBs. As a result, vulnerabilities affecting their systems (either directly or indirectly through external vulnerable components used by their systems) often remain uncovered. Increasing both the awareness and the accessibility of SVDBs during software development can improve software reliability and quality. Establishing the required traceability among vulnerabilities across software artifacts is an essential aspect in identifying and locating vulnerable code, applying existing fixes, and improving the analysis of potential impacts of vulnerabilities.

E-mail address: ssalqahtani@imamu.edu.sa
1 https://nvd.nist.gov/vuln/data-feeds
2 https://www.cert.org/
3 https://www.mozilla.org/en-US/security/advisories/
4 https://nvd.nist.gov/vuln/data-feeds
5 https://cve.mitre.org/index.html

https://doi.org/10.1016/j.cose.2022.102661

While the non-security (e.g., Software Engineering) research community is becoming increasingly aware of these SVDBs, no comprehensive literature survey exists that studies where (and how) SVDBs are used in the SE domain. Our literature survey provides insights into the current state-of-the-art usage of SVDBs in software development activities. Our primary goals are to characterize and quantify:

• RQ#1: Which SVDBs are commonly used by the SE community?
• RQ#2: What are the main security topics covered in the reviewed SE articles?
• RQ#3: Has the security interest in specific SE activities (e.g., testing) changed over time?

Table 1
Keyword searches for online library search.

General: ("vulnerable" OR "vulnerability" OR "vulnerabilities" OR "vulnerability database" OR "vulnerability databases") AND ("software engineering")

Domain: ("vulnerable" OR "vulnerability" OR "vulnerabilities" OR "vulnerability database" OR "vulnerability databases") AND ("software requirement" OR "software design" OR "software coding" OR "software testing" OR "software verification" OR "software evolution" OR "software maintenance")

To answer these research questions, we surveyed 99 articles (Table A4 in the Appendix) from the Software Engineering literature that explicitly discuss the use of SVDBs in their approach. We propose a semiautomatic methodology for analyzing such a knowledge repository (SE articles), with the specific goal of extracting the main usage of SVDBs, uncovering the main security discussion topics, their underlying dependencies, and trends over time. Our methodology is based on Latent Dirichlet Allocation (LDA) (Blei et al., 2003), a statistical topic model used to automatically recover topics (i.e., groups of related words that approximate a real-world concept) from a corpus of text documents. LDA has been applied in many domains, including the SE domain (Panichella et al., 2013; Thomas, 2011; Hindle et al., 2015). As part of our methodology, we introduce and apply metrics on the topics discovered by LDA, allowing us to perform further quantitative and qualitative evaluations of the identified SE articles. Findings from our analysis show that SVDBs are most commonly used for security empirical (case) studies, generating security test cases, or modeling vulnerability detection techniques. We also find that SE researchers discuss a broad range of security topics, from creating theoretical foundations for new security analysis techniques to implementing them as part of software development environments. It should be noted that our dataset and results are available online to facilitate the replication and reuse of our findings (Alqahtani, 2021).

The outcome of our study can be beneficial to different stakeholders, since the study provides insights on current trends and open challenges related to the use of SVDBs. The study also provides a detailed analysis of how the security vulnerability research landscape has evolved over the past 17 years.

The remainder of the paper is organized as follows: Section 2 presents our research methodology. The results of our research are presented in Section 3. We discuss our findings in Section 4. Section 5 outlines future research directions. Finally, Section 6 offers concluding remarks.

2. Approach

In this section, we detail our research methodology, including the metrics we introduced to analyze our research data.

Our methodology is based on the following major processing steps (see Fig. 1). First, we extracted SE articles published in different leading software engineering journals and conferences between 2001 and 2018 and applied pre-defined selection criteria to the dataset to extract only SE articles that describe the use of SVDBs in their research contributions. Second, we applied topic modeling on these SE articles, followed by a clustering step to identify SE activities, and then further classified the articles based on their supported SE activities. Finally, we analyzed the discovered topics and clusters through metrics which we introduce to further assess and interpret our data. In what follows, we describe the processing steps in more detail.

Fig. 1. An overview of our research methodology.

2.1. Dataset – collection and selection

We defined a set of inclusion criteria for articles to be considered in our survey, the main criterion being that a paper discusses the use of SVDBs in the context of a software engineering activity. Other criteria used during the data collection and selection process are that an article must be written in English and published as a conference paper, journal paper, technical report, or book in a reputable venue.

For our online library search we used search engines such as: ACM Digital Library, IEEE Xplore Digital Library, Springer Link Online Library, Elsevier Science Direct, Wiley Online Library and Google Scholar. The search terms and their combinations are shown in Table 1. For each query we only considered exact matches of a publication at both the meta-data and the full-text level.

As a result of our searches, we identified a total of 1235 articles. As part of our data cleaning, we manually reviewed the title and abstract (and, in some cases, the introduction of the paper) to verify that an article meets our main inclusion criterion - a paper must discuss the use of SVDBs in the context of a software engineering activity. Papers which did not meet the criterion were omitted from further processing. After completing this manual review process, 146 of the initial 1235 articles remained and were considered for a more detailed review. As part of this detailed review we verified the use of the SVDBs discussed in each article's methodology description. Papers which did not explicitly describe the use of SVDBs for SE activities were removed from the set of articles, which left us with 99 articles to be included in our final dataset for further analysis.

For these 99 publications, we recorded the meta-data of each article, including the author(s), title, publication year, publication type (academic, industry, or both), the name of the SVDBs used, the SE repository used, information on the conference proceedings/journal in which a paper was published, and the keywords included in the article. For the full description of the meta-data details, we refer the reader to our dataset, publicly available online (Alqahtani, 2021). We used this dataset for our detailed analysis of the reviewed papers. Fig. 2 provides a general overview of the dataset in terms of papers published per year.

Fig. 2. Number of articles per year.

One of the main findings from our first data analysis is that there has been a significant increase in the number of publications (per year) that address vulnerability analysis in software engineering, which is a good indicator of the growing research interest in the domain.

2.2. Text pre-processing

As part of our survey methodology, we apply topic modeling to classify the papers in our dataset. As input for the topic modeling we use only the title and abstract of the papers rather than the complete text of the articles.
Given this reduced corpus, we have to ensure that the data we are analyzing is consistent and free of noise. Such data cleansing is perhaps the most important step in text analysis. For the data cleansing we first use Natural Language Processing (NLP) to remove numbers, punctuation and stop words, strip whitespace, and perform stemming. For this NLP processing step we use the Text Mining (tm) package (Ingo, 2013), which offers a number of transformations that ease the data cleaning process. For example, the tm package includes a standard list of stop words and a transformation function that removes common stop words from the dataset, such as articles (a, an, the), conjunctions (and, or, but, etc.), common verbs (is), and qualifiers (yet, however, etc.). We also apply a transformation function for stemming to normalize words that have a common root – for example: offer, offered and offering. Through stemming, these related words are reduced to their common root, which in this example would be the word offer.

In addition, we created several custom transformations, using the tm content transformer function, to address some of the remaining problems specific to our dataset. For example, we found words without spaces around colons and hyphens, which had to be inserted to ensure proper text processing.
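To illustrate these cleaning steps, the following is a minimal sketch in R using the tm package; it assumes the collected titles and abstracts are stored in a character vector docs, and the separator fix is a simplified stand-in for our custom transformations:

```r
library(tm)  # Text Mining package (Ingo, 2013); stemming also requires SnowballC

# docs: character vector holding the title and abstract of each article
corpus <- VCorpus(VectorSource(docs))

# custom transformation (simplified): break up words glued together
# by colons and hyphens before punctuation is stripped
fixSeparators <- content_transformer(function(x) gsub("[-:]", " ", x))
corpus <- tm_map(corpus, fixSeparators)

corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)
corpus <- tm_map(corpus, stemDocument)  # offer, offered, offering -> offer
```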
2.3. Topic modeling

For the next processing step, we use the cleansed data as input to our topic model algorithm. The topic modeling is used to automate the discovery of topics from our dataset. In this paper, we rely on Latent Dirichlet Allocation (LDA), a statistical topic modeling approach which is well suited for finding discussion topics in natural language text documents (Blei et al., 2003). LDA creates topics when it finds sets of words that co-occur frequently in the documents of the corpus. Often, the words in a discovered topic are semantically related, which gives meaning to the topic as a whole. For example, the words with the highest probability in a topic might be "vulnerability", "patch", "security", and "buffer overflow" (occurring together in documents), indicating that this topic is related to security vulnerabilities. Furthermore, LDA will also classify a document that contains topics such as security vulnerability, programming, or both, without the need for any training data: given a set of documents, LDA uses machine learning algorithms to infer the topics and the topic memberships of each document.

The result of applying LDA to our preprocessed data is (a) a set of topics, defined as distributions over the unique words in our dataset, and (b) a set of topic membership vectors, the top four for each article, indicating the percentage of words in the article that come from each of the K topics (K = 24). The highest probabilities are assigned to words in a topic that are semantically related and can be used to define the topic. In addition, we manually provide a short label for each topic to further improve the readability of the topics, for example "Overflow detection" for the topic that contains top words such as "detect", "technique", "overflow", "automat", and "buffer" (see Table A2, Appendix A).
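A sketch of this step, assuming the LDA implementation of the R topicmodels package is applied to the cleansed corpus from Section 2.2 (the seed is an arbitrary choice to make the inference reproducible):

```r
library(topicmodels)  # LDA implementation following Blei et al. (2003)

dtm <- DocumentTermMatrix(corpus)                  # articles x stemmed terms
lda <- LDA(dtm, k = 24, control = list(seed = 1))  # discover the K = 24 topics

theta <- posterior(lda)$topics  # topic memberships theta(d_i, z_k) per article
terms(lda, 10)                  # top 10 terms per topic, used for manual labels
```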
2.4. Metrics and analysis

LDA discovers K topics z_1, ..., z_K. We denote the membership of a topic z_k in document d_i as θ(d_i, z_k), where ∀i, k: 0 ≤ θ(d_i, z_k) ≤ 1 and ∀i: Σ_{k=1}^{K} θ(d_i, z_k) = 1.

We define a threshold δ to indicate whether a topic is "in" a document (article). Usually, an article will have multiple topics (related to different SE life cycle activities). However, given the probabilistic nature of LDA, some topics are assigned small, non-zero (e.g., 0.01) memberships to an article, indicating a non-reliable topic assignment. Using the δ threshold we can define what corresponds to the main topics of an article. For our study, we set δ = 0.10 as a membership cut-off. Using this threshold, we only keep topics from each article which are above the threshold value, thereby reducing the noise in the topic assignment.

Topic popularity. We define the overall share (popularity) of a topic z_k across all articles as

    popular(z_k) = (1 / |D|) · Σ_{d_i ∈ D, θ(d_i, z_k) ≥ δ} θ(d_i, z_k)    (1)

where D is the set of all articles in our dataset. The popularity metric allows us to assess the relative popularity of a topic z_k across all articles. For example, if a topic has a popularity metric of 10%, then 10% of all articles contain this topic.
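Given the membership matrix theta from the previous step (one row per article, one column per topic), the popularity metric of Eq. (1) reduces to a few lines; a sketch:

```r
delta <- 0.10  # membership cut-off (see above)

# popular(z_k): sum the memberships above the threshold for each topic
# and normalize by the total number of articles |D| (Eq. (1))
popularity <- apply(theta, 2, function(m) sum(m[m >= delta])) / nrow(theta)
```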


SE-phases trends over time. To classify and compare trends of SE activities (e.g., testing vs. coding), the original K = 24 topics used by our popularity metric are too fine-grained. Instead, we introduce a metric that uses both topics and topic relations to form these identified SE activities. We define a SE phase as a cluster of topics which corresponds to a technical SE concept (e.g., testing) and is related to a given topic (e.g., a test-case generation topic). To identify the topics z_k related to a SE phase sp, we automatically selected the group of the most popular topics that correspond to this SE activity. The main advantage of the SE phase metric, compared to just counting instances of papers covering these activities, is that our approach is less likely to be affected by false positives, since we only consider an article if it contains the topic of interest.

To measure the trend based on the year y and the articles A(y), we introduce the SE_phase_impact of a SE phase sp as:

    SE_phase_impact(z_k, sp, y) = Σ_{d_i ∈ G(z_k, sp, y)} θ(d_i, z_k)    (2)

where G(z_k, sp, y) denotes all topics z_k related to SE phase sp in year y. This metric measures the number of articles covering a SE phase during a year, relative to all articles in that year.
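As a sketch, assuming a vector year holding the publication year of each article and phase_topics holding the topic indices assigned to a phase sp, the metric can be computed as follows:

```r
# SE_phase_impact(sp, y): total membership theta(d_i, z_k) of the phase's
# topics over all articles published in year y (Eq. (2))
se_phase_impact <- function(theta, year, phase_topics, y) {
  sum(theta[year == y, phase_topics, drop = FALSE])
}

# example: impact of the Testing phase (Topics 6, 11, 20, 22) in 2015
# se_phase_impact(theta, year, c(6, 11, 20, 22), 2015)
```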
3. Results

We now present the results of applying our research methodology to our dataset and report on findings related to the research questions introduced earlier.

3.1. What types of SVDBs are most commonly used (RQ#1)?

Our survey shows that none of the surveyed SE articles reported on the use of common SVDBs prior to 2006 (see Fig. 3). This is due to the fact that the first widely-recognized common SVDBs (e.g., Karlsson, 2012) became publicly available only in late 2004 and early 2005, with more specialized public SVDBs emerging in SE articles in late 2010.

Fig. 3. Top 6 common SVDBs popularity usage.

Our analysis also shows that the majority of surveyed articles (91%) use common SVDBs in their work, whereas only 9% rely on specialized SVDBs as their primary resource for vulnerability information. Further analysis of the common SVDBs usage in these papers (see Fig. 3) shows that most of the surveyed articles (26%) used NVD as their SVDB of choice, followed by CVE6 (16%). It should be noted that NVD is based on the CVE dictionary, augmented with additional analysis information, a database, and a fine-grained search engine. NVD is synchronized regularly with CVE such that any CVE update will also be reflected in NVD (after approval by the NVD security engineers). NVD includes security checklists, security related software flaws, misconfigurations, affected product names, and impact metrics. OWASP, another common SVDB, has been used by 14% of the surveyed articles. OWASP is dedicated to maintaining a list of Web applications with known security incidents and is commonly used for experiments in testing security vulnerabilities affecting web applications (e.g., Sampaio and Garcia, 2016; Palsetia et al., 2016; Bozic et al., 2015). The Open Source Vulnerability Database (OSVDB7), used by 8% of the surveyed articles, is one of the earlier publicly available common SVDBs; however, as of April 2016, the database is no longer maintained. Other popular common SVDBs include CWE (7%) and SecurityFocus (6%). CWE is maintained by MITRE8 and provides a classification of vulnerability types which is commonly used for testing and classifying security attacks. SecurityFocus is an online software systems' security news portal that obtains its data from the Bugtraq9 mailing list. Bugtraq is an independent source for security vulnerabilities, alerts, and threats.

While some articles compare their vulnerability results obtained from one SVDB with results from other SVDBs, only a few studies (e.g., Walden et al., 2014; Mendes et al., 2014) combine the usage of different common SVDBs in their approach. As shown in (Raghavan et al., 2007), combining different common SVDB data sources can increase the zero-day detection performance of vulnerability detection and analysis techniques.

Summary: Although the CVE database has been available longer than NVD, most surveyed articles use NVD in their approach. We believe that one of the reasons for the widespread use of NVD is the easy access to vulnerability data through supported feeds and the regular updates to the database. Moreover, we found that little research exists on combining different SVDBs to improve an approach or its vulnerability detection analysis.

3.2. What are the main security discussion topics in SE articles (2006–2017) (RQ#2)?

In what follows, we describe in more detail the 24 topics discovered by the topic analysis used in our methodology. The full list of topics discovered, including their popularity metric (popular, Eq. (1) introduced earlier), can be found in Appendix A (see Table A3) and in our public repository (Alqahtani, 2021).

Our analysis also shows that the topics span several security concepts, such as "security study", "sql injection", "detecting overflow", "prediction model", etc. We use the groupings of top words to reflect their semantic similarities (see Table A1 in the Appendix). For example, words such as "model", "predict", "evaluation", and "components" are grouped together in this context as part of the Vulnerability Prediction Model topic (Hovsepyan et al., 2012). In addition to this automated classification, we also manually verified most of the top documents (SE articles), to ensure a natural fit with both the given topic and the other topics in the documents.

In what follows, we present a subset (the top 4 from Table A1) of topics along with representative example articles (Article-id: title and SVDB used in the article) to illustrate what constitutes such a topic.

Security study topic: Mining security software repositories is often used to study how programmers deal with security concerns during software development. To this extent, we identified multiple security topics, including Security Study (Empirical), and Security Maintenance and Design. Below are two examples of articles that fall into the Security Study topic.

6 https://cve.mitre.org/
7 https://blog.osvdb.org/
8 https://www.mitre.org/
9 http://seclists.org/bugtraq/


A85-Title: On mining data across software repositories.
Topic-5: security study
Used SVDBs: NVD

A86-Title: An empirical study of security problem reports in Linux distributions.
Topic-5: security study
Used SVDBs: NVD

Prediction model topic: We further observed that most SE articles mentioning "model" are referring to the Vulnerability Prediction Model (VPM). VPM is a relatively recent field of study which is used to automatically classify software entities as vulnerable or not. The topic "prediction model" is also assigned to articles that use VPMs for vulnerability discovery and prediction, and for vulnerability analysis evaluation. The common usage of SVDBs in these articles is for the comparison of vulnerability prediction results. Below are two examples from that topic assignment:

A9-Title: Vulnerability prediction models: a case study on the Linux kernel.
Topic-24: prediction model
Used SVDBs: NVD

A93-Title: Measuring and enhancing prediction capabilities of vulnerability discovery models for apache and iis http servers.
Topic-24: prediction model
Used SVDBs: NVD and Netcraft

Bugs and release analysis topic: Designing an effective approach for detecting and recovering from software failures (vulnerabilities) requires an understanding of failure characteristics (D'Ambros and Lanza, 2006). To this extent, we find the topic "bugs and release analysis" assigned to SE articles that mine and study bug/vulnerability issues. These articles discuss the relationships between regular bugs in Open Source Software (OSS) products and their relation to security vulnerabilities. Our analysis shows that the SVDBs used in these articles provide vulnerability characteristics which are then compared with those of regular bugs (see the examples below).

A4-Title: Do bugs foreshadow vulnerabilities? An in-depth study of the chromium project.
Topic-13: bugs and release analysis
Used SVDBs: NVD

A33-Title: Bug characteristics in open source software.
Topic-13: bugs and release analysis
Used SVDBs: NVD

Security test cases topic: During the development of a product, security testing techniques are often used to reveal flaws in the security mechanisms of an information system. We found "security test cases" as a topic being assigned to articles that use different types of software security testing, such as the "mutation analysis" topic. Most of the reviewed articles rely on the analysis of vulnerabilities in source code for their testing approaches. We found that Exploit-DB10 and OSVDB are most commonly used, since these common SVDBs often also include the vulnerability exploit code for education purposes and penetration testing.

A18-Title: Evaluation of the IPO-family algorithms for test case generation in web security testing.
Topic-11: test case
Used SVDBs: OWASP and Exploit-DB

A80-Title: MUTEC: mutation-based testing of cross site scripting.
Topic-11: test case
Used SVDBs: OSVDB

Summary: Among the surveyed papers, security studies that mine historical SVDB data to gain new insights by identifying or confirming causalities among vulnerabilities and other knowledge resources, vulnerability detection models, security test cases, and security development are overall the most common SE activities described.

In addition, our analysis also supports our earlier observation in RQ#1 that common SVDBs are more frequently used in research projects than specialized SVDBs. An additional manual review showed that most SVDBs are used for empirical mining studies related to vulnerability evolution and trend analysis.

3.3. Has the security interest changed over time (RQ#3)?

In the previous section we classified the articles in our dataset based on their topics. This classification allowed us to gain some insights on the security topics covered by these papers. In what follows, we further analyze the relationships among the different topics by forming topic clusters that correspond to SE lifecycle activities.

Topics relations. We automatically clustered the extracted topics into more meaningful SE phases by creating a network visualization using the correlation between each topic's word probabilities. To improve readability and reduce the noise in the network graph, we only consider relationships among topics that have a significant correlation, applying a 0.025 statistical significance threshold across the 24 topics (Bartlett, 1993). Fig. 4 shows the resulting network visualization, with each number in the graph representing a topic. We then apply the label propagation community detection algorithm (Raghavan et al., 2007), implemented by igraph11, to derive clusters within the network. The community detection algorithm created 6 communities representing different SE activities, plus multiple smaller communities which were not part of any of the larger community clusters (i.e., topics that do not have any connections to other topics) (see Fig. 5). We then manually reviewed the clustering results and re-arranged the single topics under one common cluster (named "other"). From the resulting network graph, we can observe an overlap between clusters 1, 5, and 6, as shown in Fig. 5 (for more details see Appendix A, Table A1), which indicates that articles in these clusters contribute to several SE activities. For example, Ming et al. (2016) introduce a testing approach based on taint analysis to detect security vulnerabilities and also introduce a tool as a proof of concept. Next, we manually labeled all clusters by assigning them a label corresponding to the SE activity they support (e.g., the label "Testing" was assigned to the cluster formed by the collection of topics: Topic 6, Topic 11, Topic 20, and Topic 22).
In what follows, we further analyze these clusters to gain additional insights on how vulnerability topics in different SE phases have evolved over the analysis period. For this analysis we employ the SE_phase_impact metric (Eq. (2)) introduced earlier, which allows us to compare the impact of these SE activity clusters over time.

Our analysis shows that SE activities such as Maintenance, Testing and Modeling are the most dominant areas of research involving SVDBs (see Fig. 5). Furthermore, the impact of Maintenance and Testing related research has increased over the years, whereas fewer papers covering the Modeling perspective have been published.

10 https://www.exploit-db.com/
11 http://www.cs.cmu.edu/∼lujiang/resources/igraph.pdf


Fig. 4. Topics relations network.

Fig. 5. Network communities.

Fig. 6 shows the trend analysis of SE phases for Maintenance, Modeling and Testing. Maintenance (Fig. 6a) shows an increasing trend and is dominant in multiple topics. After a major drop in 2008, papers on Modeling (Fig. 6b) remain somewhat steady. Testing (Fig. 6c) has also remained steady since 2008. We further observe that papers in the Maintenance cluster often use topics such as "taint analysis", "fuzzy logic assessment", "detecting overflow", "attack mechanisms", "web application", "bugs and release analysis" and "static analysis" (see Table A2, Appendix A). Several articles (Ming et al., 2016; Theisen et al., 2015; Fang and Hafiz, 2014) explore the use of "taint analysis" methods to "detect overflow" issues (see the example below).

A11-Title: StraightTaint: decoupled offline symbolic taint analysis.
Topic-1: taint analysis, Topic-7: attack mechanism
Used SVDBs: CVEs

Several articles (Lebeau et al., 2013; Rountev et al., 2004) investigate "web application" by using "static analysis" methods to address "attack mechanisms". SVDBs in these articles play an important role by providing tangible examples of known security vulnerabilities for validating their maintenance approaches (see the example below).

A59-Title: Predicting common web application vulnerabilities from input validation and sanitization code patterns.
Topic-1: taint analysis, Topic-10: web application, Topic-23: static analysis
Used SVDBs: SecurityFocus

Summary: From our survey we can conclude that the most widely supported SE activities are Maintenance, Testing, and Modeling. For the Maintenance activity we also observed that articles addressing "bugs and release analysis" often used bug repositories (i.e., issue tracker systems) along with SVDBs.

4. Discussion

RQ#1: Which SVDBs are commonly used by the SE community? From our survey we observed that most articles report on the use of common SVDBs compared to specialized SVDBs.


Fig. 6. Trend analysis of SE phases within specific topics.

The reasons for this are manifold: (1) specialized SVDBs contain known security vulnerabilities affecting specific systems written in a specific programming language (e.g., PHP); analysis results obtained from specialized SVDBs typically do not generalize to other systems (e.g., using Java vs PHP), therefore limiting the potential impact of the published work; (2) common SVDBs contain more diverse known security vulnerabilities affecting different types of software systems and can therefore accommodate different research interests; (3) among the common SVDBs, we found NVD to be the most popular SVDB used in the SE community. There are several reasons for this popularity of NVD, such as: ease of access (e.g., automatic data feeds), updates, size, and quality of the dataset.

Even with the popularity of common SVDBs, studies have shown that developers are often not aware of known security vulnerabilities affecting their systems (Cadariu et al., 2015; Alqahtani et al., 2016b; Plate et al., 2015), resulting in situations where known vulnerabilities are patched late or never after the disclosure of a vulnerability. This implies limited communication between the vendors in charge of patching the vulnerabilities and common SVDB providers, since vendors are expected to provide a new (patched) version of components with known vulnerabilities or at least provide users with patch information on how to fix the vulnerability.

A limitation of many common SVDBs is that they do not include the actual code causing the security vulnerability, which is in contrast to specialized SVDBs that often share the code of known security vulnerabilities. Having direct access to this vulnerable code fragment simplifies the work of SE researchers evaluating their security analysis approaches.

RQ#2: What are the main security topics covered by the reviewed SE articles? and RQ#3: Has the security interest in specific SE phases changed over time? We studied the security topics discussed in SE articles that use SVDBs in their research methodology to identify how these SVDBs are used. We further clustered these topics based on the topics' term relationships describing SE activities for a more fine-grained analysis. Our findings related to RQ#2 reveal that security studies (empirical or case studies) are among the most common research activities covered by our reviewed articles, with most articles citing only a single SVDB, even though research has shown that combining multiple SVDBs can improve vulnerability detection coverage and performance (Massacci and Nguyen, 2014; Alqahtani et al., 2017).


Our results for RQ#3 introduce 7 clusters which represent SE activities: Maintenance, Testing and Tools design, Modeling, Coding, Risk Analysis, and other. Due to space limits, we discuss the top three - Maintenance, Testing, and Modeling - in some detail, summarized as follows:

Maintenance: Among the SE activities which are supported by SVDBs, maintenance is the most common one. Our further manual analysis of these maintenance related articles showed that many of them focus on SVDBs for vulnerability evolution. Like traditional software evolution research, with its focus on analyzing and characterizing how a software system evolves over time, the presence of vulnerabilities is a crucial problem in this context. Vulnerability evolution requires organizations to monitor and manage the evolution of vulnerabilities to ensure the security and reliability of their systems. Furthermore, it often relies on information about how certain vulnerabilities evolve over time and what the causes of these vulnerabilities are (Alhazmi and Malaiya, 2006). Common to the reviewed papers is that SVDBs provide tangible evidence about security issues affecting software systems and about how these vulnerabilities and the systems in which they occur have evolved over time (e.g., Murtaza et al., 2016; Stuckman and Purtilo, 2014; Meneely et al., 2013).

Models: Modeling in the software engineering community is primarily concerned with reducing the gap between software problems and implementation through the use of models that describe complex systems at multiple levels of abstraction and from a variety of perspectives (Atlee et al., 2007). In vulnerability analysis, modeling techniques play an important role for resource allocation during patch development and when evaluating the risk of vulnerability exploitation, and several vulnerability discovery models (e.g., Massacci and Nguyen, 2014; Alhazmi and Malaiya, 2006) have been introduced. A widely used example of such a prediction model is the Vulnerabilities Prediction Model (VPM) (Rountev et al., 2004; Morrison et al., 2015), introduced to predict the occurrence or absence of security vulnerabilities in software systems. The use of VPMs is also evident from the common use of the "prediction model" topic in our surveyed papers. Due to the vulnerability patch information provided by SVDBs, we find that SE researchers also start including SVDBs in their vulnerability prediction analysis and in recommendations for patching the vulnerabilities (e.g., Sampaio and Garcia, 2016; Theisen et al., 2015; Wang et al., 2017; Appelt et al., 2015; Chatzipoulidis et al., 2015). Furthermore, SVDB data is often used to increase the precision or recall of existing models (e.g., Theisen et al., 2015) or as examples to further enrich and train models with "real" security vulnerability data (e.g., Chatzipoulidis et al., 2015).

Testing and tools: Automated tools play an important role in the software engineering domain (O'Regan, 2017) to support different activities and tasks. From our survey, we observed that SE researchers introduce different types of automated tools that support vulnerability analysis and avoidance. For example, papers propose automatic test generation (suggested by the "test cases" topic) and penetration testing tools that automate the process of detecting and exploiting SQL injection flaws (suggested by the "exploit" and "sql injection" topics). Among the articles clustered in the Tool and Testing activity, SVDBs were used for evaluating the results of the proposed tools or as the main knowledge resource for the actual approach presented in the papers (e.g., Wang et al., 2017; Stivalet and Fong, 2016; Alqahtani et al., 2016a; Pham et al., 2015; Blome et al., 2013).

While several security test case generation and vulnerability prediction tools have emerged, many of these tools remain at the prototype level. We also observed that little work exists in analyzing cross-cutting security concerns across different security testing and modeling approaches.

Other usages of SVDBs: Our survey also showed that SE researchers used SVDBs for topics not associated with any of our topic clusters, for example:

Studying the lifetime of vulnerabilities: Frei et al. (2006) use SVDBs (e.g., NVD) to quantify the time period between the disclosure of a vulnerability, the time of exploiting the vulnerability, and the time it takes to patch the vulnerability. Zhang et al. (2011) used data from the NVD to predict the time until a new vulnerability is discovered in a software product. Zaman et al. (2011) performed an exploratory study using Firefox and the CVE dataset to uncover security bugs. The study reveals that, on the one hand, security bugs are fixed faster than design bugs, while security bugs tend to be reopened multiple times in a bug repository.

Studying vulnerabilities and their hidden impact: Wijayasekara et al. (2014, 2012) study the hidden impact of vulnerabilities: vulnerabilities that are discovered only after a bug has been made public. They used CVE and bug repositories for the analysis of the Linux kernel and MySQL and observed that these systems had 32% and 62% hidden impact vulnerabilities, respectively, between 2006 and 2011.

5. The road ahead

There are two key observations that we believe will impact the future development of SVDBs and how SE researchers and practitioners will use SVDBs. The first observation can be helpful for SVDB designers to enhance current SVDB features to meet additional requirements from SE researchers and practitioners. The second is an observation that can guide SE researchers and practitioners to further improve the traceability and documentation of product changes caused by patched vulnerabilities.

Observation 1. Using SVDBs beyond just being information silos.

Our results show that a majority of SE researchers use SVDBs to gain security related knowledge. In fact, developers already use SVDBs to identify security vulnerabilities and to determine features (e.g., vulnerability patch information) that they want to implement. Presently, the role of SVDBs is mostly that of a repository for reporting known security vulnerabilities. However, we envision that future versions of SVDBs will play an increasing role as an integrated knowledge source for guiding secure software development, providing security testing, and refining software security design. Hence, we believe that future versions of SVDBs need to incorporate a mechanism through which SE researchers can link and trace vulnerability information directly across knowledge resources.

Another interesting finding is that SE researchers and practitioners usually reuse vulnerability information from only one source (a single SVDB), limiting their analysis approach to the data available in this SVDB. One approach to address this problem is to improve the accessibility of information across SVDB boundaries. Providing users with standardized access to these knowledge resources, where queries retrieve information across SVDB boundaries, would represent a first step towards new types of vulnerability analysis (e.g., global security impact). While linking these knowledge resources is an important initial step, additional semantic modeling will be needed to ensure the consistency and quality of knowledge across SVDB boundaries. For example, threats to consistency and ambiguity across these knowledge resources will have to be addressed to ensure that a vulnerability reported in two databases is actually the same (or a different) instance. One approach would be to replace the current proprietary knowledge modeling approaches used by SVDB providers and agree upon a standardized knowledge modeling approach, which would include the ability to semantically link and query SVDBs across repository boundaries and to provide each vulnerability with a global, unique identifier, similar to the Universal Resource Identifier used by the Semantic Web.


Observation 2. Linking security commit changes to SVDBs.

With a more widespread use of SVDBs in software development, we believe that SVDBs should become an integrated part of current software development processes and best practices. Similar to the current practice of adding an issue number to a commit message, a commit message should also include a link to the vulnerability in the SVDB where it is reported. Such vulnerability traceability can provide additional insights and documentation to QA and future maintainers when analyzing and comprehending a code patch. Furthermore, a bi-directional link from the vulnerabilities to the known and patched code would be desirable. We further believe that next generation IDEs should not only facilitate this linking process, but also take advantage of these links to recommend patches or to identify potential impacts of these vulnerabilities on other parts of the system.

6. Conclusion

While the SE research community is increasingly focusing on security and reliability, no comprehensive literature survey exists that studies how software vulnerability databases are used and integrated in a developer's tool chain. In this paper, we proposed a methodology to discover and quantify security topics and trends of using SVDBs in SE research. Our methodology is based on LDA, a widely-applied statistical topic modeling approach, which we used to discover topics from our dataset of relevant SE research articles. We defined various metrics to quantify how security topics have evolved over time, allowing us to gain insights on how SVDBs have been used in SE research over the last 17 years. From our analysis of the 99 papers in our dataset we can conclude the following:

• there is an increasing awareness of SVDBs in the research community, in terms of papers being published that describe the use and application of SVDBs in the SE domain;
• the majority of the surveyed studies apply SVDBs only to a limited number of software engineering activities;
• most studies rely on only one SVDB for their contribution;
• researchers usually treat SVDBs as trusted information knowledge resources, without fully integrating them with other software lifecycle artifacts.

Our study can be used to further increase the SVDB awareness of both SE researchers and practitioners and to provide them with some directions for future research and application of SVDBs in the SE domain. Additionally, this paper can be extended into a systematic review that includes SE publications on vulnerability analysis between 2018 and 2022 and investigates research questions such as: What are common SE repositories which are used together with SVDBs? Which SE software lifecycle activities are supported by SVDBs?

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

A. Appendix

Tables A1–A4

Table A1
The top 10 terms of the 24 topics, clustered into the major SE phases identified by this research.

Cluster # Cluster Label Topic # Manual label Top 10 terms

1 Maintenance Topic 1 Taint analysis Analysi, execut, analyz, taint, result, hypercal, compar, becom, inform,
lightweight
Topic 2 Fuzzy logic Program, input, fuzz, explor, base, path, real, symbol, stage, limit,
Topic 4 Detecting Detect, technique, overflow, automat, buffer, fals, earli, static, posit, analysi
overflow
Topic 7 Attack Attack, approach, browser, base, firewall, request, detect, implement,
mechanism present, includ
Topic 10 Web application Applic, web, access, control, state, client, check, world, make, real
Topic 13 Bugs and release Bug, releas, research, analysi, increase, non, empir, examin, associ, differ
analysis
Topic 23 Static analysis Static, valid, cross, input, script, site, propos, string, common, dynam
2 Source Code Topic 8 Security design Secur, threat, approach, function, design, base, engine, level, process, featur
Topic 5 Security study Secur, report, problem, studi, process, reliabl, fix, collect, detect, discuss
Topic 19 Development Develop, secur, improve, practice, priorit, evalu, context, find, defin,
framework
Topic 21 Source code Code, sourc, develop, open, contain, line, linux, integ, perform, transform
3 Testing Topic 6 System analysis System, use, base, analysi, signatur, forma, specif, metric, architecture,
scenario
Topic 11 Test cases Test, generat, case, result, use, effect, algorithm, xssvs, data, complex
Topic 20 Software security Softwar, provid, part, time, statist, reduc, general, combin, exploit, secur
Topic 22 Exploit Exploit, system, paper, mitig, challeng, worm, defens, diagnosi, major,
memori
4 Modeling Topic 12 Malware Method, malwar, use, show, behavior, learn, detect, malici, comput, featur
methods
Topic 18 Patch study Vendor, disclosur, patch, time, respons, show, sever, valu, releas, studi
Topic 24 Prediction model Model, predict, paper, evalu, propos, use, compon, methodology, issu, effort
5 Risk Analysis Topic 9 Risk estimation Inform, risk, avail, estim, potenti, oper, cvss, allow, data, provid
Topic 17 Project Data, identify, project, repository, inform, sourc, relat, across, exist,
repository research
6 Tool Topic 3 Binary tool Tool, binary, user, mani, demonstr, creat, present, engine, structur, crash
Topic 16 SQL injection Inject, sql, database, identify, slice, tool, work, xml, type, obtain
7 Other Topic 14 Version assessment Version, investig, assess, caus, autom, trace, anomali, surfac, current, approxim
Topic 15 Vulnerability Vulner, discov, discoveri, type, signific, known, order, result, exist, flow
discovery


Table A2
The 24 topics discovered by LDA.

Topic name Top 10 terms

Security study Secur, report, problem, studi, process, reliabl, fix, collect, detect,
discuss
Prediction model Model, predict, paper, evalu, propos, use, compon,
methodology, issu, effort
Bugs and release Bug, releas, research, analysi, increase, non, empir, examin,
analysis associ, differ
Static analysis Static, valid, cross, input, script, site, propos, string, common,
dynam
System analysis System, use, base, analysi, signatur, forma, specif, metric,
architecture, scenario
Test cases Test, generat, case, result, use, effect, algorithm, xssvs, data,
complex
Web application Applic, web, access, control, state, client, check, world, make,
real
Project Data, identify, project, repository, inform, sourc, relat, across,
repository exist, research
Detecting Detect, technique, overflow, automat, buffer, fals, earli, static,
overflow posit, analysi
Vulnerability Vulner, discov, discoveri, type, signific, known, order, result,
discovery exist, flow
Patch study Vendor, disclosur, patch, time, respons, show, sever, valu,
releas, studi
Malware Method, malwar, use, show, behavior, learn, detect, malici,
comput, featur
Exploit Exploit, system, paper, mitig, challeng, worm, defens, diagnosi,
major, memori
Risk estimation Inform, risk, avail, estim, potenti, oper, cvss, allow, data, provid
Attack Attack, approach, browser, base, firewall, request, detect,
mechanism implement, present, includ
Taint analysis Analysi, execut, analyz, taint, result, hypercal, compar, becom,
inform, lightweight
Development Develop, secur, improve, practice, priorit, evalu, context, find,
defin, framework
Fuzzy logic Program, input, fuzz, explor, base, path, real, symbol, stage,
limit,
Binary tool Tool, binary, user, mani, demonstr, creat, present, engine, structur, crash
SQL injection Inject, sql, database, identify, slice, tool, work, xml, type, obtain
Source code Code, sourc, develop, open, contain, line, linux, integ, perform,
transform
Version assessment Version, investig, assess, caus, autom, trace, anomali, surfac, current, approxim
Security design Secur, threat, approach, function, design, base, engine, level,
process, featur
Software security Softwar, provid, part, time, statist, reduc, general, combin,
exploit, secur

Table A3
Topic shares and trends.

Topic name — Popularity (%)
security study — 1.728699
prediction model — 1.560169
bugs and release analysis — 1.513145
static analysis — 1.459491
system analysis — 1.416026
test cases — 1.240476
web application — 1.233553
project repository — 1.192519
detecting overflow — 1.157353
vulnerability discovery — 1.038362
patch study — 0.925041
malware — 0.911013
exploit — 0.869622
risk estimation — 0.861604
attack mechanism — 0.861546
taint analysis — 0.849078
development — 0.837696
fuzzy logic — 0.82735
binary tool — 0.820848
sql injection — 0.67591
source code — 0.665216
version assessment — 0.639975
security design — 0.532554
software security — 0.494215


Table A4
99 Articles included in the final dataset.

Title DOI

A1 deExploit: Identifying misuses of input data to diagnose


memory-corruption exploits at the binary level 10.1016/j.jss.2016.11.026
A2 Game of detections: how are security vulnerabilities discovered in the wild? 10.1007/s10664-015-9403-7
A3 An automatic method for assessing the versions affected by a vulnerability 10.1007/s10664-015-9408-2
A4 Do bugs foreshadow vulnerabilities? An in-depth study of the chromium project 10.1007/s10664-016-9447-3
A5 An Empirical Study on Detecting and Fixing Buffer
Overflow Bugs 10.1109/ICST.2016.21
A6 A Security Perspective on Code Review: The Case of
Chromium 10.1109/SCAM.2016.30
A7 Large Scale Generation of Complex and Faulty PHP Test
Cases 10.1109/ICST.2016.43
A8 SOFIA: An automated security oracle for black-box
testing of SQL-injection vulnerabilities 10.1145/2970276.2970343
A9 Vulnerability Prediction Models: A Case Study on the
Linux Kernel 10.1109/SCAM.2016.15
A10 Download Malware? No, Thanks. How Formal Methods
Can Block Update Attacks 10.1145/2897667.2897673
A11 StraightTaint: Decoupled offline symbolic taint analysis
10.1145/2970276.2970299
A12 SV-AF - A Security Vulnerability Analysis Framework
10.1109/ISSRE.2016.12
A13 BovInspector: Automatic inspection and repair of buffer
overflow vulnerabilities 10.1145/2970276.2970282
A14 Model-based whitebox fuzzing for program binaries
10.1145/2970276.2970316
A15 Mining trends and patterns of software vulnerabilities
10.1016/j.jss.2016.02.048
A16 Exploring context-sensitive data flow analysis for early
vulnerability detection 10.1016/j.jss.2015.12.021
A17 Securing native XML database-driven web applications
from XQuery injection vulnerabilities 10.1016/j.jss.2016.08.094
A18 Evaluation of the IPO-Family algorithms for test case
generation in web security testing 10.1109/ICSTW.2015.7107436
A19 Behind an Application Firewall, Are We Safe from SQL
Injection Attacks? 10.1109/ICST.2015.7102581
A20 Security slicing for auditing XML, XPath, and SQL
injection vulnerabilities 10.1109/ISSRE.2015.7381847
A21 Approximating Attack Surfaces with Stack Traces
10.1109/ICSE.2015.148
A22 Do Bugs Foreshadow Vulnerabilities? A Study of the
Chromium Project 10.1109/MSR.2015.32
A23 Measuring Dependency Freshness in Software Systems
10.1109/ICSE.2015.140
A24 Hercules: Reproducing Crashes in Real-World
Application Binaries 10.1109/ICSE.2015.99
A25 A Security Practices Evaluation Framework
10.1145/2746194.2746217
A26 Combining software interrelationship data across
heterogeneous software repositories 10.1109/ICSM.2015.7332516
A27 Improving prioritization of software weaknesses using
security models with AVUS 10.1109/SCAM.2015.7335423
A28 Impact assessment for vulnerabilities in open-source
software libraries 10.1109/ICSM.2015.7332492
A29 Information infrastructure risk prediction through
platform vulnerability analysis 10.1016/j.jss.2015.04.062
A30 Automated analysis of security requirements through
risk-based argumentation 10.1016/j.jss.2015.04.065
A31 Profiling and classifying the behavior of malicious codes
10.1016/j.jss.2014.10.031
A32 Discovering Buffer Overflow Vulnerabilities in the Wild:
An Empirical Study 10.1145/2652524.2652533
A33 Bug characteristics in open source software 10.1007/s10664-013-9258-8
A34 Predicting Vulnerable Software Components via Text
Mining 10.1109/TSE.2014.2340398
A35 Predicting Vulnerable Components: Software Metrics vs
Text Mining 10.1109/ISSRE.2014.32
A36 An Empirical Methodology to Evaluate Vulnerability
Discovery Models 10.1109/TSE.2014.2354037
A37 Total ADS: Automated Software Anomaly Detection
System 10.1109/SCAM.2014.37
A38 Experience Report: An Analysis of Hypercall Handler
Vulnerabilities 10.1109/ISSRE.2014.24
A39 Assessing the Threat Landscape for Software Libraries doi:10.1109/ISSREW.2014.58
A40 Input injection detection in Java code doi:10.1109/ICODSE.2014.7062698
A41 Automated Test Generation from Vulnerability Signatures doi:10.1109/ICST.2014.32
A42 Mining Security Vulnerabilities from Linux Distribution Metadata doi:10.1109/ISSREW.2014.101
A43 Security Benchmarks for Web Serving Systems doi:10.1109/ISSRE.2014.38
A44 A New Technique for Counteracting Web Browser Exploits doi:10.1109/ASWEC.2014.28
A45 Mining SQL injection and cross site scripting vulnerabilities using hybrid program analysis doi:10.1109/ICSE.2013.6606610
A46 Path sensitive static analysis of web applications for remote code execution vulnerability detection doi:10.1109/ICSE.2013.6606611
A47 Program transformations to fix C integers doi:10.1109/ICSE.2013.6606625
A48 Automated software architecture security risk analysis using formalized signatures doi:10.1109/ICSE.2013.6606612
A49 Automatic Generation of Test Drivers for Model Inference of Web Applications doi:10.1109/ICSTW.2013.57
A50 VERA: A Flexible Model-Based Vulnerability Testing Tool doi:10.1109/ICST.2013.65
A51 A scalable approach for malware detection through bounded feature space behavior modeling doi:10.1109/ASE.2013.6693090
A52 When a Patch Goes Bad: Exploring the Properties of Vulnerability-Contributing Commits doi:10.1109/ESEM.2013.19
A53 Model-Based Vulnerability Testing for Web Applications doi:10.1109/ICSTW.2013.58
A54 Vulnerability of the Day: Concrete demonstrations for software engineering undergraduates doi:10.1109/ICSE.2013.6606667
A55 Extracting and Analyzing the Implemented Security Architecture of Business Applications doi:10.1109/CSMR.2013.37
A56 Using software reliability models for security assessment - Verification of assumptions doi:10.1109/ISSREW.2013.6688858
A57 Securing web-clients with instrumented code and dynamic runtime monitoring doi:10.1016/j.jss.2013.02.047
A58 A Large Scale Exploratory Analysis of Software Vulnerability Life Cycles doi:10.1109/ICSE.2012.6227141
A59 Predicting common web application vulnerabilities from input validation and sanitization code patterns doi:10.1145/2351676.2351733
A60 Supporting automated vulnerability analysis using formalized vulnerability signatures doi:10.1145/2351676.2351691
A61 Using Multiclass Machine Learning Methods to Classify Malicious Behaviors Aimed at Web Systems doi:10.1109/ISSRE.2012.30
A62 Fast Detection of Access Control Vulnerabilities in PHP Applications doi:10.1109/WCRE.2012.34
A63 SPaCiTE – Web Application Testing Engine doi:10.1109/ICST.2012.187
A64 Automated detection of client-state manipulation vulnerabilities doi:10.1145/2531921
A65 Securing Opensource Code via Static Analysis doi:10.1109/ICST.2012.123
A66 Structured Binary Editing with a CFG Transformation Algebra doi:10.1109/WCRE.2012.11
A67 CAWDOR: Compiler Assisted Worm Defense doi:10.1109/SCAM.2012.30
A68 Improving VRSS-based vulnerability prioritization using analytic hierarchy process doi:10.1016/j.jss.2012.03.057
A69 SimFuzz: Test case similarity directed deep fuzzing doi:10.1016/j.jss.2011.07.028
A70 Empirical Results on the Study of Software Vulnerabilities (NIER Track) doi:10.1145/1985793.1985960
A71 One Technique is Not Enough: A Comparison of Vulnerability Discovery Techniques doi:10.1109/ESEM.2011.18
A72 Using SQL Hotspots in a Prioritization Heuristic for Detecting All Types of Web Application Vulnerabilities doi:10.1109/ICST.2011.15
A73 An empirical investigation into open source web applications' implementation vulnerabilities doi:10.1007/s10664-010-9131-y
A74 Searching for a Needle in a Haystack: Predicting Security Vulnerabilities for Windows Vista doi:10.1109/ICST.2010.32
A75 Security Trend Analysis with CVE Topic Models doi:10.1109/ISSRE.2010.53
A76 Client-Side Detection of Cross-Site Request Forgery Attacks doi:10.1109/ISSRE.2010.12
A77 Mining security changes in FreeBSD doi:10.1109/MSR.2010.5463289
A78 Detecting recurring and similar software vulnerabilities doi:10.1145/1810295.1810336
A79 Quantifying security risk level from CVSS estimates of frequency and impact doi:10.1016/j.jss.2009.08.023
A80 MUTEC: Mutation-based testing of Cross Site Scripting doi:10.1109/IWSESS.2009.5068458
A81 Improving CVSS-based vulnerability prioritization and response with context information doi:10.1109/ESEM.2009.5314230
A82 Vulnerability analysis for a quantitative security evaluation doi:10.1109/ESEM.2009.5315969
A83 Security of open source web applications doi:10.1109/ESEM.2009.5314215
A84 Towards a Unifying Approach in Understanding Security Problems doi:10.1109/ISSRE.2009.25
A85 On mining data across software repositories doi:10.1109/MSR.2009.5069498
A86 An empirical study of security problem reports in Linux distributions doi:10.1109/ESEM.2009.5315985
A87 Static detection of cross-site scripting vulnerabilities doi:10.1145/1368088.1368112
A88 An Empirical Analysis of the Impact of Software Vulnerability Announcements on Firm Stock Price doi:10.1109/TSE.2007.70712
A89 Efficiency of Vulnerability Disclosure Mechanisms to Disseminate Vulnerability Knowledge doi:10.1109/TSE.2007.26
A90 Software Vulnerability Assessment Version Extraction and Verification doi:10.1109/ICSEA.2007.64
A91 Threat-driven modeling and verification of secure software using aspect-oriented Petri nets doi:10.1109/TSE.2006.40
A92 Modeling Software Vulnerabilities With Vulnerability Cause Graphs doi:10.1109/ICSM.2006.40
A93 Measuring and Enhancing Prediction Capabilities of Vulnerability Discovery Models for Apache and IIS HTTP Servers doi:10.1109/ISSRE.2006.26
A94 Large-scale vulnerability analysis doi:10.1145/1162666.1162671
A95 An Empirical Study on Using the National Vulnerability Database to Predict Software Vulnerabilities doi:10.1007/978-3-642-23088-2_15
A96 Security versus performance bugs: a case study on Firefox doi:10.1145/1985441.1985457
A97 Vulnerability identification and classification via text mining bug databases doi:10.1109/IECON.2014.7049035
A98 Mining Bug Databases for Unidentified Software Vulnerabilities doi:10.1109/HSI.2012.22
A99 Estimating ToE Risk Level Using CVSS doi:10.1109/ARES.2009.151
CRediT authorship contribution statement

Sultan S. Alqahtani: Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing – original draft.

References

Alhazmi, O., Malaiya, Y., 2006. Measuring and enhancing prediction capabilities of vulnerability discovery models for Apache and IIS HTTP servers. In: 2006 17th International Symposium on Software Reliability Engineering, pp. 343–352. doi:10.1109/ISSRE.2006.26.
Alqahtani, S.S., 2021. Dataset. https://github.com/isultane/svdbs_dataset (accessed Dec. 12, 2021).
Alqahtani, S.S., Eghan, E.E., Rilling, J., 2016a. SV-AF — a security vulnerability analysis framework. In: 2016 IEEE 27th International Symposium on Software Reliability Engineering (ISSRE), pp. 219–229. doi:10.1109/ISSRE.2016.12, Oct.
Alqahtani, S.S., Eghan, E.E., Rilling, J., 2016b. Tracing known security vulnerabilities in software repositories – a Semantic Web enabled modeling approach. Sci. Comput. Program. 121, 153–175. doi:10.1016/j.scico.2016.01.005, Jun.
Alqahtani, S.S., Eghan, E.E., Rilling, J., 2017. Recovering semantic traceability links between APIs and security vulnerabilities: an ontological modeling approach. In: 2017 IEEE International Conference on Software Testing, Verification and Validation (ICST), pp. 80–91. doi:10.1109/ICST.2017.15, Mar.
Appelt, D., Nguyen, C.D., Briand, L., 2015. Behind an application firewall, are we safe from SQL injection attacks? In: 2015 IEEE 8th International Conference on Software Testing, Verification and Validation (ICST), pp. 1–10. doi:10.1109/ICST.2015.7102581, Apr.
Atlee, J.M., France, R., Georg, G., Moreira, A., Rumpe, B., Zschaler, S., 2007. Modeling in software engineering. In: 29th International Conference on Software Engineering (ICSE'07 Companion), pp. 113–114. doi:10.1109/ICSECOMPANION.2007.53, May.
Bartlett, R.F., 1993. Linear modelling of Pearson's product moment correlation coefficient: an application of Fisher's z-transformation. Stat 42 (1), 45. doi:10.2307/2348110.
Blei, D.M., Ng, A.Y., Jordan, M.I., 2003. Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022.
Blome, A., Ochoa, M., Li, K., Peroli, M., Dashti, M.T., 2013. VERA: a flexible model-based vulnerability testing tool. In: 2013 IEEE Sixth International Conference on Software Testing, Verification and Validation, pp. 471–478. doi:10.1109/ICST.2013.65, Mar.
Bozic, J., Garn, B., Simos, D.E., Wotawa, F., 2015. Evaluation of the IPO-family algorithms for test case generation in web security testing. In: 2015 IEEE Eighth International Conference on Software Testing, Verification and Validation Workshops (ICSTW), pp. 1–10. doi:10.1109/ICSTW.2015.7107436, Apr.
Cadariu, M., Bouwers, E., Visser, J., van Deursen, A., 2015. Tracking known security vulnerabilities in proprietary software systems. In: IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER), pp. 516–519. doi:10.1109/SANER.2015.7081868, Mar.
Chatzipoulidis, A., Michalopoulos, D., Mavridis, I., 2015. Information infrastructure risk prediction through platform vulnerability analysis. J. Syst. Softw. 106, 28–41. doi:10.1016/j.jss.2015.04.062, Aug.
D'Ambros, M., Lanza, M., 2006. Software bugs and evolution: a visual approach to uncover their relationship. In: Conference on Software Maintenance and Reengineering (CSMR'06), pp. 229–238. doi:10.1109/CSMR.2006.51.
Fang, M., Hafiz, M., 2014. Discovering buffer overflow vulnerabilities in the wild. In: Proceedings of the 8th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement - ESEM '14, pp. 1–10. doi:10.1145/2652524.2652533.
Feinerer, I., 2013. Introduction to the tm package: text mining in R, pp. 1–8. [Online]. Available: https://cran.r-project.org/web/packages/tm/vignettes/tm.pdf.
Frei, S., May, M., Fiedler, U., Plattner, B., 2006. Large-scale vulnerability analysis. In: Proceedings of the 2006 SIGCOMM Workshop on Large-Scale Attack Defense - LSAD '06, pp. 131–138. doi:10.1145/1162666.1162671.
Hindle, A., Bird, C., Zimmermann, T., Nagappan, N., 2015. Do topics make sense to managers and developers? Empir. Softw. Eng. 20 (2), 479–515. doi:10.1007/s10664-014-9312-1, Apr.
Hovsepyan, A., Scandariato, R., Joosen, W., Walden, J., 2012. Software vulnerability prediction using text analysis techniques. In: Proceedings of the 4th International Workshop on Security Measurements and Metrics - MetriSec '12, p. 7. doi:10.1145/2372225.2372230.
Karlsson, M., 2012. The edit history of the National Vulnerability Database and similar vulnerability databases.
Lebeau, F., Legeard, B., Peureux, F., Vernotte, A., 2013. Model-based vulnerability testing for web applications. In: 2013 IEEE Sixth International Conference on Software Testing, Verification and Validation Workshops, pp. 445–452. doi:10.1109/ICSTW.2013.58, Mar.
Massacci, F., Nguyen, V.H., 2014. An empirical methodology to evaluate vulnerability discovery models. IEEE Trans. Softw. Eng. 40 (12), 1147–1162. doi:10.1109/TSE.2014.2354037, Dec.
Mendes, N., Madeira, H., Duraes, J., 2014. Security benchmarks for web serving systems. In: 2014 IEEE 25th International Symposium on Software Reliability Engineering, pp. 1–12. doi:10.1109/ISSRE.2014.38, Nov.
Meneely, A., Srinivasan, H., Musa, A., Tejeda, A.R., Mokary, M., Spates, B., 2013. When a patch goes bad: exploring the properties of vulnerability-contributing commits. In: 2013 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, pp. 65–74. doi:10.1109/ESEM.2013.19, Oct.
Ming, J., Wu, D., Wang, J., Xiao, G., Liu, P., 2016. StraightTaint: decoupled offline symbolic taint analysis. In: Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering - ASE 2016, pp. 308–319. doi:10.1145/2970276.2970299.
Morrison, P., Herzig, K., Murphy, B., Williams, L., 2015. Challenges with applying vulnerability prediction models. In: Proceedings of the 2015 Symposium and Bootcamp on the Science of Security - HotSoS '15, pp. 1–9. doi:10.1145/2746194.2746198.
Murtaza, S.S., Khreich, W., Hamou-Lhadj, A., Bener, A.B., 2016. Mining trends and patterns of software vulnerabilities. J. Syst. Softw. 117, 218–228. doi:10.1016/j.jss.2016.02.048, Jul.
O'Regan, G., 2017. Software engineering tools, pp. 279–295.
Palsetia, N., Deepa, G., Khan, F.A., Thilagam, P.S., Pais, A.R., 2016. Securing native XML database-driven web applications from XQuery injection vulnerabilities. J. Syst. Softw. 122, 93–109. doi:10.1016/j.jss.2016.08.094, Dec.
Panichella, A., Dit, B., Oliveto, R., Di Penta, M., Poshyvanyk, D., De Lucia, A., 2013. How to effectively use topic models for software engineering tasks? An approach based on genetic algorithms. In: Proceedings of the 2013 International Conference on Software Engineering, pp. 522–531.
Pham, V.-T., Ng, W.B., Rubinov, K., Roychoudhury, A., 2015. Hercules: reproducing crashes in real-world application binaries. In: 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, pp. 891–901. doi:10.1109/ICSE.2015.99, May.
Plate, H., Ponta, S.E., Sabetta, A., 2015. Impact assessment for vulnerabilities in open-source software libraries. In: 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME), pp. 411–420. doi:10.1109/ICSM.2015.7332492, Sep.
Raghavan, U.N., Albert, R., Kumara, S., 2007. Near linear time algorithm to detect community structures in large-scale networks. Phys. Rev. E 76 (3), 036106. doi:10.1103/PhysRevE.76.036106, Sep.
Rountev, A., Kagan, S., Gibas, M., 2004. Static and dynamic analysis of call chains in Java. ACM SIGSOFT Softw. Eng. Notes 29 (4), 1. doi:10.1145/1013886.1007514, Jul.
Sampaio, L., Garcia, A., 2016. Exploring context-sensitive data flow analysis for early vulnerability detection. J. Syst. Softw. 113, 337–361. doi:10.1016/j.jss.2015.12.021, Mar.
Schumacher, M., Haul, C., Hurler, M., Buchmann, A., 2000. Data mining in vulnerability databases. Comput. Sci. 12.
Stivalet, B., Fong, E., 2016. Large scale generation of complex and faulty PHP test cases. In: 2016 IEEE International Conference on Software Testing, Verification and Validation (ICST), pp. 409–415. doi:10.1109/ICST.2016.43, Apr.
Stuckman, J., Purtilo, J., 2014. Mining security vulnerabilities from Linux distribution metadata. In: 2014 IEEE International Symposium on Software Reliability Engineering Workshops, pp. 323–328. doi:10.1109/ISSREW.2014.101, Nov.
Theisen, C., Herzig, K., Morrison, P., Murphy, B., Williams, L., 2015. Approximating attack surfaces with stack traces. In: 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, pp. 199–208. doi:10.1109/ICSE.2015.148, May.
Thomas, S.W., 2011. Mining software repositories with topic models. In: Software Engineering (ICSE), 2011 33rd International Conference on, pp. 1138–1139. doi:10.1145/1985793.1986020.
Walden, J., Stuckman, J., Scandariato, R., 2014. Predicting vulnerable components: software metrics vs text mining. In: 2014 IEEE 25th International Symposium on Software Reliability Engineering, pp. 23–33. doi:10.1109/ISSRE.2014.32, Nov.
Wang, R., Liu, P., Zhao, L., Cheng, Y., Wang, L., 2017. deExploit: identifying misuses of input data to diagnose memory-corruption exploits at the binary level. J. Syst. Softw. 124, 153–168. doi:10.1016/j.jss.2016.11.026, Feb.
Wijayasekara, D., Manic, M., McQueen, M., 2014. Vulnerability identification and classification via text mining bug databases. In: IECON 2014 - 40th Annual Conference of the IEEE Industrial Electronics Society, pp. 3612–3618. doi:10.1109/IECON.2014.7049035, Oct.
Wijayasekara, D., Manic, M., Wright, J.L., McQueen, M., 2012. Mining bug databases for unidentified software vulnerabilities. In: 2012 5th International Conference on Human System Interactions, pp. 89–96. doi:10.1109/HSI.2012.22, Jun.
Zaman, S., Adams, B., Hassan, A.E., 2011. Security versus performance bugs. In: Proceedings of the 8th Working Conference on Mining Software Repositories - MSR '11, p. 93. doi:10.1145/1985441.1985457.
Zhang, S., Caragea, D., Ou, X., 2011. An empirical study on using the National Vulnerability Database to predict software vulnerabilities, pp. 217–231.