
Paper-1

References: Lomio, Francesco, et al. "Just-in-time software vulnerability detection: Are we there
yet?." Journal of Systems and Software 188 (2022): 111283.

Abstract: Software vulnerabilities are weaknesses in source code that might be exploited to cause
harm or loss. Previous work has proposed a number of automated machine learning approaches to
detect them. Most of these techniques work at release-level, meaning that they aim at predicting the
files that will potentially be vulnerable in a future release. Yet, researchers have shown that a commit-
level identification of source code issues might better fit the developer’s needs, speeding up their
resolution.

Dataset: Nine projects accounting for 8991 commits (see Methodology); the machine learners experimented with include Decision Tree and Random Forest.

Methodology: We perform an empirical study where we consider nine projects accounting for
8991 commits and experiment with eight machine learners built using process, product, and textual
metrics.
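
To make the setup concrete, below is a minimal sketch of a commit-level (just-in-time) prediction pipeline in the spirit of the study, using Random Forest, one of the learners the notes mention. The CSV file name and the metric column names are hypothetical placeholders, not the paper's actual feature set.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Hypothetical commit-level metrics table: one row per commit, with process
# metrics (churn, number of developers) and product metrics (LOC, complexity).
df = pd.read_csv("commits.csv")  # placeholder file, not the paper's data
X = df[["lines_added", "lines_deleted", "num_devs", "loc", "complexity"]]
y = df["is_vulnerability_inducing"]  # 1 if the commit introduced a vulnerability

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))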

Pros:

Cons:

Conclusion/future work: Further research should focus on just-in-time vulnerability detection, especially with respect to the introduction of smart approaches for feature selection and training strategies.

Paper-2

References: Thirumalaivasan, D., M. Karmegam, and K. Venugopal. "AHP-DRASTIC: software for specific aquifer vulnerability assessment using DRASTIC model and GIS." Environmental Modelling & Software 18.7 (2003): 645-656.

Abstract: A software package AHP-DRASTIC has been developed to derive ratings and weights
of modified DRASTIC model parameters for use in specific aquifer vulnerability assessment studies.
The software is integrated with ArcView Geographical Information System (GIS) software for
modelling aquifer vulnerability, to predict areas which are more likely than others to become
contaminated as a result of activities at or near the land surface. The ranges of a few of the DRASTIC
model parameters have been modified to adapt to local hydrogeologic settings. Analytic Hierarchy
Process (AHP) has been used to compute the ratings and weights of the criteria and sub-criteria of all
parameters used in the DRASTIC model. The output from AHP generates an MS Access database for
these parameters, which is then interfaced with ArcView using Avenue Scripts. AHP-DRASTIC is
aimed at providing a user-friendly GUI interfaced with GIS for the estimation of weights and ranks of
the thematic layers used for aquifer vulnerability assessment. Contingency table analysis indicates that
all wells in low and high vulnerability category have concentrations less than 10 ppm and more than
10 ppm, respectively. The model is validated with groundwater quality data and the results have
shown strong relationship between DRASTIC Specific Vulnerability Index and nitrate-as-nitrogen
concentrations with a correlation coefficient of 0.84 at 0.01 level.

Dataset:

Methodology: The methodology to implement AHP involves intensive computing effort as the
number of criteria and subcriteria increases. In this context, it was decided to develop a Graphical
User Interface (GUI) using Visual Basic Application (VB version 6.0) for implementing the AHP
methodology.
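
For illustration, the core AHP computation the GUI automates can be sketched in a few lines: the priority weights are the normalized principal eigenvector of a pairwise comparison matrix, and a consistency ratio checks the judgements. The 3x3 matrix and the ratings below are made-up examples, not values from the paper.

import numpy as np

# Hypothetical 3x3 pairwise comparison matrix on Saaty's 1-9 scale.
A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 3.0],
              [1/5, 1/3, 1.0]])

eigvals, eigvecs = np.linalg.eig(A)
k = int(np.argmax(eigvals.real))
weights = eigvecs[:, k].real
weights = weights / weights.sum()        # normalized priority weights

n = A.shape[0]
CI = (eigvals.real[k] - n) / (n - 1)     # consistency index
CR = CI / 0.58                           # random index RI = 0.58 for n = 3
print(weights, CR)                       # CR < 0.1 is conventionally acceptable

# DRASTIC-style index for one grid cell: weighted sum of parameter ratings.
ratings = np.array([7.0, 5.0, 9.0])      # made-up ratings for the 3 criteria
print(float(weights @ ratings))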

Pros: The main advantage of the AHP-DRASTIC GUI is that it can be seamlessly integrated with any GIS software running on the Microsoft platform, using the customisation language of that GIS software.

Cons: The AHP decomposes the given decision-making problem into a hierarchy structure, and the elements at a particular hierarchy level must be compared in pairs, which involves intensive computing effort as the number of criteria and sub-criteria increases.

Conclusion/future work: The methodology to implement AHP involves intensive computing effort as the number of criteria and sub-criteria increases. In this context, it was decided to develop a Graphical User Interface (GUI) using Visual Basic (VB version 6.0) for implementing the AHP methodology, addressing local issues and allowing a refined representation of local hydrogeologic settings. In this study, the ranges of model parameters, namely depth-to-water table, topography, impact of vadose zone and hydraulic conductivity, were modified for adaptation to local conditions. The developed methodology combines AHP and GIS for the determination of the DRASTIC Specific Vulnerability Index (DSVI). The seamless integration of the GUI for AHP with ArcView GIS provides the user with ready-made solutions for aquifer vulnerability assessments. The main advantage of the AHP-DRASTIC GUI is that it can be seamlessly integrated with any GIS software running on the Microsoft platform, using the customisation language of that GIS software.

Paper-3

References: Hanif, Hazim, et al. "The rise of software vulnerability: Taxonomy of software vulnerabilities
detection and machine learning approaches." Journal of Network and Computer Applications 179 (2021):
103009.

Abstract: The detection of software vulnerability requires critical attention during the development
phase to make it secure and less vulnerable. Vulnerable software always invites hackers to perform
malicious activities and disrupt the operation of the software, which leads to millions in financial
losses to software companies. In order to reduce the losses, there are many reliable and effective
vulnerability detection systems introduced by security communities aiming to detect the software
vulnerabilities as early as the development or testing phases. To summarise software vulnerability detection systems, existing surveys have discussed conventional and data mining approaches.

Dataset: SARD and NVD datasets, across which Multilayer Perceptron (MLP), Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) models are compared; the dataset categories discussed include labelled, gold-standard, and synthetic datasets.

Methodology: This study collects and analyses past papers from 2011 until 2020 that focus on detecting software vulnerabilities across various problems, programming languages and source codes. We also analyse papers using machine learning approaches to detect software vulnerabilities, since this study aims to investigate the implementation of these approaches in software vulnerability detection in more depth.

Pros:

Cons: Prior studies highlighted the disadvantages of static and dynamic analysis that lead to a high percentage of errors and false positives when detecting software vulnerabilities. Similarly, Seokmo Kim et al. (2016) also mentioned the low detection accuracy problem of static analysis techniques for vulnerability detection.

Conclusion/future work: In conclusion, software vulnerability detection holds an essential role in software security research. It is instrumental during the software development phase, while developers are still building the software. It also promotes in-development vulnerability detection and allows developers to reduce the number of vulnerability patches needed after the production phase. In addition, software vulnerability detection grants big corporations ease of mind, as they need to be less concerned about the security state of their software and can focus on more complex decision-making tasks. As such, several methodologies and approaches have been proposed by the research and industry communities to foster the development of more sustainable, reliable, robust and effective vulnerability prediction frameworks. This has allowed researchers to produce numerous review papers that discuss the domain of software vulnerability detection. However, existing review papers focus on conventional approaches and methodologies while paying less attention to the primary research problems and the newer machine learning approaches in software vulnerability detection.

Paper-5

References: Kumar, Manoj, and Arun Sharma. "An integrated framework for software vulnerability
detection, analysis and mitigation: an autonomic system." Sādhanā 42 (2017): 1481-1493.

Abstract: Nowadays, the number of software vulnerability incidents and the losses due to the occurrence of software vulnerabilities are growing exponentially. The existing security strategies and the vulnerability detection and remediation approaches are not intelligent, automated, or self-managed, and are not competent to combat vulnerabilities and security threats or to provide a secured self-managed software environment to organizations. Hence, there is a strong need to devise an intelligent and automated approach to optimize security and prevent the occurrence of vulnerabilities, or to mitigate them. Autonomic computing is a nature-inspired, self-management-based computational model. In this paper, an autonomic-computing-based integrated framework is proposed to detect, raise an alarm for, assess, classify, prioritize, mitigate and manage software vulnerabilities automatically. The proposed framework uses a knowledge base and inference engine, which automatically takes remediating actions on future occurrences of software security vulnerabilities through self-configuration, self-healing, self-prevention and self-optimization as per the needs. The proposed framework is beneficial to industry and society in various aspects because it is an integrated, cross-concern and intelligent framework and provides a more secured self-managed environment to organizations. It reduces security risks and threats, as well as monetary and reputational loss. It can be embedded easily in existing software and incorporated or implemented as an inbuilt integral component of new software during software development.
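
The paper describes this framework only at the architecture level; as a rough illustration of the monitor-analyse-plan-execute style of loop that a knowledge base plus inference engine implies, here is a toy sketch. The CVE identifiers, knowledge-base entries and remediation actions are all invented placeholders.

from dataclasses import dataclass

@dataclass
class Vulnerability:
    cve_id: str
    severity: float   # e.g. a CVSS-like score
    component: str

# Invented knowledge base mapping known vulnerabilities to remediating actions.
KNOWLEDGE_BASE = {
    "CVE-0000-0001": "apply_patch",
    "CVE-0000-0002": "disable_feature",
}

def monitor():
    # A real system would poll scanners and sensors; this returns a fixed example.
    return [Vulnerability("CVE-0000-0001", 9.8, "auth-module")]

def analyze_and_plan(vulns):
    # Prioritize by severity, then look up a remediation in the knowledge base.
    for v in sorted(vulns, key=lambda v: v.severity, reverse=True):
        yield v, KNOWLEDGE_BASE.get(v.cve_id, "raise_alarm")

for v, action in analyze_and_plan(monitor()):
    print(f"[{v.component}] {v.cve_id} (severity {v.severity}): {action}")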

Dataset:

Methodology:

Pros: The proposed method improves detection efficiency and accuracy.

Cons: The authors point out that existing static vulnerability detection methods have high false positive and false negative rates. Hence, they used clustering technology to mine patterns from the set of vulnerability sequences and constructed a Vulnerability-Pattern Library (VPL) to improve the efficiency of the proposed method. Experimental results show that the proposed method has lower false positive and false negative rates.

Conclusion/future work: The software industry has made several efforts to develop vulnerability-free software systems but has not achieved the objective of developing software systems 100% free from vulnerabilities. Hence, the software industry is looking for an adequate alternative remediation approach to combat security threats, to reduce security risks and irreparable loss, and to improve the performance of software systems. In this paper, an autonomic-computing-based integrated framework is proposed to identify, analyse, classify and prioritize software vulnerabilities, to analyse risks, impacts on assets and consequences, and to mitigate and manage the vulnerabilities.

Though the proposed framework provides an inbuilt autonomic facility to existing and new software systems and is beneficial to organizations and society, the commercial implementation of the proposed integrated framework has not been carried out and is pending. For global usability and scalability, it should be developed as cross-cutting, platform-independent software using the aspect-oriented or component-oriented software development paradigm.

Paper-6

References: Baptista, Tiago, Nuno Oliveira, and Pedro Rangel Henriques. "Using machine learning for
vulnerability detection and classification." 10th Symposium on Languages, Applications and Technologies
(SLATE 2021). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2021.

Abstract: The work described in this paper aims at developing a machine-learning-based tool for automatic identification of vulnerabilities in programs (source, high-level code) that uses an abstract syntax tree representation. It is based on Fast Scan, using the code2seq approach. Fast Scan is a recently developed system capable of detecting vulnerabilities in source code using machine learning techniques. Nevertheless, Fast Scan is not able to identify the vulnerability type. In the presented work the main goal is to go further and develop a method to identify specific types of vulnerabilities. As will be shown, the goal is achieved by optimizing the model's hyperparameters, changing the method of preprocessing the input data, and developing an architecture that brings together multiple models to predict different specific vulnerabilities. The preliminary results obtained from the training stage are very promising. The best F1 score obtained is 93%, with a precision of 90% and accuracy of 85%, according to the performed tests, for a model trained to predict vulnerabilities of the injection type.

Dataset: The datasets play a major part in this project, because all the developed work has no utility unless there is enough good data to train the models. The first dataset, referenced from now on as dt01, is composed of 43 different projects; for each project there is the original source code and an XML file with detected vulnerabilities. This XML file was provided by Checkmarx and is the output of their static analysis tool CxSAST, with one important detail: the output was validated by humans, which means that there are no false positives.
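
A small sketch of how such an XML report could be turned into training labels; the element and attribute names below (Result, FileName, Line, QueryName) are guesses, since the notes do not reproduce the actual CxSAST report schema.

import xml.etree.ElementTree as ET

def load_labels(xml_path):
    # Map (file, line) to the reported vulnerability type.
    labels = {}
    for result in ET.parse(xml_path).getroot().iter("Result"):
        key = (result.get("FileName"), int(result.get("Line")))
        labels[key] = result.get("QueryName")  # e.g. "SQL_Injection"
    return labels

labels = load_labels("project01_scan.xml")  # hypothetical report file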

Methodology:

Pros: This representation has significant advantages over simple code tokenisation for code comparison, namely when trying to find two methods that have the same functionality but different implementations. Having the AST enables a better comparison, since both functions' paths will be similar, as represented in Figure 2: the functions will have different token representations but similar path representations, differing only in the Block statement.
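
This advantage can be demonstrated directly: two functions with the same structure but different identifiers produce identical node-type paths through the AST, even though their token sequences differ. The sketch below uses Python's own ast module purely as an illustration; it is not the code2seq path extraction the paper uses.

import ast

def root_to_leaf_paths(tree):
    # Collect node-type paths from the root to each leaf of the AST.
    paths = []
    def walk(node, prefix):
        prefix = prefix + [type(node).__name__]
        children = list(ast.iter_child_nodes(node))
        if not children:
            paths.append("->".join(prefix))
        for child in children:
            walk(child, prefix)
    walk(tree, [])
    return paths

# Same functionality, different identifiers: the token sequences differ,
# but the structural paths are identical.
src_a = "def add(a, b):\n    return a + b"
src_b = "def plus(x, y):\n    return x + y"
print(root_to_leaf_paths(ast.parse(src_a)) == root_to_leaf_paths(ast.parse(src_b)))  # True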

Cons: An injection attack refers to an attack where untrusted data is supplied as input to a program. This input is then processed and changes the application's expected behaviour. Normally, this vulnerability is related to insufficient user input validation. Since this is a well-known exploit and one of the oldest, with automatic tools that allow it to be exploited without much knowledge, it is one of the most common and dangerous vulnerabilities.

Conclusion/future work: This section closes the paper, summarising the outcomes reached so far. The first section contains the context on vulnerability detection and the motivation and objectives of the project. The second section is a literature review on vulnerability detection, whose outcomes provided the foundations for the proposed approach. The third section presents and discusses the working proposal. The fourth section explains the development and includes the presentation of the dataset used for training as well as the hardware details. The fifth section discusses the implementation. Finally, the sixth section analyses the training results obtained when testing the models. Taking into account the results from this first experiment, it becomes clear that the hyperparameter optimization improved the results, increasing precision and the other metrics. Training only for a specific vulnerability may also have had an influence, since training for a narrower purpose is more effective, as in this case. While Fast Scan attempts to predict the presence of many types of vulnerabilities, the new Fast Scan aims at creating models that each predict a single type of vulnerability, gathering the parts into a global analyzer in a final system.

Paper-7

References: Wei, Wang. "Survey of Software Vulnerability Discovery Technology." 2017 7th International Conference on Social Network, Communication and Education (SNCE 2017). Atlantis Press, 2017.

Abstract: The 21st century is the information age. The rapid development of computer technology supports the rapid development of the information age. With the rapid spread of computers and networks, more and more software products play an important role in people's daily life. In computer security [1], a vulnerability [2] is a weakness which allows an attacker to reduce a system's information assurance. Vulnerability is the intersection of three elements: a system susceptibility or flaw, attacker access to the flaw, and attacker capability to exploit the flaw. Due to software developers' negligence in the development process or to programming language limitations, software products often have security and functional flaws, known as software vulnerabilities, which damage the software. Software vulnerability discovery aims at discovering vulnerabilities that already exist in software, so that developers can patch them and eliminate the damage brought to software products. Vulnerability discovery, in the field of information security, is now becoming increasingly important. This paper mainly introduces the main methods of vulnerability discovery.

Dataset:

Methodology: White-box analysis is an analysis method that works with the binary code of the target software, or with source code recovered from binary code by reverse engineering [4]. Black-box analysis is an analysis method that works without binary code, in which the analyst controls program input and observes program output to gather information for discovering vulnerabilities [5]. Gray-box analysis combines the above two methods to improve the efficiency and quality of vulnerability discovery. Concrete techniques include manual testing, fuzzing, static analysis and dynamic analysis.
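
Of the techniques listed, fuzzing is the easiest to show in miniature: mutate a valid seed input at random and watch for crashes. The toy parser below and its missing bounds check are invented for illustration; real fuzzers (e.g. coverage-guided ones) are far more sophisticated.

import random

def mutate(seed, n_flips=4):
    # Flip a few random bits in a copy of the seed input.
    data = bytearray(seed)
    for _ in range(n_flips):
        i = random.randrange(len(data))
        data[i] ^= 1 << random.randrange(8)
    return bytes(data)

def parse_record(data):
    # Toy target: the first byte declares the payload length.
    length = data[0]
    payload = data[1:]
    return payload[length - 1]  # missing bounds check: crashes on bad lengths

seed = b"\x04abcd"  # well-formed input: length 4, payload "abcd"
for _ in range(1000):
    candidate = mutate(seed)
    try:
        parse_record(candidate)
    except IndexError:
        print("crashing input found:", candidate)
        break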

Pros: These methods are mainly applicable to programs with an interface, or to programs without source code (including binaries processed by reverse engineering). They can yield fewer false positives, offer high efficiency, and are able to detect a variety of vulnerabilities.

Cons: These methods highly depend on the analyst's experience and skills. False negatives occur and the techniques are not widely used; the result set to analyse is large and the false positive rate is high; the process is usually not automatic.

Conclusion/future work: With the popularity of computer software in people's daily life, more and more countries and people are concerned about the security of software and its vulnerabilities. Vulnerability discovery technology is an important aspect of the information security field. Born out of software testing theory and software development debugging technology, it can greatly improve the security of software. But vulnerability discovery is a double-edged sword, as it has also become a mainstream technology for hackers to attack software. All in all, the development prospects of vulnerability discovery technology are broad: as information security receives more and more attention and software development technology becomes more advanced, new analysis methods will follow.

Paper-8

References: Li, Xin, et al. "Automated software vulnerability detection based on hybrid neural
network." Applied Sciences 11.7 (2021): 3201.

Abstract: Vulnerabilities threaten the security of information systems. It is crucial to detect and
patch vulnerabilities before attacks happen. However, existing vulnerability detection methods suffer
from long-term dependency problems, out-of-vocabulary issues, bias towards global or local features, and coarse detection granularity. This paper proposes an automatic vulnerability detection framework for
source code based on a hybrid neural network. First, the inputs are transformed into an intermediate
representation with explicit structure information using lower level virtual machine intermediate
representation (LLVM IR) and backward program slicing. After the transformation, the size of
samples and the size of vocabulary are significantly reduced. A hybrid neural network model is then
applied to extract high-level features of vulnerability, which learns features both from convolutional
neural networks (CNNs) and recurrent neural networks (RNNs). The former is applied to learn local
vulnerability features, such as buffer size. Furthermore, the latter is utilized to learn global features,
such as data dependency. The extracted features are made up of concatenated outputs of CNN and
RNN. Experiments are performed to validate our vulnerability detection method. The results show
that our proposed method achieves excellent results with F1-scores of 98.6% and accuracy of 99.0%
on the SARD dataset. It outperforms state-of-the-art methods.
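
A minimal PyTorch sketch of the hybrid idea: a CNN branch for local features and a GRU branch for long-range dependencies, with their outputs concatenated before classification. The layer sizes, the GRU choice and the tokenized-input format are assumptions; the paper's exact architecture, which operates on LLVM IR program slices, is not reproduced here.

import torch
import torch.nn as nn

class HybridDetector(nn.Module):
    # CNN branch for local features, GRU branch for long-range dependencies;
    # their outputs are concatenated before the final classifier.
    def __init__(self, vocab_size=5000, emb=128, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.conv = nn.Conv1d(emb, hidden, kernel_size=3, padding=1)
        self.rnn = nn.GRU(emb, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(hidden + 2 * hidden, 2)     # vulnerable / safe

    def forward(self, tokens):                          # tokens: (batch, seq_len)
        x = self.embed(tokens)                          # (batch, seq, emb)
        local = self.conv(x.transpose(1, 2)).max(dim=2).values
        _, h = self.rnn(x)                              # h: (2, batch, hidden)
        longrange = torch.cat([h[0], h[1]], dim=1)
        return self.fc(torch.cat([local, longrange], dim=1))

model = HybridDetector()
logits = model(torch.randint(0, 5000, (8, 200)))        # 8 sequences of 200 token ids
print(logits.shape)                                     # torch.Size([8, 2])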

Dataset: SARD, on which the method outperforms the state of the art; this is attributed to the proposed intermediate representation and the hybrid neural network, which captures both long-term dependencies and local details. The VDISC dataset is generated by a traditional static detection method; however, static detection itself suffers from a high false positive rate. The NVD dataset only provides the difference between the vulnerability sample and the patch.

Methodology:

Pros: In theory, the approach is not tied to one language: although it is currently applied to detect vulnerabilities in source code written in C, the authors note that it can be applied to other programming languages as well, and list this as interesting future work.

Cons: The approach is only evaluated on the SARD dataset, due to the lack of labeled vulnerability datasets, and is limited to in-project vulnerability detection. The lack of labeled datasets is an open problem restricting the development of automated vulnerability detection technology, and existing vulnerability datasets suffer from wrong labels and coarse-grained vulnerability descriptions.

Conclusion/future work: In this paper, a novel approach that detects source code
vulnerabilities automatically is proposed. The programs are transformed into intermediate
representations first. LLVM IR and backward program slicing are utilized. The transformed
intermediate representation not only eliminates irrelevant information but also represents the
vulnerabilities with explicit dependency relations. Then, a hybrid neural network is proposed to learn
both local and long-term features of a vulnerability. A prototype has been implemented, and the experiment
results show that our approach outperforms state-of-the-art methods.

Paper-9

References: Eberendu, Adanma Cecilia, et al. "A systematic literature review of software vulnerability
detection." European Journal of Computer Science and Information Technology 10.1 (2022): 23-37.

Abstract: This study provided a systematic literature review of software vulnerability detection (SVD) by searching the ACM and IEEE databases for related literature. Using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flowchart, a total of 55 studies published in selected IEEE and ACM journals and conference proceedings from 2015 to 2021 were reviewed. The objective is to identify, select and critically evaluate research works carried out on software vulnerability detection. The selected articles were grouped into 7 categories across various vulnerability detection evaluation criteria: neural network (5 papers), machine learning (11 papers), static and dynamic analysis (8 papers), code clone (3 papers), classification (4 papers), models (3 papers), and frameworks (6 papers). Fifteen articles did not fall into any of these 7 categories and were placed in an 'others' category, as they used different criteria to implement vulnerability detection. The result showed that many researchers used machine learning strategies to detect vulnerabilities in software, since large volumes of data can be reviewed easily with machine learning. Although many systems have been developed for detecting software vulnerabilities, none is able to show the type of vulnerability detected.

Dataset:

Methodology: Search strategy, Selection criteria, Quality Assessment, Data Extraction, Analysis
and Results

Pros:

Cons: Some developed techniques for detecting vulnerabilities were unable to show the type of vulnerability detected, and this is an issue for discussion in subsequent studies. The result of this systematic literature review was evaluated using PRISMA guidelines (Page et al, 2021).

Conclusion/future work: This systematic review of work done on software vulnerability detection covered different techniques, such as machine learning and deep learning, neural networks, binary code clone, static and dynamic analysis, and method and framework analysis, used to detect vulnerabilities in software products. Based on this systematic literature review, machine learning and deep learning approaches were the most used to detect vulnerabilities in software, because nowadays every domain is driven by machine learning applications and many researchers are venturing into it. Static and dynamic analysis was also used extensively. Some developed techniques were unable to show the type of vulnerability detected, and this is an issue for discussion in subsequent studies. The result of this systematic literature review was evaluated using PRISMA guidelines (Page et al, 2021).

Paper-10

References: Harzevili, Nima Shiri, et al. "A Survey on Automated Software Vulnerability Detection Using
Machine Learning and Deep Learning." arXiv preprint arXiv:2306.11673 (2023).

Abstract: Software vulnerability detection is critical in software security because it identifies potential bugs in software systems, enabling immediate remediation and mitigation measures to be
implemented before they may be exploited. Automatic vulnerability identification is important
because it can evaluate large codebases more efficiently than manual code auditing. Many Machine
Learning (ML) and Deep Learning (DL) based models for detecting vulnerabilities in source code
have been presented in recent years. However, a survey that summarises, classifies, and analyses the
application of ML/DL models for vulnerability detection is missing. It may be difficult to discover
gaps in existing research and potential for future improvement without a comprehensive survey. This
could result in essential areas of research being overlooked or under-represented, leading to a skewed
understanding of the state of the art in vulnerability detection. This work addresses that gap by
presenting a systematic survey to characterize various features of ML/DL-based source code level
software vulnerability detection approaches via five primary research questions (RQs). Specifically,
our RQ1 examines the trend of publications that leverage ML/DL for vulnerability detection,
including the evolution of research and the distribution of publication venues. RQ2 describes
vulnerability datasets used by existing ML/DL-based models, including their sources, types, and
representations, as well as analyses of the embedding techniques used by these approaches. RQ3
explores the model architectures and design assumptions of ML/DL-based vulnerability detection
approaches. RQ4 summarises the type and frequency of vulnerabilities that are covered by existing
studies. Lastly, RQ5 presents a list of current challenges to be researched and an outline of a potential
research roadmap that highlights crucial opportunities for future work

Dataset: The quality of datasets can be assessed by different factors such as the source of data, data
size and scale, data types, and preprocessing steps performed on data. For example, inappropriate
preprocessing (representation) of data may result in poor performance of DL models [121]. In this section, we examine the data used in vulnerability detection studies and conduct a comprehensive analysis of the data source, data type, and data representation steps.
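
As a concrete example of the representation step this section refers to, the sketch below maps source tokens to integer ids and then to dense vectors; the crude regex lexer and the random (untrained) embedding matrix are for illustration only, standing in for the learned embeddings the surveyed approaches use.

import re
import numpy as np

code = "if (len > buf_size) { copy(buf, src, len); }"
tokens = re.findall(r"[A-Za-z_]\w*|\S", code)   # crude lexer, illustration only

vocab = {tok: i for i, tok in enumerate(dict.fromkeys(tokens))}
ids = np.array([vocab[t] for t in tokens])      # integer-id representation

rng = np.random.default_rng(0)
embedding = rng.normal(size=(len(vocab), 16))   # random init; trained in practice
vectors = embedding[ids]                        # (num_tokens, 16) model input
print(vectors.shape)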

Methodology: Sources of Information, Search Terms, Study Selection, Study Quality Assessment, Selection Verification.

Pros: Automation: Automation is a significant advantage. ML models can automatically scan and analyze large codebases, network traffic logs, or system configurations, flagging potential vulnerabilities without requiring human intervention for each individual case [19]. This automation speeds up the detection process, allowing security teams to focus on verifying and mitigating vulnerabilities rather than on manual analysis.

Performance: ML/DL approaches offer faster analysis. Traditional vulnerability detection methods rely on manual inspection or the application of predefined rules [7, 18, 126, 127, 130]. In contrast, ML/DL approaches can evaluate enormous volumes of data in parallel and generate predictions quickly, dramatically shortening the time necessary to find vulnerabilities.

Detection effectiveness: ML/DL models can uncover previously unknown vulnerabilities, commonly known as zero-day vulnerabilities [10]. By learning patterns and generalizing from labeled data, these models may uncover signs of vulnerabilities even when they have not been specifically trained on them. This capability improves the overall security posture by helping to identify and address unknown weaknesses in software before they are exploited by attackers [2].

Cons:

Conclusion/future work: In this study, we conducted a systematic survey of 67 primary studies using ML/DL models for software security vulnerability detection. We collected the papers from different venues, including 25 conferences and 12 journals. Our review is organised around five major research questions and a set of sub-research questions, devised to comprehensively cover various dimensions of software vulnerability detection. Our analysis of the primary studies indicated a booming trend in the use of ML/DL models for software vulnerability detection. Our analysis of the data sources of the primary studies revealed that 65.7% of studies use benchmark data for software vulnerability detection. We also found 6 broad categories of DL models, along with 14 classic ML models, used in software vulnerability detection. The categories of DL models are recurrent models, graph models, attention models, convolutional models, general models, and transformer models. RNNs are by far the most popular DNNs in software vulnerability detection, and RNNs with LSTM cells are the most popular network architectures among recurrent models, accounting for 8 primary studies. In the category of graph models, GGNN is the most popular DL model, used by 4 primary studies. Our results on vulnerability types reveal that the most frequent type of vulnerability covered in existing studies is Improper Control of a Resource Through its Lifetime (CWE-664), accounting for 18 primary studies. In conclusion, we have identified a collection of ongoing challenges that necessitate further exploration in future studies involving the utilization of ML/DL models for software vulnerability detection.
