
Report
on
Artificial Intelligence's Challenges in Software Testing
to be submitted by

SHIVANI
(00716424821)

Under the supervision of

Prof. Dr. R L Ujjwal

for the award of the degree of

MASTER OF TECHNOLOGY - 3rd SEMESTER


in
Computer Science & Engineering

GURU GOBIND SINGH INDRAPRASTHA UNIVERSITY
Sector-16C, Dwarka, New Delhi-110078
(SESSION 2021-22)
Table of Contents

Abstract
1. Introduction
   1.1 Overview
2. Literature Review
   2.1 Issues & Challenges
3. Conclusion
ABSTRACT

Applications that use AI and ML technologies have become increasingly popular in recent years. Software testing is as essential to an effective AI/ML application as it is to conventional software. However, the methodology employed in AI/ML development differs greatly from conventional development, and these differences give rise to numerous difficulties in software testing. This paper identifies and discusses several significant issues that software testers encounter when working with AI/ML applications. The study has important implications for further investigation: each of the issues raised is well suited to further research and has great potential to reveal new software testing approaches and strategies for AI/ML systems.
1. INTRODUCTION

1.1 Overview of Artificial Intelligence in Software Testing

Modern software systems are becoming more sophisticated, which makes the use of advanced testing approaches necessary. Manual software testing performs poorly in terms of labour demands, execution speed, and test coverage. The level of autonomy in software development is still very low compared to more advanced domains such as self-driving cars or voice-assisted control, but it is moving in the right direction towards autonomous testing. Applying artificial intelligence to software testing technologies is intended to simplify the software development lifecycle. By leveraging logic, problem-solving, and, in some cases, machine learning, AI can be used to automate and reduce the number of routine and repetitive tasks in development and testing. Software testing is the most common method of validating software against established criteria, accounting for roughly half of development cost and time; changes in system infrastructure could save up to a third of these expenditures.

Artificial Intelligence (AI) and Machine Learning (ML) concepts have, in recent decades, been successfully employed to explore the potential of data in several domains. Although AI approaches are well known for providing predictive models that can be used for a variety of engineering objectives, they are still not widely employed for verifying the correctness of Systems Under Test (SUT). While it is logical to apply AI to software testing, the lack of an oracle (a mechanism that distinguishes correct from incorrect SUT behaviour) is currently a bottleneck. Because the oracle problem has not been automated, AI is still rarely used for fault detection in the SUT. Regression testing is the sole exception, where the SUT's correct behaviour can be inferred from the behaviour of a previous version. Software testing has nevertheless become increasingly dependent on AI as a result of the growing maturity of AI algorithms and techniques, together with advances in computer hardware that have increased processing speed and memory. Software testing is a crucial step in the software development lifecycle that ensures business requirements are met and customers are satisfied.
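
As a concrete illustration of the one case where an oracle is available, the sketch below shows a regression-style check in which outputs recorded from a previous, trusted version of the SUT serve as the expected values for the current version. The function names and file format are hypothetical placeholders for illustration, not an existing framework.

# A minimal sketch (an assumption-laden illustration, not an existing tool)
# of a regression-test oracle: outputs recorded from the previous, trusted
# version of the SUT act as the expected values for the current version.
import json

def load_baseline(path):
    """Load {test_case_id: expected_output} recorded from the previous version."""
    with open(path) as f:
        return json.load(f)

def regression_check(run_sut, test_inputs, baseline_path):
    """run_sut: callable that invokes the current SUT; test_inputs: {case_id: input}."""
    baseline = load_baseline(baseline_path)
    failures = []
    for case_id, test_input in test_inputs.items():
        actual = run_sut(test_input)
        if actual != baseline.get(case_id):
            failures.append((case_id, baseline.get(case_id), actual))
    return failures  # an empty list means no regressions were detected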
2. LITERATURE REVIEW
Artificial intelligence (AI) in software testing includes logical reasoning, cognitive automation, machine learning, natural language processing, and analytics. Cognitive automation uses a variety of technological tools, including data mining, machine learning, natural language processing, text analytics, and semantic technology. RPA (Robotic Process Automation), for instance, is one of the components connecting AI and cognitive computing. Given the advent of AI, it only makes sense that both software testing and software development would benefit from its use. Although the application of AI in testing is still in its infancy, thought leaders in the industry are already talking about self-generating, self-executing, and self-adapting testing frameworks.

2.1 Issues and Challenges of AI in Software Testing


Software testing for artificial intelligence faces a variety of difficulties due to a
lack of technological know-how and research findings. The difficulties have
been outlined below:

1. Identifying Test Data

Before being used in the real world, an artificial intelligence (AI) model must first be trained and evaluated. Model training and testing are usually done by a data scientist or engineer rather than a software testing specialist, which raises questions about the data itself. Has production data, for instance, been acquired at a specific time of day to serve as training data? Does this represent typical use? Is the data set too large or too small? When context conditions are altered, uncertainty has been observed in system outputs, responses, or actions for identical input test data, and changes to the test or training data sets have produced similar accuracy problems.

Researchers have reportedly developed deep learning systems to identify gender in photos of human faces. The challenging aspect here is that how precisely such models perform depends heavily on the data used to train them. More than 2,000 distinct models were trained and tested using a comparable deep learning architecture, and throughout this procedure it was discovered that the models' ability to accurately identify gender in difficult image sets varied widely.

In general, these models appeared to have more difficulty with women: in six of the eight models examined, men were identified more accurately than women (including the model developed using the largest collection of training data). As with their general accuracy, it is not entirely evident or predictable why some models are better at identifying men than women, or vice versa; two of the models, in fact, distinguished women much more precisely than men. Given these findings, AI algorithms may not always predict the right outcomes, as the data-slicing sketch below illustrates.
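
As a hedged illustration of how such accuracy gaps can be surfaced during testing, the sketch below evaluates a trained classifier separately on each slice of the test data (for example, male and female faces) instead of relying on a single aggregate accuracy figure. The column names ("label", "group"), the feature columns, and the model object are assumptions for illustration only, not the study's actual code.

# A minimal sketch: per-slice accuracy reporting to expose group-level gaps
# that an aggregate accuracy figure would hide.
import pandas as pd
from sklearn.metrics import accuracy_score

def accuracy_by_group(model, test_df, feature_cols):
    """Report accuracy separately for each group in the test set."""
    results = {}
    for group, subset in test_df.groupby("group"):
        preds = model.predict(subset[feature_cols])
        results[group] = accuracy_score(subset["label"], preds)
    return results

# Hypothetical usage: accuracy_by_group(model, test_df, feature_cols)
# might return {"men": 0.94, "women": 0.87}, revealing a disparity
# even when overall accuracy looks acceptable.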

2. Lack of Accurate Effectiveness Measures

The functionality of an application is tested using "black-box" testing, which disregards the implementation structure; test suites for black-box testing are constructed from the requirements specification. Black-box testing applied to AI models means evaluating the model without knowledge of its components or of the process used to build it. The challenging element in this case is locating the test oracle that could compare the output of a test with the expected values. For AI models there are no prior expected values, and because the models produce predictions it is difficult to compare or validate a forecast against an expected value. Data scientists assess model performance during the development phase by comparing predicted values with actual values. In contrast, when the model is tested from a quality assurance perspective, the expected value for each input is not known in advance. To assess the effectiveness of black-box testing on AI models from a quality assurance perspective, we must define procedures for testing or performing quality-control checks. Although a few approaches have been put forward, including model performance testing, metamorphic testing, and testing with various data slices, these appear insufficient in the context of AI for evaluating a model's efficacy.
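
Of the approaches mentioned, metamorphic testing is perhaps the easiest to sketch: instead of comparing a prediction against a known expected value, it checks a relation that must hold between two executions of the model. The example below is a minimal, assumption-laden illustration (X is assumed to be a NumPy feature matrix and model any fitted estimator with a row-wise predict method), not a standard recipe.

# A minimal metamorphic test: permuting the rows of a batch must not change
# any individual row's prediction. A violation signals a defect without
# needing a conventional test oracle.
import numpy as np

def metamorphic_permutation_test(model, X, rng=None):
    rng = rng or np.random.default_rng(0)
    perm = rng.permutation(len(X))
    original = np.asarray(model.predict(X))
    permuted = np.asarray(model.predict(X[perm]))
    # The prediction for row i must match the prediction for that same row
    # after shuffling the batch order.
    assert np.array_equal(original[perm], permuted), "metamorphic relation violated"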
3. Data Separation into Training and Testing Sets

The available data is often split automatically into distinct training and testing datasets using a methodology chosen by data scientists. One well-known tool for doing this is scikit-learn, which enables programmers to hold out a randomly chosen portion of the dataset. With the help of a number of specialised methodologies, training and testing data can be divided in a representative manner. Model testing coverage should therefore be evaluated in terms of data rather than lines of code. In the case of AI, almost every change to the algorithm, model parameters, or training data requires rebuilding the model from scratch, and regression in functionality that has already been proven is extremely likely: the necessary alterations may change the entire model, not just a small section of it. An incorrect division of the data into training, validation, and testing sets may result in overfitting and poor performance in the production environment.
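
As a hedged illustration of the scikit-learn split mentioned above (using a stand-in dataset rather than any particular project's data), train_test_split holds out a random portion of the data, and the stratify argument keeps class proportions representative in both sets:

# A minimal sketch of a representative train/test split with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)   # stand-in dataset for illustration
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,      # hold out 20% of the data for testing
    stratify=y,         # preserve class balance in both splits
    random_state=42,    # make the split reproducible
)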

4. Absence of Testable Specification

Black-box testing techniques should be the most straightforward way to assess an AI application, because they avoid the intricate details of the AI algorithms' inner workings. One of the most popular black-box techniques is testing against a requirements specification, yet testing an AI application against such a standard remains challenging. The concept of AI requirements is fundamentally contested: requirements are intended to record specific behaviour, while AI programmes are designed to exhibit generalised behaviour, so an AI requirement specification by its very nature seeks to define general behaviour. When validating an AI application, what is assessed is the algorithm's efficiency or accuracy, and it can be difficult to evaluate the accuracy of a model's predictions in a testable way.

Examples of typical AI requirements found in several papers demonstrate this testing complexity. Consider an application that employs face recognition: an individual must be recognised from a photo. It is one thing to train the model on a collection of images from a single photo session, in which the subject appears in a number of slightly different poses, and then evaluate it on a completely different selection of photographs from the same shoot. This limited capability would undoubtedly fall short, because the application might be required to handle many other variations. What if we add cosmetics, or alter the hair's colour or length? What about ageing, or gaining or losing weight? It is impossible to train the model on every possible variation. What genuinely qualifies as acceptable?
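
One pragmatic way to make such a requirement testable, sketched below purely as an illustration (the threshold, the variation sets, and the recognise function are assumptions, not an established standard), is to define a minimum accuracy for each named set of variations rather than demanding correctness on every conceivable photo:

# A minimal sketch of an acceptance criterion for a vague requirement such as
# "recognise the individual from a photo": per-variation accuracy thresholds.
def acceptance_test(recognise, variation_sets, min_accuracy=0.90):
    """variation_sets maps a variation name (e.g. 'makeup', 'ageing') to a
    list of (photo, expected_id) pairs; recognise is the model under test."""
    report = {}
    for name, cases in variation_sets.items():
        correct = sum(1 for photo, expected in cases if recognise(photo) == expected)
        report[name] = correct / len(cases)
    failing = {name: acc for name, acc in report.items() if acc < min_accuracy}
    return report, failing  # non-empty `failing` means the criterion is not met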
3. CONCLUSION

The importance of software testing is the same for AI/ML applications as it is for other kinds of software development. Because of how AI systems work and are developed, software testers face many difficulties when working with AI applications. These difficulties range from applying conventional white-box and black-box testing techniques to issues that are highly specific to AI systems, such as overfitting and the proper division of data into training and test sets. This paper outlines four issues that need to be resolved in order to evaluate AI/ML systems effectively.

In future research I will gather more information and present further challenges; each of them is well suited to inspiring solutions that address these problems and open the door to more efficient software testing techniques for AI/ML applications.
