
COMP0103-A7P/U
Fundamentals of Testing
Validation & Verification

Part of the slides are used with kind permission of Dr Shin Yoo and Dr Yue Jia

COMP0103 f.sarro@ucl.ac.uk
Why do we test software?

Major Software Failures

✤ NASA's Mars lander: September 1999, crashed due to a units integration fault

✤ Toyota brakes: dozens dead, thousands of crashes

✤ Ariane 5 explosion

✤ Mars Polar Lander

✤ Intel's Pentium FDIV bug

✤ THERAC-25 radiation machine: 3 dead

Ariane 5 Explosion (about $370 million lost)

"…converting a floating point number to a signed 16-bit integer was executed with an input data value outside the range representable by a signed 16-bit integer…"

y = int(x)

https://www.youtube.com/watch?v=gp_D8r-2hwk
http://www.cas.mcmaster.ca/~baber/TechnicalReports/Ariane5/Ariane5.htm

London Heathrow Terminal 5 Opening

Staff successfully tested the brand-new baggage handling system with over 12,000 test pieces of luggage before the opening to the public.

A single real-life scenario caused the entire system to become confused and shut down.

For 10 days about 42,000 bags failed to travel with their owners and over 500 flights were cancelled.

Cost of Software Bugs

✤ Inadequate software testing costs the US alone $59 billion annually (NIST report, 2002)
  http://www.nist.gov/director/planning/upload/report02-3.pdf

✤ A Cambridge University study states that software bugs cost the economy $312 billion per year (2013)
  http://undo-software.com/press-releases/cambridge-university-study-states-software-bugs-cost-economy-312billion-per-year/

What is Software Testing?

Levels of Testing Goals

With increasing testing process maturity, the goal of testing shifts:

✤ to show correctness

✤ to show problems

✤ not to prove anything, but to reduce the risk of using the software

✤ a discipline that helps IT professionals develop high-quality software

Software Testing

✤ Software testing: an investigation conducted to provide stakeholders with information about the quality of the product or service under test

✤ Observing the execution of a software system to validate whether it behaves as intended

Software Qualities

✤ Dependability
  ✤ Correctness
    ✤ A program is correct if it is consistent with its specification
    ✤ Seldom practical for non-trivial systems
  ✤ Reliability
    ✤ Probability of correct function for some 'unit' of behaviour
    ✤ Relative to a specification and usage profile
    ✤ Statistical approximation to correctness (100% reliable = correct)
  ✤ Safety
    ✤ Preventing hazards (loss of life and/or property)
  ✤ Robustness
    ✤ Acceptable (degraded) behaviour under extreme conditions
✤ Performance
✤ Usability

Software Testing

✤ Testing is the process of finding differences between the expected behaviour specified by system models and the observed behaviour of the implemented system.

✤ Unit testing finds differences between a specification of an object and its realisation as a component

✤ Structural testing finds differences between the system design model and a subset of integrated subsystems

✤ Functional testing finds differences between the use case model and the system

✤ Performance testing finds differences between nonfunctional requirements and actual system performance

✤ When differences are found, the developers identify the defect causing the observed failure and modify the system to correct it; if the system model is identified as the cause of the difference, it is updated to reflect the system

Terminology: Fault, Error, Failure

✤ The purpose of testing is to eradicate all of these

✤ But how are they different from each other?

Terminology: Fault, Error, Failure

✤ Fault: an anomaly in the source code of a program that may lead to an erroneous state (error)

✤ Error: the runtime effect of executing a fault, which may result in a failure

✤ Failure: the manifestation of an error external to the program (any deviation of the observed behaviour from the specified behaviour)

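A minimal code sketch (illustrative, not from the slides) of how the three terms show up in Java:

import java.lang.Math;

public class AbsoluteValue {

    // Intended specification: return |x| for any int x.
    public static int abs(int x) {
        // FAULT: the condition should be (x < 0); writing (x < -1) is an anomaly in the code.
        if (x < -1) {
            return -x;
        }
        return x;
    }

    public static void main(String[] args) {
        // Executing the fault with x = -1 puts the program into an erroneous state (ERROR):
        // the "negate" branch is skipped even though x is negative.
        int result = abs(-1);

        // The error becomes externally visible as a FAILURE: the observed output deviates
        // from the specified behaviour (|-1| should be 1, but -1 is printed).
        System.out.println(result);
    }
}
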
Example: Fault, Error, Failure

✤ A patient gives a doctor a list of symptoms → Failure

✤ The doctor may look for anomalous internal conditions (blood pressure, irregular heartbeat) → Error

✤ The doctor tries to diagnose the root cause → Fault

Dynamic vs. Static

✤ Note that both error and failure are runtime events

✤ Testing is a form of dynamic analysis: we execute the program to see if it behaves correctly

✤ Checking correctness without executing the program is static analysis; you will see this in the latter half of this course (verification)

What About Software Bugs?

✤ "Bug" is used informally

✤ Sometimes speakers mean fault, sometimes error, sometimes failure… often the speaker doesn't know what it means!

✤ A page from the Harvard Mark II electromechanical computer's log, featuring a dead moth that was removed from the device:
  https://en.wikipedia.org/wiki/Software_bug#/media/File:H96566k.jpg

How To Deal With Faults
(from Object-Oriented Software Engineering: Using UML, Patterns, and Java, 3rd Edition, Prentice Hall, Upper Saddle River, NJ, September 25, 2009)

✤ Fault avoidance
  ✤ Use methodology to reduce complexity
  ✤ Use configuration management to prevent inconsistency
  ✤ Apply verification to prevent algorithmic faults
  ✤ Use reviews

✤ Fault detection
  ✤ Testing: activity to provoke failures in a planned way
  ✤ Debugging: find and remove the cause (fault) of an observed failure
  ✤ Monitoring: deliver information about state => used during debugging

✤ Fault tolerance
  ✤ Exception handling
  ✤ Modular redundancy

More Terminology

✤ Test Input: a set of input values that are used to execute a given program

✤ Test Oracle: a mechanism for determining whether the actual behaviour of a test input execution matches the expected behaviour
  ✤ in general, a very difficult and labour-intensive problem

✤ Test Case: Test Input + Test Oracle

✤ Test Suite: a collection of test cases

✤ Test Effectiveness: the extent to which testing reveals faults or achieves other objectives

✤ Testing vs. Debugging: testing reveals faults, while debugging is used to remove them

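A minimal sketch of how these terms map onto code, assuming JUnit 5 and a hypothetical Calculator class (not from the slides): the argument values are the test input, the assertion is the oracle, the method is a test case, and the class is a (tiny) test suite.

import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

// Hypothetical system under test.
class Calculator {
    static int add(int a, int b) { return a + b; }
}

// A tiny test suite.
class CalculatorTest {

    @Test
    void addTwoPositives() {                 // a test case = test input + test oracle
        int actual = Calculator.add(2, 3);   // test input: 2 and 3
        assertEquals(5, actual);             // test oracle: the expected behaviour is 5
    }
}
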
Example

✤ SUT: the System Under Test

✤ input x = 2 → SUT → output y = 5 → oracle: y > 0

Testing Activities

✤ Test Design → Test Execution → Test Evaluation: design the test input and the oracle, execute the SUT on the input, and evaluate the observed output against the oracle

A Test Case Failed

✤ input x = -2 → SUT → output y = -5 → oracle y > 0 is violated: the test case fails

✤ The cause of the failure may lie in the SUT itself, but also in the requirements, the libraries, the operating system, or the hardware

✤ …but, when re-executed, sometimes it passes! Such a test is called a flaky test
  Interesting reading at https://testing.googleblog.com/2016/05/flaky-tests-at-google-and-how-we.html

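A minimal, hypothetical sketch (JUnit 5 assumed, not from the slides) of how a test becomes flaky: the oracle depends on wall-clock timing, so the same test can pass or fail across runs without any change to the code.

import static org.junit.jupiter.api.Assertions.assertTrue;
import org.junit.jupiter.api.Test;

class FlakyExampleTest {

    @Test
    void respondsWithinDeadline() throws Exception {
        long start = System.nanoTime();
        slowOperation();                                    // hypothetical operation, ~40 ms on average
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        // Flaky oracle: the elapsed time depends on machine load and scheduling,
        // so this assertion sometimes passes and sometimes fails for the same code.
        assertTrue(elapsedMs < 50);
    }

    private void slowOperation() throws InterruptedException {
        Thread.sleep(40);
    }
}
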
Brief Look at the Software Lifecycle

Waterfall Model (Royce, 1970)

Requirements → Design → Implementation → Integration → Validation → Deployment

Spiral Model (Boehm, 1988)

Recent Paradigms

✤ Agile?

✤ Test-Driven Development?

✤ Extremely short development cycles, no physical shipment of software (many web apps are simply made available)

✤ The prevailing view on the software lifecycle not only determines "when" we test, but "how often" and "what for" as well

Testing Activities

✤ Unit Testing: based on the Object Design Document (developer)

✤ Integration Testing: based on the System Design Document (developer)

✤ System Testing: based on the Requirements Analysis Document (developer)

✤ Acceptance Testing: based on the Client Expectation (client)

Testing Activities

✤ Alpha test: tests performed by users in a controlled environment, observed by the development organisation

✤ Beta test: tests performed by real users in their own environment, performing actual tasks without interference or close monitoring

Brief Look at Testing Techniques

How Do You Test…?

✤ An Eclipse plugin developed by you

✤ A Linux command-line tool, e.g. find

✤ An Android Facebook app

Testing Techniques

✤ There is no fixed recipe that always works

✤ You need to understand the pros and cons of each technique so that you can apply the most suitable one

✤ There are two major classes of testing techniques:
  ✤ Black-box: the tester does not look at the code
  ✤ White-box: the tester does look at the code

Random Testing

✤ Can be either black-box or white-box

✤ Test inputs are selected randomly

✤ Pros: very easy to implement, can find real faults

✤ Cons: can take very long to achieve anything, can be very dumb

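A minimal random-testing sketch (hypothetical SUT, not from the slides): generate random inputs and check an implicit oracle (no crash) plus a simple property-based oracle.

import java.util.Random;

public class RandomTestingSketch {

    // Hypothetical SUT: intended to return the absolute value of x.
    static int abs(int x) {
        return x < 0 ? -x : x;
    }

    public static void main(String[] args) {
        Random random = new Random(42);       // fixed seed, so runs are reproducible
        for (int i = 0; i < 100_000; i++) {
            int x = random.nextInt();         // random test input

            int y = abs(x);                   // test execution (a crash here would
                                              // violate the implicit oracle)

            // Explicit, property-based oracle: the result should never be negative.
            if (y < 0) {
                System.out.println("Oracle violated for input x = " + x + ": abs(x) = " + y);
            }
        }
    }
}

The only input that violates the oracle here is Integer.MIN_VALUE (its negation overflows back to itself), and it is very unlikely to be drawn by chance in 100,000 samples, which illustrates both the pro (random testing can find real faults) and the con (it can take very long).
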
Combinatorial Testing

✤ Black-box technique

✤ The tester only knows the input specification of the program

✤ How do you approach testing systematically?

✤ The same principle applies to testing a single program in many different environments

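A small worked illustration (hypothetical configuration parameters, not from the slides). Suppose the specification has three parameters, each with two values: OS in {Linux, Windows}, Browser in {Chrome, Firefox}, DB in {MySQL, Postgres}. Exhaustive testing of the environment needs 2 × 2 × 2 = 8 configurations, but every pair of parameter values is already covered by just 4:

(Linux, Chrome, MySQL), (Linux, Firefox, Postgres), (Windows, Chrome, Postgres), (Windows, Firefox, MySQL)

This is the idea behind pairwise (2-way) combinatorial testing; the savings grow quickly as the number of parameters and values increases.
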
Structural Testing

✤ White-box technique

✤ The adequacy of testing is measured in terms of structural units of the program source code (e.g. lines, branches, etc.)

✤ Necessary but not sufficient (yet still not easy to achieve)

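A minimal sketch (hypothetical code, not from the slides) of what structural adequacy means in terms of branch coverage:

public class BranchCoverageSketch {

    // Hypothetical method with two branches.
    static String classify(int x) {
        if (x >= 0) {
            return "non-negative";   // branch A
        } else {
            return "negative";       // branch B
        }
    }

    public static void main(String[] args) {
        // classify(5) alone exercises only branch A: 1 of 2 branches = 50% branch coverage.
        System.out.println(classify(5));

        // Adding classify(-5) exercises branch B too, reaching 100% branch coverage.
        // Full coverage is still not sufficient: it says nothing about whether the
        // outputs are actually correct, which is why an oracle is still needed.
        System.out.println(classify(-5));
    }
}
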
Mutation Testing

✤ White-box technique

✤ A subclass of structural testing: we artificially inject faults and see if our testing can detect them

✤ Great potential but not without challenges

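A minimal sketch of mutants (hypothetical code, not from the slides): a small syntactic change is injected, and a test suite is judged by whether it "kills" the mutant, i.e. distinguishes it from the original.

public class MutationSketch {

    // Original (hypothetical) method under test.
    static int max(int a, int b) {
        return a > b ? a : b;
    }

    // Mutant 1: '>' mutated to '<'. A test asserting max(3, 2) == 3 kills it,
    // because this mutant returns 2 instead of 3.
    static int maxMutantLess(int a, int b) {
        return a < b ? a : b;
    }

    // Mutant 2: '>' mutated to '>='. No test can kill it: whenever a == b both
    // versions return the same value, so this mutant is *equivalent* to the
    // original, which is one of the practical challenges mentioned above.
    static int maxMutantGreaterEqual(int a, int b) {
        return a >= b ? a : b;
    }
}
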
Regression Testing

✤ Can be both black- and white-box

✤ A type of testing that is performed to gain confidence that recent code modifications did not break existing functionality

✤ Increasingly important as the development cycle gets shorter; organisations spend huge amounts of resources on it

Model-based Testing

✤ Often used for complex systems with model-based design

✤ Test input: models (representing the behaviour of a SUT)

✤ Abstract test suites are derived automatically from the input models

✤ Abstract test suites are mapped to concrete test suites

✤ Relies on the use of high-quality models

Why Is Testing Hard?
Exhaustive Testing & Oracles

Exhaustive Testing

✤ Can we test each and every program with all possible inputs, and guarantee that it is correct every time? Surely then it IS correct

✤ In theory, yes: this is the fool-proof, simplest method… or is it?

✤ Consider the triangle program
  ✤ Takes three 32-bit integers, tells you whether they can form the three sides of a triangle, and which type if they do

✤ How many possible inputs are there?

Exhaustive Testing

✤ 32-bit integers: between -2^31 and 2^31 - 1 there are 2^32 = 4,294,967,296 possible values

✤ The program takes three integers: the number of possible input combinations is (2^32)^3 = 2^96, close to 8 × 10^28

✤ The approximate number of stars in the known universe is about 10^24: far fewer stars in the universe than possible inputs for a program that could be the coursework for Programming 101

✤ Not enough time in the whole world!

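A quick check of the arithmetic (a sketch using BigInteger):

import java.math.BigInteger;

public class ExhaustiveInputCount {
    public static void main(String[] args) {
        BigInteger perInt = BigInteger.valueOf(2).pow(32);   // values of one 32-bit int
        BigInteger allInputs = perInt.pow(3);                // three independent ints: 2^96

        System.out.println(perInt);     // 4294967296
        System.out.println(allInputs);  // 79228162514264337593543950336 (about 7.9 * 10^28)

        // Even at a billion (10^9) test executions per second, exhausting this input
        // space would take roughly 2.5 * 10^12 years of continuous testing.
    }
}
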
A Famous (or Infamous) Quote

"Testing can only prove the presence of bugs, not their absence." — Edsger W. Dijkstra

✤ Testing allows only a sampling of an enormously large program input space

✤ The difficulty lies in how to come up with effective sampling

Test Oracle

int testMe (int x, int y)
{
    return x / y;
}

✤ In the example, we immediately know something is wrong when we set y to 0: all computers will treat division by zero as an error

✤ What about those faults that force the program to produce answers that are only slightly wrong?

✤ For every test input, we need to have an "oracle": something that will tell us whether the corresponding output is correct or not

✤ Implicit oracles: system crash, unintended infinite loop, division by zero, etc.; they can only detect a small subset of faults!

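A brief sketch contrasting the implicit and an explicit oracle for testMe (JUnit 5 assumed, illustration only):

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertThrows;
import org.junit.jupiter.api.Test;

class TestMeTest {

    int testMe(int x, int y) { return x / y; }

    @Test
    void implicitOracle() {
        // Implicit oracle: in Java, integer division by zero raises ArithmeticException.
        assertThrows(ArithmeticException.class, () -> testMe(1, 0));
    }

    @Test
    void explicitOracle() {
        // Explicit oracle: we must know the expected value ourselves. A fault that
        // returned, say, x / y + 1 would only be caught by this kind of check.
        assertEquals(3, testMe(6, 2));
    }
}
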
Oracles and Non-Testable Programs

✤ Weyuker observed that many programs are 'non-testable', in the sense that it is nearly impossible to construct an effective oracle for them

  ✤ Many numerical algorithms, e.g. multiplication of two large matrices containing large values
    ✤ Must somehow compute the result independently to validate it
    ✤ But that independent computation may be just as faulty

  ✤ Many large distributed real-time programs, e.g. the USA's Strategic Defense Initiative (SDI), aka 'Star Wars'
    ✤ Testing must demonstrate with sufficient confidence that it would protect the USA from a nuclear attack

Oracles and Reliability Testing

✤ Reliability testing gets around some of the problems of non-testable programs by applying statistical reasoning to the testing activity

✤ Reliability: the probability of failure-free operation over some stated period of time
  ✤ It can be estimated through testing, to a level of precision that depends on how much testing was performed
  ✤ The greater the amount of testing, the greater the precision

✤ Butler and Finelli observe that it is physically impossible to attain the stated reliability targets of many safety-critical systems
  ✤ Example: achieving 'nine 9s' reliability would require centuries of testing

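A rough back-of-the-envelope illustration (assuming the 'nine 9s' target means a failure rate of at most 10^-9 per operating hour): gaining statistical confidence in that rate by testing alone requires on the order of 10^9 failure-free operating hours, and 10^9 hours is more than 100,000 years, even before accounting for the repetitions that realistic confidence levels demand.
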
High Dependability vs. Time-to-Market

✤ Mass-market products
  ✤ Better to achieve a reasonably high degree of dependability on a tight schedule than to achieve ultra-high dependability on a much longer schedule

✤ Critical medical devices
  ✤ Better to achieve ultra-high dependability on a much longer schedule than a reasonably high degree of dependability on a tight schedule

✤ See Leveson et al. on the Therac-25 incidents! (http://en.wikipedia.org/wiki/Therac-25)

When To Stop Testing?

✤ When the program has been tested "enough"
  ✤ Temporal criteria: the allocated time runs out
  ✤ Cost criteria: the allocated budget runs out
  ✤ Coverage criteria: a predefined percentage of the elements of a program is covered by the tests, or test cases covering certain predefined conditions are selected
  ✤ Statistical criteria: a predefined MTBF (mean time between failures) compared to an existing predefined reliability model

✤ Practical goals
  ✤ maximising the number of faults found (may require many test cases)
  ✤ minimising the number of test cases (and therefore the cost of testing)

Competing Goals…

✤ Practical goals
  ✤ maximising the number of faults found (may require many test cases)
  ✤ minimising the number of test cases (and therefore the cost of testing)

✤ Search-Based Software Testing
  ✤ Uses meta-heuristic optimising search techniques, such as a Genetic Algorithm, to automate or partially automate a testing task; for example, the automatic generation of test data

Search-Based Software Testing: Past, Present and Future by P. McMinn
https://ai2-s2-pdfs.s3.amazonaws.com/67a9/ca5a33e3ab4c2300cdcfaafdfa6aeb989eb0.pdf
http://crest.cs.ucl.ac.uk/about/

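A minimal sketch of the idea behind search-based test data generation (hypothetical target branch and fitness function, not from McMinn's survey): the search minimises a "branch distance" fitness that measures how far an input is from taking the branch we want to cover, here with a simple hill climber instead of a full Genetic Algorithm.

import java.util.Random;

public class SearchBasedTestDataSketch {

    // Hypothetical branch we want to cover: the true branch of (x == 4242).
    static boolean targetBranch(int x) {
        return x == 4242;
    }

    // Branch distance: 0 when the branch is taken, larger the further away x is.
    static long fitness(int x) {
        return Math.abs((long) x - 4242);
    }

    public static void main(String[] args) {
        Random random = new Random();
        int current = random.nextInt(1_000_000);   // random starting input

        // Simple hill climbing: propose a neighbouring input and keep it
        // whenever it does not worsen the fitness.
        while (fitness(current) > 0) {
            int neighbour = current + (random.nextBoolean() ? 1 : -1);
            if (fitness(neighbour) <= fitness(current)) {
                current = neighbour;
            }
        }
        System.out.println("Input covering the target branch: " + current);  // 4242
    }
}
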
