
COMP0103-A7P/U
Fundamentals of Testing
Validation & Verification

Part of the slides are used with kind permission of Dr Shin Yoo and Dr Yue Jia

COMP0103 f.sarro@ucl.ac.uk
Why do we test software?

Major Software Failures

✤ NASA's Mars lander: September 1999, crashed due to a units integration fault

✤ Toyota brakes: dozens dead, thousands of crashes

✤ Ariane 5 explosion

✤ Mars Polar Lander

✤ Intel's Pentium FDIV bug

✤ THERAC-25 radiation machine: 3 dead

Ariane 5 Explosion (about $370 million lost)

"…converting a floating point number to a signed 16-bit integer was executed with an input data value outside the range representable by a signed 16-bit integer…"

y = int(x)

https://www.youtube.com/watch?v=gp_D8r-2hwk
http://www.cas.mcmaster.ca/~baber/TechnicalReports/Ariane5/Ariane5.htm

London Heathrow Terminal 5 Opening

Staff successfully tested the brand-new baggage handling system with over 12,000 test pieces of luggage before the opening to the public.

A single real-life scenario caused the entire system to become confused and shut down.

For 10 days about 42,000 bags failed to travel with their owners and over 500 flights were cancelled.

Cost of Software Bugs

✤ Inadequate software testing costs the US alone $59 billion annually (NIST report, 2002)
  http://www.nist.gov/director/planning/upload/report02-3.pdf

✤ A Cambridge University study states that software bugs cost the economy $312 billion per year (2013)
  http://undo-software.com/press-releases/cambridge-university-study-states-software-bugs-cost-economy-312billion-per-year/

What is Software Testing?

Levels of Testing Goals

With increasing testing process maturity, the goal of testing shifts:

✤ to show correctness

✤ to show problems

✤ not to prove anything, but to reduce the risk of using the software

✤ a discipline that helps IT professionals develop high-quality software

Software Testing

✤ Software testing: an investigation conducted to provide stakeholders with information about the quality of the product or service under test

✤ Observing the execution of a software system to validate whether it behaves as intended

Software Qualities

✤ Dependability
  ✤ Correctness
    ✤ A program is correct if it is consistent with its specification
    ✤ Seldom practical for non-trivial systems
  ✤ Reliability
    ✤ Probability of correct function for some 'unit' of behaviour
    ✤ Relative to a specification and usage profile
    ✤ Statistical approximation to correctness (100% reliable = correct)
  ✤ Safety
    ✤ Preventing hazards (loss of life and/or property)
  ✤ Robustness
    ✤ Acceptable (degraded) behaviour under extreme conditions
✤ Performance
✤ Usability

Software Testing

✤ Testing is the process of finding differences between the expected behaviour specified by system models and the observed behaviour of the implemented system.

✤ Unit testing finds differences between a specification of an object and its realisation as a component

✤ Structural testing finds differences between the system design model and a subset of integrated subsystems

✤ Functional testing finds differences between the use case model and the system

✤ Performance testing finds differences between nonfunctional requirements and actual system performance

✤ When differences are found, the developers identify the defect causing the observed failure and modify the system to correct it; if the system model is identified as the cause of the difference, it is updated to reflect the system

Terminology: Fault, Error, Failure

✤ The purpose of testing is to eradicate all of these

✤ But how are they different from each other?

Terminology: Fault, Error, Failure

✤ Fault: an anomaly in the source code of a program that may lead to an erroneous state (error)

✤ Error: the runtime effect of executing a fault, which may result in a failure

✤ Failure: the manifestation of an error external to the program (any deviation of the observed behaviour from the specified behaviour)

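A minimal code sketch (illustrative, not from the slides) of how the three terms show up in Java:

import java.lang.Math;

public class AbsoluteValue {

    // Intended specification: return |x| for any int x.
    public static int abs(int x) {
        // FAULT: the condition should be (x < 0); writing (x < -1) is an anomaly in the code.
        if (x < -1) {
            return -x;
        }
        return x;
    }

    public static void main(String[] args) {
        // Executing the fault with x = -1 puts the program into an erroneous state (ERROR):
        // the "negate" branch is skipped even though x is negative.
        int result = abs(-1);

        // The error becomes externally visible as a FAILURE: the observed output deviates
        // from the specified behaviour (|-1| should be 1, but -1 is printed).
        System.out.println(result);
    }
}
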
Example: Fault, Error, Failure

✤ A patient gives a doctor a list of symptoms → Failure

✤ The doctor may look for anomalous internal conditions (blood pressure, irregular heartbeat) → Error

✤ The doctor tries to diagnose the root cause → Fault

Dynamic vs. Static

✤ Note that both error and failure are runtime events

✤ Testing is a form of dynamic analysis: we execute the program to see if it behaves correctly

✤ Checking correctness without executing the program is static analysis; you will see this in the latter half of this course (verification)

What About Software Bugs?

✤ "Bug" is used informally

✤ Sometimes speakers mean fault, sometimes error, sometimes failure… often the speaker doesn't know what it means!

✤ A page from the Harvard Mark II electromechanical computer's log, featuring a dead moth that was removed from the device:
  https://en.wikipedia.org/wiki/Software_bug#/media/File:H96566k.jpg

How To Deal With Faults
(from Object-Oriented Software Engineering: Using UML, Patterns, and Java, 3rd Edition, Prentice Hall, Upper Saddle River, NJ, September 25, 2009)

✤ Fault avoidance
  ✤ Use methodology to reduce complexity
  ✤ Use configuration management to prevent inconsistency
  ✤ Apply verification to prevent algorithmic faults
  ✤ Use reviews

✤ Fault detection
  ✤ Testing: activity to provoke failures in a planned way
  ✤ Debugging: find and remove the cause (fault) of an observed failure
  ✤ Monitoring: deliver information about state => used during debugging

✤ Fault tolerance
  ✤ Exception handling
  ✤ Modular redundancy

More Terminology

✤ Test Input: a set of input values that are used to execute a given program

✤ Test Oracle: a mechanism for determining whether the actual behaviour of a test input execution matches the expected behaviour
  ✤ in general, a very difficult and labour-intensive problem

✤ Test Case: Test Input + Test Oracle

✤ Test Suite: a collection of test cases

✤ Test Effectiveness: the extent to which testing reveals faults or achieves other objectives

✤ Testing vs. Debugging: testing reveals faults, while debugging is used to remove them

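A minimal sketch of how these terms map onto code, assuming JUnit 5 and a hypothetical Calculator class (not from the slides): the argument values are the test input, the assertion is the oracle, the method is a test case, and the class is a (tiny) test suite.

import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

// Hypothetical system under test.
class Calculator {
    static int add(int a, int b) { return a + b; }
}

// A tiny test suite.
class CalculatorTest {

    @Test
    void addTwoPositives() {                 // a test case = test input + test oracle
        int actual = Calculator.add(2, 3);   // test input: 2 and 3
        assertEquals(5, actual);             // test oracle: the expected behaviour is 5
    }
}
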
Example

✤ SUT: the System Under Test

✤ input x = 2 → SUT → output y = 5 → oracle: y > 0

Testing Activities

✤ Test Design → Test Execution → Test Evaluation: design the test input and the oracle, execute the SUT on the input, and evaluate the observed output against the oracle

A Test Case Failed

✤ input x = -2 → SUT → output y = -5 → oracle y > 0 is violated: the test case fails

✤ The cause of the failure may lie in the SUT itself, but also in the requirements, the libraries, the operating system, or the hardware

✤ …but, when re-executed, sometimes it passes! Such a test is called a flaky test
  Interesting reading at https://testing.googleblog.com/2016/05/flaky-tests-at-google-and-how-we.html

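A minimal, hypothetical sketch (JUnit 5 assumed, not from the slides) of how a test becomes flaky: the oracle depends on wall-clock timing, so the same test can pass or fail across runs without any change to the code.

import static org.junit.jupiter.api.Assertions.assertTrue;
import org.junit.jupiter.api.Test;

class FlakyExampleTest {

    @Test
    void respondsWithinDeadline() throws Exception {
        long start = System.nanoTime();
        slowOperation();                                    // hypothetical operation, ~40 ms on average
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        // Flaky oracle: the elapsed time depends on machine load and scheduling,
        // so this assertion sometimes passes and sometimes fails for the same code.
        assertTrue(elapsedMs < 50);
    }

    private void slowOperation() throws InterruptedException {
        Thread.sleep(40);
    }
}
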
Brief Look at the Software Lifecycle

Waterfall Model (Royce, 1970)

Requirements → Design → Implementation → Integration → Validation → Deployment

Spiral Model (Boehm, 1988)

Recent Paradigms

✤ Agile?

✤ Test-Driven Development?

✤ Extremely short development cycles, no physical shipment of software (many web apps are simply made available)

✤ The prevailing view on the software lifecycle not only determines "when" we test, but "how often" and "what for" as well

Testing Activities

✤ Unit Testing: based on the Object Design Document (developer)

✤ Integration Testing: based on the System Design Document (developer)

✤ System Testing: based on the Requirements Analysis Document (developer)

✤ Acceptance Testing: based on the Client Expectation (client)

Testing Activities

✤ Alpha test: tests performed by users in a controlled environment, observed by the development organisation

✤ Beta test: tests performed by real users in their own environment, performing actual tasks without interference or close monitoring

Brief Look at Testing Techniques

How Do You Test…?

✤ An Eclipse plugin developed by you

✤ A Linux command-line tool, e.g. find

✤ An Android Facebook app

Testing Techniques

✤ There is no fixed recipe that always works

✤ You need to understand the pros and cons of each technique so that you can apply the most suitable one

✤ There are two major classes of testing techniques:
  ✤ Black-box: the tester does not look at the code
  ✤ White-box: the tester does look at the code

Random Testing

✤ Can be either black-box or white-box

✤ Test inputs are selected randomly

✤ Pros: very easy to implement, can find real faults

✤ Cons: can take very long to achieve anything, can be very dumb

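A minimal random-testing sketch (hypothetical SUT, not from the slides): generate random inputs and check an implicit oracle (no crash) plus a simple property-based oracle.

import java.util.Random;

public class RandomTestingSketch {

    // Hypothetical SUT: intended to return the absolute value of x.
    static int abs(int x) {
        return x < 0 ? -x : x;
    }

    public static void main(String[] args) {
        Random random = new Random(42);       // fixed seed, so runs are reproducible
        for (int i = 0; i < 100_000; i++) {
            int x = random.nextInt();         // random test input

            int y = abs(x);                   // test execution (a crash here would
                                              // violate the implicit oracle)

            // Explicit, property-based oracle: the result should never be negative.
            if (y < 0) {
                System.out.println("Oracle violated for input x = " + x + ": abs(x) = " + y);
            }
        }
    }
}

The only input that violates the oracle here is Integer.MIN_VALUE (its negation overflows back to itself), and it is very unlikely to be drawn by chance in 100,000 samples, which illustrates both the pro (random testing can find real faults) and the con (it can take very long).
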
Combinatorial Testing

✤ Black-box technique

✤ The tester only knows the input specification of the program

✤ How do you approach testing systematically?

✤ The same principle applies to testing a single program in many different environments

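A small worked illustration (hypothetical configuration parameters, not from the slides). Suppose the specification has three parameters, each with two values: OS in {Linux, Windows}, Browser in {Chrome, Firefox}, DB in {MySQL, Postgres}. Exhaustive testing of the environment needs 2 × 2 × 2 = 8 configurations, but every pair of parameter values is already covered by just 4:

(Linux, Chrome, MySQL), (Linux, Firefox, Postgres), (Windows, Chrome, Postgres), (Windows, Firefox, MySQL)

This is the idea behind pairwise (2-way) combinatorial testing; the savings grow quickly as the number of parameters and values increases.
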
Structural Testing

✤ White-box technique

✤ The adequacy of testing is measured in terms of structural units of the program source code (e.g. lines, branches, etc.)

✤ Necessary but not sufficient (yet still not easy to achieve)

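A minimal sketch (hypothetical code, not from the slides) of what structural adequacy means in terms of branch coverage:

public class BranchCoverageSketch {

    // Hypothetical method with two branches.
    static String classify(int x) {
        if (x >= 0) {
            return "non-negative";   // branch A
        } else {
            return "negative";       // branch B
        }
    }

    public static void main(String[] args) {
        // classify(5) alone exercises only branch A: 1 of 2 branches = 50% branch coverage.
        System.out.println(classify(5));

        // Adding classify(-5) exercises branch B too, reaching 100% branch coverage.
        // Full coverage is still not sufficient: it says nothing about whether the
        // outputs are actually correct, which is why an oracle is still needed.
        System.out.println(classify(-5));
    }
}
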
Mutation Testing

✤ White-box technique

✤ A subclass of structural testing: we artificially inject faults and see if our testing can detect them

✤ Great potential but not without challenges

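A minimal sketch of mutants (hypothetical code, not from the slides): a small syntactic change is injected, and a test suite is judged by whether it "kills" the mutant, i.e. distinguishes it from the original.

public class MutationSketch {

    // Original (hypothetical) method under test.
    static int max(int a, int b) {
        return a > b ? a : b;
    }

    // Mutant 1: '>' mutated to '<'. A test asserting max(3, 2) == 3 kills it,
    // because this mutant returns 2 instead of 3.
    static int maxMutantLess(int a, int b) {
        return a < b ? a : b;
    }

    // Mutant 2: '>' mutated to '>='. No test can kill it: whenever a == b both
    // versions return the same value, so this mutant is *equivalent* to the
    // original, which is one of the practical challenges mentioned above.
    static int maxMutantGreaterEqual(int a, int b) {
        return a >= b ? a : b;
    }
}
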
Regression Testing

✤ Can be both black- and white-box

✤ A type of testing that is performed to gain confidence that recent code modifications did not break existing functionality

✤ Increasingly important as the development cycle gets shorter; organisations spend huge amounts of resources on it

Model-based Testing

✤ Often used for complex systems with model-based design

✤ Test input: models (representing the behaviour of a SUT)

✤ Abstract test suites are derived automatically from the input models

✤ Abstract test suites are mapped to concrete test suites

✤ Relies on the use of high-quality models

Why Is Testing Hard?
Exhaustive Testing & Oracles

Exhaustive Testing

✤ Can we test each and every program with all possible inputs, and guarantee that it is correct every time? Surely then it IS correct

✤ In theory, yes: this is the fool-proof, simplest method… or is it?

✤ Consider the triangle program
  ✤ Takes three 32-bit integers, tells you whether they can form the three sides of a triangle, and which type if they do

✤ How many possible inputs are there?

Exhaustive Testing

✤ 32-bit integers: between -2^31 and 2^31 - 1 there are 2^32 = 4,294,967,296 possible values

✤ The program takes three integers: the number of possible input combinations is (2^32)^3 = 2^96, close to 8 × 10^28

✤ The approximate number of stars in the known universe is about 10^24: far fewer stars in the universe than possible inputs for a program that could be the coursework for Programming 101

✤ Not enough time in the whole world!

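A quick check of the arithmetic (a sketch using BigInteger):

import java.math.BigInteger;

public class ExhaustiveInputCount {
    public static void main(String[] args) {
        BigInteger perInt = BigInteger.valueOf(2).pow(32);   // values of one 32-bit int
        BigInteger allInputs = perInt.pow(3);                // three independent ints: 2^96

        System.out.println(perInt);     // 4294967296
        System.out.println(allInputs);  // 79228162514264337593543950336 (about 7.9 * 10^28)

        // Even at a billion (10^9) test executions per second, exhausting this input
        // space would take roughly 2.5 * 10^12 years of continuous testing.
    }
}
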
A Famous (or Infamous) Quote

"Testing can only prove the presence of bugs, not their absence." — Edsger W. Dijkstra

✤ Testing allows only a sampling of an enormously large program input space

✤ The difficulty lies in how to come up with effective sampling

Test Oracle

int testMe (int x, int y)
{
    return x / y;
}

✤ In the example, we immediately know something is wrong when we set y to 0: all computers will treat division by zero as an error

✤ What about those faults that force the program to produce answers that are only slightly wrong?

✤ For every test input, we need to have an "oracle": something that will tell us whether the corresponding output is correct or not

✤ Implicit oracles: system crash, unintended infinite loop, division by zero, etc.; they can only detect a small subset of faults!

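A brief sketch contrasting the implicit and an explicit oracle for testMe (JUnit 5 assumed, illustration only):

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertThrows;
import org.junit.jupiter.api.Test;

class TestMeTest {

    int testMe(int x, int y) { return x / y; }

    @Test
    void implicitOracle() {
        // Implicit oracle: in Java, integer division by zero raises ArithmeticException.
        assertThrows(ArithmeticException.class, () -> testMe(1, 0));
    }

    @Test
    void explicitOracle() {
        // Explicit oracle: we must know the expected value ourselves. A fault that
        // returned, say, x / y + 1 would only be caught by this kind of check.
        assertEquals(3, testMe(6, 2));
    }
}
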
Oracles and Non-Testable Programs

✤ Weyuker observed that many programs are 'non-testable', in the sense that it is nearly impossible to construct an effective oracle for them

  ✤ Many numerical algorithms, e.g. multiplication of two large matrices containing large values
    ✤ Must somehow compute the result independently to validate it
    ✤ But that independent computation may be just as faulty

  ✤ Many large distributed real-time programs, e.g. the USA's Strategic Defense Initiative (SDI), aka 'Star Wars'
    ✤ Testing must demonstrate with sufficient confidence that it would protect the USA from a nuclear attack

Oracles and Reliability Testing

✤ Reliability testing gets around some of the problems of non-testable programs by applying statistical reasoning to the testing activity

✤ Reliability: the probability of failure-free operation over some stated period of time
  ✤ It can be estimated through testing, to a level of precision that depends on how much testing was performed
  ✤ The greater the amount of testing, the greater the precision

✤ Butler and Finelli observe that it is physically impossible to attain the stated reliability targets of many safety-critical systems
  ✤ Example: achieving 'nine 9s' reliability would require centuries of testing

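A rough back-of-the-envelope illustration (assuming the 'nine 9s' target means a failure rate of at most 10^-9 per operating hour): gaining statistical confidence in that rate by testing alone requires on the order of 10^9 failure-free operating hours, and 10^9 hours is more than 100,000 years, even before accounting for the repetitions that realistic confidence levels demand.
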
High Dependability vs. Time-to-Market

✤ Mass-market products
  ✤ Better to achieve a reasonably high degree of dependability on a tight schedule than to achieve ultra-high dependability on a much longer schedule

✤ Critical medical devices
  ✤ Better to achieve ultra-high dependability on a much longer schedule than a reasonably high degree of dependability on a tight schedule

✤ See Leveson et al. on the Therac-25 incidents! (http://en.wikipedia.org/wiki/Therac-25)

When To Stop Testing?

✤ When the program has been tested "enough"
  ✤ Temporal criteria: the allocated time runs out
  ✤ Cost criteria: the allocated budget runs out
  ✤ Coverage criteria: a predefined percentage of the elements of a program is covered by the tests, or test cases covering certain predefined conditions are selected
  ✤ Statistical criteria: a predefined MTBF (mean time between failures) compared to an existing predefined reliability model

✤ Practical goals
  ✤ maximising the number of faults found (may require many test cases)
  ✤ minimising the number of test cases (and therefore the cost of testing)

Competing Goals…

✤ Practical goals
  ✤ maximising the number of faults found (may require many test cases)
  ✤ minimising the number of test cases (and therefore the cost of testing)

✤ Search-Based Software Testing
  ✤ Uses meta-heuristic optimising search techniques, such as a Genetic Algorithm, to automate or partially automate a testing task; for example, the automatic generation of test data

Search-Based Software Testing: Past, Present and Future by P. McMinn
https://ai2-s2-pdfs.s3.amazonaws.com/67a9/ca5a33e3ab4c2300cdcfaafdfa6aeb989eb0.pdf
http://crest.cs.ucl.ac.uk/about/

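A minimal sketch of the idea behind search-based test data generation (hypothetical target branch and fitness function, not from McMinn's survey): the search minimises a "branch distance" fitness that measures how far an input is from taking the branch we want to cover, here with a simple hill climber instead of a full Genetic Algorithm.

import java.util.Random;

public class SearchBasedTestDataSketch {

    // Hypothetical branch we want to cover: the true branch of (x == 4242).
    static boolean targetBranch(int x) {
        return x == 4242;
    }

    // Branch distance: 0 when the branch is taken, larger the further away x is.
    static long fitness(int x) {
        return Math.abs((long) x - 4242);
    }

    public static void main(String[] args) {
        Random random = new Random();
        int current = random.nextInt(1_000_000);   // random starting input

        // Simple hill climbing: propose a neighbouring input and keep it
        // whenever it does not worsen the fitness.
        while (fitness(current) > 0) {
            int neighbour = current + (random.nextBoolean() ? 1 : -1);
            if (fitness(neighbour) <= fitness(current)) {
                current = neighbour;
            }
        }
        System.out.println("Input covering the target branch: " + current);  // 4242
    }
}
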
