
Software Testing Basics

Elaine Weyuker
AT&T Labs – Research
Florham Park, NJ

November 11, 2002


What is Software Testing?

Executing software in a simulated or real environment, using inputs selected somehow.

Goals of Testing

• Detect faults
• Establish confidence in software
• Evaluate properties of software
– Reliability
– Performance
– Memory Usage
– Security
– Usability
Software Testing Difficulties
Most of the software testing literature equates test case selection with software testing, but selection is only one difficult part. Other difficult issues include:

• Determining whether or not outputs are correct.
• Comparing resulting internal states to expected states.
• Determining whether adequate testing has been done.
• Determining what you can say about the software when testing is completed.
• Measuring performance characteristics.
• Comparing testing strategies.
Determining the Correctness of Outputs

We frequently accept outputs because they are plausible rather than correct.

It is difficult to determine whether outputs are correct because:

• We wrote the software to compute the answer.
• There is so much output that it is impossible to validate it all.
• There is no (visible) output.
Dimensions of Test Case Selection

• Stages of Development
• Source of Information for Test Case Selection
Stages of Testing

Testing in the Small

• Unit Testing
• Feature Testing
• Integration Testing
Unit Testing

Tests the smallest individually executable code units. Usually done by programmers. Test cases might be selected based on code, specification, intuition, etc.

Tools:
• Test driver/harness
• Code coverage analyzer
• Automatic test case generator
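
For concreteness (this example is not from the original slides), a minimal sketch of a unit test in Python's standard unittest framework, which plays the role of the test driver/harness; the unit under test, absolute_value, is hypothetical:

```python
import unittest

def absolute_value(x):
    """Unit under test: a hypothetical example function."""
    return x if x >= 0 else -x

class AbsoluteValueTest(unittest.TestCase):
    # Test cases chosen from the code (both branches), the
    # specification, and intuition (the boundary at zero).
    def test_positive(self):
        self.assertEqual(absolute_value(5), 5)

    def test_negative(self):
        self.assertEqual(absolute_value(-5), 5)

    def test_zero_boundary(self):
        self.assertEqual(absolute_value(0), 0)

if __name__ == "__main__":
    unittest.main()  # the driver/harness discovers and runs all cases
```
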
Integration Testing
Tests interactions between two or more units or
components. Usually done by programmers.
Emphasizes interfaces.

Issues:
• In what order are units combined?
• How do you assure the compatibility and
correctness of externally-supplied components?
Integration Testing
How are units integrated? What are the implications
of this order?

• Top-down => need stubs; top level tested repeatedly.
• Bottom-up => need drivers; bottom levels tested repeatedly.
• Critical units first => stubs & drivers needed; critical units tested repeatedly.
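
A small hypothetical sketch of the stub/driver idea in Python: under top-down integration, a driver exercises the top-level unit while a stub stands in for a lower-level unit that is not yet integrated (all names here are invented for illustration):

```python
def tax_stub(amount):
    """Stub: returns a canned value in place of the unwritten tax unit."""
    return 0.0

def compute_invoice(amount, tax_fn):
    """Top-level unit under integration test; the tax unit is injected."""
    return amount + tax_fn(amount)

# A simple driver exercising the top-level unit against the stub.
assert compute_invoice(100.0, tax_stub) == 100.0
```
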
Integration Testing

Potential Problems:
• Inadequate unit testing.
• Inadequate planning & organization for
integration testing.
• Inadequate documentation and testing of
externally-supplied components.
Stages of Testing

Testing in the Large


• System Testing
• End-to-End Testing
• Operations Readiness Testing
• Beta Testing
• Load Testing
• Stress Testing
• Performance Testing
• Reliability Testing
• Regression Testing
System Testing

Tests the functionality of the entire system. Usually done by professional testers.
Realities of System Testing

• Not all problems will be found, no matter how thorough or systematic the testing.
• Testing resources (staff, time, tools, labs) are limited.
• Specifications are frequently unclear/ambiguous and
changing (and not necessarily complete and up-to-
date).
• Systems are almost always too large to permit test
cases to be selected based on code characteristics.
More Realities of Software Testing

• Exhaustive testing is not possible.
• Testing is creative and difficult.
• A major objective of testing is failure prevention.
• Testing must be planned.
• Testing should be done by people who are
independent of the developers.
Test Selection Strategies

Every systematic test selection strategy can be viewed as a way of dividing the input domain into subdomains and selecting one or more test cases from each. The division can be based on such things as code characteristics (white box), specification details (black box), domain structure, risk analysis, etc.

Subdomains are not necessarily disjoint, even though the testing literature frequently refers to them as partitions.
The Down Side of Code-Based
Techniques

• Can only be used at the unit testing level, and even then it can be prohibitively expensive.
• We don’t know the relationship between a “thoroughly” tested component and faults. One can generally argue that coverage criteria are necessary conditions, but not sufficient ones.
The Down Side of Specification-Based
Techniques
• Unless there is a formal specification (which there rarely, if ever, is), it is very difficult to assure that all parts of the specification have been used to select test cases.
• Specifications are rarely kept up-to-date as the
system is modified.
• Even if every functionality unit of a specification
has been tested, that doesn’t assure that there
aren’t faults.
Operational Distributions

An operational distribution is a probability distribution that describes how the system is used in the field.
How Usage Data Can Be Collected For
New Systems

• The input stream for this system is also the input stream
for a different already-operational system.
• The input stream for this system is the output stream for a
different already-operational system.
• Although this system is new, it is replacing an existing
system which ran on a different platform.
• Although this system is new, it is replacing an existing
system which used a different design paradigm or
different programming language.
• There has never been a software system to do this task,
but there has been a manual process in place.
Operational Distribution-Based Test
Case Selection

• A form of domain-based test case selection.
• Uses historical usage data to select test cases.
• Assures that the testing reflects how the system will be used in the field, and therefore uncovers the faults that users are likely to see.
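
As an illustrative sketch (the operations and usage figures are invented), test inputs can be drawn from an operational distribution by weighting random selection with observed field frequencies:

```python
import random

# Hypothetical usage data: relative frequencies of operations observed
# in the field, i.e., an operational distribution.
usage = {"withdraw": 0.55, "deposit": 0.25, "balance": 0.15, "transfer": 0.05}

def select_operations(n, seed=0):
    """Draw n test operations weighted by their observed field usage."""
    rng = random.Random(seed)
    ops = list(usage)
    weights = [usage[op] for op in ops]
    return rng.choices(ops, weights=weights, k=n)

print(select_operations(10))  # frequent operations dominate the sample
```
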
The Down Side of Operational
Distribution-Based Techniques

• Can be difficult and expensive to collect necessary data.
• Not suitable if the usage distribution is uniform
(which it never is).
• Does not take consequence of failure into
consideration.
The Up Side of Operational
Distribution-Based Techniques

• Really does provide a user-centric view of the system.
• Allows you to say concretely what is known about the system’s behavior based on testing.
• Provides a metric that is meaningfully related to the system’s dependability.
Domain-Based Test Case Selection

Look at characteristics of the input domain or subdomains.
• Consider typical, boundary, & near-boundary cases
(these can sometimes be automatically generated).
• This sort of boundary analysis may be meaningless
for non-numeric inputs. What are the boundaries
of {Rome, Paris, London, … }?
• Can also apply similar analysis to output values,
producing output-based test cases.
Domain-Based Testing Example

US Income Tax System:

If income is      Tax is
$0 - $20K         15% of total income
$20K - $50K       $3K + 25% of amount over $20K
Above $50K        $10.5K + 40% of amount over $50K

Boundary cases for inputs: $0, $20K, $50K
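
A sketch in Python of the bracket logic above together with its boundary and near-boundary test cases (the function name and the exact near-boundary inputs are ours):

```python
import math

def us_tax(income):
    """Tax computation following the three-bracket table above."""
    if income <= 20_000:
        return 0.15 * income
    if income <= 50_000:
        return 3_000 + 0.25 * (income - 20_000)
    return 10_500 + 0.40 * (income - 50_000)

# Boundary and near-boundary cases for each subdomain.
assert us_tax(0) == 0
assert us_tax(20_000) == 3_000               # top of first bracket
assert math.isclose(us_tax(20_001), 3_000.25)  # just over the boundary
assert us_tax(50_000) == 10_500              # top of second bracket
assert math.isclose(us_tax(50_001), 10_500.40)
```
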


Random Testing

Random testing involves selecting test cases based on a probability distribution. It is NOT the same as ad hoc testing. Typical distributions are:

• uniform: test cases are chosen with equal probability from the entire input domain.
• operational: test cases are drawn from a distribution defined by carefully collected historical usage data.
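
A minimal sketch of uniform random testing in Python, with a deliberately buggy function and the built-in abs serving as the oracle (the example is ours, not from the slides):

```python
import random

def uniform_tests(n, lo, hi, seed=7):
    """Uniform random testing: each input in [lo, hi] is equally likely."""
    rng = random.Random(seed)
    return [rng.randint(lo, hi) for _ in range(n)]

def failures(program, oracle, inputs):
    """Random testing still needs an oracle to judge each output; here a
    reference implementation plays that role."""
    return [x for x in inputs if program(x) != oracle(x)]

def buggy_abs(x):
    return x  # bug: forgets to negate negative inputs

print(failures(buggy_abs, abs, uniform_tests(20, lo=-10, hi=10)))
```
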
Benefits of Random Testing

• If the domain is well-structured, automatic generation can be used, allowing many more test cases to be run than if tests are manually generated.
• If an operational distribution is used, then it should approximate user behavior.
The Down Side of Random Testing

• An oracle (a mechanism for determining whether an output is correct) is required.
• Need a well-structured domain.
• Even a uniform distribution may be difficult or impossible to produce for complex or non-numeric domains.
• If a uniform distribution is used, only a negligible fraction of
the domain can be tested in most cases.
• Without an operational distribution, random testing does not
approximate user behavior, and therefore does not provide an
accurate picture of the way the system will behave.
Risk-based Testing

Risk is the expected loss attributable to the failures caused by faults remaining in the software.

Risk is based on
• Failure likelihood or likelihood of occurrence.
• Failure consequence.
Risk-based testing therefore involves selecting test cases so as to minimize risk, making sure that the most likely inputs and the highest-consequence ones are covered.
Risk-based Testing

Example: ATM

Functions: Withdraw cash, transfer money, read balance, make payment, buy train ticket.

Attributes: Security, ease of use, availability.
Risk Priority Table

Features &        Occurrence    Failure        Priority
Attributes        Likelihood    Consequence    (L x C)
Withdraw cash     High = 3      High = 3       9
Transfer money    Medium = 2    Medium = 2     4
Read balance      Low = 1       Low = 1        1
Make payment      Low = 1       High = 3       3
Buy train ticket  High = 3      Low = 1        3
Security          Medium = 2    High = 3       6
Ordered Risk Priority Table

Features &        Occurrence    Failure        Priority
Attributes        Likelihood    Consequence    (L x C)
Withdraw cash     High = 3      High = 3       9
Security          Medium = 2    High = 3       6
Transfer money    Medium = 2    Medium = 2     4
Make payment      Low = 1       High = 3       3
Buy train ticket  High = 3      Low = 1        3
Read balance      Low = 1       Low = 1        1
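
The ordered table can be produced mechanically; a short Python sketch of the L x C computation and descending sort (ratings and items transcribed from the tables above):

```python
ratings = {"Low": 1, "Medium": 2, "High": 3}

items = [
    ("Withdraw cash",    "High",   "High"),
    ("Transfer money",   "Medium", "Medium"),
    ("Read balance",     "Low",    "Low"),
    ("Make payment",     "Low",    "High"),
    ("Buy train ticket", "High",   "Low"),
    ("Security",         "Medium", "High"),
]

# Priority = occurrence likelihood x failure consequence.
prioritized = sorted(
    ((name, ratings[l] * ratings[c]) for name, l, c in items),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, priority in prioritized:
    print(f"{name:16} {priority}")
```
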
Acceptance Testing

The end users run the system in their environment to evaluate whether it meets their criteria. The outcome determines whether the customer will accept the system. This is often part of a contractual agreement.
Regression Testing

Tests modified versions of a previously validated system. Usually done by testers. The goal is to assure that changes to the system have not introduced errors (caused the system to regress).

The primary issue is how to choose an effective regression test suite from existing, previously-run test cases.
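
One common selection heuristic, sketched here with invented coverage data: re-run only those previously-run tests whose coverage intersects the changed units:

```python
# Which units of the system each existing test exercises (hypothetical).
coverage = {
    "t1": {"billing", "tax"},
    "t2": {"reports"},
    "t3": {"tax", "audit"},
    "t4": {"login"},
}
changed = {"tax"}  # units modified in the new version

regression_suite = [t for t, units in coverage.items() if units & changed]
print(regression_suite)  # ['t1', 't3']
```
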
Prioritizing Test Cases

Once a test suite has been selected, it is often desirable to prioritize test cases based on some criterion. That way, since the time available for testing is limited and therefore all tests can’t be run, at least the “most important” ones can be.
Bases for Test Prioritization
• Most frequently executed inputs.
• Most critical functions.
• Most critical individual inputs.
• (Additional) statement or branch coverage.
• (Additional) function coverage.
• Fault-exposing potential.
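
As a sketch of the “(additional) statement coverage” basis from the list above, a greedy prioritizer that repeatedly picks the test covering the most not-yet-covered statements (the per-test coverage data is invented):

```python
# Statements covered by each test (hypothetical).
suite = {
    "t1": {1, 2, 3, 4},
    "t2": {3, 4, 5},
    "t3": {5, 6, 7, 8},
    "t4": {1, 8},
}

def prioritize(tests):
    order, covered = [], set()
    remaining = dict(tests)
    while remaining:
        # Next test = the one adding the most still-uncovered statements.
        best = max(remaining, key=lambda t: len(remaining[t] - covered))
        order.append(best)
        covered |= remaining.pop(best)
    return order

print(prioritize(suite))  # ['t1', 't3', 't2', 't4']
```
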
White-box Testing

Methods based on the internal structure of code:
• Statement coverage
• Branch coverage
• Path coverage
• Data-flow coverage
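
To illustrate the difference between the first two criteria (the example is ours): a test set can achieve full statement coverage while still missing a branch:

```python
def classify(x):
    label = "small"
    if x > 10:
        label = "large"
    return label

# classify(20) alone executes every statement (statement coverage) but
# never takes the false branch of the if. Branch coverage additionally
# requires an input, such as classify(5), that skips the reassignment.
assert classify(20) == "large"
assert classify(5) == "small"
```
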
White-box Testing

White-box methods can be used for

• Test case selection or generation.
• Test case adequacy assessment.

In practice, the most common use of white-box methods is as adequacy criteria after tests have been generated by some other method.
Control Flow and Data Flow Criteria

Statement, branch, and path coverage are examples of control flow criteria. They rely solely on syntactic characteristics of the program (ignoring the semantics of the program computation).

The data flow criteria require the execution of path segments that connect parts of the code that are intimately connected by the flow of data.
Issues of White-box Testing

• Is code coverage an effective means of detecting faults?
• How much coverage is enough?
• Is one coverage criterion better than another?
• Does increasing coverage necessarily lead to
higher fault detection?
• Are coverage criteria more effective than random
test case selection?
Test Automation

• Test execution: Run large numbers of test cases/suites without human intervention.
• Test generation: Produce test cases by processing
the specification, code, or model.
• Test management: Log test cases & results; map tests to requirements & functionality; track test progress & completeness.
Why should tests be automated?

• More testing can be accomplished in less time.
• Testing is repetitive, tedious, and error-prone.
• Test cases are valuable - once they are created,
they can and should be used again, particularly
during regression testing.
Test Automation Issues

• Does the payoff from test automation justify the expense and effort of automation?
• Learning to use an automation tool can be
difficult.
• Tests have a finite lifetime.
• Completely automated execution implies putting
the system into the proper state, supplying the
inputs, running the test case, collecting the results,
and verifying the results.
Observations on Automated Tests

• Automated tests are more expensive to create and maintain (estimates of 3-30 times).
• Automated tests can lose relevancy, particularly
when the system under test changes.
• Use of tools requires that testers learn how to use
them, cope with their problems, and understand what
they can and can’t do.
Uses of Automated Testing

• Load/stress tests - Very difficult to have very large numbers of human testers simultaneously accessing a system.
• Regression test suites - Tests maintained from previous releases; run to check that changes haven’t caused faults.
• Sanity tests - Run after every new system build to
check for obvious problems.
• Stability tests - Run the system for 24 hours to see
that it can stay up.
Financial Implications of Improved
Testing

NIST estimates that billions of dollars could be saved each year if improvements were made to the testing process.

*NIST Report: The Economic Impact of Inadequate Infrastructure for Software Testing, 2002.
Estimated Cost of Inadequate Testing

                               Cost of Inadequate    Potential Cost Reduction from
                               Software Testing      Feasible Improvements
Transportation Manufacturing   $1,800,000,000        $589,000,000
Financial Services             $3,340,000,000        $1,510,000,000
Total U.S. Economy             $59 billion           $22 billion

*NIST Report: The Economic Impact of Inadequate Infrastructure for Software Testing, 2002.
