
INDEX

1. Introduction
2. General Concepts of Testing
3. Test Case Design Techniques
Black box test
White box test
4. Test Phases
Unit Test
Link Test
Integration test
Function Test
System Test
Acceptance Test
5. Automated Software Testing
Dynamic Analysis
Coverage Analysis
Static Analysis
CHAPTER 1
INTRODUCTION TO SOFTWARE TESTING

Software testing is arguably the least understood part of the development process, yet it is the most critical element of software quality assurance and represents the ultimate review of specification, design and code generation.
Once the code of the software has been generated, it must be tested to uncover as many errors as possible. Our goal is to design a set of test cases that have the highest probability of finding errors. During the early stages of testing, software engineers perform all tests themselves. However, as the importance of the software increases, separate testing specialists may become involved.
Reviews and other software quality assurance activities can uncover errors, but they are not sufficient. Every time a program is executed, the client is in effect testing it; therefore we have to test the program with the specific intent of finding and removing as many errors as possible. To find the highest possible number of errors, tests must be conducted systematically and designed using standard techniques.
A series of test cases to test both the internal logic and the external requirements is designed and documented using disciplined techniques; expected results are defined, and actual results are recorded for comparison with the expected results.
When we begin testing, we should change our point of view and try hard to break the software: design test cases in a disciplined fashion and review the test cases we create for thoroughness.
In the end software testing may seem like a destructive activity, but in truth it is constructive and requires a great deal of attention.
CHAPTER 2
GENERAL CONCEPTS OF TESTING

During testing the software engineer produces a series of test cases that are used to
“rip apart” the software they have produced. Testing is the one step in the software
process that can be seen by the developer as destructive instead of constructive. Software
engineers are typically constructive people, and testing requires them to overcome
preconceived notions of correctness and deal with conflicts when errors are identified.

2.1 TESTING OBJECTIVES


Testing is the process of executing program(s) with the intent of finding errors, rather
than (a common misconception) of showing the correct functioning of the program(s). The
distinction may sound like a matter of semantics, but it has been observed to have a
profound effect on testing success. The difference actually lies in the different
psychological effect caused by the different objectives: if our goal is to demonstrate that
a program has no errors, then we will tend to select tests that have a low probability of
causing the program to fail. On the other hand, if our goal is to demonstrate that a
program has errors, our test data will have a higher probability of finding errors.
Specifically, testing should bear the following objectives:
(a) To reveal design errors;
(b) To reveal logic errors;
(c) To reveal performance bottlenecks;
(d) To reveal security loopholes; and
(e) To reveal operational deficiencies.
All these objectives and the corresponding actions contribute to increasing the quality and
reliability of the application software.
2.2 TESTING STRATEGY
There are two strategies for testing software, namely White-Box Testing and Black-Box
Testing. White-Box Testing, also known as Code Testing, focuses on the independent
logical internals of the software to assure that all code statements and logical paths have
been tested. Black-Box Testing, also known as Specification Testing, focuses on the
functional externals to assure that defined input will produce actual results that agree
with the required results documented in the specifications. Both strategies should be used,
depending on the level of testing.

2.3 LEVELS OF TESTING

There are 5 levels of Testing, each of which carries a specific functional purpose, to be
carried out in chronological order.

Unit Testing (White Box Test)
Testing of the program modules in isolation, with the objective of finding discrepancies between the programs and the program specifications.

Link Testing (White Box Test)
Testing of the linkages between tested program modules, with the objective of finding discrepancies between the programs and the system specifications.

Function Testing (Black Box Test)
Testing of the integrated software on a function-by-function basis, with the objective of finding discrepancies between the programs and the function specifications.

Systems Testing (Black Box Test)
Testing of the integrated software, with the objective of finding discrepancies between the programs and the original objectives with regard to the operating environment of the system (e.g. recovery, security, performance, storage, etc.).

Acceptance Testing (Black Box Test)
Testing of the integrated software by the end users (or their proxy), with the objective of finding discrepancies between the programs and the end user needs.

2.4 GENERAL TESTING PRINCIPLES


The following points should be noted when conducting testing:
• As far as possible, testing should be performed by a group of people different
from those performing the design and coding of the same system.
• Test cases must be written for invalid and unexpected, as well as valid and
expected, input conditions. A good test case is one that has a high probability of
detecting an undiscovered error. A successful test case is one that detects an
undiscovered error.
• A necessary part of a test case is a definition of the expected outputs or results.
• Do not plan the testing effort on the assumption that no errors will be found.
• The probability of the existence of more errors in a section of a program is
proportional to the number of errors already found in that section.
• Testing libraries should be set up so that regression tests can be performed at
system maintenance and enhancement times.
• The later in the development life cycle a fault is discovered, the higher the cost of
correction.
• Successful testing relies on complete and unambiguous specifications.

2.5 QUALITY ASSURANCE


Introduction

Software Quality Assurance (SQA) without measures is like a jet with no fuel.
Everyone is ready to go but not much happens. This paper deals with the SQA
activities to ensure the right fuel is available to reach the destination, namely high
quality. There are three processes, common to both development and purchasing,
that enable organizations to be high flyers and reach quality in the stratosphere.
The role of SQA, as described here, is to confirm that core measures are used
effectively by management processes involving technical and purchasing directors,
project and purchasing managers and process improvement groups. They cover:
1. Benchmarking and Process Improvement
2. Estimating and Risk Assessment
3. Progress Control and Reporting
Process improvement enables the same amount of software to be built in less time
with less effort and fewer defects. Informed estimating uses process productivity
benchmarks to evaluate constraints, assess risks and to arrive at viable estimates.
Estimates of the defects at delivery use the history from benchmarked projects and
allow alternative staffing strategies to be evaluated. Throwing people in to meet tight
time-to-market schedules has a disastrous impact on quality. Progress control tracks
defects found during development in order to avoid premature delivery and to ensure
the reliability goals are achieved.
Each process contributes separately to improving the quality of the final software
product. We describe how the core measures are used in each process to fuel
improved quality. Dramatic quality improvements are achieved by dealing with all
three. (Ensuring the fuel is high octane).
SQA is defined as "a group of related activities employed throughout the software
lifecycle to positively influence and quantify the quality of the delivered software"
(Ref. 1). Much of the SQA literature relates to product assurance. This article
focuses on process assurance and the core measurement data that supports all
management levels.
The basic fuel elements are the Carnegie Mellon Software Engineering Institute (SEI)
recommendations on core software measures, namely software size, time, effort and
defects (Ref. 2). An extra benefit is that the majority of the SEI Capability Maturity
Model Integration (CMMI) Key Process Areas (KPAs) are met by assuring that the
processes use these measures.

Quality Assurance is the process of making sure that the customer gets enough of what
they pay for to satisfy their needs. Testing is the means by which we perform that
process. You can test without assuring quality, but you can't assure quality without
testing. A common problem with software quality is that the assurance is in the hands of
the producers. While the producers can certainly create and perform insightful, powerful
tests, it is perfectly possible to design tests that will churn away forever and never
discover a defect. This is the psychological temptation when the software producers
design the tests that evaluate their own work.
CHAPTER 3
TEST CASE DESIGN TECHNIQUES
The preceding section of this paper has provided a "recipe" for developing a unit test
specification as a set of individual test cases. In this section, a range of techniques which
can be used to help define test cases is described.
Test case design techniques can be broadly split into two main categories. Black box
techniques use the interface to a unit and a description of its functionality, but do not need
to know how the inside of the unit is built. White box techniques make use of information
about how the inside of a unit works. There are also some other techniques which do not
fit into either of the above categories; error guessing falls into this group.

Black box (functional):  Specification derived tests, Equivalence partitioning, Boundary value analysis, State-transition testing
White box (structural):  Branch testing, Condition testing, Data definition-use testing, Internal boundary value testing
Other:                   Error guessing

Table 3.1 - Categories of Test Case Design Techniques

The most important ingredients of any test design are experience and common sense. Test
designers should not let any of the given techniques obstruct the application of
experience and common sense.
3.1. EQUIVALENCE PARTITIONING
Equivalence partitioning is a much more formalised method of test case design. It is
based upon splitting the inputs and outputs of the software under test into a number of
partitions, where the behaviour of the software is equivalent for any value within a
particular partition.
The data which forms partitions is not limited to routine parameters. Partitions can also be
present in data accessed by the software, in time, in input and output sequence, and in state.
Equivalence partitioning assumes that all values within any individual partition are
equivalent for test purposes. Test cases should therefore be designed to test one value in
each partition. Consider again the square root function used in the previous example. The
square root function has two input partitions and two output partitions, as shown in table
3.2.

Table 3.2 - Partitions for Square Root

Input partitions:   (i) input < 0;   (ii) input >= 0
Output partitions:  (a) return value >= 0;   (b) error message output using Print_Line

These four partitions can be tested with two test cases:


Test Case 1: Input 4, Return 2
- Exercises the >=0 input partition (ii)
- Exercises the >=0 output partition (a)
Test Case 2: Input -10, Return 0, Output "Square root error - illegal negative input"
using Print_Line.
- Exercises the <0 input partition (i)
- Exercises the "error" output partition (b)

For a function like square root, we can see that equivalence partitioning is quite simple.
One test case for a positive number and a real result; and a second test case for a negative
number and an error result. However, as software becomes more complex, the
identification of partitions and the inter-dependencies between partitions becomes much
more difficult, making it less convenient to use this technique to design test cases.
Equivalence partitioning is still basically a positive test case design technique and needs
to be supplemented by negative tests.
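To make the two partition test cases concrete, here is a minimal sketch in Python. The `square_root` function is a hypothetical stand-in for the unit under test described above (it returns the root for non-negative input, and returns 0 after printing the error message for negative input); it is not code from the original example.

```python
import math

def square_root(x):
    # Hypothetical unit under test matching the behaviour described above;
    # print() stands in for Print_Line.
    if x < 0:
        print("Square root error - illegal negative input")
        return 0.0
    return math.sqrt(x)

def test_valid_partition():
    # Test Case 1: partition (ii), input >= 0, exercising output partition (a).
    assert square_root(4) == 2

def test_invalid_partition():
    # Test Case 2: partition (i), input < 0, exercising the "error" output partition (b).
    assert square_root(-10) == 0

test_valid_partition()
test_invalid_partition()
```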

3.2. BOUNDARY VALUE ANALYSIS


Boundary value analysis uses the same analysis of partitions as equivalence partitioning.
However, boundary value analysis assumes that errors are most likely to exist at the
boundaries between partitions. Boundary value analysis consequently incorporates a
degree of negative testing into the test design, by anticipating that errors will occur at or
near the partition boundaries. Test cases are designed to exercise the software on and at
either side of boundary values. Consider the two input partitions in the square root
example, as illustrated by figure 3.2.

Figure 3.2 - Input Partition Boundaries in Square Root

The zero or greater partition has a boundary at 0 and a boundary at the most positive real
number. The less than zero partition shares the boundary at 0 and has another boundary at
the most negative real number. The output has a boundary at 0, below which it cannot go.

Test Case 1: Input {the most negative real number}, Return 0, Output "Square root
error - illegal negative input" using Print_Line

- Exercises the lower boundary of partition (i).

Test Case 2: Input {just less than 0}, Return 0, Output "Square root error - illegal
negative input" using Print_Line
- Exercises the upper boundary of partition (i).

Test Case 3: Input 0, Return 0


- Exercises just outside the upper boundary of partition (i), the lower
boundary of partition (ii) and the lower boundary of partition (a).

Test Case 4: Input {just greater than 0}, Return {the positive square root of the input}
- Exercises just inside the lower boundary of partition (ii).

Test Case 5: Input {the most positive real number}, Return {the positive square root of
the input}
- Exercises the upper boundary of partition (ii) and the upper boundary
of partition (a).

As for equivalence partitioning, it can become impractical to use boundary value analysis
thoroughly for more complex software. Boundary value analysis can also be meaningless
for non-scalar data, such as enumeration values. In the example, partition (b) does not
really have boundaries. For purists, boundary value analysis requires knowledge of the
underlying representation of the numbers. A more pragmatic approach is to use any small
values above and below each boundary, and suitably large positive and negative numbers.
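A minimal sketch of the five boundary value test cases, again against the hypothetical `square_root` stand-in. `sys.float_info.max` and a small epsilon are pragmatic stand-ins for "the most positive/negative real number" and "just above/below zero"; they are assumptions, not values taken from the text.

```python
import math
import sys

def square_root(x):
    # Same hypothetical unit under test as in the previous sketch.
    if x < 0:
        print("Square root error - illegal negative input")
        return 0.0
    return math.sqrt(x)

EPS = 1e-9   # a pragmatic "just above/below zero" value, not the true representation limit

# (input, expected return) pairs for Test Cases 1-5 described above.
boundary_cases = [
    (-sys.float_info.max, 0.0),                           # Test Case 1: lower boundary of partition (i)
    (-EPS, 0.0),                                          # Test Case 2: just less than 0
    (0.0, 0.0),                                           # Test Case 3: the shared boundary at 0
    (EPS, math.sqrt(EPS)),                                # Test Case 4: just greater than 0
    (sys.float_info.max, math.sqrt(sys.float_info.max)),  # Test Case 5: upper boundary of partition (ii)
]

for value, expected in boundary_cases:
    result = square_root(value)
    assert abs(result - expected) <= 1e-6 * max(1.0, expected), (value, result, expected)
print("boundary value tests passed")
```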

3.3. STATE - TRANSITION TESTING


State transition testing is particularly useful where either the software has been designed
as a state machine or the software implements a requirement that has been modelled as a
state machine. Test cases are designed to test the transitions between states by creating
the events which lead to transitions.
When used with illegal combinations of states and events, test cases for negative testing
can be designed using this approach.
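As an illustration, below is a minimal sketch of state-transition test cases for a hypothetical two-state turnstile machine (the states, events and transitions are invented for illustration and do not come from the text). One test walks the legal transitions; the negative test feeds an illegal state/event combination.

```python
# Transition table of the hypothetical state machine: (state, event) -> next state.
TRANSITIONS = {
    ("LOCKED", "coin"): "UNLOCKED",
    ("UNLOCKED", "push"): "LOCKED",
}

class Turnstile:
    def __init__(self):
        self.state = "LOCKED"

    def handle(self, event):
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"illegal event {event!r} in state {self.state}")
        self.state = TRANSITIONS[key]

def test_legal_transitions():
    # Create the events which lead to each defined transition.
    t = Turnstile()
    t.handle("coin")
    assert t.state == "UNLOCKED"
    t.handle("push")
    assert t.state == "LOCKED"

def test_illegal_event_negative():
    # Negative test: an event with no transition defined for the current state.
    t = Turnstile()
    try:
        t.handle("push")
        assert False, "expected an error for an illegal state/event combination"
    except ValueError:
        pass

test_legal_transitions()
test_illegal_event_negative()
```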

3.4. BRANCH TESTING


In branch testing, test cases are designed to exercise control flow branches or decision
points in a unit. This is usually aimed at achieving a target level of Decision Coverage.
Given a functional specification for a unit, a "black box" form of branch testing is to
"guess" where branches may be coded and to design test cases to follow the branches.
However, branch testing is really a "white box" or structural test case design technique.
Given a structural specification for a unit, specifying the control flow within the unit, test
cases can be designed to exercise branches. Such a structural unit specification will
typically include a flowchart or PDL. Returning to the square root example, a test
designer could assume that there would be a branch between the processing of valid and
invalid inputs, leading to the following test cases:

Test Case 1: Input 4, Return 2


- Exercises the valid input processing branch

Test Case 2: Input -10, Return 0, Output "Square root error - illegal negative input" using
Print_Line.
- Exercises the invalid input processing branch

However, there could be many different structural implementations of the square root
function. The following structural specifications are all valid implementations of the
square root function, but the above test cases would only achieve decision coverage of
the first and third versions of the specification.
Figure 3.3(a) - Specification 1 Figure 3.3(b) - Specification 2

Figure 3.3(c) - Specification 3 Figure 3.3(d) - Specification 4

It can be seen that branch testing works best with a structural specification for the unit. A
structural unit specification will enable branch test cases to be designed to achieve
decision coverage, but a purely functional unit specification could lead to coverage gaps.
One thing to beware of is that by concentrating upon branches, a test designer could lose
sight of the overall functionality of a unit. It is important to always remember that it is the
overall functionality of a unit that matters, and that branch testing is a means to an
end, not an end in itself. Another consideration is that branch testing is based solely on
the outcome of decisions. It makes no allowance for the complexity of the logic which
leads to a decision.
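The following sketch shows the two branch test cases run against one possible structural implementation of the square root unit, with a crude log used to confirm that both outcomes of the decision were exercised. The implementation and the logging mechanism are assumptions for illustration; real decision coverage would normally be measured by a coverage tool.

```python
import math

# Crude decision-coverage log, just for illustration.
taken_branches = set()

def square_root(x):
    # One possible structural implementation of the unit: a single decision
    # separating valid from invalid input (an assumption, not figures 3.3(a)-(d)).
    if x < 0:
        taken_branches.add("x < 0: true")
        print("Square root error - illegal negative input")
        return 0.0
    taken_branches.add("x < 0: false")
    return math.sqrt(x)

# Test Case 1 follows the valid-input branch, Test Case 2 the invalid-input branch.
assert square_root(4) == 2
assert square_root(-10) == 0
assert taken_branches == {"x < 0: true", "x < 0: false"}   # both decision outcomes covered
```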

3.5. CONDITION TESTING


There are a range of test case design techniques which fall under the general title of
condition testing, all of which endeavor to mitigate the weaknesses of branch testing
when complex logical conditions are encountered. The object of condition testing is to
design test cases to show that the individual components of logical conditions and
combinations of the individual components are correct.
Test cases are designed to test the individual elements of logical expressions, both within
branch conditions and within other expressions in a unit. As for branch testing, condition
testing could be used as a "black box" technique, where the test designer makes
intelligent guesses about the implementation of a functional specification for a unit.
However, condition testing is more suited to "white box" test design from a structural
specification for a unit.

To illustrate condition testing, consider the example specification for the square root
function which uses successive approximation (figure 3.3(d) - Specification 4). Suppose
that the designer for the unit made a decision to limit the algorithm to a maximum of 10
iterations, on the grounds that after 10 iterations the answer would be as close as it would
ever get. The PDL specification for the unit could specify an exit condition like that given
in figure 3.4.
Figure 3.4 - Loop Exit Condition: EXIT WHEN (error < desired accuracy) OR (iterations = 10)

If the coverage objective is Modified Condition Decision Coverage, test cases have to
prove that both error<desired accuracy and iterations=10 can independently affect the
outcome of the decision.

Test Case 1: 10 iterations, error>desired accuracy for all iterations.


- Both parts of the condition are false for the first 9 iterations.
On the tenth iteration, the first part of the condition is false and the second part becomes
true, showing that the iterations=10 part of the condition can independently affect its
outcome.

Test Case 2: 2 iterations, error>=desired accuracy for the first iteration, and error<desired
accuracy for the second iteration.
- Both parts of the condition are false for the first iteration. On
the second iteration, the first part of the condition becomes true and the second part
remains false, showing that the error<desired accuracy part of the condition can
independently affect its outcome.
Condition testing works best when a structural specification for the unit is available. It
provides a thorough test of complex conditions, an area of frequent programming and
design error and an area which is not addressed by branch testing. As for branch testing, it
is important for test designers to beware that concentrating on conditions could distract a
test designer from the overall functionality of a unit.
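A minimal sketch of the loop exit condition of figure 3.4 and the two MC/DC test cases discussed above. The successive-approximation details (initial estimate, Newton update step, the concrete inputs) are assumptions chosen so that one run terminates on the iteration limit and the other on the accuracy check; they are not taken from the original specification.

```python
def square_root(x, desired_accuracy=1e-6):
    estimate = x if x > 1 else 1.0
    iterations = 0
    while True:
        estimate = 0.5 * (estimate + x / estimate)   # successive approximation (Newton) step
        error = abs(estimate * estimate - x)
        iterations += 1
        # Loop exit condition from figure 3.4: leave when either part is true.
        if error < desired_accuracy or iterations == 10:
            return estimate, iterations

# Test Case 1: the iterations = 10 part alone terminates the loop; a very large
# input keeps error >= desired accuracy for all ten iterations.
_, iters = square_root(1e30)
assert iters == 10

# Test Case 2: the error < desired accuracy part alone terminates the loop;
# a small exact square converges in well under ten iterations.
root, iters = square_root(4)
assert iters < 10 and abs(root - 2) < 1e-3
```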
3.6. INTERNAL BOUNDARY VALUE ANALYSIS
In many cases, partitions and their boundaries can be identified from a functional
specification for a unit, as described under equivalence partitioning and boundary value
analysis above. However, a unit may also have internal boundary values which can only
be identified from a structural specification. Consider a fragment of the successive
approximation version of the square root unit specification, as shown in figure 3.5
( derived from figure 3.3(d) - Specification 4).

Figure 3.5 – Fragment of Specification 4

The calculated error can be in one of two partitions about the desired accuracy, a feature
of the structural design for the unit which is not apparent from a purely functional
specification. An analysis of internal boundary values yields three conditions for which
test cases need to be designed.

Test Case 1: Error just greater than the desired accuracy

Test Case 2: Error equal to the desired accuracy

Test Case 3: Error just less than the desired accuracy


Internal boundary value testing can help to bring out some elusive bugs. For example,
suppose "<=" had been coded instead of the specified "<". Nevertheless, internal
boundary value testing is a luxury to be applied only as a final supplement to other test
case design techniques.
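A minimal sketch of the three internal boundary value test cases around the desired accuracy. The `has_converged` helper is a hypothetical fragment standing in for the check in figure 3.5; the middle assertion is the one that would catch a "<=" coded instead of the specified "<".

```python
DESIRED_ACCURACY = 1e-6

def has_converged(error):
    # Hypothetical fragment of the convergence check: specified as strictly "<".
    return error < DESIRED_ACCURACY

assert has_converged(DESIRED_ACCURACY - 1e-12)       # Test Case 3: just less than the accuracy
assert not has_converged(DESIRED_ACCURACY)           # Test Case 2: exactly equal; fails if "<=" was coded
assert not has_converged(DESIRED_ACCURACY + 1e-12)   # Test Case 1: just greater than the accuracy
```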

Orthogonal Array Testing


The Orthogonal Array Testing Strategy (OATS) is a systematic, statistical way of testing pair-wise
interactions. It provides representative (uniformly distributed) coverage of all variable pair
combinations. This makes the technique particularly useful for integration testing of software
components (especially in OO systems where multiple subclasses can be substituted as the server for a
client). It is also quite useful for testing combinations of configurable options (such as a web page that
lets the user choose the font style, background color, and page layout).

Test case selection poses an interesting dilemma for the software professional. Almost everyone has
heard that you can't test quality into a product, that testing can only show the existence of defects and
never their absence, and that exhaustive testing quickly becomes impossible -- even in small systems.
However, testing is necessary. Being intelligent about which test cases you choose can make all the
difference between (a) endlessly executing tests that just aren't likely to find bugs and don't increase
your confidence in the system and (b) executing a concise, well-defined set of tests that are likely to
uncover most (not all) of the bugs and that give you a great deal more comfort in the quality of your
software.
The basic fault model that lies beneath this technique is:
• Interactions and integrations are a major source of defects.
• Most of these defects are not a result of complex interactions such as "When the background is blue and
the font is Arial and the layout has menus on the right and the images are large and it's a Thursday then
the tables don't line up properly." Most of these defects arise from simple pair-wise interactions such as
"When the font is Arial and the menus are on the right the tables don't line up properly."
• With so many possible combinations of components or settings, it is easy to miss one.
• Randomly selecting values to create all of the pair-wise combinations is bound to create inefficient test
sets and test sets with random, senseless distribution of values.
OATS provides a means to select a test set that:
• Guarantees testing the pair-wise combinations of all the selected variables.
• Creates an efficient and concise test set with many fewer test cases than testing all combinations of all
variables.
• Creates a test set that has an even distribution of all pair-wise combinations.
• Exercises some of the complex combinations of all the variables.
• Is simpler to generate and less error prone than test sets created by hand.
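As an illustration, the sketch below applies the standard L9(3^4) orthogonal array to the web-page example mentioned above and verifies that all pair-wise combinations are covered by nine test cases instead of twenty-seven. The concrete option values and the unused fourth column are illustrative assumptions.

```python
from itertools import combinations, product

# Each row of the standard L9(3^4) orthogonal array is one test case;
# entries are level indices 0..2 for four factors (the fourth column is unused here).
L9 = [
    (0, 0, 0, 0), (0, 1, 1, 1), (0, 2, 2, 2),
    (1, 0, 1, 2), (1, 1, 2, 0), (1, 2, 0, 1),
    (2, 0, 2, 1), (2, 1, 0, 2), (2, 2, 1, 0),
]

# Hypothetical configurable options following the web-page example above.
factors = {
    "font":       ["Arial", "Times", "Courier"],
    "background": ["blue", "white", "green"],
    "layout":     ["menus left", "menus right", "no menus"],
}
names = list(factors)

# Map the first three columns of the array onto the three factors.
test_cases = [{name: factors[name][row[i]] for i, name in enumerate(names)} for row in L9]

# Verify that every pair of levels for every pair of factors appears at least once.
for a, b in combinations(names, 2):
    covered = {(tc[a], tc[b]) for tc in test_cases}
    assert covered == set(product(factors[a], factors[b]))

print(f"{len(test_cases)} test cases cover all pairs (versus 27 for all combinations)")
```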
CHAPTER 4

SOFTWARE TESTING TECHNIQUES


Software testing is a fundamental component of software quality assurance and
represents a review of specification, design and coding. The greater visibility of software
systems and the cost associated with software failure are motivating factors for planning
thorough testing. Structural (usually called "white box") testing and functional ("black
box") testing have unique characteristics, advantages and limitations that make them
more or less applicable to certain stages of testing.

4.1 WHITE BOX TESTING


White box testing is a test case design approach that uses the control structure of
the procedural design to derive test cases. Using white box testing approaches, the
software engineer can produce test cases that:
(1) Guarantee that all independent paths within a module have been exercised at least once
(2) Exercise all logical decisions on their true and false sides
(3) Execute all loops at their boundaries and within their operational bounds
(4) Exercise internal data structures to ensure their validity

4.1.1 Requirements Analysis


Understanding the requirements is key to performing white box testing. To this end, it is
necessary to interact with the customer to get a better understanding of the system under
consideration. All relevant project documents relating to the system functionality and
design are desirable.

4.1.2 System Architecture


Key to undertaking any white-box testing project is understanding the overall system
architecture, so one should work with client-provided documentation relating to the system
architecture. The system architecture documentation forms the basis for the identification of
systems and subsystems and for the generation of test cases.

4.1.3 Identification
The identification of the test items is done primarily based on the specifications of the
product. These specifications would be related to:
• Functions (exhaustive list) of the system
• Response criteria (benchmarking and stress testing)
• Volume constraints (number of users, hits, stress testing)
• Stability criteria (24 hour testing with fast operations)
• Database responses (flushing, cleaning, updating rates etc.)
• Network criteria (network traffic, choking, etc.)
• Compatibility (environments, browsers, etc.)
• User Interface / Friendliness Criteria
• Modularity (Ability to easily interface with other tools)
• Security

4.1.4 Criteria for Test Cases


Each Test Plan Item should have the following specific characteristics:
• It should be uniquely identifiable
• It should be unambiguous
• It should have well-defined test-data (or data-patterns)
• It should have well defined pass/fail criteria for each sub-item and overall-criteria
for the pass/fail of the entire test itself
• It should be easy to record
• It should be easy to demonstrate repeatedly

Many of the above criteria are related to actually identifying the test plan items and
involve a good understanding of the specifications. However, to maintain a strong process,
these aspects should be kept in mind when formulating the structure of the test plan.

4.2 BASIS PATH TESTING


Basis path testing is a testing mechanism proposed by McCabe whose aim is to derive a
logical complexity measure of a procedural design and use this measure as a guide for
defining a basis set of execution paths.
Test cases derived to exercise the basis set are guaranteed to execute every statement at
least once.

4.2.1 Flow Graph Notation


Flow graph notation is a notation for representing control flow, similar to flowcharts and
UML activity diagrams.

4.2.2 Cyclomatic Complexity


Cyclomatic complexity gives a quantitative measure of the logical complexity of a program.
This value gives the number of independent paths in the basis set, and an upper bound on
the number of tests required to ensure that each statement is executed at least once.
An independent path is any path through the program that introduces at least one new set
of processing statements or a new condition (i.e., a new edge). Cyclomatic complexity
therefore provides an upper bound on the number of tests required to guarantee coverage
of all program statements.
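A minimal sketch of the computation V(G) = E - N + 2 for a small, hypothetical flow graph consisting of a single if/else decision (the node names are invented for illustration).

```python
# Edges of a hypothetical flow graph for a unit containing a single if/else decision.
edges = [
    ("start", "decision"),
    ("decision", "then_branch"),
    ("decision", "else_branch"),
    ("then_branch", "join"),
    ("else_branch", "join"),
    ("join", "end"),
]
nodes = {node for edge in edges for node in edge}

v_of_g = len(edges) - len(nodes) + 2   # V(G) = E - N + 2
print(f"V(G) = {len(edges)} - {len(nodes)} + 2 = {v_of_g}")   # 2 independent paths in the basis set
```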

4.3 CONTROL STRUCTURE TESTING

4.3.1 Conditions Testing


Condition testing aims to exercise all logical conditions in a program module. We may
define:
• Relational expression: (E1 op E2), where E1 and E2 are arithmetic expressions.
• Simple condition: a Boolean variable or relational expression, possibly preceded
by a NOT operator.
• Compound condition: composed of two or more simple conditions, Boolean
operators and parentheses.
• Boolean expression: a condition without relational expressions.

4.3.2 Loop Testing


Loops are fundamental to many algorithms. Loops can be classified as simple, concatenated,
nested, or unstructured. Note that unstructured loops should not be tested; rather, they
should be redesigned.
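As a sketch, the test cases below exercise a simple loop in a hypothetical bounded summation routine at zero, one, a typical number, and the maximum number of passes; this 0/1/typical/maximum heuristic is a common simple-loop strategy and is assumed here rather than prescribed by the text.

```python
MAX_ITEMS = 100   # hypothetical upper bound on the number of loop passes

def sum_items(items):
    if len(items) > MAX_ITEMS:
        raise ValueError("too many items")
    total = 0
    for value in items:          # the simple loop under test
        total += value
    return total

assert sum_items([]) == 0                        # zero passes through the loop
assert sum_items([7]) == 7                       # exactly one pass
assert sum_items(list(range(10))) == 45          # a typical number of passes
assert sum_items([1] * MAX_ITEMS) == MAX_ITEMS   # the maximum allowed number of passes
```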

4.4 BLACK BOX TESTING


Black box testing approaches concentrate on the functional requirements of the
software. Black box testing allows the software engineer to derive sets of input
conditions that will fully exercise all functional requirements for a program. Black box
testing is not an alternative to white box techniques; it is a complementary approach that
is likely to uncover a different class of errors than the white box approaches.
Black box testing tries to find errors in the following categories:
(1) Incorrect or missing functions
(2) Interface errors
(3) Errors in data structures or external database access
(4) Performance errors
(5) Initialization and termination errors

By applying black box approaches we derive a set of test cases that satisfies the following criteria:
(1) Test cases that reduce the number of additional test cases needed to achieve reasonable testing
(2) Test cases that tell us something about the presence or absence of classes of errors

CHAPTER 5
TESTING OF REAL TIME SYSTEMS

Testing real-time systems presents more challenges than testing non-real-time systems, since, in addition to
the value domain, the temporal domain also has to be considered. A number of design issues affect the
testing strategies and the testability of the system. This chapter gives a brief introduction to some of these
design issues and explains how testing is affected by the different possible choices. The main conclusion is
that testers need to take part in the design phase to safeguard the testability of the system, an issue that is
otherwise easily overlooked.
Testing the temporal domain of a real-time system also affects potential tools for automatic test execution.
The last part of this chapter is devoted to explaining the major requirements on testing tools stemming from the
need to test in the temporal domain. The conclusion is that automating the test execution is a necessity for
successful testing.

Introduction
More than 99% of the processors produced today are used in embedded systems [Tur99]. Many of these
embedded systems have real-time requirements (e.g., cellular phones, fuel injection systems, modems, and
video recorders). Still, to the best of our knowledge, little attention has been devoted to developing the theory
and best practice of testing real-time systems.
Although testing principles developed for non-real-time systems are applicable to real-time systems, the
fact that time is a parameter in the testing complicates many issues.

Real-time Systems
Many computer systems, including most real-time systems, can be viewed as in Figure 5.1. Input, here
called an event, initiates a computation, here called a task. Upon termination, the task
produces a result. Loosely, a task can be understood to be an arbitrary computation. A real-time task is a
task that must complete at an intended point in time. In practice, it is usually enough if the real-time
task completes before the intended point in time, that is, the deadline.

Figure 5.1 Simple model of a computer system.

This definition of a real-time task can be used to define a real-time system as a system that contains at least
one real-time task [Sch93]. This is only one of many possible definitions of a real-time system with the
common denominator that in real-time systems, both the value and the time domains are important. The
reason for choosing this and the previous definitions in this section is to include as many real-time systems
as possible.
Real-time systems are often classified according to the cost of missing a deadline. Locke [Loc86] describes
four different classes (soft, firm, hard essential, and hard critical) of real-time systems based on the cost of
missing a deadline. In a soft real-time system, completing a task after its deadline will still be beneficial
even if the gain is not as big as if the task had been completed within the deadline and it may be acceptable
to occasionally miss a deadline. In a firm real-time system, completing a task after its deadline will neither
give any benefits nor incur any costs. In a hard essential real-time system, a bounded cost will be the result
of missing a deadline. An example may be lost revenues in a billing system. Finally, missing a deadline of
a hard critical system will have disastrous results, for instance loss of human lives.
Different events give rise to event types. Two events are of the same event type if they have entered the
system through the same channel and they only differ in the time of entrance and possibly in the specific
value. Event types may be very different from each other. Examples of event types are temperature
readings, keyboard commands and pushing a call button of an elevator. When designing real-time systems
the types and frequencies of events are important to consider. An event type may be periodic, sporadic or
aperiodic. An event type is periodic if an event of that type occurs with a regular (and known) period. An
event type is sporadic if events of that type may occur any time but there is a known minimum inter-arrival
time between two consecutive events of that type. An event type is aperiodic if nothing is known about
how often events may occur or when it is known that events may occur anytime. The peak load that is
assumed to be generated by the environment is called the load hypothesis [KV93] and it is often formulated
in terms of types and frequencies of the events from the environment.

TESTING

Testing, that is, dynamic execution of test cases [BS98], has two main goals: assessing and
increasing reliability [FHL+98]. Testing as a means of assessing reliability relies on choosing and
executing test cases based on some operational distribution and monitoring the number of encountered
failures. A failure is defined as a deviation of the software from its expected delivery or service [BS98].
Testing as a means of increasing reliability builds on selecting test cases that are assumed to be
especially likely to cause failures. The observed failure is analyzed to find the cause of the failure, which is
the fault [BS98]. The fault is removed and the reliability is assumed to increase. Many different test
methods exist (e.g. equivalence partitioning, boundary value analysis, state-based testing, syntax testing
[Bei90]) that are all assumed to generate test suites containing test cases especially prone to revealing
failures.
The strategies and methods used for testing the value domain in non-real-time systems can to a large extent
be used without alteration in real-time systems. A good example of this is the DO178b standard for testing
avionics systems [DO92]. This standard requires the testing of the most safety critical parts of the avionics
system to reach 100% Modified Condition Decision Coverage (MCDC). MCDC is a code coverage
criterion totally independent of the temporal properties of the application.
The main challenge in testing real-time systems is that they need to be tested in the temporal domain as
well as the value domain [Sch93]. Testing in the temporal domain has several implications. The input to the
test object may need to be issued at a precise moment. The temporal state of the test object at the start of
the test execution may need to be controlled. The timing of the result may need to be observed. There is a
potential for non-determinism etc.
Two central concepts are observability and controllability. Observability is the functionality
provided by the system to observe or monitor what the system does, how it does it, and when it does it
[Sch93]. Controllability is the functionality available to the user to control the (re-)execution of a
test case [Sch93]. The testability of a system is defined as the attributes of software that bear on the effort
needed for validating the [modified] software [ISO91]. The testability of a system depends to a large extent
on its observability and controllability [Sch93].

Design Trade-offs for Testability in RT Systems

The following sections describe how testability is affected by different design decisions.
3.1 Overview
When designing a real-time system there are many decisions to make. The outcomes of most of these
decisions affect the testing in one way or another. In the following subsections we will describe and discuss
issues relating to some of these decisions. We have selected issues that are likely to be present in many
real-time systems and that have a major impact on the testing. Scheduling and the choice of design
paradigm are the first two issues covered. These two issues are closely related and have a major impact on
the testing strategy that can be used [LMA02]. The next two issues, tracing and support for state
manipulation, give concrete examples of how to achieve observability and controllability in a real-time
system. The final issue, caching, is motivated by the fact that most commercial processors today contain
one or more levels of caches. Introducing caching in a system may increase the performance of the
system, but there are severe drawbacks from a testability perspective.
3.2 Scheduling
In practice, most real-time systems contain more than one task. Sometimes two or more tasks will be
ready to execute at the same time. In these cases there must be rules for the order in which the tasks should
be executed, since only one task at a time can use a processor or other resources. Determining the order of
the task execution based on the supplied rules is called scheduling. Scheduling can be either static or
dynamic.
In static scheduling, the execution order of the tasks is determined in advance. In a simplified description,
as soon as the execution of one task is finished the processor finds the next task to execute by looking in a
table containing the pre-calculated order. Often these pre-calculated orders are cyclic.
In dynamic scheduling there is no pre-calculated execution order. Instead there is a set of rules for how to
resolve a conflict when two or more tasks want to execute at the same time. One common approach is to
assign a priority to each task and force the tasks to execute in priority order. It is not uncommon to let
higher priority tasks interrupt lower priority tasks. This is called preemption.
3.3 Design paradigms
There are two different design paradigms for real-time systems: time-triggered and event-triggered [Sch93].
The main difference between the two design paradigms is when communication between the real-time
system and the environment is performed. A time-triggered system only communicates with its
environment at predefined points in time. Events that have occurred in the environment since the latest
communication point will not be detected and reacted upon by the real-time system until the next
communication point. Similarly, a result computed by the time-triggered real-time system will not be passed
back to the environment until the next communication point. Figure 5.2 illustrates this concept. The
consequence of communicating with the environment only at specific points in time is that the system
works in cycles (… read events, execute tasks, write results …). This in turn means that the cycle time, that
is, the time between two communication points, needs to be long enough for the worst-case execution time
of any anticipated combination of tasks. Overload situations cannot be handled and should not be possible,
since the system is designed for an assumed worst case. The normal way of implementing a time-triggered
system is by polling, and it is common to use static scheduling in time-triggered systems.

Figure 5.2 Observation and reaction to an event in a time-triggered system.
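A minimal sketch of this read-execute-write cycle is shown below. The cycle time, task set and I/O stubs are assumptions for illustration; a real time-triggered system would be driven by the target's clock and static schedule rather than by time.sleep.

```python
import time

CYCLE_TIME = 0.010   # 10 ms between communication points; must exceed the worst-case
                     # execution time of any anticipated combination of tasks (assumed value)

def read_events():
    return []            # poll all input channels (stubbed)

def write_results(results):
    pass                 # pass results back to the environment (stubbed)

def run_cycles(tasks, cycles=5):
    for _ in range(cycles):
        start = time.monotonic()
        events = read_events()                       # communication point: read inputs
        results = [task(events) for task in tasks]   # statically ordered task execution
        write_results(results)                       # communication point: write outputs
        remaining = CYCLE_TIME - (time.monotonic() - start)
        if remaining > 0:
            time.sleep(remaining)                    # idle until the next communication point

run_cycles(tasks=[lambda events: len(events)])
```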

In an event-triggered system there are no special points in time when communication has to occur between
the real-time system and its environment. Instead, events are observed and reacted upon as they happen.
Produced results are similarly communicated to the environment as soon as they are ready. This can be
seen in Figure 5.3. Event-triggered systems must be scheduled dynamically, and a normal way of achieving
this is by using interrupts.
In contrast to time-triggered systems, an event-triggered system may face overload situations. The effect is
that the event-triggered system must be designed to handle such situations dynamically, in a best-effort
manner. This means that deadlines can occasionally be missed and it is the responsibility of the designer to
minimize the damage in such situations. In many event-triggered systems it is crucial to guarantee a
minimum level of service. Taking a car as an example, the braking service is critical to uphold at all times,
whereas the climate control service can be allowed to fail during an overload situation. Therefore, it is
essential to make a correct load hypothesis and design the system with enough resources to maintain this
minimum level of service under the load hypothesis.

Figure 5.3 Observation and reaction to an event in an event-triggered system.

The choice of design paradigm has consequences for a number of properties of the real-time system.
Unfortunately, the desired values of the different properties are in conflict with each other for both
time-triggered and event-triggered systems. Figure 5.4 gives a simplified overview of how some of the different
properties are in conflict with each other for the two design paradigms. This will be further explained in the
following sections.

Figure 5.4 Trade-off between testability/predictability and flexibility/efficiency for the two design
paradigms

Testing
• Time-triggered design. The order of events occurring within the same observation interval is
insignificant. All of the events are communicated at the same time, that is, at the next
communication point. Due to the static schedule, the tasks corresponding to the events are executed in
the same order regardless of the actual order of the events. This means that there is a finite, albeit
large, number of possible behaviors for a time-triggered system, which is why time-triggered
systems can be tested with systematic coverage exploration. Moreover, controllability is
increased since the tester only needs to consider in which observation interval an event is input
to the system.
• Event-triggered design. Different order of events may lead to different behavior since the dynamic
scheduling algorithm continuously changes the order of task execution based on the current
situation, that is, the state of the system and the incoming event. Further, for two event sequences
with the same order, the exact time of occurrence of an individual event may affect the result,
again due to the dynamic behavior of the scheduling. This makes testing more difficult. In
addition, many of the dynamic scheduling algorithms are heuristic, which means that the result of
executing the same input with the same timing starting from the same state might yield different
results. This leads to the observation that testing with systematic coverage exploration is not
feasible in event-triggered systems. Instead we must use statistical testing with tailored loads.
Predictability
A system is predictable if the effect of a task can be unambiguously derived from knowledge of that
task and its execution environment. Randomness, heuristics, and race conditions all have a negative
impact on predictability. Usually predictability focuses on the observable end result. However, in
this article we are concerned not only with the observable end result (e.g., that a deadline was met)
but also, to some extent, with how the result was obtained (e.g., which interleaving of tasks was
actually executed). Thus, we will use the term predictability in this wider meaning.
• Time-triggered design. The static scheduling and the insignificance of the order of events in the
same observation period of these systems increase the predictability of the system since the
interleaving of tasks is completely determined in advance. A high predictability makes it possible
to use systematic coverage criteria in the testing. Predictability is also of extreme importance in
hard critical real-time systems since such systems require high confidence that the system will
work in all situations.
• Event-triggered design. The dynamic scheduling and in particular the heuristics involved in the
scheduling decision decreases the predictability of the behavior for a given input sequence. This
will of course decrease the confidence in the test results and also make regression testing harder.
This is one of the main reasons why statistical test methods should be used for event-triggered
systems. In statistical methods the same input sequence may be executed many times to increase
the confidence that the system works correctly under all circumstances for that input sequence.
Mimicking the operational conditions of the system is also more important for this reason for
event-triggered systems. It is important to note that for most event-triggered systems, testing alone
will not allow 100% confidence to be gained in the reliability of the system. This is one of the
reasons why event-triggered systems are seldom used in practice for hard real-time systems.
Flexibility
• Time-triggered design. A time-triggered system is inflexible when it comes to changing the
system or altering the load or fault hypotheses. Alterations exceeding possible spare capacity for
future extensions in the system or change of the prerequisites require at least that the static
schedule is recomputed and the system reintegrated and re-tested. Sometimes the system even
needs to be redesigned, in particular if the change increases the resource demand.
• Event-triggered design. An event-triggered system is flexible by its nature. The dynamic
scheduling and its ability to handle overloads make the event-triggered systems suitable in less
predictable environments. However, it is important to note that a change to the system, even strict
removals of parts of the system, effectively invalidates all previous test results.
Efficiency
• Time-triggered design. In a time-triggered system, the schedule is usually static. Sporadic tasks are
scheduled as if they were periodic. The tasks execute with their worst-case execution time. When
the worst case in terms of execution time or arrival rate for sporadic tasks differs much from the
average case, there is a waste of resources. The reason is that whenever a task completes in
less time than its worst-case execution time, the unused time cannot be used for anything else.
• Event-triggered design. This paradigm demands dynamic scheduling. Sporadic tasks are scheduled
on arrival. A more critical task than the currently executing task leads to a preemption of the
current task, whereas a less critical task has to wait. The execution time varies, which means that
if there is a difference between average-case and worst-case execution time, then the efficiency is
much higher in an event-triggered system than it would be in a time-triggered one. The reason is
that any unused time may be used for less critical work by dynamically scheduling an appropriate
task.
Pure time-triggered or event-triggered systems are rare. Many systems have characteristics from both
paradigms. A natural trade-off for systems with mixed criticality is to design the critical part of a system
according to the time-triggered paradigm for predictability reasons and design the rest of the system
according to the event-triggered paradigm for efficiency reasons. This can be done if it is possible to
separate the critical parts from the non-critical parts. During the trade-off discussions, it is important to
consider the testing issues as well as all other properties.
3.4 Traces
A common approach to facilitate observability during testing is to use traces. The test object is
instrumented with extra code, usually write statements that print the values of interesting variables. During test
execution these write statements produce a log, which is analyzed after the test execution to determine the
test results or to identify the cause of a failure. The instrumentation of the code introduces a probe effect
[Gai86]. The probe effect occurs because the behavior of the instrumented system under test is different
from that of the final version of the system.
The probe effect is more severe in real-time systems than in non-real-time systems. There are two reasons
for this. First, the traces need to contain more information to include the temporal information, which leads
to more instrumentation code being inserted. Second, and more importantly, each instruction that we
add to the code will necessarily affect the temporal behavior, since each instruction adds execution time.
This means that the test results obtained from an instrumented version of the test object might not be valid
for a non-instrumented version of the same test object. It is important to note that the changed behavior due
to the probe effect usually increases, but may sometimes even decrease, the response time for a specific task
due to the changed interleaving of the task execution. If it were always the case that response times were
increased, an instrumented test object meeting its deadlines would imply that the corresponding un-
instrumented object would also meet its deadlines. Since the timing is affected, we might get different
interleavings among the tasks due to the extra instructions. If there is a race condition, a deadline might be
met that would not have been met if it were not for the extra instructions. The conclusion is that a changed
timing behavior in a real-time system may change the results themselves, not only the timing of the results.
A common approach to dealing with the probe effect is to leave the instrumentation in the final product. This
requires, however, that there is a mechanism for hiding unwanted information from the end user and that
using this mechanism does not in itself alter the timing of the application.
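A minimal sketch of such left-in instrumentation is shown below, with a flag that hides the trace output from the end user. The flag name and logging set-up are assumptions; note that even a disabled check still costs a little execution time, so the probe effect is reduced rather than eliminated.

```python
import logging
import time

TRACING_ENABLED = True   # switched off (or the output hidden) in the delivered system

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("trace")

def trace(point, **values):
    if TRACING_ENABLED:
        # A timestamp is recorded so the temporal behaviour can be analysed afterwards.
        log.debug("%.6f %s %s", time.monotonic(), point, values)

def sample_task(x):
    trace("sample_task:entry", x=x)
    result = x * 2
    trace("sample_task:exit", result=result)
    return result

sample_task(21)
```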
3.5 Support for system state manipulation
Often, a test case requires the system under test to have a certain internal state as a starting point of the test
case. A big challenge for the tester is to achieve the required state prior to test case execution. The state of
the system shall, in this scope, be interpreted as any requirement imposed by the test case on the system
under test. Obviously, the nature of these requirements depends on the test case, but common examples of
such requirements include certain variables having specific values, task queues containing specific tasks,
and a certain amount of dynamic memory already being allocated. For real-time systems these prerequisites may
also include timing requirements, for instance that a certain task has already executed for exactly 10 ms.
Achieving the right system state prior to test case execution is strongly related to controllability, since
controllability includes all means of preparing and controlling the test case execution. However, for the
tester the process of achieving the right system state may also include a check that the right state really has
been achieved. In these cases, some aspects of observability are also involved.
Dick and Faivre [DF93] describe two methods of achieving a required internal state prior to test case
execution. One method is to require the system under test to provide special test-bed functions,
which can place the system under test directly in any desired state. The other method is to start in an idle
state and then execute other test cases and set-up scripts that will result in the system having the desired
state. Although from a testing point of view having specific test-bed functions is clearly preferable, there
are several drawbacks to this approach. Sometimes it will not be possible to include specific test
functionality, for instance if software components are imported from somewhere else. Even if possible, it
might not be practical due to the cost and complexity introduced. Finally, if dedicated test-bed functions are
built into the system, care must be taken so that these functions do not become a security hazard. Thus,
using test cases and set-up scripts will in many cases be the only option, even if this method restricts
controllability.
Although this discussion has not specifically mentioned real-time systems, the same reasoning applies, with
the addition that controlling a system in the temporal domain is even harder than controlling a system in the
value domain. In practice, testing real-time systems when there are requirements on the time domain prior
to test execution will require a statistical approach or a trial-and-observe approach. The statistical approach
builds on the assumption that if we repeat the test cases enough times, we will have achieved the desired
conditions at least once, but we do not explicitly check that this is the case. The trial-and-observe
approach requires that there are means for observing the actual state before test case execution is
performed, thus making it possible to determine whether the desired conditions were met.
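The sketch below contrasts the two approaches for a hypothetical system whose required state is a populated job queue: a dedicated test-bed function that sets the state directly, and set-up steps that drive the system from an idle state through ordinary operations. The class and its operations are invented for illustration.

```python
class SystemUnderTest:
    # Hypothetical system whose required pre-test state is a populated job queue.
    def __init__(self):
        self.queue = []

    def submit(self, job):
        # Ordinary operation, usable from a set-up script.
        self.queue.append(job)

    def test_set_state(self, queue):
        # Dedicated test-bed function (test builds only): places the system directly in a state.
        self.queue = list(queue)

# Approach 1: a test-bed function places the system directly in the desired state.
sut = SystemUnderTest()
sut.test_set_state(["job-a", "job-b"])
assert sut.queue == ["job-a", "job-b"]

# Approach 2: start idle and reach the same state through ordinary operations.
sut = SystemUnderTest()
for job in ("job-a", "job-b"):
    sut.submit(job)
assert sut.queue == ["job-a", "job-b"]
```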
Testing Tool Issues
The area of testing tools for non-real-time systems has already received considerable attention elsewhere
[FG99], [Hay95], so this section will only focus on features needed in tools for testing real-time systems.
For most types of tools the real-time aspect of the developed system does not affect the applicability of the
tool. This is especially true for administrative tools, such as tools for test management, traceability, and
error reporting. However, as soon as [parts of] the test case execution is to be automated in a real-time
system, the timing aspects need to be considered.
It is quite difficult to find commercial testing tools for automatic test execution of real-time systems. There
are several reasons for this. One reason is that the support for timeliness is difficult to implement,
especially if the system under test is regarded as a black box, which is a necessity for a commercial
program. For instance, the time granularity of the tool needs to be as small as or smaller than the time
granularity of the system under test. Another reason for the difficulty of finding commercial tools for
automatic test execution of real-time systems is that many real-time systems are embedded and lack
standardized interfaces. Also, many real-time systems have specialized application domains with very
specific demands. Imagine, for instance, the different demands of mobile phones, brake-by-wire systems,
and pacemakers. Still another reason is that some systems are built using state-of-the-art technology, which
means that there hardly exist any tools that support the new technology. The overall implication of this
reasoning is that most tools for automatic test execution of real-time systems are built in-house.
Since building tools in-house is often the only option, based on the contents of this paper, we will give a
small overview of features that might be handy in a tool for automatic test execution of real-time systems.
Even if a commercially available tool is considered, this overview may serve as a checklist for evaluation
of the tool. The order of the items in the list is not significant.
• Probe-effect
In many cases the tool needs to intervene with the system under test. For embedded systems it is quite
common that a part of the test tool is located on the target system to facilitate detailed observation
despite limited means of communication. In other cases the tool might rely on specific timing
information that is obtained by instrumenting the code with special test instructions. In both cases the
test tool will inflict a probe-effect on the system under test. In such cases it is beneficial to have the
ability to determine the amount of probe-effect caused by the tool. To some extent this can be achieved
by having the tool measure itself. A complementary approach is to make theoretical calculations of
the probe-effect based on known benchmark figures and the actual results of the test execution.
• Event injection
Event injection or stimuli of the system under test is an important aspect of a testing tool for automatic
test execution. For a real-time system timing of the input is important. One useful feature of a tool is to
be able to release stimuli to the system under test at a specific [predefined] time. Another, related and
equally useful, feature is to be able to release stimuli to the system under test with specified delay
relative to an event, which is either another release by the tool or an event occurring in the system
under test that is perceived by the tool.
Ordinary load generation tools intended for non-real-time systems usually support some form of
customization of the distribution of the load. Such and related features are of course useful in the real-time case as well.
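As a rough illustration of the two stimulus-release features just described (a minimal sketch, not taken from any particular tool; the device interface send_stimulus() is a hypothetical placeholder), the timing could be handled with std::chrono as follows:

#include <chrono>
#include <iostream>
#include <thread>

// Hypothetical placeholder for the tool's interface to the system under test.
void send_stimulus(const char* name) {
    std::cout << "stimulus sent: " << name << '\n';
}

int main() {
    using clock = std::chrono::steady_clock;

    // Feature 1: release a stimulus at a specific, predefined time
    // (here: 500 ms after the test run starts).
    const auto start = clock::now();
    std::this_thread::sleep_until(start + std::chrono::milliseconds(500));
    send_stimulus("predefined-time stimulus");

    // Feature 2: release a stimulus with a specified delay relative to an
    // observed event (here the "event" is simply the previous release).
    const auto event_time = clock::now();
    std::this_thread::sleep_until(event_time + std::chrono::milliseconds(200));
    send_stimulus("event-relative stimulus");
    return 0;
}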
• Observation of timing
Observation of events generated by the system under test is also an important area when automating
the test execution. Part of deciding if a timeliness test case passes or fails is to determine if the deadline
was met. There are basically two methods to implement such a check. Either the tool supports timers
or the events perceived by the tool are time-stamped.
A tool supporting timers is usually limited to measuring the elapsed time of stimulus-response pairs, where the stimulus originates from the tool, since the timer has to be triggered somehow. An advantage is that
there are neither probe-effect nor overhead introduced by the tool since all of the intelligence is outside
of the system under test. Another advantage with timers is that the pass/fail decision can be made in
real-time if the timers are implemented as time-outs. A drawback is that signal transfer and processing times outside the system under test are included in the round-trip delay. A challenge in the timer
approach is when there is not a one-to-one correspondence between stimuli and response or when
stimuli-response pairs are interleaved.
The other solution, having events time-stamped, requires the tool to perform post-analysis to calculate the actual round-trip delays. This prohibits making the pass/fail decision in real-time, but gives the
advantage of handling events not only originating from the test tool itself. If the time stamping is made
by the system under test this will either result in a probe-effect or extra overhead (if the time stamping
is kept in the final system). In addition, the communication between the tool and the system under test
is increased since the time stamps are added to the events. If, on the other hand, the tool time-stamps the events, the probe-effect and/or overhead are removed and the communication to the tool is reduced,
but again we face the problem with inclusion of transfer and handling times outside the system under
test.
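To make the time-stamp approach concrete, here is a minimal sketch of the post-analysis step under an assumed record layout (not the format of any specific tool): each stimulus-response pair carries two time stamps, and the pass/fail decision compares the measured round-trip delay against the deadline.

#include <chrono>
#include <iostream>
#include <vector>

// Assumed record format: one entry per stimulus-response pair recorded by the tool.
struct StimulusResponse {
    std::chrono::microseconds stimulus_time;  // time stamp of the stimulus
    std::chrono::microseconds response_time;  // time stamp of the response
};

// Post-analysis: a pair passes if its round-trip delay meets the deadline.
bool deadline_met(const StimulusResponse& p, std::chrono::microseconds deadline) {
    return (p.response_time - p.stimulus_time) <= deadline;
}

int main() {
    using namespace std::chrono_literals;
    const std::vector<StimulusResponse> log = {
        {1000us, 3500us},   // 2.5 ms round trip
        {5000us, 11000us},  // 6.0 ms round trip
    };
    const auto deadline = 5ms;  // assumed 5 ms deadline on each pair

    for (std::size_t i = 0; i < log.size(); ++i) {
        std::cout << "pair " << i << ": "
                  << (deadline_met(log[i], deadline) ? "pass" : "FAIL (deadline missed)")
                  << '\n';
    }
    return 0;
}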
• Synchronization
As soon as there is more than one locus of control in a computer system there might be a clock
synchronization problem. In the case of a tool for automatic test execution, if the clocks of both the
tool and the system under test are used then a synchronization problem arises. Obviously the same
applies in a distributed real-time system with multiple clocks. Clock synchronization can be achieved
in two ways. Either a global clock is used as a master clock frequency, or the local clocks are used
together with a clock synchronization protocol. In either case a real-time network is required.
Another, more coarse-grained, synchronization problem arises whenever the test tool supports
distribution. Many tools for load generation allow multiple clients to generate the load, in order to
generate larger loads than a single machine can manage. In such cases the different clients need
synchronization with respect to at least start and stop of load generation. It is quite an implementation
challenge if high precision is needed in such synchronization, and a real-time network is soon needed,
increasing both the complexity and cost of the tool.
• Resynchronization in case of failure
More advanced tools for automatic execution of test cases, real-time or not, support resynchronization of the test execution after a failure has been detected. The idea is that after a failure, the test tool can take actions to restore the system under test to a known state, and resume test case execution at a point in the test suite after the test case that failed.
Due to the complexity of the actions needed to diagnose and reset the system under test, resynchronization after a failure is a big challenge even for non-real-time systems. It is an even bigger challenge when real-time aspects need to be taken into account. Recent research on the topic reports promising but not yet widely available results. For instance, Iorgulescu and Seviora [IS97] report results on real-time supervision and diagnosis of a telephony system. Their work is based on a specification written in the Specification and Description Language (SDL). The specification is used to generate possible but erroneous states, which are compared with the actual erroneous state. When the problem has been diagnosed, suitable corrective actions can be taken to resume the test case execution.
All this is done in real-time, which is a prerequisite in timeliness testing.

CHAPTER 5
TEST PHASES
Figure 5.1 Software testing model (the layers system engineering, requirements, design and code correspond, in reverse order, to system testing, validation testing, integration testing and unit testing)

5.2 UNIT TEST


The various constituents of the system are tested in increasing degrees of granularity, starting from a component through to the full system. A component is the smallest unit for testing. A module or API consists of components, while a subsystem comprises multiple modules. The complete system is built from various subsystems. Control points are used to segment the life cycle of the development process into the design, development and deployment phases.

5.2.1 Component
A component is an independent, isolated and reusable unit of a program that performs a
well-defined function. It usually has public interfaces that allow it to be used to perform its
functions. Individual components are tested against their functional and design goals
using the parameters outlined in section 4.0.

5.2.2 Module
A module comprises one or more components to achieve a business function. Also known as an API, the module encapsulates and aggregates the functionality of its constituent components and appears as a black box to its users. Usually, a module is homogeneous in nature with respect to the application domain. For example, a database module will interface with and/or encapsulate database-specific functions. Modules are tested
against their functional and design goals using the parameters outlined in section 4.0.

5.2.3 Subsystem
Subsystems are defined as heterogeneous collections of modules to achieve a business
function. For example, a credit card processing subsystem might interface to a credit card
clearing house, a database component and an audit mechanism to perform complete
credit card related operations. Subsystems are also tested against their functional and
design goals using the parameters outlined in section 4.0.

5.2.4 System
The full system uses multiple subsystems to implement the full functionality of the application. An example is an online shopping system that includes catalog, shopping cart and credit card processing subsystems. The complete system is tested against its functional and design goals using the parameters outlined in section 4.0.

Units are the smallest building blocks of software. In a language like C, individual functions make up the units. Unit testing is the process of validating such small building blocks of a complex system well before testing an integrated large module or the system as a whole.
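As a minimal illustration of the idea (the function under test, add_positive, is a hypothetical example, and plain asserts stand in for a unit-testing framework), a unit test in C++ might look like this:

#include <cassert>
#include <iostream>
#include <stdexcept>

// Hypothetical unit under test: adds two values, rejecting negative inputs.
int add_positive(int a, int b) {
    if (a < 0 || b < 0)
        throw std::invalid_argument("negative input");
    return a + b;
}

// Unit test: exercises the normal path and the exception path in isolation,
// without waiting for the rest of the system to exist.
void test_add_positive() {
    assert(add_positive(2, 3) == 5);      // normal case
    assert(add_positive(0, 0) == 0);      // boundary case

    bool threw = false;
    try {
        add_positive(-1, 4);              // internal error condition
    } catch (const std::invalid_argument&) {
        threw = true;
    }
    assert(threw);
}

int main() {
    test_add_positive();
    std::cout << "unit tests passed\n";
    return 0;
}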

5.2.5 Benefits of Unit Testing


• Be able to test parts of a project without waiting for the other parts to be available,
• Achieve parallelism in testing by being able to test and fix problems simultaneously by
many engineers,
• Be able to detect and remove defects at a much lower cost compared to later stages of testing,
• Be able to take advantage of a number of formal testing techniques available for unit
testing,
• Simplify debugging by limiting to a small unit the possible code areas in which to
search for bugs,
• Be able to test internal conditions that are not easily reached by external inputs in the
larger integrated systems (for example, exception conditions not easily reached in normal
operation)
• Be able to achieve a high level of structural coverage of the code,
• Avoid lengthy compile-build-debug cycles when debugging difficult problems.

Studies have shown that unit testing is more cost-effective than the later stages of testing. They lead to the conclusion that better testing at the early phases, i.e. unit testing, is a smarter way to detect and fix defects. Unit testing can detect and remove a significant portion of the defects. A study by Thayer and Lipow shows that comprehensive path and parameter testing can remove 72.9% of the defects.

5.2.6 Unit testing - some typical problems and their solutions

(a) Testing is monotonous, boring and repetitive:


Automation of as many of the routine activities as possible will go a long way in
reducing the monotony of testing. Defining concrete completeness criteria for the testing
activity also brings some predictability to testing in the sense that now there is a concrete
goal to be achieved.

(b) Poor Documentation of Test cases:


It is very useful to ensure that test case documentation gets automated as part of the
testing process. This way, while testing gets done, the documentation continues to be generated.
(c) Coding Drivers and Stubs:
Automation of code generation for drivers and stubs can result in a useful saving of effort for the tester. It will also ensure that there are no defects in the stubs or drivers that result in an avoidable loss of time.
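The next sketch shows what a hand-written driver and stub look like for a hypothetical unit process_order() that depends on a not-yet-available pricing module; whether such code is generated by a tool or written manually, the structure is the same.

#include <cassert>
#include <iostream>

// Stub: stands in for the real pricing module, which is not yet available.
// It returns a fixed, predictable value so the unit under test can be exercised.
double lookup_price_stub(int /*item_id*/) {
    return 10.0;
}

// Unit under test: computes an order total using the (stubbed) pricing module.
double process_order(int item_id, int quantity, double (*lookup_price)(int)) {
    return lookup_price(item_id) * quantity;
}

// Driver: supplies inputs to the unit under test and checks the results.
int main() {
    assert(process_order(42, 3, lookup_price_stub) == 30.0);
    assert(process_order(7, 0, lookup_price_stub) == 0.0);
    std::cout << "driver: all stub-based unit tests passed\n";
    return 0;
}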

(d) Informal testing process:


Combining Functional (Black box testing based on the Specifications), Structural (White
box testing based on the structure of the code) and Heuristic (based on human intuition)
testing techniques provides much better results than simply using an intuitive approach to testing. Testing must be mostly systematic and partly intuitive, instead of the general practice of a mostly intuitive and partly systematic approach.

(e) Poor Regression Testing:


It is very useful to build the capability of retaining automated test cases as a resource along with the code. Automation is the only practical solution to regression testing.
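One simple form such a retained, automated suite can take (a sketch only; the test bodies and defect numbers are hypothetical placeholders) is a table of test functions re-run on every build:

#include <iostream>
#include <string>
#include <vector>

// A retained regression suite: each fixed defect contributes a test case that
// is kept with the code and re-run automatically on every build.
struct TestCase {
    std::string name;
    bool (*run)();
};

bool test_defect_101() { return (2 + 2) == 4; }                    // placeholder check
bool test_defect_117() { return std::string("ab").size() == 2; }   // placeholder check

int main() {
    const std::vector<TestCase> suite = {
        {"defect-101 stays fixed", test_defect_101},
        {"defect-117 stays fixed", test_defect_117},
    };
    int failures = 0;
    for (const TestCase& t : suite) {
        const bool ok = t.run();
        std::cout << (ok ? "PASS " : "FAIL ") << t.name << '\n';
        if (!ok) ++failures;
    }
    return failures;   // non-zero exit signals a regression
}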

(f) Lack of Complete Testing tools:


Computer Aided Software Testing (CAST) tools are a fast-growing area. Good unit test automation tools are beginning to become available. The evolution of such tools towards more comprehensive automation of unit testing activities is likely to help in a big way in solving many of the problems currently faced in unit testing.

5.3 LINK TEST


5.3.1 Scope of Testing
Link Testing is the process of testing the linkages between program modules as against
the system specifications. The goal here is to find errors associated with interfacing. As a
by-product of the testing process, the software modules would be integrated together. It is
worth noting that this level of testing is sometimes referred to as “Integration Testing”,
which is understood to mean that the testing process would end up with the software
modules in integration. However, after some careful consideration, the term was
abandoned, as it would cause some confusion over the term “System Integration”, which
means integration of the automated and manual operations of the whole system.

5.3.2 Activities, Documentation and Parties Involved


(a) Test Group to prepare a Link Testing test plan.
(b) Test Group to prepare a Link Testing test specification before testing commences.
(c) Test Group, with the aid of the designers/programmers, to set up the testing
environment.
(d) Test Group to perform Link Testing; and upon faults found, issue Test Incident Reports to the designers/programmers, who would fix the errors concerned.
(e) Test Group to report progress of Link Testing through periodic submission of the Link
Testing Progress Report.

5.3.3 Practical Guidelines


(a) Both control and data interfaces between the programs must be tested.
(b) Both Top-down and Bottom-up approaches can be applied. Top-down integration is
an incremental approach to the assembly of software structure. Modules are integrated by
moving downward through the control hierarchy, beginning with the main control module
(‘main program’). Modules subordinate (and ultimately subordinate) to the main control
module are incorporated into the structure in either a depth-first or breadth-first manner.
Bottom-up integration, as its name implies, begins assembly and testing with modules at
the lowest levels in the software structure. Because modules are integrated from the
bottom up, processing required for modules subordinate to a given level is always
available, and the need for stubs (i.e. dummy modules) is eliminated.
(c) As a result of the testing, integrated software should be produced.

5.4 INTEGRATION TEST


The individual components are combined with other components to make sure that
necessary communications, links and data sharing occur properly. It is not truly system
testing because the components are not implemented in the operating environment. The
integration phase requires more planning and some reasonable sub-set of production-type
data. Larger systems often require several integration steps.

There are three basic integration test methods:

o all-at-once
o top-down
o bottom-up

5.4.1 All-at-once

The all-at-once method provides a useful solution for simple integration problems,
involving a small program possibly using a few previously tested modules.

5.4.2 Top-Down integration

Top-down integration is an incremental approach to the production of program structure.


Modules are integrated by moving downwards through the control hierarchy, starting
with the main control module. Modules subordinate to the main control module are
included into the structure in either a depth-first or breadth-first manner. Relating to the figure below, depth-first integration would integrate the modules on a major control path of the structure. Selection of a major path is arbitrary and relies on application-specific features. For instance, selecting the left-hand path, modules M1, M2 and M5 would be integrated first. Next M8 or M6 would be integrated. Then the central and right-hand control paths are produced. Breadth-first integration includes all modules directly
subordinate at each level, moving across the structure horizontally. From the figure
modules M2, M3 and M4 would be integrated first. The next control level, M5, M6 etc.,
follows.
Figure 5.2 Top-down integration (module hierarchy with M1 at the top; M2, M3 and M4 at the next level; M5, M6 and M7 below; and M8 at the lowest level)


The integration process is performed in a series of five stages:

1. The main control module is used as a test driver and stubs are substituted for all
modules directly subordinate to the main control module.
2. Depending on the integration technique chosen, subordinate stubs are replaced
one at a time with actual modules.
3. Tests are conducted as each module is integrated.
4. On the completion of each group of tests, another stub is replaced with the real
module.
5. Regression testing may be performed to ensure that new errors have not been introduced.

5.4.3 Bottom-up Integration

Bottom-up integration testing begins testing with the modules at the lowest level (atomic modules). As modules are integrated bottom up, processing required for modules subordinate to a given level is always available and the need for stubs is eliminated.
Figure 5.3 Bottom-up integration testing (module Ma at the top; drivers D1 and D2; Cluster 1 and Cluster 2)

A bottom-up integration strategy may be implemented with the following steps:


1. Low-level modules are combined into clusters that perform a particular software
subfunction.
2. A driver is written to coordinate test case input and output (see the sketch after this list).
3. The cluster is tested.
4. Drivers are removed and clusters are combined moving upward in the program
structure.
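As a small sketch of this strategy (the modules and cluster are hypothetical examples, not part of any real system), two low-level modules are combined into a cluster and a driver coordinates the test-case input and output:

#include <cassert>
#include <iostream>

// Two hypothetical low-level modules combined into a "validation" cluster.
bool is_digit_string(const char* s) {
    for (; *s; ++s)
        if (*s < '0' || *s > '9') return false;
    return true;
}
int to_number(const char* s) {
    int value = 0;
    for (; *s; ++s) value = value * 10 + (*s - '0');
    return value;
}

// Cluster-level function built from the two modules.
int parse_quantity(const char* s) {
    return is_digit_string(s) ? to_number(s) : -1;
}

// Driver: coordinates test-case input and output for the cluster.
int main() {
    assert(parse_quantity("42") == 42);
    assert(parse_quantity("007") == 7);
    assert(parse_quantity("4x2") == -1);
    std::cout << "cluster driver: all tests passed\n";
    return 0;
}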

5.4.4 Comments on Integration Testing


There has been much discussion on the advantages and disadvantages of bottom-up and
top-down integration testing. Typically a disadvantage of one is an advantage of the other
approach. The major disadvantage of top-down approaches is the need for stubs and the
difficulties that are linked with them. Problems linked with stubs may be offset by the
advantage of testing major control functions early. The major drawback of bottom-up
integration is that the program does not exist until the last module is included.

5.5 FUNCTION TESTING

5.5.1 Scope of Testing


Function Testing is the process of testing the integrated software on a function-by-
function basis as against the function specifications. The goal here is to find discrepancies between the programs and the functional requirements.

5.5.2 Activities, Documentation and Parties Involved


(a) Test Group to prepare Function Testing test plan, to be endorsed by the Project
Committee via the Test Control Sub-Committee, before testing commences.
(b) Test Group to prepare a Function Testing test specification before testing commences.
(c) Test Group, with the aid of the designers/programmers, to set up the testing
environment.
(d) Test Group (with participation by user representatives) to perform Function Testing; and upon faults found, issue test incident reports to the designers/programmers, who fix the errors concerned.
(e) Test Group to report progress of Function Testing through periodic submission of the
Function Testing Progress Report.

5.5.3 Practical Guidelines


(a) It is useful to involve some user representatives in this level of testing, in order to give
them familiarity with the system prior to Acceptance test and to highlight differences
between users’ and developers’ interpretation of the specifications. However, the degree of user involvement may differ from project to project, and even from department to
department, all depending on the actual situation.
(b) User involvement, if applicable, could range from testing data preparation to staging
out of the Function Testing.
(c) It is useful to keep track of which functions have exhibited the greatest number of
errors; this information is valuable because it tells us that these functions probably still
contain some hidden, undetected errors.

5.6 SYSTEM TESTING

5.6.1 Scope of Testing


System testing is the process of testing the integrated software with regard to the operating environment of the system (i.e. recovery, security, performance, storage, etc.). It may be worthwhile to note that the term has been used with different meanings. In its widest definition, especially for small-scale projects, it also covers the scope of Link Testing and Function Testing. For small-scale projects, which combine Link Testing, Function Testing and System Testing in one test plan and one test specification, it is crucial that the test specification should include distinct sets of test cases for each of these 3 levels of testing.

5.6.2 Activities, Documentation and Parties Involved


(a) Test group to prepare a System Testing test plan, to be endorsed by the Project
Committee via the Test Control Sub-Committee, before testing commences.
(b) Test group to prepare a System Testing test specification before testing commences.
(c) Test group, with the aid of the designers/programmers, to set up the testing
environment.
(d) Test group (with participation by the computer operators and user representatives) to perform System Testing; and upon faults found, issue test incident reports to the designers/programmers, who would fix the errors concerned.
(e) Test group to report progress of the System Testing through periodic submission of the
System Testing Progress Report.

5.6.3 Practical Guidelines


(a) Eight types of System Tests are discussed below. It is not claimed that all 8 types will be mandatory for every application system, nor are they meant to be an exhaustive list. To avoid overlooking any of them, all 8 types should be explored when designing test cases.

(i) Volume Testing- Volume testing is to subject the system to heavy volumes of data, the attempt being to show that the system cannot handle the volume of data specified in its objectives. Since volume testing is obviously expensive, in terms of both machine and people time, one must not go overboard. However, every system must be exposed to at least a few volume tests.

(ii) Stress Testing- Stress testing involves subjecting the program to heavy loads or
stress. A heavy stress is a peak volume of data encountered over a short span of time.
Although some stress tests may represent ‘never will occur’ situations in operational use, this does not imply that these tests are not useful. If errors are
detected by these ‘impossible’ conditions, the test is valuable, because it is likely that the
same errors might also occur in realistic, less stressful situations.

(iii) Performance Testing- Many programs have specific performance or efficiency


objectives, such as response times and throughput rates under certain workload and
configuration conditions. Performance testing should attempt to show that the system
does not satisfy its performance objectives.

(iv) Recovery Testing- If processing must continue during periods in which the
application system is not operational, then those recovery processing
procedures/contingent actions should be tested during the System test. In addition, the
users of the system should be involved in a complete recovery test so that not only the
application system is tested but the procedures for performing the manual aspects of
recovery are tested.

(v) Security Testing- The adequacy of the security procedures should be tested by
attempting to violate those procedures. For example, testing should attempt to access or
modify data by an individual not authorized to access or modify that data.
To address security issues with the system under test, the following aspects are tested:
(a) Authentication
The authentication mechanism is tested based on class of user and password validation, if
any. If 3rd party products such as LDAP servers are used in this mechanism, those are
tested as well. In the situation where digital certificates are used for authentication, the
certificates as well as system behavior upon activation of the certificates are tested.
(b) Data Security
Flow of information (channel security) as well as firewall access issues are tested for all
aspects of the system under test. If data encryption is being used, it is checked for compatibility with standards.
Any computer-based system that manages sensitive information or produces operations
that can improperly harm individuals is a target for improper or illegal penetration.
Security testing tries to verify that protection approaches built into a system will protect it
from improper penetration. During security testing, the tester plays the role of the
individual who wants to penetrate the system. The tester may try to get passwords through external clerical approaches, may attack the system with customized software, or may purposely produce errors and hope to find the key to system entry. The role of the designer is to make the cost of penetrating the system greater than the value of what can be gained.

(vi) Procedure Testing- Computer systems contain not only computer processes but may also involve procedures performed by people. Any prescribed human procedures,
such as procedures to be followed by the system operator, database administrator, or
terminal user, should be tested during the System test.
(vii) Regression Testing- This discipline enables us to track issues, check the
effectiveness of a solution, and detect any new issues, which may have been created as a
result of fixing the original problem. Reports are generated, problems are tracked, and the
process continues until all of the issues are solved or a new version is developed.
Purpose: The purpose of regression testing is to ensure that previously detected and
fixed issues really are fixed, they do not reappear, and new issues are not introduced into
the program as a result of the changes made to fix the issues.
Methodology: Typically, Breakers performs regression testing on a daily basis. Once
an issue in the defect-tracking database has been fixed it is reassigned back to Breakers
for final resolution. Breakers can either reopen the issue, if it has not been satisfactorily
addressed, or close the issue if it has, indeed, been fixed. For more involved projects
lasting several months, several full regression passes may be scheduled in addition to the
continuous regression testing mentioned above. Full regression passes involve re-
verifying all closed issues in the defect-tracking database as truly closed. A full regression
pass is also typically performed at the very end of the testing effort as part of a final
acceptance test. In addition to verifying closed issues, regression testing seeks to verify
that changes made to fix known defects do not cause further defects. Breakers can
produce a regression-testing suite consisting of test cases that evaluate the stability of all
modules of the software product. Quite often, automation of this regression-testing suite
is well worth considering.

(viii) Operational Testing- During the System test, testing should be conducted by the
normal operations staff. It is only through having normal operation personnel conduct the
test that the completeness of operator instructions and the ease with which the system can
be operated can be properly evaluated. This testing is optional, and should be conducted
only when the environment is available.
(b) It is understood that in real situations, possibly due to environmental reasons, some of the tests (e.g. the procedure test) may not be carried out at this stage and have to be delayed to later stages. There is no objection to such a delay provided that the reasons are documented clearly in the Test Summary Report and the tests are carried out once the constraints are removed.
Ultimately, software is included with other system components and a set of system
validation and integration tests are performed. Steps performed during software design
and testing can greatly improve the probability of successful software integration in the
larger system. System testing is a series of different tests whose main aim is to fully
exercise the computer-based system. Although each test has a different role, all should verify that all system elements have been properly integrated and perform their allocated functions. Below we consider various system tests for computer-based systems.

5.7 ACCEPTANCE TESTING


5.7.1 Scope of Testing
Acceptance Testing is the process of comparing the application system to its initial
requirements and the current needs of its end users. The goal here, consistent with the testing objectives stated earlier, is to try to show that the software end product is not acceptable to its users.

5.7.2 Activities, Documentation and Parties Involved


(a) User representatives to prepare an Acceptance Testing test plan, which is to be
endorsed by the Project Committee via the Test Control Sub-Committee.
(b) User representatives to prepare an Acceptance Testing test specification, which is to
be endorsed by the Project Committee via the Test Control Sub-Committee.
(c) User representatives to perform Acceptance Testing; and upon faults found, issue test incident reports to the designers/programmers, who will fix the errors concerned.
(d) User representatives to report progress of the Acceptance Testing through periodic
submission of Acceptance Testing Progress Report.

5.7.3 Practical Guidelines


(a) There are three approaches for Acceptance Testing, namely,
(i) A planned comprehensive test, using artificial data and simulated operational procedures; this usually accompanies the Big Bang implementation approach, but can also be used as a prerequisite step for the other approaches.
(ii) Parallel run, using live data; this would normally be used when a comparison between the existing system and the new system is required. This approach requires duplicated resources to operate both systems.
(iii) Pilot run, using live data; this would normally be used when the user is not certain about the acceptance of the system by its end-users and/or the public. Users are responsible for selecting the approach that is most applicable to their operating environment.

(b) Precautions for users


(i) Testing staff should be freed from their routine activities
(ii) Commitment is authorized

Verification and Validation

Software testing is one part of a broader domain that is known as verification and validation (V&V). Verification refers to the set of activities that ensure the software correctly implements a particular function. Validation refers to a different set of activities that ensure the software that has been produced is traceable to customer requirements.
Dynamic Analysis

Dynamic Analysis uses test data sets to execute software in order to observe its behavior
and produce test coverage reports. This assessment of source code ensures consistent
levels of high quality testing and correct use of capture/playback tools.

Dynamic Analysis provides the facilities to achieve quality standards for critical code,
improve code efficiency, minimize regression test costs, and detect software defects.

When used during software development and maintenance, Dynamic Analysis techniques
can make a significant contribution to a program’s robustness and reliability.

Benefits of Dynamic Analysis

•High quality testing is performed

•Reduces cost and effort of regression testing

•Identifies software anomalies and defects

•Yields a comprehensive test data set which has measurable quality and known test
outcomes

•Reduces maintenance costs to a minimum

•Identifies unnecessary parts of the system/program, which can be removed

•Ensures systems are reliable and as error free as possible

Dynamic Analysis explores the semantics of the application under test via test data
selection. Control and data flow models constructed from the Static Analysis of the
software application are compared with the actual control flow and data flow that are
yielded at run time. This enables checks to be made which show errors in either the Static
or Dynamic Analysis.

Dynamic Analysis is particularly effective for the analysis of software applications which
are required to achieve high levels of reliability. It is the primary requirement for the
testing of safety-critical avionics software and is widely used in all military, safety and
mission critical software.

In addition to the safety-critical industry sectors mentioned above, Dynamic Analysis is


also being used in the banking and telecommunications sectors. A key driver is the process and efficiency improvements such tools can bring. They are able to demonstrate real cost savings and return on investment for clients, which leads to large competitive advantages.

Coverage Analysis
Measurement of structural coverage of code is a means of assessing the thoroughness of
testing. There are a number of metrics available for measuring structural coverage, with
increasing support from software tools. Such metrics do not constitute testing techniques,
but a measure of the effectiveness of testing techniques.
A coverage metric is expressed in terms of a ratio of the metric items executed or
evaluated at least once to the total number of metric items. This is usually expressed as a
percentage.
Coverage = (number of items executed at least once) / (total number of items)
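For instance (a trivial sketch, not tied to any particular tool), a tool that has recorded which metric items were executed could report the percentage like this:

#include <iostream>
#include <vector>

// Coverage = items executed at least once / total number of items,
// reported as a percentage.
double coverage_percent(const std::vector<bool>& executed) {
    if (executed.empty()) return 100.0;
    std::size_t hit = 0;
    for (bool e : executed)
        if (e) ++hit;
    return 100.0 * static_cast<double>(hit) / executed.size();
}

int main() {
    // Example: 8 metric items (statements, decisions, ...), 6 executed.
    const std::vector<bool> executed = {true, true, false, true, true, true, false, true};
    std::cout << "coverage: " << coverage_percent(executed) << "%\n";  // prints 75
    return 0;
}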
There is significant overlap between the benefits of many of the structural coverage
metrics.
It would be impractical to test against all metrics, so which metrics should be used as part
of an effective testing strategy?
A subjective score has been given for each metric against the evaluation criteria (5=high,
1=low). Simple examples are given to illustrate specific points. Data collected from an
investigation of real code (summarized in annex A) is used to support the analysis.
Section 10 summarizes conclusions and makes recommendations to enable developers to
apply structural coverage metrics in a practical way in real software developments. There
are many equivalent names for each structural coverage metric. The names used in this
paper are those considered to be most descriptive. Equivalent alternative names are listed in annex B. References are given in annex C.

2. Evaluation Criteria
The first evaluation criterion is automation. To be of use on a real software development,
which may involve tens of thousands or hundreds of thousands of lines of code, a metric
must be suitable for automated collection and analysis.
A metric should also be achievable. It should be possible and practical to achieve 100%
coverage (or very close to 100% coverage) of a metric. Any value less than 100%
requires investigation to determine why less than 100% has been achieved.

(a) If it is the result of a problem in the code, the problem should be fixed and tests run
again.
(b) If it is the result of a problem in the test data, the problem should be fixed and tests
run again.
(c) If it is because 100% coverage is infeasible, then the reasons for infeasibility must
be ascertained and justified.

Infeasibility occurs because the semantics of the code constrain the coverage which can
be achieved, for example: defensive programming, error handling, constraints of the test
environment, or characteristics of the coverage metric. Infeasibility should be the only
reason for metric values of less than 100% to be accepted.
When 100% coverage is infeasible, the effort required for investigation and to take
appropriate action is important. This will depend on the frequency at which coverage of
less than 100% occurs and on how comprehensible the metric is. To be comprehensible
the relationship between a metric, design documentation and code should be simple.
Software has to be retested many times throughout its life. Test data required to achieve
100% coverage therefore has to be maintainable. Changes required of test data should
not be disproportionate in scale to changes made to the code.
An ideal criterion against which a coverage metric should be assessed is its effectiveness
at detecting faults in software. To measure the effectiveness of each coverage metric
would require extensive data collection from software tested using the entire range of
coverage metrics. The size of such a data collection would require orders of magnitude
more effort than the investigation described in annex A.
As the investigation was based on static analysis and code reading, the actual
effectiveness of each metric could not be quantified. For the purposes of this paper,
effectiveness is assumed to be a function of thoroughness. The thoroughness with
which test data designed to fulfill a metric actually exercises the code is assessed. A
higher thoroughness score is attributed to metrics which demand more rigorous test data
to achieve 100% coverage.

3. Statement Coverage

Statement Coverage = s/S


where:
s = Number of statements executed at least once.
S = Total number of executable statements.

Statement coverage is the simplest structural coverage metric. From a measurement point
of view one just keeps track of which statements are executed, then compares this to a list
of all executable statements. Statement coverage is therefore suitable for automation.
Statement coverage is easily comprehensible, with the units of measurement
(statements) appearing directly in the code. This makes analysis of incomplete statement
coverage a simple task.
It is practical to achieve 100% statement coverage for nearly all code. An investigation of
real code (as described in annex A) showed no infeasible statements. 100% statement
coverage was achievable for all modules analyzed. However, statement coverage is not a
very good measure of test thoroughness. Consider the following fragment of code:
Example 3a
1. if CONDITION then
2. DO_SOMETHING;
3. end if;
4. ANOTHER_STATEMENT;

Full statement coverage of example 3a could be achieved with just a single test for which
CONDITION evaluated to true. The test would not differentiate between the code given
in example 3a and the code given in example 3b.

Example 3b
1. null;
2. DO_SOMETHING;
3. null;
4. ANOTHER_STATEMENT;

Another criticism of statement coverage is that test data which achieves 100% statement coverage of source code will often achieve less than 100% coverage of object code instructions. Beizer [1] quantifies this at about 75%.
Test data for statement coverage is maintainable by virtue of its simplicity and
comprehensible relationship to the code.
Automation 5
Achievable 5
Comprehensible 5
Maintainable 5
Thoroughness 1
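To show how statement coverage can be collected automatically, the following is a simple hand-instrumented sketch (a real coverage tool would insert the markers itself): each executable statement of example 3a is preceded by a marker, and the ratio of markers hit to markers total gives the metric.

#include <iostream>

const int TOTAL = 3;              // executable statements in the fragment
bool hit[TOTAL] = {false, false, false};
void mark(int id) { hit[id] = true; }   // a coverage tool would insert these calls

// Hand-instrumented version of example 3a.
void fragment(bool condition) {
    mark(0);                      // the 'if' statement
    if (condition) {
        mark(1);                  // DO_SOMETHING
        std::cout << "DO_SOMETHING\n";
    }
    mark(2);                      // ANOTHER_STATEMENT
    std::cout << "ANOTHER_STATEMENT\n";
}

int main() {
    fragment(true);               // a single test with CONDITION = true
    int executed = 0;
    for (int i = 0; i < TOTAL; ++i)
        if (hit[i]) ++executed;
    std::cout << "statement coverage: "
              << 100.0 * executed / TOTAL << "%\n";   // prints 100%
    return 0;
}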

4. Decision coverage

Decision coverage = d/D


Where:
d = Number of decision outcomes evaluated at least once.
D = Total number of decision outcomes.
To achieve 100% decision coverage, each condition controlling branching of the code has
to evaluate to both true and false. In example 4a, decision coverage requires two test
cases.

Example 4a
1. if CONDITION then
2. DO_SOMETHING;
3. else
4. DO_SOMETHING_ELSE;
5. end if;

Test CONDITION
1 True
2 False

Not all decision conditions are as simple: decision conditions also occur in case or switch statements and in loops. However, this does not present an obstacle to automation.
The units of measurement (decision conditions) appear directly in the code, making
decision coverage comprehensible and investigation of incomplete decision coverage
straightforward. An investigation of real code (as described in annex A) showed no
infeasible decision outcomes. 100% decision coverage was achievable for all modules
analyzed.
Test data designed to achieve decision coverage is maintainable. Equivalent code to
example 4a, shown in example 4b, would not require changes to test data for decision
coverage.

Example 4b
1. if not CONDITION then
2. DO_SOMETHING_ELSE;
3. else
4. DO_SOMETHING;
5. end if;
For structured software, 100% decision coverage will necessarily include 100% statement
coverage. The weakness of decision coverage becomes apparent when non-trivial
conditions are used to control branching. In example 4c, 100% decision coverage could
be achieved with two test cases, but without fully testing the condition.

Example 4c
1. if A and B then
2. DO_SOMETHING;
3. else
4. DO_SOMETHING_ELSE;
5. end if;

Test A B
1 True True
2 False True

Untested
True False
False False

For a compound condition, if two or more combinations of components of the condition


could cause a particular branch to be executed, decision coverage will be complete when
just one of the combinations has been tested. Yet compound conditions are a frequent
source of code bugs.
The thoroughness of test data designed to achieve decision coverage is therefore an
improvement over statement coverage, but can leave compound conditions untested.
Automation 5
Achievable 5
Comprehensible 5
Maintainable 5
Thoroughness 2

5. LCSAJ Coverage

An LCSAJ is defined as an unbroken linear sequence of statements:


(a) which begins at either the start of the program or a point to which the control flow
may jump,
(b) which ends at either the end of the program or a point from which the control flow
may jump,
(c) and the point to which a jump is made following the sequence.

Hennell [3] gives a full explanation and some examples to help illustrate the definition of
an LCSAJ.

LCSAJ coverage = l/L

where:
l = Number of LCSAJs exercised at least once.
L = Total number of LCSAJs.

LCSAJs depend on the topology of a module's design and not just its semantics; they do not map onto code structures such as branches and loops. LCSAJs are not easily
identifiable from design documentation. They can only be identified once code has
already been written. LCSAJs are consequently not easily comprehensible.
Automation of LCSAJ coverage is a bit more difficult than automation of decision
coverage. However, it is relatively easily achieved.
Small changes to a module can have a significant impact on the LCSAJs and the required
test data, leading to a disproportionate effort being spent in maintaining LCSAJ coverage
and maintaining test documentation. Unfortunately this dependence cannot be illustrated
with trivial examples. In example 5a, LCSAJs are marked as vertical bars.

Example 5a
| | | | 1. if A then
| | | 2. STATEMENT;
| | | | | | 3. end if;
| | | | | | 4. if B then
| | | | 5. if C then
| | 6. STATEMENT;
| | 7. else
| 8. STATEMENT;
| | 9. end if;
| | 10. else
| | 11. if D then
| 12. STATEMENT;
| 13. else
| | 14. STATEMENT;
| | | | 15. end if;
| | | | | | 16. end if;
| | | | | | 18. if E then
| | | 19. STATEMENT;
| | | | 20. end if;

Suppose condition B were to be negated and the two nested 'if-else' constructs were to
swap positions in the code. Condition A would then be combined in LCSAJs with
condition D, whereas condition E would be combined in LCSAJs with condition C. The
code would be effectively the same, but the LCSAJs against which LCSAJ coverage is
measured would have changed.
A similar problem occurs with case or switch statements, where LCSAJs lead into the
first alternative and lead out of the last alternative, as shown in example 5b.

Example 5b
| | | | | 1. if A then
| | | | 2. STATEMENT;
| | | | | | | | 3. end if;
| | | | | | | | 4. case B
| | 5. B1:
| | 6. STATEMENT;
| 7. B2:
| 8. STATEMENT;
| | 9. B3:
| | 10. STATEMENT;
| | | 11. end case
| | | | 12. if C then
| | 13. STATEMENT;
| | | 14. end if;

To achieve LCSAJ coverage, condition A must be tested both true and false with each
branch of the case, whereas condition C need only be tested true and false with the last
case and one other case. If the sequence of the case branches were modified, or a default
(others) case were appended to the case statement, the LCSAJs against which coverage is
measured would again change significantly.
Many minor changes and reorganizations of code result in large changes to the LCSAJs,
which will in turn have an impact on the test data required to achieve LCSAJ coverage.
Test data for LCSAJ coverage is therefore not easily maintainable.
A large proportion of modules contain infeasible LCSAJs; as a result, 100% LCSAJ coverage is frequently not achievable for anything other than very simple modules.
Hedley[2] provides data on some FORTRAN code, with an average of 56 LCSAJs per
module, in which 12.5% of LCSAJs were found to be infeasible. An experimental
investigation of code, as described in annex A, with an average of 28 LCSAJs per
module, showed 62% of modules to have one or more infeasible LCSAJs.
Each LCSAJ which has not been covered has to be analyzed for feasibility. The large
amount of analysis required for infeasible LCSAJs is the main reason LCSAJ coverage is
not a realistically achievable test metric.
Hennell [3] provides evidence that testing with 100% LCSAJ coverage as a target is more
effective than 100% decision coverage. Test data designed to achieve 100% LCSAJ
coverage is therefore more thorough than test data for decision coverage. However, like
decision coverage, LCSAJ coverage can be complete when just one of the combinations
of a compound condition has been tested (as demonstrated in example 4c).
Automation 4
Achievable 1
Comprehensible 1
Maintainable 2
Thoroughness 3

6. Path Coverage

Path Coverage = p/P

where:
p = Number of paths executed at least once.
P = Total number of paths.

Path coverage looks at complete paths through a program. For example, if a module
contains a loop, then there are separate paths through the module for one iteration of the
loop, two iterations of the loop, through to n iterations of the loop. The thoroughness of
test data designed to achieve 100% path coverage is higher than that for decision
coverage.
If a module contains more than one loop, then permutations and combinations of paths
through the individual loops should be considered. Example 6a shows the first few test
cases required for path coverage of a module containing two 'while' loops.

Example 6a
1. while A loop
2. A_STATEMENT;
3. end loop;
4. while B loop
5. ANOTHER_STATEMENT;
6. end loop;

Test  A                      B
1     False                  False
2     (True, False)          False
3     (True, True, False)    False
4     (True, False)          (True, False)
etc.

It can be seen that path coverage for even a simple example can involve a large number
of test cases. A tool for automation of path coverage would have to contend with a large
(possibly infinite) number of paths. Although paths through code are readily identifiable,
the sheer number of paths involved prevents path coverage from being comprehensible
for some code.
As for LCSAJs, it must be considered that some paths are infeasible. Beizer [1], Hedley
[2] and Woodward [6] conclude that only a small minority of program paths are feasible.
Path coverage is therefore not an achievable metric. To make path coverage achievable
the metric has to be restricted to feasible path coverage.

Feasible Path Coverage = f/F


where:
f = Number of feasible paths executed at least once.
F = Total number of feasible paths.

Extracting the complete set of feasible paths from a design or code is not suitable for
automation. Feasible paths can be identified manually, but a manual identification of
feasible paths can never ensure completeness other than for very simple modules. For this
reason path coverage was not included in the investigation described in annex A.
Neither path coverage nor feasible path coverage is easily maintainable. The potential
complexity and quantity of paths which have to be tested means that changes to the code
may result in large changes to test data.
Automation 1
Achievable 1 (feasible 3)
Comprehensible 2
Maintainable 2 (feasible 1)
Thoroughness 4

7. Condition Operand Coverage

Condition Operand Coverage = c/C

where:
c = Number of condition operand values evaluated at least once.
C = Total number of condition operand values.

Condition operand coverage gives a measure of coverage of the conditions which could
cause a branch to be executed. Condition operands can be readily identified from both
design and code, with condition operand coverage directly related to the operands. This
facilitates automation and makes condition operand coverage both comprehensible and
maintainable.
Condition operand coverage improves the thoroughness of decision coverage by testing
each operand of decision conditions with both true and false values, rather than just the
whole condition. However, condition operand coverage is only concerned with condition
operands, and does not include loop decisions.
A weakness in the thoroughness of condition operand coverage is illustrated by
examples 7a and 7b.
In example 7a, 100% condition operand coverage requires test data with both true and
false values of operands A and B.

Example 7a
1. if A and B then
2. DO_SOMETHING;
3. else
4. DO_SOMETHING_ELSE;
5. end if;

Example 7b
1. FLAG:= A and B;
2. if FLAG then
3. DO_SOMETHING;
4. else
5. DO_SOMETHING_ELSE;
6. end if;

Condition operand coverage is vulnerable to flags set outside of decision conditions. As a common programming practice is to simplify complex decisions by using Boolean expressions with flags as intermediates, the thoroughness of condition operand coverage is not as good as it could be. The equivalent code in example 7b can be tested to
100% condition operand coverage by only testing with true and false values of FLAG,
but A or B need not have been tested with both true and false values.

Thoroughness can be improved by including all Boolean expressions into the coverage
metric. The term Boolean expression operand coverage refers to such a development of
condition operand coverage.

Boolean Expression Operand Coverage = e/E

where:
e = Number of Boolean operand values evaluated at least once.
E = Total number of Boolean operand values.
Applying Boolean expression operand coverage to example 7b, in order to achieve 100%
coverage, test cases are required in which each of A, B and FLAG have values of true and
false.
There were no infeasible operand values in the real code investigated (see annex A).
100% Boolean expression operand coverage was therefore achievable for all modules
investigated.
Automation 4
Achievable 5
Comprehensible 5
Maintainable 5
Thoroughness 2 (Boolean 3)

8. Condition Operator Coverage


Condition Operator Coverage = o/O

where:
o = Number of condition operator input combinations evaluated at least once.
O = Total number of condition operator input combinations.
Condition operator coverage looks at the various combinations of Boolean operands
within a condition. Each Boolean operator (and, or, xor) within a condition has to be
evaluated four times, with the operands taking each possible pair of combinations of true
and false, as shown in example 8a.

Example 8a
1. if A and B then
2. DO_SOMETHING;
3. end if;
Test A B
1 True False
2 True True
3 False False
4 False True

As for condition operand coverage, Boolean operators and operands can be readily
identified from design and code, facilitating automation and making condition operator
coverage both comprehensible and maintainable. However, condition operator coverage
becomes more complex and less comprehensible for more complicated conditions.
Automation requires recording of Boolean operand values and the results of Boolean
operator evaluations.
As for condition operand coverage, achieving condition operator coverage will not be
meaningful if a condition uses a flag set by a previous Boolean expression. Examples 7a
and 7b illustrated this point. Boolean expression operator coverage improves upon the
thoroughness of condition operator coverage by evaluating coverage for all Boolean
expressions, not just those within branch conditions.

Boolean Expression Operator Coverage = x/X


where:
x = Number of Boolean operator input combinations evaluated at least once.
X = Total number of Boolean operator input combinations.
The thoroughness of Boolean expression operator coverage is higher than for condition
operand coverage, in that sub-expressions of all compound conditions will be evaluated
both true and false.
The investigation of code, described in annex A, identified two infeasible operand
combinations which prevented 100% condition operator coverage from being achievable. Both
of these operand combinations occurred in a single module. The general form of the
infeasible combinations is given in example 8b.

Example 8b
1. if (VALUE=N1) or (VALUE=N2) then
2. DO_SOMETHING;
3. end if;

Test =N1 =N2


1 True False
2 False True
3 False False

Infeasible
True True

The infeasible operand combinations were both due to mutually exclusive sub-
expressions, which (assuming N1 /= N2) could never both be true at the same time.
Infeasible operand combinations are rare, are readily identifiable during design, and do
not depend upon the topology of the code. Boolean expression operator coverage is much
more achievable than LCSAJ coverage.
Automation 4
Achievable 4
Comprehensible 4
Maintainable 5
Thoroughness 3 (Boolean 4)

9. Boolean Operand Effectiveness Coverage

Boolean Operand Effectiveness Coverage = b/B

where:
b = Number of Boolean operands shown to independently influence the outcome of
Boolean expressions.
B = Total number of Boolean operands.
To achieve Boolean operand effectiveness coverage, each Boolean operand must be
shown to be able to independently influence the outcome of the overall Boolean
expression. The straightforward relationship between test data and the criterion of Boolean operand effectiveness coverage makes the metric comprehensible and the associated test data
maintainable. This is illustrated by example 9a.

Example 9a
1. if (A and B) or C then
2. DO_SOMETHING;
3. end if;

Test A B C
1 true true false
2 false true false
(Tests 1 and 2 show independence of A)

3 true true false


4 true false false
(Tests 3 and 4 show independence of B)

5 false false true


6 false false false
(Tests 5 and 6 show independence of C)

It is worth noting that there are other sets of test data which could have been used to show
the independence of C.
There were no infeasible operand values in the real code investigated (see annex A), and
only two infeasible operand combinations, neither of which obstructed the criterion of
Boolean operand effectiveness coverage. 100% Boolean operand effectiveness coverage
was therefore achievable for all modules investigated.
Boolean operand effectiveness coverage is only concerned with the operands and will not
always identify expressions which are using an incorrect operator. Research by Boeing
[7],[8] has shown that for single mutations to operators in a Boolean expression, Boolean
Operand effectiveness coverage is as thorough as Boolean expression operator coverage,
but that it is less thorough for multiple mutations. As multiple mutations are unlikely, we
conclude that the thoroughness of test data designed to achieve 100% Boolean operand
effectiveness coverage is about the same as the thoroughness of Boolean expression
operator coverage.
Automation of Boolean operand effectiveness coverage requires the state of all Boolean
operands in a Boolean expression to be recorded each time the expression is evaluated.
The ability of an operand to independently affect the outcome will not necessarily be
demonstrated by adjacent evaluations of the expression (as in example 9a).
Automation 3
Achievable 5
Comprehensible 5
Maintainable 5
Thoroughness 4
Static Analysis

Even software that has been thoroughly dynamically tested can have its problems.
Static analysis gives the developer useful information on non-functional qualities of the source
code, such as its maintainability and compliance with coding standards.
Static analysis facilities allow the user to check that software complies with coding standards and
is within acceptable limits of complexity.
There are a large number of ‘common sense’ metrics, for example:
• number of code statements
• measures of complexity (McCabe’s cyclomatic complexity metric)
• maximum depth of loop nesting
• variable assignments
etc.
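As a rough sketch of how such ‘common sense’ metrics can be collected without executing the code (assuming the source is available line by line; this is only an approximation, not a real parser), a static analyzer might count statements and track maximum loop-nesting depth like this:

#include <iostream>
#include <string>
#include <vector>

// Very simplified static metrics over C-like source lines: statements are
// approximated by lines ending in ';', and loop depth is approximated by
// counting nested 'for'/'while' headers.  A real analyzer would parse the
// code properly; this only illustrates the idea.
struct Metrics {
    int statements = 0;
    int max_loop_depth = 0;
};

Metrics analyze(const std::vector<std::string>& lines) {
    Metrics m;
    int depth = 0;
    for (const std::string& line : lines) {
        if (!line.empty() && line.back() == ';')
            ++m.statements;
        if (line.find("for") != std::string::npos ||
            line.find("while") != std::string::npos) {
            ++depth;
            if (depth > m.max_loop_depth) m.max_loop_depth = depth;
        }
        if (line.find("}") != std::string::npos && depth > 0)
            --depth;
    }
    return m;
}

int main() {
    const std::vector<std::string> source = {
        "int total = 0;",
        "for (int i = 0; i < n; ++i) {",
        "    total += i;",
        "}",
    };
    const Metrics m = analyze(source);
    std::cout << "statements: " << m.statements
              << ", max loop depth: " << m.max_loop_depth << '\n';
    return 0;
}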

CHAPTER 8
IMPLEMENTATION
In this chapter we are going to discuss the development of the software testing module. We will discuss the development of the project according to the various phases of the software development life cycle. These phases are described in different sections of this chapter.

8.1 Requirement analysis


Analysis is a software engineering task that bridges the gap between system-level requirements engineering and software design. Requirements engineering activities result in the specification of the software's operational characteristics (function, data, and behavior), indicate the software's interface with other system elements, and establish the constraints that the software must meet. The first step was to identify the requirements of the project. After understanding the problem, we thought of different possible solutions and considered the benefits and limitations of each possible solution.

8.1.1 Problem specification


As the title of the project suggests, we need to implement a software testing module. There are a huge number of approaches through which software can be tested. Many testing techniques and strategies have been defined, so it is obvious that thousands of software tools based on different techniques and strategies are available in the market. The technique which we chose to implement was white box testing. The reason behind this is that, as described in previous chapters, in this technique the internal body of the program is tested. It is also called programmer's testing.

8.1.2 Approach to Problem


The algorithm which we have implemented is basis path testing. As described in earlier chapters, this algorithm tests the flow of the program on different sets of inputs. The inputs to this algorithm are the flow graph notation of the function and the test input data.
There are different tools available that can convert code to its flow graph using reverse engineering. We were not interested in generating the flow graph from code in our module, because that is essentially a manipulation of a heavy finite automaton: it would require some kind of scanner that recognizes the syntax of the whole language and arranges those constructs into some kind of graph form.
After observing the flow of the program on certain test cases, we calculate the different metrics of coverage analysis, for example statement coverage, conditional coverage, etc., as defined in earlier chapters.
In place of a flow graph generator, we will extend this module by adding fine-grained black box testing, that is, component-level testing. A component is a function that executes independently, or in other words a module which is self-contained. In this extension, various inputs or test cases will be applied to the component, and the calculated results will be matched against the expected results. Here we can test the correctness of the components.
The other extension will be the static analyzer. In this we can test that the particular
function is following the programming standards or not. There are various matrices which
can be tested those are defined in last chapters.
The language used in development of this module is c++. The reason behind it is that it
suited our purpose best. In testing a language we need to know very much about that
language first, and with this language we are more familiar than others.
We analyze this problem and after thorough study we identified the various aspects on which
we had to work. Then we defined the inputs those will be needed. we divided the working of
system in various process and data, which flows through these processes. The data flow
diagram shown in next section.

8.2 Design
8.2.1 Data flow diagrams
8.2.1.1 Data flow diagram for coverage analysis
In figure 8.1 the level 0 data flow diagram for coverage analysis is shown. The main
process defined in the figure accepts the following inputs from different storage files.
• Graph matrix: This is the connection matrix for the input flow graph. The matrix holds a
non-zero value in the cell whose row is the source node and whose column is the
destination node of an edge.
• Graph node: This file contains the information about the nodes of the input flow graph,
in other words the program code contained by each node. The code is written in this file
sequentially, node by node.

The output data from this process will be as follows:
• CCP: the conditional coverage percentage, calculated by the formula given in the earlier
chapters.
• SCP: the statement coverage percentage, calculated by the formula given in the earlier
chapters.
The main coverage analysis process builds the flow graph from the given inputs and traces
the paths defined in the flow graph on different sets of test data. A minimal sketch of how
the two input files might be read is given below.
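The following sketch is for illustration only; the exact file layout (node count followed by the N x N connection matrix, and one node's code per line) and all names are assumptions made for the example, not the module's actual format.

#include <fstream>
#include <string>
#include <vector>

// Illustrative node record: the code text held by one flow-graph node.
struct GraphNode {
    std::string code;
};

// Sketch of loading the two input files described above (assumed layout).
bool loadFlowGraph(const std::string& matrixFile, const std::string& nodeFile,
                   std::vector<std::vector<int> >& matrix,
                   std::vector<GraphNode>& nodes) {
    std::ifstream mf(matrixFile.c_str());
    int n = 0;
    if (!(mf >> n) || n <= 0) return false;

    matrix.assign(n, std::vector<int>(n, 0));
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            if (!(mf >> matrix[i][j])) return false;   // row = source, column = destination

    std::ifstream nf(nodeFile.c_str());
    nodes.clear();
    std::string line;
    while (std::getline(nf, line)) {       // one node's code per line (a simplification)
        GraphNode node;
        node.code = line;
        nodes.push_back(node);
    }
    return static_cast<int>(nodes.size()) == n;
}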

Figure 8.1 (Level 0): Data flow diagram for coverage analysis. The Matrix file supplies the
Graph Matrix and the Node file supplies the Graph Node data to the Coverage analysis
process, which produces the Statement Coverage Percent (written to SCPOF) and the
Conditional Coverage Percent (written to CCPOF). CCPOF = Conditional Coverage Percentage
Output File; SCPOF = Statement Coverage Percentage Output File.

Figure 8.2 (Level 1): Data flow diagram for coverage analysis. The Make_graph process reads
the Graph Matrix and Graph Node data, Gen_script builds the script, and Execute_script
produces the Statement Coverage Percent (SCPOF) and the Conditional Coverage Percent
(CCPOF).

The level 1 data flow diagram for coverage analysis is shown in figure 8.2. Here the main
process is divided into three processes; the job of each process is as follows:
• Make_graph: This process takes its input from the graph matrix file and the graph node
file, generates the flow graph for later use by the program, and submits it to the
Gen_script process.
• Gen_script: This process generates the script using the flow graph. Here the script is a
function written in a separate file; it is used to execute the program code saved in the
graph nodes of the flow graph (a sketch of the kind of script it might emit is shown after
this list).
• Execute_script: This process takes the script as input, executes it and observes its
execution. Some routines are also defined here that help in viewing the progress
graphically.
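The exact form of the emitted script is not reproduced here, but the following fragment illustrates the general idea under our own assumptions: each node's code is copied into a generated function and preceded by a marker that records that the node was reached, so that the coverage metrics can later be computed from the markers. All names are illustrative.

// Sketch of the kind of script Gen_script might emit for a three-node flow graph.
bool visited[3] = { false, false, false };

int generatedScript(int x) {
    visited[0] = true;              // node 0: entry
    int result = 0;
    if (x > 0) {
        visited[1] = true;          // node 1: true branch
        result = x;
    } else {
        visited[2] = true;          // node 2: false branch
        result = -x;
    }
    return result;
}

The real Gen_script process works from the flow graph built by Make_graph rather than from hand-written code, but the recording idea sketched here is the same.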
8.2.1.2 Data flow diagram for dynamic analysis
The level 0 data flow diagram for dynamic analysis is shown in figure 8.3. The main
process defined in the figure accepts the following inputs from different storage files.
• Function file: This file contains the component, or independent function, which is going
to be tested. Its body is written in the same way as in a program.
• Test file: This file contains the test calls of the input function. The function will be
tested on these calls with different test cases.
• Expected file: This file contains the expected results, calculated manually. These results
are compared with the results found after execution of the component.
The output is the result of the comparison, i.e. how many results matched the expected
results; test calls whose results match are termed pass, the others fail. The main process
takes these inputs and calculates the output; a rough sketch of the comparison step is
given below.
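As a rough sketch of this comparison step (with assumed file names and a one-result-per-line layout, not the module's actual format), the pass/fail decision could look like this:

#include <fstream>
#include <string>

// Minimal sketch: actual results produced by executing the component are
// matched line by line against the manually prepared expected results, and
// each test call is reported as pass or fail.
void compareResults(const std::string& actualFile, const std::string& expectedFile,
                    const std::string& outputFile) {
    std::ifstream actual(actualFile.c_str());
    std::ifstream expected(expectedFile.c_str());
    std::ofstream out(outputFile.c_str());

    std::string a, e;
    int test = 0, passed = 0;
    while (std::getline(actual, a) && std::getline(expected, e)) {
        ++test;
        bool pass = (a == e);
        if (pass) ++passed;
        out << "test " << test << ": " << (pass ? "pass" : "fail") << "\n";
    }
    out << passed << " of " << test << " test calls passed\n";
}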

Figure 8.3 (Level 0): Data flow diagram for dynamic analysis. The Function file supplies the
Function, the Test file supplies the Test Calls and the Expected result file supplies the
Expected Results to the Dynamic Analysis process, which writes the Result to the Output file.


The level 1 data flow diagram for dynamic analysis is shown in figure 8.4. Here the main
process is divided into three processes; the job of each process is as follows:
• Identify type: This process takes the input function as its input and gives the return type
of the function to the script generator, so that the script can be generalized for different
data types (a simple sketch of this step follows the list).
• Script generator: This process takes the test calls, expected results and return type as
input and generates the script. The generated script uses the function file when issuing
the test calls.
• Execute script: This process takes the script as input, executes it, and writes the result
to the output file.
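A simple way to picture the identify-type step, under the simplifying assumption that the component's first line has the shape "<return-type> <name>(...)", is the following sketch; the function name is hypothetical, and qualified types such as "unsigned long" would need extra handling.

#include <sstream>
#include <string>

// Illustrative sketch: the return type is taken to be the first whitespace-
// delimited token of the component's first line.
std::string identifyReturnType(const std::string& functionFirstLine) {
    std::istringstream in(functionFirstLine);
    std::string returnType;
    in >> returnType;
    return returnType;
}

For example, given the first line "int max(int a, int b)", this sketch would return "int".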

Figure 8.4 (Level 1): Data flow diagram for dynamic analysis. Identify return type reads the
Function and passes the Return type to the Script generator, which combines it with the Test
Calls and Expected Results to produce the Script; Execute Script runs the Script and writes
the Result to the Output file.
The level 2 data flow diagram for dynamic analysis is shown in figure 8.5. Here the script
generator process is divided into two processes; the job of each process is as follows:
• Gen script for strings: This process is separated out because strings have to be handled
differently. It takes all the inputs necessary to generate the script and generates it for
components whose return type is a string (the difference in the emitted comparison is
sketched below).
• Gen script for others: This process generates the script for components with a return
type other than string.
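The practical difference between the two generators can be illustrated by the comparison each would emit; as a small sketch (assumed names, C-style strings), string results are matched with strcmp while other return types are matched with ==:

#include <cstring>

// Illustrative fragments of the comparison the two generators would emit.
bool matchString(const char* actual, const char* expected) {
    return std::strcmp(actual, expected) == 0;   // string results
}

bool matchInt(int actual, int expected) {
    return actual == expected;                   // scalar results
}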

Figure 8.5 (Level 2): Data flow diagram for dynamic analysis. The Script generator is split
into Gen. Script for strings and Gen. Script for others; each receives the Test Calls, Expected
Results and Return type, and the resulting Script is executed to write the Result to the
Output file.

8.3 Code generation

It is not possible to show all the program code in this report, so we give a brief description
of a few of the more important modules.
8.3.1 Modules used in coverage analysis

• Readlib: This module reads the graph node file, takes the node code from the file and
stores it in a two dimensional array that is used in building the flow graph.
• Make node: This function allocates the memory for the data structure used to store the
information about each node.
• Make graph: This module takes the connection matrix and node information as inputs
and generates the graph.
• Caltotstmt: This module calculates the total number of statements present in the whole
function.
• Calcovstmt: This module takes the graph as input and calculates the number of
statements executed at least once for the current test case.
• Stcovper: This function calculates the coverage metric called statement coverage
percentage using the output of the functions caltotstmt and calcovstmt (an illustrative
sketch of this calculation follows the list).
• Caltotcondition: This module calculates the total number of conditions present in the
whole function.
• Calcovcondition: This module takes the graph as input and calculates the number of
conditions executed at least once for the current test case.
• Stcovper: This function calculates the coverage metric called conditional coverage
percentage using the output of the functions caltotcondition and calcovcondition.
• Show graph: This module graphically shows the flow of control in the function while it
is executing.
• Listrev: This function reverses a linked list. It is used in making the flow graph, and
also in the show graph function, to reverse the covered path list so that it can start from
the beginning.
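As an illustrative sketch of the statement coverage calculation performed by these modules (the data structure and function names below are assumptions, not the project's code), the percentage is simply the number of statements executed at least once divided by the total number of statements, times 100:

#include <cstddef>
#include <vector>

// Illustrative per-node record: how many statements the node contains and
// whether the node was reached for the current test case.
struct NodeInfo {
    int statements;
    bool covered;
};

int totalStatements(const std::vector<NodeInfo>& nodes) {
    int total = 0;
    for (std::size_t i = 0; i < nodes.size(); ++i) total += nodes[i].statements;
    return total;
}

int coveredStatements(const std::vector<NodeInfo>& nodes) {
    int covered = 0;
    for (std::size_t i = 0; i < nodes.size(); ++i)
        if (nodes[i].covered) covered += nodes[i].statements;
    return covered;
}

// Statement coverage percentage = statements executed at least once / total statements * 100.
double statementCoveragePercent(const std::vector<NodeInfo>& nodes) {
    int total = totalStatements(nodes);
    if (total == 0) return 0.0;
    return 100.0 * coveredStatements(nodes) / total;
}

With, say, 10 statements in total and 8 of them reached by the current test case, the sketch reports 80%; the conditional coverage percentage is computed in the same way over conditions instead of statements.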
8.3.2 Modules used in dynamic analysis
• Browse: This function generates a list of the text files present in the current directory
using various service routines; the list is used by the display menu module.
• Display menu: This module shows the list of text files generated by the browse function
in the form of a menu so that the user can select the appropriate input files.
• Generate script: This module takes the input function, test calls and expected results as
input and, after finding the return type of the function, generates the script in a separate
file. That file is included by the execute script routine, which then executes the function.
The execute script routine also contains the procedure to compare the calculated results
with the expected results.
• Get response: This function returns the response of the mouse pointer: after the menu is
displayed, the user selects one of the items from the menu, and this function identifies
the file name on which the user has clicked by calculating the co-ordinates at which the
mouse button was clicked.
• Mouse handling routines: Various routines are also defined that handle the behavior of
the mouse on screen; a few of them are listed here:
I. Show mouse pointer
II. Hide mouse pointer
III. Initialize mouse pointer
IV. Highlight region
V. Dehighlight region
These routines do the jobs suggested by their names.
8.3.3 Modules used in static analysis
As we know, static analysis is about checking that code follows programming standards. A
number of rules or standards are defined, and a function should follow them in its code. In
this module we have implemented two standards, and functions are tested against them.
• Variable assignment: The C compiler reports a warning when a variable is initialized but
not used, but there is no warning when a variable is used without being initialized. In
that case the variable holds a garbage value, and when the program runs with this
garbage value it produces wrong results. So in this module we have implemented a finite
automaton that first recognizes all the variables and then checks which of them are not
initialized, and generates the output. The output gives the variable name, its line number
and its status, i.e. assigned or unassigned.
• Maximum depth of loops: The depth of loops affects the complexity of a function a great
deal. As the depth increases, the complexity multiplies with the upper bounds of the
inner loops, so the execution time increases. Increased complexity also increases the
number of test cases required to execute each and every path. In this module we have
implemented a pushdown automaton to identify the maximum depth of loops; if the
depth is greater than three it reports a possible computation hazard, otherwise the depth
is reported as tolerable (a rough sketch of such a check follows).
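As a rough sketch of such a depth check (a simplification of the pushdown-automaton approach described above; it assumes every loop body is written with braces and ignores comments and string literals), the maximum nesting depth could be approximated as follows:

#include <cctype>
#include <cstddef>
#include <stack>
#include <string>

// Rough approximation of the maximum-loop-depth check: track, for each '{',
// whether the block it opens belongs to a for/while/do loop.
int maxLoopDepth(const std::string& source) {
    std::stack<bool> blocks;      // for each '{': true if it opens a loop body
    bool pendingLoop = false;     // a loop keyword seen, its '{' not yet reached
    int depth = 0, maxDepth = 0;
    std::string word;

    for (std::size_t i = 0; i <= source.size(); ++i) {
        char c = (i < source.size()) ? source[i] : ' ';   // trailing space flushes the last word
        if (std::isalnum(static_cast<unsigned char>(c)) || c == '_') {
            word += c;
            continue;
        }
        if (word == "for" || word == "while" || word == "do")
            pendingLoop = true;
        word.clear();

        if (c == '{') {
            blocks.push(pendingLoop);
            if (pendingLoop && ++depth > maxDepth)
                maxDepth = depth;
            pendingLoop = false;
        } else if (c == '}' && !blocks.empty()) {
            if (blocks.top())
                --depth;
            blocks.pop();
        }
    }
    return maxDepth;
}

A returned depth greater than three would then be flagged as a possible computation hazard, in line with the rule above.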

8.4 Testing

Testing has been discussed in so much detail earlier that it is unnecessary to explain again
the techniques we used to test our project. We tested each analysis module on different
functions and matched the output against results calculated manually. Each module was
tested on test cases covering the whole input range, with loops exercised at their boundary
values.
