1. Introduction
2. General Concepts of Testing
3. Test Case Design Techniques
   Black box test
   White box test
4. Test Phases
   Unit Test
   Link Test
   Integration Test
   Function Test
   System Test
   Acceptance Test
5. Automated Software Testing
   Dynamic Analysis
   Coverage Analysis
   Static Analysis



Software testing is arguably the least understood part of the development process, as well as the most critical element of software quality assurance. It represents the ultimate review of specification, design and code generation. Once the code has been generated, it must be tested to uncover as many errors as possible. Our goal is to design a set of test cases that have the highest probability of finding errors. During the early stages of testing, software engineers perform all the tests. However, as the importance of software increases, separate testing specialists may become involved. Reviews and other software quality assurance activities can uncover errors, but they are not sufficient. Every time a program is executed, the client tests it; therefore we have to test the program ourselves first, with the specific intent of finding and removing as many errors as possible. To find the highest possible number of errors, tests must be conducted systematically and designed using standard techniques. A series of test cases that exercise both internal logic and external logic is designed and documented using disciplined techniques; expected results are defined, and actual results are recorded for comparison with them. When we begin testing, we should change our point of view and try hard to break the software, design test cases in a disciplined fashion, and review the test cases we create for thoroughness. In the end, software testing may seem like a destructive activity, but in its true sense it is constructive and requires a great deal of attention.


During testing, the software engineer produces a series of test cases that are used to “rip apart” the software they have produced. Testing is the one step in the software process that can be seen by the developer as destructive instead of constructive. Software engineers are typically constructive people, and testing requires them to overcome preconceived concepts of correctness and deal with conflicts when errors are identified.

Testing is the process of executing program(s) with the intent of finding errors, rather than (as is a common misconception) showing the correct functioning of the program(s). The distinction may sound like a matter of semantics, but it has been observed to have a profound effect on testing success. The difference lies in the psychological effect of the two objectives. If our goal is to demonstrate that a program has no errors, then we will tend to select tests that have a low probability of causing the program to fail. On the other hand, if our goal is to demonstrate that a program has errors, our test data will have a higher probability of finding errors. Specifically, testing should have the following objectives: (a) to reveal design errors; (b) to reveal logic errors; (c) to reveal performance bottlenecks; (d) to reveal security loopholes; and (e) to reveal operational deficiencies. All these objectives and the corresponding actions contribute to increasing the quality and reliability of the application software.

There are two strategies for testing software, namely White-Box Testing and Black-Box Testing. White-Box Testing, also known as Code Testing, focuses on the independent logical internals of the software to assure that all code statements and logical paths have been tested. Black-Box Testing, also known as Specification Testing, focuses on the functional externals to assure that defined input will produce actual results that agree with the required results documented in the specifications. Both strategies should be used, according to the level of testing.

There are 5 levels of Testing, each of which carries a specific functional purpose, to be carried out in chronological order.

Unit Testing
Description: Testing of the program modules in isolation with the objective to find discrepancy between the programs and the program specifications.
Strategy applied: White Box Test

Link Testing
Description: Testing of the linkages between tested program modules with the objective to find discrepancy between the programs and system specifications.
Strategy applied: White Box Test

Function Testing
Description: Testing of the integrated software on a function by function basis with the objective to find discrepancy between the programs and the function specifications.
Strategy applied: Black Box Test

Systems Testing
Description: Testing of the integrated software with the objective to find discrepancy between the programs and the original objectives with regard to the operating environment of the system (e.g. Recovery, Security, Performance, Storage, etc.).
Strategy applied: Black Box Test

Acceptance Testing
Description: Testing of the integrated software by the end users (or their proxy) with the objective to find discrepancy between the programs and the end user needs.
Strategy applied: Black Box Test

The following points should be noted when conducting testing:
• As far as possible, testing should be performed by a group of people different from those performing design and coding of the same system.
• Test cases must be written for invalid and unexpected, as well as valid and expected, input conditions. A good test case is one that has a high probability of detecting undiscovered errors. A successful test case is one that detects an undiscovered error.
• A necessary part of a test case is a definition of the expected outputs or results.
• Do not plan the testing effort on the assumption that no errors will be found.
• The probability of the existence of more errors in a section of a program is proportional to the number of errors already found in that section.
• Testing libraries should be set up so that regression tests can be performed at system maintenance and enhancement times.
• The later in the development life cycle a fault is discovered, the higher the cost of correction.
• Successful testing relies on complete and unambiguous specifications.


Software Quality Assurance (SQA) without measures is like a jet with no fuel. Everyone is ready to go but not much happens. This paper deals with the SQA activities to ensure the right fuel is available to reach the destination, namely high quality. There are three processes, common to both development and purchasing, that enable organizations to be high flyers and reach quality in the stratosphere. The role of SQA, as described here, is to confirm that core measures are used effectively by management processes involving technical and purchasing directors, project and purchasing managers and process improvement groups. They cover:

1. Benchmarking and Process Improvement
2. Estimating and Risk Assessment
3. Progress Control and Reporting

Process improvement enables the same amount of software to be built in less time with less effort and fewer defects. Informed estimating uses process productivity benchmarks to evaluate constraints, assess risks and arrive at viable estimates. Estimates of the defects at delivery use the history from benchmarked projects and allow alternative staffing strategies to be evaluated. Throwing people in to meet tight time-to-market schedules has a disastrous impact on quality. Progress control tracks defects found during development in order to avoid premature delivery and to ensure the reliability goals are achieved. Each process contributes separately to improving the quality of the final software product. We describe how the core measures are used in each process to fuel improved quality. Dramatic quality improvements are achieved by dealing with all three. (Ensuring the fuel is high octane).

SQA is defined as "a group of related activities employed throughout the software lifecycle to positively influence and quantify the quality of the delivered software." (Ref 1.) Much of the SQA literature relates to product assurance. This article focuses on process assurance and the core measurement data that supports all management levels. The basic fuel elements are the Carnegie Mellon Software Engineering Institute (SEI) recommendations on core software measures, namely software size, time, effort and defects. (Ref. 2) An extra benefit is that the majority of the SEI Capability Maturity Model (CMM) Key Process Areas (KPAs) are met by assuring the processes use these measures.

Quality Assurance is the process of making sure that the customers get enough of what they pay for to satisfy their needs. Testing is the means by which we perform the process. You can test without assuring quality, but you can't assure quality without testing. A common problem with software quality is that the assurance is in the hands of the producers. While the producers can certainly create and perform insightful, powerful tests, it is perfectly possible to design tests that will churn away forever and never discover a defect. This is the psychological temptation when the software producers program tests to evaluate themselves.


The preceding section of this paper has provided a "recipe" for developing a unit test specification as a set of individual test cases. In this section a range of techniques which can be used to help define test cases are described. Test case design techniques can be broadly split into two main categories. Black box techniques use the interface to a unit and a description of functionality, but do not need to know how the inside of a unit is built. White box techniques make use of information about how the inside of a unit works. There are also some other techniques which do not fit into either of the above categories. Error guessing falls into this category.

Black box (functional)
• Specification derived tests
• Equivalence partitioning
• Boundary value analysis
• State-transition testing

White box (structural)
• Branch testing
• Condition testing
• Data definition-use testing
• Internal boundary value testing

Other
• Error guessing

Table 3.1 - Categories of Test Case Design Techniques

The most important ingredients of any test design are experience and common sense. Test designers should not let any of the given techniques obstruct the application of experience and common sense.

Equivalence partitioning is a much more formalised method of test case design. It is based upon splitting the inputs and outputs of the software under test into a number of partitions, where the behaviour of the software is equivalent for any value within a particular partition. Data which forms partitions is not just routine parameters. Partitions can also be present in data accessed by the software, in time, in input and output sequence, and in state. Equivalence partitioning assumes that all values within any individual partition are equivalent for test purposes. Test cases should therefore be designed to test one value in each partition. Consider again the square root function used in the previous example. The square root function has two input partitions and two output partitions, as shown in table 3.2.

Input partitions: (i) <0; (ii) >=0
Output partitions: (a) >=0; (b) Error

Table 3.2 - Partitions for Square Root

These four partitions can be tested with two test cases:

Test Case 1: Input 4, Return 2
- Exercises the >=0 input partition (ii)
- Exercises the >=0 output partition (a)

Test Case 2: Input -10, Return 0, Output "Square root error - illegal negative input" using Print_Line.
- Exercises the <0 input partition (i)
- Exercises the "error" output partition (b)

For a function like square root, we can see that equivalence partitioning is quite simple: one test case for a positive number and a real result, and a second test case for a negative number and an error result. However, as software becomes more complex, the identification of partitions and the inter-dependencies between partitions becomes much more difficult, making it less convenient to use this technique to design test cases. Equivalence partitioning is still basically a positive test case design technique and needs to be supplemented by negative tests.
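The two equivalence partitioning test cases can be sketched in Python as follows. This is only an illustration: `square_root` is a hypothetical stand-in for the unit under test (the text does not give its implementation), `print` is assumed to play the role of Print_Line, and the helper `run` is introduced here purely to capture printed output.

```python
import io
import math
from contextlib import redirect_stdout

ERROR_MESSAGE = "Square root error - illegal negative input"


def square_root(x):
    """Hypothetical unit under test (assumed behaviour, not the original code)."""
    if x < 0:
        print(ERROR_MESSAGE)  # stand-in for Print_Line
        return 0
    return math.sqrt(x)


def run(x):
    """Call the unit and return (return value, printed output)."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        result = square_root(x)
    return result, buffer.getvalue().strip()


# One representative value per partition: (input, expected return, expected output).
partition_cases = [
    (4, 2, ""),               # input partition (ii), output partition (a)
    (-10, 0, ERROR_MESSAGE),  # input partition (i), output partition (b)
]

for value, expected_return, expected_output in partition_cases:
    assert run(value) == (expected_return, expected_output)
```

Any other value from the same partition (for example 9 instead of 4) would serve equally well, which is exactly the assumption the technique relies on.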

Boundary value analysis uses the same analysis of partitions as equivalence partitioning. However, boundary value analysis assumes that errors are most likely to exist at the boundaries between partitions. Boundary value analysis consequently incorporates a degree of negative testing into the test design, by anticipating that errors will occur at or near the partition boundaries. Test cases are designed to exercise the software on and at either side of boundary values. Consider the two input partitions in the square root example, as illustrated by figure 3.2.

Figure 3.2 - Input Partition Boundaries in Square Root

The zero or greater partition has a boundary at 0 and a boundary at the most positive real number. The less than zero partition shares the boundary at 0 and has another boundary at the most negative real number. The output has a boundary at 0, below which it cannot go.

Test Case 1: Input {the most negative real number}, Return 0, Output "Square root error - illegal negative input" using Print_Line
- Exercises the lower boundary of partition (i).

Test Case 2: Input {just less than 0}, Return 0, Output "Square root error - illegal negative input" using Print_Line
- Exercises the upper boundary of partition (i).

Test Case 3: Input 0, Return 0
- Exercises just outside the upper boundary of partition (i), the lower boundary of partition (ii) and the lower boundary of partition (a).

Test Case 4: Input {just greater than 0}, Return {the positive square root of the input}
- Exercises just inside the lower boundary of partition (ii).

Test Case 5: Input {the most positive real number}, Return {the positive square root of the input}
- Exercises the upper boundary of partition (ii) and the upper boundary of partition (a).

As for equivalence partitioning, it can become impractical to use boundary value analysis thoroughly for more complex software. Boundary value analysis can also be meaningless for non scalar data, such as enumeration values. In the example, partition (b) does not really have boundaries. For purists, boundary value analysis requires knowledge of the underlying representation of the numbers. A more pragmatic approach is to use any small values above and below each boundary and suitably big positive and negative numbers.
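The five boundary test cases might look like this in Python. The sketch assumes IEEE double precision floats, so `sys.float_info.max` stands in for "the most positive real number" and `sys.float_info.min` (the smallest positive normalised float) for "just greater than 0". `square_root`, `run`, `check_valid` and `check_error` are hypothetical names introduced for illustration only.

```python
import io
import math
import sys
from contextlib import redirect_stdout

ERROR_MESSAGE = "Square root error - illegal negative input"
LARGEST = sys.float_info.max  # assumed "most positive real number"
TINY = sys.float_info.min     # assumed "just greater than 0"


def square_root(x):
    """Hypothetical unit under test (assumed behaviour, not the original code)."""
    if x < 0:
        print(ERROR_MESSAGE)  # stand-in for Print_Line
        return 0
    return math.sqrt(x)


def run(x):
    """Call the unit and return (return value, printed output)."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        result = square_root(x)
    return result, buffer.getvalue().strip()


def check_error(value):
    """Expected result for partition (i): return 0 plus the error message."""
    return run(value) == (0, ERROR_MESSAGE)


def check_valid(value):
    """Expected result for partition (ii): a non-negative root, no message."""
    result, output = run(value)
    return output == "" and result >= 0 and math.isclose(result * result, value)


assert check_error(-LARGEST)  # Test Case 1: lower boundary of partition (i)
assert check_error(-TINY)     # Test Case 2: upper boundary of partition (i)
assert check_valid(0.0)       # Test Case 3: the shared boundary at 0
assert check_valid(TINY)      # Test Case 4: just inside partition (ii)
assert check_valid(LARGEST)   # Test Case 5: upper boundary of partition (ii)
```

Following the pragmatic approach described above, `TINY` and `LARGEST` could equally be replaced by values such as 0.000001 and 1e30 without changing the structure of the tests.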

State transition testing is particularly useful where either the software has been designed as a state machine or the software implements a requirement that has been modelled as a state machine. Test cases are designed to test the transitions between states by creating the events which lead to transitions. When used with illegal combinations of states and events, test cases for negative testing can be designed using this approach.
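As a rough illustration of state transition testing, the sketch below tests a small, entirely hypothetical state machine (a `Player` with states idle, running and paused). The table `SPEC` plays the role of the state machine model; positive tests fire each specified event, and negative tests try every state and event combination that the model does not allow.

```python
class IllegalTransition(Exception):
    """Raised when an event is not allowed in the current state."""


class Player:
    """Hypothetical unit under test, implemented as a state machine."""

    def __init__(self, state="idle"):
        self.state = state

    def handle(self, event):
        if self.state == "idle" and event == "start":
            self.state = "running"
        elif self.state == "running" and event == "pause":
            self.state = "paused"
        elif self.state == "paused" and event == "resume":
            self.state = "running"
        elif self.state in ("running", "paused") and event == "stop":
            self.state = "idle"
        else:
            raise IllegalTransition(f"{event!r} not allowed in state {self.state!r}")
        return self.state


# The model: (current state, event) -> expected next state.
SPEC = {
    ("idle", "start"): "running",
    ("running", "pause"): "paused",
    ("paused", "resume"): "running",
    ("running", "stop"): "idle",
    ("paused", "stop"): "idle",
}
STATES = ["idle", "running", "paused"]
EVENTS = ["start", "pause", "resume", "stop"]


def transition_holds(state, event, expected):
    """Positive test: the event moves the unit to the expected state."""
    return Player(state).handle(event) == expected


def is_rejected(state, event):
    """Negative test: the event is refused and the state is left unchanged."""
    player = Player(state)
    try:
        player.handle(event)
    except IllegalTransition:
        return player.state == state
    return False


illegal_pairs = [(s, e) for s in STATES for e in EVENTS if (s, e) not in SPEC]

assert all(transition_holds(s, e, nxt) for (s, e), nxt in SPEC.items())
assert all(is_rejected(s, e) for s, e in illegal_pairs)
```

Deriving the negative cases from the model, rather than listing them by hand, makes it harder to overlook an illegal combination.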


In branch testing, test cases are designed to exercise control flow branches or decision points in a unit. This is usually aimed at achieving a target level of Decision Coverage. Given a functional specification for a unit, a "black box" form of branch testing is to "guess" where branches may be coded and to design test cases to follow the branches. However, branch testing is really a "white box" or structural test case design technique. Given a structural specification for a unit, specifying the control flow within the unit, test cases can be designed to exercise branches. Such a structural unit specification will typically include a flowchart or PDL.

Returning to the square root example, a test designer could assume that there would be a branch between the processing of valid and invalid inputs, leading to the following test cases:

Test Case 1: Input 4, Return 2
- Exercises the valid input processing branch

Test Case 2: Input -10, Return 0, Output "Square root error - illegal negative input" using Print_Line.
- Exercises the invalid input processing branch

However, there could be many different structural implementations of the square root function. The following structural specifications are all valid implementations of the square root function, but the above test cases would only achieve decision coverage of the first and third versions of the specification.

Figure 3.3(a) - Specification 1

Figure 3.3(b) - Specification 2

Figure 3.3(c) - Specification 3

Figure 3.3(d) - Specification 4

It can be seen that branch testing works best with a structural specification for the unit. A structural unit specification will enable branch test cases to be designed to achieve decision coverage, but a purely functional unit specification could lead to coverage gaps.

One thing to beware of is that by concentrating upon branches, a test designer could lose sight of the overall functionality of a unit. It is important to always remember that it is the overall functionality of a unit that is important, and that branch testing is a means to an end, not an end in itself. Another consideration is that branch testing is based solely on the outcome of decisions. It makes no allowances for the complexity of the logic which leads to a decision.

There are a range of test case design techniques which fall under the general title of condition testing, all of which endeavor to mitigate the weaknesses of branch testing when complex logical conditions are encountered. The object of condition testing is to design test cases to show that the individual components of logical conditions and combinations of the individual components are correct. Test cases are designed to test the individual elements of logical expressions, both within branch conditions and within other expressions in a unit. As for branch testing, condition testing could be used as a "black box" technique, where the test designer makes intelligent guesses about the implementation of a functional specification for a unit. However, condition testing is more suited to "white box" test design from a structural specification for a unit. To illustrate condition testing, consider the example specification for the square root function which uses successive approximation (figure 3.3(d) - Specification 4). Suppose that the designer for the unit made a decision to limit the algorithm to a maximum of 10 iterations, on the grounds that after 10 iterations the answer would be as close as it would ever get. The PDL specification for the unit could specify an exit condition like that given in figure 3.4.

Figure 3.4 - Loop Exit Condition

If the coverage objective is Modified Condition Decision Coverage, test cases have to prove that both error<desired accuracy and iterations=10 can independently affect the outcome of the decision.

Test Case 1: 10 iterations, error>desired accuracy for all iterations.
- Both parts of the condition are false for the first 9 iterations. On the tenth iteration, the first part of the condition is false and the second part becomes true, showing that the iterations=10 part of the condition can independently affect its outcome.

Test Case 2: 2 iterations, error>=desired accuracy for the first iteration, and error<desired accuracy for the second iteration.
- Both parts of the condition are false for the first iteration. On the second iteration, the first part of the condition becomes true and the second part remains false, showing that the error<desired accuracy part of the condition can independently affect its outcome.

Condition testing works best when a structural specification for the unit is available. It provides a thorough test of complex conditions, an area of frequent programming and design error and an area which is not addressed by branch testing. As for branch testing, it is important for test designers to beware that concentrating on conditions could distract a test designer from the overall functionality of a unit.
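The independence argument in the two test cases can be checked mechanically. The sketch below assumes that the exit condition of Figure 3.4 has the form "error < desired accuracy OR iterations = 10", which is what the test case descriptions imply; the names `exit_loop` and `parts` and the value chosen for `DESIRED_ACCURACY` are hypothetical.

```python
MAX_ITERATIONS = 10
DESIRED_ACCURACY = 0.001  # assumed value, for illustration only


def parts(error, iterations):
    """Evaluate the two individual parts of the condition separately."""
    return (error < DESIRED_ACCURACY, iterations == MAX_ITERATIONS)


def exit_loop(error, iterations):
    """Assumed form of the loop exit decision."""
    accurate, exhausted = parts(error, iterations)
    return accurate or exhausted


# Test Case 1: iteration 9 versus iteration 10, error still too large.
# Only the iterations part changes, and the decision changes with it.
assert parts(0.5, 9) == (False, False) and not exit_loop(0.5, 9)
assert parts(0.5, 10) == (False, True) and exit_loop(0.5, 10)

# Test Case 2: iteration 1 versus iteration 2, error drops below the target.
# Only the accuracy part changes, and the decision changes with it.
assert parts(0.5, 1) == (False, False) and not exit_loop(0.5, 1)
assert parts(0.0001, 2) == (True, False) and exit_loop(0.0001, 2)
```

Each pair of assertions holds one part of the condition fixed while the other flips, which is the property Modified Condition Decision Coverage asks the test set to demonstrate.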

In many cases, partitions and their boundaries can be identified from a functional specification for a unit, as described under equivalence partitioning and boundary value analysis above. However, a unit may also have internal boundary values which can only be identified from a structural specification. Consider a fragment of the successive approximation version of the square root unit specification, as shown in figure 3.5 (derived from figure 3.3(d) - Specification 4).

Figure 3.5 - Fragment of Specification 4

The calculated error can be in one of two partitions about the desired accuracy, a feature of the structural design for the unit which is not apparent from a purely functional specification. An analysis of internal boundary values yields three conditions for which test cases need to be designed.

Test Case 1: Error just greater than the desired accuracy
Test Case 2: Error equal to the desired accuracy
Test Case 3: Error just less than the desired accuracy

Internal boundary value testing can help to bring out some elusive bugs. For example, suppose "<=" had been coded instead of the specified "<". Nevertheless, internal boundary value testing is a luxury to be applied only as a final supplement to other test case design techniques.
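The "<=" versus "<" example can be made concrete with a small sketch. Both predicates below are hypothetical: `accurate_enough` follows the specified "<" comparison and `accurate_enough_faulty` contains the coding slip. `DESIRED_ACCURACY` and `DELTA` are assumed values chosen only for illustration.

```python
DESIRED_ACCURACY = 0.001  # assumed value
DELTA = 1e-9              # assumed "just greater / just less" offset


def accurate_enough(error):
    """Comparison as specified: strictly less than the desired accuracy."""
    return error < DESIRED_ACCURACY


def accurate_enough_faulty(error):
    """Same comparison with "<=" coded by mistake."""
    return error <= DESIRED_ACCURACY


# The three internal boundary test cases: (error, expected outcome).
cases = [
    (DESIRED_ACCURACY + DELTA, False),  # Test Case 1: just greater
    (DESIRED_ACCURACY, False),          # Test Case 2: equal
    (DESIRED_ACCURACY - DELTA, True),   # Test Case 3: just less
]


def failures(predicate):
    """Return the error values for which the predicate gives the wrong outcome."""
    return [error for error, expected in cases if predicate(error) != expected]


assert failures(accurate_enough) == []
assert failures(accurate_enough_faulty) == [DESIRED_ACCURACY]
```

Only Test Case 2 separates the two versions, which is why a test exactly on the internal boundary is worth having.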

Orthogonal Array Testing
The Orthogonal Array Testing Strategy (OATS) is a systematic, statistical way of testing pair-wise interactions. It provides representative (uniformly distributed) coverage of all variable pair combinations. This makes the technique particularly useful for integration testing of software components (especially in OO systems where multiple subclasses can be substituted as the server for a client). It is also quite useful for testing combinations of configurable options (such as a web page that lets the user choose the font style, background color, and page layout).

Test case selection poses an interesting dilemma for the software professional. Almost everyone has heard that you can't test quality into a product, that testing can only show the existence of defects and never their absence, and that exhaustive testing quickly becomes impossible -- even in small systems. However, testing is necessary. Being intelligent about which test cases you choose can make all the difference between (a) endlessly executing tests that just aren't likely to find bugs and don't increase your confidence in the system and (b) executing a concise, well-defined set of tests that are likely to uncover most (not all) of the bugs and that give you a great deal more comfort in the quality of your software.

The basic fault model that lies beneath this technique is:
• Interactions and integrations are a major source of defects.
• Most of these defects are not a result of complex interactions such as "When the background is blue and the font is Arial and the layout has menus on the right and the images are large and it's a Thursday then the tables don't line up properly."
• Most of these defects arise from simple pair-wise interactions such as "When the font is Arial and the menus are on the right the tables don't line up properly." With so many possible combinations of components or settings, it is easy to miss one.

Randomly selecting values to create all of the pair-wise combinations is bound to create inefficient test sets and test sets with random, senseless distribution of values. OATS provides a means to select a test set that:
• Guarantees testing the pair-wise combinations of all the selected variables.
• Creates an efficient and concise test set with many fewer test cases than testing all combinations of all variables.
• Creates a test set that has an even distribution of all pair-wise combinations.
• Exercises some of the complex combinations of all the variables.
• Is simpler to generate and less error prone than test sets created by hand.
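To make the properties listed above concrete, the sketch below applies the standard L4 orthogonal array (4 runs, three two-level factors) to the web page example. The factor values (two fonts, two backgrounds, two layouts) are assumptions chosen for illustration; the text names the factors but not their levels.

```python
from itertools import combinations, product

# Hypothetical two-level factors for the configurable web page example.
FACTORS = {
    "font": ["Arial", "Times"],
    "background": ["blue", "white"],
    "layout": ["menus left", "menus right"],
}

# Standard L4 orthogonal array: each row is one test case, each column a factor.
L4 = [
    (0, 0, 0),
    (0, 1, 1),
    (1, 0, 1),
    (1, 1, 0),
]

names = list(FACTORS)
test_set = [
    {name: FACTORS[name][level] for name, level in zip(names, row)}
    for row in L4
]


def pair_counts(rows):
    """Count how often each (column i, column j, value i, value j) pair occurs."""
    counts = {}
    for i, j in combinations(range(len(rows[0])), 2):
        for row in rows:
            key = (i, j, row[i], row[j])
            counts[key] = counts.get(key, 0) + 1
    return counts


counts = pair_counts(L4)
all_pairs = {
    (i, j, a, b)
    for i, j in combinations(range(3), 2)
    for a, b in product(range(2), repeat=2)
}

assert set(counts) == all_pairs        # every pair-wise combination is tested
assert set(counts.values()) == {1}     # and each occurs equally often
assert len(test_set) == 4              # versus 2 * 2 * 2 = 8 exhaustive cases
```

With more factors or more levels per factor the saving over exhaustive testing grows quickly, but a suitable orthogonal array then has to be looked up or generated rather than written by hand.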


Software testing is a fundamental component of software quality assurance and represents a review of specification, design and coding. The greater visibility of software systems and the cost associated with software failure are motivating factors for well-planned, thorough testing. Structural (usually called "white box") testing and functional ("black box") testing have unique characteristics, advantages and limitations that make them more or less applicable to certain stages of test.

White box testing is a test case design approach that employs the control structure of the procedural design to produce test cases. Using white box testing approaches, the software engineer can produce test cases that:
(1) Guarantee that all independent paths in a module have been exercised at least once
(2) Exercise all logical decisions
(3) Execute all loops at their boundaries and within their operational bounds
(4) Exercise internal data structures to ensure their validity

4.1.1 Requirements Analysis
Understanding the requirements is key to performing white box testing. To this end, it is necessary to work with the customer to get a better understanding of the system under consideration. All relevant project documents relating to the system functionality and design are desirable.

4.1.2 System Architecture
Key to undertaking any white-box testing project is to understand the overall system architecture. So one should work with client-provided documentation relating to system architecture. The system architecture documentation forms the basis for identification of systems, subsystems and generation of test cases.

4.1.3 Identification
The identification of the test items is done primarily based on the specifications of the product. These specifications would be related to:
• Functions (exhaustive list) of the system
• Response criteria (benchmarking and stress testing)
• Volume constraints (number of users, hits, stress testing)
• Stability criteria (24 hour testing with fast operations)
• Database responses (flushing, cleaning, updating rates etc.)
• Network criteria (network traffic, choking, etc.)
• Compatibility (environments, browsers, etc.)
• User Interface / Friendliness Criteria
• Modularity (Ability to easily interface with other tools)
• Security

4.1.4 Criteria for Test Cases
Each Test Plan Item should have the following specific characteristics:
• It should be uniquely identifiable
• It should be unambiguous
• It should have well-defined test-data (or data-patterns)
• It should have well-defined pass/fail criteria for each sub-item and overall criteria for the pass/fail of the entire test itself
• It should be easy to record
• It should be easy to demonstrate repeatedly

Many of the above criteria are related to actually identifying the test plan items and would involve a good understanding of the specifications. However, given the need for a strong process, one should keep the above aspects in mind when formulating the structure of the test plan.

Basis path testing is a testing mechanism proposed by McCabe. Its aim is to derive a logical complexity measure of a procedural design and to use this as a guide for defining a basis set of execution paths. Test cases that exercise the basis set will execute every statement at least once.

4.2.1 Flow Graph Notation
Flow graph notation is a notation for representing control flow, similar to flow charts and UML activity diagrams.

4.2.2 Cyclomatic Complexity
The cyclomatic complexity gives a quantitative measure of the logical complexity of a program. Its value gives the number of independent paths in the basis set, and an upper bound for the number of tests needed to ensure that each statement is executed at least once. An independent path is any path through a program that introduces at least one new set of processing statements or a new condition (i.e., a new edge).
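As a worked illustration, McCabe's measure for a connected flow graph with a single entry and a single exit can be computed as V(G) = E - N + 2, where E is the number of edges and N the number of nodes. The formula is the standard one rather than something stated in this text, and the flow graph below is a hypothetical one, loosely modelled on a square root unit with an input check and an approximation loop.

```python
# Hypothetical flow graph, given as (from node, to node) edges.
# 1: input check, 2: error handling, 3: initialise, 4: loop test,
# 5: loop body, 6: exit
EDGES = [
    (1, 2),  # input is negative
    (1, 3),  # input is valid
    (2, 6),
    (3, 4),
    (4, 5),  # continue approximating
    (5, 4),
    (4, 6),  # loop exit condition met
]


def cyclomatic_complexity(edges):
    """V(G) = E - N + 2 for a connected graph with one entry and one exit."""
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2


def predicate_nodes(edges):
    """Nodes with more than one outgoing edge, i.e. decision points."""
    out_degree = {}
    for source, _ in edges:
        out_degree[source] = out_degree.get(source, 0) + 1
    return sorted(node for node, degree in out_degree.items() if degree > 1)


# 7 edges - 6 nodes + 2 = 3, which matches (number of binary decisions) + 1.
assert cyclomatic_complexity(EDGES) == 3
assert len(predicate_nodes(EDGES)) + 1 == 3
```

For this graph a basis set therefore contains three independent paths, for example 1-2-6, 1-3-4-6 and 1-3-4-5-4-6, and three test cases are enough to execute every statement.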

4.3 CONTROL STRUCTURE TESTING

4.3.1 Conditions Testing
Condition testing aims to exercise all logical conditions in a program module. The following may be defined:
• Relational expression: (E1 op E2), where E1 and E2 are arithmetic expressions.
• Simple condition: Boolean variable or relational expression, possibly preceded by a NOT operator.
• Compound condition: composed of two or more simple conditions, Boolean operators and parentheses.
• Boolean expression: Condition without relational expressions.

4.3.2 Loop Testing
Loops are fundamental to many algorithms. Loops can be defined as simple, concatenated, nested, or unstructured. Note that unstructured loops are not to be tested; rather, they should be redesigned.

Black box testing approaches concentrate on the fundamental requirements of the software. Black box testing allows the software engineer to produce groups of input situations that will fully exercise all functional requirements for a program. Black box testing is not an alternative to white box techniques. It is a complementary approach that is likely to uncover a different class of errors than the white box approaches.

Black box testing tries to find errors in the following categories:
(1) Incorrect or missing functions
(2) Interface errors
(3) Errors in data structures or external database access
(4) Performance errors
(5) Initialization and termination errors

By applying black box approaches we produce a set of test cases that fulfil the following requirements:
(1) Test cases that reduce the number of additional test cases needed to achieve reasonable testing
(2) Test cases that tell us something about the presence or absence of classes of errors


Testing real-time systems presents more challenges than testing non-real-time systems since, in addition to the value domain, the temporal domain also has to be considered. A number of design issues affect the testing strategies and the testability of the system. This paper gives a brief introduction to some of these design issues and explains how testing is affected by the different possible choices. The main conclusion is that testers need to take part in the design phase to safeguard the testability of the system, an issue that is otherwise easily overlooked. Testing the temporal domain of a real-time system also affects potential tools for automatic test execution. The last part of this paper is devoted to explaining the major requirements on testing tools stemming from the need to test in the temporal domain. The conclusion is that automating the test execution is a necessity for a successful test.

More than 99% of the processors produced today are used in embedded systems [Tur99]. Many of these embedded systems have real-time requirements (e.g., cellular phones, fuel injection systems, modems, and video recorders). Still, to the best of our knowledge, little attention has been devoted to developing the theory and the best practice of testing real-time systems. Although testing principles developed for non-real-time systems are applicable to real-time systems, the fact that time is a parameter in the testing complicates many issues.

Real-time Systems
Many computer systems, including most real-time systems, can be viewed as in Figure 5.1. Input, in this paper called an event, initiates a computation, in this paper called a task. Upon termination, the task produces a result. Loosely, a task can be understood to be an arbitrary computation. A real-time task is a task that must complete at an intended point in time. In practice, it is usually enough if the real-time task completes before the intended point in time, that is, the deadline.

Figure 5.1 Simple model of a computer system.

This definition of a real-time task can be used to define a real-time system as a system that contains at least one real-time task [Sch93]. This is only one of many possible definitions of a real-time system, with the common denominator that in real-time systems both the value and the time domains are important. The reason for choosing this and the previous definitions in this section is to include as many real-time systems as possible.

Real-time systems are often classified according to the cost of missing a deadline. Locke [Loc86] describes four different classes of real-time systems (soft, firm, hard essential, and hard critical) based on this cost. In a soft real-time system, completing a task after its deadline is still beneficial, even if the gain is not as big as if the task had been completed within the deadline, and it may be acceptable to occasionally miss a deadline. In a firm real-time system, completing a task after its deadline will neither give any benefits nor incur any costs. In a hard essential real-time system, missing a deadline results in a bounded cost; an example may be lost revenues in a billing system. Finally, missing a deadline in a hard critical system will have disastrous results, for instance loss of human lives.

Different events give rise to event types. Two events are of the same event type if they have entered the system through the same channel and differ only in the time of entrance and possibly in the specific value. Event types may be very different from each other. Examples of event types are temperature readings, keyboard commands and pushes of an elevator call button. When designing real-time systems, the types and frequencies of events are important to consider. An event type may be periodic, sporadic or aperiodic. An event type is periodic if events of that type occur with a regular (and known) period. An event type is sporadic if events of that type may occur at any time but there is a known minimum inter-arrival time between two consecutive events of that type. An event type is aperiodic if nothing is known about how often events may occur, or if it is known that events may occur at any time. The peak load that is assumed to be generated by the environment is called the load hypothesis [KV93], and it is often formulated in terms of the types and frequencies of the events from the environment.
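The distinction between periodic and sporadic event types can be made concrete with a small sketch. The function names and the `tolerance` parameter below are illustrative assumptions, not taken from the cited literature; the code only checks an observed list of arrival times against a declared period or a declared minimum inter-arrival time.

```python
from typing import List, Sequence


def interarrival_times(arrivals: Sequence[float]) -> List[float]:
    """Return the gaps between consecutive arrival times (assumed sorted)."""
    return [later - earlier for earlier, later in zip(arrivals, arrivals[1:])]


def is_periodic(arrivals: Sequence[float], period: float,
                tolerance: float = 1e-9) -> bool:
    """True if every gap equals the declared period, within a tolerance."""
    return all(abs(gap - period) <= tolerance
               for gap in interarrival_times(arrivals))


def respects_min_interarrival(arrivals: Sequence[float], min_gap: float) -> bool:
    """True if the trace is consistent with a sporadic event type, i.e. no
    two consecutive events are closer together than min_gap."""
    return all(gap >= min_gap for gap in interarrival_times(arrivals))
```

A trace that fails both checks for every plausible parameter value would have to be treated as aperiodic when formulating the load hypothesis.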

Testing, that is, dynamic execution of test cases [BS98], has two main goals: assessing and increasing reliability [FHL+98]. Testing as a means of assessing reliability relies on choosing and executing test cases based on some operational distribution and monitoring the number of encountered failures. A failure is defined as a deviation of the software from its expected delivery or service [BS98]. Testing as a means of increasing reliability builds on selecting test cases that are assumed to be especially likely to cause failures. The observed failure is analyzed to find the cause of the failure, which is the fault [BS98]. The fault is removed and the reliability is assumed to increase. Many different test methods exist (e.g., equivalence partitioning, boundary value analysis, state-based testing, and syntax testing [Bei90]) that are all assumed to generate test suites containing test cases especially prone to revealing failures.

The strategies and methods used for testing the value domain in non-real-time systems can to a large extent be used without alteration in real-time systems. A good example of this is the DO-178B standard for testing avionics systems [DO92]. This standard requires the testing of the most safety-critical parts of the avionics system to reach 100% Modified Condition/Decision Coverage (MC/DC). MC/DC is a code coverage criterion totally independent of the temporal properties of the application.

The main challenge in testing real-time systems is that they need to be tested in the temporal domain as well as the value domain [Sch93]. Testing in the temporal domain has several implications: the input to the test object may need to be issued at a precise moment, the temporal state of the test object at the start of the test execution may need to be controlled, the timing of the result may need to be observed, and there is a potential for non-determinism.

Two central concepts are observability and controllability. Observability is the functionality facilitated by the system to observe or monitor what the system does, how it does it, and when it does it [Sch93]. Controllability is the functionality available to the user to control the (re-)execution of a test case [Sch93]. The testability of a system is defined as the attributes of software that bear on the effort needed for validating the [modified] software [ISO91]. The testability of a system depends to a large extent on the observability and controllability of the system [Sch93].

Design Trade-offs for Testability in RT Systems

The following sections describe how testability is affected by different design decisions.

3.1 Overview
When designing a real-time system there are many decisions to make. The outcomes of most of these decisions affect the testing in one way or another. In the following subsections we describe and discuss issues relating to some of these decisions. We have selected issues that are likely to be present in many real-time systems and that have a major impact on the testing. Scheduling and the choice of design paradigm are the first two issues covered. These two issues are closely related and have a major impact on the testing strategy that can be used [LMA02]. The next two issues, tracing and support for state manipulation, give concrete examples of how to achieve observability and controllability in a real-time system. The final issue, caching, is motivated by the fact that most commercial processors today contain one or more levels of cache. Introducing caching in a system may increase its performance, but there are severe drawbacks from a testability perspective.

3.2 Scheduling
In practice, most real-time systems contain more than one task. Sometimes two or more tasks will be ready to execute at the same time. In these cases there must be rules for the order in which the tasks are executed, since only one task at a time can use a processor or other resources. Determining the order of task execution based on the supplied rules is called scheduling. Scheduling can be either static or dynamic. In static scheduling, the execution order of the tasks is determined in advance. In a simplified description, as soon as the execution of one task is finished the processor finds the next task to execute by looking in a table containing the pre-calculated order. Often these pre-calculated orders are cyclic. In dynamic scheduling there is no pre-calculated execution order. Instead there is a set of rules for how to resolve a conflict when two or more tasks want to execute at the same time. One common approach is to assign a priority to each task and force the tasks to execute in priority order. It is not uncommon to let higher-priority tasks interrupt lower-priority tasks. This is called preemption.
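The difference between the two kinds of scheduling can be illustrated with a minimal sketch in discrete time ticks. Everything here is an assumption made for illustration: the `Task` fields, the convention that a lower number means higher priority, and the function names. It is not a model of any particular real-time operating system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    name: str
    release: int    # tick at which the task becomes ready
    duration: int   # execution time in ticks
    priority: int   # lower number = higher priority (assumption of this sketch)


def static_cyclic(table: List[str], ticks: int) -> List[str]:
    """Static scheduling: the order is read from a pre-calculated cyclic table."""
    return [table[i % len(table)] for i in range(ticks)]


def simulate_fixed_priority(tasks: List[Task]) -> List[Optional[str]]:
    """Dynamic scheduling: return, tick by tick, the task that runs (None = idle)."""
    remaining = {t.name: t.duration for t in tasks}
    timeline: List[Optional[str]] = []
    tick = 0
    while any(remaining.values()):
        ready = [t for t in tasks
                 if t.release <= tick and remaining[t.name] > 0]
        if ready:
            # The highest-priority ready task runs. Re-evaluating this every
            # tick is what lets a newly released task preempt a running one.
            current = min(ready, key=lambda t: t.priority)
            remaining[current.name] -= 1
            timeline.append(current.name)
        else:
            timeline.append(None)
        tick += 1
    return timeline
```

With a low-priority task released at tick 0 and a high-priority task released at tick 1, the resulting timeline shows the low-priority task being interrupted and resumed, which is the preemption described above.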

3.3 Design paradigms
There are two different design paradigms for real-time systems: time-triggered and event-triggered [Sch93]. The main difference between the two design paradigms is when communication between the real-time system and the environment is performed. A time-triggered system only communicates with its environment at predefined points in time. Events that have occurred in the environment since the latest communication point will not be detected and reacted upon by the real-time system until the next communication point. Similarly, a result computed by the time-triggered real-time system will not be passed back to the environment until the next communication point. Figure 5.2 illustrates this concept. The consequence of communicating with the environment only at specific points in time is that the system works in cycles (… read events, execute tasks, write results …). This in turn means that the cycle time, that is, the time between two communication points, needs to be long enough for the worst-case execution time of any anticipated combination of tasks. Overload situations cannot be handled and should not be possible, since the system is designed for an assumed worst case. The normal way of implementing a time-triggered system is by polling, and it is common to use static scheduling in time-triggered systems.

Figure 5.2 Observation and reaction to an event in a time-triggered system.
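The cyclic behavior of a time-triggered system can be sketched as follows. The task names in `STATIC_ORDER` and the function `run_time_triggered` are hypothetical, and the sketch simplifies by treating several events of the same type within one cycle as a single request.

```python
from typing import Iterable, List, Tuple

# Static schedule: tasks always run in this order within a cycle
# (hypothetical task names chosen for illustration).
STATIC_ORDER = ["read_sensor", "update_control", "write_actuator"]


def run_time_triggered(events: Iterable[Tuple[float, str]],
                       cycle_time: float,
                       n_cycles: int) -> List[List[str]]:
    """For each cycle, list the tasks executed at its communication point.

    `events` are (arrival_time, task_name) pairs. An event arriving during
    cycle k is only seen at the communication point that ends cycle k.
    """
    executed: List[List[str]] = []
    pending = list(events)
    for k in range(1, n_cycles + 1):
        point = k * cycle_time
        seen = {name for (t, name) in pending if t < point}
        pending = [(t, name) for (t, name) in pending if t >= point]
        # The static order decides the execution order, not the arrival order.
        executed.append([task for task in STATIC_ORDER if task in seen])
    return executed
```

Two event sequences that differ only in the order of arrival within one cycle produce the same execution, which is the property that makes the number of distinct behaviors finite.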

In an event-triggered system there are no special points in time when communication has to occur between the real-time system and its environment. Instead events are observed and reacted upon as they happen. Produced results are similarly communicated to the environment as soon as they are ready. This can be seen in Figure 3. Event-triggered systems must be scheduled dynamically and a normal way of achieving this is by using interrupts. In contrast to time-triggered systems, an event-triggered system may face overload situations. The effect is that the event-triggered system must be designed to handle such situations dynamically, in a best-effort manner. This means that deadlines can occasionally be missed and it is the responsibility of the designer to minimize the damage in such situations. In many event-triggered systems it is crucial to guarantee a minimum level of service. Taking a car as an example, the braking service is critical to uphold at all times, whereas the climate control service can be allowed to fail during an overload situation. Therefore, it is essential to make a correct load hypothesis and design the system with enough resources to maintain this minimum level of service under the load hypothesis.

Figure 3 Observation and reaction to an event in an event-triggered system.

The choice of design paradigm has consequences for a number of properties of the real-time system. Unfortunately, the desired values of the different properties are in conflict with each other for both time-triggered and event-triggered systems. Figure 5.3 gives a simplified overview of how some of the different properties are in conflict with each other for the two design paradigms. This is explained further below.

Figure 5.3 Trade-off between testability/predictability and flexibility/efficiency for the two design paradigms.

• Testing
  • Time-triggered design. The order of events occurring within the same observation interval is insignificant. All of the events are communicated at the same time, that is, at the next communication point. Due to the static schedule, the tasks corresponding to the events are executed in the same order regardless of the actual order of the events. This means that there is a finite, albeit large, number of possible behaviors for a time-triggered system, which is why time-triggered systems can be tested with systematic coverage exploration. Moreover, the controllability is increased since the tester only needs to consider in which observation interval an event is input to the system.
  • Event-triggered design. A different order of events may lead to different behavior, since the dynamic scheduling algorithm continuously changes the order of task execution based on the current situation, that is, the state of the system and the incoming event. Further, for two event sequences with the same order, the exact time of occurrence of an individual event may affect the result, again due to the dynamic behavior of the scheduling. This makes testing more difficult. In addition, many of the dynamic scheduling algorithms are heuristic, which means that executing the same input with the same timing, starting from the same state, might yield different results. This leads to the observation that testing with systematic coverage exploration is not feasible in event-triggered systems. Instead we must use statistical testing with tailored loads.

• Predictability
  A system is predictable if the effect of a task can be unambiguously derived from knowledge of that task and its execution environment. Randomness, heuristics, and race conditions all have a negative impact on predictability. Usually predictability focuses on the observable end result. However, in this article we are concerned not only with the observable end result (e.g., that a deadline was met) but also to some extent with how the result was obtained (e.g., which interleaving of tasks was actually executed). Thus, we will use the term predictability in this wider meaning.
  • Time-triggered design. The static scheduling and the insignificance of the order of events in the same observation period increase the predictability of these systems, since the interleaving of tasks is completely determined in advance. High predictability makes it possible to use systematic coverage criteria in the testing. Predictability is also of extreme importance in hard critical real-time systems, since such systems require high confidence that the system will work in all situations.
  • Event-triggered design. The dynamic scheduling, and in particular the heuristics involved in the scheduling decision, decrease the predictability of the behavior for a given input sequence. This will of course decrease the confidence in the test results and also make regression testing harder. This is one of the main reasons why statistical test methods should be used for event-triggered systems. In statistical methods the same input sequence may be executed many times to increase the confidence that the system works correctly under all circumstances for that input sequence. For the same reason, mimicking the operational conditions of the system is also more important for event-triggered systems. It is important to note that for most event-triggered systems, testing alone cannot give 100% confidence in the reliability of the system. This is one of the reasons why event-triggered systems are in practice seldom used for hard real-time systems.

• Flexibility
  • Time-triggered design. A time-triggered system is inflexible when it comes to changing the system or altering the load or fault hypotheses. Alterations that exceed the spare capacity reserved for future extensions, or that change the prerequisites, require at least that the static schedule is recomputed and the system reintegrated and re-tested. Sometimes the system even needs to be redesigned, in particular if the change increases the resource demand.
  • Event-triggered design. An event-triggered system is flexible by its nature. The dynamic scheduling and its ability to handle overloads make event-triggered systems suitable in less predictable environments. However, it is important to note that a change to the system, even a strict removal of parts of the system, effectively invalidates all previous test results.

• Efficiency
  • Time-triggered design. In a time-triggered system, the schedule is usually static. Sporadic tasks are scheduled as if they were periodic, and every task is allotted its worst-case execution time. When the worst case, in terms of execution time or arrival rate for sporadic tasks, differs much from the average case, there is a waste of resources. The reason is that whenever a task completes in less time than its worst-case execution time, the unused time cannot be used for anything else.
  • Event-triggered design. This paradigm demands dynamic scheduling. Sporadic tasks are scheduled on arrival. A task that is more critical than the currently executing task preempts it, whereas a less critical task has to wait. The execution time varies, which means that if there is a difference between average-case and worst-case execution time, the efficiency is much higher in an event-triggered system than it would be in a time-triggered one. The reason is that any unused time may be used for less critical work by dynamically scheduling an appropriate task.
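The efficiency argument for time-triggered systems can be made concrete with a small calculation. The function name `wasted_fraction` is hypothetical and the numbers in the usage note are invented for illustration; the sketch assumes each run reserves a slot equal to the worst-case execution time and that unused time in the slot is lost.

```python
from typing import Sequence


def wasted_fraction(wcet: float, actual_times: Sequence[float]) -> float:
    """Fraction of reserved processor time left unused when every run
    reserves `wcet` but only consumes its actual execution time."""
    if not actual_times:
        raise ValueError("at least one execution time is required")
    if any(t > wcet for t in actual_times):
        raise ValueError("actual execution time exceeds the assumed WCET")
    reserved = wcet * len(actual_times)
    return (reserved - sum(actual_times)) / reserved
```

For example, with an assumed worst case of 10 time units and observed runs of 2, 4, 10 and 4 units, half of the reserved time goes unused. In an event-triggered design that time could be given to less critical work.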
Pure time-triggered or event-triggered systems are rare. Many systems have characteristics from both paradigms. A natural trade-off for systems with mixed criticality is to design the critical part of a system according to the time-triggered paradigm for predictability reasons and design the rest of the system according to the event-triggered paradigm for efficiency reasons. This can be done if it is possible to separate the critical parts from the non-critical parts. During the trade-off discussions, it is important to consider the testing issues as well as all other properties.

3.4 Traces
A common approach to facilitate observability during testing is to use traces. The test object is instrumented with extra code, usually write statements that print the values of interesting variables. During test execution these write statements produce a log, which is analyzed after the test execution to determine the test results or to identify the cause of a failure. The instrumentation of the code introduces a probe effect [Gai86]. The probe effect occurs because the behavior of the instrumented system under test is different from that of the final version of the system.

The probe effect is more severe in real-time systems than in non-real-time systems. There are two reasons for this. First, the traces need to contain more information to include the temporal information, which leads to more instrumentation code being inserted. Second, and more importantly, each instruction that we add to the code will necessarily affect the temporal behavior, since each instruction adds execution time. This means that the test results obtained from an instrumented version of the test object might not be valid for a non-instrumented version of the same test object.

It is important to note that the changed behavior due to the probe effect often increases, but may sometimes even decrease, the response time for a specific task due to the changed interleaving of the task execution. If response times were always increased, an instrumented test object meeting its deadlines would imply that the corresponding uninstrumented object would also meet its deadlines. This is not the case: since the timing is affected, we might get different interleavings among the tasks due to the extra instructions. If there is a race condition, a deadline might be met that would not have been met without the extra instructions. The conclusion is that a changed timing behavior in a real-time system may change the results themselves, not only the timing of the results.

A common approach to deal with the probe effect is to leave the instrumentation in the final product. This requires, however, that there is a mechanism for hiding unwanted information from the end user and that using this mechanism does not in itself alter the timing of the application.
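A trace facility of the kind described above might look like the following minimal sketch. The `Tracer` class, its method names and the injectable `clock` parameter are assumptions made for illustration. Records are kept in memory rather than printed, which reduces but does not remove the probe effect, and `estimate_trace_overhead` gives only a rough, machine-dependent figure.

```python
import time
from typing import Any, Callable, List, Tuple


class Tracer:
    """Collects (timestamp, label, value) records in memory during a test run."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock
        self.records: List[Tuple[float, str, Any]] = []

    def trace(self, label: str, value: Any) -> None:
        # Appending to a list is cheaper than printing, but it still costs
        # execution time: this call is itself a source of probe effect.
        self.records.append((self._clock(), label, value))

    def elapsed(self, start_label: str, end_label: str) -> float:
        """Time between the first record of each label (post-run analysis)."""
        start = next(t for t, label, _ in self.records if label == start_label)
        end = next(t for t, label, _ in self.records if label == end_label)
        return end - start


def estimate_trace_overhead(samples: int = 1000) -> float:
    """Rough average cost in seconds of one trace() call on this machine."""
    tracer = Tracer()
    start = time.perf_counter()
    for i in range(samples):
        tracer.trace("probe", i)
    return (time.perf_counter() - start) / samples
```

Passing the clock in as a parameter keeps the analysis code testable with a deterministic clock, while the instrumented system uses a real one.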

3.5 Support for system state manipulation
Often, a test case requires the system under test to have a certain internal state as a starting point. A big challenge for the tester is to achieve the required state prior to test case execution. The state of the system shall, in this scope, be interpreted as any requirement imposed by the test case on the system under test. Obviously, the nature of these requirements depends on the test case, but common examples include certain variables having specific values, task queues containing specific tasks, and a certain amount of dynamic memory already being allocated. For real-time systems these prerequisites may also include timing requirements, for instance that a certain task has already executed for exactly 10 ms.

Achieving the right system state prior to test case execution is strongly related to controllability, since controllability includes all means of preparing and controlling the test case execution. However, for the tester the process of achieving the right system state may also include a check that the right state really has been achieved. In these cases, some aspects of observability are also included.

Dick and Faivre [DF93] describe two methods of achieving a required internal state prior to test case execution. One method is to demand that the system under test provide special test-bed functions, which can place the system directly in any desired state. The other method is to start in an idle state and then execute other test cases and set-up scripts that will result in the system having the desired state. Although having specific test-bed functions is clearly preferable from a testing point of view, there are several drawbacks with this approach. Sometimes it will not be possible to include specific test functionality, for instance if software components are imported from somewhere else. Even if possible, it might not be practical due to the cost and complexity introduced. Finally, if dedicated test-bed functions are built into the system, care must be taken so that these functions do not become a security hazard. Thus, using test cases and set-up scripts will in many cases be the only option, even if this method restricts the controllability.

Although this discussion has not specifically mentioned real-time systems, the same reasoning applies, with the addition that controlling a system in the temporal domain is even harder than controlling a system in the value domain. In practice, testing real-time systems when there are requirements on the time domain prior to test execution will require a statistical approach or a trial-and-observe approach. The statistical approach builds on the assumption that if we repeat the test cases enough times, we will have achieved the desired conditions at least once, but we do not explicitly check that this is the case. The trial-and-observe approach requires that there are means for observing the actual state before test case execution was performed, thus making it possible to determine whether the desired conditions were met.
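The trial-and-observe approach can be sketched as a loop that records the state just before each execution and only accepts a run whose precondition actually held. The function `trial_and_observe` and its callback parameters are hypothetical; a real harness would obtain the state through whatever observation mechanism the system under test offers.

```python
from typing import Callable, Optional, Tuple, TypeVar

State = TypeVar("State")
Result = TypeVar("Result")


def trial_and_observe(setup: Callable[[], None],
                      observe: Callable[[], State],
                      is_desired: Callable[[State], bool],
                      run_test: Callable[[], Result],
                      max_attempts: int = 10) -> Optional[Tuple[int, Result]]:
    """Repeat set-up and execution until a run started from the desired state.

    Returns (attempt_number, result) for the first valid run, or None if the
    desired state was never observed within max_attempts.
    """
    for attempt in range(1, max_attempts + 1):
        setup()
        state = observe()        # snapshot taken immediately before execution
        result = run_test()
        if is_desired(state):    # only runs with the right precondition count
            return attempt, result
    return None
```

The statistical approach corresponds to the same loop without the `observe` and `is_desired` steps: the test is simply repeated many times in the hope that the desired conditions occurred at least once.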

Testing Tool Issues
The area of testing tools for non-real-time systems has already received considerable attention elsewhere [FG99], [Hay95], so this section will only focus on features needed in tools for testing real-time systems. For most types of tools the real-time aspect of the developed system does not affect the applicability of the tool. This is especially true for administrative tools, such as tools for test management, traceability, and error reporting. However, as soon as [parts of] the test case execution is to be automated in a real-time system, the timing aspects need to be considered.

It is quite difficult to find commercial testing tools for automatic test execution of real-time systems. There are several reasons for this. One reason is that support for timeliness is difficult to implement, especially if the system under test is regarded as a black box, which is a necessity for a commercial program. For instance, the time granularity of the tool needs to be as small as or smaller than the time granularity of the system under test. Another reason is that many real-time systems are embedded and lack standardized interfaces. Also, many real-time systems have specialized application domains with very specific demands. Imagine for instance the different demands of mobile phones, brake-by-wire systems, and pacemakers. Still another reason is that some systems are built using state-of-the-art technology, which means that there hardly exist tools that support the new technology. The overall implication of this reasoning is that most tools for automatic test execution of real-time systems are built in-house.

Since building tools in-house is often the only option, we will, based on the contents of this paper, give a small overview of features that might be handy in a tool for automatic test execution of real-time systems.
Even if a commercially available tool is considered, this overview may serve as a checklist for evaluation of the tool. The order of the items in the list is not significant.

• Probe effect
In many cases the tool needs to intervene with the system under test. For embedded systems it is quite common that a part of the test tool is located on the target system to facilitate detailed observation despite limited means of communication. In other cases the tool might rely on specific timing information that is obtained by instrumenting the code with special test instructions. In both cases the test tool will inflict a probe effect on the system under test. In such cases it is beneficial to be able to determine the amount of probe effect caused by the tool. To some extent this can be achieved by having the tool measure itself. A complementary approach is to make theoretical calculations of the probe effect based on known benchmark figures and the actual results of the test execution.

• Event injection
Event injection, or stimulation of the system under test, is an important aspect of a testing tool for automatic test execution. For a real-time system the timing of the input is important. One useful feature of a tool is to be able to release stimuli to the system under test at a specific [predefined] time. Another, related and equally useful, feature is to be able to release stimuli to the system under test with a specified delay relative to an event, which is either another release by the tool or an event occurring in the system under test that is perceived by the tool. Ordinary load generation tools intended for non-real-time systems usually support some form of customization of the distribution of the load. Such features and related ones are of course useful in the real-time case as well.

• Observation of timing
Observation of events generated by the system under test is also an important area when automating the test execution. Part of deciding whether a timeliness test case passes or fails is to determine whether the deadline was met. There are basically two methods to implement such a check. Either the tool supports timers or the events perceived by the tool are time-stamped.
A tool supporting timers is usually limited to measuring the elapsed time of stimulus-response pairs where the stimulus originates from the tool, since the timer has to be triggered somehow. An advantage is that neither a probe effect nor overhead is introduced by the tool, since all of the intelligence is outside of the system under test. Another advantage of timers is that the pass/fail decision can be made in real time if the timers are implemented as time-outs. A drawback is that signal transfer and processing times outside the system under test are included in the round-trip delay. A challenge in the timer approach arises when there is not a one-to-one correspondence between stimuli and responses or when stimulus-response pairs are interleaved.
The other solution, having events time-stamped, requires the tool to perform post-analysis to calculate the actual round-trip delays. This prohibits making the pass/fail decision in real time, but gives the advantage of handling events that do not originate from the test tool itself. If the time stamping is made by the system under test, this will result in either a probe effect or extra overhead (if the time stamping is kept in the final system). In addition, the communication between the tool and the system under test is increased, since the time stamps are added to the events. If, on the other hand, the tool time-stamps the events, the probe effect and/or overhead are removed and the communication to the tool is reduced, but again we face the problem of including transfer and handling times outside the system under test.

• Synchronization

As soon as there is more than one locus of control in a computer system there might be a clock synchronization problem. In the case of a tool for automatic test execution, if the clocks of both the tool and the system under test are used, then a synchronization problem arises. Obviously the same applies in a distributed real-time system with multiple clocks. Clock synchronization can be achieved in two ways. Either a global clock is used as a master clock, or the local clocks are used together with a clock synchronization protocol. In either case a real-time network is required.
Another, more coarse-grained, synchronization problem arises whenever the test tool supports distribution. Many tools for load generation allow multiple clients to generate the load, in order to generate larger loads than a single machine can manage. In such cases the different clients need synchronization with respect to at least the start and stop of load generation. It is quite an implementation challenge if high precision is needed in such synchronization, and a real-time network is soon needed, increasing both the complexity and cost of the tool.

• Resynchronization in case of failure
More advanced tools for automatic execution of test cases, real-time or not, support resynchronization of the test execution after a failure has been detected. The idea is that after a failure, the test tool can take actions to restore the system under test to a known state and resume test case execution at a point in the test suite after the test case that failed. Due to the complexity of the actions needed to diagnose and reset the system under test, resynchronization after a failure is a big challenge even for non-real-time systems. It is an even bigger challenge when real-time aspects need to be taken into account. Recent research on the topic reports promising but not yet widely available results. For instance, Iorgulescu and Seviora [IS97] report results on real-time supervision and diagnosis of a telephony system. Their work is based on a specification written in the Specification and Description Language (SDL). The specification is used to generate possible but erroneous states, which are compared with the actual erroneous state. When the problem has been diagnosed, suitable corrective actions can be taken to resume the test case execution. All this is done in real time, which is a prerequisite in timeliness testing.
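The time-stamping approach to observation of timing from the checklist above can be sketched as a post-analysis over a log of events. The log format, with a correlation identifier shared by a stimulus and its response, is an assumption of this sketch; it is one possible way to cope with interleaved stimulus-response pairs, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# (timestamp, kind, correlation id); kind is "stimulus" or "response".
Event = Tuple[float, str, str]


def round_trip_delays(log: Iterable[Event]) -> Dict[str, float]:
    """Pair each response with the stimulus that carries the same id."""
    sent: Dict[str, float] = {}
    delays: Dict[str, float] = {}
    for timestamp, kind, event_id in sorted(log, key=lambda e: e[0]):
        if kind == "stimulus":
            sent[event_id] = timestamp
        elif kind == "response" and event_id in sent:
            delays[event_id] = timestamp - sent[event_id]
    return delays


def missed_deadlines(log: Iterable[Event], deadline: float) -> List[str]:
    """Ids of stimuli whose response was late or never arrived."""
    events = list(log)
    delays = round_trip_delays(events)
    stimuli = {event_id for _, kind, event_id in events if kind == "stimulus"}
    return sorted(i for i in stimuli
                  if i not in delays or delays[i] > deadline)
```

Because the verdict is computed after the run, this sketch cannot make the pass/fail decision in real time, which matches the trade-off between timers and time stamps described in the checklist.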




The various constituents of the system are tested at increasing levels of granularity, starting from a single component and ending with the full system. A component is the smallest unit for testing. A module or API consists of components, while a subsystem comprises multiple modules. The complete system is built from various subsystems. Control points are used to segment the life cycle of the development process into the design, development and deployment phases.

5.2.1 Component
A component is an independent, isolated and reusable unit of a program that performs a well-defined function. It usually has public interfaces that allow it to be used to perform its functions. Individual components are tested against their functional and design goals using the parameters outlined in section 4.0.

5.2.2 Module

A module comprises one or more components that together achieve a business function. Also known as an API, the module encapsulates and aggregates the functionality of its constituent components and appears as a black box to its users. Usually, a module is homogeneous in nature with respect to the application domain. For example, a database module will interface with and/or encapsulate database-specific functions. Modules are tested against their functional and design goals using the parameters outlined in section 4.0.

5.2.3 Subsystem
Subsystems are defined as heterogeneous collections of modules to achieve a business function. For example, a credit card processing subsystem might interface to a credit card clearing house, a database component and an audit mechanism to perform complete credit card related operations. Subsystems are also tested against their functional and design goals using the parameters outlined in section 4.0.

5.2.4 System
The full system uses multiple subsystems to implement the full functionality of the application. An example is an online shopping system that includes catalog, shopping cart and credit card processing subsystems. The complete system is tested against its functional and design goals using the parameters outlined in section 4.0.

Units are the smallest building blocks of software. In a language like C, individual functions make up the units. Unit testing is the process of validating such small building blocks of a complex system well before testing an integrated large module or the system as a whole.

5.2.5 Benefits of Unit Testing
• Be able to test parts of a project without waiting for the other parts to be available,
• Achieve parallelism in testing by having many engineers test and fix problems simultaneously,
• Be able to detect and remove defects at a much lower cost than at later stages of testing,
• Be able to take advantage of a number of formal testing techniques available for unit testing,
• Simplify debugging by limiting to a small unit the possible code areas in which to search for bugs,
• Be able to test internal conditions that are not easily reached by external inputs in the larger integrated systems (for example, exception conditions not easily reached in normal operation),
• Be able to achieve a high level of structural coverage of the code,
• Avoid lengthy compile-build-debug cycles when debugging difficult problems.

Studies have shown that unit testing is more cost effective than the other stages of testing. This leads to the conclusion that better testing in the early phases, i.e., unit testing, is a smarter way to detect and fix defects. Unit testing can detect and remove a significant portion of the defects. A study by Thayer and Lipow shows that comprehensive path and parameter testing can remove 72.9% of the defects.
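As an illustration of the benefits listed above, a unit test can exercise an exception condition directly, without having to drive the integrated system into that condition. The function `safe_divide` is a hypothetical unit under test invented for this sketch; the test uses Python's standard `unittest` module.

```python
import unittest


def safe_divide(numerator: float, denominator: float) -> float:
    """Hypothetical unit under test."""
    if denominator == 0:
        raise ValueError("denominator must be non-zero")
    return numerator / denominator


class SafeDivideTest(unittest.TestCase):
    def test_normal_case(self):
        self.assertEqual(safe_divide(6, 3), 2)

    def test_exception_condition(self):
        # An internal error path that may be hard to reach through the
        # integrated system is trivial to trigger at unit level.
        with self.assertRaises(ValueError):
            safe_divide(1, 0)
```

If the code is saved in a file, the tests can be run with `python -m unittest` followed by the module name.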

5.2.6 Unit testing - some typical problems and their solutions
(a) Testing is monotonous, boring and repetitive: Automating as many of the routine activities as possible goes a long way towards reducing the monotony of testing. Defining concrete completeness criteria for the testing activity also brings some predictability to testing, in the sense that there is now a concrete goal to be achieved.
(b) Poor Documentation of Test cases: It is very useful to ensure that test case documentation is automated as part of the testing process. This way the documentation is generated while the testing is done.
(c) Coding Drivers and Stubs: Automating code generation for drivers and stubs can save the tester useful effort. It also helps ensure that there are no defects in the stubs or drivers that result in avoidable loss of time.
(d) Informal testing process: Combining Functional (Black box testing based on the specifications), Structural (White box testing based on the structure of the code) and Heuristic (based on human intuition) testing techniques provides much better results than simply using an intuitive approach to testing. Testing must be mostly systematic and partly intuitive, instead of the general practice of a mostly intuitive and partly systematic approach.
(e) Poor Regression Testing: It is very useful to build a capability of retaining automated test cases as a resource along with the code. Automation is the only solution to regression testing.
(f) Lack of Complete Testing tools: Computer Aided Software Testing (CAST) tools are a fast growing discipline. Good unit test automation tools are beginning to become available. The evolution of such tools towards more comprehensive automation of unit testing activities is likely to help in a big way in solving many of the problems currently faced in unit testing.
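To make the terms in item (c) concrete, the following is a minimal sketch of a hand-written driver and stub. The unit `order_total`, the `TaxServiceStub` class and the 10% rate are hypothetical names and values chosen only for illustration; they do not come from any particular tool.

```python
# Unit under test (hypothetical): computes an order total using a
# tax module that may not be available yet.
def order_total(prices, tax_service):
    subtotal = sum(prices)
    return subtotal + tax_service.tax_for(subtotal)


class TaxServiceStub:
    """Stub: stands in for the real tax module with a fixed, simple rule."""

    def tax_for(self, amount):
        return amount * 0.10  # assumed 10% rate, used only for this example


def run_driver():
    """Driver: feeds test cases to the unit and records pass/fail per case."""
    cases = [([], 0.0), ([100], 110.0), ([20, 30], 55.0)]
    results = []
    for prices, expected in cases:
        actual = order_total(prices, TaxServiceStub())
        passed = abs(actual - expected) < 1e-9
        results.append((prices, expected, actual, passed))
    return results


if __name__ == "__main__":
    for prices, expected, actual, passed in run_driver():
        print(prices, expected, actual, "PASS" if passed else "FAIL")
```

Because the driver returns its results as data, the same cases can be kept with the code and rerun later, which is also the basis of the regression capability mentioned in item (e).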

5.3 LINK TEST
5.3.1 Scope of Testing
Link Testing is the process of testing the linkages between program modules against the system specifications. The goal here is to find errors associated with interfacing. As a by-product of the testing process, the software modules are integrated together. It is worth noting that this level of testing is sometimes referred to as “Integration Testing”, which is understood to mean that the testing process ends up with the software modules in integration. However, after some careful consideration, the term was abandoned, as it would cause some confusion with the term “System Integration”, which means integration of the automated and manual operations of the whole system.

5.3.2 Activities, Documentation and Parties Involved
(a) Test Group to prepare a Link Testing test plan.
(b) Test Group to prepare a Link Testing test specification before testing commences.
(c) Test Group, with the aid of the designers/programmers, to set up the testing environment.
(d) Test Group to perform Link Testing and, when faults are found, to issue Test Incident Reports to the designers/programmers, who fix the errors.
(e) Test Group to report progress of Link Testing through periodic submission of the Link Testing Progress Report.

5.3.3 Practical Guidelines
(a) Both control and data interfaces between the programs must be tested.
(b) Both Top-down and Bottom-up approaches can be applied. Top-down integration is an incremental approach to the assembly of software structure. Modules are integrated by moving downward through the control hierarchy, beginning with the main control module (‘main program’). Modules subordinate (and ultimately subordinate) to the main control module are incorporated into the structure in either a depth-first or a breadth-first manner. Bottom-up integration, as its name implies, begins assembly and testing with modules at the lowest levels in the software structure. Because modules are integrated from the bottom up, processing required for modules subordinate to a given level is always available, and the need for stubs (i.e. dummy modules) is eliminated.
(c) As a result of the testing, integrated software should be produced.


The individual components are combined with other components to make sure that necessary communications, links and data sharing occur properly. It is not truly system testing because the components are not implemented in the operating environment. The integration phase requires more planning and some reasonable sub-set of production-type data. Larger systems often require several integration steps. There are three basic integration test methods:
o all-at-once
o top-down
o bottom-up

5.4.1 All-at-once
The all-at-once method provides a useful solution for simple integration problems, involving a small program possibly using a few previously tested modules.

5.4.2 Top-Down integration
Top-down integration is an incremental approach to the production of program structure. Modules are integrated by moving downwards through the control hierarchy, starting with the main control module. Modules subordinate to the main control module are included into the structure in either a depth-first or breadth-first manner. Referring to the figure below, depth-first integration would integrate the modules on a major control path of the structure. Selection of a major path is arbitrary and depends on application-specific features. For instance, selecting the left-hand path, modules M1, M2, M5 would be integrated first. Next M8 or M6 would be integrated. Then the central and right-hand control paths are built. Breadth-first integration includes all modules directly subordinate at each level, moving across the structure horizontally. From the figure, modules M2, M3 and M4 would be integrated first. The next control level, M5, M6 etc., follows.








Figure 5.2 Top Down Integration

The integration process is performed in a series of five stages:
1. The main control module is used as a test driver and stubs are substituted for all modules directly subordinate to the main control module.
2. Depending on the integration technique chosen, subordinate stubs are replaced one at a time with actual modules.
3. Tests are conducted as each module is integrated.
4. On the completion of each group of tests, another stub is replaced with the real module.
5. Regression testing may be performed to ensure that new errors have not been introduced.
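The five stages above can be sketched in a few lines. This is only an illustration under stated assumptions: `main_control`, the two stubs and `real_sort` are hypothetical modules invented for the example, with the main control module taking its subordinates as parameters so that a stub can be swapped for the real module.

```python
def read_input_stub():
    """Stub for a subordinate input module: returns canned data."""
    return [3, 1, 2]


def sort_stub(items):
    """Stub for a subordinate sort module: returns a canned answer."""
    return [1, 2, 3]


def real_sort(items):
    """Actual module that later replaces sort_stub."""
    return sorted(items)


def main_control(read_input, sort_items):
    """Main control module: acts as the test driver for its subordinates."""
    return sort_items(read_input())


def integrate_top_down():
    """Run the same test at each integration stage and record the outcome."""
    expected = [1, 2, 3]
    outcomes = []
    # Stage 1: main control module with stubs for all direct subordinates.
    outcomes.append(main_control(read_input_stub, sort_stub) == expected)
    # Stages 2-4: replace one stub with the real module and test again.
    # Stage 5: rerunning the earlier test here doubles as a regression check.
    outcomes.append(main_control(read_input_stub, real_sort) == expected)
    return outcomes
```

Calling `integrate_top_down()` returns one pass/fail flag per stage, so a failure can be tied to the module that was most recently integrated.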

5.4.3 Bottom-up Integration
Bottom-up integration testing begins with the modules at the lowest level (atomic modules). As modules are integrated bottom up, the processing required for modules subordinate to a given level is always available and the need for stubs is eliminated.




Figure 5.3 Bottom Up Integration Testing (low-level modules grouped into Cluster 1 and Cluster 2)

A bottom-up integration strategy may be implemented with the following steps:
1. Low-level modules are combined into clusters that perform a particular software subfunction.
2. A driver is written to coordinate test case input and output.
3. The cluster is tested.
4. Drivers are removed and clusters are combined moving upward in the program structure.
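The four steps above might look as follows for one small cluster. The modules `parse_amount` and `apply_discount`, and the higher-level `price_after_discount`, are hypothetical and serve only to illustrate the role of the driver.

```python
# Step 1: two low-level modules form a cluster for one subfunction
# (pricing). Both are hypothetical examples.
def parse_amount(text):
    return round(float(text), 2)


def apply_discount(amount, percent):
    return round(amount * (100 - percent) / 100, 2)


# Step 2: a driver coordinates test case input and output for the cluster.
def cluster_driver(cases):
    """Return the list of failing cases; an empty list means all passed."""
    failures = []
    for text, percent, expected in cases:
        actual = apply_discount(parse_amount(text), percent)
        if actual != expected:
            failures.append((text, percent, expected, actual))
    return failures


# Step 4: the driver is removed and a higher-level module uses the cluster.
def price_after_discount(text, percent):
    return apply_discount(parse_amount(text), percent)


if __name__ == "__main__":
    # Step 3: the cluster is tested through the driver.
    print(cluster_driver([("100.00", 10, 90.0), ("50", 50, 25.0)]))
```

No stub is needed at any point, because everything the cluster calls already exists; the cost is that the driver is throwaway code.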

5.4.4 Comments on Integration Testing

There has been much discussion on the advantages and disadvantages of bottom-up and top-down integration testing. Typically a disadvantage of one approach is an advantage of the other. The major disadvantage of top-down approaches is the need for stubs and the difficulties that are linked with them. Problems linked with stubs may be offset by the advantage of testing major control functions early. The major drawback of bottom-up integration is that the program as a whole does not exist until the last module is included.

5.5 FUNCTION TESTING
5.5.1 Scope of Testing
Function Testing is the process of testing the integrated software on a function-by-function basis against the function specifications. The goal here is to find discrepancies between the programs and the functional requirements.

5.5.2 Activities, Documentation and Parties Involved
(a) Test Group to prepare a Function Testing test plan, to be endorsed by the Project Committee via the Test Control Sub-Committee, before testing commences.
(b) Test Group to prepare a Function Testing test specification before testing commences.
(c) Test Group, with the aid of the designers/programmers, to set up the testing environment.
(d) Test Group (with the participation of user representatives) to perform Function Testing and, when faults are found, to issue test incident reports to the designers/programmers, who fix the errors.
(e) Test Group to report progress of Function Testing through periodic submission of the Function Testing Progress Report.

5.5.3 Practical Guidelines
(a) It is useful to involve some user representatives in this level of testing, in order to give them familiarity with the system prior to the Acceptance test and to highlight differences between users’ and developers’ interpretations of the specifications. However, the degree of user involvement may differ from project to project, and even from department to department, depending on the actual situation.
(b) User involvement, if applicable, could range from test data preparation to carrying out the Function Testing.
(c) It is useful to keep track of which functions have exhibited the greatest number of errors; this information is valuable because it tells us that these functions probably still contain some hidden, undetected errors.

5.6 SYSTEM TESTING
5.6.1 Scope of Testing
System testing is the process of testing the integrated software with regard to the operating environment of the system (i.e. Recovery, Security, Performance, Storage, etc.). It is worth noting that the term is used with different scopes in different environments. In its widest definition, especially for small-scale projects, it also covers the scope of Link Testing and Function Testing. For small-scale projects that combine Link Testing, Function Testing and System Testing in one test plan and one test specification, it is crucial that the test specification includes distinct sets of test cases for each of these 3 levels of testing.

5.6.2 Activities, Documentation and Parties Involved
(a) Test group to prepare a System Testing test plan, to be endorsed by the Project Committee via the Test Control Sub-Committee, before testing commences.
(b) Test group to prepare a System Testing test specification before testing commences.
(c) Test group, with the aid of the designers/programmers, to set up the testing environment.
(d) Test group (with the participation of the computer operators and user representatives) to perform System Testing and, when faults are found, to issue test incident reports to the designers/programmers, who fix the errors.
(e) Test group to report progress of the System Testing through periodic submission of the System Testing Progress Report.

5.6.3 Practical Guidelines
(a) Eight types of System Tests are discussed below. It is not claimed that all 8 types are mandatory for every application system, nor are they meant to be an exhaustive list. To avoid overlooking anything, all 8 types should be explored when designing test cases.

(i) Volume Testing- Volume testing subjects the system to heavy volumes of data in an attempt to show that the system cannot handle the volume of data specified in its objectives. Because volume testing is obviously expensive in terms of machine and people time, one must not go overboard. However, every system must be exposed to at least a few volume tests.

(ii) Stress Testing- Stress testing involves subjecting the program to heavy loads or stress. A heavy stress is a peak volume of data encountered over a short span of time. Some stress tests may exercise situations that ‘never will occur’ during operational use, but this does not imply that these tests are not useful. If errors are detected by these ‘impossible’ conditions, the test is valuable, because it is likely that the same errors might also occur in realistic, less stressful situations.

(iii) Performance Testing- Many programs have specific performance or efficiency objectives, such as response times and throughput rates under certain workload and configuration conditions. Performance testing should attempt to show that the system does not satisfy its performance objectives.

(iv) Recovery Testing- If processing must continue during periods in which the application system is not operational, then those recovery processing procedures/contingent actions should be tested during the System test. In addition, the users of the system should be involved in a complete recovery test so that not only the application system is tested but also the procedures for performing the manual aspects of recovery.

(v) Security Testing- The adequacy of the security procedures should be tested by attempting to violate those procedures. For example, testing should attempt to access or modify data as an individual not authorized to access or modify that data. To address security issues with the system under test, the following aspects are tested:
(a) Authentication: The authentication mechanism is tested based on class of user and password validation, if any. If 3rd party products such as LDAP servers are used in this mechanism, those are tested as well. Where digital certificates are used for authentication, the certificates as well as system behavior upon activation of the certificates are tested.
(b) Data Security: Flow of information (channel security) as well as firewall access issues are tested for all aspects of the system under test. If data encryption is being used, it is checked for compatibility with standards.
Any computer-based system that manages sensitive information or performs operations that can improperly harm individuals is a target for improper or illegal penetration. Security testing tries to verify that the protection mechanisms built into a system will protect it from improper penetration. During security testing, the tester plays the role of the individual who wants to penetrate the system. The tester may try to obtain passwords through external clerical means, may attack the system with customized software, or may purposely produce errors and hope to find the key to system entry. The role of the designer is to make the cost of entering the system greater than the value of what can be gained.
(vi) Procedure Testing- Computer systems may not contain only computer processes but also involve procedures performed by people. Any prescribed human procedures, such as procedures to be followed by the system operator, database administrator, or terminal user, should be tested during the System test.

(vii) Regression Testing- This discipline enables us to track issues, check the effectiveness of a solution, and detect any new issues that may have been created as a result of fixing the original problem. Reports are generated, problems are tracked, and the process continues until all of the issues are solved or a new version is developed.
Purpose: The purpose of regression testing is to ensure that previously detected and fixed issues really are fixed, that they do not reappear, and that new issues are not introduced into the program as a result of the changes made to fix the issues.
Methodology: Typically Breakers performs regression testing on a daily basis. Once an issue in the defect-tracking database has been fixed it is reassigned back to Breakers for final resolution. Breakers can either reopen the issue, if it has not been satisfactorily addressed, or close the issue if it has, indeed, been fixed. For more involved projects lasting several months, several full regression passes may be scheduled in addition to the continuous regression testing mentioned above. Full regression passes involve reverifying all closed issues in the defect-tracking database as truly closed. A full regression pass is also typically performed at the very end of the testing effort as part of a final acceptance test. In addition to verifying closed issues, regression testing seeks to verify that changes made to fix known defects do not cause further defects. Breakers can produce a regression-testing suite consisting of test cases that evaluate the stability of all modules of the software product. Quite often, automation of this regression-testing suite is well worth considering.

(viii) Operational Testing- During the System test, testing should be conducted by the normal operations staff. It is only by having normal operations personnel conduct the test that the completeness of operator instructions and the ease with which the system can be operated can be properly evaluated. This testing is optional, and should be conducted only when the environment is available.

(b) It is understood that in real situations, possibly for environmental reasons, some of the tests (e.g. the Procedure test) may not be carried out in this stage and are delayed to later stages. There is no objection to such a delay provided that the reasons are documented clearly in the Test Summary Report and the test is carried out once the constraints are removed.

Ultimately, software is included with other system components and a set of system validation and integration tests are performed. Steps performed during software design and testing can greatly improve the probability of successful software integration in the larger system. System testing is a series of different tests whose main aim is to fully exercise the computer-based system. Although each test has a different role, all work should verify that all system elements have been properly integrated and perform allocated functions. Below we consider various system tests for computer-based systems.
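The full regression pass described under Regression Testing, in which every closed issue is reverified and reopened if it has reappeared, can be sketched as follows. This is a minimal illustration, not a description of any real defect-tracking tool: the issue numbers, the `safe_divide` function and the mapping from issue to check are all hypothetical.

```python
def full_regression_pass(closed_issues):
    """Re-run the check attached to every closed issue.

    closed_issues maps an issue id to a zero-argument function that
    returns True while the fix still holds. Returns the ids to reopen.
    """
    reopened = []
    for issue_id, check in sorted(closed_issues.items()):
        try:
            still_fixed = check()
        except Exception:
            still_fixed = False  # a crashing check counts as a regression
        if not still_fixed:
            reopened.append(issue_id)
    return reopened


# Hypothetical code under test and its closed issues.
def safe_divide(a, b):
    return None if b == 0 else a / b


closed_issues = {
    101: lambda: safe_divide(1, 0) is None,  # issue 101: crash on zero divisor
    102: lambda: safe_divide(6, 3) == 2,     # issue 102: wrong quotient
}

if __name__ == "__main__":
    print(full_regression_pass(closed_issues))
```

Keeping one check per closed issue is a design choice that makes the pass cheap to automate and rerun daily; an empty result means no closed issue needs to be reopened.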

5.7 ACCEPTANCE TESTING
5.7.1 Scope of Testing
Acceptance Testing is the process of comparing the application system to its initial requirements and the current needs of its end users. The goal here is to determine whether the software end product is not acceptable to its user.

5.7.2 Activities, Documentation and Parties Involved
(a) User representatives to prepare an Acceptance Testing test plan, which is to be endorsed by the Project Committee via the Test Control Sub-Committee.
(b) User representatives to prepare an Acceptance Testing test specification, which is to be endorsed by the Project Committee via the Test Control Sub-Committee.
(c) User representatives to perform Acceptance Testing and, when faults are found, to issue test incident reports to the designers/programmers, who fix the errors.
(d) User representatives to report progress of the Acceptance Testing through periodic submission of the Acceptance Testing Progress Report.

5.7.3 Practical Guidelines
(a) There are three approaches for Acceptance Testing, namely:
(i) A planned comprehensive test using artificial data and simulated operational procedures. It usually accompanies the Big Bang implementation approach but can also be used as a pre-requisite step of other approaches.
(ii) A parallel run using live data, normally used when comparison between the existing system and the new system is required. This approach requires duplicated resources to operate both systems.
(iii) A pilot run using live data, normally used when the user is not certain about the acceptance of the system by its end-users and/or the public.
Users are responsible for selecting the approaches that are most applicable to their operating environment.
(b) Precautions for users
(i) Testing staff should be freed from their routine activities
(ii) Commitment is authorized
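The comparison at the heart of the parallel run in approach (ii) can be illustrated with a small sketch. Both fee functions below are hypothetical stand-ins for an existing and a new system; in practice each would be a complete system fed with the same live data.

```python
def parallel_run(records, existing_system, new_system):
    """Feed the same records to both systems and collect disagreements."""
    mismatches = []
    for record in records:
        old_result = existing_system(record)
        new_result = new_system(record)
        if old_result != new_result:
            mismatches.append((record, old_result, new_result))
    return mismatches


def existing_fee(amount):
    """Hypothetical existing system: one fee unit per full 100, floor division."""
    return amount // 100


def new_fee(amount):
    """Hypothetical new system: same rule, but truncating towards zero."""
    return int(amount / 100)


if __name__ == "__main__":
    print(parallel_run([0, 250, 999], existing_fee, new_fee))
    print(parallel_run([-150], existing_fee, new_fee))
```

The two systems agree on the non-negative amounts and disagree on the negative one, which is the kind of discrepancy a parallel run is meant to surface for users to judge.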

Verifications and Validations
Software testing is one element of a broader topic known as verification and validation (V&V). Verification refers to the set of activities that ensure that the software correctly implements a particular function. Validation refers to a different set of activities that ensure that the software that has been produced is traceable to customer needs.

Dynamic Analysis
Dynamic Analysis uses test data sets to execute software in order to observe its behavior and produce test coverage reports. This assessment of source code ensures consistent levels of high quality testing and correct use of capture/playback tools. Dynamic Analysis provides the facilities to achieve quality standards for critical code, improve code efficiency, minimize regression test costs, and detect software defects. When used during software development and maintenance, Dynamic Analysis techniques can make a significant contribution to a program’s robustness and reliability.

Benefits of Dynamic Analysis
• High quality testing is performed
• Reduces cost and effort of regression testing
• Identifies software anomalies and defects
• Yields a comprehensive test data set which has measurable quality and known test outcomes
• Reduces maintenance costs to a minimum
• Identifies unnecessary parts of the system/program, which can be removed
• Ensures systems are reliable and as error free as possible

Dynamic Analysis explores the semantics of the application under test via test data selection. Control and data flow models constructed from the Static Analysis of the software application are compared with the actual control flow and data flow that are yielded at run time. This enables checks to be made which show errors in either the Static or Dynamic Analysis. Dynamic Analysis is particularly effective for the analysis of software applications which are required to achieve high levels of reliability. It is the primary requirement for the testing of safety-critical avionics software and is widely used in all military, safety and mission critical software. In addition to the safety-critical industry sectors mentioned above, Dynamic Analysis is also being used in the banking and telecommunications sectors. Key drivers are the process and efficiency improvements the tool can bring. It is able to demonstrate real cost savings and return on investment for clients, which lead to large competitive advantages.

Coverage Analysis
Measurement of structural coverage of code is a means of assessing the thoroughness of testing. There are a number of metrics available for measuring structural coverage, with increasing support from software tools. Such metrics do not constitute testing techniques, but a measure of the effectiveness of testing techniques. A coverage metric is expressed in terms of a ratio of the metric items executed or evaluated at least once to the total number of metric items. This is usually expressed as a percentage.

Coverage = items executed at least once / total number of items

There is significant overlap between the benefits of many of the structural coverage metrics. It would be impractical to test against all metrics, so which metrics should be used as part of an effective testing strategy? A subjective score has been given for each metric against the evaluation criteria (5=high, 1=low). Simple examples are given to illustrate specific points. Data collected from an investigation of real code (summarized in annex A) is used to support the analysis. Section 10 summarizes conclusions and makes recommendations to enable developers to apply structural coverage metrics in a practical way in real software developments. There are many equivalent names for each structural coverage metric. The names used in this paper are those considered to be most descriptive. Equivalent alternative names are listed in annex B. References are given in annex C.

2. Evaluation Criteria
The first evaluation criterion is automation. To be of use on a real software development, which may involve tens of thousands or hundreds of thousands of lines of code, a metric must be suitable for automated collection and analysis.

A metric should also be achievable. It should be possible and practical to achieve 100% coverage (or very close to 100% coverage) of a metric. Any value less than 100% requires investigation to determine why less than 100% has been achieved.
(a) If it is the result of a problem in the code, the problem should be fixed and tests run again.
(b) If it is the result of a problem in the test data, the problem should be fixed and tests run again.
(c) If it is because 100% coverage is infeasible, then the reasons for infeasibility must be ascertained and justified.
Infeasibility occurs because the semantics of the code constrain the coverage which can be achieved, for example: defensive programming, error handling, constraints of the test environment, or characteristics of the coverage metric. Infeasibility should be the only reason for metric values of less than 100% to be accepted.

When 100% coverage is infeasible, the effort required for investigation and to take appropriate action is important. This will depend on the frequency at which coverage of less than 100% occurs and on how comprehensible the metric is. To be comprehensible, the relationship between a metric, design documentation and code should be simple.

Software has to be retested many times throughout its life. Test data required to achieve 100% coverage therefore has to be maintainable. Changes required of test data should not be disproportionate in scale to changes made to the code.

An ideal criterion against which a coverage metric should be assessed is its effectiveness at detecting faults in software. To measure the effectiveness of each coverage metric would require extensive data collection from software tested using the entire range of coverage metrics. Such a data collection would require orders of magnitude more effort than the investigation described in annex A. As the investigation was based on static analysis and code reading, the actual effectiveness of each metric could not be quantified. For the purposes of this paper, effectiveness is assumed to be a function of thoroughness. The thoroughness with which test data designed to fulfill a metric actually exercises the code is assessed. A higher thoroughness score is attributed to metrics which demand more rigorous test data to achieve 100% coverage.

3. Statement Coverage
Statement Coverage = s/S
where:
s = Number of statements executed at least once.
S = Total number of executable statements.

Statement coverage is the simplest structural coverage metric. From a measurement point of view one just keeps track of which statements are executed, then compares this to a list of all executable statements. Statement coverage is therefore suitable for automation. Statement coverage is easily comprehensible, with the units of measurement (statements) appearing directly in the code. This makes analysis of incomplete statement coverage a simple task. It is practical to achieve 100% statement coverage for nearly all code. An investigation of real code (as described in annex A) showed no infeasible statements. 100% statement coverage was achievable for all modules analyzed. However, statement coverage is not a very good measure of test thoroughness. Consider the following fragment of code:

Example 3a
    if CONDITION then
        DO_SOMETHING;
    end if;
    ANOTHER_STATEMENT;

Full statement coverage of example 3a could be achieved with just a single test for which CONDITION evaluated to true. The test would not differentiate between the code given in example 3a and the code given in example 3b.

Example 3b
    null;
    DO_SOMETHING;
    null;
    ANOTHER_STATEMENT;

Another criticism of statement coverage is that test data which achieves 100% statement coverage of source code will often achieve less than 100% coverage of object code instructions. Beizer [1] quantifies this at about 75%. Test data for statement coverage is maintainable by virtue of its simplicity and comprehensible relationship to the code.

Automation      5
Achievable      5
Comprehensible  5
Maintainable    5
Thoroughness    1
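The ratio s/S for example 3a can be worked through with a hand-instrumented sketch. The instrumentation below (a set of executed statement labels) is an assumption made for illustration; real coverage tools collect this information automatically.

```python
# The three executable statements of example 3a, so S = 3.
EXECUTABLE = {"if", "do_something", "another_statement"}


def example_3a(condition, executed):
    """Example 3a with each executable statement recording its own execution."""
    executed.add("if")
    if condition:
        executed.add("do_something")
    executed.add("another_statement")


def statement_coverage(test_inputs):
    """Return s/S after running example 3a once per test input."""
    executed = set()
    for condition in test_inputs:
        example_3a(condition, executed)
    return len(executed) / len(EXECUTABLE)


if __name__ == "__main__":
    print(statement_coverage([True]))   # one test, CONDITION true
    print(statement_coverage([False]))  # one test, CONDITION false
```

A single test with CONDITION true reaches a ratio of 3/3, even though the false outcome of the decision is never exercised, which is the weakness the thoroughness score of 1 reflects.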

4. Decision coverage
Decision coverage = d/D
where:
d = Number of decision outcomes evaluated at least once.
D = Total number of decision outcomes.

To achieve 100% decision coverage, each condition controlling branching of the code has to evaluate to both true and false. In example 4a, decision coverage requires two test cases.

Example 4a
    if CONDITION then
        DO_SOMETHING;
    else
        DO_SOMETHING_ELSE;
    end if;

Test  CONDITION
1     True
2     False

Not all decision conditions are as simple; decision conditions also occur in case or switch statements and in loops. However, this does not present an obstacle to automation. The units of measurement (decision conditions) appear directly in the code, making decision coverage comprehensible and investigation of incomplete decision coverage straightforward. An investigation of real code (as described in annex A) showed no infeasible decision outcomes. 100% decision coverage was achievable for all modules analyzed. Test data designed to achieve decision coverage is maintainable. Equivalent code to example 4a, shown in example 4b, would not require changes to test data for decision coverage.

Example 4b
    if not CONDITION then
        DO_SOMETHING_ELSE;
    else
        DO_SOMETHING;
    end if;

For structured software, 100% decision coverage will necessarily include 100% statement coverage. The weakness of decision coverage becomes apparent when non-trivial conditions are used to control branching. In example 4c, 100% decision coverage could be achieved with two test cases, but without fully testing the condition.

Example 4c
    if A and B then
        DO_SOMETHING;
    else
        DO_SOMETHING_ELSE;
    end if;

Test      A      B
1         True   True
2         False  True
Untested  True   False
Untested  False  False

For a compound condition, if two or more combinations of components of the condition could cause a particular branch to be executed, decision coverage will be complete when just one of the combinations has been tested. Yet compound conditions are a frequent source of code bugs. The thoroughness of test data designed to achieve decision coverage is therefore an improvement over statement coverage, but can leave compound conditions untested.

Automation      5
Achievable      5
Comprehensible  5
Maintainable    5
Thoroughness    2
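The gap left by decision coverage in example 4c can be checked mechanically. In this sketch the function names are hypothetical, and the decision is modelled as a plain Boolean function of A and B with D = 2 outcomes.

```python
from itertools import product


def decision(a, b):
    """The compound condition of example 4c."""
    return a and b


def decision_coverage(tests):
    """Return d/D, where D = 2 (the decision can be true or false)."""
    outcomes = {decision(a, b) for a, b in tests}
    return len(outcomes) / 2


def untested_combinations(tests):
    """List the (A, B) combinations that the test set never exercises."""
    return [combo for combo in product([True, False], repeat=2)
            if combo not in tests]


if __name__ == "__main__":
    tests = [(True, True), (False, True)]  # tests 1 and 2 of example 4c
    print(decision_coverage(tests))
    print(untested_combinations(tests))
```

With tests 1 and 2 the ratio d/D is already 2/2, while the two combinations with B false remain unexercised, matching the "Untested" rows in the table for example 4c.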

5. LCSAJ Coverage
An LCSAJ (Linear Code Sequence And Jump) is defined as an unbroken linear sequence of statements:
(a) which begins at either the start of the program or a point to which the control flow may jump,
(b) which ends at either the end of the program or a point from which the control flow may jump,
(c) and the point to which a jump is made following the sequence.
Hennell [3] gives a full explanation and some examples to help illustrate the definition of an LCSAJ.

LCSAJ coverage = l/L
where:
l = Number of LCSAJs exercised at least once.
L = Total number of LCSAJs.

LCSAJs depend on the topology of a module's design and not just its semantics; they do not map onto code structures such as branches and loops. LCSAJs are not easily identifiable from design documentation. They can only be identified once code has already been written. LCSAJs are consequently not easily comprehensible. Automation of LCSAJ coverage is a bit more difficult than automation of decision coverage. However, it is relatively easily achieved. Small changes to a module can have a significant impact on the LCSAJs and the required test data, leading to a disproportionate effort being spent in maintaining LCSAJ coverage and maintaining test documentation. Unfortunately this dependence cannot be illustrated with trivial examples. In the original figure for example 5a, LCSAJs are marked as vertical bars beside the code (the bars are not reproduced here).

Example 5a
    if A then
        STATEMENT;
    end if;
    if B then
        if C then
            STATEMENT;
        else
            STATEMENT;
        end if;
    else
        if D then
            STATEMENT;
        else
            STATEMENT;
        end if;
    end if;
    if E then
        STATEMENT;
    end if;

Suppose condition B were to be negated and the two nested 'if-else' constructs were to swap positions in the code. Condition A would then be combined in LCSAJs with condition D, whereas condition E would be combined in LCSAJs with condition C. The code would be effectively the same, but the LCSAJs against which LCSAJ coverage is measured would have changed.

A similar problem occurs with case or switch statements, where LCSAJs lead into the first alternative and lead out of the last alternative, as shown in example 5b.

Example 5b
| | | | |       1. if A then
| | | |         2.    STATEMENT;
| | | | | | | | 3. end if;
| | | | | | | | 4. case B
| |             5.    B1:
| |             6.       STATEMENT;
|               7.    B2:
|               8.       STATEMENT;
| |             9.    B3:
| |            10.       STATEMENT;
| | |          11. end case
| | | |        12. if C then
| |            13.    STATEMENT;
| | |          14. end if;

To achieve LCSAJ coverage, condition A must be tested both true and false with each branch of the case, whereas condition C need only be tested true and false with the last case and one other case. If the sequence of the case branches were modified, or a default (others) case were appended to the case statement, the LCSAJs against which coverage is measured would again change significantly.

Many minor changes and reorganizations of code result in large changes to the LCSAJs, which will in turn have an impact on the test data required to achieve LCSAJ coverage. Test data for LCSAJ coverage is therefore not easily maintainable.

A large proportion of modules contain infeasible LCSAJs and, as a result, 100% LCSAJ coverage is frequently not achievable for other than very simple modules. Hedley [2] provides data on some FORTRAN code, with an average of 56 LCSAJs per module, in which 12.5% of LCSAJs were found to be infeasible. An experimental investigation of code, as described in annex A, with an average of 28 LCSAJs per module, showed 62% of modules to have one or more infeasible LCSAJs.

Each LCSAJ which has not been covered has to be analyzed for feasibility. The large amount of analysis required for infeasible LCSAJs is the main reason LCSAJ coverage is not a realistically achievable test metric.

Hennell [3] provides evidence that testing with 100% LCSAJ coverage as a target is more effective than 100% decision coverage. Test data designed to achieve 100% LCSAJ coverage is therefore more thorough than test data for decision coverage. However, like decision coverage, LCSAJ coverage can be complete when just one of the combinations of a compound condition has been tested (as demonstrated in example 4c).

Automation      4
Achievable      1
Comprehensible  1
Maintainable    2
Thoroughness    3

6. Path Coverage

Path Coverage = p/P
where:
p = Number of paths executed at least once.
P = Total number of paths.

Path coverage looks at complete paths through a program. For example, if a module contains a loop, then there are separate paths through the module for one iteration of the loop, two iterations of the loop, through to n iterations of the loop. The thoroughness of test data designed to achieve 100% path coverage is higher than that for decision coverage.

If a module contains more than one loop, then permutations and combinations of paths through the individual loops should be considered. Example 6a shows the first few test cases required for path coverage of a module containing two 'while' loops.

Example 6a
1. while A loop
2.    A_STATEMENT;
3. end loop;
4. while B loop
5.    ANOTHER_STATEMENT;
6. end loop;

Test  A                     B
1     False                 False
2     (True, False)         False
3     (True, False)         (True, False)
4     (True, True, False)   False
etc.

It can be seen that path coverage for even a simple example can involve a large number of test cases. A tool for automation of path coverage would have to contend with a large (possibly infinite) number of paths. Although paths through code are readily identifiable, the sheer number of paths involved prevents path coverage from being comprehensible for some code.

As for LCSAJs, it must be considered that some paths are infeasible. Beizer [1], Hedley [2] and Woodward [6] conclude that only a small minority of program paths are feasible. Path coverage is therefore not an achievable metric. To make path coverage achievable the metric has to be restricted to feasible path coverage.

Feasible Path Coverage = f/F
where:
f = Number of paths executed at least once.
F = Total number of feasible paths.

Extracting the complete set of feasible paths from a design or code is not suitable for automation. Feasible paths can be identified manually, but a manual identification of feasible paths can never ensure completeness other than for very simple modules. For this reason path coverage was not included in the investigation described in annex A.

Neither path coverage nor feasible path coverage is easily maintainable. The potential complexity and quantity of paths which have to be tested means that changes to the code may result in large changes to test data.

Automation      1
Achievable      1 (feasible 3)
Comprehensible  2
Maintainable    2 (feasible 1)
Thoroughness    4
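The growth in the number of paths for Example 6a can be made concrete with a small C++ sketch. It assumes, as in that example, that the loop bodies contain no branches, so a path is fixed by how many times each loop iterates. The cap on iterations (maxIterations) and the function names are illustrative assumptions and are not part of the original text.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Condition values seen by one 'while' loop that iterates `iterations` times:
// `iterations` True evaluations followed by the False that exits the loop.
std::vector<bool> conditionSequence(int iterations) {
    assert(iterations >= 0);
    std::vector<bool> sequence(static_cast<std::size_t>(iterations), true);
    sequence.push_back(false);
    return sequence;
}

// Number of distinct paths through `loops` sequential loops when every loop
// is limited to at most `maxIterations` iterations (an assumed bound; without
// a bound the number of paths is unlimited).
long long countPaths(int loops, int maxIterations) {
    assert(loops >= 0 && maxIterations >= 0);
    long long paths = 1;
    for (int i = 0; i < loops; ++i) {
        paths *= (maxIterations + 1);  // 0, 1, ..., maxIterations iterations
    }
    return paths;
}
```

With two loops, allowing up to one iteration each already gives four paths, and allowing up to ten gives 121, which is why the test table above ends with "etc.".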

7. Condition Operand Coverage

Condition Operand Coverage = c/C
where:
c = Number of condition operand values evaluated at least once.
C = Total number of condition operand values.

Condition operand coverage gives a measure of coverage of the conditions which could cause a branch to be executed. Condition operands can be readily identified from both design and code, with condition operand coverage directly related to the operands. This facilitates automation and makes condition operand coverage both comprehensible and maintainable.

Condition operand coverage improves the thoroughness of decision coverage by testing each operand of decision conditions with both true and false values, rather than just the whole condition. However, condition operand coverage is only concerned with condition operands, and does not include loop decisions.

A weakness in the thoroughness of condition operand coverage is illustrated by examples 7a and 7b. In example 7a, 100% condition operand coverage requires test data with both true and false values of operands A and B.

Example 7a
1. if A and B then
2.    DO_SOMETHING;
3. else
4.    DO_SOMETHING_ELSE;
5. end if;

Example 7b
1. FLAG := A and B;
2. if FLAG then
3.    DO_SOMETHING;
4. else
5.    DO_SOMETHING_ELSE;
6. end if;

Condition operand coverage is vulnerable to flags set outside of decision conditions. As a common programming practice is to simplify complex decisions by using Boolean expressions with flags as intermediates, the thoroughness of condition operand coverage is therefore not as good as it could be. The equivalent code in example 7b can be tested to 100% condition operand coverage by only testing with true and false values of FLAG, but A or B need not have been tested with both true and false values.

Thoroughness can be improved by including all Boolean expressions in the coverage metric. The term Boolean expression operand coverage refers to such a development of condition operand coverage.

Boolean Expression Operand Coverage = e/E
where:
e = Number of Boolean operand values evaluated at least once.
E = Total number of Boolean operand values.

Applying Boolean expression operand coverage to example 7b, in order to achieve 100% coverage, test cases are required in which each of A, B and FLAG has values of true and false.

There were no infeasible operand values in the real code investigated (see annex A). 100% Boolean expression operand coverage was therefore achievable for all modules investigated.

Automation      4
Achievable      5
Comprehensible  5
Maintainable    5
Thoroughness    2 (Boolean 3)
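The difference between the two metrics on example 7b can be sketched in C++ as follows. The OperandCoverage class and the instrumentation calls inside example7b are hypothetical and only illustrate the counting; they are not taken from any tool described in this document.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Records which truth values each tracked operand has taken.
class OperandCoverage {
public:
    explicit OperandCoverage(const std::set<std::string>& tracked) : tracked_(tracked) {}

    // Operands that are not tracked by this metric are ignored.
    void record(const std::string& name, bool value) {
        if (tracked_.count(name) != 0) seen_[name].insert(value);
    }
    // Number of operand values evaluated at least once (c or e in the formulas).
    std::size_t covered() const {
        std::size_t hits = 0;
        for (const auto& entry : seen_) hits += entry.second.size();
        return hits;
    }
    // Total number of operand values (C or E): true and false per operand.
    std::size_t total() const { return 2 * tracked_.size(); }
    double ratio() const {
        assert(total() > 0);
        return static_cast<double>(covered()) / static_cast<double>(total());
    }

private:
    std::set<std::string> tracked_;
    std::map<std::string, std::set<bool>> seen_;
};

// Example 7b with hypothetical instrumentation calls added.
bool example7b(bool a, bool b, OperandCoverage& coverage) {
    coverage.record("A", a);
    coverage.record("B", b);
    const bool flag = a && b;  // FLAG := A and B
    coverage.record("FLAG", flag);
    return flag;               // the decision tests only FLAG
}
```

Tracking only FLAG (condition operand coverage) reaches 2 of 2 values with the inputs (true, true) and (false, true), whereas tracking A, B and FLAG (Boolean expression operand coverage) reaches only 5 of 6, because B has never been false.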

8. Condition Operator Coverage

Condition Operator Coverage = o/O
where:
o = Number of condition operator input combinations evaluated at least once.
O = Total number of condition operator input combinations.

Condition operator coverage looks at the various combinations of Boolean operands within a condition. Each Boolean operator (and, or, xor) within a condition has to be evaluated four times, with the operands taking each possible combination of true and false, as shown in example 8a.

Example 8a
1. if A and B then
2.    DO_SOMETHING;
3. end if;

Test  A      B
1     True   False
2     True   True
3     False  False
4     False  True

As for condition operand coverage, Boolean operators and operands can be readily identified from design and code, facilitating automation and making condition operator coverage both comprehensible and maintainable. However, condition operator coverage becomes more complex and less comprehensible for more complicated conditions. Automation requires recording of Boolean operand values and the results of Boolean operator evaluations.

As for condition operand coverage, achieving condition operator coverage will not be meaningful if a condition uses a flag set by a previous Boolean expression. Examples 7a and 7b illustrated this point. Boolean expression operator coverage improves upon the thoroughness of condition operator coverage by evaluating coverage for all Boolean expressions, not just those within branch conditions.

Boolean Expression Operator Coverage = x/X
where:
x = Number of Boolean operator input combinations evaluated at least once.
X = Total number of Boolean operator input combinations.

The thoroughness of Boolean expression operator coverage is higher than for condition operand coverage, in that sub-expressions of all compound conditions will be evaluated both true and false.

The investigation of code, described in annex A, identified two infeasible operand combinations which prevented 100% condition operator coverage being achievable. Both of these operand combinations occurred in a single module. The general form of the infeasible combinations is given in example 8b.

Example 8b
1. if (VALUE=N1) or (VALUE=N2) then
2.    DO_SOMETHING;
3. end if;

Test        =N1    =N2
1           True   False
2           False  True
3           False  False
Infeasible  True   True

The infeasible operand combinations were both due to mutually exclusive subexpressions, which (assuming N1 /= N2) could never both be true at the same time. Infeasible operand combinations are rare, are readily identifiable during design, and do not depend upon the topology of the code. Boolean expression operator coverage is much more achievable than LCSAJ coverage.

Automation      4
Achievable      4
Comprehensible  4
Maintainable    5
Thoroughness    3 (Boolean 4)
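The infeasible combination in example 8b can be reproduced with a short C++ sketch that records which of the four input combinations of a single binary operator have been evaluated. The class and function names are illustrative assumptions.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Tracks which of the four (left, right) input combinations of one binary
// Boolean operator have been evaluated at least once.
class OperatorCoverage {
public:
    void record(bool left, bool right) {
        const std::size_t index = (left ? 2u : 0u) + (right ? 1u : 0u);
        assert(index < seen_.size());
        seen_.set(index);
    }
    int covered() const { return static_cast<int>(seen_.count()); }
    static int total() { return 4; }

private:
    std::bitset<4> seen_;
};

// Evaluates the operands of example 8b, (VALUE=N1) or (VALUE=N2), for every
// supplied VALUE and returns how many input combinations were reached.
int combinationsReached(const std::vector<int>& values, int n1, int n2) {
    OperatorCoverage coverage;
    for (int value : values) {
        coverage.record(value == n1, value == n2);
    }
    return coverage.covered();
}
```

When N1 and N2 differ, no choice of VALUE reaches more than three of the four combinations, so the achievable maximum for this operator is 3/4.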

9. Boolean Operand Effectiveness Coverage

Boolean Operand Effectiveness Coverage = b/B
where:
b = Number of Boolean operands shown to independently influence the outcome of Boolean expressions.
B = Total number of Boolean operands.

To achieve Boolean operand effectiveness coverage, each Boolean operand must be shown to be able to independently influence the outcome of the overall Boolean expression. The straightforward relationship between test data and the criteria of Boolean operand effectiveness coverage makes the metric comprehensible and the associated test data maintainable. This is illustrated by example 9a.

Example 9a
1. if (A and B) or C then
2.    DO_SOMETHING;
3. end if;

Test  A      B      C
1     true   true   false
2     false  true   false   (Tests 1 and 2 show independence of A)
3     true   true   false
4     true   false  false   (Tests 3 and 4 show independence of B)
5     false  false  true
6     false  false  false   (Tests 5 and 6 show independence of C)

It is worth noting that there are other sets of test data which could have been used to show the independence of C.

There were no infeasible operand values in the real code investigated (see annex A), and only two infeasible operand combinations, neither of which obstructed the criteria of Boolean operand effectiveness coverage. 100% Boolean operand effectiveness coverage was therefore achievable for all modules investigated.

Boolean operand effectiveness coverage is only concerned with the operands and will not always identify expressions which are using an incorrect operator. Research by Boeing [7],[8] has shown that for single mutations to operators in a Boolean expression, Boolean operand effectiveness coverage is as thorough as Boolean expression operator coverage, but that it is less thorough for multiple mutations. As multiple mutations are unlikely, we conclude that the thoroughness of test data designed to achieve 100% Boolean operand effectiveness coverage is about the same as the thoroughness of Boolean expression operator coverage.

Automation of Boolean operand effectiveness coverage requires the state of all Boolean operands in a Boolean expression to be recorded each time the expression is evaluated. The ability of an operand to independently affect the outcome will not necessarily be demonstrated by adjacent evaluations of the expression (as it is in example 9a).

Automation      3
Achievable      5
Comprehensible  5
Maintainable    5
Thoroughness    4
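The check applied to example 9a can be sketched in C++. The sketch assumes that "independently influence" means there is a pair of recorded tests that differ only in the operand concerned and give different outcomes; it compares every pair of tests, not just adjacent ones. All names are illustrative.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Inputs = std::array<bool, 3>;  // operand values in the order {A, B, C}

// The expression of example 9a: (A and B) or C.
bool example9a(const Inputs& in) { return (in[0] && in[1]) || in[2]; }

// True if the two tests differ only in operand `index` and the outcome of
// the expression differs between them.
bool showsIndependence(const Inputs& x, const Inputs& y, std::size_t index) {
    assert(index < x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        if ((x[i] != y[i]) != (i == index)) return false;
    }
    return example9a(x) != example9a(y);
}

// Number of operands shown to be independently effective by some pair of
// tests (b in the coverage formula; B is 3 for this expression).
int effectiveOperands(const std::vector<Inputs>& tests) {
    int count = 0;
    for (std::size_t operand = 0; operand < 3; ++operand) {
        bool shown = false;
        for (std::size_t i = 0; i < tests.size() && !shown; ++i) {
            for (std::size_t j = i + 1; j < tests.size() && !shown; ++j) {
                shown = showsIndependence(tests[i], tests[j], operand);
            }
        }
        if (shown) ++count;
    }
    return count;
}
```

Applied to the six tests of example 9a the count is 3 of 3; with only tests 1 and 2 it is 1 of 3, since only A has been shown to be effective.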

Static Analysis
Even software that has been thoroughly dynamically tested can have its problems. Static analysis gives the developer useful information on non-functional qualities of the source code, such as its maintainability and compliance with coding standards. Static analysis facilities allow the user to check that software complies with coding standards and is within acceptable limits of complexity. There are a large number of 'common sense' metrics, for example:
• number of code statements
• measures of complexity (McCabe's cyclomatic complexity metric)
• maximum depth of loop
• variable assignment
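As an illustration of one of these metrics, McCabe's cyclomatic complexity of a connected flow graph with a single entry and a single exit is V(G) = E - N + 2, where E is the number of edges and N the number of nodes. The C++ sketch below computes it from a connection matrix in which a non-zero cell marks an edge from the row node to the column node; the function name and the matrix layout are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Cyclomatic complexity V(G) = E - N + 2 of a connected flow graph with one
// entry and one exit, given as a square connection matrix.
int cyclomaticComplexity(const std::vector<std::vector<int>>& matrix) {
    const std::size_t nodes = matrix.size();
    int edges = 0;
    for (const auto& row : matrix) {
        assert(row.size() == nodes);  // the connection matrix must be square
        for (int cell : row) {
            if (cell != 0) ++edges;
        }
    }
    return edges - static_cast<int>(nodes) + 2;
}
```

For an if-else construct with four nodes and four edges the result is 2, matching its two independent paths.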



In this chapter we discuss the development of the software testing module. The development is described according to the phases of the software development life cycle, and each phase is covered in its own section of this chapter.

8.1 Requirement analysis
Analysis is a software engineering task that bridges the gap between system-level requirements engineering and software design. Requirements engineering activities result in the specification of the software's operational characteristics (function, data, and behavior), indicate the software's interface with other system elements, and establish constraints that the software must meet. The first step was to identify the requirements of the project. After understanding the problem, we considered different possible solutions and weighed the benefits and limitations of each.

8.1.1 Problem specification
As the title of the project suggests, we need to implement a software testing module. Software can be tested through a large number of approaches. Many testing techniques and strategies have been defined, so a great many tools based on them are available in the market. The technique we chose to implement is white box testing because, as described in previous chapters, it tests the internal body of the program. It is also called programmer's testing.

8.1.2 Approach to Problem
The algorithm we implemented is basis path testing. As described in earlier chapters, it tests the flow of the program on different sets of inputs. The inputs to the algorithm are the flow graph notation of the function and the test input data. Tools are available that can convert code to its flow graph by reverse engineering. We were not interested in generating the flow graph from code in our module, because that task amounts to building a heavy finite automaton: a kind of scanner that recognizes the syntax of the whole language and arranges its constructs in graph form. After observing the flow of the program on certain test cases, we calculate the coverage analysis metrics, for example statement coverage and conditional coverage, as defined in earlier chapters.

In place of a flow graph generator, we extend this module with fine-grained black box testing, that is, component-level testing. A component is a function that executes independently, in other words a self-contained module. In this extension the test cases are applied to the component, and the calculated results are matched against the expected results. In this way we can test the correctness of the components. The other extension is the static analyzer, which tests whether a particular function follows the programming standards. The metrics that can be tested are defined in earlier chapters.

The language used to develop this module is C++, because it suited our purpose best. To test programs in a language we need to know that language very well, and we are more familiar with C++ than with other languages.

We analyzed the problem and, after thorough study, identified the aspects on which we had to work. We then defined the inputs that would be needed, and divided the working of the system into processes and the data that flows through them. The data flow diagrams are shown in the next section.

8.2 Design

8.2.1 Data flow diagrams

Data flow diagram for coverage analysis
Figure 8.1 shows the level 0 data flow diagram for coverage analysis. The main process in the figure accepts the following inputs from different storage files:
• Graph matrix: This is the connection matrix of the input flow graph. A cell of the matrix holds a non-zero value where its row is the source node and its column is the destination node of an edge.
• Graph node: This file contains the information about the nodes of the input flow graph, in other words the program code contained in each node. The code is written in this file sequentially.

The output data from this process is as follows:
• CCP: the conditional coverage percentage, calculated by the formula given in earlier chapters.
• SCP: the statement coverage percentage, calculated by the formula given in earlier chapters.

The main process, coverage analysis, builds the flow graph from the given inputs and traces the paths defined in the flow graph for different sets of test data.

[Figure 8.1: Level 0 data flow diagram for coverage analysis. The Coverage analysis process reads Graph Matrix from the matrix file and Graph Node from the node file, and writes Conditional Coverage Percent to CCPOF and Statement Coverage Percent to SCPOF.]

*CCPOF = Conditional Coverage Percentage Output File
*SCPOF = Statement Coverage Percentage Output File
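The two inputs and the statement coverage output can be sketched in C++ as follows. The sketch assumes the usual definition of statement coverage (statements executed at least once divided by the total number of statements) and represents each node only by its statement count; the FlowGraph structure and function names are hypothetical and are not the module's actual data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical in-memory form of the two input files.
struct FlowGraph {
    std::vector<std::vector<int>> matrix;  // matrix[src][dst] != 0 means an edge src -> dst
    std::vector<int> statementsPerNode;    // number of statements held by each node
};

// Checks that every step of a traced path follows an edge of the matrix.
bool isValidPath(const FlowGraph& graph, const std::vector<std::size_t>& path) {
    for (std::size_t i = 0; i + 1 < path.size(); ++i) {
        if (graph.matrix[path[i]][path[i + 1]] == 0) return false;
    }
    return true;
}

// Statement coverage percentage over all paths traced so far.
double statementCoveragePercent(const FlowGraph& graph,
                                const std::vector<std::vector<std::size_t>>& paths) {
    std::set<std::size_t> visited;
    for (const auto& path : paths) visited.insert(path.begin(), path.end());

    int covered = 0;
    int total = 0;
    for (std::size_t node = 0; node < graph.statementsPerNode.size(); ++node) {
        total += graph.statementsPerNode[node];
        if (visited.count(node) != 0) covered += graph.statementsPerNode[node];
    }
    assert(total > 0);
    return 100.0 * covered / total;
}
```

For an if-else graph whose four nodes hold 1, 2, 1 and 1 statements, a single test that takes the 'then' branch covers 4 of 5 statements (80%), and adding a test for the 'else' branch brings the figure to 100%.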

[Figure 8.2: Level 1 data flow diagram for coverage analysis. Processes: Make_graph, Gen_script and Execute_script. Data stores: matrix file, node file, SCPOF and CCPOF. Data flows: Graph Matrix, Graph Node, Statement Coverage Percent and Conditional Coverage Percent.]

Figure 8.2 shows the level 1 data flow diagram for coverage analysis. Here the main process is divided into three processes. The job of each process is as follows:
• Make_graph: This process takes its input from the graph matrix file and the graph node file and generates the flow graph, which is kept for later use by the program and is passed to the Gen_script process.
• Gen_script: This process generates the script from the flow graph. Here the script is a function written in a separate file. The script is used to execute the program code saved in the nodes of the flow graph.
• Execute_script: This process takes the script as input, executes it, and observes its execution. Some routines are also defined here that help in viewing the progress graphically.

Data flow diagram for dynamic analysis
Figure 8.3 shows the level 0 data flow diagram for dynamic analysis. The main process in the figure accepts the following inputs from different storage files:
• Function file: This file contains the component, or independent function, which is going to be tested. The body is defined in the same way as in a program.
• Test file: This file contains the test calls of the input function. The function is tested on these calls with different test cases.
• Expected file: This file contains the expected results, calculated manually. These results are compared with the results found after execution of the component.

The output is the result of the comparison, that is, how many results matched the expected results. Those that match are termed passed and the others failed. The main process takes these inputs and calculates the output.

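[Figure 8.3: Level 0 data flow diagram for dynamic analysis. The Dynamic Analysis process reads Function from the function file, Test Calls from the test file and Expected Results from the expected result file, and writes Result to the output file.]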
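The comparison performed by the main process can be sketched in C++. The sketch keeps the test calls and expected results in memory rather than in files, and the names Summary, runTests and square are hypothetical stand-ins chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Counts of test calls whose result matched or did not match the expectation.
struct Summary {
    int passed = 0;
    int failed = 0;
};

// Applies every test call to the component and compares the result with the
// manually calculated expected result at the same position.
template <typename In, typename Out>
Summary runTests(const std::function<Out(In)>& component,
                 const std::vector<In>& testCalls,
                 const std::vector<Out>& expected) {
    assert(testCalls.size() == expected.size());
    Summary summary;
    for (std::size_t i = 0; i < testCalls.size(); ++i) {
        if (component(testCalls[i]) == expected[i]) {
            ++summary.passed;
        } else {
            ++summary.failed;
        }
    }
    return summary;
}

// Hypothetical component under test.
int square(int x) { return x * x; }
```

A call such as runTests<int, int>(square, {1, 2, 3}, {1, 4, 10}) reports two passes and one failure, because the third expected value does not match 3 * 3.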

Figure 8.4 shows the level 1 data flow diagram for dynamic analysis. Here the main process is divided into three processes. The job of each process is as follows:
• Identify type: This process takes the input function and gives the return type of the function to the script generator, so that the script can be generalized for different data types.
• Script generator: This process takes the test calls, the expected results and the return type as input and generates the script. The script uses the function file when making the test calls.
• Execute script: This process takes the script as input, executes it, and writes the result to the output file.

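[Figure 8.4: Level 1 data flow diagram for dynamic analysis. Processes: Identify return type, Script generator and Execute Script. Data stores: function file, test file, expected output file and output file. Data flows: Function, Return type, Test Calls, Expected Results and Script.]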

Figure 8.5 shows the level 2 data flow diagram for dynamic analysis. Here the script generator process is divided into two processes. The job of each process is as follows:
• Gen script for strings: This process is separate because strings are handled differently from other return types. It takes all the inputs needed to generate the script and generates it for components whose return type is string.
• Gen script for others: This process generates the script for components whose return type is anything other than string.

[Figure 8.5: Level 2 data flow diagram for dynamic analysis. The Script generator is split into Gen. Script for strings and Gen. Script for others, alongside Identify return type. Data stores: function file, test file, expected output file and output file. Data flows: Function, Test Calls, Expected Results, Return type, Script and Result.]

8.3 Code generation
It is not possible to show all the program code in this report, so we give a brief description of the few modules that are most important.

8.3.1 Modules used in coverage analysis
• Readlib: This module reads the graph node file, takes the node code from the file and stores it in a two-dimensional array, which is used in making the flow graph.
• Make node: This function allocates memory for the data structure used to save the information about each node.
• Make graph: This module takes the connection matrix and the node information as inputs and generates the graph.
• Caltotstmt: This module calculates the total number of statements present in the whole function.
• Calcovstmt: This module takes the graph as input and calculates the number of statements executed at least once for the current test case.
• Stcovper: This function calculates the coverage metric called statement coverage percentage, using the output of the functions caltotstmt and calcovstmt.
• Caltotcondition: This module calculates the total number of conditions present in the whole function.
• Calcovcondition: This module takes the graph as input and calculates the number of conditions executed at least once for the current test case.
• Stcovper: This function calculates the coverage metric called condition coverage percentage, using the output of the functions caltotcondition and calcovcondition.
• Show graph: This module shows graphically the flow of control in the function while it executes.
• Listrev: This function reverses a linked list. It is used in making the flow graph, and also in the show graph function to reverse the covered path list so that it starts from the beginning.

8.3.2 Modules used in dynamic analysis
• Browse: This function generates the list of text files present in the current directory, using various service routines. The list is used by the display menu module.
• Display menu: This module shows the list of text files generated by the browse function in the form of a menu, so that the user can select the appropriate input files.
• Generate script: This module takes the input function, the test calls and the expected results as input and, after finding the return type of the function, generates the script in a separate file. That file is included by the execute script routine, which executes the function. The execute script routine also contains the procedure to compare the calculated result with the expected result.
• Get response: This function returns the response of the mouse pointer. After the menu is displayed, the user selects one of its items. This function identifies the file name on which the user has clicked by calculating the position of the co-ordinates at which the mouse button was clicked.
• Mouse handling routines: Various routines are also defined to handle the behavior of the mouse on screen. A few of them are listed here:
  I. Show mouse pointer
  II. Hide mouse pointer
  III. Initialize mouse pointer
  IV. Highlight region
  V. Dehighlight region
  Each of these routines does the job its name suggests.

8.3.3 Modules used in static analysis
As we know, static analysis is about following programming standards. A number of rules or standards are defined, and a function should follow all of them. In this module we have implemented two standards, and a function is tested against them.
• Variable assignment: The C compiler reports a warning when a variable is initialized and not used, but there is no warning when a variable is used without being initialized. In that case the variable holds a garbage value, and a program that runs using this garbage value produces a wrong result. We have therefore implemented a finite automaton in this module that first recognizes all the variables and then checks which variables are not initialized. The output lists each variable name, its line number and its status, that is, whether it is assigned or unassigned.
• Maximum depth of loops: The depth of loops greatly affects the complexity of a function. As the depth increases, the complexity is multiplied by the upper limit of the inner loops, so the execution time increases. Greater complexity also increases the number of test cases required to execute each and every path. In this module we have implemented a pushdown automaton to identify the maximum depth of loops. If the depth is greater than three it reports the possibility of a computation hazard, and if the depth is less than three the depth is tolerable.
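The maximum depth check can be sketched in C++ with an explicit stack playing the role of the pushdown store. This is a simplified illustration and not the module's actual code: it assumes that every loop body is enclosed in braces, that the source contains no comments or string literals holding keywords or braces, and it does not count do-while loops. Treating a depth of exactly three as not flagged is also an assumption, since the text above only covers depths greater than and less than three.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Returns the maximum nesting depth of brace-enclosed 'for' and 'while' loops.
int maxLoopDepth(const std::string& source) {
    std::vector<bool> stack;   // one entry per open '{': true if it opens a loop body
    bool pendingLoop = false;  // a loop keyword was seen and its '{' has not arrived yet
    int parenDepth = 0;
    int depth = 0;
    int maxDepth = 0;
    std::size_t i = 0;
    while (i < source.size()) {
        const unsigned char c = static_cast<unsigned char>(source[i]);
        if (std::isalpha(c) || c == '_') {
            // Read a whole identifier or keyword.
            const std::size_t start = i;
            while (i < source.size() &&
                   (std::isalnum(static_cast<unsigned char>(source[i])) || source[i] == '_')) {
                ++i;
            }
            const std::string word = source.substr(start, i - start);
            if (word == "for" || word == "while") pendingLoop = true;
            continue;
        }
        if (c == '(') {
            ++parenDepth;
        } else if (c == ')') {
            if (parenDepth > 0) --parenDepth;
        } else if (c == ';' && parenDepth == 0) {
            pendingLoop = false;  // loop without braces, or the tail of a do-while
        } else if (c == '{') {
            stack.push_back(pendingLoop);
            if (pendingLoop) {
                ++depth;
                if (depth > maxDepth) maxDepth = depth;
            }
            pendingLoop = false;
        } else if (c == '}' && !stack.empty()) {
            if (stack.back()) --depth;
            stack.pop_back();
            assert(depth >= 0);
        }
        ++i;
    }
    return maxDepth;
}

// Assumed reading of the rule: only depths greater than three are reported.
bool exceedsDepthLimit(int depth) { return depth > 3; }
```

For the input "for(i=0;i<n;i++){ while(x){ } }" the sketch reports a depth of 2, which is below the limit.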

8.4 Testing
Testing has been discussed in so much detail earlier that it is unnecessary to explain the techniques we used for testing our project. We tested each analysis module on different functions and matched the output with results calculated manually. We tested each module on test cases covering the whole range of inputs, and tested loops on their boundary values.
