Why is Testing Necessary?

Testing is necessary because the existence of faults in software is inevitable. Beyond fault detection, the modern view of testing holds that fault prevention (e.g. early detection and removal of faults from requirements, designs etc. through static tests) is at least as important as detecting faults in software by executing dynamic tests.

1.1. What are Errors, Faults, Failures, and Reliability?

1.1.1. An Error is…

A human action producing an incorrect result
The error is the activity undertaken by an analyst, designer, developer, or tester whose outcome is a fault in the deliverable being produced.

When programmers make errors, they introduce faults to program code
We usually think of programmers when we mention errors, but anyone involved in the development activities can make an error, which injects a fault into a deliverable.

1.1.2. A Fault is…
A manifestation of human error in software
A fault in software is caused by an unintentional action by someone building a deliverable. We normally think of programmers when we talk about software faults and human error. Human error causes faults in any project deliverable, but only faults in software cause software to fail. This is the most familiar situation.

Faults may be caused by requirements, design or coding errors
All software development activities are prone to error. Faults may occur in all software deliverables when they are first being written or when they are being maintained.

Software faults are static - they are characteristics of the code they exist in
When we test software, it is easy to believe that the faults in the software move. They do not: software faults are static. Once injected into the software, they remain there until exposed by a test and fixed.

1.1.3. A Failure is…

A deviation of the software from its expected delivery or service
Software fails when it behaves in a different way than we expect or require. If we use the software properly and enter data correctly into the software but it behaves in an unexpected way, we say it fails. Software faults cause software failures when the program is executed with a set of inputs that exposes the fault.

A failure occurs when software does the 'wrong' thing
We can say that if the software does the wrong thing, then the software has failed. This is a judgement made by the user or tester. You cannot tell whether software fails unless you know how the software is meant to behave. This might be explicitly stated in requirements or you might have a sensible expectation that the software should not 'crash'.

1.1.4. Reliability is…

The probability that software will not cause the failure of a system for a specified time under specified conditions
It is usually easier to consider reliability from the point of view of a poor product.
One could say that an unreliable product fails often and without warning and lets its users down. However, this is an incomplete view. If a product fails regularly, but the users are unaffected, the product may still be deemed reliable. If a product fails only very rarely, but it fails without warning and brings catastrophe, then it might be deemed unreliable.

Software with faults may be reliable, if the faults are in code that is rarely used
If software has faults it might still be reliable because the faulty parts of the software are rarely or never used - so it does not fail. A legacy system may have hundreds or thousands of known faults, but these exist in parts of the system of low criticality so the system may still be deemed reliable by its users.

1.2. Why do we test?

1.2.1. Some informal reasons

• To ensure that a system does what it is supposed to do
• To assess the quality of a system
• To demonstrate to the user that a system conforms to requirements
• To learn what a system does or how it behaves.

1.2.2. A technician's view

• To find programming mistakes
• To make sure the program doesn't crash the system

1.3. Errors and how they occur

1.3.1. Imprecise capture of requirements
Imprecision in requirements produces the most expensive faults we encounter. Imprecision takes the form of incompleteness, inconsistencies, lack of clarity, ambiguity etc. Faults in requirements are inevitable, however, because requirements definition is a labour-intensive and error-prone process.

1.3.2. Users cannot express their requirements unambiguously
When a business analyst interviews a business user, it is common for the user to have difficulty expressing requirements because their business is ambiguous. The normal daily workload of most people rarely fits into a perfectly clear set of situations. Very often, people need to accommodate exceptions to business rules and base decisions on gut feel and precedents which may be long standing (but undocumented), or make a decision 'on the fly'. Many of the rules required are simply not defined or documented anywhere.

1.3.3. Users cannot express their requirements completely
It is unreasonable to expect the business user to be able to identify all requirements. Many of the detailed rules that define what the system must do are not written down. They may vary across departments. In any case, the user being interviewed may not have experience of all the situations within the scope of the system.

1.3.4. Developers do not fully understand the business
Few business analysts, and very few developers, have direct experience of the business process that a new system is to support. It is unreasonable to expect the business analyst to have enough skills to question the completeness or correctness of a requirement. Underpinning all this is the belief that users and analysts talk the same language in the first place, and can communicate.

1.4. Cost of a single fault

We know that all software has faults before we test it. Some faults have a catastrophic effect but we also know that not all faults are disastrous and many are hardly noticeable.

1.4.1. Programmer errors may cause faults which are never noticed
It is clear that not every fault in software is serious. We have all encountered problems with software that cause us great alarm or concern. But we have also encountered faults for which there is a workaround, or which are obvious, but of negligible importance. For example, a spelling mistake on a user screen which our customers never see and which has no effect on functionality may be deemed 'cosmetic'. Some cosmetic faults are trivial. However, in some circumstances, cosmetic may also mean serious. What might our customers think if we spelt 'quality' incorrectly on our Web site home page?

1.4.2. If we are concerned about failures, we must test more
If a failure of a certain type would have serious consequences, we need to test the software to ensure it doesn't fail in this way. The principle is that where the risk of software failure is high, we must apply more test effort. There is a straight trade-off between the cost of testing and the potential cost of failure.

1.5. Exhaustive testing

1.5.1. Exhaustive testing of all program paths is usually impossible
Exhaustive path testing would involve exercising the software through every possible program path. However, even 'simple' programs have an extremely large number of paths. Every decision in code with two outcomes effectively doubles the number of program paths. A 100-statement program might have twenty decisions in it, so might have 2^20 = 1,048,576 paths. Such a program would rightly be regarded as trivial compared to real systems that have many thousands or millions of statements. Although the number of paths may not be infinite, we can never hope to test all paths in real systems.

1.5.2. Exhaustive testing of all inputs is also impossible
If we disregard the internals of the system and approach the testing from the point of view of all possible inputs and testing these, we hit a similar barrier. We can never hope to test the effectively infinite number of inputs to real systems.

1.5.3. If we could do exhaustive testing, most tests would be duplicates that tell us nothing
Even if we used a tool to execute millions of tests, we would expect that the majority of the tests would be duplicates and they would prove nothing. Consequently, test case selection (or design) must focus on selecting the most important or useful tests from the infinite number possible.
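The growth described in 1.5.1 and 1.5.2 can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names, the assumption that every decision is an independent two-way branch with no loops, the example of two 32-bit integer inputs and the rate of one million tests per second are all assumptions made for this example, not figures from the course material.

```python
# Illustrative arithmetic only; names and rates are assumptions for the example.

SECONDS_PER_YEAR = 60 * 60 * 24 * 365


def path_count(decisions: int) -> int:
    """Paths through code where each two-way decision doubles the total.

    Assumes independent decisions in sequence and no loops.
    """
    return 2 ** decisions


def years_to_run(tests: int, tests_per_second: int) -> float:
    """Time needed to execute `tests` at a constant (assumed) rate."""
    return tests / (tests_per_second * SECONDS_PER_YEAR)


# The 100-statement example with twenty decisions: 2**20 = 1,048,576 paths.
small_program_paths = path_count(20)

# Input view: a hypothetical function taking two 32-bit integers
# has 2**32 * 2**32 = 2**64 distinct input pairs.
input_pairs = (2 ** 32) * (2 ** 32)

# Even at an assumed one million tests per second this takes
# hundreds of thousands of years.
years_needed = years_to_run(input_pairs, tests_per_second=1_000_000)

print(f"Paths for 20 decisions: {small_program_paths:,}")
print(f"Input pairs for two 32-bit integers: {input_pairs:,}")
print(f"Years to try every pair: {years_needed:,.0f}")
```

Real programs contain loops and dependent decisions, so the doubling rule is only a rough guide, but the conclusion is the same: selection, not enumeration, is the only practical option.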
1.6. Effectiveness and efficiency
A test that exercises the software in ways that we know will work proves nothing
We know that if we run the same test twice we learn very little the second time round. If we know before we run a test that it will almost certainly work, we learn nothing. If we prepare a test that explores a new piece of functionality or a new situation, we know that if the test passes we will learn something new - we have evidence that something works. If we test for faults in code and we try to find faults in many places, we increase our knowledge about the quality of the software. If we find faults, we can fix them. If we do not find faults, our confidence in the software increases.

Effective tests
When we prepare a test, we should have some view on the type of faults we are trying to detect. If we postulate a fault and look for that, it is likely we will be more effective. In other words, tests that are designed to catch specific faults are more likely to find faults and are therefore more effective.

Efficient tests
If we postulate a fault and prepare a test to detect that, we usually have a choice of tests. We should select the test that has the best chance of finding the fault. Sometimes, a single test could detect several faults at once. Efficient tests are those that have the best chance of detecting a fault.

1.7. Risks help us to identify what to test

The principle here is that we look for the most significant and likely risks and use these to identify and prioritise our tests.

We identify the most dangerous risks of the system
Risks drive our testing. The more typical risks are:

(1) Gaps in functionality may cost users their time. An obvious risk is that we may not have built all the required features of the system. Some gaps may not be important, but others may badly undermine the acceptability of the system. For example, if a system allows customer details to be created but never amended, this would be a serious problem if customers moved location regularly.

(2) Poor design may make software hard to use. For some applications, ease of use is critical. For example, on a web site used to take orders from household customers, we can be sure that few have had training in the use of the Net or, more importantly, our web site. So, the web site MUST be easy to use.

(3) Incorrect calculations may cost us money. If we use software to calculate balances for customer bank accounts, our customers would be very sensitive to the problem of incorrect calculations. Consequently, tests of such software would be very high in our priorities.

(4) Software failure may cost our customers money. If we write software and our customers use that software to, say, manage their own bank accounts then, again, they would be very sensitive to incorrect calculations so we should of course test such software thoroughly.

(5) A wrong software decision may cost a life. If we write software that manages the control surfaces of an airliner, we would be sure to test such software as rigorously as we could, as the consequences of failure could be loss of life and injury.

We want to design tests to ensure we have eliminated or minimised these risks. We use testing to address risk in two ways. Firstly, we aim to detect the faults that cause the risks to occur. If we can detect these faults, they can be fixed and retested, and the risk is eliminated or at least reduced. Secondly, if we can measure the quality of the product by testing and fault detection, we will have gained an understanding of the risks of implementation, and be better able to decide whether to release the system or not.

1.8. Risks help us to determine how much we test

We can evaluate risks and prioritise them
Normally, we would convene a brainstorming meeting, attended by the business and technical experts. From this we identify the main risks and prioritise them as to which are most likely to occur and which will have the greatest impact. What risks conceivably exist? These might be derived from past or current experience. Which are probable, so we really ought to consider them? The business experts need to assess the potential impact of each risk in turn. The technical experts need to assess the potential impact of each risk. If the technical risk can be translated into a business risk, the business expert can then assign a level of impact.

For each risk in turn, we identify the tests that are most appropriate. That is, for each risk, we select system features and/or test conditions that will either demonstrate that a particular fault that causes the risk is not present, or expose the fault so that the risk can be reduced.

We never have enough time to test everything so...
The inventory of risks is prioritised and used to steer decision making on the tests that are to be prepared.

We test more where the risk of failure is higher. Tests that address the most important risks will be prioritised higher.
We test less where the risk of failure is lower. Tests that do not address any identified risk or address low priority risks may be de-scoped.

Ultimately, the concept of risks helps us to ensure the most important tests are implemented in our limited budget. Only in this way can we achieve a balanced test approach.

1.9. Testing and quality

Testing and quality are obviously closely related. Testing can measure the quality of a product and, indirectly, improve its quality.

Testing measures quality
Testing is a measurement activity. Testing gives us an insight into how closely the product meets its specification, so it provides an objective measure of its fitness for purpose. If we assess the rigour and number of tests and if we count the number of faults found, we can make an objective assessment of the quality of the system under test.

Testing improves quality
When we test, we aim to detect faults. If we do detect faults, then these can be fixed and the quality of the product can be improved.

1.10. Testing and confidence

We know that if we run tests to detect faults and we find faults, then the quality of the product can be improved. However, if we look for faults and do not find any, our confidence is increased.

If we buy a software package
Although our software supplier may be reputable and have a good test process, and we would normally assume that the product works, we would always test the product to give us confidence that we really are buying a good product. We may believe that a package works, but a test gives us the confidence that it will work.

When we buy a car, cooker, off-the-peg suit
When we buy mass produced goods, we normally assume that they work, because the product has probably been tested in the factory. For example, a new car should work, but before we buy we would always give the car an inspection, a test drive and ask questions about the car's specification - just to make sure it would be suitable.
Essentially, we assume that mass produced goods work, but we need to establish whether they will work for us.

When we buy a kitchen, haircut, bespoke suit
For some products, we are involved in the requirements process. If we had a kitchen designed, we know that although we were involved in the requirements, there are always some misunderstandings and some problems due to the imperfections of the materials, our location and the workmanship of the supplier. So, we would wish to be kept closely informed of progress and monitor the quality of the work throughout. To recap, if we were involved in specifying or influencing the requirements, we need to test.

1.11. Testing and contractual requirements

Testing is normally a key activity that takes place as part of the contractual arrangement between the supplier and user of software. Acceptance test arrangements are critical and are often defined in their own clause in the contract. Acceptance test dates represent a critical milestone and have two purposes: to protect the customer from poor products and to provide the supplier with the necessary evidence that they have completed their side of the bargain. Large sums of money may depend on the successful completion of acceptance tests.

• When we buy custom-built software, a contract will usually state
  o the requirements for the software
  o the price of the software
  o the delivery schedule and acceptance process
• We don't pay the supplier until we have received and acceptance tested the software
• Acceptance tests help to determine whether the supplier has met the requirements.

1.12. Testing and other requirements

Software requirements may be imposed:
There are other important reasons why testing may figure prominently in a project plan. Some industries, for example, financial services, are heavily regulated and the regulator may impose rigorous conditions on the acceptability of systems used to support an organisation's activities.
Some industries may self-regulate, others may be governed by the law of the land. The Millennium bug is an obvious example of a situation where customers may insist that a supplier's product is compliant in some way, and may insist on conducting tests of their own. For some software, e.g., safety-critical, the type and amount of testing, and the test process itself, may be defined by industry standards.
On almost all development or migration projects, we need to provide evidence that a software product is compliant in one way or another. It is, by and large, the test records that provide that evidence. When project files are audited, the most reliable evidence that supports the proposition that software meets its requirements is derived from test records.

1.13. Types of faults in a system

Fault Type                           %
Requirements                       8.1
Features and functionality        16.2
Structural Bugs                   25.2
Data                              22.4
Implementation and Coding          9.9
Integration                        9.0
System, Software Architecture      1.7
Test Definition and Execution      2.8
Other, Unspecified                 4.7

This table is derived from Beizer's Software Testing Techniques book. It demonstrates the relative frequency of faults in software. Around 25% of bugs are due to 'structure'. These are normally wrong or imprecise decisions made in code. Often programmers concentrate on these. There are significant percentages of other types. Most notable is that 8% are requirements faults. We know that these are potentially the most expensive because they could cost more than the rest of the faults combined.

The value of categorising faults is that it helps us to focus our testing effort where it is most important. We should have distinct test activities that address the problems of poor requirements, structure, integration etc. In this way, we will have a more effective and efficient test regime.

2. Cost and Economics of Testing

2.1. Life Cycle costs of testing

Whole Lifecycle
  Initial Development (20%)    Testing 50%
  Maintenance (80%)            Testing 75%
  Testing = roughly 70% of the whole lifecycle cost (20% × 50% + 80% × 75%)

The split of costs described in the table is a great generalisation. Suffice to say that the cost of testing in most commercial system development is between 40 and 60%. This includes all testing such as reviews, inspections and walkthroughs, programmer and private testing as well as more visible system and acceptance tests. The percentage may be more (or less) in your environment, but the important issue is that the cost of testing is very significant.

Once deployed in production, most systems have a lifespan of several years and undergo repeated maintenance. Maintenance in many environments could be considered to be an extended development process. The significance of testing increases dramatically, because changing existing software is error-prone and difficult, so testing to explore the behaviour of existing software and the potential impact of changes takes very much longer. In higher integrity environments, regression testing may dominate the budget.

The consequence of all this is that over the entire life of a product, testing costs may dominate all development costs.

2.2. Economics of testing

The trick is to do the right amount of the right kind of testing.

Too much testing is a waste of money
Doing more testing than is appropriate is expensive and likely to waste money because we are probably duplicating effort.

Too little is costly
Doing too little testing is costly, because we will leave faults in the software that may cost our business users dearly. The faults may cost more than the testing effort that could have removed them.

Even worse is the wrong kind of testing
Not only do we waste money by doing too much testing in some areas; by doing too little in other areas, we might miss faults that could cost us our business.

2.3. Influences on the economics of testing

How much does testing cost? If we are to fit the right amount of testing into our development budget, we need to know what influences these costs.

Degree of risk to be addressed
Obviously, if the risk of failure is high, we are more likely to spend more time testing. We would spend little time testing a macro which helped work out car mileage for expenses. We might check the results of a single test and think "that sounds about right". If we were to test software upon which our life depended, for example an aeroplane control system, we are much more likely to commit a lot of time to testing to ensure it works correctly.

Efficiency of the test process
As with all development activities, there are efficient and inefficient ways to perform tasks. Efficient tests are those which exercise all the diverse features of the software in a large variety of situations. If each test is unique, it is likely to be a very efficient test. If we simply hire people to play with some software, if we don't give them guidance and don't adopt a systematic approach, it is unlikely that we will cover all the software or situations we need to without hiring a large number of people to run tests. This is likely to be very inefficient and expensive.

Level of automation
Many test activities are repetitive and simple. Test execution is particularly amenable to automation by a suitable tool. Using a tool, tests can be run faster, more reliably and more cheaply than people can ever run them.

Skill of the personnel
Skilled testers adopt systematic approaches to organisation, planning, preparation and execution of tests. Unskilled testers are disorganised, ineffective and inefficient. And expensive too.

The target quality required
If quality is defined as 'fitness for purpose', we test to demonstrate that software meets the needs of its users and is fit for purpose. If we must be certain that software works in every way defined in the requirements, we will probably need to prepare many more tests to explore every piece of defined functionality in very detailed ways.

2.4. How much Testing is enough?

There are an infinite number of tests we could apply and software is never perfect
We know that it is impossible (or at least impractical) to plan and execute all possible tests. We also know that software can never be expected to be perfectly fault-free (even after testing). If 'enough' testing were defined as 'when all the faults have been detected', we obviously have a problem - we can never do 'enough'.

So how much testing is enough?
So is it sensible to talk about 'enough' testing?

Objective coverage measures can be used
There are objective measures of coverage (targets) that we can arbitrarily set, and meet. These are normally based on the traditional test design techniques (see later). Test design techniques give an objective target. The test design and measurement techniques set out coverage items and then tests can be designed and measured against these. Using these techniques, arbitrary targets can be set and met.

Standards may impose a level of testing
Some industries have industry-specific standards. DO-178B is a standard for airborne software, and mandates stringent test coverage targets and measures.

But all too often, time is the limiting factor
The problem is that for all but the most critical developments, even the least stringent test techniques may generate many more tests than are possible or acceptable within the project budget available. In many cases, testing is time limited. Ultimately, even in the highest integrity environments, time limits what testing can be done.

We may have to rely on a consensus view to ensure we do at least the most important tests. Often the test measurement techniques give us an objective 'benchmark', but possibly there will be an impractical number of tests, so we usually need to arrive at an acceptable level of testing by consensus.
It is an important role for the tester to provide enough information on risks and the tests that address these risks so that the business and technical experts can understand the value of doing some tests while understanding the risks of not doing other tests. In this way, we arrive at a balanced test approach.

2.5. Where are the bugs?

Of course, if we knew that, we could fix them and go home!
What a silly question! If we knew where the bugs were, we could simply fix each one in turn and perfect the system. We can't say where any individual fault is, but we can make some observations on, say, a macroscopic level.

Experience tells us…
Experience tells us a number of things about bugs.

Bugs are sociable! - they tend to cluster
Suppose you were invited into the kitchen in a restaurant. While you are there, a large cockroach scurries across the floor and the chef stamps on it and kills it saying "I got the bug". Would you still want to eat there? Probably not. When you see a bug in this context we say "it's infested". It's the same with software faults. Experience tells us that bugs tend to cluster, and the best place to find the next bug is in the vicinity of the last one found.

Some parts of the system will be relatively bug-free
Off the shelf components are likely to have been tested thoroughly and used in many other projects. Bugs found in these components in production have probably been reported and corrected. The same applies to legacy system code that is being reused in a new project.

Bug fixing and maintenance are error-prone - 50% of changes cause other faults
Have you ever experienced the 'Friday night fix' that goes wrong? All too often, minor changes can disrupt software that works. Tracing the potential impact of changes to existing software is extremely difficult. Before testing, there is a 50% chance of a change causing a problem (a regression) elsewhere in existing software. Maintenance and bug-fixing are error-prone activities.

The principle here is that faults do not uniformly distribute themselves through software. Because of this, our test activities should vary across the software, to make the best use of testers' time.

2.6. What about the bugs we can't find?

If not in the business critical parts of the system - would the users care?
If we've tested the business critical parts of the software, we can say that the bugs that get through are less likely to be of great concern to the users.

If not in the system critical parts of the system - should be low impact
If we've tested the technically critical parts of the software, we can say that the bugs that get through are less likely to cause technical failures, so perhaps there's no issue there either. Faults should be of low impact.

If they are in the critical parts of the system
The bugs remaining in the critical part of the system should be few and far between. If bugs do get through and are in the critical parts of the software, at least we can say that this is the least likely situation as we will have eliminated the vast majority of such problems. Such bugs should be very scarce and obscure.

2.7. Balancing cost and risk

Can always do more testing - there is no upper limit
Even for the simplest systems, we know that there are an infinite number of tests possible. There is no upper limit on the number of tests we could run.

Ultimately, time and cost limit what we can do
It is obvious we have to limit the amount of testing because our time and money are limited. So we must look for a balance between the cost of doing testing and the potential or actual risks of not testing.

Need to balance:
We need to balance the cost of doing testing against the potential cost of risk. It is reasonably easy to set a cost or time limit for the testing. The difficult part is balancing this cost against a risk. The potential impact of certain risks may be catastrophic and totally unacceptable at any cost. However, we really need to take a view on how likely the risks are. Some catastrophic failures may be very improbable. Some minor failures may be very common but be just as serious if they happen too often. In either case, a judgement on how much testing is appropriate must be made.

2.8. Scalability

Scalability in the context of risk and testing relates to how we do the right amount of the right kind of testing. Not all systems can or should be tested as thoroughly as is technically possible. Not every system is safety-critical. In fact the majority of systems support relatively low-criticality business processes. The principle must be that the amount of testing must be appropriate to the risks of failure in the system when used in production.
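One way to make "appropriate to the risks" concrete is to score each risk for likelihood and impact and share out a fixed test budget accordingly. The sketch below is only an illustration of that idea: the 1 to 5 scales, the rule that exposure is likelihood multiplied by impact, the proportional split, the `Risk` and `allocate_effort` names and the example scores are all assumptions made here, not a method prescribed by the course.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (negligible) to 5 (catastrophic)

    @property
    def exposure(self) -> int:
        # Assumption: exposure = likelihood x impact, a common rule of thumb.
        return self.likelihood * self.impact


def allocate_effort(risks, budget_days):
    """Rank risks by exposure and split a fixed budget in proportion to it."""
    total = sum(r.exposure for r in risks)
    ranked = sorted(risks, key=lambda r: r.exposure, reverse=True)
    return [(r.name, r.exposure, budget_days * r.exposure / total) for r in ranked]


# Hypothetical scores of the kind a risk workshop might produce.
risks = [
    Risk("Spelling mistake on an internal screen", likelihood=4, impact=1),
    Risk("Incorrect account balance calculation", likelihood=3, impact=5),
    Risk("Customer details cannot be amended", likelihood=2, impact=4),
]

plan = allocate_effort(risks, budget_days=27)
for name, exposure, days in plan:
    print(f"{name}: exposure {exposure}, {days:.1f} days")
```

A score like this only informs the judgement; it does not replace it. A very improbable but catastrophic risk may still deserve testing that a simple product of two numbers would not allocate.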
Not all systems, sub-systems or programs require the same amount of testing
It is obviously essential that testing is thorough when we are dealing with safety critical software. We must obviously do as much as possible. Low criticality systems need testing too, but how much testing is reasonable in this circumstance? The right amount of testing needs to be determined by consensus. Will the planned test demonstrate to the satisfaction of the main stakeholders that the software meets its specification, that it is fault free?

Standards and procedures have to be scalable
The risks, timescales and cost, and the quality required govern the amount and type of testing that should be done. Standards and procedures, therefore, must be scalable depending on these factors. Our test approach may be unique to today's project, but we normally have to reuse standard procedures for test planning, design and documentation.

Within your organisation, there may be a single methodology for all system development, but it is becoming more common for companies to adopt flexible development methodologies to accommodate the variety in project scale, criticality and technology. It is less common for those organisations to have flexible test strategies that allow the tester to scale the testing and documentation in a way that is consistent with the project profile. A key issue in assessing the usefulness of a test strategy is its flexibility and the way it copes with the variety in software projects.

The principal means by which we can scale the amount of testing is to adopt some mechanism by which we can measure coverage. We select a coverage measure to define a coverage target and to measure the amount of testing done against that target to give us an objective measure of thoroughness and progress.

Fundamental Test Process

3. Testing Process

3.1. What is a test?

A test is a controlled exercise involving:
What is a test?
Do you remember the biology or physics classes you took when you were 13 or 14? You were probably taught the scientific method where you have a hypothesis, and to demonstrate the hypothesis is true (or not) you set up an experiment with a control and a method for executing a test in a controlled environment. Testing is similar to the controlled experiment. (You might call your test environment and work area a test 'lab'). Testing is a bit like the experimental method for software. You have an object under test that might be a piece of software, a document or a test plan. The test environment is defined and controlled. You define and prepare the inputs - what we’re going to apply to the software under test.
You also have a hypothesis, a definition of the expected results. So, those are the absolute fundamentals of what a test is. You need those four things.
When a test is performed you get an actual outcome
Have you ever been asked to test without requirements or asked to test without having any software? It's not very easy to do, is it? When you run a test, you get an actual outcome. The outcome is normally some change of state of the system under test and outputs (the result). Whatever happens as a result of the test must be compared with the expected outcome (your hypothesis). If the actual outcome matches the expected outcome, your hypothesis is proven. That is what a test is.
3.2. Expected results
When we run a test, we must have an expected result derived from the baseline
Just like a controlled experiment, where a hypothesis must be proposed in advance of the experiment taking place, when you run a test, there must be an expected outcome defined beforehand. If you don't have an expected result, you have nothing to compare the software's behaviour with. There is a risk that the software does what it does and you simply assume that it works correctly, because you have no way of saying whether it is correct or incorrect. Boris Beizer (ref) suggests watching an eight-year-old play pool: they put the cue ball on the table, address it, hit it as hard as they can, and if a ball goes in a pocket, the kid will say, "I meant that". Does that sound familiar? What does a professional pool player do? A pro will say, "xxx ball in the yyy pocket". They address the cue ball, hit it, and if the ball goes in, they will say, "I meant that", and you believe them. It’s the same with testing.
A kiddie tester will run some tests and say “that looks okay” or “that sounds right…”, but there will be no comparison with an expected result: there is no hypothesis. Too often, we are expected to test without a requirement or an expected result. You could call it 'exploratory testing' but strictly, it is not testing at all.
An actual result either matches or does not match the expected result
What we are actually looking for is differences between our expected result and the actual result. If there is a difference, the software may have failed and there may be a fault in it, so we should investigate. That is how we infer the existence of faults in the software.
3.3. What are the test activities?
Testing includes:
It is important to recognise that testing is not just the act of running tests. What are the testing activities then? Testing obviously includes the planning and scoping of the test, and this involves working out what you’re going to do in the test: the test objectives. Specification and preparation of test materials delivers the executable test itself. This involves working out test conditions and cases, and creating the test data, expected results and scripts themselves. Test execution involves actually running the test. Part of test execution is results recording: we keep records of actual test outcomes. Finally, throughout test execution, we are continually checking whether we have met our coverage target, our completion criteria.
The object under test need not be machine executable.
The other key point to be made here is that testing, as defined in this course, covers all activities for static and dynamic testing. We include inspections, reviews and walkthrough activities, so static tests are included here too. We'll go through the typical test activities in overview only.
3.4. Test planning
How the test strategy will be implemented
Test planning comes after test strategy.
Whereas a strategy would cover a complete project lifecycle, a test plan would normally cover a single test stage, for example system testing. Test planning normally involves deciding what will be done according to the test strategy but also should say how we’re going to do things differently from that strategy. The plan must state what will be adopted and what will be adapted from the strategy. Identifies, at a high level, the scope, approach and dependencies When we are defining the testing to be done – we identify the components to be tested. Whether it is a program, a subsystem, a complete system, an interfacing system, you may need additional infrastructure. If we’re testing a single component, we may need to have stubs and drivers and other scaffolding, other material in place to help us on a test. This is the basic scoping information defined in the plan.
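As a minimal sketch of the scaffolding mentioned above, assume a hypothetical component, price_with_tax, that depends on a tax-rate lookup which is not yet available. The stub stands in for the missing dependency, and the driver calls the component and compares each actual result with its expected result. All names and the 20% rate are illustrative assumptions, not part of any real system.

```python
def price_with_tax(net_price, rate_lookup):
    """Component under test (hypothetical): adds tax to a net price."""
    return round(net_price * (1 + rate_lookup("standard")), 2)


def stub_rate_lookup(category):
    """Stub: replaces the real rate service with a fixed, known answer."""
    return 0.20  # assumed rate, chosen only to make expected results easy


def driver():
    """Driver: feeds inputs to the component and checks expected results."""
    # Each case is (input, expected result), decided before the test runs.
    cases = [(100.00, 120.00), (0.00, 0.00), (19.99, 23.99)]
    results = []
    for net, expected in cases:
        actual = price_with_tax(net, stub_rate_lookup)
        results.append((net, expected, actual, actual == expected))
    return results


if __name__ == "__main__":
    for net, expected, actual, passed in driver():
        print(net, expected, actual, "PASS" if passed else "FAIL")
```

Because the stub always returns the same rate, any difference between actual and expected results points at the component rather than at its environment.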
Having identified what is to be tested, we would normally specify an approach to be taken for test design. We could say that testing is going to be done by users, left to themselves (a possible, but not very sophisticated approach) – or that formal test design techniques will be used to identify test cases and work that way. Finally, the approach should describe how testing will be deemed complete. Completion criteria (often described as exit or acceptance criteria) state how management can judge that the testing is completed. Very briefly, that’s what planning is about. 3.5. Test specification Test inventory (logical test design) With specification we are concerned with identifying, at the next level down from planning, the features of a system to be tested – described by the requirements that we would like to cover. For each feature, we would normally identify the conditions to test by using a test design technique. Tests are designed in this way to achieve the acceptance criteria. When we design the test, select the features to test, then identify test conditions, as we do this, we build up an inventory of test conditions and using the features and conditions inventory we can have enough detail to say that we've covered features, and exercised those features adequately. As we build up the inventory of test conditions, we might, for example find that there are 100 test conditions to exercise in our test. From the test inventory, we might estimate how long it will take to complete the test and execute it. It may be that we haven’t got enough time. The project manager says, "you’d like to do 100 tests, but we’ve only got time to do 60". So, part of the process of test specification must be to prioritise test conditions. We might go through the test inventory and label features and test conditions high, medium and low priority. So, test specification generates a prioritised inventory of test conditions. 
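A prioritised inventory can be sketched as a simple table of records. The example below is only an illustration under assumed data: the features, conditions, priorities and effort figures are invented, and the selection rule (take conditions in priority order while they still fit the time budget) is one possible policy, not a prescribed one.

```python
from dataclasses import dataclass

PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}


@dataclass
class Condition:
    feature: str
    condition: str
    priority: str        # "high", "medium" or "low"
    effort_hours: float  # estimated effort to prepare and run the test


def select_within_budget(inventory, budget_hours):
    """Pick conditions in priority order while they fit the time budget."""
    selected, remaining = [], budget_hours
    for item in sorted(inventory, key=lambda c: PRIORITY_ORDER[c.priority]):
        if item.effort_hours <= remaining:
            selected.append(item)
            remaining -= item.effort_hours
    return selected


# Hypothetical inventory entries, for illustration only.
inventory = [
    Condition("login", "valid user and password", "high", 1.0),
    Condition("login", "account locked after three failures", "medium", 2.0),
    Condition("reports", "empty date range", "low", 1.5),
    Condition("payments", "amount at credit limit", "high", 2.0),
]

if __name__ == "__main__":
    for item in select_within_budget(inventory, budget_hours=4.0):
        print(item.priority, item.feature, item.condition)
```

Summing effort_hours over the whole inventory gives the bottom-up estimate; comparing that total with the time available shows how far down the priority list the test can go.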
Because we know that when we design a test, we may not have time to complete the test, prioritisation is always part of specification.
Test preparation (test implementation)
From the inventory, we can derive the test scripts, the procedures, and the materials that we’re going to use to drive the testing itself. From the sequence of test steps and conditions, we can identify requirements for test data in the database and perhaps initial conditions or other environmental set-up. From the defined input data for the test cases we can then predict expected results. Test specification ends with the delivery of test scripts, including input data and expected results.
3.6. Test execution and recording
Tests follow the scripts, as defined
We go to the trouble of creating test scripts for the sole purpose of executing the test, and we should follow test scripts precisely. The intention is that we don’t deviate from the test script because all the decisions have been made up front.
Verify that actual results meet expected results
During test execution, we verify that actual results match the expected results.
Log test execution
As we do this, we log progress (test script passes and failures) and we raise incident reports for failures.
3.7. Test checking for completion
The test process as defined in BS7925-2, the standard for component testing, has been nominated as the standard process that tests should follow. This is reasonable for most purposes, as it is fairly high-level. The slight problem with it is the notion that every time you run a test, you must check whether you have met the completion criteria. With component level tests, this works fine, but with system testing it doesn’t work that way: you don’t want to have to say, “have I finished yet?” after every test case. In the standard process, there is a stage called Test Checking for Completion.
It is during this activity that we check whether we have met our completion criteria. Completion criteria vary with different test stages. In system and acceptance testing, we tend to require that the test plan has been completed without a failure. With component testing, we may be more driven by the coverage target, and we may have to create more and more tests to achieve our target.
• Objective, measurable criteria for test completion, for example
  o All tests run successfully
  o All faults found are fixed and re-tested
  o Coverage target (set and) met
  o Time (or cost) limit exceeded
• Coverage items defined in terms of
  o Requirements, conditions, business transactions
  o Code statements, branches.
Often, time pressure forces a decision to stop testing. Often, development slips and testing is ‘squeezed’ to ensure a timely delivery into production. This is a compromise but it may be that some faults are acceptable. When time runs out for testing, the decision to continue testing or to release the system forces a dilemma on the project. “Should we release the system early (on time), with faults, or not?” It is likely that if time runs out you may be left with the fact that some tests have failures and are still outstanding. Some tests you may not have run yet. So it is common that the completion criteria are compromised. If you do finish all of your testing and there is still time left over, you might choose to write some more tests, but this isn’t very likely. If you do run out of time, there is a third option: you could release the system, but continue testing to the end of the plan. If you find faults after release, you can fix them in the next package. You are taking a risk but there may be good reasons for doing so. However clear-cut the textbooks say completion criteria are, in practice it is not usually as clean. Only in high-integrity environments does testing continue until the completion criteria are met.
• Under time pressure in low integrity systems
  o Some faults may be acceptable (for this release)
  o Some tests may not be run at all
• If there are no tests left, but there is still time
  o Maybe some additional tests could be run
• You may decide to release the software now, but testing could continue.
3.8. Coverage
What we use to quantify testing
Testing is open ended - we can never be absolutely sure we have done the right amount, so we need at least to be able to set objective targets for the amount of testing to measure our progress against. Coverage is the term for the objective measures we use to define a target for the amount of testing required, as well as how we measure progress against that target.
Defines an objective target for the amount of testing to perform
We select a coverage measure to help us define an objective target for the amount of testing.
Measures completeness or thoroughness
As we prepare or execute tests, we can measure progress against that target to determine how complete or thorough our testing has been.
Drives the creation of tests to achieve a coverage target
The coverage target is usually based on some model of the requirements or the software under test.
The target sets out the required number of coverage items to be achieved. Most coverage measures give us a systematic definition of the way we must design or select tests, so we can use the coverage target and measure as a guide for test design. If we keep creating tests until the target is met, then we know the tests constitute a thorough and complete set of tests with respect to that measure.
Quantifies the amount of testing to make estimation easier
The other benefit of having objective coverage measures is that they generate low-level items of work that can have estimated effort assigned to them. Using coverage measures to steer the testing means we can adopt reasonable bottom-up estimation methods, at least for test design and implementation.
3.9. Coverage definitions
The good thing about coverage definitions is that we can often reduce the difficult decision of how much testing is appropriate to a selection of test coverage measures. Rather than say we will do a lot of testing, we can reduce an unquantifiable statement to a definition of the coverage measures to be used. For example, we can say that we will test a certain component by covering all branches in the code and all boundary values derived from the specification. This is a more objective target that is quantifiable. Coverage targets and measures are usually expressed as percentages. 100% coverage is achieved when all coverage items are exercised in a test.
Coverage measures - a model or method used to quantify testing (e.g. decision coverage)
Coverage measures are based on models of the software. The models represent an abstraction of the software or its specification. The model defines a technique for selecting test cases that is repeatable and consistent and can be used by testers across all application areas.
Coverage item - the unit of measurement (e.g. a decision)
Based on the coverage model, the fundamental unit of coverage, called a coverage item, can be derived.
From the definition of the coverage item, a comprehensive set of test cases can be derived from the specification (functional test cases) or from the code (structural test cases). Functional techniques Functional test techniques are those that use the specification or requirements for software to derive test cases. Examples of functional test techniques are equivalence partitioning, boundary value analysis and state transitions. Structural techniques.
Structural test techniques are those that use the implementation or structure of the built software to derive test cases. Examples of structural test techniques are statement testing, branch testing, linear code sequence and jump (LCSAJ) testing.
3.10. Structural coverage
There are over fifty test techniques that are based on the structure of code. Most are appropriate to third generation languages such as COBOL, FORTRAN, C, BASIC etc. In practice, only a small number of techniques are widely used, as tool support is essential to measure coverage and make the techniques practical.
Statement, decision, LCSAJ...
The most common (and simplest) structural techniques are statement and branch (also known as decision) coverage.
Measures and coverage targets based on the internal structure of the code
Coverage measures are based on the structure (the actual implementation) of the software itself. Statement coverage is based on the executable source code statements themselves. The coverage item is an executable statement. 100% statement coverage requires that tests be prepared which, when executed, exercise every executable statement. Decision testing depends on the decisions made in code. The coverage item is a single decision outcome and 100% decision coverage requires all decision outcomes to be covered.
Normal strategy:
The usual approach to using structural test techniques is as follows:
(1) Use coverage tool to instrument code. A coverage tool is used to pre-process the software under test. The tool inserts instrumentation code that has no effect on the functionality of the software under test, but logs the paths through the software when it is compiled and run through tests.
(2) Execute tests. Test cases are prepared using a functional technique (see later) and executed on the instrumented software under test.
(3) Use coverage tool to measure coverage. The coverage tool is then used to report on the actual coverage achieved during the tests. Normally, less than 100% coverage is achieved. The tool identifies the coverage items (statements, branches etc.) not yet covered.
(4) Enhance test to achieve coverage target. Additional tests are prepared to exercise the coverage items not yet covered.
(5) Stop testing when coverage target is met. When tests can be shown to have exercised all coverage items (100% coverage) no more tests need be created and run.
Note that 100% coverage may not be possible in all situations. Some software exists to trap exceptional or obscure error conditions and it may be very difficult to simulate such situations. Normally, this requires special attention or additional scaffolding code to force the software to behave in the way required. Often the 100% coverage requirement is relaxed to take account of these anomalies. Structural techniques are most often used in component or link test stages as some programming skills are required to use them effectively.
3.11. Functional coverage
There are fewer functional test techniques than structural techniques. Functional techniques are based on the specification or requirements for software. Functional test techniques do not depend on the code, so are appropriate for all software at all stages, regardless of the development technology.
Equivalence partitions, boundary values, decision tables etc.
The most common (and simplest) functional test techniques are equivalence partitioning and boundary value analysis. Other techniques include decision tables and state transition testing.
Measures based on the external behaviour of the system
Coverage measures are based on the behaviours described in the external specification. Equivalence partitioning is based on partitioning the inputs and outputs of a system and exercising each partition at least once to achieve coverage. The coverage item is an equivalence partition. 100% coverage requires that tests be prepared which, when executed, exercise every partition.
Boundary values are the extreme values for each equivalence partition. Test cases for every identified boundary value are required to achieve 100% boundary value coverage. Inventories of test cases based on functional techniques There are few tools that support functional test techniques. Those that do tend to require the specification or requirements documents to be held in a structured manner or even using a formal notation. Most commonly, a specification is analysed and tables or inventories of logical test cases are built up and comprise a test specification to be used to prepare test data, scripts and expected results.
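The following sketch shows how equivalence partitions, boundary values and a coverage percentage might fit together for one input. It assumes a hypothetical specification, an "age" field that accepts whole numbers from 18 to 65 inclusive; the partition names, limits and expected outcomes are invented for illustration.

```python
# Hypothetical specification: an "age" field accepts whole numbers 18..65.
# Each partition is (name, lowest value, highest value, expected outcome);
# None means the partition is unbounded on that side.
PARTITIONS = [
    ("below range", None, 17, "reject"),
    ("in range", 18, 65, "accept"),
    ("above range", 66, None, "reject"),
]


def partition_of(value):
    """Return the name of the partition that contains value."""
    for name, low, high, _ in PARTITIONS:
        if (low is None or value >= low) and (high is None or value <= high):
            return name
    raise ValueError(f"no partition for {value}")


def boundary_values():
    """Extreme values of each partition, where the partition has a bound."""
    values = set()
    for _, low, high, _ in PARTITIONS:
        values.update(v for v in (low, high) if v is not None)
    return sorted(values)


def partition_coverage(test_inputs):
    """Percentage of partitions exercised by at least one test input."""
    covered = {partition_of(v) for v in test_inputs}
    return 100.0 * len(covered) / len(PARTITIONS)


def inventory():
    """Logical test cases: one row per boundary value with expected result."""
    expected = {name: outcome for name, _, _, outcome in PARTITIONS}
    return [(v, partition_of(v), expected[partition_of(v)])
            for v in boundary_values()]


if __name__ == "__main__":
    for row in inventory():
        print(row)
    print(partition_coverage([40]), "% of partitions covered by one test")
```

Each row of the inventory is a logical test case (input value, partition, expected result) that can be counted, prioritised and later expanded into scripts and test data.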
The value of recording test cases in a tabular format is that it becomes easier to count and prioritise these test cases if the tester finds that too many are generated by the test technique. Using a test technique to analyse a specification, we can be confident that we have covered all the system behaviours and the full scope of functionality, at least as seen by the user. The techniques give us a powerful method to ensure we create comprehensive tests which are consistent in their depth of coverage of the functionality, i.e. we have a measure of the completeness of our testing.
3.12. Completion, closure, exit, or acceptance criteria
All the terms above represent criteria that we define before testing starts to help us to determine when to stop testing. We normally plan to complete testing within a pre-determined timescale, so that if things go to plan, we will stop preparing and executing tests when we achieve some coverage target. At least as often, however, we run out of time, and in these circumstances, it is only sensible to have some statement of intent to say what testing we should have completed before we stop. The decision to stop testing or continue can then be made against some defined criteria, rather than by 'gut feel'.
Trigger to say: "we've done enough"
The principle is that, given there is no upper limit on how much testing we could do, we must define some objective and rational criteria that we can use to determine whether 'we've done enough'.
Objective, non-technical for managers
Management may be asked to define or at least approve exit criteria, so these criteria must be understandable by managers. For any test stage, there will tend to be multiple criteria that, in principle, must be met before the stage can end. There should always be at least one criterion that defines a test coverage target. There should also be a criterion that defines a threshold beneath which the software will be deemed unacceptable.
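As an illustration of how such criteria can be checked objectively, the sketch below evaluates one coverage criterion and one acceptability threshold. The way the threshold is modelled here, as a maximum number of open faults per severity level, is an assumption made for the example; the function name, severity labels and figures are all hypothetical.

```python
def exit_criteria_met(coverage_pct, coverage_target_pct,
                      open_faults_by_severity, max_open_by_severity):
    """Return (decision, reasons) for a test stage against its exit criteria.

    decision is True only when the coverage target is reached and no
    severity level has more open faults than its agreed limit.
    """
    reasons = []
    if coverage_pct < coverage_target_pct:
        reasons.append(
            f"coverage {coverage_pct}% is below target {coverage_target_pct}%")
    for severity, limit in max_open_by_severity.items():
        open_count = open_faults_by_severity.get(severity, 0)
        if open_count > limit:
            reasons.append(
                f"{open_count} open {severity} faults exceed limit of {limit}")
    return (not reasons, reasons)


if __name__ == "__main__":
    decision, reasons = exit_criteria_met(
        coverage_pct=85, coverage_target_pct=90,
        open_faults_by_severity={"critical": 1, "minor": 4},
        max_open_by_severity={"critical": 0, "minor": 10})
    print("stop testing" if decision else "continue testing", reasons)
```

Returning the reasons alongside the decision gives managers a plain statement of which criterion has not been met, rather than a bare yes or no.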
Criteria should be measurable, as it is inevitable that some comparison of the target with reality must be performed. Criteria should also be achievable, at least in principle. Criteria that can never be achieved are of little value. Some typical types of criterion which are used regularly are listed below.
3.13. Limitations of testing
Many non-testers believe that testing is easy, that software can be tested until it is fault free, that faults are uniformly difficult (or easy) to detect. Testers must not only understand that there are limits to what can be achieved, but they must also be able to explain these limitations to their peers, developers, project manager and users.
Testing is a sampling activity, so can never prove 'mathematical' correctness
We know that testers can only run a small proportion of all possible tests. Testing is really a 'sampling' activity. We only ever scratch the surface of software in our tests. Because of this we can never be 100% or mathematically certain that all faults have been detected. It is a simple exercise to devise a new fault in software which none of our current tests would detect. In reality, faults appear in a pseudo-random way, so obscure or subtle faults are always likely to foil the best tester.
Always possible to create more tests so it is difficult to know when you are finished
Even when we believe we have done enough testing, it is relatively simple to think of additional tests that might enhance our test plan. Even though the test techniques give us a much more systematic way of designing comprehensive tests, there is never any guarantee that such tests find all faults. Because of this, testers are tempted into thinking that there is always another test to create, and so are 'never satisfied' that enough testing has been done and feel that they never have enough time to test. Given these limitations, there are two paradoxes which can help us to understand how we might better develop good tests and the limitations of our 'art'.
Testing paradoxes:
(1) The best way to gain confidence in software is to try and break it. The only way we can become confident in our software is for us to try difficult, awkward and aggressive tests. These tests are most likely to detect faults. If they do detect faults, we can fix the software and the quality of the software is increased. If they do not detect a fault, then our confidence in the software is increased. Only if we try and break the software are we likely to get the required confidence.
(2) You don't know how good your testing is until maybe a year after release. A big problem for testers is that it is very difficult to determine whether the quality or effectiveness of our testing is good or bad until after the software has gone into production. It is the faults that are found in production by users that give us a complete picture of the total number of bugs that should have been found. Only when these bugs have been detected can we derive a view on our test effectiveness. The more bugs found in testing, compared to production, the better our testing has been. The difficulty is that we might not get the true picture until all production bugs have been found, and that might take years!
3.14. The Psychology of Testing
Testers often find they are at odds with their colleagues. It can be counter-productive if developers think the testers are ‘out to get them’ or ‘are sceptical, nit-picking pedants whose sole aim is to hold up the project’. Less professional managers can convince testers that they do not add value or are a brake on progress.
3.14.1. Goal 1: make sure the system works – implications
A successful test shows a system is working
As in all professional activities, it is essential that testers have a clear goal to work towards. Let’s consider one way of expressing the goal of a tester: ‘making sure the system works’. If you asked a group of programmers ‘what is the purpose of testing?’, they’d probably say something like, ‘to make sure that the program works according to the specification’, or a variation on this theme. This is not an unreasonable or illogical goal, but there are significant implications to be considered. If your job as a tester is to make sure that a system works, the implication is that a successful test shows that the system is working.
Finding a fault undermines the effectiveness of testers
If ‘making sure it works’ is our goal, it undermines the job of the testers because it is de-motivating: it seems that the better we are at finding faults, the farther we get from our goal. It is also destructive because everyone in the project is trying to move forward, but the testers continually hold the project back. Testers become the enemy of progress and we aren’t ‘team players’. Under pressure, if a tester wants to meet their goal, the easiest thing to do is to prepare ‘easy’ tests, simply to keep the peace. The boss will then say ‘good job’. It is the wrong motivation because the incentive to a tester becomes don’t find faults, don’t rock the boat. If you’re not effective at finding faults, you can’t have confidence in the product, because you’ve never pushed it hard enough to have confidence. You won’t know whether the product will actually work.
Quality of released software will be low because:
If ‘making sure it works’ is our goal, then the quality of the released software will be low. Why? If our incentive is not to find faults, we are less likely to be effective at finding them. If it is less likely that we will find them, the number of faults remaining after testing will be higher and the quality of the software will be lower.
So, it’s bad news all around, having this goal.
3.14.2. Goal 2: locate faults
A successful test is one that locates a fault
What is a better goal? A better goal is to locate faults, to be error-centric or focus on faults and use that motivation to do the job. In this case, a successful test is one that finds a fault.
If finding faults is the testers' aim:
If finding faults is your aim, that is, you see your job as that of a fault detective, this is a good motivation because when you locate a fault, it is a sign that you are doing a good job. It is a positive motivation. It is constructive because when you find a fault, it won’t be found by the users of the product. The fault can be fixed and the quality of the product can be improved. Your incentive will now be to create really tough tests. If your goal is to find faults, and you try and don’t find any, then you can be confident that the product is robust. Testers should have a mindset which says finding faults is the goal. If the purpose of testing is to find faults, when faults are found, it might upset a developer or two, but it will help the project as a whole.
3.14.3. Tester mindset
Some years ago, there was a popular notion that testers should be put into “black teams”. Black teams were a popular idea in the late 1960s and early 1970s. If a successful test is one that locates a fault, the thinking went, then the testers should celebrate finding faults, cheering even. Would you think this was a good idea if you were surrounded by developers? Of course not. There was an experiment some years ago in IBM. They set up a test team, whom they called the 'black team' because these guys were just fiends. Their sole aim was to break software. Whatever was given to them to test, they were going to find faults in it. They developed a whole mentality where they were the ‘bad guys’. They dressed in black, with black Stetson hats and long false moustaches, all for fun. They really were the bad guys, just like the movies.
They were very effective at finding faults in everyone’s work products, and had great fun, but they upset everyone whose project they were involved in. They were most effective, but eventually were disbanded. Technically, it worked fine, but from the point of view of the organisation, it was counterproductive. The idea of a “black team” is cute, but keep it to yourself: it doesn’t help anyone if you crow when you find a fault in a programmer's code. You wouldn’t be happy if one of your colleagues tells you, your product is poor and laughs about it. It’s just not funny. The point to be made about all this is that the tester’s mindset is critical. Testers must have a split personality Testers need a split personality in a way. Perhaps you need to be more ‘mature’ than the developers. You have to be able to see a fault from both points of view. Pedantic, sceptical, nit-picking to software Some years ago, we were asked to put a slide together, saying who makes the best testers, and we thought and thought, but eventually, all we could think of was, they’ve got to be pedantic and sceptical and a nitpicker. Now, if you called someone a pedant, a sceptic, and a nitpicker, they’d probably take an instant dislike to you. Most folk would regard such a description as abusive because these are personal attributes that we don’t particularly like in other people, do we? These are the attributes that we should wear, as a tester, when testing the product. When discussing
failures with developers however, we must be much more diplomatic. We must trust the developers, but we doubt the product. Most developers are great people and do their best, and we have to get on with them – we’re part of the same team, but when it comes to the product, we distrust and doubt it. But we don’t say this to their faces. We doubt the quality of everything until we’ve tested it. Nothing works, whatever “works” means, until we’ve tested it. Impartial, advisory, constructive to developers: But we are impartial, advisory and constructive to developers. We are not against them, we are on the same team. We have to work with them, not against them. Because it is human nature to take a pride in their work and take criticism of their work personally, bear in mind this quote: ‘tread lightly, because you tread on their dreams’. If development slips and they are late, you can be assured that they’ve been put under a lot of pressure to deliver on time and that they’re working very long hours, and working very hard. Whether they’re being effective is another question, but they’ve been working hard to deliver something to you on time to test. So, when you find the bug, you don’t go up to them and say, this is a lot of rubbish – they are not going to be pleased. They are very emotionally attached to their own work, as we all are with our own work, our own creation. You have to be very careful about how you communicate problems with them. Be impartial; it is the product that is poor, not the person. You want to advise them – here are the holes in the road, we don’t want you guys to fall into. And be constructive – this is how we can get out of this hole. Diplomatic but firm. No, it’s not a feature, it’s a bug. The other thing is, if the developer blames you for the bug being there – you know, you didn’t put the bug in there, did you? Sometimes developers think that the bug wouldn’t be there if you didn’t test it. 
You know that psychology, ‘it wasn’t there until you tested it’. You have to strike quite a delicate balance: you’ve got to be able to play both sides of the game. In some ways, it’s like having to deal with a child. I don’t mean that developers are children, but you may be dealing blows to their emotions, so you have to be careful.
Retesting and Regression Testing: 3.15. Re-Testing
A re-test is a repeat of a test that the system failed on the last occasion you ran it, exposing a fault; you now repeat the same test to make sure that the fault has been properly corrected. This is called re-testing. We know that every test plan we’ve ever run has found faults in the past, so we must always expect and plan to do some re-testing.
Does your project manager plan optimistically?
Some project managers always plan optimistically. They ask the testers: “how long is the testing going to take?”. The tester replies, perhaps, “four weeks if it goes as well as possible…”, knowing that it should take twice as long because things do go wrong: you do find faults, and there are delays between finding a fault, fixing it, and re-testing. The project manager pounces on the ‘perfect situation’, and plans optimistically. Some project managers plan on the basis of never finding faults, which is absolutely crazy. We must always expect to do some re-testing.
• If we run a test that detects a fault we can get the fault corrected
• We then repeat the test to ensure the fault has been properly fixed
• This is called re-testing
• If we test to find faults, we must expect to find some faults so...
• We always expect to do some re-testing.
3.16. Regression testing
Regression testing is different from re-testing. We know that when we change software to fix a fault, there’s a significant possibility that we will break something else.
Studies over many years reveal that the probability of introducing a new fault during corrective maintenance is around 50%. The 50% probability relates to creating a new fault in the software before testing is done. Testing will reduce this figure dramatically, but it is unsafe and perhaps negligent not to test for these unwanted side-effects.
• When software is fixed, it often happens that 'knock-on' effects occur
• We need to check that only the faulty code has changed
• 50% chance of regression faults
• Regression tests tell us whether new faults have been introduced
o i.e. whether the system still works after a change to the code or environment has been made
"Testing to ensure a change has not caused faults in unchanged parts of the system"
A regression test is a check to make sure that when you make a fix to software the fix does not adversely affect other functionality. The big question, “is there an unforeseen impact elsewhere in the code?” needs to be answered. The need exists because fault-fixing is error-prone. It’s as simple as that. Regression tests tell you whether software that worked before the fix was made, still works. The last time that you ran a regression test, by definition, it did not find a fault; this time, you’re going to run it again to make sure it still doesn’t expose a fault. A more formal definition of regression testing is: testing to ensure a change has not caused faults in unchanged parts of the system.
Not necessarily a separate stage
Some people regard regression testing as a separate stage, but it’s not a separate stage from system/acceptance testing, for example, although a final stage in a system test might be a regression test. There is some regression testing at every test stage, right from component testing through to acceptance testing. Regression testing most important during maintenance activities Regression testing is most important where you have a live production system requiring maintenance. When users are committed to using your software, the most serious problem the users encounter that is worse than having a bug in new code (which they may not yet be dependent on), is having a bug in code that they’re using today and are dependent on. Users get most upset when you 'go backwards' - that is, a system that used to work, stops working. They may not mind losing a few weeks because you’re late with a new delivery. They do mind if you screw up the system they trust and are dependent on at the moment. Effective regression testing is almost always automated. Effective regression testing is almost always automated. Manual regression testing is boring, tedious and testers make too many errors themselves. If it's not automated, it is likely that the amount of regression testing being done is inadequate. More on tools later. 3.17.Selective regression tests An entire test may be retained for subsequent use as a regression test pack It is possible that you may, on a system test say, keep the entire system test plan and run it in its entirety as a regression test. This may be uneconomic or impractical But for most environments, keeping an entire system test for regression purposes is just too expensive. What normally happens is that the cost of maintaining a complete system test as a regression test pack is prohibitive. There will be so much maintenance to do on it because no software is static. Software always requires change, so regular changes are inevitable. 
Most organisations choose to retain between 10% and 20% of a test plan as the regression test pack. Regression tests should be selected to: Criteria for selecting these tests might be, for example, that they exercise the most critical or the most complex functionality. But also, it might be what is easiest to automate. A regression test does not necessarily need to exercise only the most important functionality. Many simple, lightweight regression tests might be just as valuable as a small number of very complex ones. If you have a GUI application, a regression test might just visit every window on the screen. A very simple test indeed, but it gives you some confidence that the developers haven’t screwed up the product completely. This is quite an important consideration. Selecting a regression test is all very well, but if you’re not going to automate it, it’s not likely to be run as often as you like. 3.18.Automating regression tests Some might say that manual regression tests are a contradiction in terms Manual regression testing is a contradiction in terms, but regression tests are selected on the basis that they cover perhaps the most stable parts of the software. Regression tests are the most likely to be stable and run repeatedly so: The tests that are easiest to automate are the ones that don’t find the bugs because you’ve run them once to completion. The problem with tests that did find bugs is that they cannot be automated so easily. The paradox of automated regression testing is that the tests that are easiest to automate are the tests that didn’t find faults the last time we ran them. So the tests we end up automating often aren't the best ones. Stable tests/software are usually easiest to automate. Even if we do have a regression test pack, life can be pretty tough, because the cost of maintenance can become a considerable overhead. It’s another one of the paradoxes of testing. 
Regression testing is easy to automate in a stable environment, but we need to create regression tests because the environment isn’t stable. We don’t want to have to rebuild our regression test every time that a new version of software comes along. We want to just run them, to flush out obvious inconsistencies within a system. The problem is that the reason we want to do regression testing is because there is constant change in our applications, which means that regression testing is hard, because we have to maintain our regression test packs in parallel with the changing system. 3.19.Expected Results We’ve already seen that the fundamental test process requires that an outcome (expected result) must be predicted before the test is run. Without an expected result the test cannot be interpreted as a pass or fail. Without some expectation of the behaviour of a system, there is nothing to compare the actual behaviour with, so no decision on success or failure can be made. This short section outlines the importance of baselines and expected results. 3.20.External specifications and baselines Specifications, requirements etc. define what the software is required to do
As a tester, I’m going to look at a requirements or a design document and identify what I need to test, the features that I’m going to have to exercise, and the behaviour that should be exhibited when running under certain conditions. For each condition that I’m concerned with, I want an expected result so that I can say whether the system passes or fails the test when I run it. Usually, developers look at a design specification, and work out what must be built to deliver the required functionality. They take a view on what the required features are. Then, they need to understand the rules that the feature must obey. Rules are normally defined as a series of conditions against which the feature must operate correctly, and exhibit the required behaviour. But what is the required behaviour? The developer infers the required behaviour from the description of the requirement and develops the program code from that. Without requirements, developers cannot build, testers cannot test Requirements, design documents, functional specifications or program specs are all examples of baselines. They are documents that tell us what a software system is meant to do. Often, they vary in levels of detail, technical language or scope, and they are all used by developers and testers. Baselines should provide not only all the information required to build a software system but also the information needed to test it. That is, baselines provide the information for a tester to demonstrate unambiguously that a system does what is required. Programmers need them to write the code It looks like the developer uses the baseline in a very similar way to the tester. They both look for features, then conditions and finally a description of the required behaviour. In fact, the early development thought process is exactly the same for both. Some developers might say that they use use-cases and other object-oriented methods but this reflects a different notation for the same thing. 
Overall, it’s the same sequence of tasks. What does this mean? It means that without requirements, developers cannot build software and testers cannot test. Getting the baseline right (and early) benefits everyone in the development and test process. What about poor baselines? These tend to be a bigger problem for testers than developers. Developers tend not to question baselines in the same way as testers. There are two mindsets at work but the impact of poor baselines can be dramatic. Developers do question requirements but they tend to focus on issues such as how easy (or difficult) it will be to build the features, what algorithms, system services, new techniques will be required? Without good statements of required behaviour developers can still write code because they are time-pressured into doing so and have time to question users personally or make assumptions. Testers need them to: How do testers use specifications? First they identify the features to be tested and then, for each feature, the conditions (the rules) to be obeyed. For every condition defined, there will usually be a different behaviour to be exhibited by the system and this is inferred from the description of the requirement. Testers have no independent definition of the behaviour of a system other than the system itself, so they have nothing to ‘test against’. By the time a system reaches system test, there is little time to recover the information required to plan comprehensive tests. The testers need them to identify the things that need testing and to compare test results with requirements. 3.21.Baseline as an oracle for required behaviour When we test we get an actual result A baseline is a generic term for the document used to identify the features to test and expected results. Whether it’s acceptance, system, integration or component testing, there should be a baseline. The baseline says what the software should do. 
We compare results with requirements to determine whether a test has passed From the baseline, you get your expected results, and from the test, you have your actual results. A baseline document describes how we require the system to behave The baseline tells you what the product under test should do. That’s all the baseline is. Sometimes the 'old system' tells us what to expect. In a conversion job, the baseline is the regression test. The baseline is where you get your expected results. The next point to be made is the notion of an oracle. An oracle (with a lowercase “o”) is a kind of ‘font of all knowledge’. If you ask the oracle a question, it gives you the answer. If you need to know what software should do, you go back to the baseline, and the baseline should tell you exactly what the software should do, in all circumstances. A test oracle tells you the answer to the question, ‘what is the expected result?’. If you’re doing a conversion job (consider the Year 2000 work you may have done), the old system gives you the oracle of what the new system must continue to do. You’re going to convert it without changing any functionality. You must make it ‘compliant’ without changing the behaviour of the software. 3.22.Expected results The concern about expected results is that we should define them before we run the tests. Otherwise, we’ll be tempted to say that, whatever the system does when we test it, we’ll pass the result as correct. That’s the risk. Imagine that you’re under pressure from the boss (‘don’t write tests…just do the testing…’). The pressure is immense, so it’s easier to not write anything down, to not think what the results should be, to run some informal tests and pass them as correct. Expected results, (even when good baselines aren’t available) should always be documented. • If we don't define expected result before we execute the test...
o A plausible, but erroneous, result may be interpreted as the correct result
o There may be a subconscious desire to see the software pass the test
Expected results must be defined before test execution, derived from a baseline
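As a rough illustration of this rule, the sketch below writes the expected results down before any test is executed and only then compares them with the actual results. The function premium_for_age, its rules and the figures in the expected-results table are hypothetical, invented purely for this example; they stand in for whatever system and baseline document you are really working with.

```python
# Hypothetical system under test: the function name and the pricing rules
# are assumptions made for this sketch only.
def premium_for_age(age):
    if age < 18:
        raise ValueError("applicant must be 18 or over")
    if age < 25:
        return 500
    if age < 65:
        return 300
    return 400


# Expected results, derived from the (imaginary) baseline document and
# recorded BEFORE any test is run. Each entry is (input, expected result).
EXPECTED_RESULTS = [
    (18, 500),
    (24, 500),
    (25, 300),
    (64, 300),
    (65, 400),
]


def run_tests(system, expected_results):
    """Run each test and compare the actual result with the predicted one."""
    outcomes = []
    for test_input, expected in expected_results:
        actual = system(test_input)
        verdict = "pass" if actual == expected else "fail"
        outcomes.append((test_input, expected, actual, verdict))
    return outcomes


if __name__ == "__main__":
    for test_input, expected, actual, verdict in run_tests(
        premium_for_age, EXPECTED_RESULTS
    ):
        print(f"input={test_input} expected={expected} actual={actual} -> {verdict}")
```

Because the table exists before execution, a plausible but erroneous actual result cannot quietly be accepted as correct: the verdict comes from the comparison, not from the tester's impression at the time. In a conversion job, the expected column could be captured from the old system instead of a document.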
4. Prioritisation of Tests We’ve mentioned coverage before, and we need to go into a little bit more detail on coverage. Were you ever given enough time to test? Probably not. So what happens when you do some initial work to specify a test and then estimate the effort required to complete the testing tasks? Normally, your estimates are too high, things need prioritisation and some tests will be ‘de-scoped’. This is entirely reasonable because we know that at some point the cost of testing must be balanced against the risk of release. 4.1. Test inventories, risk, and prioritisation There is no limit to how much testing we could do, so we must prioritise The principle is that we must adopt a prioritisation scheme for selecting some tests above others. As we start from highest priority and scan the tests in decreasing order of priority, there must be a point at which we reach the first test that is of too low a priority to be done. All tests of a lower priority still are de-scoped. How much testing should we do? Suppose we built an inventory of test cases and perhaps we had a total of a hundred tests. We might estimate from past experience that 100 tests will take 100 man days to complete. What does the Project Manager say? ‘You’ve only got 60 days to do the job.’ You’d better prioritise the tests and lose 40 or so to stay within budget. Suppose you had reviewed the priority of all of the test cases with users and technical experts, and you could separate tests that are in scope from those that are out of scope. As a tester, you might feel that the tests that were left in scope were just not enough. But what could you do? How do you make a case for doing more testing? It won’t help to say to the boss, ‘this isn’t enough’ - showing what is in the test plan will not convince. It is what is not in the test plan that will persuade the boss to reconsider. 
If you can describe the risk associated with the tests that will not be done, it will be much easier to make your case for more testing. In order to assess whether ‘the line’ has been drawn in the right place, you need to see what is above and below the threshold. The message is therefore: always plan to do slightly more testing than there is time for to provide evidence of where the threshold falls. Only in this way can you make a case for doing more testing. We must use risk assessment to help us to prioritise. How can we associate a risk with a test? Is it possible to associate a risk with each test? As testers we must try - if we can’t associate a risk with a test, then why bother with the test at all? So we must state clearly what the impact would be if a feature failed in some way: perhaps a measurable cost, perhaps an intangible one. Or would the failure be cosmetic, and of no consequence? Could we lose a customer? What is the (potential) cost of that? Project managers understand risk. Business users understand risk. They know what they don’t want to happen. Identifying the unpleasant consequences that could arise will help you to persuade management to allocate more resources. Alternatively, the management may say, ‘yes, we understand the risks of not testing, but these are risks we must take’. So, instead of a risk being taken unconsciously, the risk is being taken consciously. The managers have made a balanced judgement. 4.2. Test inventory and prioritisation To measure progress effectively, we must define the scope of the testing To measure progress effectively, we need to be able to define the scope of the testing in a form where coverage measurement can be applied. At the highest level, in system and acceptance test plans, we would normally define the features of the system to be tested and the tests to be implemented which will give us confidence that faults have been eliminated and the system has been thoroughly tested. 
Inventories of tests enable us to prioritise AND estimate Test inventories not only enable us to prioritise the tests to stay within budget, but they also enable us to estimate the effort required. Because inventories are documented in a tabular format, we can use the inventories to keep track of the testing that has been planned, implemented and executed while referencing functional requirements at a level which the user and system experts can understand. 4.3. Prioritisation of the tests Never have enough time The overriding reason why we prioritise is that we never have enough time, and the prioritisation process helps us to decide what is in and out of scope. First principle: to make sure the most important tests are included in test plans So, the first principle of prioritisation must be that we make sure that the most important tests are included in the test plans. That’s pretty obvious. Second principle: to make sure the most important tests are executed The second principle, however, is that we must make sure that the most important tests are run. If it turns out during test execution that we run out of time before the test plan is complete, we want to be sure that the most important tests, at least, have been run. So we must schedule the most important tests early to ensure that they do get run.
If tests reveal major problems, better find them early, to maximise time available to correct problems. There is also a most important benefit of running the most important tests first. If the most important tests reveal problems early on, you have the maximum amount of time to fix them and recover the project. 4.4. Most important tests Most important tests are those that: What do we mean by the most important tests? The most important tests are those that address the most serious risks, exercise the most critical features and have the best chance of detecting faults. Criteria for prioritising tests: There are many criteria that can be used to promote (or demote) tests. Here are the three categories we use most for prioritising requirements, for example. You could refine these three into lower level categories if you wish. The three categories are critical, complex and error-prone. We use these to question requirements and assign a level of criticality. In the simplest case, if something is critical, complex or error-prone, it is deemed to be high priority in the tests. 4.5. Critical When you ask a user which parts of the system are more critical than others, what would you say? ‘We’d like to prioritise the features of the system, so it would help us if you could tell me which requirements are high-priority, the most critical’. What would you expect them to say? ‘All of our requirements are critical’. Why? Because they believe that when they de-prioritise something, it is going to get pushed out, de-scoped, and they don’t want that to happen. They want everything they asked for so they are reluctant to prioritise. So, you have to explain why you’re going through this process because it is most important that you test the most critical parts of the software a bit more than those parts of the system that are less critical. The higher the criticality of a feature, the greater the risk, the greater the need to test it well. 
People will co-operate with you, once they realise what it is that you’re trying to achieve. Convince them that testing is not uniform throughout the system, that some parts need more than others, and that you just want a steer. These are ways of identifying what is more important. The features of the system that are fundamental to its operation We have to admit that criticality is in the eye of the beholder. Management may say that their management report is the most important thing, that the telesales agents are just the drones that capture data. Fine for managers’ egos, but thankfully, most managers do recognise that the important thing is to keep the operation going - they can usually give a good steer on what is important. What parts of the system do the users really need to do their job? As a tester, you have to get beyond the response, ‘it’s all critical!’ You might ask, ‘which parts of the system do you really, really need?’ You have to get beyond this kind of knee-jerk reaction that everything is critical. You have to ask, ‘what is really, really important?’ What components must work, otherwise the system is seriously undermined? Another way of putting it might be to ask, if parts of a system were not available, could the user still do their job? What parts could be lost, without fear of the business coming crashing down? Is there a way that you can articulate a question to users that allows you to get that information you need? 4.6. Complex If you know an application reasonably well, then you will be able to say, for example, that these user screens are pretty simple, but the background or batch processes that do end-of-the-day processing are very complicated. Or perhaps that the user-interface is very simple, apart from these half dozen screens that calculate premiums, because the functionality behind those screens consists of a hundred thousand lines of code. 
Most testers and most users could work out which are the most complex parts of the system to be tested. Aspects of the system which are recognised to be complex Are computer systems uniformly simple throughout? Certainly not. Are computer systems uniformly complex throughout? Not usually. Most systems have complex parts and less complex parts. If you think about one of your systems, could you identify a complex, complicated or difficult to understand part of your system? Now, could you identify a relatively simple part of the same system? Probably. Undocumented, poorly documented And what do we know about complexity in software? It means that it is difficult to get right. It tends to be error-prone. Complex could mean that it is just undocumented. If you can’t find anyone who knows how the software should work, how is the developer going to get it right? Are the business rules so complicated that no one knows how they work? It’s not going to be very easy to get right, is it? Difficult to understand from business or system point of view
Perhaps there are areas of functionality that the users don’t understand. Perhaps you are dealing with a legacy system that no-one has worked on continuously and kept pace with the rules that are implemented in the software. Perhaps the original developer of a system has left the company. Perhaps the system was (or is) developed using methods which do not involve writing documentation? Inadequate business or system knowledge. If there isn’t any business or technical knowledge available, this is a sure sign that it will be more complicated or difficult to get right. So it is error-prone. Can you think of any parts of your system that the developers hate changing? Most systems have a least favourite area where there’s a sign that says swamp! This is where the alligators live and it’s really dangerous. So the issue of complexity is a real issue and you know that if there are parts of the system that people don’t like to go near, requirements which the developers are really wary of taking on - you know that they’re going to make mistakes. So, you should test more. 4.7. Error-prone The third category is error-prone. There is of course a big overlap with complexity here - most complex software is error-prone. But sometimes, what appear to be simpler parts of a system may turn out to be error-prone. Experience of difficulty in the past Is there a part of one of your systems, where every time there is a release, there are problems in one area? If you have a history of problems in the past, history will tend to repeat itself. If you’re involved in a project to replace an existing system, where should your concerns be? Existing system has history of problems in this area Where problems occurred in the old system, it is most likely that most of these problems will occur in the future on the new system. 
The developers may be using new technology, component-based, object-oriented or rapid application development methods, but the essential difficulties in building reliable software systems are unchanged. Many of the problems of the past will recur. It has been said that people who fail to learn from the failures in history are doomed to repeat them. It’s just the same with software. Difficult to specify, difficult to implement. You may not be directly involved in the development of requirements, specification or design documents or the coding. However, by asking about the problems that have occurred in earlier phases of a project, you should gain some valuable insights into where the difficulties and potential pitfalls lurk. Where there have been difficulties in eliciting requirements, specification and implementation, these are definitely areas that you should consider promoting in your test plans. A problem for you as a tester is that you may not have direct experience of these phases, so you must ask for assistance from both the business side and the technicians. All testers need to take advice from the business and the technical experts.
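The prioritisation scheme described in this section can be sketched in a few lines of code. The example below is only an illustration under stated assumptions: the test names, the effort figures and the scoring rule (one point for each of critical, complex and error-prone) are all hypothetical. A real scheme would weight the risks with input from users and technical experts.

```python
from dataclasses import dataclass


@dataclass
class InventoryItem:
    """One row of a test inventory (all example data is hypothetical)."""
    name: str
    effort_days: int
    critical: bool = False
    is_complex: bool = False
    error_prone: bool = False

    @property
    def priority(self):
        # Assumed scoring rule: one point per criterion that applies.
        return int(self.critical) + int(self.is_complex) + int(self.error_prone)


def prioritise(inventory, budget_days):
    """Split an inventory into (in_scope, de_scoped), highest priority first.

    Tests are taken in decreasing order of priority until the first one
    that no longer fits the budget; that test and every test below it
    are de-scoped.
    """
    ranked = sorted(inventory, key=lambda item: item.priority, reverse=True)
    in_scope, de_scoped = [], []
    used_days = 0
    threshold_reached = False
    for item in ranked:
        if not threshold_reached and used_days + item.effort_days <= budget_days:
            in_scope.append(item)
            used_days += item.effort_days
        else:
            threshold_reached = True
            de_scoped.append(item)
    return in_scope, de_scoped


# Hypothetical inventory: 100 days of testing against a 60-day budget.
INVENTORY = [
    InventoryItem("print management report", 20),
    InventoryItem("calculate premium", 30, critical=True, is_complex=True,
                  error_prone=True),
    InventoryItem("end-of-day batch run", 25, critical=True, is_complex=True),
    InventoryItem("capture customer details", 15, critical=True),
    InventoryItem("visit every window", 10),
]

if __name__ == "__main__":
    in_scope, de_scoped = prioritise(INVENTORY, budget_days=60)
    print("In scope: ", [item.name for item in in_scope])
    print("De-scoped:", [item.name for item in de_scoped])
```

With these made-up figures, a test of a critical feature ends up below the line. Listing the de-scoped tests next to the risks they address is the kind of evidence that helps make the case for more testing, or at least ensures the risk is taken consciously.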
Module B: Testing Throughout the Software Life Cycle 5. Testing Through the Lifecycle The generally accepted proposition in software testing is that best practice is to test throughout the development lifecycle. Ideally, you should test early, in the middle, and at the end, not just at the end. Early testing is more likely to consist of tests of requirements and designs, and the techniques used are technical reviews, inspections and so on. We need to fit test activities throughout the lifecycle and this module considers the way that this should work. In doing this, we must discuss how both static tests (reviews etc.) and dynamic tests fit in. 5.1. Verification, validation, and testing (VV&T) Verification, validation and testing (VV&T) are three terms that were linked some years ago as a way of describing the various test activities through the lifecycle. In commercial IT circles VV&T is considered a little old fashioned. However, in higher integrity environments, VV&T are widely used terms so we must address these now. In this course, we consider testing to include all the activities used to find faults in documents and code and gain confidence that a system is working. Of these activities, some are verification activities and the remainder are validation activities. V&V are useful ways of looking at the test activities in that at nearly all stages of development, there should be some aspect of both happening to ensure software products are built ‘to spec’ and meet the needs of their customer. Verification The principle of verification is this: verification checks that the product of a development phase meets its specification, whatever form that specification takes. More formally, verification implies that all conditions laid down at the start of a development phase are met. This might include multiple baseline or reference documents such as standards, checklists or templates. 
Validation is really concerned with testing the final deliverable – a system, or a program – against user needs or requirements. Whether the requirements are formally documented or exist only as user expectations, validation activities aim to demonstrate that the software product meets these requirements and needs. Typically, the end-user requirements are used as the baseline. An acceptance test is the most obvious validation activity. Also defined as "did we build the system right?" Essentially, verification asks the following questions: ‘Did we build the system the way that we said we would?’ When we component test, the component design is the baseline, and we test the code against the baseline. The user may have no knowledge of these designs or components - the user only sees the final system. If the test activity is not based on the original business requirements of the system, the test activity is probably a verification activity. Defined as: "determination of the correctness of the products of software development with respect to the user needs and requirements" In other words, validation is the determination of the correctness of the products of a software development with respect to the users' needs and requirements. Verification activities are mainly (but not exclusively) the concern of the suppliers of the system. Verification tends to be more the concern of the supplier/developer of the software product, rather than the concern of the user, at least up until system testing. A technician asks: did we build this product the way we specified? "did we build the right system?" Validation is asking the question, 'Did we build the right system?'. Where the focus is entirely on verification, it is possible to successfully build the wrong system for users. Both verification and validation activities are necessary for a successful software product. 5.2. 
Ad hoc development Pre mid-1970's development was more focused on "programs" than "systems" In the late sixties and early seventies, software development focused on distinct programs that performed specific processing roles. Programming methods were primitive Techniques and tools for managing large scale systems and their complexity did not exist, so functionality was usually decomposed into manageable chunks which skilled programmers could code. Characteristics of these developments were: (1) Analysis, as a disciplined activity was missing. (2) Analysis techniques were intuitive. ‘Design’ was a term used by programmers to describe their coding activity. (3) Requirements were sketchy. Testing was not a distinct activity at all, but something done by programmers on an informal basis. (4) Programs were written without designs. The main consequence of this approach was that systems were very expensive, fault prone and very difficult to maintain.
5.3. Structured methodologies More complex systems and technologies demanded more structure During the seventies, it became apparent that the way that software had been built in the past would not work in the future. Projects in some business areas were becoming very large, the costs were skyrocketing, and the general view was that there should be a more engineering-based structure to the way that people built software. Structured methods for programming Structured methods, for programming, analysis and project management emerged and by the mid eighties, dominated all large-scale development activities. There were strict methods for programming, ways of constructing software that was easier to maintain, and design criteria that people could apply and benefit from. Structured systems analysis methods The requirements to the design process became structured in terms of a series of stages: requirements definition, analysis, high-level design, low-level design, program specification, and so on. There was a natural flow from high-level abstract documents down to concrete, particular technical documents and finally the code. Relational database technology Databases continue to be the core of most large systems, and as relational systems emerged in the eighties and standards for SQL and related tools became mainstream, developers were released from many of the low-level data manipulation tasks in code. End-user tools and the promise of client/server architectures mean end users can query corporate databases with ease. Project management discipline and tools When software projects started to be organised into sequences of stages, each with defined deliverables, dependencies and skills requirements, the tools and disciplines of traditional project management could then be used. All combined to make up various "structured methodologies". 
Structured methods continue to be the preferred method for larger projects, even though analysis and design techniques and development technologies are more object-based nowadays. 5.4. Development lifecycles Various models of development There are various development models, the main ones being: Waterfall model The ‘Waterfall Approach’ to development, where development is broken up into a series of sequential stages, was the original textbook method for large projects. There are several alternatives that have emerged in the last ten years or so. Spiral model The Spiral model of development acknowledges the need for continuous change to systems as business change proceeds and that large developments never hit the target 100% first time round (if ever). The Spiral model regards the initial development of a system as simply the first lap around a circuit of development stages. Development never ‘stops’, in that a continuous series of projects refine and enhance systems continuously. Incremental prototyping Incremental prototyping is an approach that avoids taking big risks on big projects. The idea is to run a large project as a series of small, incremental and low-risk projects. Large projects are very risky because by sheer volume, they become complex. You have lots of people, lots of communication, mountains of paperwork, and difficulty. There are a number of difficulties associated with running a big project. So, this is a way of just carving up big projects into smaller projects. The probability of project failure is lowered and the consequence of project failure is lessened. Rapid Application Development Rapid Application Development or RAD, is about reducing our ambitions. In the past, it used to be that 80% of the project budget would go on the 20% of functionality that, perhaps, wasn’t that important – the loose ends, bells and whistles. 
So, the idea with RAD is that you try and spend 20% of the money but get 80% of the valuable functionality and leave it at that. You start the project with specific aims of achieving a maximum business benefit with the minimum delivery. This is achieved by ‘time-boxing’, limiting the amount of time that you’re going to spend on any phase and cutting down on documentation that, in theory, isn’t going to be useful anyway because it’s always out of date. In a way, RAD is a reaction to the waterfall model, as the Waterfall model commits a project to spending much of its budget on activities that do not enhance the customer’s perceived value for money. Certain common stages: In all of the models of development, there are common stages: defining the system, and building the system. 5.5. Static testing in the lifecycle
Static tests are tests that do not involve executing software. Static tests are primarily used early in the lifecycle. All deliverables, including code, can also be statically tested. All these test techniques find faults, and because they usually find faults early, static test activities provide extremely good value for money. Reviews, walkthroughs, inspections of (primarily) documentation Activities such as reviews, inspections, walkthroughs and static analysis are all static tests. Static tests operate primarily on documentation, but can also be used on code, usually before dynamic tests are done. Requirements Most static testing will operate on project deliverables such as requirements and design specification or test plans. However, any document can be reviewed or inspected. This includes project terms of reference, project plans, test results and reports, user documentation etc. Designs Review of the design can highlight potential risks that if identified early can either be avoided or managed. Code There are techniques that can be used to detect faults in code without executing the software. Review and inspection techniques are effective but labour intensive. Static analysis tools can be used to find statically detectable faults in millions of lines of code. Test plans. It is always a good idea to get test plans reviewed by independent staff on the project - usually business people as well as technical experts. 5.6. Dynamic testing in the lifecycle Static tests do not involve executing the software. Dynamic tests, the traditional method of running tests by executing the software, are appropriate for all stages where executable software components are available. Program (unit, component, module) Dynamic tests start with component level testing on routines, programs, class files, and modules. Component testing is the standard terms for tests that are often called unit, program or module tests. 
Integration or link testing The process of assembly of components into testable sub-systems is called integration (in the small) and link tests aim to demonstrate that the interfaces between components and sub-systems work correctly. System testing System-level tests are split into functional and non-functional test types. Non-functional tests address issues such as performance, security, backup and recovery requirements. Functional tests aim to demonstrate that the system, as a whole, meets its functional specification. User acceptance testing. Acceptance (and user acceptance) tests address the need to ensure that suppliers have met their obligations and that user needs have been met. 5.7. Test planning in the lifecycle Unit test plans are prepared during the programming phase According to the textbook, developers should prepare a test plan based on a component specification before they start coding. When the code is available for testing, the test plan is used to drive the component test. Test plans should be reviewed. At unit test level, test plans should be reviewed against the component specification. If test design techniques are used to select test cases the plans might also be reviewed against a standard, (the Component Test Standard BS7925-2, for example). System and acceptance test plans written towards the end of the physical design phase including The system and acceptance test plans include the test specifications and the acceptance criteria. System and acceptance tests should also be planned early, if possible. System-level test plans tend to be large documents - they take a lot longer to plan and organise at the beginning and to run and analyse at the end. System test planning normally involves a certain amount of project planning, resourcing and scheduling because of its scale. It’s a bigger process entirely requiring much more effort than testing a single component. 
Test plans for components and complete systems should be prepared well in advance for two reasons. Firstly, the process of test design detects faults in baseline documents (see later) and second to allow time for the preparation of test materials and test environments. Test planning depends only on good baseline documents so can be done in parallel with other development activities. Test execution is on the critical path – when the time comes for test execution, all preparations for testing should be completed. 5.8. Building block approach
We normally break up the test process into a series of building blocks or stages. The hope is that we can use a 'divide and conquer' approach and break down the complex testing problem into a series of smaller, simpler ones. • Building block approach implies o testing is performed in stages o testing builds up in layers. A series of (usually) sequential stages, each having distinct objectives, techniques, methods, responsibilities defined. Each test stage addresses different risks or modes of failure. When one test stage completes, we 'trust' the delivered product and move onto a different set of risk areas. • But what happens at each stage? • How do we determine the objectives for each layer? The difficult problem for the tester is to work out how each layer of testing contributes to the overall test process. Our aim must be to ensure that there are neither gaps nor overlaps in the test process. 5.9. Influences on the test process What are the influences that we must consider when developing our test strategy? The nature and type of faults to test for What kind of faults are we looking for? Low level, detailed programming faults are best found during component testing. Inconsistencies of the use of data transferred between complete systems can only be addressed very late in the test process, when these systems have been delivered. The different types of faults, modes of failure and risk affect how and when we test. The object under test What is the object under test? A single component, a subsystem or a collection of systems? Capabilities of developers, testers, users Can we trust the developers to do thorough testing, or the users, or the system testers? We may be forced to rely on less competent people to test earlier or we may be able to relax our later testing, because we have great confidence on earlier tests. Availability of: environment, tools, data All tests need some technical infrastructure. 
But have we adequate technical environments, tools and access to test data? These can be a major technical challenge. The different purpose(s) of testing Over the course of the test process, the nature of the purpose of testing changes. Early on, the main aim is to find faults, but this changes over time to generating evidence that software works and building confidence. 5.10.Staged testing - from small to large The stages of testing are influenced mainly by the availability of software artefacts during the build process. The build process is normally a bottom-up activity, with components being built first, then assembled into sub-systems, then the subsystems are combined into a complete, but standalone, system and finally, the complete system is integrated with other systems in its final configuration. The test stages align with this build and integration process. • Start by testing each program in isolation • As tested programs become available, we test groups of programs - sub-systems • Then we combine sub-systems and test the system • Then we combine single systems with other systems and test 5.11.Layered testing - different objectives Given the staged test process, we define each stage in terms of its objectives. Early test stages focus on low-level and detailed tests that need single, isolated components in small-scale test environments. This is all that is possible. The testing trend moves towards tests of multiple systems using end-to-end business processes to verify the integration of multiple systems working in collaboration. This requires large-scale integrated test environments. • Objectives at each stage are different • Individual programs are tested for their conformance to their specification • Groups of programs are tested for conformance to the physical design • Sub-systems and systems are tested for conformance to the functional specifications and requirements. 5.12.Typical test strategy
5.13.V model: waterfall and locks
5.14.Typical test practice
5.15.Common problems If there is little early testing, such as requirements or design reviews, if component testing and integration testing in the small don't happen, what are the probable consequences? Lots of rework Firstly, lots of faults that should have been found by programmers during component testing cause problems in system test. System testing starts late because the builds are unreliable and the most basic functionality doesn't work. The time taken to fix faults delays system testing further, because the faults stop all testing progressing.
Delivery slippage Re-programming trivial faults distracts the programmers from serious fault fixing. Re-testing and regression testing distract the system testers. The overall quality of the product is poor, the product is late and the users become particularly frustrated because they continue to find faults that they are convinced should have been detected earlier. Cut back on function, deliver low quality or even the wrong system. Time pressure forces a decision: ship a poor quality product or cut back on the functionality to be delivered. Either way, the users get a system that does not meet their requirements at all. 5.16.Fault cost curve
5.17.Front-loading and its advantages
"Front-loaded" testing is a discipline that promotes the idea that all test activities should be done as early as possible. This could mean doing early static tests (of requirements, designs or code), or dynamic test preparation as early as possible in the development cycle.
• The principle is to start testing early
• Reviews, walkthroughs and inspections of documents during the definition stages are examples of early tests
• Start preparing test cases early. Test case preparation "tests" the document on which the cases are based
• Preparing the user manual tests the requirements and design
What are the advantages of a front-loaded test approach?
• Requirements, specification and design faults are detected earlier and are therefore less costly (remember the fault cost curve)
• Requirements are more accurately captured, because test preparation finds faults in baselines
• Test cases are a useful input to designers and programmers (they may prefer them to requirements or design documents)
• Starting early spreads the workload of test preparation over the whole project
5.18.Early test case preparation
5.19.V-model
The V-model is a great way to explain the relationship of development and test activities and promotes the idea of front-loaded testing. However, it really only covers dynamic testing (the later test stages), and the front-loading idea is a sensible add-on. Taken at face value, the V-model retains the old-fashioned idea that testing is a 'back-door' activity that happens at the end, so it is only a partial picture of how testing should be done.
Instils concept of layered and staged testing
The testing V-model reinforces the concept of layered and staged testing. The testing builds up in layers, each test stage has its own objectives, and doing testing in layers promotes efficiency and effectiveness.
Test Documentation or High Level Test Plan High Level (or Master) Test Planning is an activity that should take place as soon as possible after the go-ahead on a new development project is received. If testing (in all its various forms) will take 50% of the overall project budget, then high level test planning should consume 50% of all project planning, shouldn't it? This module covers the issues that need to be considered in developing an overall test approach for your projects. a. How to scope the testing? When testers are asked to test a system, they wait for the software to be kindly delivered by the developers (at their convenience) and in whatever environment is available at the time, start gently running (not executing) some test transactions on the system. NOT! Before testers can even think about testing at any stage, there must be some awkward questions asked of the project management, sponsors, technical gurus, developers and support staff. In many ways, this is the fun part of the project. The testers must challenge some of the embedded assumptions on how successful and perfect the development will be and start to identify some requirements for the activities that will no doubt occur late in the project. What stages of testing are required? Full scale early reviews, code inspections, component, link, system, acceptance, large scale integration tests? Or a gentle bit of user testing at the end? How do we identify what to test? What and where are the baselines? Testers cannot test without requirements or designs. (Developers cannot build? (But they usually try)). How much testing is enough? Who will set the budget for testing? Testers can estimate, but we all know that testers assume the worst and aim too high. Who will take responsibility for cutting the test budget down to size? How can we reduce the amount of testing? We know we'll be squeezed during test planning and test execution. What rationale will be used to reduce the effort? 
How can we prioritise and focus the testing? What are the risks to be addressed? How can we use risk to prioritise and scope the test effort? What evidence do we need to provide to build confidence? b. Test deliverables This is a diagram lifted from the IEEE 829 Standard for Software Test Documentation. The standard defines a comprehensive structure and organisation for test documentation and composition guidelines for each type of document. In the ISEB scheme IEEE 829 is being promoted as a useful guideline and template for your project deliverables. You don't need to memorise the content and structure of the standard, but the standard number IEEE829 might well be given as a potential answer in an examination question. NB: it is a standard for documentation, but makes no recommendation on how you do testing itself.
Master Test Plan
The Master Test Plan sets out the overall approach to how testing will be done in your project. Existing company policies and plans may be input to your project, but you may have to adapt these to your particular objectives. Master test planning is a key activity geared towards identifying the product risks to be addressed in your project and how tests will be scheduled, resourced, planned, designed, implemented, executed, documented, analysed, approved and closed.
• Addresses project/product and/or individual application/system issues
• Focus of strategies, roles, responsibilities, resources, and schedules
• The roadmap for all testing activities
• Identifies the detailed test plans required
• Adopts/adapts test strategy/policies.
d. Master Test Plan Outline
1. Test Plan Identifier
2. References
3. Introduction
4. Test Items
5. Software Risk Issues
6. Features to be Tested
7. Features not to be Tested
8. Approach
9. Item Pass/Fail Criteria
10. Suspension Criteria and Resumption Requirements
11. Test Deliverables
12. Remaining Test Tasks
13. Environmental Needs
14. Staffing and Training Needs
15. Responsibilities
16. Schedule
17. Planning Risks and Contingencies
18. Approvals
19. Glossary
e. Brainstorming – agenda
It is helpful to have an agenda for the brainstorming meeting. The agenda should include at least the items below. We find it useful to use the Master Test Plan (MTP) headings as an agenda and for the testers to prepare a set of questions associated with each heading to 'drive' the meeting.
• To set the scene, introduce the participants
• Identify the systems, sub-systems and other components in scope
• Identify the main risks
  o what is critical to the business?
  o which parts of the system are critical?
• Make a list of issues and define ownership
• Identify actions to get test planning started.
Many of the issues raised by the testers should be resolved at the meeting. However, individuals should be actioned to research possible alternatives or to resolve the outstanding issues.
f. MTP Headings
IEEE 829 Main Headings and Guidelines
1. Test plan identifier
• unique, generated number to identify this test plan, its level and the level of software that it is related to
• preferably the test plan level will be the same as the related software level
• may also identify whether the test plan is a Master plan, a Level plan, an integration plan or whichever plan level it represents.
2. References
• list all documents that support this test plan, e.g.
  o Project Plan
  o Requirements specifications
  o design document(s)
  o development and test standards
3. Introduction
• the purpose of the Plan, possibly identifying the level of the plan (master etc.)
• the executive summary part of the plan.
4. Test Items (Functions)
• what you intend to test
• developed from the software application inventories as well as other sources of documentation and information
• includes version numbers, configuration requirements where needed
• delivery schedule issues for critical elements.
5. Software risk issues
• critical areas, such as:
  o delivery of a third party product
  o new version of interfacing software
  o ability to use and understand a new package/tool
  o extremely complex functions
  o error-prone components
  o safety, multiple interfaces, impacts on client, government regulations and rules.
6. Features to be tested
• what is to be tested (from the USERS viewpoint)
• level of risk for each feature
7. Features not to be tested
• what is NOT to be tested (from the USERS viewpoint)
• WHY the feature is not to be tested.
8. Approach (Strategy)
• overall strategy for this test plan e.g.
  o special tools to be used
  o metrics to be collected
  o configuration management policy
  o combinations of HW, SW to be tested
  o regression test policy
  o coverage policy etc.
9. Item pass/fail criteria
• completion criteria for this plan
• at the Unit test level this could be:
  o all test cases completed
  o a specified percentage of cases completed with a percentage containing some number of minor faults
  o code coverage target met
• at the Master test plan level this could be:
  o all lower level plans completed
  o test completed without incident and/or minor faults.
10. Suspension criteria and resumption requirements
• when to pause in a series of tests
• e.g. a number or type of faults where more testing has little value
• what constitutes stoppage for a test or series of tests
• what is the acceptable level of faults that will allow the testing to proceed past the faults.
11. Test deliverables
e.g. test plan document, test cases, test design specifications, tools and their outputs, incident logs and execution logs, problem reports and corrective actions
12. Remaining test tasks
• where the plan does not cover all software
• e.g. where there are outstanding tests because of phased delivery.
13. Environmental needs
• special requirements such as:
• special hardware such as simulators, test drivers etc.
• how test data will be provided
14. Staffing and training
• e.g. training on the application/system
• training for any test tools to be used.
15. Responsibilities
• who is in charge?
• who defines the risks?
• who selects features to be tested and not tested
• who sets overall strategy for this level of plan.
16. Schedule • based on realistic and validated estimates.
17. Planning risks and contingencies
• overall risks to the project with an emphasis on testing
• lack of resources for testing
• lack of environment
• late delivery of the software, hardware or tools.
18. Approvals • who can approve the process as complete?
19. Glossary • used to define terms and acronyms used in the document, and testing in general, to eliminate confusion and promote consistent communications.
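The unit-level completion criteria listed under heading 9 (all cases completed, a bounded share of minor faults, a code coverage target) can be checked mechanically once test results are recorded. The sketch below is only an illustration: the `UnitTestResults` record, the function name and every threshold value are assumptions, not something IEEE 829 prescribes. Your own plan would state its own figures.

```python
from dataclasses import dataclass


@dataclass
class UnitTestResults:
    """Hypothetical summary of one unit test run (field names are assumptions)."""
    planned_cases: int
    completed_cases: int
    cases_with_minor_faults: int
    statement_coverage: float  # fraction between 0.0 and 1.0


def check_unit_completion(results,
                          min_completed_fraction=1.0,      # assumed: all cases must run
                          max_minor_fault_fraction=0.05,   # assumed limit on minor faults
                          coverage_target=0.80):           # assumed coverage target
    """Return (overall_pass, per-criterion results) for the unit-level criteria."""
    if results.planned_cases == 0:
        # A plan with no cases cannot satisfy any completion criterion.
        return False, {"cases completed": False}
    completed_fraction = results.completed_cases / results.planned_cases
    if results.completed_cases:
        minor_fraction = results.cases_with_minor_faults / results.completed_cases
    else:
        minor_fraction = 0.0
    checks = {
        "cases completed": completed_fraction >= min_completed_fraction,
        "minor faults within limit": minor_fraction <= max_minor_fault_fraction,
        "coverage target met": results.statement_coverage >= coverage_target,
    }
    return all(checks.values()), checks
```

Returning the individual checks alongside the overall verdict lets whoever approves the plan see which criterion blocked completion, rather than just a yes or no.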
6. Stages of Testing
This module sets out the six stages of testing as defined in the ISEB syllabus and provides a single slide description of each stage. The modules that follow this one describe the stages in more detail.
6.1. Test stages
We’ve had a look at the “V” model and we’ve had a general discussion about what we mean by layered and staged testing. Here is a description of the stages themselves.
6.2. Component testing
Component testing is the testing of the lowest-level component that has its own specification. It’s programmer-level testing.
Objectives: To demonstrate that a program performs as described in its specification. To demonstrate publicly that a program is ready to be included with the rest of the system (for Link Testing).
Test technique: Black and white box.
Object under test: A single program or component.
Responsibility: Usually, the component's author.
Scope: Each component is tested separately, but usually a programmer performs some Ad Hoc Testing before formal Component Testing.
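To make the "black and white box" entry concrete, here is a minimal sketch of a component test in Python. The `shipping_charge` component, its specification and its figures are all hypothetical, invented for illustration. The first two cases are derived from the specification alone (black box); the third was chosen by reading the code and noticing a guard branch the specification does not mention (white box).

```python
import unittest


def shipping_charge(order_total):
    """Hypothetical component under test.

    Assumed specification: orders of 50.00 or more ship free;
    smaller orders pay a flat charge of 4.95.
    """
    if order_total < 0:
        # Guard branch that the assumed specification does not describe.
        raise ValueError("order total cannot be negative")
    if order_total >= 50.00:
        return 0.00
    return 4.95


class ShippingChargeComponentTest(unittest.TestCase):
    # Black box cases: derived from the specification only.
    def test_small_order_pays_flat_charge(self):
        self.assertEqual(shipping_charge(10.00), 4.95)

    def test_order_at_threshold_ships_free(self):
        self.assertEqual(shipping_charge(50.00), 0.00)

    # White box case: selected by inspecting the code, so that the
    # guard branch is exercised at least once.
    def test_negative_total_is_rejected(self):
        with self.assertRaises(ValueError):
            shipping_charge(-1.00)
```

Saved to a file, the cases can be run with `python -m unittest <file>`; because they are written down, the test is repeatable and visible to the rest of the team.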
Component testing is the lowest level of testing. The purpose of it is to demonstrate that a program performs as described in its specification. Typically, you are testing against a program specification.
Techniques – black and white box testing techniques are used. The programmers know how to work out test cases to exercise the code by looking at the code (white box testing). When the programmers are using the program spec to drive their testing, then this is black box testing.
Object under test – a single program, a module, class file, or any other low-level, testable object.
Who does it? Normally, the author of the component. It might not be, but usually, it is the same person that wrote the code.
6.3. Integration testing
Then, we have integration testing in the small. This is the testing of the assembly of these components into subsystems. Component testing and integration testing in the small, taken together, are subsystem testing.
Objectives: To demonstrate that a collection of components interface together as described in the physical design.
Test technique: White box.
Object under test: A sub-system or small group of components sharing an interface.
Responsibility: A member of the programming team.
Scope: Components should be Link Tested as soon as a meaningful group of components have passed component testing. Link Testing concentrates on the physical interfacing between components.
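A link test can be sketched as follows. Both components and the record format passed between them are hypothetical, chosen only to illustrate the idea: the test feeds the real output of one component into the next, instead of a hand-built record, so that a mismatch in the interface (a renamed key, a string where a number is expected) shows up here rather than in system test.

```python
def parse_order_line(line):
    """Hypothetical component A: turns 'SKU,quantity,unit_price' into a record."""
    sku, quantity, unit_price = line.split(",")
    return {
        "sku": sku.strip(),
        "quantity": int(quantity),
        "unit_price": float(unit_price),
    }


def order_line_total(record):
    """Hypothetical component B: consumes the record produced by component A."""
    return record["quantity"] * record["unit_price"]


def link_test_parse_then_total():
    """Link test: pass A's actual output across the interface to B."""
    record = parse_order_line("ABC-1, 3, 2.50")
    # Interface check: the keys and types B relies on are what A delivers.
    assert set(record) == {"sku", "quantity", "unit_price"}
    assert isinstance(record["quantity"], int)
    assert isinstance(record["unit_price"], float)
    # Result check across the interface.
    assert order_line_total(record) == 7.5
    return True
```

Each component would already have passed its own component test; the link test adds only the checks that concern the interface between them.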
Integration testing in the small is also called link testing. The principle here is that we’re looking to demonstrate that a collection of components, which have been integrated, interface correctly with each other. We’re testing whether or not those interfaces actually work, according to a physical design. It’s mainly white box testing, that is, we know what the interface looks like technically (the code).
Object under test – usually more than one program or component or it could be all of the sub-programs making up a program.
Who does it? Usually a member of the programming team because it’s a technical task.
6.4. Functional system testing
Functional system testing is typically against a functional specification and is what we would frequently call a system test.
Objectives: To demonstrate that a whole system performs as described in the logical design or functional specification documents.
Test technique: Black box, mainly.
Object under test: A sub-system or system.
Responsibility: A test team or group of independent testers.
Scope: System testing is often divided up into sub-system tests followed by full system tests. It is also divided into testing of "functional" and "non-functional" requirements.
The objective of functional system testing is to demonstrate that the whole system performs according to its functional specification. The test techniques are almost entirely black box. Functional testing is usually done by more than one person - a team of testers. The testers could be made up of representatives from different disciplines, e.g., business analysts, users, etc. or they could be a team of independent testers (from outside the company developing or commissioning the system).
6.5. Non-functional system testing
Non-functional system testing comprises the tests that address things like performance, usability, security, documentation, and so on.
Objectives: To demonstrate that the non-functional requirements (e.g. performance, volume, usability, security) are met.
Test technique: Normally a selection of test types including performance, security, usability testing etc.
Object under test: A complete, functionally tested system.
Responsibility: A test team or group of independent testers.
Scope: Non-functional system testing is often split into several types of test organised by the requirement type.
Non-functional requirements describe HOW the system delivers its functionality. Requirements specifying the performance, usability, security, etc. are non-functional requirements. You need a complete, functionally tested system that is reliable and robust enough to test without it crashing every five minutes. You may be able to start the preparation of the non-functional tests before the system is stable, but the actual tests have to be run on the system as it will be at the time when it is ready for production.
6.6. Integration Testing in the Large
Very few systems live in isolation these days. All systems talk to other systems. So, where you have a concern about the integration of one system with another, integration testing in the large addresses this. You might also call this end-to-end testing. One issue with integration is that integration doesn’t happen at the beginning or the end; it happens throughout. At almost every stage, there’s a new aspect of integration that needs to be tested. Whether you’re dealing with integration of methods in a class file or really low-level integration, program-to-program, subsystem-to-subsystem, or system-to-system, this is an aspect of integration testing. And the web itself is like one big integrated network. So, integration happens throughout, but the two areas where integration is usually addressed specifically as integration are integrating components into subsystems (integration testing in the small) and system to system testing (integration testing in the large).
Objectives: To demonstrate that a new or changed system interfaces correctly with other systems.
Test technique: Black and white box.
Object under test: A collection of interfacing systems.
Responsibility: Inter-project testers.
Scope: White box tests cover the physical interfaces between systems. White box tests cover the inter-operability of systems. Black-box tests verify the data consistency between interfacing systems.
Integration testing in the large involves testing multiple systems and paths that span multiple systems. Here, we’re looking at whether the new or changed interfaces to other systems actually work correctly. Many of the tests will operate 'end-to-end' across multiple systems. This is usually performed by a team of testers.
6.7. User acceptance testing
And the last one is acceptance testing, covering user acceptance and contract acceptance, if applicable. Contract acceptance is not necessarily for the user’s benefit, but it helps you understand whether or not you should pay the supplier.
Objectives: To satisfy the users that the delivered system meets their requirements and that the system fits their business process.
Test technique: Entirely black box.
Object under test: An entire system.
Responsibility: Users, supported by test analysts.
Scope: The structure of User Testing is in many ways similar to System Testing, however the Users can stage whichever tests will satisfy them that their requirements have been met. User Testing may include testing of the system alongside manual procedures and documentation.
Here, we are looking at an entire system. Users will do most of the work, possibly supported by more experienced testers.
6.8. Characteristics of test stages
Part of the test strategy for a project will typically take the form of a diagram documenting the stages of testing. For each stage, we would usually have a description containing ten or eleven different headings.
Objectives
What are the objectives? What is the purpose of this test? What kind of errors are we looking for?
Test techniques (black or white box)
What techniques are going to be used here? What methods are we going to use to derive test plans?
Object under test
What is the object under test?
Responsibility
Who performs the testing?
Scope
As for the scope of the test, how far into the system will you go in conducting a test? How do you know when to stop?
7. Component Testing The first test stage is component testing. Component testing is also known as unit, module or program testing (most often unit). Component testing is most often done by programmers or testers with strong programming skills. 7.1. Relationship of coding to testing The way that developers do testing is to interleave testing with writing of code – they would normally code a little, test a little. To write a program (say 1,000 lines of code), a programmer would probably write the main headings, the structure, and the main decisions but not fill out the detail of the processes to be performed. In other words, they would write a skeletal program with nothing happening in the gaps. And then they’d start to fill in the gaps. Perhaps they’d write a piece of code that captures information on the screen. And then they’d test it. And then they’ll write the next bit, and then test that, and so on. Code a little, test a little. That is the natural way that programmers work. • Preparing tests before coding exposes faults before you commit them to code • Most programmers code and test in one step • Usual to code a little, test a little • Testing mixed with coding is called ad hoc testing. 7.2. Component Testing Objectives Component testing is often called unit, module or program testing Formal component testing is often called unit, module or program testing. Objectives are to demonstrate that: The purpose of component testing is to demonstrate the component performs as specified in a program spec or a component spec. This is the place where you ensure that all code is actually tested at least once. The code may never be executed in the system test so this might be the last check it gets before going live. This is the opportunity to make sure that every line of code that has been written by a programmer has been exercised by at least one test. Another objective is, if you like, the exit criteria. 
And that is, the component must be ready for inclusion in a larger system. It is ready to be used as a component. It’s trusted, to a degree.
7.3. Ad Hoc Testing
Ad hoc testing does not have a test plan.
A unit test covers the whole unit. It’s a complete, formal test of one component, and there is a process to follow for it. If a programmer had not done any testing up to this point, then the program almost certainly would not run through the test anyway. So programmers, in the course of developing a program, do test. But this is not component testing, it is ad hoc testing. It’s called ad hoc because it doesn’t have a test plan. They test as they write. They don't usually use formal test techniques. It’s usually not repeatable, as they can’t be sure what they’ve done (they haven’t written it down). They usually don’t log faults or prepare incident reports. If anything, they scribble a note to themselves.
The criterion for completing ad hoc testing is whether a formal unit test is now viable. Is the program reliable enough, or is it still falling over every other transaction? Is the programmer aware of any faults?
7.4. Ad hoc testing v component testing
Ad hoc testing:
• Does not have a test plan
• Not based on formal test case design
o Not repeatable
o Private to the programmer
• Faults are not usually logged
Component testing:
• Has a test plan
• Based on formal test case design
o Must be repeatable
o Public to the team
• Faults are logged
7.5. Analysing a component specification
The programmer is responsible for preparing the formal unit test plan. This test is against the program specification. In order to prepare that test plan, the programmer will need to analyse the component spec to prepare test cases. The key recommendation with component testing is to prepare a component test plan before coding the program.
This has a number of advantages and is not increasing the workload, as test preparation needs to be done at some point anyway.
Specification reviewers ask 'how would we test this requirement?' among other questions. If specifications aren't reviewed, the programmer is the first person to 'test' the specification. When reviewing a specification, look for ambiguities, inconsistencies and omissions. Omissions are hardest to spot.
Preparing tests from specifications finds faults in specifications. In preparing the tests, the programmer may find bugs in the specification itself. If tests are prepared after the code is written, the programmer cannot set aside the assumptions made while coding, so the tests will be self-fulfilling.
Get clarification from the author:
• use informal walkthroughs
• explain your understanding of the specification
It may look obvious how to build the program, but is it obvious how to test it?
• if you couldn't test it, can you really build it?
• how will you demonstrate completion/success?
7.6. Informal Component Testing
Informal component testing is usually based on black box techniques. The test cases are usually derived from the specification by the programmer. Usually they are not documented. It may be that the program cannot be run except using drivers and, maybe, a debugger to execute the tests. It’s all heavily technical, and the issue is: how will the programmer execute tests of a component if the component doesn’t have a user interface? It’s quite possible. The objective of the testing is to ensure that all code is exercised (tested) at least once. It may be necessary to use the debugger to actually inject data into the software to make it exercise obscure error conditions. The issue with informal component testing is: how can you achieve confidence that the code that’s been written has been exercised by a test when an informal test is not documented? What evidence would you look for to say that all the lines of code in a program have been tested?
How could you achieve that? Using a coverage measurement tool is really the only way that it can be shown that everything has been executed. But did the code produce the correct results? This can really only be checked by tests that have expected output that can be compared against actual output. The problem is that most software developers don’t use coverage tools.
• Usually based on black box techniques
• Tables of test cases may be documented
• Tests conducted by the programmer
• There may be no separate scripts
• Test drivers, debugger used to drive the tests
o to ensure code is exercised
o to insert required input data
7.7. Formal component test strategy
Before code is written: In a more formal environment, we will tend to define the test plan before the code is written. We define a target for black and white box coverage. We’d use black box techniques early on, to prepare a test plan based on the specification.
After code is written: When we run the tests prepared using the black box techniques, we measure the coverage. We might say, for example, we’re going to design tests to cover all the equivalence partitions. We prepare the tests and then run them. But we could also have a statement coverage target: we want to cover every statement in the code at least once. You get this information by running the tests you have prepared with a coverage tool. When you see the statements that have not been covered, you generate additional tests to exercise that code. The additional tests are white box tests, although the original tests may be black box tests.
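The strategy above can be sketched in Python. The function shipping_band, its weight bands and the test values are hypothetical, invented purely for illustration; nothing in this course defines them. The black box cases are derived from the equivalence partitions of the assumed specification, and one further white box test is added on the assumption that a coverage run showed one statement had never been executed.

```python
# Hypothetical component under test: the name, the bands and the limits
# are assumptions made for this sketch, not part of the course material.
def shipping_band(weight_kg):
    """Return the shipping band for a parcel weight in kilograms."""
    if not isinstance(weight_kg, (int, float)):
        raise TypeError("weight must be a number")   # obscure error path
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    if weight_kg <= 2:
        return "small"
    if weight_kg <= 20:
        return "medium"
    return "large"


# Black box tests prepared from the specification BEFORE coding:
# one representative value per valid equivalence partition,
# each with its expected output.
BLACK_BOX_CASES = [
    (1, "small"),
    (10, "medium"),
    (50, "large"),
]


def run_black_box_tests():
    """Return (input, expected, actual) for every planned case."""
    return [(w, expected, shipping_band(w)) for w, expected in BLACK_BOX_CASES]


def run_invalid_partition_test():
    """Black box test for the invalid partition (weight <= 0)."""
    try:
        shipping_band(0)
    except ValueError:
        return True
    return False


def run_white_box_test():
    """Extra test added AFTER coding, on the assumption that a coverage
    run showed the TypeError statement was never executed."""
    try:
        shipping_band("heavy")
    except TypeError:
        return True
    return False
```

In practice the uncovered statement would be found by running the black box tests under a coverage tool; with coverage.py, for instance, `coverage run` followed by `coverage report -m` lists the line numbers that were not executed.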
8. Integration Testing
Integration is the process of assembling tested components into sub-systems and complete systems. Integration is often done using a 'big-bang' approach, that is, an entire system may be assembled from its components in one large build. This can make system testing problematic, as many underlying integration faults may cause a 'complete' system to be untestable. Best practices promote two incremental integration approaches:
Bottom-up - building from low-level components towards the complete system
Top-down - building from the top control programs first, adding more and more functionality toward the complete system.
8.1. Software integration and link testing
There is a lot of confusion concerning integration. If you think about it, integration is really about the process of assembly of a complete system from all of its components. But even a component consists of the assembly of statements of program code. So really, integration starts as soon as coding starts. When does it finish? Until a system has been fully integrated with other systems you aren't finished, so integration happens throughout the project. Here, we are looking at integration testing 'in the small'. It's also called link testing.
• In the coding stage, you are performing "integration in the very small"
• Strategies for coding and integration:
o bottom up, top down, "big bang"
o appropriate in different situations
• Choice based on programming tool
• Testing also affects choice of integration strategy
8.2. Stubs and top down testing
The first integration strategy is 'top down'. What this means is that the highest level component, say a top menu, is written first. This can't be tested because the components that are called by the top menu do not yet exist. So, temporary components called 'stubs' are written as substitutes for the missing code. Then the highest level component, the top menu, can be tested.
When the components called by the top menu are written, these can be inserted into the build and tested using the top menu component. However, the components called by the top menu themselves may call lower level components that do not yet exist. So, once again, stubs are written to temporarily substitute for the missing components. This incremental approach to integration is called 'top down'.
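A minimal sketch of top-down integration with stubs, in Python. All names here (top_menu, the option keys and the stub functions) are hypothetical and chosen only to mirror the top menu example; a real project would have its own call mechanism.

```python
# All names here (top_menu, the option keys, the stub functions) are
# hypothetical and exist only to illustrate top-down integration.

def customer_enquiry_stub():
    """Stub: stands in for the real enquiry component, not yet written."""
    return "STUB: customer enquiry called"


def order_entry_stub():
    """Stub: stands in for the real order entry component, not yet written."""
    return "STUB: order entry called"


def top_menu(choice, handlers):
    """Highest level component: transfers control to the chosen handler."""
    if choice not in handlers:
        return "invalid option"
    return handlers[choice]()


# First build: the top menu is integrated with stubs only.
handlers = {"1": customer_enquiry_stub, "2": order_entry_stub}


def customer_enquiry():
    """Real component, written later; it replaces its stub in the build."""
    return "customer enquiry screen"


# Second build: the real component is inserted, the other stub remains.
handlers_next_build = {"1": customer_enquiry, "2": order_entry_stub}
```

Calling top_menu("1", handlers) exercises the menu logic against a stub, while the same call with handlers_next_build exercises it against the real component. The stubs return a recognisable marker so that a tester can see which calls were actually made.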
8.3. Drivers and bottom up testing
The second integration strategy is 'bottom up'. What this means is that the lowest level components are written first. These components can't be tested because the components that call them do not yet exist. So, temporary components called 'drivers' are written as substitutes for the missing code. Then the lowest level components can be tested using the test driver. When the components that call our lowest level components are written, these can be inserted into the build and tested in conjunction with the lowest level components that they call. However, the new components themselves require drivers to be written to substitute for calling components that do not yet exist. So, once again, drivers are written to temporarily substitute for the missing components. This incremental approach to integration is called 'bottom up'.
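A minimal sketch of a test driver for bottom-up integration, in Python. The component calculate_discount, its 10% rule and the test values are assumptions made for illustration only.

```python
# Hypothetical lowest level component; name and rules are assumptions.
def calculate_discount(order_total):
    """Return the discount for an order total (10% at 100 or more)."""
    if order_total < 0:
        raise ValueError("order total cannot be negative")
    if order_total >= 100:
        return order_total * 0.10
    return 0.0


def discount_driver(cases):
    """Test driver: stands in for the calling component that does not yet
    exist. It calls the component with each input and records every case
    where the actual output differs from the expected output."""
    failures = []
    for order_total, expected in cases:
        actual = calculate_discount(order_total)
        if abs(actual - expected) > 1e-9:
            failures.append((order_total, expected, actual))
    return failures


# Inputs paired with expected outputs, prepared from the (assumed) spec.
CASES = [(0, 0.0), (99.99, 0.0), (100, 10.0), (250, 25.0)]
```

Running discount_driver(CASES) returns an empty list when every case passes. When the real calling component is written, it replaces the driver in the build and the same cases can be re-run through it.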
8.4. Mixed integration strategy A mixed integration strategy involves some aspect of bottom-up, top-down and big bang.
8.5. Definition of interfaces Statements which transfer control between programs What is an interface? There are usually three aspects of an interface between components. In most software projects, complex functionality is decomposed into a discrete set of simpler components that ‘call’ each other in pre-defined ways. When a software component is executing and it requires the ‘services’ of another component there is a transfer of control. The calling component waits until the called component completes its task and passes back results. The called component usually needs data to operate on and a mechanism to return results to the calling component. Parameters passed from program to program. There are two mechanisms for this. Firstly, the calling component might pass parameters to the called component. A parameter is simply a mechanism for transferring data between interfacing components. Parameters can be used to send
(but not change data) or receive data (the results of calculations, say) or both. Parameters are visible only to the components that use them in a transfer of control. Global variables defined at the time of transfer The second way that data is exchanged by interfacing components is to use global data. Global data is available to all or a selected number of components. Just like parameters, components may be allowed to read from or write to global data or to do both. 8.6. Interface bugs If we look at how faults find their way into interfaces, interface bugs are quite variable in how they occur. These are white box tests in that link testing requires knowledge of the internals of the software, in the main. The kind of faults found during link testing reveals inconsistencies between two components that share an interface. Very often, problems with integration testing highlight a common problem in software projects and that is one of communications. Individuals and project teams often fail to communicate properly so misunderstandings and poor assumptions concerning the requirements for an interface occur. Link testing normally requires a knowledge of the internals of the software components to be tested, so is normally performed by a member of the development team. Transfer of control to the wrong routine One kind of bug that we can detect through link testing is a transfer of control bug. The decision to call a component is wrong; that is, the wrong component is invoked. Within a called component it may be possible to return control back to the calling component in the incorrect way so that the wrong component regains control after the called component completes its task. Programs validate common data inconsistently When making a call to a function or component, a common error is to supply the incorrect type, number, or order of parameters to the called component. 
Type could be a problem where we may substitute a string value or a numeric value, and this is not noticed until the software is executed. Perhaps, we supply the wrong number of parameters, where the component we call requires six parameters and we only supply five. It may be that the software does not fail and recognize that this has happened. Interface bugs can also occur between components that interpret data inconsistently. For example, a parameter may be passed to a component, which has been validated using a less stringent rule than that required by the called component. For example, a calling component may allow values between one and ten, but the called component may only allow values between one and five. This may cause a problem if non-valid values are actually supplied to the called component. Readonly parameters or global data that is written to. Parameters passed between components may be treated inconsistently by different components. A read-only parameter might be changed by a called component or a parameter passed for update may not be updated by the called component. Much data is held as global data, so is not actually passed across interfaces – rather, it is shared between many components. The common example is a piece of global memory, which is shared by processes running on the same processor. In this case, the ownership of global data and the access rights to creating, reading, changing, and deleting that data may be inconsistent across the components. One more issue, which is common, is where we get the type and number of parameters correct, but we mistake the order of parameters – so, two parameters which should be passed in the order A, then B with values A=‘yes’ and B=‘no’ might be supplied in the wrong order, B, then A, and would probably result in a failure of the called component 8.7. Call characteristics Other integration problems relate to transfer of control between programs. 
The transfer of control may occur in a hierarchical or a lateral sequence.
Function/subroutine calls implement a hierarchical transfer of control: Control may be passed by a component that calls another component. This implements a hierarchical transfer of control from parent to child and then back again, when the child component finishes execution. When testing these, ensure that the correct programs are called and return of control follows the correct path up the hierarchy. Attempt recursion: A calls B calls C calls B etc.
Object/method calls can implement lateral transfer of control: Where one object creates another object that then operates independently of the first, this might be considered to be a lateral transfer of control. When testing these, ensure that the correct programs or methods are called and the 'chain of control' ends at the correct point. Also check for loops: A calls B calls C calls A.
8.8. Aborted calls
An interactive screen is entered, then immediately exited: Aborted calls sometimes cause problems in software. If you imagine a system’s menu hierarchy, a new window might be opened and then immediately exited by the user. This would simulate a user making a mistake in the application or changing their mind, perhaps. Aborted calls can cause calling components difficulties because they expect the called component to return data, not to return immediately.
An interactive screen has commands to return to the top menu, or exit completely: Two other examples arise where a screen entered by a user has an option to return to the calling screen but also has the facility to return to the top menu or to exit the application entirely. The controlling program, which perhaps handles all menu options, may not expect to have to deal with returns to the top menu or a complete exit from the program.
A routine checking input parameters immediately exits: Another issue with regard to aborted calls is where a called component checks the data passed to it across the interface. If this data fails the check, the called component returns control to the calling component. The bug assumption would be that the calling component cannot actually handle the exception. Does the calling routine handle the exception properly? Is it expecting the called component to return control when it finds an error? It may not be able to handle this exception at all.
8.9. Data flows across interfaces
There are several mechanisms for passing parameters between components across interfaces. It is possible to select the wrong mechanism and this is a serious problem, in that the called component cannot possibly interpret the data correctly if the call mechanism is incorrect. There are three ways that parameters can be passed:
BY VALUE - read-only to the called routine. The first is passing ‘by value’: in effect, the contents of the variable are passed, and the variable is read-only as far as that component call is concerned.
BY REFERENCE - may be read/written by called routine. The variable can be passed ‘by reference’, which allows the called component to examine the data contained within the variable but also provides the reference, allowing it to write back into that variable and return data to the calling component.
Handles are pointers to pointers to data and need "double de-referencing" Handles are a common term used for pointers to pointers to data. In effect, these references are a label, which points to an address or some other data. These handles are de-referenced and de-referenced again to detect where the data is, that actually has been passed across an interface. 8.10.Global data Interface testing should also address the use of global data. Global data might be an area of memory shared by multiple systems or components. Global data could also refer to the content of a database record or perhaps the system time, for example. May reduce memory required May simplify call mechanisms between routines Use of global data is very convenient from the programmer’s point of view because it simplifies the call mechanism between components. You don’t need parameters any more. Lazy programmers over-use global data But it’s a lazy attitude when one uses global data too much because global data is particularly error-prone because of the misunderstandings that can occur between programmers in the use of global data. Global data is, in a way, a shortcut, that allows programmers not to have to communicate as clearly. Explicitly defined interfaces between processes written by different programmers force those programmers to talk to each other, discuss the interface, and clarify any assumption made about data that is shared between their components. 8.11.Assumptions about parameters and global data Assumed initialised e.g.: The kinds of assumptions that can be made, that cause integration faults in the use of that global data, are assumptions about initialisation. A component may assume that some global data will always exist under all circumstances. For example, the component may assume that the global data is always set by the caller, or that a particular variable is incremented before, rather than after the call (or vice versa). This may not be a safe assumption. 
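The initialisation assumption described above can be made concrete with a small Python sketch. The shared dictionary, the tax_rate key and all function names are hypothetical; they simply model one component that relies on another having set global data first, and a link test that deliberately breaks that assumption.

```python
# Hypothetical shared (global) data; every name here is an assumption.
shared = {}          # global data area used by several components


def set_tax_rate(rate):
    """Component that is SUPPOSED to run before any price calculation."""
    shared["tax_rate"] = rate


def price_with_tax_unsafe(net):
    """Assumes the global tax rate has always been initialised."""
    return net * (1 + shared["tax_rate"])   # KeyError if never set


def price_with_tax_checked(net):
    """Checks the assumption and reports the problem explicitly."""
    if "tax_rate" not in shared:
        raise RuntimeError("tax_rate not initialised by any caller")
    return net * (1 + shared["tax_rate"])


def initialisation_test():
    """Link test: call the component BEFORE the global data is set."""
    shared.clear()
    try:
        price_with_tax_unsafe(100)
    except KeyError:
        return "fault exposed: global data assumed initialised"
    return "no fault found"
```

The test does nothing clever: it simply calls the component in an order the programmer did not anticipate, which is often all that is needed to expose this kind of assumption.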
Other assumptions: Other assumptions relate to the "ownership" of global data. A component may assume that it can set the value of global data and no other program can unset it or change it in anyway. Other assumptions can be that global data is always correct; that is, under no circumstances can it be changed and be made inconsistent with other information held within a component. A component could also make erroneous assumptions about the repeatability or re-entry of a routine. All of these assumptions may be mistaken if the rules for use of global data are not understood. 8.12.Inter-module parameter checking Does the called routine explicitly check input parameters?
The final category of integration bugs that might be considered for testing is inter-module parameter checking; that is, does one component explicitly check the values supplied on its input?
Does the calling routine check the return status? Does it actually take the values returned from the called component and validate that these return values are correct?
Programming or interface standards should define whether the calling routine, the called routine, or both perform checking and under what circumstances. The principle of all integration testing and all inter-component parameter passing is that interface standards must be clear about how the calling and the called components process passed data and shared data. Documenting these interfaces can eliminate many, if not all, interface bugs.
In summary, most interface bugs relate to shared data and mistaken assumptions about the use of that data across interfaces. Where programmers do not communicate well within the programming team, it is common to find interface problems and integration issues within that team. The same applies to different teams who do not document their interfaces and agree the protocol to be used between their different software products.
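As an illustration of parameter checking on both sides of an interface, here is a minimal Python sketch. The routine names, the status codes and the limits are hypothetical; the limits echo the earlier example in which the caller allows values between one and ten but the called component allows only one to five.

```python
# Hypothetical interface: the called routine accepts quantities 1 to 5,
# while the caller validates 1 to 10. Names and limits are assumptions.

OK, BAD_QUANTITY = 0, 1     # return status codes agreed for the interface


def reserve_stock(quantity):
    """Called routine: explicitly checks its input parameter."""
    if not 1 <= quantity <= 5:
        return BAD_QUANTITY, None
    return OK, f"reserved {quantity}"


def place_order_unchecked(quantity):
    """Caller that ignores the return status (the interface bug)."""
    if not 1 <= quantity <= 10:
        return "rejected by caller"
    _status, message = reserve_stock(quantity)
    return f"order placed: {message}"


def place_order_checked(quantity):
    """Caller that checks the return status before using the result."""
    if not 1 <= quantity <= 10:
        return "rejected by caller"
    status, message = reserve_stock(quantity)
    if status != OK:
        return "rejected by called routine"
    return f"order placed: {message}"
```

A link test with a quantity of 7, valid for the caller but not for the called routine, separates the two callers: the checked version reports the rejection, while the unchecked version carries on with a missing result. Values in the gap between the two validation rules are therefore natural test inputs.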
9. System and Acceptance Testing
System and acceptance testing focus on the testing of complete systems. This module presents a few observations about the similarities and differences between system and acceptance testing because the differences are slight, but important. The most significant difference between acceptance and system testing is one of viewpoint. System testing is primarily the concern of the developers or suppliers of software. Acceptance testing is primarily the concern of the users of software.
9.1. Similarities
Aim to demonstrate that documented requirements have been met: Let’s take as an example a middle-of-the-road IT application. Say you’re building a customer information system, or a help desk application, or a telesales system. Both system and acceptance testing have one aim: to demonstrate that the documented requirements have been met. The documented requirements might be the business requirements, or what’s in the functional spec, or the technical requirements.
Should be independent of designers/developers: In system and acceptance testing there’s a degree of independence between the designers of the test and the developers of the software.
Formally designed, organised and executed: There also needs to be a certain amount of formality because it’s a team effort; it’s never one individual system testing.
Incidents raised, managed in a formal way: Part of the formality is that you run tests to a plan and you manage incidents.
Large scale tests, run by managed teams: Another similarity is that both system and acceptance tests are usually big tests; they’re usually a major activity within the project.
9.2. System testing
A systematic demonstration that all features are available and work as specified: If you look at system testing from the point of view of the supplier of the software, system testing tends to be viewed as how the supplier demonstrates that they’ve met their commitment.
This might be in terms of a contract or with respect to meeting a specification for a piece of software that they’re going to sell. Run by/on behalf of suppliers of software It tends to be inward looking. The supplier does it. We’re looking at how the supplier is going to demonstrate that what they deliver to a customer is okay. Now, that may not be what the customer wants, but they’re looking at it from the point of view of their contract or their specification. This makes it kind of an introspective activity. Because it is done by the organisation that developed the software, they will tend to use their own trusted documentation, the functional specification that they wrote. They will go through their baseline document in detail and identify every feature that should be present and prepare test cases so that they can demonstrate that they comprehensively meet every requirement in the specification. 9.3. Functional and non-functional system testing System testing splits into two sides - functional testing and non-functional testing. There is almost certainly going to be a question on functional and non-functional testing so I need to be quite clear about what the difference between these two are. Functional system testing The simplest way to look at functional testing is that users will normally write down what they want the system to do, what features they want to see, what behaviour they expect to see in the software. These are the functional requirements. The key to functional testing is to have a document stating these things. Once we know what the system should do, then we have to execute tests that demonstrate that the system does what it says in the specification. Within system testing, fault detection and the process of looking for faults is a major part of the test activities. It’s less about being confident. It’s more about making sure that the bugs are gone. That’s a major focus of system testing. 
Non-functional system testing Non-functional testing is more concerned with what we might call technical requirements – like performance, usability, security, and other associated issues. These are things that, very often, users don’t document well. It’s not unusual to see a functional requirement document containing hundreds of pages and a non-functional requirement document of one page. Requirements are often a real problem for non-functional testing. Another way to look at non-functional testing is to focus on how it delivers the specified functionality. How it does what it does. Functional testing is about what the system must do. Non-functional is about how it delivers that service. Is it fast? Is it secure? Is it usable? That’s the non-functional side. 9.4. Acceptance testing
Acceptance testing is from a user viewpoint. We tend to treat the system as a great big black box and we’ll look at it from the outside. We don’t take much interest in knowing how it was built, but we need to look at it from the point of view of how we will use it. Fit with business process is the imperative How does the system meet our business requirements? How does it fit the way that we do business? Simplistically, does the system help me do my job as a user? If it makes my life harder, I’m not going to use it, no matter how clever it is or how sophisticated the software is. Emphasis on essential features Users will test the features that they expect to use and not every single feature offered, either because they don’t use every feature or because some features are really not very important to them. Tests designed around how users use the system. The tests are geared around how the system fits the work to be done by the user and that may only use a subset of the software. Usual to assume that all major faults have been removed and the system works It is usual to assume at acceptance testing that all major faults have been removed by the previous component, link and system testing and that the system 'works'. In principle, if earlier testing has been done thoroughly, then it should be safe to assume the faults have been removed. In practice, earlier testing may not have been thorough and acceptance testing can become more difficult. When we buy an operating system, say a new version of Microsoft Windows, we will probably trust it if it has become widely available. But will we trust that it works for our usage? If we’re Joe Public and we’re just going to do some word-processing, we’ll probably assume that it is okay. It’s probably perfectly adequate, and we’re going to use an old version of Word on it and it will probably work just fine. 
If on the other hand, we are a development shop and we’re writing code to do with device drivers, it needs to be pretty robust. The presumption that it works is no longer safe because we’re probably going to try and break it. That’s part of our job. So this aspect of reliability, this assumption about whether or not it works, is basically from your own perspective. Acceptance tests: Acceptance testing is usually on a smaller scale than the system test. Textbook guidelines say that functional system testing should be about four times as much effort as acceptance testing. You could say that for every user test, the suppliers should have run, around four tests. So, system tests are normally of a larger scale than acceptance tests. On some occasions, the acceptance test is not a separate test, but a sub-set of the system test. The presumption is that we’re hiring a company to write software on our behalf and we’re going to use it when it’s delivered. The company developing the software will run their system testing on their environment. We will also ask them to come to our test environment and to rerun a subset of their test that we will call our acceptance test. 9.5. Design-based testing Design-based testing tends to be used in highly technical environments. For example, take a company who are rewriting a billing system engine that will fit into an existing system. We may say that a technical test of the features will serve as an acceptance test as it is not appropriate to do a ‘customer’ or ‘user’ based test. It would be more appropriate to run a test in the target environment (where it will eventually need to run). So, it’s almost like the supplier will do a demonstration test. Given that system testing is mainly black box, it relies upon design documents, functional specs, and requirements documents for its test cases. We have a choice, quite often, of how we build the test. 
Again, remember the “V” model, where we have an activity to write requirements, functional specs, and then do design. When we do system testing, what usually happens is that it’s not just the functional spec that is used. Some tests are based on the design. And no supplier who is providing a custom-built product should ignore the business requirements because they know that if they don’t meet the business requirements, the system won’t be used. So, frequently, some tests may be based on the business requirements as well. Tests are rarely based on the design alone.
We can scan design documents or the features provided by the system: Let’s think about what the difference is between testing against these different baselines (requirements, functional specs and design documents). Testing against the design document is using a lower level, more technically-oriented document. You could scan the document and identify all of the features that have been built. In principle, this is what has been built. Remember that it is not necessarily what the user has asked for, but what was built. You can see from the design document what conditions, what business rules, what technical rules have been used. We can therefore test those rules. A design-based test is very useful because it can help demonstrate that the system works correctly. We can demonstrate that we built it the way that we said we would.
Design based tests: If you base your tests on a design, it’s going to be more oriented towards the technology utilised and what was built rather than what was asked for. Remember that the users’ requirements are translated into a functional spec and eventually to a design document. Think of each translation as an interpretation. Two things may happen: a resulting feature may not deliver functionality in the way a user intended, and if a feature is missing, you won’t spot it.
So, if you test against the design document, you will never find a missing requirement because it just won’t be there to find fault with (if there’s a hole in your software it’s because there’s a hole in your design). There is nothing to tell you what is "missing" using the design document alone.
A design-based test is also strongly influenced by the system provided. If you test according to the design, the test will reflect how the system has been designed and not how it is meant to be used in production. Tests that are based on design will tend to go through features, one by one, right through the design document from end to end. It won’t be tested in the ways that users will use it, and that might not be as good a test. 9.6. Requirements-based testing We can scan requirements documents: The requirements document says what the users want. If we scan the requirements document, it should say which features should be in the system. And it should say which business rules and which conditions should be addressed. So, it gives us information about what we want the system to do. Requirements based tests: If it can be demonstrated that the system does all these things, then the supplier has done a good job. But testing may show that actually there are some features that are missing in the system. If we test according to the requirements document, it will be noticeable if things are missing. Also, the test is not influenced by the solution. We don’t know and we don’t care how the supplier has built the product. We’re testing it as if it were a black box. We will test it the way that we would use it and not test it the way that it was built. 9.7. Requirements v specifications Is it always possible to test from the requirements? No. Quite often, requirements are too high-level or we don’t have them. If it’s a package, the requirements may be at such a high-level that we are saying, for example, we want to do purchasing, invoice payment, and stock control. Here’s a package, go test it. In reality, requirements documents are often too vague to be the only source of information for testing. They’re rarely in enough detail. One of the reasons for having a functional spec is to provide that detail; the supplier needs that level of detail to build the software. 
The problem is that if you use the functional spec or the design document to test against, there may have been a mistranslation and that means that the system built does not meet the original requirements or that something has been left out. Functional specification Developers: "this is what we promised to build" The requirements are documented in a way that the users understand. And the functional spec, which is effectively the response from the supplier, gives the detail and the supplier will undertake to demonstrate how it meets the users requirements. The functional spec is usually structured in a different way than the requirements document. A lot more detail, and in principle, every feature in the functional spec should reflect how it meets these requirements. Quite often, you’ll see two documents delivered – one is the functional spec and one is a table of references between a feature of the system and how it meets a requirement. And in principle, that’s how you would spot gaps. In theory, a cross-reference table should help an awful lot. User or business requirements System tests may have a few test cases based on the business requirements just to make sure that certain things work the way that they were intended, but most of the testing tends to use the functional spec and the design documents. Users: "this is what we want" From the point of view of acceptance testing, you assume system testing has been done. The system test is probably more thorough than the acceptance test will be. When you come to do an acceptance test, you use your business requirements because you want to demonstrate to the users that the software does what they want. When a gap is detected because what the user wanted is different than what the developers built, then you have a problem. And that is probably why you still need system testing and acceptance testing. Not always the same thing... 
Probably the real value is that whoever wrote the table has by default checked that all of the features are covered. But many functional specs will not have a cross-reference table to the requirements. This is a real problem because these could be large documents, maybe 50 pages, 100 pages… this might be 500 pages.
9.8. Problems with requirements
Another thing about a loose requirement is that when the supplier delivers the system and you test against those requirements, if you don’t have the detail, the supplier is going to say, ‘you never said that you wanted that, because you didn’t specify it’. So, the supplier is expecting payment for a product that the users don’t think works. The supplier contracted to deliver a system that met the functional specs, not the business requirements. You have to be very careful.
Requirements don't usually give us enough information to test
Intents, not detailed implementation
Typically a requirements statement says ‘this is what we intend to do with the software’ and ‘this is what we want the software to do’. It doesn’t say how it will do it. It’s kind of a wish list and that’s different from a statement of actuality. It’s intent, not implementation.
Need to identify features to test
From this ‘wish list’ you need to identify all of the features that need to be tested. Take an example of a requirements statement that says ‘the system must process orders’. How will it process orders? Well, that’s up to the supplier. So, it’s hard to figure out from the requirements how to test it; often you need to look at the specification.
Many details might be assumed to exist, but can't be identified from requirements
When the user writes the requirement, many details might be assumed to exist. The supplier won’t necessarily have those assumptions, so they will deliver what they think will work. Assumptions arise from knowledge that you have yourself, but you didn’t transmit to the requirements document. A lot of low-level requirements, like field validation and steps of the process, don’t appear in a requirements document. Again, looking at the processes of a large SAP system, they are incredibly complicated. You have a process called “The Order Process”, and within SAP, there may be 40 screens that you can go through. Now, nobody would use 40 screens to process an order. But SAP can deliver a system that, in theory, could use all 40. The key to it is the configuration that selects only those bits that are useful to you. All the detail that backs up the statement ‘process an order’ is the difference between processing an order as you want to do it and something that’s way over the top. Or the opposite can happen, that is, having a system that processes an order too simplistically when you need variations. That’s another reason why you have to be careful with requirements.
9.9. Business process-based testing
The alternative to using the requirements document is to say from a user’s point of view, ‘we don’t know anything about technology and we don’t want to know anything about the package itself, we just want to run our business and see whether the software supports our activities’.
Start from the business processes to be supported by the system use most important processes Testing from a viewpoint of business process is no different from the unit testing of code. Testing code is white box testing. In principle, you find a way of modelling the process, whether it’s software or the business, you draw a graph, trace paths, and you say that our covering the paths gives us confidence that we’ve done enough. From the business point of view, usually you identify the most important processes because you don’t have time to do everything. What business decisions need to be covered? Is it necessary to test every variation of the process? It depends. What processes do we need to feel confident about in order to give us confidence that the system will be correct? what business decisions need to be covered From this point of view, the users would construct a diagram on how they want a process to work. The business may have an end-to-end process where there’s a whole series of tasks to follow, but within that, there are decisions causing alternative routes. In order to test the business process, we probably start with the most straightforward case. Then, because there are exceptions to the business rules, we start adding other paths to accommodate other cases. If you have a business process, you can diagram the process with the decision points (in other words, you can graph the process). When testers see a graph, as Beizer says, ‘you cover it’. In other words, you make up test cases to take you through all of the paths. When you’ve covered all the decisions and activities within the main processes, then we can have some confidence that the system supports our needs. is a more natural way for users to specify tests. Testing business processes is a much more natural way for users to define a test. If you ask users to do a test and you give them a functional spec and sit them at a terminal, they just don’t know where to start. 
If you say, construct some business scenarios through your process and use the system, based on your knowledge based on the training course, they are far more likely to be capable of constructing test cases. And this works at every level, whether you’re talking about the highest level business processes or the detail of how to process a specific order type. Even if at the moment a particular order type is done manually, the decisions taken, whether by a computer or manually, can be diagrammed. 9.10.User Acceptance testing Intended to demonstrate that the software 'fits' the way the users want to work We have this notion of fit between the system and the business. The specific purpose of the user acceptance test is to determine whether the system can be used to run the business. Planned and performed by or on behalf of users It’s usually planned and performed by, or on the behalf of, the users. The users can do everything or you can give the users a couple of skilled testers to help them construct a test plan. It’s also possible to have the supplier or another third-party do the user acceptance test on behalf of the users as an independent test, but this cannot be done without getting users involved. User input essential to ensure the 'right things' are checked It’s not a user test unless you’ve got some users involved. They must contribute to the design of that test and have confidence that the test is representative of the way they want to work. If the users are going to have someone else run a test, they must buy into that and have confidence in the approach. The biggest risk with an independent test group (i.e., not the users) is that the tests won’t be doing what the user would do.
Here’s an example. Most people have bought a second-hand car. Suppose that you went into a showroom, into the forecourt. And you walk around the forecourt in a car dealer’s, and the model that you want is there. And you look at it and you think, well the colour is nice, and you look inside the window and the mileage is okay. And you know from the magazines that it goes really, really fast. And you think, well I’d like to look at this. And the car dealer walks up to you and says, hello sir – can I help you? And you say "I’d like to look at this car, I’d like to take it for a test drive". And the car dealer says, "no, no, no – you don’t want to do that, I’ve done that for you." Would you buy the car? It’s not likely that you’re going to buy the car? Assuming that the car dealer is trustworthy, why wouldn’t you buy a car from a dealer that said he’d tested the car out on your behalf? Because, his requirements may be different than yours. If he does the test – if he designs the test and executes the test – it’s no guarantee that you’ll like it. Software testing differs from this example in one respect. Driving a car is a very personal thing – the seat’s got to be right, the driving position, the feel, the noise, etc. It’s a personal preference. With software, you just want to make sure that it will do the things that the user wants. So, if the user can articulate what these things are, potentially, you can get a third party to do at least part of the testing. And sometimes, user acceptance tests can be included, say as part of a systems test done by the supplier, and then re-run in the customer’s environment. The fundamental point here is that the users have to have confidence that the tests represent the way they want to do business. When buying a package, UAT may be the only form of testing applied. Packages are a problem because there is no such notion of system testing; you only have acceptance testing. 
That’s the only testing that’s visible if it’s a package that you’re not going to change. Even if it is a package that you are only going to configure (not write software for), UAT is the only testing that’s going to happen. A final stage of validation UAT is usually your last chance to do validation. Is it the right system for me? Users may stage any tests they wish but may need assistance with test design, documentation and organisation The idea of user acceptance testing is that users can do whatever they want. It is their test. You don’t normally restrict users, but they often need assistance to enable them to test effectively. Model office approach: Another approach to user acceptance testing is using a model office. A model office uses the new software in an environment modelled on the business. If, for example, this is a call centre system, then we may set up 5 or 6 workstations, with headsets and telephone connections, manned by users. The test is then run using real examples from the business. So, you’re testing the software, the processes by the people who will be using it. Not only will you test the software, you will find out whether their training is good enough to help them do their job. So a model office is another way of approaching testing and for some situations, it can be valuable. 9.11.Contract acceptance testing Aims to demonstrate that the supplier's obligations are met Contract acceptance testing is done to give you evidence that the supplier’s contractual obligations have been met. In other words, the purpose of a contract acceptance is to show that a supplier has done what they said they would do and you should now pay them. Similar to UAT, focusing on the contractual requirements as well as fitness for purpose The test itself can take a variety of forms. It could be a system test done by a supplier. 
It could be what we call a factory acceptance test which is a test done by the supplier that is observed, witnessed if you like, by the customer. Or you might bring the software to the customer’s site and run a site acceptance test. Or it could even be the user acceptance test. Contract should state the acceptance criteria The contract should have clear statements about the acceptance criteria, the acceptance process and the acceptance timescales. Stage payments may be based on successful completion. Contract acceptance, when you pay the supplier, might be on the basis of everything going correctly, all the way through, that is 100% payment on final completion of the job. Alternatively, payment might be staged against particular milestones. This situation is more usual, and is particularly relevant for large projects involving lots of resources spanning several months or even a year or more. In those cases, for example, we might pay 20% on contract execution and thereafter all payments are based on achievement. Say another 20% on completion of the build and unit test phase, 20% when the systems test is completed satisfactorily, 20% when the performance criteria are met, and the final 20% only when the users are happy as well. So, contract acceptance testing is really any testing that has a contractual significance and in general, it is linked with payment. The reference in the contract to the tests, however, must be specific enough that it is clear to both parties whether the criteria have been met. 9.12.Alpha and beta testing Often used by suppliers of packages (particularly shrink-wrapped)
Up to now, we’ve been covering variations on system and acceptance testing. Are there more types of testing? Hundreds, but here are a few of the more common ones. Alpha and beta testing are normally conducted by suppliers of packaged (shrink-wrapped) software. For example, Microsoft does beta testing for Windows 95, and they have 30,000 beta testers. The actual definitions for alpha and beta testing will vary from supplier to supplier, so it’s a bit open to interpretation what these tests are meant to achieve. In other words, there’s no definitive description of these test types, but the following guidelines generally apply.
Alpha testing normally takes place on the supplier site
An alpha test is normally done by users that are internal to the supplier. An alpha test uses an early release of a product, that is, before it is ready to ship to the general public or even to the beta testers. Typically it is given to the marketers or other parties who might benefit from knowing the contents of the product. For example, the marketers can decide how they will promote its features and they can start writing brochures. Or we might give it to the technical support people so that they can get a feel for how the product works. To recap, alpha testing is usually internal and is done by the supplier.
Beta testing usually conducted by users on their site
Beta testing might be internal, but most beta testing involves customers using a product in their own environment. Sometimes beta releases are made available to big customers because if the supplier wants them to take the next version, they may need a year or two years of planning to make it happen. So, they’ll get a beta version of the next release so they understand what’s coming and they can plan for it.
A beta release of a product is very often a product that’s nearly finished, and is reasonably stable, and usually includes new features that hopefully are of some use to the customer and you are asking your customers to take a view. Assess reaction of marketplace to the product You hear stories about Microsoft having 30,000 beta testers, and you think, don’t they do their own testing? Who are these people? Why are they doing this testing for Microsoft? This type of beta testing is something different. Microsoft isn’t using 30,000 people to find bugs, they have different objectives. Suppose that they gave a beta version of their product out which had no bugs. Do you think that anyone would call them up and say, ‘I like this feature but could you change it a bit’? They leave bugs in so that people will come back to them and give them feedback. So, beta testers are not testers at all really, they’re part of a market research programme. It’s been said that only 30% of a product is planned, the rest is based on feedback from marketers, internal salesmen, beta programmers, so on and so forth. And that’s how a product is developed. When they get 10,000 reports of bugs in a particular area of the software, they know that this is a really useful feature, because everybody who is reporting bugs must be using it! They probably know all about the bug before the product was shipped, but this is a way to see what features people are using. If another bug is only reported three times, then it’s not a very useful feature, otherwise you would have heard about it more. Let’s cut it out of the product. There’s no point in developing this any farther. In summary, beta testing may not really be testing, it may be market research. 9.13.Extended V Model It’s the same as you’ve seen before, but maybe there’s an architectural aspect to this. Multiple systems collaborate in an architecture to deliver a service. And the testing should reflect a higher level than just a system level. 
It could be thought of as the acceptance test of how the multiple systems deliver the required functionality.
9.14.Phase of Integration
Integration testing is not easy – you need an approach or a methodology to do it effectively. First, you need to identify all of the various systems that are in place and then you need to do analysis to decide the type of fault you may find, followed by a process to create a set of tests covering the paths through integration, i.e., the connection of all these systems. And finally, you have to have a way of predicting the expected results so that you can tell whether the systems have produced the correct answer.
10. Non-Functional System Testing
Non-functional requirements (NFRs) are those that state how a system will deliver its functionality. NFRs are as important as functional requirements in many circumstances but are often neglected. The following seven modules provide an introduction to the most important non-functional test types.
10.1.Non-functional test types
Here are the seven types of non-functional testing to be covered in the syllabus. Performance and stress testing are the most common forms of non-functional test performed, but for the purpose of the examination, you should understand the nature of the risks to be addressed and the focus of each type of test.
• Load, performance and stress
• Security
• Usability
• Storage and Volume
• Installation
• Documentation
• Backup and recovery
10.2.Non-functional requirements
Functional - WHAT the system does
First, let’s take an overview of non-functional requirements. Functional requirements say what the system should do.
Non-functional - HOW system does it
Non-functional requirements say how it delivers that functionality – for example, it should be secure, have fast response time, be usable, and so on.
Requirements difficulties
The problem with non-functional requirements is that usually they’re not written down. Users naturally assume that a system will be usable, and that it will be really fast, and that it will work for more than half the day, etc. Many of these aspects of how a system delivers the functionality are assumptions. So, if you look at a functional spec, you’ll see 200 pages of functional requirements and then, maybe, one page of non-functional requirements. If they are written down rather than assumed, they usually aren’t written down to the level of detail that they need to be tested against.
Service Level Agreements may define needs.
Suppose you’re implementing a system into an existing infrastructure and you will have a service level agreement that specifies the service to be delivered – the response times, the availability, security, etc. Requirements often require further investigation before testing can start. In most cases, it is not until this service level agreement is required that the non-functional requirements are discussed. It is common for the first activity of non-functional testing to be to establish the requirements. 10.3.Load, Performance and Stress Testing Let’s establish some definitions about performance testing. We need to separate load, performance, and stress testing. 10.4.Testing with automated loads Background or load testing Background or load testing is any test where you have some kind of activity on the system. For example, maybe you want to test for locking in a database. You might run some background transactions and then try to have a transaction that intercepts these. The purpose of this test clearly is not about response times. It’s usually to see if the functional behaviour of the software changes when there is a load. Stress testing Stress testing is where you push the system as hard as you can, up to its threshold. You might record response times, but stress testing is really about trying to break the system. You increase the load until the system can’t cope with it anymore and something breaks. Then you fix that and retest. This cycle continues until you have a system that will endure anything that daily business can hand it. Performance testing Performance testing is not (and this is where it differs from functional testing) a single test. Performance testing aims to investigate the behaviour of a system under varying loads. It’s a whole series of tests. And basically, the objective of performance testing is to create a graph based on a whole series of tests. 
The idea is to measure the response times from the extremes of a low transaction rate to very high transaction rate. As you run additional tests with higher loads, the response time gets worse. Eventually, the system will fail because it cannot handle the transaction rate. The primary purpose of the test is to show that at the load that the system was designed for, the response times meet the requirement. Another objective of performance or stress testing is to tune the system, to make it faster.
Whether you are doing load testing, performance testing or stress testing, you will need an automated tool to be effective. Performance testing can be done with teams of people, but it gets very boring very quickly for the people doing the testing, it’s difficult to control the test, and it’s often difficult to evaluate the results.
10.5.Formal performance test objectives
The performance test will need to show that the system meets the stated requirements for transaction throughput. This might be that it can process the required number of transactions per hour and per day within the required response times for screen-based transactions or for batch processes or reports. The performance criteria need to be met while processing the required transaction volumes using a full sized production database in a production-scale environment.
• To show system meets requirements for
o transaction throughput
o response times
• To demonstrate
o system functions to specification with
o acceptable response times while
o processing the required transaction volumes on
o a production sized database
10.6.Other objectives
Performance testing will vary depending on the objectives of the business. Frequently there are other objectives besides measuring response times and loads.
Assess system's capacity for growth
The other thing that you can learn from a performance test is the system’s capacity for growth. If you have a graph showing today’s load and performance and then build up to a larger load and measure the performance, you will know what the effect of business growth will be before it happens.
Stress tests to identify weak points
We also use a stress test to identify weak points – that is, to break things under test conditions so that we can make them more robust for the future and less likely to break in production. We can run the tests for long periods just to see if the system will support that.
Soak, concurrency tests over extended periods to find obscure bugs
Soak or concurrency tests can be run over extended timeframes and, after hours of running, may reveal bugs that would only rarely occur in a production situation. Bugs detected in a soak test will be easier to trace than those detected in live running.
Test bed to tune architectural components
We can use performance tests to tune components. For example, we can try a test with a big server or a faster network. So, it’s a test bed for helping us choose the best components.
10.7.Pre-requisites for performance testing
It all sounds straightforward enough, but before you can run a test, there are some important prerequisites. You might call these entry criteria.
Measurable, relevant, realistic requirements
You must have some requirements to test against. This seems obvious, but quite often the requirements are so vague that before you can do performance testing you need to establish realistic, detailed performance criteria.
Stable software system
You must also have stable software - it shouldn’t crash when you put a few transactions through it. Given that you will be putting tens of thousands of transactions through a system, if you can’t get more than a few hundred transactions through the system before it falls over, then you’re not ready to do performance testing.
Actual or comparable production hardware
The hardware and software you will use for the performance testing must be comparable to the production hardware. If it’s going to be implemented on the mainframe, you need a mainframe to test it on. If you need servers and wide area networks and you need to simulate thousands of users, then you need to simulate thousands of users. This is not simple at all.
Controlled test environment
You need a test environment that’s under control. You can’t share it. You’re going to be very demanding on the support resources when you’re running these tests.
Tools (test data, test running, monitoring, analysis and reporting). And you need tools. Not just one tool but maybe six or seven or eight. In fact, you need a whole suite of tools. Process. And you need a process. You need an organised way, a method to help you determine what to do and how to do it.
10.8.The 'task in hand'
Client application running and response time measurement
The task at hand isn’t just generating a load on an environment and running the application. You also need to take response time measurements. In other words, you have to instrument the test. Imagine a fitness test of an athlete on a treadmill in a lab. It’s a controlled environment. The subject has sensors fitted to monitor pulse rate, breathing, oxygen intake, carbon dioxide expelled, blood pressure, sweat, etc. The test is one of monitoring the athlete when running at different speeds over different timeframes. The test could be set up to test endurance or it could be set up to test maximum performance for bursts of activity. No matter what the test is, it is useless as an experiment unless the feedback from the sensors is collected.
Load generation
With an application system, you will keep upping the transaction rate and load until it breaks, and that’s the stress test.
Resource monitoring
But knowing the performance of a system is not enough. You must know what part of the system is doing what. Inevitably when you first test a client-server system, the performance is poor. But this information is not useful at all unless you can point to the bottleneck(s). In other words, you have to have instrumentation. The things to monitor are all the components of the service including the network. The application itself may have instrumentation/logging capability that can measure response times. Most databases have monitoring tools. NT, for example, has quite sophisticated monitoring tools for clients. There’s almost no limit to what you can monitor. And you should try to monitor everything that you might need because re-running a test to collect more statistics is very expensive.
10.9.Test architecture schematic
Load generation and client application running don't have to be done by the same tool.
Resource monitoring is normally done by a range of different tools as well as instrumentation embedded in application or middleware code. In our experience, you always need to write some of your own code to fill in where proprietary tools cannot help.
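As an illustration of the kind of home-grown code that can fill such gaps, here is a minimal sketch in Python of a load generator that also instruments each transaction with a response time measurement. Everything in it is an assumption made for the example: `sample_transaction` is a hypothetical stand-in for a real client transaction, and the names `timed_call`, `run_load` and `summarise` are invented here. A real test would use a proper load tool and far richer monitoring.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def sample_transaction(payload):
    """Hypothetical stand-in for one client transaction against the system under test."""
    time.sleep(0.001)  # simulate a small amount of work
    return payload * 2


def timed_call(func, payload):
    """Run one transaction and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(payload)
    return result, time.perf_counter() - start


def run_load(func, payloads, virtual_users):
    """Drive func from a pool of worker threads ('virtual users'),
    collecting one response time sample per transaction."""
    with ThreadPoolExecutor(max_workers=virtual_users) as pool:
        outcomes = list(pool.map(lambda p: timed_call(func, p), payloads))
    results = [result for result, _ in outcomes]
    timings = [elapsed for _, elapsed in outcomes]
    return results, timings


def summarise(timings):
    """Reduce the raw samples to a few headline figures."""
    return {
        "count": len(timings),
        "mean_s": statistics.mean(timings),
        "max_s": max(timings),
    }


if __name__ == "__main__":
    _, timings = run_load(sample_transaction, range(50), virtual_users=5)
    print(summarise(timings))
```

Keeping the timing wrapper separate from the load driver mirrors the point above: load generation and response time measurement are different jobs and need not be done by the same tool.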
10.10.Response time/load graphs Performance testing is about running a series of tests and measuring the performance of different loads. Then you need to look at the results from a particular perspective. If that is the response time, then look at the maximum load you can apply and still meet the response time requirements. If you are looking at load statistics, you can crank the load up to more than your ‘design’ load, and then take a reading.
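Reading such a graph can be sketched in a few lines of Python. The figures below are invented purely for illustration, and the function names `response_at` and `max_load_within` are assumptions of this example, not part of any tool. Each pair is the outcome of one test run: a transaction rate and the response time measured at that rate.

```python
def response_at(results, load):
    """Estimate the response time at a given load by straight-line
    interpolation between the two nearest measured points.
    Assumes every test run used a different load."""
    points = sorted(results)
    if not points[0][0] <= load <= points[-1][0]:
        raise ValueError("load is outside the measured range")
    for (load0, resp0), (load1, resp1) in zip(points, points[1:]):
        if load0 <= load <= load1:
            return resp0 + (resp1 - resp0) * (load - load0) / (load1 - load0)
    return points[0][1]  # only one point was measured, and load equals it


def max_load_within(results, limit_s):
    """Highest measured load at which the response time, and the response
    time at every lower measured load, is within the limit."""
    best = None
    for load, response in sorted(results):
        if response > limit_s:
            break
        best = load
    return best


# Hypothetical results: (transactions per hour, response time in seconds)
measured = [(1000, 0.8), (2000, 1.0), (3000, 1.4), (4000, 2.6), (5000, 6.0)]

if __name__ == "__main__":
    design_load, requirement_s = 3500, 2.5  # assumed for the example
    print("meets requirement at design load:",
          response_at(measured, design_load) <= requirement_s)
    print("highest measured load within requirement:",
          max_load_within(measured, requirement_s))
```

The first function answers the response time question (does the design load meet the requirement?); the second answers the load question (how far can the load go before the requirement is missed?).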
10.11.Test, analyse, refine, tune cycle Performance testing tends to occur in three stages. One stage is fixing the system to the point where it will run. At first the performance test ends when the system breaks. Quite literally, you’ll run a test and the database or a server will fall over. Or the application on the client crashes. Things break and they get fixed. Then the test is rerun until the next thing falls over. The next stage is identifying the areas of very poor performance that need tuning and attention. Typically, this is when somebody forgot to put the indexes on the database or an entire table of 10,000 rows is being read rather than a single row. The system works (sort of), but it’s slow, dead slow. Or maybe you’re using some unreasonable loads and you’re trying to run 10,000 transactions an hour through an end of month report or something crazy. So, the test itself might also need some refinement too. Eventually, you get to the point where performance is pretty good, and then you’re into the final stage, producing the graphs. And remember, with performance testing, unlike functional testing when you usually get a system that works when you get to the end, there is no guarantee that you’ll get out of this stage. Just because a supplier has said that an architecture would support 2,000 users, doesn’t mean that it is actually possible. To recap, performance testing is definitely a non-trivial and complex piece of work. Assuming that you get the prerequisites of a test environment and decent tools, the biggest obstacles are usually having enough time and stable software. As a rule of thumb, for a performance test that has value, it usually takes around 8-10 elapsed weeks to reach the point where the first reliable tests can be run. Then the system breaks and rework is required, and the start of the iteration phase begins. Again, a rule of thumb for an iteration of test, analyse, tune is about two weeks.
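The rules of thumb above can be turned into a rough planning estimate. The sketch below is only arithmetic on those two figures; the function name and the number of iterations are assumptions made for the example, since how many test, analyse, tune cycles a project needs cannot be known in advance.

```python
def elapsed_weeks(iterations, setup_weeks=(8, 10), weeks_per_iteration=2):
    """Return a (low, high) estimate of elapsed weeks: the time to reach
    the first reliable test run plus the tuning iterations that follow."""
    low, high = setup_weeks
    tuning = iterations * weeks_per_iteration
    return low + tuning, high + tuning


if __name__ == "__main__":
    # Assuming, for illustration, three test-analyse-tune iterations.
    print(elapsed_weeks(3))
```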
10.12.Security Testing The purpose of this section is not to describe precisely how to do security testing (it’s a specialist discipline that not many people can do), but to look at the risks and establish what should be tested. 10.13.Security threats
When we consider security, we normally think of hackers working late into the night, trying to crack into banks and government systems. Although hackers are one potential security problem, the scope of system security spans a broad range of threats.
Natural or physical disasters such as fire, floods, power failures
Security covers undesirable events over which we have no control. However, there are often measures we can take that provide a recovery process or contingency.
Accidental faults such as accidental change, deletion of data, lack of backups, insecure disposal of media, poor procedures
Even the best laid plans can be jeopardised by accidents or unforeseen chains of events.
Deliberate or malicious actions such as hacking by external people or disgruntled or fraudulent employees
Hackers are a popular stereotype presented in the movies. Although the common image of a hacker is of a young college dropout working long into the night, the most threatening hacker is likely to be a professional person, with intimate knowledge of operating system, networking and application vulnerabilities, who makes extensive use of automated tools to speed up the process dramatically.
10.14.Security Testing
Can an insider or outsider attack your system?
There’s an IBM advertisement that illustrates typical security concerns rather well. There are these two college dropouts. One of them says, ‘I’m into the system, I’m in. Look at these vice-presidents, Smith earns twice as much as Jones.’ (He’s into the personnel records.) 'And it's funny they don’t know about it… well, they do now. I’ve just mailed the whole company with it'. Whether this is a realistic scenario or not isn't the point - hackers can wreak havoc if they can get into your systems.
CIA model:
The way that the textbooks talk about security is the CIA model: Confidentiality, Integrity and Availability. Confidentiality is usually what most people think of when they think of security. The question here is "are unauthorised people looking at restricted data?"
The system needs to make certain that authorisation occurs on a person basis and a data basis. The second security point is Integrity. This is not just a matter of people exercising restricted functionality, but of guarding against changes to or destruction of data. Could the workings of a system be disrupted by hacking in and changing data? And the third security point is Availability. It’s not a case of unauthorised functions, but a matter of establishing whether unauthorised access or error could actually disable the system.
10.15.Testing access control
Access control has two functions:
Primarily, if we look at the restrictions of function and data, the purpose of security features and security systems is to stop unauthorised people from performing restricted functions or accessing protected data. And don’t forget the opposite - to allow authorised people to get at their data.
Tests should be arranged accordingly:
So, the tests are both positive tests and negative tests. We should demonstrate that the system does what it should and doesn’t do what it shouldn’t do. Basically, you’ve got to behave like a hacker as well as a normal person. So, you set up authorised users and test that they can do authorised things. And then you test as an unauthorised person and try to do the same things. And maybe you have to be more devious here and try getting at data through different routes. It’s pretty clear what you try and do. The issue really is - authorised people, restricted access, and the combinations of those two.
10.16.Security test case example
When testing the access control of a system or application, a typical scenario is to set up the security configuration and then try to undermine it. By executing tests of authorised and unauthorised access attempts, the integrity of the system can be challenged and demonstrated.
• Make changes to security parameters
• Try successful (and unsuccessful) logins
• Check:
o are the passwords secure?
o are security checks properly implemented?
o are security functions protected?
10.17.Usability Testing
We’re all much more demanding about usability than we used to be. As the Web becomes part of more and more people's lives, and the choice on the web increases, usability will be a key factor in retaining customers. Having a web site with poor usability may mean that the web site (and your business) fails.
10.18.The need for usability testing
Users are now more demanding
Usability can be critical and not just cosmetic The issue of usability for web-based systems is critical rather than cosmetic- usability is absolutely a ‘must have’; poor usability will result in poor sales and the company’s image will suffer. Usability requirements may differ. For some systems, the goal of the system is user productivity and if this isn’t achieved, then the system has failed. User productivity can be doubled or halved by the construction of the system. For management/executive information systems (MIS/EIS), for example, the only usability requirement is that it’s easy to use by managers who may access the system infrequently. In the main today, a system has to be usable enough that it makes the users’ job easier. Otherwise, the system will fall into disuse or never be implemented. 10.19.User requirements Perceived difficulty in expressing requirements Again, as with all non-functional test areas, getting requirements defined is a problem for usability testing. There is a perceived difficulty in writing the rules, e.g., documenting the requirements. It is possible to write down requirements for usability. Some of them are quite clear-cut. Typical requirements: • Messages to users will be in plain English. If you’ve got a team of twenty programmers all writing different messages, inconsistencies with style, content and structure are inevitable. • Commands, prompts and messages must have a standard format, should have clear meanings and be consistent. • Help functions should be available, and they need to be meaningful and relevant. • User should always know what state the system is in. Will the user always know where they are? If the phone rings and they get distracted, can they come back and finish off their task, knowing how they got where they were? • Another aspect of usability is the feedback that the system gives them – does it help or does it get in the way? 
The system will help (not hinder) the user: The previous slide showed positive things that people want that could be tested for. But there’s also ‘features’ that you don’t want. For example, if the user goes to one screen and inputs data, and then goes into another screen and is asked for the data again, this is a negative usability issue. The user shouldn’t have to enter data that isn’t required. Think of a paper form where you have to fill box after box of N/A (not applicable). How many of these are appropriate? The programmer may be lazy and put up a blank form, expecting data to be input, and then processing begins. But it is annoying if the system keeps coming back asking for more data or insists that data is input when for this function, it is irrelevant. The system should only display informational messages as requested by the user. To recap, the system should not insist on confirmation for limited choice entries, must provide default values when applicable, and must not prompt for data it does not need. These requirements can be positively identified or measured. To summarise, once you can identify requirements, you can make tests. 10.20.Usability test cases Test cases based on users' working scenarios What do we mean by usability test cases? The way that we would normally approach this issue is to put a user in a room with a terminal or a PC and ask them to follow some high-level test scripts. For example, you may be asking them to enter an order, but we’re not going to tell them how to do it on a line-by-line basis. We’re just going to give them a system, a user manual and a sheet describing the data, and then let them get on with it. Afterwards you ask them to describe their experience. Other considerations: There are a number of considerations regarding usability test cases. There could be two separate tests staged – one for people that have never seen the system and one for the experienced users. 
These two user groups have different requirements; the new user is likely to need good guidance and the experienced user is likely to be frustrated by over-guidance, slow responses, and lack of short cuts. Of course to be valid, you need to monitor the results (mistakes made, times stuck, elapsed time to enter a transaction, etc.). 10.21.Performing and recording tests User testing can be done formally in a usability lab. Take, for example, a usability lab for a call centre. Four workstations were set up, each with a chair and a PC and a telephone head set monitored by cameras and audio recording so that the users actions could be replayed and analysed. The monitors were wired effectively to a recording studio and observation booth. From the booth or from replays of the films, you could see what the user did and what they saw on the screen, and also what they heard and what they said. From watching these films, you can observe where the system is giving them
difficulty. There are usability labs that, for example, record eye blink rates as this allegedly correlates to a users perception of difficulty. • Need to monitor the user o how often and where do they get stuck? o number of references to help o number of references to documentation o how much time is lost because of the system? When running usability tests, it is normal practice to log all anomalies encountered during the tests. In a usability laboratory with video and audio capture of the user behaviour and the keystroke capture off the system under test, a complete record of the testing done can be obtained. This is the most sophisticated, (but expensive) approach, but just having witnesses observe users can be very effective. It is common to invite the participants to 'speak their mind' as they work. In this way, the developers can understand the thought processes that users go through and get a thorough understanding of their frustrations. • Need to monitor faults o how many wrong screen or function keys etc. o how many faults were corrected on-line o how many faults get into the database • Quality of data compared to manual systems? • How many keystrokes to achieve the desired objective? (too many?) 10.22.Satisfaction and frustration factors The fact that your software works and that you think that your instructions are clear, does not mean that it will never go wrong. Just because you’re shipping 100,000 CD’s with installation kits, doesn’t mean that will always work. Even if you’ve got the best QA process in the world – if you’re shipping a shrink-wrapped product, you have to test whether people of varying capability who have never seen anything like this before can install it from the instructions. So, that’s the kind of thing that usability labs are used for. The kind of information that might get captured is how many times mistakes are made. If you have selected appropriate users for the lab, then the mistakes are due to usability problems in the system. 
• Users often express frustration - find out why • Frustrated expert users o do menus or excess detail slow them down? o do trivial entries require constant confirmation • Frustrated occasional users o are there excess options that are never used? o help documentation doesn't help or is irrelevant o users don't get feedback and reassurance 10.23.Storage and Volume Testing Storage and volume testing are very similar and are often confused. Storage tests address the problem of a system expanding beyond its capacity and failing. Volume testing addresses the risk that a system cannot handle the largest (and smallest) tasks that users need to perform. Storage tests demonstrate that a system's usage of disk or memory is within the design limits over time e.g. can the system hold five-years worth of system transactions? The question is, "can a system, as currently configured, hold the volume of data that we need to store in it?" Assume you are buying an entire system including the software and hardware. What you’re buying should last longer than six months, or more than a year, or maybe five years. You want to know whether the system that you buy today can support, say, five years worth of historical data. So, for storage testing, you aim to predict the eventual volume of data based on the number of transactions processed over the system's lifetime. Then, by creating that amount of data, you test that the system can hold it and still operate correctly. Volume tests demonstrate that a system can accommodate the largest (& smallest) tasks it is designed to perform e.g. can end of month processes be accommodated? The volume-tests are simply looking at how large (or small) a task can the system accommodate? Not how many transactions per second (i.e. transaction rate), but how big a task in terms of the number of transactions in total? The limiting resource might be long-term storage on disk, but it might also be short-term storage in memory, as well. 
Rather than saying that we want to get hundreds of thousands of transactions per hour through our system, we are asking, ‘can we simultaneously support a hundred users, or a thousand users’? We want to push the system to accommodate as many parallel streams of work as it has been designed for...and a few more.
10.24.Requirements
Many people wouldn't bother testing the limits of a system if they thought that the system would give them plenty of warning as a limit is approached, so that the eventual failure is predictable. Disk space is comparatively cheap these days, so storage testing is not the issue it once was. On the other hand, systems are getting bigger and bigger by the day and the failures might be more extreme.
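The storage sizing described in 10.23 (predict the eventual volume of data from the number of transactions processed over the system's lifetime, then compare it with what the system can hold) can be sketched in a few lines of Python. This is only an illustrative sketch: the table name, row counts, row size and capacity figure are assumptions invented for the example, not values from any real system.

```python
from dataclasses import dataclass


@dataclass
class TableEstimate:
    """Hypothetical sizing inputs for one database table."""
    name: str
    initial_rows: int    # standing data loaded on day one
    rows_per_day: int    # new rows created by daily transactions
    bytes_per_row: int   # assumed average row size, including index overhead


def projected_bytes(table: TableEstimate, years: float) -> int:
    """Bytes the table is expected to occupy after `years` of operation."""
    days = round(365 * years)
    total_rows = table.initial_rows + table.rows_per_day * days
    return total_rows * table.bytes_per_row


def storage_report(tables, years, capacity_bytes):
    """Return (total projected bytes, fraction of the capacity used)."""
    total = sum(projected_bytes(t, years) for t in tables)
    return total, total / capacity_bytes


if __name__ == "__main__":
    # Assumed figures: one 'orders' table, five years, 2 GB design limit.
    tables = [TableEstimate("orders", 10_000, 2_000, 500)]
    total, fraction = storage_report(tables, years=5, capacity_bytes=2_000_000_000)
    print(f"projected: {total} bytes, {fraction:.1%} of capacity")
```

The projected total gives the amount of data a storage test would need to create. A fraction close to (or above) 1.0 suggests that the test should also examine how the system behaves as it approaches its limit.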
Requirement is for the system to:
Testing the initial and anticipated storage and volume requirements involves loading the data to the levels specified in the requirements documents and seeing if the system still works. You can’t just create a mountain of dummy data and then walk away.
If the system becomes overloaded (in terms of data volumes) then
Storage and volume testing should also include the characteristics of the system when it is approaching the design limits (say, the maximum capacity of a database). When the system approaches the threshold, does the system crash or does it warn you that the limits are going to be exceeded? Is there a way to recover the situation if it does fail? In IT, when a system fails in a way that we can do something about, we say that it 'fails gracefully'.
10.25.Running tests
When you run tests on a large database, you’re going to wait for failures to occur. You have to consider that as you keep adding rows, eventually it will fail. What happens when it does fail? Do you have a simple message and no one can process transactions, or is it less serious than that? Do you get warnings before it fails?
The test requires the application to be used with designed data volumes
Creation of the initial database by artificial means if necessary (data conversion or randomly generated)
How do you build a production-sized database for a new system? To create a production-sized database you may need to generate millions and millions of rows of data which obey the rules of the database.
Use a tool to execute selected transactions
You almost certainly can’t use the application because you’d have to run it forever and ever, until you could get that amount of data in. The issue there is that you have to use a tool to build up the database. But you need very good knowledge of the database design.
Automated performance test if there is one
You may need to run a realistic performance test.
Volume tests usually precede the performance tests because you can reuse the production-sized database for performance testing. 10.26.Pre-requisites When constructing storage and volume tests there are certain pre-requisites that must be arranged before testing can start. It is common, as in many non-functional areas, for there to be no written requirements. The tester may need to conduct interviews and analysis to document the actual requirements. Often the research required to specify these tests is significant and requires detailed technical knowledge of the application, the business requirements, the database structure and the overall technical architecture. • Technical requirements o database files/tables/structures o initial and anticipated record counts • Business requirements o standing data volumes o transaction volumes • Data volumes from business requirements using system/database design knowledge. 10.27.Installation Testing Installation testing is relevant if you’re selling shrink-wrapped products or if you expect your 'customers', who may be inhouse users, to do installations for themselves. If you are selling a game or a word-processor or a PC-operating system, and it goes in a box with instructions, an install kit, a manual, guarantees, and anything else that’s part of the package, then you should consider testing the entire package from installation to use. The installation process must work because if it’s no good, it doesn’t matter how good your software is; if people can’t get your software installed correctly, they’ll never get your software running - they'll complain and may ask for their money back. 10.28.Requirements Can the system be installed and configured using supplied media and documentation? shrink-wrapped software may be installed by naïve or experienced users server or mainframe-based software or middleware usually installed by technical staff The least tested code and documentation? 
The installation pack is, potentially, the least tested part of the whole product because it’s the very last thing that you can do. The absolutely last thing you can do, because you may have burnt the CDs already. Once you've burnt the CDs, they can’t be changed. There’s a very short period of time between having a stable, releasable product and shipping it. So, installation testing can be easily forgotten or done minimally.
The last thing written, so may be flaky, but is the first thing the user will see and experience.
10.29.Running tests
Tests are normally run on a clean, 'known' environment that can be easily restored (you may need to do this several times). Typical installation scenarios are to install, re-install, de-install the product and verify the correct operation of the product in between installations. The integrity of the operating system and the operation of other products that reside on the system under test is also a major consideration. If a new software installation causes other existing products to fail, users would regard this as a very serious problem. Diagnosis of the cause is normally extremely difficult and restoration of the original configuration is often a complicated, risky affair. Because the risk is so high, this form of regression testing must be included in the overall installation test plan to ensure that your users are not seriously inconvenienced.
• On a 'clean' environment, install the product using the supplied media/documentation
• For each available configuration:
o are all technical components installed?
o does the installed software operate?
o do configuration options operate in accordance with the documentation?
• Can the product be reinstalled and de-installed cleanly?
10.30.Documentation testing
The product to be tested is more than the software
The product to be tested is more than just the software. When the user buys software, they might receive a CD-ROM containing the software itself, but they also buy other materials including the user guide, the installation pack, the registration card, the instructions on the outside, etc. Documentation can be viewed as all of the material that helps users use the software.
In addition to the installation guide, the user guide, it also includes online Help, all of the graphical images and the information on the packaging box itself. If it is possible for these documents to have faults, then you should consider testing them. • Documentation can include: o user manuals, quick reference cards o installation guides, online help, tutorials, read me files, web site information o packaging, sample databases, registration forms, licences, warranty, packing lists... 10.31.Risks of poor documentation Documentation testing consists of checking or reviewing all occurrences of forms and narratives for accuracy and clarity. If the documentation is poor, people will perceive that the product is of low quality. No matter how good the product is, if the documentation is weak, it will taint the users' view of the product. • Software unusable, error prone, slower to use • Increased costs to the supplier o support desk becomes a substitute for the manual o many problems turn out to be user errors o many 'enhancements' requested because the user can't figure out how to do things • Bad manuals turn customers off the product • Users assume software does things it doesn't and may sue you! 10.32.Hardcopy documentation test objectives Accuracy, completeness, clarity, ease of use Documentation testing tends to be very closely related to usability. Does the document reflect the actual functionality of the documented system? User documentation should reflect the product, not the requirements. Are there features present that are not documented, or worse still, are there features missing from the system? Does the document flow reflect the flow of the system? User documentation should follow the path or flow that a user is likely to use, and not just describe features one by one without attention to their sequence of use. This means that you have to test documentation with the product. Does the organisation of the document make it easy to find material? 
Since the purpose of documentation is to make usage of the system easier, the organisation of the documentation is a key factor in achieving this objective.
10.33.Documentation test objectives Documentation may have several drafts and require multiple tests Early tests concentrate on target audience, scope, organisation issues - reviewed against system requirements documents. Later tests concentrate on accuracy. Eventually, we will use the documentation to install and operate the system and this of course has to be as close to perfect as possible. Documentation tests often find faults in the software. Overall, tests should concentrate on content, not style. Online help has a similar approach Typical checks of on-line documentation cover: does the right material appear in the right context? have online help conventions been obeyed? do hypertext links work correctly? is the index correct and complete? Online help should be task-oriented: is it easy to find help for common tasks? Is help concise, relevant, useful? 10.34.Backup and Recovery Testing We have all experienced hardware and software failures. The processes we use to protect ourselves from loss of our most precious resource (data) are our backup and recovery procedures. Backup and recovery tests demonstrate that these processes work and can be relied upon if a major failure occurs. The kind of scenarios and the typical way that tests are run is to perform full and partial backups and to simulate failures, verifying that the recovery processes actually work. You also want to demonstrate that the backup is actually capturing the latest version of the database, the application software, and so on. • Can incremental and full system backups be performed as specified? • Can partial and complete database backups be performed as specified? • Can restoration from typical failure scenarios be performed and the system recovered? 10.35.Failure scenarios A large number of scenarios are possible, but few can be tested. The tester needs to work with the technical architect to identify the range of scenarios that should be considered for testing. Here are some examples. 
• Loss of machine - restoration/recovery of entire environment from backups
• Machine crash - automatic database restoration/recovery to the point of failure
• Database roll-back to a previous position and roll-forward from a restored position
Typical Test Scenario
Typically you take checkpoints using reports showing specific transactions and totals of particular subsets of data as you go along. Start by performing a full backup, then do some reports, execute a few transactions to change the content of the database and rerun the reports to demonstrate that you have actually made those changes, followed by an incremental backup. Then, reinstall the system from the full backup, and verify with the reports that the data has been restored correctly. Apply the incremental backup and verify the correctness, again by rerunning the reports. This is typical of the way that tests of minor failures and recovery scenarios are done.
• Perform a full backup of the system
o Execute some application transactions
o Produce reports to show changes ARE present
• Perform an incremental backup
• Restore system from full backup
o Produce reports to show changes NOT present
• Restore system from incremental backup
o Produce reports to show changes ARE present.
While entering transactions into the database, bring the machine down by causing (or simulating) a machine crash
You can also do more interesting tests that simulate a disruption. While entering transactions into the system, bring the machine down - pull the plug out, do a shut-down, or simulate a machine crash. You should, of course, seek advice from the hardware engineers on the best way to simulate these failures without causing damage to servers, disks, etc.
Reboot the machine and demonstrate, by means of query or reporting, that the database has recovered the transactions committed up to the point of failure.
The principle is again that when you reboot the system and bring it back on line, you have to conduct a recovery from the failure. This type of testing requires you to identify components and combinations of components that could fail, and simulate the failures of whatever could break, and then using your systems, demonstrate that you can recover from this.
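The typical test scenario in 10.35 (full backup, transactions, incremental backup, restore, with checkpoint reports in between) can be expressed as an executable check. The sketch below is a toy illustration only: `ToySystem` and its methods are hypothetical stand-ins for a real database and its backup tooling, and the in-memory "backups" are an assumption made so that the example is self-contained.

```python
import copy


class ToySystem:
    """Hypothetical stand-in for the system under test: a keyed table of balances."""

    def __init__(self):
        self.data = {}
        self.changed = set()  # keys touched since the last backup

    def apply(self, key, value):
        """Execute one 'transaction' that inserts or updates a record."""
        self.data[key] = value
        self.changed.add(key)

    def full_backup(self):
        self.changed.clear()
        return copy.deepcopy(self.data)

    def incremental_backup(self):
        """Capture only records changed since the last backup (deletions not modelled)."""
        delta = {k: copy.deepcopy(self.data[k]) for k in self.changed}
        self.changed.clear()
        return delta

    def restore_full(self, backup):
        self.data = copy.deepcopy(backup)
        self.changed.clear()

    def restore_incremental(self, delta):
        self.data.update(copy.deepcopy(delta))

    def report(self):
        """Checkpoint report: record count and control total."""
        return len(self.data), sum(self.data.values())


def run_backup_recovery_scenario():
    system = ToySystem()
    system.apply("A", 100)
    system.apply("B", 200)

    full = system.full_backup()
    before = system.report()

    system.apply("B", 250)   # change an existing record
    system.apply("C", 50)    # add a new record
    after = system.report()
    assert after != before              # reports show changes ARE present

    incremental = system.incremental_backup()

    system.restore_full(full)
    assert system.report() == before    # changes NOT present

    system.restore_incremental(incremental)
    assert system.report() == after     # changes ARE present again
    return before, after
```

In a real test the reports would be run against the restored database rather than an in-memory dictionary, but the sequence of checkpoints and comparisons is the same.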
11. Maintenance Testing
The majority of effort expended in the IT industry is to do with maintenance. The problem is that the textbooks don’t talk about maintenance very much because it's often complicated and 'messy'. In the real world, systems last longer than the project that created them. Consequently, the effort required to repair and enhance systems during their lifetime exceeds the effort spent building them in the first place.
11.1.Maintenance considerations
Poor documentation makes it difficult to define baselines
The issue with maintenance testing is often that the documentation, if it exists, is not relevant or helpful when it comes to doing testing.
Maintenance changes are often urgent
Specifically here we are talking about corrective maintenance, that is, bug-fixing maintenance rather than new developments. The issue about bug-fixing is that it’s often required immediately. If it is a serious bug that’s just come to light, it has to be fixed and released back into production quickly. So, there is pressure not to do elaborate testing. And don’t forget, there’s pressure on the developer to make the change in a minimal time. This situation doesn’t minimise their error rate!
11.2.Maintenance routes
Essentially, there are two ways of dealing with maintenance changes. Maintenance fixes are normally packaged into manageable releases.
• Groups of changes are packaged into releases; for adaptive or non-urgent corrective maintenance.
• Urgent changes handled as emergency fixes; usually for corrective maintenance
It is often feasible to treat maintenance releases as abbreviated developments. Just like normal development, there are two stages: definition and build.
11.3.Release Definition
Maintenance programmers do an awful lot of testing. Half of their work is usually figuring out what the software does and the best way to do this is to try it out. They do a lot of investigation initially to find out how the system works.
When they have changed the system, they need to redo that testing.
Development Phase/Activity | Maintenance Tasks
Feasibility | Evaluate Change Request (individually) to establish feasibility and priority; Package Change Requests into a maintenance package
User Requirements Specification | Elaborate Change Request to get full requirements
Design | Specify changes; Do Impact Analysis; Specify secondary changes
11.4.Maintenance and regression testing
Maintenance package handled like development except testing focuses on code changes and ensuring existing functionality still works
What often slips is the regression testing unless you are in a highly disciplined environment. Unless you’ve got an automated regression test pack, maintenance regression testing is usually limited to a minimal amount. That’s why maintenance is risky. If tests from the original development project exist, they can be reused for maintenance regression testing, but it's more common for regression test projects aimed at building up automated regression test packs to have to start from scratch. If the maintenance programmers record their tests, they can be adapted for maintenance regression tests. Regression testing is the big effort. Regression testing dominates the maintenance effort as it usually takes more than half of the total effort for maintenance. So, part of your maintenance budget must be to do a certain amount of regression testing and, potentially, automation of that effort as well.
Maintenance fixes are error-prone - 50% chance of introducing another fault so regression testing is key
Regression testing dominates test effort - even with tool support
If release is urgent and time is short, can still test after release
11.5.Emergency maintenance
You could make the change and install it, but test it in your test environment. There’s nothing stopping you from continuing to test the system once it’s gone into production. In a way, this is a bit more common than it should be. Releasing before all regression testing is complete is risky, but if testing continues, the business may not be exposed for too long as any bugs found can be fixed and released quickly. • Usually "do whatever is necessary" • Installing an emergency fix is not the end of the process • Once installed you can: o continue testing o include it for proper handling in the next maintenance release
12. Introduction to Testing Techniques (C & D)
12.1.Test Techniques and the Lifecycle
12.2.Testing throughout the life cycle: the W model
12.3.Comparative testing efficiencies
Module C: Black Box or Functional Testing
12.3.1.Equivalence Partitioning
  Equivalence partitioning
  Equivalence partitioning example
  Identifying equivalence classes
  Output partitions
  Hidden partitions
12.3.2.Boundary Value Analysis
  Boundary value analysis example
12.4.White Box or Structural Testing
12.4.1.Statement Testing and Branch Testing
  Path testing
  Models and coverage
  Branch coverage
  Coverage measurement
  Control flow graphs
  Sensitising the paths
  From paths to test cases
12.5.White Box vs. Black Box Testing
12.6.Effectiveness and efficiency
12.7.Test Measurement Techniques
12.8.Error Guessing
12.8.1.Testing by intuition and experience
12.8.2.Examples of traps
Module D: Reviews or Static Testing
i. Why do peer reviews?
ii. Cost of fixing faults
iii. Typical quantitative benefits
iv. What and when to review
v. Types of Review
vi. Levels of review 'formality'
vii. Informal reviews
viii. Walkthroughs
ix. Formal technical review
x. Inspections
xi. Conducting the review meeting
xii. Three possible review outcomes
xiii. Deliverables and outcomes of a review
xiv. Pitfalls
g. Static Analysis
i. Static analysis defined
ii. Compilers
iii. 'Simple' static analysis
iv. Data flow analysis
v. Definition-use examples
vi. Nine possible d, k, and u combinations
vii. Code and control-flow graph
viii. Control flow graph
ix. Control flow (CF) graphs and testing
x. Complexity measures
Module E : Test Management h. Organisation We need to consider how the testing team will be organised. In small projects, it might be an individual who simply has to organise his own work. In bigger projects, we need to establish a structure for the various roles that different people in the team have. Establishing a test team takes time and attention in all projects. i. Who does the testing?
So who does what in the overall testing process? Programmers do the ad-hoc testing It’s quite clear that the programmers should do the ad hoc testing. They probably code a little and test a little simply to demonstrate to themselves that the last few lines of code they have created work correctly. It’s informal, undocumented testing and is private to the programmer. No one outside the programming team sees any of this. Programmers, or other team members may do sub-system testing Subsystem testing is component testing and link testing. The programmers who wrote the code and interfaces normally do the testing simply because it requires a certain amount of technical knowledge. On occasions, it might be conducted by another member of the programming team, either to introduce a degree of independence or to spread out the workload. Independent teams usually do system testing System testing addresses the entire system. It is the first point at which we’d definitely expect to see some independent test activity (in so far as the people who wrote the code won’t be doing the testing). For a nontrivial system, it’s a large-scale activity and certainly involves several people requiring problem management and attention to organisational aspects. Team members include dedicated testers and business analysts or other people from the IT department, and possibly some users. Users (with support) do the users acceptance testing User acceptance testing, on the other hand, is always independent. The users bring their business knowledge to the definition of a test. However, they normally need support on how to organise the overall process and how to construct test cases that are viable. Independent organisations may be called upon to do any of the above testing formally. On occasions there is a need to demonstrate complete independence in testing. This is usually to comply with some regulatory framework or perhaps there is particular concern over risks due to a lack of independence. 
An independent company may be hired to plan and execute tests. In principle, third party companies and outsource companies, can do any of the layers of testing from component through system or user acceptance testing, but it’s most usual to see them doing system testing or contractual acceptance testing. j. Independence
Independence of mind is the issue When we think about independence in testing, it’s not who runs the test that matters. If a test has been defined in detail, the person running the test will be following instructions (put simply, the person will be following the test script). Whether a tool or a person executes the tests is irrelevant because the instructions describe exactly what that tester must do. When a test finds a bug, it’s very clear that it’s the person who designed that test that has detected the bug and not the person who entered the data. So, the key issue of independence is not who executes the test but who designs the tests. Good programmers can test their own code if they adopt the right attitude The biggest influence on the quality of the tests is the point of the view of the person designing those tests. It’s very difficult for a programmer to be independent. They find it hard to eliminate their assumptions. The problem a programmer has is that sub-consciously they don’t want to see their software fail. Also, programmers are usually under pressure to get the job done quickly and they are keen to write the next new bit of code which is what they see as the interesting part of the job. These factors make it very difficult for them to construct test cases and have a good chance of detecting faults. Of course, there are exceptions and some programmers can be good testers. However, their lack of independence is a barrier to them being as effective as a skilled independent tester. Buddy-checks/testing can reduce the risk of bad assumptions, cognitive dissonance etc. A very useful thing to do is to get programmers in the same team to swap programs so that they are planning and conducting tests on their colleague’s programs. In doing this, they bring a fresh viewpoint because they are not intimately familiar with the program code; they are unlikely to have the same assumptions and they won’t fall into the trap of ‘seeing’ what they want to see. 
The other reason that this approach is successful is that programmers feel less threatened by their colleagues than by independent testers. Most important is who designs the tests To recap, if tests are documented, then the test execution should be mechanical; that is, anyone could execute those tests. Independence doesn’t affect the quality of test execution, but it significantly affects the quality of test design. The only
reason for having independent people execute tests would be to be certain that the tests are actually run correctly, i.e., using a consistent set of data and software (without manual intervention or patching) in the designated test environment. k. Test team roles
Test manager A Test Manager is really a project manager for the testing project; that is, they plan, organise, manage, and control the testing within their part of the project. There are a number of factors, however, that set a Test Manager apart from other IT project managers. For a start, their key objective is to find faults and, on the surface, that is in direct conflict with the overall project’s objective of getting a product out on time. To others in the overall project, they will appear to be destructive, critical and sceptical. Also, the nature of the testing project changes markedly when moving from early stage testing to the final stages of testing. Lastly, a test manager needs a set of technical skills that are quite specific. The Test Manager is a key role in successful testing projects. Test analyst Test analysts are the people who scope out the testing and gather up the requirements for the test activities to follow. In many ways, they are business analysts because they have to interview users, interpret requirements, and construct tests based on the information gained. Test analysts should be good documenters, in that they will spend a lot of time documenting test specifications, and the clarity with which they do this is key to the success of the tests. The key skills for a test analyst are to be able to analyse requirements documents, specifications and design documents, and derive a series of test cases. The test cases must be reviewable and give confidence that the right items have been covered. Test analysts will spend a lot of time liaising with other members of the project team. Finally, the test analyst is normally responsible for preparing test reports, whether they are involved in the execution of the test or not. Tester What do testers do? Testers build tests. Working from specifications, they prepare test procedures or scripts, test data, and expected results. 
They deal with lots of documentation and their understanding and accuracy is key to their success. As well as test preparation, testers execute the tests and keep logs of their progress and the results. When faults are found, the tester will retest the repaired code, usually by repeating the test that detected the failure. Often a large amount of regression testing is necessary because of frequent or extensive code changes and the testers execute these tests too. If automation is well established, a tester may be in control of executing automated scripts too. Test automation technician The people who construct automated tests, as opposed to manual tests, are ‘test automation technicians’. These people automate manual tests that have been proven to be valuable. The normal sequence of events is for the test automation technician to record (in the same way as a tape-recorder does) the keystrokes and actual outputs of the system. The recording is then used as input to the automation process, where it is edited, using the script language provided by the tool, into an automated test. The role of the test automation technician is therefore to create automated test scripts from manual tests and fit them into an automated test suite. The automated scripts are small programs that must be tested like any other program. These test scripts are often run in large numbers. Other activities within the scope of the test automation technician are the preparation of test data, test cases, and expected results based on documented (designed) test plans. Very often, they need to invent ‘dummy data’ because not every item of data will be in the test plan. The test automation technician may also be responsible for executing the automated scripts and preparing reports on the results if tool expertise is necessary to do this. l. Support staff
DBA to help find, extract, manipulate test data Most systems have a database at their core. The DBA (database administrator) will need to support the activities of the tester for setting up the test database. They may be expected to help find, extract, manipulate, and construct test data for use in the tests. This may involve the movement of large volumes of data, as it is common for whole databases to be exported and imported at the end and start of test cycles. The DBA is a key member of the team. System, network administrators A whole range of technical staff needs to be available to support the testers and their test activities, particularly from system testing through to acceptance testing. Operating system specialists, administrators and network administrators may be required to support the test team, particularly in the non-functional side of testing. In a performance test, for example, system and network configurations may have to change to improve performance. Toolsmiths to build utilities to extract data, execute tests, compare results etc. Where automation is used extensively, a key part of any large team involves individuals known as toolsmiths, that is, people able to write software as required. These are people who have very strong technical backgrounds; programmers,
who are there to provide utilities to help the test team. Utilities may be required to build or extract test data, to run tests, as harnesses, drivers and to compare results. Experts to provide direction There are two further areas where specialist support is often required. On the technical side, the testers may need assistance in setting up the test environment. From a business perspective, expertise may be required to construct system and acceptance tests that meet the needs of business users. In other words, the test team may need support from experts on the business.
i. Configuration Management Configuration Management, or CM, is the management and control of the technical resources required to construct a software artefact. This is a brief definition, but the management and control of software projects is a complex undertaking, and many organisations struggle with chaotic or non-existent control of change, requirements, software components or builds. It is the lack of such control that causes testers particular problems. Because of this, CM is introduced here to give a flavour of the symptoms of poor CM and the four disciplines that make up CM. m. Symptoms of poor configuration management Can't find latest version of source code or match source to object The easiest way to think about where configuration management (CM) fits is to consider some of the symptoms of poor configuration management. Typical examples are when the developer cannot find the latest version of the source code module in development or no one can find the source code that matches the version in production. Can't replicate previously released version of code for a customer Or, if you are a software house, you can’t find the customised version of software that was released to a single customer when a fault is reported on it. Bugs that were fixed suddenly reappear Another classic symptom of poor CM is that a bug might have been fixed, the code retested and signed off, and then the bug reappears in a later version. What might have happened was that the code was fixed and released in the morning, and then in the afternoon it was overwritten by another programmer who was working on the same piece of code in parallel. The changes made by the first programmer were overwritten by the old code so the bug reappeared. Wrong functionality shipped Sometimes when the build process itself is manual and/or unreliable, the version of the software that is tested does not become the version that is shipped to a customer. 
Wrong code tested Another typical symptom is that after a week of testing, the testers report the faults they have found only to be told by the developers ‘actually, you’re testing the wrong version of the software’. Symptoms of poor configuration management are extremely serious because they have significant impacts on testers; most obviously on productivity, but it can be a morale issue as well because it causes a lot of wasted work. Tested features suddenly disappear Alternatively, tested features might suddenly disappear. The screen you might have tested in the morning is no longer visible or available in the afternoon. Can't trace which customer has which version of code This becomes a serious support issue, usually undermining customer confidence. Simultaneous changes made to same source module by multiple developers and some changes lost. Some issues of control are caused by developers themselves, overwriting each other’s work. Here’s how it happens. There are two changes required to the same source module. Unless we work on the changes serially, which causes a delay, two programmers may reserve the same source code. The first programmer finishes and one set of changes is released back into the library. Now what should happen is that when the second programmer finishes, he applies the changes of the first programmer to his code. Faults occur when this doesn’t happen! The second programmer releases his changed code back into the same library, which then overwrites the first programmer’s enhancement of the code. This is the usual cause of software fixes suddenly disappearing. n. Configuration management defined "A four part discipline applying technical and administrative direction, control and surveillance at discrete points in time for the purpose of controlling changes to the software elements and maintaining their integrity and traceability throughout the system development process." 
Configuration Management, or CM, is a sizeable discipline and takes three to five days to teach comprehensively. However, in essence, CM is easy to describe. It is the "control and management of the resources required to construct a software artefact". However, although the principles might be straightforward, there is a lot to the detail. CM is a very particular process that contributes to the management process for a project. CM is a four-part discipline described on the following slides. o. The answers Configuration Management (CM) provides What is our current software configuration?
When implemented, CM can provide confidence that the changes occurring in a software project are actually under control. CM can provide information regarding the current software configuration; whatever version you’re testing today, you can accurately track down the components and versions comprising that release. What is its status? A CM system will track the status of every component in a project, whether that be tested, tested with bugs, bugs fixed but not yet tested, tested and signed off, and so on. How do we control changes to our configuration? Before a change is made, a CM system can be used to identify, at least at a high level, the impact on any other components or behaviour in the software. Typically, an impact analysis can help developers understand, when they make a change to a single component, what other components call the one that is being changed. This will give an indication as to what potential side effects could exist when the change has been made. What changes have been made to our software? Not only will a CM system have information about current status, it will also keep a history of releases so that the version of any particular component within that release can be tracked too. This gives you traceability back to changes over the course of a whole series of releases. Do anyone else's changes affect our software? The CM system can identify all changes that have been made to the version of software that you are now testing. In that respect, it can contribute to the focus for testing on a particular release. p. Software configuration management There are four key areas of Configuration Management or "CM". Configuration Identification relates to the identification of every component that goes into making an application. Very broadly, these are details like naming conventions, registration of components within the database, version and issue numbering, and control form numbering. 
In Status Accounting, all the transactions that take place within the CM system are logged, and this log can be used for accounting and audit information within the CM library itself. This aspect of CM is for management. Configuration Auditing is a checks and balances exercise that the CM tool itself imposes to ensure integrity of the rules, access rights and authorisations for the reservation and replacement of code.
Configuration Control has three important aspects: the Controlled Area/Library, Problem/Defect Reporting, and Change Control. The Controlled Area/Library function relates to the controlled access to the components; the change, withdrawal, and replacement of components within the library. This is the gateway that is guarded to ensure that the library is not changed in an unauthorized way. The second aspect of Configuration Control is problem or defect reporting. Many CM systems allow you to log incidents or defects against components. The logs can be used to drive changes within the components in the CM system. For example, the problem defect reporting can tell you which components are undergoing change because of an incident report. Also, for
a single component, it could tell you which incidents have been recorded against that component and what subsequent changes have been made. The third area of Configuration Control is Change Control itself. In principle, this is the simple act of identifying which components are affected by a change and maintaining the control over who can withdraw and change code from the software library. Change Control is the tracking and control of changes.
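The 'guarded gateway' idea of the controlled library can be illustrated with a small sketch. This is not how any particular CM tool works; the class `ControlledLibrary`, its methods, and the module name `billing.c` are hypothetical names invented for illustration. The sketch assumes each component carries a revision number and that a check-in is refused when it was based on an out-of-date revision, which is one simple way to stop a second programmer silently overwriting the first programmer's fix.

```python
class StaleCheckInError(Exception):
    """Raised when a check-in is based on an out-of-date revision."""


class ControlledLibrary:
    """Toy controlled library (hypothetical): each component has a revision."""

    def __init__(self):
        self._store = {}  # component name -> (revision, content)

    def add(self, name, content):
        """Register a new component at revision 1."""
        self._store[name] = (1, content)

    def check_out(self, name):
        """Return (revision, content) so the caller knows its baseline."""
        return self._store[name]

    def check_in(self, name, base_revision, new_content):
        """Accept a change only if it was made against the latest revision."""
        current_revision, _ = self._store[name]
        if base_revision != current_revision:
            raise StaleCheckInError(
                f"{name}: change is based on revision {base_revision}, "
                f"but the library is at revision {current_revision}; merge first"
            )
        self._store[name] = (current_revision + 1, new_content)
        return current_revision + 1


library = ControlledLibrary()
library.add("billing.c", "original code")

# Two programmers take the same module in parallel.
rev_a, code_a = library.check_out("billing.c")
rev_b, code_b = library.check_out("billing.c")

# The first programmer releases a fix back into the library.
library.check_in("billing.c", rev_a, code_a + " + fix A")

# The second programmer's change is based on the old revision, so it is refused
# instead of overwriting fix A.
try:
    library.check_in("billing.c", rev_b, code_b + " + change B")
    second_check_in = "accepted"
except StaleCheckInError:
    second_check_in = "rejected"
```

The second programmer would then have to check out the latest revision, reapply the change on top of it, and check in again, so both changes survive.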
q. CM support to the tester What does configuration management give to the tester? A strong understanding and implementation of CM helps testers... A well-implemented CM system helps testers manage their own testware, in parallel with the software that is being tested. Manage their own testware and their revision levels efficiently In order to ensure that the test materials are aligned with the versions of software components, a good CM system allows test specifications and test scripts to be held or referenced within the CM system itself (whether the CM system holds the testware items or the references to them doesn’t really matter). Associate a given version of a test with the appropriate version of the software to be tested With the test references recorded beside the components, it is possible to relate the tests used to each specific version of the software. Ensure traceability to requirements and problem reports. The CM system can provide the link between requirements documents, specifications, test plans, test specifications, and eventually to an incident report. Some CM tools provide support to testers throughout the process and some CM systems just have the incident reporting facilities that relate directly to the components within a CM system. Ensure problem reports can identify s/w and h/w configurations accurately If the CM system manages incident reports, it’s possible to identify the impact of change within the CM system itself. When an incident is recorded or logged in the CM system under ‘changes made to a component’, the knock-on effects in other areas of the software can potentially be identified through the CM system. This report will give an idea of the regression tests that might be worth repeating. Ensure the right thing is built by development Good CM also helps to ensure that the developers actually build the software correctly. By automating part of the process, a good CM tool eliminates human errors from the build process itself. 
Ensure the right thing is tested This is obviously a good thing because it ensures that the right software is tested. Ensure the right thing is shipped to the customer. And the right software is shipped to a customer. In other words, the processes of development, testing and release to the customer’s site are consistent. Having this all under control improves the quality of the deliverable and the productivity of the team.
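As a rough illustration of how knock-on effects might point to regression tests worth repeating, the sketch below walks a 'who calls whom' table backwards from a changed component and then picks the tests recorded against any impacted component. The component names, the test identifiers, and both tables are made-up data, and the two functions are hypothetical helpers rather than features of any real CM product.

```python
from collections import deque

# Hypothetical data: each component and the components it calls.
CALLS = {
    "order_screen": ["pricing", "customer_lookup"],
    "invoice_run": ["pricing"],
    "pricing": ["tax_rules"],
    "customer_lookup": [],
    "tax_rules": [],
}

# Hypothetical test register: each test and the components it exercises directly.
TESTS = {
    "ST-01": ["order_screen"],
    "ST-02": ["invoice_run"],
    "ST-03": ["customer_lookup"],
}


def impacted_components(changed, calls):
    """Return the changed component plus everything that calls it, directly or indirectly."""
    # Invert the table so we can look up the callers of each component.
    callers = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    impacted = {changed}
    queue = deque([changed])
    while queue:
        component = queue.popleft()
        for caller in callers.get(component, ()):
            if caller not in impacted:
                impacted.add(caller)
                queue.append(caller)
    return impacted


def regression_candidates(changed, calls, tests):
    """Return the tests that exercise any impacted component, sorted by identifier."""
    impacted = impacted_components(changed, calls)
    return sorted(
        test_id
        for test_id, components in tests.items()
        if impacted.intersection(components)
    )


candidates = regression_candidates("tax_rules", CALLS, TESTS)
```

With this data, a change to `tax_rules` reaches `pricing` and, through it, both `order_screen` and `invoice_run`, so `ST-01` and `ST-02` are selected and `ST-03` is not. The result is only a starting list; a tester would still apply judgement about risk.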
CM support to the project manager
A strong understanding and implementation of CM helps the project manager to: A CM tool provides support to the project manager too. A good CM implementation helps the project manager understand and control the changes to the requirements, and potentially, the impacts. It allows the project members to develop code, knowing that they won’t interfere with each other’s code, as they reserve, create, and change components within the CM system. Programmers are frequently tempted to ‘improve’ code even if there are no faults reported; they will sometimes make changes that haven’t been requested in writing or supported by requirements statements. These changes can cause problems and a good CM tool makes it less likely and certainly more difficult for the developers to make unauthorised changes to software. The CM system also provides the detailed information on the status of the components within the library and this gives the project manager a closer and more technical understanding of the project deliverables themselves. Finally, the CM system ensures the traceability of software instances right back to the requirements and the code that has been tested.
i. Test Estimation, Monitoring, and Control In this module, we consider the essential activities required to project manage the test effort. These are estimation, monitoring and control. The difficulty with estimation is obvious: the time taken to test is indeterminate, because it depends on the quality of the software - poor software takes longer to test. The paradox here is that we won't know the quality of the software until we have finished testing. Monitoring and control of test execution is primarily concerned with the management of incidents. When a system is passed into system-level testing, confidence in the quality of the system is finally determined. Confidence may be proved to be well founded or unfounded. In chaotic environments, system test execution can be traumatic because many of the assumptions of completeness and correctness may be found wanting. Consequently, the management of system-level testing demands a high level of management commitment and effort. The big question - "How much testing is enough?" - also arises. Just when can we be confident that we have done enough testing, if we expect that time will run out before we finish? According to the textbook, we should finish when the test completion criteria are met, but handling the pressure of squeezed timescales is the final challenge of software test management. s. Test estimates If testing consumes 50% of the development budget, should test planning comprise 50% of all project planning?
Test Stage          Notional Estimate
Unit                40%
Link/Integration    10%
System              40%
Acceptance          10%
Ask a test manager how long it will take to test a system and they’re likely to say, ‘How long is a piece of string?’ To some extent, that’s true, but only if you don’t scope the job at all! It is possible to make reasonable estimates if the planning is done properly and the assumptions are stated clearly. Let’s start by looking at how much of the project cost is testing. 
Textbooks often quote that testing consumes approximately 50% of the project budget on the average. This can obviously vary depending on the environment and the project. This figure assumes that test activities include reviews, inspections, document walk-throughs (project plans, design and requirements), as well as the dynamic testing of the software deliverables from components through to complete systems. It’s quite clear that the amount of effort consumed by testing is very significant indeed. If one considers the big test effort in a project is, perhaps, half of the total effort in a project, it’s reasonable to propose that test planning, the planning and scheduling of test activities, might consume 50% of all project planning. And that’s quite a serious thing to consider.
Problems in estimating
Total effort for testing is indeterminate Let’s look at the problems in estimating; the difficulty that we have with estimating is that the total effort for testing is indeterminate. If you just consider test execution, you can’t predict before you start how many faults will be detected. You certainly can’t predict their severity; some may be marginal, but others may be real ‘show stoppers’. You can’t predict how easy or difficult it will be to fix problems. You can’t predict the productivity of the developers. Although some faults might be trivial, others might require significant design changes. You can’t predict when testing will stop because you don’t know how many times you will have to execute your system test plan.
But if you can estimate test design, you can work out ratios. However, you can still estimate test design, even if you cannot estimate test execution. If you can estimate test design, there are some rules of thumb that can help you work out how long you should provisionally allow for test execution. u. Allowing enough time to test Allow for all stages in the test process One reason why testing often takes longer than the estimate is that the estimate hasn’t included all of the testing tasks! In other words, people haven’t allowed for all the stages of the test process. Don't underestimate the time taken to set up the testing environment, find data etc. For example, if you’re running a system or acceptance test, the construction, set-up and configuration of a test environment can be a large task. Test environments rarely get created in less than a few days and sometimes require several weeks. Testing rarely goes 'smoothly' Part of the plan must also allow for the fact that we are testing to find faults. Expect to find some and allow for system tests to be run between two and three times. v. 1 – 2 – 3 rules The ‘1-2-3 Rule’ is useful, at least, as a starting point for estimation. The principle is to split the test activities into three stages – specification, preparation, and execution. The ‘1-2-3 Rule’ is about the ratio of the stages. 1 day to specify tests (the test cases) For every day spent on the specification of the test (the test cases or in other words, a description of the conditions to be tested), then it will take two days to prepare the tests. 2 days to prepare tests In the test preparation step we are including specifying the test data, the script, and the expected results. 1-3 days to execute tests (3 if it goes badly) Finally, we say that if everything goes well, it will take one-day to execute the test plan. If things go badly, then it may take three days to execute the tests. 
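The arithmetic of the rule can be written down as a short sketch. The function name `estimate_test_effort` and its parameter names are invented for illustration; the default ratios are the 1, 2 and 1-3 of the rule as stated above, and they are exposed as parameters on the assumption that you may want to substitute ratios from your own experience.

```python
def estimate_test_effort(spec_days, prep_ratio=2.0,
                         exec_ratio_best=1.0, exec_ratio_worst=3.0):
    """Apply the 1-2-3 rule to a specification estimate given in days.

    Every other stage is expressed as a multiple of the specification effort.
    """
    preparation = spec_days * prep_ratio
    execution_best = spec_days * exec_ratio_best
    execution_worst = spec_days * exec_ratio_worst
    return {
        "specification": spec_days,
        "preparation": preparation,
        "execution_best": execution_best,
        "execution_worst": execution_worst,
        "total_best": spec_days + preparation + execution_best,
        "total_worst": spec_days + preparation + execution_worst,
    }


# Example: five days of test specification.
plan = estimate_test_effort(5)
```

For five days of specification this gives 10 days of preparation and between 5 and 15 days of execution, so between 20 and 30 days in total. The value of the sketch is less in the numbers than in keeping the three stages separate so that each can be tracked against actuals.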
1-2-3 is easy to remember, but you may have different ratios So, the rule becomes ‘one day to specify’, ‘two days to prepare’, ‘one day to execute if everything goes well’. Now, because we know that testing rarely goes smoothly, we should allow for 3 days to execute the tests. And that is the ‘3’ in the ‘1-2-3’ rule. The idea of ‘1-2-3’ is easy to remember, but you have to understand that the ratios are based on experience that may not be applicable to your environment. It may be because of the type of system, the environment, the standards applicable, the availability of good test data, the application knowledge of the testers assigned or any number of other factors, which may cause these ratios to vary. From your experience, you may also realize that perhaps a one-day allowance for a perfectly running test may be way too low. And in fact, it may be that the ratio of test execution to specification is much higher than 3 when it goes badly. Important thing is to separate spec/prep/exe. The key issue is to separate specification from preparation and execution and then allocate ratios relating to your own environment and experience. w. Estimate versus actual Here's an example of what might happen the first time you use the 1-2-3 rule. Typically, things will not go exactly as planned. However, the purpose of an estimate is to have some kind of plan that can be monitored. When reality strikes, you can adjust your estimates for next time and, hopefully, have a more accurate estimate based on real metrics, not guesswork.
• Suppose you estimated that it would take:
  o 5 days to specify the test and…
  o 10 days to prepare the test and…
  o 5 to 15 days to execute the test
• When you record actual time it may be that:
  o preparation actually took 3 times specification
  o and execution actually took 1.5 times specification (it went very well)
• Then, you might adjust your 1, 2, 1-3 ratios to: 1, 4, 1.5-4.5
x. Impact of development slippages
Slippages imply a need for more testing: Let’s look at how a slippage of development impacts testing. We know this never happens, don’t we? Well, if it did and the developers, for whatever reason, proposed that they slipped the delivery into your test environment by two weeks, what options do you have? The first thing you might ask is ‘What is the cause of the slippage?’ Were the original estimates too low? Is it now recognised that the project is bigger than was originally thought? Or is the project more complicated than anticipated, either because of the software or the business rules? Or is the reason for
slippage due to the poor quality of their work and unit testing has been delayed? All of these problems tell you something about how much testing you should do. It’s a fairly rational conclusion to make that if development slips, more testing is required. In other words, if the project is more complicated or bigger or the quality of the product is poorer, then we would automatically think that perhaps we’ve underestimated the amount of testing that may be required. However, in reality, we usually see the opposite happen. This is where we get the classic ‘squeeze’ in testing. Because we have a fixed deadline for delivery of the overall system, the slippage in development forces us to make a choice. We cannot have any more time, so we have to ‘squeeze’ the testing. A slippage in development forces a choice: When we enter the testing phase late, logically there are only three options. We can accept the fact that there will be lower quality in the deliverables because we can’t complete the test plan. Or maybe we get more efficient at testing or add more testers and try to compress the schedule. In a short time, this is very difficult to achieve in reality. Or maybe we should prepare for the slippage in delivery. It’s not a choice that any project manager likes to face, as none of the alternatives are very attractive. Given the three, if we leave deadlines fixed, we’ll get lower quality. If we insist on completing the test process as originally planned, then we must anticipate that there might be even further slippage in delivery other than the late start because the number of problems found, given the situation, may well be greater than planned. y. Monitoring progress
Progress through the test plan: Let’s move to looking at how we monitor progress through the test plan. When we start executing tests, we’ll find that although some test scripts run without failure, many run with failures. The measure to watch is the ratio of the number of tests executed without failure to the number executed with failures. We may not log the fact that failing test scripts have not run all of the way through, and until all of the test scripts run to the end, the ratio will not improve. Early on we have many failures and towards the end of the testing the number of failures will decrease, so the ratio should increase over time. What is interesting is to look at the trend and progress rate of this ratio. Incident status If we monitor the incidents themselves, we might track incidents raised, incidents cleared, incidents ‘fixed’ and awaiting retest, and those that are still outstanding. Again, looking at the ratio between those closed and those outstanding, the ratio should improve over time. z. When to stop testing
Test strategy should define policy: There will come a time towards the end of the test execution phase, when we have to consider when to stop testing. In principle, the test strategy should define the policy; that is, the exit criteria for the test phase. The strategy should set clear, objective criteria for completion of the test execution phase. Typically, this contains statements specifying that the test plan is complete, all high-severity faults are fixed (and retested), and that regression tests have been run as deemed necessary.
Not always as clear cut: The difficulty is that this is not always clear-cut because we may have run out of time before we have completed the test plan.
aa. Fault detection rate
Increasing - keep testing: An important point to consider is the fault-detection rate; that is, the rate at which we are raising incidents that, when diagnosed, are actually software faults. The number can be reviewed on a day-by-day or week-by-week basis. If the rate is increasing, we certainly haven’t reached the end of the faults to be found in the system.
Stable and high - keep testing: If the number that we’re finding is high, but stable, we may consider stopping, but we should probably keep testing. As we progress through the test plan, there are usually three distinct stages that we can recognise. Early on, the number of incidents raised increases rapidly, then we reach a peak and the rate diminishes over the final stages. If we run out of time as the number of new incidents being logged is decreasing, it might be safe to stop testing, but we must make some careful considerations: What is the status of the outstanding incidents? Are they severe enough to preclude acceptance? If so, we cannot stop testing and release now.
What tests remain to be completed? If there are tests of critical functionality that remain to be done, it would be unsafe to stop testing and release now. If we are coming towards the end of the test plan, the stakeholders and management may take the view that testing can stop before the test plan is complete if (and only if) the outstanding tests cover functionality that is non-critical or low risk. The job of the tester is to provide sufficient information for the stakeholders and management to make this judgement.
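The reasoning above about the fault-detection rate can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the two-period comparison window and the 10% tolerance are hypothetical and are not taken from the course material. A stable rate is treated conservatively as "keep testing", and the final decision is left with the stakeholders, as the text describes.

```python
from statistics import mean


def detection_trend(faults_per_period, window=2, tolerance=0.1):
    """Classify the recent fault-detection rate.

    Compares the mean of the last `window` periods with the mean of the
    `window` periods before them. `window` and `tolerance` are assumed
    values chosen for illustration only.
    """
    if len(faults_per_period) < 2 * window:
        raise ValueError("need at least 2 * window periods of data")
    previous = mean(faults_per_period[-2 * window:-window])
    recent = mean(faults_per_period[-window:])
    if previous == 0:
        return "increasing" if recent > 0 else "stable"
    change = (recent - previous) / previous
    if change > tolerance:
        return "increasing"
    if change < -tolerance:
        return "decreasing"
    return "stable"


def stop_testing_advice(faults_per_period, severe_incidents_open, critical_tests_remaining):
    """Return advice for the stakeholders; it does not make the decision itself."""
    trend = detection_trend(faults_per_period)
    if trend != "decreasing":
        # Increasing, or stable: the text says we should keep testing.
        return "keep testing: detection rate is " + trend
    if severe_incidents_open > 0:
        return "keep testing: severe incidents outstanding"
    if critical_tests_remaining > 0:
        return "keep testing: critical tests remain"
    return "stopping may be considered; decision rests with stakeholders"
```

For example, weekly counts of 14, 15, 9 and 5 new faults give a decreasing trend, so the advice then depends only on the outstanding incidents and the tests that remain.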
bb. Running out of time
If you have additional tests to run, what are the risks of not running them before release? Suppose we’re coming towards the end of the time in the plan for test execution: what is the risk of releasing the software before we complete the test plan?
What is the severity of outstanding faults? We have to take a look at the severity of the outstanding faults. For each of the outstanding faults, we have to take a view on whether the fault would preclude release. That is, is this problem so severe that the system wouldn’t be worth using or would cause an unacceptable disruption to the business? Alternatively, there may be outstanding faults that the customer won’t like, but they could live with if necessary. It may also be a situation where the fault relates to an end-of-month process or procedure, which the software has to support. If the end-of-month procedure won’t be executed for another forty days or it is a procedure that could be done manually for the first month, then you may still decide to go ahead with the implementation.
Can you continue testing, but release anyway? One last point to consider: just because the software is released doesn’t mean that testing must stop. The test team can continue to find and record faults rather than waiting for the users to find problems.
i. Incident Management
We’ve talked about incidents occurring on tests already, but we need to spend some time talking about the management of incidents. Once the project moves into system or acceptance testing phases, to some extent, the project is driven by the incidents. It’s the incidents that trigger activities in the remainder of the project. And the statistics about the incidents provide a good insight as to the status of the project at any moment in time.
cc. What is an incident?
Unplanned events occurring during testing that have a bearing on the success of the test: The formal definition of an incident is an event that occurs during the testing that has a bearing on the success of the test. This might be a concern over the quality of the software because there’s a failure in the test itself. Or it may be something that’s outside the control of the testers, like machine crashes, a loss of the network, or maybe a lack of test resource.
Something that stops or delays testing: Going back to the formal definition, an incident is something that occurred that has a bearing on the test. Incident management is about logging and controlling those events. They may relate to either the system under test or the environment or the resource available to conduct the test.
Incidents should be logged when independent testers do the testing: Incidents are normally formally logged during system and acceptance testing, when independent teams of testers are involved.
dd. When a test result is different from the expected result...
It could be... When you run a test and the expected results do not match the actual results, it could be due to a number of reasons. The issue here is that the tester shouldn’t jump to the conclusion that it’s a software fault. For example, it could be something wrong with the test itself; the test script may be incorrect in the commands it expected to appear or the expected result may have been predicted incorrectly.
Maybe there was a misinterpretation of the requirements. It could be that the tester executing the test didn’t follow the script and made a slip in the entry of some test data and that is what’s caused the software to behave differently than expected. It could be that the results themselves are correct but the tester misunderstood what they saw on the screen or on a printed report. Another issue could be that it might be the test environment. Again, test environments are often quite fluid and changes are being made continuously to refine their behaviour. Potentially, a change in the configuration of the software in the test environment could cause a changed behaviour of the software under test. Maybe the wrong version of a database was loaded or the base parameters were changed since the last test. Finally, it could be something wrong with the baseline; that is, the document upon which the tests are being based is incorrect. The requirement itself is wrong. Or it COULD BE a software fault. It could be any of the reasons above, but it could also be a software fault. A tester’s role in interpreting incidents is that they should be really careful about identifying what the nature of the problem is before they consider calling it a ‘software fault’. There is no faster way to upset developers than raising incidents that are classified as software faults, but upon closer investigation, are not. Although the testers may be under great pressure to complete their tests on time and feel that they do not have time for further analysis, typically the developers are under even greater pressure themselves. ee. Incident logging Tester should stop and complete an incident log What happens when you run a test and the test itself displays an unexpected result? The tester should stop what they’re doing and complete an incident log. It’s most important that the tester completes the log at the time of the test and not wait a few minutes and perhaps do it when it’s more convenient. 
The tester should log the event as soon as possible after it occurs. What goes into an incident log? The tester should describe exactly what is wrong. What did they see? What did they witness that made them think that the software was not behaving the way it should? They should record the test script they’re following and potentially, the test step at which the software failed to meet an expected result. If appropriate, they should attach any output – screen dumps, print outs, any information that might be deemed useful to a developer so that they can reproduce the problem. Part of the incident log should be an assessment on whether the failure in this script has an impact on other tests that have to be completed. Potentially, if a test fails, it may be a test that has no bearing on the successful completion of any other test. It’s completely independent. However, some tests are designed to create test data for later tests. So, it may be that a failure in one script may cause the rest of the scripts that need to be completed on that day to be shelved because they cannot be run without the first one being corrected. Why do we create incident logs with such a lot of detail? Consider what happens when the developer is told that there may be a potential problem in the software. The developer will use the information contained in the incident report to reproduce the fault. If the developer cannot reproduce the fault (because there’s not enough information on the log), it’s unreasonable
to expect him to fix the problem; he can’t see anything wrong! In cases like this, the developer will say that no fault has been found when they run the test. In a way, the software is innocent until proven guilty. And that’s not just because developers are being difficult. They cannot start fixing a problem if they have no way to diagnose where the problem might be. So, in order not to waste a lot of time for the developers and yourself, it’s most important that incident logs are created accurately. One further way of passing test information to developers is to record tests using a record/playback tool. It is not that the developer uses the script to replay the test; rather, they have the exact keystrokes, button presses and data values required to reproduce the problem. It stops dead the comment, "you must have done something wrong, run it again." This might save you a lot of time.
ff. Typical test execution and incident management process
If you look at the point in the diagram where we run a test, you will see that after we run the test itself we raise an incident to cover any unplanned event. It could be that the tester has made an error so this is not a real incident and needn’t be logged. Where a real incident arises, it should be diagnosed to identify the nature of the problem. It could be that we decide that it is not significant so the test could still proceed to completion.
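To make the incident-log contents described above concrete, here is a minimal Python sketch of a log record. The class name, field names and the `missing_details` check are hypothetical; they only mirror the items the text lists (what was seen, the script and step being followed, attached output, and the impact on other tests) and are not a prescribed format.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class IncidentLog:
    """Hypothetical incident record; field names are illustrative assumptions."""
    description: str                      # exactly what the tester saw
    test_script: str                      # the script being followed
    test_step: Optional[int] = None       # step at which the result differed
    attachments: List[str] = field(default_factory=list)      # screen dumps, print-outs
    blocked_scripts: List[str] = field(default_factory=list)  # later tests that cannot run
    logged_at: datetime = field(default_factory=datetime.now)  # log at the time of the test

    def missing_details(self) -> List[str]:
        """Name the details a developer is likely to need that are absent."""
        missing = []
        if not self.description.strip():
            missing.append("description")
        if not self.test_script.strip():
            missing.append("test_script")
        if self.test_step is None:
            missing.append("test_step")
        if not self.attachments:
            missing.append("attachments")
        return missing
```

A tester (or a logging tool) could call `missing_details()` before submitting the log, so that gaps which would stop a developer reproducing the problem are caught early.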
gg. Incident management process
Diagnose incident: If it’s determined that there’s a real problem that can be reproduced by the tester and it’s not the tester’s fault, the incident should be logged and classified. It will be classified, based on the information available, as to whether it is an environmental problem, a testware problem or a problem with the software itself. It will then be assigned to the relevant team or to a person who will own the problem, even if it is only temporarily.
hh. Resolving the incident
Here are the most common incident types and how they would normally be resolved.
Fix tester: If the tester made a slip during the testing, they should restart the script and follow it to the letter.
Fix testware (baseline, test specs, scripts or expected results): If the problem is the accuracy of any of the test materials, these need to be corrected quickly and the test restarted. On occasion, it may be the baseline document itself that is at fault (and the test scripts reflect this problem). The baseline itself should be corrected and the test materials adjusted to align with the changed baseline. Then the test must restart.
Fix environment: If the environment is at fault, then the system needs reconfiguring correctly, or the test data adjusting/rebuilding to restore the environment to the required, known state. Then the test should restart.
Fix software, re-build and release: Where the incident revealed a fault in the software, the developers will correct the fault and re-release the fix. In this case, the tester needs to restore the test environment to the required state and re-test using the script that exposed the fault.
Then queue for re-test. Often, there has to be a delay (while other tests complete) before failed tests can be re-run. In this case, the re-tests will have to wait until the test schedule allows them to be run. ii. Incident classification i. Priority
Priority determined by testers: We’ve covered the type of problem. Let’s look first at the issue of priority. This means priority from a testing viewpoint and is the main influence on when the problem will get fixed. The tester should decide whether an incident is of high, medium, or low priority, or whatever gradations you care to implement. To recap, the priority indicates the urgency of this problem to the testers themselves, so the urgency relates to how big an impact the failure has on the rest of testing. A high priority would be one that stops all testing. And if no testing can be done and, at this point in the project, testing is on the critical path, then the whole project stops. If the failed script stops some but not all testing, then it might be considered a medium priority incident. It might be considered a low priority incident if all other tests can proceed.
ii. Severity
Severity determined by users: Let’s talk about severity. The severity relates to the acceptability or otherwise of the faults found. Determination of the severity should be done by the end users themselves. Ultimately, the severity reflects the acceptability of that fault in the final deliverable. So, a software fault that is severe would relate to a fault that is unacceptable as far as the delivery of the software into production is concerned. If a high severity fault is in the software at the time of the end of the test, then the system will be deemed unacceptable. If the fault is minor, it might be deemed of low severity and users might choose to implement this software even if it still had the fault.
jj. Software fixing
The developers must have enough information to reproduce the problem Let’s look briefly at what developers do with incident reports and when they come to fix software faults. Developers must have enough information to reproduce the problem. If developers can't reproduce it, they probably can't fix it Because if the developers cannot reproduce it, they probably cannot fix the issue because they cannot see it. Testers can anticipate this problem by trying to reproduce the problem themselves. They should also make sure that their description of the incident is adequate. Incidents get prioritised and developer resources get assigned according to priority. To revisit the priority assigned to an incident, developer resources will get assigned according to that priority. This isn’t the same as the severity. The decision that we’ll have to make towards the end of the test phase is "which incidents get worked upon based on priority and also severity"?
kk. Testability
Essentially, we can think of testability as the ease by which a tester can specify, implement, execute and analyse tests of software. This module touches on an issue that is critical to the tester.
ll. Testability definitions (testable requirements)
"The extent to which software facilitates both the establishment of test criteria and the evaluation of the software with respect to those criteria" or "The extent to which the definition of requirements facilitates analysis of the requirements to establish test criteria." mm.A broad definition of testability Here is a less formal, broader definition of testability, which overlaps 80-90% with the standard, but is actually more useful. Testability is the ease by which testers can do their job. The ease by which testers can: It’s the ease by which a tester can specify tests. Namely, are the requirements in a form that you can derive test plans from in a straightforward, systematic way? The ease by which a tester can prepare tests. How difficult is it to construct test plans and procedures that are effective? Can we create a relatively simple test database, simple test script? Is it easy to run tests and understand and interpret the test results? Or when we run tests, does it take days to get to the bottom about where the results are? Do we have to plough through mountains of data? In other words, we are talking about the ease by which we can analyse results and say, pass or fail. How difficult is it to diagnose incidents and point to the source of the fault. nn. Requirements and testability Cannot derive meaningful tests from untestable requirements Requirements are the main problem that we have as testers. If we have untestable requirements, it is impossible to derive meaningful tests. That is the issue. You might ask, 'if we are unable to build test cases, how did the developers know what to build?' This is a valid question and highlights the real problem. The problem is that it is quite feasible for a developer to just get on with it and build the system as he sees it. But if the requirements are untestable, it’s impossible to see if he built the right system. But that's the testers' problem. 
Complex systems can be untestable: In today’s distributed, web-enabled, client/server world, there is a problem of the system complexity effectively rendering the system untestable. It’s too complex for one person to understand. The specs may be nonexistent, but if they were written, they are far too technical for most testers to understand. Most of the functionality is hidden. We’re building very sophisticated, complex systems from off-the-shelf components. This is good news. It makes the developer’s job much easier because they just import functionality. But the testing effort hasn’t been reduced. We still have to test the same old way, regardless of who built it and whether it’s off-the-shelf or not. So, life for the tester is just as hard as ever, but the developers are suddenly, remarkably, more productive. The difficulty for testers is that they are being asked to test more complex systems with less resource because, of course, you only need 20% of the effort of the developers.
oo. Complex systems and testability
Can't design tests to exercise vast functionality: So testers are expected to test more and more. They are under additional pressure now that off-the-shelf components are being used more. One of the difficulties we have is that we can’t design enough tests. We may have a system that has been built by three people in about a month, but it can still be massively complex. We can’t possibly design tests to exercise all of the functionality.
Can't design tests to exercise complex interactions between components: We know that these systems are built from components, but we don’t know where there are interactions between components. So we know that there are interactions, but because we don’t know exactly where they are, we can’t test them specifically. Do the developers test them? It’s difficult to say. They tend to trust bought-in software because they say, we’re buying off-the-shelf components, it must work.
And they are much more concerned with their own custom-built code than off-the-shelf stuff. Difficult to diagnose incidents when raised. When you run a test, is it clear what’s gone wrong? The problem with all of these components is that they’re all message-based. There’s not a clear hierarchy of responsibility: which event triggered what. You have lots of service components, all talking to each other simultaneously. There is no sequencing you can track. So, you can’t diagnose faults very easily at all. This is a big issue for testers.
pp. Improving testability Testability is going the wrong way. It’s getting worse. How might we improve testability? Here are a few ideas that influence testability, that have a critical effect on testing. Requirements reviewed by testers for testability One way might be to get the testers to review the requirements as they are written. They would review it from the point of view of how will I prepare test plans on this document? Software understandable by testers If you could get developers to write software that testers could understand, that would help, but this is probably impractical. Or is it? If the testers can’t understand it, how are the users going to understand it? The users need to. Software easy to configure to test When you buy a car, you expect it to work. Why do you have to test it? If you’re buying a factory-made product, you expect it to have been significantly tested before it reaches you, and it should work. But even with the example of a car, the only thing you can do to test it is to drive it. This is rather like the functional test. You still won’t know whether the engine will fall apart after 20,000 miles. Software has the same problems. If you do want to test it, you’ve suddenly opened up a can of worms. You have to have such knowledge of the technical architecture and how it all works together that it’s an overwhelming task. How can we possibly create software that is understandable from the point of view of testers getting under the bonnet and looking at the lower-level components? To effectively test components, you need to be able to separate them and test them in isolation. This can be really difficult. Software which can provide data about its internal state The most promising trend is that software is beginning to have instrumentation that will tell you about its behaviour internally. So, quite a lot of the services that run on servers in complex environments generate logging that you can trace as testers. 
Behaviour which is easy to interpret Another thing that we need to make testability easier is behaviour that is easy to interpret. That is, it’s obvious when the software is working correctly or incorrectly. Software which can 'self-test'. Wouldn't it be nice if software could 'self-test'? Just like hardware, software could perhaps make decisions about its behaviour and tell you when it’s going wrong. Operating system software and some embedded systems do self diagnosis to verify that their internal state is sound. Most software doesn’t do that of course.
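As a rough illustration of two of the ideas above (software that reports its internal state, and software that can 'self-test'), here is a small Python sketch. The `StockLedger` component, its logger name and its single invariant are all hypothetical; real self-diagnosis in operating systems or embedded systems is far more involved.

```python
import logging

logger = logging.getLogger("stock_ledger")  # hypothetical logger name


class StockLedger:
    """Hypothetical component that logs its state changes and can check itself."""

    def __init__(self):
        self._levels = {}

    def receive(self, item, quantity):
        self._levels[item] = self._levels.get(item, 0) + quantity
        # Instrumentation: a tester can trace this line to see the internal state.
        logger.info("receive item=%s qty=%d level=%d", item, quantity, self._levels[item])

    def issue(self, item, quantity):
        self._levels[item] = self._levels.get(item, 0) - quantity
        logger.info("issue item=%s qty=%d level=%d", item, quantity, self._levels[item])

    def self_test(self):
        """Return a list of invariant violations; an empty list means the state looks sound."""
        return [
            f"negative stock for {item}: {level}"
            for item, level in self._levels.items()
            if level < 0
        ]
```

The design point is that the component, rather than the tester, knows what a sound internal state looks like, which makes incorrect behaviour easier to spot and interpret.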
i. Standards for Testing
qq. Types of standard
rr. What the standard covers...
A generic test process for software component testing: BS7925-2 is a good document. Although it’s wordy with lots of standard-sounding language, it is highly recommended in that it provides a generic clean process for component testing. It is uncomplicated from that point of view. It’s probably more appropriate for a high-integrity environment with formal unit testing, than a small commercial environment. That does not mean that it’s completely useless to you if you’re working in a ‘low-integrity’ environment or you don’t have formal unit testing.
A component is the lowest level software entity with a separate specification: The component is the lowest-level software entity with a separate spec. If you have a spec for a piece of code, whatever you were going to test against that spec, you could call that a component. It might be a simple sub-routine, a little piece of “C” or it could be a class file, or a window in an application. It could be anything that you might call a module, where you can separate it out and test it in isolation against the document that specifies its behaviour. To recap, if you can test it in isolation, it’s probably a component.
Intended to be auditable, as its use may be mandated by customers: The purpose of a standard, among other things, is to be auditable. One of the intended uses of the standard is that potential customers may mandate to suppliers of software that this standard is adhered to.
Covers dynamic execution only: It only covers dynamic testing, so it’s not about inspections, reviews, or anything like that. It’s about dynamic tests at a component level.
ss. The standard does not cover...
The standard makes clear statements about its scope.
Selection of test design or measurement techniques: The standard does not cover the selection of test design or measurement techniques.
What that means is that it cannot tell you which test design or measurement technique you should use in your application area because there are no definitive metrics that prove that one technique is better than another. What the standard does provide is a definition of the most useful techniques that are available. The test design and measurement techniques that you should use on your projects would normally be identified in your own internal standards or be mandated by industry standards that you may be obliged to use. Personnel selection or who does the testing The standard doesn’t tell you who should do the testing. Although the standard implies that independence is a ‘good thing’, it only mandates that you document the degree of independence employed. It doesn’t imply that an independent individual or company must do all the testing or that another developer or independent tester must do test design. There are no recommendations in that regard. Implementation (how required attributes of the test process are to be achieved e.g. tools) The standard doesn’t make any recommendations or instructions to do with the implementation of tests. It doesn’t give you any insight as to how the test environment might be created or what tools you might use to execute tests themselves. It’s entirely generic in that regard. Fault removal (a separate process to fault detection). Finally, fault removal is regarded as a separate process to fault detection. The process of fault removal normally occurs in parallel with the fault detection process but is not described in the standard. tt. The component test strategy... ... shall specify the techniques to be employed in the design of test cases and the rationale for their choice... What the component-testing standard does say is that you should have a strategy for component testing. The test strategy for components should specify the techniques you are going to employ in the design of test cases and the rationale for their choice. 
So although the standard doesn’t mandate one test technique above another, it does mandate that you record the decision that nominated the techniques that you use. ... shall specify criteria for test completion and the rationale for their choice... The standard also mandates that within your test strategy you specify criteria for test completion. These are also often called exit or acceptance criteria for the test stage. Again, it doesn’t mandate what these criteria are, but it does mandate that you document the rationale for the choice of those criteria.
Degree of independence required of personnel designing test cases: A significant issue, with regard to component testing, is the degree of independence required by your test strategy. Again, the standard mandates that your test strategy defines the degree of independence used in the design of test cases but doesn’t make any recommendation on how independent these individuals or the ‘test agency’ will be. The standard does offer some possible options for deciding who does the testing. For example, you might decide that the person who writes the component under test also writes the test cases. You might have an independent person writing the test cases or you might have people from a different section in the company, or from a different company. You might ultimately decide that a person should not choose the test cases at all; you might employ a tool to do this.
uu. Documentation required...
Finally, the standard mandates that you document certain other issues in a component test strategy.
Whether testing is done in isolation, bottom-up or top-down approaches, or some mixture of these: The first one of these is that the strategy should describe how the testing is done with regard to the component's isolation; that is, whether the component is tested in a bottom-up or top-down method of integration or some mixture of these. The requirement here is to document whether you’re using stubs and drivers, in addition to the components of the test, to execute tests.
Environment in which component tests will be executed: The next thing that the strategy mandates is a description of the environment in which the component testing takes place. Here, one would be looking at the operating system, database, and other scaffolding software that might be required for component tests to be completed. Again, this might cover issues like the networking and Internet infrastructure that you may have to test the components within.
Test process that shall be used for component testing:
The standard mandates that you document the process that you will actually use. Whether you use the process in BS7925-2 or not, the process that you do use should be described in enough detail for an auditor to understand how the testing has actually been done.
vv. Test measurement techniques
There are five stages in the component test process described in the standard. The standard mandates that the test process activities occur in a defined order; that is, planning, specification, execution, recording, and checking for test completion. It is clear that in many circumstances there can be iterations around the sequence of the five activities, and there is also a possibility of repeated stages on one or more of the test cases within the test plan for a component. The documentation for the test process in use in your environment should define the testing activities to be performed and the inputs and outputs of each activity. Planning starts the test process and Check for Completion ends it. These activities are carried out for the whole component. Specification, Execution, and Recording can, on any one iteration, be carried out for a subset of the test cases associated with a component. It is possible that later activities for one test case can occur before earlier activities for another. Whenever a fault is corrected by making a change or changes to test materials or the component under test, the affected activities should be repeated. The five generic test activities are briefly described:
Planning: The test plan should specify how the project component test strategy and project test plan apply to the component under test. This includes specific identification of all exceptions to project test strategies and all software with which the component under test will interact during test execution, such as drivers and stubs.
Specification: Test cases should be designed using the test case design techniques selected in the test planning activity. Each test case should identify its objective, the initial state of the component, its input(s), and the expected outcome. The objective should be described in terms of the test case design technique being used, such as the partition boundaries exercised.
Execution: Test cases should be executed as described in the component test specification.
Recording: For each test case, test records should show the identities and versions of the component under test and the test specification. The actual outcome should also be recorded. It should be possible to establish that all the specified testing activities have been carried out by reference to the test reports. Any discrepancy between the actual outcome and the expected outcome should be logged and analysed in order to establish where the problem lies. The earliest test activity that should be repeated in order to remove the discrepancy should be identified. For each of the measure(s) specified as test completion criteria in the plan, the coverage actually achieved should also be recorded.
Check for Completion: The test records should be checked against the test completion criteria. If these criteria are not met, the earliest test activity that has to be repeated in order to meet the criteria shall be identified and the test process shall be restarted from that point. It may be necessary to repeat the test specification activity to design further test cases to meet a test coverage target.
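The ordering of the five activities, and the rule that the process restarts from the earliest activity that must be repeated, can be sketched as a simple loop. This is an illustrative Python sketch only: the function name, the `perform` callback and the `max_steps` guard are assumptions for the example and are not defined by the standard.

```python
ACTIVITIES = [
    "planning",
    "specification",
    "execution",
    "recording",
    "check_for_completion",
]


def run_component_test_process(perform, max_steps=50):
    """Run the activities in order and return the sequence actually performed.

    `perform(activity)` is a caller-supplied function (hypothetical). It returns
    None when the activity completes cleanly, or the name of the earliest
    activity that has to be repeated.
    """
    trace = []
    index = 0
    while index < len(ACTIVITIES):
        if len(trace) >= max_steps:
            raise RuntimeError("process did not complete within max_steps")
        activity = ACTIVITIES[index]
        trace.append(activity)
        restart_from = perform(activity)
        if restart_from is None:
            index += 1                                  # move on to the next activity
        else:
            index = ACTIVITIES.index(restart_from)      # restart from the earliest affected activity
    return trace
```

For instance, if the first check for completion finds that a coverage target has not been met and asks for further test cases, the trace shows specification, execution, recording and the completion check being repeated, while planning is performed only once.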
ww. Standard definition of Technique
The standard gives you comprehensive definitions of the techniques to be used within the testing itself. Test case design techniques help users design tests: the aim is that they help the users of the standard to construct test cases themselves. Test measurement techniques help users (and customers) measure the testing: they help testers, and potentially customers, to measure how much testing has actually been done. The purpose in using these design and measurement techniques is to promote a set of consistent and repeatable test practices within the component testing discipline. The process and techniques provide a common understanding between developers, testers, and the customers of software of how testing has been done. This will enable an objective comparison of testing done on various components, potentially by different suppliers.
xx. Test case design and measurement
One innovation of the standard is that it clarifies two important concepts: test design and test measurement.
Test design: The test design activity is split into two parts: what you might call the analysis, and then the actual design of the test cases themselves. The analysis uses a selected model of the software (for example, control flowgraphs) or of the requirements (for example, equivalence partitions), and the model is used to identify what are called coverage items. From the list of coverage items, test cases are developed that will exercise (cover) each coverage item. For example, if you are using control flowgraphs as a model for the software under test, you might use the branch outcomes as the coverage items from which to derive test cases.
Test measurement: The same model can then be used for test measurement. If you adopt the branch coverage model and your coverage items are the branches themselves, you can set an objective coverage target, for example "100% branch coverage".
Coverage targets based on the techniques in the standard can be adopted before the code is designed or written. The techniques are objective. Meeting a target gives you a degree of confidence that the software has been exercised adequately, and because the rule is objective, the test design process is repeatable. If you follow the technique and the process that uses that technique to derive test cases then, in principle, the same test cases will be extracted from that model. Normally, coverage targets are set at 100%, but sometimes this is impractical, perhaps because some branches in the software are unreachable except by executing obscure error conditions. Test coverage targets of less than 100% may be used in these circumstances. The model can also be used to find faults in a baseline: the process of deriving test cases from a specification can find faults in the specification. Black-box techniques in particular make missing or conflicting requirements stand out, so that they are easily identified.
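To make the idea of measuring coverage against a target concrete, here is a minimal sketch that treats branch outcomes as the coverage items. The grade function, its pass mark of 50, and the outcome labels are assumptions made up for illustration. The instrumentation is written by hand, whereas a real coverage tool would insert it automatically.

```python
# Coverage items for this hypothetical component: the two outcomes of each decision.
ALL_OUTCOMES = {"range:T", "range:F", "pass:T", "pass:F"}


def grade(score, seen):
    """Hypothetical component, hand-instrumented to log each branch outcome in `seen`."""
    if score < 0 or score > 100:
        seen.add("range:T")
        return "invalid"
    seen.add("range:F")
    if score >= 50:
        seen.add("pass:T")
        return "pass"
    seen.add("pass:F")
    return "fail"


def branch_coverage(test_inputs):
    """Percentage of branch outcomes exercised by the given test inputs."""
    seen = set()
    for score in test_inputs:
        grade(score, seen)
    return 100.0 * len(seen & ALL_OUTCOMES) / len(ALL_OUTCOMES)


print(branch_coverage([70]))           # one test case exercises 2 of the 4 outcomes
print(branch_coverage([70, 20, -1]))   # three test cases exercise all 4 outcomes
```

With a target of 100% branch coverage, the first test set would fail the completion check and the second would meet it.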
yy. Test case design technique
These are the test case design techniques defined in the BS 7925-2 Standard for Component Testing. In this course, we will look at the techniques in red in a little more detail. They are mandatory for the ISEB syllabus. We will also spend a little time looking at State Transition Testing (in blue), but there will not be a question on this in the exam. The techniques are: Equivalence partitioning; Boundary value analysis; State transition; Cause-effect graphing; Syntax; Statement; Branch/decision; Data flow; Branch condition; Branch condition combination; Modified condition decision; LCSAJ; Random; Other techniques.
zz. Test measurement technique
Nearly all of the test design techniques can be used to define coverage targets. In this course, we will look at the techniques in red in a little more detail. They are mandatory for the ISEB syllabus. We will also spend a little time looking at State Transition Testing (in blue), but there will not be a question on this in the exam. The measures are: Equivalence partitioning coverage; Boundary value coverage; State transition coverage; Cause-effect graphing; Statement coverage; Branch/decision coverage; Data flow coverage; Branch condition coverage; Branch condition combination coverage; Modified condition decision coverage; LCSAJ coverage; Random testing.
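As an illustration of how one technique serves both test design and test measurement, the sketch below derives equivalence partitions and boundary values for a single input and then measures equivalence partition coverage. The specification (an integer mark valid from 0 to 100 inclusive) and all the names used are assumptions chosen for the example.

```python
# Hypothetical specification: an integer mark is valid from 0 to 100 inclusive.
# Each partition is (lowest value, highest value); None means unbounded.
PARTITIONS = {
    "below_range": (None, -1),
    "valid": (0, 100),
    "above_range": (101, None),
}


def partition_of(value):
    """Return the name of the equivalence partition that contains `value`."""
    for name, (low, high) in PARTITIONS.items():
        if (low is None or value >= low) and (high is None or value <= high):
            return name
    raise ValueError(f"no partition contains {value}")


def boundary_values(low, high, step=1):
    """Each boundary of a range plus an incremental distance either side of it."""
    return sorted({low - step, low, low + step, high - step, high, high + step})


def partition_coverage(test_inputs):
    """Percentage of partitions exercised by at least one test input."""
    hit = {partition_of(v) for v in test_inputs}
    return 100.0 * len(hit) / len(PARTITIONS)


print(boundary_values(0, 100))             # candidate inputs around both boundaries
print(partition_coverage([-5, 50, 150]))   # one representative per partition
```

Choosing one representative per partition is the design step, and partition_coverage is the matching measurement. The boundary values give further test inputs that are more likely to expose off-by-one faults.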
Module F: Tool Support for Testing : i. Tool Support for Testing There are a surprising number of types of CAST (Computer Aided Software Testing) Tools now available. Tools are available to support test design, preparation, execution, analysis and management. This module provides an overview of the main types of test tool available and their range of applicability in the test process. aaa.Types of CAST Tool bbb.Categories of CAST tools ccc.Static analysis tools ddd.Requirements testing tools eee.Test design tools fff. Test data preparation tools ggg.Batch test execution tools hhh.On-line test execution tools iii. GUI testing jjj. GUI test stages kkk.Test harnesses lll. Test drivers mmm.File comparison nnn.Performance testing toolkit ooo.Debugging ppp.Dynamic analysis qqq.Source coverage rrr. Test ware management sss.Incident management ttt. Analysis, reporting, and metrics
i. Tool Selection and Implementation uuu.Overview of the selection process vvv. Where to start www.Tool selection considerations xxx.CAST limitations yyy. CAST availability zzz.The tool selection and evaluation team aaaa.Evaluating the shortlist bbbb.Tool implementation process cccc.Pilot project dddd.Evaluation of pilot eeee.Planned phased installation ffff. Keys to success gggg.More keys to success hhhh.Three routes to "shelf ware" iiii. Documentation jjjj. Test Database kkkk.Test Case llll. Test Matrix
i. Glossary and Testing Terms mmmm.Acceptance testing: Formal testing conducted to enable a user, customer, or other authorized entity to determine whether to accept a system or component. nnnn.Actual outcome: The behaviour actually produced when the object is tested under specified conditions. oooo.Ad hoc testing: Testing carried out using no recognised test case design technique. pppp.Alpha testing: Simulated or actual operational testing at an in-house site not otherwise involved with the software developers. qqqq.Arc testing: See branch testing. rrrr. Backus-Naur form: A metalanguage used to formally describe the syntax of a language. ssss.Basic block: A sequence of one or more consecutive, executable statements containing no branches. tttt. Basis test set: A set of test cases derived from the code logic which ensure that 100% branch coverage is achieved. uuuu.Bebugging: See error seeding. vvvv.Behaviour: The combination of input values and preconditions and the required response for a function of a system. The full specification of a function would normally comprise one or more behaviours. wwww.Beta testing: Operational testing at a site not otherwise involved with the software developers. xxxx.Big-bang testing: Integration testing where no incremental testing takes place prior to all the system's components being combined to form the system. yyyy.Black box testing: See functional test case design. zzzz.Bottom-up testing: An approach to integration testing where the lowest level components are tested first, then used to facilitate the testing of higher level components. The process is repeated until the component at the top of the hierarchy is tested. aaaaa.Boundary value analysis: A test case design technique for a component in which test cases are designed which include representatives of boundary values. bbbbb.Boundary value coverage: The percentage of boundary values of the component's equivalence classes, which have been exercised by a test case suite.
ccccc.Boundary value testing: See boundary value analysis. ddddd.Boundary value: An input value or output value which is on the boundary between equivalence classes, or an incremental distance either side of the boundary. eeeee.Branch condition combination coverage: The percentage of combinations of all branch condition outcomes in every decision that have been exercised by a test case suite. fffff. Branch condition combination testing: A test case design technique in which test cases are designed to execute combinations of branch condition outcomes. ggggg.Branch condition coverage: The percentage of branch condition outcomes in every decision that have been exercised by a test case suite. hhhhh.Branch condition testing: A test case design technique in which test cases are designed to execute branch condition outcomes. iiiii. Branch condition: See decision condition. jjjjj. Branch coverage: The percentage of branches that have been exercised by a test case suite. kkkkk.Branch outcome: See decision outcome. lllll. Branch point: See decision. mmmmm.Branch testing: A test case design technique for a component in which test cases are designed to execute branch outcomes.
ooooo.Branch: A conditional transfer of control from any statement to any other statement in a component, or an unconditional transfer of control from any statement to any other statement in the component except the next statement, or when a component has more than one entry point, a transfer of control to an entry point of the component. ppppp.Bug seeding: See error seeding. qqqqq.Bug: See fault. rrrrr.Capture/playback tool: A test tool that records test input as it is sent to the software under test. The input cases stored can then be used to reproduce the test later. sssss.Capture/replay tool: See capture/playback tool. ttttt. CAST: Acronym for computer-aided software testing. uuuuu.Cause-effect graph: A graphical representation of inputs or stimuli (causes) with their associated outputs (effects), which can be used to design test cases. vvvvv.Cause-effect graphing: A test case design technique in which test cases are designed by consideration of cause-effect graphs. wwwww.Certification: The process of confirming that a system or component complies with its specified requirements and is acceptable for operational use. xxxxx.Chow's coverage metrics: See N-switch coverage. [Chow] yyyyy.Code coverage: An analysis method that determines which parts of the software have been executed (covered) by the test case suite and which parts have not been executed and therefore may require additional attention. zzzzz.Code-based testing: Designing tests based on objectives derived from the implementation (e.g., tests that execute specific control flow paths or use specific data items). aaaaaa.Compatibility testing: Testing whether the system is compatible with other systems with which it should communicate. bbbbbb.Complete path testing: See exhaustive testing. cccccc.Component testing: The testing of individual software components. dddddd.Component: A minimal software item for which a separate specification is available.
eeeeee.Computation data use: A data use not in a condition. Also called C-use. ffffff.Condition coverage: See branch condition coverage. gggggg.Condition outcome: The evaluation of a condition to TRUE or FALSE.
hhhhhh.Condition: A Boolean expression containing no Boolean operators. For instance, A<B is a condition but A and B is not.
iiiiii. Conformance criterion: Some method of judging whether or not the component's action on a particular specified input value conforms to the specification. jjjjjj. Conformance testing: The process of testing that an implementation conforms to the specification on which it is based. kkkkkk.Control flow graph: The diagrammatic representation of the possible alternative control flow paths through a component. llllll. Control flow path: See path. mmmmmm.Control flow: An abstract representation of all possible sequences of events in a program's execution. nnnnnn.Conversion testing: Testing of programs or procedures used to convert data from existing systems for use in replacement systems. oooooo.Correctness: The degree to which software conforms to its specification.
pppppp.Coverage item: An entity or property used as a basis for testing. qqqqqq.Coverage: The degree, expressed as a percentage, to which a test case suite has exercised a specified coverage item. rrrrrr.C-use: See computation data use. ssssss.Data definition C-use coverage: The percentage of data definition C-use pairs in a component that are exercised by a test case suite. tttttt.Data definition C-use pair: A data definition and computation data use, where the data use uses the value defined in the data definition. uuuuuu.Data definition P-use coverage: The percentage of data definition P-use pairs in a component that are exercised by a test case suite. vvvvvv.Data definition P-use pair: A data definition and predicate data use, where the data use uses the value defined in the data definition. wwwwww.Data definition: An executable statement where a variable is assigned a value. xxxxxx.Data definition-use coverage: The percentage of data definition-use pairs in a component that are exercised by a test case suite. yyyyyy.Data definition-use pair: A data definition and data use, where the data use uses the value defined in the data definition. zzzzzz.Data definition-use testing: A test case design technique for a component in which test cases are designed to execute data definition-use pairs. aaaaaaa.Data flow coverage: Test coverage measure based on variable usage within the code. Examples are data definition-use coverage, data definition P-use coverage, data definition C-use coverage, etc. bbbbbbb.Data flow testing: Testing in which test cases are designed based on variable usage within the code. ccccccc.Data use: An executable statement where the value of a variable is accessed. ddddddd.Debugging: The process of finding and removing the causes of failures in software. eeeeeee.Decision condition: A condition within a decision. fffffff.Decision coverage: The percentage of decision outcomes that have been exercised by a test case suite. 
ggggggg.Decision outcome: The result of a decision (which therefore determines the control flow alternative taken). hhhhhhh.Decision: A program point at which the control flow has two or more alternative routes. iiiiiii.Design-based testing: Designing tests based on objectives derived from the architectural or detail design of the software (e.g., tests that execute specific invocation paths or probe the worst case behaviour of algorithms). jjjjjjj.Desk checking: The testing of software by the manual simulation of its execution. kkkkkkk.Dirty testing: See negative testing. lllllll.Documentation testing: Testing concerned with the accuracy of documentation. mmmmmmm.Domain testing: See equivalence partition testing. nnnnnnn.Domain: The set from which values are selected. ooooooo.Dynamic analysis: The process of evaluating a system or component based upon its behaviour during execution. ppppppp.Emulator: A device, computer program, or system that accepts the same inputs and produces the same outputs as a given system. qqqqqqq.Entry point: The first executable statement within a component.
rrrrrrr.Equivalence class: A portion of the component's input or output domains for which the component's behaviour is assumed to be the same from the component's specification. sssssss.Equivalence partition coverage: The percentage of equivalence classes generated for the component, which have been exercised by a test case suite. ttttttt.Equivalence partition testing: A test case design technique for a component in which test cases are designed to execute representatives from equivalence classes. uuuuuuu.Equivalence partition: See equivalence class. vvvvvvv.Error guessing: A test case design technique where the experience of the tester is used to postulate what faults might occur, and to design tests specifically to expose them. wwwwwww.Error seeding: The process of intentionally adding known faults to those already in a computer program for the purpose of monitoring the rate of detection and removal, and estimating the number of faults remaining in the program. xxxxxxx.Error: A human action that produces an incorrect result. yyyyyyy.Executable statement: A statement which, when compiled, is translated into object code, which will be executed procedurally when the program is running and may perform an action on program data. zzzzzzz.Exercised: A program element is exercised by a test case when the input value causes the execution of that element, such as a statement, branch, or other structural element. aaaaaaaa.Exhaustive testing: A test case design technique in which the test case suite comprises all combinations of input values and preconditions for component variables. bbbbbbbb.Exit point: The last executable statement within a component. cccccccc.Expected outcome: See predicted outcome. dddddddd.Facility testing: See functional test case design. eeeeeeee.Failure: Deviation of the software from its expected delivery or service. [Fenton] ffffffff.Fault: A manifestation of an error in software. A fault, if encountered may cause a failure. 
gggggggg.Feasible path: A path for which there exists a set of input values and execution conditions which causes it to be executed. hhhhhhhh.Feature testing: See functional test case design. iiiiiiii.Functional specification: The document that describes in detail the characteristics of the product with regard to its intended capability. jjjjjjjj.Functional test case design: Test case selection that is based on an analysis of the specification of the component without reference to its internal workings. kkkkkkkk.Glass box testing: See structural test case design. llllllll.Incremental testing: Integration testing where system components are integrated into the system one at a time until the entire system is integrated. mmmmmmmm.Independence: Separation of responsibilities, which ensures the accomplishment of objective evaluation. After [do178b]. nnnnnnnn.Infeasible path: A path, which cannot be exercised by any set of possible input values. oooooooo.Input domain: The set of all possible inputs. pppppppp.Input value: An instance of an input. qqqqqqqq.Input: A variable (whether stored within a component or outside it) that is read by the component. rrrrrrrr.Inspection: A group review quality improvement process for written material. It consists of two aspects; product (document itself) improvement and process improvement (of both document production and inspection). After [Graham] ssssssss.Installability testing: Testing concerned with the installation procedures for the system.
tttttttt.Instrumentation: The insertion of additional code into the program in order to collect information about program behaviour during program execution. uuuuuuuu.Instrumenter: A software tool used to carry out instrumentation. vvvvvvvv.Integration testing: Testing performed to expose faults in the interfaces and in the interaction between integrated components. wwwwwwww.Integration: The process of combining components into larger assemblies. xxxxxxxx.Interface testing: Integration testing where the interfaces between system components are tested. yyyyyyyy.Isolation testing: Component testing of individual components in isolation from surrounding components, with surrounding components being simulated by stubs. zzzzzzzz.LCSAJ coverage: The percentage of LCSAJs of a component, which is exercised by a test case suite. aaaaaaaaa.LCSAJ testing: A test case design technique for a component in which test cases are designed to execute LCSAJs. bbbbbbbbb.LCSAJ: A Linear Code Sequence And Jump, consisting of the following three items (conventionally identified by line numbers in a source code listing): the start of the linear sequence of executable statements, the end of the linear sequence, and the target line to which control flow is transferred at the end of the linear sequence. ccccccccc.Logic-coverage testing: See structural test case design. [Myers] ddddddddd.Logic-driven testing: See structural test case design. eeeeeeeee.Maintainability testing: Testing whether the system meets its specified objectives for maintainability. fffffffff.Modified condition/decision coverage: The percentage of all branch condition outcomes that independently affect a decision outcome that have been exercised by a test case suite. ggggggggg.Modified condition/decision testing: A test case design technique in which test cases are designed to execute branch condition outcomes that independently affect a decision outcome.
hhhhhhhhh.Multiple condition coverage: See branch condition combination coverage. iiiiiiiii.Mutation analysis: A method to determine test case suite thoroughness by measuring the extent to which a test case suite can discriminate the program from slight variants (mutants) of the program. See also error seeding. jjjjjjjjj.Negative testing: Testing aimed at showing software does not work. kkkkkkkkk.Non-functional requirements testing: Testing of those requirements that do not relate to functionality. I.e. performance, usability, etc. lllllllll.N-switch coverage: The percentage of sequences of N-transitions that have been exercised by a test case suite. mmmmmmmmm.N-switch testing: A form of state transition testing in which test cases are designed to execute all valid sequences of N-transitions. nnnnnnnnn.N-transitions: A sequence of N+1 transitions. ooooooooo.Operational testing: Testing conducted to evaluate a system or component in its operational environment. ppppppppp.Oracle: A mechanism to produce the predicted outcomes to compare with the actual outcomes of the software under test. qqqqqqqqq.Outcome: Actual outcome or predicted outcome. This is the outcome of a test. See also branch outcome, condition outcome, and decision outcome. rrrrrrrrr.Output domain: The set of all possible outputs. sssssssss.Output value: An instance of an output. ttttttttt.Output: A variable (whether stored within a component or outside it) that is written to by the component. uuuuuuuuu.Partition testing: See equivalence partition testing.
vvvvvvvvv.Path coverage: The percentage of paths in a component exercised by a test case suite. wwwwwwwww.Path sensitising: Choosing a set of input values to force the execution of a component to take a given path. xxxxxxxxx.Path testing: A test case design technique in which test cases are designed to execute paths of a component. yyyyyyyyy.Path: A sequence of executable statements of a component, from an entry point to an exit point. zzzzzzzzz.Performance testing: Testing conducted to evaluate the compliance of a system or component with specified performance requirements. aaaaaaaaaa.Portability testing: Testing aimed at demonstrating the software can be ported to specified hardware or software platforms. bbbbbbbbbb.Precondition: Environmental and state conditions, which must be fulfilled before the component can be executed with a particular input value. cccccccccc.Predicate data use: A data use in a predicate. dddddddddd.Predicate: A logical expression, which evaluates to TRUE or FALSE, normally to direct the execution path in code. eeeeeeeeee.Predicted outcome: The behaviour predicted by the specification of an object under specified conditions. ffffffffff.Program instrumenter: See instrumenter. gggggggggg.Progressive testing: Testing of new features after regression testing of previous features. hhhhhhhhhh.Pseudo-random: A series, which appears to be random but is in fact generated according to some prearranged sequence. iiiiiiiiii.P-use: See predicate data use. jjjjjjjjjj.Recovery testing: Testing aimed at verifying the system's ability to recover from varying degrees of failure. kkkkkkkkkk.Regression testing: Retesting of a previously tested program following modification to ensure that faults have not been introduced or uncovered as a result of the changes made.
llllllllll.Requirements-based testing: Designing tests based on objectives derived from requirements for the software component (e.g., tests that exercise specific functions or probe the non-functional constraints such as performance or security). See functional test case design. mmmmmmmmmm.Result: See outcome. nnnnnnnnnn.Review: A process or meeting during which a work product, or set of work products, is presented to project personnel, managers, users or other interested parties for comment or approval. [ieee] oooooooooo.Security testing: Testing whether the system meets its specified security objectives. pppppppppp.Serviceability testing: See maintainability testing. qqqqqqqqqq.Simple subpath: A subpath of the control flow graph in which no program part is executed more than necessary. rrrrrrrrrr.Simulation: The representation of selected behavioural characteristics of one physical or abstract system by another system. [ISO 2382/1]. ssssssssss.Simulator: A device, computer program, or system used during software verification, which behaves or operates like a given system when provided with a set of controlled inputs. tttttttttt.Source statement: See statement. uuuuuuuuuu.Specification: A description of a component's function in terms of its output values for specified input values under specified preconditions. vvvvvvvvvv.Specified input: An input for which the specification predicts an outcome.
wwwwwwwwww.State transition testing: A test case design technique in which test cases are designed to execute state transitions. xxxxxxxxxx.State transition: A transition between two allowable states of a system or component. yyyyyyyyyy.Statement coverage: The percentage of executable statements in a component that have been exercised by a test case suite. zzzzzzzzzz.Statement testing: A test case design technique for a component in which test cases are designed to execute statements. aaaaaaaaaaa.Statement: An entity in a programming language, which is typically the smallest indivisible unit of execution. bbbbbbbbbbb.Static analysis: Analysis of a program carried out without executing the program. ccccccccccc.Static analyser: A tool that carries out static analysis. ddddddddddd.Static testing: Testing of an object without execution on a computer. eeeeeeeeeee.Statistical testing: A test case design technique in which a model is used of the statistical distribution of the input to construct representative test cases. fffffffffff.Storage testing: Testing whether the system meets its specified storage objectives. ggggggggggg.Stress testing: Testing conducted to evaluate a system or component at or beyond the limits of its specified requirements. hhhhhhhhhhh.Structural coverage: Coverage measures based on the internal structure of the component. iiiiiiiiiii.Structural test case design: Test case selection that is based on an analysis of the internal structure of the component. jjjjjjjjjjj.Structural testing: See structural test case design. kkkkkkkkkkk.Structured basis testing: A test case design technique in which test cases are derived from the code logic to achieve 100% branch coverage. lllllllllll.Structured walkthrough: See walkthrough. mmmmmmmmmmm.Stub: A skeletal or special-purpose implementation of a software module, used to develop or test a component that calls or is otherwise dependent on it. After [IEEE]. 
nnnnnnnnnnn.Sub-path: A sequence of executable statements within a component. ooooooooooo.Symbolic evaluation: See symbolic execution. ppppppppppp.Symbolic execution: A static analysis technique that derives a symbolic expression for program paths. qqqqqqqqqqq.Syntax testing: A test case design technique for a component or system in which test case design is based upon the syntax of the input. rrrrrrrrrrr.System testing: The process of testing an integrated system to verify that it meets specified requirements. sssssssssss.Technical requirements testing: See non-functional requirements testing. ttttttttttt.Test automation: The use of software to control the execution of tests, the comparison of actual outcomes to predicted outcomes, the setting up of test preconditions, and other test control and test reporting functions. uuuuuuuuuuu.Test case design technique: A method used to derive or select test cases. vvvvvvvvvvv.Test case suite: A collection of one or more test cases for the software under test. wwwwwwwwwww.Test case: A set of inputs, execution preconditions, and expected outcomes developed for a particular objective, such as to exercise a particular program path or to verify compliance with a specific requirement. xxxxxxxxxxx.Test comparator: A test tool that compares the actual outputs produced by the software under test with the expected outputs for that test case.
yyyyyyyyyyy.Test completion criterion: A criterion for determining when planned testing is complete, defined in terms of a test measurement technique. zzzzzzzzzzz.Test coverage: See coverage. aaaaaaaaaaaa.Test driver: A program or test tool used to execute software against a test case suite. bbbbbbbbbbbb.Test environment: A description of the hardware and software environment in which the tests will be run, and any other software with which the software under test interacts when under test including stubs and test drivers. cccccccccccc.Test execution technique: The method used to perform the actual test execution, e.g. manual, capture/playback tool, etc. dddddddddddd.Test execution: The processing of a test case suite by the software under test, producing an outcome.
eeeeeeeeeeee.Test Generator: A program that generates test cases in accordance with a specified strategy or heuristic. ffffffffffff.Test Harness: A testing tool that comprises a test driver and a test comparator. gggggggggggg.Test Measurement Technique: A method used to measure test coverage items. hhhhhhhhhhhh.Test Outcome: See outcome. iiiiiiiiiiii.Test Plan: A record of the test planning process detailing the degree of tester independence, the test environment, the test case design techniques and test measurement techniques to be used, and the rationale for their choice.
jjjjjjjjjjjj.Test Procedure: A document providing detailed instructions for the execution of one or more test cases. kkkkkkkkkkkk.Test Records: For each test, an unambiguous record of the identities and versions of the component under test, the test specification, and actual outcome.
llllllllllll.Test Script: Commonly used to refer to the automated test procedure used with a test harness. mmmmmmmmmmmm.Test Specification: For each test case, the coverage item, and the initial state of the software under test, the input, and the predicted outcome.
nnnnnnnnnnnn.Test Target: A set of test completion criteria. oooooooooooo.Testing: The process of exercising software to verify that it satisfies specified requirements and to detect errors.
pppppppppppp.Thread Testing: A variation of top-down testing where the progressive integration of components follows the implementation of subsets of the requirements, as opposed to the integration of components by successively lower levels.
qqqqqqqqqqqq.Top-Down Testing: An approach to integration testing where the component at the top of the component hierarchy is tested first, with lower level components being simulated by stubs. Tested components are then used to test lower level components. The process is repeated until the lowest level components have been tested.
rrrrrrrrrrrr.Unit Testing: See component testing. ssssssssssss.Usability Testing: Testing the ease with which users can learn and use a product. tttttttttttt.Validation: Determination of the correctness of the products of software development with respect to the user needs and requirements.
uuuuuuuuuuuu.Verification: The process of evaluating a system or component to determine whether the products of the given development phase satisfy the conditions imposed at the start of that phase.
vvvvvvvvvvvv.Volume Testing: Testing where the system is subjected to large volumes of data. wwwwwwwwwwww.Walkthrough: A review of requirements, designs, or code characterized by the author of the object under review guiding the progression of the review.
xxxxxxxxxxxx.White box testing: See structural test case design.
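Several of the glossary terms above (stub, test driver, test comparator, test harness) describe parts that work together. The following minimal sketch shows one way they might fit. The component price_with_tax, the stub tax_rate_stub, and the fixed 20% rate are all hypothetical and chosen only for illustration.

```python
def tax_rate_stub(region):
    """Stub: skeletal stand-in for a real rate-lookup component (always 20 percent)."""
    return 20


def price_with_tax(net_pence, region, rate_lookup):
    """Hypothetical component under test; it depends on a rate-lookup component."""
    return net_pence * (100 + rate_lookup(region)) // 100


def comparator(actual, expected):
    """Test comparator: compares the actual output with the expected output."""
    return actual == expected


def driver(component, cases):
    """Test driver: executes the component against each test case in the suite."""
    results = []
    for inputs, expected in cases:
        actual = component(*inputs, tax_rate_stub)
        results.append((inputs, expected, actual, comparator(actual, expected)))
    return results


# The driver and the comparator together play the role of a simple test harness.
suite = [((1000, "UK"), 1200), ((0, "UK"), 0)]
for inputs, expected, actual, passed in driver(price_with_tax, suite):
    print(inputs, expected, actual, "pass" if passed else "FAIL")
```

Because the stub replaces the real lookup, the component is tested in isolation, which corresponds to the glossary's definition of isolation testing.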