Software faults are static - they are characteristics of the code they exist in
When we test software, it is easy to believe that the faults in the software move. Software faults are static. Once
injected into the software, they will remain there until exposed by a test and fixed.
1.1.4. Reliability is…
The probability that software will not cause the failure of a system for a specified time under specified conditions
It is usually easier to consider reliability from the point of view of a poor product. One could say that an unreliable
product fails often and without warning and lets its users down. However, this is an incomplete view. If a product fails
regularly, but the users are unaffected, the product may still be deemed reliable. If a product fails only very rarely, but it
fails without warning and brings catastrophe, then it might be deemed unreliable.
Software with faults may be reliable, if the faults are in code that is rarely used
If software has faults it might be reliable because the faulty parts of the software are rarely or never used - so it does not
fail. A legacy system may have hundreds or thousands of known faults, but these exist in parts of the system of low
criticality so the system may still be deemed reliable by its users.
1.5.3. If we could do exhaustive testing, most tests would be duplicates that tell us nothing
Even if we used a tool to execute millions of tests, we would expect that the majority of the tests would be duplicates and
they would prove nothing. Consequently, test case selection (or design) must focus on selecting the most important or
useful tests from the infinite number possible.
Effective tests
When we prepare a test, we should have some view on the type of faults we are trying to detect. If we postulate a fault and
look for that, it is likely we will be more effective.
In other words, tests that are designed to catch specific faults are more likely to find faults and are therefore more effective.
Efficient tests
If we postulate a fault and prepare a test to detect that, we usually have a choice of tests. We should select the test that has
the best chance of finding the fault. Sometimes, a single test could detect several faults at once. Efficient tests are those
that have the best chance of detecting a fault.
(1) Gaps in functionality may cost users their time. An obvious risk is that we may not have built all the required features
of the system. Some gaps may not be important, but others may badly undermine the acceptability of the system. For
example, if a system allows customer details to be created but never amended, this would be a serious problem if
customers move location regularly.
(2) Poor design may make software hard to use. For some applications, ease of use is critical. For example, on a web
site used to take orders from household customers, we can be sure that few have had training in the use of the Net or
more importantly, our web site. So, the web site MUST be easy to use.
(3) Incorrect calculations may cost us money. If we use software to calculate balances for customer bank accounts, our
customers would be very sensitive to the problem of incorrect calculations. Consequently, tests of such software would
be very high in our priorities.
(4) Software failure may cost our customers money. If we write software and our customers use that software to, say,
manage their own bank accounts then, again, they would be very sensitive to incorrect calculations so we should of
course test such software thoroughly.
(5) Wrong software decision may cost a life. If we write software that manages the control surfaces of an airliner, we
would be sure to test such software as rigorously as we could as the consequences of failure could be loss of life and
injury.
Firstly, we aim to detect the faults that cause the risks to occur. If we can detect these faults, they can be fixed, retested and
the risk is eliminated or at least reduced.
Secondly, if we can measure the quality of the product by testing and fault detection we will have gained an understanding
of the risks of implementation, and be better able to decide whether to release the system or not.
There are other important reasons why testing may figure prominently in a project plan.
Some industries, for example, financial services, are heavily regulated and the regulator may impose rigorous conditions on
the acceptability of systems used to support an organisation's activities.
Some industries may self-regulate, others may be governed by the law of the land.
The Millennium bug is an obvious example of a situation where customers may insist that a supplier's product is compliant
in some way, and may insist on conducting tests of their own.
For some software, e.g., safety-critical, the type and amount of testing, and the test process itself, may be defined by
industry standards.
On almost all development or migration projects, we need to provide evidence that a software product is compliant in one
way or another. It is, by and large, the test records that provide that evidence. When project files are audited, the most
reliable evidence that supports the proposition that software meets its requirements is derived from test records.
The split of costs described in the table is a great generalisation. Suffice it to say that the cost of testing in the majority
of commercial system developments is between 40% and 60%. This includes all testing such as reviews, inspections and
walkthroughs, programmer and private testing as well as more visible system and acceptance tests. The percentage
may be more (or less) in your environment, but the important issue is that the cost of testing is very significant.
Once deployed in production, most systems have a lifespan of several years and undergo repeated maintenance.
Maintenance in many environments could be considered to be an extended development process. The significance of
testing increases dramatically, because changing existing software is error-prone and difficult, so testing to explore the
behaviour of existing software and the potential impact of changes takes very much longer. In higher integrity
environments, regression testing may dominate the budget.
The consequence of all this is that over the entire life of a product, testing costs may dominate all development costs.
Level of automation
Many test activities are repetitive and simple. Test execution is particularly prone to automation by a suitable tool. Using a
tool, tests can be run faster, more reliably and more cheaply than people can ever run them.
Skill of the personnel
Skilled testers adopt systematic approaches to organisation, planning, preparation and execution of tests. Unskilled testers
are disorganised, ineffective and inefficient. And expensive too.
We may have to rely on a consensus view to ensure we do at least the most important tests. Often the test measurement
techniques give us an objective 'benchmark', but possibly, there will be an impractical number of tests, so we usually need
to arrive at an acceptable level of testing by consensus. It is an important role for the tester to provide enough information
on risks and the tests that address these risks so that the business and technical experts can understand the value of doing
some tests while understanding the risks of not doing other tests. In this way, we arrive at a balanced test approach.
Bug fixing and maintenance are error-prone - 50% of changes cause other faults.
Bug fixing and maintenance are error-prone - 50% of changes cause other faults. Have you ever experienced the 'Friday
night fix' that goes wrong? All too often, minor changes can disrupt software that works. Tracing the potential impact of
changes to existing software is extremely difficult. Before testing, there is a 50% chance of a change causing a problem (a
regression) elsewhere in existing software. Maintenance and bug-fixing are error-prone activities.
The principle here is that faults do not uniformly distribute themselves through software. Because of this, our test activities
should vary across the software, to make the best use of testers' time.
If faults are not in the critical parts of the system, they should be of low impact
If we've tested the technically critical parts of the software, we can say that the bugs that get through are less likely to cause
technical failures, so perhaps there's no issue there either. Faults should be of low impact.
Need to balance:
We need to balance the cost of doing testing against the potential cost of risk.
It is reasonably easy to set a cost or time limit for the testing. The difficult part is balancing this cost against a risk. The
potential impact of certain risks may be catastrophic and totally unacceptable at any cost. However, we really need to take a
view on how likely the risks are. Some catastrophic failures may be very improbable. Some minor failures may be very
common but be just as serious if they happen too often. In either case, a judgement on how much testing is appropriate
must be made.
2.8. Scalability
Scalability in the context of risk and testing relates to how we do the right amount of the right kind of testing. Not all systems
can or should be tested as thoroughly as is technically possible.
Not every system is safety-critical. In fact the majority of systems support relatively low-criticality business processes. The
principle must be that the amount of testing must be appropriate to the risks of failure in the system when used in
production.
Not all systems, sub-systems or programs require the same amount of testing
It is obviously essential that testing is thorough when we are dealing with safety-critical software; we must do as
much as possible. But low-criticality systems need testing too, so how much testing is reasonable in this circumstance? The
right amount of testing needs to be determined by consensus. Will the planned test demonstrate to the satisfaction of the
main stakeholders that the software meets its specification, that it is fault free?
3. Testing Process
3.1. What is a test?
A test is a controlled exercise involving:
What is a test? Do you remember the biology or physics classes you took when you were 13 or 14? You were probably
taught the scientific method where you have a hypothesis, and to demonstrate the hypothesis is true (or not) you set up an
experiment with a control and a method for executing a test in a controlled environment.
Testing is similar to the controlled experiment. (You might call your test environment and work area a test 'lab'). Testing is a
bit like the experimental method for software.
You have an object under test that might be a piece of software, a document or a test plan.
You define and prepare the inputs - what we’re going to apply to the software under test.
You also have a hypothesis, a definition of the expected results. So, that’s kind of the absolute fundamentals of what a test
is. You need those four things.
Have you ever been asked to test without requirements or asked to test without having any software? It's not very easy to
do is it?
When you run a test, you get an actual outcome. The outcome is normally some change of state of the system under test
and outputs (the result). Whatever happens as a result of the test must be compared with the expected outcome (your
hypothesis). If the actual outcome matches the expected outcome, your hypothesis is proven. That is what a test is.
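As an illustration only, the ingredients described above map directly onto a simple automated test. The function, names and figures below are invented for this sketch (Python is used purely as an example language):

# Object under test: a hypothetical function returning one year's simple interest.
def calculate_interest(balance, annual_rate):
    return round(balance * annual_rate, 2)

def test_standard_rate():
    balance, rate = 1000.00, 0.05      # inputs applied to the software under test
    expected = 50.00                   # expected result: the hypothesis, defined BEFORE the run
    actual = calculate_interest(balance, rate)   # actual outcome of running the test
    assert actual == expected          # comparison: pass only if actual matches expected

test_standard_rate()

If the assertion fails, the actual outcome differs from the expected outcome and a fault may be present.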
When we run a test, we must have an expected result derived from the baseline
Just like a controlled experiment, where a hypothesis must be proposed in advance of the experiment taking place, when
you run a test, there must be an expected outcome defined beforehand. If you don't have an expected result, there is a risk
that the software simply does what it does and, because you have nothing to compare its behaviour with, you may assume that
the software works correctly. Without an expected result, you have no way of saying whether the software is correct or incorrect.
Boris Beizer (ref) suggests that if you watch an eight-year old play pool – they put the cue ball on the table; they address the
cue ball, hit it as hard as they can, and if a ball goes in the pocket, the kid will say, "I meant that". Does that sound familiar?
What does a professional pool player do? A pro will say, "xxx ball in the yyy pocket". They address the cue ball, hit it as hard
as they can, and if it goes in, they will say, "I meant that" and you believe them.
It’s the same with testing. A kiddie tester will run some tests and say “that looks okay" or "that sounds right…”, but there will
be no comparison, no notion of comparison with an expected result - there is no hypothesis. Too often, we are expected to test
without a requirement or an expected result. You could call it 'exploratory testing' but strictly, it is not testing at all.
An actual result either matches or does not match the expected result
What we are actually looking for is differences between our expected result and the actual result.
If there is a difference, there may be a fault in the software and we should investigate.
If we see a difference, the software may have failed, and that is how we are going to infer the existence of faults in the
software.
Testing includes:
It is important to recognise that testing is not just the act of running tests. What are the testing activities then?
Testing obviously includes the planning and scoping of the test and this involves working out what you’re going to do in the
test - the test objectives.
Specification and preparation of test materials delivers the executable test itself. This involves working out test conditions,
cases, and creating test data, expected results and scripts themselves.
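As a small invented sketch of what 'test conditions, cases and data' might look like once written down (the rule, field names and figures are hypothetical):

# A test condition refined into concrete test cases, each with input data and
# an expected result recorded before execution.
test_condition = "Withdrawals that exceed the available balance must be rejected"

test_cases = [
    {"id": "TC-01", "balance": 100.00, "withdrawal": 50.00,  "expected": "accepted"},
    {"id": "TC-02", "balance": 100.00, "withdrawal": 100.00, "expected": "accepted"},  # on the boundary
    {"id": "TC-03", "balance": 100.00, "withdrawal": 100.01, "expected": "rejected"},  # just over it
]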
In the standard process, there is a stage called Test Checking for Completion. It is during this activity that we check
whether we have met our completion criteria. The slight problem with this is the notion that, every time you run a test, you
must check whether you have met the completion criteria. With component-level tests this works fine, but with system testing
it doesn’t work that way: you don’t want to have to ask, “have I finished yet?” after every test case.
Completion criteria vary with different test stages. In system and acceptance testing, we tend to require that the test plan
has been completed without a failure. With component testing, we may be more driven by the coverage target, and we may
have to create more and more tests to achieve our target.
• Objective, measurable criteria for test completion, for example
o All tests run successfully
o All faults found are fixed and re-tested
o Coverage target (set and) met
o Time (or cost) limit exceeded
• Coverage items defined in terms of
o Requirements, conditions, business transactions
o Code statements, branches.
Often, time pressure forces a decision to stop testing. Often, development slips and testing is ‘squeezed’ to ensure a
timely delivery into production. This is a compromise but it may be that some faults are acceptable. When time runs out
for testing, the decision to continue testing or to release the system forces a dilemma on the project. “Should we release
the system early (on time), with faults, or not?” It is likely that if time runs out you may be left with the fact that some
tests have failures and are still outstanding. Some tests you may not have run yet. So it is common that the completion
criteria are compromised.
If you do finish all of your testing and there is still time leftover, you might choose to write some more tests, but this isn’t
very likely. If you do run out of time, there is the third option: you could release the system, but continue testing to the
end of the plan. If you find faults after release, you can fix them in the next package. You are taking a risk but there may
be good reasons for doing so. However clear-cut the textbooks say completion criteria are, in practice it is not usually so clean.
Only in high-integrity environments does testing continue until the completion criteria are met.
• Under time pressure in low integrity systems
o Some faults may be acceptable (for this release)
o Some tests may not be run at all
• If there are no tests left, but there is still time
o Maybe some additional tests could be run
• You may decide to release the software now, but testing could continue.
3.8. Coverage
Coverage measures - a model or method used to quantify testing (e.g. decision coverage)
Coverage measures are based on models of the software. The models represent an abstraction of the software or its
specification. The model defines a technique for selecting test cases that are repeatable and consistent and can be used by
testers across all application areas.
Functional techniques
Functional test techniques are those that use the specification or requirements for software to derive test cases. Examples
of functional test techniques are equivalence partitioning, boundary value analysis and state transitions.
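As a small invented illustration of two of these techniques, suppose a specification says that applicants aged 18 to 65 inclusive are accepted. Equivalence partitioning picks one representative value from each partition, while boundary value analysis tests on and around each boundary (the function and values are hypothetical):

def is_eligible(age):
    # Hypothetical function under test, implementing the rule "18 to 65 inclusive".
    return 18 <= age <= 65

# Equivalence partitioning: one value per partition (below range, in range, above range).
partition_cases = [10, 40, 80]

# Boundary value analysis: values on and either side of each boundary.
boundary_cases = [17, 18, 19, 64, 65, 66]

for age in partition_cases + boundary_cases:
    print(age, is_eligible(age))    # compare each result against the expected outcome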
Structural techniques.
Structural test techniques are those that use the implementation or structure of the built software to derive test cases.
Examples of structural test techniques are statement testing, branch testing, linear code sequence and jump (LCSAJ)
testing.
3.10. Structural coverage
There are over fifty test techniques that are based on the structure of code. Most are appropriate to third generation
languages such as COBOL, FORTRAN, C, BASIC etc. In practice, only a small number of techniques are widely used, as
tool support is essential to measure coverage and make the techniques practical.
Measures and coverage targets based on the internal structure of the code
Coverage measures are based on the structure (the actual implementation) of the software itself. Statement coverage is
based on the executable source code statements themselves. The coverage item is an executable statement. 100%
statement coverage requires that tests be prepared which, when executed, exercise every executable statement.
Decision testing depends on the decisions made in code. The coverage item is a single decision outcome and 100%
decision coverage requires all decision outcomes to be covered.
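A small invented example shows the difference between the two measures: a single test can execute every statement without exercising both outcomes of a decision.

def apply_discount(total, is_member):
    discount = 0.0
    if is_member:                 # one decision, two outcomes: True and False
        discount = total * 0.1
    return total - discount

# This one test executes every statement (100% statement coverage) but only the
# True outcome of the decision.
assert apply_discount(100.0, True) == 90.0

# A second test is needed to cover the False outcome and reach 100% decision coverage.
assert apply_discount(100.0, False) == 100.0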
Normal strategy:
The usual approach to using structural test techniques is as follows:
(1) Use coverage tool to instrument code. A coverage tool is used to pre-process the software under test. The tool
inserts instrumentation code that has no effect on the functionality of the software under test, but logs the paths through
the software when it is compiled and run through tests.
(2) Execute tests. Test cases are prepared using a functional technique (see later) and executed on the instrumented
software under test.
(3) Use coverage tool to measure coverage. The coverage tool is then used to report on the actual coverage achieved
during the tests. Normally, less than 100% coverage is achieved. The tool identifies the coverage items (statements,
branches etc.) not yet covered.
(4) Enhance test to achieve coverage target. Additional tests are prepared to exercise the coverage items not yet
covered.
(5) Stop testing when coverage target is met. When tests can be shown to have exercised all coverage items (100%
coverage) no more tests need be created and run.
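As a hedged sketch of steps (1) to (3), using coverage.py, one widely used coverage tool for Python (the tool must be installed separately, and the test module name here is hypothetical):

import coverage

cov = coverage.Coverage()     # (1) the tool will record which lines are executed
cov.start()

import my_component_tests     # (2) run the functionally derived tests (hypothetical module)
my_component_tests.run_all()

cov.stop()
cov.save()
cov.report()                  # (3) report coverage achieved and highlight items not yet covered
# Steps (4) and (5): add tests for the uncovered items and repeat until the target is met.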
Note that 100% coverage may not be possible in all situations. Some software exists to trap exceptional or obscure error
conditions and it may be very difficult to simulate such situations. Normally, this requires special attention or additional
scaffolding code to force the software to behave the way required. Often the 100% coverage requirement is relaxed to take
account of these anomalies.
Structural techniques are most often used in component or link test stages as some programming skills are required to use
them effectively.
3.11. Functional coverage
There are fewer functional test techniques than structural techniques. Functional techniques are based on the specification
or requirements for software. Functional test techniques do not depend on the code, so are appropriate for all software at all
stages, regardless of the development technology.
Using a test technique to analyse a specification, we can be confident that we have covered all the system behaviours and
the full scope of functionality, at least as seen by the user. The techniques give us a powerful method to ensure we create
comprehensive tests which are consistent in their depth of coverage of the functionality, i.e., we have a measure of the
completeness of our testing.
3.13. Limitations of testing
Many non-testers believe that testing is easy, that software can be tested until it is fault free, that faults are uniformly difficult
(or easy) to detect. Testers must not only understand that there are limits to what can be achieved, but they must also be
able to explain these limitations to their peers, developers, project manager and users.
Always possible to create more tests so it is difficult to know when you are finished
Even when we believe we have done enough testing, it is relatively simple to think of additional tests that might enhance our
test plan. Even though the test techniques give us a much more systematic way of designing comprehensive tests, there is
never any guarantee that such tests find all faults. Because of this testers are tempted into thinking that there is always
another test to create and so are 'never satisfied' that enough testing has been done; that they never have enough time to
test.
Given these limitations, there are two paradoxes which can help us to understand how we might better develop good tests
and the limitations of our 'art'.
Testing paradoxes:
(1) The best way to gain confidence in software is to try and break it. The only way we can become confident in our software
is for us to try difficult, awkward and aggressive tests. These tests are most likely to detect faults. If they do detect faults, we
can fix the software and the quality of the software is increased. If they do not detect a fault, then our confidence in the
software is increased. Only if we try and break the software are we likely to get the required confidence.
(2) You don't know how good your testing is until maybe a year after release. A big problem for testers is that it is very difficult to
determine whether the quality or effectiveness of our testing is good or bad until after the software has gone into production. It is
the faults that are found in production by users that give us a complete picture of the total number of bugs that should have
been found. Only when these bugs have been detected can we derive a view on our test effectiveness. The more bugs found in
testing, compared to production, the better our testing has been. The difficulty is that we might not get the true picture until all
production bugs have been found, and that might take years!
Your incentive will now be to create really tough tests. If your goal is to find faults, and you try and don’t find any, then
you can be confident that the product is robust. Testers should have a mindset which says finding faults is the goal. If
the purpose of testing is to find faults, when faults are found, it might upset a developer or two, but it will help the project
as a whole.
3.14.3. Tester mindset
Some years ago, there was a popular notion that testers should be put into “black teams”. Black teams were a popular
idea in the late 1960s and early 1970s. If a successful test is one that locates a fault, the thinking went, then the testers
should celebrate finding faults, cheering even. Would you think this was a good idea if you were surrounded by
developers? Of course not.
There was an experiment some years ago in IBM. They set up a test team, who they called the 'black team' because
these guys were just fiends. Their sole aim was to break software. Whatever was given to them to test, they were going
to find faults in it. They developed a whole mentality where they were the ‘bad guys’.
They dressed in black, with black Stetson hats and long false moustaches, all for fun. They really were the bad guys,
just like the movies. They were very effective at finding faults in everyone’s work products, and had great fun, but they
upset everyone whose project they were involved in. They were most effective, but eventually were disbanded.
Technically, it worked fine, but from the point of view of the organisation, it was counterproductive. The idea of a “black
team” is cute, but keep it to yourself: it doesn’t help anyone if you crow when you find a fault in a programmer's code.
You wouldn’t be happy if one of your colleagues told you your product was poor and laughed about it. It’s just not funny.
The point to be made about all this is that the tester’s mindset is critical.
3.15. Re-Testing
A re-test is a test that failed on the last occasion you ran it: the system failed and a fault was found, and now you’re repeating
the same test to make sure that the fault has been properly corrected. This is called re-testing. We know that every test plan
we’ve ever run has found faults in the past, so we must always expect and plan to do some re-testing.
Does your project manager plan optimistically? Some project managers always plan optimistically. They ask the testers:
“how long is the testing going to take?”. The tester replies, perhaps, “four weeks if it goes as well as possible…”. In other
words, the tester suggests that, with things going perfectly well, it might take a month, knowing that it should take
twice as long because things do go wrong: you do find faults, and there are delays between finding a fault, fixing it, and re-
testing. The project manager pounces on the ‘perfect situation’ and plans optimistically. Some project managers plan on the
basis of never finding faults, which is absolutely crazy. We must always expect to do some re-testing.
• If we run a test that detects a fault we can get the fault corrected
• We then repeat the test to ensure the fault has been properly fixed
• This is called re-testing
• If we test to find faults, we must expect to find some faults so...
• We always expect to do some re-testing.
3.16. Regression testing
Regression testing is different from re-testing. We know that when we change software to fix a fault, there’s a significant
possibility that we will break something else. Studies over many years reveal that the probability of introducing a new fault
during corrective maintenance is around 50%. The 50% probability relates to creating a new fault in the software before
testing is done. Testing will reduce this figure dramatically, but it is unsafe and perhaps negligent not to test for these
unwanted side-effects.
• When software is fixed, it often happens that 'knock-on' effects occur
• We need to check that only the faulty code has changed
• 50% chance of regression faults
• Regression tests tell us whether new faults have been introduced
o i.e. whether the system still works after a change to the code or environment has been made
"Testing to ensure a change has not caused faults in unchanged parts of the system"
A regression test is a check to make sure that when you make a fix to software the fix does not adversely affect other
functionality.
The big question, “is there an unforeseen impact elsewhere in the code?” needs to be answered. The need exists
because fault-fixing is error-prone. It’s as simple as that. Regression tests tell you whether software that worked before
the fix was made, still works. The last time that you ran a regression test, by definition, it did not find a fault; this time,
you’re going to run it again to make sure it still doesn’t expose a fault.
A more formal definition of regression testing is – testing to ensure a change has not caused faults in unchanged parts
of the system.
An entire test may be retained for subsequent use as a regression test pack
It is possible that you may, on a system test say, keep the entire system test plan and run it in its entirety as a regression
test.
Some might say that manual regression tests are a contradiction in terms
Manual regression testing is arguably a contradiction in terms: regression tests are selected on the basis that they cover
perhaps the most stable parts of the software, which makes them prime candidates for automation.
Regression tests are the most likely to be stable and run repeatedly so:
The tests that are easiest to automate are the ones that don’t find the bugs, because you’ve run them once to completion.
The problem with tests that did find bugs is that they cannot be automated so easily.
The paradox of automated regression testing is that the tests that are easiest to automate are the tests that didn’t find faults
the last time we ran them. So the tests we end up automating often aren't the best ones.
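As a small invented sketch of a regression pack: tests that passed on the previous release are retained unchanged and re-run after every fix, so that any new failure signals a possible regression (the function and figures are hypothetical):

def order_total(prices):
    # Stand-in for an unchanged part of the system that the pack protects.
    return round(sum(prices), 2)

# Regression pack: these checks found no faults last time; we re-run them to
# confirm that a change elsewhere has not broken behaviour that previously worked.
assert order_total([9.99]) == 9.99
assert order_total([9.99, 5.01]) == 15.00
assert order_total([]) == 0.00
print("regression pack passed - no new faults detected in unchanged behaviour")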
3.19. Expected Results
We’ve already seen that the fundamental test process requires that an outcome (expected result) must be predicted before
the test is run. Without an expected result the test cannot be interpreted as a pass or fail. Without some expectation of the
behaviour of a system, there is nothing to compare the actual behaviour with, so no decision on success or failure can be
made. This short section outlines the importance of baselines and expected results.
3.22. Expected results
The concern about expected results is that we should define them before we run the tests. Otherwise, we’ll be tempted to
say that, whatever the system does when we test it, we’ll pass the result as correct. That’s the risk. Imagine that you’re
under pressure from the boss (‘don’t write tests…just do the testing…’). The pressure is immense, so it’s easier to not write
anything down, to not think what the results should be, to run some informal tests and pass them as correct. Expected
results (even when good baselines aren’t available) should always be documented.
• If we don't define expected result before we execute the test...
o A plausible, but erroneous, result may be interpreted as the correct result
o There may be a subconscious desire to see the software pass the test
• Expected results must be defined before test execution, derived from a baseline
4. Prioritisation of Tests
We’ve mentioned coverage before, and we need to go into a little bit more detail on coverage. Were you ever given enough time
to test? Probably not. So what happens when you do some initial work to specify a test and then estimate the effort required to
complete the testing tasks? Normally, your estimates are too high, things need prioritisation and some tests will be ‘de-scoped’.
This is entirely reasonable because we know that at some point the cost of testing must be balanced against the risk of release.
First principle: to make sure the most important tests are included in test plans
So, the first principle of prioritisation must be that we make sure that the most important tests are included in the test plans.
That’s pretty obvious.
Second principle: to make sure the most important tests are executed
The second principle is however, that we must make sure that the most important tests are run. If, when the test execution
phase starts and it turns out that we do run out of time before the test plan is complete, we want to make sure that, if we do
get squeezed, the most important tests, at least, have been run. So, we must ensure that the most important tests are
scheduled early to ensure that they do get run.
If tests reveal major problems, better find them early, to maximise time available to correct problems.
There is a further important benefit of running the most important tests first. If the most important tests reveal problems
early on, you have the maximum amount of time to fix them and recover the project.
4.5. Critical
When you ask users which parts of the system are more critical than others, what happens? You say: ‘We’d like to prioritise the
features of the system, so it would help us if you could tell me which requirements are high-priority, the most critical’. What
would you expect them to say?
‘All of our requirements are critical’. Why? Because they believe that when they de-prioritise something, it is going to get
pushed out, de-scoped, and they don’t want that to happen. They want everything they asked for so they are reluctant to
prioritise. So, you have to explain why you’re going through this process because it is most important that you test the most
critical parts of the software a bit more than those parts of the system that are less critical. The higher the criticality of a
feature, the greater the risk, the greater the need to test it well.
People will co-operate with you, once they realise what it is that you’re trying to achieve. If you can convince them that
testing is not uniform throughout the system and that some parts need more testing than others, all you are asking for is a
steer. These are ways of identifying what is more important.
What parts of the system do the users really need to do their job?
As a tester, you have to get beyond the response, ‘it’s all critical!’ You might ask, ‘which parts of the system do you really,
really need?’ You have to get beyond this kind of knee-jerk reaction that everything is critical. You have to ask, ‘what is
really, really important?’
4.6. Complex
If you know an application reasonably well, then you will be able to say, for example, that these user screens are pretty
simple, but the background or batch processes that do end-of-the-day processing are very complicated. Or perhaps that the
user-interface is very simple, apart from these half dozen screens that calculate premiums, because the functionality behind
those screens consists of a hundred thousand lines of code. Most testers and most users could work out which are the most
complex parts of the system to be tested.
4.7. Error-prone
The third criterion is error-proneness. There is of course a big overlap with complexity here – most complex software is error-
prone. But sometimes, what appear to be simpler parts of a system may turn out to be error-prone.
Verification
The principle of verification is this: verification checks that the product of a development phase meets its specification,
whatever form that specification takes. More formally, verification implies that all conditions laid down at the start of a
development phase are met. This might include multiple baseline or reference documents such as standards, checklists or
templates.
Validation is really concerned with testing the final deliverable – a system, or a program – against user needs or
requirements. Whether the requirements are formally documented or exist only as user expectations, validation activities
aim to demonstrate that the software product meets these requirements and needs. Typically, the end-user requirements
are used as the baseline. An acceptance test is the most obvious validation activity.
Defined as: "determination of the correctness of the products of software development with respect to the user needs and
requirements"
In other words, validation is the determination of the correctness of the products of a software development with respect to
the users' needs and requirements.
Verification activities are mainly (but not exclusively) the concern of the suppliers of the system.
Verification tends to be more the concern of the supplier/developer of the software product, rather than the concern of the
user, at least up until system testing. A technician asks: did we build this product the way we specified?
(2) Analysis techniques were intuitive. ‘Design’ was a term used by programmers to describe their coding activity.
(3) Requirements were sketchy. Testing was not a distinct activity at all, but something done by programmers on an informal
basis.
(4) Programs were written without designs. The main consequence of this approach was that systems were very expensive,
fault prone and very difficult to maintain.
5.3. Structured methodologies
Waterfall model
The ‘Waterfall Approach’ to development, where development is broken up into a series of sequential stages, was the
original textbook method for large projects. There are several alternatives that have emerged in the last ten years or so.
Spiral model
The Spiral model of development acknowledges the need for continuous change to systems as business change proceeds
and that large developments never hit the target 100% first time round (if ever). The Spiral model regards the initial
development of a system as simply the first lap around a circuit of development stages. Development never ‘stops’, in that a
continuous series of projects refines and enhances systems.
Incremental prototyping
Incremental prototyping is an approach that avoids taking big risks on big projects. The idea is to run a large project as a
series of small, incremental and low-risk projects. Large projects are very risky because by sheer volume, they become
complex. You have lots of people, lots of communication and mountains of paperwork. There are a number of
difficulties associated with running a big project. So, this is a way of just carving up big projects into smaller projects. The
probability of project failure is lowered and the consequence of project failure is lessened.
Requirements
Most static testing will operate on project deliverables such as requirements and design specification or test plans. However,
any document can be reviewed or inspected. This includes project terms of reference, project plans, test results and reports,
user documentation etc.
Designs
Review of the design can highlight potential risks that if identified early can either be avoided or managed.
Code
There are techniques that can be used to detect faults in code without executing the software. Review and inspection
techniques are effective but labour intensive.
Static analysis tools can be used to find statically detectable faults in millions of lines of code.
Test plans.
It is always a good idea to get test plans reviewed by independent staff on the project - usually business people as well as
technical experts.
Static tests do not involve executing the software. Dynamic tests, the traditional method of running tests by executing the
software, are appropriate for all stages where executable software components are available.
System testing
System-level tests are split into functional and non-functional test types.
Non-functional tests address issues such as performance, security, backup and recovery requirements.
Functional tests aim to demonstrate that the system, as a whole, meets its functional specification.
System and acceptance test plans written towards the end of the physical design phase including
The system and acceptance test plans include the test specifications and the acceptance criteria. System and acceptance
tests should also be planned early, if possible. System-level test plans tend to be large documents - they take a lot longer to
plan and organise at the beginning and to run and analyse at the end. System test planning normally involves a certain
amount of project planning, resourcing and scheduling because of its scale. It’s a bigger process entirely requiring much
more effort than testing a single component.
Test plans for components and complete systems should be prepared well in advance for two reasons. Firstly, the process
of test design detects faults in baseline documents (see later); secondly, it allows time for the preparation of test materials
and test environments. Test planning depends only on good baseline documents so can be done in parallel with other
development activities. Test execution is on the critical path – when the time comes for test execution, all preparations for
testing should be completed.
5.15. Common problems
If there is little early testing, such as requirements or design reviews, if component testing and integration testing in the
small don't happen, what are the probable consequences?
Lots of rework
Firstly, lots of faults that should have been found by programmers during component testing cause problems in system test.
System testing starts late because the builds are unreliable and the most basic functionality doesn't work. The time taken to
fix faults delays system testing further, because the faults stop all testing progressing.
Delivery slippage
Re-programming trivial faults distracts the programmers from serious fault fixing.
Re-testing and regression testing distract the system testers.
The overall quality of the product is poor, the product is late and the users become particularly frustrated because they
continue to find faults that they are convinced should have been detected earlier.
Cut back on function, deliver low quality or even the wrong system.
Time pressure forces a decision: ship a poor quality product or cut back on the functionality to be delivered. Either way, the
users get a system that does not meet their requirements at all.
b. Test deliverables
This is a diagram lifted from the IEEE 829 Standard for Software Test Documentation. The standard defines a
comprehensive structure and organisation for test documentation and composition guidelines for each type of document.
In the ISEB scheme, IEEE 829 is being promoted as a useful guideline and template for your project deliverables.
You don't need to memorise the content and structure of the standard, but the standard number IEEE 829 might well be
given as a potential answer in an examination question.
NB: it is a standard for documentation, but makes no recommendation on how you do testing itself.
e. Brainstorming – agenda
It is helpful to have an agenda for the brainstorming meeting. The agenda should include at least the items below. We find it
useful to use the Master Test Plan (MTP) headings as an agenda and for the testers to prepare a set of questions
associated with each heading to 'drive' the meeting.
f. MTP Headings
1. Test Plan Identifier
• unique, generated number to identify this test plan, its level and the level of software that it is related to
• preferably the test plan level will be the same as the related software level
• may also identify whether the test plan is a Master plan, a Level plan, an integration plan or whichever plan
level it represents.
2. References
3. Introduction
• the purpose of the Plan, possibly identifying the level of the plan (master etc.).
• the executive summary part of the plan.
6. Features to be tested
8. Approach (Strategy)
15. Responsibilities
• who is in charge?
• who defines the risks?
• who selects features to be tested and not tested
• who sets overall strategy for this level of plan.
16. Schedule
18. Approvals
19. Glossary
• used to define terms and acronyms used in the document, and testing in general, to eliminate confusion and
promote consistent communications.
6. Stages of Testing
This module sets out the six stages of testing as defined in the ISEB syllabus and provides a single slide description of each
stage. The modules that follow this one describe the stages in more detail.
Component testing is the lowest level of testing. The purpose of it is to demonstrate that a program performs as described in
its specification. Typically, you are testing against a program specification. Techniques – black and white box testing
techniques are used. The programmers know how to work out test cases to exercise the code by looking at the code (white
box testing). When the programmers are using the program spec to drive their testing, then this is black box testing. Object
under test – a single program, a module, class file, or any other low-level, testable object. Who does it? Normally, the
author of the component. It might not be, but usually, it is the same person that wrote the code.
Integration testing in the small, is also called link testing. The principle here is that we’re looking to demonstrate that a
collection of components, which have been integrated, interface with each other. We’re testing whether or not those
interfaces actually work, according to a physical design. It’s mainly white box testing, that is, we know what the interface
looks like technically (the code). Object under test – usually more than one program or component or it could be all of the
sub-programs making up a program. Who does it? Usually a member of the programming team because it’s a technical
task.
Objectives: To demonstrate that a whole system performs as described in the logical design or functional specification documents.
Test technique: Black box, mainly.
Object under test: A sub-system or system.
Responsibility: A test team or group of independent testers.
Scope: System testing is often divided up into sub-system tests followed by full system tests. It is also divided into testing of "functional" and "non-functional" requirements.
The objective of functional system testing is to demonstrate that the whole system performs according to its functional
specification. The test techniques are almost entirely black box. Functional testing is usually done by more than one person
- a team of testers. The testers could be made up of representatives from different disciplines, e.g., business analysts,
users, etc. or they could be a team of independent testers (from outside the company developing or commissioning the
system).
Objectives: To demonstrate that the non-functional requirements (e.g. performance, volume, usability, security) are met.
Test technique: Normally a selection of test types including performance, security, usability testing etc.
Object under test: A complete, functionally tested system.
Responsibility: A test team or group of independent testers.
Scope: Non-functional system testing is often split into several types of test organised by the requirement type.
Non-functional requirements describe HOW the system delivers its functionality. Requirements specifying the performance,
usability, security, etc. are non-functional requirements. You need a complete, functionally tested system that is
reliable and robust enough to test without it crashing every five minutes. You may be able to start the preparation of the non-
functional tests before the system is stable, but the actual tests have to be run on the system as it will be at the time when it
is ready for production.
Objectives: To demonstrate that a new or changed system interfaces correctly with other systems.
Test technique: Black and white box.
Object under test: A collection of interfacing systems.
Responsibility: Inter-project testers.
Scope: White box tests cover the physical interfaces between systems and the inter-operability of systems. Black-box tests verify the data consistency between interfacing systems.
Integration testing in the large involves testing multiple systems and paths that span multiple systems. Here, we’re looking
at whether the new or changed interfaces to other systems actually work correctly. Many of the tests will operate 'end-to-
end' across multiple systems. This is usually performed by a team of testers.
Objectives: To satisfy the users that the delivered system meets their requirements and that the system fits their business process.
Test technique: Entirely black box.
Object under test: An entire system.
Responsibility: Users, supported by test analysts.
Scope: The structure of User Testing is in many ways similar to System Testing; however, the Users can stage whichever tests will satisfy them that their requirements have been met. User Testing may include testing of the system alongside manual procedures and documentation.
Here, we are looking at an entire system. Users will do most of the work, possibly supported by more experienced testers.
Objectives
What are the objectives? What is the purpose of this test? What kind of errors are we looking for?
Responsibility
Who performs the testing?
Scope
As for the scope of the test: how far into the system will you go in conducting the test? How do you know when to stop?
7. Component Testing
The first test stage is component testing. Component testing is also known as unit, module or program testing (most often unit).
Component testing is most often done by programmers or testers with strong programming skills.
Ad hoc Testing:
• Does not have a test plan
• Not based on formal test case design
o Not repeatable
o Private to the programmer
• Faults are not usually logged
Component Testing
• Has a test plan
• Based on formal test case design
o Must be repeatable
o Public to the team
o Faults are logged
If specifications aren't reviewed, the programmer is the first person to 'test' the specification
When reviewing a specification, look for ambiguities, inconsistencies and omissions. Omissions are hardest to spot.
Preparing tests from specifications finds faults in specifications.
Informal component testing is usually based on black box techniques. The test cases are usually derived from the
specification by the programmer. Usually they are not documented. It may be that the program cannot be run except by using
drivers and maybe a debugger to execute the tests. It’s all heavily technical, and the issue is – how will the programmer
execute tests of a component if the component doesn’t have a user interface? It’s quite possible that it won't.
The objective of the testing is to ensure that all code is exercised (tested) at least once. It may be necessary to use the
debugger to actually inject data into the software to make it exercise obscure error conditions. The issue with informal
component testing is – how can you achieve confidence that the code that’s been written has been exercised by a test when
an informal test is not documented? What evidence would you look for to say that all the lines of code in a program have
been tested? How could you achieve that?
Using a coverage measurement tool is really the only way that it can be shown that everything has been executed. But did
the code produce the correct results? This can really only be checked by tests that have expected output that can be
compared against actual output.
The problem is that most software developers don’t use coverage tools.
• Usually based on black box techniques
• Tables of test cases may be documented
• Tests conducted by the programmer
• There may be no separate scripts
• Test drivers, debugger used to drive the tests
o to ensure code is exercised
o to insert required input data
The first integration strategy is 'top down'. What this means is that the highest level component, say a top menu, is written
first. This can't be tested because the components that are called by the top menu do not yet exist. So, temporary
components called 'stubs' are written as substitutes for the missing code. Then the highest level component, the top menu,
can be tested.
When the components called by the top menu are written, these can be inserted into the build and tested using the top
menu component. However, the components called by the top menu themselves may call lower level components that do
not yet exist. So, once again, stubs are written to temporarily substitute for the missing components.
This incremental approach to integration is called 'top down'.
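A small invented sketch of the idea: the real top-level component is tested against a stub that stands in for code which has not been written yet (all names are hypothetical).

def report_stub(customer_id):
    # Stub: a canned response in place of the unwritten report component.
    return "REPORT-PLACEHOLDER for " + customer_id

def top_menu(option, customer_id, report=report_stub):
    # Highest-level component, written first and tested by calling the stub.
    if option == "print-report":
        return report(customer_id)
    return "unknown option"

# The top menu can be tested before the real report component exists.
assert top_menu("print-report", "C042") == "REPORT-PLACEHOLDER for C042"

When the real report component is written, it replaces the stub and the same tests are repeated.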
The second integration strategy is 'bottom up'. What this means is that the lowest level components are written first. These
components can't be tested because the components that call them do not yet exist. So, temporary components called
'drivers' are written as substitutes for the missing code. Then the lowest level components can be tested using the test
driver.
When the components that call our lowest level components are written, these can be inserted into the build and tested in
conjunction with the lowest level components that they call. However, the new components themselves require drivers to be
written to substitute for calling components that do not yet exist. So, once again, drivers are written to temporarily substitute for
the missing components.
This incremental approach to integration is called 'bottom up'.
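A matching invented sketch for bottom-up integration: a low-level component is written first, and a temporary driver calls it in place of the higher-level components that do not yet exist (the function and figures are hypothetical).

def vat_amount(net, rate=0.20):
    # Lowest-level component, completed before anything that will call it.
    return round(net * rate, 2)

def test_driver():
    # Driver: substitutes for the missing calling components and checks results.
    for net, expected in [(100.00, 20.00), (19.99, 4.00), (0.00, 0.00)]:
        actual = vat_amount(net)
        assert actual == expected, f"{net}: expected {expected}, got {actual}"

test_driver()

When the real calling components are written, the driver is discarded and the integrated components are tested together.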
A mixed integration strategy involves some aspect of bottom-up, top-down and big bang.
8.10.Global data
Interface testing should also address the use of global data. Global data might be an area of memory shared by multiple
systems or components. Global data could also refer to the content of a database record or perhaps the system time, for
example.
Other assumptions:
Other assumptions relate to the "ownership" of global data. A component may assume that it can set the value of global
data and no other program can unset it or change it in any way. Other assumptions can be that global data is always correct;
that is, under no circumstances can it be changed and be made inconsistent with other information held within a component.
A component could also make erroneous assumptions about the repeatability or re-entrancy of a routine.
All of these assumptions may be mistaken if the rules for use of global data are not understood.
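A tiny invented example of a mistaken 'ownership' assumption: component A sets a shared value and assumes nothing else will change it before A reads it again (all names and figures are hypothetical).

session = {"discount_rate": 0.0}        # global data shared by several components

def component_b_audit():
    session["discount_rate"] = 0.0      # B resets the shared value

def component_a_quote(price):
    session["discount_rate"] = 0.10               # A assumes it now "owns" this value
    gross = price * (1 - session["discount_rate"])
    component_b_audit()                            # ...but B runs in between...
    net = price * (1 - session["discount_rate"])   # A re-reads, expecting 0.10
    return gross, net

print(component_a_quote(100.0))          # (90.0, 100.0) - the two figures disagree

Interface testing of the components that share this data is what exposes such mismatched assumptions.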
Programming or interface standards should define whether the calling routine, the called routine, or both perform checking and under what
circumstances.
The principle of all integration testing and all inter-component parameter passing is that interface standards must be clear
about how the calling and the called components process passed data and shared data. The issue about integration and
integration testing is that documenting these interfaces can eliminate many, if not all, interface bugs. In summary, most
interface bugs relate to shared data and mistaken assumptions about the use of that data across interfaces. Where
programmers do not communicate well within the programming team, it is common to find interface problems and
integration issues within that team. The same applies to different teams who do not document their interfaces and agree the
protocol to be used between their different software products.
9. System and Acceptance Testing
System and acceptance testing focus on the testing of complete systems.
This module presents a few observations about the similarities and differences between system and acceptance testing
because the differences are slight, but important.
The most significant difference between acceptance and system testing is one of viewpoint.
System testing is primarily the concern of the developers or suppliers of software.
Acceptance testing is primarily the concern of the users of software.
9.1. Similarities
Aim to demonstrate that documented requirements have been met
Let’s take as an example a middle-of-the-road IT application. Say, you’re building a customer information system, or a
help desk application, or a telesales system. The objective of both system and acceptance testing is one aim - to
demonstrate that the documented requirements have been met. The documented requirements might be the business
requirements or what’s in the functional spec, or the technical requirements.
A systematic demonstration that all features are available and work as specified
If you look at system testing from the point of the view of the supplier of the software, system testing tends to be viewed as
how the supplier demonstrates that they’ve met their commitment. This might be in terms of a contract or with respect to
meeting a specification for a piece of software that they’re going to sell.
Usual to assume that all major faults have been removed and the system works
It is usual to assume at acceptance testing that all major faults have been removed by the previous component, link and
system testing and that the system 'works'. In principle, if earlier testing has been done thoroughly, then it should be safe to
assume the faults have been removed. In practice, earlier testing may not have been thorough and acceptance testing can
become more difficult.
When we buy an operating system, say a new version of Microsoft Windows, we will probably trust it if it has become widely
available. But will we trust that it works for our usage? If we’re Joe Public and we’re just going to do some word-processing,
we’ll probably assume that it is okay. It’s probably perfectly adequate, and we’re going to use an old version of Word on it
and it will probably work just fine. If on the other hand, we are a development shop and we’re writing code to do with device
drivers, it needs to be pretty robust. The presumption that it works is no longer safe because we’re probably going to try and
break it. That’s part of our job. So this aspect of reliability, this assumption about whether or not it works, is basically from
your own perspective.
Acceptance tests:
Acceptance testing is usually on a smaller scale than the system test. Textbook guidelines say that functional system testing
should be about four times as much effort as acceptance testing. You could say that for every user test, the suppliers should
have run around four tests. So, system tests are normally of a larger scale than acceptance tests.
On some occasions, the acceptance test is not a separate test, but a sub-set of the system test. The presumption is that
we’re hiring a company to write software on our behalf and we’re going to use it when it’s delivered. The company
developing the software will run their system testing on their environment. We will also ask them to come to our test
environment and to rerun a subset of their tests, which we will call our acceptance test.
Functional specification
many details might be assumed to exist, but can't be identified from requirements
When the user writes the requirements, many details might be assumed to exist. The supplier won't necessarily share those assumptions, so they will deliver what they think will work. Assumptions arise from knowledge that you have yourself but didn't transmit to the requirements document. A lot of low-level requirements, like field validation and steps of the
process don’t appear in a requirements document. Again, looking at the processes of a large SAP system, they are
incredibly complicated. You have a process called “The Order Process”, and within SAP, there may be 40 screens that you
can go through. Now, nobody would use 40 screens to process an order. But SAP can deliver a system that, in theory, could
use all 40.
The key to it is the configuration that selects only those bits that are useful to you. All the detail that backs up the statement 'process an order' is the difference between processing an order the way you want to and something that's way over the top. Or the opposite can happen: you end up with a system that processes an order too simplistically when you need variations. That's another reason why you have to be careful with requirements.
Intended to demonstrate that the software 'fits' the way the users want to work
We have this notion of fit between the system and the business. The specific purpose of the user acceptance test is to
determine whether the system can be used to run the business.
When buying a package, UAT may be the only form of testing applied.
Packages are a problem because there is no such notion of system testing; you only have acceptance testing. That’s the
only testing that’s visible if it’s a package that you’re not going to change. Even if it is a package that you are only going to
configure (not write software for), UAT is the only testing that’s going to happen.
Users may stage any tests they wish but may need assistance with test design, documentation and organisation
The idea of user acceptance testing is that users can do whatever they want. It is their test. You don’t normally restrict
users, but they often need assistance to enable them to test effectively.
Similar to UAT, focusing on the contractual requirements as well as fitness for purpose
The test itself can take a variety of forms. It could be a system test done by a supplier. It could be what we call a factory
acceptance test which is a test done by the supplier that is observed, witnessed if you like, by the customer. Or you might
bring the software to the customer’s site and run a site acceptance test. Or it could even be the user acceptance test.
9.13.Extended V Model
It’s the same as you’ve seen before, but maybe there’s an architectural aspect to this. Multiple systems collaborate in an
architecture to deliver a service. And the testing should reflect a higher level than just a system level. It could be thought of
as the acceptance test of how the multiple systems deliver the required functionality.
9.14.Phase of Integration
Integration testing is not easy – you need an approach or a methodology to do it effectively. First, you need to identify all of
the various systems that are in place and then you need to do analysis to decide the type of fault you may find, followed by
a process to create a set of tests covering the paths through integration, i.e., the connection of all these systems. And finally,
you have to have a way of predicting the expected results so that you can tell whether the systems have produced the
correct answer.
10. Non-Functional System Testing
Non-functional requirements (NFRs) are those that state how a system will deliver its functionality. NFRs are as important as functional requirements in many circumstances but are often neglected. The following seven modules provide an introduction to the most important non-functional test types.
10.2.Non-functional requirements
Requirements difficulties
The problem with non-functional requirements is that usually they're not written down. Users naturally assume that a system will be usable, that it will be really fast, that it will work for more than half the day, and so on. Many of these aspects of how a system delivers the functionality are assumptions. So, if you look at a functional spec, you'll see 200 pages of functional requirements and then, maybe, one page of non-functional requirements. If they are written down rather than assumed, they usually aren't written down to the level of detail needed to test against them.
Stress testing
Stress testing is where you push the system as hard as you can, up to its threshold. You might record response times, but
stress testing is really about trying to break the system. You increase the load until the system can’t cope with it anymore
and something breaks. Then you fix that and retest. This cycle continues until you have a system that will endure anything
that daily business can throw at it.
Performance testing
Performance testing is not (and this is where it differs from functional testing) a single test. Performance testing aims to
investigate the behaviour of a system under varying loads. It’s a whole series of tests. And basically, the objective of
performance testing is to create a graph based on a whole series of tests. The idea is to measure the response times from
the extremes of a low transaction rate to a very high transaction rate. As you run additional tests with higher loads, the
response time gets worse. Eventually, the system will fail because it cannot handle the transaction rate. The primary
purpose of the test is to show that at the load that the system was designed for, the response times meet the requirement.
Another objective of performance or stress testing is to tune the system, to make it faster.
Whether you are doing load testing, performance testing or stress testing, you will need an automated tool to be effective.
Performance testing can be done with teams of people, but it gets very boring very quickly for the people that are doing the
testing, it’s difficult to control the test, and often difficult to evaluate the results.
10.6.Other objectives
Performance testing will vary depending on the objectives of the business. Frequently there are other objectives besides
measuring response times and loads.
Process.
And you need a process. You need an organised way, a method to help you determine what to do and how to do it.
10.8.The 'task in hand'
Load generation
With an application system, you will keep upping the transaction rate and load until it breaks, and that’s the stress test.
Resource monitoring.
But knowing the performance of a system is not enough. You must know what part of the system is doing what. Inevitably
when you first test a client-server system, the performance is poor. But this information is not useful at all unless you can
point to the bottleneck(s). In other words, you have to have instrumentation.
There's almost no limit to what you can monitor. The things to monitor are all the components of the service, including the network. The application itself may have instrumentation/logging capability that can measure response times. Most databases have monitoring tools. NT, for example, has quite sophisticated monitoring tools for clients. You should try to monitor everything that you might need, because re-running a test to collect more statistics is very expensive.
Resource monitoring is normally done by a range of different tools as well as instrumentation embedded in application or
middleware code.
In our experience, you always need to write some of your own code to fill in where proprietary tools cannot help.
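As an illustration of the monitoring side (a sketch only: it assumes the third-party psutil package is available, and the metrics and sampling interval are arbitrary choices), a small script can sample machine resources into a log file while the load test runs, so that bottlenecks can be traced afterwards:

    # Resource-monitoring sketch: sample CPU, memory and disk I/O while a test runs.
    # Assumes the third-party 'psutil' package; the metrics chosen are illustrative.
    import csv
    import time
    import psutil

    def monitor(duration_s=600, interval_s=5, outfile="resource_log.csv"):
        with open(outfile, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["time", "cpu_percent", "mem_percent", "disk_read_mb", "disk_write_mb"])
            end = time.time() + duration_s
            while time.time() < end:
                io = psutil.disk_io_counters()
                writer.writerow([
                    time.strftime("%H:%M:%S"),
                    psutil.cpu_percent(interval=None),
                    psutil.virtual_memory().percent,
                    round(io.read_bytes / 1_048_576, 1),
                    round(io.write_bytes / 1_048_576, 1),
                ])
                f.flush()                  # keep the log intact even if the machine fails
                time.sleep(interval_s)

    if __name__ == "__main__":
        monitor()

A script like this only covers the machine-level view; application and database instrumentation still has to be enabled separately, as noted above.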
10.12.Security Testing
The purpose of this section is not to describe precisely how to do security testing (it’s a specialist discipline that not many
people can do), but to look at the risks and establish what should be tested.
10.13.Security threats
When we consider security, we normally think of hackers working late into the night, trying to crack into banks and
government systems. Although hackers are one potential security problem, the scope of system security spans a broad
range of threats.
Accidental faults such as accidental change, deletion of data, lack of backups, insecure disposal of media, poor procedures
Even the best-laid plans can be jeopardised by accidents or unforeseen chains of events.
Deliberate or malicious actions such as hacking by external people or disgruntled or fraudulent employees
Hackers are a popular stereotype presented in the movies. Although the common image of a hacker is of a young college
dropout working long into the night, the most threatening hacker is likely to be a professional person, with intimate
knowledge of operating system, networking and application vulnerabilities who makes extensive use of automated tools to
speed up the process dramatically.
10.14.Security Testing
CIA model:
The way that the textbooks talk about security is the CIA model.
Confidentiality is usually what most people think of when they think of security. The question here is "are unauthorised
people looking at restricted data?" The system needs to make certain that authorisation occurs on a person basis and a
data basis.
The second security point is Integrity. This means guarding not just against unauthorised use of restricted functionality, but also against changes to or destruction of data. Could the workings of a system be disrupted by hacking in and changing data?
And the third security point is Availability. It's not a case of unauthorised functions, but a matter of establishing whether unauthorised access or error could actually disable the system.
10.17.Usability Testing
We’re all much more demanding about usability than we used to be. As the Web becomes part of more and more people's
lives, and the choice on the web increases, usability will be a key factor in retaining customers. Having a web site with poor
usability may mean the web site (and your business) may fail.
10.19.User requirements
Typical requirements:
• Messages to users will be in plain English. If you’ve got a team of twenty programmers all writing different
messages, inconsistencies with style, content and structure are inevitable.
• Commands, prompts and messages must have a standard format, should have clear meanings and be consistent.
• Help functions should be available, and they need to be meaningful and relevant.
• The user should always know what state the system is in. Will the user always know where they are? If the phone rings
and they get distracted, can they come back and finish off their task, knowing how they got where they were?
• Another aspect of usability is the feedback that the system gives them – does it help or does it get in the way?
For example, if the user goes to one screen and inputs data, and then goes into another screen and is asked for the
data again, this is a negative usability issue.
The user shouldn’t have to enter data that isn’t required. Think of a paper form where you have to fill box after box of
N/A (not applicable). How many of these are appropriate? The programmer may be lazy and put up a blank form,
expecting data to be input, and then processing begins. But it is annoying if the system keeps coming back asking for
more data or insists that data is input when for this function, it is irrelevant.
The system should only display informational messages as requested by the user.
To recap, the system should not insist on confirmation for limited choice entries, must provide default values when
applicable, and must not prompt for data it does not need.
Other considerations:
There are a number of considerations regarding usability test cases.
There could be two separate tests staged – one for people that have never seen the system and one for the experienced
users. These two user groups have different requirements; the new user is likely to need good guidance and the
experienced user is likely to be frustrated by over-guidance, slow responses, and lack of short cuts. Of course to be valid,
you need to monitor the results (mistakes made, times stuck, elapsed time to enter a transaction, etc.).
When running usability tests, it is normal practice to log all anomalies encountered during the tests. In a usability
laboratory with video and audio capture of the user behaviour and the keystroke capture off the system under test, a
complete record of the testing done can be obtained. This is the most sophisticated (but expensive) approach, but just
having witnesses observe users can be very effective.
It is common to invite the participants to 'speak their mind' as they work. In this way, the developers can understand the
thought processes that users go through and get a thorough understanding of their frustrations.
• Need to monitor faults
o how many wrong screen or function keys etc.
o how many faults were corrected on-line
o how many faults get into the database
• Quality of data compared to manual systems?
• How many keystrokes to achieve the desired objective? (too many?)
Storage tests demonstrate that a system's usage of disk or memory is within the design limits over time, e.g. can the system hold five years' worth of system transactions?
The question is, "can a system, as currently configured, hold the volume of data that we need to store in it?"
Assume you are buying an entire system including the software and hardware. What you’re buying should last longer than
six months, or more than a year, or maybe five years. You want to know whether the system that you buy today can support,
say, five years' worth of historical data.
So, for storage testing, you aim to predict the eventual volume of data based on the number of transactions processed over
the system's lifetime. Then, by creating that amount of data, you test that the system can hold it and still operate correctly.
Volume tests demonstrate that a system can accommodate the largest (& smallest) tasks it is designed to perform e.g. can
end of month processes be accommodated?
Volume tests simply look at how large (or small) a task the system can accommodate.
Not how many transactions per second (i.e. the transaction rate), but how big a task in terms of the total number of transactions. The limiting resource might be long-term storage on disk, but it might also be short-term storage in memory.
Rather than saying 'we want to get hundreds of thousands of transactions per hour through our system', we are asking 'can we simultaneously support a hundred users, or a thousand users?' We want to push the system to accommodate as many parallel streams of work as it has been designed for... and a few more.
10.24.Requirements
Many people wouldn't bother testing the limits of a system if they thought that the system would give them plenty of warning
as a limit is approached so that the eventual failure is predictable. Disk space is comparatively cheap these days, so storage
testing is not the issue it once was. On the other hand, systems are getting bigger and bigger by the day and the failures
might be more extreme.
Requirement is for the system to:
Testing the initial and anticipated storage and volume requirements involves loading the data to the levels specified in the
requirements documents and seeing if the system still works. You can’t just create a mountain of dummy data and then walk
away.
10.25.Running tests
When you run tests on a large database, you're going to wait for failures to occur. You have to consider that as you keep adding rows, eventually it will fail. What happens when it does fail? Do you just get a simple message and find that no one can process transactions, or is it less serious than that? Do you get warnings before it fails?
The test requires the application to be used with designed data volumes
Creation of the initial database by artificial means if necessary (data conversion or randomly generated)
How do you build a production-sized database for a new system? To create a production-sized database you may need to
generate millions and millions of rows of data which obey the rules of the database.
10.26.Pre-requisites
When constructing storage and volume tests there are certain pre-requisites that must be arranged before testing can start.
It is common, as in many non-functional areas, for there to be no written requirements. The tester may need to conduct
interviews and analysis to document the actual requirements.
Often the research required to specify these tests is significant and requires detailed technical knowledge of the application,
the business requirements, the database structure and the overall technical architecture.
• Technical requirements
o database files/tables/structures
o initial and anticipated record counts
• Business requirements
o standing data volumes
o transaction volumes
• Data volumes from business requirements using system/database design knowledge.
10.27.Installation Testing
Installation testing is relevant if you’re selling shrink-wrapped products or if you expect your 'customers', who may be in-
house users, to do installations for themselves.
If you are selling a game or a word-processor or a PC-operating system, and it goes in a box with instructions, an install kit,
a manual, guarantees, and anything else that’s part of the package, then you should consider testing the entire package
from installation to use.
The installation process must work, because if it's no good, it doesn't matter how good your software is; if people can't get your software installed correctly, they'll never get it running. They'll complain and may ask for their money back.
10.28.Requirements
Can the system be installed and configured using supplied media and documentation?
The installation procedure and its documentation are often the last thing written, so they may be flaky, but they are the first thing the user will see and experience.
10.29.Running tests
Tests are normally run on a clean, 'known' environment that can be easily restored (you may need to do this several times).
Typical installation scenarios are to install, re-install, de-install the product and verify the correct operation of the product in
between installations.
The integrity of the operating system and the operation of other products that reside on the system under test is also a
major consideration. If a new software installation causes other existing products to fail, users would regard this as a very
serious problem. Diagnosis of the cause is normally extremely difficult and restoration of the original configuration is often a
complicated, risky affair. Because the risk is so high, this form of regression testing must be included in the overall
installation test plan to ensure that your users are not seriously inconvenienced.
• On a 'clean' environment, install the product using the supplied media/documentation
• For each available configuration:
o are all technical components installed?
o does the installed software operate?
o do configuration options operate in accordance with the documentation?
• Can the product be reinstalled, de-installed cleanly?
10.30.Documentation testing
Documentation can be viewed as all of the material that helps users use the software. In addition to the installation guide and the user guide, it also includes online help, all of the graphical images and the information on the packaging box itself.
If it is possible for these documents to have faults, then you should consider testing them.
• Documentation can include:
o user manuals, quick reference cards
o installation guides, online help, tutorials, read me files, web site information
o packaging, sample databases, registration forms, licences, warranty, packing lists...
Does the document reflect the actual functionality of the documented system?
User documentation should reflect the product, not the requirements. Are there features present that are not documented or, worse still, documented features that are missing from the system?
Backup and recovery tests demonstrate that these processes work and can be relied upon if a major failure occurs.
The kind of scenarios and the typical way that tests are run is to perform full and partial backups and to simulate failures,
verifying that the recovery processes actually work. You also want to demonstrate that the backup is actually capturing the
latest version of the database, the application software, and so on.
• Can incremental and full system backups be performed as specified?
• Can partial and complete database backups be performed as specified?
• Can restoration from typical failure scenarios be performed and the system recovered?
10.35.Failure scenarios
A large number of scenarios are possible, but few can be tested. The tester needs to work with the technical architect to
identify the range of scenarios that should be considered for testing. Here are some examples.
• Loss of machine - restoration/recovery of entire environment from backups
• Machine crash - automatic database restoration/recovery to the point of failure
• Database roll-back to a previous position and roll-forward from a restored position
Typically you take checkpoints using reports showing specific transactions and totals of particular subsets of data as you go
along. Start by performing a full backup, then do some reports, execute a few transactions to change the content of the
database and rerun the reports to demonstrate that you have actually made those changes, followed by an incremental
backup.
Then, reinstall the system from the full backup, and verify with the reports that the data has been restored correctly. Apply the incremental backup and verify its correctness, again by rerunning the reports. This is typical of the way that tests of minor failures and recovery scenarios are done; a scripted sketch of the sequence follows the list below.
• Perform a full backup of the system
o Execute some application transactions
o Produce reports to show changes ARE present
• Perform an incremental backup
• Restore system from full backup
o Produce reports to show changes NOT present
• Restore system from partial backup
o Produce reports to show changes ARE present.
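A scripted version of this sequence (a sketch only: every command name below is a hypothetical placeholder for whatever your backup, restore and reporting facilities actually provide) makes the checkpoint-and-compare pattern repeatable:

    # Sketch of a repeatable backup/recovery test; all command names are placeholders.
    import subprocess

    def run(cmd):
        print(">", cmd)
        subprocess.run(cmd, shell=True, check=True)

    def checkpoint_report(label):
        # Hypothetical totals report used as the checkpoint evidence described above.
        run(f"run_totals_report --output report_{label}.txt")

    if __name__ == "__main__":
        run("perform_full_backup")                 # full system backup
        checkpoint_report("before_changes")
        run("apply_test_transactions")             # change the database content
        checkpoint_report("after_changes")         # reports show the changes ARE present
        run("perform_incremental_backup")
        run("restore_from_full_backup")
        checkpoint_report("after_full_restore")    # changes should NOT be present
        run("apply_incremental_backup")
        checkpoint_report("after_incremental")     # changes should be present again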
While entering transactions into the database, bring the machine down by causing (or simulating) a machine crash
You can also do more interesting tests that simulate a disruption. While entering transactions into the system, bring the
machine down - pull the plug out, do a shut-down, or simulate a machine crash. You should, of course, seek advice from the
hardware engineers of the best way to simulate these failures without causing damage to servers, disks, etc.
Reboot the machine and demonstrate by means of query or reporting, that the database has recovered the transactions
committed up to the point of failure.
The principle is again that when you reboot the system and bring it back on line, you have to conduct a recovery from the
failure. This type of testing requires you to identify components and combinations of components that could fail, and
simulate the failures of whatever could break, and then using your systems, demonstrate that you can recover from this.
11. Maintenance Testing
The majority of effort expended in the IT industry is to do with maintenance. The problem is that the textbooks don’t talk about
maintenance very much because it's often complicated and 'messy'. In the real world, systems last longer than the project that
created them. Consequently, the effort required to repair and enhance systems during their lifetime exceeds the effort spent
building them in the first place.
11.1.Maintenance considerations
11.2.Maintenance routes
Essentially, there are two ways of dealing with maintenance changes. Maintenance fixes are normally packaged into
manageable releases.
• Groups of changes are packaged into releases; for adaptive or non-urgent corrective maintenance.
• Urgent changes handled as emergency fixes; usually for corrective maintenance
It is often feasible to treat maintenance releases as abbreviated developments. Just like normal development, there are two
stages: definition and build.
11.3.Release Definition
Maintenance programmers do an awful lot of testing. Half of their work is usually figuring out what the software does and the
best way to do this is to try it out. They do a lot of investigation initially to find out how the system works. When they have
changed the system, they need to redo that testing.
Maintenance package handled like development except testing focuses on code changes and ensuring existing functionality
still works
What often slips is the regression testing unless you are in a highly disciplined environment. Unless you’ve got an
automated regression test pack, maintenance regression testing is usually limited to a minimal amount. That’s why
maintenance is risky.
If tests from the original development project exist, they can be reused for maintenance regression testing, but it's more
common for regression test projects aimed at building up automated regression test packs to have to start from scratch.
If the maintenance programmers record their tests, they can be adapted for maintenance regression tests.
Regression testing is the big effort. Regression testing dominates maintenance, as it usually takes more than half of the total maintenance effort. So, part of your maintenance budget must be to do a certain amount of regression testing and, potentially, automation of that effort as well.
Maintenance fixes are error-prone - 50% chance of introducing another fault so regression testing is key
If release is urgent and time is short, can still test after release
11.5.Emergency maintenance
You could make the change and install it, and then carry on testing it in your test environment. There's nothing stopping you from continuing to test the system once it's gone into production. In a way, this is a bit more common than it should be.
Releasing before all regression testing is complete is risky, but if testing continues, the business may not be exposed for too
long as any bugs found can be fixed and released quickly.
• Usually "do whatever is necessary"
• Installing an emergency fix is not the end of the process
• Once installed you can:
o continue testing
o include it for proper handling in the next maintenance release
12. Introduction to Testing Techniques (C & D)
12.3.1.Equivalence Partitioning
12.3.1.1.1.Equivalence partitioning
12.3.1.1.4.Output partitions
12.3.1.1.5.Hidden partitions
12.4.1.1.1.Path testing
12.4.1.1.3.Branch coverage
12.4.1.1.4.Coverage measurement
12.8.Error Guessing
12.8.2.Examples of traps
v. Types of Review
vi. Levels of review 'formality'
viii. Walkthroughs
x. Inspections
xiv. Pitfalls
g. Static Analysis
ii. Compilers
v. Definition-use examples
x. Complexity measures
Module E: Test Management
h. Organisation
We need to consider how the testing team will be organised. In small projects, it might be an individual who simply has to
organise his own work. In bigger projects, we need to establish a structure for the various roles that different people in the
team have. Establishing a test team takes time and attention in all projects.
Independent organisations may be called upon to do any of the above testing formally.
On occasions there is a need to demonstrate complete independence in testing. This is usually to comply with some
regulatory framework or perhaps there is particular concern over risks due to a lack of independence. An independent
company may be hired to plan and execute tests. In principle, third party companies and outsource companies, can do any
of the layers of testing from component through system or user acceptance testing, but it’s most usual to see them doing
system testing or contractual acceptance testing.
j. Independence
Good programmers can test their own code if they adopt the right attitude
The biggest influence on the quality of the tests is the point of view of the person designing those tests. It's very difficult for a programmer to be independent. They find it hard to eliminate their assumptions. The problem a programmer has is that subconsciously they don't want to see their software fail. Also, programmers are usually under pressure to get the job done quickly, and they are keen to write the next new bit of code, which is what they see as the interesting part of the job. These factors make it very difficult for them to construct test cases that have a good chance of detecting faults.
Of course, there are exceptions and some programmers can be good testers. However, their lack of independence is a
barrier to them being as effective as a skilled independent tester.
Buddy-checks/testing can reduce the risk of bad assumptions, cognitive dissonance etc.
A very useful thing to do is to get programmers in the same team to swap programs so that they are planning and
conducting tests on their colleague’s programs. In doing this, they bring a fresh viewpoint because they are not intimately
familiar with the program code; they are unlikely to have the same assumptions and they won’t fall into the trap of ‘seeing’
what they want to see. The other reason that this approach is successful is that programmers feel less threatened by their
colleagues than by independent testers.
Test manager
A Test Manager is really a project manager for the testing project; that is, they plan, organise, manage, and control the
testing within their part of the project.
There are a number of factors, however, that set a Test Manager apart from other IT project managers. For a start, their key
objective is to find faults and on the surface, that is in direct conflict with the overall project’s objective of getting a product
out on time. To others in the overall project, they will appear to be destructive, critical and sceptical. Also, the nature of the
testing project changes markedly when moving from early stage testing to the final stages of testing. Lastly, a test manager
needs a set of technical skills that are quite specific. The Test Manager is a key role in successful testing projects.
Test analyst
Test analysts are the people, basically, who scope out the testing and gather up the requirements for the test activities to
follow.
In many ways, they are business analysts because they have to interview users, interpret requirements, and construct tests
based on the information gained.
Test analysts should be good documenters, in that they will spend a lot of time documenting test specifications, and the
clarity with which they do this is key to the success of the tests.
The key skills for a test analyst are to be able to analyse requirements, documents, specifications and design documents,
and derive a series of test cases. The test cases must be reviewable and give confidence that the right items have been
covered.
Test analysts will spend a lot of time liaising with other members of the project team.
Finally, the test analyst is normally responsible for preparing test reports, whether they are involved in the execution of the
test or not.
Tester
What do testers do? Testers build tests. Working from specifications, they prepare test procedures or scripts, test data, and
expected results. They deal with lots of documentation and their understanding and accuracy is key to their success. As well
as test preparation, testers execute the tests and keep logs of their progress and the results. When faults are found, the
tester will retest the repaired code, usually by repeating the test that detected the failure. Often a large amount of regression
testing is necessary because of frequent or extensive code changes and the testers execute these too. If automation is well
established, a tester may be in control of executing automated scripts too.
l. Support staff
Toolsmiths to build utilities to extract data, execute tests, compare results etc.
Where automation is used extensively, a key part of any large team involves individuals known as toolsmiths, that is,
people able to write software as required. These are people who have very strong technical backgrounds; programmers,
who are there to provide utilities to help the test team. Utilities may be required to build or extract test data, to run tests, as
harnesses, drivers and to compare results.
Symptoms of poor configuration management are extremely serious because they have significant impacts on testers; most obviously on productivity, but they can be a morale issue as well because they cause a lot of wasted work.
Simultaneous changes made to same source module by multiple developers and some changes lost.
Some issues of control are caused by developers themselves, overwriting each other’s work. Here’s how it happens.
There are two changes required to the same source module. Unless we work on the changes serially, which causes a
delay, two programmers may reserve the same source code. The first programmer finishes and one set of changes is
released back into the library. Now what should happen is that when the second programmer finishes, he applies the
changes of the first programmer to his code. Faults occur when this doesn't happen! The second programmer releases
his changed code back into the same library, which then overwrites the first programmer’s enhancement of the code.
This is the usual cause of software fixes suddenly disappearing.
Configuration Management, or CM, is a sizeable discipline and takes three to five days to teach comprehensively. However,
in essence, CM is easy to describe. It is the "control and management of the resources required to construct a software
artefact".
However, although the principles might be straightforward, there is a lot to the detail. CM is a very particular process that
contributes to the management process for a project. CM is a four-part discipline described on the following slides.
In Status Accounting, all the transactions that take place within the CM system are logged, and this log can be used for
accounting and audit information within the CM library itself. This aspect of CM is for management.
Configuration Auditing is a checks and balances exercise that the CM tool itself imposes to ensure integrity of the rules,
access rights and authorisations for the reservation and replacement of code.
Configuration Control has three important aspects: the Controlled Area/Library, Problem/Defect Reporting, and Change
Control.
The Controlled Area/Library function relates to the controlled access to the components; the change, withdrawal, and
replacement of components within the library. This is the gateway that is guarded to ensure that the library is not changed in
an unauthorized way.
The second aspect of Configuration Control is problem or defect reporting. Many CM systems allow you to log incidents or
defects against components. The logs can be used to drive changes within the components in the CM system. For example,
the problem defect reporting can tell you which components are undergoing change because of an incident report. Also, for
a single component, it could tell you which incidents have been recorded against that component and what subsequent
changes have been made.
The third area of Configuration Control is Change Control itself. In principle, this is the simple act of identifying which
components are affected by a change and maintaining the control over who can withdraw and change code from the
software library. Change Control is the tracking and control of changes.
Associate a given version of a test with the appropriate version of the software to be tested
With the test references recorded beside the components, it is possible to relate the tests used to each specific version of
the software.
Ensure problem reports can identify s/w and h/w configurations accurately
If the CM system manages incident reports, it’s possible to identify the impact of change within the CM system itself. When
an incident is recorded or logged in the CM system under ‘changes made to a component’, the knock-on effects in other
areas of the software can potentially be identified through the CM system. This report will give an idea of the regression
tests that might be worth repeating.
A CM tool provides support to the project manager too. A good CM implementation helps the project manager understand
and control the changes to the requirements, and potentially, the impacts.
It allows the project members to develop code, knowing that they won’t interfere with each other’s code, as they reserve,
create, and change components within the CM system.
Programmers are frequently tempted to ‘improve’ code even if there are no faults reported; they will sometimes make
changes that haven’t been requested in writing or supported by requirements statements. These changes can cause
problems and a good CM tool makes it less likely and certainly more difficult for the developers to make unauthorised
changes to software.
The CM system also provides the detailed information on the status of the components within the library and this gives the
project manager a closer and more technical understanding of the project deliverables themselves.
Finally, the CM system ensures the traceability of software instances right back to the requirements and the code that has
been tested.
i. Test Estimation, Monitoring, and Control
In this module, we consider the essential activities required to project manage the test effort. These are estimation, monitoring
and control. The difficulty with estimation is obvious: the time taken to test is indeterminate, because it depends on the quality of
the software - poor software takes longer to test. The paradox here is that we won't know the quality of the software until we
have finished testing.
Monitoring and control of test execution is primarily concerned with the management of incidents. When a system is passed into
the system-level testing, confidence in the quality of the system is finally determined. Confidence may be proved to be well
founded or unfounded. In chaotic environments, system test execution can be traumatic because many of the assumptions of
completeness and correctness may be found wanting. Consequently, the management of system level testing demands a high
level of management commitment and effort.
The big question - "How much testing is enough?" - also arises. Just when can we be confident that we have done enough testing, if we expect that time will run out before we finish? According to the textbook, we should finish when the test completion criteria are met, but handling the pressure of squeezed timescales is the final challenge of software test management.
s. Test estimates
If testing consumes 50% of the development budget, should test planning comprise 50% of all project planning?
Test Stage          Notional Estimate
Unit                40%
Link/Integration    10%
System              40%
Acceptance          10%

Ask a test manager how long it will take to test a system and they're likely to say, 'How long is a piece of string?' To some extent, that's true, but only if you don't scope the job at all! It is possible to make reasonable estimates if the planning is done properly and the assumptions are stated clearly.
Let's start by looking at how much of the project cost is testing. Textbooks often quote that testing consumes approximately 50% of the project budget on average. This can obviously vary depending on the environment and the project. This figure assumes that test activities include reviews, inspections, document walk-throughs (project plans, design and requirements), as well as the dynamic testing of the software deliverables from components through to complete systems. It's quite clear that the amount of effort consumed by testing is very significant indeed.
If one considers that the big test effort in a project is, perhaps, half of the total effort, it's reasonable to propose that test planning, the planning and scheduling of test activities, might consume 50% of all project planning. And that's quite a serious thing to consider.
t. Problems in estimating
But if you can estimate test design, you can work out ratios.
However, you can still estimate test design, even if you cannot estimate test execution. If you can estimate test design,
there are some rules of thumb that can help you work out how long you should provisionally allow for test execution.
Don't underestimate the time taken to set up the testing environment, find data etc.
For example, if you’re running a system or acceptance test, the construction, set-up and configuration of a test environment
can be a large task. Test environments rarely get created in less than a few days and sometimes require several weeks.
v. 1 – 2 – 3 rules
The ‘1-2-3 Rule’ is useful, at least, as a starting point for estimation. The principle is to split the test activities into three
stages – specification, preparation, and execution. The ‘1-2-3 Rule’ is about the ratio of the stages.
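A small worked example helps here (assuming, purely for illustration, that the rule is read as a 1 : 2 : 3 split of effort across specification, preparation and execution; substitute whatever ratio your own strategy prescribes):

    # 1-2-3 rule sketch: derive provisional stage estimates from a specification estimate.
    # The 1:2:3 split used here is an assumption made for this illustration only.
    RATIO = {"specification": 1, "preparation": 2, "execution": 3}

    def estimate(specification_days):
        unit = specification_days / RATIO["specification"]
        return {stage: unit * weight for stage, weight in RATIO.items()}

    if __name__ == "__main__":
        for stage, days in estimate(10).items():    # e.g. 10 days of test specification
            print(f"{stage:<14} {days:5.1f} days")
        # -> specification 10.0, preparation 20.0, execution 30.0 days (provisional figures only)

The point is not the particular numbers but that, once test design has been estimated, the remaining stages can be projected from an agreed ratio rather than guessed.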
y. Monitoring progress
Incident status
If we monitor the incidents themselves, we might track incidents raised, incidents cleared, incidents ‘fixed’ and awaiting
retest and those that are still outstanding. Again, looking at ratios between those closed and those outstanding, the ratio
should improve over time.
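As a small sketch (the status names and data below are invented for illustration), these counts and the closed-to-outstanding ratio can be computed directly from the incident log:

    # Sketch of incident-status metrics; status values and sample data are illustrative.
    from collections import Counter

    incidents = [
        {"id": 1, "status": "closed"},
        {"id": 2, "status": "fixed-awaiting-retest"},
        {"id": 3, "status": "open"},
        {"id": 4, "status": "closed"},
    ]

    counts = Counter(i["status"] for i in incidents)
    closed = counts["closed"]
    outstanding = sum(n for status, n in counts.items() if status != "closed")
    print(dict(counts))
    print(f"closed : outstanding = {closed} : {outstanding}")   # this ratio should improve over time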
As we progress through the test plan, there are usually three distinct stages that we can recognise.
Early on, the number of incidents raised increases rapidly, then we reach a peak and the rate diminishes over the final
stages.
If we run out of time while the number of new incidents being logged is decreasing, it might be safe to stop testing, but we must consider some things carefully:
What is the status of the outstanding incidents? Are they severe enough to preclude acceptance? If so, we cannot stop
testing and release now.
What tests remain to be completed? If there are tests of critical functionality that remain to be done, it would be unsafe to
stop testing and release now. If we are coming towards the end of the test plan, the stakeholders and management may
take the view that testing can stop before the test plan is complete if (and only if) the outstanding tests cover functionality that is non-critical or low risk.
The job of the tester is to provide sufficient information for the stakeholders and management to make this judgement.
If you have additional tests to run, what are the risks of not running them before release?
Suppose we’re coming towards the end of the time in the plan for test execution, what is the risk of releasing the software
before we complete the test plan?
Unplanned events occurring during testing that have a bearing on the success of the test
The formal definition of an incident is an event that occurs during the testing that has a bearing on the success of the test.
This might be a concern over the quality of the software because there’s a failure in the test itself. Or it may be something
that’s outside the control of the testers, like machine crashes, or there’s a loss of the network, or maybe a lack of test
resource.
It could be...
When you run a test and the expected results do not match the actual results, it could be due to a number of reasons. The
issue here is that the tester shouldn’t jump to the conclusion that it’s a software fault.
For example, it could be something wrong with the test itself; the test script may be incorrect in the commands it expected to
appear or the expected result may have been predicted incorrectly.
Maybe there was a misinterpretation of the requirements.
It could be that the tester executing the test didn’t follow the script and made a slip in the entry of some test data and that is
what’s caused the software to behave differently than expected.
It could be that the results themselves are correct but the tester misunderstood what they saw on the screen or on a printed
report.
Another issue could be that it might be the test environment. Again, test environments are often quite fluid and changes are
being made continuously to refine their behaviour. Potentially, a change in the configuration of the software in the test
environment could cause a changed behaviour of the software under test.
Maybe the wrong version of a database was loaded or the base parameters were changed since the last test.
Finally, it could be something wrong with the baseline; that is, the document upon which the tests are being based is
incorrect. The requirement itself is wrong.
Diagnose incident
If it’s determined that there’s a real problem that can be reproduced by the tester and it’s not the tester’s fault, the incident
should be logged and classified. It will be classified, based on the information available, as to whether it is an environmental
problem, a testware problem or a problem with the software itself. It will then be assigned to the relevant team or to a
person who will own the problem, even if it is only temporarily.
Fix tester
If the tester made a slip during the testing, they should restart the script and follow it to the letter.
Fix environment
If the environment is at fault, then the system needs reconfiguring correctly, or the test data adjusting/rebuilding to restore
the environment to the required, known state. Then the test should restart.
ii. Severity
If the fault is minor, it might be deemed of low severity and users might choose to implement this software even if it still
had the fault.
Incidents get prioritised and developer resources get assigned according to priority.
Returning to the priority assigned to an incident: developer resources will be assigned according to that priority. This isn't the same as the severity. The decision that we'll have to make towards the end of the test phase is "which incidents get worked on, based on priority and also severity?"
kk. Testability
Essentially, we can think of testability as the ease by which a tester can specify, implement, execute and analyse tests of
software. This module touches on an issue that is critical to the tester.
It’s the ease by which a tester can specify tests. Namely, are the requirements in a form that you can derive test plans from
in a straightforward, systematic way?
The ease by which a tester can prepare tests. How difficult is it to construct test plans and procedures that are effective?
Can we create a relatively simple test database, simple test script?
Is it easy to run tests and understand and interpret the test results? Or when we run tests, does it take days to get to the bottom of what the results mean? Do we have to plough through mountains of data? In other words, we are talking about the ease with which we can analyse results and say pass or fail.
How difficult is it to diagnose incidents and point to the source of the fault?
Implementation (how required attributes of the test process are to be achieved e.g. tools)
The standard doesn’t make any recommendations or instructions to do with the implementation of tests. It doesn’t give you
any insight as to how the test environment might be created or what tools you might use to execute tests themselves. It’s
entirely generic in that regard.
... shall specify the techniques to be employed in the design of test cases and the rationale for their choice...
What the component-testing standard does say is that you should have a strategy for component testing. The test strategy
for components should specify the techniques you are going to employ in the design of test cases and the rationale for their
choice. So although the standard doesn’t mandate one test technique above another, it does mandate that you record the
decision that nominated the techniques that you use.
... shall specify criteria for test completion and the rationale for their choice...
The standard also mandates that within your test strategy you specify criteria for test completion. These are also often
called exit or acceptance criteria for the test stage. Again, it doesn’t mandate what these criteria are, but it does mandate
that you document the rationale for the choice of those criteria.
Degree of independence required of personnel designing test cases e.g.:
A significant issue, with regard to component testing, is the degree of independence required by your test strategy. Again,
the standard mandates that your test strategy defines the degree of independence used in the design of test cases but
doesn’t make any recommendation on how independent these individuals or the ‘test agency’ will be.
The standard does offer some possible options for deciding who does the testing. For example, you might decide that the
person who writes the component under test also writes the test cases. You might have an independent person writing the
test cases or you might have people from a different section in the company, from a different company. You might ultimately
decide that a person should not choose the test cases at all - you might employ a tool to do this.
Whether testing is done in isolation, bottom-up or top-down approaches, or some mixture of these
The first one of these is that the strategy should describe how the testing is done with regard to the component's isolation;
that is, whether the component is tested in a bottom-up or top-down method of integration or some mixture of these. The
requirement here is to document whether you’re using stubs and drivers, in addition to the components of the test, to
execute tests.
Planning starts the test process and Check for Completion ends it. These activities are carried out for the whole component.
Specification, Execution, and Recording can, on any one iteration, be carried out for a subset of the test cases associated
with a component. It is possible that later activities for one test case can occur before earlier activities for another.
Whenever a fault is corrected by making a change or changes to test materials or the component under test, the affected
activities should be repeated. The five generic test activities are briefly described:
Planning: The test plan should specify how the project component test strategy and project test plan apply to the component
under test. This includes specific identification of all exceptions to project test strategies and all software with which the
component under test will interact during test execution, such as drivers and stubs.
Specification: Test cases should be designed using the test case design techniques selected in the test planning activity.
Each test case should identify its objective, the initial state of the component, its input(s), and the expected outcome. The
objective should be described in terms of the test case design technique being used, such as the partition boundaries
exercised (a small example of such a record is sketched after the five activities below).
Execution: Test cases should be executed as described in the component test specification.
Recording: For each test case, test records should show the identities and versions of the component under test and the
test specification. The actual outcome should also be recorded. It should be possible to establish that all the specified
testing activities have been carried out by reference to the test reports. Any discrepancy between the actual outcome and
the expected outcome should be logged and analysed in order to establish where the problem lies. The earliest test activity
that should be repeated in order to remove the discrepancy should be identified. For each of the measure(s) specified as
test completion criteria in the plan, the coverage actually achieved should also be recorded.
Check for Completion: The test records should be checked against the test completion criteria. If these criteria are not met,
the earliest test activity that has to be repeated in order to meet the criteria shall be identified and the test process shall be
restarted from that point. It may be necessary to repeat the test specification activity to design further test cases to meet a
test coverage target.
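Returning to the Specification activity above, a test case record of this kind might be captured as simply as the sketch below (the field names and example values are ours for illustration, not mandated by the standard):

    # Sketch of a component test case record covering the fields the standard asks for:
    # objective, initial state, inputs and expected outcome. Names and values are illustrative.
    from dataclasses import dataclass

    @dataclass
    class ComponentTestCase:
        case_id: str
        objective: str           # described in terms of the design technique used
        initial_state: str
        inputs: dict
        expected_outcome: str

    tc = ComponentTestCase(
        case_id="TC-017",
        objective="Exercise the upper boundary of the 'quantity' partition (boundary value analysis)",
        initial_state="Empty order, customer logged in",
        inputs={"quantity": 999},
        expected_outcome="Order line accepted and total recalculated",
    )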
ww. Standard definition of Technique
The standard gives you comprehensive definitions of the techniques to be used within the testing itself.
Test measurement techniques to help users (and customers) measure the testing
The measurement techniques will help testers, and potentially customers, to measure how much testing has actually been
done.
To promote
The purpose in using these design and measurement techniques is to promote a set of consistent and repeatable test
practices within the component testing discipline. The process and techniques provide a common understanding between
developers, testers, and the customers of software of how testing has been done. This will enable an objective comparison
of testing done on various components, potentially by different suppliers.
Test design:
The test design activity is split into two, what you might call the analysis, and then the actual design of the test cases
themselves. The analysis uses a selected model of the software (control flowgraphs), or the requirements (equivalence
partitions) and the model is used to identify what are called coverage items. From the list of coverage items, test cases are
developed that will exercise (cover) each coverage item. For example, if you are using control flowgraphs as a model for the
software under test, you might use the branch-outcomes as the coverage item to derive test cases from.
Test measurement:
The same model can then be used for test measurement. If you adopt the branch coverage model and your coverage items
are the branches themselves, you can set an objective coverage target and that could be, for example, “100% branch
coverage”.
Coverage targets based on the techniques in the standard can be adopted before the code is designed or written. The
techniques are objective. You’ll certainly achieve a degree of confidence that the software has been exercised adequately,
but the test design process is repeatable in that the rule is objective. If you follow the technique and the process that uses
that technique to derive test cases then, in principle, the same test cases will be extracted from that model.
Normally, coverage targets are set at 100%, but sometimes this is impractical, perhaps because some branches in software may be unreachable except by executing obscure error conditions. Test coverage targets of less than 100% may be used in these circumstances.
kkk.Test harnesses
mmm.File comparison
ooo.Debugging
ppp.Dynamic analysis
qqq.Source coverage
sss.Incident management
xxx.CAST limitations
cccc.Pilot project
dddd.Evaluation of pilot
iiii. Documentation
kkkk.Test Case
mmmm.Acceptance testing: Formal testing conducted to enable a user, customer, or other authorized entity to determine
whether to accept a system or component.
nnnn.Actual outcome: The behaviour actually produced when the object is tested under specified conditions.
oooo.Ad hoc testing: Testing carried out using no recognised test case design technique.
pppp.Alpha testing: Simulated or actual operational testing at an in-house site not otherwise involved with the software
developers.
rrrr. Backus-Naur form: A metalanguage used to formally describe the syntax of a language.
ssss.Basic block: A sequence of one or more consecutive, executable statements containing no branches.
tttt. Basis test set: A set of test cases derived from the code logic which ensure that 100% branch coverage is achieved.
vvvv.Behaviour: The combination of input values and preconditions and the required response for a function of a system.
The full specification of a function would normally comprise one or more behaviours.
wwww.Beta testing: Operational testing at a site not otherwise involved with the software developers.
xxxx.Big-bang testing: Integration testing where no incremental testing takes place prior to all the system's components
being combined to form the system.
zzzz.Bottom-up testing: An approach to integration testing where the lowest level components are tested first, then used to
facilitate the testing of higher level components. The process is repeated until the component at the top of the hierarchy
is tested.
aaaaa.Boundary value analysis: A test case design technique for a component in which test cases are designed which
include representatives of boundary values.
bbbbb.Boundary value coverage: The percentage of boundary values of the component's equivalence classes, which have
been exercised by a test case suite.
ddddd.Boundary value: An input value or output value which is on the boundary between equivalence classes, or an
incremental distance either side of the boundary.
eeeee.Branch condition combination coverage: The percentage of combinations of all branch condition outcomes in every
decision that have been exercised by a test case suite.
fffff. Branch condition combination testing: A test case design technique in which test cases are designed to execute
combinations of branch condition outcomes.
ggggg.Branch condition coverage: The percentage of branch condition outcomes in every decision that have been
exercised by a test case suite.
hhhhh.Branch condition testing: A test case design technique in which test cases are designed to execute branch condition
outcomes.
jjjjj. Branch coverage: The percentage of branches that have been exercised by a test case suite
mmmmm.Branch testing: A test case design technique for a component in which test cases are designed to execute branch
outcomes.
ooooo.Branch: A conditional transfer of control from any statement to any other statement in a component, or an
unconditional transfer of control from any statement to any other statement in the component except the next statement,
or when a component has more than one entry point, a transfer of control to an entry point of the component.
rrrrr.Capture/playback tool: A test tool that records test input as it is sent to the software under test. The input cases stored
can then be used to reproduce the test later.
uuuuu.Cause-effect graph: A graphical representation of inputs or stimuli (causes) with their associated outputs (effects),
which can be used to design test cases.
vvvvv.Cause-effect graphing: A test case design technique in which test cases are designed by consideration of cause-effect
graphs.
wwwww.Certification: The process of confirming that a system or component complies with its specified requirements and is
acceptable for operational use.
yyyyy.Code coverage: An analysis method that determines which parts of the software have been executed (covered) by the
test case suite and which parts have not been executed and therefore may require additional attention.
zzzzz.Code-based testing: Designing tests based on objectives derived from the implementation (e.g., tests that execute
specific control flow paths or use specific data items).
aaaaaa.Compatibility testing: Testing whether the system is compatible with other systems with which it should
communicate.
eeeeee.Computation data use: A data use not in a condition. Also called C-use.
hhhhhh.Condition: A Boolean expression containing no Boolean operators. For instance, A<B is a condition but A and B is
not.
iiiiii. Conformance criterion: Some method of judging whether or not the component's action on a particular specified input
value conforms to the specification.
jjjjjj. Conformance testing: The process of testing that an implementation conforms to the specification on which it is based.
kkkkkk.Control flow graph: The diagrammatic representation of the possible alternative control flow paths through a
component.
mmmmmm.Control flow: An abstract representation of all possible sequences of events in a program's execution.
nnnnnn.Conversion testing: Testing of programs or procedures used to convert data from existing systems for use in
replacement systems.
qqqqqq.Coverage: The degree, expressed as a percentage, to which a test case suite has exercised a specified coverage
item.
ssssss.Data definition C-use coverage: The percentage of data definition C-use pairs in a component that are exercised by
a test case suite.
tttttt.Data definition C-use pair: A data definition and computation data use, where the data use uses the value defined in the
data definition.
uuuuuu.Data definition P-use coverage: The percentage of data definition P-use pairs in a component that are exercised by
a test case suite.
vvvvvv.Data definition P-use pair: A data definition and predicate data use, where the data use uses the value defined in the
data definition.
xxxxxx.Data definition-use coverage: The percentage of data definition-use pairs in a component that are exercised by a
test case suite.
yyyyyy.Data definition-use pair: A data definition and data use, where the data use uses the value defined in the data
definition.
zzzzzz.Data definition-use testing: A test case design technique for a component in which test cases are designed to
execute data definition-use pairs.
aaaaaaa.Data flow coverage: Test coverage measure based on variable usage within the code. Examples are data
definition-use coverage, data definition P-use coverage, data definition C-use coverage, etc.
bbbbbbb.Data flow testing: Testing in which test cases are designed based on variable usage within the code.
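The data flow terms above are easier to see against a concrete fragment. The Python sketch below, with its variable total, is invented for this purpose.

    # Sketch only: definitions and uses of the hypothetical variable "total".
    def apply_discount(price, threshold):
        total = price              # data definition of total
        if total > threshold:      # predicate data use (P-use): total appears in a condition
            total = total * 0.9    # computation data use (C-use), followed by a new definition
        return total               # C-use: the value is used outside any condition

The definition on the first line together with the P-use in the if condition forms one data definition P-use pair; data definition-use testing would choose inputs that cause each such pair to be executed.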
ddddddd.Debugging: The process of finding and removing the causes of failures in software.
fffffff.Decision coverage: The percentage of decision outcomes that have been exercised by a test case suite.
ggggggg.Decision outcome: The result of a decision (which therefore determines the control flow alternative taken).
hhhhhhh.Decision: A program point at which the control flow has two or more alternative routes.
iiiiiii.Design-based testing: Designing tests based on objectives derived from the architectural or detail design of the software
(e.g., tests that execute specific invocation paths or probe the worst case behaviour of algorithms).
jjjjjjj.Desk checking: The testing of software by the manual simulation of its execution.
ooooooo.Dynamic analysis: The process of evaluating a system or component based upon its behaviour during execution.
ppppppp.Emulator: A device, computer program, or system that accepts the same inputs and produces the same outputs as
a given system.
sssssss.Equivalence partition coverage: The percentage of equivalence classes generated for the component, which have
been exercised by a test case suite.
ttttttt.Equivalence partition testing: A test case design technique for a component in which test cases are designed to
execute representatives from equivalence classes.
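For example, if a component's valid input range is assumed to be 1 to 100, equivalence partition testing might divide the inputs into three classes and take one representative from each; the values in this Python sketch are invented for the illustration.

    # Sketch only: one representative per equivalence class for an assumed valid range of 1..100.
    partitions = {
        "invalid (below range)": -5,
        "valid (1..100)":        50,
        "invalid (above range)": 250,
    }
    for name, representative in partitions.items():
        print(name, representative)   # each class is assumed to be processed alike by the component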
vvvvvvv.Error guessing: A test case design technique where the experience of the tester is used to postulate what faults
might occur, and to design tests specifically to expose them.
wwwwwww.Error seeding: The process of intentionally adding known faults to those already in a computer program for the
purpose of monitoring the rate of detection and removal, and estimating the number of faults remaining in the program.
yyyyyyy.Executable statement: A statement which, when compiled, is translated into object code, which will be executed
procedurally when the program is running and may perform an action on program data.
zzzzzzz.Exercised: A program element is exercised by a test case when the input value causes the execution of that
element, such as a statement, branch, or other structural element.
aaaaaaaa.Exhaustive testing: A test case design technique in which the test case suite comprises all combinations of input
values and preconditions for component variables.
eeeeeeee.Failure: Deviation of the software from its expected delivery or service. [Fenton]
gggggggg.Feasible path: A path for which there exists a set of input values and execution conditions which causes it to be
executed.
iiiiiiii.Functional specification: The document that describes in detail the characteristics of the product with regard to its
intended capability.
jjjjjjjj.Functional test case design: Test case selection that is based on an analysis of the specification of the component
without reference to its internal workings.
llllllll.Incremental testing: Integration testing where system components are integrated into the system one at a time until the
entire system is integrated.
nnnnnnnn.Infeasible path: A path, which cannot be exercised by any set of possible input values.
qqqqqqqq.Input: A variable (whether stored within a component or outside it) that is read by the component.
rrrrrrrr.Inspection: A group review quality improvement process for written material. It consists of two aspects: product
(document itself) improvement and process improvement (of both document production and inspection). [after Graham]
ssssssss.Installability testing: Testing concerned with the installation procedures for the system.
tttttttt.Instrumentation: The insertion of additional code into the program in order to collect information about program
behaviour during program execution.
vvvvvvvv.Integration testing: Testing performed to expose faults in the interfaces and in the interaction between integrated
components.
xxxxxxxx.Interface testing: Integration testing where the interfaces between system components are tested.
yyyyyyyy.Isolation testing: Component testing of individual components in isolation from surrounding components, with
surrounding components being simulated by stubs.
zzzzzzzz.LCSAJ coverage: The percentage of LCSAJs of a component that are exercised by a test case suite.
aaaaaaaaa.LCSAJ testing: A test case design technique for a component in which test cases are designed to execute
LCSAJs.
bbbbbbbbb.LCSAJ: A Linear Code Sequence And Jump, consisting of the following three items (conventionally identified by
line numbers in a source code listing): the start of the linear sequence of executable statements, the end of the linear
sequence, and the target line to which control flow is transferred at the end of the linear sequence.
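A small, invented fragment may help; the line numbers in the comments refer only to this Python sketch.

    def countdown(n):        # line 1
        while n > 0:         # line 2: decision; control may jump past the loop body
            n = n - 1        # line 3
        return "done"        # line 4

    # One LCSAJ in this fragment is (start = line 1, end = line 2, target = line 4):
    # a linear sequence from entry to the decision, followed by the jump taken when
    # n > 0 is false on entry (exercised, for example, by the test input n = 0).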
eeeeeeeee.Maintainability testing: Testing whether the system meets its specified objectives for maintainability.
fffffffff.Modified condition/decision coverage: The percentage of all branch condition outcomes that independently affect a
decision outcome that have been exercised by a test case suite.
ggggggggg.Modified condition/decision testing: A test case design technique in which test cases are designed to execute
branch condition outcomes that independently affect a decision outcome.
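A hypothetical two-condition decision shows what "independently affect" means in practice; the function ready is invented for this sketch.

    # Sketch only: a decision with two branch conditions, a and b.
    def ready(a, b):
        return a and b     # the decision outcome depends on both conditions

    # Three test cases give modified condition/decision coverage here:
    #   (True,  True ) -> True    baseline
    #   (False, True ) -> False   only a changed and the outcome changed: a affects it independently
    #   (True,  False) -> False   only b changed and the outcome changed: b affects it independently
    for a, b in [(True, True), (False, True), (True, False)]:
        print(a, b, ready(a, b))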
iiiiiiiii.Mutation analysis: A method to determine test case suite thoroughness by measuring the extent to which a test case
suite can discriminate the program from slight variants (mutants) of the program. See also error seeding.
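The following Python sketch, invented for this entry, shows a single mutant and how a test case suite's thoroughness is judged by whether it can distinguish ("kill") that mutant.

    # Sketch only: mutation analysis judges a suite by the mutants it can distinguish.
    def is_adult(age):          # original program
        return age >= 18

    def is_adult_mutant(age):   # mutant: ">=" replaced by ">"
        return age > 18

    weak_suite = [10, 30]
    print(any(is_adult(a) != is_adult_mutant(a) for a in weak_suite))       # False: mutant survives

    stronger_suite = [10, 18, 30]                                           # adds the boundary input
    print(any(is_adult(a) != is_adult_mutant(a) for a in stronger_suite))   # True: mutant killed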
kkkkkkkkk.Non-functional requirements testing: Testing of those requirements that do not relate to functionality, e.g.
performance, usability, etc.
lllllllll.N-switch coverage: The percentage of sequences of N-transitions that have been exercised by a test case suite.
mmmmmmmmm.N-switch testing: A form of state transition testing in which test cases are designed to execute all valid
sequences of N-transitions.
ooooooooo.Operational testing: Testing conducted to evaluate a system or component in its operational environment.
ppppppppp.Oracle: A mechanism to produce the predicted outcomes to compare with the actual outcomes of the software
under test.
qqqqqqqqq.Outcome: Actual outcome or predicted outcome. This is the outcome of a test. See also branch outcome,
condition outcome, and decision outcome.
ttttttttt.Output: A variable (whether stored within a component or outside it) that is written to by the component.
wwwwwwwww.Path sensitising: Choosing a set of input values to force the execution of a component to take a given path.
xxxxxxxxx.Path testing: A test case design technique in which test cases are designed to execute paths of a component.
yyyyyyyyy.Path: A sequence of executable statements of a component, from an entry point to an exit point.
zzzzzzzzz.Performance testing: Testing conducted to evaluate the compliance of a system or component with specified
performance requirements.
aaaaaaaaaa.Portability testing: Testing aimed at demonstrating the software can be ported to specified hardware or
software platforms.
bbbbbbbbbb.Precondition: Environmental and state conditions, which must be fulfilled before the component can be
executed with a particular input value.
dddddddddd.Predicate: A logical expression, which evaluates to TRUE or FALSE, normally to direct the execution path in
code.
eeeeeeeeee.Predicted outcome: The behaviour predicted by the specification of an object under specified conditions.
gggggggggg.Progressive testing: Testing of new features after regression testing of previous features.
hhhhhhhhhh.Pseudo-random: A series which appears to be random but is in fact generated according to some prearranged
sequence.
jjjjjjjjjj.Recovery testing: Testing aimed at verifying the system's ability to recover from varying degrees of failure.
kkkkkkkkkk.Regression testing: Retesting of a previously tested program following modification to ensure that faults have
not been introduced or uncovered as a result of the changes made.
llllllllll.Requirements-based testing: Designing tests based on objectives derived from requirements for the software
component (e.g., tests that exercise specific functions or probe the non-functional constraints such as performance or
security). See functional test case design.
nnnnnnnnnn.Review: A process or meeting during which a work product, or set of work products, is presented to project
personnel, managers, users or other interested parties for comment or approval. [IEEE]
oooooooooo.Security testing: Testing whether the system meets its specified security objectives.
qqqqqqqqqq.Simple subpath: A subpath of the control flow graph in which no program part is executed more than
necessary.
rrrrrrrrrr.Simulation: The representation of selected behavioural characteristics of one physical or abstract system by another
system. [ISO 2382/1].
ssssssssss.Simulator: A device, computer program, or system used during software verification, which behaves or operates
like a given system when provided with a set of controlled inputs.
uuuuuuuuuu.Specification: A description of a component's function in terms of its output values for specified input values
under specified preconditions.
yyyyyyyyyy.Statement coverage: The percentage of executable statements in a component that have been exercised by a
test case suite.
zzzzzzzzzz.Statement testing: A test case design technique for a component in which test cases are designed to execute
statements.
aaaaaaaaaaa.Statement: An entity in a programming language, which is typically the smallest indivisible unit of execution.
bbbbbbbbbbb.Static analysis: Analysis of a program carried out without executing the program.
eeeeeeeeeee.Statistical testing: A test case design technique in which a model of the statistical distribution of the input is
used to construct representative test cases.
fffffffffff.Storage testing: Testing whether the system meets its specified storage objectives.
ggggggggggg.Stress testing: Testing conducted to evaluate a system or component at or beyond the limits of its specified
requirements.
hhhhhhhhhhh.Structural coverage: Coverage measures based on the internal structure of the component.
iiiiiiiiiii.Structural test case design: Test case selection that is based on an analysis of the internal structure of the
component.
kkkkkkkkkkk.Structured basis testing: A test case design technique in which test cases are derived from the code logic to
achieve 100% branch coverage.
ppppppppppp.Symbolic execution: A static analysis technique that derives a symbolic expression for program paths.
qqqqqqqqqqq.Syntax testing: A test case design technique for a component or system in which test case design is based
upon the syntax of the input.
rrrrrrrrrrr.System testing: The process of testing an integrated system to verify that it meets specified requirements.
ttttttttttt.Test automation: The use of software to control the execution of tests, the comparison of actual outcomes to
predicted outcomes, the setting up of test preconditions, and other test control and test reporting functions.
uuuuuuuuuuu.Test case design technique: A method used to derive or select test cases.
vvvvvvvvvvv.Test case suite: A collection of one or more test cases for the software under test.
wwwwwwwwwww.Test case: A set of inputs, execution preconditions, and expected outcomes developed for a particular
objective, such as to exercise a particular program path or to verify compliance with a specific requirement.
xxxxxxxxxxx.Test comparator: A test tool that compares the actual outputs produced by the software under test with the
expected outputs for that test case.
yyyyyyyyyyy.Test completion criterion: A criterion for determining when planned testing is complete, defined in terms of a
test measurement technique.
aaaaaaaaaaaa.Test driver: A program or test tool used to execute software against a test case suite.
bbbbbbbbbbbb.Test environment: A description of the hardware and software environment in which the tests will be run, and
any other software with which the software under test interacts when under test including stubs and test drivers.
cccccccccccc.Test execution technique: The method used to perform the actual test execution, e.g. manual,
capture/playback tool, etc.
dddddddddddd.Test execution: The processing of a test case suite by the software under test, producing an outcome.
eeeeeeeeeeee.Test Generator: A program that generates test cases in accordance with a specified strategy or heuristic.
ffffffffffff.Test Harness: A testing tool that comprises a test driver and a test comparator.
iiiiiiiiiiii.Test Plan: A record of the test planning process detailing the degree of tester independence, the test environment,
the test case design techniques and test measurement techniques to be used, and the rationale for their choice.
jjjjjjjjjjjj.Test Procedure: A document providing detailed instructions for the execution of one or more test cases.
kkkkkkkkkkkk.Test Records: For each test, an unambiguous record of the identities and versions of the component under
test, the test specification, and actual outcome.
llllllllllll.Test Script: Commonly used to refer to the automated test procedure used with a test harness.
mmmmmmmmmmmm.Test Specification: For each test case, the coverage item, and the initial state of the software
under test, the input, and the predicted outcome.
oooooooooooo.Testing: The process of exercising software to verify that it satisfies specified requirements and to detect
errors.
pppppppppppp.Thread Testing: A variation of top-down testing where the progressive integration of components follows
the implementation of subsets of the requirements, as opposed to the integration of components by successively lower
levels.
qqqqqqqqqqqq.Top-Down Testing: An approach to integration testing where the component at the top of the component
hierarchy is tested first, with lower level components being simulated by stubs. Tested components are then used to test
lower level components. The process is repeated until the lowest level components have been tested.
ssssssssssss.Usability Testing: Testing the ease with which users can learn and use a product.
tttttttttttt.Validation: Determination of the correctness of the products of software development with respect to the user
needs and requirements.
uuuuuuuuuuuu.Verification: The process of evaluating a system or component to determine whether the products of the
given development phase satisfy the conditions imposed at the start of that phase.
vvvvvvvvvvvv.Volume Testing: Testing where the system is subjected to large volumes of data.
wwwwwwwwwwww.Walkthrough: A review of requirements, designs, or code characterized by the author of the object
under review guiding the progression of the review.