
1. What is black box/white box testing?

Black-box and white-box are test design methods. Black-box test design treats the system as a
“black-box”, so it doesn’t explicitly use knowledge of the internal structure. Black-box test design is
usually described as focusing on testing functional requirements. Synonyms for black-box include:
behavioral, functional, opaque-box, and closed-box. White-box test design allows one to peek inside
the “box”, and it focuses specifically on using internal knowledge of the software to guide the
selection of test data. Synonyms for white-box include: structural, glass-box and clear-box.

While black-box and white-box are terms that are still in popular use, many people prefer the terms
"behavioral" and "structural". Behavioral test design is slightly different from black-box test design
because the use of internal knowledge isn't strictly forbidden, but it's still discouraged. In practice,
relying on a single test design method hasn't proven useful. Testers have to use a mixture of methods
so that they aren't hindered by the limitations of any one of them. Some call this "gray-box" or
"translucent-box" test design, but others wish we'd stop talking about boxes altogether.

It is important to understand that these methods are used during the test design phase, and their
influence is hard to see in the tests once they're implemented. Note that any level of testing (unit
testing, system testing, etc.) can use any test design methods. Unit testing is usually associated
with structural test design, but this is because testers usually don't have well-defined requirements
at the unit level to validate.
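
To make the distinction concrete, here is a minimal Python sketch built around a hypothetical shipping_fee function and a made-up specification (neither comes from this document). The black-box cases are derived from the specification alone, using values on and around the 2 kg limit. The white-box cases are chosen by reading the code, so that both outcomes of the if are taken and math.ceil is seen both leaving a whole number alone and rounding up. Once written, the two sets look like ordinary tests, which is why the design method is hard to see in implemented tests.

```python
import math
from typing import List, Tuple

def shipping_fee(weight_kg: float) -> float:
    """Hypothetical spec: parcels up to 2 kg cost 5.0; each started kg above 2 kg adds 1.5."""
    if weight_kg <= 2:
        return 5.0
    extra_kg = math.ceil(weight_kg - 2)  # a partial kilogram counts as a whole one
    return 5.0 + 1.5 * extra_kg

# Black-box cases: derived from the spec alone, clustered around the 2 kg limit
# plus one value well inside each side of it.
BLACK_BOX_CASES: List[Tuple[float, float]] = [(1.0, 5.0), (2.0, 5.0), (2.1, 6.5), (4.0, 8.0)]

# White-box cases: chosen by reading the code, so both outcomes of the `if` are taken
# and math.ceil is seen both leaving a whole number alone (3.0) and rounding up (3.01).
WHITE_BOX_CASES: List[Tuple[float, float]] = [(0.5, 5.0), (3.0, 6.5), (3.01, 8.0)]

def run(cases: List[Tuple[float, float]]) -> bool:
    """Return True if every (input, expected) pair matches."""
    return all(shipping_fee(weight) == expected for weight, expected in cases)

if __name__ == "__main__":
    print("black-box:", run(BLACK_BOX_CASES), "white-box:", run(WHITE_BOX_CASES))
```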

2. What are unit, component and integration testing?

Note that the definitions of unit, component, integration, and integration testing are recursive:

Unit: the smallest compilable component. A unit typically is the work of one programmer (at least
in principle). As defined, it does not include any called sub-components (for procedural languages)
or communicating components in general.

Unit testing: in unit testing, called components (or communicating components) are replaced with
stubs, simulators, or trusted components. Calling components are replaced with drivers or trusted
super-components. The unit is tested in isolation.

Component: a unit is a component. The integration of one or more components is a component.

Note: The reason for "one or more" as contrasted with "two or more" is to allow for components that
call themselves recursively.

Component testing: same as unit testing except that all stubs and simulators are replaced with the
real thing.
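
A minimal Python sketch of the difference, assuming a hypothetical unit order_total (the caller) that calls a hypothetical price lookup (the called component) through a function parameter. In the unit test the called component is replaced by a stub that returns canned prices, and the test function itself acts as the driver; the component test has the same shape but uses the real lookup. All names and prices are invented for illustration.

```python
from typing import Callable, Dict, List

PriceLookup = Callable[[str], float]

# Called component: a hypothetical "real" price catalogue.
PRICE_TABLE: Dict[str, float] = {"apple": 0.5, "pear": 0.75}

def real_price_lookup(sku: str) -> float:
    return PRICE_TABLE[sku]

# The unit: calls the lookup through an injected function so it can be swapped out.
def order_total(skus: List[str], price_lookup: PriceLookup) -> float:
    return sum(price_lookup(sku) for sku in skus)

# Stub replacing the called component during unit testing: canned answers only.
def stub_price_lookup(sku: str) -> float:
    return {"x": 1.0, "y": 2.0}[sku]

# Drivers: each test function plays the role of the unit's caller.
def unit_test_order_total() -> None:
    assert order_total(["x", "y", "y"], stub_price_lookup) == 5.0

def component_test_order_total() -> None:
    # Same test shape, but the stub has been replaced with the real thing.
    assert order_total(["apple", "pear"], real_price_lookup) == 1.25

if __name__ == "__main__":
    unit_test_order_total()
    component_test_order_total()
    print("unit and component tests passed")
```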

Two components (actually one or more) are said to be integrated when:

a. They have been compiled, linked, and loaded together.

b. They have successfully passed the integration tests at the interface between them.

Thus, components A and B are integrated to create a new, larger component (A,B). Note that this
does not conflict with the idea of incremental integration; it just means that A is a big component
and B, the component added, is a small one.

Integration testing: carrying out integration tests.

Integration tests (after Leung and White) are defined below for procedural languages. The definitions
are easily generalized to OO languages by using the equivalent constructs for message passing. In the
following, the word "call" is to be understood in the most general sense of a data flow and is not
restricted to just formal subroutine calls and returns – for example, passage of data through global
data structures and/or the use of pointers.
Let A and B be two components in which A calls B.
Let Ta be the component-level tests of A.
Let Tb be the component-level tests of B.
Let Tab be the tests in A's suite that cause A to call B.
Let Tbsa be the tests in B's suite for which it is possible to sensitize A (the inputs
are to A, not B).
Then Tbsa + Tab = the integration test suite (+ = union).

Note: "sensitize" is a technical term. To sensitize a path means to choose inputs that will cause a
routine to go down that specified path. The inputs are to A. Not every input to A will cause A to
traverse a path in which B is called. Tbsa is the set of tests which do cause A to follow a path in
which B is called. The outcome of the test of B may or may not be affected.
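
The definitions above can be sketched in Python with two toy components invented for illustration: a passes its input to b only when the input is positive. Tab is found by running each of A's tests with a recording wrapper around B (the helper sensitizes_call is hypothetical). Because this particular A forwards its input unchanged, a test input of B can be delivered by driving A with the same value, so Tbsa is the subset of B's tests that sensitize the A-to-B path, expressed, as the definition requires, as inputs to A. The test values are arbitrary.

```python
from typing import Callable, List, Set

def b(x: int) -> int:
    """Component B (hypothetical): the called component."""
    return x * 2

def a(x: int, callee: Callable[[int], int] = b) -> int:
    """Component A (hypothetical): calls B only on the path where x > 0."""
    if x > 0:
        return callee(x) + 1
    return 0

def sensitizes_call(a_input: int) -> bool:
    """Run A with a recording wrapper around B and report whether B was called."""
    seen: List[int] = []

    def recording_b(x: int) -> int:
        seen.append(x)
        return b(x)

    a(a_input, recording_b)
    return bool(seen)

Ta = [-1, 0, 3, 7]  # component-level tests of A (inputs to A)
Tb = [-2, 0, 4]     # component-level tests of B (inputs to B)

# Tab: tests in A's suite that cause A to call B.
Tab: Set[int] = {x for x in Ta if sensitizes_call(x)}

# Tbsa: tests in B's suite that can be delivered through A. This A forwards its
# input unchanged, so B's test input t is reached by driving A with t, provided
# that input sensitizes the path through the call. Expressed as inputs to A.
Tbsa: Set[int] = {t for t in Tb if sensitizes_call(t)}

integration_suite = Tab | Tbsa  # the "+" (union) from the definition

if __name__ == "__main__":
    print(sorted(integration_suite))
```

Here Tab is {3, 7}, Tbsa is {4} (B's tests with inputs -2 and 0 cannot be reached through A), and the integration suite is {3, 4, 7}.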

There have been variations on these definitions, but the key point is that it is pretty darn formal and
there's a goodly hunk of testing theory, especially as concerns integration testing, OO testing, and
regression testing, based on them.

As to the difference between integration testing and system testing: system testing specifically goes
after behaviors and bugs that are properties of the entire system as distinct from properties
attributable to components (unless, of course, the component in question is the entire system).
Examples of system testing issues: resource loss bugs, throughput bugs, performance, security,
recovery, and transaction synchronization bugs (often misnamed "timing bugs").

3. What's the difference between load and stress testing?

One of the most common, but unfortunate, misuses of terminology is treating “load testing” and
“stress testing” as synonymous. The consequence of this ignorant semantic abuse is usually that
the system is neither properly “load tested” nor subjected to a meaningful stress test.

Stress testing is subjecting a system to an unreasonable load while denying it the resources (e.g.,
RAM, disk, MIPS, interrupts, etc.) needed to process that load. The idea is to stress a system to the
breaking point in order to find bugs that will make that break potentially harmful. The system is not
expected to process the overload without adequate resources, but to behave (e.g., fail) in a decent
manner (e.g., not corrupting or losing data). Bugs and failure modes discovered under stress testing
may or may not be repaired depending on the application, the failure mode, consequences, etc. The
load (incoming transaction stream) in stress testing is often deliberately distorted so as to force the
system into resource depletion.

Load testing is subjecting a system to a (usually) statistically representative load. The two main
reasons for using such loads are to support software reliability testing and performance testing.
The term "load testing" by itself is too vague and imprecise to warrant use. For example, do you
mean "representative load," "overload," "high load," etc.? In performance testing, load is
varied from a minimum (zero) to the maximum level the system can sustain without running out of
resources or having transactions suffer (application-specific) excessive delay.

A third use of the term is as a test whose objective is to determine the maximum sustainable load
the system can handle. In this usage, "load testing" is merely testing at the highest transaction
arrival rate in performance testing.
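
The third usage amounts to stepping the arrival rate up, as in performance testing, until the delay limit is exceeded. The Python sketch below assumes a measure callback that, in a real test, would drive the system at a given rate and return the observed mean response time. Here an M/M/1 queueing formula, T = 1/(mu - lambda), with an assumed service rate of 100 transactions per second, stands in for that measurement. The 0.1 s delay limit and the step size are likewise illustrative assumptions.

```python
from typing import Callable, Optional

SERVICE_RATE = 100.0  # assumed transactions per second the modeled server can process

def mm1_response_time(arrival_rate: float, service_rate: float = SERVICE_RATE) -> float:
    """Mean response time of an M/M/1 queue, T = 1 / (mu - lambda); infinite at saturation.

    Stands in for a real measurement in this sketch.
    """
    if arrival_rate >= service_rate:
        return float("inf")
    return 1.0 / (service_rate - arrival_rate)

def max_sustainable_load(
    measure: Callable[[int], float], max_delay: float, step: int, limit: int
) -> Optional[int]:
    """Step the arrival rate up from zero and return the highest rate whose
    measured delay stays within max_delay (None if even zero load fails)."""
    best: Optional[int] = None
    for rate in range(0, limit + 1, step):
        if measure(rate) > max_delay:
            break  # excessive delay: the previous step was the maximum sustainable load
        best = rate
    return best

if __name__ == "__main__":
    print(max_sustainable_load(mm1_response_time, max_delay=0.1, step=5, limit=200))
```

With these assumptions the search returns 90 transactions per second: at 90 the modeled delay is exactly 1/10 s, and at 95 it doubles to 0.2 s. A stress test, by contrast, would push well past this point while also withholding resources, and judge how decently the system fails.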

4. What's the difference between QA and testing?

QA is more a preventive thing: ensuring quality in the company, and therefore in the product, rather
than just testing the product for software bugs.

TESTING means "quality control".

QUALITY CONTROL measures the quality of a product.
QUALITY ASSURANCE measures the quality of processes used to create a
quality product.

5. What is the best tester to developer ratio?

Reported tester:developer ratios range from 10:1 to 1:10.

There's no simple answer. It depends on many things: the amount of reused code, the number and type
of interfaces, the platform, quality goals, etc.

It can also depend on the development model: the more specs, the fewer testers. The roles can play
a big part too. Does QA own beta? Do you include process auditors or planning activities?

These figures can all vary very widely depending on how you define "tester" and "developer". In
some organizations, a "tester" is anyone who happens to be testing software at the time -- such as
their own. In other organizations, a "tester" is only a member of an independent test group.

It is better to ask about the test labor content than about the tester/developer ratio. When people
do honest accounting, the test labor content across most applications is generally accepted to be
50%. For life-critical software, this can go up to 80%.

6. What is Software Quality Assurance?

Software QA involves the entire software development PROCESS - monitoring and improving the
process, making sure that any agreed-upon standards and procedures are followed, and ensuring
that problems are found and dealt with. It is oriented to 'prevention'.

7. What is Software Testing?

Testing involves operating a system or application under controlled conditions and evaluating the
results (e.g., 'if the user is in interface A of the application while using hardware B, and does C, then D
should happen'). The controlled conditions should include both normal and abnormal conditions.
Testing should intentionally attempt to make things go wrong to determine if things happen when
they shouldn't or things don't happen when they should. It is oriented to 'detection'.
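
As a small illustration, the Python sketch below tests a hypothetical Account class under one normal condition and several abnormal ones. The abnormal checks cover both halves of the goal stated above: the bad withdrawal must be rejected (something that shouldn't happen doesn't), and the balance must be left unchanged afterwards. The class, its rules, and the helper expect_rejected are invented for the example.

```python
class Account:
    """Hypothetical account used only to illustrate normal vs. abnormal test conditions."""

    def __init__(self, balance: int) -> None:
        self.balance = balance

    def withdraw(self, amount: int) -> None:
        if amount <= 0:
            raise ValueError("amount must be positive")
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount

def expect_rejected(account: Account, amount: int) -> bool:
    """Abnormal-condition check: the withdrawal must fail AND leave the balance untouched."""
    before = account.balance
    try:
        account.withdraw(amount)
    except ValueError:
        return account.balance == before
    return False  # something happened that shouldn't have

if __name__ == "__main__":
    normal = Account(100)
    normal.withdraw(30)
    print(normal.balance == 70)  # normal condition: the expected thing happens
    print(all(expect_rejected(Account(100), amt) for amt in (150, 0, -5)))
```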

Organizations vary considerably in how they assign responsibility for QA and testing. Sometimes
they're the combined responsibility of one group or individual. Also common are project teams that
include a mix of testers and developers who work closely together, with overall QA processes
monitored by project managers. It will depend on what best fits an organization's size and business
structure.

8. What are some recent major computer system failures caused by Software bugs?
• In March of 2002 it was reported that software bugs in Britain's national tax system
resulted in more than 100,000 erroneous tax overcharges. The problem was partly
attributed to the difficulty of testing the integration of multiple systems.

• A newspaper columnist reported in July 2001 that a serious flaw was found in off-the-shelf
software that had long been used in systems for tracking certain U.S. nuclear materials.
The same software had been recently donated to another country to be used in tracking
their own nuclear materials, and it was not until scientists in that country discovered the
problem, and shared the information, that U.S. officials became aware of the problems.

• According to newspaper stories in mid-2001, a major systems development contractor
was fired and sued over problems with a large retirement plan management system.
According to the reports, the client claimed that system deliveries were late, the software
had excessive defects, and it caused other systems to crash.

• In January of 2001 newspapers reported that a major European railroad was hit by the
aftereffects of the Y2K bug. The company found that many of their newer trains would not
run due to their inability to recognize the date '31/12/2000'; the trains were started by
altering the control system's date settings.

• News reports in September of 2000 told of a software vendor settling a lawsuit with a
large mortgage lender; the vendor had reportedly delivered an online mortgage
processing system that did not meet specifications, was delivered late, and didn't work.

• In early 2000, major problems were reported with a new computer system in a large
suburban U.S. public school district with 100,000+ students; problems included 10,000
erroneous report cards and students left stranded by failed class registration systems; the
district's CIO was fired. The school district decided to reinstate its original 25-year-old
system for at least a year until the bugs were worked out of the new system by the
software vendors.

• In October of 1999 the $125 million NASA Mars Climate Orbiter spacecraft was believed to
be lost in space due to a simple data conversion error. It was determined that spacecraft
software used certain data in English units that should have been in metric units. Among
other tasks, the orbiter was to serve as a communications relay for the Mars Polar Lander
mission, which failed for unknown reasons in December 1999. Several investigating panels
were convened to determine the process failures that allowed the error to go undetected.

• Bugs in software supporting a large commercial high-speed data network affected 70,000
business customers over a period of 8 days in August of 1999. Among those affected was
the electronic trading system of the largest U.S. futures exchange, which was shut down
for most of a week as a result of the outages.

• In April of 1999 a software bug caused the failure of a $1.2 billion military satellite launch,
the costliest unmanned accident in the history of Cape Canaveral launches. The failure
was the latest in a string of launch failures, triggering a complete military and industry
review of U.S. space launch programs, including software integration and testing
processes. Congressional oversight hearings were requested.

• A small town in Illinois received an unusually large monthly electric bill of $7 million in
March of 1999. This was about 700 times larger than its normal bill. It turned out to be
due to bugs in new software that had been purchased by the local power company to deal
with Y2K software issues.

• In early 1999 a major computer game company recalled all copies of a popular new
product due to software problems. The company made a public apology for releasing a
product before it was ready.

• The computer system of a major online U.S. stock trading service failed during trading
hours several times over a period of days in February of 1999 according to nationwide
news reports. The problem was reportedly due to bugs in a software upgrade intended to
speed online trade confirmations.

• In April of 1998 a major U.S. data communications network failed for 24 hours, crippling a
large part of some U.S. credit card transaction authorization systems as well as other
large U.S. bank, retail, and government data systems. The cause was eventually traced to
a software bug.

• January 1998 news reports told of software problems at a major U.S. telecommunications
company that resulted in no charges for long distance calls for a month for 400,000
customers. The problem went undetected until customers called up with questions about
their bills.

• In November of 1997 the stock of a major health industry company dropped 60% due to
reports of failures in computer billing systems, problems with a large database conversion,
and inadequate software testing. It was reported that more than $100,000,000 in
receivables had to be written off and that multi-million dollar fines were levied on the
company by government agencies.

• A retail store chain filed suit in August of 1997 against a transaction processing system
vendor (not a credit card company) due to the software's inability to handle credit cards
with year 2000 expiration dates.

• In August of 1997 one of the leading consumer credit reporting companies reportedly shut
down their new public web site after less than two days of operation due to software
problems. The new site allowed web site visitors instant access, for a small fee, to their
personal credit reports. However, a number of initial users ended up viewing each other's
reports instead of their own, resulting in irate customers and nationwide publicity. The
problem was attributed to "...unexpectedly high demand from consumers and faulty
software that routed the files to the wrong computers."

• In November of 1996, newspapers reported that software bugs caused the 411 telephone
information system of one of the U.S. RBOCs to fail for most of a day. Most of the 2000
operators had to search through phone books instead of using their 13,000,000-listing
database. The bugs were introduced by new software modifications and the problem
software had been installed on both the production and backup systems. A spokesman for
the software vendor reportedly stated that 'It had nothing to do with the integrity of the
software. It was human error.'

• On June 4, 1996 the first flight of the European Space Agency's new Ariane 5 rocket failed
shortly after launching, resulting in an estimated uninsured loss of a half billion dollars. It
was reportedly due to the lack of exception handling of a floating-point error in a
conversion from a 64-bit floating-point number to a 16-bit signed integer.

• Software bugs caused the bank accounts of 823 customers of a major U.S. bank to be
credited with $924,844,208.32 each in May of 1996, according to newspaper reports. The
American Bankers Association claimed it was the largest such error in banking history. A
bank spokesman said the programming errors were corrected and all funds were
recovered.

• Software bugs in a Soviet early-warning monitoring system nearly brought on nuclear war
in 1983, according to news reports in early 1999. The software was supposed to filter out
false missile detections caused by Soviet satellites picking up sunlight reflections off
cloud-tops, but failed to do so. Disaster was averted when a Soviet commander, based on
what he said was a '...funny feeling in my gut', decided the apparent missile attack was a
false alarm. The filtering software code was rewritten.

9. Why is it often hard for management to get serious about quality assurance?

Solving problems is a high-visibility process; preventing problems is low-visibility. This is illustrated
by an old parable:
In ancient China there was a family of healers, one of whom was known throughout the land and
employed as a physician to a great lord. The physician was asked which of his family was the most
skillful healer. He replied,
"I tend to the sick and dying with drastic and dramatic treatments, and on occasion someone is
cured and my name gets out among the lords."
"My elder brother cures sickness when it just begins to take root, and his skills are known among the
local peasants and neighbors."
"My eldest brother is able to sense the spirit of sickness and eradicate it before it takes form. His
name is unknown outside our home."

10. Why does Software have bugs?

• Miscommunication or no communication - as to specifics of what an application should or
shouldn't do (the application's requirements).

• Software complexity - the complexity of current software applications can be difficult to
comprehend for anyone without experience in modern-day software development.
Windows-type interfaces, client-server and distributed applications, data communications,
enormous relational databases, and sheer size of applications have all contributed to the
exponential growth in software/system complexity. And the use of object-oriented
techniques can complicate instead of simplify a project unless it is well-engineered.

• Programming errors - programmers, like anyone else, can make mistakes.

• changing requirements - the customer may not understand the effects of changes, or may
understand and request them anyway - redesign, rescheduling of engineers, effects on
other projects, work already completed that may have to be redone or thrown out,
hardware requirements that may be affected, etc. If there are many minor changes or any
major changes, known and unknown dependencies among parts of the project are likely to
interact and cause problems, and the complexity of keeping track of changes may result in
errors. Enthusiasm of engineering staff may be affected. In some fast-changing business
environments, continuously modified requirements may be a fact of life. In this case,
management must understand the resulting risks, and QA and test engineers must adapt
and plan for continuous extensive testing to keep the inevitable bugs from running out of
control.

• time pressures - scheduling of software projects is difficult at best, often requiring a lot of
guesswork. When deadlines loom and the crunch comes, mistakes will be made.

• egos - people prefer to say things like:

'no problem'
'piece of cake'
'I can whip that out in a few hours'
'it should be easy to update that old code'

instead of:

'that adds a lot of complexity and we could end up making a lot of mistakes'
'we have no idea if we can do that; we'll wing it'
'I can't estimate how long it will take, until I take a close look at it'
'we can't figure out what that old spaghetti code did in the first place'

If there are too many unrealistic 'no problem' answers, the result is bugs.

• poorly documented code - it's tough to maintain and modify code that is badly written or
poorly documented; the result is bugs. In many organizations management provides no
incentive for programmers to document their code or write clear, understandable code. In
fact, it's usually the opposite: they get points mostly for quickly turning out code, and there's
job security if nobody else can understand it ('if it was hard to write, it should be hard to
read').

• software development tools - visual tools, class libraries, compilers, scripting tools, etc. often
introduce their own bugs or are poorly documented, resulting in added bugs.

11. How can new Software QA processes be introduced in an existing organization?
• A lot depends on the size of the organization and the risks involved. For large organizations
with high-risk (in terms of lives or property) projects, serious management buy-in is required
and a formalized QA process is necessary.

• Where the risk is lower, management and organizational buy-in and QA implementation may
be a slower, step-at-a-time process. QA processes should be balanced with productivity so as
to keep bureaucracy from getting out of hand.

• For small groups or projects, a more ad-hoc process may be appropriate, depending on the
type of customers and projects. A lot will depend on team leads or managers, feedback to
developers, and ensuring adequate communications among customers, managers,
developers, and testers.

• In all cases the most value for effort will be in requirements management processes, with a
goal of clear, complete, testable requirement specifications or expectations.

12. What is verification? What is validation?

Verification typically involves reviews and meetings to evaluate documents, plans, code,
requirements, and specifications. This can be done with checklists, issues lists, walkthroughs, and
inspection meetings. Validation typically involves actual testing and takes place after verifications are
completed. The term 'IV & V' refers to Independent Verification and Validation.

13. What is a 'walkthrough'?

A 'walkthrough' is an informal meeting for evaluation or informational purposes. Little or no
preparation is usually required.

14. What's an 'inspection'?

An inspection is more formalized than a 'walkthrough', typically with 3-8 people including a
moderator, reader, and a recorder to take notes. The subject of the inspection is typically a
document such as a requirements spec or a test plan, and the purpose is to find problems and see
what's missing, not to fix anything. Attendees should prepare for this type of meeting by reading
through the document; most problems will be found during this preparation. The result of the inspection
meeting should be a written report. Thorough preparation for inspections is difficult, painstaking
work, but is one of the most cost-effective methods of ensuring quality. Employees who are most
skilled at inspections are like the 'eldest brother' in the parable in 'Why is it often hard for
management to get serious about quality assurance?'. Their skill may have low visibility but they are
extremely valuable to any software development organization, since bug prevention is far more cost-
effective than bug detection.

15. What kinds of testing should be considered?

• Black box testing - not based on any knowledge of internal design or code. Tests are based on
requirements and functionality.

• White box testing - based on knowledge of the internal logic of an application's code. Tests
are based on coverage of code statements, branches, paths, conditions.

• unit testing - the most 'micro' scale of testing; to test particular functions or code modules.
Typically done by the programmer and not by testers, as it requires detailed knowledge of the
internal program design and code. Not always easily done unless the application has a well-
designed architecture with tight code; may require developing test driver modules or test
harnesses.

• incremental integration testing - continuous testing of an application as new functionality is
added; requires that various aspects of an application's functionality be independent enough
to work separately before all parts of the program are completed, or that test drivers be
developed as needed; done by programmers or by testers.

• integration testing - testing of combined parts of an application to determine if they function
together correctly. The 'parts' can be code modules, individual applications, client and server
applications on a network, etc. This type of testing is especially relevant to client/server and
distributed systems.

• functional testing - black-box type testing geared to functional requirements of an
application; this type of testing should be done by testers. This doesn't mean that the
programmers shouldn't check that their code works before releasing it (which of course
applies to any stage of testing).

• system testing - black-box type testing that is based on overall requirements specifications;
covers all combined parts of a system.

• end-to-end testing - similar to system testing; the 'macro' end of the test scale; involves
testing of a complete application environment in a situation that mimics real-world use, such
as interacting with a database, using network communications, or interacting with other
hardware, applications, or systems if appropriate.

• sanity testing - typically an initial testing effort to determine if a new software version is
performing well enough to accept it for a major testing effort. For example, if the new
software is crashing systems every 5 minutes, bogging down systems to a crawl, or
destroying databases, the software may not be in a 'sane' enough condition to warrant further
testing in its current state.

• regression testing - re-testing after fixes or modifications of the software or its environment.
It can be difficult to determine how much re-testing is needed, especially near the end of the
development cycle. Automated testing tools can be especially useful for this type of testing.

• acceptance testing - final testing based on specifications of the end-user or customer, or
based on use by end-users/customers over some limited period of time.

• load testing - testing an application under heavy loads, such as testing of a web site under a
range of loads to determine at what point the system's response time degrades or fails.

• stress testing - term often used interchangeably with 'load' and 'performance' testing. Also
used to describe such tests as system functional testing while under unusually heavy loads,
heavy repetition of certain actions or inputs, input of large numerical values, large complex
queries to a database system, etc.

• performance testing - term often used interchangeably with 'stress' and 'load' testing. Ideally
'performance' testing (and any other 'type' of testing) is defined in requirements
documentation or QA or Test Plans.

• usability testing - testing for 'user-friendliness'. Clearly this is subjective, and will depend on
the targeted end-user or customer. User interviews, surveys, video recording of user
sessions, and other techniques can be used. Programmers and testers are usually not
appropriate as usability testers.

• install/uninstall testing - testing of full, partial, or upgrade install/uninstall processes.

• recovery testing - testing how well a system recovers from crashes, hardware failures, or
other catastrophic problems.

• security testing - testing how well the system protects against unauthorized internal or
external access, willful damage, etc.; may require sophisticated testing techniques.

• compatibility testing - testing how well software performs in a particular
hardware/software/operating system/network/etc. environment.

• exploratory testing - often taken to mean a creative, informal software test that is not based
on formal test plans or test cases; testers may be learning the software as they test it.

• ad-hoc testing - similar to exploratory testing, but often taken to mean that the testers have
significant understanding of the software before testing it.

• user acceptance testing - determining if software is satisfactory to an end-user or customer.

• comparison testing - comparing software weaknesses and strengths to competing products.

• alpha testing - testing of an application when development is nearing completion; minor
design changes may still be made as a result of such testing. Typically done by end-users or
others, not by programmers or testers.

• beta testing - testing when development and testing are essentially completed and final bugs
and problems need to be found before final release. Typically done by end-users or others,
not by programmers or testers.

• mutation testing - a method for determining if a set of test data or test cases is useful, by
deliberately introducing various code changes ('bugs') and retesting with the original test
data/cases to determine if the 'bugs' are detected. Proper implementation requires large
computational resources.
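
A toy version of mutation testing in Python, assuming a hypothetical unit is_adult whose comparison operator is the only thing mutated. Real mutation tools rewrite source code or bytecode and generate many kinds of mutants; here each mutant is produced simply by swapping in a different comparison function from the standard operator module. The score reported is how many mutants the test suite detects ("kills") out of the total.

```python
import operator
from typing import Callable, List, Tuple

Comparator = Callable[[int, int], bool]
TestSuite = List[Tuple[int, bool]]

def make_is_adult(cmp: Comparator) -> Callable[[int], bool]:
    """Build the unit under test; `cmp` is the operator the mutants replace."""
    def is_adult(age: int) -> bool:
        return cmp(age, 18)
    return is_adult

ORIGINAL: Comparator = operator.ge  # intended behaviour: age >= 18
MUTANTS: List[Comparator] = [operator.gt, operator.le, operator.lt, operator.eq, operator.ne]

def passes(unit: Callable[[int], bool], suite: TestSuite) -> bool:
    return all(unit(age) == expected for age, expected in suite)

def mutation_score(suite: TestSuite) -> Tuple[int, int]:
    """Return (mutants killed, mutants total) for a suite that passes on the original."""
    if not passes(make_is_adult(ORIGINAL), suite):
        raise ValueError("the test suite must pass on the unmutated code first")
    killed = sum(1 for op in MUTANTS if not passes(make_is_adult(op), suite))
    return killed, len(MUTANTS)

WEAK_SUITE: TestSuite = [(30, True), (5, False)]
STRONG_SUITE: TestSuite = WEAK_SUITE + [(18, True)]

if __name__ == "__main__":
    print(mutation_score(WEAK_SUITE), mutation_score(STRONG_SUITE))
```

The two-case suite kills four of the five mutants; the `>` mutant survives because no test probes the boundary age of 18. Adding the single case (18, True) kills it, which is the kind of weakness in a set of test cases that mutation testing is meant to expose.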

16. What are 5 common problems in the software development process?
• poor requirements - if requirements are unclear, incomplete, too general, or not testable,
there will be problems.

• unrealistic schedule - if too much work is crammed in too little time, problems are
inevitable.

• inadequate testing - no one will know whether or not the program is any good until the
customer complains or systems crash.

• featuritis - requests to pile on new features after development is underway; extremely
common.

• miscommunication - if developers don't know what's needed or customers have erroneous
expectations, problems are guaranteed.

17. What are 5 common solutions to software development problems?
• solid requirements - clear, complete, detailed, cohesive, attainable, testable requirements
that are agreed to by all players. Use prototypes to help nail down requirements.

• realistic schedules - allow adequate time for planning, design, testing, bug fixing, re-testing,
changes, and documentation; personnel should be able to complete the project
without burning out.

• adequate testing - start testing early on, re-test after fixes or changes, plan for adequate
time for testing and bug-fixing.

• stick to initial requirements as much as possible - be prepared to defend against changes
and additions once development has begun, and be prepared to explain consequences. If
changes are necessary, they should be adequately reflected in related schedule changes.
If possible, use rapid prototyping during the design phase so that customers can see what
to expect. This will provide them a higher comfort level with their requirements decisions
and minimize changes later on.

• communication - require walkthroughs and inspections when appropriate; make extensive
use of group communication tools - e-mail, groupware, networked bug-tracking tools and
change management tools, intranet capabilities, etc.; ensure that documentation is
available and up-to-date - preferably electronic, not paper; promote teamwork and
cooperation; use prototypes early on so that customers' expectations are clarified.

18. What is software 'quality'?

Quality software is reasonably bug-free, delivered on time and within budget, meets requirements
and/or expectations, and is maintainable. However, quality is obviously a subjective term. It will
depend on who the 'customer' is and their overall influence in the scheme of things. A wide-angle
view of the 'customers' of a software development project might include end-users, customer
acceptance testers, customer contract officers, customer management, the development
organization's management/accountants/testers/salespeople, future software maintenance
engineers, stockholders, magazine columnists, etc. Each type of 'customer' will have their own slant
on 'quality' - the accounting department might define quality in terms of profits while an end-user
might define quality as user-friendly and bug-free.

19. What is 'good code'?

'Good code' is code that works, is bug-free, and is readable and maintainable. Some organizations
have coding 'standards' that all developers are supposed to adhere to, but everyone has different
ideas about what's best, or what is too many or too few rules. There are also various theories and
metrics, such as McCabe complexity metrics. It should be kept in mind that excessive use of
standards and rules can stifle productivity and creativity. 'Peer reviews', 'buddy checks', code
analysis tools, etc. can be used to check for problems and enforce standards.
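McCabe's cyclomatic complexity counts the independent paths through a routine's control-flow graph, M = E - N + 2P (edges, nodes, connected components); for structured code this works out to the number of binary decision points plus one. The sketch below applies that shortcut to Python source with the standard `ast` module. Python is used only for brevity; the same counting idea applies to C or C++. The set of node types treated as decisions is a simplifying assumption, and dedicated complexity tools handle more cases.

```python
import ast

# Node types counted as decision points. This is a simplified rule set (an
# assumption); real complexity tools apply more detailed rules.
DECISION_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.IfExp, ast.ExceptHandler)


def cyclomatic_complexity(source: str) -> dict[str, int]:
    """Approximate McCabe complexity (decision points + 1) for each function."""
    results: dict[str, int] = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        decisions = 0
        # Simplification: nested functions are also counted in their enclosing function.
        for child in ast.walk(node):
            if isinstance(child, DECISION_NODES):
                decisions += 1
            elif isinstance(child, ast.BoolOp):
                # 'a and b and c' adds one extra branch per additional operand.
                decisions += len(child.values) - 1
        results[node.name] = decisions + 1
    return results


SAMPLE = '''
def classify(n):
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    for _ in range(3):
        pass
    return "big" if n > 10 and n < 100 else "other"
'''

print(cyclomatic_complexity(SAMPLE))  # {'classify': 6}
```

Here the `if`, the `elif`, the `for` loop, the conditional expression, and the extra `and` operand give five decision points, so the approximate complexity is 6.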
For C and C++ coding, here are some typical ideas to consider in setting rules/standards; these may
or may not apply to a particular situation:

- minimize or eliminate use of global variables.

- use descriptive function and method names - use both upper and lower case, avoid
abbreviations, use as many characters as necessary to be adequately descriptive (use of more
than 20 characters is not out of line); be consistent in naming conventions.

- use descriptive variable names - use both upper and lower case, avoid abbreviations, use as
many characters as necessary to be adequately descriptive (use of more than 20 characters is
not out of line); be consistent in naming conventions.

- function and method sizes should be minimized; less than 100 lines of code is good, less than
50 lines is preferable.

- function descriptions should be clearly spelled out in comments preceding a function's code.

- organize code for readability.

- use whitespace generously - vertically and horizontally.

- each line of code should contain 70 characters max.

- one code statement per line.

- coding style should be consistent throughout a program (e.g., use of brackets, indentation,
naming conventions, etc.).

- in adding comments, err on the side of too many rather than too few comments; a common
rule of thumb is that there should be at least as many lines of comments (including header
blocks) as lines of code.

- no matter how small, an application should include documentation of the overall program
function and flow (even a few paragraphs is better than nothing); or if possible a separate
flow chart and detailed program documentation.

- make extensive use of error handling procedures and status and error logging.

- for C++, to minimize complexity and increase maintainability, avoid too many levels of
inheritance in class hierarchies (relative to the size and complexity of the application).
Minimize use of multiple inheritance, and minimize use of operator overloading (note that the
Java programming language does not support multiple inheritance of classes or user-defined
operator overloading).

- for C++, keep class methods small; less than 50 lines of code per method is preferable.

- for C++, make liberal use of exception handlers.


20. What is 'good design'?

'Design' could refer to many things, but often refers to 'functional design' or 'internal design'. Good
internal design is indicated by software code whose overall structure is clear, understandable, easily
modifiable, and maintainable; is robust with sufficient error-handling and status logging capability;
and works correctly when implemented. Good functional design is indicated by an application whose
functionality can be traced back to customer and end-user requirements. For programs that have a
user interface, it's often a good idea to assume that the end user will have little computer knowledge
and may not read a user manual or even the online help; some common rules of thumb include:

- the program should act in a way that least surprises the user.

- it should always be evident to the user what can be done next and how to exit.

- the program shouldn't let the users do something stupid without warning them.

21. What is SEI? CMM? ISO? IEEE? ANSI? Will it help?

- SEI = 'Software Engineering Institute' at Carnegie Mellon University; initiated by the U.S.
Defense Department to help improve software development processes.

- CMM = 'Capability Maturity Model', developed by the SEI. It's a model of 5 levels of
organizational 'maturity' that determine effectiveness in delivering quality software. It is
geared to large organizations such as large U.S. Defense Department contractors.
However, many of the QA processes involved are appropriate to any organization, and if
reasonably applied can be helpful. Organizations can receive CMM ratings by undergoing
assessments by qualified auditors.

Level 1 - characterized by chaos, periodic panics, and heroic efforts required by
individuals to successfully complete projects. Few if any processes are in place;
successes may not be repeatable.

Level 2 - software project tracking, requirements management, realistic planning, and
configuration management processes are in place; successful practices can be repeated.

Level 3 - standard software development and maintenance processes are integrated
throughout an organization; a Software Engineering Process Group is in place to oversee
software processes, and training programs are used to ensure understanding and compliance.

Level 4 - metrics are used to track productivity, processes, and products. Project
performance is predictable, and quality is consistently high.

Level 5 - the focus is on continuous process improvement. The impact of new processes
and technologies can be predicted and effectively implemented when required.

Perspective on CMM ratings: During 1997-2001, 1018 organizations were assessed. Of
those, 27% were rated at Level 1, 39% at 2, 23% at 3, 6% at 4, and 5% at 5. (For ratings
during the period 1992-96, 62% were at Level 1, 23% at 2, 13% at 3, 2% at 4, and 0.4% at
5.) The median size of organizations was 100 software engineering/maintenance personnel;
32% of organizations were U.S. federal contractors or agencies. For those rated at Level 1,
the most problematic key process area was Software Quality Assurance.
- ISO = 'International Organization for Standardization' - The ISO 9001:2000 standard (which
replaces the previous standard of 1994) concerns quality systems that are assessed by
outside auditors, and it applies to many kinds of production and manufacturing organizations,
not just software. It covers documentation, design, development, production, testing,
installation, servicing, and other processes. The full set of standards consists of:
(a) Q9001-2000 - Quality Management Systems: Requirements; (b) Q9000-2000 - Quality
Management Systems: Fundamentals and Vocabulary; (c) Q9004-2000 - Quality Management
Systems: Guidelines for Performance Improvements. For ISO 9001 certification, a third-party
auditor assesses an organization; certification is typically good for about 3 years, after
which a complete reassessment is required. Note that ISO certification does not necessarily
indicate quality products - it indicates only that documented processes are followed.

- IEEE = 'Institute of Electrical and Electronics Engineers' - among other things, creates
standards such as 'IEEE Standard for Software Test Documentation' (IEEE/ANSI Standard
829), 'IEEE Standard for Software Unit Testing' (IEEE/ANSI Standard 1008), 'IEEE Standard
for Software Quality Assurance Plans' (IEEE/ANSI Standard 730), and others.

- ANSI = 'American National Standards Institute', the primary industrial standards body in the
U.S.; publishes some software-related standards in conjunction with the IEEE and ASQ
(American Society for Quality).

- Other software development process assessment methods besides CMM and ISO 9000 include
SPICE, Trillium, TickIT, and Bootstrap.

22. What is the 'software life cycle'?

The life cycle begins when an application is first conceived and ends when it is no longer in use. It
includes aspects such as initial concept, requirements analysis, functional design, internal design,
documentation planning, test planning, coding, document preparation, integration, testing,
maintenance, updates, retesting, phase-out, and other aspects.

23. Will automated testing tools make testing easier?

- Possibly. For small projects, the time needed to learn and implement them may not be
worth it. For larger projects, or ongoing long-term projects, they can be valuable.

- A common type of automated tool is the 'record/playback' type. For example, a tester
could click through all combinations of menu choices, dialog box choices, buttons, etc. in
an application GUI and have them 'recorded' and the results logged by a tool. The
'recording' is typically in the form of text based on a scripting language that is
interpretable by the testing tool. If new buttons are added, or some underlying code in the
application is changed, etc., the application can then be retested by just 'playing back' the
'recorded' actions and comparing the logging results to check effects of the changes. The
problem with such tools is that if there are continual changes to the system being tested,
the 'recordings' may have to be changed so much that it becomes very time-consuming to
continuously update the scripts. Additionally, interpretation of results (screens, data, logs,
etc.) can be a difficult task. Note that there are record/playback tools for text-based
interfaces also, and for many types of platforms.

- Other automated tools can include:

code analyzers - monitor code complexity, adherence to standards, etc.

coverage analyzers - these tools check which parts of the code have been exercised by a
test, and may be oriented to code statement coverage, condition coverage, path coverage, etc.

memory analyzers - such as bounds-checkers and leak detectors.

load/performance test tools - for testing client/server and web applications under
various load levels.

web test tools - to check that links are valid, HTML code usage is correct, client-side and
server-side programs work, and a web site's interactions are secure.

other tools - for test case management, documentation management, bug reporting, and
configuration management.
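The idea behind a statement-coverage analyzer can be sketched with Python's `sys.settrace` hook, which reports each line as it executes. This is a minimal sketch assuming Python 3.9+; it records lines only inside the one function under test, and the names `covered_lines` and `grade` are hypothetical. Real coverage tools also measure condition/branch coverage and map results back to source files.

```python
import sys
from types import FrameType
from typing import Any, Callable, Optional


def covered_lines(func: Callable[..., Any], *args: Any) -> set[int]:
    """Run func(*args) and return the executed line offsets within func.

    Offsets are relative to the 'def' line (the 'def' line itself is 0).
    """
    code = func.__code__
    executed: set[int] = set()

    def tracer(frame: FrameType, event: str, arg: Any) -> Optional[Callable]:
        # Ignore every frame except the function under test.
        if frame.f_code is not code:
            return None
        if event == "line":
            executed.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()  # restore any existing tracer (e.g. a debugger)
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)
    return executed


def grade(score: int) -> str:
    if score >= 50:
        return "pass"
    return "fail"


BODY_LINES = {1, 2, 3}  # offsets of the three statements in grade()
only_passing = covered_lines(grade, 70)
both = only_passing | covered_lines(grade, 10)
print("missed with one test:", sorted(BODY_LINES - only_passing))  # [3]
print("missed with two tests:", sorted(BODY_LINES - both))         # []
```

A single passing score leaves the `return "fail"` line unexercised; adding a failing score brings statement coverage of `grade` to 100%.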

24. What makes a good test engineer?

A good test engineer has a 'test to break' attitude, an ability to take the point of view of the
customer, a strong desire for quality, and an attention to detail. Tact and diplomacy are useful in
maintaining a cooperative relationship with developers, and an ability to communicate with both
technical (developers) and non-technical (customers, management) people is useful. Previous
software development experience can be helpful as it provides a deeper understanding of the
software development process, gives the tester an appreciation for the developers' point of view, and
reduces the learning curve in automated test tool programming. Judgment skills are needed to assess
high-risk areas of an application on which to focus testing efforts when time is limited.

25. What makes a good Software QA engineer?

The same qualities a good tester has are useful for a QA engineer. Additionally, they must be able to
understand the entire software development process and how it can fit into the business approach
and goals of the organization. Communication skills and the ability to understand various sides of
issues are important. In organizations in the early stages of implementing QA processes, patience
and diplomacy are especially needed. An ability to find problems as well as to see 'what's missing' is
important for inspections and reviews.

26. What makes a good QA or Test manager?


A good QA, test, or QA/Test (combined) manager should:

- be familiar with the software development process

- be able to maintain the enthusiasm of their team and promote a positive atmosphere, despite
what is a somewhat 'negative' process (e.g., looking for or preventing problems)

- be able to promote teamwork to increase productivity

- be able to promote cooperation between software, test, and QA engineers

- have the diplomatic skills needed to promote improvements in QA processes

- have the ability to withstand pressures and say 'no' to other managers when quality is
insufficient or QA processes are not being adhered to

- have people judgment skills for hiring and keeping skilled personnel

- be able to communicate with technical and non-technical people, engineers, managers,
and customers.

- be able to run meetings and keep them focused

27. What's the role of documentation in QA?

Critical. (Note that documentation can be electronic, not necessarily paper.) QA practices should be
documented such that they are repeatable. Specifications, designs, business rules, inspection
reports, configurations, code changes, test plans, test cases, bug reports, user manuals, etc. should
all be documented. There should ideally be a system for easily finding and obtaining documents and
determining what documentation will have a particular piece of information. Change management for
documentation should be used if possible.

28. What's the big deal about 'requirements'?

One of the most reliable methods of ensuring problems, or failure, in a complex software project is to
have poorly documented requirements specifications. Requirements are the details describing an
application's externally-perceived functionality and properties. Requirements should be clear,
complete, reasonably detailed, cohesive, attainable, and testable. A non-testable requirement would
be, for example, 'user-friendly' (too subjective). A testable requirement would be something like 'the
user must enter their previously-assigned password to access the application'. Determining and
organizing requirements details in a useful and efficient way can be a difficult effort; different
methods are available depending on the particular project. Many books are available that describe
various approaches to this task.
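The difference shows up as soon as one tries to automate a check: 'user-friendly' cannot be turned into a pass/fail assertion, but the password requirement can. The sketch below is illustrative only; `can_access` is a hypothetical stand-in for the application under test, and the user names and passwords are made-up test data.

```python
import hmac

# Hypothetical application under test: user names mapped to passwords that were
# assigned in advance. A real system would store salted hashes, not plain text.
ASSIGNED_PASSWORDS = {"alice": "s3cret-Pa55"}


def can_access(user: str, password: str) -> bool:
    """Grant access only if the entered password matches the assigned one."""
    expected = ASSIGNED_PASSWORDS.get(user)
    if expected is None:
        return False
    # Constant-time comparison, so timing does not reveal how much matched.
    return hmac.compare_digest(expected.encode(), password.encode())


def check_password_requirement() -> None:
    """Each assertion is a pass/fail check derived from the requirement text."""
    assert can_access("alice", "s3cret-Pa55")        # assigned password: access
    assert not can_access("alice", "wrong-pass")     # wrong password: no access
    assert not can_access("alice", "")               # no password: no access
    assert not can_access("mallory", "s3cret-Pa55")  # unknown user: no access


check_password_requirement()
print("requirement checks passed")
```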

Care should be taken to involve ALL of a project's significant 'customers' in the requirements
process. 'Customers' could be in-house or outside personnel, and could include end-users, customer
acceptance testers, customer contract officers, customer management, future software maintenance
engineers, salespeople, etc. Anyone who could later derail the project if their expectations aren't met
should be included if possible.

Organizations vary considerably in their handling of requirements specifications. Ideally, the
requirements are spelled out in a document with statements such as 'The product shall.....'. 'Design'
specifications should not be confused with 'requirements'; design specifications should be traceable
back to the requirements.

In some organizations requirements may end up in high level project plans, functional specification
documents, in design documents, or in other documents at various levels of detail. No matter what
they are called, some type of documentation with detailed requirements will be needed by testers in
order to properly plan and execute tests. Without such documentation, there will be no clear-cut way
to determine if a software application is performing correctly.

29. What steps are needed to develop and run software tests?

The following are some of the steps to consider:

- Obtain requirements, functional design, and internal design specifications and other necessary
documents

- Obtain budget and schedule requirements

- Determine project-related personnel and their responsibilities, reporting requirements,
required standards and processes (such as release processes, change processes, etc.)

- Identify the application's higher-risk aspects, set priorities, and determine scope and
limitations of tests

- Determine test approaches and methods - unit, integration, functional, system, load, usability
tests, etc.

- Determine test environment requirements (hardware, software, communications, etc.)

- Determine testware requirements (record/playback tools, coverage analyzers, test tracking,
problem/bug tracking, etc.)

- Determine test input data requirements

- Identify tasks, those responsible for tasks, and labor requirements

- Set schedule estimates, timelines, milestones

- Determine input equivalence classes, boundary value analyses, error classes

- Prepare test plan document and have needed reviews/approvals

- Write test cases

- Have needed reviews/inspections/approvals of test cases

- Prepare test environment and testware, obtain needed user manuals/reference
documents/configuration guides/installation guides, set up test tracking processes, set up
logging and archiving processes, set up or obtain test input data

- Obtain and install software releases

- Perform tests

- Evaluate and report results

- Track problems/bugs and fixes

- Retest as needed

- Maintain and update test plans, test cases, test environment, and testware throughout the
life cycle
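The 'equivalence classes, boundary value analyses' step can be illustrated for a single numeric input field. This sketch assumes an integer field with an inclusive valid range (the 18..65 'age' field is a made-up example); it picks one representative value per equivalence class plus the usual boundary values, and pairs each with the expected accept/reject result. Error classes such as empty or non-numeric input would be listed separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntRange:
    """An inclusive range of valid integer inputs, e.g. an age field (18..65)."""
    low: int
    high: int


def equivalence_representatives(r: IntRange) -> dict[str, int]:
    """One representative value per class: below, inside, and above the range.

    The offset of 10 outside the range is arbitrary.
    """
    return {
        "invalid_low": r.low - 10,
        "valid": (r.low + r.high) // 2,
        "invalid_high": r.high + 10,
    }


def boundary_values(r: IntRange) -> list[int]:
    """Values on and just either side of each boundary (duplicates removed)."""
    return sorted({r.low - 1, r.low, r.low + 1, r.high - 1, r.high, r.high + 1})


def with_expected(r: IntRange, values: list[int]) -> list[tuple[int, bool]]:
    """Pair each input with the expected outcome: accepted (True) or rejected."""
    return [(v, r.low <= v <= r.high) for v in values]


age = IntRange(18, 65)
print(equivalence_representatives(age))
# {'invalid_low': 8, 'valid': 41, 'invalid_high': 75}
print(with_expected(age, boundary_values(age)))
# [(17, False), (18, True), (19, True), (64, True), (65, True), (66, False)]
```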

30. What's a 'test plan'?

A software project test plan is a document that describes the objectives, scope, approach, and focus
of a software testing effort. The process of preparing a test plan is a useful way to think through the
efforts needed to validate the acceptability of a software product. The completed document will help
people outside the test group understand the 'why' and 'how' of product validation. It should be
thorough enough to be useful but not so thorough that no one outside the test group will read it. The
following are some of the items that might be included in a test plan, depending on the particular
project:

- Title

- Identification of software including version/release numbers

- Revision history of document including authors, dates, approvals

- Table of Contents

- Purpose of document, intended audience

- Objective of testing effort

- Software product overview

- Relevant related document list, such as requirements, design documents, other test plans,
etc.

- Relevant standards or legal requirements

- Traceability requirements

- Relevant naming conventions and identifier conventions

- Overall software project organization and personnel/contact-info/responsibilities

- Test organization and personnel/contact-info/responsibilities

- Assumptions and dependencies

- Project risk analysis

- Testing priorities and focus

- Scope and limitations of testing

- Test outline - a decomposition of the test approach by test type, feature, functionality,
process, system, module, etc. as applicable

- Outline of data input equivalence classes, boundary value analysis, error classes

- Test environment - hardware, operating systems, other required software, data
configurations, interfaces to other systems

- Test environment validity analysis - differences between the test and production systems and
their impact on test validity

- Test environment setup and configuration issues

- Software migration processes

- Software CM processes

- Test data setup requirements

- Database setup requirements

- Outline of system-logging/error-logging/other capabilities, and tools such as screen capture
software, that will be used to help describe and report bugs

- Discussion of any specialized software or hardware tools that will be used by testers to help
track the cause or source of bugs

- Test automation - justification and overview

- Test tools to be used, including versions, patches, etc.

- Test script/test code maintenance processes and version control

- Problem tracking and resolution - tools and processes

- Project test metrics to be used

- Reporting requirements and testing deliverables

- Software entrance and exit criteria

- Initial sanity testing period and criteria

- Test suspension and restart criteria

- Personnel allocation

- Personnel pre-training needs

- Test site/location

- Outside test organizations to be utilized and their purpose, responsibilities, deliverables,
contact persons, and coordination issues

- Relevant proprietary, classified, security, and licensing issues

- Open issues

- Appendix - glossary, acronyms, etc.


31. What's a 'test case'?

- A test case is a document that describes an input, action, or event and an expected response,
to determine if a feature of an application is working correctly. A test case should contain
particulars such as test case identifier, test case name, objective, test conditions/setup, input
data requirements, steps, and expected results.

- Note that the process of developing test cases can help find problems in the requirements or
design of an application, since it requires completely thinking through the operation of the
application. For this reason, it's useful to prepare test cases early in the development cycle if
possible.
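The particulars of a test case map naturally onto a simple record type, which is roughly what test case management tools store. The sketch below is a minimal, hypothetical layout (the class and field names are assumptions, not a standard schema) with a helper that flags particulars left blank, something a reviewer might check before approving test cases.

```python
from dataclasses import dataclass, field, fields


@dataclass
class TestCaseRecord:
    """Fields follow the test case particulars; the names are illustrative."""
    identifier: str
    name: str
    objective: str
    setup: str                  # test conditions / setup
    input_data: str             # input data requirements
    steps: list[str] = field(default_factory=list)
    expected_results: list[str] = field(default_factory=list)

    def missing_particulars(self) -> list[str]:
        """Names of fields left empty, e.g. to flag during a test case review."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


login_case = TestCaseRecord(
    identifier="TC-LOGIN-001",
    name="Log in with assigned password",
    objective="Verify that a user can log in with a previously assigned password",
    setup="User 'alice' exists and has been assigned a password",
    input_data="",  # not filled in yet
    steps=["Open the login screen", "Enter user name and password", "Press OK"],
    expected_results=["The main application window is displayed"],
)

print(login_case.missing_particulars())  # ['input_data']
```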

32. What should be done after a bug is found?

The bug needs to be communicated and assigned to developers who can fix it. After the problem is
resolved, fixes should be re-tested, and determinations made regarding requirements for regression
testing to check that fixes didn't create problems elsewhere. If a problem-tracking system is in place,
it should encapsulate these processes. A variety of commercial problem-tracking/management
software tools are available. The following are items to consider in the tracking process:

- Complete information such that developers can understand the bug, get an idea of its
severity, and reproduce it if necessary.

- Bug identifier (number, ID, etc.)

- Current bug status (e.g., 'Released for Retest', 'New', etc.)

- The application name or identifier and version

- The function, module, feature, object, screen, etc. where the bug occurred

- Environment specifics, system, platform, relevant hardware specifics

- Test case name/number/identifier

- One-line bug description

- Full bug description

- Description of steps needed to reproduce the bug if not covered by a test case or if the
developer doesn't have easy access to the test case/test script/test tool

- Names and/or descriptions of file/data/messages/etc. used in test

- File excerpts/error messages/log file excerpts/screen shots/test tool logs that would be
helpful in finding the cause of the problem

- Severity estimate (a 5-level range such as 1-5 or 'critical'-to-'low' is common)

- Was the bug reproducible?

- Tester name

- Test date

- Bug reporting date

- Name of developer/group/organization the problem is assigned to

- Description of problem cause

- Description of fix

- Code section/file/module/class/method that was fixed

- Date of fix

- Application version that contains the fix

- Tester responsible for retest

- Retest date

- Retest results

- Regression testing requirements

- Tester responsible for regression tests

- Regression testing results

A reporting or tracking process should enable notification of appropriate personnel at various stages.
For instance, testers need to know when retesting is needed, developers need to know when bugs
are found and how to get the needed information, and reporting/summary capabilities are needed for
managers.
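The tracking items and notification needs above can be sketched as a bug record with a small status workflow. Only a few of the listed fields are shown, and the status names other than 'New' and 'Released for Retest', the allowed transitions, and the notification rules are all assumptions for illustration; each organization (and each tracking tool) defines its own.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    NEW = "New"
    ASSIGNED = "Assigned"
    FIXED = "Fixed"
    RELEASED_FOR_RETEST = "Released for Retest"
    REOPENED = "Reopened"
    CLOSED = "Closed"


# Hypothetical workflow: the status changes this sketch allows.
ALLOWED = {
    Status.NEW: {Status.ASSIGNED},
    Status.ASSIGNED: {Status.FIXED},
    Status.FIXED: {Status.RELEASED_FOR_RETEST},
    Status.RELEASED_FOR_RETEST: {Status.CLOSED, Status.REOPENED},
    Status.REOPENED: {Status.ASSIGNED},
    Status.CLOSED: set(),
}

# Hypothetical notification rules: who needs to know when a bug enters a status.
NOTIFY = {
    Status.ASSIGNED: "developer",
    Status.RELEASED_FOR_RETEST: "tester",
    Status.REOPENED: "developer",
    Status.CLOSED: "manager",
}


@dataclass
class BugReport:
    bug_id: str
    summary: str        # one-line bug description
    severity: int       # 1 (critical) .. 5 (low)
    app_version: str
    status: Status = Status.NEW
    history: list[Status] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 1 <= self.severity <= 5:
            raise ValueError("severity must be between 1 and 5")

    def move_to(self, new_status: Status) -> Optional[str]:
        """Change status if the workflow allows it; return the role to notify."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"{self.status.value} -> {new_status.value} is not allowed")
        self.history.append(self.status)
        self.status = new_status
        return NOTIFY.get(new_status)


bug = BugReport("BUG-101", "Crash when saving an empty file", severity=2, app_version="1.4.0")
print(bug.move_to(Status.ASSIGNED))             # developer
print(bug.move_to(Status.FIXED))                # None
print(bug.move_to(Status.RELEASED_FOR_RETEST))  # tester
```

Rejecting undefined transitions (for example, closing a bug that was never released for retest) is one way a tracking tool can enforce that fixes actually get re-tested.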

33. What is 'configuration management'?

Configuration management covers the processes used to control, coordinate, and track: code,
requirements, documentation, problems, change requests, designs,
tools/compilers/libraries/patches, changes made to them, and who makes the changes.

34. What if the software is so buggy it can't really be tested at all?

The best bet in this situation is for the testers to go through the process of reporting whatever bugs
or blocking-type problems initially show up, with the focus being on critical bugs. Since this type of
problem can severely affect schedules, and indicates deeper problems in the software development
process (such as insufficient unit testing or insufficient integration testing, poor design, improper
build or release procedures, etc.) managers should be notified, and provided with some
documentation as evidence of the problem.

35. How can it be known when to stop testing?

This can be difficult to determine. Many modern software applications are so complex, and run in
such an interdependent environment, that complete testing can never be done. Common factors in
deciding when to stop are:
- Deadlines (release deadlines, testing deadlines, etc.)

- Test cases completed with certain percentage passed

- Test budget depleted

- Coverage of code/functionality/requirements reaches a specified point

- Bug rate falls below a certain level

- Beta or alpha testing period ends
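Since these factors are usually weighed together rather than applied one at a time, it can help to report which of them currently hold. The sketch below does that for the measurable factors; the field names and threshold values (95% pass rate, 90% requirements coverage, fewer than 2 new bugs per week) are hypothetical and would be set by each project. Whether the factors that hold justify stopping remains a judgment call.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class TestProgress:
    today: date
    release_deadline: date
    cases_run: int
    cases_passed: int
    budget_spent: float
    budget_total: float
    requirements_covered: int
    requirements_total: int
    new_bugs_last_week: int


def stop_factors_met(
    p: TestProgress,
    pass_target: float = 0.95,      # hypothetical thresholds; set per project
    coverage_target: float = 0.90,
    max_weekly_bugs: int = 2,
) -> list[str]:
    """List which of the common stopping factors currently apply."""
    met = []
    if p.today >= p.release_deadline:
        met.append("deadline reached")
    if p.cases_run and p.cases_passed / p.cases_run >= pass_target:
        met.append("pass-rate target met")
    if p.budget_spent >= p.budget_total:
        met.append("test budget depleted")
    if p.requirements_total and p.requirements_covered / p.requirements_total >= coverage_target:
        met.append("requirements coverage target met")
    if p.new_bugs_last_week < max_weekly_bugs:
        met.append("bug rate below limit")
    return met


progress = TestProgress(
    today=date(2024, 5, 1), release_deadline=date(2024, 6, 1),
    cases_run=200, cases_passed=192,
    budget_spent=40_000, budget_total=50_000,
    requirements_covered=85, requirements_total=100,
    new_bugs_last_week=1,
)
print(stop_factors_met(progress))
# ['pass-rate target met', 'bug rate below limit']
```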


36. What if there isn't enough time for thorough testing?

Use risk analysis to determine where testing should be focused.


Since it's rarely possible to test every possible aspect of an application, every possible combination
of events, every dependency, or everything that could go wrong, risk analysis is appropriate to most
software development projects. This requires judgment skills, common sense, and experience. (If
warranted, formal methods are also available.) Considerations can include:

- Which functionality is most important to the project's intended purpose?

- Which functionality is most visible to the user?

- Which functionality has the largest safety impact?

- Which functionality has the largest financial impact on users?

- Which aspects of the application are most important to the customer?

- Which aspects of the application can be tested early in the development cycle?

- Which parts of the code are most complex, and thus most subject to errors?

- Which parts of the application were developed in a rush or in panic mode?

- Which aspects of similar/related previous projects caused problems?

- Which aspects of similar/related previous projects had large maintenance expenses?

- Which parts of the requirements and design are unclear or poorly thought out?

- What do the developers think are the highest-risk aspects of the application?

- What kinds of problems would cause the worst publicity?

- What kinds of problems would cause the most customer service complaints?

- What kinds of tests could easily cover multiple functionalities?

- Which tests will have the best high-risk-coverage to time-required ratio?
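One common way to turn these questions into a priority list is to rate each area for likelihood of failure and for impact, multiply the two into a risk score, and then favor the areas with the most risk per hour of testing (the last question above). The sketch assumes 1-5 rating scales, made-up areas and effort estimates, and a simple greedy selection; it is a planning aid under those assumptions, not a formal risk analysis method, and it does not guarantee the best possible selection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Area:
    name: str
    likelihood: int    # 1..5, e.g. complexity, rushed development, unclear requirements
    impact: int        # 1..5, e.g. safety, financial, visibility, publicity
    test_hours: float  # estimated effort to test this area

    def __post_init__(self) -> None:
        if self.test_hours <= 0:
            raise ValueError("test_hours must be positive")

    @property
    def risk(self) -> int:
        return self.likelihood * self.impact


def plan_tests(areas: list[Area], hours_available: float) -> list[str]:
    """Greedy heuristic: take areas in order of risk per test hour while time remains."""
    ranked = sorted(areas, key=lambda a: a.risk / a.test_hours, reverse=True)
    chosen: list[str] = []
    used = 0.0
    for area in ranked:
        if used + area.test_hours <= hours_available:
            chosen.append(area.name)
            used += area.test_hours
    return chosen


areas = [
    Area("payment processing", likelihood=4, impact=5, test_hours=10),
    Area("report layout", likelihood=2, impact=2, test_hours=4),
    Area("login", likelihood=3, impact=5, test_hours=5),
    Area("help pages", likelihood=1, impact=1, test_hours=2),
]
print(plan_tests(areas, hours_available=16))  # ['login', 'payment processing']
```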


37. What can be done if requirements are changing continuously?

A common problem and a major headache.

- Work with the project's stakeholders early on to understand how requirements might change
so that alternate test plans and strategies can be worked out in advance, if possible.

- It's helpful if the application's initial design allows for some adaptability so that later changes
do not require redoing the application from scratch.

- If the code is well commented and well documented, changes are easier for the developers
to make.

- Use rapid prototyping whenever possible to help customers feel sure of their requirements
and minimize changes.

- The project's initial schedule should allow for some extra time commensurate with the
possibility of changes.

- Try to move new requirements to a 'Phase 2' version of an application, while using the
original requirements for the 'Phase 1' version.

- Negotiate to allow only easily-implemented new requirements into the project, while moving
more difficult new requirements into future versions of the application.

- Be sure that customers and management understand the scheduling impacts, inherent risks,
and costs of significant requirements changes. Then let management or the customers (not
the developers or testers) decide if the changes are warranted - after all, that's their job.

- Balance the effort put into setting up automated testing with the expected effort required to
redo it to deal with changes.

- Try to design some flexibility into automated test scripts.

- Focus initial automated testing on application aspects that are most likely to remain
unchanged.

- Devote appropriate effort to risk analysis of changes to minimize regression testing needs.

- Design some flexibility into test cases (this is not easily done; the best bet might be to
minimize the detail in the test cases, or set up only higher-level generic-type test plans).

- Focus less on detailed test plans and test cases and more on ad hoc testing (with an
understanding of the added risk that this entails).

38. What if the project isn't big enough to justify extensive testing?

Consider the impact of project errors, not the size of the project. However, if extensive testing is still
not justified, risk analysis is again needed and the same considerations as described previously in
'What if there isn't enough time for thorough testing?' apply. The tester might then do ad hoc
testing, or write up a limited test plan based on the risk analysis.

39. What if the application has functionality that wasn't in the requirements?

It may take serious effort to determine if an application has significant unexpected or hidden
functionality, and it would indicate deeper problems in the software development process. If the
functionality isn't necessary to the purpose of the application, it should be removed, as it may have
unknown impacts or dependencies that were not taken into account by the designer or the customer.
If not removed, design information will be needed to determine added testing needs or regression
testing needs. Management should be made aware of any significant added risks as a result of the
unexpected functionality. If the functionality only affects areas such as minor improvements in the
user interface, for example, it may not be a significant risk.

40. How can Software QA processes be implemented without stifling productivity?


By implementing QA processes slowly over time, using consensus to reach agreement on processes,
and adjusting and experimenting as an organization grows and matures, productivity will be
improved instead of stifled. Problem prevention will lessen the need for problem detection, panics
and burn-out will decrease, and there will be improved focus and less wasted effort. At the same
time, attempts should be made to keep processes simple and efficient, minimize paperwork, promote
computer-based processes and automated tracking and reporting, minimize time required in
meetings, and promote training as part of the QA process. However, no one - especially talented
technical types - likes rules or bureaucracy, and in the short run things may slow down a bit. A typical
scenario would be that more days of planning and development will be needed, but less time will be
required for late-night bug-fixing and calming of irate customers.

41. What if an organization is growing so fast that fixed QA processes are impossible?

This is a common problem in the software industry, especially in new technology areas. There is no
easy solution in this situation, other than:

- Hire good people

- Management should 'ruthlessly prioritize' quality issues and maintain focus on the customer

- Everyone in the organization should be clear on what 'quality' means to the customer

42. How does a client/server environment affect testing?


Client/server applications can be quite complex due to the multiple dependencies among clients, data
communications, hardware, and servers. Thus testing requirements can be extensive. When time is
limited (as it usually is) the focus should be on integration and system testing. Additionally,
load/stress/performance testing may be useful in determining client/server application limitations
and capabilities. There are commercial tools to assist with such testing.

43. How can World Wide Web sites be tested?

Web sites are essentially client/server applications, with web servers and 'browser' clients.
Consideration should be given to the interactions between HTML pages, TCP/IP communications,
Internet connections, firewalls, applications that run in web pages (such as applets, JavaScript, and
plug-in applications), and applications that run on the server side (such as CGI scripts, database
interfaces, logging applications, dynamic page generators, ASP, etc.). Additionally, there is a wide
variety of servers and browsers, various versions of each, small but sometimes significant differences
between them, variations in connection speeds, rapidly changing technologies, and multiple standards
and protocols. The end result is that testing for web sites can become a major ongoing effort. Other
considerations might include:
• What are the expected loads on the server (e.g., number of hits per unit time), and what
kind of performance is required under such loads (such as web server response time and
database query response times)? What kinds of tools will be needed for performance testing
(such as web load testing tools, other tools already in house that can be adapted, web robot
downloading tools, etc.)?

• Who is the target audience? What kind of browsers will they be using? What kind of
connection speeds will they be using? Are they intra-organization (thus with likely high
connection speeds and similar browsers) or Internet-wide (thus with a wide variety of
connection speeds and browser types)?

• What kind of performance is expected on the client side (e.g., how fast should pages appear,
and how fast should animations, applets, etc. load and run)?

• Will downtime for server and content maintenance/upgrades be allowed? How much?

• What kinds of security (firewalls, encryption, passwords, etc.) will be required and what is it
expected to do? How can it be tested?

• How reliable are the site's Internet connections required to be? How does that affect
backup system or redundant connection requirements and testing?

• What processes will be required to manage updates to the web site's content, and what are
the requirements for maintaining, tracking, and controlling page content, graphics, links, etc.?

• Which HTML specification will be adhered to? How strictly? What variations will be allowed for
targeted browsers?

• Will there be any standards or requirements for page appearance and/or graphics throughout
a site or parts of a site?

• How will internal and external links be validated and updated? How often?

• Can testing be done on the production system, or will a separate test system be required?
How are browser caching, variations in browser option settings, dial-up connection
variabilities, and real-world Internet 'traffic congestion' problems to be accounted for in
testing?

• How extensive or customized are the server logging and reporting requirements? Are they
considered an integral part of the system, and do they require testing?

• How are CGI programs, applets, JavaScript code, ActiveX components, etc. to be maintained,
tracked, controlled, and tested?

• Pages should be 3-5 screens max unless content is tightly focused on a single topic. If larger,
provide internal links within the page.

• The page layouts and design elements should be consistent throughout a site, so that it's
clear to the user that they're still within a site.

• Pages should be as browser-independent as possible, or pages should be provided or
generated based on the browser type.

• All pages should have links external to the page; there should be no dead-end pages.

• The page owner, revision date, and a link to a contact person or organization should be
included on each page.
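The "no dead-end pages" rule is one of the few items above that is easy to check automatically. The sketch below is an illustration, not a full link checker: it assumes the pages are already available as HTML strings keyed by file name (a hypothetical in-memory site), collects the href of every <a> tag with Python's standard html.parser module, and flags pages whose only links (if any) are in-page anchors such as "#top".

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def outbound_links(html: str) -> list[str]:
    """Return all hrefs found in one page."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


def find_dead_end_pages(pages: dict[str, str]) -> list[str]:
    """Return the pages that have no links external to the page.

    In-page anchors such as "#top" do not count, because they never take
    the user anywhere else.
    """
    dead_ends = []
    for name, html in pages.items():
        external = [href for href in outbound_links(html) if not href.startswith("#")]
        if not external:
            dead_ends.append(name)
    return dead_ends


# Hypothetical three-page site used only for illustration.
site = {
    "index.html": '<p>Welcome</p><a href="about.html">About us</a>',
    "about.html": "<p>We test software.</p>",
    "faq.html": '<a href="#top">Back to top</a>',
}
print(find_dead_end_pages(site))  # ['about.html', 'faq.html']
```

A real check would also fetch each linked URL to catch broken links; that part is left out here to keep the example self-contained.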

44. How is testing affected by object-oriented designs?

Well-engineered object-oriented design can make it easier to trace from code to internal design to
functional design to requirements. While there will be little effect on black-box testing (where an
understanding of the internal design of the application is unnecessary), white-box testing can be
oriented to the application's objects. If the application was well designed, this can simplify test
design.

45. What is Extreme Programming and what's it got to do with testing?

Extreme Programming (XP) is a software development approach for small teams on risk-prone
projects with unstable requirements. It was created by Kent Beck who described the approach in his
book 'Extreme Programming Explained'. Testing ('extreme testing') is a core aspect of Extreme
Programming. Programmers are expected to write unit and functional test code first - before the
application is developed. Test code is under source control along with the rest of the code.
Customers are expected to be an integral part of the project team and to help develop scenarios for
acceptance/black box testing. Acceptance tests are preferably automated, and are modified and
rerun for each of the frequent development iterations. QA and test personnel are also required to be
an integral part of the project team. Detailed requirements documentation is not used, and frequent
re-scheduling, re-estimating, and re-prioritizing is expected.

46. Common Software Errors


Introduction

This document takes you on a whirlwind tour of common software errors. It is an excellent aid for
software testing: it helps you identify errors systematically, which increases both the efficiency and
the productivity of software testing. For more information, please refer to Testing Computer Software
(Wiley edition).

Types of Errors

• User Interface Errors

• Error Handling

• Boundary related errors

• Calculation errors

• Initial and Later states


• Control flow errors

• Errors in Handling or Interpreting Data

• Race Conditions

• Load Conditions

• Hardware

• Source, Version and ID Control

• Testing Errors

Let us go through details of each kind of error.

User Interface Errors

Functionality
Sl No Possible Error Conditions
1 Excessive Functionality
2 Inflated impression of functionality
3 Inadequacy for the task at hand
4 Missing function
5 Wrong function
6 Functionality must be created by user
7 Doesn't do what the user expects

Communication
Missing Information
Sl No Possible Error Conditions
1 No on-screen instructions
2 Assuming printed documentation is already available
3 Undocumented features
4 States that appear impossible to exit
5 No cursor
6 Failure to acknowledge input
7 Failure to show activity during long delays
8 Failure to advise when a change will take effect
9 Failure to check for the same document being opened twice
Wrong, misleading, confusing information
10 Simple factual errors
11 Spelling errors
12 Inaccurate simplifications
13 Invalid metaphors

14 Confusing feature names
15 More than one name for the same feature
16 Information overload
17 When are data saved?
18 Wrong function
19 Functionality must be created by user
20 Poor external modularity
Help text and error messages
21 Inappropriate reading levels
22 Verbosity
23 Inappropriate emotional tone
24 Factual errors
25 Context errors
26 Failure to identify the source of error
27 Forbidding a resource without saying why
28 Reporting non-errors
29 Failure to highlight the part of the screen
30 Failure to clear highlighting
31 Wrong/partial string displayed
32 Message displayed for too long or not long enough
Display Layout
33 Poor aesthetics in screen layout
34 Menu Layout errors
35 Dialog box layout errors
36 Obscured Instructions
37 Misuse of flash
38 Misuse of color
39 Heavy reliance on color
40 Inconsistent with the style of the environment
41 Cannot get rid of on screen information
Output
42 Can't output certain data
43 Can't redirect output
44 Format incompatible with a follow-up process
45 Must output too little or too much
46 Can't control output layout
47 Absurd printout level of precision
48 Can't control labeling of tables or figures
49 Can't control scaling of graphs
Performance
50 Program Speed
51 User Throughput
52 Can't redirect output
53 Perceived performance
54 Slow program
55 Slow echoing
56 How to reduce user throughput
57 Poor responsiveness
58 No type-ahead
59 No warning that the operation takes a long time
60 No progress reports
61 Problems with time-outs
62 Program pesters you

Program Rigidity

User tailorability
Sl No Possible Error Conditions
1 Can't turn off case sensitivity
2 Can't tailor to hardware at hand
3 Can't change device initialization
4 Can't turn off automatic changes
5 Can't slow down/speed up scrolling
6 Can't do what you did last time
7 Failure to execute customization commands
8 Failure to save customization commands
9 Side effects of feature changes
10 Can't turn off the noise
11 Infinite tailorability
Who is in control?
12 Unnecessary imposition of a conceptual style
13 Novice friendly, experienced hostile
14 Surplus or redundant information required
15 Unnecessary repetition of steps
16 Unnecessary limits

Command Structure and Rigidity


Inconsistencies
Sl No Possible Error Conditions
1 Optimizations
2 Inconsistent syntax
3 Inconsistent command entry style
4 Inconsistent abbreviations
5 Inconsistent termination rule
6 Inconsistent command options
7 Similarly named commands
8 Inconsistent Capitalization
9 Inconsistent menu position
10 Inconsistent function key usage
11 Inconsistent error handling rules
12 Inconsistent editing rules
13 Inconsistent data saving rules
Time Wasters
14 Garden paths
15 Choices that can't be taken
16 Are you really, really sure?
17 Obscurely or idiosyncratically named commands
Menus
18 Excessively complex menu hierarchy
19 Inadequate menu navigation options
20 Too many paths to the same place
21 You can't get there from here
22 Related commands relegated to unrelated menus
23 Unrelated commands tossed under the same menu
Command Lines
24 Forced distinction between uppercase and lowercase
25 Reversed parameters
26 Full command names are not allowed
27 Abbreviations are not allowed
28 Demands complex input on one line
29 no batch input

30 Can't edit commands
Inappropriate use of keyboard
31 Failure to use cursor, edit, or function keys
32 Non-standard use of cursor and edit keys
33 Non-standard use of function keys
34 Failure to filter invalid keys
35 Failure to indicate keyboard state changes

Missing Commands
State transitions
Sl No Possible Error Conditions
1 Can't do nothing and leave
2 Can't quit mid-program
3 Can't stop mid-command
4 Can't pause
Disaster prevention
5 No backup facility
6 No undo
7 No "Are you sure?"
8 No incremental saves
Disaster prevention
9 Inconsistent menu position
10 Inconsistent function key usage
11 Inconsistent error handling rules
12 Inconsistent editing rules
13 Inconsistent data saving rules
Error handling by the user
14 No user specifiable filters
15 Awkward error correction
16 Can't include comments
17 Can't display relationships between variables
Miscellaneous
18 Inadequate privacy or security
19 Obsession with security
20 Can't hide menus
21 Doesn't support standard OS features
22 Doesn't allow long names

Error Handling

Error prevention
Sl No Possible Error Conditions
1 Inadequate initial state validation
2 Inadequate tests of user input
3 Inadequate protection against corrupted data
4 Inadequate tests of passed parameters
5 Inadequate protection against operating system bugs
6 Inadequate protection against malicious use
7 Inadequate version control

Error Detection
Sl No Possible Error Conditions
1 Ignores overflow
2 Ignores impossible values
3 Ignores implausible values

4 Ignores error flag
5 Ignores hardware fault or error conditions
6 Data comparison

Error Recovery
Sl No Possible Error Conditions
1 Automatic error detection
2 Failure to report error
3 Failure to set an error flag
4 Where does the program go back to?
5 Aborting errors
6 Recovery from hardware problems
7 No escape from missing disks
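Several of the prevention and detection items above ("Inadequate tests of user input", "Ignores impossible values", "Ignores implausible values") come down to validating input before using it. A minimal sketch, assuming a hypothetical age field: the function separates malformed input, impossible values, and implausible values so that each error message identifies its source. The upper limit of 130 is an illustrative assumption, not a requirement from any specification.

```python
def parse_age(text: str) -> int:
    """Parse a user-entered age, rejecting malformed, impossible, and implausible input.

    The upper limit of 130 is an illustrative assumption, not a requirement.
    """
    try:
        age = int(text.strip())
    except ValueError:
        raise ValueError(f"age must be a whole number, got {text!r}") from None
    if age < 0:
        # Impossible value: an age can never be negative.
        raise ValueError(f"impossible age {age}")
    if age > 130:
        # Implausible value: not strictly impossible, but almost certainly a typo.
        raise ValueError(f"implausible age {age}")
    return age


print(parse_age(" 42 "))  # 42
```

Each rejection path is a natural negative test case: one malformed string, one impossible value, and one implausible value.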

Boundary related errors

Sl No Possible Error Conditions


1 Numeric boundaries
2 Equality as boundary
3 Boundaries on numerosity
4 Boundaries in space
5 Boundaries in time
6 Boundaries in loop
7 Boundaries in memory
8 Boundaries with data structure
9 Hardware related boundaries
10 Invisible boundaries
11 Mishandling of boundary case
12 Wrong boundary
13 Mishandling of cases outside boundary
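"Wrong boundary" and "Mishandling of boundary case" errors are often a single wrong comparison operator. The sketch below assumes a hypothetical rule that valid order quantities are 1 to 100 inclusive. It shows a buggy check that uses < where <= was needed, and a tiny harness that tries values on and just outside each boundary.

```python
# Hypothetical specification: valid order quantities are 1 to 100 inclusive.
MIN_QTY, MAX_QTY = 1, 100


def is_valid_quantity_buggy(qty: int) -> bool:
    # "Wrong boundary": < instead of <= silently rejects the legal maximum.
    return MIN_QTY <= qty < MAX_QTY


def is_valid_quantity(qty: int) -> bool:
    return MIN_QTY <= qty <= MAX_QTY


# Values on and just outside each boundary, with the outcome the spec requires.
BOUNDARY_CASES = {MIN_QTY - 1: False, MIN_QTY: True, MAX_QTY: True, MAX_QTY + 1: False}


def failing_boundaries(check) -> list[int]:
    """Return the boundary inputs for which `check` disagrees with the specification."""
    return [value for value, expected in BOUNDARY_CASES.items() if check(value) != expected]


print(failing_boundaries(is_valid_quantity_buggy))  # [100]
print(failing_boundaries(is_valid_quantity))        # []
```

A mid-range value such as 50 passes both versions. Only a test placed exactly on the boundary exposes the bug.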

Calculation Errors

Sl No Possible Error Conditions


1 Bad Logic
2 Bad Arithmetic
3 Imprecise Calculations
4 Outdated constants
5 Calculation errors
6 Impossible parentheses
7 Wrong order of calculations
8 Bad underlying functions
9 Overflow and Underflow
10 Truncation and Round-off error
11 Confusion about the representation of data
12 Incorrect conversion from one data representation to another
13 Wrong Formula
14 Incorrect Approximation
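Two of the calculation errors above, overflow and truncation/round-off, can be shown in a few lines. Python integers never overflow, so the sketch simulates a 32-bit two's-complement register with a hypothetical helper, to_int32. It then contrasts int(), which truncates toward zero, with round(), which rounds to the nearest integer and sends exact halves to the even neighbour.

```python
INT32_MAX = 2**31 - 1


def to_int32(value: int) -> int:
    """Wrap an int the way a 32-bit two's-complement register would.

    Python ints never overflow, so fixed-width overflow has to be simulated.
    """
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value


# Overflow: one more than the largest 32-bit value wraps to the most negative one.
print(to_int32(INT32_MAX + 1))  # -2147483648

# Truncation vs. round-off: int() discards the fraction (toward zero),
# while round() rounds to the nearest integer, sending exact halves to the even neighbour.
print(int(2.9), int(-2.9))      # 2 -2
print(round(2.9), round(2.5))   # 3 2
```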

Race Conditions

Sl No Possible Error Conditions


1 Races in updating data
2 Assumption that one event or task finished before another begins
3 Assumptions that one event or task has finished before another begins
4 Assumptions that input won't occur during a brief processing interval
5 Assumptions that interrupts won't occur during a brief interval
6 Resource races
7 Assumptions that a person, device or process will respond quickly
8 Options out of sync during display changes
9 Task starts before its prerequisites are met
10 Messages cross or don't arrive in the order sent
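"Races in updating data" is easiest to see with a shared counter. The sketch below uses Python's threading module and a hypothetical SharedCounter class. It shows an unprotected read-modify-write, which can lose updates, next to a version that holds a lock for the whole update. Whether the unsafe version actually loses updates on a given run depends on thread scheduling. That is exactly why race conditions so often slip through testing.

```python
import threading


class SharedCounter:
    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment_unsafe(self) -> None:
        # Read-modify-write with no lock: another thread can update `value`
        # between the read and the write, and one of the updates is lost.
        current = self.value
        self.value = current + 1

    def increment(self) -> None:
        # The lock makes the read-modify-write a single atomic step.
        with self._lock:
            self.value += 1


def run_workers(counter: SharedCounter, method: str, threads: int = 4, per_thread: int = 10_000) -> int:
    """Call counter.<method>() per_thread times from each of `threads` threads."""

    def work() -> None:
        step = getattr(counter, method)
        for _ in range(per_thread):
            step()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return counter.value


print(run_workers(SharedCounter(), "increment"))         # always 40000
print(run_workers(SharedCounter(), "increment_unsafe"))  # 40000 or less, depending on timing
```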

Initial and Later States

Sl No Possible Error Conditions


1 Failure to set a data item to zero
2 Failure to initialize a loop-control variable
3 Failure to initialize or re-initialize a pointer
4 Failure to clear a string
5 Failure to initialize a register
6 Failure to clear a flag
7 Data were supposed to be initialized elsewhere
8 Failure to re-initialize
9 Assumption that data were not re-initialized
10 Confusion between static and dynamic storage
11 Data modifications by side effect
12 Incorrect initialization
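"Failure to re-initialize" and "Confusion between static and dynamic storage" have a well-known Python form: a mutable default argument is evaluated once, when the function is defined, so its contents persist from one call to the next. The function names below are hypothetical.

```python
def add_tag_buggy(tag, tags=[]):
    # The default list is created once, when the function is defined, and is
    # shared by every call that omits `tags`: data survive from one call to the next.
    tags.append(tag)
    return tags


def add_tag(tag, tags=None):
    # Re-initialize on every call instead of relying on a shared default.
    if tags is None:
        tags = []
    tags.append(tag)
    return tags


print(add_tag_buggy("a"))  # ['a']
print(add_tag_buggy("b"))  # ['a', 'b']  (state left over from the first call)
print(add_tag("a"))        # ['a']
print(add_tag("b"))        # ['b']
```

Calling a unit twice in a row and checking that the second result does not depend on the first is a cheap test for this whole class of error.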

Control Flow Errors

Program runs amok


Sl No Possible Error Conditions
1 Jumping to a routine that isn't resident
2 Re-entrance
3 Variables contain embedded command names
4 Wrong returning state assumed
5 Exception handling based exits

Return to wrong place


Sl No Possible Error Conditions
1 Corrupted Stack
2 Stack underflow/overflow
3 GOTO rather than RETURN from sub-routine
Interrupts
Sl No Possible Error Conditions
1 Wrong interrupt vector
2 Failure to restore or update interrupt vector
3 Invalid restart after an interrupt
4 Failure to block or un-block interrupts

Program Stops
Sl No Possible Error Conditions
1 Dead crash
2 Syntax error reported at run time
3 Waiting for impossible condition or combinations of conditions
4 Wrong user or process priority
Loops
Sl No Possible Error Conditions
1 Infinite loop
2 Wrong starting value for the loop control variables
3 Accidental change of loop control variables
4 Wrong criteria for ending the loop
5 Commands that do or don't belong inside the loop
6 Improper loop nesting

If, Then, Else, or Maybe Not


Sl No Possible Error Conditions
1 Wrong inequalities
2 Comparison sometimes yields wrong result
3 Not equal versus equal when there are three cases
4 Testing floating point values for equality
5 Confusion between inclusive and exclusive OR
6 Incorrectly negating a logical expression
7 Assignment equal instead of test equal
8 Commands being inside the THEN or ELSE clause
9 Commands that don't belong in either case
10 Failure to test a flag
11 Failure to clear a flag
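Three of the conditional errors above can be demonstrated directly: testing floating-point values for equality, confusing inclusive and exclusive OR, and incorrectly negating a compound condition. The sketch uses only the standard math module. exactly_one is a hypothetical helper name.

```python
import math

total = 0.1 + 0.2

# Exact equality compares binary approximations, not the decimal values intended.
print(total == 0.3)              # False
print(math.isclose(total, 0.3))  # True (default relative tolerance 1e-09)


def exactly_one(a: bool, b: bool) -> bool:
    """Exclusive OR: true when one, but not both, of a and b is true.

    Python's `or` is inclusive; for two bools, `!=` gives the exclusive version.
    """
    return a != b


# Negating a compound condition: not (a and b) equals (not a) or (not b), per De Morgan.
for a in (False, True):
    for b in (False, True):
        assert (not (a and b)) == ((not a) or (not b))
```

In Python, "assignment equal instead of test equal" (if x = 1:) is rejected as a syntax error, but in languages such as C it compiles and silently changes the meaning of the condition.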

Multiple Cases
Sl No Possible Error Conditions
1 Missing default
2 Wrong default
3 Missing cases
4 Overlapping cases
5 Invalid or impossible cases
6 Commands being inside the THEN or ELSE clause
7 Case should be sub-divided

Errors in Handling or Interpreting Data

Problems in passing data between routines


Sl No Possible Error Conditions
1 Parameter list variables out of order or missing
2 Data Type errors
3 Aliases and shifting interpretations of the same area of memory
4 Misunderstood data values
5 Inadequate error information
6 Failure to clean up data on exception handling
7 Outdated copies of data
8 Related variables get out of sync
9 Local setting of global data
10 Global use of local variables
11 Wrong mask in bit fields
12 Wrong value from table
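"Parameter list variables out of order or missing" is hard to catch when two parameters have the same type, because the swapped call still runs. One defence, sketched here with hypothetical transfer functions, is to make such parameters keyword-only, so that every call site has to name them.

```python
def transfer_positional(source: str, dest: str, amount: int) -> dict:
    return {"from": source, "to": dest, "amount": amount}


def transfer(*, source: str, dest: str, amount: int) -> dict:
    # Parameters after the bare * are keyword-only: callers must name them,
    # so arguments cannot be silently swapped or dropped.
    return {"from": source, "to": dest, "amount": amount}


# Intended: move money from savings to checking. The positional call has the
# first two arguments reversed, and nothing complains.
print(transfer_positional("checking", "savings", 100)["from"])  # checking

print(transfer(dest="checking", source="savings", amount=100)["from"])  # savings

try:
    transfer("savings", "checking", 100)
except TypeError:
    print("positional call rejected")
```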

Data boundaries
Sl No Possible Error Conditions
1 Un-terminated null strings
2 Early end of string
3 Read/Write past end of data structure or an element in it

Read outside the limits of a message buffer


Sl No Possible Error Conditions
1 Compiler padding to word boundaries
2 Value stack underflow/overflow
3 Trampling another process's code or data

Messaging Problems
Sl No Possible Error Conditions
1 Messages sent to wrong process or port
2 Failure to validate an incoming message
3 Lost or out of synch messages
4 Message sent to only N of N+1 processes

Data Storage corruption


Sl No Possible Error Conditions
1 Overwritten changes
2 Data entry not saved
3 Too much data for receiving process to handle
4 Overwriting a file after an error exit or user abort

Load Conditions

Sl No Possible Error Conditions


1 Required resources are not available
2 No available large memory area
3 Input buffer or queue not deep enough
4 Doesn't clear items from queue, buffer or stack
5 Lost Messages
6 Performance costs
7 Race condition windows expand
8 Doesn't abbreviate under load
9 Doesn't recognize that another process abbreviates output under load
10 Low priority tasks not put off
11 Low priority tasks never done
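"Input buffer or queue not deep enough" and "Lost Messages" are two sides of the same load problem. The sketch below uses the standard queue module. enqueue_all is a hypothetical helper that fills a bounded queue without blocking and returns the items that did not fit, rather than dropping them silently, so a load test can assert on them.

```python
import queue


def enqueue_all(items, capacity: int):
    """Push items into a bounded queue without blocking; report what didn't fit.

    Dropped items are returned rather than silently discarded, so a load test
    can assert on them.
    """
    q = queue.Queue(maxsize=capacity)
    dropped = []
    for item in items:
        try:
            q.put_nowait(item)
        except queue.Full:
            dropped.append(item)
    return q, dropped


q, dropped = enqueue_all(range(5), capacity=3)
print(q.qsize(), dropped)  # 3 [3, 4]
```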

Doesn't return a resource


Sl No Possible Error Conditions
1 Doesn't indicate that it's done with a device
2 Doesn't erase old files from mass storage
3 Doesn't return unused memory
4 Wastes computer time

Hardware

Sl No Possible Error Conditions


1 Wrong Device
2 Wrong Device Address
3 Device unavailable

4 Device returned to wrong type of pool
5 Device use forbidden to caller
6 Specifies wrong privilege level for the device
7 Noisy Channel
8 Channel goes down
9 Time-out problems
10 Wrong storage device
11 Doesn't check the directory of current disk
12 Doesn't close the file
13 Unexpected end of file
14 Disk sector bug and other length dependent errors
15 Wrong operation or instruction codes
16 Misunderstood status or return code
17 Underutilizing device intelligence
18 Paging mechanism ignored or misunderstood
19 Ignores channel throughput limits
20 Assuming device is or isn't or should be or shouldn't be initialized
21 Assumes programmable function keys are programmed correctly
Source, Version, ID Control

Sl No Possible Error Conditions


1 Old bugs mysteriously reappear
2 Failure to update multiple copies of data or program files
3 No title
4 No version ID
5 Wrong version number on the title screen
6 No copyright message or a bad one
7 Archived source doesn't compile into a match for shipping code
8 Manufactured disks don't work or contain wrong code or data

Testing Errors

Missing bugs in the program


Sl No Possible Error Conditions
1 Failure to notice a problem
2 You don't know what the correct test results are
3 You are bored or inattentive
4 Misreading the Screen
5 Failure to report problem
6 Failure to execute a planned test
7 Failure to use the most promising test case
8 Ignoring programmer's suggestions

Finding bugs that aren't in the program


Sl No Possible Error Conditions
1 Errors in testing programs
2 Corrupted data files
3 Misinterpreted specifications or documentation

Poor reporting
Sl No Possible Error Conditions
1 Illegible reports
2 Failure to make it clear how to reproduce the problem
3 Failure to say you can't reproduce the problem
4 Failure to check your report

5 Failure to report timing dependencies
6 Failure to simplify conditions
7 Concentration on trivia
8 Abusive language

Poor Tracking and follow-up


Sl No Possible Error Conditions
1 Failure to provide summary report
2 Failure to re-report serious bug
3 Failure to check for unresolved problems just before release
4 Failure to verify fixes

47. Designing Unit Test Cases

Executive Summary

Producing a test specification, including the design of test cases, is the level of test design which has
the highest degree of creative input. Furthermore, unit test specifications will usually be produced by
a large number of staff with a wide range of experience, not just a few experts.

This paper provides a general process for developing unit test specifications and then describes some
specific design techniques for designing unit test cases. It serves as a tutorial for developers who are
new to formal testing of software, and as a reminder of some finer points for experienced software
testers.

A. Introduction

The design of tests is subject to the same basic engineering principles as the design of software.
Good design consists of a number of stages which progressively elaborate the design. Good test
design consists of a number of stages which progressively elaborate the design of tests:

• Test strategy;
• Test planning;
• Test specification;
• Test procedure.

These four stages of test design apply to all levels of testing, from unit testing through to system
testing. This paper concentrates on the specification of unit tests; i.e. the design of individual unit
test cases within unit test specifications. A more detailed description of the four stages of test design
can be found in the IPL paper "An Introduction to Software Testing".

The design of tests has to be driven by the specification of the software. For unit testing, tests are
designed to verify that an individual unit implements all design decisions made in the unit's design
specification. A thorough unit test specification should include positive testing, that the unit does
what it is supposed to do, and also negative testing, that the unit does not do anything that it is not
supposed to do.

B. Developing Unit Test Specifications

Once a unit has been designed, the next development step is to design the unit tests. An important
point here is that it is more rigorous to design the tests before the code is written. If the code was
written first, it would be too tempting to test the software against what it is observed to do (which is
not really testing at all), rather than against what it is specified to do.

A unit test specification comprises a sequence of unit test cases. Each unit test case should include
four essential elements:

• A statement of the initial state of the unit, the starting point of the test case (this is only
applicable where a unit maintains state between calls);
• The inputs to the unit, including the value of any external data read by the unit;
• What the test case actually tests, in terms of the functionality of the unit and the analysis
used in the design of the test case (for example, which decisions within the unit are tested);
• The expected outcome of the test case (the expected outcome of a test case should always be
defined in the test specification, prior to test execution).
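The four essential elements can be captured as a simple record, which makes it harder to write a test case that leaves one of them out. This is only an illustrative sketch: the UnitTestCase class and its field names are hypothetical, not a format required by this paper or by any test tool. The example entry restates the first square root test case from section C.1.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class UnitTestCase:
    """One entry in a unit test specification. Field names are illustrative."""

    identifier: str
    inputs: dict[str, Any]   # inputs, including any external data read by the unit
    tests: str               # what is tested, and the analysis behind the case
    expected: Any            # expected outcome, fixed before the test is executed
    initial_state: Optional[dict[str, Any]] = None  # only for units that keep state between calls


# The first square root case from section C.1, written as a record.
sqrt_case_1 = UnitTestCase(
    identifier="SQRT-1",
    inputs={"x": 4},
    tests="First specification statement: input >= 0 returns the positive square root",
    expected=2,
)
print(sqrt_case_1.expected)  # 2
```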

The following subsections of this paper provide a six-step general process for developing a unit test
specification as a set of individual unit test cases. For each step of the process, suitable test case
design techniques are suggested. (Note that these are only suggestions. Individual circumstances
may be better served by other test case design techniques.) Section C of this paper then describes in
detail a selection of techniques which can be used within this process to help design test cases.

B.1 Step 1 - Make it Run

The purpose of the first test case in any unit test specification should be to execute the unit under
test in the simplest way possible. When the tests are actually executed, knowing that at least the
first unit test will execute is a good confidence boost. If it will not execute, then it is preferable to
have something as simple as possible as a starting point for debugging.

Suitable techniques:

- Specification derived tests
- Equivalence partitioning

B.2 Step 2 - Positive Testing

Test cases should be designed to show that the unit under test does what it is supposed to do. The
test designer should walk through the relevant specifications; each test case should test one or more
statements of specification. Where more than one specification is involved, it is best to make the
sequence of test cases correspond to the sequence of statements in the primary specification for the
unit.

Suitable techniques:

- Specification derived tests
- Equivalence partitioning
- State-transition testing

B.3. Step 3 - Negative Testing

Existing test cases should be enhanced and further test cases should be designed to show that the
software does not do anything that it is not specified to do. This step depends primarily upon error
guessing, relying upon the experience of the test designer to anticipate problem areas.
Suitable techniques:

- Error guessing
- Boundary value analysis
- Internal boundary value testing
- State-transition testing

B.4. Step 4 - Special Considerations

Where appropriate, test cases should be designed to address issues such as performance, safety
requirements and security requirements. Particularly in the cases of safety and security, it can be
convenient to give test cases special emphasis to facilitate security analysis or safety analysis and
certification. Test cases already designed which address security issues or safety hazards should be
identified in the unit test specification. Further test cases should then be added to the unit test
specification to ensure that all security issues and safety hazards applicable to the unit will be fully
addressed.

Suitable techniques:

- Specification derived tests

B.5. Step 5 - Coverage Tests

The test coverage likely to be achieved by the designed test cases should be visualised. Further test
cases can then be added to the unit test specification to achieve specific test coverage objectives.
Once coverage tests have been designed, the test procedure can be developed and the tests
executed.

Suitable techniques:

- Branch testing
- Condition testing
- Data definition-use testing
- State-transition testing
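To make a coverage objective concrete, the sketch below instruments a hypothetical unit, parcel_fee, by hand. Each decision outcome records itself in a set, so you can see which branches a set of test cases has exercised. Two test cases are enough to take both decisions each way, which gives full branch coverage. Note that this is still not full path coverage: the heavy-and-standard and light-and-express combinations are never run. In practice a tool does the recording; for Python, coverage.py can measure branch coverage with `coverage run --branch`.

```python
covered: set[str] = set()


def parcel_fee(weight_kg: float, express: bool) -> float:
    """Hypothetical unit with two decisions; each outcome records itself in `covered`."""
    if weight_kg > 10:
        covered.add("heavy:true")
        fee = 20.0
    else:
        covered.add("heavy:false")
        fee = 8.0
    if express:
        covered.add("express:true")
        fee *= 2
    else:
        covered.add("express:false")
    return fee


ALL_OUTCOMES = {"heavy:true", "heavy:false", "express:true", "express:false"}

# Two test cases are enough to take every decision both ways.
assert parcel_fee(12, express=True) == 40.0
assert parcel_fee(3, express=False) == 8.0
print(sorted(ALL_OUTCOMES - covered))  # []  (no branch outcome left unexercised)
```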

B.6. Test Execution

A test specification designed using the above five steps should in most cases provide a thorough test
for a unit. At this point the test specification can be used to develop an actual test procedure, and
the test procedure used to execute the tests. For users of AdaTEST or Cantata, the test procedure
will be an AdaTEST or Cantata test script.

Execution of the test procedure will identify errors in the unit which can be corrected and the unit re-
tested. Dynamic analysis during execution of the test procedure will yield a measure of test
coverage, indicating whether coverage objectives have been achieved. There is therefore a further
coverage completion step in the process of designing test specifications.

B.7. Step 6 - Coverage Completion

Depending upon an organization’s standards for the specification of a unit, there may be no
structural specification of processing within a unit other than the code itself. There are also likely to
have been human errors made in the development of a test specification. Consequently, there may
be complex decision conditions, loops and branches within the code for which coverage targets may
not have been met when tests were executed. Where coverage objectives are not achieved, analysis
must be conducted to determine why. Failure to achieve a coverage objective may be due to:
• Infeasible paths or conditions - the corrective action should be to annotate the test
specification to provide a detailed justification of why the path or condition is not tested.
AdaTEST provides some facilities to help exclude infeasible conditions from Boolean coverage
metrics.
• Unreachable or redundant code - the corrective action will probably be to delete the offending
code. It is easy to make mistakes in this analysis, particularly where defensive programming
techniques have been used. If there is any doubt, defensive programming should not be
deleted.
• Insufficient test cases - test cases should be refined and further test cases added to a test
specification to fill the gaps in test coverage.

Ideally, the coverage completion step should be conducted without looking at the actual code.
However, in practice some sight of the code may be necessary in order to achieve coverage
targets. It is vital that all test designers recognize that use of the coverage completion
step should be minimized. The most effective testing will come from analysis and specification,
not from experimentation and over-dependence upon the coverage completion step to cover for
sloppy test design.

Suitable techniques:

- Branch testing
- Condition testing
- Data definition-use testing
- State-transition testing

B.8. General Guidance

Note that the first five steps in producing a test specification can be achieved:

• Solely from design documentation;
• Without looking at the actual code;
• Prior to developing the actual test procedure.

It is usually a good idea to avoid long sequences of test cases which depend upon the outcome of
preceding test cases. An error identified by a test case early in the sequence could cause secondary
errors and reduce the amount of real testing achieved when the tests are executed.
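One way to avoid chains of dependent test cases is to give every test case its own known initial state. The sketch below uses Python's standard unittest module with a small hypothetical Stack class as the unit under test. setUp runs before each test method, so test_pop does not depend on whether test_push ran first, or whether it passed.

```python
import unittest


class Stack:
    """Tiny hypothetical unit under test, defined here only for the example."""

    def __init__(self) -> None:
        self._items = []

    def push(self, item) -> None:
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    def __len__(self) -> int:
        return len(self._items)


class StackTests(unittest.TestCase):
    def setUp(self) -> None:
        # Each test starts from its own known initial state instead of
        # relying on whatever an earlier test left behind.
        self.stack = Stack()
        self.stack.push(1)

    def test_push(self) -> None:
        self.stack.push(2)
        self.assertEqual(len(self.stack), 2)

    def test_pop(self) -> None:
        self.assertEqual(self.stack.pop(), 1)
        self.assertEqual(len(self.stack), 0)


if __name__ == "__main__":
    unittest.main()
```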

The process of designing test cases, including executing them as "thought experiments", often
identifies bugs before the software has even been built. It is not uncommon to find more bugs when
designing tests than when executing tests.

Throughout unit test design, the primary input should be the specification documents for the unit
under test. While use of actual code as an input to the test design process may be necessary in some
circumstances, test designers must take care that they are not testing the code against itself. A test
specification developed from the code will only prove that the code does what the code does, not
that it does what it is supposed to do.

C. Test Case Design Techniques

The preceding section of this paper has provided a "recipe" for developing a unit test specification as
a set of individual test cases. This section describes a range of techniques which can be used to help
define test cases.

Test case design techniques can be broadly split into two main categories. Black box techniques use
the interface to a unit and a description of functionality, but do not need to know how the inside of a
unit is built. White box techniques make use of information about how the inside of a unit works.
There are also some other techniques which do not fit into either of the above categories. Error
guessing falls into this category.

The most important ingredients of any test design are experience and common sense. Test designers
should not let any of the given techniques obstruct the application of experience and common sense.

The selection of test case design techniques described in the following subsections is by no means
exhaustive. Further information on techniques for test case design can be found in "Software Testing
Techniques", 2nd Edition, B. Beizer, Van Nostrand Reinhold, New York, 1990.

C.1. Specification Derived Tests


As the name suggests, test cases are designed by walking through the relevant specifications. Each
test case should test one or more statements of specification. It is often practical to make the
sequence of test cases correspond to the sequence of statements in the specification for the unit
under test. For example, consider the specification for a function to calculate the square root of a
real number, shown in figure 3.1.

There are three statements in this specification, which can be addressed by two test cases. Note that
the use of Print_Line conveys structural information in the specification.

Test Case 1: Input 4, Return 2

- Exercises the first statement in the specification ("When given an input of 0 or greater, the
positive square root of the input shall be returned.").

Test Case 2: Input -10, Return 0, Output "Square root error - illegal negative input" using
Print_Line.

- Exercises the second and third statements in the specification ("When given an input of less
than 0, the error message "Square root error - illegal negative input" shall be displayed and a
value of 0 returned. The library routine Print_Line shall be used to display the error message.").
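As a minimal sketch, assuming a Python implementation, the square root unit and the two specification-derived test cases might look as follows. `print_line` is a hypothetical stand-in for the Print_Line library routine: it records each message so that the test can check it, and is not part of the original specification.

```python
import math

printed = []  # messages passed to the hypothetical Print_Line stand-in


def print_line(message):
    """Hypothetical stand-in for the Print_Line library routine."""
    printed.append(message)
    print(message)


def square_root(x):
    """Square root unit written against the three specification statements (sketch)."""
    if x < 0:
        # Statements 2 and 3: report the error via Print_Line and return 0.
        print_line("Square root error - illegal negative input")
        return 0.0
    # Statement 1: return the positive square root for input >= 0.
    return math.sqrt(x)


def test_case_1():
    # Exercises statement 1.
    assert square_root(4) == 2


def test_case_2():
    # Exercises statements 2 and 3.
    printed.clear()
    assert square_root(-10) == 0
    assert printed == ["Square root error - illegal negative input"]


test_case_1()
test_case_2()
```

Keeping one test function per group of specification statements preserves the correspondence between test cases and the specification that makes this technique readable.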

Specification derived test cases can provide an excellent correspondence to the sequence of
statements in the specification for the unit under test, enhancing the readability and maintainability
of the test specification. However, specification derived testing is a positive test case design
technique. Consequently, specification derived test cases have to be supplemented by negative test
cases in order to provide a thorough unit test specification.

A variation of specification derived testing is to apply a similar technique to a security analysis,
safety analysis, software hazard analysis, or other document which provides supplementary
information to the unit's specification.

C.2. Equivalence Partitioning

Equivalence partitioning is a much more formalised method of test case design. It is based upon
splitting the inputs and outputs of the software under test into a number of partitions, where the
behaviour of the software is equivalent for any value within a particular partition. The data that
forms partitions is not limited to routine parameters: partitions can also be present in data accessed
by the software, in time, in input and output sequence, and in state.

Equivalence partitioning assumes that all values within any individual partition are equivalent for test
purposes. Test cases should therefore be designed to test one value in each partition. Consider again
the square root function used in the previous example. The square root function has two input
partitions and two output partitions, as shown in table 3.2.

These four partitions can be tested with two test cases:

Test Case 1: Input 4, Return 2

- Exercises the >=0 input partition (ii)
- Exercises the >=0 output partition (a)

Test Case 2: Input -10, Return 0, Output "Square root error - illegal negative input" using
Print_Line.

- Exercises the <0 input partition (i)
- Exercises the "error" output partition (b)

For a function like square root, we can see that equivalence partitioning is quite simple: one test
case for a positive number and a real result, and a second test case for a negative number and an
error result. However, as software becomes more complex, the identification of partitions and the
inter-dependencies between partitions becomes much more difficult, making it less convenient to use
this technique to design test cases. Equivalence partitioning is still basically a positive test case
design technique and needs to be supplemented by negative tests.
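The partition bookkeeping can be sketched in a few lines of Python. This is a sketch under stated assumptions: the unit returns an explicit error flag in place of the Print_Line message, and `partitions_covered` is a hypothetical helper that reports which of the four partitions, (i), (ii), (a) and (b), a set of test inputs reaches.

```python
import math

# Input partitions from the square root example: (i) x < 0, (ii) x >= 0.
INPUT_PARTITIONS = {
    "i": lambda x: x < 0,
    "ii": lambda x: x >= 0,
}


def square_root(x):
    """Sketch of the unit: returns (value, is_error) instead of calling Print_Line."""
    if x < 0:
        return 0.0, True
    return math.sqrt(x), False


def partitions_covered(test_inputs):
    """Return the input and output partitions reached by the given inputs."""
    covered = set()
    for x in test_inputs:
        covered.update(name for name, contains in INPUT_PARTITIONS.items() if contains(x))
        _, is_error = square_root(x)
        # Output partitions: (a) a real result >= 0, (b) the "error" result.
        covered.add("b" if is_error else "a")
    return covered


print(sorted(partitions_covered([4, -10])))  # ['a', 'b', 'i', 'ii']
```

Adding a third input such as 9 would reach no new partition, which is exactly the equivalence assumption the technique relies on.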

C.3. Boundary Value Analysis

Boundary value analysis uses the same analysis of partitions as equivalence partitioning. However,
boundary value analysis assumes that errors are most likely to exist at the boundaries between
partitions. Boundary value analysis consequently incorporates a degree of negative testing into the
test design, by anticipating that errors will occur at or near the partition boundaries. Test cases are
designed to exercise the software on and at either side of boundary values. Consider the two input
partitions in the square root example, as illustrated by figure 3.2.

The zero or greater partition has a boundary at 0 and a boundary at the most positive real number.
The less than zero partition shares the boundary at 0 and has another boundary at the most
negative real number. The output has a boundary at 0, below which it cannot go.

Test Case 1: Input {the most negative real number}, Return 0, Output "Square root error - illegal
negative input" using Print_Line.

- Exercises the lower boundary of partition (i).

Test Case 2: Input {just less than 0}, Return 0, Output "Square root error - illegal
negative input" using Print_Line.

- Exercises the upper boundary of partition (i).

Test Case 3: Input 0, Return 0

- Exercises just outside the upper boundary of partition (i), the lower boundary of partition (ii)
and the lower boundary of partition (a).

Test Case 4: Input {just greater than 0}, Return {the positive square root of the input}

- Exercises just inside the lower boundary of partition (ii).

Test Case 5: Input {the most positive real number}, Return {the positive square root of the input}

- Exercises the upper boundary of partition (ii) and the upper boundary of partition (a).

As for equivalence partitioning, it can become impractical to use boundary value analysis thoroughly
for more complex software. Boundary value analysis can also be meaningless for non-scalar data,
such as enumeration values. In the example, partition (b) does not really have boundaries. For
purists, boundary value analysis requires knowledge of the underlying representation of the
numbers. A more pragmatic approach is to use any small values above and below each boundary
and suitably big positive and negative numbers.
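Where real numbers are IEEE 754 doubles, the purist's boundaries can be computed rather than guessed. The sketch below assumes Python 3.9 or later (for `math.nextafter`), takes `sys.float_info.max` as "the most positive real number", and uses `x ** 0.5` as an independent oracle for the valid partition. The `square_root` function is a hypothetical implementation that returns 0 for negative input and omits the Print_Line message.

```python
import math
import sys


def square_root(x):
    """Sketch of the unit: returns 0 for negative input (error message omitted)."""
    if x < 0:
        return 0.0
    return math.sqrt(x)


def boundary_cases():
    """The five boundary value test cases, as (input, expected, description)."""
    most_positive = sys.float_info.max
    just_above_zero = math.nextafter(0.0, math.inf)  # smallest positive double
    just_below_zero = math.nextafter(0.0, -math.inf)  # largest negative double
    return [
        (-most_positive, 0.0, "lower boundary of partition (i)"),
        (just_below_zero, 0.0, "upper boundary of partition (i)"),
        (0.0, 0.0, "lower boundary of partitions (ii) and (a)"),
        (just_above_zero, just_above_zero ** 0.5, "just inside partition (ii)"),
        (most_positive, most_positive ** 0.5, "upper boundary of (ii) and (a)"),
    ]


def run_boundary_tests():
    """Run every boundary case and return how many were checked."""
    cases = boundary_cases()
    for x, expected, description in cases:
        result = square_root(x)
        assert math.isclose(result, expected, rel_tol=1e-12), description
    return len(cases)


print(run_boundary_tests())  # 5
```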

C.4. State-Transition Testing

State transition testing is particularly useful where either the software has been designed as a state
machine or the software implements a requirement that has been modelled as a state machine. Test
cases are designed to test the transitions between states by creating the events which lead to
transitions.

When used with illegal combinations of states and events, test cases for negative testing can be
designed using this approach. Testing state machines is addressed in detail by the IPL paper "Testing
State Machines with AdaTEST and Cantata".
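The source gives no state machine for the square root example, so the sketch below uses a hypothetical connection with three states (`idle`, `connected`, `closed`) and three events. It shows both uses described above: positive tests that trigger every transition in the table, and negative tests that fire each illegal state/event combination and check that it is rejected without a change of state. All names are illustrative assumptions.

```python
# Hypothetical state machine: (current state, event) -> next state.
TRANSITIONS = {
    ("idle", "open"): "connected",
    ("connected", "send"): "connected",
    ("connected", "close"): "closed",
}
STATES = ["idle", "connected", "closed"]
EVENTS = ["open", "send", "close"]
# A legal event sequence that reaches each state from the initial state.
PATH_TO = {"idle": [], "connected": ["open"], "closed": ["open", "close"]}


class IllegalEvent(Exception):
    pass


class Connection:
    def __init__(self):
        self.state = "idle"

    def handle(self, event):
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise IllegalEvent(f"{event!r} not allowed in state {self.state!r}")
        self.state = TRANSITIONS[key]


def reach(state):
    """Create a Connection and drive it into `state` using legal events."""
    conn = Connection()
    for event in PATH_TO[state]:
        conn.handle(event)
    return conn


def test_legal_transitions():
    """Positive tests: trigger each transition once and check its target state."""
    exercised = set()
    for (state, event), target in TRANSITIONS.items():
        conn = reach(state)
        conn.handle(event)
        assert conn.state == target
        exercised.add((state, event))
    return exercised


def test_illegal_transitions():
    """Negative tests: every (state, event) pair outside the table must be rejected."""
    rejected = 0
    for state in STATES:
        for event in EVENTS:
            if (state, event) in TRANSITIONS:
                continue
            conn = reach(state)
            try:
                conn.handle(event)
            except IllegalEvent:
                assert conn.state == state  # an illegal event must not change state
                rejected += 1
            else:
                raise AssertionError(f"{event!r} accepted in state {state!r}")
    return rejected


print(len(test_legal_transitions()), test_illegal_transitions())  # 3 6
```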

C.5. Branch Testing

In branch testing, test cases are designed to exercise control flow branches or decision points in a
unit. This is usually aimed at achieving a target level of Decision Coverage. Given a functional
specification for a unit, a "black box" form of branch testing is to "guess" where branches may be
coded and to design test cases to follow the branches. However, branch testing is really a "white
box" or structural test case design technique. Given a structural specification for a unit, specifying
the control flow within the unit, test cases can be designed to exercise branches. Such a structural
unit specification will typically include a flowchart or PDL (Program Design Language).

Returning to the square root example, a test designer could assume that there would be a branch
between the processing of valid and invalid inputs, leading to the following test cases:

Test Case 1: Input 4, Return 2

- Exercises the valid input processing branch

Test Case 2: Input -10, Return 0, Output "Square root error - illegal negative input" using
Print_Line.

- Exercises the invalid input processing branch

However, there could be many different structural implementations of the square root function. The
structural specifications shown in figure 3.3 are all valid implementations of the square root
function, but the above test cases would only achieve decision coverage of the first and third
versions of the specification.

It can be seen that branch testing works best with a structural specification for the unit. A structural
unit specification will enable branch test cases to be designed to achieve decision coverage, but a
purely functional unit specification could lead to coverage gaps.
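The coverage gap can be illustrated with two hypothetical implementations, which are not the specifications of figure 3.3: `sqrt_one_decision` has a single valid/invalid decision, while `sqrt_zero_shortcut` adds a second decision that special-cases zero. Each branch records the decision outcome it takes, and the sketch measures the decision coverage achieved by the two branch test cases (inputs 4 and -10).

```python
import math


def sqrt_one_decision(x, taken):
    """Hypothetical implementation with a single decision."""
    if x < 0:
        taken.add(("x < 0", True))
        return 0.0
    taken.add(("x < 0", False))
    return math.sqrt(x)


def sqrt_zero_shortcut(x, taken):
    """Hypothetical implementation that also short-cuts an input of zero."""
    if x < 0:
        taken.add(("x < 0", True))
        return 0.0
    taken.add(("x < 0", False))
    if x == 0:
        taken.add(("x == 0", True))
        return 0.0
    taken.add(("x == 0", False))
    return math.sqrt(x)


def decision_coverage(unit, decisions, inputs):
    """Fraction of decision outcomes (True and False for each decision) exercised."""
    taken = set()
    for x in inputs:
        unit(x, taken)
    outcomes = {(d, outcome) for d in decisions for outcome in (True, False)}
    return len(taken & outcomes) / len(outcomes)


branch_tests = [4, -10]
print(decision_coverage(sqrt_one_decision, ["x < 0"], branch_tests))  # 1.0
print(decision_coverage(sqrt_zero_shortcut, ["x < 0", "x == 0"], branch_tests))  # 0.75
```

Adding an input of 0 raises the second figure to 1.0, but only a designer who can see the extra decision in a structural specification would know to add it.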

One thing to beware of is that by concentrating upon branches, a test designer could lose sight of
the overall functionality of a unit. It is important to always remember that it is the overall
functionality of a unit that is important, and that branch testing is a means to an end, not an end in
itself. Another consideration is that branch testing is based solely on the outcome of decisions. It
makes no allowances for the complexity of the logic which leads to a decision.

C.6. Condition Testing

There is a range of test case design techniques which fall under the general title of condition
testing, all of which try to overcome the weaknesses of branch testing when complex logical conditions
are encountered. The object of condition testing is to design test cases to show that the individual
components of logical conditions and combinations of the individual components are correct.

Test cases are designed to test the individual elements of logical expressions, both within branch
conditions and within other expressions in a unit. As for branch testing, condition testing could be
used as a "black box" technique, where the test designer makes intelligent guesses about the
implementation of a functional specification for a unit. However, condition testing is more suited to
"white box" test design from a structural specification for a unit.

The test cases should be targeted at achieving a condition coverage metric, such as Modified
Condition Decision Coverage (available as Boolean Operand Effectiveness in AdaTEST). The IPL paper
entitled "Structural Coverage Metrics" provides more detail of condition coverage metrics.

To illustrate condition testing, consider the example specification for the square root function which
uses successive approximation (figure 3.3(d) - Specification 4). Suppose that the designer for the
unit made a decision to limit the algorithm to a maximum of 10 iterations, on the grounds that after
10 iterations the answer would be as close as it would ever get. The PDL specification for the unit
could specify an exit condition like that given in figure 3.4.

If the coverage objective is Modified Condition Decision Coverage, test cases have to prove that both
error<desired accuracy and iterations=10 can independently affect the outcome of the decision.

Test Case 1: 10 iterations, error>desired accuracy for all iterations.

- Both parts of the condition are false for the first 9 iterations. On the tenth iteration, the first
part of the condition is false and the second part becomes true, showing that the iterations=10
part of the condition can independently affect its outcome.

Test Case 2: 2 iterations, error>=desired accuracy for the first iteration, and
error<desired accuracy for the second iteration.

- Both parts of the condition are false for the first iteration.
On the second iteration, the first part of the condition
becomes true and the second part remains false, showing
that the error<desired accuracy part of the condition can
independently affect its outcome.
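The two test cases above can be checked mechanically. The sketch assumes a hypothetical successive approximation unit (Newton's method, with the error taken as |guess² - x|) whose exit decision is `error < desired_accuracy or iterations == 10`. Figure 3.4 is not reproduced here, so the exact form of the condition is an assumption. Each evaluation of the decision is recorded, and `independent_conditions` looks for unique-cause MC/DC pairs: two evaluations in which only one condition changes and the decision outcome changes with it.

```python
def sqrt_successive(x, desired_accuracy, trace, max_iterations=10):
    """Hypothetical successive-approximation square root for x >= 0.

    Appends (accurate, at_limit, outcome) each time the exit decision is evaluated.
    """
    guess = x if x > 1 else 1.0
    iterations = 0
    while True:
        guess = (guess + x / guess) / 2  # Newton step
        iterations += 1
        error = abs(guess * guess - x)
        accurate = error < desired_accuracy
        at_limit = iterations == max_iterations
        trace.append((accurate, at_limit, accurate or at_limit))
        if accurate or at_limit:
            return guess


def independent_conditions(trace):
    """Conditions shown to independently affect 'accurate or at_limit'."""
    shown = set()
    evaluations = set(trace)
    for a1, b1, out1 in evaluations:
        for a2, b2, out2 in evaluations:
            if out1 == out2:
                continue  # the outcome must differ between the pair
            if a1 != a2 and b1 == b2:
                shown.add("error < desired accuracy")
            if b1 != b2 and a1 == a2:
                shown.add("iterations = 10")
    return shown


trace_1, trace_2 = [], []
sqrt_successive(1e30, 1e-6, trace_1)  # Test Case 1: never accurate, stops at the limit
sqrt_successive(4.0, 1.0, trace_2)  # Test Case 2: accurate on the second iteration
print(len(trace_1), len(trace_2))  # 10 2
print(sorted(independent_conditions(trace_1 + trace_2)))
```

Either trace on its own demonstrates only one of the two conditions; the pair of test cases together meets the MC/DC objective.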

Condition testing works best when a structural specification for the unit is available. It provides a
thorough test of complex conditions, an area of frequent programming and design error and an area
which is not addressed by branch testing. As for branch testing, it is important for test designers to
beware that concentrating on conditions could distract a test designer from the overall functionality
of a unit.

C.7. Data Definition-Use Testing

Data definition-use testing designs test cases to test pairs of data definitions and uses. A data
definition is anywhere that the value of a data item is set, and a data use is anywhere that a data
item is read or used. The objective is to create test cases which will drive execution through paths
between specific definitions and uses.

Like decision testing and condition testing, data definition-use testing can be used in combination
with a functional specification for a unit, but is better suited to use with a structural specification for
a unit.

Consider one of the earlier PDL specifications for the square root function which sent every input to
the maths co-processor and used the co-processor status to determine the validity of the result.
(Figure 3.3(c) - Specification 3). The first step is to list the pairs of definitions and uses. In this
specification there are a number of definition-use pairs, as shown in table 3.3.

These pairs of definitions and uses can then be used to design test cases. Two test cases are
required to test all six of these definition-use pairs:

Test Case 1: Input 4, Return 2

- Tests definition-use pairs 1, 2, 5, 6

Test Case 2: Input -10, Return 0, Output "Square root error - illegal negative input" using
Print_Line.

- Tests definition-use pairs 1, 2, 3, 4
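Recording definition-use pairs as they are exercised makes the coverage of each test case visible. Specification 3 (figure 3.3(c)) and table 3.3 are not reproduced here, so the sketch below uses a hypothetical Python analogue: `coprocessor_sqrt` stands in for the maths co-processor and returns a status flag along with the value. The six pair labels are our own and do not follow the numbering of table 3.3; in particular, the predicate use of the status is recorded separately for each outcome.

```python
import math

covered = set()  # def-use pairs exercised so far (hypothetical labels)


def coprocessor_sqrt(x):
    """Hypothetical stand-in for the maths co-processor: returns (status_ok, value)."""
    if x < 0:
        return False, 0.0
    return True, math.sqrt(x)


def square_root(x):
    status_ok, value = coprocessor_sqrt(x)  # use of x; definitions of status_ok, value
    covered.add("x: parameter -> co-processor call")
    if status_ok:  # predicate use of status_ok
        covered.add("status_ok: co-processor -> decision (true)")
        result = value  # computational use of value; definition of result
        covered.add("value: co-processor -> result")
        covered.add("result: valid branch -> return")
    else:
        covered.add("status_ok: co-processor -> decision (false)")
        print("Square root error - illegal negative input")
        result = 0.0  # definition of result on the error path
        covered.add("result: error branch -> return")
    return result  # use of result, reached directly from either definition


def pairs_exercised(test_input):
    """Run one test case and return the def-use pairs it exercised."""
    covered.clear()
    square_root(test_input)
    return set(covered)


pairs_1 = pairs_exercised(4)  # Test Case 1
pairs_2 = pairs_exercised(-10)  # Test Case 2
print(len(pairs_1), len(pairs_2), len(pairs_1 | pairs_2))  # 4 3 6
```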

The analysis needed to develop test cases using this design technique can also be useful for
identifying problems before the tests are even executed; for example, identification of situations
where data is used without having been defined. This is the sort of data flow analysis that some
static analysis tools can help with. The analysis of data definition-use pairs can become very complex,
even for relatively simple units. Consider what the definition-use pairs would be for the successive
approximation version of square root!

It is possible to split data definition-use tests into two categories: uses which affect control flow
(predicate uses) and uses which are purely computational. Refer to "Software Testing Techniques"
2nd Edition, B. Beizer, Van Nostrand Reinhold, New York, 1990, for a more detailed description of
predicate and computational uses.

C.8. Internal Boundary Value Testing

In many cases, partitions and their boundaries can be identified from a functional
specification for a unit, as described under equivalence partitioning and boundary value analysis
above. However, a unit may also have internal boundary values which can only be identified from a
structural specification. Consider a fragment of the successive approximation version of the square
root unit specification, as shown in figure 3.5 (derived from figure 3.3(d) - Specification 4).

The calculated error can be in one of two partitions about the desired accuracy, a feature of the
structural design for the unit which is not apparent from a purely functional specification. An analysis
of internal boundary values yields three conditions for which test cases need to be designed.

Test Case 1: Error just greater than the desired accuracy
Test Case 2: Error equal to the desired accuracy
Test Case 3: Error just less than the desired accuracy

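Internal boundary value testing can help to bring out some elusive bugs. For example, suppose "<="
had been coded instead of the specified "<": Test Cases 1 and 3 would give the same result either
way, and only Test Case 2 (error equal to the desired accuracy) would reveal the mistake.
Nevertheless, internal boundary value testing is a luxury to be applied only as a final supplement to
other test case design techniques.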
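The "<=" versus "<" example can be demonstrated directly. Assuming a hypothetical desired accuracy of 1e-6, and Python 3.9 or later for `math.nextafter`, the sketch compares the specified exit test with the miscoded one at the three internal boundary values and reports which test cases tell them apart.

```python
import math

DESIRED_ACCURACY = 1e-6  # hypothetical value; the source does not give one


def converged_as_specified(error):
    return error < DESIRED_ACCURACY


def converged_with_bug(error):
    # Hypothetical coding error: "<=" written instead of the specified "<".
    return error <= DESIRED_ACCURACY


INTERNAL_BOUNDARY_CASES = {
    "just greater": math.nextafter(DESIRED_ACCURACY, math.inf),
    "equal": DESIRED_ACCURACY,
    "just less": math.nextafter(DESIRED_ACCURACY, 0.0),
}


def cases_revealing_bug():
    """Names of the internal boundary test cases whose outcome exposes the bug."""
    return [
        name
        for name, error in INTERNAL_BOUNDARY_CASES.items()
        if converged_as_specified(error) != converged_with_bug(error)
    ]


print(cases_revealing_bug())  # ['equal']
```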

C.9. Error Guessing

Error guessing is based mostly upon experience, with some assistance from other techniques such as
boundary value analysis. Based on experience, the test designer guesses the types of errors that
could occur in a particular type of software and designs test cases to uncover them. For example, if
any type of resource is allocated dynamically, a good place to look for errors is in the deallocation of
resources. Are all resources correctly deallocated, or are some lost as the software executes?

Error guessing by an experienced engineer is probably the single most effective method of designing
tests which uncover bugs. A well placed error guess can show a bug which could easily be missed by
many of the other test case design techniques presented in this paper. Conversely, in the wrong
hands error guessing can be a waste of time.

To make the maximum use of available experience and to add some structure to this test case
design technique, it is a good idea to build a check list of types of errors. This check list can then be
used to help "guess" where errors may occur within a unit. The check list should be maintained with
the benefit of experience gained in earlier unit tests, helping to improve the overall effectiveness of
error guessing.
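The deallocation guess from the example above can be written as a reusable checklist entry. In this sketch, `TrackingAllocator`, `sum_positive_leaky` and `sum_positive_fixed` are hypothetical: the allocator counts outstanding allocations, and the check `leaks_on_error_path` forces an error part-way through a unit and then asks whether every resource was released.

```python
class TrackingAllocator:
    """Hypothetical allocator that counts resources not yet released."""

    def __init__(self):
        self.outstanding = 0

    def allocate(self):
        self.outstanding += 1
        return object()

    def release(self, handle):
        self.outstanding -= 1


def sum_positive_leaky(values, allocator):
    """Hypothetical unit: the buffer is never released if an error is raised."""
    buffer = allocator.allocate()
    total = 0
    for v in values:
        if v < 0:
            raise ValueError("negative value")
        total += v
    allocator.release(buffer)
    return total


def sum_positive_fixed(values, allocator):
    """Same unit, releasing the buffer on every exit path."""
    buffer = allocator.allocate()
    try:
        total = 0
        for v in values:
            if v < 0:
                raise ValueError("negative value")
            total += v
        return total
    finally:
        allocator.release(buffer)


def leaks_on_error_path(unit):
    """Checklist entry: are all resources released when the unit fails part-way?"""
    allocator = TrackingAllocator()
    try:
        unit([1, -2, 3], allocator)
    except ValueError:
        pass  # the error itself is expected; only the leak matters here
    return allocator.outstanding != 0


print(leaks_on_error_path(sum_positive_leaky), leaks_on_error_path(sum_positive_fixed))  # True False
```

Because the check takes the unit as a parameter, the same checklist entry can be applied to every unit that allocates resources.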

D. Conclusion

Experience has shown that a conscientious approach to unit testing will detect many bugs at a stage
of the software development where they can be corrected economically. A rigorous approach to unit
testing requires:

- That the design of units is documented in a specification before coding begins;
- That unit tests are designed from the specification for the unit, also preferably before coding
begins;
- That the expected outcomes of unit test cases are specified in the unit test specification.

The process for developing unit test specifications presented in this paper is generic, in that it can be
applied to any level of testing. Nevertheless, there will be circumstances where it has to be tailored
to specific situations. Tailoring of the process and the use of test case design techniques should be
documented in the overall test strategy.

48. LITERATURE REVIEW

2.1 Introduction

The purpose of this dissertation is to increase understanding of how experienced practitioners as
individuals evaluate diagrammatic models in Formal Technical Review (FTR). In this research, those
aspects of FTR relating to evaluation of an artifact by practitioners as individuals are referred to as
Practitioner Evaluation (PE). The relevant FTR literature is reviewed for theory and research
applicable to PE. However, FTR developed pragmatically without relation to underlying cognitive
theory, and the literature consists primarily of case studies with a very limited number of controlled
experiments.

Other work on the evaluation of diagrams and graphs is also reviewed for possible theoretical models
that could be used in the current research. Human-Computer Interaction (HCI) is an Information
Systems area that has drawn extensively on cognitive science to develop and evaluate Graphical
User Interfaces (GUIs). A brief overview of cognitive-based approaches utilized in HCI is presented.
One of these approaches, the Human Information Processing System model, in which the human
mind is treated as an information-processing system, provides the cognitive theoretical model for
this research and is discussed separately because of its importance. Work on attention and the
comprehension of graphics is also briefly reviewed.

Two further areas are identified as necessary for the development of the research task and tools: (1)
types of diagrammatic models and (2) types of software defects. Relevant work in each of these
areas is briefly reviewed and, since typologies appropriate to this research were not located,
appropriate typologies are developed.

2.2 Formal Technical Review

Software review as a technique to detect software defects is not new -- it has been used since the
earliest days of programming. For example, Babbage and von Neumann regularly asked colleagues
to examine their programs [Freedman and Weinberg 1990], and in the 1950s and 1960s, large
software projects often included some type of software review [Knight and Myers 1993]. However,
the first significant formalization of software review practice is generally considered to be the
development by Michael Fagan [1976] of a species of FTR that he called "inspection."

Following Tjahjono [1996, 2], Formal Technical Review may be defined as any "evaluation technique
that involves the bringing together of a group of technical [and sometimes non-technical] personnel
to analyze a software artifact, typically with the goal of discovering errors or other anomalies." As
such, FTR has the following distinguishing characteristics:

1. Formal process.
2. Use of groups or teams. Most FTR techniques involve real groups, but nominal groups are used as
well.
3. Review by knowledgeable individuals or practitioners.
4. Focus on detection of defects.

2.2.1 Types of Formal Technical Review


While the focus of this research is on the individual evaluation aspects of reviews, for context several
other FTR techniques are discussed as well. Among the most common forms of FTR are the
following:

1. Desk Checking, or reading over a program by hand while sitting at one's desk, is the oldest
software review technique [Adrion et al. 1982]. Strictly speaking, desk checking is not a form of FTR
since it does not involve a formal process or a group. Moreover, desk checking is generally perceived
as ineffective and unproductive due to (a) its lack of discipline and (b) the general ineffectiveness of
people in detecting their own errors. To correct for the second problem, programmers often swap
programs and check each other's work. Since desk checking is an individual process not involving
group dynamics, research in this area would be relevant but none applicable to the current research
was found.
It should be noted that Humphrey [1995] has developed a review method, called Personal Review
(PR), which is similar to desk checking. In PR, each programmer examines his own products to find
as many defects as possible utilizing a disciplined process in conjunction with Humphrey's Personal
Software Process (PSP) to improve his own work. The review strategy includes the use of checklists
to guide the review process, review metrics to improve the process, and defect causal analysis to
prevent the same defects from recurring in the future. The approach taken in developing the
Personal Review process is an engineering one; no reference is made in Humphrey [1995] to
cognitive theory.
2. Peer Rating is a technique in which anonymous programs are evaluated in terms of their overall
quality, maintainability, extensibility, usability and clarity by selected programmers who have
similar backgrounds [Myers 1979]. Shneiderman [1980] suggests that peer ratings of programs
are productive, enjoyable, and non-threatening experiences. The technique is often referred to as
Peer Reviews [Shneiderman 1980], but some authors use the term peer reviews for generic
review methods involving peers [Paulk et al 1993; Humphrey 1989].

3. Walkthroughs are presentation reviews in which a review participant, usually the software
author, narrates a description of the software and the other members of the review group provide
feedback throughout the presentation [Freedman and Weinberg 1990; Gilb and Graham 1993]. It
should be noted that the term "walkthrough" has been used in various ways in the literature. Some
authors combine it with "structured" and treat it as a disciplined, formal review process [Myers
1979; Yourdon 1989; Adrion et al. 1982]. However, the literature generally describes
walkthrough as an undisciplined process without advance preparation on the part of reviewers
and with the meeting focus on education of participants [Fagan 1976].

4. Round-robin Review is an evaluation process in which a copy of the review materials is made
available and routed to each participant; each reviewer writes comments and questions concerning
the materials and passes the materials, with comments, to the next reviewer, and eventually to the
moderator or author [Hart 1982].

5. Inspection was developed by Fagan [1976, 1986] as a well-planned and well-defined group
review process to detect software defects – defect repair occurs outside the scope of the
process. The original Fagan Inspection (FI) is the most cited review method in the literature and
is the source for a variety of similar inspection techniques [Tjahjono 1996]. Among the FI-derived
techniques are Active Design Review [Parnas and Weiss 1987], Phased Inspection [Knight and
Myers 1993], N-Fold Inspection [Schneider et al. 1992], and FTArm [Tjahjono 1996]. Unlike the
review techniques previously discussed, inspection is often used to control the quality and
productivity of the development process.

A Fagan Inspection consists of six well-defined phases:

i. Planning. Participants are selected and the materials to be reviewed are prepared and
checked for review suitability.
ii. Overview. The author educates the participants about the review materials through a
presentation.
iii. Preparation. The participants learn the materials individually.
iv. Meeting. The reader (a participant other than the author) narrates or paraphrases the review
materials statement by statement, and the other participants raise issues and questions.
Questions continue on a point only until an error is recognized or the item is deemed correct.
v. Rework. The author fixes the defects identified in the meeting.
vi. Follow-up. The "corrected" products are reinspected.

Practitioner Evaluation is primarily associated with the Preparation phase.

In addition to classification by technique-type, FTR may also be classified on other dimensions,
including the following:

A. Small vs. Large Team Reviews. Siy [1996] classifies reviews into those conducted by small
(1-4 reviewers) [Bisant and Lyle 1996] and large (more than 4 reviewers) [Fagan 1976,
1986] teams. If each reviewer depends on different expertise and experiences, a large team
should allow a wider variety of defects to be detected and thus better coverage. However, a
large team requires more effort due to more individuals inspecting the artifact, generally
involves greater scheduling problems [Ballman and Votta 1994], and may make it more
difficult for all participants to participate fully.

B. No vs. Single vs. Multiple Session Reviews. The traditional Fagan Inspection provided for
one session to inspect the software artifact, with the possibility of a follow-up session to
inspect corrections. However, variants have been suggested.

Humphrey [1989] comments that three-quarters of the errors found in well-run inspections
are found during preparation. Based on an economic analysis of a series of inspections at
AT&T, Votta [1993] argues that inspection meetings are generally not economic and should
be replaced with depositions, where the author and (optionally) the moderator meet
separately with inspectors to collect their results.

On the other hand, some authors [Knight and Myers 1993; Schneider et al. 1992] have
argued for multiple sessions, conducted either in series or parallel. Gilb and Graham [1993]
do not use multiple inspection sessions but add a root cause analysis session immediately
after the inspection meeting.

C. Nonsystematic vs. Systematic Defect-Detection Technique Reviews. The most frequently used
detection methods (ad hoc and checklist) rely on nonsystematic techniques, and reviewer
responsibilities are general and not differentiated for single session reviews [Siy 1996].
However, some methods employ more prescriptive techniques, such as questionnaires [Parnas
and Weiss 1987] and correctness proofs [Britcher 1988].

D. Single Site vs. Multiple Site Reviews. The traditional FTR techniques have assumed that the
group-meeting component would occur face-to-face at a single site. However, with improved
telecommunications, and especially with computer support (see item F below), it has become
increasingly feasible to conduct even the group meeting from multiple sites.

E. Synchronous vs. Asynchronous Reviews. The traditional FTR techniques have also
assumed that the group meeting component would occur in real-time; i.e., synchronously.
However, some newer techniques that eliminate the group meeting or are based on computer
support utilize asynchronous reviews.

F. Manual vs. Computer-supported Reviews. In recent years, several computer-supported
review systems have been developed [Brothers et al. 1990; Johnson and Tjahjono 1993;
Gintell et al. 1993; Mashayekhi et al. 1994]. The type of support varies from simple
augmentation of the manual practices [Brothers et al. 1990; Gintell et al. 1993] to totally new
review methods [Johnson and Tjahjono 1993].

2.2.2 Economic Analyses of Formal Technical Review


Wheeler et al. [1996], after reviewing a number of studies that support the economic benefit of FTR,
conclude that inspections reduce the number of defects throughout development, cause defects to be
found earlier in the development process where they are less expensive to correct, and uncover
defects that would be difficult or impossible to discover by testing. They also note "these benefits are
not without their costs, however. Inspections require an investment of approximately 15 percent of
the total development cost early in the process [p. 11]."
In discussing overall economic effects, Wheeler et al. cite Fagan [1986] to the effect that investment
in inspections has been reported to yield a 25-to-35 percent overall increase in productivity. They
also reproduce a graphical analysis from Boehm [1987] that indicates inspections reduce total
development cost by approximately 30%.

The Wheeler et al. [1996] analysis does not specify the relative value of Practitioner Evaluation to
FTR, but two recent economic analyses provide indications.

• Votta [1993]. After analyzing data collected from 13 traditional inspections conducted at AT&T,
Votta reports that the approximately 4% increase in faults found at collection meetings (synergy)
does not economically justify the development delays caused by the need to schedule meetings
and the additional developer time associated with the actual meetings. He also argues that it is
not cost-effective to use the collection meeting to reduce the number of items incorrectly
identified as defective prior to the meeting ("false positives"). Based on these findings, he
concludes that almost all inspection meetings requiring all reviewers to be present should be
replaced with Depositions, which are three-person meetings with only the author, moderator, and
one reviewer present.

• Siy [1996]. In his analysis of the factors driving inspection costs and benefits, Siy reports that
changes in FTR structural elements, such as group size, number of sessions, and coordination of
multiple sessions, were largely ineffective in improving the effectiveness of inspections. Instead,
inputs into the process (reviewers and code units) accounted for more outcome variation than
structural factors. He concludes by stating "better techniques by which reviewers detect defects,
not better process structures, are the key to improving inspection effectiveness [Abstract, p. 2]."
(emphasis added)

Votta's analysis effectively attributes most of the economic benefit of FTR to PE, and Siy's explicitly
states that better PE techniques "are the key to improving inspection effectiveness." These findings,
if supported by additional research, would further support the contention that a better understanding
of Practitioner Evaluation is necessary.

2.2.3 Psychological Aspects of FTR

Work on the psychological aspects of FTR can be categorized into four groups.

1. Egoless Programming. Gerald Weinberg [1971] began the examination of psychological issues
associated with software review in his work on egoless programming. According to Weinberg,
programmers are often reluctant to allow their programs to be read by other programmers
because the programs are often considered to be an extension of the self and errors discovered
in the programs to be a challenge to one's self-image. Two implications of this theory are as
follows:
i. The ability of a programmer to find errors in his own work tends to be impaired since he tends
to justify his own actions, and it is therefore more effective to have other people check his
work.

ii. Each programmer should detach himself from his own work. The work should be considered a
public property where other people can freely criticize, and thus, improve its quality;
otherwise, one tends to become defensive, and reluctant to expose one's own failures.

These two concepts have led to the justification of FTR groups, as well as the establishment of
independent quality assurance groups that specialize in finding software defects in many software
organizations [Humphrey 1989].

2. Role of Management. Another psychological aspect of FTR that has been examined is the
recording of data and its dissemination to management. According to Dobbins [1987], this must
be done in such a way that individual programmers will not feel intimidated or threatened.

3. Positive Psychological Impacts. Hart [1982] observes that reviews can make one more
careful in writing programs (e.g., double checking code) in anticipation of having to present or
share the programs with other participants. Thus, errors are often eliminated even before the
actual review sessions.

4. Group Process. Most FTR methods are implemented using small groups. Therefore, several key
issues from small group theory apply to FTR, such as groupthink (tendency to suppress dissent in
the interests of group harmony), group deviants (influence by minority), and domination of the
group by a single member. Other key issues include social facilitation (presence of others boosts
one's performance) and social loafing (one member free rides on the group's effort) [Myers 1990].
The issue of moderator domination in inspections is also documented in the literature [Tjahjono
1996].

Perhaps the most interesting research from the perspective of the current study is that of Sauer
et al. [2000]. This research is unusual in that it has an explicit theoretical basis and outlines a
behaviorally motivated program of research into the effectiveness of software development
technical reviews. The finding that most of the variation in effectiveness of software development
technical reviews is the result of variations in expertise among the participants provides
additional motivation for developing a solid understanding of Formal Technical Review at the
individual level.

It should be noted that all of this work, while based on psychological theory, does not address the
issue of how practitioners actually evaluate software artifacts.

2.3 Approaches to the Evaluation of Diagrammatic Models

The focus of this dissertation is the exploration of how practitioners as individuals evaluate
diagrammatic models for semantic errors that would cause the resulting system not to meet the
functionality, performance, security, usability, maintainability, testability or other requirements
necessary to the purposes of the system [Bass et al. 1998; Boehm et al. 1978].

2.3.1 General Approaches

Information Systems is an applied discipline that traditionally adapts concepts and techniques from
reference disciplines such as management, psychology, and engineering to solve information
systems problems. In searching for a theoretical model that could be used in the current research,
three separate approaches were explored.

1. Computer Aided Design (CAD). Since CAD uses diagrams to specify the design and
construction of physical entities [Yoshikawa and Warman 1987], it seemed reasonable to assume
that techniques developed to evaluate CAD diagrams might be adapted for the evaluation of
diagrams used to specify software systems. However, a review of the literature found relatively
little literature on the evaluation of CAD diagrams, and that which was found pertained to the
formal (i.e., "mathematical") evaluation of circuit designs. Discussion with William Miller of the
University of South Florida Engineering faculty supported this conclusion [Miller 2000], and this
approach was abandoned.

2. Radiological Images. While x-rays are not technically diagrams and do not specify a system,
they are visual artifacts and do convey information. Therefore, it was reasoned that rules for reading
radiological images might provide insights into the evaluation of software diagrammatic models.
Review of the literature found nothing appropriate. More importantly, as further conceptual work was
done regarding the purposes of evaluating software diagrammatic models, it became apparent that
the reading of x-rays was not an appropriate analog. This approach was therefore also abandoned.

3. Human-Computer Interaction (HCI). In reviewing the HCI literature, the following facts were
noted:

• The language, concepts, and purposes of HCI are very similar to those of information
systems, and it is arguable that HCI is a part of information systems. (See, for example, the
Huber [1983] and Robey [1983] debate on cognitive style and DSS design.)
• HCI is solidly rooted in psychology, a traditional information systems reference discipline.
• Computer user-interfaces almost always have a visual component and are increasingly
diagrammatic in design.
• User-interfaces can be and are evaluated in terms of the semantic error criteria described
above; i.e., defects in functionality, performance, efficiency, etc.

Based on these facts, a decision was made to attempt to identify an HCI evaluation technique
that could be adapted for evaluation of software diagrammatic models.

2.3.2 Human-Computer Interaction

Human-computer interaction (HCI) has been defined as "the processes, dialogues . . . and actions
that a user employs to interact with a computer environment [Baecker and Buxton 1987, 40]."

2.3.2.1 HCI Evaluation Techniques

Mack and Nielsen [1994] identify eight usability inspection techniques:

1. Heuristic Evaluation. Heuristic evaluation is an informal method that involves having usability
specialists judge whether each dialogue element conforms to established usability principles or
heuristics. Nielsen, the author of the technique, recommends that evaluators go through the
interface twice and notes that "[t]his two-pass approach is similar in nature to the phased
inspection method for code inspection (Knight and Myers 1993) [Nielsen 1994, 29]."

2. Guideline Reviews. Guideline reviews are inspections where an interface is checked for
conformance with a comprehensive list of guidelines. Nielsen and Mack note that "since guideline
documents contain on the order of 1,000 guidelines, guideline reviews require a high degree of
expertise and are fairly rare in practice [Nielsen and Mack 1994, 5]."

3. Pluralistic Walkthroughs. A pluralistic walkthrough is a meeting in which users, developers,
and human factors experts step through a scenario, discussing usability issues associated with
dialogue elements involved in the scenario steps.

4. Consistency Inspections. Consistency inspections have designers representing multiple
projects inspect an interface to see whether it is consistent with other interfaces in the "family" of
products.

5. Standards Inspections. In a standards inspection, an expert on some interface standard checks
the interface for compliance with that standard.

6. Cognitive Walkthroughs. Cognitive walkthroughs use an explicitly detailed procedure to
simulate a user's problem-solving process at each step in the human-computer dialog, checking
to see if the simulated user's goals and memory for actions can be assumed to lead to the next
correct action.

7. Formal Usability Inspections. Formal usability inspections are designed to be very similar to
the Fagan Inspection used in code reviews.

8. Feature Inspections. In feature inspections the focus is on the functionality provided by the
software system being inspected; i.e., whether the function as designed meets the needs of the
intended end users.

These HCI evaluation techniques are clearly similar to FTR in that they involve the use of
knowledgeable individuals to detect defects in a software artifact; most also involve a formal process
and a group.

2.3.2.2 Cognitive Psychology and HCI


To assist in the design of better dialogues, HCI researchers have attempted to apply the findings of
cognitive psychology since, all other factors being equal, an interface that requires less short-term
memory resources or can be manipulated more quickly because fewer cognitive steps are required
should be superior. The following is a brief overview of cognitive-based approaches utilized in HCI.

• Human Information Processing System (HIPS). During the 1960s and 1970s, the main
paradigm in cognitive psychology was to characterize humans as information processors that
processed information much like a computer. While some of the assumptions of the original
model proved to be overly restrictive and other approaches have become popular, updated HIPS
models continue to be useful for HCI research. Given the importance of this model for this
research, a more complete treatment is provided in Section 2.4.1 below.

• Computational approaches also adopt the computer metaphor as a theoretical framework but
conceptualize the cognitive system in terms of the goals, planning, and action involved in task
performance. Tasks are analyzed not in terms of the amount of information processed in the
various stages but in terms of how the system deals with new information [Preece et al. 1994].

• Connectionist approaches simulate behavior through neural network or Parallel Distributed
Processing (PDP) models in which cognition is represented as a web of interconnected nodes.
Connectionist models have become increasingly accepted in cognitive psychology [Ashcraft
1994], and this fact has been reflected in HCI research [Preece et al. 1994].

• Human Factors/Actors. Bannon [1991, 28] argues that the term human factors should be
replaced with the term human actors to indicate "emphasis is placed on the person as an
autonomous agent that has the capacity to regulate and coordinate his or her behavior, rather
than being a simple passive element in a human-machine system." The change is supposed to
facilitate focusing on the way people act in real work settings instead of viewing them as
information processors.

• Distributed Cognition. An emerging theoretical framework is distributed cognition. The goal of
distributed cognition is to conceptualize cognitive activities as embodied and situated within the
work context in which they occur [Hutchins 1990; Hutchins and Klausen 1992].

The human factors/actors and distributed cognition models are not appropriate to the current
study. The connectionist models show great promise but are not yet sufficiently developed to be
useful for this research. The information processor models, however, are appropriate and sufficiently
mature; they provide the primary cognitive theoretical base for the dissertation. Computational
approaches are also utilized in that the study analyzes the cognitive system in terms of the task
planning involved in task performance.

2.4 Human Information Processing System (HIPS) Models and Related Topics

2.4.1 General Model

One of the major paradigms in cognitive science is the Human Information Processing System model.
In this model, humans are characterized as information processors, in which information enters the
mind, is processed in a series of ordered stages, and then exits [Preece et al. 1994]. Figure 2.1
summarizes one version of the basic model [Barber 1988].

Figure 2.1 Human Information Processing Stages (adapted from Barber [1988])

An early attempt to apply the model was Card et al.'s The Psychology of Human-Computer
Interaction [1983]. In that work, the authors stated that the human mind is also an information-
processing system and developed a simplified model of it that they called the Model Human
Processor. Based on this model, they made predictions about the usability of various user interfaces,
performed experiments, and reported their findings. The results were equivocal, and subsequent
cognitive psychology research has shown that the serial stage approach to cognition of the original
model is overly simplistic.

The original model also did not include memory and attention. Later versions do include these
processes, and Cowan [1995], in his exhaustive examination of the intersection of memory and
attention, discusses a number of these. Figure 2.2 summarizes a model that does include memory
and attention [Barber 1988].

Figure 2.2 Extended Stages of the Information Processing Model (adapted from Barber
[1988])

HIPS models, such as Anderson's ACT-R [1993], continue to be developed and are useful. Further,
the information processing approach has recently been described as the primary metatheory of
cognitive psychology [Ashcraft 1994].

2.4.2 Coping with Attention as a Limited Resource

One of the earliest psychological definitions of attention is that of William James [1890, vol. 1, 403-
404]:

Everyone knows what attention is. It is the taking possession of the mind, in clear and
vivid form, of one out of what seem several simultaneously possible objects or trains of
thought. Focalization, concentration of consciousness are of its essence. It implies
withdrawal from some things in order to deal more effectively with others . . . (emphasis
added)

This appeal to intuition explicitly states that attention is a limited resource.


In reaction to the introspection methodology of James, the Behaviorist movement asserted that the
study of internal representations and processes was unscientific. Since behaviorists dominated
American psychological thought during the first half of the twentieth century, little or no work was
done on attention in America during this period. In Europe, Gestalt psychology became dominant at
this time and that school, while not actively hostile to attention studies, did not encourage work in
the area. World War II however led to a rethinking of psychological approaches and acceptance of
using the experimental techniques developed by the behaviorists to study internal states and
processes [Cowan 1995].

An example of this rethinking is the work of Broadbent [1952] and Cherry [1953]. They used a
technique to study attention in which different spoken messages are presented to a subject's two
ears at the same time. Their research shows that subjects are able to attend to one message if the
messages are distinguished by physical (rather than merely semantic) cues, but recall almost
nothing of the nonattended channel. In 1956, Miller reviewed a series of experiments that utilized a
different methodology and noted that, across many domains, subjects could keep in mind no more
than about seven "chunks" simultaneously. These findings were among the first experimental
evidence that attentional capacity is a limited resource.

More recent experimental work continues to indicate that attention is a limited resource [Cowan
1995]. Even those cognitive psychologists who have recently challenged the very concept of
attention assume that their analog of "attention" is limited. Examples include Allport
[1980] and Wickens [1984], who argue that the concept of attention should be replaced with
the concept of multiple limited processing resources.

Based on an examination of the exhaustive review by Cowan [1995] of the intersection of memory
and attention, the Shiffrin [1988, 739] definition appears to be representative of contemporary
thought:

Attention has been used to refer to all those aspects of human cognition that the subject
can control . . . and to all aspects of cognition having to do with limited resources or
capacity, and methods of dealing with such constraints. (emphasis added)

Since human cognitive resources are limited, cognitively complex tasks may overload these
resources and decrease the quality and/or quantity of outputs. Various approaches to measuring the
cognitive complexity of tasks have been developed. In HCI, an informal view of complexity is often
utilized. For example, Grant [1990, sec. 1.3] defines a complex task as “one for which there are a
large number of potential practical strategies.” This definition is not inconsistent with the measure
assumed by Simon [1962] in his paper on the use of hierarchical decomposition to decrease the
complexity of problem-solving.

Simon [1990] argues that humans develop mechanisms to enable them to deal with complex, real-
life situations despite their limited cognitive resources. One such mechanism is task planning.
According to Fredericksen and Breuleaux [1990], task planning is a cognitive bargain in which the
time and effort spent working with an abstract, and therefore smaller, problem space during planning
minimizes actual work on the task in the original, detailed problem space.

Earley and Perry [1987, 279] define a task plan as "a cognitively based routine for attaining a
particular objective and consists of multiple steps." Newell and Simon [1972] identify planning from
verbal protocols as those passages in which:

1. a person is considering abstract specifications of the action/information transformations required
to achieve goals;
2. a person considers sequences of two or more such actions or transformations; and
3. after developing the sequences, some or all of them are actually performed.

Two further items should be noted regarding planning:

1. Not all planning is original. Successful plans learned from others or by experience may be
stored in memory or externally [Newell and Simon 1972; Wood and Locke 1990]. Without the
recall, modification, and use of previous plans, the development of expertise would be
impossible.

2. Planning is not complete before action. Both theory and analysis of verbal protocols indicate that
periods of planning are interleaved with action [McDermott 1978; Newell and Simon 1972]. In
other words, practitioners will often plan a response to part of a task, complete some or all of the
actions specified in that plan, plan a new response incorporating information acquired during
prior action period(s), complete the new actions, etc.

2.4.3 Application of the HIPS Model to This Research

In the HIPS model, the nature and amount of stimuli impact both information processing and output.
This research uses a key concept of the HIPS model, attention, in two ways:

1. Attention is a critical and limited resource, and when attention is overloaded, outputs decrease in
quality and quantity; therefore, a meta-cognitive strategy such as task planning that minimizes
attentional load should improve outputs.

2. Patterns are another meta-cognitive strategy for minimizing attentional load; therefore,
understanding which patterns better support the cognitive processing associated with evaluation
of diagrammatic models may allow individuals to be trained to use these better patterns, thus
lessening their attentional load and improving their outputs.

2.5 Research on the Comprehension of Graphics

Larkin and Simon [1987] consider why diagrams can be superior to a verbal description for solving
problems, and suggest the following reasons:

• Diagrams can group together all information that is used together, thus avoiding large amounts
of search for the elements needed to make a problem-solving inference.
• Diagrams typically use location to group information about a single element, avoiding the need to
match symbolic labels.
• Diagrams automatically support a large number of perceptual inferences, which are extremely
easy for humans.

As noted in Chapter 1, two of these depend on spatial patterns.

Winn [1994] presents an overview of how the symbol system of graphics interacts with the viewers'
perceptual and cognitive processes, which is summarized in figure 2.3. In his description, the
graphical symbol system consists of two elements: (1) symbols that bear an unambiguous one-to-one
relationship to objects in the domain of reference, and (2) the spatial relations of the symbols to
each other. Thus, how symbols are configured spatially will affect the way viewers understand how
the associated objects are related and interact. For the purposes of this dissertation, a particularly
interesting finding is that biases based on reading direction (left-to-right for English) affect the
interpretation of graphics.

Figure 2.3. Winn [1994] Processes Involved in the Perception and Comprehension of
Graphics

Zhang [1997] proposes a theoretical framework for problem solving based on external representations.
In an experiment using a Tic-Tac-Toe board and its logical isomorphs, she found that Tic-Tac-Toe
behavior is determined by the configuration of the board. External
representations are thus shown to be more than just memory aids and a representational
determinism is suggested. This last point is particularly relevant to this dissertation since it states
that the form of representation determines what information can be perceived in a diagram.
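A logical isomorph is a problem with the same abstract structure as another but a different external representation. Zhang's actual isomorphs are not reproduced here; as an illustrative assumption, the sketch below uses a classic isomorph of Tic-Tac-Toe, the number game in which players alternately pick numbers from 1 to 9 and win by holding three that sum to 15. It checks that the winning sets of that game are exactly the eight lines (rows, columns, diagonals) of a 3 x 3 magic square. All names are hypothetical.

```python
from itertools import combinations

# Lo Shu magic square: every row, column and diagonal sums to 15.
MAGIC = [[2, 7, 6],
         [9, 5, 1],
         [4, 3, 8]]


def board_lines(square):
    """Return the eight winning lines of a 3 x 3 board as lists of cell values."""
    rows = [list(row) for row in square]
    cols = [[square[r][c] for r in range(3)] for c in range(3)]
    diags = [[square[i][i] for i in range(3)],
             [square[i][2 - i] for i in range(3)]]
    return rows + cols + diags


def winning_lines_as_sets(square):
    """Tic-Tac-Toe representation: winning lines, as unordered sets of cell labels."""
    return {frozenset(line) for line in board_lines(square)}


def number_game_winning_sets():
    """Number representation: every set of three distinct picks from 1..9 summing to 15."""
    return {frozenset(t) for t in combinations(range(1, 10), 3) if sum(t) == 15}


if __name__ == "__main__":
    # Same abstract structure, different external representation.
    print(winning_lines_as_sets(MAGIC) == number_game_winning_sets())
```

The two games are logically identical, yet only the board version makes a threatened line visible at a glance; in the number version a player has to compute sums to see the same threat.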

2.6 Types of Diagrammatic Models

Selection of diagrammatic models to be included in the research task requires an appropriate
typology. Two diagrammatic model typologies were examined, Wieringa [1998] and Visible Systems
[1999].

2.6.1 Wieringa 1998

Wieringa, in his discussion of graphical structures or models that may be used in software
specification techniques, lists four general classes:

1. Decomposition Specification Techniques. These represent the conceptual structure of data in
a database system. Examples include Entity-Relationship Diagrams (ERDs) and such ERD
extensions as OO class diagrams.

2. Communication Specification Techniques. These show how the conceptual components
interact to realize external system interactions. Examples include Dataflow Diagrams (DFDs),
Context Diagrams, SADT Activity Diagrams, Object Communication Diagrams, SDL Block
Diagrams, Sequence Diagrams, and Collaboration Diagrams.

3. Function Specification Techniques. These specify the external functions of a system or the
functions of system components. Examples include Function Refinement Trees, Event-Response
Specifications, and Use Case Diagrams.

4. Behavior Specification Techniques. These show how functions of a system or its components
are ordered in time. Examples include Process Graphs, JSD Process Structure Diagrams, Finite
(and Extended Finite) State Diagrams, Mealy Machines, Moore Machines, Statecharts, and
Process Dependency Diagrams.

2.6.2 Visible Systems

The methods listing in Visible Systems [1999] was examined as a representative of practitioner-
oriented, CASE-tools-based typologies. Seven models are listed; of these, six are diagrammatic in
nature.

1. Functional Decomposition Model. Shows the business functions and the processes they
support drawn in a hierarchical structure; also known as the Business Model. This type of model
is of a high-level functional nature and specifically applies to functions and not to the data that
those functions use. It is generally appropriate for defining the overall functioning of an
enterprise, not for individual projects.

2. Data Model. Shows the data entities of an application and the relationships between the entities.
Entities and relationships can be selected in subsets to produce views of the data model. The
diagramming technique normally used to depict graphically the data model is the Entity
Relationship Diagram (ERD) and the model is sometimes referred to as the Entity-Relationship
Model.

3. Process Model. Shows how things occur in the organization via a sequence of processes,
actions, stores, inputs and outputs. Processes are decomposed into more detail, producing a
layered hierarchical structure. The diagramming technique used for process modeling in
structured analysis is the Data Flow Diagram (DFD). Several notations are available for
representing process modeling, with the most widely used being Yourdon/DeMarco and Gane &
Sarson.

4. Product Model. Shows a hierarchical, top-down design map of how the application is to be
programmed, built, integrated, and tested. The modeling technique used in structured design is
the structure chart. It is a tree or hierarchical diagram that defines the overall architecture of a
program or system by showing the program modules and their interrelationships.

5. State Transition Model (Real Time Model). Shows how objects transition to and from various
states or conditions and the events or triggers that cause them to change between the different
states.

6. Object Class Model. Shows classes of objects, subclasses, aggregations and inheritance and
defines structures and packaging of data for an application.

2.6.3 Evaluation of Typologies in Prior Work

In evaluating these two typologies for this research, two problems were noted:

1. Neither classification scheme includes diagrammatic representations of Graphical User Interfaces
(GUIs). While such representations are not technically graphs (and thus not discussed by Wieringa)
and are not listed in Visible Systems, they may be used to specify parts of a system and are
therefore appropriate to this research.

2. Wieringa's work is based on the theoretical characteristics of graphs, while Visible Analyst is
representative of practitioner-oriented, CASE-tool-based typologies. Neither is appropriate to the
research of this dissertation since neither captures factors likely to affect the cognitive processing
of practitioners in evaluating software diagrammatic models.

While it would be relatively easy to add diagrammatic representations of GUIs to Wieringa or Visible
Analyst, it was concluded that the second problem disqualified them for the purposes of this
research. Further review of several leading systems analysis and design texts [Fertuck 1995; Hoffer
et al. 1998; Kendall and Kendall 1995] did not yield an appropriate typology of diagrammatic
models, and it was therefore deemed necessary to develop one specifically for this dissertation.

2.6.4 Diagrammatic Model Typology Development

The first step in the development process was to consult several texts on systems analysis and
design and on structured techniques for classification insights and to derive lists of commonly used
diagrammatic models. These included Fertuck [1995], Hoffer et al. [1998], Kendall and Kendall
[1995], and Martin and McClure [1985].

Martin and McClure make a major distinction between hierarchical diagrams (i.e., those having one
overall node or root and which do not remerge) and mesh or network diagrams (i.e., those not
having a single overall node or root or which do remerge). For the purposes of this research, this
distinction is operationalized as the categorical variable hierarchical/not hierarchical.

Martin and McClure also make a major distinction between diagrams showing sequence and those
that do not. Sequence usually implies temporal directionality; for this dissertation, the distinction is
broadened to include the possibility of logical and other forms of directionality and is operationalized
as the categorical variable directional/not directional.
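The hierarchical/not hierarchical variable can be made concrete as a graph test. This is a minimal sketch, not Martin and McClure's own procedure: it assumes a diagram is reduced to a list of nodes and directed parent-to-child edges, and the function name and encoding are hypothetical. A diagram counts as hierarchical here when it has exactly one root, no node has more than one parent (nothing remerges), and every node is reachable from the root.

```python
from collections import defaultdict


def is_hierarchical(nodes, edges):
    """Hypothetical check: True if the diagram is a single-rooted tree.

    nodes: iterable of node labels
    edges: iterable of (parent, child) pairs
    """
    nodes = list(nodes)
    parents = defaultdict(list)
    children = defaultdict(list)
    for parent, child in edges:
        parents[child].append(parent)
        children[parent].append(child)

    # "One overall node or root": exactly one node without a parent.
    roots = [n for n in nodes if not parents[n]]
    if len(roots) != 1:
        return False

    # "Do not remerge": no node may be reached from two different parents.
    if any(len(parents[n]) > 1 for n in nodes):
        return False

    # Every node must hang off the root (rules out detached cycles).
    seen, stack = set(), [roots[0]]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(children[node])
    return seen == set(nodes)
```

For example, the edges root→a, root→b, a→c pass the test; adding b→c makes node c remerge, so the same diagram would be classed as not hierarchical.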

A distinction found in all texts referenced is between data-oriented and process-oriented diagrams.
Inspection of diagram types shows that the distinction is actually a data/process orientation
continuum. For the purposes of this dissertation, this continuum is collapsed into the categorical
variable data/hybrid/process oriented.
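Crossing the three categorical variables gives 2 x 2 x 3 = 12 possible categories. The sketch below enumerates them; the ordering (hierarchy varying slowest, orientation fastest) is an assumption chosen to reproduce the I to XII column numbering used in the table headers, and the variable names are hypothetical. The populated set is taken from the column headers of Table 2.2.

```python
from itertools import product

HIERARCHY = ("hierarchical", "not hierarchical")
DIRECTION = ("directional", "not directional")
ORIENTATION = ("data", "hybrid", "process")
NUMERALS = ("I", "II", "III", "IV", "V", "VI",
            "VII", "VIII", "IX", "X", "XI", "XII")

# product() varies the last variable fastest, so I-III are hierarchical/directional
# data, hybrid, process; X-XII are not hierarchical/not directional.
CATEGORIES = dict(zip(product(HIERARCHY, DIRECTION, ORIENTATION), NUMERALS))

# Categories that remain populated after collapsing (Table 2.2 column headers).
POPULATED = {"I", "II", "III", "VIII", "IX", "X", "XI"}
COLLAPSED = {key: num for key, num in CATEGORIES.items() if num in POPULATED}
```

Looking up `CATEGORIES[("not hierarchical", "not directional", "hybrid")]` yields `"XI"`, and `COLLAPSED` keeps 7 of the 12 keys.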

As a test of the feasibility of the classification scheme, twenty diagram types from Martin and
McClure, UML diagrams from Harmon and Watson [1998], and a model of a "typical" GUI were then
categorized. The results of this categorization are shown in table 2.1.

Table 2.1 Diagrammatic Model Types

Categories:
HIERARCHICAL, DIRECTIONAL: I Data, II Hybrid, III Process
HIERARCHICAL, NOT DIRECTIONAL: IV Data, V Hybrid, VI Process
NOT HIERARCHICAL, DIRECTIONAL: VII Data, VIII Hybrid, IX Process
NOT HIERARCHICAL, NOT DIRECTIONAL: X Data, XI Hybrid, XII Process

Diagram types categorized: Functional Decomposition I; Functional Decomposition II; Structure
Charts; Flow Charts; HIPO (Overview); HIPO (VTC); HIPO (Detail); Warnier-Orr (Data); Warnier-Orr
(Process); Michael Jackson Data-Structure; Michael Jackson System Network; Michael Jackson
Program-Structure; Nassi-Shneiderman Charts; Action I; Action II; Data Flow; Data Analysis; Data
Navigation; Inverted-L; Entity-Relationship; UML Use Case; UML Class; UML Sequence; UML
Collaboration; UML State; UML Activity; "Typical" GUI.

Inspection of table 2.1 shows that only seven of the twelve (2 x 2 x 3) possible categories are
actually populated. Table 2.2 shows the categorization of the diagram types after collapsing
unpopulated categories.

Table 2.2 Diagrammatic Model Types (Collapsed)

Categories:
HIERARCHICAL, DIRECTIONAL: I Data, II Hybrid, III Process
NOT HIERARCHICAL, DIRECTIONAL: VIII Hybrid, IX Process
NOT HIERARCHICAL, NOT DIRECTIONAL: X Data, XI Hybrid

Diagram types categorized: Functional Decomposition I; Functional Decomposition II; Structure
Charts; Flow Charts; HIPO (Overview); HIPO (VTC); HIPO (Detail); Warnier-Orr (Data); Warnier-Orr
(Process); Michael Jackson Data-Structure; Michael Jackson System Network; Michael Jackson
Program-Structure; Nassi-Shneiderman Charts; Action I; Action II; Data Flow; Data Analysis; Data
Navigation; Inverted-L; Entity-Relationship; UML Use Case; UML Class; UML Sequence; UML
Collaboration; UML State; UML Activity; "Typical" GUI.

2.7 Types of Software Defects

A semantic software defect (the focus of this research) is defined as a non-syntactic defect that
causes a software artifact or resulting system not to have the functionality, performance, security,
usability, maintainability, testability or other qualities necessary for the purposes of the system. In
other words, software defects are defined in terms of missing qualities. Other research reviewed is
not inconsistent with this approach. For example, Boehm et al. [1978] and Bass et al. [1998]
develop typologies of software qualities, and the definition in Grady [1992, 122] of a defect as "any
flaw in the specification, design, or implementation of a product" inherently includes software
qualities. Therefore, the primary focus of the first section below is on typologies of software qualities.
The second section reviews other software defect typologies, and the third section discusses the
development of the typology used in this research.

2.7.1 Software Quality Typologies

An interesting early software qualities typology is the Software Quality Characteristics Tree
(SQCT) of Boehm et al. [1978]. The SQCT is a hierarchical scheme in which the highest-level
construct, General Utility, is determined by two second-level constructs, As-Is Utility and
Maintainability, and one third-level construct, Portability. Each second-level construct is in turn
determined by three other third-level constructs: As-Is Utility by Reliability, Efficiency, and Human
Engineering, and Maintainability by Testability, Understandability, and Modifiability. The third-level
constructs are determined by various combinations of twelve primitive characteristics (Device
Independence, Completeness, Accuracy, Consistency, Device Efficiency, Accessibility,
Communicativeness, Structuredness, Self-Descriptiveness, Conciseness, Legibility, and
Augmentability), which are strongly differentiated with respect to each other.

The Software Quality Characteristics Tree is shown in figure 2.4.

Figure 2.4 Boehm et al. [1978] Software Quality Characteristics Tree (adapted)

The Grady [1992] software defect model is shown below in figure 2.5. It is also a hierarchical model
(with the root at the bottom) that classifies defects according to origin, type, and mode. Grady
describes six origins of software defects (five specific origins plus a residual "Other"
category):

1. Specifications/Requirements Defect. A mistake in the definition of the customer/target needs
for a system or system component. Such mistakes can be in functional requirements,
performance requirements, test requirements, development standards, and so on.

2. Design Defect. A mistake in the design of a system or system component. Such mistakes can
be in algorithms, control logic, data structures, database access, input/output formats, interface
descriptions, and so on.

3. Code Defect. A mistake in the implementation of a computer program. Such mistakes can be in
product or test code, JCL, build files, and so on.

4. Documentation Defect. A mistake in any non-code product material delivered to a customer.
Such mistakes can be in user manuals, installation instructions, data sheets, product demos, and
so on. Mistakes in requirements specification documents, design documents, or code listings are
assumed to be specification defects, design defects, and coding defects, respectively.

5. Environmental Support Defect. Defects that arise as a result of the system development
and/or testing environment. Such mistakes can be in the build/configuration process, the
development/integration tools, the testing environment, and so on.

6. Other.
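Read together, the six categories and the rule in item 4 (mistakes in requirements, design, or code documents are not documentation defects) amount to a small lookup from the artifact containing the mistake to a category. A minimal sketch, assuming a defect is described only by that artifact; the enum, the artifact labels, and the function name are hypothetical illustrations, not Grady's terminology.

```python
from enum import Enum


class Origin(Enum):
    """The six categories of Grady [1992] listed above."""
    SPECIFICATION = "Specifications/Requirements"
    DESIGN = "Design"
    CODE = "Code"
    DOCUMENTATION = "Documentation"
    ENVIRONMENT = "Environmental Support"
    OTHER = "Other"


# Hypothetical artifact labels. Phase documents map to the phase they describe
# rather than to DOCUMENTATION, following the rule in item 4.
ARTIFACT_ORIGIN = {
    "requirements specification": Origin.SPECIFICATION,
    "design document": Origin.DESIGN,
    "code listing": Origin.CODE,
    "product code": Origin.CODE,
    "test code": Origin.CODE,
    "jcl": Origin.CODE,
    "build file": Origin.CODE,
    "user manual": Origin.DOCUMENTATION,
    "installation instructions": Origin.DOCUMENTATION,
    "data sheet": Origin.DOCUMENTATION,
    "product demo": Origin.DOCUMENTATION,
    "build/configuration process": Origin.ENVIRONMENT,
    "development tool": Origin.ENVIRONMENT,
    "testing environment": Origin.ENVIRONMENT,
}


def classify_origin(artifact: str) -> Origin:
    """Return the category for the artifact in which a mistake was found."""
    # Anything outside the five specific categories falls into the residual OTHER.
    return ARTIFACT_ORIGIN.get(artifact.strip().lower(), Origin.OTHER)
```

Under this sketch, `classify_origin("Design document")` returns `Origin.DESIGN`, not `Origin.DOCUMENTATION`, while `classify_origin("User manual")` returns `Origin.DOCUMENTATION`.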

Figure 2.5 Grady [1992] Software Defect Model

Bass et al. [1998] discuss ten technical qualities of software, dividing them into those that are
discernible at runtime (DR) and those not discernible at runtime (NDR). The following is a brief
discussion of the software qualities in their typology:

1. Functionality (DR) is the ability of the system to do the work for which it was intended; it is the
basic statement of the system's capabilities, services, and behavior.

2. Performance (DR) refers to the responsiveness of the system: the time required to respond to
stimuli (events) or the number of events processed in some interval of time.

Bass et al. [1998, 79] note that "For most of the history of software engineering, performance
has been the driving factor in software architecture, and this has frequently compromised the
achievement of other qualities."

It should be noted that performance is judged relative to system requirements, and that what would
otherwise count as a performance "defect" may be the consequence of improving some other quality.

3. Security (DR) is a measure of the system's ability to resist unauthorized attempts at usage and
denial of service while still providing its services to legitimate users.

4. Availability (DR) measures the proportion of time the system is up and running and is typically
defined as

$$\alpha = \frac{\mathrm{MTF}}{\mathrm{MTF} + \mathrm{MTR}},$$

where MTF = mean time to failure and MTR = mean time to repair.
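As a quick numerical check of the availability formula, a small Python helper; the function name and the sample figures (990 hours MTF, 10 hours MTR) are illustrative assumptions, not values from Bass et al.:

```python
def availability(mtf: float, mtr: float) -> float:
    """Return alpha = MTF / (MTF + MTR).

    mtf: mean time to failure; mtr: mean time to repair (same time unit).
    """
    if mtf < 0 or mtr < 0:
        raise ValueError("MTF and MTR must be non-negative")
    if mtf + mtr == 0:
        raise ValueError("MTF + MTR must be positive")
    return mtf / (mtf + mtr)


# Illustrative figures: 990 hours between failures, 10 hours to repair.
print(availability(990.0, 10.0))  # 0.99
```

Because the formula depends only on the ratio MTR/MTF, shortening repair time raises availability in the same way as lengthening the time to failure.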

5. Usability (DR) is largely a function of the user interface.

6. Maintainability (NDR). Bass et al. [1998] use the terms modifiability and maintainability
interchangeably and define modifiability as the ability to make changes to a system quickly and
cost-effectively. According to them, modifications to a system can be broadly categorized as
follows:

• Extending or changing capabilities. This category includes corrective maintenance and extensibility.
• Deleting unwanted capabilities.
• Adapting to new operating environments.
• Restructuring.

7. Portability (NDR) is the ability of a system to run under different computing environments.

8. Reusability (NDR) relates to designing a system so that its structure or some of its
components can be reused in future applications. Bass et al. [1998, 84] note that
"Reusability is actually a special case of modifiability..."

9. Integrability (NDR) is the ability to make the separately developed components of the system
work correctly together.

10. Software testability (NDR) refers to the ease with which software can be made to demonstrate its
faults through (typically execution-based) testing.

This research uses Bass et al. [1998] as the basis for the qualities dimension of the software defects
typology.

2.7.2 Other Defect Dimensions

Review of the literature yields three other dimensions for the classification of software defects.

2.7.2.1 Class

Class refers to whether the defect is the result of logic or other required structure being missing
(M), incorrect (I), or extra (E) [Ebenau and Strauss 1994].

While extra functionality may increase storage requirements or otherwise decrease efficiency, the
impact on functionality is generally less severe than that caused by the other two types.

2.7.2.2 Severity

The defect severity categories generally listed are major (J), minor (N), and (sometimes) trivial (T)
[Ebenau and Strauss 1994; Gilb and Graham 1993; Kelly et al. 1992].

A major defect is defined as one "that is expected to cause product failure, departure from
specifications, or prevent further correct development of the product" [Ebenau and Strauss 1994,
92]. A minor defect is defined as one "that reduces the effectiveness, or confuses a product's
representation, format, or development process characteristics, but is not expected to impact the
operation or further development of the product" [p. 92].

2.7.2.3 Cause

Humphrey [1995], following Gale [1990], lists five categories of basic defect causes:

1. Education. You did not understand how to do something.
2. Communication. You were not properly informed about something.
3. Oversight. You omitted doing something.
4. Transcription. You knew what to do but made a mistake in doing it.
5. Process. Your process somehow misdirected your actions.
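The class, severity, and cause dimensions can be pictured as fields of a single defect record. The Python sketch below is a hypothetical encoding (the type and field names are not from the cited sources); the letter codes follow the M/I/E and J/N/T abbreviations given above:

```python
from dataclasses import dataclass
from enum import Enum


class DefectClass(Enum):
    MISSING = "M"
    INCORRECT = "I"
    EXTRA = "E"


class Severity(Enum):
    MAJOR = "J"
    MINOR = "N"
    TRIVIAL = "T"  # listed only by some sources


class Cause(Enum):
    # Humphrey's five basic causes, following Gale.
    EDUCATION = "did not understand how to do something"
    COMMUNICATION = "was not properly informed about something"
    OVERSIGHT = "omitted doing something"
    TRANSCRIPTION = "knew what to do but made a mistake doing it"
    PROCESS = "process misdirected the action"


@dataclass(frozen=True)
class Defect:
    """A defect positioned on the class, severity, and cause dimensions."""
    description: str
    defect_class: DefectClass
    severity: Severity
    cause: Cause


# Hypothetical example: a required step left out of a design.
d = Defect("input range check omitted", DefectClass.MISSING,
           Severity.MAJOR, Cause.OVERSIGHT)
print(d.defect_class.value, d.severity.value)  # M J
```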

2.7.3 Development of the Defect Typology

The four dimensions discussed above produce a four-dimensional defect space. However,
examination shows that dimensional simplification is appropriate.

1. Defect cause cannot be determined directly from examination of software diagrammatic models.

2. Defect severity is defined in terms of impact on system functionality. Given that functionality is a
type of technical quality, a separate dimension would be redundant.

Further simplification is achieved by ignoring extra functionality defects of the class dimension. The
rationale for this reduction is that, while defects associated with extra functionality may increase
storage requirements or otherwise decrease efficiency, the impact on functionality is generally less
severe than that caused by missing and incorrect defects.

Change is also necessary on the qualities dimension. Six of the Bass et al. [1998] qualities are not
readily discernible from diagrammatic models and are consequently not appropriate to the typology.
However, according to Boehm et al. [1978], the primitive quality Structuredness partially determines
three of the six. Similarly, Fenton and Neil [2001] list Structuredness as an internal attribute
associated with the external attributes reliability (or availability), maintainability, and reusability. The
six non-discernible qualities are listed below; B marks a quality linked to Structuredness by Boehm et
al. [1978], and F marks an attribute linked to it by Fenton and Neil [2001].

• Availability (F)
• Maintainability (B, F)
• Portability
• Reusability (B, F)
• Integrability
• Testability (B)

Since Structuredness is associated with four of the six non-discernible qualities and is itself readily
discernible from a diagrammatic model, it is substituted as a partial proxy for them.
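The "four of the six" count follows from the union of the two sources' links. A short Python check, reading the B,F marker that extraction displaced in the list as belonging to Maintainability (an assumption, but one consistent with each source linking Structuredness to exactly three qualities):

```python
# The six qualities not readily discernible from diagrammatic models,
# with the sources (B = Boehm et al., F = Fenton and Neil) that link
# each one to Structuredness.
LINKS = {
    "Availability": {"F"},
    "Maintainability": {"B", "F"},
    "Portability": set(),
    "Reusability": {"B", "F"},
    "Integrability": set(),
    "Testability": {"B"},
}

boehm = {q for q, src in LINKS.items() if "B" in src}
fenton = {q for q, src in LINKS.items() if "F" in src}
covered = boehm | fenton  # qualities Structuredness partially stands in for

print(len(boehm), len(fenton), len(covered))  # 3 3 4
print(sorted(set(LINKS) - covered))  # ['Integrability', 'Portability']
```

The two qualities left uncovered are why Structuredness is only a partial proxy.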

During the early development of the research task, several subjects noted that the scope of the
diagrammatic models was not consistent. From a theoretical perspective, lack of Scope Consistency
is an instance of a general consistency problem. In the structured approach to IS development, data
and process models are supposed to model the same system but are fundamentally separate. This
separateness leads to multiple problems, including lack of consistency [Repa 2001]. Consideration
was given to adding the broader quality consistency to the typology, but this was rejected because
(1) some subjects perceived lack of Scope Consistency to be a separate issue and (2) lack of Scope
Consistency is different in that it can generally be readily discerned by comparing data and process
models, while other consistency problems become apparent only after significant functional analysis.
Lack of Scope Consistency would be expected to negatively affect the integrability and maintainability
of the specified system.

The resulting matrix is a two-dimensional defect space based on quality affected and class. It should
be noted that Scope Consistency and Structuredness are treated as logical variables; the quality is
either present or missing. Table 2.3 shows the resulting matrix.

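Table 2.3 Software Defect Matrix: Qualities vs. Class

CLASS \ QUALITY | Scope Consistency | Structuredness | Functionality | Performance | Usability | Security
----------------+-------------------+----------------+---------------+-------------+-----------+---------
Missing         |                   |                |               |             |           |
Incorrect       | n/a               | n/a            |               |             |           |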
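The simplifications described above reduce the defect space to a small set of (quality, class) cells. A Python sketch, assuming the column order of Table 2.3 and treating the two logical qualities as admitting only Missing defects, reproduces the M/I pattern used for the defect-type columns of Table 2.4:

```python
from itertools import product

# Column order of Table 2.3.
QUALITIES = ["Scope Consistency", "Structuredness", "Functionality",
             "Performance", "Usability", "Security"]
# Logical (present/missing) qualities: only a Missing defect is possible.
LOGICAL = {"Scope Consistency", "Structuredness"}
# Extra-functionality defects are ignored, leaving two classes.
CLASSES = ["Missing", "Incorrect"]


def defect_cells():
    """Return the (quality, class) pairs that remain after simplification."""
    return [(q, c) for q, c in product(QUALITIES, CLASSES)
            if not (q in LOGICAL and c == "Incorrect")]


cells = defect_cells()
print(len(cells))  # 10
print(" ".join(c[0] for _, c in cells))  # M M M I M I M I M I
```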

2.7.4 Diagrammatic Model Type vs. Software Defect Type Matrix

Table 2.4 shows the matrix resulting from combining the Diagrammatic Model Type and Software
Defect Type typologies.

Table 2.4 Diagrammatic Model Type vs. Software Defect Type

QUALITY                                      |           | Scope Consistency | Structuredness | Functionality | Performance | Usability | Security
MODEL                                        | Diagram   | M                 | M              | M     I       | M     I     | M   I     | M   I
---------------------------------------------+-----------+-------------------+----------------+---------------+-------------+-----------+---------
Hierarchical-Directional-Data (I)            | W-O D (1) |                   |                |               |             |           |
Hierarchical-Directional-Hybrid (II)         | StrC (2)  |                   |                |               |             |           |
Hierarchical-Directional-Process (III)       | W-O P (3) |                   |                |               |             |           |
Not Hierarchical-Directional-Hybrid (VIII)   | DFD (4)   |                   |                |               |             |           |
Not Hierarchical-Directional-Process (IX)    | FlowC (5) |                   |                |               |             |           |
Not Hierarchical-Not Directional-Data (X)    | ERD (6)   |                   |                |               |             |           |
Not Hierarchical-Not Directional-Hybrid (XI) | GUI (7)   |                   |                |               |             |           |

NOTES
M = missing
I = incorrect

Typical Diagram for Each Category
1 = Warnier-Orr (Data) Diagram
2 = Structure Chart
3 = Warnier-Orr (Process) Diagram
4 = Data Flow Diagram
5 = Flow Chart
6 = Entity-Relationship Diagram
7 = “Typical” GUI
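Combining the seven model-type rows with the ten defect-type columns gives 70 cells. The Python sketch below simply enumerates them as keys of a tally; using the matrix to count recorded defects is a hypothetical illustration, not a use described in the text:

```python
from itertools import product

# Rows of Table 2.4 and the typical diagram for each category.
MODEL_TYPES = {
    "Hierarchical-Directional-Data (I)": "Warnier-Orr (Data) Diagram",
    "Hierarchical-Directional-Hybrid (II)": "Structure Chart",
    "Hierarchical-Directional-Process (III)": "Warnier-Orr (Process) Diagram",
    "Not Hierarchical-Directional-Hybrid (VIII)": "Data Flow Diagram",
    "Not Hierarchical-Directional-Process (IX)": "Flow Chart",
    "Not Hierarchical-Not Directional-Data (X)": "Entity-Relationship Diagram",
    "Not Hierarchical-Not Directional-Hybrid (XI)": "Typical GUI",
}

# Columns of Table 2.4: (quality, class) with M = missing, I = incorrect.
DEFECT_TYPES = [
    ("Scope Consistency", "M"), ("Structuredness", "M"),
    ("Functionality", "M"), ("Functionality", "I"),
    ("Performance", "M"), ("Performance", "I"),
    ("Usability", "M"), ("Usability", "I"),
    ("Security", "M"), ("Security", "I"),
]

# One zero-initialised counter per (model type, defect type) cell.
tally = {(model, dt): 0 for model, dt in product(MODEL_TYPES, DEFECT_TYPES)}

# Hypothetical observation: a missing-functionality defect in a DFD.
tally[("Not Hierarchical-Directional-Hybrid (VIII)", ("Functionality", "M"))] += 1

print(len(tally))  # 70
```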

2.8 Summary and Conclusions

Prior theory and research that might inform the dissertation are reviewed. A large body of research
exists concerning Formal Technical Review, but review of this work shows that it is not based on
theory and therefore cannot inform this research effort. The first part of the literature review
therefore provides context rather than explicating applicable theory.

Three techniques for evaluating visual artifacts that convey meaning, drawn from disciplines outside
information systems, are examined. While work on the evaluation of human-computer interaction (HCI)
approaches proves not to be directly applicable to this research, one of the HCI paradigms, the
Human Information Processing System (HIPS) model, is found to be relevant. The HIPS model is
reviewed, as is cognitive science work on attention and the comprehension of graphics.

Two other areas are identified as necessary for the development of the research task and tools: (1)
types of diagrammatic models and (2) types of software defects. The literature is reviewed and new
typologies are developed.
