
Tools & Automation

Using Monkey Test Tools
We get conflicting opinions about the efficacy of monkey test tools. Boris Beizer suggests in Black Box Testing that test monkeys aren't very useful for testing today's professionally created software. His analysis concludes that the use of good testing practices will find more bugs than keyboard-scrabbling (also called Rachmaninoff testing). But James Tierney, former Director of Testing at Microsoft, has reported in internal presentations that some Microsoft applications groups have found ten to twenty percent of the bugs in their projects using monkey test tools.
Which assessment of monkey testing is correct? Probably both.

There is no universal test tool that will find all the bugs in any software. Each tool has its uses, and some tools are more useful for certain projects, or at specific points in a project cycle, than others. Test monkeys are no exception. Use them wisely, and you'll have a cost-effective way to find new bugs. Use them carelessly, or exclusively, and you'll release a buggy product. In this article we'll look at monkey test tools, examine in detail the class of monkeys I've used most often, and provide guidelines to help you make wise choices.
The Family of Test Monkeys
This article uses the term monkey to refer broadly to any form of automated testing done randomly and without any typical user bias. Calling such tools monkeys derives from variations of this popular aphorism:

Six monkeys pounding on six typewriters at random for a million years will recreate all the works of Isaac Asimov.
How to find bugs cost-effectively through random testing
by Noel Nyman
www.stqemagazine.com — Software Testing & Quality Engineering, January/February 2000
QUICK LOOK
• Types of test monkeys
• Costs and benefits of random testing
• Guidelines for choosing the right monkey
This article is provided courtesy of STQE, the software testing and quality engineering magazine.
While many of us find the monkey name appealing, others prefer the more technical-sounding stochastic testing. Regardless, the essential elements are:

• The monkey is relatively ignorant of how humans use the product. It doesn't know, for example, how to build a Web page or create an amortization table.

• The monkey can randomly choose from among a large range of inputs for testing, and may be able to recreate all possible inputs for some applications.
We'll consider two types of monkeys: smart monkeys and dumb monkeys. Smart monkeys have some knowledge about how to access the user interface in the product they're testing. They know at a simple functional level what can be done, and, more important, they understand what should happen when they do it. For example, they may know that choosing the New item on the File menu creates a new document, and they know that the new document will be displayed as a window with a particular class and text. If no new document window appears, or the window has the wrong caption or class, the monkey can identify the problem and report a bug.
Smart monkeys usually get their product knowledge from a state table or model of the software they test. Randomly traversing the state model, they choose from among all the legal options in the current state for moving to another state, and then verify that they have reached the next expected state. You can add illegal inputs to the monkey's repertoire if the model includes error-handling states.
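As a concrete sketch, that traverse-and-verify loop fits in a few lines of Python. The state names, actions, and the "application" below are invented for illustration; a real smart monkey would drive the UI for each action and read back the window class and caption where this sketch calls observe().

```python
import random

# A toy state model: state -> {action: expected next state}.
# These states and actions are invented; a real model would describe
# your application's actual windows and menus.
MODEL = {
    "no_document":   {"File>New": "document_open"},
    "document_open": {"File>Close": "no_document", "Edit>Type": "document_open"},
}

def smart_monkey(model, start, steps, observe, rng=None):
    """Randomly traverse the model; after each action, verify that the
    state observe() reports matches the state the model predicts."""
    rng = rng or random.Random()
    state, bugs = start, []
    for _ in range(steps):
        action = rng.choice(sorted(model[state]))
        expected = model[state][action]
        actual = observe(state, action)   # in real life: read window class/caption
        if actual != expected:
            bugs.append((state, action, expected, actual))
        state = expected                  # resynchronize on the model's view
    return bugs

# A stand-in "application" that behaves correctly except on one path.
def buggy_app(state, action):
    if (state, action) == ("document_open", "File>Close"):
        return "document_open"            # the window never closes: a bug
    return MODEL[state][action]
```

Run against the well-behaved model the monkey reports nothing; run against buggy_app it flags only the broken transition, which is exactly the report-a-bug behavior described above.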
Dumb monkeys act differently. (Ignorant monkey is technically more accurate, but the term dumb is far more common.) They don't use a state table; they have no idea what state the test application is in, or what inputs are legal or illegal. Most important, they can't recognize a bug when they see one. The pure dumb monkey exemplifies Beizer's keyboard-scrabbling test tool, and it isn't very useful for most projects. What can be useful is a not-quite-dumb monkey that's ignorant about your project, but understands its environment well enough to find very obvious bugs like crashes and hangs.
Such tools have been in use for some time. In the early eighties the Lisa and Macintosh project teams developed a dumb monkey test tool with some limited knowledge of the Apple operating systems. Some developers required that their products survive a specified amount of monkey test time before they were released. Modern test monkeys know even more about their operating systems than those early Apple simian tools did.

For this discussion, dumb monkeys are application-ignorant but environment-savvy.
When to Use Smart Monkeys
In the ideal world, you could find all application bugs by reviewing the specifications and the code. You would never have to run any tests on the final product because it would be bug-free. I admire the idea of software produced under those conditions (and I hope to work on such a project some day), but most of us fall far short of this lofty goal. There are many reasons, but one of the most significant is the complexity of the application environment. To make thorough reviews, humans must be able to understand and mentally exercise the software in its operating environment. With the proper training and skills we can do that for embedded systems, and even for some apparently complex software working in dedicated settings. But our grasp of the situation, and the quality of our review efforts, starts to fail when our software must work in an event-driven system, along with potentially thousands of other unrelated products. Add the possibilities of hundreds of thousands of users making simultaneous demands on our product, and our ability to find bugs by review alone dwindles.
Well-crafted smart monkeys excel at finding bugs in such situations. If you accurately model the environment events in a state table, thousands of smart monkeys can read that table and present those events to your product. The monkeys will find combinations and sequences that human reviewers would never consider, although human users may create them after the product is released. Most of the commercially available load and stress testing tools depend on this smart monkey technology.
As Brian Marick says in The Craft of Software Testing, complex tests find more bugs than simple tests. But most of our automated tests are simple. We look for one major outcome after applying one input. Then we return the application to a known base state and execute another simple test. If the tests are well thought out, they'll find good bugs. But they remain simple tests. When we return the application to a base state, we discard any history from previous tests. Real users seldom do that. Instead they chain many simple activities, one after another, to create complex situations.
Our simple tests don't emulate that user behavior. So if one simple activity sets up another activity for failure, our simple tests won't find that bug, but our users will. Using a smart monkey, however, allows us to turn our simple automated tests into complex user scenarios. Remove the return-to-the-known-base-state routine from the tests. Then let the monkey decide which tests to run, and in what order. The monkey will create very complex tests for as many hours as you want, and it will make a different series of complex tests every time you run it.
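The shape of that driver is simple: a pool of small test functions, chosen at random, with no reset between them. The tests and the shared "application state" below are hypothetical stand-ins; the point is the loop, where each test inherits whatever history the previous tests left behind.

```python
import random

# Each simple test acts on shared application state and never resets it,
# so history accumulates across tests. The tests and the state keys are
# invented stand-ins for real automated test cases.
def open_doc(app):
    app["open_docs"] += 1

def close_doc(app):
    app["open_docs"] = max(0, app["open_docs"] - 1)

def type_text(app):
    app["keystrokes"] = app.get("keystrokes", 0) + 1

def monkey_driver(tests, steps, rng=None):
    """Run randomly chosen simple tests back to back with no base-state
    reset, producing a different complex scenario on every run."""
    rng = rng or random.Random()
    app = {"open_docs": 0}
    sequence = []
    for _ in range(steps):
        test = rng.choice(tests)
        test(app)                 # inherits whatever the last test left behind
        sequence.append(test.__name__)
    return sequence, app
```

Seed the random generator when you want to replay the exact sequence that exposed a failure; leave it unseeded to get a fresh complex scenario on every run.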
Another advantage of this simple-turned-complex testing is that we can make sure the application handles memory and resource allocations well over time. Running the same series of tests, even complex tests, in the same sequence over and over again seldom finds new memory or resource bugs. Instead, we need to use complex sequences that we've never used before. Monkeys do this more efficiently than humans.
The Cost of Smart Monkey Testing
A good smart monkey tool is expensive to create. Like any test automation, building smart monkey test tools requires development and test resources. But the greatest single cost is generating the model or state table. It's not unusual to need a 50,000-node state table for a moderately complex product. Continuing to add new features results in state explosion, in which the number of nodes increases geometrically. So creating the model is seldom a one-time cost; for large models or tables, maintenance becomes a major cost factor.
A good state table based on Petri nets (an automation modeling technique for expressing concurrent events in discrete parallel systems) or Markov chains (a weighted graph in which all weights are non-negative and the total weight of outgoing edges is positive) may have value beyond the smart monkey utility, and that may help justify some of the expense. Even so, the cost of creating the table, and the monkey to run tests using it, often outweighs the value of the additional bugs found. The sad fact is that most smart monkeys are not easily adapted to other projects. Your monkey must pay back all its costs by finding bugs on the specific project it was designed to test.
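In code, a Markov-chain model of this kind reduces to a weighted random walk. The states and weights below are invented for illustration; the only requirements, matching the definition above, are non-negative weights with a positive total on each state's outgoing edges.

```python
import random

# Each state lists (next_state, weight) edges. Weights are non-negative
# and each state's outgoing weights sum to a positive total, as the
# Markov-chain definition requires. States and weights are invented.
CHAIN = {
    "idle":   [("typing", 5), ("menu", 3), ("idle", 2)],
    "typing": [("idle", 4), ("typing", 6)],
    "menu":   [("dialog", 9), ("idle", 1)],
    "dialog": [("idle", 10)],
}

def walk(chain, start, steps, rng=None):
    """Walk the chain, choosing each next state with probability
    proportional to its edge weight."""
    rng = rng or random.Random()
    state, path = start, [start]
    for _ in range(steps):
        targets, weights = zip(*chain[state])
        state = rng.choices(targets, weights=weights, k=1)[0]
        path.append(state)
    return path
```

Weighting the edges is what lets the model bias the monkey toward the paths real users follow most often, while still occasionally exercising the rare ones.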
Making Useful Dumb Monkeys
I began working with dumb monkeys several years ago, during the Windows NT 4.0 product cycle. My team uses retail applications as test engines to look for operating system bugs. We develop automated test suites for several hundred popular applications and run them often during the Windows product cycle. But there are thousands of additional applications we could use, if we had the time and testers to create the tests. Some of those applications might find us good Windows bugs, and we'd like to have some inexpensive way to test them.
We experimented with dumb monkeys to test those applications. Our dumb monkeys understand Windows basics. They know about menus, and they can choose options on any menus they find. They can also recognize common Windows controls such as command buttons, check boxes, radio buttons, and edit boxes. They run a set of pre-defined tests on each control to make sure it's working properly. They can also recognize a few of the most common command buttons (such as OK and Cancel), and the most sophisticated monkeys automatically adjust to localized versions of Windows.
We've created monkeys using several popular automation tools. Although my team's interest is the Windows operating system, similar monkeys can be developed for other GUI operating systems using versions of automation tools specific to those systems.
Monkeys with GUI savvy can manipulate many Windows applications. But a few applications rely on custom controls to expose their functions to users. Most automation tools have trouble testing those applications because the tools can't find the controls the user must manipulate. If the automation tool can't find the controls, the monkey can't find them either. We deal with that problem in several ways:
• We tell the monkey to click randomly a few times in every new window it sees. Occasionally the monkey clicks on one of those invisible elements and changes the application state.

• If the application has interesting areas, such as toolbars, that are invisible to the monkey, we tell it to focus its random clicks in those areas.

• We can also ask the monkey to randomly perform mouse actions, such as left-clicks, right-clicks, and drags, or enter random text at the current insertion point, if the application relies on human users doing those things often.
(A monkey with those skills can make some weird and futuristic drawings in Microsoft Paint or Corel Draw!)
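Those three tactics can be sketched as one random loop. The action tuples below are hypothetical stand-ins for a real automation tool's click and keystroke calls; this sketch only records what the monkey would do, and the 40/40/20 split between the tactics is an invented tuning knob.

```python
import random

def dumb_monkey(width, height, steps, hot_rects=(), rng=None):
    """Return the list of random actions the monkey would perform.
    A real monkey would hand each action to its automation tool;
    this sketch only records them."""
    rng = rng or random.Random()
    actions = []
    for _ in range(steps):
        roll = rng.random()
        if hot_rects and roll < 0.4:      # bias clicks into interesting areas
            left, top, right, bottom = rng.choice(hot_rects)
            actions.append(("left-click", rng.randint(left, right),
                            rng.randint(top, bottom)))
        elif roll < 0.8:                  # click anywhere, either button
            button = rng.choice(["left", "right"])
            actions.append((button + "-click", rng.randint(0, width),
                            rng.randint(0, height)))
        else:                             # random text at the insertion point
            actions.append(("type", "".join(rng.choice("abcd efgh")
                                            for _ in range(8))))
    return actions
```

Passing toolbar rectangles in hot_rects is how you tell the monkey about areas the automation tool cannot see, as described in the second bullet above.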
We sometimes call these tools generic state monkeys, because to be effective they need to know five states:

1. The test application is not running.
2. The test application is running and is probably ready to accept test input.
3. A new window appeared.
4. The new window has Windows controls on it that the monkey recognizes.
5. The new window went away.
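That five-state table is small enough to write out directly. The reaction strings below paraphrase what such a monkey would do in each state; in a real monkey each entry would be a function that drives the automation tool, and the names here are not from any actual product.

```python
from enum import Enum, auto

class AppState(Enum):
    NOT_RUNNING    = auto()  # 1. the test application is not running
    READY          = auto()  # 2. running and probably ready for input
    NEW_WINDOW     = auto()  # 3. a new window appeared
    KNOWN_CONTROLS = auto()  # 4. the window has controls the monkey recognizes
    WINDOW_CLOSED  = auto()  # 5. the new window went away

# What the monkey does in each state; in a real monkey, each entry
# would be a callable that drives the automation tool.
REACTION = {
    AppState.NOT_RUNNING:    "restart the application (under the debugger)",
    AppState.READY:          "send a random input",
    AppState.NEW_WINDOW:     "scan the window for recognizable controls",
    AppState.KNOWN_CONTROLS: "run the pre-defined tests on each control",
    AppState.WINDOW_CLOSED:  "return focus to the previous window",
}
```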
Given a state table with just these five generic states, our monkey can't log much useful information about an application's faults and failures. Most of the errors it sees are ambiguous; a human must examine the error log to decide what really happened. We call these monkey noise bugs, and we try to avoid them, most often by ignoring them entirely. Instead, the monkey starts the application in a debug session and we monitor the monkey's tests with a debugger. We want to find nasty crashing bugs that display the dreaded Blue Screen of Death; a debugger is very good at trapping those bugs. It automatically halts the monkey and allows a developer to examine the machine state when the bug occurs.
When to Use Dumb Monkeys
Although my team uses dumb monkeys to look for operating system bugs, we find quite a few application errors as well. There are four situations in every application product cycle in which dumb monkeys can be cost effective:
• Dumb monkeys can find a lot of really good bugs, and save you testing time, early in the product cycle. The dumb monkey doesn't need to know anything about the user interface of the application. It doesn't matter whether the UI is totally changed from yesterday's build or half of it is missing. The monkey will test whatever it finds. So you can start dumb monkey testing as soon as the new build arrives. The dumb monkey can explore the application and perhaps find nasty bugs while you're still adapting your formal automation suite to all the new UI changes.
• Dumb monkeys can give you very long runs of complex tests. Unless they find a crashing bug, they'll run for as many days as you let them, pushing memory and resources to their limits. If you have resource leaks or memory issues, dumb monkeys will help you find them.
• Near the end of the product cycle, when you think you've found all the nasty bugs, dumb monkey tests can help you increase your confidence. Running the dumb monkey for days at a time without failures gives you another measure of the stability of your application.
• The dumb monkey may be able to show you holes in your traditional test coverage. Run several hours of dumb monkey tests on a version of your application instrumented for coverage analysis and compare the results with a full pass of your non-monkey tests. If the monkey tests a function that's not touched by your traditional tests, you need to re-examine your test plan. If you have a state table for your application, teach the monkey to read it and check off each state as it tests your application. If it finds one new state that's not defined in your state table, the monkey has exposed a whole new universe of untested bug possibilities in your application, something like discovering a wormhole into the heart of the Beta Quadrant! At least one commercial tool (Rational's TestFactory) uses the dumb monkey method to explore applications and create automation to maximize coverage while minimizing test time.
(You might be surprised at the level of test coverage that dumb monkeys can achieve. On one internal Microsoft application, with complexity similar to Microsoft WordPad, we got 65% code function coverage in less than fifteen minutes of dumb monkey tests.)
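Once both runs have produced per-function coverage data, the comparison itself is just a set difference. The function names below are invented examples, not from any real coverage report.

```python
# Function names covered by each run; these names are invented examples
# standing in for the output of a coverage-analysis tool.
monkey_covered = {"open_file", "save_as", "print_preview", "obscure_dialog"}
suite_covered  = {"open_file", "save_as", "print_preview", "undo"}

# Functions the monkey reached that the traditional suite never touched:
# each one is a hole in the formal test plan.
plan_holes = monkey_covered - suite_covered

# The reverse difference shows what only the formal suite reached.
suite_only = suite_covered - monkey_covered
```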
The Cost of Dumb Monkey Testing
Compared with smart monkeys, and with most traditional automated and manual testing, dumb monkeys are dirt cheap. A dumb monkey can test almost any application that can run on its operating system. So you can leverage it across many unrelated projects.
You get the best results from a dumb monkey that knows a few things about your application. It will waste less time on useless mouse clicking if you can tell it about the interesting areas on the application's windows. But overeducating dumb monkeys isn't usually cost effective. Our target is to spend no more than thirty minutes teaching a dumb monkey about a new application.
Once you've given the dumb monkey the minimum information it needs to explore your application, set it up in a corner of your lab or office on an old, slow computer no one wants to use for regular testing. Have it start testing the application under a debugger, and check its progress every day or so. If the monkey finds just one good bug, it will be the least expensive bug your team reports.
Like any test tool, a good dumb monkey can be expensive to develop. But unlike many test tools, a mediocre or beginner dumb monkey has a good chance of finding some bugs, if you use it at the right time and for the right reasons. As the monkey proves its worth, you can add features and give it more skills. If you use Rational Visual Test on the Windows platform, you can start experimenting with dumb monkeys using a simplified monkey based on one of our Microsoft internal testing simians.
(The Freddie dumb monkey is available on the compact disc accompanying Thomas R. Arnold's Visual Test 6 Bible [IDG Books]. Chapter 14 of the book describes monkey testing in more detail and shows you how to add features to Freddie.)
Choosing Wisely
Monkey testing should not be your only testing. Monkeys don't understand your application, and in their ignorance they miss many bugs. Monkeys won't add much value to embedded systems, software running in simple environments, or projects that are difficult to automate.
Unless you already have an automation-readable model or state table, smart monkeys will be very expensive to develop. They may be cost effective, however, for critical parts of a project where the state table can be kept small. They're also valuable for load and stress testing. When used in the right places, smart monkeys will find a significant number of bugs.
Dumb monkeys that understand your operating system can be used on any application to get some basic testing done. A small amount of training on your specific application greatly improves the monkey's chances of finding bugs. Dumb monkeys will not find many bugs, but the bugs they do find will be crashes and hangs, the bugs you probably least want to have in your product. STQE
Noel Nyman, software test engineer for Microsoft's Windows 2000 Certification (noeln@microsoft.com), has worked in software product development and testing for over twenty years and is a member of the Los Altos Workshop on Software Testing. He tests, therefore he is.
STQE magazine is produced by STQE Publishing, a division of Software Quality Engineering.