This article is provided courtesy of STQE, the software testing and quality engineering magazine.

Tools & Automation

Using Monkey Test Tools


by Noel Nyman

How to find bugs cost-effectively through random testing

QUICK LOOK
• Types of test monkeys
• Costs and benefits of random testing
• Guidelines to choosing the right monkey

We get conflicting opinions about the efficacy of monkey test tools. Boris Beizer suggests in Black Box Testing that test monkeys aren't very useful for testing today's professionally created software. His analysis concludes that the use of good testing practices will find more bugs than "keyboard-scrabbling" (also called "Rachmaninoff testing"). But James Tierney, former Director of Testing at Microsoft, has reported in internal presentations that some Microsoft applications groups have found ten to twenty percent of the bugs in their projects using monkey test tools.

Which assessment of monkey testing is correct? Probably both. There is no universal test tool that will find all the bugs in any software. Each tool has its uses, and some tools are more useful for certain projects (or at specific points in a project cycle) than others. Test monkeys are no exception. Use them wisely, and you'll have a cost-effective way to find new bugs. Use them carelessly, or exclusively, and you'll release a buggy product. In this article we'll look at monkey test tools, examine in detail the class of monkeys I've used most often, and provide guidelines to help you make wise choices.

The Family of Test Monkeys


This article uses the term "monkey" to refer broadly to any form of automated testing done randomly and without any typical user bias. Calling such tools "monkeys" derives from variations of this popular aphorism:

Six monkeys pounding on six typewriters at random for a million years will recreate all the works of Isaac Asimov.

While many of us find the monkey name appealing, others prefer the more technical-sounding "stochastic testing." Regardless, the essential elements are:

• The monkey is relatively ignorant of how humans use the product. It doesn't know, for example, how to build a Web page or create an amortization table.
• The monkey can randomly choose from among a large range of inputs for testing, and may be able to recreate all possible inputs for some applications.

We'll consider two types of monkeys: smart monkeys and dumb monkeys.

Smart monkeys have some knowledge about how to access the user interface in the product they're testing. They know at a simple functional level what can be done, and (more important) they understand what should happen when they do it. For example, they may know that choosing the New item on the File menu creates a new document, and they know that the new document will be displayed as a window with a particular class and text. If no new document window appears, or the window has the wrong caption or class, the monkey can identify the problem and report a bug.

Smart monkeys usually get their product knowledge from a state table or model of the software they test. Randomly traversing the state model, they choose from among all the legal options in the current state for moving to another state, and then verify that they have reached the next expected state. You can add illegal inputs to the monkey's repertoire if the model includes error-handling states.

Dumb monkeys act differently. ("Ignorant monkey" is technically more accurate, but the term "dumb" is far more common.) They don't use a state table; they have no idea what state the test application is in, or what inputs are legal or illegal. Most important, they can't recognize a bug when they see one. The pure dumb monkey exemplifies Beizer's "keyboard scrabbling" test tool, and it isn't very useful for most projects.

What can be useful is a not-quite-dumb monkey that's ignorant about your project, but understands its environment enough to find very obvious bugs like crashes and hangs. Such tools have been in use for some time. In the early eighties the Lisa and Macintosh project teams developed a dumb monkey test tool with some limited knowledge of the Apple operating systems. Some developers required that their products survive a specified amount of monkey test time before they were released. Modern test monkeys know even more about their operating systems than those early Apple simian tools did. For this discussion, dumb monkeys are application-ignorant but environment-savvy.
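To make the smart monkey's loop concrete (pick a legal input at random, apply it, verify the expected next state), here is a minimal Python sketch. It is an illustration only, not the tooling described in this article: the three-state MODEL, the FakeApp stand-in, and the smart_monkey function are all hypothetical names, and a real monkey would drive an actual user interface instead of a fake object.

```python
import random

# Hypothetical state model: state -> {action: expected next state}.
MODEL = {
    "no_document": {"file_new": "document_open"},
    "document_open": {"type_text": "document_dirty", "file_close": "no_document"},
    "document_dirty": {"type_text": "document_dirty", "file_save": "document_open"},
}


class FakeApp:
    """Stand-in for the application under test (an assumption for this sketch)."""

    def __init__(self, buggy_action=None):
        self.state = "no_document"
        self.buggy_action = buggy_action  # simulate a defect: this action does nothing

    def apply(self, action):
        if action != self.buggy_action:
            self.state = MODEL[self.state][action]
        return self.state


def smart_monkey(app, steps, seed):
    """Randomly walk the model; return (step, action, expected, actual) mismatches."""
    rng = random.Random(seed)  # seeded so a failing run can be replayed
    expected = app.state
    failures = []
    for step in range(steps):
        action = rng.choice(sorted(MODEL[expected]))  # any legal input in this state
        target = MODEL[expected][action]
        actual = app.apply(action)
        if actual != target:
            failures.append((step, action, target, actual))
            break  # model and application have diverged; stop and report
        expected = target
    return failures
```

Calling smart_monkey(FakeApp(), 200, seed=1) returns an empty list, while a FakeApp built with buggy_action="file_save" is reported as soon as the random walk tries to save. Recording the seed is a design choice that lets a developer replay the exact sequence that exposed the bug.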

When to Use Smart Monkeys


In the ideal world, you could find all application bugs by reviewing the specifications and the code. You would never have to run any tests on the final product because it would be bug-free. I admire the idea of software produced under those conditions (and I hope to work on such a project some day), but most of us fall far short of this lofty goal. There are many reasons, but one of the most significant is the complexity of the application environment.

To make thorough reviews, humans must be able to understand and mentally exercise the software in its operating environment. With the proper training and skills we can do that for embedded systems, and even for some apparently complex software working in dedicated settings. But our grasp of the situation, and the quality of our review efforts, starts to fail when our software must work in an event-driven system, along with potentially thousands of other unrelated products. Add the possibility of hundreds of thousands of users making simultaneous demands on our product, and our ability to find bugs by review alone dwindles.

Well-crafted smart monkeys excel at finding bugs in such situations. If you accurately model the environment events in a state table, thousands of smart monkeys can read that table and present those events to your product. The monkeys will find combinations and sequences that human reviewers would never consider, although human users may create them after the product is released. Most of the commercially available load and stress testing tools depend on this smart monkey technology.

As Brian Marick says in The Craft of Software Testing, complex tests find more bugs than simple tests. But most of our automated tests are simple. We look for one major outcome after applying one input. Then we return the application to a known base state and execute another simple test. If the tests are well thought out, they'll find good bugs. But they remain simple tests. When we return the application to a base state, we discard any history from previous tests. Real users seldom do that. Instead they chain many simple activities, one after another, to create complex situations. Our simple tests don't emulate that user behavior. So if one simple activity sets up another activity for failure, our simple tests won't find that bug, but our users will.

Using a smart monkey, however, allows us to turn our simple automated tests into complex user scenarios. Remove the return-to-the-known-base-state routine from the tests. Then let the monkey decide which tests to run, and in what order. The monkey will create very complex tests for as many hours as you want, and it will make a different series of complex tests every time you run it.

Another advantage of this simple-turned-complex testing is that we can make sure the application handles memory and resource allocations well over time. Running the same series of tests, even complex tests, in the same sequence over and over again seldom finds new memory or resource bugs. Instead, we need to use complex sequences that we've never used before. Monkeys do this more efficiently than humans.

The Cost of Smart Monkey Testing


A good smart monkey tool is expensive to create. Like any test automation, building smart monkey test tools requires development and test resources. But the greatest single cost is generating the model or state table. It's not unusual to need a 50,000-node state table for a moderately complex product. Continuing to add new features results in "state explosion," in which the number of nodes increases geometrically. So creating the model is seldom a one-time cost; for large models or tables, maintenance becomes a major cost factor.

A good state table based on Petri nets (an automation modeling technique for expressing concurrent events in discrete parallel systems) or Markov chains (a weighted graph in which all weights are non-negative and the total weight of outgoing edges is positive) may have value beyond the smart monkey utility, and that may help justify some of the expense. Even so, the cost of creating the table, and the monkey to run tests using it, often outweighs the value of the additional bugs found.

The sad fact is that most smart monkeys are not easily adapted to other projects. Your monkey must pay back all its costs by finding bugs on the specific project it was designed to test.
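A little arithmetic shows why state explosion bites. The sketch below assumes, purely for illustration, that features are independent and that every combination of their states is reachable, so the model size is the product of the per-feature state counts. The feature counts and the model_size helper are hypothetical; real models are usually smaller than this upper bound because many combinations cannot occur.

```python
from math import prod


def model_size(states_per_feature):
    """Upper bound on model nodes if every combination of feature states is reachable."""
    return prod(states_per_feature)


features = [4, 3, 5, 2]               # hypothetical per-feature state counts
print(model_size(features))           # 120
print(model_size(features + [6]))     # 720: one six-state feature multiplies the total by six
```

Under this assumption each new feature multiplies the node count instead of adding to it, which is why model maintenance tends to dominate the cost.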

Making Useful Dumb Monkeys


I began working with dumb monkeys several years ago, during the Windows NT 4.0 product cycle. My team uses retail applications as test engines to look for operating system bugs. We develop automated test suites for several hundred popular applications and run them often during the Windows product cycle. But there are thousands of additional applications we could use, if we had the time and testers to create the tests. Some of those applications might find us good Windows bugs, and we'd like to have some inexpensive way to test them. We experimented with dumb monkeys to test those applications.

Our dumb monkeys understand Windows basics. They know about menus and they can choose options on any menus they find. They can also recognize common Windows controls such as command buttons, check boxes, radio buttons and edit boxes. They run a set of pre-defined tests on each control to make sure it's working properly. They can also recognize a few of the most common command buttons (such as OK and Cancel), and the most sophisticated monkeys automatically adjust to localized versions of Windows.

We've created monkeys using several popular automation tools. Although my team's interest is the Windows operating system, similar monkeys can be developed for other GUI operating systems using versions of automation tools specific to those operating systems.

Monkeys with GUI savvy can manipulate many Windows applications. But a few applications rely on custom controls to expose their functions to users. Most automation tools have trouble testing those applications because the tools can't find the controls the user must manipulate. If the automation tool can't find the controls, the monkey can't find them either. We deal with that problem in several ways:

• We tell the monkey to click randomly a few times in every new window it sees. Occasionally the monkey clicks on one of those "invisible" elements and changes the application state.
• If the application has interesting areas such as toolbars that are invisible to the monkey, we tell it to focus its random clicks in those areas.
• We can also ask the monkey to randomly perform mouse actions, such as left-clicks, right-clicks and drags, or enter random text at the current insertion point, if the application relies on human users doing those things often. (A monkey with those skills can make some weird and futuristic drawings in Microsoft Paint or Corel Draw!)

We sometimes call these tools "generic state monkeys," because to be effective they need to know five states:

1. The test application is not running.
2. The test application is running and is probably ready to accept test input.
3. A new window appeared.
4. The new window has Windows controls on it that the monkey recognizes.
5. The new window went away.

Given a state table with just these five generic states, our monkey can't log much useful information about an application's faults and failures. Most of the errors it sees are ambiguous; a human must examine the error log to decide what really happened. We call these "monkey noise" bugs and we try to avoid them, most often by ignoring them entirely. Instead, the monkey starts the application in a debug session and we monitor the monkey's tests with a debugger. We want to find nasty crashing bugs that display the dreaded Blue Screen of Death; a debugger is very good at trapping those bugs. It automatically halts the monkey and allows a developer to examine the machine state when the bug occurs.

When to Use Dumb Monkeys


Although my team uses dumb monkeys to look for operating system bugs, we find quite a few application errors as well. There are four situations in every application product cycle in which dumb monkeys can be cost effective:

• Dumb monkeys can find a lot of really good bugs (and save you testing time) early in the product cycle. The dumb monkey doesn't need to know anything about the user interface of the application. It doesn't matter whether the UI is totally changed from yesterday's build or half of it is missing. The monkey will test whatever it finds. So, you can start dumb monkey testing as soon as the new build arrives. The dumb monkey can explore the application and perhaps find nasty bugs while you're still adapting your formal automation suite to all the new UI changes.
• Dumb monkeys can give you very long runs of complex tests. Unless they find a crashing bug, they'll run for as many days as you let them, pushing memory and resources to their limits. If you have resource leaks or memory issues, dumb monkeys will help you find them.
• Near the end of the product cycle, when you think you've found all the nasty bugs, dumb monkey tests can help you increase your confidence. Running the dumb monkey for days at a time without failures gives you another measure of the stability of your application.
• The dumb monkey may be able to show you holes in your traditional test coverage. Run several hours of dumb monkey tests on a version of your application instrumented for coverage analysis and compare the results with a full pass of your non-monkey tests. If the monkey tests a function that's not touched by your traditional tests, you need to re-examine your test plan. If you have a state table for your application, teach the monkey to read it and check off each state as it tests your application. If it finds one new state that's not defined on your state table, the monkey has exposed a whole new universe of untested bug possibilities in your application, something like discovering a wormhole into the heart of the Beta Quadrant!

At least one commercial tool (Rational's TestFactory) uses the dumb monkey method to explore applications and create automation to maximize coverage while minimizing test time. (You might be surprised at the level of test coverage that dumb monkeys can achieve. On one internal Microsoft application, with complexity similar to Microsoft WordPad, we got 65% code function coverage in less than fifteen minutes of dumb monkey tests.)
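The coverage comparison described above reduces to two set differences. The Python sketch below is only an illustration under assumptions: the function and state names are invented, and how you extract the lists of covered functions or visited states depends entirely on your own coverage tool and monkey logs.

```python
def untested_by_suite(monkey_hits, suite_hits):
    """Functions the monkey exercised that the traditional suite never touched."""
    return sorted(set(monkey_hits) - set(suite_hits))


def unmodeled_states(visited, state_table):
    """States the monkey reached that the state table does not define."""
    return sorted(set(visited) - set(state_table))


# Hypothetical data, standing in for coverage-tool and monkey-log output.
suite_hits = {"open_file", "save_file", "print_page"}
monkey_hits = {"open_file", "print_page", "paste_special", "resize_pane"}

state_table = {"idle", "editing", "printing"}
visited = ["idle", "editing", "idle", "print_preview"]

print(untested_by_suite(monkey_hits, suite_hits))  # ['paste_special', 'resize_pane']
print(unmodeled_states(visited, state_table))      # ['print_preview']
```

Any non-empty result from the first function is a prompt to revisit the test plan; any non-empty result from the second means the model itself is incomplete.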

The Cost of Dumb Monkey Testing


Compared with smart monkeys, and most traditional automated and manual testing, dumb monkeys are dirt-cheap. A dumb monkey can test almost any application that can run on its operating system. So, you can leverage it across many unrelated projects.

You get the best results from a dumb monkey that knows a few things about your application. It will waste less time on useless mouse clicking if you can tell it about the interesting areas on the application's windows. But overeducating dumb monkeys isn't usually cost effective. Our target is to spend no more than thirty minutes teaching a dumb monkey about a new application.

Once you've given the dumb monkey the minimum information it needs to explore your application, set it up in a corner of your lab or office on an old, slow computer no one wants to use for regular testing. Have it start testing the application under a debugger and check its progress every day or so. If the monkey finds just one good bug, it will be the least expensive bug your team reports.

Like any test tool, a good dumb monkey can be expensive to develop. But unlike many test tools, a mediocre or beginner dumb monkey has a good chance of finding some bugs, if you use it at the right time and for the right reasons. As the monkey proves its worth, you can add features and give it more skills.

If you use Rational Visual Test on the Windows platform, you can start experimenting with dumb monkeys using a simplified monkey based on one of our Microsoft internal testing simians. (The "Freddie" dumb monkey is available on the compact disc accompanying Thomas R. Arnold's Visual Test 6 Bible [IDG Books]. Chapter 14 of the book describes monkey testing in more detail and shows you how to add features to Freddie.)
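Teaching a monkey about "interesting areas" can be as small as handing it a few rectangles. The following Python sketch generates a reproducible plan of random actions that is biased toward those rectangles. It is not Freddie or any Microsoft tool: monkey_actions, random_point, the hot_bias parameter, and the rectangle coordinates are all assumptions made for illustration, and the sketch only produces the plan; sending the events to a real window would need an automation tool of your choice.

```python
import random
import string


def random_point(rng, rect):
    """Pick a point inside rect, given as (left, top, right, bottom)."""
    left, top, right, bottom = rect
    return rng.randint(left, right - 1), rng.randint(top, bottom - 1)


def monkey_actions(window, hot_areas, count, seed, hot_bias=0.7):
    """Yield random UI actions, favoring the hot areas with probability hot_bias."""
    rng = random.Random(seed)  # same seed, same action plan
    kinds = ["left_click", "right_click", "drag", "type_text"]
    for _ in range(count):
        use_hot = bool(hot_areas) and rng.random() < hot_bias
        rect = rng.choice(hot_areas) if use_hot else window
        kind = rng.choice(kinds)
        if kind == "type_text":
            length = rng.randint(1, 20)
            yield (kind, "".join(rng.choice(string.printable) for _ in range(length)))
        elif kind == "drag":
            # start in the chosen area, end anywhere in the window
            yield (kind, random_point(rng, rect), random_point(rng, window))
        else:
            yield (kind, random_point(rng, rect))


WINDOW = (0, 0, 800, 600)     # hypothetical client area
TOOLBAR = (0, 0, 800, 40)     # hypothetical "interesting area"
plan = list(monkey_actions(WINDOW, [TOOLBAR], count=1000, seed=3))
```

Keeping the whole configuration down to a window rectangle and a short list of hot areas is in the spirit of the thirty-minute teaching budget, and the seed gives you a replayable plan when the debugger traps a crash.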

Choosing Wisely


Monkey testing should not be your only testing. Monkeys don't understand your application, and in their ignorance they miss many bugs. Monkeys won't add much value to embedded systems, software running in simple environments, or projects that are difficult to automate.

Unless you already have an automation-readable model or state table, smart monkeys will be very expensive to develop. They may be cost effective, however, for critical parts of a project where the state table can be kept small. They're also valuable for load and stress testing. When used in the right places, smart monkeys will find a significant number of bugs.

Dumb monkeys that understand your operating system can be used on any application to get some basic testing done. A small amount of training on your specific application greatly improves the monkey's chances of finding bugs. Dumb monkeys will not find many bugs, but the bugs they do find will be crashes and hangs, the bugs you probably least want to have in your product.

Noel Nyman, software test engineer for Microsoft's Windows 2000 Certification (noeln@microsoft.com), has worked in software product development and testing for over twenty years and is a member of the Los Altos Workshop on Software Testing. He tests, therefore he is.

STQE magazine is produced by STQE Publishing, a division of Software Quality Engineering.
