Adam Smith
Professor Rinard, TR11-12noon
Therac-25: Two Design and Testing Errors

In their investigation of the Therac-25 accidents, Leveson and Turner describe many engineering errors that created and exacerbated the catastrophes. In this paper we will look at one design error and one testing error; we will evaluate how these blunders contributed to the Therac-25 accidents and how they might be prevented in the future.

The Therac-25 did not include a mechanism for recording events and failures, which seriously inflated the time it took to diagnose the machine’s problems. Effective troubleshooting depends on knowing what a machine was doing when it failed. When a patient reported abnormal pain and burning, the operators often contacted the Therac-25 manufacturer, AECL. The company often sent engineers to investigate, but the engineers had trouble reproducing the elusive circumstances that caused the accident. Leveson describes one such case: “[AECL engineers] spent a day running the machine through tests but could not reproduce [the error].”1 In this case, the error was attributed to a rare, random fault, and the faulty machine was soon put back into operation. If, however, the Therac-25 had provided an error print-out detailing its state and actions during that session, it is likely that the problem could have been diagnosed faster and lives could have been saved.

The engineers designing the Therac-25 also used bad testing practices, which led them to overlook fatal error conditions. Leveson makes this point explicitly when describing the engineering errors made: “The software should be subjected to extensive testing and formal analysis at the module and software level; system testing alone is not adequate.” AECL had tested the Therac-25 by using it extensively (as a system) in their labs and in a hospital. Unfortunately, the number of possible inputs to the system was effectively unbounded; some were operator inputs, while others came from mechanical switches and radiation sensors.
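As a result, the set of test cases was intractably large and could not be exhausted by AECL’s system-level testing, so fatal errors were overlooked. Testing and analyzing each software module on its own, in addition to testing the system as a whole, could have exposed these errors much earlier.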
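
To make the two remedies concrete, here is a minimal sketch in Python. It is not taken from the Therac-25 software: the names `EventLog`, `beam_permitted`, and the two mode strings are hypothetical, and the interlock rule is a deliberately simplified assumption. It pairs a small event record, of the kind that could have been printed out after a session, with a module-level test that checks one safety rule in isolation instead of relying on system testing alone.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass(frozen=True)
class Event:
    """One recorded step: a sequence number, a source, and a message."""
    seq: int
    source: str
    message: str


class EventLog:
    """Bounded in-memory audit trail (hypothetical, for illustration only)."""

    def __init__(self, capacity: int = 1000) -> None:
        # A deque with maxlen silently drops the oldest entry when full.
        self._events: Deque[Event] = deque(maxlen=capacity)
        self._next_seq = 0

    def record(self, source: str, message: str) -> None:
        self._events.append(Event(self._next_seq, source, message))
        self._next_seq += 1

    def dump(self) -> List[str]:
        """Return the retained events as printable lines, oldest first."""
        return [f"{e.seq:04d} {e.source}: {e.message}" for e in self._events]


def beam_permitted(mode: str, target_in_place: bool, log: EventLog) -> bool:
    """Simplified, assumed interlock rule: x-ray mode needs the target in place."""
    log.record("interlock", f"mode={mode} target_in_place={target_in_place}")
    if mode not in ("xray", "electron"):
        log.record("interlock", f"REFUSED: unknown mode {mode!r}")
        return False
    if mode == "xray" and not target_in_place:
        log.record("interlock", "REFUSED: x-ray mode without target")
        return False
    return True


def test_interlock_module() -> None:
    """Module-level test: exercise the interlock alone, without the full system."""
    log = EventLog()
    assert beam_permitted("electron", False, log)
    assert beam_permitted("xray", True, log)
    assert not beam_permitted("xray", False, log)
    assert not beam_permitted("service", True, log)
    # Every refusal must leave a trace that an investigator could read later.
    assert any("REFUSED" in line for line in log.dump())


if __name__ == "__main__":
    test_interlock_module()
    print("interlock module test passed")
```

A test like this covers only one module and a handful of inputs, so it complements system testing and does not replace it. The log is the part that helps after a failure: an investigator can read the recorded sequence instead of trying to reproduce it by hand.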

1 An Investigation of the Therac-25 Accidents; Leveson, Turner; p. 28

In conclusion, we have discussed two engineering errors that contributed to the Therac-25 accidents: the lack of any mechanism for recording events and failures, and the reliance on system-level testing alone. Many other errors were involved, but these two contributed significantly to the failures.