Therac-25 Accidents
Computers are increasingly being introduced into safety-critical systems and, as a consequence, have been involved in accidents. Some of the most widely cited software-related accidents in safety-critical systems involved
responses of the manufacturer, government agencies, and users, we present what we believe are the most compelling lessons to be learned in the context

Figure 1. Typical Therac-25 facility (labeled components: display terminal, motion enable switch (footswitch), beam on/off light, emergency switches).
July 1993 19
the energy increases, the depth in the body at which maximum dose buildup occurs also increases, sparing the tissue above the target area. Economic advantages also come into play for the customer, since only one machine is required for both treatment modalities (electrons and photons).

Several features of the Therac-25 are important in understanding the accidents. First, like the Therac-6 and the Therac-20, the Therac-25 is controlled by a PDP 11. However, AECL designed the Therac-25 to take advantage of computer control from the outset; AECL did not build on a stand-alone machine. The Therac-6 and Therac-20 had been designed around machines that already had histories of clinical use without computer control.

In addition, the Therac-25 software has more responsibility for maintaining safety than the software in the previous machines. The Therac-20 has independent protective circuits for monitoring electron-beam scanning, plus mechanical interlocks for policing the machine and ensuring safe operation. The Therac-25 relies more on software for these functions. AECL took advantage of the computer's abilities to control and monitor the hardware and decided not to duplicate all the existing hardware safety mechanisms and interlocks. This approach is becoming more common as companies decide that hardware interlocks and backups are not worth the expense, or they put more faith (perhaps misplaced) on software than on hardware reliability.

Finally, some software for the machines was interrelated or reused. In a letter to a Therac-25 user, the AECL quality assurance manager said, "The same Therac-6 package was used by the AECL software people when they started the Therac-25 software. The Therac-20 and Therac-25 software programs were done independently, starting from a common base." Reuse of Therac-6 design features or modules may explain some of the problematic aspects of the Therac-25 software (see the sidebar "Therac-25 software development and design"). The quality assurance manager was apparently unaware that some Therac-20 routines were also used in the Therac-25; this was discovered after a bug related to one of the Therac-25 accidents was found in the Therac-20 software.

AECL produced the first hardwired prototype of the Therac-25 in 1976, and the completely computerized commercial version was available in late 1982. (The sidebars provide details about the machine's design and controlling software, important in understanding the accidents.)

In March 1983, AECL performed a safety analysis on the Therac-25. This analysis was in the form of a fault tree
Therac-25 software development and design

We know that the software for the Therac-25 was developed by a single person, using PDP 11 assembly language, over a period of several years. The software "evolved" from the Therac-6 software, which was started in 1972. According to a letter from AECL to the FDA, the "program structure and certain subroutines were carried over to the Therac-25 around 1976."

Apparently, very little software documentation was produced during development. In a 1986 internal FDA memo, a reviewer lamented, "Unfortunately, the AECL response also seems to point out an apparent lack of documentation on software specifications and a software test plan."

The manufacturer said that the hardware and software were "tested and exercised separately or together over many years." In his deposition for one of the lawsuits, the quality assurance manager explained that testing was done in two parts. A "small amount" of software testing was done on a simulator, but most testing was done as a system. It appears that unit and software testing was minimal, with most effort directed at the integrated system test. At a Therac-25 user group meeting, the same quality assurance manager said that the Therac-25 software was tested for 2,700 hours. Under questioning by the users, he clarified this as meaning "2,700 hours of use."

The programmer left AECL in 1986. In a lawsuit connected with one of the accidents, the lawyers were unable to obtain information about the programmer from AECL. In the depositions connected with that case, none of the AECL employees questioned could provide any information about his educational background or experience. Although an attempt was made to obtain a deposition from the programmer, the lawsuit was settled before this was accomplished. We have been unable to learn anything about his background.

AECL claims proprietary rights to its software design. However, from voluminous documentation regarding the accidents, the repairs, and the eventual design changes, we can build a rough picture of it.

The software is responsible for monitoring the machine status, accepting input about the treatment desired, and setting the machine up for this treatment. It turns the beam on in response to an operator command (assuming that certain operational checks on the status of the physical machine are satisfied) and also turns the beam off when treatment is completed, when an operator commands it, or when a malfunction is detected. The operator can print out hard-copy versions of the CRT display or machine setup parameters.

The treatment unit has an interlock system designed to remove power to the unit when there is a hardware malfunction. The computer monitors this interlock system and provides diagnostic messages. Depending on the fault, the computer either prevents a treatment from being started or, if the treatment is in progress, creates a pause or a suspension of the treatment.

The manufacturer describes the Therac-25 software as having a stand-alone, real-time treatment operating system. The system is not built using a standard operating system or executive. Rather, the real-time executive was written especially for the Therac-25 and runs on a 32K PDP 11/23. A preemptive scheduler allocates cycles to the critical and noncritical tasks.

The software, written in PDP 11 assembly language, has four major components: stored data, a scheduler, a set of critical and noncritical tasks, and interrupt services. The stored data includes calibration parameters for the accelerator setup as well as patient-treatment data. The interrupt routines include
20 COMPUTER
and apparently excluded the software. According to the final report, the analysis made several assumptions:

(1) Programming errors have been reduced by extensive testing on a hardware simulator and under field conditions on teletherapy units. Any residual software errors are not included in the analysis.

(2) Program software does not degrade due to wear, fatigue, or reproduction process.

(3) Computer execution errors are caused by faulty hardware components and by "soft" (random) errors induced by alpha particles and electromagnetic noise.

The fault tree resulting from this analysis does appear to include computer failure, although apparently, judging from these assumptions, it considers only hardware failures. For example, in one OR gate leading to the event of getting the wrong energy, a box contains "Computer selects wrong energy" and a probability of 10^-11 is assigned to this event. For "Computer selects wrong mode," a probability of 4 × 10^-9 is given. The report provides no justification of either number.

Accident history

Eleven Therac-25s were installed: five in the US and six in Canada. Six accidents involving massive overdoses to patients occurred between 1985 and 1987. The machine was recalled in 1987 for extensive design changes, including hardware safeguards against software errors.

Related problems were found in the Therac-20 software. These were not recognized until after the Therac-25 accidents because the Therac-20 included hardware safety interlocks and thus no injuries resulted.

In this section, we present a chronological account of the accidents and the responses from the manufacturer, government regulatory agencies, and users.

Kennestone Regional Oncology Center, 1985. Details of this accident in Marietta, Georgia, are sketchy since it was never carefully investigated. There was no admission that the injury was caused by the Therac-25 until long after the occurrence, despite claims by the patient that she had been injured during treatment, the obvious and severe radiation burns the patient suffered, and the suspicions of the radiation physicist involved.

After undergoing a lumpectomy to remove a malignant breast tumor, a 61-year-old woman was receiving follow-up radiation treatment to nearby lymph nodes on a Therac-25 at the Kennestone facility in Marietta. The Therac-25 had been operating at Kennestone for about six months; other Therac-25s
• a clock interrupt service routine,
• a scanning interrupt service routine,
• traps (for software overflow and computer-hardware-generated interrupts),
• power up (initiated at power up to initialize the system and pass control to the scheduler),
• treatment console screen interrupt handler,
• treatment console keyboard interrupt handler,
• service printer interrupt handler, and
• service keyboard interrupt handler.

The scheduler controls the sequences of all noninterrupt events and coordinates all concurrent processes. Tasks are initiated every 0.1 second, with the critical tasks executed first and the noncritical tasks executed in any remaining cycle time. Critical tasks include the following:

• The treatment monitor (Treat) directs and monitors patient setup and treatment via eight operating phases. These are called as subroutines, depending on the value of the Tphase control variable. Following the execution of a particular subroutine, Treat reschedules itself. Treat interacts with the keyboard processing task, which handles operator console communication. The prescription data is cross-checked and verified by other tasks (for example, the keyboard processor and the parameter setup sensor) that inform the treatment task of the verification status via shared variables.
• The servo task controls gun emission, dose rate (pulse repetition frequency), symmetry (beam steering), and machine motions. The servo task also sets up the machine parameters and monitors the beam-tilt-error and the flatness-error interlocks.
• The housekeeper task takes care of system-status interlocks and limit checks, and puts appropriate messages on the CRT display. It decodes some information and checks the setup verification.

Noncritical tasks include

• Check sum processor (scheduled to run periodically).
• Treatment console keyboard processor (scheduled to run only if it is called by other tasks or by keyboard interrupts). This task acts as the interface between the software and the operator.
• Treatment console screen processor (run periodically). This task lays out appropriate record formats for either displays or hard copies.
• Service keyboard processor (run on demand). This task arbitrates non-treatment-related communication between the therapy system and the operator.
• Snapshot (run periodically by the scheduler). Snapshot captures preselected parameter values and is called by the treatment task at the end of a treatment.
• Hand-control processor (run periodically).
• Calibration processor. This task is responsible for a package of tasks that let the operator examine and change system setup parameters and interlock limits.

It is clear from the AECL documentation on the modifications that the software allows concurrent access to shared memory, that there is no real synchronization aside from data stored in shared variables, and that the "test" and "set" for such variables are not indivisible operations. Race conditions resulting from this implementation of multitasking played an important part in the accidents.
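Why a non-indivisible "test" and "set" matters can be shown with a small, deterministic sketch. It is a generic illustration, not a reconstruction of the Therac-25 code: the shared `busy` flag, the task names "A" and "B", and the functions `non_atomic_claim`, `atomic_claim`, and `run_round_robin` are all hypothetical. Python generators stand in for preemptible tasks. Each `yield` marks a point where, on the real machine, the scheduler or an interrupt could switch to another task. When a task can be preempted between testing the flag and setting it, two tasks can both conclude that they own it.

```python
from typing import Callable, Dict, Generator, List

Shared = Dict[str, bool]
Task = Generator[None, None, None]


def non_atomic_claim(shared: Shared, name: str, log: List[str]) -> Task:
    """Test the flag and set it in two separate steps (hypothetical task)."""
    if not shared["busy"]:     # "test": the flag looks free
        yield                  # preemption point: another task may run here
        shared["busy"] = True  # "set": claim the flag
        log.append(name)       # this task now believes it has exclusive access
    yield


def atomic_claim(shared: Shared, name: str, log: List[str]) -> Task:
    """Test and set with no preemption point in between."""
    if not shared["busy"]:
        shared["busy"] = True
        log.append(name)
    yield


def run_round_robin(tasks: List[Task]) -> None:
    """Advance every task by one step per cycle until all have finished."""
    active = list(tasks)
    while active:
        for task in list(active):
            try:
                next(task)
            except StopIteration:
                active.remove(task)


def demo(claim: Callable[[Shared, str, List[str]], Task]) -> List[str]:
    """Run two copies of a claiming task against one shared flag."""
    shared: Shared = {"busy": False}
    log: List[str] = []
    run_round_robin([claim(shared, "A", log), claim(shared, "B", log)])
    return log


if __name__ == "__main__":
    print("non-atomic:", demo(non_atomic_claim))  # ['A', 'B']: both tasks claim the flag
    print("atomic:    ", demo(atomic_claim))      # ['A']: only the first task claims it
```

The round-robin loop only loosely resembles the 0.1-second scheduler described above. The point is the window between "test" and "set". Once that window is closed, whether by an indivisible instruction or by masking preemption, the second task sees the flag already set.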
had been operating, apparently without incident, since 1983.

On June 3, 1985, the patient was set up for a 10-MeV electron treatment to the clavicle area. When the machine turned on, she felt a tremendous force [...] "You burned me." The technician replied that that was not possible. Although there were no marks on the patient at the time, the treatment area felt "warm to the touch."

It is unclear exactly when AECL learned about this incident. Tim Still, the Kennestone physicist, said that he contacted AECL to ask if the Therac-25 could operate in electron mode without scanning to spread the beam. Three days later, the engineers at AECL called the physicist back to explain that improper scanning was not possible.

Major event time line

1985
JUN 3rd: Marietta, Georgia, overdose.
Later in the month, Tim Still calls AECL and asks if overdose by Therac-25 is possible.
JUL 26th: Hamilton, Ontario, Canada, overdose; AECL notified and de[...] safety.
Independent consultant (for Hamilton Clinic) recommends potentiometer on turntable.
OCT Georgia patient files suit against AECL and hospital.
NOV 8th: Letter from CRPB to AECL asking for additional hardware interlocks and software changes.
DEC Yakima, Washington, clinic overdose.

1986
JAN Attorney for Hamilton clinic requests that potentiometer be installed on turntable.
31st: Letter to AECL from Yakima reporting overdose possibility.
FEB 24th: Letter from AECL to Yakima saying overdose was impossible and no other incidents had occurred.
MAR 21st: Tyler, Texas, overdose. AECL notified; claims overdose impossible and no other accidents had occurred previously. AECL suggests hospital might have an electrical problem.
APR 7th: Tyler machine put back in service after no electrical problem could be found.
11th: Second Tyler overdose. AECL again notified. Software problem found.
15th: AECL files accident report with FDA.
MAY 2nd: FDA declares Therac-25 defective. Asks for CAP (corrective action plan) and proper renotification of Therac-25 users.
JUN 13th: First version of CAP sent to FDA.
JUL 23rd: FDA responds and asks for more information.
AUG First user group meeting.
SEP 26th: AECL sends FDA additional information.
OCT 30th: FDA requests more information.
NOV 12th: AECL submits revision of CAP.
DEC Therac-20 users notified of a software bug.
11th: FDA requests further changes to CAP.
22nd: AECL submits second revision of CAP.

1987
JAN 17th: Second overdose at Yakima.
26th: AECL sends FDA its revised test plan.
FEB Hamilton clinic investigates first accident and concludes there was an overdose.
3rd: AECL announces changes to Therac-25.
10th: FDA sends notice of adverse findings to AECL declaring Therac-25 defective under US law and asking AECL to notify customers that it should not be used for routine therapy. Health Protection Branch of Canada does the same thing. This lasts until August 1987.
MAR Second user group meeting.
5th: AECL sends third revision of CAP to FDA.
APR 9th: FDA responds to CAP and asks for additional information.
MAY 1st: AECL sends fourth revision of CAP to FDA.
26th: FDA approves CAP subject to final testing and safety analysis.
JUN 5th: AECL sends final test plan and draft safety analysis to FDA.
JUL Third user group meeting.
21st: Fifth (and final) revision of CAP sent to FDA.

1988
JAN 29th: Interim safety analysis report issued.
NOV 3rd: Final safety analysis report issued.

In an August 19, 1986, letter from AECL to the FDA, the AECL quality assurance manager said, "In March of 1986, AECL received a lawsuit from the patient involved... This incident was never reported to AECL prior to this date, although some rather odd questions had been posed by Tim Still, the hospital physicist." The physicist at a hospital in Tyler, Texas, where a later accident occurred, reported, "According to Tim Still, the patient filed suit in October 1985 listing the hospital, manufacturer, and service organization responsible for the machine. AECL was notified informally about the suit by the hospital, and AECL received official notification of a lawsuit in November 1985."

Because of the lawsuit (filed on November 13, 1985), some AECL administrators must have known about the Marietta accident, although no investigation occurred at this time. Further comments by FDA investigators point to the lack of a mechanism in AECL to follow up reports of suspected accidents. The lack of follow-up in this case appears to be evidence of such a problem in the organization.

The patient went home, but shortly afterward she developed a reddening and swelling in the center of the treatment area. Her pain had increased to the point that her shoulder "froze" and she experienced spasms. She was admitted to West Paces Ferry Hospital in Atlanta, but her oncologists continued to send her to Kennestone for Therac-25 treatments. Clinical explanation was
sought for the reddening of the skin, consequences. H ealth-care profession pital service technician was called. The
which at first h er oncologist attributed als and in stitutions were not required to technician found nothing wrong with
to her disease or to normal treatment report incidents to manufacturers. (The the machine. This also was not an un
reaction. law was amended i n 1 990 to require usual scenario, according to a Therac-
About two weeks later, the physicist health-care facilities to report i n cidents 25 operator.
at Kenncstone noticed that the patient to the manufacturer and the FDA.) The After the treatment, the patient com
had a matching reddening on hcr back compt roller general of the US Govern plained of a burning sensation, described
as though a burn had gone through her ment Accounting Office, in testimony as an "electric tingling shock" to the
body. and the swollen area had hegun to bef ore Congress on :-Iovember 6, 1 989, treatment area in her hip. Six other
slough off layers of skin. Hcr shoulder expressed great concern about the via patients were treated later that day with
was immobile, and she was apparently bility of the incident-reporting regula out incident. The patient came back for
in great pain. It was obvious that she tions in preventing or spotting medical further treatment on July 29 and com
had a radiation burn , but the hospital device problems . According to a GAO plained of burning, hip pain, and exces
and her doctors could provide no satis study, the FDA knows of less than 1 sive swelling in the region of treatment.
factory explanation. Shortly afterward, percent of deaths, serious injuries, or The machine was taken out of service,
she ini tiated a lawsuit again st the hospi equipment malfunctions that occur in as radiation overexposure was suspect
tal and AECL regarding her injury. hospitals.' ed. The patient was hospitalized for the
The Kennestone p hysicist later esti At this point , the othcr Therac-25 condition on July 30. AECL was in
mated that she received one or two dos users were unaware that anything unto formed of the apparent radiation injury
es of radiation in the 1 5 ,000- to 20,000- ward had occurred and did not learn and sent a service engineer to investi
rad (radiation absorbed dose) range. about any problems with the machi n e gate. The FDA, the then-Canadian Ra
He docs not believe her injury could until after subsequent accidents. Even diation Protection Bureau (CRPB), and
have been caused by less than 8,000 then, most of their information came the users were informed that there was
rads. I'ypical single therapeutic doses through personal communica tion among a proble m , although the uscrs claim that
are in the 200-rad range. Doses of 1 .000 themselves. they were ne ver informed that a patient
rads can be fatal if delivered to the injury had occurred. (On April 1. 1 986,
whole body; in fact. the accepted figure Ontario Cancer Foundation, 1985. The the CRPB and the Bureau of Medical
for whole-body radiation that will cause second in this series of accidents oc Devices were merged to form the Bu
death in 50 percent of the cases is 500 curred at this Hamilton, Ontario, Can reau of Radiation and Medical Devices
rads. The consequences of an overdose ada, clinic about seven weeks after the or BRMD.) G sers were told that they
to a smaller part ot the body depend on Kennestone patient was overdosed. At should visually confirm the turntable
the tissuc's radiosen s itivity. The direc thai time, the Therac-25 at the Hamil alignment until further notice (which
tor of radiation oncology at the Kenne ton clinic had been in use for more than occurred three months later).
stone facility explained their confusion six months. On July 26, 1 985 , a 40-year The patient died on November 3. 1 985,
about the accident as due to the fact that old patient came to the clinic for her of an extremely virulent cancer. An
they had never seen an overtreatmcnt 24th Therac-25 tre a t m ent [or carcino autopsy reveitled the cause of death as
of that magnitude before. m a of the cervix. The operator activat the cancer. but it was noted that had she
Eventually. the patient ' s breast had ed the machine, but the Therac shut not died. a total hip re p l ace m e nt would
to be removed because of the radiation down after five seconds with an " H-tilt " have been necessary as a re sult of the
burns. She completely lost the use of error message. The Therac's dosimetry radiation overexposure . An AECL tech
her shoulder and her arm , and was in system display read "no dose" and indi nician later estimated the patient had
constant pain. She had suffered a seri cated a "treatment pause . " received between 1 3,000 and 1 7,000 mds.
ous radiation burn, but the manufactur Since the machine did not suspend
er and operators of the machine refused and the control display indicated no Jlvlanufaclurrr response. AECL could
to believe that it could have been caused dose was delivered to the patient. the not reproduce the m alfunction that had
by the Therac-25. The treatment pre operator went ahead with a second at occurred, but suspected a transient fail
scription printout feature was disabled tempt at treatment by pr e ssing the "P" ure in the mi croswitch used to deter
at the time of the accident, so there was key (the proceed command), expecting mine turntable position. During the in
no hard copy of the treatment data. The the m achine to deliver the proper dose vestigation of t h e accident, AECL
lawsuit was eventually settled out of this time. This was standard operating hardwired the error conditions they as
court. procedure and . as described in the side sumed were necessary for the malfunc
From what we can dctermine, the ac bar "The operator interrace" on p. 24. tion and. as a result, found some design
cident was not reported to the FDA Therac-25 operators had bccome ac weaknesses and potential mechanical
u ntil ajler the later Tyler accidents in customed to frequent malfunctions that problems involving the turntahle posi
1 986 (described in later sections). The had no u ntoward consequences for the tioning.
reporting regulations for medical de patient. Again, the machine shut down The computer senses and controls
vice incidents at that time applied only in the same manner. The operator re turntable position by reading a 3-bit
to equipment manufacturers and im peated this process four times after the signal about the status of three mi
porters, not users. The regulations re original attempt - the display showing croswitches in the turntable switch as
quired that manufacturers and import "no dose" delivered to the patient each sembly (see the sidehar "Turntable po
ers report deaths, serious injuries, or ti me . After the fifth pause, the machine sitioning" on p. 25 ). Essentially, AECL
malfunctions that could result in those went into treatmen t suspend, and a hos- determined that a I -bit error in the mi-
July 1 993 23
The operator interface
In the main text, we describe changes made as a result and some merely consisted of the word "malfunction" fol
of an FDA recall, and here we describe the operator inter lowed by a number from 1 to 64 denoting an analog/digital
face of the software version used during the accidents. channel number. According to an FDA memorandum writ
The Therac-25 operator controls the machine with a ten after one accident
DEC VT100 terminal. In the general case, the operator po
The operator's manual supplied with the machine does
sitions the patient on the treatment table, manually sets not explain nor even address the malfunction codes. The
the treatment field sizes and gantry rotation, and attaches [Maintenance] Manual lists the various malfunction
accessories to the machine.leaving the treatment room, numbers but gives no explanation. The materials provided
the operator returns to the VT100 console to enter the pa give no indication that these malfunctions could place a
patient at riSk.
tient identification, treatment prescription (including mode,
The program does not advise the operator if a situation
energy level, dose, dose rate, and time), field sizing, gan exists wherein the ion chambers used to monitor the
try rotation, arid accessory data. The system then com patient are saturated, thus are beyond the measurement
pares the manually set values with those entered at the limits of the ins tru ment. This software package does not
appear to contain a safety system to prevent parameters
console. If they match, a "verified" message is displayed
being entered and intermixed that wou ld result in excessive
and treatment is permitted.If they do not matCh, treatment
radiation being delivered to the patient under treatment.
is not allowed to proceed until the mismatch is corrected.
Figure A shows the screen layout. An operator involved in an overdose accident testified
When the system was first built, operators complained that she had become insensitive to machine malfunctions.
that it took too long to enter the treatment plan. In re Malfunction messages were commonplace - most did not
sponse, the manufacturer modified the software before the involve patient safety. Service technicians would fix the
first unit was installed so that, instead 01 reentering the problems or the hospital physicist would realign the ma
data at the keyboard, operators could use a carriage return chine and make it operable again. She said, "It was not
to merely copy the treatment site data.' A quick series of out of the ordinary for something to stop the machine...It
carriage returns would thus complete data entry.This inter would often give a low dose rate in which you would turn
face modification was to figure in several aCCidents. the machine back on...They would give messages of
The Therac-25 could shut down in two ways after it de low dose rate, V-tilt, H-tilt, and other things; I can't re
tected an error condition. One was a treatment suspend, member all the reasons it would stop, but there [were) a
which required a complete machine reset to restart. The lot of them." The operator further testified that during in
other, not so serious, was a treatment pause, which re struction she had been taught that there were "so many
quired only a single-key command to restart the machine. safety mechanisms" that she understood it was virtually
If a treatment pause occurred, the operator could press the impossible to overdose a patient.
UP" key to "proceed" and resume treatment quickly and A radiation therapist at another clinic reported an aver
conveniently. The previous treatment parameters remained age of 40 dose-rate malfunctions, attributed to underdos
in effect, and no reset was required. This convenient and es, occurred on some days.
simple feature could be invoked a maximum of five times
before the machine automatically suspended treatment Reference
and required the operator to perform a system reset. 1. E. Miller. ''The Therac-25 Experience," Proc. Conf. State Redia·
Error messages provided to the operator were cryptic, tion Control Program Directors, 1987.
ACTUAL PRESCRIBED
UNIT RATE/MINUTE o 200
MONITOR UNITS 50 50 200
TIM E (MIN) 0.27 1.00
24 COMPUTER
croswitch codes (which could be caused The problem was exacerbated by the The plunger could be extended when
by a single open-circuit fault on the design of the mechanism that extends a the turntable was way out of position,
switch lines) could produce an ambigu plunger to lock the turntable when it is thus giving a second false position indi
ous position message for the computer. in one of the three cardinal positions: cation. AECL devised a method to indi-
Turntable positioning
The Therac-25 turntable design is important in under hazard of dual-mode machines: If the turntable is in the
standing the accidents. The upper turntable (see Figure wrong position, the beam flattener will not be in place.
8) is a rotating table, as the name implies. The turntable In the Therac-25, the computer is responsible for posi
rotates accessory equipment into the beam path to pro tioning the turntable (and for checking turntable position)
duce two therapeutic modes: electron mode and photon so that a target, flattening filter, and X-ray ion chamber
mode. A third position (called the field-light position) in are directly in the beam path. With the target in the beam
volves no beam at all; it facilitates correct positioning of path, electron bombardment produces X rays. The X-ray
the patient. beam is shaped by the flattening filter and measured by
Proper operation of the Therac-25 is heavily dependent the X-ray ion chamber.
on the turntable position; the accessories appropriate to No accelerator beam is expected in the field-light posi
each mode are physically attached to the turntable. The turntable position is monitored by three microswitches corresponding to the three cardinal turntable positions: electron beam, X ray, and field light. These microswitches are attached to the turntable and are engaged by hardware stops at the appropriate positions. The position of the turntable, sent to the computer as a 3-bit binary signal, is based on which of the three microswitches are depressed by the hardware stops.

The raw, highly concentrated accelerator beam is dangerous to living tissue. In electron therapy, the computer controls the beam energy (from 5 to 25 MeV) and current while scanning magnets spread the beam to a safe, therapeutic concentration. These scanning magnets are mounted on the turntable and moved into proper position by the computer. Similarly, an ion chamber to measure electrons is mounted on the turntable and also moved into position by the computer. In addition, operator-mounted electron trimmers can be used to shape the beam if necessary.

For X-ray therapy, only one energy level is available: 25 MeV. Much greater electron-beam current is required for photon mode (some 100 times greater than that for electron therapy)¹ to produce comparable output. Such a high dose-rate capability is required because a "beam flattener" is used to produce a uniform treatment field. This flattener, which resembles an inverted ice cream cone, is a very efficient attenuator. To get a reasonable treatment dose rate out, a very high input dose rate is required. If the machine produces a photon beam with the beam flattener not in position, a high output dose rate results. This is the basic

[Figure B. Upper turntable assembly. Labels: turntable switch assembly, switch actuators, X-ray target, electron mode scan magnet.]

tion. A stainless steel mirror is placed in the beam path and a light simulates the beam. This lets the operator see precisely where the beam will strike the patient and make necessary adjustments before treatment starts. There is no ion chamber in place at this turntable position, since no beam is expected.

Traditionally, electromechanical interlocks have been used on these types of equipment to ensure safety; in this case, to ensure that the turntable and attached equipment are in the correct position when treatment is started. In the Therac-25, software checks were substituted for many traditional hardware interlocks.

Reference
1. J.A. Rawlinson, "Report on the Therac-25," OCTRF/OCI Physicists Meeting, Kingston, Ont., Canada, May 7, 1987.
cate turntable position that tolerated a 1-bit error: The code would still unambiguously reveal correct position with any one microswitch failure.

In addition, AECL altered the software so that the computer checked for "in transit" status of the switches to keep further track of the switch operation and the turntable position, and to give additional assurance that the switches were working and the turntable was moving.

As a result of these improvements, AECL claimed in its report and correspondence with hospitals that "analysis of the hazard rate of the new solution indicates an improvement over the old system by at least five orders of magnitude." A claim that safety had been improved by five orders of magnitude seems exaggerated, especially given that in its final incident report to the FDA, AECL concluded that it "cannot be firm on the exact cause of the accident but can only suspect. . ." This underscores the company's inability to determine the cause of the accident with any certainty. The AECL quality assurance manager testified that AECL could not reproduce the switch malfunction and that testing of the microswitch was "inconclusive." The similarity of the errant behavior and the injuries to patients in this accident and a later one in Yakima, Washington (attributed to software error), provides good reason to believe that the Hamilton overdose was probably related to software error rather than to a microswitch failure.

Government and user response. The Hamilton accident resulted in a voluntary recall by AECL, and the FDA termed it a Class II recall. Class II means "a situation in which the use of, or exposure to, a violative product may cause temporary or medically reversible adverse health consequences or where the probability of serious adverse health consequences is remote." Four users in the US were advised by a letter from AECL on August 1, 1985, to visually check the ionization chamber to make sure it was in its correct position in the collimator opening before any treatment and to discontinue treatment if they got an H-tilt message with an incorrect dose indicated. The letter did not mention that a patient injury was involved. The FDA audited AECL's subsequent modifications. After the modifications, the users were told that they could return to normal operating procedures.

As a result of the Hamilton accident, the head of advanced X-ray systems in the CRPB, Gordon Symonds, wrote a report that analyzed the design and performance characteristics of the Therac-25 with respect to radiation safety. Besides citing the flawed microswitch, the report faulted both hardware and software components of the Therac's design. It concluded with a list of four modifications to the Therac-25 necessary for minimum compliance with Canada's Radiation Emitting Devices (RED) Act. The RED law, enacted in 1971, gives government officials power to ensure the safety of radiation-emitting devices.

The modifications recommended in the Symonds report included redesigning the microswitch and changing the way the computer handled malfunction conditions. In particular, treatment was to be terminated in the event of a dose-rate malfunction, giving a treatment "suspend." This would have removed the option to proceed simply by pressing the "P" key. The report also made recommendations regarding collimator test procedures and message and command formats. A November 8, 1985 letter signed by Ernest Letourneau, M.D., director of the CRPB, asked that AECL make changes to the Therac-25 based on the Symonds report "to be in compliance with the RED Act."

Although, as noted above, AECL did make the microswitch changes, it did not comply with the directive to change the malfunction pause behavior into treatment suspends, instead reducing the maximum number of retries from five to three. According to Symonds, the deficiencies outlined in the CRPB letter of November 8 were still pending when subsequent accidents five months later changed the priorities. If these later accidents had not occurred, AECL would have been compelled to comply with the requirements in the letter.

Immediately after the Hamilton accident, the Ontario Cancer Foundation hired an independent consultant to investigate. He concluded in a September 1985 report that an independent system (besides the computer) was needed to verify turntable position and suggested the use of a potentiometer. The CRPB wrote a letter to AECL in November 1985 requesting that AECL install such an independent upper collimator positioning interlock on the Therac-25. Also, in January 1986, AECL received a letter from the attorney representing the Hamilton clinic. The letter said there had been continuing problems with the turntable, including four incidents at Hamilton, and requested the installation of an independent system (potentiometer) to verify turntable position. AECL did not comply: No independent interlock was installed on the Therac-25s at this time.

Yakima Valley Memorial Hospital, 1985. As with the Kennestone overdose, machine malfunction in this accident in Yakima, Washington, was not acknowledged until after later accidents were understood.

The Therac-25 at Yakima had been modified in September 1985 in response to the overdose at Hamilton. During December 1985, a woman came in for treatment with the Therac-25. She developed erythema (excessive reddening of the skin) in a parallel striped pattern at one port site (her right hip) after one of the treatments. Despite this, she continued to be treated by the Therac-25 because the cause of her reaction was not determined to be abnormal until January or February of 1986. On January 6, 1986, her treatments were completed.

The staff monitored the skin reaction closely and attempted to find possible causes. The open slots in the blocking trays in the Therac-25 could have produced such a striped pattern, but by the time the skin reaction had been determined to be abnormal, the blocking trays had been discarded. The blocking arrangement and tray striping orientation could not be reproduced. A reaction to chemotherapy was ruled out because that should have produced reactions at the other ports and would not have produced stripes. When it was discovered that the woman slept with a heating pad, a possible explanation was offered on the basis of the parallel wires that deliver the heat in such pads. The staff x-rayed the heating pad and discovered that the wire pattern did not correspond to the erythema pattern on the patient's hip.

The hospital staff sent a letter to AECL on January 31, and they also spoke on the phone with the AECL technical support supervisor. On February 24, 1986, the AECL technical support supervisor sent a written response to the director of radiation therapy at Yakima saying, "After careful consideration, we are of the opinion that this damage could not have been produced by any malfunction of the Therac-25 or by any operator error." The letter goes on to support this opinion by listing two pages of technical reasons why an overdose by the Therac-25 was impossible, along with the additional argument that there have "apparently been no other instances of similar damage to this or other patients." The letter ends, "In closing, I wish to advise that this matter has been brought to the attention of our Hazards Committee, as is normal practice."

The hospital staff eventually ascribed the skin/tissue problem to "cause unknown." In a report written on this first Yakima incident after another Yakima overdose a year later (described in a later section), the medical physicist involved wrote:

At that time, we did not believe that [the patient] was overdosed because the manufacturer had installed additional hardware and software safety devices to the accelerator.

In a letter from the manufacturer dated 16-Sep-85, it is stated that "Analysis of the hazard rate resulting from these modifications indicates an improvement of at least five orders of magnitude"! With such an improvement in safety (10,000,000 percent) we did not believe that there could have been any accelerator malfunction. These modifications to the accelerator were completed on 5,6-Sep-85.

Even with fairly sophisticated physics support, the hospital staff, as users, did not have the ability to investigate the possibility of machine malfunction further. They were not aware of any other incidents, and, in fact, were told that there had been none, so there was no reason for them to pursue the matter. However, it seems that the fact that three similar incidents had occurred with this equipment should have triggered some suspicion and investigation by the manufacturer and the appropriate government agencies. This assumes, of course, that these incidents were all reported and known by AECL and by the government regulators. If they were not, then it is appropriate to ask why they were not and how this could be remedied in the future.

About a year later (in February 1987), after the second Yakima overdose led the hospital staff to suspect that the first injury had been due to a Therac-25 fault, the staff investigated and found that this patient had a chronic skin ulcer, tissue necrosis (death) under the skin, and was in constant pain. This was surgically repaired, skin grafts were made, and the symptoms relieved. The patient is alive today, with minor disability and some scarring related to the overdose. The hospital staff concluded that the dose accidentally delivered to this patient must have been much lower than in the second accident, as the reaction was significantly less intense and necrosis did not develop until six to eight months after exposure. Some other factors related to the place on the body where the overdose occurred also kept her from having more significant problems as a result of the exposure.

East Texas Cancer Center, March 1986. More is known about the Tyler, Texas, accidents than the others because of the diligence of the Tyler hospital physicist, Fritz Hager, without whose efforts the understanding of the software problems might have been delayed even further.

The Therac-25 was at the East Texas Cancer Center (ETCC) for two years before the first serious accident occurred; during that time, more than 500 patients had been treated. On March 21, 1986, a male patient came into ETCC for his ninth treatment on the Therac-25, one of a series prescribed as follow-up to the removal of a tumor from his back.

The patient's treatment was to be a 22-MeV electron-beam treatment of 180 rads over a 10 x 17-cm field on the upper back and a little to the left of his spine, or a total of 6,000 rads over a period of 6 1/2 weeks. He was taken into the treatment room and placed face down on the treatment table. The operator then left the treatment room, closed the door, and sat at the control terminal.

The operator had held this job for some time, and her typing efficiency had increased with experience. She could quickly enter prescription data and change it conveniently with the Therac's editing features. She entered the patient's prescription data quickly, then noticed that for mode she had typed "x" (for X ray) when she had intended "e" (for electron). This was a common mistake since most treatments involved X rays, and she had become accustomed to typing this. The mistake was easy to fix; she merely used the cursor up key to edit the mode entry.

Since the other parameters she had entered were correct, she hit the return key several times and left their values unchanged. She reached the bottom of the screen where a message indicated that the parameters had been "verified" and the terminal displayed "beam ready," as expected. She hit the one-key command B (for "beam on") to begin the treatment. After a moment, the machine shut down and the console displayed the message "Malfunction 54." The machine also displayed a "treatment pause," indicating a problem of low priority (see the operator interface sidebar). The sheet on the side of the machine explained that this malfunction was a "dose input 2" error. The ETCC did not have any other information available in its instruction manual or other Therac-25 documentation to explain the meaning of Malfunction 54. An AECL technician later testified that "dose input 2" meant that a dose had been delivered that was either too high or too low.

The machine showed a substantial underdose on its dose monitor display: 6 monitor units delivered, whereas the operator had requested 202 monitor units. The operator was accustomed to the quirks of the machine, which would frequently stop or delay treatment. In the past, the only consequences had been inconvenience. She immediately took the normal action when the machine merely paused, which was to hit the "P" key to proceed with the treatment. The machine promptly shut down with the same "Malfunction 54" error and the same underdose shown by the display terminal.

The operator was isolated from the patient, since the machine apparatus was inside a shielded room of its own. The only way the operator could be alerted to patient difficulty was through audio and video monitors. On this day, the video display was unplugged and the audio monitor was broken.

After the first attempt to treat him, the patient said that he felt like he had received an electric shock or that someone had poured hot coffee on his back: He felt a thump and heat and heard a buzzing sound from the equipment. Since this was his ninth treatment, he knew that this was not normal. He began to get up from the treatment table to go for
help. It was at this moment that the operator hit the "P" key to proceed with the treatment. The patient said that he felt like his arm was being shocked by electricity and that his hand was leaving his body. He went to the treatment room door and pounded on it. The operator was shocked and immediately opened the door for him. He appeared shaken and upset.

The patient was immediately examined by a physician, who observed intense erythema over the treatment area, but suspected nothing more serious than electric shock. The patient was discharged with instructions to return if he suffered any further reactions. The hospital physicist was called in, and he found the machine calibration within specifications. The meaning of the malfunction message was not understood. The machine was then used to treat patients for the rest of the day.

In actuality, but unknown to anyone at that time, the patient had received a massive overdose, concentrated in the center of the treatment area. After-the-fact simulations of the accident revealed possible doses of 16,500 to 25,000 rads in less than 1 second over an area of about 1 cm.

During the weeks following the accident, the patient continued to have pain in his neck and shoulder. He lost the function of his left arm and had periodic bouts of nausea and vomiting. He was eventually hospitalized for radiation-induced myelitis of the cervical cord causing paralysis of his left arm and both legs, left vocal cord paralysis (which left him unable to speak), neurogenic bowel and bladder, and paralysis of the left diaphragm. He also had a lesion on his left lung and recurrent herpes simplex skin infections. He died from complications of the overdose five months after the accident.

User and manufacturer response. The Therac-25 was shut down for testing the day after this accident. One local AECL engineer and one from the home office in Canada came to ETCC to investigate. They spent a day running the machine through tests but could not reproduce a Malfunction 54. The AECL home office engineer reportedly explained that it was not possible for the Therac-25 to overdose a patient. The ETCC physicist claims that he asked AECL at this time if there were any other reports of radiation overexposure and that the AECL personnel (including the quality assurance manager) told him that AECL knew of no accidents involving radiation overexposure by the Therac-25. This seems odd since AECL was surely at least aware of the Hamilton accident that had occurred seven months before and the Yakima accident, and, even by its own account, AECL learned of the Georgia lawsuit about this time (the suit had been filed four months earlier). The AECL engineers then suggested that an electrical problem might have caused this accident.

The electric shock theory was checked out thoroughly by an independent engineering firm. The final report indicated that there was no electrical grounding problem in the machine, and it did not appear capable of giving a patient an electrical shock. The ETCC physicist checked the calibration of the Therac-25 and found it to be satisfactory. The center put the machine back into service on April 7, 1986, convinced that it was performing properly.

East Texas Cancer Center, April 1986. Three weeks after the first ETCC accident, on Friday, April 11, 1986, another male patient was scheduled to receive an electron treatment at ETCC for a skin cancer on the side of his face. The prescription was for 10 MeV to an area of approximately 7 x 10 cm. The same technician who had treated the first Tyler accident victim prepared this patient for treatment. Much of what follows is from the deposition of the Tyler Therac-25 operator.

As with her former patient, she entered the prescription data and then noticed an error in the mode. Again she used the cursor up key to change the mode from X ray to electron. After she finished editing, she pressed the return key several times to place the cursor on the bottom of the screen. She saw the "beam ready" message displayed and turned the beam on.

Within a few seconds the machine shut down, making a loud noise audible via the (now working) intercom. The display showed Malfunction 54 again. The operator rushed into the treatment room, hearing her patient moaning for help. The patient began to remove the tape that had held his head in position and said something was wrong. She asked him what he felt, and he replied "fire" on the side of his face. She immediately went to the hospital physicist and told him that another patient appeared to have been burned. Asked by the physicist to describe what he had experienced, the patient explained that something had hit him on the side of the face, he saw a flash of light, and he heard a sizzling sound reminiscent of frying eggs. He was very agitated and asked, "What happened to me, what happened to me?"

This patient died from the overdose on May 1, 1986, three weeks after the accident. He had disorientation that progressed to coma, fever to 104 degrees Fahrenheit, and neurological damage. Autopsy showed an acute high-dose radiation injury to the right temporal lobe of the brain and the brain stem.

User and manufacturer response. After this second Tyler accident, the ETCC physicist immediately took the machine out of service and called AECL to alert the company to this second apparent overexposure. The Tyler physicist then began his own careful investigation. He worked with the operator, who remembered exactly what she had done on this occasion. After a great deal of effort, they were eventually able to elicit the Malfunction 54 message. They determined that data-entry speed during editing was the key factor in producing the error condition: If the prescription data was edited at a fast pace (as is natural for someone who has repeated the procedure a large number of times), the overdose occurred.

It took some practice before the physicist could repeat the procedure rapidly enough to elicit the Malfunction 54 message at will. Once he could do this, he set about measuring the actual dose delivered under the error condition. He took a measurement of about 804 rads but realized that the ion chamber had become saturated. After making adjustments to extend his measurement ability, he determined that the dose was somewhere over 4,000 rads.

The next day, an engineer from AECL called and said that he could not reproduce the error. After the ETCC physicist explained that the procedure had to be performed quite rapidly, AECL could finally produce a similar malfunction on its own machine. AECL then set up its own set of measurements to test the dosage delivered. Two days after the accident, AECL said they had measured the dosage (at the center of the field) to be 25,000 rads. An AECL engineer explained that the frying sound heard by the patient was the ion chambers being saturated.

In fact, it is not possible to determine the exact dose each of the accident victims received; the total dose delivered during the malfunction conditions was found to vary enormously when different clinics simulated the faults. The number of pulses delivered in the 0.3 second that elapsed before interlock shutoff varied because the software adjusted the start-up pulse-repetition frequency to very different values on different machines. Therefore, there is still some uncertainty as to the doses actually received in the accidents.

In one lawsuit that resulted from the Tyler accidents, the AECL quality control manager testified that a "cursor up" problem had been found in the service mode at the Kennestone clinic and one other clinic in February or March 1985 and also in the summer of 1985. Both times, AECL thought that the software problems had been fixed. There is no way to determine whether there is any relationship between these problems and the Tyler accidents.

Related Therac-20 problems. After the Tyler accidents, Therac-20 users (who had heard informally about the Tyler accidents from Therac-25 users) conducted informal investigations to determine whether the same problem could occur with their machines. As noted earlier, the software for the Therac-25 and Therac-20 both "evolved" from the Therac-6 software. Additional functions had to be added because the Therac-20 (and Therac-25) operates in both X-ray and electron mode, while the Therac-6 has only X-ray mode. The CGR employees modified the software for the Therac-20 to handle the dual modes. When the Therac-25 development began, AECL engineers adapted the software from the Therac-6, but they also borrowed software routines from the Therac-20 to handle electron mode. The agreements between AECL and CGR gave both companies the right to tap technology used in joint products for their other products.

After the second Tyler accident, a physicist at the University of Chicago Joint Center for Radiation Therapy heard about the Therac-25 software problem and decided to find out whether the same thing could happen with the Therac-20. At first, the physicist was unable to reproduce the error on his machine, but two months later he found the link.

The Therac-20 at the University of Chicago is used to teach students in a radiation therapy school conducted by the center. The center's physicist, Frank Borger, noticed that whenever a new class of students started using the Therac-20, fuses and breakers on the machine tripped, shutting down the unit. These failures, which had been occurring ever since the center had acquired the machine, might appear three times a week while new students operated the machine and then disappear for months. Borger determined that new students make lots of different types of mistakes and use "creative methods of editing" parameters on the console. Through experimentation, he found that certain editing sequences correlated with blown fuses and determined that the same computer bug (as in the Therac-25 software) was responsible. The physicist notified the FDA, which notified Therac-20 users.

The software error is just a nuisance on the Therac-20 because this machine has independent hardware protective circuits for monitoring the electron beam scanning. The protective circuits do not allow the beam to turn on, so there is no danger of radiation exposure to a patient. While the Therac-20 relies on mechanical interlocks for monitoring the machine, the Therac-25 relies largely on software.

The software problem. A lesson to be learned from the Therac-25 story is that focusing on particular software bugs is not the way to make a safe system. Virtually all complex software can be made to behave in an unexpected fashion under certain conditions. The basic mistakes here involved poor software-engineering practices and building a machine that relies on the software for safe operation. Furthermore, the particular coding error is not as important as the general unsafe design of the software overall. Examining the part of the code blamed for the Tyler accidents is instructive, however, in showing the overall software design flaws. The following explanation of the problem is from the description AECL provided for the FDA, although we have tried to clarify it somewhat. The description leaves some unanswered questions, but it is the best we can do with the information we have.

As described in the sidebar on Therac-25 software development and design, the treatment monitor task (Treat) controls the various phases of treatment by executing its eight subroutines (see Figure 2). The treatment phase indicator variable (Tphase) is used to determine which subroutine should be executed. Following the execution of a particular subroutine, Treat reschedules itself.

[Figure 2. Tasks and subroutines in the code blamed for the Tyler accidents. Recoverable labels: "When Tphase is 1 (Datent): If data entry complete, set Tphase to 3"; "Mode/energy offset variable"; "Calibration tables".]

One of Treat's subroutines, called Datent (data entry), communicates with the keyboard handler task (a task that runs concurrently with Treat) via a
shared variable ( Data-entry this output table are transferred
completion flag) to determine Datent : to the digital-analog converter
whether the prescription data if mode/energy specified then during the ncxt clock cycle.
has been entered. The key- begin Once the parameters are all
board handler recognizes the calculate table index set, Datent calls the subrou
completion of data entry and repeat tine Magnet, which sets the
changes the Data-entry com- fetch parameter bending magnets, Figure 3 is a
pletion variable to denote this. output parameter simplified pseudocode descrip
Oncc thc Data-entry comple- point to next parameter tion of relevant parts of the
tion variable is set. the Datent until all parameters set software.
subroutine detects the vari- call Magnet Setting the bending magnets
able 's change in status and if mode/energy changed then return takes about 8 seconds. Magnet
changes the value of Tphase end calls a subroutine called Ptime
from 1 (Data Entry) to 3 (Set if data entry is complete then set Tphase to 3 to introduce a time delay, Since
Up Test) . I n this case, the if data entry is not complet e then several magncts need to be set,
Datent subroutine exits back if reset command entered then set Tphase to 0 Ptime is entered and exited
to the Treat subroutine, which return several times. A flag to indi-
will reschedule itself and be cate that bending magnets are
gin execution of the Set-Up Magnet : being set is initialized upon
Test subroutine. If the Data Set bending magnet flag entry to the Magnet subrou
entry completion variable has repeat tine and cleared at the end of
not been set. Datent leaves the Set next magnet Pti m e . Furthermore. Ptime
value ofTphase unchanged and Call Ptime checks a shared variable, set
exits hack to Treat's main line. if mode/energy has changed, then exit by the keyboard handler, that
Treat will then reschedule it until all magnets ar e set indicates the presence of any
self. essentially rescheduling return editing requests. If there are
the Datent suhroutine. edits, then Ptime clears the
The command line at the Ptime: bending magnet variable and
lower right corner of the screen repeat exits to Magnet, which then
is the cursor's normal position if bending magnet flag is set then exits to Datent. But the edit
when the op e rator has com- if editing taking place then change variable is checked by
pleted all necessary changes if mode/energy has changed then exit Ptime only if the bending mag
to the prescription. Prescrip until hysteresis delay has expired net flag is set. Since Ptimeclears
tion editing is signified by cur Clear bending magnet flag it during its first execution. any
sor movement off the com return edits performed during each
mand line. As the program was succeeding pass through Ptime
originally designed, the Data Figure 3. Datent, Magnet, and Ptime s ubroutines. will not be recognized. Thus,
entry completion variable by an edit change of the mode or
itself is not sufficient since it cnergy, although reflected on
does not ensure that the cursor is locat separately. If the keyboard handler sets the operator's screen and the mode/
e d on the command line. Under the the data-entry completion variable be energy offset variable, will not be sensed
right circumstances, the data-entry phase fore the operator changes the data in by D atent so it can index the appropri
can be exited before all edit changes are made on the screen.

The keyboard handler parses the mode and energy level specified by the operator and places an encoded result in another shared variable, the 2-byte mode/energy offset (MEOS) variable. The low-order byte of this variable is used by another task (Hand) to set the collimator/turntable to the proper position for the selected mode/energy. The high-order byte of the MEOS variable is used by Datent to set several operating parameters.

Initially, the data-entry process forces the operator to enter the mode and energy, except when the operator selects the photon mode, in which case the energy defaults to 25 MeV. The operator can later edit the mode and energy [...]

[...] MEOS, Datent will not detect the changes in MEOS since it has already exited and will not be reentered again. The upper collimator, on the other hand, is set to the position dictated by the low-order byte of MEOS by another concurrently running task (Hand) and can therefore be inconsistent with the parameters set in accordance with the information in the high-order byte of MEOS. The software appears to include no checks to detect such an incompatibility.

The first thing that Datent does when it is entered is to check whether the mode/energy has been set in MEOS. If so, it uses the high-order byte to index into a table of preset operating parameters and places them in the digital-to-analog output table. The contents of [...]

[...]ate calibration tables for the machine parameters.

Recall that the Tyler error occurred when the operator made an entry indicating the mode/energy, went to the command line, then moved the cursor up to change the mode/energy, and returned to the command line all within 8 seconds. Since the magnet setting takes about 8 seconds and Magnet does not recognize edits after the first execution of Ptime, the editing had been completed by the return to Datent, which never detected that it had occurred. Part of the problem was fixed after the accident by clearing the bending-magnet variable at the end of Magnet (after all the magnets have been set) instead of at the end of Ptime.

But this was not the only problem.
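The MEOS hazard described above can be illustrated with a small Python model. It is a sketch under stated assumptions, not AECL's code: the byte encodings (`XRAY`, `ELECTRON`), the contents of the `PARAMS` and `TURNTABLE` tables, and all function and field names are hypothetical. The model follows the text in three respects: Datent uses the high-order byte once and is not reentered, Hand positions the collimator/turntable from the low-order byte, and nothing cross-checks the two. The `consistent` method is a hypothetical check of the kind the text notes was missing.

```python
from dataclasses import dataclass

# Hypothetical byte encodings; the real MEOS values are not given in the text.
XRAY, ELECTRON = 0x01, 0x02
PARAMS = {XRAY: "x-ray operating parameters", ELECTRON: "electron operating parameters"}
TURNTABLE = {XRAY: "x-ray position", ELECTRON: "electron position"}


def pack_meos(mode: int) -> int:
    """Encode one mode selection into both bytes of the 2-byte MEOS value."""
    return ((mode & 0xFF) << 8) | (mode & 0xFF)


@dataclass
class Machine:
    meos: int = 0
    datent_exited: bool = False  # Datent is not reentered once it has exited
    dac_table: str = ""          # loaded once by Datent from the high-order byte
    turntable: str = ""          # updated by Hand from the low-order byte

    def datent(self) -> None:
        # Index the preset-parameter table with the high-order byte, then exit.
        if not self.datent_exited:
            self.dac_table = PARAMS[self.meos >> 8]
            self.datent_exited = True

    def hand(self) -> None:
        # Hand runs concurrently and keeps following the low-order byte.
        self.turntable = TURNTABLE[self.meos & 0xFF]

    def consistent(self) -> bool:
        # Hypothetical cross-check: do the loaded parameters match MEOS now?
        return self.dac_table == PARAMS[self.meos >> 8]


m = Machine(meos=pack_meos(XRAY))
m.datent()
m.hand()
m.meos = pack_meos(ELECTRON)  # operator edits the mode after Datent has exited
m.datent()                    # no effect: Datent has already exited
m.hand()
print(m.dac_table, "|", m.turntable, "|", m.consistent())
# prints: x-ray operating parameters | electron position | False
```

In this model the edit reaches Hand through the low-order byte but never reaches the parameter table that Datent already loaded, so the collimator/turntable position and the operating parameters disagree without anything noticing.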
Upon exit from the Magnet subroutine, the data-entry subroutine (Datent) checks the data-entry completion variable. If it indicates that data entry is complete, Datent sets Tphase to 3 and Datent is not entered again. If it is not set, Datent leaves Tphase unchanged, which means it will eventually be rescheduled. But the data-entry completion variable only indicates that the cursor has been down to the command line, not that it is still there. A potential race condition is set up. To fix this, AECL introduced another shared variable controlled by the keyboard handler task that indicates the cursor is not positioned on the command line. If this variable is set, then prescription entry is still in progress and the value of Tphase is left unchanged.

Government and user response. The FDA does not approve each new medical device on the market: All medical devices go through a classification process that determines the level of FDA approval necessary. Medical accelerators follow a procedure called pre-market notification before commercial distribution. In this process, the firm must establish that the product is substantially equivalent in safety and effectiveness to a product already on the market. If that cannot be done to the FDA's satisfaction, a pre-market approval is required. For the Therac-25, the FDA required only a pre-market notification.

The agency is basically reactive to problems and requires manufacturers to report serious ones. Once a problem is identified in a radiation-emitting product, the FDA must approve the manufacturer's corrective action plan (CAP).

The first reports of the Tyler accidents came to the FDA from the state of Texas health department, and this triggered FDA action. The FDA investigation was well under way when AECL produced a medical device report to discuss the details of the radiation overexposures at Tyler. The FDA declared the Therac-25 defective under the Radiation Control for Health and Safety Act and ordered the firm to notify all purchasers, investigate the problem, determine a solution, and submit a corrective action plan for FDA approval.

The final CAP consisted of more than 20 changes to the system hardware and software, plus modifications to the system documentation and manuals. Some of these changes were unrelated to the specific accidents, but were improvements to the general machine safety. The full implementation of the CAP, including an extensive safety analysis, was not complete until more than two years after the Tyler accidents.

AECL made its accident report to the FDA on April 15, 1986. On that same date, AECL sent a letter to each Therac user recommending a temporary "fix" to the machine that would allow continued clinical use. The letter (shown in its complete form) read as follows:

SUBJECT: CHANGE IN OPERATING PROCEDURES FOR THE THERAC25 LINEAR ACCELERATOR

Effective immediately, and until further notice, the key used for moving the cursor back through the prescription sequence (i.e., cursor "UP" inscribed with an upward pointing arrow) must not be used for editing or any other purpose.

To avoid accidental use of this key, the key cap must be removed and the switch contacts fixed in the open position with electrical tape or other insulating material. For assistance with the latter you should contact your local AECL service representative.

Disabling this key means that if any prescription data entered is incorrect then [an] "R" reset command must be used and the whole prescription reentered.

For those users of the Multiport option, it also means that editing of dose rate, dose, and time will not be possible between ports.

On May 2, 1986, the FDA declared the Therac defective, demanded a CAP, and required renotification of all the Therac customers. In the letter from the FDA to AECL, the director of compliance, Center for Devices and Radiological Health, wrote

We have reviewed Mr. Downs' April 15 letter to purchasers and have concluded that it does not satisfy the requirements for notification to purchasers of a defect in an electronic product. Specifically, it does not describe the defect nor the hazards associated with it. The letter does not provide any reason for disabling the cursor key and the tone is not commensurate with the urgency for doing so. In fact, the letter implies the inconvenience to operators outweighs the need to disable the key. We request that you immediately renotify purchasers.

AECL promptly made a new notice to users and also requested an extension to produce a CAP. The FDA granted this request.

About this time, the Therac-25 users created a user group and held their first meeting at the annual conference of the American Association of Physicists in Medicine. At the meeting, users discussed the Tyler accident and heard an AECL representative present the company's plans for responding to it. AECL promised to send a letter to all users detailing the CAP.

Several users described additional hardware safety features that they had added to their own machines to provide additional protection. An interlock (that checked gun current values), which the Vancouver clinic had previously added to its Therac-20, was labeled as redundant by AECL. The users disagreed. There were further discussions of poor design and other problems that caused 10- to 30-percent underdosing in both modes.

The meeting notes said

. . . there was a general complaint by all users present about the lack of information propagation. The users were not happy about receiving incomplete information. The AECL representative countered by stating that AECL does not wish to spread rumors and that AECL has no policy to "keep things quiet." The consensus among the users was that an improvement was necessary.

After the first user group meeting, there were two user group newsletters. The first, dated fall 1986, contained letters from Still, the Kennestone physicist, who complained about what he considered to be eight major problems he had experienced with the Therac-25. These problems included poor screen refresh subroutines that left trash and erroneous information on the operator console, and some tape-loading problems upon start-up, which he discovered involved the use of "phantom tables" to trigger the interlock system in the event of a load failure instead of using a checksum. He asked the question, "Is programming safety relying too much on the software interlock routines?" The second user group newsletter, in December 1986, further discussed the implications of the "phantom table" parameterization.

AECL produced the first CAP on June 13, 1986. It contained six items:

(1) Fix the software to eliminate the specific behavior leading to the Tyler problem.
(2) Modify the software sample-and-hold circuits to detect one pulse above a nonadjustable threshold. The software
sample-and-hold circuit monitors the magnitude of each pulse from the ion chambers in the beam. Previously, three consecutive high readings were required to shut off the high-voltage circuits, which resulted in a shutdown time of 300 ms. The software modification results in a reading after each pulse, and a shutdown after a single high reading.
(3) Make Malfunctions 1 through 64 result in treatment suspend rather than pause.
(4) Add a new circuit, which only administrative staff can reset, to shut down the modulator if the sample-and-hold circuits detect a high pulse. This is functionally equivalent to the circuit described in item 2. However, a new circuit board is added that monitors the five sample-and-hold circuits. The new circuit detects ion-chamber signals above a fixed threshold and inhibits the trigger to the modulator after detecting a high pulse. This shuts down the beam independently of the software.
(5) Modify the software to limit editing keys to cursor up, backspace, and return.
(6) Modify the manuals to reflect the changes.

FDA internal memos describe their immediate concerns regarding the CAP. One memo suggests adding an independent circuit that "detects and shuts down the system when inappropriate outputs are detected," warnings about when ion chambers are saturated, and understandable system error messages. Another memo questions "whether all possible hardware options have been investigated by the manufacturer to prevent any future inadvertent high exposure."

On July 23 the FDA officially responded to AECL's CAP submission. They conceptually agreed with the plan's direction but complained about the lack of specific information necessary to evaluate the plan, especially with regard to the software. The FDA requested a detailed description of the software development procedures and documentation, along with a revised CAP to include revised requirements documents, a detailed description of corrective changes, analysis of the interactions of the modified software with the system, and detailed descriptions of the revised edit modes, the changes made to the software setup table, and the software interlock interactions. The FDA also made a very detailed request for a documented test plan.

AECL responded on September 26 with several documents describing the software and its modifications but no test plan. They explained how the Therac-25 software evolved from the Therac-6 software and stated that "no single test plan and report exists for the software since both hardware and software were tested and exercised separately and together over many years." AECL concluded that the current CAP improved "machine safety by many orders of magnitude and virtually eliminates the possibility of lethal doses as delivered in the Tyler incident."

An FDA internal memo dated October 20 commented on these AECL submissions, raising several concerns:

Unfortunately, the AECL response also seems to point out an apparent lack of documentation on software specifications and a software test plan.
. . . concerns include the question of previous knowledge of problems by AECL, the apparent paucity of software QA [quality assurance] at the manufacturing facility, and possible warnings and information dissemination to others of the generic type problems.
. . . As mentioned in my first review, there is some confusion on whether the manufacturer should have been aware of the software problems prior to the [accidental radiation overdoses] in Texas. AECL had received official notification of a lawsuit in November 1985 from a patient claiming accidental over-exposure from a Therac-25 in Marietta, Georgia. . . . If knowledge of these software deficiencies were known beforehand, what would be the FDA's posture in this case?
. . . The materials submitted by the manufacturer have not been in sufficient detail and clarity to ensure an adequate software QA program currently exists. For example, a response has not been provided with respect to the software part of the CAP to the CDRH [FDA Center for Devices and Radiological Health] request for documentation on the revised requirements and specifications for the new software. In addition, an analysis has not been provided, as requested, on the interaction with other portions of the software to demonstrate the corrected software does not adversely affect other software functions.
The July 23 letter from the CDRH requested a documented test plan including several specific pieces of information identified in the letter. This request has been ignored up to this point by the manufacturer. Considering the ramifications of the current software problem, changes in software QA attitudes are needed at AECL.

On October 30, the FDA responded to AECL's additional submissions, complaining about the lack of a detailed description of the accident and of sufficient detail in flow diagrams. Many specific questions addressed the vagueness of the AECL response and made it clear that additional CAP work must precede approval.

AECL, in response, created CAP Revision 1 on November 12. This CAP contained 12 new items under "software modifications," all (except for one cosmetic change) designed to eliminate potentially unsafe behavior. The submission also contained other relevant documents including a test plan.

The FDA responded to CAP Revision 1 on December 11. The FDA explained that the software modifications appeared to correct the specific deficiencies discovered as a result of the Tyler accidents. They agreed that the major items listed in CAP Revision 1 would improve the Therac's operation. However, the FDA required AECL to attend to several further system problems before CAP approval. AECL had proposed to retain treatment pause for some dose-rate and beam-tilt malfunctions. Since these are dosimetry system problems, the FDA considered them safety interlocks and believed treatment must be suspended for these malfunctions.

AECL also planned to retain the malfunction codes, but the FDA required better warnings for the operators. Furthermore, AECL had not planned on any quality assurance testing to ensure exact copying of software, but the FDA insisted on it. The FDA further requested assurances that rigorous testing would become a standard part of AECL's software-modification procedures:

We also expressed our concern that you did not intend to perform the protocol to future modifications to software. We
believe that the rigorous testing must be performed each time a modification is made in order to ensure the modification does not adversely affect the safety of the system.

AECL was also asked to draw up an installation test plan to ensure both hardware and software changes perform as designed when installed.

AECL submitted CAP Revision 2 and supporting documentation on December 22, 1986. They changed the CAP to have dose malfunctions suspend treatment and included a plan for meaningful error messages and highlighted dose error messages. They also expanded diagrams of software modifications and expanded the test plan to cover hardware and software.

On January 26, 1987, AECL sent the FDA their "Component and Installation Test Plan" and explained that their delays were due to the investigation of a new accident on January 17 at Yakima.

Yakima Valley Memorial Hospital, 1987. On Saturday, January 17, 1987, the second patient of the day was to be treated at the Yakima Valley Memorial Hospital for a carcinoma. This patient was to receive two film-verification exposures of 4 and 3 rads, plus a 79-rad photon treatment (for a total exposure of 86 rads).

Film was placed under the patient and 4 rads was administered with the collimator jaws opened to 22 x 18 cm. After the machine paused, the collimator jaws opened to 35 x 35 cm automatically, and the second exposure of 3 rads was administered. The machine paused again.

The operator entered the treatment room to remove the film and verify the patient's precise position. He used the hand control in the treatment room to rotate the turntable to the field-light position, a feature that let him check the machine's alignment with respect to the patient's body to verify proper beam position. The operator then either pressed the set button on the hand control or left the room and typed a set command at the console to return the turntable to the proper position for treatment; there is some confusion as to exactly what transpired. When he left the room, he forgot to remove the film from underneath the patient. The console displayed "beam ready," and the operator hit the "B" key to turn the beam on.

The beam came on but the console displayed no dose or dose rate. After 5 or 6 seconds, the unit shut down with a pause and displayed a message. The message "may have disappeared quickly"; the operator was unclear on this point. However, since the machine merely paused, he was able to push the "P" key to proceed with treatment.

The machine paused again, this time displaying "flatness" on the reason line. The operator heard the patient say something over the intercom, but couldn't understand him. He went into the room to speak with the patient, who reported "feeling a burning sensation" in the chest. The console displayed only the total dose of the two film exposures (7 rads) and nothing more.

Later in the day, the patient developed a skin burn over the entire treatment area. Four days later, the redness took on the striped pattern matching the slots in the blocking tray. The striped pattern was similar to the burn a year earlier at this hospital that had been attributed to "cause unknown."

AECL began an investigation, and users were told to confirm the turntable position visually before turning on the beam. All tests run by the AECL engineers indicated that the machine was working perfectly. From the information gathered to that point, it was suspected that the electron beam had come on when the turntable was in the field-light position. But the investigators could not reproduce the fault condition that produced the overdose.

On the following Thursday, AECL sent an engineer from Ottawa to investigate. The hospital physicist had, in the meantime, run some tests with film. He placed a film in the Therac's beam and ran two exposures of X-ray parameters with the turntable in field-light position. The film appeared to match the film that was left (by mistake) under the patient during the accident.

After a week of checking the hardware, AECL determined that the "incorrect machine operation was probably not caused by hardware alone." After checking the software, AECL discovered a flaw (described in the next section) that could explain the erroneous behavior. The coding problems explaining this accident differ from those associated with the Tyler accidents.

AECL's preliminary dose measurements indicated that the dose delivered under these conditions (that is, when the turntable was in the field-light position) was on the order of 4,000 to 5,000 rads. After two attempts, the patient could have received 8,000 to 10,000 instead of the 86 rads prescribed. AECL again called users on January 26 (nine days after the accident) and gave them detailed instructions on how to avoid this problem. In an FDA internal report on the accident, an AECL quality assurance manager investigating the problem is quoted as saying that the software and hardware changes to be retrofitted following the Tyler accident nine months earlier (but which had not yet been installed) would have prevented the Yakima accident.

The patient died in April from complications related to the overdose. He had been suffering from a terminal form of cancer prior to the radiation overdose, but survivors initiated lawsuits alleging that he died sooner than he would have and endured unnecessary pain and suffering due to the overdose. The suit was settled out of court.

The Yakima software problem. The software problem for the second Yakima accident is fairly well established and different from that implicated in the Tyler accidents. There is no way to determine what particular software design errors were related to the Kennestone, Hamilton, and first Yakima accidents. Given the unsafe programming practices exhibited in the code, it is possible that unknown race conditions or errors could have been responsible. There is speculation, however, that the Hamilton accident was the same as this second Yakima overdose. In a report of a conference call on January 26, 1987, between the AECL quality assurance manager and Ed Miller of the FDA discussing the Yakima accident, Miller notes

This situation probably occurred in the Hamilton, Ontario, accident a couple of years ago. It was not discovered at that time and the cause was attributed to intermittent interlock failure. The subsequent recall of the multiple microswitch logic network did not really solve the problem.

The second Yakima accident was again attributed to a type of race condition in the software; this one allowed the device to be activated in an error setting (a "failure" of a software interlock). The Tyler accidents were related to prob-
[...] position check is performed by a subroutine of Hkeper called Lmtchk (analog/digital limit checking). Lmtchk first checks the Class3 variable. If Class3 contains a nonzero value, Lmtchk calls the Check Collimator (Chkcol) subroutine. If Class3 contains zero, Chkcol is bypassed and the upper collimator position check is not performed. The Chkcol subroutine sets or resets bit 9 of the F$mal shared variable, depending on the position of the upper collimator (which in turn is checked by the Set-Up Test subroutine of Datent so it can decide whether to reschedule itself or proceed to Set-Up Done).

[Figure: flowchart with boxes labeled Hkeper, Chkcol ("If upper collimator position inconsistent with treatment then set bit 9 of F$mal"), and F$mal.]
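The Lmtchk/Chkcol gating above can be sketched as a few small Python functions. This is a simplified model rather than AECL's code: the function names follow the text (with `fmal` standing in for F$mal, which is not a valid Python name), but the integer representation of F$mal, the `collimator_ok` flag, the assumption that bit 9 starts clear, and the rule that Set-Up Test proceeds only when bit 9 is clear are illustrative assumptions. `increment_class3` models Class3 as a 1-byte unsigned counter, so incrementing 255 yields 0.

```python
BIT9 = 1 << 9  # F$mal bit 9: upper collimator inconsistent with treatment


def chkcol(fmal: int, collimator_ok: bool) -> int:
    """Return F$mal with bit 9 reset (collimator OK) or set (inconsistent)."""
    return (fmal & ~BIT9) if collimator_ok else (fmal | BIT9)


def lmtchk(fmal: int, class3: int, collimator_ok: bool) -> int:
    """Call Chkcol only if Class3 is nonzero; otherwise F$mal is unchanged."""
    if class3 != 0:
        return chkcol(fmal, collimator_ok)
    return fmal  # Chkcol bypassed: no upper collimator position check


def set_up_test_proceeds(fmal: int) -> bool:
    """Assumed decision rule: proceed to Set-Up Done only if bit 9 is clear."""
    return (fmal & BIT9) == 0


def increment_class3(class3: int) -> int:
    """Assumed 1-byte unsigned increment: 255 wraps around to 0."""
    return (class3 + 1) & 0xFF


# Upper collimator out of position; F$mal bit 9 initially clear (assumption).
for class3 in (1, 255, increment_class3(255)):
    fmal = lmtchk(0, class3, collimator_ok=False)
    print(class3, set_up_test_proceeds(fmal))
# prints three lines: "1 False", "255 False", "0 True"
```

With the collimator out of position, the check fires for any nonzero Class3, but a zero value bypasses Chkcol, bit 9 stays clear, and the modeled Set-Up Test proceeds.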
During machine setup, Set-Up Test will be executed several hundred times since it reschedules itself waiting for other events to occur. In the code, the Class3 variable is incremented by one in each pass through Set-Up Test. Since the Class3 variable is 1 byte, it can only
• a turntable potentiometer to independently monitor turntable position, and
• a hardware turntable interlock circuit.

The second item, a hardware single-pulse shutdown circuit, essentially acts as a hardware interlock to prevent overdosing by detecting an unsafe level of radiation and halting beam output after one pulse of high energy and current. This provides an independent safety mechanism to protect against a wide range of potential hardware failures and software errors. The turntable potentiometer was the safety device recommended by several groups, including the CRPB, after the Hamilton accident.

After the second Yakima accident, the FDA became concerned that the use of the Therac-25 during the CAP process, even with AECL's interim operating instructions, involved too much risk to patients. The FDA concluded that the accidents had demonstrated that the software alone cannot be relied upon to assure safe operation of the machine. In a February 18, 1987 internal FDA memorandum, the director of the Division of Radiological Products wrote the following:

It is impossible for CDRH to find all potential failure modes and conditions of the software. AECL has indicated the "simple software fix" will correct the turntable position problem displayed at Yakima. We have not yet had the opportunity to evaluate that modification. Even if it does, based upon past history, I am not convinced that there are not other software glitches that could result in serious injury.
For example, we are aware that AECL issued a user's bulletin January 21 reminding users of the proper procedure to follow if editing of prescription parameter is desired after entering the "B" (beam on) code but before the CR [carriage return] is pressed. It seems that the normal edit keys (down arrow, right arrow, or line feed) will be interpreted as a CR and initiate exposure. One must use either the backspace or left arrow key to edit.
We are also aware that if the dose entered into the prescription tables is below some preset value, the system will default to a phantom table value unbeknownst to the operator. This problem is supposedly being addressed in proposed interim revision 7A, although we are unaware of the details.
We are in the position of saying that the proposed CAP can reasonably be expected to correct the deficiencies for which they were developed (Tyler). We cannot say that we are [reasonably] confident about the safety of the entire system to prevent or minimize exposure from other fault conditions.

On February 6, 1987, Miller of the FDA called Pavel Dvorak of Canada's Health and Welfare to advise him that the FDA would recommend all Therac-25s be shut down until permanent modifications could be made. According to Miller's notes on the phone call, Dvorak agreed and indicated that they would coordinate their actions with the FDA.

On February 10, 1987, the FDA gave a Notice of Adverse Findings to AECL declaring the Therac-25 to be defective under US law. In part, the letter to AECL reads:

In January 1987, CDRH was advised of another accidental radiation occurrence in Yakima, which was attributed to a second software defect related to the "Set" command. In addition, the CDRH has become aware of at least two other software features that provide potential for unnecessary or inadvertent patient exposure. One of these is related to the method of editing the prescription after the "B" command is entered and the other is the calling of phantom tables when low doses are prescribed.
Further review of the circumstances surrounding the accidental radiation occurrences and the potential for other such incidents has led us to conclude that in addition to the items in your proposed corrective action plan, hardware interlocking of the turntable to insure its proper position prior to beam activation appears to be necessary to enhance system safety and to correct the Therac-25 defect. Therefore, the corrective action plan as currently proposed is insufficient and must be amended to include turntable interlocking and corrections for the three software problems mentioned above.
Without these corrections, CDRH has concluded that the consequences of the defects represents a significant potential risk of serious injury even if the Therac-25 is operated in accordance with your interim operating instructions. CDRH, therefore, requests that AECL immediately notify all purchasers and recommend that use of the device on patients for routine therapy be discontinued until such time that an amended corrective action plan approved by CDRH is fully completed. You may also advise purchasers that if the need for an individual patient treatment outweighs the potential risk, then extreme caution and strict adherence to operating safety procedures must be exercised.

At the same time, the Health Protection Branch of the Canadian government instructed AECL to recommend to all users in Canada that they discontinue the operation of the Therac-25 until "the company can complete an exhaustive analysis of the design and operation of the safety systems employed for patient and operator protection." AECL was told that the letter to the users should include information on how the users can operate the equipment safely in the event that they must continue with patient treatment. If AECL could not provide information that would guarantee safe operation of the equipment, AECL was requested to inform the users that they cannot operate the equipment safely. AECL complied by letters dated February 20, 1987, to Therac-25 purchasers. This recommendation to discontinue use of the Therac-25 was to last until August 1987.

On March 5, 1987, AECL issued CAP Revision 3, which was a CAP for both the Tyler and Yakima accidents. It contained a few additions to the Revision 2 modifications, notably

• changes to the software to eliminate the behavior leading to the latest Yakima accident,
• four additional software functional modifications to improve safety, and
• a turntable position interlock in the software.

In their response on April 9, the FDA noted that in the appendix under "turntable position interlock circuit" the descriptions were wrong. AECL had indicated "high" signals where "low" signals were called for and vice versa. The FDA also questioned the reliability of the turntable potentiometer design and asked whether the backspace key could still act as a carriage return in the edit mode. They requested a detailed description of the software portion of the single-pulse shutdown and a block diagram to demonstrate the PRF (pulse repetition frequency) generator, modulator, and associated interlocks.

AECL responded on April 13 with an update on the Therac CAP status and a schedule of the nine action items pressed by the users at a user group meeting in March. This unique and highly productive meeting provided an unusual opportunity to involve the users in the CAP evaluation process. It brought together all concerned parties in one place so that they could decide on and approve a course of action as quickly as possible. The attendees included representatives from the manufacturer (AECL); all users, including their tech-
Safety analysis of the Therac-25

The Therac-25 safety analysis included (1) failure mode and effect analysis, (2) fault-tree analysis, and (3) software examination.

Failure mode and effect analysis. An FMEA describes the associated system response to all failure modes of the individual system components, considered one by one. When software was involved, AECL made no assessment of the "how and why" of software faults and took any combination of software faults as a single event. The latter means that if the software was the initiating event, then no credit was given for the software mitigating the effects. This seems like a reasonable and conservative approach to handling software faults.

Fault-tree analysis. An FMEA identifies single failures leading to Class I hazards. To identify multiple failures and quantify the results, AECL used fault-tree analysis. An FTA starts with a postulated hazard; for example, two of the top events for the Therac-25 are high dose per pulse and illegal gantry motion. The immediate causes for the event are then generated in an AND/OR tree format, using a basic understanding of the machine operation to determine the causes. The tree generation continues until all branches end in "basic events." Operationally, a basic event is sometimes defined as an event that can be quantified (for example, a resistor fails open).

AECL used a "generic failure rate" of 10⁻⁴ per hour for software events. The company justified this number as based on the historical performance of the Therac-25 software. The final report on the safety analysis said that many fault trees for the Therac-25 have a computer malfunction as a causative event, and the outcome of quantification is therefore dependent on the failure rate chosen for software.

Leaving aside the general question of whether such failure rates are meaningful or measurable for software in general, it seems rather difficult to justify a single figure of this sort for every type of software error or software behavior. It would be equivalent to assigning the same failure rate to every type of failure of a car, no matter what partic[...]

[...] program changes to correct shortcomings, improve reliability, or improve the software package in a general sense. The final safety report gives no information about whether any particular methodology or tools were used in the software inspection or whether someone just read the code looking for errors.

Conclusions of the safety analysis. The final report summarizes the conclusions of the safety analysis:

The conclusions of the analysis call for 10 changes to Therac-25 hardware; the most significant of these are interlocks to back up software control of both electron scanning and beam energy selection.
Although it is not considered necessary or advisable to rewrite the entire Therac-25 software package, considerable effort is being expended to update it. The changes recommended have several distinct objectives: improve the protection it provides against hardware failures; provide additional reliability via cross-checking; and provide a more maintainable source package. Two or three software releases are anticipated before these changes are completed.
The implementation of these improvements including design and testing for both hardware and software is well under way. All hardware modifications should be completed and installed by mid 1989, with final software updates extending into late 1989 or early 1990.

The recommended hardware changes appear to add protection against software errors, to add extra protection against hardware failures, or to increase safety margins.

The software conclusions included the following:

The software code for Beam Shut-Off, Symmetry Control, and Dose Calibration was found to be straight-forward and no execution path could be found which would cause them to perform incorrectly. A few improvements are being incorporated, but no additional hardware interlocks are required.
Inspection of the Scanning and Energy Selection functions, which are under software control, showed no improper execution paths; however, software inspection was unable to provide a high level of confidence in their reliability. This was due to the complex nature of the code, the extensive use of variables, and the time limitations of the inspection process. Due to these factors and the possible clinical consequences of a malfunction, computer-independent interlocks are being retrofitted for these two cases.
ular failure is considered. Given the complex nature of this software design and
The authors of the safety study did note that despite the the basic multitasking design, it is difficult to understand
uncertainty that software introduces into quantification, how any part of the code could be labeled "straightfor
fault-tree analysis provides valuable information in showing ward" o r how confidence could be achieved that "no exe
single and multiple fai l u re paths and the relative i m por cution paths" exist for particular types of software behav
tance of different failure mechanisms. This is certainly true. ior. However, it does appear that a conservative approach
- including computer-independent interlocks - was taken
Software examination. Because of the difficulty of in most cases. Furthermore, few examples of such safety
quantifying software behavior, AECL contracted for a de analyses of software exist in the literature. One such soft
tailed code inspection to "obtain more information on which ware analysis was performed in 1 989 on the shutdown
to base decisions." The software functions selected for ex software of a nuclear power plant, which was written by a
amination were those related to the Class I software haz different division of AECL. ' M uch still needs to be learned
ards identified in the FMEA: electron-beam scanning, ener about how to perform a software-safety analysis.
gy selection, beam shutoff, and dose calibration.
The outside consultant who performed the inspection i n Reference
cluded a detailed examination o f each function's imple
1. W.C. Bowman et aI., " An Application of Fault Tree Analysis to
mentation, a search for coding errors, and a qual itative as
Safety-Critical Software at Ontario Hydro," Cont. Probabilistic
sessment of its reliability. The consultant recommended Safety Assessment and Management. 1 991 .
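The quantification step described under "Fault-tree analysis" can be illustrated with a small Python sketch. It is not AECL's method or tooling. It assumes that basic events are independent and that each appears only once in the tree, and it treats a per-hour rate as the probability of the event during one hour of operation. The tree itself is hypothetical: a "high dose per pulse" top event that occurs only if a software fault in scanning or energy selection coincides with the failure of a computer-independent interlock for that function. Only the 10^-4 software figure comes from the text; the interlock probability of 10^-3 and the alternative software figure of 10^-2 are invented for illustration.

```python
from dataclasses import dataclass
from math import prod
from typing import Tuple, Union


@dataclass(frozen=True)
class Basic:
    """A basic event: a leaf of the tree that carries its own probability."""
    name: str
    p: float


@dataclass(frozen=True)
class Gate:
    """An AND or OR gate whose output event depends on its children."""
    kind: str                      # "AND" or "OR"
    children: Tuple["Node", ...]


Node = Union[Basic, Gate]


def probability(node: Node) -> float:
    """Probability of `node`, assuming independent basic events that each
    appear only once in the tree (no shared subtrees)."""
    if isinstance(node, Basic):
        return node.p
    child_ps = [probability(child) for child in node.children]
    if node.kind == "AND":
        # All inputs must occur.
        return prod(child_ps)
    if node.kind == "OR":
        # At least one input occurs: the complement of "none occur".
        return 1.0 - prod(1.0 - p for p in child_ps)
    raise ValueError(f"unknown gate kind: {node.kind!r}")


def high_dose_tree(software_p: float, interlock_p: float) -> Gate:
    """Hypothetical tree for the top event "high dose per pulse".

    Each software-controlled function reaches the top event only if its
    (hypothetical) computer-independent interlock also fails.
    """
    return Gate("OR", (
        Gate("AND", (Basic("scanning software fault", software_p),
                     Basic("scanning interlock fails", interlock_p))),
        Gate("AND", (Basic("energy-selection software fault", software_p),
                     Basic("energy-selection interlock fails", interlock_p))),
    ))


if __name__ == "__main__":
    # 1e-4 is the generic software figure from the text; 1e-2 is an
    # arbitrary alternative for comparison; 1e-3 is an invented interlock value.
    for software_p in (1e-4, 1e-2):
        top = probability(high_dose_tree(software_p, interlock_p=1e-3))
        print(f"software rate {software_p:.0e}: top event {top:.2e}")
```

Running it prints a top-event probability of about 2.00e-07 for the 10^-4 software figure and about 2.00e-05 for 10^-2. The result moves almost in proportion to the single number chosen for software, which is the dependence the safety report acknowledged. The sketch also shares the weakness questioned above: every kind of software fault is collapsed into one basic-event probability.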
C O M PUTER
nical and legal staffs; the US FDA; the Canadian BRMD; the Canadian Atomic Energy Control Board; the Province of Ontario; and the Radiation Regulations Committee of the Canadian Association of Physicists.

According to Symonds of the BRMD, this meeting was very important to the resolution of the problems since the regulators, users, and the manufacturer arrived at a consensus in one day.

At this second users meeting, the participants carefully reviewed all the six known major Therac-25 accidents and discussed the elements of the CAP along with possible additional modifications. They came up with a prioritized list of modifications that they wanted included in the CAP and expressed concerns about the lack of independent software evaluation and the lack of a hard-copy audit trail to assist in diagnosing faults.

The AECL representative, who was the quality assurance manager, responded that tests had been done on the CAP changes, but that the tests were not documented, and independent evaluation of the software "might not be possible." He claimed that two outside experts had reviewed the software, but he could not provide their names. In response to user requests for a hard-copy audit trail and access to source code, he explained that memory limitations would not permit including an audit option, and source code would not be made available to users.

On May 1, AECL issued CAP Revision 4 as a result of the FDA comments and users meeting input. The FDA response on May 26 approved the CAP subject to submission of the final test plan results and an independent safety analysis, distribution of the draft revised manual to customers, and completion of the CAP by June 30, 1987. The FDA concluded by rating this a Class I recall: a recall in which there is a reasonable probability that the use of or exposure to a violative product will cause serious adverse health consequences or death.

AECL sent more supporting documentation to the FDA on June 5, 1987, including the CAP test plan, a draft operator's manual, and the draft of the new safety analysis (described in the sidebar "Safety analysis of the Therac-25"). The safety analysis revealed four potentially hazardous subsystems that were not covered by CAP Revision 4:

(1) electron-beam scanning,
(2) electron-energy selection,
(3) beam shutoff, and
(4) calibration and/or steering.

AECL planned a fifth revision of the CAP to include the testing and safety analysis results.

Referring to the test plan at this, the final stage of the CAP process, an FDA reviewer said:

Amazingly, the test data presented to show that the software changes to handle the edit problems in the Therac-25 are appropriate prove the exact opposite result. A review of the data table in the test results indicates that the final beam type and energy (edit change) [have] no effect on the initial beam type and energy. I can only assume that either the fix is not right or the data was entered incorrectly. The manufacturer should be admonished for this error. Where is the QC [quality control] review for the test program? AECL must: (1) clarify this situation, (2) change the test protocol to prevent this type of error from occurring, and (3) set up appropriate QC control on data review.

A further FDA memo said the AECL quality assurance manager

... could not give an explanation and will check into the circumstances. He subsequently called back and verified that the technician completed the form incorrectly. Correct operation was witnessed by himself and others. They will repeat and send us the correct data sheet.

At the American Association of Physicists in Medicine meeting in July 1987, a third user group meeting was held. The AECL representative gave the status of CAP Revision 5. He explained that the FDA had given verbal approval and he expected full implementation by the end of August 1987. He reviewed and commented on the prioritized concerns of the last meeting. AECL had included in the CAP three of the user-requested hardware changes. Changes to tape-load error messages and checksums on the load data would wait until after the CAP was done.

Two user-requested hardware modifications had not been included in the CAP. One of these, a push-button energy and selection mode switch, AECL would work on after completing the CAP, the quality assurance manager said. The other, a fixed ion chamber with dose/pulse monitoring, was being installed at Yakima, had already been installed by Halifax on their own, and would be an option for other clinics. Software documentation was described as a lower priority task that needed definition and would not be available to the FDA in any form for more than a year.

On July 6, 1987, AECL sent a letter to all users to inform them of the FDA's verbal approval of the CAP and delineated how AECL would proceed. On July 21, 1987, AECL issued the fifth and final CAP revision. The major features of the final CAP are as follows:

• All interruptions related to the dosimetry system will go to a treatment suspend, not a treatment pause. Operators will not be allowed to restart the machine without reentering all parameters.
• A software single-pulse shutdown will be added.
• An independent hardware single-pulse shutdown will be added.
• Monitoring logic for turntable position will be improved to ensure that the turntable is in one of the three legal positions.
• A potentiometer will be added to the turntable. It will provide a visible signal of position that operators will use to monitor exact turntable location.
• Interlocking with the 270-degree bending magnet will be added to ensure that the target and beam flattener are in position if the X-ray mode is selected.
• Beam on will be prevented if the turntable is in the field-light or an intermediate position.
• Cryptic malfunction messages will be replaced with meaningful messages and highlighted dose-rate messages.
• Editing keys will be limited to cursor up, backspace, and return. All other keys will be inoperative.
• A motion-enable foot switch will be added, which the operator must hold closed during movement of certain parts of the machine to prevent unwanted motions when the operator is not in control (a type of "dead man's switch").
• Twenty-three other changes to the software to improve its operation and reliability, including disabling of unused keys, changing the operation of the set and reset commands, preventing copying of the control program on site, changing the way various detected hardware faults are handled, eliminating errors in the software that were detected during the review process, adding several additional software interlocks, disallowing
changing to the service mode while a treatment is in progress, and adding meaningful error messages.
• The known software problems associated with the Tyler and Yakima accidents will be fixed.
• The manuals will be fixed to reflect the changes.

In a 1987 paper, Miller, director of the Division of Standards Enforcement, CDRH, wrote about the lessons learned from the Therac-25 experiences. The first was the importance of safe versus "user-friendly" operator interfaces - in other words, making the machine as easy as possible to use may conflict with safety goals. The second is the importance of providing fail-safe designs:

The second lesson is that for complex interrupt-driven software, timing is of critical importance. In both of these situations, operator action within very narrow time-frame windows was necessary for the accidents to occur. It is unlikely that software testing will discover all possible errors that involve operator intervention at precise time frames during software operation. These machines, for example, have been exercised for thousands of hours in the factory and in the hospitals without accident. Therefore, one must provide for prevention of catastrophic results of failures when they do occur.

I, for one, will not be surprised if other software errors appear with this or other equipment in the future.

Miller concluded the paper with

FDA has performed extensive review of the Therac-25 software and hardware safety systems. We cannot say with absolute certainty that all software problems that might result in improper dose have been found and eliminated. However, we are confident that the hardware and software safety features recently added will prevent future catastrophic consequences of failure.

Lessons learned

Often, it takes an accident to alert people to the dangers involved in technology. A medical physicist wrote about the Therac-25 accidents:

In the past decade or two, the medical accelerator "industry" has become perhaps a little complacent about safety. We have assumed that the manufacturers have all kinds of safety design experience since they've been in the business a long time. We know that there are many safety codes, guides, and regulations to guide them and we have been reassured by the hitherto excellent record of these machines. Except for a few incidents in the 1960s (e.g., at Hammersmith, Hamburg) the use of medical accelerators has been remarkably free of serious radiation accidents until now. Perhaps, though, we have been spoiled by this success.¹

Accidents are seldom simple - they usually involve a complex web of interacting events with multiple contributing technical, human, and organizational factors. One of the serious mistakes that led to the multiple Therac-25 accidents was the tendency to believe that the cause of an accident had been determined (for example, a microswitch failure in the Hamilton accident) without adequate evidence to come to this conclusion and without looking at all possible contributing factors. Another mistake was the assumption that fixing a particular error (eliminating the current software bug) would prevent future accidents. There is always another software bug.

Accidents are often blamed on a single cause like human error. But virtually all factors involved in accidents can be labeled human error, except perhaps for hardware wear-out failures. Even such hardware failures could be attributed to human error (for example, the designer's failure to provide adequate redundancy or the failure of operational personnel to properly maintain or replace parts). Concluding that an accident was the result of human error is not very helpful or meaningful.

It is nearly as useless to ascribe the cause of an accident to a computer error or a software error. Certainly software was involved in the Therac-25 accidents, but it was only one contributing factor. If we assign software error as the cause of the Therac-25 accidents, we are forced to conclude that the only way to prevent such accidents in the future is to build perfect software that will never behave in an unexpected or undesired way under any circumstances (which is clearly impossible) or not to use software at all in these types of systems. Both conclusions are overly pessimistic.

We must approach the problem of accidents in complex systems from a system-engineering point of view and consider all possible contributing factors. For the Therac-25 accidents, contributing factors included

• management inadequacies and lack of procedures for following through on all reported incidents,
• overconfidence in the software and removal of hardware interlocks (making the software into a single point of failure that could lead to an accident),
• presumably less-than-acceptable software-engineering practices, and
• unrealistic risk assessments along with overconfidence in the results of these assessments.

The exact same accident may not happen a second time, but if we examine and try to ameliorate the contributing factors to the accidents we have had, we may be able to prevent different accidents in the future. In the following sections, we present what we feel are important lessons learned from the Therac-25. You may draw different or additional conclusions.

System engineering. A common mistake in engineering, in this case and many others, is to put too much confidence in software. Nonsoftware professionals seem to feel that software will not or cannot fail; this attitude leads to complacency and overreliance on computerized functions. Although software is not subject to random wear-out failures like hardware, software design errors are much harder to find and eliminate. Furthermore, hardware failure modes are generally much more limited, so building protection against them is usually easier. A lesson to be learned from the Therac-25 accidents is not to remove standard hardware interlocks when adding computer control.

Hardware backups, interlocks, and other safety devices are currently being replaced by software in many different types of systems, including commercial aircraft, nuclear power plants, and weapon systems. Where the hardware interlocks are still used, they are often controlled by software. Designing any
dangerous system in such a way that one failure can lead to an accident violates basic system-engineering principles. In this respect, software needs to be treated as a single component. Software should not be assigned sole responsibility for safety, and systems should not be designed such that a single software error or software-engineering error can be catastrophic.

A related tendency among engineers is to ignore software. The first safety analysis on the Therac-25 did not include software (although nearly full responsibility for safety rested on the software). When problems started occurring, investigators assumed that hardware was the cause and focused only on the hardware. Investigation of software's possible contribution to an accident should not be the last avenue explored after all other possible explanations are eliminated.

In fact, a software error can always be attributed to a transient hardware failure, since software (in these types of process-control systems) reads and issues commands to actuators. Without a thorough investigation (and without on-line monitoring or audit trails that save internal state information), it is not possible to determine whether the sensor provided the wrong information, the software provided an incorrect command, or the actuator had a transient failure and did the wrong thing on its own. In the Hamilton accident, a transient microswitch failure was assumed to be the cause, even though the engineers were unable to reproduce the failure or find anything wrong with the microswitch.

Patient reactions were the only real indications of the seriousness of the problems with the Therac-25. There were no independent checks that the software was operating correctly (including software checks). Such verification cannot be assigned to operators without providing them with some means of detecting errors. The Therac-25 software "lied" to the operators, and the machine itself could not detect that a massive overdose had occurred. The Therac-25 ion chambers could not handle the high density of ionization from the unscanned electron beam at high-beam current; they thus became saturated and gave an indication of a low dosage. Engineers need to design for the worst case.

Every company building safety-critical systems should have audit trails and incident-analysis procedures that they apply whenever they find any hint of a problem that might lead to an accident. The first phone call by Still should have led to an extensive investigation of the events at Kennestone. Certainly, learning about the first lawsuit should have triggered an immediate response. Although hazard logging and tracking is required in the standards for safety-critical military projects, it is less common in nonmilitary projects. Every company building hazardous equipment should have hazard logging and tracking as well as incident reporting and analysis as parts of its quality control procedures. Such follow-up and tracking will not only help prevent accidents, but will easily pay for themselves in reduced insurance rates and reasonable settlement of lawsuits when they do occur.

Finally, overreliance on the numerical output of safety analyses is unwise. The arguments over whether very low probabilities are meaningful with respect to safety are too extensive to summarize here. But, at the least, a healthy skepticism is in order. The claim that safety had been increased five orders of magnitude as a result of the microswitch fix after the Hamilton accident seems hard to justify. Perhaps it was based on the probability of failure of the microswitch (typically 10^-5) ANDed with the other interlocks. The problem with all such analyses is that they exclude aspects of the problem (in this case, software) that are difficult to quantify but which may have a larger impact on safety than the quantifiable factors that are included.

Although management and regulatory agencies often press engineers to obtain such numbers, engineers should insist that any risk assessment numbers used are in fact meaningful and that statistics of this sort are treated with caution. In our enthusiasm to provide measurements, we should not attempt to measure the unmeasurable. William Ruckelshaus, two-time head of the US Environmental Protection Agency, cautioned that "risk assessment data can be like the captured spy: if you torture it long enough, it will tell you anything you want to know."⁷ E.A. Ryder of the British Health and Safety Executive has written that the numbers game in risk assessment "should only be played in private between consenting adults, as it is too easy to be misinterpreted."⁸

Software engineering. The Therac-25 accidents were fairly unique in having software coding errors involved - most computer-related accidents have not involved coding errors but rather errors in the software requirements such as omissions and mishandled environmental conditions and system states. Although using good basic software-engineering practices will not prevent all software errors, it is certainly required as a minimum. Some companies introducing software into their systems for the first time do not take software engineering as seriously as they should. Basic software-engineering principles that apparently were violated with the Therac-25 include:

• Documentation should not be an afterthought.
• Software quality assurance practices and standards should be established.
• Designs should be kept simple.
• Ways to get information about errors - for example, software audit trails - should be designed into the software from the beginning.
• The software should be subjected to extensive testing and formal analysis at the module and software level; system testing alone is not adequate.

In addition, special safety-analysis and design procedures must be incorporated into safety-critical software projects. Safety must be built into software, and, in addition, safety must be assured at the system level despite software errors.⁹,¹⁰ The Therac-20 contained the same software error implicated in the Tyler deaths, but the machine included hardware interlocks that mitigated its consequences. Protection against software errors can also be built into the software itself.

Furthermore, important lessons about software reuse can be found here. A naive assumption is often made that reusing software or using commercial off-the-shelf software increases safety because the software has been exercised extensively. Reusing software modules does not guarantee safety in the new system to which they are transferred and sometimes leads to awkward and dangerous designs. Safety is a quality of the system in which the software is used; it is not a quality of the software itself. Rewriting the entire software to
get a clean and simple design may be safer in many cases.

Taking a couple of programming courses or programming a home computer does not qualify anyone to pro[…]ical software engineering requires training and experience in addition to that required for noncritical software.

Although the user interface of the Therac-25 has attracted a lot of attention, it was really a side issue in the […]mation has been entered correctly) enhanced usability at the expense of […] little experience they had with similar problems in computerized medical devices. Since the Therac-25 events, the FDA has moved to improve the reporting system and to augment their procedures and guidelines to include software. The problem of deciding when to forbid the use of medical devices that are also saving lives has no simple answer and involves ethical and political issues that cannot be answered by science or engineering alone. However, at the least, better procedures are certainly required for reporting problems to […] profits. On the other hand, standards can have the undesirable effect of limiting the safety efforts and investment of companies that feel their legal and moral responsibilities are fulfilled if they follow the standards.

[…] We could continue our traditional role, […] way to improving safety. It is debatable, however, whether these actions would be sufficient to prevent a future series of accidents. Perhaps what is needed in addition is a mechanism by which the safety of any new model of accelerator is assessed independently of the manufacturer. This task could be done by the individual physicist at the time of acceptance of a new machine. Indeed, many users already test at least the operation of safety interlocks during commissioning. Few, however, have the time or resources to conduct a comprehensive assessment of safety design.

A more effective approach might be to […] the Therac-25 was user driven - the manufacturer was slow to respond. The Therac-25 user group meetings were, according to participants, important to the resolution of the problems. But if users are to be involved, then they must […] comprehensively identifies and evaluates the system's accident risks; provides a means […] impact on flight and ground safety and action taken to prevent recurrence. The AFAR also must address failures, accidents, or incidents from previous missions of this system or other systems using similar hardware. All corrective action taken to prevent recurrence must be documented. The accident and correction history must be updated throughout the life of the system. If any design or operating parameters change after government approval, the AFAR must be updated to include all changes affecting safety.

Unfortunately, the Air Force program is not practical for commercial systems. However, government agencies might require manufacturers to provide similar information to users. If required for everyone, competitive pressures to withhold information might be lessened. Manufacturers might find that providing such information actually increases customer loyalty and confidence. An emphasis on safety can be turned into a competitive advantage.

Most previous accounts of the Therac-25 accidents blamed them on a software error and stopped there. This is not very useful and, in fact, can be misleading and dangerous: If we are to prevent such accidents in the future, we must dig deeper. Most accidents involving complex technology are caused by a combination of organizational, managerial, technical, and, sometimes, sociological or political factors. Preventing accidents requires paying attention to all the root causes, not just the precipitating event in a par[…] devices. In our experience, the same types of mistakes are being made in nonmedical systems. We must learn from our mistakes so we do not repeat them.

Acknowledgments

Ed Miller of the FDA was especially helpful, both in providing information to be included in this article and in reviewing and commenting on the final version. Gordon Symonds of the Canadian Government Health Protection Branch also reviewed and commented on a draft of the article. Finally, the referees, several of whom were apparently intimately involved in some of the accidents, were also very helpful in providing additional information about the accidents.

References

The information in this article was gathered from official FDA documents and internal memos, lawsuit depositions, letters, and various other sources that are not publicly available. Computer does not provide references to documents that are unavailable to the public.

1. J.A. Rawlinson, "Report on the Therac-25," OCTRF/OCI Physicists Meeting, Kingston, Ont., Canada, May 7, 1987.
2. L. Houston, "What Do the Simple Folk […]
[…] What, and How," ACM Computing Surveys, Vol. 18, No. 2, June 1986, pp. 25-69.
10. N.G. Leveson, "Software Safety in Embedded Computer Systems," Comm. ACM, Feb. 1991, pp. 34-46.

Nancy G. Leveson is Boeing professor of Computer Science and Engineering at the University of Washington. Previously, she was a professor in the Information and Computer Science Department at the University of California, Irvine. Her research interests are software safety and reliability, including software hazard analysis, requirements specification and analysis, design for safety, and verification of safety. She consults worldwide for industry and government on safety-critical systems.

Leveson received a BA in mathematics, an MS in operations research, and a PhD in computer science, all from the University of California at Los Angeles. She is the editor in chief of IEEE Transactions on Software Engineering and a member of the board of directors of the Computing Research Association.

[…] University of California, Irvine, Irvine, CA 92717; e-mail turner@ics.uci.edu.