COLUMBIA

ACCIDENT INVESTIGATION BOARD

CHAPTER 7

The Accidentʼs
Organizational Causes

Many accident investigations make the same mistake in defining causes. They identify the widget that broke or malfunctioned, then locate the person most closely connected with the technical failure: the engineer who miscalculated an analysis, the operator who missed signals or pulled the wrong switches, the supervisor who failed to listen, or the manager who made bad decisions. When causal chains are limited to technical flaws and individual failures, the ensuing responses aimed at preventing a similar event in the future are equally limited: they aim to fix the technical problem and replace or retrain the individual responsible. Such corrections lead to a misguided and potentially disastrous belief that the underlying problem has been solved. The Board did not want to make these errors. A central piece of our expanded cause model involves NASA as an organizational whole.

ORGANIZATIONAL CAUSE STATEMENT

The organizational causes of this accident are rooted in the Space Shuttle Programʼs history and culture, including the original compromises that were required to gain approval for the Shuttle Program, subsequent years of resource constraints, fluctuating priorities, schedule pressures, mischaracterizations of the Shuttle as operational rather than developmental, and lack of an agreed national vision. Cultural traits and organizational practices detrimental to safety and reliability were allowed to develop, including: reliance on past success as a substitute for sound engineering practices (such as testing to understand why systems were not performing in accordance with requirements/specifications); organizational barriers which prevented effective communication of critical safety information and stifled professional differences of opinion; lack of integrated management across program elements; and the evolution of an informal chain of command and decision-making processes that operated outside the organizationʼs rules.

UNDERSTANDING CAUSES

In the Boardʼs view, NASAʼs organizational culture and structure had as much to do with this accident as the External Tank foam. Organizational culture refers to the values, norms, beliefs, and practices that govern how an institution functions. At the most basic level, organizational culture defines the assumptions that employees make as they carry out their work. It is a powerful force that can persist through reorganizations and the reassignment of key personnel.

Given that todayʼs risks in human space flight are as high and the safety margins as razor thin as they have ever been, there is little room for overconfidence. Yet the attitudes and decision-making of Shuttle Program managers and engineers during the events leading up to this accident were clearly overconfident and often bureaucratic in nature. They deferred to layered and cumbersome regulations rather than the fundamentals of safety. The Shuttle Programʼs safety culture is straining to hold together the vestiges of a once robust systems safety program.

As the Board investigated the Columbia accident, it expected to find a vigorous safety organization, process, and culture at NASA, bearing little resemblance to what the Rogers Commission identified as the ineffective “silent safety” system in which budget cuts resulted in a lack of resources, personnel, independence, and authority. NASAʼs initial briefings to the Board on its safety programs espoused a risk-averse philosophy that empowered any employee to stop an operation at the mere glimmer of a problem. Unfortunately, NASAʼs views of its safety culture in those briefings did not reflect reality. Shuttle Program safety personnel failed to adequately assess anomalies and frequently accepted critical risks without qualitative or quantitative support, even when the tools to provide more comprehensive assessments were available.

Similarly, the Board expected to find NASAʼs Safety and Mission Assurance organization deeply engaged at every

Report Volume I August 2003 177
level of Shuttle management: the Flight Readiness Review, the Mission Management Team, the Debris Assessment Team, the Mission Evaluation Room, and so forth. This was not the case. In briefing after briefing, interview after interview, NASA remained in denial: in the agencyʼs eyes, “there were no safety-of-flight issues,” and no safety compromises in the long history of debris strikes on the Thermal Protection System. The silence of Program-level safety processes undermined oversight; when they did not speak up, safety personnel could not fulfill their stated mission to provide “checks and balances.” A pattern of acceptance prevailed throughout the organization that tolerated foam problems without sufficient engineering justification for doing so.

This chapter presents an organizational context for understanding the Columbia accident. Section 7.1 outlines a short history of safety at NASA, beginning in the pre-Apollo era when the agency reputedly had the finest system safety-engineering programs in the world. Section 7.2 discusses organizational theory and its importance to the Boardʼs investigation, and Section 7.3 examines the practices of three organizations that successfully manage high risk. Sections 7.4 and 7.5 look at NASA today and answer the question, “How could NASA have missed the foam signal?” by highlighting the blind spots that rendered the Shuttle Programʼs risk perspective myopic. The Boardʼs conclusion and recommendations are presented in 7.6. (See Chapter 10 for a discussion of the differences between industrial safety and mission assurance/quality assurance.)

7.1 ORGANIZATIONAL CAUSES: INSIGHTS FROM HISTORY

NASAʼs organizational culture is rooted in history and tradition. From NASAʼs inception in 1958 to the Challenger accident in 1986, the agencyʼs Safety, Reliability, and Quality Assurance (SRQA) activities, “although distinct disciplines,” were “typically treated as one function in the design, development, and operations of NASAʼs manned space flight programs.”1 Contractors and NASA engineers collaborated closely to assure the safety of human space flight. Solid engineering practices emphasized defining goals and relating system performance to them; establishing and using decision criteria; developing alternatives; modeling systems for analysis; and managing operations.2 Although a NASA Office of Reliability and Quality Assurance existed for a short time during the early 1960s, it was funded by the human space flight program. By 1963, the office disappeared from the agencyʼs organization charts. For the next few years, the only type of safety program that existed at NASA was a decentralized “loose federation” of risk assessment oversight run by each programʼs contractors and the project offices at each of the three Human Space Flight Centers.

Fallout from Apollo – 1967

In January 1967, months before the scheduled launch of Apollo 1, three astronauts died when a fire erupted in a ground-test capsule. In response, Congress, seeking to establish an independent safety organization to oversee space flight, created the Aerospace Safety Advisory Panel (ASAP). The ASAP was intended to be a senior advisory committee to NASA, reviewing space flight safety studies and operations plans, and evaluating “systems procedures and management policies that contribute to risk.” The panelʼs main priority was human space flight missions.3 Although four of the panelʼs nine members can be NASA employees, in recent years few have served as members. While the panelʼs support staff generally consists of full-time NASA employees, the group technically remains an independent oversight body.

Congress simultaneously mandated that NASA create separate safety and reliability offices at the agencyʼs headquarters and at each of its Human Space Flight Centers and Programs. Overall safety oversight became the responsibility of NASAʼs Chief Engineer. Although these offices were not totally independent – their funding was linked with the very programs they were supposed to oversee – their existence allowed NASA to treat safety as a unique function. Until the Challenger accident in 1986, NASA safety remained linked organizationally and financially to the agencyʼs Human Space Flight Program.

Challenger – 1986

In the aftermath of the Challenger accident, the Rogers Commission issued recommendations intended to remedy what it considered to be basic deficiencies in NASAʼs safety system. These recommendations centered on an underlying theme: the lack of independent safety oversight at NASA. Without independence, the Commission believed, the slate of safety failures that contributed to the Challenger accident – such as the undue influence of schedule pressures and the flawed Flight Readiness process – would not be corrected. “NASA should establish an Office of Safety, Reliability, and Quality Assurance to be headed by an Associate Administrator, reporting directly to the NASA Administrator,” concluded the Commission. “It would have direct authority for safety, reliability, and quality assurance throughout the Agency. The office should be assigned the workforce to ensure adequate oversight of its functions and should be independent of other NASA functional and program responsibilities” [emphasis added].

In July 1986, NASA Administrator James Fletcher created a Headquarters Office of Safety, Reliability, and Quality Assurance, which was given responsibility for all agency-wide safety-related policy functions. In the process, the position of Chief Engineer was abolished.4 The new officeʼs Associate Administrator promptly initiated studies on Shuttle in-flight anomalies, overtime levels, the lack of spare parts, and landing and crew safety systems, among other issues.5 Yet NASAʼs response to the Rogers Commission recommendation did not meet the Commissionʼs intent: the Associate Administrator did not have direct authority, and safety, reliability, and mission assurance activities across the agency remained dependent on other programs and Centers for funding.

General Accounting Office Review – 1990

A 1990 review by the U.S. General Accounting Office questioned the effectiveness of NASAʼs new safety organizations in a report titled “Space Program Safety: Funding for NASAʼs Safety Organizations Should Be Centralized.”6 The report concluded “NASA did not have an independent and effective safety organization” [emphasis added]. Although the safety organizational structure may have “appeared adequate,” in the late 1980s the space agency had concentrated most of its efforts on creating an independent safety office at NASA Headquarters. In contrast, the safety offices at NASAʼs field centers “were not entirely independent because they obtained most of their funds from activities whose safety-related performance they were responsible for overseeing.” The General Accounting Office worried that “the lack of centralized independent funding may also restrict the flexibility of center safety managers.” It also suggested “most NASA safety managers believe that centralized SRM&QA [Safety, Reliability, Maintainability and Quality Assurance] funding would ensure independence.” NASA did not institute centralized funding in response to the General Accounting Office report, nor has it since. The problems outlined in 1990 persist to this day.

Space Flight Operations Contract – 1996

The Space Flight Operations Contract was intended to streamline and modernize NASAʼs cumbersome contracting practices, thereby freeing the agency to focus on research and development (see Chapter 5). Yet its implementation complicated issues of safety independence. A single contractor would, in principle, provide “oversight” on production, safety, and mission assurance, as well as cost management, while NASA maintained “insight” into safety and quality assurance through reviews and metrics. Indeed, the reduction to a single primary contract simplified some aspects of the NASA/contractor interface. However, as a result, experienced engineers changed jobs, NASA grew dependent on contractors for technical support, contract monitoring requirements increased, and positions were subsequently staffed by less experienced engineers who were placed in management roles.

Collectively, this eroded NASAʼs in-house engineering and technical capabilities and increased the agencyʼs reliance on the United Space Alliance and its subcontractors to identify, track, and resolve problems. The contract also involved substantial transfers of safety responsibility from the government to the private sector; rollbacks of tens of thousands of Government Mandated Inspection Points; and vast reductions in NASAʼs in-house safety-related technical expertise (see Chapter 10). In the aggregate, these mid-1990s transformations rendered NASAʼs already problematic safety system simultaneously weaker and more complex.

The effects of transitioning Shuttle operations to the Space Flight Operations Contract were not immediately apparent in the years following implementation. In November 1996, as the contract was being implemented, the Aerospace Safety Advisory Panel published a comprehensive contract review, which concluded that the effort “to streamline the Space Shuttle program has not inadvertently created unacceptable flight or ground risks.”7 The Aerospace Safety Advisory Panelʼs passing grades proved temporary.

Shuttle Independent Assessment Team – 1999

Just three years later, after a number of close calls, NASA chartered the Shuttle Independent Assessment Team to examine Shuttle sub-systems and maintenance practices (see Chapter 5). The Shuttle Independent Assessment Team Report sounded a stern warning about the quality of NASAʼs Safety and Mission Assurance efforts and noted that the Space Shuttle Program had undergone a massive change in structure and was transitioning to “a slimmed down, contractor-run operation.”

The team produced several pointed conclusions: the Shuttle Program was inappropriately using previous success as a justification for accepting increased risk; the Shuttle Programʼs ability to manage risk was being eroded “by the desire to reduce costs;” the size and complexity of the Shuttle Program and NASA/contractor relationships demanded better communication practices; NASAʼs safety and mission assurance organization was not sufficiently independent; and “the workforce has received a conflicting message due to the emphasis on achieving cost and staff reductions, and the pressures placed on increasing scheduled flights as a result of the Space Station” [emphasis added].8 The Shuttle Independent Assessment Team found failures of communication to flow up from the “shop floor” and down from supervisors to workers, deficiencies in problem and waiver-tracking systems, potential conflicts of interest between Program and contractor goals, and a general failure to communicate requirements and changes across organizations. In general, the Programʼs organizational culture was deemed “too insular.”9

NASA subsequently formed an Integrated Action Team to develop a plan to address the recommendations from previous Program-specific assessments, including the Shuttle Independent Assessment Team, and to formulate improvements.10 In part this effort was also a response to program missteps in the drive for efficiency seen in the “faster, better, cheaper” NASA of the 1990s. The NASA Integrated Action Team observed: “NASA should continue to remove communication barriers and foster an inclusive environment where open communication is the norm.” The intent was to establish an initiative where “the importance of communication and a culture of trust and openness permeate all facets of the organization.” The report indicated that “multiple processes to get the messages across the organizational structure” would need to be explored and fostered [emphasis added]. The report recommended that NASA solicit expert advice in identifying and removing barriers, providing tools, training, and education, and facilitating communication processes.

The Shuttle Independent Assessment Team and NASA Integrated Action Team findings mirror those presented by the Rogers Commission. The same communication problems persisted in the Space Shuttle Program at the time of the Columbia accident.

Space Shuttle Competitive Source Task Force – 2002

In 2002, a 14-member Space Shuttle Competitive Source Task Force supported by the RAND Corporation examined competitive sourcing options for the Shuttle Program. In its final report to NASA, the team highlighted several safety-related concerns, which the Board shares:

• Flight and ground hardware and software are obsolete, and safety upgrades and aging infrastructure repairs have been deferred.
• Budget constraints have impacted personnel and resources required for maintenance and upgrades.
• International Space Station schedules exert significant pressures on the Shuttle Program.
• Certain mechanisms may impede worker anonymity in reporting safety concerns.
• NASA does not have a truly independent safety function with the authority to halt the progress of a critical mission element.11

Based on these findings, the task force suggested that an Independent Safety Assurance function should be created that would hold one of “three keys” in the Certification of Flight Readiness process (NASA and the operating contractor would hold the other two), effectively giving this function the ability to stop any launch. Although in the Boardʼs view the “third key” Certification of Flight Readiness process is not a perfect solution, independent safety and verification functions are vital to continued Shuttle operations. This independent function should possess the authority to shut down the flight preparation processes or intervene post-launch when an anomaly occurs.

7.2 ORGANIZATIONAL CAUSES: INSIGHTS FROM THEORY

To develop a thorough understanding of accident causes and risk, and to better interpret the chain of events that led to the Columbia accident, the Board turned to the contemporary social science literature on accidents and risk and sought insight from experts in High Reliability, Normal Accident, and Organizational Theory.12 Additionally, the Board held a forum, organized by the National Safety Council, to define the essential characteristics of a sound safety program.13

High Reliability Theory argues that organizations operating high-risk technologies, if properly designed and managed, can compensate for inevitable human shortcomings, and therefore avoid mistakes that under other circumstances would lead to catastrophic failures.14 Normal Accident Theory, on the other hand, has a more pessimistic view of the ability of organizations and their members to manage high-risk technology. Normal Accident Theory holds that organizational and technological complexity contributes to failures. Organizations that aspire to failure-free performance are inevitably doomed to fail because of the inherent risks in the technology they operate.15 Normal Accident models also emphasize systems approaches and systems thinking, while the High Reliability model works from the bottom up: if each component is highly reliable, then the system will be highly reliable and safe.

Though neither High Reliability Theory nor Normal Accident Theory is entirely appropriate for understanding this accident, insights from each figured prominently in the Boardʼs deliberation. Fundamental to each theory is the importance of strong organizational culture and commitment to building successful safety strategies.

The Board selected certain well-known traits from these models to use as a yardstick to assess the Space Shuttle Program, and found them particularly useful in shaping its views on whether NASAʼs current organization of its Human Space Flight Program is appropriate for the remaining years of Shuttle operation and beyond. Additionally, organizational theory, which encompasses organizational culture, structure, history, and hierarchy, is used to explain the Columbia accident, and, ultimately, combines with Chapters 5 and 6 to produce an expanded explanation of the accidentʼs causes.16 The Board believes the following considerations are critical to understanding what went wrong during STS-107. They will become the central motifs of the Boardʼs analysis later in this chapter.

• Commitment to a Safety Culture: NASAʼs safety culture has become reactive, complacent, and dominated by unjustified optimism. Over time, slowly and unintentionally, independent checks and balances intended to increase safety have been eroded in favor of detailed processes that produce massive amounts of data and unwarranted consensus, but little effective communication. Organizations that successfully deal with high-risk technologies create and sustain a disciplined safety system capable of identifying, analyzing, and controlling hazards throughout a technologyʼs life cycle.

• Ability to Operate in Both a Centralized and Decentralized Manner: The ability to operate in a centralized manner when appropriate, and to operate in a decentralized manner when appropriate, is the hallmark of a high-reliability organization. On the operational side, the Space Shuttle Program has a highly centralized structure. Launch commit criteria and flight rules govern every imaginable contingency. The Mission Control Center and the Mission Management Team have very capable decentralized processes to solve problems that are not covered by such rules. The process is so highly regarded that it is considered one of the best problem-solving organizations of its type.17 In these situations, mature processes anchor rules, procedures, and routines to make the Shuttle Programʼs matrixed workforce seamless, at least on the surface.

  Nevertheless, it is evident that the position one occupies in this structure makes a difference. When supporting organizations try to “push back” against centralized Program direction – like the Debris Assessment Team did during STS-107 – independent analysis generated by a decentralized decision-making process can be stifled. The Debris Assessment Team, working in an essentially decentralized format, was well-led and had the right expertise to work the problem, but their charter was “fuzzy,” and the team had little direct connection to the Mission Management Team. This lack of connection to the Mission Management Team and the Mission Evaluation Room is the single most compelling reason why communications were so poor during the debris
  assessment. In this case, the Shuttle Program was unable to simultaneously manage both the centralized and decentralized systems.

• Importance of Communication: At every juncture of STS-107, the Shuttle Programʼs structure and processes, and therefore the managers in charge, resisted new information. Early in the mission, it became clear that the Program was not going to authorize imaging of the Orbiter because, in the Programʼs opinion, images were not needed. Overwhelming evidence indicates that Program leaders decided the foam strike was merely a maintenance problem long before any analysis had begun. Every manager knew the party line: “weʼll wait for the analysis – no safety-of-flight issue expected.” Program leaders spent at least as much time making sure hierarchical rules and processes were followed as they did trying to establish why anyone would want a picture of the Orbiter. These attitudes are incompatible with an organization that deals with high-risk technology.

• Avoiding Oversimplification: The Columbia accident is an unfortunate illustration of how NASAʼs strong cultural bias and its optimistic organizational thinking undermined effective decision-making. Over the course of 22 years, foam strikes were normalized to the point where they were simply a “maintenance” issue – a concern that did not threaten a missionʼs success. This oversimplification of the threat posed by foam debris rendered the issue a low-level concern in the minds of Shuttle managers. Ascent risk, so evident in Challenger, biased leaders to focus on strong signals from the Space Shuttle Main Engine and the Solid Rocket Boosters. Foam strikes, by comparison, were a weak and consequently overlooked signal, although they turned out to be no less dangerous.

• Conditioned by Success: Even after it was clear from the launch videos that foam had struck the Orbiter in a manner never before seen, Space Shuttle Program managers were not unduly alarmed. They could not imagine why anyone would want a photo of something that could be fixed after landing. More importantly, learned attitudes about foam strikes diminished managementʼs wariness of their danger. The Shuttle Program turned “the experience of failure into the memory of success.”18 Managers also failed to develop simple contingency plans for a re-entry emergency. They were convinced, without study, that nothing could be done about such an emergency. The intellectual curiosity and skepticism that a solid safety culture requires were almost entirely absent. Shuttle managers did not embrace safety-conscious attitudes. Instead, their attitudes were shaped and reinforced by an organization that, in this instance, was incapable of stepping back and gauging its biases. Bureaucracy and process trumped thoroughness and reason.

• Significance of Redundancy: The Human Space Flight Program has compromised the many redundant processes, checks, and balances that should identify and correct small errors. Redundant systems essential to every high-risk enterprise have fallen victim to bureaucratic efficiency. Years of workforce reductions and outsourcing have culled from NASAʼs workforce the layers of experience and hands-on systems knowledge that once provided a capacity for safety oversight. Safety and Mission Assurance personnel have been eliminated, careers in safety have lost organizational prestige, and the Program now decides on its own how much safety and engineering oversight it needs. Aiming to align its inspection regime with the International Organization for Standardization 9000/9001 protocol, commonly used in industrial environments – environments very different from the Shuttle Program – the Human Space Flight Program shifted from a comprehensive “oversight” inspection process to a more limited “insight” process, cutting mandatory inspection points by more than half and leaving even fewer workers to make “second” or “third” Shuttle systems checks (see Chapter 10).

Implications for the Shuttle Program Organization

The Boardʼs investigation into the Columbia accident revealed two major causes with which NASA has to contend: one technical, the other organizational. As mentioned earlier, the Board studied the two dominant theories on complex organizations and accidents involving high-risk technologies. These schools of thought were influential in shaping the Boardʼs organizational recommendations, primarily because each takes a different approach to understanding accidents and risk.

The Board determined that high-reliability theory is extremely useful in describing the culture that should exist in the human space flight organization. NASA and the Space Shuttle Program must be committed to a strong safety culture, a view that serious accidents can be prevented, a willingness to learn from mistakes, from technology, and from others, and a realistic training program that empowers employees to know when to decentralize or centralize problem-solving. The Shuttle Program cannot afford the mindset that accidents are inevitable because it may lead to unnecessarily accepting known and preventable risks.

The Board believes normal accident theory has a key role in human space flight as well. Complex organizations need specific mechanisms to maintain their commitment to safety and assist their understanding of how complex interactions can make organizations accident-prone. Organizations cannot put blind faith into redundant warning systems because they inherently create more complexity, and this complexity in turn often produces unintended system interactions that can lead to failure. The Human Space Flight Program must realize that additional protective layers are not always the best choice. The Program must also remain sensitive to the fact that despite its best intentions, managers, engineers, safety professionals, and other employees, can, when confronted with extraordinary demands, act in counterproductive ways.

The challenges to failure-free performance highlighted by these two theoretical approaches will always be present in an organization that aims to send humans into space. What
can the Program do about these difficulties? The Board considered three alternatives. First, the Board could recommend that NASA follow traditional paths to improving safety by making changes to policy, procedures, and processes. These initiatives could improve organizational culture. The analysis provided by experts and the literature leads the Board to conclude that although reforming management practices has certain merits, it also has critical limitations. Second, the Board could recommend that the Shuttle is simply too risky and should be grounded. As will be discussed in Chapter 9, the Board is committed to continuing human space exploration, and believes the Shuttle Program can and should continue to operate. Finally, the Board could recommend a significant change to the organizational structure that controls the Space Shuttle Programʼs technology. As will be discussed at length in this chapterʼs conclusion, the Board believes this option has the best chance to successfully manage the complexities and risks of human space flight.

7.3 ORGANIZATIONAL CAUSES: EVALUATING BEST SAFETY PRACTICES

Many of the principles of solid safety practice identified as crucial by independent reviews of NASA and in accident and risk literature are exhibited by organizations that, like NASA, operate risky technologies with little or no margin for error. While the Board appreciates that organizations dealing with high-risk technology cannot sustain accident-free performance indefinitely, evidence suggests that there are effective ways to minimize risk and limit the number of accidents.

In this section, the Board compares NASA to three specific examples of independent safety programs that have strived for accident-free performance and have, by and large, achieved it: the U.S. Navy Submarine Flooding Prevention and Recovery (SUBSAFE) and Naval Nuclear Propulsion (Naval Reactors) programs, and the Aerospace Corporationʼs Launch Verification Process, which supports U.S. Air Force space launches.19 The safety cultures and organizational structure of all three make them highly adept in dealing with inordinately high risk by designing hardware and management systems that prevent seemingly inconsequential failures from leading to major accidents. Although size, complexity, and missions in these organizations and NASA differ, the following comparisons yield valuable lessons for the space agency to consider when redesigning its organization to increase safety.

Navy Submarine and Reactor Safety Programs

The Navy SUBSAFE and Naval Reactor programs exercise a high degree of engineering discipline, emphasize total responsibility of individuals and organizations, and provide redundant and rapid means of communicating problems to decision-makers. The Navyʼs nuclear safety program emerged with its first nuclear-powered warship (USS Nautilus), while non-nuclear SUBSAFE practices evolved from past flooding mishaps and philosophies first introduced by Naval Reactors. The Navy lost two nuclear-powered submarines in the 1960s – the USS Thresher in 1963 and the USS Scorpion in 1968 – which resulted in a renewed effort to prevent accidents.21 The SUBSAFE program was initiated just two months after the Thresher mishap to identify critical changes to submarine certification requirements. Until a ship was independently recertified, its operating depth and maneuvers were limited. SUBSAFE proved its value as a means of verifying the readiness and safety of submarines, and continues to do so today.22

The Naval Reactor Program is a joint Navy/Department of Energy organization responsible for all aspects of Navy nuclear propulsion, including research, design, construction, testing, training, operation, maintenance, and the disposition of the nuclear propulsion plants onboard many Naval ships and submarines, as well as their radioactive materials. Although the naval fleet is ultimately responsible for day-to-day operations and maintenance, those operations occur within parameters established by an entirely independent division of Naval Reactors.

The U.S. nuclear Navy has more than 5,500 reactor years of experience without a reactor accident. Put another way, nuclear-powered warships have steamed a cumulative total of over 127 million miles, which is roughly equivalent to over 265 lunar round trips. In contrast, the Space Shuttle Program has spent about three years on orbit, although its spacecraft have traveled some 420 million miles.

Naval Reactor success depends on several key elements:

• Concise and timely communication of problems using redundant paths
• Insistence on airing minority opinions
• Formal written reports based on independent peer-reviewed recommendations from prime contractors
• Facing facts objectively and with attention to detail
• Ability to manage change and deal with obsolescence of classes of warships over their lifetime

These elements can be grouped into several thematic cat-
egories:
Human space flight and submarine programs share notable
similarities. Spacecraft and submarines both operate in haz- • Communication and Action: Formal and informal
ardous environments, use complex and dangerous systems, practices ensure that relevant personnel at all levels are
and perform missions of critical national significance. Both informed of technical decisions and actions that affect
NASA and Navy operational experience include failures (for their area of responsibility. Contractor technical recom-
example, USS Thresher, USS Scorpion, Apollo 1 capsule mendations and government actions are documented in
fire, Challenger, and Columbia). Prior to the Columbia mis- peer-reviewed formal written correspondence. Unlike
hap, Administrator Sean OʼKeefe initiated the NASA/Navy NASA, PowerPoint briefings and papers for technical
Benchmarking Exchange to compare and contrast the pro- seminars are not substitutes for completed staff work. In
grams, specifically in safety and mission assurance.20 addition, contractors strive to provide recommendations

182 Report Volume I August 2003
based on a technical need, uninfluenced by headquarters or its representatives. Accordingly, division of responsibilities between the contractor and the Government remains clear, and a system of checks and balances is therefore inherent.

• Recurring Training and Learning From Mistakes: The Naval Reactor Program has yet to experience a reactor accident. This success is partially a testament to design, but also due to relentless and innovative training, grounded on lessons learned both inside and outside the program. For example, since 1996, Naval Reactors has educated more than 5,000 Naval Nuclear Propulsion Program personnel on the lessons learned from the Challenger accident.23 Senior NASA managers recently attended the 143rd presentation of the Naval Reactors seminar entitled “The Challenger Accident Re-examined.” The Board credits NASAʼs interest in the Navy nuclear community, and encourages the agency to continue to learn from the mistakes of other organizations as well as from its own.

• Encouraging Minority Opinions: The Naval Reactor Program encourages minority opinions and “bad news.” Leaders continually emphasize that when no minority opinions are present, the responsibility for a thorough and critical examination falls to management. Alternate perspectives and critical questions are always encouraged. In practice, NASA does not appear to embrace these attitudes. Board interviews revealed that it is difficult for minority and dissenting opinions to percolate up through the agencyʼs hierarchy, despite processes like the anonymous NASA Safety Reporting System that supposedly encourages the airing of opinions.

• Retaining Knowledge: Naval Reactors uses many mechanisms to ensure knowledge is retained. The Director serves a minimum eight-year term, and the program documents the history of the rationale for every technical requirement. Key personnel in Headquarters routinely rotate into field positions to remain familiar with every aspect of operations, training, maintenance, development and the workforce. Current and past issues are discussed in open forum with the Director and immediate staff at “all-hands” informational meetings under an in-house professional development program. NASA lacks such a program.

• Worst-Case Event Failures: Naval Reactors hazard analyses evaluate potential damage to the reactor plant, potential impact on people, and potential environmental impact. The Board identified NASAʼs failure to adequately prepare for a range of worst-case scenarios as a weakness in the agencyʼs safety and mission assurance training programs.

SUBSAFE

The Board observed the following during its study of the Navyʼs SUBSAFE Program.

• SUBSAFE requirements are clearly documented and achievable, with minimal “tailoring” or granting of waivers. NASA requirements are clearly documented but are also more easily waived.

• A separate compliance verification organization independently assesses program management.24 NASAʼs Flight Preparation Process, which leads to Certification of Flight Readiness, is supposed to be an independent check-and-balance process. However, the Shuttle Programʼs control of both engineering and safety compromises the independence of the Flight Preparation Process.

• The submarine Navy has a strong safety culture that emphasizes understanding and learning from past failures. NASA emphasizes safety as well, but training programs are not robust and methods of learning from past failures are informal.

• The Navy implements extensive safety training based on the Thresher and Scorpion accidents. NASA has not focused on any of its past accidents as a means of mentoring new engineers or those destined for management positions.

• The SUBSAFE structure is enhanced by the clarity, uniformity, and consistency of submarine safety requirements and responsibilities. Program managers are not permitted to “tailor” requirements without approval from the organization with final authority for technical requirements and the organization that verifies SUBSAFEʼs compliance with critical design and process requirements.25

• The SUBSAFE Program and implementing organization are relatively immune to budget pressures. NASAʼs program structure requires the Program Manager position to consider such issues, which forces the manager to juggle cost, schedule, and safety considerations. Independent advice on these issues is therefore inevitably subject to political and administrative pressure.

• Compliance with critical SUBSAFE design and process requirements is independently verified by a highly capable centralized organization that also “owns” the processes and monitors the program for compliance.

• Quantitative safety assessments in the Navy submarine program are deterministic rather than probabilistic. NASA does not have a quantitative, program-wide risk and safety database to support future design capabilities and assist risk assessment teams.

Comparing Navy Programs with NASA

Significant differences exist between NASA and Navy submarine programs.

• Requirements Ownership (Technical Authority): Both the SUBSAFE and Naval Reactorsʼ organizational
approach separates the technical and funding authority from program management in safety matters. The Board believes this separation of authority of program managers – who, by nature, must be sensitive to costs and schedules – and “owners” of technical requirements and waiver capabilities – who, by nature, are more sensitive to safety and technical rigor – is crucial. In the Naval Reactors Program, safety matters are the responsibility of the technical authority. They are not merely relegated to an independent safety organization with oversight responsibilities. This creates valuable checks and balances for safety matters in the Naval Reactors Program technical “requirements owner” community.

• Emphasis on Lessons Learned: Both Naval Reactors and SUBSAFE have “institutionalized” their “lessons learned” approaches to ensure that knowledge gained from both good and bad experience is maintained in corporate memory. This has been accomplished by designating a central technical authority responsible for establishing and maintaining functional technical requirements as well as providing an organizational and institutional focus for capturing, documenting, and using operational lessons to improve future designs. NASA has an impressive history of scientific discovery, but can learn much from the application of lessons learned, especially those that relate to future vehicle design and training for contingencies. NASA has a broad Lessons Learned Information System that is strictly voluntary for program/project managers and management teams. Ideally, the Lessons Learned Information System should support overall program management and engineering functions and provide a historical experience base to aid conceptual developments and preliminary design.

The Aerospace Corporation

The Aerospace Corporation, created in 1960, operates as a Federally Funded Research and Development Center that supports the government in science and technology that is critical to national security. It is the equivalent of a $500 million enterprise that supports U.S. Air Force planning, development, and acquisition of space launch systems. The Aerospace Corporation employs approximately 3,200 people including 2,200 technical staff (29 percent Doctors of Philosophy, 41 percent Masters of Science) who conduct advanced planning and system design and integration, verify readiness, and provide technical oversight of contractors.26

The Aerospace Corporationʼs independent launch verification process offers another relevant benchmark for NASAʼs safety and mission assurance program. Several aspects of the Aerospace Corporation launch verification process and independent mission assurance structure could be tailored to the Shuttle Program.

Aerospaceʼs primary product is a formal verification letter to the Air Force Systems Program Office stating a vehicle has been independently verified as ready for launch. The verification includes an independent General Systems Engineering and Integration review of launch preparations by Aerospace staff, a review of launch system design and payload integration, and a review of the adequacy of flight and ground hardware, software, and interfaces. This “concept-to-orbit” process begins in the design requirements phase, continues through the formal verification to countdown and launch, and concludes with a post-flight evaluation of events with findings for subsequent missions. Aerospace Corporation personnel cover the depth and breadth of space disciplines, and the organization has its own integrated engineering analysis, laboratory, and test matrix capability. This enables the Aerospace Corporation to rapidly transfer lessons learned and respond to program anomalies. Most importantly, Aerospace is uniquely independent and is not subject to any schedule or cost pressures.

The Aerospace Corporation and the Air Force have found the independent launch verification process extremely valuable. Aerospace Corporation involvement in Air Force launch verification has significantly reduced engineering errors, resulting in a 2.9 percent “probability-of-failure” rate for expendable launch vehicles, compared to 14.6 percent in the commercial sector.27

Conclusion

The practices noted here suggest that responsibility and authority for decisions involving technical requirements and safety should rest with an independent technical authority. Organizations that successfully operate high-risk technologies have a major characteristic in common: they place a premium on safety and reliability by structuring their programs so that technical and safety engineering organizations own the process of determining, maintaining, and waiving technical requirements with a voice that is equal to yet independent of Program Managers, who are governed by cost, schedule and mission-accomplishment goals. The Naval Reactors Program, SUBSAFE program, and the Aerospace Corporation are examples of organizations that have invested in redundant technical authorities and processes to become highly reliable.

7.4 ORGANIZATIONAL CAUSES: A BROKEN SAFETY CULTURE

Perhaps the most perplexing question the Board faced during its seven-month investigation into the Columbia accident was “How could NASA have missed the signals the foam was sending?” Answering this question was a challenge. The investigation revealed that in most cases, the Human Space Flight Program is extremely aggressive in reducing threats to safety. But we also know – in hindsight – that detection of the dangers posed by foam was impeded by “blind spots” in NASAʼs safety culture.

From the beginning, the Board witnessed a consistent lack of concern about the debris strike on Columbia. NASA managers told the Board “there was no safety-of-flight issue” and “we couldnʼt have done anything about it anyway.” The investigation uncovered a troubling pattern in which Shuttle Program management made erroneous assumptions about the robustness of a system based on prior success rather than on dependable engineering data and rigorous testing.

The Shuttle Programʼs complex structure erected barriers to effective communication, and its safety culture no longer asks enough hard questions about risk. (Safety culture refers to an organizationʼs characteristics and attitudes – promoted by its leaders and internalized by its members – that serve to make safety the top priority.) In this context, the Board believes the mistakes that were made on STS-107 are not isolated failures, but are indicative of systemic flaws that existed prior to the accident. Had the Shuttle Program observed the principles discussed in the previous two sections, the threat that foam posed to the Orbiter, particularly after the STS-112 and STS-107 foam strikes, might have been more fully appreciated by Shuttle Program management.

In this section, the Board examines NASAʼs safety policy, structure, and process, communication barriers, the risk assessment systems that govern decision-making and risk management, and the Shuttle Programʼs penchant for substituting analysis for testing.

NASAʼs Safety: Policy, Structure, and Process

Safety Policy

NASAʼs current philosophy for safety and mission assurance calls for centralized policy and oversight at Headquarters and decentralized execution of safety programs at the enterprise, program, and project levels. Headquarters dictates what must be done, not how it should be done. The operational premise that logically follows is that safety is the responsibility of program and project managers. Managers are subsequently given flexibility to organize safety efforts as they see fit, while NASA Headquarters is charged with maintaining oversight through independent surveillance and assessment.28 NASA policy dictates that safety programs should be placed high enough in the organization, and be vested with enough authority and seniority, to “maintain independence.” Signals of potential danger, anomalies, and critical information should, in principle, surface in the hazard identification process and be tracked with risk assessments supported by engineering analyses. In reality, such a process demands a more independent status than NASA has ever been willing to give its safety organizations, despite the recommendations of numerous outside experts over nearly two decades, including the Rogers Commission (1986), General Accounting Office (1990), and the Shuttle Independent Assessment Team (2000).

Safety Organization Structure

Center safety organizations that support the Shuttle Program are tailored to the missions they perform. Johnson and

[Organization chart of the NASA Headquarters, Johnson Space Center, Space Shuttle Program, and United Space Alliance safety organizations. Issue: same individual, 4 roles that cross Center, Program and Headquarters responsibilities. Result: failure of checks and balances.]

Figure 7.4-1. Independent safety checks and balance failure.

Marshall Safety and Mission Assurance organizations are organized similarly. In contrast, Kennedy has decentralized its Safety and Mission Assurance components and assigned them to the Shuttle Processing Directorate. This management change renders Kennedyʼs Safety and Mission Assurance structure even more dependent on the Shuttle Program, which reduces effective oversight.

At Johnson, safety programs are centralized under a Director who oversees five divisions and an Independent Assessment Office. Each division has clearly defined roles and responsibilities, with the exception of the Space Shuttle Division Chief, whose job description does not reflect the full scope of authority and responsibility ostensibly vested in the position. Yet the Space Shuttle Division Chief is empowered to represent the Center, the Shuttle Program, and NASA Headquarters Safety and Mission Assurance at critical junctures in the safety process. The position therefore represents a critical node in NASAʼs Safety and Mission Assurance architecture that seems to the Board to be plagued by conflict of interest. It is a single point of failure without any checks or balances.

Johnson also has a Shuttle Program Safety and Mission Assurance Manager who oversees United Space Allianceʼs safety organization. The Shuttle Program further receives program safety support from the Centerʼs Safety, Reliability, and Quality Assurance Space Shuttle Division. Johnsonʼs Space Shuttle Division Chief has the additional role of Shuttle Program Safety, Reliability, and Quality Assurance Manager (see Figure 7.4-1). Over the years, this dual designation has resulted in a general acceptance of the fact that the Johnson Space Shuttle Division Chief performs duties on both the Centerʼs and Programʼs behalf. The detached nature of the support provided by the Space Shuttle Division Chief, and the wide band of the positionʼs responsibilities throughout multiple layers of NASAʼs hierarchy, confuse lines of authority, responsibility, and accountability in a manner that almost defies explanation.

A March 2001 NASA Office of Inspector General Audit Report on Space Shuttle Program Management Safety Observations made the same point:

    The job descriptions and responsibilities of the Space Shuttle Program Manager and Chief, Johnson Safety Office Space Shuttle Division, are nearly identical with each official reporting to a different manager. This overlap in responsibilities conflicts with the SFOC [Space Flight Operations Contract] and NSTS 07700, which requires the Chief, Johnson Safety Office Space Shuttle Division, to provide matrixed personnel support to the Space Shuttle Program Safety Manager in fulfilling requirements applicable to the safety, reliability, and quality assurance aspects of the Space Shuttle Program.

The fact that Headquarters, Center, and Program functions are rolled up into one position is an example of how a carefully designed oversight process has been circumvented and made susceptible to conflicts of interest. This organizational construct is unnecessarily bureaucratic and defeats NASAʼs stated objective of providing an independent safety function. A similar argument can be made about the placement of quality assurance in the Shuttle Processing Divisions at Kennedy, which increases the risk that quality assurance personnel will become too “familiar” with programs they are charged to oversee, which hinders oversight and judgment.

The Board believes that although the Space Shuttle Program has effective safety practices at the “shop floor” level, its operational and systems safety program is flawed by its dependence on the Shuttle Program. Hindered by a cumbersome organizational structure, chronic understaffing, and poor management principles, the safety apparatus is not currently capable of fulfilling its mission. An independent safety structure would provide the Shuttle Program a more effective operational safety process. Crucial components of this structure include a comprehensive integration of safety across all the Shuttle programs and elements, and a more independent system of checks and balances.

Safety Process

In response to the Rogers Commission Report, NASA established what is now known as the Office of Safety and Mission Assurance at Headquarters to independently monitor safety and ensure communication and accountability agency-wide. The Office of Safety and Mission Assurance monitors unusual events like “out of family” anomalies and establishes agency-wide Safety and Mission Assurance policy. (An out-of-family event is an operation or performance outside the expected performance range for a given parameter or which has not previously been experienced.) The Office of Safety and Mission Assurance also screens the Shuttle Programʼs Flight Readiness Process and signs the Certificate of Flight Readiness. The Shuttle Program Manager, in turn, is responsible for overall Shuttle safety and is supported by a one-person safety staff.

The Shuttle Program has been permitted to organize its safety program as it sees fit, which has resulted in a lack of standardized structure throughout NASAʼs various Centers, enterprises, programs, and projects. The level of funding a program is granted impacts how much safety the Program can “buy” from a Centerʼs safety organization. In turn, Safety and Mission Assurance organizations struggle to anticipate program requirements and guarantee adequate support for the many programs for which they are responsible.

It is the Boardʼs view, shared by previous assessments, that the current safety system structure leaves the Office of Safety and Mission Assurance ill-equipped to hold a strong and central role in integrating safety functions. NASA Headquarters has not effectively integrated safety efforts across its culturally and technically distinct Centers. In addition, the practice of “buying” safety services establishes a relationship in which programs sustain the very livelihoods of the safety experts hired to oversee them. These idiosyncrasies of structure and funding preclude the safety organization from effectively providing independent safety analysis.

The commit-to-flight review process, as described in Chapters 2 and 6, consists of program reviews and readiness polls that are structured to allow NASAʼs senior leaders to assess
mission readiness. In like fashion, safety organizations affiliated with various projects, programs, and Centers at NASA conduct a Pre-launch Assessment Review of safety preparations and mission concerns. The Shuttle Program does not officially sanction the Pre-launch Assessment Review, which updates the Associate Administrator for Safety and Mission Assurance on safety concerns during the Flight Readiness Review/Certification of Flight Readiness process.

The Johnson Space Shuttle Safety, Reliability, and Quality Assurance Division Chief orchestrates this review on behalf of Headquarters. Note that this division chief also advises the Shuttle Program Manager of Safety. Because it lacks independent analytical rigor, the Pre-launch Assessment Review is only marginally effective. In this arrangement, the Johnson Shuttle Safety, Reliability, and Quality Assurance Division Chief is expected to render an independent assessment of his own activities. Therefore, the Board is concerned that the Pre-launch Assessment Review is not an effective check and balance in the Flight Readiness Review.

Given that the entire Safety and Mission Assurance organization depends on the Shuttle Program for resources and simultaneously lacks the independent ability to conduct detailed analyses, cost and schedule pressures can easily and unintentionally influence safety deliberations. Structure and process place Shuttle safety programs in the unenviable position of having to choose between rubber-stamping engineering analyses, technical efforts, and Shuttle program decisions, or trying to carry the day during a committee meeting in which the other side almost always has more information and analytic capability.

NASA Barriers to Communication: Integration, Information Systems, and Databases

By their very nature, high-risk technologies are exceptionally difficult to manage. Complex and intricate, they consist of numerous interrelated parts. Standing alone, components may function adequately, and failure modes may be anticipated. Yet when components are integrated into a total system and work in concert, unanticipated interactions can occur that can lead to catastrophic outcomes.29 The risks inherent in these technical systems are heightened when they are produced and operated by complex organizations that can also break down in unanticipated ways. The Shuttle Program is such an organization. All of these factors make effective communication – between individuals and between programs – absolutely critical. However, the structure and complexity of the Shuttle Program hinder communication.

The Shuttle Program consists of government and contract personnel who cover an array of scientific and technical disciplines and are affiliated with various dispersed space, research, and test centers. NASA derives its organizational complexity from its origins as much as its widely varied missions. NASA Centers naturally evolved with different points of focus, a “divergence” that the Rogers Commission found evident in the propensity of Marshall personnel to resolve problems without including program managers outside their Center – especially managers at Johnson, to whom they officially reported (see Chapter 5).

Despite periodic attempts to emphasize safety, NASAʼs frequent reorganizations in the drive to become more efficient reduced the budget for safety, sending employees conflicting messages and creating conditions more conducive to the development of a conventional bureaucracy than to the maintenance of a safety-conscious research-and-development organization. Over time, a pattern of ineffective communication has resulted, leaving risks improperly defined, problems unreported, and concerns unexpressed.30 The question is, why?

The transition to the Space Flight Operations Contract – and the effects it initiated – provides part of the answer. In the Space Flight Operations Contract, NASA encountered a completely new set of structural constraints that hindered effective communication. New organizational and contractual requirements demanded an even more complex system of shared management reviews, reporting relationships, safety oversight and insight, and program information development, dissemination, and tracking.

The Shuttle Independent Assessment Teamʼs report documented these changes, noting that “the size and complexity of the Shuttle system and of the NASA/contractor relationships place extreme importance on understanding, communication, and information handling.”31 Among other findings, the Shuttle Independent Assessment Team observed that:

• The current Shuttle program culture is too insular
• There is a potential for conflicts between contractual and programmatic goals
• There are deficiencies in problem and waiver-tracking systems
• The exchange of communication across the Shuttle program hierarchy is structurally limited, both upward and downward.32

The Board believes that deficiencies in communication, including those spelled out by the Shuttle Independent Assessment Team, were a foundation for the Columbia accident. These deficiencies are byproducts of a cumbersome, bureaucratic, and highly complex Shuttle Program structure and the absence of authority in two key program areas that are responsible for integrating information across all programs and elements in the Shuttle program.

Integration Structures

NASA did not adequately prepare for the consequences of adding organizational structure and process complexity in the transition to the Space Flight Operations Contract. The agencyʼs lack of a centralized clearinghouse for integration and safety further hindered safe operations. In the Boardʼs opinion, the Shuttle Integration and Shuttle Safety, Reliability, and Quality Assurance Offices do not fully integrate information on behalf of the Shuttle Program. This is due, in part, to an irregular division of responsibilities between the Integration Office and the Orbiter Vehicle Engineering Office and the absence of a truly independent safety organization.

Within the Shuttle Program, the Orbiter Office handles many key integration tasks, even though the Integration Office appears to be the more logical office to conduct them; the Orbiter Office does not actively participate in the Integration Control Board; and Orbiter Office managers are actually ranked above their Integration Office counterparts. These uncoordinated roles result in conflicting and erroneous information, and support the perception that the Orbiter Office is isolated from the Integration Office and has its own priorities.

The Shuttle Programʼs structure and process for Safety and Mission Assurance activities further confuse authority and responsibility by giving the Programʼs Safety and Mission Assurance Manager technical oversight of the safety aspects of the Space Flight Operations Contract, while simultaneously making the Johnson Space Shuttle Division Chief responsible for advising the Program on safety performance. As a result, no one office or person in Program management is responsible for developing an integrated risk assessment above the sub-system level that would provide a comprehensive picture of total program risks. The net effect is that many Shuttle Program safety, quality, and mission assurance roles are never clearly defined.

Safety Information Systems

Numerous reviews and independent assessments have noted that NASAʼs safety system does not effectively manage risk. In particular, these reviews have observed that the process by which NASA tracks and attempts to mitigate the risks posed by components on its Critical Items List is flawed. The Post Challenger Evaluation of Space Shuttle Risk Assessment and Management Report (1988) concluded that:

    The committee views NASA critical items list (CIL) waiver decision-making process as being subjective, with little in the way of formal and consistent criteria

The following addresses the hazard tracking tools and major databases in the Shuttle Program that promote risk management.

• Hazard Analysis: A fundamental element of system safety is managing and controlling hazards. NASAʼs only guidance on hazard analysis is outlined in the Methodology for Conduct of Space Shuttle Program Hazard Analysis, which merely lists tools available.35 Therefore, it is not surprising that hazard analysis processes are applied inconsistently across systems, sub-systems, assemblies, and components.

  United Space Alliance, which is responsible for both Orbiter integration and Shuttle Safety, Reliability, and Quality Assurance, delegates hazard analysis to Boeing. However, as of 2001, the Shuttle Program no longer requires Boeing to conduct integrated hazard analyses. Instead, Boeing now performs hazard analysis only at the sub-system level. In other words, Boeing analyzes hazards to components and elements, but is not required to consider the Shuttle as a whole. Since the current Failure Mode Effects Analysis/Critical Item List process is designed for bottom-up analysis at the component level, it cannot effectively support the kind of “top-down” hazard analysis that is needed to inform managers on risk trends and identify potentially harmful interactions between systems.

  The Critical Item List (CIL) tracks 5,396 individual Shuttle hazards, of which 4,222 are termed “Critical-

SPACE SHUTTLE SAFETY UPGRADE PROGRAM

NASA presented a Space Shuttle Safety Upgrade Initiative to Congress as part of its Fiscal Year 2001 budget in March
for approval or rejection of waivers. Waiver decisions 2000. This initiative sought to create a “Pro-active upgrade
appear to be driven almost exclusively by the design program to keep Shuttle flying safely and efficiently to 2012
and beyond to meet agency commitments and goals for hu-
based Failure Mode Effects Analysis (FMEA)/CIL man access to space.”
retention rationale, rather than being based on an in-
tegrated assessment of all inputs to risk management. The planned Shuttle safety upgrades included: Electric
The retention rationales appear biased toward proving Auxiliary Power Unit, Improved Main Landing Gear Tire,
that the design is “safe,” sometimes ignoring signifi- Orbiter Cockpit/Avionics Upgrades, Space Shuttle Main En-
cant evidence to the contrary. gine Advanced Health Management System, Block III Space
Shuttle Main Engine, Solid Rocket Booster Thrust Vector
The report continues, “… the Committee has not found an Control/Auxiliary Power Unit Upgrades Plan, Redesigned
independent, detailed analysis or assessment of the CIL Solid Rocket Motor – Propellant Grain Geometry Modifica-
retention rationale which considers all inputs to the risk as- tion, and External Tank Upgrades – Friction Stir Weld. The
plan called for the upgrades to be completed by 2008.
sessment process.”33 Ten years later, the Shuttle Independent
Assessment Team reported “Risk Management process ero- However, as discussed in Chapter 5, every proposed safety
sion created by the desire to reduce costs …” 34 The Shuttle upgrade – with a few exceptions – was either not approved
Independent Assessment Team argued strongly that NASA or was deferred.
Safety and Mission Assurance should be restored to its pre-
vious role of an independent oversight body, and Safety and The irony of the Space Shuttle Safety Upgrade Program was
Mission Assurance not be simply a “safety auditor.” that the strategy placed emphasis on keeping the “Shuttle
flying safely and efficiently to 2012 and beyond,” yet the
The Board found similar problems with integrated hazard Space Flight Leadership Council accepted the upgrades
analyses of debris strikes on the Orbiter. In addition, the only as long as they were financially feasible. Funding a
safety upgrade in order to fly safely, and then canceling it
information systems supporting the Shuttle – intended to be for budgetary reasons, makes the concept of mission safety
tools for decision-making – are extremely cumbersome and rather hollow.
difficult to use at any level.

188 Report Volume I August 2003
COLUMBIA
ACCIDENT INVESTIGATION BOARD

ity 1/1R.” Of those, 3,233 have waivers. CRIT 1/1R records any non-conformances (instances in which a
component failures are defined as those that will result requirement is not met). Formerly, different Centers and
in loss of the Orbiter and crew. Waivers are granted contractors used the Problem Reporting and Corrective
whenever a Critical Item List component cannot be Action database differently, which prevented compari-
redesigned or replaced. More than 36 percent of these sons across the database. NASA recently initiated an
waivers have not been reviewed in 10 years, a sign that effort to integrate these databases to permit anyone in
NASA is not aggressively monitoring changes in sys- the agency to access information from different Centers.
tem risk. This system, Web Program Compliance Assurance and
Status System (WEBPCASS), is supposed to provide
It is worth noting that the Shuttleʼs Thermal Protection easier access to consolidated information and facilitates
System is on the Critical Item List, and an existing haz- higher-level searches.
ard analysis and hazard report deals with debris strikes.
As discussed in Chapter 6, Hazard Report #37 is inef- However, NASA safety managers have complained that
fectual as a decision aid, yet the Shuttle Program never the system is too time-consuming and cumbersome.
challenged its validity at the pivotal STS-113 Flight Only employees trained on the database seem capable
Readiness Review. of using WEBPCASS effectively. One particularly
frustrating aspect of which the Board is acutely aware is
Although the Shuttle Program has undoubtedly learned the databaseʼs waiver section. It is a critical information
a great deal about the technological limitations inher- source, but only the most expert users can employ it ef-
ent in Shuttle operations, it is equally clear that risk fectively. The database is also incomplete. For instance,
– as represented by the number of critical items list in the case of foam strikes on the Thermal Protection
and waivers – has grown substantially without a vigor- System, only strikes that were declared “In-Fight
ous effort to assess and reduce technical problems that Anomalies” are added to the Problem Reporting and
increase risk. An information system bulging with over Corrective Action database, which masks the full extent
5,000 critical items and 3,200 waivers is exceedingly of the foam debris trends.
difficult to manage.
• Lessons Learned Information System: The Lessons
• Hazard Reports: Hazard reports, written either by the Learned Information System database is a much simpler
Space Shuttle Program or a contractor, document con- system to use, and it can assist with hazard identification
ditions that threaten the safe operation of the Shuttle. and risk assessment. However, personnel familiar with
Managers use these reports to evaluate risk and justify the Lessons Learned Information System indicate that
flight.36 During mission preparations, contractors and design engineers and mission assurance personnel use it
Centers review all baseline hazard reports to ensure only on an ad hoc basis, thereby limiting its utility. The
they are current and technically correct. Board is not the first to note such deficiencies. Numer-
ous reports, including most recently a General Account-
Board investigators found that a large number of hazard ing Office 2001 report, highlighted fundamental weak-
reports contained subjective and qualitative judgments, nesses in the collection and sharing of lessons learned
such as “believed” and “based on experience from by program and project managers.37
previous flights this hazard is an ʻAccepted Risk.ʼ” A
critical ingredient of a healthy safety program is the Conclusions
rigorous implementation of technical standards. These
standards must include more than hazard analysis or Throughout the course of this investigation, the Board found
low-level technical activities. Standards must integrate that the Shuttle Programʼs complexity demands highly ef-
project engineering and management activities. Finally, fective communication. Yet integrated hazard reports and
a mechanism for feedback on the effectiveness of sys- risk analyses are rarely communicated effectively, nor are
tem safety engineering and management needs to be the many databases used by Shuttle Program engineers and
built into procedures to learn if safety engineering and managers capable of translating operational experiences
management methods are weakening over time. into effective risk management practices. Although the
Space Shuttle system has conducted a relatively small num-
Dysfunctional Databases ber of missions, there is more than enough data to generate
performance trends. As it is currently structured, the Shuttle
In its investigation, the Board found that the information Program does not use data-driven safety methodologies to
systems that support the Shuttle program are extremely their fullest advantage.
cumbersome and difficult to use in decision-making at any
level. For obvious reasons, these shortcomings imperil the 7.5 ORGANIZATIONAL CAUSES: IMPACT OF
Shuttle Programʼs ability to disseminate and share critical A FLAWED SAFETY CULTURE ON STS-107
information among its many layers. This section explores
the report databases that are crucial to effective risk man- In this section, the Board examines how and why an array
agement. of processes, groups, and individuals in the Shuttle Program
failed to appreciate the severity and implications of the
• Problem Reporting and Corrective Action: The foam strike on STS-107. The Board believes that the Shuttle
Problem Reporting and Corrective Action database Program should have been able to detect the foam trend and


more fully appreciate the danger it represented. Recall that “safety culture” refers to the collection of characteristics and attitudes in an organization – promoted by its leaders and internalized by its members – that makes safety an overriding priority. In the following analysis, the Board outlines shortcomings in the Space Shuttle Program, Debris Assessment Team, and Mission Management Team that resulted from a flawed safety culture.

Shuttle Program Shortcomings

The flight readiness process, which involves every organization affiliated with a Shuttle mission, missed the danger signals in the history of foam loss.

Generally, the higher information is transmitted in a hierarchy, the more it gets “rolled-up,” abbreviated, and simplified. Sometimes information gets lost altogether, as weak signals drop from memos, problem identification systems, and formal presentations. The same conclusions, repeated over time, can result in problems eventually being deemed non-problems. An extraordinary example of this phenomenon is how Shuttle Program managers assumed the foam strike on STS-112 was not a warning sign (see Chapter 6).

During the STS-113 Flight Readiness Review, the bipod foam strike to STS-112 was rationalized by simply restating earlier assessments of foam loss. The question of why bipod foam would detach and strike a Solid Rocket Booster spawned no further analysis or heightened curiosity; nor did anyone challenge the weakness of the External Tank Project Managerʼs argument that backed launching the next mission. After STS-113ʼs successful flight, once again the STS-112 foam event was not discussed at the STS-107 Flight Readiness Review. The failure to mention an outstanding technical anomaly, even if not technically a violation of NASAʼs own procedures, desensitized the Shuttle Program to the dangers of foam striking the Thermal Protection System, and demonstrated just how easily the flight preparation process can be compromised. In short, the dangers of bipod foam got “rolled-up,” which resulted in a missed opportunity to make Shuttle managers aware that the Shuttle required, and did not yet have, a fix for the problem.

Once the Columbia foam strike was discovered, the Mission Management Team Chairperson asked for the rationale the STS-113 Flight Readiness Review used to launch in spite of the STS-112 foam strike. In her e-mail, she admitted that the analysis used to continue flying was, in a word, “lousy” (Chapter 6). This admission – that the rationale to fly was rubber-stamped – is, to say the least, unsettling.

The Flight Readiness process is supposed to be shielded from outside influence, and is viewed as both rigorous and systematic. Yet the Shuttle Program is inevitably influenced by external factors, including, in the case of STS-107, schedule demands. Collectively, such factors shape how the Program establishes mission schedules and sets budget priorities, which affects safety oversight, workforce levels, facility maintenance, and contractor workloads. Ultimately, external expectations and pressures impact even data collection, trend analysis, information development, and the reporting and disposition of anomalies. These realities contradict NASAʼs optimistic belief that pre-flight reviews provide true safeguards against unacceptable hazards. The schedule pressure to launch International Space Station Node 2 is a powerful example of this point (Section 6.2).

The premium placed on maintaining an operational schedule, combined with ever-decreasing resources, gradually led Shuttle managers and engineers to miss signals of potential danger. Foam strikes on the Orbiterʼs Thermal Protection System, no matter what the size of the debris, were “normalized” and accepted as not being a “safety-of-flight risk.” Clearly, the risk of Thermal Protection damage due to such a strike needed to be better understood in quantifiable terms. External Tank foam loss should have been eliminated or mitigated with redundant layers of protection. If there was in fact a strong safety culture at NASA, safety experts would have had the authority to test the actual resilience of the leading edge Reinforced Carbon-Carbon panels, as the Board has done.

Debris Assessment Team Shortcomings

Chapter Six details the Debris Assessment Teamʼs efforts to obtain additional imagery of Columbia. When managers in the Shuttle Program denied the teamʼs request for imagery, the Debris Assessment Team was put in the untenable position of having to prove that a safety-of-flight issue existed without the very images that would permit such a determination. This is precisely the opposite of how an effective safety culture would act. Organizations that deal with high-risk operations must always have a healthy fear of failure – operations must be proved safe, rather than the other way around. NASA inverted this burden of proof.

Another crucial failure involves the Boeing engineers who conducted the Crater analysis. The Debris Assessment Team relied on the inputs of these engineers along with many others to assess the potential damage caused by the foam strike. Prior to STS-107, Crater analysis was the responsibility of a team at Boeingʼs Huntington Beach facility in California, but this responsibility had recently been transferred to Boeingʼs Houston office. In October 2002, the Shuttle Program completed a risk assessment that predicted the move of Boeing functions from Huntington Beach to Houston would increase risk to Shuttle missions through the end of 2003, because of the small number of experienced engineers who were willing to relocate. To mitigate this risk, NASA and United Space Alliance developed a transition plan to run through January 2003.

The Board has discovered that the implementation of the transition plan was incomplete and that training of replacement personnel was not uniform. STS-107 was the first mission during which Johnson-based Boeing engineers conducted analysis without guidance and oversight from engineers at Huntington Beach.

Even though STS-107ʼs debris strike was 400 times larger than the objects Crater is designed to model, neither Johnson engineers nor Program managers appealed for assistance from the more experienced Huntington Beach engineers,


ENGINEERING BY VIEWGRAPHS
The Debris Assessment Team presented its analysis in a formal briefing to the Mission Evaluation Room that relied on PowerPoint slides from Boeing. When engineering analyses and risk assessments are condensed to fit on a standard form or overhead slide, information is inevitably lost. In the process, the priority assigned to information can be easily misrepresented by its placement on a chart and the language that is used. Dr. Edward Tufte of Yale University, an expert in information presentation who also researched communications failures in the Challenger accident, studied how the slides used by the Debris Assessment Team in their briefing to the Mission Evaluation Room misrepresented key information.38

The slide created six levels of hierarchy, signified by the title and the symbols to the left of each line. These levels prioritized information that was already contained in 11 simple sentences. Tufte also notes that the title is confusing. “Review of Test Data Indicates Conservatism” refers not to the predicted tile damage, but to the choice of test models used to predict the damage.

Only at the bottom of the slide do engineers state a key piece of information: that one estimate of the debris that struck Columbia was 640 times larger than the data used to calibrate the model on which engineers based their damage assessments. (Later analysis showed that the debris object was actually 400 times larger). This difference led Tufte to suggest that a more appropriate headline would be “Review of Test Data Indicates Irrelevance of Two Models.” 39

Tufte also criticized the sloppy language on the slide. “The vaguely quantitative words ʻsignificantʼ and ʻsignificantlyʼ are used 5 times on this slide,” he notes, “with de facto meanings ranging from ʻdetectable in largely irrelevant calibration case studyʼ to ʻan amount of damage so that everyone diesʼ to ʻa difference of 640-fold.ʼ ” 40 Another example of sloppiness is that “cubic inches” is written inconsistently: “3cu. In,” “1920cu in,” and “3 cu in.” While such inconsistencies might seem minor, in highly technical fields like aerospace engineering a misplaced decimal point or mistaken unit of measurement can easily engender inconsistencies and inaccuracies. In another phrase, “Test results do show that it is possible at sufficient mass and velocity,” the word “it” actually refers to “damage to the protective tiles.”

As information gets passed up an organization hierarchy, from people who do analysis to mid-level managers to high-level leadership, key explanations and supporting information are filtered out. In this context, it is easy to understand how a senior manager might read this PowerPoint slide and not realize that it addresses a life-threatening situation.

At many points during its investigation, the Board was surprised to receive similar presentation slides from NASA officials in place of technical reports. The Board views the endemic use of PowerPoint briefing slides instead of technical papers as an illustration of the problematic methods of technical communication at NASA.

[Figure: the Debris Assessment Team slide, with Dr. Tufteʼs annotations. The slide text and the annotations are separated below.]

Slide text:

Review Of Test Data Indicates Conservatism for Tile Penetration

• The existing SOFI on tile test data used to create Crater was reviewed along with STS-107 Southwest Research data
    – Crater overpredicted penetration of tile coating significantly
        • Initial penetration to described by normal velocity
            Varies with volume/mass of projectile(e.g., 200ft/sec for 3cu. In)
        • Significant energy is required for the softer SOFI particle to penetrate the relatively hard tile coating
            Test results do show that it is possible at sufficient mass and velocity
        • Conversely, once tile is penetrated SOFI can cause significant damage
            Minor variations in total energy (above penetration level) can cause significant tile damage
    – Flight condition is significantly outside of test database
        • Volume of ramp is 1920cu in vs 3 cu in for test

2/21/03 6

Annotations:

The vaguely quantitative words "significant" and "significantly" are used 5 times on this slide, with de facto meanings ranging from "detectable in largely irrelevant calibration case study" to "an amount of damage so that everyone dies" to "a difference of 640-fold." None of these 5 usages appears to refer to the technical meaning of "statistical significance."

The low resolution of PowerPoint slides promotes the use of compressed phrases like "Tile Penetration." As is the case here, such phrases may well be ambiguous. (The low resolution and large font generate 3 typographic orphans, lonely words dangling on a separate line.)

This vague pronoun reference "it" alludes to damage to the protective tiles, which caused the destruction of the Columbia. The slide weakens important material with ambiguous language (sentence fragments, passive voice, multiple meanings of "significant"). The 3 reports were created by engineers for high-level NASA officials who were deciding whether the threat of wing damage required further investigation before the Columbia attempted return. The officials were satisfied that the reports indicated that the Columbia was not in danger, and no attempts to further examine the threat were made. The slides were part of an oral presentation and also were circulated as e-mail attachments.

In this slide the same unit of measure for volume (cubic inches) is shown a different way every time (3cu. in, 1920cu. in, 3 cu. in) rather than in clear and tidy exponential form, 1920 in³. Perhaps the available font cannot show exponents. Shakiness in units of measurement provokes concern. Slides that use hierarchical bullet-outlines here do not handle statistical data and scientific notation gracefully. If PowerPoint is a corporate-mandated format for all engineering reports, then some competent scientific typography (rather than the PP market-pitch style) is essential. In this slide, the typography is so choppy and clunky that it impedes understanding.

The analysis by Dr. Edward Tufte of the slide from the Debris Assessment Team briefing. [SOFI=Spray-On Foam Insulation]
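
The 640-fold figure that the sidebar draws from the bottom of the slide can be checked from the two volumes printed there. The following is a minimal sketch of that arithmetic only; the variable names are illustrative and do not come from the report, and the last step assumes that the 400-fold value from later analysis is measured against the same 3 cubic inch test projectile.

```python
# Volumes as printed on the Debris Assessment Team slide, in cubic inches.
ramp_volume_cu_in = 1920   # foam ramp volume stated on the slide
test_volume_cu_in = 3      # projectile volume in the test database

# Ratio of the flight condition to the test condition.
slide_ratio = ramp_volume_cu_in / test_volume_cu_in
print(f"Slide estimate: {slide_ratio:.0f} times the test projectile volume")

# The sidebar states that later analysis put the debris at 400 times larger.
# Assumption (not stated in the report text here): the baseline is the same
# 3 cubic inch test projectile, so this is only a back-calculation.
later_ratio = 400
implied_volume_cu_in = later_ratio * test_volume_cu_in
print(f"Volume implied by a 400-fold ratio: {implied_volume_cu_in} cubic inches")
```

Either way, the flight condition lies hundreds of times outside the range of the test data, which is the point the slide places on its last line.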


who might have cautioned against using Crater so far outside its validated limits. Nor did safety personnel provide any additional oversight. NASA failed to connect the dots: the engineers who misinterpreted Crater – a tool already unsuited to the task at hand – were the very ones the Shuttle Program identified as engendering the most risk in their transition from Huntington Beach. The Board views this example as characteristic of the greater turbulence the Shuttle Program experienced in the decade before Columbia as a result of workforce reductions and management reforms.

Mission Management Team Shortcomings

In the Boardʼs view, the decision to fly STS-113 without a compelling explanation for why bipod foam had separated on ascent during the preceding mission, combined with the low number of Mission Management Team meetings during STS-107, indicates that the Shuttle Program had become overconfident. Over time, the organization determined it did not need daily meetings during a mission, despite regulations that state otherwise.

Status update meetings should provide an opportunity to raise concerns and hold discussions across structural and technical boundaries. The leader of such meetings must encourage participation and guarantee that problems are assessed and resolved fully. All voices must be heard, which can be difficult when facing a hierarchy. An employeeʼs location in the hierarchy can encourage silence. Organizations interested in safety must take steps to guarantee that all relevant information is presented to decision-makers. This did not happen in the meetings during the Columbia mission (see Chapter 6). For instance, e-mails from engineers at Johnson and Langley conveyed the depth of their concern about the foam strike, the questions they had about its implications, and the actions they wanted to take as a follow-up. However, these e-mails did not reach the Mission Management Team.

The failure to convey the urgency of engineering concerns was caused, at least in part, by organizational structure and spheres of authority. The Langley e-mails were circulated among co-workers at Johnson who explored the possible effects of the foam strike and its consequences for landing. Yet, like Debris Assessment Team Co-Chair Rodney Rocha, they kept their concerns within local channels and did not forward them to the Mission Management Team. They were separated from the decision-making process by distance and rank.

Similarly, Mission Management Team participants felt pressured to remain quiet unless discussion turned to their particular area of technological or system expertise, and, even then, to be brief. The initial damage assessment briefing prepared for the Mission Evaluation Room was cut down considerably in order to make it “fit” the schedule. Even so, it took 40 minutes. It was cut down further to a three-minute discussion topic at the Mission Management Team. Tapes of STS-107 Mission Management Team sessions reveal a noticeable “rush” by the meetingʼs leader to the preconceived bottom line that there was “no safety-of-flight” issue (see Chapter 6). Program managers created huge barriers against dissenting opinions by stating preconceived conclusions based on subjective knowledge and experience, rather than on solid data. Managers demonstrated little concern for mission safety.

Organizations with strong safety cultures generally acknowledge that a leaderʼs best response to unanimous consent is to play devilʼs advocate and encourage an exhaustive debate. Mission Management Team leaders failed to seek out such minority opinions. Imagine the difference if any Shuttle manager had simply asked, “Prove to me that Columbia has not been harmed.”

Similarly, organizations committed to effective communication seek avenues through which unidentified concerns and dissenting insights can be raised, so that weak signals are not lost in background noise. Common methods of bringing minority opinions to the fore include hazard reports, suggestion programs, and empowering employees to call “time out” (Chapter 10). For these methods to be effective, they must mitigate the fear of retribution, and management and technical staff must pay attention. Shuttle Program hazard reporting is seldom used, safety time outs are at times disregarded, and informal efforts to gain support are squelched. The very fact that engineers felt inclined to conduct simulated blown tire landings at Ames “after hours” indicates their reluctance to bring the concern up in established channels.

Safety Shortcomings

The Board believes that the safety organization, due to a lack of capability and resources independent of the Shuttle Program, was not an effective voice in discussing technical issues or mission operations pertaining to STS-107. The safety personnel present in the Debris Assessment Team, Mission Evaluation Room, and on the Mission Management Team were largely silent during the events leading up to the loss of Columbia. That silence was not merely a failure of safety, but a failure of the entire organization.

7.6 FINDINGS AND RECOMMENDATIONS

The evidence that supports the organizational causes also led the Board to conclude that NASAʼs current organization, which combines in the Shuttle Program all authority and responsibility for schedule, cost, manifest, safety, technical requirements, and waivers to technical requirements, is not an effective check and balance to achieve safety and mission assurance. Further, NASAʼs Office of Safety and Mission Assurance does not have the independence and authority that the Board and many outside reviews believe is necessary. Consequently, the Space Shuttle Program does not consistently demonstrate the characteristics of organizations that effectively manage high risk. Therefore, the Board offers the following Findings and Recommendations:

Findings:

F7.1-1 Throughout its history, NASA has consistently struggled to achieve viable safety programs and adjust them to the constraints and vagaries of changing budgets. Yet, according to multiple high level independent reviews, NASAʼs safety system has fallen short of the mark.


F7.4-1 The Associate Administrator for Safety and Mission Assurance is not responsible for safety and mission assurance execution, as intended by the Rogers Commission, but is responsible for Safety and Mission Assurance policy, advice, coordination, and budgets. This view is consistent with NASAʼs recent philosophy of management at a strategic level at NASA Headquarters but contrary to the Rogers Commissionʼs recommendation.

F7.4-2 Safety and Mission Assurance organizations supporting the Shuttle Program are largely dependent upon the Program for funding, which hampers their status as independent advisors.

F7.4-3 Over the last two decades, little to no progress has been made toward attaining integrated, independent, and detailed analyses of risk to the Space Shuttle system.

F7.4-4 System safety engineering and management is separated from mainstream engineering, is not vigorous enough to have an impact on system design, and is hidden in the other safety disciplines at NASA Headquarters.

F7.4-5 Risk information and data from hazard analyses are not communicated effectively to the risk assessment and mission assurance processes. The Board could not find adequate application of a process, database, or metric analysis tool that took an integrated, systemic view of the entire Space Shuttle system.

F7.4-6 The Space Shuttle Systems Integration Office handles all Shuttle systems except the Orbiter. Therefore, it is not a true integration office.

F7.4-7 When the Integration Office convenes the Integration Control Board, the Orbiter Office usually does not send a representative, and its staff makes verbal inputs only when requested.

F7.4-8 The Integration Office did not have continuous responsibility to integrate responses to bipod foam shedding from various offices. Sometimes the Orbiter Office had responsibility, sometimes the External Tank Office at Marshall Space Flight Center had responsibility, and sometimes the bipod shedding did not result in any designation of an In-Flight Anomaly. Integration did not occur.

F7.4-9 NASA information databases such as the Problem Reporting and Corrective Action and the Web Program Compliance Assurance and Status System are marginally effective decision tools.

F7.4-10 Senior Safety, Reliability & Quality Assurance and element managers do not use the Lessons Learned Information System when making decisions. NASA subsequently does not have a constructive program to use past lessons to educate engineers, managers, astronauts, or safety personnel.

F7.4-11 The Space Shuttle Program has a wealth of data tucked away in multiple databases without a convenient way to integrate and use the data for management, engineering, or safety decisions.

F7.4-12 The dependence of Safety, Reliability & Quality Assurance personnel on Shuttle Program support limits their ability to oversee operations and communicate potential problems throughout the organization.

F7.4-13 There are conflicting roles, responsibilities, and guidance in the Space Shuttle safety programs. The Safety & Mission Assurance Pre-Launch Assessment Review process is not recognized by the Space Shuttle Program as a requirement that must be followed (NSTS 22778). Failure to consistently apply the Pre-Launch Assessment Review as a requirements document creates confusion about roles and responsibilities in the NASA safety organization.

Recommendations:

R7.5-1 Establish an independent Technical Engineering Authority that is responsible for technical requirements and all waivers to them, and will build a disciplined, systematic approach to identifying, analyzing, and controlling hazards throughout the life cycle of the Shuttle System. The independent technical authority does the following as a minimum:

  • Develop and maintain technical standards for all Space Shuttle Program projects and elements
  • Be the sole waiver-granting authority for all technical standards
  • Conduct trend and risk analysis at the sub-system, system, and enterprise levels
  • Own the failure mode, effects analysis and hazard reporting systems
  • Conduct integrated hazard analysis
  • Decide what is and is not an anomalous event
  • Independently verify launch readiness
  • Approve the provisions of the recertification program called for in Recommendation R9.1-1

  The Technical Engineering Authority should be funded directly from NASA Headquarters, and should have no connection to or responsibility for schedule or program cost.

R7.5-2 NASA Headquarters Office of Safety and Mission Assurance should have direct line authority over the entire Space Shuttle Program safety organization and should be independently resourced.

R7.5-3 Reorganize the Space Shuttle Integration Office to make it capable of integrating all elements of the Space Shuttle Program, including the Orbiter.


ENDNOTES FOR CHAPTER 7

The citations that contain a reference to “CAIB document” with CAB or CTF followed by seven to eleven digits, such as CAB001-0010, refer to a document in the Columbia Accident Investigation Board database maintained by the Department of Justice and archived at the National Archives.

1 Sylvia Kramer, “History of NASA Safety Office from 1958-1980ʼs,” NASA History Division Record Collection, 1986, p. 1. CAIB document CAB065-0358.
2 Ralph M. Miles Jr. “Introduction.” In Ralph M. Miles Jr., editor, System Concepts: Lectures on Contemporary Approaches to Systems, p. 1-12 (New York: John F. Wiley & Sons, 1973).
3 “The Aerospace Safety Advisory Panel,” NASA History Office, July 1, 1987, p. 1.
4 On Rodneyʼs appointment, see NASA Management Instruction 1103.39, July 3, 1986, and NASA News July 8, 1986.
5 NASA Facts, “Brief Overview, Office of Safety, Reliability, Maintainability and Quality Assurance,” circa 1987.
6 “Space Program Safety: Funding for NASAʼs Safety Organizations Should Be Centralized,” General Accounting Office Report, NSIAD-90-187, 1990.
7 “Aerospace Safety Advisory Panel Annual Report,” 1996.
8 The quotes are from the Executive Summary of National Aeronautics and Space Administration Space Shuttle Independent Assessment Team, “Report to Associate Administrator, Office of Space Flight,” October-December 1999. CAIB document CTF017-0169.
9 Harry McDonald, “SIAT Space Shuttle Independent Assessment Team Report.”
10 NASA Chief Engineer and NASA Integrated Action Team, “Enhancing Mission Success – A Framework for the Future,” December 21, 2000.
11 The information in this section is derived from a briefing titled, “Draft Final Report of the Space Shuttle Competitive Source Task Force,” July 12, 2002. Mr. Liam Sarsfield briefed this report to NASA Headquarters.
12 Dr. Karl Weick, University of Michigan; Dr. Karlene Roberts, University of California-Berkeley; Dr. Howard McCurdy, American University; and Dr. Diane Vaughan, Boston College.
13 Dr. David Woods, Ohio State University; Dr. Nancy G. Leveson, Massachusetts Institute of Technology; Mr. James Wick, Intel Corporation; Ms. Deborah L. Grubbe, DuPont Corporation; Dr. M. Sam Mannan, Texas A&M University; Douglas A. Wiegmann, University of Illinois at Urbana-Champaign; and Mr. Alan C. McMillan, President and Chief Executive Officer, National Safety Council.
14 Todd R. La Porte and Paula M. Consolini, “Working in Practice but Not in Theory,” Journal of Public Administration Research and Theory, 1 (1991) pp. 19-47.
15 Scott Sagan, The Limits of Safety (Princeton: Princeton University Press, 1995).
16 Dr. Diane Vaughan, Boston College; Dr. David Woods, Ohio State University; Dr. Howard E. McCurdy, American University; Dr. Karl E. Weick, University of Michigan; Dr. Karlene H. Roberts; Dr. M. Elisabeth Paté-Cornell; Dr. Douglas A. Wiegmann, University of Illinois at Urbana-Champaign; Dr. Nancy G. Leveson, Massachusetts Institute of Technology; Mr. James Wick, Intel Corporation; Ms. Deborah L. Grubbe, DuPont Corporation; Dr. M. Sam Mannan, Texas A&M University; and Mr. Alan C. McMillan, President and Chief Executive Officer, National Safety Council.
17 Dr. David Woods of Ohio State University speaking to the Board on Hind-Sight Bias. April 28, 2003.
18 Sagan, The Limits of Safety, p.258.
19 LaPorte and Consolini, “Working In Practice.”
20 Notes from “NASA/Navy Benchmarking Exchange (NNBE), Interim Report, Observations & Opportunities Concerning Navy Submarine Program Safety Assurance,” Joint NASA and Naval Sea Systems Command NNBE Interim Report, December 20, 2002.
21 Theodore Rockwell, The Rickover Effect, How One Man Made a Difference. (Annapolis, Maryland: Naval Institute Press, 1992), p. 318.
22 Rockwell, Rickover, p. 320.
23 For more information, see Dr. Diane Vaughan, The Challenger Launch Decision, Risky Technology, Culture, and Deviance at NASA (Chicago: University of Chicago Press, 1996).
24 Presentation to the Board by Admiral Walter Cantrell, Aerospace Advisory Panel member, April 7, 2003.
25 Presentation to the Board by Admiral Walter Cantrell, Aerospace Advisory Panel member, April 7, 2003.
26 Aerospaceʼs Launch Verification Process and its Contribution to Titan Risk Management, Briefing given to Board, May 21, 2003, Mr. Ken Holden, General Manager, Launch Verification Division.
27 Joe Tomei, “ELV Launch Risk Assessment Briefing,” 3rd Government/Industry Mission Assurance Forum, Aerospace Corporation, September 24, 2002.
28 NASA Policy Directive 8700.1A, “NASA Policy for Safety and Mission Success”, Para 1.b, 5.b(1), 5.e(1), and 5.f(1).
29 Charles B. Perrow. Normal Accidents (New York: Basic Books, 1984).
30 A. Shenhar, “Project management style and the space shuttle program (part 2): A retrospective look,” Project Management Journal, 23 (1), pp. 32-37.
31 Harry McDonald, “SIAT Space Shuttle Independent Assessment Team Report.”
32 Ibid.
33 “Post Challenger Evaluation of Space Shuttle Risk Assessment and Management Report, National Academy Press 1988,” section 5.1, pg. 40.
34 Harry McDonald, “SIAT Space Shuttle Independent Assessment Team Report.”
35 NSTS-22254 Rev B.
36 Ibid.
37 GAO Report, “Survey of NASA Lessons Learned,” GAO-01-1015R, September 5, 2001.
38 E. Tufte, Beautiful Evidence (Cheshire, CT: Graphics Press). [in press.]
39 Ibid., Edward R. Tufte, “The Cognitive Style of PowerPoint,” (Cheshire, CT: Graphics Press, May 2003).
40 Ibid.

