Evolution and Efficacy of ISS Risk Management
Department of Space Studies, University of North Dakota, Grand Forks, ND 58202, USA November 19, 2010
Risk is a measure of the potential inability to achieve overall program objectives and is characterized by the probability (likelihood) of failing and the resulting consequence (impact). The high cost and complexity of aerospace systems often lead to long development schedules and high risk because of the importance placed on performance in achieving mission objectives (Altavilla and Garbellini 2002). The International Space Station (ISS) is one of the most complex technical and programmatic projects ever undertaken by humanity. For example, its components were designed, manufactured, and launched by numerous international partners to serve diverse customers and purposes (Perera 2002). The ISS program is therefore subject to a great deal of risk that must be managed to ensure its success.
In 1988, when NASA started designing the ISS, the orbiting complex was to cost $23 billion and be completed by 1996. Today, the cost has topped $150 billion and the station is still not finished. By the year 2000, multiple failed NASA Mars missions and large cost and schedule overruns in the ISS program had brought the agency’s risk management practices into the public spotlight. The 2001 ISS Management and Cost Evaluation Task Force report by the Young Commission concluded that, although the ISS had made excellent progress during its first three years of construction, NASA had consistently underestimated cost and schedule and lacked the necessary skills and tools to manage the project within budget (Young 2001). The commission further noted that NASA
management had overly focused on performance and crew safety with emphasis on near-term schedules, rather than on total program costs and long-term schedules. The report led to a complete reworking of the ISS risk management strategy toward a more robust and comprehensive approach to better keep the program on track.
Risk management is the process by which risks are identified and plans are implemented to mitigate their consequences. Risks can be either programmatic or technical. Programmatic risks have consequences that primarily affect cost and schedule, while technical risks have consequences affecting performance and objectives (Altavilla and Garbellini 2002). NASA applies a Continuous Risk Management (CRM) approach as a systemic method to identify, analyze, plan, track, control, communicate, and document risks on a continuous basis (NASA 2002, Perera 2002). Figure 1 illustrates the iterative nature of CRM and emphasizes the importance of communication in binding the entire process together (NASA 2007).
Figure 1: Continuous Risk Management Process.
The ISS Program Risk Management Plan, published as NASA document SSP 50175, defines the ISS risk management strategy as follows (NASA 2002):
a) Embed risk management processes into normal day-to-day activities to identify and help manage all risks and potential threats.
b) Delegate risk-management responsibility to the lowest possible organization with the allocated resources to mitigate or authority to accept the risk.
c) Dedicate a Program Risk Management organization to lead program-level risk-management activities, facilitate the risk-management processes, and provide analytical support and tools including Probabilistic Risk Assessments (PRAs), ISS Risk Management processes training and other risk-management assistance to managing organizations.
d) Provide the necessary costs and funding analysis to address all risks and potential threats to the ISS.
e) Integrate the risk cost and schedule process within the risk system.
Under the guidance of NASA Headquarters, the Johnson Space Center-based ISS Program Office sets the standards that all ISS program participants must follow. It develops, implements, and supports risk management tools and practices using both qualitative and quantitative methodologies (Perera 2002). The process employed continually assesses and
prioritizes risks, implements strategies to deal with them, and measures the effectiveness of the strategies implemented. The office also facilitates the ISS Program Risk Advisory Board
(PRAB), which is chaired by the ISS Program Manager and convenes at regular intervals to oversee the effective integrated management of risks (NASA 2002). The PRAB reviews risks, ranks them according to a Likelihood x Consequence (L x C) methodology, and assigns their mitigation to responsible managing organizations, which in turn report risks up the chain. Altogether, the ISS Program Risk Management Office’s activities are intended to increase the likelihood of mission success in the most resource-efficient way possible while promoting teamwork, communication, and sound Risk-Informed Decision-Making (RIDM) within the hierarchy of the NASA organization and the taxonomy of a Risk Breakdown Structure (RBS) (NASA 2007, 2010). Coupled with CRM, the RIDM process and RBS method foster
proactive, consistent risk management by making decisions with regard to outcomes of alternatives, taking into account applicable risks and uncertainties (NASA 2008). The Integrated Risk Management Application (IRMA) is the day-to-day systemic tool NASA managers use to manage risk. The browser-based application provides an integrated risk management database to track and communicate risk throughout all ISS managing organizations (Perera 2002). Each risk is assigned a characteristic severity, cost, and set of mitigation tasks that any stakeholder organization can track in order to know the impacts across the program. Originally developed by the Futron Corporation specifically for the ISS Program (Moses and Malone 2005), the IRMA has expanded agency-wide into other NASA programs as a key component of the One NASA Management Information System (MIS) (Pino and Pitotti 2005). Figure 2 shows a screenshot of the IRMA application.
Figure 2: IRMA screenshot example from the One NASA MIS.
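The Likelihood x Consequence ranking applied to tracked risks can be illustrated with a minimal sketch. The 1 to 5 rating scales, the product used as a score, the field names, and the sample risks below are all assumptions made for illustration; none of them is taken from SSP 50175 or from the IRMA data model.

```python
from dataclasses import dataclass, field


@dataclass
class Risk:
    """Hypothetical risk record; field names are illustrative, not IRMA's schema."""
    title: str
    likelihood: int   # assumed scale: 1 (very unlikely) to 5 (near certain)
    consequence: int  # assumed scale: 1 (negligible) to 5 (catastrophic)
    owner: str = "unassigned"
    mitigation_tasks: list = field(default_factory=list)

    def __post_init__(self):
        # Reject ratings outside the assumed 1..5 scale.
        for name in ("likelihood", "consequence"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be in 1..5, got {value}")

    @property
    def score(self) -> int:
        # Assumed scoring rule: simple product of the two ratings.
        return self.likelihood * self.consequence


def rank_risks(risks):
    """Return risks sorted from highest to lowest L x C score.

    Ties are broken by consequence so that more severe outcomes rank first.
    """
    return sorted(risks, key=lambda r: (r.score, r.consequence), reverse=True)


# Made-up sample data for demonstration only.
sample = [
    Risk("Late delivery of a spare unit", likelihood=4, consequence=2),
    Risk("Loss of a vital system", likelihood=1, consequence=5),
    Risk("Software schedule slip", likelihood=3, consequence=3),
]

for r in rank_risks(sample):
    print(f"{r.score:>2}  L={r.likelihood} C={r.consequence}  {r.title}")
```

A plain product treats a frequent, minor risk and a rare, severe one as comparable, which is one reason a review board would look at both ratings rather than the score alone.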
The metrics collected by the IRMA allow the Likelihood, Consequence, and Magnitude of risks to be plotted in matrix form. The ISS Risk Summary Card provides a quick overview of this information to aid managers in decision-making (Perera 2002). Figure 3 shows an example of an ISS Risk Summary Card. The PRAB reviews these risks in the global context of the whole ISS program and uses logical decision-making processes and Probabilistic Risk Assessment (PRA) analyses to assign resources to help mitigate them (NASA 2002, Stamatelatos et al. 2002a,b). For example, through a PRA process, NASA ISS risk managers can provide more robust risk characterization by conducting trade studies on possible accident scenarios and their likely end states. The scenarios include catastrophic loss of the Station, loss or severe injury of a crewmember, loss of a vital ISS system, and situations requiring Station evacuation (NASA 2002).
Figure 3: Risk Summary Card.
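The matrix view of likelihood and consequence described above can be sketched as a simple tally of risks per cell. This is only an illustration: the 5 x 5 grid size, the function names, and the sample ratings are assumptions, not the layout of the actual ISS Risk Summary Card.

```python
from collections import Counter


def build_matrix(ratings, size=5):
    """Count risks per (likelihood, consequence) cell of a size x size grid.

    `ratings` is an iterable of (likelihood, consequence) pairs, each in 1..size.
    Rows are ordered from highest likelihood (top) to lowest (bottom).
    """
    counts = Counter()
    for likelihood, consequence in ratings:
        if not (1 <= likelihood <= size and 1 <= consequence <= size):
            raise ValueError(f"rating out of range: {(likelihood, consequence)}")
        counts[(likelihood, consequence)] += 1
    return [
        [counts[(l, c)] for c in range(1, size + 1)]
        for l in range(size, 0, -1)
    ]


def render(matrix):
    """Format the grid as text: likelihood on rows, consequence on columns."""
    size = len(matrix)
    lines = []
    for offset, row in enumerate(matrix):
        likelihood = size - offset
        cells = " ".join(f"{n:>2}" if n else " ." for n in row)
        lines.append(f"L{likelihood} | {cells}")
    lines.append("     " + " ".join(f"C{c}" for c in range(1, size + 1)))
    return "\n".join(lines)


# Made-up ratings for demonstration only.
ratings = [(4, 2), (1, 5), (3, 3), (3, 3)]
print(render(build_matrix(ratings)))
```

Cells toward the top right (high likelihood, high consequence) are the ones a reviewer would normally examine first.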
Strict compliance with a risk management process does not necessarily mean it is an effective process. Human psychology dictates that people will not generally communicate risk unless that communication provides an immediate personal benefit (Williams and Perera 2004). The barrier to entry for reporting risks should therefore be as low as possible to ensure universal adoption by the workforce. This means reporting tools need to be user-friendly and convenient. In addition, to comprehensively identify and analyze all risks for inclusion in the CRM process, a manager must understand the team’s “picture of success” and foster its universal, explicit adoption by the group (Williams and Perera 2004). Such two-way “buy-in” between program management and the project team leads to greater confidence, loyalty, performance, and communication, while reducing fear (Canga and Wood 2009). New metrics added to the IRMA tool to help address this type of concern include tracking of the speed and fidelity of reporting, which provides quantitative measures to incentivize timely and accurate risk reporting. These have improved the clarity and granularity of risk reporting and mitigation (Williams and Perera 2005). The data also contribute to a body of knowledge that facilitates a structured continuous improvement process for risk management.
Today, NASA’s overall agency risk management approach is governed by NASA Procedural Requirements (NPR) 8000.4A, which was issued in December 2008 (NASA 2008). This introduced a number of changes that move NASA’s risk management approach in a more proactive direction that recognizes the context of complex institutional relationships (Dezfuli 2009). One goal of the new emphasis is for the agency to become a better learning organization by integrating risk and knowledge management through a process that preserves knowledge for the future (NASA 2007, Dezfuli 2009). 
This allows planners to learn more effectively from the past, either repeating successes or avoiding a repeat of failures. By establishing and working from best practices accumulated over time and integrating them into the CRM paradigm, transfer of knowledge is better incorporated into the organizational culture. To achieve these goals, NASA established “Communities of Practice” (CoP) groups, which are composed of people within the agency who share common concerns, knowledge, and passion for a topic. CoP members deepen their knowledge and expertise by interacting on an ongoing basis using a number of tools such as the Process-Based Mission Assurance (PBMA) Portal and the Riskapedia wiki (Lengyel 2007, Heard 2010), which are pictured in Figure 4. The tool-supported risk management approach (Williams and Perera 2004) facilitates formal and informal reporting pathways using a network of communication tools in order to achieve the ultimate goal of accomplishing more with less bureaucracy. The approach emphasizes compliance with intent rather than compliance with detailed processes and procedures (NASA 2007, Canga and Wood 2009). This has led to a more effective risk management strategy for NASA in recent years.
Figure 4: CoP tools: Process-Based Mission Assurance (left) and Riskapedia (right).
On October 22, 2010, the ISS surpassed the Mir space station in number of continuous operational days, at 3641 (Jackman and Carreau 2010).
Towards the end of its life, Mir was plagued with a number of dangerous system failures, but the ISS has not yet experienced such safety problems. This is perhaps due in large part to NASA’s safety culture in general and its specific implementation of risk mitigation recommendations from the Columbia Accident Investigation Board (NASA 2005). Although the entire history of the ISS program, from its initial proposal in the early 1980s to today, has been characterized by large cost and schedule overruns at the expense of station scope and performance, the fact remains that the outpost has weathered political and economic turmoil to survive with a spectacular safety record. The evolution of NASA’s risk management strategy has been driven in large part by the ISS program, and the modern collaborative, network-based practices are yielding greater risk management efficacy for the agency. Perhaps it is this latter cultural shift within the agency, brought about by modern communication and collaboration tools, that will bring knowledge-based risk management to a higher level of effectiveness, resulting in long-term measurable cost, schedule, and performance gains.
Altavilla, A. and L. Garbellini (2002). “Risk assessment in the aerospace industry”, Safety Science, 40, 271-298.
Canga, M. A. and J. M. Wood (2009). “Fostering an Environment Conducive to Successful Program/Project Risk Management”, NASA Project Management Challenge 2009, 24-25 February 2009.
Dezfuli, H. (2009). “NASA’s New Risk Management Paradigm”, NASA Project Management Challenge 2009, 24-25 February 2009.
Heard, I. A. (2010). “Riskapedia: An ESMD Risk Management Service for NASA’s Project Managers, Project Engineers, and Risk Practitioners”, NASA Project Management Challenge 2010, 9 February 2010.
Jackman, F. and M. Carreau (2010). “ISS Passing Old Russian Mir in Crewed Time”, Aviation Week, 29 October 2010.
Lengyel, D. (2007). “Integrating Risk and Knowledge Management in ESMD”, NASA Project Management Challenge 2007, 6 February 2007, 23pp.
Moses, K. D. and R. W. Malone (2005). “Development of Risk Assessment Matrix for NASA Engineering and Safety Center”, Risk Analysis: The Profession and the Future Workshop, Palm Springs, CA, 5-8 Dec 2004.
NASA (2002). International Space Station Program Risk Management Plan, SSP 50175 Revision A, 10 April 2002, 24pp.
NASA (2005). NASA’s Implementation Plan for International Space Station Continuing Flight, SSP 110883 Volume 2: Revision 2, 15 February 2005, 210pp.
NASA (2007). Exploration Systems Risk Management Plan, ESMD-RMP-04.06, Rev 2, 16 August 2007, 77pp.
NASA (2008). Agency Risk Management Procedural Requirements, NPR 8000.4A, 16 December 2008, 26pp.
NASA (2010). NASA Risk-Informed Decision Making Handbook, SP-2010-576, Version 1.0, April 2010, 128pp.
Perera, J. B. (2002). “Risk Management for the International Space Station”, Joint ESA-NASA SpaceFlight Safety Conference, ESA SP-486, Noordwijk, NL, 11-14 June 2002, p339-344.
Perera, J. B. (2005). “Integrated Risk Management Application (IRMA) Overview/Update”, NASA Risk Management Conference 2005, Orlando, FL, 6-8 December 2005.
Pino, C. and S. Pitotti (2005). “One NASA Management Information System: Theory to Practical Case Studies”, NASA Project Management Challenge 2005, 21 March 2005, 44pp.
Stamatelatos, M. et al. (2002a). Fault Tree Handbook with Aerospace Applications, Version 1.1, August 2002, 218pp.
Stamatelatos, M. et al. (2002b). Probabilistic Risk Assessment Procedures Guide for NASA Managers and Practitioners, Version 1.1, August 2002, 323pp.
Williams, R. and J. Perera (2004). “Update - Risk Management Process Metrics: Measuring Effectiveness of Risk Management”, NASA Risk Management Conference 2004, Cleveland, OH, 26-29 October 2004.
Williams, R. and J. Perera (2005). “Update - Risk Management Process Metrics: Measuring Effectiveness of Risk Management”, NASA Risk Management Conference 2005, Orlando, FL, 6-8 December 2005.
Young, A. T., et al. (2001). Report by the International Space Station (ISS) Management and Cost Evaluation (IMCE) Task Force, 1 November 2001, 41pp.