Survivability Assurance for System of Systems

Robert J. Ellison John Goodenough Charles Weinstock Carol Woody May 2008 TECHNICAL REPORT CMU/SEI-2008-TR-008 ESC-TR-2008-008 Networked Systems Survivability
Unlimited distribution subject to the copyright.

This report was prepared for the SEI Administrative Agent ESC/XPK 5 Eglin Street Hanscom AFB, MA 01731-2100 The ideas and findings in this report should not be construed as an official DoD position. It is published in the interest of scientific and technical information exchange. This work is sponsored by the U.S. Department of Defense. The Software Engineering Institute is a federally funded research and development center sponsored by the U.S. Department of Defense. Copyright 2008 Carnegie Mellon University. NO WARRANTY THIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING INSTITUTE MATERIAL IS FURNISHED ON AN "AS-IS" BASIS. CARNEGIE MELLON UNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED, AS TO ANY MATTER INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR PURPOSE OR MERCHANTABILITY, EXCLUSIVITY, OR RESULTS OBTAINED FROM USE OF THE MATERIAL. CARNEGIE MELLON UNIVERSITY DOES NOT MAKE ANY WARRANTY OF ANY KIND WITH RESPECT TO FREEDOM FROM PATENT, TRADEMARK, OR COPYRIGHT INFRINGEMENT. Use of any trademarks in this report is not intended in any way to infringe on the rights of the trademark holder. Internal use. Permission to reproduce this document and to prepare derivative works from this document for internal use is granted, provided the copyright and "No Warranty" statements are included with all reproductions and derivative works. External use. Requests for permission to reproduce this document or prepare derivative works of this document for external and commercial use should be addressed to the SEI Licensing Agent. This work was created in the performance of Federal Government Contract Number FA8721-05-C-0003 with Carnegie Mellon University for the operation of the Software Engineering Institute, a federally funded research and development center. The Government of the United States has a royalty-free government-purpose license to use, duplicate, or disclose the work, in whole or in part and in any manner, and to have or permit others to do so, for government purposes pursuant to the copyright license under the clause at 252.227-7013. For information about purchasing paper copies of SEI reports, please visit the publications portion of our Web site (http://www.sei.cmu.edu/publications/pubweb.html).

Table of Contents

Acknowledgments Executive Summary Abstract 1 Introduction 1.1 The Need for Analysis of Systems of Systems 1.2 The Assurance Problem Is Hard And Getting Worse 1.3 Strategy for a Solution Building a Shared View of the Organization, People, Process, and Technology for Assurance 2.1 Survivability Analysis Framework 2.1.1 Mission Thread Steps and Step Interactions 2.1.2 Structuring an SAF View of a Business Process 2.1.3 Identifying Critical Steps for Analysis 2.1.4 Evaluating Failure Potential Value Provided by Using a Shared View

vii ix xi 1 1 2 5 9 10 10 11 14 18 19 21 22 23 25 31 33 41 49

2

2.2 3

Assurance Cases for Business Process Survivability 3.1 A Notation for Assurance Cases 3.2 Developing an Assurance Case 3.3 A Survivability Assurance Case Conclusion Example SAF Business Process Mission Steps for Assurance Case

4

Appendix A Appendix B

References/Bibliography

SOFTWARE ENGINEERING INSTITUTE | I

II | CMU/SEI-2008-TR-008 .

Evidence Figure 5: An Assurance Case Fragment Figure 6: A Possible Top-Level Safety Case Structure Figure 7: A Top-Level Survivability Case Figure 8: An Alternative Structure 3 11 18 21 23 24 24 25 SOFTWARE ENGINEERING INSTITUTE | III .List of Figures Figure 1: Development and Sustainment Context Figure 2. Arguments. Survivability Analysis Framework Figure 3: How Systems Fail Figure 4: Claims.

IV | CMU/SEI-2008-TR-008 .

List of Tables Table 1: Evaluation Criteria Table 2: SAF View of Example Step A Table 3: SAF Critical Step View with Claims Table 4: SAF Critical Step View with Failure Potentials Table 5: SAF People Summary View for Steps A1 through A3 Table 6: SAF Resource Summary View for Steps A1 through A3 6 14 14 15 17 17 SOFTWARE ENGINEERING INSTITUTE | V .

VI | CMU/SEI-2008-TR-008

Acknowledgments

This work is the result of many incremental pieces and could not have been accomplished without the support, involvement, and encouragement of each contributor. The core approach for the Survivability Analysis Framework (SAF) was conceived in support of a project for Office of the Under Secretary of Defense for Acquisition, Technology and Logistics (OUSD [AT&L]) to study how survivability of critical multi-service capabilities might be impacted by planned increased interoperability. This framework was initially developed to analyze mission survivability of the Joint Battle Management Command and Control (JBMC2) business processes (also known as mission threads) as the U.S. Department of Defense (DoD) pushes toward increased interoperability with the envisioned Global Information Grid (GIG). Limitations in available analysis techniques required those of us working on that project to build a new way to characterize and analyze business processes. Researchers from the Software Engineering Institute’s Dynamic Systems Program worked with researchers from SEI’s Networked Survivable Systems (NSS) Program to assemble this initial effort. Expansion and refinement of the SAF continued through projects supported by the U.S. Air Force’s Electronic Systems Center (ESC) Cryptologic Systems Group (CPSG) to identify ways of connecting information assurance acquisition decisions to mission critical needs. In partnership with Robert Flores, lead security engineer at CPSG/NIS, GIG IA Solutions Division, and David Eyestone, CPSG/NIS, the NSS team was able to strengthen the structure of the framework to better support ways of connecting information assurance needs to mission drivers. Through work funded by Edgar Dalrymple, associate director for software and distributed systems, PM Future Combat Systems (Brigade Combat Team), the potential for building an assurance case with information about business processes from the SAF was identified. Preliminary usage of SAF with reliability and security assurance cases for FCS provided an opportunity to experiment with blending the analysis techniques. A “seed grant” from Carnegie Mellon Cylab for academic year 2006-2007 provided the support for the authors to develop publicly available materials describing the framework and its use in assembling an example assurance case.

SOFTWARE ENGINEERING INSTITUTE | VII

VIII | CMU/SEI-2008-TR-008

Failure analysis opportunities are introduced using the artifacts constructed in the medical example. Much like a legal case presented in a courtroom. The SAF is designed to address the following: identify potential problems with existing or near-term interoperations among components within today’s network environments highlight the impact on survivability as constrained interoperation moves to more dynamic connectivity increase our assurance that the business process can survive in the presence of stress and possible failure Section 2 of this report describes. Much of the information needed to assemble this view is scattered among a range of stakeholders and must be gathered through documents and workshops.Executive Summary The growing need for interconnected and often global operations means business processes are structured to include many organizations. and users can share. The steps needed to assemble the assurance case are described using the medical business process example developed in Section 2. and technology support is multi-system. software engineers. providing insight into how they should work without consideration for what is actually in place. through the use of a medical business process example. actual operational usage is poorly supported. This results in organizations failing to identify and address the growing challenges of systems of systems [Maier 1998]. and technology. Few analysis techniques provide a way to characterize beyond the limitations of the single system and many are also limited to a finite set of stakeholders. system engineers. SOFTWARE ENGINEERING INSTITUTE | IX . SAF provides a structure for capturing information about a business process so that gaps are readily identifiable. By combining these two analysis techniques. process. With the construction of a structured view of a business process using SAF. Pilot usage of SAF has shown that most characterizations of business processes are idealized. was developed to help organizations characterize the complexity of multi-system and multiorganizational business processes. architects. resulting in differences and conflicts among the systems developed and used to support each organizational area. a method for documenting justified confidence that survivability has been adequately addressed. The third section of this report introduces the assurance case. the strengths and gaps for the survivability of a business process can be described in a graphical and visually compelling form that management. the steps required to apply SAF and the resulting artifacts. Survivability gaps arise when assumptions and decisions within one organizational area are inconsistent with those of another. The Survivability Analysis Framework (SAF). an assurance case is a comprehensive presentation of evidence with argumentation linking the evidence with claims that certain properties have been satisfied. a structured view of people. a great deal of the evidence needed to support the claims of an assurance case can become visible. When technology is developed to only address ideal usage. Section 1 of this report is focused on helping the reader understand the complexity and challenges of systems of systems.

X | CMU/SEI-2008-TR-008 .

increasing the layers of people. processes. managed. a structured view of people. Organizational and technology components that must work together may be created. dependencies. (2) consider architecture tradeoffs carrying impacts beyond a single system.Abstract Complexity and change pervade today’s organizations. and risks for critical technology capabilities in the face of change. The Survivability Analysis Framework (SAF). was developed to help organizations analyze and understand stresses and gaps to survivability for operational and proposed business processes. and technology. In response. process. Netcentric operations and service-oriented architectures will push this trend further. and systems. and maintained by different entities. and (3) consider the linkage of technology to critical organizational functions. The SAF is designed to identify potential problems with existing or near-term interoperations among components within today’s network environments highlight the impact on survivability as constrained interoperation moves to more dynamic connectivity increase assurance that mission threads can survive in the presence of stress and possible failure SOFTWARE ENGINEERING INSTITUTE | XI . a team at the Software Engineering Institute (SEI) built an analysis framework to evaluate the quality of the linkage among roles. constraints. Existing analysis mechanisms do not provide a way to (1) focus on challenges arising from integrating multiple systems.

XII | CMU/SEI-2008-TR-008 .

increasing the layers of people. indeed. the steps required to apply the Survivability Analysis Framework (SAF) and the resulting artifacts. Software is. and a great deal of uncertainty results regarding both the configuration of the SoS at any given time and the behavior that can be expected by its constituents. requirements for software assurance and other quality attributes related to software dependability and supportability need a strong emphasis. business work processes require integration across multiple systems. and systems that must work together for successful completion of a business process. and maintained by different entities around the globe. Organizational and technology components that must work together may be created. This must change. Greater scale. through the use of a medical business process example. SOFTWARE ENGINEERING INSTITUTE | 1 . Historically. and will only change when the acquirer provides incentives for performance with respect to assurance and quality requirements. Section 1 of this report is focused on helping the reader understand the complexity and challenges of systems of systems. By combining these two analysis techniques. Section 2 of this report describes. Net-centric operations and service-oriented architectures will push this trend further. The third section of this report introduces the assurance case. the strengths and gaps for the survivability of a business process can be described in a graphical and visually compelling form. dependencies. a work process that supports a just-intime supply chain for manufacturing can involve multiple organizations. 1. Failure analysis opportunities are introduced using the artifacts constructed in the medical example.1 Introduction Complexity and change are pervasive in the operational environments of today’s organizations. In response. and risks for critical technology capabilities in the face of change. managed. schedule. uncertainty. Existing analysis mechanisms do not provide a way to (1) focus on challenges that arise from integrating multiple systems. a team at the Carnegie Mellon University Software Engineering Institute (SEI) built an analysis framework to evaluate the quality of the linkage among roles. For example. efforts to build in these quality attributes have had much lower priority than efforts to develop functionality. essentially an enterprise system of systems [Maier 1998]. and the delivered domain functionality. Hence. except for safety-critical systems and systems controlling financial transactions. (2) consider architecture tradeoffs that carry impacts beyond a single system. constraints. not just cost. and (3) consider the linkage of technology to critical organizational functions. Add to this the move toward decentralization and the pace at which business and mission requirements change. a method for documenting justified confidence that survivability has been adequately addressed. the mechanism that enables systems of systems to function. processes. and complexity bring with them a rapidly growing set of failure modes. It is clear that the move toward systems of systems (SoS) is increasing business and government use of software at unprecedented levels of scale and complexity.1 THE NEED FOR ANALYSIS OF SYSTEMS OF SYSTEMS Increasingly.

2 | CMU/SEI-2008-TR-008 . The power grid failure in the northeastern United States and Canada in the summer of 2003 is another recent example of the effects of a system failure.S. The acquisition and development processes typically concentrated first on the functionality required and then on monitoring development and reviewing products to ensure the required functionality was provided. Rubin.Technologies such as Web services make it easier to assemble systems. Simpler systems could be understood and their behavior characterized. The power grid failure was not caused by a single event but by a cascading set of failures. 17. Fairly simple computing architectures that could be understood and their behavior characterized have been replaced by distributed. Customers of Skype. 1. a professor of computer science at Johns Hopkins University.000 international travelers flying into Los Angeles International Airport were stranded on planes for hours one day in mid-August 2007 after U. For example. Requirements were assumed to be relatively static and development monitoring often concentrated on costs and schedule. While the individuals interviewed for the New York Times article included a number of wellknown computing security experts. but greater complexity brings unintended consequences.2 THE ASSURANCE PROBLEM IS HARD AND GETTING WORSE Systems acquisition and development have historically focused on the development of a system that operated in a stand-alone fashion or had few interactions with other systems. Aviel D. but ease of assembly may only increase the risk of deploying systems whose behavior is not predictable. and interdependent networks. Voting machine failures continue to be publicized. Business usage requires change. noted that the assurance focus for voting machines might have been too much on hackers and not enough on accidental events that sometimes can cause the worst problems. the general observations focused more on the underlying complexity than on security. The Los Angeles airport failure was traced to a malfunctioning network card on a desktop computer that slowed the network and set off a domino effect of failures on the customs network. including a significant software failure affecting both the primary and backup operating structures. Customs and Border Protection Agency computers went down and stayed down for nine hours. encountered a 48-hour failure in August 2007. Things break. but such change increases complexity by attempting to integrate incompatible computer networks or increasing the scale and scope of systems beyond the ability of the current capabilities to manage and sustain. The logins overloaded the Skype network and revealed a bug in the Skype program that normally would have mitigated the excessive network load by reallocating computer resources [Schwartz 2007]. The theme of a 2007 New York Times article is captured in a quote by Peter Neumann: “We don’t need hackers to break the systems because they’re falling apart by themselves” [Schwartz 2007]. The Skype failure was initiated by a deluge of login attempts by Skype users whose computers had restarted after downloading a monthly Microsoft security update. interconnected. Problems are increasingly difficult to identify and correct with the shift from “stovepipes” (stand-alone systems) to interdependent systems. Complex systems break in complex ways. Most of the problems we have today have nothing to do with malice. the Internet-based telephone company.

Such drivers include large and diverse stakeholder community. The development of a single system that is a component of an SoS has to confront a number of development drivers as shown in Figure 1 that are not adequately addressed by the techniques appropriate for the development of a stand-alone system [Creel 2008]. Mission. and systems—with causes or impacts beyond an individual system boundary SOFTWARE ENGINEERING INSTITUTE | 3 . Regulatory. which complicates requirements elicitation and tradeoff analysis potential for change from multiple directions at any time—from any system-of-systems constituent as well as from evolving business requirements wider spectrum of failures—users. software. computing support for business work processes increasingly requires integration of multiple systems forming an SoS.Today’s Business. The management of some drivers should among the success criteria for a design. and Technology Environment Focus on Business Integration Goals System -of -Systems Integration & Interoperation Large and Diverse Set of Stakeholders Many Possible Sources of Often Unpredictable Change Increasing Uncertainty & Complexity Wider Spectrum of Wider Spectrum of Failures Failures Less Development Freedom Less Visibility and Understanding Acquisition Strategy Figure 1: Development and Sustainment Context As noted earlier. operational.

For example. security. a successful acquisition likely means that no one is completely satisfied. Business requirements increase the likelihood of failure by bringing together incompatible systems or by simply growing beyond the ability to manage change. interoperability. business and mission systems are expected to adapt to market changes and changes in the world environment. creating technical and management problems. the focus is increasingly on the development of systems that are intended to function as constituents within a larger SoS context. In such an environment. Such drivers include limited development freedom—development choices are restricted by constraints associated with design tradeoffs and usage associated with existing systems limited or no knowledge of individual system structure and behavior and reduced or no visibility into runtime state Large and Diverse Stakeholder Community The evolution from stand-alone systems. a system may need to communicate with other systems that were not identified up front. the addition of devices such as cell phones or portable computers and the underlying software and networking technology can significantly affect risk mitigations and the system architecture. it becomes critical to specify requirements related to assurance goals and to build in the qualities needed to enable acceptable operation in the midst of a high degree of complexity and change. other drivers limit the design options or limit the analysis that can be done. but often modifications have to be implemented where components such as commercial off-the-shelf (COTS) or legacy components cannot be easily changed. but ease of assembly may only increase the risk of deploying systems whose behavior is not predictable. to systems with a few known interfaces. An individual system may support multiple work processes and hence must be responsive to changes in each. interconnected. and users expect an ability to adapt their systems accordingly. adaptations one constituent makes to respond to change may result in unintended side effects. Along with providing new or modified capabilities. business and mission needs for SoS constituents continue to evolve. reliability/availability/maintainability. performance. In such an environment. This increase. to a system that is a component of an SoS increases the number and diversity of stakeholders. adaptability. Unpredictable and Dynamic Change Today. and scalability. Wider Spectrum of Failures Technologies such as Web services make it easier to assemble systems. Fairly simple computing architectures that could be understood have been replaced by distributed. In a system-of-systems environment. In today’s world. The stakeholders for a collection of systems may disagree on which quality attributes take precedence. As we 4 | CMU/SEI-2008-TR-008 . but everyone gets something they need and can use. not only to the constituent system but to other systems as well. Stakeholders are beginning to demand high levels of assurance for system qualities such as safety. Concurrently with development. and interdependent networks. usability. with constituents independently managed and operated. results in conflicts of interest and understanding that may be difficult to resolve. along with the potential volatility of the stakeholder community.While drivers such as change-from-all-directions lead to challenging design problems.

1. As that understanding grows during development. Limited Visibility and Understanding The task of eliciting and communicating requirements. but that flexibility is fully available only at the start of development and only to the extent that the environment allows. let alone influence all decisions made on behalf of its constituents. There is increased likelihood that a poorly managed discrepancy will result in additional discrepancies affecting additional participants. Failures may be the result of discrepancies between the expected activity and the actual behavior that occurs normally in business processes. legacy systems.3 STRATEGY FOR A SOLUTION The overall objectives for a solution include specifying a context that focuses analysis on the critical issues. and a single discrepancy can affect multiple participants. and evolution in both stakeholder needs and system configurations create unprecedented levels of uncertainty and complexity. and services to be integrated. SoS development has a sustainment flavor. System understanding is limited with COTS components. A potentially unbounded stakeholder community. Failures are frequently the result of multiple. the complexity and evolving nature of a system of systems limit the ability to fully identify risks. The acquisition consequences can be increased costs. and with independently developed and managed systems.depend more on interdependent systems. understand the consequences. All parties to an acquisition are working with an incomplete understanding of the problem. the number and diversity of components. Requirements will be incomplete. In addition. or the inability to satisfy a requirement. building an understanding of the risks associated with this context. Each participant has to deal with multiple sources of discrepancies. SOFTWARE ENGINEERING INSTITUTE | 5 . often individually manageable errors that collectively become overwhelming. understanding system interfaces and usage patterns. It is virtually impossible to understand everything about a SoS. Changes in business processes and systems often introduce these kinds of discrepancies. and refining assurance strategies is never quite finished. failures are not only more likely but also harder to identify and fix. An acquisition for a SoS context rarely means clean slate development as the SoS is usually an existing operational system. delayed delivery. all parties should be in a better position to understand the tradeoffs that may have to be made to resolve the problems. and analyze mitigations in advance of the start of development. Limited Development Freedom Software is touted for its flexibility in terms of meeting requirements. Dealing with discrepancies becomes much more difficult as the number of participants—people and systems—increases. systems. The overall success of a business process depends on how these discrepancies are dealt with by staff and supporting computing systems. An increasing number of failures are caused by unanticipated interactions between SoS constituents. and finding sufficient common ground among the multiple perspectives to create an effective solution with known but acceptable limitations.

and human. Evolving work processes are a significant driver of change and a source of often difficult integration problems. 6 | CMU/SEI-2008-TR-008 . software. Failure analysis has to be general and not delegated to just security and safety analysis. The SAF constructs an operational model for a work process that provides a context in which to reason about a wide spectrum of failures: technology. operational.2 suggest the criteria that are listed in Table 1. Table 1: Evaluation Criteria Development Drivers Criteria for Approach Evaluation maintains traceability between technical decisions and business requirements provides a shared view that allows people with different perspectives to see the issues various stakeholders have Large and diverse stakeholder community captures success criteria reflecting the primary business drivers establishes a basis for resolving requirement conflicts provides a framework (context) for making tradeoffs among alternatives provides mechanisms for dealing with change from a variety of sources Change from any direction considers the effects of normal evolution of usage and technology in independently managed and developed systems provides mechanisms for managing the multitude of identified failure outcomes. Wider spectrum of failures defines acceptable risk for the diverse stakeholder community in the operational context shows what constitutes a “best effort” solution to a problem and hence demonstrates due diligence in identifying and mitigating risks can be applied with incomplete information Limited visibility and knowledge identifies the effects of new or changed information on the existing analysis The SAF focuses on work processes (or what the DoD refers to as mission threads) that span computing systems. The focus on work processes enables better traceability from technological solutions to business priorities and provides a mechanism for developing a shared view among the many stakeholders. systems interoperability. which help to analyze possible approaches. The limited visibility and knowledge in this context also means that the users of a computing service will not have immediate knowledge of a cause of a service failure and if that failure was maliciously induced.The development drivers discussed in section 1. The SAF is introduced in Section 2 with an extended example.

Safety cases have been extensively applied to justify safety claims about a system. Chapter 3 introduces assurance cases as a way to establish confidence that a system is sufficiently survivable by showing why significant survivability threats have been adequately mitigated or eliminated.The complexity associated with failure analysis can also affect testing by generating too many test cases. A safety case can be generalized to an assurance case that can be applied to other quality attributes. SOFTWARE ENGINEERING INSTITUTE | 7 .

8 | CMU/SEI-2008-TR-008 .

satellite relays. Process. SAF focuses on the challenge arising from integrating multiple business units and systems. and considers the linkage of technology to critical organizational functions. audit. telephone switches. authentication and authorization authorities. which must function together to successfully achieve the work process’s objective. inquiry. and similar roles policies and practices—certification and accreditation. Execution of a business process requires an extensive list of components working in harmony: hardware—servers. administration for technology components. the responsibilities for quality and survivability are allocated across all of these components. PCs. packet switches. firewalls. potential weak points can be identified. SOFTWARE ENGINEERING INSTITUTE | 9 . configuration management. governance controls. outsourcing contracts. local and remote procedures. People. and the criticality of each component to the success of the business process can be evaluated. network protocols. An important characteristic of business processes is that they are constantly changing. From these examples. The level of complexity is too much to validate without developing specific examples to characterize how all of the pieces should work together. considers tradeoffs beyond a single system. Software and hardware upgrades must be expected. assumptions about the ways in which components will work together can be verified. and similar devices software—operating systems for each hardware platform. synthesis among multiple information sources. data storage devices. an organization’s needs are changing and the processes must adjust to these new requirements.2 Building a Shared View of the Organization. third-party access management. physical access controls. Unlike existing analysis methods. and the like From a pragmatic perspective. routers. and Technology for Assurance The Survivability Analysis Framework provides a mechanism for assembling the broad range of information that influences a business process in order to analyze it for quality and business survivability. verification. authentication packages. and others people—organizational roles for data entry. PDAs. In addition. SAF captures and analyzes the ways in which end-to-end business processes could be stressed and whether the stress-handling approaches applied within and among process steps are appropriate for successful process completion. Web applications. databases.

Further incompatibilities arise if a step manages a stress in a manner that is not expected by subsequent steps. We will use these terms interchangeably throughout this document. In some domains.1 Mission Thread Steps and Step Interactions Each critical step in a mission thread is tasked to fulfill some portion of mission thread functionality. may have dire consequences 1 SAF was piloted for Joint Battle Mission Command and Control (JBMC2) in analysis of a Time Sensitive Targeting mission thread for the OUSD (AT&L). consequently. the failure of any specific step may not necessarily doom a process. Each step will have outcomes (post conditions) that may interact with subsequent steps. and often global operations. most widely used analysis techniques primarily focus on a single system controlled from a single organizational unit and miss the growing challenges of systems of systems. systems. such stress does not necessarily cause failure. resources (technology. and unmanaged stresses can potentially lead to interaction failure. However. however. multi-organizational view to define operational survivability in the face of ever-increasing complexity requires a new approach to characterizing the linkages between organizational mission and technology. The behavior of the linkages coupled with the activities to be addressed in each step can also lead to stresses. For example. then the step can include actions to respond in a variety of ways. For instance. Each limitation represents a source of stress on the step and. because subsequent steps may continue to execute the thread. Linkages among steps are driven by three primary components: people.1 SURVIVABILITY ANALYSIS FRAMEWORK The Survivability Analysis Framework1 was developed to help organizations analyze and understand threats and gaps to survivability for complex operational business processes. on the business process. A second pilot analysis was completed for Time Sensitive Targeting information assurance for Electronic Systems Center. Preconditions establish the resources provided to the step. and Network Systems Division (ESC/CPSG NIS). and interaction limitations can lead to potential degradation of step actions.2 Environmental. Even the identity of prior and subsequent steps may vary across executions of a business process. business processes are referred to as mission threads. connectivity. policies. 2 10 | CMU/SEI-2008-TR-008 . Steps can be designed to manage a range of stresses and still respond appropriately or degrade gracefully. Additionally. and the like). or the process may be continually executed (such as a sensor). consider a step that receives some data as input. process. If the value received by the step is out of the expected range. This tasking represents a “contract” of interaction between the mission thread step and prior and subsequent steps.2. However. data. The growing need for interconnected. means business processes are less frequently bounded by a single system or contained within a single organizational unit. 2. the contract with prior and subsequent steps is not necessarily static and may have to be negotiated during execution to reflect the current situation.1. data or a human command). These preconditions may trigger the execution of the step (for example. Business processes are expected to be dynamic in content because each specific execution is unique. Cryptologic Systems Group. The need for a multi-system. an action might substitute a default value in place of the out-of-range value. This substitution. However. and actions.

A range of analysis mechanisms for evaluating the effectiveness of step interactions can use the SAF structure to identify potential failure conditions.1. 2. and actions are assembled in a structured format to accurately portray who does each action. For each step. Survivability Analysis Framework SAF characterizes the specific actions of each step in a business process and the linkages between each step.1. The described actions must be consistent with what participants actually do. impact of occurrences. These steps must be a realistic view of how the business process actually happens for an as-is view or how it is expected to happen for a future perspective. SAF provides a mechanism to capture for analysis the stresses that may impact a business process. what resources are critical to action performance. It also provides opportunities to analyze whether the stresshandling approaches adopted by a step are compatible with subsequent business process steps.2. the required people. and recovery strategies. resources.2 Structuring an SAF View of a Business Process The process for applying the Survivability Analysis Framework requires the identification of a representative example(s). and the resulting outcomes. SOFTWARE ENGINEERING INSTITUTE | 11 . which must be decomposed into a series of steps. which is described in Section 2. The SAF structure is applied using a process for characterizing a business process and expected quality. what initiates the action. Figure 2. By evaluating successful business process completion and considering ways in which success could be jeopardized.if the decision to manage the stress by substituting a default value is inconsistent with the subsequent step’s expectation for a highly accurate value.

The full example of a business process spanning a doctor’s office and hospital-associated lab is provided in Appendix A. Patient’s insurance arrangements confirmed and co-payment made F. Patient’s available records are assembled for use in office visit D. Nurse moves office records and patient into examination room G. The doctor.To appropriately characterize a business process. The selected cross-organizational business process example is as follows: A patient comes to the doctor for a follow-up visit. in many cases doctors are required to provide referrals. it is imperative that all stakeholders agree with the information. and released. While patients may choose where to have tests performed. The process of developing a well-articulated view of a business process that is shared by all stakeholders provides an opportunity to uncover differences in understanding. J. Doctor orders additional lab work Hardcopy paperwork returned to medical records unit K. Reminder sent to patient about scheduled office visit C. The remainder of this subsection of the report describes an example sequence of actions required to characterize a business process with the SAF. after examining the patient and reviewing the medical history along with the results of tests performed at the time of the office visit. Early diagnosis of critical patient conditions before they become crises is a goal for the physicians in this practice. Patient goes to lab for prescribed tests and registers at lab desk M. Based on the results of these tests. a course of treatment is prescribed and communicated to the patient. faulty assumptions. This individual was brought to the hospital emergency room several weeks prior with chest pains. and ways in which organizational boundaries could contribute to stress and potential failure. Patient makes an appointment for an office visit to follow up on hospital release B. The sequence of actions required to perform this example can be described as follows: A. In the doctor’s office example. Nurse takes vitals and electrocardiogram (EKG) (office policy for heart attack patients) and updates office hardcopy records in examination room for doctor H. The survivability analysis starts by selecting an important business process and assembling a general description of what organizational need it addresses and why. Patient arrives and checks in for scheduled appointment E. it is important that lab tests ordered for the patient be performed properly and results communicated to the doctor in a timely manner. Lab paperwork prepared and queued for phlebotomist 12 | CMU/SEI-2008-TR-008 . Doctor examines patient. Office visit information transcribed into office electronic medical record L. orders further blood tests. reviews records and EKG I. treated for a mild heart attack. Much of the diagnostic work is outsourced to local laboratories and hospitals. The Health Insurance Portability and Accountability Act (HIPAA) regulations control the sharing of patient identification data with the lab or hospital and their subsequent link to reporting of results back to the doctors.

results capture from test equipment. which includes the capability for authorized individuals to link to the hospital database and extract available patient data. and upgrades). Staff turnover is high. For step A the following table is constructed: SOFTWARE ENGINEERING INSTITUTE | 13 . electronic medical records. Local administrative support is provided through a contract with the local hospital in conjunction with the database connectivity. Doctor reviews test results. troubleshooting. Report transmitted to doctor’s office (via email) R. each step in the business process example can be described. additional information about the context in which the business process is performed and its participants is needed. The office has implemented it as a turn-key system with support provided (for a fee) by the hospital vendor. System development and support is handled by the lab group’s central office. labels it for lab technician O. Lab technician performs tests on sample and generates report P. The lab context is described as follows: LABTEST system is constructed to use the hospital database as an information repository and patient billing is handled by the hospital. and office manager). test paperwork management. and doctor notification. a description of the preconditions. People and required resources must be identified. Treatment plan communicated to patient For each step in the example. The technical characteristics of this system are described in a manual from the hospital. office clerks. billing clerks. The office context can be described as follows: Patient scheduling. Using the available context information.N. The local office has applications for patient check-in. actions. and post conditions must be assembled. few workers are in their positions beyond a year. To assemble this view of the business process. Phlebotomist takes blood. doctors. Technical support is provided electronically from the vendor (maintenance. Administrative control of the office system is handled by the medical records manager (also known as office manager). Laboratory system activities are streamlined to handle large volumes of input. and billing are handled using a package system provided from the hospital (EPICARE). Everyone working at the office has been in their positions for several years. develops written treatment plan for patient (electronic or hardcopy) S. Lab results transmitted to hospital central repository Q. Everyone working at the doctor’s office has individualized access to the system (nurses.

date. doctor availability. Of particular concern are steps O and P. For this business process example. where tests are performed and information is transferred from the lab to the doctor’s office under the control of a third party (the hospital). A review of the steps critical to meeting the success criteria for the business process requires focused attention on steps L through Q. Patient information is transferred reliably and accurately in a timely manner with all privacy needs addressed.Table 2: SAF View of Example Step A Step A Patient makes an appointment for an office visit to follow up on hospital release patient requires follow-up doctor’s visit for hospital stay Preconditions appointment staff has appropriate authorization to access scheduling. we add claims describing how the actions in the step contribute to success of the business process. time. What constitutes successful business process completion? Many actions may be included which do not directly contribute to successful execution of the business process and would not warrant in-depth analysis. and updates patient contact information as needed appointment staff accesses doctor’s schedule appointment date and time selected and updated with patient agreement appointment flagged as follow-up to hospital stay Post conditions appointment notification scheduled for day before appointment appointment is scheduled and in the system for proper patient. doctor Actions The description tables for the remainder of the steps in the medical example can be seen in Appendix A. and patient demographic information telephone and computer system are available patient calls doctor’s office appointment staff answers phone appointment staff accesses. this activity may not be useful. the following constitutes success: All ordered tests are appropriately performed in a timely manner and results accurately communicated to the requesting doctor.3 Identifying Critical Steps for Analysis While it is possible to assemble a large amount of detailed information about each step in the process. In order to guide the analysis. For each step selected for closer attention. it is necessary to clearly articulate the goals of the business process. verifies. 2. Step L is described as follows: Table 3: SAF Critical Step View with Claims 14 | CMU/SEI-2008-TR-008 .1.

such as technician cannot identify the patient associated with the test results SOFTWARE ENGINEERING INSTITUTE | 15 . potential causes of failure must be assembled to identify the ways in which completion of this step could be hampered (failure outcomes). the description would be expanded as follows: Table 4: SAF Critical Step View with Failure Potentials Step O Lab technician performs tests on sample and generates report blood and paperwork ready Precondition technician loads proper machine with blood samples bar code on vial indicates patient and proper test to machine machine runs tests Action each machine sends results to lab’s database collecting point results collated into report for transmission to the hospital repository report exists Post condition blood disposed of properly technician performing work is identified and linked to results all required tests were run no unordered tests were run test results are accurately recorded Claim test results are associated with the right patient lab audit trail exists—who did the work. For step O. and so forth access to results meets HIPAA requirements.Step L Patient goes to lab for prescribed tests and registers at lab desk patient has an order for lab work Preconditions system is in place for collecting patient demographic and insurance information collect patient insurance and billing information Actions record doctor to receive report medical order entered into system patient is queued for blood work Post conditions medical order for lab work is properly entered into the system all HIPAA privacy constraints are met Claims patient information is accurately input into the laboratory system The actions in this step are expected to support the goal for accuracy and privacy of patient information. who was the operator. For steps of particular concern.

insurance company. or similar faults disclosure unauthorized entity (person. contaminated. the controlling role (decision maker) for each step should be indicated so shifts in responsibility as well as organizational shifts can be visually articulated. These represent governance and policy change points where friction is likely.Step O Lab technician performs tests on sample and generates report missing (or delayed) results: some or all tests are not done some unrequested tests were performed wrong results: Failure outcomes results do not reflect the actual sample disclosure: results are disclosed to unauthorized person test results are not associated with the correct patient test results are not associated with the correct doctor missing results paperwork requiring tests to be run was lost or misplaced blood samples were lost. has faulty reagents. the steps chosen for critical focus have been renumbered A1-A5. Two summary views are constructed for the selected critical steps to focus attention on people and resources (key stress sources). The people view identifies each role involved in each step. Appendix B has a full listing of the expanded description tables including the failure outcomes for steps A4 (O) and A5 (P). but they are not the correct results analysis machine is not calibrated. The table for steps A1-A3 appears as follows (controlling role is marked as “C” and participants are marked as “X”): 16 | CMU/SEI-2008-TR-008 . or misplaced some tests were not run by the technician wrong tests were run by the technician some or all test results were not associated with the correct patient (in the lab) some or all test results were not associated with the right doctor (in the lab) Potential causes of failure lab database was inaccessible for receiving results machine did not produce results machine was not working and could not produce results wrong results machine doing the test has an undetected internal failure so results were produced. or others) gained access to the analysis results during analysis (in the lab) For convenience. If it is known.

resources should be assembled in groups based on the way the organization has allocated management for them. For example. SOFTWARE ENGINEERING INSTITUTE | 17 . The full people and resource tables for steps A1-5 of the medical example are provided in Appendix B. resources controlled by the doctor’s office would be grouped separately from those controlled by the laboratory or other thirdparty contracts.Table 5: SAF People Summary View for Steps A1 through A3 A1) Patient to lab A2) Lab prepares paperwork A3) Blood sample drawn Patient Lab check-in staff Phlebotomist Lab technician X C C X C The resource table for these same steps is as follows: Table 6: SAF Resource Summary View for Steps A1 through A3 A1) Patient to lab Lab work order Patient insurance data HIPAA forms Lab scheduling Lab test repository and reporting system Blood sample Lab paperwork (labels) Testing machine Testing machine connectivity Doctor’s office connectivity X X X X X X X X A2) Lab prepares paper work X A3) Blood sample drawn To better characterize more complex business processes. resources controlled by a specific business unit would be grouped together. This provides visibility to potential variations in governance (policy) and allocation models (such as service level agreements) that could impact performance of the business process.

3. could lead to lost test results.1. Failures are frequently the result of multiple. In building the failure outcomes for each critical step as described in 2. 18 | CMU/SEI-2008-TR-008 .4 Evaluating Failure Potential Stresses are the normal variations that occur constantly in the course of performing a business process.1. EVENT A failure or System State stress change Normal activity An event edges system state closer to orange zone. A second event followed by increased usage pushes state towards red. leading to potential failure of a critical step and subsequent impact on the successful completion of the business process. a range of stress types and potential failures should be considered. The delays temporarily reduce the available capacity to deal with other events. In addition. Figure 3: How Systems Fail Business process failures can be caused by changes in usage as well as traditional causes such as hardware failures. combined with limited storage capacity for data at the lab. unexpected errors and variations which the business process is not designed to accommodate can occur.2. often individually manageable errors that collectively become overwhelming. The functionality of the initial deployment of a system may suggest other applications that were not anticipated in the initial design. a test equipment failure can delay test results for a significant number of patients. Figure 3 describes this pattern of system failure. Some are expected variations that the process is constructed to accommodate such as higher volumes. Large distributed systems are constructed incrementally. Using our medical example. The occurrence of an additional problem such as transmission problems to the hospital database. Users frequently exploit system functionality in unanticipated ways that improve the business processes but that may also stress the operation of components that were not designed for the new usage.

Work processes span multiple systems. incorrect. and ways in which organizational boundaries could contribute to stress and potential failure. analysis paralysis. out of date. unexpected. inconsistent. Human interactions may be necessary to bridge between systems. faulty assumptions. A business process breakdown results from a combination of failures that drive operational execution outside of acceptable limits. Each participant has to deal with multiple sources of discrepancies. resources. unavailable. and a single discrepancy can affect multiple participants. especially as the system is extended and repaired. unintelligible. There is increased likelihood that a poorly managed discrepancy will result in failures affecting additional participants. excessive. This is what SAF enables. This shared view identifies critical steps and the ways in which these could fail. when they exceed an expected range of tolerance. distraction (rubbernecking). incomplete. eroding the boundary between people and system and establishing critical business process dependencies on people interacting with multiple systems 2.2 VALUE PROVIDED BY USING A SHARED VIEW The process of developing a well-articulated view of a business process that is shared by all stakeholders provides an opportunity to uncover differences in understanding. diffusion of responsibility (for example. Inconsistencies must be assumed as we compose systems: Systems developed at different times exhibit variances in technology and expected usage. interrupted people-triggered stress—information overload. process and technology. A system will not be constructed from uniform parts. The overall success of a business process depends on how effectively discrepancies are accommodated through the people. can also drive a business process into failure. An organization constructs a well-articulated view of example business processes documenting the interrelationships of people. latency. For example. Stresses may include interaction (data) triggered stress—missing. and a failure of one system can affect the overall work process as well as other participating systems. “it’s not my job”). SOFTWARE ENGINEERING INSTITUTE | 19 .Stresses. and actions that comprise the end-to-end process. selective focus (only looking for positive reinforcement). Changes in business processes and systems can introduce new types of discrepancies. there are always some misfits. Dealing with discrepancies becomes much more difficult as the number of participants—people and systems—increases. inappropriate. leading to business process failure. duplicate resource triggered stress—insufficient. a system that was developed for a local facility but is now supporting a national process for sharing information among many facilities may require revision to accommodate the increased complexity of information interchange. lack of skills or training Discrepancies (stresses and errors) arise normally in business processes.

and software engineers. and resource interactions. if any. the focus is on the endto-end business process and is best initiated by consulting with those individuals responsible for actual execution of the business processes. users. This brings together a range of knowledge that is usually broadly dispersed in the organization among people who have limited. Though the steps to construct this shared view can be time consuming. For SAF. Many analysis techniques focus primarily on the technology systems and only consider people and resources in light of direct interactions with the technology. an assurance case may be needed. technology architects. the impact of change can be expressed as its effect on the people. There are many ways in which a business process can be hampered. SAF provides a structure for gathering and visually assembling business process information that can be useful to management. drawing this dispersed information together in a shared view allows all organizational participants to understand their role in the process and the ways in which choices they make affect others. and operational resources knowledgeable in the organizational technology infrastructure. Proposed changes to a business process can be evaluated to determine potential problems for process success and requirements for effective mitigation. When technology is developed to only address ideal usage. Determining the critical steps and the failure outcomes can require the active participation of many stakeholders. Those discrepancies that cause a critical impact to the end-to-end business process are the primary ones that warrant considerable investment in analysis. processes. process. The focus must be maintained on successful completion of the business process (satisfactory execution of each critical step). Much of the information needed to assemble this view is scattered among a number of stakeholders and must be gathered through documents and workshops. With the availability of a shared view that includes the full range of people. This is the subject of the next section in this report. interaction. In order to clearly articulate why and how critical steps in a business process are structured to effectively mitigate potential failure.Analysis of this information provides an opportunity to show how the various parts and pieces of technology fit (or should fit) together with the user and organizational aspects to form a repeatable and reliable end-to-end business process. actual operational usage is poorly supported. The long-term value in assembling shared views of important business processes is the ability to consider the effect of change on operational success over time. Technology is only one component in the total business process and must be evaluated in light of its support of the key business drivers and operational success. 20 | CMU/SEI-2008-TR-008 . The shared view of the business process constructed using SAF can be mined to provide much of the information needed in the development of an assurance case. providing insight into how they should work without considering what is actually in place. Pilot usage of SAF has shown that most characterizations of business processes are idealized. including business process owners. system engineers. and resources that make up the business process and contribute to its ongoing success. functional and information subject matter experts.

Arguments. The claim to the left is the overall claim. Assurance cases are an approach for showing why we believe that such threats4 have been adequately addressed. Evidence without supporting arguments is unexplained. When used in this manner assurance cases are referred to 3 A significant threat is one that presents a significant risk when combining the consequences of failure and the probability of failure. The argument is that if those subclaims are valid then the overall claim is valid. The argument continues with further subclaims until. SOFTWARE ENGINEERING INSTITUTE | 21 4 . The argument is the explanation of how the available evidence can reasonably be interpreted as indicating the required levels of survivability. Figure 4: Claims. and evidence. no argument is necessary. The Survivability Analysis Framework is an approach for identifying significant survivability threats. a claim is supported by evidence. and the notion is gaining interest in the United States [Jackson 2007]. a survivability assurance case is a comprehensive presentation of evidence. In this document we will use “threat” to mean either unless we explicitly state otherwise. simulation results. ultimately.3 Assurance Cases for Business Process Survivability How do we establish confidence that a business process (including its underlying technology) is sufficiently survivable? Only by identifying all significant survivability threats3 and showing why we believe they have been adequately controlled or eliminated. hazard analyses. showing the relation between claims. Assurance cases are not a new idea. Evidence Figure 4 is a graphical representation of an assurance case. Much like a legal case presented in a courtroom. A hazard is something that happens by accident. for which. modeling. inspections. Arguing survivability without evidence is unfounded. A threat to survivability is something done intentionally to exploit a vulnerability. formal analyses. and the like. In Europe (particularly) assurance cases have been used extensively to develop confidence that a system is acceptably safe. by its very nature. argument. with argumentation linking the evidence to claims that important survivability properties have been assured. The evidence may consist of test results. It is supported by two subclaims.

as a means of presenting a safety case. In GSN claims are stated as predicates (that is. and other systems where safety is a significant concern. or GSN. true or false statements) and denoted by rectangles. As safety cases have become more prevalent.1 A NOTATION FOR ASSURANCE CASES Creating and presenting assurance cases that are convincing to outside reviewers requires some care. argument strategy. and some context information (denoted by a rectangle with round ends) that helps to explain the claim. then that is sufficient to show that the claim “The system under consideration is survivable” is valid. We use GSN in this section to document examples of survivability assurance cases. or more simply.” Supporting evidence for this claim is shown in the evidence circle. The overall argument is that if the subclaims (“Hazard 1 is mitigated” and “Hazard 2 is mitigated”) are valid. the argument strategy is to review each of the survivability hazards in turn. 22 | CMU/SEI-2008-TR-008 . This. One part of the argument. showing how each has been adequately mitigated (that is. consisting of the claim regarding hazard 2. 3. The argument is multi-pronged. It shows a top-level claim (“The system under consideration is survivable”) along with an argument structure supporting the claim. railway signaling systems. and tools supporting assurance case notations have been developed. Another part of the argument. aircraft control systems. Figure 5 is a fragment of a generic assurance case. is the claim that “Hazard 1 is mitigated. Finally. is not fully developed in the fragment. therefore. the third part of the argument claims that no other hazards significantly affect the survivability of the system. and other elements of a case. such as an oval annotated with a “J” (a justification). notations. evidence. a graphical representation can be much more easily understood by reviewers and more useful to the developers and maintainers of the system or business process being reviewed. One such notation is Goal Structuring Notation. which is used to help the reviewer follow the argument. In this case. This is denoted by the diamond under the rectangle. Although such a case can be presented textually. too. GSN is a graphical notation for showing the claims being made about a system and how arguments incorporating evidence support the claims [Kelly 2004].as safety assurance cases. as indicated by the strategy parallelogram. European laws require that safety cases be developed for (among other things) nuclear reactor systems. In the next subsection we describe GSN. controlled). needs to be further developed. The top level claim is annotated with an assumption about the claim (denoted by the oval with an “A” next to it). and the claim that no other hazard has a significant effect on survivability is valid. There are other GSN symbols not shown in this example. Specific graphical symbols specify claims.

A well-chosen structure will make the assurance case easier to develop and review. This is not always easy because there are alternative ways of developing (and presenting) the argument. The optimal structure for a safety assurance case will likely not be the optimal structure for a survivability case. SOFTWARE ENGINEERING INSTITUTE | 23 . One determinant of the structure of the assurance case is deciding what is to be assured. Conversely. The classic safety case has only to show that the system remains in a safe state. It is not obligated to show that the system functions correctly when there are no current safety issues. an assurance case designed to show that a system is survivable must worry about proper functionality in the presence of a threat or error condition.2 DEVELOPING AN ASSURANCE CASE The first step in developing an assurance case is to determine broadly the structure of the case.. C: Survivable System The system under consideration is survivable Ctxt: System Document X describes the systems and its properties related to survivability A S: Hazards to Processing Argue over the hazards to processing Ctxt: Hazards to Survivability Hazard 1.A: Assumptions for Claim For purposes of this claim we assume . Hazard 2 C: Hazard 1 Hazard 1 is mitigated C: Hazard 2 Hazard 2 is mitigated C: All Significant Hazards Any remaining hazards do not significantly affect survivability of the system Ev: Hazard 1 Evidence Evidence of hazard one's mitigation Figure 5: An Assurance Case Fragment 3..

(The implied argument is that a competent developer is one who conforms to good development standards and a competent developer produces safe systems). An assurance case that is not understandable or reviewable does not provide a significant level of assurance. confidence that the top-level claim is valid. C: CIA Maintained Confidentiality. a case that is easier to understand and review. that is. and Availability are maintained to an acceptable level C: Privacy Privacy is maintained to an acceptable level C: Processing The likelihood of the system operating incorrectly is sufficiently low C: Incorrect Processing Detected Failure to detect incorrect processing is reduced to an acceptable level S: Hazards Argue over the hazards to processing Figure 7: A Top-Level Survivability Case 24 | CMU/SEI-2008-TR-008 .C: Acceptably Safe The 'system' is acceptably safe to operate S: Risk Based Argue over hazards to safe operation S: Process Based Argue over compliance to applicable standards C: Hazard Causes All hazard causes have been exhaustively identified C: Hazard Mitigation All identified hazards have been mitigated C: More Other Risk-Based Claims C: Quality Control Appropriate quality control processes are followed C: Testing The system has been thoroughly tested C: Other Other process claims Figure 6: A Possible Top-Level Safety Case Structure Figure 6 shows a possible top-level structure for a safety case. Integrity. Breaking out the detection and mitigation from the actual hazard avoidance arguments results in. we think. Contrast this to a possible top-level structure for the survivability case shown in Figure 7. The argument is two pronged with one side evaluating risks specific to the system of interest and the other evaluating the competency of the developer. Notice that the claim labeled “processing” considers both hazards to be avoided and detecting problems when they inevitably occur.

You can make several false starts before settling upon a structure that is comfortable.Another determinant of the structure of an assurance case is the intended primary audience. For example. making it easier for them to find the important issues—and to see clearly how they have been resolved. argument. Achieving all of this is hard to do. A well-structured assurance case captures a model of how you or the intended audience think about a problem—what is important. You soon find that there are interactions between the attributes and. It structures the knowledge you have of a system and captures the implications of selected solutions to problems. 3. a lot of duplicate subarguments. and Availability are maintained to an acceptable level S: Argue over Attributes Argue over the attributes that make a system survivable C: Privacy Privacy is maintained to an acceptable level C: Integrity Integrity is maintained to an acceptable level C: Availability Availability is maintained to an acceptable level Figure 8: An Alternative Structure Figure 8 shows another possible way to structure a survivability argument. while an assurance case should be convincing to all. what needs to be highlighted. The linkage between threats and countermeasures is shown clearly by the claim. Integrity.3 A SURVIVABILITY ASSURANCE CASE When being used to describe system survivability. one that is security-centric in nature will be more convincing to some reviewers (that is. how things are related to each other. Breaking the argument down by the constituent attributes of survivability is appealing—until you actually attempt to develop the arguments beneath each attribute. an assurance case captures the full range of known relevant threats and vulnerabilities leading to significant failures. those with a security background) while one that is reliabilitycentric in nature will be more convincing to others—this in spite of the fact that both assurance cases purport to show the same top-level claim to be true. perhaps worse. It addresses issues that the target audience will find important. People from different parts of an organization come with different agendas. evidence structure of an SOFTWARE ENGINEERING INSTITUTE | 25 . C: CIA Maintained Confidentiality.

We’ll illustrate this section with an assurance case covering step A4 in our example (reproduced below for convenience). technician cannot identify the patient associated with the test results) (confidentiality) 26 | CMU/SEI-2008-TR-008 . integrity. Step A4 Lab technician performs tests on sample and generates report blood and paperwork ready Preconditions technician loads proper machine with blood samples bar code on vial indicates patient and proper test to machine machine runs tests Actions each machine sends results to lab’s database collecting point results collated into report for transmission to the hospital repository report exists Post conditions blood disposed of properly technician performing work is identified and linked to results all required tests were run (integrity. It’s also easy to see weaknesses in countermeasures. (this claim is only needed to diagnose sources of failure. availability) no unordered tests were run (integrity) test results are accurately recorded (integrity) Claims test results are associated with the right patient (integrity) lab audit trail exists—who did the work. and reporting results of blood tests. as well as how those weaknesses have been addressed. and so forth. and availability are maintained throughout the process of ordering. performing. integrity.assurance case. we have annotated the claims and failure outcomes with an indication of whether confidentiality. or availability is relevant. Since the assurance case will be focused on how well confidentiality. should a failure occur) access to results meets HIPAA requirements (for example. who was the operator.

integrity and availability not being maintained is reduced as low as is reasonably practicable. contaminated. or similar faults (integrity) disclosure unauthorized entity (person. but they are not the correct results (integrity) analysis machine is not calibrated or has faulty reagents. integrity. In this case we want to show that confidentiality. and availability (CIA) properties are maintained during Step A4. SOFTWARE ENGINEERING INSTITUTE | 27 . confidentiality) some or all test results were not associated with the right doctor (in the lab) (integrity.Step A4 Lab technician performs tests on sample and generates report missing (or delayed) results: some or all tests are not done (integrity. confidentiality) missing results paperwork requiring tests to be run was lost or misplaced (integrity) blood samples were lost. confidentiality) test results not associated with the correct doctor (integrity. confidentiality) Potential causes of failure lab database was inaccessible for receiving results (availability) machine did not produce results (availability) machine was not working and could not produce results (availability) wrong results machine doing the test has an undetected internal failure so results were produced.” 5 We are interested in being explicit about failure outcomes because mitigating the causes and consequences of significant failure outcomes is the way to improve system survivability. or others) gained access to the analysis results during analysis (in the lab) (confidentiality) The first step in developing an assurance case for Step A4 is to formulate the main claim. The claim ends up being straightforward: “The likelihood of confidentiality. availability) some unrequested tests were performed (integrity) wrong results: Failure outcomes 5 results do not reflect the actual sample (integrity) disclosure: results disclosed to unauthorized person (confidentiality) test results not associated with the correct patient (integrity. insurance company. or misplaced (integrity) some tests were not run by the technician (integrity) wrong tests were run by the technician (integrity) some or all test results were not associated with the correct patient (in the lab) (integrity.

” This phrase. We need to argue that confidentiality breaches are unlikely.This claim has several noteworthy aspects. Broadly they include missing results and wrong results. and (2) the cost of reducing it further. The above shows this breakdown. what needs to be done to process blood. “rarely occurs”) and the impact of the failure if it does occur (e..” This would be a so-called “sunny day” claim. integrity. C: CIA in Step A4 The likelihood of confidentiality.” C: CIA in Step A4 The likelihood of confidentiality. In an ideal world. ALARP means that the possibility of the hazard occurring has been reduced as low as makes sense given (1) the requirements of the system.g. ALARP would mean that the hazard is completely eliminated. Note. and that system unavailability is unlikely. Sunny day claims are difficult to work with because the argument beneath them tends to focus on functionality (that is. Instead we state the claim from a survivability viewpoint: we’ve reduced the possibility of hazards as low as is reasonably practicable. in this case) rather than survivability (what hazards need to be avoided or mitigated). that we’ve elected to consider confidentiality (“Acceptable Privacy”) separately from integrity and availability (covered under “Sample 28 | CMU/SEI-2008-TR-008 . however. ALARP allows the developers of the system to stop taking countermeasures once they’ve reduced the hazard “enough. integrity. is used frequently in assurance cases and is usually abbreviated as ALARP. that problems with system integrity are unlikely. Perhaps the most important is that we are not claiming that “Step A4 processes blood correctly. or availability not being maintained in Step A4 is ALARP To expand this claim we note that there are three properties to show. or availability not being maintained in Step A4 is ALARP A: Machine Processing All processing of blood is done by machine C: Acceptable Privacy The likelihood of a breach of privacy while the sample is being processed has been reduced ALARP C: Sample Processed The likelihood of incorrectly processing a sample is ALARP A Ctxt: Hazards to Processing Hazards to processing are documented in Step A4 (O).. “low impact”). namely the phrase “as low as is reasonably practicable. but in the real world there are some hazards that cannot be eliminated except at a cost out of proportion to the benefit obtained when considering the probability of the hazard occurring (e.g. introduced by safety case developers in the United Kingdom. This brings us to the aspect worth noticing in the claim.

The failures we need to detect are sample loss or that the wrong test was conducted on the sample. that the processing is all done by machine. and a context which refers to the hazards we’ll be addressing. leaving the “Acceptable Privacy” side to be expanded in the future. machine failure. To continue our exposition we’ll focus on one of the hazards. The above fragment also shows an assumption. These are all illustrated above. C: Processing Failure Detected The failure to detect the incorrect processing of a sample is reduced ALARP C: Detect Sample Loss The loss of a sample is detected C: Wrong Test Detected If a wrong test is run on a sample it is detected S: Hazards to Processing Argue over the hazards to processing C: Inaccurate Results Mitigated Machine inaccuracies are kept acceptly low C: Minimized Sample Loss The likelihood of sample loss is minimized C: Machine Failure Minimized The likelihood of inaccurate processing due to a machine failure is reduced ALARP C: Wrong Test Minimized The likelihood of a wrong test being performed is minimized To show that the likelihood of incorrectly processing a sample is reduced ALARP. we argue that hazards to normal processing are minimized and further that if one of the hazards actually causes a failure it will be detected (so that appropriate corrective actions can be undertaken or so that we can develop a record of how often particular hazards leads to a failure). a reviewer of the expanded argument may wonder why we have left out some obvious human factors. Without the assumption. SOFTWARE ENGINEERING INSTITUTE | 29 . for example. These elements help set expectations for the reader and reviewer. Broadly they include missing results and wrong results.Processed”). In this paper we’ll concentrate on the “Sample Processed” side of the argument. C: Sample Processed The likelihood of incorrectly processing a sample is ALARP A: Machine Processing All processing of blood is done by machine A Ctxt: Hazards to Processing Hazards to processing are documented in Step A4 (O).

Notice that once the assurance case has been completely developed for a system. the check list can be used to determine if the new system will satisfy the same claim as the old one—without developing a new assurance case. For a subsequent system development. 30 | CMU/SEI-2008-TR-008 . In general weak evidence or weak argument (such as an omitted claim or a specious link from a claim to a subclaim) leads to weak mitigation of hazards and a weaker overall assurance of survivability. as long as the assumptions within the assurance case remain valid and the new system is being developed in the same context as the old one. maintaining it. The assurance case discusses survivability hazards (or threats) that were identified as part of the SAF and shows how these hazards have been mitigated (that is. the results of any acceptance tests. availablily A. we have shown how an assurance case can be used to develop increased confidence in the survivability of a business process. the evidence circles can be assembled into a check list. or not returning any results. controlled). The intention is to provide a structure that is understandable to the variety of business process stakeholders. in this subsection. and other artifacts that show that show acceptable levels of reliability and accuracy Notice that the argument to keep failures to a minimum largely depends upon purchasing a sufficiently reliable machine. C: Inaccuracies The likelihood of the machine returning inaccurate results. cleanup. manufacturer data sheets. and routine maintenance standards Ev: Due Dilligence Materials Ev: Maintenance Logs Maintenance Log Ev: Training Materials Training Materials addressing these issues Ev: Training Logs Personnel Training Logs Including: original requirements. and accuraccy Acc. and operating it correctly. The case also shows what evidence needs to be collected to support the claims comprising the survivability argument. Since the laboratory does not build the machine. To summarize. calibration. it has no control over the machine other than at these points. If the training materials are weak then the arguments regarding hazards mitigated by a well-trained staff are correspondingly weaker. The argument captures this. The laboratory does have control over the training that its staff receives. operation.C: Machine Failure Minimized The likelihood of inaccurate processing due to a machine failure is reduced ALARP C: Machine Capable The machine is capable of providing acceptably accurate results at an acceptable level of reliability Ctxt: Acceptable Accuracy/ Reliability/Availability The machine is acceptable if it is capable of producing results with reliability R. is reduced ALARP C: Due Dillgence in Purchasing The parties responsible for purchasing the machine have done due dillgence in determining that it is acceptable for the intended purpose Ctxt: Maintenance Standards Maintenance standards are documented in XXX C: Machine Maintained The machine is maintained to manufacturer standards C: Trained Technicians Technicians are trained in startup.

process. Identify a subset of critical steps for further analysis. actions. Decompose the example into the sequence of steps required for end-to-end execution of the business process. The tables and graphical representations used in SAF and assurance cases structure the information collected for analysis and are readily used to highlight the strengths and gaps for survivability of a business process. Identify ways in which a step could fail to meet the specified claim in the form of failure outcomes and potential cause of these failures.1 of this report. The reader is encouraged to pilot the use of SAF with the selection of an important business process. The steps required for applying SAF to a business process. and resources needed to complete the step. and technology. Summarize people and resource usage across the critical steps to identify gaps in control and management of these key components. For each step construct a table of preconditions. as described in Section 1. Assemble claims about the contribution of each selected step to overall mission success. are summarized as follows: Identify a representative example(s) of the business process. Describe the unique context of each organizational components involved in the business process. their roles. described through an example in Section 2. As described in Section 3.1. SAF is most useful for environments where business processes rely on many independently constructed and supported systems. and post conditions that include all of the people. this structure can be applied to the development of an assurance case to support claims about business process qualities such as security.4 Conclusion SAF provides a structure for organizing information about a business process that incorporates people. The use of SAF to develop an example of a shared view for the selected example will support the following analyses: potential points of failure (stress analysis) survivability gaps (step interactions) mitigation strategies for critical business process failures identification of gaps in current people and resource requirements SOFTWARE ENGINEERING INSTITUTE | 31 .2.

32 | CMU/SEI-2008-TR-008 .

Patient’s available records are assembled for use in office visit D. and released. Lab results transmitted to hospital central repository Q. Doctor reviews test results. develops treatment plan for patient S. Reminder sent to patient about scheduled office visit C. This individual was brought to the hospital emergency room several weeks prior with chest pains. Based on the results of these tests. Nurse takes vitals and electrocardiogram (EKG) (office policy for heart attack patients) and updates office hardcopy records in examination room for doctor H. treated for a mild heart attack. after examining the patient and reviewing the medical history along with the results of tests performed at the time of the office visit. Treatment plan communicated to patient SOFTWARE ENGINEERING INSTITUTE | 33 . Doctor orders additional lab work Hardcopy paperwork returned to medical records unit K. Patient goes to lab for prescribed tests and registers at lab desk M. reviews records and EKG I. Patient makes an appointment for an office visit to follow up on hospital release B. BUSINESS PROCESS STEPS A. The doctor. Patient arrives and checks in for scheduled appointment E. Lab paperwork prepared and queued for phlebotomist N. Lab technician performs tests on sample and generates report P. Doctor examines patient. J. Phlebotomist takes blood. a course of treatment is prescribed and communicated to the patient. orders further blood tests.Appendix A Example SAF Business Process BUSINESS PROCESS EXAMPLE A patient comes to the doctor for a follow-up visit. Patient’s insurance arrangements confirmed and co-payment made F. Report transmitted to doctor’s office (email) R. Office visit information transcribed into office electronic medical record L. labels it for lab technician O. Nurse moves office records and patient into examination room G.

doctors. Everyone working at the doctor’s office has individualized access to the system (nurses. results capture from test equipment. troubleshooting. Laboratory system activities are streamlined to handle large volumes of input. System development and support is handled by the lab group central office. process. which includes the capability for authorized individuals to link to the hospital database and extract available patient data. SAF STEP DESCRIPTIONS Each step is described as to preconditions. The local office has applications for patient check-in. and patient demographic information telephone and computer system are available patient calls doctor’s office Actions appointment staff answers phone appointment staff accesses. The technical characteristics of this system are described in a manual from the hospital. actions. and doctor notification. Technical support is provided electronically from the vendor (maintenance. Administrative control of the office system is handled by the medical records manager (also known as office manager). The office has implemented it as a turn-key system with support provided (for a fee) by the hospital vendor. billing clerks. Local administrative support is provided through a contract with the local hospital in conjunction with the database connectivity. few workers are in their positions beyond a year. Everyone working at the office has been in their positions for several years. and technology that must occur in order to complete each step. and updates patient contact information 34 | CMU/SEI-2008-TR-008 . and office manager). test paperwork management. The lab context is described as follows: LABTEST system is constructed to use the hospital database as an information repository and patient billing is handled by the hospital. Staff turnover is high. electronic medical records. and upgrades). and post condition to fully characterize the interaction of people. verifies. doctor availability. Step A Patient makes an appointment for an office visit to follow up on hospital release patient requires follow up doctor’s visit for hospital stay Preconditions appointment staff has appropriate authorization to scheduling. and billing are handled using a package system provided from the hospital (EPICARE). office clerks.BUSINESS PROCESS CONTEXT The office context can be described as follows: Patient scheduling.

doctor Step B Reminder sent to patient about scheduled office visit appointment scheduled for next day Preconditions valid patient phone number available to scheduling system recorded message set up for appointment reminder service Actions Post conditions scheduling system dials contact number and sends recorded message linked to appointment date and time call is made to number on file with the appropriate information Step C Patient’s available records are assembled for use in office visit patient scheduled for appointment on current date Preconditions appointment flagged as hospital visit follow up Medical Records has access to hospital patient records matching of patient to proper records electronic and paper files (some identifier) Actions office files pulled for use hospital data (discharge summary) extracted from hospital database into office electronic record and printed updated office electronic record with hospital information Post conditions updated hard copy for office visit use Step D Patient arrives and checks in for scheduled appointment patient office records ready at check-in desk patient scheduled for appointment on current date Preconditions doctor has not had emergency requiring schedule adjustments check in access to scheduling system SOFTWARE ENGINEERING INSTITUTE | 35 . date.Step A Patient makes an appointment for an office visit to follow up on hospital release as needed appointment staff accesses doctor’s schedule appointment date and time selected and updated with patient agreement appointment flagged as follow up to hospital stay appointment notification scheduled for day before appointment Post conditions appointment is scheduled and in the system for proper patient. time.

Step D Patient arrives and checks in for scheduled appointment patient match to office record file flag patient as checked in Actions verify patient demographic data patient given HIPAA form to sign signed HIPAA form Post conditions patient sent to financial window with HIPAA form patient file queued for nurse pickup Step E Patient’s insurance arrangements confirmed and co-payment made patient standing at finance window patient has valid insurance card Preconditions co-pay required (optional) access to scheduling system and patient electronic record access to insurer’s data about the patient coverage validate insurance information in patient electronic record Actions co-pay collected (if required) and scheduling system tagged with payment validated insurance information for patient Post conditions patient registered for appointment with co-pay (if required) Step F Nurse moves office records and patient into examination room patient office records queued for nurse Preconditions patient in waiting room examination room available examination room prepared for office visit Actions patient and records moved to examination room patient prepared for examination Post conditions appropriate records are moved with the patient Step G(a) Preconditions Nurse takes vitals equipment for blood pressure. and other vitals ready for use 36 | CMU/SEI-2008-TR-008 . temperature.

EKG. ready for doctor Step G(b) Preconditions Nurse takes EKG EKG equipment ready for use performs required actions for doctor examination preparation Actions notes collected data in patient record notified doctor patient is ready for examination Post conditions patient EKG ready for doctor Step H Doctor examines patient. hospital discharge.Step G(a) Nurse takes vitals performs required actions for doctor examination preparation Actions notes collected data in patient record notified doctor patient is ready for examination Post conditions patient hard copy records annotated. prior medical history. and other information) doctor completes lab order form (blood tests) doctor updates patient records (hardcopy) noting lab orders lab order form given to patient to fulfill Post conditions patient released from appointment Step J Hardcopy paperwork returned to medical records unit SOFTWARE ENGINEERING INSTITUTE | 37 . reviews records and EKG patient ready for examination Preconditions EKG results available vitals information available doctor identifies potential health concerns Actions doctor identifies actions to be taken to address concerns Post conditions doctor has and reviews all available information for patient Step I Preconditions Actions Doctor orders additional lab work doctor has completed review of all available information (vitals.

labels it for lab technician printed paperwork (labels) and patient ready blood sample taken blood in properly labeled vials 38 | CMU/SEI-2008-TR-008 .doctor has completed patient examination Preconditions Actions Post conditions doctor’s interaction with patient has been incorporated into patient file patient file returned to medical records area and filed patient hardcopy medical documents stored for future retrieval Step K Office visit information transcribed into office electronic medical record patient hardcopy records returned to medical records unit Preconditions patient electronic medical record available for update transcribing resource had electronic access to electronic and hardcopy of medical records Actions Post conditions additions to hardcopy medical record typed into electronic patient record electronic medical record contains all hardcopy patient data Step L Preconditions Patient goes to lab for prescribed tests and registers at lab desk patient has an order for lab work system in place for collecting patient demographic and insurance information collect patient insurance and billing information Actions record doctor to receive report medical order entered into system patient is queued for blood work Post conditions medical order for lab work is properly entered into the system Step M Preconditions Actions Post conditions Lab paperwork prepared and queued for phlebotomist blood specimen requirements for each requested test are appropriately characterized within the system print labels and orders for phlebotomist paperwork (labels) printed for blood sample Step N Preconditions Actions Post conditions Phlebotomist takes blood.

Actions Post conditions results transmitted laboratory associated with results in hospital repository Step Q Notification given to doctor’s office (email) tests completed Preconditions report exists doctor’s email is provided email sent to doctor’s office notifying results are available Actions results placed in patient medical record Post conditions information notification received Step R Doctor reviews test results.Step O Lab technician performs tests on sample and generates report blood and paperwork ready Preconditions technician loads proper machine with blood sample bar code on vial indicates patient and proper test to machine machine runs tests Actions each machine sends results to lab’s database collecting point results collated into report for transmission to the hospital repository report exists Post conditions blood disposed of properly technician performing work is identified and linked to results Step P Lab results transmitted to hospital central repository test result report is available in the lab repository can match the lab’s patient ID with the hospital’s patient ID Preconditions hospital can authenticate the lab communications exist lab can authenticate hospital lab can provide authorized readers of the transmitted report if the request for tests came directly to them from the patient or doctor (not via the hospital). develops treatment plan for patient SOFTWARE ENGINEERING INSTITUTE | 39 .

Step R Doctor reviews test results. authorization. develops treatment plan for patient tests completed and report available at hospital central repository doctor received email notification Preconditions doctor’s office is able to access and retrieve report (authentication. and connectivity) doctor has connectivity and access to electronic medical record doctor reviews test report Actions doctor reviews office electronic medical record treatment plan for patient is prepared (written) Post conditions plan is given to nurse to notify patient Step S Treatment plan communicated to patient treatment plan for patient is completed Preconditions nurse has received treatment plan from doctor patient contact information and mailing address is available to the nurse nurse calls patient to communicate treatment plan and arrange for subsequent patient actions as required by the plan Actions letter prepared with treatment plan and information from nurse/patient discussion and mailed to patient treatment plan report and copy of letter added to patient office medical record patient is notified of treatment plan and future actions (verbal and written) Post conditions office medical record is updated with treatment plan and patient communications 40 | CMU/SEI-2008-TR-008 .

Patient goes to lab for prescribed tests and registers at lab desk (L) A2. Patient information is transferred reliably and accurately in a timely manner with all privacy needs addressed. FOCUS STEPS FOR ASSURANCE A1. Lab results transmitted to hospital central repository (P) A6. labels it for lab technician (N) A4. Phlebotomist takes blood. Lab technician performs tests on sample and generates report (O) A5. Lab paperwork prepared and queued for phlebotomist (M) A3. Notification given to doctor’s office (email) (Q) Step A1 Preconditions Patient goes to lab for prescribed tests and registers at lab desk patient has an order for lab work system in place for collecting patient demographic and insurance information collect patient insurance and billing information Actions record doctor to receive report medical order entered into system patient is queued for blood work Post conditions medical order for lab work is properly entered into the system all HIPAA privacy constraints are met Claims patient information is accurately input into the laboratory system Step A2 Preconditions Actions Lab paperwork prepared and queued for phlebotomist blood specimen requirements for each requested test is appropriately characterized within the system print labels and orders for phlebotomist SOFTWARE ENGINEERING INSTITUTE | 41 .Appendix B Mission Steps for Assurance Case SUCCESSFUL BUSINESS PROCESS COMPLETION CRITERIA All ordered tests are appropriately performed in a timely manner and results accurately communicated to the requesting doctor.

Step A2 Post conditions Claims Lab paperwork prepared and queued for phlebotomist paperwork (labels) printed for blood sample labels are accurate and legible (all and only requested tests. correct patient ID information) for requested tests Step A3 Preconditions Actions Post conditions Claims Phlebotomist takes blood. who was the operator. similar information access to results meets HIPAA requirements (for example. labels it for lab technician printed paperwork (labels) and patient ready blood sample taken blood in properly labeled vials (none) Step A4 Lab technician performs tests on sample and generates report blood and paperwork ready Preconditions technician loads proper machine with blood samples bar code on vial indicates patient and proper test to machine machine runs tests Actions each machine sends results to lab’s database collecting point results collated into report for transmission to the hospital repository report exists Post conditions blood properly disposed of technician performing work is identified and linked to results all required tests were run (integrity. availability) no unordered tests were run (integrity) test results are accurately recorded (integrity) Claims test results are associated with the right patient (integrity) lab audit trail exists—who did the work. technician cannot identify the patient associated with the test results) (confidentiality) 42 | CMU/SEI-2008-TR-008 .

or similar faults (integrity) disclosure unauthorized entity (person. but they are not the correct results (integrity) analysis machine is not calibrated. confidentiality) some or all test results were not associated with the right doctor (in the lab) (integrity. availability) some unrequested tests were performed (integrity) wrong results: results do not reflect the actual sample (integrity) Failure outcomes disclosure results disclosed to unauthorized person (confidentiality) test results not associated with the correct patient (integrity. confidentiality) Potential causes of failure lab database was inaccessible for receiving results (availability) machine did not produce results (availability) machine was not working and could not produce results (availability) wrong results machine doing the test has an undetected internal failure so results were produced. contaminated. has faulty reagents. or others) gained access to the analysis results during analysis (in the lab) (confidentiality) SOFTWARE ENGINEERING INSTITUTE | 43 .Step A4 Lab technician performs tests on sample and generates report missing (or delayed) results: some or all tests are not done (integrity. confidentiality) missing results paperwork requiring tests to be run was lost or misplaced (integrity) blood samples were lost. confidentiality) test results not associated with the correct doctor (integrity. or misplaced (integrity) some tests were not run by the technician (integrity) wrong tests were run by the technician (integrity) some or all test results were not associated with the correct patient (in the lab) (integrity. insurance company.

Step A5 Lab results transmitted to hospital central repository test result report is available in the lab database can match the lab’s patient ID with the hospital’s patient ID hospital can authenticate the lab Preconditions communications exist lab can authenticate hospital lab can provide authorized readers of the transmitted report if the request for tests came directly to them from the patient or doctor (not via the hospital). Actions Post conditions results transmitted laboratory associated with results in hospital repository results transmitted without loss of patient privacy test results are accurately transmitted and entered correctly into the hospital database. test results are connected to the right patient Claims test results are connected to the right doctor test results are connected to the right lab at the hospital (trusted sender) receipt of test results is acknowledged (non-repudiation) access to results in hospital repository meets HIPAA requirements an audit trail exists missing (or delayed) results some or all tests are reported missing from the database when they should have been present wrong results results do not reflect the actual sample or doctor orders disclosure Failure outcomes results disclosed to unauthorized person unauthorized person gains access to analysis results at the lab’s dat abase hospital system corrupted what was transmitted (or the transmission process) causes a failure within the hospital information system duplicated results entry of test results in hospital database duplicated 44 | CMU/SEI-2008-TR-008 .

.Step A5 Lab results transmitted to hospital central repository wrong results test results are not applied to proper patient record in the repository lab results modified before transmission (whether this is possible depends on the process of collecting results and then transmitting them) results of tests not ordered are entered test results were not accurately transmitted to the database (integrity) transmitted results are tampered with (removed or changed) (integrity issue) mismatch in hospital and lab data schema—likely a change on one end or the other missing results authentication of lab fails (e. key mismatch) so results are not sent —could be critical.g.g. test results expected but not received—lab loses results or does not transmit them (lab error recovery fails on a failure in its system) Potential causes of failure tests were run but results were not made available to the database (results not transmitted or not received) test results were not written to the database and no error message was received (or if received. malware received from the laboratory duplicated results retransmission (due to recovery from partial transmission) causes duplicate results to be entered in repository Step A6 Preconditions Actions Post conditions Notification given to doctor’s office (email) tests completed and loaded to the hospital database doctor’s email is provided email sent to doctor notifying that test results are available information notification received at doctor’s office SOFTWARE ENGINEERING INSTITUTE | 45 . results were not retransmitted) data cannot be accessed by the hospital—encryption failure. mismatch in hospital and lab data schema—likely a change on one end or the other disclosure misconfigurations lead to lab system compromises an unauthorized person has access to test results after the transmission (e. faxed to wrong number) results are not associated with the correct doctor hospital system corrupted poorly formed data record causes hospital system failures.

Step A6 Notification given to doctor’s office (email) notification sent to an accurate email address right doctor received patient notification Claims no unauthorized person received notification notification contents are sufficient to properly identify the patient with no patient sensitive information 46 | CMU/SEI-2008-TR-008 .

PEOPLE REFERENCE TABLE A1) Patient to lab Patient Lab check-in staff Phlebotomist Lab technician A2) Lab prepares paperwork A3) Blood sample drawn A4) Lab sample analyzed A5) Report transmitted to hospital A6) Notice sent doctor’s office X C C X C C C SOFTWARE ENGINEERING INSTITUTE | 47 .

RESOURCE REFERENCE TABLE: A1) Patient to lab Lab work order Patient insurance data HIPAA forms Lab scheduling Lab test repository and reporting system Blood sample Lab paperwork (labels) Testing machine Testing machine connectivity Doctor office connectivity X X X X A2) Lab prepares paperwork X A3) Blood sample drawn A4) Lab sample analyzed A5) Report transmitted to hospital A6) Notice sent to doctor’s office X X X X X X X X X X X 48 | CMU/SEI-2008-TR-008 .

“Architecting Principles for Systems of Systems. Final Report on the August 14. https://reports..html [Jackson 2007] Jackson. editors. https://buildsecurityin.cs. Daniel.htm [Schwartz 2007] Schwartz. & Millett. 12. 2003 Blackout in the United States and Canada: Causes and Recommendations. System-of-Systems Influences on Acquisition Strategy Development.com/Open/PAPERS/systems.gov/daisy/bsi/articles/best-practices/acquisition.-Canada Power System Outage Task Force. “Who Needs Hackers?” New York Times.pdf (2004).gov/ SOFTWARE ENGINEERING INSTITUTE | 49 .infoed.us-cert. 2007 [Kelly 2004] Kelly.” Systems Engineering 1.ac.uk/~rob/papers/DSN04. John.S. Lynette I. Rob. Tim & Weaver. 2008. http://www. Sept.energy. 2007. [US-Canada 2004] U. Rita & Ellison.References/Bibliography URLs are valid as of the publication date of this document. [Creel 2008] Creel. Mark. Martyn. Robert J.” http://www-users. 4 (1998): 267-84. [Maier 1998] Maier. Thomas. “The Goal Structuring Notation—A Safety Argument Notation. April 2004.york. Software for Dependable Systems: Sufficient Evidence? The National Academies Press.

and maintained by different entities. ABSTRACT (MAXIMUM 200 WORDS) ESC-TR-2008-008 12B DISTRIBUTION CODE Complexity and change pervade today’s organizations. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES) PERFORMING ORGANIZATION REPORT NUMBER CMU/SEI-2008-TR-008 10. survivability. Organizational and technology components that must work together may be created. SECURITY CLASSIFICATION OF ABSTRACT 20. SPONSORING/MONITORING AGENCY REPORT NUMBER HQ ESC/XPK 5 Eglin Street Hanscom AFB. processes. NUMBER OF PAGES 63 14. VA 22202-4302. The SAF is designed to • • • identify potential problems with existing or near-term interoperations among components within today’s network environments highlight the impact on survivability as constrained interoperation moves to more dynamic connectivity increase assurance that mission threads can survive in the presence of stress and possible failure 15. SUBJECT TERMS Assurance. DC 20503. systems of systems 16. SECURITY CLASSIFICATION OF REPORT 18. SECURITY CLASSIFICATION OF THIS PAGE 19. (Leave Blank) 4.REPORT DOCUMENTATION PAGE Form Approved OMB No. dependencies. TITLE AND SUBTITLE April 2008 5. Z39-18 298-102 50 | CMU/SEI-2008-TR-008 . PA 15213 9. DTIC. increasing the layers of people. Existing analysis mechanisms do not provide a way to (1) focus on challenges arising from integrating multiple systems. and completing and reviewing the collection of information. NTIS 13. John Goodenough. and to the Office of Management and Budget. a team at the Software Engineering Institute (SEI) built an analysis framework to evaluate the quality of the linkage among roles. a structured view of people. constraints. Directorate for information Operations and Reports. gathering and maintaining the data needed. REPORT TYPE AND DATES COVERED Final FUNDING NUMBERS Survivability Assurance for System of Systems AUTHOR(S) FA8721-05-C-0003 Robert J. AGENCY USE ONLY 2. SUPPLEMENTARY NOTES 12A DISTRIBUTION/AVAILABILITY STATEMENT Unclassified/Unlimited. Net-centric operations and service-oriented architectures will push this trend further. including suggestions for reducing this burden. MA 01731-2116 11. Ellison. 0704-0188 Public reporting burden for this collection of information is estimated to average 1 hour per response. and technology. and risks for critical technology capabilities in the face of change. Paperwork Reduction Project (0704-0188). 6. including the time for reviewing instructions. The Survivability Analysis Framework (SAF). and systems. Arlington. LIMITATION OF ABSTRACT Unclassified NSN 7540-01-280-5500 Unclassified Unclassified UL Standard Form 298 (Rev. REPORT DATE 3. process. 7. In response. searching existing data sources. (2) consider architecture tradeoffs carrying impacts beyond a single system. to Washington Headquarters Services. 1. managed. Suite 1204. 1215 Jefferson Davis Highway. PRICE CODE 17. NSS. and (3) consider the linkage of technology to critical organizational functions. Charles Weinstock. networked system. Software Engineering Institute Carnegie Mellon University Pittsburgh. Send comments regarding this burden estimate or any other aspect of this collection of information. 2-89) Prescribed by ANSI Std. was developed to help organizations analyze and understand stresses and gaps to survivability for operational and proposed business processes. Carol Woody PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) 8. Washington.

Sign up to vote on this title
UsefulNot useful