
The Agile RAMSS lifecycle for the future

Thor Myklebust
SINTEF Digital, Norway. E-mail: thor.myklebust@sintef.no

Per Håkon Meland
SINTEF Digital, Norway. E-mail: per.h.meland@sintef.no

Tor Stålhane
NTNU Norway. E-mail: stalhane@ntnu.no

Geir K. Hanssen
SINTEF Digital, Norway. E-mail: geir.k.hanssen@sintef.no

In recent years, there has been an increasing interest in, and growing use of, agile development methods when
developing safety-critical systems. This interest is motivated by the need to shorten time-to-market, reduce costs,
improve quality, and to support the paradigm of continuous development and deployment.
This paper presents an agile lifecycle approach to Reliability, Availability, Maintainability, Safety and Security
(RAMSS) engineering and management. The current trend for cyber physical systems is more connectivity over
insecure networks, and as a consequence of emerging security threats, we suggest a systematic addition of security
in this area, complementing safety. Depending on the domain, it is not just the software itself that must be updated
due to security issues, but also safety cases and accompanying evidence.
The Agile RAMSS approach covers all phases of the development process, including improvements due to
modifications and safe patching during operation. These improvements have to be performed based on strict safety
standard requirements. The lifecycle is aimed at manufacturers of High Integrity Systems, like Industrial
Automation and Control Systems and Safety Instrumented Systems. We have used our in-depth knowledge of
security standards, like the IEC 62443 series, and the software safety standards IEC 61508-3 and EN 50128, to
establish a risk-based approach that is combined with a fast track solution of the SafeScrum method including
DevOps.

Keywords: Agile, RAMS, SafeScrum, Safety, Security, DevOps.

1. Introduction

There has been an increasing interest in, and growing use of, agile development methods when developing
safety-critical systems in both transport domains and the process industry, Myklebust et al (2017).

This paper presents an agile lifecycle approach to Reliability, Availability, Maintainability, Safety and
Security (RAMSS) engineering and management. Due to the increased connectivity over insecure networks and
emerging security threats, we recommend a systematic analysis and handling of security in this area,
complementing RAMS. Depending on the domain, it is not just the software itself that must be updated due to
security issues, but also safety cases and accompanying evidence. These tasks can take several months before a
modification or patch rollout can begin, which can be a risk by itself. This creates a tension between release
cycles and safety assurance. The industry, regulators and certification bodies have to rethink how
safety-critical software systems should be developed and certified in a changing threat landscape, where
shorter time-to-market and faster response to security threats become more important.

Developers and users of safety-critical systems face three important problems: (1) the need to react quickly to
any challenges due to changes in the system’s usage or operating environment, (2) the need to use available
experiences to improve the system and (3) the need to keep the system certified by the relevant authorities each
time something is changed. The first two of these problems can be handled efficiently, given the right
development environment, while the last one is, at least partly, outside the developers’ control.

As pointed out by Sun et al. (2009), safety and security goals interact synergistically or conflictingly, and
should therefore be evaluated together. If not, conflicts can result in either (a) overly secure systems that
compromise the reliability of critical operations or (b) insecure systems where back-doors are easily found.
Safety and security both share a risk-driven approach, where risks should be continuously identified, monitored
and treated throughout the system
Proceedings of the 29th European Safety and Reliability Conference.
Edited by Michael Beer and Enrico Zio
Copyright ©2019 by ESREL2019 Organizers. Published by Research Publishing, Singapore
ISBN: 981-973-0000-00-0 :: doi: 10.3850/981-973-0000-00-0 esrel2019-paper
lifecycle. However, the focus of safety risk analysis is more on how system failure events can harm health and
the physical environment in which the system operates, while security risks typically have an external origin
that attacks the system. Safety events follow a stochastic probability distribution due to physical wear and
tear and human mistakes, while security attacks are driven by human intentions and the presence of exploitable
vulnerabilities. The latter leads to a heavier reliance on trend analysis and knowledge about attackers rather
than historical data when making probability predictions. An exploratory research process using observations,
reviews and documentation was conducted at three companies, and interviews were conducted with three
certification bodies.

2. Background

2.1 Safety and security standards

There are several standards related to safety and security.

Figure 1: Different safety standards and their references to relevant security standards

The following standards are a representative subset of the ongoing standardization efforts in the area:
IEC 61508:2010, EN 50128:2011, EN 50128:draft 2018, IEC 62443-4-1:2018, IEC TR 62443-2-3:2015 and
ISO/IEC 27001:2013. This is enough to give an idea of the state of the standardization. IEC 61508 is a generic
standard for safety-critical systems. Its relationship with security is made clear in part 1: “In particular,
this standard does not specify the requirements for the development, implementation, maintenance and/or
operation of security policies or security services needed to meet a security policy that may be required by
the E/E/PE safety-related system.” Even though the standard does not have any requirements for security, it
states, “If the hazard analysis identifies that malevolent or unauthorized action, constituting a security
threat, as being reasonably foreseeable, then a security threats analysis should be carried out.” Patching,
which may be important when needing a quick fix for a security problem, is not mentioned in this standard.

IEC TR 62443-2-3 is only concerned with patching of industrial automation and control systems (IACS), not
e.g. SIS (Safety Instrumented Systems). The main point is that: “Asset owners have an implied obligation to
uphold the safety, reliability, operability, security and quality of their operations. Achieving cyber
security assurance, through patching IACS assets, is a critical part of that obligation.”

EN 50128:2011 does not refer to security standards and does not even mention the term “security”. Just as
IEC 61508 only refers to security concerns, IEC 62443-4-1 only refers to safety concerns – mostly to IEC 61508.
Safety is only mentioned in section 11.2: Security update qualification. Here, the standard states that a
process shall be employed for verifying that security updates created by the product developer address the
intended security vulnerabilities and do not introduce regressions. In addition, this standard has a lot of
text related to patches – defined as the “management area of systems management that involves acquiring,
testing and installing software patches (code changes) to a product.” Last but not least, the standard states:
“The process should include a confirmation that update is not contradicting other operational, safety or legal
constraints.”

ISO/IEC 27001:2013 is a pure information security management standard and does not mention safety at all. The
standard has four main parts, covering all the important aspects of information security management –
understanding the organization, planning, support, operations and improvement. Improvement includes
identification of non-conformities and corrective action. The standard has requirements both on security risk
assessment and on how to deal with it.

2.2 RAMS

Reliability is mentioned in the security standard IEC 62443-2-3 but only related to patch management – “The
objective of patch qualification is to build confidence with technical validation that the patch will not
negatively affect the performance, safety or reliability of the IACS”. IEC 62443-4-1 only mentions reliability
as part of definitions. Availability in the system sense is mentioned in IEC 62443-4-1 but only as a
characteristic that must be considered during patching – no advice or requirements are inserted.
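
The qualification criteria quoted above can be read as a simple checklist. The following is a minimal Python
sketch, not taken from any of the standards: the class `PatchQualification`, its field names and the functions
`qualification_failures` and `is_qualified` are hypothetical names chosen for illustration. It encodes the
three conditions cited from IEC 62443-4-1 (the update addresses the intended vulnerabilities, introduces no
regressions, and does not contradict operational, safety or legal constraints) together with the
IEC 62443-2-3 objective that a patch must not negatively affect performance, safety or reliability.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PatchQualification:
    """Evidence collected for one patch (all field names are illustrative)."""
    addresses_intended_vulnerabilities: bool
    no_regressions_introduced: bool
    no_conflict_with_operational_safety_or_legal_constraints: bool
    no_negative_effect_on_performance_safety_or_reliability: bool


def qualification_failures(q: PatchQualification) -> list[str]:
    """Return the names of all criteria that are not yet satisfied."""
    return [f.name for f in fields(q) if not getattr(q, f.name)]


def is_qualified(q: PatchQualification) -> bool:
    """A patch counts as qualified only when every criterion holds."""
    return not qualification_failures(q)


# Hypothetical candidate: fixes the vulnerability but degrades performance.
candidate = PatchQualification(True, True, True, False)
print(qualification_failures(candidate))
```

Returning the list of unmet criteria, rather than a single yes/no answer, is a design choice made here so that
the result can be handed to an assessor as part of the evidence.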
Reliability does not get much attention in IEC 61508 except for a recommendation for reliability block
diagrams. In addition, reliability is mentioned in two notes as follows: “Although systematic safety integrity
is usually unquantified, quantified statistical evidence (e.g. statistical testing, reliability growth) is
acceptable if all the relevant conditions for statistically valid evidence are satisfied.” IEC 61508 does not
mention maintainability or availability as software quality characteristics at all.

We have also analysed the railway standards – EN 5012x. The results confirm what we have seen in the other
relevant standards – while “safety” is mentioned more than 1000 times, reliability, availability and
maintainability are each mentioned fewer than 50 times.

Based on the examples cited above, it is fair to say that most of the RAMS focus in all the quoted
safety-relevant standards is on safety.

2.3 DevOps and the relation to safety and security

DevOps, Laukkarinen et al (2018), has emerged from the agile community as a way of closing the gap between
software development and operations and allowing rapid and dependable release cycles. This is accomplished
using technical measures such as automated toolchains, but just as important is an encouragement for
collaboration and communication between various stakeholders within an organization. DevOps is meant to take
care of the need to react quickly to any challenges due to changes in the system’s usage or operating
environment and the need to use available experiences to improve the system. However, there are several
challenges to the use of DevOps. Lwakatare et al. (2016) have summed it up nicely as four important issues –
hardware dependency, limited visibility of customer environments, lack of technology needed for deployment in
customer-specific environment and absence of feature usage data. For safety-critical systems a fifth and sixth
issue should be added – safety evidence and safety certification. Based on this, part of the solution seems to
be more knowledge and data on how our systems are used.

Since safety – and thus security – are our main concerns, it is important to collect data on near misses and
security breaches. This will allow us to collect information on which types of barriers work and which do not.
This information should then be fed back to the developers and used to develop the next, improved version. In
addition, each time the environment or the system’s usage changes, we need to repeat the risk assessment
process. This might create needs for new safety or security requirements.

2.4 SafeScrum

The SafeScrum process was developed through collaboration between research and industry, involving leading
Norwegian providers of safety-critical systems. All the main components of the Scrum process are kept. See
Hanssen et al (2018). SafeScrum extends the basic Scrum model in order to make the process applicable for
development and certification of safety-critical software. This is done by including the SRS, as this is
required by the safety standards. In addition, practices like safety stories, Myklebust and Stålhane (2016),
and the backlog are included to ensure that the software engineers fully understand the requirements. In
addition, all development and change of requirements and code are traced.

We have also included internal quality assurance within the SafeScrum process. The QA responsible is part of
the Scrum team and will verify that all quality assurance tasks have been performed, Hanssen et al (2016). When
existing code is being changed, the team must do a change impact analysis – see Myklebust et al (2014) – to
evaluate whether the change will affect the safety of the system. Short iterations enable frequent evaluation
of the detailed design and of the safety function of the system. It also forms a basis for frequent
communication, both within the team, with the product owner (e.g. the customer), and with the assessor. The key
principle is to uncover and correct problems as early as possible in the development process. Knowledge
regarding the status of the development can be communicated by incremental issue of an Agile Safety Case – see
Myklebust and Stålhane (2018).

In order to add security to the SafeScrum process, we have used two concepts – alongside engineering – a set of
supporting activities running in parallel with the Scrum process – and the security case, similar to the safety
case, suggested by Patu and Yamamoto (2013). The alongside engineering concept fits quite well with the ideas
from Howard and Lipner (2006). Howard and Lipner also state that “[Microsoft’s] experience is that the security
team must be available for frequent interactions during software design and development, and must be trusted
with sensitive technical and business information. For these reasons, the preferred solution is to build a
security team within the software development organization (although it may be appropriate to engage
consultants to help build and train the members of the team).”

SMEs will not have the resources needed to have a dedicated security team, but should nevertheless
have a security advisor to assist the Scrum team. A security team or a security advisor is a sensible approach
to the security part of RAMSS. The security-enhanced process is as shown in the figure below. As stated
previously, the outmost set of activities belongs to “alongside engineering”. Thus, e.g., “Safety and Security
analysis” is done by the safety expert and the security expert respectively.

Figure 2: The SafeScrum process threat intelligence and safety

Since new security threats can surface at any time, it will be useful to combine the change impact analysis
with an attack-tree analysis to get a better understanding of possible new attacks.

3. Patching and certification

In many cases, a correction or changes in functionality can wait until the next release. However, every now and
then a problem surfaces that needs to be taken care of right away – for instance problems that are
security-related. As computer systems become more and more connected to other computer systems – e.g., through
the internet – the opportunities for break-in will increase, and the longer such an opportunity exists, the more
probable it will be that someone will take advantage of it. Thus, the need for quick fixes. Depending on the
domain and the practice by the relevant certification bodies, the changed system will need to be recertified.
This is for instance a requirement in the railway domain. A recertification may take anything from one week to
half a year after the change.

This is an unfortunate situation, which needs to be fixed. In order to evaluate our opportunities, we presented
the following scenario to three global certification authorities, which certify safety-critical systems, e.g.,
for the offshore industry:

“Our system of interest is safety critical, reads a set of sensors and thereafter sets a set of actuators. The
system communicates with its sensors and actuators over a dedicated Ethernet. A year ago, the sensor network was
extended with several wireless sensors. The new system software was certified by the assessor and then set into
operation. The company that uses the system has discovered several break-in attempts, using a vulnerability in
the system’s communication part. The company responsible for the communication part of the system has been
informed of the problem and came up with a patch that will solve the problem.”

Based on the scenario above, we put two questions to the certification companies:
1. What needs to be re-certified: (1) the whole system, (2) the changes or (3) the subsystem affected by the
changes – in our case, the communication?
2. Would it be possible to make an agreement with the certifier so that we only certify the change process and
report the change to the certifying company?

Answer from company 1:
• “An impact analysis – the effects of the changes to the overall system should be described together with the
verification and validation activities to be repeated in order to show that the changes done to the system have
no adverse effect to the safety of the overall system. A re-certification of the overall updated system should
focus just on the performed changes and their impact to the system”.
• “Certified change or development process will not necessarily lead to a certifiable solution. Thus, reporting
changes in conjunction with a certified change process will not be sufficient. I would always also assess the
actual performed changes and check that I agree with the impact analysis results”.

Answer from company 2: The extent of re-assessment depends on the system’s SIL.
• SIL 2 system => the changes and the affected subsystems shall be revalidated. SIL 4 system => the complete
system shall be revalidated.
• It is possible to certify the change process in terms of a “functional safety management system”

Answer from company 3:
• “As a minimum, an impact analysis and a complete reverification of affected modules is necessary and a solid
software configuration procedure must be in place. In addition, regression validation is highly recommended”
• “For higher SIL values (3 – 4) a complete system validation may be necessary”

To sum up: according to the assessors, a revalidation of the changed parts of the system, together with
recertification, is needed. When it comes to relying only on a certified change process, the opinions are
divided. One of the companies says “No way”, one says “Only for functional safety” while the third company did
not answer this part of our question.

First and foremost, the problem is real and will not go away. If anything, it will only get worse. At present,
this is solved by applying the patches while not telling anybody (we have no references to offer for this). The
patches are officially inserted into the system and certified when a new version is released. This is, however,
not the optimal solution since the certifying authorities are left totally out of the loop.

We suggest the following solution:
• The developing company should define a patching process. As a minimum, it should contain change impact
analysis and retesting of all parts affected by the change.
• The assessor should have access to proof of compliance for the patching process, including the test logs.
• The patch acceptance should only be valid for a defined length of time – e.g. six months. After this, the
patch should be included in the system update and the new, revised system should be certified as usual.

4. The RAMSS lifecycle

RAMS (Reliability, Availability, Maintainability and Safety), EN 50126-1:1999, have traditionally been the
important concerns, and the RAMS lifecycle is the lifecycle of the railway EN 50126 standard series. A similar
lifecycle is presented in the generic IEC 61508 series. The idea of the agile RAMSS is to add security to the
process in a clear, agile and distinct manner. All the RAMSS parts should be based on an agile approach while
keeping relevant and required parts of the safety and security standards. Security should be part of our
concern, right from the overall scope definition and hazard analysis to operation, maintenance and repair. This
again implies that security will also be part of the verification and validation processes. For many
safety-critical systems, the environment and the users’ needs will change over time and create needs for system
changes – some small and others quite dramatic. For changes related to security, the time from observed needs
to implemented change is important. The longer the security threat is open, the larger the probability that
someone will take advantage of it. After an update, part of the system will need re-certification.

Thus, we do not just need to use ideas from DevOps – we also need to make assessors rethink their approach to
certification after security-related changes. We already know that assessors will not accept a certified change
process and leave it at that – see discussion in section 3.

The IEC 62443 series includes a security lifecycle based on SDL (Security Development Lifecycle) developed by
Howard and Lipner (2006). This lifecycle also includes a description for agile development.

In the figure below we have shown an agile lifecycle, the IEC 61508 lifecycle, a DevOps lifecycle including
necessary planning and the security lifecycle. The security part should be included in the alongside
engineering together with safety, but in some projects two teams are necessary, i.e. both a safety and a
security team. New technology has made it simpler to monitor the operation of safety systems. It has also become
more important to move towards a process with more frequent updates and upgrades (see definitions for update and
upgrade in e.g. IEC 62443-4-2:draft 2018) of the safety software, after the product/system has been developed,
due to e.g. improved operational feedback, technology improvements and security issues, including safe
patching. The Agile Safety Case approach, Myklebust and Stålhane (2018), can be an enabler for future DevOps
processes that unify software development (Dev) and software operation (Ops).

Figure 3: Different lifecycles that are relevant when developing safety-critical systems.
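
The patching process suggested in section 3 can be sketched in code. The following minimal Python sketch is an
illustration under stated assumptions, not a certified procedure and not part of SafeScrum: the names `Patch`,
`process_complete`, `revalidation_scope` and `acceptance_expired` are hypothetical, the acceptance window of
183 days stands in for the “e.g. six months” mentioned in section 3, and the mapping from SIL to revalidation
scope only mirrors the answers from certification companies 2 and 3. SIL 3 is treated like SIL 4 and SIL 1 like
SIL 2, both of which are assumptions. The subsystem names other than the communication part are invented.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: "six months" from section 3 approximated as 183 days.
ACCEPTANCE_WINDOW = timedelta(days=183)


@dataclass(frozen=True)
class Patch:
    """A patch record; all field names are illustrative."""
    identifier: str
    sil: int                              # safety integrity level of the system, 1..4
    affected_subsystems: frozenset[str]   # result of the change impact analysis
    impact_analysis_done: bool            # change impact analysis performed
    affected_parts_retested: bool         # all parts affected by the change retested
    test_logs_available: bool             # proof of compliance accessible to the assessor
    accepted_on: date                     # date the patch acceptance was granted


def process_complete(patch: Patch) -> bool:
    """True when the minimum patching process steps have all been carried out."""
    return (patch.impact_analysis_done
            and patch.affected_parts_retested
            and patch.test_logs_available)


def revalidation_scope(patch: Patch, all_subsystems: frozenset[str]) -> frozenset[str]:
    """Subsystems to revalidate, loosely following the assessors' answers."""
    if not 1 <= patch.sil <= 4:
        raise ValueError("SIL must be between 1 and 4")
    if patch.sil >= 3:
        return all_subsystems            # complete system (assumed for SIL 3 as well)
    return patch.affected_subsystems     # the changes and the affected subsystems


def acceptance_expired(patch: Patch, today: date) -> bool:
    """True once the time-limited patch acceptance has run out."""
    return today > patch.accepted_on + ACCEPTANCE_WINDOW


# Usage, loosely modelled on the scenario in section 3.
system = frozenset({"communication", "sensor_io", "actuator_control"})
patch = Patch("comm-fix", 2, frozenset({"communication"}), True, True, True, date(2019, 1, 15))
print(process_complete(patch))
print(sorted(revalidation_scope(patch, system)))
print(acceptance_expired(patch, date(2019, 9, 1)))
```

Once `acceptance_expired` reports that the window has closed, the patch would be folded into the next system
update and the revised system certified as usual.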
Although several safety and security standards exist, they are not part of a safety and security framework. As a
result, the developers must interpret how to combine the standards (cooperation, integrated or separate system).
Communication with the assessors and safety and security experts is of crucial importance. The draft technical
report IEC 63069:draft 2018 tries to solve some of these challenges. Several challenges remain, e.g. safe
patching. Safe patching is not defined. A safe patch must satisfy the safety requirements and include e.g. an
agile safety case. In addition, there is a need for improved safety and security awareness and communication
between the different experts.

In the figure above we have included the main elements to consider when a manufacturer shall develop and
continuously improve a safety-critical product or system. How to apply the correct RAMSS lifecycle depends on
different factors such as SILx, SLy, product, system and the actual domain.

5. Conclusion

The new, Agile RAMSS approach covers all phases of the development lifecycle, including improvements due to
modifications, updates, upgrades or safe patching during operation. The improvements, including an agile
approach and including security, have to be performed based on strict safety standard requirements when doing
modifications or upgrades. The proposed lifecycle is aimed at manufacturers of High Integrity Systems, such as
Industrial Automation and Control Systems and Safety Instrumented Systems. We have used our in-depth knowledge
of security standards, like the IEC 62443 series, and the software safety standards IEC 61508-3:2010 and
EN 50128:2011, to establish a risk-based approach that is combined with a fast track solution of the SafeScrum
method, including DevOps and issuing of an Agile Safety Case. Both Sprints (software development, see figure 1)
are used alongside (in parallel) with a Safety & Security engineering team to facilitate incremental
modifications throughout the whole lifecycle of the safety system.

References
T. Myklebust, G. K. Hanssen and N. Lyngby. A survey of the software and safety case development practice in the
railway signalling sector. ESREL, Portoroz, Slovenia, 2017.
T. Laukkarinen, K. Kuusinen and T. Mikkonen. Regulated software meets DevOps. Information and Software
Technology 97 (2018) 176-178.
T. Myklebust and T. Stålhane. The Agile Safety Case. ISBN 9783319702643. Springer International Publishing AG,
February 2018.
G. K. Hanssen, T. Stålhane and T. Myklebust. SafeScrum – Agile Development of Safety-Critical Software.
ISBN 9783319993348. Springer International Publishing AG, December 2018.
M. Howard and S. Lipner. The Security Development Lifecycle: SDL: A Process for Developing Demonstrably More
Secure Software. Microsoft Press, Redmond, Washington, 2006.
N. Patu and S. Yamamoto. A new approach to develop a dependable security case by combining real life security
experiences (lessons learned) with D-Case development process. CD-ARES 2013 workshop, LNCS 8128, pp 457-464,
2013. IFIP International Federation for Information Processing.
M. Sun, S. Mohan, L. Sha and C. Gunter. Addressing safety and security contradictions in cyber-physical
systems. In: Proceedings of the 1st Workshop on Future Directions in Cyber-Physical Systems Security (CPSSW09),
2009.
L. E. Lwakatare, T. Karvonen, T. Sauvola, P. Kuvaja, H. H. Olsson, J. Bosch and M. Oivo. Towards DevOps in the
Embedded Systems Domain: Why is It so Hard? 2016 49th Hawaii International Conference on System Sciences.
