SoftwareFMEAAnalysis ANE Paper
All content following this page was uploaded by Park Gee-Yong on 12 March 2016.
Gee-Yong Park
Abstract
This paper describes a software safety analysis method for safety-related application software. The
target is the software installed in the Automatic Test and Interface Processor (ATIP) of a digital
reactor protection system (DRPS). For the ATIP software safety analysis, an overall safety analysis
is first performed over the software architecture and modules, and then a detailed safety analysis
based on the software FMEA (Failure Modes and Effect Analysis) method is applied to the ATIP
program. For an efficient analysis, the software FMEA is carried out based on a failure-mode
template extracted from the function blocks used in the function block diagram (FBD) for the ATIP
software. Applied to ATIP software code that had already been integrated and had passed a very
rigorous system test procedure, the software FMEA identified software defects that the various
system tests had not revealed.
1. Introduction
A fully-digitalized reactor protection system, which is called the IDiPS-RPS, has been developed
under the KNICS (Korea Nuclear Instrumentation & Control Systems) project in order to be used in
newly-constructed nuclear power plants and also in the upgrade of existing analog-based reactor
protection systems. Fig. 1 depicts the overall configuration of the IDiPS-RPS. The IDiPS-RPS has
four channels which are located in electrically and physically isolated rooms. The IDiPS-RPS
automatically generates the reactor trip signals and the engineered safety features actuation signals
whenever the monitored process variables reach their predefined setpoints. Each channel of the
IDiPS-RPS is composed of Bistable Processors (BPs), Coincidence Processors (CPs), an ATIP, and a
Cabinet Operator Module (COM).
The codes and standards recommend that a software safety analysis be performed during the
development of software used in a safety system of nuclear power plants [3][4]. Accordingly, as
depicted in Fig.2, the software safety analysis has been performed on the safety-critical software,
e.g., the software for the BP/CP, along the software development lifecycle from the planning phase
to the validation phase [2]. For the safety-related software, the software safety analysis was not
applied at each phase of the software lifecycle but only to the implemented code.
The main purpose of the software safety analysis is to find any software defect and the associated
path through which the software defect can propagate in the software system and ultimately affect the
safety of the system within which the software is installed and executed. Fig.3 shows the procedure
of the software safety analysis actually performed for the ATIP software.
Fig.3 Procedure of the Software Safety Analysis (includes the Preliminary/Detailed System Hazard Analysis)
The first step for the software safety analysis is to identify the software-contributable systems
hazards. This identification was performed by investigating the results of the system hazard analysis.
The system hazard analysis in the KNICS project has been performed using the FMEA analysis on the
target system composed of its hardware and software by the system safety analysts. The next step is
the analysis of the software structure and the interface points, both between the software modules
and between the software and hardware. Then, the analysis template is constructed. The safety
analysis for a software system is usually performed based on a particular analysis template. For the
safety activities for the IDiPS-RPS software, we have applied three analysis methods: the software
HAZOP (Hazard and Operability Analysis) method [5], the software FTA (Fault Tree Analysis) method
[6], and the software FMEA method. The strategy of applying these methods to various software
systems in the IDiPS-RPS is described in detail in Ref. [5].
The analysis template is the checklist derived from key guide phrases in the software HAZOP
method, the fault-tree templates in the software FTA method, and the single failure mode template in
the FMEA method. Based on the template, the safety analysis is performed to identify a particular
software defect and its associated path that can induce one of the software-contributable system
hazards.
For the ATIP software, the software-induced system hazards were identified, and Table 1 presents
these hazards. The “Criticality” in Table 1 means the significance or severity of the corresponding
hazard item. Level 4 represents the most significant hazard, one that can affect the safety or
availability of the IDiPS-RPS. Level 3 indicates a significant hazard that can violate the IDiPS-RPS
requirements. Level 2 means a less significant hazard, and level 1 indicates a non-significant
hazard. In this study, most of the analysis effort has been focused on identifying a software
defect (or fault) that can induce the top two hazards (i.e., the hazards with a criticality level of 4).
Table 1. Software-Contributable System Hazards for ATIP Software
No   Hazard                                                      Criticality
1    The trip-functioning operations of the BP and the CP        4
     are affected by the ATIP software.
2    The test requirements are violated by an ATIP software      4
     defect that can affect the system safety.
3    The BP/CP test operations behave wrongly because of the     3
     ATIP software.
4    The ATIP cannot provide correct information on the BP/CP    2
     operating status to the COM.
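As an illustration of how the hazard list can drive the analysis focus, the following minimal Python sketch encodes the rows of Table 1 and selects the hazards with criticality level 4. The class and function names (`Hazard`, `hazards_in_focus`) are assumptions made for this sketch and do not come from the KNICS project; the hazard descriptions are abbreviated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hazard:
    number: int
    description: str
    criticality: int  # 4 = most significant; a higher value is more severe


# Rows transcribed (abbreviated) from Table 1.
HAZARDS = [
    Hazard(1, "Trip-functioning operations of the BP and CP are affected", 4),
    Hazard(2, "Test requirements are violated by an ATIP software defect", 4),
    Hazard(3, "BP/CP test operations behave wrongly", 3),
    Hazard(4, "ATIP cannot provide correct BP/CP status to the COM", 2),
]


def hazards_in_focus(hazards, min_criticality=4):
    """Return the hazards on which the analysis effort is concentrated."""
    return [h for h in hazards if h.criticality >= min_criticality]


focus = hazards_in_focus(HAZARDS)
print([h.number for h in focus])  # prints [1, 2]
```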
Because software is based on logical constructs and its behavior is deterministic, the software
safety analysis differs from a conventional safety analysis, which is based on the probability of
failure within a statistical framework. For a software code, the software safety analysis has a
dichotomous result: either a defect that can induce a predefined hazard really exists, or no such
defect exists.
The IDiPS-RPS is configured based on the programmable logic controller (PLC) based platform.
The software of the IDiPS-RPS, except for the COM software, is programmed in the function block
diagram (FBD) language, which complies with the IEC 61131-3 standard [7], where the FBD is specified
as one of the PLC programming languages. An FBD program for the ATIP software is edited with a
proprietary CASE (Computer-Aided Software Engineering) tool called pSET. The pSET then compiles the
FBD program into machine code and loads it into the PLC memory.
The ATIP software code is composed of 24 modules. These modules can be, according to their
corresponding functions, grouped into five parts (or group modules) as follows.
- Data Acquisition: This group module acquires data from the BP and CP. It receives commands
from the COM and also receives necessary data from the other three ATIPs (refer to Fig.1).
- Equipment Diagnosis: This group performs equipment check for its own BPs, CPs, and COM,
and for the other three ATIPs.
- Bypass Check: This part checks the channel bypass for a specific maintenance or for
performing functional tests without inducing an inadvertent reactor trip.
- Test Logic: This group initiates a functional test when it receives a test command from the
COM. Then it generates necessary test signals and analyzes each of the test results from the
BP and the CP.
- Data Transmission: This part sends the appropriate data to the other processors in the same
channel and to the other three ATIPs. In this part, the heartbeat signals are generated and
transmitted.
The most significant criterion or restriction for the ATIP software is that it must not interfere
with the functions of the BP or CP software that generate a reactor trip signal when an emergency
state occurs during plant operation. With this in mind, an analysis with a system-wide viewpoint was
performed in order to determine whether the trip functions in the BP and CP can be affected by any
malfunction of the ATIP.
For performing this analysis, the memory map and interface mechanisms between the ATIP and
the trip-functioning processors such as the BP and the CP were reviewed and analyzed. After that, the
effect of an ATIP malfunction on the performance of the functional tests was analyzed to determine
whether this effect on the functional tests could affect the reactor trip function, or this effect could be
accommodated in the BP/CP software.
After the system-wide analysis was completed, the criticality analysis for the architecture of the
ATIP software was performed. The purpose of the criticality analysis is to identify the relative
significance of a software module with respect to the safety viewpoint and allocate appropriate
analysis resource to software modules according to their corresponding criticality. One sample of the
criticality analysis performed is presented in Table 2. The criticality level is categorized into
three levels, H (High), M (Medium), and L (Low), as presented in Table 2.
The criticality level H (High) in Table 2 indicates that the safety or the performance of the BP/CP
software can be affected significantly if there is a fault in a module of the ATIP software code.
The criticality level M (Medium) indicates that a fault in an ATIP software module can affect the
performance of a test function in the BP/CP software but cannot compromise a safety function (i.e.,
generating a trip signal). The criticality level L (Low) means that a fault in an ATIP software
module can induce an error in the information displayed at the COM but does not affect the safety
and test functions.
The criticality level assigned to each module in Table 2 is the maximum criticality level for that
module, and it can be used to allocate, qualitatively, the available resources for the detailed
software safety analysis.
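The rule that a module takes the maximum criticality over its possible fault effects can be sketched as follows. This is only an illustration: the module names and their fault effects are hypothetical placeholders, not entries of Table 2, and treating a module with no listed effect as level L is an assumption of the sketch.

```python
from enum import IntEnum


class Criticality(IntEnum):
    L = 1  # fault only corrupts information displayed at the COM
    M = 2  # fault can affect a BP/CP test function but not a safety function
    H = 3  # fault can significantly affect BP/CP safety or performance


def module_criticality(effects):
    """Maximum criticality over the possible fault effects of one module.

    Assumption of this sketch: a module with no listed effect is treated as L.
    """
    return max(effects, default=Criticality.L)


# Hypothetical modules and fault effects, for illustration only.
modules = {
    "MODULE_X": [Criticality.L, Criticality.H],
    "MODULE_Y": [Criticality.M],
    "MODULE_Z": [Criticality.L],
}

# Rank modules so that analysis resources go to the most critical ones first.
ranked = sorted(modules, key=lambda m: module_criticality(modules[m]), reverse=True)
print(ranked)  # prints ['MODULE_X', 'MODULE_Y', 'MODULE_Z']
```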
For a detailed safety analysis of the ATIP software code, a software FMEA was applied to each
module of the ATIP software code. In order to apply the FMEA method, possible failure modes for the
ATIP code were investigated and a failure-mode template was devised based on these failure modes.
For the BP and CP software, a software FTA was applied at the design and implementation
phases [6]. When the software FTA was performed, all the failure (or fault) events were identified for
every function block in the FBD program and each fault-tree template was developed for each of the
function blocks. The failure events in various fault-tree templates for the software FTA were reviewed
during the investigation of the failure (or fault) modes of function blocks for the software FMEA
analysis, and the generic failure modes applicable to all the function blocks were established.
The failure-mode template, which can apply equally to all of the function blocks in an FBD
program, was constructed based on these generic failure modes. Table 3 shows the failure-mode
template. As can be seen in Table 3, the software failure modes are categorized into three items:
Function, Input, and Output. Each failure mode has check points that explain its meaning and direct
the practical investigation for that failure mode. In Table 3, the check points for each of the
failure modes are presented in the “DESCRIPTION” column.
Using the failure-mode template and its associated check points, all the software modules in the
ATIP FBD code were analyzed. One of the results of the software FMEA is presented in Table 4, where
a sub-module named "Other Channel 3 Equipment Check", contained in the "EQUIP_CHK" module, was
analyzed ("Other Channel 3" means "Channel D" of the IDiPS-RPS as shown in Fig.1). Table 4 shows the
failure modes for the target software module (i.e., Sub-Module) and evaluates the possible failure
causes and their corresponding effects. The final column, “Failure Detection/Comments”, presents the
analysis results for the possible failure causes and effects. If no software defect is discovered
for a particular failure mode, “No failure detected” is entered in the “Failure Detection/Comments”
column.
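A worksheet of this kind can be represented with a small record type. The sketch below is a hedged illustration: the field names are assumptions loosely following the columns mentioned in the text, and the failure-mode, cause, effect, and finding strings are placeholders rather than content from Table 3 or Table 4.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    FUNCTION = "Function"
    INPUT = "Input"
    OUTPUT = "Output"


NO_FAILURE = "No failure detected"


@dataclass
class FmeaRow:
    sub_module: str
    category: Category
    failure_mode: str
    cause: str = ""
    effect: str = ""
    detection: str = NO_FAILURE  # "Failure Detection/Comments" column


def findings(rows):
    """Rows for which the analyst recorded something other than the default."""
    return [r for r in rows if r.detection != NO_FAILURE]


# Placeholder rows; the texts are not taken from the paper's tables.
rows = [
    FmeaRow("Other Channel 3 Equipment Check", Category.FUNCTION,
            "placeholder failure mode 1"),
    FmeaRow("Other Channel 3 Equipment Check", Category.INPUT,
            "placeholder failure mode 2",
            cause="placeholder cause", effect="placeholder effect",
            detection="placeholder finding"),
]
print(len(findings(rows)))  # prints 1
```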
The ATIP FBD code analyzed by the software FMEA had already passed various tests such as a component
test, an integration test, and a system test. Nevertheless, when the software FMEA was applied to
the implemented code, some faults or defects were found in the ATIP software code. Table 5
summarizes the major software defects identified by the FMEA of the ATIP application software
modules. Two of the analysis results in Table 5, those in the second and fifth rows, are critical,
and they are presented in detail in this paper.
One fault found by the software FMEA is the omission of updating the heartbeat (HB) variable of
Channel D in the HB monitoring module of Channels A, B, and C. This fault is shown in the second row
of Table 5. As a result of this defect, the ATIPs in Channels A, B, and C do not have an updated HB
signal from Channel D and cannot recognize a malfunction of the HB in Channel D.
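The effect of such an omission can be illustrated with a minimal Python sketch. It rests on a simplifying assumption that is not stated in the paper: each ATIP keeps one local boolean HB variable per remote channel, initialized as healthy, and a monitoring step copies the received HB state into it. The function names and the `skip` parameter are hypothetical and only serve to reproduce the omission.

```python
CHANNELS = ("A", "B", "C", "D")


def update_hb(local, received, own, skip=()):
    """Copy received heartbeat states into the local HB variables.

    `skip` models the defect: channels listed there are never updated.
    """
    for ch in CHANNELS:
        if ch == own or ch in skip:
            continue
        local[ch] = received[ch]
    return local


def failed_channels(local):
    """Channels whose local HB variable reports a malfunction."""
    return sorted(ch for ch, healthy in local.items() if not healthy)


# View from the Channel A ATIP: the Channel D heartbeat has stopped.
received = {"B": True, "C": True, "D": False}
start = {"B": True, "C": True, "D": True}

correct = update_hb(dict(start), received, own="A")
defective = update_hb(dict(start), received, own="A", skip=("D",))

print(failed_channels(correct))    # prints ['D']
print(failed_channels(defective))  # prints []
```

With the update of Channel D omitted, the local variable keeps its initial healthy value, so the malfunction goes unnoticed.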
The second software fault found during the software FMEA is related to the "MT_STS_CHK" module, as
shown in the fifth row of Table 5. This module receives the manual test (MT) command from the COM,
identifies the current operational status of the MT, and provides a new MT initiation signal to the
BP/CP. The MT processes must satisfy the following functional test requirements.
1) MT shall be performed when the nuclear power plant is in the normal operating mode. MT
shall be prohibited at the reactor trip state.
2) MT shall be performed only for a single channel. If a specific channel is executing an MT,
then the other channels shall not execute the MT.
3) MT shall be initiated after the trip-channel bypass (TCB) or all bypass (AB) is completed.
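The three requirements above can be read as a single predicate over the plant and channel state. The following Python sketch is one way to express them, for example as a test oracle; the `PlantState` fields and the function name `mt_allowed` are assumptions of this sketch, not names from the ATIP software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlantState:
    normal_operation: bool       # requirement 1: normal operating mode
    reactor_tripped: bool        # requirement 1: MT prohibited at the trip state
    channels_in_test: frozenset  # requirement 2: channels currently executing an MT
    bypass_completed: bool       # requirement 3: TCB or AB completed for this channel


def mt_allowed(state, channel):
    """True only if all three functional test requirements hold for `channel`."""
    req1 = state.normal_operation and not state.reactor_tripped
    req2 = not (state.channels_in_test - {channel})  # no other channel is in MT
    req3 = state.bypass_completed
    return req1 and req2 and req3


ok = PlantState(True, False, frozenset(), True)
other_in_test = PlantState(True, False, frozenset({"B"}), True)
no_bypass = PlantState(True, False, frozenset(), False)

print(mt_allowed(ok, "A"))             # prints True
print(mt_allowed(other_in_test, "A"))  # prints False
print(mt_allowed(no_bypass, "A"))      # prints False
```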
The "MT_STS_CHK" module is composed of three sub-modules as shown in Fig.4. Through the "MTSC
(Manual Test Status Check)" sub-module, the MT can be initiated by generating the MT initiation
signal "AO_1_ATIP_MT_CMD" for the BP/CP if three conditions are satisfied. The three conditions are
as follows:
- The states of the trip-channel bypass and all-channel bypass are correct as expected.
- The processor to be tested does not announce a test prohibition condition.
- None of the initiation circuits are actuated.
For the first condition mentioned above, the "MTSC" sub-module receives the "AL_1_ATIP_CHA_BYP"
value that is generated by the "System Bypass Check" sub-module in the "BYP_CHK" module as shown in
Fig.4.
Fig.4 Structure of "MT_STS_CHK" Module (with the "BYP_CHK" module)
In the sub-module of "System Bypass Check", an FBD program for generating the value of
"AL_1_ATIP_CHA_BYP" is shown in Fig.5 where a so-called user-defined function block (UDFB)
named "BYP_CHK" is constructed. Note that this UDFB "BYP_CHK" in Fig.5 is different from the
module "BYP_CHK" in Fig.4. The user-defined function block (UDFB) is usually composed of a few
function blocks and/or a small amount of “C” program.
The user-defined function block "BYP_CHK" in Fig.5 produces its output by ORing all of its inputs
from G1TB1 to G2AB. The first four input variables, starting from "AI_1_CP1_TV1_TB", contain the
information of the self-channel bypass and do not include the information of the channel bypass of
the other three channels. However, the other two input variables, "AI_1_CP1_CH_AB" and
"AI_1_CP2_CH_AB", carry the all-channel bypass information for the other three channels as well as
for the self-channel.
Because the two variables "AI_1_CP1_CH_AB" and "AI_1_CP2_CH_AB" include the all-channel bypass
information for the other three channels, the output variable "AL_1_ATIP_CHA_BYP" in Fig.5, which is
used in the "MT_STS_CHK" module in Fig.4, is affected by the bypass state of the other channels.
This introduces a significant hazard into the "MTSC" sub-module in Fig.4. For example, suppose that
all the trip channels of Channel D are already bypassed (and hence Channel A cannot actuate any
channel bypass function). If an operator inadvertently requests Channel A to execute the manual test
and then pushes the "Test Start" button at the COM, the "MT_STS_CHK" module in the ATIP of Channel A
will execute the manual test even though the corresponding trip channel is not bypassed. This might
block a real trip signal from being transferred to the initiating circuit, with a test signal
transferred to the initiating circuit instead, which can affect the safety function of the reactor
protection system.
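The failure sequence can be reproduced with a small Python model of the OR logic. This is a simplified sketch under stated assumptions: the four self-channel trip-bypass inputs are modeled as copies of a single flag, the two all-channel bypass inputs as the OR over all four channels, and the `own_channel_only` switch is a hypothetical variant added for comparison, not the correction applied in the project.

```python
CHANNELS = ("A", "B", "C", "D")


def byp_chk(inputs):
    """OR of all inputs, as the text describes for the UDFB "BYP_CHK"."""
    return any(inputs)


def channel_bypass_flag(own, trip_bypassed, all_bypassed, own_channel_only=False):
    """Model of the value fed to the MT status check for channel `own`."""
    # Four self-channel trip-bypass inputs (simplified to one repeated flag).
    tb_inputs = [trip_bypassed[own]] * 4
    if own_channel_only:
        # Hypothetical variant: use only the own channel's all-bypass state.
        ab = all_bypassed[own]
    else:
        # As described: the inputs also carry the other channels' state.
        ab = any(all_bypassed[ch] for ch in CHANNELS)
    return byp_chk(tb_inputs + [ab, ab])


# Channel D is fully bypassed; Channel A has no bypass at all.
trip_bypassed = {"A": False, "B": False, "C": False, "D": True}
all_bypassed = {"A": False, "B": False, "C": False, "D": True}

as_described = channel_bypass_flag("A", trip_bypassed, all_bypassed)
own_only = channel_bypass_flag("A", trip_bypassed, all_bypassed, own_channel_only=True)

print(as_described)  # prints True: Channel A looks bypassed although it is not
print(own_only)      # prints False
```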
In fact, the above failure sequence is extremely unlikely in the practical operation of the
IDiPS-RPS, which has many barriers designed to prohibit an inadvertent test command. Nevertheless,
the software has a defect that can violate the test requirement and may potentially compromise the
safety function of the IDiPS-RPS.
It should be recognized that such a software defect is very difficult to identify in the course of a
system or integration test because the test cases and procedures become very complex. The test
procedure for discovering the above software defect first requires the four fully interconnected
IDiPS-RPS channels with their full application software and communication functions. It then
requires four test operators (or at least more than two), each of whom takes charge of the channel
bypass functions of one IDiPS-RPS channel, and one test supervisor who directs the bypass actuation
and MT activation with abnormal manipulations.
The number of test cases is also very large. With the all-channel bypass actuated in Channel D, the
MT start command for one of the 18 trip channels of Channel A is generated at the COM in Channel A,
and it is then checked whether the MT result is really produced and a real trip signal is blocked.
These testing activities are repeated for the remaining 17 trip variables in Channel A. After that,
they are performed for Channel B and then for Channel C. The whole sequence is repeated with the
all-channel bypass moved from Channel D to each of the other channels in turn, until the all-channel
bypass has been applied to all four RPS channels.
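The size of this test campaign can be estimated by enumerating the combinations. The sketch below assumes that, for each channel holding the all-channel bypass, the MT is attempted on each of the 18 trip channels of each of the three other channels; this reading of the procedure, and the function name, are assumptions made for illustration.

```python
from itertools import product

CHANNELS = ("A", "B", "C", "D")
TRIP_CHANNELS = range(1, 19)  # 18 trip channels per RPS channel


def mt_test_cases():
    """Yield (bypassed channel, channel under test, trip channel) triples."""
    for bypassed, tested, trip in product(CHANNELS, CHANNELS, TRIP_CHANNELS):
        if tested != bypassed:
            yield bypassed, tested, trip


cases = list(mt_test_cases())
print(len(cases))  # prints 216, i.e. 4 bypassed channels x 3 tested channels x 18
```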
These testing activities obviously require a large amount of cost and effort. In contrast, when the
safety analysis by the software FMEA was applied, a single software safety analyst could identify
this defect by investigating the ATIP software code.
4. Conclusions
This paper describes the application of a software FMEA analysis to the ATIP application
software code as one of the software safety analysis activities. The software FMEA is performed
based on the single failure-mode template which is applicable to all the function blocks of the FBD
program. From the application experience, the software FMEA proved to provide a very systematic way
of analyzing a software system. The analysis results show that the software FMEA can find a software
defect (or fault) and its associated failure sequence that are very hard to identify in the course
of an integration or system test.
One issue remains to be considered: the coverage of the failure modes in the failure-mode template
used in the software FMEA. The failure modes for the FBD presented in this paper were derived by
investigating every function block and through discussions among software safety analysts who have
experience in performing software safety analyses. Despite this process, the failure-mode coverage
could not be validated quantitatively. In our study, however, no failure modes outside the
categories established in this paper were discovered during the safety analysis of the real FBD
code.
Although software safety analysis has mainly been performed on safety-critical software such as the
trip-actuating software, the results of this application to safety-related software suggest that
applying it to safety-related or non-safety software code would help to increase the software
quality when a higher level of software quality is required.
References
[1] J. H. Park, D. Y. Lee, and C. H. Kim, “Development of KNICS RPS Prototype,” Proceedings of
ISOFIC (International Symposium on the Future I&C for NPPs) 2005, Session 6, Tongyeong,
Korea, Nov. 1-4, 2005, pp.160-161.
[2] G. Y. Park and K. C. Kwon, “Formal V&V Activities on Software for Digital Reactor Protection
System,” Enlarged Halden Programme Group Meeting, Storefjell Resort Hotel, Norway, March
11-16, Session-C4.2, 2007.
[3] Regulatory Guide 1.168, Verification, Validation, Reviews and Audits for Digital Computer
Software Used in Safety Systems of Nuclear Power Plants, U.S. Nuclear Regulatory Commission,
2004.
[4] IEEE Std. 1228, IEEE Standard for Software Safety Plan, 1994.
[5] G. Y. Park, J. S. Lee, S. W. Cheon, K. C. Kwon, E. Jee, and K. Y. Koh, "Safety Analysis of
Safety-Critical Software for Nuclear Digital Protection System", LNCS (Lecture Notes in
Computer Science) 4680, pp.148-161, 2007.
[6] G. Y. Park, K. Y. Koh, E. Jee, P. H. Seong, K. C. Kwon, and D. H. Lee, "Fault Tree Analysis of
KNICS RPS Software", Nuclear Engineering and Technology, Vol.40, No.5, pp.397-408, 2008.
[7] IEC 61131, Part 3, International Standard for Programmable Logic Controllers: Programming
Languages, International Electrotechnical Commission, 1993.