AEROSPACE AIR6988™
RATIONALE
In order to provide a means of compliance for the certification of Artificial Intelligence (AI) within safety critical aeronautical
systems, the committee must first review existing standards and perform a gap analysis to understand how and why existing
standards cannot be reliably used. This document serves as that gap analysis and provides a list of concerns that need to
be addressed in order to produce a future means of compliance.
TABLE OF CONTENTS
1. SCOPE
4.3.12 ISO 12207 Systems and Software Engineering - Software Life Cycle Processes
4.3.13 ISO 26262 Road Vehicles - Functional Safety
4.3.14 ISO 21448 Road Vehicles - Safety of the Intended Functionality
4.3.15 ED-201 Aeronautical Information System Security (AISS) Framework Guidance
4.3.16 ED-202A/DO-326A Airworthiness Security Process Specification
4.3.17 ED-203A/DO-356A Airworthiness Security Methods and Considerations
4.3.18 ED-204/DO-355 Information Security Guidance for Continuing Airworthiness
4.3.19 ED-205 Process Standard for Security Certification and Declaration of ATM ANS Ground Systems
4.4 Gap Analysis Summary
PREFACE
In anticipation of growing commercial pressure for Artificial Intelligence (AI) solutions within the aerospace industry over the
coming years, there is an urgent call for regulation and for the emergence of norms around acceptable usage. In response,
two working groups were set up independently on either side of the Atlantic during 2019 to address concerns around
assuring products and services that exploit AI technologies. WG-114 was established by EUROCAE in Europe and G-34
by SAE in the United States.
Both Working Groups were created to produce guidance on safe, secure, and successful adoption of AI technologies in
Aeronautical Systems, through consensus amongst many experts and practitioners in industry and academia. In bilateral
agreement, the groups formed a joint committee in June 2019.
The joint working group will evaluate key applications for AI usage within aeronautical systems, with a scope encompassing
ground-based equipment and airborne vehicles, including Unmanned Aircraft Systems (UAS) products. In terms of
processes, the full lifecycle will be under consideration, from design and manufacture, to operation and through-life
maintenance.
A key deliverable will be documented standards, providing guidance on assuring safe and secure systems utilizing AI,
through an agreed acceptable means of compliance with regulatory requirements.
As per the charters of both EUROCAE WG-114 and SAE G-34, the first objective of the joint working group was to develop
and publish a technical report, a comprehensive Statement of Concerns (SOC), outlining the scope and purpose of the
group’s work and considering the concerns before imagining the solutions. This document is the response to that objective
and is the outcome of virtual meetings that took place between September 2019 and May 2020.
Before merging, each of the groups was developing a SOC document independently. Both groups were organized into
sub-groups, each addressing a section of the SOC. G-34 had Sub-Committees (SCs) and WG-114 had Teams (TMs). Upon
merging, six groups were formed, as defined in the following table.
AI has the potential to disrupt the aerospace industry, impacting all areas in which computing and aerospace intersect. AI
technologies are becoming progressively more embedded into the digital systems used to design, manufacture, operate,
and maintain both aerial vehicles and ground-based systems. Leveraged appropriately, AI-driven solutions could transform
the products and services that aerospace companies provide with an accelerated pace of change. Specifically, Machine
Learning (ML) technologies have the potential to revolutionize established paradigms of aeronautical system development,
including those concerned with safety-critical applications.
AI is a broad subject, still being actively developed from a confluence of many disciplines, including mathematics, computing,
cognitive science, software development, data science, control theory, and others. It demands a collaborative approach
with experts contributing from multiple domains.
Current industry guidance documents have a strong focus on established development methodologies and solutions for
aeronautical systems. These standards are therefore not expected to entirely accommodate the development and
assurance of ML-enabled solutions.
Downloaded from SAE International by Rosa Maria Arnaldo, Tuesday, April 25, 2023
SUMMARY OF CONTENTS
Section 3 surveys established AI techniques and their application to various problem domains, placing them in a historical
context and suggesting a taxonomy. A development workflow is also outlined.
From the foundational background provided by Section 3, Section 4 assesses current aerospace industry standards and
guidelines, discussing their applicability to systems that incorporate ML technologies. Chief among these standards are ED-
12C/DO-178C and supplements, ED-79/ARP-4754A, ED-80/DO-254, ED-109A/DO-278A, and ED-153.
In Section 5, specific areas of concern for both developmental and operational activities are identified, using the
development workflow outlined in Section 3 as a framework for discussion. The scope is limited to technical considerations.
Ethical and societal implications are out of scope.
Section 6 defines the scope of the documents that the joint Working Group will produce. Both ground and airborne
applications are considered, at whole-system level, rather than at the level of their constituent hardware and software
components. Software operates in the context of a system, and key quality attributes such as safety are properties of the
overall system.
Potential applications of AI to aerospace domains are proposed as a set of Use Cases in Sections 7 and 8, for airborne and
ground-based systems, respectively.
The conclusion in Section 9 proposes a staged plan for developing a future standard for AI techniques, as well as processes
and technologies to be assessed against, in the context of aeronautical systems.
IMPORTANT NOTICE: This document neither defines guidance, nor sets forth any constraints around future guidance.
1. SCOPE
This document reviews current aerospace software, hardware, and system development standards used in the
certification/approval process of safety-critical airborne and ground-based systems, and assesses whether these standards
are compatible with a typical Artificial Intelligence (AI) and Machine Learning (ML) development approach. The document
then outlines what is required to produce a standard that provides the necessary accommodation to support integration of
ML-enabled sub-systems into safety-critical airborne and ground-based systems, and details next steps in the production
of such a standard.
NOTE: This document, and the upcoming standard it is intended to inform, are concerned only with “offline” learning
applications of AI and ML. In offline learning, ML models are trained on historical data within a dedicated learning
environment. When the trained models are then implemented into a production system, learning functionality is
turned off. The production system implementing AI may collect data for retraining at a later date, but any retraining
or further learning will happen in the separate learning environment, and any resulting changes to the ML models
will then need to be re-implemented into production as a new version of the system utilizing AI. This is in contrast
to “online” learning, where a system utilizing AI will continue to learn and adapt its operation while in production.
Consideration of such systems is not out of scope for SAE G-34, but the committee will not consider online learning
until after publication of this document and its related standard.
2.1 References
[7] SAE/EUROCAE, "ARP4754A/ED-79A - Guidelines for Development of Civil Aircraft and Systems," 2010.
[8] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, The MIT Press, 2016.
[9] RTCA/EUROCAE, "DO-178C/ED-12C Software considerations in airborne systems and equipment
certification," 2013.
[10] High-Level Expert Group on AI (AI HLEG), "Ethics Guidelines for Trustworthy AI," 2019.
[11] Z. Manna, Mathematical Theory of Computation, Dover Publications, 1974.
[12] "Propositional calculus," [Online]. Available: https://en.wikipedia.org/wiki/Propositional_calculus.
[13] "T-norm fuzzy logics," [Online]. Available: https://en.wikipedia.org/wiki/T-norm_fuzzy_logics.
[14] "First-order logic," [Online]. Available: https://en.wikipedia.org/wiki/First-order_logic.
[15] "Second-order logic," [Online]. Available: https://en.wikipedia.org/wiki/Second-order_logic.
[16] "Prolog," [Online]. Available: https://en.wikipedia.org/wiki/Prolog.
[17] "Logic Programming," [Online]. Available: https://en.wikipedia.org/wiki/Logic_programming.
[18] OWL Working Group, "OWL Web Ontology Language Use Cases and Requirements," 2004. [Online].
Available: https://www.w3.org/TR/webont-req/#onto-def.
[19] OWL Working Group, "OWL 2 Web Ontology Language, 2nd ed., W3C Recommendation," 2012. [Online].
Available: https://www.w3.org/TR/owl-ref/.
[20] P. Bellini, R. Mattolini, and P. Nesi, "Temporal Logics for Real-Time Systems Specification," ACM Computing
Surveys, 2000.
[21] "Temporal Logic," [Online]. Available: https://en.wikipedia.org/wiki/Temporal_logic.
[22] "Knowledge engineering," [Online]. Available: https://en.wikipedia.org/wiki/Knowledge_engineering.
[23] "Expert system," [Online]. Available: https://en.wikipedia.org/wiki/Expert_system.
[24] "Rule-based system," [Online]. Available: https://en.wikipedia.org/wiki/Rule-based_system.
[25] "Knowledge representation and reasoning," [Online]. Available:
https://en.wikipedia.org/wiki/Knowledge_representation_and_reasoning.
[26] "Ontology engineering," [Online]. Available: https://en.wikipedia.org/wiki/Ontology_engineering.
[27] D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques, The MIT Press, 2009.
[28] "Graphical model," [Online]. Available: https://en.wikipedia.org/wiki/Graphical_model.
[29] T. R. Gruber, "A translation approach to portable ontology specifications," Knowledge Acquisition, 1993.
[30] L. F. Sikos, Mastering Structured Data on the Semantic Web: from HTML5 Microdata to Linked Open Data,
2015.
[31] I. Horrocks, U. Sattler, and F. Baader, "Chapter 3: Description Logics," in Foundations of Artificial Intelligence,
2007.
[32] R. Arp et al., Building Ontologies with Basic Formal Ontology, The MIT Press, 2015.
[33] J. Hebeler et al., Semantic Web Programming, Wiley, 2009.
[34] W. Wong et al., "Ontology learning from text: a look back and into the future," ACM Computing Surveys, Vol.
44, No. 4, Article 20, 2012.
[35] "Semantic Web Stack," [Online]. Available: https://en.wikipedia.org/wiki/Semantic_Web_Stack.
[36] N. Casellas, "Linked Legal Data: A SKOS Vocabulary for the Code of Federal Regulations," IOS Press.
[37] S. Mandal et al., "Semantic Web Representations for Reasoning about Applicability and Satisfiability of
Federal Regulations for Information Security," RELAW, Ottawa, 2015.
[38] I. Sanya and E. Shehab, "A Framework for developing engineering design ontologies within the aerospace
industry," International Journal of Production Research, vol. 53:8, pp. 2383-2409, 2015.
[39] CRYSTAL - Critical System Engineering Acceleration, "State of the art for Healthcare ontology," vol.
D407.010, 2013.
[40] P. J. Besl and N. D. McKay, "Method for registration of 3-D shapes. Sensor fusion IV: control paradigms and
data structures.," International Society for Optics and Photonics, vol. 1611, 1992.
[41] S. M. LaValle and J. J. Kuffner Jr., "Randomized kinodynamic planning," The international journal of robotics
research, vol. 20, no. 5, pp. 378-400, 2001.
[42] S. Karaman and E. Frazzoli, "Sampling-based algorithms for optimal motion planning," The international
journal of robotics research, vol. 30.7, pp. 846-894, 2011.
[43] M. A. Fischler and R. C. Bolles, "Random sample consensus: a paradigm for model fitting with applications to
image analysis and automated cartography," Communications of the ACM, vol. 24.6, pp. 381-395, 1981.
[44] P. Jackson, Introduction to Artificial Intelligence, Dover Publications, 1985.
[45] H. Moravec and A. Elfes, "High resolution maps from wide angle sonar," in IEEE international conference on
robotics and automation, 1985.
[46] P. E. Hart, N. J. Nilsson, and B. Raphael, "A formal basis for the heuristic determination of minimum cost
paths," IEEE transactions on Systems Science and Cybernetics, vol. 4.2, pp. 100-107, 1968.
[47] C. G. Harris and M. Stephens, "A combined corner and edge detector," in Alvey vision conference, 1988.
[48] A. L. Samuel, "Some Studies in Machine Learning Using the Game of Checkers," IBM Journal of Research and
Development, pp. 211-229, 1959.
[49] T. M. Mitchell, Machine Learning, McGraw-Hill, 1997.
[50] K. P. Murphy, Machine Learning: A probabilistic Perspective, The MIT Press, 2012.
[51] L. Breiman, "Statistical Modeling: The Two Cultures," Statistical Science, vol. 16(3), pp. 199-231, 2001.
[52] D. Wolpert and W. G. Macready, "No Free Lunch Theorems for Optimization," IEEE Transactions on
Evolutionary Computation, vol. 1 No 1, 1997.
[53] D. H. Wolpert, "The Lack of A Priori Distinctions Between Learning Algorithms," Neural Computation, vol. 8,
pp. 1341-1390, 1996.
[54] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, 2nd ed., Springer, 2009.
[55] "Supervised learning," [Online]. Available: https://en.wikipedia.org/wiki/Supervised_learning.
[56] S. Russell and P. Norvig, "Chapter 18. Learning from examples," in Artificial Intelligence: A modern approach,
Prentice Hall, 2009, pp. 493-767.
[57] A. A. Patel, Hands On Unsupervised Learning with Python, O'Reilly, 2019.
[58] "Unsupervised learning," [Online]. Available: https://en.wikipedia.org/wiki/Unsupervised_learning.
[59] "Semi-supervised learning," [Online]. Available: https://en.wikipedia.org/wiki/Semi-supervised_learning.
[60] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed., The MIT Press, 2018.
[61] "Reinforcement learning," [Online]. Available: https://en.wikipedia.org/wiki/Reinforcement_learning.
[62] "Meta learning," [Online]. Available: https://en.wikipedia.org/wiki/Meta_learning_(computer_science).
[63] "Automated machine learning," [Online]. Available: https://en.wikipedia.org/wiki/Automated_machine_learning.
[64] "Neuroevolution," [Online]. Available: https://en.wikipedia.org/wiki/Neuroevolution.
[65] D. Grbic and S. Risi, "Towards continual reinforcement learning through evolutionary meta-learning," GECCO
'19: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 119-120, 2019.
[66] M. A. Nielsen, Neural Networks and Deep Learning, Determination Press, 2015.
[67] D. E. Rumelhart et al., "Learning Internal Representations by Error Propagation," Defense Technical
Information Center technical report, 1985.
[68] H.-T. Cheng et al., "Wide & Deep Learning for Recommender Systems," in Proceedings of the 1st Workshop
on Deep Learning for Recommender Systems, 2016.
[69] NASA, "Certification Considerations for Adaptive Systems," 2015.
[70] V. Chandola, A. Banerjee and V. Kumar, "Anomaly Detection: A Survey," ACM Computing Surveys, 2009.
[71] SAE, "J3016 Taxonomy and Definitions for Terms related to Driving Automation Systems for On-Road Motor
Vehicles," 2018.
2.2 Definitions
ACCEPTANCE SCENARIO: A test or simulation procedure designed to gather evidence of a system’s compliance with
requirements and expectations.
ACCEPTANCE CRITERIA: The criteria applied for the complete set of acceptance scenarios.
ADAPTIVE SYSTEM: System that changes its behavior, based on an active feedback process in the presence of changes
in the system or its environment. The environment can include hardware or software components of the computing platform,
or the external surroundings in which the system operates (e.g., for airborne systems, changes to the physical structure of
the aircraft, or in the weather). The defining characteristic of an adaptive system is that it goal-seeks by iteratively updating
parameters in response to environmental changes, rather than by resorting to predetermined values in look-up tables or to
fixed, predefined calculations.
ARTIFICIAL INTELLIGENCE (AI): The theory and development of software-based systems able to perform tasks that have
hitherto been the province of human intelligence. Examples include visual perception, speech recognition, decision-making,
customer support and anomaly detection. John McCarthy, who coined the term in 1956, defines it as "the science and
engineering of making intelligent machines."
ARTIFICIAL GENERAL INTELLIGENCE (AGI): Also known as general or strong AI, Artificial General Intelligence refers to
an artificial system able to perform any intellectual task that can be performed by a typical human. The result would be a
system considered to have the kind of general intelligence possessed by human beings and perhaps also qualities like
consciousness.
ARTIFICIAL NARROW INTELLIGENCE (ANI): Artificial Narrow Intelligence, also known as weak AI, refers to the application
of AI to solve narrow or specific problems. All current AI implementations are in this category.
ARTIFICIAL SUPER INTELLIGENCE (ASI): Artificial Super Intelligence research considers the possibility of systems that,
given their superior processing and storage capabilities when compared to the human brain, could potentially exhibit
super-human levels of intelligence. How such systems could be created, and how they would be evaluated is a matter of
research and philosophical debate.
ARTIFICIAL NEURAL NETWORK (ANN): Neural Networks are algorithms, modelled loosely on the workings of the
biological brain, where neurons are “fired” with sufficient stimuli. Typically, an ANN is “trained” with example data, in which
the expected outputs are defined for a given input dataset. Parameters are adjusted based on the deviation from expected
outcomes.
In “classical” feedforward ANNs, data is transmitted in one direction only, from inputs to outputs. Two common specialized
architectures extend this basic structure and are better suited to certain classes of problem:
• Convolutional Neural Network (CNN): A type of Neural Network for processing data that has a known grid-like topology.
CNNs use a specialized kind of linear operation (convolution) in place of general matrix multiplication in at least one of
their layers, which makes them well suited to analytical applications, such as predictive maintenance.
• Recurrent Neural Network (RNN): A type of Neural Network whose connections form directed cycles, giving the network
an internal memory of earlier inputs. Unlike earlier network types, which are restricted to fixed-size input and output
vectors, RNNs can operate on sequences, which makes them well suited to operational applications, such as a
self-driving car or self-flying airplane.
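As an illustrative sketch only (no such example appears in any referenced standard, and all names and values below are hypothetical), the computation performed by a single artificial neuron can be written as a weighted sum of its inputs plus a bias term, passed through an activation function:

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs plus bias,
    passed through a sigmoid activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# A single feedforward pass: data flows one way, inputs -> activation.
output = neuron([0.5, -1.0], [0.8, 0.2], bias=0.1)
```

Training, in this picture, consists of adjusting the weights and bias to reduce the deviation of outputs like this one from the expected outcomes in the example data.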
ATTRIBUTE: Information representing a property or characteristic of a subject as recorded in structured data, such as a
table, spreadsheet, or database. In a database context, attributes are commonly referred to as fields, whereas in a
spreadsheet or table context, attributes are commonly referred to as columns. For instance, structured data representing a
person may have attributes such as name or age.
AUTOMATION: The use of control systems and information technologies reducing the need for human input, typically for
repetitive tasks.
AUTONOMOUS [4]: Operations of an unmanned system wherein the system receives its mission from the human or agent
and accomplishes that mission with or without further human-robot interaction (HRI). The level of HRI, along with other
factors such as mission complexity, and environmental difficulty, determine the level of autonomy for the unmanned system.
Finer-grained autonomy level designations can also be applied to the tasks, lower in scope than mission.
• Fully Autonomous: A mode of UMS operations wherein the UMS is expected to accomplish its mission, within a defined
scope, without human intervention.
• Semi-Autonomous: A mode of UMS operations wherein the human operator and/or the UMS plan(s) and conduct(s) a
mission, which requires various levels of HRI.
For the automotive industry, six levels of autonomy are defined (refer to SAE J3016 [4]).
AUTONOMY: The ability to perform one or more tasks in a changing environment following a decision-making process
without input by a human.
BIAS (MACHINE LEARNING): (1) An error deriving from erroneous assumptions in the learning process. High bias can
cause an algorithm to miss the relevant relations between attributes and target outputs (known as underfitting). (2) A
parameter within a neural network that is added to the weighted sum of a neuron’s inputs, shifting its output.
BIAS (STATISTICS): A feature of a statistical technique or of its results whereby the expected value of the results differs
from the true underlying quantitative parameter being estimated.
BIG DATA: A discipline that specializes in dealing with the analysis of very large amounts of data, with high velocity (high
speed of data processing). The data may come from a wide variety of sources (sensors, images, texts, etc.) in a wide variety
of formats, including unstructured formats, such as free text.
CLASSIFICATION (MACHINE LEARNING): A method that maps input data to one of a number of discrete classes.
CLUSTERING: A method that identifies similarities between features of the data and groups data items into clusters with
similar features. It is a common example of unsupervised learning.
COMMERCIAL-OFF-THE-SHELF (COTS): Refers to generally available software and hardware developed for a broad
market, rather than hardware or software designed specifically for aerospace applications. Special rules and regulations
often apply when leveraging COTS hardware or software for safety critical applications, as such hardware or software may
not have been developed in accordance with aerospace industry standards.
DATA CLEANSING: Identification and removal of errors and duplicate data to create a reliable dataset. This improves the
quality of the training data for analytics.
DATA DRIFT: Phenomenon in which a Machine Learning system encounters data values during operation that were not
used in training the underlying model. The predictive capabilities of the system are compromised as in-service data values
diverge from those used in model training; for instance, the performance of an aircraft engine prognostics system trained
with warm-weather data may degrade if the deployed system is used in cold weather over sustained periods.
DATA-DRIVEN AI: An approach to AI development focused on building a system that can output appropriate responses
based on having learned from a large number of examples.
DATA SCIENCE [3]: A broad field that refers to the collective processes, theories, concepts, tools, and technologies that
enable the extraction of information and analysis to acquire knowledge from that information.
DATASET (MACHINE LEARNING): The sample of data used for various development phases of the algorithm: i.e., training,
validation, and test.
• Training Dataset: Data that is input to a Machine Learning model in order to establish its behavior.
• Validation/Development Dataset: Used to tune some hyperparameters of a model (e.g., number of hidden layers,
learning rate, number of neurons per layer).
• Test Dataset: Used to assess the performance of a model, independent of the training dataset.
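The three-way partition described above can be sketched as follows (an illustrative example; the function name and split fractions are assumptions, not prescribed by any standard). The key property is that the three subsets are disjoint, so test performance is assessed independently of the data used for training:

```python
import random

def split_dataset(data, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle once, then partition into disjoint training,
    validation, and test subsets."""
    data = list(data)
    random.Random(seed).shuffle(data)  # fixed seed for reproducibility
    n = len(data)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]
    return train, val, test

train, val, test = split_dataset(range(100))
```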
DEEP LEARNING (DL): A specific type of Machine Learning based on the use of large (deep) neural networks to learn
abstract representations of the input data using multiple layers.
DETERMINISTIC SYSTEM: A system for which no randomness is involved in the development of future states. A
deterministic system will always produce the same output given the same input (including environmental conditions) and
initial state.
DERIVED REQUIREMENTS: Requirements which are not directly traceable to higher level requirements and/or specify
behavior beyond that specified by the higher level requirements.
EXPECTED PROPERTY: The expected functional or performance property of the system outputs for each acceptance
scenario.
EXPERT SYSTEM: A knowledge-based system that provides for solving problems in a domain or application area by
drawing inferences from a knowledge base developed from human expertise. The term "expert system" is sometimes used
synonymously with "knowledge-based system” but should be taken to emphasize expert knowledge. Some expert systems
can improve their knowledge base and develop new inference rules based on their experience with previous problems.
Expert systems usually rely on explicit, symbolic encoding of knowledge.
EXPLAINABILITY: The extent to which humans can understand, interpret, and account for the causality of an AI system or
algorithm: why a particular output is produced for a given set of inputs. (Note: In this document, we use explainability in a
broad manner to cover both explainability, as in a system’s ability to show its work, and interpretability, as in a human’s
ability to logically understand what is being explained.)
EXPLAINABLE AI: An emerging field of study that aims to make the workings of AI systems transparent, such that humans
can see process and decision flows. Note that this may not be the same as having AI systems be logically understood by
humans, although the goal of many Explainable AI projects is often aligned to that outcome. See definition for
INTERPRETABLE AI.
FORMAL METHODS: A collection of mathematically rigorous techniques that can be applied to the specification,
development and verification of software or hardware. A major benefit is a reduced need for human inspection and testing,
for verification and validation, relying instead on unequivocal mathematical proofs of system properties. Challenges include
the high level of required expertise, sparse tool support, and limited applicability to certain types of problem.
FUNCTION: The intended behavior of a product based on a defined set of requirements regardless of implementation.
GENETIC LEARNING [6]: An approach to Machine Learning based on an iterative classification algorithm which selects
pairs of classifiers according to strength, and then applies genetic operators to the pairs to create offspring. The strongest
offspring replace the weakest in order to generate new, plausible classifiers when the prior classifiers prove inadequate.
The term “genetic" comes from the field of natural genetics, where it is linked to heredity governed by genes.
HYPERPARAMETER (MACHINE LEARNING) [8]: A setting that is used to control the behavior of a learning algorithm (e.g.,
number of hidden layers, learning rate, number of neurons per layer). Note: the values of hyperparameters are not adapted
by the learning algorithm.
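The distinction can be illustrated with a minimal sketch (hypothetical, for explanation only): in the gradient-descent fit below, the weight `w` is a model parameter adapted by the learning algorithm, whereas `learning_rate` and `steps` are hyperparameters, chosen by the practitioner and never adjusted by the algorithm itself:

```python
def fit_slope(xs, ys, learning_rate, steps):
    """Fit y = w * x by gradient descent on mean squared error.
    `w` is a parameter (learned); `learning_rate` and `steps`
    are hyperparameters (fixed before training)."""
    w = 0.0
    for _ in range(steps):
        # Gradient of mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad
    return w

w = fit_slope([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], learning_rate=0.05, steps=200)
```

A poor hyperparameter choice (e.g., too large a learning rate) would prevent `w` from converging, which is why hyperparameters are typically tuned on the validation dataset.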
INFERENCE (MACHINE LEARNING): The process of a Machine Learning model computing an output, based on input
data. The concept of inference originates from the field of Logic and is used extensively in Symbolic AI. The dictionary
definition is "a conclusion or opinion that is formed because of known facts or evidence." See also "rule of inference" in [14]
or "inference engine" in [25].
INTERNET OF THINGS (IoT): The network of physical objects that contain embedded technology to communicate and
sense or interact with their internal states or the external environment.
INTERPRETABLE AI: The quality of an AI system to explain process and decision flows in such a manner that humans can
logically decompose and understand the system’s output and decision making. With Interpretable AI, predictions can be
logically justified, errors can be traced to their source in logical fashion, and a level of confidence in outcomes can be
ascertained.
KNOWLEDGE: A collection of facts, events, beliefs, and rules organized for systematic use.
KNOWLEDGE ACQUISITION: The process of locating, collecting, and refining knowledge and converting it into a form that
can be further processed by a knowledge-based system. Knowledge acquisition normally implies the intervention of a
knowledge engineer, but it is also an important component of Machine Learning. [6]
KNOWLEDGE BASE; K-BASE [6]: A database that contains inference rules and information about human experience and
expertise in a domain. In self-improving systems, the knowledge base additionally contains information resulting from the
solution of previously encountered problems.
KNOWLEDGE ENGINEERING [6]: The discipline concerned with acquiring knowledge from domain experts and other
knowledge sources and incorporating it into a knowledge base. The term "knowledge engineering" sometimes refers
particularly to the art of designing, building, and maintaining expert systems and other knowledge-based systems.
KNOWLEDGE-BASED SYSTEM: Information processing system that provides for solving problems in a domain or
application area by drawing inferences from a knowledge base. The term "knowledge-based system" is sometimes used
synonymously with "expert system," which is usually restricted to expert knowledge.
MACHINE LEARNING (ML): The branch of AI concerned with the development of algorithms that allow computers to evolve
behaviors based on observing data and making inferences on these data.
• Supervised Learning: The process of learning a function that maps an input to an output based on a labelled training
dataset.
• Unsupervised Learning: The process of learning a function from a non-labelled dataset, by adapting the model to
increase accuracy of the algorithm based on a given cost function.
• Reinforcement Learning: The process of learning in which the algorithm is rewarded for positive results and penalized
for negative results, enabling it to improve over time. The learning system is called an agent.
• Semi-Supervised Learning: The process of learning in which the algorithm is capable of learning from data that is
partially labelled.
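As an illustrative sketch of supervised learning (the data and function below are hypothetical), a one-nearest-neighbor classifier predicts the label of the training example closest to the query; the labelled training dataset is what makes this supervised:

```python
def nearest_neighbor(train, query):
    """Supervised learning at its simplest: return the label of the
    training example whose feature is closest to the query."""
    feature, label = min(train, key=lambda fl: abs(fl[0] - query))
    return label

# Labelled training dataset: (feature, label) pairs.
labelled = [(1.0, "low"), (1.5, "low"), (9.0, "high"), (8.5, "high")]
prediction = nearest_neighbor(labelled, query=2.0)
```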
MACHINE LEARNING MODEL: A parameterized function that maps inputs to outputs. The parameters are determined
during the training process.
NATURAL LANGUAGE PROCESSING (NLP): Natural Language Processing (NLP) is a sub-field of Artificial Intelligence,
concerned with computational techniques for analyzing, representing, and processing natural (human) language texts and
voices, at one or more levels of linguistic analysis, such as phonetics, syntax, semantics and discourse, for the purpose of
achieving human-like language abilities for a range of applications, including natural language understanding and
generation, and speech recognition and generation.
OPERATIONAL DESIGN DOMAIN (ODD): Description of the specific operating domain(s) in which an automated function
or system is designed to properly operate, including but not limited to operational aspects, environmental conditions, and
other domain constraints.
Downloaded from SAE International by Rosa Maria Arnaldo, Tuesday, April 25, 2023
ONLINE LEARNING ALGORITHM [2]: Represents a class of learning algorithms that learn to sequentially optimize
predictive models over a stream of data instances while performing in the run-time environment during operations. The
on-the-fly learning makes online learning highly scalable and memory efficient and differentiates it from batch learning or
offline learning.
OPTIMIZATION: Methods that find the best available values within a function or distribution according to a defined cost
function.
PREDICTABILITY: The degree to which a correct forecast of a system's state can be made quantitatively.
REGRESSION: A set of statistical techniques to postulate the mathematical relationship of one or more dependent variables
to one or more independent variables and use such a relationship for statistical inference or prediction.
ROBUSTNESS (SYSTEM): The extent to which the system handles invalid inputs properly and provides stable outputs under perturbations of its inputs.
ROBUSTNESS (SOFTWARE): The extent to which software can continue to operate correctly despite abnormal inputs and
conditions.
ROBUSTNESS (MACHINE LEARNING MODEL): Ability of a model to generalize the outputs, for an input varying in a
region of the state space compatible with the training range.
RULE-BASED SYSTEM; PRODUCTION SYSTEM: A knowledge-based system that draws inferences by applying a set of
if-then rules to a set of facts following given procedures.
SEMANTIC NETWORK; SEMANTIC NET: A concept-based knowledge representation in which objects or states appear
as nodes connected with links that indicate the relationships between various nodes.
SYMBOLIC AI [3]: In contrast to Data-Driven AI, attempts to capture knowledge and derive decisions through explicit
representation and rules.
TESTING: The process of exercising a system or component to verify that it satisfies specified requirements.
TRAINING (MACHINE LEARNING): The process of optimizing the parameters of a Machine Learning model, given a
dataset and a task to achieve on that dataset.
TRUSTWORTHINESS: Set of three qualities of a system that should be satisfied throughout its entire life cycle: it should
be lawful, complying with all applicable laws and regulations, it should be ethical, ensuring adherence to ethical principles
and values, and it should be robust, both from a technical and social perspective. [10]
UNMANNED SYSTEM (UMS): An electro-mechanical system, with no human operator aboard, that is able to exert its power
to perform designed missions. May be mobile or stationary. Includes categories of unmanned ground vehicles (UGV),
unmanned aerial vehicles (UAV), unmanned underwater vehicles (UUV), unmanned surface vehicles (USV), unattended
munitions (UM), and unattended ground sensors (UGS). Missiles, rockets, and their submunitions, and artillery are not
considered unmanned systems.
VALIDATION: The process of determining that requirements are both correct and complete with respect to representing
larger goals and objectives.
VARIANCE (MACHINE LEARNING): An error from sensitivity to small fluctuations in the training set. High variance can
cause an algorithm to model the random noise in the training data, rather than the intended outputs (overfitting).
VARIANCE (STATISTICS): A standard calculation in Statistics that indicates the “spread” of a data distribution. Essentially, it tells us how far a set of numbers is spread out from its average value.
More formally, variance is the expectation of the squared deviation of a random variable from its mean.
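The formal definition above can be illustrated with a short, self-contained Python sketch; the sample data is invented for illustration only:

```python
def variance(xs):
    """Population variance: the expectation of the squared deviation
    of the values from their mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(variance(data))  # 4.0
```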
VERIFICATION: The evaluation of the outputs of a process to ensure correctness and consistency with respect to the inputs
and standards provided to that process.
WEIGHT (MACHINE LEARNING): A parameter within a Neural Network that transforms input data. The inputs get multiplied
by a weight value.
NOTE: Both bias and weight are learnable parameters inside the network.
WEIGHT (SYMBOLIC AI): A parameter within Semantic Networks which indicates the strength of the relationship between
the nodes.
WELL-FORMED FORMULA (wff): A statement whose validity can be determined by mathematical logic.
3. CLASSIFICATION OF AI TECHNIQUES
This section offers a common knowledge base of AI terminology as a foundation for the remaining sections. We provide a
taxonomy of AI concepts and techniques based on three major paradigms: Symbolic AI, Numerical Analysis, and Machine
Learning. The section concludes with a discussion of a general workflow for developing ML systems.
Figure 1 shows a classification of AI techniques adopted by the SAE G-34/WG-114 committee, and those techniques
relevant to the statement of concerns document are detailed in 3.1.
Main Classification:
Figure 1 - AI classification
3.1 Symbolic AI
Symbolic AI refers to the collection of methods that attempt to capture and encode human knowledge, for the purpose of
machine understanding and processing. Solutions are based on human-readable representations of problems, where
real-world objects, along with their characteristics, relationships, and interactions, are represented by symbols. An aircraft,
for example, could be represented by the textual string “aircraft.” Although this approach requires no model training, no
massive amounts of data, and no “guesswork,” its main limitation lies in the difficulty of codifying the real world, with all its
complexity and nuances. Today’s knowledge-based AI agents occupy only a partially described universe.
3.1.1 Logic
AI and Computer Science evolved together over the second half of the 20th century and, since computers are fundamentally
Boolean Logic processing machines, it is natural to apply the principles and techniques of Logic to solve problems. The
approach is to build propositional sentences, using a constrained grammar, that express some belief (“Truth”) about the
domain of interest, with no ambiguity. A modern computer can assess the truth of millions of propositions per second, to
make decisions, plan, and act.
The logic branch of Symbolic AI is composed of several classes of Formal Methods. They allow the representation of logical
arguments (a.k.a. statements) in terms of well-formed formulas (wffs), whose validity can be determined by mathematical
logic. A wff is said to be valid if it can be shown to be a tautology, i.e., true under all possible interpretations [11]. An
interpretation consists of an assignment of truth values to the wff variable symbols.
Knowledge Engineering [22] refers to all technical, scientific, and social aspects involved in building, maintaining, and using
knowledge-based systems. This broad field includes long-standing concepts such as expert systems [23] and rule-based
systems [24], as well as more recent techniques such as ontology engineering [25], [26] and probabilistic graphical models
[27], [28]. A common trait of knowledge engineering solutions is the hard coding of knowledge using formal languages and
their processing using formal methods.
3.1.1.1.1 Ontologies
Ontologies [18], [29], [30], [31] are data structures that model concepts, roles, and individuals, and their relationships.
Ontologies can store conceptual or incomplete information and use reasoning over description logic to infer missing
relationships and attributes.
Ontologies [32] capture domain knowledge by organizing concepts, relationships, and constraints in a web of
statements named the Semantic Web [33]. The ontology information model is designed to facilitate easy data sharing,
reuse, and interoperability across multiple scientific and engineering domains. The Semantic Web accomplishes this
through two critical principles: decoupling the knowledge model from the application and integrating knowledge models
through reuse and extension [27]. Many ontologies can be defined for any single domain since their creation is usually
context driven in terms of scope, abstraction-level, granularity, properties, and intent.
A Knowledge Base (KB) [27] is a software component that represents a collection of facts or statements that is ontologically
described, processed, and accessed in a Semantic Web application. Entailments inferred by reasoners combined with
user/application asserted statements in the KB can refer to generic concepts or specific individuals (instance data).
Reasoners evaluate the assertions in the underlying data structures and verify consistency of ontology concepts and their
mutual relationship. The use of ontologies allows a KB to incorporate new ontologies and instance data incrementally as
the need arises.
In artificial intelligence, expert systems are designed to solve complex problems by reasoning through bodies of knowledge,
represented mainly as “if-then” rules (production rules) rather than through conventional procedural code. An expert system
usually has two core components: a knowledge base and an inference engine. The knowledge base is an organized
collection of facts and production rules about the system’s domain. Facts are frequently acquired from human experts
through interviews and observations. The inference engine applies the rules to the known facts in the knowledge base to
possibly deduce new facts in order to emulate the decision-making ability of a human expert. Inference engines can also
include explanation and debugging abilities. Typical tasks for expert systems involve classification, diagnosis, monitoring,
scheduling, and planning for specialized technology domains.
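The knowledge base plus inference engine structure described above can be sketched in a few lines of Python; the facts and production rules below are invented for illustration and carry no aeronautical significance:

```python
# A minimal knowledge base (facts) and inference engine (forward
# chaining over if-then production rules). Facts and rules are
# invented for illustration only.
facts = {"engine_vibration_high", "oil_pressure_low"}
rules = [
    ({"engine_vibration_high"}, "inspect_fan_blades"),
    ({"oil_pressure_low"}, "check_oil_pump"),
    ({"inspect_fan_blades", "check_oil_pump"}, "ground_aircraft"),
]

# Apply rules repeatedly until no new fact can be deduced.
changed = True
while changed:
    changed = False
    for conditions, conclusion in rules:
        if conditions <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print(sorted(facts))
```

Each pass of the loop emulates the inference engine applying production rules to known facts to deduce new facts.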
This section covers several common algorithmic approaches based on non-data driven techniques used in a wide range of
applications including Optimization, Mapping, and State Estimation, in applications such as robotics and autonomous
vehicles.
Iterative approaches can be described as taking an initial guess at a solution, and repeatedly running a procedure (an
iteration) to refine that guess to get closer and closer to the acceptable solution. The iterations typically continue until either
a maximum number of iterations are reached, or a certain fitness of the solution is found. Depending on the problem, the
fitness of the solution may be a measure of the proximity to the true solution, or it may be a heuristic measurement.
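As an illustrative sketch of an iterative approach, the following Python code refines an initial guess for a square root (Newton's method), terminating either on a maximum iteration count or on a fitness criterion:

```python
def newton_sqrt(a, tol=1e-12, max_iter=100):
    """Iteratively refine a guess for sqrt(a) using Newton's method,
    stopping at a maximum iteration count or when the update is small
    enough (the 'fitness' criterion)."""
    x = a if a > 1 else 1.0  # initial guess
    for _ in range(max_iter):
        nxt = 0.5 * (x + a / x)  # one refinement iteration
        if abs(nxt - x) < tol:
            return nxt
        x = nxt
    return x

print(newton_sqrt(2.0))  # ~1.4142135623730951
```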
Randomized algorithms explicitly use random numbers as one of their inputs as a method to converge to a true solution.
These algorithms generally use random inputs as samples from a large action or state space. As the number of samples
increases, these algorithms generally then either heuristically or provably converge to the correct solution.
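A classic illustration of a randomized algorithm is Monte Carlo estimation of pi, where random samples from a large state space (the unit square) converge toward the correct solution as the sample count grows; the sketch below is illustrative only:

```python
import random

def monte_carlo_pi(n_samples, seed=0):
    """Estimate pi by sampling random points in the unit square: the
    fraction landing inside the quarter-circle converges to pi/4."""
    rng = random.Random(seed)
    inside = sum(
        1 for _ in range(n_samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * inside / n_samples

print(monte_carlo_pi(100_000))  # close to 3.14159 for large sample counts
```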
Population methods involve self-affecting, multi-part populations [46]. A mathematical system that “affects itself” is typically
composed of at least two parts that produce and accept feedback to themselves, that is, where the action of any part affects
the other parts, which in turn affect the original part [44].
In ANNs, for example, population-based training is a hyperparameter optimization technique, similar to genetic algorithms,
that learns from a schedule of hyperparameters rather than fixed values.
Population-based methods mimic the genetic search of Natural Selection in the biological world. Instead of working with a
single candidate solution, the designer works with a population of candidates to explore the solution space.
Probabilistic algorithms describe a general class of algorithms that operate directly on probability distribution functions
(PDF). Rather than computing a value directly, they compute the probability that a random variable takes any given value.
These methods can be useful in developing systems with inputs that have an associated uncertainty (such as sensor data),
as they allow for passing that uncertainty through the full system. Many probabilistic algorithms fall into the category of
Bayesian Inference, which make use of Bayes’ Rule to update a prior distribution with new information.
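A minimal sketch of Bayesian Inference as described above is a discrete application of Bayes' Rule; all probabilities below are invented for illustration:

```python
# Discrete Bayesian update: a prior belief about a sensor fault is
# revised after observing an alarm. All probabilities are invented.
prior = {"fault": 0.01, "no_fault": 0.99}
likelihood = {"fault": 0.95, "no_fault": 0.02}  # P(alarm | state)

# Bayes' Rule: posterior is proportional to likelihood * prior.
unnormalized = {s: likelihood[s] * prior[s] for s in prior}
evidence = sum(unnormalized.values())
posterior = {s: p / evidence for s, p in unnormalized.items()}

print(round(posterior["fault"], 3))  # 0.324
```

Note how a rare fault (prior 1%) becomes much more probable, but far from certain, after a single uncertain observation.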
Search algorithms handle retrieving data or finding connections in a given data structure. This includes applications such
as finding an optimal path between two nodes on a graph or identifying a subset of items within the structure that fulfils
certain criteria.
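As an illustrative sketch of a search algorithm, the following Python code finds an optimal (fewest-edge) path between two nodes of a graph using breadth-first search; the graph itself is invented:

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search: returns a fewest-edge path between two
    nodes of a directed graph, or None if no path exists."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in graph.get(path[-1], []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

# Invented connection graph between five nodes.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
print(shortest_path(graph, "A", "E"))  # ['A', 'B', 'D', 'E']
```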
Direct approaches can best be described by not being iterative approaches; they constitute a predetermined set of
operations independent of the input data. Direct approaches cover a wide range of algorithms from algorithms with exact
solutions (such as linear systems or Kalman filters) to algorithms that simply change the representation of data (such as
convolutions). Many algorithms in both traditional and learning based computer vision fall in the category of direct
approaches.
The ML field is crowded with multiple learning strategies and algorithm families with no discernible advantage among
alternatives if one makes absolutely no assumption about the training data (this is known in the AI R&D community as the
No Free Lunch theorem [8], [52], [53]). The generally accepted classification of ML algorithms is: supervised learning,
unsupervised learning, semi-supervised learning, and reinforcement learning.
With Supervised Learning, the main goal is to train a model with labelled data, enabling the model to make predictions on
new unseen data. The training data comprises samples, where the desired output (labels) are known and correct, for a
given input.
In Classification problems, the model assigns a categorical class label to each sample. An example of Classification is
image recognition, where physical objects can be classified as cars, people, signs, buildings, and so on.
A second type of Supervised Learning is the prediction of continuous outcomes, referred to as Regression. An example of
regression is predicting the remaining useful life of an engine hardware component, given explanatory variables such as
number of flight cycles, hours of operation, and operating temperature distribution [50], [54], [55], [56].
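A minimal regression sketch, assuming invented data in which remaining useful life decreases linearly with flight cycles, is ordinary least squares with a single explanatory variable:

```python
def fit_line(xs, ys):
    """Ordinary least squares with one explanatory variable: returns
    the (slope, intercept) minimizing squared prediction error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

# Invented data: remaining useful life (hours) vs. flight cycles.
cycles = [100, 200, 300, 400, 500]
rul = [900, 800, 700, 600, 500]
slope, intercept = fit_line(cycles, rul)
print(slope, intercept)  # -1.0 1000.0
```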
In contrast to Supervised Learning, where the right answer is already known for a number of samples, Unsupervised
Learning deals with unlabeled datasets, or data of an unknown structure. Unsupervised algorithms learn underlying
structures in the training set and use these patterns to make predictions. In Clustering, a common Unsupervised Learning
technique, an amorphous pile of information can be sorted into meaningful subcategories (clusters), using features that may
not be immediately discernable by humans. Typical unsupervised learning tasks include anomaly detection, visualization,
dimensionality reduction, and association rule learning [50], [57], [58], [56].
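Clustering can be sketched with a tiny one-dimensional k-means; the data below is invented and the algorithm is deliberately minimal:

```python
import random

def k_means(points, k, iters=20, seed=0):
    """Tiny one-dimensional k-means: sorts unlabeled values into k
    clusters by alternating assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:  # assign each point to its nearest centroid
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # update each centroid to the mean of its assigned points
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

# Two obvious groups; the algorithm discovers them without labels.
data = [1.0, 1.2, 0.8, 9.8, 10.0, 10.2]
print(k_means(data, 2))  # centroids near [1.0, 10.0]
```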
In Semi-Supervised Learning, the ML algorithm is capable of learning from data that is partially labelled. Most semi-supervised learning algorithms use
unsupervised learning approaches to improve supervised learning solutions in dealing with problems such as insufficient
labelled data, curse of dimensionality, feature engineering, outliers, and data drift.
In Reinforcement Learning, the goal is to produce a Machine Learning algorithm (agent) that improves its performance
interacting with its environment and observing changes brought about by the agent’s own actions. The learning objective is
to determine an action-selection policy that maximizes a numeric reward signal. The agent must discover which actions
yield the most reward by trying them. The trial-and-error search and the learner’s need to consider delayed rewards are the
two most important distinguishing features of Reinforcement Learning.
A classic example of Reinforcement Learning is where the objective, such as the safe landing of a spacecraft without
crashing, is well defined. The algorithm uses repeated attempts to determine the optimal set of maneuvers and fuel usage
to best achieve the objective [60], [61], [56].
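The trial-and-error search and delayed reward described above can be sketched with tabular Q-learning on an invented one-dimensional corridor (Q-learning is one common Reinforcement Learning algorithm, not the only one):

```python
import random

# Tabular Q-learning on an invented 1-D corridor: states 0..4, with a
# reward only at the far right end. The delayed reward must be
# discovered by trial-and-error interaction with the environment.
N_STATES = 5
ACTIONS = (-1, +1)                     # move left / move right
alpha, gamma, epsilon = 0.5, 0.9, 0.2  # learning rate, discount, exploration
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
rng = random.Random(0)

for _ in range(500):                   # learning episodes
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy selection: mostly exploit, sometimes explore
        if rng.random() < epsilon:
            a = rng.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: q[(s, act)])
        s_next = min(max(s + a, 0), N_STATES - 1)
        reward = 1.0 if s_next == N_STATES - 1 else 0.0
        best_next = max(q[(s_next, act)] for act in ACTIONS)
        q[(s, a)] += alpha * (reward + gamma * best_next - q[(s, a)])
        s = s_next

policy = [max(ACTIONS, key=lambda act: q[(s, act)])
          for s in range(N_STATES - 1)]
print(policy)  # the learned policy moves right (+1) in every state
```

The agent never sees a reward until it stumbles onto the goal state; the action-selection policy that maximizes the reward signal emerges only through repeated trials.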
To summarize the types of learning and their applications, a bubble chart was designed. The intent was not to provide an
exhaustive list of applications for each type of learning, a task that is impossible considering the variety of possible
applications of Machine Learning. Instead, this chart provides only a few examples of technical applications (smaller
bubbles) for each of the types of learning (medium-size bubbles).
Artificial Neural Networks (ANNs) are a Machine Learning modelling technique that was originally inspired by the networks
of neurons in human brains but gradually evolved apart from its biological analogy. Suffice it to say, the
development of powerful graphical processing units (GPUs), availability of huge amounts of training data, improvements in
ANN training algorithms and programming APIs, and a “virtuous circle of funding and progress” [5] are leading to an
accelerated evolution in the development of ANN-based products.
Original ANN architectures include the basic Perceptron and their layered combinations called Multi-Layer Perceptrons
(MLPs). Fundamentally, an MLP is “just a mathematical function mapping some sets of input values to output values” [8].
Structurally, MLPs are dense networks of interconnecting layers of simpler perceptron units. The signal in MLPs flows only
in one direction from the input layer towards the output layer passing sequentially through one or more hidden layers. The
output of a unit in any given layer in an MLP network is a linear combination of the output of the units of the previous layer
in the sequence. Training an MLP then consists in determining appropriate connection weights and respective bias terms
for all units in the network. Nonlinear activation functions are interposed between consecutive layers to increase the
expressiveness of the network.
The ANN is called a Deep Neural Network (DNN) when it has a deep stack of hidden layers, as opposed to Shallow Neural
Networks (SNN) that have only a few hidden layers. There is no consensus about how much depth an ANN requires to be
classified as a DNN. Ian Goodfellow, Yoshua Bengio, and Aaron Courville, however, state in their seminal book [8] that
“deep learning can be safely regarded as study of models that involve a greater amount of composition of either learned
functions or learned concepts than traditional Machine Learning does.”
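The forward pass described above (linear combinations of the previous layer's outputs, interposed with nonlinear activations) can be sketched as follows; the weights and biases are arbitrary illustrative values, not trained:

```python
import math

def mlp_forward(x, layers):
    """Forward pass of an MLP: each layer computes a linear combination
    of the previous layer's outputs (weights plus a bias term), then
    applies a nonlinear activation (tanh here)."""
    for weights, biases in layers:
        x = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x

# One hidden layer (2 inputs -> 3 units) and one output layer (3 -> 1).
# These weights are arbitrary; training would determine them.
layers = [
    ([[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]], [0.0, 0.1, -0.1]),
    ([[0.7, -0.5, 0.2]], [0.05]),
]
print(mlp_forward([1.0, 2.0], layers))
```

The signal flows in one direction only, from input layer to output layer, matching the feed-forward structure described above.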
The workflow illustrated in Figure 6 is proposed to support the discussion in the following sections.
System Definition:
The system requirements, allocated to the ML system, are captured and validated.
The system architecture is also defined at this stage and corresponding safety assessment and analyses are conducted to
identify the relevant system safety and security requirements and objectives (quantitative, qualitative) to be flowed down to
the other phases.
Data Selection and Validation:
In this phase, the data should be selected and collected based on the specific problem domain. The data should exhibit a
distribution that includes the proper level of variance. The data is cleaned and/or transformed (using several techniques),
with respect to specific expected quality attributes, and validated against requirements established in the previous phase.
Each data sample is allocated to either the training, validation, or test set.
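The allocation of samples can be sketched as a simple shuffled split; the 70/15/15 fractions below are illustrative, not prescribed by this document:

```python
import random

def split_dataset(samples, val_frac=0.15, test_frac=0.15, seed=0):
    """Allocate every sample to exactly one of the training,
    validation, or test sets after shuffling."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)
    return (shuffled[n_val + n_test:],          # training set
            shuffled[:n_val],                   # validation set
            shuffled[n_val:n_val + n_test])     # test set

train, val, test = split_dataset(list(range(100)))
print(len(train), len(val), len(test))  # 70 15 15
```

The three sets are disjoint, so performance measured on the test set is not influenced by samples seen during training or optimization.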
In this phase, the model is selected and trained with the Training dataset and then optimized with the Validation Dataset.
At the end of this phase the trained model is tested with the Test dataset, to assess the performance regarding the system
requirements.
ML Sub-System Implementation:
In this phase the tested trained model is implemented (Design and HW/SW integration) into the target environment.
ML Sub-System Verification:
In this phase the correct implementation of the tested trained model is verified. The verification could be based on the test
dataset in addition to the other test inputs established for the purpose of the implementation verification.
In this phase, the step by step integration within the overall system is performed and corresponding verification activities
are conducted.
Operation:
In this phase, the overall system containing the ML system is released to field and operated. The behavior of the ML system
is monitored, and the operating data are compiled into an operational dataset which is fed back to Data Selection and
Validation to augment the existing datasets.
4.1 Introduction
4.1.1 Description
There is a concern that existing development assurance standards may not be appropriate for AI/ML solutions. Existing
development assurance standards require a predefined set of activities and demonstrations used later by certification
authorities as evidence of compliance. There is a broad consensus that the currently accepted means of compliance in use for
systems, software, and hardware fail to provide appropriate assurance for some specific AI/ML techniques. These
techniques use methodologies that are fundamentally different from the generic life-cycle assumed by the existing
development assurance standards.
Although these techniques are not governed by a strict methodology and could be assessed in various ways, this section
focuses on a predefined hypothetical development scenario. It does not intend to cover all possible scenarios and
possibilities, but rather tries to highlight the main gaps identified in a factual way among existing development assurance
standards in use today. It is important to note that this is not a guidance on how to certify a system using an AI sub-system.
Many published standards are considered in this section and some of them have been subject to a deeper analysis. The
choice of standards assessed is led by the desire to cover at least the system, software, and hardware aspects of ground and
airborne development assurance standards.
In order to perform the gap analysis, this section employs a methodology whereby two aspects of a development assurance
standard’s objectives are assessed. The first aspect, the objective applicability, relates to how the objective is relevant to
the envisioned life cycle of AI development. The second aspect, the objective sufficiency, relates to how the guidance
provided in the standard is relevant to the envisioned life-cycle of AI development, i.e., it is clear how to apply the guidance
and it is compatible with the envisioned life-cycle activities.
In order to limit the scope of the gap analysis, this section describes a hypothetical development scenario that represents a
common ML-based system development life cycle. It also attempts to identify and map where ML-based system
development activities, described in 3.2, could be performed within actual requirement-based development assurance
standards.
Numerous ML techniques exist in the literature, and each of them has its own particularities impacting development
activities. This development scenario focuses on a classical feed-forward Neural Network (NN).
1. The system requirements allocated to the ML system are captured and validated. The system architecture is also
defined at this stage and corresponding safety assessment and analyses are conducted to identify the relevant system
safety and security requirements and objectives (quantitative, qualitative) to be flowed down to the other phases.
2. The data selection phase, where data is gathered, cleaned, and selected with respect to the requirements defined in the
system definition phase. Data is selected for the purposes of training and validation of the NN. The data needs to be
sufficient to confirm whether the functional requirements have been satisfied.
3. The model selection phase consists of the choice of the NN architecture. It defines the number of layers, number of
nodes for each layer and their corresponding activation function.
4. The offline training and testing of the NN is performed on a separate host/system. The training activity defines the
NN weight values using the training dataset and performs optimization of these weights using the validation dataset with
respect to the acceptance criteria defined in phase 1.
5. After the offline training, the trained model is tested using the test dataset with respect to the acceptance criteria defined
in phase 1 by executing the acceptance scenarios.
6. The implementation phase focuses on the software and/or hardware development activities. It consists of designing,
implementing, and verifying the trained NN in the target environment. A trained NN has passed acceptance criteria and
represents system requirements allocated to software or hardware implementation. Special cases will be considered
where it may apply, particularly in a model-based approach.
7. In this phase, the step-by-step integration within the overall system is performed and corresponding verification activities
are conducted.
ARP4754A/ED-79A discusses the development of aircraft and aircraft systems taking into account the overall aircraft
operating environment and functions. This includes validation of requirements and verification of the design implementation
for certification and product assurance. It provides practices for showing compliance with the regulations and serves to
assist an organization in developing and meeting its own internal standards. ED-79 and ARP4754 also invoke and interact
with the safety assessment process of ARP4761.
The gap analysis has been performed considering Machine Learning techniques, and more precisely Neural Networks
(although the results can be generalized to other ML techniques).
The whole scope of the standard has been considered, however, as described in the scenario of 4.2, the use of AI/ML was
assumed to be primarily at system or item level.
• Development Assurance Level: The concept of development assurance level (DAL) remains relevant. Nevertheless, its
current interpretation for Machine Learning remains to be defined, and cannot be based on DO-178C/ED-12C nor
DO-254/ED-80 processes (cf. gap analysis on DO-178C/ED-12C and DO-254/ED-80.)
• System requirements: The requirements definition approach may need some adaptations for Machine Learning. In
particular, the fact that some of the requirements are implicitly contained in the dataset may require new methods for
evaluating the correctness and completeness of a dataset to the requirements. A specific focus on dataset requirements
and AI/ML performance requirements associated with the DAL may also be needed.
• Requirements validation: New methods for requirements validation could be needed for Machine Learning, because of
the potential changes in the requirements capture philosophy. These new methods should be considered in addition to
the ones defined in the ARP4754A.
• Implementation verification: new methods for implementation verification could be needed for Machine Learning in order
to cope with the probabilistic nature of ML-based algorithms and because of the potential changes in the requirements
capture philosophy.
4.3.2 ARP4761 Guidelines and Methods for Conducting the Safety Assessment Process on Civil Airborne Systems and
Equipment
This document describes guidelines and methods of performing the safety assessment for certification of civil aircraft. It is
primarily associated with showing compliance with FAR/JAR 25.1309. These tools and methods cover both system and
aircraft level safety assessment. The overall aircraft operating environment is considered.
No detailed gap analysis has been performed on this standard. A review led to the conclusion that the processes/principles
for safety assessment should not be drastically impacted by the use of AI or ML techniques. However, some additional
safety methods may be needed to address specifics of AI/ML.
The purpose of the ED-12C/DO-178C standard is to provide guidance for the production of software for airborne systems
and equipment that performs its intended function with a required level of confidence in safety. The standard includes:
• Descriptions of the evidence in the form of software lifecycle data and variations of objectives by software level
The review was made considering the scenario described in 4.2. Note that special cases may benefit from defining the
NN as a whole model instead of only an algorithm and its coefficients. Indeed, ED-218/DO-331, the ED-12C/DO-178C
supplement for model-based development, addresses this particular case and is examined in the next section.
With respect to the scenario, the flow down of the ML model coefficients and weights is equivalent to the normal practice of
equations and algorithms flowed to the software process through requirements. As such, the majority of the
ED-12C/DO-178C processes can be executed without major impact. However, there may be gaps which have been
identified:
1. Based on the hypothesis developed in 4.2, the high-level software requirements capture process foreseen in
ED-12C/DO-178C might not be occurring at the level of the software item but rather originate from the system
development processes. The input to the software design processes would not be a set of functional requirements but
rather the output of the training process, e.g., a NN structure and associated weights and biases.
2. The NN structures and weights cannot be traced to the system requirements from which they are developed (neither to
the system textual requirements describing the expected properties nor to the training datasets). There is no guidance
in this standard and its related documents on how to handle traceability between NN structure and weights and system
textual requirements and training datasets.
3. The ED-12C/DO-178C verification methods are not appropriate to training datasets and NN weights:
a. The properties relevant to design assurance of datasets are fundamentally different from the properties of high level
or low level requirements.
b. NN weights are not comprehensible by humans, so manual review and analysis for compliance to parent
requirements is not credible.
4. Requirements-based testing may not be possible in the traditional manner because typical NN techniques (training,
validation, and testing phases) do not fit with the requirements-based verification approach per ED-12C/DO-178C.
a. In particular, verification on NN that relies on traceability cannot be achieved (the coverage metrics for NN structure
and weights are not established).
b. DO-178C testing strategy calls out an equivalence classes concept which may not apply for NN implementation
due to the highly complex and non-linear nature of a regular size NN. It may be practically impossible to identify
equivalence classes for a NN algorithm.
5. The structural coverage measurement metrics for source code may not be effective because any activation of the NN
would trigger activation of all nodes to some extent. Hence, there is no value in performing a classical structural
coverage approach.
6. Guidance for software planning process is applicable, but there is no specific guidance for NN based systems.
7. Additional consideration is needed for the usage of NN weights as Parameter Data Items (PDI). The NN weights are inherently
different from a traditional configuration file. Using PDI for NN weights would modify the behavior of the approved
configuration and thus represents a serious gap.
ED-218/DO-331 is a supplement to ED-12C/DO-178C that provides guidelines to produce airborne software using
model-based techniques. ED-218/DO-331 defines a model as:
”An abstract representation of a given set of aspects of a system that is used for analysis, verification, simulation, code
generation, or any combination thereof. A model should be unambiguous, regardless of its level of abstraction.”
NOTE 1: If the representation is a diagram that is ambiguous in its interpretation, this is not considered to be a model.
NOTE 2: The “given set of aspects of a system” may contain all aspects of the system or only a subset.
This standard does not remove or eliminate the ED-12C/DO-178C objectives, but rather provides additional guidance and
objectives specific to a model-based development lifecycle. Such lifecycles may share some similarities with our
hypothetical AI development scenario.
Readers of ED-218/DO-331 may conclude that in some instances, NN architecture and weights could be considered a
design model that is shared between the system and the software engineering teams. Also, acceptance tests performed at
the system level can be considered as simulation cases executed on the target software under development.
Further refinements are needed to our hypothetical AI development scenario to address the ED-218/DO-331 specific
aspects of model-based software development.
1. The NN model is shared between the system level (learning phase) and the software level (implementation, integration,
and tests of the Executable Object Code on the target). Note: the acceptance scenarios may be considered as the
simulation cases of DO-331. Some acceptance scenarios may need to be executed with the final sensors, hardware,
and software in non-simulated environments.
2. The textual requirements that define expected properties at the system level, as well as the acceptance criteria of the
NN, are considered part of the system textual requirements from which the NN design model is developed.
With these assumptions, the development and verification activities of the software model, and requirements from which
the software model is developed, are implemented at the system level, but nevertheless all ED-218/DO-331 objectives
applicable to software models should be satisfied (as described in the Note 1 in ED-218/DO-331 MB.1.6.3).
In practice, ED-12C/DO-178C and ED-218/DO-331 are used in conjunction. Therefore, all the previous gaps identified for
ED-12C/DO-178C are still valid. Identified additional gaps pertaining to ED-218/DO-331 are as follows:
1. The NN structures and weights can only be traced to their parent requirements as “many to many.” There is no guidance
in the standards on how to handle traceability between NN structure and weights, and system textual requirements and
training datasets.
2. The DO-331 verification methods are not appropriate for training datasets and NN weights:
a. The properties of datasets are fundamentally different from the properties used in review and analysis, e.g.,
verifiability, completeness, etc.
b. NN weights are not comprehensible by humans, thus traditional review/analysis is not credible.
3. The traceability objectives may not be relevant to traceability between acceptance scenarios and the system textual
requirements, because the acceptance scenarios for NN are typically highly complex and cover multiple requirements.
Without traceability, it may not be possible to provide confidence in detecting the presence of unintended behaviors in
the design model.
4. Requirements-based testing may not be possible in the traditional manner because typical NN techniques (training,
validation, and testing phases) do not fit with the requirements-based verification approach per ED-218/DO-331.
a. In particular, verification of a NN that relies on traceability cannot be achieved (the coverage metrics for NN
structure and weights are not established).
5. The structural coverage measurement metrics may not be effective, as any activation of the NN would trigger activation
of all nodes to some extent. Thus, there is no value in performing a classical structural coverage approach.
ED-215/DO-330 provides guidelines for qualification of tools used for the development of safety-critical systems per
ED-12C/DO-178C or other safety-related standards.
AI can be used for tool development. With the limited timeframe allocated for the gap analysis, ED-215/DO-330 was
excluded from the analysis scope.
ED-216/DO-333 is a supplement to ED-12C/DO-178C, which provides guidelines to produce airborne software using Formal
Methods techniques. Formal Methods are mathematically based techniques for the specification, development, and
verification of software aspects of digital systems.
Formal methods may be applied for the verification of AI. With the limited timeframe allocated for the gap analysis,
ED-216/DO-333 was excluded from the analysis scope.
ED-217/DO-332 is a supplement to ED-12C/DO-178C which provides guidelines to produce airborne software using
Object-oriented and related techniques.
No additional gaps regarding ML/AI were found beyond the ones identified in 4.3.4.2.
The use of increasingly complex electronic hardware for safety critical aircraft functions generates new safety and
certification challenges. DO-254/ED-80 provides design assurance guidance for the development of airborne electronic
hardware such that it safely performs its intended function in its specified environments. The guidance is conveyed through
recommended activities that should be performed from the hardware’s conception through initial certification in order to
meet design assurance objectives.
The review was made mainly considering the hypothetical AI development scenario described in 4.2. In addition to the
general assumptions already defined, the following assumptions have been applied to this analysis:
• The guidance was only assessed against the development of a Programmable Logic Device (PLD) based H/W
implementation.
The flow down of the ML model, including coefficients and weights, could be seen as equivalent to the implementation of
equations or algorithms flowed to the Field Programmable Gate Array (FPGA) process through requirements. However, the
gaps inherent to traditional development assurance approaches are also applicable:
1. Based on the hypothesis developed in 4.2, the requirements capture process foreseen in ED-80/DO-254 might not be
occurring at the level of the hardware item but rather originate from the system development processes. The input to
the hardware design processes (conceptual design and detailed design) would not be a set of functional requirements,
but rather the output of the training process, e.g., a NN structure and associated weights and biases.
2. The NN structures and weights cannot be traced to the system requirements from which they are developed. There is
no guidance in this standard on how to handle traceability between NN structure and weights and system textual
requirements and training datasets.
3. The ED-80/DO-254 validation and verification methods may not be applicable to training datasets and NN weights:
a. The properties of datasets are fundamentally different from the properties used in traditional design assurance (DA)
review and analysis.
b. NN weights are not comprehensible by humans. Thus, traditional review/analysis cannot be used.
4. Requirements-based verification may not be possible in the traditional manner because typical NN techniques (training,
validation, and testing phases) do not fit with the requirements-based verification approach per ED-80/DO-254.
5. The elemental analysis method, which drives detailed design coverage metrics for HDL code, may not be effective, as
any activation of the NN would trigger activation of all nodes to some extent. Thus, there is no value in performing a
classical design coverage or elemental analysis approach.
4.3.9 ED-109A/DO-278A Software Integrity Assurance Considerations for Communication, Navigation, Surveillance, and
Air Traffic Management (CNS/ATM) Systems
Since ED-109A/DO-278A is very close in terms of objectives to ED-12C/DO-178C, the group has made the assumption that
the main outcome of the ED-12C/DO-178C gap analysis will apply to ED-109A/DO-278A. ED-109A/DO-278A has an
assurance level AL4 that has no equivalent in ED-12C/DO-178C. Additionally, ED-109A/DO-278A contains a section
(12.4) devoted to COTS SW, which has no correspondence in ED-12C/DO-178C either. These differences should be
addressed during the definition of the future standard.
The objective of ED-153 is to offer guidance on how to ensure that the risk associated with deploying the software is
reduced to a tolerable level by providing:
• Recommendations and requirements on the major processes necessary to provide safety assurance for software in Air
Navigation Service (ANS) systems (“ground” only).
• A recommended ANS Software lifecycle and its associated activities in support of achieving the identified objectives.
• The ML software was considered as being either standalone software or a component/module of a larger software product.
• In regard to “Sufficiency” (see 4.1.2), ED-153 is composed of sets of objectives relating to software (SW) processes.
The only way it provides guidance is by referencing other standards (refer to ED-153 P.7). These standards were not
part of our review. However, whenever identified, insufficiency of guidance is recorded and part of the gap analysis
summary below.
Scope
• General review of “classical” objectives, linked to primary, supporting, and organizational lifecycles (CH4, CH5, CH6).
Out of Scope
• Online learning is out of scope of the present review. Only pre-trained systems (offline ML) are in scope.
• The Software Assurance Level (SWAL) concept fully applies to “classical” SW but does not fully apply to learning
datasets and AI based SW:
By “classical” SW, we mean SW whose behavior is explicitly described through several description levels, such as
software requirements, architectural design, and detailed design (including algorithms), which are then translated into a
target coding language. With AI (ML) components, these artefacts may not be available. Additionally, learning datasets
are not SW and cannot be allocated a SWAL.
• SWAL4 objectives are fully applicable to an AI based SW, while SWAL1-3 objectives are partially applicable or not
applicable to an AI based SW:
This is because (1) requirements can be specified using the outcome of the Machine Learning pipeline (done at system
level), while artifacts such as the detailed software design cannot, and (2) the depth of analysis, verification, and
evidence requested for SWAL4 is at the SW requirement level (i.e., black box), while it is much more stringent for
SWAL1-3.
• SW development processes are adapted to “classical” SW, not to learning datasets and potentially not to SW resulting
from learning methods.
• There is a need to clarify what is considered as a configuration item in the context of AI based systems.
For example, most verification activities related to the detailed design may not be achievable, as the detailed design is
not available/explainable.
• Quality audits down to source code level and executable level activities are not fully applicable to AI based systems.
The main objective of this regulation is to lay down common requirements for the provision of ATM and air navigation
services and other ATM network functions for general air traffic, and for the competent authorities, which exercise
certification, oversight, and enforcement tasks.
This regulation addresses the approval process for changes to a functional system. A functional system is a combination of
procedures, human resources, and equipment (including HW and SW), organized to perform a function within the context
of ATM/ANS and other ATM network functions. All service providers need to assess changes they make to their functional
system.
A full gap analysis has not been done yet. However, an initial analysis shows that the main gaps are similar to those found
in EUROCAE ED-153 analysis.
4.3.12 ISO 12207 Systems and Software Engineering - Software Life Cycle Processes
ISO/IEC/IEEE 12207 is an international standard for software lifecycle processes. First introduced in 1995, it aims to be a
primary standard that defines all the processes required for developing and maintaining software systems, including the
outcomes and/or activities of each process.
This standard shares many concepts that are similar to ED-12C/DO-178C but is intended for the software industry domain
in general. With the limited timeframe allocated for the gap analysis, it was considered as not essential and excluded from
the analysis scope.
The ISO 26262 series of standards is the adaptation of IEC 61508 for road vehicles. This adaptation applies to the activities
during the safety lifecycle of safety-related systems comprised of electrical, electronic, and software components. ISO 26262
includes guidance to mitigate risks from software systematic failures and random hardware failures by providing appropriate
requirements and processes.
This standard is conceptually similar to the suite of aeronautical standards (ARP4761, ARP4754A, ED-12C/DO-178C, and
ED-80/DO-254) but less detailed and intended for the automotive domain. With the limited timeframe allocated for the gap
analysis, it was considered as not essential and excluded from the analysis scope.
For some systems, which rely on sensing the external or internal environment, there can be potentially hazardous behavior
caused by the intended functionality or performance limitation of a system that is free from faults. Examples of such
limitations include: (1) the inability of the function to correctly comprehend the situation and operate safely; this also includes
functions that use Machine Learning algorithms; (2) an insufficient robustness of the function with respect to sensor input
variations or diverse environmental conditions. ISO 21448 is intended to address the absence of unreasonable risk due
to the potentially hazardous behaviors related to such limitations.
Although this standard has not been reviewed against the scenario described in 4.2, it seems to present interesting concepts
that can be relevant to AI/ML. Among these concepts, it covers aspects that fall outside functional safety: hazards that
cannot be traced to functional failures, especially in hard-to-specify environments. This standard may be considered as
an interesting input for future works.
There are many standards and guidance documents which address the responsibility that each organization has for its own
information security, dealing with internal systems, processes, products, and data. However, this guidance looks beyond
the individual organization at information security related to systems and processes, and to products and data, in a wider
context. This guidance concentrates on the shared information risk, which is inherent in the situation where systems,
processes, products, or data are shared, or are passed from one organization to another. There are varying degrees of
sharing and exchange in these situations, but any sharing or exchange causes additional risk to an organization. This
guidance is most applicable to the larger risks, which affect safety or where there are significant implications for service
delivery. Smaller risks (assuming they have been correctly assessed) require less effort, particularly in mitigations, and this
guidance should be interpreted accordingly.
The guidance in ED-202A adds to current guidance for aircraft certification to handle the threat of intentional unauthorized
electronic interaction to aircraft safety. It adds data requirements and compliance objectives, as organized by generic
activities for aircraft development and certification, to handle the threat of unauthorized interaction to aircraft safety and is
intended to be used in conjunction with other applicable guidance material, including ED-79A/ARP4754A,
ED-135/ARP4761, ED-12C/DO-178C, and ED-80/DO-254 and with the advisory material associated with FAA AC 25.1309
and EASA AMC25.1309, in the context of Part 25 and CS-25. Tailoring of this guidance may allow it to be applicable in
other contexts such as CS-23, CS-27, CS-29, CS-E, Part 23, Part 27, Part 29, and Part 33.
ED-202A guidance material is for equipment manufacturers, aircraft manufacturers, and anyone else who is applying for an
initial Type Certificate (TC), and afterwards (e.g., for Design Approval Holders (DAH)), Supplemental Type Certificate (STC),
Amended Type Certificate (ATC), or changes to Type Certification for installation and continued airworthiness for aircraft
systems, and is derived from understood best practice.
This document provides guidance on a set of methods and guidelines for applicants implementing an Airworthiness Security
Process as specified in ED-202A/DO-326A to address information security for certification of aircraft and its systems. More
specifically, it addresses the activities in the areas of security risk management and security assurance. Applicants and
authorities should consider these methods, and alternative practices if and when they are proposed. Those aspects of
information security that have no safety effect are not in the scope of ED-203A.
Airworthiness security is the protection of the airworthiness of an aircraft from intentional unauthorized electronic interaction.
Intentional unauthorized electronic interaction (also known as "unauthorized interaction" within the scope of ED-203) is
defined as human-initiated actions with the potential to affect the aircraft due to unauthorized access, use, disclosure, denial,
disruption, modification, or destruction of electronic information or electronic aircraft system interfaces. This definition
includes the effects of malware on infected devices and the logical effects of external systems on aircraft systems but does
not include physical attacks or electromagnetic jamming.
The guidance provides methods and considerations for securing airworthiness during the aircraft life cycle. It was developed
as a companion document to ED-202A/DO-326A "Airworthiness Security Process Specification" which addresses security
aspects of aircraft certification and ED-204/DO-355 Information Security Guidance for Continuing Airworthiness which
addresses airworthiness security for continued airworthiness.
There was no gap found regarding ML/AI outside references to objectives linked to development process standards.
This document provides guidance for the operation and maintenance of aircraft and for organizations and personnel involved
in these tasks. It shall support the responsibilities of the Design Approval Holder (DAH) to obtain a valid airworthiness
certificate and the responsibilities of aircraft operators to maintain their aircraft, and to demonstrate that the effects of
information security threats on the safety of the aircraft are confined within acceptable levels. As all information security
threats may have an intentional origin, ED-204 also covers electronic sabotage (as used in AMC 25.1309).
ED-204 is a resource for civil aviation authorities and the aviation industry when the operation and maintenance of aircraft
and the effects of information security threats can affect aircraft safety. It deals with activities that need to be performed in
operation and maintenance of the aircraft related to information security threats.
ED-204 also gives guidance that is related to operational and commercial effects (i.e., guidance that exceeds the safety-only
effects). Thus, it also supports harmonizing security guidance documents among Design Approval Holders (DAH), which is
deemed beneficial to DAHs, operators, and civil aviation authorities. The most comprehensive possible area of the
application of this guidance is deemed to be Large Transport Aircraft programs. However, ED-204 does not make any
assumptions about and is without prejudice to its applicability.
There was no gap found regarding ML/AI. Although this statement is true for a generic AI/ML sub-system, gaps can be
identified when the AI/ML sub-system is a security function. A security function may need some additional properties like
diversity, independence, and isolation that could be hard to demonstrate for an AI/ML sub-system.
4.3.19 ED-205 Process Standard for Security Certification and Declaration of ATM ANS Ground Systems
The security of ATM/ANS ground systems, constituents in use and data, is currently being regulated and security
management is a must for providers of ATM/ANS, who need to address risks with a potential impact on safety, operational
delivery, economic concerns, and others. ED-205 guides stakeholders involved in the protection of ATM/ANS ground
constituents.
In this section, several standards have been considered, and a gap analysis against a typical Machine Learning
development life cycle was summarized. The analysis has focused on system, software, and hardware development
standards with on-board and ground considerations in mind.
In performing this work, defining a typical Machine Learning development life cycle was quite challenging, highlighting a
major concern regarding where the elements of the Machine Learning development life cycle should be addressed within
the usual aeronautical requirements structure. Although many possible scenarios could exist and present gaps other than
the ones identified, we focused on a classic end-to-end scenario in which to perform the analysis.
Moreover, among the main gaps identified, requirements traceability, the mapping of Machine Learning model functions
and parameters between aerospace engineering concerns (such as SRATS, SRATH, and high or low level requirements),
and the application, or lack, of verification methods suitable for datasets raised the most concerns. The identified gaps
highlight that the data-driven paradigm of AI/ML may not be adequately addressed by existing standards.
Lastly, the committee recognizes that as the field of AI/ML in aerospace matures, there will be a need to not only address
gaps in industry standards but also gaps in regulatory standards. While addressing these gaps is out of scope for this
document and the standard it intends to inform, the committee does consider the impact of its work on regulatory standards
and liaises with global policy leaders for discussion on the subject. The concept of AI Licensing, as described in 6.2.2.5, is
one example of where the committee’s work includes high-level discussion on policy with global regulators.
This section lists specific considerations and areas of concern for an ML system.
This identification has been conducted considering the workflow proposed in 3.2 and the issues identified in the gap analysis
across Section 4.
The specific considerations and areas of concern described in this section are technical in nature. Ethical and societal
considerations are not addressed.
5.1 Criteria for the Identification of Specific Considerations and Areas of Concerns
The following criteria were considered when identifying the specific concerns for each ML workflow phase.
• Safety
• Configuration Management
• Security
• Reliability
During this phase, the system requirements, allocated to the ML system, are captured and validated.
The system architecture is also defined, and corresponding safety assessments and analyses are conducted to identify the
relevant system safety requirements and objectives (quantitative and qualitative) to be flowed down to the other phases.
• There is a growing consensus in the aviation domain that the capture of requirements for intended functionalities to be
developed within an ML system will still be performed following current methodologies (e.g., ARP4754A/ED-79A). These
reports also acknowledge that in a data-driven ML system, the data on which the ML system is trained is considered
a requirement because it “represents” the expected result and desired behavior of the ML system. The data also
contains some variability that may lead to unintended and unexpected behavior of the ML system.
• In an ML system, it is increasingly understood that the data gathered for training purposes will have the greatest impact
on the operational performance of the system. However, there is not yet an agreed upon approach to assure that the
data gathered is sufficient to train a suitably performant system. In an ML approach, capture of system requirements
will still be necessary, though current approaches would need to be adapted, especially in the case of derived
requirements.
• Requirements definition and validation processes should avoid or detect any requirements that may lead to
non-representative dataset selection for the AI model training phase.
• The probabilistic nature of ML applications should be considered as part of the standard safety process analysis and
relevant architectural mitigation (bounding, voting, diversity, etc.) should be flowed down to the ML application phases.
During this phase, the data is gathered, cleaned, and selected (using any of several established techniques), with respect
to specific expected quality attributes, and verified against the system requirements established in the previous phase. At
this phase, the dataset is divided into Training, Validation, and Test sets.
o Sufficiency (enough data to train the system to the desired level of accuracy)
o Representativeness (regarding the foreseeable environmental conditions of the intended operation, based on data
attributes defined in the requirements set, and with regards to the capabilities of the target environment)
o Balance (enough samples in each class and sufficient diversity among classes)
o Fairness (bias avoidance based on information presented in the set of requirements specifying which data is
selected)
o Timeliness
o Integrity (a degree of assurance that the data and its value has not been lost or altered, in order to avoid misleading
the ML model training and any resulting adverse behavior)
• The above attributes are applicable to all datasets (training, validation, and test) used for the ML-based system design.
• The above attributes must be assured, whatever the source of the data, e.g., real world, synthetic, or augmented.
• The ML-based system should be tested with a test dataset different from the training dataset. The level of independence
between training dataset and test dataset should be considered.
• A key challenge with supervised ML is how well the training datasets are organized to support a good fit of the learning
model, avoiding both overfitting and underfitting.
• The process and activities to manage the different datasets and their data under configuration management should be
defined, including the aspects related to the tracking and management of changes, and the addition of data after the
certification or approval of the ML system.
• For investigation purposes, the incoming in-service operational data should be traceable to their origin.
• Tool qualification should be considered when dataset quality attributes may be altered, by inclusion of synthetic or
augmented data, or by automated labelling. (Specific considerations will have to be established when ML is used for
tool development.)
• The impact of the sampling method (statistical, random, systematic, etc.) used to select the test dataset, on the
representativeness of the data, should be considered.
• The dataset management environment should be secured in order to avoid adversarial or accidental data poisoning
and tampering. Attackers with access to the datasets used to train systems utilizing AI can influence the training process
by tampering with the data or the parameters used by the system. Poisoning attacks involve gradually introducing
carefully designed samples or “perturbations” that avoid setting off alerts, but eventually fool the system into seeing the
changes as natural rather than abnormal. This affects its decision-making capability and produces misleading results.
During this phase, the model is selected and trained with the training dataset and is optimized using the Validation Dataset.
At the end of this phase, the trained model is tested with the test dataset to assess its performance against the system
requirements.
• The training strategies (learning curves study, cross validation, models, feature selection, resampling, random restart,
etc.) that increase generalization should be prioritized to improve the model’s performance on new data.
• The trained model integrity should be evaluated from a safety and security perspective.
• Considerations about how to gain confidence into COTS training framework (HW/SW) should be established.
• Considerations related to the qualification of the tools used in training environments should be established.
• The process should give confidence in the choice of an adequate network model architecture (including model topology,
number of layers, and number of nodes per layer), training dataset, and training techniques. Model properties
(explainability, accuracy, safety, etc.) should be identified when selecting and training the model.
• The level of detail expected for the explanation of the model should be identified according to the expected level of
safety required by the ML system.
• The impact of re-training the ML system on testing and certification activities is to be considered, especially with respect
to potential credit for previous training and testing activities.
• The training process should ensure performance repeatability per ML system functional and performance requirements.
• The training environment should be secured in order to avoid adversarial or accidental data poisoning and tampering
during the training. Evasion attacks modify input data so that the system cannot correctly identify the input,
misclassifying data and effectively rendering the system unavailable. One well-cited example involves fooling image
processing systems into incorrectly identifying images (for instance, a traffic stop sign is mistakenly detected as a speed
limit sign).
• The training process should guard against common design pitfalls, including models that lead systems utilizing AI to
make bad decisions by overfitting or underfitting the decision-making model to the data analyzed. Modelling the
available data too closely leads to overfitting, while not matching it closely enough leads to underfitting. This means
that the system produces either too-specific or too-general decisions, rather than providing the right balance between
certainty and doubt.
• The use of pre-trained models is a current practice for saving time or getting better performance in the training phase.
Considerations related to the validation of these previously trained or COTS models should be established.
• Re-training/re-validation strategies of an ML system with in-service collected data and the frequency of retraining should
be considered as part of training phase activities.
• The trained model and its weights and hyper-parameters should be managed as part of the configuration of the system.
During this phase, the tested and trained model is implemented (including design and HW/SW integration) into the target
environment.
• The implementation of SW robustness strategies is made difficult due to the lack of capability to specify abnormal or
invalid input/output data.
• The implementation of the system utilizing AI using common means of compliance for traditional SW or HW systems
based on traceability to requirements is made difficult because machine-learnt models, such as neural networks,
cannot be traced.
• Differences between executing the model implementation in the target environment and executing the model on
the Model Selection, Training, and Testing environment are to be identified, including data representation, resources,
performances, non-functional aspects, system integration aspects, and other relevant areas.
• Instrumentation for multiple purposes should be considered: (1) to ease verification and explainability (the extent to which cause and effect can be observed within a system), and (2) to record ML system behaviors in support of post-operation analysis.
• The level of detail expected for the explanation of the implementation of the trained model should be identified according
to the expected level of safety allocated to the ML system.
• Detection of unexpected features, introduced at the time of the implementation, cannot be supported by traceability and
human verification as in traditional software development, so an alternate approach should be established.
During this phase, the correct implementation of the tested and trained model is verified. The verification could be based on
the test dataset in addition to the other test inputs established for the purpose of the implementation verification. The test
data established to verify the correctness of the model training can be used to verify the implementation of the ML
sub-system.
• Verification completion criteria should not only show that requirements are met, but should also establish confidence that the potential for unintended behavior is minimized. The structural coverage criteria defined for traditional software are not usable for ML system testing activities. Dedicated systematic criteria (input data space coverage, node coverage, etc.) should be established.
• Implementation and verification of “play back” features to analyze, verify, and validate results should be considered.
• Difficulty of establishing completeness of verification of ML models without the use of automated and qualified tools
should be considered.
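The node-coverage criterion named above can be sketched as follows (the network, its random weights, and the activation threshold are illustrative assumptions, not prescribed by this document): coverage is the fraction of hidden neurons activated by at least one test input.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 2-layer ReLU network with fixed random weights,
# standing in for a trained ML sub-system under test.
W1, b1 = rng.normal(size=(4, 10)), rng.normal(size=10)

def hidden_activations(x):
    """Hidden-layer activations of the toy network for one input."""
    return np.maximum(0.0, x @ W1 + b1)

def node_coverage(test_inputs, threshold=0.0):
    """Fraction of hidden neurons activated above the threshold by
    at least one test input (a simple neuron-coverage criterion)."""
    acts = np.stack([hidden_activations(x) for x in test_inputs])
    covered = (acts > threshold).any(axis=0)
    return float(covered.mean())

tests = rng.normal(size=(50, 4))
cov = node_coverage(tests)
print(f"node coverage: {cov:.0%}")
```

A low coverage value would indicate that parts of the network were never exercised by the test set, analogous to unexecuted code in traditional structural coverage.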
During this phase, the step-by-step integration with the overall system is performed and the corresponding verification activities are conducted.
• During the integration of the ML sub-system into the overall system, the verification of the architectural mitigation as
identified in 5.2.1 is to be performed.
5.2.7 Operation
• During this phase, the overall system containing the ML subsystem or component is released to the field and operated.
The behavior of the system is monitored and the operating data is compiled into an operational dataset which is fed
back to the data selection and validation phase and potentially used to augment the existing datasets. In-service
operational data (input data, corresponding ML system prediction data, observed error rates, and ML monitoring data)
should be recorded for post-deployment analysis by users, ML systems, and authorities.
• In-service operational data should be transferred to the ML system manufacturer for in-service analysis and potential
improvement of the training, validation, and test datasets.
• The in-service operational data management environment should be secured in order to avoid adversarial or accidental
data poisoning and tampering. Attackers with access to the collected operational data can influence post-deployment
data analysis and potential ML System re-training.
• In some specific cases and under certain conditions, the possibility of local update of ML system parameters should be
considered.
This section clarifies the scope of the standards and guidance material to be delivered by G-34/WG-114 and describes
potential approaches to be further considered by the committee for inclusion in future revisions. The intent of the G-34/WG-114 standards is to serve as a means of compliance for the certification and approval of products utilizing AI.
NOTE: Since this section is elaborated in parallel with the rest of the Statement of Concerns, it does not address all of the
concerns identified in Section 5.
The intent of G-34/WG-114 is to write standards addressing both airborne and ground systems. It is acknowledged that
those domains will involve different considerations, and that the requirements shall be adapted to each considered system.
The need for AI-specific activities on tool qualification 1 will also be considered within scope.
1 According to ED-215/DO-330, a tool is a computer program or a functional part thereof, used to help develop, transform, test, analyze, produce, or modify another program, its data, or its documentation. Examples are automated code generators, compilers, test tools, and modification management tools.
The scope of the standards and guidance material to be delivered by G-34 is limited to the components utilizing AI that are
not already covered by existing standards. This material should be compatible with existing standards to enable integration
of components utilizing AI in broader airborne and ground systems.
As a first step, it is suggested to limit the scope of the future standard to Machine Learning. Therefore, the remainder of
Section 6 is focused on Machine Learning, even if parts of its content may be applicable to other AI techniques. Additional
AI techniques will be addressed in subsequent revisions of G-34/WG-114 standards.
Online learning raises specific certification/approval issues if the system utilizing AI continues to learn while in operational use.
In accordance with the above-mentioned concerns, a staged approach is recommended for continuing AI standards
development:
• At first, offline ML training can be addressed. The scope includes both the use of ML at the initial system design phase
(before the first deployed system) and later offline ML retraining based on an extended or updated dataset (e.g.,
collected from the deployed system).
• Secondly, online learning AI can benefit from the framework for offline learning but will also bring new questions that
may prompt a paradigm shift towards certification/qualification for machine systems and their relationship with humans.
This would be addressed in subsequent revisions of the standards, or the development of yet a new standard.
The Statement of Concern’s primary focus is on offline learning. Some online learning concerns have been included, but
not all concerns with online learning are covered.
6.1.3 Autonomy
It is expected that AI techniques will also be used to increase the level of autonomy of future products. Defining a framework
for autonomous operations is beyond the scope of this standard. Liaison with existing and future working groups on
autonomy should be established to ensure that future AI certification/approval standards answer their needs.
6.1.4 Cybersecurity
New technologies often come with additional cybersecurity risks related to new vulnerabilities, and Artificial Intelligence is
no exception to this rule. For example, Machine Learning techniques are vulnerable to data poisoning attacks, adversarial
examples attacks, or more typical backdoor attacks. For many systems, these cybersecurity risks and associated
vulnerabilities can have an adverse impact on safety. Therefore, cybersecurity cannot be overlooked for the certification
of AI.
G-34/WG-114 should:
• Identify the cybersecurity vulnerabilities that are specific to systems utilizing AI and check if existing standards enable
identifying and managing these vulnerabilities,
• Develop the necessary guidance or ask other working groups to create or update relevant standards if some
vulnerabilities are not addressed by existing standards,
• Liaise with cybersecurity standardization working groups, for example SAE G-32, RTCA SC 216, and EUROCAE
WG-72, in order to ensure consistency of the standards.
Since digital technologies, and AI in particular, are in certain respects considered a breakthrough, many regulators and international bodies have issued high-level requirements to promote, but also to regulate, these technologies with respect to democratic values. This is the case in Europe with the Ethics Guidelines for Trustworthy AI published by the European Commission (see more details below). In the U.S., the White House issued an Executive Order on February 11, 2019, on “Maintaining American Leadership in Artificial Intelligence” 2. The United States Congress has also issued a specific guideline titled “Artificial Intelligence and National Security” 3 for dealing with defense issues. Among international bodies, the OECD published the OECD AI Principles 4 and the G20 published the G20 human-centered AI Principles 5.
In Europe, the European Union is promoting trustworthy AI systems, a concept introduced in the “Ethics Guidelines for Trustworthy AI” 6 produced by the High-Level Expert Group on Artificial Intelligence set up by the European Commission (AI HLEG Guidelines). Trustworthiness has three components, which should be met throughout the system’s entire life cycle:
• The AI system should be lawful, complying with all applicable laws and regulations,
• The AI system should be ethical, ensuring adherence to ethical principles and values, and
• The AI system should be robust, both from a technical and social perspective since, even with good intentions, AI
systems can cause unintentional harm.
The adherence to these three components should be checked through a trustworthiness analysis, as defined in the EASA
AI Roadmap 1.0, published in 2020.
With regard to the first component, lawfulness, existing aviation regulation is sufficient to meet AI HLEG recommendations
and guidance. Industry standards dedicated to AI based systems in aviation are an important element in demonstrating
compliance with the lawfulness principle. No other specific consideration is deemed necessary for AI based systems within
EUROCAE’s scope.
Regarding the two other components of Trustworthy AI, ethics and robustness, the AI HLEG Guidelines establish four principles: respect for human autonomy, prevention of harm, fairness, and explicability. These are further translated into seven key requirements, as described by the HLEG Guidelines:
2 https://www.whitehouse.gov/presidential-actions/executive-order-maintaining-american-leadership-artificial-intelligence/
3 https://fas.org/sgp/crs/natsec/R45178.pdf
4 https://www.oecd.org/going-digital/ai/principles/
5 https://www.mofa.go.jp/files/000486596.pdf
6 https://ec.europa.eu/digital-single-market/en/news/ethics-guidelines-trustworthy-ai
• Human agency and oversight
• Technical robustness and safety
• Privacy and data governance
• Transparency
• Diversity, non-discrimination, and fairness
• Societal and environmental well-being
• Accountability
The AI HLEG Guidelines also introduce available methods for the implementation of these requirements throughout the AI system’s life cycle. For requirements that are directly applicable to aviation systems (e.g., technical robustness, safety, and
cybersecurity), it is expected that the trustworthiness analysis will result in functional and non-functional system
requirements as well as organizational requirements that should be implemented and verified (using AI HLEG methods
and/or existing/adapted industrial processes of the aviation sector). It should also be noted that some aspects of these
seven AI HLEG key requirements may be out-of-scope for SAE-G34 and EUROCAE WG-114 and may require specific
rulemaking tasks and/or other industry standards.
New inputs from the AI HLEG should be monitored by G-34/WG-114 to check the applicability of the latest updates to the Committee’s standards.
NOTE: The terms used in this document (i.e., statement of concern) are not necessarily the same as those present in the
trustworthiness high level requirements. Refer to the original cited documents for clarification.
It is expected that certification/approval of products utilizing AI will rely on several activities, performed fully or partially at
each phase of the product lifecycle, depending on the AI technique used, on the system architecture, and on the safety
assessment.
Paragraph 6.2.1 describes the phases of an ML-based product lifecycle, in order to clarify the steps of the development
process. Then 6.2.2 lists the possible certification/approval activities that may be performed to show compliance of the
ML-based product with the requirements. Finally, 6.2.3 provides an overview of the complete certification process. The
certification/approval approach proposed in this section is an initial proposal that is expected to evolve over time. This
approach has been elaborated taking into account the development assurance processes and the methods for compliance
demonstration described in existing standards, while trying to address the specific aspects of Machine Learning.
For products implementing ML, the proposed lifecycle is described in Figure 7. This lifecycle should support a common
standard for ground and airborne systems. ML can be used for developing simple software algorithms, but also for
developing complete functions or even systems. ML model development requiring system expertise should be addressed
at the system level. For this reason, this section proposes a process that includes the system level, and is not limited to a
sub-system, component, or item level.
NOTE: This process is not fully linear but iterative. In particular, the safety assessment process spans the lifecycle and is
updated throughout.
This activity defines the system requirements and allocates them to the ML sub-systems and all other hardware and software
components. It also validates that requirements are correct.
Several types of requirements are listed as part of this activity: functional requirements, customer requirements, operational
requirements, performance requirements, physical and installation requirements, dataset requirements, maintainability
requirements, interface requirements, and any additional requirements deemed necessary for the specification of the
intended function of the ML sub-system. In particular, requirement capture should include the capacity of the system to
detect that the ML sub-system is not being exercised in an operational domain consistent with the training dataset used.
• For ATM/ANS systems: (EU) 2017/373 and its AMC/GM, Eurocontrol SRM, IEC 61508, and ED-153.
Examples of ML-specific requirements can be as follows: required probability of success of each function, characteristics of
the data input to the ML sub-system in the target environment (interface requirements), robustness requirements, i.e.,
criteria for correctness of input data and expected behavior in case of incorrect data, etc.
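The operational-domain detection capacity mentioned above could, for example, be refined into a simple monitor of the following kind (the per-feature z-score envelope and its bound are illustrative assumptions; a real system would use a richer out-of-distribution test):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical training dataset: two input features of the ML sub-system.
train = rng.normal(loc=[0.0, 5.0], scale=[1.0, 0.5], size=(1000, 2))

# Simple operational-domain envelope fitted on the training data.
mu, sigma = train.mean(axis=0), train.std(axis=0)

def in_training_domain(x, z_max=4.0):
    """True if every feature of x lies within z_max standard deviations
    of the training mean, i.e., the input resembles the training data."""
    return bool(np.all(np.abs((x - mu) / sigma) < z_max))

nominal = in_training_domain(np.array([0.2, 5.1]))   # in-domain input
outlier = in_training_domain(np.array([9.0, 5.1]))   # feature 1 far outside
print(nominal, outlier)
```

The monitor's verdict could feed an architectural mitigation (e.g., falling back to a non-ML channel) whenever the ML sub-system is operated outside the domain covered by its training dataset.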
The safety assessment aims at identifying the failure conditions of the system associated with the loss of or malfunction of
the system (including ML sub-system(s)), associated hazards, their effects, and the rationale for their classification (e.g.,
minor, major, hazardous, catastrophic). The safety assessment enables the identification of top-level safety requirements
for the whole system. The safety assessment is then refined and updated throughout the system development process
considering the selected system architecture. It establishes the safety requirements and the Design/Software Assurance
Level assigned to the ML sub-system, determines if the proposed architecture can satisfy the identified safety objectives,
and provides the evidence that the safety requirements are met. The resiliency aspect can be considered as part of the safety assessment, with its three components: protection, detection and response, and recovery.
• For ATM/ANS systems: (EU) 2017/373 and its AMC/GM, Eurocontrol SRM, IEC 61508, and ED-153.
This activity establishes the system architecture that includes the interfaces to the ML sub-system as well as potential
technical constraints. The safety requirements allocated to the ML-based sub-system and its interfaces with other
components should be considered (e.g., redundancy and monitoring identified during the safety assessment process).
• For ATM/ANS systems: (EU) 2017/373 and its AMC/GM, Eurocontrol SRM, IEC 61508, and ED-153.
Data will be the cornerstone of ML-based sub-system development: the final behavior of the component will be almost fully determined by the selected data. The goal of this activity is to select and prepare the data (cleaning, curation, labelling, normalization, etc.) in order to achieve the expected quality attributes (representativeness, lack of undesired bias, timeliness, etc.). Once the data is selected and prepared, data validation consists of verifying that the desired quality attributes are indeed present, that the data has not been altered, and that the data is adapted to the use case. Validation can, for example, be performed by systematic checks of certain attributes, by sampling, or by cross-checks.
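A systematic attribute check of the kind described above might look like the following sketch (the record fields, value range, and label set are hypothetical):

```python
# Illustrative data-validation step: after selection and cleaning, each
# record is checked for quality attributes before it enters a dataset.
def validate_records(records, valid_labels):
    """Return (accepted, rejected) after systematic attribute checks."""
    accepted, rejected = [], []
    for rec in records:
        ok = (
            rec.get("value") is not None          # no missing measurements
            and 0.0 <= rec["value"] <= 100.0      # plausible physical range
            and rec.get("label") in valid_labels  # labelling is consistent
        )
        (accepted if ok else rejected).append(rec)
    return accepted, rejected

records = [
    {"value": 42.0, "label": "nominal"},
    {"value": -5.0, "label": "nominal"},   # out of range: rejected
    {"value": 12.5, "label": "unknown"},   # unrecognized label: rejected
]
accepted, rejected = validate_records(records, valid_labels={"nominal", "fault"})
print(len(accepted), len(rejected))
```

In practice such checks would be complemented by sampling and cross-checks, as the text notes, and the rejection log itself becomes evidence for the data validation activity.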
Model selection and training are crucial to the correct design of the ML-based sub-system. There are many models available,
and many ways to train and optimize a model (e.g., pruning). The standard should not impose a specific algorithmic
approach or a specific training technique. The focus should rather be on specifying the minimum criteria to be achieved by
the model after training. These minimum criteria are currently not addressed by existing standards.
Once the model has been selected and trained using the “training dataset,” it should be validated and optimized using a
second dataset called “validation dataset.” At the end of this phase, the resulting model is tested with a third dataset called
“test dataset” to check that it behaves as required.
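The three-dataset discipline described above can be sketched as follows (the 60/20/20 proportions and the shuffling seed are assumptions for illustration, not a recommendation of this document):

```python
import random

def split_dataset(samples, seed=0, train_frac=0.6, val_frac=0.2):
    """Shuffle and split samples into disjoint training, validation,
    and test datasets, matching the three roles described in the text."""
    samples = samples[:]                   # avoid mutating the caller's list
    random.Random(seed).shuffle(samples)
    n = len(samples)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return (
        samples[:n_train],                 # training dataset: fit the model
        samples[n_train:n_train + n_val],  # validation dataset: tune/select
        samples[n_train + n_val:],         # test dataset: final check only
    )

train, val, test = split_dataset(list(range(100)))
print(len(train), len(val), len(test))
```

Keeping the test dataset untouched until the final check is what gives its result evidential value: a model tuned against the same data it is tested on can mask overfitting.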
Once the learning phase is complete (e.g., model selection, training, and validation), the architecture and parameters of the ML product are defined. Software and electronic hardware design consists of developing the requirements and ensuring that they are in line with the requirements standards. It should be understood that component verification will be based on those requirements. Partially or completely derived requirements are fed back to the safety process for analysis. Software and electronic hardware architectures are also defined and their compliance with standards is ensured. The implementation phase consists of implementing the ML sub-system according to this definition. During the implementation phase, the following steps are carried out as applicable:
• Hardware production
• Hardware/Software integration
This implementation phase shares common characteristics with that of traditional systems, and existing development standards may be of interest for this phase:
• For ATM/ANS systems: (EU) 2017/373, Eurocontrol SRM, IEC 61508, ED-109A/DO-278A, ED-218/DO-331, and
ED-153.
Some implementation errors could be introduced during this phase (e.g., representation of the computed weights on the target, incompatible performance resources, degraded accuracy and performance if the model is translated into another programming language, insertion of additional code, incompatibility of the executable object code with the target machine, etc.). These errors are not specific to ML, but they may jeopardize the expected behavior (e.g., prediction or execution time). Hence, attention is needed to ensure correct implementation of the ML-based sub-system. Compliance with the previously cited standards could be one way to achieve this.
NOTE: Application of Model-Based Development and Verification based on ED-218/DO-331 is presented in Section 4.
The purpose of this step is to check that the requirements are indeed fulfilled by the ML-based sub-system. Various
verification strategies can be adopted, depending on the verification means available: testing (including massive testing,
adversarial testing, robustness testing, etc.), formal methods, service experience, etc. The verification challenges are tightly
linked to the ML specification challenges.
ML-based sub-systems should also provide confidence in the absence of unintended behaviors. Unintended behaviors
should be identified during software verification activities (including specific verification techniques coming from research
such as neural network coverage) and mitigation should be put in place if needed. More investigation on this topic should
be carried out during the development of future AI standards.
Verification may be performed at the system level and/or at component level. The adequate balance between system-level
and component-level verification should be identified.
Once the ML-based sub-system has been designed, implemented, and verified, it can be integrated in the broader system.
The ML-based sub-system is put together with other components of the system, and integration is verified at the system
level.
• For ATM/ANS systems: (EU) 2017/373 and its AMC/GM, Eurocontrol SRM, IEC 61508, and ED-153.
NOTE: Similarity analysis or service history may facilitate the verification at the system level.
The system is maintained and potentially repaired or updated. Offline learning may be used by recording the operating
environment. The recording is then used to enhance the training offline, with the enhanced model verified before it is
uploaded onto the aircraft during a maintenance process or procedure. All ML system sensors are maintained (e.g.,
scratches repaired, sensors surface cleaned and polished, replaced sensors calibrated).
In-service operational data (input data and corresponding ML system prediction data) should be recorded for in-service
issues post analysis by users, by ML systems, and by authorities.
In-service operational data should be transferred to the ML system manufacturer for in-service analysis and potential
improvement of the training/validation and test datasets.
The in-service operational data management environment should be secured in order to avoid adversarial or accidental
data poisoning and tampering.
The certification/approval of systems utilizing AI could be achieved through various strategies, involving different ML
development assurance activities. Each activity may contribute to the overall demonstration of compliance to applicable
regulation. The following potential activities are put forward as potential means and methods of compliance, for further
consideration by the committee as part of the standard development.
The committee further acknowledges that the application of these approaches may not fully result in a successful
demonstration of compliance without the evolution of regulatory policy. As mentioned in 4.4, the committee is engaged in
continued high level discussions with global regulators and other policy leaders on how AI/ML may impact and evolve
certification regulation.
Assurance can be defined as “The planned and systematic actions necessary to provide adequate confidence and evidence
that a product or process satisfies given requirements” (ED-12C and ED-109A/ED-153).
Learning Assurance is expected to be the adaptation of well-known development assurance approaches to Machine
Learning. As the learning phase is not addressed by existing standards, a new development assurance process is needed
for this phase.
Learning assurance aims at ensuring that learning errors are detected and removed through the application of best engineering practices. This includes consideration of the specification, data quality and selection, and the design and verification of ML models.
The complete learning assurance process still needs to be defined through identification of ML development best practices.
The level of rigor of learning assurance should be adjusted depending on the selected Design Assurance Level, and on the
other activities performed.
Learning assurance only applies to the learning phase and is therefore only one of the activities needed as part of the full
development assurance process.
According to ED-216/DO-333, “Formal methods are mathematically based techniques for the specification, development,
and verification of software aspects of digital systems.”
For some ML algorithms, it might be possible to use formal methods to demonstrate the compliance of an AI implementation
with a given set of requirements or mathematical properties.
Such an approach has already been trialed on Neural Networks to demonstrate that the system satisfies specific safety
requirements (G. Katz, C. Barrett, D. Dill, K. Julian, and M. Kochenderfer, “Reluplex: An Efficient SMT Solver for Verifying
Deep Neural Networks,” arXiv:1702.01135 [cs], Feb. 2017).
Even if these approaches are currently limited to specific topologies of small size, their main advantage is that they can be applied after training; they could therefore enable verification of the outputs of the learning phase without knowledge of the training activities performed (except for the aspects used as assumptions in the formal demonstration).
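As a much simpler illustration of the same idea (not the SMT-based Reluplex approach cited above), interval bound propagation can prove an output bound for every input in a box, something sampling-based testing cannot do. The network weights and input box below are toy assumptions:

```python
import numpy as np

# Tiny 2-input, 2-hidden-neuron ReLU network with a single output.
W1, b1 = np.array([[1.0, -1.0], [0.5, 2.0]]), np.array([0.0, -1.0])
W2, b2 = np.array([[1.0], [-1.0]]), np.array([0.5])

def affine_bounds(lo, hi, W, b):
    """Sound lower/upper bounds of x @ W + b over the box [lo, hi]."""
    W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
    return lo @ W_pos + hi @ W_neg + b, hi @ W_pos + lo @ W_neg + b

lo, hi = np.array([-1.0, -1.0]), np.array([1.0, 1.0])   # input box
lo, hi = affine_bounds(lo, hi, W1, b1)
lo, hi = np.maximum(lo, 0), np.maximum(hi, 0)           # ReLU is monotone
out_lo, out_hi = affine_bounds(lo, hi, W2, b2)

# If out_lo exceeds a safety threshold, the property holds for EVERY
# input in the box, not merely for the inputs that happened to be tested.
print(float(out_lo[0]), float(out_hi[0]))
```

The bounds are conservative (they may be looser than the true output range), which is exactly why a proof obtained this way is sound: any property established on the bounds holds for the real network.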
6.2.2.3 Testing
Testing aims at demonstrating compliance of the system utilizing AI with the requirements through various types of tests.
As for traditional system development, requirement-based testing is recommended.
In addition to traditional testing methods, at least two additional testing approaches can be considered for Artificial
Intelligence products:
• Random testing: this approach consists of testing the product by generating a large number of random independent
inputs, and checking the corresponding outputs, in order to verify and demonstrate the performance requirements of
the system. The tests should be performed in a representative environment and with sufficient coverage to achieve
statistical significance.
• Robustness and adversarial testing: In addition to traditional functional robustness related to unexpected or outlier inputs, the robustness of an ML system relates to the robustness of the ML inference against any variability of the input data compared to the data used during the learning process. Perturbations can be natural (e.g., sensor noise or bias), due to failures (e.g., invalid data from degraded sensors), or malicious insertions (e.g., pixels modified in an image) intended to fool the model's predictions. Perturbations can also be defined as true data locally different from the original data used for model training, which might lead to a wrong prediction and incorrect behavior of the system. Adversarial testing is targeted at exercising the system's robustness to adversarial examples and may also be used to detect overfitting.
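An adversarial test of the kind described above can be sketched on a toy linear scorer (the weights, input, and perturbation budget epsilon are illustrative assumptions). For a linear model, the worst-case L-infinity-bounded perturbation has a closed form, which makes the robustness check explicit:

```python
import numpy as np

# Hypothetical linear decision function: positive score = class A.
w, b = np.array([1.0, -2.0, 0.5]), 0.1

def score(x):
    return float(x @ w + b)

def adversarial_input(x, epsilon):
    """Worst-case L-infinity perturbation of size epsilon that decreases
    the score of a linear model (an FGSM-style step)."""
    return x - epsilon * np.sign(w)

x = np.array([0.3, -0.2, 0.4])            # nominal input, scored positive
x_adv = adversarial_input(x, epsilon=0.5) # bounded adversarial variant

# The robustness property under test: the decision must not flip.
robust = (score(x) > 0) == (score(x_adv) > 0)
print(score(x), score(x_adv), "robust:", robust)
```

For nonlinear models such as neural networks the perturbation direction is usually taken from the loss gradient instead, but the structure of the test, a bounded perturbation followed by a decision-consistency check, is the same.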
NOTE: Associated performance and robustness requirements should be properly defined as part of the specification of the
product.
Such approaches are now made possible by the exponential increase in available computing power, by improvements in simulation means, and by progress in adversarial example generation.
The criteria for assessing the statistical significance of these testing approaches are still to be specified in the context of
safety-critical products utilizing AI.
If testing relies on a dataset, the dataset used should be checked and verified in a manner similar to the other datasets used in ML development.
6.2.2.4 Explainability
NOTE: The difference between explainability and interpretability has been debated for years within the scientific community, without a clear conclusion on their respective definitions. This document therefore uses the single term explainability to cover the whole concept of explainability and interpretability.
In traditional products (not utilizing AI), explainability is inherently built into the architecture, and requirements are directly implemented. The implementation requirements trace directly to the system requirements as well as to the developed code. The implementation is therefore fully explainable as a result of the direct traceability from the system definition down to the implementation code. In contrast, some classes of products utilizing AI, and ML-based products in particular, may exhibit behavior that cannot be directly traced from the requirements to the implementation code. The transition from system requirements to the ML model, with its weights, is usually an automated learning process that does not preserve the traceability path from system requirements to the ML model architecture and definition. The translation from the requirements to the learned model during the learning process may not be fully understandable, and therefore the learned model and its weights are not explainable.
In this context, explainability of the behavior of a product utilizing AI is an important characteristic needed to support means
of compliance at several stages of its lifecycle:
• At certification/approval: an authority should be given explanations of how the system works, and the verification evidence that it meets its requirements, in order to accept it.
• In operation: the user of the system should have sufficient understanding of the system’s behavior to be able to use it as intended.
• After an in-service occurrence: the manufacturer should be able to reproduce what happened and explain the causes
of the occurrence (large capacity data recording could be necessary).
Various levels of explainability exist, and it is acknowledged that complete explainability of the full AI decision-making
process may be neither achievable nor necessary to meet the goals previously listed.
Explainability may be inherent to the model used (e.g., rule-based systems) or be obtained through investigation techniques
applied to a black-box model (e.g., sensitivity analysis applied to ANN).
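The sensitivity analysis mentioned above can be sketched as a black-box probe (the model and the input values are stand-ins): each input feature is perturbed in turn, and features are ranked by the magnitude of the output change, without ever inspecting the model's internals.

```python
import numpy as np

def opaque_model(x):
    # Stand-in for a trained ML component; the analysis below treats it
    # strictly as a black box.
    return np.tanh(3.0 * x[0]) + 0.1 * x[1] + 0.0 * x[2]

def sensitivities(f, x, eps=1e-4):
    """Finite-difference sensitivity of f to each input feature at x."""
    base = f(x)
    out = []
    for i in range(len(x)):
        xp = x.copy()
        xp[i] += eps
        out.append(abs(f(xp) - base) / eps)
    return np.array(out)

s = sensitivities(opaque_model, np.array([0.1, 1.0, 2.0]))
ranking = np.argsort(s)[::-1]
print(s.round(3), "most influential feature:", int(ranking[0]))
```

Such a ranking is a local explanation: it describes influence near the probed input only, which is one reason complete explainability of the full decision-making process may be out of reach, as the text acknowledges.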
NOTE: Explainability is also beneficial for cybersecurity assessment to understand if the model has learned a vulnerability.
6.2.2.5 Licensing
The concept of licensing is mentioned here even though it is highly unlikely that Civil Aviation Authorities could give certification/approval credit to such an approach in the short term.
The concept of licensing has been introduced by NASA in “Certification Considerations for Adaptive Systems.” In this
document, licensing is described as follows: “Pilots are licensed to fly based on demonstrating knowledge and skill through
hundreds of hours of training and evaluation. Similarly, humans performing other critical tasks such as air traffic control are
trained and tested extensively before they enter an operational role. Extending this licensing procedure to autonomous
software would lead to an analogous system of gained trust. Certification would be eventually attained through extensive,
though not comprehensive, demonstration of knowledge and skill by the advanced software systems.”
In-service experience is defined in existing standards (such as ED-12C, ED-109A, AMC 20-152A on Airborne Electronic Hardware, etc.) as an alternative method when conventional software/hardware certification/approval artefacts are unavailable or difficult to obtain. Traditionally, in-service experience is used to manage COTS, Previously Developed Software (PDS) 7, or Previously Developed Hardware (PDH). To date, in-service experience has never been used to certify/approve components utilizing AI. Therefore, the necessary conditions for enabling the use of in-service experience to build certification/approval credit in this context remain to be defined.
Pending further analyses, the following necessary conditions could be considered as a starting point:
• The service period duration is sufficient (e.g., for an AL3/DAL C COTS, 8760 cumulated hours may be required by the
Competent Authority).
• The new operating design domain (ODD) is the same or similar (with additional verification needed if not the same).
• The ML sensor suite is identical (same sensors, orientation, location, and operational environment, calibration).
• The product is stable and mature (that is, few problem reports and/or modifications occurred during the service period
and were not safety critical and all anomalous behavior events are recorded and analyzed).
In-service experience may be built using data coming from real operation or data coming from simulated operation (in
particular for ground systems) provided that all necessary conditions, in particular the one related to the environment
representativeness, are fulfilled.
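The starting-point conditions above could be screened mechanically before any qualitative review. In the sketch below, the record fields and the 8760-hour threshold are illustrative assumptions mirroring the example conditions in the text, not prescribed criteria.

```python
from dataclasses import dataclass

@dataclass
class ServiceRecord:
    # Hypothetical summary of a COTS ML component's service history.
    cumulated_hours: float         # e.g., 8760 h example for AL3/DAL C
    same_or_similar_odd: bool      # operating design domain unchanged/similar
    identical_sensor_suite: bool   # same sensors, orientation, calibration
    open_safety_critical_reports: int

def in_service_credit_candidate(rec: ServiceRecord,
                                required_hours: float = 8760) -> bool:
    """Screen a record against the starting-point conditions.

    Only an illustrative pre-check: actual acceptance would remain a
    qualitative assessment agreed with the Competent Authority.
    """
    return (rec.cumulated_hours >= required_hours
            and rec.same_or_similar_odd
            and rec.identical_sensor_suite
            and rec.open_safety_critical_reports == 0)

candidate = in_service_credit_candidate(ServiceRecord(9000, True, True, 0))
```

Failing any single condition (too few hours, a changed ODD, a modified sensor suite, or an open safety-critical report) disqualifies the candidate from this fast path and sends it back to conventional assurance activities.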
NOTE: Service experience could also be beneficial to cybersecurity monitoring and in particular vulnerability management
of COTS/Open Source Software (OSS).
NOTE: Other alternative methods for providing assurance of COTS ML models, such as additional testing, prior product
approval, etc., may be considered.
7 PDS is software already developed for use. This encompasses a wide range of software, from COTS software through software developed to
previous or current software guidance (i.e., ED-12C or ED-109A).
Downloaded from SAE International by Rosa Maria Arnaldo, Tuesday, April 25, 2023
If online learning is used, then it should be demonstrated that the behavior of the system is not adversely impacted by new
inputs processed in operation. At this stage, there is no mature solution to address online learning, but the demonstration
could rely on different strategies, for example formal proof at runtime or safety risk mitigation.
The suggested development assurance process for certification/approval is depicted in Figure 8. The process is composed
of a trustworthiness analysis, the identification of safety risk mitigations, a safety assessment, assurance level assignment,
planning for certification/approval, and the activities listed in the previous paragraph. The key principle in this process is
the proportionality of activities, depending on the AI technique used, the system architecture, and the safety assessment.
The first step is the safety assessment which then yields the assurance level assignment. These steps could be performed
as described in ED-79A/ARP4754A and ED-135/ARP4761 for airborne systems or in the EUROCONTROL SAM, the
SESAR SRM, and ED-153 for ATM/ANS systems.
However, these safety analysis approaches mainly deal with identifying the failure conditions of a system derived from
failures of its components and their interactions with other systems. Due to the complex nature of systems utilizing AI and
their interactions with a complex environment, the traditional safety process could be complemented with a more
comprehensive safety analysis that deals specifically with failure conditions derived from interactions of the system and
external actors, without any system failure being necessarily present. An example of this safety approach is the Safety of
the Intended Functionality (SOTIF - ISO 21448) concept which is currently being used in the automotive industry to cover
those scenarios for advanced algorithms. The following are some key items of the methodology:
• Evaluation of functional and performance aspects of the system and its AI sub-systems.
• Identification and evaluation of hazards caused by the intended functionality and its triggering events.
In order to mitigate possible unintended behavior of the system utilizing AI, safety risk mitigations can be put in place.
The identification of safety risk mitigations and the safety assessment are two tightly coupled activities that allow allocating
the safety requirements to each component of the product utilizing AI.
In the context of products utilizing AI, various safety risk mitigation strategies can be considered, including the runtime
monitoring of components utilizing AI, output bounding mechanisms, dissimilar and/or redundant architectures, pilot
validation, or any other relevant strategies.
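The output-bounding mitigation mentioned above can be sketched minimally as follows; the function name, bounds, and fallback value are illustrative assumptions rather than prescribed design.

```python
def bounded_output(ml_output: float, lower: float, upper: float,
                   fallback: float) -> float:
    """Pass the ML output through only if it lies inside a validated
    envelope; otherwise substitute a conservatively safe fallback
    (e.g., hand over to a conventional backup computation)."""
    if lower <= ml_output <= upper:
        return ml_output
    return fallback

# The monitor itself stays simple and analyzable, so it can be assured
# with conventional techniques even when the ML component cannot.
safe = bounded_output(1.8, lower=0.0, upper=1.0, fallback=0.5)
```

Because the monitor is deterministic and small, safety requirements can be allocated to it rather than to the ML component it wraps, which is the point of such architectural mitigations.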
Once the safety assessment has been performed and the assurance level has been assigned, the plan for
certification/approval aims at identifying the activities that are relevant for the demonstration of compliance. This plan should
be adapted to each system, based on the AI technique used, on the system architecture and the safety assessment.
The plan can involve one or several of the activities listed in 6.2.2, and the depth of demonstration for each activity can be adjusted,
as illustrated by the gauge on top of each activity in Figure 8. Each activity can be carried out at one or several phases of
the product lifecycle described in 6.2.1.
The plan should be agreed with the certification/approval authority. Criteria for defining and accepting the relevant activities
and the relevant depth of demonstration could be defined as part of the future work of SAE-G34/WG-114. Any required
adaptation or change in the existing certification liaison processes (for airborne and ATM/ANS) should also be investigated.
The concerns outlined in this document are illustrated by use cases drawn from various aerospace application domains.
The columns in the table include:
• Example - a brief descriptive title of the use case identifying the functionality provided by the ML-based system.
• ID - a unique identifier useful for reference in future work of the joint EUROCAE SAE G-34/WG-114 committee.
• Assurance Gaps - features of the use case that complicate the process of assuring system safety and other assurance
concerns.
The concerns outlined in this document are illustrated by use cases drawn from various aerospace application domains.
The columns in the table include:
• Example - a brief descriptive title of the use case identifying the functionality provided by the ML-based system.
• ID - a unique identifier useful for reference in future work of the joint EUROCAE WG-114/SAE G-34 committee.
• Safety Concerns - identified safety concerns that would need to be mitigated to achieve successful implementation of
the use case.
Safety Concerns: Highly trafficked airspace can overload ATC and cause accidents.

Example: Time-Based Separation
ID: UC-SC308
Goal: Allow for stable arrival runway throughput in all headwind conditions on final approach.
Input: Headwind Conditions, Aircraft Tracks
Output: Time to keep aircraft separated on approach to landing
Machine Learning Techniques:
Integration: ATC
Safety Concerns: The move from distance-based to time-based separation rules for efficient separation management requires proper modelling/prediction.
Example: Remote Towers
ID: UC-SC309
Goal: Provide tower services for airports without a dedicated tower. The remote tower uses sensors and data around the physical airport and transmits to a remote tower operator.
Input: Video images, Wind conditions, Aircraft Tracks, Ground vehicle traffic, Voice, Radar
Output: Tower ATC for aircraft operations
Machine Learning Techniques: Neural networks
Integration: ATC
Safety Concerns: Loss of view from airport to remote tower will cause a hold or stop on traffic. Remote tower operators may lose situational awareness in unpredictable events.
Machine Learning Techniques:
Integration: ATC
Safety Concerns: Erroneous flight commands from the auto-router can damage the aircraft.
The Statement of Concerns (SOC) objectives were to (a) align the group (EUROCAE WG-114 and SAE G-34) on a common
understanding of AI techniques, (b) outline the concerns that use of such techniques would cause to the development of an
aeronautic system, and (c) recommend an efficient roadmap and organization to develop a means of compliance for AI
certification.
Section 3 (Classification of AI Techniques) identifies a classification of AI into three branches (Symbolic AI, Machine
Learning, and Numerical Analysis), and the rest of the SOC focuses on Machine Learning (ML) with the stipulation that
only offline learning is considered at this time. This choice was made because Machine Learning appeared to be the broadest
and most widely considered technique for use in aerospace development, while also being the most challenging to certify
with existing standards. However, this choice does not preclude the consideration of other techniques in the scope of future
standards produced by EUROCAE WG-114 and SAE G-34.
Section 4 (Gap Analysis) considers the main design assurance standards for airborne and ground systems. Although significant
gaps have been identified that make these standards insufficient for the development of an ML-based system, many of their objectives
remain valid. This is the case, for instance, for system development and safety assessment processes: their objectives are still
applicable. However, the specificities of ML model development should be addressed through specific guidance or methods.
Section 5 (ML development specific considerations and areas of concerns) presents ML-development-specific
considerations and dives into areas of concern, including:
• The fact that current code coverage analysis may not be practical for neural networks and other machine learning
structures
• The fact that current testing methods may not be appropriate for AI/ML sub-systems
Assuming that many objectives of existing system development standards (e.g., ARP4754) remain applicable, this section
proposes an ML workflow within the overall development flow of the system to which it pertains. This can be summarized by the
figure below.
It should be noted that an AI/ML-based system development requires data scientist expertise in addition to the usual system,
safety, hardware, and software expertise.
Section 6 (Potential Next Steps) suggests an approach for ML-based system certification/approval and details potential
development assurance activities that could be considered in the frame of standard development.
Section 7 (Use Cases - Aircraft Systems) and Section 8 (Air Traffic Management (ATM)/Ground Systems Operations (GSO))
collect use cases of aeronautical functions that would benefit from the use of AI techniques. This important work constitutes
a representative panel of industrial needs at this point, and it consolidates the necessary focus on ML techniques. No
new areas of concern were raised compared to those identified in Section 5.
As a conclusion of the studies and analyses performed by the EUROCAE teams and SAE Sub-Committees through the SOC
document, the group recommends breaking down into sub-groups, as per the diagram below, in order to efficiently develop a
standard or set of standards to serve as a means of compliance for AI and ML certification.
10. NOTES
A change bar (|) located in the left margin is for the convenience of the user in locating areas where technical revisions, not
editorial changes, have been made to the previous issue of this document. An (R) symbol to the left of the document title
indicates a complete revision of the document, including technical revisions. Change bars and (R) are not used in original
publications, nor in documents that contain editorial changes only.