AEROSPACE AIR6988™
RATIONALE
In order to provide a means of compliance for the certification of Artificial Intelligence (AI) within safety critical aeronautical
systems, the committee must first review existing standards and perform a gap analysis to understand how and why existing
standards cannot be reliably used. This document serves as that gap analysis and provides a list of concerns that need to
be addressed in order to produce a future means of compliance.
TABLE OF CONTENTS
1. SCOPE
4.3.12 ISO 12207 Systems and Software Engineering - Software Life Cycle Processes
4.3.13 ISO 26262 Road Vehicles - Functional Safety
4.3.14 ISO 21448 Road Vehicles - Safety of the Intended Functionality
4.3.15 ED-201 Aeronautical Information System Security (AISS) Framework Guidance
4.3.16 ED-202A/DO-326A Airworthiness Security Process Specification
4.3.17 ED-203A/DO-356A Airworthiness Security Methods and Considerations
4.3.18 ED-204/DO-355 Information Security Guidance for Continuing Airworthiness
4.3.19 ED-205 Process Standard for Security Certification and Declaration of ATM ANS Ground Systems
4.4 Gap Analysis Summary
PREFACE
In anticipation of growing commercial pressure for Artificial Intelligence (AI) solutions within the aerospace industry over the
coming years, there is an urgent call for regulation and for the emergence of norms around acceptable usage. In response,
two working groups were set up independently on either side of the Atlantic during 2019 to address concerns around
assuring products and services that exploit AI technologies. WG-114 was established by EUROCAE in Europe and G-34
by SAE in the United States.
Both Working Groups were created to produce guidance on safe, secure, and successful adoption of AI technologies in
Aeronautical Systems, through consensus amongst many experts and practitioners in industry and academia. In bilateral
agreement, the groups formed a joint committee in June 2019.
The joint working group will evaluate key applications for AI usage within aeronautical systems, with a scope encompassing
ground-based equipment and airborne vehicles, including Unmanned Aircraft Systems (UAS) products. In terms of
processes, the full lifecycle will be under consideration, from design and manufacture, to operation and through-life
maintenance.
A key deliverable will be documented standards, providing guidance on assuring safe and secure systems utilizing AI,
through an agreed acceptable means of compliance with regulatory requirements.
As per the charters of both EUROCAE WG-114 and SAE G-34, the first objective of the joint working group was to develop
and publish a technical report, a comprehensive Statement of Concerns (SOC), outlining the scope and purpose of the
group’s work and considering the concerns before imagining the solutions. This document is the response to that objective
and is the outcome of virtual meetings that took place between September 2019 and May 2020.
Before merging, each of the groups was developing a SOC document independently. Both groups were organized into
sub-groups, each addressing a section of the SOC. G-34 had Sub-Committees (SCs) and WG-114 had Teams (TMs). Upon
merging, six groups were formed, as defined in the following table.
AI has the potential to disrupt the aerospace industry, impacting all areas in which computing and aerospace intersect. AI
technologies are becoming progressively more embedded into the digital systems used to design, manufacture, operate,
and maintain both aerial vehicles and ground-based systems. Leveraged appropriately, AI-driven solutions could transform
the products and services that aerospace companies provide with an accelerated pace of change. Specifically, Machine
Learning (ML) technologies have the potential to revolutionize established paradigms of aeronautical system development,
including those concerned with safety-critical applications.
AI is a broad subject, still being actively developed from a confluence of many disciplines, including mathematics, computing,
cognitive science, software development, data science, control theory, and others. It demands a collaborative approach
with experts contributing from multiple domains.
Current industry guidance documents have a strong focus on established development methodologies and solutions for
aeronautical systems. These standards are therefore not expected to entirely accommodate the development and
assurance of ML-enabled solutions.
Downloaded from SAE International by Rosa Maria Arnaldo, Tuesday, April 25, 2023
SUMMARY OF CONTENTS
Section 3 surveys established AI techniques and their application to various problem domains, placing them in a historical
context and suggesting a taxonomy. A development workflow is also outlined.
From the foundational background provided by Section 3, Section 4 assesses current aerospace industry standards and
guidelines, discussing their applicability to systems that incorporate ML technologies. Chief among these standards are ED-
12C/DO-178C and supplements, ED-79/ARP-4754A, ED-80/DO-254, ED-109A/DO-278A, and ED-153.
In Section 5, specific areas of concern for both developmental and operational activities are identified, using the
development workflow outlined in Section 3 as a framework for discussion. The scope is limited to technical considerations.
Ethical and societal implications are out of scope.
Section 6 defines the scope of the documents that the joint Working Group will produce. Both ground and airborne
applications are considered, at whole-system level, rather than at the level of their constituent hardware and software
components. Software operates in the context of a system, and key quality attributes such as safety are properties of the
overall system.
Potential applications of AI to aerospace domains are proposed as a set of Use Cases in Sections 7 and 8, for airborne and
ground-based systems, respectively.
The conclusion in Section 9 proposes a staged plan for developing a future standard for AI techniques, as well as processes
and technologies to be assessed against, in the context of aeronautical systems.
IMPORTANT NOTICE: This document neither defines guidance, nor sets forth any constraints around future guidance.
1. SCOPE
This document reviews current aerospace software, hardware, and system development standards used in the
certification/approval process of safety-critical airborne and ground-based systems, and assesses whether these standards
are compatible with a typical Artificial Intelligence (AI) and Machine Learning (ML) development approach. The document
then outlines what is required to produce a standard that provides the necessary accommodation to support integration of
ML-enabled sub-systems into safety-critical airborne and ground-based systems, and details next steps in the production
of such a standard.
NOTE: This document, and the upcoming standard it is intended to inform, are concerned only with “offline” learning
applications of AI and ML. In offline learning, ML models are trained on historical data within a dedicated learning
environment. When the trained models are then implemented into a production system, learning functionality is
turned off. The production system implementing AI may collect data for retraining at a later date, but any retraining
or further learning will happen in the separate learning environment, and any resulting changes to the ML models
will then need to be re-implemented into production as a new version of the system utilizing AI. This is in contrast
to “online” learning, where a system utilizing AI will continue to learn and adapt its operation while in production.
Consideration of such systems is not out of scope for SAE G-34, but the committee will not consider online learning
until after publication of this document and its related standard.
2.1 References
[7] SAE/EUROCAE, "ARP4754A/ED-79A - Guidelines for Development of Civil Aircraft and Systems," 2010.
[8] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, The MIT Press, 2016.
[9] RTCA/EUROCAE, "DO-178C/ED-12C Software considerations in airborne systems and equipment
certification," 2013.
[10] High-Level Expert Group on AI (AI HLEG), "Ethics Guidelines for Trustworthy AI," 2019.
[11] Z. Manna, Mathematical Theory of Computation, Dover Publications, 1974.
[12] "Propositional calculus," [Online]. Available: https://en.wikipedia.org/wiki/Propositional_calculus.
[13] "T-norm fuzzy logics," [Online]. Available: https://en.wikipedia.org/wiki/T-norm_fuzzy_logics.
[14] "First-order logic," [Online]. Available: https://en.wikipedia.org/wiki/First-order_logic.
[15] "Second-order logic," [Online]. Available: https://en.wikipedia.org/wiki/Second-order_logic.
[16] "Prolog," [Online]. Available: https://en.wikipedia.org/wiki/Prolog.
[17] "Logic Programming," [Online]. Available: https://en.wikipedia.org/wiki/Logic_programming.
[18] OWL Working Group, "OWL Web Ontology Language Use Cases and Requirements," 2004. [Online].
Available: https://www.w3.org/TR/webont-req/#onto-def.
[19] OWL Working Group, "OWL 2 Web Ontology Language, 2nd ed., W3C Recommendation," 2012. [Online].
Available: https://www.w3.org/TR/owl-ref/.
[20] P. Bellini, R. Mattolini, and P. Nesi, "Temporal Logics for Real-Time Systems Specification," ACM Computing
Surveys, 2000.
[21] "Temporal Logic," [Online]. Available: https://en.wikipedia.org/wiki/Temporal_logic.
[22] "Knowledge engineering," [Online]. Available: https://en.wikipedia.org/wiki/Knowledge_engineering.
[23] "Expert system," [Online]. Available: https://en.wikipedia.org/wiki/Expert_system.
[24] "Rule-based system," [Online]. Available: https://en.wikipedia.org/wiki/Rule-based_system.
[25] "Knowledge representation and reasoning," [Online]. Available:
https://en.wikipedia.org/wiki/Knowledge_representation_and_reasoning.
[26] "Ontology engineering," [Online]. Available: https://en.wikipedia.org/wiki/Ontology_engineering.
[27] D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques, The MIT Press, 2009.
[28] "Graphical model," [Online]. Available: https://en.wikipedia.org/wiki/Graphical_model.
[29] T. R. Gruber, "A translation approach to portable ontology specifications," Knowledge Acquisition, 1993.
[30] L. F. Sikos, Mastering Structured Data on the Semantic Web: from HTML5 Microdata to Linked Open Data,
2015.
[31] I. Horrocks, U. Sattler, and F. Baader, "Chapter 3: Description Logics," in Foundations of Artificial Intelligence,
2007.
[32] R. Arp et al., Building Ontologies with Basic Formal Ontology, The MIT Press, 2015.
[33] J. Hebeler et al., Semantic Web Programming, Wiley, 2009.
[34] W. Wong et al., "Ontology learning from text: a look back and into the future," ACM Computing Surveys, Vol.
44, No. 4, Article 20, 2012.
[35] "Semantic Web Stack," [Online]. Available: https://en.wikipedia.org/wiki/Semantic_Web_Stack.
[36] N. Casellas, "Linked Legal Data: A SKOS Vocabulary for the Code of Federal Regulations," IOS Press.
[37] S. Mandal et al., "Semantic Web Representations for Reasoning about Applicability and Satisfiability of
Federal Regulations for Information Security," RELAW, Ottawa, 2015.
[38] I. Sanya and E. Shehab, "A Framework for developing engineering design ontologies within the aerospace
industry," International Journal of Production Research, vol. 53:8, pp. 2383-2409, 2015.
[39] CRYSTAL - Critical System Engineering Acceleration, "State of the art for Healthcare ontology," vol.
D407.010, 2013.
[40] P. J. Besl and N. D. McKay, "Method for registration of 3-D shapes. Sensor fusion IV: control paradigms and
data structures.," International Society for Optics and Photonics, vol. 1611, 1992.
[41] S. M. LaValle and J. J. Kuffner Jr., "Randomized kinodynamic planning," The international journal of robotics
research, vol. 20, no. 5, pp. 378-400, 2001.
[42] S. Karaman and E. Frazzoli, "Sampling-based algorithms for optimal motion planning," The international
journal of robotics research, vol. 30.7, pp. 846-894, 2011.
[43] M. A. Fischler and R. C. Bolles, "Random sample consensus: a paradigm for model fitting with applications to
image analysis and automated cartography," Communications of the ACM, vol. 24.6, pp. 381-395, 1981.
[44] P. Jackson, Introduction to Artificial Intelligence, Dover Publications, 1985.
[45] H. Moravec and A. Elfes, "High resolution maps from wide angle sonar," in IEEE international conference on
robotics and automation, 1985.
[46] P. E. Hart, N. J. Nilsson, and B. Raphael, "A formal basis for the heuristic determination of minimum cost
paths," IEEE transactions on Systems Science and Cybernetics, vol. 4.2, pp. 100-107, 1968.
[47] C. G. Harris and M. Stephens, "A combined corner and edge detector," in Alvey vision conference, 1988.
[48] A. L. Samuel, "Some Studies in Machine Learning Using the Game of Checkers," IBM Journal of Research and
Development, pp. 211-229, 1959.
[49] T. M. Mitchell, Machine Learning, McGraw-Hill, 1997.
[50] K. P. Murphy, Machine Learning: A probabilistic Perspective, The MIT Press, 2012.
[51] L. Breiman, "Statistical Modeling: The Two Cultures," Statistical Science, vol. 16(3), pp. 199-231, 2001.
[52] D. Wolpert and W. G. Macready, "No Free Lunch Theorems for Optimization," IEEE Transactions on
Evolutionary Computation, vol. 1 No 1, 1997.
[53] D. H. Wolpert, "The Lack of A Priori Distinctions Between Learning Algorithms," Neural Computation, vol. 8,
pp. 1341-1390, 1996.
[54] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, 2nd ed., Springer, 2009.
[55] "Supervised learning," [Online]. Available: https://en.wikipedia.org/wiki/Supervised_learning.
[56] S. Russell and P. Norvig, "Chapter 18. Learning from examples," in Artificial Intelligence: A modern approach,
Prentice Hall, 2009, pp. 493-767.
[57] A. A. Patel, Hands On Unsupervised Learning with Python, O'Reilly, 2019.
[58] "Unsupervised learning," [Online]. Available: https://en.wikipedia.org/wiki/Unsupervised_learning.
[59] "Semi-supervised learning," [Online]. Available: https://en.wikipedia.org/wiki/Semi-supervised_learning.
[60] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed., The MIT Press, 2018.
[61] "Reinforcement learning," [Online]. Available: https://en.wikipedia.org/wiki/Reinforcement_learning.
[62] "Meta learning," [Online]. Available: https://en.wikipedia.org/wiki/Meta_learning_(computer_science).
[63] "Automated machine learning," [Online]. Available: https://en.wikipedia.org/wiki/Automated_machine_learning.
[64] "Neuroevolution," [Online]. Available: https://en.wikipedia.org/wiki/Neuroevolution.
[65] D. Grbic and S. Risi, "Towards continual reinforcement learning through evolutionary meta-learning," GECCO
'19: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 119-120, 2019.
[66] M. A. Nielsen, Neural Networks and Deep Learning, Determination Press, 2015.
[67] D. E. Rumelhart et al., "Learning Internal Representations by Error Propagation," Defense Technical
Information Center technical report, 1985.
[68] H.-T. Cheng et al., "Wide & Deep Learning for Recommender Systems," in Proceedings of the 1st Workshop
on Deep Learning for Recommender Systems, 2016.
[69] NASA, "Certification Considerations for Adaptive Systems," 2015.
[70] V. Chandola, A. Banerjee and V. Kumar, "Anomaly Detection: A Survey," ACM Computing Surveys, 2009.
[71] SAE, "J3016 Taxonomy and Definitions for Terms related to Driving Automation Systems for On-Road Motor
Vehicles," 2018.
2.2 Definitions
ACCEPTANCE SCENARIO: A test or simulation procedure designed to gather evidence of a system’s compliance with
requirements and expectations.
ACCEPTANCE CRITERIA: The criteria applied for the complete set of acceptance scenarios.
ADAPTIVE SYSTEM: System that changes its behavior, based on an active feedback process in the presence of changes
in the system or its environment. The environment can include hardware or software components of the computing platform,
or the external surroundings in which the system operates (e.g., for airborne systems, changes to the physical structure of
the aircraft, or in the weather). The defining characteristic of an adaptive system is that it goal-seeks by iteratively updating
parameters in response to environmental changes, rather than by resorting to predetermined values in look-up tables or to
fixed, predefined calculations.
ARTIFICIAL INTELLIGENCE (AI): The theory and development of software-based systems able to perform tasks that have
hitherto been the province of human intelligence. Examples include visual perception, speech recognition, decision-making,
customer support and anomaly detection. John McCarthy, who coined the term in 1956, defines it as "the science and
engineering of making intelligent machines."
ARTIFICIAL GENERAL INTELLIGENCE (AGI): Also known as general or strong AI, Artificial General Intelligence refers to
an artificial system able to perform any intellectual task that can be performed by a typical human. The result would be a
system considered to have the kind of general intelligence possessed by human beings and perhaps also qualities like
consciousness.
ARTIFICIAL NARROW INTELLIGENCE (ANI): Artificial Narrow Intelligence, also known as weak AI, refers to the application
of AI to solve narrow or specific problems. All current AI implementations are in this category.
ARTIFICIAL SUPER INTELLIGENCE (ASI): Artificial Super Intelligence research considers the possibility of systems that,
given their superior processing and storage capabilities when compared to the human brain, could potentially exhibit
super-human levels of intelligence. How such systems could be created, and how they would be evaluated is a matter of
research and philosophical debate.
ARTIFICIAL NEURAL NETWORK (ANN): Neural Networks are algorithms, modelled loosely on the workings of the
biological brain, where neurons are “fired” with sufficient stimuli. Typically, an ANN is “trained” with example data, in which
the expected outputs are defined for a given input dataset. Parameters are adjusted based on the deviation from expected
outcomes.
In “classical” feedforward ANNs, data is transmitted in one direction only, from inputs to outputs. Two common specialized
architectures extend this basic structure and are better suited to certain classes of problem:
• Convolutional Neural Network (CNN): A type of Neural Network for processing data that has a known grid-like topology.
CNNs use a specialized kind of linear operation (convolution) in place of general matrix multiplication in at least one of
their layers, which makes them well suited to analytical applications, such as predictive maintenance.
• Recurrent Neural Network (RNN): A type of Neural Network whose connections form directed cycles, giving the network
an internal memory of earlier inputs. Unlike earlier network types, which are restricted to fixed-size input and output
vectors, RNNs can operate on sequences, which makes them well suited to operational applications, such as a
self-driving car or self-flying airplane.
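As an illustrative sketch only (no such example appears in any referenced standard, and all names and values below are hypothetical), the computation performed by a single artificial neuron can be written as a weighted sum of its inputs plus a bias term, passed through an activation function:

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs plus bias,
    passed through a sigmoid activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# A single feedforward pass: data flows one way, inputs -> activation.
output = neuron([0.5, -1.0], [0.8, 0.2], bias=0.1)
```

Training, in this picture, consists of adjusting the weights and bias to reduce the deviation of outputs like this one from the expected outcomes in the example data.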
ATTRIBUTE: Information representing a property or characteristic of a subject as recorded in structured data, such as a
table, spreadsheet, or database. In a database context, attributes are commonly referred to as fields, whereas in a
spreadsheet or table context, attributes are commonly referred to as columns. For instance, structured data representing a
person may have attributes such as name or age.
AUTOMATION: The use of control systems and information technologies reducing the need for human input, typically for
repetitive tasks.
AUTONOMOUS [4]: Operations of an unmanned system wherein the system receives its mission from the human or agent
and accomplishes that mission with or without further human-robot interaction (HRI). The level of HRI, along with other
factors such as mission complexity, and environmental difficulty, determine the level of autonomy for the unmanned system.
Finer-grained autonomy level designations can also be applied to the tasks, lower in scope than mission.
• Fully Autonomous: A mode of UMS operations wherein the UMS is expected to accomplish its mission, within a defined
scope, without human intervention.
• Semi-Autonomous: A mode of UMS operations wherein the human operator and/or the UMS plan(s) and conduct(s) a
mission, which requires various levels of HRI.
For the automotive industry, six levels of autonomy are defined (refer to SAE J3016 [4]).
AUTONOMY: The ability to perform one or more tasks in a changing environment following a decision-making process
without input by a human.
BIAS (MACHINE LEARNING): (1) An error deriving from erroneous assumptions in the learning process. High bias can
cause an algorithm to miss the relevant relations between attributes and target outputs (known as underfitting). (2) A
parameter within a neural network that is added to the weighted sum of a neuron’s inputs, shifting its output.
BIAS (STATISTICS): A feature of a statistical technique or of its results whereby the expected value of the results differs
from the true underlying quantitative parameter being estimated.
BIG DATA: A discipline that specializes in dealing with the analysis of very large amounts of data, with high velocity (high
speed of data processing). The data may come from a wide variety of sources (sensors, images, texts, etc.) in a wide variety
of formats, including unstructured formats, such as free text.
CLASSIFICATION (MACHINE LEARNING): A method that maps input data to one of a number of discrete classes.
CLUSTERING: A method that identifies similarities between features of the data and groups data items into clusters with
similar features. It is a common example of unsupervised learning.
COMMERCIAL-OFF-THE-SHELF (COTS): Refers to generally available software and hardware developed for a broad
market, rather than hardware or software designed specifically for aerospace applications. Special rules and regulations
often apply when leveraging COTS hardware or software for safety critical applications, as such hardware or software may
not have been developed in accordance with aerospace industry standards.
DATA CLEANSING: Identification and removal of errors and duplicate data to create a reliable dataset. This improves the
quality of the training data for analytics.
DATA DRIFT: Phenomenon in which a Machine Learning system encounters data values during operation that were not
used in training the underlying model. The predictive capabilities of the system are compromised as in-service data values
diverge from those used in model training; for instance, the performance of an aircraft engine prognostics system trained
with warm-weather data may degrade if the deployed system is used in cold weather over sustained periods.
DATA-DRIVEN AI: An approach to AI development focused on building a system that can output appropriate responses
based on having learned from a large number of examples.
DATA SCIENCE [3]: A broad field that refers to the collective processes, theories, concepts, tools, and technologies that
enable the extraction of information and analysis to acquire knowledge from that information.
DATASET (MACHINE LEARNING): The sample of data used for various development phases of the algorithm: i.e., training,
validation, and test.
• Training Dataset: Data that is input to a Machine Learning model in order to establish its behavior.
• Validation/Development Dataset: Used to tune some hyperparameters of a model (e.g., number of hidden layers,
learning rate, number of neurons per layer).
• Test Dataset: Used to assess the performance of a model, independent of the training dataset.
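The three-way partition described above can be sketched as follows (an illustrative example; the function name and split fractions are assumptions, not prescribed by any standard). The key property is that the three subsets are disjoint, so test performance is assessed independently of the data used for training:

```python
import random

def split_dataset(data, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle once, then partition into disjoint training,
    validation, and test subsets."""
    data = list(data)
    random.Random(seed).shuffle(data)  # fixed seed for reproducibility
    n = len(data)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]
    return train, val, test

train, val, test = split_dataset(range(100))
```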
DEEP LEARNING (DL): A specific type of Machine Learning based on the use of large (deep) neural networks to learn
abstract representations of the input data using multiple layers.
DETERMINISTIC SYSTEM: A system for which no randomness is involved in the development of future states. A
deterministic system will always produce the same output given the same input (including environmental conditions) and
initial state.
DERIVED REQUIREMENTS: Requirements which are not directly traceable to higher level requirements and/or specify
behavior beyond that specified by the higher level requirements.
EXPECTED PROPERTY: The expected functional or performance property of the system outputs for each acceptance
scenario.
EXPERT SYSTEM: A knowledge-based system that provides for solving problems in a domain or application area by
drawing inferences from a knowledge base developed from human expertise. The term "expert system" is sometimes used
synonymously with "knowledge-based system” but should be taken to emphasize expert knowledge. Some expert systems
can improve their knowledge base and develop new inference rules based on their experience with previous problems.
Expert systems usually rely on explicit, symbolic encoding of knowledge.
EXPLAINABILITY: The extent to which humans can understand, interpret, and account for the causality of an AI system or
algorithm: why a particular output is produced for a given set of inputs. (Note: In this document, we use explainability in a
broad manner to cover both explainability, as in a system’s ability to show its work, and interpretability, as in a human’s
ability to logically understand what is being explained.)
EXPLAINABLE AI: An emerging field of study that aims to make the workings of AI systems transparent, such that humans
can see process and decision flows. Note that this may not be the same as having AI systems be logically understood by
humans, although the goal of many Explainable AI projects is often aligned to that outcome. See definition for
INTERPRETABLE AI.
FORMAL METHODS: A collection of mathematically rigorous techniques that can be applied to the specification,
development and verification of software or hardware. A major benefit is a reduced need for human inspection and testing,
for verification and validation, relying instead on unequivocal mathematical proofs of system properties. Challenges include
the high level of required expertise, sparse tool support, and limited applicability to certain types of problem.
FUNCTION: The intended behavior of a product based on a defined set of requirements regardless of implementation.
GENETIC LEARNING [6]: An approach to Machine Learning based on an iterative classification algorithm which selects
pairs of classifiers according to strength, and then applies genetic operators to the pairs to create offspring. The strongest
offspring replace the weakest in order to generate new, plausible classifiers when the prior classifiers prove inadequate.
The term “genetic" comes from the field of natural genetics, where it is linked to heredity governed by genes.
HYPERPARAMETER (MACHINE LEARNING) [8]: A setting that is used to control the behavior of a learning algorithm (e.g.,
number of hidden layers, learning rate, number of neurons per layer). Note: the values of hyperparameters are not adapted
by the learning algorithm.
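The distinction can be illustrated with a minimal sketch (hypothetical, for explanation only): in the gradient-descent fit below, the weight `w` is a model parameter adapted by the learning algorithm, whereas `learning_rate` and `steps` are hyperparameters, chosen by the practitioner and never adjusted by the algorithm itself:

```python
def fit_slope(xs, ys, learning_rate, steps):
    """Fit y = w * x by gradient descent on mean squared error.
    `w` is a parameter (learned); `learning_rate` and `steps`
    are hyperparameters (fixed before training)."""
    w = 0.0
    for _ in range(steps):
        # Gradient of mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad
    return w

w = fit_slope([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], learning_rate=0.05, steps=200)
```

A poor hyperparameter choice (e.g., too large a learning rate) would prevent `w` from converging, which is why hyperparameters are typically tuned on the validation dataset.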
INFERENCE (MACHINE LEARNING): The process of a Machine Learning model computing an output, based on input
data. The concept of inference originates from the field of Logic and is used extensively in Symbolic AI. The dictionary
definition is "a conclusion or opinion that is formed because of known facts or evidence." See also "rule of inference" in [14]
or "inference engine" in [25].
INTERNET OF THINGS (IoT): The network of physical objects that contain embedded technology to communicate and
sense or interact with their internal states or the external environment.
INTERPRETABLE AI: The quality of an AI system to explain process and decision flows in such a manner that humans can
logically decompose and understand the system’s output and decision making. With Interpretable AI, predictions can be
logically justified, errors can be traced to their source in logical fashion, and a level of confidence in outcomes can be
ascertained.
KNOWLEDGE: A collection of facts, events, beliefs, and rules organized for systematic use.
KNOWLEDGE ACQUISITION: The process of locating, collecting, and refining knowledge and converting it into a form that
can be further processed by a knowledge-based system. Knowledge acquisition normally implies the intervention of a
knowledge engineer, but it is also an important component of Machine Learning. [6]
KNOWLEDGE BASE; K-BASE [6]: A database that contains inference rules and information about human experience and
expertise in a domain. In self-improving systems, the knowledge base additionally contains information resulting from the
solution of previously encountered problems.
KNOWLEDGE ENGINEERING [6]: The discipline concerned with acquiring knowledge from domain experts and other
knowledge sources and incorporating it into a knowledge base. The term "knowledge engineering" sometimes refers
particularly to the art of designing, building, and maintaining expert systems and other knowledge-based systems.
KNOWLEDGE-BASED SYSTEM: Information processing system that provides for solving problems in a domain or
application area by drawing inferences from a knowledge base. The term "knowledge-based system" is sometimes used
synonymously with "expert system," which is usually restricted to expert knowledge.
MACHINE LEARNING (ML): The branch of AI concerned with the development of algorithms that allow computers to evolve
behaviors based on observing data and making inferences on these data.
• Supervised Learning: The process of learning a function that maps an input to an output based on a labelled training
dataset.
• Unsupervised Learning: The process of learning a function from a non-labelled dataset, by adapting the model to
increase accuracy of the algorithm based on a given cost function.
• Reinforcement Learning: The process of learning in which the algorithm is rewarded for positive results and penalized
for negative results, enabling it to improve over time. The learning system is called an agent.
• Semi-Supervised Learning: The process of learning in which the algorithm is capable of learning from data that is
partially labelled.
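As an illustrative sketch of supervised learning (the data and function below are hypothetical), a one-nearest-neighbor classifier predicts the label of the training example closest to the query; the labelled training dataset is what makes this supervised:

```python
def nearest_neighbor(train, query):
    """Supervised learning at its simplest: return the label of the
    training example whose feature is closest to the query."""
    feature, label = min(train, key=lambda fl: abs(fl[0] - query))
    return label

# Labelled training dataset: (feature, label) pairs.
labelled = [(1.0, "low"), (1.5, "low"), (9.0, "high"), (8.5, "high")]
prediction = nearest_neighbor(labelled, query=2.0)
```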
MACHINE LEARNING MODEL: A parameterized function that maps inputs to outputs. The parameters are determined
during the training process.
NATURAL LANGUAGE PROCESSING (NLP): Natural Language Processing (NLP) is a sub-field of Artificial Intelligence,
concerned with computational techniques for analyzing, representing, and processing natural (human) language texts and
voices, at one or more levels of linguistic analysis, such as phonetics, syntax, semantics and discourse, for the purpose of
achieving human-like language abilities for a range of applications, including natural language understanding and
generation, and speech recognition and generation.
OPERATIONAL DESIGN DOMAIN (ODD): Description of the specific operating domain(s) in which an automated function
or system is designed to properly operate, including but not limited to operational aspects, environmental conditions, and
other domain constraints.
Downloaded from SAE International by Rosa Maria Arnaldo, Tuesday, April 25, 2023
ONLINE LEARNING ALGORITHM [2]: Represents a class of learning algorithms that learn to sequentially optimize
predictive models over a stream of data instances while performing in the run-time environment during operations. The
on-the-fly learning makes online learning highly scalable and memory efficient and differentiates it from batch learning or
offline learning.
OPTIMIZATION: Methods that find the best available values within a function or distribution according to a defined cost
function.
PREDICTABILITY: The degree to which a correct forecast of a system's state can be made quantitatively.
REGRESSION: A set of statistical techniques to postulate the mathematical relationship of one or more dependent variables
to one or more independent variables and use such a relationship for statistical inference or prediction.
ROBUSTNESS (SYSTEM): The extent to which the system handles invalid inputs properly and provides stable outputs under perturbations of its inputs.
ROBUSTNESS (SOFTWARE): The extent to which software can continue to operate correctly despite abnormal inputs and
conditions.
ROBUSTNESS (MACHINE LEARNING MODEL): Ability of a model to generalize the outputs, for an input varying in a
region of the state space compatible with the training range.
RULE-BASED SYSTEM; PRODUCTION SYSTEM: A knowledge-based system that draws inferences by applying a set of
if-then rules to a set of facts following given procedures.
SEMANTIC NETWORK; SEMANTIC NET: A concept-based knowledge representation in which objects or states appear
as nodes connected with links that indicate the relationships between various nodes.
SYMBOLIC AI [3]: In contrast to Data-Driven AI, attempts to capture knowledge and derive decisions through explicit
representation and rules.
TESTING: The process of exercising a system or component to verify that it satisfies specified requirements.
TRAINING (MACHINE LEARNING): The process of optimizing the parameters of a Machine Learning model, given a
dataset and a task to achieve on that dataset.
TRUSTWORTHINESS: Set of three qualities of a system that should be satisfied throughout its entire life cycle: it should
be lawful, complying with all applicable laws and regulations, it should be ethical, ensuring adherence to ethical principles
and values, and it should be robust, both from a technical and social perspective. [10]
UNMANNED SYSTEM (UMS): An electro-mechanical system, with no human operator aboard, that is able to exert its power
to perform designed missions. May be mobile or stationary. Includes categories of unmanned ground vehicles (UGV),
unmanned aerial vehicles (UAV), unmanned underwater vehicles (UUV), unmanned surface vehicles (USV), unattended
munitions (UM), and unattended ground sensors (UGS). Missiles, rockets, and their submunitions, and artillery are not
considered unmanned systems.
VALIDATION: The process of determining that requirements are both correct and complete with respect to representing
larger goals and objectives.
VARIANCE (MACHINE LEARNING): An error from sensitivity to small fluctuations in the training set. High variance can
cause an algorithm to model the random noise in the training data, rather than the intended outputs (overfitting).
VARIANCE (STATISTICS): A standard calculation in Statistics that indicates the “spread” of a data distribution. Essentially, it tells us how far a set of numbers is spread out from its average value.
More formally, variance is the expectation of the squared deviation of a random variable from its mean.
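The formal definition above can be illustrated with a short, self-contained Python sketch; the sample data is invented for illustration only:

```python
def variance(xs):
    """Population variance: the expectation of the squared deviation
    of the values from their mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(variance(data))  # 4.0
```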
VERIFICATION: The evaluation of the outputs of a process to ensure correctness and consistency with respect to the inputs
and standards provided to that process.
WEIGHT (MACHINE LEARNING): A parameter within a Neural Network that transforms input data. The inputs get multiplied
by a weight value.
NOTE: Both bias and weight are learnable parameters inside the network.
WEIGHT (SYMBOLIC AI): A parameter within Semantic Networks which indicates the strength of the relationship between
the nodes.
WELL-FORMED FORMULA (wff): A statement whose validity can be determined by mathematical logic.
3. CLASSIFICATION OF AI TECHNIQUES
This section offers a common knowledge base of AI terminology as a foundation for the remaining sections. We provide a
taxonomy of AI concepts and techniques based on three major paradigms: Symbolic AI, Numerical Analysis, and Machine
Learning. The section concludes with a discussion of a general workflow for developing ML systems.
Figure 1 shows a classification of AI techniques adopted by the SAE G-34/WG-114 committee, and those techniques
relevant to the statement of concerns document are detailed in 3.1.
Main Classification:
Figure 1 - AI classification
3.1 Symbolic AI
Symbolic AI refers to the collection of methods that attempt to capture and encode human knowledge, for the purpose of
machine understanding and processing. Solutions are based on human-readable representations of problems, where
real-world objects, along with their characteristics, relationships, and interactions, are represented by symbols. An aircraft,
for example, could be represented by the textual string “aircraft.” Although this approach requires no model training, no
massive amounts of data, and no “guesswork,” its main limitation lies in the difficulty of codifying the real world, with all its
complexity and nuances. Today’s knowledge-based AI agents occupy only a partially described universe.
3.1.1 Logic
AI and Computer Science evolved together over the second half of the 20th century and, since computers are fundamentally
Boolean Logic processing machines, it is natural to apply the principles and techniques of Logic to solve problems. The
approach is to build propositional sentences, using a constrained grammar, that express some belief (“Truth”) about the
domain of interest, with no ambiguity. A modern computer can assess the truth of millions of propositions per second, to
make decisions, plan, and act.
The logic branch of Symbolic AI is composed of several classes of Formal Methods. They allow the representation of logical
arguments (a.k.a. statements) in terms of well-formed formulas (wffs), whose validity can be determined by mathematical
logic. A wff is said to be valid if it can be shown to be a tautology, i.e., true under all possible interpretations [11]. An
interpretation consists of an assignment of truth values to the wff variable symbols.
Knowledge Engineering [22] refers to all technical, scientific, and social aspects involved in building, maintaining, and using
knowledge-based systems. This broad field includes long-standing concepts such as expert systems [23] and rule-based
systems [24], as well as more recent techniques such as ontology engineering [25], [26] and probabilistic graphical models
[27], [28]. A common trait of knowledge engineering solutions is the hard coding of knowledge using formal languages and
their processing using formal methods.
3.1.1.1.1 Ontologies
Ontologies [18], [29], [30], [31] are data structures that model concepts, roles, and individuals, and their relationships.
Ontologies can store conceptual or incomplete information and use reasoning over description logic to infer missing
relationships and attributes.
Ontologies [32] capture domain knowledge by organizing concepts, relationships, and constraints in a web of
statements named the Semantic Web [33]. The ontology information model is designed to facilitate easy data sharing,
reuse, and interoperability across multiple scientific and engineering domains. The Semantic Web accomplishes this
through two critical principles: decoupling the knowledge model from the application and integrating knowledge models
through reuse and extension [27]. Many ontologies can be defined for any single domain since their creation is usually
context driven in terms of scope, abstraction-level, granularity, properties, and intent.
A Knowledge Base (KB) [27] is a software component that represents a collection of facts or statements that is ontologically
described, processed, and accessed in a Semantic Web application. Entailments inferred by reasoners combined with
user/application asserted statements in the KB can refer to generic concepts or specific individuals (instance data).
Reasoners evaluate the assertions in the underlying data structures and verify consistency of ontology concepts and their
mutual relationship. The use of ontologies allows a KB to incorporate new ontologies and instance data incrementally as
the need arises.
In artificial intelligence, expert systems are designed to solve complex problems by reasoning through bodies of knowledge,
represented mainly as “if-then” rules (production rules) rather than through conventional procedural code. An expert system
usually has two core components: a knowledge base and an inference engine. The knowledge base is an organized
collection of facts and production rules about the system’s domain. Facts are frequently acquired from human experts
through interviews and observations. The inference engine applies the rules to the known facts in the knowledge base to
possibly deduce new facts in order to emulate the decision-making ability of a human expert. Inference engines can also
include explanation and debugging abilities. Typical tasks for expert systems involve classification, diagnosis, monitoring,
scheduling, and planning for specialized technology domains.
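The knowledge base plus inference engine structure described above can be sketched in a few lines of Python; the facts and production rules below are invented for illustration and carry no aeronautical significance:

```python
# A minimal knowledge base (facts) and inference engine (forward
# chaining over if-then production rules). Facts and rules are
# invented for illustration only.
facts = {"engine_vibration_high", "oil_pressure_low"}
rules = [
    ({"engine_vibration_high"}, "inspect_fan_blades"),
    ({"oil_pressure_low"}, "check_oil_pump"),
    ({"inspect_fan_blades", "check_oil_pump"}, "ground_aircraft"),
]

# Apply rules repeatedly until no new fact can be deduced.
changed = True
while changed:
    changed = False
    for conditions, conclusion in rules:
        if conditions <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print(sorted(facts))
```

Each pass of the loop emulates the inference engine applying production rules to known facts to deduce new facts.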
This section covers several common algorithmic approaches based on non-data driven techniques used in a wide range of
applications including Optimization, Mapping, and State Estimation, in applications such as robotics and autonomous
vehicles.
Iterative approaches can be described as taking an initial guess at a solution, and repeatedly running a procedure (an
iteration) to refine that guess to get closer and closer to the acceptable solution. The iterations typically continue until either
a maximum number of iterations are reached, or a certain fitness of the solution is found. Depending on the problem, the
fitness of the solution may be a measure of the proximity to the true solution, or it may be a heuristic measurement.
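As an illustrative sketch of an iterative approach, the following Python code refines an initial guess for a square root (Newton's method), terminating either on a maximum iteration count or on a fitness criterion:

```python
def newton_sqrt(a, tol=1e-12, max_iter=100):
    """Iteratively refine a guess for sqrt(a) using Newton's method,
    stopping at a maximum iteration count or when the update is small
    enough (the 'fitness' criterion)."""
    x = a if a > 1 else 1.0  # initial guess
    for _ in range(max_iter):
        nxt = 0.5 * (x + a / x)  # one refinement iteration
        if abs(nxt - x) < tol:
            return nxt
        x = nxt
    return x

print(newton_sqrt(2.0))  # ~1.4142135623730951
```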
Randomized algorithms explicitly use random numbers as one of their inputs as a method to converge to a true solution.
These algorithms generally use random inputs as samples from a large action or state space. As the number of samples
increases, these algorithms generally then either heuristically or provably converge to the correct solution.
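A classic illustration of a randomized algorithm is Monte Carlo estimation of pi, where random samples from a large state space (the unit square) converge toward the correct solution as the sample count grows; the sketch below is illustrative only:

```python
import random

def monte_carlo_pi(n_samples, seed=0):
    """Estimate pi by sampling random points in the unit square: the
    fraction landing inside the quarter-circle converges to pi/4."""
    rng = random.Random(seed)
    inside = sum(
        1 for _ in range(n_samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * inside / n_samples

print(monte_carlo_pi(100_000))  # close to 3.14159 for large sample counts
```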
Population methods involve self-affecting, multi-part populations [46]. A mathematical system that “affects itself” is typically
composed of at least two parts that produce and accept feedback to themselves, that is, where the action of any part affects
the other parts, which in turn affect the original part [44].
In ANNs, for example, population-based training is a hyperparameter optimization technique, similar to genetic algorithms,
that learns from a schedule of hyperparameters rather than fixed values.
Population-based methods mimic the genetic search of Natural Selection in the biological world. Instead of working with a
single candidate solution, the designer works with a population of candidates to explore the solution space.
Probabilistic algorithms describe a general class of algorithms that operate directly on probability distribution functions
(PDF). Rather than computing a value directly, they compute the probability that a random variable takes any given value.
These methods can be useful in developing systems with inputs that have an associated uncertainty (such as sensor data),
as they allow for passing that uncertainty through the full system. Many probabilistic algorithms fall into the category of
Bayesian Inference, which make use of Bayes’ Rule to update a prior distribution with new information.
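A minimal sketch of Bayesian Inference as described above is a discrete application of Bayes' Rule; all probabilities below are invented for illustration:

```python
# Discrete Bayesian update: a prior belief about a sensor fault is
# revised after observing an alarm. All probabilities are invented.
prior = {"fault": 0.01, "no_fault": 0.99}
likelihood = {"fault": 0.95, "no_fault": 0.02}  # P(alarm | state)

# Bayes' Rule: posterior is proportional to likelihood * prior.
unnormalized = {s: likelihood[s] * prior[s] for s in prior}
evidence = sum(unnormalized.values())
posterior = {s: p / evidence for s, p in unnormalized.items()}

print(round(posterior["fault"], 3))  # 0.324
```

Note how a rare fault (prior 1%) becomes much more probable, but far from certain, after a single uncertain observation.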
Search algorithms handle retrieving data or finding connections in a given data structure. This includes applications such
as finding an optimal path between two nodes on a graph or identifying a subset of items within the structure that fulfils
certain criteria.
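As an illustrative sketch of a search algorithm, the following Python code finds an optimal (fewest-edge) path between two nodes of a graph using breadth-first search; the graph itself is invented:

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search: returns a fewest-edge path between two
    nodes of a directed graph, or None if no path exists."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in graph.get(path[-1], []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

# Invented connection graph between five nodes.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
print(shortest_path(graph, "A", "E"))  # ['A', 'B', 'D', 'E']
```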
Direct approaches can best be described by not being iterative approaches; they constitute a predetermined set of
operations independent of the input data. Direct approaches cover a wide range of algorithms from algorithms with exact
solutions (such as linear systems or Kalman filters) to algorithms that simply change the representation of data (such as
convolutions). Many algorithms in both traditional and learning based computer vision fall in the category of direct
approaches.
The ML field is crowded with multiple learning strategies and algorithm families with no discernible advantage among
alternatives if one makes absolutely no assumption about the training data (this is known in the AI R&D community as the
No Free Lunch theorem [8], [52], [53]). The generally accepted classification of ML algorithms is: supervised learning,
unsupervised learning, semi-supervised learning, and reinforcement learning.
With Supervised Learning, the main goal is to train a model with labelled data, enabling the model to make predictions on
new unseen data. The training data comprises samples, where the desired output (labels) are known and correct, for a
given input.
In Classification problems, the model assigns a categorical class label to each sample. An example of Classification is
image recognition, where physical objects can be classified as cars, people, signs, buildings, and so on.
A second type of Supervised Learning is the prediction of continuous outcomes, referred to as Regression. An example of
regression is predicting the remaining useful life of an engine hardware component, given explanatory variables such as
number of flight cycles, hours of operation, and operating temperature distribution [50], [54], [55], [56].
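A minimal regression sketch, assuming invented data in which remaining useful life decreases linearly with flight cycles, is ordinary least squares with a single explanatory variable:

```python
def fit_line(xs, ys):
    """Ordinary least squares with one explanatory variable: returns
    the (slope, intercept) minimizing squared prediction error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

# Invented data: remaining useful life (hours) vs. flight cycles.
cycles = [100, 200, 300, 400, 500]
rul = [900, 800, 700, 600, 500]
slope, intercept = fit_line(cycles, rul)
print(slope, intercept)  # -1.0 1000.0
```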
In contrast to Supervised Learning, where the right answer is already known for a number of samples, Unsupervised
Learning deals with unlabeled datasets, or data of an unknown structure. Unsupervised algorithms learn underlying
structures in the training set and use these patterns to make predictions. In Clustering, a common Unsupervised Learning
technique, an amorphous pile of information can be sorted into meaningful subcategories (clusters), using features that may
not be immediately discernable by humans. Typical unsupervised learning tasks include anomaly detection, visualization,
dimensionality reduction, and association rule learning [50], [57], [58], [56].
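Clustering can be sketched with a tiny one-dimensional k-means; the data below is invented and the algorithm is deliberately minimal:

```python
import random

def k_means(points, k, iters=20, seed=0):
    """Tiny one-dimensional k-means: sorts unlabeled values into k
    clusters by alternating assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:  # assign each point to its nearest centroid
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # update each centroid to the mean of its assigned points
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

# Two obvious groups; the algorithm discovers them without labels.
data = [1.0, 1.2, 0.8, 9.8, 10.0, 10.2]
print(k_means(data, 2))  # centroids near [1.0, 10.0]
```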
In Semi-Supervised Learning, the ML algorithm is capable of learning from data that is partially labelled. Most semi-supervised learning algorithms use
unsupervised learning approaches to improve supervised learning solutions in dealing with problems such as insufficient
labelled data, curse of dimensionality, feature engineering, outliers, and data drift.
In Reinforcement Learning, the goal is to produce a Machine Learning algorithm (agent) that improves its performance
interacting with its environment and observing changes brought about by the agent’s own actions. The learning objective is
to determine an action-selection policy that maximizes a numeric reward signal. The agent must discover which actions
yield the most reward by trying them. The trial-and-error search and the learner’s need to consider delayed rewards are the
two most important distinguishing features of Reinforcement Learning.
A classic example of Reinforcement Learning is where the objective, such as the safe landing of a spacecraft without
crashing, is well defined. The algorithm uses repeated attempts to determine the optimal set of maneuvers and fuel usage
to best achieve the objective [60], [61], [56].
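The trial-and-error search and delayed reward described above can be sketched with tabular Q-learning on an invented one-dimensional corridor (Q-learning is one common Reinforcement Learning algorithm, not the only one):

```python
import random

# Tabular Q-learning on an invented 1-D corridor: states 0..4, with a
# reward only at the far right end. The delayed reward must be
# discovered by trial-and-error interaction with the environment.
N_STATES = 5
ACTIONS = (-1, +1)                     # move left / move right
alpha, gamma, epsilon = 0.5, 0.9, 0.2  # learning rate, discount, exploration
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
rng = random.Random(0)

for _ in range(500):                   # learning episodes
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy selection: mostly exploit, sometimes explore
        if rng.random() < epsilon:
            a = rng.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: q[(s, act)])
        s_next = min(max(s + a, 0), N_STATES - 1)
        reward = 1.0 if s_next == N_STATES - 1 else 0.0
        best_next = max(q[(s_next, act)] for act in ACTIONS)
        q[(s, a)] += alpha * (reward + gamma * best_next - q[(s, a)])
        s = s_next

policy = [max(ACTIONS, key=lambda act: q[(s, act)])
          for s in range(N_STATES - 1)]
print(policy)  # the learned policy moves right (+1) in every state
```

The agent never sees a reward until it stumbles onto the goal state; the action-selection policy that maximizes the reward signal emerges only through repeated trials.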
To summarize the types of learning and their applications, a bubble chart was designed. The intent was not to provide an
exhaustive list of applications for each type of learning, a task that is impossible considering the variety of possible
applications of Machine Learning. Instead, this chart provides only a few examples of technical applications (smaller
bubbles) for each of the types of learning (medium-size bubbles).
Artificial Neural Networks (ANNs) are a Machine Learning modelling technique that was originally inspired by the networks
of neurons in human brains but gradually evolved apart from its biological analogy. Suffice it to say, the
development of powerful graphical processing units (GPUs), availability of huge amounts of training data, improvements in
ANN training algorithms and programming APIs, and a “virtuous circle of funding and progress” [5] are leading to an
accelerated evolution in the development of ANN-based products.
Original ANN architectures include the basic Perceptron and their layered combinations called Multi-Layer Perceptrons
(MLPs). Fundamentally, an MLP is “just a mathematical function mapping some sets of input values to output values” [8].
Structurally, MLPs are dense networks of interconnecting layers of simpler perceptron units. The signal in MLPs flows only
in one direction from the input layer towards the output layer passing sequentially through one or more hidden layers. The
output of a unit in any given layer in an MLP network is a linear combination of the output of the units of the previous layer
in the sequence. Training an MLP then consists in determining appropriate connection weights and respective bias terms
for all units in the network. Nonlinear activation functions are interposed between consecutive layers to increase the
expressiveness of the network.
The ANN is called a Deep Neural Network (DNN) when it has a deep stack of hidden layers, as opposed to Shallow Neural
Networks (SNN) that have only a few hidden layers. There is no consensus about how much depth an ANN requires to be
classified as a DNN. Ian Goodfellow, Yoshua Bengio, and Aaron Courville, however, state in their seminal book [8] that
“deep learning can be safely regarded as study of models that involve a greater amount of composition of either learned
functions or learned concepts than traditional Machine Learning does.”
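The forward pass described above (linear combinations of the previous layer's outputs, interposed with nonlinear activations) can be sketched as follows; the weights and biases are arbitrary illustrative values, not trained:

```python
import math

def mlp_forward(x, layers):
    """Forward pass of an MLP: each layer computes a linear combination
    of the previous layer's outputs (weights plus a bias term), then
    applies a nonlinear activation (tanh here)."""
    for weights, biases in layers:
        x = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x

# One hidden layer (2 inputs -> 3 units) and one output layer (3 -> 1).
# These weights are arbitrary; training would determine them.
layers = [
    ([[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]], [0.0, 0.1, -0.1]),
    ([[0.7, -0.5, 0.2]], [0.05]),
]
print(mlp_forward([1.0, 2.0], layers))
```

The signal flows in one direction only, from input layer to output layer, matching the feed-forward structure described above.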
The workflow illustrated in Figure 6 is proposed to support the discussion in the following sections.
System Definition:
The system requirements, allocated to the ML system, are captured and validated.
The system architecture is also defined at this stage and corresponding safety assessment and analyses are conducted to
identify the relevant system safety and security requirements and objectives (quantitative, qualitative) to be flowed down to
the other phases.
Data Selection and Validation:
In this phase, the data should be selected and collected based on the specific problem domain. The data should exhibit a
distribution that includes the proper level of variance. The data is cleaned and/or transformed (using several techniques),
with respect to specific expected quality attributes, and validated against requirements established in the previous phase.
Each data sample is allocated to either the training, validation, or test set.
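The allocation of samples can be sketched as a simple shuffled split; the 70/15/15 fractions below are illustrative, not prescribed by this document:

```python
import random

def split_dataset(samples, val_frac=0.15, test_frac=0.15, seed=0):
    """Allocate every sample to exactly one of the training,
    validation, or test sets after shuffling."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)
    return (shuffled[n_val + n_test:],          # training set
            shuffled[:n_val],                   # validation set
            shuffled[n_val:n_val + n_test])     # test set

train, val, test = split_dataset(list(range(100)))
print(len(train), len(val), len(test))  # 70 15 15
```

The three sets are disjoint, so performance measured on the test set is not influenced by samples seen during training or optimization.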
In this phase, the model is selected and trained with the Training dataset and then optimized with the Validation Dataset.
At the end of this phase the trained model is tested with the Test dataset, to assess the performance regarding the system
requirements.
ML Sub-System Implementation:
In this phase the tested trained model is implemented (Design and HW/SW integration) into the target environment.
ML Sub-System Verification:
In this phase the correct implementation of the tested trained model is verified. The verification could be based on the test
dataset in addition to the other test inputs established for the purpose of the implementation verification.
In this phase, the step by step integration within the overall system is performed and corresponding verification activities
are conducted.
Operation:
In this phase, the overall system containing the ML system is released to field and operated. The behavior of the ML system
is monitored, and the operating data are compiled into an operational dataset which is fed back to Data Selection and
Validation to augment the existing datasets.
4.1 Introduction
4.1.1 Description
There is a concern that existing development assurance standards may not be appropriate for AI/ML solutions. Existing
development assurance standards require a predefined set of activities and demonstrations used later by certification
authorities as evidence of compliance. There is a broad consensus that the currently accepted means of compliance in use for
systems, software, and hardware fail to provide appropriate assurance for some specific AI/ML techniques. These
techniques use methodologies that are fundamentally different from the generic life-cycle assumed by the existing
development assurance standards.
Although these techniques are not governed by a strict methodology and could be assessed in various ways, this section
focuses on a predefined hypothetical development scenario. It does not intend to cover all possible scenarios and
possibilities, but rather tries to highlight the main gaps identified in a factual way among existing development assurance
standards in use today. It is important to note that this is not a guidance on how to certify a system using an AI sub-system.
Many published standards are considered in this section and some of them have been subject to a deeper analysis. The
choice of standards assessed is led by the desire to cover at least the system, software, and hardware aspects of ground and
airborne development assurance standards.
In order to perform the gap analysis, this section employs a methodology whereby two aspects of a development assurance
standard’s objectives are assessed. The first aspect, the objective applicability, relates to how the objective is relevant to
the envisioned life cycle of AI development. The second aspect, the objective sufficiency, relates to how the guidance
provided in the standard is relevant to the envisioned life-cycle of AI development, i.e., it is clear how to apply the guidance
and it is compatible with the envisioned life-cycle activities.
In order to limit the scope of the gap analysis, this section describes a hypothetical development scenario that represents a
common ML-based system development life cycle. It also attempts to identify and map where ML-based system
development activities, described in 3.2, could be performed within actual requirement-based development assurance
standards.
Numerous ML techniques exist in the literature, and each of them has its own particularities impacting development
activities. This development scenario focuses on a classical feed-forward Neural Network (NN).
1. The system requirements allocated to the ML system are captured and validated. The system architecture is also
defined at this stage and corresponding safety assessment and analyses are conducted to identify the relevant system
safety and security requirements and objectives (quantitative, qualitative) to be flowed down to the other phases.
2. The data selection phase, where data is gathered, cleaned, and selected with respect to the requirements defined in the
system definition phase. Data is selected for the purposes of training and validation of the NN. The data needs to be
sufficient to confirm whether the functional requirements have been satisfied.
3. The model selection phase consists of the choice of the NN architecture. It defines the number of layers, number of
nodes for each layer and their corresponding activation function.
4. The offline training and testing of the NN is performed on a separate host/system. The training activity defines the
NN weight values using the training dataset and performs optimization of these weights using the validation dataset with
respect to the acceptance criteria defined in phase 1.
5. After the offline training, the trained model is tested using the test dataset with respect to the acceptance criteria defined
in phase 1 by executing the acceptance scenarios.
6. The implementation phase focuses on the software and/or hardware development activities. It consists of designing,
implementing, and verifying the trained NN in the target environment. A trained NN has passed acceptance criteria and
represents system requirements allocated to software or hardware implementation. Special cases will be considered
where it may apply, particularly in a model-based approach.
7. In this phase, the step-by-step integration within the overall system is performed and corresponding verification activities
are conducted.
ARP4754A/ED-79A discusses the development of aircraft and aircraft systems taking into account the overall aircraft
operating environment and functions. This includes validation of requirements and verification of the design implementation
for certification and product assurance. It provides practices for showing compliance with the regulations and serves to
assist an organization in developing and meeting its own internal standards. ED-79 and ARP4754 also invoke and interact
with the safety assessment process of ARP4761.
The gap analysis has been performed considering Machine Learning techniques, and more precisely Neural Networks
(although the results can be generalized to other ML techniques).
The whole scope of the standard has been considered, however, as described in the scenario of 4.2, the use of AI/ML was
assumed to be primarily at system or item level.
• Development Assurance Level: The concept of development assurance level (DAL) remains relevant. Nevertheless, its
current interpretation for Machine Learning remains to be defined, and cannot be based on DO-178C/ED-12C nor
DO-254/ED-80 processes (cf. gap analysis on DO-178C/ED-12C and DO-254/ED-80.)
• System requirements: The requirements definition approach may need some adaptations for Machine Learning. In
particular, the fact that some of the requirements are implicitly contained in the dataset may require new methods for
evaluating the correctness and completeness of a dataset to the requirements. A specific focus on dataset requirements
and AI/ML performance requirements associated with the DAL may also be needed.
• Requirements validation: New methods for requirements validation could be needed for Machine Learning, because of
the potential changes in the requirements capture philosophy. These new methods should be considered in addition to
the ones defined in the ARP4754A.
• Implementation verification: new methods for implementation verification could be needed for Machine Learning in order
to cope with the probabilistic nature of ML-based algorithms and because of the potential changes in the requirements
capture philosophy.
4.3.2 ARP4761 Guidelines and Methods for Conducting the Safety Assessment Process on Civil Airborne Systems and
Equipment
This document describes guidelines and methods of performing the safety assessment for certification of civil aircraft. It is
primarily associated with showing compliance with FAR/JAR 25.1309. These tools and methods cover both system and
aircraft level safety assessment. The overall aircraft operating environment is considered.
No detailed gap analysis has been performed on this standard. A review led to the conclusion that the processes/principles
for safety assessment should not be drastically impacted by the use of AI or ML techniques. However, some additional
safety methods may be needed to address specifics of AI/ML.
The purpose of the ED-12C/DO-178C standard is to provide guidance for the production of software for airborne systems
and equipment that performs its intended function with a required level of confidence in safety. The standard includes:
• Descriptions of the evidence in the form of software lifecycle data and variations of objectives by software level
The review was made considering the scenario described in 4.2. Note that special cases may benefit from defining the
NN as a whole model instead of only an algorithm and its coefficients. Indeed, ED-218/DO-331, the ED-12C/DO-178C
supplement for model-based development, addresses this particular case and is examined in the next section.
With respect to the scenario, the flow down of the ML model coefficients and weights is equivalent to the normal practice of
equations and algorithms flowed to the software process through requirements. As such, the majority of the
ED-12C/DO-178C processes can be executed without major impact. However, there may be gaps which have been
identified:
1. Based on the hypothesis developed in 4.2, the high-level software requirements capture process foreseen in
ED-12C/DO-178C might not be occurring at the level of the software item but rather originate from the system
development processes. The input to the software design processes would not be a set of functional requirements but
rather the output of the training process, e.g., a NN structure and associated weights and biases.
2. The NN structures and weights cannot be traced to the system requirements from which they are developed (neither to
the system textual requirements describing the expected properties nor to the training datasets). There is no guidance
in this standard and its related documents on how to handle traceability between NN structure and weights and system
textual requirements and training datasets.
3. The ED-12C/DO-178C verification methods are not appropriate to training datasets and NN weights:
a. The properties relevant to design assurance of datasets are fundamentally different from the properties of high level
or low level requirements.
b. NN weights are not comprehensible by humans, so manual review and analysis for compliance to parent
requirements is not credible.
4. Requirements-based testing may not be possible in the traditional manner because typical NN techniques (training,
validation, and testing phases) do not fit with the requirements-based verification approach per ED-12C/DO-178C.
a. In particular, verification on NN that relies on traceability cannot be achieved (the coverage metrics for NN structure
and weights are not established).
b. DO-178C testing strategy calls out an equivalence classes concept which may not apply for NN implementation
due to the highly complex and non-linear nature of a regular size NN. It may be practically impossible to identify
equivalence classes for a NN algorithm.
5. The structural coverage measurement metrics for source code may not be effective because any activation of the NN
would trigger activation of all nodes to some extent. Hence, there is no value in performing a classical structural
coverage approach.
6. Guidance for software planning process is applicable, but there is no specific guidance for NN based systems.
7. Additional consideration is needed for the usage of NN weights as Parameter Data Items (PDI). The NN weights are inherently
different from a traditional configuration file. Using PDI for NN weights would modify the behavior of the approved
configuration and thus represents a serious gap.
ED-218/DO-331 is a supplement to ED-12C/DO-178C that provides guidelines to produce airborne software using
model-based techniques. ED-218/DO-331 defines a model as:
”An abstract representation of a given set of aspects of a system that is used for analysis, verification, simulation, code
generation, or any combination thereof. A model should be unambiguous, regardless of its level of abstraction.”
NOTE 1: If the representation is a diagram that is ambiguous in its interpretation, this is not considered to be a model.
NOTE 2: The “given set of aspects of a system” may contain all aspects of the system or only a subset.
This standard does not remove or eliminate the ED-12C/DO-178C objectives, but rather provides additional guidance and
objectives specific to a model-based development lifecycle. Such lifecycles may share some similarities with our
hypothetical AI development scenario.
Readers of ED-218/DO-331 may conclude that in some instances, NN architecture and weights could be considered a
design model that is shared between the system and the software engineering teams. Also, acceptance tests performed at
the system level can be considered as simulation cases executed on the target software under development.
Further refinements are needed to our hypothetical AI development scenario to address the ED-218/DO-331 specific
aspects of model-based software development.
1. The NN model is shared between the system level (learning phase) and the software level (implementation, integration,
and tests of the Executable Object Code on the target). Note: the acceptance scenarios may be considered as the
simulation cases of DO-331. Some acceptance scenarios may need to be executed with the final sensors, hardware,
and software in non-simulated environments.
2. The textual requirements that define expected properties at the system level, as well as the acceptance criteria of the
NN, are considered part of the system textual requirements from which the NN design model is developed.
With these assumptions, the development and verification activities of the software model, and requirements from which
the software model is developed, are implemented at the system level, but nevertheless all ED-218/DO-331 objectives
applicable to software models should be satisfied (as described in the Note 1 in ED-218/DO-331 MB.1.6.3).
In practice, ED-12C/DO-178C and ED-218/DO-331 are used in conjunction. Therefore, all the previous gaps identified for
ED-12C/DO-178C are still valid. Identified additional gaps pertaining to ED-218/DO-331 are as follows:
1. The NN structures and weights can only be traced to their parent requirements as “many to many.” There is no guidance
in the standards on how to handle traceability between NN structure and weights, and system textual requirements and
training datasets.
2. The DO-331 verification methods are not appropriate for training datasets and NN weights:
a. The properties of datasets are fundamentally different from the properties used in review and analysis, e.g.,
verifiability, completeness, etc.
b. NN weights are not comprehensible by humans, thus traditional review/analysis is not credible.
3. The traceability objectives may not be relevant to traceability between acceptance scenarios and the system textual
requirements, because the acceptance scenarios for NN are typically highly complex and cover multiple requirements.
Without traceability, it may not be possible to provide confidence in detecting the presence of unintended behaviors in
the design model.
4. Requirements-based testing may not be possible in the traditional manner because typical NN techniques (training,
validation, and testing phases) do not fit with the requirements-based verification approach per ED-218/DO-331.
a. In particular, verification of a NN that relies on traceability cannot be achieved (the coverage metrics for NN
structure and weights are not established).
5. The structural coverage measurement metrics may not be effective, as any activation of the NN would trigger activation
of all nodes to some extent. Thus, there is no value in performing a classical structural coverage approach.
ED-215/DO-330 provides guidelines for qualification of tools used for the development of safety-critical systems per
ED-12C/DO-178C or other safety-related standards.
AI can be used for tool development. With the limited timeframe allocated for the gap analysis, ED-215/DO-330 was
excluded from the analysis scope.
ED-216/DO-333 is a supplement to ED-12C/DO-178C, which provides guidelines to produce airborne software using Formal
Methods techniques. Formal Methods are mathematically based techniques for the specification, development, and
verification of software aspects of digital systems.
Formal methods may be applied for the verification of AI. With the limited timeframe allocated for the gap analysis,
ED-216/DO-333 was excluded from the analysis scope.
ED-217/DO-332 is a supplement to ED-12C/DO-178C which provides guidelines to produce airborne software using
Object-oriented and related techniques.
No additional gaps regarding ML/AI were found beyond the ones identified in 4.3.4.2.
The use of increasingly complex electronic hardware for safety critical aircraft functions generates new safety and
certification challenges. DO-254/ED-80 provides design assurance guidance for the development of airborne electronic
hardware such that it safely performs its intended function in its specified environments. The guidance is conveyed through
recommended activities that should be performed from the hardware’s conception through initial certification in order to
meet design assurance objectives.
The review was made mainly considering the hypothetical AI development scenario described in 4.2. In addition to the
general assumptions already defined, the following assumptions have been applied to this analysis:
• The guidance was only assessed against the development of a Programmable Logic Device (PLD) based H/W
implementation.
The flow down of the ML model, including coefficients and weights, could be seen as equivalent to the implementation of
equations or algorithms flowed to the Field Programmable Gate Array (FPGA) process through requirements. However, the
gaps inherent to traditional development assurance approaches are also applicable:
1. Based on the hypothesis developed in 4.2, the requirements capture process foreseen in ED-80/DO-254 might not be
occurring at the level of the hardware item but rather originate from the system development processes. The input to
the hardware design processes (conceptual design and detailed design) would not be a set of functional requirements,
but rather the output of the training process, e.g., a NN structure and associated weights and biases.
2. The NN structures and weights cannot be traced to the system requirements from which they are developed. There is
no guidance in this standard on how to handle traceability between NN structure and weights and system textual
requirements and training datasets.
3. The ED-80/DO-254 validation and verification methods may not be applicable to training datasets and NN weights:
a. The properties of datasets are fundamentally different from the properties used in traditional design assurance (DA)
review and analysis.
b. NN weights are not comprehensible by humans. Thus, traditional review/analysis cannot be used.
4. Requirements-based verification may not be possible in the traditional manner because typical NN techniques (training,
validation, and testing phases) do not fit with the requirements-based verification approach per ED-80/DO-254.
5. The elemental analysis method, which drives detailed design coverage metrics for HDL code, may not be effective, as
any activation of the NN would trigger activation of all nodes to some extent. Thus, there is no value in performing a
classical design coverage or elemental analysis approach.
4.3.9 ED-109A/DO-278A Software Integrity Assurance Considerations for Communication, Navigation, Surveillance, and
Air Traffic Management (CNS/ATM) Systems
Since ED-109A/DO-278A is very close in terms of objectives to ED-12C/DO-178C, the group has made the assumption that
the main outcome of the ED-12C/DO-178C gap analysis will apply to ED-109A/DO-278A. ED-109A/DO-278A has an
assurance level AL4 that has no equivalent in ED-12C/DO-178C. Additionally, ED-109A/DO-278A contains a section
(12.4) devoted to COTS SW, which has no correspondence in ED-12C/DO-178C either. These differences should be
addressed during the definition of the future standard.
The objective of ED-153 is to offer guidance on how to ensure that the risk associated with deploying the software is
reduced to a tolerable level by providing:
• Recommendations and requirements on the major processes necessary to provide safety assurance for software in Air
Navigation Service (ANS) systems (“ground” only).
• A recommended ANS Software lifecycle and its associated activities in support of achieving the identified objectives.
• The ML software was considered as being either standalone software or a component/module of a larger software product.
• In regard to “Sufficiency” (see 4.1.2), ED-153 is composed of sets of objectives relating to software (SW) processes.
The only way it provides guidance is by referencing other standards (refer to ED-153 P.7). These standards were not
part of our review. However, whenever identified, insufficiency of guidance is recorded and part of the gap analysis
summary below.
Scope
• General review of “classical” objectives, linked to primary, supporting, and organizational lifecycles (CH4, CH5, CH6).
Out of Scope
• Online learning is out of scope of the present review. Only pre-trained systems (offline ML) are in scope.
• The Software Assurance Level (SWAL) concept fully applies to “classical” SW but does not fully apply to learning
datasets and AI based SW:
By “classical” SW, we mean SW whose behavior is explicitly described through several description levels, such as
software requirements, architectural design, and detailed design (including algorithms), which are then translated into a
target coding language. With AI (ML) components, these artefacts may not be available. Additionally, learning datasets
are not SW and cannot be allocated a SWAL.
• SWAL4 objectives are fully applicable to an AI based SW, while SWAL1-3 objectives are partially applicable or not
applicable to an AI based SW:
This is because (1) requirements can be specified using the outcome of the Machine Learning pipeline (done at system
level), while artifacts such as the detailed software design cannot, and (2) the depth of analysis, verification, and
evidence requested for SWAL4 is at the SW requirement level (i.e., black box), while it is much more stringent for
SWAL1-3.
• SW development processes are adapted to “classical” SW, not to learning datasets and potentially not to SW resulting
from learning methods.
• There is a need to clarify what is considered as a configuration item in the context of AI based systems.
For example, most verification activities related to the detailed design may not be achievable, as the detailed design is
not available/explainable.
• Quality audits down to source code level and executable level activities are not fully applicable to AI based systems.
The main objective of this regulation is to lay down common requirements for the provision of ATM and air navigation
services and other ATM network functions for general air traffic, and for the competent authorities, which exercise
certification, oversight, and enforcement tasks.
This regulation addresses the approval process for changes to a functional system. A functional system is a combination of
procedures, human resources, and equipment (including HW and SW), organized to perform a function within the context
of ATM/ANS and other ATM network functions. All service providers need to assess changes they make to their functional
system.
A full gap analysis has not been done yet. However, an initial analysis shows that the main gaps are similar to those found
in EUROCAE ED-153 analysis.
4.3.12 ISO 12207 Systems and Software Engineering - Software Life Cycle Processes
ISO/IEC/IEEE 12207 is an international standard for software lifecycle processes. First introduced in 1995, it aims to be a
primary standard that defines all the processes required for developing and maintaining software systems, including the
outcomes and/or activities of each process.
This standard shares many concepts that are similar to ED-12C/DO-178C but is intended for the software industry domain
in general. With the limited timeframe allocated for the gap analysis, it was considered as not essential and excluded from
the analysis scope.
The ISO 26262 series of standards is the adaptation of IEC 61508 for road vehicles. This adaptation applies to the activities
during the safety lifecycle of safety-related systems comprised of electrical, electronic, and software components. ISO 26262
includes guidance to mitigate risks from software systematic failures and random hardware failures by providing appropriate
requirements and processes.
This standard is conceptually similar to the suite of aeronautical standards (ARP4761, ARP4754A, ED-12C/DO-178C, and
ED-80/DO-254) but less detailed and intended for the automotive domain. With the limited timeframe allocated for the gap
analysis, it was considered as not essential and excluded from the analysis scope.
For some systems, which rely on sensing the external or internal environment, there can be potentially hazardous behavior
caused by the intended functionality or performance limitation of a system that is free from faults. Examples of such
limitations include: (1) the inability of the function to correctly comprehend the situation and operate safely; this also includes
functions that use Machine Learning algorithms; (2) an insufficient robustness of the function with respect to sensor input
variations or diverse environmental conditions. ISO 21448 is intended to address the absence of unreasonable risk due
to the potentially hazardous behaviors related to such limitations.
Although this standard has not been reviewed against the scenario described in 4.2, it seems to present interesting concepts
that can be relevant to AI/ML. Among these concepts, it covers aspects that fall outside functional safety: hazards that
cannot be traced to functional failures, especially in hard-to-specify environments. This standard may be considered as
an interesting input for future works.
There are many standards and guidance documents which address the responsibility that each organization has for its own
information security, dealing with internal systems, processes, products, and data. However, this guidance looks beyond
the individual organization at information security related to systems and processes, and to products and data, in a wider
context. This guidance concentrates on the shared information risk, which is inherent in the situation where systems,
processes, products, or data are shared, or are passed from one organization to another. There are varying degrees of
sharing and exchange in these situations, but any sharing or exchange causes additional risk to an organization. This
guidance is most applicable to the larger risks, which affect safety or where there are significant implications for service
delivery. Smaller risks (assuming they have been correctly assessed) require less effort, particularly in mitigations, and this
guidance should be interpreted accordingly.
The guidance in ED-202A adds to current guidance for aircraft certification to handle the threat of intentional unauthorized
electronic interaction to aircraft safety. It adds data requirements and compliance objectives, as organized by generic
activities for aircraft development and certification, to handle the threat of unauthorized interaction to aircraft safety and is
intended to be used in conjunction with other applicable guidance material, including ED-79A/ARP4754A,
ED-135/ARP4761, ED-12C/DO-178C, and ED-80/DO-254 and with the advisory material associated with FAA AC 25.1309
and EASA AMC25.1309, in the context of Part 25 and CS-25. Tailoring of this guidance may allow it to be applicable in
other contexts such as CS-23, CS-27, CS-29, CS-E, Part 23, Part 27, Part 29, and Part 33.
ED-202A guidance material is for equipment manufacturers, aircraft manufacturers, and anyone else who is applying for an
initial Type Certificate (TC), and afterwards (e.g., for Design Approval Holders (DAH)), Supplemental Type Certificate (STC),
Amended Type Certificate (ATC), or changes to Type Certification for installation and continued airworthiness for aircraft
systems, and is derived from understood best practice.
This document provides guidance on a set of methods and guidelines for applicants implementing an Airworthiness Security
Process as specified in ED-202A/DO-326A to address information security for certification of aircraft and its systems. More
specifically, it addresses the activities in the areas of security risk management and security assurance. Applicants and
authorities should consider these methods, and alternative practices if and when they are proposed. Those aspects of
information security that have no safety effect are not in the scope of ED-203A.
Airworthiness security is the protection of the airworthiness of an aircraft from intentional unauthorized electronic interaction.
Intentional unauthorized electronic interaction (also known as "unauthorized interaction" within the scope of ED-203) is
defined as human-initiated actions with the potential to affect the aircraft due to unauthorized access, use, disclosure, denial,
disruption, modification, or destruction of electronic information or electronic aircraft system interfaces. This definition
includes the effects of malware on infected devices and the logical effects of external systems on aircraft systems but does
not include physical attacks or electromagnetic jamming.
The guidance provides methods and considerations for securing airworthiness during the aircraft life cycle. It was developed
as a companion document to ED-202A/DO-326A "Airworthiness Security Process Specification" which addresses security
aspects of aircraft certification and ED-204/DO-355 Information Security Guidance for Continuing Airworthiness which
addresses airworthiness security for continued airworthiness.
There was no gap found regarding ML/AI outside references to objectives linked to development process standards.
This document provides guidance for the operation and maintenance of aircraft and for organizations and personnel involved
in these tasks. It shall support the responsibilities of the Design Approval Holder (DAH) to obtain a valid airworthiness
certificate and the responsibilities of aircraft operators to maintain their aircraft, and to demonstrate that the effects of
information security threats on the safety of the aircraft are confined within acceptable levels. As all information security
threats may have an intentional origin, ED-204 also covers electronic sabotage (as used in AMC 25.1309).
ED-204 is a resource for civil aviation authorities and the aviation industry when the operation and maintenance of aircraft
and the effects of information security threats can affect aircraft safety. It deals with activities that need to be performed in
operation and maintenance of the aircraft related to information security threats.
ED-204 also gives guidance that is related to operational and commercial effects (i.e., guidance that exceeds the safety-only
effects). Thus, it also supports harmonizing security guidance documents among Design Approval Holders (DAH), which is
deemed beneficial to DAHs, operators, and civil aviation authorities. The most comprehensive possible area of the
application of this guidance is deemed to be Large Transport Aircraft programs. However, ED-204 does not make any
assumptions about and is without prejudice to its applicability.
There was no gap found regarding ML/AI. Although this statement is true for a generic AI/ML sub-system, gaps can be
identified when the AI/ML sub-system is a security function. A security function may need some additional properties like
diversity, independence, and isolation that could be hard to demonstrate for an AI/ML sub-system.
4.3.19 ED-205 Process Standard for Security Certification and Declaration of ATM ANS Ground Systems
The security of ATM/ANS ground systems, constituents in use and data, is currently being regulated and security
management is a must for providers of ATM/ANS, who need to address risks with a potential impact on safety, operational
delivery, economic concerns, and others. ED-205 guides stakeholders involved in the protection of ATM/ANS ground
constituents.
In this section, several standards have been considered, and a gap analysis against a typical Machine Learning
development life cycle was summarized. The analysis has focused on system, software, and hardware development
standards with on-board and ground considerations in mind.
In performing this work, defining a typical Machine Learning development life cycle was quite challenging, highlighting a
major concern regarding where the elements of the Machine Learning development life cycle should be addressed within
the usual aeronautical requirements structure. Although many possible scenarios could exist and present gaps other than
the ones identified, we focused on a classic end-to-end scenario in which to perform the analysis.
Moreover, among the main gaps identified, requirements traceability, the mapping of Machine Learning model functions
and parameters between aerospace engineering concerns (such as SRATS, SRATH, and high or low level requirements),
and the application, or lack, of verification methods suitable for datasets raised the most concerns. The identified gaps
highlight that the data-driven paradigm of AI/ML may not be adequately addressed by existing standards.
Lastly, the committee recognizes that as the field of AI/ML in aerospace matures, there will be a need to not only address
gaps in industry standards but also gaps in regulatory standards. While addressing these gaps is out of scope for this
document and the standard it intends to inform, the committee does consider the impact of its work on regulatory standards
and liaises with global policy leaders for discussion on the subject. The concept of AI Licensing, as described in 6.2.2.5, is
one example of where the committee’s work includes high-level discussion on policy with global regulators.
This section lists specific considerations and areas of concern for an ML system.
This identification has been conducted considering the workflow proposed in 3.2 and the issues identified in the gap analysis
across Section 4.
The specific considerations and areas of concern described in this section are technical in nature. Ethical and societal
considerations are not addressed.
5.1 Criteria for the Identification of Specific Considerations and Areas of Concerns
The following criteria were considered when identifying the specific concerns for each ML workflow phase.
• Safety
• Configuration Management
• Security
• Reliability
During this phase, the system requirements, allocated to the ML system, are captured and validated.
The system architecture is also defined, and corresponding safety assessments and analyses are conducted to identify the
relevant system safety requirements and objectives (quantitative and qualitative) to be flowed down to the other phases.
• There is a growing consensus in the aviation domain that the capture of requirements for intended functionalities to be
developed within an ML system will still be performed following current methodologies (e.g., ARP4754A/ED-79A). These
reports also acknowledge that in a data-driven ML system, the data on which the ML system is trained is considered
a requirement because it “represents” the expected result and desired behavior of the ML system. The data also
contains some variability that may lead to unintended and unexpected behavior of the ML system.
• In an ML system, it is increasingly understood that the data gathered for training purposes will have the greatest impact
on the operational performance of the system. However, there is not yet an agreed upon approach to assure that the
data gathered is sufficient to train a suitably performant system. In an ML approach, capture of system requirements
will still be necessary, though current approaches would need to be adapted, especially in the case of derived
requirements.
• Requirements definition and validation processes should avoid or detect any requirements that may lead to
non-representative dataset selection for the AI model training phase.
• The probabilistic nature of ML applications should be considered as part of the standard safety process analysis and
relevant architectural mitigation (bounding, voting, diversity, etc.) should be flowed down to the ML application phases.
During this phase, the data is gathered, cleaned, and selected (using any of several established techniques), with respect
to specific expected quality attributes, and verified against the system requirements established in the previous phase. At
this phase, the dataset is divided into Training, Validation, and Test sets.
o Sufficiency (enough data to train the system to the desired level of accuracy)
o Representativeness (regarding the foreseeable environmental conditions of the intended operation, based on data
attributes defined in the requirements set, and with regards to the capabilities of the target environment)
o Balance (enough samples in each class and sufficient diversity among classes)
o Fairness (bias avoidance based on information presented in the set of requirements specifying which data is
selected)
o Timeliness
o Integrity (a degree of assurance that the data and its value has not been lost or altered, in order to avoid misleading
the ML model training and any resulting adverse behavior)
• The above attributes are applicable to all datasets (training, validation, and test) used for the ML-based system design.
• The above attributes must be assured, whatever the source of the data, e.g., real world, synthetic, or augmented.
• The ML-based system should be tested with a test dataset different from the training dataset. The level of independence
between training dataset and test dataset should be considered.
• A key challenge with supervised ML is how well the training datasets are organized to support a good fit of the learning
model, avoiding both overfitting and underfitting.
• The process and activities to manage the different datasets and their data under configuration management should be
defined, including the aspects related to the tracking and management of changes, and the addition of data after the
certification or approval of the ML system.
• For investigation purposes, the incoming in-service operational data should be traceable to their origin.
• Tool qualification should be considered when dataset quality attributes may be altered, by inclusion of synthetic or
augmented data, or by automated labelling. (Specific considerations will have to be established when ML is used for
tool development.)
• The impact of the sampling method (statistical, random, systematic, etc.) used to select the test dataset, on the
representativeness of the data, should be considered.
• The dataset management environment should be secured in order to avoid adversarial or accidental data poisoning
and tampering. Attackers with access to the datasets used to train systems utilizing AI can influence the training process
by tampering with the data or the parameters used by the system. Poisoning attacks involve gradually introducing
carefully designed samples or “perturbations” that avoid setting off alerts, but eventually fool the system into seeing the
changes as natural rather than abnormal. This affects its decision-making capability and produces misleading results.
During this phase, the model is selected and trained with the training dataset and is optimized using the Validation Dataset.
At the end of this phase, the trained model is tested with the test dataset to assess its performance against the system
requirements.
• The training strategies (learning curves study, cross validation, models, feature selection, resampling, random restart,
etc.) that increase generalization should be prioritized to improve the model’s performance on new data.
• The trained model integrity should be evaluated from a safety and security perspective.
• Considerations about how to gain confidence into COTS training framework (HW/SW) should be established.
• Considerations related to the qualification of the tools used in training environments should be established.
• The process should give confidence in the choice of an adequate network model architecture (including model topology,
number of layers, and number of nodes per layer), training dataset, and training techniques. Model properties
(explainability, accuracy, safety, etc.) should be identified when selecting and training the model.
• The level of detail expected for the explanation of the model should be identified according to the expected level of
safety required by the ML system.
• The impact of re-training the ML system on testing and certification activities is to be considered, especially with respect
to potential credit for previous training and testing activities.
• The training process should ensure performance repeatability per ML system functional and performance requirements.
• The training environment should be secured in order to avoid adversarial or accidental data poisoning and tampering
during the training. Evasion attacks modify input data so that the system cannot correctly identify the input,
misclassifying data and effectively rendering the system unavailable. One well-cited example involves fooling image
processing systems into incorrectly identifying images (for instance, a traffic stop sign is mistakenly detected as a speed
limit sign).
• The training process should guard against common design pitfalls, including models that lead systems utilizing AI to
make bad decisions by overfitting or underfitting the decision-making model to the data analyzed. Modelling the
available data too closely leads to overfitting, while not matching it closely enough leads to underfitting. This means
that the system produces either too-specific or too-general decisions, rather than providing the right balance between
certainty and doubt.
• The use of pre-trained models is a current practice for saving time or getting better performance in the training phase.
Considerations related to the validation of these previously trained or COTS models should be established.
• Re-training/re-validation strategies of an ML system with in-service collected data and the frequency of retraining should
be considered as part of training phase activities.
• The trained model and its weights and hyper-parameters should be managed as part of the configuration of the system.
During this phase, the tested and trained model is implemented (including design and HW/SW integration) into the target
environment.
• The implementation of SW robustness strategies is made difficult due to the lack of capability to specify abnormal or
invalid input/output data.
• The implementation of the system utilizing AI using common means of compliance for traditional SW or HW systems
based on traceability to requirements is made difficult because machine-learnt models, such as neural networks,
cannot be traced.
• Differences between executing the model implementation in the target environment and executing the model on
the Model Selection, Training, and Testing environment are to be identified, including data representation, resources,
performances, non-functional aspects, system integration aspects, and other relevant areas.
• Instrumentation for multiple purposes should be considered: (1) to ease verification and explainability (the extent to which cause and effect can be observed within a system), and (2) to record ML system behaviors in support of post-operation analysis.
• The level of detail expected for the explanation of the implementation of the trained model should be identified according
to the expected level of safety allocated to the ML system.
• Detection of unexpected features, introduced at the time of the implementation, cannot be supported by traceability and
human verification as in traditional software development, so an alternate approach should be established.
During this phase, the correct implementation of the tested and trained model is verified. The verification could be based on
the test dataset in addition to the other test inputs established for the purpose of the implementation verification. The test
data established to verify the correctness of the model training can be used to verify the implementation of the ML
sub-system.
• Verification completion criteria should not only show that requirements are met, but should also establish confidence that the potential for unintended behavior is minimized. The structural coverage criteria defined for traditional software are not usable for ML system testing activities. Dedicated systematic criteria (input data space coverage, node coverage, etc.) should be established.
• Implementation and verification of “play back” features to analyze, verify, and validate results should be considered.
• Difficulty of establishing completeness of verification of ML models without the use of automated and qualified tools
should be considered.
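The node-coverage criterion named above can be sketched as follows (the network, its random weights, and the activation threshold are illustrative assumptions, not prescribed by this document): coverage is the fraction of hidden neurons activated by at least one test input.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 2-layer ReLU network with fixed random weights,
# standing in for a trained ML sub-system under test.
W1, b1 = rng.normal(size=(4, 10)), rng.normal(size=10)

def hidden_activations(x):
    """Hidden-layer activations of the toy network for one input."""
    return np.maximum(0.0, x @ W1 + b1)

def node_coverage(test_inputs, threshold=0.0):
    """Fraction of hidden neurons activated above the threshold by
    at least one test input (a simple neuron-coverage criterion)."""
    acts = np.stack([hidden_activations(x) for x in test_inputs])
    covered = (acts > threshold).any(axis=0)
    return float(covered.mean())

tests = rng.normal(size=(50, 4))
cov = node_coverage(tests)
print(f"node coverage: {cov:.0%}")
```

A low coverage value would indicate that parts of the network were never exercised by the test set, analogous to unexecuted code in traditional structural coverage.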
During this phase, the step-by-step integration with the overall system is performed and the corresponding verification activities are conducted.
• During the integration of the ML sub-system into the overall system, the verification of the architectural mitigation as
identified in 5.2.1 is to be performed.
5.2.7 Operation
• During this phase, the overall system containing the ML subsystem or component is released to the field and operated.
The behavior of the system is monitored and the operating data is compiled into an operational dataset which is fed
back to the data selection and validation phase and potentially used to augment the existing datasets. In-service
operational data (input data, corresponding ML system prediction data, observed error rates, and ML monitoring data)
should be recorded for post-deployment analysis by users, ML systems, and authorities.
• In-service operational data should be transferred to the ML system manufacturer for in-service analysis and potential
improvement of the training, validation, and test datasets.
• The in-service operational data management environment should be secured in order to avoid adversarial or accidental
data poisoning and tampering. Attackers with access to the collected operational data can influence post-deployment
data analysis and potential ML System re-training.
• In some specific cases and under certain conditions, the possibility of local update of ML system parameters should be
considered.
This section clarifies the scope of the standards and guidance material to be delivered by G-34/WG-114 and describes
potential approaches to be further considered by the committee for inclusion in future revisions. The intent of the G-34/WG-114 standards is to serve as a means of compliance for the certification and approval of products utilizing AI.
NOTE: Since this section is elaborated in parallel with the rest of the Statement of Concerns, it does not address all of the
concerns identified in Section 5.
The intent of G-34/WG-114 is to write standards addressing both airborne and ground systems. It is acknowledged that
those domains will involve different considerations, and that the requirements shall be adapted to each considered system.
The need for AI-specific activities on tool qualification 1 will also be considered within scope.
1 According to ED-215/DO-330, a tool is a computer program or a functional part thereof, used to help develop, transform, test, analyze, produce, or modify another program, its data, or its documentation. Examples are automated code generators, compilers, test tools, and modification management tools.
The scope of the standards and guidance material to be delivered by G-34 is limited to the components utilizing AI that are
not already covered by existing standards. This material should be compatible with existing standards to enable integration
of components utilizing AI in broader airborne and ground systems.
As a first step, it is suggested to limit the scope of the future standard to Machine Learning. Therefore, the remainder of
Section 6 is focused on Machine Learning, even if parts of its content may be applicable to other AI techniques. Additional
AI techniques will be addressed in subsequent revisions of G-34/WG-114 standards.
Online learning raises specific certification/approval issues if the system utilizing AI continues to learn while in operational use.
In accordance with the above-mentioned concerns, a staged approach is recommended for continuing AI standards
development:
• At first, offline ML training can be addressed. The scope includes both the use of ML at the initial system design phase
(before the first deployed system) and later offline ML retraining based on an extended or updated dataset (e.g.,
collected from the deployed system).
• Secondly, online learning AI can benefit from the framework for offline learning but will also bring new questions that
may prompt a paradigm shift towards certification/qualification for machine systems and their relationship with humans.
This would be addressed in subsequent revisions of the standards, or the development of yet a new standard.
The Statement of Concern’s primary focus is on offline learning. Some online learning concerns have been included, but
not all concerns with online learning are covered.
6.1.3 Autonomy
It is expected that AI techniques will also be used to increase the level of autonomy of future products. Defining a framework
for autonomous operations is beyond the scope of this standard. Liaison with existing and future working groups on
autonomy should be established to ensure that future AI certification/approval standards answer their needs.
6.1.4 Cybersecurity
New technologies often come with additional cybersecurity risks related to new vulnerabilities, and Artificial Intelligence is
no exception to this rule. For example, Machine Learning techniques are vulnerable to data poisoning attacks, adversarial
examples attacks, or more typical backdoor attacks. For many systems, these cybersecurity risks and associated
vulnerabilities can have an adverse impact on safety. Therefore, cybersecurity cannot be overlooked for the certification
of AI.
G-34/WG-114 should:
• Identify the cybersecurity vulnerabilities that are specific to systems utilizing AI and check if existing standards enable
identifying and managing these vulnerabilities,
• Develop the necessary guidance or ask other working groups to create or update relevant standards if some
vulnerabilities are not addressed by existing standards,
• Liaise with cybersecurity standardization working groups, for example SAE G-32, RTCA SC 216, and EUROCAE
WG-72, in order to ensure consistency of the standards.
Since digital technologies, and AI in particular, are in certain respects considered a breakthrough, many regulators and international bodies have issued high-level requirements to promote, but also to regulate, these technologies with respect to democratic values. This is the case in Europe with the Ethics Guidelines for Trustworthy AI published by the European Commission (see more details below). In the U.S., the White House issued an Executive Order on February 11, 2019, on “Maintaining American Leadership in Artificial Intelligence” 2. The United States Congress has also issued a specific guideline titled “Artificial Intelligence and National Security” 3 for dealing with defense issues. Among international bodies, the OECD published the OECD AI Principles 4 and the G20 published the G20 human-centered AI Principles 5.
In Europe, the European Union is promoting trustworthy AI systems, a concept introduced in the “Ethics Guidelines for Trustworthy AI” 6 produced by the High-Level Expert Group on Artificial Intelligence set up by the European Commission (AI HLEG Guidelines). Trustworthiness has three components, which should be met throughout the system’s entire life cycle:
• The AI system should be lawful, complying with all applicable laws and regulations,
• The AI system should be ethical, ensuring adherence to ethical principles and values, and
• The AI system should be robust, both from a technical and social perspective since, even with good intentions, AI
systems can cause unintentional harm.
The adherence to these three components should be checked through a trustworthiness analysis, as defined in the EASA
AI Roadmap 1.0, published in 2020.
With regard to the first component, lawfulness, existing aviation regulation is sufficient to meet AI HLEG recommendations
and guidance. Industry standards dedicated to AI based systems in aviation are an important element in demonstrating
compliance with the lawfulness principle. No other specific consideration is deemed necessary for AI based systems within
EUROCAE’s scope.
Regarding the two other components of Trustworthy AI, ethics and robustness, the AI HLEG Guidelines establish four principles: respect for human autonomy, prevention of harm, fairness, and explicability. These are further translated into seven key requirements, as described by the HLEG Guidelines:
2 https://www.whitehouse.gov/presidential-actions/executive-order-maintaining-american-leadership-artificial-intelligence/
3 https://fas.org/sgp/crs/natsec/R45178.pdf
4 https://www.oecd.org/going-digital/ai/principles/
5 https://www.mofa.go.jp/files/000486596.pdf
6 https://ec.europa.eu/digital-single-market/en/news/ethics-guidelines-trustworthy-ai
• Human agency and oversight
• Technical robustness and safety
• Privacy and data governance
• Transparency
• Diversity, non-discrimination, and fairness
• Societal and environmental well-being
• Accountability
The AI HLEG Guidelines also introduce available methods for the implementation of these requirements throughout the AI system’s life cycle. For requirements that are directly applicable to aviation systems (e.g., technical robustness, safety, and
cybersecurity), it is expected that the trustworthiness analysis will result in functional and non-functional system
requirements as well as organizational requirements that should be implemented and verified (using AI HLEG methods
and/or existing/adapted industrial processes of the aviation sector). It should also be noted that some aspects of these
seven AI HLEG key requirements may be out-of-scope for SAE-G34 and EUROCAE WG-114 and may require specific
rulemaking tasks and/or other industry standards.
New inputs from the AI HLEG should be monitored by G-34/WG-114 to check the applicability of the latest updates to the Committee’s standards.
NOTE: The terms used in this document (i.e., statement of concern) are not necessarily the same as those present in the
trustworthiness high level requirements. Refer to the original cited documents for clarification.
It is expected that certification/approval of products utilizing AI will rely on several activities, performed fully or partially at
each phase of the product lifecycle, depending on the AI technique used, on the system architecture, and on the safety
assessment.
Paragraph 6.2.1 describes the phases of an ML-based product lifecycle, in order to clarify the steps of the development
process. Then 6.2.2 lists the possible certification/approval activities that may be performed to show compliance of the
ML-based product with the requirements. Finally, 6.2.3 provides an overview of the complete certification process. The
certification/approval approach proposed in this section is an initial proposal that is expected to evolve over time. This
approach has been elaborated taking into account the development assurance processes and the methods for compliance
demonstration described in existing standards, while trying to address the specific aspects of Machine Learning.
For products implementing ML, the proposed lifecycle is described in Figure 7. This lifecycle should support a common
standard for ground and airborne systems. ML can be used for developing simple software algorithms, but also for
developing complete functions or even systems. ML model development requiring system expertise should be addressed
at the system level. For this reason, this section proposes a process that includes the system level, and is not limited to a
sub-system, component, or item level.
NOTE: This process is not fully linear but iterative. In particular, the safety assessment process spans the lifecycle and is
updated throughout.
This activity defines the system requirements and allocates them to the ML sub-systems and all other hardware and software
components. It also validates that requirements are correct.
Several types of requirements are listed as part of this activity: functional requirements, customer requirements, operational
requirements, performance requirements, physical and installation requirements, dataset requirements, maintainability
requirements, interface requirements, and any additional requirements deemed necessary for the specification of the
intended function of the ML sub-system. In particular, requirement capture should include the capacity of the system to
detect that the ML sub-system is not being exercised in an operational domain consistent with the training dataset used.
• For ATM/ANS systems: (EU) 2017/373 and its AMC/GM, Eurocontrol SRM, IEC 61508, and ED-153.
Examples of ML-specific requirements can be as follows: required probability of success of each function, characteristics of
the data input to the ML sub-system in the target environment (interface requirements), robustness requirements, i.e.,
criteria for correctness of input data and expected behavior in case of incorrect data, etc.
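The operational-domain detection capacity mentioned above could, for example, be refined into a simple monitor of the following kind (the per-feature z-score envelope and its bound are illustrative assumptions; a real system would use a richer out-of-distribution test):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical training dataset: two input features of the ML sub-system.
train = rng.normal(loc=[0.0, 5.0], scale=[1.0, 0.5], size=(1000, 2))

# Simple operational-domain envelope fitted on the training data.
mu, sigma = train.mean(axis=0), train.std(axis=0)

def in_training_domain(x, z_max=4.0):
    """True if every feature of x lies within z_max standard deviations
    of the training mean, i.e., the input resembles the training data."""
    return bool(np.all(np.abs((x - mu) / sigma) < z_max))

nominal = in_training_domain(np.array([0.2, 5.1]))   # in-domain input
outlier = in_training_domain(np.array([9.0, 5.1]))   # feature 1 far outside
print(nominal, outlier)
```

The monitor's verdict could feed an architectural mitigation (e.g., falling back to a non-ML channel) whenever the ML sub-system is operated outside the domain covered by its training dataset.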
The safety assessment aims at identifying the failure conditions of the system associated with the loss of or malfunction of
the system (including ML sub-system(s)), associated hazards, their effects, and the rationale for their classification (e.g.,
minor, major, hazardous, catastrophic). The safety assessment enables the identification of top-level safety requirements
for the whole system. The safety assessment is then refined and updated throughout the system development process
considering the selected system architecture. It establishes the safety requirements and the Design/Software Assurance
Level assigned to the ML sub-system, determines if the proposed architecture can satisfy the identified safety objectives,
and provides the evidence that the safety requirements are met. The resiliency aspect can be considered as part of the safety assessment, with its three components: protection, detection and response, and recovery.
• For ATM/ANS systems: (EU) 2017/373 and its AMC/GM, Eurocontrol SRM, IEC 61508, and ED-153.
This activity establishes the system architecture that includes the interfaces to the ML sub-system as well as potential
technical constraints. The safety requirements allocated to the ML-based sub-system and its interfaces with other
components should be considered (e.g., redundancy and monitoring identified during the safety assessment process).
• For ATM/ANS systems: (EU) 2017/373 and its AMC/GM, Eurocontrol SRM, IEC 61508, and ED-153.
Data will be the cornerstone of ML-based sub-system development: the final behavior of the component will be almost fully determined by the selected data. The goal of this activity is to select and prepare the data (cleaning, curation, labelling, normalization, etc.) in order to achieve the expected quality attributes (representativeness, lack of undesired bias, timeliness, etc.). Once the data is selected and prepared, data validation consists of verifying that the desired quality attributes are indeed present, that the data has not been altered, and that the data is adapted to the use case. Validation can, for example, be performed by systematic checks of certain attributes, by sampling, or by cross-checks.
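A systematic attribute check of the kind described above might look like the following sketch (the record fields, value range, and label set are hypothetical):

```python
# Illustrative data-validation step: after selection and cleaning, each
# record is checked for quality attributes before it enters a dataset.
def validate_records(records, valid_labels):
    """Return (accepted, rejected) after systematic attribute checks."""
    accepted, rejected = [], []
    for rec in records:
        ok = (
            rec.get("value") is not None          # no missing measurements
            and 0.0 <= rec["value"] <= 100.0      # plausible physical range
            and rec.get("label") in valid_labels  # labelling is consistent
        )
        (accepted if ok else rejected).append(rec)
    return accepted, rejected

records = [
    {"value": 42.0, "label": "nominal"},
    {"value": -5.0, "label": "nominal"},   # out of range: rejected
    {"value": 12.5, "label": "unknown"},   # unrecognized label: rejected
]
accepted, rejected = validate_records(records, valid_labels={"nominal", "fault"})
print(len(accepted), len(rejected))
```

In practice such checks would be complemented by sampling and cross-checks, as the text notes, and the rejection log itself becomes evidence for the data validation activity.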
Model selection and training are crucial to the correct design of the ML-based sub-system. There are many models available,
and many ways to train and optimize a model (e.g., pruning). The standard should not impose a specific algorithmic
approach or a specific training technique. The focus should rather be on specifying the minimum criteria to be achieved by
the model after training. These minimum criteria are currently not addressed by existing standards.
Once the model has been selected and trained using the “training dataset,” it should be validated and optimized using a
second dataset called “validation dataset.” At the end of this phase, the resulting model is tested with a third dataset called
“test dataset” to check that it behaves as required.
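The three-dataset discipline described above can be sketched as follows (the 60/20/20 proportions and the shuffling seed are assumptions for illustration, not a recommendation of this document):

```python
import random

def split_dataset(samples, seed=0, train_frac=0.6, val_frac=0.2):
    """Shuffle and split samples into disjoint training, validation,
    and test datasets, matching the three roles described in the text."""
    samples = samples[:]                   # avoid mutating the caller's list
    random.Random(seed).shuffle(samples)
    n = len(samples)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return (
        samples[:n_train],                 # training dataset: fit the model
        samples[n_train:n_train + n_val],  # validation dataset: tune/select
        samples[n_train + n_val:],         # test dataset: final check only
    )

train, val, test = split_dataset(list(range(100)))
print(len(train), len(val), len(test))
```

Keeping the test dataset untouched until the final check is what gives its result evidential value: a model tuned against the same data it is tested on can mask overfitting.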
Once the learning phase is complete (e.g., model selection, training, and validation), the architecture and parameters of the ML product are defined. Software and electronic hardware design consists of developing the requirements and ensuring that they are in line with the requirements standards. It should be understood that component verification will be based on those requirements. Partially or completely derived requirements are fed back to the safety process for analysis. Software and electronic hardware architectures are also defined and their compliance with standards is ensured. The implementation phase consists of implementing the ML sub-system according to this definition. During the implementation phase, the following steps are carried out as applicable:
• Hardware production
• Hardware/Software integration
This implementation phase shares common characteristics with that of traditional systems, and existing development standards may be of interest for this phase:
• For ATM/ANS systems: (EU) 2017/373, Eurocontrol SRM, IEC 61508, ED-109A/DO-278A, ED-218/DO-331, and
ED-153.
Some implementation errors could be introduced during this phase (e.g., representation of the computed weights on the target, incompatible performance resources, degraded accuracy and performance if the model is translated into another programming language, insertion of additional code, incompatibility of the executable object code with the target machine, etc.). These errors are not specific to ML, but they may jeopardize the expected behavior (e.g., prediction or execution time). Hence, attention is needed to ensure correct implementation of the ML-based sub-system. Compliance with the previously cited standards could be one way to achieve this.
NOTE: Application of Model-Based Development and Verification based on ED-218/DO-331 is presented in Section 4.
The purpose of this step is to check that the requirements are indeed fulfilled by the ML-based sub-system. Various
verification strategies can be adopted, depending on the verification means available: testing (including massive testing,
adversarial testing, robustness testing, etc.), formal methods, service experience, etc. The verification challenges are tightly
linked to the ML specification challenges.
ML-based sub-systems should also provide confidence in the absence of unintended behaviors. Unintended behaviors
should be identified during software verification activities (including specific verification techniques coming from research
such as neural network coverage) and mitigation should be put in place if needed. More investigation on this topic should
be carried out during the development of future AI standards.
Verification may be performed at the system level and/or at component level. The adequate balance between system-level
and component-level verification should be identified.
Once the ML-based sub-system has been designed, implemented, and verified, it can be integrated in the broader system.
The ML-based sub-system is put together with other components of the system, and integration is verified at the system
level.
• For ATM/ANS systems: (EU) 2017/373 and its AMC/GM, Eurocontrol SRM, IEC 61508, and ED-153.
NOTE: Similarity analysis or service history may facilitate the verification at the system level.
The system is maintained and potentially repaired or updated. Offline learning may be used by recording the operating
environment. The recording is then used to enhance the training offline, with the enhanced model verified before it is
uploaded onto the aircraft during a maintenance process or procedure. All ML system sensors are maintained (e.g.,
scratches repaired, sensors surface cleaned and polished, replaced sensors calibrated).
In-service operational data (input data and corresponding ML system prediction data) should be recorded for in-service
issues post analysis by users, by ML systems, and by authorities.
In-service operational data should be transferred to the ML system manufacturer for in-service analysis and potential
improvement of the training/validation and test datasets.
The in-service operational data management environment should be secured in order to avoid adversarial or accidental
data poisoning and tampering.
The certification/approval of systems utilizing AI could be achieved through various strategies, involving different ML
development assurance activities. Each activity may contribute to the overall demonstration of compliance to applicable
regulation. The following potential activities are put forward as potential means and methods of compliance, for further
consideration by the committee as part of the standard development.
The committee further acknowledges that the application of these approaches may not fully result in a successful
demonstration of compliance without the evolution of regulatory policy. As mentioned in 4.4, the committee is engaged in
continued high level discussions with global regulators and other policy leaders on how AI/ML may impact and evolve
certification regulation.
Assurance can be defined as “The planned and systematic actions necessary to provide adequate confidence and evidence
that a product or process satisfies given requirements” (ED-12C and ED-109A/ED-153).
Learning Assurance is expected to be the adaptation of well-known development assurance approaches to Machine
Learning. As the learning phase is not addressed by existing standards, a new development assurance process is needed
for this phase.
Learning assurance aims at ensuring that learning errors are detected and removed through the application of best engineering practices. This includes consideration of the specification, data quality and selection, and the design and verification of ML models.
The complete learning assurance process still needs to be defined through identification of ML development best practices.
The level of rigor of learning assurance should be adjusted depending on the selected Design Assurance Level, and on the
other activities performed.
Learning assurance only applies to the learning phase and is therefore only one of the activities needed as part of the full
development assurance process.
According to ED-216/DO-333, “Formal methods are mathematically based techniques for the specification, development,
and verification of software aspects of digital systems.”
For some ML algorithms, it might be possible to use formal methods to demonstrate the compliance of an AI implementation
with a given set of requirements or mathematical properties.
Such an approach has already been trialed on Neural Networks to demonstrate that the system satisfies specific safety
requirements (G. Katz, C. Barrett, D. Dill, K. Julian, and M. Kochenderfer, “Reluplex: An Efficient SMT Solver for Verifying
Deep Neural Networks,” arXiv:1702.01135 [cs], Feb. 2017).
Even if these approaches are currently limited to specific topologies of small size, their main advantage is that they can be applied after training; they could therefore enable verification of the outputs of the learning phase without knowledge of the training activities performed (except for the aspects used as assumptions in the formal demonstration).
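As a much simpler illustration of the same idea (not the SMT-based Reluplex approach cited above), interval bound propagation can prove an output bound for every input in a box, something sampling-based testing cannot do. The network weights and input box below are toy assumptions:

```python
import numpy as np

# Tiny 2-input, 2-hidden-neuron ReLU network with a single output.
W1, b1 = np.array([[1.0, -1.0], [0.5, 2.0]]), np.array([0.0, -1.0])
W2, b2 = np.array([[1.0], [-1.0]]), np.array([0.5])

def affine_bounds(lo, hi, W, b):
    """Sound lower/upper bounds of x @ W + b over the box [lo, hi]."""
    W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
    return lo @ W_pos + hi @ W_neg + b, hi @ W_pos + lo @ W_neg + b

lo, hi = np.array([-1.0, -1.0]), np.array([1.0, 1.0])   # input box
lo, hi = affine_bounds(lo, hi, W1, b1)
lo, hi = np.maximum(lo, 0), np.maximum(hi, 0)           # ReLU is monotone
out_lo, out_hi = affine_bounds(lo, hi, W2, b2)

# If out_lo exceeds a safety threshold, the property holds for EVERY
# input in the box, not merely for the inputs that happened to be tested.
print(float(out_lo[0]), float(out_hi[0]))
```

The bounds are conservative (they may be looser than the true output range), which is exactly why a proof obtained this way is sound: any property established on the bounds holds for the real network.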
6.2.2.3 Testing
Testing aims at demonstrating compliance of the system utilizing AI with the requirements through various types of tests.
As for traditional system development, requirement-based testing is recommended.
In addition to traditional testing methods, at least two additional testing approaches can be considered for Artificial
Intelligence products:
• Random testing: this approach consists of testing the product by generating a large number of random independent
inputs, and checking the corresponding outputs, in order to verify and demonstrate the performance requirements of
the system. The tests should be performed in a representative environment and with sufficient coverage to achieve
statistical significance.
• Robustness and adversarial testing: In addition to traditional functional robustness related to unexpected or outlier inputs, the robustness of an ML system relates to the robustness of the ML inference against any variability of the input data compared to the data used during the learning process. Perturbations can be natural (e.g., sensor noise or bias), due to failures (e.g., invalid data from degraded sensors), or malicious insertions (e.g., pixels modified in an image) intended to fool the model's predictions. Perturbations can also be defined as true data locally different from the original data used for model training, which might lead to a wrong prediction and incorrect behavior of the system. Adversarial testing is targeted at exercising the system's robustness to adversarial examples and may also be used to detect overfitting.
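An adversarial test of the kind described above can be sketched on a toy linear scorer (the weights, input, and perturbation budget epsilon are illustrative assumptions). For a linear model, the worst-case L-infinity-bounded perturbation has a closed form, which makes the robustness check explicit:

```python
import numpy as np

# Hypothetical linear decision function: positive score = class A.
w, b = np.array([1.0, -2.0, 0.5]), 0.1

def score(x):
    return float(x @ w + b)

def adversarial_input(x, epsilon):
    """Worst-case L-infinity perturbation of size epsilon that decreases
    the score of a linear model (an FGSM-style step)."""
    return x - epsilon * np.sign(w)

x = np.array([0.3, -0.2, 0.4])            # nominal input, scored positive
x_adv = adversarial_input(x, epsilon=0.5) # bounded adversarial variant

# The robustness property under test: the decision must not flip.
robust = (score(x) > 0) == (score(x_adv) > 0)
print(score(x), score(x_adv), "robust:", robust)
```

For nonlinear models such as neural networks the perturbation direction is usually taken from the loss gradient instead, but the structure of the test, a bounded perturbation followed by a decision-consistency check, is the same.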
NOTE: Associated performance and robustness requirements should be properly defined as part of the specification of the
product.
Such approaches are now made possible by the exponential increase in available computing power, by improvements in simulation means, and by progress in adversarial example generation.
The criteria for assessing the statistical significance of these testing approaches are still to be specified in the context of
safety-critical products utilizing AI.
If testing relies on a dataset, the dataset used should be checked and verified in a manner similar to the other datasets used in ML development.
6.2.2.4 Explainability
NOTE: The difference between explainability and interpretability has been debated for years within the scientific community, without a clear conclusion on their respective definitions. This document therefore uses the single term explainability to cover the whole concept of explainability and interpretability.
In traditional products (not utilizing AI), explainability is inherently built into the architecture, and requirements are directly implemented. The implementation requirements trace directly to the system requirements as well as to the developed code. The implementation is therefore fully explainable as a result of the direct traceability from the system definition down to the implementation code. In contrast, some classes of products utilizing AI, and ML-based products in particular, may exhibit behavior that cannot be directly traced from the requirements to the implementation code. The transition from system requirements to the ML model, with its weights, is usually an automated learning process that does not preserve the traceability path from system requirements to the ML model architecture and definition. The translation from the requirements to the learned model during the learning process may not be fully understandable, and therefore the learned model and its weights are not explainable.
In this context, explainability of the behavior of a product utilizing AI is an important characteristic needed to support means
of compliance at several stages of its lifecycle:
• At certification/approval: an authority should be given explanations of how the system works, and the verification evidence that it meets its requirements, in order to accept it.
• In operation: the user of the system should have sufficient understanding of the system’s behavior to be able to use it as intended.
• After an in-service occurrence: the manufacturer should be able to reproduce what happened and explain the causes
of the occurrence (large capacity data recording could be necessary).
Various levels of explainability exist, and it is acknowledged that complete explainability of the full AI decision-making
process may be neither achievable nor necessary to meet the goals previously listed.
Explainability may be inherent to the model used (e.g., rule-based systems) or be obtained through investigation techniques
applied to a black-box model (e.g., sensitivity analysis applied to ANN).
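The sensitivity analysis mentioned above can be sketched as a black-box probe (the model and the input values are stand-ins): each input feature is perturbed in turn, and features are ranked by the magnitude of the output change, without ever inspecting the model's internals.

```python
import numpy as np

def opaque_model(x):
    # Stand-in for a trained ML component; the analysis below treats it
    # strictly as a black box.
    return np.tanh(3.0 * x[0]) + 0.1 * x[1] + 0.0 * x[2]

def sensitivities(f, x, eps=1e-4):
    """Finite-difference sensitivity of f to each input feature at x."""
    base = f(x)
    out = []
    for i in range(len(x)):
        xp = x.copy()
        xp[i] += eps
        out.append(abs(f(xp) - base) / eps)
    return np.array(out)

s = sensitivities(opaque_model, np.array([0.1, 1.0, 2.0]))
ranking = np.argsort(s)[::-1]
print(s.round(3), "most influential feature:", int(ranking[0]))
```

Such a ranking is a local explanation: it describes influence near the probed input only, which is one reason complete explainability of the full decision-making process may be out of reach, as the text acknowledges.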
NOTE: Explainability is also beneficial for cybersecurity assessment to understand if the model has learned a vulnerability.
6.2.2.5 Licensing
The concept of licensing is mentioned here even though it is highly unlikely that Civil Aviation Authorities could give certification/approval credit to such an approach in the short term.
The concept of licensing has been introduced by NASA in “Certification Considerations for Adaptive Systems.” In this
document, licensing is described as follows: “Pilots are licensed to fly based on demonstrating knowledge and skill through
hundreds of hours of training and evaluation. Similarly, humans performing other critical tasks such as air traffic control are
trained and tested extensively before they enter an operational role. Extending this licensing procedure to autonomous
software would lead to an analogous system of gained trust. Certification would be eventually attained through extensive,
though not comprehensive, demonstration of knowledge and skill by the advanced software systems.”
In-service experience is defined in existing standards (such as ED-12C, ED-109A, AMC 20-152A on Airborne Electronic Hardware, etc.) as an alternative method when conventional software/hardware certification/approval artefacts are unavailable or difficult to obtain. Traditionally, in-service experience is used to manage COTS, Previously Developed Software (PDS) 7, or Previously Developed Hardware (PDH). To date, in-service experience has never been used to certify/approve components utilizing AI. Therefore, the necessary conditions for enabling the use of in-service experience to build certification/approval credit in this context remain to be defined.
Pending further analyses, the following necessary conditions could be considered as a starting point:
• The service period duration is sufficient (e.g., for an AL3/DAL C COTS, 8760 cumulated hours may be required by the
Competent Authority).
• The new operating design domain (ODD) is the same or similar (with additional verification needed if not the same).
• The ML sensor suite is identical (same sensors, orientation, location, and operational environment, calibration).
• The product is stable and mature (that is, few problem reports and/or modifications occurred during the service period
and were not safety critical and all anomalous behavior events are recorded and analyzed).
In-service experience may be built using data coming from real operation or data coming from simulated operation (in
particular for ground systems) provided that all necessary conditions, in particular the one related to the environment
representativeness, are fulfilled.
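The starting-point conditions above could be screened mechanically before any qualitative review. In the sketch below, the record fields and the 8760-hour threshold are illustrative assumptions mirroring the example conditions in the text, not prescribed criteria.

```python
from dataclasses import dataclass

@dataclass
class ServiceRecord:
    # Hypothetical summary of a COTS ML component's service history.
    cumulated_hours: float         # e.g., 8760 h example for AL3/DAL C
    same_or_similar_odd: bool      # operating design domain unchanged/similar
    identical_sensor_suite: bool   # same sensors, orientation, calibration
    open_safety_critical_reports: int

def in_service_credit_candidate(rec: ServiceRecord,
                                required_hours: float = 8760) -> bool:
    """Screen a record against the starting-point conditions.

    Only an illustrative pre-check: actual acceptance would remain a
    qualitative assessment agreed with the Competent Authority.
    """
    return (rec.cumulated_hours >= required_hours
            and rec.same_or_similar_odd
            and rec.identical_sensor_suite
            and rec.open_safety_critical_reports == 0)

candidate = in_service_credit_candidate(ServiceRecord(9000, True, True, 0))
```

Failing any single condition (too few hours, a changed ODD, a modified sensor suite, or an open safety-critical report) disqualifies the candidate from this fast path and sends it back to conventional assurance activities.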
NOTE: Service experience could also be beneficial to cybersecurity monitoring and in particular vulnerability management
of COTS/Open Source Software (OSS).
NOTE: Other alternative methods for providing assurance of COTS ML models, such as additional testing, prior product
approval, etc., may be considered.
7 PDS is software already developed for use. This encompasses a wide range of software, from COTS software through software developed to
previous or current software guidance (i.e., ED-12C or ED-109A).
Downloaded from SAE International by Rosa Maria Arnaldo, Tuesday, April 25, 2023
If online learning is used, then it should be demonstrated that the behavior of the system is not adversely impacted by new
inputs processed in operation. At this stage, there is no mature solution to address online learning, but the demonstration
could rely on different strategies, for example formal proof at runtime or safety risk mitigation.
The suggested development assurance process for certification/approval is depicted in Figure 8. The process is composed
of a trustworthiness analysis, the identification of safety risk mitigations, a safety assessment, assurance level assignment,
planning for certification/approval, and the activities listed in the previous paragraph. The key principle in this process is
the proportionality of activities, depending on the AI technique used, the system architecture, and the safety assessment.
The first step is the safety assessment which then yields the assurance level assignment. These steps could be performed
as described in ED-79A/ARP4754A and ED-135/ARP4761 for airborne systems or in the EUROCONTROL SAM, the
SESAR SRM, and ED-153 for ATM/ANS systems.
However, these safety analysis approaches mainly deal with identifying the failure conditions of a system derived from
failures of its components and their interactions with other systems. Due to the complex nature of systems utilizing AI and
their interactions with a complex environment, the traditional safety process could be complemented with a more
comprehensive safety analysis that deals specifically with failure conditions derived from interactions of the system and
external actors, without any system failure being necessarily present. An example of this safety approach is the Safety of
the Intended Functionality (SOTIF - ISO 21448) concept which is currently being used in the automotive industry to cover
those scenarios for advanced algorithms. The following are some key items of the methodology:
• Evaluation of functional and performance aspects of the system and its AI sub-systems.
• Identification and evaluation of hazards caused by the intended functionality and its triggering events.
In order to mitigate possible unintended behavior of the system utilizing AI, safety risk mitigations can be put in place.
The identification of safety risk mitigations and the safety assessment are two tightly coupled activities that allow allocating
the safety requirements to each component of the product utilizing AI.
In the context of products utilizing AI, various safety risk mitigation strategies can be considered, including the runtime
monitoring of components utilizing AI, output bounding mechanisms, dissimilar and/or redundant architectures, pilot
validation, or any other relevant strategies.
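The output-bounding mitigation mentioned above can be sketched minimally as follows; the function name, bounds, and fallback value are illustrative assumptions rather than prescribed design.

```python
def bounded_output(ml_output: float, lower: float, upper: float,
                   fallback: float) -> float:
    """Pass the ML output through only if it lies inside a validated
    envelope; otherwise substitute a conservatively safe fallback
    (e.g., hand over to a conventional backup computation)."""
    if lower <= ml_output <= upper:
        return ml_output
    return fallback

# The monitor itself stays simple and analyzable, so it can be assured
# with conventional techniques even when the ML component cannot.
safe = bounded_output(1.8, lower=0.0, upper=1.0, fallback=0.5)
```

Because the monitor is deterministic and small, safety requirements can be allocated to it rather than to the ML component it wraps, which is the point of such architectural mitigations.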
Once the safety assessment has been performed and the assurance level has been assigned, the plan for
certification/approval aims at identifying the activities that are relevant for the demonstration of compliance. This plan should
be adapted to each system, based on the AI technique used, on the system architecture and the safety assessment.
The plan can involve one or several of the activities listed in 6.2.2, and the depth of demonstration for each activity can be adjusted,
as illustrated by the gauge on top of each activity in Figure 8. Each activity can be carried out at one or several phases of
the product lifecycle described in 6.2.1.
The plan should be agreed with the certification/approval authority. Criteria for defining and accepting the relevant activities
and the relevant depth of demonstration could be defined as part of the future work of SAE-G34/WG-114. Any required
adaptation or change in the existing certification liaison processes (for airborne and ATM/ANS) should also be investigated.
The concerns outlined in this document are illustrated by use cases drawn from various aerospace application domains.
The columns in the table include:
• Example - a brief descriptive title of the use case identifying the functionality provided by the ML-based system.
• ID - a unique identifier useful for reference in future work of the joint EUROCAE SAE G-34/WG-114 committee.
• Assurance Gaps - features of the use case that complicate the process of assuring system safety and other assurance
concerns.
The concerns outlined in this document are illustrated by use cases drawn from various aerospace application domains.
The columns in the table include:
• Example - a brief descriptive title of the use case identifying the functionality provided by the ML-based system.
• ID - a unique identifier useful for reference in future work of the joint EUROCAE WG-114/SAE G-34 committee.
• Safety Concerns - identified safety concerns that would need to be mitigated to achieve successful implementation of
the use case.
Safety Concerns: Highly trafficked airspace can overload ATC and cause accidents.

Example: Time-Based Separation
ID: UC-SC308
Goal: Allow for stable arrival runway throughput in all headwind conditions on final approach.
Input: Headwind Conditions, Aircraft Tracks
Output: Time to keep aircraft separated on approach to landing
Machine Learning Techniques:
Integration: ATC
Safety Concerns: The move from distance-based to time-based separation rules for efficient separation management requires proper modelling/prediction.
Example: Remote Towers
ID: UC-SC309
Goal: Provide tower services for airports without a dedicated tower. The remote tower uses sensors and data around the physical airport and transmits to a remote tower operator.
Input: Video images, Wind conditions, Aircraft Tracks, Ground vehicle traffic, Voice, Radar
Output: Tower ATC for aircraft operations
Machine Learning Techniques: Neural networks
Integration: ATC
Safety Concerns: Loss of view from airport to remote tower will cause a hold or stop on traffic. Remote tower operators may lose situational awareness in unpredictable events.
Machine Learning Techniques:
Integration: ATC
Safety Concerns: Erroneous flight commands from the auto-router can damage the aircraft.
The Statement of Concerns (SOC) objectives were to (a) align the group (EUROCAE WG-114 and SAE G-34) on a common
understanding of AI techniques, (b) outline the concerns that use of such techniques would cause to the development of an
aeronautic system, and (c) recommend an efficient roadmap and organization to develop a means of compliance for AI
certification.
Section 3 (Classification of AI Techniques) identifies a classification of AI into three branches (Symbolic AI, Machine
Learning, and Numerical Analysis), and the rest of the SOC focuses on Machine Learning (ML) with the stipulation that
only offline learning is considered at this time. This choice was made because Machine Learning appeared to be the broadest
and most widely considered technique for use in aerospace development, while also being the most challenging to certify
with existing standards. However, this choice does not preclude the consideration of other techniques in the scope of future
standards produced by EUROCAE WG-114 and SAE G-34.
Section 4 (Gap Analysis) considers the main design assurance standards for airborne and ground systems. Although significant
gaps have been identified that make these standards insufficient for the development of an ML-based system, many of their objectives
remain valid. This is the case, for instance, for system development and safety assessment processes: their objectives are still
applicable. However, the specificities of ML model development should be addressed through specific guidance or methods.
Section 5 (ML development specific considerations and areas of concerns) presents ML-development-specific
considerations and dives into areas of concern, including:
• The fact that current code coverage analysis may not be practical for neural networks and other machine learning
structures
• The fact that current testing methods may not be appropriate for AI/ML sub-systems
Assuming that many objectives of existing system development standards (e.g., ARP4754) remain applicable, this section
proposes an ML workflow within the overall development flow of the system to which it pertains. This can be summarized by the
figure below.
It should be noted that an AI/ML-based system development requires data scientist expertise in addition to the usual system,
safety, hardware, and software expertise.
Section 6 (Potential Next Steps) suggests an approach for ML-based system certification/approval and details potential
development assurance activities that could be considered in the frame of standard development.
Section 7 (Use Cases - Aircraft Systems) and Section 8 (Air Traffic Management (ATM)/Ground Systems Operations (GSO))
collect use cases of aeronautical functions that would benefit from the use of AI techniques. This important work constitutes
a representative panel of industrial needs at this point, and it consolidates the necessary focus on ML techniques. No
new areas of concern were raised compared to those identified in Section 5.
As a conclusion of the studies and analyses performed by the EUROCAE teams and SAE Sub-Committees through the SOC
document, the group recommends breaking down into sub-groups, as per the diagram below, in order to efficiently develop a
standard or set of standards to serve as a means of compliance for AI and ML certification.
10. NOTES
A change bar (|) located in the left margin is for the convenience of the user in locating areas where technical revisions, not
editorial changes, have been made to the previous issue of this document. An (R) symbol to the left of the document title
indicates a complete revision of the document, including technical revisions. Change bars and (R) are not used in original
publications, nor in documents that contain editorial changes only.