
DO Qualification Kit

Polyspace® Code Prover™


Theoretical Foundation

R2015a, March 2015


How to Contact MathWorks
Latest news: www.mathworks.com
Sales and services: www.mathworks.com/sales_and_services
User community: www.mathworks.com/matlabcentral
Technical support: www.mathworks.com/support/contact_us
Phone: 508-647-7000

The MathWorks, Inc.


3 Apple Hill Drive
Natick, MA 01760-2098
DO Qualification Kit: Polyspace® Code Prover™ Theoretical Foundation
© COPYRIGHT 2013–2015 by The MathWorks, Inc.
The software described in this document is furnished under a license agreement. The software may be used or copied only under
the terms of the license agreement. No part of this manual may be photocopied or reproduced in any form without prior written
consent from The MathWorks, Inc.
FEDERAL ACQUISITION: This provision applies to all acquisitions of the Program and Documentation by, for, or through the
federal government of the United States. By accepting delivery of the Program or Documentation, the government hereby agrees
that this software or documentation qualifies as commercial computer software or commercial computer software documentation
as such terms are used or defined in FAR 12.212, DFARS Part 227.72, and DFARS 252.227-7014. Accordingly, the terms and
conditions of this Agreement and only those rights specified in this Agreement, shall pertain to and govern the use, modification,
reproduction, release, performance, display, and disclosure of the Program and Documentation by the federal government (or
other entity acquiring for or through the federal government) and shall supersede any conflicting contractual terms or conditions.
If this License fails to meet the government’s needs or is inconsistent in any respect with federal procurement law, the
government agrees to return the Program and Documentation, unused, to The MathWorks, Inc.

Trademarks
MATLAB and Simulink are registered trademarks of The MathWorks, Inc. See www.mathworks.com/trademarks for a
list of additional trademarks. Other product or brand names may be trademarks or registered trademarks of their respective
holders.

Patents
MathWorks products are protected by one or more U.S. patents. Please see www.mathworks.com/patents for more
information.
Revision History
September 2013 New for Version 2.2 (Applies to Release 2013b)
March 2014 Revised for Version 2.3 (Applies to Release 2014a)
October 2014 Revised for Version 2.4 (Applies to Release 2014b)
March 2015 Revised for Version 2.5 (Applies to Release 2015a)
Contents
1 Introduction
2 Initial Requirements
3 High-Level Semantics of Programming Languages
    Definition 1. (Operational Semantics)
    Definition 2. (Kripke Structures with Single Failure)
    Definition 3. (Run-Time Errors)
    Definition 4. (Strongest Invariant at k)
    Definition 5. (Run-Time Error Modalities/Colors)
    Proposition 6. (Run-Time Error Modalities Are Noncomputable)
    Definition 7. (Admissible Check Modalities)
    Proposition 8. (Semantics of C#(k))
4 High-Level Specification of Polyspace Code Prover Outputs
    Requirement HLR-1. (Soundness)
    Requirement HLR-2. (Run-Time Errors Yield Checks)
    Requirement HLR-3. (Check Colors Are Sound)
    Requirement HLR-4. (Call Graphs)
    Requirement HLR-5. (Call Graphs Are Sound)
    Requirement HLR-6. (Data Dictionaries)
    Requirement HLR-7. (Data Accesses Referenced in Dictionaries Are Sound)
    Requirement HLR-8. (Shared Statuses Referenced in Dictionaries Are Sound)
    Requirement HLR-12. (Compliance with coding standard)
    Requirement HLR-13. (Coding metrics)
5 High-Level Specification of Polyspace Code Prover Outputs - Independence
    Requirement HLR-10. (Component Independence)
    Requirement HLR-11. (Behavior Independence)
6 References
    6.1 Reference Documents

1 Introduction

This document describes the Theoretical Foundation for the Polyspace® Code Prover™
verification tool. It is intended for use in the DO-178C tool qualification process for verification
tools.

This document comprises the Tool Operational Requirements (reference DO-330 Section
10.3.1) for the following verification tools:
• Polyspace® Bug Finder™
• Polyspace Code Prover
The Tool Operational Requirements are defined as High-Level Requirements (HLRs) in this
document. The Tool Requirements are defined as Operational Requirements (ORs) and
Language Specific Requirements (LSRs) in the Tool Operational Requirements documents. To
comply with DO-330, Polyspace Bug Finder and Polyspace Code Prover Tool Requirements
trace to HLRs.

The following table summarizes the documents in which the Tool Operational Requirements and
Tool Requirements are defined. The table also provides the names of the requirement traceability
matrices.

Verification Tool       Tool Operational          Tool Requirements          Traceability
                        Requirements

Polyspace Bug Finder    Defined as HLRs in        Defined as ORs and LSRs    Polyspace Bug Finder
                        Polyspace Code Prover     in Polyspace Bug Finder    Requirement Traceability Matrix
                        Theoretical Foundation    Tool Operational           qualkitdo_bugfinder_HLR_OR_LSR.trace.xlsx
                        (this document)           Requirements

Polyspace Code Prover   Defined as HLRs in        Defined as ORs and LSRs    Polyspace Code Prover
                        Polyspace Code Prover     in Polyspace Code Prover   Requirement Traceability Matrix
                        Theoretical Foundation    Tool Operational           qualkitdo_codeprover_HLR_OR_LSR.trace.xlsx
                        (this document)           Requirements

This document is a high-level specification of Polyspace Code Prover code verification
technology for ANSI C and ISO C++. It starts from the user need to detect more bugs earlier
in the development process and in a more automated way. Conventional technologies, such as
testing, need substantial improvement to meet this need. MathWorks® particularly focuses on
run-time errors, which typically represent 30-40% of the total program errors that occur after
delivery. These errors are difficult to detect through conventional means and can have a
significant negative impact, ranging from erroneous outputs to system crashes or security breaches.

This document describes the formal, theoretical background of Polyspace Code Prover
technology. Automatically and exhaustively detecting run-time errors in general programs is
very complex. The seminal idea, introduced by Ben Wegbreit in 1974 and 1975, is to perform
approximate computations in which the direction of approximation is controlled. These
approximations are formalized by closure operators on algebraic structures called complete
lattices, starting with a mathematical model of program execution via operational semantics and
Kripke structures.

Having described the theoretical framework, this document then describes high-level
requirements for the outputs of Polyspace Code Prover for ANSI C and ISO C++, as well as the
independence of Polyspace Code Prover outputs with respect to tools to which it is coupled.
These requirements are linked to operational requirements which can be found in accompanying
documents. These requirements apply to the core of Polyspace Code Prover and do not apply to
its peripherals, such as user interfaces that involve launching or exploitation interfaces.

Polyspace® Bug Finder™ identifies run-time errors in C and C++ embedded software. Polyspace
Bug Finder does not prove the absence of run-time errors. Polyspace Bug Finder uses the same
theoretical foundation as Polyspace Code Prover, but it is not irrefutable with respect to
identification of run-time errors.

2 Initial Requirements

Designing a bridge, choosing the trajectory for the launch of a communication satellite,
optimizing the shape of a plane wing, estimating the multiple echo effects of urban buildings in
cellular phone communications: what these industrial activities have in common is their reliance
on high-speed processors and applied mathematics. The central paradigm is to model a physical
system as a set of mathematical equations, solve these equations using high-speed processors,
and finally use the solutions to predict the behavior of the physical system.

The software industry has not yet really leveraged this paradigm to optimize its own verification
and validation processes. Polyspace Code Prover brings to the software industry the power of
applied mathematics and high-speed modern processors. The Polyspace Code Prover software
aims at helping users simultaneously:

• Automate specific software development and verification processes
• Increase the reliability of software

The main criteria for software are:

• Functional correctness: computes expected outputs
• Temporal correctness: computes outputs within specified time bounds
• Robustness: does not halt, crash, or behave erratically because of run-time errors

Run-time errors are an important cause of software defects. The study by Sullivan and
Chillarege¹, conducted at Berkeley and IBM® Watson, found that many software defects
addressed during a four-year maintenance phase on large IBM codes are due to run-time errors.
Memory allocation errors, array out of bounds, uninitialized pointers, and pointer management
errors accounted for 26% of all observed software faults and more than 57% of the highest
severity faults, causing system outage or major disruption.

1 M. SULLIVAN AND R. CHILLAREGE, Software defects and their impact on system availability, Proc. 21st International
Symposium on Fault-Tolerant Computing (FTCS-21), Montreal, 1991, 2-9, IEEE Press.
The Polyspace Code Prover software applies the mathematical modeling paradigm to
run-time errors. Polyspace Code Prover addresses two essential needs:

• Static verification: statically predicting specific classes of run-time errors and sources of
  nondeterminism
• Semantic browsing: statically computing data and control flow to ease program
  understanding, verification, or qualification

Given a source program, P, written in source programming language L, you want to compute
statically (without specific input data) and automatically a conservative model of the future
dynamic, run-time behavior of P. You also want to extract from this model predictions about the
possible occurrences of run-time errors and sources of nondeterminism (for static verification),
as well as data and control flow information (for semantic browsing).

This document serves as a reference for the design of Polyspace Code Prover and as a criterion
for functional validation testing. MathWorks uses an established tool life cycle process to
address tool development and verification activities. Hardware errors, coding errors, testing
errors, documentation errors, or other unforeseen circumstances may cause significant
deviations between expected behavior and actual behavior of the software tool. Therefore, this
document and the associated documents do not imply that MathWorks explicitly or implicitly
guarantees that Polyspace Code Prover is fully compliant with the specification, that it always
delivers correct results, or that it conforms to the user needs.

3 High-Level Semantics of Programming Languages

Program behavior and run-time errors are formalized to provide a firm basis for the specification
of Polyspace Code Prover outputs.
Definition 1. (Operational Semantics)
The operational semantics of a program, P, written in programming language L, consists of the
set of finite and infinite execution traces $O[P] \in 2^{\mathrm{Trace}}$. An execution trace is a time-evolving
sequence of states defined as $\mathrm{Trace} = \mathbb{N} \to \mathrm{State}$. Each trace $\sigma \in \mathrm{Trace}$ is a function from
non-negative integers to states. These integers represent the discrete computation time measured
as the number of elementary language constructs executed since program start.

The formal behavior of program P consists of the set of all possible runs of P, where each run is
represented by a possibly infinite sequence of states. States can be chosen according to the
programming language. Consider a simple flowchart programming language consisting of
integer variables, integers, arithmetic operations, assignments, conditionals, and loops. States
can be defined as pairs consisting of an integer representing the current flowchart instruction to
be executed, and a vector of integers in an n-dimensional space, where n is the number of
variables in the flowchart program, P, under consideration.
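
As a minimal sketch of this state model, assume a hypothetical three-instruction flowchart program over a single variable x. The instruction encoding and the names `step` and `trace` are illustrative assumptions and are not part of Polyspace Code Prover. A state is a pair (instruction index, vector of variable values), and a trace is the sequence of states reached from an initial state:

```python
# Hypothetical flowchart program over one integer variable x:
#   0: x = x - 1
#   1: if x > 0 goto 0 else goto 2
#   2: halt
# A state is a pair (pc, vars): current instruction and variable vector.

def step(state):
    """Return the successor state, or None when the program has halted."""
    pc, (x,) = state
    if pc == 0:
        return (1, (x - 1,))
    if pc == 1:
        return (0, (x,)) if x > 0 else (2, (x,))
    return None  # pc == 2: final state, no successor


def trace(initial, max_steps=1000):
    """Finite prefix of the execution trace starting at `initial`."""
    states = [initial]
    while len(states) <= max_steps:
        nxt = step(states[-1])
        if nxt is None:
            break
        states.append(nxt)
    return states


run = trace((0, (2,)))
print(run)
```

The `max_steps` bound is only there so that the sketch returns for a diverging program; the formal model itself places no such bound on traces.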

Infinite traces correspond to either diverging (looping) programs or nonterminating programs (a
server loop, for instance).

Definition 2. (Kripke Structures with Single Failure)
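Consider a Kripke structure² with single failure $(\mathrm{State}, \mathit{succ}, \lambda)$ associated to program P, where:

• $\mathit{succ} \in \mathrm{State} \to 2^{\mathrm{State}}$ is a transition function that relates each state to its successors;
• $\lambda \in \mathrm{State} \to 2^{A}$ is a valuation that associates each state with the set of atomic formulas true
  in this state. $A$ contains the distinguished elements error, final, initial and the set $\{at_1, at_2, \ldots\}$;
• $\forall s \in \mathrm{State},\ \mathit{error} \in \lambda(s) \Rightarrow \mathit{succ}(s) = \emptyset$;
• $\forall s \in \mathrm{State},\ \mathit{succ}(s) = \emptyset \Rightarrow \lambda(s) \cap \{\mathit{error}, \mathit{final}\} \neq \emptyset$.

The operational semantics of P is then:

$O[P] = \{\sigma \in \mathrm{Trace} \mid \mathit{initial} \in \lambda(\sigma(0)),\ \sigma(1) \in \mathit{succ}(\sigma(0)),\ \ldots,\ \sigma(n+1) \in \mathit{succ}(\sigma(n)),\ \ldots\}$

If a state $s \in \mathrm{State}$ is such that $\mathit{succ}(s) = \emptyset$, the state is final.

If $at_k \in \lambda(s)$, state s is at program point k.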

The transition function associates zero, one, or several successors with a given state: a state with
no successors is either an error state or a final state (corresponding to the nominal termination of
program P); a state with one successor is the ordinary case; and a state with several successors
corresponds to nondeterminism. Nondeterminism can result from the interleaving of tasks or can
occur as a model of input/output from the world external to the computer under
consideration.
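
The following sketch encodes a small Kripke structure with single failure as two dictionaries and enumerates its traces. The state names `s0` to `s4`, the labelling, and the helper names are hypothetical and chosen only to illustrate the definition; the structure is deliberately finite and acyclic so that every trace can be listed.

```python
# Hypothetical Kripke structure with single failure (State, succ, labels).
succ = {
    "s0": {"s1", "s2"},   # several successors: nondeterminism
    "s1": {"s3"},         # one successor: ordinary case
    "s2": {"s4"},
    "s3": set(),          # no successor: final state
    "s4": set(),          # no successor: error state
}
labels = {
    "s0": {"initial", "at1"},
    "s1": {"at2"},
    "s2": {"at2"},
    "s3": {"final", "at3"},
    "s4": {"error", "at3"},
}


def is_well_formed(succ, labels):
    """Check the two side conditions on error and stuck states."""
    for s in succ:
        if "error" in labels[s] and succ[s]:
            return False  # an error state must have no successor
        if not succ[s] and not (labels[s] & {"error", "final"}):
            return False  # a state without successor must be error or final
    return True


def traces(succ, labels):
    """Enumerate all maximal traces (valid only for acyclic structures)."""
    def extend(prefix):
        nexts = succ[prefix[-1]]
        if not nexts:
            yield prefix
        for n in sorted(nexts):
            yield from extend(prefix + [n])

    for s in sorted(succ):
        if "initial" in labels[s]:
            yield from extend([s])


all_traces = list(traces(succ, labels))
print(all_traces)
```

Here one trace ends in the final state `s3` and the other in the error state `s4`, which is the situation that the nondeterministic choice at `s0` is meant to illustrate.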

No further hypotheses are made on how the transition function and the set of states are defined.
Defining them is the subject of the semantics of programming languages. For further information,
see the Handbook of Theoretical Computer Science by Van Leeuwen³.

2 S. KRIPKE. A completeness proof in modal logic. J. Symbolic Logic, 24:1-14, 1959.
3 J. VAN LEEUWEN. Handbook of theoretical computer science. The MIT Press, 1990.

Definition 3. (Run-Time Errors)
A run-time error occurs in state $s \in \mathrm{State}$ if and only if $\mathit{error} \in \lambda(s)$.

An execution trace $\sigma$ can be any one of the following:

• Infinite and without run-time errors
• Finite ending with a final state
• Finite ending with an error state

What is the connection with actual programming languages? The ANSI C and C++ standards
provide an immediate notion of run-time error, as each standard gives an informal but precise
definition of the cases where a run-time error occurs. Examples of run-time
errors include indexing an array out of its bounds, dividing by zero, referencing an illegal field
of a structure, or dereferencing a dangling pointer.

Definition 4. (Strongest Invariant at k)
The strongest invariant at k in program P is defined as the set of states:

$\mathrm{SGI}(k) = \{\sigma(t) \mid \sigma \in O[P],\ t \in \mathbb{N},\ at_k \in \lambda(\sigma(t))\}$

SGI(k) is the set of all possible states that are at point k and reachable in program P. It can be
equivalently formulated by translating the source program, P, to a system of equations that has
one equation per program point⁴,⁵,⁶. For each program point k, the solution yields the invariant
SGI(k).
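
For a finite-state model, SGI(k) can be computed directly as a reachability fixpoint. The sketch below assumes a hypothetical four-state structure (the state names, the labelling, and the functions `reachable` and `sgi` are illustrative only); real programs have state spaces that are far too large, or infinite, for this direct enumeration.

```python
# Hypothetical finite structure; "u" is at point 2 but unreachable.
succ = {
    "s0": {"s1", "s2"},
    "s1": set(),
    "s2": set(),
    "u": {"s1"},
}
labels = {
    "s0": {"initial", "at1"},
    "s1": {"final", "at2"},
    "s2": {"error", "at2"},
    "u": {"at2"},
}


def reachable(succ, initial_states):
    """Least fixpoint of X = initial ∪ succ(X): all reachable states."""
    seen = set(initial_states)
    frontier = list(initial_states)
    while frontier:
        s = frontier.pop()
        for n in succ[s]:
            if n not in seen:
                seen.add(n)
                frontier.append(n)
    return seen


def sgi(k, succ, labels):
    """Reachable states that are at program point k."""
    initial = {s for s in labels if "initial" in labels[s]}
    return {s for s in reachable(succ, initial) if f"at{k}" in labels[s]}


print(sgi(1, succ, labels), sgi(2, succ, labels), sgi(3, succ, labels))
```

State `u` is labelled `at2` but is excluded from the invariant at point 2 because no trace starting from an initial state reaches it.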

4 R. FLOYD, Assigning meaning to programs. In Mathematical Aspects of Computer Science, Proc. of Symposia on Applied
Mathematics, American Mathematical Society, 19-32, Providence, 1967.
5 D. PARK, Fixpoint induction and proofs of program properties, in Machine Intelligence, Edinburgh Univ. Press, 5:59-78, 1969.
6 E. CLARKE, Program invariants as fixedpoints, Computing, 21:273-294, 1979.

Definition 5. (Run-Time Error Modalities/Colors)
In some instances, deviation from the reference workflow explained in this document might
occur.

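In a program, P, define the modality (color) $C(k) \in \{\text{gray}, \text{red}, \text{green}, \text{orange}\}$ associated to
program point k as follows:

• $C(k) = \text{gray} \Leftrightarrow \mathrm{SGI}(k) = \emptyset$
• $C(k) = \text{red} \Leftrightarrow \mathrm{SGI}(k) \neq \emptyset$ and $\forall s \in \mathrm{SGI}(k),\ \mathit{error} \in \lambda(s)$
• $C(k) = \text{green} \Leftrightarrow \mathrm{SGI}(k) \neq \emptyset$ and $\forall s \in \mathrm{SGI}(k),\ \mathit{error} \notin \lambda(s)$
• $C(k) = \text{orange} \Leftrightarrow \exists s_1, s_2 \in \mathrm{SGI}(k),\ \mathit{error} \in \lambda(s_1)$ and $\mathit{error} \notin \lambda(s_2)$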

This formal definition of the (semantic, exact) color (modality) associated with program point k
is the cornerstone of the Polyspace Code Prover specification. Polyspace Code Prover does not
rely on a binary partition of cases (correct versus incorrect) but on a more expressive set of four
modalities. Having more than two modalities is a common phenomenon in modal or temporal
logics.

A program point associated with gray or green will not raise run-time errors during execution. A
program point associated with red triggers a run-time error if executed. Gray identifies a
program point that cannot be executed (dead code) and orange is associated with a program
point that can intermittently execute correctly or incorrectly.
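
The four cases can be sketched as a function of the set of states reachable at k. The example assumes a hypothetical operation `100 // d` at point k, where a state is reduced to the value of the divisor d; the names `exact_color` and `divides_by_zero` are illustrative and the set of reachable states is assumed to be finite and known, which is not the case for general programs.

```python
def exact_color(states_at_k, is_error):
    """Color of program point k, given SGI(k) as a finite set of states."""
    states = list(states_at_k)
    if not states:
        return "gray"      # SGI(k) is empty: the point is never reached
    errors = [is_error(s) for s in states]
    if all(errors):
        return "red"       # every reachable state fails
    if not any(errors):
        return "green"     # no reachable state fails
    return "orange"        # some reachable states fail, others do not


# Hypothetical operation at k: 100 // d, where a state is the value of d.
def divides_by_zero(d):
    return d == 0


colors = {
    "gray": exact_color(set(), divides_by_zero),
    "red": exact_color({0}, divides_by_zero),
    "green": exact_color({1, 2, 5}, divides_by_zero),
    "orange": exact_color({0, 3}, divides_by_zero),
}
print(colors)
```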

The color C(k) is a prediction of the future behavior of program P. If you compute the modality
C(k) for each program point k of a program P, you have a powerful means of verifying
the absence of run-time errors. First fix the red errors in P, and then fix the orange program
points by either inserting protection code or correcting the cause of the underlying red case. The
result is a program that contains only gray and green program points and is completely
free of run-time errors in any future execution.

It is not possible to automatically compute these modalities C(k) on general programs in
common programming languages.

Proposition 6. (Run-Time Error Modalities Are Noncomputable)
Given an arbitrary program, P, written in programming language L, the run-time error
modalities for P are noncomputable in finite time by any initially established means.

Observe that the halting problem (deciding if a program stops) is reducible to the run-time error
problem. As the halting problem has been shown to be undecidable by Church and Turing⁷ in
the general case, and for computer programs by Hoare and Allison⁸, it follows that the
computation of C(k) is undecidable as well, which in turn implies noncomputability in finite
time.
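
One way to picture the reduction, as a sketch under the assumption that Q is a deterministic program that raises no run-time error of its own (Q, x, and the label k are illustrative names): append a division by zero at a new program point k placed after Q. Point k is reached exactly when Q halts, so an algorithm that computes the color of k would also decide whether Q halts.

```latex
P \;=\; Q \,;\; k\colon\ x := 1/0
\qquad\Longrightarrow\qquad
C(k) =
\begin{cases}
\text{red}  & \text{if } Q \text{ halts, since } \mathrm{SGI}(k) \neq \emptyset
              \text{ and every state at } k \text{ is an error state} \\
\text{gray} & \text{if } Q \text{ does not halt, since } \mathrm{SGI}(k) = \emptyset
\end{cases}
```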

What is the practical significance of this theoretical fact? Does it mean that run-time errors are a
problem for which theoretical computer science cannot help? When confronted with a problem
whose computational complexity is too high, you can resort to approximate methods, provided
they give close enough solutions within acceptable time and memory usage.

What is a meaningful approximation in this case?

• Probabilistic or statistical methods give results of the form “C(k) is green with a 95%
  confidence interval”. These results raise questions that are difficult to answer, such as: What is
  the underlying notion of distribution? Such probabilistic information cannot validate highly
  critical systems such as power plant control systems or fly-by-wire software.
• Algebra, lattice theory⁹, and logics introduce another approximation that is based on
  implication and partial orderings. This approximation obtains results of the form: “C(k) is not
  orange or red”. This is an approximate property; it could mean that C(k) is actually either
  gray or green. Despite the approximation, it can be directly used because it implies that the
  operation at k is never wrong.

This document pursues this second approach, following the pioneering work of Wegbreit¹⁰,
further extended by Karr, Cousot and Cousot, Halbwachs, Jones, Sharir and Pnueli, and others to
more complex language constructs and more expressive properties.

7 A. TURING. Computability and λ-definability. J. Symbolic Logic, 2:153-163, 1937.
8 C. HOARE AND D. ALLISON. Incomputability, ACM Computing Surveys, 4(3):169-178, 1972.
9 G. BIRKHOFF, Lattice Theory, American Mathematical Society, 1940.
10 B. WEGBREIT, Property extraction in well-founded property sets, IEEE Transactions on Software Engineering, 1(3):270-285,
1975.

The key idea of Wegbreit’s work is that while the exact program property may not be
computable, a weaker property implied by the exact one may be computable. He also devises a
means to compute these approximate properties by replacing the exact invariant propagation of a
Floyd-style equation system by its image in a weaker space through an approximation operator
$\alpha$. If properly designed, the approximated system is solvable and its solution, called $\mathrm{SGI}^{\#}(k)$ at a
particular program point k, is related to the strongest global invariant by:
$\alpha(\mathrm{SGI}(k)) \subseteq \mathrm{SGI}^{\#}(k)$.
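
As an illustration of such an approximation operator, the sketch below uses integer intervals, which is an assumption made for this example only; the text does not state which weaker spaces Polyspace Code Prover uses. The function `alpha` maps a set of values to its enclosing interval and `gamma` maps an interval back to the set of values it represents. The result is always a superset of the exact set, possibly with values that never occur.

```python
def alpha(values):
    """Abstract a non-empty set of integers by its enclosing interval."""
    return (min(values), max(values))


def gamma(interval):
    """Set of integers represented by an interval (lo, hi)."""
    lo, hi = interval
    return set(range(lo, hi + 1))


# Hypothetical exact set of values of a variable at some program point.
exact = {-2, -1, 1, 2}
approx = gamma(alpha(exact))

print(alpha(exact))            # enclosing interval
print(sorted(approx - exact))  # values added by the approximation
```

The approximation adds the value 0, which the variable never takes. This is the price of computability: the direction of the error is controlled (no reachable value is lost), but precision can be lost.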

This approximate invariant can in turn be used in the computation of run-time error modalities
(colors) according to Definition 5. Replacing the exact invariant SGI(k) at point k in this
definition by a superset yields an approximate check color C#(k) that is related to the exact
color C(k) by the following relation.

Definition 7. (Admissible Check Modalities)
An approximate check color $C^{\#}(k)$ is admissible with respect to the exact check color $C(k)$ if,
and only if:

$C(k) \sqsubseteq C^{\#}(k)$

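The following lattice diagram defines the ordering $\sqsubseteq$ between check colors.

[Lattice diagram: gray at the bottom, red and green in the middle, orange at the top]

The partial order relation $\sqsubseteq$ is defined as follows, where a 1 in row x and column y means $x \sqsubseteq y$:

$\sqsubseteq$    Gray   Red   Green   Orange
Gray      1      1     1       1
Red              1             1
Green                  1       1
Orange                         1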
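
The relation in the table can be encoded directly, which makes it possible to check that it is a partial order and to test admissibility of a given pair of colors. The names `LEQ`, `leq`, `admissible`, and `is_partial_order` are illustrative only.

```python
COLORS = ["gray", "red", "green", "orange"]

# LEQ[x] is the set of colors y such that x is below y in the ordering.
LEQ = {
    "gray": {"gray", "red", "green", "orange"},
    "red": {"red", "orange"},
    "green": {"green", "orange"},
    "orange": {"orange"},
}


def leq(x, y):
    """True when color x is below or equal to color y."""
    return y in LEQ[x]


def admissible(exact, approx):
    """An approximate color is admissible when the exact one is below it."""
    return leq(exact, approx)


def is_partial_order():
    """Check reflexivity, antisymmetry, and transitivity of the table."""
    reflexive = all(leq(a, a) for a in COLORS)
    antisymmetric = all(
        a == b or not (leq(a, b) and leq(b, a))
        for a in COLORS for b in COLORS
    )
    transitive = all(
        leq(a, c) or not (leq(a, b) and leq(b, c))
        for a in COLORS for b in COLORS for c in COLORS
    )
    return reflexive and antisymmetric and transitive


print(is_partial_order())
print(admissible("green", "orange"), admissible("red", "green"))
```

Reporting orange for an exact green point is admissible (imprecise but sound), whereas reporting green for an exact red point is not.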

In turn, this induces the following meanings for the color defined by C#(k).

Proposition 8. (Semantics of C#(k))
If $C^{\#}(k)$ is computed with an approximate invariant that is a superset of $\mathrm{SGI}(k)$, then it is related
to the exact color $C(k)$, as follows:

• If $C^{\#}(k) = \text{gray}$, then $C(k) = \text{gray}$
• If $C^{\#}(k) = \text{green}$, then $C(k) \in \{\text{green}, \text{gray}\}$
• If $C^{\#}(k) = \text{red}$, then $C(k) \in \{\text{red}, \text{gray}\}$
• If $C^{\#}(k) = \text{orange}$, then $C(k) \in \{\text{red}, \text{green}, \text{gray}, \text{orange}\}$

Knowing the value of the computable term $C^{\#}(k)$ gives partial but useful information about the
actual value of the term of interest $C(k)$ that is not computable:

• If $C^{\#}(k) = \text{green}$, then $C(k)$ is neither red nor orange; the operation at k is correct.
• Conversely, if $C(k) = \text{orange}$ or $C(k) = \text{red}$, then $C^{\#}(k)$ is either red or orange.
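
On a small finite universe, these implications can be checked exhaustively. The sketch below is a brute-force illustration, not a proof: it assumes a hypothetical operation `100 // d` at point k, takes every possible exact invariant and every superset of it within the universe, and verifies that the exact color always lies in the set allowed by the approximate color. All names are illustrative.

```python
from itertools import combinations


def color(states, is_error):
    """Color computed from a finite set of states at point k."""
    states = list(states)
    if not states:
        return "gray"
    errors = [is_error(s) for s in states]
    if all(errors):
        return "red"
    if not any(errors):
        return "green"
    return "orange"


def subsets(universe):
    """All subsets of a finite set."""
    items = sorted(universe)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield set(combo)


# Exact colors allowed for each approximate color.
ALLOWED = {
    "gray": {"gray"},
    "green": {"green", "gray"},
    "red": {"red", "gray"},
    "orange": {"red", "green", "gray", "orange"},
}


def check_all_supersets(universe, is_error):
    """Compare exact and approximate colors for every invariant/superset pair."""
    for exact_inv in subsets(universe):
        for approx_inv in subsets(universe):
            if exact_inv <= approx_inv:
                exact = color(exact_inv, is_error)
                approx = color(approx_inv, is_error)
                if exact not in ALLOWED[approx]:
                    return False
    return True


def divides_by_zero(d):
    return d == 0


holds = check_all_supersets({0, 1, 2}, divides_by_zero)
imprecise = (color({-1, 1}, divides_by_zero), color({-1, 0, 1}, divides_by_zero))
print(holds, imprecise)
```

The pair `imprecise` shows the typical loss of precision: the exact divisor set {-1, 1} is green, but a superset that also contains 0 yields orange, which is still admissible.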

4 High-Level Specification of Polyspace Code Prover Outputs
Requirement HLR-1. (Soundness)
The outputs generated by Polyspace Code Prover shall be irrefutable (sound) with respect to
run-time errors, call trees, and data dictionaries, as specified by the applicable language standard.
This applies to programs that meet applicable language standards at compile-time: programs that
are syntactically correct and for which the context conditions prescribed by the standard are
satisfied (which includes type checking). Language-specific requirements and/or restrictions
may apply. The semantics of the corresponding programming language may be parameterized
by options passed to the Polyspace Code Prover software that describe the target processor and
the target environment, by options that change specific parts of the semantic model, or by
options that favor either analysis time or precision.

Requirement HLR-2. (Run-Time Errors Yield Checks)


Polyspace Code Prover shall give a predictive color C#(k) for every operation k that can possibly
raise a run-time error, which, depending on the language standard, can cause an exception, an
undefined behavior, a processor halt, or other unspecified conditions (including nondeterminism
due to the use of uninitialized data). This is specified further by a language-specific requirement.
Besides check colors, Polyspace Code Prover may also generate additional information that can
help the exploitation of check colors, including the indication of nonterminating constructs,
information about the dynamic ranges of values of variables, or other unspecified information.
This information is outside the scope of this document.

Requirement HLR-3. (Check Colors Are Sound)


Each predicted color C#(k) output by Polyspace Code Prover shall be sound. The predicted color
must be admissible according to Definition 7.

Requirement HLR-4. (Call Graphs)


Polyspace Code Prover shall output call graphs, which associate each source subprogram with
the source location of each statement that can dynamically issue a call to the subprogram.

Requirement HLR-5. (Call Graphs Are Sound)


Calls that can occur in the semantic model (dynamically) shall be referenced in the call graph.
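
The soundness condition on call graphs amounts to an inclusion: every call that can occur dynamically must be listed. As a sketch, assume a call graph represented as a dictionary from subprogram name to the set of source locations that can call it. The representation, the function `is_sound`, and the file names are hypothetical and do not describe the actual Polyspace Code Prover output format.

```python
def is_sound(call_graph, dynamic_calls):
    """True when every (callee, call site) pair appears in the call graph."""
    return all(
        site in call_graph.get(callee, set())
        for callee, site in dynamic_calls
    )


# Hypothetical call through a function pointer at main.c:12 that can
# target either f or g, plus a direct call to g at util.c:40.
call_graph = {
    "f": {"main.c:12"},
    "g": {"main.c:12", "util.c:40"},
}

# Calls observed in one particular execution.
dynamic_calls = [("g", "main.c:12"), ("g", "util.c:40")]

# A graph that omits one possible target of the function pointer.
incomplete = {
    "f": {"main.c:12"},
    "g": {"util.c:40"},
}

print(is_sound(call_graph, dynamic_calls), is_sound(incomplete, dynamic_calls))
```

A sound call graph may list calls that no execution performs (here, the call to `f`), but it may not omit one that can occur.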

Requirement HLR-6. (Data Dictionaries)


Polyspace Code Prover shall output data dictionaries containing, for each global variable, the
source location of each read or write access; its shared or nonshared status; and, optionally, its
dynamic value range, if it is a scalar.

Requirement HLR-7. (Data Accesses Referenced in Dictionaries Are Sound)
If a read or write access can be dynamically issued to a global variable, then it shall appear in
the data dictionary.

Requirement HLR-8. (Shared Statuses Referenced in Dictionaries Are Sound)
Each variable V not satisfying the Bernstein¹¹ condition for noninterference (if V can be
accessed by two tasks, threads, or interrupt routines with one access being a write access) shall
be referenced as shared in the data dictionary.
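
The condition stated in this requirement can be sketched as a check over a list of accesses. The sketch assumes that the accesses are already known as (task, variable, mode) triples, with mode `"r"` for read and `"w"` for write; this representation and the function name `shared_variables` are hypothetical.

```python
def shared_variables(accesses):
    """Variables accessed by at least two tasks with at least one write."""
    tasks = {}      # variable -> set of tasks that access it
    written = set() # variables with at least one write access
    for task, var, mode in accesses:
        tasks.setdefault(var, set()).add(task)
        if mode == "w":
            written.add(var)
    return {
        var for var, accessing in tasks.items()
        if len(accessing) >= 2 and var in written
    }


# Hypothetical accesses from two tasks.
accesses = [
    ("t1", "counter", "w"),
    ("t2", "counter", "r"),
    ("t1", "config", "r"),
    ("t2", "config", "r"),
    ("t1", "scratch", "w"),
]

print(shared_variables(accesses))
```

Only `counter` is reported: `config` is read by both tasks but never written, and `scratch` is written by a single task.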

Requirement HLR-12. (Compliance with coding standard)


Polyspace Code Prover and Polyspace Bug Finder shall output coding standard violations. For
each violation, Polyspace Code Prover and Polyspace Bug Finder provide the location and the
type of violation. This applies to known standards (MISRA or JSF) and to rules defined
by users.

Requirement HLR-13. (Coding metrics)


Polyspace Code Prover and Polyspace Bug Finder shall output coding metrics at the project, file,
and function levels (number of files, number of functions, cyclomatic number…).

11 A. BERNSTEIN, Analysis of Programs for Parallel Processing, IEEE Trans. on Computers, EC 15: 5, 757-763, 1966.

5 High-Level Specification of Polyspace Code Prover Outputs - Independence
Requirement HLR-10. (Component Independence)
Polyspace Code Prover core components shall be specified, developed, and tested independently
from MathWorks code generators. As a result, core components developed by one team are not
reused by another team.

Requirement HLR-11. (Behavior Independence)


Using another tool (for example, a compiler, IDE, or code generator) with Polyspace Code Prover
shall not change Polyspace Code Prover behavior or output, as stated in HLR-1 through HLR-8,
HLR-12, and HLR-13. Manual semantic and automatic behavioral changes shall be accessible from
Polyspace Code Prover through user-level options.

6 References
6.1 Reference Documents
Floating point arithmetic standard IEEE 754

Programming languages – C. International standard ISO/IEC 9899: 1990 (E)

Programming languages – C. International standard ISO/IEC 9899: 1999 (E)

Programming languages – C++. International standard ISO/IEC 14882: 1998 (E)
