Professional Documents
Culture Documents
Version 3.0 B
ii Power ISA™
Version 3.0 B
IBM®
Power ISA
PowerPC®
Power Architecture
PowerPC Architecture
Power Family
RISC/System 6000®
POWER®
POWER2
POWER4
POWER4+
POWER5
POWER5+
POWER6®
POWER7®
POWER8®
POWER9™
System/370
System z
Notice to U.S. Government Users—Documentation
Related to Restricted Rights—Use, duplication or dis-
closure is subject to restrictions set fourth in GSA ADP
Schedule Contract with IBM Corporation.
iii
Version 3.0 B
iv Power ISA™ I
Version 3.0 B
Preface
The roots of the Power ISA (Instruction Set Architec- As used in this document, the term “Power ISA” refers
ture) extend back over a quarter of a century, to IBM to the instructions and facilities described in Books I, II,
Research. The POWER (Performance Optimization and III.
With Enhanced RISC) Architecture was introduced with
Change bars have been included in the body of this
the RISC System/6000 product family in early 1990. In
document to indicate changes from the Power ISA Ver-
1991, Apple, IBM, and Motorola began the collabora-
sion 2.07B. Change bars may be omitted for changes
tion to evolve to the PowerPC Architecture, expanding
associated with removing obsolete categories and the
the architecture’s applicability. In 1997, Motorola and
second Book III.
IBM began another collaboration, focused on optimiz-
ing PowerPC for embedded systems, which produced
Book E.
In 2006, Freescale and IBM collaborated on the cre-
ation of the Power ISA Version 2.03, which represented
the reunification of the architecture by combining
Book E content with the more general purpose Pow-
erPC Version 2.02. The resulting architecture included
environment-specific privileged architecture optimiza-
tions (two Book IIIs) and optional application-specific
facilities (categories) as extensions to a pervasive base
architecture.
Power ISA Version 3.0 B focuses this integration by
choosing a single Book III and a set of widely used cat-
egories to become part of the base architecture for all
forward-looking Power implementations. All other
optional architecture categories have been eliminated
to ensure increased application portability between
Power processors. Legacy embedded applications that
require the eliminated material will continue to use V.
2.07B.
The Power ISA Version 3.0 B consists of three books
and a set of appendices.
Book I, Power ISA User Instruction Set Architecture,
covers the base instruction set and related facilities
available to the application programmer.
Book II, Power ISA Virtual Environment Architecture,
defines the storage model and other instructions and
facilities that enable the application programmer to cre-
ate multithreaded programs and programs that interact
with certain physical realities of the computing environ-
ment.
Book III, Power ISA Operating Environment Architec-
ture, defines the supervisor instructions and related
facilities.
Preface v
Version 3.0 B
vi Power ISA™
Version 3.0 B
Integer Multiply-Add Instructions: Adds new integer Accesses to unimplemented SPRs by the OS newly
multiply-add instructions to accelerate arbitrary-length cause interrupts that are also directed to the hypervisor.
multiplication.
Synchronizing Messages and Storage Updates: Adds a
msgsndp Hypervisor Facility Availability Interrupt: new instruction to make latent storage updates from
Adds a new HFSCR bit to control the availability of the another thread accessible after receiving a Directed
msgsndp instruction and the associated control regis- Hypervisor Doorbell interrupt from that thread.
ters.
VSX Conditional: Adds new instruction to
VSX Permute: Adds new pernute instructions that can accelerate conditional, maximum, and minimum opera-
address all 64 VSRs. tions. Withdrew xscmpnedp, xvcmpnesp[.], and xvc-
mpnedp[.] instructions introduced in v3.0.
Array Index Support: Enhance support for mixed-data-
type addressing into arrays (e.g., base + 32-bit index) FXU & Vector Extensions for Blockchain Support: Two
new instructions (addex and vmsumudm) introduced to
Hypervisor Virtualization Interrupt: Defines a new
accelerate arbitrary-precision integer arithmetic, and
exception and corresponding interrupt that is caused
specifically to accelerate Blockchain’s implementation
by events external to the processor that relate to virtu-
of elliptical curve encryption signature algorithm. The
alization.
OV bit is employed to provide an additional, indepen-
dent carry status bit, allowing software to parallelize
carry propagation.
wait Instruction Enhancements: Improves the capabili-
ties of the wait instruction so that resumption of pro- Miscellaneous Changes: Makes minor clarifications,
cessing can occur due to event-based branches and corrections, and editorial enhancements.
external signals.
FX/VSX/Vector Miscellaneous: Editorial cleanup of
Decrementer and Hypervisor Decrementer Enahnce- Book I chapters 4, 5, and 7.
ments: Defines a new mode bit in the LPCR that
TM Multithread Overflow: Adds a bit to TEX-
enables additional Decrementer and Hypervisor Decre-
ASR to enable software to differentiate single thread
menter bits in order to increase the time between the
footprint overflow from that aggravated by multiple
associated interrupts.
threads competing for footprint.
Deliver A Random Number: Adds a new instruction to
Lightweight mffs: Modifications of mffs to accelerate
place a random number in a GPR in one of three for-
saving/setting/restoring floating-point environments
mats.
(e.g., rounding modes, exception trapping enables)
Data Storage Interrupt Status Register for Alignment common in math libraries that require overriding the
Interrupt: Simplifies the Alignment interrupt by remov- environment.
ing the Data Storage Interrupt Status Register (DSISR)
from the set of registers modified by the Alignment
interrupt.
CA32 & OV32 and Move XER to CR Extended: Added
support for 32-bit CA & OV status in 64-bit mode for
dynamically-typed languages.
VSX Shift Variable: Accelerate parallel element
extraction from packed vectors of arbitrary-width-ele-
ment values.
Enhanced Virtualization for Linux: Delivers exceptions
caused by the OS attempting to use hypervisor instruc-
tions and SPRs to the hypervisor instead of the OS.
Preface vii
Version 3.0 B
Table of Contents
1.6.19 XO-FORM . . . . . . . . . . . . . . . . . 15
1.6.20 XS-FORM. . . . . . . . . . . . . . . . . . 15
Preface. . . . . . . . . . . . . . . . . . . . . . . . . v 1.6.21 XX2-FORM. . . . . . . . . . . . . . . . . 15
Summary of Changes in Power ISA Ver- 1.6.22 XX3-FORM. . . . . . . . . . . . . . . . . 15
sion 3.0 B . . . . . . . . . . . . . . . . . . . . . . . . vi 1.6.23 XX4-FORM. . . . . . . . . . . . . . . . . 15
1.6.24 Z22-FORM . . . . . . . . . . . . . . . . . 15
Table of Contents . . . . . . . . . . . . . . . . ix 1.6.25 Z23-FORM . . . . . . . . . . . . . . . . . 16
1.7 Instruction Fields . . . . . . . . . . . . . . . 16
1.8 Classes of Instructions . . . . . . . . . . 22
Book I: 1.8.1 Defined Instruction Class . . . . . . . 22
1.8.2 Illegal Instruction Class . . . . . . . . 22
Power ISA User Instruction Set 1.8.3 Reserved Instruction Class . . . . . 22
1.9 Forms of Defined Instructions . . . . . 23
Architecture. . . . . . . . . . . . . . . . . . . . 1
1.9.1 Preferred Instruction Forms . . . . . 23
1.9.2 Invalid Instruction Forms . . . . . . . 23
Chapter 1. Introduction . . . . . . . . . . 3 1.9.3 Reserved-no-op Instructions . . . . 23
1.1 Overview. . . . . . . . . . . . . . . . . . . . . . 3 1.10 Exceptions. . . . . . . . . . . . . . . . . . . 23
1.2 Instruction Mnemonics and Operands3 1.11 Storage Addressing . . . . . . . . . . . . 24
1.3 Document Conventions . . . . . . . . . . 3 1.11.1 Storage Operands . . . . . . . . . . . 24
1.3.1 Definitions . . . . . . . . . . . . . . . . . . . 3 1.11.2 Instruction Fetches . . . . . . . . . . . 26
1.3.2 Notation . . . . . . . . . . . . . . . . . . . . . 4 1.11.3 Effective Address Calculation . . . 27
1.3.3 Reserved Fields, Reserved Values,
and Reserved SPRs . . . . . . . . . . . . . . . . 5 Chapter 2. Branch Facility . . . . . . . 29
1.3.4 Description of Instruction Operation 6 2.1 Branch Facility Overview. . . . . . . . . 29
1.3.5 Phased-Out Facilities . . . . . . . . . . 8 2.2 Instruction Execution Order. . . . . . . 29
1.4 Processor Overview . . . . . . . . . . . . . 9 2.3 Branch Facility Registers . . . . . . . . 30
1.5 Computation modes . . . . . . . . . . . . 10 2.3.1 Condition Register . . . . . . . . . . . . 30
1.6 Instruction Formats . . . . . . . . . . . . . 11 2.3.2 Link Register . . . . . . . . . . . . . . . . 32
1.6.1 A-FORM . . . . . . . . . . . . . . . . . . . 12 2.3.3 Count Register . . . . . . . . . . . . . . . 32
1.6.2 B-FORM . . . . . . . . . . . . . . . . . . . 12 2.3.4 Target Address Register. . . . . . . . 32
1.6.3 D-FORM . . . . . . . . . . . . . . . . . . . 12 2.4 Branch Instructions . . . . . . . . . . . . . 33
1.6.4 DQ-FORM . . . . . . . . . . . . . . . . . . 12 2.5 Condition Register Instructions . . . . 40
1.6.5 DS-FORM . . . . . . . . . . . . . . . . . . 12 2.5.1 Condition Register Logical Instruc-
1.6.6 DX-FORM . . . . . . . . . . . . . . . . . . 12 tions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
1.6.7 I-FORM . . . . . . . . . . . . . . . . . . . . 12 2.5.2 Condition Register Field Instruction .
1.6.8 M-FORM . . . . . . . . . . . . . . . . . . . 12 41
1.6.9 MD-FORM . . . . . . . . . . . . . . . . . . 12 2.6 System Call Instructions. . . . . . . . . 42
1.6.10 MDS-FORM . . . . . . . . . . . . . . . . 12
1.6.11 SC-FORM . . . . . . . . . . . . . . . . . 12
1.6.12 VA-FORM . . . . . . . . . . . . . . . . . 12 Chapter 3. Fixed-Point Facility. . . . 45
1.6.13 VC-FORM . . . . . . . . . . . . . . . . . 12 3.1 Fixed-Point Facility Overview . . . . . 45
1.6.14 VX-FORM . . . . . . . . . . . . . . . . . 13 3.2 Fixed-Point Facility Registers . . . . . 45
1.6.15 X-FORM . . . . . . . . . . . . . . . . . . 13 3.2.1 General Purpose Registers . . . . . 45
1.6.16 XFL-FORM . . . . . . . . . . . . . . . . 15 3.2.2 Fixed-Point Exception
1.6.17 XFX-FORM . . . . . . . . . . . . . . . . 15 Register . . . . . . . . . . . . . . . . . . . . . . . . . 45
1.6.18 XL-FORM . . . . . . . . . . . . . . . . . 15 3.2.3 VR Save Register. . . . . . . . . . . . . 46
3.3 Fixed-Point Facility Instructions . . . 47
Table of Contents ix
Version 3.0 B
x Power ISA™
Version 3.0 B
Table of Contents xi
Version 3.0 B
6.9.1.6 Vector Integer Negate Instructions. 6.17.4 Decimal Integer Shift and Round
293 Instructions . . . . . . . . . . . . . . . . . . . . . 357
6.9.2 Vector Extend Sign Instructions .294 6.17.5 Decimal Integer Truncate Instruc-
6.9.2.1 Vector Integer Average Instructions tions . . . . . . . . . . . . . . . . . . . . . . . . . . 360
295 6.18 Vector Status and Control Register
6.9.2.2 Vector Integer Absolute Difference Instructions . . . . . . . . . . . . . . . . . . . . . 362
Instructions . . . . . . . . . . . . . . . . . . . . . .297
6.9.2.3 Vector Integer Maximum and Mini- Chapter 7. Vector-Scalar
mum Instructions . . . . . . . . . . . . . . . . .299 Floating-Point Operations . . . . . . 363
6.9.3 Vector Integer Compare Instructions.
7.1 Introduction . . . . . . . . . . . . . . . . . . 363
303
7.1.1 Overview of the Vector-Scalar Exten-
6.9.4 Vector Logical Instructions . . . . .312
sion . . . . . . . . . . . . . . . . . . . . . . . . . . . 363
6.9.5 Vector Parity Byte Instructions . .314
7.1.1.1 Compatibility with Floating-Point
6.9.6 Vector Integer Rotate and Shift
and Decimal Floating-Point Operations 363
Instructions . . . . . . . . . . . . . . . . . . . . . .315
7.1.1.2 Compatibility with Vector Opera-
6.10 Vector Floating-Point Instruction Set .
tions . . . . . . . . . . . . . . . . . . . . . . . . . . 363
321
7.2 VSX Registers . . . . . . . . . . . . . . . 364
6.10.1 Vector Floating-Point Arithmetic
7.2.1 Vector-Scalar Registers . . . . . . . 364
Instructions . . . . . . . . . . . . . . . . . . . . . .321
7.2.1.1 Floating-Point Registers . . . . . 364
6.10.2 Vector Floating-Point Maximum and
7.2.1.2 Vector Registers . . . . . . . . . . . 366
Minimum Instructions . . . . . . . . . . . . . .323
7.2.2 Floating-Point Status and Control
6.10.3 Vector Floating-Point Rounding and
Register. . . . . . . . . . . . . . . . . . . . . . . . 367
Conversion Instructions . . . . . . . . . . . .324
7.3 VSX Operations . . . . . . . . . . . . . . 372
6.10.4 Vector Floating-Point Compare
7.3.1 VSX Floating-Point Arithmetic Over-
Instructions . . . . . . . . . . . . . . . . . . . . . .328
view . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
6.10.5 Vector Floating-Point Estimate
7.3.2 VSX Floating-Point Data . . . . . . 373
Instructions . . . . . . . . . . . . . . . . . . . . . .331
7.3.2.1 Data Format . . . . . . . . . . . . . . 373
6.11 Vector Exclusive-OR-based Instruc-
7.3.2.2 Value Representation . . . . . . . 375
tions . . . . . . . . . . . . . . . . . . . . . . . . . . .333
7.3.2.3 Sign of Result . . . . . . . . . . . . . 376
6.11.1 Vector AES Instructions. . . . . . .333
7.3.2.4 Normalization and Denormalization
6.11.2 Vector SHA-256 and SHA-512
377
Sigma Instructions . . . . . . . . . . . . . . . .335
7.3.2.5 Data Handling and Precision . 377
6.11.3 Vector Binary Polynomial Multiplica-
7.3.2.6 Rounding . . . . . . . . . . . . . . . . 381
tion Instructions . . . . . . . . . . . . . . . . . .336
7.3.3 VSX Floating-Point Execution Mod-
6.11.4 Vector Permute and Exclusive-OR
els . . . . . . . . . . . . . . . . . . . . . . . . . . . . 384
Instruction . . . . . . . . . . . . . . . . . . . . . . .338
7.3.3.1 VSX Execution Model for IEEE
6.12 Vector Gather Instruction . . . . . . .339
Operations . . . . . . . . . . . . . . . . . . . . . 384
6.13 Vector Count Leading Zeros Instruc-
7.3.3.2 VSX Execution Model for Multi-
tions . . . . . . . . . . . . . . . . . . . . . . . . . . .340
ply-Add Type Instructions . . . . . . . . . . 385
6.14 Vector Count Trailing Zeros Instruc-
7.4 VSX Floating-Point Exceptions. . . 387
tions . . . . . . . . . . . . . . . . . . . . . . . . . . .341
7.4.1 Floating-Point Invalid Operation
6.14.1 Vector Count Leading/Trailing Zero
Exception . . . . . . . . . . . . . . . . . . . . . . 390
LSB Instructions . . . . . . . . . . . . . . . . . .342
7.4.1.1 Definition. . . . . . . . . . . . . . . . . 390
6.14.2 Vector Extract Element Instructions
7.4.1.2 Action for VE=1. . . . . . . . . . . . 390
343
7.4.1.3 Action for VE=0. . . . . . . . . . . . 392
6.15 Vector Population Count Instructions .
7.4.2 Floating-Point Zero Divide Exception
345
401
6.16 Vector Bit Permute Instruction . . .346
7.4.2.1 Definition. . . . . . . . . . . . . . . . . 401
6.17 Decimal Integer Instructions. . . . .347
7.4.2.2 Action for ZE=1. . . . . . . . . . . . 401
6.17.1 Decimal Integer Arithmetic Instruc-
7.4.2.3 Action for ZE=0. . . . . . . . . . . . 402
tions . . . . . . . . . . . . . . . . . . . . . . . . . . .347
7.4.3 Floating-Point Overflow Exception .
6.17.2 Decimal Integer Format Conversion
404
Instructions . . . . . . . . . . . . . . . . . . . . . .350
7.4.3.1 Definition. . . . . . . . . . . . . . . . . 404
6.17.3 Decimal Integer Sign Manipulation
7.4.3.2 Action for OE=1 . . . . . . . . . . . 404
Instructions . . . . . . . . . . . . . . . . . . . . . .356
7.4.3.3 Action for OE=0 . . . . . . . . . . . 407
1.7 Shared Storage . . . . . . . . . . . . . .814 4.6.2 Load and Reserve and Store Condi-
1.7.1 Storage Access Ordering . . . . .815 tional Instructions . . . . . . . . . . . . . . . . 863
1.7.2 Storage Ordering of Copy/Paste-Initi- 4.6.2.1 64-Bit Load and Reserve and Store
ated Data Transfers . . . . . . . . . . . . . . .817 Conditional Instructions. . . . . . . . . . . . 869
1.7.3 Storage Ordering of I/O Accesses. . . 4.6.2.2 128-bit Load and Reserve Store
817 Conditional Instructions. . . . . . . . . . . . 871
1.7.4 Atomic Update. . . . . . . . . . . . . . .817 4.6.3 Memory Barrier Instructions . . . 873
1.7.4.1 Reservations . . . . . . . . . . . . .818 4.6.4 Wait Instruction . . . . . . . . . . . . . 876
1.7.4.2 Forward Progress . . . . . . . . . .820
1.8 Transactions. . . . . . . . . . . . . . . . . .821 Chapter 5. Transactional Memory
1.8.1 Rollback-Only Transactions . . . .823 Facility . . . . . . . . . . . . . . . . . . . . . 877
1.9 Instruction Storage . . . . . . . . . . . . .823
5.1 Transactional Memory Facility Over-
1.9.1 Concurrent Modification and Execu-
view . . . . . . . . . . . . . . . . . . . . . . . . . . . 877
tion of Instructions . . . . . . . . . . . . . . . .825
5.1.1 Definitions . . . . . . . . . . . . . . . . . 878
5.2 Transactional Memory Facility States.
Chapter 2. Performance 880
Considerations and Instruction 5.2.1 The TDOOMED Bit . . . . . . . . . . 882
Restart . . . . . . . . . . . . . . . . . . . . . . 827 5.3 Transaction Failure . . . . . . . . . . . . 882
2.1 Performance-Optimized Instruction 5.3.1 Causes of Transaction Failure . . 882
Sequences . . . . . . . . . . . . . . . . . . . . . .827 5.3.2 Recording of Transaction Failure 885
2.1.1 Load and Store Operations . . . . .828 5.3.3 Handling of Transaction Failure . 885
2.1.2 32-Bit Constant Generation. . . . .831 5.4 Transactional Memory Facility Regis-
2.1.3 Sign and Zero Extension . . . . . .831 ters . . . . . . . . . . . . . . . . . . . . . . . . . . . 886
2.1.4 Load/Store Addressing Relative to 5.4.1 Transaction Failure Handler Address
Program Counter . . . . . . . . . . . . . . . . .832 Register (TFHAR) . . . . . . . . . . . . . . . . 886
2.1.5 Destructive Operation Operand 5.4.2 Transaction EXception And Status
Preservation . . . . . . . . . . . . . . . . . . . . .833 Register (TEXASR) . . . . . . . . . . . . . . . 886
2.2 Instruction Restart . . . . . . . . . . . .834 5.4.3 Transaction Failure Instruction
Address Register (TFIAR). . . . . . . . . . 889
Chapter 3. Management of Shared 5.5 Transactional Facility Instructions. 890
Resources . . . . . . . . . . . . . . . . . . . 835 Chapter 6. Time Base . . . . . . . . . 897
3.1 Program Priority Registers . . . . . . .835
6.1 Time Base Instructions . . . . . . . . . 898
3.2 “or” Instruction . . . . . . . . . . . . . . . .835
Table of Contents xv
Version 3.0 B
4.3.7 Program Priority Register . . . . . .963 5.7.8.3 Segment Table Description and
4.3.8 Problem State Priority Boost Regis- Search. . . . . . . . . . . . . . . . . . . . . . . . . 995
ter . . . . . . . . . . . . . . . . . . . . . . . . . . . . .963 5.7.8.3.1 Primary Hash for 256MB Seg-
4.3.9 Relative Priority Register. . . . . . .963 ment . . . . . . . . . . . . . . . . . . . . . . . . . . 996
4.3.10 Software-use SPRs. . . . . . . . . .964 5.7.8.3.2 Primary Hash for 1TB Segment.
4.4 Fixed-Point Facility Instructions . . .965 996
4.4.1 Fixed-Point Load and Store Caching 5.7.8.3.3 Secondary Hash for 256MB Seg-
Inhibited Instructions. . . . . . . . . . . . . . .965 ment . . . . . . . . . . . . . . . . . . . . . . . . . . 996
4.4.2 OR Instruction . . . . . . . . . . . . . . .968 5.7.8.3.4 Secondary Hash for 1TB Seg-
4.4.3 Transactional Memory Instructions . . ment . . . . . . . . . . . . . . . . . . . . . . . . . . 996
969 5.7.9 Hashed Page Table Translation. 996
4.4.4 Move To/From System Register 5.7.9.1 Hashed Page Table . . . . . . . . 998
Instructions . . . . . . . . . . . . . . . . . . . . . .970 5.7.9.2 Page Table Search . . . . . . . . . 999
5.7.10 Radix Tree Translation. . . . . . 1001
Chapter 5. Storage Control . . . . . 981 5.7.10.1 Radix Tree Page Directory Entry
5.1 Overview . . . . . . . . . . . . . . . . . . . .981 1002
5.2 Storage Exceptions . . . . . . . . . . . .981 5.7.10.2 Radix Tree Page Table Entry1003
5.3 Instruction Fetch . . . . . . . . . . . . . .981 5.7.10.3 Nested Translation . . . . . . . 1003
5.3.1 Implicit Branch. . . . . . . . . . . . . . .981 5.7.11 Translation Process . . . . . . . . 1005
5.3.2 Address Wrapping Combined with 5.7.11.1 Fully-Qualified Address . . . . 1005
Changing MSR Bit SF . . . . . . . . . . . . .981 5.7.11.2 Finding the Page Tables . . . 1006
5.4 Data Access . . . . . . . . . . . . . . . . . .982 5.7.11.3 Obtaining Host Real Address,
5.5 Performing Operations Radix on Radix . . . . . . . . . . . . . . . . . 1006
Out-of-Order . . . . . . . . . . . . . . . . . . . . .982 5.7.11.4 Obtaining Host Real Address,
5.6 Invalid Real Address . . . . . . . . . . .982 HPT . . . . . . . . . . . . . . . . . . . . . . . . . . 1007
5.7 Storage Addressing . . . . . . . . . . . .983 5.7.12 Reference and Change Recording
5.7.1 32-Bit Mode. . . . . . . . . . . . . . . . .983 1007
5.7.2 Virtualized Partition Memory (VPM) 5.7.13 Storage Protection . . . . . . . . . 1011
Mode. . . . . . . . . . . . . . . . . . . . . . . . . . .984 5.7.13.1 Virtual Page Class Key Protection
5.7.3 Hypervisor Real And Virtual Real 1011
Addressing Modes . . . . . . . . . . . . . . . .984 5.7.13.2 Basic Storage Protection,
5.7.3.1 Hypervisor Offset Real Mode Address Translation Enabled . . . . . . 1015
Address . . . . . . . . . . . . . . . . . . . . . . . .984 5.7.13.3 Basic Storage Protection,
5.7.3.2 Storage Control Attributes for Address Translation Disabled . . . . . . 1016
Accesses in Hypervisor Real Addressing 5.7.13.4 Radix Tree Translation Storage
Mode. . . . . . . . . . . . . . . . . . . . . . . . . . .984 Protection . . . . . . . . . . . . . . . . . . . . . 1016
5.7.3.2.1 Hypervisor Real Mode Storage 5.8 Storage Control Attributes . . . . . 1017
Control . . . . . . . . . . . . . . . . . . . . . . . . .985 5.8.1 Guarded Storage . . . . . . . . . . . 1017
5.7.3.3 Virtual Real Mode Addressing 5.8.1.1 Out-of-Order Accesses to Guarded
Mechanism . . . . . . . . . . . . . . . . . . . . . .985 Storage . . . . . . . . . . . . . . . . . . . . . . . 1018
5.7.3.4 Storage Control Attributes for 5.8.2 Storage Control Bits . . . . . . . . 1018
Implicit Storage Accesses. . . . . . . . . . .986 5.8.2.1 Storage Control Bit Restrictions . .
5.7.4 Definitions . . . . . . . . . . . . . . . . . .986 1019
5.7.5 Address Ranges Having Defined 5.8.2.2 Altering the Storage Control Bits .
Uses . . . . . . . . . . . . . . . . . . . . . . . . . . .987 1019
5.7.5.1 Effective Address Space Structure 5.9 Storage Control Instructions . . . . 1021
for Radix-using Partitions . . . . . . . . . . .987 5.9.1 Cache Management Instructions . . .
5.7.6 In-Memory Tables . . . . . . . . . . . .988 1021
5.7.6.1 Partition Table . . . . . . . . . . . . .989 5.9.2 Synchronize Instruction . . . . . . 1021
5.7.6.2 Process Table. . . . . . . . . . . . . .991 5.9.3 Lookaside Buffer
5.7.7 Address Translation Overview . .991 Management . . . . . . . . . . . . . . . . . . . 1022
5.7.8 Segment Translation . . . . . . . . . .994 5.9.3.1 Thread-Specific Segment Transla-
5.7.8.1 Segment Lookaside Buffer (SLB) . tions . . . . . . . . . . . . . . . . . . . . . . . . . 1023
994 5.9.3.2 SLB Management Instructions . .
5.7.8.2 SLB Search . . . . . . . . . . . . . . .995 1023
Book I:
2 Power ISA™ I
Version 3.0 B
Chapter 1. Introduction
positive
1.1 Overview Means greater than zero.
This chapter describes computation modes,document negative
conventions, a processor overview, instruction formats, Means less than zero.
storage addressing, and instruction fetching.
floating-point single format (or simply single
format)
1.2 Instruction Mnemonics and Refers to the representation of a single-precision
binary floating-point value in a register or storage.
Operands
floating-point double format (or simply double
The description of each instruction includes the mne- format)
monic and a formatted list of operands. Some exam- Refers to the representation of a double-precision
ples are the following. binary floating-point value in a register or storage.
system library program
stw RS,D(RA)
A component of the system software that can be
addis RT,RA,SI
called by an application program using a Branch
Power ISA-compliant Assemblers will support the mne- instruction.
monics and operand lists exactly as shown. They
system service program
should also provide certain extended mnemonics, such
A component of the system software that can be
as the ones described in Appendix C of Book I.
called by an application program using a System
Call or System Call Vectored instruction.
1.3 Document Conventions system trap handler
A component of the system software that receives
control when the conditions specified in a Trap
1.3.1 Definitions instruction are satisfied.
The following definitions are used throughout this docu- system error handler
ment. A component of the system software that receives
control when an error occurs. The system error
program
handler includes a component for each of the vari-
A sequence of related instructions.
ous kinds of error. These error-specific compo-
application program nents are referred to as the system alignment error
A program that uses only the instructions and handler, the system data storage error handler,
resources described in Books I and II. etc.
processor latency
The hardware component that implements the Refers to the interval from the time an instruction
instruction set, storage model, and other facilities begins execution until it produces a result that is
defined in the Power ISA architecture, and exe- available for use by a subsequent instruction.
cutes the instructions specified in a program.
unavailable
quadword, doubleword, word, halfword, and Refers to a resource that cannot be used by the
byte program. For example, storage is unavailable if
128 bits, 64 bits, 32 bits, 16 bits, and 8 bits, access to it is denied. See Book III.
respectively.
Chapter 1. Introduction 3
Version 3.0 B
undefined value are not used with them. Parentheses are also
May vary between implementations, and between omitted when register x is the register into which
different executions on the same implementation, the result of an operation is placed.
and similarly for register contents, storage con-
(RA|0) means the contents of register RA if the RA
tents, etc., that are specified as being undefined.
field has the value 1-31, or the value 0 if the RA
boundedly undefined field is 0.
The results of executing a given instruction are Bytes in instructions, fields, and bit strings are
said to be boundedly undefined if they could have numbered from left to right, starting with byte 0
been achieved by executing an arbitrary finite (most significant).
sequence of instructions (none of which yields Bits in registers, instructions, fields, and bit strings
boundedly undefined results) in the state the pro- are specified as follows. In the last three items
cessor was in before executing the given instruc- (definition of Xp etc.), if X is a field that specifies a
tion. Boundedly undefined results may include the GPR, FPR, or VR (e.g., the RS field of an instruc-
presentation of inconsistent state to the system tion), the definitions apply to the register, not to the
error handler as described in Section 1.9.1 of Book field.
II. Boundedly undefined results for a given instruc-
tion may vary between implementations, and - Bits in instructions, fields, and bit strings are
between different executions on the same imple- numbered from left to right, starting with bit 0
mentation. - For all registers except the Vector registers,
“must” bits in registers that are less than 64 bits start
If software violates a rule that is stated using the with bit number 64-L, where L is the register
word “must” (e.g., “this field must be set to 0”), the length; for the Vector registers, bits in regis-
results are boundedly undefined unless otherwise ters that are less than 128 bits start with bit
stated. number 128-L.
- The leftmost bit of a sequence of bits is the
sequential execution model most significant bit of the sequence.
The model of program execution described in - Xp means bit p of register/instruction/field/
Section 2.2, “Instruction Execution Order” on bit_string X.
page 29. - Xp:q means bits p through q of register/instruc-
tion/field/bit_string X.
- Xp q ... means bits p, q, ... of register/instruc-
1.3.2 Notation tion/field/bit_string X.
¬(RA) means the one’s complement of the con-
The following notation is used throughout the Power tents of register RA.
ISA documents.
A period (.) as the last character of an instruction
All numbers are decimal unless specified in some mnemonic means that the instruction records sta-
special way. tus information in certain fields of the Condition
- 0bnnnn means a number expressed in binary Register as a side effect of execution.
format. The symbol || is used to describe the concatena-
- 0xnnnn means a number expressed in hexa- tion of two values. For example, 010 || 111 is the
decimal format. same as 010111.
Underscores may be used between digits. xn means x raised to the nth power.
RT, RA, R1, ... refer to General Purpose Registers. nx means the replication of x, n times (i.e., x con-
FRT, FRA, FR1, ... refer to Floating-Point Regis- catenated to itself n-1 times). n0 and n1 are spe-
ters. cial cases:
n0 means a field of n bits with each bit equal to
FRTp, FRAp, FRBp, ... refer to an even-odd pair of -
Floating-Point Registers. Values must be even, 0. Thus 50 is equivalent to 0b00000.
n1 means a field of n bits with each bit equal to
otherwise the instruction form is invalid. -
1. Thus 51 is equivalent to 0b11111.
VRT, VRA, VR1, ... refer to Vector Registers.
Each bit and field in instructions, and in status and
(x) means the contents of register x, where x is the control registers (e.g., XER, FPSCR) and Special
name of an instruction field. For example, (RA) Purpose Registers, is either defined or reserved.
means the contents of register RA, and (FRA) Some defined fields contain reserved values. In
means the contents of register FRA, where RA and such cases when this document refers to the spe-
FRA are instruction fields. Names such as LR and cific field, it refers only to the defined values,
CTR denote registers, not fields, so parentheses unless otherwise specified.
4 Power ISA™ I
Version 3.0 B
/, //, ///, ... denotes a reserved field, in a register, 1.3.3 Reserved Fields, Reserved
instruction, field, or bit string.
Values, and Reserved SPRs
?, ??, ???, ... denotes an implementation-depen-
dent field in a register, instruction, field or bit string. Reserved fields in instructions are ignored by the pro-
cessor.
In some cases a defined field of an instruction has cer-
tain values that are reserved. This includes cases in
which the field is shown in the instruction layout as con-
taining a particular value; in such cases all other values
of the field are reserved. In general, if an instruction is
coded such that a defined field contains a reserved
value the instruction form is invalid; see Section 1.9.2
on page 23. The only exception to the preceding rule is
that it does not apply to Reserved and Illegal classes of
instructions (see Section 1.8) or to portions of defined
fields that are specified, in the instruction description,
as being treated as reserved fields.
To maximize compatibility with future architecture
extensions, software must ensure that reserved fields
in instructions contain zero and that defined fields of
instructions do not contain reserved values.
The handling of reserved bits in System Registers (e.g.,
XER, FPSCR) depends on whether the processor is in
problem state. Unless otherwise stated, software is per-
mitted to write any value to such a bit. In problem state,
a subsequent reading of the bit returns 0 regardless of
the value written; in privileged states, a subsequent
reading of the bit returns 0 if the value last written to the
bit was 0 and returns an undefined value (0 or 1) other-
wise.
In some cases, a defined field of a System Register
has certain values that are reserved. Software must not
set a defined field of a System Register to a reserved
value. References elsewhere in this document to a
defined field (in an instruction or System Register) that
has reserved values assume the field does not contain
a reserved value, unless otherwise stated or obvious
from context.
In some cases, a given bit of a System Register is
specified to be set to a constant value by a given
instruction or event. Unless otherwise stated or obvious
from context, software should not depend on this con-
stant value because the bit may be assigned a mean-
ing in a future version of the architecture.
The reserved SPRs include SPRs 808, 809, 810, and
811. mtspr and mfspr instructions specifying these
SPRs are treated as no-ops. Reserved SPRs are pro-
vided in the architecture to anticipate the eventual
adoption of performance hint functionality that must be
controlled by SPRs. Control of these capabilities using
reserved SPRs will allow software to use these new
capabilities on new implementations that support them
while remaining compatible with existing implementa-
tions that may not support the new functionality.
Chapter 1. Introduction 5
Version 3.0 B
Reserved SPRs are not assigned names. There are no (“Non-standard” setting of these registers, such as the
individual descriptions of reserved SPRs in this docu- setting of the Condition Register by the Compare
ment. instructions, is shown.) The RTL descriptions do not
cover cases in which the system error handler is
Assembler Note invoked, or for which the results are boundedly unde-
Assemblers should report uses of reserved values fined.
of defined fields of instructions as errors. The RTL descriptions specify the architectural transfor-
mation performed by the execution of an instruction.
Programming Note They do not imply any particular implementation.
It is the responsibility of software to preserve bits
that are now reserved in System Registers,
Notation Meaning
because they may be assigned a meaning in some
Assignment
future version of the architecture.
iea Assignment of an instruction effective
In order to accomplish this preservation in imple- address. In 32-bit mode the high-order 32
mentation-independent fashion, software should do bits of the 64-bit target address are set to
the following. 0.
¬ NOT logical operator
Initialize each such register supplying zeros for
all reserved bits. + Two’s complement addition
Alter (defined) bit(s) in the register by reading - Two’s complement subtraction, unary
the register, altering only the desired bit(s), minus
and then writing the new value back to the reg- Multiplication
ister. si Signed-integer multiplication
ui Unsigned-integer multiplication
The XER and FPSCR are partial exceptions to this / Division
recommendation. Software can alter the status bits Division, with result truncated to integer
in these registers, preserving the reserved bits, by % Remainder of integer division
executing instructions that have the side effect of Square root
altering the status bits. Similarly, software can alter =, Equals, Not Equals relations
any defined bit in the FPSCR by executing a Float- <, , >, Signed comparison relations
ing-Point Status and Control Register instruction. <u, >u Unsigned comparison relations
Using such instructions is likely to yield better per- ? Unordered comparison relation
formance than using the method described in the &, | AND, OR logical operators
second item above.
, Exclusive OR, Equivalence logical opera-
tors ((ab) = (a¬b))
ABS(x) Absolute value of x
1.3.4 Description of Instruction BCD_TO_DPD(x)
Operation The low-order 24 bits of x contain six, 4-bit
BCD fields which are converted to two
Instruction descriptions (including related material such declets; each set of two declets is placed
as the introduction to the section describing the instruc- into the low-order 20 bits of the result. See
tions) mention that the instruction may cause a system Section B.1, “BCD-to-DPD Translation”.
error handler to be invoked, under certain conditions, if CEIL(x) Least integer x
and only if the system error handler may treat the case DOUBLE(x) Result of converting x from floating-point
as a programming error. (An instruction may cause a single format to floating-point double for-
system error handler to be invoked under other condi- mat, using the model shown on page 140
tions as well; see Chapter 6 of Book III). DPD_TO_BCD(x)
The low-order 20 bits of x contain two
A formal description is given of the operation of each
declets which are converted to six, 4-bit
instruction. In addition, the operation of most instruc-
BCD fields; each set of six, 4-bit BCD
tions is described by a semiformal language at the reg-
fields is placed into the low-order 24 bits of
ister transfer level (RTL). This RTL uses the notation
the result. See Section B.2, “DPD-to-BCD
given below, in addition to the notation described in
Translation”.
Section 1.3.2. Some of this notation is also used in the
EXTS(x) Result of extending x on the left with sign
formal descriptions of instructions. RTL notation not
bits
summarized here should be self-explanatory.
FLOOR(x) Greatest integer x
The RTL descriptions cover the normal execution of the GPR(x) General Purpose Register x
instruction, except that “standard” setting of status reg- MASK(x, y) Mask having 1s in positions x through y
isters, such as the Condition Register, is not shown. (wrapping if x > y) and 0s elsewhere
6 Power ISA™ I
Version 3.0 B
MEM(x, y) Contents of a sequence of y bytes of stor- for For loop, indenting shows range. Clause
age. The sequence depends on the byte after “for” specifies the entities for which to
ordering used for storage access, as fol- execute the body of the loop.
lows. switch/case/default
Big-Endian byte ordering: switch/case/default statement, indenting
The sequence starts with the byte at shows range. The clause after “switch”
address x and ends with the byte at specifies the expression to evaluate. The
address x+y-1. clause after “case” specifies individual val-
Little-Endian byte ordering: ues for the expression, followed by a
The sequence starts with the byte at colon, followed by the actions that are
address x+y-1 and ends with the byte at taken if the evaluated expression has any
address x. of the specified values. “default” is
ROTL64(x, y) optional. If present, it must follow all the
Result of rotating the 64-bit value x left y “case” clauses. The clause after “default”
positions starts with a colon, and specifies the
ROTL32(x, y) actions that are taken if the evaluated
Result of rotating the 64-bit value x||x left y expression does not have any of the val-
positions, where x is 32 bits long ues specified in the preceding case state-
SINGLE(x) Result of converting x from floating-point ments.
double format to floating-point single for-
mat, using the model shown on page 144
SPR(x) Special Purpose Register x
TRAP Invoke the system trap handler
characterization
Reference to the setting of status bits, in a
standard way that is explained in the text
undefined An undefined value.
CIA Current Instruction Address, which is the
64-bit address of the instruction being
described by a sequence of RTL. Used by
relative branches to set the Next Instruc-
tion Address (NIA), and by Branch instruc-
tions with LK=1 to set the Link Register.
Does not correspond to any architected
register. The CIA is sometimes referred to
as the Program Counter (PC).
NIA Next Instruction Address, which is the
64-bit address of the next instruction to be
executed. For a successful branch, the
next instruction address is the branch tar-
get address: in RTL, this is indicated by
assigning a value to NIA. For other instruc-
tions that cause non-sequential instruction
fetching (see Book III), the RTL is similar.
For instructions that do not branch, and do
not otherwise cause instruction fetching to
be non-sequential, the next instruction
address is CIA+4. Does not correspond to
any architected register.
if... then... else...
Conditional execution, indenting shows
range; else is optional.
do Do loop, indenting shows range. “To” and/
or “by” clauses specify incrementing an
iteration variable, and a “while” clause
gives termination conditions.
leave Leave innermost do loop, or do loop
described in leave statement.
Chapter 1. Introduction 7
Version 3.0 B
The precedence rules for RTL operators are summa- 1.3.5 Phased-Out Facilities
rized in Table 1. Operators higher in the table are
applied before those lower in the table. Operators at
the same level in the table associate from left to right,
from right to left, or not at all, as shown. (For example, Phased-Out Facilities
- associates from left to right, so a-b-c = (a-b)-c.)
These are facilities and instructions that, in some
Parentheses are used to override the evaluation order
future version of the architecture, will be dropped
implied by the table or to increase clarity; parenthe-
out of the architecture. System developers should
sized expressions are evaluated before serving as
develop a migration plan to eliminate use of them
operands.
in new systems. These facilities are marked with a
[Phased-Out] marker.
Table 1: Operator precedence
Phased-Out facilities and instructions must be
Operators Associativity
implemented.
subscript, function evaluation left to right
pre-superscript (replication), right to left Programming Note
post-superscript (exponentiation) Warning: Instructions and facilities being phased
unary -, ¬ right to left out of the architecture are likely to perform poorly
, left to right on future implementations. New programs should
not use them.
+, -, left to right
|| left to right
=, , <, , >, ,<u, >u,? left to right
&, , left to right
| left to right
: (range) none
,iea none
8 Power ISA™ I
Version 3.0 B
Chapter 1. Introduction 9
Version 3.0 B
CR FPSCR
32 63 32 63
“Condition Register” on page 30 “Floating-Point Status and Control Register” on
page 124
LR
0 63
VR 0
“Link Register” on page 32
VR 1
CTR ...
0 63 ...
“Count Register” on page 32 VR 30
VR 31
GPR 0
0 127
GPR 1 “Vector Registers” on page 232
...
... VSCR
96 127
GPR 30 “Vector Status and Control Register” on page 232
GPR 31
0 63
VSR 0
“General Purpose Registers” on page 45
VSR 1
XER ...
0 63 ...
“Fixed-Point Exception Register” on page 45 VSR 62
VSR 63
VRSAVE 0 127
32 63
“VR Save Register” on page 233 “Vector-Scalar Registers” on page 364
FPR 0
FPR 1
...
...
FPR 30
FPR 31
0 63
“Floating-Point Registers” on page 124
10 Power ISA™ I
Version 3.0 B
Programming Note
Although instructions that set a 64-bit register affect
all 64 bits in both 32-bit and 64-bit modes, operat-
ing systems often do not preserve the upper 32-bits
of all registers across context switches done in
32-bit mode. For this reason, application programs
operating in 32-bit mode should not assume that
the upper 32 bits of the GPRs are preserved from
instruction to instruction unless the operating sys-
tem is known to preserve these bits.
Chapter 1. Introduction 11
Version 3.0 B
PO BF / L RA SI 1.6.9 MD-FORM
PO BF / L RA UI 0 6 11 16 21 27 3031
PO FRS RA D PO RS RA sh mb XO sh Rc
PO FRT RA D PO RS RA sh me XO sh Rc
PO RS RA D Figure 11. MD instruction format
PO RS RA UI
PO RT RA D 1.6.10 MDS-FORM
PO RT RA SI 0 6 11 16 21 25 27 31
PO TO RA SI PO RS RA RB mb XO Rc
Figure 5. D instruction format PO RS RA RB me XO Rc
0 6 11 16 3031 PO RT RA RB RC XO
PO FRSp RA DS XO PO VRT VRA VRB / SHB XO
PO FRTp RA DS XO PO VRT VRA VRB VRC XO
PO RS RA DS XO Figure 14. VA instruction format
PO RSp RA DS XO
PO RT RA DS XO 1.6.13 VC-FORM
PO VRS RA DS XO 0 6 11 16 2122 31
12 Power ISA™ I
Version 3.0 B
Chapter 1. Introduction 13
Version 3.0 B
0 6 7 8 9 10111213141516171819202122232425262728293031
PO RS RA RB XO 1
PO RS RA RB XO Rc
PO RSp RA RB XO 1
PO RT /// /// XO /
PO RT /// RB XO /
PO RT /// RB XO 1
PO RT /// L /// XO /
PO RT / SR /// XO /
PO RT RA FC XO /
PO RT RA NB XO /
PO RT RA RB XO /
PO RT RA RB XO EH
PO RTp RA RB XO EH
PO S RA /// XO SX
PO S RA RB XO SX
PO T EO IMM8 XO TX
PO T RA /// XO TX
PO T RA RB XO TX
PO TH RA RB XO /
PO TO RA SI XO 1
PO TO RA RB XO /
PO TO RA RB XO 1
PO VRS RA RB XO /
PO VRT EO VRB XO /
PO VRT EO VRB XO RO
PO VRT RA RB XO /
PO VRT VRA VRB XO /
PO VRT VRA VRB XO RO
14 Power ISA™ I
Version 3.0 B
PO RT spr XO / PO BF // A B XO AX BX /
PO RT tbr XO / PO T A B 0 DM XO AX BX TX
1.6.18 XL-FORM PO T A B XO AX BX TX
0 6 9 11 14 16 192021 31 Figure 24. XX3 instruction format
PO /// /// /// XO /
PO /// /// /// S XO / 1.6.23 XX4-FORM
PO BF // BFA // /// XO / 0 6 11 16 21 262728293031
PO BO BI /// BH XO LK PO T A B C XO CX AX BX TX
PO BT BA BB XO / Figure 25. XX4 instruction format
Figure 20. XL instruction format
1.6.24 Z22-FORM
1.6.19 XO-FORM 0 6 9 11 1516 22 31
PO RS RA sh XO sh Rc
Chapter 1. Introduction 15
Version 3.0 B
16 Power ISA™ I
Version 3.0 B
Chapter 1. Introduction 17
Version 3.0 B
18 Power ISA™ I
Version 3.0 B
MB (21:25) R (10)
Field used in M-form instructions to specify the first Field used by the tbegin. instruction to specify the
1-bit of a 64-bit mask, as described in start of a ROT.
Section 3.3.14, “Fixed-Point Rotate and Shift
Formats: X
Instructions” on page 101.
R (15)
Formats: M
Immediate field that specifies whether the RMC is
mb (21:26) specifying the primary or secondary encoding
Field used in MD-form and MDS-form instructions
Field used to specify whether to invalidate Radix
to specify the first 1-bit of a 64-bit mask, as
Tree or HPT entries for tlbie[l].
described in Section 3.3.14, “Fixed-Point Rotate
and Shift Instructions” on page 101. Formats: X, Z23
Formats: MD, MDS RA (11:15)
Field used to specify a GPR to be used as a
me (21:26)
source or as a target.
Field used in MD-form and MDS-form instructions
to specify the last 1-bit of a 64-bit mask, as Formats: A, D, DQ, DQE, DS, M, MD, MDS, TX,
described in Section 3.3.14, “Fixed-Point Rotate VA, VX, X, XO, XS
and Shift Instructions” on page 101.
RB (16:20)
Formats: MD, MDS Field used to specify a GPR to be used as a
source.
ME (26:30)
Field used in M-form instructions to specify the last Formats: A, M, MDS, VA, X, XO
1-bit of a 64-bit mask, as described in
Section 3.3.14, “Fixed-Point Rotate and Shift Rc (21)
Instructions” on page 101. RECORD bit.
OE (21) RC (21:25)
Field used by XO-form instructions to enable set- Field used to specify a GPR to be used as a
ting OV and SO in the XER. source.
Formats: XO Formats: VA
PO (0:5) Rc (31)
Primary opcode. RECORD bit.
Chapter 1. Introduction 19
Version 3.0 B
20 Power ISA™ I
Version 3.0 B
Chapter 1. Introduction 21
Version 3.0 B
XO (21:31) Defined
Extended opcode field. Illegal
Reserved
Formats: VX
The class is determined by examining the opcode, and
XO (22:30)
the extended opcode if any. If the opcode, or combina-
Extended opcode field.
tion of opcode and extended opcode, is not that of a
Formats: XO, XX3, Z22 defined instruction or a reserved instruction, the
instruction is illegal.
XO (22:31)
Extended opcode field.
Formats: VC
1.8.1 Defined Instruction Class
This class of instructions contains all the instructions
XO (23:30)
defined in this document.
Extended opcode field.
A defined instruction can have preferred and/or invalid
Formats: X, Z23
forms, as described in Section 1.9.1, “Preferred
XO (25:30) Instruction Forms” and Section 1.9.2, “Invalid Instruc-
Extended opcode field. tion Forms”.
Formats: TX
XO (26:27)
1.8.2 Illegal Instruction Class
Extended opcode field. This class of instructions contains the set of instructions
Formats: XX4 described in Appendix A of Book Appendices. Illegal
instructions are available for future extensions of the
XO (26:30) Power ISA ; that is, some future version of the Power
Extended opcode field. ISA may define any of these instructions to perform
new functions.
Formats: A, DX
Any attempt to execute an illegal instruction will cause
XO (26:31)
the system illegal instruction error handler to be
Extended opcode field.
invoked and will have no other effect.
Formats: VA
An instruction consisting entirely of binary 0s is guaran-
XO (27:29) teed always to be an illegal instruction. This increases
Extended opcode field. the probability that an attempt to execute data or unini-
tialized storage will result in the invocation of the sys-
Formats: MD
tem illegal instruction error handler.
XO (27:30)
Extended opcode field.
1.8.3 Reserved Instruction Class
Formats: MDS
This class of instructions contains the set of instructions
XO (29:31) described in Appendix B of Book Appendices.
Extended opcode field.
Reserved instructions are allocated to specific pur-
Formats: DQ poses that are outside the scope of the Power ISA.
XO (30) Any attempt to execute a reserved instruction will:
Extended opcode field.
perform the actions described by the implementa-
Formats: SC tion if the instruction is implemented; or
cause the system illegal instruction error handler to
XO (30:31)
be invoked if the instruction is not implemented.
Extended opcode field.
Formats: DQE, DS, SC
22 Power ISA™ I
Version 3.0 B
1.9 Forms of Defined Instruc- with existing implementations that may not support the
new function.
tions When a reserved-no-op instruction is executed, no
operation is performed.
1.9.1 Preferred Instruction Forms Reserved-no-op instructions are not assigned instruc-
Some of the defined instructions have preferred forms. tion names or mnemonics. There are no individual
For such an instruction, the preferred form will execute descriptions of reserved-no-op instructions in this docu-
in an efficient manner, but any other form may take sig- ment.
nificantly longer to execute than the preferred form.
Instructions having preferred forms are: 1.10 Exceptions
the Condition Register Logical instructions
There are two kinds of exception, those caused directly
the Load Quadword instruction
by the execution of an instruction and those caused by
the Move Assist instructions
an asynchronous event. In either case, the exception
the Or Immediate instruction (preferred form of
may cause one of several components of the system
no-op)
software to be invoked.
the Move To Condition Register Fields instruction
The exceptions that can be caused directly by the exe-
cution of an instruction include the following:
1.9.2 Invalid Instruction Forms
an attempt to execute an illegal instruction, or an
Some of the defined instructions can be coded in a attempt by an application program to execute a
form that is invalid. An instruction form is invalid if one “privileged” instruction (see Book III) (system ille-
or more fields of the instruction, excluding the opcode gal instruction error handler or system privileged
field(s), are coded incorrectly in a manner that can be instruction error handler)
deduced by examining only the instruction encoding.
the execution of a defined instruction using an
In general, any attempt to execute an invalid form of an invalid form (system illegal instruction error han-
instruction will either cause the system illegal instruc- dler or system privileged instruction error handler)
tion error handler to be invoked or yield boundedly
an attempt to execute an instruction that is not pro-
undefined results. Exceptions to this rule are stated in
vided by the implementation (system illegal
the instruction descriptions.
instruction error handler)
Some instruction forms are invalid because the instruc-
an attempt to access a storage location that is
tion contains a reserved value in a defined field (see
unavailable (system instruction storage error han-
Section 1.3.3 on page 5); these invalid forms are not
dler or system data storage error handler)
discussed further. All other invalid forms are identified
in the instruction descriptions. an attempt to access storage with an effective
address alignment that is invalid for the instruction
References to instructions elsewhere in this document (system alignment error handler)
assume the instruction form is not invalid, unless other-
wise stated or obvious from context. the execution of a System Call or System Call Vec-
tored instruction (system service program)
Assembler Note the execution of a Trap instruction that traps (sys-
Assemblers should report uses of invalid instruc- tem trap handler)
tion forms as errors.
the execution of a floating-point instruction that
causes a floating-point enabled exception to exist
(system floating-point enabled exception error
1.9.3 Reserved-no-op Instructions handler)
Reserved-no-op instructions include the following the execution of an auxiliary processor instruction
extended opcodes under primary opcode 31: 530, 562, that causes an auxiliary processor enabled excep-
594, 626, 658, 690, 722, and 754. tion to exist (system auxiliary processor enabled
exception error handler)
Reserved-no-op instructions are provided in the archi-
tecture to anticipate the eventual adoption of perfor- The exceptions that can be caused by an asynchro-
mance hint instructions to the architecture. For these nous event are described in Book III.
instructions, which cause no visible change to archi-
tected state, employing a reserved-no-op opcode will The invocation of the system error handler is precise,
allow software to use this new capability on new imple- except that the invocation of the auxiliary processor
mentations that support it while remaining compatible enabled exception error handler may be imprecise, and
Chapter 1. Introduction 23
Version 3.0 B
if one of the imprecise modes for invoking the system (Load/Store Multiple), or by an immediate field in the
floating-point enabled exception error handler is in instruction or the contents of a field in the XER (Move
effect (see page 133), then the invocation of the system Assist), as well as by the name of the instruction. For
floating-point enabled exception error handler may also example, the length of the storage operand of a Load
be imprecise. When the system error handler is Multiple Word instruction for which the specified target
invoked imprecisely, the excepting instruction does not register is GPR 20 is 48 bytes ((32-20)x4), and the
appear to complete before the next instruction starts length of the storage operand of a Load String Word
(because one of the effects of the excepting instruction, Immediate instruction for which the immediate field
namely the invocation of the system error handler, has contains the number 20 is 20 bytes.
not yet occurred).
The storage operand of a Load or Store instruction
Additional information about exception handling can be other than a Load/Store Multiple or Move Assist instruc-
found in Book III. tion is said to be aligned if the address of the storage
operand is an integral multiple of the storage operand
length; otherwise it is said to be unaligned. See the fol-
1.11 Storage Addressing lowing table. (The storage operand of a Load/Store
Multiple or Move Assist instruction is neither said to be
A program references storage using the effective aligned nor said to be unaligned. Its alignment proper-
address computed by the processor when it executes a ties are described, when necessary, using terms such
Storage Access or Branch instruction (or certain other as “word-aligned”, which are defined below.)
instructions described in Book II and Book III), or when
it fetches the next sequential instruction. Operand Length Addr60:63 if aligned
Bytes in storage are numbered consecutively starting Byte 8 bits xxxx
with 0. Each number is the address of the correspond- Halfword 2 bytes xxx0
ing byte. Word 4 bytes xx00
Doubleword 8 bytes x000
The byte ordering (Big-Endian or Little-Endian) for a
Quadword 16 bytes 0000
storage access is specified by the operating system.
This byte ordering is also referred to as the Endian Note: An “x” in an address bit position indicates that
mode and it applies to both data accesses and instruc- the bit can be 0 or 1 independent of the contents of
tion fetches. The Endian mode is specified by the LE other bits in the address.
mode bit (see Section 3.2.1 of Book III), which applies The concept of alignment is also applied more gener-
to all of storage. ally, to any datum in storage.
A datum having length that is an integral power of
1.11.1 Storage Operands 2 is said to be aligned if its address is an integral
multiple of its length.
A storage operand may be a byte, a halfword, a word, a A datum of any length is said to be half-
doubleword, or a quadword, or, for the Load/Store Mul- word-aligned (or aligned at a halfword boundary) if
tiple and Move Assist instructions, a sequence of bytes its address is an integral multiple of 2,
(Move Assist) or words (Load/Store Multiple). The word-aligned (or aligned at a word boundary) if its
address of a storage operand is the address of its first address is an integral multiple of 4, etc. (All data in
byte (i.e., of its lowest-numbered byte). An instruction storage is byte-aligned.)
for which the storage operand is a byte is said to cause
a byte access, and similarly for halfword, word, double- The concept of alignment can also be applied to data in
word, and quadword. registers, with the "address" of the datum interpreted as
the byte number of the datum in the register. E.g., a
The length of the storage operand is the number of word element (4 bytes) in a Vector Register is said to
bytes (of the storage operand) that the instruction be aligned if its byte number is an integral multiple of 4.
would access in the absence of invocations of the sys-
tem error handler. The length is generally implied by Programming Note
the name of the instruction (equivalently, by the The technical literature sometimes uses the term
opcode, and extended opcode if any). For example, the “naturally aligned” to mean “aligned.”
length of the storage operand of a Load Word and
Zero, Load Floating-Point Single, and Load Vector Ele- Versions of the architecture that precede Version
ment Word instruction is four bytes (one word), and the 2.07 also used “naturally aligned” as defined
length of a Store Quadword, Store Floating-Point Dou- above. The term was dropped from the architecture
ble Pair, and Store VSX Vector Word*4 instruction is 16 in Version 2.07 because it seemed to mean differ-
bytes (one quadword). The only exceptions are the ent things to different readers and is not needed.
Load/Store Multiple and Move Assist instructions, for
which the length of the storage operand is implied by
the identity of the specified source or target register
24 Power ISA™ I
Version 3.0 B
Some instructions require their storage operands to Figure 29 shows an example of a C language
have certain alignments. In addition, alignment may structure s containing an assortment of scalars and
affect performance. In general, the best performance is one character string. The value assumed to be in each
obtained when storage operands are aligned. structure element is shown in hex in the C comments;
these values are used below to show how the bytes
When a storage operand of length N bytes starting at making up each structure element are mapped into
effective address EA is copied between storage and a storage. It is assumed that structure s is compiled for
register that is R bytes long (i.e., the register contains 32-bit mode or for a 32-bit implementation. (This affects
bytes numbered from 0, most significant, through R-1, the length of the pointer to c.)
least significant), the bytes of the operand are placed C structure mapping rules permit the use of padding
into the register or into storage in a manner that (skipped bytes) in order to align the scalars on desir-
depends on the byte ordering for the storage access as able boundaries. Figures 30 and 31 show each scalar
shown in Figure 28, unless otherwise specified in the as aligned. This alignment introduces padding of four
instruction description. bytes between a and b, one byte between d and e, and
two bytes between e and f. The same amount of pad-
ding is present for both Big-Endian and Little-Endian
Big-Endian Byte Ordering mappings.
Load Store The Big-Endian mapping of structure s is shown in
for i=0 to N-1: for i=0 to N-1: Figure 30. Addresses are shown in hex at the left of
RT(R-N)+i MEM(EA+i,1) MEM(EA+i,1) (RS)(R-N)+i each doubleword, and in small figures below each byte.
Little-Endian Byte Ordering The contents of each byte, as indicated in the C exam-
ple in Figure 29, are shown in hex (as characters for
Load Store the elements of the string).
for i=0 to N-1: for i=0 to N-1:
The Little-Endian mapping of structure s is shown in
RT(R-1)-i MEM(EA+i,1) MEM(EA+i,1) (RS)(R-1)-i
Figure 31. Doublewords are shown laid out from right
Notes: to left, which is the common way of showing storage
1. In this table, subscripts refer to bytes in a register maps for processors that implement only Little-Endian
rather than to bits as defined in Section 1.3.2. byte ordering.
2. This table does not apply to the lvebx, lvehx,
lvewx, stvebx, stvehx, and stvewx instructions.
Figure 28. Storage operands and byte ordering
struct {
int a; /* 0x1112_1314 word */
double b; /* 0x2122_2324_2526_2728 doubleword */
char * c; /* 0x3132_3334 word */
char d[7]; /* ‘A’, ‘B’, ‘C’, ‘D’, ‘E’, ‘F’, ‘G’ array of bytes */
short e; /* 0x5152 halfword */
int f; /* 0x6162_6364 word */
} s;
Chapter 1. Introduction 25
Version 3.0 B
loop:
cmplwi r5,0
beq done
lwzux r4,r5,r6
add r7,r7,r4
subi r5,r5,4
b loop
done:
stw r7,total
Figure 33. Assembly language program ‘p’
The Big-Endian mapping of program p is shown in
Figure 34 (assuming the program starts at address 0).
26 Power ISA™ I
Version 3.0 B
Programming Note
The terms Big-Endian and Little-Endian come from forbidden, and the whole Party rendered incapable
Part I, Chapter 4, of Jonathan Swift’s Gulliver’s Travels. by Law of holding Employments. During the
Here is the complete passage, from the edition printed Course of these Troubles, the Emperors of Ble-
in 1734 by George Faulkner in Dublin. fuscu did frequently expostulate by their Ambassa-
dors, accusing us of making a Schism in Religion,
... our Histories of six Thousand Moons make no
by offending against a fundamental Doctrine of our
Mention of any other Regions, than the two great
great Prophet Lustrog, in the fifty-fourth Chapter of
Empires of Lilliput and Blefuscu. Which two mighty
the Brundrecal, (which is their Alcoran.) This, how-
Powers have, as I was going to tell you, been
ever, is thought to be a mere Strain upon the text:
engaged in a most obstinate War for six and thirty
For the Words are these; That all true Believers
Moons past. It began upon the following Occasion.
shall break their Eggs at the convenient End: and
It is allowed on all Hands, that the primitive Way of
which is the convenient End, seems, in my humble
breaking Eggs before we eat them, was upon the
Opinion, to be left to every Man’s Conscience, or
larger End: But his present Majesty’s Grand-father,
at least in the Power of the chief Magistrate to
while he was a Boy, going to eat an Egg, and
determine. Now the Big-Endian Exiles have found
breaking it according to the ancient Practice, hap-
so much Credit in the Emperor of Blefuscu’s Court;
pened to cut one of his Fingers. Whereupon the
and so much private Assistance and Encourage-
Emperor his Father, published an Edict, command-
ment from their Party here at home, that a bloody
ing all his Subjects, upon great Penalties, to break
War has been carried on between the two Empires
the smaller End of their Eggs. The People so
for six and thirty Moons with various Success;
highly resented this Law, that our Histories tell us,
during which Time we have lost Forty Capital
there have been six Rebellions raised on that
Ships, and a much greater Number of smaller Ves-
Account; wherein one Emperor lost his Life, and
sels, together with thirty thousand of our best Sea-
another his Crown. These civil Commotions were
men and Soldiers; and the Damage received by
constantly fomented by the Monarchs of Blefuscu;
the Enemy is reckoned to be somewhat greater
and when they were quelled, the Exiles always fled
than ours. However, they have now equipped a
for Refuge to that Empire. It is computed that
numerous Fleet, and are just preparing to make a
eleven Thousand Persons have, at several Times,
Descent upon us: and his Imperial Majesty, placing
suffered Death, rather than submit to break their
great Confidence in your Valour and Strength, hath
Eggs at the smaller End. Many hundred large Vol-
commanded me to lay this Account of his Affairs
umes have been published upon this Controversy:
before you.
But the Books of the Big-Endians have been long
1.11.3 Effective Address Calcula- tic wraps around from the maximum address, 264 - 1,
to address 0, except that if the current instruction is at
tion effective address 264 - 4 the effective address of the
next sequential instruction is undefined.
An effective address is computed by the processor
when executing a Storage Access or Branch instruction In 32-bit mode, the low-order 32 bits of the 64-bit result,
(or certain other instructions described in Book II and preceded by 32 0 bits, comprise the 64-bit effective
Book III) when fetching the next sequential instruction, address for the purpose of addressing storage, except
or when invoking a system error handler. The following that if the current instruction is at effective address
provides an overview of this process. More detail is 232- 4 the 64-bit effective address of the next sequen-
provided in the individual instruction descriptions. tial instruction is undefined. Thus, as used to address
storage, the effective address arithmetic appears to
Effective address calculations, for both data and
wrap around from the maximum address 232-1, to
instruction accesses, use 64-bit two’s complement
address 0, except when the resulting 64-bit effective
addition. All 64 bits of each address component partici-
address is undefined as just described. When an effec-
pate in the calculation regardless of mode (32-bit or
tive address is placed into a register by an instruction
64-bit). In this computation one operand is an address
or event, the value placed into the register is as follows.
(which is by definition an unsigned number) and the
Register RA when set by Load with Update and
second is a signed offset. Carries out of the most signif-
Store with Update instructions: the entire 64-bit
icant bit are ignored.
result.
In 64-bit mode, the entire 64-bit result comprises the All other cases (e.g., the Link Register when set by
64-bit effective address. The effective address arithme- Branch instructions having LK=1, Special Purpose
Chapter 1. Introduction 27
Version 3.0 B
Registers when set to an effective address by invo- sign-extended to form a 64-bit address compo-
cation of a system error handler): the low-order 32 nent. If AA=0, this address component is added to
bits of the 64-bit result preceded by 32 0 bits, the address of the Branch instruction to form the
except that if the intended effective address is that effective address of the target instruction. If AA=1,
of the NIA of the instruction at effective address this address component is the effective address of
232-4 the value placed into the register is unde- the target instruction.
fined. With XL-form Branch instructions, bits 0:61 of the
Link Register or the Count Register are concate-
RA is a field in the instruction which specifies an
nated on the right with 0b00 to form the effective
address component in the computation of an effective
address of the target instruction.
address. A zero in the RA field indicates the absence of
the corresponding address component. A value of zero With sequential instruction fetching, the value 4 is
is substituted for the absent component of the effective added to the address of the current instruction to
address computation. This substitution is shown in the form the effective address of the next instruction,
instruction descriptions as (RA|0). except that if the current instruction is at the maxi-
mum instruction effective address for the mode
Effective addresses are computed as follows. In the
(264 - 4 in 64-bit mode, 232 - 4 in 32-bit mode) the
descriptions below, it should be understood that “the
effective address of the next sequential instruction
contents of a GPR” refers to the entire 64-bit contents,
is undefined.
independent of mode, but that in 32-bit mode only bits
32:63 of the 64-bit result of the computation are used to If the size of the operand of a Storage Access instruc-
address storage. tion is more than one byte, the effective address for
each byte after the first is computed by adding 1 to the
With X-form instructions, in computing the effective
effective address of the preceding byte.
address of a data element, the contents of the
GPR designated by RB (or the value zero for lswi
and stswi) are added to the contents of the GPR
designated by RA or to zero if RA=0 or RA is not
used in forming the EA.
With D-form instructions, the 16-bit D field is
sign-extended to form a 64-bit address compo-
nent. In computing the effective address of a data
element, this address component is added to the
contents of the GPR designated by RA or to zero if
RA=0.
With DS-form instructions, the 14-bit DS field is
concatenated on the right with 0b00 and
sign-extended to form a 64-bit address compo-
nent. In computing the effective address of a data
element, this address component is added to the
contents of the GPR designated by RA or to zero if
RA=0.
With DQ-form instructions, the 12-bit DQ field is
concatenated on the right with 0b0000 and
sign-extended to form a 64-bit address compo-
nent. In computing the effective address of a data
element, this address component is added to the
contents of the GPR designated by RA or to zero if
RA=0.
With I-form Branch instructions, the 24-bit LI field
is concatenated on the right with 0b00 and
sign-extended to form a 64-bit address compo-
nent. If AA=0, this address component is added to
the address of the Branch instruction to form the
effective address of the target instruction. If AA=1,
this address component is the effective address of
the target instruction.
With B-form Branch instructions, the 14-bit BD field
is concatenated on the right with 0b00 and
28 Power ISA™ I
Version 3.0 B
2.1 Branch Facility Overview respect to setting exception bits and (if the excep-
tion is enabled) invoking the system error handler.
This chapter describes the registers and instructions A Store instruction modifies one or more bytes in
that make up the Branch Facility. an area of storage that contains instructions that
will subsequently be executed. Before an instruc-
tion in that area of storage is executed, software
2.2 Instruction Execution Order synchronization is required to ensure that the
instructions executed are consistent with the
In general, instructions appear to execute sequentially, results produced by the Store instruction.
in the order in which they appear in storage. The
exceptions to this rule are listed below. Programming Note
Branch instructions for which the branch is taken This software synchronization will generally be
cause execution to continue at the target address provided by system library programs (see
specified by the Branch instruction. Section 1.9 of Book II). Application programs
Trap instructions for which the trap conditions are should call the appropriate system library pro-
satisfied, and System Call and System Call Vec- gram before attempting to execute modified
tored instructions, cause the appropriate system instructions.
handler to be invoked.
Transaction failure will eventually cause the trans-
action’s failure handler, implied by the tbegin.
instruction, to be invoked. See the programming
note following the tbegin. description in
Section 5.5 of Book II.
Event-based exceptions can cause the
event-based branch handler to be invoked, as
described in Chapter 7 of Book II.
Exceptions can cause the system error handler to
be invoked, as described in Section 1.10, “Excep-
tions” on page 23.
Returning from a system service program, system
trap handler, or system error handler causes exe-
cution to continue at a specified address.
The model of program execution in which the proces-
sor appears to execute one instruction at a time, com-
pleting each instruction before beginning to execute the
next instruction is called the “sequential execution
model”. In general, the processor obeys the sequential
execution model. For the instructions and facilities
defined in this Book, the only exceptions to this rule are
the following.
A floating-point exception occurs when the proces-
sor is running in one of the Imprecise floating-point
exception modes (see Section 4.4). The instruction
that causes the exception need not complete
before the next instruction begins execution, with
2.3 Branch Facility Registers The bits of CR Field 0 are interpreted as follows.
30 Power ISA™ I
Version 3.0 B
The paste. instruction (see Section 4.4, “Copy-Paste SI, UI, or (RB). For floating-point Compare
Facility”, in Book II) and the stbcx., sthcx., stwcx., instructions, (FRA) = (FRB).
stdcx., and stqcx. instructions (see Section 4.6.2,
3 Summary Overflow, Floating-Point Unor-
“Load and Reserve and Store Conditional Instructions”,
dered (SO,FU)
in Book II) also set CR Field 0.
For fixed-point Compare instructions, this is a
For all floating-point instructions in which Rc=1, CR copy of the contents of XERSO at the comple-
Field 1 (bits 36:39 of the Condition Register) is set to tion of the instruction. For floating-point Com-
the Floating-Point exception status, copied from bits pare instructions, one or both of (FRA) and
32:35 of the Floating-Point Status and Control Register. (FRB) is a NaN.
This occurs regardless of whether any exceptions are
The Vector Integer Compare instructions (see
enabled, and regardless of whether the writing of the
Section 6.9.3, “Vector Integer Compare Instructions”)
result is suppressed (see Section 4.4, “Floating-Point
compare two Vector Registers element by element,
Exceptions” on page 132). These bits are interpreted
interpreting the elements as unsigned or signed inte-
as follows.
gers depending on the instruction, and set the corre-
sponding element of the target Vector Register to all 1s
Bit Description
if the relation being tested is true and 0s if the relation
32 Floating-Point Exception Summary (FX) being tested is false.
This is a copy of the contents of FPSCRFX at
If Rc=1, CR Field 6 is set to reflect the result of the
the completion of the instruction.
comparison, as follows
33 Floating-Point Enabled Exception Sum-
mary (FEX) Bit Description
This is a copy of the contents of FPSCRFEX at
0 The relation is true for all element pairs (i.e.,
the completion of the instruction.
VRT is set to all 1s).
34 Floating-Point Invalid Operation Excep-
1 0
tion Summary (VX)
This is a copy of the contents of FPSCRVX at 2 The relation is false for all element pairs (i.e.,
the completion of the instruction. VRT is set to all 0s).
35 Floating-Point Overflow Exception (OX) 3 0
This is a copy of the contents of FPSCROX at
The Vector Floating-Point Compare instructions com-
the completion of the instruction.
pare two Vector Registers word element by word ele-
For Compare instructions, a specified CR field is set to ment, interpreting the elements as single-precision
reflect the result of the comparison. The bits of the floating-point numbers. With the exception of the Vector
specified CR field are interpreted as follows. A com- Compare Bounds Floating-Point instruction, they set
plete description of how the bits are set is given in the the target Vector Register, and CR Field 6 if Rc=1, in
instruction descriptions in Section 3.3.10, “Fixed-Point the same manner as do the Vector Integer Compare
Compare Instructions” on page 84, and Section 4.6.8, instructions.
“Floating-Point Compare Instructions” on page 167.
Bit Description
Bit Description
0 The relation is true for all element pairs (i.e.,
0 Less Than, Floating-Point Less Than (LT, VRT is set to all 1s).
FL)
1 0
For fixed-point Compare instructions, (RA) <
SI or (RB) (signed comparison) or (RA) <u UI 2 The relation is false for all element pairs (i.e.,
or (RB) (unsigned comparison). For float- VRT is set to all 0s).
ing-point Compare instructions, (FRA) < 3 0
(FRB).
The Vector Compare Bounds Floating-Point instruction
1 Greater Than, Floating-Point Greater Than
on page 328 sets CR Field 6 if Rc=1, to indicate
(GT, FG)
whether the elements in VRA are within the bounds
For fixed-point Compare instructions, (RA) >
specified by the corresponding element in VRB, as
SI or (RB) (signed comparison) or (RA) >u UI
explained in the instruction description. A single-preci-
or (RB) (unsigned comparison). For float-
sion floating-point value x is said to be “within the
ing-point Compare instructions, (FRA) >
bounds” specified by a single-precision floating-point
(FRB).
value y if -y x y.
2 Equal, Floating-Point Equal (EQ, FE)
For fixed-point Compare instructions, (RA) =
Bit Description
0 0
1 0
2 Set to indicate whether all four elements in <BHRB material moved to Chapter 8 of Book II.>
VRA are within the bounds specified by the
corresponding element in VRB, otherwise set
to 0.
3 0
LR
0 63
CTR
0 63
Efffective Address
0 62
Programming Note
The TAR is reserved for system software.
32 Power ISA™ I
Version 3.0 B
BH Hint
00 bclr[l]: The instruction is a subroutine
return
bcctr[l] and bctar[l]:The instruction is not a
subroutine return; the target
address is likely to be the same as
the target address used the pre-
ceding time the branch was taken
01 bclr[l]: The instruction is not a subroutine
return; the target address is likely to
be the same as the target address
used the preceding time the branch
was taken
bcctr[l] and bctar[l]:Reserved
10 Reserved
11 bclr[l], bcctr[l], and bctar[l]: The target
address is not predictable
Figure 42. BH field encodings
Programming Note
The hint provided by the BH field is independent of
the hint provided by the “at” bits (e.g., the BH field
provides no indication of whether the branch is
likely to be taken).
Programming Note
The hints provided by the “at” bits and by the BH
field do not affect the results of executing the
instruction.
The “z” bits should be set to 0, because they may
be assigned a meaning in some future version of
the architecture.
34 Power ISA™ I
Version 3.0 B
Programming Note
Many implementations have dynamic mechanisms for branch to, and use a bcctr instruction (LK=0, and
predicting the target addresses of bclr[l] and bcctr[l] BH=0b11 if appropriate) to branch to the selected
instructions. These mechanisms may cache return address.
addresses (i.e., Link Register values set by Branch
Direct subroutine linkage:
instructions for which LK=1 and for which the branch
Here A calls B and B returns to A. The two
was taken, other than the special form shown in the first
branches should be as follows.
example below) and recently used branch target
- A calls B: use a bl or bcl instruction (LK=1).
addresses. To obtain the best performance across the
- B returns to A: use a bclr instruction (LK=0)
widest range of implementations, the programmer
(the return address is in, or can be restored to,
should obey the following rules.
the Link Register).
Use Branch instructions for which LK=1 only as Indirect subroutine linkage:
subroutine calls (including function calls, etc.), or in Here A calls Glue, Glue calls B, and B returns to A
the special form shown in the first example below. rather than to Glue. (Such a calling sequence is
Pair each subroutine call (i.e., each Branch common in linkage code used when the subroutine
instruction for which LK=1 and the branch is taken, that the programmer wants to call, here B, is in a
other than the special form shown in the first different module from the caller; the Binder inserts
example below) with a bclr instruction that returns “glue” code to mediate the branch.) The three
from the subroutine and has BH=0b00. branches should be as follows.
Do not use bclrl as a subroutine call. (Some imple- - A calls Glue: use a bl or bcl instruction
mentations access the return address cache at (LK=1).
most once per instruction; such implementations - Glue calls B: place the address of B into the
are likely to treat bclrl as a subroutine return, and Count Register, and use a bcctr instruction
not as a subroutine call.) (LK=0).
For bclr[l] and bcctr[l], use the appropriate value - B returns to A: use a bclr instruction (LK=0)
in the BH field. (the return address is in, or can be restored to,
The following are examples of programming conven- the Link Register).
tions that obey these rules. In the examples, BH is
assumed to contain 0b00 unless otherwise stated. In Function call:
addition, the “at” bits are assumed to be coded appro- Here A calls a function, the identity of which may
priately. vary from one instance of the call to another,
instead of calling a specific program B. This case
Let A, B, and Glue be specific programs. should be handled using the conventions of the
Obtaining the address of the next instruction: preceding two bullets, depending on whether the
Use the following form of Branch and Link. call is direct or indirect, with the following differ-
bcl 20,31,$+4 ences.
Loop counts: - If the call is direct, place the address of the
Keep them in the Count Register, and use a bc function into the Count Register, and use a
instruction (LK=0) to decrement the count and to bcctrl instruction (LK=1) instead of a bl or bcl
branch back to the beginning of the loop if the dec- instruction.
remented count is nonzero. - For the bcctr[l] instruction that branches to
the function, use BH=0b11 if appropriate.
Computed goto’s, case statements, etc.:
Use the Count Register to hold the address to
Compatibility Note
The bits corresponding to the current “a” and “t”
bits, and to the current “z” bits except in the “branch
always” BO encoding, had different meanings in
versions of the architecture that precede Version
2.00.
The bit corresponding to the “t” bit was called
the “y” bit. The “y” bit indicated whether to use
the architected default prediction (y=0) or to
use the complement of the default prediction
(y=1). The default prediction was defined as
follows.
- If the instruction is bc[l][a] with a negative
value in the displacement field, the branch
is taken. (This is the only case in which
the prediction corresponding to the “y” bit
differs from the prediction corresponding
to the “t” bit.)
- In all other cases (bc[l][a] with a nonnega-
tive value in the displacement field, bclr[l],
or bcctr[l]), the branch is not taken.
The BO encodings that test both the Count
Register and the Condition Register had a “y”
bit in place of the current “z” bit. The meaning
of the “y” bit was as described in the preceding
item.
The “a” bit was a “z” bit.
Because these bits have always been defined
either to be ignored or to be treated as hints, a
given program will produce the same result on any
implementation regardless of the values of the bits.
Also, because even the “y” bit is ignored, in prac-
tice, by most processors that comply with versions
of the architecture that precede Version 2.00, the
performance of a given program on those proces-
sors will not be affected by the values of the bits.
36 Power ISA™ I
Version 3.0 B
18 LI AA LK 16 BO BI BD AA LK
0 6 30 31 0 6 11 16 30 31
Programming Note
bclr, bclrl, bcctr, and bcctrl each serve as both a
basic and an extended mnemonic. The Assembler
will recognize a bclr, bclrl, bcctr, or bcctrl mne-
monic with three operands as the basic form, and a
bclr, bclrl, bcctr, or bcctrl mnemonic with two
operands as the extended form. In the extended
form the BH operand is omitted and assumed to be
0b00.
38 Power ISA™ I
Version 3.0 B
19 BO BI /// BH 560 LK
0 6 11 16 19 21 31
if (64-bit mode)
then M 0
else M 32
if ¬BO2 then CTR CTR - 1
ctr_ok BO2 | ((CTRM:63 0) BO3
cond_ok BO0 | (CRBI+32 BO1)
if ctr_ok & cond_ok then NIA iea TAR0:61 || 0b00
if LK then LR iea CIA + 4
BI+32 specifies the Condition Register bit to be tested.
The BO field is used to resolve the branch as described
in Figure 40. The BH field is used as described in
Figure 42. The branch target address is
TAR0:61 || 0b00, with the high-order 32 bits of the
branch target address set to 0 in 32-bit mode.
If LK=1 then the effective address of the instruction fol-
lowing the Branch instruction is placed into the Link
Register.
Special Registers Altered:
CTR (if BO2=0)
LR (if LK=1)
Programming Note
In some systems, the system software will restrict
usage of the bctar[l] instruction to only selected
programs. If an attempt is made to execute the
instruction when it is not available, the system error
handler will be invoked. See Book III for additional
information.
19 BT BA BB 257 / 19 BT BA BB 225 /
0 6 11 16 21 31 0 6 11 16 21 31
19 BT BA BB 449 / 19 BT BA BB 193 /
0 6 11 16 21 31 0 6 11 16 21 31
40 Power ISA™ I
Version 3.0 B
19 BT BA BB 33 / 19 BT BA BB 289 /
0 6 11 16 21 31 0 6 11 16 21 31
19 BT BA BB 129 / 19 BT BA BB 417 /
0 6 11 16 21 31 0 6 11 16 21 31
19 BF // BFA // /// 0 /
0 6 9 11 14 16 21 31
CR4BF+32:4BF+35 CR4BFA+32:4BFA+35
The contents of Condition Register field BFA are copied
to Condition Register field BF.
Special Registers Altered:
CR field BF
Programming Note
sc serves as both a basic and an extended mne-
monic. The Assembler will recognize an sc mne-
monic with one operand as the basic form, and an
sc mnemonic with no operand as the extended
form. In the extended form the LEV operand is
omitted and assumed to be 0.
In application programs the value of the LEV oper-
and for sc should be 0.
42 Power ISA™ I
Version 3.0 B
44 Power ISA™ I
Version 3.0 B
3.2.1 General Purpose Registers The bits are set based on the operation of an instruc-
tion considered as a whole, not on intermediate results
All manipulation of information is done in registers (e.g., the Subtract From Carrying instruction, the result
internal to the Fixed-Point Facility. The principal storage of which is specified as the sum of three values, sets
internal to the Fixed-Point Facility is a set of 32 General bits in the Fixed-Point Exception Register based on the
Purpose Registers (GPRs). See Figure 43. entire operation, not on an intermediate sum).
46 Power ISA™ I
Version 3.0 B
Programming Note
In some implementations, the Load Algebraic and
Load with Update instructions may have greater
latency than other types of Load instructions. More-
over, Load with Update instructions may take lon-
ger to execute in some implementations than the
corresponding pair of a non-update Load instruc-
tion and an Add instruction.
Load Byte and Zero D-form Load Byte and Zero Indexed X-form
lbz RT,D(RA) lbzx RT,RA,RB
34 RT RA D 31 RT RA RB 87 /
0 6 11 16 31 0 6 11 16 21 31
if RA = 0 then b 0 if RA = 0 then b 0
else b (RA) else b (RA)
EA b + EXTS(D) EA b + (RB)
RT 560 || MEM(EA, 1) RT 560 || MEM(EA, 1)
Let the effective address (EA) be the sum (RA|0)+ D. Let the effective address (EA) be the sum
The byte in storage addressed by EA is loaded into (RA|0)+ (RB). The byte in storage addressed by EA is
RT56:63. RT0:55 are set to 0. loaded into RT56:63. RT0:55 are set to 0.
Special Registers Altered: Special Registers Altered:
None None
Load Byte and Zero with Update D-form Load Byte and Zero with Update Indexed
X-form
lbzu RT,D(RA)
lbzux RT,RA,RB
35 RT RA D
0 6 11 16 31 31 RT RA RB 119 /
0 6 11 16 21 31
EA (RA) + EXTS(D)
RT 560 || MEM(EA, 1) EA (RA) + (RB)
RA EA RT 560 || MEM(EA, 1)
RA EA
Let the effective address (EA) be the sum (RA)+ D. The
byte in storage addressed by EA is loaded into RT56:63. Let the effective address (EA) be the sum (RA)+ (RB).
RT0:55 are set to 0. The byte in storage addressed by EA is loaded into
RT56:63. RT0:55 are set to 0.
EA is placed into register RA.
EA is placed into register RA.
If RA=0 or RA=RT, the instruction form is invalid.
If RA=0 or RA=RT, the instruction form is invalid.
Special Registers Altered:
None Special Registers Altered:
None
48 Power ISA™ I
Version 3.0 B
Load Halfword and Zero D-form Load Halfword and Zero Indexed X-form
lhz RT,D(RA) lhzx RT,RA,RB
40 RT RA D 31 RT RA RB 279 /
0 6 11 16 31 0 6 11 16 21 31
if RA = 0 then b 0 if RA = 0 then b 0
else b (RA) else b (RA)
EA b + EXTS(D) EA b + (RB)
RT 480 || MEM(EA, 2) RT 480 || MEM(EA, 2)
Let the effective address (EA) be the sum (RA|0)+ D. Let the effective address (EA) be the sum
The halfword in storage addressed by EA is loaded into (RA|0)+ (RB). The halfword in storage addressed by
RT48:63. RT0:47 are set to 0. EA is loaded into RT48:63. RT0:47 are set to 0.
Special Registers Altered: Special Registers Altered:
None None
Load Halfword and Zero with Update Load Halfword and Zero with Update
D-form Indexed X-form
lhzu RT,D(RA) lhzux RT,RA,RB
41 RT RA D 31 RT RA RB 311 /
0 6 11 16 31 0 6 11 16 21 31
42 RT RA D 31 RT RA RB 343 /
0 6 11 16 31 0 6 11 16 21 31
if RA = 0 then b 0 if RA = 0 then b 0
else b (RA) else b (RA)
EA b + EXTS(D) EA b + (RB)
RT EXTS(MEM(EA, 2)) RT EXTS(MEM(EA, 2))
Let the effective address (EA) be the sum (RA|0)+ D. Let the effective address (EA) be the sum
The halfword in storage addressed by EA is loaded into (RA|0)+ (RB). The halfword in storage addressed by
RT48:63. RT0:47 are filled with a copy of bit 0 of the EA is loaded into RT48:63. RT0:47 are filled with a copy
loaded halfword. of bit 0 of the loaded halfword.
Special Registers Altered: Special Registers Altered:
None None
Load Halfword Algebraic with Update Load Halfword Algebraic with Update
D-form Indexed X-form
lhau RT,D(RA) lhaux RT,RA,RB
43 RT RA D 31 RT RA RB 375 /
0 6 11 16 31 0 6 11 16 21 31
50 Power ISA™ I
Version 3.0 B
Load Word and Zero D-form Load Word and Zero Indexed X-form
lwz RT,D(RA) lwzx RT,RA,RB
32 RT RA D 31 RT RA RB 23 /
0 6 11 16 31 0 6 11 16 21 31
if RA = 0 then b 0 if RA = 0 then b 0
else b (RA) else b (RA)
EA b + EXTS(D) EA b + (RB)
RT 320 || MEM(EA, 4) RT 320 || MEM(EA, 4)
Let the effective address (EA) be the sum (RA|0)+ D. Let the effective address (EA) be the sum
The word in storage addressed by EA is loaded into (RA|0)+ (RB). The word in storage addressed by EA is
RT32:63. RT0:31 are set to 0. loaded into RT32:63. RT0:31 are set to 0.
Special Registers Altered: Special Registers Altered:
None None
Load Word and Zero with Update D-form Load Word and Zero with Update Indexed
X-form
lwzu RT,D(RA)
lwzux RT,RA,RB
33 RT RA D
0 6 11 16 31 31 RT RA RB 55 /
0 6 11 16 21 31
EA (RA) + EXTS(D)
RT 320 || MEM(EA, 4) EA (RA) + (RB)
RA EA RT 320 || MEM(EA, 4)
RA EA
Let the effective address (EA) be the sum (RA)+ D. The
word in storage addressed by EA is loaded into Let the effective address (EA) be the sum (RA)+ (RB).
RT32:63. RT0:31 are set to 0. The word in storage addressed by EA is loaded into
RT32:63. RT0:31 are set to 0.
EA is placed into register RA.
EA is placed into register RA.
If RA=0 or RA=RT, the instruction form is invalid.
If RA=0 or RA=RT, the instruction form is invalid.
Special Registers Altered:
None Special Registers Altered:
None
58 RT RA DS 2 31 RT RA RB 341 /
0 6 11 16 30 31 0 6 11 16 21 31
if RA = 0 then b 0 if RA = 0 then b 0
else b (RA) else b (RA)
EA b + EXTS(DS || 0b00) EA b + (RB)
RT EXTS(MEM(EA, 4)) RT EXTS(MEM(EA, 4))
Let the effective address (EA) be the sum Let the effective address (EA) be the sum
(RA|0)+ (DS||0b00). The word in storage addressed by (RA|0)+ (RB). The word in storage addressed by EA is
EA is loaded into RT32:63. RT0:31 are filled with a copy loaded into RT32:63. RT0:31 are filled with a copy of bit 0
of bit 0 of the loaded word. of the loaded word.
Special Registers Altered: Special Registers Altered:
None None
31 RT RA RB 373 /
0 6 11 16 21 31
EA (RA) + (RB)
RT EXTS(MEM(EA, 4))
RA EA
Let the effective address (EA) be the sum (RA)+ (RB).
The word in storage addressed by EA is loaded into
RT32:63. RT0:31 are filled with a copy of bit 0 of the
loaded word.
EA is placed into register RA.
If RA=0 or RA=RT, the instruction form is invalid.
Special Registers Altered:
None
52 Power ISA™ I
Version 3.0 B
58 RT RA DS 0 31 RT RA RB 21 /
0 6 11 16 30 31 0 6 11 16 21 31
if RA = 0 then b 0 if RA = 0 then b 0
else b (RA) else b (RA)
EA b + EXTS(DS || 0b00) EA b + (RB)
RT MEM(EA, 8) RT MEM(EA, 8)
Let the effective address (EA) be the sum Let the effective address (EA) be the sum
(RA|0)+ (DS||0b00). The doubleword in storage (RA|0)+ (RB). The doubleword in storage addressed by
addressed by EA is loaded into RT. EA is loaded into RT.
Special Registers Altered: Special Registers Altered:
None None
Load Doubleword with Update DS-form Load Doubleword with Update Indexed
X-form
ldu RT,DS(RA)
ldux RT,RA,RB
58 RT RA DS 1
0 6 11 16 30 31 31 RT RA RB 53 /
0 6 11 16 21 31
38 RS RA D 31 RS RA RB 215 /
0 6 11 16 31 0 6 11 16 21 31
if RA = 0 then b 0 if RA = 0 then b 0
else b (RA) else b (RA)
EA b + EXTS(D) EA b + (RB)
MEM(EA, 1) (RS)56:63 MEM(EA, 1) (RS)56:63
Let the effective address (EA) be the sum (RA|0)+ D. Let the effective address (EA) be the sum
(RS)56:63 are stored into the byte in storage addressed (RA|0)+ (RB). (RS)56:63 are stored into the byte in stor-
by EA. age addressed by EA.
Special Registers Altered: Special Registers Altered:
None None
Store Byte with Update D-form Store Byte with Update Indexed X-form
stbu RS,D(RA) stbux RS,RA,RB
39 RS RA D 31 RS RA RB 247 /
0 6 11 16 31 0 6 11 16 21 31
54 Power ISA™ I
Version 3.0 B
44 RS RA D 31 RS RA RB 407 /
0 6 11 16 31 0 6 11 16 21 31
if RA = 0 then b 0 if RA = 0 then b 0
else b (RA) else b (RA)
EA b + EXTS(D) EA b + (RB)
MEM(EA, 2) (RS)48:63 MEM(EA, 2) (RS)48:63
Let the effective address (EA) be the sum (RA|0)+ D. Let the effective address (EA) be the sum
(RS)48:63 are stored into the halfword in storage (RA|0)+ (RB). (RS)48:63 are stored into the halfword in
addressed by EA. storage addressed by EA.
Special Registers Altered: Special Registers Altered:
None None
Store Halfword with Update D-form Store Halfword with Update Indexed
X-form
sthu RS,D(RA)
sthux RS,RA,RB
45 RS RA D
0 6 11 16 31 31 RS RA RB 439 /
0 6 11 16 21 31
EA (RA) + EXTS(D)
MEM(EA, 2) (RS)48:63 EA (RA) + (RB)
RA EA MEM(EA, 2) (RS)48:63
RA EA
Let the effective address (EA) be the sum (RA)+ D.
(RS)48:63 are stored into the halfword in storage Let the effective address (EA) be the sum (RA)+ (RB).
addressed by EA. (RS)48:63 are stored into the halfword in storage
addressed by EA.
EA is placed into register RA.
EA is placed into register RA.
If RA=0, the instruction form is invalid.
If RA=0, the instruction form is invalid.
Special Registers Altered:
None Special Registers Altered:
None
36 RS RA D 31 RS RA RB 151 /
0 6 11 16 31 0 6 11 16 21 31
if RA = 0 then b 0 if RA = 0 then b 0
else b (RA) else b (RA)
EA b + EXTS(D) EA b + (RB)
MEM(EA, 4) (RS)32:63 MEM(EA, 4) (RS)32:63
Let the effective address (EA) be the sum (RA|0)+ D. Let the effective address (EA) be the sum
(RS)32:63 are stored into the word in storage addressed (RA|0)+ (RB). (RS)32:63 are stored into the word in stor-
by EA. age addressed by EA.
Special Registers Altered: Special Registers Altered:
None None
Store Word with Update D-form Store Word with Update Indexed X-form
stwu RS,D(RA) stwux RS,RA,RB
37 RS RA D 31 RS RA RB 183 /
0 6 11 16 31 0 6 11 16 21 31
56 Power ISA™ I
Version 3.0 B
62 RS RA DS 0 31 RS RA RB 149 /
0 6 11 16 30 31 0 6 11 16 21 31
if RA = 0 then b 0 if RA = 0 then b 0
else b (RA) else b (RA)
EA b + EXTS(DS || 0b00) EA b + (RB)
MEM(EA, 8) (RS) MEM(EA, 8) (RS)
Let the effective address (EA) be the sum Let the effective address (EA) be the sum
(RA|0)+ (DS||0b00). (RS) is stored into the doubleword (RA|0)+ (RB). (RS) is stored into the doubleword in
in storage addressed by EA. storage addressed by EA.
Special Registers Altered: Special Registers Altered:
None None
Store Doubleword with Update DS-form Store Doubleword with Update Indexed
X-form
stdu RS,DS(RA)
stdux RS,RA,RB
62 RS RA DS 1
0 6 11 16 30 31 31 RS RA RB 181 /
0 6 11 16 21 31
if RA = 0 then b 0
else b (RA)
EA b + EXTS(DQ || 0b0000)
RTp MEM(EA, 16)
Let the effective address (EA) be the sum (RA|0)+
(DQ||0b0000). The quadword in storage addressed by
EA is loaded into register pair RTp.
If RTp is odd or RTp=RA, the instruction form is invalid.
If RTp=RA, an attempt to execute this instruction will
invoke the system illegal instruction error handler. (The
RTp=RA case includes the case of RTp=RA=0.)
The quadword in storage addressed by EA is loaded
into an even-odd pair of GPRs as follows. In
Big-Endian mode, the even-numbered GPR is loaded
with the doubleword from storage addressed by EA
and the odd-numbered GPR is loaded with the double-
word addressed by EA+8. In Little-Endian mode, the
even-numbered GPR is loaded with the byte-reversed
doubleword from storage addressed by EA+8 and the
odd-numbered GPR is loaded with the byte-reversed
doubleword addressed by EA.
58 Power ISA™ I
Version 3.0 B
62 RSp RA DS 2
0 6 11 16 30 31
if RA = 0 then b 0
else b (RA)
EA b + EXTS(DS || 0b00)
MEM(EA, 16) RSp
Let the effective address (EA) be the sum (RA|0)+
(DS||0b00). The contents of register pair RSp are
stored into the quadword in storage addressed by EA.
If RSp is odd, the instruction form is invalid.
The contents of an even-odd pair of GPRs is stored into
the quadword in storage addressed by EA as follows.
In Big-Endian mode, the even-numbered GPR is stored
into the doubleword in storage addressed by EA and
the odd-numbered GPR is stored into the doubleword
addressed by EA+8. In Little-Endian mode, the
even-numbered GPR is stored byte-reversed into the
doubleword in storage addressed by EA+8 and the
odd-numbered GPR is stored byte-reversed into the
doubleword addressed by EA.
Programming Note
In versions of the architecture prior to V. 2.07, this
instruction was privileged.
31 RT RA RB 790 / 31 RS RA RB 918 /
0 6 11 16 21 31 0 6 11 16 21 31
if RA = 0 then b 0 if RA = 0 then b 0
else b (RA) else b (RA)
EA b + (RB) EA b + (RB)
load_data MEM(EA, 2) MEM(EA, 2) (RS)56:63 || (RS)48:55
RT 480 || load_data8:15 || load_data0:7
Let the effective address (EA) be the sum
Let the effective address (EA) be the sum (RA|0)+(RB). (RA|0)+ (RB). (RS)56:63 are stored into bits 0:7 of the
Bits 0:7 of the halfword in storage addressed by EA are halfword in storage addressed by EA. (RS)48:55 are
loaded into RT56:63. Bits 8:15 of the halfword in storage stored into bits 8:15 of the halfword in storage
addressed by EA are loaded into RT48:55. RT0:47 are addressed by EA.
set to 0.
Special Registers Altered:
Special Registers Altered: None
None
Load Word Byte-Reverse Indexed X-form Store Word Byte-Reverse Indexed X-form
lwbrx RT,RA,RB stwbrx RS,RA,RB
31 RT RA RB 534 / 31 RS RA RB 662 /
0 6 11 16 21 31 0 6 11 16 21 31
if RA = 0 then b 0 if RA = 0 then b 0
else b (RA) else b (RA)
EA b + (RB) EA b + (RB)
load_data MEM(EA, 4) MEM(EA, 4) (RS)56:63 || (RS)48:55 || (RS)40:47
RT 320 || load_data24:31 || load_data16:23 ||(RS)32:39
|| load_data8:15 || load_data0:7
Let the effective address (EA) be the sum
Let the effective address (EA) be the sum (RA|0)+ (RB). (RS)56:63 are stored into bits 0:7 of the
(RA|0)+ (RB). Bits 0:7 of the word in storage addressed word in storage addressed by EA. (RS)48:55 are stored
by EA are loaded into RT56:63. Bits 8:15 of the word in into bits 8:15 of the word in storage addressed by EA.
storage addressed by EA are loaded into RT48:55. Bits (RS)40:47 are stored into bits 16:23 of the word in stor-
16:23 of the word in storage addressed by EA are age addressed by EA. (RS)32:39 are stored into bits
loaded into RT40:47. Bits 24:31 of the word in storage 24:31 of the word in storage addressed by EA.
addressed by EA are loaded into RT32:39. RT0:31 are
set to 0. Special Registers Altered:
None
Special Registers Altered:
None
60 Power ISA™ I
Version 3.0 B
31 RT RA RB 532 / 31 RS RA RB 660 /
0 6 11 16 21 31 0 6 11 16 21 31
if RA = 0 then b 0 if RA = 0 then b 0
else b (RA) else b (RA)
EA b + (RB) EA b + (RB)
load_data MEM(EA, 8) MEM(EA, 8) (RS)56:63 || (RS)48:55
RT load_data56:63 || load_data48:55 || (RS)40:47 || (RS)32:39
|| load_data40:47 || load_data32:39 || (RS)24:31 || (RS)16:23
|| load_data24:31 || load_data16:23 || (RS)8:15 || (RS)0:7
|| load_data8:15 || load_data0:7
Let the effective address (EA) be the sum
Let the effective address (EA) be the sum (RA|0)+(RB). (RA|0)+ (RB). (RS)56:63 are stored into bits 0:7 of the
Bits 0:7 of the doubleword in storage addressed by EA doubleword in storage addressed by EA. (RS)48:55 are
are loaded into RT56:63. Bits 8:15 of the doubleword in stored into bits 8:15 of the doubleword in storage
storage addressed by EA are loaded into RT48:55. Bits addressed by EA. (RS)40:47 are stored into bits 16:23 of
16:23 of the doubleword in storage addressed by EA the doubleword in storage addressed by EA. (RS)32:39
are loaded into RT40:47. Bits 24:31 of the doubleword in are stored into bits 23:31 of the doubleword in storage
storage addressed by EA are loaded into RT32:39. Bits addressed by EA. (RS)24:31 are stored into bits 32:39 of
32:39 of the doubleword in storage addressed by EA the doubleword in storage addressed by EA. (RS)16:23
are loaded into RT24:31. Bits 40:47 of the doubleword in are stored into bits 40:47 of the doubleword in storage
storage addressed by EA are loaded into RT16:23. Bits addressed by EA. (RS)8:15 are stored into bits 48:55 of
48:55 of the doubleword in storage addressed by EA the doubleword in storage addressed by EA. (RS)0:7
are loaded into RT8:15. Bits 56:63 of the doubleword in are stored into bits 56:63 of the doubleword in storage
storage addressed by EA are loaded into RT0:7. addressed by EA.
Special Registers Altered: Special Registers Altered:
None None
46 RT RA D 47 RS RA D
0 6 11 16 31 0 6 11 16 31
if RA = 0 then b 0 if RA = 0 then b 0
else b (RA) else b (RA)
EA b + EXTS(D) EA b + EXTS(D)
r RT r RS
do while r 31 do while r 31
GPR(r) 320 || MEM(EA, 4) MEM(EA, 4) GPR(r)32:63
r r + 1 r r + 1
EA EA + 4 EA EA + 4
Let n = (32-RT). Let the effective address (EA) be the Let n = (32-RS). Let the effective address (EA) be the
sum (RA|0)+ D. sum (RA|0)+ D.
n consecutive words starting at EA are loaded into the n consecutive words starting at EA are stored from the
low-order 32 bits of GPRs RT through 31. The low-order 32 bits of GPRs RS through 31.
high-order 32 bits of these GPRs are set to zero.
This instruction is not supported in Little-Endian mode.
If RA is in the range of registers to be loaded, including If it is executed in Little-Endian mode, the system align-
the case in which RA=0, the instruction form is invalid. ment error handler is invoked.
This instruction is not supported in Little-Endian mode. Special Registers Altered:
If it is executed in Little-Endian mode, the system align- None
ment error handler is invoked.
Special Registers Altered:
None
62 Power ISA™ I
Version 3.0 B
Load String Word Immediate X-form Load String Word Indexed X-form
lswi RT,RA,NB lswx RT,RA,RB
31 RT RA NB 597 / 31 RT RA RB 533 /
0 6 11 16 21 31 0 6 11 16 21 31
if RA = 0 then EA 0 if RA = 0 then b 0
else EA (RA) else b (RA)
if NB = 0 then n 32 EA b + (RB)
else n NB n XER57:63
r RT - 1 r RT - 1
i 32 i 32
do while n > 0 RT undefined
if i = 32 then do while n > 0
r r + 1 (mod 32) if i = 32 then
GPR(r) 0 r r + 1 (mod 32)
GPR(r)i:i+7 MEM(EA, 1) GPR(r) 0
i i + 8 GPR(r)i:i+7 MEM(EA, 1)
if i = 64 then i 32 i i + 8
EA EA + 1 if i = 64 then i 32
n n - 1 EA EA + 1
n n - 1
Let the effective address (EA) be (RA|0). Let n = NB if
NB0, n = 32 if NB=0; n is the number of bytes to load. Let the effective address (EA) be the sum
Let nr=CEIL(n/4); nr is the number of registers to (RA|0)+ (RB). Let n=XER57:63; n is the number of bytes
receive data. to load. Let nr=CEIL(n/4); nr is the number of registers
to receive data.
n consecutive bytes starting at EA are loaded into
GPRs RT through RT+nr-1. Data are loaded into the If n>0, n consecutive bytes starting at EA are loaded
low-order four bytes of each GPR; the high-order four into GPRs RT through RT+nr-1. Data are loaded into
bytes are set to 0. the low-order four bytes of each GPR; the high-order
four bytes are set to 0.
Bytes are loaded left to right in each register. The
sequence of registers wraps around to GPR 0 if Bytes are loaded left to right in each register. The
required. If the low-order four bytes of register RT+nr-1 sequence of registers wraps around to GPR 0 if
are only partially filled, the unfilled low-order byte(s) of required. If the low-order four bytes of register RT+nr-1
that register are set to 0. are only partially filled, the unfilled low-order byte(s) of
that register are set to 0.
If RA is in the range of registers to be loaded, including
the case in which RA=0, the instruction form is invalid. If n=0, the contents of register RT are undefined.
This instruction is not supported in Little-Endian mode. If RA or RB is in the range of registers to be loaded,
If it is executed in Little-Endian mode, the system align- including the case in which RA=0, the instruction is
ment error handler is invoked. treated as if the instruction form were invalid. If RT=RA
or RT=RB, the instruction form is invalid.
Special Registers Altered:
None This instruction is not supported in Little-Endian mode.
If it is executed in Little-Endian mode and n>0, the sys-
tem alignment error handler is invoked.
Special Registers Altered:
None
64 Power ISA™ I
Version 3.0 B
Store String Word Immediate X-form Store String Word Indexed X-form
stswi RS,RA,NB stswx RS,RA,RB
31 RS RA NB 725 / 31 RS RA RB 661 /
0 6 11 16 21 31 0 6 11 16 21 31
if RA = 0 then EA 0 if RA = 0 then b 0
else EA (RA) else b (RA)
if NB = 0 then n 32 EA b + (RB)
else n NB n XER57:63
r RS - 1 r RS - 1
i 32 i 32
do while n > 0 do while n > 0
if i = 32 then r r + 1 (mod 32) if i = 32 then r r + 1 (mod 32)
MEM(EA, 1) GPR(r)i:i+7 MEM(EA, 1) GPR(r)i:i+7
i i + 8 i i + 8
if i = 64 then i 32 if i = 64 then i 32
EA EA + 1 EA EA + 1
n n - 1 n n - 1
Let the effective address (EA) be (RA|0). Let n = NB if Let the effective address (EA) be the sum
NB0, n = 32 if NB=0; n is the number of bytes to store. (RA|0)+ (RB). Let n = XER57:63; n is the number of
Let nr =CEIL(n/4); nr is the number of registers to sup- bytes to store. Let nr = CEIL(n/4); nr is the number of
ply data. registers to supply data.
n consecutive bytes starting at EA are stored from If n>0, n consecutive bytes starting at EA are stored
GPRs RS through RS+nr-1. Data are stored from the from GPRs RS through RS+nr-1. Data are stored from
low-order four bytes of each GPR. the low-order four bytes of each GPR.
Bytes are stored left to right from each register. The Bytes are stored left to right from each register. The
sequence of registers wraps around to GPR 0 if sequence of registers wraps around to GPR 0 if
required. required.
This instruction is not supported in Little-Endian mode. If n=0, no bytes are stored.
If it is executed in Little-Endian mode, the system align-
This instruction is not supported in Little-Endian mode.
ment error handler is invoked.
If it is executed in Little-Endian mode and n>0, the sys-
Special Registers Altered: tem alignment error handler is invoked.
None
Special Registers Altered:
None
66 Power ISA™ I
Version 3.0 B
14 RT RA SI 15 RT RA SI
0 6 11 16 31 0 6 11 16 31
19 RT d1 d0 2 d2
D d0||d1||d2
RT NIA + EXTS(D || 160)
The sum of NIA + (D || 0x0000) is placed into register
RT.
68 Power ISA™ I
Version 3.0 B
31 RT RA RB OE 266 Rc 31 RT RA RB OE 40 Rc
0 6 11 16 21 22 31 0 6 11 16 21 22 31
RT (RA) + EXTS(SI)
RT (RA) + EXTS(SI)
The sum (RA) + SI is placed into register RT.
The sum (RA) + SI is placed into register RT.
Special Registers Altered:
CA CA32 Special Registers Altered:
CR0 CA CA32
Extended Mnemonics:
Extended Mnemonics:
Example of extended mnemonics for Add Immediate
Carrying: Example of extended mnemonics for Add Immediate
Carrying and Record:
Extended: Equivalent to:
subic Rx,Ry,value addic Rx,Ry,-value Extended: Equivalent to:
subic. Rx,Ry,value addic. Rx,Ry,-value
8 RT RA SI
0 6 11 16 31
RT ¬(RA) + EXTS(SI) + 1
The sum ¬(RA) + SI + 1 is placed into register RT.
Special Registers Altered:
CA CA32
31 RT RA RB OE 10 Rc 31 RT RA RB OE 8 Rc
0 6 11 16 21 22 31 0 6 11 16 21 22 31
70 Power ISA™ I
Version 3.0 B
31 RT RA RB OE 138 Rc 31 RT RA RB OE 136 Rc
0 6 11 16 21 22 31 0 6 11 16 21 22 31
Add to Minus One Extended XO-form Subtract From Minus One Extended
XO-form
addme RT,RA (OE=0 Rc=0)
addme. RT,RA (OE=0 Rc=1) subfme RT,RA (OE=0 Rc=0)
addmeo RT,RA (OE=1 Rc=0) subfme. RT,RA (OE=0 Rc=1)
addmeo. RT,RA (OE=1 Rc=1) subfmeo RT,RA (OE=1 Rc=0)
subfmeo. RT,RA (OE=1 Rc=1)
31 RT RA /// OE 234 Rc
0 6 11 16 21 22 31 31 RT RA /// OE 232 Rc
0 6 11 16 21 22 31
RT (RA) + CA - 1
RT ¬(RA) + CA - 1
The sum (RA) + CA + 641 is placed into register RT.
The sum ¬(RA) + CA + 641 is placed into register RT.
Special Registers Altered:
CA CA32 Special Registers Altered:
CR0 (if Rc=1) CA CA32
SO OV OV32 (if OE=1) CR0 (if Rc=1)
SO OV OV32 (if OE=1)
Add Extended using alternate carry bit Subtract From Zero Extended XO-form
Z23-form
subfze RT,RA (OE=0 Rc=0)
addex RT,RA,RB,CY
subfze. RT,RA (OE=0 Rc=1)
31 RT RA RB CY 170 / subfzeo RT,RA (OE=1 Rc=0)
0 6 11 16 21 23 31 subfzeo. RT,RA (OE=1 Rc=1)
RT (RA) + CA RT ¬(RA) + 1
The sum (RA) + CA is placed into register RT. The sum ¬(RA) + 1 is placed into register RT.
Special Registers Altered: If the processor is in 64-bit mode and register RA con-
CA CA32 tains the most negative 64-bit number (0x8000_
CR0 (if Rc=1) 0000_0000_0000), the result is the most negative num-
SO OV OV32 (if OE=1) ber and, if OE=1, OV and OV32 are set to 1. Similarly, if
the processor is in 32-bit mode and (RA)32:63 contain
the most negative 32-bit number (0x8000_0000), the
low-order 32 bits of the result contain the most negative
32-bit number and, if OE=1, OV and OV32 are set to 1.
Special Registers Altered:
CR0 (if Rc=1)
SO OV OV32 (if OE=1)
72 Power ISA™ I
Version 3.0 B
31 RT RA RB OE 235 Rc 31 RT RA RB / 11 Rc
0 6 11 16 21 22 31 0 6 11 16 21 22 31
31 RT RA RB OE 491 Rc 31 RT RA RB OE 459 Rc
0 6 11 16 21 22 31 0 6 11 16 21 22 31
74 Power ISA™ I
Version 3.0 B
31 RT RA RB OE 427 Rc 31 RT RA RB OE 395 Rc
0 6 11 16 21 22 31 0 6 11 16 21 22 31
Programming Note
76 Power ISA™ I
Version 3.0 B
31 RT RA RB 779 / 31 RT RA RB 267 /
0 6 11 16 21 31 0 6 11 16 21 31
L Format
0 320
|| CRN0:31
1 CRN0:63
2 RRN0:63
3 reserved
Format above is for non-error conditions.
0xFFFFFFFF_FFFFFFFF for error conditions.
CRN = conditioned random number
RRN = raw random number
Programming Note
32-bit software running in an environment that does
not preserve the high-order 32 bits of GPRs across
invocations of the system error handler, signal han-
dlers, event-based branch handlers, etc. may use
the L=0 variant of darn and interpret the value
0xFFFFFFFF to indicate an error condition. The
fact that the error condition includes the valid value
0x00000000_FFFFFFFF together with the true
error value 0xFFFFFFFF_FFFFFFFF is not a prob-
lem.
Programming Note
When the error value is obtained, software is
expected to repeat the operation. If a non-error
value has not been obtained after several attempts,
a software random number generation method
should be used. The recommended number of
attempts may be implementation specific. In the
absence of other guidance, ten attempts should be
adequate.
78 Power ISA™ I
Version 3.0 B
4 RT RA RB RC 51
0 6 11 16 21 26 31
80 Power ISA™ I
Version 3.0 B
31 RT RA RB OE 489 Rc 31 RT RA RB OE 457 Rc
0 6 11 16 21 22 31 0 6 11 16 21 22 31
Programming Note
Unsigned long division of a 128-bit dividend con-
tained in two 64-bit registers by a 64-bit divisor can
be accomplished using the technique described in
the Programming Note with the divweu instruction
description: divd[e]u would be used instead of
divw[e]u (and cmpld instead of cmplw, etc.).
82 Power ISA™ I
Version 3.0 B
31 RT RA RB 777 / 31 RT RA RB 265 /
0 6 11 16 21 31 0 6 11 16 21 31
The 64-bit dividend is (RA). The 64-bit divisor is (RB). The 64-bit dividend is (RA). The 64-bit divisor is (RB).
The 64-bit remainder is placed into register RT. The The 64-bit remainder is placed into register RT. The
quotient is not supplied as a result. quotient is not supplied as a result.
Both operands and the remainder are interpreted as Both operands and the remainder are interpreted as
signed integers. The remainder is the unique signed unsigned integers. The remainder is the unique signed
integer that satisfies integer that satisfies
where 0 remainder < |divisor| if the dividend is where 0 remainder < divisor.
nonnegative, and -|divisor| < remainder 0 if the
dividend is negative. If an attempt is made to perform any of the divisions
84 Power ISA™ I
Version 3.0 B
11 BF / L RA SI 31 BF / L RA RB 0 /
0 6 9 10 11 16 31 0 6 9 10 11 16 21 31
10 BF / L RA UI 31 BF / L RA RB 32 /
0 6 9 10 11 16 31 0 6 9 10 11 16 21 31
86 Power ISA™ I
Version 3.0 B
31 BF // RA RB 224 /
0 6 9 11 16 21 31
src1 GPR[RA].bit[56:63]
CR4×BF+32 0b0
CR4×BF+33 match
CR4×BF+34 0b0
CR4×BF+35 0b0
Programming Note
cmpeqb is useful for implementing character
typing functions such as isspace() that are
implemented by comparing the character to 1 or
more values.
88 Power ISA™ I
Version 3.0 B
3 TO RA SI 31 TO RA RB 4 /
0 6 11 16 31 0 6 11 16 21 31
a EXTS((RA)32:63) a EXTS((RA)32:63)
if (a < EXTS(SI)) & TO0 then TRAP b EXTS((RB)32:63)
if (a > EXTS(SI)) & TO1 then TRAP if (a < b) & TO0 then TRAP
if (a = EXTS(SI)) & TO2 then TRAP if (a > b) & TO1 then TRAP
if (a <u EXTS(SI)) & TO3 then TRAP if (a = b) & TO2 then TRAP
if (a >u EXTS(SI)) & TO4 then TRAP if (a <u b) & TO3 then TRAP
if (a >u b) & TO4 then TRAP
The contents of RA32:63 are compared with the
sign-extended value of the SI field. If any bit in the TO The contents of RA32:63 are compared with the con-
field is set to 1 and its corresponding condition is met tents of RB32:63. If any bit in the TO field is set to 1 and
by the result of the comparison, the system trap han- its corresponding condition is met by the result of the
dler is invoked. comparison, the system trap handler is invoked.
If the trap conditions are met, this instruction is context If the trap conditions are met, this instruction is context
synchronizing (see Book III). synchronizing (see Book III).
Special Registers Altered: Special Registers Altered:
None None
Extended Mnemonics: Extended Mnemonics:
Examples of extended mnemonics for Trap Word Examples of extended mnemonics for Trap Word:
Immediate:
Extended: Equivalent to:
Extended: Equivalent to: tweq Rx,Ry tw 4,Rx,Ry
twgti Rx,value twi 8,Rx,value twlge Rx,Ry tw 5,Rx,Ry
twllei Rx,value twi 6,Rx,value trap tw 31,0,0
90 Power ISA™ I
Version 3.0 B
28 RS RA UI 24 RS RA UI
0 6 11 16 31 0 6 11 16 31
29 RS RA UI Extended Mnemonics:
0 6 11 16 31
Example of extended mnemonics for OR Immediate:
92 Power ISA™ I
Version 3.0 B
25 RS RA UI 27 RS RA UI
0 6 11 16 31 0 6 11 16 31
26 RS RA UI
0 6 11 16 31
xori 0,0,0
Special Registers Altered:
None
Extended Mnemonics:
Example of extended mnemonics for XOR Immediate:
Programming Note
The executed form of no-op should be used only
when the intent is to alter the timing of a program.
31 RS RA RB 28 Rc 31 RS RA RB 444 Rc
0 6 11 16 21 31 0 6 11 16 21 31
31 RS RA RB 316 Rc
0 6 11 16 21 31
RA (RS) (RB)
The contents of register RS are XORed with the con-
tents of register RB and the result is placed into register
RA.
Special Registers Altered:
CR0 (if Rc=1)
NAND X-form
nand RA,RS,RB (Rc=0)
nand. RA,RS,RB (Rc=1)
31 RS RA RB 476 Rc
0 6 11 16 21 31
Programming Note
nand or nor with RS=RB can be used to obtain the
one’s complement.
94 Power ISA™ I
Version 3.0 B
31 RS RA RB 124 Rc 31 RS RA RB 284 Rc
0 6 11 16 21 31 0 6 11 16 21 31
31 RS RA RB 60 Rc 31 RS RA RB 412 Rc
0 6 11 16 21 31 0 6 11 16 21 31
s (RS)56 s (RS)48
RA56:63 (RS)56:63 RA48:63 (RS)48:63
RA0:55 56s RA0:47 48s
(RS)56:63 are placed into RA56:63. RA0:55 are filled with (RS)48:63 are placed into RA48:63. RA0:47 are filled with
a copy of (RS)56. a copy of (RS)48.
Special Registers Altered: Special Registers Altered:
CR0 (if Rc=1) CR0 (if Rc=1)
Count Leading Zeros Word X-form Count Trailing Zeros Word X-form
cntlzw RA,RS (Rc=0) cnttzw RA,RS (Rc=0)
cntlzw. RA,RS (Rc=1) cnttzw. RA,RS (Rc=1)
n 32 n 0
RA n - 32 RA EXTZ64(n)
A count of the number of consecutive zero bits starting A count of the number of consecutive zero bits starting
at bit 32 of register RS is placed into register RA. This at bit 63 of the rightmost word of register RS is placed
number ranges from 0 to 32, inclusive. into register RA. This number ranges from 0 to 32,
inclusive.
If Rc is equal to 1, CR field 0 is set to reflect the result.
If Rc is equal to 1, CR field 0 is set to reflect the result.
Special Registers Altered:
CR0 (if Rc=1) Special Registers Altered:
CR0 (if Rc=1)
Programming Note
For both Count Leading Zeros instructions, if Rc=1
then LT is set to 0 in CR Field 0.
96 Power ISA™ I
Version 3.0 B
do n = 0 to 7 do i = 0 to 7
if RS8n:8n+7 = (RB)8n:8n+7 then n 0
RA8n:8n+7 81 do j = 0 to 7
else if (RS)(i8)+j = 1 then
RA8n:8n+7 80 n n+1
Each byte of the contents of register RS is compared to RA(i8):(i8)+7 n
each corresponding byte of the contents in register RB. A count of the number of one bits in each byte of regis-
If they are equal, the corresponding byte in RA is set to ter RS is placed into the corresponding byte of register
0xFF. Otherwise the corresponding byte in RA is set to RA. This number ranges from 0 to 8, inclusive.
0x00.
Special Registers Altered:
Special Registers Altered: None
None
Population Count Words X-form
popcntw RA, RS
31 RS RA /// 378 /
0 6 11 16 21 31
do i = 0 to 1
n 0
do j = 0 to 31
if (RS)(i32)+j = 1 then
n n+1
RA(i32):(i32)+31 n
A count of the number of one bits in each word of regis-
ter RS is placed into the corresponding word of register
RA. This number ranges from 0 to 32, inclusive.
Special Registers Altered:
None
s 0 s 0
do i = 0 to 7 t 0
s s / (RS)i%8+7 do i = 0 to 3
RA 630 || s s s / (RS)i%8+7
do i = 4 to 7
The least significant bit in each byte of the contents of t t / (RS)i%8+7
register RS is examined. If there is an odd number of RA0:31 310 || s
one bits the value 1 is placed into register RA; other- RA32:63 310 || t
wise the value 0 is placed into register RA.
The least significant bit in each byte of (RS)0:31 is
Special Registers Altered: examined. If there is an odd number of one bits the
None value 1 is placed into RA0:31; otherwise the value 0 is
placed into RA0:31. The least significant bit in each byte
of (RS)32:63 is examined. If there is an odd number of
one bits the value 1 is placed into RA32:63; otherwise
the value 0 is placed into RA32:63.
Special Registers Altered:
None
Programming Note
The Parity instructions are designed to be used in
conjunction with the Population Count instruction to
compute the parity of words or a doubleword. The
parity of the upper and lower words in (RS) can be
computed as follows.
popcntb RA, RS
prtyw RA, RA
The parity of (RS) can be computed as follows.
popcntb RA, RS
prtyd RA, RA
98 Power ISA™ I
Version 3.0 B
Count Leading Zeros Doubleword X-form Count Trailing Zeros Doubleword X-form
cntlzd RA,RS (Rc=0) cnttzd RA,RS (Rc=0)
cntlzd. RA,RS (Rc=1) cnttzd. RA,RS (Rc=1)
n 0 n 0
do while n < 64 do while n < 64
if (RS)n = 1 then leave if (RS)63-n = 0b1 then leave
n n + 1 n n + 1
RA n RA EXTZ64(n)
A count of the number of consecutive zero bits starting A count of the number of consecutive zero bits starting
at bit 0 of register RS is placed into register RA. This at bit 63 of register RS is placed into register RA. This
number ranges from 0 to 64, inclusive. number ranges from 0 to 64, inclusive.
If Rc=1, CR Field 0 is set to reflect the result. If Rc is equal to 1, CR field 0 is set to reflect the result.
Special Registers Altered:
Special Registers Altered:
CR0 (if Rc=1)
CR0 (if Rc=1)
31 RS RA RB 252 /
0 6 11 16 21 31
For i = 0 to 7
index (RS)8*i:8*i+7
If index < 64
then permi (RB)index
else permi 0
RA 560 || perm0:7
Eight permuted bits are produced. For each permuted
bit i where i ranges from 0 to 7 and for each byte i of
RS, do the following.
If byte i of RS is less than 64, permuted bit i is set
to the bit of RB specified by byte i of RS; otherwise
permuted bit i is set to 0.
The permuted bits are placed in the least-significant
byte of RA, and the remaining bits are filled with 0s.
Special Registers Altered:
None
Programming Note
The fact that the permuted bit is 0 if the corre-
sponding index value exceeds 63 permits the per-
muted bits to be selected from a 128-bit quantity,
using a single index register. For example, assume
that the 128-bit quantity Q, from which the per-
muted bits are to be selected, is in registers r2
(high-order 64 bits of Q) and r3 (low-order 64 bits of
Q), that the index values are in register r1, with
each byte of r1 containing a value in the range
0:127, and that each byte of register r4 contains the
value 64. The following code sequence selects
eight permuted bits from Q and places them into
the low-order byte of r6.
bpermd r6,r1,r2 # select from high-
order half of Q
xor r0,r1,r4 # adjust index values
bpermd r5,r0,r3 # select from low-
order half of Q
or r6,r6,r5 # merge the two
selections
Rotate Left Word then AND with Mask Rotate Left Word Immediate then Mask
M-form Insert M-form
rlwnm RA,RS,RB,MB,ME (Rc=0) rlwimi RA,RS,SH,MB,ME (Rc=0)
rlwnm. RA,RS,RB,MB,ME (Rc=1) rlwimi. RA,RS,SH,MB,ME (Rc=1)
23 RS RA RB MB ME Rc 20 RS RA SH MB ME Rc
0 6 11 16 21 26 31 0 6 11 16 21 26 31
n (RB)59:63 n SH
r ROTL32((RS)32:63, n) r ROTL32((RS)32:63, n)
m MASK(MB+32, ME+32) m MASK(MB+32, ME+32)
RA r & m RA r&m | (RA)&¬m
The contents of register RS are rotated32 left the num- The contents of register RS are rotated32 left SH bits. A
ber of bits specified by (RB)59:63. A mask is generated mask is generated having 1-bits from bit MB+32
having 1-bits from bit MB+32 through bit ME+32 and through bit ME+32 and 0-bits elsewhere. The rotated
0-bits elsewhere. The rotated data are ANDed with the data are inserted into register RA under control of the
generated mask and the result is placed into register generated mask.
RA.
Special Registers Altered:
Special Registers Altered: CR0 (if Rc=1)
CR0 (if Rc=1)
Extended Mnemonics:
Extended Mnemonics:
Example of extended mnemonics for Rotate Left Word
Example of extended mnemonics for Rotate Left Word Immediate then Mask Insert:
then AND with Mask:
Extended: Equivalent to:
Extended: Equivalent to: inslwi Rx,Ry,n,b rlwimi Rx,Ry,32-b,b,b+n-1
rotlw Rx,Ry,Rz rlwnm Rx,Ry,Rz,0,31
Programming Note
Programming Note Let RAL represent the low-order 32 bits of register
Let RSL represent the low-order 32 bits of register RA, with the bits numbered from 0 through 31.
RS, with the bits numbered from 0 through 31.
rlwimi can be used to insert an n-bit field that is
rlwnm can be used to extract an n-bit field that left-justified in the low-order 32 bits of register RS,
starts at variable bit position b in RSL, right-justified into RAL starting at bit position b, by setting
into the low-order 32 bits of register RA (clearing SH=32-b, MB=b, and ME=(b+n)-1. It can be used
the remaining 32-n bits of the low-order 32 bits of to insert an n-bit field that is right-justified in the
RA), by setting RB59:63=b+n, MB=32-n, and low-order 32 bits of register RS, into RAL starting at
ME=31. It can be used to extract an n-bit field that bit position b, by setting SH=32-(b+n), MB=b, and
starts at variable bit position b in RSL, left-justified ME=(b+n)-1.
into the low-order 32 bits of register RA (clearing
Extended mnemonics are provided for both of
the remaining 32-n bits of the low-order 32 bits of
these uses; see Appendix C, “Assembler Extended
RA), by setting RB59:63=b, MB = 0, and ME=n-1. It
Mnemonics” on page 791.
can be used to rotate the contents of the low-order
32 bits of a register left (right) by variable n bits, by
setting RB59:63=n (32-n), MB=0, and ME=31.
For all the uses given above, the high-order 32 bits
of register RA are cleared.
Extended mnemonics are provided for some of
these uses; see Appendix C, “Assembler Extended
Mnemonics” on page 791.
Rotate Left Doubleword Immediate then Rotate Left Doubleword Immediate then
Clear Left MD-form Clear Right MD-form
rldicl RA,RS,SH,MB (Rc=0) rldicr RA,RS,SH,ME (Rc=0)
rldicl. RA,RS,SH,MB (Rc=1) rldicr. RA,RS,SH,ME (Rc=1)
30 RS RA sh mb 0 sh Rc 30 RS RA sh me 1 sh Rc
0 6 11 16 21 27 30 31 0 6 11 16 21 27 30 31
Rotate Left Doubleword Immediate then Rotate Left Doubleword then Clear Left
Clear MD-form MDS-form
rldic RA,RS,SH,MB (Rc=0) rldcl RA,RS,RB,MB (Rc=0)
rldic. RA,RS,SH,MB (Rc=1) rldcl. RA,RS,RB,MB (Rc=1)
30 RS RA sh mb 2 sh Rc 30 RS RA RB mb 8 Rc
0 6 11 16 21 27 30 31 0 6 11 16 21 27 31
Rotate Left Doubleword then Clear Right Rotate Left Doubleword Immediate then
MDS-form Mask Insert MD-form
rldcr RA,RS,RB,ME (Rc=0) rldimi RA,RS,SH,MB (Rc=0)
rldcr. RA,RS,RB,ME (Rc=1) rldimi. RA,RS,SH,MB (Rc=1)
30 RS RA RB me 9 Rc 30 RS RA sh mb 3 sh Rc
0 6 11 16 21 27 31 0 6 11 16 21 27 30 31
31 RS RA RB 24 Rc 31 RS RA RB 536 Rc
0 6 11 16 21 31 0 6 11 16 21 31
n (RB)59:63 n (RB)59:63
r ROTL32((RS)32:63, n) r ROTL32((RS)32:63, 64-n)
if (RB)58 = 0 then if (RB)58 = 0 then
m MASK(32, 63-n) m MASK(n+32, 63)
else m 640 else m 640
RA r & m RA r & m
The contents of the low-order 32 bits of register RS are The contents of the low-order 32 bits of register RS are
shifted left the number of bits specified by (RB)58:63. shifted right the number of bits specified by (RB)58:63.
Bits shifted out of position 32 are lost. Zeros are sup- Bits shifted out of position 63 are lost. Zeros are sup-
plied to the vacated positions on the right. The 32-bit plied to the vacated positions on the left. The 32-bit
result is placed into RA32:63. RA0:31 are set to zero. result is placed into RA32:63. RA0:31 are set to zero.
Shift amounts from 32 to 63 give a zero result. Shift amounts from 32 to 63 give a zero result.
Special Registers Altered: Special Registers Altered:
CR0 (if Rc=1) CR0 (if Rc=1)
Shift Right Algebraic Word Immediate Shift Right Algebraic Word X-form
X-form
sraw RA,RS,RB (Rc=0)
srawi RA,RS,SH (Rc=0) sraw. RA,RS,RB (Rc=1)
srawi. RA,RS,SH (Rc=1)
31 RS RA RB 792 Rc
31 RS RA SH 824 Rc 0 6 11 16 21 31
0 6 11 16 21 31
n (RB)59:63
n SH r ROTL32((RS)32:63, 64-n)
r ROTL32((RS)32:63, 64-n) if (RB)58 = 0 then
m MASK(n+32, 63) m MASK(n+32, 63)
s (RS)32 else m 640
RA r&m | (64s)&¬m s (RS)32
carry s & ((r&¬m)32:630) RA r&m | (64s)&¬m
CA carry carry s & ((r&¬m)32:630)
CA32 carry CA carry
CA32 carry
The contents of the low-order 32 bits of register RS are
shifted right SH bits. Bits shifted out of position 63 are The contents of the low-order 32 bits of register RS are
lost. Bit 32 of RS is replicated to fill the vacated posi- shifted right the number of bits specified by (RB)58:63.
tions on the left. The 32-bit result is placed into RA32:63. Bits shifted out of position 63 are lost. Bit 32 of RS is
Bit 32 of RS is replicated to fill RA0:31. CA and CA32 replicated to fill the vacated positions on the left. The
are set to 1 if the low-order 32 bits of (RS) contain a 32-bit result is placed into RA32:63. Bit 32 of RS is repli-
negative number and any 1-bits are shifted out of posi- cated to fill RA0:31. CA and CA32 are set to 1 if the
tion 63; otherwise CA and CA32 are set to 0. A shift low-order 32 bits of (RS) contain a negative number
amount of zero causes RA to receive EXTS((RS)32:63), and any 1-bits are shifted out of position 63; otherwise
and CA and CA32 to be set to 0. CA and CA32 are set to 0. A shift amount of zero
causes RA to receive EXTS((RS)32:63), and CA and
CA32 to be set to 0. Shift amounts from 32 to 63 give a
Special Registers Altered: result of 64 sign bits, and cause CA and CA32 to
CA CA32 receive the sign bit of (RS)32:63.
CR0 (if Rc=1)
31 RS RA RB 27 Rc 31 RS RA RB 539 Rc
0 6 11 16 21 31 0 6 11 16 21 31
n (RB)58:63 n (RB)58:63
r ROTL64((RS), n) r ROTL64((RS), 64-n)
if (RB)57 = 0 then if (RB)57 = 0 then
m MASK(0, 63-n) m MASK(n, 63)
else m 640 else m 640
RA r & m RA r & m
The contents of register RS are shifted left the number The contents of register RS are shifted right the num-
of bits specified by (RB)57:63. Bits shifted out of position ber of bits specified by (RB)57:63. Bits shifted out of
0 are lost. Zeros are supplied to the vacated positions position 63 are lost. Zeros are supplied to the vacated
on the right. The result is placed into register RA. Shift positions on the left. The result is placed into register
amounts from 64 to 127 give a zero result. RA. Shift amounts from 64 to 127 give a zero result.
Special Registers Altered: Special Registers Altered:
CR0 (if Rc=1) CR0 (if Rc=1)
n (RB)58:63
n sh5 || sh0:4 r ROTL64((RS), 64-n)
r ROTL64((RS), 64-n) if (RB)57 = 0 then
m MASK(n, 63) m MASK(n, 63)
s (RS)0 else m 640
RA r&m | (64s)&¬m s (RS)0
carry s & ((r&¬m)0) RA r&m | (64s)&¬m
CA carry carry s & ((r&¬m)0)
CA32 carry CA carry
CA32 carry
The contents of register RS are shifted right SH bits.
Bits shifted out of position 63 are lost. Bit 0 of RS is rep- The contents of register RS are shifted right the num-
licated to fill the vacated positions on the left. The result ber of bits specified by (RB)57:63. Bits shifted out of
is placed into register RA. CA and CA32 are set to 1 if position 63 are lost. Bit 0 of RS is replicated to fill the
(RS) is negative and any 1-bits are shifted out of posi- vacated positions on the left. The result is placed into
tion 63; otherwise CA and CA32 are set to 0. A shift register RA. CA and CA32 are set to 1 if (RS) is nega-
amount of zero causes RA to be set equal to (RS), and tive and any 1-bits are shifted out of position 63; other-
CA and CA32 to be set to 0. wise CA and CA32 are set to 0. A shift amount of zero
causes RA to be set equal to (RS), and CA and CA32
Special Registers Altered: to be set to 0. Shift amounts from 64 to 127 give a
CA CA32 result of 64 sign bits in RA, and cause CA and CA32 to
CR0 (if Rc=1) receive the sign bit of (RS).
Special Registers Altered:
CA CA32
CR0 (if Rc=1)
31 RS RA sh 445 sh Rc
0 6 11 16 21 30 31
n sh5 || sh0:4
r ROTL64(EXTS64(RS32:63), n)
m MASK(0, 63-n)
RA r & m
Convert Declets To Binary Coded Decimal Add and Generate Sixes XO-form
X-form
addg6s RT,RA,RB
cdtbcd RA, RS
31 RT RA RB / 74 /
31 RS RA /// 282 / 0 6 11 16 21 22 31
0 6 11 16 21 31
do i = 0 to 15
do i = 0 to 1 dci carry_out(RA4xi:63 + RB4xi:63)
n i x 32 c 4(dc0) || 4(dc1) || ... || 4(dc15)
RAn+0:n+7 0 RT (¬c) & 0x6666_6666_6666_6666
RAn+8:n+19 DPD_TO_BCD( (RS)n+12:n+21 )
RAn+20:n+31 DPD_TO_BCD( (RS)n+22:n+31 ) The contents of register RA are added to the contents
of register RB. Sixteen carry bits are produced, one
The low-order 20 bits of each word of register RS con- for each carry out of decimal position n (bit posi-
tain two declets which are converted to six, 4-bit BCD tion 4xn).
fields; each set of six, 4-bit BCD fields is placed into the
low-order 24 bits of the corresponding word in RA. The A doubleword is composed from the 16 carry bits, and
high-order 8 bits in each word of RA are set to 0. placed into RT. The doubleword consists of a decimal
Special Registers Altered: six (0b0110) in every decimal digit position for which
None the corresponding carry bit is 0, and a zero (0b0000) in
every position for which the corresponding carry bit is
Convert Binary Coded Decimal To Declets 1.
X-form Special Registers Altered:
None
cbcdtd RA, RS
Programming Note
31 RS RA /// 314 /
0 6 11 16 21 31
addg6s can be used to add or subtract two BCD
operands. In these examples it is assumed that r0
contains 0x666...666. (BCD data formats are
do i = 0 to 1
n i x 32 described in Section 5.3.)
RAn+0:n+11 0 Addition of the unsigned BCD operand in register
RAn+12:n+21 BCD_TO_DPD( (RS)n+8:n+19 )
RAn+22:n+31 BCD_TO_DPD( (RS)n+20:n+31 ) RA to the unsigned BCD operand in register RB
can be accomplished as follows.
The low-order 24 bits of each word of register RS con-
tain six, 4-bit BCD fields which are converted to two add r1,RA,r0
declets; each set of two declets is placed into the add r2,r1,RB
low-order 20 bits of the corresponding word in RA. The addg6s RT,r1,RB
high-order 12 bits in each word of RA are set to 0. subf RT,RT,r2# RT = RA +BCD RB
If a 4-bit BCD field has a value greater than 9 the Subtraction of the unsigned BCD operand in regis-
results are undefined. ter RA from the unsigned BCD operand in register
Special Registers Altered: RB can be accomplished as follows. (In this exam-
None ple it is assumed that RB is not register 0.)
addi r1,RB,1
nor r2,RA,RA# one's complement of RA
add r3,r1,r2
addg6s RT,r1,r2
subf RT,RT,r3# RT = RB -BCD RA
Additional instructions are needed to handle signed
BCD operands, and BCD operands that occupy
more than one register (e.g., unsigned BCD oper-
ands that have more than 16 decimal digits).
Move From VSR Doubleword X-form Move From VSR Lower Doubleword
X-form
mfvsrd RA,XS
mfvsrld RA,XS
31 S RA /// 51 SX
31 S RA /// 307 SX
0 6 11 16 21 31
0 6 11 16 21 31
if SX=0 & MSR.FP=0 then FP_Unavailable() if SX=0 & MSR.VSX=0 then VSX_Unavailable()
if SX=1 & MSR.VEC=0 then Vector_Unavailable() if SX=1 & MSR.VEC=0 then Vector_Unavailable()
0 64 127
31 S RA /// 115 SX
0 6 11 16 21 31
GPR[RA] EXTZ64(VSR[32×SX+S].word[1])
0 32 64 127
if TX=0 & MSR.FP=0 then FP_Unavailable() if TX=0 & MSR.FP=0 then FP_Unavailable()
if TX=1 & MSR.VEC=0 then Vector_Unavailable() if TX=1 & MSR.VEC=0 then Vector_Unavailable()
The contents of GPR[RA] are placed into doubleword The two’s-complement integer in bits 32:63 of GPR[RA]
element 0 of VSR[XT]. is sign-extended to 64 bits and placed into doubleword
element 0 of VSR[XT].
The contents of doubleword element 1 of VSR[XT] are
undefined. The contents of doubleword element 1 of VSR[XT] are
undefined.
For TX=0, mtvsrd is treated as a Floating-Point
instruction in terms of resource availability. For TX=0, mtvsrwa is treated as a Floating-Point
instruction in terms of resource availability.
For TX=1, mtvsrd is treated as a Vector instruction in
terms of resource availability. For TX=1, mtvsrwa is treated as a Vector instruction in
terms of resource availability.
Extended Mnemonics Equivalent To
mtfprd FRT,RA mtvsrd FRT,RA Extended Mnemonics Equivalent To
mtvrd VRT,RA mtvsrd VRT+32,RA mtfprwa FRT,RA mtvsrwa FRT,RA
mtvrwa VRT,RA mtvsrwa VRT+32,RA
Special Registers Altered
None Special Registers Altered
None
Data Layout for mtvsrd
Data Layout for mtvsrwa
src = GPR[RA]
src = GPR[RA]
Move To VSR Word and Zero X-form Move To VSR Double Doubleword X-form
mtvsrwz XT,RA mtvsrdd XT,RA,RB
31 T RA RB 435 TX
31 T RA /// 243 TX
0 6 11 16 21 31
0 6 11 16 21 31
if TX=0 & MSR.VSX=0 then VSX_Unavailable()
if TX=0 & MSR.FP=0 then FP_Unavailable() if TX=1 & MSR.VEC=0 then Vector_Unavailable()
if TX=1 & MSR.VEC=0 then Vector_Unavailable()
VSR[32×TX+T].dword[0] (RA=0) ? 0x0000_0000_0000_0000 : GPR[RA]
VSR[32×TX+T].dword[0] EXTZ64(GPR[RA].word[1]) VSR[32×TX+T].dword[1] GPR[RB]
VSR[32×TX+T].dword[1] 0xUUUU_UUUU_UUUU_UUUU
Let XT be the value 32×TX + T.
Let XT be the value 32×TX + T.
The contents of GPR[RA], or the value 0 if RA=0, are
The contents of bits 32:63 of GPR[RA] are placed into placed into doubleword 0 of VSR[XT].
word element 1 of VSR[XT]. The contents of word
element 0 of VSR[XT] are set to 0. The contents of GPR[RB] are placed into doubleword 1
of VSR[XT].
The contents of doubleword element 1 of VSR[XT] are
undefined. For TX=0, mtvsrdd is treated as a VSX instruction in
terms of resource availability.
For TX=0, mtvsrwz is treated as a Floating-Point
instruction in terms of resource availability. For TX=1, mtvsrdd is treated as a Vector instruction in
terms of resource availability.
For TX=1, mtvsrwz is treated as a Vector instruction in
terms of resource availability. Special Registers Altered:
Extended Mnemonics Equivalent To None
mtfprwz FRT,RA mtvsrwz FRT,RA
mtvrwz VRT,RA mtvsrwz VRT+32,RA Data Layout for mtvsrdd
src = GPR[RA]
Special Registers Altered
None
src = GPR[RB]
Data Layout for mtvsrwz
src = GPR[RA] tgt = VSR[XT]
unused .dword[0] .dword[1]
tgt = VSR[XT] 0 32 64 127
.dword[0] undefined
0 32 64 127
31 T RA /// 403 TX
0 6 11 16 21 31
VSR[32×TX+T].word[0] GPR[RA].bit[32:63]
VSR[32×TX+T].word[1] GPR[RA].bit[32:63]
VSR[32×TX+T].word[2] GPR[RA].bit[32:63]
VSR[32×TX+T].word[3] GPR[RA].bit[32:63]
Programming Note
The AMR is part of the “context” of the program
(see Book III). Therefore modification of the AMR
requires “synchronization” by software. For this
reason, most operating systems provide a system
library program that application programs can use
to modify the AMR.
count 0
do i = 0 to 7 mask 4(FXM0) || 4(FXM1) || ... 4(FXM7)
if FXMi = 1 then CR ((RS)32:63 & mask) | (CR & ¬mask)
n i
count count + 1 The contents of bits 32:63 of register RS are placed
if count = 1 then into the Condition Register under control of the field
CR4n+32:4n+35 (RS)4n+32:4n+35 mask specified by FXM. The field mask identifies the
else CR undefined 4-bit fields affected. Let i be an integer in the range 0-7.
If FXMi=1 then CR field i (CR bits 4i+32:4i+35) is set
If exactly one bit of the FXM field is set to 1, let n be the to the contents of the corresponding field of the
position of that bit in the field (0 n 7). The contents low-order 32 bits of RS.
of bits 4n+32:4n+35 of register RS are placed into
CR field n (CR bits 4n+32:4n+35). Otherwise, the Special Registers Altered:
contents of the Condition Register are undefined. CR fields selected by mask
Special Registers Altered: Extended Mnemonics:
CR field selected by FXM
Example of extended mnemonics for Move To Condi-
tion Register Fields:
Move From One Condition Register Field Move From Condition Register XFX-form
XFX-form
mfocrf RT,FXM mfcr RT
31 RT 1 FXM / 19 / 31 RT 0 /// 19 /
0 6 11 12 20 21 31 0 6 11 12 21 31
RT undefined RT 320 || CR
count 0
do i = 0 to 7 The contents of the Condition Register are placed into
if FXMi = 1 then RT32:63. RT0:31 are set to 0.
n i
count count + 1
Special Registers Altered:
if count = 1 then
RT 640 None
RT4n+32:4n+35 CR4n+32:4n+35
If exactly one bit of the FXM field is set to 1, let n be Set Boolean X-form
the position of that bit in the field (0 n 7). The
setb RT,BFA
contents of CR field n (CR bits 4n+32:4n+35) are
placed into bits 4n+32:4n+35 of register RT, and the 31 RT BFA // /// 128 /
contents of the remaining bits of register RT are 0 6 11 14 16 21 31
undefined. Otherwise, the contents of register RT are
undefined. if CR4×BFA+32=1 then
RT 0xFFFF_FFFF_FFFF_FFFF
If exactly one bit of the FXM field is set to 1, the
else if CR4×BFA+33=1 then
contents of the remaining bits of register RT are set to RT 0x0000_0000_0000_0001
0's instead of being undefined as specified above.
else
Special Registers Altered: RT 0x0000_0000_0000_0000
None
If the contents of bit 0 of CR field BFA are equal to 0b1,
Programming Note the contents of register RT are set to
0xFFFF_FFFF_FFFF_FFFF.
Warning: mfocrf is not backward compatible with
processors that comply with versions of the
Otherwise, if the contents of bit 1 of CR field BFA are
architecture that precede Version 3.0 B. Such
equal to 0b1, the contents of register RT are set to
processors may not set to 0 the bits of register RT
0x0000_0000_0000_0001.
that do not correspond to the specified CR field. If
programs that depend on this clearing behavior Otherwise, the contents of register RT are set to
are run on such processors, the programs may get 0x0000_0000_0000_0000.
incorrect results.
Special Registers Altered:
The POWER4, POWER5, POWER7 and
None
POWER8 processors set to 0's all bytes of register
RT other than the byte that contains the specified
CR field. In the byte that contains the CR field, bits
other than those containing the CR field may or
may not be set to 0s.
4.1 Floating-Point Facility Over- itly, and select the value from one of two float-
ing-point registers based on the value in a third
view floating-point register. The operations performed
by these instructions are not considered float-
This chapter describes the registers and instructions ing-point operations. With the exception of the
that make up the Floating-Point Facility. instructions that manipulate the Floating-Point Sta-
tus and Control Register explicitly, they do not alter
The processor (augmented by appropriate software
the Floating-Point Status and Control Register.
support, where required) implements a floating-point
They are the instructions described in Sections
system compliant with the ANSI/IEEE Standard
4.6.2 through 4.6.5, and 4.6.10.
754-1985, “IEEE Standard for Binary Floating-Point
Arithmetic” (hereafter referred to as “the IEEE stan- A floating-point number consists of a signed exponent
dard”). That standard defines certain required “opera- and a signed significand. The quantity expressed by
tions” (addition, subtraction, etc.). Herein, the term this number is the product of the significand and the
“floating-point operation” is used to refer to one of these number 2exponent. Encodings are provided in the data
required operations and to additional operations format to represent finite numeric values, Infinity, and
defined (e.g., those performed by Multiply-Add or values that are “Not a Number” (NaN). Operations
Reciprocal Estimate instructions). A Non-IEEE mode is involving infinities produce results obeying traditional
also provided. This mode, which may produce results mathematical conventions. NaNs have no mathemati-
not in strict compliance with the IEEE standard, allows cal interpretation. Their encoding permits a variable
shorter latency. diagnostic information field. They may be used to indi-
cate such things as uninitialized variables and can be
Instructions are provided to perform arithmetic, round-
produced by certain invalid operations.
ing, conversion, comparison, and other operations in
floating-point registers; to move floating-point data There is one class of exceptional events that occur
between storage and these registers; and to manipu- during instruction execution that is unique to the Float-
late the Floating-Point Status and Control Register ing-Point Facility: the Floating-Point Exception. Float-
explicitly. ing-point exceptions are signaled with bits set in the
Floating-Point Status and Control Register (FPSCR).
These instructions are divided into two categories.
They can cause the system floating-point enabled
computational instructions exception error handler to be invoked, precisely or
The computational instructions are those that per- imprecisely, if the proper control bits are set.
form addition, subtraction, multiplication, division,
extracting the square root, rounding, conversion, Floating-Point Exceptions
comparison, and combinations of these opera-
The following floating-point exceptions are detected by
tions. These instructions provide the floating-point
the processor:
operations. They place status information into the
Floating-Point Status and Control Register. They
Invalid Operation Exception (VX)
are the instructions described in Sections 4.6.6
SNaN (VXSNAN)
through 4.6.8.
Infinity-Infinity (VXISI)
non-computational instructions InfinityInfinity (VXIDI)
ZeroZero (VXZDZ)
The non-computational instructions are those that
InfinityZero (VXIMZ)
perform loads and stores, move the contents of a
Invalid Compare (VXVC)
floating-point register to another floating-point reg-
Software-Defined Condition (VXSOFT)
ister possibly altering the sign, manipulate the
Invalid Square Root (VXSQRT)
Floating-Point Status and Control Register explic-
Invalid Integer Convert (VXCVI) the result placed into the target FPR, and the setting of
Zero Divide Exception (ZX) status bits in the FPSCR and in the Condition Register
Overflow Exception (OX) (if Rc=1), are undefined.
Underflow Exception (UX)
Inexact Exception (XX) FPR 0
Each floating-point exception, and each category of FPR 1
Invalid Operation Exception, has an exception bit in the ...
FPSCR. In addition, each floating-point exception has a
corresponding enable bit in the FPSCR. See ...
Section 4.2.2, “Floating-Point Status and Control Reg- FPR 30
ister” on page 124 for a description of these exception
FPR 31
and enable bits, and Section 4.4, “Floating-Point
Exceptions” on page 132 for a detailed discussion of 0 63
the FPCC bits to 1 and the other three FPCC See Section 4.4.5, “Inexact Exception” on
bits to 0. Arithmetic, rounding, and Convert page 136.
From Integer instructions may set the FPCC
61 Floating-Point Non-IEEE Mode (NI)
bits with the C bit, to indicate the class of the
Floating-point non-IEEE mode is optional. If
result as shown in Figure 47 on page 127.
floating-point non-IEEE mode is not imple-
Note that in this case the high-order three bits
mented, this bit is treated as reserved, and the
of the FPCC retain their relational significance
remainder of the definition of this bit does not
indicating that the value is less than, greater
apply.
than, or equal to zero.
If floating-point non-IEEE mode is imple-
48 Floating-Point Less Than or Negative (FL
mented, this bit has the following meaning.
or <)
0 The processor is not in floating-point
49 Floating-Point Greater Than or Positive non-IEEE mode (i.e., all floating-point
(FG or >) operations conform to the IEEE standard).
50 Floating-Point Equal or Zero (FE or =) 1 The processor is in floating-point
non-IEEE mode.
51 Floating-Point Unordered or NaN (FU or ?)
When the processor is in floating-point
52 Reserved non-IEEE mode, the remaining FPSCR bits
53 Floating-Point Invalid Operation Excep- may have meanings different from those given
tion (Software-Defined Condition) in this document, and floating-point operations
(VXSOFT) need not conform to the IEEE standard. The
This bit can be altered only by mcrfs, mtfsfi, effects of executing a given floating-point
mtfsf, mtfsb0, or mtfsb1. See Section 4.4.1. instruction with FPSCRNI=1, and any addi-
tional requirements for using non-IEEE mode,
Programming Note are implementation-dependent. The results of
FPSCRVXSOFT can be used by software executing a given instruction in non-IEEE
to indicate the occurrence of an arbitrary, mode may vary between implementations,
software-defined, condition that is to be and between different executions on the same
treated as an Invalid Operation Exception. implementation.
For example, the bit could be set by a pro-
gram that computes a base 10 logarithm if Programming Note
the supplied input is negative. When the processor is in floating-point
non-IEEE mode, the results of float-
54 Floating-Point Invalid Operation Excep- ing-point operations may be approximate,
tion (Invalid Square Root) (VXSQRT) and performance for these operations
See Section 4.4.1. may be better, more predictable, or less
data-dependent than when the processor
55 Floating-Point Invalid Operation Excep- is not in non-IEEE mode. For example, in
tion (Invalid Integer Convert) (VXCVI) non-IEEE mode an implementation may
See Section 4.4.1. return 0 instead of a denormalized num-
56 Floating-Point Invalid Operation Excep- ber, and may return a large number
tion Enable (VE) instead of an infinity.
See Section 4.4.1.
62:63 Floating-Point Rounding Control (RN) See
57 Floating-Point Overflow Exception Enable
Section 4.3.6, “Rounding” on page 131.
(OE)
See Section 4.4.3, “Overflow Exception” on 00 Round to Nearest
page 135. 01 Round toward Zero
10 Round toward +Infinity
58 Floating-Point Underflow Exception
11 Round toward -Infinity
Enable (UE)
See Section 4.4.4, “Underflow Exception” on
page 136.
59 Floating-Point Zero Divide Exception
Enable (ZE)
See Section 4.4.2, “Zero Divide Exception” on
page 134.
60 Floating-Point Inexact Exception Enable
(XE)
Figure 49. Floating-point double format -INF -NOR -DEN -0 +0 +DEN +NOR +INF
S sign bit The NaNs are not related to the numeric values or infin-
EXP exponent+bias ities by order or value but are encodings used to con-
FRACTION fraction vey diagnostic information such as the representation
of uninitialized variables.
Representation of numeric values in the floating-point
formats consists of a sign bit (S), a biased exponent The following is a description of the different float-
(EXP), and the fraction portion (FRACTION) of the sig- ing-point values defined in the architecture:
nificand. The significand consists of a leading implied Binary floating-point numbers
bit concatenated on the right with the FRACTION. This Machine representable values used as approximations
leading implied bit is 1 for normalized numbers and 0 to real numbers. Three categories of numbers are sup-
for denormalized numbers and is located in the unit bit ported: normalized numbers, denormalized numbers,
position (i.e., the first bit to the left of the binary point). and zero values.
Values representable within the two floating-point for-
Exception generates this QNaN (i.e., mented by one. For the leading-zero case, the signifi-
0x7FF8_0000_0000_0000). cand is shifted left while decrementing its exponent by
one for each bit shifted, until the leading significand bit
A double-precision NaN is considered to be represent-
becomes one. The Guard bit and the Round bit (see
able in single format if and only if the low-order 29 bits
Section 4.5.1, “Execution Model for IEEE Operations”
of the double-precision NaN’s fraction are zero.
on page 137) participate in the shift with zeros shifted
into the Round bit. The exponent is regarded as if its
4.3.3 Sign of Result range were unlimited.
The following rules govern the sign of the result of an After normalization, or if normalization was not
arithmetic, rounding, or conversion operation, when the required, the intermediate result may have a nonzero
operation does not yield an exception. They apply even significand and an exponent value that is less than the
when the operands or results are zeros or infinities. minimum value that can be represented in the format
specified for the result. In this case, the intermediate
The sign of the result of an add operation is the result is said to be “Tiny” and the stored result is deter-
sign of the operand having the larger absolute mined by the rules described in Section 4.4.4, “Under-
value. If both operands have the same sign, the flow Exception”. These rules may require
sign of the result of an add operation is the same denormalization.
as the sign of the operands. The sign of the result
of the subtract operation x-y is the same as the A number is denormalized by shifting its significand
sign of the result of the add operation x+(-y). right while incrementing its exponent by 1 for each bit
shifted, until the exponent is equal to the format’s mini-
When the sum of two operands with opposite sign, mum value. If any significant bits are lost in this shifting
or the difference of two operands with the same process then “Loss of Accuracy” has occurred (See
sign, is exactly zero, the sign of the result is posi- Section 4.4.4, “Underflow Exception” on page 136) and
tive in all rounding modes except Round toward Underflow Exception is signaled.
-Infinity, in which mode the sign is negative.
The sign of the result of a multiply or divide opera- 4.3.5 Data Handling and Precision
tion is the Exclusive OR of the signs of the oper-
ands. Most of the Floating-Point Facility Architecture, includ-
The sign of the result of a Square Root or Recipro- ing all computational, Move, and Select instructions,
cal Square Root Estimate operation is always pos- use the floating-point double format to represent data in
itive, except that the square root of -0 is -0 and the FPRs. Single-precision and integer-valued oper-
the reciprocal square root of -0 is -Infinity. ands may be manipulated using double-precision oper-
ations. Instructions are provided to coerce these values
The sign of the result of a Round to Single-Preci- from a double format operand. Instructions are also
sion, or Convert From Integer, or Round to Integer provided for manipulations which do not require dou-
operation is the sign of the operand being con- ble-precision. In addition, instructions are provided to
verted. access a true single-precision representation in stor-
For the Multiply-Add instructions, the rules given above age, and a fixed-point integer representation in GPRs.
are applied first to the multiply operation and then to
the add or subtract operation (one of the inputs to the 4.3.5.1 Single-Precision Operands
add or subtract operation is the result of the multiply
operation). For single format data, a format conversion from single
to double is performed when loading from storage into
an FPR and a format conversion from double to single
4.3.4 Normalization and is performed when storing from an FPR to storage. No
Denormalization floating-point exceptions are caused by these instruc-
tions. An instruction is provided to explicitly convert a
The intermediate result of an arithmetic or frsp instruc- double format operand in an FPR to single-precision.
tion may require normalization and/or denormalization Floating-point single-precision is enabled with four
as described below. Normalization and denormalization types of instruction.
do not affect the sign of the result.
When an arithmetic or rounding instruction produces an 1. Load Floating-Point Single
intermediate result which carries out of the significand,
or in which the significand is nonzero but has a leading This form of instruction accesses a single-preci-
zero bit, it is not a normalized number and must be nor- sion operand in single format in storage, converts it
malized before it is stored. For the carry-out case, the to double format, and loads it into an FPR. No
significand is shifted right one bit, with a one shifted into floating-point exceptions are caused by these
the leading significand bit, and the exponent is incre- instructions.
4.4 Floating-Point Exceptions When an exception occurs the writing of a result to the
target register may be suppressed or a result may be
This architecture defines the following floating-point delivered, depending on the exception.
exceptions: The writing of a result to the target register is sup-
Invalid Operation Exception pressed for the following kinds of exception, so that
SNaN there is no possibility that one of the operands is lost:
Infinity-Infinity Enabled Invalid Operation
InfinityInfinity Enabled Zero Divide
ZeroZero
InfinityZero For the remaining kinds of exception, a result is gener-
Invalid Compare ated and written to the destination specified by the
Software-Defined Condition instruction causing the exception. The result may be a
Invalid Square Root different value for the enabled and disabled conditions
Invalid Integer Convert for some of these exceptions. The kinds of exception
Zero Divide Exception that deliver a result are the following:
Overflow Exception Disabled Invalid Operation
Underflow Exception Disabled Zero Divide
Inexact Exception Disabled Overflow
These exceptions, other than Invalid Operation Excep- Disabled Underflow
tion due to Software-Defined Condition, may occur Disabled Inexact
during execution of computational instructions. An Enabled Overflow
Invalid Operation Exception due to Software-Defined Enabled Underflow
Condition occurs when a Move To FPSCR instruction Enabled Inexact
sets FPSCRVXSOFT to 1. Subsequent sections define each of the floating-point
Each floating-point exception, and each category of exceptions and specify the action that is taken when
Invalid Operation Exception, has an exception bit in the they are detected.
FPSCR. In addition, each floating-point exception has a The IEEE standard specifies the handling of excep-
corresponding enable bit in the FPSCR. The exception tional conditions in terms of “traps” and “trap handlers”.
bit indicates occurrence of the corresponding excep- In this architecture, an FPSCR exception enable bit of 1
tion. If an exception occurs, the corresponding enable causes generation of the result value specified in the
bit governs the result produced by the instruction and, IEEE standard for the “trap enabled” case; the expecta-
in conjunction with the FE0 and FE1 bits (see tion is that the exception will be detected by software,
page 133), whether and how the system floating-point which will revise the result. An FPSCR exception
enabled exception error handler is invoked. (In general, enable bit of 0 causes generation of the “default result”
the enabling specified by the enable bit is of invoking value specified for the “trap disabled” (or “no trap
the system error handler, not of permitting the excep- occurs” or “trap is not implemented”) case; the expecta-
tion to occur. The occurrence of an exception depends tion is that the exception will not be detected by soft-
only on the instruction and its inputs, not on the setting ware, which will simply use the default result. The result
of any control bits. The only deviation from this general to be delivered in each case for each exception is
rule is that the occurrence of an Underflow Exception described in the sections below.
may depend on the setting of the enable bit.)
The IEEE default behavior when an exception occurs is
A single instruction, other than mtfsfi or mtfsf, may set to generate a default value and not to notify software. In
more than one exception bit only in the following cases: this architecture, if the IEEE default behavior when an
Inexact Exception may be set with Overflow exception occurs is desired for all exceptions, all
Exception. FPSCR exception enable bits should be set to 0 and
Inexact Exception may be set with Underflow Ignore Exceptions Mode (see below) should be used.
Exception. In this case the system floating-point enabled exception
Invalid Operation Exception (SNaN) is set with error handler is not invoked, even if floating-point
Invalid Operation Exception (0) for Multiply-Add exceptions occur: software can inspect the FPSCR
instructions for which the values being multiplied exception bits if necessary, to determine whether
are infinity and zero and the value being added is exceptions have occurred.
an SNaN.
Invalid Operation Exception (SNaN) may be set In this architecture, if software is to be notified that a
with Invalid Operation Exception (Invalid Compare) given kind of exception has occurred, the correspond-
for Compare Ordered instructions. ing FPSCR exception enable bit must be set to 1 and a
Invalid Operation Exception (SNaN) may be set mode other than Ignore Exceptions Mode must be
with Invalid Operation Exception (Invalid Integer used. In this case the system floating-point enabled
Convert) for Convert To Integer instructions. exception error handler is invoked if an enabled float-
ing-point exception occurs. The system floating-point before the instruction at which the system floating-point
enabled exception error handler is also invoked if a enabled exception error handler is invoked have com-
Move To FPSCR instruction causes an exception bit pleted, and no instruction after the instruction at which
and the corresponding enable bit both to be 1; the the system floating-point enabled exception error han-
Move To FPSCR instruction is considered to cause the dler is invoked has begun execution. The instruction at
enabled exception. which the system floating-point enabled exception error
handler is invoked has completed if it is the excepting
The FE0 and FE1 bits control whether and how the sys-
instruction and there is only one such instruction. Oth-
tem floating-point enabled exception error handler is
erwise it has not begun execution (or may have been
invoked if an enabled floating-point exception occurs.
partially executed in some cases, as described in Book
The location of these bits and the requirements for
III).
altering them are described in Book III. (The system
floating-point enabled exception error handler is never
Programming Note
invoked because of a disabled floating-point excep-
tion.) The effects of the four possible settings of these In any of the three non-Precise modes, a Float-
bits are as follows. ing-Point Status and Control Register instruction
can be used to force any exceptions, due to
FE0 FE1 Description instructions initiated before the Floating-Point Sta-
tus and Control Register instruction, to be recorded
0 0 Ignore Exceptions Mode in the FPSCR. (This forcing is superfluous for Pre-
Floating-point exceptions do not cause cise Mode.)
the system floating-point enabled excep-
tion error handler to be invoked. In either of the Imprecise modes, a Floating-Point
0 1 Imprecise Nonrecoverable Mode Status and Control Register instruction can be used
The system floating-point enabled excep- to force any invocations of the system floating-point
tion error handler is invoked at some point enabled exception error handler, due to instructions
at or beyond the instruction that caused initiated before the Floating-Point Status and Con-
the enabled exception. It may not be pos- trol Register instruction, to occur. (This forcing has
sible to identify the excepting instruction no effect in Ignore Exceptions Mode, and is super-
or the data that caused the exception. fluous for Precise Mode.)
Results produced by the excepting The last sentence of the paragraph preceding this
instruction may have been used by or may Programming Note can apply only in the Imprecise
have affected subsequent instructions modes, or if the mode has just been changed from
that are executed before the error handler Ignore Exceptions Mode to some other mode. (It
is invoked. always applies in the latter case.)
1 0 Imprecise Recoverable Mode
The system floating-point enabled excep- In order to obtain the best performance across the wid-
tion error handler is invoked at some point est range of implementations, the programmer should
at or beyond the instruction that caused obey the following guidelines.
the enabled exception. Sufficient informa-
If the IEEE default results are acceptable to the
tion is provided to the error handler that it
application, Ignore Exceptions Mode should be
can identify the excepting instruction and
used with all FPSCR exception enable bits set to
the operands, and correct the result. No
0.
results produced by the excepting instruc-
If the IEEE default results are not acceptable to the
tion have been used by or have affected
application, Imprecise Nonrecoverable Mode
subsequent instructions that are executed
should be used, or Imprecise Recoverable Mode if
before the error handler is invoked.
recoverability is needed, with FPSCR exception
1 1 Precise Mode enable bits set to 1 for those exceptions for which
The system floating-point enabled excep- the system floating-point enabled exception error
tion error handler is invoked precisely at handler is to be invoked.
the instruction that caused the enabled Ignore Exceptions Mode should not, in general, be
exception. used when any FPSCR exception enable bits are
set to 1.
In all cases, the question of whether a floating-point
Precise Mode may degrade performance in some
result is stored, and what value is stored, is governed
implementations, perhaps substantially, and there-
by the FPSCR exception enable bits, as described in
fore should be used only for debugging and other
subsequent sections, and is not affected by the value of
specialized applications.
the FE0 and FE1 bits.
In all cases in which the system floating-point enabled
exception error handler is invoked, all instructions
4.4.3.1 Definition
An Overflow Exception occurs when the magnitude of
what would have been the rounded result if the expo-
nent range were unbounded exceeds that of the largest
finite number of the specified result precision.
4.4.3.2 Action
The action to be taken depends on the setting of the
Overflow Exception Enable bit of the FPSCR.
When Overflow Exception is enabled (FPSCROE=1)
and an Overflow Exception occurs, the following
actions are taken:
1. Overflow Exception is set
FPSCROX 1
2. For double-precision arithmetic instructions, the
exponent of the normalized intermediate result is
adjusted by subtracting 1536
3. For single-precision arithmetic instructions and the
Floating Round to Single-Precision instruction, the
exponent of the normalized intermediate result is
adjusted by subtracting 192
4. The adjusted rounded result is placed into the tar-
get FPR
5. FPSCRFPRF is set to indicate the class and sign of
the result ( Normal Number)
When Overflow Exception is disabled (FPSCROE=0)
and an Overflow Exception occurs, the following
actions are taken:
Programming Note
In some implementations, enabling Inexact Excep-
tions may degrade performance more than does
enabling other types of floating-point exception.
S C L FRACTION X’
0 1 2 3 106
Storage accesses will cause the system data storage Zero / Infinity / NaN
error handler to be invoked if the program is not if WORD1:8 = 255 or WORD1:31 = 0 then
allowed to modify the target storage (Store only), or if FRT0:1 WORD0:1
the program attempts to access storage that is unavail- FRT2 WORD1
able. FRT3 WORD1
FRT4 WORD1
FRT5:63 WORD2:31 || 290
4.6.2 Floating-Point Load Instruc- For double-precision Load Floating-Point instructions
tions and for the Load Floating-Point as Integer Word Alge-
braic instruction no conversion is required, as the data
There are three basic forms of load instruction: sin- from storage are copied directly into the FPR.
gle-precision, double-precision, and integer. The inte-
ger form is provided by the Load Floating-Point as Many of the Load Floating-Point instructions have an
Integer Word Algebraic instruction, described on “update” form, in which register RA is updated with the
page 143. Because the FPRs support only float- effective address. For these forms, if RA0, the effec-
ing-point double format, single-precision Load Float- tive address is placed into register RA and the storage
ing-Point instructions convert single-precision data to element (word or doubleword) addressed by EA is
double format prior to loading the operand into the tar- loaded into FRT.
get FPR. The conversion and loading steps are as fol-
Note: Recall that RA and RB denote General Purpose
lows.
Registers, while FRT denotes a Floating-Point Regis-
Let WORD0:31 be the floating-point single-precision ter.
operand accessed from storage.
Load Floating-Point Double with Update Load Floating-Point as Integer Word and
Indexed X-form Zero Indexed X-form
lfdux FRT,RA,RB lfiwzx FRT,RA,RB
31 FRT RA RB 855 /
0 6 11 16 21 31
if RA = 0 then b 0
else b (RA)
EA b + (RB)
FRT EXTS(MEM(EA, 4))
Let the effective address (EA) be the sum (RA|0)+(RB).
The word in storage addressed by EA is loaded into
FRT32:63. FRT0:31 are filled with a copy of bit 0 of the
loaded word.
Special Registers Altered:
None
Store Floating-Point Single with Update Store Floating-Point Single with Update
D-form Indexed X-form
stfsu FRS,D(RA) stfsux FRS,RA,RB
Store Floating-Point Double with Update Store Floating-Point Double with Update
D-form Indexed X-form
stfdu FRS,D(RA) stfdux FRS,RA,RB
31 FRS RA RB 983 /
0 6 11 16 21 31
if RA = 0 then b 0
else b (RA)
EA b + (RB)
MEM(EA, 4) (FRS)32:63
Let the effective address (EA) be the sum (RA|0)+(RB).
(FRS)32:63 are stored, without conversion, into the word
in storage addressed by EA.
If the contents of register FRS were produced, either
directly or indirectly, by a Load Floating-Point Single
instruction, a single-precision Arithmetic instruction, or
frsp, then the value stored is undefined. (The contents
of register FRS are produced directly by such an
instruction if FRS is the target register for the instruc-
tion. The contents of register FRS are produced indi-
rectly by such an instruction if FRS is the final target
register of a sequence of one or more Floating-Point
Move instructions, with the input to the sequence hav-
ing been produced directly by such an instruction.)
Special Registers Altered:
None
Load Floating-Point Double Pair DS-form Store Floating-Point Double Pair DS-form
lfdp FRTp,DS(RA) stfdp FRSp,DS(RA)
57 FRTp RA DS 0 61 FRSp RA DS 0
0 6 11 16 30 31 0 6 11 16 30 31
if RA = 0 then b 0 if RA = 0 then b 0
else b (RA) else b (RA)
EA b + EXTS(DS||0b00) EA b + EXTS(DS||0b00)
FRTpeven MEM(EA,8) MEM(EA, 8) FRSpeven
FRTpodd MEM(EA+8, 8) MEM(EA+8, 8) FRSpodd
Let the effective address (EA) be the sum (RA|0) + Let the effective address (EA) be the sum (RA|0) +
(DS||0b00). (DS||0b00).
The doubleword in storage addressed by EA is placed The contents of the even-numbered register of FRSp
into the even-numbered register of FRTp. are stored into the doubleword in storage addressed by
EA.
The doubleword in storage addressed by EA+8 is
placed into the odd-numbered register of FRTp. The contents of the odd-numbered register of FRSp are
stored into the doubleword in storage addressed by
If FRTp is odd, the instruction form is invalid.
EA+8.
Special Registers Altered: If FRSp is odd, the instruction form is invalid.
None
Special Registers Altered:
Load Floating-Point Double Pair Indexed None
X-form
Store Floating-Point Double Pair Indexed
lfdpx FRTp,RA,RB X-form
31 FRTp RA RB 791 / stfdpx FRSp,RA,RB
0 6 11 16 21 31
31 FRSp RA RB 919 /
if RA = 0 then b 0 0 6 11 16 21 31
else b (RA)
EA b + (RB) if RA = 0 then b 0
FRTpeven MEM(EA,8) else b (RA)
FRTpodd MEM(EA+8, 8) EA b + (RB)
Let the effective address (EA) be the sum (RA|0) + MEM(EA, 8) FRSpeven
MEM(EA+8, 8) FRSpodd
(RB).
Let the effective address (EA) be the sum (RA|0) +
The doubleword in storage addressed by EA is placed
(DS||0b00).
into the even-numbered register of FRTp.
The contents of the even-numbered register of FRSp
The doubleword in storage addressed by EA+8 is
are stored into the doubleword in storage addressed by
placed into the odd-numbered register of FRTp.
EA.
If FRTp is odd, the instruction form is invalid.
The contents of the odd-numbered register of FRSp are
Special Registers Altered: stored into the doubleword in storage addressed by
None EA+8.
If FRSp is odd, the instruction form is invalid.
The contents of register FRB are placed into register The contents of register FRB with bit 0 inverted are
FRT. placed into register FRT.
Special Registers Altered: Special Registers Altered:
CR1 (if Rc=1) CR1 (if Rc=1)
The contents of register FRB with bit 0 set to zero are The contents of register FRB with bit 0 set to the value
placed into register FRT. of bit 0 of register FRA are placed into register FRT.
Special Registers Altered: Special Registers Altered:
CR1 (if Rc=1) CR1 (if Rc=1)
Floating Merge Even Word X-form Floating Merge Odd Word X-form
fmrgew FRT,FRA,FRB fmrgow FRT,FRA,FRB
The contents of word element 0 of FPR[FRA] are placed The contents of word element 1 of FPR[FRA] are placed
into word element 0 of FPR[FRT]. into word element 0 of FPR[FRT].
The contents of word element 0 of FPR[FRB] are placed The contents of word element 1 of FPR[FRB] are placed
into word element 1 of FPR[FRT]. into word element 1 of FPR[FRT].
The floating-point operand in register FRA is added to The floating-point operand in register FRB is subtracted
the floating-point operand in register FRB. from the floating-point operand in register FRA.
If the most significant bit of the resultant significand is If the most significant bit of the resultant significand is
not 1, the result is normalized. The result is rounded to not 1, the result is normalized. The result is rounded to
the target precision under control of the Floating-Point the target precision under control of the Floating-Point
Rounding Control field RN of the FPSCR and placed Rounding Control field RN of the FPSCR and placed
into register FRT. into register FRT.
Floating-point addition is based on exponent compari- The execution of the Floating Subtract instruction is
son and addition of the two significands. The exponents identical to that of Floating Add, except that the con-
of the two operands are compared, and the significand tents of FRB participate in the operation with the sign
accompanying the smaller exponent is shifted right, bit (bit 0) inverted.
with its exponent increased by one for each bit shifted,
FPSCRFPRF is set to the class and sign of the result,
until the two exponents are equal. The two significands
except for Invalid Operation Exceptions when
are then added or subtracted as appropriate, depend-
FPSCRVE=1.
ing on the signs of the operands, to form an intermedi-
ate sum. All 53 bits of the significand as well as all three Special Registers Altered:
guard bits (G, R, and X) enter into the computation. FPRF FR FI
FX OX UX XX
If a carry occurs, the sum’s significand is shifted right
VXSNAN VXISI
one bit position and the exponent is increased by one.
CR1 (if Rc=1)
FPSCRFPRF is set to the class and sign of the result,
except for Invalid Operation Exceptions when
FPSCRVE=1.
Special Registers Altered:
FPRF FR FI
FX OX UX XX
VXSNAN VXISI
CR1 (if Rc=1)
The floating-point operand in register FRA is multiplied The floating-point operand in register FRA is divided by
by the floating-point operand in register FRC. the floating-point operand in register FRB. The remain-
der is not supplied as a result.
If the most significant bit of the resultant significand is
not 1, the result is normalized. The result is rounded to If the most significant bit of the resultant significand is
the target precision under control of the Floating-Point not 1, the result is normalized. The result is rounded to
Rounding Control field RN of the FPSCR and placed the target precision under control of the Floating-Point
into register FRT. Rounding Control field RN of the FPSCR and placed
into register FRT.
Floating-point multiplication is based on exponent addi-
tion and multiplication of the significands. Floating-point division is based on exponent subtrac-
tion and division of the significands.
FPSCRFPRF is set to the class and sign of the result,
except for Invalid Operation Exceptions when FPSCRFPRF is set to the class and sign of the result,
FPSCRVE=1. except for Invalid Operation Exceptions when
FPSCRVE=1 and Zero Divide Exceptions when
Special Registers Altered:
FPSCRZE=1.
FPRF FR FI
FX OX UX XX Special Registers Altered:
VXSNAN VXIMZ FPRF FR FI
CR1 (if Rc=1) FX OX UX ZX XX
VXSNAN VXIDI VXZDZ
CR1 (if Rc=1)
Note
See the Notes that appear with fre[s].
Floating Test for software Divide X-form Floating Test for software Square Root
X-form
ftdiv BF,FRA,FRB
ftsqrt BF,FRB
63 BF // FRA FRB 128 /
0 6 9 11 16 21 31 63 BF // /// FRB 160 /
0 6 9 11 16 21 31
Let e_a be the unbiased exponent of the double-preci-
sion floating-point operand in register FRA. Let e_b be the unbiased exponent of the double-preci-
sion floating-point operand in register FRB.
Let e_b be the unbiased exponent of the double-preci-
sion floating-point operand in register FRB. fe_flag is set to 1 if either of the following conditions
occurs.
fe_flag is set to 1 if any of the following conditions
occurs. The double-precision floating-point operand in reg-
ister FRB is a zero, a NaN, or an infinity, or a nega-
The double-precision floating-point operand in reg-
tive value.
ister FRA is a NaN or an Infinity.
e_b is less than or equal to -970.
The double-precision floating-point operand in reg-
ister FRB is a Zero, a NaN, or an Infinity. Otherwise fe_flag is set to 0.
e_b is less than or equal to -1022. fg_flag is set to 1 if the following condition occurs.
e_b is greater than or equal to 1021. The double-precision floating-point operand in reg-
The double-precision floating-point operand in reg- ister FRB is a Zero, an Infinity, or a denormalized
ister FRA is not a zero and the difference, value.
e_a - e_b, is greater than or equal to 1023. Otherwise fg_flag is set to 0.
The double-precision floating-point operand in reg- If the implementation guarantees a relative error of
ister FRA is not a zero and the difference, frsqrte[s][.] of less than or equal to 2-14, then fl_flag
e_a - e_b, is less than or equal to -1021. is set to 1. Otherwise fl_flag is set to 0.
The double-precision floating-point operand in reg- CR field BF is set to the value
ister FRA is not a zero and e_a is less than or fl_flag || fg_flag || fe_flag || 0b0.
equal to -970
Special Registers Altered:
Otherwise fe_flag is set to 0. CR field BF
fg_flag is set to 1 if either of the following conditions
occurs. Programming Note
The double-precision floating-point operand in reg- ftdiv and ftsqrt are provided to accelerate software
ister FRA is an Infinity. emulation of divide and square root operations, by
performing the requisite special case checking.
The double-precision floating-point operand in reg- Software needs only a single branch, on FE=1 (in
ister FRB is a Zero, an Infinity, or a denormalized CR[BF]), to a special case handler. FG and FL may
value. provide further acceleration opportunities.
Otherwise fg_flag is set to 0.
If the implementation guarantees a relative error of
fre[s][.] of less than or equal to 2-14, then fl_flag is
set to 1. Otherwise fl_flag is set to 0.
CR field BF is set to the value
fl_flag || fg_flag || fe_flag || 0b0.
Special Registers Altered:
CR field BF
Let src be the double-precision floating-point value in Let src be the double-precision floating-point value in
FRB. FRB.
If src is a NaN, then the result is If src is a NaN, then the result is
0x8000_0000_0000_0000, VXCVI is set to 1, and, if src is 0x0000_0000_0000_0000, VXCVI is set to 1, and, if src is
an SNaN, VXSNAN is set to 1. an SNaN, VXSNAN is set to 1.
Otherwise, src is rounded to a floating-point integer Otherwise, src is rounded to a floating-point integer
using the rounding mode Round toward Zero. using the rounding mode specified by RN.
If the rounded value is greater than 263-1, then the If the rounded value is greater than 264-1, then the
result is 0x7FFF_FFFF_FFFF_FFFF and VXCVI is set to 1. result is 0xFFFF_FFFF_FFFF_FFFF, and VXCVI is set to 1.
Otherwise, if the rounded value is less than -263, then Otherwise, if the rounded value is less than 0, then the
the result is 0x8000_0000_0000_0000 and VXCVI is set to result is 0x0000_0000_0000_0000, and VXCVI is set to 1.
1.
Otherwise, the result is the rounded value converted to
Otherwise, the result is the rounded value converted to 64-bit unsigned-integer format, and XX is set to 1 if the
64-bit signed-integer format, and XX is set to 1 if the result is inexact.
result is inexact.
If an enabled Invalid Operation Exception does not
If an enabled Invalid Operation Exception does not occur, then the result is placed into FRT.
occur, then the result is placed into FRT.
The conversion is described fully in Section A.2, “Float-
The conversion is described fully in Section A.2, “Float- ing-Point Convert to Integer Model” on page 779.
ing-Point Convert to Integer Model” on page 779.
Except for enabled Invalid Operation Exceptions, FPRF
Except for enabled Invalid Operation Exceptions, FPRF is undefined. FR is set if the result is incremented when
is undefined. FR is set if the result is incremented when rounded. FI is set if the result is inexact.
rounded. FI is set if the result is inexact.
Special Registers Altered:
Special Registers Altered: FPRF (undefined) FR FI
FPRF (undefined) FR FI FX XX VXSNAN VXCVI
FX XX VXSNAN VXCVI CR1 (if Rc=1)
CR1 (if Rc=1)
Let src be the double-precision floating-point value in Let src be the double-precision floating-point value in
FRB. FRB.
If src is a NaN, then the result is 0x8000_0000, VXCVI is If src is a NaN, then the result is 0x0000_0000, VXCVI is
set to 1, and, if src is an SNaN, VXSNAN is set to 1. set to 1, and, if src is an SNaN, VXSNAN is set to 1.
Otherwise, src is rounded to a floating-point integer Otherwise, src is rounded to a floating-point integer
using the rounding mode Round toward Zero. using the rounding mode specified by RN.
If the rounded value is greater than 231-1, then the If the rounded value is greater than 232-1, then the
result is 0x7FFF_FFFF, and VXCVI is set to 1. result is 0xFFFF_FFFF and VXCVI is set to 1.
Otherwise, if the rounded value is less than -231, then Otherwise, if the rounded value is less than 0, then the
the result is 0x8000_0000, and VXCVI is set to 1. result is 0x0000_0000 and VXCVI is set to 1.
Otherwise, the result is the rounded value converted to Otherwise, the result is the rounded value converted to
32-bit signed-integer format, and XX is set to 1 if the 32-bit unsigned-integer format, and XX is set to 1 if the
result is inexact. result is inexact.
If an enabled Invalid Operation Exception does not If an enabled Invalid Operation Exception does not
occur, then the result is placed into FRT32:63 and FRT0:31 occur, then the result is placed into FRT32:63 and FRT0:31
is undefined, is undefined,
The conversion is described fully in Section A.2, “Float- The conversion is described fully in Section A.2, “Float-
ing-Point Convert to Integer Model” on page 779. ing-Point Convert to Integer Model” on page 779.
Except for enabled Invalid Operation Exceptions, FPRF Except for enabled Invalid Operation Exceptions, FPRF
is undefined. FR is set if the result is incremented when is undefined. FR is set if the result is incremented when
rounded. FI is set if the result is inexact. rounded. FI is set if the result is inexact.
Special Registers Altered: Special Registers Altered:
FPRF (undefined) FR FI FPRF (undefined) FR FI
FX XX FX XX
VXSNAN VXCVI VXSNAN VXCVI
CR1 (if Rc=1) CR1 (if Rc=1)
Floating Convert with round Unsigned Floating Convert with round Signed
Doubleword to Double-Precision format Doubleword to Single-Precision format
X-form X-form
fcfidu FRT,FRB (Rc=0) fcfids FRT,FRB (Rc=0)
fcfidu. FRT,FRB (Rc=1) fcfids. FRT,FRB (Rc=1)
The 64-bit unsigned fixed-point operand in register The 64-bit signed fixed-point operand in register FRB is
FRB is converted to an infinitely precise floating-point converted to an infinitely precise floating-point integer.
integer. The result of the conversion is rounded to dou- The result of the conversion is rounded to single-preci-
ble-precision, using the rounding mode specified by sion, using the rounding mode specified by FPSCRRN,
FPSCRRN, and placed into register FRT. and placed into register FRT.
The conversion is described fully in Section A.3, “Float- The conversion is described fully in Section A.3, “Float-
ing-Point Convert from Integer Model”. ing-Point Convert from Integer Model”.
FPSCRFPRF is set to the class and sign of the result. FR FPSCRFPRF is set to the class and sign of the result. FR
is set if the result is incremented when rounded. is set if the result is incremented when rounded.
FPSCRFI is set if the result is inexact. FPSCRFI is set if the result is inexact.
Special Registers Altered: Special Registers Altered:
FPRF FR FI FPRF FR FI
FX XX FX XX
CR1 (if Rc=1) CR1 (if Rc=1)
Programming Note
Converting a unsigned integer word to single-preci-
sion floating-point can be accomplished by loading
the word from storage using Load Float Word and
Zero Indexed and then using fcfidus.
Floating Round to Integer Nearest X-form Floating Round to Integer Plus X-form
frin FRT,FRB (Rc=0) frip FRT,FRB (Rc=0)
frin. FRT,FRB (Rc=1) frip. FRT,FRB (Rc=1)
The floating-point operand in register FRB is rounded The floating-point operand in register FRB is rounded
to an integral value as follows, with the result placed to an integral value using the rounding mode round
into register FRT. If the sign of the operand is positive, toward +infinity, and the result is placed into register
(FRB) + 0.5 is truncated to an integral value, otherwise FRT.
(FRB) - 0.5 is truncated to an integral value.
FPSCRFPRF is set to the class and sign of the result,
FPSCRFPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when
except for Invalid Operation Exceptions when FPSCRVE = 1.
FPSCRVE = 1.
Special Registers Altered:
Special Registers Altered: FPRF FR (set to 0) FI (set to 0)
FPRF FR (set to 0) FI (set to 0) FX
FX VXSNAN
VXSNAN CR1 (if Rc = 1)
CR1 (if Rc = 1)
Floating Round to Integer Toward Zero Floating Round to Integer Minus X-form
X-form
frim FRT,FRB (Rc=0)
friz FRT,FRB (Rc=0) frim. FRT,FRB (Rc=1)
friz. FRT,FRB (Rc=1)
63 FRT /// FRB 488 Rc
63 FRT /// FRB 424 Rc 0 6 11 16 21 31
0 6 11 16 21 31
The floating-point operand in register FRB is rounded
The floating-point operand in register FRB is rounded to an integral value using the rounding mode round
to an integral value using the rounding mode round toward -infinity, and the result is placed into register
toward zero, and the result is placed into register FRT. FRT.
FPSCRFPRF is set to the class and sign of the result, FPSCRFPRF is set to the class and sign of the result,
except for Invalid Operation Exceptions when except for Invalid Operation Exceptions when
FPSCRVE = 1. FPSCRVE = 1.
Special Registers Altered: Special Registers Altered:
FPRF FR (set to 0) FI (set to 0) FPRF FR (set to 0) FI (set to 0)
FX FX
VXSNAN VXSNAN
CR1 (if Rc = 1) CR1 (if Rc = 1)
This section gives examples of how the Floating Select instruction can be used to implement certain simple forms of
if-then-else constructions, without branching.
The examples show program fragments in an imaginary, C-like, high-level programming language, and the corre-
sponding program fragment using fsel and other Power ISA instructions. In the examples, a, b, x, y, and z are float-
ing-point variables, which are assumed to be in FPRs fa, fb, fx, fy, and fz. FPR fs is assumed to be available for
scratch space.
Warning: Care must be taken in using fsel if IEEE compatibility is required, or if the values being tested can be NaNs
or infinities; see Section .
High-level language: Power ISA: Notes High-level language: Power ISA: Notes
if a 0.0 then fsel fx,fa,fy,fz (1) if a b then x y fsub fs,fa,fb (4,5)
x y else x z fsel fx,fs,fy,fz
else if a > b then x y fsub fs,fb,fa (3,4,5)
x z else x z fsel fx,fs,fz,fy
if a > 0.0 then fneg fs,fa (1,2) if a = b then x y fsub fs,fa,fb (4,5)
x y fsel fx,fs,fz,fy else x z fsel fx,fs,fy,fz
else fneg fs,fs
x z fsel fx,fs,fx,fz
if a = 0.0 then fsel fx,fa,fy,fz (1)
x y fneg fs,fa
else fsel fx,fs,fx,fz
x z
Notes:
The following Notes apply to the preceding examples 1. The unoptimized program affects the VXSNAN bit of
and to the corresponding cases using the other three the FPSCR, and therefore may cause the system
arithmetic relations (<, , and ). They should also be error handler to be invoked if the corresponding
considered when any other use of fsel is contemplated. exception is enabled, while the optimized program
does not affect this bit. This property of the opti-
In these Notes, the “optimized program” is the Power
mized program is incompatible with the IEEE stan-
ISA program shown, and the “unoptimized program”
dard.
(not shown) is the corresponding Power ISA program
that uses fcmpu and Branch Conditional instructions 2. The optimized program gives the incorrect result if
instead of fsel. a is a NaN.
3. The optimized program gives the incorrect result if 5. The optimized program affects the OX, UX, XX, and
a and/or b is a NaN (except that it may give the cor- VXISI bits of the FPSCR, and therefore may cause
rect result in some cases for the minimum and the system error handler to be invoked if the corre-
maximum functions, depending on how those sponding exceptions are enabled, while the unopti-
functions are defined to operate on NaNs). mized program does not affect these bits. This
property of the optimized program is incompatible
4. The optimized program gives the incorrect result if
with the IEEE standard.
a and b are infinities of the same sign. (Here it is
assumed that Invalid Operation Exceptions are
disabled, in which case the result of the subtrac-
tion is a NaN. The analysis is more complicated if
Invalid Operation Exceptions are enabled,
because in that case the target register of the sub-
traction is unchanged.)
The value of the U field is placed into FPSCR field The FPSCR is modified as specified by the FLM, L, and
BF+8*(1-W). W fields.
FPSCRFX is altered only if BF = 0 and W = 0. L=0
Special Registers Altered: The contents of register FRB are placed into the
FPSCR field BF + 8*(1-W) FPSCR under control of the W field and the field
CR1 (if Rc=1) mask specified by FLM. W and the field mask iden-
tify the 4-bit fields affected. Let i be an integer in
Programming Note the range 0-7. If FLMi=1 then FPSCR field k is set
mtfsfi serves as both a basic and an extended to the contents of the corresponding field of regis-
mnemonic. The Assembler will recognize a mtfsfi ter FRB, where k=i+8*(1-W).
mnemonic with three operands as the basic form, L=1
and a mtfsfi mnemonic with two operands as the
extended form. In the extended form the W oper- The contents of register FRB are placed into the
and is omitted and assumed to be 0. FPSCR.
FPSCRFX is not altered implicitly by this instruction.
Programming Note Special Registers Altered:
When FPSCR32:35 is specified, bits 32 (FX) and 35 FPSCR fields selected by mask, L, and W
(OX) are set to the values of U0 and U3 (i.e., even if CR1 (if Rc=1)
this instruction causes OX to change from 0 to 1,
FX is set from U0 and not by the usual rule that FX Programming Note
is set to 1 when an exception bit changes from 0 to mtfsf serves as both a basic and an extended
1). Bits 33 and 34 (FEX and VX) are set according mnemonic. The Assembler will recognize a mtfsf
to the usual rule, given on page 125, and not from mnemonic with four operands as the basic form,
U1:2. and a mtfsf mnemonic with two operands as the
extended form. In the extended form the W and L
operands are omitted and both are assumed to be
0.
Programming Note
If L=1 or if L=0 and FPSCR32:35 is specified, bits 32
(FX) and 35 (OX) are set to the values of (FRB)32
and (FRB)35 (i.e., even if this instruction causes OX
to change from 0 to 1, FX is set from (FRB)32 and
not by the usual rule that FX is set to 1 when an
exception bit changes from 0 to 1). Bits 33 and 34
(FEX and VX) are set according to the usual rule,
given on page 125, and not from (FRB)33:34.
Bit BT+32 of the FPSCR is set to 0. Bit BT+32 of the FPSCR is set to 1.
Special Registers Altered: Special Registers Altered:
FPSCR bit BT+32 FPSCR bits BT+32 and FX
CR1 (if Rc=1) CR1 (if Rc=1)
5.1 Decimal Floating-Point These instructions test the data class, the data
group, the exponent, or the number of significant
(DFP) Facility Overview digits of a DFP operand.
Quantum-adjustment instructions
This chapter describes the behavior of the decimal
floating-point facility, the supported data types, formats, These instructions convert a DFP number to a
and classes, and the usage of registers. Also included result in the form that has the designated expo-
are the execution model, exceptions, and instructions nent, which may be explicitly or implicitly specified.
supported by the decimal floating-point facility.
Conversion instructions
The decimal floating-point (DFP) facility shares the 32
These instructions perform conversion between
floating-point registers (FPRs) and the Floating-Point
different data formats or data types.
Status and Control Register (FPSCR) with the float-
ing-point (BFP) facility. However, the interpretation of Format instructions
data formats in the FPRs, and the meaning of some
These instructions facilitate composing or decom-
control and status bits in the FPSCR are different
posing a DFP operand.
between the BFP and DFP facilities.
These instructions are described in Section 5.6 “DFP
The DFP facility also shares the Condition Register
Instruction Descriptions” on page 193.
(CR) with the fixed-Point facility, the BFP faciltiy, and
the vector facility. The three DFP data formats allow finite numbers to be
represented with different precision and ranges. Spe-
The DFP facility supports three DFP data formats: DFP
cial codes are also provided to represent +Infinity-Infin-
Short (single precision), DFP Long (double precision),
ity Quiet NaN (Not-a-Number), and Signaling NaN.
and DFP Extended (quad precision). Most operations
Operations involving infinities produce results obeying
are performed on DFP Long or DFP Extended format
traditional mathematical conventions. NaNs have no
directly. Support for DFP Short is limited to conversion
mathematical interpretation. The encoding of NaNs
to and from DFP Long. Some DFP instructions operate
provides a diagnostic information field. This diagnostic
on other data types, including signed or unsigned
field may be used to indicate such things as the source
binary fixed-point data, and signed or unsigned decimal
of an uninitialized variable or the reason an invalid
data.
result was produced.
DFP instructions are provided to perform arithmetic,
The DFP processor recognizes a set of DFP excep-
compare, test, quantum-adjustment, conversion, and
tions which are indicated via bits set in the FPSCR.
format operations on operands held in FPRs or FPR
Additionally, the DFP exception actions depend on the
pairs.
setting of the various exception enable bits in the
Arithmetic instructions FPSCR.
These instructions perform addition, subtraction, The following DFP exceptions are detected by the DFP
multiplication, and division operations. processor. The exception status bits in the FPSCR are
indicated in parentheses.
Compare instructions
Invalid Operation Exception (VX)
These instructions perform a comparison opera- SNaN (VXSNAN)
tion on the numerical value of two DFP operands. - (VXISI)
Test instructions (VXIDI)
0 0 (VXZDZ)
% 0 (VXIMZ)
Invalid Compare (VXVC)
Invalid conversion (VXCVI) Data used as a source operand by any Decimal Float-
Zero Divide Exception (ZX) ing-Point instruction that was produced, either directly
Overflow Exception (OX) or indirectly, by a Load Floating-Point Single instruc-
Underflow Exception (UX) tion, a Floating Round to Single-Precision instruction,
Inexact Exception (XX) or a binary floating-point single-precision arithmetic
instruction is boundedly undefined.
Each DFP exception and each category of Invalid
Operation Exception has an exception status bit in the When an even-odd FPR pair is used to hold a 128-bit
FPSCR. In addition, each of the five DFP exceptions operand, the even-numbered FPR is used to hold the
has a corresponding enable bit in the FPSCR. These leftmost doubleword of the operand and the next
enable bits enable or disable the invocation of the sys- higher-numbered FPR is used to hold the rightmost
tem floating-point enabled exception error handler, and doubleword. A DFP instruction designating an
may affect the setting of some exception status bits in odd-numbered FPR for a 128-bit operand is an invalid
the FPSCR. instruction form.
The usage of these bits by the DFP facility differs from Programming Note
the usage by the BFP facility. Section 5.5.10 “DFP
Exceptions” on page 185 provides a detailed discus- The Floating-Point Move instructions can be used
sion of DFP exceptions, including the effects of the to move operands between FPRs.
enable bits.
The bit definitions for the FPSCR are as follows.
exception bits. mcrfs, mtfsfi, mtfsf, mtfsb0, Overflow Exception. See Section 5.5.1. This
and mtfsb1 cannot alter FPSCRVX explicitly. bit is not sticky.
35 Floating-Point Overflow Exception (OX) See the definition of FPSCRXX, above,
See Section 5.5.10.3, “Overflow Exception” regarding the relationship between FPSCRFI
on page 189. and FPSCRXX.
36 Floating-Point Underflow Exception (UX) 47:51 Floating-Point Result Flags (FPRF)
See Section 5.5.10.4, “Underflow Exception” This field is set as described below. For arith-
on page 189. metic, rounding, and conversion instructions,
the field is set based on the result placed into
37 Floating-Point Zero Divide Exception (ZX)
the target register, except that if any portion of
See Section 5.5.10.2, “Zero Divide Exception”
the result is undefined then the value placed
on page 188.
into FPRF is undefined.
38 Floating-Point Inexact Exception (XX)
47 Floating-Point Result Class Descriptor (C)
See Section 5.5.10.5, “Inexact Exception” on
Arithmetic, rounding, and conversion instruc-
page 190.
tions may set this bit with the FPCC bits, to
FPSCRXX is a sticky version of FPSCRFI (see indicate the class of the result as shown in
below). Thus the following rules completely Figure 58 on page 178.
describe how FPSCRXX is set by a given
48:51 Floating-Point Condition Code (FPCC)
instruction.
Floating-point Compare and DFP Test instruc-
If the instruction affects FPSCRFI, the tions set one of the FPCC bits to 1 and the
new value of FPSCRXX is obtained by other three FPCC bits to 0. Arithmetic, round-
ORing the old value of FPSCRXX with ing, and conversion instructions may set the
the new value of FPSCRFI. FPCC bits with the C bit, to indicate the class
If the instruction does not affect of the result as shown in Figure 58 on
FPSCRFI, the value of FPSCRXX is page 178. Note that in this case the high-order
unchanged. three bits of the FPCC retain their relational
significance indicating that the value is less
39 Floating-Point Invalid Operation Excep-
than, greater than, or equal to zero.
tion (SNaN) (VXSNAN)
See Section 5.5.10.1, “Invalid Operation 48 Floating-Point Less Than or Negative (FL
Exception” on page 187. or <)
40 Floating-Point Invalid Operation Excep- 49 Floating-Point Greater Than or Positive
tion (½ - ½) (VXISI) (FG or >)
See Section 5.5.10.1. 50 Floating-Point Equal or Zero (FE or =)
41 Floating-Point Invalid Operation Excep- 51 Floating-Point Unordered or NaN (FU or ?)
tion (½ + ½) (VXIDI)
See Section 5.5.10.1. 52 Reserved
142 Floating-Point Invalid Operation Excep- 53 Floating-Point Invalid Operation Excep-
tion (0+ 0) (VXZDZ) tion (Software Request) (VXSOFT)
See Section 5.5.10.1. This bit can be altered only by mcrfs, mtfsfi,
mtfsf, mtfsb0, or mtfsb1. See
43 Floating-Point Invalid Operation Excep- Section 5.5.10.1, “Invalid Operation Excep-
tion (½ % 0) (VXIMZ) tion” on page 187.
See Section 5.5.10.1.
54 Neither used nor changed by DFP.
44 Floating-Point Invalid Operation Excep-
tion (Invalid Compare) (VXVC) Programming Note
See Section 5.5.10.1.
Although the architecture does not pro-
45 Floating-Point Fraction Rounded (FR) vide a DFP square root instruction, if soft-
The last Arithmetic or Rounding and Conver- ware simulates such an instruction, it
sion instruction incremented the fraction should set bit 54 whenever the source
during rounding. See Section 5.5.1, “Round- operand of the square root function is
ing” on page 182. This bit is not sticky. invalid.
46 Floating-Point Fraction Inexact (FI)
The last Arithmetic or Rounding and Conver- 55 Floating-Point Invalid Operation Excep-
sion instruction either produced an inexact tion (Invalid Conversion) (VXCVI)
result during rounding or caused a disabled See Section 5.5.10.1.
56 Floating-Point Invalid Operation Excep- sign, which is followed by the numeric field. Positive
tion Enable (VE) numbers are represented in true binary notation with
See Section 5.5.10.1. the sign bit set to zero. When the value is zero, all bits
are zeros, including the sign bit. Negative numbers are
57 Floating-Point Overflow Exception Enable
represented in two’s complement binary notation with a
(OE)
one in the sign-bit position.
See Section 5.5.10.3, “Overflow Exception”
on page 189. For decimal data, each byte contains a pair of four-bit
58 Floating-Point Underflow Exception nibbles; each four-bit nibble contains a
Enable (UE) binary-coded-decimal (BCD) code. There are two kinds
See Section 5.5.10.4, “Underflow Exception” of BCD codes: digit code and sign code. For unsigned
on page 189. decimal data, all nibbles contain a digit code (D) as
shown in Figure 59
59 Floating-Point Zero Divide Exception
Enable (ZE) D D D D ... D D D D
See Section 5.5.10.2, “Zero Divide Exception”
on page 188. Figure 59. Format for Unsigned Decimal Data
60 Floating-Point Inexact Exception Enable For signed decimal data, the rightmost nibble contains
(XE) a sign code (S) and all other nibbles contain a digit
See Section 5.5.10.5, “Inexact Exception” on code as shown in Figure 60.
page 190
61 Reserved (not used by DFP) D D D D ... D D D S
62:63 Binary Floating-Point Rounding Control Figure 60. Format for Signed Decimal Data
(RN)
The decimal digits 0-9 have the binary encoding
See Section 5.5.1, “Rounding” on page 182.
0000-1001. The preferred plus-sign codes are 1100
00 Round to Nearest and 1111. The preferred minus sign code is 1101.
01 Round toward Zero These are the sign codes generated for the results of
10 Round toward +Infinity the Decode DPD To BCD instruction. A selection is pro-
11 Round toward -Infinity vided by this instruction to specify which of the two pre-
ferred plus sign codes is to be generated. Alternate
Result sign codes are also recognized as valid in the sign
Flags Result Value Class position: 1010 and 1110 are alternate sign codes for
C < > = ? plus, and 1011 is an alternate sign code for minus.
0 0 0 0 1 Signaling NaN (DFP only) Alternate sign codes are accepted for any source oper-
1 0 0 0 1 Quiet NaN and, but are not generated as a result by the instruc-
0 1 0 0 1 - Infinity tion. When an invalid digit or sign code is detected by
the Encode BCD To DPD instruction, an invalid-opera-
0 1 0 0 0 - Normal Number
1 1 0 0 0 - Subnormal Number
1 0 0 1 0 - Zero
0 0 0 1 0 + Zero
1 0 1 0 0 + Subnormal Number
0 0 1 0 0 + Normal Number
0 0 1 0 1 + Infinity
tion exception occurs. A summary of digit and sign 5.4.1 DFP Data Format
codes are provided in Figure 61.
DFP numbers and NaNs may be represented in FPRs
Binary Recognized As in any of the three data formats: DFP Short, DFP Long,
Code Digit Sign or DFP Extended. The contents of each data format
represent encoded information. Special codes are
0000 0 Invalid assigned to NaNs and infinities. Different formats sup-
0001 1 Invalid port different sizes in both significand and exponent.
0010 2 Invalid Arithmetic, compare, test, quantum-adjustment, and
format instructions are provided for DFP Long and DFP
0011 3 Invalid
Extended formats only.
0100 4 Invalid
The sign is encoded as a one bit binary value. Signifi-
0101 5 Invalid cand is encoded as an unsigned decimal integer in two
0110 6 Invalid distinct parts. The leftmost digit (LMD) of the significand
0111 7 Invalid is encoded as part of the combination field; the remain-
ing digits of the significand are encoded in the trailing
1000 8 Invalid
significand field. The exponent is contained in the com-
1001 9 Invalid bination field in two parts. However, prior to encoding,
1010 Invalid Plus the exponent is converted to an unsigned binary value
1011 Invalid Minus called the biased exponent by adding a bias value
which is a constant for each format. The two leftmost
1100 Invalid Plus (preferred; option 1) bits of the biased exponent are encoded with the left-
1101 Invalid Minus (preferred) most digit of the significand in the leftmost bits of the
1110 Invalid Plus combination field. The rest of the biased exponent
occupies the remaining portion of the combination field.
1111 Invalid Plus (preferred; option 2)
Figure 61. Summary of BCD Digit and Sign Codes 5.4.1.1 Fields Within the Data Format
The DFP data representation comprises three fields, as
5.4 DFP Number Representation diagrammed below for each of the three formats:
cand consists of a number of decimal digits, which are Figure 62. DFP Short format
to the left of the implied decimal point. The rightmost
digit of the significand is called the units digit. The
numerical value of a DFP finite number is represented
S G T
as (-1)sign % significand % 10exponent and the unit
0 1 14 63
value of this number is (1 % 10exponent), which is called
Figure 63. DFP Long format
the quantum.
DFP finite numbers are not normalized. This allows
leading zeros and trailing zeros to exist in the signifi-
S G T
cand. This unnormalized DFP number representation
0 1 18 63
allows some values to have redundant forms; each
form represents the DFP number with a different com- T (continued)
bination of the significand value and the exponent 64 127
value. For example, 1000000 % 105 and 10 % 1010 Figure 64. DFP Extended format
are two different forms of the same numerical value. A
form of this number representation carries information The fields are defined as follows:
about both the numerical value and the quantum of a
Sign bit (S)
DFP finite number.
The sign bit is in bit 0 of each format, and is zero for
The significant digits of a DFP finite number are the dig- plus and one for minus.
its in the significand beginning with the leftmost non-
Combination field (G)
zero digit and ending with the units digit.
As the name implies, this field provides a combination
of the exponent and the left-most digit (LMD) of the sig-
nificand, for finite numbers, or provides a special code
for denoting the value as either a Not-a-Number or an For DFP finite numbers, the rightmost N-5 bits of the
Infinity. N-bit combination field contain the remaining bits of the
biased exponent. For NaNs, bit 5 of the combination
The first 5 bits of the combination field contain the
field is used to distinguish a Quiet NaN from a Signal-
encoding of NaN or infinity, or the two leftmost bits of
ing NaN; the remaining bits in a source operand are
the biased exponent and the leftmost digit (LMD) of the
ignored and they are set to zeros in a target operand by
significand. The following tables show the encoding:
most operations. For infinities, the rightmost N-5 bits of
the N-bit combination field of a source operand are
G0:4 Description ignored and they are set to zeros in a target operand by
most operations.
11111 NaN
11110 Infinity Trailing Significand field (T)
For DFP finite numbers, this field contains the remain-
All others Finite Number (see Figure 66) ing significand digits. For NaNs, this field may be used
Figure 65. Encoding of the G field for Special to contain diagnostic information. For infinities, con-
Symbols tents in this field of a source operand are ignored and
they are set to zeros in a target operand by most opera-
tions. The trailing significand field is a multiple of 10-bit
Leftmost 2-bits of biased exponent
LMD blocks. The multiple depends on the format. Each
00 01 10 10-bit block is called a declet and represents three dec-
0 00000 01000 10000 imal digits, using the Densely Packed Decimal (DPD)
1 00001 01001 10001 encoding defined in Appendix B.
2 00010 01010 10010
3 00011 01011 10011
5.4.1.2 Summary of DFP Data Formats
4 00100 01100 10100 The properties of the three DFP formats are summa-
rized in the following table:.
5 00101 01101 10101
6 00110 01110 10110
7 00111 01111 10111
8 11000 11010 11100
9 11001 11011 11101
Figure 66. Encoding of bits 0:4 of the G field for
Finite Numbers
Format
DFP Short DFP Long DFP Extended
Widths (bits):
Format 32 64 128
Sign (S) 1 1 1
Combination (G) 11 13 17
Trailing Significand (T) 20 50 110
Exponent:
Maximum biased 191 767 12,287
Maximum (Xmax) 90 369 6111
Minimum (Xmin) -101 -398 -6176
Bias 101 398 6176
Precision (p) (digits) 7 16 34
Magnitude:
Maximum normal number (Nmax) (107 - 1) x 1090 (1016 - 1) x 10369 (1034 - 1) x 106111
Minimum normal number (Nmin) 1 x 10-95 1 x 10-383 1 x 10-6143
Minimum subnormal number (Dmin) 1x 10-101 1x 10-398 1 x 10-6176
Figure 67. Summary of DFP Formats
Normally, source QNaNs are propagated during opera- Rounding sets FPSCR bits FR and FI. When an inex-
tions so that they will remain visible at the end. When a act exception occurs, FI is set to one; otherwise, FI is
QNaN is propagated, the sign is preserved, the decimal set to zero. When an inexact exception occurs and if
value of the trailing significand field is preserved but the rounded result is greater in magnitude than the
reencoded using the preferred DPD codes, and the intermediate result, then FR is set to one; otherwise,
contents in the rightmost N-6 bits of the combination FR is set to zero. The exception is the Round to FP
field set to zero, where N is the width of the combina- Integer Without Inexact instruction, which always sets
tion field for the format. FR and FI to zero. Rounding may cause an overflow
exception or underflow exception; it may also cause an
A source SNaN generally causes an invalid-operation
inexact exception.
exception. If the exception is disabled, the SNaN is
converted to the corresponding QNaN and propagated. Refer to Figure 70 below for rounding. Let Z be the
The primary encoding difference between an SNaN intermediate result of a DFP operation. Z may or may
and a QNaN is that bit 5 of an SNaN is 1 and bit 5 of a not fit in the destination’s precision. If Z is exactly one
QNaN is 0. When an SNaN is propagated as a QNaN, of the permissible representable resultant values, then
bit 5 is set to 0, and, just as with QNaN proagation, the the final result in all rounding modes is Z. Otherwise,
sign is preserved, the decimal value of the trailing sig- either Z1 or Z2 is chosen to approximate the result,
nificand field is preserved but reencoded using the pre- where Z1 and Z2 are the next larger and smaller per-
ferred DPD codes, and the contents in the rightmost missible resultant values, respectively.
N-6 bits of the combination field set to zero, where N is
the width of the combination field for the format. For
some format-conversion instructions, a source SNaN
does not cause an invalid-operation exception, and an By increasing |Z|
SNaN is returned as the target operand. Infinitely precise value
For instructions with two source NaNs and a NaN is to By decreasing |Z|
be propagated as the result, do the following.
If there is a QNaN in FRA and an SNaN in FRB,
the SNaN in FRB is propagated. Z2 Z1 0 Z2 Z1
Z Z
Otherwise, propagate the NaN is FRA.
Negative values Positive Values
5.5.4 Arithmetic Operations Infinities with like sign compare equal, that is, +
equals +, and -equals -.
Four arithmetic operations are provided: Add, Subtract,
A NaN compares as unordered with any other operand,
Multiply, and Divide.
whether a finite number, an infinity, or another NaN,
including itself.
5.5.4.1 Sign of Arithmetic Result
Execution of a compare instruction always completes,
The following rules govern the sign of an arithmetic regardless of whether any DFP exception occurs or
operation when the operation does not yield an excep- not, and whether the exception is enabled or not.
tion. They apply even when the operands or results are
zeros or infinities.
5.5.6 Test Operations
The sign of the result of an add operation is the
sign of the source operand having the larger abso- Four kinds of test operations are provided: Test Data
lute value. If both source operands have the same Class, Test Data Group, Test Exponent, and Test Sig-
sign, the sign of the result of an add operation is nificance.
the same as the sign of the source operands.
The Test Data Class instruction examines the contents
When the sum of two operands with opposite signs
of a source operand and determines if the operand is
is exactly zero, the sign of the result is positive in
one of the specified data classes. The test result and
all rounding modes except Round toward -, in
the sign of the source operand are indicated in the
which case the sign is negative.
FPSCRFPCC field and CR field BF.
The sign of the result of the subtract operation x - y
The Test Data Group instruction examines the contents
is the same as the sign of the result of the add
of a source operand and determines if the operand is
operation x + (-y).
one of the specified data groups. The test result and
The sign of the result of a multiply or divide opera- the sign of the source operand are indicated in the
tion is the exclusive-OR of the signs of the source FPSCRFPCC field and CR field BF.
operands.
The Test Exponent instruction compares the exponent
of the two source operands. The test operation ignores
5.5.5 Compare Operations the sign and significand of operands. Infinities compare
equal, and NaNs compare equal. The test result is indi-
Two sets of instructions are provided for comparing cated in the FPSCRFPCC field and CR field BF.
numerical values: Compare Ordered and Compare
Unordered. In the absence of NaNs, these instructions The Test Significance instruction compares the number
work the same. These instructions work differently of significant digits of one source operand with the ref-
when either of the followings is true: erenced number of significant digits in another source
operand. The test result is indicated in the FPSCRFPCC
1. At least one source operand of the instruction is an field and CR field BF.
SNaN and the invalid-operation exception is dis-
abled. Execution of a test instruction does not cause any DFP
2. When there is no SNaN in any source operand, at exception.
least one source operand of the instruction is a
QNaN
5.5.7 Quantum Adjustment Opera-
In case 1, Compare Unordered recognizes an
invalid-operation exception and sets the FPSCRVX-
tions
SNAN flag, but Compare Ordered recognizes the excep- Four kinds of quantum-adjustment operations are pro-
tion and sets both the FPSCRVXSNAN and FPSCRVXVC vided: Quantize, Quantize Immediate, Reround, and
flags. In case 2, Compare Unordered does not recog- Round To FP Integer. Each of them has an immediate
nize an exception, but Compare Ordered recognizes an field which specifies whether the rounding mode in
invalid-operation exception and sets the FPSCRVXVC FPSCR or a different one is to be used.
flag.
The Quantize instruction is used to adjust a DFP num-
For finite numbers, comparisons are performed on val- ber to the form that has the specified target exponent.
ues, that is, all redundant forms of a DFP number are The Quantize Immediate instruction is similar to the
treated equal. Quantize instruction, except that the target exponent is
Comparisons are always exact and cannot cause an specified in a 5-bit immediate field as a signed binary
inexact exception. integer and has a limited range.
Comparison ignores the sign of zero, that is, +0 equals The Reround instruction is used to simulate a DFP
-0. operation of a precision other than that of DFP Long or
DFP Extended. For the Reround instruction to produce
a result which accurately reflects that which would have When converting an infinity between DFP Long and
resulted from a DFP operation of the desired precision DFP Extended, a default infinity with the same sign is
d in the range {1: 33} inclusively, the following condi- produced.
tions must be met:
When converting an SNaN between DFP Short and
The precision of the preceding DFP operation DFP Long, it is converted to an SNaN without causing
must be at least one digit larger than d. an invalid-operation exception. When converting an
The rounding mode used by the preceding DFP SNaN between DFP Long and DFP Extended, the
operation must be round-to-pre- invalid-operation exception occurs; if the invalid-opera-
pare-for-shorter-precision. tion exception is disabled, the result is converted to the
corresponding QNaN.
The Round To FP Integer instruction is used to round a
DFP number to an integer value of the same format.
The target exponent is implicitly specified, and is
5.5.8.2 Data-Type Conversion
greater than or equal to zero. The instructions Convert From Fixed and Convert To
Fixed are provided to convert a number between the
DFP data type and the signed 64-bit binary-integer data
5.5.8 Conversion Operations type.
There are two kinds of conversion operations: data-for- Conversion of a signed 64-bit binary integer to a DFP
mat conversion and data-type conversion. Extended number is always exact.
Conversion of a DFP number to a signed 64-bit binary
5.5.8.1 Data-Format Conversion integer results in an invalid-operation exception when
The instructions Convert To DFP Long and Convert To the converted value does not fit into the target format,
DFP Extended convert DFP operands to wider formats; or when the source operand is an infinity or NaN. When
the instructions Round To DFP Short and Round To the exception is disabled, the most positive integer is
DFP Long convert DFP operands to narrower formats. returned if the source operand is a positive number or
+, and the most negative integer is returned if the
When converting a finite number to a wider format, the source operand is a negative number, -, or NaN.
result is exact. When converting a finite number to a
narrower format, the source operand is rounded to the
target-format precision, which is specified by the 5.5.9 Format Operations
instruction, not by the target register size.
The format instructions are provided to facilitate com-
When converting a finite number, the ideal exponent of posing or decomposing a DFP number, and consist of
the result is the source exponent. Encode BCD To DPD, Decode DPD To BCD, Extract
Biased Exponent, Insert Biased Exponent, Shift Signifi-
Conversion of an infinity or NaN to a different format
cand Left Immediate, and Shift Significand Right Imme-
does not preserve the source combination field. Let N
diate. A source operand of SNaN does not cause an
be the width of the target format’s combination field.
invalid-operation exception, and an SNaN may be pro-
When the result is an infinity or a QNaN, the con- duced as the target operand.
tents of the rightmost N-5 bits of the N-bit target
combination field are set to zero.
5.5.10 DFP Exceptions
When the result is an SNaN, bit 5 of the target for-
mat’s combination field is set to one and the right- This architecture defines the following DFP exceptions:
most N-6 bits of the N-bit target combination field Invalid Operation Exception
are set to zero. SNaN
When converting a NaN to a wider format or when con- -
verting an infinity from DFP Short to DFP Long, digits in
the source trailing significand field are reencoded using 0 0
the preferred DPD codes with sufficient zeros %0
appended on the left to form the target trailing signifi- Invalid Compare
cand field. When converting a NaN to a narrower for- Invalid Conversion
mat or when converting an infinity from DFP Long to Zero Divide Exception
DFP Short, the appropriate number of leftmost digits of Overflow Exception
the source trailing significand field are removed and the Underflow Exception
remaining digits of the field are reencoded using the Inexact Exception
preferred DPD codes to form the target trailing signifi- These exceptions may occur during execution of a DFP
cand field. instruction.
Each DFP exception, and each category of the Invalid Disabled Inexact
Operation Exception, has an exception status bit in the Enabled Overflow
FPSCR. In addition, each DFP exception has a corre- Enabled Underflow
sponding enable bit in the FPSCR. The exception sta- Enabled Inexact
tus bit indicates occurrence of the corresponding
Subsequent sections define each of the DFP excep-
exception. If an exception occurs, the corresponding
tions and specify the action that is taken when they are
enable bit governs the result produced by the instruc-
detected.
tion and, in conjunction with the FE0 and FE1 bits (see
the discussion of FE0 and FE1 below), whether and The IEEE standard specifies the handling of excep-
how the system floating-point enabled exception error tional conditions in terms of “traps” and “trap handlers”.
handler is invoked. (In general, the enabling specified In this architecture, a FPSCR exception enable bit of 1
by the enable bit is of invoking the system error han- causes generation of the result value specified in the
dler, not of permitting the exception to occur. The IEEE standard for the “trap enabled” case: the expecta-
occurrence of an exception depends only on the tion is that the exception will be detected by software,
instruction and its source operands, not on the setting which will revise the result. A FPSCR exception enable
of any control bits. The only deviation from this general bit of 0 causes generation of the “default result” value
rule is that the occurrence of an Underflow Exception specified for the “trap disabled” (or “no trap occurs” or
may depend on the setting of the enable bit.) “trap is not implemented”) case: the expectation is that
the exception will not be detected by software, which
A single instruction, other than mtfsfi or mtfsf, may set
will simply use the default result. The result to be deliv-
more than one exception bit only in the following cases:
ered in each case for each exception is described in
Inexact Exception may be set with Overflow the sections below.
Exception.
Inexact Exception may be set with Underflow The IEEE default behavior when an exception occurs is
Exception. to generate a default value and not to notify software.
Invalid Operation Exception (SNaN) may be set In this architecture, if the IEEE default behavior when
with Invalid Operation Exception (Invalid Compare) an exception occurs is desired for all exceptions, all
for Compare Ordered instructions FPSCR exception enable bits should be set to zero and
Invalid Operation Exception (SNaN) may be set Ignore Exceptions Mode (see below) should be used.
with Invalid Operation Exception (Invalid Conver- In this case the system floating-point enabled exception
sion) for Convert To Fixed instructions. error handler is not invoked, even if DFP exceptions
occur: software can inspect the FPSCR exception bits if
When an exception occurs the instruction execution necessary, to determine whether exceptions have
may be completed or partially completed, depending on occurred.
the exception and the operation.
In this architecture, if software is to be notified that a
For all instructions, except for the Compare and Test given kind of exception has occurred, the correspond-
instructions, the following exceptions cause the instruc- ing FPSCR exception enable bit must be set to one and
tion execution to be partially completed. That is, setting a mode other than Ignore Exceptions Mode must be
of CR field 1(when Rc=1) and exception status flags is used. In this case the system floating-point enabled
performed, but no result is stored into the target FPR or exception error handler is invoked if an enabled DFP
FPR pair. For Compare and Test instructions, instruc- exception occurs. The system floating-point enabled
tion execution is always completed, regardless of exception error handler is also invoked if a Move To
whether any DFP exception occurs or not, and whether FPSCR instruction causes an exception bit and the cor-
the exception is enabled or not. responding enable bit both to be 1; the Move To
Enabled Invalid Operation FPSCR instruction is considered to cause the enabled
Enabled Zero Divide exception.
For the remaining kinds of exceptions, instruction exe- The FE0 and FE1 bits control whether and how the sys-
cution is completed, a result, if specified by the instruc- tem floating-point enabled exception error handler is
tion, is generated and stored into the target FPR or invoked if an enabled DFP exception occurs. The loca-
FPR pair, and appropriate status flags are set. The tion of these bits and the requirements for altering them
result may be a different value for the enabled and dis- are described in Book III, Power ISA Operating Envi-
abled conditions for some of these exceptions. The ronment Architecture. (The system floating-point
kinds of exceptions that deliver a result in target FPR enabled exception error handler is never invoked
are the following:
Disabled Invalid Operation
Disabled Zero Divide
Disabled Overflow
Disabled Underflow
because of a disabled DFP exception.) The effects of exception is not among those listed on page 185 as
the four possible settings of these bits are as follows. suppressed.
For add or subtract operations, magnitude subtrac- When Invalid Operation Exception is disabled
tion of infinities (+) + (-) (FPSCRVE=0) and Invalid Operation occurs, the follow-
Division of infinity by infinity ( ) ing actions are taken:
Division of zero by zero (0 0)
1. One or two Invalid Operation Exceptions are set:
Multiplication of infinity by zero (% 0)
FPSCRVXSNAN (if SNaN)
Ordered comparison involving a NaN (Invalid
FPSCRVXISI (if - )
Compare)
FPSCRVXIDI (if )
The Quantize operation detects that the significand
FPSCRVXZDZ (if 0 0)
associated with the specified target exponent
FPSCRVXIMZ (if x 0)
would have more significant digits than the tar-
FPSCRVXVC (if invalid comp)
get-format precision
FPSCRVXCVI (if invalid conversion)
For the Quantize operation, when one source
2. If the operation is an arithmetic, quantum-adjust-
operand specifies an infinity and the other speci-
ment, Round to DFP Long, Convert to DFP
fies a finite number
Extended, or format
The Reround operation detects that the target
the target FPR is set to a Quiet NaN
exponent associated with the specified target sig-
FPSCRFR FI are set to zero
nificance would be greater than Xmax
FPSCRFPRF is set to indicate the class of the
The Encode BCD To DPD operation detects an
result (Quiet NaN)
invalid BCD digit or sign code
3. If the operation is a Convert To Fixed
The Convert To Fixed operation involving a num-
the target FPR is set as follows:
ber too large in magnitude to be represented in the
FRT is set to the most positive 64-bit binary
target format, or involving a NaN.
integer if the operand in FRB is a positive or
+, and to the most negative 64-bit binary
Programming Note
integer if the operand in FRB is a negative
In addition, an Invalid Operation Exception occurs if number, - , or NaN.
software explicitly requests this by executing an FPSCRFR FI are set to zero
mtfsfi, mtfsf, or mtfsb1 instruction that sets FPSCRFPRF is unchanged
FPSCRVXSOFT to 1 (Software Request). The pur- 4. If the operation is a compare,
pose of FPSCRVXSOFT is to allow software to FPSCRFR FI C are unchanged
cause an Invalid Operation Exception for a condi- FPSCRFPCC is set to reflect unordered
tion that is not necessarily associated with the exe-
cution of a DFP instruction. For example, it might
be set by a program that computes a square root, if 5.5.10.2 Zero Divide Exception
the source operand is negative.
Definition
Action A Zero Divide Exception occurs when a Divide instruc-
tion is executed with a zero divisor value and a finite
The action to be taken depends on the setting of the nonzero dividend value.
Invalid Operation Exception Enable bit of the FPSCR.
When Invalid Operation Exception is enabled Action
(FPSCRVE=1) and Invalid Operation occurs, the follow-
The action to be taken depends on the setting of the
ing actions are taken:
Zero Divide Exception Enable bit of the FPSCR.
1. One or two Invalid Operation Exceptions are set:
When Zero Divide Exception is enabled (FPSCRZE=1)
FPSCRVXSNAN (if SNaN)
and Zero Divide occurs, the following actions are taken:
FPSCRVXISI (if - )
FPSCRVXIDI (if ) 1. Zero Divide Exception is set
FPSCRVXZDZ (if 0 0) FPSCRZX 1
FPSCRVXIMZ (if % 0) 2. The target FPR is unchanged
FPSCRVXVC (if invalid comp) 3. FPSCRFR FI are set to zero
FPSCRVXCVI (if invalid conversion) 4. FPSCRFPRF is unchanged
2. If the operation is an arithmetic, quantum-adjust-
When Zero Divide Exception is disabled (FPSCRZE=0)
ment, conversion, or format,
and Zero Divide occurs, the following actions are taken:
the target FPR is unchanged,
FPSCRFR FI are set to zero, and 1. Zero Divide Exception is set
FPSCRFPRF is unchanged. FPSCRZX 1
3. If the operation is a compare, 2. The target FPR is set to ±, where the sign is
FPSCRFR FI C are unchanged, and determined by the XOR of the signs of the oper-
FPSCRFPCC is set to reflect unordered. ands
3. FPSCRFR FI are set to zero 3. The result is determined by the rounding mode
4. FPSCRFPRF is set to indicate the class and sign of and the sign of the intermediate result as follows.
the result ()
Sign of inter-
5.5.10.3 Overflow Exception mediate result
Definition
Except for Round to FP Integer Without Inexact, the fol-
lowing describes the handling of the IEEE inexact
exception condition. The Round to FP Integer Without
Inexact does not recognize an inexact exception condi-
tion.
An Inexact Exception occurs when either of two condi-
tions occur during rounding:
Result (r)
when Rounding Mode Is
Range of v Case RNE RNTZ RNAZ RAFZ RTMI RFSP RTPI RTZ
1 1 1 1 1
v < -Nmax, q < -Nmax Overflow - - - - - -Nmax -Nmax -Nmax
v < -Nmax, q = -Nmax Normal -Nmax -Nmax -Nmax — — -Nmax -Nmax -Nmax
-Nmax v -Nmin Normal b b b b b b b b
-Nmin < v -Dmin Tiny b* b* b* b* b* b* b b
-Dmin < v < -Dmin/2 Tiny -Dmin -Dmin -Dmin -Dmin -Dmin -Dmin -0 -0
v = -Dmin/2 Tiny -0 -0 -Dmin -Dmin -Dmin -Dmin -0 -0
-Dmin/2 < v < 0 Tiny -0 -0 -0 -Dmin -Dmin -Dmin -0 -0
v=0 EZD +0 +0 +0 +0 -0 +0 +0 +0
0 < v < +Dmin/2 Tiny +0 +0 +0 +Dmin +0 +Dmin +Dmin +0
v = +Dmin/2 Tiny +0 +0 +Dmin +Dmin +0 +Dmin +Dmin +0
+Dmin/2 < v < +Dmin Tiny +Dmin +Dmin +Dmin +Dmin +0 +Dmin +Dmin +0
+Dmin v < +Nmin Tiny b* b* b* b* b b* b* b
+Nmin v +Nmax Normal b b b b b b b b
+Nmax < v, q = +Nmax Normal +Nmax +Nmax +Nmax — +Nmax +Nmax — +Nmax
+Nmax < v, q > +Nmax Overflow +1 +1 +1 +1 +Nmax +Nmax +1 +Nmax
Explanation:
— This situation cannot occur.
1 The normal result r is considered to have been incremented.
* The rounded value, in the extreme case, may be Nmin. In this case, the exception conditions are underflow,
inexact, and incremented.
b The value derived when the precise result v is rounded to the destination’s precision, including both bounded
precision and bounded exponent range.
q The value derived when the precise result v is rounded to the destination’s precision, but assuming an
unbounded exponent range.
r This is the returned value when neither overflow nor underflow is enabled.
v Precise result before rounding, assuming unbounded precision and an unbounded exponent range. For
data-format conversion operations, v is the source value.
Dmin Smallest (in magnitude) representable subnormal number in the target format.
EZD The result r of the exact-zero-difference case applies only to ADD and SUBTRACT with both source operands
having opposite signs. (For ADD and SUBTRACT, when both source operands have the same sign, the sign of
the zero result is the same sign as the sign of the source operands.)
Nmax Largest (in magnitude) representable finite number in the target format.
Nmin Smallest (in magnitude) representable normalized number in the target format.
RAFZ Round away from 0.
RFSP Round to Prepare for Shorter Precision.
RNAZ Round to Nearest, Ties away from 0.
RNE Round to Nearest, Ties to even.
RNTZ Round to Nearest, Ties toward 0.
RTPI Round toward +.
RTMI Round toward -
RTZ Round toward 0.
Figure 76. Rounding and Range Actions (Part 1)
Is r Is r Incre- Is q Is q Incre-
inexact mented inexact mented
Case (rv) OE=1 UE=1 XE=1 (|r|>|v|) (qv) (|q|>|v|) Returned Results and Status Setting*
Overflow Yes1 No — No No — — T(r), OX1, FI1, FR0, XX 1
Overflow Yes1 No — No Yes — — T(r), OX1, FI1, FR1, XX 1
Overflow Yes1 No — Yes No — — T(r), OX1, FI1, FR0, XX 1, TX
Overflow Yes1 No — Yes Yes — — T(r), OX1, FI1, FR1, XX 1, TX
Overflow Yes1 Yes — — — No No1 Tw(q), OX1, FI0, FR0, TO
Overflow Yes1 Yes — — — Yes No Tw(q), OX1, FI1, FR0, XX 1,TO
Overflow Yes1 Yes — — — Yes Yes Tw(q), OX1, FI1, FR1, XX 1,TO
Normal No — — — — — — T(r), FI0, FR0
Normal Yes — — No No — — T(r), FI1, FR0, XX 1
Normal Yes — — No Yes — — T(r), FI1, FR1, XX 1
Normal Yes — — Yes No — — T(r), FI1, FR0, XX 1, TX
Normal Yes — — Yes Yes — — T(r), FI1, FR1, XX 1, TX
Tiny No — No — — — — T(r), FI0, FR0
Tiny No — Yes — — No1 No1 Tw(q), UX1, FI0, FR0, TU
Tiny Yes — No No No — — T(r), UX1, FI1, FR0, XX 1
Tiny Yes — No No Yes — — T(r), UX1, FI1, FR1, XX 1
Tiny Yes — No Yes No — — T(r), UX1, FI1, FR0, XX 1, TX
Tiny Yes — No Yes Yes — — T(r), UX1, FI1, FR1, XX 1, TX
Tiny Yes — Yes — — No No1 Tw(q), UX1, FI0, FR0, TU
Tiny Yes — Yes — — Yes No Tw(q), UX1, FI1, FR0, XX 1,TU
Tiny Yes — Yes — — Yes Yes Tw(q), UX1, FI, FR1, XX 1,TU
Explanation:
— The results do not depend on this condition.
1 This condition is true by virtue of the state of some condition to the left of this column.
* Rounding sets only the FI and FR status flags. Setting of the OX, XX, or UX flag is part of the exception actions. They
are listed here for reference.
Wrap adjust, which depends on the type of operation and operand format. For all operations except Round to DFP
Short and Round to DFP Long, the wrap adjust depends on the target format: = 10, where is 576 for DFP Long,
and 9216 for DFP Extended. For Round to DFP Short and Round to DFP Long, the wrap adjust depends on the source
format: = 10 where is 192 for DFP Long and 3072 for DFP Extended.
q The value derived when the precise result v is rounded to destination’s precision, but assuming an unbounded
exponent range.
r The result as defined in Part 1 of this figure.
v Precise result before rounding, assuming unbounded precision and unbounded exponent range.
FI Floating-Point-Fraction-Inexact status flag, FPSCRFI. This status flag is non-sticky.
FR Floating-Point-Fraction-Rounded status flag, FPSCRFR.
OX Floating-Point Overflow Exception status flag, FPSCRoX.
TO The system floating-point enabled exception error handler is invoked for the overflow exception if the FE0 and FE1 bits
in the machine-state register are set to any mode other than the ignore-exception mode.
TU The system floating-point enabled exception error handler is invoked for the underflow exception if the FE0 and FE1 bits
in the machine-state register are set to any mode other than the ignore-exception mode.
TX The system floating-point enabled exception error handler is invoked for the inexact exception if the FE0 and FE1 bits in
the machine-state register are set to any mode other than the ignore-exception mode.
T(x) The value x is placed at the target operand location.
Tw(x) The wrapped rounded result x is placed at the target operand location. For all operations except data format
conversions, the wrapped rounded result is in the same format and length as normal results at the target location. For
data format conversions, the wrapped rounded result is in the same format and length as the source, but rounded to the
target-format precision.
UX Floating-Point-Underflow-Exception status flag, FPSCRUX
XX Float-Point-Inexact-Exception Status flag, FPSCRXX. The flag is a sticky version of FPSCRFI. When FPSCRFI is set to a
new value, the new value of FPSCRXX is set to the result of ORing the old value of FPSCRXX with the new value of
FPSCRFI.
DFP Multiply [Quad] X-form invalid-operation exception, in which case the field
remains unchanged.
dmul FRT,FRA,FRB (Rc=0)
Special Registers Altered:
dmul. FRT,FRA,FRB (Rc=1)
FPRF FR FI
FX OX UX XX VXSNAN VXIMZ
59 FRT FRA FRB 34 Rc CR1 (if Rc=1)
0 6 11 16 21 31
DFP Divide [Quad] X-form Figure 80 summarizes the actions for Divide. Figure 80
does not include the setting of the FPSCRFPRF field.
ddiv FRT,FRA,FRB (Rc=0) The FPSCRFPRF field is always set to the class and
ddiv. FRT,FRA,FRB (Rc=1) sign of the result, except for an enabled invalid-opera-
tion and enabled zero-divide exceptions, in which
59 FRT FRA FRB 546 Rc cases the field remains unchanged.
0 6 11 16 21 31
Special Registers Altered:
FPRF FR FI
ddivq FRTp,FRAp,FRBp (Rc=0) FX OX UX ZX XX
ddivq. FRTp,FRAp,FRBp (Rc=1) VXSNAN VXIDI VXZDZ
CR1 (if Rc=1)
63 FRTp FRAp FRBp 546 Rc
0 6 11 16 21 31
dcmpuq BF,FRAp,FRBp
dcmpoq BF,FRAp,FRBp
DFP Test Data Class [Quad] Z22-form DFP Test Data Group [Quad] Z22-form
dtstdc BF,FRA,DCM dtstdg BF,FRA,DGM
Let the DCM (Data Class Mask) field specify one or Let the DGM (Data Group Mask) field specify one or
more of the 6 possible data classes, where each bit more of the 6 possible data groups, where each bit cor-
corresponds to a specific data class. responds to a specific data group.
DCM Bit Data Class The term extreme exponent means either the maxi-
mum exponent, Xmax, or the minimum exponent, Xmin.
0 Zero
1 Subnormal DGM Bit Data Group
2 Normal 0 Zero with non-extreme exponent
3 Infinity 1 Zero with extreme exponent
4 Quiet NaN 2 Subnormal or (Normal with extreme expo-
5 Signaling NaN nent)
CR field BF and FPSCRFPCC are set to indicate the 3 Normal with non-extreme exponent and
sign of the DFP operand in FRA[p] and whether the leftmost zero digit in significand
data class of the DFP operand in FRA[p] matches any 4 Normal with non-extreme exponent and
of the data classes specified by DCM. leftmost nonzero digit in significand
5 Special symbol (Infinity, QNaN, or SNaN)
Field Meaning
CR field BF and FPSCRFPCC are set to indicate the
0000 Operand positive with no match sign of the DFP operand in FRA[p] and whether the
0010 Operand positive with match data group of the DFP operand in FRA[p] matches any
1000 Operand negative with no match of the data groups specified by DGM.
1010 Operand negative with match
Field Meaning
Special Registers Altered:
CR field BF 0000 Operand positive with no match
FPCC 0010 Operand positive with match
1000 Operand negative with no match
1010 Operand negative with match
Special Registers Altered:
CR field BF
FPCC
dtstexq BF,FRAp,FRBp
Bit Description
0 Ea < Eb
1 Ea > Eb
2 Ea = Eb
3 Ea ? Eb
Special Registers Altered:
CR field BF
FPCC
DFP Test Significance [Quad] X-form DFP Test Significance Immediate [Quad]
dtstsf BF,FRA,FRB
X-form
dtstsfi BF,UIM,FRB
59 BF / FRA FRB 674 /
0 6 9 10 16 21 31 59 BF / UIM FRB 675 /
0 6 9 10 16 21 31
dtstsfq BF,FRA,FRBp
dtstsfiq BF,UIM,FRBp
63 BF / FRA FRBp 674 /
0 6 9 10 16 21 31 63 BF / UIM FRBp 675 /
0 6 9 10 16 21 31
Let k be the contents of bits 58:63 of FPR[FRA] that
specifies the reference significance. Let the value UIM specify the reference significance.
For dtstsf, let the value NSDb be the number of For dtstsfi, let the value NSDb be the number of
significant digits of the DFP value in FPR[FRB]. significant digits of the DFP value in FPR[FRB].
For dtstsfq, let the value NSDb be the number of For dtstsfiq, let the value NSDb be the number of
significant digits of the DFP value in FPR[FRBp:FRBp+1]. significant digits of the DFP value in FPR[FRBp:FRBp+1].
For this instruction, the number of significant digits of For this instruction, the number of significant digits of
the value 0 is considered to be zero. the value 0 is considered to be zero.
NSDb is compared to k. The result of the compare is NSDb is compared to UIM. The result of the compare is
placed into CR field BF and the FPCC as follows. placed into CR field BF and the FPCC as follows.
1.00 (100 x 10-2) 9. (9 x 100) 9.00 (900 x 10-2) 9.00 (900 x 10-2)
1.00 (100 x 10-2) 49.1234 (491234 x 10-4) 49.12 (4912 x 10-2) 49.12 (4912 x 10-2)
1.00 (100 x 10-2) 49.9876 (499876 x 10-4) 49.98 (4998 x 10-2) 49.99 (4999 x 10-2)
0.01 (1 x 10-2) 49.9876 (499876 x 10-4) 49.98 (4998 x 10-2) 49.99 (4999 x 10-2)
9999999999999999
1.0 (10 x 10-1) QNaN QNaN
(9999999999999999 x 100)
DFP Reround [Quad] Z23-form invalid-operation exception, in which case the field
remains unchanged.
drrnd FRT,FRA,FRB,RMC (Rc=0)
Special Registers Altered:
drrnd. FRT,FRA,FRB,RMC (Rc=1
FPRF FR FI
FX XX
59 FRT FRA FRB RMC 35 Rc VXSNAN VXCVI
0 6 11 16 21 23 31 CR1 (if Rc=1)
If the exponent of the rounded result of the form that dxex f0,FRB
has the specified number of significant digits would be stfd f0,-16(r1)
greater than Xmax, an invalid operation exception drrnd f1,f13,FRB,1 # reround 1 digit toward 0
(VXCVI) occurs. When the invalid-operation exception dxex f1,f1
stfd f1,-8(r1)
occurs, and if the exception is disabled, a default QNaN
lfd r11,-16(r1)
is returned. When an invalid-operation exception lfd r3,-8(r1)
occurs, no inexact exception is recognized. subf r3,r11,r3
In the absence of an invalid-operation exception, if the addi r3,r3,1
result differs in value from the operand in FRB[p], an Given the value 412.34 the result is E(4 x 102) -
inexact exception is recognized. E(41234 x 10-2) + 1 = (398+2) - (398-2) + 1 = 400 -
396 + 1 = 5. Additional code is required to detect
This operation causes neither an overflow nor an
and handle special values like Subnormal, Infinity,
underflow exception.
and NAN.
Figure 90 summarizes the actions for Reround. The
table does not include the setting of the FPSCRFPRF
field. The FPSCRFPRF field is always set to the class
and sign of the result, except for an enabled
Programming Note
DFP Reround combined with DFP Quantize can be
used to left justify a value (as needed by the frexp
function). FRB is the DFP value for which we want
to left justify; f13 contains the reference significance
value 0x0000000000000001; and r1 is the stack
pointer, with free space for a doubleword at offset
-8. This doubleword is used to transfer the biased
exponents from the FPR to a GPR, for integer com-
putation. The adjusted biased exponent (+ format
precision - 1) is transferred back into an FPR so it
can be inserted into the rerounded value. The
adjusted rerounded value becomes the quantize
reference value. The quantize instruction returns
the left justified result in FRT.
drrnd f1,f13,FRB,1 # reround 1 digit toward 0
dxex f0,f1
stfd f0,-8(r1)
lfd r11,-8(r1)
addi r11,r11,15 # biased exp + precision - 1
lfd r11,-8(r1)
stfd f0,-8(r1)
diex f1,f0,f1 # adjust exponent
dqua FRT,f1,f0,1 # quantize to adjusted
exponent
Figure 92 summarizes the actions for Round To FP Note that nearbyint is similar to the rint function but
Integer Without Inexact. The table does not include the without raising the inexact exception. Similarly ceil,
setting of the FPSCRFPRF field. The FPSCRFPRF field floor, round, and trunc do not require the inexact
is always set to the class and sign of the result, except exception.
for an enabled invalid-operation, in which case the field
remains unchanged.
DFP Convert To DFP Long X-form DFP Convert To DFP Extended X-form
dctdp FRT,FRB (Rc=0) dctqpq FRTp,FRB (Rc=0)
dctdp. FRT,FRB (Rc=1) dctqpq. FRTp,FRB (Rc=1)
The DFP short operand in bits 32:63 of FRB is con- The DFP long operand in the FRB is converted to DFP
verted to DFP long format and the converted result is extended format and placed into FRTp. The sign of the
placed into FRT. The sign of the result is the same as result is the same as the sign of the operand in FRB.
the sign of the source operand. The ideal exponent is The ideal exponent is the exponent of the operand in
the exponent of the source operand. FRB.
If the operand in FRB is an SNaN, it is converted to an If the operand in FRB is an SNaN, an invalid-operation
SNaN in DFP long format and does not cause an exception is recognized. If the exception is disabled,
invalid-operation exception. the SNaN is converted to the corresponding QNaN in
DFP extended format.
Special Registers Altered:
FPRF FR (undefined) FI (undefined) Special Registers Altered:
CR1 (if Rc=1) FPRF FR (set to 0) FI (set to 0)
FX
Programming Note VXSNAN
Note that DFP short format is a storage-only for- CR1 (if Rc=1)
mat, Therefore, conversion of a short SNaN to long
format will not cause an exception and the SNaN is
preserved. Subsequent operation on that SNaN in
long format will cause an exception.
DFP Round To DFP Short X-form DFP Round To DFP Long X-form
drsp FRT,FRB (Rc=0) drdpq FRTp,FRBp (Rc=0)
drsp. FRT,FRB (Rc=1) drdpq. FRTp,FRBp (Rc=1)
The DFP long operand in FRB is converted and The DFP extended operand in FRBp is converted and
rounded to DFP short format. The DFP short value is rounded to DFP long format. The result concatenated
extended on the left with zeros to form a 64-bit entity with 64 0s is placed in FRTp. The sign of the result is
and placed into FRT. The sign of the result is the same the same as the sign of the source operand. The ideal
as the sign of the source operand. The ideal exponent exponent is the exponent of the operand in FRBp.
is the exponent of the source operand.
If the operand in FRBp is an SNaN, an invalid-opera-
If the operand in FRB is an SNaN, it is converted to an tion exception is recognized. If the exception is dis-
SNaN in DFP short format and does not cause an abled, the SNaN is converted to the corresponding
invalid-operation exception. QNaN in DFP long format.
Normally, the result is in the format and length of the Normally, the result is in the format and length of the
target. However, when an overflow or underflow target. However, when an overflow or underflow
exception occurs and if the exception is enabled, the exception occurs and if the exception is enabled, the
operation is completed by producing a wrapped operation is completed by producing a wrapped
rounded result in the same format and length as the rounded result in the same format and length as the
source but rounded to the target-format precision. source but rounded to the target-format precision.
Special Registers Altered: Special Registers Altered:
FPRF FR FI FPRF FR FI
FX OX UX XX FX OX UX XX
CR1 (if Rc=1) VXSNAN
CR1 (if Rc=1)
Programming Note
Note that DFP short format is a storage-only for- Programming Note
mat, Therefore, conversion of a long SNaN to short Note that DFP Round to DFP Long, while produc-
format will not cause an exception. Converting a ing a result in DFP long format, actually targets a
long format SNaN to short format is an implied register pair, writing 64 0s in FRTp+1.
move operation.
DFP Convert From Fixed X-form DFP Convert To Fixed [Quad] X-form
dcffix FRT,FRB (Rc=0) dctfix FRT,FRB (Rc=0)
dcffix. FRT,FRB (Rc=1) dctfix. FRT,FRB (Rc=1)
The 64-bit signed binary integer in FRB is converted dctfixq FRT,FRBp (Rc=0)
and rounded to a DFP Long value and placed into FRT. dctfixq. FRT,FRBp (Rc=1)
The sign of the result is the same as the sign of the
source operand. The ideal exponent is zero. 63 FRT /// FRBp 290 Rc
0 6 11 16 21 31
If the source operand is a zero, then a plus zero with a
zero exponent is returned.
The DFP operand in FRB[p] is rounded to an integer
The FPSCRFPRF field is set to the class and sign of the value and is placed into FRT in the 64-bit signed binary
result. integer format. The sign of the result is the same as
Special Registers Altered: the sign of the source operand, except when the source
FPRF FR FI operand is a NaN or a zero.
FX XX
Figure 94 summarizes the actions for Convert To Fixed.
CR1 (if Rc=1)
Special Registers Altered:
DFP Convert From Fixed Quad X-form FPRF (undefined) FR FI
FX XX
dcffixq FRTp,FRB (Rc=0) VXSNAN VXCVI
dcffixq. FRTp,FRB (Rc=1) CR1 (if Rc=1)
DFP Decode DPD To BCD [Quad] X-form DFP Encode BCD To DPD [Quad] X-form
ddedpd SP,FRT,FRB (Rc=0) denbcd S,FRT,FRB (Rc=0)
ddedpd. SP,FRT,FRB (Rc=1) denbcd. S,FRT,FRB (Rc=1)
A portion of the significand of the DFP operand in The signed or unsigned BCD operand, depending on
FRB[p] is converted to a signed or unsigned BCD num- the S field, in FRB[p] is converted to a DFP number.
ber depending on the SP field. For infinity and NaN, the The ideal exponent is zero.
significand is considered to be the contents in the trail-
ing significand field padded on the left by a zero digit. S = 0 (unsigned BCD operand)
The unsigned BCD operand in FRB[p] is converted
SP0 = 0 (unsigned conversion)
to a positive DFP number of the same magnitude
The rightmost 16 digits of the significand (32 digits and the result is placed into FRT[p].
for ddedpdq) is converted to an unsigned BCD
number and the result is placed into FRT[p]. S = 1 (signed BCD operand)
The signed BCD operand in FRB[p] is converted to
SP0 = 1 (signed conversion)
the corresponding DFP number and the result is
The rightmost 15 digits of the significand (31 digits placed into FRT[p].
for ddedpdq) is converted to a signed BCD num-
If an invalid BCD digit or sign code is detected in the
ber with the same sign as the DFP operand, and
source operand, an invalid-operation exception
the result is placed into FRT[p]. If the DFP operand
(VXCVI) occurs.
is negative, the sign is encoded as 0b1101. If the
DFP operand is positive, SP1 indicates which pre- FPSCRFPRF is set to the class and sign of the result,
ferred plus sign encoding is used. If SP1 = 0, the except for Invalid Operation Exception when
plus sign is encoded as 0b1100 (the option-1 pre- FPSCRVE=1.
ferred sign code), otherwise the plus sign is
encoded as 0b1111(the option-2 preferred sign Special Registers Altered:
code). FPRF FR (set to 0) FI (set to 0)
FX
Special Registers Altered: VXCVI
CR1 (if Rc=1) CR1 (if Rc=1)
DFP Extract Biased Exponent [Quad] DFP Insert Biased Exponent [Quad]
X-form X-form
dxex FRT,FRB (Rc=0) diex FRT,FRA,FRB (Rc=0)
dxex. FRT,FRB (Rc=1) diex. FRT,FRA,FRB (Rc=1)
The biased exponent of the operand in FRB[p] is Let a be the value of the 64-bit signed binary integer in
extracted and placed into FRT in the 64-bit signed FRA.
binary integer format. When the operand in FRB is an a Result
infinity, QNaN, or SNaN, a special code is returned. a > MBE1 QNaN
MBE m a m Finite number with biased exponent a
Operand Result
0
Finite Number biased exponent value
a = -1 Infinity
Infinity -1
a = -2 QNaN
QNaN -2
a = -3 SNaN
SNaN -3
a < -3 QNaN
1 Maximum biased exponent for the target format
Special Registers Altered:
CR1 (if Rc=1)
When 0 [ a [ MBE, a is the biased target exponent that
is combined with the sign bit and the significand value
Programming Note
of the DFP operand in FRB[p] to form the DFP result in
The exponent bias value is 101 for DFP Short, 398 FRT[p]. The ideal exponent is the specified target expo-
for DFP Long, and 6176 for DFP Extended. nent.
When a specifies a special code (a < 0 or a > MBE), an
infinity, QNaN, or SNaN is formed in FRT[p] with the
trailing significand field containing the value from the
trailing significand field of the source operand in
FRB[p], and with an N-bit combination field set as fol-
lows.
For an Infinity result,
the leftmost 5 bits are set to 0b11110, and
the rightmost N-5 bits are set to zero.
For a QNaN result,
the leftmost 5 bits are set to 0b11111,
bit 5 is set to zero, and
the rightmost N-5 bits are set to zero.
For an SNaN result,
the leftmost 5 bits are set to 0b11111,
bit 5 is set to one, and
the rightmost N-5 bits are set to zero.
Special Registers Altered:
CR1 (if Rc=1)
Programming Note
The exponent bias value is 101 for DFP Short, 398
for DFP Long, and 6176 for DFP Extended.
Operand a in Actions for Insert Biased Exponent when operand b in FRB[p] specifies
FRA[p] specifies F QNaN SNaN
F N, Rb Z, Rb Z, Rb Z, Rb
I, Rb I, Rb I, Rb I, Rb
QNaN Q, Rb Q, Rb Q, Rb Q, Rb
SNaN S, Rb S, Rb S, Rb S, Rb
Explanation:
F All finite numbers, including zeros
I The combination field in FRT[p] is set to indicate a default Infinity.
N The combination field in FRT[p] is set to the specified biased exponent in FRA and
the leftmost significand digit in FRB[p].
Q The combination field in FRT[p] is set to indicate a default QNaN.
S The combination field in FRT[p] is set to indicate a default SNaN.
Z The combination field in FRT[p] is set to indicate the specific biased exponent in FRA
and a leftmost coefficient digit of zero.
Rb The contents of the trailing significand field in FRB[p] are reencoded using preferred
DPD encodings and the reencoded result is placed in the same field in FRT[p]. The
sign bit of FRB[p] is copied into the sign bit in FRT[p].
Figure 95. Actions: Insert Biased Exponent
DFP Shift Significand Left Immediate DFP Shift Significand Right Immediate
[Quad] Z22-form [Quad] Z22-form
dscli FRT,FRA,SH (Rc=0) dscri FRT,FRA,SH (Rc=0)
dscli. FRT,FRA,SH (Rc=1) dscri. FRT,FRA,SH (Rc=1)
Encoding
FORM
FP
FPCC
FR\FI
SNaN Exception
Rc
Full Name Operands Vs G V Z O U X IE
C
dadd DFP Add X FRT, FRA, FRB Y N RE Y Y V O U X Y Y Y
daddq DFP Add Quad X FRTp, FRAp, FRBp Y N RE Y Y V O U X Y Y Y
dsub DFP Subtract X FRT, FRA, FRB Y N RE Y Y V O U X Y Y Y
dsubq DFP Subtract Quad X FRTp, FRAp, FRBp Y N RE Y Y V O U X Y Y Y
dmul DFP Multiply X FRT, FRA, FRB Y N RE Y Y V O U X Y Y Y
dmulq DFP Multiply Quad X FRTp, FRAp, FRBp Y N RE Y Y V O U X Y Y Y
ddiv DFP Divide X FRT, FRA, FRB Y N RE Y Y V Z O U X Y Y Y
ddivq DFP Divide Quad X FRTp, FRAp, FRBp Y N RE Y Y V Z O U X Y Y Y
dcmpo DFP Compare Ordered X BF, FRA, FRB Y - - N Y V - - N
dcmpoq DFP Compare Ordered Quad X BF, FRAp, FRBp Y - - N Y V - - N
dcmpu DFP Compare Unordered X BF, FRA, FRB Y - - N Y V - - N
dcmpuq DFP Compare Unordered Quad X BF, FRAp, FRBp Y - - N Y V - - N
dtstdc DFP Test Data Class Z22 BF, FRA, DCM N - - N Y1 - - N
dtstdcq DFP Test Data Class Quad Z22 BF, FRAp, DCM N - - N Y1 - - N
dtstdgq DFP Test Data Group Quad Z22 BF, FRAp, DGM N - - N Y 1 - - N
dtstex DFP Test Exponent X BF, FRA, FRB N - - N Y - - N
dtstexq DFP Test Exponent Quad X BF, FRAp, FRBp N - - N Y - - N
dtstsf DFP Test Significance X BF, FRA(FIX), FRB N - - N Y - - N
dtstsfq DFP Test Significance Quad X BF, FRA(FIX), FRBp N - - N Y - - N
dquai DFP Quantize Immediate Z23 TE, FRT, FRB, RMC Y N RE Y Y V X Y Y Y
dquaiq DFP Quantize Immediate Quad Z23 TE, FRTp, FRBp, RMC Y N RE Y Y V X Y Y Y
dqua DFP Quantize Z23 FRT,FRA,FRB,RMC Y N RE Y Y V X Y Y Y
dquaq DFP Quantize Quad Z23 FRTp,FRAp,FRBp, RMC Y N RE Y Y V X Y Y Y
drrnd DFP Reround Z23 FRT,FRA(FIX),FRB,RMC Y N RE Y Y V X Y Y Y
FRTp, FRA(FIX), FRBp, Y
drrndq DFP Reround Quad Z23 Y N RE Y Y V X Y Y
RMC
DFP Round To FP Integer With Y
drintx Z23 R,FRT, FRB,RMC Y N RE Y Y V X Y Y
Inexact
DFP Round To FP Integer With Y
drintxq Z23 R,FRTp,FRBp,RMC Y N RE Y Y V X Y Y
Inexact Quad
DFP Round To FP Integer With- Y
drintn
out Inexact
Z23 R,FRT, FRB,RMC Y N RE Y Y V Y# Y
Mnemonic
FPRF
Encoding
FORM
FP
FPCC
FR\FI
SNaN Exception
Rc
Full Name Operands Vs G V Z O U X IE
C
ddedpdq DFP Decode DPD To BCD Quad X SP, FRTp(BCD), FRBp N - - N N - - Y
Y
denbcd DFP Encode BCD To DPD X S, FRT, FRB (BCD) - N RE Y Y V Y
Y#
denbcdq DFP Encode BCD To DPD Quad X S, FRTp, FRBp (BCD) - N RE Y Y V Y# Y Y
dxex DFP Extract Biased Exponent X FRT (FIX), FRB N N - N N - - Y
DFP Extract Biased Exponent Y
dxexq X FRT (FIX), FRBp N N - N N - -
Quad
diex DFP Insert Biased Exponent X FRT, FRA(FIX), FRB N Y RE N N - Y Y
DFP Insert Biased Exponent Y
diexq X FRTp, FRA(FIX), FRBp N Y RE N N - Y
Quad
DFP Shift Significand Left Imme- Y
dscli Z22 FRT,FRA,SH N Y RE N N - -
diate
DFP Shift Significand Left Imme- Y
dscliq Z22 FRTp,FRAp,SH N Y RE N N - -
diate Quad
DFP Shift Significand Right Imme- Y
dscri Z22 FRT,FRA,SH N Y RE N N - -
diate
DFP Shift Significand Right Imme- Y
dscriq Z22 FRTp,FRAp,SH N Y RE N N - -
diate Quad
Explanation:
# FI and FR are set to zeros for these instructions.
- Not applicable.
1 A unique definition of the FPSCRFPCC field is provided for the instruction.
These are the only instructions that may generate an SNaN and also set the FPSCFPRF field. Since the BFP FPSCRFPRF
2
field does not include a code for SNaN, these instructions cause the need for redefining the FPSCRFPRF field for DFP.
DCM A 6-bit immediate operand specifying the data-class mask.
DGM A 6-bit immediate operand specifying the data-group mask.
G An SNaN can be generated as the target operand.
IE An ideal exponent is defined for the instruction.
FI Setting of the FPSCRFI flag.
FR Setting of the FPSCRFR flag.
N No.
O An overflow exception may be recognized.
Rc The record bit, Rc, is provided to record FPSCR32:35 in CR field 1.
The trailing significand field is reencoded using preferred DPD encodings.The preferred DPD encoding are also used for
RE
propagated NaNs, or converted NaNs and infinities.
RMC A 2-bit immediate operand specifying the rounding-mode control.
S An one-bit immediate operand specifying if the operation is signed or unsigned.
A two-bit immediate operand: one bit specifies if the operation is signed or unsigned and, for signed operations, another
SP
bit specifies which preferred plus sign code is generated.
U An underflow exception may be recognized.
V An invalid-operation exception may be recognized.
Vs An input operand of SNaN causes an invalid-operation exception.
X An inexact exception may be recognized.
Y Yes.
U Undefined
Z A zero-divide exception may be recognized.
Figure 96. Decimal Floating-Point Instructions Summary (Continued)
x.byte[y] =int
Return the contents of byte element y of x. Integer equals relation.
x.byte[y:z] =fp
Return the contents of byte elements y:z of x. Floating-point equals relation.
LENGTH( x ) ConvertSItoBCD(x,y)
Length of x, in bits. If x is the word “element”, Let x be a signed integer quadword.
LENGTH(x) is the length, in bits, of the element Let y indicate the preferred sign code.
implied by the instruction mnemonic.
Return the signed integer value x in packed
x +bcd 1 decimal format.
Increments the magnitude of the packed decimal
value x by 1. if (x<0) then do
x ¬x + 1
x << y sign 0x000D
Result of shifting x left by y bits, filling vacated bits end
with zeros. else
sign (y=0) ? 0x000C : 0x000F
b LENGTH(x)
result (y < b) ? (xy:b-1 ||y0) : b0 result 0
shcnt 4
x >>ui y
Result of shifting x right by y bits, filling vacated do while (x > 0)
bits with zeros. digit x % 10
result result | (digit<<shcnt)
b LENGTH(x) x x ÷ 10
result (y < b) ? (y0 || x0:(b-y)-1) : b0 shcnt shcnt + 4
end
x >> y
Result of shifting x right by y bits, filling vacated return (result | sign)
bits with copies of bit 0 (sign bit) of x.
ConvertBCDtoSI(x)
b LENGTH(x) Let x be a packed decimal value.
result (y<b) ? (yx0 ||x0:(b-y)-1) : bx0
Return the value x in signed integer format.
x <<< y
Result of rotating x left by y bits. result 0
scale 1
b LENGTH(x) sign x.bit[124:127]
result xy:b-1 || x0:y-1 x x >> 4
do while (x > 0)
x >>> y digit x & 0x000F
Returns the contents of x rotated right by y bits. result result + (digit × scale)
x x >> 4
Chop(x, y) scale scale × 10
Result of extending the right-most y bits of x on end
the left with zeros.
if (sign==0x000B) | (sign==0x000D) then
result x & ((1<<y)-1) result ¬result + 1
if (x < y) then
result y
VSCRSAT 1
else if (x > z) then
result z
VSCRSAT 1
else result x
ConvertSPtoSXWsaturate(x, y)
Let x be a single-precision floating-point value.
Let y be an unsigned integer value.
sign x.bit[0]
exp x.bit[1:8]
frac.bit[0:22] x.bit[9:31]
frac.bit[23:30] 0b0000_0000
if (exp==255) & (frac!=0) then return (0x0000_0000) // NaN operand
if (exp==255) & (frac==0) then do // infinity operand
VSCR.SAT 1
return ((sign==1) ? 0x8000_0000 : 0x7FFF_FFFF)
end
if ((exp+Y-127)>30) then do // large operand
VSCR.SAT 1
return ((sign==1) ? 0x8000_0000 : 0x7FFF_FFFF)
end
if ((exp+y-127)<0) then return (0x0000_0000) // -1.0 < value < 1.0 (value rounds to 0)
significand.bit[0] 0b1
significand.bit[1:31] frac
do i = 1 to 31-(exp+Y-127)
significand significand >>ui 1
end
return ((sign==0) ? significand : (¬significand + 1))
ConvertSPtoUXWsaturate(x, y)
Let x be a single-precision floating-point value.
Let y be an unsigned integer value.
sign x.bit[0]
exp x.bit[1:8]
frac.bit[0:22] x.bit[9:31]
frac.bit[23:30] 0b0000_0000
if (exp==255) & (frac!=0) then return (0x0000_0000) // NaN operand
if (exp==255) & (frac==0) then do // infinity operand
VSCR.SAT 1
return ((sign==1) ? 0x0000_0000 : 0xFFFF_FFFF)
end
if ((exp+Y-127)>31) then do // large operand
VSCR.SAT 1
return ((sign==1) ? 0x0000_0000 : 0xFFFF_FFFF)
end
if ((exp+Y-127)<0) then return (0x0000_0000) // -1.0 < value < 1.0
// value rounds to 0
if( sign==1 ) then do // negative operand
VSCR.SAT 1
return (0x0000_0000)
end
significand.bit[0] 0b1
significand.bit[1:31] frac
do i = 1 to 31-(exp+Y-127)
significand = significand >>ui 1
end
return (significand)
ConvertSXWtoSP(x)
Let x be a 32-bit signed integer value.
sign X.bit[0]
exp 32 + 127
frac.bit[0] x.bit[0]
frac.bit[1:32] x.bit[0:31]
if (frac==0) return (0x0000_0000) // Zero Operand
if (sign==1) then frac = ¬frac + 1
do while (frac.bit[0]=0)
frac frac << 1
exp exp - 1
end
lsb frac.bit[23]
gbit frac.bit[24]
xbit frac.bit[25:32]!=0
inc (lsb & gbit) | (gbit & xbit)
frac.bit[0:23] frac.bit[0:23] + inc
if (carry_out=1) then exp exp + 1
result.bit[0] sign
result.bit[1:8] exp
result.bit[9:31] frac.bit[1:23]
return (result)
ConvertUXWtoSP(x)
Let x be a 32-bit unsigned integer value.
exp 31 + 127
frac x.bit[0:31]
if (frac==0) return (0x0000_0000) // Zero Operand
do while( frac0==0 )
frac frac << 1
exp exp - 1
end
lsb frac.bit[23]
gbit frac.bit[24]
xbit frac.bit[25:31]!=0
inc (lsb & gbit) | (gbit & xbit)
frac.bit[0:23] frac.bit[0:23] + inc
if (carry_out=1) then exp exp + 1
result.bit[0] 0b0
result.bit[1:8] exp
result.bit[9:31] frac.bit[1:23]
return (result)
DUP(x,y)
Return the concatenation of y copies x.
DUP(0b01,4) = 0b01010101
DUP(0b001,3) = 0b001001001
EXTZ(x)
Result of extending x on the left with zeros.
b LENGTH(x)
result x & ((1<<b)-1)
InvMixColumns(x)
do c = 0 to 3
result.word[c].byte[0] = 0x0E•x.word[c].byte[0] ^ 0x0B•x.word[c].byte[1] ^ 0x0D•x.word[c].byte[2] ^ 0x09•x.word[c].byte[3]
result.word[c].byte[1] = 0x09•x.word[c].byte[0] ^ 0x0E•x.word[c].byte[1] ^ 0x0B•x.word[c].byte[2] ^ 0x0D•x.word[c].byte[3]
result.word[c].byte[2] = 0x0D•x.word[c].byte[0] ^ 0x09•x.word[c].byte[1] ^ 0x0E•x.word[c].byte[2] ^ 0x0B•x.word[c].byte[3]
result.word[c].byte[3] = 0x0B•x.word[c].byte[0] ^ 0x0D•x.word[c].byte[1] ^ 0x09•x.word[c].byte[2] ^ 0x0E•x.word[c].byte[3]
end
return(result);
where “•” is a GF(28) multiply, a binary polynomial multiplication reduced by modulo 0x11B.
The GF(28) multiply of 0x09•x can be expressed in minimized terms as the following.
product.bit[0] = x.bit[0] ^ x.bit[3]
product.bit[1] = x.bit[1] ^ x.bit[4] ^ x.bit[0]
product.bit[2] = x.bit[2] ^ x.bit[5] ^ x.bit[0] ^ x.bit[1]
product.bit[3] = x.bit[3] ^ x.bit[6] ^ x.bit[1] ^ x.bit[2]
product.bit[4] = x.bit[4] ^ x.bit[7] ^ x.bit[0] ^ x.bit[2]
product.bit[5] = x.bit[5] ^ x.bit[0] ^ x.bit[1]
product.bit[6] = x.bit[6] ^ x.bit[1] ^ x.bit[2]
product.bit[7] = x.bit[7] ^ x.bit[2]
The GF(28) multiply of 0x0B•x can be expressed in minimized terms as the following.
product.bit[0] = x.bit[0] ^ x.bit[1] ^ x.bit[3]
product.bit[1] = x.bit[1] ^ x.bit[2] ^ x.bit[4] ^ x.bit[0]
product.bit[2] = x.bit[2] ^ x.bit[3] ^ x.bit[5] ^ x.bit[0] ^ x.bit[1]
product.bit[3] = x.bit[3] ^ x.bit[4] ^ x.bit[6] ^ x.bit[0] ^ x.bit[1] ^ x.bit[2]
product.bit[4] = x.bit[4] ^ x.bit[5] ^ x.bit[7] ^ x.bit[2]
product.bit[5] = x.bit[5] ^ x.bit[6] ^ x.bit[0] ^ x.bit[1]
product.bit[6] = x.bit[6] ^ x.bit[7] ^ x.bit[0] ^ x.bit[1] ^ x.bit[2]
product.bit[7] = x.bit[7] ^ x.bit[0] ^ x.bit[2]
The GF(28) multiply of 0x0D•x can be expressed in minimized terms as the following.
product.bit[0] = x.bit[0] ^ x.bit[2] ^ x.bit[3]
product.bit[1] = x.bit[1] ^ x.bit[3] ^ x.bit[4] ^ x.bit[0]
product.bit[2] = x.bit[2] ^ x.bit[4] ^ x.bit[5] ^ x.bit[1]
product.bit[3] = x.bit[3] ^ x.bit[5] ^ x.bit[6] ^ x.bit[0] ^ x.bit[2]
product.bit[4] = x.bit[4] ^ x.bit[6] ^ x.bit[7] ^ x.bit[0] ^ x.bit[1] ^ x.bit[2]
product.bit[5] = x.bit[5] ^ x.bit[7] ^ x.bit[1]
product.bit[6] = x.bit[6] ^ x.bit[0] ^ x.bit[2]
product.bit[7] = x.bit[7] ^ x.bit[1] ^ x.bit[2]
The GF(28) multiply of 0x0E•x can be expressed in minimized terms as the following.
product.bit[0] = x.bit[1] ^ x.bit[2] ^ x.bit[3]
product.bit[1] = x.bit[2] ^ x.bit[3] ^ x.bit[4] ^ x.bit[0]
product.bit[2] = x.bit[3] ^ x.bit[4] ^ x.bit[5] ^ x.bit[1]
product.bit[3] = x.bit[4] ^ x.bit[5] ^ x.bit[6] ^ x.bit[2]
product.bit[4] = x.bit[5] ^ x.bit[6] ^ x.bit[7] ^ x.bit[1] ^ x.bit[2]
product.bit[5] = x.bit[6] ^ x.bit[7] ^ x.bit[1]
product.bit[6] = x.bit[7] ^ x.bit[2]
product.bit[7] = x.bit[0] ^ x.bit[1] ^ x.bit[2]
InvShiftRows(x)
result.word[0].byte[0] = x.word[0].byte[0]
result.word[1].byte[0] = x.word[1].byte[0]
result.word[2].byte[0] = x.word[2].byte[0]
result.word[3].byte[0] = x.word[3].byte[0]
result.word[0].byte[1] = x.word[3].byte[1]
result.word[1].byte[1] = x.word[0].byte[1]
result.word[2].byte[1] = x.word[1].byte[1]
result.word[3].byte[1] = x.word[2].byte[1]
result.word[0].byte[2] = x.word[2].byte[2]
result.word[1].byte[2] = x.word[3].byte[2]
result.word[2].byte[2] = x.word[0].byte[2]
result.word[3].byte[2] = x.word[1].byte[2]
result.word[0].byte[3] = x.word[1].byte[3]
result.word[1].byte[3] = x.word[2].byte[3]
result.word[2].byte[3] = x.word[3].byte[3]
result.word[3].byte[3] = x.word[0].byte[3]
return(result)
InvSubBytes(x)
InvSBOX.byte[256] = { 0x52,0x09,0x6A,0xD5,0x30,0x36,0xA5,0x38,0xBF,0x40,0xA3,0x9E,0x81,0xF3,0xD7,0xFB,
0x7C,0xE3,0x39,0x82,0x9B,0x2F,0xFF,0x87,0x34,0x8E,0x43,0x44,0xC4,0xDE,0xE9,0xCB,
0x54,0x7B,0x94,0x32,0xA6,0xC2,0x23,0x3D,0xEE,0x4C,0x95,0x0B,0x42,0xFA,0xC3,0x4E,
0x08,0x2E,0xA1,0x66,0x28,0xD9,0x24,0xB2,0x76,0x5B,0xA2,0x49,0x6D,0x8B,0xD1,0x25,
0x72,0xF8,0xF6,0x64,0x86,0x68,0x98,0x16,0xD4,0xA4,0x5C,0xCC,0x5D,0x65,0xB6,0x92,
0x6C,0x70,0x48,0x50,0xFD,0xED,0xB9,0xDA,0x5E,0x15,0x46,0x57,0xA7,0x8D,0x9D,0x84,
0x90,0xD8,0xAB,0x00,0x8C,0xBC,0xD3,0x0A,0xF7,0xE4,0x58,0x05,0xB8,0xB3,0x45,0x06,
0xD0,0x2C,0x1E,0x8F,0xCA,0x3F,0x0F,0x02,0xC1,0xAF,0xBD,0x03,0x01,0x13,0x8A,0x6B,
0x3A,0x91,0x11,0x41,0x4F,0x67,0xDC,0xEA,0x97,0xF2,0xCF,0xCE,0xF0,0xB4,0xE6,0x73,
0x96,0xAC,0x74,0x22,0xE7,0xAD,0x35,0x85,0xE2,0xF9,0x37,0xE8,0x1C,0x75,0xDF,0x6E,
0x47,0xF1,0x1A,0x71,0x1D,0x29,0xC5,0x89,0x6F,0xB7,0x62,0x0E,0xAA,0x18,0xBE,0x1B,
0xFC,0x56,0x3E,0x4B,0xC6,0xD2,0x79,0x20,0x9A,0xDB,0xC0,0xFE,0x78,0xCD,0x5A,0xF4,
0x1F,0xDD,0xA8,0x33,0x88,0x07,0xC7,0x31,0xB1,0x12,0x10,0x59,0x27,0x80,0xEC,0x5F,
0x60,0x51,0x7F,0xA9,0x19,0xB5,0x4A,0x0D,0x2D,0xE5,0x7A,0x9F,0x93,0xC9,0x9C,0xEF,
0xA0,0xE0,0x3B,0x4D,0xAE,0x2A,0xF5,0xB0,0xC8,0xEB,0xBB,0x3C,0x83,0x53,0x99,0x61,
0x17,0x2B,0x04,0x7E,0xBA,0x77,0xD6,0x26,0xE1,0x69,0x14,0x63,0x55,0x21,0x0C,0x7D }
do i = 0 to 15
result.byte[i] = InvSBOX.byte[x.byte[i]]
end
return(result)
MixColumns(x)
do c = 0 to 3
result.word[c].byte[0] = 0x02•x.word[c].byte[0] ^ 0x03•x.word[c].byte[1] ^ x.word[c].byte[2] ^ x.word[c].byte[3]
result.word[c].byte[1] = x.word[c].byte[0] ^ 0x02•x.word[c].byte[1] ^ 0x03•x.word[c].byte[2] ^ x.word[c].byte[3]
result.word[c].byte[2] = x.word[c].byte[0] ^ x.word[c].byte[1] ^ 0x02•x.word[c].byte[2] ^ 0x03•x.word[c].byte[3]
result.word[c].byte[3] = 0x03•x.word[c].byte[0] ^ x.word[c].byte[1] ^ x.word[c].byte[2] ^ 0x02•x.word[c].byte[3]
end
return(result)
The GF(28) multiply of 0x02•x can be expressed in minimized terms as the following.
product.bit[0] = x.bit[1]
product.bit[1] = x.bit[2]
product.bit[2] = x.bit[3]
product.bit[3] = x.bit[4] ^ x.bit[0]
product.bit[4] = x.bit[5] ^ x.bit[0]
product.bit[5] = x.bit[6]
product.bit[6] = x.bit[7] ^ x.bit[0]
product.bit[7] = x.bit[0]
The GF(28) multiply of 0x03•x can be expressed in minimized terms as the following.
product.bit[0] = x.bit[0] ^ x.bit[1]
product.bit[1] = x.bit[1] ^ x.bit[2]
product.bit[2] = x.bit[2] ^ x.bit[3]
product.bit[3] = x.bit[3] ^ x.bit[4] ^ x.bit[0]
product.bit[4] = x.bit[4] ^ x.bit[5] ^ x.bit[0]
product.bit[5] = x.bit[5] ^ x.bit[6]
product.bit[6] = x.bit[6] ^ x.bit[7] ^ x.bit[0]
product.bit[7] = x.bit[7] ^ x.bit[0]
ShiftRows(x)
result.word[0].byte[0] = x.word[0].byte[0]
result.word[1].byte[0] = x.word[1].byte[0]
result.word[2].byte[0] = x.word[2].byte[0]
result.word[3].byte[0] = x.word[3].byte[0]
result.word[0].byte[1] = x.word[1].byte[1]
result.word[1].byte[1] = x.word[2].byte[1]
result.word[2].byte[1] = x.word[3].byte[1]
result.word[3].byte[1] = x.word[0].byte[1]
result.word[0].byte[2] = x.word[2].byte[2]
result.word[1].byte[2] = x.word[3].byte[2]
result.word[2].byte[2] = x.word[0].byte[2]
result.word[3].byte[2] = x.word[1].byte[2]
result.word[0].byte[3] = x.word[3].byte[3]
result.word[1].byte[3] = x.word[0].byte[3]
result.word[2].byte[3] = x.word[1].byte[3]
result.word[3].byte[3] = x.word[2].byte[3]
return(result)
Signed_BCD_Add(x,y,z)
Let x and y be 31-digit signed decimal values.
If the unbounded result is equal to zero, eq_flag is set to 1. Otherwise, eq_flag is set to 0.
If the unbounded result is greater than zero, gt_flag is set to 1. Otherwise, gt_flag is set to 0.
If the unbounded result is less than zero, lt_flag is set to 1. Otherwise, lt_flag is set to 0.
If the magnitude of the unbounded result is greater than 1031-1, ox_flag is set to 1. Otherwise, ox_flag is set to
0.
If the unbounded result is greater than or equal to zero, the sign code of the result is set to 0b1100 if z=0.
If the unbounded result is greater than or equal to zero, the sign code of the result is set to 0b1111 if z=1.
If the unbounded result is less than zero, the sign code of the result is set to 0b1101.
The low-order 31 digits of the unbounded result magnitude concatented with the sign code are returned.
If either operand is an invalid encoding of a signed decimal value, the result returned is undefined and inv_flag
is set to 1 and lt_flag, gt_flag and eq_flag are set to 0. Otherwise, inv_flag is set to 0.
Signed_BCD_Subtract(x,y,z)
Let x and y be 31-digit signed decimal values.
If the unbounded result is equal to zero, eq_flag is set to 1. Otherwise, eq_flag is set to 0.
If the unbounded result is greater than zero, gt_flag is set to 1. Otherwise, gt_flag is set to 0.
If the unbounded result is less than zero, lt_flag is set to 1. Otherwise, lt_flag is set to 0.
If the magnitude of the unbounded result is greater than 1031-1, ox_flag is set to 1. Otherwise, ox_flag is set to
0.
If the unbounded result is greater than or equal to zero, the sign code of the result is set to 0b1100 if z=0.
If the unbounded result is greater than or equal to zero, the sign code of the result is set to 0b1111 if z=1.
If the unbounded result is less than zero, the sign code of the result is set to 0b1101.
The low-order 31 digits of the unbounded result magnitude concatented with the sign code are returned.
If either operand is an invalid encoding of a signed decimal value, the result returned is undefined and inv_flag
is set to 1 and lt_flag, gt_flag and eq_flag are set to 0. Otherwise, inv_flag is set to 0.
SubBytes(x)
SBOX.byte[0:255] = { 0x63,0x7C,0x77,0x7B,0xF2,0x6B,0x6F,0xC5,0x30,0x01,0x67,0x2B,0xFE,0xD7,0xAB,0x76,
0xCA,0x82,0xC9,0x7D,0xFA,0x59,0x47,0xF0,0xAD,0xD4,0xA2,0xAF,0x9C,0xA4,0x72,0xC0,
0xB7,0xFD,0x93,0x26,0x36,0x3F,0xF7,0xCC,0x34,0xA5,0xE5,0xF1,0x71,0xD8,0x31,0x15,
0x04,0xC7,0x23,0xC3,0x18,0x96,0x05,0x9A,0x07,0x12,0x80,0xE2,0xEB,0x27,0xB2,0x75,
0x09,0x83,0x2C,0x1A,0x1B,0x6E,0x5A,0xA0,0x52,0x3B,0xD6,0xB3,0x29,0xE3,0x2F,0x84,
0x53,0xD1,0x00,0xED,0x20,0xFC,0xB1,0x5B,0x6A,0xCB,0xBE,0x39,0x4A,0x4C,0x58,0xCF,
0xD0,0xEF,0xAA,0xFB,0x43,0x4D,0x33,0x85,0x45,0xF9,0x02,0x7F,0x50,0x3C,0x9F,0xA8,
0x51,0xA3,0x40,0x8F,0x92,0x9D,0x38,0xF5,0xBC,0xB6,0xDA,0x21,0x10,0xFF,0xF3,0xD2,
0xCD,0x0C,0x13,0xEC,0x5F,0x97,0x44,0x17,0xC4,0xA7,0x7E,0x3D,0x64,0x5D,0x19,0x73,
0x60,0x81,0x4F,0xDC,0x22,0x2A,0x90,0x88,0x46,0xEE,0xB8,0x14,0xDE,0x5E,0x0B,0xDB,
0xE0,0x32,0x3A,0x0A,0x49,0x06,0x24,0x5C,0xC2,0xD3,0xAC,0x62,0x91,0x95,0xE4,0x79,
0xE7,0xC8,0x37,0x6D,0x8D,0xD5,0x4E,0xA9,0x6C,0x56,0xF4,0xEA,0x65,0x7A,0xAE,0x08,
0xBA,0x78,0x25,0x2E,0x1C,0xA6,0xB4,0xC6,0xE8,0xDD,0x74,0x1F,0x4B,0xBD,0x8B,0x8A,
0x70,0x3E,0xB5,0x66,0x48,0x03,0xF6,0x0E,0x61,0x35,0x57,0xB9,0x86,0xC1,0x1D,0x9E,
0xE1,0xF8,0x98,0x11,0x69,0xD9,0x8E,0x94,0x9B,0x1E,0x87,0xE9,0xCE,0x55,0x28,0xDF,
0x8C,0xA1,0x89,0x0D,0xBF,0xE6,0x42,0x68,0x41,0x99,0x2D,0x0F,0xB0,0x54,0xBB,0x16 }
do i = 0 to 15
result.byte[i] = SBOX.byte[x.byte[i]]
end
return(result)
RoundToSPIntCeil(x) RoundToSPIntTrunc(x)
The value x if x is a single-precision floating-point The value x if x is a single-precision floating-point
integer; otherwise the smallest single-precision integer; otherwise the largest single-precision
floating-point integer that is greater than x. floating-point integer that is less than x if x>0, or
the smallest single-precision floating-point integer
RoundToSPIntFloor(x) that is greater than x if x<0.
The value x if x is a single-precision floating-point
integer; otherwise the largest single-precision RoundToNearSP(x)
floating-point integer that is less than x. The single-precision floating-point number that is
nearest in value to the infinitely-precise
RoundToSPIntNear(x) floating-point intermediate result x (in case of a tie,
The value x if x is a single-precision floating-point the single-precision floating-point value with the
integer; otherwise the single-precision least-significant bit equal to 0 is used).
floating-point integer that is nearest in value to x
(in case of a tie, the even single-precision
floating-point integer is used).
ReciprocalEstimateSP(x) LogBase2EstimateSP(x)
A single-precision floating-point estimate of the A single-precision floating-point estimate of the
reciprocal of the single-precision floating-point base 2 logarithm of the single-precision
number x. floating-point number x.
ReciprocalSquareRootEstimateSP(x) Power2EstimateSP(x)
A single-precision floating-point estimate of the A single-precision floating-point estimate of the 2
reciprocal of the square root of the raised to the power of the single-precision
single-precision floating-point number x. floating-point number x.
Figure 98. Vector Registers Figure 99. Vector Status and Control Register
Depending on the instruction, the contents of a Vector The bit definitions for the VSCR are as follows.
Register are interpreted as a sequence of equal-length
elements (bytes, halfwords, or words) or as a Bit(s) Description
quadword. Each of the elements is aligned within the
Vector Register, as shown in Figure 97. Many 96:110 Reserved
instructions perform a given operation in parallel on all 111 Vector Non-Java Mode (NJ)
elements in a Vector Register. Depending on the
This bit controls how denormalized values
instruction, a byte, halfword, or word element can be
are handled by Vector Floating-Point
interpreted as a signed-integer, an unsigned-integer,
instructions.
or a logical value; a word element can also be
0 Denormalized values are handled as
interpreted as a single-precision floating-point value. In
specified by Java and the IEEE stan-
the instruction descriptions, phrases like
dard; see Section 6.6.1.
“signed-integer word element” are used as shorthand 1 If an element in a source VR contains a
for “word element, interpreted as a signed-integer”. denormalized value, the value 0 is used
instead. If an instruction causes an
Load and Store instructions are provided that transfer
Underflow Exception, the correspond-
a byte, halfword, word, or quadword between storage
ing element in the target VR is set to 0.
and a Vector Register. In both cases the 0 has the same sign
as the denormalized or underflowing
value.
112:126 Reserved
127 Vector Saturation (SAT)
6.4 Vector Storage Access Oper- A vector quadword Load instruction for which the
effective address (EA) is quadword-aligned places the
ations byte in storage addressed by EA into byte element 0 of
the target Vector Register, the byte in storage
The Vector Storage Access instructions provide the addressed by EA+1 into byte element 1 of the target
means by which data can be copied from storage to a Vector Register, etc. Similarly, a vector quadword
Vector Register or from a Vector Register to storage. Store instruction for which the EA is quadword-aligned
Instructions are provided that access byte, halfword, places the contents of byte element 0 of the source
word, and quadword storage operands. These Vector Register into the byte in storage addressed by
instructions differ from the fixed-point and floating-point EA, the contents of byte element 1 of the source
Storage Access instructions in that vector storage Vector Register into the byte in storage addressed by
operands are assumed to be aligned, and vector EA+1, etc.
storage accesses are performed as if the appropriate
number of low-order bits of the specified effective Figure 100 shows an aligned quadword in storage.
address (EA) were zero. For example, the low-order bit Figure 101 shows the result of loading that quadword
of EA is ignored for halfword Vector Storage Access into a Vector Register or, equivalently, shows the
instructions, and the low-order four bits of EA are contents that must be in a Vector Register if storing
ignored for quadword Vector Storage Access that Vector Register is to produce the storage contents
instructions. The effect is to load or store the storage shown in Figure 100.
operand of the specified length that contains the byte
addressed by EA. When an aligned byte, halfword, or word storage
operand is loaded into a Vector Register, the element
If a storage operand is unaligned, additional (byte, halfword, or word respectively) that receives the
instructions must be used to ensure that the operand is data is the element that would have received the data
correctly placed in a Vector Register or in storage. had the entire aligned quadword containing the
Instructions are provided that shift and merge the storage operand addressed by EA been loaded.
contents of two Vector Registers, such that an Similarly, when a byte, halfword, or word element in a
unaligned quadword storage operand can be copied Vector Register is stored into an aligned storage
between storage and the Vector Registers in a operand (byte, halfword, or word respectively), the
relatively efficient manner. element selected to be stored is the element that
would have been stored into the storage operand
As shown in Figure 97, the elements in Vector addressed by EA had the entire Vector Register been
Registers are numbered; the high-order (or most stored to the aligned quadword containing the storage
significant) byte element is numbered 0 and the operand addressed by EA. (Byte storage operands are
low-order (or least significant) byte element is always aligned.)
numbered 15. The numbering affects the values that
must be placed into the permute control vector for the For aligned byte, halfword, and word storage
Vector Permute instruction in order for that instruction operands, if the corresponding element number is
to achieve the desired effects, as illustrated by the known when the program is written, the appropriate
examples in the following subsections. Vector Splat and Vector Permute instructions can be
used to copy or replicate the data contained in the
storage operand after loading the operand into a
Vector Register. An example of this is given in the
Programming Note for Vector Splat; see page 257.
Another example is to replicate the element across an
entire Vector Register before storing it into an arbitrary
aligned storage operand of the same length; the
replication ensures that the correct data are stored
regardless of the offset of the storage operand in its
aligned quadword in storage.
00 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
10
0 1 2 3 4 5 6 7 8 9 A B C D E F
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
0 1 2 3 4 5 6 7 8 9 A B C D E F
00 00 01 02 03 04
10 05 06 07 08 09 0A 0B 0C 0D 0E 0F
0 1 2 3 4 5 6 7 8 9 A B C D E F
Vhi 00 01 02 03 04
Vlo 05 06 07 08 09 0A 0B 0C 0D 0E 0F
Vt,Vs 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
0 15
Programming Note
The sequence of instructions given below is one stored is assumed to be in Vs; see Figure 103 The
approach that can be used to load the unaligned contents of Vs are shifted right and split into two parts,
quadword shown in Figure 102 into a Vector Register. each of which is merged (using a Vector Select
In Figure 103 Vhi and Vlo are the Vector Registers that instruction) with the current contents of the two aligned
will receive the most significant quadword and least quadwords (MSQ and LSQ) that will contain the most
significant quadword respectively. VRT is the target significant bytes and least significant bytes,
Vector Register. respectively, of the unaligned quadword. The resulting
two quadwords are stored using Store Vector Indexed
After the two quadwords have been loaded into Vhi instructions. A Load Vector for Shift Right instruction is
and Vlo, using Load Vector Indexed instructions, the used to generate the permute control vector that is
alignment is performed by shifting the 32-byte quantity used for the shifting. A single register is used for the
Vhi || Vlo left by an amount determined by the address “shifted” contents; this is possible because the
of the first byte of the desired data. The shifting is done “shifting” is done by means of a right rotation. The
using a Vector Permute instruction for which the rotation is accomplished by specifying Vs for both
permute control vector is generated by a Load Vector components of the Vector Permute instruction. In
for Shift Left instruction. The Load Vector for Shift Left addition, the same permute control vector is used on a
instruction uses the same address specification as the sequence of 1s and 0s to generate the mask used by
Load Vector Indexed instruction that loads the Vhi the Vector Select instructions that do the merging.
register; this is the address of the desired unaligned
quadword. The following sequence of instructions copies the
contents of Vs into an unaligned quadword in storage.
The following sequence of instructions copies the
unaligned quadword storage operand into register Vt. # Assumptions:
# Rb != 0 and contents of Rb = 0xB
# Assumptions: lvx Vhi,0,Rb # load current MSQ
# Rb != 0 and contents of Rb = 0xB lvsr Vp,0,Rb # set permute control vector
lvx Vhi,0,Rb # load MSQ addi Rb,Rb,16 # address of LSQ
lvsl Vp,0,Rb # set permute control vector lvx Vlo,0,Rb # load current LSQ
addi Rb,Rb,16 # address of LSQ vspltisb V1s,-1 # generate the select mask bits
lvx Vlo,0,Rb # load LSQ vspltisb V0s,0
vperm Vt,Vhi,Vlo,Vp # align the data vperm Vmask,V0s,V1s,Vp # generate the select mask
vperm Vs,Vs,Vs,Vp # right rotate the data
The procedure for storing an unaligned quadword is vsel Vlo,Vs,Vlo,Vmask # insert LSQ component
essentially the reverse of the procedure for loading vsel Vhi,Vhi,Vs,Vmask # insert MSQ component
one. However, a read-modify-write sequence is stvx Vlo,0,Rb # store LSQ
addi Rb,Rb,-16 # address of MSQ
required that inserts the source quadword into two
stvx Vhi,0,Rb # store MSQ
aligned quadwords in storage. The quadword to be
then the result is that NaN range were unbounded exceeds that of the largest
else if Invalid Operation exception finite floating-point number for the target
(Section 6.6.2.2) floating-point format.
then the result is the QNaN 0x7FC0_0000
– For either of the two Vector Convert To
If the instruction is either of the two Vector Convert To Fixed-Point Word instructions, either a source
Fixed-Point Word instructions, the corresponding result value is an infinity or the product of a source value
is 0x0000_0000. VSCRSAT is not affected. and 2UIM is a number too large in magnitude to be
represented in the target fixed-point format.
If the instruction is Vector Compare Bounds
Floating-Point, the corresponding result is The following actions are taken:
0xC000_0000.
1. If the vector instruction would normally produce
If the instruction is one of the other Vector floating-point results, the corresponding result is
Floating-Point Compare instructions, the an infinity, where the sign is the sign of the inter-
corresponding result is 0x0000_0000. mediate result.
2. If the instruction is Vector Convert To Unsigned
6.6.2.2 Invalid Operation Exception Fixed-Point Word Saturate, the corresponding
result is 0xFFFF_FFFF if the source value is a
An Invalid Operation Exception occurs when a source positive number or +infinity, and is 0x0000_0000 if
value or set of source values is invalid for the specified the source value is a negative number or -infinity.
operation. The invalid operations are: VSCRSAT is set to 1.
3. If the instruction is Vector Convert To Signed
– Magnitude subtraction of infinities Fixed-Point Word Saturate, the corresponding
– Multiplication of infinity by zero result is 0x7FFF_FFFF if the source value is a pos-
– Reciprocal square root estimate of a negative, itive number or +infinity., and is 0x8000_0000 if the
nonzero number or -infinity. source value is a negative number or -infinity.
– Log base 2 estimate of a negative, nonzero VSCRSAT is set to 1.
number or -infinity.
Load Vector Element Byte Indexed X-form Load Vector Element Halfword Indexed
X-form
lvebx VRT,RA,RB
lvehx VRT,RA,RB
31 VRT RA RB 7 /
0 6 11 16 21 31 31 VRT RA RB 39 /
0 6 11 16 21 31
if RA = 0 then b 0
else b (RA) if RA = 0 then b 0
EA b + (RB) else b (RA)
eb EA60:63 EA (b + (RB)) & 0xFFFF_FFFF_FFFF_FFFE
eb EA60:63
VRT undefined
if Big-Endian byte ordering then VRT undefined
VRT8×eb:8×eb+7 MEM(EA,1) if Big-Endian byte ordering then
else VRT8×eb:8×eb+15 MEM(EA,2)
VRT120-(8×eb):127-(8×eb) MEM(EA,1) else
VRT112-(8×eb):127-(8×eb) MEM(EA,2)
Let the effective address (EA) be the sum
(RA|0)+(RB). Let the effective address (EA) be the result of ANDing
0xFFFF_FFFF_FFFF_FFFE with the sum
Let eb be bits 60:63 of EA. (RA|0)+(RB).
If Big-Endian byte ordering is used for the storage Let eb be bits 60:63 of EA.
access, the contents of the byte in storage at address
EA are placed into byte eb of register VRT. The If Big-Endian byte ordering is used for the storage
remaining bytes in register VRT are set to undefined access,
values.
– the contents of the byte in storage at address EA
If Little-Endian byte ordering is used for the storage are placed into byte eb of register VRT,
access, the contents of the byte in storage at address – the contents of the byte in storage at address
EA are placed into byte 15-eb of register VRT. The EA+1 are placed into byte eb+1 of register VRT,
remaining bytes in register VRT are set to undefined and
values. – the remaining bytes in register VRT are set to
undefined values.
Special Registers Altered:
None If Little-Endian byte ordering is used for the storage
access,
Programming Note
On some implementations, the hint provided by the
lvxl instruction and the corresponding hint
provided by the stvxl, lvepxl, and stvepxl
instructions are applied to the entire cache block
containing the specified quadword. On such
implementations, the effect of the hint may be to
cause that cache block to be considered a likely
candidate for replacement when space is needed
in the cache for a new block. Thus, on such
implementations, the hint should be used with
caution if the cache block containing the quadword
also contains data that may be needed by the
program in the near future. Also, the hint may be
used before the last reference in a sequence of
references to the quadword if the subsequent
references are likely to occur sufficiently soon that
the cache block containing the quadword is not
likely to be displaced from the cache before the
last reference.
Store Vector Element Byte Indexed X-form Store Vector Element Halfword Indexed
X-form
stvebx VRS,RA,RB
stvehx VRS,RA,RB
31 VRS RA RB 135 /
0 6 11 16 21 31 31 VRS RA RB 167 /
0 6 11 16 21 31
if RA = 0 then b 0
else b (RA) if RA = 0 then b 0
EA b + (RB) else b (RA)
eb EA60:63 EA (b + (RB)) & 0xFFFF_FFFF_FFFF_FFFE
if Big-Endian byte ordering then eb EA60:63
MEM(EA,1) VRS8eb:8eb+7 if Big-Endian byte ordering then
else MEM(EA,2) VRS8eb:8eb+15
MEM(EA,1) VRS120-(8eb):127-(8eb) else
MEM(EA,2) VRS112-(8eb):127-(8eb)
Let the effective address (EA) be the sum
(RA|0)+(RB). Let the effective address (EA) be the result of ANDing
0xFFFF_FFFF_FFFF_FFFE with the sum
Let eb be bits 60:63 of EA. (RA|0)+(RB).
If Big-Endian byte ordering is used for the storage Let eb be bits 60:63 of EA.
access, the contents of byte eb of register VRS are
placed in the byte in storage at address EA. If Big-Endian byte ordering is used for the storage
access,
If Little-Endian byte ordering is used for the storage
access, the contents of byte 15-eb of register VRS are – the contents of byte eb of register VRS are placed
placed in the byte in storage at address EA. in the byte in storage at address EA, and
– the contents of byte eb+1 of register VRS are
Special Registers Altered: placed in the byte in storage at address EA+1.
None
If Little-Endian byte ordering is used for the storage
Programming Note access,
Unless bits 60:63 of the address are known to
match the byte offset of the subject byte element in – the contents of byte 15-eb of register VRS are
register VRS, software should use Vector Splat to placed in the byte in storage at address EA, and
splat the subject byte element before performing – the contents of byte 14-eb of register VRS are
the store. placed in the byte in storage at address EA+1.
Programming Note
Unless bits 60:62 of the address are known to
match the halfword offset of the subject halfword
element in register VRS software should use
Vector Splat to splat the subject halfword element
before performing the store.
Programming Note
Unless bits 60:61 of the address are known to
match the word offset of the subject word element
in register VRS, software should use Vector Splat
to splat the subject word element before
performing the store.
Load Vector for Shift Left Indexed X-form Load Vector for Shift Right Indexed
X-form
lvsl VRT,RA,RB
lvsr VRT,RA,RB
31 VRT RA RB 6 /
0 6 11 16 21 31 31 VRT RA RB 38 /
0 6 11 16 21 31
if RA = 0 then b 0
else b (RA) if RA = 0 then b 0
sh (b + (RB))60:63 else b (RA)
switch(sh) sh (b + (RB))60:63
case(0x0): VRT0x000102030405060708090A0B0C0D0E0F switch(sh)
case(0x1): VRT0x0102030405060708090A0B0C0D0E0F10 case(0x0): VRT0x101112131415161718191A1B1C1D1E1F
case(0x2): VRT0x02030405060708090A0B0C0D0E0F1011 case(0x1): VRT0x0F101112131415161718191A1B1C1D1E
case(0x3): VRT0x030405060708090A0B0C0D0E0F101112 case(0x2): VRT0x0E0F101112131415161718191A1B1C1D
case(0x4): VRT0x0405060708090A0B0C0D0E0F10111213 case(0x3): VRT0x0D0E0F101112131415161718191A1B1C
case(0x5): VRT0x05060708090A0B0C0D0E0F1011121314 case(0x4): VRT0x0C0D0E0F101112131415161718191A1B
case(0x6): VRT0x060708090A0B0C0D0E0F101112131415 case(0x5): VRT0x0B0C0D0E0F101112131415161718191A
case(0x7): VRT0x0708090A0B0C0D0E0F10111213141516 case(0x6): VRT0x0A0B0C0D0E0F10111213141516171819
case(0x8): VRT0x08090A0B0C0D0E0F1011121314151617 case(0x7): VRT0x090A0B0C0D0E0F101112131415161718
case(0x9): VRT0x090A0B0C0D0E0F101112131415161718 case(0x8): VRT0x08090A0B0C0D0E0F1011121314151617
case(0xA): VRT0x0A0B0C0D0E0F10111213141516171819 case(0x9): VRT0x0708090A0B0C0D0E0F10111213141516
case(0xB): VRT0x0B0C0D0E0F101112131415161718191A case(0xA): VRT0x060708090A0B0C0D0E0F101112131415
case(0xC): VRT0x0C0D0E0F101112131415161718191A1B case(0xB): VRT0x05060708090A0B0C0D0E0F1011121314
case(0xD): VRT0x0D0E0F101112131415161718191A1B1C case(0xC): VRT0x0405060708090A0B0C0D0E0F10111213
case(0xE): VRT0x0E0F101112131415161718191A1B1C1D case(0xD): VRT0x030405060708090A0B0C0D0E0F101112
case(0xF): VRT0x0F101112131415161718191A1B1C1D1E case(0xE): VRT0x02030405060708090A0B0C0D0E0F1011
case(0xF): VRT0x0102030405060708090A0B0C0D0E0F10
Let sh be bits 60:63 of the sum (RA|0)+(RB). Let X be
the 32 byte value 0x00 || 0x01 || 0x02 || … || 0x1E || Let sh be bits 60:63 of the sum (RA|0)+(RB). Let X be
0x1F. the 32-byte value 0x00 || 0x01 || 0x02 || … || 0x1E ||
0x1F.
Bytes sh to sh+15 of X are placed into VRT.
Bytes 16-sh to 31-sh of X are placed into VRT.
Special Registers Altered:
None Special Registers Altered:
None
Programming Note
Each source word can be considered to be a 32-bit
"pixel", consisting of four 8-bit "channels". Each
target halfword can be considered to be a 16-bit
pixel, consisting of one 1-bit channel and three
5-bit channels. A channel can be used to specify
the intensity of a particular color, such as red,
green, or blue, or to provide other information
needed by the application.
Let doubleword elements 0 and 1 of src be the Let the source vector be the concatenation of the
contents of VR[VRA]. contents of VRA followed by the contents of VRB.
Let doubleword elements 2 and 3 of src be the For each integer value i from 0 to 15, do the following.
contents of VR[VRB]. Signed-integer halfword element i in the source
vector is converted to an signed-integer byte.
For each integer value i from 0 to 3, do the following.
The signed integer value in doubleword element i – If the value of the element is greater than 127
of src is placed into word element i of VR[VRT] in the result saturates to 127
unsigned integer format.
– If the value is greater than 232-1 the result – If the value of the element is less than -128
saturates to 232-1. the result saturates to -128.
– If the value is less than 0 the result saturates
to 0. The low-order 8 bits of the result is placed into
byte element i of VRT.
Special Registers Altered:
SAT Special Registers Altered:
SAT
Vector Pack Signed Halfword Unsigned Vector Pack Signed Word Signed Saturate
Saturate VX-form VX-form
vpkshus VRT,VRA,VRB vpkswss VRT,VRA,VRB
do i=0 to 63 by 8 do i=0 to 63 by 16
src1 EXTS((VRA)i2:i2+15) src1 EXTS((VRA)i2:i2+31)
src2 EXTS((VRB)i2:i2+15) src2 EXTS((VRB)i2:i2+31)
VRTi:i+7 Clamp(src1, 0, 255)24:31 VRTi:i+15 Clamp(src1, -215, 215-1)16:31
VRTi+64:i+71 Clamp(src2, 0, 255)24:31 VRTi+64:i+79 Clamp(src2, -215, 215-1)16:31
end end
Let the source vector be the concatenation of the Let the source vector be the concatenation of the
contents of VRA followed by the contents of VRB. contents of VRA followed by the contents of VRB.
For each integer value i from 0 to 15, do the following. For each integer value i from 0 to 7, do the following.
Signed-integer halfword element i in the source Signed-integer word element i in the source vector
vector is converted to an unsigned-integer byte. is converted to an signed-integer halfword.
– If the value of the element is greater than 255 – If the value of the element is greater than
the result saturates to 255 215-1 the result saturates to 215-1
– If the value of the element is less than 0 the – If the value of the element is less than -215
result saturates to 0. the result saturates to -215.
The low-order 8 bits of the result is placed into The low-order 16 bits of the result is placed into
byte element i of VRT. halfword element i of VRT.
For each integer value i from 0 to 7, do the following. Let doubleword elements 0 and 1 of src be the
Signed-integer word element i in the source vector contents of VR[VRA].
is converted to an unsigned-integer halfword.
Let doubleword elements 2 and 3 of src be the
contents of VR[VRB].
– If the value of the element is greater than
216-1 the result saturates to 216-1
For each integer value i from 0 to 3, do the following.
The unsigned integer value in doubleword
– If the value of the element is less than 0 the
element i of src is placed into word element i of
result saturates to 0.
VR[VRT] in unsigned integer format.
– If the value of the element is greater than
The low-order 16 bits of the result is placed into
232-1 the result saturates to 232-1
halfword element i of VRT.
Special Registers Altered:
Special Registers Altered:
SAT
SAT
Let doubleword elements 0 and 1 of src be the For each integer value i from 0 to 15, do the following.
contents of VR[VRA]. The contents of bits 8:15 of halfword element i in
the source vector is placed into byte element i of
Let doubleword elements 2 and 3 of src be the VRT.
contents of VR[VRB].
Special Registers Altered:
For each integer value i from 0 to 3, do the following. None
The contents of bits 32:63 of doubleword element
i of src is placed into word element i of VR[VRT].
Vector Pack Unsigned Halfword Unsigned Vector Pack Unsigned Word Unsigned
Saturate VX-form Saturate VX-form
vpkuhus VRT,VRA,VRB vpkuwus VRT,VRA,VRB
do i=0 to 63 by 8 do i=0 to 63 by 16
src1 EXTZ((VRA)i×2:i×2+15) src1 EXTZ((VRA)i×2:i×2+31 )
src2 EXTZ((VRB)i×2:i×2+15) src2 EXTZ((VRB)i×2:i×2+31 )
VRTi:i+7 Clamp( src1, 0, 255 )24:31 VRTi:i+15 Clamp( src1, 0, 216-1 )16:31
VRTi+64:i+71 Clamp( src2, 0, 255 )24:31 VRTi+64:i+79 Clamp( src2, 0, 216-1 )16:31
end end
Let the source vector be the concatenation of the Let the source vector be the concatenation of the
contents of VRA followed by the contents of VRB. contents of VRA followed by the contents of VRB.
For each integer value i from 0 to 15, do the following. For each integer value i from 0 to 7, do the following.
Unsigned-integer halfword element i in the source Unsigned-integer word element i in the source
vector is converted to an unsigned-integer byte. vector is converted to an unsigned-integer
halfword.
– If the value of the element is greater than 255
the result saturates to 255. – If the value of the element is greater than
216-1 the result saturates to 216-1.
The low-order 8 bits of the result is placed into byte
element i of VRT. The low-order 16 bits of the result is placed into
halfword element i of VRT.
Special Registers Altered:
SAT Special Registers Altered:
SAT
do i=0 to 63 by 16
VRTi:i+15 (VRA)i×2+16:i×2+31
VRTi+64:i+79 (VRB)i×2+16:i×2+31
end
Vector Unpack High Pixel VX-form Vector Unpack Low Pixel VX-form
vupkhpx VRT,VRB vupklpx VRT,VRB
do i=0 to 63 by 16 do i=0 to 63 by 16
VRTi×2:i×2+7 EXTS((VRB)i ) VRTi×2:i×2+7 EXTS((VRB)i+64 )
VRTi×2+8:i×2+15 EXTZ((VRB)i+1:i+5 ) VRTi×2+8:i×2+15 EXTZ((VRB)i+65:i+69)
VRTi×2+16:i×2+23 EXTZ((VRB)i+6:i+10 ) VRTi×2+16:i×2+23 EXTZ((VRB)i+70:i+74)
VRTi×2+24:i×2+31 EXTZ((VRB)i+11:i+15) VRTi×2+24:i×2+31 EXTZ((VRB)i+75:i+79)
end end
For each vector element i from 0 to 3, do the following. For each vector element i from 0 to 3, do the following.
Halfword element i in VRB is unpacked as follows. Halfword element i+4 in VRB is unpacked as
follows.
– sign-extend bit 0 of the halfword to 8 bits
– zero-extend bits 1:5 of the halfword to 8 bits – sign-extend bit 0 of the halfword to 8 bits
– zero-extend bits 6:10 of the halfword to 8 bits – zero-extend bits 1:5 of the halfword to 8 bits
– zero-extend bits 11:15 of the halfword to 8 – zero-extend bits 6:10 of the halfword to 8 bits
bits – zero-extend bits 11:15 of the halfword to 8
bits
The result is placed in word element i of VRT.
The result is placed in word element i of VRT.
Special Registers Altered:
None Special Registers Altered:
None
Programming Note
The source and target elements can be considered
to be 16-bit and 32-bit “pixels” respectively, having
the formats described in the Programming Note for
the Vector Pack Pixel instruction on page 248.
Programming Note
Notice that the unpacking done by the Vector
Unpack Pixel instructions does not reverse the
packing done by the Vector Pack Pixel instruction.
Specifically, if a 16-bit pixel is unpacked to a 32-bit
pixel which is then packed to a 16-bit pixel, the
resulting 16-bit pixel will not, in general, be equal
to the original 16-bit pixel (because, for each
channel except the first, Vector Unpack Pixel
inserts high-order bits while Vector Pack Pixel
discards low-order bits).
Vector Unpack High Signed Byte VX-form Vector Unpack Low Signed Byte VX-form
vupkhsb VRT,VRB vupklsb VRT,VRB
do i=0 to 63 by 8 do i=0 to 63 by 8
VRTi×2:i×2+15 EXTS((VRB)i:i+7) VRTi×2:i×2+15 EXTS((VRB)i+64:i+71)
end end
For each vector element i from 0 to 7, do the following. For each vector element i from 0 to 7, do the following.
Signed-integer byte element i in VRB is Signed-integer byte element i+8 in VRB is
sign-extended to produce a signed-integer sign-extended to produce a signed-integer
halfword and placed into halfword element i in halfword and placed into halfword element i in
VRT. VRT.
Vector Unpack High Signed Halfword Vector Unpack Low Signed Halfword
VX-form VX-form
vupkhsh VRT,VRB vupklsh VRT,VRB
do i=0 to 63 by 16 do i=0 to 63 by 16
VRTi×2:i×2+31 EXTS((VRB)i:i+15) VRTi×2:i×2+31 EXTS((VRB)i+64:i+79)
end end
For each vector element i from 0 to 3, do the following. For each vector element i from 0 to 3, do the following.
Signed-integer halfword element i in VRB is Signed-integer halfword element i+4 in VRB is
sign-extended to produce a signed-integer word sign-extended to produce a signed-integer word
and placed into word element i in VRT. and placed into word element i in VRT.
Vector Unpack High Signed Word VX-form Vector Unpack Low Signed Word VX-form
vupkhsw VRT,VRB vupklsw VRT,VRB
For each integer value i from 0 to 1, do the following. For each integer value i from 0 to 1, do the following.
The signed integer value in word element i of The signed integer value in word element i+2 of
VR[VRB] is sign-extended and placed into VR[VRB] is sign-extended and placed into
doubleword element i of VR[VRT]. doubleword element i of VR[VRT].
Vector Merge High Byte VX-form Vector Merge Low Byte VX-form
vmrghb VRT,VRA,VRB vmrglb VRT,VRA,VRB
do i=0 to 63 by 8 do i=0 to 63 by 8
VRTi×2:i×2+7 (VRA)i:i+7 VRTi×2:i×2+7 (VRA)i+64:i+71
VRTi×2+8:i×2+15 (VRB)i:i+7 VRTi×2+8:i×2+15 (VRB)i+64:i+71
end end
For each vector element i from 0 to 7, do the following. For each vector element i from 0 to 7, do the following.
Byte element i in VRA is placed into byte element Byte element i+8 in VRA is placed into byte
2×i in VRT. element 2×i in VRT.
Byte element i in VRB is placed into byte element Byte element i+8 in VRB is placed into byte
2×i+1 in VRT. element 2×i+1 in VRT.
Vector Merge High Halfword VX-form Vector Merge Low Halfword VX-form
vmrghh VRT,VRA,VRB vmrglh VRT,VRA,VRB
do i=0 to 63 by 16 do i=0 to 63 by 16
VRTi×2:i×2+15 (VRA)i:i+15 VRTi×2:i×2+15 (VRA)i+64:i+79
VRTi×2+16:i×2+31 (VRB)i:i+15 VRTi×2+16:i×2+31 (VRB)i+64:i+79
end end
For each vector element i from 0 to 3, do the following. For each vector element i from 0 to 3, do the following.
Halfword element i in VRA is placed into halfword Halfword element i+4 in VRA is placed into
element 2×i in VRT. halfword element 2×i in VRT.
Halfword element i in VRB is placed into halfword Halfword element i+4 in VRB is placed into
element 2×i+1 in VRT. halfword element 2×i+1 in VRT.
Vector Merge High Word VX-form Vector Merge Low Word VX-form
vmrghw VRT,VRA,VRB vmrglw VRT,VRA,VRB
do i=0 to 63 by 32 do i=0 to 63 by 32
VRTi×2:i×2+31 (VRA)i:i+31 VRTi×2:i×2+31 (VRA)i+64:i+95
VRTi×2+32:i×2+63 (VRB)i:i+31 VRTi×2+32:i×2+63 (VRB)i+64:i+95
end end
For each vector element i from 0 to 1, do the following. For each vector element i from 0 to 1, do the following.
Word element i in VRA is placed into word Word element i+2 in VRA is placed into word
element 2×i in VRT. element 2×i in VRT.
Word element i in VRB is placed into word Word element i+2 in VRB is placed into word
element 2×i+1 in VRT. element 2×i+1 in VRT.
The word elements in the high-order half of VRA are Special Registers Altered:
placed, in the same order, into the even-numbered None
word elements of VRT. The word elements in the
high-order half of VRB are placed, in the same order,
into the odd-numbered word elements of VRT.
Vector Merge Even Word VX-form Vector Merge Odd Word VX-form
vmrgew VRT,VRA,VRB vmrgow VRT,VRA,VRB
The contents of word element 0 of VR[VRA] are placed The contents of word element 1 of VR[VRA] are placed
into word element 0 of VR[VRT]. into word element 0 of VR[VRT].
The contents of word element 0 of VR[VRB] are placed The contents of word element 1 of VR[VRB] are placed
into word element 1 of VR[VRT]. into word element 1 of VR[VRT].
The contents of word element 2 of VR[VRA] are placed The contents of word element 3 of VR[VRA] are placed
into word element 2 of VR[VRT]. into word element 2 of VR[VRT].
The contents of word element 2 of VR[VRB] are placed The contents of word element 3 of VR[VRB] are placed
into word element 3 of VR[VRT]. into word element 3 of VR[VRT].
vmrgew is treated as a Vector instruction in terms of vmrgow is treated as a Vector instruction in terms of
resource availability. resource availability.
For each integer value i from 0 to 15, do the following. For each integer value i from 0 to 3, do the following.
The contents of byte element UIM in VRB are The contents of word element UIM in VRB are
placed into byte element i of VRT. placed into word element i of VRT.
b UIM || 0b0000
do i=0 to 127 by 16
VRTi:i+15 (VRB)b:b+15
end
do i=0 to 127 by 8
VRTi:i+7 EXTS(SIM, 8)
end
do i=0 to 127 by 16
VRTi:i+15 EXTS(SIM, 16)
end
do i=0 to 127 by 32
VRTi:i+31 EXTS(SIM, 32)
end
do i = 0 to 15 do i = 0 to 15
index VR[VRC].byte[i].bit[3:7] index VR[VRC].byte[i].bit[3:7]
VR[VRT].byte[i] src.byte[index] VR[VRT].byte[i] src.byte[31-index]
end end
Let the source vector be the concatenation of the Let the source vector be the concatenation of the
contents of VR[VRA] followed by the contents of contents of VR[VRA] followed by the contents of
VR[VRB]. VR[VRB].
For each integer value i from 0 to 15, do the following. For each integer value i from 0 to 15, do the following.
Let index be the value specified by bits 3:7 of byte Let index be the value specified by bits 3:7 of byte
element i of VR[VRC]. element i of VR[VRC].
The contents of byte element index of src are The contents of byte element 31-index of src are
placed into byte element i of VR[VRT]. placed into byte element i of VR[VRT].
Programming Note
See the Programming Notes with the Load Vector
for Shift Left and Load Vector for Shift Right
instructions on page 247 for examples of uses of
vperm.
do i = 0 to 127
mask VR[VRC].bit[i]
VR[VRT].bit[i] (mask=0) ? VR[VRA].bit[i] : VR[VRB].bit[i]
end
src.qword[0] VR[VRA]
src.qword[1] VR[VRB]
VR[VRT] src.byte[SHB:SHB+15]
t1 t1
do i = 0 to 15 do i = 0 to 15
t t & (VR[VRB].byte[i].bit[5:7] = sh) t t & (VR[VRB].byte[i].bit[5:7]=sh)
end end
if t=1 then if t=1 then
VR[VRT] VR[VRA] << sh VR[VRT] VR[VRA] >> sh
else else
VR[VRT] undefined VR[VRT] undefined
The contents of VR[VRA] are shifted left by the number The contents of VR[VRA] are shifted right by the number
of bits specified in bits 125:127 of VR[VRB]. of bits specified in bits 125:127 of VR[VRB].
– Bits shifted out of bit 0 are lost. – Bits shifted out of bit 127 are lost.
– Zeros are supplied to the vacated bits on the right. – Zeros are supplied to the vacated bits on the left.
The result is place into VR[VRT], except if, for any byte The result is place into VR[VRT], except if, for any byte
element in register VR[VRB], the low-order 3 bits are not element in register VR[VRB], the low-order 3 bits are not
equal to the shift amount, then VR[VRT] is undefined. equal to the shift amount, then VR[VRT] is undefined.
Vector Shift Left by Octet VX-form Vector Shift Right by Octet VX-form
vslo VRT,VRA,VRB vsro VRT,VRA,VRB
The contents of VR[VRA] are shifted left by the number The contents of VR[VRA] are shifted right by the number
of bytes specified in bits 121:124 of VR[VRB]. of bytes specified in bits 121:124 of VR[VRB].
– Bytes shifted out of byte 0 are lost. – Bytes shifted out of byte 15 are lost.
– Zeros are supplied to the vacated bytes on the – Zeros are supplied to the vacated bytes on the
right. left.
The result is placed into VR[VRT]. The result is placed into VR[VRT].
Programming Note
A double-register shift by a dynamically specified number of bits (0-127) can be performed in six instructions.
The following example shifts Vw || Vx left by the number of bits specified in Vy and places the high-order 128 bits
of the result into Vz.
Vector Shift Left Variable VX-form Vector Shift Right Variable VX-form
vslv VRT,VRA,VRB vsrv VRT,VRA,VRB
do i = 0 to 15 do i = 0 to 15
sh VR[VRB].byte[i].bit[5:7] sh VR[VRB].byte[i].bit[5:7]
VR[VRT].byte[i] src.byte[i:i+1].bit[sh:sh+7] VR[VRT].byte[i] src.byte[i:i+1].bit[8-sh:15-sh]
end end
Let bytes 0:15 of src be the contents of VR[VRA]. Let bytes 1:16 of src be the contents of VR[VRA].
Let byte 16 of src be the value 0x00. Let byte 0 of src be the value 0x00.
For each integer value i from 0 to 15, do the following. For each integer value i from 0 to 15, do the following.
Let sh be the value in bits 5:7 of byte element i of Let sh be the value in bits 5:7 of byte element i of
VR[VRB]. VR[VRB].
The contents of bits sh:sh+7 of the halfword in The contents of bits 8-sh:15-sh of the halfword in
byte elements i:i+1 of src are placed into byte byte elements i:i+1 of src are placed into byte
element i of VR[VRT]. element i of VR[VRT].
Programming Note
Assume vSRC contains a vector of packed 7-bit values, A located in bits 0:6, B located in bits 7:13, C located in
bits 14:20, etc..
The leftmost seven packed 7-bit values can be unpacked into byte elements 0 to 6 using vsrv with vSHCNT1.
The next seven packed 7-bit values can then be unpacked into byte elements 7 to 13 using vsrv with vSHCNT2.
The next two packed 7-bit values can then be unpacked into byte elements 14 to 15 using vsrv with vSHCNT3.
The most-significant bit in each byte element is masked off to produce a vector of sixteen unsigned byte
elements.
The vector of sixteen unsigned byte elements can be further unpacked to two vectors of eight unsigned halfword
elements using a vupkhsb and a vupklsb.
The resultant two vectors of eight unsigned halfword elements can then be further unpacked to four vectors of
four unsigned word elements using two vupkhsh and two vupklsh instructions.
The contents of byte elements UIM:UIM+3 of VR[VRB] The contents of byte elements UIM:UIM+7 of VR[VRB]
are placed into word element 1 of VR[VRT]. The are placed into VR[VRT]. The contents of doubleword
contents of the remaining word elements of VR[VRT] element 1 of VR[VRT] are set to 0.
are set to 0.
If the value of UIM is greater than 8, the results are
If the value of UIM is greater than 12, the results are undefined.
undefined.
Special Registers Altered:
Special Registers Altered: None
None
The contents of byte element 7 of VR[VRB] are placed The contents of halfword element 3 of VR[VRB] are
into byte element UIM of VR[VRT]. The contents of the placed into byte elements UIM:UIM+1 of VR[VRT]. The
remaining byte elements of VR[VRT] are not modified. contents of the remaining byte elements of VR[VRT] are
not modified.
Special Registers Altered:
None If the value of UIM is greater than 14, the results are
undefined.
The contents of word element 1 of VR[VRB] are placed The contents of doubleword element 0 of VR[VRB] are
into byte elements UIM:UIM+3 of VR[VRT]. The contents placed into byte elements UIM:UIM+7 of VR[VRT]. The
of the remaining byte elements of VR[VRT] are not contents of the remaining byte elements of VR[VRT] are
modified. not modified.
If the value of UIM is greater than 12, the results are If the value of UIM is greater than 8, the results are
undefined. undefined.
Vector Add and Write Carry-Out Unsigned Vector Add Signed Halfword Saturate
Word VX-form VX-form
vaddcuw VRT,VRA,VRB vaddshs VRT,VRA,VRB
For each integer value i from 0 to 3, do the following. For each integer value i from 0 to 7, do the following.
Unsigned-integer word element i in VRA is added Signed-integer halfword element i in VRA is added
to unsigned-integer word element i in VRB. The to signed-integer halfword element i in VRB.
carry out of the 32-bit sum is zero-extended to 32
bits and placed into word element i of VRT. – If the sum is greater than 215-1 the result
saturates to 215-1
Special Registers Altered:
None – If the sum is less than -215 the result
saturates to -215.
Vector Add Signed Byte Saturate VX-form The low-order 16 bits of the result are placed into
halfword element i of VRT.
vaddsbs VRT,VRA,VRB
Special Registers Altered:
4 VRT VRA VRB 768 SAT
0 6 11 16 21 31
do i=0 to 127 by 8
aop EXTS(VRAi:i+7)
bop EXTS(VRBi:i+7)
VRTi:i+7 Clamp( aop +int bop, -128, 127 )24:31
end
Vector Add Signed Word Saturate Vector Add Unsigned Doubleword Modulo
VX-form VX-form
vaddsws VRT,VRA,VRB vaddudm VRT,VRA,VRB
do i=0 to 127 by 32 do i = 0 to 1
aop EXTS((VRA)i:i+31) aop VR[VRA].dword[i]
bop EXTS((VRB)i:i+31) bop VR[VRB].dword[i]
VRTi:i+31 Clamp(aop +int bop, -231, 231-1) VR[VRT].dword[i] Chop( aop +int bop, 64 )
end end
For each integer value i from 0 to 3, do the following. For each integer value i from 0 to 1, do the following.
Signed-integer word element i in VRA is added to The integer value in doubleword element i of
signed-integer word element i in VRB. VR[VRB] is added to the integer value in
doubleword element i of VR[VRA].
– If the sum is greater than 231-1 the result
saturates to 231-1. The low-order 64 bits of the result are placed into
doubleword element i of VR[VRT].
– If the sum is less than -231 the result
saturates to -231. Special Registers Altered:
None
The low-order 32 bits of the result are placed into
word element i of VRT. Programming Note
Special Registers Altered: vaddudm can be used for signed or unsigned inte-
SAT gers.
do i=0 to 127 by 8
aop EXTZ((VRA)i:i+7)
bop EXTZ((VRB)i:i+7)
VRTi:i+7 Chop( aop +int bop, 8 )
end
Programming Note
vaddubm can be used for unsigned or
signed-integers.
Vector Add Unsigned Halfword Modulo Vector Add Unsigned Word Modulo
VX-form VX-form
vadduhm VRT,VRA,VRB vadduwm VRT,VRA,VRB
The low-order 16 bits of the result are placed into The low-order 32 bits of the result are placed into
halfword element i of VRT. word element i of VRT.
Vector Add Unsigned Byte Saturate Vector Add Unsigned Word Saturate
VX-form VX-form
vaddubs VRT,VRA,VRB vadduws VRT,VRA,VRB
For each integer value i from 0 to 15, do the following. For each integer value i from 0 to 3, do the following.
Unsigned-integer byte element i in VRA is added Unsigned-integer word element i in VRA is added
to unsigned-integer byte element i in VRB. to unsigned-integer word element i in VRB.
– If the sum is greater than 255 the result – If the sum is greater than 232-1 the result
saturates to 255. saturates to 232-1.
The low-order 8 bits of the result are placed into The low-order 32 bits of the result are placed into
byte element i of VRT. word element i of VRT.
do i=0 to 127 by 16
aop EXTZ((VRA)i:i+15)
bop EXTZ((VRB)i:i+15)
VRTi:i+15 Clamp(aop +int bop, 0, 216-1)16:31
end
Vector Add Unsigned Quadword Modulo Vector Add & write Carry Unsigned
VX-form Quadword VX-form
vadduqm VRT,VRA,VRB vaddcuq VRT,VRA,VRB
Let src1 be the integer value in VR[VRA]. Let src1 be the integer value in VR[VRA].
Let src2 be the integer value in VR[VRB]. Let src2 be the integer value in VR[VRB].
src1 and src2 can be signed or unsigned integers. src1 and src2 can be signed or unsigned integers.
The rightmost 128 bits of the sum of src1 and src2 are The carry out of the sum of src1 and src2 is placed
placed into VR[VRT]. into VR[VRT].
Vector Add Extended Unsigned Vector Add Extended & write Carry
Quadword Modulo VA-form Unsigned Quadword VA-form
vaddeuqm VRT,VRA,VRB,VRC vaddecuq VRT,VRA,VRB,VRC
VR[VRT] Chop(sum, 128) VR[VRT] Chop( EXTZ( Chop(sum >> 128, 1) ), 128 )
Let src1 be the integer value in VR[VRA]. Let src1 be the integer value in VR[VRA].
Let src2 be the integer value in VR[VRB]. Let src2 be the integer value in VR[VRB].
Let cin be the integer value in bit 127 of VR[VRC]. Let cin be the integer value in bit 127 of VR[VRC].
src1 and src2 can be signed or unsigned integers. src1 and src2 can be signed or unsigned integers.
The rightmost 128 bits of the sum of src1, src2, and cin The carry out of the sum of src1, src2, and cin are
are placed into VR[VRT]. placed into VR[VRT].
Programming Note
The Vector Add Unsigned Quadword instructions support efficient wide-integer addition. The following code
sequence can be used to implement a 512-bit signed or unsigned add operation.
Vector Subtract and Write Carry-Out Vector Subtract Signed Halfword Saturate
Unsigned Word VX-form VX-form
vsubcuw VRT,VRA,VRB vsubshs VRT,VRA,VRB
For each integer value i from 0 to 3, do the following. For each integer value i from 0 to 7, do the following.
Unsigned-integer word element i in VRB is Signed-integer halfword element i in VRB is
subtracted from unsigned-integer word element i subtracted from signed-integer halfword element i
in VRA. The complement of the borrow out of bit 0 in VRA.
of the 32-bit difference is zero-extended to 32 bits
and placed into word element i of VRT. – If the intermediate result is greater than 215-1
the result saturates to 215-1.
Special Registers Altered:
None – If the intermediate result is less than -215 the
result saturates to -215.
Vector Subtract Signed Byte Saturate The low-order 16 bits of the result are placed into
VX-form halfword element i of VRT.
do i=0 to 127 by 8
aop EXTS((VRA)i:i+7)
bop EXTS((VRB)i:i+7)
VRTi:i+7 Clamp(aop +int ¬bop +int 1, -128, 127)24:31
end
do i=0 to 127 by 32
aop EXTS((VRA)i:i+31)
bop EXTS((VRB)i:i+31)
VRTi:i+31 Clamp(aop +int ¬bop +int 1,-231,231-1)
end
For each integer value i from 0 to 15, do the following. For each integer value i from 0 to 7, do the following.
Unsigned-integer byte element i in VRB is Unsigned-integer halfword element i in VRB is
subtracted from unsigned-integer byte element i in subtracted from unsigned-integer halfword
VRA. The low-order 8 bits of the result are placed element i in VRA. The low-order 16 bits of the
into byte element i of VRT. result are placed into halfword element i of VRT.
do i = 0 to 1 do i=0 to 127 by 32
aop VR[VRA].dword[i] aop EXTZ((VRA)i:i+31)
bop VR[VRB].dword[i] bop EXTZ((VRB)i:i+31)
VR[VRT].dword[i] Chop( aop +int ~bop +int 1, 64 ) VRTi:i+31 Chop( aop +int ¬bop +int 1, 32 )
end end
For each integer value i from 0 to 1, do the following. For each integer value i from 0 to 3, do the following.
The integer value in doubleword element i of Unsigned-integer word element i in VRB is
VR[VRB] is subtracted from the integer value in subtracted from unsigned-integer word element i
doubleword element i of VR[VRA]. in VRA. The low-order 32 bits of the result are
placed into word element i of VRT.
The low-order 64 bits of the result are placed into
doubleword element i of VR[VRT]. Special Registers Altered:
None
Special Registers Altered:
None
Programming Note
vsubudm can be used for signed or unsigned inte-
gers.
Vector Subtract Unsigned Byte Saturate Vector Subtract Unsigned Word Saturate
VX-form VX-form
vsububs VRT,VRA,VRB
vsubuws VRT,VRA,VRB
4 VRT VRA VRB 1536
0 6 11 16 21 31 4 VRT VRA VRB 1664
0 6 11 16 21 31
do i=0 to 127 by 8
aop EXTZ((VRA)i:i+7) do i=0 to 127 by 32
bop EXTZ((VRB)i:i+7) aop EXTZ((VRA)i:i+31)
VRTi:i+7 Clamp(aop +int ¬bop +int 1, 0, 255)24:31 bop EXTZ((VRB)i:i+31)
end VRTi:i+31 Clamp(aop +int ¬bop +int 1, 0, 232-1)
end
For each integer value i from 0 to 15, do the following.
Unsigned-integer byte element i in VRB is For each integer value i from 0 to 7, do the following.
subtracted from unsigned-integer byte element i in Unsigned-integer word element i in VRB is
VRA. If the intermediate result is less than 0 the subtracted from unsigned-integer word element i
result saturates to 0. The low-order 8 bits of the in VRA.
result are placed into byte element i of VRT.
– If the intermediate result is less than 0 the
Special Registers Altered: result saturates to 0.
SAT
The low-order 32 bits of the result are placed into
word element i of VRT.
Vector Subtract Unsigned Halfword
Saturate VX-form Special Registers Altered:
SAT
vsubuhs VRT,VRA,VRB
do i=0 to 127 by 16
aop EXTZ((VRA)i:i+15)
bop EXTZ((VRB)i:i+15)
VRTi:i+15 Clamp(aop +int ¬bop +int 1,0,216-1)16:31
end
Vector Subtract Unsigned Quadword Vector Subtract & write Carry Unsigned
Modulo VX-form Quadword VX-form
vsubuqm VRT,VRA,VRB vsubcuq VRT,VRA,VRB
Let src1 be the integer value in VR[VRA]. Let src1 be the integer value in VR[VRA].
Let src2 be the integer value in VR[VRB]. Let src2 be the integer value in VR[VRB].
src1 and src2 can be signed or unsigned integers. src1 and src2 can be signed or unsigned integers.
The rightmost 128 bits of the sum of src1, the one’s The carry out of the sum of src1, the one’s
complement of src2, and the value 1 are placed into complement of src2, and the value 1 is placed into
VR[VRT]. VR[VRT].
Vector Subtract Extended Unsigned Vector Subtract Extended & write Carry
Quadword Modulo VA-form Unsigned Quadword VA-form
vsubeuqm VRT,VRA,VRB,VRC vsubecuq VRT,VRA,VRB,VRC
Let src1 be the integer value in VR[VRA]. Let src1 be the integer value in VR[VRA].
Let src2 be the integer value in VR[VRB]. Let src2 be the integer value in VR[VRB].
Let cin be the integer value in bit 127 of VR[VRC]. Let cin be the integer value in bit 127 of VR[VRC].
src1 and src2 can be signed or unsigned integers. src1 and src2 can be signed or unsigned integers.
The rightmost 128 bits of the sum of src1, the one’s The carry out of the sum of src1, the one’s
complement of src2, and cin are placed into VR[VRT]. complement of src2, and cin are placed into VR[VRT].
Programming Note
The Vector Subtract Unsigned Quadword instructions support efficient wide-integer subtraction. The following
code sequence can be used to implement a 512-bit signed or unsigned subtract operation.
Vector Multiply Even Signed Byte VX-form Vector Multiply Odd Signed Byte VX-form
vmulesb VRT,VRA,VRB vmulosb VRT,VRA,VRB
For each integer value i from 0 to 7, do the following. For each integer value i from 0 to 7, do the following.
Signed-integer byte element i×2 in VRA is Signed-integer byte element i×2+1 in VRA is
multiplied by signed-integer byte element i×2 in multiplied by signed-integer byte element i×2+1 in
VRB. The low-order 16 bits of the product are VRB. The low-order 16 bits of the product are
placed into halfword element i VRT. placed into halfword element i VRT.
Vector Multiply Even Unsigned Byte Vector Multiply Odd Unsigned Byte
VX-form VX-form
vmuleub VRT,VRA,VRB vmuloub VRT,VRA,VRB
For each integer value i from 0 to 7, do the following. For each integer value i from 0 to 7, do the following.
Unsigned-integer byte element i×2 in VRA is Unsigned-integer byte element i×2+1 in VRA is
multiplied by unsigned-integer byte element i×2 in multiplied by unsigned-integer byte element i×2+1
VRB. The low-order 16 bits of the product are in VRB. The low-order 16 bits of the product are
placed into halfword element i VRT. placed into halfword element i VRT.
Vector Multiply Even Signed Halfword Vector Multiply Odd Signed Halfword
VX-form VX-form
vmulesh VRT,VRA,VRB vmulosh VRT,VRA,VRB
For each integer value i from 0 to 3, do the following. For each integer value i from 0 to 3, do the following.
Signed-integer halfword element i×2 in VRA is Signed-integer halfword element i×2+1 in VRA is
multiplied by signed-integer halfword element i×2 multiplied by signed-integer halfword element
in VRB. The low-order 32 bits of the product are i×2+1 in VRB. The low-order 32 bits of the product
placed into halfword element i VRT. are placed into halfword element i VRT.
Vector Multiply Even Unsigned Halfword Vector Multiply Odd Unsigned Halfword
VX-form VX-form
vmuleuh VRT,VRA,VRB vmulouh VRT,VRA,VRB
For each integer value i from 0 to 3, do the following. For each integer value i from 0 to 3, do the following.
Unsigned-integer halfword element i×2 in VRA is Unsigned-integer halfword element i×2+1 in VRA
multiplied by unsigned-integer halfword element is multiplied by unsigned-integer halfword element
i×2 in VRB. The low-order 32 bits of the product i×2+1 in VRB. The low-order 32 bits of the product
are placed into halfword element i VRT. are placed into halfword element i VRT.
Vector Multiply Even Signed Word Vector Multiply Odd Signed Word
VX-form VX-form
vmulesw VRT,VRA,VRB vmulosw VRT,VRA,VRB
do i = 0 to 1 do i = 0 to 1
src1 VR[VRA].word[2×i] src1 VR[VRA].word[2×i+1]
src2 VR[VRB].word[2×i] src2 VR[VRB].word[2×i+1]
VR[VRT].dword[i] src1 ×si src2 VR[VRT].dword[i] src1 ×si src2
end end
For each integer value i from 0 to 1, do the following. For each integer value i from 0 to 1, do the following.
The signed integer in word element 2×i of VR[VRA] The signed integer in word element 2×i+1 of
is multiplied by the signed integer in word element VR[VRA] is multiplied by the signed integer in word
2×i of VR[VRB]. element 2×i+1 of VR[VRB].
The 64-bit product is placed into doubleword The 64-bit product is placed into doubleword
element i of VR[VRT]. element i of VR[VRT].
Vector Multiply Even Unsigned Word Vector Multiply Odd Unsigned Word
VX-form VX-form
vmuleuw VRT,VRA,VRB vmulouw VRT,VRA,VRB
do i = 0 to 1 do i = 0 to 1
src1 VR[VRA].word[2×i] src1 VR[VRA].word[2×i+1]
src2 VR[VRB].word[2×i] src2 VR[VRB].word[2×i+1]
VR[VRT].dword[i] src1 ×ui src2 VR[VRT].dword[i] src1 ×ui src2
end end
For each integer value i from 0 to 1, do the following. For each integer value i from 0 to 1, do the following.
The unsigned integer in word element 2×i of The unsigned integer in word element 2×i+1 of
VR[VRA] is multiplied by the unsigned integer in VR[VRA] is multiplied by the unsigned integer in
word element 2×i of VR[VRB]. word element 2×i+1 of VR[VRB].
The 64-bit product is placed into doubleword The 64-bit product is placed into doubleword
element i of VR[VRT]. element i of VR[VRT].
do i = 0 to 3
src1 VR[VRA].word[i]
src2 VR[VRB].word[i]
VR[VRT].word[i] Chop( src1 ×ui src2, 32 )
end
Programming Note
vmuluwm can be used for unsigned or signed
integers.
– If the intermediate result is less than -215 the – If the intermediate result is greater than 215-1
result saturates to -215. the result saturates to 215-1.
The low-order 16 bits of the result are placed into – If the intermediate result is less than -215 the
halfword element i of VRT. result saturates to -215.
Special Registers Altered: The low-order 16 bits of the result are placed into
SAT halfword element i of VRT.
For each word element in VRT the following operations For each word element in VRT the following operations
are performed, in the order shown. are performed, in the order shown.
– Each of the four signed-integer byte elements – Each of the two signed-integer halfword elements
contained in the corresponding word element of contained in the corresponding word element of
VRA is multiplied by the corresponding VRA is multiplied by the corresponding
unsigned-integer byte element in VRB, producing signed-integer halfword element in VRB,
a signed-integer product. producing a signed-integer product.
– The sum of these four signed-integer halfword – The sum of these two signed-integer word
products is added to the signed-integer word products is added to the signed-integer word
element in VRC. element in VRC.
– The signed-integer result is placed into the – The signed-integer word result is placed into the
corresponding word element of VRT. corresponding word element of VRT.
For each word element in VRT the following operations For each word element in VRT the following operations
are performed, in the order shown. are performed, in the order shown.
– Each of the two signed-integer halfword elements – Each of the two unsigned-integer halfword
contained in the corresponding word element of elements contained in the corresponding word
VRA is multiplied by the corresponding element of VRA is multiplied by the corresponding
signed-integer halfword element in VRB, unsigned-integer halfword element in VRB,
producing a signed-integer product. producing an unsigned-integer word product.
– The sum of these two signed-integer word – The sum of these two unsigned-integer word
products is added to the signed-integer word products is added to the unsigned-integer word
element in VRC. element in VRC.
– If the intermediate result is greater than 231-1 the – The unsigned-integer result is placed into the
result saturates to 231-1 and if it is less than -231 it corresponding word element of VRT.
saturates to -231.
Special Registers Altered:
– The result is placed into the corresponding word None
element of VRT.
– The sum of these two unsigned-integer word The low-order 128 bits of the sum are placed into
products is added to the unsigned-integer word VR[VRT]. Any carry out or overflow status is discarded.
element in VRC.
Special Registers Altered:
– If the intermediate result is greater than 2 32-1
the None
result saturates to 232-1.
Programming Note
– The result is placed into the corresponding word A horizontal add of the doubleword elements in
element of VRT. VR[VRA] can be performed using vmsumudm
when VR[VRB] contains the doubleword integer
Special Registers Altered: values {1,1} and VR[VRC] contains the quadword
SAT integer value 0.
Vector Sum across Signed Word Saturate Vector Sum across Half Signed Word
VX-form Saturate VX-form
vsumsws VRT,VRA,VRB vsum2sws VRT,VRA,VRB
Vector Sum across Quarter Signed Byte Vector Sum across Quarter Signed
Saturate VX-form Halfword Saturate VX-form
vsum4sbs VRT,VRA,VRB vsum4shs VRT,VRA,VRB
For each integer value i from 0 to 3, do the following. For each integer value i from 0 to 3, do the following.
The sum of the four signed-integer byte elements The sum of the two signed-integer halfword
contained in word element i of VRA is added to elements contained in word element i of VRA is
signed-integer word element i in VRB. added to signed-integer word element i in VRB.
– If the intermediate result is greater than 231-1 – If the intermediate result is greater than 231-1
the result saturates to 231-1. the result saturates to 231-1.
– If the intermediate result is less than -231 the – If the intermediate result is less than -231 the
result saturates to -231. result saturates to -231.
The low-order 32 bits of the result are placed into The low-order 32 bits of the result are placed into
word element i of VRT. the corresponding word element of VRT.
do i=0 to 127 by 32
temp EXTZ((VRB)i:i+31)
do j=0 to 31 by 8
temp temp +int EXTZ((VRA)i+j:i+j+7)
end
VRTi:i+31 Clamp( temp, 0, 232-1 )
end
do i = 0 to 3 do i = 0 to 1
src EXTS(VR[VRB].word[i]) src EXTS(VR[VRB].dword[i])
VR[VRT].word[i] Chop((¬src + 1), 32) VR[VRT]dword[i] Chop((¬src + 1), 64)
end end
For each integer value i from 0 to 3, do the following. For each integer value i from 0 to 1, do the following.
The sum of the one’s-complement of the signed The sum of the one’s-complement of the signed
integer in word element i of VR[VRB] and 1 is integer in doubleword element i of VR[VRB] and 1
placed into word element i of VR[VRT]. is placed into doubleword element i of VR[VRT].
Vector Extend Sign Byte To Word VX-form Vector Extend Sign Byte To Doubleword
vextsb2w VRT,VRB
VX-form
vextsb2d VRT,VRB
4 VRT 16 VRB 1538
0 6 11 16 21 31 4 VRT 24 VRB 1538
0 6 11 16 21 31
if MSR.VEC=0 then Vector_Unavailable()
if MSR.VEC=0 then Vector_Unavailable()
do i = 0 to 3
VR[VRT].word[i] EXTS32(VR[VRB].word[i].byte[3]) do i = 0 to 1
end VR[VRT].dword[i] EXTS64(VR[VRB].dword[i].byte[7])
end
For each integer value i from 0 to 3, do the following.
The rightmost byte of word element i of VR[VRB] is For each integer value i from 0 to 1, do the following.
sign-extended and placed into word element i of The rightmost byte of doubleword element i of
VR[VRT]. VR[VRB] is sign-extended and placed into
doubleword element i of VR[VRT].
Special Registers Altered:
None Special Registers Altered:
None
Vector Extend Sign Word To Doubleword The rightmost word of doubleword element i of
VX-form VR[VRB] is sign-extended and placed into
doubleword element i of VR[VRT].
vextsw2d VRT,VRB
do i = 0 to 1
VR[VRT].dword[i] EXTS64(VR[VRB].dword[i].word[1])
end
Vector Average Signed Byte VX-form Vector Average Signed Word VX-form
vavgsb VRT,VRA,VRB vavgsw VRT,VRA,VRB
For each integer value i from 0 to 15, do the following. For each integer value i from 0 to 3, do the following.
Signed-integer byte element i in VRA is added to Signed-integer word element i in VRA is added to
signed-integer byte element i in VRB. The sum is signed-integer word element i in VRB. The sum is
incremented by 1 and then shifted right 1 bit. incremented by 1 and then shifted right 1 bit.
The low-order 8 bits of the result are placed into The low-order 32 bits of the result are placed into
byte element i of VRT. word element i of VRT.
do i=0 to 127 by 16
aop EXTS((VRA)i:i+15)
bop EXTS((VRB)i:i+15)
VRTi:i+15 Chop(( aop +int bop +int 1 ) >> 1, 16)
end
do i=0 to 127 by 32
aop EXTZ((VRA)i:i+31)
bop EXTZ((VRB)i:i+31)
VRTi:i+31 Chop((aop +int bop +int 1) >>ui 1, 32)
end
for i = 0 to 15 for i = 0 to 7
src1 EXTZ(VR[VRA].byte[i]) src1 EXTZ(VR[VRA].hword[i])
src2 EXTZ(VR[VRB].byte[i]) src2 EXTZ(VR[VRB].hword[i])
if (src1>src2) then if (src1>src2) then
VR[VRT].byte[i] Chop(src1 + ¬src2 + 1, 8) VR[VRT].hword[i] Chop(src1 + ¬src2 + 1, 16)
else else
VR[VRT].byte[i] Chop(src2 + ¬src1 + 1, 8) VR[VRT].hword[i] Chop(src2 + ¬src1 + 1, 16)
end end
For each integer value i from 0 to 15, do the following. For each integer value i from 0 to 7, do the following.
The unsigned integer value in byte element i of The unsigned integer value in halfword element i
VR[VRA] is subtracted by the unsigned integer of VR[VRA] is subtracted by the unsigned integer
value in byte element i of VR[VRB]. The absolute value in halfword element i of VR[VRB]. The
value of the difference is placed into byte element absolute value of the difference is placed into
i of VR[VRT]. halfword element i of VR[VRT].
for i = 0 to 3
src1 EXTZ(VR[VRA].word[i])
src2 EXTZ(VR[VRB].word[i])
if (src1>src2) then
VR[VRT].word[i] Chop(src1 + ¬src2 + 1, 32)
else
VR[VRT].word[i] Chop(src2 + ¬src1 + 1, 32)
end
Vector Maximum Signed Byte VX-form Vector Maximum Unsigned Byte VX-form
vmaxsb VRT,VRA,VRB vmaxub VRT,VRA,VRB
For each integer value i from 0 to 15, do the following. For each integer value i from 0 to 15, do the following.
Signed-integer byte element i in VRA is compared Unsigned-integer byte element i in VRA is
to signed-integer byte element i in VRB. The compared to unsigned-integer byte element i in
larger of the two values is placed into byte VRB. The larger of the two values is placed into
element i of VRT. byte element i of VRT.
do i = 0 to 1 do i = 0 to 1
aop VR[VRA].dword[i] aop VR[VRA].dword[i]
bop VR[VRB].dword[i] bop VR[VRB].dword[i]
VR[VRT].dword[i] (aop >si bop) ? aop : bop VR[VRT].dword[i] (aop >ui bop) ? aop : bop
end end
For each integer value i from 0 to 1, do the following. For each integer value i from 0 to 1, do the following.
The signed integer value in doubleword element i The unsigned integer value in doubleword
of VR[VRA] is compared to the signed integer value element i of VR[VRA] is compared to the unsigned
in doubleword element i of VR[VRB]. The larger of integer value in doubleword element i of VR[VRB].
the two values is placed into doubleword element The larger of the two values is placed into
i of VR[VRT]. doubleword element i of VR[VRT].
For each integer value i from 0 to 7, do the following. For each integer value i from 0 to 7, do the following.
Signed-integer halfword element i in VRA is Unsigned-integer halfword element i in VRA is
compared to signed-integer halfword element i in compared to unsigned-integer halfword element i
VRB. The larger of the two values is placed into in VRB. The larger of the two values is placed into
halfword element i of VRT. halfword element i of VRT.
Vector Maximum Signed Word VX-form Vector Maximum Unsigned Word VX-form
vmaxsw VRT,VRA,VRB vmaxuw VRT,VRA,VRB
For each integer value i from 0 to 3, do the following. For each integer value i from 0 to 3, do the following.
Signed-integer word element i in VRA is Unsigned-integer word element i in VRA is
compared to signed-integer word element i in compared to unsigned-integer word element i in
VRB. The larger of the two values is placed into VRB. The larger of the two values is placed into
word element i of VRT. word element i of VRT.
Vector Minimum Signed Byte VX-form Vector Minimum Unsigned Byte VX-form
vminsb VRT,VRA,VRB vminub VRT,VRA,VRB
For each integer value i from 0 to 15, do the following. For each integer value i from 0 to 15, do the following.
Signed-integer byte element i in VRA is compared Unsigned-integer byte element i in VRA is
to signed-integer byte element i in VRB. The compared to unsigned-integer byte element i in
smaller of the two values is placed into byte VRB. The smaller of the two values is placed into
element i of VRT. byte element i of VRT.
do i = 0 to 1 do i = 0 to 1
aop VR[VRA].dword[i] aop VR[VRA].dword[i]
bop VR[VRB].dword[i] bop VR[VRB].dword[i]
VR[VRT].dword[i] (EXTS(aop) <si EXTS(bop)) ? aop : bop VR[VRT].dword[i] (aop <ui bop) ? aop : bop
end end
For each integer value i from 0 to 1, do the following. For each integer value i from 0 to 1, do the following.
The signed integer value in doubleword element i The unsigned integer value in doubleword
of VR[VRA] is compared to the signed integer value element i of VR[VRA] is compared to the unsigned
in doubleword element i of VR[VRB]. The smaller integer value in doubleword element i of VR[VRB].
of the two values is placed into doubleword The smaller of the two values is placed into
element i of VR[VRT]. doubleword element i of VR[VRT].
For each integer value i from 0 to 7, do the following. For each integer value i from 0 to 7, do the following.
Signed-integer halfword element i in VRA is Unsigned-integer halfword element i in VRA is
compared to signed-integer halfword element i in compared to unsigned-integer halfword element i
VRB. The smaller of the two values is placed into in VRB. The smaller of the two values is placed
halfword element i of VRT. into halfword element i of VRT.
Vector Minimum Signed Word VX-form Vector Minimum Unsigned Word VX-form
vminsw VRT,VRA,VRB vminuw VRT,VRA,VRB
For each integer value i from 0 to 3, do the following. For each integer value i from 0 to 3, do the following.
Signed-integer word element i in VRA is Unsigned-integer word element i in VRA is
compared to signed-integer word element i in compared to unsigned-integer word element i in
VRB. The smaller of the two values is placed into VRB. The smaller of the two values is placed into
word element i of VRT. word element i of VRT.
For each integer value i from 0 to 15, do the following. For each integer value i from 0 to 7, do the following.
Unsigned-integer byte element i in VRA is Unsigned-integer halfword element i in VRA is
compared to unsigned-integer byte element i in compared to unsigned-integer halfword element
VRB. Byte element i in VRT is set to all 1s if element i in VRB. Halfword element i in VRT is set
unsigned-integer byte element i in VRA is equal to to all 1s if unsigned-integer halfword element i in
unsigned-integer byte element i in VRB, and is set VRA is equal to unsigned-integer halfword
to all 0s otherwise. element i in VRB, and is set to all 0s otherwise.
do i=0 to 127 by 32 do i = 0 to 1
VRTi:i+31 ((VRA)i:i+31 =int (VRB)i:i+31) ? 32
1 : 32
0 aop EXTZ(VR[VRA].dword[i])
end bop EXTZ(VR[VRB].dword[i])
if Rc=1 then do if (aop = bop) then do
t (VRT=1281) VR[VRT].dword[i] 0xFFFF_FFFF_FFFF_FFFF
f (VRT=1280) flag.bit[i] 0b1
CR6 t || 0b0 || f || 0b0 end
end else do
VR[VRT].dword[i] 0x0000_0000_0000_0000
For each integer value i from 0 to 3, do the following. flag.bit[i] 0b0
The unsigned integer value in word element i in end
VR[VRA] is compared to the unsigned integer value end
in word element i in VR[VRB]. Word element i in if Rc=1 then do
CR.bit[24] (flag=0b11)
VR[VRT] is set to all 1s if unsigned-integer word
CR.bit[25] 0b0
element i in VR[VRA] is equal to unsigned-integer CR.bit[26] (flag=0b00)
word element i in VR[VRB], and is set to all 0s CR.bit[27] 0b0
otherwise. end
Special Registers Altered: For each integer value i from 0 to 1, do the following.
CR field 6 . . . . . . . . . . . . . . . . . . . . . . . . . . (if Rc=1) The unsigned integer value in doubleword
element i of VR[VRA] is compared to the unsigned
integer value in doubleword element i of VR[VRB].
Doubleword element i of VR[VRT] is set to all 1s if
the unsigned integer value in doubleword element
i of VR[VRA] is equal to the unsigned integer value
in doubleword element i of VR[VRB], and is set to
all 0s otherwise.
Vector Compare Greater Than Signed Vector Compare Greater Than Signed
Byte VC-form Doubleword VX-form
vcmpgtsb VRT,VRA,VRB (Rc=0) vcmpgtsd VRT,VRA,VRB (Rc=0)
vcmpgtsb. VRT,VRA,VRB (Rc=1) vcmpgtsd. VRT,VRA,VRB (Rc=1)
do i=0 to 127 by 8 do i = 0 to 1
VRTi:i+7 ((VRA)i:i+7 >si (VRB)i:i+7) ? 81 : 80 aop EXTS(VR[VRA].dword[i])
end bop EXTS(VR[VRB].dword[i])
if Rc=1 then do if (aop >si bop) then do
t (VRT=1281) VR[VRT].dword[i] 0xFFFF_FFFF_FFFF_FFFF
f (VRT=1280) flag.bit[i] 0b1
CR6 t || 0b0 || f || 0b0 end
end else do
VR[VRT].dword[i] 0x0000_0000_0000_0000
For each integer value i from 0 to 15, do the following. flag.bit[i] 0b0
The signed integer value in byte element i in end
VR[VRA] is compared to the signed integer value in end
byte element i in VR[VRB]. Byte element i in if “vcmpgtsd.” then do
CR.bit[24] (flag=0b11)
VR[VRT] is set to all 1s if signed-integer byte
CR.bit[25] 0b0
element i in VR[VRA] is greater than to CR.bit[26] (flag=0b00)
signed-integer byte element i in VR[VRB], and is set CR.bit[27] 0b0
to all 0s otherwise. end
Special Registers Altered: For each integer value i from 0 to 1, do the following.
CR field 6 . . . . . . . . . . . . . . . . . . . . . . . . . . (if Rc=1) The signed integer value in doubleword element i
of VR[VRA] is compared to the signed integer value
in doubleword element i of VR[VRB]. Doubleword
element i of VR[VRT] is set to all 1s if the signed
integer value in doubleword element i of VR[VRA]
is greater than the signed integer value in
doubleword element i of VR[VRB], and is set to all
0s otherwise.
Vector Compare Greater Than Signed Vector Compare Greater Than Signed
Halfword VC-form Word VC-form
vcmpgtsh VRT,VRA,VRB (Rc=0) vcmpgtsw VRT,VRA,VRB (Rc=0)
vcmpgtsh. VRT,VRA,VRB (Rc=1) vcmpgtsw. VRT,VRA,VRB (Rc=1)
For each integer value i from 0 to 7, do the following. For each integer value i from 0 to 3, do the following.
Signed-integer halfword element i in VRA is Signed-integer word element i in VRA is
compared to signed-integer halfword element i in compared to signed-integer word element i in
VRB. Halfword element i in VRT is set to all 1s if VRB. Word element i in VRT is set to all 1s if
signed-integer halfword element i in VRA is signed-integer word element i in VRA is greater
greater than signed-integer halfword element i in than signed-integer word element i in VRB, and is
VRB, and is set to all 0s otherwise. set to all 0s otherwise.
Vector Compare Greater Than Unsigned Vector Compare Greater Than Unsigned
Byte VC-form Doubleword VX-form
vcmpgtub VRT,VRA,VRB (Rc=0) vcmpgtud VRT,VRA,VRB (Rc=0)
vcmpgtub. VRT,VRA,VRB (Rc=1) vcmpgtud. VRT,VRA,VRB (Rc=1)
do i=0 to 127 by 8 do i = 0 to 1
VRTi:i+7 ((VRA)i:i+7 >ui (VRB)i:i+7) ? 81 : 80 aop EXTZ(VR[VRA].dword[i])
end bop EXTZ(VR[VRB].dword[i])
if Rc=1 then do if (EXTZ(aop) >ui EXTZ(bop)) then do
t (VRT=1281) VR[VRT].dword[i] 0xFFFF_FFFF_FFFF_FFFF
f (VRT=1280) flag.bit[i] 0b1
CR6 t || 0b0 || f || 0b0 end
end else do
VR[VRT].dword[i] 0x0000_0000_0000_0000
For each integer value i from 0 to 15, do the following. flag.bit[i] 0b1
Unsigned-integer byte element i in VRA is end
compared to unsigned-integer byte element i in end
VRB. Byte element i in VRT is set to all 1s if if “vcmpgtud.” then do
CR.bit[24] (flag=0b11)
unsigned-integer byte element i in VRA is greater
CR.bit[25] 0b0
than to unsigned-integer byte element i in VRB, CR.bit[26] (flag=0b00)
and is set to all 0s otherwise. CR.bit[27] 0b0
end
Special Registers Altered:
CR field 6 . . . . . . . . . . . . . . . . . . . . . . . . . . (if Rc=1) For each integer value i from 0 to 1, do the following.
The unsigned integer value in doubleword
element i of VR[VRA] is compared to the unsigned
integer value in doubleword element i of VR[VRB].
Doubleword element i of VR[VRT] is set to all 1s if
the unsigned integer value in doubleword element
i of VR[VRA] is greater than the unsigned integer
value in doubleword element i of VR[VRB], and is
set to all 0s otherwise.
Vector Compare Greater Than Unsigned Vector Compare Greater Than Unsigned
Halfword VC-form Word VC-form
vcmpgtuh VRT,VRA,VRB (Rc=0) vcmpgtuw VRT,VRA,VRB (Rc=0)
vcmpgtuh. VRT,VRA,VRB (Rc=1) vcmpgtuw. VRT,VRA,VRB (Rc=1)
For each integer value i from 0 to 7, do the following. For each integer value i from 0 to 3, do the following.
Unsigned-integer halfword element i in VRA is Unsigned-integer word element i in VRA is
compared to unsigned-integer halfword element i compared to unsigned-integer word element i in
in VRB. Halfword element i in VRT is set to all 1s VRB. Word element i in VRT is set to all 1s if
if unsigned-integer halfword element i in VRA is unsigned-integer word element i in VRA is greater
greater than to unsigned-integer halfword element than to unsigned-integer word element i in VRB,
i in VRB, and is set to all 0s otherwise. and is set to all 0s otherwise.
Vector Compare Not Equal Byte VX-form Vector Compare Not Equal or Zero Byte
vcmpneb VRT,VRA,VRB (if Rc=0)
VX-form
vcmpneb. VRT,VRA,VRB (if Rc=1) vcmpnezb VRT,VRA,VRB (if Rc=0)
vcmpnezb. VRT,VRA,VRB (if Rc=1)
4 VRT VRA VRB Rc 7
0 6 11 16 21 31 4 VRT VRA VRB Rc 263
0 6 11 16 21 31
if MSR.VEC=0 then Vector_Unavailable()
if MSR.VEC=0 then Vector_Unavailable()
for i = 0 to 15
src1 VR[VRA].byte[i] for i = 0 to 15
src2 VR[VRB].byte[i] src1 VR[VRA].byte[i]
if (src1 != src2) then src2 VR[VRB].byte[i]
VR[VRT].byte[i] 0xFF if (src1 = 0) | (src2 = 0) | (src1 != src2) then
else VR[VRT].byte[i] 0xFF
VR[VRT].byte[i] 0x00 else
end VR[VRT].byte[i] 0x00
end
all_true (VR[VRT]=0xFFFF_FFFF_FFFF_FFFF_FFF_FFFF_FFFF_FFFF)
all_false (VR[VRT]=0x0000_0000_0000_0000_0000_0000_0000_0000) all_true (VR[VRT]=0xFFFF_FFFF_FFFF_FFFF_FFF_FFFF_FFFF_FFFF)
all_false (VR[VRT]=0x0000_0000_0000_0000_0000_0000_0000_0000)
if Rc=1 then CR.bit[56:59] (all_true<<3) + (all_false<<1)
if Rc=1 then CR.bit[56:59] (all_true<<3) + (all_false<<1)
For each integer value i from 0 to 15, do the following.
The integer value in byte element i in VR[VRA] is For each integer value i from 0 to 15, do the following.
compared to the integer value in byte element i in The integer value in byte element i in VR[VRA] is
VR[VRB]. The contents of byte element i in VR[VRT] compared to the integer value in byte element i in
are set to 0xFF if integer value in byte element i in VR[VRB]. The contents of byte element i in VR[VRT]
VR[VRA] is not equal to the integer value in byte are set to 0xFF if integer value in byte element i in
element i in VR[VRB], and are set to 0x00 VR[VRA] is not equal to the integer value in byte
otherwise. element i in VR[VRB] or either value is equal to
0x00, and are set to 0x00 otherwise.
If Rc=1, CR field 6 is set to indicate whether all vector
elements compared true and whether all vector If Rc=1, CR field 6 is set to indicate whether all vector
elements compared false. elements compared true and whether all vector
elements compared false.
Special Registers Altered:
CR field 6 (if Rc=1) Special Registers Altered:
CR field 6 (if Rc=1)
Vector Compare Not Equal Halfword Vector Compare Not Equal or Zero
VX-form Halfword VX-form
vcmpneh VRT,VRA,VRB (if Rc=0) vcmpnezh VRT,VRA,VRB (if Rc=0)
vcmpneh. VRT,VRA,VRB (if Rc=1) vcmpnezh. VRT,VRA,VRB (if Rc=1)
for i = 0 to 7 for i = 0 to 7
src1 VR[VRA].hword[i] src1 VR[VRA].hword[i]
src2 VR[VRB].hword[i] src2 VR[VRB].hword[i]
if (src1 != src2) then if (src1=0) | (src2=0) | (src1src2) then
VR[VRT].hword[i] 0xFFFF VR[VRT].hword[i] 0xFFFF
else else
VR[VRT].hword[i] 0x0000 VR[VRT].hword[i] 0x0000
end end
if Rc=1 then CR.bit[56:59] (all_true<<3) + (all_false<<1) if Rc=1 then CR.bit[56:59] (all_true<<3) + (all_false<<1)
For each integer value i from 0 to 7, do the following. For each integer value i from 0 to 7, do the following.
The integer value in halfword element i in VR[VRA] The integer value in halfword element i in VR[VRA]
is compared to the integer value in halfword is compared to the integer value in halfword
element i in VR[VRB]. The contents of halfword element i in VR[VRB]. The contents of halfword
element i in VR[VRT] are set to 0xFFFF if integer element i in VR[VRT] are set to 0xFFFF if integer
value in halfword element i in VR[VRA] is not equal value in halfword element i in VR[VRA] is not equal
to the integer value in halfword element i in to the integer value in halfword element i in
VR[VRB], and are set to 0x0000 otherwise. VR[VRB] or either value is equal to 0x0000, and are
set to 0x0000 otherwise.
If Rc=1, CR field 6 is set to indicate whether all vector
elements compared true and whether all vector If Rc=1, CR field 6 is set to indicate whether all vector
elements compared false. elements compared true and whether all vector
elements compared false.
Special Registers Altered:
CR field 6 (if Rc=1) Special Registers Altered:
CR field 6 (if Rc=1)
Vector Compare Not Equal Word VX-form Vector Compare Not Equal or Zero Word
vcmpnew VRT,VRA,VRB (if Rc=0)
VX-form
vcmpnew. VRT,VRA,VRB (if Rc=1) vcmpnezw VRT,VRA,VRB (if Rc=0)
vcmpnezw. VRT,VRA,VRB (if Rc=1)
4 VRT VRA VRB Rc 135
0 6 11 16 21 31 4 VRT VRA VRB Rc 391
0 6 11 16 21 31
if MSR.VEC=0 then Vector_Unavailable()
if MSR.VEC=0 then Vector_Unavailable()
for i = 0 to 3
src1 VR[VRA].word[i] for i = 0 to 3
src2 VR[VRB].word[i] src1 VR[VRA].word[i]
if (src1 != src2) then src2 VR[VRB].word[i]
VR[VRT].word[i] 0xFFFF_FFFF if (src1=0) | (src2=0) | (src1src2) then
else VR[VRT].word[i] 0xFFFF_FFFF
VR[VRT].word[i] 0x0000_0000 else
end VR[VRT].word[i] 0x0000_0000
end
all_true (VR[VRT]=0xFFFF_FFFF_FFFF_FFFF_FFF_FFFF_FFFF_FFFF)
all_false (VR[VRT]=0x0000_0000_0000_0000_0000_0000_0000_0000) all_true (VR[VRT]=0xFFFF_FFFF_FFFF_FFFF_FFF_FFFF_FFFF_FFFF)
all_false (VR[VRT]=0x0000_0000_0000_0000_0000_0000_0000_0000)
if Rc=1 then CR.bit[56:59] (all_true<<3) + (all_false<<1)
if Rc=1 then CR.bit[56:59] (all_true<<3) + (all_false<<1)
For each integer value i from 0 to 3, do the following.
The integer value in word element i in VR[VRA] is For each integer value i from 0 to 3, do the following.
compared to the integer value in word element i in The integer value in word element i in VR[VRA] is
VR[VRB]. The contents of word element i in compared to the integer value in word element i in
VR[VRT] are set to 0xFFFF_FFFF if integer value in VR[VRB]. The contents of word element i in
word element i in VR[VRA] is not equal to the VR[VRT] are set to 0xFFFF_FFFF if integer value in
integer value in word element i in VR[VRB], and word element i in VR[VRA] is not equal to the
are set to 0x0000_0000 otherwise. integer value in word element i in VR[VRB] or
either value is equal to 0x0000_0000, and are set to
If Rc=1, CR field 6 is set to indicate whether all vector 0x0000_0000 otherwise.
elements compared true and whether all vector
elements compared false. If Rc=1, CR field 6 is set to indicate whether all vector
elements compared true and whether all vector
Special Registers Altered: elements compared false.
CR field 6 (if Rc=1)
Special Registers Altered:
CR field 6 (if Rc=1)
The contents of VR[VRA] are ANDed with the contents The contents of VR[VRA] are XORed with the contents
of VR[VRB] and the result is placed into VR[VRT]. of VR[VRB] and the complemented result is placed into
VR[VRT].
Special Registers Altered:
None Special Registers Altered:
None
The contents of VR[VRA] are ANDed with the VR[VRT] ¬( VR[VRA] & VR[VRB] )
complement of the contents of VR[VRB] and the result is
placed into VR[VRT]. The contents of VR[VRA] are ANDed with the contents
of VR[VRB] and the complemented result is placed into
Special Registers Altered: VR[VRT].
None
Special Registers Altered:
None
VR[VRT] ¬( VR[VRA] | VR[VRB] ) The contents of VR[VRA] are XORed with the contents
of VR[VRB] and the result is placed into VR[VRT].
The contents of VR[VRA] are ORed with the contents of
VR[VRB] and the complemented result is placed into Special Registers Altered:
VR[VRT]. None
Vector Parity Byte Word VX-form Vector Parity Byte Doubleword VX-form
vprtybw VRT,VRB vprtybd VRT,VRB
do i = 0 to 3 do i = 0 to 1
s0 s0
do j = 0 to 3 do j = 0 to 7
s s ^ VR[VRB].word[i].byte[j].bit[7] s s ^ VR[VRB]dword[i].byte[j].bit[7]
end end
VR[VRT].word[i] Chop(EXTZ(s), 32) VR[VRT].dword[i] Chop(EXTZ(s), 64)
end end
For each integer value i from 0 to 3, do the following For each integer value i from 0 to 1, do the following
If the sum of the least significant bit in each byte If the sum of the least significant bit in each byte
sub-element of word element i of VR[VRB] is odd, sub-element of doubleword element i of VR[VRB]
the value 1 is placed into word element i of is odd, the value 1 is placed into doubleword
VR[VRT]; otherwise the value 0 is placed into word element i of VR[VRT]; otherwise the value 0 is
element i of VR[VRT]. placed into doubleword element i of VR[VRT].
s0
do j = 0 to 15
s s ^ VR[VRB].byte[j].bit[7]
end
VR[VRT] Chop(EXTZ(s), 128)
Vector Rotate Left Byte VX-form Vector Rotate Left Word VX-form
vrlb VRT,VRA,VRB vrlw VRT,VRA,VRB
For each integer value i from 0 to 15, do the following. For each integer value i from 0 to 3, do the following.
Byte element i in VRA is rotated left by the Word element i in VRA is rotated left by the
number of bits specified in the low-order 3 bits of number of bits specified in the low-order 5 bits of
the corresponding byte element i in VRB. the corresponding word element i in VRB.
The result is placed into byte element i in VRT. The result is placed into word element i in VRT.
Vector Rotate Left Halfword VX-form Vector Rotate Left Doubleword VX-form
vrlh VRT,VRA,VRB vrld VRT,VRA,VRB
do i=0 to 127 by 16 do i = 0 to 1
sh (VRB)i+12:i+15 sh VR[VRB].dword[i].bit[58:63]
VRTi:i+15 (VRA)i:i+15 <<< sh VR[VRT].dword[i] VR[VRA].dword[i] <<< sh
end end
For each integer value i from 0 to 7, do the following. For each integer value i from 0 to 1, do the following.
Halfword element i in VRA is rotated left by the The contents of doubleword element i of VR[VRA]
number of bits specified in the low-order 4 bits of are rotated left by the number of bits specified in
the corresponding halfword element i in VRB. bits 58:63 of doubleword element i of VR[VRB].
The result is placed into halfword element i in The result is placed into doubleword element i of
VRT. VR[VRT].
Vector Shift Left Byte VX-form Vector Shift Left Word VX-form
vslb VRT,VRA,VRB vslw VRT,VRA,VRB
For each integer value i from 0 to 15, do the following. For each integer value i from 0 to 3, do the following.
Byte element i in VRA is shifted left by the number Word element i in VRA is shifted left by the
of bits specified in the low-order 3 bits of byte number of bits specified in the low-order 5 bits of
element i in VRB. word element i in VRB.
– Bits shifted out of bit 0 are lost. – Bits shifted out of bit 0 are lost.
– Zeros are supplied to the vacated bits on the – Zeros are supplied to the vacated bits on the
right. right.
The result is placed into byte element i of VRT. The result is placed into word element i of VRT.
Vector Shift Left Halfword VX-form Vector Shift Left Doubleword VX-form
vslh VRT,VRA,VRB vsld VRT,VRA,VRB
do i=0 to 127 by 16 do i = 0 to 1
sh (VRB)i+12:i+15 sh VR[VRB].dword[i].bit[58:63]
VRTi:i+15 (VRA)i:i+15 << sh VR[VRT].dword[i] VR[VRA].dword[i] << sh
end end
For each integer value i from 0 to 7, do the following. For each integer value i from 0 to 1, do the following.
Halfword element i in VRA is shifted left by the The contents of doubleword element i of VR[VRA]
number of bits specified in the low-order 4 bits of are shifted left by the number of bits specified in
halfword element i in VRB. bits 58:63 of doubleword element i of VR[VRB].
– Bits shifted out of bit 0 are lost.
– Bits shifted out of bit 0 are lost. – Zeros are supplied to the vacated bits on the
right.
– Zeros are supplied to the vacated bits on the
right. The result is placed into doubleword element i of
VR[VRT].
The result is placed into halfword element i of
VRT. Special Registers Altered:
None
Special Registers Altered:
None
Vector Shift Right Byte VX-form Vector Shift Right Word VX-form
vsrb VRT,VRA,VRB vsrw VRT,VRA,VRB
For each integer value i from 0 to 15, do the following. For each integer value i from 0 to 3, do the following.
Byte element i in VRA is shifted right by the Word element i in VRA is shifted right by the
number of bits specified in the low-order 3 bits of number of bits specified in the low-order 5 bits of
byte element i in VRB. Bits shifted out of the word element i in VRB. Bits shifted out of the
least-significant bit are lost. Zeros are supplied to least-significant bit are lost. Zeros are supplied to
the vacated bits on the left. The result is placed the vacated bits on the left. The result is placed
into byte element i of VRT. into word element i of VRT.
Vector Shift Right Halfword VX-form Vector Shift Right Doubleword VX-form
vsrh VRT,VRA,VRB vsrd VRT,VRA,VRB
do i=0 to 127 by 16 do i = 0 to 1
sh (VRB)i+12:i+15 sh VR[VRB].dword[i].bit[58:63]
VRTi:i+15 (VRA)i:i+15 >>ui sh VR[VRT].dword[i] VR[VRA].dword[i] >>ui sh
end end
For each integer value i from 0 to 7, do the following. For each integer value i from 0 to 1, do the following.
Halfword element i in VRA is shifted right by the The contents of doubleword element i of VR[VRA]
number of bits specified in the low-order 4 bits of are shifted right by the number of bits specified in
halfword element i in VRB. Bits shifted out of the bits 58:63 of doubleword element i of VR[VRB].
least-significant bit are lost. Zeros are supplied to Zeros are supplied to the vacated bits on the left.
the vacated bits on the left. The result is placed
into halfword element i of VRT. The result is placed into doubleword element i of
VR[VRT].
Special Registers Altered:
None Special Registers Altered:
None
Vector Shift Right Algebraic Byte VX-form Vector Shift Right Algebraic Word
VX-form
vsrab VRT,VRA,VRB
vsraw VRT,VRA,VRB
4 VRT VRA VRB 772
0 6 11 16 21 31 4 VRT VRA VRB 900
0 6 11 16 21 31
do i=0 to 127 by 8
sh (VRB)i+5:i+7 do i=0 to 127 by 32
VRTi:i+7 (VRA)i:i+7 >>si sh sh (VRB)i+27:i+31
end VRTi:i+31 (VRA)i:i+31 >>si sh
end
For each integer value i from 0 to 15, do the following.
Byte element i in VRA is shifted right by the For each integer value i from 0 to 3, do the following.
number of bits specified in the low-order 3 bits of Word element i in VRA is shifted right by the
the corresponding byte element i in VRB. Bits number of bits specified in the low-order 5 bits of
shifted out of bit 7 of the byte element are lost. Bit the corresponding word element i in VRB. Bits
0 of the byte element is replicated to fill the shifted out of bit 31 of the word are lost. Bit 0 of
vacated bits on the left. The result is placed into the word is replicated to fill the vacated bits on the
byte element i of VRT. left. The result is placed into word element i of
VRT.
Special Registers Altered:
None Special Registers Altered:
None
Vector Rotate Left Word then AND with Vector Rotate Left Word then Mask Insert
Mask VX-form VX-form
vrlwnm VRT,VRA,VRB vrlwmi VRT,VRA,VRB
do i = 0 to 3 do i = 0 to 3
src1.word[0] VR[VRA].word[i] src1.word[0] VR[VRA].word[i]
src1.word[1] VR[VRA].word[i] src1.word[1] VR[VRA].word[i]
src2 VR[VRB].word[i] src2 VR[VRB].word[i]
src3 VR[VRT].word[i]
b src2.bit[11:15]
e src2.bit[19:23] b src2.bit[11:15]
n src2.bit[27:31] e src2.bit[19:23]
r src1.bit[n:n+31] n src2.bit[27:31]
m MASK(b, e) r src1.bit[n:n+31]
m MASK(b, e)
VR[VRT].word[i] r & m
end VR[VRT].word[i] (r & m) | (src3 & ¬m)
end
For each integer value i from 0 to 3, do the following.
Let src1 be the contents of word element i of For each integer value i from 0 to 3, do the following.
VR[VRA]. Let src1 be the contents of word element i of
VR[VRA].
Let src2 be the contents of word element i of
VR[VRB]. Let src2 be the contents of word element i of
VR[VRB].
Let mb be the contents of bits 11:15 of src2.
Let me be the contents of bits 19:23 of src2. Let src3 be the contents of word element i of
Let sh be the contents of bits 27:31 of src2. VR[VRT].
src1 is rotated left sh bits. A mask is generated Let mb be the contents of bits 11:15 of src2.
having 1-bits from bit mb through bit me and 0-bits Let me be the contents of bits 19:23 of src2.
elsewhere. The rotated data are ANDed with the Let sh be the contents of bits 27:31 of src2.
generated mask.
src1 is rotated left sh bits. A mask is generated
The result is placed into word element i of having 1-bits from bit mb through bit me and 0-bits
VR[VRT]. elsewhere. The rotated data are inserted into src3
under control of the generated mask.
Special Registers Altered:
None The result is placed into word element i of
VR[VRT].
Vector Rotate Left Doubleword then AND Vector Rotate Left Doubleword then Mask
with Mask VX-form Insert VX-form
vrldnm VRT,VRA,VRB vrldmi VRT,VRA,VRB
do i = 0 to 1 do i = 0 to 1
src1.dword[0] VR[VRA].dword[i] src1.dword[0] VR[VRA].dword[i]
src1.dword[1] VR[VRA].dword[i] src1.dword[1] VR[VRA].dword[i]
src2 VR[VRB].dword[i] src2 VR[VRB].dword[i]
src3 VR[VRT].dword[i]
b src2.bit[42:47]
e src2.bit[50:55] b src2.bit[42:47]
n src2.bit[58:63] e src2.bit[50:55]
r src1.bit[n:n+63] n src2.bit[58:63]
m MASK(b, e) r src1.bit[n:n+63]
m MASK(b, e)
VR[VRT].dword[i] r & m
end VR[VRT].dword[i] (r & m) | (src3 & ¬m)
end
For each integer value i from 0 to 1, do the following.
Let src1 be the contents of doubleword element i For each integer value i from 0 to 1, do the following.
of VR[VRA]. Let src1 be the contents of doubleword element i
of VR[VRA].
Let src2 be the contents of doubleword element i
of VR[VRB]. Let src2 be the contents of doubleword element i
of VR[VRB].
Let mb be the contents of bits 42:47 of src2.
Let me be the contents of bits 50:55 of src2. Let src3 be the contents of doubleword element i
Let sh be the contents of bits 58:63 of src2. of VR[VRT].
src1 is rotated left sh bits. A mask is generated Let mb be the contents of bits 42:47 of src2.
having 1-bits from bit mb through bit me and 0-bits Let me be the contents of bits 50:55 of src2.
elsewhere. The rotated data are ANDed with the Let sh be the contents of bits 58:63 of src2.
generated mask.
src1 is rotated left sh bits. A mask is generated
The result is placed into doubleword element i of having 1-bits from bit mb through bit me and 0-bits
VR[VRT]. elsewhere. The rotated data are inserted into src3
under control of the generated mask.
Special Registers Altered:
None The result is placed into doubleword element i of
VR[VRT].
For each integer value i from 0 to 3, do the following. For each integer value i from 0 to 3, do the following.
Single-precision floating-point element i in VRA is Single-precision floating-point element i in VRB is
added to single-precision floating-point element i subtracted from single-precision floating-point
in VRB. The intermediate result is rounded to the element i in VRA. The intermediate result is
nearest single-precision floating-point number and rounded to the nearest single-precision
placed into word element i of VRT. floating-point number and placed into word
element i of VRT.
Special Registers Altered:
None Special Registers Altered:
None
For each integer value i from 0 to 3, do the following. For each integer value i from 0 to 3, do the following.
Single-precision floating-point element i in VRA is Single-precision floating-point element i in VRA is
multiplied by single-precision floating-point multiplied by single-precision floating-point
element i in VRC. Single-precision floating-point element i in VRC. Single-precision floating-point
element i in VRB is added to the infinitely-precise element i in VRB is subtracted from the
product. The intermediate result is rounded to the infinitely-precise product. The intermediate result
nearest single-precision floating-point number and is rounded to the nearest single-precision
placed into word element i of VRT. floating-point number, then negated and placed
into word element i of VRT.
Special Registers Altered:
None Special Registers Altered:
None
Programming Note
To use a multiply-add to perform an IEEE or Java
compliant multiply, the addend must be -0.0. This
is necessary to insure that the sign of a zero result
will be correct when the product is -0.0 (+0.0 + -0.0
+0.0, and -0.0 + -0.0 -0.0). When the sign of a
resulting 0.0 is not important, then +0.0 can be
used as an addend which may, in some cases,
avoid the need for a second register to hold a -0.0
in addition to the integer 0/floating-point +0.0 that
may already be available.
For each integer value i from 0 to 3, do the following. For each integer value i from 0 to 3, do the following.
Single-precision floating-point element i in VRA is Single-precision floating-point element i in VRA is
compared to single-precision floating-point compared to single-precision floating-point
element i in VRB. The larger of the two values is element i in VRB. The smaller of the two values is
placed into word element i of VRT. placed into word element i of VRT.
The maximum of +0 and -0 is +0. The maximum of any The minimum of +0 and -0 is -0. The minimum of any
value and a NaN is a QNaN. value and a NaN is a QNaN.
For each integer value i from 0 to 3, do the following. For each integer value i from 0 to 3, do the following.
Single-precision floating-point word element i in Single-precision floating-point word element i in
VRB is multiplied by 2UIM. The product is VRB is multiplied by 2UIM. The product is
converted to a 32-bit signed fixed-point integer converted to a 32-bit unsigned fixed-point integer
using the rounding mode Round toward Zero. using the rounding mode Round toward Zero.
– If the intermediate result is greater than 231-1 – If the intermediate result is greater than 232-1
the result saturates to 231-1. the result saturates to 232-1.
– If the intermediate result is less than -231 the The result is placed into word element i of VRT.
result saturates to -231.
Special Registers Altered:
The result is placed into word element i of VRT. SAT
Vector Convert with round to nearest Vector Convert with round to nearest
Signed Word format VX-form Unsigned Word format VX-form
vcfsx VRT,VRB,UIM vcfux VRT,VRB,UIM
For each integer value i from 0 to 3, do the following. For each integer value i from 0 to 3, do the following.
Signed fixed-point word element i in VRB is Unsigned fixed-point word element i in VRB is
converted to the nearest single-precision converted to the nearest single-precision
floating-point value. Each result is divided by 2UIM floating-point value. The result is divided by 2UIM
and placed into word element i of VRT. and placed into word element i of VRT.
For each integer value i from 0 to 3, do the following. For each integer value i from 0 to 3, do the following.
Single-precision floating-point element i in VRB is Single-precision floating-point element i in VRB is
rounded to a single-precision floating-point integer rounded to a single-precision floating-point integer
using the rounding mode Round toward -Infinity. using the rounding mode Round to Nearest.
The result is placed into the corresponding word The result is placed into the corresponding word
element i of VRT. element i of VRT.
Programming Note
The Vector Convert To Fixed-Point Word Vector Round to Floating-Point Integer
instructions support only the rounding mode toward +Infinity VX-form
Round toward Zero. A floating-point number can
be converted to a fixed-point integer using any of vrfip VRT,VRB
the other three rounding modes by executing the 4 VRT /// VRB 650
appropriate Vector Round to Floating-Point Integer 0 6 11 16 21 31
instruction before the Vector Convert To
Fixed-Point Word instruction. do i=0 to 127 by 32
VRT0:31 RoundToSPIntCeil( (VRB)0:31 )
end
Programming Note
The fixed-point integers used by the Vector For each integer value i from 0 to 3, do the following.
Convert instructions can be interpreted as Single-precision floating-point element i in VRB is
consisting of 32-UIM integer bits followed by UIM rounded to a single-precision floating-point integer
fraction bits. using the rounding mode Round toward +Infinity.
do i=0 to 127 by 32
VRT0:31 RoundToSPIntTrunc( (VRB)0:31 )
end
Bit Description
0 Set to 0
1 Set to 0
For each integer value i from 0 to 3, do the following. For each integer value i from 0 to 3, do the following.
Single-precision floating-point element i in VRA is Single-precision floating-point element i in VRA is
compared to single-precision floating-point compared to single-precision floating-point
element i in VRB. Word element i in VRT is set to element i in VRB. Word element i in VRT is set to
all 1s if single-precision floating-point element i in all 1s if single-precision floating-point element i in
VRA is equal to single-precision floating-point VRA is greater than or equal to single-precision
element i in VRB, and is set to all 0s otherwise. floating-point element i in VRB, and is set to all 0s
otherwise.
If the source element i in VRA or the source
element i in VRB is a NaN, VRT is set to all 0s, If the source element i in VRA or the source
indicating “not equal to”. If the source element i in element i in VRB is a NaN, VRT is set to all 0s,
VRA and the source element i in VRB are both indicating “not greater than or equal to”. If the
infinity with the same sign, VRT is set to all 1s, source element i in VRA and the source element i
indicating “equal to”. in VRB are both infinity with the same sign, VRT is
set to all 1s, indicating “greater than or equal to”.
Special Registers Altered:
CR field 6 . . . . . . . . . . . . . . . . . . . . . . . . . .(if Rc=1) Special Registers Altered:
CR field 6 . . . . . . . . . . . . . . . . . . . . . . . . . . (if Rc=1)
do i=0 to 127 by 32
VRTi:i+31 ((VRA)i:i+31 >fp (VRB)i:i+31) ? 32
1 : 32
0
end
if Rc=1 then do
t ( VRT=1281 )
f ( VRT=1280 )
CR6 t || 0b0 || f || 0b0
end
For each integer value i from 0 to 3, do the following. For each integer value i from 0 to 3, do the following.
The single-precision floating-point estimate of 2 The single-precision floating-point estimate of the
raised to the power of single-precision base 2 logarithm of single-precision floating-point
floating-point element i in VRB is placed into word element i in VRB is placed into the corresponding
element i of VRT. word element of VRT.
Let x be any single-precision floating-point input value. Let x be any single-precision floating-point input value.
Unless x< -146 or the single-precision floating-point Unless | x-1 | is less than or equal to 0.125 or the
result of computing 2 raised to the power x would be a single-precision floating-point result of computing the
zero, an infinity, or a QNaN, the estimate has a relative base 2 logarithm of x would be an infinity or a QNaN,
error in precision no greater than one part in 16. The the estimate has an absolute error in precision
most significant 12 bits of the estimate’s significand (absolute value of the difference between the estimate
are monotonic. An integral input value returns an and the infinitely precise value) no greater than 2-5.
integral value when the result is representable. Under the same conditions, the estimate has a relative
error in precision no greater than one part in 8.
The result for various special cases of the source
value is given below. The most significant 12 bits of the estimate’s
significand are monotonic. The estimate is exact if
Value Result x=2y, where y is an integer between -149 and +127
- Infinity +0 inclusive. Otherwise the value placed into the element
-0 +1 of register VRT may vary between implementations,
and between different executions on the same
+0 +1
implementation.
+Infinity +Infinity
NaN QNaN The result for various special cases of the source
value is given below.
Special Registers Altered:
Value Result
None
- Infinity QNaN
<0 QNaN
-0 - Infinity
+0 - Infinity
+Infinity +Infinity
NaN QNaN
For each integer value i from 0 to 3, do the following. For each integer value i from 0 to 3, do the following.
The single-precision floating-point estimate of the The single-precision floating-point estimate of the
reciprocal of single-precision floating-point reciprocal of the square root of single-precision
element i in VRB is placed into word element i of floating-point element i in VRB is placed into word
VRT. element i of VRT.
Unless the single-precision floating-point result of Let x be any single-precision floating-point value.
computing the reciprocal of a value would be a zero, Unless the single-precision floating-point result of
an infinity, or a QNaN, the estimate has a relative error computing the reciprocal of the square root of x would
in precision no greater than one part in 4096. be a zero, an infinity, or a QNaN, the estimate has a
relative error in precision no greater than one part in
Note that results may vary between implementations, 4096.
and between different executions on the same
implementation. Note that results may vary between implementations,
and between different executions on the same
The result for various special cases of the source implementation.
value is given below.
The result for various special cases of the source
Value Result value is given below.
- Infinity -0
-0 - Infinity Value Result
+0 + Infinity - Infinity QNaN
+Infinity +0 <0 QNaN
NaN QNaN -0 - Infinity
+0 + Infinity
Special Registers Altered: +Infinity +0
None NaN QNaN
Vector AES Inverse Cipher VX-form Vector AES Inverse Cipher Last VX-form
vncipher VRT,VRA,VRB vncipherlast VRT,VRA,VRB
State VR[VRA]
VR[VRT] SubBytes(State)
do i = 0 to 7 do i = 0 to 3
prod.bit[0:30] 0 prod[i].bit[0:62] 0
srcA VR[VRA].halfword[i] srcA VR[VRA].word[i]
srcB VR[VRB].halfword[i] srcB VR[VRB].word[i]
do j = 0 to 15 do j = 0 to 31
do k = 0 to j do k = 0 to j
gbit srcA.bit[k] & srcB.bit[j-k] gbit srcA.bit[k] & srcB.bit[j-k]
prod[i].bit[j] prod[i].bit[j] ^ gbit prod[i].bit[j] prod[i].bit[j] ^ gbit
end end
end end
do j = 16 to 30 do j = 32 to 62
do k = j-15 to 15 do k = j-31 to 31
gbit (srcA.bit[k] & srcB.bit[j-k]) gbit (srcA.bit[k] & srcB.bit[j-k])
prod[i].bit[j] prod[i].bit[j] ^ gbit prod[i].bit[j] prod[i].bit[j] ^ gbit
end end
end end
end end
VR[VRT].word[0] 0b0 » (prod[0] ^ prod[1]) VR[VRT].dword[0] 0b0 » (prod[0] ^ prod[1])
VR[VRT].word[1] 0b0 » (prod[2] ^ prod[3]) VR[VRT].dword[1] 0b0 » (prod[2] ^ prod[3])
VR[VRT].word[2] 0b0 » (prod[4] ^ prod[5])
VR[VRT].word[3] 0b0 » (prod[6] ^ prod[7]) For each integer value i from 0 to 3, do the following.
Let prod[i] be the 63-bit result of a binary
For each integer value i from 0 to 7, do the following. polynomial multiplication of the contents of word
Let prod[i] be the 31-bit result of a binary element i of VR[VRA] and the contents of word
polynomial multiplication of the contents of element i of VR[VRB].
halfword element i of VR[VRA] and the contents of
halfword element i of VR[VRB]. For each integer value i from 0 to 1, do the following.
The exclusive-OR of prod[2×i] and prod[2×i+1] is
For each integer value i from 0 to 3, do the following. placed in bits 1:63 of doubleword element i of
The exclusive-OR of prod[2×i] and prod[2×i+1] is VR[VRT]. Bit 0 of doubleword element i of VR[VRT]
placed in bits 1:31 of word element i of VR[VRT]. is set to 0.
Bit 0 of word element i of VR[VRT] is set to 0.
Special Registers Altered:
Special Registers Altered: None
None
do i = 0 to 15
indexA VR[VRC].byte[i].bit[0:3]
indexB VR[VRC].byte[i].bit[4:7]
src1 VR[VRA].byte[indexA]
src2 VR[VRB].byte[indexB]
VSR[VRT].byte[i] src1 ^ src2
end
Vector Gather Bits by Bytes by The contents of bit 1 of each byte of doubleword
Doubleword VX-form element i of VR[VRB] are concatenated and placed
into byte 1 of doubleword element i of VR[VRT].
vgbbd VRT,VRB
The contents of bit 2 of each byte of doubleword
4 VRT /// VRB 1292 element i of VR[VRB] are concatenated and placed
0 6 11 16 21 31
into byte 2 of doubleword element i of VR[VRT].
do i = 0 to 1
do j = 0 to 7 The contents of bit 3 of each byte of doubleword
do k = 0 to 7 element i of VR[VRB] are concatenated and placed
b VSR[VRB].dword[i].byte[k].bit[j] into byte 3 of doubleword element i of VR[VRT].
VSR[VRT].dword[i].byte[j].bit[k] b
end The contents of bit 4 of each byte of doubleword
end element i of VR[VRB] are concatenated and placed
end into byte 4 of doubleword element i of VR[VRT].
Let src be the contents of VR[VRB], composed of two The contents of bit 5 of each byte of doubleword
doubleword elements numbered 0 and 1. element i of VR[VRB] are concatenated and placed
into byte 5 of doubleword element i of VR[VRT].
Let each doubleword element be composed of eight
bytes numbered 0 through 7. The contents of bit 6 of each byte of doubleword
element i of VR[VRB] are concatenated and placed
An 8-bit × 8-bit bit-matrix transpose is performed on
into byte 6 of doubleword element i of VR[VRT].
the contents of each doubleword element of VR[VRB]
(see Figure 104). The contents of bit 7 of each byte of doubleword
element i of VR[VRB] are concatenated and placed
For each integer value i from 0 to 1, do the following,
into byte 7 of doubleword element i of VR[VRT].
The contents of bit 0 of each byte of doubleword
element i of VR[VRB] are concatenated and placed
Special Registers Altered:
into byte 0 of doubleword element i of VR[VRT].
None
Vector Count Leading Zeros Byte VX-form Vector Count Leading Zeros Word
VX-form
vclzb VRT,VRB
vclzw VRT,VRB
4 VRT /// VRB 1794
0 6 11 16 21 31 4 VRT /// VRB 1922
0 6 11 16 21 31
if MSR.VEC=0 then Vector_Unavailable()
if MSR.VEC=0 then Vector_Unavailable()
do i = 0 to 15
n 0 do i = 0 to 3
do while n < 8 n 0
if VR[VRB].byte[i].bit[n] = 0b1 then leave do while n < 32
n n + 1 if VR[VRB].word[i].bit[n] = 0b1 then leave
end n n + 1
VSR[VRT].byte[i] n end
end VSR[VRT].word[i] n
end
For each integer value i from 0 to 15, do the following.
A count of the number of consecutive zero bits For each integer value i from 0 to 3, do the following.
starting at bit 0 of byte element i of VR[VRB] is A count of the number of consecutive zero bits
placed into byte element i of VR[VRT]. This starting at bit 0 of word element i of VR[VRB] is
number ranges from 0 to 8, inclusive. placed into word element i of VR[VRT]. This
number ranges from 0 to 32, inclusive.
Special Registers Altered:
None Special Registers Altered:
None
For each integer value i from 0 to 7, do the following. For each integer value i from 0 to 1, do the following.
A count of the number of consecutive zero bits A count of the number of consecutive zero bits
starting at bit 0 of halfword element i of VR[VRB] is starting at bit 0 of doubleword element i of
placed into halfword element i of VR[VRT]. This VR[VRB] is placed into doubleword element i of
number ranges from 0 to 16, inclusive. VR[VRT]. This number ranges from 0 to 64,
inclusive.
Special Registers Altered:
None Special Registers Altered:
None
Vector Count Trailing Zeros Byte VX-form Vector Count Trailing Zeros Word
VX-form
vctzb VRT,VRB
vctzw VRT,VRB
4 VRT 28 VRB 1538
0 6 11 16 21 31 4 VRT 30 VRB 1538
0 6 11 16 21 31
if MSR.VEC=0 then Vector_Unavailable()
if MSR.VEC=0 then Vector_Unavailable()
do i = 0 to 15
n0 do i = 0 to 3
do while n < 8 n 0
if VR[VRB].byte[i].bit[7-n] = 0b1 then leave do while n < 32
nn+1 if VR[VRB].word[i].bit[31-n] = 0b1 then leave
end n n+1
VR[VRT].byte[i] Chop(EXTZ(n), 8) end
end VR[VRT].word[i] Chop(EXTZ(n), 32)
end
For each integer value i from 0 to 15, do the following.
A count of the number of consecutive zero bits For each integer value i from 0 to 3, do the following.
starting at bit 7 of byte element i of VR[VRB] is A count of the number of consecutive zero bits
placed into byte element i of VR[VRT]. This number starting at bit 31 of word element i of VR[VRB] is
ranges from 0 to 8, inclusive. placed into word element i of VR[VRT]. This number
ranges from 0 to 32, inclusive.
Special Registers Altered:
None Special Registers Altered:
None
count 0 count 0
do while count < 16 do while count < 16
if (VR[VRB].byte[count].bit[7]=1) break if (VR[VRB].byte[15-count].bit[7]=1) break
count count + 1 count count + 1
end end
Let count be the number of contiguous leading byte Let count be the number of contiguous trailing byte
elements in VR[VRB] having a zero least-significant bit. elements in VR[VRB] having a zero least-significant bit.
Let index be the contents of bits 60:63 of GPR[RA]. Let index be the contents of bits 60:63 of GPR[RA].
The contents of byte element index of VR[VRB] are The contents of byte element 15-index of VR[VRB] are
placed into bits 56:63 of GPR[RT]. placed into bits 56:63 of GPR[RT].
The contents of bits 0:55 of GPR[RT] are set to 0. The contents of bits 0:55 of GPR[RT] are set to 0.
Let index be the contents of bits 60:63 of GPR[RA]. Let index be the contents of bits 60:63 of GPR[RA].
The contents of byte elements index:index+1 of The contents of byte elements 14-index:15-index of
VR[VRB] are placed into bits 48:63 of GPR[RT]. VR[VRB] are placed into bits 48:63 of GPR[RT].
The contents of bits 0:47 of GPR[RT] are set to 0. The contents of bits 0:47 of GPR[RT] are set to 0.
If the value of index is greater than 14, the results are If the value of index is greater than 14, the results are
undefined. undefined.
Let index be the contents of bits 60:63 of GPR[RA]. Let index be the contents of bits 60:63 of GPR[RA].
The contents of byte elements index:index+3 of The contents of byte elements index:index+3 of
VR[VRB] are placed into bits 32:63 of GPR[RT]. VR[VRB] are placed into bits 32:63 of GPR[RT].
The contents of bits 0:31 of GPR[RT] are set to 0. The contents of bits 0:31 of GPR[RT] are set to 0.
If the value of index is greater than 12, the results are If the value of index is greater than 12, the results are
undefined. undefined.
do i = 0 to 1 do i = 0 to 3
n 0 n 0
do j = 0 to 63 do j = 0 to 31
n n + VR[VRB].dword[i].bit[j] n n + VR[VRB].word[i].bit[j]
end end
VSR[VRT].dword[i] n VSR[VRT].word[i] n
end end
For each integer value i from 0 to 1, do the following. For each integer value i from 0 to 3, do the following.
A count of the number of bits set to 1 in A count of the number of bits set to 1 in word
doubleword element i of VR[VRB] is placed into element i of VR[VRB] is placed into word element i
doubleword element i of VR[VRT]. This number of VR[VRT]. This number ranges from 0 to 32,
ranges from 0 to 64, inclusive. inclusive.
Vector Bit Permute Doubleword VX-form Vector Bit Permute Quadword VX-form
vbpermd VRT,VRA,VRB
vbpermq VRT,VRA,VRB
4 VRT VRA VRB 1484
0 6 11 16 21 31 4 VRT VRA VRB 1356
0 6 11 16 21 31
if MSR.VEC=0 then Vector_Unavailable()
if MSR.VEC=0 then Vector_Unavailable()
do i = 0 to 1
do j = 0 to 7 do i = 0 to 15
index VR[VRB].dword[i].byte[j] index VR[VRB].byte[i]
if index < 64 then if index < 128 then
perm.bit[j] VR[VRA].dword[i].bit[index] perm.bit[i] VR[VRA].bit[index]
else
else
perm.bit[j] 0
end
perm.bit[i] 0
end
VR[VRT].dword[i] EXTZ64(perm) VR[VRT].dword[0] Chop(EXTZ(perm),64)
end VR[VRT].dword[1] 0x0000_0000_0000_0000
For each integer value i from 0 to 1, and for each For each integer value i from 0 to 15, do the following.
integer value j from 0 to 7, do the following. Let index be the contents of byte element i of
Let index be the contents of byte sub-element j of VR[VRB].
doubleword element i of VR[VRB].
If index is less than 128, then the contents of bit
If index is less than 64, then the contents of bit index of VR[VRA] are placed into bit 48+i of double-
index of doubleword i of VR[VRA] are placed into word element i of VR[VRT]. Otherwise, bit 48+i of
doubleword element i of VR[VRT] is set to 0.
bit 56+j of doubleword element i of VR[VRT].
Otherwise, bit 56+j of doubleword element i of The contents of bits 0:47 of VR[VRT] are set to 0.
VR[VRT] is set to 0. The contents of bits 64:127 of VR[VRT] are set to 0.
Special Registers Altered:
The contents of bits 0:55 of doubleword element i
None
of VR[VRT] are set to 0.
Programming Note
The fact that the permuted bit is 0 if the corresponding index value exceeds 127 permits the permuted bits to be
selected from a 256-bit quantity, using a single index register. For example, assume that the 256-bit quantity Q,
from which the permuted bits are to be selected, is in registers v2 (high-order 128 bits of Q) and v3 (low-order
128 bits of Q), that the index values are in register v1, with each byte of v1 containing a value in the range 0:255,
and that each byte of register v4 contains the value 128. The following code sequence selects eight permuted
bits from Q and places them into the low-order byte of v6.
Source operands with sign codes of 0b1010, 0b1100, Negative results are encoded with a sign code of
0b1110, and 0b1111 are interpreted as positive values. 0b1101.
Let src1 be the decimal integer value in VR[VRA]. Let src1 be the decimal integer value in VR[VRA].
Let src2 be the decimal integer value in VR[VRB]. Let src2 be the decimal integer value in VR[VRB].
If the unbounded result is equal to zero, do the If the unbounded result is equal to zero, do the
following. following.
If PS=0, the sign code of the result is set to 0b1100. If PS=0, the sign code of the result is set to 0b1100.
If PS=1, the sign code of the result is set to 0b1111. If PS=1, the sign code of the result is set to 0b1111.
If the unbounded result is greater than zero, do the If the unbounded result is greater than zero, do the
following. following.
If PS=0, the sign code of the result is set to 0b1100. If PS=0, the sign code of the result is set to 0b1100.
If PS=1, the sign code of the result is set to 0b1111. If PS=1, the sign code of the result is set to 0b1111.
If the operation overflows, CR field 6 is set to If the operation overflows, CR field 6 is set to
0b0101. Otherwise, CR field 6 is set to 0b0100. 0b0101. Otherwise, CR field 6 is set to 0b0100.
If the unbounded result is less than zero, do the If the unbounded result is less than zero, do the
following. following.
The sign code of the result is set to 0b1101. The sign code of the result is set to 0b1101.
If the operation overflows, CR field 6 is set to If the operation overflows, CR field 6 is set to
0b1001. Otherwise, CR field 6 is set to 0b1000. 0b1001. Otherwise, CR field 6 is set to 0b1000.
The low-order 31 digits of the magnitude of the result The low-order 31 digits of the magnitude of the result
are placed in bits 0:123 of VR[VRT]. are placed in bits 0:123 of VR[VRT].
The sign code is placed in bits 124:127 of VR[VRT]. The sign code is placed in bits 124:127 of VR[VRT].
If either src1 or src2 is an invalid encoding of a 31-digit If either src1 or src2 is an invalid encoding of a 31-digit
signed decimal value, the result is undefined and CR signed decimal value, the result is undefined and CR
field 6 is set to 0b0001. field 6 is set to 0b0001.
Programming Note
bcdsub. vTmp,vA,vB,0 can be used to compare
decimal operands vA and vB. Bits 0:2 of CR field 6
will be set to indicate vA is less than vB (LT), vA is
greater than vB (GT), and vA is equal to vB (EQ).
Decimal Convert From National VX-form Let src be the national decimal value in VR[VRB].
bcdcfn. VRT,VRB,PS src is placed in VR[VRT] in packed decimal format.
4 VRT 7 VRB 1 PS 385
0 6 11 16 21 22 23 31
A valid encoding of a national decimal value requires
the following.
if MSR.VEC=0 then Vector_Unavailable() – The contents of halfword 7 (sign code) must be
either 0x002B or 0x002D.
src_sign (VR[VRB].hword[7] = 0x002D) – The contents of halfwords 0 to 6 must be in the
eq_flag 1 range 0x0030 to 0x0039.
/* check for valid sign */
inv_flag (VR[VRB].hword[7] != 0x002B) & National decimal values having a sign code of 0x002B
(VR[VRB].hword[7] != 0x002D) are interpreted as positive values.
CR.bit[56] inv_flag ? 0b0 : lt_flag CR field 6 is set to reflect src compared to zero.
CR.bit[57] inv_flag ? 0b0 : gt_flag
CR.bit[58] inv_flag ? 0b0 : eq_flag If src is an invalid encoding of a national decimal
CR.bit[59] inv_flag
value, the contents of VR[VRT] are undefined and CR
field 6 is set to 0b0001.
Decimal Convert From Zoned VX-form Zoned decimal values having a sign code of 0x0,
0x1, 0x2, 0x3, 0x8, 0x9, 0xA, or 0xB are interpreted
bcdcfz. VRT,VRB,PS
as positive values.
4 VRT 6 VRB 1 PS 385
0 6 11 16 21 22 23 31 Zoned decimal values having a sign code of 0x4,
0x5, 0x6, 0x7, 0xC, 0xD, 0xE, or 0xF are interpreted
if MSR.VEC=0 then Vector_Unavailable() as negative values.
/* check for valid sign */
When PS=1, do the following.
inv_flag ((VR[VRB].byte[15].nibble[0] < 0xA) & (PS=1)) |
(VR[VRB].byte[15].nibble[1] > 0x9)
A valid encoding of a zoned decimal source
operand requires the following.
/* check for valid digits */ – The contents of bits 0:3 of byte 15 (sign code)
MIN (PS=0) ? 0x30 : 0xF0 must be a value in the range 0xA to 0xF.
MAX (PS=0) ? 0x39 : 0xF9 – The contents of bits 0:3 of bytes 0 to 14 must
do i = 0 to 14 be the value 0xF.
inv_flag inv_flag | (VR[VRB].byte[i] < MIN) – The contents of bits 4:7 of bytes 0 to 15 must
| (VR[VRB].byte[i] > MAX)
be a value in the range 0x0 to 0x9.
end
if PS=0 then
Zoned decimal source operands having a sign
src_sign VR[VRB].nibble[30].bit[1] code of 0xA, 0xC, 0xE, or 0xF are interpreted as
else positive values.
src_sign (VR[VRB].nibble[30] = 0b1011) |
(VR[VRB].nibble[30] = 0b1101) Zoned decimal source operands having a sign
code of 0xB or 0xD are interpreted as negative
eq_flag 1 values.
result.nibble[31] (src_sign=0) ? 0xC : 0xD For each integer value i from 0 to 15,
The contents of nibble 1 of byte element i of src
VR[VRT] inv_flag ? undefined : result are placed into nibble element i+15 of VR[VRT].
CR.bit[56] inv_flag ? 0b0 : lt_flag CR field 6 is set to reflect src compared to zero.
CR.bit[57] inv_flag ? 0b0 : gt_flag
CR.bit[58] inv_flag ? 0b0 : eq_flag If src is an invalid encoding of a zoned decimal value,
CR.bit[59] inv_flag the contents of VR[VRT] are undefined and CR field 6 is
set to 0b0001.
Let src be the zoned decimal value in VR[VRB].
Special Registers Altered:
src is placed in VR[VRT] in packed decimal format.
CR field 6
When PS=0, do the following.
A valid encoding of a zoned decimal value
requires the following.
– The contents of bits 0:3 of byte 15 (sign code)
can be any value in the range 0x0 to 0xF.
– The contents of bits 0:3 of bytes 0 to 14 must
be the value 0x3.
– The contents of bits 4:7 of bytes 0 to 15 must
be a value in the range 0x0 to 0x9.
Decimal Convert To National VX-form For each integer value i from 0 to 6, do the following.
The value 0x003 is placed into nibbles 0:2 of
bcdctn. VRT,VRB
halfword element i of VR[VRT].
4 VRT 5 VRB 1 / 385
0 6 11 16 21 22 23 31 The contents of nibble element i+24 of VR[VRB] are
placed into nibble 3 of halfword element i of
if MSR.VEC=0 then Vector_Unavailable() VR[VRT].
ox_flag 0 The contents of halfword element 7 (i.e., sign code) of
do i = 0 to 23
VR[VRT] are set to 0x002B for positive values and to
ox_flag ox_flag | (VR[VRB].nibble[i] != 0x0)
end
0x002D for negative values.
inv_flag (VR[VRB].nibble[31] < 0xA) CR field 6 is set to reflect src compared to zero,
do i = 0 to 30 including whether or not src is too large to be
inv_flag inv_flag | (VR[VRB].nibble[i] > 0x9) represented in national decimal format.
end
If src is an invalid encoding of a packed decimal value,
src_sign (VR[VRB].nibble[31] = 0xB) | the contents of VR[VRT] are undefined and CR field 6 is
src.sign (VR[VRB].nibble[31] = 0xD) set to 0b0001.
eq_flag (VR[VRB].nibble[0:30] = 0)
lt_flag (eq_flag=0) & (src_sign=1)
Special Registers Altered:
gt_flag (eq_flag=0) & (src_sign=0) CR field 6
do i = 0 to 6
result.hword[i].nibble[0:2] 0x003
result.hword[i].nibble[3] VR[VRB].nibble[i+24]
end
src_sign (VR[VRB].nibble[31] = 0xB) | Negative zoned decimal results are returned with
(VR[VRB].nibble[31] = 0xD) a sign code of 0xD.
eq_flag (VR[VRB].nibble[0:30] = 0) For each integer value i from 0 to 15, do the following.
lt_flag (eq_flag=0) & (src_sign=1) The rightmost nibble of each digit i of the zoned
gt_flag (eq_flag=0) & (src_sign=0)
decimal result is set to the contents of nibble i+15
do i = 0 to 14
of src.
result.byte[i].nibble[0] (PS=0) ? 0x3 : 0xF
result.byte[i].nibble[1] VR[VRB].nibble[i+15] The result is placed into VR[VRT].
end
CR field 6 is set to reflect src compared to zero,
if src.sign=0 then including whether or not src is too large to be
result.byte[15].nibble[0] (PS=0) ? 0x3 : 0xC represented in zoned decimal format.
else
result.byte[15].nibble[0] (PS=0) ? 0x7 : 0xD If src is an invalid encoding of a packed decimal value,
the contents of VR[VRT] are undefined and CR field 6 is
result.byte[15].nibble[1] VR[VRB].nibble[30]
set to 0b0001.
VR[VRT] inv_flag ? undefined : result
Special Registers Altered:
CR.bit[56] inv_flag ? 0b0 : lt_flag CR field 6
CR.bit[57] inv_flag ? 0b0 : gt_flag
CR.bit[58] inv_flag ? 0b0 : eq_flag
CR.bit[59] inv_flag | ox_flag
For PS=0, the contents of nibble element 31 (i.e., sign src is placed into VR[VRT] in signed integer format.
code) of VR[VRT] are set to 0xC for values greater than
or equal to 0 and to 0xD for values less than 0. A valid encoding of a signed packed decimal value
requires the following.
For PS=1, the contents of nibble element 31 (i.e., sign – The contents of nibble 31 (sign code) must be a
code) of VR[VRT] are set to 0xF for values greater than value in the range 0xA to 0xF.
or equal to 0 and to 0xD for values less than 0. – The contents of each nibble 0-30 must be a value
in the range 0x0 to 0x9.
If the signed integer value in VR[VRB] is greater than
1031-1 or less than -1031-1, the value is too large to be Packed decimal values with sign codes of 0xA, 0xC,
represented in packed decimal format, and the 0xE, or 0xF are interpreted as positive values.
contents of VR[VRT] are undefined.
Packed decimal values with sign codes of 0xB or 0xD
CR field 6 is set to reflect src compared to zero and are interpreted as negative values.
whether or not src is too large in magnitude to be
represented in packed decimal format. CR field 6 is set to reflect src compared to zero.
The decimal value in VR[VRA] is placed into VR[VRT] Packed decimal values with sign codes of 0xA, 0xC,
with the sign code of the decimal value in VR[VRB]. 0xE, or 0xF are interpreted as positive values.
CR field 6 is set to reflect the result compared to zero. Packed decimal values with sign codes of 0xB or 0xD
are interpreted as negative values.
If either the decimal value in VR[VRA] or the decimal
value in VR[VRB] is an invalid encoding, the contents of If src is negative, src is placed into VR[VRT] with the
VR[VRT] are undefined and CR field 6 is set to 0b0001. sign code set to 0xD.
Special Registers Altered: If src is positive and PS=0, src is placed into VR[VRT]
CR field 6 with the sign code set to 0xC.
Decimal Shift VX-form Let n be the signed integer value in byte element 7 of
VR[VRA].
bcds. VRT,VRA,VRB,PS
Decimal Unsigned Shift VX-form Let n be the signed integer value in byte element 7 of
VR[VRA].
bcdus. VRT,VRA,VRB
eq_flag (VR[VRB].nibble[0:31] = 0) If n is less than zero, src is shifted right -n digits. Zeros
gt_flag (eq_flag=0) are supplied to vacated digits on the left.
if (n >si 0) then do // shift left The shifted result is placed into VR[VRT].
shcnt (n<33) ? n : 32
src.nibble[0:31] VR[VRB] CR field 6 is set to reflect src compared to zero,
src.nibble[32:63] DUP(0b0000,32) including whether or not significant digits were shifted
result src.nibble[shcnt:shcnt+31] out when the shift count is positive (i.e., left shift
ox_flag (shcnt > 0) & (src.nibble[0:shcnt-1] != 0)
operation).
end
else do // shift right
shcnt ((¬n+1)<33) ? (¬n+1) : 32
If src is an invalid encoding of a packed decimal value,
src.nibble[0:31] DUP(0b0000,32) the contents of VR[VRT] are undefined and CR field 6 is
src.nibble[32:63] VR[VRB] set to 0b0001.
result src.nibble[32-shcnt:63-shcnt]
ox_flag 0 Special Registers Altered:
end CR field 6
CR.bit[56] 0b0
CR.bit[57] inv_flag ? 0b0 : gt_flag
CR.bit[58] inv_flag ? 0b0 : eq_flag
CR.bit[59] inv_flag | ox_flag
Decimal Shift and Round VX-form Let n be the signed integer value in byte element 7 of
VR[VRA].
bcdsr. VRT,VRA,VRB,PS
CR.bit[56] inv_flag ? 0b0 : lt_flag If src is an invalid encoding of a packed decimal value,
CR.bit[57] inv_flag ? 0b0 : gt_flag the contents of VR[VRT] are undefined and CR field 6 is
CR.bit[58] inv_flag ? 0b0 : eq_flag
set to 0b0001.
CR.bit[59] inv_flag | ox_flag
Decimal Truncate VX-form Let length be the integer value in bits 48:63 of VR[VRA].
bcdtrunc. VRT,VRA,VRB,PS Let src be the signed decimal value in VR[VRB].
4 VRT VRA VRB 1 PS 257
0 6 11 16 21 22 23 31
A valid encoding of a packed decimal source operand
requires the following.
if MSR.VEC=0 then Vector_Unavailable() – The contents of nibble 31 (sign code) must be a
value in the range 0xA to 0xF.
inv_flag (VR[VRB].nibble[31] < 0xA) – The contents of each nibble 0-30 must be a value
do i = 0 to 30 in the range 0x0 to 0x9.
inv_flag inv_flag | (VR[VRB].nibble[i] > 0x9)
end
Packed decimal values with sign codes of 0xA, 0xC,
length VR[VRA].bit[48:63] 0xE, or 0xF are interpreted as positive values.
ox_flag 0
Packed decimal values with sign codes of 0xB or 0xD
src_sign (VR[VRB].nibble[31] = 0xB) | are interpreted as negative values.
(VR[VRB].nibble[31] = 0xD)
If src is negative, the sign code of the result is set to
eq_flag (VR[VRB].nibble[0:30] = 0) 0b1101.
lt_flag src_sign & ¬eq_flag
gt_flag ¬src_sign & ¬eq_flag
If src is positive, the sign code of the result is set to
if length < 31 then do 0b1100 if PS=0 and is set to 0b1111 if PS=1.
do i = 0 to 30-length
if VR[VRB].nibble[i]!=0b0000 then ox_flag 1 src is copied into VR[VRT] with the leftmost 31-length
result.nibble[i] 0b0000 digits each set to 0b0000. If any of the leftmost
end 31-length digits of the signed decimal value in VR[VRB]
if length > 0 then do are non-zero, an overflow occurs.
do i = 31-length to 30
result.nibble[i] VR[VRB].nibble[i] CR field 6 is set to reflect src compared to zero,
end including whether or not significant digits were
end
truncated.
end
else result.nibble[0:30] VR[VRB].nibble[0:30]
If src is an invalid encoding of a packed decimal value,
result.nibble[31] (src_sign=0) ? ((PS=0) ? 0xC : 0xF) : 0xD the contents of VR[VRT] are undefined and CR field 6 is
set to 0b0001.
VR[VRT] inv_flag ? undefined : result
Special Registers Altered:
CR.bit[56] inv_flag ? 0b0 : lt_flag CR field 6
CR.bit[57] inv_flag ? 0b0 : gt_flag
CR.bit[58] inv_flag ? 0b0 : eq_flag
CR.bit[59] inv_flag | ox_flag
Decimal Unsigned Truncate VX-form Let length be the integer value in bits 48:63 of VR[VRA].
bcdutrunc. VRT,VRA,VRB Let src be the unsigned decimal value in VR[VRB].
4 VRT VRA VRB 1 / 321
0 6 11 16 21 22 23 31
A valid encoding of a packed decimal source operand
requires the contents of each nibble 0-31 must be a
if MSR.VEC=0 then Vector_Unavailable() value in the range 0x0 to 0x9.
CR.bit[56] 0b0
CR.bit[57] inv_flag ? 0b0 : gt_flag
CR.bit[58] inv_flag ? 0b0 : eq_flag
CR.bit[59] inv_flag | ox_flag
VSCR (VRB)96:127
VRT 96
0 || (VSCR)
7.1 Introduction
VSR[0]
VSR[1]
…
…
VSR[62]
VSR[63]
0 127
SQ/UQ/QP/BCD
SD/UD/MD/DP 0 SD/UD/MD/DP 1
SW/UW/MW/SP 0 SW/UW/MW/SP 1 SW/UW/MW/SP 2 SW/UW/MW/SP 3
HP 0 HP 1 HP 2 HP 3 HP 4 HP 5 HP 6 HP 7
0 16 32 48 64 80 96 112 127
VSR[0] FPR[0]
VSR[1] FPR[1]
…
…
VSR[30] FPR[30]
VSR[31] FPR[31]
VSR[32]
VSR[33]
…
…
VSR[62]
VSR[63]
0 63 127
VSR[0]
VSR[1]
…
…
VSR[30]
VSR[31]
VSR[32] VR[0]
VSR[33] VR[1]
…
…
VSR[62] VR[30]
VSR[63] VR[31]
0 127
Result Flags
Result Value Class
C FL FG FE FU
1 0 0 0 1 Quiet NaN
0 1 0 0 1 - Infinity
0 1 0 0 0 - Normalized Number
1 1 0 0 0 - Denormalized Number
1 0 0 1 0 - Zero
0 0 0 1 0 + Zero
1 0 1 0 0 + Denormalized Number
0 0 1 0 0 + Normalized Number
0 0 1 0 1 + Infinity
This section describes the floating-point arithmetic and A floating-point number consists of a signed exponent
exception model supported by Vector-Scalar and a signed significand. The quantity expressed by
Extension. Except for extensions to support 32-bit this number is the product of the significand and the
single-precision floating-point vector operations, the number 2exponent. Encodings are provided in the data
models are identical to that described in Chapter 4. format to represent finite numeric values, Infinity, and
Floating-Point Facility. values that are “Not a Number” (NaN). Operations
involving infinities produce results obeying traditional
The processor (augmented by appropriate software mathematical conventions. NaNs have no
support, where required) implements a floating-point mathematical interpretation. Their encoding permits a
system compliant with the ANSI/IEEE Standard variable diagnostic information field. NaNs might be
754-1985, IEEE Standard for Binary Floating-Point used to indicate such things as uninitialized variables
Arithmetic (hereafter referred to as the IEEE standard). and can be produced by certain invalid operations.
That standard defines certain required "operations"
(addition, subtraction, and so on). Herein, the term, There is one class of exceptional events that occur
floating-point operation, is used to refer to one of these during instruction execution that is unique to
required operations and to additional operations Vector-Scalar Extension and Floating-Point: the
defined (e.g., those performed by Multiply-Add or Floating-Point Exception. Floating-point exceptions are
Reciprocal Estimate instructions). A Non-IEEE mode is signaled with bits set in the FPSCR. They can cause
also provided. This mode, which is permitted to the system floating-point enabled exception error
produce results not in strict compliance with the IEEE handler to be invoked, precisely or imprecisely, if the
standard, allows shorter latency. proper control bits are set.
These instructions are divided into two categories. – Invalid Operation exception (VX)
SNaN (VXSNAN)
– computational instructions Infinity-Infinity (VXISI)
Infinity÷Infinity (VXIDI)
The computational instructions are those that Zero÷Zero (VXZDZ)
perform addition, subtraction, multiplication, Infinity×Zero (VXIMZ)
division, extracting the square root, rounding, Invalid Compare (VXVC)
conversion, comparison, and combinations of Software-Defined Condition (VXSOFT)
these operations. These instructions provide the Invalid Square Root (VXSQRT)
floating-point operations. There are two forms of Invalid Integer Convert (VXCVI)
computational instructions, scalar, which perform – Zero Divide exception (ZX)
a single floating-point operation, and vector, which – Overflow exception (OX)
perform either two double-precision floating-point – Underflow exception (UX)
operations or four single-precision operations. – Inexact exception (XX)
Computational instructions place status
information into the Floating-Point Status and Each floating-point exception, and each category of
Control Register. They are the instructions Invalid Operation exception, has an exception bit in the
described in Sections 7.6.1.3 through 7.6.1.8.2. FPSCR. In addition, each floating-point exception has
a corresponding enable bit in the FPSCR. See
– noncomputational instructions Section 7.2.2, “Floating-Point Status and Control
Register” on page 367 for a description of these
The noncomputational instructions are those that exception and enable bits, and Section 7.3.3 , “VSX
perform loads and stores, move the contents of a Floating-Point Execution Models” on page 384 for a
VSR to another floating-point register possibly detailed discussion of floating-point exceptions,
altering the sign, and select the value from one of including the effects of the enable bits.
two VSRs based on the value in a third VSR. The
S EXP FRACTION
0 1 6 15
Figure 109. Floating-point half-precision format
S EXP FRACTION
0 9 31
Figure 110. Floating-point single-precision format
S EXP FRACTION
01 12 63
Figure 111.Floating-point double-precision format
S EXP FRACTION
01 16 127
Figure 112.Floating-point quad-precision format (binary128)
Nmax (2-2-10) × 2156.6 × 104 (1-2-24) x 21283.4 x 1038 (1-2-53) x 210241.8 x 10308 (1-2-113) x 2163841.2 x 104932
Nmin 1.0 × 2-146.1 × 10-5 1.0 x 2-1261.2 x 10-38 1.0 x 2-10222.2 x 10-308 1.0 x 2-163823.4 x 10-4932
Dmin 1.0 × 2-246.0 × 10-8 1.0 x 2-1491.4 x 10-45 1.0 x 2-10744.9 x 10-324 1.0 x 2-164946.5 x 10-4966
Value is approximate
Dmin Smallest (in magnitude) representable denormalized number.
Nmax Largest (in magnitude) representable number.
Nmin Smallest (in magnitude) representable normalized number.
Table 3. IEEE floating-point fields
7.3.2.2 Value Representation can have a positive or negative sign. The sign of
zero is ignored by comparison operations (that is,
This architecture defines numeric and nonnumeric comparison regards +0 as equal to -0).
values representable within each of the three
supported formats. The numeric values are Denormalized numbers (DEN)
approximations to the real numbers and include the These are values that have a biased exponent
normalized numbers, denormalized numbers, and zero value of zero and a nonzero fraction value. They
values. The nonnumeric values representable are the are nonzero numbers smaller in magnitude than
infinities and the Not a Numbers (NaNs). The infinities the representable normalized numbers. They are
are adjoined to the real numbers, but are not numbers values in which the implied unit bit is 0.
themselves, and the standard rules of arithmetic do not Denormalized numbers are interpreted as follows:
hold when they are used in an operation. They are
related to the real numbers by order alone. It is DEN = (-1)s x 2Emin x (0.fraction)
possible however to define restricted operations
among numbers and infinities as defined below. The where Emin is the minimum representable
relative location on the real number line for each of the exponent value.
defined entities is shown in Figure 113.
-14 for half-precision
Figure 113.Approximation to real numbers -126 for single-precision
-1022 for double-precision
-INF -NOR -DEN –0 +0 +DEN +NOR +INF -16382 for quad-precision.
Infinities (INF)
These are values that have the maximum biased
The NaNs are not related to the numeric values or exponent value:
infinities by order or value but are encodings used to
convey diagnostic information such as the 31 in half-precision format
representation of uninitialized variables. 255 in single-precision format
2047 in double-precision format
The following is a description of the different 32767 in quad-precision format
floating-point values defined in the architecture:
and a zero fraction value. They are used to
Binary floating-point numbers approximate values greater in magnitude than the
Machine representable values used as maximum normalized value.
approximations to real numbers. Three categories
of numbers are supported: normalized numbers, Infinity arithmetic is defined as the limiting case of
denormalized numbers, and zero values. real arithmetic, with restricted operations defined
among numbers and infinities. Infinities and the
Normalized numbers (NOR) real numbers can be related by ordering in the
These are values that have a biased exponent affine sense:
value in the range:
-Infinity < every finite number < +Infinity
1 to 30 in half-precision format
1 to 254 in single-precision format Arithmetic on infinities is always exact and does
1 to 2046 in double-precision format not signal any exception, except when an
1 to 32766 in quad-precision format exception occurs due to the invalid operations as
described in Section 7.4.1 , “Floating-Point Invalid
They are values in which the implied unit bit is 1. Operation Exception” on page 390.
Normalized numbers are interpreted as follows:
For comparison operations, +Infinity compares
NOR = (-1)s x 2E x (1.fraction) equal to +Infinity and -Infinity compares equal to
-Infinity.
where s is the sign, E is the unbiased exponent,
and 1.fraction is the significand, which is Not a Numbers (NaNs)
composed of a leading unit bit (implied bit) and a These are values that have the maximum biased
fraction part. exponent value and a nonzero fraction value. The
sign bit is ignored (that is, NaNs are neither
Zero values (0) positive nor negative). If the high-order bit of the
These are values that have a biased exponent fraction field is 0, the NaN is a Signaling NaN;
value of zero and a fraction value of zero. Zeros otherwise it is a Quiet NaN.
When an arithmetic or rounding instruction produces Double-precision operands may be used as input for
an intermediate result which carries out of the double-precision scalar arithmetic operations.
significand, or in which the significand is nonzero but
Double-precision operands may be used as input for
has a leading zero bit, it is not a normalized number
single-precision scalar arithmetic operations when
and must be normalized before it is stored. For the
trapping on overflow and underflow exceptions is
carry-out case, the significand is shifted right one bit,
disabled.
with a one shifted into the leading significand bit, and
the exponent is incremented by one. For the
Single-precision operands may be used as input for
leading-zero case, the significand is shifted left while
double-precision and single-precision scalar arithmetic
decrementing its exponent by one for each bit shifted,
operations.
until the leading significand bit becomes one. The
Guard bit and the Round bit (see Section 7.3.3.1, “VSX Double-precision operands may be used as input for
Execution Model for IEEE Operations” on page 384) double-precision vector arithmetic operations.
participate in the shift with zeros shifted into the Round
bit. The exponent is regarded as if its range were Single-precision operands may be used as input for
unlimited. single-precison vector arithmetic operations.
After normalization, or if normalization was not Instructions are also provided for manipulations which
required, the intermediate result can have a nonzero do not require double-precision or single-precision. In
significand and an exponent value that is less than the addition, instructions are provided to access an integer
minimum value that can be represented in the format representation in GPRs.
specified for the result. In this case, the intermediate
result is said to be “Tiny” and the stored result is Half-Precision Operands
determined by the rules described in Section 7.4.4 ,
“Floating-Point Underflow Exception” on page 409. Instructions are provided to convert between
These rules can require denormalization. half-precision and single-precision formats for vector
data in VSRs and between half-precision and
A number is denormalized by shifting its significand double-precision formats for scalar data. Note that
right while incrementing its exponent by 1 for each bit scalar double-precision format is identical to scalar
shifted, until the exponent is equal to the format’s single-precision format.
minimum value. If any significant bits are lost in this
shifting process, “Loss of Accuracy” has occurred (See An instruction is provided to explicitly convert
Section 7.4.4 , “Floating-Point Underflow Exception” half-precision format operands in a VSR to
on page 409) and Underflow exception is signaled. single-precision format. Scalar single-precision
floating-point is enabled with six types of instruction.
Engineering Note
When denormalized numbers are operands of 1. VSX Scalar Convert Half-Precision to
multiply, divide, and square root operations, some Double-Precision format XX2-form
implementations might prenormalize the operands
internally before performing the operations. The half-precision floating-point value in the
rightmost halfword in doubleword element 0 of the
source VSR is placed into the doubleword
7.3.2.5 Data Handling and Precision element 0 of the target VSR in double-precision
format.
Scalar double-precision floating-point data is
represented in double-precision format in VSRs and 2. VSX Scalar Convert with round Double-Precision
storage. to Half-Precision format XX2-form
For xsresp or xsrsqrtesp, if the input value is Except for xsresp or xsrsqrtesp, any
finite and has an unbiased exponent greater than double-precision value can be used in
+127, the input value is interpreted as an Infinity. single-precision scalar arithmetic operations when
OE=0 and UE=0. When OE=1 or UE=1, or if the
6. Store VSX Scalar Single-Precision instruction is xsresp or xsrsqrtesp, source
operands must be respresentable in
stxsspx converts a single-precision value that is single-precision format.
in double-precision format to single-precision
format and stores that operand into storage. No Some implementations may execute
floating-point exceptions are caused by stxsspx. single-precision arithmetic instructions faster than
(The value being stored is effectively assumed to double-precision arithmetic instructions. Therefore,
be the result of an instruction of one of the if double-precision accuracy is not required,
preceding five types.) single-precision data and instructions should be
used.
When the result of a Load VSX Scalar Single-Precision
(lxsspx), a VSX Scalar Round to Single-Precision Programming Note
(xsrsp), or a VSX Scalar Single-Precision Arithmetic[1]
instruction is stored in a VSR, the low-order 29 bits of Both single-precision and double-precision forms
FRACTION are zero. are provided for most scalar floating-point
instructions. Some scalar floating-point instructions
Programming Note are only provided in double-precision form since
their operation is identical to the equivalent scalar
VSX Scalar Round to Single-Precision (xsrsp) is
single-precision operation.
provided to allow value conversion from
double-precision to single-precision with Of the operations for which only a double-precision
appropriate exception checking and rounding. form of the instruction is provided,
xsrsp should be used to convert double-precision
floating-point values to single-precision values – instructions that return the absolute value, the
prior to storing them into single format storage negative absolute value, or the negated value
elements or using them as operands for (xsnabsdp, xsabsdp, xsnegdp) can be used
single-precision arithmetic instructions. Values to perform these operations on scalar
produced by single-precision load and arithmetic single-precision operands,
instructions are already single-precision values
and can be stored directly into single format – instructions that perform a comparison
storage elements, or used directly as operands for (xscmpodp, xscmpudp) can be used to
single-precision arithmetic instructions, without perform these operations on scalar
preceding the store, or the arithmetic instruction, single-precision operands,
by an xsrsp.
– instructions that determine the maximum
(xsmaxdp) or minimum (xsmindp) can be
used to perform these operations on scalar
single-precision operands, and
xsrdpic, xvrdpic, and xvrspic can also cause VSX Scalar Integer Doubleword to
Inexact exception. Single-Precision Format Conversion[10]
instructions converts a 64-bit signed or unsigned can be regarded as having unbounded precision and
integer to a single-precision floating-point value exponent range. All but two groups of these
and returns the result in double-precision format. instructions normalize or denormalize the intermediate
result prior to rounding and then place the final result
VSX Vector Integer Doubleword to into the target element of the target VSR in either
Double-Precision Format Conversion[1] double-precision, single-precision, or quad-precision
instructions converts the 64-bit signed or unsigned format.
integer in each doubleword element in the source
vector operand to double-precision floating-point The scalar round to double-precision integer, vector
format. round to double-precision integer, and convert
double-precision to integer instructions with biased
VSX Vector Integer Word to Double-Precision exponents ranging from 1022 through 1074 are
Format Conversion[2] instructions converts the prepared for rounding by repetitively shifting the
32-bit signed or unsigned integer in each significand right one position and incrementing the
odd-numbered word element in the source vector biased exponent until it reaches a value of 1075.
operand to double-precision floating-point format. (Intermediate results with biased exponents 1075 or
larger are already integers, and with biased exponents
VSX Vector Integer Doubleword to 1021 or less round to zero.) After rounding, the final
Single-Precision Format Conversion[3] instructions result for round to double-precision integer instructions
convert the 64-bit signed or unsigned integer in is normalized and put in double-precision format, and,
each doubleword element in the source vector for the convert double-precision to integer instructions,
operand to single-precision floating-point format. is converted to a signed or unsigned integer.
VSX Vector Integer Word to Single-Precision The vector round to single-precision integer and vector
Format Conversion[4] instructions convert the convert single-precision to integer instructions with
32-bit signed or unsigned integer in each word biased exponents ranging from 126 through 178 are
element in the source vector operand to prepared for rounding by repetitively shifting the
single-precision floating-point format. significand right one position and incrementing the
biased exponent until it reaches a value of 179.
Rounding is performed using the rounding mode (Intermediate results with biased exponents 179 or
specificed in RN. Because of the limitations of the larger are already integers, and with biased exponents
source format, only an Inexact exception can be 125 or less round to zero.) After rounding, the final
generated. result for vector round to single-precision integer is
normalized and put in double-precision format, and for
7.3.2.6 Rounding vector convert single-precision to integer is converted
to a signed or unsigned integer.
The material in this section applies to operations that
have numeric operands (that is, operands that are not FR and FI generally indicate the results of rounding.
infinities or NaNs). Rounding the intermediate result of Each of the scalar instructions which rounds its
such an operation can cause an Overflow exception, intermediate result sets these bits. There are no vector
an Underflow exception, or an Inexact exception. The instructions that modify FR and FI. If the fraction is
remainder of this section assumes that the operation incremented during rounding, FR is set to 1, otherwise
causes no exceptions and that the result is numeric. FR is set to 0. If the result is inexact, FI is set to 1,
See Section 7.3.2.2, “Value Representation” and otherwise FI is set to zero. The scalar round to
Section 7.4, “VSX Floating-Point Exceptions” for the double-precision integer instructions are exceptions to
cases not covered here. this rule, setting FR and FI to 0. The scalar
double-precision estimate instructions set FR and FI to
The floating-point arithmetic, and rounding and undefined values. The remaining scalar floating-point
conversion instructions round their intermediate instructions do not alter FR and FI.
results. With the exception of the estimate instructions,
these instructions produce an intermediate result that
Programming Note
Round to Odd rounding mode is useful when the results of a Quad-Precision Arithmetic instruction are required
to be rounded to a shorter precision while avoiding a double rounding error. In this case, the rounding mode of
the Quad-Precision Arithmetic instruction is overridden as Round To Odd by setting the RO bit in the instruction
encoding to 1, then the result of that Quad-Precision Arithmetic instruction can be rounded to the desired shorter
precision using the rounding mode specified in RN by following with a VSX Scalar Round Quad-Precision to
Double-Extended-Precision for 15-bit exponent range and 64-bit significand precision, VSX Scalar Round
Quad-Precision to Double-Precision for 11-bit exponent range and 53-bit significand precision, or VSX Scalar
Round Quad-Precision to Single-Precision for 8-bit exponent range and 24-bit significand precision. For
example,
Let Z be the intermediate arithmetic result or the Figure 114 also summarizes the rounding actions for
operand of a convert operation. If Z can be floating-point intermediate result for all supported
represented exactly in the target format, the result in rounding modes.
all rounding modes is Z as represented in the target
format. If Z cannot be represented exactly in the target
format, let Z1 and Z2 bound Z as the next larger and
next smaller numbers representable in the target
format. Then Z1 or Z2 can be used to approximate the
result in the target format.
Z2 Z1 0 Z2 Z1
Z Z
Negative values Positive values
Otherwise, choose the value that is closer to Z (Z1 or Z2). In case of a tie, choose the one that is furthest
away from 0.
Otherwise, choose the value that is closer to Z (Z1 or Z2). In case of a tie, choose the one that is even (least
significant bit is 0).
Round to Odd
Choose Z if Z is representable in the target precision.
Otherwise, choose the value (Z1 or Z2) that is odd (least significant bit is 1).
0 1 1 – Round to Odd
If IR is exact, choose IR.
1 0 0 IR midway between NL and NH
Otherwise, choose NL, and if G=1, R=1, or X=1,
1 0 1 IR closer to NH the least-significant bit of the result is set to 1.
1 1 0 Four of the rounding modes are user-selectable
1 1 1 through RN.
S C L FRACTION X’
0 1 2 3 106
Double 53 54 OR of 55:105, X’
Single 24 25 OR of 26:105, X’
These exceptions, other than Invalid Operation – An Invalid Operation exception (SNaN) can be set
exception resulting from a Software-Defined Condition, with an Invalid Operation exception (Invalid
can occur during execution of computational Integer Convert) for convert to integer instructions.
instructions. An Invalid Operation exception resulting
from a Software-Defined Condition occurs when a When an exception occurs, the writing of a result to the
Move To FPSCR instruction sets VXSOFT to 1. target register can be suppressed, or a result can be
delivered, depending on the exception.
Each floating-point exception, and each category of
Invalid Operation exception, has an exception bit in the The writing of a result to the target register is
FPSCR. In addition, each floating-point exception has suppressed for the certain kinds of exceptions, based
a corresponding enable bit in the FPSCR. The on whether the instruction is a vector or a scalar
exception bit indicates the occurrence of the instruction, so that there is no possibility that one of the
corresponding exception. If an exception occurs, the operands is lost. For other kinds of exceptions and
corresponding enable bit governs the result produced also depending on whether the instruction is a vector
by the instruction and, in conjunction with the FE0 and or a scalar instruction, a result is generated and written
FE1 bits (see page 388), whether and how the system to the destination specified by the instruction causing
floating-point enabled exception error handler is the exception. The result can be a different value for
invoked. In general, the enabling specified by the the enabled and disabled conditions for some of these
enable bit is of invoking the system error handler, not exceptions. Table 7 lists the types of exceptions and
of permitting the exception to occur. The occurrence of indicates whether a result is written to the target VSR
an exception depends only on the instruction and its or suppressed.
inputs, not on the setting of any control bits. The only
deviation from this general rule is that the occurrence Scalar Vector
of an Underflow exception depends on the setting of On exception type... Instruction Instruction
the enable bit. Results Results
Enabled Invalid Operation suppressed suppressed
Enabled Zero Divide suppressed suppressed
Enabled Overflow written suppressed
Enabled Underflow written suppressed
Enabled Inexact written suppressed
Disabled Invalid Operation written written
Scalar Vector for altering them are described in Book III. The system
On exception type... Instruction Instruction floating-point enabled exception error handler is never
Results Results invoked because of a disabled floating-point exception.
The effects of the four possible settings of these bits
Disabled Zero Divide written written
are as follows.
Disabled Overflow written written
Disabled Underflow written written FE0 FE1 Description
Disabled Inexact written written 0 0 Ignore Exceptions Mode
Floating-point exceptions do not cause the
Table 7. Exception Types Result Suppression
system floating-point enabled exception
The subsequent sections define each of the error handler to be invoked.
floating-point exceptions and specify the action that is 0 1 Imprecise Nonrecoverable Mode
taken when they are detected. The system floating-point enabled excep-
tion error handler is invoked at some point
The IEEE standard specifies the handling of
at or beyond the instruction that caused the
exceptional conditions in terms of traps and trap
enabled exception. It may not be possible
handlers. In this architecture, an FPSCR exception
to identify the excepting instruction or the
enable bit of 1 causes generation of the result value
data that caused the exception. Results
specified in the IEEE standard for the trap enabled
produced by the excepting instruction might
case; the expectation is that the exception is detected
have been used by or might have affected
by software, which revises the result. An FPSCR
subsequent instructions that are executed
exception enable bit of 0 causes generation of the
before the error handler is invoked.
default result value specified for the trap disabled (or
no trap occurs or trap is not implemented) case. The 1 0 Imprecise Recoverable Mode
expectation is that the exception is not detected by The system floating-point enabled excep-
software, which uses the default result. The result to tion error handler is invoked at some point
be delivered in each case for each exception is at or beyond the instruction that caused the
described in the following sections. enabled exception. Sufficient information is
provided to the error handler for it to identify
The IEEE default behavior when an exception occurs the excepting instruction, the operands, and
is to generate a default value and not to notify correct the result. No results produced by
software. In this architecture, if the IEEE default the excepting instruction have been used
behavior when an exception occurs is required for all by or affected subsequent instructions that
exceptions, all FPSCR exception enable bits must be are executed before the error handler is
set to 0, and Ignore Exceptions Mode (see below) invoked.
should be used. In this case, the system floating-point
enabled exception error handler is not invoked, even if 1 1 Precise Mode
floating-point exceptions occur: software can inspect The system floating-point enabled excep-
the FPSCR exception bits, if necessary, to determine tion error handler is invoked precisely at the
whether exceptions have occurred. instruction that caused the enabled excep-
tion.
In this architecture, if software is to be notified that a
given kind of exception has occurred, the In all cases, the question of whether a floating-point
corresponding FPSCR exception enable bit must be result is stored, and what value is stored, is governed
set to 1, and a mode other than Ignore Exceptions by the FPSCR exception enable bits, as described in
Mode must be used. In this case, the system subsequent sections, and is not affected by the value
floating-point enabled exception error handler is of the FE0 and FE1 bits.
invoked if an enabled floating-point exception occurs.
The system floating-point enabled exception error In all cases in which the system floating-point enabled
handler is also invoked if a Move To FPSCR instruction exception error handler is invoked, all instructions
causes an exception bit and the corresponding enable before the instruction at which the system
bit both to be 1. The Move To FPSCR instruction is floating-point enabled exception error handler is
considered to cause the enabled exception. invoked have been completed, and no instruction after
the instruction at which the system floating-point
The FE0 and FE1 bits control whether and how the enabled exception error handler is invoked has begun
system floating-point enabled exception error handler execution. The instruction at which the system
is invoked if an enabled floating-point exception floating-point enabled exception error handler is
occurs. The location of these bits and the requirements invoked has completed if it is the excepting instruction,
Programming Note
In any of the three non-Precise modes, a
Floating-Point Status and Control Register
instruction can be used to force any exceptions,
because of instructions initiated before the
Floating-Point Status and Control Register
instruction, to be recorded in the FPSCR. (This
forcing is superfluous for Precise Mode.)
For VSX Scalar Floating-Point Arithmetic, VSX Scalar DP-SP Conversion, VSX Scalar Convert Floating-Point to
Integer, and VSX Scalar Round to Floating-Point Integer instructions:
4. FPRF is unchanged.
do the following.
do the following.
2. FR, FI, and C are not modified. FPCC is set to reflect unordered.
do the following.
1. VXSNAN is set to 1.
2. VSR[XT] is not modified.
3. FR and FI are set to 0. FPRF is not modified.
do the following.
1. VXSNAN is set to 1.
2. VSR[XT] is not modified.
3. FR, FI, and FPRF are not modified.
do the following.
4. FPRF is unchanged.
For the VSX Scalar Convert with round Double-Precision to Single-Precision format (xscvdpsp) instruction:
1. VXSNAN is set to 1.
2. The single-precision representation of a Quiet NaN is placed into word element 0 of VSR[XT]. The
contents of word elements 1-3 of VSR[XT] are undefined.
For the VSX Vector Single-Precision Arithmetic instructions, VSX Vector Single-Precision Maximum/Minimum
instructions, the VSX Vector Convert with round Double-Precision to Single-Precision format (xvcvdpsp)
instruction, and the VSX Vector Round to Single-Precision Integer instructions:
2. The single-precision representation of a Quiet NaN is placed into its respective word element of
VSR[XT].
For the VSX Scalar Double-Precision Arithmetic instructions, VSX Scalar Double-Precision Maximum/Minimum
instructions, the VSX Scalar Convert Single-Precision to Double-Precision format (xscvspdp) instruction, and
the VSX Scalar Round to Double-Precision Integer instructions:
2. The double-precision representation of a Quiet NaN is placed into doubleword element 0 of VSR[XT].
The contents of doubleword element 1 of VSR[XT] are undefined.
do the following.
3. FR and FI are set to 0. FPRF is set to indicate the class of the result (Quiet NaN).
1. VXSNAN is set to 1.
3. FR and FI are set to 0. FPRF is set to indicate the class of the result (Quiet NaN).
do the following.
For VSX Scalar Convert with round Quad-Precision to Double-Precision format [using round to Odd]
(xscvqpdp[o]), do the following.
1. VXSNAN is set to 1.
2. The double-precision Quiet NaN result is placed into doubleword element 0 of VSR[VRT+32] in
double-precision format.
3. FR and FI are set to 0. FPRF is set to indicate the class of the result (Quiet NaN).
For VSX Scalar Convert with round to zero Quad-Precision to Signed Doubleword format (xscvqpsdz), do the
following.
For VSX Scalar Convert with round to zero Quad-Precision to Signed Word format (xscvqpswz), do the
following.
2. 0x7FFF_FFFF is placed into word element 1 of VSR[VRT+32] if the quad-precision operand in VSR[VRB+32]
is a positive number or +Infinity.
0x8000_0000 is placed into word element 1 of VSR[VRT+32] if the quad-precision operand in VSR[VRB+32]
is a negative number, -Infinity, or NaN.
For VSX Scalar Convert with round to zero Quad-Precision to Unsigned Doubleword format (xscvqpudz), do
the following.
For VSX Scalar Convert with round to zero Quad-Precision to Unsigned Word format (xscvqpuwz), do the
following.
2. 0xFFFF_FFFF is placed into word element 1 of VSR[VRT+32] if the quad-precision operand in VSR[VRB+32]
is a positive number or +Infinity.
0x0000_0000 is placed into word element 1 of VSR[VRT+32] if the quad-precision operand in VSR[VRB+32]
is a negative number, -Infinity, or NaN.
For VSX Scalar Convert with round Double-Precision to Half-Precision format (xscvdphp), do the following.
1. VXSNAN is set to 1.
2. The half-precision representation of a Quiet NaN is placed into the rightmost halfword of doubleword
element 0 of VSR[XT]. The contents of the leftmost 3 halfwords of doubleword element 0 of VSR[XT] are
set to 0. The contents of doubleword element 1 of VSR[XT] are undefined.
3. FR and FI are set to 0. FPRF is set to indicate the class of the result (Quiet NaN).
For VSX Scalar Convert Half-Precision to Double-Precision format (xscvhpdp), do the following.
1. VXSNAN is set to 1.
2. The double-precision representation of a Quiet NaN is placed into doubleword element 0 of VSR[XT].
The contents of doubleword element 1 of VSR[XT] are undefined.
3. FR and FI are set to 0. FPRF is set to indicate the class of the result (Quiet NaN).
For the VSX Vector Double-Precision Arithmetic instructions, VSX Vector Double-Precision Maximum/Minimum
instructions, the VSX Vector Convert Single-Precision to Double-Precision format (xvcvspdp) instruction, and
the VSX Vector Round to Double-Precision Integer instructions:
2. The double-precision representation of a Quiet NaN is placed into its respective doubleword element
of VSR[XT].
For the VSX Scalar Convert with round to zero Double-Precision to Signed Doubleword format (xscvdpsxd)
instruction:
4. FPRF is undefined.
For the VSX Scalar Convert with round to zero Double-Precision to Unsigned Doubleword format (xscvdpuxd)
instruction:
4. FPRF is undefined.
For the VSX Scalar Convert with round to zero Double-Precision to Signed Word format (xscvdpsxw)
instruction:
2. 0x7FFF_FFFF is placed into word element 1 of VSR[XT] if the double-precision operand in doubleword
element 0 of VSR[XB] is a positive number or +Infinity.
0x8000_0000 is placed into word element 1 of VSR[XT] if the double-precision operand in doubleword
element 0 of VSR[XB] is a negative number, -Infinity, or NaN.
4. FPRF is undefined.
For the VSX Scalar Convert with round to zero Double-Precision to Unsigned Word format (xscvdpuxw)
instruction:
2. 0xFFFF_FFFF is placed into word element 1 of VSR[XT] if the double-precision operand in doubleword
element 0 of VSR[XB] is a positive number or +Infinity.
0x0000_0000 is placed into word element 1 of VSR[XT] if the double-precision operand in doubleword
element 0 of VSR[XB] is a negative number, -Infinity, or NaN.
4. FPRF is undefined.
For the VSX Vector Convert with round to zero Double-Precision to Signed Doubleword format (xvcvdpsxd)
instruction:
For the VSX Vector Convert with round to zero Double-Precision to Unsigned Doubleword format (xvcvdpuxd)
instruction:
For the VSX Vector Convert with round to zero Double-Precision to Signed Word format (xvcvdpsxw)
instruction:
2. 0x7FFF_FFFF is placed intoword element i×2 of VSR[XT] if the double-precision operand in doubleword
element i of VSR[XB] is a positive number or +Infinity.
0x8000_0000 is placed into word element i×2 of VSR[XT] if the double-precision operand in doubleword
element i of VSR[XB] is a negative number, -Infinity, or NaN.
For the VSX Vector Convert with round to zero Double-Precision to Unsigned Word format (xvcvdpuxw)
instruction:
2. 0xFFFF_FFFF is placed into word element i×2 of VSR[XT] if the double-precision operand in doubleword
element i of VSR[XB] is a positive number or +Infinity.
0x0000_0000 is placed into word element i×2 of VSR[XT] if the double-precision operand in doubleword
element i of VSR[XB] is a negative number, -Infinity, or NaN.
For the VSX Vector Convert with round to zero Single-Precision to Signed Doubleword format (xvcvspsxd)
instruction:
For the VSX Vector Convert with round to zero Single-Precision to Unsigned Doubleword format (xvcvspuxd)
instruction:
For the VSX Vector Convert with round to zero Single-Precision to Signed Word format (xvcvspsxw)
instruction:
2. 0x7FFF_FFFF is placed into word element i of VSR[XT] if the single-precision operand in word element i
of VSR[XB] is a positive number or +Infinity.
0x8000_0000 is placed into word element i of VSR[XT] if the single-precision operand in word element i
of VSR[XB] is a negative number, -Infinity, or NaN.
For the VSX Vector Convert with round to zero Single-Precision to Unsigned Word format (xvcvspuxw)
instruction:
2. 0xFFFF_FFFF is placed into word element i of VSR[XT] if the single-precision operand in the
corresponding word element 2×i of VSR[XB] is a positive number or +Infinity.
0x0000_0000 is placed into word element i of VSR[XT] if the single-precision operand in word element
2×i of VSR[XB] is a negative number, -Infinity, or NaN.
For VSX Vector Convert with round Single-Precision to Half-Precision format (xscvsphp), do the following.
1. VXSNAN is set to 1.
2. The half-precision representation of a Quiet NaN is placed into the rightmost halfword of its respective
word element of VSR[XT]. The contents of the leftmost halfword of its respective word element of
VSR[XT] are set to 0.
For VSX Vector Convert Half-Precision to Single-Precision format (xscvhpsp), do the following.
1. VXSNAN is set to 1.
2. The half-precision representation of a Quiet NaN is placed into the rightmost halfword of its respective
word element of VSR[XT]. The contents of the leftmost halfword of its respective word element of
VSR[XT] are set to 0.
A Zero Divide exception also occurs when a VSX Floating-Point Reciprocal Estimate[2] instruction or a VSX
Floating-Point Reciprocal Square Root Estimate[3] instruction is executed with an operand value of zero.
The action to be taken depends on the setting of the Zero Divide Exception Enable bit of the FPSCR.
do the following.
1. ZX is set to 1.
2. Update of VSR[XT] is suppressed.
3. FR and FI are set to 0.
4. FPRF is unchanged.
1. ZX is set to 1.
2. Update of VSR[VRT+32] is suppressed.
3. FR and FI are set to 0. FPRF is not modified.
do the following.
1. ZX is set to 1.
2. Update of VSR[XT] is suppressed for all vector elements.
3. FR and FI are unchanged.
4. FPRF is unchanged.
1. ZX is set to 1.
2. An Infinity, having a sign determined by the XOR of the signs of the source operands, is placed into
doubleword element 0 of VSR[XT] in double-precision format. The contents of doubleword element 1 of
VSR[XT] are undefined.
4. FPRF is set to indicate the class and sign of the result ( Infinity).
1. ZX is set to 1.
2. An Infinity, having a sign determined by the XOR of the signs of the source operands, is placed into
VSR[VRT+32] in quad-precision format.
3. FR and FI are set to 0. FPRF is set to indicate the class and sign of the result ( Infinity).
1. ZX is set to 1.
2. For each vector element causing a Zero Divide exception, an Infinity, having a sign determined by the
XOR of the signs of the source operands, is placed into its respective doubleword element of VSR[XT]
in double-precision format.
1. ZX is set to 1.
2. For each vector element causing a Zero Divide exception, an Infinity, having a sign determined by the
XOR of the signs of the source operands, is placed into its respective word element of VSR[XT] in
single-precision format.
For VSX Scalar Floating-Point Reciprocal Estimate[1] instructions and VSX Scalar Floating-Point Reciprocal
Square Root Estimate[2] instructions, do the following.
1. ZX is set to 1.
2. An Infinity, having the sign of the source operand, is placed into doubleword element 0 of VSR[XT] in
double-precision format. The contents of doubleword element 1 of VSR[XT] are undefined.
4. FPRF is set to indicate the class and sign of the result ( Infinity).
For the VSX Vector Reciprocal Estimate Double-Precision (xvredp) and VSX Vector Reciprocal Square Root
Estimate Double-Precision (xvrsqrtedp) instructions:
1. ZX is set to 1.
2. For each vector element causing a Zero Divide exception, an Infinity, having the sign of the source
operand, is placed into its respective doubleword element of VSR[XT] in double-precision format.
For the VSX Vector Reciprocal Estimate Single-Precision (xvresp) and VSX Vector Reciprocal Square Root
Estimate Single-Precision (xvrsqrtesp) instructions:
1. ZX is set to 1.
2. For each vector element causing a Zero Divide exception, an Infinity, having the sign of the source
operand, is placed into its respective word element of VSR[XT] in single-precision format.
The action to be taken depends on the setting of the Overflow Exception Enable bit of the FPSCR.
For the VSX Vector round and Convert Double-Precision to Single-Precision format (xscvdpsp) instruction:
1. OX is set to 1.
2. If the unbiased exponent of the normalized intermediate result is less than or equal to 318
(Emax+192), the exponent is adjusted by subtracting 192. Otherwise the result is undefined.
3. The adjusted rounded result is placed into word element 0 of VSR[XT] in single-precision format. The
contents of word elements 1-3 of VSR[XT] are undefined.
4. Unless the result is undefined, FPRF is set to indicate the class and sign of the result (±Normal
Number).
1. OX is set to 1.
3. The adjusted rounded result is placed into doubleword element 0 of VSR[XT] in double-precision
format. The contents of doubleword element 1 of VSR[XT] are undefined.
4. FPRF is set to indicate the class and sign of the result (Normal Number).
1. OX is set to 1.
3. The adjusted and rounded result is placed into doubleword element 0 of VSR[XT] in double-precision
format. The contents of doubleword element 1 of VSR[XT] are undefined.
4. FPRF is set to indicate the class and sign of the result (±Normal Number).
do the following.
1. OX is set to 1.
4. Unless the result is undefined, FPRF is set to indicate the class and sign of the result (±Normal
Number).
For VSX Scalar Convert with round Quad-Precision to Double-Precision format [using round to Odd]
(xscvqpdp), do the following.
1. OX is set to 1.
2. The exponent is adjusted by subtracting 1536. If the adjusted exponent is greater than +1023 (Emax), the
result is undefined.
3. The adjusted, rounded result is placed into doubleword element 0 of VSR[VRT+32] in double-precision
format.
4. Unless the result is undefined, FPRF is set to indicate the class and sign of the result (±Normal
Number).
For VSX Scalar Convert with round Double-Precision to Half-Precision format (xscvdphp), do the following.
1. OX is set to 1.
2. The exponent is adjusted by subtracting 24. If the adjusted exponent is greater than +15 (Emax), the
result is undefined.
3. The adjusted, rounded result is placed into rightmost halfword of doubleword element 0 of VSR[XT] in
half-precision format.
The contents of the leftmost 3 halfwords of doubleword element 0 of VSR[XT] are set to 0.
4. Unless the result is undefined, FPRF is set to indicate the class and sign of the result (±Normal
Number).
For VSX Vector Double-Precision Arithmetic[1] instructions, VSX Vector Single-Precision Arithmetic[2]
instructions, and VSX Vector round and Convert Double-Precision to Single-Precision format instruction
(xvcvdpsp), do the following.
1. OX is set to 1.
For VSX Vector Convert with round Single-Precision to Half-Precision format (xvcvsphp), do the following.
1. OX is set to 1.
2. VSR[XT] is not modified.
3. FR, FI, and FPRF are not modified.
2. The result is determined by the rounding mode (RN) and the sign of the intermediate result as follows:
For VSX Scalar round and Convert Double-Precision to Single-Precision format (xscvdpsp):
3. The result is placed into word element 0 of VSR[XT] as a single-precision value. The contents of word
elements 1-3 of VSR[XT] are undefined.
4. FR is undefined.
5. FI is set to 1.
For VSX Scalar Double-Precision Arithmetic[1] instructions and VSX Scalar Single-Precision Arithmetic[2]
instructions, do the following.
3. The result is placed into doubleword element 0 of VSR[XT] as a double-precision value. The contents
of doubleword element 1 of VSR[XT] are undefined.
4. FR is undefined.
5. FI is set to 1.
do the following.
4. FR is undefined. FI is set to 1. FPRF is set to indicate the class and sign of the result.
For VSX Scalar Convert with round Quad-Precision to Double-Precision format (xscvqpdp), do the following.
4. FR is undefined. FI is set to 1. FPRF is set to indicate the class and sign of the result.
For VSX Scalar Convert with round Double-Precision to Half-Precision format (xscvdphp), do the following.
2. The result is placed into the rightmost halfword of doubleword element 0 of VSR[XT] as a half-precision
value.
The contents of the leftmost 3 halfwords of doubleword element 0 of VSR[XT] are set to 0.
3. FR is undefined. FI is set to 1. FPRF is set to indicate the class and sign of the result.
3. For each vector element causing an Overflow exception, the result is placed into its respective
doubleword element of VSR[XT] in double-precision format.
For VSX Vector Single-Precision Arithmetic[2] instructions and VSX Vector round and Convert Double-Precision
to Single-Precision format (xvcvdpsp), do the following.
3. For each vector element causing an Overflow exception, the result is placed into its respective word
element of VSR[XT] in single-precision format.
For VSX Vector Convert with round Single-Precision to Half-Precision format (xvcvsphp), do the following.
2. For each vector element causing an Overflow exception, the result is placed into the rightmost
halfword of its respective word element of VSR[XT] in half-precision format.
The contents of the leftmost halfword of its respective word element of VSR[XT] are set to 0.
Enabled:
Underflow occurs when the intermediate result is “Tiny”.
Disabled:
Underflow occurs when the intermediate result is “Tiny” and there is “Loss of Accuracy”.
A tiny result is detected before rounding, when a nonzero intermediate result computed as though both the precision
and the exponent range were unbounded would be less in magnitude than the smallest normalized number.
If the intermediate result is tiny and Underflow exception is disabled (UE=0), the intermediate result is denormalized
(see Section 7.3.2.4 , “Normalization and Denormalization” on page 377) and rounded (see Section 7.3.2.6 ,
“Rounding” on page 381) before being placed into the target VSR.
Loss of accuracy is detected when the delivered result value differs from what would have been computed were
both the precision and the exponent range unbounded.
The action to be taken depends on the setting of the Underflow Exception Enable bit of the FPSCR.
For VSX Scalar round and Convert Double-Precision to Single-Precision format (xscvdpsp), do the following.
1. UX is set to 1.
2. If the unbiased exponent of the normalized intermediate result is greater than or equal to -319
(Emin-192), the exponent is adjusted by adding 192. Otherwise the result is undefined.
3. The adjusted rounded result is placed into word element 0 of VSR[XT] in single-precision format. The
contents of word elements 1-3 of VSR[XT] are undefined.
4. Unless the result is undefined, FPRF is set to indicate the class and sign of the result (±Normal
Number).
For VSX Scalar Double-Precision Arithmetic[1] instructions and VSX Scalar Double-Precision Reciprocal
Estimate (xsredp), do the following.
1. UX is set to 1.
3. The adjusted rounded result is placed into doubleword element 0 of VSR[XT] in double-precision
format. The contents of doubleword element 1 of VSR[XT] are undefined.
4. FPRF is set to indicate the class and sign of the result (±Normal Number).
do the following.
1. UX is set to 1.
4. Unless the result is undefined, FPRF is set to indicate the class and sign of the result (±Normal
Number).
For VSX Scalar Convert with round Quad-Precision to Double-Precision format [using round to Odd]
(xscvqpdp[o]), do the following.
1. UX is set to 1.
2. The exponent of the normalized intermediate result is adjusted by adding 1536. If the adjusted
exponent is less than -1022, the result is undefined.
3. The adjusted, rounded result is placed into doubleword element 0 of VSR[VRT+32] in double-precision
format.
4. Unless the result is undefined, FPRF is set to indicate the class and sign of the result (±Normal
Number).
For VSX Scalar Single-Precision Arithmetic[1] instructions and VSX Scalar Single-Precision Reciprocal Estimate
(xsresp), do the following.
1. UX is set to 1.
3. The adjusted rounded result is placed into doubleword element 0 of VSR[XT] in double-precision
format. The contents of doubleword element 1 of VSR[XT] are undefined.
4. FPRF is set to indicate the class and sign of the result (±Normal Number).
Programming Note
The FR and FI bits are provided to allow the system floating-point enabled exception error handler, when
invoked because of an Underflow exception, to simulate a “trap disabled” environment. That is, the FR and FI
bits allow the system floating-point enabled exception error handler to unround the result, thus allowing the
result to be denormalized and correctly rounded.
For VSX Scalar Convert with round Double-Precision to Half-Precision with round (xscvdphp), do the following.
1. UX is set to 1.
2. The exponent of the normalized intermediate result is adjusted by adding 24. If the adjusted exponent
is less than -14, the result is undefined.
3. The adjusted, rounded result is placed into rightmost halfword of doubleword element 0 of VSR[XT] in
half-precision format.
The contents of the leftmost 3 halfwords of doubleword element 0 of VSR[XT] are set to 0.
4. Unless the result is undefined, FPRF is set to indicate the class and sign of the result (±Normal
Number).
For VSX Vector Floating-Point Arithmetic[1] instructions, VSX Vector Floating-Point Reciprocal Estimate[2]
instructions, and VSX Vector round and Convert Double-Precision to Single-Precision format (xvcvdpsp), do
the following.
1. UX is set to 1.
For VSX Vector Convert with round Single-Precision to Half-Precision format (xvcvsphp), do the following.
1. UX is set to 1.
2. VSR[XT] is not modified.
3. FR, FI, and FPRF are not modified.
For VSX Scalar round and Convert Double-Precision to Single-Precision format (xscvdpsp), do the following.
1. UX is set to 1.
2. The result is placed into word element 0 of VSR[XT] in single-precision format. The contents of word
elements 1-3 of VSR[XT] are undefined.
For VSX Scalar Floating-Point Arithmetic[3] instructions and VSX Scalar Reciprocal Estimate[4] instructions, do
the following.
1. UX is set to 1.
2. The result is placed into doubleword element 0 of VSR[XT] in double-precision format. The contents of
doubleword element 1 of VSR[XT] are undefined.
do the following.
1. UX is set to 1.
For VSX Scalar Convert with round Quad-Precision to Double-Precision format (xscvqpdp), do the following.
1. UX is set to 1.
For VSX Scalar Convert with round Double-Precision to Half-Precision format (xscvdphp), do the following.
1. UX is set to 1.
2. The result is placed into the rightmost halfword of doubleword element 0 of VSR[XT] as a half-precision
value.
The contents of the leftmost 3 halfwords of doubleword element 0 of VSR[XT] are set to 0.
For VSX Vector Double-Precision Arithmetic[1] instructions and VSX Vector Reciprocal Estimate
Double-Precision (xvredp), do the following.
1. UX is set to 1.
2. For each vector element causing an Underflow exception, the result is placed into its respective
doubleword element of VSR[XT] in double-precision format.
For VSX Vector Single-Precision Arithmetic[2] instructions, VSX Vector Reciprocal Estimate Single-Precision
(xvresp), and VSX Vector round and Convert Double-Precision to Single-Precision format (xvcvdpsp), do the
following.
1. UX is set to 1.
1. VSX Vector Double-Precision Arithmetic instructions:
xvadddp, xvdivdp, xvmuldp, xvsubdp, xvmaddadp, xvmaddmdp, xvmsubadp, xvmsubmdp, xvnmaddadp, xvnmaddmdp, xvnmsubadp,
xvnmsubmdp
2. VSX Vector Single-Precision Arithmetic instructions:
xvaddsp, xvdivsp, xvmulsp, xvsubsp, xvmaddasp, xvmaddmsp, xvmsubasp, xvmsubmsp, xvnmaddasp, xvnmaddmsp, xvnmsubasp,
xvnmsubmsp
2. For each vector element causing an Underflow exception, the result is placed into its respective word
element of VSR[XT] in single-precision format.
For VSX Vector Convert with round Single-Precision to Half-Precision format (xvcvsphp), do the following.
1. UX is set to 1.
2. For each vector element causing an Underflow exception, the result is placed into the rightmost
halfword of its respective word element of VSR[XT] in half-precision format.
The contents of the leftmost halfword of its respective word element of VSR[XT] are set to 0.
1. The rounded result differs from the intermediate result assuming both the precision and the exponent range of
the intermediate result to be unbounded. In this case the result is said to be inexact. (If the rounding causes an
enabled Overflow exception or an enabled Underflow exception, an Inexact exception also occurs only if the
significands of the rounded result and the intermediate result differ.)
The action to be taken depends on the setting of the Inexact Exception Enable bit of the FPSCR.
Programming Note
In some implementations, enabling Inexact exceptions can degrade performance more than does enabling other
types of floating-point exception.
When Inexact exception is enabled (UE=1) and an Inexact exception occurs, the following actions are taken:
For the VSX Vector round and Convert Double-Precision to Single-Precision format (xscvdpsp) instruction:
1. XX is set to 1.
2. The result is placed into word element 0 of VSR[XT] in single-precision format. The contents of word
elements 1-3 of VSR[XT] are undefined.
For VSX Scalar Floating-Point Arithmetic[1] instructions, VSX Scalar Round to Double-Precision Integer Exact
using Current rounding mode (xsrdpic), and VSX Scalar Integer to Floating-Point Format Conversion[2]
instructions, do the following.
1. XX is set to 1.
2. The result is placed into doubleword element 0 of VSR[XT] in double-precision format. The contents of
doubleword element 1 of VSR[XT] are undefined.
For VSX Scalar Floating-Point to Integer Word Format Conversion[3] instructions, do the following.
1. XX is set to 1.
2. The result is placed into word element 1 of VSR[XT]. The contents of word elements 0, 2, and 3 of
VSR[XT] are undefined.
do the following.
1. XX is set to 1.
3. FR is set to indicate if the rounded result was incremented. FI is set to 1. FPRF is set to indicate the
class and sign of the result.
For VSX Scalar Convert with round Quad-Precision to Double-Precision format (xscvqpdp), do the following.
1. XX is set to 1.
3. FR is set to indicate if the rounded result was incremented. FI is set to 1. FPRF is set to indicate the
class and sign of the result.
For VSX Scalar truncate & Convert Quad-Precision to Signed Doubleword (xscvqpsdz), do the following.
1. XX is set to 1.
2. The result is placed into doubleword element 0 of VSR[XT] in signed integer format.
For VSX Scalar truncate & Convert Quad-Precision to Signed Word (xscvqpswz), do the following.
1. XX is set to 1.
2. The result is placed into word element 1 of VSR[XT] in signed integer format.
For VSX Scalar truncate & Convert Quad-Precision to Unsigned Doubleword (xscvqpudz), do the following.
1. XX is set to 1.
2. The result is placed into doubleword element 0 of VSR[XT] in unsigned integer format.
For VSX Scalar truncate & Convert Quad-Precision to Unsigned Word (xscvqpuwz), do the following.
1. XX is set to 1.
2. The result is placed into word element 1 of VSR[XT] in unsigned integer format.
For VSX Scalar Convert with round Double-Precision to Half-Precision truncate (xscvdphp), do the following.
1. XX is set to 1.
2. The result is placed into the rightmost halfword of doubleword element 0 of VSR[XT] as a half-precision
value.
The contents of the leftmost 3 halfwords of doubleword element 0 of VSR[XT] are set to 0.
3. FR is set to indicate if the rounded result was incremented. FI is set to 1. FPRF is set to indicate the
class and sign of the result.
For VSX Vector Floating-Point Arithmetic[1] instructions, VSX Vector Floating-Point Reciprocal Estimate[2]
instructions, VSX Vector round and Convert Double-Precision to Single-Precision format (xvcvdpsp), VSX
Vector Double-Precision to Integer Format Conversion[3] instructions, and VSX Vector Integer to Floating-Point
Format Conversion[4] instructions, do the following.
1. XX is set to 1.
For VSX Vector Convert with round Single-Precision to Half-Precision format (xvcvsphp), do the following.
1. XX is set to 1.
2. VSR[XT] is not modified.
3. FR, FI, and FPRF are not modified.
For VSX Scalar round and Convert Double-Precision to Single-Precision format (xscvdpsp), do the following.
1. XX is set to 1.
2. The result is placed into word element 0 of VSR[XT] as a single-precision value. The contents of word
elements 1-3 of VSR[XT] are undefined.
For VSX Scalar Double-Precision Arithmetic[1] instructions, VSX Scalar Single-Precision Arithmetic[2]
instructions, VSX Scalar Round to Single-Precision (xsrsp), the VSX Scalar Round to Double-Precision Integer
Exact using Current rounding mode (xsrdpic), and VSX Scalar Integer to Double-Precision Format
Conversion[3] instructions, do the following.
1. XX is set to 1.
2. The result is placed into doubleword element 0 of VSR[XT] as a double-precision value. The contents
of doubleword element 1 of VSR[XT] are undefined.
For VSX Scalar Convert with round to zero Double-Precision To Signed Word format (xscvdpsxws) and VSX
Scalar Convert with round to zero Double-Precision To Unsigned Word format (xscvdpuxws), do the following.
1. XX is set to 1.
2. The result is placed into word element 1 of VSR[XT]. The contents of word elements 0, 2, and 3 of
VSR[XT] are undefined.
For VSX Scalar Convert with round Quad-Precision to Double-Precision format (xscvqpdp), do the following.
1. XX is set to 1.
2. The result is placed into the rightmost halfword of doubleword element 0 of VSR[XT] as a half-precision
value.
The contents of the leftmost 3 halfwords of doubleword element 0 of VSR[XT] are set to 0.
3. FR is set to indicate if the rounded result was incremented. FI is set to 1. FPRF is set to indicate the
class and sign of the result.
do the following.
1. XX is set to 1.
2. For each vector element causing an Inexact exception, the result is placed into its respective
doubleword element of VSR[XT] in double-precision format.
do the following.
1. XX is set to 1.
3. FR is set to indicate if the rounded result was incremented. FI is set to 1. FPRF is set to indicate the
class and sign of the result.
For VSX Scalar round & Convert Quad-Precision to Double-Precision (xscvqpdp), do the following.
1. XX is set to 1.
3. FR is set to indicate if the rounded result was incremented. FI is set to 1. FPRF is set to indicate the
class and sign of the result.
do the following.
1. XX is set to 1.
2. The result is placed into doubleword element 0 of VSR[VRT+32] in signed integer format.
do the following.
1. XX is set to 1.
2. The result is placed into doubleword element 0 of VSR[VRT+32] in unsigned integer format.
For VSX Vector Convert with round Single-Precision to Half-Precision format (xvcvsphp), do the following.
1. XX is set to 1.
2. For each vector element causing an Underflow exception, the result is placed into the rightmost
halfword of its respective word element of VSR[XT] in half-precision format.
The contents of the leftmost halfword of its respective word element of VSR[XT] are set to 0.
1. XX is set to 1.
2. For each vector element causing an Inexact exception, the result is placed into its respective word
element of VSR[XT] in single-precision format.
0 1 2 3 4 5 6 7 8 9 A B C D E F
0x0000: 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
Figure 122.Vector-Scalar Register contents for
0x0010: aligned quadword Load or Store VSX
Vector
0 1 2 3 4 5 6 7 8 9 A B C D E F
0x0000: 03 02 01 00 07 06 05 04 0B 0A 09 08 0F 0E 0D 0C
0x0010:
0 1 2 3 4 5 6 7 8 9 A B C D E F
0 1 2 3 4 5 6 7 8 9 A B C D E F # Assumptions
Little-Endian storage image of array B GPR[Ra] = address of B
GPR[Rb] = 4 (index to B[1])
0x0000: 67 45 23 01 33 22 11 00 77 66 55 44 BBAA 99 88
lxvw4x Xt,Ra,Rb
0x0010: FF EEDDCC Xt: 00 11 22 33 44 55 66 77 88 99 AABBCCDDEE FF
0 1 2 3 4 5 6 7 8 9 A B C D E F 0 1 2 3 4 5 6 7 8 9 A B C D E F
0x0010: FF EEDDCC
0 1 2 3 4 5 6 7 8 9 A B C D E F
# Assumptions
GPR[A] = address of B
GPR[B] = 4 (index to B[1])
lxvw4x Xt,Ra,Rb
Xt: 00 11 22 33 44 55 66 77 88 99 AABBCCDDEE FF
0 1 2 3 4 5 6 7 8 9 A B C D E F
Storing a VSR to elements 1 through 4 of B (see Storing a VSR to elements 1 through 4 of B (see
Figure 123) into VR[VT] involves an unaligned Figure 123) into VR[VT] involves an unaligned
quadword storage access. quadword storage access.
VSX supports word-aligned vector and scalar storage VSX supports word-aligned vector and scalar storage
accesses using Big-Endian byte ordering. accesses using Little-Endian byte ordering.
Big-Endian storage image of array B Little-Endian storage image of array B
0x0000: 01 23 45 67 00 11 22 33 44 55 66 77 88 99 AA BB 0x0000: 67 45 23 01 33 22 11 00 77 66 55 44 BB AA 99 88
0x0010: CC DD EE FF 0x0010: FF EE DD CC
0 1 2 3 4 5 6 7 8 9 A B C D E F 0 1 2 3 4 5 6 7 8 9 A B C D E F
Xs: F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 FA BB FC FD FE FF Xs: F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 FA BB FC FD FE FF
0 1 2 3 4 5 6 7 8 9 A B C D E F 0 1 2 3 4 5 6 7 8 9 A B C D E F
# Assumptions # Assumptions
GPR[Ra] = address of B GPR[A] = address of B
GPR[Rb] = 4 (index to B[1]) GPR[B] = 4 (index to B[1])
0x0010: FC FD FE FF 0x0010: FF FE FD FC
0 1 2 3 4 5 6 7 8 9 A B C D E F 0 1 2 3 4 5 6 7 8 9 A B C D E F
Figure 126.Process to store unaligned quadword to Figure 127.Process to store unaligned quadword to
Big-Endian storage using Store VSX Little-Endian storage Store VSX Vector
Vector Word*4 Indexed Word*4 Indexed
x.dword[y] x<=y
Return the contents of doubleword element y of x. x and y are integer values.
~x
Return the one’s complement of x.
!x
Return 1 if the contents of x are equal to 0,
otherwise return 0.
x || y
Return the value of x concatenated with the value
of y. For example, 0b010 || 0b111 is the same as
0b010111.
x^y
Return the value of x exclusive ORed with the
value of y.
x?y:z
If the value of x is true, return the value of y,
otherwise return the value z.
x+y
x and y are integer values.
If x is a QNaN, return x.
Otherwise, if x is an SNaN, return x represented as a QNaN.
Otherwise, if y is a QNaN, return y.
Otherwise, if y is an SNaN, return y represented as a QNaN.
Otherwise, if x and y are infinities of opposite sign, return the standard QNaN.
Otherwise, return the normalized sum of x and y, having unbounded range and precision.
AddSP(x,y)
x and y are single-precision floating-point values.
If x is a QNaN, return x.
Otherwise, if x is an SNaN, return x represented as a QNaN.
Otherwise, if y is a QNaN, return y.
Otherwise, if y is an SNaN, return y represented as a QNaN.
Otherwise, if x and y are infinities of opposite sign, return the standard QNaN.
Otherwise, return the normalized sum of x added to y, having unbounded range and precision.
bfp_ABSOLUTE(x)
x is a binary floating-point value represented in the working floating-point format.
bfp_ADD(x, y)
x is a binary floating-point value represented in the working floating-point format.
y is a binary floating-point value represented in the working floating-point format.
If x is a QNaN, return x.
Otherwise, if x is an SNaN, return x represented as a QNaN.
Otherwise, if y is a QNaN, return y.
Otherwise, if y is an SNaN, return y represented as a QNaN.
Otherwise, if x and y are infinities of opposite sign, return the standard QNaN.
Otherwise, return the normalized sum of x and y, having unbounded range and precision.
bfp_COMPARE_EQ(x, y)
x is a binary floating-point value represented in the working floating-point format.
y is a binary floating-point value represented in the working floating-point format.
bfp_COMPARE_GT(x, y)
x is a binary floating-point value represented in the working floating-point format.
y is a binary floating-point value represented in the working floating-point format.
bfp_COMPARE_LT(x, y)
x is a binary floating-point value represented in the working floating-point format.
y is a binary floating-point value represented in the working floating-point format.
bfp_CONVERT_FROM_BFP16(x)
x is a floating-point value represented in half-precision format.
result.significand is shifted left until the contents bit 0 of result.significand are equal to 1.
result.exponent is decremented by the the number of bits result.significand was shifted.
Return result.
bfp_CONVERT_FROM_BFP32(x)
x is a floating-point value represented in single-precision format.
result.significand is shifted left until the contents bit 0 of result.significand are equal to 1.
result.exponent is decremented by the the number of bits result.significand was shifted.
Return result.
bfp_CONVERT_FROM_BFP64(x)
x is a binary floating-point value represented in double-precision format.
result.sign is initialized to 0.
result.exponent is initialized to 0.
result.significand is initialized to 0.
result.class.SNaN is initialized to 0.
result.class.QNaN is initialized to 0.
result.class.Infinity is initialized to 0.
result.class.Zero is initialized to 0.
result.class.Denormal is initialized to 0.
result.class.Normal is initialized to 0.
bfp_CONVERT_FROM_BFP128(x)
x is a binary floating-point value represented in quad-precision format.
result.sign is initialized to 0.
result.exponent is initialized to 0.
result.significand is initialized to 0.
result.class.SNaN is initialized to 0.
result.class.QNaN is initialized to 0.
result.class.Infinity is initialized to 0.
result.class.Zero is initialized to 0.
result.class.Denormal is initialized to 0.
result.class.Normal is initialized to 0.
bfp_CONVERT_FROM_SI64(x)
x is an integer value represented in signed doubleword integer format.
result.sign is initialized to 0.
result.exponent is initialized to 0.
result.significand is initialized to 0.
result.class.SNaN is initialized to 0.
result.class.QNaN is initialized to 0.
result.class.Infinity is initialized to 0.
result.class.Zero is initialized to 0.
result.class.Denormal is initialized to 0.
result.class.Normal is initialized to 0.
If x is equal to 0x0000_0000_0000_0000,
result.class.Zero is set to 1.
bfp_CONVERT_FROM_UI64(x)
x is an integer value represented in unsigned doubleword integer format.
result.sign is initialized to 0.
result.exponent is initialized to 0.
result.significand is initialized to 0.
result.class.SNaN is initialized to 0.
result.class.QNaN is initialized to 0.
result.class.Infinity is initialized to 0.
result.class.Zero is initialized to 0.
result.class.Denormal is initialized to 0.
result.class.Normal is initialized to 0.
bfp_CONVERT_TO_BFP16(x)
x is a floating-point value represented in the working format.
Return result.
bfp_CONVERT_TO_BFP32(x)
x is a floating-point value represented in the working format.
Return result.
bfp_CONVERT_TO_BFP64(x)
x is a floating-point value represented in the working format.
Return result.
bfp_CONVERT_TO_BFP128(x)
x is a quad-precision floating-point value that is represented in the working floating-point format.
If x is a QNaN,
the contents of bit 0 of result are set to the value of x.sign,
the contents of bits 1:15 of result are set to the value 0b111_1111_1111_1111, and
the contents of bits 16:127 of result are set to the value of bits 1:112 of x.significand.
Otherwise, if x is a Zero,
the contents of bit 0 of result are set to the value of x.sign, and
the contents of bits 1:15 of result are set to the value 0b000_0000_0000_0000, and
the contents of bits 16:127 of result are set to the value 0x0000_0000_0000_0000_0000_0000_0000.
Otherwise, if x is an Infinity,
the contents of bit 0 of result are set to the value of x.sign,
the contents of bits 1:15 of result are set to the value 0b111_1111_1111_1111, and
the contents of bits 16:127 of result are set to the value 0x0000_0000_0000_0000_0000_0000_0000.
bfp_CONVERT_TO_SI64(x)
x is an integer value represented in the working floating-point format.
bfp_CONVERT_TO_UI64(x)
x is an integer value represented in the working floating-point format.
bfp_DENORM(x, y)
x is an integer value specifying the target format’s Emin value.
y is a binary floating-point value that is represented in the working floating-point format.
If y.exponent is less than Emin, let sh_cnt be the value Emin - y.exponent.
Otherwise, let sh_cnt be the value 0.
bfp_DIVIDE(x, y)
x is a binary floating-point value that is represented in the working floating-point format.
y is a binary floating-point value that is represented in the working floating-point format.
If x is a QNaN, return x.
Otherwise, if x is an SNaN, return x represented as a QNaN.
Otherwise, if y is a QNaN, return y.
Otherwise, if y is an SNaN, return y represented as a QNaN.
Otherwise, if x and y are infinities, return the standard QNaN.
Otherwise, if x and y are zeros, return the standard QNaN.
Otherwise, if y is a zero, return infinity, having the sign of the exclusive-OR of the signs of x and y.
Otherwise, return the normalized quotient of x ÷ y, having unbounded range and precision.
bfp_INFINITY()
Return a positive floating-point infinity value, represented in the working format.
bfp_INITIALIZE(result)
result.class.Infinity 1
return(result)
bfp_INITIALIZE(x)
Return x.
bfp_MULTIPLY(x, y)
x is a binary floating-point value represented in the working floating-point format.
y is a binary floating-point value represented in the working floating-point format.
If x is a QNaN, return x.
Otherwise, if x is an SNaN, return x represented as a QNaN.
Otherwise, if y is a QNaN, return y.
Otherwise, if y is an SNaN, return y represented as a QNaN.
Otherwise, if x is an infinity and y is a zero, return the standard QNaN.
Otherwise, if x is a zero and y is an infinity, return the standard QNaN.
Otherwise, return the normalized product of x × y, having unbounded range and precision.
bfp_MULTIPLY_ADD(x, y, z)
x is a binary floating-point value represented in the working floating-point format.
y is a binary floating-point value represented in the working floating-point format.
z is a binary floating-point value represented in the working floating-point format.
If x is a QNaN, return x.
Otherwise, if x is an SNaN, return x represented as a QNaN.
Otherwise, if z is a QNaN, return z.
Otherwise, if z is an SNaN, return z represented as a QNaN.
Otherwise, if y is a QNaN, return y.
Otherwise, if y is an SNaN, return y represented as a QNaN.
Otherwise, if x is an infinity and y is a zero, return the standard QNaN.
Otherwise, if x is a zero and y is an infinity, return the standard QNaN.
Otherwise, if z and the product of x × y are Infinity values having opposite signs, return the standard QNaN.
Otherwise, return the sum of z and the normalized product of x × y, having unbounded range and precision.
bfp_NEGATE(x)
x is a binary floating-point value that is represented in the working floating-point format.
bfp_NMAX_BFP16()
Return the largest, positive, normalized half-precision floating-point value, (2-2-10)×2+15, represented in the
working format.
bfp_INITIALIZE(result)
result.exponent +15
result.significand.bit[0:10] 0b111_1111_1111
result.class.Normal 1
return(result)
bfp_NMAX_BFP64
Return the largest finite double-precision value (i.e., 21024-21024-53) in the working floating-point format.
return( bfp_CONVERT_FROM_BFP64(0x7FEF_FFFF_FFFF_FFFF) )
bfp_NMAX_BFP80
Return the largest finite double-extended-precision value (i.e., 216384-216384-65) in the working floating-point
format.
return( bfp_CONVERT_FROM_BFP80(0x7FFE_FFFF_FFFF_FFFF_FFFF) )
bfp_NMAX_BFP128
Return the largest finite quad-precision value (i.e., 216384-216384-113) in the working floating-point format.
return( bfp_CONVERT_FROM_BFP128(0x7FFE_FFFF_FFFF_FFFF_FFFF_FFFF_FFFF_FFFF) )
bfp_NMIN_BFP16()
Return the smallest, positive, normalized half-precision floating-point value, 2-14, represented in the working
format.
bfp_INITIALIZE(result)
result.exponent -14
result.significand.bit[0:10] 0b100_0000_0000
result.class.Normal 1
return(result)
bfp_NMIN_BFP64
Return the smallest, positive, normalized double-precision value, 2-1022, represented in the binary floating-point
working format.
return( bfp_CONVERT_FROM_BFP64(0x0010_0000_0000_0000) )
bfp_NMIN_BFP80
Return the smallest, positive, normalized double-extended-precision value, 2-16382, represented in the binary
floating-point working format.
return( bfp_CONVERT_FROM_BFP80(0x0001_0000_0000_0000_0000) )
bfp_NMIN_BFP128
Return the smallest, positive, normalized quad-precision value, 2-16382, represented in the binary floating-point
working format.
return( bfp_CONVERT_FROM_BFP128(0x0001_0000_0000_0000_0000_0000_0000_0000) )
bfp_QUIET(x)
x is a Signalling NaN.
Return x converted to a Quiet NaN with x.class.QNaN set to 1 and x.class.SNaN set to 0.
bfp_ROUND_CEIL(p, x)
x is a binary floating-point value that is represented in the working floating-point format and has unbounded
exponent range and significand precision. x must be rounded as presented, without prenormalization.
p is an integer value specifying the precision (i.e., number of bits) the significand is rounded to.
Return the smallest floating-point number having unbounded exponent range and a significand with a width of p
bits that is greater or equal in value to x.
bfp_ROUND_FLOOR(p, x)
x is a binary floating-point value that is represented in the working floating-point format and has unbounded
exponent range and significand precision. The value must be rounded as presented, without prenormalization.
p is an integer value specifying the precision (i.e., number of bits) the significand is rounded to.
Return the largest floating-point number having unbounded exponent range and a significand with a width of p
bits that is lesser or equal in value to x.
bfp_ROUND_TO_BFP16(x,y)
y is a normalized floating-point value represented in the working format, having unbounded exponent range and
significand precision.
If y is an QNaN, Infinity, or Zero, return y. Otherwise, if y is an SNaN, set vxsnan_flag to 1 and return the
corresponding QNaN representation of y. Otherwise, return the value y rounded to half-precision format’s
exponent range and significand precision using the rounding mode specified by x.
if bfp_COMPARE_LT(y,bfp_NMIN_BFP16()) then do
if FPSCR.UE=0 then do
do while y.exponent < -14 // denormalize y
y.significand y.significand >> 1
y.exponent y.exponent + 1
end
if x=0b00 then result bfp_ROUND_TO_BFP16_NEAR_EVEN(y)
if x=0b01 then result bfp_ROUND_TO_BFP16_TRUNC(y)
if x=0b10 then result bfp_ROUND_TO_BFP16_CEIL(y)
if x=0b11 then result bfp_ROUND_TO_BFP16_FLOOR(y)
do while result.significand.bit[0] = 0 // normalize result
result.significand result.significand << 1
result.exponent result.exponent - 1
end
ux_flag xx_flag
return(result)
end
else do
y.exponent y.exponent + 24
ux_flag 1
end
end
return(result)
bfp_ROUND_TO_BFP16_CEIL(x)
x is a normalized floating-point value represented in the working format, having unbounded exponent range and
significand precision.
Return the smallest floating-point number having unbounded exponent range but half-precision significand
precision that is greater or equal in value to x.
bfp_ROUND_TO_BFP16_FLOOR(x)
x is a normalized floating-point value represented in the working format, having unbounded exponent range and
significand precision.
Return the largest floating-point number having unbounded exponent range but half-precision significand
precision that is lesser or equal in value to x.
bfp_ROUND_TO_BFP16_NEAR_EVEN(x)
x is a normalized floating-point value represented in the working format, having unbounded exponent range and
significand precision.
Return the floating-point number having unbounded exponent range but half-precision significand precision that
is nearest in value to x (in case of a tie, the floating-point number having unbounded exponent range but
half-precision significand precision with the least-significant bit equal to 0 is used).
bfp_ROUND_TO_BFP16_TRUNC(x)
x is a normalized floating-point value represented in the working format, having unbounded exponent range and
significand precision.
Return the largest floating-point number having unbounded exponent range but half-precision significand
precision that is lesser or equal in value to x if x>0, or the smallest floating-point number having unbounded
exponent range but half0-precision significand precision that is greater or equal in value to x if x<0.
bfp_ROUND_TO_INTEGER(rmode, x)
x is a binary floating-point value that is represented in the working floating-point format and has unbounded
exponent range and significand precision.
If x is a QNaN, return x.
bfp_ROUND_ODD(p, x)
x is a binary floating-point value that is represented in the working floating-point format and has unbounded
exponent range and significand precision. x must be rounded as presented, without prenormalization.
p is an integer value specifying the precision (i.e., number of bits) the significand is rounded to.
Return x with bit p-1 of the significand set to 1 if any of the bits to the right of bit p-1 of the significand of x are
equal to 1, and all bits to the right of bit p-1 of the significand of the value returned are set to 0. Otherwise return
x with all bits to the right of bit p-1 of the significand set to 0.
bfp_ROUND_NEAR_EVEN(p, x)
x is a binary floating-point value that is represented in the working floating-point format and has unbounded
exponent range and significand precision. x must be rounded as presented, without prenormalization.
p is an integer value specifying the precision (i.e., number of bits) the significand is rounded to.
Return the floating-point number having unbounded exponent range and a significand with a width of p bits that
is nearest in value to x (in case of a tie, the floating-point number having unbounded exponent range and a
p-bit significand with the least-significant bit equal to 0 is used).
bfp_ROUND_TRUNC(p, x)
x is a binary floating-point value that is represented in the working floating-point format and has unbounded
exponent range and significand precision. x must be rounded as presented, without prenormalization.
p is an integer value specifying the precision (i.e., number of bits) the significand is rounded to.
Return the largest floating-point number having unbounded exponent range and a significand with a width of p
bits that is lesser or equal in value to x if x>0, or the smallest floating-point number having unbounded exponent
range but double-precision significand precision that is greater or equal in value to x if x<0.
bfp_ROUND_TO_BFP128(ro, rmode, x)
x is a normalized binary floating-point value that is represented in the working floating-point format and has
unbounded exponent range and significand precision.
ro is a 1-bit unsigned integer and rmode is a 2-bit unsigned integer, together specifying one of five rounding
modes to be used in rounding z.
Return the value x rounded to quad-precision under control of the specified rounding mode.
bfp_ROUND_TO_BFP80(rmode,x)
x is a normalized binary floating-point value that is represented in the working floating-point format and has
unbounded exponent range and significand precision.
rmode is a 2-bit unsigned integer, together specifying one of four rounding modes to be used in rounding x.
Return the value x rounded to double-extended-precision under control of the specified rounding mode.
bfp_ROUND_TO_BFP64(ro, rmode, x)
x is a normalized binary floating-point value that is represented in the working floating-point format and has
unbounded exponent range and significand precision.
ro is a 1-bit unsigned integer and rmode is a 2-bit unsigned integer, together specifying one of five rounding
modes to be used in rounding z.
Return the value x rounded to double-precision under control of the specified rounding mode.
bfp_SQUARE_ROOT(x)
x is a binary floating-point value that is represented in the working floating-point format and has unbounded
exponent range and significand precision.
If x is a QNaN, return x.
Otherwise, if x is an SNaN, return x represented as a QNaN.
Otherwise, if x is -Zero, return -Zero.
Otherwise, if x is negative, return the standard QNaN.
Otherwise, return the normalized square root of x, having unbounded range and precision.
ClassDP(x,y)
Return a 5-bit characterization of the double-precision floating-point number x.
ClassSP(x,y)
Return a 5-bit characterization of the single-precision floating-point number x.
CompareEQDP(x,y)
x and y are double-precision floating-point values.
If x or y is a NaN, return 0.
Otherwise, if x is equal to y, return 1.
Otherwise, return 0.
CompareEQSP(x,y)
x and y are single-precision floating-point values.
If x or y is a NaN, return 0,
Otherwise, if x is equal to y, return 1.
Otherwise, return 0.
CompareGTDP(x,y)
x and y are double-precision floating-point values.
If x or y is a NaN, return 0,
Otherwise, if x is greater than y, return 1.
Otherwise, return 0.
CompareGTSP(x,y)
x and y are single-precision floating-point values.
If x or y is a NaN, return 0.
Otherwise, if x is greater than y, return 1.
Otherwise, return 0.
CompareLTDP(x,y)
x and y are double-precision floating-point values.
If x or y is a NaN, return 0.
Otherwise, if x is less than y, return 1.
Otherwise, return 0.
CompareLTSP(x,y)
x and y are single-precision floating-point values.
If x or y is a NaN, return 0.
Otherwise, if x is less than y, return 1.
Otherwise, return 0.
ConvertDPtoSD(x)
x is a floating-point value in double-precision format.
If x is a NaN,
vxcvi_flag is set to 1,
vxsnan_flag is set to 1 if x is an SNaN, and
return 0x8000_0000_0000_0000,
Otherwise,
xx_flag is set to 1 if rnd is inexact.
return rnd in 64-bit signed integer format.
ConvertDPtoSP(x)
x is a floating-point value in double-precision format.
If x is an SNaN, vxsnan_flag is set to 1.
If x is a SNaN, returns x, converted to a QNaN, in single-precision floating-point format.
Otherwise, if x is a QNaN, an Infinity, or a Zero, returns x in single-precision floating-point format.
Otherwise, returns x, rounded to single-precision using the rounding mode specified in RN, in single-precision
floating-point format.
ox_flag is set to 1 if rounding x resulted in an Overflow exception.
ux_flag is set to 1 if rounding x resulted in an Underflow exception.
xx_flag is set to 1 if rounding x returns an inexact result.
inc_flag is set to 1 if the significand of the result was incremented during rounding.
ConvertDPtoSP_NS(x)
x is a single-precision floating-point value represented in double-precision format.
Returns x in single-precision format.
sign x.bit[0]
exponent x.bit[1:11]
fraction 0b1 » x.bit[12:63] // implicit bit set to 1 (for now)
Programming Note
If x is not representable in single-precision, some exponent and/or significand bits will be discarded, likely
producing undesirable results. The low-order 29 bits of the significand of x are discarded, more if the
unbiased exponent of x is less than -126 (i.e., denormal). Finite values of x having an unbiased exponent
less than -150 will return a result of Zero. Finite values of x having an unbiased exponent greater than +127
will result in discarding significant bits of the exponent. SNaN inputs having no significant bits in the upper
23 bits of the signifcand will return Infinity as the result. No status is set for any of these cases.
ConvertDPtoSW(x)
x is a floating-point value in double-precision format.
If x is a NaN,
vxcvi_flag is set to 1,
vxsnan_flag is set to 1 if x is an SNaN, and
return 0x8000_0000,
Otherwise,
xx_flag is set to 1 if rnd is inexact.
return rnd in 32-bit signed integer format.
ConvertDPtoUD(x)
x is a floating-point value in double-precision format.
If x is a NaN,
vxcvi_flag is set to 1,
vxsnan_flag is set to 1 if x is an SNaN, and
return 0x8000_0000_0000_0000,
Otherwise,
xx_flag is set to 1 if rnd is inexact.
return rnd in 64-bit unsigned integer format.
ConvertDPtoUW(x)
x is a floating-point value in double-precision format.
If x is a NaN,
vxcvi_flag is set to 1,
vxsnan_flag is set to 1 if x is an SNaN, and
return 0x0000_0000,
Otherwise,
xx_flag is set to 1 if rnd is inexact.
return rnd in 32-bit unsigned integer format.
ConvertFPtoDP(x)
Return the floating-point value x in DP format.
ConvertFPtoSP(x)
Return the floating-point value x in single-precision format.
ConvertSDtoFP(x)
x is a 64-bit signed integer value.
Return the value x converted to floating-point format having unbounded significand precision.
ConvertSPtoDP_NS(x)
x is a single-precision floating-point value.
Returns x in double-precision format.
sign x.bit[0]
exponent (x.bit[1] » ¬x.bit[1] » ¬x.bit[1] » ¬x.bit[1] » x.bit[2:8])
fraction 0b0 » x.bit[9:31] » 0b0_0000_0000_0000_0000_0000_0000_0000
ConvertSP64toSP(x)
x is a single-precision floating-point value in double-precision format.
Returns the value x in single-precision format. x must be representable in single-precision, or else result returned
is undefined. x may require denormalization. No rounding is performed. If x is a SNaN, it is converted to a sin-
gle-precision SNaN having the same payload as x.
sign x.bit[0]
exp x.bit[1:11] - 1023
frac x.bit[12:63]
ConvertSPtoDP(x)
x is a single-precision floating-point value.
ConvertSPtoSD(x)
x is a floating-point value in single-precision format.
If x is a NaN,
vxcvi_flag is set to 1, and
vxsnan_flag is set to 1 if x is an SNaN
return 0x8000_0000_0000_0000 and
Otherwise,
xx_flag is set to 1 if rnd is inexact, and
return rnd in 64-bit signed integer format.
ConvertSPtoSP64(x)
x is a floating-point value in single-precision format.
Returns the value x in double-precision format. If x is a SNaN, it is converted to a double-precision SNaN having
the same payload as x.
sign x.bit[0]
exp x.bit[1:8] - 127
frac x.bit[9:31]
result.bit[0] sign
result.bit[1:11] exp + 1023
result.bit[12:34] frac
result.bit[35:63] 0
return(result)
ConvertSPtoSW(x)
x is a floating-point value in single-precision format.
If x is a NaN,
vxcvi_flag is set to 1,
vxsnan_flag is set to 1 if x is an SNaN, and
return 0x8000_0000.
Otherwise,
xx_flag is set to 1 if rnd is inexact, and
return rnd in 32-bit signed integer format.
ConvertSPtoUD(x)
x is a floating-point value in single-precision format.
If x is a NaN,
vxcvi_flag is set to 1, and
vxsnan_flag is set to 1 if x is an SNaN
return 0x0000_0000_0000_0000,
Otherwise,
xx_flag is set to 1 if rnd is inexact, and
return rnd in 64-bit unsigned integer format.
ConvertSPtoUW(x)
x is a floating-point value in single-precision format.
If x is a NaN,
vxcvi_flag is set to 1,
vxsnan_flag is set to 1 if x is an SNaN, and
return 0x0000_0000.
Otherwise,
xx_flag is set to 1 if rnd is inexact, and
return rnd in 32-bit unsigned integer format.
ConvertSWtoFP(x)
x is a 32-bit signed integer value.
Return the value x converted to floating-point format having unbounded significand precision.
ConvertUDtoFP(x)
x is a 64-bit unsigned integer value.
Return the value x converted to floating-point format having unbounded significand precision.
ConvertUWtoFP(x)
x is a 32-bit unsigned integer value.
Return the value x converted to floating-point format having unbounded significand precision.
DivideDP(x,y)
x and y are double-precision floating-point values.
If x is a QNaN, return x.
Otherwise, if x is an SNaN, return x represented as a QNaN.
Otherwise, if y is a QNaN, return y.
Otherwise, if y is an SNaN, return y represented as a QNaN.
Otherwise, if x is a Zero and y is a Zero, return the standard QNaN.
Otherwise, if x is a finite, nonzero value and y is a Zero with the same sign as x, return +Infinity.
Otherwise, if x is a finite, nonzero value and y is a Zero with the opposite sign as x, return -Infinity.
Otherwise, if x is an Infinity and y is an Infinity, return the standard QNaN.
Otherwise, return the normalized quotient of x divided by y, having unbounded range and precision.
DivideSP(x,y)
x and y are single-precision floating-point values.
If x is a QNaN, return x.
Otherwise, if x is an SNaN, return x represented as a QNaN.
Otherwise, if y is a QNaN, return y.
Otherwise, if y is an SNaN, return y represented as a QNaN.
Otherwise, if x is a Zero and y is a Zero, return the standard QNaN.
Otherwise, if x is a finite, nonzero value and y is a Zero with the same sign as x, return +Infinity.
Otherwise, if x is a finite, nonzero value and y is a Zero with the opposite sign as x, return -Infinity.
Otherwise, if x is an Infinity and y is an Infinity, return the standard QNaN.
Otherwise, return the normalized quotient of x divided by y, having unbounded range and precision.
DenormDP(x)
x is a floating-point value having unbounded range and precision.
Return the value x with its significand shifted right by a number of bits equal to the difference of the -1022 and
the unbiased exponent of x, and its unbiased exponent set to -1022.
DenormSP(x)
x is a floating-point value having unbounded range and precision.
Return the value x with its significand shifted right by a number of bits equal to the difference of the -126 and
the unbiased exponent of x, and its unbiased exponent set to -126.
EXTZ32(x)
Result of extending the b-bit value x on the left with 32-b zeros, forming a 32-bit value.
b LENGTH(x)
result.bit[0:31-b] 0
result.bit[32-b:31] x
EXTZ64(x)
Result of extending the b-bit value x on the left with 64-b zeros, forming a 64-bit value.
b LENGTH(x)
result.bit[0:63-b] 0
result.bit[64-b:63] x
EXTZ128(x)
Result of extending the b-bit value x on the left with 128-b zeros, forming a 128-bit value.
b LENGTH(x)
result.bit[0:127-b] 0
result.bit[128-b:127] x
fprf_CLASS_BFP16(x)
x is a floating-point value represented in half-precision format.
Return the 5-bit code that specifies the sign and class of x.
fprf_CLASS_BFP64(x)
x is a floating-point value represented in double-precision format.
Return the 5-bit code that specifies the sign and class of x.
fprf_CLASS_BFP128(x)
x is binary floating-point value that is represented in quad-precision format.
IsInf(x)
Return 1 if x is an Infinity, otherwise return 0.
IsNaN(x)
Return 1 if x is either an SNaN or a QNaN, otherwise return 0.
IsNeg(x)
Return 1 if x is a negative, nonzero value, otherwise return 0.
IsSNaN(x)
Return 1 if x is an SNaN, otherwise return 0.
IsZero(x)
Return 1 if x is a Zero, otherwise return 0.
MaximumDP(x,y)
x and y are double-precision floating-point values.
MaximumSP(x,y)
x and y are single-precision floating-point values.
MinimumDP(x,y)
x and y are double-precision floating-point values.
MinimumSP(x,y)
x and y are single-precision floating-point values.
MultiplyAddDP(x,y,z)
x, y and z are double-precision floating-point values.
If the product of x and y is an Infinity and z is an Infinity of the opposite sign, vxisi_flag is set to 1.
If x is a QNaN, return x.
Otherwise, if x is an SNaN, return x represented as a QNaN.
Otherwise, if z is a QNaN, return z.
Otherwise, if z is an SNaN, return z represented as a QNaN.
Otherwise, if y is a QNaN, return y.
Otherwise, if y is an SNaN, return y represented as a QNaN.
Otherwise, if x is a Zero and y is an Infinity or x is an Infinity and y is an Zero, return the standard QNaN.
Otherwise, if the product of x and y is an Infinity, and z is an Infinity of the opposite sign, return the standard
QNaN.
Otherwise, return the normalized sum of z and the product of x and y, having unbounded range and precision.
MultiplyAddSP(x,y,z)
x, y and z are single-precision floating-point values.
If the product of x and y is an Infinity and z is an Infinity of the opposite sign, vxisi_flag is set to 1.
If x is a QNaN, return x.
Otherwise, if x is an SNaN, return x represented as a QNaN.
Otherwise, if z is a QNaN, return z.
Otherwise, if z is an SNaN, return z represented as a QNaN.
Otherwise, if y is a QNaN, return y.
Otherwise, if y is an SNaN, return y represented as a QNaN.
Otherwise, if x is a Zero and y is an Infinity or x is an Infinity and y is an Zero, return the standard QNaN.
Otherwise, if the product of x and y is an Infinity, and z is an Infinity of the opposite sign, return the standard
QNaN.
Otherwise, return the normalized sum of z and the product of x and y, having unbounded range and precision.
MultiplyDP(x,y)
x and y are double-precision floating-point values.
If x is a QNaN, return x.
Otherwise, if x is an SNaN, return x represented as a QNaN.
Otherwise, if y is a QNaN, return y.
Otherwise, if y is an SNaN, return y represented as a QNaN.
Otherwise, if x is a Zero and y is as Infinity or x is a Infinity and y is an Zero, return the standard QNaN.
Otherwise, return the normalized product of x and y, having unbounded range and precision.
MultiplySP(x,y)
x and y are single-precision floating-point values.
If x is a QNaN, return x.
Otherwise, if x is an SNaN, return x represented as a QNaN.
Otherwise, if y is a QNaN, return y.
Otherwise, if y is an SNaN, return y represented as a QNaN.
Otherwise, if x is a Zero and y is as Infinity or x is a Infinity and y is an Zero, return the standard QNaN.
Otherwise, return the normalized product of x and y, having unbounded range and precision.
NegateDP(x)
If the double-precision floating-point value x is a NaN, return x.
Otherwise, return the double-precision floating-point value x with its sign bit complemented.
NegateSP(x)
If the single-precision floating-point value x is a NaN, return x.
Otherwise, return the single-precision floating-point value x with its sign bit complemented.
ReciprocalEstimateDP(x)
x is a double-precision floating-point value.
If x is a QNaN, return x.
Otherwise, if x is an SNaN, return x represented as a QNaN.
Otherwise, if x is a Zero, return an Infinity with the sign of x.
Otherwise, if x is an Infinity, return a Zero with the sign of x.
Otherwise, return an estimate of the reciprocal of x having unbounded exponent range.
ReciprocalEstimateSP(x)
x is a single-precision floating-point value.
If x is a QNaN, return x.
Otherwise, if x is an SNaN, return x represented as a QNaN.
Otherwise, if x is a Zero, return an Infinity with the sign of x.
Otherwise, if x is an Infinity, return a Zero with the sign of x.
Otherwise, return an estimate of the reciprocal of x having unbounded exponent range.
ReciprocalSquareRootEstimateDP(x)
x is a double-precision floating-point value.
If x is a QNaN, return x.
Otherwise, if x is an SNaN, return x represented as a QNaN.
Otherwise, if x is a negative, nonzero value, return the default QNaN.
Otherwise, return an estimate of the reciprocal of the square root of x having unbounded exponent range.
ReciprocalSquareRootEstimateSP(x)
x is a single-precision floating-point value.
If x is a QNaN, return x.
Otherwise, if x is an SNaN, return x represented as a QNaN.
Otherwise, if x is a negative, nonzero value, return the default QNaN.
Otherwise, return an estimate of the reciprocal of the square root of x having unbounded exponent range.
reset_xflags()
vxsnan_flag is set to 0.
vximz_flag is set to 0.
vxidi_flag is set to 0.
vxisi_flag is set to 0.
vxzdz_flag is set to 0.
vxsqrt_flag is set to 0.
vxcvi_flag is set to 0.
vxvc_flag is set to 0.
ox_flag is set to 0.
ux_flag is set to 0.
xx_flag is set to 0.
zx_flag is set to 0.
RoundToDP(x,y)
x is a 2-bit unsigned integer specifying one of four rounding modes.
Return the value y rounded to double-precision under control of the rounding mode specified by x.
RoundToDPCeil(x)
x is a floating-point value having unbounded range and precision.
If x is a QNaN, return x.
RoundToDPFloor(x)
x is a floating-point value having unbounded range and precision.
If x is a QNaN, return x.
RoundToDPIntegerCeil(x)
x is a double-precision floating-point value.
If x is a QNaN, return x.
RoundToDPIntegerFloor(x)
x is a double-precision floating-point value.
If x is a QNaN, return x.
RoundToDPIntegerNearAway(x)
x is a double-precision floating-point value.
If x is a QNaN, return x.
RoundToDPIntegerNearEven(x)
x is a double-precision floating-point value.
If x is a QNaN, return x.
RoundToDPIntegerTrunc(x)
x is a double-precision floating-point value.
If x is a QNaN, return x.
RoundToDPNearEven(x)
x is a floating-point value having unbounded range and precision.
If x is a QNaN, return x.
RoundToDPTrunc(x)
x is a floating-point value having unbounded range and precision.
If x is a QNaN, return x.
RoundToSP(x,y)
x is a 2-bit unsigned integer specifying one of four rounding modes.
Return the value y rounded to single-precision under control of the rounding mode specified by x.
RoundToSPCeil(x)
x is a floating-point value having unbounded range and precision.
If x is a QNaN, return x.
RoundToSPFloor(x)
x is a floating-point value having unbounded range and precision.
If x is a QNaN, return x.
RoundToSPIntegerCeil(x)
x is a single-precision floating-point value.
If x is a QNaN, return x.
RoundToSPIntegerFloor(x)
x is a single-precision floating-point value.
If x is a QNaN, return x.
RoundToSPIntegerNearAway(x)
x is a single-precision floating-point value.
If x is a QNaN, return x.
RoundToSPIntegerNearEven(x)
x is a single-precision floating-point value.
If x is a QNaN, return x.
RoundToSPIntegerTrunc(x)
x is a single-precision floating-point value.
If x is a QNaN, return x.
RoundToSPNearEven(x)
x is a floating-point value having unbounded range and precision.
If x is a QNaN, return x.
RoundToSPTrunc(x)
x is a floating-point value having unbounded range and precision.
If x is a QNaN, return x.
Scalb(x,y)
x is a floating-point value having unbounded range and precision.
y is a signed integer.
SetFX(x)
x is one of the exception flags in the FPSCR.
SquareRootDP(x)
x is a double-precision floating-point value.
If x is a QNaN, return x.
Otherwise, if x is an SNaN, return x represented as a QNaN.
Otherwise, if x is a negative, nonzero value, return the default QNaN.
Otherwise, return the normalized square root of x, having unbounded range and precision.
SquareRootSP(x)
x is a single-precision floating-point value.
If x is a QNaN, return x.
Otherwise, if x is an SNaN, return x represented as a QNaN.
Otherwise, if x is a negative, nonzero value, return the default QNaN.
Otherwise, return the normalized square root of x, having unbounded range and precision.
57 VRT RA DS 2 31 T RA RB 588 TX
0 6 11 16 30 31 0 6 11 16 21 31
Let EA be the sum of the contents of GPR[RA], or 0 if Let EA be the sum of the contents of GPR[RA], or 0 if RA
RA=0, and the signed integer value DS<<2. is equal to 0, and the contents of GPR[RB].
When Big-Endian byte ordering is employed, the When Big-Endian byte ordering is employed, the
contents of the doubleword in storage at address EA contents of the doubleword in storage at address EA
are placed into load_data in such an order that; are placed into load_data in such an order that;
– the contents of the byte in storage at address EA – the contents of the byte in storage at address EA
are placed into byte 0 of load_data, are placed into byte 0 of load_data,
– the contents of the byte in storage at address EA+1 – the contents of the byte in storage at address EA+1
are placed into byte 1 of load_data, and so forth are placed into byte 1 of load_data, and so forth
until until
– the contents of the byte in storage at address EA+7 – the contents of the byte in storage at address EA+7
are placed into byte 7 of load_data. are placed into byte 7 of load_data.
When Little-Endian byte ordering is employed, let When Little-Endian byte ordering is employed, the
load_data be the contents of the doubleword in storage contents of the doubleword in storage at address EA
at address EA such that; are placed into load_data in such an order that;
– the contents of the byte in storage at address EA – the contents of the byte in storage at address EA
are placed into byte 7 of load_data, are placed into byte 7 of load_data,
– the contents of the byte in storage at address EA+1 – the contents of the byte in storage at address EA+1
are placed into byte 6 of load_data, and so forth are placed into byte 6 of load_data, and so forth
until until
– the contents of the byte in storage at address EA+7 – the contents of the byte in storage at address EA+7
are placed into byte 0 of load_data. are placed into byte 0 of load_data.
load_data is placed into doubleword element 0 of load_data is placed into doubleword element 0 of
VSR[XT]. VSR[XT].
The contents of doubleword element 1 of VSR[XT] are The contents of doubleword element 1 of VSR[XT] are
undefined. undefined.
Load VSX Scalar as Integer Byte & Zero Load VSX Scalar as Integer Halfword & Zero
Indexed X-form Indexed X-form
lxsibzx XT,RA,RB lxsihzx XT,RA,RB
31 T RA RB 781 TX 31 T RA RB 813 TX
0 6 11 16 21 31 0 6 11 16 21 31
if TX=0 & MSR.VSX=0 then VSX_Unavailable() if TX=0 & MSR.VSX=0 then VSX_Unavailable()
if TX=1 & MSR.VEC=0 then Vector_Unavailable() if TX=1 & MSR.VEC=0 then Vector_Unavailable()
Let the effective address (EA) be sum of the contents of Let the effective address (EA) be sum of the contents of
GPR[RA], or 0 if RA is equal to 0, and the contents of GPR[RA], or 0 if RA is equal to 0, and the contents of
GPR[RB]. GPR[RB].
The unsigned integer in the byte in storage addressed The unsigned integer in the halfword in storage
by EA is placed in doubleword element 0 of VSR[XT]. addressed by EA is placed in doubleword element 0 of
The contents of doubleword element 1 of VSR[XT] are VSR[XT]. The contents of doubleword element 1 of
undefined. VSR[XT] are undefined.
VSR Data Layout for lxsibzx VSR Data Layout for lxsihzx
tgt = VSR[XT] tgt = VSR[XT]
tgt.dword[0] undefined tgt.dword[0] undefined
0 64 127 0 64 127
VSR[32×TX+T].dword[0] EXTS64(MEM(EA,4))
VSR[32×TX+T].dword[1] 0xUUUU_UUUU_UUUU_UUUU
VSR[32×TX+T].dword[0] ExtendZero(MEM(EA,4))
VSR[32×TX+T].dword[1] 0xUUUU_UUUU_UUUU_UUUU
Load VSX Scalar Single DS-form Load VSX Scalar Single-Precision Indexed
X-form
lxssp VRT,DS(RA)
lxsspx XT,RA,RB
57 VRT RA DS 3
0 6 11 16 30 31 31 T RA RB 524 TX
0 6 11 16 21 31
if MSR.VEC=0 then Vector_Unavailable()
if MSR.VSX=0 then VSX_Unavailable()
EA ((RA=0) ? 0 : GPR[RA]) + EXTS(DS||0b00)
EA ((RA=0) ? 0 : GPR[RA]) + GPR[RB]
VSR[VRT+32].dword[0] ConvertSPtoSP64(MEM(EA,4))
VSR[VRT+32].dword[1] 0xUUUU_UUUU_UUUU_UUUU VSR[VRT+32].dword[0] ConvertSPtoSP64(MEM(EA,4))
VSR[VRT+32].dword[1] 0xUUUU_UUUU_UUUU_UUUU
Let XT be the value VRT + 32.
Let XT be the value 32×TX + T.
Let EA be the sum of the contents of GPR[RA], or 0 if
RA=0, and the signed integer value DS||0b00. Let EA be the sum of the contents of GPR[RA], or 0 if RA
is equal to 0, and the contents of GPR[RB].
When Big-Endian byte ordering is employed, the
contents of the word in storage at address EA are When Big-Endian byte ordering is employed, the
placed into load_data in such an order that; contents of the word in storage at address EA are
placed into load_data in such an order that;
– the contents of the byte in storage at address EA
are placed into byte 0 of load_data, – the contents of the byte in storage at address EA
are placed into byte 0 of load_data,
– the contents of the byte in storage at address EA+1
are placed into byte 1 of load_data, – the contents of the byte in storage at address EA+1
are placed into byte 1 of load_data,
– the contents of the byte in storage at address EA+2
are placed into byte 2 of load_data, and – the contents of the byte in storage at address EA+2
are placed into byte 2 of load_data, and
– the contents of the byte in storage at address EA+3
are placed into byte 3 of load_data. – the contents of the byte in storage at address EA+3
are placed into byte 3 of load_data.
When Little-Endian byte ordering is employed, the
contents of the word in storage at address EA are When Little-Endian byte ordering is employed, the
placed into load_data in such an order that; contents of the word in storage at address EA are
placed into load_data in such an order that;
– the contents of the byte in storage at address EA
are placed into byte 3 of load_data, – the contents of the byte in storage at address EA
are placed into byte 3 of load_data,
– the contents of the byte in storage at address EA+1
are placed into byte 2 of load_data, – the contents of the byte in storage at address EA+1
are placed into byte 2 of load_data,
– the contents of the byte in storage at address EA+2
are placed into byte 1 of load_data, and – the contents of the byte in storage at address EA+2
are placed into byte 1 of load_data, and
– the contents of the byte in storage at address EA+3
are placed into byte 0 of load_data. – the contents of the byte in storage at address EA+3
are placed into byte 0 of load_data.
load_data, interpreted as a single-precision
floating-point value, is placed into doubleword element load_data, interpreted as a single-precision
0 of VSR[VRT+32] in double-precision format. floating-point value, is placed in doubleword element 0
of VSR[XT] in double-precision format.
The contents of doubleword element 1 of VSR[VRT+32]
are undefined. The contents of doubleword element 1 of VSR[XT] are
undefined.
Special Registers Altered:
None Special Registers Altered
None
Load VSX Vector Byte*16 Indexed X-form Example: Loading data using Load VSX Vector
Byte*16 Indexed
lxvb16x XT,RA,RB
char X[] = { 0xF0, 0xF1, 0xF2, 0xF3,
31 T RA RB 876 TX 0xF4, 0xF5, 0xF6, 0xF7,
0 6 11 16 21 31 0xE0, 0xE1, 0xE2, 0xE3,
0xE4, 0xE5, 0xE6, 0xE7 };
if TX=0 & MSR.VSX=0 then VSX_Unavailable()
if TX=1 & MSR.VEC=0 then Vector_Unavailable()
Big-endian storage image of X
EA ((RA=0) ? 0 : GPR[RA]) + GPR[RB] addr(X): F0 F1 F2 F3 F4 F5 F6 F7 E0 E1 E2 E3 E4 E5 E6 E7
do i = 0 to 15 0 1 2 3 4 5 6 7 8 9 A B C D E F
VSR[32×TX+T].byte[i] MEM(EA+i, 1)
Little-endian storage image of X
end
addr(X): F0 F1 F2 F3 F4 F5 F6 F7 E0 E1 E2 E3 E4 E5 E6 E7
Let XT be the value 32×TX + T.
0 1 2 3 4 5 6 7 8 9 A B C D E F
Let the effective address (EA) be the sum of the
contents of GPR[RA], or 0 if RA is equal to 0, and the Loading a vector of 16 byte elements from Big-Endian
contents of GPR[RB]. storage in VSR[XT] using lxvb16x, retaining left-to-right
element ordering.
For each integer value from 0 to 15, do the following.
The contents of the byte in storage at address # Assumptions
# GPR[PX] = address of X
EA+i are placed into byte element i of VSR[XT],
lxvb16x xX,r0,rPX
Special Registers Altered:
None VSR[W]: F0 F1 F2 F3 F4 F5 F6 F7 E0 E1 E2 E3 E4 E5 E6 E7
0 1 2 3 4 5 6 7 8 9 A B C D E F
Programming Note
lxvd2x, lxvw4x, lxvh8x, lxvb16x, and lxvx exhibit Loading a vector of 16 byte elements from
identical behavior in Big-Endian mode. Little-Endian storage in VSR[XT] using lxvb16x,
retaining left-to-right element ordering.
# Assumptions
# GPR[PX] = address of X
lxvb16x xX,r0,rPX
VSR[X]: F0 F1 F2 F3 F4 F5 F6 F7 E0 E1 E2 E3 E4 E5 E6 E7
0 1 2 3 4 5 6 7 8 9 A B C D E F
31 T RA RB 269 TX
0 6 11 16 21 31
EA (RA=0) ? 0 : GPR[RA]
nb EXTZ(GPR[RB].bit[0:7])
if nb>16 then nb 16
load_data 0x0000_0000_0000_0000_0000_0000_0000_0000
VSR[32×TX+T] load_data
Example: Loading less than 16-byte data into VSR using lxvl
Loading less than 16-byte data from Big-Endian Loading less than 16-byte data from Little-Endian
storage in VSR[XT] using lxvl. storage in VSR[XT] using lxvl.
addr(S)+0x0010: E2 E3 E4 E5 E6 E7 E8 E9 EA EB F0 F1 F2 F3 F4 F5 addr(S)+0x0010: E3 E2 E5 E4 E7 E6 E9 E8 EB EA F9 F8 F7 F6 F5 F4
addr(S)+0x0020: F6 F7 F8 F9 00 00 00 00 00 00 00 00 00 00 00 00 addr(S)+0x0020: F3 F2 F1 F0 00 00 00 00 00 00 00 00 00 00 00 00
0 1 2 3 4 5 6 7 8 9 A B C D E F 0 1 2 3 4 5 6 7 8 9 A B C D E F
# Assumptions # Assumptions
# GPR[NS] = 14 (length of S in # of bytes) # GPR[NS] = 14 (length of S in # of bytes)
# GPR[NX] = 12 (length of X in # of bytes) # GPR[NX] = 12 (length of X in # of bytes)
# GPR[NZ] = 10 (length of Z in # of bytes) # GPR[NZ] = 10 (length of Z in # of bytes)
# GPR[PS] = address of S # GPR[PS] = address of S
VSR[X]: E0 E1 E2 E3 E4 E5 E6 E7 E8 E9 EA EB 00 00 00 00 VSR[X]: 00 00 00 00 EA EB E8 E9 E6 E7 E4 E5 E2 E3 E0 E1
VSR[Z]: F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 00 00 00 00 00 00 VSR[Z]: 00 00 00 00 00 00 F0 F1 F2 F3 F4 F5 F6 F7 F8 F9
0 1 2 3 4 5 6 7 8 9 A B C D E F 0 1 2 3 4 5 6 7 8 9 A B C D E F
Load VSX Vector Left-justified with Length Example: Loading less than 16-byte left-justified
X-form data
lxvll XT,RA,RB
decimal X = +1234567890123456789;
31 T RA RB 301 TX decimal Y = -123456;
0 6 11 16 21 31 decimal Z = +1004966723510220;
61 T RA DQ TX 1 31 T RA RB 4 / 12 TX
0 6 11 16 28 29 31 0 6 11 16 21 25 26 31
if TX=0 & MSR.VSX=0 then VSX_Unavailable() if TX=0 & MSR.VSX=0 then VSX_Unavailable()
if TX=1 & MSR.VEC=0 then Vector_Unavailable() if TX=1 & MSR.VEC=0 then Vector_Unavailable()
Let the effective address (EA) be the sum of the Let the effective address (EA) be the sum of the
contents of GPR[RA], or 0 if RA is equal to 0, and the contents of GPR[RA], or 0 if RA is equal to 0, and the
signed integer value DQ||0b0000. contents of GPR[RB].
When Big-Endian byte ordering is employed, the When Big-Endian byte ordering is employed, the
contents of the quadword in storage at address EA are contents of the quadword in storage at address EA are
placed into load_data in such an order that; placed into load_data in such an order that;
– the contents of the byte in storage at address EA – the contents of the byte in storage at address EA
are placed into byte element 0 of load_data, are placed into byte element 0 of load_data,
– the contents of the byte in storage at address EA+1 – the contents of the byte in storage at address EA+1
are placed into byte element 1 of load_data, and are placed into byte element 1 of load_data, and
so forth until so forth until
– the contents of the byte in storage at address – the contents of the byte in storage at address
EA+15 are placed into byte element 15 of EA+15 are placed into byte element 15 of
load_data. load_data.
When Little-Endian byte ordering is employed, the When Little-Endian byte ordering is employed, the
contents of the quadword in storage at address EA are contents of the quadword in storage at address EA are
placed into load_data in such an order that; placed into load_data in such an order that;
– the contents of the byte in storage at address EA – the contents of the byte in storage at address EA
are placed into byte element 15 of load_data, are placed into byte element 15 of load_data,
– the contents of the byte in storage at address EA+1 – the contents of the byte in storage at address EA+1
are placed into byte element 14 of load_data, and are placed into byte element 14 of load_data, and
so forth until so forth until
– the contents of the byte in storage at address – the contents of the byte in storage at address
EA+15 are placed into byte element 0 of load_data. EA+15 are placed into byte element 0 of load_data.
char W[16] = { 0xF0, 0xF1, 0xF2, 0xF3, 0xF4, 0xF5, 0xF6, 0xF7, 0xE0, 0xE1, 0xE2, 0xE3, 0xE4, 0xE5, 0xE6, 0xE7 };
short X[8] = { 0xF0F1, 0xF2F3, 0xF4F5, 0xF6F7, 0xE0E1, 0xE2E3, 0xE4E5, 0xE6E7 };
float Y[4] = { 0xF0F1_F2F3, 0xF4F5_F6F7, 0xE0E1_E2E3, 0xE4E5_E6E7 };
double Z[2] = { 0xF0F1_F2F3_F4F5_F6F7, 0xE0E1_E2E3_E4E5_E6E7 };
Loading 16 bytes of data from Big-Endian storage in Loading 16 bytes of data from Little-Endian storage in
VSR[XT] using lxvx. VSR[XT] using lxvx.
addr(W+0x0010): F0 F1 F2 F3 F4 F5 F6 F7 E0 E1 E2 E3 E4 E5 E6 E7 addr(W+0x0010): F1 F0 F3 F2 F5 F4 F7 F6 E1 E0 E3 E2 E5 E4 E7 E6
addr(W+0x0020): F0 F1 F2 F3 F4 F5 F6 F7 E0 E1 E2 E3 E4 E5 E6 E7 addr(W+0x0020): F3 F2 F1 F0 F7 F6 F5 F4 E3 E2 E1 E0 E7 E6 E5 E4
addr(W+0x0030): F0 F1 F2 F3 F4 F5 F6 F7 E0 E1 E2 E3 E4 E5 E6 E7 addr(W+0x0030): F7 F6 F5 F4 F3 F2 F1 F0 E7 E6 E5 E4 E3 E2 E1 E0
0 1 2 3 4 5 6 7 8 9 A B C D E F 0 1 2 3 4 5 6 7 8 9 A B C D E F
# Assumptions # Assumptions
# GPR[PW] = address of W # GPR[PW] = address of W
# GPR[PX] = address of X = GPR[PW] + 16 # GPR[PX] = address of X = GPR[PW] + 16
# GPR[PY] = address of Y = GPR[PW] + 32 # GPR[PY] = address of Y = GPR[PW] + 32
# GPR[PZ] = address of Z = GPR[PW] + 48 # GPR[PZ] = address of Z = GPR[PW] + 48
VSR[X]: F0 F1 F2 F3 F4 F5 F6 F7 E0 E1 E2 E3 E4 E5 E6 E7 VSR[X]: E6 E7 E4 E5 E2 E3 E0 E1 F6 F7 F4 F5 F2 F3 F0 F1
VSR[Y]: F0 F1 F2 F3 F4 F5 F6 F7 E0 E1 E2 E3 E4 E5 E6 E7 VSR[Y]: E4 E5 E6 E7 E0 E1 E2 E3 F4 F5 F6 F7 F0 F1 F2 F3
VSR[Z]: F0 F1 F2 F3 F4 F5 F6 F7 E0 E1 E2 E3 E4 E5 E6 E7 VSR[Z]: E0 E1 E2 E3 E4 E5 E6 E7 F0 F1 F2 F3 F4 F5 F6 F7
0 1 2 3 4 5 6 7 8 9 A B C D E F 0 1 2 3 4 5 6 7 8 9 A B C D E F
31 T RA RB 332 TX
0 6 11 16 21 31
load_data MEM(EA, 8)
VSR[32×TX+T].dword[0] load_data
VSR[32×TX+T].dword[1] load_data
Load VSX Vector Halfword*8 Indexed X-form Example: Loading data using Load VSX Vector
Halfword*8 Indexed
lxvh8x XT,RA,RB
short X[] = { 0x0001, 0x1011, 0x2021, 0x3031,
31 T RA RB 812 TX 0x4041, 0x5051, 0x6061, 0x7071 };
0 6 11 16 21 31
Let the effective address (EA) be the sum of the Loading a vector of 8 halfword elements from
contents of GPR[RA], or 0 if RA is equal to 0, and the Big-Endian storage in VSR[XT] using lxvh8x, retaining
contents of GPR[RB]. left-to-right element ordering.
0 1 2 3 4 5 6 7 8 9 A B C D E F
– the contents of the byte in storage at address
EA+2×i are placed into byte element 1 of
load_data,
Programming Note
lxvd2x, lxvw4x, lxvh8x, lxvb16x, and lxvx exhibit
identical behavior in Big-Endian mode.
Load VSX Vector Word*4 Indexed X-form – the contents of the byte in storage at address
EA+4×i+3 are placed into byte element 0 of
lxvw4x XT,RA,RB load_data.
31 T RA RB 780 TX load_data is placed into word element i of
0 6 11 16 21 31
VSR[XT].
if MSR.VSX=0 then VSX_Unavailable()
Special Registers Altered
EA ((RA=0) ? 0 : GPR[RA]) + GPR[RB] None
VSR[32×TX+T].word[0] MEM(EA, 4)
VSR[32×TX+T].word[1] MEM(EA+4, 4) VSR Data Layout for lxvw4x
VSR[32×TX+T].word[2] MEM(EA+8, 4) tgt = VSR[XT]
VSR[32×TX+T].word[3] MEM(EA+12, 4)
.word[0] .word[1] .word[2] .word[3]
Let XT be the value 32×TX + T. 0 32 64 96 127
Load VSX Vector Word & Splat Indexed X-form Example: Loading data using Load VSX Vector
Word & Splat Indexed
lxvwsx XT,RA,RB
int X = 0xF0F1_F2F3;
31 T RA RB 364 TX
0 6 11 16 21 31
Big-endian storage image of X
if TX=0 & MSR.VSX=0 then VSX_Unavailable() addr(X): F0 F1 F2 F3 00 00 00 00 00 00 00 00 00 00 00 00
if TX=1 & MSR.VEC=0 then Vector_Unavailable()
0 1 2 3 4 5 6 7 8 9 A B C D E F
EA ((RA=0) ? 0 : GPR[RA]) + GPR[RB]
Little-endian storage image of X
load_data MEM(EA,4) addr(X): F3 F2 F1 F0 00 00 00 00 00 00 00 00 00 00 00 00
do i = 0 to 3 0 1 2 3 4 5 6 7 8 9 A B C D E F
VSR[32×TX+T].word[i] load_data
end Loading scalar word data from Big-Endian storage in
VSR[XT] using lxvwsx.
Let XT be the value 32×TX + T.
# Assumptions
Let the effective address (EA) be the sum of the # GPR[PX] = address of X
contents of GPR[RA], or 0 if RA is equal to 0, and the
contents of GPR[RB]. lxvwsx xX,r0,rPX
– the contents of the byte in storage at address EA Loading scalar word data from Little-Endian storage in
are placed into byte element 0 of load_data, VSR[XT] using lxvwsx.
Store VSX Scalar Doubleword DS-form Store VSX Scalar Doubleword Indexed X-form
stxsd VRS,DS(RA) stxsdx XS,RA,RB
61 VRS RA DS 2 31 S RA RB 716 SX
0 6 11 16 30 31 0 6 11 16 21 31
Let EA be the sum of the contents of GPR[RA], or 0 if Let EA be the sum of the contents of GPR[RA], or 0 if RA
RA=0, and the signed integer value DS<<2. is equal to 0, and the contents of GPR[RB].
Let store_data be the contents of doubleword element Let store_data be the contents of doubleword element
0 of VSR[XS]. 0 of VSR[XS].
When Big-Endian byte ordering is employed, When Big-Endian byte ordering is employed,
store_data is placed in the doubleword in storage at store_data is placed in the doubleword in storage at
address EA in such order that; address EA in such order that;
– byte 0 of store_data is placed into the byte in – byte 0 of store_data is placed into the byte in
storage at address EA, storage at address EA,
– byte 1 of store_data is placed into the byte in – byte 1 of store_data is placed into the byte in
storage at address EA+1, and so forth until storage at address EA+1, and so forth until
– byte 7 of store_data is placed into the byte in – byte 7 of store_data is placed into the byte in
storage at address EA+7. storage at address EA+7.
When Little-Endian byte ordering is employed, When Little-Endian byte ordering is employed,
store_data is placed in the doubleword in storage at store_data is placed in the doubleword in storage at
address EA in such order that; address EA in such order that;
– the contents of byte 7 of doubleword element 0 of – byte 0 of store_data is placed into the byte in
VSR[VRS+32] are placed into the byte in storage at storage at address EA+7,
address EA,
– byte 1 of store_data is placed into the byte in
– the contents of byte 6 of doubleword element 0 of storage at address EA+6, and so forth until
VSR[VRS+32] are placed into the byte in storage at
address EA+1, and so forth until – byte 7 of store_data is placed into the byte in
storage at address EA.
– the contents of byte 0 of doubleword element 0 of
VSR[VRS+32] are placed into the byte in storage at Special Registers Altered
address EA+7. None
.dword[0] unused
0 64 127
Store VSX Scalar as Integer Byte Indexed Store VSX Scalar as Integer Halfword Indexed
X-form X-form
stxsibx XS,RA,RB stxsihx XS,RA,RB
31 S RA RB 909 SX 31 S RA RB 941 SX
0 6 11 16 21 31 0 6 11 16 21 31
if SX=0 & MSR.VSX=0 then VSX_Unavailable() if SX=0 & MSR.VSX=0 then VSX_Unavailable()
if SX=1 & MSR.VEC=0 then Vector_Unavailable() if SX=1 & MSR.VEC=0 then Vector_Unavailable()
Let the effective address (EA) be sum of the contents of Let the effective address (EA) be sum of the contents of
GPR[RA], or 0 if RA is equal to 0, and the contents of GPR[RA], or 0 if RA is equal to 0, and the contents of
GPR[RB]. GPR[RB].
The contents of byte element 7 of VSR[XS] are placed The contents of halfword element 3 of VSR[XS] are
into the byte in storage addressed by EA. placed into the halfword in storage addressed by EA.
VSR Data Layout for stxsibx VSR Data Layout for stxsihx
src = VSR[XS] src = VSR[XS]
unused .byte unused unused .hword[3] unused
0 56 64 127 0 48 64 127
MEM(EA,4) VSR[32×SX+S].word[1]
MEM(EA,4) ConvertSP64toSP(VSR[VRS+32].dword[0])
MEM(EA,4) ConvertSP64toSP(VSR[32×SX+S].dword[0])
Store VSX Vector Byte*16 Indexed X-form Example: Storing data using Store VSX Vector
Byte*16 Indexed
stxvb16x XS,RA,RB
char X[16];
31 S RA RB 1004 SX
0 6 11 16 21 31 VSR[X]: F0 F1 F2 F3 F4 F5 F6 F7 E0 E1 E2 E3 E4 E5 E6 E7
stxvb16x xX,r0,rPX
Let XS be the value 32×SX + S.
Big-endian storage image of X
Let the effective address (EA) be the sum of the
addr(X): F0 F1 F2 F3 F4 F5 F6 F7 E0 E1 E2 E3 E4 E5 E6 E7
contents of GPR[RA], or 0 if RA is equal to 0, and the
contents of GPR[RB]. 0 1 2 3 4 5 6 7 8 9 A B C D E F
For each integer value from 0 to 15, do the following. Loading a vector of 16 byte elements from
The contents of byte element i of VSR[XS] are Little-Endian storage in VSR[XT] using lxvb16x,
placed into the byte in storage at address EA+i. retaining left-to-right element ordering.
0 1 2 3 4 5 6 7 8 9 A B C D E F
31 S RA RB 972 SX
0 6 11 16 21 31
MEM(EA,8) VSR[32×SX+S].dword[0]
MEM(EA+8,8) VSR[32×SX+S].dword[1]
Store VSX Vector Halfword*8 Indexed X-form Example: Storing data using Store VSX Vector
Halfword*8 Indexed
stxvh8x XS,RA,RB
short X[8];
31 S RA RB 940 SX
0 6 11 16 21 31
VSR[X]: 00 01 10 11 20 21 30 31 40 41 50 51 60 61 70 71
if SX=0 & MSR.VSX=0 then VSX_Unavailable()
if SX=1 & MSR.VEC=0 then Vector_Unavailable() 0 1 2 3 4 5 6 7 8 9 A B C D E F
0 1 2 3 4 5 6 7 8 9 A B C D E F
For each integer value from 0 to 7, do the following.
The contents of byte element i of VSR[XS] are Storing a vector of 8 halfword elements from VSR[X]
placed into the byte in storage at address EA+i. into Little-Endian storage using stxvh8x, retaining
left-to-right element ordering.
For each integer value from 0 to 7, do the following.
When Big-Endian byte ordering is employed, the # Assumptions
contents of halfword element i of VSR[XS] are # GPR[PX] = address of X
placed into the halfword in storage at address
EA+2×i in such an order that; stxvh8x xX,r0,rPX
MEM(EA,4) VSR[32×SX+S].word[0]
MEM(EA+4,4) VSR[32×SX+S].word[1]
MEM(EA+8,4) VSR[32×SX+S].word[2]
MEM(EA+12,4) VSR[32×SX+S].word[3]
Store VSX Vector DQ-form Store VSX Vector with Length X-form
stxv XS,DQ(RA) stxvl XS,RA,RB
61 S RA DQ SX 5 31 S RA RB 397 SX
0 6 11 16 28 29 31 0 6 11 16 21 31
if SX=0 & MSR.VSX=0 then VSX_Unavailable() if SX=0 & MSR.VSX=0 then VSX_Unavailable()
if SX=1 & MSR.VEC=0 then Vector_Unavailable() if SX=1 & MSR.VEC=0 then Vector_Unavailable()
– byte 15 of store_data is placed into the byte in Otherwise, when Big-Endian byte-ordering is
storage at address EA+15. employed, do the following.
If nb less than 16, the contents of the leftmost nb
When Little-Endian byte ordering is employed, bytes of VSR[XS] are placed in storage starting at
store_data is placed into the quadword in storage at address EA.
address EA in such an order that;
Otherwise, the contents of VSR[XS] are placed into
– byte 15 of store_data is placed into the byte in the quadword in storage at address EA.
storage at address EA,
Otherwise, when Little-Endian byte ordering is
– byte 14 of store_data is placed into the byte in employed, do the following.
storage at address EA+1, and so forth until If nb less than 16, the contents of the rightmost nb
bytes of VSR[XS] are placed in storage starting at
– byte 0 of store_data is placed into the byte in address EA in byte-reversed order.
storage at address EA+15.
Otherwise, the contents of VSR[XS] are placed into
Special Registers Altered the quadword in storage at address EA in
None byte-reversed order.
Example: Storing less than 16-byte data from VSR using stxvl
0 1 2 3 4 5 6 7 8 9 A B C D E F
Store VSX Vector Left-justified with Length Example: Storing less than 16-byte left-justified
X-form data
stxvll XS,RA,RB decimal X = +1234567890123456789;
decimal Y = -123456;
31 S RA RB 429 SX decimal Z = +1004966723510220;
0 6 11 16 21 31
X+0x0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Otherwise, do the following.
If nb less than 16, the contents of the leftmost nb 0 1 2 3 4 5 6 7 8 9 A B C D E F
bytes of VSR[XS] are placed in storage starting at
stxvll xX,rPX,rNX
address EA. stxvll xY,rPY,rNY
stxvll xZ,rPZ,rNZ
Otherwise, the contents of VSR[XS] are placed into
the quadword in storage at address EA. Final state of Big-endian & Little-Endian storage image of X, Y, & Z
X+0x0000: 12 34 56 78 90 12 34 56 78 9C 01 23 45 6D 01 00
Data is stored from VSR[XS] into storage in
Big-Endian byte ordering (i.e., the contents of byte X+0x0010: 49 66 72 35 10 22 0C 00 00 00 00 00 00 00 00 00
element 0 of VSR[XS] are placed into the byte in
0 1 2 3 4 5 6 7 8 9 A B C D E F
storage at address EA, the contents of byte
element 1 of VSR[XS] are placed into the byte in
storage at address EA+1, and so forth).
31 S RA RB 396 SX
0 6 11 16 21 31
MEM(EA,16) VSR[32×SX+S]
Programming Note
stxvd2x, stxvw4x, stxvh8x, stxvb16x, and stxvx
exhibit identical behavior in Big-Endian mode.
char W[16] = { 0xF0, 0xF1, 0xF2, 0xF3, 0xF4, 0xF5, 0xF6, 0xF7, 0xE0, 0xE1, 0xE2, 0xE3, 0xE4, 0xE5, 0xE6, 0xE7 };
short X[8] = { 0xF0F1, 0xF2F3, 0xF4F5, 0xF6F7, 0xE0E1, 0xE2E3, 0xE4E5, 0xE6E7 };
float Y[4] = { 0xF0F1_F2F3, 0xF4F5_F6F7, 0xE0E1_E2E3, 0xE4E5_E6E7 };
double Z[2] = { 0xF0F1_F2F3_F4F5_F6F7, 0xE0E1_E2E3_E4E5_E6E7 };
Storing 16 bytes of data into Big-Endian storage from Storing 16 bytes of data into Little-Endian storage from
VSR[XS] using stxvx. VSR[XS] using stxvx.
VSR[W]: F0 F1 F2 F3 F4 F5 F6 F7 E0 E1 E2 E3 E4 E5 E6 E7 VSR[W]: E7 E6 E5 E4 E3 E2 E1 E0 F7 F6 F5 F4 F3 F2 F1 F0
VSR[X]: F0 F1 F2 F3 F4 F5 F6 F7 E0 E1 E2 E3 E4 E5 E6 E7 VSR[X]: E6 E7 E4 E5 E2 E3 E0 E1 F6 F7 F4 F5 F2 F3 F0 F1
VSR[Y]: F0 F1 F2 F3 F4 F5 F6 F7 E0 E1 E2 E3 E4 E5 E6 E7 VSR[Y]: E4 E5 E6 E7 E0 E1 E2 E3 F4 F5 F6 F7 F0 F1 F2 F3
VSR[Z]: F0 F1 F2 F3 F4 F5 F6 F7 E0 E1 E2 E3 E4 E5 E6 E7 VSR[Z]: E0 E1 E2 E3 E4 E5 E6 E7 F0 F1 F2 F3 F4 F5 F6 F7
0 1 2 3 4 5 6 7 8 9 A B C D E F 0 1 2 3 4 5 6 7 8 9 A B C D E F
# Assumptions # Assumptions
# GPR[PW] = address of W # GPR[PW] = address of W
# GPR[PX] = address of X = GPR[PW] + 16 # GPR[PX] = address of X = GPR[PW] + 16
# GPR[PY] = address of Y = GPR[PW] + 32 # GPR[PY] = address of Y = GPR[PW] + 32
# GPR[PZ] = address of Z = GPR[PW] + 48 # GPR[PZ] = address of Z = GPR[PW] + 48
addr(W+0x0010): F0 F1 F2 F3 F4 F5 F6 F7 E0 E1 E2 E3 E4 E5 E6 E7 addr(W+0x0010): F1 F0 F3 F2 F5 F4 F7 F6 E1 E0 E3 E2 E5 E4 E7 E6
addr(W+0x0020): F0 F1 F2 F3 F4 F5 F6 F7 E0 E1 E2 E3 E4 E5 E6 E7 addr(W+0x0020): F3 F2 F1 F0 F7 F6 F5 F4 E3 E2 E1 E0 E7 E6 E5 E4
addr(W+0x0030): F0 F1 F2 F3 F4 F5 F6 F7 E0 E1 E2 E3 E4 E5 E6 E7 addr(W+0x0030): F7 F6 F5 F4 F3 F2 F1 F0 E7 E6 E5 E4 E3 E2 E1 E0
0 1 2 3 4 5 6 7 8 9 A B C D E F 0 1 2 3 4 5 6 7 8 9 A B C D E F
src
The contents of doubleword element 1 of VSR[XT] are
undefined. VSR[XT]
tgt
Special Registers Altered
None
Programming Note
This instruction can be used to operate on a
single-precision source operand.
VSX Scalar Add Double-Precision XX3-form The result is placed into doubleword element 0 of
VSR[XT] in double-precision format.
xsadddp XT,XA,XB
The contents of doubleword element 1 of VSR[XT] are
60 T A B 32 AX BX TX undefined.
0 6 11 16 21 29 30 31
FPRF is set to the class and sign of the result. FR is
XT TX || T set to indicate if the result was incremented when
XA AX || A rounded. FI is set to indicate the result is inexact.
XB BX || B
reset_xflags()
If a trap-enabled invalid operation exception occurs,
src1 VSR[XA]{0:63}
src2 VSR[XB]{0:63}
VSR[XT] and FPRF are not modified, and FR and FI
v{0:inf} AddDP(src1,src2) are set to 0.
result{0:63} RoundToDP(RN,v)
if(vxsnan_flag) then SetFX(VXSNAN) See Table 51, “VSX Scalar Floating-Point Final
if(vxisi_flag) then SetFX(VXISI) Result,” on page 516.
if(ox_flag) then SetFX(OX)
if(ux_flag) then SetFX(UX) Special Registers Altered
if(xx_flag) then SetFX(XX) FPRF FR FI FX OX UX XX
vex_flag VE & (vxsnan_flag | vxisi_flag) VXSNAN VXISI
if( ~vex_flag ) then do
VSR[XT] result || 0xUUUU_UUUU_UUUU_UUUU VSR Data Layout for xsadddp
FPRF ClassSP(result) src1 = VSR[XA]
FR inc_flag
FI xx_flag DP unused
end
else do src2 = VSR[XB]
FR 0b0 DP unused
FI 0b0
end tgt = VSR[XT]
1. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared,
and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two expo-
nents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an interme-
diate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation.
2. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the num-
ber of bits the significand was shifted.
src2
-Infinity -NZF -Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN v Q(src2)
-Infinity v -Infinity v -Infinity v -Infinity v -Infinity v -Infinity v src2
vxisi_flag 1 vxsnan_flag 1
v Q(src2)
-NZF v -Infinity v A(src1,src2) v src1 v src1 v A(src1,src2) v +Infinity v src2
vxsnan_flag 1
v Q(src2)
-Zero v -Infinity v src2 v -Zero v Rezd v src2 v +Infinity v src2
vxsnan_flag 1
v Q(src2)
+Zero v -Infinity v src2 v Rezd v +Zero v src2 v +Infinity v src2
vxsnan_flag 1
src1
v Q(src2)
+NZF v -Infinity v A(src1,src2) v src1 v src1 v A(src1,src2) v +Infinity v src2
vxsnan_flag 1
v dQNaN v Q(src2)
+Infinity v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v src2
vxisi_flag 1 vxsnan_flag 1
v src1
QNaN v src1 v src1 v src1 v src1 v src1 v src1 v src1
vxsnan_flag 1
v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1)
SNaN
vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1
Explanation:
src1 The double-precision floating-point value in doubleword element 0 of VSR[XA].
src2 The double-precision floating-point value in doubleword element 0 of VSR[XB].
dQNaN Default quiet NaN (0x7FF8_0000_0000_0000).
NZF Nonzero finite number.
Rezd Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs).
A(x,y) Return the normalized sum of floating-point value x and floating-point value y, having unbounded range and precision.
Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd).
Q(x) Return a QNaN with the payload of x.
v The intermediate result having unbounded signficand precision and unbounded exponent range.
Rounding Mode
Range of v Case Round To Round Towards Round Towards Round Towards Round To
Nearest (RTN) Zero (RTZ) +Infinity (RTP) –Infinity (RTM) Odd (RTO)
Explanation:
– This situation cannot occur.
v The precise intermediate result defined in the instruction having unbounded range and precision.
den(x) The significand of x is shifted right by the amount of the difference between the target rounding precision Emin and the unbiased
exponent of x. The unbiased exponent of the denormalized value is Emin. The significand of the denormalized value has
unbounded significand precision.
Emin = -16382 (quad-precision)
Emin = -16382 (double-extended-precision)
Emin = -1022 (double-precision)
Emin = -126 (single-precision)
Rezd Exact-zero-difference result. Applies only to add operations involving source operands having the same magnitude and different
signs or subtract operations involving source operands having the same magnitude and same signs. Whether +Zero or -Zero is
returned is controlled by the setting of the rounding mode in RN, even when the rounding mode is overridden to Round to Odd.
rnd(x) The significand of x is rounded to the target rounding precision according to the rounding mode specified in FPSCR.RN. Exponent
range of the rounded result is unbounded. See Section 7.3.2.6.
Nmax Largest (in magnitude) representable normalized number in the target rounding precision format.
Nmax = 2+16383 × 1.FFFFFFFFFFFFFFFFFFFFFFFFFFFF (quad-precision)
Nmax = 2+16383 × 1.FFFFFFFFFFFFFFFF000000000000 (double-extended-precision)
Nmax = 2+1023 × 1.FFFFFFFFFFFFF000000000000000 (double-precision)
Nmax = 2+127 × 1.FFFFFF0000000000000000000000 (single-precision)
Nmin Smallest (in magnitude) representable normalized number in the target rounding precision format.
Nmin = 2-16382 × 1.0000000000000000000000000000 (quad-precision)
Nmin = 2-16382 × 1.0000000000000000000000000000 (double-extended-precision)
Nmin = 2-1022 × 1.0000000000000000000000000000 (double-precision)
Nmin = 2-126 × 1.0000000000000000000000000000 (single-precision)
ulp Least significant bit in the target precision format’s significand (Unit in the Last Position).
Is q inexact? (q g v)
vxsnan_flag
vxsqrt_flag
vximz_flag
vxisi_flag
vxidi_flag
vxzdz_flag
FPSCR.VE
FPSCR.OE
FPSCR.UE
FPSCR.ZE
FPSCR.XE
zx_flag
Case Returned Results and Status Setting
Explanation:
– The results do not depend on this condition.
T(x) Places the result into the target VSR.
For scalar single-precision and double-precision results
VSR[XT].dword[0] bfp_CONVERT_TO_BFP64(r)
VSR[XT].dword[1] 0xUUUU_UUUU_UUUU_UUUU
For scalar quad-precision results
VSR[VRT+32] bfp_CONVERT_TO_BFP128(r)
class_bfp(x) Sets FPSCR.FPRF to the sign and class of x.
FPSCR.FPRF fprf_CLASS_BFP32(x) (single-precision)
FPSCR.FPRF fprf_CLASS_BFP64(x) (double-precision)
FPSCR.FPRF fprf_CLASS_BFP128(x) (quad-precision)
fx(x) FPSCR.FX is set to 1 if FPSCR.x=0. FPSCR.x is set to 1.
fi(x) FPSCR.FI is set to the value x.
fr(x) FPSCR.FR is set to the value x.
Wrap adjust
= 2192 (single-precision)
= 21536 (double-precision)
= 224576 (quad-precision)
See Table 7.4.3.2, “Action for OE=1,” on page 404 for trap-enabled Overflow exceptions.
See Table 7.4.4.2, “Action for UE=1,” on page 409 for trap-enabled Underflow exceptions.
q The value defined in Table 50, “Scalar Floating-Point Intermediate Result Handling,” on page 515, significand rounded to the
target rounding precision, unbounded exponent range.
r The value defined in Table 50, “Scalar Floating-Point Intermediate Result Handling,” on page 515, significand rounded to the
target rounding precision, exponent bounded to the target rounding precision format exponent range.
error() The system error handler is invoked for the trap-enabled exception if MSR.FE0 and MSR.FE1 are set to any mode other than the
ignore-exception mode.
Is q inexact? (q g v)
vxsnan_flag
vxsqrt_flag
vximz_flag
vxisi_flag
vxidi_flag
vxzdz_flag
FPSCR.VE
FPSCR.OE
FPSCR.UE
FPSCR.ZE
FPSCR.XE
zx_flag
Case Returned Results and Status Setting
Explanation:
– The results do not depend on this condition.
T(x) Places the result into the target VSR.
For scalar single-precision and double-precision results
VSR[XT].dword[0] bfp_CONVERT_TO_BFP64(r)
VSR[XT].dword[1] 0xUUUU_UUUU_UUUU_UUUU
For scalar quad-precision results
VSR[VRT+32] bfp_CONVERT_TO_BFP128(r)
class_bfp(x) Sets FPSCR.FPRF to the sign and class of x.
FPSCR.FPRF fprf_CLASS_BFP32(x) (single-precision)
FPSCR.FPRF fprf_CLASS_BFP64(x) (double-precision)
FPSCR.FPRF fprf_CLASS_BFP128(x) (quad-precision)
fx(x) FPSCR.FX is set to 1 if FPSCR.x=0. FPSCR.x is set to 1.
fi(x) FPSCR.FI is set to the value x.
fr(x) FPSCR.FR is set to the value x.
Wrap adjust
= 2192 (single-precision)
= 21536 (double-precision)
= 224576 (quad-precision)
See Table 7.4.3.2, “Action for OE=1,” on page 404 for trap-enabled Overflow exceptions.
See Table 7.4.4.2, “Action for UE=1,” on page 409 for trap-enabled Underflow exceptions.
q The value defined in Table 50, “Scalar Floating-Point Intermediate Result Handling,” on page 515, significand rounded to the
target rounding precision, unbounded exponent range.
r The value defined in Table 50, “Scalar Floating-Point Intermediate Result Handling,” on page 515, significand rounded to the
target rounding precision, exponent bounded to the target rounding precision format exponent range.
error() The system error handler is invoked for the trap-enabled exception if MSR.FE0 and MSR.FE1 are set to any mode other than the
ignore-exception mode.
VSX Scalar Add Single-Precision XX3-form The result is placed into doubleword element 0 of
VSR[XT] in double-precision format.
xsaddsp XT,XA,XB
The contents of doubleword element 1 of VSR[XT] are
60 T A B 0 AXBXTX undefined.
0 6 11 16 21 29 30 31
FPRF is set to the class and sign of the result as
reset_xflags()
represented in single-precision format. FR is set to
src1 VSR[32×AX+A].dword[0]
indicate if the result was incremented when rounded.
src2 VSR[32×BX+B].dword[0] FI is set to indicate the result is inexact.
v AddDP(src1,src2)
result RoundToSP(RN,v) If a trap-enabled invalid operation exception occurs,
VSR[XT] and FPRF are not modified, and FR and FI
if(vxsnan_flag) then SetFX(VXSNAN) are set to 0.
if(vxisi_flag) then SetFX(VXISI)
if(ox_flag) then SetFX(OX) See Table 51, “VSX Scalar Floating-Point Final
if(ux_flag) then SetFX(UX) Result,” on page 516.
if(xx_flag) then SetFX(XX)
Special Registers Altered
vex_flag VE & (vxsnan_flag | vxisi_flag)
FPRF FR FI FX OX UX XX
if( ~vex_flag ) then do
VXSNAN VXISI
VSR[32×TX+T].dword[0] ConverSPtoDP(result)
VSR[32×TX+T].dword[1] 0xUUUU_UUUU_UUUU_UUUU VSR Data Layout for xsaddsp
FPRF ClassSP(result) src1 = VSR[XA]
FR inc_flag
DP unused
FI xx_flag
end src2 = VSR[XB]
else do
DP unused
FR 0b0
FI 0b0 tgt = VSR[XT]
end DP undefined
0 64 127
Let XT be the value 32×TX + T.
Let XA be the value 32×AX + A.
Let XB be the value 32×BX + B.
1. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared,
and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two
exponents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an
intermediate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation.
2. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number
of bits the significand was shifted.
src2
-Infinity -NZF -Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN v Q(src2)
-Infinity v -Infinity v -Infinity v -Infinity v -Infinity v -Infinity v src2
vxisi_flag 1 vxsnan_flag 1
v Q(src2)
-NZF v -Infinity v A(src1,src2) v src1 v src1 v A(src1,src2) v +Infinity v src2
vxsnan_flag 1
v Q(src2)
-Zero v -Infinity v src2 v -Zero v Rezd v src2 v +Infinity v src2
vxsnan_flag 1
v Q(src2)
+Zero v -Infinity v src2 v Rezd v +Zero v src2 v +Infinity v src2
vxsnan_flag 1
src1
v Q(src2)
+NZF v -Infinity v A(src1,src2) v src1 v src1 v A(src1,src2) v +Infinity v src2
vxsnan_flag 1
v dQNaN v Q(src2)
+Infinity v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v src2
vxisi_flag 1 vxsnan_flag 1
v src1
QNaN v src1 v src1 v src1 v src1 v src1 v src1 v src1
vxsnan_flag 1
v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1)
SNaN
vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1
Explanation:
src1 The double-precision floating-point value in doubleword element 0 of VSR[XA].
src2 The double-precision floating-point value in doubleword element 0 of VSR[XB].
dQNaN Default quiet NaN (0x7FF8_0000_0000_0000).
NZF Nonzero finite number.
Rezd Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs).
A(x,y) Return the normalized sum of floating-point value x and floating-point value y, having unbounded range and precision.
Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd).
Q(x) Return a QNaN with the payload of x.
v The intermediate result having unbounded signficand precision and unbounded exponent range.
if MSR.VSX=0 then VSX_Unavailable() If the intermediate result is Tiny (i.e., the unbiased
exponent is less than -16382) and UE=0, the
reset_xflags() significand is shifted right N bits, where N is the
difference between -16382 and the unbiased
src1 bfp_CONVERT_FROM_BFP128(VSR[VRA+32]) exponent of the intermediate result. The exponent
src2 bfp_CONVERT_FROM_BFP128(VSR[VRB+32])
of the intermediate result is set to the value
v bfp_ADD(src1, src2)
rnd bfp_ROUND_TO_BFP128(RO, FPSCR.RN, v)
-16382.
result bfp_CONVERT_TO_BFP128(rnd)
If RO=1, let the rounding mode be Round to Odd.
if(vxsnan_flag) then SetFX(FPSCR.VXSNAN) Otherwise, let the rounding mode be specified by
if(vxisi_flag) then SetFX(FPSCR.VXISI) RN. The intermediate result is rounded to
if(ox_flag) then SetFX(FPSCR.OX) quad-precision using the specified rounding mode.
if(ux_flag) then SetFX(FPSCR.UX)
if(xx_flag) then SetFX(FPSCR.XX) See Table 50, “Scalar Floating-Point Intermediate
Result Handling,” on page 515.
vx_flag vxsnan_flag | vxisi_flag
ex_flag FPSCR.VE & vx_flag The result is placed into VSR[VRT+32] in quad-precision
format.
if ex_flag=0 then do
VSR[VRT+32] result
FPRF is set to the class and sign of the result. FR is set
FPSCR.FPRF fprf_CLASS_BFP128(result)
end
to indicate if the rounded result was incremented. FI is
FPSCR.FR (vx_flag=0) & inc_flag set to indicate the result is inexact.
FPSCR.FI (vx_flag=0) & xx_flag
If a trap-disabled Invalid Operation exception occurs,
Let src1 be the floating-point value in VSR[VRA+32] FR and FI are set to 0.
represented in quad-precision format.
If a trap-enabled Invalid Operation exception occurs,
Let src2 be the floating-point value in VSR[VRB+32] VSR[VRT+32] and FPRF are not modified, and FR and FI
represented in quad-precision format. are set to 0.
If either src1 or src2 is a Signalling NaN, an Invalid See Table 51, “VSX Scalar Floating-Point Final
Operation exception occurs and VXSNAN is set to 1. Result,” on page 516.
If src1 and src2 are Infinity values having opposite Special Registers Altered:
signs, an Invalid Operation exception occurs and VXISI FPRF FR FI
is set to 1. FX VXSNAN VXISI OX UX XX
If src1 is a Signalling NaN, the result is the Quiet NaN VSR Data Layout for xsaddqp[o]
corresponding to src1. VSR[VRA+32]
src1
Otherwise, if src1 is a Quiet NaN, the result is src1.
VSR[VRB+32]
Otherwise, if src2 is a Signalling NaN, the result is the
src2
Quiet NaN corresponding to src2.
VSR[VRT+32]
Otherwise, if src2 is a Quiet NaN, the result is src2.
tgt
Otherwise, if src1 and src2 are Infinity values having
opposite signs, the result is the default Quiet NaN[1].
src2
-Infinity -NZF -Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN
-Infinity v -Infinity
vxisi_flag 1
-NZF v add(src1,src2) v src1 v add(src1,src2)
VSX Scalar Compare Exponents Let src1 be the floating-point value in VSR[VRA+32]
Quad-Precision X-form represented in quad-precision format.
VSX Scalar Compare Greater Than or Equal Let XT be the value 32×TX + T.
Double-Precision XX3-form Let XA be the value 32×AX + A.
Let XB be the value 32×BX + B.
xscmpgedp XT,XA,XB
Let src1 be the double-precision floating-point value in
60 T A B 19 AXBXTX
0 6 11 16 21 29 30 31 doubleword 0 of VSR[XA].
FL CompareLTDP(src1,src2)
FG CompareGTDP(src1,src2)
FE CompareEQDP(src1,src2)
FU IsNAN(src1) | IsNAN(src2)
CR[BF] FL || FG || FE || FU
if(vxsnan_flag) then SetFX(VXSNAN)
if(vxvc_flag) then SetFX(VXVC)
src2
–Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
cc0b0001
cc0b0001
–Infinity cc0b0010 cc0b1000 cc0b1000 cc0b1000 cc0b1000 cc0b1000 vxsnan_flag1
vxvc_flag1
vxvc_flag(VE=0)
cc0b0001
cc0b0001
–NZF cc0b0100 ccC(src1,src2) cc0b1000 cc0b1000 cc0b1000 cc0b1000 vxsnan_flag1
vxvc_flag1
vxvc_flag(VE=0)
cc0b0001
cc0b0001
–Zero cc0b0100 cc0b0100 cc0b0010 cc0b0010 cc0b1000 cc0b1000 vxsnan_flag1
vxvc_flag1
vxvc_flag(VE=0)
cc0b0001
cc0b0001
+Zero cc0b0100 cc0b0100 cc0b0010 cc0b0010 cc0b1000 cc0b1000 vxsnan_flag1
vxvc_flag1
vxvc_flag(VE=0)
src1
cc0b0001
cc0b0001
+NZF cc0b0100 cc0b0100 cc0b0100 cc0b0100 ccC(src1,src2) cc0b1000 vxsnan_flag1
vxvc_flag1
vxvc_flag(VE=0)
cc0b0001
cc0b0001
+Infinity cc0b0100 cc0b0100 cc0b0100 cc0b0100 cc0b0100 cc0b0010 vxsnan_flag1
vxvc_flag1
vxvc_flag(VE=0)
cc0b0001
cc0b0001 cc0b0001 cc0b0001 cc0b0001 cc0b0001 cc0b0001 cc0b0001
QNaN vxsnan_flag1
vxvc_flag1 vxvc_flag1 vxvc_flag1 vxvc_flag1 vxvc_flag1 vxvc_flag1 vxvc_flag1
vxvc_flag(VE=0)
cc0b0001 cc0b0001 cc0b0001 cc0b0001 cc0b0001 cc0b0001 cc0b0001 cc0b0001
SNaN vxsnan_flag1 vxsnan_flag1 vxsnan_flag1 vxsnan_flag1 vxsnan_flag1 vxsnan_flag1 vxsnan_flag1 vxsnan_flag1
vxvc_flag(VE=0) vxvc_flag(VE=0) vxvc_flag(VE=0) vxvc_flag(VE=0) vxvc_flag(VE=0) vxvc_flag(VE=0) vxvc_flag(VE=0) vxvc_flag(VE=0)
Explanation:
src1 The double-precision floating-point value in doubleword element 0 of VSR[XA].
src2 The double-precision floating-point value in doubleword element 0 of VSR[XB].
NZF Nonzero finite number.
C(x,y) The floating-point value x is compared to the floating-point value y, returning one of three 4-bit results.
0b1000 when x is greater than y
0b0100 when x is less than y
0b0010 when x is equal to y
cc The 4-bit result compare code.
– 0 0 FPCCcc, CR[BF]cc
0 0 1 FPCCcc, CR[BF]cc, fx(VXVC)
0 1 0 FPCCcc, CR[BF]cc, fx(VXSNAN)
0 1 1 FPCCcc, CR[BF]cc, fx(VXSNAN), fx(VXVC)
1 0 1 FPCCcc, CR[BF]cc, fx(VXVC), error()
1 1 – FPCCcc, CR[BF]cc, fx(VXSNAN), error()
Explanation:
– The results do not depend on this condition.
cc The 4-bit result as defined in Table 54.
fx(x) FX is set to 1 if x=0. x is set to 1.
error() The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set
to any mode other than the ignore-exception mode.
FX Floating-Point Summary Exception status flag, FPSCRFX.
VXSNAN Floating-Point Invalid Operation Exception (SNaN) status flag, FPSCRVXSNAN. See Section 7.4.1.
VXC Floating-Point Invalid Operation Exception (Invalid Compare) status flag, FPSCRVXVC. See Section 7.4.1.
VSX Scalar Compare Ordered Quad-Precision Let src1 be the floating-point value in VSR[VRA+32]
X-form represented in quad-precision format.
if( src1.class.SNaN | src2.class.SNaN ) then do Bit 1 of CR field BF and FG are set to indicate if src1 is
vxsnan_flag 0b1 greater than src2.
if(FPSCR.VE=0) then vxvc_flag 0b1
end
Bit 2 of CR field BF and FE are set to indicate if src1 is
else if( src1.class.QNaN | src2.class.QNaN ) then vxvc_flag 0b1
equal to src2.
cc.bit[0] bfp_COMPARE_LT(src1,src2)
cc.bit[1] bfp_COMPARE_GT(src1,src2) Bit 3 of CR field BF and FU are set to indicate unordered
cc.bit[2] bfp_COMPARE_EQ(src1,src2) (i.e., src1 or src2 is a NaN).
cc.bit[3] src1.class.SNaN | src1.class.QNaN |
cc.bit[3] src2.class.SNaN | src2.class.QNaN If either of the operands is a NaN, either quiet or
signaling, CR field BF and the FPCC are set to reflect
if(vxsnan_flag) then SetFX(FPSCR.VXSNAN) unordered. If either of the operands is a Signaling
if(vxvc_flag) then SetFX(FPSCR.VXVC) NaN, an Invalid Operation exception occurs and
VXSNAN is set, and if Invalid Operation exceptions are
FPSCR.FPCC cc disabled (VE=0), VXVC is set. If neither operand is a
CR.field[BF] cc
Signaling NaN but at least one operand is a Quiet
NaN, an Invalid Operation exception occurs and VXVC
is set.
src1
VSR[VRB+32]
src2
src1 VSR[XA]{0:63}
src2 VSR[XB]{0:63}
FL CompareLTDP(src1,src2)
FG CompareGTDP(src1,src2)
FE CompareEQDP(src1,src2)
FU IsNAN(src1) | IsNAN(src2)
CR[BF] FL || FG || FE || FU
if(vxsnan_flag) then SetFX(VXSNAN)
Programming Note
This instruction can be used to operate on
single-precision source operands.
src2
–Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
cc = 0b0001
–Infinity cc = 0b0010 cc = 0b1000 cc = 0b1000 cc = 0b1000 cc = 0b1000 cc = 0b1000 cc = 0b0001
vxsnan_flag = 1
cc = 0b0001
–NZF cc = 0b0100 cc = C(src1,src2) cc = 0b1000 cc = 0b1000 cc = 0b1000 cc = 0b1000 cc = 0b0001
vxsnan_flag = 1
cc = 0b0001
–Zero cc = 0b0100 cc = 0b0100 cc = 0b0010 cc = 0b0010 cc = 0b1000 cc = 0b1000 cc = 0b0001
vxsnan_flag = 1
cc = 0b0001
+Zero cc = 0b0100 cc = 0b0100 cc = 0b0010 cc = 0b0010 cc = 0b1000 cc = 0b1000 cc = 0b0001
vxsnan_flag = 1
src1
cc = 0b0001
+NZF cc = 0b0100 cc = 0b0100 cc = 0b0100 cc = 0b0100 cc = C(src1,src2) cc = 0b1000 cc = 0b0001
vxsnan_flag = 1
cc = 0b0001
+Infinity cc = 0b0100 cc = 0b0100 cc = 0b0100 cc = 0b0100 cc = 0b0100 cc = 0b0010 cc = 0b0001
vxsnan_flag = 1
cc = 0b0001
QNaN cc = 0b0001 cc = 0b0001 cc = 0b0001 cc = 0b0001 cc = 0b0001 cc = 0b0001 cc = 0b0001
vxsnan_flag = 1
cc = 0b0001 cc = 0b0001 cc = 0b0001 cc = 0b0001 cc = 0b0001 cc = 0b0001 cc = 0b0001 cc = 0b0001
SNaN
vxsnan_flag = 1 vxsnan_flag = 1 vxsnan_flag = 1 vxsnan_flag = 1 vxsnan_flag = 1 vxsnan_flag = 1 vxsnan_flag = 1 vxsnan_flag = 1
Explanation:
src1 The double-precision floating-point value in doubleword element 0 of VSR[XA].
src2 The double-precision floating-point value in doubleword element 0 of VSR[XB].
NZF Nonzero finite number.
C(x,y) The floating-point value x is compared to the floating-point value y, returning one of three 4-bit results.
0b1000 when x is greater than y
0b0100 when x is less than y
0b0010 when x is equal to y
cc The 4-bit result compare code.
– 0 FPCCcc, CR[BF]cc
0 1 FPCCcc, CR[BF]cc, fx(VXSNAN)
1 1 FPCCcc, CR[BF]cc, fx(VXSNAN), error()
Explanation:
– The results do not depend on this condition.
cc The 4-bit result as defined in Table 56.
fx(x) FX is set to 1 if x=0. x is set to 1.
error() The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set
to any mode other than the ignore-exception mode.
FX Floating-Point Summary Exception status flag, FPSCRFX.
VXSNAN Floating-Point Invalid Operation Exception (SNaN) status flag, FPSCRVXSNAN. See Section 7.4.1.
VSX Scalar Compare Unordered Let src1 be the floating-point value in VSR[VRA+32]
Quad-Precision X-form represented in quad-precision format.
vxsnan_flag src1.class.SNaN | src2.class.SNaN Bit 1 of CR field BF and FG are set to indicate if src1 is
greater than src2.
cc.bit[0] bfp_COMPARE_LT(src1,src2)
cc.bit[1] bfp_COMPARE_GT(src1,src2)
Bit 2 of CR field BF and FE are set to indicate if src1 is
cc.bit[2] bfp_COMPARE_EQ(src1,src2)
cc.bit[3] src1.class.SNaN | src1.class.QNaN |
equal to src2.
cc.bit[3] src2.class.SNaN | src2.class.QNaN
Bit 3 of CR field BF and FU are set to indicate unordered
if(vxsnan_flag) then SetFX(FPSCR.VXSNAN) (i.e., src1 or src2 is a NaN).
src1
VSR[VRB+32]
src2
VSX Scalar Copy Sign Double-Precision VSX Scalar Copy Sign Quad-Precision X-form
XX3-form
xscpsgnqp VRT,VRA,VRB
xscpsgndp XT,XA,XB
63 VRT VRA VRB 100 /
0 6 11 16 21 31
60 T A B 176 AX BX TX
0 6 11 16 21 29 30 31 if MSR.VSX=0 then VSX_Unavailable()
DP unused tgt
src2 = VSR[XB]
DP unused
tgt = VSR[XT]
DP undefined
0 64 127
Programming Note
This instruction can be used to operate on
single-precision source operands.
VSX Scalar Convert with round Otherwise, if src is a QNaN, the result is the
Double-Precision to Half-Precision format half-precision representation of that QNaN.
XX2-form
Otherwise, if src is an Infinity, the result is the
xscvdphp XT,XB half-precision representation of Infinity with the same
60 T 17 B 347
sign as src.
BX TX
0 6 11 16 21 30 31
Otherwise, if src is a Zero, the result is the
if MSR.VSX=0 then VSX_Unavailable() half-precision representation of Zero with the same
sign as src.
reset_flags()
Otherwise, the result is the half-precision
src bfp_CONVERT_FROM_BFP64(VSR[BX×32+B].dword[0]) representation of src rounded to half-precision using
rnd bfp_ROUND_TO_BFP16(FPSCR.RN,src)
the rounding mode specified by RN.
result bfp_CONVERT_TO_BFP16(rnd)
src bfp_CONVERT_FROM_BFP64(VSR[VRB+32].dword[0])
if src.class.SNaN then
result bfp_CONVERT_TO_BFP128(bfp_QUIET(src))
else
result bfp_CONVERT_TO_BFP128(src)
vxsnan_flag src.class.SNaN
if(vxsnan_flag) then SetFX(FPSCR.VXSNAN)
vex_flag FPSCR.VE & vxsnan_flag
if vex_flag=0 then do
VSR[VRT+32] result
FPSCR.FPRF fprf_CLASS_BFP128(result)
end
FPSCR.FR 0
FPSCR.FI 0
FR is set to 0. FI is set to 0.
src.dword[0] unused
VSR[VRT+32]
tgt
VSX Scalar Convert with round If a trap-enabled invalid operation exception occurs,
Double-Precision to Single-Precision format VSR[XT] and FPRF are not modified, and FR and FI are
XX2-form set to 0.
VSX Scalar Convert Scalar Single-Precision to VSX Scalar Convert with round to zero
Vector Single-Precision format Non-signalling Double-Precision to Signed Doubleword
XX2-form format XX2-form
xscvdpspn XT,XB xscvdpsxds XT,XB
reset_xflags() XT TX || T
src VSR[32×BX+B].dword[0] XB BX || B
result ConvertDPtoSP_NS(src) reset_xflags()
VSR[32×TX+T].word[0] result result{0:63} ConvertDPtoSD(VSR[XB]{0:63})
VSR[32×TX+T].word[1] 0xUUUU_UUUU if(vxsnan_flag) then SetFX(VXSNAN)
VSR[32×TX+T].word[2] 0xUUUU_UUUU if(vxcvi_flag) then SetFX(VXCVI)
VSR[32×TX+T].word[3] 0xUUUU_UUUU if(xx_flag) then SetFX(XX)
vex_flag VE & (vxsnan_flag | vxcvi_flag)
Let XT be the value 32×TX + T.
Let XB be the value 32×BX + B. if( ~vex_flag ) then do
VSR[XT] result || 0xUUUU_UUUU_UUUU_UUUU
Let src be the single-precision floating-point value in FPRF 0bUUUUU
doubleword element 0 of VSR[XB] represented in FR inc_flag
FI xx_flag
double-precision format.
end
else do
src is placed into word element 0 of VSR[XT] in FR 0b0
single-precision format. FI 0b0
end
The contents of word elements 1, 2, and 3 of VSR[XT]
are undefined. Let XT be the value 32×TX + T.
Let XB be the value 32×BX + B.
Special Registers Altered
None Let src be the double-precision floating-point value in
doubleword element 0 of VSR[XB].
VSR Data Layout for xscvdpspn
src = VSR[XB] If src is a NaN, the result is the value
0x8000_0000_0000_0000 and VXCVI is set to 1. If src is
SP unused
an SNaN, VXSNAN is also set to 1.
tgt = VSR[XT]
Otherwise, src is rounded to a floating-point integer
SP undefined undefined undefined using the rounding mode Round Toward Zero.
0 32 64 96 127
If the rounded value is greater than 263-1, the result is
Programming Note 0x7FFF_FFFF_FFFF_FFFF and VXCVI is set to 1.
xscvdpsp should be used to convert a scalar
double-precision value to vector single-precision Otherwise, if the rounded value is less than -263, the
format. result is 0x8000_0000_0000_0000 and VXCVI is set to 1.
xscvdpspn should be used to convert a scalar Otherwise, the result is the rounded value converted to
single-precision value to vector single-precision 64-bit signed-integer format, and if the result is inexact
format. (i.e., not equal to src), XX is set to 1.
Otherwise,
– The result is placed into doubleword element 0 of
VSR[XT]. The contents of doubleword element 1 of
VSR[XT] are undefined.
– FPRF is set to an undefined value.
– FR is set to indicate if the result was incremented
when rounded.
– FI is set to indicate the result is inexact.
Programming Note
This instruction can be used to operate on a
single-precision source operand.
Programming Note
xscvdpsxds rounds using Round towards Zero
rounding mode. For other rounding modes, software
must use a Round to Double-Precision Integer
instruction that corresponds to the desired rounding
mode, including xsrdpic which uses the rounding
mode specified by RN.
XT TX || T
– FR is set to indicate if the result was incremented
XB BX || B when rounded.
inc_flag 0b0
reset_xflags() – FI is set to indicate the result is inexact.
result{0:31} ConvertDPtoSW(VSR[XB]{0:63})
if(vxsnan_flag) then SetFX(VXSNAN) See Table 59.
if(vxcvi_flag) then SetFX(VXCVI)
if(xx_flag) then SetFX(XX) Special Registers Altered
vex_flag VE & (vxsnan_flag | vxcvi_flag) FPRF=0bUUUUU FR FI FX XX VXSNAN VXCVI
if( ~vex_flag ) then do
VSR[XT] 0xUUUU_UUUU || result || 0xUUUU_UUUU_UUUU_UUUU VSR Data Layout for xscvdpsxws
FPRF 0bUUUUU src = VSR[XB]
FR inc_flag
FI xx_flag DP unused
end
else do tgt = VSR[XT]
FR 0b0 undefined SW undefined
FI 0b0
0 32 64 127
end
XT TX || T
– FR is set to indicate if the result was incremented
XB BX || B when rounded.
inc_flag 0b0
reset_xflags() – FI is set to indicate the result is inexact.
result{0:63} ConvertDPtoUD(VSR[XB]{0:63})
if(vxsnan_flag) then SetFX(VXSNAN) See Table 60.
if(vxcvi_flag) then SetFX(VXCVI)
if(xx_flag) then SetFX(XX) Special Registers Altered
vex_flag VE & (vxsnan_flag | vxcvi_flag) FPRF=0bUUUUU FR FI FX XX VXSNAN VXCVI
if( ~vex_flag ) then do
VSR[XT] result || 0xUUUU_UUUU_UUUU_UUUU VSR Data Layout for xscvdpuxds
FPRF 0bUUUUU src = VSR[XB]
FR inc_flag
FI xx_flag DP unused
end
else do tgt = VSR[XT]
FR 0b0 UD undefined
FI 0b0
0 64 127
end
XT TX || T
– FR is set to indicate if the result was incremented
XB BX || B when rounded.
inc_flag 0b0
reset_xflags() – FI is set to indicate the result is inexact.
result{0:31} ConvertDPtoUW(VSR[XB]{0:63})
if(vxsnan_flag) then SetFX(VXSNAN) See Table 61.
if(vxcvi_flag) then SetFX(VXCVI)
if(xx_flag) then SetFX(XX) Special Registers Altered
vex_flag VE & (vxsnan_flag | vxcvi_flag) FPRF=0bUUUUU FR FI FX XX VXSNAN VXCVI
if( ~vex_flag ) then do
VSR[XT] 0xUUUU_UUUU || result || 0xUUUU_UUUU_UUUU_UUUU VSR Data Layout for xscvdpuxws
FPRF 0bUUUUU src = VSR[XB]
FR inc_flag
FI xx_flag DP unused
end
else do tgt = VSR[XT]
FR 0b0 undefined UW undefined
FI 0b0
0 32 64 127
end
VSX Scalar Convert Half-Precision to Otherwise, if src is a QNaN, the result is the
Double-Precision format XX2-form double-precision representation of that QNaN.
vex_flag FPSCR.VE & vxsnan_flag FPRF is set to the class and sign of the result as
represented in double-precision format. FR is set to
if vex_flag=0 then do indicate if the rounded result was incremented. FI is
VSR[VRT+32].dword[0] result set to indicate the result is inexact.
VSR[VRT+32].dword[1] 0x0000_0000_0000_0000
FPSCR.FPRF fprf_CLASS_BFP64(result)
If a trap-disabled Invalid Operation exception occurs,
end
FPSCR.FR (vxsnan_flag=0) & inc_flag
FR and FI are set to 0.
FPSCR.FI (vxsnan_flag=0) & xx_flag
If a trap-enabled Invalid Operation exception occurs,
Let src be the quad-precision floating-point value in VSR[VRT+32] and FPRF are not modified, and FR and FI
VSR[VRB+32]. are set to 0.
If src is a Signalling NaN, an Invalid Operation See Table 51, “VSX Scalar Floating-Point Final
exception occurs and VXSNAN is set to 1. Result,” on page 516.
If src is a Signalling NaN, the result is the Quiet NaN Special Registers Altered:
corresponding to the Signalling NaN, with the FPRF FR FI
significand truncated to the rounding precision. FX VXSNAN OX UX XX
Otherwise, if src is a Quiet NaN, then the result is src VSR Data Layout for xscvqpdp[o]
with the significand truncated to double-precision. VSR[VRB+32]
tgt.dword[0] 0x0000_0000_0000_0000
VSX Scalar Convert with round to zero If src is a Quiet NaN or an Infinity, an Invalid Operation
Quad-Precision to Signed Doubleword format exception occurs and VXCVI is set to 1.
X-form
If src is a NaN, the result is 0x8000_0000_0000_0000.
xscvqpsdz VRT,VRB
Otherwise, if src is a Zero, the result is
63 VRT 25 VRB 836 /
0 6 11 16 21 31 0x0000_0000_0000_0000.
tgt.dword[0] 0x0000_0000_0000_0000
If src is a Signalling NaN, an Invalid Operation
exception occurs and VXSNAN and VXCVI are set to 1.
bfp_ROUND_TO_INTEGER(0b001,src) g src
FPSCR.VE
FPSCR.XE
VSX Scalar Convert with round to zero If src is a Quiet NaN or an Infinity, an Invalid Operation
Quad-Precision to Signed Word format X-form exception occurs and VXCVI is set to 1.
bfp_ROUND_TO_INTEGER(0b001,src) g src
FPSCR.VE
FPSCR.XE
VSX Scalar Convert with round to zero If src is a Quiet NaN or an Infinity, an Invalid Operation
Quad-Precision to Unsigned Doubleword exception occurs and VXCVI is set to 1.
format X-form
If src is a NaN, the result is 0x0000_0000_0000_0000.
xscvqpudz VRT,VRB
Otherwise, if src is a Zero, the result is
63 VRT 17 VRB 836 /
0 6 11 16 21 31 0x0000_0000_0000_0000.
bfp_ROUND_TO_INTEGER(0b001,src) g src
FPSCR.VE
FPSCR.XE
VSX Scalar Convert with round to zero If src is a Quiet NaN or an Infinity, an Invalid Operation
Quad-Precision to Unsigned Word format exception occurs and VXCVI is set to 1.
X-form
If src is a NaN, the result is 0x0000_0000_0000_0000.
xscvqpuwz VRT,VRB
Otherwise, if src is a Zero, the result is
63 VRT 1 VRB 836 /
0 6 11 16 21 31 0x0000_0000_0000_0000.
tgt.dword[0] 0x0000_0000_0000_0000
Let src be the quad-precision floating-point value in
VSR[VRB+32].
bfp_ROUND_TO_INTEGER(0b001,src) g src
FPSCR.VE
FPSCR.XE
src bfp_CONVERT_FROM_SI64(VSR[VRB+32].dword[0])
result bfp_CONVERT_TO_BFP128(src)
VSR[VRT+32] result
FPSCR.FPRF fprf_CLASS_BFP128(result)
FPSCR.FR 0
FPSCR.FI 0
src.dword[0] unused
VSR[VRT+32]
tgt
reset_xflags()
src VSR[32×BX+B].word[0]
result ConvertVectorSPtoScalarSP(src)
if(vxsnan_flag) then SetFX(FPSCR.VXSNAN)
vex_flag FPSCR.VE & vxsnan_flag
FPSCR.FR 0b0
FPSCR.FI 0b0
if( ~vex_flag ) then do
VSR[32×TX+T].dword[0] result
VSR[32×TX+T].dword[1] 0xUUUU_UUUU_UUUU_UUUU
FPSCR.FPRF ClassDP(result)
end
reset_xflags()
src VSR[32×BX+B].word[0]
result ConvertSPtoDP_NS(src)
VSR[32×TX+T].dword[0] result
VSR[32×TX+T].dword[1] 0xUUUU_UUUU_UUUU_UUUU
Programming Note
xscvspdp should be used to convert a vector
single-precision floating-point value to scalar
double-precision format.
VSX Scalar Convert with round Signed VSX Scalar Convert with round Signed
Doubleword to Double-Precision format Doubleword to Single-Precision format
XX2-form XX2-form
xscvsxddp XT,XB xscvsxdsp XT,XB
reset_xflags() reset_xflags()
Let src be the signed integer value in doubleword Let src be the two’s-complement integer value in
element 0 of VSR[XB]. doubleword element 0 of VSR[XB].
The result is placed into doubleword element 0 of The result is placed into doubleword element 0 of
VSR[XT] in double-precision format. VSR[XT] in double-precision format.
The contents of doubleword element 1 of VSR[XT] are The contents of doubleword element 1 of VSR[XT] are
undefined. undefined.
FPRF is set to the class and sign of the result. FR is FPRF is set to the class and sign of the result as
set to indicate if the result was incremented when represented in single-precision format. FR is set to
rounded. FI is set to indicate the result is inexact. indicate if the result was incremented when rounded.
FI is set to indicate the result is inexact.
Special Registers Altered
FPRF FR FI FX XX Special Registers Altered
FPRF FR FI FX XX
VSR Data Layout for xscvsxddp
VSR Data Layout for xscvsxdsp
src = VSR[XB] src = VSR[XB]
SD unused SD unused
tgt = VSR[XT] tgt = VSR[XT]
DP undefined
DP undefined
0 64 127
0 64 127
VSX Scalar Convert Signed Doubleword to VSX Scalar Convert Unsigned Doubleword to
Quad-Precision format X-form Quad-Precision format X-form
xscvsdqp VRT,VRB xscvudqp VRT,VRB
Let src be the signed integer value in doubleword Let src be the unsigned integer value in doubleword
element 0 of VSR[VRB+32]. element 0 of VSR[VRB+32].
src is placed into VSR[VRT+32] in quad-precision src is placed into VSR[VRT+32] in quad-precision
floating-point format. floating-point format.
FPRF is set to the class and sign of the result. FR is set FPRF is set to the class and sign of the result. FR is set
to 0. FI is set to 0. to 0. FI is set to 0.
VSR Data Layout for xscvsdqp VSR Data Layout for xscvudqp
VSR[VRB+32] VSR[VRB+32]
VSR[VRT+32] VSR[VRT+32]
tgt tgt
VSX Scalar Convert with round Unsigned VSX Scalar Convert with round Unsigned
Doubleword to Double-Precision format Doubleword to Single-Precision XX2-form
XX2-form
xscvuxdsp XT,XB
xscvuxddp XT,XB
60 T /// B 296 BXTX
60 T /// B 360 BX TX 0 6 11 16 21 30 31
0 6 11 16 21 30 31
reset_xflags()
reset_xflags()
src ConvertUDtoDP(VSR[32×BX+B].dword[0])
src ConvertUDtoFP(VSR[32×BX+B].dword[0]) result RoundToSP(RN,src)
result RoundToDP(RN,src) VSR[32×TX+T].dword[0] ConvertSPtoSP64(result)
VSR[32×TX+T].dword[0] result VSR[32×TX+T].dword[1] 0xUUUU_UUUU_UUUU_UUUU
VSR[32×TX+T].dword[1] 0xUUUU_UUUU_UUUU_UUUU
if(xx_flag) then SetFX(XX)
if(xx_flag) then SetFX(XX)
FPRF ClassSP(result)
FPRF ClassDP(result) FR inc_flag
FR inc_flag FI xx_flag
FI xx_flag
Let XT be the value 32×TX + T.
Let XT be the value 32×TX + T. Let XB be the value 32×BX + B.
Let XB be the value 32×BX + B.
Let src be the unsigned-integer value in doubleword
Let src be the unsigned integer value in doubleword element 0 of VSR[XB].
element 0 of VSR[XB].
src is converted to floating-point format, and rounded
src is converted to an unbounded-precision to single-precision using the rounding mode specified
floating-point value and rounded to double-precision by RN.
using the rounding mode specified by RN.
The result is placed into doubleword element 0 of
The result is placed into doubleword element 0 of VSR[XT] in double-precision format.
VSR[XT] in double-precision format.
The contents of doubleword element 1 of VSR[XT] are
The contents of doubleword element 1 of VSR[XT] are undefined.
undefined.
FPRF is set to the class and sign of the result as
FPRF is set to the class and sign of the result. FR is represented in single-precision format. FR is set to
set to indicate if the result was incremented when indicate if the result was incremented when rounded.
rounded. FI is set to indicate the result is inexact. FI is set to indicate the result is inexact.
VSR Data Layout for xscvuxddp VSR Data Layout for xscvuxdsp
src = VSR[XB]
src = VSR[XB]
UD unused
UD unused
tgt = VSR[XT]
tgt = VSR[XT] DP undefined
DP undefined 0 64 127
0 64 127
VSX Scalar Divide Double-Precision XX3-form The result is placed into doubleword element 0 of
VSR[XT] in double-precision format.
xsdivdp XT,XA,XB
The contents of doubleword element 1 of VSR[XT] are
60 T A B 56 AX BX TX undefined.
0 6 11 16 21 29 30 31
FPRF is set to the class and sign of the result. FR is
XT TX || T set to indicate if the result was incremented when
XA AX || A rounded. FI is set to indicate the result is inexact.
XB BX || B
reset_xflags()
If a trap-enabled invalid operation exception or a
src1 VSR[XA]{0:63}
src2 VSR[XB]{0:63}
trap-enabled zero divide exception occurs, VSR[XT]
v{0:inf} DivideFP(src1,src2) and FPRF are not modified, and FR and FI are set to
result{0:63} RoundToDP(RN,v) 0.
if(vxsnan_flag) then SetFX(VXSNAN)
if(vxidi_flag) then SetFX(VXIDI) See Table 51, “VSX Scalar Floating-Point Final
if(vxzdz_flag) then SetFX(VXZDZ) Result,” on page 516.
if(ox_flag) then SetFX(OX)
if(ux_flag) then SetFX(UX) Special Registers Altered
if(xx_flag) then SetFX(XX) FPRF FR FI FX OX UX ZX XX
if(zx_flag) then SetFX(ZX) VXSNAN VXIDI VXZDZ
vex_flag VE & (vxsnan_flag | vxidi_flag | vxzdz_flag)
zex_flag ZE & zx_flag
VSR Data Layout for xsdivdp
if( ~vex_flag & ~zex_flag ) then do src1 = VSR[XA]
VSR[XT] = result || 0xUUUU_UUUU_UUUU_UUUU
FPRF = ClassDP(result) DP unused
FR = inc_flag
FI = xx_flag src2 = VSR[XB]
end
DP unused
else do
FR = 0b0 tgt = VSR[XT]
FI = 0b0
end DP undefined
0 64 127
Let XT be the value 32×TX + T.
Let XA be the value 32×AX + A.
Let XB be the value 32×BX + B.
src2
-Infinity -NZF -Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN v dQNaN v Q(src2)
-Infinity v +Infinity v +Infinity v –Infinity v –Infinity v src2
vxidi_flag 1 vxidi_flag 1 vxsnan_flag 1
v +Infinity v –Infinity v Q(src2)
-NZF v +Zero v D(src1,src2) v D(src1,src2) v –Zero v src2
zx_flag 1 zx_flag 1 vxsnan_flag 1
v dQNaN v dQNaN v Q(src2)
-Zero v +Zero v +Zero v –Zero v –Zero v src2
vxzdz_flag 1 vxzdz_flag 1 vxsnan_flag 1
v dQNaN v dQNaN v Q(src2)
+Zero v –Zero v –Zero v +Zero v +Zero v src2
vxzdz_flag 1 vxzdz_flag 1 vxsnan_flag 1
src1
Explanation:
src1 The double-precision floating-point value in doubleword element 0 of VSR[XA].
src2 The double-precision floating-point value in doubleword element 0 of VSR[XB].
dQNaN Default quiet NaN (0x7FF8_0000_0000_0000).
NZF Nonzero finite number.
D(x,y) Return the normalized quotient of floating-point value x divided by floating-point value y, having unbounded range and precision.
Q(x) Return a QNaN with the payload of x.
v The intermediate result having unbounded signficand precision and unbounded exponent range.
VSX Scalar Divide Quad-Precision [using Otherwise, if src2 is a Quiet NaN, the result is src2.
round to Odd] X-form
Otherwise, if src1 and src2 are Infinity values, or if
xsdivqp VRT,VRA,VRB (RO=0) src1 and src2 are Zero values, the result is the default
xsdivqpo VRT,VRA,VRB (RO=1) Quiet NaN[1].
63 VRT VRA VRB 548 RO
0 6 11 16 21 31
Otherwise, if src1 is a non-zero value and src2 is a
Zero value, the result is an Infinity.
if MSR.VSX=0 then VSX_Unavailable()
Otherwise, do the following.
reset_xflags() The normalized quotient of src1 divided by src2 is
produced with unbounded significand precision
src1 bfp_CONVERT_FROM_BFP128(VSR[VRA+32]) and exponent range.
src2 bfp_CONVERT_FROM_BFP128(VSR[VRB+32])
v bfp_DIVIDE(src1, src2) See Table 67, “Actions for xsdivqp[o],” on
rnd bfp_ROUND_TO_BFP128(RO, FPSCR.RN, v)
page 565.
result bfp_CONVERT_TO_BFP128(rnd)
vx_flag vxsnan_flag | vxidi_flag | vxzdz_flag If RO=1, let the rounding mode be Round to Odd.
ex_flag (FPSCR.VE & vx_flag) | (FPSCR.ZE & zx_flag) Otherwise, let the rounding mode be specified by
RN. Unless the result is an Infinity or a Zero, the
if ex_flag=0 then do
VSR[VRT+32] result
intermediate result is rounded to quad-precision
FPSCR.FPRF fprf_CLASS_BFP128(result) using the specified rounding mode.
end
FPSCR.FR (vx_flag=0) & (zx_flag=0) & inc_flag See Table 50, “Scalar Floating-Point Intermediate
FPSCR.FI (vx_flag=0) & (zx_flag=0) & xx_flag Result Handling,” on page 515.
Let src1 be the floating-point value in VSR[VRA+32] The result is placed into VSR[VRT+32] in quad-precision
represented in quad-precision format. format.
Let src2 be the floating-point value in VSR[VRB+32] FPRF is set to the class and sign of the result. FR is set
represented in quad-precision format. to indicate if the result was incremented when
rounded. FI is set to indicate the result is inexact.
If either src1 or src2 is a Signalling NaN, an Invalid
Operation exception occurs and VXSNAN is set to 1 If a trap-disabled Invalid Operation exception occurs,
FR and FI are set to 0.
If src1 and src2 are Infinity values, an Invalid
Operation exception occurs and VXIDI is set to 1. If a trap-disabled Zero Divide exception occurs, FR and
FI are set to 0.
If src1 and src2 are Zero values, an Invalid Operation
exception occurs and VXZDZ is set to 1. If a trap-enabled Invalid Operation exception or a
trap-enabled Zero Divide exception occurs,
If src1 is a finite value and src2 is a Zero value, an VSR[VRT+32] and FPRF are not modified, and FR and FI
Zero Divide exception occurs and ZX is set to 1. are set to 0.
If src1 is a Signalling NaN, the result is the Quiet NaN See Table 51, “VSX Scalar Floating-Point Final
corresponding to src1. Result,” on page 516.
Otherwise, if src1 is a Quiet NaN, the result is src1. Special Registers Altered:
FPRF FR FI
Otherwise, if src2 is a Signalling NaN, the result is the FX VXSNAN VXIDI VXZDZ OX UX ZX XX
Quiet NaN corresponding to src2.
1. The quad-precision default Quiet NaN is the value, 0x7FFF_8000_0000_0000_0000_0000_0000.
src1
VSR[VRB+32]
src2
VSR[VRT+32]
tgt
src2
-Infinity -NZF -Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN v dQNaN
-Infinity v +Infinity v +Infinity v -Infinity v -Infinity
vxidi_flag 1 vxidi_flag 1
v +Infinity v -Infinity
-NZF v Div(src1,src2) v Div(src1,src2)
zx_flag 1 zx_flag 1
-Zero v +Zero v -Zero
v dQNaN
v src2
vxzdz_flag 1 v quiet(src2)
+Zero v -Zero v +Zero
vxsnan_flag 1
src1
v -Infinity v +Infinity
+NZF v Div(src1,src2) v Div(src1,src2)
zx_flag 1 zx_flag 1
v dQNaN v dQNaN
+Infinity v -Infinity v -Infinity v +Infinity v +Infinity
vxidi_flag 1 vxidi_flag 1
v src1
QNaN v src1
vxsnan_flag 1
v quiet(src1)
SNaN
vxsnan_flag 1
Explanation:
src1 The quad-precision floating-point value in VSR[VRA+32].
src2 The quad-precision floating-point value in VSR[VRB+32].
dQNaN Default quiet NaN (0x7FFF_8000_0000_0000_0000_0000_0000).
NZF Nonzero finite number.
Div(x,y) The floating-point value x is divided1 by floating-point value y. Return the normalized2 quotient, having unbounded range and
precision.
quiet(x) Convert x to the corresponding Quiet NaN.
v The intermediate result having unbounded significand precision and unbounded exponent range.
VSX Scalar Divide Single-Precision XX3-form The result is placed into doubleword element 0 of
VSR[XT] in double-precision format.
xsdivsp XT,XA,XB
The contents of doubleword element 1 of VSR[XT] are
60 T A B 24 AXBXTX undefined.
0 6 11 16 21 29 30 31
FPRF is set to the class and sign of the result as
reset_xflags()
represented in single-precision format. FR is set to
src1 VSR[32×AX+A].dword[0]
indicate if the result was incremented when rounded.
src2 VSR[32×BX+B].dword[0] FI is set to indicate the result is inexact.
v DivideDP(src1,src2) If a trap-enabled invalid operation exception or a
result RoundToSP(RN,v) trap-enabled zero divide exception occurs, VSR[XT]
and FPRF are not modified, and FR and FI are set to
if(vxsnan_flag) then SetFX(VXSNAN) 0.
if(vxidi_flag) then SetFX(VXIDI) See Table 51, “VSX Scalar Floating-Point Final
if(vxzdz_flag) then SetFX(VXZDZ) Result,” on page 516.
if(ox_flag) then SetFX(OX)
if(ux_flag) then SetFX(UX)
Special Registers Altered
if(xx_flag) then SetFX(XX)
if(zx_flag) then SetFX(ZX)
FPRF FR FI FX OX UX ZX XX
VXSNAN VXIDI VXZDZ
vex_flag VE & (vxsnan_flag|vxidi_flag|vxzdz_flag)
zex_flag ZE & zx_flag VSR Data Layout for xsdivsp
src1 = VSR[XA]
if( ~vex_flag & ~zex_flag ) then do
DP unused
VSR[32×TX+T].dword[0] ConvertSPtoSP64(result)
VSR[32×TX+T].dword[1] 0xUUUU_UUUU_UUUU_UUUU src2 = VSR[XB]
FPRF ClassSP(result) DP unused
FR inc_flag
FI xx_flag tgt = VSR[XT]
end DP undefined
else do 0 64 127
FR 0b0
FI 0b0
end
src2
-Infinity -NZF -Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN v dQNaN v Q(src2)
-Infinity v +Infinity v +Infinity v –Infinity v –Infinity v src2
vxidi_flag 1 vxidi_flag 1 vxsnan_flag 1
v +Infinity v –Infinity v Q(src2)
-NZF v +Zero v D(src1,src2) v D(src1,src2) v –Zero v src2
zx_flag 1 zx_flag 1 vxsnan_flag 1
v dQNaN v dQNaN v Q(src2)
-Zero v +Zero v +Zero v –Zero v –Zero v src2
vxzdz_flag 1 vxzdz_flag 1 vxsnan_flag 1
v dQNaN v dQNaN v Q(src2)
+Zero v –Zero v –Zero v +Zero v +Zero v src2
vxzdz_flag 1 vxzdz_flag 1 vxsnan_flag 1
src1
Explanation:
src1 The double-precision floating-point value in doubleword element 0 of VSR[XA].
src2 The double-precision floating-point value in doubleword element 0 of VSR[XB].
dQNaN Default quiet NaN (0x7FF8_0000_0000_0000).
NZF Nonzero finite number.
D(x,y) Return the normalized quotient of floating-point value x divided by floating-point value y, having unbounded range and precision.
Q(x) Return a QNaN with the payload of x.
v The intermediate result having unbounded signficand precision and unbounded exponent range.
Programming Note
This instruction can be used to produce a
single-precision result.
src1 GPR[RA]
src2 GPR[RB]
VSR[VRT+32].bit[0] VSR[VRA+32].bit[0]
VSR[VRT+32].bit[1:15] VSR[VRB+32].dword[0].bit[49:63]
VSR[VRT+32].bit[16:127] VSR[VRA+32].bit[16:127]
VSR[VRB+32]
src2.dword[0] unused
VSR[VRT+32]
tgt
Part 1: src3
Multiply –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
p dQNaN p dQNaN p Q(src3)
–Infinity p +Infinity p +Infinity p –Infinity p –Infinity p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p Q(src3)
–NZF p +Infinity p M(src1,src3) p +Zero p –Zero p M(src1,src3) p +Infinity p src3
vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
–Zero p +Zero p +Zero p –Zero p –Zero p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
+Zero p –Zero p –Zero p +Zero p +Zero p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
src1
p Q(src3)
+NZF p –Infinity p M(src1,src3) p –Zero p +Zero p M(src1,src3) p +Infinity p src3
vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
+Infinity p –Infinity p +Infinity p +Infinity p +Infinity p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p src1
QNaN p src1 p src1 p src1 p src1 p src1 p src1 p src1
vxsnan_flag 1
p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1)
SNaN
vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1
Part 2: src2
Add –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN v Q(src2)
–Infinity v –Infinity v –Infinity v –Infinity v –Infinity v –Infinity v src2
vxisi_flag 1 vxsnan_flag 1
v Q(src2)
–NZF v –Infinity v A(p,src2) vp vp v A(p,src2) v +Infinity v src2
vxsnan_flag 1
v Q(src2)
–Zero v –Infinity v src2 v –Zero v Rezd v src2 v +Infinity v src2
vxsnan_flag 1
v Q(src2)
+Zero v –Infinity v src2 v Rezd v +Zero v src2 v +Infinity v src2
vxsnan_flag 1
p
v Q(src2)
+NZF v –Infinity v A(p,src2) vp vp v A(p,src2) v +Infinity v src2
vxsnan_flag 1
v dQNaN v Q(src2)
+Infinity v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v src2
vxisi_flag 1 vxsnan_flag 1
QNaN & vp
vp vp vp vp vp vp vp
src1 is a NaN vxsnan_flag 1
QNaN & v Q(src2)
vp vp vp vp vp vp v src2
src1 not a NaN vxsnan_flag 1
Explanation:
src1 The double-precision floating-point value in doubleword element 0 of VSR[XA].
src2 For xsmaddadp, the double-precision floating-point value in doubleword element 0 of VSR[XT].
For xsmaddmdp, the double-precision floating-point value in doubleword element 0 of VSR[XB].
src3 For xsmaddadp, the double-precision floating-point value in doubleword element 0 of VSR[XB].
For xsmaddmdp, the double-precision floating-point value in doubleword element 0 of VSR[XT].
dQNaN Default quiet NaN (0x7FF8_0000_0000_0000).
NZF Nonzero finite number.
Rezd Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two
nonzero finite number source operands.
Q(x) Return a QNaN with the payload of x.
A(x,y) Return the normalized sum of floating-point value x and floating-point value y, having unbounded range and precision.
Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd).
M(x,y) Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision.
p The intermediate product having unbounded range and precision.
v The intermediate result having unbounded range and precision.
if(vxsnan_flag) then SetFX(VXSNAN) See part 2 of Table 70, “Actions for xsmadd(a|m)sp,”
if(vximz_flag) then SetFX(VXIMZ) on page 575.
if(vxisi_flag) then SetFX(VXISI)
if(ox_flag) then SetFX(OX) The intermediate result is rounded to single-precision
if(ux_flag) then SetFX(UX) using the rounding mode specified by RN.
if(xx_flag) then SetFX(XX)
See Table 50, “Scalar Floating-Point Intermediate
vex_flag VE & (vxsnan_flag | vximz_flag | vxisi_flag)
Result Handling,” on page 515.
if( ~vex_flag ) then do
VSR[32×TX+T].dword[0] ConvertSPtoSP64(result)
The result is placed into doubleword element 0 of
VSR[32×TX+T].dword[1] 0xUUUU_UUUU_UUUU_UUUU VSR[XT] in double-precision format.
FPRF ClassSP(result)
FR inc_flag The contents of doubleword element 1 of VSR[XT] are
FI xx_flag undefined.
end
else do FPRF is set to the class and sign of the result as
FR 0b0 represented in single-precision format. FR is set to
FI 0b0 indicate if the result was incremented when rounded.
end FI is set to indicate the result is inexact.
Part 1: src3
Multiply –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
p dQNaN p dQNaN p Q(src3)
–Infinity p +Infinity p +Infinity p –Infinity p –Infinity p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p Q(src3)
–NZF p +Infinity p M(src1,src3) p +Zero p –Zero p M(src1,src3) p +Infinity p src3
vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
–Zero p +Zero p +Zero p –Zero p –Zero p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
+Zero p –Zero p –Zero p +Zero p +Zero p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
src1
p Q(src3)
+NZF p –Infinity p M(src1,src3) p –Zero p +Zero p M(src1,src3) p +Infinity p src3
vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
+Infinity p –Infinity p +Infinity p +Infinity p +Infinity p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p src1
QNaN p src1 p src1 p src1 p src1 p src1 p src1 p src1
vxsnan_flag 1
p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1)
SNaN
vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1
Part 2: src2
Add –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN v Q(src2)
–Infinity v –Infinity v –Infinity v –Infinity v –Infinity v –Infinity v src2
vxisi_flag 1 vxsnan_flag 1
v Q(src2)
–NZF v –Infinity v A(p,src2) vp vp v A(p,src2) v +Infinity v src2
vxsnan_flag 1
v Q(src2)
–Zero v –Infinity v src2 v –Zero v Rezd v src2 v +Infinity v src2
vxsnan_flag 1
v Q(src2)
+Zero v –Infinity v src2 v Rezd v +Zero v src2 v +Infinity v src2
vxsnan_flag 1
v Q(src2)
+NZF v –Infinity v A(p,src2) vp vp v A(p,src2) v +Infinity v src2
vxsnan_flag 1
p
v dQNaN v Q(src2)
+Infinity v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v src2
vxisi_flag 1 vxsnan_flag 1
QNaN &
vp
src1 is a vp vp vp vp vp vp vp
vxsnan_flag 1
NaN
QNaN &
v Q(src2)
src1 not a v p vp vp vp vp vp v src2
vxsnan_flag 1
NaN
Explanation:
src1 The double-precision floating-point value in doubleword element 0 of VSR[XA].
src2 For xsmaddasp, the double-precision floating-point value in doubleword element 0 of VSR[XT].
For xsmaddmsp, the double-precision floating-point value in doubleword element 0 of VSR[XB].
src3 For xsmaddasp, the double-precision floating-point value in doubleword element 0 of VSR[XB].
For xsmaddmsp, the double-precision floating-point value in doubleword element 0 of VSR[XT].
dQNaN Default quiet NaN (0x7FF8_0000_0000_0000).
NZF Nonzero finite number.
Rezd Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two
nonzero finite number source operands.
Q(x) Return a QNaN with the payload of x.
A(x,y) Return the normalized sum of floating-point value x and floating-point value y, having unbounded range and precision.
Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd).
M(x,y) Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision.
p The intermediate product having unbounded range and precision.
v The intermediate result having unbounded range and precision.
VSX Scalar Multiply-Add Quad-Precision Otherwise, if src1 is a Quiet NaN, the result is src1.
[using round to Odd] X-form
Otherwise, if src2 is a Signalling NaN, the result is the
xsmaddqp VRT,VRA,VRB (RO=0) Quiet NaN corresponding to src2.
xsmaddqpo VRT,VRA,VRB (RO=1)
Otherwise, if src2 is a Quiet NaN, the result is src2.
63 VRT VRA VRB 388 RO
0 6 11 16 21 31 Otherwise, if src3 is a Signalling NaN, the result is the
Quiet NaN corresponding to src3.
if MSR.VSX=0 then VSX_Unavailable()
Otherwise, if src3 is a Quiet NaN, the result is src3.
reset_xflags()
Otherwise, if src1 is an Infinity value and src3 is a Zero
src1 bfp_CONVERT_FROM_BFP128(VSR[VRA+32])
src2 bfp_CONVERT_FROM_BFP128(VSR[VRT+32])
value, or if src1 is a Zero value and src3 is an Infinity
src3 bfp_CONVERT_FROM_BFP128(VSR[VRB+32]) value, the result is the default Quiet NaN[1].
v bfp_MULTIPLY_ADD(src1, src3, src2)
rnd bfp_ROUND_TO_BFP128(RO, FPSCR.RN, v) Otherwise, if the product of src1 and src3, and src2
result bfp_CONVERT_TO_BFP128(rnd) are Infinity values having opposite signs, the result is
the default Quiet NaN.
if(vxsnan_flag) then SetFX(FPSCR.VXSNAN)
if(vximz_flag) then SetFX(FPSCR.VXIMZ) Otherwise, do the following.
if(vxisi_flag) then SetFX(FPSCR.VXISI) src1 is multiplied by src3, producing a product
if(ox_flag) then SetFX(FPSCR.OX) having unbounded significand precision and
if(ux_flag) then SetFX(FPSCR.UX) exponent range.
if(xx_flag) then SetFX(FPSCR.XX)
See part 1 of Table 69. "Actions for
vx_flag vxsnan_flag | vximz_flag | vxisi_flag
ex_flag FPSCR.VE & vx_flag
xsmadd(a|m)dp".
src1
VSR[VRT+32]
src2
VSR[VRB+32]
src3
VSR[VRT+32]
tgt
Part 1: src3
Multiply –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
p dQNaN
–Infinity p +Infinity p –Infinity
vximz_flag 1
p p
–NZF
mul(src1,src3) mul(src1,src3)
p p
+NZF
mul(src1,src3) mul(src1,src3)
p dQNaN
+Infinity p –Infinity p +Infinity
vximz_flag 1
p src1
QNaN p src1
vxsnan_flag 1
p quiet(src1)
SNaN
vxsnan_flag 1
Part 2: src2
Add –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN
–Infinity v –Infinity
vxisi_flag 1
–NZF v add(p,src2) vp v add(p,src2)
Explanation:
src1 The quad-precision floating-point value in VSR[VRA+32].
src2 The quad-precision floating-point value in VSR[VRT+32].
src3 The quad-precision floating-point value in VSR[VRB+32].
dQNaN Default quiet NaN (0x7FFF_8000_0000_0000_0000_0000_0000).
NZF Nonzero finite number.
Rezd Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two
nonzero finite number source operands.
quiet(x) Return a QNaN with the payload of x.
add(x,y) Return the normalized sum of floating-point value x and floating-point value y, having unbounded range and precision.
Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd).
mul(x,y) Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision.
p The intermediate product having unbounded range and precision.
v The intermediate result having unbounded range and precision.
Programming Note
This instruction can be used to operate on
single-precision source operands.
src2
–Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
T(Q(src2))
–Infinity T(src1) T(src2) T(src2) T(src2) T(src2) T(src2) T(src1)
fx(VXSNAN)
T(Q(src2))
–NZF T(src1) T(M(src1,src2)) T(src2) T(src2) T(src2) T(src2) T(src1)
fx(VXSNAN)
T(Q(src2))
–Zero T(src1) T(src1) T(src1) T(src2) T(src2) T(src2) T(src1)
fx(VXSNAN)
T(Q(src2))
+Zero T(src1) T(src1) T(src1) T(src1) T(src2) T(src2) T(src1)
fx(VXSNAN)
src1
T(Q(src2))
+NZF T(src1) T(src1) T(src1) T(src1) T(M(src1,src2)) T(src2) T(src1)
fx(VXSNAN)
T(Q(src2))
+Infinity T(src1) T(src1) T(src1) T(src1) T(src1) T(src1) T(src1)
fx(VXSNAN)
T(src1)
QNaN T(src2) T(src2) T(src2) T(src2) T(src2) T(src2) T(src1)
fx(VXSNAN)
T(Q(src1)) T(Q(src1)) T(Q(src1)) T(Q(src1)) T(Q(src1)) T(Q(src1)) T(Q(src1)) T(Q(src1))
SNaN
fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN)
Explanation:
src1 The double-precision floating-point value in doubleword element 0 of VSR[XA].
src2 The double-precision floating-point value in doubleword element 0 of VSR[XT].
NZF Nonzero finite number.
Q(x) Return a QNaN with the payload of x.
M(x,y) Return the greater of floating-point value x and floating-point value y.
T(x) The value x is placed in doubleword element 0 of VSR[XT] in double-precision format.
The contents of doubleword element 1 of VSR[XT] are undefined.
FPRF, FR and FI are not modified.
fx(x) If x is equal to 0, FX is set to 1. x is set to 1.
VXSNAN Floating-Point Invalid Operation Exception (SNaN) status flag, FPSCRVXSNAN. If VE=1, update of VSR[XT] is suppressed.
src2
–Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
T(src2)
–Infinity T(src2) T(src2) T(src2) T(src2) T(src2) T(src2) T(src2)
fx(VXSNAN)
T(src2)
–NZF T(src1) T(M(src1,src2) T(src2) T(src2) T(src2) T(src2) T(src2)
fx(VXSNAN)
T(src2)
–Zero T(src1) T(src1) T(src2) T(src2) T(src2) T(src2) T(src2)
fx(VXSNAN)
T(src2)
+Zero T(src1) T(src1) T(src2) T(src2) T(src2) T(src2) T(src2)
fx(VXSNAN)
src1
T(src2)
+NZF T(src1) T(src1) T(src1) T(src1) T(M(src1,src2) T(src2) T(src2)
fx(VXSNAN)
T(src2)
+Infinity T(src1) T(src1) T(src1) T(src1) T(src1) T(src2) T(src2)
fx(VXSNAN)
T(src2)
QNaN T(src2) T(src2) T(src2) T(src2) T(src2) T(src2) T(src2)
fx(VXSNAN)
T(src2) T(src2) T(src2) T(src2) T(src2) T(src2) T(src2) T(src2)
SNaN
fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN)
Explanation:
src1 The double-precision floating-point value in doubleword element 0 of VSR[XA].
src2 The double-precision floating-point value in doubleword element 0 of VSR[XT].
NZF Nonzero finite number.
M(x,y) Return the greater of floating-point value x and floating-point value y.
T(x) The value x is placed in doubleword element 0 of VSR[XT] in double-precision format.
The contents of doubleword element 1 of VSR[XT] are undefined.
FPRF, FR and FI are not modified.
fx(x) If x is equal to 0, FX is set to 1. x is set to 1.
VXSNAN Floating-Point Invalid Operation Exception (SNaN) status flag, VXSNAN. If VE=1, update of VSR[XT] is suppressed.
src2
–Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
T(src2)
–Infinity T(-INF) T(src2) T(src2) T(src2) T(src2) T(src2) T(src2)
fx(VXSNAN)
T(src2)
–NZF T(src1) T(M(src1,src2) T(src2) T(src2) T(src2) T(src2) T(src2)
fx(VXSNAN)
T(src2)
–Zero T(src1) T(src1) T(-Zero) T(+Zero) T(src2) T(src2) T(src2)
fx(VXSNAN)
T(src2)
+Zero T(src1) T(src1) T(+Zero) T(+Zero) T(src2) T(src2) T(src2)
fx(VXSNAN)
src1
T(src2)
+NZF T(src1) T(src1) T(src1) T(src1) T(M(src1,src2) T(src2) T(src2)
fx(VXSNAN)
T(src2)
+Infinity T(src1) T(src1) T(src1) T(src1) T(src1) T(+INF) T(src2)
fx(VXSNAN)
T(src1)
QNaN T(src1) T(src1) T(src1) T(src1) T(src1) T(src1) T(src1)
fx(VXSNAN)
T(src1) T(src1) T(src1) T(src1) T(src1) T(src1) T(src1) T(src1)
SNaN
fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN)
Explanation:
src1 The double-precision floating-point value in doubleword element 0 of VSR[XA].
src2 The double-precision floating-point value in doubleword element 0 of VSR[XT].
NZF Nonzero finite number.
M(x,y) Return the greater of floating-point value x and floating-point value y.
T(x) The value x is placed in doubleword element 0 of VSR[XT] in double-precision format.
The contents of doubleword element 1 of VSR[XT] are undefined.
FPRF, FR and FI are not modified.
fx(x) If x is equal to 0, FX is set to 1. x is set to 1.
VXSNAN Floating-Point Invalid Operation Exception (SNaN) status flag, VXSNAN. If VE=1, update of VSR[XT] is suppressed.
Programming Note
This instruction can be used to operate on
single-precision source operands.
src2
–Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
T(Q(src2))
–Infinity T(src1) T(src1) T(src1) T(src1) T(src1) T(src1) T(src1)
fx(VXSNAN)
T(Q(src2))
–NZF T(src2) T(M(src1,src2)) T(src1) T(src1) T(src1) T(src1) T(src1)
fx(VXSNAN)
T(Q(src2))
–Zero T(src2) T(src2) T(src1) T(src1) T(src1) T(src1) T(src1)
fx(VXSNAN)
T(Q(src2))
+Zero T(src2) T(src2) T(src2) T(src1) T(src1) T(src1) T(src1)
fx(VXSNAN)
src1
T(Q(src2))
+NZF T(src2) T(src2) T(src2) T(src2) T(M(src1,src2)) T(src1) T(src1)
fx(VXSNAN)
T(Q(src2))
+Infinity T(src2) T(src2) T(src2) T(src2) T(src2) T(src1) T(src1)
fx(VXSNAN)
T(src1)
QNaN T(src2) T(src2) T(src2) T(src2) T(src2) T(src2) T(src1)
fx(VXSNAN)
T(Q(src1)) T(Q(src1)) T(Q(src1)) T(Q(src1)) T(Q(src1)) T(Q(src1)) T(Q(src1)) T(Q(src1))
SNaN
fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN)
Explanation:
src1 The double-precision floating-point value in doubleword element 0 of VSR[XA].
src2 The double-precision floating-point value in doubleword element 0 of VSR[XT].
NZF Nonzero finite number.
Q(x) Return a QNaN with the payload of x.
M(x,y) Return the lesser of floating-point value x and floating-point value y.
T(x) The value x is placed in doubleword element i (i{0,1}) of VSR[XT] in double-precision format.
The contents of doubleword element 1 of VSR[XT] are undefined.
FPRF, FR and FI are not modified.
fx(x) If x is equal to 0, FX is set to 1. x is set to 1.
VXSNAN Floating-Point Invalid Operation Exception (SNaN) status flag, FPSCRVXSNAN. If VE=1, update of VSR[XT] is suppressed.
src2
–Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
T(src2)
–Infinity T(src2) T(src1) T(src1) T(src1) T(src1) T(src1) T(src2)
fx(VXSNAN)
T(src2)
–NZF T(src2) T(M(src1,src2) T(src1) T(src1) T(src1) T(src1) T(src2)
fx(VXSNAN)
T(src2)
–Zero T(src2) T(src2) T(src2) T(src2) T(src1) T(src1) T(src2)
fx(VXSNAN)
T(src2)
+Zero T(src2) T(src2) T(src2) T(src2) T(src1) T(src1) T(src2)
fx(VXSNAN)
src1
T(src2)
+NZF T(src2) T(src2) T(src2) T(src2) T(M(src1,src2) T(src1) T(src2)
fx(VXSNAN)
T(src2)
+Infinity T(src2) T(src2) T(src2) T(src2) T(src2) T(src2) T(src2)
fx(VXSNAN)
T(src2)
QNaN T(src2) T(src2) T(src2) T(src2) T(src2) T(src2) T(src2)
fx(VXSNAN)
T(src2) T(src2) T(src2) T(src2) T(src2) T(src2) T(src2) T(src2)
SNaN
fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN)
Explanation:
src1 The double-precision floating-point value in doubleword element 0 of VSR[XA].
src2 The double-precision floating-point value in doubleword element 0 of VSR[XT].
NZF Nonzero finite number.
M(x,y) Return the lesser of floating-point value x and floating-point value y.
T(x) The value x is placed in doubleword element 0 of VSR[XT] in double-precision format.
The contents of doubleword element 1 of VSR[XT] are undefined.
FPRF, FR and FI are not modified.
fx(x) If x is equal to 0, FX is set to 1. x is set to 1.
VXSNAN Floating-Point Invalid Operation Exception (SNaN) status flag, VXSNAN. If VE=1, update of VSR[XT] is suppressed.
src2
–Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
T(src2)
–Infinity T(-INF) T(src1) T(src1) T(src1) T(src1) T(src1) T(src2)
fx(VXSNAN)
T(src2)
–NZF T(src2) T(M(src1,src2) T(src1) T(src1) T(src1) T(src1) T(src2)
fx(VXSNAN)
T(src2)
–Zero T(src2) T(src2) T(-Zero) T(-Zero) T(src1) T(src1) T(src2)
fx(VXSNAN)
T(src2)
+Zero T(src2) T(src2) T(-Zero) T(+Zero) T(src1) T(src1) T(src2)
fx(VXSNAN)
src1
T(src2)
+NZF T(src2) T(src2) T(src2) T(src2) T(M(src1,src2) T(src1) T(src2)
fx(VXSNAN)
T(src2)
+Infinity T(src2) T(src2) T(src2) T(src2) T(src2) T(+INF) T(src2)
fx(VXSNAN)
T(src1)
QNaN T(src1) T(src1) T(src1) T(src1) T(src1) T(src1) T(src1)
fx(VXSNAN)
T(src1) T(src1) T(src1) T(src1) T(src1) T(src1) T(src1) T(src1)
SNaN
fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN)
Explanation:
src1 The double-precision floating-point value in doubleword element 0 of VSR[XA].
src2 The double-precision floating-point value in doubleword element 0 of VSR[XT].
NZF Nonzero finite number.
M(x,y) Return the greater of floating-point value x and floating-point value y.
T(x) The value x is placed in doubleword element 0 of VSR[XT] in double-precision format.
The contents of doubleword element 1 of VSR[XT] are undefined.
FPRF, FR and FI are not modified.
fx(x) If x is equal to 0, FX is set to 1. x is set to 1.
VXSNAN Floating-Point Invalid Operation Exception (SNaN) status flag, VXSNAN. If VE=1, update of VSR[XT] is suppressed.
Part 1: src3
Multiply –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
p dQNaN p dQNaN p Q(src3)
–Infinity p +Infinity p +Infinity p –Infinity p –Infinity p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p Q(src3)
–NZF p +Infinity p M(src1,src3) p +Zero p –Zero p M(src1,src3) p +Infinity p src3
vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
–Zero p +Zero p +Zero p –Zero p –Zero p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
+Zero p –Zero p –Zero p +Zero p +Zero p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
src1
p Q(src3)
+NZF p –Infinity p M(src1,src3) p –Zero p +Zero p M(src1,src3) p +Infinity p src3
vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
+Infinity p –Infinity p +Infinity p +Infinity p +Infinity p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p src1
QNaN p src1 p src1 p src1 p src1 p src1 p src1 p src1
vxsnan_flag 1
p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1)
SNaN
vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1
Part 2: src2
Subtract –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN v Q(src2)
–Infinity v –Infinity v –Infinity v –Infinity v –Infinity v –Infinity v src2
vxisi_flag 1 vxsnan_flag 1
v Q(src2)
–NZF v +Infinity v S(p,src2) vp vp v S(p,src2) v –Infinity v src2
vxsnan_flag 1
v Q(src2)
–Zero v +Infinity v –src2 v Rezd v –Zero v –src2 v –Infinity v src2
vxsnan_flag 1
v Q(src2)
+Zero v +Infinity v –src2 v +Zero v Rezd v –src2 v –Infinity v src2
vxsnan_flag 1
p
v Q(src2)
+NZF v +Infinity v S(p,src2) vp vp v S(p,src2) v –Infinity v src2
vxsnan_flag 1
v dQNaN v Q(src2)
+Infinity v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v src2
vxisi_flag 1 vxsnan_flag 1
QNaN & vp
vp vp vp vp vp vp vp
src1 is a NaN vxsnan_flag 1
QNaN & v Q(src2)
vp vp vp vp vp vp v src2
src1 not a NaN vxsnan_flag 1
Explanation:
src1 The double-precision floating-point value in doubleword element 0 of VSR[XA].
src2 For xsmsubadp, the double-precision floating-point value in doubleword element 0 of VSR[XT].
For xsmsubmdp, the double-precision floating-point value in doubleword element 0 of VSR[XB].
src3 For xsmsubadp, the double-precision floating-point value in doubleword element 0 of VSR[XB].
For xsmsubmdp, the double-precision floating-point value in doubleword element 0 of VSR[XT].
dQNaN Default quiet NaN (0x7FF8_0000_0000_0000).
NZF Nonzero finite number.
Rezd Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two
nonzero finite number source operands.
Q(x) Return a QNaN with the payload of x.
S(x,y) Return the normalized sum of floating-point value x and negated floating-point value y, having unbounded range and precision.
Note: If x = y, v is considered to be an exact-zero-difference result (Rezd).
M(x,y) Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision.
p The intermediate product having unbounded range and precision.
v The intermediate result having unbounded range and precision.
if(vxsnan_flag) then SetFX(VXSNAN) See part 2 of Table 79, “Actions for xsmsub(a|m)sp”.
if(vximz_flag) then SetFX(VXIMZ)
if(vxisi_flag) then SetFX(VXISI) The intermediate result is rounded to single-precision
if(ox_flag) then SetFX(OX) using the rounding mode specified by RN.
if(ux_flag) then SetFX(UX)
if(xx_flag) then SetFX(XX) See Table 50, “Scalar Floating-Point Intermediate
Result Handling,” on page 515.
vex_flag VE & (vxsnan_flag | vximz_flag | vxisi_flag)
The result is placed into doubleword element 0 of
if( ~vex_flag ) then do
VSR[32×TX+T].dword[0] ConvertSPtoSP64(result)
VSR[XT] in double-precision format.
VSR[32×TX+T].dword[1] 0xUUUU_UUUU_UUUU_UUUU
FPRF ClassSP(result) The contents of doubleword element 1 of VSR[XT] are
FR inc_flag undefined.
FI xx_flag
end FPRF is set to the class and sign of the result as
else do represented in single-precision format. FR is set to
FR 0b0 indicate if the result was incremented when rounded.
FI 0b0 FI is set to indicate the result is inexact.
end
If a trap-enabled invalid operation exception occurs,
Let XT be the value 32×TX + T. VSR[XT] and FPRF are not modified, and FR and FI
Let XA be the value 32×AX + A. are set to 0.
Let XB be the value 32×BX + B.
See Table 51, “VSX Scalar Floating-Point Final
Result,” on page 516.
Part 1: src3
Multiply –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
p dQNaN p dQNaN p Q(src3)
–Infinity p +Infinity p +Infinity p –Infinity p –Infinity p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p Q(src3)
–NZF p +Infinity p M(src1,src3) p +Zero p –Zero p M(src1,src3) p +Infinity p src3
vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
–Zero p +Zero p +Zero p –Zero p –Zero p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
+Zero p –Zero p –Zero p +Zero p +Zero p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
src1
p Q(src3)
+NZF p –Infinity p M(src1,src3) p –Zero p +Zero p M(src1,src3) p +Infinity p src3
vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
+Infinity p –Infinity p +Infinity p +Infinity p +Infinity p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p src1
QNaN p src1 p src1 p src1 p src1 p src1 p src1 p src1
vxsnan_flag 1
p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1)
SNaN
vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1
Part 2: src2
Subtract –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN v Q(src2)
–Infinity v –Infinity v –Infinity v –Infinity v –Infinity v –Infinity v src2
vxisi_flag 1 vxsnan_flag 1
v Q(src2)
–NZF v +Infinity v S(p,src2) vp vp v S(p,src2) v –Infinity v src2
vxsnan_flag 1
v Q(src2)
–Zero v +Infinity v –src2 v Rezd v –Zero v –src2 v –Infinity v src2
vxsnan_flag 1
v Q(src2)
+Zero v +Infinity v –src2 v +Zero v Rezd v –src2 v –Infinity v src2
vxsnan_flag 1
v Q(src2)
+NZF v +Infinity v S(p,src2) vp vp v S(p,src2) v –Infinity v src2
vxsnan_flag 1
p
v dQNaN v Q(src2)
+Infinity v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v src2
vxisi_flag 1 vxsnan_flag 1
QNaN &
vp
src1 is a vp vp vp vp vp vp vp
vxsnan_flag 1
NaN
QNaN &
v Q(src2)
src1 not a v p vp vp vp vp vp v src2
vxsnan_flag 1
NaN
Explanation:
src1 The double-precision floating-point value in doubleword element 0 of VSR[XA].
src2 For xsmsubasp, the double-precision floating-point value in doubleword element 0 of VSR[XT].
For xsmsubmsp, the double-precision floating-point value in doubleword element 0 of VSR[XB].
src3 For xsmsubasp, the double-precision floating-point value in doubleword element 0 of VSR[XB].
For xsmsubmsp, the double-precision floating-point value in doubleword element 0 of VSR[XT].
dQNaN Default quiet NaN (0x7FF8_0000_0000_0000).
NZF Nonzero finite number.
Rezd Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two
nonzero finite number source operands.
Q(x) Return a QNaN with the payload of x.
S(x,y) Return the normalized sum of floating-point value x and negated floating-point value y, having unbounded range and precision.
Note: If x = y, v is considered to be an exact-zero-difference result (Rezd).
M(x,y) Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision.
p The intermediate product having unbounded range and precision.
v The intermediate result having unbounded range and precision.
VSX Scalar Multiply-Subtract Quad-Precision Otherwise, if src1 is a Quiet NaN, the result is src1.
[using round to Odd] X-form
Otherwise, if src2 is a Signalling NaN, the result is the
xsmsubqp VRT,VRA,VRB (RO=0) Quiet NaN corresponding to src2.
xsmsubqpo VRT,VRA,VRB (RO=1)
Otherwise, if src2 is a Quiet NaN, the result is src2.
63 VRT VRA VRB 420 RO
0 6 11 16 21 31 Otherwise, if src3 is a Signalling NaN, the result is the
Quiet NaN corresponding to src3.
if MSR.VSX=0 then VSX_Unavailable()
Otherwise, if src3 is a Quiet NaN, the result is src3.
reset_xflags()
Otherwise, if src1 is an Infinity value and src3 is a Zero
src1 bfp_CONVERT_FROM_BFP128(VSR[VRA+32])
src2 bfp_CONVERT_FROM_BFP128(VSR[VRT+32])
value, or if src1 is a Zero value and src3 is an Infinity
src3 bfp_CONVERT_FROM_BFP128(VSR[VRB+32]) value, the result is the default Quiet NaN[1].
v bfp_MULTIPLY_ADD(src1, src3, bfp_NEGATE(src2))
rnd bfp_ROUND_TO_BFP128(RO, FPSCR.RN, v) Otherwise, if the product of src1 and src3, and src2
result bfp_CONVERT_TO_BFP128(rnd) are Infinity values having same signs, the result is the
default Quiet NaN.
if(vxsnan_flag) then SetFX(FPSCR.VXSNAN)
if(vximz_flag) then SetFX(FPSCR.VXIMZ) Otherwise, do the following.
if(vxisi_flag) then SetFX(FPSCR.VXISI) src1 is multiplied by src3, producing a product
if(ox_flag) then SetFX(FPSCR.OX) having unbounded significand precision and
if(ux_flag) then SetFX(FPSCR.UX) exponent range.
if(xx_flag) then SetFX(FPSCR.XX)
See part 1 of Table 80. "Actions for xsmsubqp[o]".
vx_flag vxsnan_flag | vximz_flag | vxisi_flag
ex_flag FPSCR.VE & vx_flag
src2 is negated and added to the product,
if ex_flag=0 then do producing a sum having unbounded range and
VSR[VRT+32] result precision.
FPSCR.FPRF fprf_CLASS_BFP128(result)
end See part 2 of Table 80. "Actions for xsmsubqp[o]".
FPSCR.FR (vx_flag=0) & inc_flag
FPSCR.FI (vx_flag=0) & xx_flag If the intermediate result is Tiny (i.e., the unbiased
exponent is less than -16382) and UE=0, the
Let src1 be the floating-point value in VSR[VRA+32] significand is shifted right N bits, where N is the
represented in quad-precision format. difference between -16382 and the unbiased
exponent of the intermediate result. The exponent
Let src2 be the floating-point value in VSR[VRT+32] of the intermediate result is set to the value
represented in quad-precision format. -16382.
Let src3 be the floating-point value in VSR[VRB+32] If RO=1, let the rounding mode be Round to Odd.
represented in quad-precision format. Otherwise, let the rounding mode be specified by
RN. Unless the result is an Infinity or a Zero, the
If either src1, src2, or src3 is a Signalling NaN, an intermediate result is rounded to quad-precision
Invalid Operation exception occurs and VXSNAN is set to using the specified rounding mode.
1.
See Table 50, “Scalar Floating-Point Intermediate
If src1 is an Infinity value and src3 is a Zero value, or if Result Handling,” on page 515.
src1 is a Zero value and src3 is an Infinity value, an
Invalid Operation exception occurs and VXIMZ is set to The result is placed into VSR[VRT+32] in quad-precision
1. format.
If src2 and the product of src1 and src3 are Infinity FPRF is set to the class and sign of the result. FR is set
values having same signs, an Invalid Operation to indicate if the rounded result was incremented. FI is
exception occurs and VXISI is set to 1. set to indicate the result is inexact.
If src1 is a Signalling NaN, the result is the Quiet NaN If a trap-disabled Invalid Operation exception occurs,
corresponding to src1. FR and FI are set to 0.
1. The quad-precision default Quiet NaN is the value, 0x7FFF_8000_0000_0000_0000_0000_0000.
src1
VSR[VRT+32]
src2
VSR[VRB+32]
src3
VSR[VRT+32]
tgt
Part 1: src3
Multiply –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
p dQNaN
–Infinity p +Infinity p –Infinity
vximz_flag 1
p p
–NZF
mul(src1,src3) mul(src1,src3)
p p
+NZF
mul(src1,src3) mul(src1,src3)
p dQNaN
+Infinity p –Infinity p +Infinity
vximz_flag 1
p src1
QNaN p src1
vxsnan_flag 1
p quiet(src1)
SNaN
vxsnan_flag 1
Part 2: src2
Subtract –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN
–Infinity v –Infinity
vxisi_flag 1
v dQNaN
+Infinity v +Infinity
vxisi_flag 1
QNaN & vp
src1 is a NaN vxsnan_flag 1
vp
QNaN & v quiet(src2)
v src2
src1 not a NaN vxsnan_flag 1
Explanation:
src1 The quad-precision floating-point value in VSR[VRA+32].
src2 The quad-precision floating-point value in VSR[VRT+32].
src3 The quad-precision floating-point value in VSR[VRB+32].
dQNaN Default quiet NaN (0x7FFF_8000_0000_0000_0000_0000_0000).
NZF Nonzero finite number.
Rezd Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two
nonzero finite number source operands.
quiet(x) Return a QNaN with the payload of x.
sub(x,y) Return the normalized sum of floating-point value x and negated floating-point value y, having unbounded range and precision.
Note: If x = y, v is considered to be an exact-zero-difference result (Rezd).
mul(x,y) Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision.
p The intermediate product having unbounded range and precision.
v The intermediate result having unbounded range and precision.
VSX Scalar Multiply Double-Precision The contents of doubleword element 1 of VSR[XT] are
XX3-form undefined.
xsmuldp XT,XA,XB FPRF is set to the class and sign of the result. FR is
set to indicate if the result was incremented when
60 T A B 48 AX BX TX rounded. FI is set to indicate the result is inexact.
0 6 11 16 21 29 30 31
If a trap-enabled invalid operation exception occurs,
XT TX || T VSR[XT] and FPRF are not modified, and FR and FI
XA AX || A
are set to 0.
XB BX || B
reset_xflags()
src1 VSR[XA]{0:63}
See Table 51, “VSX Scalar Floating-Point Final
src2 VSR[XB]{0:63} Result,” on page 516.
v{0:inf} MultiplyFP(src1,src2)
result{0:63} RoundToDP(RN,v) Special Registers Altered
if(vxsnan_flag) then SetFX(VXSNAN) FPRF FR FI FX OX UX XX
if(vximz_flag) then SetFX(VXIMZ) VXSNAN VXIMZ
if(vxisi_flag) then SetFX(VXISI)
if(ox_flag) then SetFX(OX)
if(ux_flag) then SetFX(UX) VSR Data Layout for xsmuldp
if(xx_flag) then SetFX(XX) src1 = VSR[XA]
vex_flag VE & (vxsnan_flag | vximz_flag)
DP unused
if( ~vex_flag ) then do
src2 = VSR[XB]
VSR[XT] result || 0xUUUU_UUUU_UUUU_UUUU
FPRF ClassDP(result) DP unused
FR inc_flag
FI xx_flag tgt = VSR[XT]
end
DP undefined
else do
FR 0b0 0 64 127
FI 0b0
end
src2
-Infinity -NZF -Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN v dQNaN v Q(src2)
-Infinity v +Infinity v +Infinity v –Infinity v –Infinity v src2
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
v Q(src2)
-NZF v +Infinity v M(src1,src2) v +Zero v –Zero v M(src1,src2) v +Infinity v src2
vxsnan_flag 1
v dQNaN v dQNaN v Q(src2)
-Zero v +Zero v +Zero v –Zero v –Zero v src2
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
v dQNaN v dQNaN v Q(src2)
+Zero v –Zero v –Zero v +Zero v +Zero v src2
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
src1
v Q(src2)
+NZF v –Infinity v M(src1,src2) v –Zero v +Zero v M(src1,src2) v +Infinity v src2
vxsnan_flag 1
v dQNaN v dQNaN v Q(src2)
+Infinity v –Infinity v +Infinity v +Infinity v +Infinity v src2
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
v src1
QNaN v src1 v src1 v src1 v src1 v src1 v src1 v src1
vxsnan_flag 1
v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1)
SNaN
vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1
Explanation:
src1 The double-precision floating-point value in doubleword element 0 of VSR[XA].
src2 The double-precision floating-point value in doubleword element 0 of VSR[XB].
dQNaN Default quiet NaN (0x7FF8_0000_0000_0000).
NZF Nonzero finite number.
M(x,y) Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision.
Q(x) Return a QNaN with the payload of x.
v The intermediate result having unbounded signficand precision and unbounded exponent range.
VSX Scalar Multiply Quad-Precision [using Otherwise, if src1 is an Infinity value and src2 is a Zero
round to Odd] X-form value, or if src1 is a Zero value and src2 is an Infinity
value, the result is the default Quiet NaN[1].
xsmulqp VRT,VRA,VRB (RO=0)
xsmulqpo VRT,VRA,VRB (RO=1) Otherwise, do the following.
The normalized product of src1 multiplied by src2
63 VRT VRA VRB 36 RO is produced with unbounded significand precision
0 6 11 16 21 31
and exponent range.
if MSR.VSX=0 then VSX_Unavailable()
See Table 82. "Actions for xsmulqp[o]".
reset_xflags()
If the intermediate result is Tiny (i.e., the unbiased
src1 bfp_CONVERT_FROM_BFP128(VSR[VRA+32]) exponent is less than -16382) and UE=0, the
src2 bfp_CONVERT_FROM_BFP128(VSR[VRB+32]) significand is shifted right N bits, where N is the
v bfp_MULTIPLY(src1, src2) difference between -16382 and the unbiased
rnd bfp_ROUND_TO_BFP128(RO, FPSCR.RN, v) exponent of the intermediate result. The exponent
result bfp_CONVERT_TO_BFP128(rnd) of the intermediate result is set to the value
-16382.
if(vxsnan_flag) then SetFX(FPSCR.VXSNAN)
if(vximz_flag) then SetFX(FPSCR.VXIMZ) If RO=1, let the rounding mode be Round to Odd.
if(ox_flag) then SetFX(FPSCR.OX) Otherwise, let the rounding mode be specified by
if(ux_flag) then SetFX(FPSCR.UX)
RN. Unless the result is an Infinity or a Zero, the
if(xx_flag) then SetFX(FPSCR.XX)
intermediate result is rounded to quad-precision
vx_flag vxsnan_flag | vximz_flag using the specified rounding mode.
ex_flag FPSCR.VE & vx_flag
See Table 50, “Scalar Floating-Point Intermediate
if ex_flag=0 then do Result Handling,” on page 515.
VSR[VRT+32] result
FPSCR.FPRF fprf_CLASS_BFP128(result) The result is placed into VSR[VRT+32] in quad-precision
end format.
FPSCR.FR (vx_flag=0) & inc_flag
FPSCR.FI (vx_flag=0) & xx_flag FPRF is set to the class and sign of the result. FR is set
to indicate if the rounded result was incremented. FI is
Let src1 be the floating-point value in VSR[VRA+32] set to indicate the result is inexact.
represented in quad-precision format.
If a trap-disabled Invalid Operation exception occurs,
Let src2 be the floating-point value in VSR[VRB+32] FR and FI are set to 0.
represented in quad-precision format.
If a trap-enabled Invalid Operation exception occurs,
If either src1 or src2 is a Signalling NaN, an Invalid VSR[VRT+32] and FPRF are not modified, and FR and FI
Operation exception occurs and VXSNAN is set to 1. are set to 0.
If src1 is an Infinity value and src2 is a Zero value, or if See Table 51, “VSX Scalar Floating-Point Final
src1 is a Zero value and src2 is an Infinity value, an Result,” on page 516.
Invalid Operation exception occurs and VXIMZ is set to
1. Special Registers Altered:
FPRF FR FI FX VXSNAN VXIMZ OX UX XX
If src1 is a Signalling NaN, the result is the Quiet NaN
corresponding to src1.
src1
VSR[VRB+32]
src2
VSR[VRT+32]
tgt
src2
-Infinity -NZF -Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN
-Infinity v +Infinity v –Infinity
vximz_flag 1
v dQNaN
+Infinity v –Infinity v +Infinity
vximz_flag 1
v src1
QNaN v src1
vxsnan_flag 1
v quiet(src1)
SNaN
vxsnan_flag 1
Explanation:
src1 The quad-precision floating-point value in VSR[VRA+32].
src2 The quad-precision floating-point value in VSR[VRB+32].
dQNaN Default quiet NaN (0x7FFF_8000_0000_0000_0000_0000_0000).
NZF Nonzero finite number.
mul(x,y) The floating-point value x is multiplied1 by the floating-point value y. Return the normalized product, having unbounded significand
precision and exponent range.
quiet(x) Convert x to the corresponding Quiet NaN.
v The intermediate result having unbounded significand precision and unbounded exponent range.
VSX Scalar Multiply Single-Precision The result is placed into doubleword element 0 of
XX3-form VSR[XT] in double-precision format.
if(vxsnan_flag) then SetFX(VXSNAN) See Table 51, “VSX Scalar Floating-Point Final
if(vximz_flag) then SetFX(VXIMZ) Result,” on page 516.
if(ox_flag) then SetFX(OX)
if(ux_flag) then SetFX(UX) Special Registers Altered
if(xx_flag) then SetFX(XX)
FPRF FR FI FX OX UX XX
vex_flag VE & (vxsnan_flag | vximz_flag)
VXSNAN VXIMZ
src2
-Infinity -NZF -Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN v dQNaN v Q(src2)
-Infinity v +Infinity v +Infinity v –Infinity v –Infinity v src2
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
v Q(src2)
-NZF v +Infinity v M(src1,src2) v +Zero v –Zero v M(src1,src2) v +Infinity v src2
vxsnan_flag 1
v dQNaN v dQNaN v Q(src2)
-Zero v +Zero v +Zero v –Zero v –Zero v src2
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
v dQNaN v dQNaN v Q(src2)
+Zero v –Zero v –Zero v +Zero v +Zero v src2
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
src1
v Q(src2)
+NZF v –Infinity v M(src1,src2) v –Zero v +Zero v M(src1,src2) v +Infinity v src2
vxsnan_flag 1
v dQNaN v dQNaN v Q(src2)
+Infinity v –Infinity v +Infinity v +Infinity v +Infinity v src2
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
v src1
QNaN v src1 v src1 v src1 v src1 v src1 v src1 v src1
vxsnan_flag 1
v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1)
SNaN
vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1
Explanation:
src1 The double-precision floating-point value in doubleword element 0 of VSR[XA].
src2 The double-precision floating-point value in doubleword element 0 of VSR[XB].
dQNaN Default quiet NaN (0x7FF8_0000_0000_0000).
NZF Nonzero finite number.
M(x,y) Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision.
Q(x) Return a QNaN with the payload of x.
v The intermediate result having unbounded signficand precision and unbounded exponent range.
Programming Note
This instruction can be used to operate on a
single-precision source operand.
The contents of doubleword element 1 of VSR[XT] are VSR Data Layout for xsnegqp
undefined. VSR[VRB+32]
tgt
VSR Data Layout for xsnegdp
src = VSR[XB]
DP unused
tgt = VSR[XT]
DP undefined
0 64 127
Programming Note
This instruction can be used to operate on a
single-precision source operand.
Part 1: src3
Multiply –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
p dQNaN p dQNaN p Q(src3)
–Infinity p +Infinity p +Infinity p –Infinity p –Infinity p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p Q(src3)
–NZF p +Infinity p M(src1,src3) p src1 p src1 p M(src1,src3) p +Infinity p src3
vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
–Zero p +Zero p +Zero p –Zero p –Zero p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
+Zero p –Zero p –Zero p +Zero p +Zero p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
src1
p Q(src3)
+NZF p –Infinity p M(src1,src3) p src1 p src1 p M(src1,src3) p +Infinity p src3
vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
+Infinity p –Infinity p +Infinity p +Infinity p +Infinity p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p src1
QNaN p src1 p src1 p src1 p src1 p src1 p src1 p src1
vxsnan_flag 1
p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1)
SNaN
vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1
Part 2: src2
Add –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN v Q(src2)
–Infinity v –Infinity v –Infinity v –Infinity v –Infinity v –Infinity v src2
vxisi_flag 1 vxsnan_flag 1
v Q(src2)
–NZF v –Infinity v A(p,src2) vp vp v A(p,src2) v +Infinity v src2
vxsnan_flag 1
v Q(src2)
–Zero v –Infinity v src2 v –Zero v Rezd v src2 v +Infinity v src2
vxsnan_flag 1
v Q(src2)
+Zero v –Infinity v src2 v Rezd v +Zero v src2 v +Infinity v src2
vxsnan_flag 1
p
v Q(src2)
+NZF v –Infinity v A(p,src2) vp vp v A(p,src2) v +Infinity v src2
vxsnan_flag 1
v dQNaN v Q(src2)
+Infinity v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v src2
vxisi_flag 1 vxsnan_flag 1
QNaN & vp
vp vp vp vp vp vp vp
src1 is a NaN vxsnan_flag 1
QNaN & v Q(src2)
vp vp vp vp vp vp v src2
src1 not a NaN vxsnan_flag 1
Explanation:
src1 The double-precision floating-point value in doubleword element 0 of VSR[XA].
src2 For xsnmaddadp, the double-precision floating-point value in doubleword element 0 of VSR[XT].
For xsnmaddmdp, the double-precision floating-point value in doubleword element 0 of VSR[XB].
src3 For xsnmaddadp, the double-precision floating-point value in doubleword element 0 of VSR[XB].
For xsnmaddmdp, the double-precision floating-point value in doubleword element 0 of VSR[XT].
dQNaN Default quiet NaN (0x7FF8_0000_0000_0000).
NZF Nonzero finite number.
Rezd Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two
nonzero finite number source operands.
Q(x) Return a QNaN with the payload of x.
A(x,y) Return the normalized sum of floating-point value x and floating-point value y, having unbounded range and precision.
Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd).
M(x,y) Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision.
p The intermediate product having unbounded range and precision.
v The intermediate result having unbounded range and precision.
Is q inexact? (q g v)
vxsnan_flag
vximz_flag
vxisi_flag
OE
UE
VE
XE
Case ZE Returned Results and Status Setting
Explanation:
– The results do not depend on this condition.
ClassFP(x) Classifies the floating-point value x as defined in Table 2, “Floating-Point Result Flags,” on page 371.
fx(x) FX is set to 1 if x=0. x is set to 1.
Wrap adjust, where = 21536 for double-precision and = 2192 for single-precision.
q The value defined in Table 50, “Scalar Floating-Point Intermediate Result Handling,” on page 515, signficand rounded to the target
precision, unbounded exponent range.
r The value defined in Table 50, “Scalar Floating-Point Intermediate Result Handling,” on page 515, signficand rounded to the target
precision, bounded exponent range.
v The precise intermediate result defined in the instruction having unbounded signficand precision, unbounded exponent range.
FI Floating-Point Fraction Inexact status flag, FPSCRFI. This status flag is nonsticky.
FR Floating-Point Fraction Rounded status flag, FPSCRFR.
OX Floating-Point Overflow Exception status flag, FPSCROX.
error() The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set
to any mode other than the ignore-exception mode.
N(x) The value x is is negated by complementing the sign bit of x.
T(x) The value x is placed in element 0 of VSR[XT] in the target precision format.
The contents of the remaining element(s) of VSR[XT] are undefined.
UX Floating-Point Underflow Exception status flag, FPSCRUX
VXSNAN Floating-Point Invalid Operation Exception (SNaN) status flag, FPSCRVXSNAN.
VXIMZ Floating-Point Invalid Operation Exception (Infinity × Zero) status flag, FPSCRVXIMZ.
VXISI Floating-Point Invalid Operation Exception (Infinity – Infinity) status flag, FPSCRVXISI.
XX Float-Point Inexact Exception status flag, FPSCRXX. The flag is a sticky version of FPSCRFI. When FPSCRFI is set to a new
value, the new value of FPSCRXX is set to the result of ORing the old value of FPSCRXX with the new value of FPSCRFI.
Is q inexact? (q g v)
vxsnan_flag
vximz_flag
vxisi_flag
OE
UE
VE
XE
ZE
Case Returned Results and Status Setting
Explanation:
– The results do not depend on this condition.
ClassFP(x) Classifies the floating-point value x as defined in Table 2, “Floating-Point Result Flags,” on page 371.
fx(x) FX is set to 1 if x=0. x is set to 1.
Wrap adjust, where = 21536 for double-precision and = 2192 for single-precision.
q The value defined in Table 50, “Scalar Floating-Point Intermediate Result Handling,” on page 515, signficand rounded to the target
precision, unbounded exponent range.
r The value defined in Table 50, “Scalar Floating-Point Intermediate Result Handling,” on page 515, signficand rounded to the target
precision, bounded exponent range.
v The precise intermediate result defined in the instruction having unbounded signficand precision, unbounded exponent range.
FI Floating-Point Fraction Inexact status flag, FPSCRFI. This status flag is nonsticky.
FR Floating-Point Fraction Rounded status flag, FPSCRFR.
OX Floating-Point Overflow Exception status flag, FPSCROX.
error() The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set
to any mode other than the ignore-exception mode.
N(x) The value x is is negated by complementing the sign bit of x.
T(x) The value x is placed in element 0 of VSR[XT] in the target precision format.
The contents of the remaining element(s) of VSR[XT] are undefined.
UX Floating-Point Underflow Exception status flag, FPSCRUX
VXSNAN Floating-Point Invalid Operation Exception (SNaN) status flag, FPSCRVXSNAN.
VXIMZ Floating-Point Invalid Operation Exception (Infinity × Zero) status flag, FPSCRVXIMZ.
VXISI Floating-Point Invalid Operation Exception (Infinity – Infinity) status flag, FPSCRVXISI.
XX Float-Point Inexact Exception status flag, FPSCRXX. The flag is a sticky version of FPSCRFI. When FPSCRFI is set to a new
value, the new value of FPSCRXX is set to the result of ORing the old value of FPSCRXX with the new value of FPSCRFI.
if( ~vex_flag ) then do The result is negated and placed into doubleword
VSR[32×TX+T].dword[0] ConvertToSP(result) element 0 of VSR[XT] in double-precision format.
VSR[32×TX+T].dword[1] 0xUUUU_UUUU_UUUU_UUUU
FPRF ClassSP(result) The contents of doubleword element 1 of VSR[XT] are
FR inc_flag
undefined.
FI xx_flag
end
else do
FPRF is set to the class and sign of the result as
FR 0b0 represented in single-precision format. FR is set to
FI 0b0 indicate if the result was incremented when rounded.
end FI is set to indicate the result is inexact.
Part 1: src3
Multiply –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
p dQNaN p dQNaN p Q(src3)
–Infinity p +Infinity p +Infinity p –Infinity p –Infinity p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p Q(src3)
–NZF p +Infinity p M(src1,src3) p src1 p src1 p M(src1,src3) p +Infinity p src3
vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
–Zero p +Zero p +Zero p –Zero p –Zero p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
+Zero p –Zero p –Zero p +Zero p +Zero p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
src1
p Q(src3)
+NZF p –Infinity p M(src1,src3) p src1 p src1 p M(src1,src3) p +Infinity p src3
vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
+Infinity p –Infinity p +Infinity p +Infinity p +Infinity p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p src1
QNaN p src1 p src1 p src1 p src1 p src1 p src1 p src1
vxsnan_flag 1
p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1)
SNaN
vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1
Part 2: src2
Add –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN v Q(src2)
–Infinity v –Infinity v –Infinity v –Infinity v –Infinity v –Infinity v src2
vxisi_flag 1 vxsnan_flag 1
v Q(src2)
–NZF v –Infinity v A(p,src2) vp vp v A(p,src2) v +Infinity v src2
vxsnan_flag 1
v Q(src2)
–Zero v –Infinity v src2 v –Zero v Rezd v src2 v +Infinity v src2
vxsnan_flag 1
v Q(src2)
+Zero v –Infinity v src2 v Rezd v +Zero v src2 v +Infinity v src2
vxsnan_flag 1
v Q(src2)
+NZF v –Infinity v A(p,src2) vp vp v A(p,src2) v +Infinity v src2
vxsnan_flag 1
p
v dQNaN v Q(src2)
+Infinity v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v src2
vxisi_flag 1 vxsnan_flag 1
QNaN &
vp
src1 is a vp vp vp vp vp vp vp
vxsnan_flag 1
NaN
QNaN &
v Q(src2)
src1 not a v p vp vp vp vp vp v src2
vxsnan_flag 1
NaN
Explanation:
src1 The double-precision floating-point value in doubleword element 0 of VSR[XA].
src2 For xsnmaddasp, the double-precision floating-point value in doubleword element 0 of VSR[XT].
For xsnmaddmsp, the double-precision floating-point value in doubleword element 0 of VSR[XB].
src3 For xsnmaddasp, the double-precision floating-point value in doubleword element 0 of VSR[XB].
For xsnmaddmsp, the double-precision floating-point value in doubleword element 0 of VSR[XT].
dQNaN Default quiet NaN (0x7FF8_0000_0000_0000).
NZF Nonzero finite number.
Rezd Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two
nonzero finite number source operands.
Q(x) Return a QNaN with the payload of x.
A(x,y) Return the normalized sum of floating-point value x and floating-point value y, having unbounded range and precision.
Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd).
M(x,y) Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision.
p The intermediate product having unbounded range and precision.
v The intermediate result having unbounded range and precision.
VSX Scalar Negative Multiply-Add Otherwise, if src1 is a Quiet NaN, the result is src1.
Quad-Precision [using round to Odd] X-form
Otherwise, if src2 is a Signalling NaN, the result is the
xsnmaddqp VRT,VRA,VRB (RO=0) Quiet NaN corresponding to src2.
xsnmaddqpo VRT,VRA,VRB (RO=1)
Otherwise, if src2 is a Quiet NaN, the result is src2.
63 VRT VRA VRB 452 RO
0 6 11 16 21 31 Otherwise, if src3 is a Signalling NaN, the result is the
Quiet NaN corresponding to src3.
if MSR.VSX=0 then VSX_Unavailable()
Otherwise, if src3 is a Quiet NaN, the result is src3.
reset_xflags()
Otherwise, if src1 is an Infinity value and src3 is a Zero
src1 bfp_CONVERT_FROM_BFP128(VSR[VRA+32])
src2 bfp_CONVERT_FROM_BFP128(VSR[VRT+32])
value, or if src1 is a Zero value and src3 is an Infinity
src3 bfp_CONVERT_FROM_BFP128(VSR[VRB+32]) value, the result is the default Quiet NaN[1].
v bfp_MULTIPLY_ADD(src1,src3,src2)
rnd bfp_NEGATE(bfp_ROUND_TO_BFP128(RO, FPSCR.RN, v)) Otherwise, if the product of src1 and src3, and src2
result bfp_CONVERT_TO_BFP128(rnd) are Infinity values having opposite signs, the result is
the default Quiet NaN.
if(vxsnan_flag) then SetFX(FPSCR.VXSNAN)
if(vximz_flag) then SetFX(FPSCR.VXIMZ) Otherwise, do the following.
if(vxisi_flag) then SetFX(FPSCR.VXISI) src1 is multiplied by src3, producing a product
if(ox_flag) then SetFX(FPSCR.OX) having unbounded significand precision and
if(ux_flag) then SetFX(FPSCR.UX) exponent range.
if(xx_flag) then SetFX(FPSCR.XX)
See part 1 of Table 69. "Actions for
vx_flag vxsnan_flag | vximz_flag | vxisi_flag
ex_flag FPSCR.VE & vx_flag
xsmadd(a|m)dp".
src1
VSR[VRT+32]
src2
VSR[VRB+32]
src3
VSR[VRT+32]
tgt
Part 1: src3
Multiply –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
p dQNaN
–Infinity p +Infinity p –Infinity
vximz_flag 1
p p
–NZF
Mul(src1,src3) Mul(src1,src3)
p p
+NZF
Mul(src1,src3) Mul(src1,src3)
p dQNaN
+Infinity p –Infinity p +Infinity
vximz_flag 1
p src1
QNaN p src1
vxsnan_flag 1
p quiet(src1)
SNaN
vxsnan_flag 1
Part 2: src2
Add –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN
–Infinity v –Infinity
vxisi_flag 1
v dQNaN
+Infinity v +Infinity
vxisi_flag 1
QNaN & vp
src1 is a NaN vxsnan_flag 1
vp
QNaN & v quiet(src2)
v src2
src1 not a NaN vxsnan_flag 1
Explanation:
src1 The quad-precision floating-point value in VSR[VRA+32].
src2 The quad-precision floating-point value in VSR[VRT+32].
src3 The quad-precision floating-point value in VSR[VRB+32].
dQNaN Default quiet NaN (0x7FFF_8000_0000_0000_0000_0000_0000).
NZF Nonzero finite number.
Rezd Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two
nonzero finite number source operands.
quiet(x) Return a QNaN with the payload of x.
Add(x,y) Return the normalized sum of floating-point value x and floating-point value y, having unbounded range and precision.
Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd).
Mul(x,y) Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision.
p The intermediate product having unbounded range and precision.
v The intermediate result having unbounded range and precision.
Part 1: src3
Multiply –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
p dQNaN p dQNaN p Q(src3)
–Infinity p +Infinity p +Infinity p –Infinity p –Infinity p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p Q(src3)
–NZF p +Infinity p M(src1,src3) p src1 p src1 p M(src1,src3) p +Infinity p src3
vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
–Zero p +Zero p +Zero p –Zero p –Zero p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
+Zero p –Zero p –Zero p +Zero p +Zero p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
src1
p Q(src3)
+NZF p –Infinity p M(src1,src3) p src1 p src1 p M(src1,src3) p +Infinity p src3
vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
+Infinity p –Infinity p +Infinity p +Infinity p +Infinity p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p src1
QNaN p src1 p src1 p src1 p src1 p src1 p src1 p src1
vxsnan_flag 1
p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1)
SNaN
vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1
Part 2: src2
Subtract –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN v Q(src2)
–Infinity v –Infinity v –Infinity v –Infinity v –Infinity v –Infinity v src2
vxisi_flag 1 vxsnan_flag 1
v Q(src2)
–NZF v +Infinity v S(p,src2) vp vp v S(p,src2) v –Infinity v src2
vxsnan_flag 1
v Q(src2)
–Zero v +Infinity v –src2 v Rezd v –Zero v –src2 v –Infinity v src2
vxsnan_flag 1
v Q(src2)
+Zero v +Infinity v –src2 v +Zero v Rezd v –src2 v –Infinity v src2
vxsnan_flag 1
p
v Q(src2)
+NZF v +Infinity v S(p,src2) vp vp v S(p,src2) v –Infinity v src2
vxsnan_flag 1
v dQNaN v Q(src2)
+Infinity v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v src2
vxisi_flag 1 vxsnan_flag 1
QNaN & vp
vp vp vp vp vp vp vp
src1 is a NaN vxsnan_flag 1
QNaN & v Q(src2)
vp vp vp vp vp vp v src2
src1 not a NaN vxsnan_flag 1
Explanation:
src1 The double-precision floating-point value in doubleword element 0 of VSR[XA].
src2 For xsnmsubadp, the double-precision floating-point value in doubleword element 0 of VSR[XT].
For xsnmsubmdp, the double-precision floating-point value in doubleword element 0 of VSR[XB].
src3 For xsnmsubadp, the double-precision floating-point value in doubleword element 0 of VSR[XB].
For xsnmsubmdp, the double-precision floating-point value in doubleword element 0 of VSR[XT].
dQNaN Default quiet NaN (0x7FF8_0000_0000_0000).
NZF Nonzero finite number.
Rezd Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two
nonzero finite number source operands.
Q(x) Return a QNaN with the payload of x.
S(x,y) Return the normalized sum of floating-point value x and negated floating-point value y, having unbounded range and precision.
Note: If x = y, v is considered to be an exact-zero-difference result (Rezd).
M(x,y) Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision.
p The intermediate product having unbounded range and precision.
v The intermediate result having unbounded range and precision.
if( ~vex_flag ) then do The result is negated and placed into doubleword
VSR[32×TX+T].dword[0] ConvertSPtoSP64(result) element 0 of VSR[XT] in double-precision format.
VSR[32×TX+T].dword[1] 0xUUUU_UUUU_UUUU_UUUU
FPRF ClassSP(result) The contents of doubleword element 1 of VSR[XT] are
FR inc_flag
undefined.
FI xx_flag
end
else do
FPRF is set to the class and sign of the result as
FR 0b0 represented in single-precision format. FR is set to
FI 0b0 indicate if the result was incremented when rounded.
end FI is set to indicate the result is inexact.
Part 1: src3
Multiply –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
p dQNaN p dQNaN p Q(src3)
–Infinity p +Infinity p +Infinity p –Infinity p –Infinity p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p Q(src3)
–NZF p +Infinity p M(src1,src3) p src1 p src1 p M(src1,src3) p +Infinity p src3
vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
–Zero p +Zero p +Zero p –Zero p –Zero p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
+Zero p –Zero p –Zero p +Zero p +Zero p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
src1
p Q(src3)
+NZF p –Infinity p M(src1,src3) p src1 p src1 p M(src1,src3) p +Infinity p src3
vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
+Infinity p –Infinity p +Infinity p +Infinity p +Infinity p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p src1
QNaN p src1 p src1 p src1 p src1 p src1 p src1 p src1
vxsnan_flag 1
p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1)
SNaN
vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1
Part 2: src2
Subtract –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN v Q(src2)
–Infinity v –Infinity v –Infinity v –Infinity v –Infinity v –Infinity v src2
vxisi_flag 1 vxsnan_flag 1
v Q(src2)
–NZF v +Infinity v S(p,src2) vp vp v S(p,src2) v –Infinity v src2
vxsnan_flag 1
v Q(src2)
–Zero v +Infinity v –src2 v Rezd v –Zero v –src2 v –Infinity v src2
vxsnan_flag 1
v Q(src2)
+Zero v +Infinity v –src2 v +Zero v Rezd v –src2 v –Infinity v src2
vxsnan_flag 1
p
v Q(src2)
+NZF v +Infinity v S(p,src2) vp vp v S(p,src2) v –Infinity v src2
vxsnan_flag 1
v dQNaN v Q(src2)
+Infinity v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v src2
vxisi_flag 1 vxsnan_flag 1
QNaN & vp
vp vp vp vp vp vp vp
src1 is a NaN vxsnan_flag 1
QNaN & v Q(src2)
vp vp vp vp vp vp v src2
src1 not a NaN vxsnan_flag 1
Explanation:
src1 The double-precision floating-point value in VSR[XA].dword[0].
src2 For xsnmsubasp, the double-precision floating-point value in VSR[XT].dword[0].
For xsnmsubmsp, the double-precision floating-point value in VSR[XB].dword[0].
src3 For xsnmsubasp, the double-precision floating-point value in VSR[XB].dword[0].
For xsnmsubmsp, the double-precision floating-point value in VSR[XT].dword[0].
dQNaN Default quiet NaN (0x7FF8_0000_0000_0000).
NZF Nonzero finite number.
Rezd Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two
nonzero finite number source operands.
Q(x) Return a QNaN with the payload of x.
S(x,y) Return the normalized sum of floating-point value x and negated floating-point value y, having unbounded range and precision.
Note: If x = y, v is considered to be an exact-zero-difference result (Rezd).
M(x,y) Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision.
p The intermediate product having unbounded range and precision.
v The intermediate result having unbounded range and precision.
VSX Scalar Negative Multiply-Subtract Otherwise, if src1 is a Quiet NaN, the result is src1.
Quad-Precision [using round to Odd] X-form
Otherwise, if src2 is a Signalling NaN, the result is the
xsnmsubqp VRT,VRA,VRB (RO=0) Quiet NaN corresponding to src2.
xsnmsubqpo VRT,VRA,VRB (RO=1)
Otherwise, if src2 is a Quiet NaN, the result is src2.
63 VRT VRA VRB 484 RO
0 6 11 16 21 31 Otherwise, if src3 is a Signalling NaN, the result is the
Quiet NaN corresponding to src3.
if MSR.VSX=0 then VSX_Unavailable()
Otherwise, if src3 is a Quiet NaN, the result is src3.
reset_xflags()
Otherwise, if src1 is an Infinity value and src3 is a Zero
src1 bfp_CONVERT_FROM_BFP128(VSR[VRA+32])
src2 bfp_CONVERT_FROM_BFP128(VSR[VRT+32])
value, or if src1 is a Zero value and src3 is an Infinity
src3 bfp_CONVERT_FROM_BFP128(VSR[VRB+32]) value, the result is the default Quiet NaN[1].
v bfp_MULTIPLY_ADD(src1, src3, bfp_NEGATE(src2))
rnd bfp_NEGATE(bfp_ROUND_TO_BFP128(RO, FPSCR.RN, v)) Otherwise, if the product of src1 and src3, and src2
result bfp_CONVERT_TO_BFP128(rnd) are Infinity values having same signs, the result is the
default Quiet NaN.
if(vxsnan_flag) then SetFX(FPSCR.VXSNAN)
if(vximz_flag) then SetFX(FPSCR.VXIMZ) Otherwise, do the following.
if(vxisi_flag) then SetFX(FPSCR.VXISI) src1 is multiplied by src3, producing a product
if(ox_flag) then SetFX(FPSCR.OX) having unbounded significand precision and
if(ux_flag) then SetFX(FPSCR.UX) exponent range.
if(xx_flag) then SetFX(FPSCR.XX)
See part 1 of Table 80. "Actions for xsmsubqp[o]".
vx_flag vxsnan_flag | vximz_flag | vxisi_flag
ex_flag FPSCR.VE & vx_flag
src2 is negated and added to the product,
if ex_flag=0 then do producing a sum having unbounded range and
VSR[VRT+32] result precision.
FPSCR.FPRF fprf_CLASS_BFP128(result)
end See part 2 of Table 80. "Actions for xsmsubqp[o]".
FPSCR.FR (vx_flag=0) & inc_flag
FPSCR.FI (vx_flag=0) & xx_flag If the intermediate result is Tiny (i.e., the unbiased
exponent is less than -16382) and UE=0, the
Let src1 be the floating-point value in VSR[VRA+32] significand is shifted right N bits, where N is the
represented in quad-precision format. difference between -16382 and the unbiased
exponent of the intermediate result. The exponent
Let src2 be the floating-point value in VSR[VRT+32] of the intermediate result is set to the value
represented in quad-precision format. -16382.
Let src3 be the floating-point value in VSR[VRB+32] If RO=1, let the rounding mode be Round to Odd.
represented in quad-precision format. Otherwise, let the rounding mode be specified by
RN. Unless the result is an Infinity or a Zero, the
If either src1, src2, or src3 is a Signalling NaN, an intermediate result is rounded to quad-precision
Invalid Operation exception occurs and VXSNAN is set to using the specified rounding mode.
1.
See Table 50, “Scalar Floating-Point Intermediate
If src1 is an Infinity value and src3 is a Zero value, or if Result Handling,” on page 515.
src1 is a Zero value and src3 is an Infinity value, an
Invalid Operation exception occurs and VXIMZ is set to The result is negated and placed into VSR[VRT+32] in
1. quad-precision format.
If src2 and the product of src1 and src3 are Infinity FPRF is set to the class and sign of the result. FR is set
values having same signs, an Invalid Operation to indicate if the rounded result was incremented. FI is
exception occurs and VXISI is set to 1. set to indicate the result is inexact.
If src1 is a Signalling NaN, the result is the Quiet NaN If a trap-disabled Invalid Operation exception occurs,
corresponding to src1. FR and FI are set to 0.
1. The quad-precision default Quiet NaN is the value, 0x7FFF_8000_0000_0000_0000_0000_0000.
src1
VSR[VRT+32]
src2
VSR[VRB+32]
src3
VSR[VRT+32]
tgt
Part 1: src3
Multiply –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
p dQNaN
–Infinity p +Infinity p –Infinity
vximz_flag 1
p p
–NZF
Mul(src1,src3) Mul(src1,src3)
p +Zero p –Zero
–Zero p +Zero p –Zero
p dQNaN p dQNaN p quiet(src3)
p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
+Zero p –Zero p +Zero
src1
p –Zero p +Zero
p p
+NZF
Mul(src1,src3) Mul(src1,src3)
p dQNaN
+Infinity p –Infinity p +Infinity
vximz_flag 1
p src1
QNaN p src1
vxsnan_flag 1
p quiet(src1)
SNaN
vxsnan_flag 1
Part 2: src2
Subtract –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN
–Infinity v –Infinity
vxisi_flag 1
v dQNaN
+Infinity v +Infinity
vxisi_flag 1
QNaN & vp
src1 is a NaN vxsnan_flag 1
vp
QNaN & v quiet(src2)
v src2
src1 not a NaN vxsnan_flag 1
Explanation:
src1 The quad-precision floating-point value in VSR[VRA+32].
src2 The quad-precision floating-point value in VSR[VRT+32].
src3 The quad-precision floating-point value in VSR[VRB+32].
dQNaN Default quiet NaN (0x7FFF_8000_0000_0000_0000_0000_0000).
NZF Nonzero finite number.
Rezd Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two
nonzero finite number source operands.
quiet(x) Return a QNaN with the payload of x.
sub(x,y) Return the normalized sum of floating-point value x and negated floating-point value y, having unbounded range and precision.
Note: If x = y, v is considered to be an exact-zero-difference result (Rezd).
Mul(x,y) Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision.
p The intermediate product having unbounded range and precision.
v The intermediate result having unbounded range and precision.
60 T /// B 73 BX TX
0 6 11 16 21 30 31
XT TX || T
XB BX || B
reset_xflags()
result{0:63} RoundToDPIntegerNearAway(VSR[XB]{0:63})
if(vxsnan_flag) then SetFX(VXSNAN)
FR 0b0
FI 0b0
vex_flag VE & vxsnan_flag
Programming Note
This instruction can be used to operate on a
single-precision source operand.
Programming Note
This instruction can be used to operate on a
single-precision source operand.
VSX Scalar Round to Double-Precision Integer VSX Scalar Round to Double-Precision Integer
using round toward -Infinity XX2-form using round toward +Infinity XX2-form
xsrdpim XT,XB xsrdpip XT,XB
XT TX || T XT TX || T
XB BX || B XB BX || B
reset_xflags() reset_xflags()
result{0:63} RoundToDPIntegerFloor(VSR[XB]{0:63}) result{0:63} RoundToDPIntegerCeil(VSR[XB]{0:63})
if(vxsnan_flag) then SetFX(VXSNAN) if(vxsnan_flag) then SetFX(VXSNAN)
FR 0b0 FR 0b0
FI 0b0 FI 0b0
vex_flag VE & vxsnan_flag vex_flag VE & vxsnan_flag
Let src be the double-precision floating-point value in Let src be the double-precision floating-point value in
doubleword element 0 of VSR[XB]. doubleword element 0 of VSR[XB].
src is rounded to an integer using the rounding mode src is rounded to an integer using the rounding mode
Round toward -Infinity. Round toward +Infinity.
The result is placed into doubleword element 0 of The result is placed into doubleword element 0 of
VSR[XT] in double-precision format. VSR[XT] in double-precision format.
The contents of doubleword element 1 of VSR[XT] are The contents of doubleword element 1 of VSR[XT] are
undefined. undefined.
FPRF is set to the class and sign of the result. FR is FPRF is set to the class and sign of the result. FR is
set to 0. FI is set to 0. set to 0. FI is set to 0.
If a trap-enabled invalid operation exception occurs, If a trap-enabled invalid operation exception occurs,
VSR[XT] and FPRF are not modified, and FR and FI VSR[XT] and FPRF are not modified, and FR and FI
are set to 0. are set to 0.
VSR Data Layout for xsrdpim VSR Data Layout for xsrdpip
src = VSR[XB] src = VSR[XB]
DP unused DP unused
tgt = VSR[XT] tgt = VSR[XT]
DP undefined DP undefined
0 64 127 0 64 127
60 T /// B 89 BX TX
0 6 11 16 21 30 31
XT TX || T
XB BX || B
reset_xflags()
result{0:63} RoundToDPIntegerTrunc(VSR[XB]{0:63})
if(vxsnan_flag) then SetFX(VXSNAN)
FR 0b0
FI 0b0
vex_flag VE & vxsnan_flag
Programming Note
This instruction can be used to operate on a
single-precision source operand.
VSX Scalar Round to Quad-Precision Integer Let R and RMC specify the rounding mode as follows.
[with Inexact] Z23-form
FPSCR.RN
xsrqpi R,VRT,VRB,RMC (EX=0)
RMC
xsrqpix R,VRT,VRB,RMC (EX=1)
Rounding Mode
R
63 VRT /// R VRB RMC 5 EX 0 00 – Round to Nearest Away
0 6 11 15 16 21 23 31
0 01 – reserved
if MSR.VSX=0 then VSX_Unavailable() 0 10 – reserved
0 11 00 Round to Nearest Even
reset_xflags()
0 11 01 Round towards Zero
if R=0 then do 0 11 10 Round towards +Infinity
if RMC=0b00 then // Round to Nearest Away 0 11 11 Round towards -Infinity
rmode 0b100 1 00 – Round to Nearest Even
if RMC=0b11 then do 1 01 – Round towards Zero
if FPSCR.RN=0b00 then // Round to Nearest Even
1 10 – Round towards +Infinity
rmode 0b000
if FPSCR.RN=0b01 then // Round towards Zero 1 11 – Round towards -Infinity
rmode 0b001
if FPSCR.RN=0b10 then // Round towards +Infinity
Let src be the floating-point value in VSR[VRB+32]
rmode 0b010 represented in quad-precision format.
if FPSCR.RN=0b11 then // Round towards -Infinity
rmode 0b011 If src is a Signalling NaN, an Invalid Operation
end exception occurs, VXSNAN is set to 1, and the result is
end the Quiet NaN corresponding to the Signalling NaN.
else do // R=1
if RMC=0b00 then // Round to Nearest Even Otherwise, if src is a Quiet NaN, an Infinity, or a Zero,
rmode 0b000 then the result is src.
if RMC=0b01 then // Round towards Zero
rmode 0b001 Otherwise, src is rounded to an integer using the
if RMC=0b10 then // Round towards +Infinity rounding mode rmode.
rmode 0b010
if RMC=0b11 then // Round towards -Infinity
The result is placed into VSR[VRT+32] in quad-precision
rmode 0b011
end
format.
src bfp_CONVERT_FROM_BFP128(VSR[VRB+32]) FPRF is set to the class and sign of the result.
src
VSR[VRT+32]
tgt
VSX Scalar Round Quad-Precision to Let R and RMC specify the rounding mode as follows.
Double-Extended Precision Z23-form
FPSCR.RN
xsrqpxp R,VRT,VRB,RMC
RMC
63 VRT /// R VRB RMC 37 / Rounding Mode
R
0 6 11 15 16 21 23 31
0 00 – Round to Nearest Away
if MSR.VSX=0 then VSX_Unavailable() 0 01 – reserved
0 10 – reserved
reset_xflags() 0 11 00 Round to Nearest Even
0 11 01 Round to Zero
if R=0 then do
if RMC=0b00 then // Round to Nearest Away 0 11 10 Round to +Infinity
rmode 0b100 0 11 11 Round to -Infinity
if RMC=0b11 then do 1 00 – Round to Nearest Even
if FPSCR.RN=0b00 then // Round to Nearest Even 1 01 – Round to Zero
rmode 0b000
1 10 – Round to +Infinity
if FPSCR.RN=0b01 then // Round towards Zero
rmode 0b001 1 11 – Round to -Infinity
if FPSCR.RN=0b10 then // Round towards +Infinity
rmode 0b010 Let src be the floating-point value in VSR[VRB+32]
if FPSCR.RN=0b11 then // Round towards -Infinity represented in quad-precision format.
rmode 0b011
end If src is a Signalling NaN, an Invalid Operation
end exception occurs, VXSNAN is set to 1, and the result is
else do // R=1 the Quiet NaN corresponding to the Signalling NaN,
if RMC=0b00 then // Round to Nearest Even with the significand truncated to
rmode 0b000 double-extended-precision.
if RMC=0b01 then // Round towards Zero
rmode 0b001 Otherwise, if src is a Quiet NaN, then the result is src
if RMC=0b10 then // Round towards +Infinity
with the significand truncated to
rmode 0b010
if RMC=0b11 then // Round towards -Infinity
double-extended-precision.
rmode 0b011
end Otherwise, if src is an Infinity or a Zero, the result is
src.
src bfp_CONVERT_FROM_BFP128(VSR[VRB+32])
rnd bfp_ROUND_TO_BFP80(rmode,src) Otherwise, src is rounded to double-extended
result bfp_CONVERT_TO_BFP128(rnd) precision (i.e., 15-bit exponent range and 64-bit
significand precision) using the specified rounding
if(vxsnan_flag) then SetFX(FPSCR.VXSNAN) mode.
if(ox_flag) then SetFX(FPSCR.OX)
if(ux_flag) then SetFX(FPSCR.UX) See Table 50, “Scalar Floating-Point Intermediate
if(xx_flag) then SetFX(FPSCR.XX) Result Handling,” on page 515.
ex_flag FPSCR.VE & vxsnan_flag
The result is placed into VSR[VRT+32] in quad-precision
if ex_flag=0 then do
format.
VSR[VRT+32] result
FPSCR.FPRF fprf_CLASS_BFP128(result) FPRF is set to the class and sign of the result. FR is set
end to indicate if the rounded result was incremented. FI is
FPSCR.FR (vxsnan_flag=0) & inc_flag set to indicate the result is inexact.
FPSCR.FI (vxsnan_flag=0) & xx_flag
If a trap-disabled Invalid Operation exception occurs,
FPRF is set to an undefined value, and FR and FI are set
to 0.
src
VSR[VRT+32]
tgt
src VSR[32×BX+B].dword[0]
result RoundToSP(RN,src)
VSX Scalar Reciprocal Square Root Estimate Source Value Result Exception
Double-Precision XX2-form
–Infinity QNaN1 VXSQRT
xsrsqrtedp XT,XB –Finite QNaN1 VXSQRT
VSX Scalar Reciprocal Square Root Estimate Source Value Result Exception
Single-Precision XX2-form
–Infinity QNaN1 VXSQRT
xsrsqrtesp XT,XB
–Finite QNaN1 VXSQRT
60 T /// B 10 BXTX 2
–Zero –Infinity ZX
0 6 11 16 21 30 31
2
+Zero +Infinity ZX
reset_xflags()
+Infinity +Zero None
src VSR[32×BX+B].dword[0] 1
v ReciprocalSquareRootEstimateDP(src) SNaN QNaN VXSNAN
result RoundToSP(RN,v) QNaN QNaN None
if(vxsnan_flag) then SetFX(VXSNAN) 1. No result if VE=1.
if(vxsqrt_flag) then SetFX(VXSQRT) 2. No result if ZE=1.
if(ox_flag) then SetFX(OX)
if(ux_flag) then SetFX(UX) The contents of doubleword element 1 of VSR[XT] are
if(0bU) then SetFX(XX) undefined.
if(zx_flag) then SetFX(ZX)
FPRF is set to the class and sign of the result as
vex_flag VE & (vxsnan_flag | vxsqrt_flag) represented in single-precision format. FR is set to an
zex_flag ZE & zx_flag undefined value. FI is set to an undefined value.
if( ~vex_flag & ~zex_flag ) then do
If a trap-enabled invalid operation exception or a
VSR[32×TX+T].dword[0] ConvertSPtoSP64(result)
VSR[32×TX+T].dword[1] 0xUUUU_UUUU_UUUU_UUUU
trap-enabled zero divide exception occurs, VSR[XT]
FPRF ClassSP(result) and FPRF are not modified.
FR 0bU
FI 0bU The results of executing this instruction is permitted to
end vary between implementations, and between different
else do executions on the same implementation.
FR 0b0
FI 0b0 Special Registers Altered
end FPRF FR=0bU FI=0bU FX OX UX ZX
XX=0bU VXSNAN VXSQRT
Let XT be the value 32×TX + T.
Let XB be the value 32×BX + B. VSR Data Layout for xsrsqrtesp
src = VSR[XB]
Let src be the double-precision floating-point value in
doubleword element 0 of VSR[XB]. DP unused
tgt = VSR[XT]
A single-precision floating-point estimate of the
DP undefined
reciprocal square root of src is placed into doubleword
0 64 127
element 0 of VSR[XT] in double-precision format.
1
estimate – ---------------
src 1
------------------------------------------------- ----------------
1 16384
----------------
src
src
-Infinity -NZF -Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN v dQNaN v Q(src)
v +Zero v +Zero v SQRT(src) v +Infinity v src
vxsqrt_flag 1 vxsqrt_flag 1 vxsnan_flag 1
Explanation:
src The double-precision floating-point value in doubleword element 0 of VSR[XB].
dQNaN Default quiet NaN (0x7FF8_0000_0000_0000).
NZF Nonzero finite number.
SQRT(x) The unbounded-precision square root of the floating-point value x.
Q(x) Return a QNaN with the payload of x.
v The intermediate result having unbounded signficand precision and unbounded exponent range.
VSR[VRT+32]
tgt
src
-Infinity -NZF -Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN v dQNaN v quiet(src)
vxsqrt_flag 1 vxsqrt_flag 1 v +Zero v +Zero v sqrt(src) v +Infinity v src
vxsnan_flag 1
Explanation:
src The quad-precision floating-point value in VSR[VRB+32].
dQNaN Default quiet NaN (0x7FFF_8000_0000_0000_0000_0000_0000).
NZF Nonzero finite number.
sqrt(x) Return the normalized1 square root of floating-point value x, having unbounded significand precision and exponent range.
quiet(x) Convert x to the corresponding Quiet NaN.
v The intermediate result having unbounded significand precision and unbounded exponent range.
src
-Infinity -NZF -Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN v dQNaN v Q(src)
v +Zero v +Zero v SQRT(src) v +Infinity v src
vxsqrt_flag 1 vxsqrt_flag 1 vxsnan_flag 1
Explanation:
src The double-precision floating-point value in doubleword element 0 of VSR[XB].
dQNaN Default quiet NaN (0x7FF8_0000_0000_0000).
NZF Nonzero finite number.
SQRT(x) The unbounded-precision and exponent range square root of the floating-point value x.
Q(x) Return a QNaN with the payload of x.
v The intermediate result having unbounded signficand precision and unbounded exponent range.
VSX Scalar Subtract Double-Precision The result is placed into doubleword element 0 of
XX3-form VSR[XT].
1. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared,
and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two expo-
nents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an interme-
diate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation.
2. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the num-
ber of bits the significand was shifted.
src2
-Infinity -NZF -Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN v Q(src2)
-Infinity v –Infinity v –Infinity v –Infinity v –Infinity v –Infinity v src2
vxisi_flag 1 vxsnan_flag 1
v Q(src2)
-NZF v +Infinity v S(src1,src2) v src1 v src1 v S(src1,src2) v –Infinity v src2
vxsnan_flag 1
v Q(src2)
-Zero v +Infinity v –src2 v –Zero v Rezd v –src2 v –Infinity v src2
vxsnan_flag 1
v Q(src2)
+Zero v +Infinity v –src2 v Rezd v +Zero v –src2 v –Infinity v src2
vxsnan_flag 1
src1
v Q(src2)
+NZF v +Infinity v S(src1,src2) v src1 v src1 v S(src1,src2) v –Infinity v src2
vxsnan_flag 1
v dQNaN v Q(src2)
+Infinity v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v src2
vxisi_flag 1 vxsnan_flag 1
v src1
QNaN v src1 v src1 v src1 v src1 v src1 v src1 v src1
vxsnan_flag 1
v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1)
SNaN
vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1
Explanation:
src1 The double-precision floating-point value in doubleword element 0 of VSR[XA].
src2 The double-precision floating-point value in doubleword element 0 of VSR[XB].
dQNaN Default quiet NaN (0x7FF8_0000_0000_0000).
NZF Nonzero finite number.
Rezd Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs).
S(x,y) The floating-point value y is negated and then added to the floating-point value x.
S(x,y) Return the normalized sum of floating-point value x and negated floating-point value y, having unbounded range and precision.
Note: If x = y, v is considered to be an exact-zero-difference result (Rezd).
Q(x) Return a QNaN with the payload of x.
v The intermediate result having unbounded signficand precision and unbounded exponent range.
src2
-Infinity -NZF -Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN
-Infinity v -Infinity
vxisi_flag 1
-NZF v sub(src1,src2) v src1 v sub(src1,src2)
VSX Scalar Subtract Single-Precision The result is placed into doubleword element 0 of
XX3-form VSR[XT].
1. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared,
and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two
exponents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an
intermediate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation.
2. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number
of bits the significand was shifted.
src2
-Infinity -NZF -Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN v Q(src2)
-Infinity v –Infinity v –Infinity v –Infinity v –Infinity v –Infinity v src2
vxisi_flag 1 vxsnan_flag 1
v Q(src2)
-NZF v +Infinity v S(src1,src2) v src1 v src1 v S(src1,src2) v –Infinity v src2
vxsnan_flag 1
v Q(src2)
-Zero v +Infinity v –src2 v –Zero v Rezd v –src2 v –Infinity v src2
vxsnan_flag 1
v Q(src2)
+Zero v +Infinity v –src2 v Rezd v +Zero v –src2 v –Infinity v src2
vxsnan_flag 1
src1
v Q(src2)
+NZF v +Infinity v S(src1,src2) v src1 v src1 v S(src1,src2) v –Infinity v src2
vxsnan_flag 1
v dQNaN v Q(src2)
+Infinity v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v src2
vxisi_flag 1 vxsnan_flag 1
v src1
QNaN v src1 v src1 v src1 v src1 v src1 v src1 v src1
vxsnan_flag 1
v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1)
SNaN
vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1
Explanation:
src1 The double-precision floating-point value in doubleword element 0 of VSR[XA].
src2 The double-precision floating-point value in doubleword element 0 of VSR[XB].
dQNaN Default quiet NaN (0x7FF8_0000_0000_0000).
NZF Nonzero finite number.
Rezd Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs).
S(x,y) The floating-point value y is negated and then added to the floating-point value x.
S(x,y) Return the normalized sum of floating-point value x and negated floating-point value y, having unbounded range and precision.
Note: If x = y, v is considered to be an exact-zero-difference result (Rezd).
Q(x) Return a QNaN with the payload of x.
v The intermediate result having unbounded signficand precision and unbounded exponent range.
60 BF // /// B 106 BX /
0 6 9 11 16 21 30 31
XB BX || B
src VSR[XB]{0:63}
e_b VSR[XB]{1:11} - 1023
fe_flag IsNaN(src) | IsInf(src) | IsZero(src) |
IsNeg(src) | ( e_b <= -970 )
fg_flag IsInf(src) | IsZero(src) | IsDen(src)
fl_flag xsrsqrtedp_error() <= 2-14
CR[BF] 0b1 || fg_flag || fe_flag || 0b0
VSX Scalar Test Data Class Double-Precision Let XB be the sum 32×BX + B.
XX2-form
Let src be the double-precision floating-point value in
xststdcdp BF,XB,DCMX doubleword element 0 of VSR[XB].
60 BF DCMX B 362 BX /
0 6 9 16 21 30 31
Bit 0 of CR field BF and bit 0 of FPCC are set to the sign
bit of src.
if MSR.VSX=0 then VSX_Unavailable()
Bit 1 of CR field BF and bit 1 of FPCC are set to 0b0.
src VSR[32×BX+B].dword[0]
exponent src.bit[1:11] Bit 2 of CR field BF and bit 2 of FPCC are set to indicate
fraction src.bit[12:63] whether the data class of src, as represented in
double-precision format, matches any of the data
class.Infinity (exponent = 0x7FF) & (fraction = 0) classes specified by DCMX (Data Class Mask).
class.NaN (exponent = 0x7FF) & (fraction != 0)
class.Zero (exponent = 0x000) & (fraction = 0) DCMX bit Data Class
class.Denormal (exponent = 0x000) & (fraction != 0)
0 NaN
match (DCMX.bit[0] & class.NaN) | 1 +Infinity
(DCMX.bit[1] & class.Infinity & !sign) | 2 -Infinity
(DCMX.bit[2] & class.Infinity & sign) | 3 +Zero
(DCMX.bit[3] & class.Zero & !sign) | 4 -Zero
(DCMX.bit[4] & class.Zero & sign) |
5 +Denormal
(DCMX.bit[5] & class.Denormal & !sign) |
(DCMX.bit[6] & class.Denormal & sign) 6 -Denormal
VSX Scalar Test Data Class Quad-Precision Let src be the quad-precision floating-point value in
X-form VSR[VRB+32].
xststdcqp BF,VRB,DCMX Let the DCMX (Data Class Mask) field specify one or
more of the 7 possible data classes, where each bit
63 BF DCMX VRB 708 /
0 6 9 16 21 31 corresponds to a specific data class.
src
VSX Scalar Test Data Class Single-Precision Let XB be the sum 32×BX + B.
XX2-form
Let src be the double-precision floating-point value in
xststdcsp BF,XB,DCMX doubleword element 0 of VSR[XB].
60 BF DCMX B 298 BX /
0 6 9 16 21 30 31
Bit 0 of CR field BF and bit 0 of FPCC are set to the sign
bit of src.
if MSR.VSX=0 then VSX_Unavailable()
Bit 1 of CR field BF and bit 1 of FPCC are set to 0b0.
src VSR[32×BX+B].dword[0]
exponent src.bit[1:11] Bit 2 of CR field BF and bit 2 of FPCC are set to indicate
fraction src.bit[12:63] whether the data class of src, as represented in
single-precision format, matches any of the data
class.Infinity (exponent = 0x7FF) & (fraction = 0) classes specified by DCMX (Data Class Mask).
class.NaN (exponent = 0x7FF) & (fraction != 0)
class.Zero (exponent = 0x000) & (fraction = 0) DCMX bit Data Class
class.Denormal (exponent = 0x000) & (fraction != 0) |
0 NaN
(exponent > 0x000) & (exponent < 0x381)
1 +Infinity
match (DCMX.bit[0] & class.NaN) | 2 -Infinity
(DCMX.bit[1] & class.Infinity & !sign) | 3 +Zero
(DCMX.bit[2] & class.Infinity & sign) | 4 -Zero
(DCMX.bit[3] & class.Zero & !sign) |
5 +Denormal
(DCMX.bit[4] & class.Zero & sign) |
(DCMX.bit[5] & class.Denormal & !sign) | 6 -Denormal
(DCMX.bit[6] & class.Denormal & sign)
Bit 3 of CR field BF and bit 3 of FPCC are set to indicate if
not_SP_value (src != Convert_SPtoDP(Convert_DPtoSP(src))) src is not representable in single-precision format.
Let XB be the sum 32×BX + B. Let src be the quad-precision floating-point value in
VSR[VRB+32].
Let src be the double-precision floating-point value in
doubleword element 0 of VSR[XB]. The contents of the exponent field of src (bits 1:15) are
zero-extended and placed into doubleword 0 of
The value of the exponent field in src is placed into VSR[VRT+32].
GPR[RT] in unsigned integer format.
The contents of doubleword 1 of VSR[VRT+32] are set to
Special Registers Altered: 0.
None
Special Registers Altered:
Programming Note None
This instruction can be used to operate on a
single-precision source operand.
tgt GPR[RT]
0 63 127
src VSR[VRB+32]
Let src be the double-precision floating-point value in The significand of src is placed into VSR[VRT+32].
doubleword element 0 of VSR[XB].
If the value of the exponent field of src is equal to
The significand of src is placed into GPR[RT] in 0b000_0000_0000_0000 (i.e., Zero or Denormal value) or
unsigned integer format. If src is a normal value, the 0b111_1111_1111 (i.e., Infinity or NaN), 0b0 is placed
implicit leading bit is set to 1. into bit 15 of VSR[VRT+32]. Otherwise (i.e., Normal
value), 0b1 is placed into bit 15 of VSR[VRT+32]. The
Special Registers Altered: contents of bits 0:14 of VSR[VRT+32] are set to 0.
None
Special Registers Altered:
Programming Note None
This instruction can be used to operate on a
single-precision source operand.
tgt GPR[RT]
0 64 127
src VSR[VRB+32]
tgt VSR[VRT+32]
0 127
VSX Vector Absolute Value Double-Precision VSX Vector Absolute Value Single-Precision
XX2-form XX2-form
xvabsdp XT,XB xvabssp XT,XB
XT TX || T XT TX || T
XB BX || B XB BX || B
VSX Vector Add Double-Precision XX3-form The result is placed into doubleword element i of
VSR[XT] in double-precision format.
xvadddp XT,XA,XB
See Table 98, “Vector Floating-Point Final Result,”
60 T A B 96 AX BX TX on page 661.
0 6 11 16 21 29 30 31
If a trap-enabled exception occurs in any element of
XT TX || T the vector, no results are written to VSR[XT].
XA AX || A
XB BX || B
Special Registers Altered
ex_flag 0b0
FX OX UX XX VXSNAN VXISI
do i=0 to 127 by 64
reset_xflags() VSR Data Layout for xvadddp
src1 VSR[XA]{i:i+63}
src1 = VSR[XA]
src2 VSR[XB]{i:i+63}
v{0:inf} AddDP(src1,src2) DP DP
result{i:i+63} RoundToDP(RN,v)
if(vxsnan_flag) then SetFX(VXSNAN) src2 = VSR[XB]
if(vxisi_flag) then SetFX(VXISI)
if(ox_flag) then SetFX(OX) DP DP
if(ux_flag) then SetFX(UX) tgt = VSR[XT]
if(xx_flag) then SetFX(XX)
ex_flag ex_flag | (VE & vxsnan_flag) DP DP
ex_flag ex_flag | (VE & vxisi_flag) 0 64 127
ex_flag ex_flag | (OE & ox_flag)
ex_flag ex_flag | (UE & ux_flag)
ex_flag ex_flag | (XE & xx_flag)
end
1. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared,
and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two expo-
nents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an interme-
diate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation.
2. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the num-
ber of bits the significand was shifted.
src2
-Infinity -NZF -Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN v Q(src2)
-Infinity v -Infinity v -Infinity v -Infinity v -Infinity v -Infinity v src2
vxisi_flag 1 vxsnan_flag 1
v Q(src2)
-NZF v -Infinity v A(src1,src2) v src1 v src1 v A(src1,src2) v +Infinity v src2
vxsnan_flag 1
v Q(src2)
-Zero v -Infinity v src2 v -Zero v Rezd v src2 v +Infinity v src2
vxsnan_flag 1
v Q(src2)
+Zero v -Infinity v src2 v Rezd v +Zero v src2 v +Infinity v src2
vxsnan_flag 1
src1
v Q(src2)
+NZF v -Infinity v A(src1,src2) v src1 v src1 v A(src1,src2) v +Infinity v src2
vxsnan_flag 1
v dQNaN v Q(src2)
+Infinity v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v src2
vxisi_flag 1 vxsnan_flag 1
v src1
QNaN v src1 v src1 v src1 v src1 v src1 v src1 v src1
vxsnan_flag 1
v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1)
SNaN
vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1
Explanation:
src1 The double-precision floating-point value in doubleword element i of VSR[XA] (where i c {0,1}).
src2 The double-precision floating-point value in doubleword element i of VSR[XB] (where i c {0,1}).
dQNaN Default quiet NaN (0x7FF8_0000_0000_0000).
NZF Nonzero finite number.
Rezd Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs).
A(x,y) Return the normalized sum of floating-point value x and floating-point value y, having unbounded range and precision.
Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd).
Q(x) Return a QNaN with the payload of x.
v The intermediate result having unbounded signficand precision and unbounded exponent range.
Is q inexact? (q g v)
vxsnan_flag
vxsqrt_flag
vximz_flag
vxisi_flag
vxidi_flag
vxzdz_flag
zx_flag
OE
UE
VE
XE
Case ZE Returned Results and Status Setting
– – – – – 0 0 0 0 0 0 0 – – – – T(r)
– – – 0 – – – – – – – 1 – – – – T(r), fx(ZX)
– – – 1 – – – – – – – 1 – – – – fx(ZX), error()
0 – – – – – – – – – 1 – – – – – T(r), fx(VXSQRT)
0 – – – – – – – – 1 – – – – – – T(r), fx(VXZDZ)
0 – – – – – – – 1 – – – – – – – T(r), fx(VXIDI)
0 – – – – – – 1 – – – – – – – – T(r), fx(VXISI)
0 – – – – 0 1 – – – – – – – – – T(r), fx(VXIMZ)
Special 0 – – – – 1 0 – – – – – – – – – T(r), fx(VXSNAN)
0 – – – – 1 1 – – – – – – – – – T(r), fx(VXSNAN), fx(VXIMZ)
1 – – – – – – – – – 1 – – – – – T(r), fx(VXSQRT)
1 – – – – – – – – 1 – – – – – – fx(VXZDZ), error()
1 – – – – – – – 1 – – – – – – – fx(VXIDI), error()
1 – – – – – – 1 – – – – – – – – fx(VXISI), error()
1 – – – – 0 1 – – – – – – – – – fx(VXIMZ), error()
1 – – – – 1 0 – – – – – – – – – fx(VXSNAN), error()
1 – – – – 1 1 – – – – – – – – – fx(VXSNAN), fx(VXIMZ), error()
– – – – – – – – – – – – no – – – T(r)
– – – – 0 – – – – – – – yes no – – T(r), fx(XX)
Normal – – – – 0 – – – – – – – yes yes – – T(r), fx(XX)
– – – – 1 – – – – – – – yes no – – T(r), fx(XX), error()
– – – – 1 – – – – – – – yes yes – – T(r), fx(XX), error()
Explanation:
– The results do not depend on this condition.
fx(x) FX is set to 1 if x=0. x is set to 1.
q The value defined in Table 50, “Scalar Floating-Point Intermediate Result Handling,” on page 515, signficand rounded to the target
precision, unbounded exponent range.
r The value defined in Table 50, “Scalar Floating-Point Intermediate Result Handling,” on page 515, signficand rounded to the target
precision, bounded exponent range.
v The precise intermediate result defined in the instruction having unbounded signficand precision, unbounded exponent range.
OX Floating-Point Overflow Exception status flag, FPSCROX.
error() The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set
to any mode other than the ignore-exception mode. Update of the target VSR is suppressed for all vector elements.
T(x) The value x is placed in element i of VSR[XT] in the target precision format (where i c {0,1} for results with 64-bit elements, and i c
{0,1,3,4}) for results with 32-bit elements).
UX Floating-Point Underflow Exception status flag, FPSCRUX
VXSNAN Floating-Point Invalid Operation Exception (SNaN) status flag, FPSCRVXSNAN.
VXSQRT Floating-Point Invalid Operation Exception (Invalid Square Root) status flag, FPSCRVXSQRT.
VXIDI Floating-Point Invalid Operation Exception (Infinity ÷ Infinity) status flag, FPSCRVXIDI.
VXIMZ Floating-Point Invalid Operation Exception (Infinity × Zero) status flag, FPSCRVXIMZ.
VXISI Floating-Point Invalid Operation Exception (Infinity – Infinity) status flag, FPSCRVXISI.
VXZDZ Floating-Point Invalid Operation Exception (Zero ÷ Zero) status flag, FPSCRVXZDZ.
XX Float-Point Inexact Exception status flag, FPSCRXX. The flag is a sticky version of FPSCRFI. When FPSCRFI is set to a new
value, the new value of FPSCRXX is set to the result of ORing the old value of FPSCRXX with the new value of FPSCRFI.
ZX Floating-Point Zero Divide Exception status flag, FPSCRZX.
Is q inexact? (q g v)
vxsnan_flag
vxsqrt_flag
vximz_flag
vxisi_flag
vxidi_flag
vxzdz_flag
zx_flag
OE
UE
VE
XE
ZE
Case Returned Results and Status Setting
Explanation:
– The results do not depend on this condition.
fx(x) FX is set to 1 if x=0. x is set to 1.
q The value defined in Table 50, “Scalar Floating-Point Intermediate Result Handling,” on page 515, signficand rounded to the target
precision, unbounded exponent range.
r The value defined in Table 50, “Scalar Floating-Point Intermediate Result Handling,” on page 515, signficand rounded to the target
precision, bounded exponent range.
v The precise intermediate result defined in the instruction having unbounded signficand precision, unbounded exponent range.
OX Floating-Point Overflow Exception status flag, FPSCROX.
error() The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set
to any mode other than the ignore-exception mode. Update of the target VSR is suppressed for all vector elements.
T(x) The value x is placed in element i of VSR[XT] in the target precision format (where i c {0,1} for results with 64-bit elements, and i c
{0,1,3,4}) for results with 32-bit elements).
UX Floating-Point Underflow Exception status flag, FPSCRUX
VXSNAN Floating-Point Invalid Operation Exception (SNaN) status flag, FPSCRVXSNAN.
VXSQRT Floating-Point Invalid Operation Exception (Invalid Square Root) status flag, FPSCRVXSQRT.
VXIDI Floating-Point Invalid Operation Exception (Infinity ÷ Infinity) status flag, FPSCRVXIDI.
VXIMZ Floating-Point Invalid Operation Exception (Infinity × Zero) status flag, FPSCRVXIMZ.
VXISI Floating-Point Invalid Operation Exception (Infinity – Infinity) status flag, FPSCRVXISI.
VXZDZ Floating-Point Invalid Operation Exception (Zero ÷ Zero) status flag, FPSCRVXZDZ.
XX Float-Point Inexact Exception status flag, FPSCRXX. The flag is a sticky version of FPSCRFI. When FPSCRFI is set to a new
value, the new value of FPSCRXX is set to the result of ORing the old value of FPSCRXX with the new value of FPSCRFI.
ZX Floating-Point Zero Divide Exception status flag, FPSCRZX.
VSX Vector Add Single-Precision XX3-form The result is placed into word element i of
VSR[XT] in single-precision format.
xvaddsp XT,XA,XB
See Table 98, “Vector Floating-Point Final Result,”
60 T A B 64 AX BX TX on page 661.
0 6 11 16 21 29 30 31
If a trap-enabled exception occurs in any element of
XT TX || T the vector, no results are written to VSR[XT].
XA AX || A
XB BX || B
Special Registers Altered
ex_flag 0b0
FX OX UX XX VXSNAN VXISI
do i=0 to 127 by 32
reset_xflags() VSR Data Layout for xvaddsp
src1 VSR[XA]{i:i+31}
src1 = VSR[XA]
src2 VSR[XB]{i:i+31}
v{0:inf} AddSP(src1,src2) SP SP SP SP
result{i:i+31} RoundToSP(RN,v)
if(vxsnan_flag) then SetFX(VXSNAN) src2 = VSR[XB]
if(vxisi_flag) then SetFX(VXISI)
if(ox_flag) then SetFX(OX) SP SP SP SP
if(ux_flag) then SetFX(UX) tgt = VSR[XT]
if(xx_flag) then SetFX(XX)
ex_flag ex_flag | (VE & vxsnan_flag) SP SP SP SP
ex_flag ex_flag | (VE & vxisi_flag) 0 32 64 96 127
ex_flag ex_flag | (OE & ox_flag)
ex_flag ex_flag | (UE & ux_flag)
ex_flag ex_flag | (XE & xx_flag)
end
1. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared,
and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two expo-
nents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an interme-
diate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation.
2. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the num-
ber of bits the significand was shifted.
src2
-Infinity -NZF -Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN v Q(src2)
-Infinity v -Infinity v -Infinity v -Infinity v -Infinity v -Infinity v src2
vxisi_flag 1 vxsnan_flag 1
v Q(src2)
-NZF v -Infinity v A(src1,src2) v src1 v src1 v A(src1,src2) v +Infinity v src2
vxsnan_flag 1
v Q(src2)
-Zero v -Infinity v src2 v -Zero v Rezd v src2 v +Infinity v src2
vxsnan_flag 1
v Q(src2)
+Zero v -Infinity v src2 v Rezd v +Zero v src2 v +Infinity v src2
vxsnan_flag 1
src1
v Q(src2)
+NZF v -Infinity v A(src1,src2) v src1 v src1 v A(src1,src2) v +Infinity v src2
vxsnan_flag 1
v dQNaN v Q(src2)
+Infinity v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v src2
vxisi_flag 1 vxsnan_flag 1
v src1
QNaN v src1 v src1 v src1 v src1 v src1 v src1 v src1
vxsnan_flag 1
v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1)
SNaN
vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1
Explanation:
src1 The single-precision floating-point value in word element i of VSR[XA] (where i c {0,1,2,3}).
src2 The single-precision floating-point value in word element i of VSR[XB] (where i c {0,1,2,3}).
dQNaN Default quiet NaN (0x7FC0_0000).
NZF Nonzero finite number.
Rezd Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs).
A(x,y) Return the normalized sum of floating-point value x and floating-point value y, having unbounded range and precision.
Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd).
Q(x) Return a QNaN with the payload of x.
v The intermediate result having unbounded signficand precision and unbounded exponent range.
VSX Vector Compare Equal To Two zero inputs of same or different signs return
Double-Precision XX3-form true for that element.
xvcmpeqdp XT,XA,XB (Rc=0) Two infinity inputs of same signs return true for
xvcmpeqdp. XT,XA,XB (Rc=1) that element.
VSX Vector Compare Equal To Two zero inputs of same or different signs return
Single-Precision XX3-form true for that element.
xvcmpeqsp XT,XA,XB (Rc=0) Two infinity inputs of same signs return true for
xvcmpeqsp. XT,XA,XB (Rc=1) that element.
VSX Vector Compare Greater Than or Equal The contents of doubleword element i of VSR[XT]
To Double-Precision XX3-form are set to all 1s if src1 is greater than or equal to
the double-precision floating-point operand in
xvcmpgedp XT,XA,XB (Rc=0) doubleword element i of VSR[XB]src2, and is set
xvcmpgedp. XT,XA,XB (Rc=1) to all 0s otherwise.
VSX Vector Compare Greater Than or Equal The contents of word element i of VSR[XT] are set
To Single-Precision XX3-form to all 1s if src1 is greater than or equal to src2,
and is set to all 0s otherwise.
xvcmpgesp XT,XA,XB (Rc=0)
xvcmpgesp. XT,XA,XB (Rc=1) A NaN input causes the comparison to return false
for that element.
60 T A B Rc 83 AX BX TX
0 6 11 16 21 22 29 30 31 Two zero inputs of same or different signs return
true for that element.
XT TX || T
XA AX || A Two infinity inputs of same signs return true for
XB BX || B
that element.
ex_flag 0b0
all_false 0b1
all_true 0b1
If Rc=1, CR Field 6 is set as follows.
– Bit 0 of CR[6] is set to indicate all vector elements
do i=0 to 127 by 32 compared true.
reset_xflags() – Bit 1 of CR[6] is set to 0.
src1 VSR[XA]{i:i+31} – Bit 2 of CR[6] is set to indicate all vector elements
src2 VSR[XB]{i:i+31} compared false.
– Bit 3 of CR[6] is set to 0.
if( IsSNaN(src1) | IsSNaN(src2) ) then do
vxsnan_flag 0b1 If a trap-enabled exception occurs in any element of
if(VE=0) then vxvc_flag 0b1 the vector, no results are written to VSR[XT] and the
end
contents of CR[6] are undefined if Rc is equal to 1.
else vxvc_flag IsQNaN(src1) | IsQNaN(src2)
VSX Vector Compare Greater Than The contents of doubleword element i of VSR[XT]
Double-Precision XX3-form are set to all 1s if src1 is greater than src2, and
is set to all 0s otherwise.
xvcmpgtdp XT,XA,XB (Rc=0)
xvcmpgtdp. XT,XA,XB (Rc=1) A NaN input causes the comparison to return false
for that element.
60 T A B Rc 107 AX BX TX
0 6 11 16 21 22 29 30 31 Two zero inputs of same or different signs return
false for that element.
XT TX || T
XA AX || A If Rc=1, CR Field 6 is set as follows.
XB BX || B
– Bit 0 of CR[6] is set to indicate all vector elements
ex_flag 0b0
all_false 0b1
compared true.
all_true 0b1 – Bit 1 of CR[6] is set to 0.
– Bit 2 of CR[6] is set to indicate all vector elements
do i=0 to 127 by 64 compared false.
reset_xflags() – Bit 3 of CR[6] is set to 0.
src1 VSR[XA]{i:i+63}
src2 VSR[XB]{i:i+63} If a trap-enabled exception occurs in any element of
the vector, no results are written to VSR[XT] and the
if( IsSNaN(src1) | IsSNaN(src2) ) then do contents of CR[6] are undefined if Rc is equal to 1.
vxsnan_flag 0b1
if(VE=0) then vxvc_flag 0b1
Special Registers Altered
end
else vxvc_flag IsQNaN(src1) | IsQNaN(src2)
CR[6] . . . . . . . . . . . . . . . . . . . . . . . . . . (if Rc=1)
FX VXSNAN VXVC
if( CompareGTDP(src1,src2) ) then do
result{i:i+63} 0xFFFF_FFFF_FFFF_FFFF VSR Data Layout for xvcmpgtdp[.]
all_false 0b0
end src1 = VSR[XA]
else do DP DP
result{i:i+63} 0x0000_0000_0000_0000
all_true 0b0 src2 = VSR[XB]
end
if(vxsnan_flag) then SetFX(VXSNAN) DP DP
if(vxvc_flag) then SetFX(VXVC)
tgt = VSR[XT]
ex_flag ex_flag | (VE & vxsnan_flag)
ex_flag ex_flag | (VE & vxvc_flag) MD MD
end 0 64 127
if(Rc=1) then do
if( !vex_flag ) then
CR[6] all_true || 0b0 || all_false || 0b0
else
CR[6] 0bUUUU
end
VSX Vector Compare Greater Than The contents of word element i of VSR[XT] are set
Single-Precision XX3-form to all 1s if src1 is greater than src2, and is set to
all 0s otherwise.
xvcmpgtsp XT,XA,XB (Rc=0)
xvcmpgtsp. XT,XA,XB (Rc=1) A NaN input causes the comparison to return false
for that element.
60 T A B Rc 75 AX BX TX
0 6 11 16 21 22 29 30 31 Two zero inputs of same or different signs return
false for that element.
XT TX || T
XA AX || A If Rc=1, CR Field 6 is set as follows.
XB BX || B
– Bit 0 of CR[6] is set to indicate all vector elements
ex_flag 0b0
all_false 0b1
compared true.
all_true 0b1 – Bit 1 of CR[6] is set to 0.
– Bit 2 of CR[6] is set to indicate all vector elements
do i=0 to 127 by 32 compared false.
reset_xflags() – Bit 3 of CR[6] is set to 0.
src1 VSR[XA]{i:i+31}
src2 VSR[XB]{i:i+31} If a trap-enabled exception occurs in any element of
the vector, no results are written to VSR[XT] and the
if( IsSNaN(src1) | IsSNaN(src2) ) then do contents of CR[6] are undefined if Rc is equal to 1.
vxsnan_flag 0b1
if(VE=0) then vxvc_flag 0b1
Special Registers Altered
end
else vxvc_flag IsQNaN(src1) | IsQNaN(src2)
CR[6] . . . . . . . . . . . . . . . . . . . . . . . . . . (if Rc=1)
FX VXSNAN VXVC
if( CompareGTSP(src1,src2) ) then do
result{i:i+31} 0xFFFF_FFFF VSR Data Layout for xvcmpgtsp[.]
all_false 0b0
end src1 = VSR[XA]
else do SP SP SP SP
result{i:i+31} 0x0000_0000
all_true 0b0 src2 = VSR[XB]
end
if(vxsnan_flag) then SetFX(VXSNAN) SP SP SP SP
if(vxvc_flag) then SetFX(VXVC)
tgt = VSR[XT]
ex_flag ex_flag | (VE & vxsnan_flag)
ex_flag ex_flag | (VE & vxvc_flag) MW MW MW MW
end 0 32 64 96 127
if(Rc=1) then do
if( !vex_flag ) then
CR[6] all_true || 0b0 || all_false || 0b0
else
CR[6] 0bUUUU
end
VSX Vector Copy Sign Double-Precision VSX Vector Copy Sign Single-Precision
XX3-form XX3-form
xvcpsgndp XT,XA,XB xvcpsgnsp XT,XA,XB
XT TX || T XT TX || T
XA AX || A XA AX || A
XB BX || B XB BX || B
For each vector element i from 0 to 1, do the following. For each vector element i from 0 to 3, do the following.
The contents of bit 0 of doubleword element i of The contents of bit 0 of word element i of VSR[XA]
VSR[XA] are concatenated with the contents of are concatenated with the contents of bits 1:31 of
bits 1:63 of doubleword element i of VSR[XB] and word element i of VSR[XB] and placed into word
placed into doubleword element i of VSR[XT]. element i of VSR[XT].
VSR Data Layout for xvcpsgndp VSR Data Layout for xvcpsgnsp
src1 = VSR[XA] src1 = VSR[XA]
DP DP SP SP SP SP
src2 = VSR[XB] src2 = VSR[XB]
DP DP SP SP SP SP
tgt = VSR[XT] tgt = VSR[XT]
DP DP SP SP SP SP
0 64 127 0 32 64 96 127
60 T /// B 393 BX TX
0 6 11 16 21 30 31
XT TX || T
XB BX || B
ex_flag 0b0
do i=0 to 127 by 64
reset_xflags()
src VSR[XB]{i:i+63}
result{i:i+31} RoundToSP(RN,src)
result{i+32:i+63} 0xUUUU_UUUU
if(vxsnan_flag) then SetFX(VXSNAN)
if(ox_flag) then SetFX(OX)
if(ux_flag) then SetFX(UX)
if(xx_flag) then SetFX(XX)
ex_flag ex_flag | (VE & vxsnan_flag)
ex_flag ex_flag | (OE & ox_flag)
ex_flag ex_flag | (UE & ux_flag)
ex_flag ex_flag | (XE & xx_flag)
end
do i=0 to 127 by 64
Programming Note
reset_xflags()
result{i:i+63} ConvertDPtoSD(VSR[XB]{i:i+63}) xvcvdpsxds rounds using Round towards Zero
if(vxsnan_flag) then SetFX(VXSNAN) rounding mode. For other rounding modes, soft-
if(vxcvi_flag) then SetFX(VXCVI) ware must use a Round to Double-Precision Inte-
if(xx_flag) then SetFX(XX) ger instruction that corresponds to the desired
ex_flag ex_flag | (VE & vxsnan_flag) rounding mode, including xvrdpic which uses the
ex_flag ex_flag | (VE & vxcvi_flag) rounding mode specified by the RN.
ex_flag ex_flag | (XE & xx_flag)
end
VSX Vector Convert with round to zero If a trap-enabled exception occurs in any element of
Double-Precision to Signed Word format the vector, no results are written to VSR[XT].
XX2-form
Special Registers Altered
xvcvdpsxws XT,XB FX XX VXSNAN VXCVI
60 T /// B 216 BX TX
0 6 11 16 21 30 31 VSR Data Layout for xvcvdpsxws
src = VSR[XB]
XT TX || T
XB BX || B DP DP
ex_flag 0b0
tgt = VSR[XT]
do i=0 to 127 by 64 SW undefined SW undefined
reset_xflags()
0 32 64 96 127
result{i:i+31} ConvertDPtoSW(VSR[XB]{i:i+63})
result{i+32:i+63} 0xUUUU_UUUU
if(vxsnan_flag) then SetFX(VXSNAN) Programming Note
if(vxcvi_flag) then SetFX(VXCVI) xvcvdpsxws rounds using Round towards Zero
if(xx_flag) then SetFX(XX) rounding mode. For other rounding modes,
ex_flag ex_flag | (VE & vxsnan_flag)
software must use a Round to Double-Precision
ex_flag ex_flag | (VE & vxcvi_flag)
Integer instruction that corresponds to the desired
ex_flag ex_flag | (XE & xx_flag)
end rounding mode, including xvrdpic which uses the
rounding mode specified by RN.
if( ex_flag = 0 ) then VSR[XT] result
do i=0 to 127 by 64
Programming Note
reset_xflags()
result{i:i+63} ConvertDPtoUD(VSR[XB]{i:i+63}) xvcvdpuxds rounds using Round towards Zero
if(vxsnan_flag) then SetFX(VXSNAN) rounding mode. For other rounding modes,
if(vxcvi_flag) then SetFX(VXCVI) software must use a Round to Double-Precision
if(xx_flag) then SetFX(XX) Integer instruction that corresponds to the desired
ex_flag ex_flag | (VE & vxsnan_flag) rounding mode, including xvrdpic which uses the
ex_flag ex_flag | (VE & vxcvi_flag)
rounding mode specified by the RN.
ex_flag ex_flag | (XE & xx_flag)
end
VSX Vector Convert with round to zero If a trap-enabled exception occurs in any element of
Double-Precision to Unsigned Word format the vector, no results are written to VSR[XT].
XX2-form
Special Registers Altered
xvcvdpuxws XT,XB FX XX VXSNAN VXCVI
60 T /// B 200 BX TX
0 6 11 16 21 30 31 VSR Data Layout for xvcvdpuxws
src = VSR[XB]
XT TX || T
XB BX || B DP DP
ex_flag 0b0
tgt = VSR[XT]
do i=0 to 127 by 64 UW undefined UW undefined
reset_xflags()
0 32 64 96 127
result{i:i+31} ConvertDPtoUW(VSR[XB]{i:i+63})
result{i+32:i+63} 0xUUUU_UUUU
if(vxsnan_flag) then SetFX(VXSNAN) Programming Note
if(vxcvi_flag) then SetFX(VXCVI) xvcvdpuxws rounds using Round towards Zero
if(xx_flag) then SetFX(XX) rounding mode. For other rounding modes,
ex_flag ex_flag | (VE & vxsnan_flag)
software must use a Round to Double-Precision
ex_flag ex_flag | (VE & vxcvi_flag)
Integer instruction that corresponds to the desired
ex_flag ex_flag | (XE & xx_flag)
end rounding mode, including xvrdpic which uses the
rounding mode specified by RN.
if( ex_flag = 0 ) then VSR[XT] result
VSX Vector Convert Half-Precision to If src is an SNaN, the result is the single-precision
Single-Precision format XX2-form representation of that SNaN converted to a QNaN.
For each integer value i from 0 to 3, do the following. Special Registers Altered:
Let src be the half-precision floating-point value in FX VXSNAN
the rightmost halfword of word element i of
VSR[XB].
60 T /// B 457 BX TX
0 6 11 16 21 30 31
XT TX || T
XB BX || B
ex_flag 0b0
do i=0 to 127 by 64
reset_xflags()
result{i:i+63} ConvertSPtoDP(VSR[XB]{i:i+31})
if(vxsnan_flag) then SetFX(VXSNAN)
ex_flag ex_flag | (VE & vxsnan_flag)
end
VSX Vector Convert with round If src is an SNaN, the result is the half-precision
Single-Precision to Half-Precision format representation of that SNaN converted to a QNaN.
XX2-form
Otherwise, if src is a QNaN, the result is the
xvcvsphp XT,XB
half-precision representation of that QNaN.
60 T 25 B 475 BX TX
0 6 11 16 21 30 31 Otherwise, if src is an Infinity, the result is the
half-precision representation of Infinity with the
if MSR.VSX=0 then VSX_Unavailable() same sign as src.
reset_flags()
Otherwise, if src is a Zero, the result is the
do i = 0 to 3
half-precision representation of Zero with the
src bfp_CONVERT_FROM_BFP32(VSR[BX×32+B].word[i]) same sign as src.
rnd bfp_ROUND_TO_BFP16(FPSCR.RN,rnd)
result.hword[2×i] 0x0000 Otherwise, the result is the half-precision
result.hword[2×i+1] bfp_CONVERT_TO_BFP16(rnd) representation of src rounded to half-precision
using the rounding mode specified by RN.
if(vxsnan_flag) then SetFX(FPSCR.VXSNAN)
if(ox_flag) then SetFX(FPSCR.OX) The result is zero-extended and placed into word
if(ux_flag) then SetFX(FPSCR.UX) element i of VSR[XT].
if(xx_flag) then SetFX(FPSCR.XX)
If a trap-enabled exception occurs, VSR[XT] is not
ex_flag ex_flag | (FPSCR.VE & vxsnan_flag)
| (FPSCR.OE & ox_flag)
modified.
| (FPSCR.UE & ux_flag)
| (FPSCR.XE & xx_flag) Special Registers Altered:
end FX VXSNAN OX UX XX
do i=0 to 127 by 64
Programming Note
reset_xflags()
result{i:i+63} ConvertSPtoSD(VSR[XB]{i:i+31}) xvcvspsxds rounds using Round towards Zero
if(vxsnan_flag) then SetFX(VXSNAN) rounding mode. For other rounding modes,
if(vxcvi_flag) then SetFX(VXCVI) software must use a Round to Single-Precision
if(xx_flag) then SetFX(XX) Integer instruction that corresponds to the desired
ex_flag ex_flag | (VE & vxsnan_flag) rounding mode, including xvrspic which uses the
ex_flag ex_flag | (VE & vxcvi_flag)
rounding mode specified by RN.
ex_flag ex_flag | (XE & xx_flag)
end
do i=0 to 127 by 64
Programming Note
reset_xflags()
result{i:i+63} ConvertSPtoUD(VSR[XB]{i:i+31}) xvcvspuxds rounds using Round towards Zero
if(vxsnan_flag) then SetFX(VXSNAN) rounding mode. For other rounding modes,
if(vxcvi_flag) then SetFX(VXCVI) software must use a Round to Single-Precision
if(xx_flag) then SetFX(XX) Integer instruction that corresponds to the desired
ex_flag ex_flag | (VE & vxsnan_flag) rounding mode, including xvrspic which uses the
ex_flag ex_flag | (VE & vxcvi_flag)
rounding mode specified by RN.
ex_flag ex_flag | (XE & xx_flag)
end
VSX Vector Convert with round Signed VSX Vector Convert with round Signed
Doubleword to Double-Precision format Doubleword to Single-Precision format
XX2-form XX2-form
xvcvsxddp XT,XB xvcvsxdsp XT,XB
XT TX || T XT TX || T
XB BX || B XB BX || B
ex_flag 0b0 ex_flag 0b0
VSX Vector Convert Signed Word to VSX Vector Convert with round Signed Word
Double-Precision format XX2-form to Single-Precision format XX2-form
xvcvsxwdp XT,XB xvcvsxwsp XT,XB
do i = 0 to 1 ex_flag 0b0
src bfp_CONVERT_FROM_SI32(VSR[32×BX+B].dword[i].word[0])
VSR[32×TX+T].dword[i] bfp64_CONVERT_FROM_BFP(src) do i = 0 to 3
end reset_xflags()
v{0:inf} ConvertSWtoFP(VSR[32×BX+B].word[i])
Let XT be the value 32×TX + T. result.word[i] RoundToSP(RN,v)
Let XB be the value 32×BX + B. if(xx_flag) then SetFX(XX)
ex_flag ex_flag | (XE & xx_flag)
For each vector element i from 0 to 1, do the following. end
Let src be the signed integer value in bits 0:31 of
if(ex_flag=0) then VSR[32×TX+T] result
doubleword element i of VSR[XB].
Let XT be the value 32×TX + T.
src is placed into doubleword element i of VSR[XT]
Let XB be the value 32×BX + B.
in double-precision format.
For each vector element i from 0 to 3, do the following.
Special Registers Altered
Let src be the signed integer in word element i of
None
VSR[XB].
VSR Data Layout for xvcvsxwdp src is converted to an unbounded-precision
src = VSR[XB] floating-point value and rounded to
single-precision using the rounding mode
SW unused SW unused
specified by RN.
tgt = VSR[XT]
The result is placed into word element i of
DP DP VSR[XT] in single-precision format.
0 32 64 96 127
If a trap-enabled exception occurs in any element of
the vector, no results are written to VSR[XT].
VSX Vector Convert with round Unsigned VSX Vector Convert with round Unsigned
Doubleword to Double-Precision format Doubleword to Single-Precision format
XX2-form XX2-form
xvcvuxddp XT,XB xvcvuxdsp XT,XB
XT TX || T XT TX || T
XB BX || B XB BX || B
ex_flag 0b0 ex_flag 0b0
VSX Vector Convert Unsigned Word to VSX Vector Convert with round Unsigned
Double-Precision format XX2-form Word to Single-Precision format XX2-form
xvcvuxwdp XT,XB xvcvuxwsp XT,XB
do i = 0 to 1 XT TX || T
src bfp_CONVERT_FROM_UI32(VSR[32×BX+B].dword[i].word[0]) XB BX || B
VSR[32×TX+T].dword[i] bfp64_CONVERT_FROM_BFP(src) ex_flag 0b0
end
do i=0 to 127 by 32
Let XT be the value 32×TX + T. reset_xflags()
Let XB be the value 32×BX + B. v{0:inf} ConvertUWtoFP(VSR[XB]{i:i+31})
result{i:i+31} RoundToSP(RN,v)
For each vector element i from 0 to 1, do the following. if(xx_flag) then SetFX(XX)
Let src be the unsigned integer value in bits 0:31 ex_flag ex_flag | (XE & xx_flag)
end
of doubleword element i of VSR[XB].
if( ex_flag = 0 ) then VSR[XT] result
src is placed into doubleword element i of VSR[XT]
in double-precision format. Let XT be the value 32×TX + T.
Let XB be the value 32×BX + B.
Special Registers Altered
None For each vector element i from 0 to 3, do the following.
Let src be the unsigned integer value in word
VSR Data Layout for xvcvuxwdp element i of VSR[XB].
src = VSR[XB]
src is converted to an unbounded-precision
UW unused UW unused floating-point value and rounded to
tgt = VSR[XT] single-precision using the rounding mode
specified by RN.
DP DP
0 32 64 96 127 The result is placed into word element i of
VSR[XT] in single-precision format.
VSX Vector Divide Double-Precision XX3-form The result is placed into doubleword element i of
VSR[XT] in double-precision format.
xvdivdp XT,XA,XB
See Table 98, “Vector Floating-Point Final Result,”
60 T A B 120 AX BX TX on page 661.
0 6 11 16 21 29 30 31
If a trap-enabled exception occurs in any element of
XT TX || T the vector, no results are written to VSR[XT].
XA AX || A
XB BX || B
Special Registers Altered
ex_flag 0b0
FX OX UX ZX XX
do i=0 to 127 by 64 VXSNAN VXIDI VXZDZ
reset_xflags()
src1 VSR[XA]{i:i+63} VSR Data Layout for xvdivdp
src2 VSR[XB]{i:i+63}
v{0:inf} DivideDP(src1,src2) src1 = VSR[XA]
result{i:i+63} RoundToDP(RN,v) DP DP
if(vxsnan_flag) then SetFX(VXSNAN)
if(vxidi_flag) then SetFX(VXIDI) src2 = VSR[XB]
if(vxisi_flag) then SetFX(VXZDZ)
if(ox_flag) then SetFX(OX) DP DP
if(ux_flag) then SetFX(UX)
tgt = VSR[XT]
if(xx_flag) then SetFX(XX)
if(zx_flag) then SetFX(ZX) DP DP
ex_flag ex_flag | (VE & vxsnan_flag) 0 64 127
ex_flag ex_flag | (VE & vxidi_flag)
ex_flag ex_flag | (VE & vxzdz_flag)
ex_flag ex_flag | (OE & ox_flag)
ex_flag ex_flag | (UE & ux_flag)
ex_flag ex_flag | (ZE & zx_flag)
ex_flag ex_flag | (XE & xx_flag)
end
src2
-Infinity -NZF -Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN v dQNaN v Q(src2)
-Infinity v –Infinity v –Infinity v –Infinity v –Infinity v src2
vxidi_flag 1 vxidi_flag 1 vxsnan_flag 1
v +Infinity v –Infinity v Q(src2)
-NZF v +Zero v D(src1,src2) v D(src1,src2) v –Zero v src2
zx_flag 1 zx_flag 1 vxsnan_flag 1
v dQNaN v dQNaN v Q(src2)
-Zero v +Zero v +Zero v –Zero v –Zero v src2
vxzdz_flag 1 vxzdz_flag 1 vxsnan_flag 1
v dQNaN v dQNaN v Q(src2)
+Zero v –Zero v –Zero v +Zero v +Zero v src2
vxzdz_flag 1 vxzdz_flag 1 vxsnan_flag 1
src1
Explanation:
src1 The double-precision floating-point value in doubleword element i of VSR[XA] (where i c {0,1}).
src2 The double-precision floating-point value in doubleword element i of VSR[XB] (where i c {0,1}).
dQNaN Default quiet NaN (0x7FF8_0000_0000_0000).
NZF Nonzero finite number.
Rezd Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs).
D(x,y) Return the normalized quotient of floating-point value x divided by floating-point value y, having unbounded range and precision.
Q(x) Return a QNaN with the payload of x.
v The intermediate result having unbounded signficand precision and unbounded exponent range.
VSX Vector Divide Single-Precision XX3-form The result is placed into word element i of
VSR[XT] in single-precision format.
xvdivsp XT,XA,XB
See Table 98, “Vector Floating-Point Final Result,”
60 T A B 88 AX BX TX on page 661.
0 6 11 16 21 29 30 31
If a trap-enabled exception occurs in any element of
XT TX || T the vector, no results are written to VSR[XT].
XA AX || A
XB BX || B
Special Registers Altered
ex_flag 0b0
FX OX UX ZX XX
do i=0 to 127 by 32 VXSNAN VXIDI VXZDZ
reset_xflags()
src1 VSR[XA]{i:i+31} VSR Data Layout for xvdivsp
src2 VSR[XB]{i:i+31}
v{0:inf} DivideSP(src1,src2) src1 = VSR[XA]
result{i:i+31} RoundToSP(RN,v) SP SP SP SP
if(vxsnan_flag) then SetFX(VXSNAN)
if(vxidi_flag) then SetFX(VXIDI) src2 = VSR[XB]
if(vxisi_flag) then SetFX(VXZDZ)
if(ox_flag) then SetFX(OX) SP SP SP SP
if(ux_flag) then SetFX(UX)
tgt = VSR[XT]
if(xx_flag) then SetFX(XX)
if(zx_flag) then SetFX(ZX) SP SP SP SP
ex_flag ex_flag | (VE & vxsnan_flag) 0 32 64 96 127
ex_flag ex_flag | (VE & vxidi_flag)
ex_flag ex_flag | (VE & vxzdz_flag)
ex_flag ex_flag | (OE & ox_flag)
ex_flag ex_flag | (UE & ux_flag)
ex_flag ex_flag | (ZE & zx_flag)
ex_flag ex_flag | (XE & xx_flag)
end
src2
-Infinity -NZF -Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN v dQNaN v Q(src2)
-Infinity v –Infinity v –Infinity v –Infinity v –Infinity v src2
vxidi_flag 1 vxidi_flag 1 vxsnan_flag 1
v +Infinity v –Infinity v Q(src2)
-NZF v +Zero v D(src1,src2) v D(src1,src2) v –Zero v src2
zx_flag 1 zx_flag 1 vxsnan_flag 1
v dQNaN v dQNaN v Q(src2)
-Zero v +Zero v +Zero v –Zero v –Zero v src2
vxzdz_flag 1 vxzdz_flag 1 vxsnan_flag 1
v dQNaN v dQNaN v Q(src2)
+Zero v –Zero v –Zero v +Zero v +Zero v src2
vxzdz_flag 1 vxzdz_flag 1 vxsnan_flag 1
src1
Explanation:
src1 The single-precision floating-point value in word element i of VSR[XA] (where i c {0,1,2,3}).
src2 The single-precision floating-point value in word element i of VSR[XB] (where i c {0,1,2,3}).
dQNaN Default quiet NaN (0x7FC0_0000).
NZF Nonzero finite number.
Rezd Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs).
D(x,y) Return the normalized quotient of floating-point value x divided by floating-point value y, having unbounded range and precision.
Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd).
Q(x) Return a QNaN with the payload of x.
v The intermediate result having unbounded signficand precision and unbounded exponent range.
VSX Vector Insert Exponent Double-Precision VSX Vector Insert Exponent Single-Precision
XX3-form XX3-form
xviexpdp XT,XA,XB xviexpsp XT,XA,XB
do i = 0 to 1 do i = 0 to 3
src1 VSR[32×AX+A].dword[i] src1 VSR[32×AX+A].word[i]
src2 VSR[32×BX+B].dword[i] src2 VSR[32×BX+B].word[i]
For each integer value i from 0 to 1, do the following. For each integer value i from 0 to 3, do the following.
Let src1 be the unsigned integer value in Let src1 be the unsigned integer value in word
doubleword element i of VSR[XA]. element i of VSR[XA].
Let src2 be the unsigned integer value in Let src2 be the unsigned integer value in word
doubleword element i of VSR[XB]. element i of VSR[XB].
The contents of bits 0 of src1 are placed into bit 0 The contents of bits 0 of src1 are placed into bit 0
of doubleword element i of VSR[XT]. of word element i of VSR[XT].
The contents of bits 53:63 of src2 are placed into The contents of bits 24:31 of src2 are placed into
bits 1:11 of doubleword element i of VSR[XT]. bits 1:8 of word element i of VSR[XT].
The contents of bits 12:63 of src1 are placed into The contents of bits 9:31 of src1 are placed into
bits 12:63 of doubleword element i of VSR[XT]. bits 9:31 of word element i of VSR[XT].
Part 1: src3
Multiply –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
p dQNaN p dQNaN p Q(src3)
–Infinity p +Infinity p +Infinity p –Infinity p –Infinity p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p Q(src3)
–NZF p +Infinity p M(src1,src3) p +Zero p –Zero p M(src1,src3) p +Infinity p src3
vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
–Zero p +Zero p +Zero p –Zero p –Zero p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
+Zero p –Zero p –Zero p +Zero p +Zero p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
src1
p Q(src3)
+NZF p –Infinity p M(src1,src3) p –Zero p +Zero p M(src1,src3) p +Infinity p src3
vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
+Infinity p –Infinity p +Infinity p +Infinity p +Infinity p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p src1
QNaN p src1 p src1 p src1 p src1 p src1 p src1 p src1
vxsnan_flag 1
p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1)
SNaN
vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1
Part 2: src2
Add –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN v Q(src2)
–Infinity v –Infinity v –Infinity v –Infinity v –Infinity v –Infinity v src2
vxisi_flag 1 vxsnan_flag 1
v Q(src2)
–NZF v –Infinity v A(p,src2) vp vp v A(p,src2) v +Infinity v src2
vxsnan_flag 1
v Q(src2)
–Zero v –Infinity v src2 v –Zero v Rezd v src2 v +Infinity v src2
vxsnan_flag 1
v Q(src2)
+Zero v –Infinity v src2 v Rezd v +Zero v src2 v +Infinity v src2
vxsnan_flag 1
p
v Q(src2)
+NZF v –Infinity v A(p,src2) vp vp v A(p,src2) v +Infinity v src2
vxsnan_flag 1
v dQNaN v Q(src2)
+Infinity v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v src2
vxisi_flag 1 vxsnan_flag 1
QNaN & vp
vp vp vp vp vp vp vp
src1 is a NaN vxsnan_flag 1
QNaN & v Q(src2)
vp vp vp vp vp vp v src2
src1 not a NaN vxsnan_flag 1
Explanation:
src1 The double-precision floating-point value in doubleword element i of VSR[XA] (where i c {0,1}).
src2 For xvmaddadp, the double-precision floating-point value in doubleword element i of VSR[XT] (where i c {0,1}).
For xvmaddmdp, the double-precision floating-point value in doubleword element i of VSR[XB] (where i c {0,1}).
src3 For xvmaddadp, the double-precision floating-point value in doubleword element i of VSR[XB] (where i c {0,1}).
For xvmaddmdp, the double-precision floating-point value in doubleword element i of VSR[XT] (where i c {0,1}).
dQNaN Default quiet NaN (0x7FF8_0000_0000_0000).
NZF Nonzero finite number.
Rezd Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two
nonzero finite number source operands.
Q(x) Return a QNaN with the payload of x.
A(x,y) Return the normalized sum of floating-point value x and floating-point value y, having unbounded range and precision.
Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd).
M(x,y) Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision.
p The intermediate product having unbounded range and precision.
v The intermediate result having unbounded range and precision.
Part 1: src3
Multiply –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
p dQNaN p dQNaN p Q(src3)
–Infinity p +Infinity p +Infinity p –Infinity p –Infinity p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p Q(src3)
–NZF p +Infinity p M(src1,src3) p +Zero p –Zero p M(src1,src3) p +Infinity p src3
vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
–Zero p +Zero p +Zero p –Zero p –Zero p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
+Zero p –Zero p –Zero p +Zero p +Zero p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
src1
p Q(src3)
+NZF p –Infinity p M(src1,src3) p –Zero p +Zero p M(src1,src3) p +Infinity p src3
vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
+Infinity p –Infinity p +Infinity p +Infinity p +Infinity p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p src1
QNaN p src1 p src1 p src1 p src1 p src1 p src1 p src1
vxsnan_flag 1
p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1)
SNaN
vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1
Part 2: src2
Add –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN v Q(src2)
–Infinity v –Infinity v –Infinity v –Infinity v –Infinity v –Infinity v src2
vxisi_flag 1 vxsnan_flag 1
v Q(src2)
–NZF v –Infinity v A(p,src2) vp vp v A(p,src2) v +Infinity v src2
vxsnan_flag 1
v Q(src2)
–Zero v –Infinity v src2 v –Zero v Rezd v src2 v +Infinity v src2
vxsnan_flag 1
v Q(src2)
+Zero v –Infinity v src2 v Rezd v +Zero v src2 v +Infinity v src2
vxsnan_flag 1
p
v Q(src2)
+NZF v –Infinity v A(p,src2) vp vp v A(p,src2) v +Infinity v src2
vxsnan_flag 1
v dQNaN v Q(src2)
+Infinity v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v src2
vxisi_flag 1 vxsnan_flag 1
QNaN & vp
vp vp vp vp vp vp vp
src1 is a NaN vxsnan_flag 1
QNaN & v Q(src2)
vp vp vp vp vp vp v src2
src1 not a NaN vxsnan_flag 1
Explanation:
src1 The single-precision floating-point value in word element i of VSR[XA] (where i c {0,1,2,3}).
src2 For xvmaddasp, the single-precision floating-point value in word element i of VSR[XT] (where i c {0,1,2,3}).
For xvmaddmsp, the single-precision floating-point value in word element i of VSR[XB] (where i c {0,1,2,3}).
src3 For xvmaddasp, the single-precision floating-point value in word element i of VSR[XB] (where i c {0,1,2,3}).
For xvmaddmsp, the single-precision floating-point value in word element i of VSR[XT] (where i c {0,1,2,3}).
dQNaN Default quiet NaN (0x7FC0_0000).
NZF Nonzero finite number.
Rezd Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two
nonzero finite number source operands.
Q(x) Return a QNaN with the payload of x.
A(x,y) Return the normalized sum of floating-point value x and floating-point value y, having unbounded range and precision.
Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd).
M(x,y) Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision.
p The intermediate product having unbounded range and precision.
v The intermediate result having unbounded range and precision.
src2
–Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
T(Q(src2))
–Infinity T(src1) T(src2) T(src2) T(src2) T(src2) T(src2) T(src1)
fx(VXSNAN)
T(Q(src2))
–NZF T(src1) T(M(src1,src2)) T(src2) T(src2) T(src2) T(src2) T(src1)
fx(VXSNAN)
T(Q(src2))
–Zero T(src1) T(src1) T(src1) T(src2) T(src2) T(src2) T(src1)
fx(VXSNAN)
T(Q(src2))
+Zero T(src1) T(src1) T(src1) T(src1) T(src2) T(src2) T(src1)
fx(VXSNAN)
src1
T(Q(src2))
+NZF T(src1) T(src1) T(src1) T(src1) T(M(src1,src2)) T(src2) T(src1)
fx(VXSNAN)
T(Q(src2))
+Infinity T(src1) T(src1) T(src1) T(src1) T(src1) T(src1) T(src1)
fx(VXSNAN)
T(src1)
QNaN T(src2) T(src2) T(src2) T(src2) T(src2) T(src2) T(src1)
fx(VXSNAN)
T(Q(src1)) T(Q(src1)) T(Q(src1)) T(Q(src1)) T(Q(src1)) T(Q(src1)) T(Q(src1)) T(Q(src1))
SNaN
fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN)
Explanation:
src1 The double-precision floating-point value in doubleword element i of VSR[XA] (where i c {0,1}).
src2 The double-precision floating-point value in doubleword element i of VSR[XT] (where i c {0,1}).
NZF Nonzero finite number.
Q(x) Return a QNaN with the payload of x.
M(x,y) Return the greater of floating-point value x and floating-point value y.
T(x) The value x is placed in doubleword element i (i{0,1}) of VSR[XT] in double-precision format.
FPRF, FR and FI are not modified.
fx(x) If x is equal to 0, FX is set to 1. x is set to 1.
VXSNAN Floating-point Invalid Operation Exception (SNaN). If VE=1, update of VSR[XT] is suppressed.
src2
–Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
T(Q(src2))
–Infinity T(src1) T(src2) T(src2) T(src2) T(src2) T(src2) T(src1)
fx(VXSNAN)
T(Q(src2))
–NZF T(src1) T(M(src1,src2)) T(src2) T(src2) T(src2) T(src2) T(src1)
fx(VXSNAN)
T(Q(src2))
–Zero T(src1) T(src1) T(src1) T(src2) T(src2) T(src2) T(src1)
fx(VXSNAN)
T(Q(src2))
+Zero T(src1) T(src1) T(src1) T(src1) T(src2) T(src2) T(src1)
fx(VXSNAN)
src1
T(Q(src2))
+NZF T(src1) T(src1) T(src1) T(src1) T(M(src1,src2)) T(src2) T(src1)
fx(VXSNAN)
T(Q(src2))
+Infinity T(src1) T(src1) T(src1) T(src1) T(src1) T(src1) T(src1)
fx(VXSNAN)
T(src1)
QNaN T(src2) T(src2) T(src2) T(src2) T(src2) T(src2) T(src1)
fx(VXSNAN)
T(Q(src1)) T(Q(src1)) T(Q(src1)) T(Q(src1)) T(Q(src1)) T(Q(src1)) T(Q(src1)) T(Q(src1))
SNaN
fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN)
Explanation:
src1 The single-precision floating-point value in word element i of VSR[XA] (where i c {0,1,2,3}).
src2 The single-precision floating-point value in word element i of VSR[XT] (where i c {0,1,2,3}).
NZF Nonzero finite number.
Q(x) Return a QNaN with the payload of x.
M(x,y) Return the greater of floating-point value x and floating-point value y.
T(x) The value x is placed in word element i (i{0,1,2,3}) of VSR[XT] in single-precision format.
FPRF, FR and FI are not modified.
fx(x) If x is equal to 0, FX is set to 1. x is set to 1.
VXSNAN Floating-point Invalid Operation Exception (SNaN). If VE=1, update of VSR[XT] is suppressed.
src2
–Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
T(Q(src2))
–Infinity T(src1) T(src1) T(src1) T(src1) T(src1) T(src1) T(src1)
fx(VXSNAN)
T(Q(src2))
–NZF T(src2) T(M(src1,src2)) T(src1) T(src1) T(src1) T(src1) T(src1)
fx(VXSNAN)
T(Q(src2))
–Zero T(src2) T(src2) T(src1) T(src1) T(src1) T(src1) T(src1)
fx(VXSNAN)
T(Q(src2))
+Zero T(src2) T(src2) T(src2) T(src1) T(src1) T(src1) T(src1)
fx(VXSNAN)
src1
T(Q(src2))
+NZF T(src2) T(src2) T(src2) T(src2) T(M(src1,src2)) T(src1) T(src1)
fx(VXSNAN)
T(Q(src2))
+Infinity T(src2) T(src2) T(src2) T(src2) T(src2) T(src1) T(src1)
fx(VXSNAN)
T(src1)
QNaN T(src2) T(src2) T(src2) T(src2) T(src2) T(src2) T(src1)
fx(VXSNAN)
T(Q(src1)) T(Q(src1)) T(Q(src1)) T(Q(src1)) T(Q(src1)) T(Q(src1)) T(Q(src1)) T(Q(src1))
SNaN
fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN)
Explanation:
src1 The double-precision floating-point value in doubleword element i of VSR[XA] (where i c {0,1}).
src2 The double-precision floating-point value in doubleword element i of VSR[XT] (where i c {0,1}).
NZF Nonzero finite number.
Q(x) Return a QNaN with the payload of x.
M(x,y) Return the lesser of floating-point value x and floating-point value y.
T(x) The value x is placed in doubleword element i (i{0,1}) of VSR[XT] in double-precision format.
FPRF, FR and FI are not modified.
fx(x) If x is equal to 0, FX is set to 1. x is set to 1.
VXSNAN Floating-point Invalid Operation Exception (SNaN). If VE=1, update of VSR[XT] is suppressed.
src2
–Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
T(Q(src2))
–Infinity T(src1) T(src1) T(src1) T(src1) T(src1) T(src1) T(src1)
fx(VXSNAN)
T(Q(src2))
–NZF T(src2) T(M(src1,src2)) T(src1) T(src1) T(src1) T(src1) T(src1)
fx(VXSNAN)
T(Q(src2))
–Zero T(src2) T(src2) T(src1) T(src1) T(src1) T(src1) T(src1)
fx(VXSNAN)
T(Q(src2))
+Zero T(src2) T(src2) T(src2) T(src1) T(src1) T(src1) T(src1)
fx(VXSNAN)
src1
T(Q(src2))
+NZF T(src2) T(src2) T(src2) T(src2) T(M(src1,src2)) T(src1) T(src1)
fx(VXSNAN)
T(Q(src2))
+Infinity T(src2) T(src2) T(src2) T(src2) T(src2) T(src1) T(src1)
fx(VXSNAN)
T(src1)
QNaN T(src2) T(src2) T(src2) T(src2) T(src2) T(src2) T(src1)
fx(VXSNAN)
T(Q(src1)) T(Q(src1)) T(Q(src1)) T(Q(src1)) T(Q(src1)) T(Q(src1)) T(Q(src1)) T(Q(src1))
SNaN
fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN) fx(VXSNAN)
Explanation:
src1 The single-precision floating-point value in word element i of VSR[XA] (where i c {0,1,2,3}).
src2 The single-precision floating-point value in word element i of VSR[XT] (where i c {0,1,2,3}).
NZF Nonzero finite number.
Q(x) Return a QNaN with the payload of x.
M(x,y) Return the lesser of floating-point value x and floating-point value y.
T(x) The value x is placed in word element i (i{0,1,2,3}) of VSR[XT] in single-precision format.
FPRF, FR and FI are not modified.
fx(x) If x is equal to 0, FX is set to 1. x is set to 1.
VXSNAN Floating-point Invalid Operation Exception (SNaN). If VE=1, update of VSR[XT] is suppressed.
if( ex_flag = 0 ) then VSR[XT] result If a trap-enabled exception occurs in any element of
the vector, no results are written to VSR[XT].
Let XT be the value 32×TX + T.
Let XA be the value 32×AX + A. Special Registers Altered
Let XB be the value 32×BX + B. FX OX UX XX VXSNAN VXISI VXIMZ
Part 1: src3
Multiply –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
p dQNaN p dQNaN p Q(src3)
–Infinity p +Infinity p +Infinity p –Infinity p –Infinity p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p Q(src3)
–NZF p +Infinity p M(src1,src3) p +Zero p –Zero p M(src1,src3) p +Infinity p src3
vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
–Zero p +Zero p +Zero p –Zero p –Zero p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
+Zero p –Zero p –Zero p +Zero p +Zero p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
src1
p Q(src3)
+NZF p –Infinity p M(src1,src3) p –Zero p +Zero p M(src1,src3) p +Infinity p src3
vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
+Infinity p –Infinity p +Infinity p +Infinity p +Infinity p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p src1
QNaN p src1 p src1 p src1 p src1 p src1 p src1 p src1
vxsnan_flag 1
p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1)
SNaN
vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1
Part 2: src2
Subtract –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN v Q(src2)
–Infinity v –Infinity v –Infinity v –Infinity v –Infinity v –Infinity v src2
vxisi_flag 1 vxsnan_flag 1
v Q(src2)
–NZF v +Infinity v S(p,src2) vp vp v S(p,src2) v –Infinity v src2
vxsnan_flag 1
v Q(src2)
–Zero v +Infinity v –src2 v –Zero v Rezd v –src2 v –Infinity v src2
vxsnan_flag 1
v Q(src2)
+Zero v +Infinity v –src2 v Rezd v +Zero v –src2 v –Infinity v src2
vxsnan_flag 1
p
v Q(src2)
+NZF v +Infinity v S(p,src2) vp vp v S(p,src2) v –Infinity v src2
vxsnan_flag 1
v dQNaN v Q(src2)
+Infinity v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v src2
vxisi_flag 1 vxsnan_flag 1
QNaN & vp
vp vp vp vp vp vp vp
src1 is a NaN vxsnan_flag 1
QNaN & v Q(src2)
vp vp vp vp vp vp v src2
src1 not a NaN vxsnan_flag 1
Explanation:
src1 The double-precision floating-point value in doubleword element i of VSR[XA] (where i c {0,1}).
src2 For xvmsubadp, the double-precision floating-point value in doubleword element i of VSR[XT] (where i c {0,1}).
For xvmsubmdp, the double-precision floating-point value in doubleword element i of VSR[XB] (where i c {0,1}).
src3 For xvmsubadp, the double-precision floating-point value in doubleword element i of VSR[XB] (where i c {0,1}).
For xvmsubmdp, the double-precision floating-point value in doubleword element i of VSR[XT] (where i c {0,1}).
dQNaN Default quiet NaN (0x7FF8_0000_0000_0000).
NZF Nonzero finite number.
Rezd Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two
nonzero finite number source operands.
Q(x) Return a QNaN with the payload of x.
S(x,y) Return the normalized sum of floating-point value x and negated floating-point value y, having unbounded range and precision.
Note: If x = y, v is considered to be an exact-zero-difference result (Rezd).
M(x,y) Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision.
p The intermediate product having unbounded range and precision.
v The intermediate result having unbounded range and precision.
Part 1: src3
Multiply –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
p dQNaN p dQNaN p Q(src3)
–Infinity p +Infinity p +Infinity p –Infinity p –Infinity p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p Q(src3)
–NZF p +Infinity p M(src1,src3) p +Zero p –Zero p M(src1,src3) p +Infinity p src3
vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
–Zero p +Zero p +Zero p –Zero p –Zero p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
+Zero p –Zero p –Zero p +Zero p +Zero p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
src1
p Q(src3)
+NZF p –Infinity p M(src1,src3) p –Zero p +Zero p M(src1,src3) p +Infinity p src3
vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
+Infinity p –Infinity p +Infinity p +Infinity p +Infinity p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p src1
QNaN p src1 p src1 p src1 p src1 p src1 p src1 p src1
vxsnan_flag 1
p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1)
SNaN
vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1
Part 2: src2
Subtract –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN v Q(src2)
–Infinity v –Infinity v –Infinity v –Infinity v –Infinity v –Infinity v src2
vxisi_flag 1 vxsnan_flag 1
v Q(src2)
–NZF v +Infinity v S(p,src2) vp vp v S(p,src2) v –Infinity v src2
vxsnan_flag 1
v Q(src2)
–Zero v +Infinity v –src2 v –Zero v Rezd v –src2 v –Infinity v src2
vxsnan_flag 1
v Q(src2)
+Zero v +Infinity v –src2 v Rezd v +Zero v –src2 v –Infinity v src2
vxsnan_flag 1
p
v Q(src2)
+NZF v +Infinity v S(p,src2) vp vp v S(p,src2) v –Infinity v src2
vxsnan_flag 1
v dQNaN v Q(src2)
+Infinity v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v src2
vxisi_flag 1 vxsnan_flag 1
QNaN & vp
vp vp vp vp vp vp vp
src1 is a NaN vxsnan_flag 1
QNaN & v Q(src2)
vp vp vp vp vp vp v src2
src1 not a NaN vxsnan_flag 1
Explanation:
src1 The single-precision floating-point value in word element i of VSR[XA] (where i c {0,1,2,3}).
src2 For xvmsubasp, the single-precision floating-point value in word element i of VSR[XT] (where i c {0,1,2,3}).
For xvmsubmsp, the single-precision floating-point value in word element i of VSR[XB] (where i c {0,1,2,3}).
src3 For xvmsubasp, the single-precision floating-point value in word element i of VSR[XB] (where i c {0,1,2,3}).
For xvmsubmsp, the single-precision floating-point value in word element i of VSR[XT] (where i c {0,1,2,3}).
dQNaN Default quiet NaN (0x7FC0_0000).
NZF Nonzero finite number.
Rezd Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two
nonzero finite number source operands.
Q(x) Return a QNaN with the payload of x.
S(x,y) Return the normalized sum of floating-point value x and negated floating-point value y, having unbounded range and precision.
Note: If x = y, v is considered to be an exact-zero-difference result (Rezd).
M(x,y) Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision.
p The intermediate product having unbounded range and precision.
v The intermediate result having unbounded range and precision.
VSX Vector Multiply Double-Precision See Table 98, “Vector Floating-Point Final Result,”
XX3-form on page 661.
src2
-Infinity -NZF -Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN v dQNaN v Q(src2)
-Infinity v +Infinity v +Infinity v –Infinity v –Infinity v src2
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
v Q(src2)
-NZF v +Infinity v M(src1,src2) v +Zero v –Zero v M(src1,src2) v +Infinity v src2
vxsnan_flag 1
v dQNaN v dQNaN v Q(src2)
-Zero v +Zero v +Zero v –Zero v –Zero v src2
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
v dQNaN v dQNaN v Q(src2)
+Zero v –Zero v –Zero v +Zero v +Zero v src2
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
src1
v Q(src2)
+NZF v –Infinity v M(src1,src2) v –Zero v +Zero v M(src1,src2) v +Infinity v src2
vxsnan_flag 1
v dQNaN v dQNaN v Q(src2)
+Infinity v –Infinity v +Infinity v +Infinity v +Infinity v src2
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
v src1
QNaN v src1 v src1 v src1 v src1 v src1 v src1 v src1
vxsnan_flag 1
v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1)
SNaN
vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1
Explanation:
src1 The double-precision floating-point value in doubleword element i of VSR[XA] (where i c {0,1}).
src2 The double-precision floating-point value in doubleword element i of VSR[XB] (where i c {0,1}).
dQNaN Default quiet NaN (0x7FF8_0000_0000_0000).
NZF Nonzero finite number.
M(x,y) Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision.
Q(x) Return a QNaN with the payload of x.
v The intermediate result having unbounded signficand precision and unbounded exponent range.
VSX Vector Multiply Single-Precision See Table 98, “Vector Floating-Point Final Result,”
XX3-form on page 661.
src2
-Infinity -NZF -Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN v dQNaN v Q(src2)
-Infinity v +Infinity v +Infinity v –Infinity v –Infinity v src2
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
v Q(src2)
-NZF v +Infinity v M(src1,src2) v +Zero v –Zero v M(src1,src2) v +Infinity v src2
vxsnan_flag 1
v dQNaN v dQNaN v Q(src2)
-Zero v +Zero v +Zero v –Zero v –Zero v src2
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
v dQNaN v dQNaN v Q(src2)
+Zero v –Zero v –Zero v +Zero v +Zero v src2
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
src1
v Q(src2)
+NZF v –Infinity v M(src1,src2) v –Zero v +Zero v M(src1,src2) v +Infinity v src2
vxsnan_flag 1
v dQNaN v dQNaN v Q(src2)
+Infinity v –Infinity v +Infinity v +Infinity v +Infinity v src2
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
v src1
QNaN v src1 v src1 v src1 v src1 v src1 v src1 v src1
vxsnan_flag 1
v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1)
SNaN
vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1
Explanation:
src1 The single-precision floating-point value in word element i of VSR[XA] (where i c {0,1,2,3}).
src2 The single-precision floating-point value in word element i of VSR[XB] (where i c {0,1,2,3}).
dQNaN Default quiet NaN (0x7FC0_0000).
NZF Nonzero finite number.
M(x,y) Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision.
Q(x) Return a QNaN with the payload of x.
v The intermediate result having unbounded signficand precision and unbounded exponent range.
XT TX || T XT TX || T
XB BX || B XB BX || B
For each vector element i from 0 to 1, do the following. For each vector element i from 0 to 3, do the following.
The contents of doubleword element i of VSR[XB], The contents of word element i of VSR[XB], with
with bit 0 set to 1, is placed into doubleword bit 0 set to 1, is placed into word element i of
element i of VSR[XT]. VSR[XT].
VSR Data Layout for xvnabsdp VSR Data Layout for xvnabssp
src = VSR[XB] src = VSR[XB]
DP DP SP SP SP SP
tgt = VSR[XT] tgt = VSR[XT]
DP DP SP SP SP SP
0 64 127 0 32 64 96 127
Part 1: src3
Multiply –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
p dQNaN p dQNaN p Q(src3)
–Infinity p +Infinity p +Infinity p –Infinity p –Infinity p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p Q(src3)
–NZF p +Infinity p M(src1,src3) p src1 p src1 p M(src1,src3) p +Infinity p src3
vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
–Zero p +Zero p +Zero p –Zero p –Zero p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
+Zero p –Zero p –Zero p +Zero p +Zero p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
src1
p Q(src3)
+NZF p –Infinity p M(src1,src3) p src1 p src1 p M(src1,src3) p +Infinity p src3
vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
+Infinity p –Infinity p +Infinity p +Infinity p +Infinity p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p src1
QNaN p src1 p src1 p src1 p src1 p src1 p src1 p src1
vxsnan_flag 1
p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1)
SNaN
vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1
Part 2: src2
Add –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN v Q(src2)
–Infinity v –Infinity v –Infinity v –Infinity v –Infinity v –Infinity v src2
vxisi_flag 1 vxsnan_flag 1
v Q(src2)
–NZF v –Infinity v A(p,src2) vp vp v A(p,src2) v +Infinity v src2
vxsnan_flag 1
v Q(src2)
–Zero v –Infinity v src2 v –Zero v Rezd v src2 v +Infinity v src2
vxsnan_flag 1
v Q(src2)
+Zero v –Infinity v src2 v Rezd v +Zero v src2 v +Infinity v src2
vxsnan_flag 1
p
v Q(src2)
+NZF v –Infinity v A(p,src2) vp vp v A(p,src2) v +Infinity v src2
vxsnan_flag 1
v dQNaN v Q(src2)
+Infinity v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v src2
vxisi_flag 1 vxsnan_flag 1
QNaN & vp
vp vp vp vp vp vp vp
src1 is a NaN vxsnan_flag 1
QNaN & v Q(src2)
vp vp vp vp vp vp v src2
src1 not a NaN vxsnan_flag 1
Explanation:
src1 The double-precision floating-point value in doubleword element i of VSR[XA] (where i c {0,1}).
src2 For xvnmaddadp, the double-precision floating-point value in doubleword element i of VSR[XT] (where i c {0,1}).
For xvnmaddmdp, the double-precision floating-point value in doubleword element i of VSR[XB] (where i c {0,1}).
src3 For xvnmaddadp, the double-precision floating-point value in doubleword element i of VSR[XB] (where i c {0,1}).
For xvnmaddmdp, the double-precision floating-point value in doubleword element i of VSR[XT] (where i c {0,1}).
dQNaN Default quiet NaN (0x7FF8_0000_0000_0000).
NZF Nonzero finite number.
Rezd Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two
nonzero finite number source operands.
Q(x) Return a QNaN with the payload of x.
A(x,y) Return the normalized sum of floating-point value x and floating-point value y, having unbounded range and precision.
Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd).
M(x,y) Return the product of floating-point value x and floating-point value y, having unbounded range and precision.
p The intermediate product having unbounded range and precision.
v The intermediate result having unbounded range and precision.
Is q inexact? (q g v)
vxsnan_flag
vximz_flag
vxisi_flag
OE
UE
VE
XE
ZE
Case Returned Results and Status Setting
– – – – – 0 0 0 – – – – T(N(r))
0 – – – – – – 1 – – – – T(r), fx(VXISI)
0 – – – – 0 1 – – – – – T(r), fx(VXIMZ)
0 – – – – 1 0 – – – – – T(r), fx(VXSNAN)
Special 0 – – – – 1 1 – – – – – T(r), fx(VXSNAN), fx(VXIMZ)
1 – – – – – – 1 – – – – fx(VXISI), error()
1 – – – – 0 1 – – – – – fx(VXIMZ), error()
1 – – – – 1 0 – – – – – fx(VXSNAN), error()
1 – – – – 1 1 – – – – – fx(VXSNAN), fx(VXIMZ), error()
– – – – – – – – no – – – T(N(r))
– – – – 0 – – – yes no – – T(N(r)), fx(XX)
Normal – – – – 0 – – – yes yes – – T(N(r)), fx(XX)
– – – – 1 – – – yes no – – T(N(r)), fx(XX), error()
– – – – 1 – – – yes yes – – T(N(r)), fx(XX), error()
– 0 – – 0 – – – – – – – T(N(r)), fx(OX), fx(XX)
– 0 – – 1 – – – – – – – T(N(r)), fx(OX), fx(XX), error()
Overflow – 1 – – – – – – – – no – fx(OX), error()
– 1 – – – – – – – – yes no fx(OX), fx(XX), error()
– 1 – – – – – – – – yes yes fx(OX), fx(XX), error()
Explanation:
– The results do not depend on this condition.
fx(x) FX is set to 1 if x=0. x is set to 1.
q The value defined in Table 50, “Scalar Floating-Point Intermediate Result Handling,” on page 515, signficand rounded to the target
precision, unbounded exponent range.
r The value defined in Table 50, “Scalar Floating-Point Intermediate Result Handling,” on page 515, signficand rounded to the target
precision, bounded exponent range.
v The precise intermediate result defined in the instruction having unbounded signficand precision, unbounded exponent range.
FI Floating-Point Fraction Inexact status flag, FPSCRFI. This status flag is nonsticky.
FR Floating-Point Fraction Rounded status flag, FPSCRFR.
OX Floating-Point Overflow Exception status flag, FPSCROX.
error() The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set
to any mode other than the ignore-exception mode. Update of the target VSR is suppressed for all vector elements.
N(x) The value x is is negated by complementing the sign bit of x.
T(x) The value x is placed in element i of VSR[XT] in the target precision format (where i c {0,1} for results with 64-bit elements, and i c
{0,1,3,4}) for results with 32-bit elements).
UX Floating-Point Underflow Exception status flag, FPSCRUX
VXSNAN Floating-Point Invalid Operation Exception (SNaN) status flag, FPSCRVXSNAN.
VXIMZ Floating-Point Invalid Operation Exception (Infinity × Zero) status flag, FPSCRVXIMZ.
VXISI Floating-Point Invalid Operation Exception (Infinity – Infinity) status flag, FPSCRVXISI.
XX Float-Point Inexact Exception status flag, FPSCRXX. The flag is a sticky version of FPSCRFI. When FPSCRFI is set to a new
value, the new value of FPSCRXX is set to the result of ORing the old value of FPSCRXX with the new value of FPSCRFI.
Is q inexact? (q g v)
vxsnan_flag
vximz_flag
vxisi_flag
OE
UE
VE
XE
Case ZE Returned Results and Status Setting
– – 0 – – – – – no – – – T(N(r))
– – 0 – 0 – – – yes no – – T(N(r)), fx(UX), fx(XX)
– – 0 – 0 – – – yes yes – – T(N(r)), fx(UX), fx(XX)
– – 0 – 1 – – – yes no – – T(N(r)), fx(UX), fx(XX), error()
Tiny
– – 0 – 1 – – – yes yes – – T(N(r)), fx(UX), fx(XX), error()
– – 1 – – – – – yes – no – fx(UX), error()
– – 1 – – – – – yes – yes no fx(UX), fx(XX), error()
– – 1 – – – – – yes – yes yes fx(UX), fx(XX), error()
Explanation:
– The results do not depend on this condition.
fx(x) FX is set to 1 if x=0. x is set to 1.
q The value defined in Table 50, “Scalar Floating-Point Intermediate Result Handling,” on page 515, signficand rounded to the target
precision, unbounded exponent range.
r The value defined in Table 50, “Scalar Floating-Point Intermediate Result Handling,” on page 515, signficand rounded to the target
precision, bounded exponent range.
v The precise intermediate result defined in the instruction having unbounded signficand precision, unbounded exponent range.
FI Floating-Point Fraction Inexact status flag, FPSCRFI. This status flag is nonsticky.
FR Floating-Point Fraction Rounded status flag, FPSCRFR.
OX Floating-Point Overflow Exception status flag, FPSCROX.
error() The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set
to any mode other than the ignore-exception mode. Update of the target VSR is suppressed for all vector elements.
N(x) The value x is is negated by complementing the sign bit of x.
T(x) The value x is placed in element i of VSR[XT] in the target precision format (where i c {0,1} for results with 64-bit elements, and i c
{0,1,3,4}) for results with 32-bit elements).
UX Floating-Point Underflow Exception status flag, FPSCRUX
VXSNAN Floating-Point Invalid Operation Exception (SNaN) status flag, FPSCRVXSNAN.
VXIMZ Floating-Point Invalid Operation Exception (Infinity × Zero) status flag, FPSCRVXIMZ.
VXISI Floating-Point Invalid Operation Exception (Infinity – Infinity) status flag, FPSCRVXISI.
XX Float-Point Inexact Exception status flag, FPSCRXX. The flag is a sticky version of FPSCRFI. When FPSCRFI is set to a new
value, the new value of FPSCRXX is set to the result of ORing the old value of FPSCRXX with the new value of FPSCRFI.
Part 1: src3
Multiply –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
p dQNaN p dQNaN p Q(src3)
–Infinity p +Infinity p +Infinity p –Infinity p –Infinity p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p Q(src3)
–NZF p +Infinity p M(src1,src3) p src1 p src1 p M(src1,src3) p +Infinity p src3
vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
–Zero p +Zero p +Zero p –Zero p –Zero p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
+Zero p –Zero p –Zero p +Zero p +Zero p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
src1
p Q(src3)
+NZF p –Infinity p M(src1,src3) p src1 p src1 p M(src1,src3) p +Infinity p src3
vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
+Infinity p –Infinity p +Infinity p +Infinity p +Infinity p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p src1
QNaN p src1 p src1 p src1 p src1 p src1 p src1 p src1
vxsnan_flag 1
p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1)
SNaN
vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1
Part 2: src2
Add –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN v Q(src2)
–Infinity v –Infinity v –Infinity v –Infinity v –Infinity v –Infinity v src2
vxisi_flag 1 vxsnan_flag 1
v Q(src2)
–NZF v –Infinity v A(p,src2) vp vp v A(p,src2) v +Infinity v src2
vxsnan_flag 1
v Q(src2)
–Zero v –Infinity v src2 v –Zero v Rezd v src2 v +Infinity v src2
vxsnan_flag 1
v Q(src2)
+Zero v –Infinity v src2 v Rezd v +Zero v src2 v +Infinity v src2
vxsnan_flag 1
p
v Q(src2)
+NZF v –Infinity v A(p,src2) vp vp v A(p,src2) v +Infinity v src2
vxsnan_flag 1
v dQNaN v Q(src2)
+Infinity v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v src2
vxisi_flag 1 vxsnan_flag 1
QNaN & vp
vp vp vp vp vp vp vp
src1 is a NaN vxsnan_flag 1
QNaN & v Q(src2)
vp vp vp vp vp vp v src2
src1 not a NaN vxsnan_flag 1
Explanation:
src1 The single-precision floating-point value in word element i of VSR[XA] (where i c {0,1,2,3}).
src2 For xvnmaddasp, the single-precision floating-point value in word element i of VSR[XT] (where i c {0,1,2,3}).
For xvnmaddmsp, the single-precision floating-point value in word element i of VSR[XB] (where i c {0,1,2,3}).
src3 For xvnmaddasp, the single-precision floating-point value in word element i of VSR[XB] (where i c {0,1,2,3}).
For xvnmaddmsp, the single-precision floating-point value in word element i of VSR[XT] (where i c {0,1,2,3}).
dQNaN Default quiet NaN (0x7FC0_0000).
NZF Nonzero finite number.
Rezd Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two
nonzero finite number source operands.
Q(x) Return a QNaN with the payload of x.
A(x,y) Return the normalized sum of floating-point value x and floating-point value y, having unbounded range and precision.
Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd).
M(x,y) Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision.
p The intermediate product having unbounded range and precision.
v The intermediate result having unbounded range and precision.
if( ex_flag = 0 ) then VSR[XT] result If a trap-enabled exception occurs in any element of
the vector, no results are written to VSR[XT].
Let XT be the value 32×TX + T.
Let XA be the value 32×AX + A. Special Registers Altered
Let XB be the value 32×BX + B. FX OX UX XX VXSNAN VXISI VXIMZ
Part 1: src3
Multiply –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
p dQNaN p dQNaN p Q(src3)
–Infinity p +Infinity p +Infinity p –Infinity p –Infinity p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p Q(src3)
–NZF p +Infinity p M(src1,src3) p src1 p src1 p M(src1,src3) p +Infinity p src3
vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
–Zero p +Zero p +Zero p –Zero p –Zero p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
+Zero p –Zero p –Zero p +Zero p +Zero p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
src1
p Q(src3)
+NZF p –Infinity p M(src1,src3) p src1 p src1 p M(src1,src3) p +Infinity p src3
vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
+Infinity p –Infinity p +Infinity p +Infinity p +Infinity p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p src1
QNaN p src1 p src1 p src1 p src1 p src1 p src1 p src1
vxsnan_flag 1
p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1)
SNaN
vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1
Part 2: src2
Subtract –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN v Q(src2)
–Infinity v –Infinity v –Infinity v –Infinity v –Infinity v –Infinity v src2
vxisi_flag 1 vxsnan_flag 1
v Q(src2)
–NZF v +Infinity v S(p,src2) vp vp v S(p,src2) v –Infinity v src2
vxsnan_flag 1
v Q(src2)
–Zero v +Infinity v –src2 v –Zero v Rezd v –src2 v –Infinity v src2
vxsnan_flag 1
v Q(src2)
+Zero v +Infinity v –src2 v Rezd v +Zero v –src2 v –Infinity v src2
vxsnan_flag 1
p
v Q(src2)
+NZF v +Infinity v S(p,src2) vp vp v S(p,src2) v –Infinity v src2
vxsnan_flag 1
v dQNaN v Q(src2)
+Infinity v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v src2
vxisi_flag 1 vxsnan_flag 1
QNaN & vp
vp vp vp vp vp vp vp
src1 is a NaN vxsnan_flag 1
QNaN & v Q(src2)
vp vp vp vp vp vp v src2
src1 not a NaN vxsnan_flag 1
Explanation:
src1 The double-precision floating-point value in doubleword element i of VSR[XA] (where i c {0,1}).
src2 For xvnmsubadp, the double-precision floating-point value in doubleword element i of VSR[XT] (where i c {0,1}).
For xvnmsubmdp, the double-precision floating-point value in doubleword element i of VSR[XB] (where i c {0,1}).
src3 For xvnmsubadp, the double-precision floating-point value in doubleword element i of VSR[XB] (where i c {0,1}).
For xvnmsubmdp, the double-precision floating-point value in doubleword element i of VSR[XT] (where i c {0,1}).
dQNaN Default quiet NaN (0x7FF8_0000_0000_0000).
NZF Nonzero finite number.
Rezd Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two
nonzero finite number source operands.
Q(x) Return a QNaN with the payload of x.
S(x,y) Return the normalized sum of floating-point value x and negated floating-point value y, having unbounded range and precision.
Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd).
M(x,y) Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision.
p The intermediate product having unbounded range and precision.
v The intermediate result having unbounded range and precision.
Part 1: src3
Multiply –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
p dQNaN p dQNaN p Q(src3)
–Infinity p +Infinity p +Infinity p –Infinity p –Infinity p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p Q(src3)
–NZF p +Infinity p M(src1,src3) p src1 p src1 p M(src1,src3) p +Infinity p src3
vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
–Zero p +Zero p +Zero p –Zero p –Zero p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
+Zero p –Zero p –Zero p +Zero p +Zero p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
src1
p Q(src3)
+NZF p –Infinity p M(src1,src3) p src1 p src1 p M(src1,src3) p +Infinity p src3
vxsnan_flag 1
p dQNaN p dQNaN p Q(src3)
+Infinity p –Infinity p +Infinity p +Infinity p +Infinity p src3
vximz_flag 1 vximz_flag 1 vxsnan_flag 1
p src1
QNaN p src1 p src1 p src1 p src1 p src1 p src1 p src1
vxsnan_flag 1
p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1)
SNaN
vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1
Part 2: src2
Subtract –Infinity –NZF –Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN v Q(src2)
–Infinity v –Infinity v –Infinity v –Infinity v –Infinity v –Infinity v src2
vxisi_flag 1 vxsnan_flag 1
v Q(src2)
–NZF v +Infinity v S(p,src2) vp vp v S(p,src2) v –Infinity v src2
vxsnan_flag 1
v Q(src2)
–Zero v +Infinity v –src2 v –Zero v Rezd v –src2 v –Infinity v src2
vxsnan_flag 1
v Q(src2)
+Zero v +Infinity v –src2 v Rezd v +Zero v –src2 v –Infinity v src2
vxsnan_flag 1
p
v Q(src2)
+NZF v +Infinity v S(p,src2) vp vp v S(p,src2) v –Infinity v src2
vxsnan_flag 1
v dQNaN v Q(src2)
+Infinity v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v src2
vxisi_flag 1 vxsnan_flag 1
QNaN & vp
vp vp vp vp vp vp vp
src1 is a NaN vxsnan_flag 1
QNaN & v Q(src2)
vp vp vp vp vp vp v src2
src1 not a NaN vxsnan_flag 1
Explanation:
src1 The single-precision floating-point value in word element i of VSR[XA] (where i c {0,1,2,3}).
src2 The single-precision floating-point value in word element i of VSR[XT] (where i c {0,1,2,3}).
src3 The single-precision floating-point value in word element i of VSR[XB] (where i c {0,1,2,3}).
dQNaN Default quiet NaN (0x7FC0_0000).
NZF Nonzero finite number.
Rezd Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two
nonzero finite number source operands.
Q(x) Return a QNaN with the payload of x.
S(x,y) Return the normalized sum of floating-point value x and negated floating-point value y, having unbounded range and precision.
Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd).
M(x,y) Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision.
p The intermediate product having unbounded range and precision.
v The intermediate result having unbounded range and precision.
XT TX || T XT TX || T
XB BX || B XB BX || B
ex_flag 0b0 ex_flag 0b0
VSR Data Layout for xvrdpi The result is placed into doubleword element i of
VSR[XT] in double-precision format.
src = VSR[XB]
DP DP If a trap-enabled exception occurs in any element of
the vector, no results are written to VSR[XT].
tgt = VSR[XT]
DP DP Special Registers Altered
0 64 127 FX XX VXSNAN
XT TX || T XT TX || T
XB BX || B XB BX || B
ex_flag 0b0 ex_flag 0b0
if( ex_flag = 0 ) then VSR[XT] result if( ex_flag = 0 ) then VSR[XT] result
For each vector element i from 0 to 1, do the following. For each vector element i from 0 to 1, do the following.
Let src be the double-precision floating-point Let src be the double-precision floating-point
operand in doubleword element i of VSR[XB]. operand in doubleword element i of VSR[XB].
src is rounded to an integer using the rounding src is rounded to an integer using the rounding
mode Round toward -Infinity. mode Round toward +Infinity.
The result is placed into doubleword element i of The result is placed into doubleword element i of
VSR[XT] in double-precision format. VSR[XT] in double-precision format.
If a trap-enabled exception occurs in any element of If a trap-enabled exception occurs in any element of
the vector, no results are written to VSR[XT]. the vector, no results are written to VSR[XT].
VSR Data Layout for xvrdpim VSR Data Layout for xvrdpip
src = VSR[XB] src = VSR[XB]
DP DP DP DP
tgt = VSR[XT] tgt = VSR[XT]
DP DP DP DP
0 64 127 0 64 127
60 T /// B 217 BX TX
0 6 11 16 21 30 31
XT TX || T
XB BX || B
ex_flag 0b0
do i=0 to 127 by 64
reset_xflags()
result{i:i+63} RoundToDPIntegerTrunc(VSR[XB]{i:i+63})
if(vxsnan_flag) then SetFX(VXSNAN)
ex_flag ex_flag | (VE & vxsnan_flag)
end
if( ex_flag = 0 ) then VSR[XT] result VSR Data Layout for xvredp
src = VSR[XB]
Let XT be the value 32×TX + T.
Let XB be the value 32×BX + B. DP DP
tgt = VSR[XT]
For each vector element i from 0 to 1, do the following.
Let src be the double-precision floating-point DP DP
operand in doubleword element i of VSR[XB]. 0 64 127
1
estimate – ---------- 1
src
---------------------------------------------- ------------------
1 16384
----------
src
1
estimate – ---------- 1
src
---------------------------------------------- ------------------
1 16384
----------
src
VSX Vector Round to Single-Precision Integer VSX Vector Round to Single-Precision Integer
using round to Nearest Away XX2-form Exact using Current rounding mode XX2-form
xvrspi XT,XB xvrspic XT,XB
XT TX || T XT TX || T
XB BX || B XB BX || B
ex_flag 0b0 ex_flag 0b0
VSR Data Layout for xvrspi The result is placed into word element i of VSR[XT]
in single-precision format.
src = VSR[XB]
SP SP SP SP If a trap-enabled exception occurs in any element of
the vector, no results are written to VSR[XT].
tgt = VSR[XT]
SP SP SP SP Special Registers Altered
0 32 64 96 127 FX XX VXSNAN
VSX Vector Round to Single-Precision Integer VSX Vector Round to Single-Precision Integer
using round toward -Infinity XX2-form using round toward +Infinity XX2-form
xvrspim XT,XB xvrspip XT,XB
XT TX || T XT TX || T
XB BX || B XB BX || B
ex_flag 0b0 ex_flag 0b0
if( ex_flag = 0 ) then VSR[XT] result if( ex_flag = 0 ) then VSR[XT] result
For each vector element i from 0 to 3, do the following. For each vector element i from 0 to 3, do the following.
Let src be the single-precision floating-point Let src be the single-precision floating-point
operand in word element i of VSR[XB]. operand in word element i of VSR[XB].
src is rounded to an integer using the rounding src is rounded to an integer using the rounding
mode Round toward -Infinity. mode Round toward +Infinity.
The result is placed into word element i of VSR[XT] The result is placed into word element i of VSR[XT]
in single-precision format. in single-precision format.
If a trap-enabled exception occurs in any element of If a trap-enabled exception occurs in any element of
the vector, no results are written to VSR[XT]. the vector, no results are written to VSR[XT].
VSR Data Layout for xvrspim VSR Data Layout for xvrspip
src = VSR[XB] src = VSR[XB]
SP SP SP SP SP SP SP SP
tgt = VSR[XT] tgt = VSR[XT]
SP SP SP SP SP SP SP SP
0 32 64 96 127 0 32 64 96 127
VSX Vector Round to Single-Precision Integer VSX Vector Reciprocal Square Root Estimate
using round toward Zero XX2-form Double-Precision XX2-form
xvrspiz XT,XB xvrsqrtedp XT,XB
XT TX || T XT TX || T
XB BX || B XB BX || B
ex_flag 0b0 ex_flag 0b0
For each vector element i from 0 to 3, do the following. if( ex_flag = 0 ) then VSR[XT] result
Let src be the single-precision floating-point
operand in word element i of VSR[XB]. Let XT be the value 32×TX + T.
Let XB be the value 32×BX + B.
src is rounded to an integer using the rounding
mode Round toward Zero. For each vector element i from 0 to 1, do the following.
Let src be the double-precision floating-point
The result is placed into word element i of VSR[XT] operand in doubleword element i of VSR[XB].
in single-precision format.
A double-precision floating-point estimate of the
If a trap-enabled exception occurs in any element of reciprocal square root of src is placed into
the vector, no results are written to VSR[XT]. doubleword element i of VSR[XT] in
double-precision format.
Special Registers Altered
FX VXSNAN Unless the reciprocal of the square root of src
would be a zero, an infinity, or a QNaN, the
estimate has a relative error in precision no
VSR Data Layout for xvrspiz greater than one part in 16384 of the reciprocal of
src = VSR[XB] the square root of src. That is,
SP SP SP SP 1
estimate – ---------------
1
tgt = VSR[XT] src
-------------------------------------------------- ----------------
1 16384
----------------
SP SP SP SP src
0 32 64 96 127
Operation with various special values of the operand is
summarized below.
VSX Vector Reciprocal Square Root Estimate Source Value Result Exception
Single-Precision XX2-form
–Infinity QNaN1 VXSQRT
xvrsqrtesp XT,XB +Infinity +Zero None
do i=0 to 127 by 32
QNaN QNaN None
reset_xflags() 1. No result if VE=1.
v{0:inf} RecipSquareRootEstimateSP(VSR[XB]{i:i+31}) 2. No result if ZE=1.
result{i:i+31} RoundToDP(RN,v)
if(vxsnan_flag) then SetFX(VXSNAN) If a trap-enabled exception occurs in any element of
if(vxsqrt_flag) then SetFX(VXSQRT) the vector, no results are written to VSR[XT].
if(zx_flag) then SetFX(ZX)
ex_flag ex_flag | (VE & vxsnan_flag) The results of executing this instruction is permitted to
ex_flag ex_flag | (VE & vxsqrt_flag)
vary between implementations, and between different
ex_flag ex_flag | (ZE & zx_flag)
end
executions on the same implementation.
VSX Vector Square Root Double-Precision See Table 50, “Scalar Floating-Point Intermediate
XX2-form Result Handling,” on page 515.
src
-Infinity -NZF -Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN v dQNaN v Q(src)
v +Zero v +Zero v SQRT(src) v +Infinity v src
vxsqrt_flag 1 vxsqrt_flag 1 vxsnan_flag 1
Explanation:
src The double-precision floating-point value in doubleword element i of VSR[XB] (where i c {0,1}).
dQNaN Default quiet NaN (0x7FF8_0000_0000_0000).
NZF Nonzero finite number.
SQRT(x) The unbounded-precision square root of the floating-point value x.
Q(x) Return a QNaN with the payload of x.
v The intermediate result having unbounded signficand precision and unbounded exponent range.
VSX Vector Square Root Single-Precision See Table 50, “Scalar Floating-Point Intermediate
XX2-form Result Handling,” on page 515.
src
-Infinity -NZF -Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN v dQNaN v Q(src)
v +Zero v +Zero v SQRT(src) v +Infinity v src
vxsqrt_flag 1 vxsqrt_flag 1 vxsnan_flag 1
Explanation:
src The single-precision floating-point value in word element i of VSR[XB] (where i c {0,1,2,3}).
dQNaN Default quiet NaN (0x7FC0_0000).
NZF Nonzero finite number.
SQRT(x) The unbounded-precision square root of the floating-point value x.
Q(x) Return a QNaN with the payload of x.
v The intermediate result having unbounded signficand precision and unbounded exponent range.
VSX Vector Subtract Double-Precision The result is placed into doubleword element i of
XX3-form VSR[XT] in double-precision format.
1. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared,
and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two expo-
nents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an interme-
diate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation.
2. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the num-
ber of bits the significand was shifted.
src2
-Infinity -NZF -Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN v Q(src2)
-Infinity v –Infinity v –Infinity v –Infinity v –Infinity v –Infinity v src2
vxisi_flag 1 vxsnan_flag 1
v Q(src2)
-NZF v +Infinity v S(src1,src2) v src1 v src1 v S(src1,src2) v –Infinity v src2
vxsnan_flag 1
v Q(src2)
-Zero v +Infinity v –src2 v –Zero v Rezd v –src2 v –Infinity v src2
vxsnan_flag 1
v Q(src2)
+Zero v +Infinity v –src2 v Rezd v +Zero v –src2 v –Infinity v src2
vxsnan_flag 1
src1
v Q(src2)
+NZF v +Infinity v S(src1,src2) v src1 v src1 v S(src1,src2) v –Infinity v src2
vxsnan_flag 1
v dQNaN v Q(src2)
+Infinity v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v src2
vxisi_flag 1 vxsnan_flag 1
v src1
QNaN v src1 v src1 v src1 v src1 v src1 v src1 v src1
vxsnan_flag 1
v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1)
SNaN
vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1
Explanation:
src1 The double-precision floating-point value in doubleword element i of VSR[XA] (where i c {0,1}).
src2 The double-precision floating-point value in doubleword element i of VSR[XB] (where i c {0,1}).
dQNaN Default quiet NaN (0x7FF8_0000_0000_0000).
NZF Nonzero finite number.
Rezd Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs).
S(x,y) Return the normalized sum of floating-point value x and negated floating-point value y, having unbounded range and precision.
Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd).
Q(x) Return a QNaN with the payload of x.
v The intermediate result having unbounded signficand precision and unbounded exponent range.
VSX Vector Subtract Single-Precision The result is placed into word element i of VSR[XT]
XX3-form in single-precision format.
do i=0 to 127 by 32
reset_xflags() VSR Data Layout for xvsubsp
src1 VSR[XA]{i:i+31}
src2 VSR[XB]{i:i+31} src1 = VSR[XA]
v{0:inf} AddSP(src1,NegateSP(src2)) SP SP SP SP
result{i:i+31} RoundToSP(RN,v)
if(vxsnan_flag) then SetFX(VXSNAN) src2 = VSR[XB]
if(vxisi_flag) then SetFX(VXISI)
if(ox_flag) then SetFX(OX) SP SP SP SP
if(ux_flag) then SetFX(UX)
tgt = VSR[XT]
if(xx_flag) then SetFX(XX)
ex_flag ex_flag | (VE & vxsnan_flag) SP SP SP SP
ex_flag ex_flag | (VE & vxisi_flag) 0 32 64 96 127
ex_flag ex_flag | (OE & ox_flag)
ex_flag ex_flag | (UE & ux_flag)
ex_flag ex_flag | (XE & xx_flag)
end
1. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared,
and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two expo-
nents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an interme-
diate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation.
2. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the num-
ber of bits the significand was shifted.
src2
-Infinity -NZF -Zero +Zero +NZF +Infinity QNaN SNaN
v dQNaN v Q(src2)
-Infinity v –Infinity v –Infinity v –Infinity v –Infinity v –Infinity v src2
vxisi_flag 1 vxsnan_flag 1
v Q(src2)
-NZF v +Infinity v S(src1,src2) v src1 v src1 v S(src1,src2) v –Infinity v src2
vxsnan_flag 1
v Q(src2)
-Zero v +Infinity v –src2 v –Zero v Rezd v –src2 v –Infinity v src2
vxsnan_flag 1
v Q(src2)
+Zero v +Infinity v –src2 v Rezd v +Zero v –src2 v –Infinity v src2
vxsnan_flag 1
src1
v Q(src2)
+NZF v +Infinity v S(src1,src2) v src1 v src1 v S(src1,src2) v –Infinity v src2
vxsnan_flag 1
v dQNaN v Q(src2)
+Infinity v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v src2
vxisi_flag 1 vxsnan_flag 1
v src1
QNaN v src1 v src1 v src1 v src1 v src1 v src1 v src1
vxsnan_flag 1
v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1) v Q(src1)
SNaN
vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1
Explanation:
src1 The single-precision floating-point value in word element i of VSR[XA] (where i c {0,1,2,3}).
src2 The single-precision floating-point value in word element i of VSR[XB] (where i c {0,1,2,3}).
dQNaN Default quiet NaN (0x7FC0_0000).
NZF Nonzero finite number.
Rezd Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs).
S(x,y) Return the normalized sum of floating-point value x and negated floating-point value y, having unbounded range and precision.
Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd).
Q(x) Return a QNaN with the payload of x.
v The intermediate result having unbounded signficand precision and unbounded exponent range.
VSX Vector Test for software Divide fg_flag is set to 1 for any of the following
Double-Precision XX3-form conditions.
– src1 is an infinity.
xvtdivdp BF,XA,XB – src2 is a zero, an infinity, or a denormalized
value.
60 BF // A B 125 AX BX /
0 6 9 11 16 21 29 30 31 CR field BF is set to the value
0b1 || fg_flag || fe_flag || 0b0.
XA AX || A
XB BX || B
eq_flag 0b0
Special Registers Altered
gt_flag 0b0 CR[BF]
do i=0 to 127 by 64
src1 VSR[XA]{i:i+63} VSR Data Layout for xvtdivdp
src2 VSR[XB]{i:i+63} src1 = VSR[XA]
e_a src1{1:11} - 1023
e_b src2{1:11} - 1023 .dword[0] .dword[1]
fe_flag fe_flag | IsNaN(src1) | IsInf(src1) |
IsNaN(src2) | IsInf(src2) | IsZero(src2) | src2 = VSR[XB]
( e_b <= -1022 ) |
.dword[0] .dword[1]
( e_b >= 1021 ) |
( !IsZero(src1) & ( (e_a - e_b) >= 1023 ) ) | 0 64 127
( !IsZero(src1) & ( (e_a - e_b) <= -1021 ) ) |
( !IsZero(src1) & ( e_a <= -970 ) )
fg_flag fg_flag | IsInf(src1) | IsInf(src2) |
IsZero(src2) | IsDen(src2)
end
fe_flag is initialized to 0.
fg_flag is initialized to 0.
VSX Vector Test for software Divide fg_flag is set to 1 for any of the following
Single-Precision XX3-form conditions.
– src1 is an infinity.
xvtdivsp BF,XA,XB – src2 is a zero, an infinity, or a denormalized
value.
60 BF // A B 93 AX BX /
0 6 9 11 16 21 29 30 31 CR field BF is set to the value
0b1 || fg_flag || fe_flag || 0b0.
XA AX || A
XB BX || B
eq_flag 0b0
Special Registers Altered
gt_flag 0b0 CR[BF]
fe_flag is initialized to 0.
fg_flag is initialized to 0.
VSX Vector Test for software Square Root VSX Vector Test for software Square Root
Double-Precision XX2-form Single-Precision XX2-form
xvtsqrtdp BF,XB xvtsqrtsp BF,XB
XB BX || B XB BX || B
fe_flag 0b0 fe_flag 0b0
fg_flag 0b0 fg_flag 0b0
For each vector element i from 0 to 1, do the following. For each vector element i from 0 to 3, do the following.
Let src be the double-precision floating-point Let src be the single-precision floating-point
operand in doubleword element i of VSR[XB]. operand in word element i of VSR[XB].
Let e_b be the unbiased exponent of src. Let e_b be the unbiased exponent of src.
fe_flag is set to 1 for any of the following fe_flag is set to 1 for any of the following
conditions. conditions.
– src is a zero, a NaN, an infinity, or a negative – src is a zero, a NaN, an infinity, or a negative
value. value.
– e_b is less than or equal to -970. – e_b is less than or equal to -103.
fg_flag is set to 1 for the following condition. fg_flag is set to 1 for the following condition.
– src is a zero, an infinity, or a denormalized – src is a zero, an infinity, or a denormalized
value. value.
VSR Data Layout for xvtsqrtdp VSR Data Layout for xvtsqrtsp
src = VSR[XB] src = VSR[XB]
.dword[0] .dword[1] .word[0] .word[1] .word[2] .word[3]
0 64 127 0 32 64 96 127
VSX Vector Test Data Class Double-Precision Let XB be the sum 32×BX + B.
XX2-form Let XT be the sum 32×TX + T.
Let DCMX be the value dc concatenated with dm
xvtstdcdp XT,XB,DCMX concatenated with dx.
60 T dx B 15 dc 5 dmBXTX
0 6 11 16 21 25 26 29 30 31
For each integer value i from 0 to 1, do the following.
Let src be the double-precision floating-point
if MSR.VSX=0 then VSX_Unavailable() value in doubleword element i of VSR[XB].
if match = 1 then
VSR[32×TX+T].dword[i] 0xFFFF_FFFF_FFFF_FFFF
else
VSR[32×TX+T].dword[i] 0x0000_0000_0000_0000
end
VSX Vector Test Data Class Single-Precision Let XB be the sum 32×BX + B.
XX2-form Let XT be the sum 32×TX + T.
Let DCMX be the value dc concatenated with dm
xvtstdcsp XT,XB,DCMX concatenated with dx.
60 T dx B 13 dc 5 dmBXTX
0 6 11 16 21 25 26 29 30 31
For each integer value i from 0 to 3, do the following.
Let src be the single-precision floating-point value
if MSR.VSX=0 then VSX_Unavailable() in word element i of VSR[XB].
if match = 1 then
VSR[32×TX+T].dword[i] 0xFFFF_FFFF
else
VSR[32×TX+T].dword[i] 0x0000_0000
end
60 T 0 B 475 BX TX 60 T 8 B 475 BX TX
0 6 11 16 21 30 31 0 6 11 16 21 30 31
do i = 0 to 1 do i = 0 to 3
src VSR[32×BX+B].dword[i] src VSR[32×BX+B].word[i]
VSR[32×TX+T].dword[i] EXTZ64(src.bit[1:11]) VSR[32×TX+T].word[i] EXTZ32(src.bit[1:8])
end end
For each integer value i from 0 to 1, do the following. For each integer value i from 0 to 3, do the following.
Let src be the double-precision floating-point Let src be the single-precision floating-point value
value in doubleword element i of VSR[XB]. in word element i of VSR[XB].
The value of the exponent field in src is placed The value of the exponent field in src is placed
into doubleword element i of VSR[XT] in unsigned into word element i of VSR[XT] in unsigned integer
integer format. format.
60 T 1 B 475 BX TX 60 T 9 B 475 BX TX
0 6 11 16 21 30 31 0 6 11 16 21 30 31
do i = 0 to 1 do i = 0 to 3
src VSR[32×BX+B].dword[i] src VSR[32×BX+B].word[i]
exponent EXTZ(src.bit[1:11]) exponent EXTZ(src.bit[1:8])
fraction EXTZ64(src.bit[12:63]) fraction EXTZ32(src.bit[9:31])
if (exponent != 0) & (exponent != 2047) then if (exponent != 0) & (exponent != 255) then
fraction fraction | 0x0010_0000_0000_0000 fraction fraction | 0x0080_0000
For each integer value i from 0 to 1, do the following. For each integer value i from 0 to 3, do the following.
Let src be the double-precision floating-point Let src be the single-precision floating-point value
value in doubleword element i of VSR[XB]. in word element i of VSR[XB].
The significand of src is placed into doubleword The significand of src is placed into word element
element i of VSR[XT] in unsigned integer format. If i of VSR[XT] in unsigned integer format. If src is a
src is a normal value, the implicit leading bit is set normal value, the implicit leading bit is set to 1.
to 1.
Special Registers Altered:
Special Registers Altered: None
None
VSX Vector Byte-Reverse Quadword XX2-form VSX Vector Byte-Reverse Word XX2-form
xxbrq XT,XB xxbrw XT,XB
do i = 0 to 15 do i = 0 to 3
VSR[32×TX+T].byte[i] VSR[32×BX+B].byte[15-i] do j = 0 to 3
end VSR[32×TX+T].word[i].byte[j] VSR[32×BX+B].word[i].byte[3-j]
end
Let XT be the value 32×TX + T. end
Let XB be the value 32×BX + B.
Let XT be the value 32×TX + T.
For each integer value i from 0 to 15, do the following. Let XB be the value 32×BX + B.
The contents of byte sub-element 15-i of VSR[XB]
are placed into byte sub-element i of VSR[XT]. For each integer value i from 0 to 3, do the following.
The contents of byte 3 of word element i of
Special Registers Altered: VSR[XB] are placed into byte 0 of word element i
None of VSR[XT].
VSX Vector Extract Unsigned Word XX2-form VSX Vector Insert Word XX2-form
xxextractuw XT,XB,UIM xxinsertw XT,XB,UIM
Let XT be the value 32×TX + T. The contents of word element 1 of VSR[XB] are placed
Let XB be the value 32×BX + B. into byte elements UIM:UIM+3 of VSR[XT]. The contents
of the remaining byte elements of VSR[XT] are not
The contents of byte elements UIM:UIM+3 of VSR[XB] modified.
are placed into word element 1 of VSR[XT]. The
contents of the remaining word elements of VSR[XT] If the value of UIM is greater than 12, the results are
are set to 0. undefined.
If the value of UIM is greater than 12, the results are Special Registers Altered:
undefined. None
VSX Logical AND XX3-form VSX Logical AND with Complement XX3-form
xxland XT,XA,XB xxlandc XT,XA,XB
The contents of VSR[XA] are ANDed with the contents The contents of VSR[XA] are ANDed with the
of VSR[XB] and the result is placed into VSR[XT]. complement of the contents of VSR[XB] and the result is
placed into VSR[XT].
Special Registers Altered
None Special Registers Altered
None
VSR Data Layout for xxland
src1 = VSR[XA] VSR Data Layout for xxland
src1 = VSR[XA]
src2 = VSR[XB]
src2 = VSR[XB]
tgt = VSR[XT]
tgt = VSR[XT]
0 127
0 127
The contents of VSR[XA] are exclusive-ORed with the The contents of VSR[XA] are ANDed with the contents
contents of VSR[XB] and the complemented result is of VSR[XB] and the complemented result is placed into
placed into VSR[XT]. VSR[XT].
VSR Data Layout for xxleqv VSR Data Layout for xxlnand
src = VSR[XA] src = VSR[XA]
0 127 0 127
The contents of VSR[XA] are ORed with the The contents of VSR[XA] are ORed with the contents of
complement of the contents of VSR[XB] and the result is VSR[XB] and the complemented result is placed into
placed into VSR[XT]. VSR[XT].
VSR Data Layout for xxlorc VSR Data Layout for xxlnor
src1 = VSR[XA] src1 = VSR[XA]
0 127 0 127
The contents of VSR[XA] are ORed with the contents of The contents of VSR[XA] are exclusive-ORed with the
VSR[XB] and the result is placed into VSR[XT]. contents of VSR[XB] and the result is placed into
VSR[XT].
Special Registers Altered
None Special Registers Altered
None
VSR Data Layout for xxlor
src1 = VSR[XA] VSR Data Layout for xxlxor
src1 = VSR[XA]
src2 = VSR[XB]
src2 = VSR[XB]
tgt = VSR[XT]
tgt = VSR[XT]
0 127
0 127
VSX Merge High Word XX3-form VSX Merge Low Word XX3-form
xxmrghw XT,XA,XB xxmrglw XT,XA,XB
60 T A B 18 AX BX TX 60 T A B 50 AXBX TX
0 6 11 16 21 29 30 31 0 6 11 16 21 29 30 31
The contents of word element 0 of VSR[XA] are placed The contents of word element 2 of VSR[XA] are placed
into word element 0 of VSR[XT]. into word element 0 of VSR[XT].
The contents of word element 0 of VSR[XB] are placed The contents of word element 2 of VSR[XB] are placed
into word element 1 of VSR[XT]. into word element 1 of VSR[XT].
The contents of word element 1 of VSR[XA] are placed The contents of word element 3 of VSR[XA] are placed
into word element 2 of VSR[XT]. into word element 2 of VSR[XT].
The contents of word element 1 of VSR[XB] are placed The contents of word element 3 of VSR[XB] are placed
into word element 3 of VSR[XT]. into word element 3 of VSR[XT].
VSR Data Layout for xxmrghw VSR Data Layout for xxmrglw
src1 = VSR[XA] src1 = VSR[XA]
.word[0] .word[1] unused unused unused unused .word[2] .word[3]
src2 = VSR[XB] src2 = VSR[XB]
.word[0] .word[1] unused unused unused unused .word[2] .word[3]
tgt = VSR[XT] tgt = VSR[XT]
.word[0] .word[1] .word[2] .word[3] .word[0] .word[1] .word[2] .word[3]
0 32 64 96 127 0 32 64 96 127
60 T A B 26 AXBXTX 60 T A B 58 AXBXTX
0 6 11 16 21 29 30 31 0 6 11 16 21 29 30 31
do i = 0 to 15 do i = 0 to 15
idx pcv.byte[i].bit[3:7] idx pcv.byte[i].bit[3:7]
VSR[32×TX+T].byte[i] src.byte[idx] VSR[32×TX+T].byte[i] src.byte[31-idx]
end end
Let bytes 0:15 of src be the contents of VSR[XA]. Let bytes 0:15 of src be the contents of VSR[XA].
Let bytes 16:31 of src be the contents of VSR[XT]. Let bytes 16:31 of src be the contents of VSR[XT].
Let the permute control vector pcv be the contents of Let the permute control vector pcv be the contents of
VSR[XB]. VSR[XB].
For each integer value i from 0 to 15, do the following. For each integer value i from 0 to 15, do the following.
Let idx be the unsigned integer in bits 3:7 of byte Let idx be the unsigned integer in bits 3:7 of byte
element i of pcv. element i of pcv.
The contents of byte element idx of src is placed The contents of byte element 31-idx of src is
into byte element i of VSR[XT]. placed into byte element i of VSR[XT].
VSX Shift Left Double by Word Immediate VSX Splat Word XX2-form
XX3-form
xxspltw XT,XB,UIM
xxsldwi XT,XA,XB,SHW
60 T /// UIM B 164 BXTX
60 T A B 0 SHW 2 AXBXTX 0 6 11 14 16 21 30 31
0 6 11 16 21 22 24 29 30 31
if MSR.VSX=0 then VSX_Unavailable()
if MSR.VSX=0 then VSX_Unavailable()
VSR[32×TX+T].word[0] VSR[32×BX+B].word[UIM]
source.qword[0] VSR[32×AX+A] VSR[32×TX+T].word[1] VSR[32×BX+B].word[UIM]
source.qword[1] VSR[32×BX+B] VSR[32×TX+T].word[2] VSR[32×BX+B].word[UIM]
VSR[32×TX+T] source.word[SHW:SHW+3] VSR[32×TX+T].word[3] VSR[32×BX+B].word[UIM]
60 T 0 IMM8 360 TX
0 6 11 13 21 31
do i = 0 to 15
VSR[32×TX+T].byte[i] UIM8
end
If (FRB)1:11 > 896 and (FRB)1:11 < 1151 then goto Normal Operand
Zero Operand:
FRT (FRB)
If (FRB)0 = 0 then FPSCRFPRF “+ zero”
If (FRB)0 = 1 then FPSCRFPRF “- zero”
FPSCRFRFI 0b00
Done
Infinity Operand:
FRT (FRB)
If (FRB)0 = 0 then FPSCRFPRF “+ infinity”
If (FRB)0 = 1 then FPSCRFPRF “- infinity”
FPSCRFRFI 0b00
Done
QNaN Operand:
FRT (FRB)0:34 || 290
FPSCRFPRF “QNaN”
FPSCRFR FI 0b00
Done
SNaN Operand:
FPSCRVXSNAN 1
If FPSCRVE = 0 then
Do
FRT0:11 (FRB)0:11
FRT12 1
FRT13:63 (FRB)13:34 || 290
FPSCRFPRF “QNaN”
End
FPSCRFR FI 0b00
Done
Normal Operand:
sign (FRB)0
exp (FRB)1:11 - 1023
frac0:52 0b1 || (FRB)12:63
Round Single(sign,exp,frac0:52,0,0,0)
FPSCRXX FPSCRXX | FPSCRFI
If exp > 127 and FPSCROE = 0 then go to Disabled Exponent Overflow
If exp > 127 and FPSCROE = 1 then go to Enabled Overflow
FRT0 sign
FRT1:11 exp + 1023
FRT12:63 frac1:52
If sign = 0 then FPSCRFPRF “+ normal number”
If sign = 1 then FPSCRFPRF “- normal number”
Done
Round Single(sign,exp,frac0:52,G,R,X):
inc 0
lsb frac23
gbit frac24
rbit frac25
xbit (frac26:52||G||R||X)0
If FPSCRRN = 0b00 then /* Round to Nearest */
Do /* comparisons ignore u bits */
If sign || lsb || gbit || rbit || xbit = 0bu11uu then inc 1
If sign || lsb || gbit || rbit || xbit = 0bu011u then inc 1
If sign || lsb || gbit || rbit || xbit = 0bu01u1 then inc 1
End
If FPSCRRN = 0b10 then /* Round toward + Infinity */
Do /* comparisons ignore u bits */
If sign || lsb || gbit || rbit || xbit = 0b0u1uu then inc 1
If sign || lsb || gbit || rbit || xbit = 0b0uu1u then inc 1
If sign || lsb || gbit || rbit || xbit = 0b0uuu1 then inc 1
End
If FPSCRRN = 0b11 then /* Round toward - Infinity */
Do /* comparisons ignore u bits */
If sign || lsb || gbit || rbit || xbit = 0b1u1uu then inc 1
If sign || lsb || gbit || rbit || xbit = 0b1uu1u then inc 1
If sign || lsb || gbit || rbit || xbit = 0b1uuu1 then inc 1
End
frac0:23 frac0:23 + inc
If carry_out = 1 then
Do
frac0:23 0b1 || frac0:22
exp exp + 1
End
frac24:52 290
FPSCRFR inc
FPSCRFI gbit | rbit | xbit
Return
if Floating Convert To Integer Word Unsigned with round toward Zero then do
round_mode 0b01
tgt_precision “32-bit unsigned integer”
end
if Floating Convert To Integer Doubleword Unsigned with round toward Zero then do
round_mode 0b01
tgt_precision “64-bit unsigned integer”
end
sign (FRB)0
if (FRB)1:11 = 2047 and (FRB)12:63 = 0 then goto Infinity Operand
if (FRB)1:11 = 2047 and (FRB)12 = 0 then goto SNaN Operand
if (FRB)1:11 = 2047 and (FRB)12 = 1 then goto QNaN Operand
if (FRB)1:11 > 1086 then goto Large Operand
Infinity Operand:
FPSCRFR 0b0
FPSCRFI 0b0
FPSCRVXCVI 0b1
if FPSCRVE = 0 then do
if tgt_precision = “32-bit signed integer” then do
if sign=0 then FRT 0xUUUU_UUUU_7FFF_FFFF
if sign=1 then FRT 0xUUUU_UUUU_8000_0000
end
else if tgt_precision = “32-bit unsigned integer” then do
if sign=0 then FRT 0xUUUU_UUUU_FFFF_FFFF
if sign=1 then FRT 0xUUUU_UUUU_0000_0000
end
else if tgt_precision = “64-bit signed integer” then do
if sign=0 then FRT 0x7FFF_FFFF_FFFF_FFFF
if sign=1 then FRT 0x8000_0000_0000_0000
end
SNaN Operand:
FPSCRFR 0b0
FPSCRFI 0b0
FPSCRVXSNAN 0b1
FPSCRVXCVI 0b1
if FPSCRVE = 0 then do
if tgt_precision = “32-bit signed integer” then FRT 0xUUUU_UUUU_8000_0000
if tgt_precision = “64-bit signed integer” then FRT 0x8000_0000_0000_0000
if tgt_precision = “32-bit unsigned integer” then FRT 0xUUUU_UUUU_0000_0000
if tgt_precision = “64-bit unsigned integer” then FRT 0x0000_0000_0000_0000
FPSCRFPRF 0bUUUUU
end
done
QNaN Operand:
FPSCRFR 0b0
FPSCRFI 0b0
FPSCRVXCVI 0b1
if FPSCRVE = 0 then do
if tgt_precision = “32-bit signed integer” then FRT 0xUUUU_UUUU_8000_0000
if tgt_precision = “64-bit signed integer” then FRT 0x8000_0000_0000_0000
if tgt_precision = “32-bit unsigned integer” then FRT 0xUUUU_UUUU_0000_0000
if tgt_precision = “64-bit unsigned integer” then FRT 0x0000_0000_0000_0000
FPSCRFPRF 0bUUUUU
end
done
Large Operand:
FPSCRFR 0b0
FPSCRFI 0b0
FPSCRVXCVI 0b1
if FPSCRVE = 0 then do
if tgt_precision = “32-bit signed integer” then do
if sign = 0 then FRT 0xUUUU_UUUU_7FFF_FFFF
if sign = 1 then FRT 0xUUUU_UUUU_8000_0000
end
else if tgt_precision = “64-bit signed integer” then do
if sign = 0 then FRT 0x7FFF_FFFF_FFFF_FFFF
if sign = 1 then FRT 0x8000_0000_0000_0000
end
else if tgt_precision = “32-bit unsigned integer” then do
if sign = 0 then FRT 0xUUUU_UUUU_FFFF_FFFF
if sign = 1 then FRT 0xUUUU_UUUU_0000_0000
end
else if tgt_precision = “64-bit unsigned integer” then do
if sign = 0 then FRT 0xFFFF_FFFF_FFFF_FFFF
if sign = 1 then FRT 0x0000_0000_0000_0000
end
FPSCRFPRF 0bUUUUU
end
done
Zero Operand:
FPSCRFR 0b00
FPSCRFI 0b00
FPSCRFPRF “+ zero”
FRT 0x0000_0000_0000_0000
done
lsb frac52
gbit frac53
rbit frac54
xbit frac55:63 > 0
end
FPSCRFR inc
FPSCRFI gbit | rbit | xbit
FPSCRXX FPSCRXX | FPSCRFI
return
sign (FRB)0
exp (FRB)1:11 - 1023 /* exp - bias */
frac0:52 0b1 || (FRB)12:63
gbit || rbit || xbit 0b000
Do i = 1, 52 - exp
frac0:52 || gbit || rbit || xbit 0b0 || frac0:52 || gbit || (rbit | xbit)
End
Do i = 2, 52 - exp
frac0:52 frac1:52 || 0b0
End
FRT0 sign
FRT1:11 exp + 1023
FRT12:63 frac1:52
SNaN Operand:
FPSCRVXSNAN 1
If FPSCRVE = 0 then
Do
FRT (FRB)
FRT12 1
FPSCRFPRF “QNaN”
End
FPSCRFR FI 0b00
Done
QNaN Operand:
FRT (FRB)
FPSCRFPRF “QNaN”
FPSCRFR FI 0b00
Done
Zero Operand:
If (FRB)0 = 0 then
Do
FRT 0x0000_0000_0000_0000
FPSCRFPRF “+ zero”
End
Else
Do
FRT 0x8000_0000_0000_0000
FPSCRFPRF “- zero”
End
FPSCRFR FI 0b00
Done
Small Operand:
If inst = Floating Round to Integer Nearest and
(FRB)1:11 < 1022 then goto Zero Operand
If inst = Floating Round to Integer Toward Zero
then goto Zero Operand
If inst = Floating Round to Integer Plus and (FRB)0
= 1 then goto Zero Operand
If inst = Floating Round to Integer Minus and
(FRB)0 = 0 then goto Zero Operand
If (FRB)0 = 0 then
Do
FRT 0x3FF0_0000_0000_0000
/* value = 1.0 */
FPSCRFPRF “+ normal number”
End
Else
Do
FRT 0xBFF0_0000_0000_0000
/* value = -1.0 */
FPSCRFPRF “- normal number”
End
FPSCRFR FI 0b00
Done
Large Operand:
FRT (FRB)
The trailing significand field of the decimal floating-point can be applied or reversed using simple Boolean oper-
data format is encoded using Densely Packed Decimal ations. In the following examples, a 3-digit BCD num-
(DPD). DPD encoding is a compression technique ber is represented as (abcd)(efgh)(ijkm), a 10-bit DPD
which supports the representation of decimal integers number is represented as (pqr)(stu)(v)(wxy), and the
of arbitrary length. Translation operates on three Boolean operations, & (AND), | (OR), and ¬ (NOT) are
Binary Coded Decimal (BCD) digits at a time com- used.
pressing the 12 bits into 10 bits with an algorithm that
B.1 BCD-to-DPD Translation with the DPD entries shown in hexadecimal format.
The BCD number is produced by replacing ‘_’ in the
The translation from a 3-digit BCD number to a 10-bit leftmost column with the corresponding digit along the
DPD can be performed through the following Boolean top row. The table is split into two halves, with the right
operations. half being a continuation of the left half.
vwxst abcd efgh ijkm DPD Code BCD Value DPD Code BCD Value
0---- 0pqr 0stu 0wxy 0x06E 0x0EE
100-- 0pqr 0stu 100y (0x16E) 888 (0x1EE) 988
101-- 0pqr 100u 0sty (0x26E) (0x2EE)
110-- 100r 0stu 0pqy (0x36E) (0x3EE)
11100 100r 100u 0pqy 0x06F 0x0EF
11101 100r 0pqu 100y (0x16F) 889 (0x1EF) 989
11110 0pqr 100u 100y (0x26F) (0x2EF)
11111 100r 100u 100y (0x36F) (0x3EF)
The full translation of the 10-bit DPD to a 3-digit BCD 0x07E 0x0FE
number is shown in Table 132 on page 790. The 10-bit (0x17E) 898 (0x1FE) 998
DPD index is produced by concatenating the 6-bit (0x27E) (0x2FE)
value shown in the left column with the 4-bit index
(0x37E) (0x3FE)
along the top row, both represented in hexadecimal.
The values in parentheses are non-preferred transla- 0x07F 0x0FF
tions and are explained further in the following section. (0x17F) 899 (0x1FF) 999
(0x27F) (0x2FF)
B.3 Preferred DPD encoding (0x37F) (0x3FF)
In order to make assembler language programs simpler to write and easier to understand, a set of extended mne-
monics and symbols is provided that defines simple shorthand for the most frequently used forms of Branch Condi-
tional, Compare, Trap, Rotate and Shift, and certain other instructions.
Assemblers should provide the extended mnemonics and symbols listed here, and may provide others.
C.1 Symbols
The following symbols are defined for use in instructions (basic or extended mnemonics) that specify a Condition
Register field or a Condition Register bit. The first five (lt, ..., un) identify a bit number within a CR field. The remainder
(cr0, ..., cr7) identify a CR field. An expression in which a CR field symbol is multiplied by 4 and then added to a
bit-number-within-CR-field symbol and 32 can be used to identify a CR bit.
The extended mnemonics in Sections C.2.2 and C.3 require identification of a CR bit: if one of the CR field symbols is
used, it must be multiplied by 4 and added to a bit-number-within-CR-field (value in the range 0-3, explicit or sym-
bolic) and 32. The extended mnemonics in Sections C.2.3 and C.5 require identification of a CR field: if one of the CR
field symbols is used, it must not be multiplied by 4 or added to 32. (For the extended mnemonics in Section C.2.3,
the bit number within the CR field is part of the extended mnemonic. The programmer identifies the CR field, and the
Assembler does the multiplication and addition required to produce a CR bit number for the BI field of the underlying
basic mnemonic.)
Examples
1. Decrement CTR and branch if it is still nonzero (closure of a loop controlled by a count loaded into CTR).
bdnz target (equivalent to: bc 16,0,target)
2. Same as (1) but branch only if CTR is nonzero and condition in CR0 is “equal”.
bdnzt eq,target (equivalent to: bc 8,2,target)
3. Same as (2), but “equal” condition is in CR5.
bdnzt 4cr5+eq,target (equivalent to: bc 8,22,target)
4. Branch if bit 59 of CR is 0.
bf 27,target (equivalent to: bc 4,27,target)
5. Same as (4), but set the Link Register. This is a form of conditional “call”.
bfl 27,target (equivalent to: bcl 4,27,target)
Code Meaning
lt Less than
le Less than or equal
eq Equal
ge Greater than or equal
gt Greater than
nl Not less than
ne Not equal
ng Not greater than
so Summary overflow
ns Not summary overflow
un Unordered (after floating-point comparison)
nu Not unordered (after floating-point comparison)
These codes are reflected in the mnemonics shown in Table 134.
Examples
1. Branch if CR0 reflects condition “not equal”.
bne target (equivalent to: bc 4,2,target)
2. Same as (1), but condition is in CR3.
Examples
1. Branch if CR0 reflects condition “less than”, specifying that the branch should be predicted to be taken.
blt+ target
2. Same as (1), but target address is in the Link Register and the branch should be predicted not to be taken.
bltlr-
Examples
1. Set CR bit 57.
crset 25 (equivalent to: creqv 25,25,25)
2. Clear the SO bit of CR0.
crclr so (equivalent to: crxor 3,3,3)
3. Same as (2), but SO bit to be cleared is in CR3.
crclr 4cr3+so (equivalent to: crxor 15,15,15)
4. Invert the EQ bit.
crnot eq,eq (equivalent to: crnor 2,2,2)
5. Same as (4), but EQ bit to be inverted is in CR4, and the result is to be placed into the EQ bit of CR5.
crnot 4cr5+eq,4cr4+eq (equivalent to: crnor 22,18,18)
C.4.2 Subtract
The Subtract From instructions subtract the second operand (RA) from the third (RB). Extended mnemonics are pro-
vided that use the more “normal” order, in which the third operand is subtracted from the second. Both these mne-
monics can be coded with a final “o” and/or “.” to cause the OE and/or Rc bit to be set in the underlying instruction.
sub Rx,Ry,Rz (equivalent to: subf Rx,Rz,Ry)
subc Rx,Ry,Rz (equivalent to: subfc Rx,Rz,Ry)
Examples
1. Compare register Rx and immediate value 100 as unsigned 64-bit integers and place result into CR0.
cmpldi Rx,100 (equivalent to: cmpli 0,1,Rx,100)
2. Same as (1), but place result into CR4.
cmpldi cr4,Rx,100 (equivalent to: cmpli 4,1,Rx,100)
3. Compare registers Rx and Ry as signed 64-bit integers and place result into CR0.
cmpd Rx,Ry (equivalent to: cmp 0,1,Rx,Ry)
Examples
1. Compare bits 32:63 of register Rx and immediate value 100 as signed 32-bit integers and place result into CR0.
cmpwi Rx,100 (equivalent to: cmpi 0,0,Rx,100)
2. Same as (1), but place result into CR4.
cmpwi cr4,Rx,100 (equivalent to: cmpi 4,0,Rx,100)
3. Compare bits 32:63 of registers Rx and Ry as unsigned 32-bit integers and place result into CR0.
cmplw Rx,Ry (equivalent to: cmpl 0,0,Rx,Ry)
Examples
1. Trap if register Rx is not 0.
tdnei Rx,0 (equivalent to: tdi 24,Rx,0)
2. Same as (1), but comparison is to register Ry.
tdne Rx,Ry (equivalent to: td 24,Rx,Ry)
3. Trap if bits 32:63 of register Rx, considered as a 32-bit quantity, are logically greater than 0x7FF.
twlgti Rx,0x7FF (equivalent to: twi 1,Rx,0x7FF)
4. Trap unconditionally.
trap (equivalent to: tw 31,0,0)
5. Trap unconditionally with immediate parameters Rx and Ry
tdu Rx,Ry (equivalent to: td 31,Rx,Ry)
Code Meaning
lt Less than
eq Equal
gt Greater than
These codes are reflected in the mnemonics shown in Table 139.
Examples
1. Set register Rx to Ry if the LT bit is set in CR0, and to Rz otherwise.
isellt Rx,Ry,Rz (equivalent to: isel Rx,Ry,Rz,0)
2. Set register Rx to Ry if the GT bit is set in CR0, and to Rz otherwise.
iselgt Rx,Ry,Rz (equivalent to: isel Rx,Ry,Rz,1)
3. Set register Rx to Ry if the EQ bit is set in CR0, and to Rz otherwise.
iseleq Rx,Ry,Rz (equivalent to: isel Rx,Ry,Rz,2)
Examples
1. Extract the sign bit (bit 0) of register Ry and place the result right-justified into register Rx.
extrdi Rx,Ry,1,0 (equivalent to: rldicl Rx,Ry,1,63)
2. Insert the bit extracted in (1) into the sign bit (bit 0) of register Rz.
insrdi Rz,Rx,1,0 (equivalent to: rldimi Rz,Rx,63,0)
3. Shift the contents of register Rx left 8 bits.
sldi Rx,Rx,8 (equivalent to: rldicr Rx,Rx,8,55)
4. Clear the high-order 32 bits of register Ry and place the result into register Rx.
clrldi Rx,Ry,32 (equivalent to: rldicl Rx,Ry,0,32)
Examples
1. Extract the sign bit (bit 32) of register Ry and place the result right-justified into register Rx.
extrwi Rx,Ry,1,0 (equivalent to: rlwinm Rx,Ry,1,31,31)
2. Insert the bit extracted in (1) into the sign bit (bit 32) of register Rz.
insrwi Rz,Rx,1,0 (equivalent to: rlwimi Rz,Rx,31,0,0)
3. Shift the contents of register Rx left 8 bits, clearing the high-order 32 bits.
slwi Rx,Rx,8 (equivalent to: rlwinm Rx,Rx,8,0,23)
4. Clear the high-order 16 bits of the low-order 32 bits of register Ry and place the result into register Rx, clearing
the high-order 32 bits of register Rx.
clrlwi Rx,Ry,16 (equivalent to: rlwinm Rx,Ry,0,16,31)
Examples
1. Copy the contents of register Rx to the XER.
mtxer Rx (equivalent to: mtspr 1,Rx)
2. Copy the contents of the LR to register Rx.
mflr Rx (equivalent to: mfspr Rx,8)
3. Copy the contents of register Rx to the CTR.
mtctr Rx (equivalent to: mtspr 9,Rx)
Load Immediate
The addi and addis instructions can be used to load an immediate value into a register. Extended mnemonics are
provided to convey the idea that no addition is being performed but merely data movement (from the immediate field
of the instruction to a register).
Load a 16-bit signed immediate value into register Rx.
li Rx,value (equivalent to: addi Rx,0,value)
Load a 16-bit signed immediate value, shifted left by 16 bits, into register Rx.
lis Rx,value (equivalent to: addis Rx,0,value)
Load Address
This mnemonic permits computing the value of a base-displacement operand, using the addi instruction which nor-
mally requires separate register and immediate operands.
la Rx,D(Ry) (equivalent to: addi Rx,Ry,D)
The la mnemonic is useful for obtaining the address of a variable specified by name, allowing the Assembler to sup-
ply the base register number and compute the displacement. If the variable v is located at offset Dv bytes from the
address in register Rv, and the Assembler has been told to use register Rv as a base for references to the data struc-
ture containing v, then the following line causes the address of v to be loaded into register Rx.
la Rx,v (equivalent to: addi Rx,Rv,Dv)
Move Register
Several Power ISA instructions can be coded in a way such that they simply copy the contents of one register to
another. An extended mnemonic is provided to convey the idea that no computation is being performed but merely
data movement (from one register to another).
The following instruction copies the contents of register Ry to register Rx. This mnemonic can be coded with a final
“.” to cause the Rc bit to be set in the underlying instruction.
mr Rx,Ry (equivalent to: or Rx,Ry,Ry)
Complement Register
Several Power ISA instructions can be coded in a way such that they complement the contents of one register and
place the result into another register. An extended mnemonic is provided that allows this operation to be coded easily.
The following instruction complements the contents of register Ry and places the result into register Rx. This mne-
monic can be coded with a final “.” to cause the Rc bit to be set in the underlying instruction.
not Rx,Ry (equivalent to: nor Rx,Ry,Ry)
Book II:
address” (VA) space managed by the operating sys- any Cache Management instruction
tem.
An access that is not atomic is performed as a set of
Each effective address is translated to a real address smaller disjoint atomic accesses. If the non-atomic
(i.e., to an address of a byte in real storage or on an I/O access is caused by an instruction other than a Load/
device) before being used to access storage. The Store Multiple or Move Assist instruction and one of the
hardware accomplishes this, using the address transla- following conditions is satisfied, the non-atomic access
tion mechanism described in Book III. The operating is performed as described in the corresponding list
system manages the real (physical) storage resources item. The first list item matching a given situation
of the system, by setting up the tables and other infor- applies.
mation used by the hardware address translation The storage operand is one quadword and is dou-
mechanism. bleword-aligned:
the access is performed as two disjoint aligned
doubleword atomic accesses.
In general, real storage may not be large enough to
The storage operand is at least eight bytes long
map all the virtual pages used by the currently active
and is word-aligned:
applications. With support provided by hardware, the
the access is performed as a set of disjoint atomic
operating system can attempt to use the available real
accesses each of which consists of one or more
pages to map a sufficient set of virtual pages of the
aligned words.
applications. If a sufficient set is maintained, “paging”
The storage operand is at least four bytes long and
activity is minimized. If not, performance degradation
is halfword-aligned:
is likely.
the access is performed as a set of disjoint atomic
The operating system can support restricted access to accesses each of which consists of one or more
virtual pages (including read/write, read only, and no aligned halfwords.
access; see Book III), based on system standards (e.g.,
In all other cases the number, length, and alignment of
program code might be read only) and application
the component disjoint atomic accesses are implemen-
requests.
tation-dependent. In all cases the relative order in
which the component disjoint atomic accesses are per-
formed is implementation-dependent.
1.4 Single-Copy Atomicity
The results for several combinations of loads and
An access is single-copy atomic, or simply atomic, if it stores to the same or overlapping locations are
is always performed in its entirety with no visible frag- described below.
mentation. Atomic accesses are thus serialized: each
happens in its entirety in some order, even when that 1. When two processors perform atomic stores to
order is not specified in the program or enforced locations that do not overlap, and no other stores
between processors. are performed to those locations, the contents of
those locations are the same as if the two stores
The access caused by an instruction other than a Load/ were performed by a single processor.
Store Multiple or Move Assist instruction is guaranteed
2. When two processors perform atomic stores to the
to be atomic if the storage operand is not larger than a
same storage location, and no other store is per-
doubleword and is aligned (see Section 1.11.1 of Book
formed to that location, the contents of that loca-
I).
tion are the result stored by one of the processors.
Quadword accesses with aligned storage operands are
3. When two processors perform stores that have the
guaranteed to be atomic when caused by the following
same target location and are not guaranteed to be
instructions.
atomic, and no other store is performed to that
lq
location, the result is some combination of the
stq
bytes stored by both processors.
lqarx
stqcx. 4. When two processors perform stores to overlap-
ping locations, and no other store is performed to
Quadword atomicity applies only to storage that is nei-
those locations, the result is some combination of
ther Write Through Required nor Caching Inhibited.
the bytes stored by the processors to the overlap-
The cases described above are the only cases in which
ping bytes. The portions of the locations that do
the access to the storage operand is guaranteed to be
not overlap contain the bytes stored by the proces-
atomic. For example, the access caused by the follow-
sor storing to the location.
ing instructions is not guaranteed to be atomic.
any Load or Store instruction for which the storage 5. When a processor performs an atomic store to a
operand is unaligned location, a second processor performs an atomic
lmw, stmw, lswi, lswx, stswi, stswx load from that location, and no other store is per-
lfdp, lfdpx, stfdp, stfdpx formed to that location, the value returned by the
load is the contents of the location before the store copy the contents of a modified data cache block
or the contents of the location after the store. to main storage (dcbst)
copy the contents of a modified data cache block
6. When a load and a store with the same target loca-
to main storage and make the copy of the block in
tion can be performed simultaneously, and the
the data cache invalid (dcbf or dcbfl)
accesses are not guaranteed to be atomic, and no
other store is performed to that location, the value
returned by the load is some combination of the
contents of the location before the store and the
1.6 Storage Control Attributes
contents of the location after the store. Some operating systems may provide a means to allow
programs to specify the storage control attributes
described in this section. Because the support pro-
vided for these attributes by the operating system may
1.5 Cache Model vary between systems, the details of the specific sys-
tem being used must be known before these attributes
A cache model in which there is one cache for instruc- can be used.
tions and another cache for data is called a “Har-
Storage control attributes are associated with units of
vard-style” cache. This is the model assumed by the
storage that are multiples of the page size. Each stor-
Power ISA, e.g., in the descriptions of the Cache Man-
age access is performed according to the storage con-
agement instructions in Section 4.3. Alternative cache
trol attributes of the specified storage location, as
models may be implemented (e.g., a “combined cache”
described below. The storage control attributes are the
model, in which a single cache is used for both instruc-
following.
tions and data, or a model in which there are several
levels of caches), but they support the programming Write Through Required
model implied by a Harvard-style cache. Caching Inhibited
Memory Coherence Required
The processor is not required to maintain copies of Guarded
storage locations in the instruction cache consistent Strong Access Order
with modifications to those storage locations (e.g.,
modifications caused by Store instructions). These attributes have meaning only when an effective
address is translated by the processor performing the
A location in the data cache is considered to be modi- storage access.
fied in that cache if the location has been modified
(e.g., by a Store instruction) and the modified data have
Programming Note
not been written to main storage.
The Write Through Required and Caching Inhibited
Cache Management instructions are provided so that attributes are mutually exclusive because, as
programs can manage the caches when needed. For described below, the Write Through Required attri-
example, program management of the caches is bute permits the storage location to be in the data
needed when a program generates or modifies code cache while the Caching Inhibited attribute does
that will be executed (i.e., when the program modifies not.
data in storage and then attempts to execute the modi-
fied data as instructions). The Cache Management Storage that is Write Through Required or Caching
instructions are also useful in optimizing the use of Inhibited is not intended to be used for general-pur-
memory bandwidth in such applications as graphics pose programming. For example, the lbarx, lharx,
and numerically intensive computing. The functions lwarx, ldarx, lqarx, stbcx., sthcx., stwcx., stdcx.,
performed by these instructions depend on the storage and stqcx. instructions may cause the system data
control attributes associated with the specified storage storage error handler to be invoked if they specify a
location (see Section 1.6, “Storage Control Attributes”). location in storage having either of these attributes.
To obtain the best performance across the widest
The Cache Management instructions allow the pro- range of implementations, storage that is Write
gram to do the following. Through Required or Caching Inhibited should be
used only when the use of such storage meets spe-
invalidate the copy of storage in an instruction
cific functional or semantic needs or enables a per-
cache block (icbi)
formance optimization.
provide a hint that an instruction will probably soon
be accessed from a specified instruction cache
block (icbt) In the remainder of this section, “Load instruction”
provide a hint that the program will probably soon includes the Cache Management and other instructions
access a specified data cache block (dcbt, dcbtst) that are stated in the instruction descriptions to be
set the contents of a data cache block to zeros “treated as a Load” unless they are explicitly excluded,
(dcbz) and similarly for “Store instruction”.
1.6.1 Write Through Required of those stores as occurring in a conflicting order. This
serialization order is an abstract sequence of values;
A store to a Write Through Required storage location is the physical storage location need not assume each of
performed in main storage. A Store instruction that the values written to it. For example, a processor may
specifies a location in Write Through Required storage update a location several times before the value is writ-
may cause additional locations in main storage to be ten to physical storage. The result of a store operation
accessed. If a copy of the block containing the speci- is not available to every processor or mechanism at the
fied location is retained in the data cache, the store is same instant, and it may be that a processor or mecha-
also performed in the data cache. The store does not nism observes only some of the values that are written
cause the block to be considered to be modified in the to a location. However, when a location is accessed
data cache. atomically and coherently by all processors and mech-
anisms, the sequence of values loaded from the loca-
In general, accesses caused by separate Store instruc- tion by any processor or mechanism during any interval
tions that specify locations in Write Through Required of time forms a subsequence of the sequence of values
storage may be combined into one access. Such com- that the location logically held during that interval. That
bining does not occur if the Store instructions are sepa- is, a processor or mechanism can never load a “newer”
rated by a sync, eieio instruction. value first and then, later, load an “older” value.
Memory coherence is managed in blocks called coher-
1.6.2 Caching Inhibited ence blocks. Their size is implementation-dependent,
An access to a Caching Inhibited storage location is but is larger than a word and is usually the size of a
performed in main storage. A Load instruction that cache block.
specifies a location in Caching Inhibited storage may For storage that is not Memory Coherence Required,
cause additional locations in main storage to be software must explicitly manage memory coherence to
accessed unless the specified location is also Guarded. the extent required by program correctness. The oper-
An instruction fetch from Caching Inhibited storage may ations required to do this may be system-dependent.
cause additional words in main storage to be accessed.
No copy of the accessed locations is placed into the Because the Memory Coherence Required attribute for
caches. a given storage location is of little use unless all pro-
cessors that access the location do so coherently, in
In general, non-overlapping accesses caused by sepa- statements about Memory Coherence Required stor-
rate Load instructions that specify locations in Caching age elsewhere in this document it is generally assumed
Inhibited storage may be combined into one access, as that the storage has the Memory Coherence Required
may non-overlapping accesses caused by separate attribute for all processors that access it.
Store instructions that specify locations in Caching
Inhibited storage. Such combining does not occur if the Programming Note
Load or Store instructions are separated by a sync
Operating systems that allow programs to request
instruction. Combining may also occur among such
that storage not be Memory Coherence Required
accesses from multiple processors that share a com-
should provide services to assist in managing
mon memory interface. No combining occurs if the
memory coherence for such storage, including all
storage is also Guarded.
system-dependent aspects thereof.
Programming Note In most systems the default is that all storage is
None of the memory barrier instructions prevent Memory Coherence Required. For some applica-
the combining of accesses from different proces- tions in some systems, software management of
sors. The Guarded storage attribute must be used coherence may yield better performance. In such
in combination with Caching Inhibited to prevent cases, a program can request that a given unit of
such combining. storage not be Memory Coherence Required, and
can manage the coherence of that storage by using
the sync instruction, the Cache Management
1.6.3 Memory Coherence instructions, and services provided by the operat-
ing system.
Required
An access to a Memory Coherence Required storage
location is performed coherently, as follows.
1.6.4 Guarded
Memory coherence refers to the ordering of stores to a A data access to a Guarded storage location is per-
single location. Atomic stores to a given location are formed only if either (a) the access is caused by an
coherent if they are serialized in some order, and no instruction that is known to be required by the sequen-
processor or mechanism is able to observe any subset tial execution model, or (b) the access is a load and the
storage location is already in a cache. If the storage is
also Caching Inhibited, only the storage location speci- The order in which a processor performs storage
fied by the instruction is accessed; otherwise any stor- accesses to SAO storage, the order in which those
age location in the cache block containing the specified accesses are performed with respect to other proces-
storage location may be accessed. sors and mechanisms, and the order in which those
accesses are performed in main storage are the same
Instructions are not fetched from virtual storage that is
except in the circumstances described in the following
Guarded. If the instruction addressed by the current
paragraph. The ordering rules for accesses performed
instruction address is in such storage, the system
by a single processor to SAO storage are as follows.
instruction storage error handler may be invoked (see
Stores are performed in program order. When a store
Section 6.5.5 of Book III).
accesses data adjacent to that which is accessed by
the next store in program order, the two storage
Programming Note accesses may be combined into a single larger access.
In some implementations, instructions may be exe- Loads are performed in program order. When a load
cuted before they are known to be required by the accesses data adjacent to that which is accessed by
sequential execution model. Because the results the next load in program order, the two storage
of instructions executed in this manner are dis- accesses may be combined into a single larger access.
carded if it is later determined that those instruc- Stores may not be performed before loads which pre-
tions would not have been executed in the cede them in program order. Loads may be performed
sequential execution model, this behavior does not before stores which precede them in program order,
affect most programs. with the provision that a load which follows a store of
the same datum (to the same address) must obtain a
This behavior does affect programs that access
value which is no older (in consideration of the possibil-
storage locations that are not “well-behaved” (e.g.,
ity of programs on other processors sharing the same
a storage location that represents a control register
storage) than the value stored by the preceding store.
on an I/O device that, when accessed, causes the
device to perform an operation). To avoid unin- When any given processor loads the datum it just
tended results, programs that access such storage stored, as described above, the load may be performed
locations should request that the storage be by the processor before the preceding store has been
Guarded, and should prevent such storage loca- performed with respect to other processors and mecha-
tions from being in a cache (e.g., by requesting that nisms, and in main storage. This may cause the pro-
the storage also be Caching Inhibited). cessor to see its store earlier relative to stores
performed by other processors than it is observed by
other processors and mechanisms, and than it is per-
1.6.5 Strong Access Order formed in memory. A direct consequence of this con-
sideration is that although programs running on each
All accesses to storage with the Strong Access Order processor will see the same sequence of accesses
(SAO) attribute (referred to as SAO storage) will be from any individual processor to SAO storage, each
performed using a set of ordering rules different from may in general see a different interleaving of the indi-
that of the weakly consistent model that is described in vidual sequences. The memory barrier instructions
Section 1.7.1, “Storage Access Ordering”. These rules may be used to establish stronger ordering, as
apply only to accesses that are caused by a Load or a described in Section 1.7.1, “Storage Access Ordering”,
Store, and not to accesses associated with those beginning with the third major bullet.
instructions. Furthermore, these rules do not apply to
accesses that are caused by or associated with instruc-
tions that are stated in their descriptions to be “treated 1.7 Shared Storage
as a Load” or “treated as a Store.” The details are
described below, from the programmer’s point of view. This architecture supports the sharing of storage
(The processor may deviate from these rules if the pro- between programs, between different instances of the
grammer cannot detect the deviation.) The SAO attri- same program, and between processors and other
bute is not intended to be used for general purpose mechanisms. It also supports access to a storage loca-
programming. It is provided in a manner that is not fully tion by one or more programs using different effective
independent of the other storage attributes. Specifi- addresses. All these cases are considered storage
cally, it is only provided for storage that is Memory sharing. Storage is shared in blocks that are an inte-
Coherence Required, but not Write Through Required, gral number of pages.
not Caching Inhibited, and not Guarded. See
Section 5.8.2.1, “Storage Control Bit Restrictions”, in When the same storage location has different effective
Book III for more details. Accesses to SAO storage are addresses, the addresses are said to be aliases. Each
likely to be performed more slowly than similar application can be granted separate access privileges
accesses to non-SAO storage. to aliased pages.
1.7.1 Storage Access Ordering accesses that includes all storage accesses asso-
ciated with instructions following the barrier-creat-
The Power ISA defines two models for the ordering of ing instruction. For each applicable pair ai,bj of
storage accesses: weakly consistent and strong storage accesses such that ai is in A and bj is in B,
access ordering. The predominant model is weakly the memory barrier ensures that ai will be per-
consistent. This model provides an opportunity for formed with respect to any processor or mecha-
improved performance over a model that has stronger nism, to the extent required by the associated
consistency rules, but places the responsibility on the Memory Coherence Required attributes, before bj
program to ensure that ordering or synchronization is performed with respect to that processor or
instructions are properly placed when storage is shared mechanism.
by two or more programs. Implementations which sup-
The ordering done by a memory barrier is said to
port SAO apply a stronger consistency model among
be “cumulative” if it also orders storage accesses
accesses to SAO storage. The order between
that are performed by processors and mecha-
accesses to SAO storage and those performed using
nisms other than P1, as follows.
the weakly consistent model is characteristic of the
weakly consistent model. The following description, - A includes all applicable storage accesses by
through the second major bullet, applies only to the any such processor or mechanism that have
weakly consistent model. The corresponding descrip- been performed with respect to P1 before the
tion for SAO storage is found in Section 1.6.5, “Strong memory barrier is created.
Access Order”. The rest of the description following the - B includes all applicable storage accesses by
second bulletted item applies to both models. any such processor or mechanism that are
The order in which the processor performs storage performed after a Load instruction executed
accesses, the order in which those accesses are per- by that processor or mechanism has returned
formed with respect to another processor or mecha- the value stored by a store that is in B.
nism, and the order in which those accesses are No ordering should be assumed among the storage
performed in main storage may all be different. Several accesses caused by a single instruction (i.e, by an
means of enforcing an ordering of storage accesses instruction for which the access is not atomic), even if
are provided to allow programs to share storage with the accesses are to SAO storage, and no means are
other programs, or with mechanisms such as I/O provided for controlling that order.
devices. These means are listed below. The phrase
“to the extent required by the associated Memory
Coherence Required attributes” refers to the Memory
Coherence Required attribute, if any, associated with
each access.
If two Store instructions or two Load instructions
specify storage locations that are both Caching
Inhibited and Guarded, the corresponding storage
accesses are performed in program order with
respect to any processor or mechanism.
If a Load instruction depends on the value returned
by a preceding Load instruction (because the
value is used to compute the effective address
specified by the second Load), the corresponding
storage accesses are performed in program order
with respect to any processor or mechanism to the
extent required by the associated Memory Coher-
ence Required attributes. This applies even if the
dependency has no effect on program logic (e.g.,
the value returned by the first Load is ANDed with
zero and then added to the effective address spec-
ified by the second Load).
When a processor (P1) executes a Synchronize or
eieio instruction a memory barrier is created,
which orders applicable storage accesses pair-
wise, as follows. Let A be a set of storage
accesses that includes all storage accesses asso-
ciated with instructions preceding the barrier-creat-
ing instruction, and let B be a set of storage
Programming Note
Because stores cannot be performed “out-of-order” not order the Store Conditional's store with respect
(see Book III), if a Store instruction depends on the to storage accesses caused by instructions that
value returned by a preceding Load instruction follow the Branch.
(because the value returned by the Load is used to
Because processors may predict branch target
compute either the effective address specified by the
addresses and branch condition resolution, control
Store or the value to be stored), the corresponding stor-
dependencies (e.g., branches) do not order stor-
age accesses are performed in program order. The
age accesses except as described above. For
same applies if whether the Store instruction is exe-
example, when a subroutine returns to its caller the
cuted depends on a conditional Branch instruction that
return address may be predicted, with the result
in turn depends on the value returned by a preceding
that loads caused by instructions at or after the
Load instruction.
return address may be performed before the load
Because an isync instruction prevents the execution of that obtains the return address is performed.
instructions following the isync until instructions pre-
Because processors may implement nonarchitected
ceding the isync have completed, if an isync follows a
duplicates of architected resources (e.g., GPRs, CR
conditional Branch instruction that depends on the
fields, and the Link Register), resource dependencies
value returned by a preceding Load instruction, the
(e.g., specification of the same target register for two
load on which the Branch depends is performed before
Load instructions) do not order storage accesses.
any loads caused by instructions following the isync.
This applies even if the effects of the “dependency” are Examples of correct uses of dependencies, sync and
independent of the value loaded (e.g., the value is lwsync to order storage accesses can be found in
compared to itself and the Branch tests the EQ bit in Appendix B. “Programming Examples for Sharing Stor-
the selected CR field), and even if the branch target is age” on page 913.
the sequentially next instruction.
Because the storage model is weakly consistent, the
With the exception of the cases described above and sequential execution model as applied to instructions
earlier in this section, data dependencies and control that cause storage accesses guarantees only that
dependencies do not order storage accesses. Exam- those accesses appear to be performed in program
ples include the following. order with respect to the processor executing the
instructions. For example, an instruction may com-
If a Load instruction specifies the same storage
plete, and subsequent instructions may be executed,
location as a preceding Store instruction and the
before storage accesses caused by the first instruction
location is in storage that is not Caching Inhibited,
have been performed. However, for a sequence of
the load may be satisfied from a “store queue” (a
atomic accesses to the same storage location, if the
buffer into which the processor places stored val-
location is in storage that is Memory Coherence
ues before presenting them to the storage subsys-
Required the definition of coherence guarantees that
tem), and not be visible to other processors and
the accesses are performed in program order with
mechanisms. A consequence is that if a subse-
respect to any processor or mechanism that accesses
quent Store depends on the value returned by the
the location coherently, and similarly if the location is in
Load, the two stores need not be performed in pro-
storage that is Caching Inhibited.
gram order with respect to other processors and
mechanisms. Because accesses to storage that is Caching Inhibited
Because a Store Conditional instruction may com- are performed in main storage, memory barriers and
plete before its store has been performed, a condi- dependencies on Load instructions order such
tional Branch instruction that depends on the CR0 accesses with respect to any processor or mechanism
value set by a Store Conditional instruction does even if the storage is not Memory Coherence Required.
Processor B:
loads from location X obtaining the value 1.7.3 Storage Ordering of I/O
1, executes a sync instruction, then
stores the value 2 to location Y Accesses
Processor C: A “coherence domain” consists of all processors and all
loads from location Y obtaining the value interfaces to main storage. Memory reads and writes
2, executes a sync instruction, then loads initiated by mechanisms outside the coherence domain
from location X are performed within the coherence domain in the
order in which they enter the coherence domain and
Example 2: are performed as coherent accesses.
Processor A:
stores the value 1 to location X, executes 1.7.4 Atomic Update
a sync instruction, then stores the value 2
to location Y The Load And Reserve and Store Conditional instruc-
tions together permit atomic update of a shared storage
Processor B:
location. There are byte, halfword, word, doubleword,
loops loading from location Y until the
and quadword forms of each of these instructions.
value 2 is obtained, then stores the value
Described here is the operation of the word forms
3 to location Z
lwarx and stwcx.; operation of the byte, halfword, dou-
Processor C: bleword, and quadword forms lbarx, stbcx., lharx,
loads from location Z obtaining the value sthcx., ldarx, stdcx., lqarx, and stqcx. is the same
3, executes a sync instruction, then loads except for obvious substitutions.
from location X
The lwarx instruction is a load from a word-aligned
In both cases, cumulative ordering dictates that the location that has two side effects. Both of these side
value loaded from location X by processor C is 1. effects occur at the same time that the load is per-
formed.
1. A reservation for a subsequent stwcx. instruction
1.7.2 Storage Ordering of Copy/ is created.
Paste-Initiated Data Transfers 2. The memory coherence mechanism is notified that
a reservation exists for the storage location speci-
The Copy-Paste Facility (see Section 4.4) uses pairs of
fied by the lwarx.
instructions to initiate 128-byte data transfers. They
are referred to as “data transfers” to differentiate them The stwcx. instruction is a store to a word-aligned loca-
from the “normal” storage accesses caused by or asso- tion that is conditioned on the existence of the reserva-
ciated with loads, stores, and instructions that are tion created by the lwarx and on whether the same
treated as loads and stores. In the absence of barriers, storage location is specified by both instructions. To
the relative ordering among adjacent data transfers or emulate an atomic operation with these instructions, it
data transfers and storage accesses is not defined, and is necessary that both the lwarx and the stwcx. specify
the sequential execution model and coher- the same storage location.
ence-required ordering relationships do not apply. To
establish order between adjacent data transfers or A stwcx. performs a store to the target storage location
between data transfers and storage accesses, hwsync only if the reservation created by the lwarx still exists at
must be used. See the description of the Synchronize the time the stwcx. is executed, and only if the storage
instruction in Section 4.6.3 for more information. locations specified by the two instructions are in the
same aligned block of real storage whose size is the
smallest real page size supported by the implementa-
1.7.4.1 Reservations
The ability to emulate an atomic operation using lwarx
and stwcx. is based on the conditional behavior of
stwcx., the reservation created by lwarx, and the
clearing of that reservation if the target storage location
is modified by another processor or mechanism before
the stwcx. performs its store.
A reservation is held on an aligned unit of real storage
called a reservation granule. The size of the reserva-
tion granule is 2n bytes, where n is implementa-
tion-dependent but is always at least 4 (thus the
minimum reservation granule size is a quadword), and
where 2n is not larger than the smallest real page size
Programming Note
In general, programming conventions must ensure 1.7.4.2 Forward Progress
that lwarx and stwcx. specify addresses that Forward progress in loops that use lwarx and stwcx. is
match; a stwcx. should be paired with a specific achieved by a cooperative effort among hardware, sys-
lwarx to the same storage location. Situations in tem software, and application software.
which a stwcx. may erroneously be issued after
some lwarx other than that with which it is intended The architecture guarantees that when a processor
to be paired must be scrupulously avoided. For executes a lwarx to obtain a reservation for location X
example, there must not be a context switch in and then a stwcx. to store a value to location X, either
which the processor holds a reservation in behalf of
the old context, and the new context resumes after 1. the stwcx. succeeds and the value is written to
a lwarx and before the paired stwcx.. The stwcx. location X, or
in the new context might succeed, which is not
2. the stwcx. fails because some other processor or
what was intended by the programmer. Such a situ-
mechanism modified location X, or
ation must be prevented by executing a stbcx.,
sthcx., stwcx., stdcx., or stqcx. that specifies a 3. the stwcx. fails because the processor’s reserva-
dummy writable aligned location as part of the con- tion was lost for some other reason.
text switch; see Section 6.4.3 of Book III.
In Cases 1 and 2, the system as a whole makes prog-
ress in the sense that some processor successfully
modifies location X. Case 3 covers reservation loss
required for correct operation of the rest of the system.
This includes cancellation caused by some other pro-
cessor or mechanism writing elsewhere in the reserva-
tion granule, cancellation caused by the operating
system in managing certain limited resources such as
real storage, and cancellation caused by any of the
other effects listed in see Section 1.7.4.1.
An implementation may make a forward progress guar-
antee, defining the conditions under which the system
as a whole makes progress. Such a guarantee must
specify the possible causes of reservation loss in Case
3. While the architecture alone cannot provide such a
guarantee, the characteristics listed in Cases 1 and 2
are necessary conditions for any forward progress
guarantee. An implementation and operating system
can build on them to provide such a guarantee.
mechanism. Set A contains all data accesses caused In this section, including its subsections, it is assumed
by instructions preceding the tend., including Sus- that all instructions for which execution is attempted are
pended state accesses, that are neither Write Through in storage that is not Caching Inhibited and (unless
Required nor Caching Inhibited. Set B contains all data instruction address translation is disabled; see Book III)
accesses caused by instructions following the tend. is not Guarded, and from which instruction fetching
that are neither Write Through Required nor Caching does not cause the system error handler to be invoked
Inhibited. The ordering done by this memory barrier is (e.g., from which instruction fetching is not prohibited
cumulative. by the “address translation mechanism” or the “storage
protection mechanism”; see Book III).
Programming Note
The memory barriers that are created by the exe- Programming Note
cution of a successful transaction (those associ- The results of attempting to execute instructions
ated with tbegin., tend., and the integrated from storage that does not satisfy this assumption
cumulative memomry barrier) render most explicit are described in Section 1.6.2 and Section 1.6.4 of
memory barriers in and around transactions redun- this Book and in Book III.
dant. An exception is when there is a need to
establish order among Suspended state accesses. For each instance of executing an instruction from loca-
tion X, the instruction may be fetched multiple times.
The instruction cache is not necessarily kept consistent
1.8.1 Rollback-Only Transactions with the data cache or with main storage. It is the
A Rollback-Only Transaction (ROT) is a sequence of responsibility of software to ensure that instruction stor-
instructions that is executed, or not, as a unit. The pur- age is consistent with data storage when such consis-
pose of the ROT is to enable bulk speculation of tency is required for program correctness.
instructions with minimum overhead. It leverages the After one or more bytes of a storage location have been
rollback mechanism that is invoked as part of transac- modified and before an instruction located in that stor-
tion failure handling, but has reduced overhead in that it age location is executed, software must execute the
does not have the full atomic nature of the transaction appropriate sequence of instructions to make instruc-
and its synchronization and serialization properties. tion storage consistent with data storage. Otherwise
The absence of a (normal) transaction’s atomic quality the result of attempting to execute the instruction is
means that a ROT must not be used to manipulate boundedly undefined except as described in
shared data. Section 1.9.1, “Concurrent Modification and Execution
More specifically, a ROT differs from a normal transac- of Instructions” on page 825.
tion as follows.
ROTs are not serialized.
There are no memory barriers created by tbegin.
and tend.
A ROT has no integrated cumulative memory bar-
rier.
There is no monitoring of storage locations speci-
fied by loads for modification by other processors
and mechanisms between the performing of the
loads and the completion of the ROT.
The stores that are included in the ROT need not
appear to be performed as an aggregate store.
(Implementations are likely to provide an aggre-
gate store appearance, but the correctness of the
program must not depend on the aggregate store
appearance.)
Programming Note
2.1 Performance-Optimized
Instruction Sequences
Performance-optimized instruction sequences are
instruction sequences that provide better performance
than other ways of achieving the same results. The
supported performance-optimized sequences are
shown in the following sections. In order to achieve the
improved performance, the sequences must be coded
exactly as shown, including instruction order, register
re-use, and lack of intervening instructions. The pro-
cessor achieves the improved performance by execut-
ing the sequence as a single operation, or in some
other highly efficient, sequence-specific, manner. (The
improved performance may not be obtained if the
sequence causes the system error handler to be
invoked, or for implementation-dependent reasons.)
Programming Note
Even independent of the performance optimization described above, the techniques illustrated in Table 1 and Table 2
generally perform better than other ways of achieving the effect of having a large displacement field for D-form and
DS-form fixed-point Load/Store instructions (Table 1), and of having a displacement field for X-form Vector and VSX
Load/Store instructions (Table 2).
The technique for the fixed-point Load/Store instructions is complicated by the fact that D-form and DS-form Loads
and Stores treat the D/DS value as signed.
For simplicity, most of this Note assumes that the fixed-point Load/Store instruction is D-form; the modifications for
DS-form fixed-point Load/Store instructions are straightforward.
Let the desired effective address to load from or store to be (RA) + DISP, where DISP is a signed 32-bit value.
(RA) + DISP = (RA) + DISP0:15 || DISP16:31
= (RA) + (DISP0:15 || 0x0000) + DISP16:31
where DISP0:15 is a signed 16-bit value.
If DISP0:15 is used as the SI value for the addis, the addis forms the sum
(RA) + (DISP0:15 || 0x0000)
and places the result into Rx.
If DISP16:31 is used as the D value for the Load or Store and Rx is used as the base register for the Load or Store,
and DISP16 = 0, the Load or Store computes the EA to load from as
(Rx) + DISP16:31 = (RA) + (DISP0:15 || 0x0000) + DISP16:31
= (RA) + DISP
However, because D-form Loads and Stores treat the D value as signed, if DISP16 = 1 the Load or Store computes
the EA as
(Rx) + DISP16:31 = (RA) + (DISP0:15 || 0x0000) + DISP16:31 + 0xFFFF_FFFF_FFFF_0000
= (RA) + (DISP0:15 || 0x0000) + DISP16:31 - 216
= (RA) + DISP - 216
To compensate for this effective subtraction of 216, if DISP16 = 1 the SI value used for the addis must be
DISP0:15 + 1. Then the addis sets Rx to
(RA) + ((DISP0:15 + 1) || 0x0000) = (RA) + (DISP0:15 || 0x0000) + 216
and the Load or Store computes the EA as
(Rx) + DISP16:31 = (RA) + (DISP0:15 || 0x0000) + 216 + DISP16:31 - 216
= (RA) + DISP
as desired.
Thus the rules for using the technique illustrated in Table 1 are as follows.
For the RA field of the addis, use the desired base register for the Load or Store.
For the D field of the Load or Store, use DISP16:31.
(For DS-form Loads and Stores, for the DS field use DISP16:29; DISP30:31 are 0b00.)
For the SI field of the addis:
- if DISP16 = 0 use DISP0:15;
- if DISP16 = 1 use DISP0:15 + 1.
2.1.2 32-Bit Constant Generation The following instruction sequence will optimize perfor-
mance when zero-extending the result of a 32-bit addi-
The following instruction sequences will optimize per- tion.
formance when generating zero-extended 32-bit
unsigned constants (when RA0:63 equal 0) and when Operation Instruction
performing 32-bit logical operations on RA32:63). Sequence
Unsigned constant add Rx,RA,RB
Operation Instruction (RA + RB zero extended) rldicl Rt,Rx,0,32
Sequence Table 6: 32-bit Zero-Extended addition
Unsigned constant oris Rx,RA,UIh
(UIh,UIl zero extended) ori Rt,Rx,UIl
Unsigned constant xoris Rx,RA,UIh
(UIh,UIl zero extended) xori Rt,Rx,UIl
Table 3: 32-bit Unsigned Constant Generation
Operation Instruction
Sequence
Signed consant addis Rx,RA,SIh
(SIh,SIl sign extended) addi Rt,Rx,SIl
Signed consant addis Rx,0,SIh
(SIh sign extended; UI zero ori Rt,Rx,UIl
extended)
Table 4: 32-bit Signed Constant Generation
Instruction
Sequence
add Rx,RA,RB
extsw[.] Rt,Rx
addi Rx,RA,SI
extsw[.] Rt,Rx
addis Rx,RA,SI
extsw[.] Rt,Rx
subf Rx,RA,RB
extsw[.] Rt,Rx
neg Rx,RA
extsw[.] Rt,Rx
Table 5: 32-bit Sign Extended Addition
Programming Note
See the Programming Notes for Table 1.
Programming Note
Table 8 includes only the Type-A Multiply-Add
instructions because supporting only one of the two
types (i.e. either Type-A or Type-M) is sufficient to
preserve the contents of the destination operand of
the permute or Multiply-Add instruction. The xxlor
instruction “preserves” the contents of the destina-
tion operand by copying it into another register, and
the copy is then used as the destination operand of
the Multiply-Add instruction, which is overwritten
upon execution.
Programming Note
In order to ensure that the contents of registers are
preserved to the extent that a partially executed
instruction can be re-executed correctly, the regis-
ters that are preserved must satisfy the following
conditions. For any given instruction, zero or more
of the conditions applies.
For a fixed-point Load instruction that is not a
multiple or string form, if RT=RA or RT=RB
then the contents of register RT are not
altered.
For an update form Load or Store instruction,
the contents of register RA are not altered.
Programming Note
Bit(s) Description
or Rx,Rx,Rx can be used to modify the PRI field;
11:13 Program Priority (PRI) see Section 3.2.
(PPR3243:45)
001 very low Programming Note
010 low
011 medium low When the system error handler is invoked, the PRI
100 medium field may be set to an undefined value.
101 medium high
Programs can always set the PRI field to very
low, low, medium low, and medium priorities;
programs may be allowed to set the PRI field
to medium high priority during certain time
intervals. (See Section 4.3.8.) If the program
priority is medium high when the time interval
expires or if an attempt is made to set the pri-
ority to medium high when it is not allowed,
the PRI field is set to medium.
If other values are written to this field, the PRI
field is not changed. (See Section 4.3.7 of
Book III for additional information.)
All other fields are reserved.
Figure 1. Program Priority Register
Rx PPRPRI Priority
31 001 very low
1 010 low
6 011 medium low
2 100 medium
5 101 medium high
Table 9: Priority levels for or Rx,Rx,Rx
Programs can always set the PRI field to very low, low,
medium low, and medium priorities; programs may be
allowed to set the PRI field to medium high priority
during certain time intervals. (See Section 4.3.8 of
Book III.) If the program priority is medium high when
the time interval expires or if an attempt is made to set
the priority to medium high when it is not allowed, the
PRI field is set to medium.
Programming Note
Warning: Other forms of or Rx,Rx,Rx that are not
described in this section and in Section 4.3.3 may
also cause program priority to change. Use of
these forms should be avoided except when soft-
ware explicitly intends to alter program priority. If a
no-op is needed, the preferred no-op (ori 0,0,0)
should be used.
URG
CNT
SSE
STE
LSD
LTE
0 default
1 not urgent
0 38 39 40 41 42 43 44 45 54 55 57 58 59 60 61 63
2 least urgent
Figure 2. Data Stream Control Register 3 less urgent
4 medium
Bit(s) Description 5 urgent
6 more urgent
39 Software Transient Enable (SWTE) 7 most urgent
0 SWTE is disabled. 58 Load Stream Disable (LSD)
0 No effect.
Programming Note
The URG, LSD, SNSE and SSE fields do not affect
the initiation of streams specified using the dcbt
and dcbtst instructions.
Note that even when SNSE is not set, hardware
may detect Stride-N streams in intervals when they
access elements that map to sequential cache
blocks.
Programming Note
Accesses that are caused by or associated with
Cache Management instructions that are “treated
as a Load” or “treated as a Store” are not subject to
the special ordering rules described for SAO stor-
age. These accesses are always performed in
accordance with the weakly consistent storage
model.
31 /// RA RB 982 /
0 6 11 16 21 31 31 / CT RA RB 22 /
0 6 7 11 16 21 31
Programming Note
Because the instruction is treated as a Load, the
effective address is translated using translation
resources that are used for data accesses, even
though the block being invalidated was copied into
the instruction cache based on translation
resources used for instruction fetches (see Book
III).
Programming Note
The invalidation of the specified block need not
have been performed with respect to the processor
executing the icbi instruction until a subsequent
isync instruction has been executed by that pro-
cessor. No other instruction or event has the corre-
sponding effect.
Programming Note
The architecture does not provide a way to specify
the size of the data elements that compose a
stream. An implementation may assume some
fixed size for all data elements. As a result,
depending on the offset, stride, and size (and in
particular whether the elements are aligned), the
implementation may reduce the latency for access-
ing only a portion of some of the elements. A future
version of the architecture may enable the specifi-
cation of element size to avoid this limitation.
Programming Note
To obtain the best performance across the widest range At each level of the storage hierarchy that is “near”
of implementations that support the data stream vari- the processor, elements of a data stream that is
ants of dcbt/dcbtst, the programmer should assume specified as transient are most likely to be
the following model when using those variants. replaced. As a result, it may be desirable to stag-
ger addresses of streams (choose addresses that
The processor’s response to a hint that the pro-
map to different cache congruence classes) to
gram will probably soon access a given data
reduce the likelihood that an element of a transient
stream is to take actions that reduce the latency of
stream will be replaced prior to being accessed by
accesses to the first few elements of the stream.
the program.
(Such actions may include prefetching cache
blocks into levels of the storage hierarchy that are Processors that comply with versions of the archi-
“near” the processor.) Thereafter, as the program tecture that do not support the TH field at all treat
accesses each successive element of the stream, TH = 0b01000, 0b01010, and 0b01011 as if TH =
the processor takes latency-reducing actions for 0b00000.
additional elements of the stream, pacing these
A single set of stream IDs is shared between the
actions with the program’s accesses (i.e., taking
dcbt and dcbtst instructions.
the actions for only a limited number of elements
ahead of the element that the program is currently On some implementations, data streams that are
accessing). not specified by software may be detected by the
processor. Such data streams are called “hard-
The processor’s response to a hint that the pro-
ware-detected data streams”. On some such
gram will probably no longer access a given data
implementations, data stream resources
stream, or to the cessation of existence of a data
(resources that are used primarily to support data
stream, is to stop taking latency-reducing actions
streams) are shared between software-specified
for the stream.
data streams and hardware-detected data
A data stream having finite length ceases to exist streams. On these latter implementations, the pro-
when the latency-reducing actions have been gramming model includes the following.
taken for all elements of the stream.
- Software-specified data streams take prece-
If the program ceases to need a given data stream dence over hardware-detected data streams
before having accessed all elements of the stream in use of data stream resources.
(always the case for streams having unlimited
- The processor’s response to a hint that the
length), performance may be improved if the pro-
program will probably no longer access a
gram then provides a hint that it will no longer
given data stream, or to the cessation of exis-
access the stream (e.g., by executing the appropri-
tence of a data stream, includes releasing the
ate dcbt instruction with TH=0b01010 and
associated data stream resources, so that
S0b00).
they can be used by hardware-detected data
streams.
Programming Note
The latency-reducing actions taken in response to
a program's hints about access to a data stream,
including the depth and urgency parameters, may
vary based on its behavior and on the behavior of
other programs sharing platform resources, as well
as on the design of the platform resources they
use. Without actually changing the stream specifi-
cation or DSCR parameters, the processor may
adjust its actions (e.g. slow down prefetches or be
more selective choosing them) based on their
effectiveness and on the availability of storage
bandwidth. In general, the goal of this variation is
to improve overall system performance and fair-
ness across the set of programs that share
resources. There often will be a performance ben-
efit, however, from adjusting stream specifications
to the platform and co-resident programs to adjust
for these actions by the processor.
Programming Note
This Programming Note describes several aspects of ceding dcbt/dcbtst instructions, and another eieio
using the data stream variants of the dcbt and dcbtst instruction must separate that dcbt instruction
instructions. from the following dcbt/dcbtst instructions.
A non-transient data stream having unlimited In practice, the second eieio described above can
length and which will access consecutive units in sometimes be omitted. For example, if the pro-
storage can be completely specified, including pro- gram consists of an outer loop that contains the
viding the hint that the program will probably soon dcbt/dcbtst instructions and an inner loop that
access it, using one dcbt instruction. The corre- contains the Load or Store instructions that access
sponding specification for a data stream having the data streams, the characteristics of the inner
other attributes requires two or three dcbt/dcbtst loop and of the implementation’s branch prediction
instructions to describe the stream and one addi- mechanisms may make it highly unlikely that hints
tional dcbt instruction to start the stream. How- corresponding to a given iteration of the outer loop
ever, one dcbt instruction with TH=0b01010 and will be provided out of program order with respect
GO=1 can apply to a set of the data streams to hints corresponding to the previous iteration of
described in the preceding sentence, so the corre- the outer loop. (Also, any providing of hints out of
sponding specification for n such data streams program order affects only performance, not pro-
requires 2n to 3n dcbt/dcbtst instructions plus gram correctness.)
one dcbt instruction. (There is no need to execute
To mitigate the effects of interrupts on data
a dcbt/dcbtst instruction with TH=0b01010 and
streams, it may be desirable to specify a given
S=0b10 for a given stream ID before using the
“logical” data stream as a sequence of shorter,
stream ID for a new data stream; the implicit por-
component data streams. Similar considerations
tion of the hint provided by dcbt/dcbtst instruc-
apply to conditions and events that, depending on
tions that describe data streams suffices.)
the implementation, may cause an existing data
If it is desired that the hint provided by a given stream to cease to exist; for example, in some
dcbt/dcbtst instruction be provided in program implementations an existing data stream ceases to
order with respect to the hint provided by another exist when it comes to the end of a virtual page.
dcbt/dcbtst instruction, the two instructions must
If it is desired to specify data streams without
be separated by an eieio instruction. For example,
regard to the number of stream IDs provided by
if a dcbt instruction with TH=0b01010 and GO=1
the implementation, stream IDs should be
is intended to indicate that the program will proba-
assigned to data streams in order of decreasing
bly soon access nascent data streams described
stream importance (stream ID 0 to the most
(completely) by preceding dcbt/dcbtst instruc-
important stream, stream ID 1 to the next most
tions, and is intended not to indicate that the pro-
important stream, etc.). This order ensures that the
gram will probably soon access nascent data
hints for the most important data streams will be
streams described (completely) by following dcbt/
provided.
dcbtst instructions, an eieio instruction must sep-
arate the dcbt instruction with GO=1 from the pre-
Programming Note
The processor’s response to the hint that access to
TH=0b10000 the block will be transient is to prefetch data into
the cache hierarchy in a way that minimizes the
If TH=0b10000, the dcbt instruction provides a hint that displacement of data that has not been identified as
the program will probably soon load from the block con- transient.
taining the byte addressed by EA, and that the pro-
gram’s need for the block will be transient (i.e., the time
interval during which the program accesses the block is
likely to be short). TH=0b10001
If TH=0b10001, the dcbt instruction provides a hint that
the program will probably not access the block contain-
ing the byte addressed by EA for a relatively long
period of time.
31 TH RA RB 246 /
0 6 11 16 21 31
Let the effective address (EA) be the sum (RA|0)+(RB). Data Cache Block set to Zero X-form
The dcbtst instruction provides a hint that describes a
dcbz RA,RB
block or data stream to which the program may perform
a Store access, or indicates the expected use thereof.
31 /// RA RB 1014 /
A hint that the program will soon store to a given stor-
0 6 11 16 21 31
age location is ignored if the location is Caching Inhib-
ited or Guarded.
if RA = 0 then b 0
The only operation that is “caused by” the dcbtst else b (RA)
instruction is the providing of the hint. The actions (if EA b + (RB)
any) taken by the processor in response to the hint are n block size (bytes)
not considered to be “caused by” or “associated with” m log2(n)
the dcbtst instruction (e.g., dcbtst is considered not to ea EA0:63-m || m0
cause any data accesses). No means are provided by MEM(ea, n) n0x00
which software can synchronize these actions with the Let the effective address (EA) be the sum (RA|0)+(RB).
execution of the instruction stream. For example, these
actions are not ordered by memory barriers. All bytes in the block containing the byte addressed by
EA are set to zero.
The dcbtst instruction may complete before the opera-
tion it causes has been performed. This instruction is treated as a Store (see Section 4.3).
The nature of the hint depends, in part, on the value of Special Registers Altered:
the TH field, as specified at the beginning of this sec- None
tion. If TH0b01010 and TH0b01011, this instruction
is treated as a Store (see Section 4.3), except that the Programming Note
system data storage error handler is not invoked, refer- dcbz does not cause the block to exist in the data
ence recording need not be done, and change record- cache if the block is in storage that is Caching
ing is not done. Inhibited.
Special Registers Altered: For storage that is neither Write Through Required
None nor Caching Inhibited, dcbz provides an efficient
means of setting blocks of storage to zero. It can
Extended Mnemonics:
be used to initialize large areas of such storage, in
Extended mnemonics are provided for the Data Cache a manner that is likely to consume less memory
Block Touch for Store instruction so that it can be coded bandwidth than an equivalent sequence of Store
with the TH value as the last operand for all categories, instructions.
and so that the transient hint can be specified without
For storage that is either Write Through Required
coding the TH field explicitly.
or Caching Inhibited, dcbz is likely to take signifi-
cantly longer to execute than an equivalent
Extended: Equivalent to: sequence of Store instructions. For example, on
dcbtstct RA,RB,TH dcbtst for TH values of 0b00000 some implementations dcbz for such storage may
or 0b00000 - 0b00111; cause the system alignment error handler to be
other TH values are invalid. invoked; on such implementations the system
dcbtstds RA,RB,TH dcbtst for TH values of 0b00000 alignment error handler sets the specified block to
or 0b01000 - 0b01111; zero using Store instructions.
other TH values are invalid. See Section 5.9.1 of Book III for additional informa-
dcbtstt RA,RB dcbtst for TH value of 0b10000. tion about dcbz.
Programming Note
See the Programming Notes at the beginning of
this section.
Data Cache Block Store X-form Data Cache Block Flush X-form
dcbst RA,RB dcbf RA,RB,L
31 /// RA RB 54 / 31 /// L RA RB 86 /
0 6 11 16 21 31 0 6 9 11 16 21 31
Let the effective address (EA) be the sum (RA|0)+(RB). Let the effective address (EA) be the sum (RA|0)+(RB).
If the block containing the byte addressed by EA is in L=0
storage that is Memory Coherence Required and a
If the block containing the byte addressed by EA is
block containing the byte addressed by EA is in the
in storage that is Memory Coherence Required
data cache of any processor and any locations in the
and a block containing the byte addressed by EA
block are considered to be modified there, those loca-
is in the data cache of any processor and any loca-
tions are written to main storage, additional locations in
tions in the block are considered to be modified
the block may be written to main storage, and the block
there, those locations are written to main storage
ceases to be considered to be modified in that data
and additional locations in the block may be written
cache.
to main storage. The block is invalidated in the
If the block containing the byte addressed by EA is in data caches of all processors.
storage that is not Memory Coherence Required and If the block containing the byte addressed by EA is
the block is in the data cache of this processor and any in storage that is not Memory Coherence Required
locations in the block are considered to be modified and the block is in the data cache of this processor
there, those locations are written to main storage, addi- and any locations in the block are considered to be
tional locations in the block may be written to main stor- modified there, those locations are written to main
age, and the block ceases to be considered to be storage and additional locations in the block may
modified in that data cache. be written to main storage. The block is invalidated
The function of this instruction is independent of in the data cache of this processor.
whether the block containing the byte addressed by EA L=1 (“dcbf local”)
is in storage that is Write Through Required or Caching
Inhibited. The L=1 form of the dcbf instruction permits a pro-
gram to limit the scope of the “flush” operation to
This instruction is treated as a Load (see Section 4.3), the data cache of this processor. If the block con-
except that reference and change recording need not taining the byte addressed by EA is in the data
be done. cache of this processor, it is removed from this
Special Registers Altered: cache. The coherence of the block is maintained to
None the extent required by the Memory Coherence
Required storage attribute.
L = 3 (“dcbf local primary”)
The L=3 form of the dcbf instruction permits a pro-
gram to limit the scope of the “flush” operation to
the primary data cache of this processor. If the
block containing the byte addressed by EA is in the
primary data cache of this processor, it is removed
from this cache. The coherence of the block is
maintained to the extent required by the Memory
Coherence Required storage attribute.
For the L operand, the value 2 is reserved. The results
of executing a dcbf instruction with L=2 are boundedly
undefined.
The function of this instruction is independent of
whether the block containing the byte addressed by EA
is in storage that is Write Through Required or Caching
Inhibited.
This instruction is treated as a Load (see Section 4.3),
except that reference and change recording need not
be done.
Programming Note
This form of the or instruction can be used to
reduce latency in producer-consumer applications
by requesting that modified data be made visible to
other processors quickly. In this example it is
assumed that the base register is GPR3.
Producer:
addi r1,r0,0x1234
sth r1,0x1000(r3) # store data value 0x1234
lwsync # order data store before
flag store
addi r2,r0,0x0001
stb r2,0x1002(r3) # store nonzero flag byte
or r26,r26,r26 # miso
p_loop:
lbz r2,0x1002(r3) # load flag byte
andi. r2,r2,0x00FF
bne p_loop # wait for consumer to clear
# flag
Consumer:
c_loop:
lbz r2,0x1002(r3) # load flag byte
andi. r2,r2,0x00FF
beq c_loop # wait for producer to set
# flag to nonzero
lwsync # order flag load before
# data load
lhz r1,0x1000(r3) # load data value
lwsync # order data load before
# flag store
addi r2,r0,0x0000
stb r2,0x1002(r3) # clear flag byte
or r26,r26,r26 # miso
Programming Note
Warning: Other forms of or Rx,Rx,Rx that are not
described in this section and in Section 3.2 may
also cause program priority to change. Use of
these forms should be avoided except when soft-
ware explicitly intends to alter program priority. If a
no-op is needed, the preferred no-op (ori 0,0,0)
should be used.
Programming Note
It is always best to avoid unnecessary instructions
between the copy and the paste.
CR0 Description
0b000||XERSO Data transfer failed due to a
sequence error or a conflict with
tlbie or some implementa-
tion-specific problem.
Function Code GPR operands Storage operands Function name and RTL
10000 RT, RT1, RT+2 mem(EA,s) Compare and Swap Not Equal
t mem(EA, s)
if t != (RT+1) then mem(EA,s) (RT+2)
RT t
Notes:
s = operand size in number of bytes
Function codes not listed in this table are considered invalid.
For word atomics, only the least significant word of each source register is used, and the least significant word of
the target register is updated with the result, while the upper word is set to zero.
31 RT RA FC 582 / 31 RT RA FC 614 /
0 6 11 16 21 31 0 6 11 16 21 31
Function Code GPR operands Storage operands Function name and RTL
Notes:
s = operand size in number of bytes
Function codes not listed in this table are considered invalid.
For word atomics, only the least significant word of each source register is used.
31 RS RA FC 710 / 31 RS RA FC 742 /
0 6 11 16 21 31 0 6 11 16 21 31
Programming Note
The Memory Coherence Required attribute on
other processors and mechanisms ensures that
their stores to the reservation granule will cause
the reservation created by the Load And Reserve
instruction to be lost.
Programming Note Let the effective address (EA) be the sum (RA|0)+(RB).
The byte in storage addressed by EA is loaded into
EH = 1 should be used when the program is obtain-
RT56:63. RT0:55 are set to 0.
ing a lock variable which it will subsequently
release before another program attempts to per- This instruction creates a reservation for use by a
form a store to it. When contention for a lock is sig- stbcx. instruction. A real address computed from the
nificant, using this hint may reduce the number of EA as described in Section 1.7.4.1 is associated with
times a cache block is transferred between proces- the reservation, and replaces any address previously
sor caches.
associated with the reservation. A length of 1 byte is
EH = 0 should be used when all accesses to a associated with the reservation, and replaces any
mutex variable are performed using an instruction length previously associated with the reservation.
sequence with Load and Reserve followed by Store
Conditional (e.g., emulating atomic update primi- The value of EH provides a hint as to whether the pro-
tives such as “Fetch and Add;” see Appendix B). gram will perform a subsequent store to the byte in
The processor may use this hint to optimize the storage addressed by EA before some other processor
cache to cache transfer of the block containing the attempts to modify it.
mutex variable, thus reducing the latency of per- 0 Other programs might attempt to modify
forming an operation such as ‘Fetch and Add’. the byte in storage addressed by EA
regardless of the result of the correspond-
Engineering Note
Programming Note ing stbcx. instruction.
1 Other programs will not attempt to modify
Either value of the EH field is appropriate for a the byte in storage addressed by EA until
Load and Reserve instruction that is intended to the program that has acquired the lock
establish a reservation for a subsequent waitrsv
performs a subsequent store releasing the
and not a subsequent Store Conditional instruction.
lock.
Special Registers Altered:
Programming Note
None
Warning: On some processors that comply with
versions of the architecture that precede Version Programming Note
2.00, executing a Load And Reserve instruction in
which EH = 1 will cause the illegal instruction error lbarx serves as both a basic and an extended
handler to be invoked. mnemonic. The Assembler will recognize a lbarx
mnemonic with four operands as the basic form,
and a lbarx mnemonic with three operands as the
extended form. In the extended form the EH oper-
and is omitted and assumed to be 0.
Load Halfword And Reserve Indexed Load Word And Reserve Indexed X-form
X-form
lwarx RT,RA,RB,EH
lharx RT,RA,RB,EH
31 RT RA RB 20 EH
31 RT RA RB 116 EH 0 6 11 16 21 31
0 6 11 16 21 31
if RA = 0 then b 0
if RA = 0 then b 0 else b (RA)
else b (RA) EA b +(RB)
EA b +(RB) RESERVE 1
RESERVE 1 RESERVE_LENGTH 4
RESERVE_LENGTH 2 RESERVE_ADDR real_addr(EA)
RESERVE_ADDR real_addr(EA) RT 320 || MEM(EA, 4)
RT 480 || MEM(EA, 2)
Let the effective address (EA) be the sum (RA|0)+(RB).
Let the effective address (EA) be the sum (RA|0)+(RB). The word in storage addressed by EA is loaded into
The halfword in storage addressed by EA is loaded into RT32:63. RT0:31 are set to 0.
RT48:63. RT0:47 are set to 0.
This instruction creates a reservation for use by a
This instruction creates a reservation for use by a stwcx. instruction. A real address computed from the
sthcx. instruction. A real address computed from the EA as described in Section 1.7.4.1 is associated with
EA as described in Section 1.7.4.1 is associated with the reservation, and replaces any address previously
the reservation, and replaces any address previously associated with the reservation. A length of 4 bytes is
associated with the reservation. A length of 2 bytes is associated with the reservation, and replaces any
associated with the reservation, and replaces any length previously associated with the reservation.
length previously associated with the reservation.
The value of EH provides a hint as to whether the pro-
The value of EH provides a hint as to whether the pro- gram will perform a subsequent store to the word in
gram will perform a subsequent store to the halfword in storage addressed by EA before some other processor
storage addressed by EA before some other processor attempts to modify it.
attempts to modify it. 0 Other programs might attempt to modify
0 Other programs might attempt to modify the word in storage addressed by EA
the halfword in storage addressed by EA regardless of the result of the correspond-
regardless of the result of the correspond- ing stwcx. instruction.
ing sthcx. instruction. 1 Other programs will not attempt to modify
1 Other programs will not attempt to modify the word in storage addressed by EA until
the halfword in storage addressed by EA the program that has acquired the lock
until the program that has acquired the performs a subsequent store releasing the
lock performs a subsequent store releas- lock.
ing the lock.
EA must be a multiple of 4. If it is not, either the system
EA must be a multiple of 2. If it is not, either the system alignment error handler is invoked or the results are
alignment error handler is invoked or the results are boundedly undefined.
boundedly undefined.
Special Registers Altered:
Special Registers Altered: None
None
Programming Note
Programming Note lwarx serves as both a basic and an extended
lharx serves as both a basic and an extended mnemonic. The Assembler will recognize a lwarx
mnemonic. The Assembler will recognize a lharx mnemonic with four operands as the basic form,
mnemonic with four operands as the basic form, and a lwarx mnemonic with three operands as the
and a lharx mnemonic with three operands as the extended form. In the extended form the EH oper-
extended form. In the extended form the EH oper- and is omitted and assumed to be 0.
and is omitted and assumed to be 0.
Store Byte Conditional Indexed X-form the store is performed, the value of n is undefined (and
need not reflect whether the store was performed).
stbcx. RS,RA,RB
CR0LT GT EQ SO = 0b00 || n || XERSO
31 RS RA RB 694 1
The reservation is cleared.
0 6 11 16 21 31
Special Registers Altered:
if RA = 0 then b 0 CR0
else b (RA)
EA b + (RB)
if RESERVE then
if RESERVE_LENGTH = 1 &
RESERVE_ADDR = real_addr(EA) then
MEM(EA, 1) (RS)56:63
undefined_case 0
store_performed 1
else
z smallest real page size supported by
implementation
if RESERVE_ADDR z = real_addr(EA) z then
undefined_case 1
else
undefined_case 0
store_performed 0
else
undefined_case 0
store_performed 0
if undefined_case then
u1 undefined 1-bit value
if u1 then
MEM(EA, 1) (RS)56:63
u2 undefined 1-bit value
CR0 0b00 || u2 || XERSO
else
CR0 0b00 || store_performed || XERSO
RESERVE 0
Let the effective address (EA) be the sum (RA|0)+(RB).
If a reservation exists, the length associated with the
reservation is 1 byte, and the real storage location
specified by the stbcx. is the same as the real storage
location specified by the lbarx instruction that estab-
lished the reservation, (RS)56:63 are stored into the
byte in storage addressed by EA.
If a reservation exists, and either the length associated
with the reservation is not 1 byte or the real storage
location specified by the stbcx. is not the same as the
real storage location specified by the lbarx instruction
that established the reservation, the following applies.
Let z denote the smallest real page size supported by
the implementation. If the real storage location speci-
fied by the stbcx. is in the same aligned z-byte block of
real storage as the real storage location specified by
the lbarx instruction that established the reservation, it
is undefined whether (RS)56:63 are stored into the byte
in storage addressed by EA. Otherwise, no store is per-
formed.
If a reservation does not exist, no store is performed.
CR Field 0 is set as follows. n is a 1-bit value that indi-
cates whether the store was performed, except that if,
per the preceding description, it is undefined whether
Store Halfword Conditional Indexed per the preceding description, it is undefined whether
X-form the store is performed, the value of n is undefined (and
need not reflect whether the store was performed).
sthcx. RS,RA,RB
CR0LT GT EQ SO = 0b00 || n || XERSO
31 RS RA RB 726 1
The reservation is cleared.
0 6 11 16 21 31
EA must be a multiple of 2. If it is not, either the system
if RA = 0 then b 0 alignment error handler is invoked or the results are
else b (RA) boundedly undefined.
EA b + (RB)
Special Registers Altered:
if RESERVE then
if RESERVE_LENGTH = 2 & CR0
RESERVE_ADDR = real_addr(EA) then
MEM(EA, 2) (RS)48:63
undefined_case 0
store_performed 1
else
z smallest real page size supported by
implementation
if RESERVE_ADDR z = real_addr(EA) z then
undefined_case 1
else
undefined_case 0
store_performed 0
else
undefined_case 0
store_performed 0
if undefined_case then
u1 undefined 1-bit value
if u1 then
MEM(EA, 2) (RS)48:63
u2 undefined 1-bit value
CR0 0b00 || u2 || XERSO
else
CR0 0b00 || store_performed || XERSO
RESERVE 0
Let the effective address (EA) be the sum (RA|0)+(RB).
If a reservation exists, the length associated with the
reservation is 2 bytes, and the real storage location
specified by the sthcx. is the same as the real storage
location specified by the lharx instruction that estab-
lished the reservation, (RS)48:63 are stored into the
halfword in storage addressed by EA.
If a reservation exists, and either the length associated
with the reservation is not 2 bytes or the real storage
location specified by the sthcx. is not the same as the
real storage location specified by the lharx instruction
that established the reservation, the following applies.
Let z denote the smallest real page size supported by
the implementation. If the real storage location speci-
fied by the sthcx. is in the same aligned z-byte block of
real storage as the real storage location specified by
the lharx instruction that established the reservation, it
is undefined whether (RS)48:63 are stored into the half-
word in storage addressed by EA. Otherwise, no store
is performed.
If a reservation does not exist, no store is performed.
CR Field 0 is set as follows. n is a 1-bit value that indi-
cates whether the store was performed, except that if,
Store Word Conditional Indexed X-form the store is performed, the value of n is undefined (and
need not reflect whether the store was performed).
stwcx. RS,RA,RB
CR0LT GT EQ SO = 0b00 || n || XERSO
31 RS RA RB 150 1
The reservation is cleared.
0 6 11 16 21 31
EA must be a multiple of 4. If it is not, either the system
if RA = 0 then b 0 alignment error handler is invoked or the results are
else b (RA) boundedly undefined.
EA b + (RB)
Special Registers Altered:
if RESERVE then
if RESERVE_LENGTH = 4 & CR0
RESERVE_ADDR = real_addr(EA) then
MEM(EA, 4) (RS)32:63
undefined_case 0
store_performed 1
else
z smallest real page size supported by
implementation
if RESERVE_ADDR z = real_addr(EA) z then
undefined_case 1
else
undefined_case 0
store_performed 0
else
undefined_case 0
store_performed 0
if undefined_case then
u1 undefined 1-bit value
if u1 then
MEM(EA, 4) (RS)32:63
u2 undefined 1-bit value
CR0 0b00 || u2 || XERSO
else
CR0 0b00 || store_performed || XERSO
RESERVE 0
Let the effective address (EA) be the sum (RA|0)+(RB).
If a reservation exists, the length associated with the
reservation is 4 bytes, and the real storage location
specified by the stwcx. is the same as the real storage
location specified by the lwarx instruction that estab-
lished the reservation, (RS)32:63 are stored into the
word in storage addressed by EA.
If a reservation exists, and either the length associated
with the reservation is not 4 bytes or the real storage
location specified by the stwcx. is not the same as the
real storage location specified by the lwarx instruction
that established the reservation, the following applies.
Let z denote the smallest real page size supported by
the implementation. If the real storage location speci-
fied by the stwcx. is in the same aligned z-byte block of
real storage as the real storage location specified by
the lwarx instruction that established the reservation, it
is undefined whether (RS)32:63 are stored into the word
in storage addressed by EA. Otherwise, no store is per-
formed.
If a reservation does not exist, no store is performed.
CR Field 0 is set as follows. n is a 1-bit value that indi-
cates whether the store was performed, except that if,
per the preceding description, it is undefined whether
31 RT RA RB 84 EH 31 RS RA RB 214 1
0 6 11 16 21 31 0 6 11 16 21 31
if RA = 0 then b 0 if RA = 0 then b 0
else b (RA) else b (RA)
EA b +(RB) EA b + (RB)
RESERVE 1 if RESERVE then
RESERVE_LENGTH 8 if RESERVE_LENGTH = 8 &
RESERVE_ADDR real_addr(EA) RESERVE_ADDR = real_addr(EA) then
RT MEM(EA, 8) MEM(EA, 8) (RS)
undefined_case 0
Let the effective address (EA) be the sum (RA|0)+(RB). store_performed 1
The doubleword in storage addressed by EA is loaded else
into RT. z smallest real page size supported by
implementation
This instruction creates a reservation for use by a
if RESERVE_ADDR z = real_addr(EA) z then
stdcx. instruction. A real address computed from the undefined_case 1
EA as described in Section 1.7.4.1 is associated with else
the reservation, and replaces any address previously undefined_case 0
associated with the reservation. A length of 8 bytes is store_performed 0
associated with the reservation, and replaces any else
length previously associated with the reservation. undefined_case 0
store_performed 0
The value of EH provides a hint as to whether the pro- if undefined_case then
gram will perform a subsequent store to the double- u1 undefined 1-bit value
word in storage addressed by EA before some other if u1 then
processor attempts to modify it. MEM(EA, 8) (RS)
0 Other programs might attempt to modify u2 undefined 1-bit value
the doubleword in storage addressed by CR0 0b00 || u2 || XERSO
else
EA regardless of the result of the corre-
CR0 0b00 || store_performed || XERSO
sponding stdcx. instruction. RESERVE 0
1 Other programs will not attempt to modify
the doubleword in storage addressed by Let the effective address (EA) be the sum (RA|0)+(RB).
EA until the program that has acquired the If a reservation exists, and either the length associated
lock performs a subsequent store releas- with the reservation is not 8 bytes or the real storage
ing the lock. location specified by the stdcx. is the same as the real
EA must be a multiple of 8. If it is not, either the system storage location specified by the ldarx instruction that
alignment error handler is invoked or the results are established the reservation, (RS) is stored into the dou-
boundedly undefined. bleword in storage addressed by EA.
Special Registers Altered: If a reservation exists, and either the length associated
None with the reservation is not 8 bytes or the real storage
location specified by the stdcx. is not the same as the
Programming Note real storage location specified by the ldarx instruction
ldarx serves as both a basic and an extended that established the reservation, the following applies.
mnemonic. The Assembler will recognize a ldarx Let z denote the smallest real page size supported by
mnemonic with four operands as the basic form, the implementation. If the real storage location speci-
and a ldarx mnemonic with three operands as the fied by the stdcx. is in the same aligned z-byte block of
extended form. In the extended form the EH oper- real storage as the real storage location specified by
and is omitted and assumed to be 0. the ldarx instruction that established the reservation, it
is undefined whether (RS) is stored into the double-
word in storage addressed by EA. Otherwise, no store
is performed.
If a reservation does not exist, no store is performed.
Load Quadword And Reserve Indexed If RTp is odd, RTp=RA, or RTp=RB the instruction form
X-form is invalid. If RTp=RA or RTp=RB, an attempt to execute
this instruction will invoke the system illegal instruction
lqarx RTp,RA,RB,EH error handler. (The RTp=RA case includes the case of
RTp=RA=0.)
31 RTp RA RB 276 EH
Special Registers Altered:
0 6 11 16 21 31
None
Store Quadword Conditional Indexed per the preceding description, it is undefined whether
X-form the store is performed, the value of n is undefined (and
need not reflect whether the store was performed).
stqcx. RSp,RA,RB
CR0LT GT EQ SO = 0b00 || n || XERSO
31 RSp RA RB 182 1
The reservation is cleared.
0 6 11 16 21 31
EA must be a multiple of 16. If it is not, either the sys-
if RA = 0 then b 0 tem alignment error handler is invoked or the results
else b (RA) are boundedly undefined.
EA b + (RB)
If RSp is odd, the instruction form is invalid.
if RESERVE then
if RESERVE_LENGTH = 16 & Special Registers Altered:
RESERVE_ADDR = real_addr(EA) then
MEM(EA, 16) (RSp) CR0
undefined_case 0
store_performed 1
else
z smallest real page size supported by
implementation
if RESERVE_ADDR z = real_addr(EA) z then
undefined_case 1
else
undefined_case 0
store_performed 0
else
undefined_case 0
store_performed 0
if undefined_case then
u1 undefined 1-bit value
if u1 then
MEM(EA, 16) (RSp)
u2 undefined 1-bit value
CR0 0b00 || u2 || XERSO
else
CR0 0b00 || store_performed || XERSO
RESERVE 0
Let the effective address (EA) be the sum (RA|0)+(RB).
If a reservation exists, the length associated with the
reservation is 16 bytes, and the real storage location
specified by the stqcx. is the same as the real storage
location specified by the lqarx instruction that estab-
lished the reservation, (RSp) is stored into the quad-
word in storage addressed by EA.
If a reservation exists, and either the length associated
with the reservation is not 16 bytes or the real storage
location specified by the stqcx. is not the same as the
real storage location specified by the lqarx instruction
that established the reservation, the following applies.
Let z denote the smallest real page size supported by
the implementation. If the real storage location speci-
fied by the stqcx. is in the same aligned z-byte block of
real storage as the real storage location specified by
the lqarx instruction that established the reservation, it
is undefined whether (RSp) is stored into the quadword
in storage addressed by EA. Otherwise, no store is per-
formed.
If a reservation does not exist, no store is performed.
CR Field 0 is set as follows. n is a 1-bit value that indi-
cates whether the store was performed, except that if,
Extended Mnemonics:
Extended mnemonics for Synchronize:
Programming Note
sync serves as both a basic and an extended mne-
monic. Assemblers will recognize a sync mne-
monic with one operand as the basic form, and a
sync mnemonic with no operand as the extended
form. In the extended form the L operand is omitted
and assumed to be 0.
Programming Note
The sync instruction can be used to ensure that all
stores into a data structure, caused by Store
instructions executed in a “critical section” of a pro-
gram, will be performed with respect to another
processor before the store that releases the lock is
performed with respect to that processor; see
Section B.2, “Lock Acquisition and Release, and
Related Techniques” on page 915.
The memory barrier created by a sync instruction
with L=1 does not order implicit storage accesses
or instruction fetches. The memory barrier created
by a sync instruction with L=0 (or L=2) orders
implicit storage accesses and instruction fetches
associated with instructions preceding the sync
instruction but not those associated with instruc-
tions following the sync instruction.
In order to obtain the best performance across the
widest range of implementations, the programmer
should use the sync instruction with L=1, or the
eieio instruction, if any of these is sufficient for his
needs; otherwise he should use sync with L=0.
sync with L=2 should not be used by application
programs.
Programming Note
The functions provided by sync with L=1 are a
strict subset of those provided by sync with L=0.
(The functions provided by sync with L=2 are a
strict superset of those provided by sync with L=0;
see Book III.)
The wait instruction is used to stop instruction fetching task completion by accelerators in the platform
and execution until certain events occur. These events (referred to as “platform notify”).
include exceptions (see Section 1.2.1 of Book III),
event-based branch exceptions (see Section 1.1), and
Rollback-Only Transactions
Rollback-Only Transactions (ROTs) differ from normal
transactions in that they are speculative but not atomic.
They are initated by a unique variant of tbegin. They
may be nested with other ROTs or with normal transac-
tions. When a normal transaction is nested within a
ROT, the behavior from the normal tbegin. until the end
of the outer transaction is characteristic of a normal
transaction. Although subject to failure from storage
conflicts, the typical cause of ROT failure is via a
Tabort variant that is executed after the program
detects an error in its (software) speculation. Except
where specifically differentiated or where differences
follow from specific differentiation, the following
description applies to ROTs as well as normal transac-
tions.
Programming Note
5.2 Transactional Memory Facil- The intent of the Suspended execution state is to
temporarily escape from transactional handling
ity States when transactional semantics are undesirable.
Examples of such cases include storage updates
The transactional memory facility supports several
that should be retained in the event of transactional
modes of operation, referred to in this document as the
failure, which is useful for debugging, interthread
“transaction state.” These states control the behavior of
communication, the access of Caching Inhibited
storage accesses made during the transaction and the
storage, and the handling of interrupts. In the event
handling of transaction failure. Changes to transaction
of transaction failure during the Suspended execu-
state affect all transactions currently using the transac-
tion state, failure handling is deferred until transac-
tional facility on the affected thread: the outer transac-
tional execution is resumed, allowing the block of
tion as well as any nested transactions, should they
Suspended state code to complete its activities.
exist.
Non-transactional: The default, initial state of execu-
tion; no transaction is executing. The transactional Programming Note
facility is available for the initiation of a new transaction. During Suspended state execution, accessing stor-
age locations that have been transactionally
Transactional: This state is initiated by the execution accessed by the same thread prior to entering Sus-
of a tbegin. instruction in the Non-transactional state. pended state requires special care, because failure
Storage accesses (data accesses) caused by instruc- may occur due to uncontrollable events such as
tions executed in the Transactional state are performed interactions with other threads or the operating sys-
transactionally. Other storage accesses associated tem. Up until a transaction fails, loads from transac-
with instructions executed in the Transactional state tionally modified storage locations will return the
(instruction fetches, implicit accesses) are performed transactionally modified data. However once the
non-transactionally. In the event of transaction failure, transaction fails, the loads may return either the
failure is recorded as defined in Section 5.3.2, and con- transactionally updated version of storage, or a
trol is transferred to the failure handler as described in non-transactional version. Suspended state stores
Section 5.3.3. to transactionally modified blocks cause the
Suspended: The Suspended execution state is thread’s transaction to fail.
explicitly entered with the execution of a tsuspend.
form of tsr. instruction during a transaction, the execu- Table 9 enumerates the set of Transactional Memory
tion of a trechkpt. instruction from Non-transactional instructions and events that can cause changes to the
state, or as a side-effect of an interrupt while in the transaction state. Transaction states are abbreviated N
Transactional state. Storage accesses and accesses to (Non-transactional), T (Transactional), and S (Sus-
pended). (Interrupts, and the rfebb, rfid, rfscv, hrfid,
Programming Note
tbegin. in Suspended state merely updates CR0.
When tbegin. is followed by beq, this will result in
a transfer to the failure handler. Nothing more
severe (e.g. an interrupt) is required. The failure
handler for a transaction for which initiation may be
attempted in Suspended state should test CR0 to
determine whether tbegin. was executed in Sus-
pended state. If so, it should attempt to emulate
the transaction non-transactionally. (This case can
arise, for example, if a transaction enters Sus-
pended state and then calls a library routine that
independently attempts to use transactions.)
Notice that, although a failure handler runs in
Non-transactional state when reached because the
transaction has failed, it runs in Suspended state
for the case discussed in this Programmng Note.)
N T N2 N2 N2 N2 Not appli- N6 S7
cable
T T N, if outer trans-
N
3,4 S T
N
3,4
N
3 S6
action or A=1
form; otherwise T
S
S
1 S6 S
3
S
2
T
5
S
3
N
3 S6
Notes
1. CR0 updated indicating transactional initiation was unsuccessful, due to a pre-existing transaction occupying the
transactional facility.
2. Execution of these operations does not affect transaction state, allowing for the instructions to be used in software
modules called from Non-transactional, Transactional, and Suspended paths.
3. If failure recording has not previously occurred, failure recording is performed as defined in Section 5.3.2.
4. Failure handling is performed as defined in Section 5.3.3.
5. If failure has occurred during Suspended execution, failure handling will be performed sometime after the execu-
tion of tresume, and no later than the set of events listed in Section 5.3.3.
6. Generate TM Bad Thing type Program interrupt.
7. If TEXASRFS=0, generate a TM Bad Thing type Program interrupt.
treclaim., the following things occur. CR0 is set to 5.4.1 Transaction Failure Handler
0b101 || 0. The transaction state is set to Non-transac-
tional, and control flow is redirected to the instruction Address Register (TFHAR)
address stored in TFHAR. If the failure is caused by
The Transaction Failure Handler Address Register is a
treclaim., CR0 is not set to indicate failure and the
64-bit SPR that records the effective address of a soft-
transaction’s failure handler is not invoked.
ware failure handler used in the event of transaction
The following RTL function specifies the actions taken failure. Bits 62:63 are reserved.
during the handling of transaction failure:
TFHA //
0 62 63
TMHandleFailure()
If the transactional footprint has not previ- Figure 5. Transaction Failure Handler Address
ously been discarded Register (TFHAR)
Discard transactional footprint
Revert checkpointed registers to pre-transac- This register is written with the NIA for the tbegin. as a
tional values side-effect of the execution of an outer tbegin. instruc-
Discard all resources related to current trans- tion (tbegin. executed in the Non-transactional state).
action
MSRTS 0b00 #Non-transactional
If failure was not caused by treclaim., 5.4.2 Transaction EXception And
NIA TFHAR
CR0 0b101 || 0 Status Register (TEXASR)
Upon failure detected in Suspended state from causes The Transaction EXception And Status Register is a
other than the execution of a treclaim. instruction, fail- 64-bit register, containing a transaction level (TEXAS-
ure handling is deferred until the transaction is RTL) and status information for use by transaction fail-
resumed. Once resumed, failure handling will occur no ure handlers. The identification of the cause and
later than the set of failure synchronizing events listed persistence of transaction failure reported in bits 7:30
above. Upon failure in Suspended state caused by tre- may rarely be inaccurate, except that if bit 31 is set to 1
claim., failure handling is immediate (but CR0 is not set then bits 7:30 are always accurate. Bits 0:31 are called
to indicate failure and the transaction’s failure handler the failure cause in the instruction descriptions.
is not invoked).
Programming Note
Programming Note Warning: In addition to the contents of bits 7:30 of
A Load instruction executed immediately after tre- the TEXASR being rarely inaccurate, new failure
claim. or a conditional or unconditional Abort causes may be added over time, and/or an existing
instruction is guaranteed not to load a transactional cause may be divided into two or more subsets of
storage update. causes (which may differ in their persistence). Fur-
ther, speculative execution can cause unexpected
contents of these bits. As a result, except when
failure is caused by a treclaim. or tabort. instruc-
tion (including conditional tabort.), software must
not depend on the contents of bits 7:30 for its cor-
5.4 Transactional Memory Facil- rect execution. Guidelines follow.
ity Registers The bits should only be used to determine the
approach to the computation, e.g. whether the
The architecture is augmented with three Special Pur- computation is suitable to use the Transac-
pose Registers in support of transactional memory. tional Memory facility, how to adapt it best for
TFHAR stores the effective address of the software fail- TM, or how many retries to attempt before per-
ure handler used in the event of transaction failure. forming the operation non-transactionally.
TFIAR is used to inform software of the exact location Software should use the persistence indication
of the transaction failure, when possible. TEXASR con- when none of the causes that were defined at
tains a transaction level indicating the nesting depth of the time the software was written is indicated.
an active transaction, as well as an indicator of the Under no circumstances should software
cause of transaction failure and some machine state depend on the transience of a failure. There
when the transaction failed. These registers can be must always be a limit to the number of retries
written only when in Non-transactional state, and for before performing the operation non-transac-
TFHAR, also when in Suspended state. tionally.
9 Nesting Overflow
Bit(s Description The maximum transaction level was
0:6 Failure Code exceeded. When set, TFIAR is exact.
The Failure Code is copied from the tabort. or 10 Footprint Overflow, Self-Induced
treclaim. source operand. When set, TFIAR The tracking limit for transactional storage
is exact. accesses was exceeded when this thread was
7 Failure Persistent the only thread using the transactional foot-
The failure is likely to recur on each execution print tracking resources. When set, TFIAR is
of the transaction. This bit is set to 1 for an approximation.
causes in bits 8:11, copied from the tabort. or
treclaim. source operand when RA is non-
zero, and set to 0 for all other failure causes. 11 Self-Induced Conflict
A self-induced conflict occurred in Suspended
Programming Note state, due to one of the following: a store to a
storage location that was previously accessed
For tabort. and treclaim., the Failure Per- transactionally; a dcbf, dcbi, or icbi specify-
sistent bit may be viewed as an eighth bit ing a block that was previously accessed
in the failure code in that both fields are transactionally; a dcbst specifying a block
supplied by the least significant byte of RA that was previously written transactionally; a
and software may use all eight to differen- Load Atomic or Store Atomic instruction speci-
tiate among the cases for which it per- fying a block that was previously accessed
forms an abort or reclaim. However, transactionally, or a copy from a block that
software is expected to organize its cases was previously accessed transactionally.
so that bit 7 predicts the persistence of the When set, TFIAR may be exact.
case.
12 Non-Transactional Conflict
A conflict occurred with a non-transactional
Programming Note access by another processor. When set,
TFIAR is an approximation.
The inaccuracy of the Failure Persistent 13 Transaction Conflict
bit may arise from either of two causes. A conflict occurred with another transaction.
First, a kind of failure that is usually tran- When set, TFIAR may be exact.
sient, such as conflict with another thread, 14 Translation Invalidation Conflict
may in certain unusual circumstances be A conflict occurred with a TLB or SLB invalida-
persistent. Second, if the cause of trans- tion. When set, TFIAR is an approximation.
action failure is identified incorrectly, the
Failure Persistent bit will inherit this inac- 15 Implementation-specific
curacy -- i.e., will be set to 0 or 1 based on An implementation-specific condition caused
the identified failure cause. (Neither of the transaction to fail. Such conditions are
these causes applies if TEXASR31=1.) transient and the value in the TFIAR may be
exact.
8 Disallowed 16 Instruction Fetch Conflict
The instruction, SPR, or access type is not An instruction fetch (by this or another thread)
permitted. When set, TFIAR is exact. See was performed from a storage location that
was previously written transactionally. Such Otherwise the value in the TFIAR is approxi-
conditions are transient and the value in the mate.
TFIAR may be exact.
38 ROT
17 Footprint Overflow, Externally-Induced Set to 1 when a ROT is initiated. Set to zero
The tracking limit for transactional storage when a non-ROT tbegin. is executed.
accesses was exceeded when other threads,
39 Reserved
in addition to this thread, were using the trans-
actional footprint tracking resources. This bit 40:51 Reserved
is also set when a cache block eviction 52:63 Transaction Level (TL)
causes the transaction to fail. When set, Transaction level (nesting depth + 1) for the
TFIAR is an approximation. active transaction, if any; otherwise 0 if the
most recently executed transaction completed
Programming Note successfully, or the transaction level at which
Appropriate behavior of the failure handler the most recently executed transaction failed
when the tracking limit is exceeded due if the most recently executed transaction did
partly to transactions running on other not complete successfully.
threads may include re-executing the
transaction after a significant and random-
ized amount of time has elapsed. (This The transaction level in TEXASRTL contains an
policy will tend to spread out the contend- unsigned integer indicating whether the current trans-
ing transactions in time, and thereby action is an outer transaction (TEXASRTL = 1), or is
reduce their simultaneous use of the nested (TEXASRTL > 1), and if nested, its depth. The
transactional footprint tracking resources.) maximum transaction level supported by a given imple-
Some designs may provide hardware mentation is of the form 2t - 1. The value of t corre-
assistance in reducing contention for the sponding to the smallest maximum is 4; the value of t
tracking resources. Writers of failure han- corresponding to the largest maximum is 12. This value
dlers should see the Users’ Manual for the is tied to the “Maximum transaction level” parameter
implementation to understand how to ben- useful for application programmers, as specified in
efit from the hardware behavior. Section 4.1. The high-order 12-t bits of TEXASRTL are
Transaction failure due to cache block treated as reserved.
eviction is expected to be sufficiently rare Transaction failure information is contained in TEX-
that handling it as if the failure were ASR0:37. The fields of TEXASR are initialized upon the
caused by exceeding the tracking limit is successful initiation of a transaction from the
acceptable. Non-transactional state, by setting TEXASRTL to 1,
indicating an outer transaction, and all other fields to 0.
18:30 Reserved for future failure causes
When transaction failure is recorded, the failure sum-
31 Abort mary bit TEXASRFS is set to 1, indicating that failure
Termination was caused by the execution of a has been detected for the active transaction and that
tabort., tabortdc., tabortdci., tabortwc., failure recording has been performed. TEXASR0:31 are
tabortwci. or treclaim. instruction. When due set indicating the source of the failure. Exactly one of
to tabort. or treclaim., bits in TEXASR0:7 are bits 8 through 31 will be set indicating the instruction or
user-supplied. When set, TFIAR is exact. event that caused failure. In the event of failure due to
32 Suspended the execution of a tabort., tabortdc., tabortdci.,
When set to 1, the failure was recorded in tabortwc., tabortwci. or treclaim. instruction, TEX-
Suspended state. When set to 0, the failure ASR31 is set to 1, and, for tabort. and treclaim., a soft-
was recorded in Transactional state. ware defined failure code is copied from a register
operand to TEXASR0:7. TEXASRSuspended indicates
33 Reserved whether the transaction was in the Suspended state at
34:35 Privilege the time that failure occurred. The values of MSRHV
The thread was in this privilege state when the and MSRPR at the time that failure occurs are copied to
failure was recorded. This was the value TEXASR34 and TEXASR35, respectively. In some cir-
MSRHV PR when the failure was recorded. cumstances, the failure causing instruction address in
TFIAR may not be exact. In such circumstances, TEX-
36 Failure Summary (FS) ASR37 is set to 0 indicating that the contents of TFIAR
Set to 1 when a failure has been detected and are not exact; otherwise TEXASR37 is set to 1.
failure recording has been performed.
37 TFIAR Exact
Set to 1 when the value in the TFIAR is exact.
Programming Note
The transaction level contained in TEXASRTL
should be interpreted by software as follows:
When in the Transactional or Suspended state, this
field contains an unsigned integer representing the
transaction level of the active transaction, with 1
indicating an outer transaction, and a number
greater than 1 indicating a nested transaction. The
nesting depth of the active transaction is TEXAS-
RTL – 1.
When in the Non-transactional state, TEXASRTL
contains 0 if the last transaction committed suc-
cessfully, otherwise it contains the transaction level
at which the most recent transaction failed.
Programming Note
The Privilege bits in TEXASR represent the state of
the machine at the point when failure occurs. This
information may be used by problem state software
to determine whether an unexpected hypervisor or
operating system interaction was responsible for
transaction failure. This information may be useful
to operating systems or hypervisors when restoring
register state for failure handling after the transac-
tional facility was reclaimed, to determine which of
the operating system or the hypervisor has retained
the pre-transactional version of the checkpointed
registers.
TFIA Privilege
0 62 63
5.5 Transactional Facility by the given processor until the transaction state has
been modified. In particular:
Instructions All exceptions that will be caused by the previously
initiated instructions are recorded in the FPSCR
Similar to the Floating-Point Status and Control Regis- before the transaction state is modified.
ter instructions, modifications of transaction state All invocations of the system floating-point enabled
caused by the execution of Transactional Memory exception error handler that will be caused by the
instructions or by failure handling synchronize the previously initiated instructions have occurred
effects of exception-causing floating-point instructions before the transaction state is modified.
executed by a given processor. Executing a Transacti- No subsequent floating-point instruction that alters
nal Memory instruction, or invocation of the failure han- the settings of any FPSCR bits is initiated until the
dler, ensures that all floating-point instructions transaction state has been modified.
previously initiated by the given processor have com-
pleted before the transaction state is modified, and that (Floating-point Storage Access instructions are not
no subsequent floating-point instructions are initiated affected.)
31 TO RA RB 782 1 31 TO RA SI 846 1
0 6 11 16 21 31 0 6 11 16 21 31
a EXTS((RA)32:63)
b EXTS((RB)32:63)
abort 0 a EXTS((RA)32:63)
abort 0
CR0 0 || MSRTS || 0
CR0 0 || MSRTS || 0
if (a < b) & TO0 then abort 1
if (a > b) & TO1 then abort 1
if (a = b) & TO2 then abort 1 if a < EXTS(SI) & TO0 then abort 1
if (a u< b) & TO3 then abort 1 if a > EXTS(SI) & TO1 then abort 1
if (a >u b) & TO4 then abort 1 if a = EXTS(SI) & T02 then abort 1
if a u< EXTS(SI) & TO3 then abort 1
if abort & (MSRTS = 0b10 | MSRTS = 0b01) then if a >u EXTS(SI) & TO4 then abort 1
#Transactional or Suspended
cause 0x00000001 if abort & (MSRTS = 0b10 | MSRTS = 0b01) then
if MSRTS= 0b01 & TEXASRFS = 0 then #Suspended #Transactional or Suspended
Discard transactional footprint cause 0x00000001
TMRecordFailure(cause) if MSRTS= 0b01 & TEXASRFS = 0 then #Suspended
if MSRTS = 0b10 then #Transactional Discard transactional footprint
TMHandleFailure() TMRecordFailure(cause)
if MSRTS = 0b10 then #Transactional
TMHandleFailure()
The tabortwc. instruction sets condition register field 0
to 0 || MSRTS || 0. The contents of register RA32:63 are The tabortwci. instruction sets condition register field 0
compared with the contents of register RB32:63. If any to 0 || MSRTS || 0. The contents of register RA32:63 are
bit in the TO field is set to 1 and its corresponding con- compared with the sign-extended value of the SI field. If
dition is met by the result of the comparison, and the any bit in the TO field is set to 1 and its corresponding
transaction state is Transactional or Suspended, then condition is met by the result of the comparison, and
the tabortwc. instruction causes transaction failure, the transaction state is Transactional or Suspended
resulting in the following: then the tabortwci. instruction causes transaction fail-
ure, resulting in the following:
Failure recording is performed as defined in Section
5.3.2, using the failure cause 0x00000001. Failure recording is performed as defined in Section
5.3.2, using the failure cause 0x00000001.
If the transaction state is Transactional, failure handling
is performed as defined in Section 5.3.3 (this includes If the transaction state is Transactional, failure handling
discarding the transactional footprint). is performed as defined in Section 5.3.3 (this includes
discarding the transactional footprint).
If the transaction state is Suspended, the transactional
footprint is discarded (if not already discarded for a If the transaction state is Suspended, the transactional
pending failure), but failure handling is deferred. footprint is discarded (if not already discarded for a
pending failure), but failure handling is deferred.
Other than the setting of CR0, execution of tabortwc.
in the Non-transactional state is treated as a no-op. Other than the setting of CR0, execution of tabortwci.
in the Non-transactional state is treated as a no-op.
Special Registers Altered:
Special Registers Altered: CR0 TEXASR TFIAR TS
CR0 TEXASR TFIAR TS
31 TO RA RB 814 1 31 TO RA SI 878 1
0 6 11 16 21 31 0 6 11 16 21 31
a ( RA ) a (RA)
b ( RB ) abort 0
abort 0
CR0 0 || MSRTS || 0
CR0 0 || MSRTS || 0
if (a < b) & TO0 then abort 1 if a < EXTS(SI) & TO0 then abort 1
if (a > b) & TO1 then abort 1 if a > EXTS(SI) & TO1 then abort 1
if (a = b) & TO2 then abort 1 if a = EXTS(SI) & T02 then abort 1
if (a u< b) & TO3 then abort 1 if a u< EXTS(SI) & TO3 then abort 1
if (a >u b) & TO4 then abort 1 if a >u EXTS(SI) & TO4 then abort 1
if abort & (MSRTS = 0b10 | MSRTS = 0b01) then if abort & (MSRTS = 0b10 | MSRTS = 0b01) then
#Transactional or Suspended #Transactional or Suspended
cause 0x00000001 cause 0x00000001
if MSRTS= 0b01 & TEXASRFS = 0 then #Suspended if MSRTS= 0b01 & TEXASRFS = 0 then #Suspended
Discard transactional footprint Discard transactional footprint
TMRecordFailure(cause) TMRecordFailure(cause)
if MSRTS = 0b10 then #Transactional if MSRTS = 0b10 then #Transactional
TMHandleFailure() TMHandleFailure()
The tabortdc. instruction sets condition register field 0 The tabortdci. instruction sets condition register field 0
to 0 || MSRTS || 0. The contents of register RA are com- to 0 || MSRTS || 0. The contents of register RA are com-
pared with the contents of register RB. If any bit in the pared with the sign-extended value of the SI field. If any
TO field is set to 1 and its corresponding condition is bit in the TO field is set to 1 and its corresponding con-
met by the result of the comparison, and the transac- dition is met by the result of the comparison, and the
tion state is Transactional or Suspended, then the transaction state is Transactional or Suspended then
tabortdc. instruction causes transaction failure, result- the tabortdci. instruction causes transaction failure,
ing in the following: resulting in the following:
Failure recording is performed as defined in Section Failure recording is performed as defined in Section
5.3.2, using the failure cause 0x00000001. 5.3.2, using the failure cause 0x00000001.
If the transaction state is Transactional, failure handling If the transaction state is Transactional, failure handling
is performed as defined in Section 5.3.3 (this includes is performed as defined in Section 5.3.3 (this includes
discarding the transactional footprint). discarding the transactional footprint).
If the transaction state is Suspended, the transactional If the transaction state is Suspended, the transactional
footprint is discarded (if not already discarded for a footprint is discarded (if not already discarded for a
pending failure), but failure handling is deferred. pending failure), but failure handling is deferred.
Other than the setting of CR0, execution of tabortdc. in Other than the setting of CR0, execution of tabortdci.
the Non-transactional state is treated as a no-op. in the Non-transactional state is treated as a no-op.
Special Registers Altered: Special Registers Altered:
CR0 TEXASR TFIAR TS CR0 TEXASR TFIAR TS
Programming Note
One use of the tcheck instruction in Suspended
state is to determine whether preceding loads from
transactionally modified locations have returned
the data the transaction stored. (If the transaction
has failed, some of the loads may have returned a
more recent value that was stored by a conflicting
store, or may have returned the pre-transaction
contents of the location.). It is important to use
tcheck. between any Suspended state loads that
might access transactionally modified locations and
subsequent computation using the Sus-
pended-state-loaded data. Otherwise, corrupt data
could cause problems such as wild branches or
infinite loops.
Another use of tcheck in Suspended state is to
determine whether the contents of storage, as seen
in Suspended state, are consistent with the trans-
action succeeding -- e.g., whether no location that
has been accessed transactionally (stored transac-
tionally, for ROTs), and has been seen in Sus-
pended state, has been subject to a conflict thus
far. (A location is seen in Suspended state either
by being loaded in Suspended state or by being
loaded in Transactional state and the value (or a
value derived therefrom) passed, in a register, into
Suspended state.)
A use of tcheck in Transactional state is to deter-
mine whether the transaction still has the potential
to succeed.
Note that tcheck provides an instantaneous check
on the integrity of a subset of the accesses per-
formed within a transaction. tcheck is not a failure
synchronizing mechanism. Even if no accesses
follow the tcheck, there may still be latent failures
that haven’t been recorded, for example caused by
accesses that tcheck does not wait for, by external
conflicts that will happen in the future, or simply by
time of flight to the failure detection mechanism for
operations that have already been performed.
Programming Note
The tcheck instruction can return 1 in bit 0 of CR
field BF before the failure has been recorded in
TEXASR and TFIAR.
Programming Note
The tcheck instruction may cause pipeline syn-
chronization. As a result, programs that use
tcheck excessively may perform poorly.
Programming Note
New programs should use mfspr instead of mftb
to access the Time Base.
Programming Note
mftb serves as both a basic and an extended mne-
monic. The Assembler will recognize an mftb
mnemonic with two operands as the basic form,
and an mftb mnemonic with one operand as the
extended form. In the extended form the TBR
operand is omitted and assumed to be 268 (the
value that corresponds to TB).
Programming Note
Since the update frequency of the Time Base is imple- When the processor is in 64-bit mode, The POSIX
mentation-dependent, the algorithm for converting the clock can be computed with an instruction sequence
current value in the Time Base to time of day is also such as this:
implementation-dependent. mfspr Ry,268 # Ry = Time Base
lwz Rx,ticks_per_sec
As an example, assume that the Time Base increments divdu Rz,Ry,Rx # Rz = whole seconds
at the constant rate of 512 MHz. (Note, however, that stw Rz,posix_sec
programs should allow for the possibility that some mulld Rz,Rz,Rx # Rz = quotient * divisor
implementations may not increment the least-signifi- sub Rz,Ry,Rz # Rz = excess ticks
cant 4 bits of the Time Base at a constant rate.) What is lwz Rx,ns_adj
wanted is the pair of 32-bit values comprising a POSIX slwi Rz,Rz,1 # Rz = 2 * excess ticks
standard clock:1 the number of whole seconds that mulhwu Rz,Rz,Rx # mul by (ns/tick)/2 * 232
have passed since 00:00:00 January 1, 1970, UTC, stw Rz,posix_ns# product[0:31] = excess ns
and the remaining fraction of a second expressed as a
number of nanoseconds. Non-constant update frequency
Assume that: In a system in which the update frequency of the Time
Base may change over time, it is not possible to con-
The value 0 in the Time Base represents the start vert an isolated Time Base value into time of day.
time of the POSIX clock (if this is not true, a simple Instead, a Time Base value has meaning only with
64-bit subtraction will make it so). respect to the current update frequency and the time of
The integer constant ticks_per_sec contains the day that the update frequency was last changed. Each
value 512,000,000, which is the number of times time the update frequency changes, either the system
the Time Base is updated each second. software is notified of the change via an interrupt (see
Book III), or the change was instigated by the system
The integer constant ns_adj contains the value
software itself. At each such change, the system soft-
ware must compute the current time of day using the
1,000,000,000
-------------------------------------- 232 / 2 = 4194304000 old update frequency, compute a new value of
512,000,000 ticks_per_sec for the new frequency, and save the time
of day, Time Base value, and tick rate. Subsequent
which is the number of nanoseconds per tick of the calls to compute Time of Day use the current Time
Time Base, multiplied by 232 for use in mulhwu Base Value and the saved value.
(see below), and then divided by 2 in order to fit, as
an unsigned integer, into 32 bits.
1. Described in POSIX Draft Standard P1003.4/D12, Draft Standard for Information Technology -- Portable Operating System Interface (POSIX) --
Part 1: System Application Program Interface (API) - Amendment 1: Real-time Extension [C Language]. Institute of Electrical and Electronics Engi-
neers, Inc., Feb. 1992.
Programming Note
31 Performance Monitor Event-Based
Exception Enable (PME) As part of processing an External EBB
0 Performance Monitor event-based exception, it may also be necessary to
exceptions are disabled. perform additional operations to manage
1 Performance Monitor event-based the external EBB input from the system.
exceptions are enabled until a Per- See the system documentation for details.
formance Monitor event-based
exception occurs, at which time: 63 Performance Monitor Event-Based
- PME is set to 0 Exception Occurred (PMEO)
- PMEO is set to 1 0 A Performance Monitor
See Chapter 9 of Book III for information event-based exception has not
about Performance Monitor event-based occurred since the last time soft-
exceptions and about the effects of this bit on ware set this bit to 0.
the Performance Monitor. 1 A Performance Monitor
event-based exception has
Programming Note occurred since the last time soft-
ware set this bit to 0.
Performance Monitor event-based excep-
tions can only occur in problem state. See This bit is set to 1 by the hardware when a
Section 9.2 of Book III. Performance Monitor event-based exception
occurs. This bit can be set to 0 only by the
32:33 Transaction State (TS) mtspr instruction.
When an event-based branch occurs, hard-
See Chapter 9 of Book III for information
ware sets this field to indicate the transaction
about Performance Monitor event-based
state of the processor when the event-based
exceptions and about the effects of this bit on
branch occurred.
the Performance Monitor.
The values and their associated meanings are
as follows.
Programming Note
00 Non-transactional After handling an event-based branch, software
01 Suspended should set the “exception occurred” bit(s) corre-
10 Transactional sponding to the event-based exception(s) that have
11 Reserved occurred to 0. See the Programming Notes in Sec-
BESCRTS is part of the Transactional Memory tion 7.3 for additional information.
facility. (The entire BESCR is part of the
Event-Based Branch facility.)
7.2.2 Event-Based Branch Han-
Programming Note
dler Register
Event-based branch handlers should not
modify this field since its value is used by The Event-Based Branch Handler Register (EBBHR) is
the processor to determine the transaction a 64-bit register register that contains the 62 most sig-
state of the processor after the rfebb nificant bits of the effective address of the instruction
instruction is executed. that is executed next after an event-based branch
occurs. Bits 62:63 must be available to be read and
written by software.
34:63 Event Status
34:61Reserved Effective Address
62 External Event-Based Exception 0 62 63
Occurred (EEO) Figure 12. Event-Based Branch Handler Register
0 An external EBB exception has not (EBBHR)
occurred since the last time soft-
ware set this bit to 0.
1 An external EBB exception has
occurred since the last time soft-
ware set this bit to 0.
Programming Note
The EBBHR can be used by software as a scratch-
pad register after entry into an event-based branch
handler, provided that its contents are restored
prior to executing rfebb 1. An example of such
usage is as follows. In the example, SPRG3 is
used to contain a pointer to a storage area where
private application data may be saved, however,
refer to the applicable operating system documen-
tation to determine if an alternate register or stor-
age area should be used.
Effective Address //
0 62 63
Programming Note
If the BESCRTS has been modified by software
after an event-based branch occurs, an illegal
transaction state transition may occur. See Chapter
3.2.2 of Book III.
The Branch History Rolling Buffer (BHRB) is a buffer The effective address of the branch target exceeds
containing an implementation-dependent number of the effective address of the Branch instruction by
entries, referred to as BHRB Entries (BHRBEs), that 4.
contain information related to branches that have been The instruction is a B-form Branch, the effective
taken. Entries are numbered from 0 through n, where n address of the branch target exceeds the effective
is implementation-dependent but no more than 1023. address of the Branch instruction by 8, and the
Entry 0 is the most-recently written entry. The BHRB is instruction immediately following the Branch
read by means of the mfbhrbe instruction. instruction is not another Branch instruction.
System software typically controls the availability of the The determination of whether the effective address of
BHRB as well as the number of entries that it contains. the branch target exceeds the effective address of the
If the BHRB is accessed when it is unavailable, the sys- Branch instruction by 4 or 8 is made modulo 264.
tem facility unavailable error handler is invoked.
Various events or actions by the system software may Programming Note
result in the BHRB occasionally being cleared. If BHRB The cases described above, for which the BHRBE
entries are read after this has occurred, 0s will be need not be written, are cases for which some
returned. See the description of the mfbhrbe instruc- implementations may optimize the execution of the
tion for additional information. Branch instruction (first case) or of the Branch
instruction and the following instruction (second
The BHRB is typically used in conjunction with Perfor- case) in a manner that makes writing the BHRBE
mance Monitor event-based branches. (See Chapter 7 difficult. Such implementations may provide a
of Book II.) When used in conjunction with this facility, means by which system software can disable
BESCRPME is set to 1 to enable Performance Monitor these optimizations, thereby ensuring that the cor-
event-based exceptions, and Performance Monitor responding BHRBEs are written normally.
alerts are enabled to enable the writing of BHRB
entries. When a Performance Monitor alert occurs, Per-
When an XL-form Branch instruction is entered into the
formance Monitor alerts are disabled, BHRB entries are
BHRB, bits 0:61 of the effective address of the Branch
no longer written, and an event-based branch occurs.
instruction are written into the next available entry if
(See Chapter 9 of Book III for additional information on
allowed by the filtering mode; subsequently, bits 0:61 of
the Performance Monitor.) The event-based branch
the effective address of the branch target are written
handler can then access the contents of the BHRB for
into the following entry.
analysis.
BHRB entries are written as described above without
When the BHRB is written by hardware, only those
regard to transaction state and are not removed due to
Branch instructions that meet the filtering criteria are
transaction failures.
written. See Section 9.4.7 of Book III.
The following paragraphs describe the entries written
into the BHRB for various types of Branch instructions
for which the branch was taken. In some circum-
stances, however, the hardware may be unable to
make the entry even though the following paragraphs
require it. In such cases, the hardware sets the EA field
to 0, and indicates any missed entries using the T and
P fields. (See Section 8.1.)
When an I-form or B-form Branch instruction is entered
into the BHRB, bits 0:61 of the effective address of the
Branch instruction are written into the next available
entry, except that the entry may or may not be written in
the following cases.
Effective Address T P
0 62 63
Programming Note
It is expected that programs will not contain Branch
instructions with instruction or target effective
address equal to 0. If such instructions exist, pro-
grams cannot distinguish between entries that are
markers and entries that correspond to instructions
with instruction or target effective address 0.
Programming Note
In order to read all the BHRB entries containing
information about taken branches, software should
read the entries starting from entry number 0 and
continuing until an entry containing all 0s is read or
until all implemented BHRB entries have been
read.
Since the number of BHRB entries may decrease
or the BHRB may be cleared at any time, if a given
entry, m, is read as not containing all 0s and is read
again subsequently, the subsequent read may
return all 0s even though the program has not exe-
cuted clrbhrb.
In order to make assembler language programs simpler symbols related to instructions defined in Book II.
to write and easier to understand, a set of extended Assemblers should provide the extended mnemonics
mnemonics and symbols is provided for certain instruc- and symbols listed here, and may provide others.
tions. This appendix defines extended mnemonics and
A.1 Data Cache Block Touch [for represent the L value in the mnemonic rather than
requiring it to be coded as a numeric operand.
Store] Mnemonics Note: dcbf serves as both a basic and an extended
mnemonic. The Assembler will recognize a dcbf mne-
The TH field in the Data Cache Block Touch and Data monic with three operands as the basic form, and a
Cache Block Touch for Store instructions control the dcbf mnemonic with two operands as the extended
actions performed by the instructions. Extended mne- form. In the extended form the L operand is omitted
monics are provided that represent the TH value in the and assumed to be 0.
mnemonic rather than requiring it to be coded as a
numeric operand.
dcbf RA,RB (equivalent to: dcbf RA,RB,0)
dcbtct RA,RB,TH (equivalent to: dcbt for TH val- dcbfl RA,RB (equivalent to: dcbf RA,RB,1)
ues of 0b00000 - 0b00111); dcbflp RA,RB (equivalent to: dcbf RA,RB,3)
other TH values are invalid.
dcbtds RA,RB,TH (equivalent to: dcbt for TH val-
ues of 0b00000 or 0b01000 A.3 Or Mnemonics
- 0b01111);
other TH values are invalid. The three register fields in the or instruction can be
used to specify a hint indicating how the processor
dcbtt RA,RB (equivalent to: dcbt for TH
should handle stores caused by previous Store or dcbz
value of 0b10000)
instructions. An extended mnemonic is supported that
dcbna RA,RB (equivalent to: dcbt for TH represents the operand values in the mnemonic rather
value of 0b10001) than requiring them to be coded as numeric operands.
dcbtstct RA,RB,TH (equivalent to: dcbtst for TH
values of 0b00000 or miso (equivalent to: or 26,26,26)
0b00000 - 0b00111);
other TH values are invalid.
dcbtstds RA,RB,TH (equivalent to: dcbtst for TH A.4 Load and Reserve
values of 0b00000 or
0b01000 - 0b01111); Mnemonics
other TH values are invalid.
The EH field in the Load and Reserve instructions pro-
dcbtstt RA,RB (equivalent to: dcbtst for TH vides a hint regarding the type of algorithm imple-
value of 0b10000) mented by the instruction sequence being executed.
Extended mnemonics are provided that allow the EH
value to be omitted and assumed to be 0b0.
A.2 Data Cache Block Flush
Note: lbarx, lharx, lwarx, ldarx, and lqarx serve as
Mnemonics both basic and extended mnemonics. The Assembler
will recognize these mnemonics with four operands as
The L field in the Data Cache Block Flush instruction the basic form, and these mnemonics with three oper-
controls the scope of the flush function performed by
the instruction. Extended mnemonics are provided that
ands as the extended form. In the extended form the sent the A value in the mnemonic rather than requiring
EH operand is omitted and assumed to be 0. it to be coded as a numeric operand..
lbarx RT,RA,RB (equivalent to: lbarx RT,RA,RB,0) tend. (equivalent to: tend. 0)
lharx RT,RA,RB (equivalent to: lharx RT,RA,RB,0) tendall. (equivalent to: tend. 1)
lwarx RT,RA,RB (equivalent to: lwarx RT,RA,RB,0)
ldarx RT,RA,RB (equivalent to: ldarx RT,RA,RB,0) The L field in the Transaction Suspend or Resume
lqarx RT,RA,RB (equivalent to: lqarx RT,RA,RB,0) instruction determines how to change the transaction
state. Extended mnemonics are provided that repre-
sent the L value in the mnemonic rather than requiring
it to be coded as a numeric operand.
A.5 Synchronize Mnemonics
The L field in the Synchronize instruction controls the tsuspend. (equivalent to: tsr. 0)
scope of the synchronization function performed by the tresume. (equivalent to: tsr. 1)
instruction. Extended mnemonics are provided that
represent the L value in the mnemonic rather than
requiring it to be coded as a numeric operand. Two
extended mnemonics are provided for the L=0 value in
A.8 Move To/From Time Base
order to support Assemblers that do not recognize the Mnemonics
sync mnemonic.
Note: sync serves as both a basic and an extended The tbr field in the Move From Time Base instruction
mnemonic. Assemblers will recognize a sync mne- specifies whether the instruction reads the entire Time
monic with one operand as the basic form, and a sync Base or only the high-order half of the Time Base.
mnemonic with no operand as the extended form. In
the extended form the L operand is omitted and mftb Rx (equivalent to: mftb Rx,268)
assumed to be 0. or: mfspr Rx,268
mftbu Rx (equivalent to: mftb Rx,269)
sync (equivalent to: sync 0) or: mfspr Rx,269
lwsync (equivalent to: sync 1)
ptesync (equivalent to: sync 2)
A.9 Return From Event-Based
A.6 Wait Mnemonics Branch Mnemonic
The S field in the Return from Event-Based Branch
The WC field in the wait instruction is reserved for
instruction specifies the value to which the instruction
future use. It may be be used in the future to indicate
sets the GE field in the BESCR. Extended mnemonics
the condition that causes instruction execution to
are provided that represent the S value in the mne-
resume. An extended mnemonic is provided that repre-
monic rather than requiring it to be coded as a numeric
sent the WC value in the mnemonic rather than requir-
operand.
ing it to be coded as a numeric operand.
Note: wait serves as both a basic and an extended rfebb (equivalent to: rfebb 1)
mnemonic. The Assembler will recognize a wait mne-
monic with one operand as the basic form, and a wait Note: rfebb serves as both a basic and an extended
mnemonic with no operands as the extended form. In mnemonic. The Assembler will recognize this mne-
the extended form the WC operand is omitted and monic with one operand as the basic form, and this
assumed to be 0. mnemonic with no operands as the extended form. In
the extended form the S operand is omitted and
wait (equivalent to: wait 0) assumed to be 1.
This appendix gives examples of how dependencies In these examples it is assumed that contention for the
and the Synchronization instructions can be used to shared resource is low; the conditional branches are
control storage access ordering when storage is shared optimized for this case by using “+” and “-” suffixes
between programs. appropriately.
Many of the examples use extended mnemonics (e.g., The examples deal with words; they can be used for
bne, bne-, cmpw) that are defined in Appendix C of doublewords by changing all word-specific mnemonics
Book I. to the corresponding doubleword-specific mnemonics
(e.g., lwarx to ldarx, cmpw to cmpd).
Many of the examples use the Load And Reserve and
Store Conditional instructions, in a sequence that In this appendix it is assumed that all shared storage
begins with a Load And Reserve instruction and ends locations are in storage that is Memory Coherence
with a Store Conditional instruction (specifying the Required, and that the storage locations specified by
same storage location as the Load Conditional) fol- Load And Reserve and Store Conditional instructions
lowed by a Branch Conditional instruction that tests are in storage that is neither Write Through Required
whether the Store Conditional instruction succeeded. nor Caching Inhibited.
loop:
Fetch and AND lwarx r6,0,r3 #load and reserve
The “Fetch and AND” primitive atomically ANDs a cmpw r4,r6 #1st 2 operands equal?
bne- exit #skip if not
value into a word in storage.
stwcx. r5,0,r3 #store new value if still res’ved
In this example it is assumed that the address of the bne- loop #loop if lost reservation
word to be ANDed is in GPR 3, the value to AND into it exit:
is in GPR 4, and the old value is returned in GPR 5. mr r4,r6 #return value from storage
Notes:
loop:
lwarx r5,0,r3 #load and reserve 1. The semantics given for “Compare and Swap”
and r0,r4,r5#AND word above are based on those of the IBM System/370
stwcx. r0,0,r3 #store new value if still res’ved Compare and Swap instruction. Other architec-
bne- loop #loop if lost reservation tures may define a Compare and Swap instruction
Note: differently.
1. The sequence given above can be changed to per- 2. “Compare and Swap” is shown primarily for peda-
form another Boolean operation atomically on a gogical reasons. It is useful on machines that lack
word in storage, simply by changing the and the better synchronization facilities provided by
instruction to the desired Boolean instruction (or, lwarx and stwcx.. A major weakness of a Sys-
xor, etc.). tem/370-style Compare and Swap instruction is
that, although the instruction itself is atomic, it
checks only that the old and current values of the
Test and Set word being tested are equal, with the result that
This version of the “Test and Set” primitive atomically programs that use such a Compare and Swap to
loads a word from storage, sets the word in storage to a control a shared resource can err if the word has
nonzero value if the value loaded is zero, and sets the been modified and the old value subsequently
EQ bit of CR Field 0 to indicate whether the value restored. The sequence shown above has the
loaded is zero. same weakness.
In this example it is assumed that the address of the 3. In some applications the second bne- instruction
word to be tested is in GPR 3, the new value (nonzero) and/or the mr instruction can be omitted. The
is in GPR 4, and the old value is returned in GPR 5. bne- is needed only if the application requires that
if the EQ bit of CR Field 0 on exit indicates “not
loop: equal” then (r4) and (r6) are in fact not equal. The
lwarx r5,0,r3 #load and reserve mr is needed only if the application requires that if
cmpwi r5,0 #done if word not equal to 0 the comparands are not equal then the word from
bne- exit storage is loaded into the register with which it was
stwcx. r4,0,r3 #try to store non-0 compared (rather than into a third register). If
bne- loop #loop if lost reservation either or both of these instructions is omitted, the
exit: ... resulting Compare and Swap does not obey Sys-
tem/370 semantics.
B.2.1 Lock Acquisition and Import quent isync create an import barrier that prevents the
load from “data1” from being performed until the branch
Barriers has been resolved not to be taken.
An “import barrier” is an instruction or sequence of If the shared data structure is in storage that is neither
instructions that prevents storage accesses caused by Write Through Required nor Caching Inhibited, an
instructions following the barrier from being performed lwsync instruction can be used instead of the isync
before storage accesses that acquire a lock have been instruction. If lwsync is used, the load from “data1”
performed. An import barrier can be used to ensure may be performed before the stwcx.. But if the stwcx.
that a shared data structure protected by a lock is not fails, the second branch is taken and the lwarx is
accessed until the lock has been acquired. A sync re-executed. If the stwcx. succeeds, the value
instruction can be used as an import barrier, but the returned by the load from “data1” is valid even if the
approaches shown below will generally yield better per- load is performed before the stwcx., because the
formance because they order only the relevant storage lwsync ensures that the load is performed after the
accesses. instance of the lwarx that created the reservation used
by the successful stwcx..
B.2.1.1 Acquire Lock and Import
Shared Storage B.2.1.2 Obtain Pointer and Import
If lwarx and stwcx. instructions are used to obtain the
Shared Storage
lock, an import barrier can be constructed by placing an If lwarx and stwcx. instructions are used to obtain a
isync instruction immediately following the loop con- pointer into a shared data structure, an import barrier is
taining the lwarx and stwcx.. The following example not needed if all the accesses to the shared data struc-
uses the “Compare and Swap” primitive to acquire the ture depend on the value obtained for the pointer. The
lock. following example uses the “Fetch and Add” primitive to
obtain and increment the pointer.
In this example it is assumed that the address of the
lock is in GPR 3, the value indicating that the lock is In this example it is assumed that the address of the
free is in GPR 4, the value to which the lock should be pointer is in GPR 3, the value to be added to the pointer
set is in GPR 5, the old value of the lock is returned in is in GPR 4, and the old value of the pointer is returned
GPR 6, and the address of the shared data structure is in GPR 5.
in GPR 9.
loop:
loop: lwarx r5,0,r3 #load pointer and reserve
lwarx r6,0,r3,1 #load lock and reserve add r0,r4,r5#increment the pointer
cmpw r4,r6 #skip ahead if stwcx. r0,0,r3 #try to store new value
bne- wait # lock not free bne- loop #loop if lost reservation
stwcx. r5,0,r3 #try to set lock lwz r7,data1(r5) #load shared data
bne- loop #loop if lost reservation
isync #import barrier The load from “data1” cannot be performed until the
lwz r7,data1(r9)#load shared data pointer value has been loaded into GPR 5 by the
. lwarx. The load from “data1” may be performed before
. the stwcx.. But if the stwcx. fails, the branch is taken
wait... #wait for lock to free and the value returned by the load from “data1” is dis-
carded. If the stwcx. succeeds, the value returned by
The hint provided with lwarx indicates that after the
the load from “data1” is valid even if the load is per-
program acquires the lock variable (i.e., stwcx. is suc-
formed before the stwcx., because the load uses the
cessful), it will release it (i.e., store to it) prior to another
pointer value returned by the instance of the lwarx that
program attempting to modify it.
created the reservation used by the successful stwcx..
The second bne- does not complete until CR0 has
An isync instruction could be placed between the bne-
been set by the stwcx.. The stwcx. does not set CR0
and the subsequent lwz, but no isync is needed if all
until it has completed (successfully or unsuccessfully).
accesses to the shared data structure depend on the
The lock is acquired when the stwcx. completes suc-
value returned by the lwarx.
cessfully. Together, the second bne- and the subse-
B.2.2 Lock Release and Export The lwsync ensures that the store that releases the
lock will not be performed with respect to any other pro-
Barriers cessor until all stores caused by instructions preceding
the lwsync have been performed with respect to that
An “export barrier” is an instruction or sequence of
processor.
instructions that prevents the store that releases a lock
from being performed before stores caused by instruc-
tions preceding the barrier have been performed. An
export barrier can be used to ensure that all stores to a
shared data structure protected by a lock will be per- B.2.3 Safe Fetch
formed with respect to any other processor before the If a load must be performed before a subsequent store
store that releases the lock is performed with respect to (e.g., the store that releases a lock protecting a shared
that processor. data structure), a technique similar to the following can
be used.
B.2.2.1 Export Shared Storage and In this example it is assumed that the address of the
Release Lock storage operand to be loaded is in GPR 3, the contents
A sync instruction can be used as an export barrier of the storage operand are returned in GPR 4, and the
independent of the storage control attributes (e.g., address of the storage operand to be stored is in GPR
presence or absence of the Caching Inhibited attribute) 5.
of the storage containing the shared data structure.
lwz r4,0(r3)#load shared data
Because the lock must be in storage that is neither
cmpw r4,r4 #set CR0 to “equal”
Write Through Required nor Caching Inhibited, if the bne- $-8 #branch never taken
shared data structure is in storage that is Write stw r7,0(r5)#store other shared data
Through Required or Caching Inhibited a sync instruc-
tion must be used as the export barrier. An alternative is to use a technique similar to that
described in Section B.2.1.2, by causing the stw to
In this example it is assumed that the shared data depend on the value returned by the lwz and omitting
structure is in storage that is Caching Inhibited, the the cmpw and bne-. The dependency could be created
address of the lock is in GPR 3, the value indicating by ANDing the value returned by the lwz with zero and
that the lock is free is in GPR 4, and the address of the then adding the result to the value to be stored by the
shared data structure is in GPR 9. stw. If both storage operands are in storage that is nei-
ther Write Through Required nor Caching Inhibited,
stw r7,data1(r9)#store shared data (last) another alternative is to replace the cmpw and bne-
sync #export barrier with an lwsync instruction.
stw r4,lock(r3)#release lock
The sync ensures that the store that releases the lock
will not be performed with respect to any other proces-
sor until all stores caused by instructions preceding the
sync have been performed with respect to that proces-
sor.
B.5.1 Enter Critical Section GPR 3. The immediate value of the andis. instruction
selects the Failure Persistent bit in the upper half of
The following example shows the entry point to a criti- TEXASR to be tested.
cal section using transactional lock elision. The entry
code starts a transaction using the tbegin. instruction
tle_abort:
and checks whether the transaction was aborted or not.
mfspr r4, TEXASRU # Read high-order half
If not, it checks whether the lock is free or not. If the # of TEXASR
lock is found to be free, the thread proceeds to execute andis. r5,r4,0x0100 # determine whether failure
the critical section. # is likely to be persistent
bne tle_acquire_lock #Persistent, acquire lock
In this example it is assumed that the address of the
#enter critical sec
lock is in GPR 3, and the value indicating that the lock b tle_entry #Transient, try TLE again
is free is in GPR 4. The handling of cases of transaction
abort and busy lock are described in subsequent exam-
ples. This example can be extended to keep track of the
number of transient aborts and fall back on the acquisi-
tle_entry: tion of the lock after the number of transient failures
tbegin. #Start TLE transaction reaches some threshold. It can also be extended to
beq- tle_abort #Handle TLE transaction abort handle reentrant locks. Acquisition of TLE locks is
lwz r6,0(r3) #Read lock described in a subsequent example.
cmpw r6,r4 #Check if lock is free
bne- busy_lock #If not, handle lock busy case
B.5.4 TLE Exit Section Critical
critical_section1:
Path
The following example illustrates the instruction
sequence used to exit a TLE critical section. The CR0
B.5.2 Handling Busy Lock value set by tend. indicates whether the current thread
was in a transaction. If so, the exited critical section
In the event that the lock is already held, by either
was entered speculatively, and the transaction is
another thread or the current thread, the transaction is
ended. If not, the execution takes a path to release the
aborted using the tabort instruction, using a soft-
lock.
ware-defined code TLE_BUSY_LOCK indicating the
cause of the abort. The abort returns control to the beq Release of an acquired TLE lock is described in a sub-
following tbegin. in the critical section entrance sequent example.
sequence, allowing for an abort handler to react appro-
priately.
tle_exit:
tend. #End the current trans-
busy_lock: #action, if any
li r3, TLE_BUSY_LOCK bng- tle_release_lock #Release lock, if was
tabort r3 #Abort TLE transaction #not in a transaction
Book III:
Chapter 1. Introduction
- The definition of “store is performed” applies has finished executing. The contents of SPRs
to accesses for recording reference and that are the targets of mtspr instructions
change information. between the point of interruption and the end
of the mtspr sequence may be altered.
- A TLB entry invalidation by thread T1 is per-
formed with respect to thread T2 when the “must”
instruction that requested the invalidation has If hypervisor software violates a rule that is stated
caused the specified entry, if present, to be using the word “must” (e.g., “this field must be set
made invalid in T2’s TLB, and similarly for to 0”), and the rule pertains to the contents of a
invalidations of entries in other caches of hypervisor resource, to executing an instruction
information derived from tables used in that can be executed only in hypervisor state, or to
address translation. accessing storage in real addressing mode, the
results are undefined, and may include altering
exception
resources belonging to other partitions, causing
An error, unusual condition, or external signal, that
the system to “hang”, etc.
may set a status bit and may or may not cause an
interrupt, depending upon whether the correspond- hardware
ing interrupt is enabled. Any combination of hard-wired implementation,
emulation assist, or interrupt for software assis-
interrupt
tance. In the last case, the interrupt may be to an
The act of changing the machine state in response
architected location or to an implementa-
to an exception, as described in Chapter
tion-dependent location. Any use of emulation
6. “Interrupts” on page 1049.
assists or interrupts to implement the architecture
trap interrupt is implementation-dependent.
An interrupt that results from execution of a Trap
hypervisor privileged
instruction.
A term used to describe an instruction or facility
Additional exceptions to the rule that the thread that is available only when the thread is in hypervi-
obeys the sequential execution model, beyond sor state.
those described in Section 2.2 of Book I and in the
bullet defining “program order” in Section 1.1 of privileged state and supervisor mode
Book II, are the following. Used interchangeably to refer to a state in which
privileged facilities are available.
- A System Reset or Machine Check interrupt
may occur. The determination of whether an problem state and user mode
instruction is required by the sequential exe- Used interchangeably to refer to a state in which
cution model is not affected by the potential privileged facilities are not available.
occurrence of a System Reset or Machine /, //, ///, ... denotes a field that is reserved in an
Check interrupt. (The determination is instruction, in a register, or in an architected stor-
affected by the potential occurrence of any age table.
other kind of interrupt.)
?, ??, ???, ... denotes a field that is implementa-
- A context-altering instruction is executed tion-dependent in an instruction, in a register, or in
(Chapter 11. “Synchronization Requirements an architected storage table.
for Context Alterations” on page 1133). The
context alteration need not take effect until the
required subsequent synchronizing operation 1.2.2 Reserved Fields
has occurred.
Book I's description of the handling of reserved bits in
- A Reference and Change bit is updated by the System Registers, and of reserved values of defined
thread. The update need not be performed fields of System Registers, applies also to the SLB.
with respect to that thread until the required Book I's description of the handling of reserved values
subsequent synchronizing operation has of defined fields of System Registers applies also to
occurred. architected storage tables (e.g., the Page Table).
- A Branch instruction is executed and the Software should set reserved fields in the SLB and in
branch is taken. The update of the architected storage tables to zero, because these fields
Come-From Address Register (see may be assigned a meaning in some future version of
Section 8.2 of Book III) need not occur until a the architecture.
subsequent context synchronizing operation
has occurred. Some fields of certain architected storage tables may
be written to automatically by the hardware, e.g., Refer-
- An mtgsr is executed and an interrupt occurs ence and Change bits in the Page Table. When the
before the mtspr sequence following mtgsr
47 Privileged Doorbell Exit Enable power-saving mode upon the occurrence of a System
Reset exception and any of the exceptions that were
0 When the stop instruction is executed
enabled by the PECE field when the stop instruction
with PSSCREC=1, Directed Privileged
was executed. In addition, they may also exit
Doorbell exceptions are not enabled to
power-saving mode on exceptions that were disabled
cause exit from power-saving mode
by the PECE field as well. See Section 6.5.1 and Sec-
1 When the stop instruction is executed
tion 6.5.2 for additional information about exit from
with PSSCREC=1, Directed Privileged
power-saving mode.
Doorbell exceptions are enabled to cause
exit from power-saving mode. 52 Mediated External Exception Request
(MER)
48 Hypervisor Doorbell Exit Enable
0 A Mediated External exception is not
0 When the stop instruction is executed
requested.
with PSSCREC=1, Directed Hypervisor
1 A Mediated External exception is
Doorbell exceptions are not enabled to
requested.
cause exit from power-saving mode
1 When the stop instruction is executed The exception effects of this bit are said to be
with PSSCREC=1, Directed Hypervisor consistent with the contents of this bit if one of
Doorbell exceptions are enabled to cause the following statements is true.
exit from power-saving mode. - LPCRMER = 1 and a Mediated External
exception exists.
49 External Exit Enable
- LPCRMER = 0 and a Mediated External
0 When the stop instruction is executed exception does not exist.
with PSSCREC=1, External exceptions
are not enabled to cause exit from A context synchronizing instruction or event
power-saving mode. that is executed or occurs when LPCRMER = 0
1 When the stop instruction is executed ensures that the exception effects of
with PSSCREC=1, External exceptions LPCRMER are consistent with the contents of
are enabled to cause exit from power-sav- LPCRMER. Otherwise, when an instruction
ing mode. changes the contents of LPCRMER, the excep-
tion effects of LPCRMER become consistent
50 Decrementer Exit Enable with the new contents of LPCRMER reason-
0 When the stop instruction is executed ably soon after the change.
with PSSCREC=1, Decrementer excep-
tions are not enabled to cause exit from Programming Note
power-saving mode. LPCRMER provides a means for the
1 When the stop instruction is executed hypervisor to direct an external exception
with PSSCREC=1, Decrementer excep- to a partition independent of the partition's
tions are enabled to cause exit from MSREE setting. (When MSREE=0, it is
power-saving mode. (Decrementer excep- inappropriate for the hypervisor to deliver
tions do not occur if the state of the Decre- the exception.) Using LPCRMER, the parti-
menter is not maintained and updated as tion can be interrupted upon enabling
if the thread was not in power-saving external interrupts. Without using
mode.) LPCRMER, the hypervisor must check the
state of MSREE whenever it gets control,
51 Other Exit Enable
which will result in less timely delivery of
0 When the stop instruction is executed the exception to the partition.
with PSSCREC=1, Machine Check,
Hypervisor Maintenance, and certain 53 Guest Translation Shootdown Enable
implementation-specific exceptions are (GTSE)
not enabled to cause exit from power-sav-
Controls whether the operating system is per-
ing mode.
mitted to use tlbie, slbieg, and slbiag directly,
1 When the stop instruction is executed
or must issue a system call to the hypervisor.
with PSSCREC=1, Machine Check,
0 Guest is not permitted to use tlbie,
Hypervisor Maintenance, and certain
slbieg, slbiag, tlbsync, and slbsync.
implementation-specific exceptions are
1 Guest is permitted to use tlbie, slbieg,
enabled to cause exit from power-saving
slbiag, tlbsync, and slbsync.
mode.
If the state of the PECE field is lost during power-saving
mode, implementations must provide the means to exit
55:58 Reserved See Section 6.5 on page 1063 for a description of how
the setting of LPES affects the processing of interrupts.
59 Hypervisor External Interrupt Control
(HEIC)
0 Direct External interrupts can occur in 2.3 Hypervisor Real Mode Offset
1
Hypervisor state.
Direct External interrupts cannot occur in
Register (HRMOR)
hypervisor state. The layout of the Hypervisor Real Mode Offset Register
(HRMOR) is shown in Figure 1 below.
Programming Note
By setting HEIC=1, the Hypervisor Inter- // HRMO
rupt Virtualization handler can prevent 0 4 63
External interrupts from occurring during
the Hypervisor Virtualization interrupt han- Bits Name Description
dler. See Section 6.5.7.1. 4:63 HRMO Real Mode Offset
Figure 1. Hypervisor Real Mode Offset Register
All other fields are reserved.
60 Logical Partitioning Environment Selector The supported HRMO values are the non-negative
(LPES) multiples of 2r, where r is an implementation-dependent
0 External interrupts set the HSRRs, set value and 12 r 26.
MSRHV to 1, and leave MSRRI The contents of the HRMOR affect how some storage
unchanged. accesses are performed as described in Section 5.7.3
1 External interrupts set the SRRs, set on page 984 and Section 5.7.5 on page 987.
MSRRI to 0, and leave MSRHV
unchanged.
Programming Note
2.4 Logical Partition
LPES = 1 should be used by operating Identification Register (LPIDR)
systems not running under a hypervisor,
The layout of the Logical Partition Identification Regis-
so that external interrupts are directed to
ter (LPIDR) is shown in Figure 2 below.
the SRRs rather than to the HSRRs.
LPID
Programming Note 32 63
In versions of the architecture that pre-
cede Version 2.07, LPES was a two-bit Bits Name Description
field, in which the second bit controlled 32:63 LPID Logical Partition Identifier
significant aspects of storage accessing
Figure 2. Logical Partition Identification Register
and interrupt handling.
The contents of the LPIDR identify the partition to
61 Reserved which the thread is assigned, affecting some aspects of
translation and interrupt delivery. The number of
62 Hypervisor Virtualization Interrupt Condi-
LPIDR bits supported is implementation-dependent.
tionally Enable (HVICE)
Programming Note
- Unless the second item of this list applies, bits
in system registers read back 0s for mfspr
Radix tree translation assigns special meaning to and mtspr operations have no effect on their
LPID=0, specifically indicating the hypervisor’s own values, except as described immediately
partition. When HR=1, LPIDR should not be set to below for bits 44:45 of the XER.
zero except when MSRHV=1. For bits 44:45 of the XER, two pairs of bits are
HPT translation provides special functionality for provided, an “OV32-CA32” bit pair for
LPID=0 when HV=1, as described in Section 5.9.3, XEROV32 and XERCA32 and a “reserved” bit
to support the execution of a “bare metal” operating pair for legacy XER bits 44:45 behavior.
system (an operating system that runs in hypervi- Which bit pair is read by mfxer is controlled by
sor state). Speculative Segment Table walks are the PCR. mtxer writes to both bit pairs, inde-
prohibited when MSRHV=1 in other partitions pendent of the PCR. mcrxr reads the
because adjunct translations are bolted. A partition "OV32-CA32" bit pair.
that uses HPT translation and requires the services
of an adjunct should not be assigned LPID=0. Each bit in the “OV32-CA32” bit pair is implic-
itly set by instructions that implicitly set their
respective XEROV or XERCA, independent of
Programming Note the PCR. The “reserved” bit pair for bits 44:45
The aspect of interrupt delivery that the LPIDR of the XER are not altered by these instruc-
affects is the delivery of certain external interrupts. tions, independent of the PCR.
Some platforms make LPIDR/PIDR/TIDR available
The txer, selii[.], selir[.], selri[.], and selrr[.]
so that specific threads can be targeted for interrupt
instructions read bits 44:45 of the XER as 0s,
delivery. This function is most commonly used to
independent of the PCR.
communicate the disposition of accelerator-related
processing back to the initiating thread.
Programming Note
The "reserved" bit pair does not conform
to the usual rules for reading (mfspr)
2.5 Processor Compatibility reserved bits in registers (see
Register (PCR) Section 1.3.3 of Book I) because some
early implementations used bits 44:45 of
The layout of the Processor Compatibility Register the XER for implementation-specific pur-
(PCR) is shown in Figure 3 below. poses. On these implementations, and on
subsequent implementations that imple-
Version mented versions of the architecture that
bits precede V. 3.0, mfxer returned the con-
tents of the bits, despite that the bits were
v2.07
v2.06
v2.05
Bit Description
0:59 Reserved
PVR (see Section 4.3.1) Threads are numbered sequentially, with valid values
RPR (see Section 4.3.9) ranging from 0 to t-1, where t is the number of threads
PTCR (see Section 5.7.6.1) implemented. A thread for which TIR = n is referred to
AMOR (see Section 5.7.13.1) as “thread n.”
HMEER (see Section 6.2.10)
The layout of the TIR is shown below.
Time Base (see Section 7.2)
Virtual Time Base (see Section 7.3)
Hypervisor Decrementer (see Section 7.5) TIR
certain implementation-specific registers or imple- 0 63
mentation-specific fields in architected registers Figure 4. Thread Identification Register
The set of resources that are shared is implementa- Access to the TIR is privileged.
tion-dependent.
Since the thread number contained in this register is
Threads that share any of the resources listed above, different if it is read in hypervisor from when it is read in
with the exception of the PTCR, the PVR and the privileged, non-hypervisor state in implementations that
HRMOR, must be in the same partition. support sub-processors, the following conventions are
For each field of the LPCR, except the AIL, EVIRT, used.
ONL, HDICE, MER,PECE, HEIC, and HVICE fields, - The value returned in privileged, non-hypervi-
software must ensure that the contents of the field are sor state is referred to as the “privileged
identical among all threads that are in the same parti- thread number.”
tion and are not in hypervisor state.
- The value returned in hypervisor state is
referred to as the “hypervisor thread number.”
2.8 Sub-Processors
Hardware is allowed to sub-divide a multi-threaded pro- 2.10 Hypervisor Interrupt Lit-
cessor into “sub-processors” that appear to privileged
programs as multi-threaded processors with fewer
tle-Endian (HILE) Bit
threads. Such a multi-threaded processor appears to The Hypervisor Interrupt Little-Endian (HILE) bit is a bit
the hypervisor as a processor with a number of threads in an implementation-dependent register or similar
equal to the sum of all sub-processor threads, and in mechanism. The contents of the HILE bit are copied
which the LPIDR for each sub-processor must appear into MSRLE by interrupts that set MSRHV to 1 (see Sec-
to be shared among all threads of that sub-processor. tion 6.5), to establish the Endian mode for the interrupt
handler. The HILE bit is set, by an implementa-
tion-dependent method, only during system initializa-
2.9 Thread Identification Regis- tion.
ter (TIR) The contents of the HILE bit must be the same for all
threads under the control of a given instance of the
The TIR is a 64-bit read-only register that contains the hypervisor; otherwise all results are undefined.
thread number, which is a binary number correspond-
ing to the thread.
For implementations that do not support sub-proces-
sors, the thread number of a thread is unique among all
thread numbers of threads on the multi-threaded pro-
cessor.
For implementations that support sub-processors, the
value of this register depends on whether it is read in
hypervisor or privileged, non-hypervisor state as fol-
lows.
- When this register is read in privileged,
non-hypervisor state, the thread number is
unique among all thread numbers of threads
on the sub-processor.
- When this register is read in hypervisor state,
the thread number is unique among all thread
numbers of threads on the multi-threaded pro-
cessor.
6:28 Reserved
29:30 Transaction State (TS)
00 Non-transactional
01 Suspended
10 Transactional
11 Reserved
Changes to MSRTS that are caused by Trans- 0 The thread is in privileged state.
actional Memory instructions, and by invoca- 1 The thread is in problem state.
tion of the transaction's failure handler, take
effect immediately (even though these instruc- Programming Note
tions and events are not context synchroniz- Any instruction that sets MSRPR to 1 also
ing). sets MSREE, MSRIR, and MSRDR to 1.
31 Transactional Memory Available (TM)
50 Floating-Point Available (FP)
0 The thread cannot execute any Transac-
tional Memory instructions or access any 0 The thread cannot execute any float-
Transactional Memory registers. ing-point instructions, including float-
1 The thread can execute Transactional ing-point loads, stores, and moves.
Memory instructions and access Transac- 1 The thread can execute floating-point
tional Memory registers unless the Trans- instructions unless they have been made
actional Memory facility has been made unavailable by some other register.
unavailable by some other register. 51 Machine Check Interrupt Enable (ME)
32:37 Reserved 0 Machine Check interrupts are disabled.
38 Vector Available (VEC) 1 Machine Check interrupts are enabled.
0 The thread cannot execute any vector This bit is a hypervisor resource; see Chapter
instructions, including vector loads, 2., “Logical Partitioning (LPAR) and Thread
stores, and moves. Control”, on page 927.
1 The thread can execute vector instruc-
tions unless they have been made Programming Note
unavailable by some other register. The only instructions that can alter
39 Reserved MSRME are rfid and hrfid.
See below.
Programming Note
56:57 Reserved Software can use this bit as a pro-
58 Instruction Relocate (IR) cess-specific marker which, in conjunction
with MMCR0FCM0 FCM1 (see
0 Instruction address translation is disabled. Section 9.4.4) and MMCR2 (see
1 Instruction address translation is enabled. Section 9.4.6), permits events to be
counted on a process-specific basis. (The
Programming Note bit is saved by interrupts and restored by
See the Programming Note in the defini- rfid.)
tion of MSRPR.
Common uses of the PMM bit include the
59 Data Relocate (DR) following.
0 Data address translation is disabled. All counters count events for a few
Effective Address Overflow (EAO) (see selected processes. This use
Book I) does not occur. requires the following bit settings.
1 Data address translation is enabled. EAO - MSRPMM=1 for the selected pro-
causes a Data Storage interrupt. cesses, MSRPMM=0 for all other
processes
Programming Note - MMCR0FCM0=1
- MMCR0FCM1=0
See the Programming Note in the defini- - MMCR2 = 0x0000
tion of MSRPR.
All counters count events for all but a
60 Reserved few selected processes. This use
requires the following bit settings.
- MSRPMM=1 for the selected pro-
61 Performance Monitor Mark (PMM) cesses, MSRPMM=0 for all other
processes
This bit is used by software in conjunction with - MMCR0FCM0=0
the Performance Monitor, as described in - MMCR0FCM1=1
Chapter 9. - MMCR2 = 0x0000
Notice that for both of these uses a mark
value of 1 identifies the “few” processes
and a mark value of 0 identifies the
remaining “many” processes. Because
the PMM bit is set to 0 when an interrupt
occurs (see Figure 65 on page 1064),
interrupt handlers are treated as one of
the “many”. If it is desired to treat interrupt
handlers as one of the “few”, the mark
value convention just described would be
reversed.
If only a specific counter n is to be frozen,
MMCR0FCM0 FCM1 is set to 0b00, and
MMCR2FCnM0 and MMCR2FCnM1 instead
of MMCR0FCM0 and MMCR0FCM1 are set
to the values described above.
0 The thread is in Big-Endian mode. the second column refers to the MSRTS and MSRTM
1 The thread is in Little-Endian mode. values supplied by CTR for rfscv, BESCR for rfebb
(just the TS value), SRR1 for rfid, HSRR1 for hrfid, or
Programming Note register RS for mtmsrd.
The only instructions that can alter MSRLE
are rfid and hrfid, and rfscv.
Programming Note
The transition rules are the same for mtmsrd as for
the rfid-type instructions because if a transition
were illegal for mtmsrd but allowed for rfid, or vice
versa, software could use the instruction for which
the transition is allowed to achieve the effect of the
other instruction.
Sx Sx
Notes:
1.Generate TM Bad Thing type Program interrupt. “All others" includes all attempts to set MSRTS to 0b11
(reserved value).
2.Instruction completes, change to MSRTM suppressed, except when attempted by rfebb, in which case the result
is a TM Bad Thing type Program interrupt.
Programming Note
For rfscv, [h]rfid, and mtmsrd, the attempted tran-
sition from S0 to N0 is suppressed in order that
interrupt handlers that are "unaware" of transac-
tional memory, and load an MSR value that has not
been updated to take account of transactional
memory, will continue to work correctly. (If the
interrupt occurs when a transaction is running or
suspended, the interrupt will set MSRTS||TM to S0.
If the interrupt handler attempts to load an MSR
value that has not been updated to take account of
transactional memory, that MSR value will have TS
|| TM = N0. It is desirable that the interrupt handler
remain in state S0, so that it can return normally to
the interrupted transaction.)
The problem solved by suppressing this transition
does not apply to rfebb, so for rfebb an attempt to
transition from S0 to N0 is not suppressed, and
instead causes a TM Bad Thing type Program
interrupt.
ESL
SD
EC
PLS /// PSLL /// TR MTL RL
0 4 41 42 43 44 48 54 56 60
Programming Note
The Hypervisor Facility Unavailable inter-
rupt occurs when a privileged non-hyper-
visor program executes stop when
PSSCRRL > PSSCRPSLL so that the
Hypervisor may decide whether or not to
allow the requested loss of state to occur.
If the hypervisor decides that some loss of
state is acceptable, it may choose to
re-execute stop after either setting PSS-
CRMTL to a value that causes state loss,
or setting both PSSCRRL and PSSCRMTL
to values that cause state loss. When the
thread exits power-saving mode, the
hypervisor can quickly determine whether
any resources were actually lost and need
to be restored.
LR CIA + 4
CTR33:36 42:47 undefined if (MSR29:31 ¬= 0b010 | CTR29:31 ¬= 0b000) then
CTR0:32 37:41 48:63 MSR0:32 37:41 48:63 MSR29:31 CTR29:31
MSR new_value (see below) MSR48 CTR48 | CTR49
NIA (see below) MSR58 CTR58 | CTR49
MSR59 CTR59 | CTR49
The effective address of the instruction following the
MSR0:2 4:28 32 37:41 49:50 52:57 60:63CTR0:2 4:28 32 37:41 49:50 52:57
System Call Vectored instruction is placed into the Link
60:63
Register. Bits 0:32, 37:41, and 48:63 of the MSR are NIA iea LR0:61 || 0b00
placed into the corresponding bits of Count Register,
and bits 33:36 and 42:47 of Count Register are set to If bits 29 through 31 of the MSR are not equal to 0b010
undefined values. or bits 29 through 31 of the Count Register are not
equal to 0b000, then the value of bits 29 through 31 of
Then a System Call Vectored interrupt is generated. the Count Register is placed into bits 29 through 31 of
The interrupt causes the MSR to be altered as the MSR. The result of ORing bits 48 and 49 of the
described in Section 6.5. Count Register is placed into MSR48. The result of
The interrupt causes the next instruction to be fetched ORing bits 58 and 49 of the Count Register is placed
as specified in LPCRAIL (see to Section 2.2). into MSR58. The result of ORing bits 59 and 49 of the
Count Register is placed into MSR59. Bits 0:2, 4:28, 32,
The SRRs are not affected. 37:41, 49:50, 52:57, and 60:63 of the Count Register
are placed into the corresponding bits of the MSR.
This instruction is context synchronizing.
If the instruction attempts to cause an illegal transaction
Special Registers Altered:
state transition or, when TM is made unavailable in
LR CTR MSR
problem state by the PCR, attempts to cause a transi-
tion to problem state and also a transaction state transi-
tion that Table 3 on page 947 shows as legal and as
resulting in the thread being in Transactional or Sus-
pended state, a TM Bad Thing type Program interrupt
is generated (unless a higher-priority exception is
pending). If this interrupt is generated, the value placed
into SRR0 by the interrupt processing mechanism (see
Section 6.4.3) is the address of the rfscv instruction.
Otherwise, if the new MSR value does not enable any
pending exceptions, then the next instruction is
fetched, under control of the new MSR value, from the
address LR0:61 || 0b00 (when SF=1 in the new MSR
value) or 320 || LR32:61 || 0b00 (when SF=0 in the new
MSR value). If the new MSR value enables one or
more pending exceptions, the interrupt associated with
the highest priority pending exception is generated; in
this case the value placed into SRR0 or HSRR0 by the
interrupt processing mechanism (see Section 6.4.3) is
the address of the instruction that would have been
executed next had the interrupt not occurred.
This instruction is privileged and context synchronizing.
Special Registers Altered:
MSR
Programming Note
If this instruction sets MSRPR to 1, it also sets
MSREE, MSRIR, and MSRDR to 1.
Programming Note
For the power-saving level corresponding to the
second item above, if the state of the Decrementer
were not maintained and updated as if the thread
was not in power-saving mode, Decrementer
exceptions would not reliably cause exit from this
power-saving level even if Decrementer exceptions
were enabled to cause exit.
Programming Note
Software EBB handlers should ensure that previ-
ous exceptions have been cleared (by setting
BESCRPMEO and/or BESCREEO to 0) before
re-enabling event-based branches (by setting BES-
CRGE to 1 or executing rfebb 1) in order to prevent
earlier exceptions from causing additional EBBs.
4.1 Fixed-Point Facility Over- version number, such as clock rate and
Engineering Change level.
view Version numbers are assigned by the Power ISA pro-
This chapter describes the details concerning the regis- cess. Revision numbers are assigned by an implemen-
ters and the privileged instructions implemented in the tation-defined process.
Fixed-Point Facility that are not covered in Book I.
4.3.2 Chip Information Register
4.2 Special Purpose Registers The Chip Information Register (CIR) is a 32-bit
read-only register that contains a value identifying the
Special Purpose Registers (SPRs) are read and written manufacturer and other characteristics of the chip on
using the mfspr (page 975) and mtspr (page 974) which the processor is implemented. The contents of
instructions. Most SPRs are defined in other chapters the CIR can be copied to a GPR by the mfspr instruc-
of this book; see the index to locate those definitions. tion. Read access to the CIR is privileged; write access
is not provided.
Bit Description
4.3.1 Processor Version Register 32:35 Manufacturer ID (ID) A four-bit field that iden-
The Processor Version Register (PVR) is a 32-bit tifies the manufacturer of the chip.
read-only register that contains a value identifying the 36:63 Implementation-dependent.
version and revision level of the implementation. The
contents of the PVR can be copied to a GPR by the Figure 8. Chip Information Register
mfspr instruction. Read access to the PVR is privi-
leged; write access is not provided. 4.3.3 Processor Identification
Version Revision Register
32 48 63
The Processor Identification Register (PIR) is a 32-bit
Figure 7. Processor Version Register register that contains a 20-bit PROCID field that can be
used to distinguish the thread from other threads in the
The PVR distinguishes between implementations that system. The contents of the PIR can be copied to a
differ in attributes that may affect software. It contains GPR by the mfspr instruction. Read access to the PIR
two fields. is privileged; write access is not provided.
Version A 16-bit number that identifies the version
of the implementation. Different version
numbers indicate major differences
between implementations.
Revision A 16-bit number that distinguishes between
implementations of the version. Different
revision numbers indicate minor differences
between implementations having the same
driving the RUN Light on a system The maximum value to which the PSPB can be set
operator panel must be a power of 2 minus 1. Bits that are not required
Direct External exception routing to represent this maximum value must return 0s when
Performance Monitor Counter incre- read regardless of what was written to them.
menting (see Chapter 9)
When the PSPB is set to a value less than its maxi-
The RUN bit can be used by the operating mum value but greater than 0, its contents decrease
system to indicate when the thread is doing monotonically at the same rate as the SPURR until its
useful work. contents minus the amount it is to be decreased are 0
or less when a problem state program is executing on
Write access to the CTRL is privileged. Reads can be
the thread at a priority of medium high.When the con-
performed in privileged or problem state.
tents of the PSPB minus the amount it is to be
decreased are 0 or less, its contents are replaced by 0.
4.3.7 Program Priority Register When the PSPB is set to its maximum value or 0, its
Privileged programs may set a wider range of program contents do not change until it is set to a different value.
priorities in the PRI field of PPR and PPR32 than may Whenever the priority of a thread is medium high and
be set by problem state programs (see Chapter 3 of either of the following conditions exist, hardware
Book II). Problem state programs may only set values changes the priority to medium:
in the range of 0b001 to 0b100 unless the Problem
State Priority Boost register (see Section 4.3.8) allows - the PSPB counts down to 0, or
the value 0b101. Privileged programs may set values in - PSPB=0 and the privilege state of the thread
the range of 0b001 to 0b110. Hypervisor software may is changed to problem state (MSRPR=1).
also set 0b111. For all priorities except 0b101, if a pro-
gram attempts to set a value that is not allowed for its
privilege level, the PRI field remains unchanged. If a 4.3.9 Relative Priority Register
problem state program attempts to set its priority value
The Relative Priority Register (RPR) is a 64-bit register
to 0b101 when this priority value is not allowed for
that allows the hypervisor to control the relative priori-
problem state programs, the priority is set to 0b100.
ties corresponding to each valid value of PPRPRI.
The values and their corresponding meanings are as
follows.
/ RP1 RP2 RP3 RP4 RP5 RP6 RP7
Bit(s) Description 0 8 16 24 32 40 48 56
SPRG0
SPRG1
SPRG2
SPRG3
0 63
Programming Note
Neither the contents of the SPRGs, nor accessing
them using mtspr or mfspr, has a side effect on
the operation of the thread. One or more of the reg-
isters is likely to be needed by non-hypervisor inter-
rupt handler programs (e.g., as scratch registers
and/or pointers to per thread save areas).
Operating systems must ensure that no sensitive
data are left in SPRG3 when a problem state pro-
gram is dispatched, and operating systems for
secure systems must ensure that SPRG3 cannot
be used to implement a “covert channel” between
problem state programs. These requirements can
be satisfied by clearing SPRG3 before passing
control to a program that will run in problem state.
HSPRG0
HSPRG1
0 63
Programming Note
Neither the contents of the HSPRGs, nor accessing
them using mtspr or mfspr, has a side effect on
the operation of the thread. One or more of the reg-
isters is likely to be needed by hypervisor interrupt
handler programs (e.g., as scratch registers and/or
pointers to per thread save areas).
Load Byte and Zero Caching Inhibited Load Halfword and Zero Caching
Indexed X-form Inhibited Indexed X-form
lbzcix RT,RA,RB lhzcix RT,RA,RB
31 RT RA RB 853 / 31 RT RA RB 821 /
0 6 11 16 21 31 0 6 11 16 21 31
if RA = 0 then b 0 if RA = 0 then b 0
else b (RA) else b (RA)
EA b + (RB) EA b + (RB)
RT 560 || MEM(EA, 1) RT 480 || MEM(EA, 2)
Let the effective address (EA) be the sum Let the effective address (EA) be the sum
(RA|0)+ (RB). The byte in storage addressed by EA is (RA|0)+ (RB). The halfword in storage addressed by
loaded into RT56:63. RT0:55 are set to 0. EA is loaded into RT48:63. RT0:47 are set to 0.
The storage access caused by this instruction is per- The storage access caused by this instruction is per-
formed as though the specified storage location is formed as though the specified storage location is
Caching Inhibited and Guarded. Caching Inhibited and Guarded.
This instruction is hypervisor privileged. This instruction is hypervisor privileged.
Special Registers Altered: Special Registers Altered:
None None
Load Word and Zero Caching Inhibited Load Doubleword Caching Inhibited
Indexed X-form Indexed X-form
lwzcix RT,RA,RB ldcix RT,RA,RB
31 RT RA RB 789 / 31 RT RA RB 885 /
0 6 11 16 21 31 0 6 11 16 21 31
if RA = 0 then b 0 if RA = 0 then b 0
else b (RA) else b (RA)
EA b + (RB) EA b + (RB)
RT 320 || MEM(EA, 4) RT MEM(EA, 8)
Let the effective address (EA) be the sum Let the effective address (EA) be the sum
(RA|0)+ (RB). The word in storage addressed by EA is (RA|0)+ (RB). The doubleword in storage addressed by
loaded into RT32:63. RT0:31 are set to 0. EA is loaded into RT.
The storage access caused by this instruction is per- The storage access caused by this instruction is per-
formed as though the specified storage location is formed as though the specified storage location is
Caching Inhibited and Guarded. Caching Inhibited and Guarded.
This instruction is hypervisor privileged. This instruction is hypervisor privileged.
Special Registers Altered: Special Registers Altered:
None None
Store Byte Caching Inhibited Indexed Store Halfword Caching Inhibited Indexed
X-form X-form
stbcix RS,RA,RB sthcix RS,RA,RB
31 RS RA RB 981 / 31 RS RA RB 949 /
0 6 11 16 21 31 0 6 11 16 21 31
if RA = 0 then b 0 if RA = 0 then b 0
else b (RA) else b (RA)
EA b + (RB) EA b + (RB)
MEM(EA, 1) (RS)56:63 MEM(EA, 2) (RS)48:63
Let the effective address (EA) be the sum Let the effective address (EA) be the sum
(RA|0)+ (RB). (RS)56:63 are stored into the byte in stor- (RA|0)+ (RB). (RS)48:63 are stored into the halfword in
age addressed by EA. storage addressed by EA.
The storage access caused by this instruction is per- The storage access caused by this instruction is per-
formed as though the specified storage location is formed as though the specified storage location is
Caching Inhibited and Guarded. Caching Inhibited and Guarded.
This instruction is hypervisor privileged. This instruction is hypervisor privileged.
Special Registers Altered: Special Registers Altered:
None None
31 RS RA RB 917 / 31 RS RA RB 1013 /
0 6 11 16 21 31 0 6 11 16 21 31
if RA = 0 then b 0 if RA = 0 then b 0
else b (RA) else b (RA)
EA b + (RB) EA b + (RB)
MEM(EA, 4) (RS)32:63 MEM(EA, 8) (RS)
Let the effective address (EA) be the sum Let the effective address (EA) be the sum
(RA|0)+ (RB). (RS)32:63 are stored into the word in stor- (RA|0)+ (RB). (RS) is stored into the doubleword in
age addressed by EA. storage addressed by EA.
The storage access caused by this instruction is per- The storage access caused by this instruction is per-
formed as though the specified storage location is formed as though the specified storage location is
Caching Inhibited and Guarded. Caching Inhibited and Guarded.
This instruction is hypervisor privileged. This instruction is hypervisor privileged.
Special Registers Altered: Special Registers Altered:
None None
4.4.2 OR Instruction
or Rx,Rx,Rx can be used to set PPRPRI (see Section
4.3.7) as shown in Figure 17. For all priorities except
medium high, PPRPRI remains unchanged if the privi-
lege state of the thread executing the instruction is
lower than the privilege indicated in the figure. For pri-
ority medium high, PPRPRI is set to medium if the
thread executing the instruction is in problem state and
medium high priority is not allowed for problem state
programs. (The encodings available to problem state
programs, as well as encodings for additional shared
resource hints not shown here, are described in Chap-
ter 3 of Book II.)
Chapter 2).
3 This register cannot be directly written. Instead, bits in the register corresponding to 0 bits in (RS) can be
cleared using mtspr SPR,RS.
4 The value specified in register RS may be masked by the contents of the [U]AMOR before being placed into the
quent mtspr instructions. See the introductory text in this section for more details
10 SPR numbers 777-778, 783, 793-794, and 799 are reserved for the Performance Monitor. All other SPR num-
bers that are not shown above and are not implementation-specific are reserved.
11 The mftb instruction is Phased-Out. Assemblers targeting Version 2.03 or later of the architecture should gener-
ate an mfspr instruction for the mftb and mftbu extended mnemonics; see the corresponding Assembler Note
in the mftb instruction description (see Section 6.1 of Book II).
12 No extended mnemonic is provided because previous versions of the architecture defined the obvious extended
mnemonic as resolving to the non-privileged SPR number, and because there is no software benefit in using the
privileged SPR number, rather than the non-privileged SPR number, for this function.
13 mtspr specifying PIDR performs the implementation specific lookaside invalidation specified by slbia with
IH=0b011 when using Radix Tree translation.
14 mtspr specifying LPIDR performs the implementation specifc lookaside invalidation specified by slbia with
IH=0b110 when switching between partitions both of which use Radix Tree translation.
*This figure also defines extended mnemonics for the mtspr and mfspr instructions, including the Special Purpose
Registers (SPRs) defined in Book I and for the Move From Time Base instruction defined in Book II.
The mtspr and mfspr instructions specify an SPR as a numeric operand; extended mnemonics are provided that
represent the SPR in the mnemonic rather than requiring it to be coded as an operand. Similar extended mnemon-
ics are provided for the Move From Time Base instruction, which specifies the portion of the Time Base as a
numeric operand.
Note: mftb serves as both a basic and an extended mnemonic. The Assembler will recognize an mftb mnemonic
with two operands as the basic form, and an mftb mnemonic with one operand as the extended form. In the
extended form the TBR operand is omitted and assumed to be 268 (the value that corresponds to TB)
Move To Special Purpose Register a reserved SPR, and is treated as a no-op; see
XFX-form Section 1.3.3, “Reserved Fields, Reserved Values, and
Reserved SPRs” in Book I. Otherwise, the contents of
mtspr SPR,RS register RS are placed into the designated Special Pur-
pose Register, except as described in the next four
31 RS spr 467 / paragraphs. For Special Purpose Registers that are 32
0 6 11 21 31 bits long, the low-order 32 bits of RS are placed into the
SPR.
n spr5:9 || spr0:4 When the designated SPR is the Authority Mask Regis-
switch (n) ter (AMR), (using SPR 13 or SPR 29), or the desig-
case(13): if MSRHV PR = 0b10 then nated SPR is the Instruction Authority Mask Register
SPR(13) (RS)
(IAMR), and MSRHV PR=0b00, the contents of bit posi-
else
if MSRHV PR = 0b00 then tions of register RS corresponding to 1 bits in the
SPR(13) ((RS) & AMOR) | Authority Mask Override Register (AMOR) are placed
((SPR(13)) & ¬AMOR) into the corresponding bits of the AMR or IAMR,
else respectively; the other AMR or IAMR bits are not modi-
SPR(13) ((RS) & UAMOR) | fied.
((SPR(13)) & ¬UAMOR)
case(29,61):if MSRHV PR = 0b10 then When the designated SPR is the AMR, using SPR 13,
SPR(n) (RS) and MSRPR=1, the contents of bit positions of register
else RS corresponding to 1 bits in the User Authority Mask
SPR(n) ((RS) & AMOR) | Override Register (UAMOR) are placed into the corre-
((SPR(n)) & ¬AMOR) sponding bits of the AMR; the other AMR bits are not
case(48): SPR(n) (RS) modified.
if PATEHR=1 for the partition then
All implementation-specific When the designated SPR is the UAMOR and
lookaside information that was MSRHV PR=0b00, the contents of register RS are
created when address translation ANDed with the contents of the AMOR and the result is
was enabled and for which effPID0 placed into the UAMOR.
is invalidated.
case (157): if MSRHV PR = 0b10 then When the designated SPR is the PIDR and the partition
SPR(157) (RS) uses Radix Tree translation, the implementation spe-
else cific lookaside invalidation specified by slbia with
SPR(157) (RS) & AMOR IH=0b011 is performed along with the SPR update.
case (158):start mtspr sequence optimization When the designated SPR is the LPIDR and both the
case(319): SPR(n) (RS)
originating and destination partitions use Radix Tree
if PATEHR=1 for both the originating
and destination partitions than translation, the implementation specific lookaside inval-
All implementation-specific idation specified by slbia with IH=0b110 is performed
lookaside information that was along with the SPR update.
created when address translation
When the designated SPR is the Hypervisor Mainte-
was enabled and for which effPID0
or effLPID0 is invalidated. nance Exception Register (HMER), the contents of reg-
All implementation-specific ister RS are ANDed with the contents of the HMER and
lookaside information that was the result is placed into the HMER.
created when address translation
For this instruction, SPRs TBL and TBU are treated as
was disabled and MSRHV PR was equal
to 0b00 is invalidated. separate 32-bit registers; setting one leaves the other
case (336):SPR(336) (SPR(336)) & (RS) unaltered.
case (808, 809, 810, 811): spr0=1 if and only if writing the register is privileged.
default: if length(SPR(n)) = 64 then
SPR(n) (RS) Execution of this instruction specifying an SPR number
else with spr0=1 causes a Privileged Instruction type Pro-
SPR(n) (RS)32:63 gram interrupt when MSRPR=1 and, if the SPR is a
hypervisor resource (see Figure 18) when
MSRHV PR=0b00, causes a Privileged Instruction type
Program interrupt if LPCREVIRT=0 and a Hypervisor
The SPR field denotes a Special Purpose Register,
Emulation Assistance interrupt if LPCREVIRT=1.
encoded as shown in Figure 18. If the SPR field con-
tains the value 158, the instruction indicates the start of
a sequence of mtspr instructions that may be synchro- Execution of this instruction specifying an SPR number
nized as a group. See the introductory material in this that is undefined for the implementation causes one of
section for more information. If the SPR field contains the following.
a value from 808 through 811, the instruction specifies if spr0=0:
Note
See the Notes that appear with mtspr.
Programming Note
If this instruction sets MSRPR to 1, it also sets
MSREE, MSRIR, and MSRDR to 1.
This instruction does not alter MSRME or MSRLE.
(This instruction does not alter MSRHV because it
does not alter any of the high-order 32 bits of the
MSR.)
If the only MSR bits to be altered are MSREE RI, to
obtain the best performance L=1 should be used.
Move To Machine State Register Except in the mtmsrd instruction description in this
Doubleword X-form section, references to “mtmsrd” in this document imply
either L value unless otherwise stated or obvious from
mtmsrd RS,L context (e.g., a reference to an mtmsrd instruction that
modifies an MSR bit other than the EE or RI bit implies
31 RS /// L /// 178 / L=0).
0 6 11 15 16 21 31
Programming Note
if L = 0 then If this instruction sets MSRPR to 1, it also sets
MSREE, MSRIR, and MSRDR to 1.
if (MSR29:31 ¬= 0b010 | RS29:31 ¬= 0b000) then This instruction does not alter MSRLE, MSRME or
MSR29:31 RS29:31 MSRHV.
MSR48 (RS)48 | (RS)49
MSR58 (RS)58 | (RS)49 If the only MSR bits to be altered are MSREE RI, to
MSR59 (RS)59 | (RS)49 obtain the best performance L=1 should be used.
MSR0:2 4:28 32:47 49:50 52:57 60:62
(RS)0:2 4 6:28 32:47 49:50 52:57 60:62
else Programming Note
MSR48 62 (RS)48 62 If MSREE=0 and an External, Decrementer, or Per-
formance Monitor exception is pending, executing
The MSR is set based on the contents of register RS
an mtmsrd instruction that sets MSREE to 1 will
and of the L field.
cause the interrupt to occur before the next instruc-
L=0: tion is executed, if no higher priority exception
exists (see Section 6.9, “Interrupt Priorities” on
If bits 29 through 31 of the MSR are not equal to
page 1092). Similarly, if a Hypervisor Decrementer
0b010 or bits 29 through 31 of RS are not equal to
interrupt is pending, execution of the instruction by
0b000, then the value of bits 29 through 31 of RS
the hypervisor causes a Hypervisor Decrementer
is placed into bits 29 through 31 of the MSR.The
interrupt to occur if HDICE=1.
result of ORing bits 48 and 49 of register RS is
placed into MSR48. The result of ORing bits 58 and For a discussion of software synchronization
49 of register RS is placed into MSR58. The result requirements when altering certain MSR bits, see
of ORing bits 59 and 49 of register RS is placed Chapter 11.
into MSR59. Bits 0:2, 4:28, 32:47, 49:50, 52:57,
and 60:62 of register RS are placed into the corre-
Programming Note
sponding bits of the MSR.
mtmsrd serves as both a basic and an extended
L=1: mnemonic. The Assembler will recognize an
Bits 48 and 62 of register RS are placed into the mtmsrd mnemonic with two operands as the basic
corresponding bits of the MSR. The remaining bits form, and an mtmsrd mnemonic with one operand
of the MSR are unchanged. as the extended form. In the extended form the L
operand is omitted and assumed to be 0.
If the instruction attempts to cause an illegal transaction
state transition or, when TM is made unavailable in
problem state by the PCR, attempts to cause a transi-
tion to problem state and also a transaction state transi-
tion that Table 3 on page 947 shows as legal and as
resulting in the thread being in Transactional or Sus-
pended state, a TM Bad Thing type Program interrupt
is generated (unless a higher-priority exception is
pending). If this interrupt is generated, the value placed
into SRR0 by the interrupt processing mechanism (see
Section 6.4.3) is the address of the mtmsrd instruction.
This instruction is privileged.
If L=0 this instruction is context synchronizing. If L=1
this instruction is execution synchronizing; in addition,
the alterations of the EE and RI bits take effect as soon
as the instruction completes.
Special Registers Altered:
MSR
31 RT /// /// 83 /
0 6 11 16 21 31
RT MSR
The contents of the MSR are placed into register RT.
This instruction is privileged.
Special Registers Altered:
None
5.4 Data Access Stores are not performed out-of-order (even if the
Store instructions that caused them were executed
Data accesses are controlled by MSRDR. out-of-order).
Storage Control Overview 2. The hypervisor may assign a guest real address
space size for each partition that uses Radix Tree
Host real address space size is 2m bytes, m60; translation. Accesses to guest real storage out-
see Note 1. side this range but still mappable by the second
Guest real address space size is 2m bytes, m60; level Radix Tree will cause an HISI or HDSI.
see Notes 1 and 2. Accesses to storage outside the mappable range
will have boundedly undefined results.
Real page size is 212 bytes (4 KB).
3. The value of n is implementation-dependent (sub-
Effective address space size is 264 bytes.
ject to the range given above). In references to
For HPT translation, an effective address is trans- 78-bit virtual addresses elsewhere in this Book, the
lated to a virtual address via a segment descriptor high-order 78-n bits of the “78-bit” virtual address
that was either bolted into the Segment Lookaside are assumed to be zeros.
Buffer (SLB) by software or found and installed into
4. The supported values of p for the larger virtual
the SLB via a hardware walk of the Segment Table.
page sizes are implementation-dependent (subject
After that, the virtual address is translated to a host
to the limitations given above).
real address via a hardware walk of the Page
Table.
Programming Note
- Virtual address space size is 2n bytes,
65n78; see Note 3. Note that without some of the reserved bits in the
- Segment size is 2s bytes, s=28 or 40. Radix PTE, the RPN field cannot address the full
- 2n-40 number of virtual segments 2n-28; 60-bit real address space. Similarly without some
see Note 3. of the reserved bits in the HPT PTE, the ARPN field
- Virtual page size is 2p bytes, where 12p, and cannot address the full 60-bit real address space.
2p is no larger than either the size of the big- Note that without some of the reserved bits in the
gest segment or the real address space; a HPT PTE, the AVA field cannot resolve the full
size of 4 KB, 64 KB, and an implementa- 78-bit virtual address.
tion-dependent number of other sizes are sup-
ported; see Note 4. The Page Table specifies
the virtual page size. The SLB specifies the
base virtual page size, which is the smallest
5.7.1 32-Bit Mode
virtual page size that the segment can con- The computation of the 64-bit effective address is inde-
tain. The base virtual page size is 2b bytes. pendent of whether the thread is in 32-bit mode or
- Segments contain pages of a single size, a 64-bit mode. In 32-bit mode (MSRSF=0), the high-order
mixture of 4 KB and 64 KB pages, or a mixture 32 bits of the 64-bit effective address are treated as
of page sizes that include implementa- zeros for the purpose of addressing storage. This
tion-dependent page sizes. applies to both data accesses and instruction fetches. It
- applies independent of whether address translation is
For Radix Tree translation, an effective address is enabled or disabled. This truncation of the effective
translated to a (guest or host) real address via a address is the only respect in which storage accesses
hardware walk of the Page Table.. in 32-bit mode differ from those in 64-bit mode.
- Virtual page size is 2p bytes, where 12p, and
2p is no larger than the size of the real Programming Note
address space; a size of 4 KB, 64 KB, 2MB, Treating the high-order 32 bits of the effective
and an implementation-dependent number of address as zeros effectively truncates the 64-bit
other sizes are supported; see Note 4. The effective address to a 32-bit effective address such
virtual page size is determined by the location as would have been generated on a 32-bit imple-
of the Page Table Entry in the Radix Tree. mentation of the Power ISA. Thus, for example, the
ESID in 32-bit mode is the high-order four bits of
Notes:
this truncated effective address; the ESID thus lies
1. The value of m is implementation-dependent (sub- in the range 0-15. When address translation is
ject to the maximum given above). When used to enabled, these four bits would select a Segment
address storage or to represent a guest real Register on a 32-bit implementation of the Power
address, the high-order 60-m bits of the “60-bit” ISA. The SLB entries that translate these 16 ESIDs
real address must be zeros. can be used to emulate these Segment Registers.
Guarded
Programming Note
not SAO
There are two cautions about mixing different types
Additionally, storage accesses in hypervisor real of accesses (i.e.Load/Store Caching Inhibited
addressing mode are performed as though all storage instructions vs. any other Load or Store instruction
was not No-execute. vs. instruction fetches). The first, as indicated
above, is to avoid confusing the history mecha-
Programming Note nism, and the granularity for concern is a history
Because storage accesses in hypervisor real block. For this caution, instruction fetches are irrel-
addressing mode do not use the SLB or the Page evant because they have their own history mecha-
Table, accesses in this mode bypass all checking nism and are always intended to be non-guarded.
and recording of information contained therein
The second caution is to avoid storage paradoxes
(e.g., storage protection checks that use informa-
that result from a Caching Inhibited access to a
tion contained therein are not performed, and refer-
location that is held in a cache. The nature of this
ence and change information is not recorded).
caution and its solution are described in
Section 5.8.2.2, “Altering the Storage Control Bits”.
5.7.3.2.1 Hypervisor Real Mode Storage Control The minimum granularity for concern is the history
block, but may be larger, depending on extant
The Hypervisor Real Mode Storage Control facility pro- translations to the storage in question. Since the
vides a means of specifying portions of real storage consistency of instruction storage is managed by
that are treated as non-Guarded in hypervisor real software and hypervisor real mode instruction
addressing mode (MSRHV PR=0b10, and MSRIR=0 or fetches are always not Caching Inhibited, instruc-
MSRDR=0, as appropriate for the type of access). The tion fetches are also irrelevant to this caution.
remaining portions are treated as Guarded in hypervi-
sor real addressing mode. The means is a hypervisor The facility does not apply to implicit accesses to the
resource (see Chapter 2), and may also be sys- Page Table performed during address translation or in
tem-specific. recording reference and change information. These
The facility divides real storage into history blocks, in accesses are performed as described in Section
implementation-specific sizes. The history for instruc- 5.7.3.4.
tion fetches is tracked separately from that for data
accesses. If there is no instruction fetch history for a Programming Note
block and it is the target of an instruction fetch, the The preceding capability can be used to improve
access is performed as though the block is Guarded, the performance of hypervisor software that runs in
but the block is treated as non-Guarded for subsequent hypervisor real addressing mode, by causing
instruction fetches on a best effort basis, limited by the accesses to instructions and data that occupy
amount of history that the facility can maintain. If there well-behaved storage to be treated as
is no data access history for a block and it is accessed non-Guarded.
using a Load/Store Caching Inhibited instruction, the
access is performed as though the block is Guarded,
and the block is treated as Guarded for subsequent 5.7.3.3 Virtual Real Mode Addressing
accesses on a best effort basis, limited by the amount Mechanism
of history that the facility can maintain. If there is no
data access history for a block and it is accessed using If MSRHV=0, the partition is using Paravirtualized HPT
any other Load or Store instruction, the access is per- translation (PATEHR=0), and MSRDR=0 or MSRIR=0 as
formed as though the block is Guarded, but the block is appropriate for the type of access, the access is said to
treated as non-Guarded for subsequent accesses on a be made in virtual real addressing mode and is con-
best effort basis, limited by the amount of history that trolled by the mechanism specified below. The set of
the facility can maintain. storage locations accessible by code is referred to as
the Virtualized Real Mode Area (VRMA).
The storage location specified by a Load/Store Caching
Inhibited instruction must not be in storage that is spec- In virtual real addressing mode, address translation,
ified by the Hypervisor Real Mode Storage Control storage protection, and reference and change record-
facility to be treated as non-Guarded. The storage loca- ing are handled as follows.
tion specified by any other Load or Store instruction Address translation and storage protection are
must not be in storage that is specified by the Hypervi- handled as if address translation were enabled,
sor Real Mode Storage Control facility to be treated as except that translation of effective addresses to vir-
Guarded. ("specified by the Hypervisor Real Mode tual addresses use the SLBE values in Figure 19
Storage Control facility" means "specified in a history instead of the entry in the SLB corresponding to
block".) The history can be erased using an slbia the ESID. In this translation, bits 0:23 of the effec-
instruction; see Section 5.9.3.2. tive address are ignored (i.e., treated as if they
were 0s), bits 24:63-m may be ignored if m < 40, 5.7.3.4 Storage Control Attributes for
and the Virtual Page Class Key Protection mecha- Implicit Storage Accesses
nism does not apply.
Implicit accesses to the Partition Table and to a parti-
Programming Note tion-scoped Page Table during address translation and
The Virtual Page Class Key Protection mecha- in recording reference and change information are per-
nism does not apply because the authority formed as though the storage occupied by the tables
mask that an OS has set for application pro- had the following storage control attributes.
grams executing with address translation not Write Through Required
enabled may not be the same as the authority not Caching Inhibited
mask required by the OS when address trans- Memory Coherence Required
lation is disabled, such as when first entering not Guarded
an interrupt handler. not SAO
Reference and change recording are handled as if Implicit accesses to a Process Table, Segment table, or
address translation were enabled. process-scoped Page Table during address translation
and in recording reference and change information are
Field Value performed using the storage control attributes in the
360 partition-scoped Page Table Entry that maps the other
ESID In-Memory Table Entry or the process-scoped Page
V 1 Table Entry that is being accessed. The storage control
B 0b01 - 1 TB attributes must be those described above.
VSID 0b00 || 0x0_01FF_FFFF
Ks 0 5.7.4 Definitions
Kp undefined
process-scoped: Refers to translation performed
N 0
using tables pointed to by Process Table Entries: guest
L PATEPS[0] Radix Tree translation, host Radix Tree translation for
C 0 quadrants 0 and 3 when MSRHV=1 , or Segment trans-
LP PATEPS[1:2] lation.
used for the LPID and/or the PID is specified to be zero The two contiguous real pages beginning at real
instead of the actual register contents. “Effective” or address 0x0000_0000_0000_1000 are reserved
“eff” is used to indicate the possibility of such a substi- for implementation-specific purposes.
tution. This value substitution happens only in Radix
Offset Real Mode interrupt vectors
Tree translation, and is based on the value of EA0:1
(see Section 5.7.5.1, “Effective Address Space Struc- The real page beginning at the real address speci-
ture for Radix-using Partitions”). Value substitution fied by the HRMOR is used similarly to the page
does not happen in HPT translation. When a guest for the fixed interrupt vectors.
uses Radix Tree translation, PID substitution may take
Relocated interrupt vectors
place. When a host uses Radix Tree translation, both
PID and LPID substitution may take place. When a Depending on the values of MSRIR DR and
host uses HPT translation, the only special significance LPCRAIL and on whether the specific interrupt will
associated with LPIDR=0 is with regard to Segment cause MSRHV to change, either the virtual page
Table walk when MSRHV=1, as described later. containing the byte addressed by effective
address 0x0000_0000_0001_8000 or the virtual
adjunct: An adjunct is a software entity that resides in page containing the byte addressed by effective
a partition along with an operating system and its appli- address 0xC000_0000_0000_4000 may be used
cations in order to efficiently provide services (e.g. similarly to the page for the fixed interrupt vectors.
device drivers) for the partition. The adjunct is man- (See Section 2.2.)
aged by the hypervisor. It runs in problem state with
MSRHV PR=0b11, thereby restricting the resources it System Call Vectored interrupt vectors
can modify (MSRPR=1) and causing its interrupts to go Depending on the value of LPCRAIL, the virtual
to the hypervisor (MSRHV=1). It shares an HPT with page containing the effective address
the partition it serves. The adjunct’s storage is kept 0x0000_0000_0001_7000 or
separate from the client partition’s storage using Virtual 0xc000_0000_0000_3000 contains the interrupt
Page Class Key protection. (The adjunct’s lightness of vectors that are invoked by the System Call Vec-
weight derives from not requiring a full partition context tored instruction.
switch (SLB flush, TLB flush, LPID/PID change, etc.)
when the client partition invokes the services of the Partition Table
adjunct.) Each hardware thread may have its own A contiguous sequence of real pages beginning at
unique translations for an adjunct. As a result, adjunct the real address specified by the PTCR contains
segment descriptors cannot exist in the process’s Seg- the Partition Table.
ment Table and must instead be bolted in the SLB man-
ually. The adjunct construct exists only with a Page Table
hypervisor that uses HPT translation and only for A contiguous sequence of real pages beginning at
LPID0. The adjunct has its own 64-bit EA space. the real address specified by the first doubleword
Entry to an adjunct is only possible from hypervisor of the Partition Table Entry when HR=0 contains
state. Prior to dispatching the adjunct, the hypervisor the Page Table.
must invalidate SLB entries that map the effective
address range that will be used by the adjunct. Simi-
larly, on exit from the adjunct, the hypervisor must 5.7.5.1 Effective Address Space Struc-
invalidate its SLB entries ture for Radix-using Partitions
When Radix Tree translation is in use but translation is
disabled (MSRPR=0), MSRHV selects between parti-
tion-scoped translation of the real mode guest real
address, formed by treating EA0:1 as 0b00, and hyper-
5.7.5 Address Ranges Having visor real mode (see Section 5.7.3). When Radix Tree
translation is in use and translation is enabled, EA0:1
Defined Uses together with MSRHV are used to select one of as many
The address ranges described below have uses that as four Radix Trees with which to perform pro-
are defined by the architecture. cess-scoped translation, as a technique to make sys-
tem calls and interrupts more efficient by avoiding the
Fixed interrupt vectors need to immediately change the contents of the PIDR
Except for the first 256 bytes, which are reserved and LPIDR. (See Figure 20 for an illustration of the
for software use, the real page beginning at real mappings.) Since there’s nothing to prevent a process
address 0x0000_0000_0000_0000 is either used from generating any address in the 64b EA space, the
for interrupt vectors or reserved for future interrupt exceptional cases are defined as follows. When a
vectors. quadrant of the EA space has no associated Radix
Tree, access to it results in an Instruction Segment
Implementation-specific use exception or Data Segment exception, as appropriate
for the type of access. Similarly, reference to any por- ally be put in PIDR between accesses to quadrants 0
tion of these quadrants or the real mode guest real and 1.
address described above that is not mapped by a
When MSRHV=1 and EA0:1=0b00 or 0b11, only pro-
Radix Tree (versus mapped by an invalid entry) will
cess-scoped translation is performed. When MSRHV=0
cause an Instruction or Data Segment exception.
and MSRIR/DR=0, only partition-scoped translation is
performed. Otherwise, nested process- and parti-
Programming Note
tion-scoped translations are performed.
Note that the quadrant structure is only available to
software running in 64b mode. 32b software will
only be able to access storage mapped by its own
Radix Tree.
Guest Host App Hypervisor
Example:
0 2 3 55 58 63
1 RTS1 / RPDB RTS2 RPDS
/ PRTB // PRTS
0 3 51 58 63
aligned. Software that uses Radix Tree translation DW Bit(s) Name Description
must set the low order PRTS bits of PRTB to 0s. When 0 0:1 B Segment Table Segment
Segment Tables are provided, the Process Table base Size
address is specified as a VSID with the assumption that 2:63 STABORGU Segment Table Origin Upper
the Process Table is located at zero offset in the seg-
1 0:3 STABORGL Segment Table Origin Lower
ment, and also includes the base page size used for
the HPT search, with the rest of the implied segment 56:59 STABSIZE Segment Table Size
descriptor being B=0b01 (1TB segment), Ks=Kp=0, = 212+STABSIZE,
N=0, C=0, and virtual page class key protection does STABSIZE
not apply. The Partition Table Entry variants are illus- 60:62 STPS Segment Table Page Size
trated in Figure 22.Notethat a configuration with HR=1 (uses L||LP encoding as in
for a non-zero LPID and HR=0 for LPID=0 is consid- SLBE)
ered an unsupported MMU configuration because it 63 V Valid
would attempt to perform HPT translation in quadrants
0 and 3 when MSRHV=1. In addition, LPID=0 with 0 2 3 55 58 63
Radix Tree translation is an unsupported MMU configu- / RTS1 / RPDB RTS2 RPDS
ration when MSRHV=0. ///
0 63
Programming Note
The size of the Process Table is provided to sim- DW Bit(s) Name Description
plify hardware design and testing. The size
0 1:2 RTS1 Radix Tree Size[0:1]
enables the hardware to mask address bits instead
3 / Reserved
of providing an adder. No size checking is provided
for these tables. (An out-of-range LPID or PID will 4:55 RPDB Root Page Directory Base
not produce an exception simply because of its 56:58 RTS2 Radix Tree Size[2:4] (number of
size.) Hypervisor software may help detect such address bits mapped),
errors by the OS by not providing a translation for size=2RTS+31
virtual / guest real addresses for a page or two 59:63 RPDS Root Page Directory Size
beyond the end of the Process Table. = 2RPDS+3, RPDS5
The pairing of Segment translation and Hashed Page In Paravirtualized HPT mode, the hypervisor also uses
Table (HPT) translation applies Segment translation to the segment/HPT pairing, and can create a process
an effective address to produce a virtual address as called an “adjunct”. To do so, it eliminates any poten-
described in Section 5.7.8, and HPT translation to the tially conflicting guest segment mappings and creates
virtual address to produce a host real address as adjunct mappings prior to dispatching the adjunct.
described in Section 5.7.9. Segment translations can
In the other pairing, Radix Tree translation is used for
be established by both the guest and the hypervisor,
both the process-scoped and partition-scoped map-
but the HPT translation is always managed by the
pings. This mode is sometimes referred to as nested
hypervisor with the guest typically giving direction via
Radix or Radix on Radix translation. Figure 25 gives
system calls to the hypervisor in a paravirtualization
an overview of the address translation process for
relationship. This mode is commonly referred to as
Radix on Radix translation. Note that each level of the
Paravirtualized HPT translation. The segment transla-
guest Radix Tree produces a guest real address that
tion is managed on a per-process (“process-scoped”)
must itself undergo partition-scoped translation. See
basis, mapping a smaller effective address space into a
Figure 36 for a detailed illustration of the entire pro-
large “partition-scoped” virtual address space, where
cess.
the segment can be used as a shared memory object.
There is also the possibility of thread-unique mappings. Storage exceptions for process-scoped translation are
In the basic version of HPT translation, storage excep- directed to the operating system, and storage excep-
tions are directed to the operating system, which in turn tions for partition-scoped translation are directed to the
issues system calls to the hypervisor. When Virtualized hypervisor. (In this categorization, single level transla-
Partition Memory is enabled, storage exceptions are tion is considered process-scoped translation except
directed to the hypervisor, enabling a higher degree of when VPM is active, in which case it is treated like par-
memory overcommitment as the hypervisor transpar- tition-scoped translation.) As a result, for Radix on
ently steals pages from the partition. Figure 24 gives Radix translation, the hypervisor can use the parti-
an overview of the address translation process. tion-scoped mapping to limit the size of the guest real
address space, and Virtualized Partition Memory is not
necessary to enable a higher degree of memory over-
commitment. If in Radix on Radix mode the guest real
address is outside the range covered by the parti-
Effective Address tion-scoped Radix Tree, the results are boundedly
undefined.
The address specified in ASDR is the guest real
address or VSID for which translation has most imme-
diately failed except when the translation fails too early
Lookup in SLB and possibly
Segment Table to produce that value. HDAR will generally contain the
EA or lower VA bits for which translation has most
immediately failed. For example, in the case of a Page
Directory being paged out, the ASDR will contain the
guest real address of the Page Directory Entry (down to
bit 51), rather than the GRA of the datum being
Virtual Address
accessed. Exceptions may be manifest in unexpected
ways. For example, an instruction fetch can fail to set a
Change bit in the host PTE mapping the guest PTE.
Similarly, the Reference bit update might fail for lack of
write authority on the PTE.
Lookup in
Page Table
Real Address
Programming Notes
1. Page Table Entries may or may not be cached
in a TLB.
Effective Address
2. It is possible that the hardware implements
more than one TLB, such as one for data and
one for instructions. In this case the size and
shape of the TLBs may differ, as may the val-
Lookup in process-scoped ues contained therein.
Page Table 3. Use the tlbie instruction to ensure that the TLB
no longer contains a mapping for a particular
virtual page.
Lookup in partition-
scoped Page Table
Programming Note
encoding base page size It is permissible for software to replace the contents
of a valid SLB entry without invalidating the transla-
0b000 4 KB
tion specified by that entry provided the specified
0b101 64 KB restrictions are followed. See Chapter 11 Note 10.
additional 2b bytes, where b > 12 and b may differ
values1 among encoding values
1 The “additional values” are implementation-depen- 5.7.8.2 SLB Search
dent, as are the corresponding base virtual page When the hardware searches the SLB, all entries are
sizes. Any values that are not supported by a given tested for a match with the EA. For a match to exist, the
implementation are reserved in that implementa- following conditions must be satisfied for indicated
tion. fields in the SLBE.
V=1
Figure 28. Page Size Encodings ESID0:63-s=EA0:63-s, where the value of s is speci-
For each SLB entry, software must ensure the following fied by the B field in the SLBE being tested
requirements are satisfied. If no match is found, the search fails. If one match is
- L||LP contains a value supported by the imple- found, the search succeeds. If more than one match is
mentation. found, one of the matching entries is used as if it were
- The base virtual page size selected by the L the only matching entry, or a Machine Check occurs.
and LP fields does not exceed the segment If the SLB search succeeds, the virtual address (VA) is
size selected by the B field. formed from the EA and the matching SLB entry fields
- If s=40, the following bits of the SLB entry con- as follows.
tain 0s. VA=VSID0:77-s || EA64-s:63
- ESID24:35
- VSID38:49 The Virtual Page Number (VPN) is bits 0:77-p of the vir-
tual address. The value of p is the actual virtual page
The bits in the above two items are ignored by size specified by the PTE used to translate the virtual
the hardware. address (see Section 5.7.9.1). If SLBEN = 1, the N
(No-execute) value used for the storage access is 1.
match is found, one of the matching entries is used as if 5.7.9 Hashed Page Table Transla-
it were the only matching entry.
tion
ESID V // B VSID KsKpNLC / LP SW In Paravirtualized HPT mode, conversion of a 78-bit vir-
0 35 36 63 65 115 120 121 123 127
tual address to a real address is done by searching the
Page Table as shown in Figure 30.
Bit(s) Name Description
0:35 ESID Effective Segment ID
36 V Entry valid (V=1) or invalid (V=0)
28
AND
Page Table
16 bytes
PTEG 0 PTE0 PTE7
14 28 11 7
0000000
60-bit Real Address of Page Table Entry Group (PTEG)
PTEG n
128 bytes
0 12 57 6162 63 0 1 2 4 6 7 44 52 54 5556 57 61 62 63
(ARPN||LP)0:56-p
57-p p
60-bit Real Address 000 Byte
Dword Bit(s) Name Description A virtual page is mapped to a sequence of 2p-12 contig-
0 12:56 AVA Abbreviated Virtual Address uous real pages such that the low-order p-12 bits of the
57:60 SW Available for software use real page number of the first real page in the sequence
61 L Virtual page size are 0s.
0b0 - 4 KB
PTEL LP specify both a base virtual page size (hence-
0b1 - greater than 4KB
forth referred to as the “base page size”) and an actual
(large page)
virtual page size (henceforth referred to as the “actual
62 H Hash function identifier page size” or “virtual page size”). The actual page size
63 V Entry valid (V=1) or invalid is the size of the virtual page mapped by the PTE. The
(V=0) base page size is the smallest actual page size that a
1 0 pp Page Protection bit 0 segment can contain for explicit accesses or for a given
2:3 key KEY bits 0:1 implicit access, and plays a role in the placement of the
4:5 B Segment Size PTE in the HPT.
0b00 - 256 MB
If PTEL=0, the base virtual page size and actual virtual
0b01 - 1 TB
page size are 4KB, and ARPN concatenated with LP
0b10 - reserved
(ARPN||LP) contains the page number of the real page
0b11 - reserved
that maps the virtual page described by the entry.
If PTEL=1, the base page size and actual page size are
Programming Note
specified by PTELP. In this case, the contents of PTELP
have the format shown in Figure 32. Bits labelled “r” are Implementations often have TLBs and implementa-
bits of the real page number. Bits labelled “z” specify tion-specific lookaside buffers (e.g. ERATs) used to
the base page size and actual page size. The values cache translations of recently used storage
of the “z” bits used to specify each size are implemen- addresses. Mapping virtual storage to large pages
tation-dependent. The values of the “z” bits used to may increase the effectiveness of such lookaside
specify each size, along with all possible values of “r” buffers, improving performance, because it is pos-
bits in the LP field, must result in LP values distinct sible for such buffers to translate a larger range of
from other LP values for other sizes. Actual page sizes addresses, reducing the frequency that the Page
4KB and 64KB are always supported; other actual page Table must be searched to translate an address.
sizes are implementation-dependent. If PTEL=1, the
actual page size must be greater than 4 KB. Which
combinations of different base page size and actual
page size are supported is implementation-dependent,
except that the combination of a base page size of 4 Instructions cannot be executed from a No-execute
KB with an actual page size of 64 KB is always sup- (N=1) page.
ported.
PTE LP actual page size Page Table Size
r r r r _r r r z 8 KB
r r r r _r r z z 16 KB The number of entries in the Page Table directly affects
r r r r _r z z z 32 KB performance because it influences the hit ratio in the
r r r r _z z z z 64 KB Page Table and thus the rate of page faults. If the table
r r r z _z z z z 128 KB
r r z z _z z z z 256 KB is too small, it is possible that not all the virtual pages
r z z z _z z z z 512 KB that actually have real pages assigned can be mapped
z z z z _z z z z 1 MB via the Page Table. This can happen if too many hash
collisions occur and there are more than 16 entries for
Figure 32. Format of PTELP when PTEL=1 the same primary/secondary pair of PTEGs (when the
secondary Page Table search is enabled) or more than
There are at least 2 formats of PTELP that specify a 8 entries for the same primary PTEG (when the sec-
64 KB page. One format is used with SLBEL||LP = ondary Page Table search is disabled).
0b000 and one format is used with SLBEL||LP = 0b101.
While this situation cannot be guaranteed not to occur
The actual page size selected by the LP field must not for any size Page Table, making the Page Table larger
exceed the segment size selected by the B field. Forms than the minimum size (see Section 5.7.6.1) will reduce
of PTELP not supported by a given implementation are the frequency of occurrence of such collisions.
treated as reserved values for that implementation.
The concatenation of the ARPN field and bits labeled Programming Note
“r” in the LP field contain the high-order bits of the real If large pages are not used, it is recommended that
page number of the real page that maps the first 4KB of the number of PTEGs in the Page Table be at least
the virtual page described by the entry. half the number of real pages to be accessed. For
example, if the amount of real storage to be
The low-order p-12 bits of the real page number con-
accessed is 231 bytes (2 GB), then we have
tained in the ARPN and LP fields must be 0s and are
231-12=219 real pages. The minimum recom-
ignored by the hardware.
mended Page Table size would be 218 PTEGs, or
Programming Note 225 bytes (32 MB).
40
Root Page Directory Base 13
2 + 10 + 40 p
Effective Page Number Byte
0 1 2 11 12 24 25 33 34 42 43 51
quadrant select
40 13 3
000
56-bit Real Address of Root (Level 1) PDE Level 1
Read Level 1 PDE PDE Access
44
V L 9
Contents of PTE
L=1,V=1
56-p p
56-bit Real Address Byte
Figure 33. Four level Radix Tree walk translating a 5.7.10.1 Radix Tree Page Directory
52b EA with NLS=13 in the root PDE and Entry
NLS=9 in the other PDEs.
.
(#3)
Hypervisor Page
Table Accesses (#4)
(#5)
40 13 3
(#12)
56-bit Real Address of Level 2 PDE Guest Level 2
Read Level 2 PDE PDE Access (#13)
44 Guest Real Address
of Level 3 PDE (#14)
V L 9 (#11)
(#15)
Contents of Level 2 PDE
L=0,V=1
44 9
L=0,V=1
44 9
(#22)
56-bit Real Address of PTE Guest
Read PTE PTE Access (#23)
44 Guest Real Address
(#24)
V L (#21)
(#25)
Contents of PTE
L=1,V=1
56-p p
56-bit Real Address Byte
Figure 36. Radix on Radix Page Table search for a For read, write, and execute authority, each is con-
52-bit EA depicting memory reads 1-24 trolled independently based on the least permissive
numbered in sequence setting of the two translation mechanisms (including all
component authority mechanisms within each of them).
When nested translation is being performed, there is
For storage ordering, the SAO attribute takes effect
the potential for two different sets of protection settings
when both SAO and normal memory attributes are
and two different sets of storage attributes. For protec-
specified. (The hypervisor will typically specify “normal
tion settings, the least permissive values take effect.
memory” and the OS may override that with SAO.) The
Guarded attribute is controlled by the process-scoped Unless otherwise stated or obvious from context, refer-
PTE. Mismatches of the Caching Inhibited attribute ences elsewhere in the Books to storage control attri-
have the following behavior. If the process-scoped butes for nested Radix Tree translations apply to the
PTE specifies I=1 when the partition-scoped PTE spec- result of combining the guest and host storage control
ifies I=0, the result is I=0. The reverse mismatch raises attributes as specified above. For example, the
a data storage or instruction storage exception, as description of TM disallowed access types in Section
appropriate for the access. The results of these rules 5.3.1 applies to the results of the combining.
are shown in Table 4. Together these rules can pro-
duce the WIMG=0b0011 state that any individual Att
value cannot express.
partition-scoped
00 01 10 11
Att
process-scoped
SAO/I/G 000 100 011 010
Att
00 000 000 100 Att mismatch Att mismatch
01 100 100 100 Att mismatch Att mismatch
10 011 001 001 011 011
11 010 000 000 010 010
Table 4: Effective SAO, I and G attributes for nested use of the Partition Table Entry to find the parti-
translation tion-scoped Page Table
use of the Partition Table Entry and the parti-
Programming Note tion-scoped Page Table to find the required Pro-
The mismatched Caching Inhibited attribute in the cess Table Entry
lower left quadrant above is given defined behavior use of the Process Table Entry and parti-
instead of excepting in order to support frame buf- tion-scoped Page Table to find the required Seg-
fer emulation. For frame buffer emulation, the ment Table Entry or walk the process-scoped Page
guest believes it is writing to a frame buffer (I=1) in Table (i.e. translate the effective address to a vir-
address space that the hypervisor maps to normal tual or guest real address), and
memory (I=0). use of the partition-scoped Page Table to translate
the virtual or guest real address.
Reference and Change bit recording is done in both the Depending on the translation mode and process state,
process-scoped and partition-scoped Page Table some of these steps may be skipped. The following
Entries. Recording is done as described in subsections enumerate the cases and explain the
Section 5.7.12, “Reference and Change Recording”. steps in more detail.
For performance reasons, the result of each walk of a
Radix Tree may be cached in a TLB. Logically, the 5.7.11.1 Fully-Qualified Address
result of each walk is cached separately. For nested
translation, the effective to guest real (process-scoped) The storage control facilities enable hardware to per-
translation may be cached, as well as the parti- form the entire translation process given a “fully-quali-
tion-scoped translation for each guest real address pro- fied address” and context that makes it a unique input.
duced by the translation process. A minimum of two In addition to its normal use, the term “effective
TLB accesses is required to complete a nested transla- address” is sometimes used as shorthand for the
tion: one for the effective to guest real address and one fully-qualified address, and the architecture should be
for the guest real to host real address. (An implemen- read with this possibilty in mind. The following are the
tation may optimize the process, as long as the optimi- components of the fully-qualified address.
zation can be managed correctly using the tlbie effLPID
instructions that software will use to manage the logical effPID
model.) EA
The additional context required to perform a translation
5.7.11 Translation Process or match a cached translation may include the follow-
ing.
As previously described, in its most complicated form PATEHR (selected using the value in LPIDR, not
the translation process includes the following steps: effLPID)
use of the PTCR to find the required Partition Table MSRHV PR IR DR
Entry
Programming Note
Reference and Change bits are set by the hardware as
The interrupt indicates to software that it must set
described below. An attempt to access storage may
the appropriate bit(s) itself. Note that an instruction
cause one or more of the bits to be set (as described
fetch can cause a Change bit to be set, for example
below) even if the access is not performed. The bits are
in the host Page Table Entry that maps the guest
updated in the Page Table Entry if the new value would
Page Table Entry if the instruction fetch causes the
otherwise be different from the old value for the virtual
Reference bit to be set in the guest Page Table
page, as determined by examining either the Page
Entry.
Table Entry or any lookaside information for the virtual
page (e.g., TLB) maintained by the hardware.
Reference Bit
The Reference bit is set to 1 if the corresponding
access (load, store, implicit access, or instruction
fetch) is required by the sequential execution
model and is performed. Otherwise the Reference
bit may be set to 1 if the corresponding access is
attempted, either in-order or out-of-order, even if
the attempt causes an exception, except that the
Reference bit is not set to 1 for the access caused
by an indexed Move Assist instruction for which
the XER specifies a length of zero.
Change Bit
The Change bit is set to 1 if a Store instruction is
executed and the store is performed or if an
Programming Note
Because the sync instruction is execution synchro- Status of Access R C
nizing, the set of Reference and Change bit Indexed Move Assist insn w 0 len in XER No No
updates that are performed with respect to the Load or Store Atomic instruction with Acc1 No
thread executing the sync instruction before the invalid function code, Load or Store
memory barrier is created includes all Reference Caching Inhibited executed when
and Change bit updates associated with instruc- MSRDR=1
tions preceding the sync instruction.
Storage protection violation Acc1 No
Out-of-order Store-type inst’n, including
If software refers to a Page Table Entry when transactional Store-type inst’n, exclud-
MSRDR=1 or MSRHV=0, the Reference and Change ing dcbtst
bits in the associated Page Table Entry are set as for Would be required by the sequential
ordinary loads and stores. See Section 5.10 for the execution model in the absence of
rules software must follow when updating Reference
system-caused or imprecise
and Change bits.
interrupts3, or transaction failure Acc Acc1 2
Figure 39 on page 1010 summarizes the rules for set- All other cases Acc No
ting the Reference and Change bits. The table applies Out-of-order I-fetch or Load-type Inst’n Acc No
to each atomic storage reference. It should be read (including transactional Load-type
from the top down; the first line matching a given situa- inst’n or dcbtst)
tion applies. For example, if stwcx. fails due to both a In-order Load-type or Store-type insn,
storage protection violation and the lack of a reserva-
access not performed4
tion, the Change bit is not altered. The figure applies to
PTE(s) that map instructions or storage operands of Store-type insn Acc Acc2
instructions. When Radix Tree translation is in use, Load-type insn Acc No
Reference and Change bits are set in other, parti- Other in-order access
tion-scoped, PTEs as described earlier in this section. Other ordinary Store, dcbz Yes Yes
icbi, icbt, dcbt, dcbtst, dcbst, dcbf[l] Acc No
I-fetch or ordinary Load Yes No
In the figure, the “Load-type” instructions are the Load “Acc” means that it is acceptable to set the bit.
instructions described in Books I, II, and III, and the 1 It is preferable not to set the bit.
Cache Management instructions that are treated as 2 If C is set, R is also set unless it is already set.
Loads. The “Store-type” instructions are the Store 3 For Floating-Point Enabled Exception type Pro-
instructions described in Books I, II, and III, and the gram interrupts, “imprecise” refers to the exception
Cache Management instructions that are treated as mode controlled by MSRFE0 FE1.
Stores. The Load Atomic and Store Atomic instructions 4 This case does not apply to the Touch instructions,
are considered to be both loads and stores, and as a because they do not cause a storage access.
result could match “Load-type” and “Store-type” entries
in the table. As a result, “Store-type” entries precede Figure 39. Setting the Reference and Change bits
“Load-type” entries in the table so that AMOs match
“Store-type” entries. The “ordinary” Load and Store
instructions are those described in Books I, II, and III.
“set” means “set to 1”.
5.7.13 Storage Protection imal range of effective, virtual, or guest real addresses
that cannot be mapped to real addresses. A protection
The storage protection mechanism provides a means boundary is a boundary between protection domains.
for selectively granting instruction fetch access, grant-
ing read access, granting write access, and prohibiting
5.7.13.1 Virtual Page Class Key Protec-
access to areas of storage based on a number of con-
trol criteria. tion
The operation of the storage protection mechanism The Virtual Page Class Key Protection mechanism pro-
depends on the contents of one or more of the follow- vides the means to assign virtual pages to one of 32
ing. classes, and to modify data access permissions for
each class by modifying the Authority Mask Register
- MSR bits HV, IR, DR, PR (AMR), shown in Figure 40, and to modify instruction
- the key bits and N bit in the associated SLB access permissions for each class by modifying the
entry Instruction Authority Mask Register (IAMR) shown in
Figure 41.
- the page protection bits, key bits, N bit, and G
attribute in the associated PTE Programming Note
- the AMR, IAMR, AMOR, and UAMOR If address translation is disabled for a given
The storage protection mechanism consists of the Vir- access, the access is not affected by the Virtual
tual Page Class Key Protection mechanism described Page Class Key Protection mechanism even if the
in Section 5.7.13.1, the Basic Storage Protection mech- access is made in virtual real addressing mode.
anism described in Section 5.7.13.2 and Section
5.7.13.3, and the Radix Tree Translation Storage Pro-
tection mechanism described in Section 5.7.13.4.
Authority Mask Register
When address translation is enabled for an access, the Key0 Key1 Key2 ... Key29 Key30 Key31
access is permitted in Paravirtualized HPT mode if and 0 2 4 6 58 60 62
only if the access is permitted by both the Virtual Page
Class Key Protection mechanism and the Basic Stor-
Bits Name Description
age Protection mechanism. When address translation
0:1 Key0 Access mask for class number 0
is enabled for a guest access, the access is permitted
2:3 Key1 Access mask for class number 1
in Radix on Radix mode if and only if the access is per-
… … …
mitted by the Radix Tree Translation Storage Protection
2n:2n+1 Keyn Access mask for class number n
mechanism for both the process-scoped and parti-
… … …
tion-scoped PTEs. When address translation is dis-
62:63 Key31 Access mask for class number 31
abled for a guest access or is enabled for an access
with MSRHV=1, the access is permitted in Radix on Figure 40. Authority Mask Register (AMR)
Radix mode if and only if the access is permitted by the
Radix Tree Translation Storage Protection mechanism The access mask for each class defines the access
for the partition-scoped PTE. When address transla- permissions that apply to loads and stores for which the
tion is disabled for an access with MSRHV=1, the virtual address is translated using a Page Table Entry
access is permitted if and only if the access is permit- that contains a Key field value equal to the class num-
ted by the Basic Storage Protection mechanism. If an ber. The access permissions associated with each
instruction fetch is not permitted, an Instruction Storage class are defined as follows, where AMR2n and
exception or a Hypervisor Instruction Storage exception AMR2n+1 refer to the first and second bits of the access
is generated. If a data access is not permitted, a Data mask corresponding to class number n.
Storage exception or a Hypervisor Data Storage excep- - A store is permitted if AMR2n=0b0; otherwise
tion is generated. the store is not permitted.
A protection domain is a maximal range of effective - A load is permitted if AMR2n+1=0b0; otherwise
addresses, virtual addresses, or guest real addresses the load is not permitted.
for which variables related to storage protection can be
independently specified (including by default, as in real The AMR can be accessed using either SPR 13 or
and hypervisor real addressing modes), or a maximal SPR 29. Access to the AMR using SPR 29 is privi-
range of addresses, effective, virtual, or guest real, for leged.
which variables related to storage protection cannot be
specified. Examples include: a segment, a virtual page
(including for the Virtualized Real Mode Area), the Vir-
tualized Real Mode Area, the effective address range
0:260-1 in hypervisor real addressing mode, and a max-
Programming Note
The preceding requirement permits designs to
implement the AMOR and/or UAMOR as 32-bit
registers — specifically, to implement only the
even-numbered bits (or only the odd-numbered
bits) of the register — in a manner such that the
reduction, from the architecturally-required 64 bits
to 32 bits, is not visible to (correct) software. This
implementation technique saves space in the hard-
ware. (A design that uses this technique does the
appropriate “fan in/out” when the register is
accessed, to provide the appearance, to (correct)
software, of supporting all 64 bits of the register.)
Permitting designs to implement the [U]AMOR as
32-bit registers by virtue of the software require-
ment specified above, rather than by defining the
[U]AMOR as 32-bit registers, permits the architec-
ture to be extended in the future to support con-
trolling modification of the “read access” AMR bits
(the odd-numbered bits) independently from the
“write access” AMR bits (the even-numbered bits),
if that proves desirable. If this independent control
does prove desirable, the only architecture change
would be to eliminate the software requirement.
Programming Note
When modifying the AMOR and/or UAMOR, the
hypervisor should ensure that the two registers are
consistent with one another before giving control to
a non-hypervisor program. In particular, the hyper-
visor should ensure that if AMORi=0 then
UAMORi=0, for all i in the range 0:63. (Having
AMORi=0 and UAMORi=1 would permit problem
state programs, but not the operating system, to
modify AMR bit i.)
Programming Note
The Virtual Page Class Key Protection mechanism replaces the Data Address Compare mechanism that was defined
in versions of the architecture that precede Version 2.04 (e.g., the two facilities use some of the same resources, as
described below). However, the Virtual Page Class Key Protection mechanism can be used to emulate the Data
Address Compare mechanism. Moreover, programs that use the Data Address Compare mechanism can be modi-
fied in a manner such that they will work correctly both on implementations that comply with versions of the architec-
ture that precede Version 2.04 (and hence implement the Data Address Compare mechanism) and on
implementations that comply with Version 2.04 of the architecture or with any subsequent version (and hence instead
implement the Virtual Page Class Key Protection mechanism). The technique takes advantage of the facts that the
SPR number for privileged access to the AMR (29) is the same as the SPR number for the Data Address Compare
mechanism's ACCR (Address Compare Control Register), that KEY4 occupies the same bit in the PTE as the Data
Address Compare mechanism's AC (Address Compare) bit, and that the definition of ACCR62:63 is very similar to the
definition of each even-odd pair of AMR bits. The technique is as follows, where PTE1 refers to doubleword 1 of the
PTE.
- Set bits 2:3 and 62:63 of SPR 29 (which is also be used for any virtual pages for which it
either the ACCR or the AMR) to x, where x is is desired that the Virtual Page Class Key Pro-
the desired 2-bit value for controlling Data tection mechanism permit all accesses. Do
Address Compare matches, and set bits 0:1 to not use PTEKEY =31.
0s.
- When a Hypervisor Data Storage interrupt
- Set PTE154 (which is either the AC bit or occurs, if HDSISR42=1 then ignore the inter-
KEY4) to the same value that the AC bit would rupt for Cache Management instructions other
be set to, and set PTE12:3 (which are either than dcbz. (These instructions can cause a
RPN bits, that correspond to a real address virtual page class key protection violation but
size larger than the size supported by any cannot cause a Data Address Compare
implementation that supports the Data match.) Otherwise forward the interrupt to the
Address Compare mechanism, or KEY0:1) operating system, which will treat the interrupt
and PTE152:53 (which are either reserved bits as if a Data Address Compare match had
or KEY2:3) to 0s. occurred. (Note: Cases for which it is unde-
fined whether a Data Address Compare match
- Use PTEKEY values 0 and 1 only for purposes
occurs do not necessarily cause a virtual page
of emulating the Data Address Compare
class key protection violation.)
mechanism, except that PTEKEY value 0 may
(Because privileged software can access the AMR using either SPR 13 or SPR 29, it might seem that, when SPR 13
was added to the architecture (in Version 2.06), SPR 29 should have been removed. SPR 29 is retained for two rea-
sons: first, to avoid requiring privileged software to change to use the newer SPR number; and second, to retain the
ability to emulate the Data Address Compare mechanism as described above.)
MSR bit IR or DR is 1 for instruction fetches or Load or Store instruction that causes an Alignment
data accesses respectively, or MSRHV=0, and exception, or that causes a [Hypervisor] Data Stor-
either G=1 or Att=0b010 in the relevant Page Table age exception for reasons other than Data
Entry. Address Watchpoint match.
The portion of the storage operand that is in Cach- age control bits that specify the presence or absence of
ing Inhibited and Guarded storage is not accessed. the corresponding storage control for all accesses
translated by the entry as shown in Figure 46 and Fig-
(The corresponding rules for instructions that
ure 47. In the following description, references to indi-
cause a Data Address Watchpoint match are given
vidual WIMG bits apply to the corresponding Radix Att
in Section 8.4.)
encoding, or to the result of combining the pro-
cess-scoped and partition-scoped ATT encodings (see
5.8.1.1 Out-of-Order Accesses to Section 5.7.10.3), except where otherwise stated or
Guarded Storage obvious from context.
In Sections 5.8.2.1 and 5.8.2.2, “access” includes At any given time, the value of the I bit must be the
accesses that are performed out-of-order, and refer- same for all accesses to a given real page.
ences to W, I, M, and G bits include the values of those
bits that are implied when the thread is in hypervisor
real addressing mode.
5.8.2.2 Altering the Storage Control
Bits
Programming Note When changing the value of the W bit for a given real
In a system consisting of only a single-threaded page from 0 to 1, software must ensure that no thread
processor which has caches, correct coherent exe- modifies any location in the page until after all copies of
cution does not require storage to be accessed as locations in the page that are considered to be modified
Memory Coherence Required, and accessing stor- in the data caches have been copied to main storage
age as not Memory Coherence Required may give using dcbst or dcbf[l].
better performance.
When changing the value of the I bit for a given real
page from 0 to 1, software must set the I bit to 1 and
then flush all copies of locations in the page from the
5.8.2.1 Storage Control Bit Restrictions caches using dcbf[l] and icbi before permitting any
All combinations of W, I, M, and G values are permitted other accesses to the page. Note that similar cache
except those for which both W and I are 1 and management is required before using the Fixed-Point
M||G 0b10. Load and Store Caching Inhibited instructions to
access storage that has formerly been cached. (See
The combination WIMG = 0b1110 is used to identify the Section 4.4.1 on page 965.)
Strong Access Ordering (SAO) storage attribute (see
Section 1.7.1, “Storage Access Ordering”, in Book II). Programming Note
Because this attribute is not intended for general pur-
pose programming, it is provided only for a single com- The storage control bit alterations described above
bination of the attributes normally identified using the are examples of cases in which the directives for
WIMG bits. That combination would normally be indi- application of statements about the W and I bits to
cated by WIMG = 0b0010. SAO given in the third paragraph of the preceding
subsection must be applied. A transition from the
References to Caching Inhibited storage (or storage typical WIMG=0b0010 for ordinary storage to
with I=1) elsewhere in the Power ISA have no applica- WIMG=0b1110 for SAO storage does not require
tion to SAO storage or its WIMG encoding, despite the the flush described above because both WIMG
encoding using I=1. Conversely, references to storage combinations indicate storage that is not Caching
that is not Caching Inhibited (or storage with I=0) apply Inhibited.
to SAO storage or its WIMG encoding. References to
Write Through Required storage (or storage with W=1)
elsewhere in the Power ISA have no application to Programming Note
SAO storage or its WIMG encoding, despite the fact It is recommended that dcbf be used, rather than
that the encoding uses W=1. Conversely, references to dcbfl, when changing the value of the I or W bit
storage that is not Write Through Required (or storage from 0 to 1. (dcbfl would have to be executed on all
with W=0) apply to SAO storage or its WIMG encoding. threads for which the contents of the data cache
may be inconsistent with the new value of the bit,
If a given real page is accessed concurrently as SAO whereas, if the M bit for the page is 1, dcbf need be
storage and as non-SAO storage, the result may be executed on only one thread in the system.)
characteristic of the weakly consistent model.
When changing the value of the M bit for a given real
page, software must ensure that all data caches are
Programming Note consistent with main storage. The actions required to
If an application program requests both the Write do this are system-dependent.
Through Required and the Caching Inhibited attri-
butes for a given storage location, the operating Programming Note
system should set the I bit to 1 and the W bit to 0. For example, when changing the M bit in some
The operating system should provide a means by directory-based systems, software may be required
which application programs can request SAO stor- to execute dcbf[l] on each thread to flush all stor-
age, in order to avoid confusion with the preceding age locations accessed with the old M value before
guideline (since SAO is encoded using WI=0b11). permitting the locations to be accessed with the
new M value.
At any given time, the value of the W bit must be the
same for all accesses to a given real page.
extended mnemonic ptesync (equivalent to sync with ptesync instruction completes. Executing a pte-
L=2). sync instruction ensures that all such searches
that implicitly load from the target location of such
The ptesync instruction has all of the properties of
a store willl obtain the value stored (or a value
sync with L=0 and also the following additional proper-
stored subsequently). Also, the memory barrier
ties.
created by the ptesync insruction ensures that all
The memory barrier created by the ptesync searches of these tables by any other thread, that
instruction provides an ordering function for the are performed after a store in set B of the memory
storage accesses associated with all instructions barrier has been performed with respect to the
that are executed by the thread executing the pte- other thread and that implicitly load from the target
sync instruction and, as elements of set A, for all location of such a store, will obtain the value stored
Reference and Change bit updates associated (or a value stored subsequently).
with additional address translations that were per-
In conjunction with the tlbie and tlbsync instruc-
formed, by the thread executing the ptesync
tions, the ptesync instruction provides an ordering
instruction, before the ptesync instruction is exe-
function for TLB invalidations and related storage
cuted. The applicable pairs are all pairs ai,bj in
accesses on other threads as described in the tlb-
which bj is a data access and ai is not an instruc-
sync instruction description on page 1042.
tion fetch.
The ptesync instruction causes all Reference and Similarly, in conjunction with the slbieg or slbiag
Change bit updates associated with address trans- and slbsync instructions, the ptesync instruction
lations that were performed, by the thread execut- provides an ordering function for SLB invalidations
ing the ptesync instruction, before the ptesync and related storage accesses on other threads as
instruction is executed, to be performed with described in the slbsync instruction description on
respect to that thread before the ptesync instruc- page 1032.
tion’s memory barrier is created.
Programming Note
The memory barrier created by the ptesync For instructions following a ptesync instruction, the
instruction provides an ordering function for all memory barrier need not order implicit storage
stores to the Partition Table, Process Tables, Seg- accesses for purposes of address translation and
ment Tables, Page Directories, and Page Tables reference and change recording.
caused by Store instructions preceding the pte- The functions performed by the ptesync instruction
sync instruction with respect to invalidations, of may take a significant amount of time to complete,
cached copies of information derived from these so this form of the instruction should be used only if
tables, caused by slbieg, slbiag, and tlbie instuc- the functions listed above are needed. Otherwise
tions following the ptesync instruction. The mem- sync with L=0 should be used (or sync with L=1,
ory barrier ensures that all searches of these or eieio, if appropriate).
tables by another thread, that are performed after
an invalidation caused by such an slbieg, slbiag, Section 5.10, “Translation Table Update Synchroni-
or tlbie instruction has been performed with zation Requirements” on page 1043 gives exam-
respect to the other thread and that implicitly load ples of uses of ptesync.
from the target location of such a store, will obtain
the value stored (or a value stored subsequently).
Programming Note
5.9.3 Lookaside Buffer
The next bullet is sufficient to order the stores Management
with respect to the invalidations on the thread
All implementations have a Segment Lookaside Buffer
executing the ptesync instruction. That bullet
(SLB). Independent of whether the executing partition
is also sufficient to provide the ordering with
operates in a mode that uses hardware SLB loading
respect to invalidations caused by slbie, slbia,
and bolting versus pure software loading (controlled by
and tlbiel instuctions, which affect only the
the value of LPCRUPRT), software is responsible for
thread executing them.
keeping the SLB current with the segment mapping for
the process that is executing. Proper management of
The ptesync instruction provides an ordering func-
the SLB across context switches is described in pro-
tion for all stores to the Partition Table, Process
gramming notes.
Tables, Segment Tables, Page Directories, and
Page Tables caused by Store instructions preced-
ing the ptesync instruction with respect to
searches of these tables that are performed, by the For performance reasons, most implementations also
thread executing the ptesync instruction, after the cache other information that is used in address transla-
tion. These caches may include: a Translation Loo-
kaside Buffer (TLB) which is a cache of recently used All implementations that have such lookaside informa-
Page Table Entries (PTEs); a cache of recently used tion provide a means by which software can invalidate
translations of effective addresses to real addresses; a all such lookaside information.
Page Walk Cache for Radix Tree translation; caching of
For simplicity, elsewhere in the Books it is assumed
the In-Memory Tables; or any combination of these.
that the TLB exists.
Lookaside information, including the SLB, is managed
using the instructions described in the subsections of
this section unless additional requirements are pro- Programming Note
vided in implementation-specific documentation. Because the instructions used to manage TLBs,
SLBs, Page Walk Caches, caches of Partition and
To simplify lookaside buffer management, hardware will Process Table Entries, and implementation-specific
only perform speculative translation for the context that lookaside information may be changed in a future
is executing, in particular using the current effective val- version of the architecture, it is recommended that
ues of LPID and PID. Except when LPIDR=0, no trans- software “encapsulate” their use into subroutines.
lations will be created and cached speculatively when
HR=0 and MSRHV=1. Furthermore, no translations will
be created and cached speculatively in hypervisor real Programming Note
addressing mode. The limitation of speculative behav- The function of all the instructions described in
ior in these situations is to cache a PATE when LPIDR Sections 5.9.3.2 - 5.9.3.3 is independent of
is loaded and a PRTE when PIDR is loaded. whether address translation is enabled or disabled.
For a discussion of software synchronization
Lookaside information derived from PTEs is not neces- requirements when invalidating SLB and TLB
sarily kept consistent with the Page Table. When soft- entries, see Chapter 11.
ware alters the contents of a PTE, in general it must
also invalidate all corresponding TLB entries and imple-
mentation-specific lookaside information; exceptions to 5.9.3.1 Thread-Specific Segment
this rule are described in Section 5.10.1.2. Translations
The effects of the slbie, slbieg, slbia, slbiag, and TLB It is necessary to provide thread-specific temporary
Management instructions on address translations, as ESID to VSID translations. These translations cannot
specified in Sections 5.9.3.2 for the SLB and be placed in valid entries in the Segment Table
5.9.3.3 for the TLB, Page Walk Cache, and In-Memory because the Segment Table has a process scope
Table caches, apply to all implementation-specific loo- rather than a thread scope. Instead, software will use
kaside information that is used in address translation. slbmte to install such translations in the SLB. All SLB
Unless otherwise stated or obvious from context, refer- entries created using slbmte are considered to be
ences to SLB entry invalidation and TLB entry invalida- “software created.” Software created entries will only
tion elsewhere in the Books apply also to invalidation of translate accesses from the hardware thread by which
Page Walk Cache content, In-Memory Table cache they are installed. When LPCRUPRT=1, they are also
content, and all implementation-specific lookaside considered to be “bolted.” Each thread has the ability
information that is derived from SLB entries and PTEs, to bolt four entries.
respectively.
All implementations provide a means by which software 5.9.3.2 SLB Management Instructions
can invalidate all implementation-specific lookaside Software establishes translations in the SLB using slb-
information that is derived from PTEs. mte. Care must be taken to avoid creating multiple
Implementation-specific lookaside information that con- effective-to-virtual translations for any given effective
tains translations of effective addresses to real address. Software-created entries will remain in the
addresses may include “translations” that apply in real SLB until invalidated using slbie or slbia (which also
addressing mode. Because such “translations” are invalidate related implementation-specific lookaside
affected by the contents of the LPCR and HRMOR, information) or overwritten using slbmte. After updat-
when software alters the (relevant) contents of these ing a Segment Table Entry, software must use an slbie
registers it must also invalidate the corresponding or slbieg instruction to remove lookaside information
implementation-specific lookaside information. Soft- associated with the old contents of the entry. slbie
ware can invalidate all such lookaside information by may be used to invalidate software-created entries, but
using the slbia instruction with IH=0b000. However, will not invalidate outboard translation caches. slbieg
performance is likely to be better if other, appropriate, does not invalidate software-created entries, but is the
IH values are used to limit the amount of lookaside only way to invalidate outboard translation caches.
information that is invalidated. When taking a PID out of service with the intent of reus-
ing it, software should use slbiag to remove stale
translations from SLBs and ERATs in the “nest.” (Nest
refers to the platform external to the processor cores. SLB Invalidate Entry X-form
Here the reference is to translations cached for use by
accelerators.) slbsync will establish order between slbie RB
slbieg and slbiag instructions and a subsequent pte-
sync. ptesync must also be used to synchronize the 31 /// /// RB 434 /
Segment Table update prior to performing the loo- 0 6 11 16 21 31
kaside management. When performing a context
switch, software must use an slbia instruction to ea0:35 (RB)0:35
remove lookaside information associated with the old if, for SLB entry that translates
context. slbmfee and slbmfev may be used by the or most recently translated ea,
hypervisor to save software-created entries. slbmte is
used to restore software-created entries. slbfee has no entry_seg_size = size specified in (RB)37:38
function when LPCRUPRT=1 for the partition that is run- then for SLB entry (if any) that translates ea
ning. SLBEV 0
all other fields of SLBE undefined
else
Programming Note
s log_base_2(entry_seg_size)
Accesses to a given SLB entry caused by the esid (RB)0:63-s
instructions described in this section obey the u undefined 1-bit value
sequential execution model with respect to the con- if u then
tents of the entry and with respect to data depen- if an SLB entry translates esid
dencies on those contents. That is, if an instruction SLBEV 0
sequence contains two or more of these instruc- all other fields of SLBE undefined
tions, when the sequence has completed, the final Let the Effective Address (EA) be any EA for which
contents of the SLB entry and of General Purpose EA0:35 = (RB)0:35. Let the segment size be equal to the
Registers is as if the instructions had been exe- segment size specified in (RB)37:38; the allowed values
cuted in program order. of (RB)37:38, and the correspondence between the val-
ues and the segment size, are the same as for the B
However, software synchronization is required in
field in the SLBE (see Figure 27 on page 994).
order to ensure that any alterations of the entry
take effect correctly with respect to address trans- The segment size must be the same as the segment
lation; see Chapter 11. size in the SLB entry that translates the EA, or the val-
ues that were in the SLB entry that most recently trans-
lated the EA if the translation is no longer in the SLB; if
Programming Note
these values are not the same, it is implementa-
Changes to the segment mappings in the presence tion-dependent whether the SLB entry (or implementa-
of active transactions may compromise transac- tion-dependent translation information) that translates
tional semantics if the transaction has accessed a the EA is invalidated, and the next paragraph need not
segment that is assigned a new VSID. Conse- apply.
quently, when modifying segment mappings, it is
the responsibility of the OS or hypervisor to ensure If the SLB contains only a single entry that translates
that any transaction that may have touched the the EA, then that is the only SLB entry that is invali-
modified segment is terminated, using a tabort. or dated, except that it is implementation-dependent
treclaim. instruction. whether an implementation-specific lookaside entry for
a real mode address “translation” is invalidated. If the
SLB contains more than one such entry, then zero or
more such entries are invalidated, and similarly for any
implementation-specific lookaside information used in
address translation; additionally, a machine check may
occur.
SLB entries are invalidated by setting the V bit in the
entry to 0, and the remaining fields of the entry are set
to undefined values.
This instruction terminates any Segment Table walks
being performed on behalf of the thread that executes
it.
The hardware ignores the contents of RB listed below
and software must set them to 0s.
- (RB)37
- (RB)39
- (RB)40:63
- If s = 40, (RB)24:35 SLB Invalidate Entry Global X-form
If this instruction is executed in 32-bit mode, (RB)0:31
slbieg RS,RB
must be zeros.
This instruction is privileged. 31 RS /// RB 466 /
0 6 11 16 21 31
Special Registers Altered:
None
target_PID = RS0:31
Programming Note if MSRHV=1 then target_LPID = RS32:63
else target_LPID = LPIDR
slbie does not affect SLBs on other threads. ea0:35 (RB)0:35
for each thread with LPIDR=target_LPID and
PIDR=target_PID
if, for each SLB entry that
Programming Note translates or most recently translated ea
entry_seg_size = size specified in (RB)37:38
The B value in register RB may be needed for then for SLB entry (if any)
invalidating ERAT entries corresponding to the that translates ea and is not software-created
translation being invalidated. SLBEV 0
all other fields of SLBE undefined
else
Programming Note s log_base_2(entry_seg_size)
When switching to execute an adjunct, a hypervisor esid (RB)0:63-s
will disable translation and use slbie to be sure u undefined 1-bit value
there is no SLB entry mapping the effective if u then
if an SLB entry translates esid and the entry
address space that will be used by the incoming is not software-created
adjunct. It will then bolt an entry for the incoming SLBEV 0
adjunct and transfer control to that adjunct. While all other fields of SLBE undefined
the thread is in hypervisor real addressing mode
and during adjunct execution, no speculative Seg-
ment Table walks will be performed. The operation performed by this instruction is based on
the contents of registers RS and RB. The contents of
these registers are shown below.
RS
PID LPID
0 32 63
RB
ESID C B 0s
0 36 37 39 63
RS0:31 PID
RS32:63 LPID
RB0:35 ESID
RB36 must be 0b0
RB37:38 B
RB39:63 must be 0b0 || 0x000000
the values and the segment size, are the same as for
Programming Note
the B field in the SLBE (see Figure 27 on page 994).
The B value in register RB may be needed for
Only SLBs for threads running on behalf of target_LPID invalidating ERAT entries corresponding to the
and target_PID are searched. Software-created entries translation being invalidated.
are ignored. The segment size must be the same as
the segment size in the SLB entry that translates the
EA, or the values that were in the SLB entry that most Programming Note
recently translated the EA if the translation is no longer Use of slbieg to invalidate software-created seg-
in the SLB; if these values are not the same, it is imple- ment descriptors is a programming error. The
mentation-dependent whether the SLB entry (or imple- architecture requires that bolted entries be ignored
mentation-dependent translation information) that by the instruction.
translates the EA is invalidated, and the next paragraph
need not apply.
If the SLB contains only a single entry that translates
the EA, then that is the only SLB entry that is invali- SLB Invalidate All X-form
dated, except that it is implementation-dependent
whether an implementation-specific lookaside entry for slbia IH
a real mode address “translation” is invalidated. If the
SLB contains more than one such entry, then zero or 31 // IH /// /// 498 /
0 6 8 11 16 21 31
more such entries are invalidated, and similarly for any
implementation-specific lookaside information used in
address translation; additionally, a machine check may switch (IH)
occur. case (0b000, 0b001, 0b010, 0b110):
for each SLB entry except SLB entry 0
SLB entries are invalidated by setting the V bit in the SLBEV 0
entry to 0, and the remaining fields of the entry are set all other fields of SLBE undefined
to undefined values. case (0b011):
for each SLB entry such that SLBEClass = 1
The hardware ignores the contents of RB listed below SLBEV 0
and software must set them to 0s. all other fields of SLBE undefined
- (RB)36 case (0b100):
for each SLB entry
- (RB)37 SLBEV 0
- (RB)39 all other fields of SLBE undefined
- (RB)40:63 case (0b111):
- If s = 40, (RB)24:35
slbia invalidates the contents of the SLB, and of imple-
If this instruction is executed in 32-bit mode, (RB)0:31 mentation-specific lookaside information for effective to
must be zeros. real address translations, based on the contents of the
IH field as described below. SLB entries are invali-
The operation performed by this instruction is ordered
dated by setting the V bit in the entry to 0. When an
by the eieio (or sync or ptesync) instruction with
SLB entry is invalidated, the remaining fields of the
respect to a subsequent slbsync instruction executed
entry are set to undefined values.
by the thread executing the slbieg instruction. The
operations caused by slbieg and slbsync are ordered In the description of the IH values, “implementa-
by eieio as a fifth set of operations, which is indepen- tion-specific lookaside information” is shorthand for
dent of the other four sets that eieio orders. “implementation-specific lookaside information for
effective to real address translations,” and “when
This instruction is privileged except when LPCRG-
address translation was enabled” is shorthand for
TSE=0, making it hypervisor privileged.
“when MSRIR was equal to 1 or MSRDR was equal to 1,
Special Registers Altered: as appropriate for the type of access,” and correspond-
None ingly for “when address translation was disabled.” The
descriptions specify which entries must be invalidated;
Programming Note additional entries may be invalidated except where the
slbieg does affect SLBs on other threads. description states that certain SLB entries are not inval-
idated.
0b000 All SLB entries except entry 0 are invalidated;
SLB entry 0 is not invalidated.
All implementation-specific lookaside informa-
tion is invalidated.
B VSID KsKpNLC 0 LP 0s
0 2 52 57 58 60 63
RB
ESID V 0s index
0 36 37 52 63
RS0:1 B
RS2:51 VSID
RS52 Ks
RS53 Kp
RS54 N
RS55 L
RS56 C
RS57 must be 0b0
RS58:59 LP
RS60:63 must be 0b0000
RB0:35 ESID
RB36 V
RB37:51 must be 0b000 || 0x000
RB52:63 index, which selects the SLB entry
Figure 48. GPR contents for slbmte
On implementations that support a virtual address size
of only n bits, n<78, (RS)2:79-n must be zeros.
When LPCRUPRT=1, the value of index must not
exceed 3. (RB)52:61 are ignored.
High-order bits of (RB)52:63 that correspond to SLB
entries beyond the size of the SLB provided by the
implementation must be zeros.
The hardware ignores the contents of RS and RB listed
below and software must set them to 0s.
- (RS)57
- (RS)60:63
- (RB)37:51
If this instruction is executed in 32-bit mode, (RB)0:31
must be zeros (i.e., the ESID must be in the range 0:15).
This instruction must not be used to load a segment SLB Move From Entry VSID X-form
descriptor that is in the Segment Table when
LPCRUPRT=1, and cannot be used to invalidate the slbmfev RT,RB
translation contained in an SLB entry.
31 RT /// L RB 851 /
This instruction is privileged.
0 6 11 15 16 21 31
Special Registers Altered:
None This instruction is used to read software-loaded SLB
entries. When LPCRUPRT=0, the entry is specified by
Programming Note bits 52:63 of register RB. When LPCRUPRT=1, only the
The reason slbmte must not be used to load seg- first four entries can be read, so bits 52:61 of register
ment descriptors that are in the Segment Table is RB are ignored. If the specified entry is valid (V=1), the
that there could be a race condition with hardware contents of the B, VSID, Ks, Kp, N, L, C, and LP fields
loading the same segment descriptor, resulting in of the entry are placed into register RT. The contents of
duplicate SLB entries. Software must not allow these registers are interpreted as shown in Figure 49.
duplicate SLB entries to be created; see
RT
Section 5.7.8.2, “SLB Search”.
The reason slbmte cannot be used to invalidate an B VSID KsKpNLC 0 LP 0s
SLB entry is that it does not necessarily affect 0 2 52 57 58 60 63
implementation-specific address translation loo-
kaside information. slbie (or slbia) must be used RB
for this purpose.
0s index
0 52 63
RT0:1 B
RT2:51 VSID
RT52 Ks
RT53 Kp
RT54 N
RT55 L
RT56 C
RT57 set to 0b0
RT58:59 LP
RT60:63 set to 0b0000
SLB Move From Entry ESID X-form SLB Find Entry ESID X-form
slbmfee RT,RB slbfee. RT,RB
This instruction is used to read software-loaded SLB The SLB is searched for an entry that matches the
entries. When LPCRUPRT=0, the entry is specified by effective address specified by register RB. When
bits 52:63 of register RB. When LPCRUPRT=1, only the LPCRUPRT=1, this instruction is nonfunctional. The
first four entries can be read, so bits 52:61 of register search is performed as if it were being performed for
RB are ignored. If the specified entry is valid (V=1), the purposes of address translation. That is, in order for a
contents of the ESID and V fields of the entry are given entry to satisfy the search, the entry must be
placed into register RT. If LPCRUPRT=1, the value of valid (V=1), and (RB)0:63-s must equal
the BO field of the entry is also placed into register RT. SLBE[ESID0:63-s] (where 2s is the segment size
The contents of these registers are interpreted as selected by the B field in the entry).If exactly one
shown in Figure 50. matching entry is found, the contents of the B, VSID,
Ks, Kp, N, L, C, and LP fields of the entry are placed
RT
into register RT. If no matching entry is found, register
RT is set to 0. If more than one matching entry is found,
ESID V BO 0s either one of the matching entries is used, as if it were
0 36 37 38 63 the only matching entry, or a Machine Check occurs. If
RB a Machine Check occurs, register RT, and CR Field 0
are set to undefined values, and the description below
of how this register and this field is set does not apply.
0s index
0 52 63 The contents of registers RT and RB are interpreted as
shown in Figure 51.
RT0:35 ESID
RT
RT36 V
RT37 BO, entry is bolted
RT38:63 set to 0b000 || 0x00_0000 B VSID KsKpNLC 0 LP 0s
RB0:51 must be 0x0_0000_0000_0000 0 2 52 57 58 60 63
RB52:63 index, which selects the SLB entry RB
Figure 50. GPR contents for slbmfee
ESID 0000 0s
If the SLB entry specified by bits 52:63 of register RB is 0 36 40 63
invalid (V=0), the contents of register RT are set to 0.
RT0:1 B
High-order bits of (RB)52:63 that correspond to SLB
RT2:51 VSID
entries beyond the size of the SLB provided by the
RT52 Ks
implementation must be zeros.
RT53 Kp
The hardware ignores the contents of RB0:51. RT54 N
RT55 L
This instruction is privileged. RT56 C
The use of the L field is implementation specific. RT57 set to 0b0
RT58:59 LP
Special Registers Altered: RT60:63 set to 0b0000
None RB0:35 ESID
RB36:39 must be 0b0000
RB40:63 must be 0x000000
Figure 51. GPR contents for slbfee.
If s > 28, RT80-s:51 are set to zeros. On implementa-
tions that support a virtual address size of only n bits, n
< 78, RT2:79-n are set to zeros.
CR Field 0 is set as follows. j is a 1-bit value that is
equal to 0b1 if a matching entry was found. Otherwise, j
is 0b0. When LPCRUPRT0, j=0b0.
Programming Note
slbsync should not be used to synchronize the
completion of slbie.
If RIC=0, this is a search for a single TLB entry. The All TLB entries on all threads that have all of the
following relationships must be true and tests and following properties are made invalid.
actions are performed to search for an HPT translation. The entry translates a virtual address for
which all the following are true.
If the base page size specified by the PTE that was
VA14:14+i is equal to (RB)0:i.
used to create the TLB entry to be invalidated is 4
L=0 or b20 or, if L=1 and b<20,
KB, the L field in register RB must contain 0.
VA58:77-b is equal to (RB)56:75-b.
If the L field in RB contains 0, the base page size is The segment size of the entry is the same as
4 KB and RB56:58 (AP - Actual Page size field) the segment size specified in (RB)54:55.
must be set to the SLBEL||LP encoding for the page Either of the following is true:
size corresponding to the actual page size speci- The L field in RB is 0, the base page
fied by the PTE that was used to create the TLB size of the entry is 4 KB, and the actual
entry to be invalidated. Thus, b is equal to 12 and p page size of the entry matches the
is equal to log2 (actual page size specified by actual page size specified in (RB)56:58.
(RB)56:58). The Abbreviated Virtual Address (AVA) The L field in RB is 1, the base page
field in register RB must contain bits 14:65 of the size of the entry matches the base
virtual address translated by the TLB entry to be page size specified in (RB)44:51, and
invalidated. Variable i is equal to 51. the actual page size of the entry
matches the actual page size specified
If the L field in RB contains 1, the following rules
in (RB)44:51.
apply.
TLBELPID = search_LPID.
The base page size and actual page size are
specified in the LP field in register RB, where Additional TLB entries may also be made invalid if
the relationship between (RB)44:51 (LP - Large those TLB entries contain an LPID that matches
Page size selector field) and the base page search_LPID.
size and actual page size is the same as the
The following relationships must be true and tests and
relationship between PTELP and the base
actions are performed to search for a Radix Tree trans-
page size and actual page size, except for the
lation. For a partition-scoped invalidation, references
“r” bits (see Section 5.7.9.1 on page 998 and
to the effective address are understood to refer to the
Figure 32 on page 999). Thus, b is equal to
guest real address.
log2 (base page size specified by (RB)44:51)
and p is equal to log2 (actual page size speci- The page size is encoded in RB56:58 (AP - Actual
fied by (RB)44:51). Specifically, (RB)44+c:51 Page size field). Thus p is equal to log2( page size
must be equal to the contents of bits c:7 of the specified by RB56:58). The Effective Page Number
LP field of the PTE that was used to create the (EPN) field in register RB must contain the bits 0:i
TLB entry to be invalidated, where c is the of the effective address translated by the TLB entry
maximum of 0 and (20-p). to be invalidated. Variable i is equal to 63-p.
Variable i is the larger of (63-p) and the value
The fields shown as zeros must be set to zero and
that is the smaller of 43 and (63-b). (RB)0:i
are ignored by the hardware.
must contain bits 14:(i+14) of the virtual
address translated by the TLB to be invali- All TLB entries on all threads that have all of the
dated. If b>20, RB64-b:43 may contain any following properties are made invalid.
value and are ignored by the hardware. The entry translates an effective address for
If b<20, (RB)56:75-b must contain bits 58:77-b which EA0:i is equal to (RB)0:i.
of the virtual address translated by the TLB to The page size of the entry matches the page
be invalidated, and other bits in (RB)56:62 may size specified in (RB)56:58.
contain any value and are ignored by the The entry has the appropriate scope (partition
hardware. or process).
If b20, (RB)56:62 (AVAL - Abbreviated Virtual The process ID specified in RS matches the
Address, Lower) may contain any value and process ID in the TLB entry if not invalidating
are ignored by the hardware. a partition-scoped translation.
TLBELPID matches the partiion ID of the parti-
Let the segment size be equal to the segment size
tion for which the translation is to be invali-
specified in (RB)54:55 (B field). The contents of
dated.
RB54:55 must be the same as the contents of the B
field of the PTE that was used to create the TLB Additional TLB entries may also be made invalid if
entry to be invalidated. those TLB entries contain an LPID that matches
the partition ID of the partition for which the trans-
RB52:53 and RB59:62 (when (RB)63 = 0) must con-
lation is to be invalidated.
tain zeros and are ignored by the hardware.
If RIC=3, then an implementation-specific encoding of if the IS field in RB contains 0b11, MSRHV=1, and
AP indicates the number and (base) size of a series of PRS=1, for all theads, all Process Table caching is
sequential virtual pages for which the translations will invalidated.
be invalidated. The pages occupy an aligned region of If the IS field in RB contains 0b11, MSRHV=0, and
virtual storage. The address in RB is masked to get the PRS=1, for all threads, caching of Process Tables
base address of the region. tlbies are then performed for the partition for which the translation is to be
for the virtual pages within the region. invalidated is invalidated.
IS field in RB is non-zero When i>40, RB40:i-1 may contain any value and are
ignored by the hardware.
If RIC=0 or RIC=2, all partition-scoped TLB entries
when PRS=0 and either MSRHV=1 or R=0, or all pro- For all IS values
cess-scoped TLB entries when PRS=1 on all threads
For all threads, any implementation specific lookaside
for which any of the following conditions are met for the
information that is based on any TLB entry that would
entry are made invalid.
be invalidated by this instruction will also be invali-
The IS field in RB contains 0b10 or MSRHV=0 and
dated.
the IS field contains 0b11, and TLBELPID matches
the partition ID of the partition for which the trans- MSRSF must be 1 when this instruction is executed;
lation is to be invalidated. otherwise the results are undefined.
The IS field in RB contains 0b01, TLBELPID
matches the partition ID of the partition for which If the value specified in RS0:31, RS32:63, RB54:55 when
the translation is to be invalidated, and R=0, RB56:58 when RB63=0, or RB44:51 when RB63=1
TLBEPID=RS0:31. is not supported by the implementation, the instruction
The IS field in RB contains 0b11 and MSRHV=1. is treated as if the instruction form were invalid.
If RIC=1 or RIC=2, if the following conditions are met, The operation performed by this instruction is ordered
the respective partition-scoped contents when PRS=0 by the eieio (or sync or ptesync) instruction with
and MSRHV=1 or process-scoped contents when respect to a subsequent tlbsync instruction executed
PRS=1 of the page walk cache are invalidated. by the thread executing the tlbie instruction. The oper-
If the IS field in RB contains 0b10 or if IS contains ations caused by tlbie and tlbsync are ordered by
0b11 and MSRHV=0, for all threads, all prop- eieio as a fourth set of operations, which is indepen-
erly-scoped page walk caching associated with the dent of the other four sets that eieio orders.
partition for which the translation is to be invali- This instruction is privileged except when LPCRGTSE=0
dated is invalidated. or when PRS=0 and HR=1, making it hypervisor privi-
If the IS field in RB contains 0b11 and MSRHV=1, leged.
the entire properly-scoped page walk caching for
each thread is invalidated. See Section 5.10, “Translation Table Update Synchro-
If the IS field in RB contains 0b01 (and PRS=1), for nization Requirements” for a description of other
all threads, all properly-scoped page walk caching requirements associated with the use of this instruction.
associated with process RS0:31 in the partition for Special Registers Altered:
which the translation is to be invalidated is invali-
dated. None
If RIC=2, if the following conditions are met, the respec- Extended Mnemonics:
tive partition and Process Table caching are invalidated Extended mnemonic for tlbie::
for all threads.
If the IS field in RB contains 0b01 and PRS=1, for Extended: Equivalent to:
all threads, caching of Process Table Entries for tlbie RB,RS tlbie RB,RS,0,0,0
process RS0:31 in the partition for which the trans-
lation is to be invalidated is invalidated.
If the IS field in RB contains 0b10, MSRHV=1, and Special Registers Altered:
PRS=0, for all threads, caching of Partition Tables None
for the partition for which the translation is to be
invalidated is invalidated.
If the IS field in RB contains 0b10 and PRS=1, for
all threads, caching of Process Tables for the parti-
tion for which the translation is to be invalidated is
invalidated.
if the IS field in RB contains 0b11, MSRHV=1, and
PRS=0, for all threads, all Partition Table caching
is invalidated.
actual_pg_size =
Programming Note page size specified in (RB)56:58
For tlbie[l] instructions in which (RB)63=0, the AP i = 51
value in RB is provided to make it easier for the else
hardware to locate address translations, in loo- base_pg_size = base page size specified
kaside buffers, corresponding to the address trans- in (RB)44:51
lation being invalidated. actual_pg_size =
actual page size specified in (RB)44:51
For tlbie[l] instructions the AP specification is not b log_base_2(base_pg_size)
binary compatible with versions of the architecture p log_base_2(actual_pg_size)
that precede Version 2.06. As an example, for an i = max(min(43,63-b),63-p)
actual page size of 64 KB AP=0b101, whereas sg_sizesegment size specified in (RB)54:55
software written for an implementation that com- for each TLB entry
plies with a version of the architecture that pre- if (entry_VA14:i+14 = (RB)0:i) &
cedes V. 2.06 would have AP=100 since AP was a (entry_sg_size = segment_size) &
1 bit value followed by 0s in RB57:58. If binary com- (entry_base_pg_size = base_pg_size) &
patibility is important, for a 64 KB page software (entry_actual_pg_size =actual_pg_size)&
can use AP=0b101 on these earlier implementa- (TLBELPID=search_LPID) &
tions since these implementations were required to (entry_process_scoped=0)
ignore RB57:58. then
if ((L = 0)|(b 20)) then
TLB entry invalid
Programming Note else
For tlbie[l] instructions the AVA and AVAL fields in if (entry_VA58:77-b = (RB)56:75-b) then
RB contain different VA bits from those in PTEAVA. TLB entry invalid
else
pg_size = page size specified in (RB)56:58
Programming Note p log_base_2(pg_size)
i = 63-p
An operating system that uses HPT translation
for each TLB entry
should only use tlbie to invalidate the translation if (entry_EA0:i = (RB)0:i) &
for a specific page when it knows whether VPM is (entry_pg_size = pg_size) &
active, and more specifically, what page size is (entry_LPID = search_LPID) &
actually in use for the target translation. The (entry_process_scoped = PRS) &
address comparison performed by tlbie is not sen- ((PRS = 0) |
sitive to whether VPM is active. As a result, the (entry_PID = (RS)0:31))
operating system must supply an AVA value that is then
appropriate for the page size that is in use. TLB entry invalid
case (0b01):
if RIC=0 | RIC=2 then
i implementation-dependent number, 40i51
for each TLB entry in set (RB)i:51
if (entry_LPID=search_LPID)
&(entry_PID=RS0:31)
&(entry_PRS=1)
then TLB entry invalid
if RIC=1 | RIC=2 then
invalidate process-scoped radix page walk
caching associated with process RS0:31 in
TLB Invalidate Entry Local X-form partition search_LPID
if (RIC=2)&(PRS=1) then
tlbiel RB,RS,RIC,PRS,R invalidate Process Table caching associated
with process RS0:31 in partition search_LPID
31 RS / RIC PRS R RB 274 / case (0b10):
0 6 11 12 14 15 16 21 31
if RIC=0 | RIC=2 then
iimplementation-dependent number, 40i51
if (PRS=0)&((MSRHV=1)|(R=0)) then
IS (RB)52:53 for each partition-scoped TLB entry in set
search_LPID=LPIDRLPID (RB)i:51
switch(IS) if entry_LPID=search_LPID
case (0b00): then TLB entry invalid
If RIC=0 if PRS=1 then
If R=0 for each process-scoped TLB entry in
L (RB)63 set (RB)i:51
if L = 0 then if entry_LPID=search_LPID
base_pg_size = 4K then TLB entry invalid
R=0, IS=1, and RIC2 (HPT translations are not contain any value and are ignored by the
associated with processes.) hardware.
R=0, RIC=2, PRS=0, HV=0, and IS=2 or 3 (The If b20, (RB)56:62 (AVAL - Abbreviated Virtual
similar cases with RIC=0 allow the HPT OS to Address, Lower) may contain any value and
invalidate all of its TLB entries. The only incremen- are ignored by the hardware.
tal function of these cases is to invalidate partition
Let the segment size be equal to the segment size
table caching, which the OS is not permitted to do.)
specified in (RB)54:55 (B field). The contents of
The results of an attempt to invalidate a translation out- RB54:55 must be the same as the contents of the B
side of quadrant 0 for Radix Tree translation (R=1, field of the PTE that was used to create the TLB
RIC=0, PRS=1, IS=0, and EA0:10b00) are boundedly entry to be invalidated.
undefined. All TLB entries that have all of the following proper-
ties are made invalid on the thread executing the
IS field in RB contains 0b00 tlbiel instruction.
If RIC=0, this is a search for a single TLB entry. The The entry translates a virtual address for
following relationships must be true and tests and which all the following are true.
actions are performed to search for an HPT translation. VA14:14+i is equal to (RB)0:i.
L=0 or b20 or, if L=1 and b<20,
If the base page size specified by the PTE that was VA58:77-b is equal to (RB)56:75-b.
used to create the TLB entry to be invalidated is 4 The segment size of the entry is the same as
KB, the L field in register RB must contain 0. the segment size specified in (RB)54:55.
Either of the following is true:
If the L field in RB contains 0, the base page size is The L field in RB is 0, the base page
4 KB and RB56:58 (AP - Actual Page size field) size of the entry is 4 KB, and the actual
must be set to the SLBEL||LP encoding for the page page size of the entry matches the
size corresponding to the actual page size speci- actual page size specified in (RB)56:58.
fied by the PTE that was used to create the TLB The L field in RB is 1, the base page
entry to be invalidated. Thus, b is equal to 12 and p size of the entry matches the base
is equal to log2 (actual page size specified by page size specified in (RB)44:51, and
(RB)56:58). The Abbreviated Virtual Address (AVA) the actual page size of the entry
matches the actual page size specified
field in register RB must contain bits 14:65 of the
in (RB)44:51.
virtual address translated by the TLB entry to be TLBELPID = LPIDRLPID.
invalidated. Variable i is equal to 51.
The following relationships must be true and tests and
If the L field in RB contains 1, the following rules
actions are performed to search for a Radix Tree trans-
apply.
lation. For a partition-scoped invalidation, references
The base page size and actual page size are
to the effective address are understood to refer to the
specified in the LP field in register RB, where
guest real address.
the relationship between (RB)44:51 (LP - Large
Page size selector field) and the base page The page size is encoded in RB56:58 (AP - Actual
size and actual page size is the same as the Page size field). Thus p is equal to log2( page size
relationship between PTELP and the base specified by RB56:58). The Effective Page Number
page size and actual page size, except for the (EPN) field in register RB must contain the bits 0:i
“r” bits (see Section 5.7.9.1 on page 998 and of the effective address translated by the TLB entry
Figure 32 on page 999). Thus, b is equal to to be invalidated. Variable i is equal to 63-p.
log2 (base page size specified by (RB)44:51)
The fields shown as zeros must be set to zero and
and p is equal to log2 (actual page size speci-
are ignored by the hardware.
fied by (RB)44:51). Specifically, (RB)44+c:51
must be equal to the contents of bits c:7 of the All TLB entries that have all of the following proper-
LP field of the PTE that was used to create the ties are made invalid on the thread executing the
TLB entry to be invalidated, where c is the tlbiel instruction..
maximum of 0 and (20-p). The entry translates an effective address for
Variable i is the larger of (63-p) and the value which EA0:i is equal to (RB)0:i.
that is the smaller of 43 and (63-b). (RB)0:i The page size of the entry matches the page
must contain bits 14:(i+14) of the virtual size specified in (RB)56:58.
address translated by the TLB to be invali- The entry has the appropriate scope (partition
dated. If b>20, RB64-b:43 may contain any or process).
value and are ignored by the hardware. The process ID specified in RS matches the
If b<20, (RB)56:75-b must contain bits 58:77-b process ID in the TLB entry if not invalidating
of the virtual address translated by the TLB to a partition-scoped translation.
be invalidated, and other bits in (RB)56:62 may
TLBELPID matches the partiion ID of the parti- When i>40, RB40:i-1 may contain any value and are
tion for which the translation is to be invali- ignored by the hardware.
dated.
For all IS values
IS field in RB is non-zero
Any implementation specific lookaside information that
If RIC=0 or RIC=2, (RB)i:51 (bits i-40:11 of the SET field is based on any TLB entry that would be invalidated by
in (RB)) specify a set of TLB entries, where i is an this instruction will also be invalidated.
implementation-dependent value in the range 40:51.
Each partition-scoped entry when PRS=0 and either Depending on the variant of the instruction, RB0:39,
MSRHV=1 or R=0, or each process-scoped entry when RB59:62, RB59:63, RB54:55, and RB54:63 are the equiva-
PRS=1 in the set is invalidated if any of the following lent of reserved fields, should contain 0s, and are
conditions are met for the entry. ignored by the hardware. RS32:63 is always the equiva-
The IS field in RB contains 0b10, or MSRHV=0 and lent of a reserved field, should contain 0s, and is
the IS field contains 0b11, and TLBELPID = ignored by the hardware.
LPIDRLPID. Only TLB entries, page walk caching, and process and
The IS field in RB contains 0b01, Segment Table caching on the thread executing the
TLBELPID=LPIDRLPID, and TLBEPID=RS0:31. tlbiel instruction are affected.
The IS field in RB contains 0b11 and MSRHV=1.
MSRSF must be 1 when this instruction is executed;
How the TLB is divided into the 252-i sets is implemen- otherwise the results are boundedly undefined.
tation-dependent. The relationship of virtual addresses
to these sets is also implementation-dependent. How- If the value specified in RS0:31, RB54:55, RB56:58, or
ever, if, in an implementation, there can be multiple RB44:51, when it is needed to perform the specified
TLB entries for the same virtual address and same par- operation, is not supported by the implementation, the
tition, then all these entries must be in a single set. instruction is treated as if the instruction form were
invalid.
If RIC=1 or RIC=2, if the following conditions are met,
the respective partition-scoped contents when PRS=0 This instruction is privileged except when PRS=0 and
and MSRHV=1 or process-scoped contents when HR=1, making it hypervisor privileged.
PRS=1 of the page walk cache are invalidated. See Section 5.10, “Translation Table Update Synchro-
If the IS field in RB contains 0b10 or if IS contains nization Requirements” on page 1043 for a description
0b11 and MSRHV=0, all properly-scoped page of other requirements associated with the use of this
walk caching associated with partition LPDIRLPID instruction.
is invalidated.
If the IS field in RB contains 0b11 and MSRHV=1, Special Registers Altered:
the entire properly-scoped page walk caching is None
invalidated. Extended Mnemonics:
If the IS field in RB contains 0b01 (and PRS=1), all
properly-scoped page walk caching associated Extended mnemonic for tlbiel::
with process RS0:31 in partition LPIDRLPID is inval-
idated. Extended: Equivalent to:
tlbiel RB tlbiel RB,r0,0,0,0
If RIC=2, if the following conditions are met, the respec-
tive partition and Process Table caching are invali-
dated.
If the IS field in RB contains 0b01 and PRS=1, Programming Note
caching of Process Table Entries for process tlbie and tlbiel serve as both basic and extended
RS0:31 in partition LPIDRLPID is invalidated. mnemonics. The Assembler will recognize a tlbie
If the IS field in RB contains 0b10, MSRHV=1, and or tlbiel mnemonic with five operands as the basic
PRS=0, caching of Partition Tables for partition form, and a tlbie with two operands or a tlbiel
LPIDRLPID is invalidated. mnemonic with one operand as the extended form.
If the IS field in RB contains 0b10 and PRS=1, In the extended form the RIC, PRS, and R oper-
caching of Process Tables for partition LPIDRLPID ands, and for tlbiel the RS operand, are omitted
is invalidated. and assumed to be 0.
if the IS field in RB contains 0b11, MSRHV=1, and
PRS=0, all Partition Table caching is invalidated.
if the IS field in RB contains 0b11, MSRHV=1, and
PRS=1, all Process Table caching is invalidated.
If the IS field in RB contains 0b11, MSRHV=0, and
PRS=1, caching of Process Tables for partition
LIDRLPID is invalidated.
Programming Note
The primary use of this instruction by hypervisor TLB Synchronize X-form
software is to invalidate TLB entries prior to reas-
signing a thread to a new logical partition. tlbsync
Programming Note
tlbsync should not be used to synchronize the
completion of tlbiel.
Tree translation but PTEG for HPT translation), multiple are being modified. (For Partition Table Entries, and for
entries will also be prevented. (Secondary hash groups Process Table Entries when HR=1, the process or par-
will generally not be covered by the same reservation tition affected must be inactive because the entries do
granule as primary hash groups.) not have valid bits.) With the exceptions previously
identified for Segment Table walks (see Section 5.9.3,
Updates (by software) to the tables are performed only
“Lookaside Buffer Management”), any thread, including
when they are known to be required by the sequential
a thread on which software is modifying any of the set
execution model (see Section 5.5). Because address
of tables described in the first sentence, may look in
translation for instructions preceding a given Store
those tables at any time in an attempt to translate an
instruction might cause an interrupt, and thereby pre-
address. When modifying an entry in any of the former
vent the corresponding store from being required by
set of tables, software must ensure that the table
the sequential execution model, address translations
entry’s V bit is 0 if the table entry does not correctly
for instructions preceding the Store instruction must be
specify its portion of the translation (e.g., if the RPN
performed before the corresponding store is per-
field is not correct for the current AVA field).
formed. As a result, an update to a translation table
need not be preceded by a context synchronizing For HPT translation, updates of Reference and
instruction. Change bits by the hardware are not synchronized
with the accesses that cause the updates. When
All of the sequences require a context synchronizing
modifying doubleword 1 of a PTE, software must take
operation after the sequence if the new contents of the
care to avoid overwriting a hardware update of these
translation table are to be used for address translations
bits and to avoid having the value written by a Store
associated with subsequent instructions.
instruction overwritten by a hardware update.
As noted in the description of the Synchronize instruc-
The most basic sequence that will achieve proper sys-
tion in Section 4.6.3 of Book II, address translation
tem synchronization for PTE updates is the following.
associated with instructions which occur in program
tlbie instruction(s) specifying the same LPID oper-
order subsequent to the Synchronize (and this includes
and value
the ptesync variant) may be performed prior to the
eieio
completion of the Synchronize. To ensure that these
tlbsync
instructions and data which may have been specula-
ptesync
tively fetched are discarded, a context synchronizing
operation is required. Other instructions may be interleaved among these
instructions. Operating system and hypervisor soft-
Programming Note ware that updates Page Table Entries should use this
In many cases this context synchronization will sequence.
occur naturally; for example, if the sequence is exe- Operating systems and nested hypervisors are
cuted within an interrupt handler the rfid, rfscv, or exposed to being interrupted during this sequence.
hrfid instruction that returns from the interrupt han- The interrupting hypervisor is responsible for complet-
dler may provide the required context synchroniza- ing the sequence above. In general this will require the
tion. hypervisor to include the following sequence in an
interrupt handler.
Translation table entries must not be changed in a eieio
manner that causes an implicit branch. tlbsync
ptesync
5.10.1 Translation Table Updates This sequence itself may be interrupted by a higher
level hypervisor. When returning to the interrupted soft-
TLBs are non-coherent caches of the HTABs and Radix ware, the original sequence will be completed. Hard-
Trees. TLB entries must be invalidated explicitly with ware must tolerate the result of nested interleaving of
one of the TLB Invalidate instructions. SLBs are these sequences. tlbie and tlbsync instructions
non-coherent caches of the Segment Tables, SLB should only be used as part of these sequences.
entries must be invalidated explicitly with one of the
SLB Invalidate instructions. Page Walk Caches are The corresponding sequence for Segment Table
non-coherent caches of the intermediate steps in Radix updates uses slbieg in place of tlbie and slbsync in
Tree translation. Non-coherent caching of the Partition place of tlbsync. Similarly slbieg and slbsync should
and Process Tables is permitted. Provision has been only be used as part of these sequences. In circum-
made for the use of the TLB Invalidate instructions to stances where a hypervisor may be interrupting either a
manage the types of caching described in the preced- PTE update or a Segment Table update, it must include
ing two sentences at a PID or LPID granularity. both tlbsync and slbsync in its completing sequence,
in either order. Hardware must tolerate the result of
Unsynchronized lookups in the Page, Segment, and nested interleaving of these additional sequences.
when HR=0, Process Tables continue even while they
PTE, maintain a consistent state, ensure that the trans- dated, the following sequence can be used to modify
lation instantiated by the old entry is no longer avail- the STE, maintain a consistent state, ensure that the
able, and ensure that a subsequent reference to the translation instantiated by the old entry is no longer
virtual address translated by the new entry will use the available, and ensure that a subsequent reference to
correct real address and associated attributes. the effective address translated by the new entry will
use the correct virtual address and associated attri-
The following sequence is to interact correctly with
butes. (The sequence is much like the general case for
atomic hardware updates. It returns stable Reference
a change to a PTE that is subject to non-atomic hard-
and Change bit values for the old translation and is
ware updates, and is equivalent to deleting the STE
safe for multitheaded software. If the purpose of the
and then adding a new one.) Mutual exclusion with
sequence is mainly to collect Reference and Change
respect to other software threads may be required. A
bit values, the part of the sequence beginning with tlbie
similar sequence (except using tlbie with RIC=2 and
may be deferred and performed as a bulk invalidation
tlbsync) may be used to modify HR=0 Process Table
(e.g. for a range of storage or an entire process) after
Entries.
collecting values for a plurality of pages. A similar
seqence (i.e. using Load And Reserve and Store Con- STEV 0 /* (other fields don’t matter)*/
ditional instructions) can be used to update a Segment ptesync /* order update before slbieg and
Table Entry but will not interact correctly with before next Segment Table search */
non-atomic hardware Reference and Change bit slbieg(old_B,old_ESID,old_TA,old_PID,old_LPID)
updates. /*invalidate old translation*/
r6PTEV L SW RPN R C Att EAA eieio /* order slbieg before slbsync */
r4addr(pte) slbsync /* order slbieg before ptesync */
loop: ptesync /* order slbieg, slbsync and 1st
lqarx r2,0,r4 update before 2nd update */
if V=0 abort, else /* to interact with locking */ /* deletion sequence ends here */
stqcx r6,0,r4 STEVSID, Ks, Kp, N, L, C, LP, SW new values
bne- loop eieio /* order 2nd update before 3rd */
ptesync /* order update before tlbie and STEESID,V new values (V=1)
before next Page Table search */ ptesync /* order 2nd and 3rd updates before
tlbie(old_EA0:63-b,old_AP,old_PID, next Segment Table search and
old_LPID) before next data access */
eieio /* order tlbie before tlbsync */
tlbsync /* order tlbie before ptesync */ Resetting the Reference Bit (PTE)
ptesync /*complete the sequence, stores ordered
/*by first ptesync If the only change being made to a valid entry is to set
the Reference bit to 0, a simpler sequence suffices
The corresponding sequence for non-atomic hardware
because the Reference bit need not be maintained
updates is the following. (The sequence is equivalent
exactly. The byte store is exposed to overwriting
to deleting the PTE and then adding a new one.)
another change being performed by multithreaded soft-
Mutual exclusion with respect to other software threads
ware, so mutual exclusion may be required.
may be required. The Reference and Change bit val-
ues will not be stable until the entire sequence is com-
oldR PTER /* get old R */
pleted. if oldR = 1 then
PTEV 0 /* (other fields don’t matter)*/ PTER 0 /* store byte (R=0, other bits
ptesync /* order update before tlbie and unchanged) */
before next Page Table search */ tlbie(B,VA14:77-b,L,LP,AP,LPID) /* invalidate
tlbie(old_B,old_VA14:77-b,old_L,old_LP,old_AP, entry */
old_LPID) eieio /* order tlbie before tlbsync */
/*invalidate old translation*/ tlbsync /* order tlbie before ptesync */
eieio /* order tlbie before tlbsync */ ptesync /* order tlbie, tlbsync, and update
tlbsync /* order tlbie before ptesync */ before next Page Table search
ptesync /* order tlbie, tlbsync and 1st and before next data access */
update before 2nd update */
PTEARPN,LP,AC,R,C,WIMG,N,PP new values
eieio /* order 2nd update before 3rd */ Setting a Reference or Change Bit or
PTEB,AVA,SW,L,H,V new values (V=1) Upgrading Access Authority (PTE
ptesync /* order 2nd and 3rd updates before Subject to Atomic Hardware Updates)
next Page Table search and
before next data access */ If the only change being made to a valid PTE that is
subject to atomic hardware updates is to set the Refer-
General Case(STE) ence or Change bit to 1 or to add access authorities, a
simpler sequence suffices because the translation
If a valid entry is to be modified and the translation hardware will refetch the PTE if an access is attempted
instantiated by the entry being modified is to be invali- for which the only problems were reference and/or
change bits needing to be set or insufficient access before next data access */
authority. The store is exposed to overwriting another
change being performed by multithreaded software, so
mutual exclusion may be required.
Chapter 6. Interrupts
HSRR1
6.2 Interrupt Registers 0 63
Segment Descriptor Register (ASDR). When nested caused the interrupt, with the high-order 32 bits of the
Radix Tree translation is taking place, the ASDR will HDAR set to 0 if the interrupt occurs in 32-bit mode.
generally provide the guest real address down to bit 51.
(The smallest supported page size is 4k.) When using HDAR
paravirtualized HPT translation, information from the 0 63
segment descriptor that was used to perform the effec-
tive to virtual translation is provided in the ASDR. For Figure 57. Hypervisor Data Address Register
exceptions that take place when translating the
address of the process table entry or segment table 6.2.6 Data Storage Interrupt
entry group, only the VSID will be provided, because
those addresses are specified as virtual addresses and Status Register
the rest of the segment descriptor is implied. Some
instances of the Machine Check interrupt may require The Data Storage Interrupt Status Register (DSISR) is
the ASDR to be set similarly to how it is set for the a 32-bit register that is set by the Machine Check, Data
hypervisor storage interrupts. The ASDR is set inde- Storage, and Data Segment interrupts; see Sections
pendent of the value of UPRT for the partition that is 6.5.2, 6.5.3, and 6.5.4.
running.
Programming Note
HFSCR58:60 are used to control the availability of
Transactional Memory, the Performance Monitor,
and the BHRB in problem and privileged
non-hypervisor states. FSCR58:60 are reserved
since the availability of Transactional Memory is
controlled by the MSR, and the availability of the
Performance Monitor and BHRB is controlled by
MMCR0.
Value Meaning
Programming Note
Notice that rfebb, rfscv, [h]rfid, and mtmsrd 0:7 Interruption Cause (IC)
instructions can cause a TM Bad Thing type Pro- When a Hypervisor Facility Unavailable inter-
gram interrupt even when executed in a privilege rupt occurs, the IC field contains a binary
state in which TM is made unavailable by the number indicating the access that was
HFSCR. Here are two examples. Both assume that attempted. The values and their meanings are
HFSCRTM=0; the second assumes that specified below.
HFSCREBB=1. 00 Access to a Floating Point register or exe-
An operating system, running with MSRTS TM cution of a Floating Point instruction
= 0b000 (N0), sets SRR129:31 to 0b101 (T1) 01 Access to a Vector or VSX register or exe-
then executes rfid. The attempted illegal cution of a Vector or VSX instruction
transaction state transition will cause a TM 02 Access to the DSCR at SPRs 3 or 17
Bad Thing type Program interrupt, despite the 03 Read or write access of a Performance
fact that TM is made unavailable in privileged Monitor SPR in group A, or read access of
non-hypervisor state by the HFSCR. a Performance Monitor SPR in group B.
An application program, running with (See Section 9.4.1 for a definition of
MSRTS TM = 0b000 (N0), sets BESCRTS to groups A and B.)
04 Execution of a BHRB Instruction
0b01 (S) then executes rfebb. The attempted
05 Access to a Transactional Memory SPR
illegal transaction state transition will cause a
or execution of a Transactional Memory
TM Bad Thing type Program interrupt, despite
instruction
the fact that TM is made unavailable in prob-
06 Reserved
lem state by the HFSCR.
07 Access to an Event-Based Branch SPR or
This anomaly cannot be caused by the PCR. execution of an Event-Based Branch
rfscv, [h]rfid, and mtmsrd cannot be exe- instruction
cuted in the privilege state (problem state) in 08 Access to the Target Address Register
which TM is made unavailable by the PCR. 09 Access to the stop instruction in privileged
rfebb can be executed in the privilege state in non-hypervisor state when one or more of
which TM is made unavailable by the PCR, but the following conditions exist.
the PCR bit that makes TM unavailable (the PSSCREC=1
v2.06 bit) also makes rfebb unavailable. PSSCRESL=1
PSSCRMTL>PSSCRPSLL
Another difference between the HFSCR and the PSSCRRL>PSSCRPSLL
PCR is that PCRv2.06=1 prevents a thread from 0A Access to the msgsndp or msgclrp
being simultaneously in problem state and in instructions, the TIR or the DPDES Regis-
Transactional or Suspended state and ter
HFSCRTM=0 does not. However, if the hypervisor
always returns to the partition in Non-transactional All other values are reserved.
state when HFSCRTM=0, the partition will be 8:63 Facility Enable (FE)
unable to enter Transactional or Suspended state.
The FE field controls the availability of various
When the PCR makes a facility unavailable in problem facilities in problem and privileged non-hyper-
state, the facility is treated as not defined in problem visor states as specified below.
state; any Hypervisor Facility Unavailable interrupt that 8:52 Reserved
would occur if the facility were not made unavailble by
the PCR does not occur as a result of problem state Programming Note
access. See Section 2.5 for additional information. There is no bit in this register controlling
When a Hypervisor Facility Unavailable interrupt the availability of the stop instruction
occurs, the facility that was accessed is indicated in the because the availability of stop in privi-
most-significant byte of the HFSCR. leged non-hypervisor state is controlled by
the PSSCR. See Section 3.2.3.
IC Facility Control
0 8 63
53 msgsndp instructions and SPRs (MSGP)
Figure 64. Hypervisor Facility Status and Control
Register 0 The msgsndp and msgclrp instructions
and the TIR and DPDES registers are not
The contents of the HFSCR are specified below.
available in privileged non-hypervisor
state.
1 The msgsndp and msgclrp instructions mance Monitor exceptions do not cause
and the TIR and DPDES registers are Performance Monitor interrupts to occur
available in privileged non-hypervisor when the thread is in problem or privi-
state unless made unavailable by another leged states.
register. 1 Read and write operations of Perfor-
mance Monitor SPRs in group A and read
54 Reserved
operations of Performance Monitor SPRs
55 Target Address Register (TAR) in group B are available in problem and
0 The TAR and bctar instruction are not privileged states unless made unavail-
available in problem and privileged able by another register; read and write
non-hypervisor state. operations to privileged Performance
1 The TAR and bctar instruction are avail- Monitor registers (SPRs 784-792,
able in problem and privileged states 795-798) are available in privileged state;
unless made unavailable by another reg- Performance Monitor interrupts to occur if
ister. MSREE=1 and MMCR0EBE=0. See
Section 9.2 of Book III for additional infor-
56 Event-Based Branch Facility (EBB) mation
0 The Event-Based Branch facility SPRs 61 Data Stream Control Register (DSCR)
and instructions are not available in prob-
lem and privileged non-hypervisor states, 0 SPR 3 is not available in problem or privi-
and event-based exceptions and leged non-hypervisor states and SPR 17
branches do not occur. is not available in privileged non-hypervi-
1 The Event-Based Branch facility SPRs sor state.
and instructions are available in problem 1 SPR 3 is available in problem and privi-
and privileged states unless made leged states and SPR 17 is available in
unavailable by another register, and privileged state unless made unavailable
event-based exceptions and branches are by another register.
allowed to occur if enabled by other bits. 62 Vector and VSX Facilities (VECVSX)
57 Reserved 0 The facilities whose availability is con-
58 Transactional Memory Facility (TM) trolled by either MSRVEC or MSRVSX are
not available in problem and privileged
0 The Transactional Memory Facility SPRs non-hypervisor states.
and instructions are not available in prob- 1 The facilities whose availability is con-
lem and privileged non-hypervisor states. troled by either MSRVEC or MSRVSX are
1 The Transactional Memory Facility SPRs available in problem and privileged states
and instructions are available in problem unless made unavailable by another reg-
and privileged states unless made ister.
unavailable by another register.
63 Floating Point Facility (FP)
59 BHRB Instructions (BHRB)
0 The facilities whose availability is con-
0 The BHRB instructions (clrbhrb, trolled by MSRFP are not available in
mfbhrbe) are not available in problem problem and privileged non-hypervisor
and privileged non-hypervisor states. states.
1 The BHRB instructions (clrbhrb, 1 The facilities whose availability is con-
mfbhrbe) are available in problem and trolled by MSRFP are available in problem
privileged states unless made unavail- and privileged states unless made
able by another register. unavailable by another register.
60 Performance Monitor Facility SPRs (PM)
0 Read and write operations of Perfor-
mance Monitor SPRs in group A and read
operations of Performance Monitor SPRs
in group B are not available in problem
and privileged non-hypervisor states; read
and write operations to privileged Perfor-
mance Monitor registers (SPRs 784-792,
795-798) are not available in privileged
non-hypervisor state. (See Section 9.4.1
for a definition of groups A and B.) Perfor-
Programming Note
The FSCR can be used to determine whether a
particular facility is being used by an application,
and the HFSCR can be used to determine whether
a particular facility is being used by either an appli-
cation or by an operating system. This is done by
disabling the facility initially, and enabling it in the
interrupt handler upon first usage. The information
about the usage of a particular facility can be used
to determine whether that facility’s state must be
saved and restored when changing program con-
text.
Programming Note
The following tables summarize the interrupts that occur as a result of accessing the non-privileged Performance
Monitor registers in problem state when MMCR0PMCC, PCR, and HFSCR are set to various values. (Accesses to
privileged Performance Monitor SPRs (SPRs 784-792, 795-798) in problem state result in Privileged Instruction Type
Program interrupts.)
mfspr mtspr
PMCC PMCC
SPR # 00 01 10 11 00 01 10 11
3 4 4 4 4 4 4 4
MMCR2 769 HU FU, HU HU HU HE,HU FU, HU HU HU4
4 4 4 4 4 4 4
MMCRA 770 HU FU, HU HU HU HE,HU FU, HU HU HU4
PMC1 771 HU4 FU, HU4 HU4 HU4 HE,HU4 FU, HU4 HU4 HU4
4 4 4 4 4 4 4
HU4
Group A
SIER3 768 HU4 FU, HU4 HU4 HU4 See 2. See 2. See 2. See 2.
HU4 HU4 HU4 HU4
Group B
Programming Note
When an MSR bit makes a facility unavailable, the
facility is made unavailable in all privilege states.
Examples of this include the Floating Point, Vector,
and VSX facilities. The FSCR and HFSCR affect
the availability of facilities only in privilege states
that are lower than the privilege of the register
(FSCR or HFSCR).
With the exception of System Reset and Machine 2. An interrupt is generated such that all instructions
Check interrupts, all interrupts are context synchroniz- preceding the instruction causing the exception
ing as defined in Section 1.5.1. System Reset and appear to have completed with respect to the exe-
Machine Check interrupts are context synchronizing if cuting thread.
they are recoverable (i.e., if bit 62 of SRR1 is set to 1 3. The instruction causing the exception may appear
by the interrupt). If a System Reset or Machine Check not to have begun execution (except for causing
interrupt is not recoverable (i.e., if bit 62 of SRR1 is set the exception), may have been partially executed,
to 0 by the interrupt), it acts like a context synchronizing or may have completed, depending on the inter-
operation with respect to subsequent instructions. That rupt type.
is, a non-recoverable System Reset or Machine Check
interrupt need not satisfy items 1 through 3 of Section 4. Architecturally, no subsequent instruction has
1.5.1, but does satisfy items 4 and 5. begun execution, except that if an mtspr
sequence started by mtgsr is active when the
interrupt occurs, some of the sequence’s mtsprs
6.4 Interrupt Classes beyond the interrupt point may have been exe-
cuted; see Chapter 11 of Book III.
Interrupts are classified by whether they are directly
caused by the execution of an instruction or are caused
by some other system exception. Those that are “sys-
6.4.2 Imprecise Interrupt
tem-caused” are: This architecture defines one imprecise interrupt, the
System Reset Imprecise Mode Floating-Point Enabled Exception type
Machine Check Program interrupt.
External When an Imprecise Mode Floating-Point Enabled
Decrementer Exception type Program interrupt occurs, the following
Directed Privileged Doorbell conditions exist at the interrupt point.
Hypervisor Decrementer
Hypervisor Maintenance 1. SRR0 addresses either the instruction causing the
Hypervisor Virtualization exception or some instruction following that
Directed Hypervisor Doorbell instruction; see Section 6.5.9, “Program Interrupt”
Performance Monitor on page 1074.
External, Decrementer, Hypervisor Decrementer, 2. An interrupt is generated such that all instructions
Directed Privileged Doorbell, Directed Hypervisor Door- preceding the instruction addressed by SRR0
bell, Hypervisor Maintenance, and Hypervisor Virtual- appear to have completed with respect to the exe-
ization interrupts are maskable interrupts. Therefore, cuting thread.
software may delay the generation of these interrupts. 3. The instruction addressed by SRR0 may appear
System Reset and Machine Check interrupts are not not to have begun execution (except, in some
maskable. cases, for causing the interrupt to occur), may
“Instruction-caused” interrupts are further divided into have been partially executed, or may have com-
two classes, precise and imprecise. pleted; see Section 6.5.9.
4. No instruction following the instruction addressed
by SRR0 appears to have begun execution, except
that if an mtspr sequence started by mtgsr is
active when the interrupt occurs, some of the
sequence’s mtsprs beyond the interrupt point may
have been executed; see Chapter 11.
Programming Note
For instruction-caused interrupts, in some cases it may If the instruction is a Storage Access instruction, the
be desirable for the operating system to emulate the emulation must satisfy the atomicity requirements
instruction that caused the interrupt, while in other described in Section 1.4 of Book II.
cases it may be desirable for the operating system not
In general, the instruction should not be emulated if:
to emulate the instruction. The following list, while not
complete, illustrates criteria by which decisions regard- - The purpose of the instruction is to cause an
ing emulation should be made. The list applies to gen- interrupt. Example: System Call interrupt
eral execution environments; it does not necessarily caused by sc.
apply to special environments such as program debug-
- The interrupt is caused by a condition that is
ging, bring-up, etc.
stated, in the instruction description, poten-
In general, the instruction should be emulated if: tially to cause the interrupt. Example: Align-
ment interrupt caused by lwarx for which the
- The interrupt is caused by a condition for
storage operand is not aligned.
which the instruction description (including
related material such as the introduction to the - The program is attempting to perform a func-
section describing the instruction) implies that tion that it should not be permitted to perform.
the instruction works correctly. Example: Example: Data Storage interrupt caused by
Alignment interrupt caused by lmw for which lwz for which the storage operand is in stor-
the storage operand is not aligned, or by dcbz age that the program should not be permitted
for which the storage operand is in storage to access. (If the function is one that the pro-
that is Write Through Required or Caching gram should be permitted to perform, the con-
Inhibited. ditions that caused the interrupt should be
corrected and the program re-dispatched such
- The instruction is an illegal instruction that
that the instruction will be re-executed. Exam-
should appear, to the program executing it, as
ple: Data Storage interrupt caused by lwz for
if it were supported by the implementation.
which the storage operand is in storage that
Example: A Hypervisor Emulation Assistance
the program should be permitted to access
interrupt is caused by an instruction that has
but for which there currently is no PTE that
been phased out of the architecture but is still
satisfies the Page Table search.)
used by some programs that the operating
system supports.
Programming Note
If a program modifies an instruction that it or
another program will subsequently execute and the
execution of the instruction causes an interrupt, the
state of storage and the content of some registers
may appear to be inconsistent to the interrupt han-
dler program. For example, this could be the result
of one program executing an instruction that
causes a Hypervisor Emulation Assistance inter-
rupt just before another instance of the same pro-
gram stores an Add Immediate instruction in that
storage location. To the interrupt handler code, it
would appear that a hardware generated the inter-
rupt as the result of executing a valid instruction.
Programming Note
Instructions excluded from the list include the fol-
lowing.
instructions that set or use XERCA
instructions that set XEROV or XERSO
andi., andis., and fixed-point instructions with
Rc=1 (Fixed-point instructions with Rc=1 can
be replaced by the corresponding instruction
with Rc=0 followed by a Compare instruction.)
all floating-point instructions
mftb
These instructions, and the other excluded instruc-
tions, may be implemented with the assistance of
the Hypervisor Emulation Assistance interrupt, or of
implementation-specific interrupts that modify
HSRR0 and HSRR1. The included instructions are
guaranteed not to be implemented thus. (The
included instructions are sufficiently simple as to be
unlikely to need such assistance. Moreover, they
are likely to be needed in interrupt handlers before
HSRR0 and HSRR1 have been saved or after
HSRR0 and HSRR1 have been restored.)
Effective
Address1 Interrupt Type
Effective 1 The values in the Effective Address column are
Address1 Interrupt Type interpreted as follows.
00..0000_0100 System Reset 00...0000_0nnn means
00..0000_0200 Machine Check 0x0000_0000_0000_0nnn unless the values
00..0000_0300 Data Storage of LPCRAIL and MSRHV IR DR cause the appli-
00..0000_0380 Data Segment cation of an effective address offset. See the
00..0000_0400 Instruction Storage description of LPCRAIL in Section 2.2 for more
00..0000_0480 Instruction Segment details.
0...00_0001_7nnn means
00..0000_0500 External
0x0000_0000_0001_7nnn unless the values
00..0000_0600 Alignment
of LPCRAIL and MSRHV IR DR cause the usage
00..0000_0700 Program of an alternate effective address. See the
00..0000_0800 Floating-Point Unavailable description of LPCRAIL in Section 2.2 for
00..0000_0900 Decrementer details.
Hypervisor Decrementer 2
00..0000_0980 Effective addresses 0x0000_0000_0000_0000
00..0000_0A00 Directed Privileged Doorbell through 0x0000_0000_0000_00FF are used by
00..0000_0B00 Reserved software and will not be assigned as interrupt
00..0000_0C00 System Call vectors.
00..0000_0D00 Trace
00..0000_0E00 Hypervisor Data Storage Figure 66. Effective address of interrupt vector by
00..0000_0E20 Hypervisor Instruction Storage interrupt type
00..0000_0E40 Hypervisor Emulation Assistance
00..0000_0E60 Hypervisor Maintenance Programming Note
00..0000_0E80 Directed Hypervisor Doorbell When address translation is disabled, use of any of
00..0000_0EA0 Hypervisor Virtualization the effective addresses that are shown as reserved
00..0000_0EC0 Reserved in Figure 66 risks incompatibility with future imple-
mentations.
00..0000_0EE0 Reserved for implementa-
tion-dependent interrupt for per-
formance monitoring
00..0000_0F00 Performance Monitor
00..0000_0F20 Vector Unavailable 6.5.1 System Reset Interrupt
00..0000_0F40 VSX Unavailable
If a System Reset exception causes an interrupt that is
00..0000_0F60 Facility Unavailable
not context synchronizing or causes the loss of a
00..0000_0F80 Hypervisor Facility Unavailable Machine Check exception or a Direct External excep-
00..0000_0FA0 Reserved tion, or if the state of the thread has been corrupted, the
. . . ... interrupt is not recoverable.
00..0000_0FFF Reserved
When the thread is in any power-saving level, a System
00..0001_7000 System Call Vectored
Reset interrupt occurs when a System Reset exception
00..0001_7020 System Call Vectored exists. When the thread is in a power-saving level that
. . . ... was entered when PSSCREC=1, a System Reset inter-
00..0001_7FE0 System Call Vectored rupt also occurs when any of the following events
00..0001_7FFF (end of scv interrupt vectors) occurs provided that the event is enabled to cause exit
from power-saving mode (see Section 2.2). When the
thread is in a power-saving level that allows the state of
the LPCR to be lost, it is implementation-specific
whether the following events, when enabled, cause
exit, or whether only a system-reset exception causes
exit.
External
Decrementer
Directed Privileged Doorbell
Directed Hypervisor Doorbell
Hypervisor Maintenance
34:36 Set to 0.
42:45 If the interrupt did not occur when the
thread was in power-saving mode, set to an
implementation-specific value. If the inter-
rupt occurred when the thread was in
power-saving mode, set to indicate the
Programming Note
If a Machine Check interrupt is caused by an error
in the storage subsystem, the storage subsystem
may return incorrect data, which may be placed
into registers. This corruption of register contents
may occur even if the interrupt is recoverable.
6.5.3 Data Storage Interrupt to access an accelerator that is not properly con-
figured for the software’s use.
A Data Storage interrupt occurs when no higher priority The access violates Basic Storage Protection.
exception exists and either The access violates Virtual Page Class Key Stor-
age Protection and LPCRKBV=0.
(a) a copy-paste transfer other than from main storage The process- and partition-scoped page attributes
to a properly initiated accelerator is attempted, or conflict.
An unsupported radix tree configuration is found in
(b) (MSRHV PR=0b10) & (MSRDR=0)) , or the process-scoped tables.
(c) HPT translation is being performed, the value of the A reference or change bit update cannot be per-
formed in a process-scoped PTE.
expression A Data Address Watchpoint match occurs.
((MSRHV PR=0b10)|((¬VPM|¬PRTEV)& MSRDR)) An attempt is made to execute a Load Atomic or
Store Atomic instruction with an invalid function
is 1, and a data access cannot be performed, code.
An attempt is made to execute a Fixed-Point Load
except for the case of MSRHV PR0b10,
or Store Caching Inhibited instruction with
VPM=0, LPCRKBV=1, and a Virtual Storage MSRDR=1 or specifying a storage location that is
specified by the Hypervisor Real Mode Storage
Page Class Key Protection exception exists or Control facility to be treated as non-Guarded.
(d) Radix Tree translation is being performed, and A Data Storage interrupt also occurs when no higher
either a Data Address Watchpoint match occurs, an priority exception exists and an attempt is made to exe-
cute a Load Atomic or Store Atomic instruction specify-
attempt is made to execute an AMO with an invalid ing an invalid function code.
function code, or process-scoped translation either
Programming Note
does not complete or prevents the data access from
When an attempt to execute a Load Atomic or
being performed Store Atomic instruction containing an invalid func-
tion code (see Figures 3 and 4 in Book II) causes a
for any of the following reasons that can occur in the DSI, the condition is very similar to an invalid form
respective translation state. (In the expression for (a) of an instruction. As a result, this instance of DSI
above, “¬PRTEV” is shorthand representing the case occurs with a high prioirty that blocks the transla-
of an invalid segment table descriptor stopping the tion process and prevents Reference and Change
translation process.) bit updates.
Data address translation is enabled (MSRDR=1)
and the effective or virtual address of any byte of If a stbcx., sthcx., stwcx., stdcx., or stqcx. would not
the storage location specified by a Load, Store, perform its store in the absence of a Data Storage inter-
icbi, dcbz, dcbst, or dcbf[l] instruction cannot be rupt, and either (a) the specified effective address
translated to a real address because no valid PTE refers to storage that is Write Through Required or
was found for the process-scoped Radix Tree Caching Inhibited, or (b) a non-conditional Store to the
translation or HPT translation with VPM off. specified effective address would cause a Data Storage
The address of the appropriate process table entry interrupt, it is implementation-dependent whether a
or segment table entry group cannot be translated Data Storage interrupt occurs.
when HR=0 and either VPM=0 or the process table
entry is invalid (independent of VPM). If the XER specifies a length of zero for an indexed
The effective address specified by a lq, stq, lwat, Move Assist instruction, a Data Storage interrupt does
ldat, lbarx, lharx, lwarx, ldarx, lqarx, stwat, not occur.
stdat, stbcx., sthcx., stwcx., stdcx., or stqcx. The following registers are set:
instruction refers to storage that is Write Through
Required or Caching Inhibited; or the effective SRR0 Set to the effective address of the instruc-
address specified by a copy or paste. instruction tion that caused the interrupt.
refers to storage that is Caching Inhibited; or the SRR1
effective address specified by a lwat, ldat, stwat, 33:36 Set to 0.
or stdat instruction refers to storage that is 42:47 Set to 0.
Guarded. Others Loaded from the MSR.
An accelerator is specified as the source of a copy
MSR See Figure 65.
instruction, normal memory is specified at the tar-
get of a paste. instruction, or an attempt is made DSISR
32 Set to 0.
33 Set to 1 if MSRDR=1 and the translation for ured for the software’s use; otherwise set to
an attempted access is not found in the 0. These exceptions are presented differ-
Page Table; otherwise set to 0.. ently from most instruction-caused excep-
34 Set to 1 if the process- and partition-scoped tions. See Section 4.4, “Copy-Paste
page attributes conflict; otherwise set to 0. Facility”, in Book II for details. Additional
35 Set to 0. information may be retained by the platform
36 Set to 1 if the access is not permitted by if the accelerator is not properly configured.
Figure 44 46, or the privilege, read, or 61 Set to 1 if an attempt is made to execute a
read/write bits in Figure 45 as appropriate; Load Atomic or Store Atomic instruction
otherwise set to 0. specifying an invalid function code; other-
37 Set to 1 if the access is due to a lq, stq, wise set to 0.
lwat, ldat, lbarx, lharx, lwarx, ldarx, 62 Set to 1 if an attempt is made to execute a
lqarx, stwat, stdat, stbcx., sthcx., stwcx., Fixed-Point Load or Store Caching Inhib-
stdcx., or stqcx. instruction that addresses ited instruction with MSRDR=1 or specifying
storage that is Write Through Required or a storage location that is specified by the
Caching Inhibited; or if the access is due to Hypervisor Real Mode Storage Control
a copy or paste. instruction that addresses facility to be treated as non-Guarded.
storage that is Caching Inhibited; or if the 63 Set to 0.
access is due to a lwat, ldat, stwat, or
DAR Set to the effective address of a storage
stdat instruction that addresses storage
element as described in the following list.
that is Guarded; otherwise set to 0.
The list should be read from the top down;
38 Set to 1 for a Store, dcbz, or Load/Store
the DAR is set as described by the first item
Atomic instruction; otherwise set to 0.
that corresponds to an exception that is
39:40 Set to 0.
reported in the DSISR. For example, if a
41 Set to 1 if a Data Address Watchpoint
Load Word instruction causes a storage
match occurs; otherwise set to 0.
protection violation and a Data Address
42 Set to 1 if the access is not permitted by vir-
Watchpoint match (and both are reported in
tual page class key protection; otherwise
the DSISR), the DAR is set to the effective
set to 0.
address of a byte in the first aligned double-
43 Set to 0.
word for which access was attempted in the
44 Set to 1 if an unsupported radix tree config-
page that caused the exception.
uration is found during the translation pro-
undefined, for Load Atomic or Store
cess; otherwise set to 0.
Atomic instruction specifying an invalid
45 Set to 1 if an attempt to atomically set a ref-
function code
erence or change bit fails; otherwise set to
0.
undefined, when DSISR60=1
a Data Storage exception occurs for
Programming Note
reasons other than a Data Address
The number of attempts hardware Watchpoint match
makes to atomically set reference and - a byte in the block that caused the
change bits before triggering this exception, for a Cache Manage-
exception is implementation depen- ment instruction
dent. The POWER9 processor makes - a byte in the first aligned quad-
no attempt. Software may still support word for which access was
the atomic update programming model attempted in the page that caused
to get performance benefits such as the exception, for a quadword
those described in Section 5.7.12. Load or Store instruction (i.e., a
Load or Store instruction for which
46 Set to 1 if the address of the appropriate the storage operand is a quad-
process table entry or segment table entry word; “first” refers to address
group cannot be translated when VPM=0 order: see Section 6.7)
and HR=0, or the process table entry is - a byte in the first aligned double-
invalid (independent of VPM) when HR=0. word for which access was
47:59 Set to 0. attempted in the page that caused
60 Set to 1 if an accelerator is specified as the the exception, for a non-quadword
source of a copy instruction, normal mem- Load or Store instruction
ory is specified as the target of a paste. set as described in the previous major
instruction, or an attempt is made to access bullet, except that the low order 5 bits
an accelerator that is not properly config- are undefined, for a Data Address
Watchpoint match
For the cases in which the DAR is specified DAR Set to the effective address of a storage
above to be set to a defined value, if the element as described in the following list.
interrupt occurs in 32-bit mode the a byte in the block that caused the
high-order 32 bits of the DAR are set to 0. exception, for a Cache Management
instruction
If multiple Data Storage exceptions occur for a given
a byte in the first aligned quadword for
effective address, any one or more of the bits corre-
which access was attempted in the
sponding to these exceptions may be set to 1 in the
segment that caused the exception, for
DSISR. However, if one or more DSI-causing excep-
a quadword Load or Store instruction
tions occur together with a Virtualized Page Class Key
(i.e., a Load or Store instruction for
Storage Protection exception that occurs when
which the storage operand is a quad-
LPCRKBV=1 and Virtualized Partition Memory is dis-
word; “first” refers to address order:
abled by VPM=0, an HDSI results, and all of the excep-
see Section 6.7)
tions are reported in the HDSISR.
a byte in the first aligned doubleword
Execution resumes at effective address for which access was attempted in the
0x0000_0000_0000_0300, possibly offset as specified segment that caused the exception, for
in Figure 66. a non-quadword Load or Store instruc-
tion
6.5.4 Data Segment Interrupt If the interrupt occurs in 32-bit mode the
high-order 32 bits of the DAR are set to 0.
For Paravirtualized HPT Translation, a Data Segment
interrupt occurs when no higher priority exception Execution resumes at effective address
exists and a data access cannot be performed because 0x0000_0000_0000_0380, possibly offset as specified
data address translation is enabled and the effective in Figure 66.
address of any byte of the storage location specified by
a Load, Store, icbi, dcbz, dcbst, or dcbf[l] instruction Programming Note
cannot be translated to a virtual address. A Data Segment interrupt occurs if MSRDR=1 and
the translation of the effective address of any byte
For Radix Tree Translation (in other than hypervisor of the specified storage location is not found in the
real mode), a Data Segment interrupt occurs when no SLB (or in any implementation-specific address
higher priority exception exists and a data access can- translation lookaside information).
not be performed because for the effective address
specified by a Load, Store, icbi, dcbz, dcbst, or dcbf[l]
instruction, EA0:1=0b01 or EA0:1=0b10 when MSRHV
6.5.5 Instruction Storage Interrupt
PR 0b10 and data address translation is enabled, or
EA2:63 is outside the range translated by the appropri- An Instruction Storage interrupt occurs when no higher
ate Radix Tree. priority exception exists and either
If a stbcx., sthcx., stwcx., stdcx., or stqcx. would not (a) HPT Translation is being performed, the value of the
perform its store in the absence of a Data Segment
interrupt and a non-conditional Store to the specified expression
effective address would cause a Data Segment inter- ((MSRHV PR=0b10)|((¬VPM|¬PRTEV)&MSRIR))
rupt, it is implementation-dependent whether a Data
Segment interrupt occurs. is 1, and the next instruction to be executed cannot
If the XER specifies a length of zero for an indexed be fetched, or
Move Assist instruction, a Data Segment interrupt does
(b) Radix Tree translation is being performed and
not occur.
process-scoped translation prevents the next
The following registers are set:
instruction to be executed from being fetched
SRR0 Set to the effective address of the instruc-
tion that caused the interrupt. for any of the following reasons. (In the expression for
SRR1 (a) above, “¬PRTEV” is shorthand representing the
33:36 Set to 0. case of an invalid segment table descriptor stopping
42:47 Set to 0. the translation process.)
Others Loaded from the MSR. Instruction address translation is enabled and the
MSR See Figure 65. effective or virtual address cannot be translated to
a real address because no valid PTE was found for
DSISR Set to an undefined value. the process-scoped Radix Tree translation or HPT
translation with VPM off.
The address of the appropriate process table entry and HR=0, or the process table entry is
or segment table entry group cannot be translated invalid (independent of VPM) when HR=0.
when HR=0 and either VPM=0 or the process table 47 Set to 0.
entry is invalid (independent of VPM). Others Loaded from the MSR.
The fetch access violates storage protection.
MSR See Figure 65.
The process- and partition-scoped page attributes
conflict. If multiple Instruction Storage exceptions occur due to
An unsupported radix tree configuration is found in attempting to fetch a single instruction, any one or more
the process-scoped tables. of the bits corresponding to these exceptions may be
A reference bit update cannot be performed in a set to 1 in SRR1.
process-scoped PTE.
Execution resumes at effective address
The following registers are set: 0x0000_0000_0000_0400, possibly offset as specified
in Figure 66.
SRR0 Set to the effective address of the instruction
that the thread would have attempted to exe-
cute next if no interrupt conditions were pres- 6.5.6 Instruction Segment
ent (if the interrupt occurs on attempting to
fetch a branch target, SRR0 is set to the Interrupt
branch target address). For Paravirtualized HPT Translation, an Instruction
SRR1 Segment interrupt occurs when no higher priority
33 Set to 1 if MSRIR=1 and the translation for exception exists and the next instruction to be executed
an attempted access is not found in the cannot be fetched because instruction address transla-
Page Table; otherwise set to 0. tion is enabled and the effective address cannot be
34 Set to 1 if the process- and partition-scoped translated to a virtual address.
page attributes conflict; otherwise set to 0. For Radix Tree Translation (in other than hypervisor
35 Set to 1 if the access is to No-execute (as real mode), an Instruction Segment interrupt occurs
indicated by the N bit in the segment table when no higher priority exception exists and the next
entry or the N bit in the HPT PTE or the instruction to be executed cannot be fetched because
Execute and Privilege bits in the EAA field EA0:1=0b01 or EA0:1=0b10 when MSRHV PR 0b10
of the Radix PTE and IAMR key 0) or and instruction address translation is enabled, or
Guarded storage; otherwise set to 0. EA2:63 is outside the range translated by the appropri-
36 Set to 1 if the access is not permitted by ate Radix Tree.
Figure 44 or 46, as appropriate; otherwise
set to 0. The following registers are set:
SRR0 Set to the effective address of the instruction
42 Set to 1 if the access is not permitted by vir- that the thread would have attempted to exe-
tual page class key protection; otherwise cute next if no interrupt conditions were pres-
set to 0. ent (if the interrupt occurs on attempting to
43 Set to 0. fetch a branch target, SRR0 is set to the
44 Set to 1 if an unsupported radix tree config- branch target address).
uration is found during the translation pro-
cess; otherwise set to 0. SRR1
45 Set to 1 if an attempt to atomically set a ref- 33:36 Set to 0.
erence bit fails; otherwise set to 0. 42:47 Set to 0.
Others Loaded from the MSR.
Programming Note MSR See Figure 65 on page 1064.
The number of attempts hardware
makes to atomically set reference and Execution resumes at effective address
change bits before triggering this 0x0000_0000_0000_0480, possibly offset as specified
exception is implementation depen- in Figure 66.
dent. The POWER9 processor makes
no attempt. Software may still support Programming Note
the atomic update programming model An Instruction Segment interrupt occurs if
to get performance benefits such as MSRIR=1 and the translation of the effective
those described in Section 5.7.12. address of the next instruction to be executed is not
found in the SLB (or in any implementation-specific
address translation lookaside information).
46 Set to 1 if the address of the appropriate
process table entry or segment table entry
group cannot be translated when VPM=0
The following registers are set: The following registers are set:
SRR0 Set to the effective address of the instruc- SRR0 Set to the effective address of the instruc-
tion that the thread would have attempted tion that the thread would have attempted
to execute next if no interrupt conditions to execute next if no interrupt conditions
were present. were present.
SRR1 SRR1
33:36 Set to 0. 33:36 Set to 0.
42:47 Set to 0. 42:47 Set to 0.
Others Loaded from the MSR. Others Loaded from the MSR.
MSR See Figure 65 on page 1064.
MSR See Figure 65 on page 1064.
Execution resumes at effective address
Execution resumes at effective address
0x0000_0000_0000_0900, possibly offset as specified
0x0000_0000_0000_0A00, possibly offset as specified
in Figure 66.
in Figure 66.
the storage location specified by a Load, Store, The access violates storage protection. In addition
icbi, dcbz, dcbst, or dcbf[l] instruction cannot be to the legacy VPM cases, this includes mis-
translated to a real address because no valid PTE matches in access authority in which the pro-
was found for the VPM translation. cess-scoped PTE permits the access but the
HR=1 and the guest real address of any byte of the partition-scoped PTE does not. It also includes
storage location specified by a Load, Store, icbi, lack of necessary authority for accesses to pro-
dcbz, dcbst, or dcbf[l] instruction cannot be trans- cess-scoped tables, for example lack of write
lated to a host real address because no valid PTE authority to set a reference bit in the pro-
was found in the partition-scoped page table. cess-scoped PTE. (In such a case, the “access”
The guest real address of a page directory entry or reported as failing would be the access to the pro-
process table entry could not be translated when cess-scoped table. The HDAR would provide the
HR=1; or the virtual address of a process table guest real / (abbreviated) virtual address of the
entry or segment table entry group could not be table entry.)
translated when VPM=1 and HR=0. A Data Address Watchpoint match occurs, HR=0
An unsupported MMU configuration is found. In only.
addition to an invalid radix tree configuration found An attempt is made to execute a Load Atomic or
in the partition-scoped tables, this type of excep- Store Atomic instruction with an invalid function
tion will also be reported outside of hypervisor real code, HR=0 only.
mode for translation mode mismatches including
A Hypervisor Data Storage interrupt also occurs when
UPRT=0 when HR=1, LPID=0 if MSRHV=0 when
no higher priority exception exists and an attempt is
HR=1, and HR=0 for LPID=0 when HR=1 for
made to execute a Load Atomic or Store Atomic
another partition ID.
instruction specifying an invalid function code.
A reference or change bit update in a parti-
tion-scoped PTE cannot be performed (including
for the process-scoped PDE or PTE or process Programming Note
table entry for a radix guest. When an attempt to execute a Load Atomic or
Store Atomic instruction containing an invalid func-
Programming Note tion code (see Figures 3 and 4 in Book II) causes
When reporting failure to set a reference or an HDSI, the condition is very similar to an invalid
change bit for a table entry, whether the form of an instruction. As a result, this instance of
change bit must be set is inferred from HDSI occurs with a high prioirty that blocks the
whether the access is reported to be a store. translation process and prevents Reference and
(A load may report store if, when attempting to Change bit updates.
set the reference bit, the update of the change
bit in the partition-scoped PTE mapping the If a stbcx., sthcx., stwcx., stdcx., or stqcx. would not
process-scoped PTE fails.) Behavior is similar perform its store in the absence of a Hypervisor Data
for access authority failures. Storage interrupt, and either (a) the specified effective
address refers to storage that is Write Through
HR=0, data address translation is disabled Required or Caching Inhibited, or (b) a non-conditional
(MSRDR=0), and the virtual address of any byte of Store to the specified effective address would cause a
the storage location specified by a Load, Store, Hypervisor Data Storage interrupt, it is implementa-
icbi, dcbz, dcbst, or dcbf[l] instruction cannot be tion-dependent whether a Hypervisor Data Storage
translated to a real address by means of the virtual interrupt occurs.
real addressing mechanism.
The effective address specified by a lq, stq, lwat, If the XER specifies a length of zero for an indexed
ldat, lbarx, lharx, lwarx, ldarx, lqarx, stwat, Move Assist instruction, a Hypervisor Data Storage
stdat, stbcx., sthcx., stwcx., stdcx., or stqcx. interrupt does not occur.
instruction refers to storage that is Write Through The following registers are set:
Required or Caching Inhibited; or the effective
address specified by a copy or paste. instruction HSRR0 Set to the effective address of the instruc-
refers to storage that is Caching Inhibited; or the tion that caused the interrupt.
effective address specified by a lwat, ldat, stwat, HSRR1
or stdat instruction refers to storage that is 33:36 Set to 0.
Guarded. 42:47 Set to 0.
An accelerator is specified as the source of a copy Others Loaded from the MSR.
instruction, normal memory is specified at the tar-
get of a paste. instruction, or an attempt is made MSR See Figure 65.
to access an accelerator that is not properly con- HDSISR
figured for the software’s use; HR=0 only. 32 Set to 0.
- a byte in the first aligned double- the guest real address of the storage
word for which access was element, process table entry, page
attempted in the page that caused directory entry, or page table entry
the exception, for a non-quadword (depending on which partition-scoped
Load or Store instruction table has the flaw) for an unsupported
set as described in the previous major radix tree configuration in the parti-
bullet, except that the low order 5 bits tion-scoped table (the effective
are undefined, for a Data Address address for other cases of the invalid
Watchpoint match MMU configuration exception is found
in the HDAR)
For the cases in which the HDAR is speci-
the guest real address of the pro-
fied above to be set to an effective address,
cess-scoped PTE when an attempt is
if the interrupt occurs in 32-bit mode the
made to set a reference or change bit
high-order 32 bits of the HDAR are set to 0.
without write authority in the parti-
tion-scoped PTE that maps it
Programming Note
the guest real address or segment
Note that for HPT translation, the full EA is a super- descriptor associated with the speci-
set of the bits required to construct the full VA, fied storage element when a Hypervi-
when also provided with the VSID in the ASDR. sor Data Storage exception occurs for
reasons other than a Data Address
ASDR When HR=0, loaded with VSID, B, Ks, Kp, Watchpoint match
N, C, L, and LP values from the segment undefined, for a Data Address Watch-
descriptor that translated the access or point match, unsupported MMU config-
indicated the base of the table, or unde- uration, or accesses to storage that is
fined, as described in the following list. For Caching Inhibited or Write Through
a large segment the values of the bits Required by the instructions that are
below the VSID are undefined. When prohibited from making such accesses.
HR=1 (nested translaiton is taking place),
loaded with the guest real address down to If multiple Hypervisor Data Storage exceptions occur
bit 51 of a storage element or table entry, or for a given effective address, any one or more of the
undefined, as described in the following bits corresponding to these exceptions may be set to 1
list. The list should be read from the top in the HDSISR. If the HDSISR reports other exceptions
down; the ASDR is set as described by the together with a Virtualized Page Class Key Storage
first item that corresponds to an exception Protection exception that occurs when LPCRKBV=1 and
that is reported in the HDSISR. Virtualized Partition Memory is disabled by VPM=0, the
undefined, for Load Atomic or Store other exceptions are actually DSIs.
Atomic instruction specifying an invalid
function code Programming Note
A Virtual Page Class Key Storage Protection
undefined, when HDSISR60=1 exception that occurs with LPCRKBV=1 and Virtual-
the guest real page address of the ized Partition Memory disabled by VPM=0 identi-
table entry when a process table or fies an access that must be emulated by the
process-scoped page directory or hypervisor. When it is reported together with other
page table entry guest real address exceptions in the HDSISR, the hypervisor should
cannot be translated or the VSID of the service the Virtual Page Class Key Storage Protec-
table entry when a process or segment tion exception first. This is in part because the
table entry virtual address cannot be operating system may be using some PTE fields
translated (the rest of the segment for non-architected purposes, which could in turn
descriptor is implied). cause spurious exceptions to be reported.
the guest real address of the pro-
cess-scoped PDE or PTE or process Execution resumes at effective address
table entry when a reference or 0x0000_0000_0000_0E00, possibly offset as specified
change bit in the partition-scoped PTE in Figure 66.
mapping the process-scoped PDE or
PTE or process table entry cannot be
set atomically
the guest real address of the storage
element when a reference or change
bit in the partition-scoped PTE cannot
be set atomically
(b) Radix Tree translation is being performed and The following registers are set:
partition-scoped translation prevents the next HSRR0 Set to the effective address of the instruction
that the thread would have attempted to exe-
instruction to be executed from being fetched for any cute next if no interrupt conditions were pres-
ent (if the interrupt occurs on attempting to
of the following reasons.
fetch a branch target, HSRR0 is set to the
(In the expression for (a) above, “PRTEV” is shorthand branch target address).
indicating that an invalid segment table descriptor did
HSRR1
not stop the translation process. Note that an SLB hit
33 Set to 1 if the translation for an attempted
may satisfy this condition even when the Process Table
access is not found in the Page Table; oth-
Entry is invalid.)
erwise set to 0.
A Hypervisor Instruction Storage interrupt also occurs 34 Set to 0.
when no higher priority exception exists, HR=0, and a 35 Set to 1 if the access is to No-execute (as
reference or change bit update cannot be performed as indicated by the N bit in the segment table
described below. entry and HPT PTE or the exec bit in the
EAA field of the Radix PTE) or Guarded
Instruction address translation is enabled storage; otherwise set to 0.
(MSRIR=1) and the virtual address cannot be 36 Set to 1 if the access is not permitted by
translated to a real address because no valid PTE Figure 44 46, or the read or read/write bits
was found for the VPM translation. in Figure 45 as appropriate; otherwise set
HR=1 and the guest real address of the instruction to 0.
cannot be translated to a host real address 42 Set to 1 if the access is not permitted by vir-
because no valid PTE was found in the parti- tual page class key protection; otherwise
tion-scoped page table. set to 0.
The guest real address of a page directory entry or 43 Set to 0.
process table entry could not be translated when 44 Set to 1 if an unsupported MMU configura-
HR=1; or the virtual address of a process table tion is found during the translation process.
entry or segment table entry group could not be 45 Set to 1 if an attempt to atomically set a ref-
translated when VPM=1 and HR=0. erence or change bit fails; otherwise set to
An unsupported MMU configuration is found. In 0.
addition to an invalid radix tree configuration found
in the partition-scoped tables, this type of excep- Programming Note
tion will also be reported outside of hypervisor real
The number of attempts hardware
mode for translation mode mismatches including
makes to atomically set reference and
UPRT=0 when HR=1, LPID=0 if MSRHV=0 when
change bits before triggering this
HR=1, and HR=0 for LPID=0 when HR=1 for
exception is implementation depen-
another partition ID.
dent. The POWER9 processor makes
A reference or change bit update in a parti-
no attempt. Software may still support
tion-scoped PTE cannot be performed (including
the atomic update programming model
for the process-scoped PDE or PTE or process
to get performance benefits such as
table entry for a radix guest.
those described in Section 5.7.12.
Programming Note
This Programming Note illustrates how Hypervisor Emulation the instruction had caused a Hypervi-
Assistance interrupts should be handled by software, including sor Emulation Assistance interrupt
in environments that support nested hypervisors. (with HSRR145=1) to that hypervisor.
In this Note, “the hypervisor” may be the hypervisor to which - The program most recently dispatched by the level
hardware passes control when a Hypervisor Emulation Assis- N hypervisor is an operating system.
tance interrupt occurs or, in an environment that supports HSRR145=0: Emulate the instruction if
nested hypervisors, may be a nested hypervisor. The hypervi- appropriate (rather than pass control to the
sor to which hardware passes control when a Hypervisor operating system to do the emulation); oth-
Emulation Assistance interrupt occurs is here called the “level erwise pass control to the operating system
0 hypervisor,” and is the only level of hypervisor that runs with as if the instruction had caused an “Illegal
MSRHV PR=0b10 and that can access hypervisor resources Instruction type Program interrupt” as
directly; nested hypervisors run with MSRHV PR=0b00 and described in a Programming Note near the
their attempts to access hypervisor resources are virtualized end of <xref to Section 6.5.9>.
by a higher-level hypervisor as described below. In this Note, HSRR145=1: Either terminate the operating
the hypervisor receiving the Hypervisor Emulation Assistance system or pass control to the operating sys-
interrupt (which may have been passed from a higher-level tem as if the instruction had caused a Privi-
hypervisor as described below) is called the “level N hypervi- leged Instruction type Program interrupt as
sor.” This Note assumes that LPCREVIRT=1 if nested hypervi- described in a Programming Note near the
sors are used. (A Hypervisor Emulation Assistance interrupt end of <xref to Section 6.5.9>.
can set HSRR145 to 1 only when LPCREVIRT=1.) Higher level - The program most recently dispatched by the level
numbers correspond to lower level hypervisors. N hypervisor is an application program.
In the description immediately below, it is assumed that HSRR145=0: Emulate the instruction if
nested hypervisors (if any) are new versions of the existing appropriate; otherwise terminate the appli-
hypervisor, and that the purpose of the nesting is to test the cation program.
nested hypervisors before using them as level 0 hypervisors. HSRR145=1: Cannot occur.
When a Hypervisor Emulation Assistance interrupt is received The preceding description implicitly assumes that any nested
by the level N hypervisor, the cases and their suggested han- hypervisors being tested will, when run at level 0, be run on
dling are as follows. processors that support the same version of the architecture
The program that caused the interrupt is the level N as the processor on which they are being tested. If instead
hypervisor itself. they will be run on processors that support a newer version of
the architecture, the level 0 hypervisor should behave as
- HSRR145=0: Emulate the instruction, recover from described above if the interrupt is caused by an instruction
the error, or terminate this hypervisor, as appropri- that is unchanged between the two architecture versions.
ate. However, if the interrupt is caused by an instruction that differs
- HSRR145=1: Cannot occur for N=0; will not occur between the two architecture versions (e.g., an instruction that
for N>0 if the hypervisor nesting software is written is added by the newer version of the architecture), the level 0
correctly. hypervisor should emulate the behavior of the newer proces-
The program that caused the interrupt is not the level N sor, rather than, for example, passing the interrupt to a level 1
hypervisor. hypervisor.
- The program most recently dispatched by the level Other uses of nested hypervisors are also possible. For
N hypervisor is a level N+1 hypervisor. example, software that is designed to interact, nearly simulta-
HSRR145=0: Pass control to the level N+1 neously, with the hypervisor instance that is running on each
hypervisor as if the instruction had caused of many processors could be tested on a single processor by
a Hypervisor Emulation Assistance inter- running multiple level 1 hypervisors under a single level 0
rupt (with HSRR145=0) to that hypervisor. hypervisor.
HSRR145=1:
It is expected that in practice there will be at most two levels of
- The program that caused the interrupt nested hypervisor (i.e., N2). (For example, two levels are
is the level N+1 hypervisor: Virtualize
needed in the case described in detail above, to test the ability
the instruction.
of the nested hypervisors at level 1 to support nested hypervi-
- The program that caused the interrupt sors.)
is not the level N+1 hypervisor: Pass
control to the level N+1 hypervisor as if
HSRR1
Programming Note
33:36 Set to 0.
If a Hypervisor Emulation Assistance interrupt 42:47 Set to 0.
occurs with HSRR145=0 when the thread is not in Others Loaded from the MSR.
hypervisor state, for an instruction that the hypervi-
sor does not emulate, the hypervisor should pass MSR See Figure 65 on page 1064.
control to the operating system as if the instruction HMER See Section 6.2.9 on page 1051.
had caused an "Illegal Instruction type Program
interrupt", as described in a Programming Note
near the end of Section 6.5.9, “Program Interrupt” The exception bits in the HMER are sticky; that is, once
on page 1074. set to 1 they remain set to 1 until they are set to 0 by an
mthmer instruction.
Similarly, if a Hypervisor Emulation Assistance
interrupt occurs with HSRR145=1 when the thread Execution resumes at effective address
is in privileged non-hypervisor state, for an instruc- 0x0000_0000_0000_0E60.
tion that the hypervisor does not virtualize, the
hypervisor should pass control to the operating Programming Note
system as if the instruction had caused a Privileged Because the value of MSREE is always 1 when the
Instruction type Program interrupt, as described in thread is in problem state, the simpler expression
another Programming Note near the end of
(MSREE | ¬(MSRHV))
Section 6.5.9, “Program Interrupt” on page 1074.
is equivalent to the expression given above.
Programming Note
In versions of the architecture that precede V. 3.0B, Programming Note
an attempt when MSRPR=0 to execute an mtspr or If an implementation uses the HMER to record that
mfspr instruction specifying an SPR that was not a readable resource, such as the Time Base, has
implemented (with the exception of SPR 0 for been corrupted, then, because the HMI is disabled
mtspr and SPRs 0, 4, 5, and 6 for mfspr) was in the hypervisor state, it is necessary for the
treated as a no-op. These former no-op cases now hypervisor to check HMER after reading that
cause a Hypervisor Emulation Assistance interrupt resource to be sure an error has not occurred.
(with HSRR145=0) when LPCREVIRT=1 to enable
future functions to be emulated on older implemen-
tations. (An attempt when MSRPR=0 to execute an 6.5.20 Directed Hypervisor Door-
mtspr instruction specifying SPRs 4, 5, and 6 now
causes a Hypervisor Emulation Assistance inter- bell Interrupt
rupt regardless of the value of LPCREVIRT.) If there
A Directed Hypervisor Doorbell interrupt occurs when
is no future function emulation to be performed,
no higher priority exception exists, a Directed Hypervi-
hypervisor software must choose a policy from the
sor Doorbell exception is present, and the value of the
following.
following expression is 1.
treat the instruction as an error
emulate the legacy no-op behavior (MSREE | ¬(MSRHV) | MSRPR )
give control to the operating system
Directed Hypervisor Doorbell exceptions are generated
when Directed Hypervisor Doorbell messages (see
Chapter 10) are received and accepted by the thread.
6.5.19 Hypervisor Maintenance
The following registers are set:
Interrupt
HSRR0 Set to the effective address of the instruc-
A Hypervisor Maintenance interrupt occurs when no tion that the thread would have attempted
higher priority exception exists, a Hypervisor Mainte- to execute next if no interrupt conditions
nance exception exists (a bit in the HMER is set to were present.
one), the exception is enabled in the HMEER, and the
value of the following expression is 1. HSRR1
33:36 Set to 0.
(MSREE | ¬(MSRHV) | MSRPR ) 42:47 Set to 0.
Others Loaded from the MSR.
The following registers are set:
HSRR0 Set to the effective address of the instruc-
tion that the thread would have attempted
to execute next if no interrupt conditions
were present.
MSR See Figure 65 on page 1064. execute next if no interrupt conditions were
present.
Execution resumes at effective address
0x0000_0000_0000_0E80, possibly offset as specified SRR1
in Figure 66. 33:36 and 42:47
Reserved.
Programming Note Others Loaded from the MSR.
Because the value of MSREE is always 1 when the
thread is in problem state, the simpler expression MSR See Figure 65 on page 1064.
(MSREE | ¬(MSRHV)) Execution resumes at effective address
0x0000_0000_0000_0F00, possibly offset as specified
is equivalent to the expression given above.
in Figure 66.
Programming Note
When the System Call Vectored interrupt results in
MSRIR being 1 or MSRHV being 0, the effective
address described above is translated to a real
address before being used to access storage. If
the effective address cannot be translated, or if
instructions cannot be fetched from the addressed
storage location (e.g., the access would violate
storage protection, or would be to No-execute stor-
age), an [Hypervisor] Instruction Storage interrupt
occurs before the first instruction at the effective
address is executed.
Because the System Call Vectored interrupt uses
save/restore registers that differ from those used
by other interrupts, the System Call Vectored inter-
rupt handler can run with address translation
enabled and External interrupts enabled. Similarly,
the Programming Note about managing MSRRI at
the end of Section 6.4.3 does not apply to the Sys-
tem Call Vectored interrupt handler (the System
Call Vectored interrupt does not alter MSRRI).
6.7 Exception Ordering one exception, in the following list the hypervisor forms
of the Data Storage and Instruction Storage exceptions
Since multiple exceptions can exist at the same time can be substituted for the non-hypervisor forms since
and the architecture does not provide for reporting the hypervisor forms cannot be caused by the same
more than one interrupt at a time, the generation of instruction and have the same ordering. The exception
more than one interrupt is prohibited. Some exceptions, is that Virtual Page Class Key Storage Protection
such as the Mediated External exception, persist and exceptions that occur when LPCRKBV=1 and Virtual-
can be deferred. However, other exceptions would be ized Partition Memory is disabled by VPM=0 cause
lost if they were not recognized and handled when they only a Hypervisor Data Storage exception (and never a
occur. For example, if an External interrupt was gener- Data Storage exception).
ated when a Data Storage exception existed, the Data
System-Caused or Imprecise
Storage exception would be lost. If the Data Storage
exception was caused by a Store Multiple instruction 1. Program
for which the storage operand crosses a virtual page - Imprecise Mode Floating-Point Enabled Exception
boundary and the exception was a result of attempting 2. Hypervisor Maintenance
to access the second virtual page, the store could have 3. Hypervisor Virtualization, External, [Hypervisor]
modified locations in the first virtual page even though it Decrementer, Performance Monitor, Directed Privi-
appeared that the Store Multiple instruction was never leged Doorbell, Directed Hypervisor Doorbell
executed.
For the above reasons, all exceptions are prioritized
with respect to other exceptions that may exist at the
same instant to prevent the loss of any exception that is
not persistent. Some exceptions cannot exist at the
same instant as some others.
Data Storage, Hypervisor Data Storage, Data Segment,
and Alignment exceptions and transaction failure due
to attempted access of a disallowed type while in
Transactional state occur as if the storage operand
were accessed one byte at a time in order of increasing
effective address (with the obvious caveat if the oper-
and includes both the maximum effective address and
effective address 0). (The required ordering of excep-
tions on components of non-atomic accesses does not
extend to the performing of the component accesses in
the event of an exception. For example, if byte n
causes a data storage exception, it is not necessarily
true that the access to byte n-1 has been performed.)
Decrementer to step through zero). The exceptions tions to be reported. It then generates the appro-
are listed in order of highest to lowest priority. The priate ordered interrupt if no higher priority
phrase "corresponding interrupt" means the interrupt exception exists when the interrupt is to be gener-
having the same name as the exception unless the ated. Within this category a particular instruction
thread is in power-saving mode, in which case the may present more than a single exception. When
phrase means the System Reset interrupt. this occurs, those exceptions are ordered in prior-
ity as indicated in the following lists. Where [Hyper-
Unless otherwise stated or obvious from context, it is
visor] Data Storage, Data Segment, and Alignment
assumed below that one of the following conditions is
exceptions are listed in the same item they have
satisfied.
equal priority (i.e., the hardware may generate any
The thread is not in power-saving mode and the one of the three interrupts for which an exception
interrupt, unless it is the Machine Check inter- exists). For instructions that are disallowed in
rupt, is not disabled. (For the Machine Check Transactional state, and for mtspr specifying an
interrupt no assumption is made regarding SPR that is not part of the checkpointed registers
enablement.) and is not the GSR or a Transactional Memory
SPR, transaction failure takes priority over all
The thread is in power-saving mode and the
interrupts except Privileged Instruction type Pro-
exception is enabled to cause exit from the
gram interrupts, Hypervisor Emulation Assistance
mode.
interrupts, and [Hypervisor] Facility Unavailable
With one exception, in the following list the hypervisor interrupts. For data accesses that are disallowed
forms of the Data Storage and Instruction Storage in Transactional state, transaction failure has the
exceptions can be substituted for the non-hypervisor same priority as the group of “other” [Hypervisor]
forms since the hypervisor forms cannot be caused by Data Storage, Data Segment, and Alignment
the same instruction and have the same priority. The exceptions. (See Section 5.3.1 of Book II.)
exception is that exceptions caused by Virtual Page
A. Fixed-Point Loads and Stores
Class Key Storage Protection exceptions that occur
a. These exceptions are mutually exclusive
when LPCRKBV=1 and Virtualized Partition Memory is
and have the same priority:
disabled by VPM=0 cause only a Hypervisor Data Stor-
Hypervisor Emulation Assistance
age exception (and never a Data Storage exception).
Program - Privileged Instruction
1. System Reset b. Hypervisor Facility Unavailable
c. Facility Unavailable
System Reset exception has the highest priority of
d. Data Storage for the case of Fixed-Point
all exceptions. If this exception exists, the interrupt
Load or Store Caching Inhibited instructions
mechanism ignores all other exceptions and gen-
with MSRDR=1 or the case of an invalid
erates a System Reset interrupt.
function code for an Atomic Memory
Once the System Reset interrupt is generated, no Operation
nonmaskable interrupts are generated due to e. all other Data Storage, Hypervisor Data
exceptions caused by instructions issued prior to Storage, [Hypervisor] Data Segment,
the generation of this interrupt. Machine Check for invalid accelerator
access, or Alignment
2. Machine Check
f. Trace
With one exception, the Machine Check exception
B. Floating-Point Loads and Stores
is the second highest priority exception. If this
a. Hypervisor Emulation Assistance
exception exists and a System Reset exception
b. Hypervisor Facility Unavailable
does not exist, the interrupt mechanism ignores all
c. Floating-Point Unavailable
other exceptions and generates a Machine Check
d. [Hypervisor] Data Storage, [Hypervisor]
interrupt. The exception is that a Machine Check
Data Segment, Machine Check for invalid
caused by an attempt to access an accelerator as
accelerator access, or Alignment
other than an operand of copy or paste. is priori-
e Trace
tized similarly to a storage protection exception.
C. Vector Loads and Stores
Once the Machine Check interrupt is generated,
a. Hypervisor Emulation Assistance
no nonmaskable interrupts are generated due to
b. Hypervisor Facility Unavailable
exceptions caused by instructions issued prior to
c. Vector Unavailable
the generation of this interrupt.
d. [Hypervisor] Data Storage, [Hypervisor]
3. Instruction-Caused and Precise Data Segment, Machine Check for invalid
accelerator access, or Alignment
This exception is the third highest priority excep-
e. Trace
tion. When this exception is created, the interrupt
mechanism waits for all possible Imprecise excep- D. VSX Loads and Stores
Programming Note
An incorrect or malicious operating system
could corrupt the first instruction in the inter-
rupt vector location for an instruction-caused
interrupt such that the attempt to execute the
instruction causes the same exception that
caused the interrupt (a looping interrupt; e.g.,
Trap instruction and Program interrupt). Simi-
larly, the first instruction of the interrupt vector
for one instruction-caused interrupt could
cause a different instruction-caused interrupt,
and the first instruction of the interrupt vector
for the second instruction-caused interrupt
could cause the first instruction-caused inter-
rupt (e.g., Program interrupt and Floating-Point
Unavailable interrupt). The looping caused by
these and similar cases is terminated by the
occurrence of a System Reset or Hypervisor
Decrementer interrupt.
6.10 Relationship of
Event-Based Branches to Inter-
rupts
6.10.1 EBB Exception Priority
Event-based branches have a priority lower than that of
all interrupts. When an event-based exception is cre-
ated, the Event-Based Branch facility waits for all possi-
ble exceptions that would cause interrupts to be
reported. It then generates the event-based branch if
no exception that would cause an interrupt exists when
the event-based branch is to be generated.
7.4.1 Writing and Reading the hardware or of modification of the Hypervisor Decre-
menter caused by execution of an mtspr instruction.
Decrementer
The operation of the Hypervisor Decrementer has the
The contents of the Decrementer can be read or written following additional properties.
using the mfspr and mtspr instructions, both of which
1. Loading a GPR from the Hypervisor Decrementer
are privileged when they refer to the Decrementer.
has no effect on the accuracy of the Hypervisor
Using an extended mnemonic (Figure 18), the Decre-
Decrementer.
menter can be written from GPR Rx using:
2. Copying the contents of a GPR to the Hypervisor
mtdec Rx Decrementer replaces the contents of the Hypervi-
The Decrementer can be read into GPR Rx using: sor Decrementer with the contents of the GPR.
The definition of the set of threads S, the shared The rate at which the value represented by the con-
resources corresponding to the set S, and specifics of tents of the SPURR increases is an estimate of the por-
the algorithm for incrementing the PURR are imple- tion of resources used by the thread with respect to
mentation-specific. other threads that share those resources monitored by
the SPURR, and relative to the computational capacity
The PURR is implemented such that: provided by those resources. The computational
1. Loading a GPR from the PURR has no effect on capacity provided by the shared resources may vary as
the accuracy of the PURR. a function of the frequency of the oscillator which drives
the resources or as a result of deliberate delays in pro-
2. Copying the contents of a GPR to the PURR cessing that are created to reduce power consumption.
replaces the contents of the PURR with the con- When the thread is idle, the rate at which the SPURR
tents of the GPR. value increases is implementation dependent.
Programming Note
Estimates computed as described above may be
useful for purposes of resource use accounting,
program dispatching, etc.
IC
0 63
8.1 Overview setting need not occur until a subsequent context syn-
chronizing operation has occurred.
Implementations provide debug facilities to enable
hardware and software debug functions, such as con- CFAR //
trol flow tracing, data address watchpoints, and pro- 0 62 63
gram single-stepping. The debug facilities described in
Figure 73. Come-From Address Register
this section consist of the Come-From Address Regis-
ter (see Section 8.2), Completed Instruction Address The contents of the CFAR can be read and written
Breakpoint Register (see Section 8.3), and the Data using the mfspr and mtspr instructions. Acccess to the
Address Watchpoint Register (DAWRn) and Data CFAR is privileged.
Address Watchpoint Register Extension (DAWRXn)
(see Section 8.4). The interrupt associated with the Programming Note
Data Address Breakpoint registers is described in Sec- This register can be used for purposes of debug-
tion 6.5.3. The interrupt associated with the Completed ging software. For example, often a software bug
Instruction Address Breakpoint Register is described in results in the program executing a portion of the
Section 6.5.15. The Trace facility, which can be used code that it should not have reached or causing an
for single-stepping as well as for control flow tracing, is unexpected interrupt. In the former case, a break-
described in Section 6.5.15. point can be placed in the portion of the code that
The mfspr and mtspr instructions (see Section 4.4.4) was erroneously reached and the program reexe-
provide access to the registers of the debug facilities. cuted. In either case, the interrupt handler can save
the contents of the CFAR (before executing the first
In addition to the facilities mentioned above, implemen- instruction that would modify the register), and then
tations typically provide debug facilities, modes, and make the saved contents available for a debugger
access mechanisms that are implementation-specific. to use in determining the control flow path by which
For example, implementations typically provide facili- the exception was reached.
ties for instruction address tracing, and also access to
certain debug facilities via a dedicated interface such In order to preserve the CFAR's contents for each
as the IEEE 1149.1 Test Access Port (JTAG). partition and to prevent it from being used to imple-
ment a "covert channel" between partitions, the
hypervisor should initialize/save/restore the CFAR
8.2 Come-From Address Regis- when switching partitions on a given thread.
ter
The Come-From Address Register (CFAR) is a 64-bit 8.3 Completed Instruction
register. When an rfebb, rfid, or rfscv instruction is
executed, the register is set to the effective address of
Address Breakpoint
the instruction. When a Branch instruction is executed The Completed Instruction Address Breakpoint mecha-
and the branch is taken, the register is set to the effec- nism provides a means of detecting an instruction com-
tive address of an instruction in the instruction cache pletion at a specific instruction address. The address
block containing the Branch instruction, except that if comparison is done on an effective address (EA).
the Branch instruction is a B-form Branch (i.e., bc, bca,
bcl, or bcla) for which the target address is in the The Completed Instruction Address Breakpoint mecha-
instruction cache block containing the Branch instruc- nism is controlled by the Completed Instruction
tion or is in the previous or next cache block, the regis-
ter is not necessarily set. For Branch instructions, the
Address Breakpoint Register (CIABR), shown in Figure 75, and the Data Address Watchpoint Register
Figure 75. Extension (DAWRXn), shown in Figure 76.
The Data Address Watchpoint mechanism provides a All other fields are reserved.
means of detecting load and store accesses to a range
Figure 76. Data Address Watchpoint Register
of addresses starting at a designated doubleword. The
Extension
address comparison is done on an effective address
(EA). The supported PRIVM values are 0b000, 0b001,
0b010, 0b011, 0b100, and 0b111. If the PRIVM field
Programming Note does not contain one of the supported values, then
The Data Address Watchpoint mechanism employs whether a match occurs for a given storage access is
a simple EA compare. It makes no attempt to take undefined. Elsewhere in this section it is assumed that
the radix table translation quadrants (keyed off the PRIVM field contains one of the supported values.
EA0:1) into account to enable a single setting to
work in all privilege levels.
Programming Note
PMC5 and PMC6 are defined to facilitate calculat-
ing basic performance metrics such as cycles per
instruction (CPI).
9.4.4 Monitor Mode Control 1 The PMCs are not incremented, and
entries are not written into the BHRB, if
Register 0 MSRHV PR=0b00.
Monitor Mode Control Register 0 (MMCR0) is a 64-bit 34 Conditionally Freeze Counters and BHRB
register as shown below. in Problem State (FCP)
If the value of bit 51 (FCPC) is 0, this field has
MMCR0 the following meaning.
0 63
0 The PMCs are incremented (if permitted
Figure 78. Monitor Mode Control Register 0 by other MMCR bits) and entries are writ-
ten into the BHRB (if permitted by the
MMCR0 is used to control multiple functions of the Per-
BHRB Instruction Filtering Mode field in
formance Monitor. Some fields of MMCR0 are altered
MMCRA).
by the hardware when various events occur.
1 The PMCs are not incremented, and
The following notation is used in the definitions below. entries are not written into the BHRB, if
“PMCs” refers to PMCs 1 - n and “PMCj” refers to MSRPR=1.
PMCj, where 2 j n. n=4 when MMCR0PMCC=0b11
If the value of bit 51 (FCPC) is 1, this field has
and n=6 otherwise.
the following meaning.
When MMCR0PMCC is set to 0b10 or 0b11, providing 0 The PMCs are not incremented, and
problem state programs read/write access to MMCR0, entries are not written into the BHRB, if
only FC, PMAE, PMAO can be accessed. All other bits MSRHV PR=0b01.
are not changed when mtspr is executed in problem 1 The PMCs are not incremented, and
state, and all other bits return 0s when mfspr is exe- entries are not written into the BHRB, if
cuted in problem state. MSRHV PR=0b11.
Load or Store instruction to the point at F0 The thread has completed a Store instruc-
which it has reported all exceptions it will tion to the point at which it has reported all
cause. the exceptions it will cause.
F6 The thread has failed to locate an ERAT F2 The thread has dispatched an instruction.
entry during instruction address transla- F4 A cycle has occurred during which the
tion. RUN bit of the thread’s CTRL register
F8 A cycle has occurred during which all pre- contained 1.
viously initiated instructions have com- F6 The thread has failed to locate an ERAT
pleted and no instructions are available for entry during data address translation, and
initiation. a new ERAT entry corresponding to the
FA A cycle has occurred during which the data effective address has been written.
RUN bit of the CTRL register for one or F8 An external interrupt for the thread has
more threads of the multi-threaded pro- occurred.
cessor was set to 1. FA The thread has completed a Branch
FC A load type instruction finished. If the instruction for which the branch was
instruction caused more than one refer- taken.
ence, only one will be counted. FC The thread has failed to locate an instruc-
FE The thread has completed an instruction. tion in the primary cache.
FE The thread has filled a block in the primary
40:47 PMC2 Selector (PMC2SEL) data cache with data that were accessed
The value of PMC2SEL specifies the event to by a Load instruction and obtained from a
be counted by PMC2 as defined below. location other than the secondary cache.
All values in the range of E0 - FF that are not
specified below are reserved. 48:55 PMC3Selector (PMC3SEL)
The value of PMC3SEL specifies the event to
Hex Event
be counted by PMC3 as defined below.
00 Disable events. (No events occur.) All values in the range of E0 - FF that are not
01-BF Implementation-dependent specified below are reserved.
C0-DF Reserved
Hex Event
00 Disable events. (No events occur.)
The following events can occur only when ran-
01-BF Implementation-dependent
dom sampling is enabled (MMCRASE=1). The
C0-DF Reserved
sampling modes corresponding to each event
are listed in parentheses. (The sampling mode The following events can occur only when ran-
is specified in MMCRASM.) dom sampling is enabled (MMCRASE=1). The
E0 The thread has obtained the data for a sampling modes corresponding to each event
randomly sampled Load instruction from are listed in parentheses. (The sampling mode
storage that did not reside in any cache. is specified in MMCRASM.)
(RIS, RLS) E2 The thread has completed a randomly
E2 The thread has failed to locate the data for sampled Store instruction to the point at
a randomly sampled Load instruction in which it has reported all exceptions it will
the primary data cache. (RIS, RLS) cause. (RIS,RLS)
E4 The thread filled a block in the primary E4 The thread has mispredicted either
data cache with data that were accessed whether or not the branch would be
by a randomly sampled Load instruction taken, or if taken, the target address of a
and obtained from a location other than randomly sampled Branch instruction.
the secondary or tertiary cache. (RIS, (RIS, RBS)
RLS) E6 The thread has failed to locate an ERAT
E6 The threshold event counter has entry during data address translation for a
exceeded the number of events corre- randomly sampled instruction. (RIS,RLS)
sponding to threshold B (see Table 5). E8 The threshold event counter has
(RIS, RLS, RBS) exceeded the number of events corre-
E6 The threshold event counter has sponding to threshold C (see Table 5).
exceeded the number of events corre- (RIS, RLS, RBS)
sponding to threshold F (see Table 5). EA The threshold event counter has
(RIS, RLS, RBS) exceeded the number of events corre-
sponding to threshold G (see Table 5).
The following events can occur regardless of
(RIS, RLS, RBS)
whether random sampling is enabled.
The following events can occur regardless of sponding to threshold D (see Table 5).
whether random sampling is enabled. (RIS, RLS, RBS)
EC The threshold event counter has
F0 The thread has attempted to store data in exceeded the number of events corre-
the primary data cache but no block corre- sponding to threshold H (see Table 5).
sponding to the real address existed. (RIS, RLS, RBS)
F2 The thread has dispatched an instruction.
The following events can occur regardless of
F4 The thread has completed an instruction
whether random sampling is enabled.
when the RUN bit of the CTRL register for
all threads on the multi-threaded proces-
F0 The thread has attempted to load data
sor contained 1.
from the primary data cache but no block
F6 The thread has filled a block in the primary
corresponding to the real address existed.
data cache with data that were accessed
F2 A cycle has occurred during which the
by a Load instruction.
thread has dispatched one or more
F8 A Time Base transition event has
instructions.
occurred for the thread. This event is
F4 A cycle has occurred during which the
counted regardless of whether or not Time
PURR was incremented when the RUN bit
Base transition events are enabled by
of the thread’s CTRL register contained 1.
MMCR0TBEE.
F6 The thread has mispredicted either
FA The thread has loaded an instruction from
whether or not the branch would be
a higher level cache than the tertiary
taken, or if taken, the target address of a
cache.
Branch instruction.
FC The thread was unable to translate a data
F8 The thread has discarded prefetched
virtual address using the TLB.
instructions.
FE The thread has filled a block in the primary
FA The thread has completed an instruction
data cache with data that were accessed
when the RUN bit of the thread’s CTRL
by a Load instruction and obtained from a
register contained 1.
location other than the secondary or ter-
FC The thread was unable to translate an
tiary cache.
instruction virtual address using the TLB,
and a new TLB entry corresponding to the
56:63 PMC4 Selector (PMC4SEL)
instruction virtual address has been writ-
The value of PMC4SEL specifies the event to
ten.
be counted by PMC4 as defined below.
FE The thread has obtained the data for a
All values in the range of E0 - FF that are not
Load instruction from storage that did not
specified below are reserved.
reside in any cache.
Hex Event
00 Disable events. (No events occur.) Compatibility Note
01-BF Implementation-dependent In versions of the architecture that precede Version
C0-DF Reserved 2.02 the PMC Selector Fields were six bits long,
and were split between MMCR0 and MMCR1.
The following events can occur only when ran- PMC1-8 were all programmable.
dom sampling is enabled (MMCRASE=1). The
sampling modes corresponding to each event If more programmable PMCs are implemented in
are listed in parentheses. (The sampling mode the future, additional MMCRs may be defined to
is specified in MMCRASM.) cover the additional selectors.
E0 The thread has completed a randomly
sampled instruction. (RIS, RLS, RBS)
E4 The thread was unable to translate a data 9.4.6 Monitor Mode Control
virtual address using the TLB for a ran-
domly sampled instruction. (RIS,RLS)
Register 2
E6 The thread has loaded a randomly sam- Monitor Mode Control Register 2 (MMCR2) is a 64-bit
pled instruction from a higher level cache register that contains 9-bit control fields for controlling
than the tertiary cache. (RIS) the operation of PMC1 - PMC6 as shown below.
E8 The thread has filled a block in the primary
data cache with data that were accessed C1 C2 C3 C4 C5 C6 Res’d.
by a randomly sampled Load instruction
0 8 9 17 18 26 27 35 36 44 45 53 54 63
and obtained from a location other than
the secondary cache. (RIS, RLS) Figure 80. Monitor Mode Control Register 2
EA The threshold event counter has
exceeded the number of events corre-
When MMCR0PMCC = 0b11, fields C1 - C4 control the 0 PMCn is incremented (if permitted by
operation of PMC1 - PMC4, respectively and fields C5 other MMCR bits).
and C6 are ignored by the hardware; otherwise, fields 1 PMCn is not incremented if CTRLRUN=0.
C1 - C6 control the operation of PMC1 - PMC6, respec-
tively. The bit definitions of each Cn field are as fol- Programming Note
lows, where n = 1,...6. The operating system is expected to set
When MMCR0PMCC is set to 0b10 or 0b11, providing CTRLRUN to 0 when the thread is in a
problem state programs read/write access to MMCR2, “wait state”, i.e., when there is no process
only the FCnP0 bits can be accessed. All other bits are ready to run.
not changed when mtspr is executed in problem state,
and all other bits return 0s when mfspr is executed in 6 Freeze Counter n in Hypervisor State
problem state. (FCnH)
34:36 Threshold Event Counter Exponent This field specifies the event that causes the
(TECX) threshold event counter to start counting
occurrences of the event specified in the
This field species the exponent of the thresh- Threshold Event Counter Event (TECE) field.
old event counter value. See Section 9.4.3 for The events only occur if MMCRASE=1 (ran-
additional information. The maximum expo- dom sampling enabled) and one of the sam-
nent supported is at least 5. pling modes listed in parenthesis is in effect.
37 Reserved (The sampling mode that is currently in effect
is specified in MMCRASM.)
38:44 Threshold Event Counter Multiplier (TECM) 0000 Reserved.
This field species the multiplier of the thresh- 0001 The thread has randomly sampled an
old event counter value. See Section 9.4.3 for instruction while it is being decoded.
additional information. (RIS)
0010 The thread has dispatched a randomly
sampled instruction. (RIS)
0011 A randomly sampled instruction has
been sent to a facility (e.g. Branch,
Fixed Point, etc.) (RIS, RLS, RBS)
0100 The thread has completed a randomly
sampled instruction to the point at which
it has reported all exceptions it will
cause. (RIS, RLS, RBS)
0101 The thread has completed a randomly
sampled instruction. (RIS, RLS, RBS)
0110 The thread has failed to locate data for a 57:59 Eligibility for Random Sampling (ES)
randomly sampled Load instruction in When random sampling is enabled
the primary data cache. (RIS, RLS) (MMCRASE=1) and the SM field indicates ran-
0111 The thread has filled a block in the pri- dom instruction sampling (RIS), the encodings
mary data cache with data that were of this field specify the instructions that are eli-
accessed by a randomly sampled Load gible to be sampled as follows.
instruction. (RIS, RLS)
000 All instructions
The definition of the following values depends 001 All Load and Store instructions
on whether the access to MMCRA is in prob- 010 All probe no-op instructions
lem state or in privileged state. 011 Reserved
Problem state access (SPR 770) The definition of the following values depends
1000 - 1111 - Reserved on whether the access to MMCRA is in prob-
lem state or in privileged state.
Privileged access (SPR 770 or 786)
1000 - 1111 - Implementation-dependent Problem state access (SPR 770)
100 - 111 - Reserved
Privileged access (SPR 770 or 786)
52:55 Threshold End Event (TE)
100 - 111 - Implementation-dependent
This field specifies the event that causes the
threshold event counter to stop counting
occurrences of the event specified in the When random sampling is enabled
Threshold Event Counter Event (TECE) field. (MMCRASE=1) and the SM field indicates ran-
The events only occur if MMCRASE=1 (ran- dom Load/Store Facility sampling (RLS), the
dom sampling enabled) and one of the sam- encodings of this field specify the instructions
pling modes listed in parenthesis is in effect. that are eligible to be sampled as follows.
(The sampling mode that is currently in effect 000 Instructions for which the thread has
is specified in MMCRASM.) attempted to load data from the data
0000 Reserved cache but no block corresponding to the
0001 The thread has randomly sampled an real address existed.
instruction while it is being decoded. 001 Reserved
(RIS) 010 Reserved
0010 The thread has dispatched a randomly 011 Reserved
sampled instruction. (RIS)
The definition of the following values depends
0011 A randomly sampled instruction has
on whether the access to MMCRA is in prob-
been sent to a facility (e.g. Branch,
lem state or in privileged state.
Fixed Point, etc.) (RIS, RLS, RBS)
0100 The thread has completed a randomly Problem state access (SPR 770)
sampled instruction to the point at which 100 - 111 - Reserved
it has reported all exceptions that it will
Privileged access (SPR 770 or 786)
cause. (RIS, RLS, RBS)
100 - 111 - Implementation-dependent
0101 The thread has completed a randomly
sampled instruction. (RIS, RLS, RBS) When random sampling is enabled
0110 The thread has failed to locate data for a (MMCRASE=1) and the SM field indicates ran-
randomly sampled Load instruction in dom Branch Facility sampling (RBS), the
the primary data cache. (RIS, RLS) encodings of this field specify the instructions
0111 The thread has filled a block in the pri- that are eligible to be sampled as follows.
mary data cache with data that were 000 Instructions for which the thread has
accessed by a randomly sampled Load either mispredicted whether or not the
instruction. (RIS, RLS) branch would be taken, or if taken, the
target address of a Branch instruction.
The definition of the following values depends
001 Instructions for which the thread has
on whether the access to MMCRA is in prob-
mispredicted whether or not the branch
lem state or in privileged state.
of a Branch instruction would be taken
Problem state access (SPR 770) because the contents of the Condition
1000 - 1111 - Reserved Register differed from the predicted con-
tents.
Privileged access (SPR 770 or 786)
010 Instructions for which the thread has
1000 - 1111 - Implementation-dependent
mispredicted the target address of a
56 Reserved Branch instruction.
011 All Branch instructions for which the cuted, possibly out-of-order, at or around the time that
branch was taken. the Performance Monitor alert occurred.
The definition of the following values depends The instruction located at the effective address con-
on whether the access to MMCRA is in prob- tained in the SIAR is called the “sampled instruction”.
lem state or in privileged state.
Problem state access (SPR 770)
100 - 111 - Reserved The contents of SIAR may be altered by the hardware if
and only if MMCR0PMAE=1. Thus after the Perfor-
Privileged access (SPR 770 or 786) mance Monitor alert occurs, the contents of SIAR are
100 - 111 - Implementation-dependent not altered by the hardware until software sets
MMCR0PMAE to 1. After software sets MMCR0PMAE to
1, the contents of SIAR are undefined until the next
60 Reserved Performance Monitor alert occurs.
61:62 Random Sampling Mode (SM)
Programming Note
00 Random Instruction Sampling (RIS) -
Instructions that meet the criterion speci- When the Performance Monitor alert occurs,
fied in the ES field for random instruction SIERAMPPR SAMPHV indicates the value of
sampling are eligible to be sampled. MSRHV PR that was in effect when the sampled
01 Random Load/Store Facility Sampling instruction was being executed. (The contents of
(RLS) - Instructions that meet the criterion these SIER bits are visible only in privileged state.)
specified in the ES field for random Load/
Store Facility sampling are eligible for
sampling. 9.4.9 Sampled Data Address Reg-
10 Random Branch Facility Sampling ister
(RBS) - Instructions that meet the criterion
specified in the ES field for random The Sampled Data Address Register (SDAR) is a 64-bit
Branch Facility sampling are eligible for register.
sampling.
11 Reserved SDAR
63 Random Sampling Enable (SE) 0 63
010 The thread obtained the instruction in 000 The instruction did not require data
the secondary cache. address translation.
011 The thread obtained the instruction in 001 The thread translated the data virtual
the tertiary cache. address using the TLB.
100 The thread failed to obtain the instruc- 010 A PTEG required for data address trans-
tion in the primary, secondary, or tertiary lation for the instruction was obtained
cache from the secondary cache.
101 Reserved 011 A PTEG required for data address trans-
110 Reserved lation for the instruction was obtained
111 Reserved from the tertiary cache.
100 A PTEG required for data address trans-
52 Sampled Instruction Taken Branch
lation for the instruction was obtained
(SITAKBR)
from storage that did not reside in any
Set to 1 if the SITYPE field indicates a Branch cache.
instruction and the branch was taken; other- 101 A PTEG required for data address trans-
wise set to 0. lation for the instruction was obtained
from a cache on a different
53 Sampled Instruction Mispredicted Branch
multi-threaded processor that resides on
(SIMISPRED)
the same chip as the thread.
Set to 1 if the SITYPE field indicates a Branch 110 A PTEG required for data address trans-
instruction and the thread has mispredicted lation for the instruction was obtained
either whether or not the branch would be from a cache on a different chip from the
taken, or if taken, the target address; other- thread.
wise set to 0. 111 Reserved
54:55 Sampled Branch Instruction Misprediction
Information (SIMISPREDI)
60:62 Sampled Instruction Data Storage Access
If SIMISPRED=1, this field indicates how the Information (SIDSAI)
thread mispredicted the outcome of a Branch
This field contains information about data stor-
instruction; otherwise this field is set to 0s.
age accesses made by the sampled instruc-
00 The instruction was not a mispredicted
tion. The values and their meanings are as
Branch instruction.
follows.
01 The thread mispredicted whether or not
000 The instruction did not require data
the branch would be taken because the
address translation.
contents of the Condition Register dif-
001 The instruction was a Read for which
fered from the predicted contents.
the thread obtained the referenced data
10 The thread mispredicted the target
from the primary data cache.
address of the instruction.
010 The instruction was a Read for which
11 Reserved
the thread obtained the referenced data
56 Sampled Instruction Data ERAT Miss (SID- from the secondary cache.
ERAT) 011 The instruction was a Read for which
When the SITYPE field indicates a Load or the thread obtained the referenced data-
Store instruction, this field is set to 1 if the from the tertiary cache.
thread has failed to locate an ERAT entry 100 The instruction was a Read for which
during data address translation for the sam- the thread obtained the referenced data-
pled instruction and otherwise is set to 0. from storage that did not reside in any
cache.
When the SITYPE field does not indicate a 101 The instruction was a Read for which
Load or Store instruction, the contents of this the thread obtained the referenced data
field are undefined. from a cache on a different
57:59 Sampled Instruction Data Address Transla- multi-threaded processor that resides on
tion Information (SIDAXLATE) the same chip as the thread.
110 The instruction was a Read for which
This field contains information about data the thread obtained the referenced data
address translation for the sampled instruc- from a cache on a different chip from the
tion. If multiple data address translations were thread.
performed, the information pertains to the last 111 The instruction was a Store for which
translation. The values and their meanings the data were placed into a location
are as follows. other than the primary data cache.
63 Sampled Instruction Completed (SICMPL) Monitor facility are undefined and may change even
when MMCR0PMAE=0.
Set to 1 if the sampled instruction has com-
pleted; otherwise set to 0.
Programming Note
A potential combined use of the Trace and Perfor-
9.5 Branch History Rolling Buf- mance Monitor facilities is to trace the control flow
of a program and simultaneously count events for
fer that program.
The Branch History Rolling Buffer (BHRB) is described
in Chapter 8 of Book II but only at the level required by
application programmers. Additional aspects of the
BHRB are described here.
In order to enable problem state programs to use the
BHRB, MMCR0BHRBA must be set to 1 to enable exe-
cution of clrbhrb and mfbhrbe instructions in problem
state. Additionally, MMCR0PMCC must be set to 0b10 or
0b11 to allow problem state programs to read and write
the necessary Performance Monitor registers. (See
Section 9.4.4.)
If Performance Monitor event-based branching is
desired, MMCR0EBE must also be set to 1 to enable
Performance Monitor event-based branches.
Programming Note
Enabling Performance Monitor event-based
branching eliminates the need for the problem state
program to poll MMCR0PMAO in order to determine
when a Performance Monitor alert occurs.
Programming Note
mfbhrbe instructions return 0s when MMCR0P-
MAE=1 in order to prevent software from reading
the BHRB while it is being written by hardware.
BHRB Filtering
When the BHRB is written by hardware, only those
Branch instructions that meet the filtering criterion
specified in MMCRAIFM and for which the branch was
taken are included.
DPDES
0 63
Programming Note
The primary use of the DPDES is to provide the
means for the hypervisor to save a [sub-]proces-
sor's Directed Privileged Doorbell exception state
when the set of programs running on the [sub-]pro-
cessor is swapped out or moved from one
[sub-]processor to another. Since there is no such
need for a similar function for the hypervisor, there
is no similar register for the hypervisor. Privileged
programs are able to read the DPDES in order to
poll for Directed Privileged Doorbell exceptions
when the corresponding interrupt is disabled
(MSREE=1).
Programming Note
msgclr is typically issued only when MSREE=0. If
msgclr is executed when MSREE=1 when a
Directed Hypervisor Doorbell interrupt is about to
occur, the corresponding interrupt may or may not
occur.
Message Send Privileged X-form sors are not supported), then this instruction
behaves as a no-op
msgsndp RB
The actions taken on receipt of a message are defined
in Section 10.2.
31 /// /// RB 142 /
0 6 11 16 21 31 This instruction is privileged.
Special Registers Altered:
msgtype (RB)32:36 DPDES
payload (RB)37:63
t (RB)57:63
if msgtype = 5 and Programming Note
t maximum privileged thread number If msgsndp is used to notify the receiver that
on processor or sub-processor updates have been made to storage, a lwsync or
then sync should be placed between the stores and the
DPDES63-t 1 msgsndp. See Section 5.9.2.
send_msg(msgtype, payload, t)
msgsndp sends a message to other threads that are
on the same multi-threaded processor (if the processor
is not in sub-processor mode) or to other threads that
are on the same sub-processor (if the processor is in
sub-processor mode). The message type and destina-
tion thread(s) are specified in RB.
RB
Message Payload
/// TYPE /// TIRTA
G
0 32 37 39 57 63
Changing the contents of certain System Registers, the If a sequence of instructions contains context-altering
contents of SLB entries, or the contents of other system instructions and contains no instructions that are
resources that control the context in which a program affected by any of the context alterations, no software
executes can have the side effect of altering the con- synchronization is required within the sequence.
text in which data addresses and instruction addresses
are interpreted, and in which instructions are executed Programming Note
and data accesses are performed. For example, Sometimes advantage can be taken of the fact that
changing MSRIR from 0 to 1 has the side effect of certain events, such as interrupts, and certain
enabling translation of instruction addresses. These instructions that occur naturally in the program,
side effects need not occur in program order, and such as the rfid that returns from an interrupt han-
therefore may require explicit synchronization by soft- dler, provide the required synchronization.
ware. (Program order is defined in Book II.)
An instruction that alters the context in which data No software synchronization is required before or after
addresses or instruction addresses are interpreted, or a context-altering instruction that is also context syn-
in which instructions are executed or data accesses are chronizing or when altering the MSR in most cases
performed, is called a context-altering instruction. This (see the tables). No software synchronization is
chapter covers all the context-altering instructions. The required before most of the other alterations shown in
software synchronization required for them is shown in Table 7, because all instructions preceding the con-
Table 6 (for data access) and Table 7 (for instruction text-altering instruction are fetched and decoded before
fetch and execution). the context-altering instruction is executed (the hard-
ware must determine whether any of these preceding
The notation “CSI” in the tables means any context instructions are context synchronizing).
synchronizing instruction (e.g., sc, isync, or rfid). A
context synchronizing interrupt (i.e., any interrupt In situations such as context switch in which multiple
except non-recoverable System Reset or non-recover- SPRs are loaded in sequence, it is often the case that
able Machine Check) can be used instead of a context the composition of the implicit (implementation-specific,
synchronizing instruction. If it is, phrases like “the syn- nonarchitectural) synchronizations performed for each
chronizing instruction”, below, should be interpreted as individual mtspr will be excessive for the purpose.
meaning the instruction at which the interrupt occurs. If Software may identify such sequences by placing a
no software synchronization is required before (after) a mtgsr before the sequence. Hardware may respond to
context-altering instruction, “the synchronizing instruc- this identification by removing redundant synchroniza-
tion before (after) the context-altering instruction” tion so that the net synchronization effect approaches
should be interpreted as meaning the context-altering that of a single context synchronization at the end of
instruction itself. the sequence. A potential side effect of the optimiza-
tion is that the SPRs specified by the sequence may be
The synchronizing instruction before the context-alter- loaded in an order other than that specified by the pro-
ing instruction ensures that all instructions up to and gram with the result that if an exception interrupts the
including that synchronizing instruction are fetched and sequence, mtspr instructions past the point of interrup-
executed in the context that existed before the alter- tion may have loaded their SPRs. When control
ation. The synchronizing instruction after the con- returns to the interrupted sequence, any such mtspr
text-altering instruction ensures that all instructions instructions are re-executed. The programmer must
after that synchronizing instruction are fetched and ensure that this side effect will not affect the outcome of
executed in the context established by the alteration. the sequence. The degree of optimization is imple-
Instructions after the first synchronizing instruction, up mentation-specific. Transaction failure may compro-
to and including the second synchronizing instruction, mise optimization.
may be fetched or executed in either context.
Programming Note
Because the individual mtspr instructions in an
optimized sequence may be executed in any order,
a single sequence should not contain multiple
loads of the same SPR, and should not contain any
set of SPRs for which the relative order of execu-
tion of the mtspr instructions targeting SPRs in the
set matters.
No slbie (or slbia) is needed if the slbmte instruc- 14. LPIDR when using HPT translation and LPCRHR
tion replaces a valid SLB entry with a mapping of a must not be altered when MSRDR=1 or MSRIR=1;
different ESID (e.g., to satisfy an SLB miss). How- if they are, the results are undefined.
ever, the slbie is needed later if and when the
translation that was contained in the replaced SLB Programming Note
entry is to be invalidated. The prohibitions above are because of the dif-
ficulty of avoiding an implicit branch relative to
11. When the HRMOR or the VC field of the LPCR is the value of enabling software to avoid using
modified, software must invalidate all implementa- hypervisor real addressing mode for the oper-
tion-specific lookaside information used in address ation. (The tables used for translation are
translation that depends on the old contents of the determined by the partition ID and LPCRHR is
register or field (i.e., the contents immediately used as a shortcut. See Section 5.7.6 for
before the modification). The slbia instruction can details.)
be used to invalidate all such implementation-spe-
cific lookaside information. 15. This line applies to the following Performance
Monitor SPRs: PMC1-6, MMCR0, MMCR1,
12. A context synchronizing instruction or event that is MMCR2, and MMCRA.
executed or occurs when LPCRMER = 1 does not
necessarily ensure that the exception effects of 16. This line applies to all SPR numbers that access
LPCRMER are consistent with the contents of the BESCR (800-803, 806).
LPCRMER. See Section 2.2. 17. There are additional software synchronization
13. This line applies regardless of which SPR number requirements when an mtspr instruction modifies
(13 or 29) is used for the AMR. this SPR in a multi-threaded environment. See
Section 2.7.
18. As an alternative to a CSI, the execution of an
rfebb instruction or the occurrence of an
event-based branch is sufficient to provide the
necessary synchronization.
19. These instructions and events, with the exception
of nested tbegin. nested tend., TM instructions
that except or are described to be treated as
no-ops, Transaction Abort Conditional instructions
that do not abort, and events and rfebb instruc-
tions for which the event did not take place in
Transactional state, will change MSRTS. No soft-
ware synchronization is required.
This appendix contains opcode maps showing the pri- invoked for all members of the POWER family, and this
mary opcodes, extended opcodes, and expanded is likely to remain true in future models (it is guaranteed
opcodes. in the Power ISA). An instruction having primary
opcode 0 but not consisting entirely of binary 0s is
Table 8 describes the conventions used in the opcode
reserved except for the following extended opcode
maps.
(instruction bits 21:30).
The instruction consisting entirely of binary 0s causes 256 Service Processor “Attention”
the system illegal instruction error handler to be
Table 8: Opcode Maps Legend
po
mnemonic
book po
version privilege format primary opcode (decimal format)
xop
mnemonic
book xop
version privilege format extended or expanded opcode image (binary format)
0 instruction bit corresponding to an extended/expanded opcode bit having value of 0
1 instruction bit corresponding to an extended/expanded opcode bit having value of 1
/ reserved instruction bit, must have value of 0, otherwise invalid form
. instruction bit corresponding to an operand or control bit, can have a value of either 0 or 1
book
Book instruction defined
version
ISA version instruction introduced
privilege
P privileged instruction
H hypervisor-privileged instruction
format
instruction format
Illegal opcode
Opcode having no previous or current assignment, available for future use
08
subfic
I Defined opcode (primary, extended, or expanded)
P1 D Opcode assigned to a defined instruction
17
EXT17
Primary opcode having an extended opcode field
{extended} Opcode having extended opcode field used to identify multiple instructions
10110 000001
XPND04-1
Extended opcode having an expanded opcode field
{expanded} Opcode having expanded opcode field used to identify multiple instructions
Reserved opcode (primary, extended, or expanded)
{reserved} Opcode is not available for future use without careful consideration
1. Opcode corresponds to an instruction defined in a previous version of the architecture that has been subsequently
removed from the architecture. The opcode is treated as an illegal opcode.
2. Or, opcode is reserved for implementation-dependent use.
These opcodes will not be assigned a meaning in the Power ISA except after careful consideration of the effect of such
assignment on existing implementations.
Invalid form opcode
{invalid} Opcode corresponding to a defined instruction encoding with one or more reserved opcode bits having a value of 1
Table 10: EXT17: Extended Opcode Map for Primary Opcode 17 (opcode bits 30:31)
00 01 10 11
01 I 1/ I 1/
scv sc sc
v3.0 SC PPC SC {invalid}
00 01 10 11
Table 11: EXT30: Extended Opcode Map for Primary Opcode 30 (opcode bits 27:30)
000 001 010 011 100 101 110 111
000- I 000- I 001- I 001- I 010- I 010- I 011- I 011- I
0 rldicl[.] rldicl[.] rldicr[.] rldicr[.] rldic[.] rldic[.] rldimi[.] rldimi[.] 0
PPC MD PPC MD PPC MD PPC MD PPC MD PPC MD PPC MD PPC MD
1000 I 1001 I
1 rldcl[.] rldcr[.] 1
PPC MDS PPC MDS {reserved} {reserved} {reserved} {reserved}
Table 12: EXT57: Extended Opcode Map for Primary Opcode 57 (opcode bits 30:31)
00 01 10 11
00 I 10 I 11 I
lfdp lxsd lxssp
v2.05 DS {reserved} v3.0 DS v3.0 DS
00 01 10 11
Table 13: EXT58: Extended Opcode Map for Primary Opcode 58 (opcode bits 30:31)
00 01 10 11
00 I 01 I 10 I
ld ldu lwa
PPC DS PPC DS PPC DS {reserved}
00 01 10 11
Table 14: EXT61: Extended Opcode Map for Primary Opcode 61 (opcode bits 21:30)
000 001 010 011 100 101 110 111
-00 I 001 I -10 I -11 I -00 I 101 I -10 I -11 I
stfdp lxv stxsd stxssp stfdp stxv stxsd stxssp
v2.05 DS v3.0 DQ v3.0 DS v3.0 DS v2.05 DS v3.0 DQ v3.0 DS v3.0 DS
Table 15: EXT62: Extended Opcode Map for Primary Opcode 62 (opcode bits 21:30)
00 01 10 11
00 I 01 I 10 I
std stdu stq
PPC DS PPC DS v2.03 DS {reserved}
00 01 10 11
Table 16: EXT04: Extended Opcode Map for Primary Opcode 4 (opcode bits 0:5) (Sheet 1 of 8)
000000 000001 000010 000011 000100 000101 000110 000111
00000 000000 I 00000 000001 I 00000 000010 I 00000 000100 I 00000 000110 I 00000 000111 I
00000 vaddubm vmul10cuq vmaxub vrlb vcmpequb vcmpneb 00000
v2.03 VX v3.0 VX v2.03 VX v2.03 VX v2.03 VC v3.0 VC
00001 000000 I 00001 000001 I 00001 000010 I 00001 000100 I 00001 000110 I 00001 000111 I
00001 vadduhm vmul10ecuq vmaxuh vrlh vcmpequh vcmpneh 00001
v2.03 VX v3.0 VX v2.03 VX v2.03 VX v2.03 VC v3.0 VC
00010 000000 I 00010 000010 I 00010 000100 I 00010 000101 I 00010 000110 I 00010 000111 I
00010 vadduwm vmaxuw vrlw vrlwmi vcmpequw vcmpnew 00010
v2.03 VX v2.03 VX v2.03 VX v3.0 VX v2.03 VC v3.0 VC
00011 000000 I 00011 000010 I 00011 000100 I 00011 000101 I 00011 000110 I 00011 000111 I
00011 vaddudm vmaxud vrld vrldmi vcmpeqfp vcmpequd 00011
v2.07 VX v2.07 VX v2.07 VX v3.0 VX v2.03 VC v2.07 VC
00100 000000 I 00100 000010 I 00100 000100 I 00100 000111 I
00100 vadduqm vmaxsb vslb vcmpnezb 00100
v2.07 VX v2.03 VX v2.03 VX v3.0 VC
00101 000000 I 00101 000010 I 00101 000100 I 00101 000111 I
00101 vaddcuq vmaxsh vslh vcmpnezh 00101
v2.07 VX v2.03 VX v2.03 VX v3.0 VC
00110 000000 I 00110 000010 I 00110 000100 I 00110 000101 I 00110 000111 I
00110 vaddcuw vmaxsw vslw vrlwnm vcmpnezw 00110
v2.03 VX v2.03 VX v2.03 VX v3.0 VX v3.0 VC
00111 000010 I 00111 000100 I 00111 000101 I 00111 000110 I
00111 vmaxsd vsl vrldnm vcmpgefp 00111
v2.07 VX v2.03 VX v3.0 VX v2.03 VC
01000 000000 I 01000 000001 I 01000 000010 I 01000 000100 I 01000 000110 I
01000 vaddubs vmul10uq vminub vsrb vcmpgtub 01000
v2.03 VX v3.0 VX v2.03 VX v2.03 VX v2.03 VC
01001 000000 I 01001 000001 I 01001 000010 I 01001 000100 I 01001 000110 I
01001 vadduhs vmul10euq vminuh vsrh vcmpgtuh 01001
v2.03 VX v3.0 VX v2.03 VX v2.03 VX v2.03 VC
01010 000000 I 01010 000010 I 01010 000100 I 01010 000110 I
01010 vadduws vminuw vsrw vcmpgtuw 01010
v2.03 VX v2.03 VX v2.03 VX v2.03 VC
01011 000010 I 01011 000100 I 01011 000110 I 01011 000111 I
01011 vminud vsr vcmpgtfp vcmpgtud 01011
v2.07 VX v2.03 VX v2.03 VC v2.07 VC
01100 000000 I 01100 000010 I 01100 000100 I 01100 000110 I
01100 vaddsbs vminsb vsrab vcmpgtsb 01100
v2.03 VX v2.03 VX v2.03 VX v2.03 VC
01101 000000 I 01101 000001 I 01101 000010 I 01101 000100 I 01101 000110 I
01101 vaddshs bcdcpsgn. vminsh vsrah vcmpgtsh 01101
v2.03 VX v3.0 VX v2.03 VX v2.03 VX v2.03 VC
01110 000000 I 01110 000010 I 01110 000100 I 01110 000110 I
01110 vaddsws vminsw vsraw vcmpgtsw 01110
v2.03 VX v2.03 VX v2.03 VX v2.03 VC
01111 000010 I 01111 000100 I 01111 000110 I 01111 000111 I
01111 vminsd vsrad vcmpbfp vcmpgtsd 01111
v2.07 VX v2.07 VX v2.03 VC v2.07 VC
10000 000000 I 1-000 000001 I 10000 000010 I 10000 000011 I 10000 000100 I 10000 000110 I 10000 000111 I
10000 vsububm bcdadd. vavgub vabsdub vand vcmpequb. vcmpneb. 10000
v2.03 VX v2.07 VX v2.03 VX v3.0 VX v2.03 VX v2.03 VC v3.0 VC
10001 000000 I 1-001 000001 I 10001 000010 I 10001 000011 I 10001 000100 I 10001 000110 I 10001 000111 I
10001 vsubuhm bcdsub. vavguh vabsduh vandc vcmpequh. vcmpneh. 10001
v2.03 VX v2.07 VX v2.03 VX v3.0 VX v2.03 VX v2.03 VC v3.0 VC
10010 000000 I 1/010 000001 I 10010 000010 I 10010 000011 I 10010 000100 I 10010 000110 I 10010 000111 I
10010 vsubuwm bcdus. vavguw vabsduw vor vcmpequw. vcmpnew. 10010
v2.03 VX v3.0 VX v2.03 VX v3.0 VX v2.03 VX v2.03 VC v3.0 VC
10011 000000 I 1-011 000001 I 10011 000100 I 10011 000110 I 10011 000111 I
10011 vsubudm bcds. vxor vcmpeqfp. vcmpequd. 10011
v2.07 VX v3.0 VX v2.03 VX v2.03 VC v2.07 VC
10100 000000 I 1-100 000001 I 10100 000010 I 10100 000100 I 10100 000111 I
10100 vsubuqm bcdtrunc. vavgsb vnor vcmpnezb. 10100
v2.07 VX v3.0 VX v2.03 VX v2.03 VX v3.0 VC
10101 000000 I 1/101 000001 I 10101 000010 I 10101 000100 I 10101 000111 I
10101 vsubcuq bcdutrunc. vavgsh vorc vcmpnezh. 10101
v2.07 VX v3.0 VX v2.03 VX v2.07 VX v3.0 VC
10110 000000 I 10110 000001 10110 000010 I 10110 000100 I 10110 000111 I
10110 vsubcuw XPND04-1A vavgsw vnand vcmpnezw. 10110
v2.03 VX {expanded} v2.03 VX v2.07 VX v3.0 VC
1-111 000001 I 10111 000100 I 10111 000110 I
10111 bcdsr. vsld vcmpgefp. 10111
v3.0 VX v2.07 VX v2.03 VC
11000 000000 I 1-000 000001 I 11000 000010 11000 000100 I 11000 000110 I
11000 vsububs bcdadd. XPND04-2 mfvscr vcmpgtub. 11000
v2.03 VX v2.07 VX {expanded} v2.03 VX v2.03 VC
11001 000000 I 1-001 000001 I 11001 000100 I 11001 000110 I
11001 vsubuhs bcdsub. mtvscr vcmpgtuh. 11001
v2.03 VX v2.07 VX v2.03 VX v2.03 VC
11010 000000 I 1/010 000001 11010 000010 I 11010 000100 I 11010 000110 I
11010 vsubuws bcdus. vshasigmaw veqv vcmpgtuw. 11010
v2.03 VX {invalid} v2.07 VX v2.07 VX v2.03 VC
1-011 000001 I 11011 000010 I 11011 000100 I 11011 000110 I 11011 000111 I
11011 bcds. vshasigmad vsrd vcmpgtfp. vcmpgtud. 11011
v3.0 VX v2.07 VX v2.07 VX v2.03 VC v2.07 VC
11100 000000 I 1-100 000001 I 11100 000010 I 11100 000011 I 11100 000100 I 11100 000110 I
11100 vsubsbs bcdtrunc. vclzb vpopcntb vsrv vcmpgtsb. 11100
v2.03 VX v3.0 VX v2.07 VX v2.07 VX v3.0 VX v2.03 VC
11101 000000 I 1/101 000001 11101 000010 I 11101 000011 I 11101 000100 I 11101 000110 I
11101 vsubshs bcdutrunc. vclzh vpopcnth vslv vcmpgtsh. 11101
v2.03 VX {invalid} v2.07 VX v2.07 VX v3.0 VX v2.03 VC
11110 000000 I 11110 000001 11110 000010 I 11110 000011 I 11110 000110 I
11110 vsubsws XPND04-1B vclzw vpopcntw vcmpgtsw. 11110
v2.03 VX {expanded} v2.07 VX v2.07 VX v2.03 VC
1-111 000001 I 11111 000010 I 11111 000011 I 11111 000110 I 11111 000111 I
11111 bcdsr. vclzd vpopcntd vcmpbfp. vcmpgtsd. 11111
v3.0 VX v2.07 VX v2.07 VX v2.03 VC v2.07 VC
Table 16: EXT04: Extended Opcode Map for Primary Opcode 4 (opcode bits 0:5) (Sheet 2 of 8)
001000 001001 001010 001011 001100 001101 001110 001111
00000 001000 I 00000 001010 I 00000 001100 I 00000 001110 I
00000 vmuloub vaddfp vmrghb vpkuhum 00000
v2.03 VX v2.03 VX v2.03 VX v2.03 VX
00001 001000 I 00001 001010 I 00001 001100 I 00001 001110 I
00001 vmulouh vsubfp vmrghh vpkuwum 00001
v2.03 VX v2.03 VX v2.03 VX v2.03 VX
00010 001000 I 00010 001001 I 00010 001100 I 00010 001110 I
00010 vmulouw vmuluwm vmrghw vpkuhus 00010
v2.07 VX v2.07 VX v2.03 VX v2.03 VX
00011 001110 I
00011 vpkuwus 00011
v2.03 VX
00100 001000 I 00100 001010 I 00100 001100 I 00100 001110 I
00100 vmulosb vrefp vmrglb vpkshus 00100
v2.03 VX v2.03 VX v2.03 VX v2.03 VX
00101 001000 I 00101 001010 I 00101 001100 I 00101 001110 I
00101 vmulosh vrsqrtefp vmrglh vpkswus 00101
v2.03 VX v2.03 VX v2.03 VX v2.03 VX
00110 001000 I 00110 001010 I 00110 001100 I 00110 001110 I
00110 vmulosw vexptefp vmrglw vpkshss 00110
v2.07 VX v2.03 VX v2.03 VX v2.03 VX
00111 001010 I 00111 001110 I
00111 vlogefp vpkswss 00111
v2.03 VX v2.03 VX
01000 001000 I 01000 001010 I 01000 001100 I 01000 001101 I 01000 001110 I
01000 vmuleub vrfin vspltb vextractub vupkhsb 01000
v2.03 VX v2.03 VX v2.03 VX v3.0 VX v2.03 VX
01001 001000 I 01001 001010 I 01001 001100 I 01001 001101 I 01001 001110 I
01001 vmuleuh vrfiz vsplth vextractuh vupkhsh 01001
v2.03 VX v2.03 VX v2.03 VX v3.0 VX v2.03 VX
01010 001000 I 01010 001010 I 01010 001100 I 01010 001101 I 01010 001110 I
01010 vmuleuw vrfip vspltw vextractuw vupklsb 01010
v2.07 VX v2.03 VX v2.03 VX v3.0 VX v2.03 VX
01011 001010 I 01011 001101 I 01011 001110 I
01011 vrfim vextractd vupklsh 01011
v2.03 VX v3.0 VX v2.03 VX
01100 001000 I 01100 001010 I 01100 001100 I 01100 001101 I 01100 001110 I
01100 vmulesb vcfux vspltisb vinsertb vpkpx 01100
v2.03 VX v2.03 VX v2.03 VX v3.0 VX v2.03 VX
01101 001000 I 01101 001010 I 01101 001100 I 01101 001101 I 01101 001110 I
01101 vmulesh vcfsx vspltish vinserth vupkhpx 01101
v2.03 VX v2.03 VX v2.03 VX v3.0 VX v2.03 VX
01110 001000 I 01110 001010 I 01110 001100 I 01110 001101 I
01110 vmulesw vctuxs vspltisw vinsertw 01110
v2.07 VX v2.03 VX v2.03 VX v3.0 VX
01111 001010 I 01111 001101 I 01111 001110 I
01111 vctsxs vinsertd vupklpx 01111
v2.03 VX v3.0 VX v2.03 VX
10000 001000 I 10000 001010 I 10000 001100 I
10000 vpmsumb vmaxfp vslo 10000
v2.07 VX v2.03 VX v2.03 VX
10001 001000 I 10001 001010 I 10001 001100 I 10001 001110 I
10001 vpmsumh vminfp vsro vpkudum 10001
v2.07 VX v2.03 VX v2.03 VX v2.07 VX
10010 001000 I
10010 vpmsumw 10010
v2.07 VX
10011 001000 I 10011 001110 I
10011 vpmsumd vpkudus 10011
v2.07 VX v2.07 VX
10100 001000 I 10100 001001 I 10100 001100 I
10100 vcipher vcipherlast vgbbd 10100
v2.07 VX v2.07 VX v2.07 VX
10101 001000 I 10101 001001 I 10101 001100 I 10101 001110 I
10101 vncipher vncipherlast vbpermq vpksdus 10101
v2.07 VX v2.07 VX v2.07 VX v2.07 VX
10110 10110
11111 11111
Table 16: EXT04: Extended Opcode Map for Primary Opcode 4 (opcode bits 0:5) (Sheet 3 of 8)
010000 010001 010010 010011 010100 010101 010110 010111
00000 00000
00001 00001
00010 00010
00011 00011
00100 00100
00101 00101
00110 00110
00111 00111
01000 01000
01001 01001
01010 01010
01011 01011
01100 01100
01101 01101
01110 01110
01111 01111
10000 10000
10001 10001
10010 10010
10011 10011
10100 10100
10101 10101
10110 10110
10111 10111
11000 11000
11001 11001
11010 11010
11011 11011
11100 11100
11101 11101
11110 11110
11111 11111
Table 16: EXT04: Extended Opcode Map for Primary Opcode 4 (opcode bits 0:5) (Sheet 4 of 8)
011000 011001 011010 011011 011100 011101 011110 011111
00000 00000
00001 00001
00010 00010
00011 00011
00100 00100
00101 00101
00110 00110
00111 00111
01000 01000
01001 01001
01010 01010
01011 01011
01100 01100
01101 01101
01110 01110
01111 01111
10000 10000
10001 10001
10010 10010
10011 10011
10100 10100
10101 10101
10110 10110
10111 10111
11000 11000
11001 11001
11010 11010
11011 11011
11100 11100
11101 11101
11110 11110
11111 11111
Table 16: EXT04: Extended Opcode Map for Primary Opcode 4 (opcode bits 0:5) (Sheet 5 of 8)
100000 100001 100010 100011 100100 100101 100110 100111
----- 100000 I ----- 100001 I ----- 100010 I ----- 100011 I ----- 100100 I ----- 100101 I ----- 100110 I ----- 100111 I
00000 vmhaddshs vmhraddshs vmladduhm vmsumudm vmsumubm vmsummbm vmsumuhm vmsumuhs 00000
v2.03 VA v2.03 VA v2.03 VA v3.0B VA v2.03 VA v2.03 VA v2.03 VA v2.03 VA
00001 00001
00010 00010
00011 00011
00100 00100
00101 00101
00110 00110
00111 00111
01000 01000
01001 01001
01010 01010
01011 01011
01100 01100
01101 01101
01110 01110
01111 01111
10000 10000
10001 10001
10010 10010
10011 10011
10100 10100
10101 10101
10110 10110
10111 10111
11000 11000
11001 11001
11010 11010
11011 11011
11100 11100
11101 11101
11110 11110
11111 11111
Table 16: EXT04: Extended Opcode Map for Primary Opcode 4 (opcode bits 0:5) (Sheet 6 of 8)
101000 101001 101010 101011 101100 101101 101110 101111
----- 101000 I ----- 101001 I ----- 101010 I ----- 101011 I /---- 101100 I ----- 101101 I ----- 101110 I ----- 101111 I
00000 vmsumshm vmsumshs vsel vperm vsldoi vpermxor vmaddfp vnmsubfp 00000
v2.03 VA v2.03 VA v2.03 VA v2.03 VA v2.03 VA v2.07 VA v2.03 VA v2.03 VA
00001 00001
00010 00010
00011 00011
00100 00100
00101 00101
00110 00110
00111 00111
01000 01000
01001 01001
01010 01010
01011 01011
01100 01100
01101 01101
01110 01110
01111 01111
/---- 101100
10000 vsldoi 10000
{invalid}
10001 10001
10010 10010
10011 10011
10100 10100
10101 10101
10110 10110
10111 10111
11000 11000
11001 11001
11010 11010
11011 11011
11100 11100
11101 11101
11110 11110
11111 11111
Table 16: EXT04: Extended Opcode Map for Primary Opcode 4 (opcode bits 0:5) (Sheet 7 of 8)
110000 110001 110010 110011 110100 110101 110110 110111
----- 110000 I ----- 110001 I ----- 110011 I
00000 maddhd maddhdu maddld 00000
v3.0 VA v3.0 VA v3.0 VA
00001 00001
00010 00010
00011 00011
00100 00100
00101 00101
00110 00110
00111 00111
01000 01000
01001 01001
01010 01010
01011 01011
01100 01100
01101 01101
01110 01110
01111 01111
10000 10000
10001 10001
10010 10010
10011 10011
10100 10100
10101 10101
10110 10110
10111 10111
11000 11000
11001 11001
11010 11010
11011 11011
11100 11100
11101 11101
11110 11110
11111 11111
Table 16: EXT04: Extended Opcode Map for Primary Opcode 4 (opcode bits 0:5) (Sheet 8 of 8)
111000 111001 111010 111011 111100 111101 111110 111111
----- 111011 I ----- 111100 I ----- 111101 I ----- 111110 I ----- 111111 I
00000 vpermr vaddeuqm vaddecuq vsubeuqm vsubecuq 00000
v3.0 VA v2.07 VA v2.07 VA v2.07 VA v2.07 VA
00001 00001
00010 00010
00011 00011
00100 00100
00101 00101
00110 00110
00111 00111
01000 01000
01001 01001
01010 01010
01011 01011
01100 01100
01101 01101
01110 01110
01111 01111
10000 10000
10001 10001
10010 10010
10011 10011
10100 10100
10101 10101
10110 10110
10111 10111
11000 11000
11001 11001
11010 11010
11011 11011
11100 11100
11101 11101
11110 11110
11111 11111
Table 17: XPND04-1A: Extended Opcode Map for PO=4 XO=0b10110_000001 (opcode bits 11:15)
000 001 010 011 100 101 110 111
00 000 I 00 010 I 00 100 I 00 101 I 00 110 I 00 111 I
00 bcdctsq. bcdcfsq. bcdctz. bcdctn. bcdcfz. bcdcfn. 00
v3.0 VX v3.0 VX v3.0 VX v3.0 VX v3.0 VX v3.0 VX
01 01
10 10
11111 I
11 bcdsetsgn. 11
v3.0 VX
Table 18: XPND04-1B: Extended Opcode Map for PO=4 XO=0b11110_000001 (opcode bits 11:15)
000 001 010 011 100 101 110 111
00 000 I 00 010 I 00 100 I 00101 1/110 000001 I 00 110 I 00 111 I
00 bcdctsq. bcdcfsq. bcdctz. bcdctn. bcdcfz. bcdcfn. 00
{invalid} VX v3.0 VX v3.0 VX {invalid} VX v3.0 VX v3.0 VX
01 01
10 10
11 111 I
11 bcdsetsgn. 11
v3.0 VX
Table 19: XPND04-2: Extended Opcode Map for PO=4 XO=0b11000 000010 (opcode bits 11:15)
000 001 010 011 100 101 110 111
00 000 I 00 001 I 00 110 I 00 111 I
00 vclzlsbb vctzlsbb vnegw vnegd 00
v3.0 VX v3.0 VX v3.0 VX v3.0 VX
01 000 I 01 001 I 01 010 I
01 vprtybw vprtybd vprtybq 01
v3.0 VX v3.0 VX v3.0 VX
10 000 I 10 001 I
10 vextsb2w vextsh2w 10
v3.0 VX v3.0 VX
11 000 I 11 001 I 11 010 I 11 100 I 11 101 I 11 110 I 11 111 I
11 vextsb2d vextsh2d vextsw2d vctzb vctzh vctzw vctzd 11
v3.0 VX v3.0 VX v3.0 VX v3.0 VX v3.0 VX v3.0 VX v3.0 VX
Table 20: EXT19: Extended Opcode Map for Primary Opcode 19 (opcode bits 21:30) (Sheet 1 of 4)
00000 00001 00010 00011 00100 00101 00110 00111
00000 00000 I ----- 00010 I
00000 mcrf addpcis 00000
P1 XL v3.0 DX
00001 00001 I
00001 crnor 00001
P1 XL
00010 00010
00011 00011
00100 00001 I
00100 crandc 00100
P1 XL
00101 00101
00110 00001 I
00110 crxor 00110
P1 XL
00111 00001 I
00111 crnand 00111
P1 XL
01000 00001 I
01000 crand 01000
P1 XL
01001 00001 I
01001 creqv 01001
P1 XL
01010 01010
01011 01011
01100 01100
01101 00001 I
01101 crorc 01101
P1 XL
01110 00001 I
01110 cror 01110
P1 XL
01111 01111
10000 10000
10001 10001
10010 10010
10011 10011
10100 10100
10101 10101
10110 10110
10111 10111
11000 11000
11001 11001
11010 11010
11011 11011
11100 11100
11101 11101
11110 11110
11111 11111
Table 20: EXT19: Extended Opcode Map for Primary Opcode 19 (opcode bits 21:30) (Sheet 2 of 4)
01000 01001 01010 01011 01100 01101 01110 01111
00000 00000
00001 00001
00010 00010
00011 00011
00100 00100
00101 00101
00110 00110
00111 00111
01000 01000
01001 01001
01010 01010
01011 01011
01100 01100
01101 01101
01110 01110
01111 01111
10000 10000
10001 10001
10010 10010
10011 10011
10100 10100
10101 10101
10110 10110
10111 10111
11000 11000
11001 11001
11010 11010
11011 11011
11100 11100
11101 11101
11110 11110
11111 11111
Table 20: EXT19: Extended Opcode Map for Primary Opcode 19 (opcode bits 21:30) (Sheet 3 of 4)
10000 10001 10010 10011 10100 10101 10110 10111
00000 10000 I 00000 10010 III
00000 bclr[l] rfid 00000
P1 XL PPC P XL
00001 00001
{reserved} {reserved}
00010 10010 III
00010 rfscv 00010
v3.0 P XL
00011 00011
00101 00101
00110 00110
00111 00111
01001 01001
01010 01010
01100 01100
{reserved}
01101 01101
{reserved}
01110 01110
{reserved}
01111 01111
{reserved}
10000 10000 I
10000 bcctr[l] 10000
P1 XL
10001 10000 I
10001 bctar[l] 10001
v2.07 XL
10010 10010
10011 10011
10100 10100
10101 10101
10110 10110
10111 10111
11000 11000
11001 11001
11010 11010
11011 11011
11100 11100
11101 11101
11110 11110
11111 11111
Table 20: EXT19: Extended Opcode Map for Primary Opcode 19 (opcode bits 21:30) (Sheet 4 of 4)
11000 11001 11010 11011 11100 11101 11110 11111
00000 00000
00001 00001
00010 00010
00011 00011
00100 00100
00101 00101
00110 00110
00111 00111
01000 01000
01001 01001
01010 01010
01011 01011
01100 01100
01101 01101
01110 01110
01111 01111
10000 10000
10001 10001
10010 10010
10011 10011
10100 10100
10101 10101
10110 10110
10111 10111
11000 11000
11001 11001
11010 11010
11011 11011
11100 11100
11101 11101
11110 11110
11111 11111
Table 21: EXT31: Extended Opcode Map for Primary Opcode 31 (opcode bits 21:30) (Sheet 1 of 4)
00000 00001 00010 00011 00100 00101 00110 00111
00000 00000 I 00000 00100 I 00000 00110 I 00000 00111 I
00000 cmp tw lvsl lvebx 00000
P1 X P1 X v2.03 X v2.03 X
00001 00000 I 00001 00110 I 00001 00111 I
00001 cmpl lvsr lvehx 00001
P1 X {reserved} v2.03 X v2.03 X
00010 00100 I 00010 00111 I
00010 td lvewx 00010
{reserved} PPC X v2.03 X
00011 00111 I
00011 lvx 00011
{reserved} v2.03 X
00100 00000 I 00100 00111 I
00100 setb stvebx 00100
v3.0 VX {reserved} v2.03 X
00101 00111 I
00101 stvehx 00101
{reserved} {reserved} v2.03 X
00110 00000 I 00110 00111 I
00110 cmprb stvewx 00110
v3.0 X v2.03 X
00111 00000 I 00111 00111 I
00111 cmpeqb stvx 00111
v3.0 X {reserved} v2.03 X
01000 01000
{reserved}
01001 01001
{reserved}
01010 01010
{reserved}
01011 00111 I
01011 lvxl 01011
{reserved} v2.03 X
01100 01100
01101 01101
{reserved}
01110 01110
{reserved} {reserved}
01111 00111 I
01111 stvxl 01111
{reserved} {reserved} v2.03 X
10000 10000
{reserved} {reserved}
10001 10001
{reserved} {reserved} {reserved}
10010 00000 I 10010 00110 II
10010 mcrxrx lwat 10010
v3.0 X v3.0 X
10011 00110 II
10011 ldat 10011
{reserved} v3.0 X
10100 10100
{reserved}
10101 10101
{reserved} {reserved}
10110 00110 II
10110 stwat 10110
v3.0 X
10111 00110 II
10111 stdat 10111
{reserved} v3.0 X
11000 00110 II
11000 copy 11000
v3.0 X IV{reserved}
11001 11001
{reserved} {reserved}
11010 00110 II
11010 cpabort 11010
v3.0 X
11011 11011
{reserved}
11100 00110 II
11100 paste[.] 11100
v3.0 X {reserved}
11101 11101
{reserved} {reserved}
11110 11110
{reserved}
11111 11111
{reserved} {reserved}
Table 21: EXT31: Extended Opcode Map for Primary Opcode 31 (opcode bits 21:30) (Sheet 2 of 4)
01000 01001 01010 01011 01100 01101 01110 01111
00000 01000 I /0000 01001 I 00000 01010 I /0000 01011 I 00000 01100 I ----- 01111 I
00000 subfc[.] mulhdu[.] addc[.] mulhwu[.] lxsiwzx isel 00000
P1 XO PPC XO P1 XO PPC XO v2.07 X v2.03 A
00001 01000 I
00001 subf[.] 00001
PPC XO
/0010 01001 I /0010 01010 I /0010 01011 I 00010 01100 I
00010 mulhd[.] addg6s mulhw[.] lxsiwax 00010
PPC XO v2.06 XO PPC XO v2.07 X
00011 01000 I
00011 neg[.] 00011
P1 XO {reserved}
00100 01000 I 00100 01010 I 00100 01100 I 00100 01110 III
00100 subfe[.] adde[.] stxsiwx msgsndp 00100
P1 XO P1 XO v2.07 X v2.07 P X
--101 01010 I 00101 01110 III
00101 addex msgclrp 00101
v3.0B X v2.07 P X
00110 01000 I 00110 01010 I 00110 01110 III
00110 subfze[.] addze[.] msgsnd 00110
P1 XO P1 XO v2.07 HV X
00111 01000 I 00111 01001 I 00111 01010 I 00111 01011 I 00111 01110 III
00111 subfme[.] mulld[.] addme[.] mullw[.] msgclr 00111
P1 XO PPC XO P1 XO P1 XO v2.07 HV X
01000 01001 I 01000 01010 I 01000 01011 I 01000 01100 I 01000 01101 I
01000 modud add[.] moduw lxvx lxvl 01000
{reserved} v3.0 X P1 XO v3.0 X v3.0 X v3.0 X
01001 01101 I 01001 01110 I
01001 lxvll mfbhrbe 01001
v3.0 X v2.07 X
01010 01100 I
01010 lxvdsx 01010
{reserved} v2.06 X
01011 01100 I
01011 lxvwsx 01011
{reserved} {reserved} v3.0 X
01100 01001 I 01100 01011 I 01100 01100 I 01100 01101 I
01100 divdeu[.] divweu[.] stxvx stxvl 01100
v2.06 XO v2.06 XO v3.0 X v3.0 X
01101 01001 I --101 01010 I 01101 01011 I 01101 01101 I 01101 01110 I
01101 divde[.] addex divwe[.] stxvll clrbhrb 01101
v2.06 XO v3.0B X v2.06 XO v3.0 X v2.07 X
01110 01001 I 01110 01011 I
01110 divdu[.] divwu[.] 01110
PPC XO PPC XO
01111 01001 I 01111 01011 I
01111 divd[.] divw[.] 01111
{reserved} PPC XO PPC XO
10000 01000 I /0000 01001 10000 01010 I /0000 01011 10000 01100 I
10000 subfco[.] mulhdu[.] addco[.] mulhwu[.] lxsspx 10000
P1 XO {invalid} P1 XO {invalid} v2.07 X
10001 01000 I
10001 subfo[.] 10001
PPC XO
/0010 01001 /0010 01010 /0010 01011 10010 01100 I
10010 mulhd[.] addg6s mulhw[.] lxsdx 10010
{invalid} {invalid} {invalid} v2.06 X
10011 01000 I
10011 nego[.] 10011
P1 XO {reserved}
10100 01000 I 10100 01010 I 10100 01100 I 10100 01110 II
10100 subfeo[.] addeo[.] stxsspx tbegin. 10100
P1 XO P1 XO v2.07 X v2.07 X
--101 01010 I 10101 01110 II
10101 addex tend. 10101
v3.0B X v2.07 X
10110 01000 I 10110 01010 I 10110 01100 I 10110 01110 II
10110 subfzeo[.] addzeo[.] stxsdx tcheck 10110
P1 XO P1 XO v2.06 X v2.07 X
10111 01000 I 10111 01001 I 10111 01010 I 10111 01011 I 10111 01110 II
10111 subfmeo[.] mulldo[.] addmeo[.] mullwo[.] tsr. 10111
P1 XO PPC XO P1 XO P1 XO v2.07 X
11000 01001 I 11000 01010 I 11000 01011 I 11000 01100 I 11000 01101 I 11000 01110 II
11000 modsd addo[.] modsw lxvw4x lxsibzx tabortwc. 11000
{reserved} v3.0 X P1 XO v3.0 X v2.06 X v3.0 X v2.07 X
11001 01100 I 11001 01101 I 11001 01110 II
11001 lxvh8x lxsihzx tabortdc. 11001
v3.0 X v3.0 X v2.07 X
11010 01100 I 11010 01110 II
11010 lxvd2x tabortwci. 11010
{reserved} v2.06 X v2.07 X
11011 01100 I 11011 01110 II
11011 lxvb16x tabortdci. 11011
{reserved} {reserved} v3.0 X v2.07 X
11100 01001 I 11100 01011 I 11100 01100 I 11100 01101 I 11100 01110 II
11100 divdeuo[.] divweuo[.] stxvw4x stxsibx tabort. 11100
v2.06 XO v2.06 XO v2.06 X v3.0 X v2.07 X
11101 01001 I --101 01010 I 11101 01011 I 11101 01100 I 11101 01101 I 11101 01110 II
11101 divdeo[.] addex divweo[.] stxvh8x stxsihx treclaim. 11101
v2.06 XO v3.0B X v2.06 XO v3.0 X v3.0 X v2.07 X
11110 01001 I 11110 01011 I 11110 01100 I
11110 divduo[.] divwuo[.] stxvd2x 11110
PPC XO PPC XO v2.06 X
11111 01001 I 11111 01011 I 11111 01100 I 11111 01110 II
11111 divdo[.] divwo[.] stxvb16x trechkpt. 11111
{reserved} PPC XO PPC XO v3.0 X v2.07 X
Table 21: EXT31: Extended Opcode Map for Primary Opcode 31 (opcode bits 21:30) (Sheet 3 of 4)
10000 10001 10010 10011 10100 10101 10110 10111
00000 10011 I 00000 10100 II 00000 10101 I 00000 10110 II 00000 10111 I
00000 mfcr/mfocrf lwarx ldx icbt lwzx 00000
P1/v2.01 XFX PPC X PPC X v2.07 X P1 X
00001 10011 I 00001 10100 II 00001 10101 I 00001 10110 II 00001 10111 I
00001 mfvsrd lbarx ldux dcbst lwzux 00001
v2.07 XX1 v2.06 X PPC X PPC X P1 X
00010 10011 III 00010 10100 II 00010 10110 II 00010 10111 I
00010 mfmsr ldarx dcbf lbzx 00010
{reserved} P1 P X PPC X PPC X P1 X
00011 10011 I 00011 10100 II 00011 10111 I
00011 mfvsrwz lharx lbzux 00011
{reserved} v2.07 XX1 v2.06 X {reserved} P1 X
00100 10000 I 00100 10010 III 00100 10101 I 00100 10110 II 00100 10111 I
00100 mtcrf/mtocrf mtmsr stdx stwcx. stwx 00100
P1/v2.01 XFX P1 P X {reserved} PPC X PPC X P1 X
00101 10010 III 00101 10011 I 00101 10101 I 00101 10110 I 00101 10111 I
00101 mtmsrd mtvsrd stdux stqcx. stwux 00101
PPC P X v2.07 XX1 PPC X v2.07 X P1 X
00110 10011 I 00110 10110 II 00110 10111 I
00110 mtvsrwa stdcx. stbx 00110
{reserved} v2.07 XX1 PPC X P1 X
00111 10011 I 00111 10110 II 00111 10111 I
00111 mtvsrwz dcbtst stbux 00111
{reserved} X v2.07 XX1 PPC X P1 X
01000 10010 III 01000 10100 I 01000 10110 II 01000 10111 I
01000 tlbiel lqarx dcbt lhzx 01000
v2.03 P X v2.07 X {reserved} PPC X P1 X
01001 10010 III 01001 10011 I 01001 10111 I
01001 tlbie mfvsrld lhzux 01001
P1 HV X v3.0 XX1 {reserved} {reserved} P1 X
01010 10010 III 01010 10011 X 01010 10101 I 01010 10111 I
01010 slbsync mfspr lwax lhax 01010
v3.0 P X P1 O X PPC X {reserved} P1 X
01011 10011 II 01011 10101 I 01011 10111 I
01011 mftb lwaux lhaux 01011
{reserved} PPC X PPC X {reserved} P1 X
01100 10010 III 01100 10011 I 01100 10111 I
01100 slbmte mtvsrws sthx 01100
v2.00 P X v3.0 XX1 P1 X
01101 10010 III 01101 10011 I 01101 10111 I
01101 slbie mtvsrdd sthux 01101
PPC P X v3.0 XX1 {reserved} P1 X
01110 10010 III 01110 10011 X
01110 slbieg mtspr 01110
v3.0 P X P1 O X {reserved}
01111 10010 III
01111 slbia 01111
PPC P X {reserved} {reserved}
10000 10010 I 10000 10100 I 10000 10101 I 10000 10110 I 10000 10111 I
10000 nop ldbrx lswx lwbrx lfsx 10000
v2.05 X {reserved} v2.06 X P1 X P1 X P1 X
10001 10010 I 10001 10110 III 10001 10111 I
10001 nop tlbsync lfsux 10001
v2.05 X {reserved} PPC HV/P X P1 X
10010 10010 I 10010 10101 I 10010 10110 II 10010 10111 I
10010 nop lswi sync lfdx 10010
v2.05 X {reserved} P1 X P1 X P1 X
10011 10010 I 10011 10111 I
10011 nop lfdux 10011
v2.05 X {reserved} {reserved} {reserved} P1 X
10100 10010 I 10100 10100 I 10100 10101 I 10100 10110 I 10100 10111 I
10100 nop stdbrx stswx stwbrx stfsx 10100
v2.05 X {reserved} v2.06 X P1 X P1 X P1 X
10101 10010 I 10101 10110 II 10101 10111 I
10101 nop stbcx. stfsux 10101
v2.05 X {reserved} v2.06 X P1 X
10110 10010 I 10110 10101 I 10110 10110 II 10110 10111 I
10110 nop stswi sthcx. stfdx 10110
v2.05 X P1 X v2.06 X P1 X
10111 10010 I 10111 10011 I 10111 10111 I
10111 nop darn stfdux 10111
v2.05 X v3.0 X {reserved} {reserved} P1 X
11000 10101 III 11000 10110 I 11000 10111 I
11000 lwzcix lhbrx lfdpx 11000
v2.05 HV X P1 X v2.05 X
11001 10101 III
11001 lhzcix 11001
{reserved} v2.05 HV X {reserved} X {reserved}
11010 10010 III 11010 10011 III 11010 10101 III 11010 10110 II 11010 10111 I
11010 slbiag slbmfev lbzcix eieio lfiwax 11010
v3.0B P X v2.00 P X v2.05 HV X PPC X v2.05 X
11011 10101 III 11011 10110 III 11011 10111 I
11011 ldcix msgsync lfiwzx 11011
v2.05 HV X v3.0 HV X v2.06 X
11100 10011 III 11100 10101 III 11100 10110 I 11100 10111 I
11100 slbmfee stwcix sthbrx stfdpx 11100
{reserved} v2.00 P X v2.05 HV X P1 X v2.05 X
11101 10101 III
11101 sthcix 11101
{reserved} v2.05 HV X {reserved}
11110 10011 III 11110 10101 III 11110 10110 II 11110 10111 I
11110 slbfee. stbcix icbi stfiwx 11110
{reserved} v2.05 P X v2.05 HV X PPC X PPC X
11111 10101 III 11111 10110 II
11111 stdcix dcbz 11111
{reserved} v2.05 HV X P1 X
Table 21: EXT31: Extended Opcode Map for Primary Opcode 31 (opcode bits 21:30) (Sheet 4 of 4)
11000 11001 11010 11011 11100 11101 11110 11111
00000 11000 I 00000 11010 I 00000 11011 I 00000 11100 I 00000 11110 II
00000 slw[.] cntlzw[.] sld[.] and[.] wait 00000
P1 X P1 X PPC X P1 X {reserved} v3.0 X
00001 11010 I 00001 11100 I
00001 cntlzd[.] andc[.] 00001
PPC X P1 X {reserved}
00010 00010
{reserved}
00011 11010 I 00011 11100 I
00011 popcntb nor[.] 00011
v2.02 X P1 X
00100 11010 I
00100 prtyw 00100
{reserved} P1{reserved} v2.05 X
00101 11010 I
00101 prtyd 00101
{reserved} v2.05 X
00110 00110
{reserved} P1{reserved}
00111 11100 I
00111 bpermd 00111
{reserved} v2.06 X
01000 11010 I 01000 11100 I
01000 cdtbcd eqv[.] 01000
v2.06 X P1 X
01001 11010 I 01001 11100 I
01001 cbcdtd xor[.] 01001
v2.06 X P1 X
01010 01010
01011 11010 I
01011 popcntw 01011
v2.06 X
01100 11100 I
01100 orc[.] 01100
P1 X
01101 11100 I
01101 or[.] 01101
P1 X
01110 11100 I
01110 nand[.] 01110
P1 X
01111 11010 I 01111 11100 I
01111 popcntd cmpb 01111
v2.06 X v2.05 X
10000 11000 I 10000 11010 I 10000 11011 I
10000 srw[.] cnttzw[.] srd[.] 10000
P1 X {reserved} v3.0 X PPC X {reserved}
10001 11010 I
10001 cnttzd[.] 10001
v3.0 X
10010 10010
10011 10011
10100 10100
{reserved} {reserved}
10101 10101
{reserved}
10110 10110
{reserved} {reserved}
10111 10111
{reserved}
11000 11000 I 11000 11010 I
11000 sraw[.] srad[.] 11000
P1 X PPC X
11001 11000 I 11001 1101- I
11001 srawi[.] sradi[.] 11001
P1 X PPC XS
11010 11010
11011 1101- I
11011 extswsli[.] 11011
v3.0 XS
11100 11010 I
11100 extsh[.] 11100
{reserved} {reserved} P1 X
11101 11010 I
11101 extsb[.] 11101
{reserved} PPC X
11110 11010 I
11110 extsw[.] 11110
PPC X
11111 11111
Table 22: EXT59: Extended Opcode Map for Primary Opcode 59 (opcode bits 21:30) (Sheet 1 of 4)
00000 00001 00010 00011 00100 00101 00110 00111
00000 00010 I --000 00011 I
00000 dadd[.] dqua[.] 00000
v2.05 X v2.05 Z23
00001 00010 I --001 00011 I
00001 dmul[.] drrnd[.] 00001
v2.05 X v2.05 Z23
-0010 00010 I --010 00011 I
00010 dscli[.] dquai[.] 00010
v2.05 Z22 v2.05 Z23
-0011 00010 I --011 00011 I
00011 dscri[.] drintx[.] 00011
v2.05 Z22 v2.05 Z23
00100 00010 I
00100 dcmpo 00100
v2.05 X
00101 00010 I
00101 dtstex 00101
v2.05 X
-0110 00010 I
00110 dtstdc 00110
v2.05 Z22
-0111 00010 I --111 00011 I
00111 dtstdg drintn[.] 00111
v2.05 Z22 v2.05 Z23
01000 00010 I --000 00011 I
01000 dctdp[.] dqua[.] 01000
v2.05 X v2.05 Z23
01001 00010 I --001 00011 I
01001 dctfix[.] drrnd[.] 01001
v2.05 X v2.05 Z23
01010 00010 I --010 00011 I
01010 ddedpd[.] dquai[.] 01010
v2.05 X v2.05 Z23
01011 00010 I --011 00011 I
01011 dxex[.] drintx[.] 01011
v2.05 X v2.05 Z23
01100 01100
01101 01101
01110 01110
--111 00011 I
01111 drintn[.] 01111
v2.05 Z23
10000 00010 I --000 00011 I
10000 dsub[.] dqua[.] 10000
v2.05 X v2.05 Z23
10001 00010 I --001 00011 I
10001 ddiv[.] drrnd[.] 10001
v2.05 X v2.05 Z23
-0010 00010 I --010 00011 I
10010 dscli[.] dquai[.] 10010
v2.05 Z22 v2.05 Z23
-0011 00010 I --011 00011 I
10011 dscri[.] drintx[.] 10011
v2.05 Z22 v2.05 Z23
10100 00010 I
10100 dcmpu 10100
v2.05 X
10101 00010 I 10101 00011 I
10101 dtstsf dtstsfi 10101
v2.05 X v3.0 X
-0110 00010 I
10110 dtstdc 10110
v2.05 Z22
-0111 00010 I --111 00011 I
10111 dtstdg drintn[.] 10111
v2.05 Z22 v2.05 Z23
11000 00010 I --000 00011 I
11000 drsp[.] dqua[.] 11000
v2.05 X v2.05 Z23
11001 00010 I --001 00011 I
11001 dcffix[.] drrnd[.] 11001
v2.06 X v2.05 Z23
11010 00010 I --010 00011 I
11010 denbcd[.] dquai[.] 11010
v2.05 X v2.05 Z23
11011 00010 I --011 00011 I
11011 diex[.] drintx[.] 11011
v2.05 X v2.05 Z23
11100 11100
11101 11101
11110 11110
--111 00011 I
11111 drintn[.] 11111
v2.05 Z23
Table 22: EXT59: Extended Opcode Map for Primary Opcode 59 (opcode bits 21:30) (Sheet 2 of 4)
01000 01001 01010 01011 01100 01101 01110 01111
00000 00000
00001 00001
00010 00010
00011 00011
00100 00100
00101 00101
00110 00110
00111 00111
01000 01000
01001 01001
01010 01010
01011 01011
01100 01100
01101 01101
01110 01110
01111 01111
10000 10000
10001 10001
10010 10010
10011 10011
10100 10100
10101 10101
10110 10110
10111 10111
11000 11000
11001 11001
11010 01110 I
11010 fcfids[.] 11010
v2.06 X
11011 11011
11100 11100
11101 11101
11110 01110 I
11110 fcfidus[.] 11110
v2.06 X
11111 11111
Table 22: EXT59: Extended Opcode Map for Primary Opcode 59 (opcode bits 21:30) (Sheet 3 of 4)
10000 10001 10010 10011 10100 10101 10110 10111
///// 10010 I ///// 10100 I ///// 10101 I ///// 10110 I
00000 fdivs[.] fsubs[.] fadds[.] fsqrts[.] 00000
PPC A PPC A PPC A PPC A
///// 10010 ///// 10100 ///// 10101 ///// 10110
00001 fdivs[.] fsubs[.] fadds[.] fsqrts[.] 00001
{invalid} {invalid} {invalid} {invalid}
00010 00010
00011 00011
00100 00100
00101 00101
00110 00110
00111 00111
01000 01000
01001 01001
01010 01010
01011 01011
01100 01100
01101 01101
01110 01110
01111 01111
10000 10000
10001 10001
10010 10010
10011 10011
10100 10100
10101 10101
10110 10110
10111 10111
11000 11000
11001 11001
11010 11010
11011 11011
11100 11100
11101 11101
11110 11110
11111 11111
Table 22: EXT59: Extended Opcode Map for Primary Opcode 59 (opcode bits 21:30) (Sheet 4 of 4)
11000 11001 11010 11011 11100 11101 11110 11111
///// 11000 I ----- 11001 I ///// 11010 I ----- 11100 I ----- 11101 I ----- 11110 I ----- 11111 I
00000 fres[.] fmuls[.] frsqrtes[.] fmsubs[.] fmadds[.] fnmsubs[.] fnmadds[.] 00000
PPC A PPC A v2.02 A PPC A PPC A PPC A PPC A
///// 11000 ///// 11010
00001 fres[.] frsqrtes[.] 00001
{invalid} {invalid}
00010 00010
00011 00011
00100 00100
00101 00101
00110 00110
00111 00111
01000 01000
01001 01001
01010 01010
01011 01011
01100 01100
01101 01101
01110 01110
01111 01111
10000 10000
10001 10001
10010 10010
10011 10011
10100 10100
10101 10101
10110 10110
10111 10111
11000 11000
11001 11001
11010 11010
11011 11011
11100 11100
11101 11101
11110 11110
11111 11111
Table 23: EXT60: Extended Opcode Map for Primary Opcode 60 (opcode bits 21:30) (Sheet 1 of 4)
00000 00001 00010 00011 00100 00101 00110 00111
00000 000-- I 00000 001-- I
00000 xsaddsp xsmaddasp 00000
v2.07 XX3 v2.07 XX3
00001 000-- I 00001 001-- I
00001 xssubsp xsmaddmsp 00001
v2.07 XX3 v2.07 XX3
00010 000-- I 00010 001-- I
00010 xsmulsp xsmsubasp 00010
v2.07 XX3 v2.07 XX3
00011 000-- I 00011 001-- I
00011 xsdivsp xsmsubmsp 00011
v2.07 XX3 v2.07 XX3
00100 000-- I 00100 001-- I
00100 xsadddp xsmaddadp 00100
v2.06 XX3 v2.06 XX3
00101 000-- I 00101 001-- I
00101 xssubdp xsmaddmdp 00101
v2.06 XX3 v2.06 XX3
00110 000-- I 00110 001-- I
00110 xsmuldp xsmsubadp 00110
v2.06 XX3 v2.06 XX3
00111 000-- I 00111 001-- I
00111 xsdivdp xsmsubmdp 00111
v2.06 XX3 v2.06 XX3
01000 000-- I 01000 001-- I
01000 xvaddsp xvmaddasp 01000
v2.06 XX3 v2.06 XX3
01001 000-- I 01001 001-- I
01001 xvsubsp xvmaddmsp 01001
v2.06 XX3 v2.06 XX3
01010 000-- I 01010 001-- I
01010 xvmulsp xvmsubasp 01010
v2.06 XX3 v2.06 XX3
01011 000-- I 01011 001-- I
01011 xvdivsp xvmsubmsp 01011
v2.06 XX3 v2.06 XX3
01100 000-- I 01100 001-- I
01100 xvadddp xvmaddadp 01100
v2.06 XX3 v2.06 XX3
01101 000-- I 01101 001-- I
01101 xvsubdp xvmaddmdp 01101
v2.06 XX3 v2.06 XX3
01110 000-- I 01110 001-- I
01110 xvmuldp xvmsubadp 01110
v2.06 XX3 v2.06 XX3
01111 000-- I 01111 001-- I
01111 xvdivdp xvmsubmdp 01111
v2.06 XX3 v2.06 XX3
10000 000-- I 10000 001-- I
10000 xsmaxcdp xsnmaddasp 10000
v3.0 XX3 v2.07 XX3
10001 000-- I 10001 001-- I
10001 xsmincdp xsnmaddmsp 10001
v3.0 XX3 v2.07 XX3
10010 000-- I 10010 001-- I
10010 xsmaxjdp xsnmsubasp 10010
v3.0 XX3 v2.07 XX3
10011 000-- I 10011 001-- I
10011 xsminjdp xsnmsubmsp 10011
v3.0 XX3 v2.07 XX3
10100 000-- I 10100 001-- I
10100 xsmaxdp xsnmaddadp 10100
v2.06 XX3 v2.06 XX3
10101 000-- I 10101 001-- I
10101 xsmindp xsnmaddmdp 10101
v2.06 XX3 v2.06 XX3
10110 000-- I 10110 001-- I
10110 xscpsgndp xsnmsubadp 10110
v2.06 XX3 v2.06 XX3
10111 001-- I
10111 xsnmsubmdp 10111
v2.06 XX3
11000 000-- I 11000 001-- I
11000 xvmaxsp xvnmaddasp 11000
v2.06 XX3 v2.06 XX3
11001 000-- I 11001 001-- I
11001 xvminsp xvnmaddmsp 11001
v2.06 XX3 v2.06 XX3
11010 000-- I 11010 001-- I
11010 xvcpsgnsp xvnmsubasp 11010
v2.06 XX3 v2.06 XX3
11011 000-- I 11011 001-- I
11011 xviexpsp xvnmsubmsp 11011
v3.0 XX3 v2.06 XX3
11100 000-- I 11100 001-- I
11100 xvmaxdp xvnmaddadp 11100
v2.06 XX3 v2.06 XX3
11101 000-- I 11101 001-- I
11101 xvmindp xvnmaddmdp 11101
v2.06 XX3 v2.06 XX3
11110 000-- I 11110 001-- I
11110 xvcpsgndp xvnmsubadp 11110
v2.06 XX3 v2.06 XX3
11111 000-- I 11111 001-- I
11111 xviexpdp xvnmsubmdp 11111
v3.0 XX3 v2.06 XX3
Table 23: EXT60: Extended Opcode Map for Primary Opcode 60 (opcode bits 21:30) (Sheet 2 of 4)
01000 01001 01010 01011 01100 01101 01110 01111
0--00 010-- I 00000 011-- I
00000 xxsldwi xscmpeqdp 00000
v2.06 XX3 v3.0 XX3
0--01 010-- I 00001 011-- I
00001 xxpermdi xscmpgtdp 00001
v2.06 XX3 v3.0 XX3
00010 010-- I 00010 011-- I
00010 xxmrghw xscmpgedp 00010
v2.06 XX3 v3.0 XX3
00011 010-- I
00011 xxperm 00011
v3.0 XX3
0--00 010-- I 00100 011-- I
00100 xxsldwi xscmpudp 00100
v2.06 XX3 v2.06 XX3
0--01 010-- I 00101 011-- I
00101 xxpermdi xscmpodp 00101
v2.06 XX3 v2.06 XX3
00110 010-- I
00110 xxmrglw 00110
v2.06 XX3
00111 010-- I 00111 011-- I
00111 xxpermr xscmpexpdp 00111
v3.0 XX3 v3.0 XX3
0--00 010-- I 01000 011-- I
01000 xxsldwi xvcmpeqsp 01000
v2.06 XX3 v2.06 XX3
0--01 010-- I 01001 011-- I
01001 xxpermdi xvcmpgtsp 01001
v2.06 XX3 v2.06 XX3
01010 0100- I 01010 0101- I 01010 011-- I
01010 xxspltw xxextractuw xvcmpgesp 01010
v2.06 XX2 v3.0 XX2 v2.06 XX3
01011 01000 01011 0101- I
01011 XPND60-1 xxinsertw 01011
{expanded} v3.0 XX2
0--00 010-- I 01100 011-- I
01100 xxsldwi xvcmpeqdp 01100
v2.06 XX3 v2.06 XX3
0--01 010-- I 01101 011-- I
01101 xxpermdi xvcmpgtdp 01101
v2.06 XX3 v2.06 XX3
01110 011-- I
01110 xvcmpgedp 01110
v2.06 XX3
01111 01111
10000 010-- I
10000 xxland 10000
v2.06 XX3
10001 010-- I
10001 xxlandc 10001
v2.06 XX3
10010 010-- I
10010 xxlor 10010
v2.06 XX3
10011 010-- I
10011 xxlxor 10011
v2.06 XX3
10100 010-- I
10100 xxlnor 10100
v2.06 XX3
10101 010-- I
10101 xxlorc 10101
v2.07 XX3
10110 010-- I
10110 xxlnand 10110
v2.07 XX3
10111 010-- I
10111 xxleqv 10111
v2.07 XX3
11000 011-- I
11000 xvcmpeqsp. 11000
v2.06 XX3
11001 011-- I
11001 xvcmpgtsp. 11001
v2.06 XX3
11010 011-- I
11010 xvcmpgesp. 11010
v2.06 XX3
11011 11011
11100 011-- I
11100 xvcmpeqdp. 11100
v2.06 XX3
11101 011-- I
11101 xvcmpgtdp. 11101
v2.06 XX3
11110 011-- I
11110 xvcmpgedp. 11110
v2.06 XX3
11111 11111
Table 23: EXT60: Extended Opcode Map for Primary Opcode 60 (opcode bits 21:30) (Sheet 3 of 4)
10000 10001 10010 10011 10100 10101 10110 10111
00000 1010- I 00000 1011- I
00000 xsrsqrtesp xssqrtsp 00000
v2.07 XX2 v2.07 XX2
00001 1010- I
00001 xsresp 00001
v2.07 XX2
00010 00010
00011 00011
Table 23: EXT60: Extended Opcode Map for Primary Opcode 60 (opcode bits 21:30) (Sheet 4 of 4)
11000 11001 11010 11011 11100 11101 11110 11111
----- 11--- I
00000 xxsel 00000
v2.06 XX4
00001 00001
00010 00010
00011 00011
00100 00100
00101 00101
00110 00110
00111 00111
01000 01000
01001 01001
01010 01010
01011 01011
01100 01100
01101 01101
01110 01110
01111 01111
10000 10000
10001 10001
10010 10010
10011 10011
10100 10100
10101 10101
10110 10110
10111 10111
11000 11000
11001 11001
11010 11010
11011 11011
11100 11100
11101 11101
11110 11110
11111 11111
Table 24: XPND60-1: Extended Opcode Map for PO=60 XO=0b01011_01000 (opcode bits 11:15)
000 001 010 011 100 101 110 111
00 --- I
00 xxspltib 00
v3.0 XX1
01 01
10 10
11 11
Table 25: XPND60-2: Extended Opcode Map for PO=60 XO=0b10101_1011- (opcode bits 11:15)
000 001 010 011 100 101 110 111
00 000 I 00 001 I
00 xsxexpdp xsxsigdp 00
v3.0 XX2 v3.0 XX2
01 01
10 000 I 10 001 I
10 xscvhpdp xscvdphp 10
v3.0 XX2 v3.0 XX2
11 11
Table 26: XPND60-3: Extended Opcode Map for PO=60 XO=0b11101_1011- (opcode bits 11:15)
000 001 010 011 100 101 110 111
00 000 I 00 001 I 00 111 I
00 xvxexpdp xvxsigdp xxbrh 00
v3.0 XX2 v3.0 XX2 v3.0 XX2
01 000 I 01 001 I 01 111 I
01 xvxexpsp xvxsigsp xxbrw 01
v3.0 XX2 v3.0 XX2 v3.0 XX2
10 111 I
10 xxbrd 10
v3.0 XX2
11 000 I 11 001 I 11 111 I
11 xvcvhpsp xvcvsphp xxbrq 11
v3.0 XX2 v3.0 XX2 v3.0 XX2
Table 27: EXT63: Extended Opcode Map for Primary Opcode 63 (opcode bits 21:30) (Sheet 1 of 4)
00000 00001 00010 00011 00100 00101 00110 00111
00000 00000 I 00000 00010I --000 00011I 00000 00100I --000 00101I
00000 fcmpu daddq[.] dquaq[.] xsaddqp[o] xsrqpi[x] 00000
P1 X v2.05 X v2.05 Z23 v3.0 X v3.0 X
00001 00000 I 00001 00010I --001 00011I 00001 00100I --001 00101I 00001 00110I
00001 fcmpo dmulq[.] drrndq[.] xsmulqp[o] xsrqpxp mtfsb1[.] 00001
P1 X v2.05 X v2.05 Z23 v3.0 X v3.0 X P1 X
00010 00000 I -0010 00010I --010 00011I 00010 00110I
00010 mcrfs dscliq[.] dquaiq[.] mtfsb0[.] 00010
P1 X v2.05 Z22 v2.05 Z23 P1 X
-0011 00010I --011 00011I 00011 00100I
00011 dscriq[.] drintxq[.] xscpsgnqp 00011
v2.05 Z22 v2.05 Z23 v3.0 X
00100 00000 I 00100 00010I 00100 00100I 00100 00110I
00100 ftdiv dcmpoq xscmpoqp mtfsfi[.] 00100
v2.06 X v2.05 X v3.0 X P1 X
00101 00000 I 00101 00010I 00101 00100I
00101 ftsqrt dtstexq xscmpexpqp 00101
v2.06 X v2.05 X v3.0 X
-0110 00010I
00110 dtstdcq 00110
v2.05 Z22
-0111 00010I --111 00011I
00111 dtstdgq drintnq[.] 00111
v2.05 Z22 v2.05 Z23
01000 00010I --000 00011I --000 00101I
01000 dctqpq[.] dquaq[.] xsrqpi[x] 01000
v2.05 X v2.05 Z23 v3.0 X
01001 00010I --001 00011I --001 00101I
01001 dctfixq[.] drrndq[.] xsrqpxp 01001
v2.05 X v2.05 Z23 v3.0 X
01010 00010I --010 00011I
01010 ddedpdq[.] dquaiq[.] 01010
v2.05 X v2.05 Z23
01011 00010I --011 00011I
01011 dxexq[.] drintxq[.] 01011
v2.05 X v2.05 Z23
01100 00100I
01100 xsmaddqp[o] 01100
v3.0 X
01101 00100I
01101 xsmsubqp[o] 01101
v3.0 X
01110 00100I
01110 xsnmaddqp[o] 01110
v3.0 X
--111 00011I 01111 00100I
01111 drintnq[.] xsnmsubqp[o] 01111
v2.05 Z23 v3.0 X
10000 00010I --000 00011I 10000 00100I --000 00101I
10000 dsubq[.] dquaq[.] xssubqp[o] xsrqpi[x] 10000
v2.05 X v2.05 Z23 v3.0 X v3.0 X
10001 00010I --001 00011I 10001 00100I --001 00101I
10001 ddivq[.] drrndq[.] xsdivqp[o] xsrqpxp 10001
v2.05 X v2.05 Z23 v3.0 X v3.0 X
-0010 00010I --010 00011I 10010 00111
10010 dscliq[.] dquaiq[.] XPND63-1 10010
v2.05 Z22 v2.05 Z23 {expanded}
-0011 00010I --011 00011I
10011 dscriq[.] drintxq[.] 10011
v2.05 Z22 v2.05 Z23
10100 00010I 10100 00100I
10100 dcmpuq xscmpuqp 10100
v2.05 X v3.0 X
10101 00010I 10101 00011I
10101 dtstsfq dtstsfiq 10101
v2.05 X v3.0 X
-0110 00010I 10110 00100I 10110 00111 I
10110 dtstdcq xststdcqp mtfsf[.] 10110
v2.05 Z22 v3.0 X P1 XFL
-0111 00010I --111 00011I
10111 dtstdgq drintnq[.] 10111
v2.05 Z22 v2.05 Z23
11000 00010I --000 00011I --000 00101I
11000 drdpq[.] dquaq[.] xsrqpi[x] 11000
v2.05 X v2.05 Z23 v3.0 X
11001 00010I --001 00011I 11001 00100 --001 00101I
11001 dcffixq[.] drrndq[.] XPND63-2 xsrqpxp 11001
v2.05 X v2.05 Z23 {expanded} v3.0 X
11010 00010I --010 00011I 11010 00100 11010 00110I
11010 denbcdq[.] dquaiq[.] XPND63-3 fmrgow 11010
v2.05 X v2.05 Z23 {expanded} v2.07 X
11011 00010I --011 00011I 11011 00100I
11011 diexq[.] drintxq[.] xsiexpqp 11011
v2.05 X v2.05 Z23 v3.0 X
11100 11100
11101 11101
11110 00110I
11110 fmrgew 11110
v2.07 X
--111 00011I
11111 drintnq[.] 11111
v2.05 Z23
Table 27: EXT63: Extended Opcode Map for Primary Opcode 63 (opcode bits 21:30) (Sheet 2 of 4)
01000 01001 01010 01011 01100 01101 01110 01111
00000 01000 I 00000 01100I 00000 01110I 00000 01111 I
00000 fcpsgn[.] frsp[.] fctiw[.] fctiwz[.] 00000
v2.05 X P1 X P2 X P2 X
00001 01000 I
00001 fneg[.] 00001
P1 X
00010 01000 I
00010 fmr[.] 00010
P1 X
00011 00011
00101 00101
00110 00110
00111 00111
01000 01000 I
01000 fabs[.] 01000
P1 X
01001 01001
01010 01010
01011 01011
01100 01000 I
01100 frin[.] 01100
v2.02 X
01101 01000 I
01101 friz[.] 01101
v2.02 X
01110 01000 I
01110 frip[.] 01110
v2.02 X
01111 01000 I
01111 frim[.] 01111
v2.02 X
10000 10000
10001 10001
10010 10010
10011 10011
10100 10100
10101 10101
10110 10110
10111 10111
11000 11000
11011 11011
11100 11100
11111 11111
Table 27: EXT63: Extended Opcode Map for Primary Opcode 63 (opcode bits 21:30) (Sheet 3 of 4)
10000 10001 10010 10011 10100 10101 10110 10111
///// 10010I ///// 10100I ///// 10101I ///// 10110I ----- 10111 I
00000 fdiv[.] fsub[.] fadd[.] fsqrt[.] fsel[.] 00000
P1 A P1 A P1 A P2 A PPC A
///// 10010 ///// 10100 ///// 10101 ///// 10110
00001 fdiv[.] fsub[.] fadd[.] fsqrt[.] 00001
{invalid} {invalid} {invalid} {invalid}
00010 00010
00011 00011
00100 00100
00101 00101
00110 00110
00111 00111
01000 01000
01001 01001
01010 01010
01011 01011
01100 01100
01101 01101
01110 01110
01111 01111
10000 10000
10001 10001
10010 10010
10011 10011
10100 10100
10101 10101
10110 10110
10111 10111
11000 11000
11001 11001
11010 11010
11011 11011
11100 11100
11101 11101
11110 11110
11111 11111
Table 27: EXT63: Extended Opcode Map for Primary Opcode 63 (opcode bits 21:30) (Sheet 4 of 4)
11000 11001 11010 11011 11100 11101 11110 11111
///// 11000 I ----- 11001 I ///// 11010I ----- 11100I ----- 11101I ----- 11110I ----- 11111 I
00000 fre[.] fmul[.] frsqrte[.] fmsub[.] fmadd[.] fnmsub[.] fnmadd[.] 00000
v2.02 A P1 A PPC A P1 A P1 A P1 A P1 A
///// 11000 ///// 11010
00001 fre[.] frsqrte[.] 00001
{invalid} {invalid}
00010 00010
00011 00011
00100 00100
00101 00101
00110 00110
00111 00111
01000 01000
01001 01001
01010 01010
01011 01011
01100 01100
01101 01101
01110 01110
01111 01111
10000 10000
10001 10001
10010 10010
10011 10011
10100 10100
10101 10101
10110 10110
10111 10111
11000 11000
11001 11001
11010 11010
11011 11011
11100 11100
11101 11101
11110 11110
11111 11111
Table 28: XPND63-1: Extended Opcode Map for PO=63 XO=0b10010_00111 (opcode bits 11:15)
000 001 010 011 100 101 110 111
00 000 I 00 001 I
00 mffs[.] mffsce 00
P1 X v3.0B X
01 01
Table 29: XPND63-2: Extended Opcode Map for PO=63 XO=0b11001_00100 (opcode bits 11:15)
000 001 010 011 100 101 110 111
00 000 I 00 010 I
00 xsabsqp xsxexpqp 00
v3.0 X v3.0 X
01 000 I
01 xsnabsqp 01
v3.0 X
10 000 I 10 010 I
10 xsnegqp xsxsigqp 10
v3.0 X v3.0 X
11 011 I
11 xssqrtqp[o] 11
v3.0 X
Table 30: XPND63-3: Extended Opcode Map for PO=63 XO=0b11010_00100 (opcode bits 11:15)
000 001 010 011 100 101 110 111
00 001 I 00 010 I
00 xscvqpuwz xscvudqp 00
v3.0 X v3.0 X
01 001 I 01 010 I
01 xscvqpswz xscvsdqp 01
v3.0 X v3.0 X
10 001 I 10 100 I 10 110 I
10 xscvqpudz xscvqpdp[o] xscvdpqp 10
v3.0 X v3.0 X v3.0 X
11 001 I
11 xscvqpsdz 11
v3.0 X
This appendix lists all the instructions in the Power ISA, sorted by primary opcode, then by extended opcode bits
26:31 (if any), then by opcode bits 21:25 (if any), then by expanded opcode bits 11:15 (if any).
Mode Dep4
Mnemonic
Privilege3
Version2
Format
Book
Page
Instruction1 Name
Mode Dep4
Mnemonic
Privilege3
Version2
Format
Book
Page
Instruction1 Name
Mode Dep4
Mnemonic
Privilege3
Version2
Format
Book
Page
Instruction1 Name
Mode Dep4
Mnemonic
Privilege3
Version2
Format
Book
Page
Instruction1 Name
Mode Dep4
Mnemonic
Privilege3
Version2
Format
Book
Page
Instruction1 Name
Mode Dep4
Mnemonic
Privilege3
Version2
Format
Book
Page
Instruction1 Name
Mode Dep4
Mnemonic
Privilege3
Version2
Format
Book
Page
Instruction1 Name
Mode Dep4
Mnemonic
Privilege3
Version2
Format
Book
Page
Instruction1 Name
Mode Dep4
Mnemonic
Privilege3
Version2
Format
Book
Page
Instruction1 Name
Mode Dep4
Mnemonic
Privilege3
Version2
Format
Book
Page
Instruction1 Name
Mode Dep4
Mnemonic
Privilege3
Version2
Format
Book
Page
Instruction1 Name
Mode Dep4
Mnemonic
Privilege3
Version2
Format
Book
Page
Instruction1 Name
Mode Dep4
Mnemonic
Privilege3
Version2
Format
Book
Page
Instruction1 Name
Mode Dep4
Mnemonic
Privilege3
Version2
Format
Book
Page
Instruction1 Name
Mode Dep4
Mnemonic
Privilege3
Version2
Format
Book
Page
Instruction1 Name
Mode Dep4
Mnemonic
Privilege3
Version2
Format
Book
Page
Instruction1 Name
Mode Dep4
Mnemonic
Privilege3
Version2
Format
Book
Page
Instruction1 Name
Mode Dep4
Mnemonic
Privilege3
Version2
Format
Book
Page
Instruction1 Name
This appendix lists all the instructions in the Power ISA, sorted in reverse order by ISA version.
Mode Dep4
Mnemonic
Privilege3
Version2
Format
Book
Page
Instruction1 Name
Mode Dep4
Mnemonic
Privilege3
Version2
Format
Book
Page
Instruction1 Name
Mode Dep4
Mnemonic
Privilege3
Version2
Format
Book
Page
Instruction1 Name
Mode Dep4
Mnemonic
Privilege3
Version2
Format
Book
Page
Instruction1 Name
Mode Dep4
Mnemonic
Privilege3
Version2
Format
Book
Page
Instruction1 Name
Mode Dep4
Mnemonic
Privilege3
Version2
Format
Book
Page
Instruction1 Name
Mode Dep4
Mnemonic
Privilege3
Version2
Format
Book
Page
Instruction1 Name
Mode Dep4
Mnemonic
Privilege3
Version2
Format
Book
Page
Instruction1 Name
Mode Dep4
Mnemonic
Privilege3
Version2
Format
Book
Page
Instruction1 Name
Mode Dep4
Mnemonic
Privilege3
Version2
Format
Book
Page
Instruction1 Name
Mode Dep4
Mnemonic
Privilege3
Version2
Format
Book
Page
Instruction1 Name
Mode Dep4
Mnemonic
Privilege3
Version2
Format
Book
Page
Instruction1 Name
Mode Dep4
Mnemonic
Privilege3
Version2
Format
Book
Page
Instruction1 Name
Mode Dep4
Mnemonic
Privilege3
Version2
Format
Book
Page
Instruction1 Name