ARM CPU Architecture

Uploaded by

Tuyen Dinh

Introduction: Introduces the ARMv8-A CPU architecture overview, setting the stage for subsequent technical details.
Speaker Introduction: Presents the speaker's background, experience, and role in the ARM company, establishing credibility.
ARM Architecture Evolution: Describes the historical development of ARM architecture leading to the ARMv8 series.
ARMv8-A AArch64 Features: Details the main features of the ARMv8-A AArch64 including instruction set and performance enhancements.
What's New in ARMv8-A: Highlights new execution states in ARMv8-A, differentiating between AArch32 and AArch64.
Register Banks: Explains the structure and function of register files used in ARMv8-A architecture.
Procedure Call Standard: Defines the rules for using registers in creating function calls in ARM architecture.
A64 Instruction Set Overview: An overview of the new A64 instruction set, detailing fixed length and similar instructions to previous ARM versions.
Multiprocessing: Describes techniques and tools used to enable multiprocessing in ARMv8-A CPUs.
ARM Cortex-A57 MPCore: Provides a diagram of the ARM Cortex-A57 memory and processing architecture.
Cluster of Cores: Explains the cluster configuration in ARM SMP systems.
Multicluster Systems: Details system integration of multiple clusters to enhance processing capability.
Power Efficiency: Examines the power efficiency strategies implemented in different ARM Cortex models.
Global Task Scheduling: Introduction to ARM's Global Task Scheduling strategy using big.LITTLE architecture.
big.LITTLE Development: Explores development strategies and considerations when employing ARM's big.LITTLE architecture.
Takeaways: Summarizes the ARM big.LITTLE technology advantages, focusing on performance and efficiency.
NEON - SIMD Data Processing: Discusses the functioning of the NEON SIMD data processing engine as an ARM technology feature.
Vectors: Provides explanation on the structure and access methods for vectors in SIMD operations.
Vectorizing Examples: Provides practical examples of code vectorization in ARMv8-A for optimization.
ARM Technical Training: Promotes ARM's training and development resources available for technical skill enhancement.
Closure: Concludes the presentation with acknowledgments and copyright notice.

ARMv8-A CPU Architecture

Overview

Chris Shore
Training Manager, ARM

ARM Game Developer Day, London

03/12/2015
Chris Shore – ARM Training Manager
 With ARM for 16 years
 Managing customer training for 15 years
 Worldwide customer training delivery
 Approved Training Centers
 Active Assist onsite project services

 Background
 MA Physics & Computer Science, Cambridge University, 1986
 Background as an embedded software consultant for 17 years
 Software Engineer
 Project Manager
 Technical Director
 Engineering Manager
 Training Manager

 Regular conference speaker and trainer

2 © ARM 2015
ARM Architecture Evolution
[Figure: timeline from 1995 to 2015 starting at ARMv4 and running through ARM7TDMI, ARM926EJ, ARM1176, Cortex®-A9 and the Cortex-A50 series, with a Virtualization label. Arrows show increasing SoC complexity, increasing OS complexity and increasing choice of HW and SW.]

ARMv8-A AArch64
 AArch64 supports ALL
ARMv8-A features
 Clean instruction set
 Larger address space (>4GB memory for
Application)
 Wider data register (64-bit)
 Better SIMD (NEON)
 Better floating point
 Increased number and size of general
purpose registers (31 general, 32
FP/NEON/Crypto)
 Better security architecture, dealing with
hypervisors
 More…

What’s new in ARMv8-A?
 ARMv8-A introduces two execution states: AArch32 and AArch64

 AArch32
 Evolution of ARMv7-A
 A32 (ARM) and T32 (Thumb) instruction sets
 ARMv8-A adds some new instructions
 Traditional ARM exception model
 Virtual addresses stored in 32-bit registers

 AArch64
 New 64-bit general purpose registers (X0 to X30)
 New instructions – A64, fixed length 32-bit instruction set
 Includes SIMD, floating point and crypto instructions
 New exception model
 Virtual addresses now stored in 64-bit registers

Register Banks
 AArch64 provides 31 general purpose registers
 Each register has a 32-bit (w0-w30) and 64-bit (x0-x30) form
[Figure: Xn occupies bits 63:0 of the register; Wn is the low half, bits 31:0]
 Separate register file for floating point, SIMD and crypto operations - Vn
 32 registers, each 128-bits
 Can also be accessed in 32-bit (Sn) or 64-bit (Dn) forms

[Figure: S0 and D0 are the low 32 and 64 bits of V0; likewise S1 and D1 for V1]

Procedure Call Standard
 There is a set of rules known as a Procedure Call Standard (PCS) that specifies how registers should be used:

Caller (C):
    extern int foo(int, int);

    int main(void)
    {
        ...
        a = foo(b, c);
        ...
    }

Callee (assembly):
        EXPORT foo
foo     PROC
        …               ; Must preserve X19 - X29
        …               ; Can corrupt: X0 - X18
        …               ; Return address: X30 (LR)
        RET
        ENDP
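
To make the register rules above a little more concrete, here is a minimal C sketch. The body of foo is hypothetical (the slide elides it), and call_twice is an invented helper. Under the AArch64 procedure call standard the two int arguments would normally arrive in W0 and W1 and the result return in W0, but the exact code depends on the compiler and options.

```c
/* Hypothetical body for foo; the slide does not show the real one. */
int foo(int b, int c)
{
    /* Typically b arrives in W0 and c in W1; the result goes back in W0. */
    return b + c;
}

/* Invented helper: `first` must survive the second call to foo.
 * Because foo may corrupt X0-X18, a compiler will usually keep `first`
 * in a callee-saved register (X19-X29) or on the stack across the call. */
int call_twice(int b, int c)
{
    int first = foo(b, c);
    int second = foo(first, c);
    return first + second;
}
```

The point of call_twice is only that values which are live across a call cannot be left in the corruptible registers; the compiler handles this automatically in C.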

A64 Overview
 AArch64 introduces new A64 instruction set
 Similar set of functionality as traditional A32 (ARM) and T32 (Thumb) ISAs

 Fixed length 32-bit instructions

 Syntax similar to A32 and T32

ADD W0, W1, W2  w0 = w1 + w2 (32-bit addition)

ADD X0, X1, X2  x0 = x1 + x2 (64-bit addition)

 Most instructions are not conditional

 Floating point and Advanced SIMD instructions

 Optional cryptographic extensions
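
As a rough illustration of the two ADD forms above, the following C functions (names invented for this sketch) perform 32-bit and 64-bit unsigned additions. A compiler targeting AArch64 would typically use the W-register form for the first and the X-register form for the second, though the generated code is not guaranteed.

```c
#include <stdint.h>

/* 32-bit addition: the result wraps modulo 2^32, like ADD W0, W1, W2. */
uint32_t add32(uint32_t w1, uint32_t w2)
{
    return w1 + w2;
}

/* 64-bit addition: the result wraps modulo 2^64, like ADD X0, X1, X2. */
uint64_t add64(uint64_t x1, uint64_t x2)
{
    return x1 + x2;
}
```

Adding 1 to 0xFFFFFFFF shows the difference: the 32-bit version wraps to zero, while the 64-bit version carries into bit 32.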

Multiprocessing

In the core: ARM NEON tech/SIMD
Use of common parallelizing tools where possible: OpenMP, Renderscript, OpenCL, etc.
Multi-threading: never easy, but increasingly necessary

ARM Cortex-A57 MPCore
A useful diagram
[Figure: a single Cortex-A57 core with its L1 instruction cache and L1 data cache, an L2 cache, and the interrupt system (GIC)]

A “Cluster” of Cores
The standard unit in an ARM SMP system

[Figure: several Cortex-A57 cores, each with its own L1 instruction and L1 data caches, sharing one interrupt system (GIC), a Snoop Control Unit (SCU) and an L2 cache]

And a Multicluster
System built of more than one cluster, often using different cores

[Figure: a Cortex-A57 cluster and a Cortex-A53 cluster, each with its own interrupt system (GIC), per-core L1 instruction and L1 data caches and a shared L2 cache, joined by a coherent interconnect]

Why is the Power Efficiency so Different?
 Cortex-A53: design focused on energy efficiency (while balancing performance)
 8-11 stages, In-Order and limited dual-issue

 Cortex-A57: focused on best performance (while balancing energy efficiency)

 15+ stages, Out-of-Order and multi-issue, register renaming
 Average 40-60% performance boost over Cortex-A9 in general purpose code
 Instructions per cycle (IPC) improvements
[Figure: pipeline diagrams. The shorter pipeline (labelled Cortex-A7 / Cortex-A53) has fetch stages Fe1-Fe3, decode and issue, feeding an integer pipe (Ex1, Ex2, Wr), a MAC pipe (M1, M2, Wr) and a fixed-length Neon/FPU pipe (F1-F5). The longer pipeline (labelled Cortex-A15 / Cortex-A57) has stages F1-F5, D1-D3, R1-R2, P1-P2 and I1, feeding two simple cluster pipes, a MAC pipe and a variable-length complex cluster pipe (E1-E10), each ending in WB.]

Global Task Scheduling (GTS)
 ARM Implementation is called “big.LITTLE MP”
 Hosted in Linaro git trees but developed by ARM
 Using GTS, all of the big and LITTLE cores are available to the Linux kernel for
scheduling tasks.
[Figure: the Linux kernel scheduler can place Task A on any of the big or LITTLE CPUs]

Global Task Scheduling (GTS)
1. System starts
   Tasks fill up the system
2. Demanding tasks detected
   Based on amount of time a thread is ready to run (run queue residency)
   Moved to a ‘big’ CPU
3. Task no longer demanding
   Moved to a ‘LITTLE’ CPU
[Figure: threads spread across the LITTLE CPUs and big CPUs of two clusters (Cluster 1, Cluster 2), with a demanding thread migrating to a big CPU and later back to a LITTLE CPU]
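
The three steps above can be sketched as a simple threshold decision with hysteresis. This is only an illustration and not the kernel's actual code: the function name, the enum and both threshold values are invented here, and runnable_pct stands in for whatever measure of run queue residency the scheduler really tracks.

```c
/* Which kind of CPU a thread is currently placed on. */
enum cluster { CLUSTER_LITTLE, CLUSTER_BIG };

/* Hypothetical thresholds: percentage of recent time the thread was ready to run. */
#define UP_THRESHOLD_PCT   80u
#define DOWN_THRESHOLD_PCT 30u

/* Decide where a thread should run next, given where it runs now. */
enum cluster choose_cluster(enum cluster current, unsigned int runnable_pct)
{
    /* Step 2: a demanding thread on a LITTLE CPU moves up to a big CPU. */
    if (current == CLUSTER_LITTLE && runnable_pct >= UP_THRESHOLD_PCT)
        return CLUSTER_BIG;

    /* Step 3: a thread that is no longer demanding moves back down. */
    if (current == CLUSTER_BIG && runnable_pct <= DOWN_THRESHOLD_PCT)
        return CLUSTER_LITTLE;

    /* In between the two thresholds the thread stays where it is. */
    return current;
}
```

Using two separate thresholds rather than one keeps a thread with a borderline load from bouncing between clusters on every decision.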
ARM big.LITTLE Development (GTS)
 Trust the scheduler…
 Linux will schedule for performance and efficiency
 All new tasks started on big to avoid latency
 Quickly adapts to a task’s needs
…Unless
 You know a thread is intensive but not urgent
 Affine to LITTLE, never to big
 E.g. maybe use this for asset loading on a separate thread
 LITTLE cores are great
 You’ll be using them a lot
 ARM Cortex-A53 ~20% greater perf than Cortex-A9 cores
 Most workloads will run on LITTLE
 More thermal headroom for other SoC components
 big cores are serious powerhouses
 Think of them as short-burst accelerators, e.g. physics-based special effects
 Think about the trade-offs during design

ARM big.LITTLE Development
Things to avoid
 Imbalanced threads sharing common data
 Cluster coherency is excellent but not free

 If you have real-time (RT) threads note that…

 RT threads are not auto migrated
 RT threads are a design decision, think carefully about affinity
 http://linux.die.net/man/2/sched_setaffinity

 Avoid long running tasks on big cores

 You’ll rarely need that processing power for long periods
 Can the task be parallelized?
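
For the affinity advice above, the following is a minimal sketch of calling sched_setaffinity on Linux with glibc. The helper name pin_calling_thread is invented for this example, and which CPU numbers belong to the LITTLE or big cluster is platform-specific, so the caller has to supply them.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* glibc: exposes CPU_SET and sched_setaffinity */
#endif
#include <errno.h>
#include <sched.h>
#include <stddef.h>

/* Restrict the calling thread to the listed CPU numbers.
 * Returns 0 on success, -1 on failure with errno set. */
int pin_calling_thread(const int *cpus, size_t count)
{
    cpu_set_t set;
    size_t i;

    if (cpus == NULL || count == 0) {
        errno = EINVAL;
        return -1;
    }

    CPU_ZERO(&set);
    for (i = 0; i < count; i++) {
        if (cpus[i] < 0 || cpus[i] >= CPU_SETSIZE) {
            errno = EINVAL;
            return -1;
        }
        CPU_SET(cpus[i], &set);
    }

    /* A pid of 0 means "the calling thread". */
    return sched_setaffinity(0, sizeof(set), &set);
}
```

An asset-loading thread could call this once at start-up with the LITTLE CPU numbers for the target device, for example a hypothetical list {0, 1, 2, 3}; the real numbering has to be discovered per platform.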

Takeaways
 ARM big.LITTLE technology & Global Task Scheduling is easily available
 Fantastic peak performance
 Energy-efficient, sustainable compute for long running workloads

 Multi-processing
 Get ahead of the limits on single thread performance
 Avoid thermal constraints on performance

What is NEON?
 SIMD data processing engine
 A Single Instruction operates on Multiple Data
 Generally a common operation is carried out in parallel on pairs of elements in vector registers

 Provided as an extension to the instruction and register sets

 Can be implemented on all Cortex-A series processors
 NEON instructions are part of the ARM or Thumb instruction stream

 Instructions operate on vectors of elements of the same data type

 Data types may be floating point or integer
 Integers may be signed/unsigned 8-bit, 16-bit, 32-bit, 64-bit
 Single precision floating point (32-bit)
 Most instructions perform the same operation in all lanes
SIMD Registers
 Separate set of 32 registers, each 128-bit wide
 Architecturally named V0 – V31
 Used by scalar floating-point and SIMD instructions

 The instruction syntax uses qualified register names

 Bn for byte, Hn for half-word, Sn for single-word, Dn for double-word, Qn for quad-word

[Figure: B0, H0, S0, D0 and Q0 are the low 8, 16, 32, 64 and 128 bits of V0; likewise B1 to Q1 for V1]

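SIMD Operations
 Operand registers are treated as vectors of individual elements
 Operations are performed simultaneously on a number of “lanes”
 In this example each lane contains a pair of 32-bit elements, one from each of the operand 128-bit vector registers
[Figure: 128-bit registers Vn and Vm are each split into four 32-bit elements (bits 127:96, 95:64, 63:32, 31:0). The same Op is applied to each pair of elements and the result is written to the matching element of Vd. One element position across Vn, Vm and Vd is a lane.]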
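
The lane-by-lane behaviour described above can be modelled in portable C. This is a scalar sketch of what one SIMD instruction does, not NEON code: the type vec4x32 and the function vec_add are invented for illustration, and real hardware performs the four additions at once rather than in a loop.

```c
#include <stdint.h>

#define LANES 4

/* Models one 128-bit vector register viewed as four 32-bit elements. */
typedef struct {
    uint32_t lane[LANES];
} vec4x32;

/* Apply the same operation (here, addition) independently in every lane. */
vec4x32 vec_add(vec4x32 vn, vec4x32 vm)
{
    vec4x32 vd;
    int i;

    for (i = 0; i < LANES; i++)
        vd.lane[i] = vn.lane[i] + vm.lane[i];   /* no carry crosses a lane boundary */

    return vd;
}
```

Each lane wraps on its own: an overflow in one 32-bit element never spills into its neighbour.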
Vectors
 When accessing a SIMD vector, the Vn register name is used with an extension to indicate the number and
size of elements in the vector
 Vn.xy
 n is the register number, x is the number of elements, y is the size of the elements encoded as a letter

FADD V0.2D, V5.2D, V6.2D ; 2x double-precision floats

 Total vector length must be either 128-bits or 64-bits

 If 64-bits are written, Vn[127:64] are automatically cleared to 0

ADD V0.8H, V3.8H, V4.8H ; 8x 16-bit integers

ADD V0.8B, V3.8B, V4.8B ; 8x 8-bit integers, clear top of V0

 Some instructions refer to a single element of a vector

 Example: V3.B[3] is the byte in V3[31:24]
 Rest of register is unaffected when an element is written

MUL V0.4S, V2.4S, V3.S[2] ; Multiply each element of V2 by V3.S[2]
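
The Vn.xy naming rules above can be checked with a few lines of C. The helper names below are invented for this sketch; it assumes only what the slide states (total length of 64 or 128 bits) plus the fact that element 0 occupies the least significant bits of the register.

```c
#include <stdbool.h>

/* Element size in bits for an arrangement letter; 0 if the letter is unknown. */
unsigned int element_bits(char size_letter)
{
    switch (size_letter) {
    case 'B': return 8;
    case 'H': return 16;
    case 'S': return 32;
    case 'D': return 64;
    default:  return 0;
    }
}

/* True if "count x size" fills exactly 64 or 128 bits, e.g. 8B, 8H, 4S, 2D. */
bool arrangement_is_valid(unsigned int count, char size_letter)
{
    unsigned int total = count * element_bits(size_letter);
    return total == 64 || total == 128;
}

/* Lowest bit of element `index`; element 0 starts at bit 0. */
unsigned int element_low_bit(char size_letter, unsigned int index)
{
    return index * element_bits(size_letter);
}
```

For the MUL example, element_low_bit('S', 2) gives 64, so V3.S[2] is the 32-bit element in V3[95:64].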

Vectorizing Example 1 - Optimize for code size
- No vectorization
void add_int(int * pa, int * pb, unsigned int n, int x)
{
    unsigned int i;
    for(i = 0; i < n; i++)
        pa[i] = pb[i] + x;
}
Compilation options: armcc --cpu=Cortex-A15 -O3 -Ospace

add_int PROC
        PUSH {r4,r5,lr}
        MOV r4,#0
        B |L0.28|
|L0.12|
        LDR r5,[r1,r4,LSL #2]
        ADD r5,r5,r3
        STR r5,[r0,r4,LSL #2]
        ADD r4,r4,#1
|L0.28|
        CMP r4,r2
        BCC |L0.12|
        POP {r4,r5,pc}
        ENDP

Vectorizing Example 2 - Optimize for speed
- No vectorization
- Penalty of code size increase
void add_int(int * pa, int * pb, unsigned int n, int x)
{
    unsigned int i;
    for(i = 0; i < n; i++)
        pa[i] = pb[i] + x;
}
Compilation options: armcc --cpu=Cortex-A15 -O3 -Otime

add_int PROC
        CMP r2,#0
        BEQ |L0.76|             ; Zero iterations?
        TST r2,#1
        SUB r1,r1,#4
        SUB r0,r0,#4
        BEQ |L0.36|             ; Odd iterations?
        LDR r12,[r1,#4]!
        ADD r12,r12,r3
        STR r12,[r0,#4]!
|L0.36|
        LSRS r2,r2,#1
        BEQ |L0.76|
|L0.44|                         ; Unrolled x2
        LDR r12,[r1,#4]
        SUBS r2,r2,#1
        ADD r12,r12,r3
        STR r12,[r0,#4]
        LDR r12,[r1,#8]!
        ADD r12,r12,r3
        STR r12,[r0,#8]!
        BNE |L0.44|
|L0.76|
        BX lr
        ENDP
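
The structure of the unrolled assembly above can be written back out in C as a sketch. The function name add_int_unrolled2 is invented here, and this is a hand-written approximation of what the compiler produced, not its actual intermediate form: handle zero iterations, peel one element if the count is odd, then process two elements per loop pass.

```c
void add_int_unrolled2(int *pa, const int *pb, unsigned int n, int x)
{
    unsigned int pairs;

    if (n == 0)                 /* "Zero iterations?" */
        return;

    if (n & 1u)                 /* "Odd iterations?": do one element first */
        *pa++ = *pb++ + x;

    /* "Unrolled x2": two elements per pass, half as many branches. */
    for (pairs = n >> 1; pairs != 0; pairs--) {
        pa[0] = pb[0] + x;
        pa[1] = pb[1] + x;
        pa += 2;
        pb += 2;
    }
}
```

The result is the same as the simple loop; the gain is fewer loop-control instructions per element, paid for with the extra code the slide notes.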
Vectorizing Example 3 - Optimize for speed
- Number of iterations is a multiple of 4
- No vectorization
void add_int(int *pa, int * pb, unsigned int n, int x)
{
    unsigned int i;
    for(i = 0; i < (n&~3); i++)     /* Divisible by 4 */
        pa[i] = pb[i] + x;
}
Compilation options: armcc --cpu=Cortex-A15 -O3 -Otime

add_int PROC
        BICS r12,r2,#3
        BEQ |L0.56|
        LSR r2,r2,#2
        SUB r1,r1,#4
        SUB r0,r0,#4
        LSL r2,r2,#1
|L0.24|
        LDR r12,[r1,#4]
        SUBS r2,r2,#1
        ADD r12,r12,r3
        STR r12,[r0,#4]
        LDR r12,[r1,#8]!
        ADD r12,r12,r3
        STR r12,[r0,#8]!
        BNE |L0.24|
|L0.56|
        BX lr
        ENDP
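
The loop bound n&~3 in this example clears the two lowest bits of n, which rounds it down to a multiple of 4. A tiny C sketch makes the effect visible; the function name is invented for illustration.

```c
/* Round n down to the nearest multiple of 4 by clearing its two lowest bits. */
unsigned int round_down_to_multiple_of_4(unsigned int n)
{
    return n & ~3u;
}
```

So a count of 10 becomes 8 and a count of 3 becomes 0. Because the compiler can see the trip count is always divisible by 4, it no longer needs the zero-iteration and odd-iteration checks from the previous example.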

Vectorizing Example 4 - Optimize for speed
- Vectorization
void add_int(int * pa, int * pb, unsigned int n, int x)
{
    unsigned int i;
    for(i = 0; i < n; i++)
        pa[i] = pb[i] + x;
}
Compilation options: armcc --cpu=Cortex-A15 -O3 -Otime --vectorize

add_int PROC
        PUSH {r4-r6}
        SUB r4,r0,r1            ; Overlap?
        ASR r12,r4,#2
        CMP r12,#0
        BLE |L0.32|
        BIC r12,r2,#3
        CMP r12,r4,ASR #2
        BHI |L0.76|
|L0.32|
        BICS r12,r2,#3
        BEQ |L0.68|
        VDUP.32 q0,r3
        LSR r2,r2,#2
|L0.48|                         ; Vector version
        VLD1.32 {d2,d3},[r1]!
        VADD.I32 q1,q1,q0
        VST1.32 {d2,d3},[r0]!
        SUBS r2,r2,#1
        BNE |L0.48|
|L0.68|
        POP {r4-r6}
        BX lr
|L0.76|
        BIC r2,r2,#3
        CMP r2,#0
        MOVNE r2,#0
        BEQ |L0.68|
        BLS |L0.68|
|L0.96|                         ; Scalar version
        LDR r6,[r1,r2,LSL #2]
        ADD r5,r1,r2,LSL #2
        ADD r6,r6,r3
        STR r6,[r0,r2,LSL #2]
        LDR r5,[r5,#4]
        ADD r4,r0,r2,LSL #2
        ADD r2,r2,#2
        ADD r5,r5,r3
        CMP r12,r2
        STR r5,[r4,#4]
        BHI |L0.96|
        POP {r4-r6}
        BX lr
        ENDP
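
The "Overlap?" test at the top of this listing exists because the compiler cannot prove that pa and pb point to separate arrays. The following C sketch shows the kind of run-time check involved; it is an approximation written for this note, not a transcription of the assembly, and the function name is invented. Pointers are compared as integers to avoid relying on relational comparison of unrelated pointers.

```c
#include <stdint.h>

/* Returns 1 if a store to pa[i] could change a value that the loop
 * pa[i] = pb[i] + x still has to load from pb later on, 0 otherwise. */
int stores_may_feed_later_loads(const int *pa, const int *pb, unsigned int n)
{
    uintptr_t a = (uintptr_t)pa;
    uintptr_t b = (uintptr_t)pb;

    /* Destination at or below the source: stores only touch elements
     * that have already been read. */
    if (a <= b)
        return 0;

    /* Destination starts inside the n elements still to be read. */
    return (a - b) / sizeof(int) < n;
}
```

When the check reports a possible dependence, loading four elements at once would read values that the scalar loop would already have overwritten, so the generated code falls back to the scalar version.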

Vectorizing Example 5
- Optimize for speed
- Vectorization
- Number of loop iterations is a multiple of 4
- Non-overlapping pointers, use of “restrict”
void add_int(int* restrict pa, int* restrict pb, unsigned int n, int x)
{
    unsigned int i;
    for(i = 0; i < (n & ~3); i++)
        pa[i] = pb[i] + x;
}
Compilation options: armcc --cpu=Cortex-A15 -O3 -Otime --vectorize --c99

add_int PROC
        BICS r12,r2,#3
        BEQ |L0.36|
        VDUP.32 q0,r3
        LSR r2,r2,#2
|L0.16|
        VLD1.32 {d2,d3},[r1]!
        VADD.I32 q1,q1,q0
        VST1.32 {d2,d3},[r0]!
        SUBS r2,r2,#1
        BNE |L0.16|
|L0.36|
        BX lr
        ENDP
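
To show what the vector loop above is doing, here is a scalar C model that processes the data in groups of four, mirroring the load, add and store instructions. The function name add_int_by4 and the q1 array are invented for this sketch; it models the behaviour only and would not itself run any faster than the plain loop.

```c
void add_int_by4(int *restrict pa, const int *restrict pb, unsigned int n, int x)
{
    /* BICS / LSR: number of complete 4-element groups. */
    unsigned int groups = (n & ~3u) >> 2;
    int q1[4];
    int lane;

    while (groups-- != 0) {
        for (lane = 0; lane < 4; lane++)    /* VLD1.32: load four ints */
            q1[lane] = pb[lane];

        for (lane = 0; lane < 4; lane++)    /* VADD.I32: add x in every lane */
            q1[lane] += x;                  /* (x was broadcast by VDUP.32) */

        for (lane = 0; lane < 4; lane++)    /* VST1.32: store four ints */
            pa[lane] = q1[lane];

        pa += 4;                            /* the "!" post-increments */
        pb += 4;
    }
}
```

Elements beyond the last complete group of four are left untouched, which matches the (n & ~3) loop bound in the source.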

ARM Technical Training
 Available
 Covering all ARM cores and GPUs
 Hardware and Software options
 Customizable agendas from 2 hours (remote only) to 5 days
 Accessible
 Private, onsite delivery for corporate customers
 Public course schedule for open enrolment
 Live Remote webex for flexible delivery
 Affordable
 Standard private course fees include all ARM’s related expenses
 Public course pricing accessible to individuals
 Live Remote is cost effective for small teams
 Learn from the experts
http://www.arm.com/training

For more information visit the
Mali Developer Centre:
http://malideveloper.arm.com
• Revisit this talk in PDF and audio
format post event
• Download tools and resources

Thank you

The trademarks featured in this presentation are registered and/or unregistered trademarks of ARM Limited (or its
subsidiaries) in the EU and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their
respective owners.
Copyright © 2015 ARM Limited
