
ARM Processor

History

ARM: Advanced RISC Machine

ARM was developed at Acorn Computers Ltd. of Cambridge, England (1983-85).
o RISC concept introduced in 1980 at Stanford and Berkeley.
o Key design goal was to build a compact RISC CPU with low-latency I/O
(interrupt) handling, which is required for embedded systems in real-time
environments.
o Led by Sophie Wilson and Steve Furber.
ARM Limited was founded in 1990.
ARM cores
o Licensed partners to develop and fabricate new controllers.
o Soft core.
ARM provides intellectual property (IP cores).
Soft intellectual property:
ARM provides the IP to licensees, along with the synthesis flows that allow the
partner to synthesise the processor to their technology.
RTL and synthesis flow.
GDSII layout.

Note - Code Density: Space taken up in memory by the executable program.

ARM Nomenclature

ARM family
ARM [x][y][z][T][D][M][I][E][J][F][S][Z]

x Family the core belongs to
y Memory management/protection unit
z Cache available in the processor
T Thumb mode
D Debug interface
M Multiplication unit (multiplier)
I ICE (in-circuit emulation) logic
- Provides in-circuit emulation.
- Combination of software and hardware units.
- Used to set watchpoints and breakpoints.
E DSP enhancements
- If only E is present, [T], [D] and [I] are assumed to be present as well.
J Java extension
- Jazelle (Java runtime environment)
F Floating point extension (co-processor)
- If floating point is present, up to 15 co-processors can be attached.
S Synthesizable core
- An FPGA can be used for the hardware implementation.
- The core is supplied as source code that EDA tools can compile, so the
cache size or other hardware options can be selected.
- Maps the design onto an IC.
Z TrustZone
- Secures a system connected to the internet.
- Applications run in two systems: one is the OS (non-secure) and the other
is the kernel (secure).
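
The suffix letters listed above can be read off a core name mechanically. The following is a minimal sketch in Python, assuming names of the form ARM[digits][letters] with an optional "-S" style tail; the function name decode_core_name and the lookup table are illustrative only, and real ARM part names are less regular than this.

```python
import re

# Suffix meanings as given in the nomenclature list above.
SUFFIXES = {
    "T": "Thumb mode",
    "D": "Debug interface",
    "M": "Multiplier",
    "I": "In-circuit emulation logic",
    "E": "DSP enhancements",
    "J": "Java extension (Jazelle)",
    "F": "Floating point extension",
    "S": "Synthesizable core",
    "Z": "TrustZone",
}

def decode_core_name(name):
    """Split a name such as 'ARM7TDMI' into its digits and feature letters."""
    match = re.fullmatch(r"ARM(\d+)([A-Z]*)(?:-([A-Z]+))?", name.upper())
    if match is None:
        raise ValueError(f"not in ARM[x][y][z][letters] form: {name}")
    digits, letters, tail = match.groups()
    # Unknown letters are skipped rather than guessed at.
    features = [SUFFIXES[c] for c in letters + (tail or "") if c in SUFFIXES]
    return digits, features

print(decode_core_name("ARM7TDMI"))
print(decode_core_name("ARM926EJ-S"))
```

The sketch does not apply the rule that E implies T, D and I; that would be a small extension of the feature list.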

ARM VERSIONS

Architecture Versions
(a) Ver. 1 (1983-85): 26-bit addressing (no multiply or co-processor).
(b) Ver. 2: adds 32-bit result multiply and co-processor support (32-bit data
bus and sixteen 32-bit registers). Simplest useful 32-bit microprocessor in
the world.
(c) Ver. 3: 32-bit addressing (4 Gbyte address space), cache memory, and
co-processor 15 for the cache control registers.
(d) Ver. 4: adds signed and unsigned halfword and signed byte load and store
instructions.
(e) Ver. 4T: 16-bit Thumb compressed form of the instruction set introduced.
(f) Ver. 5T: superset of 4T, adding new instructions.
(g) Ver. 5TE: adds signal processing instruction extensions.

ARM6
Separate CPSR/SPSR, undefined instruction and abort modes, MMU support
Virtual memory (extending RAM)
In the late 1980s Apple Computer and VLSI Technology started working with
Acorn on newer versions of the ARM core.
ARM7 ARM7TDMI
Von Neumann architecture (8k cache)
32-bit embedded processor, MMU, 3-stage pipeline
Strong ARM (power saving)
Broadcom (BCM 2121 processor)
Built in GPS protocol
ARM9E-S
More instructions for state change
Enhanced multiplier, DSP instructions, fast MAC
ARM9
Offered Harvard architecture
Offered 5 stage pipeline
ARM9TDMI

ARM9E
1997

ARM9E-S
Thumb/ARM enhanced
DSP instruction
Fast MAC
ARM10E
1999
X-Scale (Intel), v5TE
Runs at 1 GHz; MMU, Harvard architecture.
ARM11
2003
Multiprocessor instructions
Multimedia instruction
Cortex

Pipelining Concept
ARM7 uses 3-stage pipelining
3-stage pipeline
a) 1st stage, cycle 1: fetch 1st instruction
b) 2nd stage, cycle 2: fetch 2nd instruction and decode 1st instruction
c) 3rd stage, cycle 3: fetch 3rd instruction, decode 2nd instruction and
execute 1st instruction
ARM9
5-stage pipeline
Cycle 1: fetch 1st instruction
Cycle 2: fetch 2nd instruction, decode 1st instruction
Cycle 3: fetch 3rd instruction, decode 2nd instruction and execute 1st
instruction
Cycle 4: buffer data (access data memory or buffer)
Cycle 5: write back to register file
OR
a) Fetch
b) Decode
c) Execute
d) Buffer data
e) Write back

Example:

TIME (cycle)   fetch   decode   execute
1              ADD
2              SUB     ADD
3              CMP     SUB      ADD
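
The occupancy table in the example can be generated programmatically. This is a minimal sketch in Python of an ideal pipeline with no stalls or branches; the function name pipeline_table is illustrative and the stage names default to the 3-stage pipeline described above.

```python
def pipeline_table(instructions, stages=("fetch", "decode", "execute")):
    """Return one row per cycle showing which instruction is in each stage."""
    rows = []
    # The last instruction needs len(stages) - 1 extra cycles to drain.
    total_cycles = len(instructions) + len(stages) - 1
    for cycle in range(total_cycles):
        row = {}
        for depth, stage in enumerate(stages):
            # An instruction reaches stage number 'depth' that many cycles
            # after it was fetched.
            index = cycle - depth
            in_range = 0 <= index < len(instructions)
            row[stage] = instructions[index] if in_range else None
        rows.append(row)
    return rows

for number, row in enumerate(pipeline_table(["ADD", "SUB", "CMP"]), start=1):
    print(number, row)
```

Passing a 5-tuple of stage names (fetch, decode, execute, buffer data, write back) models the ARM9 pipeline in the same way.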

Advantages:
Once the pipeline is full, an instruction completes every cycle.
As pipeline length increases, the amount of work done at each stage
decreases, and hence the processor attains a higher operating frequency.
System latency also increases, however, because it takes time to fill the
pipeline before execution can start.

Disadvantages
As the length of the pipeline increases, data dependency between stages
also increases.
Data dependency can be reduced by using instruction scheduling.
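
The trade-off between throughput and fill latency can be put into numbers. The sketch below is a simplified Python model, assuming one instruction issues per cycle once the pipeline is full; the function name total_cycles and the stall_cycles parameter (standing in for cycles lost to data dependencies) are hypothetical.

```python
def total_cycles(n_instructions, n_stages, stall_cycles=0):
    """Cycles needed to complete n instructions on an n_stages-deep pipeline."""
    fill = n_stages - 1  # cycles before the first instruction completes
    return fill + n_instructions + stall_cycles

# A deeper pipeline takes longer to fill, but the cost is amortised
# over a long instruction stream.
print(total_cycles(100, 3))       # 3-stage pipeline
print(total_cycles(100, 5))       # 5-stage pipeline
print(total_cycles(100, 5, 10))   # 5-stage with 10 cycles of stalls
```

A good instruction schedule reduces stall_cycles; it cannot remove the fill cost.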

ARM architecture
Harvard architecture compared with Von Neumann architecture

1. Harvard: used in DSPs and other processors found in the latest embedded
systems and mobile communication systems, and in audio, speech and image
processing systems.
Von Neumann: used in conventional processors found in PCs and servers, and
in embedded systems with only control functions.

2. Harvard: the data and program memories are separate.
Von Neumann: the data and program are stored in the same memory.

3. Harvard: the code is executed in parallel.
Von Neumann: the code is executed serially and takes more clock cycles.

4. Harvard: it has a MAC unit (multiply-accumulate).
Von Neumann: there is no exclusive multiplier.

5. Harvard: a barrel shifter helps in shifting and rotating operations on
the data.
Von Neumann: no barrel shifter is available.

6. Harvard: the program tends to grow big in size.
Von Neumann: the program can be optimized to a smaller size.

e.g. Harvard: ARM9EJ
e.g. Von Neumann: ARM7TDMI

Some versions of ARM

1) ARM9TDMI
Dhrystone MIPS/MHz (Dhrystone is a benchmark of processor performance)
It has a 5-stage pipeline
Simultaneous access to instruction and data memory
It offers Harvard architecture
Increases available memory bandwidth
Instruction memory interface
Data memory interface
2) ARM7TDMI
Mostly used as the processor in cell phones
Broadcom (BCM) chip: single-chip GSM/GPRS baseband processor
ARM7TDMI (RISC) side: memory write interface, voice recording and
recognition, services, GSM/GPRS protocol stack
OAK (DSP) side: signal processing, echo cancellation, speech algorithms
and noise suppression, equalization
OMAP, used in Nokia series phones, consists of an ARM and a DSP

3) ARM11
ARM1136J-S
8-stage pipeline incorporating separate load-store and arithmetic
pipelines
SIMD extension
Vector floating point for fast floating-point operations
4) X-Scale
Harvard architecture
Works at 1 GHz
Separate co-processor for extensions
Implements the ARMv5TE instruction set

ARM architecture
RISC (Reduced Instruction Set Computer)

1) Fixed instruction length (32 bit)
2) Mostly single-cycle instructions
3) Large uniform register file
4) Load-store architecture
5) Good speed/power consumption ratio

Enhanced RISC
1) Control over ALU and shifter
2) Load-store multiple instructions, which maximize data throughput
3) Auto-increment and auto-decrement modes (for loops)
4) Conditional execution of most instructions
5) Sequential memory access
6) Multiple register transfer
7) Inline barrel shifter
8) Thumb mode (16-bit) instructions
9) Enhanced instructions (multiplier/saturation arithmetic)
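
The inline barrel shifter means one operand can be shifted or rotated as part of the same instruction that uses it. The Python sketch below models that on 32-bit values; the helper names lsl, ror and add_shifted are illustrative, and the model covers only the arithmetic result, not flags or timing.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide

def lsl(value, amount):
    """Logical shift left, discarding bits shifted out of 32 bits."""
    return (value << amount) & MASK32

def ror(value, amount):
    """Rotate right within 32 bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32

def add_shifted(rn, rm, shift_amount):
    """Model of ADD Rd, Rn, Rm, LSL #shift_amount: shift and add in one step."""
    return (rn + lsl(rm, shift_amount)) & MASK32

# r0 = r1 + (r2 << 2), e.g. indexing an array of 4-byte words
print(hex(add_shifted(0x1000, 3, 2)))
```

On a core without an inline shifter the same computation would need a separate shift instruction before the add.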


ARM Registers
General-purpose registers (GPRs) hold either data or an address
All registers are 32 bit
Up to 16 data registers and 2 status registers (CPSR and SPSR) are active
at a time; in user mode only the CPSR is visible
Data registers: r0 to r15
3 registers, R13, R14 and R15, perform special functions.
R13 -> SP (stack pointer)
R14 -> LR (link register)
R15 -> PC (program counter)

Link Register
1) The return address is held in the link register, not stored on the stack.
2) If it were stored on the stack, the call would take more time to
complete, which is not acceptable in an embedded system (ES).
3) It holds the return address when a subroutine is called.
4) It increases the processor speed.
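
The role of the link register in a subroutine call can be sketched with a toy register file. This Python model is an illustration only: the class name Core and its methods are hypothetical, the PC is treated as the address of the current instruction (the real pipeline offset is ignored), and instructions are assumed to be 4 bytes (ARM state).

```python
LR, PC = 14, 15  # r14 is the link register, r15 the program counter

class Core:
    """Toy register file; only what a subroutine call needs is modelled."""

    def __init__(self):
        self.r = [0] * 16

    def bl(self, target):
        # Branch with link: keep the address of the next instruction in a
        # register, then jump. No memory (stack) access is involved.
        self.r[LR] = self.r[PC] + 4
        self.r[PC] = target

    def ret(self):
        # Equivalent of MOV pc, lr: resume just after the call site.
        self.r[PC] = self.r[LR]

core = Core()
core.r[PC] = 0x100
core.bl(0x400)
print(hex(core.r[LR]), hex(core.r[PC]))
core.ret()
print(hex(core.r[PC]))
```

A nested call would overwrite r14, so a subroutine that calls another one still has to save the link register first, typically on the stack.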
Status Register
1) CPSR (current program status register)
2) 32-bit register
3) Holds the Thumb state bit, which can be set or reset.
4) Holds the processor mode, which can be changed, etc.

Status register fields
1) Flag
2) Status
3) Extension
4) Control
5) Mode (5 bit)

Status register bits
1) Condition flags (bits 31 to 28)
N -> negative flag
Z -> zero flag
C -> carry flag
V -> overflow flag
2) Sticky overflow (Q flag)
Q -> related to saturation overflows
3) Interrupt flags (bits 7 and 6)
I=1 disables IRQ (interrupt request)
F=1 disables FIQ (fast interrupt request)
4) State bit (bit 5, T bit)
T=0 -> ARM mode -> 32-bit instructions
T=1 -> Thumb mode -> 16-bit instructions
5) Mode (5 bits, bits 4 to 0)
Processor mode, which controls access privilege: the privileged modes and
the non-privileged (user) mode
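
The bit layout above can be checked by decoding a CPSR value. The following Python sketch is a minimal illustration; the function name decode_cpsr is hypothetical, the Q flag is assumed to sit at bit 27, and the mode encodings in MODES come from the classic ARM architecture rather than from these notes.

```python
# Mode field encodings (bits 4 to 0); assumed from the ARM architecture,
# not listed in the notes above.
MODES = {
    0b10000: "user",
    0b10001: "fiq",
    0b10010: "irq",
    0b10011: "supervisor",
    0b10111: "abort",
    0b11011: "undefined",
    0b11111: "system",
}

def bit(value, position):
    """Return bit 'position' of value as 0 or 1."""
    return (value >> position) & 1

def decode_cpsr(cpsr):
    """Break a 32-bit CPSR value into the fields described above."""
    return {
        "N": bit(cpsr, 31),
        "Z": bit(cpsr, 30),
        "C": bit(cpsr, 29),
        "V": bit(cpsr, 28),
        "Q": bit(cpsr, 27),
        "irq_disabled": bool(bit(cpsr, 7)),
        "fiq_disabled": bool(bit(cpsr, 6)),
        "state": "Thumb" if bit(cpsr, 5) else "ARM",
        "mode": MODES.get(cpsr & 0x1F, "reserved"),
    }

print(decode_cpsr(0x600000D3))
```

The sample value has Z and C set, both interrupts disabled, ARM state and supervisor mode.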

ARM data flow model

[Figure: ARM core, Harvard architecture (data flow representation). Blocks
shown: instruction decoder, sign extend, register file (r0-r15, 32 bit), PC,
barrel shifter, MAC, ALU, address register and incrementer, with a code (read)
path, a data (read/write) path and a result path back to Rd.]

ARM conditional Mnemonics

Mnemonic   Name                            Condition flags
EQ         Equal                           Z
NE         Not equal                       z
CS         Carry set                       C
CC         Carry clear                     c
MI         Minus                           N
PL         Plus                            n
VS         Overflow                        V
VC         No overflow                     v
HI         Unsigned higher                 zC
LS         Unsigned lower or same          Z or c
GE         Signed greater than or equal    NV or nv
LT         Signed less than                Nv or nV
GT         Signed greater than             NzV or nzv
LE         Signed less than or equal       Z or Nv or nV
AL         Always                          ignored

(An upper-case letter means the flag is set; a lower-case letter means it is
clear.)

CACHE & TCM (tightly coupled memory)

A cache is a block of fast memory placed between main memory and the core.
It allows for more efficient fetches from some memory types.
The cache increases performance.
It has two forms.
1) Von Neumann architecture (one cache for both instructions and data)
2) Harvard architecture (separate instruction and data caches)
Disadvantages
A cache requires a good predictor: when the core fetches from main memory,
the fetched contents are stored in the available cache memory and a tag is
attached.
When the processor later requires an instruction and it is in the cache,
this is called a tag hit; if it is not there, it is called a tag miss.
A good cache has a high tag-hit rate.
Prediction
For this problem we use the LRU (least recently used) method.
It throws out the least recently used data and keeps the more frequently
used data in the cache.
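
The tag hit, tag miss and LRU behaviour described above can be simulated in a few lines. This Python sketch assumes a fully associative cache that stores only tags; the class name LRUCache and its counters are illustrative, and real ARM caches are set-associative with their own replacement policies.

```python
from collections import OrderedDict

class LRUCache:
    """Tag store with least-recently-used replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def access(self, tag):
        """Return True on a tag hit, False on a tag miss."""
        if tag in self.lines:
            self.hits += 1
            self.lines.move_to_end(tag)  # mark as most recently used
            return True
        self.misses += 1
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used
        self.lines[tag] = True
        return False

cache = LRUCache(capacity=2)
for tag in [1, 2, 1, 3, 2]:
    cache.access(tag)
print(cache.hits, cache.misses)
```

The hit rate depends entirely on the access pattern, which is why a cache improves average performance but not worst-case timing.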

A cache provides an overall increase in performance, but at the expense of
predictable execution.
Real-time systems, however, require code execution to be DETERMINISTIC:
the time taken for loading and storing instructions or data must be
predictable. This can be achieved by using a form of memory called tightly
coupled memory (TCM).
TCM is fast SRAM located close to the core, and it guarantees the clock
cycles required to fetch instructions or data. This is critical for
real-time systems, which require deterministic code execution.
TCM appears as memory in the address map and can be accessed as fast
memory (used for predictable real-time behaviour).
By combining both technologies, an ARM processor can have both improved
performance and predictable real-time response.

EMBEDDED DEVICE (ARM Based)


Bus scheduling
Address of device/memory
Control of both read and write accesses.

AMBA bus protocol

The Advanced Microcontroller Bus Architecture (AMBA) is the most popular bus
for the ARM core.
AMBA first introduced the ARM System Bus (ASB) and the ARM Peripheral Bus
(APB).
Later on, ARM introduced another bus design called the ARM High Performance
Bus (AHB).

ASB
It is used to interface all memory units, the core and other units.
It is used for system purposes.

APB
This bus interface is generally used for I/O devices, such as an Ethernet
card, keyboard or pad, etc.

AHB
It provides higher data throughput than ASB because it is based on a
centralized multiplexed bus scheme rather than the ASB bidirectional design.

Banked registers

[Figure: register set R0 to R15, with a second column of banked registers in
the range R8 to R14.]
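
Register banking can be modelled as a lookup from (mode, register number) to a physical register. The Python sketch below is an illustration under stated assumptions: the mode names and the set of banked registers per mode follow the classic ARM exception model and are not spelled out in these notes, and the function name physical_register is hypothetical.

```python
# Registers that have a private (banked) copy in each mode. Assumed from the
# classic ARM exception model; the notes above show only the register list.
BANKED = {
    "user": set(),
    "system": set(),            # shares the user-mode registers
    "fiq": set(range(8, 15)),   # r8 to r14
    "irq": {13, 14},
    "supervisor": {13, 14},
    "abort": {13, 14},
    "undefined": {13, 14},
}

def physical_register(mode, number):
    """Name of the physical register behind r<number> in the given mode."""
    if not 0 <= number <= 15:
        raise ValueError("register number must be 0..15")
    if number in BANKED[mode]:
        return f"r{number}_{mode}"
    return f"r{number}"

print(physical_register("fiq", 8))         # banked copy
print(physical_register("irq", 8))         # shared with user mode
print(physical_register("supervisor", 13)) # private stack pointer
```

Because FIQ mode has its own r8 to r14, a fast interrupt handler can start work without first saving those registers.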

Modes of ARM
1) ARM mode
2) Thumb mode

Thumb mode

16-bit instructions
High code density
(code density is the space taken up in memory by an executable program)

Instructions in Thumb mode

ADD R0, #3

Thumb-2 mode
It is a combined form of Thumb mode and ARM mode
Compatible with both

Now let us see how to switch
1) User mode
As we know, in user mode we cannot modify the CPSR register
Switch by using a branch instruction, BX or BLX, with a target register
Bit 0 of the target register = 1 -> Thumb mode
Bit 0 of the target register = 0 -> ARM mode
2) Privileged mode
We can access the CPSR register in this mode
Just set or reset the T bit in the CPSR register.

Bit       Operation result
T set     Thumb mode
T reset   ARM mode
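
Both switching mechanisms come down to one bit. The Python sketch below models them on plain integers: bit 0 of a BX/BLX target register selects the state, and bit 5 of the CPSR (the T bit) records it. The function names branch_exchange and with_state are hypothetical, and the model says nothing about which instructions are permitted to write the T bit on a real core.

```python
T_BIT = 1 << 5  # state bit of the CPSR

def branch_exchange(target):
    """Model of BX Rm: return (new state, branch address)."""
    state = "Thumb" if target & 1 else "ARM"
    # Bit 0 only selects the state; it is not part of the address.
    address = target & 0xFFFFFFFE
    return state, address

def with_state(cpsr, thumb):
    """Return the CPSR value with the T bit set (Thumb) or cleared (ARM)."""
    if thumb:
        return cpsr | T_BIT
    return cpsr & ~T_BIT & 0xFFFFFFFF

print(branch_exchange(0x8001))
print(branch_exchange(0x8000))
print(hex(with_state(0x10, thumb=True)))
```

This is why Thumb function addresses are usually written with bit 0 set when they are loaded into a register for BX.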

ARM Technologies

[Figure: ARM technologies: TrustZone, DSP, Intelligent Energy Management.]

Intelligent Energy Management

OS support
o Standards are available for power saving
Monitors all the devices, on the assumption that any device may turn ON or
OFF at any point in time

TRUST ZONE
Consists of 2 OSes
1) Simple OS
2) Kernel
The simple OS is the non-secure process and the kernel is the secure process.
E.g. a device connected to the internet can be corrupted by anyone. To
resist this we introduce a secure zone, the TrustZone.

[Figure: a monitor sits between the non-secure user (OS) side, which runs
application programs, and the secure kernel side, which runs security
application programs.]

Both are orthogonal to each other.

DSP enhancement

MAC unit (multiply-accumulate unit)

Options of 32x16, 16x16 and 32x32 multiplies
16 instructions
Saturation arithmetic
Load instructions
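
Saturation arithmetic and the multiply-accumulate operation can both be illustrated on plain integers. The Python sketch below is a simplified model: qadd clamps a signed 32-bit sum and reports whether saturation occurred (the event the sticky Q flag records), and mac16 accumulates a 16x16 product. The function names are illustrative, and accumulator overflow in mac16 is deliberately ignored.

```python
INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)

def qadd(a, b):
    """Saturating signed 32-bit add; returns (result, saturated)."""
    total = a + b
    if total > INT32_MAX:
        return INT32_MAX, True   # clamp instead of wrapping around
    if total < INT32_MIN:
        return INT32_MIN, True
    return total, False

def mac16(acc, x, y):
    """16x16 multiply-accumulate: acc + x * y (overflow not modelled)."""
    return acc + x * y

# Dot product of two short sample vectors, as in a digital filter.
acc = 0
for x, y in zip([1, 2, 3], [4, 5, 6]):
    acc = mac16(acc, x, y)
print(acc)
print(qadd(INT32_MAX, 1))
```

Clamping matters in audio and signal processing, where wrap-around on overflow would turn a loud sample into one of the opposite sign.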