
Day 10 Agenda

 Exceptions
 System Design
 Memory Interface
 Synchronization
 Input / Output

39v10 The ARM Architecture TM


Exception Handling

 When an exception occurs, the ARM:
 Copies CPSR into SPSR_<mode>
 Sets appropriate CPSR bits
 Change to ARM state
 Change to exception mode
 Disable interrupts (if appropriate)
 Stores the return address in LR_<mode>
 Sets PC to vector address
 To return, exception handler needs to:
 Restore CPSR from SPSR_<mode>
 Restore PC from LR_<mode>
This can only be done in ARM state.

Vector Table
0x1C FIQ
0x18 IRQ
0x14 (Reserved)
0x10 Data Abort
0x0C Prefetch Abort
0x08 Software Interrupt
0x04 Undefined Instruction
0x00 Reset
Vector table can be at 0xFFFF0000 on ARM720T and on ARM9/10 family devices.
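
The entry sequence above can be sketched as a small Python model. This is only an illustration, not how the core is implemented: the function name `enter_exception` and the dictionary layout are assumptions made for this sketch, the mode bits and vector offsets are the ones listed on these slides, and the CPSR bit positions (I = bit 7, F = bit 6, T = bit 5) are the standard ARM ones.

```python
# Vector offsets and mode bits as given on the slides.
VECTOR_OFFSET = {
    "reset": 0x00, "undef": 0x04, "swi": 0x08, "pabt": 0x0C,
    "dabt": 0x10, "irq": 0x18, "fiq": 0x1C,
}
MODE_BITS = {
    "reset": 0b10011, "undef": 0b11011, "swi": 0b10011, "pabt": 0b10111,
    "dabt": 0b10111, "irq": 0b10010, "fiq": 0b10001,
}
# CPSR control bit positions: I = bit 7, F = bit 6, T = bit 5.
I_BIT, F_BIT, T_BIT = 1 << 7, 1 << 6, 1 << 5
MODE_MASK = 0x1F


def enter_exception(kind, cpsr, return_address, high_vectors=False):
    """Return the state the core would hold after taking exception `kind`."""
    new_cpsr = (cpsr & ~MODE_MASK) | MODE_BITS[kind]  # change to exception mode
    new_cpsr &= ~T_BIT                                # change to ARM state
    new_cpsr |= I_BIT                                 # disable IRQs
    if kind in ("reset", "fiq"):
        new_cpsr |= F_BIT                             # also disable FIQs
    base = 0xFFFF0000 if high_vectors else 0x00000000
    return {
        "spsr": cpsr,                    # old CPSR saved in SPSR_<mode>
        "cpsr": new_cpsr,
        "lr": return_address,            # saved in LR_<mode>
        "pc": base + VECTOR_OFFSET[kind],
    }


# An IRQ taken from User mode (mode bits 0b10000) in ARM state.
state = enter_exception("irq", cpsr=0x10, return_address=0x8008)
print(hex(state["cpsr"]), hex(state["pc"]))  # 0x92 0x18
```

Returning from the handler is the reverse: the CPSR is reloaded from the saved `spsr` value and the PC from the saved `lr` value, adjusted as described for each exception type.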

PSR Mode Bit Values

Normal and High Vector Address

Reset

 When the nRESET signal goes LOW, the core abandons the executing instruction and continues to fetch instructions from incrementing word addresses.
 When nRESET goes HIGH again, the core:
 Overwrites R14_svc and SPSR_svc by copying the current values of the PC and CPSR into them. The value of the saved PC and SPSR is not defined.
 Forces M[4:0] to 10011 (Supervisor mode), sets the I and F bits in the CPSR, and clears the CPSR's T bit.
 Forces the PC to fetch the next instruction from address 0x00.
 Resumes execution in ARM state.

Undefined Exception
 When the core comes across an instruction which it cannot handle, it takes the undefined instruction trap.
 This mechanism may be used to extend either the THUMB or ARM instruction set by software emulation.
 R14_und = Address of the next instruction after the undefined instruction
 SPSR_und = CPSR
 CPSR[4:0] = 0b11011 (Mode bits forced to Undefined mode)
 CPSR[T,IRQ] = 0b01 (ARM State, and Disable IRQs)
 Forces the PC to fetch the next instruction from address 0x04 or 0xFFFF0004
 After emulating the failed instruction, the trap handler should execute the following irrespective of the state (ARM or Thumb)
 CPSR = SPSR_und
 MOVS PC,R14_und (This restores the CPSR and returns to the instruction following the undefined instruction)

Software Interrupts
Instruction encoding:
Bits 31-28: Cond (Condition Field)
Bits 27-24: 1 1 1 1
Bits 23-0: SWI number (ignored by processor)
 The software interrupt instruction (SWI) is used for entering Supervisor mode,
usually to request a particular supervisor function.
 R14_svc = Address of next instruction after the SWI instruction
 SPSR_svc = CPSR
 CPSR[4:0] = 0b10011
 CPSR[T,IRQ] = 0b01 (ARM State, and Disable IRQs)
 Forces the PC to fetch the next instruction from address 0x08 or
0xFFFF0008
 Upon Exiting SWI
 CPSR = SPSR_svc
 MOVS PC,R14_svc (This restores the PC and CPSR, and returns to
the instruction following the SWI)

Pre-fetch Abort Instruction
 If a pre-fetch abort occurs, the pre-fetched instruction is marked as
invalid, but the exception will not be taken until the instruction reaches
the head of the pipeline. If the instruction is not executed - for example
because a branch occurs while it is in the pipeline - the abort does not
take place.
 R14_abt = Address of aborted instruction + 4
 SPSR_abt = CPSR
 CPSR[4:0] = 0b10111
 CPSR[T,IRQ] = 0b01 (ARM State, and Disable IRQs)
 Forces the PC to fetch the next instruction from address 0x0C
or 0xFFFF000C
 Upon Exiting Pre-Fetch Abort
 CPSR = SPSR_abt
 SUBS PC,R14_abt, #4 (This restores the PC and CPSR, and returns to the aborted instruction so that it can be fetched again)

Data Abort

 If a data abort occurs, the action taken depends on the instruction type:
 Single data transfer instructions (LDR, STR) write back modified
base registers: the Abort handler must be aware of this.
 The swap instruction (SWP) is aborted as though it had not been
executed.
 Block data transfer instructions (LDM, STM) complete.
 If write-back is set, the base is updated.
 If the instruction would have overwritten the base with data (i.e. it has the base in the transfer list), the overwriting is prevented.
 All register overwriting is prevented after an abort is indicated,
which means in particular that R15 (always the last register to be
transferred) is preserved in an aborted LDM instruction.
 The abort mechanism allows the implementation of a demand
paged virtual memory system. In such a system the processor is
allowed to generate arbitrary addresses. When the data at an
address is unavailable, the Memory Management Unit (MMU)
signals an abort.

Data Abort

 The abort handler must then work out the cause of the abort, make the requested data available, and retry the aborted instruction. The application program needs no knowledge of the amount of memory available to it, nor is its state in any way affected by the abort.
 Entering Data Abort
 R14_abt = Address of aborted instruction + 8
 SPSR_abt = CPSR
 CPSR[4:0] = 0b10111
 CPSR[T,IRQ] = 0b01 (ARM State, and Disable IRQs)
 Forces the PC to fetch the next instruction from address 0x10 or
0xFFFF0010
 Upon Exiting Data Abort
 CPSR = SPSR_abt
 SUBS PC,R14, #8 (This restores the PC and CPSR, and re-executes the
aborted instruction)
 SUBS PC,R14, #4 (This restores the PC and CPSR, and returns to the
instruction following the data abort instruction)

Interrupt Request (IRQ) Exception
 The IRQ (Interrupt Request) exception is a normal interrupt caused by a LOW level on the nIRQ input. IRQ has a lower priority than FIQ and is masked out when a FIQ sequence is entered. It may be disabled at any time by setting the I bit in the CPSR, though this can only be done from a privileged (non-User) mode.
 Entering IRQ
 R14_irq = Address of next instruction + 4
 SPSR_irq = CPSR
 CPSR[4:0] = 0b10010
 CPSR[T,IRQ] = 0b01 (ARM State, and Disable IRQs)
 Forces the PC to fetch the next instruction from address 0x18 or
0xFFFF0018
 Exiting IRQ
 CPSR = SPSR_irq
 SUBS PC,R14_irq, #4 (This restores the PC and CPSR, and returns to the interrupted instruction)

Fast Interrupt Request (FIQ) Exception
 The FIQ (Fast Interrupt Request) exception is designed to support a data transfer
or channel process, and in ARM state has sufficient private registers to remove
the need for register saving (thus minimizing the overhead of context switching).
 FIQ is externally generated by taking the nFIQ input LOW. This input can accept
either synchronous or asynchronous transitions, depending on the state of the
ISYNC input signal. When ISYNC is LOW, nFIQ and nIRQ are considered
asynchronous, and a cycle delay for synchronization is incurred before the
interrupt can affect the processor flow.
 Entering FIQ
 R14_fiq = Address of next instruction + 4
 SPSR_fiq = CPSR
 CPSR[4:0] = 0b10001
 CPSR[T,FIQ,IRQ] = 0b011 (ARM State, and Disable FIQs and IRQs)
 Forces the PC to fetch the next instruction from address 0x1C or 0xFFFF001C
 Exiting FIQ
 CPSR = SPSR_fiq
 SUBS PC,R14_fiq, #4 (This restores the PC and CPSR, and returns to the interrupted instruction)
Return Address Calculation
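Exception  Return Instruction      Previous State           Notes
                                   ARM R14_x   THUMB R14_x
BL         MOV PC, R14             PC + 4      PC + 2       1
SWI        MOVS PC, R14_svc        PC + 4      PC + 2       1
UDEF       MOVS PC, R14_und        PC + 4      PC + 2       1
FIQ        SUBS PC, R14_fiq, #4    PC + 4      PC + 4       2
IRQ        SUBS PC, R14_irq, #4    PC + 4      PC + 4       2
PABT       SUBS PC, R14_abt, #4    PC + 4      PC + 4       1
DABT       SUBS PC, R14_abt, #8    PC + 8      PC + 8       3
RESET      NA                      -           -            4

Notes:
1. PC is the address of the BL, SWI or undefined instruction, or of the instruction fetch that had the prefetch abort.
2. PC is the address of the instruction that was not executed because the FIQ or IRQ took priority.
3. PC is the address of the load or store instruction that generated the data abort.
4. The value saved in R14_svc upon reset is unpredictable.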
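
The return instructions in the table all subtract a fixed offset from the banked link register. A minimal sketch of that arithmetic follows, assuming ARM state; the names `LR_ADJUST` and `resume_address` are made up for this illustration.

```python
# Offset subtracted from R14_<mode> by each return instruction in the table.
LR_ADJUST = {"swi": 0, "undef": 0, "fiq": 4, "irq": 4, "pabt": 4, "dabt": 8}


def resume_address(kind, lr):
    """Address loaded into the PC when the handler for `kind` returns."""
    return lr - LR_ADJUST[kind]


faulting = 0x8000  # address of the instruction that caused the exception

# Data abort: R14_abt = faulting + 8, and SUBS PC,R14_abt,#8 retries the access.
print(hex(resume_address("dabt", faulting + 8)))  # 0x8000

# SWI: R14_svc = faulting + 4, and MOVS PC,R14_svc continues after the SWI.
print(hex(resume_address("swi", faulting + 4)))   # 0x8004
```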

Exception Priorities

Highest priority:
 1. Reset
 2. Data abort
 3. FIQ
 4. IRQ
 5. Pre-fetch abort
Lowest priority:
 6. Undefined Instruction and Software interrupt.
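
When several exceptions are pending at once, the priority list decides which one is taken first. A short sketch, with the exception names and the helper `next_exception` chosen only for illustration:

```python
# Priority numbers from the list above (1 = highest).
# Undefined instruction and SWI share the lowest level because a single
# instruction cannot be both.
PRIORITY = {"reset": 1, "dabt": 2, "fiq": 3, "irq": 4, "pabt": 5,
            "undef": 6, "swi": 6}


def next_exception(pending):
    """Pick the pending exception with the highest priority, or None."""
    if not pending:
        return None
    return min(pending, key=PRIORITY.__getitem__)


print(next_exception({"irq", "fiq", "dabt"}))  # dabt
print(next_exception({"irq", "pabt"}))         # irq
```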

Agenda

Exceptions
 System Design
Memory Interface
Synchronization
Input / Output

Example ARM-based System

[Figure: example system. An ARM core connected to 16 bit RAM, 32 bit RAM, 8 bit ROM, and peripherals / I/O, with an interrupt controller driving the nIRQ and nFIQ inputs.]

AMBA
[Figure: AMBA-based system. Blocks shown: Arbiter, Reset, ARM, TIC, Remap/Pause, External Bus Interface (to external ROM and external RAM), Timer, Bridge, On-chip RAM, Interrupt Controller, Decoder. The system bus is AHB or ASB; the peripheral bus is APB.]
 AMBA
 Advanced Microcontroller Bus Architecture
 Open specification framework for System-on-Chip (SoC) Designs

AMBA
 AHB
 The widely adopted AHB System Bus connects embedded processors
such as an ARM core to high-performance peripherals, DMA controllers,
on-chip memory and interfaces.
 APB
 The AMBA APB (Advanced Peripheral Bus) is a simpler bus protocol
designed for ancillary or general purpose peripherals
 ADK
 The AMBA Design Kit is a library of components which enables system
developers to build AMBA based systems quickly and accurately.
 ACT
 The AMBA Compliance Testbench, a comprehensive environment which
enables the rapid development of tests to certify the IP as AMBA
compliant.
 PrimeCell
 ARM’s AMBA compliant peripherals

Agenda

Exceptions
System Design
 Memory Interface
Synchronization
Input / Output

Memory Interface

 Memory Hierarchy
Memory Size and Speed
ARM MMU
Memory Interfacing

Memory
 Memories come in many shapes, sizes and types
 Shape: packages such as TQFP, TSOP, DIP, and surface mount
 Size: for example 4Mx8-bit or 16Kx1-bit

Memory Technologies

 DRAM: Dynamic Random Access Memory
 upside: very dense (1 transistor per bit) and inexpensive
 downside: requires refresh and often not the fastest access times
 often used for main memories
[Figure: DRAM cell with word line, pass transistor, capacitor, and bit line]
 SRAM: Static Random Access Memory
 upside: fast and no refresh required
 downside: not so dense and not so cheap
 often used for caches
[Figure: SRAM cell]
 ROM: Read-Only Memory
 often used for bootstrapping and such

Exploiting Memory Hierarchy

 Users want large and fast memories!
1997 figures:
SRAM access times are 2 - 25 ns at a cost of $100 to $250 per Mbyte.
DRAM access times are 60 - 120 ns at a cost of $5 to $10 per Mbyte.
Disk access times are 10 to 20 million ns at a cost of $0.10 to $0.20 per Mbyte.
 Try and give it to them anyway
 build a memory hierarchy
[Figure: levels in the memory hierarchy, Level 1 through Level n, below the CPU. Access time increases with distance from the CPU; the figure also shows the size of the memory at each level.]

The Memory Pyramid

Locality

 A principle that makes having a memory hierarchy a good idea
 If an item is referenced,
 temporal locality: it will tend to be referenced again soon
 spatial locality: nearby items will tend to be referenced soon.
Why does code have locality?
 Our initial focus: two levels (upper, lower)
 block: minimum unit of data
 hit: data requested is in the upper level
 miss: data requested is not in the upper level

Cache

 Two issues:
 How do we know if a data item is in the cache?
 If it is, how do we find it?
 Our first example:
 block size is one word of data
 "direct mapped"

For each item of data at the lower level, there is exactly one location in the cache where it might be.
e.g., lots of items at the lower level share locations in the upper level
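
Both questions (is the item in the cache, and where) can be answered with an index and a tag. The following is a minimal sketch of a direct-mapped cache with one-word blocks; it assumes 4-byte words and byte addresses, and the class name `DirectMappedCache` is only for illustration.

```python
class DirectMappedCache:
    """Direct-mapped cache with one-word blocks (tags only, no data)."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.valid = [False] * num_lines
        self.tags = [0] * num_lines

    def access(self, address):
        """Return True on a hit; on a miss, load the block and return False."""
        word = address // 4              # assumed 4-byte words
        index = word % self.num_lines    # the only line this word may occupy
        tag = word // self.num_lines     # tells apart words sharing that line
        hit = self.valid[index] and self.tags[index] == tag
        if not hit:
            self.valid[index] = True
            self.tags[index] = tag
        return hit


cache = DirectMappedCache(num_lines=8)
print(cache.access(0x00))  # False: first reference misses
print(cache.access(0x00))  # True: temporal locality gives a hit
print(cache.access(0x20))  # False: word 8 maps to line 0 as well
print(cache.access(0x00))  # False: it was evicted by the previous access
```

The last two accesses show the cost of the scheme: two addresses that share a line keep evicting each other even if the rest of the cache is empty.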

Direct Mapped Cache
Cache Memory
 64-way set-associative cache with I-Cache and D-Cache, 16KB each
 8 words per line, with one valid bit and two dirty bits per line
 Pseudo-random or round-robin line replacement algorithm
 Write-through or write-back cache operation to update the main memory
 The write buffer can hold 16 words of data and four addresses.
[Figure: cache organization showing a 64-entry CAM, the cache RAM, and the index]
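
The figures above fix the shape of each cache. A small sketch of the arithmetic, assuming 4-byte words, 32-bit addresses, and power-of-two sizes (none of which is stated on the slide); `cache_geometry` is a hypothetical helper name.

```python
def cache_geometry(size_bytes, words_per_line, ways,
                   bytes_per_word=4, address_bits=32):
    """Derive line count, set count and address field widths for a cache."""
    line_bytes = words_per_line * bytes_per_word
    lines = size_bytes // line_bytes
    sets = lines // ways
    offset_bits = line_bytes.bit_length() - 1  # log2, sizes are powers of two
    index_bits = sets.bit_length() - 1
    tag_bits = address_bits - offset_bits - index_bits
    return {"line_bytes": line_bytes, "lines": lines, "sets": sets,
            "offset_bits": offset_bits, "index_bits": index_bits,
            "tag_bits": tag_bits}


# 16KB cache, 8 words per line, 64 ways.
print(cache_geometry(16 * 1024, words_per_line=8, ways=64))
```

Under these assumptions each 16KB cache has 512 lines of 32 bytes arranged as 8 sets of 64 lines, so an address splits into a 5-bit byte offset, a 3-bit set index, and a 24-bit tag.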

Memory Interface

Memory Hierarchy
 Memory Size and Speed
ARM MMU
Memory Interfacing

Storage Basics

 CPU sees the RAM as one long, thin line of bytes
 That doesn't mean that it's actually laid out that way
 Real RAM chips don't store whole bytes, but rather they store individual bits in a grid, which you can address one bit at a time

SRAM Memory Timing
for Read Accesses

 Address and chip select signals are provided before data is available
 Outputs reflect new data
Example device: 2147H High-Speed 4096x1-bit static RAM (pins A11-A0, Din, Dout, WE, CS)
[Figure: read timing diagram. Address (A11-A0) changes from old address to new address; CS and WE are shown below it; Dout goes from high impedance through undefined to Data Valid. Intervals marked: tRC, tAA, tACS, tHZ.]
tRC = Read cycle time
tAA = Address access time
tACS = Chip select access time
tHZ = Chip deselection to high-Z out

SRAM Memory Timing
for Write Accesses

 Address and data must be stable tS time-units before the write enable signal falls
Example device: 2147H High-Speed 4096x1-bit static RAM (pins A11-A0, Din, Dout, WE, CS)
[Figure: write timing diagram. Address (A11-A0) changes from old address to new address; CS and WE are shown below it; Din changes from old data to new data. Intervals marked: tWC, tAA, tS, tACS, tHZ.]
tWC = Write cycle time
tS = Signal setup time
tRC = Read cycle time
tAA = Address access time
tACS = Chip select access time
tHZ = Chip deselection to high-Z out

DRAM Organization and
Operations
 In the traditional DRAM, any storage location can be randomly accessed for read/write by inputting the address of the corresponding storage location.
 A typical DRAM of bit capacity 2^N * 2^M consists of an array of memory cells arranged in 2^N rows (word-lines) and 2^M columns (bit-lines).
 Each memory cell has a unique location represented by the intersection of a word line and a bit line.
 Each memory cell consists of a transistor and a capacitor. The charge on the capacitor represents 0 or 1 for the memory cell. The support circuitry for the DRAM chip is used to read/write to a memory cell.

DRAM Organization and
Operations
 Address decoders
to select a row and a column
 Sense amps
To detect and amplify the charge in
the capacitor of the memory cell.
 Read/Write logic
To read/store information in the
memory cell.
 Output Enable logic
Controls whether data should appear
at the outputs.
 Refresh counters
To keep track of refresh sequence.

DRAM Memory Access

 DRAM Memory is arranged in an XY grid pattern of rows and columns.
 First, the row address is sent to the memory chip and latched,
then the column address is sent in a similar fashion.
 This row and column-addressing scheme (called multiplexing)
allows a large memory address to use fewer pins.
 The charge stored in the chosen memory cell is amplified using
the sense amplifier and then routed to the output pin.
 Read/Write is controlled using the read/write logic.
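
Address multiplexing can be illustrated with a short sketch that splits a cell address into the two halves sent over the shared pins. It assumes the row is taken from the upper address bits, which is a common arrangement but not something the slide specifies; the function names are hypothetical.

```python
def split_dram_address(address, row_bits, col_bits):
    """Split a cell address into the (row, column) pair sent in two steps."""
    col = address & ((1 << col_bits) - 1)
    row = (address >> col_bits) & ((1 << row_bits) - 1)
    return row, col


def address_pins_needed(row_bits, col_bits):
    """Row and column share the same pins, so only the wider one counts."""
    return max(row_bits, col_bits)


# A 2^4 x 2^4 array: an 8-bit cell address travels over 4 pins.
print(split_dram_address(0b10110110, row_bits=4, col_bits=4))  # (11, 6)
print(address_pins_needed(4, 4))                               # 4
```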

How DRAM Works

DRAM Memory Access

A typical DRAM read operation:
1. The row address is placed on the address pins via the address bus.
2. The RAS pin is activated, which places the row address onto the Row Address Latch.
3. The Row Address Decoder selects the proper row to be sent to the sense amps.
4. The Write Enable is deactivated, so the DRAM knows that it's not being written to.
5. The column address is placed on the address pins via the address bus.
6. The CAS pin is activated, which places the column address on the Column Address Latch.
7. The CAS pin also serves as the Output Enable, so once the CAS signal has stabilized, the sense amps place the data from the selected row and column on the Data Out pin so that it can travel the data bus back out into the system.
8. RAS and CAS are both deactivated so that the cycle can begin again.

DRAM Performance Specs

 Important DRAM Performance Considerations


 Random access time: time required to read any random single cell
 Fast Page Cycle time: time required for page mode access --
read/write to memory location on the most recently-accessed page (no
need to repeat RAS in this case)
 Extended Data Out (EDO): allows setup of next address while current
data access is maintained
 SDRAM - Burst Mode: Synchronous DRAMs use a self-incrementing
counter and a mode register to determine the column address
sequence after the first memory location accessed on a page --
effective for applications that usually require streams of data from one
or more pages on the DRAM
 Required refresh rate: minimum rate of refreshes

Turning Bits Into Bytes (2x This Picture)

Memory Interface

Memory Hierarchy
Memory Size and Speed
 ARM MMU
Memory Interfacing

ARM MMU

 Complex VM and protection mechanisms
 Presents 4 GB address space (why?)
 Memory granularity: 3 options supported
 1MB sections
 Large pages (64 KBytes) - access control within a large page on 16 KByte subpages
 Small pages (4 KBytes) - access control within a small page on 1 KByte subpages
 Puts processor in Abort Mode when virtual address not mapped or permission check fails
 Change pointer to page tables (called the translation table base, in ARM jargon) to change virtual address space
 useful for context switching of processes

Example: Single-Level Page Table
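[Figure: single-level page table. The 32-bit virtual address is split into bits 31-12 (value = x) and bits 11-0 (value = y). x indexes a page table of 2^20 entries, each 32 bits wide; y indexes a page frame of 2^12 data entries, each 8 bits wide.]
Size of page table = 2^20 * 32 bits = 4 Mbytes
Size of page = 2^12 * 8 bits = 4 Kbytes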

Single-Level Page Table

 Assumptions
 32-bit virtual addresses
 4 Kbyte page size = 2^12 bytes
 32-bit address space
 How many virtual page numbers?
 2^32 / 2^12 = 2^20 = 1,048,576 virtual page numbers = number of entries in the page table
 If each page table entry occupies 4 bytes, how much memory is needed to store the page table?
 2^20 entries * 4 bytes = 2^22 bytes = 4 Mbytes
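
The same arithmetic, plus the lookup it supports, can be sketched in a few lines. The page table is modelled as a Python dictionary from virtual page number to frame number, which is purely an assumption of this sketch; the function names are hypothetical.

```python
PAGE_BITS = 12  # 4 Kbyte pages


def page_table_size(va_bits=32, page_bits=PAGE_BITS, entry_bytes=4):
    """Return (number of entries, size in bytes) of a single-level table."""
    entries = 1 << (va_bits - page_bits)
    return entries, entries * entry_bytes


def translate(virtual_address, page_table):
    """Map a virtual address to a physical one via a single-level table."""
    vpn = virtual_address >> PAGE_BITS               # bits 31-12
    offset = virtual_address & ((1 << PAGE_BITS) - 1)  # bits 11-0
    frame = page_table[vpn]   # a missing entry (KeyError) models an abort
    return (frame << PAGE_BITS) | offset


print(page_table_size())                             # (1048576, 4194304)
print(hex(translate(0x00403ABC, {0x403: 0x12})))     # 0x12abc
```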

Example: Two-level Page Table
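[Figure: two-level page table. The virtual address is split into bits 31-22 (value = x), bits 21-12 (value = y), and bits 11-0 (value = z). x indexes a page directory of 2^10 entries, each 32 bits wide; y indexes a page table of 2^10 entries, each 32 bits wide; z indexes a page frame of 2^12 data entries, each 8 bits wide.]
Size of page directory = 2^10 * 32 bits = 4 Kbytes
Size of page table = 2^10 * 32 bits = 4 Kbytes
Size of page = 2^12 * 8 bits = 4 Kbytes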

Two-Level Page Table

 Assumptions
 2^10 entries in page directory (= max number of page tables)
 2^10 entries in page table
 32 bits allocated for each page directory entry
 32 bits allocated for each page table entry
 How much memory is needed?
 Page table size = 2^10 entries * 32 bits = 2^12 bytes = 4 Kbytes
 Page directory size = 2^10 entries * 32 bits = 2^12 bytes = 4 Kbytes

Two-Level Page Table
 Small (typical) system
 One page table might be enough
 Page directory size + Page table size = 8 Kbytes of memory would suffice for virtual memory management
 How much physical memory could this one page table handle?
 Number of page tables * Number of page table entries * Page size = 1 * 2^10 * 2^12 bytes = 4 Mbytes
 Large system
 You might need the maximum number of page tables
 Max number of page tables * Page table size = 2^10 directory entries * 2^12 bytes = 2^22 bytes = 4 Mbytes of memory would be needed for virtual memory management
 How much physical memory could these 2^10 page tables handle?
 Number of page tables * Number of page table entries * Page size = 2^10 * 2^10 * 2^12 bytes = 4 Gbytes
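
A two-level walk and the memory totals above can be checked with a short sketch. Nested dictionaries stand in for the page directory and page tables; that representation and the function names are assumptions made for this illustration.

```python
ENTRY_BYTES = 4          # 32-bit directory and table entries
ENTRIES = 1 << 10        # 2^10 entries per directory or table
PAGE_SIZE = 1 << 12      # 4 Kbyte pages


def split(virtual_address):
    """Return (x, y, z): directory index, table index, page offset."""
    return (virtual_address >> 22,
            (virtual_address >> 12) & 0x3FF,
            virtual_address & 0xFFF)


def translate(virtual_address, directory):
    x, y, z = split(virtual_address)
    table = directory[x]   # first level: pick the page table
    frame = table[y]       # second level: pick the page frame
    return (frame << 12) | z


def table_memory(num_tables):
    """Bytes used by the directory plus `num_tables` page tables."""
    return ENTRIES * ENTRY_BYTES * (1 + num_tables)


def memory_covered(num_tables):
    """Bytes of memory that `num_tables` page tables can map."""
    return num_tables * ENTRIES * PAGE_SIZE


print(hex(translate(0x00403ABC, {1: {3: 0x12}})))  # 0x12abc
print(table_memory(1), memory_covered(1))          # 8192 4194304
print(memory_covered(1024) == 1 << 32)             # True
```

Note that `table_memory` counts the 4 Kbyte directory as well, so the large system comes to 4 Mbytes plus 4 Kbytes, slightly more than the page tables alone.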
Memory Interface

Memory Hierarchy
Memory Size and Speed
ARM MMU
 Memory Interfacing

Interfacing External Memory
 Little/Big Endian support
 Address space: 4G bytes, (Differs in processor Implementation)
 Supports programmable 8/16/32-bit data bus width for each bank
 External address lines vary for a specific processor implementation
 Programmable bank start address and bank size for bank 7
 Eight memory banks:
 Memory banks for ROM, SRAM or Synchronous DRAM
 Fully Programmable access cycles for all memory banks
 Supports external wait signals to extend the bus cycle
 Supports self-refresh mode in SDRAM for power down
 Supports various types of ROM for booting (NOR/NAND Flash, EEPROM,
and others)
 The write buffer can hold 16 words of data and four addresses.

CPU - Memory Interface

 CPU - Memory Interface usually consists of:
 uni-directional address bus
 bi-directional data bus
 read control line
 write control line
 ready control line
 size (byte, word) control line
[Figure: CPU and Memory connected by address bus, data bus, Read, Write, Ready, and size lines]
 Memory access involves a memory bus transaction
 read:
(1) set address, read and size,
(2) copy data when ready is set by memory
 write:
(1) set address, data, write and size,
(2) done when ready is set

Memory Subsystem Components
 Memory subsystems generally consist of chips+controller
 Each chip provides few bits (e.g., 1-4) per access
 Bits from multiple chips are accessed in parallel to fetch bytes and words
 Memory controller decodes/translates address and control signals
 Controller can also be on memory chip
 Example:
 contains 8 16x1-bit chips and very simple controller
[Figure: CPU and Memory connected by address bus, data bus, Read, Write, Ready, and Size lines]
[Figure: 16x8-bit memory array built from 16x1-bit memory chips. A 1-of-16 address decoder selects one row; each chip supplies one of the data bits D7 to D0.]
Address  D7 D6 D5 D4 D3 D2 D1 D0
0000     1  0  1  1  0  0  1  0
0001     1  0  0  0  0  0  0  1
...
1111     0  1  0  1  0  0  1  1
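
The example can be modelled directly: eight one-bit-wide chips share the address lines, and each contributes one bit of the byte. This is a sketch only; the class names `BitChip` and `ByteMemory` are made up for the illustration.

```python
class BitChip:
    """A 16x1-bit memory chip: 16 addressable cells of one bit each."""

    def __init__(self):
        self.cells = [0] * 16


class ByteMemory:
    """16x8-bit memory built from eight 16x1-bit chips accessed in parallel."""

    def __init__(self):
        self.chips = [BitChip() for _ in range(8)]  # chips[n] holds bit Dn

    def write(self, address, value):
        for bit, chip in enumerate(self.chips):
            chip.cells[address] = (value >> bit) & 1

    def read(self, address):
        return sum(chip.cells[address] << bit
                   for bit, chip in enumerate(self.chips))


memory = ByteMemory()
memory.write(0b0000, 0b10110010)          # first row of the table above
print(bin(memory.read(0b0000)))           # 0b10110010
print(memory.chips[7].cells[0])           # 1: the chip holding D7
```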


EEPROM Interfacing

Memory Interface with 8-bit ROM

ARM        MEMORY
A0 – A15   A0 – A15
D0 – D7    DQ0 – DQ7
WE         WE
OE         OE
GCS        CE

Interfacing 8 - Bit Memory Banks

Memory Interface with 8-bit ROM x 2

Interfacing 16 - Bit Memory Banks

Extra Signals
BE – Bank Enable

Memory Interface with 16-bit ROM x 2

Interfacing Banked SDRAM

Memory Interface with 16-bit SDRAM x 2


Signals in Interfacing SDRAM

ARM    SDRAM  Signal Description
SCKE   SCKE   Clock Enable (High/Low)
SCLK   SCLK   System Clock
SCS0   SCS    Chip Select
SRAS   SRAS   Row Address Strobe
SCAS   SCAS   Column Address Strobe
WE     WE     Write Enable

Memory Interface with 16-bit SDRAM x 2

Critical Thinking

 It’s a commonly held belief that adding more RAM increases your performance. If you wanted to speed up your computer, what kind of RAM would you buy and why?

Agenda

Exceptions
System Design
Memory Interface
 Synchronization
Input / Output

What is the Problem

Adding two array elements to another array element
LDR R0 <- A[0]
LDR R1 <- A[1]
ADD R2,R1,R0
STR R2 -> A[3]
Swapping the Variables
LDR R0 <- X
LDR R1 <- Y
STR R1 -> X
STR R2 -> Y

What to do ?????

The Solution

Adding two array elements to another array element
LDR R0 <- A[0]
LDR R1 <- A[1]
ADD R2,R1,R0
Bubble or other instructions
STR R2 -> A[3]
Swapping the Variables
LDR R0 <- X
LDR R1 <- Y
STR R0 -> Y
STR R1 -> X

That’s Synchronization

How to Achieve in ARM

 SINGLE DATA SWAP (SWP)
Instruction fields:
[3:0] Source Register
[15:12] Destination Register
[19:16] Base Register
[22] Byte/Word Bit
0 = Swap word quantity
1 = Swap byte quantity
[31:28] Condition Field
Examples:
SWP R0,R1,[R2]
Load R0 with the word addressed by R2, and store R1 at R2.
SWPB R2,R3,[R4]
Load R2 with the byte addressed by R4, and store bits 0 to 7 of R3 at R4.
SWPEQ R0,R0,[R1]
Conditionally swap the contents of the word addressed by R1 with R0.

How to Achieve in ARM
 The data swap instruction is used to swap a byte or word quantity
between a register and external memory. This instruction is
implemented as a memory read followed by a memory write which are
“locked” together (the processor cannot be interrupted until both
operations have completed, and the memory manager is warned to
treat them as inseparable). This class of instruction is particularly
useful for implementing software semaphores.
 The swap address is determined by the contents of the base register
(Rn). The processor first reads the contents of the swap address. Then
it writes the contents of the source register (Rm) to the swap address,
and stores the old memory contents in the destination register (Rd).
The same register may be specified as both the source and
destination.
 The LOCK output goes HIGH for the duration of the read and write
operations to signal to the external memory manager that they are
locked together, and should be allowed to complete without
interruption. This is important in multi-processor systems where the
swap instruction is the only indivisible instruction which may be used to
implement semaphores; control of the memory must not be removed
from a processor while it is performing a locked operation.
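
How an indivisible swap supports a semaphore can be sketched in Python. Here a `threading.Lock` stands in for the LOCK output, so that the read and the write inside `swp` cannot be separated; the `Memory` class, the 1 = taken / 0 = free encoding, and the helper names are all assumptions of this sketch rather than anything defined by the instruction.

```python
import threading


class Memory:
    """Word-addressed memory whose swp() models the locked read-then-write."""

    def __init__(self):
        self._words = {}
        self._bus_lock = threading.Lock()  # stands in for the LOCK output

    def swp(self, address, new_value):
        """Store new_value at address and return the old contents, atomically."""
        with self._bus_lock:
            old_value = self._words.get(address, 0)
            self._words[address] = new_value
            return old_value


TAKEN, FREE = 1, 0  # assumed encoding of the semaphore word


def try_acquire(memory, semaphore_address):
    """Swap TAKEN in; we own the semaphore only if it was FREE before."""
    return memory.swp(semaphore_address, TAKEN) == FREE


def release(memory, semaphore_address):
    memory.swp(semaphore_address, FREE)


mem = Memory()
print(try_acquire(mem, 0x100))  # True: the semaphore was free
print(try_acquire(mem, 0x100))  # False: it is already taken
release(mem, 0x100)
print(try_acquire(mem, 0x100))  # True again after the release
```

The key point is that a failed attempt writes TAKEN over TAKEN, so it does no harm, and only one contender can ever read back FREE.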
Processor Independent Techniques

 Semaphores

 Mutual Exclusion
 Message Queues
 Pipes … etc

Agenda

Exceptions
System Design
Memory Interface
Synchronization
 Input / Output

CPU - Bus - I/O
 CPU needs to talk with I/O devices such as keyboard, mouse, video, network, disk drive, LEDs
 Memory-mapped I/O
 Devices are mapped to specific memory locations just like RAM
 Uses load/store instructions just like accesses to memory
[Figure: CPU connected to Memory and an I/O Device by shared Address, Data, Read, and Write lines]
 Ported I/O
 Special bus line and instructions
[Figure: CPU connected to Memory and an I/O Device by Address, Data, Read, and Write lines, plus a Memory I/O line for the I/O Port]

I/O Register Basics
 I/O Registers are NOT like normal memory
 Device events can change their values (e.g., status registers)
 Reading a register can change its value (e.g., error condition reset)
 so, for example, can't expect to get same value if read twice
 Some are read-only (e.g., receive registers)
 Some are write-only (e.g., transmit registers)
 Sometimes multiple I/O registers are mapped to same address
 selection of one based on other info (e.g., read vs. write or extra control
bits)
 The bits in a control register often each specify something
different and important -- and have significant side effects
 Cache must be disabled for memory-mapped addresses
 When polling I/O registers, should tell compiler that value can
change on its own
 volatile int *ptr;
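
Why a register cannot be treated like ordinary memory is easy to show with a toy model of a status register whose error flag is cleared by reading it. Python has no `volatile`; the sketch only models the device-side behaviour that makes `volatile` necessary in C, and the class and method names are hypothetical.

```python
class StatusRegister:
    """Read-only status register whose error flag is cleared by a read."""

    def __init__(self):
        self._error = False

    def device_sets_error(self):
        """Called by the (simulated) device, not by the program."""
        self._error = True

    def read(self):
        value = 1 if self._error else 0
        self._error = False   # reading resets the error condition
        return value


status = StatusRegister()
status.device_sets_error()
print(status.read())  # 1: the error is reported once
print(status.read())  # 0: the first read cleared it
```

A compiler that assumed the second read must return the same value as the first, and reused the cached result, would be wrong here; that is the situation `volatile` guards against.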

Up Next - Bus Architectures

Bus Protocols

 Protocol refers to the set of rules agreed upon by both the bus
master and bus slave
 Synchronous bus - transfers occur in relation to successive edges of a
clock
 Asynchronous bus - transfers bear no particular timing relationship
 Semi-synchronous bus - Operations/control initiate asynchronously,
but data transfer occurs synchronously

[Figure: a shared bus connecting the CPU, Device 1, Device 2, and Device 3]

Synchronous Bus Protocol
 Transfer occurs in relation to successive edges of the system clock
 Example:
 Memory address is placed on the address bus within a certain time, relative to
the rising edge of the clock
 By the trailing edge of this same clock pulse, the address information has had
time to stabilize, so the READ line is asserted
 Once the chip has been selected, then the memory can place the contents of
the specified location on the data bus

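[Figure: synchronous bus timing diagram. Signals shown: Clock; Address (instruction address, then data address, each becoming stable); RD and CS driven by the master (CPU) after a decoding delay; Data (unstable, then stable after the access time, first for the instruction fetch and then for the data).]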

Asynchronous Bus Protocol
 No system clock used
 Useful for systems where CPU and I/O devices run at different speeds
 Example:
 Master puts address and data on the bus and then raises the Master signal
 Slave sees master signal, reads the data and then raises the Slave signal
 Master sees Slave signal and lowers Master signal
 Slave sees Master signal lowered and lowers Slave signal
[Figure: handshake timing for a write and a read, showing the Address, Master, Slave, and Data lines. Master: "there's some data"; Slave: "I've got it"; Master: "I see you got it"; Slave: "I see you see I got it".]
We call this exchange “handshaking”
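
The four steps of the exchange can be traced with a small sequential sketch. It models one write only, runs both sides in a single function rather than concurrently, and uses made-up names (`handshake_write`, the `bus` dictionary) purely for illustration.

```python
def handshake_write(data):
    """Run one write over an asynchronous bus and record each step."""
    bus = {"data": None, "master": 0, "slave": 0}
    trace = []

    # 1. Master puts data on the bus, then raises the Master signal.
    bus["data"] = data
    bus["master"] = 1
    trace.append("master raised")

    # 2. Slave sees the Master signal, reads the data, raises the Slave signal.
    received = bus["data"]
    bus["slave"] = 1
    trace.append("slave raised")

    # 3. Master sees the Slave signal and lowers the Master signal.
    bus["master"] = 0
    trace.append("master lowered")

    # 4. Slave sees the Master signal lowered and lowers the Slave signal.
    bus["slave"] = 0
    trace.append("slave lowered")

    return received, trace, bus


received, trace, bus = handshake_write(0x5A)
print(hex(received), trace)
```

Neither side ever waits for a fixed time; each step is triggered by the other side's signal, which is why the scheme tolerates devices of different speeds.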

Thank You

Any
Questions?
