
Introduction to Arm Assembly

Chapter 2

Sepehr Naimi

www.NicerLand.com
Topics
- ARM’s CPU
- Its architecture
- Some simple programs
- Data Memory access
- Program memory
- RISC architecture

[Figure: microcontroller block diagram. The CPU reaches the PROGRAM Flash ROM over the program bus and RAM, EEPROM, and timers over the data bus; also shown are the interrupt unit, OSC, ports, other peripherals, and the I/O pins.]
ARM’s CPU
- ARM’s CPU
  - ALU
  - 16 General Purpose registers (R0 to R15)
  - PC register (R15)
  - Instruction decoder

[Figure: ARM CPU. ALU; registers R0, R1, R2, ..., R13 (SP), R14 (LR), R15 (PC); CPSR status register; instruction register; instruction decoder.]
Some simple instructions
1. MOV (MOVE)
- MOV Rd, #k
  - Rd = k
  - k is an 8-bit value
  - Example:
        MOV R5,#53          ; R5 = 53
        MOV R9,#0x27        ; R9 = 0x27
        MOV R3,#2_11101100  ; binary: R3 = 0xEC
- MOV Rd, Rs
  - Rd = Rs
  - Example:
        MOV R5,R2           ; R5 = R2
        MOV R9,R7           ; R9 = R7
LDR pseudo-instruction (loading 32-bit values)
- LDR Rd, =k
  - Rd = k
  - k is a 32-bit value
  - Example:
        LDR R5,=5543                ; R5 = 5543
        LDR R9,=0x123456            ; R9 = 0x123456
        LDR R4,=2_10110110011011001 ; binary value
Some simple instructions
2. Arithmetic calculation
- Opcode destination, source1, source2
- Opcodes: ADD, SUB, AND, etc.
- Examples:
        ADD R5,R2,R1   ; R5 = R2 + R1
        SUB R5,R9,#23  ; R5 = R9 - 23

Instruction         Description
ADD Rd, Rn, Op2 *   ADD Rn to Op2 and place the result in Rd
ADC Rd, Rn, Op2     ADD Rn to Op2 with Carry and place the result in Rd
AND Rd, Rn, Op2     AND Rn with Op2 and place the result in Rd
BIC Rd, Rn, Op2     AND Rn with NOT of Op2 and place the result in Rd
CMP Rn, Op2         Compare Rn with Op2 and set the status bits of CPSR **
CMN Rn, Op2         Compare Rn with negative of Op2 and set the status bits
EOR Rd, Rn, Op2     Exclusive OR Rn with Op2 and place the result in Rd
MVN Rd, Op2         Store the bitwise NOT (one's complement) of Op2 in Rd
MOV Rd, Op2         Move (Copy) Op2 to Rd
ORR Rd, Rn, Op2     OR Rn with Op2 and place the result in Rd
RSB Rd, Rn, Op2     Subtract Rn from Op2 and place the result in Rd
RSC Rd, Rn, Op2     Subtract Rn from Op2 with carry and place the result in Rd
SBC Rd, Rn, Op2     Subtract Op2 from Rn with carry and place the result in Rd
SUB Rd, Rn, Op2     Subtract Op2 from Rn and place the result in Rd
TEQ Rn, Op2         Exclusive-OR Rn with Op2 and set the status bits of CPSR
TST Rn, Op2         AND Rn with Op2 and set the status bits of CPSR

* Op2 can be an immediate 8-bit value #K, which can be 0-255 in decimal (00-FF in hex). Op2 can also be a register Rm. Rd, Rn, and Rm are any of the general purpose registers.
** CPSR is discussed later in this chapter.
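As a rough C model of the table (an illustration, not ARM's own definition), each data-processing instruction can be written as an operation on 32-bit unsigned values. The function names are hypothetical; carry-in (ADC, SBC, RSC) and flag updates are left out, and results wrap modulo 2^32 just as they do in a 32-bit register.

```c
#include <stdint.h>

/* Hypothetical helpers, one per instruction; operands are 32-bit register values.
   Unsigned arithmetic in C wraps modulo 2^32, like a 32-bit register. */
uint32_t op_add(uint32_t rn, uint32_t op2) { return rn + op2; }   /* ADD: Rn + Op2 */
uint32_t op_sub(uint32_t rn, uint32_t op2) { return rn - op2; }   /* SUB: Rn - Op2 */
uint32_t op_rsb(uint32_t rn, uint32_t op2) { return op2 - rn; }   /* RSB: Op2 - Rn */
uint32_t op_and(uint32_t rn, uint32_t op2) { return rn & op2; }   /* AND */
uint32_t op_orr(uint32_t rn, uint32_t op2) { return rn | op2; }   /* ORR */
uint32_t op_eor(uint32_t rn, uint32_t op2) { return rn ^ op2; }   /* EOR */
uint32_t op_bic(uint32_t rn, uint32_t op2) { return rn & ~op2; }  /* BIC: Rn AND NOT Op2 */
uint32_t op_mvn(uint32_t op2) { return ~op2; }                    /* MVN: NOT Op2 */
```

For example, with R9 = 100, SUB R5,R9,#23 corresponds to op_sub(100, 23), which is 77, while RSB reverses the operands.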
A simple program
- Write a program that calculates 19 + 95
        MOV R6, #19     ;R6 = 19
        MOV R2, #95     ;R2 = 95
        ADD R6, R6, R2  ;R6 = R6 + R2
A simple program
- Write a program that calculates 19 + 95 - 5
        MOV R1, #19     ;R1 = 19
        MOV R2, #95     ;R2 = 95
        MOV R3, #5      ;R3 = 5
        ADD R6, R1,R2   ;R6 = R1 + R2
        SUB R6, R6,R3   ;R6 = R6 - R3
  or, reusing R2:
        MOV R1, #19     ;R1 = 19
        MOV R2, #95     ;R2 = 95
        ADD R6, R1,R2   ;R6 = R1 + R2
        MOV R2, #5      ;R2 = 5
        SUB R6, R6,R2   ;R6 = R6 - R2
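Both versions compute the same value; they differ only in which registers hold the operands. A C sketch under that reading, with each local variable standing in for a register (the function names are made up for illustration):

```c
#include <stdint.h>

/* Version 1: the three operands stay in R1, R2 and R3. */
uint32_t calc_v1(void) {
    uint32_t r1 = 19, r2 = 95, r3 = 5;   /* MOV R1,#19 / MOV R2,#95 / MOV R3,#5 */
    uint32_t r6 = r1 + r2;               /* ADD R6,R1,R2 */
    r6 = r6 - r3;                        /* SUB R6,R6,R3 */
    return r6;
}

/* Version 2: R2 is reused for the constant 5 once 95 has been added. */
uint32_t calc_v2(void) {
    uint32_t r1 = 19, r2 = 95;           /* MOV R1,#19 / MOV R2,#95 */
    uint32_t r6 = r1 + r2;               /* ADD R6,R1,R2 */
    r2 = 5;                              /* MOV R2,#5 */
    r6 = r6 - r2;                        /* SUB R6,R6,R2 */
    return r6;
}
```

Both return 109 (0x6D); the second version leaves R3 free at the cost of overwriting the 95 held in R2.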
Status Register (CPSR)
Bit:   D31 D30 D29 D28  ...       D7 D6 D5 D4 D3 D2 D1 D0
CPSR:  N   Z   C   V    Reserved  I  F  T  M4 M3 M2 M1 M0
N = Negative, Z = Zero, C = Carry, V = oVerflow, I and F = Interrupt, T = Thumb, M4-M0 = mode bits

Example: Show the status of the C and Z flags after the subtraction of 0x23 from 0xA5 in the following instructions:
        LDR R0,=0xA5
        LDR R1,=0x23
        SUBS R0,R0,R1   ;subtract R1 from R0
Solution:
          0x000000A5
        - 0x00000023
          0x00000082
C = 1 because R1 is not bigger than R0 and there is no borrow from the D32 bit.
Z = 0 because R0 has a value other than zero after the subtraction.

Example: Show the status of the C and Z flags after the subtraction of 0x73 from 0x52 in the following instructions:
        LDR R0,=0x52
        LDR R1,=0x73
        SUBS R0,R0,R1   ;subtract R1 from R0
Solution:
          0x00000052
        - 0x00000073
          0xFFFFFFDF
C = 0 because R1 is bigger than R0 and there is a borrow from the D32 bit.
Z = 0 because R0 has a value other than zero after the subtraction.

Example: Show the status of the C and Z flags after the subtraction of 0x9C from 0x9C in the following instructions:
        LDR R0,=0x9C
        LDR R1,=0x9C
        SUBS R0,R0,R1   ;subtract R1 from R0
Solution:
          0x0000009C
        - 0x0000009C
          0x00000000
C = 1 because R1 is not bigger than R0 and there is no borrow from the D32 bit.
Z = 1 because R0 is zero after the subtraction.

Example: Show the status of the C and Z flags after the addition of 0x38 and 0x2F in the following instructions:
        MOV R6, #0x38   ;R6 = 0x38
        MOV R7, #0x2F   ;R7 = 0x2F
        ADDS R6,R6,R7   ;add R7 to R6
Solution:
          0x00000038
        + 0x0000002F
          0x00000067
C = 0 because there is no carry beyond the D31 bit.
Z = 0 because R6 (the result) has a value other than 0 after the addition.

Example: Show the status of the C and Z flags after the addition of 0x0000009C and 0xFFFFFF64 in the following instructions:
        LDR R0,=0x9C
        LDR R1,=0xFFFFFF64
        ADDS R0,R0,R1   ;add R1 to R0
Solution:
          0x0000009C
        + 0xFFFFFF64
         0x100000000    (R0 = 0x00000000)
C = 1 because there is a carry beyond the D31 bit.
Z = 1 because R0 (the result) has a value 0 in it after the addition.
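The five results above can be checked with a small C model of ADDS and SUBS. It is a sketch assuming 32-bit registers; the struct and function names are hypothetical. It follows the ARM convention used in the examples: after a subtraction, C = 1 means there was no borrow. The V flag is computed too, although the examples only ask about C and Z.

```c
#include <stdbool.h>
#include <stdint.h>

/* Result register plus the four CPSR condition flags. */
typedef struct {
    uint32_t result;
    bool n, z, c, v;
} AluOut;

/* ADDS Rd, Rn, Op2 */
AluOut alu_adds(uint32_t a, uint32_t b) {
    uint64_t wide = (uint64_t)a + b;                   /* keep the carry out of D31 */
    AluOut o;
    o.result = (uint32_t)wide;
    o.n = (o.result >> 31) != 0;                       /* sign bit of the result */
    o.z = (o.result == 0);
    o.c = (wide >> 32) != 0;                           /* carry beyond D31 */
    o.v = ((~(a ^ b) & (a ^ o.result)) >> 31) != 0;    /* same-sign inputs, different-sign result */
    return o;
}

/* SUBS Rd, Rn, Op2 (computes a - b) */
AluOut alu_subs(uint32_t a, uint32_t b) {
    AluOut o;
    o.result = a - b;
    o.n = (o.result >> 31) != 0;
    o.z = (o.result == 0);
    o.c = (a >= b);                                    /* ARM: C = 1 means no borrow */
    o.v = (((a ^ b) & (a ^ o.result)) >> 31) != 0;     /* different-sign inputs, sign of result differs from a */
    return o;
}
```

For instance, alu_subs(0x52, 0x73) yields 0xFFFFFFDF with C = 0 and Z = 0, matching the second example.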
Harvard in ARM9 and Cortex

Memory Map in STM32F103
[Figure: STM32F103 memory map, 4 GB of 8-bit locations. Flash 0x0000 0000 to 0x1FFF FFFF; SRAM 0x2000 0000 to 0x3FFF FFFF; Peripherals 0x4000 0000 to 0x5FFF FFFF; FSMC from 0x6000 0000; Cortex-M3 internal peripherals 0xE000 0000 to 0xFFFF FFFF.]

LDR (Load register)
        LDR Rd,[Rx]         ;Rd = [Rx]
Example:
        LDR R4,=0x20000000
        LDR R1,[R4]         ;R1 = [0x20000000]
[Figure: SRAM locations 0x2000 0000 through 0x2000 0003 holding 0x78, 0x56, 0x34, and 0x12]

Example: Write a program that copies the contents of location 0x80 into location 0x88.
Solution:
        LDR R2,=0x80        ;R2 = 0x80
        LDR R1,[R2]         ;R1 = [0x80]
        LDR R2,=0x88        ;R2 = 0x88
        STR R1,[R2]         ;[0x88] = R1

Example: Add contents of location 0x90 to contents of location 0x94 and store the result in location 0x20000300.
Solution:
        LDR R6,=0x90        ;R6 = 0x90
        LDR R1,[R6]         ;R1 = [0x90]
        LDR R6,=0x94        ;R6 = 0x94
        LDR R2,[R6]         ;R2 = [0x94]
        ADD R2,R2,R1        ;R2 = R2 + R1
        LDR R6,=0x20000300  ;R6 = 0x20000300
        STR R2,[R6]         ;[0x20000300] = R2

STR (Store register)
        STR Rx,[Rd]         ;[Rd] = Rx
Example:
        LDR R5,=0x12345678
        LDR R2,=0x20000000
        STR R5,[R2]         ;[R2] = R5, i.e. [0x20000000] = 0x12345678
After running the STR instruction, locations 0x20000000 through 0x20000003 will be loaded with 0x78, 0x56, 0x34, and 0x12, respectively.
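The byte order in the STR example (0x78 at the lowest address) is little-endian. A C sketch that makes this explicit with shifts, so the result does not depend on the byte order of the computer running it; `mem` is a hypothetical byte array standing in for four consecutive memory locations.

```c
#include <stdint.h>

/* STR: write a 32-bit word, least significant byte at the lowest address. */
void store_word_le(uint8_t *mem, uint32_t value) {
    mem[0] = (uint8_t)(value);         /* e.g. 0x20000000 */
    mem[1] = (uint8_t)(value >> 8);    /* e.g. 0x20000001 */
    mem[2] = (uint8_t)(value >> 16);   /* e.g. 0x20000002 */
    mem[3] = (uint8_t)(value >> 24);   /* e.g. 0x20000003 */
}

/* LDR: read four bytes back into a 32-bit word. */
uint32_t load_word_le(const uint8_t *mem) {
    return (uint32_t)mem[0]
         | ((uint32_t)mem[1] << 8)
         | ((uint32_t)mem[2] << 16)
         | ((uint32_t)mem[3] << 24);
}
```

Calling store_word_le(m, 0x12345678) leaves m holding 0x78, 0x56, 0x34, 0x12, and load_word_le(m) rebuilds 0x12345678.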
LDRB, LDRH, STRB, STRH
Data Size   Bits   Load instruction used   Store instruction used
Byte        8      LDRB                    STRB
Half-word   16     LDRH                    STRH
Word        32     LDR                     STR

        LDR Rd,[Rs]         STR Rs,[Rd]
        LDRB Rd,[Rs]        STRB Rs,[Rd]
        LDRH Rd,[Rs]        STRH Rs,[Rd]

Example: Assume that R5 = 0x40000200 and R1 = 0x41526374. After running the following instruction:
        STRB R1,[R5]
location 0x40000200 will be loaded with 0x74 (the lowest byte of R1); locations 0x40000201 through 0x40000203 are not changed.

Example: Assume that R5 = 0x40000200, and locations 0x40000200 through 0x40000203 contain 0x78, 0x56, 0x34, and 0x12, respectively. After running the following instruction:
        LDRH R7,[R5]
R7 will be loaded with 0x00005678.
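A C sketch of the byte and half-word accesses, assuming memory is modeled as a little-endian byte array; the function names are hypothetical. LDRB and LDRH fill the upper bits of Rd with zeros, which is why R7 becomes 0x00005678, and STRB writes only the low byte of the register.

```c
#include <stdint.h>

/* STRB Rs,[Rd]: store only bits 7..0 of Rs; neighboring bytes are untouched. */
void strb(uint8_t *mem, uint32_t rs) { mem[0] = (uint8_t)rs; }

/* LDRB Rd,[Rs]: load one byte, zero-extended to 32 bits. */
uint32_t ldrb(const uint8_t *mem) { return mem[0]; }

/* LDRH Rd,[Rs]: load two bytes (low byte first), zero-extended to 32 bits. */
uint32_t ldrh(const uint8_t *mem) {
    return (uint32_t)mem[0] | ((uint32_t)mem[1] << 8);
}
```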
Memory Map in STM32F103

I/O Register   Address
GPIOA_LCKR     0x40010818
GPIOA_BRR      0x40010814
GPIOA_BSRR     0x40010810
GPIOA_ODR      0x4001080C
GPIOA_IDR      0x40010808
GPIOA_CRH      0x40010804
GPIOA_CRL      0x40010800

Example: Read the contents of GPIOA_IDR.
Solution:
        LDR R1,=0x40010808  ;R1 = 0x40010808
        LDR R2,[R1]         ;R2 = [0x40010808]

Example: Write 0x53F6 into GPIOA_ODR.
Solution:
        LDR R2,=0x53F6      ;R2 = 0x53F6
        LDR R1,=0x4001080C  ;R1 = 0x4001080C
        STR R2,[R1]         ;[0x4001080C] = 0x53F6
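The GPIOA registers in the table sit 4 bytes apart, starting at 0x40010800. A C sketch of the same accesses, assuming hypothetical macro names built from that base address (they are not the names used by the vendor's header files). On the STM32 the register is reached through a `volatile` pointer; on a PC the helper functions can only be exercised on an ordinary variable.

```c
#include <stdint.h>

/* Hypothetical names: GPIOA base address plus each register's offset (from the table). */
#define GPIOA_BASE  0x40010800u
#define GPIOA_CRL   (GPIOA_BASE + 0x00u)
#define GPIOA_CRH   (GPIOA_BASE + 0x04u)
#define GPIOA_IDR   (GPIOA_BASE + 0x08u)
#define GPIOA_ODR   (GPIOA_BASE + 0x0Cu)
#define GPIOA_BSRR  (GPIOA_BASE + 0x10u)
#define GPIOA_BRR   (GPIOA_BASE + 0x14u)
#define GPIOA_LCKR  (GPIOA_BASE + 0x18u)

/* Like LDR R2,[R1]: volatile forces a real read on every call. */
static inline uint32_t reg_read(volatile uint32_t *reg) { return *reg; }

/* Like STR R2,[R1]. */
static inline void reg_write(volatile uint32_t *reg, uint32_t value) { *reg = value; }

/* On the microcontroller only (not on a PC), for example:
     reg_write((volatile uint32_t *)GPIOA_ODR, 0x53F6);
     uint32_t in = reg_read((volatile uint32_t *)GPIOA_IDR);    */
```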
Some Arm addressing modes
- Immediate
        MOV R1, #0x25       ; machine code: F04F0125
        ADD R6, R6, #0x40
- Register addressing mode
        MOV R2, R4
        ADD R3, R2, R1      ; machine code: EB020301
- Register indirect (indexed)
        STR R5, [R6]
        LDR R10, [R3]
Assembler Directives

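Assembler
[Figure: build flow. An editor program produces the assembly source myfile.a; the assembler program translates it into machine language (myfile.o) and a listing (myfile.lst); the linker combines myfile.o with [otherFiles.o], guided by [scriptFile.scr], and produces myfile.map and myfile.hex, which is downloaded to the program memory.]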
Assembler directives vs. Instructions
- Instructions (e.g. ADD, MOV) tell the CPU what to do
- Assembler directives tell the assembler what to do
  - AREA
  - IMPORT and EXPORT
  - END
  - DCD, DCW, DCB
  - EQU
  - INCLUDE
AREA
- AREA sectionName, attribute1, attribute2, …
- Code:
        AREA myCode, CODE, READONLY
- Data:
        AREA myData1, DATA, READWRITE
        AREA myConst, DATA, READONLY
- Example:
        AREA MY_PROG,CODE,READONLY
__main
        MOV R4, #6
        ADD R1,R1,R2
        ….
myFunc
        ADD R2,R3,R4
        …

[Figure: STM32F103 memory map. READONLY areas are placed in Flash (from 0x0000 0000) and READWRITE areas in SRAM (from 0x2000 0000); also shown are Peripherals (0x4000 0000), FSMC (0x6000 0000), and Cortex-M3 internal peripherals (0xE000 0000 to 0xFFFF FFFF).]
IMPORT and EXPORT
File1.s
        ; from the main program:
        IMPORT MY_FUNC
        ...
        BL MY_FUNC          ;call MY_FUNC function
        ...

File2.s
        AREA OUR_EXAMPLE,CODE,READONLY
        EXPORT MY_FUNC
        IMPORT DATA1
MY_FUNC
        LDR R1,=DATA1
        ...
First Assembly Program
        EXPORT __main
        AREA PROG_2_1, CODE, READONLY
__main
        MOV R1, #0x25       ; R1 = 0x25
        MOV R2, #0x34       ; R2 = 0x34
        ADD R3, R2, R1      ; R3 = R2 + R1
HERE    B HERE              ; stay here forever
        END                 ;end of source file
Defining Const. Values using DCD, DCW, and DCB
- DCB allocates bytes of memory and initializes them.
  - Examples:
        MYVALUE DCB 5
        FIBO    DCB 1,1,2,3,5,8
        MY_MSG  DCB "Hello World!"
- DCW allocates half-words (2 bytes each) and initializes them.
  - Example:
        MYVALUE DCW 25425
- DCD allocates words (4 bytes each) and initializes them.
  - Example:
        MYDATA  DCD 0x200000, 0x30F5, 5000000
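As a loose analogy (not what the assembler actually emits), the three directives resemble initialized C arrays of 1-, 2-, and 4-byte elements. One difference worth noting: a C string literal gets a terminating zero byte, while DCB "Hello World!" stores only the 12 characters. The identifiers mirror the examples; MYVALUE is split into two hypothetical names because C cannot define it twice.

```c
#include <stdint.h>

const uint8_t  MYVALUE_B = 5;                           /* MYVALUE DCB 5 : 1 byte */
const uint8_t  FIBO[]    = {1, 1, 2, 3, 5, 8};          /* FIBO DCB 1,1,2,3,5,8 : 6 bytes */
const char     MY_MSG[]  = "Hello World!";              /* 12 chars + C's extra '\0' = 13 bytes */
const uint16_t MYVALUE_H = 25425;                       /* MYVALUE DCW 25425 : 2 bytes */
const uint32_t MYDATA[]  = {0x200000, 0x30F5, 5000000}; /* MYDATA DCD ... : 3 words = 12 bytes */
```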
Storing Fixed Data in Program Memory
        EXPORT __main
        AREA PROG2_2, CODE, READONLY
__main  LDR R2, =OUR_FIXED_DATA ; point to OUR_FIXED_DATA
        LDRB R0, [R2]           ; load R0 with the contents
                                ; of memory pointed to by R2
        ADD R1, R1, R0          ; add R0 to R1
HERE    B HERE                  ; stay here forever
        AREA LOOKUP_EXAMPLE, DATA, READONLY
OUR_FIXED_DATA
        DCB 0x55, 0x33, 1, 2, 3, 4, 5, 6
        DCD 0x23222120, 0x30
        DCW 0x4540, 0x50
        END
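A C sketch of the byte image of OUR_FIXED_DATA, assuming little-endian storage and that the label starts on a word boundary (offset 8, where the DCD words begin, is then already 4-byte aligned, so no padding is needed and the area occupies 20 bytes). The array and helper names are hypothetical. LDRB R0,[R2] reads offset 0, so R0 = 0x55.

```c
#include <stdint.h>

/* Byte image of OUR_FIXED_DATA; index = offset from the label. */
const uint8_t OUR_FIXED_DATA[] = {
    0x55, 0x33, 1, 2, 3, 4, 5, 6,   /* DCB 0x55, 0x33, 1, 2, 3, 4, 5, 6 */
    0x20, 0x21, 0x22, 0x23,         /* DCD 0x23222120 (lowest byte first) */
    0x30, 0x00, 0x00, 0x00,         /* DCD 0x30 */
    0x40, 0x45,                     /* DCW 0x4540 */
    0x50, 0x00                      /* DCW 0x50 */
};

/* LDRB at an offset: one byte, zero-extended. */
uint32_t ldrb_at(uint32_t off) { return OUR_FIXED_DATA[off]; }

/* LDR at a word-aligned offset: four bytes, little-endian. */
uint32_t ldr_at(uint32_t off) {
    return (uint32_t)OUR_FIXED_DATA[off]
         | ((uint32_t)OUR_FIXED_DATA[off + 1] << 8)
         | ((uint32_t)OUR_FIXED_DATA[off + 2] << 16)
         | ((uint32_t)OUR_FIXED_DATA[off + 3] << 24);
}
```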
Allocating memory using SPACE
- SPACE allocates memory without initializing it.
  - Example 1: Allocating 4 bytes of memory:
        MY_LONG  SPACE 4
  - Example 2: Allocating 2 bytes:
        ALFA     SPACE 2
  - Example 3: Allocating an array of 20 bytes:
        MY_ARRAY SPACE 20
Defining 3 variables A, B, and C
        EXPORT __main
        AREA OUR_PROG, CODE, READONLY
__main
        ; A = 5
        LDR R0, =A          ; R0 = Addr. of A
        MOV R1, #5          ; R1 = 5
        STR R1, [R0]        ; init. A with 5
        ; B = 4
        LDR R0, =B          ; R0 = Addr. of B
        MOV R1, #4          ; R1 = 4
        STR R1, [R0]        ; init. B with 4
        ; R1 = A
        LDR R0, =A          ; R0 = Addr. of A
        LDR R1, [R0]        ; R1 = value of A
        ; R2 = B
        LDR R0, =B          ; R0 = Addr. of B
        LDR R2, [R0]        ; R2 = value of B
        ; C = R1 + R2 (C = A + B)
        ADD R3, R1, R2      ; R3 = A + B
        LDR R0, =C          ; R0 = Addr. of C
        STR R3, [R0]        ; C = R3
loop    B loop

        AREA OUR_DATA, DATA, READWRITE
        ; Allocates the following in SRAM
A       SPACE 4
B       SPACE 4
C       SPACE 4
        END

The equivalent C program:
int main()
{
    int a = 5;
    int b = 4;
    int c = a + b;
    while(1)
    {
    }
}
ALIGN
- ALIGN is used to align data on a 32-bit or 16-bit boundary.

a)
DTA     DCB 0x55
        DCB 0x22
        END

b)
DTA     DCB 0x55
        ALIGN 2
        DCB 0x22
        END

c)
DTA     DCB 0x55
        ALIGN 4
        DCB 0x22
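ALIGN n pads with bytes until the current offset is a multiple of n. Assuming DTA itself sits at offset 0 of a word-aligned area, the second byte (0x22) lands at offset 1 in a), 2 in b), and 4 in c). A C sketch of the rounding, with a hypothetical helper name; the bit trick is valid only when n is a power of two.

```c
#include <stdint.h>

/* Round offset up to the next multiple of n; n must be a power of two (1, 2, 4, ...). */
uint32_t align_up(uint32_t offset, uint32_t n) {
    return (offset + n - 1) & ~(n - 1);
}
```

After DTA DCB 0x55 the offset is 1, so align_up(1, 2) gives 2 and align_up(1, 4) gives 4; an offset that is already aligned is left unchanged.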
Assembler Directives
EQU and RN
- name EQU value
  - Example 1:
COUNT   EQU 0x25
        MOV R1, #COUNT      ;R1 = 0x25
        MOV R2, #COUNT + 3  ;R2 = 0x28
  - Example 2:
GPIOA_ODR EQU 0x4001080C
- name RN register
  - Example 1:
RESULT  RN R2
        MOV RESULT,#23
  - Example 2:
ProgCounter RN R15
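EQU names a constant without allocating any memory, which is close to a C `#define`. A sketch reusing the example names; the two functions are hypothetical and only mirror the two MOV lines.

```c
#include <stdint.h>

#define COUNT     0x25          /* COUNT EQU 0x25 */
#define GPIOA_ODR 0x4001080Cu   /* GPIOA_ODR EQU 0x4001080C */

/* MOV R1,#COUNT and MOV R2,#COUNT + 3: COUNT + 3 is evaluated at build time. */
uint32_t mov_r1(void) { return COUNT; }
uint32_t mov_r2(void) { return COUNT + 3; }
```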
Assembler Directives
INCLUDE
- INCLUDE "filename.ext"

hFile.inc
GPIOA_CRL EQU 0x40010800
GPIOA_CRH EQU 0x40010804
GPIOA_IDR EQU 0x40010808
GPIOA_ODR EQU 0x4001080C
....

Program.s
        include "hFile.inc"
Power up in Cortex-M

Startup and main files
Startup_stm32f10x.s
        AREA RESET, DATA, READONLY
        EXPORT __Vectors
__Vectors DCD __initial_sp      ; loc. 0 to 3 (Stack init)
        DCD Reset_Handler       ; loc. 4 to 7
        ...
Reset_Handler PROC
        IMPORT __main
        ...
        LDR R0, =__main
        BX R0

;reserving 0x400 bytes for stack
        AREA STACK, NOINIT, READWRITE, ALIGN=3
Stack_Mem SPACE 0x400
__initial_sp

main.s
        AREA OUR_EXAMPLE,CODE,READONLY
        EXPORT __main
__main
        ...
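The first two words of __Vectors are what the core reads at power-up: on Cortex-M, reset loads SP from location 0 and PC from location 4. A C sketch of that step; the type, the function name, and the example addresses are hypothetical. Bit 0 of the Reset_Handler entry is set to mark Thumb code, so it is cleared before the value is used as an address.

```c
#include <stdint.h>

typedef struct {
    uint32_t sp;   /* stack pointer after reset */
    uint32_t pc;   /* address of the first instruction executed */
} ResetState;

/* vectors[0] = __initial_sp   (loc. 0 to 3)
   vectors[1] = Reset_Handler  (loc. 4 to 7), with bit 0 set for Thumb state */
ResetState cortex_m_reset(const uint32_t *vectors) {
    ResetState s;
    s.sp = vectors[0];
    s.pc = vectors[1] & ~1u;   /* bit 0 selects Thumb state; it is not part of the address */
    return s;
}
```

If Stack_Mem were placed at 0x20000000 (a hypothetical placement), __initial_sp, the label right after the 0x400 reserved bytes, would be 0x20000400, because the stack grows downward from there.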
Flash memory and PC register
Flash ROM contents:
0x08000200   F04F0125
0x08000204   F04F0234
0x08000208   EB020301
0x0800020C   E7FE
0x0800020E

[Figure: the CPU fetches instructions from the PROGRAM Flash ROM over the program bus while the PC steps through 0x08000200, 0x08000204, 0x08000208, 0x0800020C, and 0x0800020E; RAM, ports, and I/O pins sit on the data bus.]

main.lst
Line  Offset    Machine code  Instruction
1     00000000                ; The program adds some data
2     00000000                EXPORT __main
3     00000000                AREA PROG_2_4, CODE, READONLY
4     00000000                __main
5     00000000  F04F 0125     MOV R1, #0x25 ; R1 = 0x25
6     00000004  F04F 0234     MOV R2, #0x34 ; R2 = 0x34
7     00000008  EB02 0301     ADD R3, R2, R1 ; R3 = R2 + R1
8     0000000C
9     0000000C  E7FE          HERE B HERE ; stay here forever
10    0000000E                END
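The addresses in the listing follow from the instruction sizes: the three 32-bit Thumb-2 instructions (F04F 0125, F04F 0234, EB02 0301) each advance the address by 4, and the 16-bit E7FE advances it by 2. A C sketch that rebuilds the addresses from 0x08000200, as shown in the Flash column; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Records the address of each instruction in addr[] and returns the address after the last one. */
uint32_t lay_out(uint32_t base, const uint32_t *sizes, size_t n, uint32_t *addr) {
    uint32_t pc = base;
    for (size_t i = 0; i < n; i++) {
        addr[i] = pc;
        pc += sizes[i];          /* 4 for a 32-bit instruction, 2 for a 16-bit one */
    }
    return pc;
}
```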
How to speed up the CPU
- Increase the clock frequency
  - Higher frequency means more power consumption and more heat
  - Limitations
- Change the architecture
  - Pipelining
  - Harvard
  - RISC
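Pipeline
- Non-pipeline
  - In a given time slot the CPU only fetches, only decodes, or only executes
- Pipeline
  - [Figure: fetch, decode, and execute of consecutive instructions overlap in time]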
Pipeline (Cont.)
        LDR R2, [R4]        ; R2 = [R4]
        ADD R0, R0, R1      ; R0 = R0 + R1
        SUB R3, R3, R4

[Figure: three-stage pipeline (Fetch, Decode, Execute). Each instruction passes through the three stages, one stage behind the instruction before it.]
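In an ideal three-stage pipeline without stalls, and assuming each stage takes one cycle, instruction i is fetched in cycle i, decoded in cycle i+1, and executed in cycle i+2, so the three instructions above finish in 5 cycles instead of 9. A C sketch of that timing model (an idealization; the names are hypothetical):

```c
typedef enum { ST_WAITING, ST_FETCH, ST_DECODE, ST_EXECUTE, ST_DONE } Stage;

/* Stage of instruction i (0-based) during cycle c (0-based). */
Stage stage_of(int i, int c) {
    int k = c - i;
    if (k < 0)  return ST_WAITING;
    if (k == 0) return ST_FETCH;
    if (k == 1) return ST_DECODE;
    if (k == 2) return ST_EXECUTE;
    return ST_DONE;
}

/* Cycles needed for n instructions with and without pipelining. */
int cycles_pipelined(int n)     { return n > 0 ? n + 2 : 0; }
int cycles_non_pipelined(int n) { return 3 * n; }
```

In cycle 2, stage_of reports LDR (i = 0) executing, ADD (i = 1) being decoded, and SUB (i = 2) being fetched.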
Harvard Architecture
- Separate buses for opcodes and operands
- Advantage: opcodes and operands can go in and out of the CPU together.
- Disadvantage: using the Harvard architecture in motherboards leads to more cost in general purpose computers.

[Figure: Code Memory and Data Memory, each connected to the CPU by its own control bus, data bus, and address bus.]
Changing the architecture
RISC vs. CISC
- CISC (Complex Instruction Set Computer)
  - Put as many instructions as you can into the CPU
- RISC (Reduced Instruction Set Computer)
  - Reduce the number of instructions, and make better use of the available hardware.
RISC architecture
- Feature 1 (fixed instruction size)
  - RISC processors have a fixed instruction size. This makes the instruction decoder's task easier.
  - In ARM the instructions are 4 bytes.
  - In Thumb2 the instructions are either 2 or 4 bytes.
- In CISC processors instructions have different lengths
  - E.g. in 8051:
        CLR C           ; a 1-byte instruction
        ADD A, #20H     ; a 2-byte instruction
        LJMP HERE       ; a 3-byte instruction
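In Thumb-2 the decoder can tell from the first half-word alone whether an instruction is 2 or 4 bytes long: if bits 15..11 are 0b11101, 0b11110, or 0b11111, a second half-word follows (this rule comes from the ARMv7-M instruction encoding). A C sketch, with a hypothetical function name, checked against the machine codes in main.lst:

```c
#include <stdint.h>

/* Returns the instruction size in bytes, based on its first half-word. */
int thumb2_size(uint16_t first_halfword) {
    unsigned top5 = first_halfword >> 11;           /* bits 15..11 */
    return (top5 == 0x1D || top5 == 0x1E || top5 == 0x1F) ? 4 : 2;
}
```

F04F (MOV) and EB02 (ADD) start 32-bit instructions, while E7FE (B HERE) is a complete 16-bit instruction.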
RISC architecture
- Feature 2: reduce the number of instructions
  - Pros: reduces the number of transistors used
  - Cons:
    - Can make assembly programming more difficult
    - Can lead to using more memory
RISC architecture
- Feature 3: limit the addressing modes
  - Advantage
    - Hardwiring: simpler decoding allows hardwired control logic
  - Disadvantage
    - Can make assembly programming more difficult
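RISC architecture
- Feature 4: Load/Store
  - Only load and store instructions access memory; ALU instructions such as ADD work on registers.
        LDR R8,=0x20
        LDR R0,[R8]
        LDR R8,=0x220
        LDR R1,[R8]
        ADD R0,R0,R1
        LDR R8,=0x230
        STR R0,[R8]

[Figure: microcontroller block diagram. The CPU (ALU, PC, instruction decoder) reaches the PROGRAM Flash ROM over the program bus and RAM, USART, and timers over the data bus; also shown are the interrupt unit, OSC, ports, other peripherals, and the I/O pins.]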
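A C sketch of the program above that keeps the load/store split visible, assuming a small hypothetical word-addressed array toy_mem in place of real memory (index = address / 4). Only toy_load and toy_store touch memory; the addition works on local variables that play the role of registers.

```c
#include <stdint.h>

#define TOY_WORDS 0x100u            /* covers byte addresses 0x000 to 0x3FF */
uint32_t toy_mem[TOY_WORDS];

uint32_t toy_load(uint32_t addr)              { return toy_mem[addr / 4]; }  /* LDR */
void     toy_store(uint32_t addr, uint32_t v) { toy_mem[addr / 4] = v; }     /* STR */

void add_two_locations(void) {
    uint32_t r0, r1, r8;
    r8 = 0x20;  r0 = toy_load(r8);  /* LDR R8,=0x20  ; LDR R0,[R8] */
    r8 = 0x220; r1 = toy_load(r8);  /* LDR R8,=0x220 ; LDR R1,[R8] */
    r0 = r0 + r1;                   /* ADD R0,R0,R1  : registers only */
    r8 = 0x230; toy_store(r8, r0);  /* LDR R8,=0x230 ; STR R0,[R8] */
}
```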
RISC architecture
- Feature 5: more than 95% of instructions are executed in 1 machine cycle
RISC architecture
- Feature 6
  - RISC processors typically have a large register file, often 32 or more registers. This decreases the need for stack and memory usage.
  - ARM, however, has 16 general purpose registers (R0 to R15).
