You are on page 1of 91

EE382N-4

Advanced Microcontroller Systems

Programming the ARM Processor on the Zed Board

Mark McDermott

Spring 2018

2/8/18 EE382N-4 Class Notes Page 1


Agenda

§ Overview of GNU tools


§ ZED Board specifics
§ C and ALP (Assembly Language Programming)

2/8/18 EE382N-4 Class Notes 2


GNU compiler and binutils

§ There are two types of compilations


– One is for code that is compiled to run on top of Linux (or an OS)
– The other is for “Bare Metal” code

§ GNU compiler and bin/utils


– gcc: GNU C compiler
– as: GNU assembler
– ld: GNU linker
– gdb: GNU project debugger

§ We use the following variant of the GNU toolset:


arm-linux-gnueabihf-xxxxx

For more info on EABI: https://en.wikipedia.org/wiki/Application_binary_interface

2/8/18 EE382N-4 Class Notes 3


Pipeline

§ COFF (common object file format)


§ ELF (extended linker format)
§ Segments in the object file
– Text: code
– Data: initialized global variables
– BSS: uninitialized global variables

Simulator
gcc as ld Debugger
.c .s .coff .elf …

C source asm source object file executable

2/8/18 EE382N-4 Class Notes 4


ARM Procedure Call Standard (APCS)
§ ARM defines a set of rules for procedure entry and exit so that
– Object codes generated by different compilers can be linked together
– Procedures can be called between high-level languages and assembly

§ APCS defines
– Use of registers
– Use of stack
– Format of stack-based data structure
– Mechanism for argument passing

2/8/18 EE382N-4 Class Notes 5


APCS register usage convention
Register APCS Name Function
0 a1 Argument 1 / integer result / scratch register Used to pass the first
1 a2 Argument 1 / scratch register 4 parameters
Caller-saved if
2 a3 Argument 1 / scratch register
necessary
3 a4 Argument 1 / scratch register
4 v1 Register variable 1
5 v2 Register variable 2
6 v3 Register variable 3
Register variables,
must return
7 v4 Register variable 4
unchanged
8 v5 Register variable 5 Callee-saved
9 sb/v6 Register variable 6 / Static base
10 sl/v7 Register variable 7 / Stack limit
11 fp Frame pointer
12 ip Scratch register / new sb in inter-link -unit calls
Registers for special
purposes
13 sp Lower end of current stack frame
Could be used as
14 lr Link address / scratch register temporary variables
15 pc Program counter if saved properly.
2/8/18 EE382N-4 Class Notes 6
GCC Calling Convention

§ First 4 arguments saved in registers saved


registers
§ Registers saved by caller and callee (by caller)

(including frame pointer and returning PC)


argument 6
§ Frame pointer points just below the last frame pointer argument 5
argument passed on the stack (the bottom saved
of the frame) registers
(by callee)
§ Stack pointer points to the first word after high
the frame memory
local variables
§ Return value is in R0

dynamic area
§ NOTE: Procedures with less than four
parameters are more effective stack pointer
stack area

2/8/18 EE382N-4 Class Notes 7


Stack Backtrace Data Structure

§ Save code pointer (the value of pc) allows the save code pointer [fp, #0] ¬fp points
function corresponding to a stack backtrace return link value [fp, #-4]
structure to be located return sp value [fp, #-8]
return fp value [fp, #-12]
§ Entry code {saved v7 value}
MOV ip, sp ; save current sp,
; ready to save as old sp
{saved v6 value}
STMFD sp!, {a1-a4, v1-v7, sb, fp, ip, lr, pc} ;as needed {saved v5 value}
SUB fp, ip, #4 {saved v4 value}
{saved v3 value}
§ On function exit {saved v2 value}
return link value pc {saved v1 value}
return sp value sp {saved a4 value}
return fp value fp {saved a3 value}
{saved a2 value}
§ If no use of stack, can be simple as {saved a1 value}
MOV pc, lr ; or
BX lr

2/8/18 EE382N-4 Class Notes Page 8


Make file for applications
# KDIR is the location of the kernel source. The current standard is to link to the associated source tree
# from the directory containing the compiled modules.

KDIR:= /usr/src/plnx_kernel
LIBPATH=$(KDIR)/lib
INCPATH=$(KDIR)/include

CC := arm-linux-gnueabihf-gcc

PWD:= $(shell pwd)


CFLAGS= -I$(INCPATH) -L$(LIBPATH)

dm:
$(CC) $(CFLAGS) dm.c -o $@

clean:
rm dm

2/8/18 EE382N-4 Class Notes 9


Make file for kernel modules
CC := arm-linux-gnueabihf-gcc

# obj-m is a list of what kernel modules to build. The .o and other objects will be automatically built from the
# corresponding .c file - no need to list the source files explicitly.

obj-m := zed_gpio_int.o

# KDIR is the location of the kernel source. The current standard is to link to the associated source tree
# from the directory containing the compiled modules.

KDIR:= /usr/src/plnx_kernel

# PWD is the current working directory and the location of our module source files.
PWD := $(shell pwd)

# default: is the default make target. The rule here says to run make with a working directory of the directory
# containing the kernel source and compile only the modules in the PWD (local) directory.

default:

$(MAKE) -C $(KDIR) M=$(PWD) modules


$(CC) intr_latency.c -o intr_latency.exe
$(CC) intr_latency_csv.c -o intr_latency_csv

clean::
$(RM) .skeleton* *.cmd *.o *.ko *.mod.c *.exe
$(RM) -R .tmp*

2/8/18 EE382N-4 Class Notes 10


Agenda

§ Overview of GNU tools


§ ZED Board specifics
§ C and ALP (Assembly Language Programming)

2/8/18 EE382N-4 Class Notes 11


Memory mapped I/O operations using C
#define MAP_SIZE 4096UL
#define MAP_MASK (MAP_SIZE -1)
#define GPIO_DEV_PATH "/dev/gpio_int"
#define GPIO_PIN_NUM 0x0
#define GPIO_DR 0x43c00000 // Data register

#define ONE_BIT_MASK(_bit) (0x00000001<<(_bit))

volatile int rc;

rc = gpio_set_pin(GPIO_DR, GPIO_PIN_NUM, 0); // Clear output pin

// Lets look at gpio_set_pin function:

int gpio_set_pin(unsigned int target_addr, unsigned int pin_number, unsigned int bit_val)
{
unsigned int reg_data;

int fd = open("/dev/mem", O_RDWR|O_SYNC);


if(fd == -1) { printf("Unable to open /dev/mem. Ensure it exists (major=1, minor=1)\n");
return -1; }

volatile unsigned int *regs, *address ;

regs = (unsigned int *)mmap(NULL, MAP_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd,


target_addr&~MAP_MASK); // Physical to virtual address mapping

// Continued on next page à

EE382N-4 Class Notes 12


Memory mapped I/O operations using C
// Continued from previous page

address = regs + (((target_addr)&MAP_MASK)>>2);

/* Read register value to modify */

reg_data = *address;

if (bit_val == 0) {

/* Deassert output pin in the target port's DR register*/

reg_data&= ~ONE_BIT_MASK(pin_number);
*address = reg_data;
} else {

/* Assert output pin in the target port's DR register*/

reg_data |= ONE_BIT_MASK(pin_number);
*address = reg_data;
}

int temp = close(fd); // CRITICAL: Need to close file descriptor

if(temp == -1) { printf("Unable to close /dev/mem. Ensure it exists (major=1,


minor=1)\n"); return -1; }

munmap(NULL, MAP_SIZE); // CRITICAL: Need to release mapping


return 0;

EE382N-4 Class Notes 13


Memory mapped I/O operations using shell scripts

#!/bin/sh # Enable GPIO_19 as a 48 MHz reference clock


/bin/pm 0x020e0220 0x00000003
# Enable IOMUX port for CSPI /bin/pm 0x020e05f0 0x000020e9
/bin/pm 0x020e0144 0x00000001 /bin/pm 0x020c4060 0x010e00d0
/bin/pm 0x020e0148 0x00000001
/bin/pm 0x020e014c 0x00000001 # Dump registers
/bin/pm 0x020e01cc 0x00000001 /bin/dm 0x02008000 0x10
/bin/pm 0x020e07dc 0x00000002 /bin/dm 0x020c4000
/bin/pm 0x020e07e8 0x00000002 /bin/dm 0x020c406c
/bin/pm 0x020e024c 0x00000000
/bin/pm 0x020e01bc 0x05 // Select ALT 5
# Set drive strength for SPI_SCLK /bin/smb 0x020a400c 17 0x1 // Select rising edge
/bin/pm 0x020e0514 0x00002090 /bin/smb 0x020a4014 8 0x1 // Set bit 8

# Enable clock gating to ECSPI /bin/pm 0x020e016c 0x05 // Select ALT 5


/bin/pm 0x020c406c 0x00300c03 /bin/pm 0x020a4010 0x00200000 // Select rising edge triggered
/bin/smb 0x020a4014 26 0x1 // Set bit 26
# Reset SPI
/bin/pm 0x02008008 0x01f01010 /bin/pm 0x020e0334 0x05 // Set ALT5 should be default
/bin/pm 0x020e071c 0x00003088 // Sets drive strength
# Enable SPI /bin/smb 0x020b4004 8 0x1 // Set pin 8
/bin/pm 0x02008008 0x01f004f9
/bin/pm 0x0200800c 0x00000f00 /bin/pm 0x020e01f4 0x05 // Set ALT5 (should be default)
/bin/pm 0x020e01f4 0x00003088 // Sets drive strength
# Enable interrupts /bin/smb 0x0209c004 24 0x1 // Set bit 24
/bin/pm 0x02008010 0x00000010

# Enable test mode


/bin/pm 0x02008020 0x80000000

2/8/18 EE382N-4 Class Notes Page 14


Memory mapped I/O operations using C
#include "stdio.h"
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include "mx6_registers.h"

// Prototypes

int spi_int_test(unsigned int channel);


int pm( unsigned int target_addr, unsigned int value );
int smb (unsigned int target_addr, unsigned int bit_number, unsigned int bit_val );

/* --------------------------------------------------------------------------
* init_spi: Initialize SPI. Two modes of operation
* channel = 0: H1KP Digital Channel
* channel = 1: H1KP Digital plus Analog Channel
* channel = 2: ADC -- 32 bit word
* channel = 3: DAC -- 32 bit word
* ------------------------------------------------------------------------ */

int init_spi_chan(unsigned int channel) {

if (channel > 3)return -1;

// -----------------------------------------------------------------------
// Enable IOMUX ports for CSPI1
// -----------------------------------------------------------------------
if ((channel == 0) || (channel == 1)) {
pm (0x020e0144, 0x00000001); // Select ALT1 for CSPI1_SCLK
pm (0x020e0148, 0x00000001); // Select ALT1 for CSPI1_MISO
pm (0x020e014c, 0x00000001); // Select ALT1 for CSPI1_MOSI
pm (0x020e01cc, 0x00000001); // Select ALT1 for CSPI1_SS0
pm (0x020e07dc, 0x00000002); // Select ALT1 for CSPI1_MISO input
pm (0x020e0518, 0x000030b0); // Select 100K PD on MISO input
pm (0x020e024c, 0x00000000); // Select ALT1 for CSPI1_SS1
}
} // End of init_spi routine

2/8/18 EE382N-4 Class Notes


“Hello World” from a Kernel Module
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/proc_fs.h>
#include <linux/sched.h>
#include <asm/uaccess.h>

int rc, len,temp;


msg="Hello World from more /proc/hello";
char *msg; len=strlen(msg);
temp = len;
int read_proc(struct file *filp,char *buf,size_t
count,loff_t *offp ) printk(KERN_INFO "Hello World from proc_create
{ routine \n");
if(count>temp) }
{
count=temp;
}
int proc_init (void) {
temp=temp-count; create_new_proc_entry();
rc = copy_to_user(buf,msg, count); printk("Loading module: hello \n");
return 0;
if(count==0) temp=len; }

return count; void proc_cleanup(void) {


} printk("Unloading module: hello - Bye bye
\n");
struct file_operations proc_fops = { remove_proc_entry("hello",NULL);
read: read_proc }
};
MODULE_LICENSE("GPL");
module_init(proc_init);
void create_new_proc_entry( void) module_exit(proc_cleanup);
{
proc_create("hello",0,NULL,&proc_fops);

2/8/18 EE382N-4 Class Notes 16


Making device nodes and inserting kernel modules

#!/bin/sh #!/bin/sh -e

# --------------------------------------------- # --------------------------------------------
# Manual way to setup device nodes and install # Automatically make nodes and insert kernel
# kernel modules
# modules on boot up
# ---------------------------------------------
# --------------------------------------------’
# ### BEGIN INIT INFO
# Make a device node: # Provides: imod
# # Required-Start:
# Required-Stop:
/bin/rm /dev/spi_int # Should-Start:
/bin/mknod /dev/spi_int c 245 0 # Should-Stop:
# Default-Start: S
# # Default-Stop:
# Load kernel module of the SPI device driver: # Short-Description: Make the various device
#
# nodes and insert kernel modules
/sbin/rmmod spi_int
# Description: Located in /etc/init.d
/sbin/insmod ./spi_int.ko
### END INIT INFO
if[ $? != 0 ];then
printf"$0: *** Error: insmod failed\n" PATH='/sbin:/bin'
exit1
fi /bin/mknod /dev/gpio_int c 243 0
/sbin/insmod /root/code/ZED/gpio-int-
/bin/pm 0x020c406c 0x00300c03 latency/zed_gpio_int.ko
/bin/pm 0x02008008 0x01f01019
/bin/pm 0x0200800c 0x00000100
/bin/pm 0x02008020 0x80000000
/bin/dm 0x02008000 0x10

2/8/18 EE382N-4 Class Notes Page 17


Setting up automatic loading during boot up

§ Need to generate link in /etc/rcS.d to imod script

2/8/18 EE382N-4 Class Notes 18


C Programming in Embedded Systems
§ Assembly language
– dependent of processor architecture
– cache control, registers, program status, interrupt

§ High-level language
– memory model
– independent of processor architecture (partially true)

§ Advantages and disadvantages


– performance
– code size
– software development and life cycle

2/8/18 EE382N-4 Class Notes 19


Interface C and Assembly Language
§ Why combine C and assembly language
– performance
– C doesn’t handle most architecture features, such as registers, program
status, etc.

§ Develop C and assembly programs and then link them together


– at source level – in-line assembly code in C program
– at object level – procedure call

Assembly .s Assembler
source module
.o

C Library .so Linker Executable

.c .o
C source module C Compiler

2/8/18 EE382N-4 Class Notes 20


Inline Assembly Language in gcc
§ You can write inline assembler instructions in much the same way as you would
write assembler programs.
§ Registers and constants are used in a different way if they refer to expressions
of your C program.
§ The connection between registers and C operands is specified in the second
and third part of the asm instruction, the list of output and input operands,
respectively.
§ The general form of the asm instruction is:
asm(code : output operand list : input operand list : clobber list);

§ In the code section, operands are referenced by a percent sign followed by a


single digit.
%0 refers to the first
%1 to the second operand and so forth.
asm(”instruction %1,%2,%0" : "=r" (output) : "r" (input1), "r" (input2));

EE382N-4 Class Notes 21


Inline Assembly Language in gcc
You can add the volatile attribute to the asm statement to instruct the compiler
not to optimize your assembler code.
asm volatile("mov %0, %1, ror #1" : "=r" (result) : "r" (value));

// Example of C stub function containing nothing but assembly code

unsigned long htonl(unsigned long val)


{
asm volatile ("eor r3, %1, %1, ror #16\n\t"
"bic r3, r3, #0x00FF0000\n\t"
);
return val;
}

§ The clobber list should contain:


– The registers modified, either explicitly or implicitly, by your code.
– If your code modifies the condition code register, “cc”.
– If your code modifies memory, “memory”.
§

EE382N-4 Class Notes 22


Inline Assembly Language in gcc
#include <stdio.h>
#include <stdint.h>

static inline __attribute__((always_inline))

uint32_t arm_ror_imm(uint32_t v, uint32_t sh) {


uint32_t d;
asm ("ROR %[Rd], %[Rm], %[Is]" : [Rd] "=r" (d) : [Rm] "r" (v), [Is] "i" (sh));
return d;
}

static inline __attribute__((always_inline))

uint32_t arm_ror(uint32_t v, uint32_t sh) {


uint32_t d;
asm ("ROR %[Rd], %[Rm], %[Rs]" : [Rd] "=r" (d) : [Rm] "r" (v), [Rs] "r" (sh));
return d;
}

int main() {
uint32_t val;

val = 0x22;
printf("val = 0x%08X\n", val);
printf("arm_ror(val, 1) = 0x%08X\n", arm_ror(val, 1));
printf("arm_ror(val, 2) = 0x%08X\n", arm_ror(val, 2));
printf("arm_ror_imm(val, 3) = 0x%08X\n", arm_ror_imm(val, 3));
printf("arm_ror_imm(val, 4) = 0x%08X\n", arm_ror_imm(val, 4));
return 0;
}
EE382N-4 Class Notes 23
Inline Assembly Language in gcc
Example 1
asm("foo %1,%2,%0" : "=r" (output) : "r" (input1), "r" (input2));

The generated code could be


#APP
foo r13,r5,r9 // %0,%1, and %2 are replaced with registers
// holding the first three argument.
#NO_APP

Example 2
asm("foo %1,%2,%0" : "=r" (ptr->vtable[3](a,b,c)->foo.bar[baz]) : : "r"
(gcc(is) + really(damn->cool)), "r" (42));

GCC will treat this just like:


register int t0, t1, t2;
t1 = gcc(is) + really(damn->cool);
t2 = 42;
asm("foo %1,%2,%0" : "=r" (t0) : "r" (t1), "r" (t2));
ptr->vtable[3](a,b,c)->foo.bar[baz] = t0;

2/8/18 EE382N-4 Class Notes 24


Example: C assignments
§ C:
x = (a + b) - c;

§ Assembler:
ADR r4,a ; get address for a
LDR r0,[r4] ; get value of a
ADR r4,b ; get address for b, reusing r4
LDR r1,[r4] ; get value of b
ADD r3,r0,r1 ; compute a+b
ADR r4,c ; get address for c
LDR r2,[r4] ; get value of c
SUB r3,r3,r2 ; complete computation of x
ADR r4,x ; get address for x
STR r3,[r4] ; store value of x

2/8/18 EE382N-4 Class Notes 25


Example: C assignment
§ C:
y = a*(b+c);

§ Assembler:
ADR r4,b ; get address for b
LDR r0,[r4] ; get value of b
ADR r4,c ; get address for c
LDR r1,[r4] ; get value of c
ADD r2,r0,r1 ; compute partial result
ADR r4,a ; get address for a
LDR r0,[r4] ; get value of a
MUL r2,r2,r0 ; compute final value for y
ADR r4,y ; get address for y
STR r2,[r4] ; store y

2/8/18 EE382N-4 Class Notes 26


Example: C assignment
§ C:
z = (a << 2) | (b & 15);

§ Assembler:
ADR r4,a ; get address for a
LDR r0,[r4] ; get value of a
MOV r0,r0,LSL 2 ; perform shift
ADR r4,b ; get address for b
LDR r1,[r4] ; get value of b
AND r1,r1,#15 ; perform AND
ORR r1,r0,r1 ; perform OR
ADR r4,z ; get address for z
STR r1,[r4] ; store value for z

2/8/18 EE382N-4 Class Notes 27


Example: if statement
§ C:
if (a > b) { x = 5; y = c + d; } else x = c - d;

§ Assembler:
; compute and test condition
ADR r4,a ; get address for a
LDR r0,[r4] ; get value of a
ADR r4,b ; get address for b
LDR r1,[r4] ; get value for b
CMP r0,r1 ; compare a < b
BLE fblock ; if a ><= b, branch to false block

2/8/18 EE382N-4 Class Notes 28


if statement, cont’d.
; true block
MOV r0,#5 ; generate value for x
ADR r4,x ; get address for x
STR r0,[r4] ; store x
ADR r4,c ; get address for c
LDR r0,[r4] ; get value of c
ADR r4,d ; get address for d
LDR r1,[r4] ; get value of d
ADD r0,r0,r1 ; compute y
ADR r4,y ; get address for y
STR r0,[r4] ; store y
B after ; branch around false block

2/8/18 EE382N-4 Class Notes 29


if statement, cont’d.
; false block
fblock:
ADR r4,c ; get address for c
LDR r0,[r4] ; get value of c
ADR r4,d ; get address for d
LDR r1,[r4] ; get value for d
SUB r0,r0,r1 ; compute a-b
ADR r4,x ; get address for x
STR r0,[r4] ; store value of x

after ...

2/8/18 EE382N-4 Class Notes 30


Example: Conditional instruction implementation
; true block
MOVLT r0,#5 ; generate value for x
ADRLT r4,x ; get address for x
STRLT r0,[r4] ; store x
ADRLT r4,c ; get address for c
LDRLT r0,[r4] ; get value of c
ADRLT r4,d ; get address for d
LDRLT r1,[r4] ; get value of d
ADDLT r0,r0,r1 ; compute y
ADRLT r4,y ; get address for y
STRLT r0,[r4] ; store y

2/8/18 EE382N-4 Class Notes 31


Conditional instruction implementation, cont’d.
; false block
ADRGE r4,c ; get address for c
LDRGE r0,[r4] ; get value of c
ADRGE r4,d ; get address for d
LDRGE r1,[r4] ; get value for d
SUBGE r0,r0,r1 ; compute a-b
ADRGE r4,x ; get address for x
STRGE r0,[r4] ; store value of x

2/8/18 EE382N-4 Class Notes 32


Example: switch statement
§ C:
switch (test) { case 0: … break; case 1: … }

§ Assembler:
ADR r2,test ; get address for test
LDR r0,[r2] ; load value for test
ADR r1,switchtab ; load address for switch table
LDR r1,[r1,r0,LSL #2] ; index switch table

switchtab:
DCD case0
DCD case1
...

2/8/18 EE382N-4 Class Notes 33


Example: FIR filter
§ C:
for (i=0, f=0; i<N; i++)
f = f + c[i]*x[i];

§ Assembler
; loop initiation code
MOV r0,#0 ; use r0 for I
MOV r8,#0 ; use separate index for arrays
ADR r2,N ; get address for N
LDR r1,[r2] ; get value of N
MOV r2,#0 ; use r2 for f

2/8/18 EE382N-4 Class Notes 34


FIR filter, cont’.d
ADR r3,c ; load r3 with base of c
ADR r5,x ; load r5 with base of x
; loop body
loop:
LDR r4,[r3,r8] ; get c[i]
LDR r6,[r5,r8] ; get x[i]
MUL r4,r4,r6 ; compute c[i]*x[i]
ADD r2,r2,r4 ; add into running sum
ADD r8,r8,#4 ; add one word offset to array index
ADD r0,r0,#1 ; add 1 to i
CMP r0,r1 ; exit?
BLT loop ; if i < N, continue

2/8/18 EE382N-4 Class Notes 35


Gnu AS program format

.file “test.s”
.text
.global main
.type main, %function
main:
MOV R0, #100
ADD R0, R0, R0
SWI #11
.end

2/8/18 EE382N-4 Class Notes 36


Gnu AS program format

.file “test.s”
.text
export variable .global main
.type main, %function
main:
set the type of a
MOV R0, #100
symbol to be
ADD R0, R0, R0 either a function
SWI #11 or an object
signals the end
.end
of the program

call interrupt to
end the program

2/8/18 EE382N-4 Class Notes 37


ARM assembly program

label operation operand comments

main:
LDR R1, value // load value
STR R1, result
SWI #11

value: .word 0x0000C123


result: .word 0

2/8/18 EE382N-4 Class Notes 38


Control structures

§ Flow of control:
– Sequence.
– Decision: if-then-else, switch
– Iteration: repeat-until, do-while, for

2/8/18 EE382N-4 Class Notes 39


if statements

if C then T else E // find maximum


if (R0>R1) then R2:=R0
C else R2:=R1

BNE else

T
B endif
else:

E
endif:

2/8/18 EE382N-4 Class Notes 40


if statements

if C then T else E // find maximum


if (R0>R1) then R2:=R0
C else R2:=R1

BNE else

T CMP R0, R1
B endif BLE else
else: MOV R2, R0
B endif
E
else: MOV R2, R1
endif:
endif:

2/8/18 EE382N-4 Class Notes 41


if statements

// find maximum
if (R0>R1) then R2:=R0
Two other options: else R2:=R1

CMP R0, R1
MOVGT R2, R0
MOVLE R2, R1 CMP R0, R1
BLE else
MOV R2, R0 MOV R2, R0
CMP R0, R1 B endif
MOVLE R2, R1 else: MOV R2, R1
endif:

2/8/18 EE382N-4 Class Notes 42


if statements

if (R1==1 || R1==5 || R1==12) R0=1;

TEQ R1, #1 ...


TEQNE R1, #5 ...
TEQNE R1, #12 ...
MOVEQ R0, #1 BNE fail

2/8/18 EE382N-4 Class Notes 43


if statements

if (R1==0) zero
else if (R1>0) plus
else if (R1<0) neg

TEQ R1, #0
BMI neg
BEQ zero
BPL plus
neg: ...
B exit
Zero: ...
B exit
...

2/8/18 EE382N-4 Class Notes 44


Multi-way branches
CMP R0, #`0’
BCC other // less than ‘0’
CMP R0, #`9’
BLS digit // between ‘0’ and ‘9’
CMP R0, #`A’
BCC other
CMP R0, #`Z’
BLS letter // between ‘A’ and ‘Z’
CMP R0, #`a’
BCC other
CMP R0, #`z’
BHI other // not between ‘a’ and ‘z’
letter: ...

2/8/18 EE382N-4 Class Notes 45


Switch statements

switch (exp) { e=exp;


case c1: S1; break; if (e==c1) {S1}
case c2: S2; break; else
... if (e==c2) {S2}
case cN: SN; break; else
default: SD;
...
}

2/8/18 EE382N-4 Class Notes 46


Switch statements

switch (R0) { CMP R0, #0


case 0: S0; break; BEQ S0
case 1: S1; break; CMP R0, #1
case 2: S2; break; BEQ S1
case 3: S3; break; CMP R0, #2
default: err;
BEQ S2
}
CMP R0, #3
BEQ S3
err: ...
The range is between 0 and N
B exit
Slow if N is large S0: ...
B exit

2/8/18 EE382N-4 Class Notes 47


Switch statements

ADR R1, JMPTBL What if the range is between


CMP R0, #3 M and N?
LDRLS PC, [R1, R0, LSL #2]
err:...
B exit For larger N and sparse values,
S0: ... we could use a hash function.
R1 JMPTBL
JMPTBL:
S0
.word S0
.word S1 R0
S1
.word S2
.word S3 S2

S3

2/8/18 EE382N-4 Class Notes 48


Iteration
§ repeat-until
§ do-while
§ for

2/8/18 EE382N-4 Class Notes 49


repeat loops

do { S } while ( C )

loop:
S

Test C
BEQ loop
endw:

2/8/18 EE382N-4 Class Notes 50


while loops

while ( C ) { S }

loop:
B test
Test C loop:
BNE endw
S
S test:
Test C
BEQ loop
B loop endw:
endw:

2/8/18 EE382N-4 Class Notes 51


while loops

while ( C ) { S }

C
B test BNE endw
loop: loop:

S S
test: test:

C C
BEQ loop BEQ loop
endw: endw:

2/8/18 EE382N-4 Class Notes 52


GCD

int gcd (int i, int j)


{
while (i!=j)
{
if (i>j)
i -= j;
else
j -= i;
}
}

2/8/18 EE382N-4 Class Notes 53


for loops

for ( I ; C ; A ) { S } for (i=0; i<10; i++)


{ a[i]:=0; }

loop:
I

C
BNE endfor

A
B loop
endfor:

2/8/18 EE382N-4 Class Notes 54


for loops

for ( I ; C ; A ) { S } for (i=0; i<10; i++)


{ a[i]:=0; }

loop:
I MOV R0, #0
ADR R2, A
C MOV R1, #0
BNE endfor loop: CMP R1, #10
BGE endfor
S STR R0,[R2,R1,LSL #2]
ADD R1, R1, #1
A B loop
endfor:
B loop
endfor:

2/8/18 EE382N-4 Class Notes 55


for loops

for (i=0; i<10; i++) Execute a loop for a


{ do something; } constant # of times.

MOV R1, #0 MOV R1, #10


loop: CMP R1, #10 loop:
BGE endfor
// do something // do something
ADD R1, R1, #1 SUBS R1, R1, #1
B loop BNE loop
endfor: endfor:

2/8/18 EE382N-4 Class Notes 56


Procedures
§ Arguments: expressions passed into a function
§ Parameters: values received by the function
§ Caller and callee

void func(int a, int b)


callee
{
...
parameters
}

int main(void) caller


{
arguments
...
func(100,200);
...
}

2/8/18 EE382N-4 Class Notes 57


Procedures

main:
... func:
BL func ...
... ...

.end .end

§ How to pass arguments? By registers? By stack? By memory?


In what order?

2/8/18 EE382N-4 Class Notes 58


Procedures

main: caller callee


// use R5 func:
BL func ...
// use R5 // use R5
... ...
... ...
.end .end
§ How to pass arguments? By registers? By stack? By memory?
In what order?
§ Who should save R5? Caller? Callee?

2/8/18 EE382N-4 Class Notes 59


Procedures (caller save)

main: caller callee


// use R5 func:
// save R5 ...
BL func // use R5
// restore R5
// use R5
.end .end
§ How to pass arguments? By registers? By stack? By memory?
In what order?
§ Who should save R5? Caller? Callee?

2/8/18 EE382N-4 Class Notes 60


Procedures (callee save)

main: caller callee


// use R5 func: // save R5
BL func ...
// use R5 // use R5

//restore R5
.end .end
§ How to pass arguments? By registers? By stack? By memory?
In what order?
§ Who should save R5? Caller? Callee?

2/8/18 EE382N-4 Class Notes 61


ARM Procedure Call Standard (APCS)
§ ARM defines a set of rules for procedure entry and exit so that
– Object codes generated by different compilers can be linked together
– Procedures can be called between high-level languages and assembly
§ APCS defines
– Use of registers
– Use of stack
– Format of stack-based data structure
– Mechanism for argument passing
§ Argument passing
– The first four word arguments are passed through R0 to R3.
– Remaining parameters are pushed into stack in the reverse order.
– Procedures with less than four parameters are more effective.
§ Return Value
– One word value in R0
– A value of length 2~4 words (R0-R1, R0-R2, R0-R3)

2/8/18 EE382N-4 Class Notes 62


APCS register usage convention
Register APCS Name Function
0 a1 Argument 1 / integer result / scratch register Used to pass the first
1 a2 Argument 1 / scratch register 4 parameters
Caller-saved if
2 a3 Argument 1 / scratch register
necessary
3 a4 Argument 1 / scratch register
4 v1 Register variable 1
5 v2 Register variable 2
6 v3 Register variable 3
Register variables,
must return
7 v4 Register variable 4
unchanged
8 v5 Register variable 5 Callee-saved
9 sb/v6 Register variable 6 / Static base
10 sl/v7 Register variable 7 / Stack limit
11 fp Frame pointer
12 ip Scratch register / new sb in inter-link -unit calls
Registers for special
purposes
13 sp Lower end of current stack frame
Could be used as
14 lr Link address / scratch register temporary variables
15 pc Program counter if saved properly.
2/8/18 EE382N-4 Class Notes 63
Function entry/exit
§ A simple leaf function with less than four parameters has the
minimal overhead. 50% of calls are to leaf functions

BL leaf1 main
...

leaf1: ...
... leaf leaf

MOV PC, LR // return


leaf

leaf

2/8/18 EE382N-4 Class Notes 64


Function entry/exit
§ Save a minimal set of temporary variables

BL leaf2
...

leaf2: STMFD sp!, {regs, lr} // save


...
LDMFD sp!, {regs, pc} // restore and
// return

2/8/18 EE382N-4 Class Notes 65


Standard ARM C program address space

application
load address code
application image
top of application static data

heap
top of heap

stack limit (sl)


stack pointer (sp)

stack
top of memory
2/8/18 EE382N-4 Class Notes 66
Accessing operands

§ A procedure often accesses operand in the following ways


– An argument passed on a register: no further work
– An argument passed on the stack: use stack pointer (R13) relative addressing
with an immediate offset known at compiling time
– A constant: PC-relative addressing, offset known at compiling time
– A local variable: allocate on the stack and access through stack pointer
relative addressing
– A global variable: allocated in the static area and can be accessed by the
static base relative (R9) addressing

2/8/18 EE382N-4 Class Notes 67


Procedure
main: low
LDR R0, #0
...
BL func
...

high
stack
2/8/18 EE382N-4 Class Notes 68
Procedure
func: STMFD SP!, {R4-R6, LR} low
SUB SP, SP, #0xC
...
STR R0, [SP, #0] // v1=a1
v1
v2
...
v3
ADD SP, SP, #0xC
R4
LDMFD SP!, {R4-R6, PC}
R5
R6
LR

high
stack
2/8/18 EE382N-4 Class Notes 69
Block copy example
void bcopy(char *to, char *from, int n)
{
while (n--)
*to++ = *from++;
}

2/8/18 EE382N-4 Class Notes 70


Block copy example
// arguments: R0: to, R1: from, R2: n
bcopy: TEQ R2, #0
BEQ end
loop: SUB R2, R2, #1
LDRB R3, [R1], #1
STRB R3, [R0], #1
B bcopy
end: MOV PC, LR

2/8/18 EE382N-4 Class Notes 71


Block copy example
// arguments: R0: to, R1: from, R2: n
// rewrite “n–-” as “-–n>=0”
bcopy: SUBS R2, R2, #1
LDRPLB R3, [R1], #1
STRPLB R3, [R0], #1
BPL bcopy
MOV PC, LR

2/8/18 EE382N-4 Class Notes 72


Block copy example

// arguments: R0: to, R1: from, R2: n


// assume n is a multiple of 4; loop
unrolling
bcopy: SUBS R2, R2, #4
LDRPLB R3, [R1], #1
STRPLB R3, [R0], #1
LDRPLB R3, [R1], #1
STRPLB R3, [R0], #1
LDRPLB R3, [R1], #1
STRPLB R3, [R0], #1
LDRPLB R3, [R1], #1
STRPLB R3, [R0], #1
BPL bcopy
MOV PC, LR

2/8/18 EE382N-4 Class Notes 73


Block copy example

// arguments: R0: to, R1: from, R2: n


// n is a multiple of 16;
bcopy: SUBS R2, R2, #16
LDRPL R3, [R1], #4
STRPL R3, [R0], #4
LDRPL R3, [R1], #4
STRPL R3, [R0], #4
LDRPL R3, [R1], #4
STRPL R3, [R0], #4
LDRPL R3, [R1], #4
STRPL R3, [R0], #4
BPL bcopy
MOV PC, LR

2/8/18 EE382N-4 Class Notes 74


Block copy example

// arguments: R0: to, R1: from, R2: n


// n is a multiple of 16;
bcopy: SUBS R2, R2, #16
LDMPL R1!, {R3-R6}
STMPL R0!, {R3-R6}
BPL bcopy
MOV PC, LR

// can be extended to copy 40 bytes at a time


// if not multiple of 40, add a copy_rest
loop

2/8/18 EE382N-4 Class Notes 75


Search example
int main(void)
{
int a[10]={7,6,4,5,5,1,3,2,9,8};
int i;
int s=4;

for (i=0; i<10; i++)


if (s==a[i]) break;
if (i>=10) return -1;
else return i;
}

2/8/18 EE382N-4 Class Notes 76


Search
.section .rodata
.LC0:
.word 7
.word 6
.word 4
.word 5
.word 5
.word 1
.word 3
.word 2
.word 9
.word 8

2/8/18 EE382N-4 Class Notes 77


Search
.text low
.global main
.type main, %function
main: sub sp, sp, #48
s
adr r4, L9 // =.LC0
i
add r5, sp, #8
a[0]
ldmia r4!, {r0, r1, r2, r3}
stmia r5!, {r0, r1, r2, r3}
:
ldmia r4!, {r0, r1, r2, r3}
stmia r5!, {r0, r1, r2, r3}
a[9]
ldmia r4!, {r0, r1}
stmia r5!, {r0, r1}

high
stack
2/8/18 EE382N-4 Class Notes 78
Search
mov r3, #4
low
str r3, [sp, #0] // s=4
mov r3, #0
str r3, [sp, #4] // i=0
s
loop: ldr r0, [sp, #4] // r0=i i
cmp r0, #10 // i<10? a[0]
bge end
ldr r1, [sp, #0] // r1=s :
mov r2, #4
mul r3, r0, r2
add r3, r3, #8 a[9]
ldr r4, [sp, r3] // r4=a[i]

high
stack
2/8/18 EE382N-4 Class Notes 79
Search
teq r1, r4 // test if s==a[i] low
beq end

add r0, r0, #1 // i++


s
str r0, [sp, #4] // update i
i
b loop
a[0]
end: str r0, [sp, #4]
:
cmp r0, #10
movge r0, #-1
a[9]
add sp, sp, #48
mov pc, lr

high
stack
2/8/18 EE382N-4 Class Notes 80
Optimization
§ Remove unnecessary load/store
§ Remove loop invariant
§ Use addressing mode
§ Use conditional execution

2/8/18 EE382N-4 Class Notes 81


Search (remove load/store)
mov r1,
r3, #4
str r3, [sp, #0] // s=4 low
mov r3,
r0, #0
str r3, [sp, #4] // i=0
s
loop: ldr r0, [sp, #4] // r0=i i
cmp r0, #10 // i<10? a[0]
bge end
ldr r1, [sp, #0] // r1=s :
mov r2, #4
mul r3, r0, r2
add r3, r3, #8 a[9]
ldr r4, [sp, r3] // r4=a[i]

high
stack
2/8/18 EE382N-4 Class Notes 82
Search (remove load/store)
teq r1, r4 // test if s==a[i]
beq end low

add r0, r0, #1 // i++


s
str r0, [sp, #4] // update i
i
b loop
a[0]
end: str r0, [sp, #4]
:
cmp r0, #10
movge r0, #-1
a[9]
add sp, sp, #48
mov pc, lr

high
stack
2/8/18 EE382N-4 Class Notes 83
Search (loop invariant/addressing mode)
mov r1,
r3, #4
str r3, [sp, #0] // s=4 low
mov r0,
r3, #0
str r3, [sp, #4] // i=0
s
loop: ldr r0, [sp, #4] // r0=i
mov r2, sp, #8 i
cmp r0, #10 // i<10? a[0]
bge end
ldr r1, [sp, #0] // r1=s :
mov r2, #4
mul r3, r0, r2
add r3, r3, #8
a[9]
ldr r4, [sp, r3] // r4=a[i]
ldr r4, [r2, r0, LSL #2]

high
stack
2/8/18 EE382N-4 Class Notes 84
Search (conditional execution)
teq r1, r4 // test if s==a[i]
beq end low

add
addeq r0, r0, #1 // i++
s
str r0, [sp, #4] // update i
i
bbeq loop
a[0]

:
end: str r0, [sp, #4]
cmp r0, #10 a[9]
movge r0, #-1
add sp, sp, #48
mov pc, lr
high
stack
2/8/18 EE382N-4 Class Notes 85
Optimization Summary
§ Remove unnecessary load/store
§ Remove loop invariant
§ Use addressing mode
§ Use conditional execution

§ From 22 words to 13 words and execution time is greatly reduced.

2/8/18 EE382N-4 Class Notes 86


Backup

2/8/18 EE382N-4 Class Notes 87


Assembler: Pseudo-ops

AREA -> chunks of data ($data) or code ($code)

ADR -> load address into a register


ADR R0, BUFFER

ALIGN -> adjust location counter to word boundary usually after a


storage directive

END -> no more to assemble

2/8/18 EE382N-4 Class Notes 88


Assembler: Pseudo-ops

DCD -> defined word value storage area


BOW DCD 1024, 2055, 9051

DCB -> defined byte value storage area


BOB DCB 10, 12, 15

% -> zeroed out byte storage area


BLBYTE % 30

2/8/18 EE382N-4 Class Notes 89


Assembler: Pseudo-ops

IMPORT -> name of routine to import for use in this routine


IMPORT _printf ; C print routine

EXPORT -> name of routine to export for use in other routines


EXPORT add2 ; add2 routine

EQU -> symbol replacement


loopcnt EQU 5

2/8/18 EE382N-4 Class Notes 90


Assembly Line Format

label <whitespace> instruction <whitespace> ; comment

label: created by programmer, alphanumeric

whitespace: space(s) or tab character(s)

instruction: op-code mnemonic or pseudo-op with required fields

comment: preceded by ; ignored by assembler but useful


to the programmer for documentation

NOTE: All fields are optional.

2/8/18 EE382N-4 Class Notes 91

You might also like