Computer Architecture

Interlude: Example ISA ARM

Nov 14, 2013 Josef Weidendorfer (replacement today for Prof. Gerndt)

Chair for Computer Architecture LRR-TUM (“Informatik 10”)

Technische Universität München

ISAs (still) Used Today

CISC
•  x86 = IA-32 (Intel)
•  x86_64 = Intel64 = amd64 (AMD/Intel)
•  s390 (IBM Mainframes)

RISC
•  MIPS, SPARC, PowerPC, ARM

VLIW
•  Itanium (EPIC, Intel)

Old ISAs
•  680x0 (old CISC, Motorola)
•  PA-RISC, Alpha (old RISC)

WS12/13

ARM: The Company

•  „ARM limited“, UK; was „Advanced RISC Machines“, was „Acorn RISC Computer“
•  Designs RISC processors for embedded systems
•  two licence models
   –  cores ready for production with a given manufacturing process (e.g. 28nm TSMC)
   –  cores as VHDL codes
   –  fee per core + fee per built device
•  for easy integration with other parts into custom chips for embedded market: „SoC“
   –  includes GPU, network, controllers, …

Market for ARM Processors

•  > 1 billion cores in Q4 2009
•  > 60% in mobile sets
   –  only 7% are “application processors” (running Linux/Android/iOS/WP8, responsible for GUI)
   –  other cores part of controllers for GSM, WLAN, Bluetooth, GPS, network, …

Processors with ARM ISA

•  Adhere to an ISA version: ARMv6, ARMv7, …
•  ARM ISA is extensible: opcode space reserved for “co-processors”
   –  customers can add their own co-processors
   –  ARM provides optional co-processors in its own implementations
   –  examples
      •  floating point co-processor VFP-3
      •  vector co-processor NEON
•  Different encoding schemes
   –  ARM 32bit
   –  Thumb 1/2: 16bit / + mixed 32bit
   –  Jazelle (native execution for Java Bytecode)

ISA Versions, Implementations & Features

SoCs using ARM Implementations

•  Samsung S5L8900
   –  used in iPod Touch 1G, original iPhone
   –  has an ARM 1176JZ-S
      •  family ARM11, adhering to ARMv6 ISA
•  Apple A5
   –  used in iPad2, iPhone 4S
   –  manufactured by Samsung
   –  Cortex A9, ARMv7 ISA
•  Others
   –  Texas Instruments OMAP3/4, NVIDIA Tegra2/3

Example: OMAP3 / “BeagleBoard”

ARMv7 ISA

•  RISC (Reduced Instruction Set Computing)
   –  few, simple instructions (~32)
   –  fixed format (4 bytes for 32bit encoding…)
•  load/store architecture
   –  explicit instructions for memory access
   –  simple + indexed addressing modes
   –  multiple load/store with modifying index
•  arithmetic/logic operations combined with barrel shifter
•  conditional execution

ARM Register File

•  16 general registers (r0 .. r15)
   –  some duplicated, e.g. for interrupt mode
•  Special
   –  r13: stack pointer
   –  r14: link register (return address for function calls)
   –  r15: program counter
   –  CPSR: status register (+ SPSR „saved“)
      •  including N/Z/C/V
•  NEON
   –  separate 32 64bit / 16 128bit vector registers

Data Processing Instructions

•  Consist of:
   –  Arithmetic:    ADD  ADC  SUB  SBC  RSB  RSC
   –  Logical:       AND  ORR  EOR  BIC
   –  Comparisons:   CMP  CMN  TST  TEQ
   –  Data movement: MOV  MVN
•  Work on registers, NOT memory.
•  Syntax: <Operation>{<cond>}{S} Rd, Rn, Operand2
•  Comparisons set flags only - they do not specify Rd
•  Data movement does not specify Rn
•  Second operand is sent to the ALU via barrel shifter.
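The role of the barrel shifter for Operand2 can be illustrated with a small C model. This is only a sketch under simplifying assumptions: the names `shift_kind`, `barrel_shift` and `add_shifted` are made up for illustration, only immediate shift amounts 0..31 are modelled, and the special encodings of the real ISA (such as a shift amount field of 0 meaning 32 for LSR/ASR, or RRX) are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names: the four basic shift kinds applicable to Operand2. */
typedef enum { SHIFT_LSL, SHIFT_LSR, SHIFT_ASR, SHIFT_ROR } shift_kind;

/* Shift a 32-bit register value by an immediate amount.
 * Simplification: only amounts 0..31 are accepted. */
uint32_t barrel_shift(uint32_t value, shift_kind kind, unsigned amount)
{
    assert(amount < 32u);
    if (amount == 0u)
        return value;               /* no shift requested */

    switch (kind) {
    case SHIFT_LSL:
        return value << amount;     /* logical shift left */
    case SHIFT_LSR:
        return value >> amount;     /* logical shift right, zero fill */
    case SHIFT_ASR: {
        /* arithmetic shift right: fill with copies of the sign bit */
        uint32_t shifted = value >> amount;
        if (value & 0x80000000u)
            shifted |= ~(0xFFFFFFFFu >> amount);
        return shifted;
    }
    case SHIFT_ROR:
        /* rotate right: bits leaving at the bottom re-enter at the top */
        return (value >> amount) | (value << (32u - amount));
    }
    return value;
}

/* Model of "ADD Rd, Rn, Rm, <shift> #amount": the second operand passes
 * through the barrel shifter before it reaches the ALU. */
uint32_t add_shifted(uint32_t rn, uint32_t rm, shift_kind kind, unsigned amount)
{
    return rn + barrel_shift(rm, kind, amount);
}
```

For example, `add_shifted(base, index, SHIFT_LSL, 2)` corresponds to `ADD Rd, Rn, Rm, LSL #2`, which computes base + 4 * index in a single data processing instruction.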

Conditional execution examples

C source code

    if (r0 == 0)
    {
        r1 = r1 + 1;
    }
    else
    {
        r2 = r2 + 1;
    }

ARM instructions, unconditional

        CMP r0, #0
        BNE else
        ADD r1, r1, #1
        B end
    else
        ADD r2, r2, #1
    end
        ...

§  5 instructions
§  5 words
§  5 or 6 cycles

ARM instructions, conditional

        CMP r0, #0
        ADDEQ r1, r1, #1
        ADDNE r2, r2, #1
        ...

§  3 instructions
§  3 words
§  3 cycles
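To make the mechanism behind the conditional variant explicit, the following C sketch models CMP setting the N/Z/C/V flags and ADDEQ/ADDNE testing them. All names (`flags_t`, `cmp_flags`, `cond_passes`, `add_cond`, `if_else_conditional`) are hypothetical, and only the conditions EQ, NE and AL are modelled; it is an illustration of the idea, not a complete model of the ISA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the four condition flags held in CPSR. */
typedef struct { bool n, z, c, v; } flags_t;

/* CMP a, b: compute a - b, discard the result, keep only the flags. */
flags_t cmp_flags(uint32_t a, uint32_t b)
{
    uint32_t diff = a - b;
    flags_t f;
    f.n = (diff >> 31) != 0u;                       /* result negative      */
    f.z = (diff == 0u);                             /* result zero          */
    f.c = (a >= b);                                 /* no borrow occurred   */
    f.v = ((((a ^ b) & (a ^ diff)) >> 31) != 0u);   /* signed overflow      */
    return f;
}

/* Only three of the condition codes are modelled here. */
typedef enum { COND_EQ, COND_NE, COND_AL } cond_t;

bool cond_passes(cond_t cond, flags_t f)
{
    switch (cond) {
    case COND_EQ: return f.z;      /* equal: Z set       */
    case COND_NE: return !f.z;     /* not equal: Z clear */
    case COND_AL: return true;     /* always             */
    }
    return false;
}

/* ADD<cond> Rd, Rn, #imm: Rd is written only if the condition passes;
 * otherwise the instruction behaves like a no-op. */
void add_cond(cond_t cond, flags_t f, uint32_t *rd, uint32_t rn, uint32_t imm)
{
    assert(rd != 0);
    if (cond_passes(cond, f))
        *rd = rn + imm;
}

/* The three-instruction conditional sequence from the slide. */
void if_else_conditional(uint32_t r0, uint32_t *r1, uint32_t *r2)
{
    flags_t f = cmp_flags(r0, 0u);          /* CMP   r0, #0     */
    add_cond(COND_EQ, f, r1, *r1, 1u);      /* ADDEQ r1, r1, #1 */
    add_cond(COND_NE, f, r2, *r2, 1u);      /* ADDNE r2, r2, #1 */
}
```

Both ADD instructions are always fetched and issued, but exactly one of them takes effect, which is why the sequence needs no branch.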

ARM Branches and Subroutines

•  B <label>
   –  PC relative, ±32 Mbyte range.
•  BL <subroutine>
   –  Stores return address in LR
   –  Returning implemented by restoring the PC from LR
   –  For non-leaf functions, LR will have to be stacked

            :
            BL func1
            :

    func1
            STMFD sp!, {regs, lr}
            :
            BL func2
            :
            LDMFD sp!, {regs, pc}

    func2
            :
            :
            MOV pc, lr
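The ±32 Mbyte range follows from the 32bit ARM encoding of B/BL: a signed 24bit word offset, scaled by 4 and applied relative to the address of the branch plus 8. A minimal C sketch of that computation, with hypothetical function names `branch_offset_bytes` and `branch_target`:

```c
#include <assert.h>
#include <stdint.h>

/* Extract the signed 24-bit word offset from a 32-bit ARM B/BL encoding
 * and scale it to bytes. */
int32_t branch_offset_bytes(uint32_t instruction)
{
    uint32_t imm24 = instruction & 0x00FFFFFFu;
    /* sign-extend from 24 to 32 bits */
    int32_t words = (imm24 & 0x00800000u)
                        ? (int32_t)imm24 - 0x01000000
                        : (int32_t)imm24;
    return words * 4;   /* instructions are 4 bytes wide */
}

/* In ARM state the PC reads as the address of the current instruction
 * plus 8, so the target is address + 8 + offset. */
uint32_t branch_target(uint32_t instruction_address, uint32_t instruction)
{
    assert((instruction_address & 3u) == 0u);   /* ARM code is word aligned */
    return instruction_address + 8u + (uint32_t)branch_offset_bytes(instruction);
}
```

With 24 signed bits scaled by 4, the offset spans -33554432 to +33554428 bytes. The encoding 0xEAFFFFFE (offset of -2 words) is a branch to itself, since the -8 bytes cancel the +8 of the PC read.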

Thumb 1/2: not really RISC any longer

•  Code size is important for power consumption → Thumb encoding uses 16bit instructions
•  Possible by reducing flexibility, fewer bits for immediates, no conditional execution
•  Fast switch between encoding modes
   –  Bit 0 of the jump target address (written to PC) specifies mode
   –  Switch modes via jumps
•  Thumb-2: add some 32bit instructions
   –  flexible, without need to switch modes
   –  adds bit field manipulation, conditional execution for a set of following instructions
•  default for GCC with ARMv7
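How bit 0 selects the encoding on a mode-switching jump (such as BX) can be sketched in C as follows. The names `instr_state`, `branch_result` and `interworking_branch` are hypothetical; the sketch assumes that a target with bit 0 clear must be word aligned, which is what the ARM encoding requires.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the two encoding modes. */
typedef enum { STATE_ARM, STATE_THUMB } instr_state;

typedef struct {
    instr_state state;   /* encoding used at the target        */
    uint32_t    pc;      /* address where fetching continues   */
} branch_result;

/* Model of a mode-switching jump: bit 0 of the target selects the
 * encoding and is not part of the actual fetch address. */
branch_result interworking_branch(uint32_t target)
{
    branch_result r;
    if (target & 1u) {
        r.state = STATE_THUMB;
        r.pc = target & ~1u;             /* Thumb code is halfword aligned */
    } else {
        assert((target & 2u) == 0u);     /* ARM code must be word aligned  */
        r.state = STATE_ARM;
        r.pc = target;
    }
    return r;
}
```

Because instructions are at least halfword aligned, bit 0 of a code address is otherwise unused, so it can carry the mode at no cost.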

Current ARM implementations

•  Cortex A8 (ARMv7)
   –  Single core, up to 1GHz
   –  13 stage superscalar (dual) pipeline, in-order, branch prediction
   –  16-32 kB L1I/D, 0-1 MB L2 (only L2 cache controller part of core), VFP-3/NEON optional
   –  Apple A4, OMAP3 (Beagleboard)
•  Cortex A9
   –  1-4 cores, 16-64 kB L1I/D, 0-8 MB L2
   –  speculative out-of-order
   –  Apple A5, OMAP4 (Pandaboard), Tegra2, Tegra3 (Nexus 7)
•  Cortex A15
   –  more aggressive OoO, more stages, VFP-4, …
   –  OMAP5, Samsung Exynos 5x (Nexus 10)
   –  can compete with current x86 Atom processors

ARMv8: 64bit for Servers

•  new 64bit mode: AArch64; 32bit still supported as AArch32
•  completely new instruction set
   –  instructions still 32bit
   –  operands and addresses 64bit
   –  31 general registers with 64bit
   –  conditional execution removed (good branch prediction gives better performance ?!)
   –  mandatory extended NEON: 32 regs (128bit)
•  for massively parallel server loads (simple, but lots of cores?)
•  implementations expected for 2014 by AMD, Samsung, Broadcom, …

More information…

•  ARM Architecture Reference Manual („ARM ARM“)
•  CRE podcast about ARM, Wikipedia
•  @TUM: new introductory lab course with recently sponsored BeagleBoards (~40)
•  inauguration event on Nov 23, 2013 with representatives from ARM, TI, STMicro

TOP 500: November 2012

[Table: TOP 500 ranking of November 2012 with each system's rank in June 2012; new entries marked NEW. Table contents not recoverable.]

TOP 500: November 2012
