
Ho Chi Minh City University of Technology

Department of Electrical and Electronics

1. History of CPUs
2. Intel x86 Processors
3. ARM processors
4. Memory
5. Computer Software

Chapter 5 1
1. History of CPUs
• 1950s:
  – Ferranti Mark 1 (1951), from the University of Manchester: a single 80-bit accumulator and a 40-bit "multiplicand/quotient register"
  – UNIVAC I (UNIVersal Automatic Computer I), designed principally by J. Presper Eckert and John Mauchly, the inventors of the ENIAC: 1,905 operations per second running on a 2.25 MHz clock
  – IBM 704 (introduced in 1954)

[Figures: Ferranti Mark 1, c. 1951; an IBM 704 computer at NACA in 1957]
1. History of CPUs
• 1960s:
  – IBM System/360 (S/360): 34,500 instructions per second, with memory from 8 to 64 KB
  – PDP-11: developed by Digital Equipment Corporation; a 16-bit processor, with later models allowing up to 4 MB of physical memory
  – Motorola 68000: initial speed grades were 4, 6, and 8 MHz; 68k instruction set

[Figures: IBM System/360; PDP-11/40; Motorola MC68000]
1. History of CPUs
• 1970s:
  – Intel 4004 (1971): a single instruction cycle took 10.8 microseconds; the maximum clock rate is 740 kHz
  – Intel 8008 (1972), 8080 (1974), and 8086 (1978):
    – 8008: 8-bit CPU with an external 14-bit address bus; clock frequency 0.2-0.8 MHz
    – 8080: 8-bit CPU; clock frequency 2 MHz
    – 8086: 16-bit CPU; clock frequency 5-10 MHz
  – 32-bit VAX (1977): based on DEC's earlier PDP-11; supports virtual memory

[Figures: Intel 4004; Intel 8088; Intel 8086]
A Brief History of Computer
Link YouTube: https://www.youtube.com/watch?v=iK0PT5q7GlE

2. Intel x86 Processors
• Dominate the laptop/desktop/server market
• Evolutionary design
  – Backwards compatible all the way back to the 8086, introduced in 1978
  – Added more features as time goes on
• Complex instruction set computer (CISC)
  – Many different instructions with many different formats
  – But only a small subset is encountered in Linux programs
• Hard to match the performance of Reduced Instruction Set Computers (RISC)
  – But Intel has done just that!
  – In terms of speed; less so for low power
Intel x86 Evolution: Milestones
Name               Date   Transistors   MHz
8086               1978   29K           5-10
  – First 16-bit Intel processor. Basis for IBM PC & DOS
  – 1 MB address space
386                1985   275K          16-33
  – First 32-bit Intel processor, referred to as IA32
  – 32-bit data path (16-bit on the 386SX variant)
  – Added "flat addressing", capable of running Unix
486
  – 32-bit registers, 32-bit data path
  – 486DX includes an FPU (Floating Point Unit)
Pentium 4E         2004   125M          2800-3800
  – First 64-bit Intel x86 processor, referred to as x86-64
Core 2             2006   291M          1060-3500
  – First multi-core Intel processor
Core i3, i5, i7    2008   731M          1700-3900
  – Two cores / four cores
Intel x86 Processors, cont.
 Machine Evolution
 386 1985 0.3M
 Pentium 1993 3.1M
 Pentium/MMX 1997 4.5M
 Pentium Pro 1995 6.5M
 Pentium III 1999 8.2M
 Pentium 4 2001 42M
 Core 2 Duo 2006 291M
 Core i7 2008 731M
 Added Features
 Instructions to support multimedia operations
 Instructions to enable more efficient conditional operations
 Transition from 32 bits to 64 bits
 More cores

2015 State of the Art
 Core i7 Broadwell 2015

 Desktop Model
 4 cores
 Integrated graphics
 3.3-3.8 GHz
 65W

 Server Model
 8 cores
 Integrated I/O
 2-2.6 GHz
 45W

2. Intel x86 Processors

• 8086 processor
  – 40-pin dual in-line package
  – 16-bit wide data bus
  – 16-bit registers
  – 20-bit external address bus provides a 1 MB physical address space
  – The maximum linear address space is limited to 64 KB
  – Max CPU clock: 5-10 MHz
1. CPU - x86 Processor
 CPU, memory, input/output devices
 Instruction set, interfacing C to assembly, macros, stack
frame and calling convention
 Interrupt, exception

The architecture of 8086 microprocessor
 2 major units:
 BIU - Bus Interface Unit: bus interface, segment registers, fetch
queue
 EU - Execution Unit: control unit, ALU, registers

1. CPU
1.3. x86 Processors - 8086
• Instructions:
  – One-address or two-address operations
  – Support for assembly and high-level programming languages (C, Pascal)
• Main registers, also called data registers or general registers
  – 16 bits of data
  – Each can also be accessed as two 8-bit registers

High   Low   16-bit   Role
AH     AL    AX       primary accumulator
BH     BL    BX       base, accumulator
CH     CL    CX       counter, accumulator
DH     DL    DX       accumulator, other functions
1. CPU
1.3. x86 Processors - 8086
• Index registers: for addressing
  SI   Source Index
  DI   Destination Index
  BP   Base Pointer
  SP   Stack Pointer
• Program counter:
  IP   Instruction Pointer
• Segment registers:
  CS   Code Segment
  DS   Data Segment
  ES   Extra Segment
  SS   Stack Segment
1. CPU
1.3. x86 Processors - 8086
• Segment registers:
  – a way to allow programs to address more than 64 KB
  – the registers CS, DS, SS, and ES point to the currently used program code segment (CS), the current data segment (DS), the current stack segment (SS), and one extra segment determined by the programmer (ES)
  – the 20-bit physical address is the 16-bit segment shifted 4 bits left, plus the 16-bit offset:

    0110 1000 1000 0111 0000   Segment, 16 bits, shifted 4 bits left
  +      0011 0100 1010 1001   Offset, 16 bits
  = 0110 1011 1101 0001 1001   Address, 20 bits
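The addition above can be checked with a few lines of Python. This is only a sketch of the arithmetic, not 8086 code; the function name `physical_address` is made up for illustration, and the final mask models the 20-bit address bus.

```python
def physical_address(segment: int, offset: int) -> int:
    """Return the 20-bit 8086 physical address for a segment:offset pair."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    # Shift the segment left by 4 bits (multiply by 16), add the offset,
    # and keep only 20 bits because the address bus is 20 bits wide.
    return ((segment << 4) + offset) & 0xFFFFF


# Same values as the binary example: segment 0x6887, offset 0x34A9.
addr = physical_address(0x6887, 0x34A9)
print(f"{addr:05X}")  # prints 6BD19
```

In hexadecimal the shift is simply "append one zero digit to the segment", which is why the result always fits in five hex digits.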

1. CPU
1.3. x86 Processors - 8086
• Examples of x86 memory segmentation
1. CPU
1.3. x86 Processors
• x86-32: 80386, 80486
  – Registers extended to 32 bits
    – EAX, EBX, ECX, EDX
    – ESI, EDI, EBP, ESP, EIP, EFLAGS
  – Two new segment registers (FS and GS) were added
    – FS and GS are additional data segment registers
• x86-64: AMD64, Core i5, Core i7
  – An R prefix identifies the 64-bit registers (RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP, RFLAGS, RIP)
  – Eight additional 64-bit general registers were added (R8-R15)
Some History: IA32 Registers
General-purpose registers and their sub-registers:

32-bit   16-bit   8-bit high   8-bit low   Origin (mostly obsolete)
%eax     %ax      %ah          %al         accumulate
%ecx     %cx      %ch          %cl         counter
%edx     %dx      %dh          %dl         data
%ebx     %bx      %bh          %bl         base
%esi     %si                               source index
%edi     %di                               destination index
%esp     %sp                               stack pointer
%ebp     %bp                               base pointer

The 16-bit names are virtual registers (backwards compatibility).
x86-64 Integer Registers
%rax   %eax      %r8    %r8d
%rbx   %ebx      %r9    %r9d
%rcx   %ecx      %r10   %r10d
%rdx   %edx      %r11   %r11d
%rsi   %esi      %r12   %r12d
%rdi   %edi      %r13   %r13d
%rsp   %esp      %r14   %r14d
%rbp   %ebp      %r15   %r15d

• Can reference low-order 4 bytes (also low-order 1 & 2 bytes)
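As a rough illustration of what "low-order 4, 2, and 1 bytes" means, the sketch below masks a 64-bit value the way the %eax, %ax, and %al names select parts of %rax. The helper name `low_parts` and the sample value are made up for this example; it models only reads, not the write behaviour of the real registers.

```python
def low_parts(rax: int) -> dict:
    """Return the low-order 4, 2, and 1 bytes of a 64-bit register value."""
    rax &= 0xFFFFFFFFFFFFFFFF      # keep the value within 64 bits
    return {
        "eax": rax & 0xFFFFFFFF,   # low-order 4 bytes
        "ax": rax & 0xFFFF,        # low-order 2 bytes
        "al": rax & 0xFF,          # low-order 1 byte
    }


parts = low_parts(0x1122334455667788)
for name, value in parts.items():
    print(name, hex(value))
```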
3. ARM Processors
• ARM (Acorn RISC Machine) started as a new, powerful CPU design to replace the 8-bit 6502 in Acorn Computers (Cambridge, UK, 1985)
• The first models had only a 26-bit program counter, limiting the memory space to 64 MB (not much by today's standards, but a lot at that time)
• 1990 spin-off: ARM renamed Advanced RISC Machines
3. ARM Processors
• ARM now focuses on Embedded CPU cores
• IP licensing: Almost every silicon manufacturer sells
some microcontroller with an ARM core. Some even
compete with their own designs.
• Processing power with low current consumption
• Good MIPS/Watt figure

• Ideal for portable devices

• Compact memories: 16-bit opcodes (Thumb)


• New cores with added features
• Harvard architecture (ARM9, ARM11, Cortex)
• Floating point arithmetic
• Vector computing
• Java language
3. ARM Processors
• 32-bit CPU, Harvard architecture
• 3-operand instructions (typical): ADD Rd,Rn,Operand2
• RISC design:
• Few, simple, instructions
• Load/store architecture (instructions operate on registers, not
memory)
• Large register set
• Pipelined execution

Topologies
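• Von Neumann (ARM7s and older): a single AHB bus connects the core to MEMORY & I/O, shared by instructions and data
• Harvard (ARM9s and newer): separate instruction (Inst.) and data (Data) paths, each with its own cache (I Cache, D Cache), joined by a bus interface to the AHB bus and MEMORY & I/O

Memory-mapped I/O:
• No specific instructions for I/O (use Load/Store instructions instead)
• Peripherals' registers at some memory addresses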
ARM Pipelining examples
ARM7TDMI pipeline (each stage takes 1 clock cycle):
  FETCH | DECODE | EXECUTE (Reg. Read, Shift, ALU, Reg. Write)

ARM9TDMI pipeline (each stage takes 1 clock cycle):
  FETCH | DECODE (Reg. Read) | EXECUTE (Shift, ALU) | MEMORY (access) | WRITE (Reg. Write)

• Fetch: read the opcode from memory into the internal Instruction Register
• Decode: activate the appropriate control lines depending on the opcode
• Execute: do the actual processing
ARM7TDMI Pipelining (I)

Instruction   Cycle 1   Cycle 2   Cycle 3   Cycle 4   Cycle 5
1             FETCH     DECODE    EXECUTE
2                       FETCH     DECODE    EXECUTE
3                                 FETCH     DECODE    EXECUTE

• Simple instructions (like ADD) complete at a rate of one per cycle
ARM7TDMI Pipelining (II)

• More complex instructions (rows are instructions, time runs left to right):

1 ADD   FETCH   DECODE   EXECUTE
2 STR           FETCH    DECODE    Cal. ADDR   Data Xfer.
3 ADD                    FETCH     stall       DECODE       EXECUTE
4 ADD                              FETCH       stall        DECODE    EXECUTE
5 ADD                                                       FETCH     DECODE    EXECUTE

STR: 2 effective clock cycles (+1 cycle)
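The two pipelining diagrams can be summarised by a simple cycle count. The sketch below assumes an idealised in-order pipeline in which the execute stage is the only bottleneck: the first instruction needs (stages - 1) cycles to reach execute, and after that each instruction occupies execute for its own number of cycles. The function name `total_cycles` is hypothetical, and real cores have further stall sources that this model ignores.

```python
def total_cycles(execute_cycles, stages=3):
    """Estimate total clock cycles for an in-order pipeline.

    execute_cycles: list with the execute-stage cycles of each instruction
                    (1 for a simple ADD, 2 for the STR in the diagram).
    stages:         pipeline depth (3 for the ARM7TDMI).
    """
    if not execute_cycles:
        return 0
    # Pipeline fill time plus the time each instruction spends in execute.
    return (stages - 1) + sum(execute_cycles)


print(total_cycles([1, 1, 1]))        # three ADDs: prints 5
print(total_cycles([1, 2, 1, 1, 1]))  # ADD, STR, ADD, ADD, ADD: prints 8
```

Both results match the diagrams: three simple instructions finish in 5 cycles, and the sequence containing one STR finishes in 8 cycles, one more than five simple instructions would need.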

Data Sizes and Instruction Sets
◼ The ARM is a 32-bit architecture.

◼ When used in relation to the ARM:


◼ Byte means 8 bits
◼ Halfword means 16 bits (two bytes)
◼ Word means 32 bits (four bytes)

◼ Most ARMs implement two instruction sets
  ◼ 32-bit ARM Instruction Set
  ◼ 16-bit Thumb Instruction Set
Processor Modes
◼ The ARM has seven operating modes:

◼ User : unprivileged mode under which most tasks run

◼ FIQ : entered when a high priority (fast) interrupt is raised

◼ IRQ : entered when a low priority (normal) interrupt is raised

◼ SVC : (Supervisor) entered on reset and when a Software Interrupt instruction is executed

◼ Abort : used to handle memory access violations

◼ Undef : used to handle undefined instructions

◼ System : privileged mode using the same registers as user mode

The Registers
◼ ARM has 37 registers, all of which are 32 bits long.
  ◼ 1 dedicated program counter
  ◼ 1 dedicated current program status register
  ◼ 5 dedicated saved program status registers
  ◼ 30 general purpose registers

◼ The current processor mode governs which of several banks is accessible. Each mode can access
  ◼ a particular set of r0-r12 registers
  ◼ a particular r13 (the stack pointer, sp) and r14 (the link register, lr)
  ◼ the program counter, r15 (pc)
  ◼ the current program status register, cpsr

◼ Privileged modes (except System) can also access
  ◼ a particular spsr (saved program status register)
The ARM Register Set
Registers visible in each mode ("banked" means the mode has its own private copy):

Register   User/System   FIQ      IRQ      SVC      Undef    Abort
r0-r7      shared by all modes
r8-r12     shared        banked   shared   shared   shared   shared
r13 (sp)   own           banked   banked   banked   banked   banked
r14 (lr)   own           banked   banked   banked   banked   banked
r15 (pc)   shared by all modes
cpsr       shared by all modes
spsr       none          banked   banked   banked   banked   banked
Special Registers
◼ Special function registers:
  ◼ PC (R15): Program Counter. Any instruction with PC as its destination register is a program branch
  ◼ LR (R14): Link Register. Saves a copy of PC when executing the BL instruction (subroutine call) or when jumping to an exception or interrupt routine
    - It is copied back to PC on the return from those routines
  ◼ SP (R13): Stack Pointer. There is no stack in the ARM architecture. Even so, R13 is usually reserved as a pointer for the program-managed stack
  ◼ CPSR : Current Program Status Register. Holds the visible status register
  ◼ SPSR : Saved Program Status Register. Holds a copy of the previous status register while executing exception or interrupt routines
    - It is copied back to CPSR on the return from the exception or interrupt
    - No SPSR available in User or System modes
4. Memory
• Memory: the purpose of memory is data storage. There are two major types of memory:
  – Primary memory: holds data and instructions during processing
    – e.g. RAM. Relatively limited capacity and volatile
  – Secondary memory: provides permanent long-term storage
    – e.g. hard disk. High capacity and non-volatile

[Figures: RAM banks; hard disk; NAND flash chip]
4. Memory
• Primary memory consists of a set of locations defined by sequentially numbered addresses. Each location contains a binary number that can be interpreted as data or an instruction.
• 8086 uses a 20-bit physical address
  – Manages 1 MB of memory
• 80386 uses a 32-bit physical address
  – Manages 4 GB of memory
• x86-64 uses 64-bit addresses
  – Manages ??? of memory
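The relation between address width and memory size is just a power of two: n address bits select 2^n byte locations. The sketch below reproduces the two figures already given for the 8086 and the 80386; the helper names `address_space_bytes` and `human_readable` are made up for illustration, and byte-addressable memory is assumed.

```python
def address_space_bytes(address_bits: int) -> int:
    """Number of byte locations reachable with the given address width."""
    return 1 << address_bits


def human_readable(num_bytes: int) -> str:
    """Format a byte count using binary units (1 KB = 1024 bytes)."""
    units = ["bytes", "KB", "MB", "GB", "TB"]
    index = 0
    while num_bytes >= 1024 and index < len(units) - 1:
        num_bytes //= 1024
        index += 1
    return f"{num_bytes} {units[index]}"


print(human_readable(address_space_bytes(20)))  # 8086: prints 1 MB
print(human_readable(address_space_bytes(32)))  # 80386: prints 4 GB
```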

◆ Memory locations are called words. Words are 8 bits (one byte) in size, or a multiple of 8. Common word sizes are 16, 32 and 64 bits.

Address   Contents
0         1 0 0 1 0 0 0 1
1         1 1 0 1 0 0 1 1
2         0 1 0 0 0 0 0 0
3         1 0 1 0 0 1 1 1
4         1 1 1 0 1 0 1 0
5         1 1 0 0 1 0 1 0

Memory locations, using an 8-bit word
4. Memory
• Memory is commonly measured in multiples of bits and bytes. 1 bit = 1 binary digit (0 or 1).
  1. 1 byte = 8 bits
  2. 1 KB = 1024 bytes = 2^10 bytes
  3. 1 MB = 1024 KB = 2^20 bytes
  4. 1 GB = 1024 MB = 2^30 bytes
  5. 1 TB = 1024 GB = 2^40 bytes
Big Endian vs. Little Endian
• x86 processors are little-endian
• IBM z/Architecture mainframes are big-endian processors

Register value: FE ED FA CE

Address   Big Endian (Others)   Little Endian (Intel)
0x5       00                    00      (high memory addresses)
0x4       00                    00
0x3       CE                    FE
0x2       FA                    ED
0x1       ED                    FA
0x0       FE                    CE      (low memory addresses)
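The two byte orders in the figure can be reproduced with Python's built-in integer conversion. This is a small sketch using the same register value, 0xFEEDFACE; index 0 of each byte string corresponds to the lowest memory address (0x0).

```python
value = 0xFEEDFACE

# Byte at index 0 is the one stored at the lowest address.
big = value.to_bytes(4, byteorder="big")
little = value.to_bytes(4, byteorder="little")

print(big.hex(" "))     # prints fe ed fa ce
print(little.hex(" "))  # prints ce fa ed fe

# Reading the bytes back with the matching byte order recovers the value.
assert int.from_bytes(big, byteorder="big") == value
assert int.from_bytes(little, byteorder="little") == value
```

Reading little-endian bytes as if they were big-endian (or the reverse) yields a different number, which is why the byte order must be agreed on whenever data moves between machines.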
Quiz
1) Pick the correct choice for the 8086 CPU.
A 16 bit word size, 8 bit data path
B 8 bit word size, 8 bit data path
C 16 bit word size, 16 bit data path
D 4 bit word size, 8 bit data path
E 8 bit word size, 16 bit data path
2) Pick the correct choice for the 80386SX CPU.
A 16 bit word size, 16 bit data path
B 32 bit word size, 16 bit data path
C 8 bit word size, 32 bit data path
D 32 bit word size, 8 bit data path
E 32 bit word size, 32 bit data path
3) Pick the correct choice for the 80486DX CPU.
A 32 bit word size, 16 bit data path
B 64 bit word size, 32 bit data path
C 32 bit word size, 32 bit data path
D 32 bit word size, 16 bit data path
E 32 bit word size, 64 bit data path
Quiz
4) What is the first CPU to include an internal math
coprocessor?
A 386DX
B 486SX
C 486DX
D Pentium
5) What are the two main components of the CPU?
A The Control Unit and ALU
B The Registers and Output/Input management
C The ALU and FPU
6) What are the two main desktop CPU manufacturers?
A Intel and AMD
B Via and Power PC
C Marek and Sun UltraSparc
7) What is the 32-bit data when we read a double-word at the address 0x4000 in Big Endian mode?
Address   Content
0x4000    2F
0x4001    65
0x4002    7E
0x4003    AC
A 0xAC7E652F
B 0x2F657EAC
C 0xCAE756F2
Quiz
8) Pick the correct choice for the ARM processor.
A 16 bit word size, 16 bit data path
B 32 bit word size, 16 bit data path
C 8 bit word size, 32 bit data path
D 32 bit word size, 8 bit data path
E 32 bit word size, 32 bit data path
9) Pick the wrong choice for ARM architecture.
A Von Neumann architecture
B Harvard architecture
C 3 stage pipeline architecture
D 32-bit ARM Instruction Set
10) Pick the wrong choice for ARM registers.
A ARM has 37 32-bit registers
B There are 13 general purpose registers
C R13 is Stack Pointer
D R14 is the program counter

Exercises
1. Suppose that you discover that RAM addresses 000C0000 to 000C7FFF are
reserved for a PC’s video adapter. How many bytes of memory is this?
2. Suppose that you have an Intel 8086. Find the five-hex-digit address that
corresponds to each of these segment:offset pairs:
(a) 2B8C:8D21 (b) 059A:7A04 (c) 1234:5678
3. In an 8086 program, suppose that the data segment register DS contains the
segment number 23D1 and that an instruction fetches a word at offset 7B86
in the data segment. What is the five-hex-digit address of the word that is
fetched?
4. In an 8086 program, suppose that the code segment register CS contains the
segment number 014C and that the instruction pointer IP contains 15FE.
What is the five-hex-digit address of the next instruction to be fetched?
5. What are the advantages and disadvantages of secondary memory?
VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY
FACULTY OF ELECTRICAL AND ELECTRONICS ENGINEERING
DEPARTMENT OF ELECTRONICS ENGINEERING

Computer System Engineering


Chapter 6: ARM programming

1. Introduction to ARM processors


2. ARM Cortex-M4 Microcontroller Series
3. ARM programming

References
• Textbook
– Joseph Yiu, “The Definitive Guide to the ARM Cortex-M3”,
Elsevier Newnes, 2007
• Websites
– www.arm.com, www.st.com, www.ti.com, www.nxp.com
– www.thegioiic.com, www.arm.vn, www.tme.vn, www.proe.vn

Department of Electronics Engineering - HCMUT


1. Introduction to ARM processors
• ARM (Advanced RISC Machine)
– is the industry's leading provider of 32-bit embedded microprocessors
– offering a wide range of processors that deliver high performance, industry
leading power efficiency and reduced system cost



1. Introduction to ARM processors
• Cortex™-A Series - High performance processors for open Operating
Systems
• Cortex-R Series - Exceptional performance for real-time applications
• Cortex-M Series - Cost-sensitive solutions for deterministic microcontroller
applications



2. ARM Cortex-M4 – TM4F
• Features:
– ARM Cortex-M4F core, CPU speed up to 80 MHz with floating point
– Up to 256 KB Flash
– Up to 32 KB single-cycle SRAM
– Two high-speed 12-bit ADCs up to 1 MSPS
– Up to two CAN 2.0 A/B controllers
– Optional full-speed USB 2.0 OTG/Host/Device
– Up to 40 PWM outputs
– Serial communication with up to 8 UARTs, 6 I2Cs, 4 SPI/SSI
– Intelligent low-power design, with power consumption as low as 1.6 μA


2. ARM Cortex-M4 – Development Kit
• EK-TM4F120 LaunchPad Evaluation Kit
– 80-MHz, 32-bit ARM Cortex-M4 CPU
– 256 Kbytes of FLASH
– Many peripherals such as MC PWMs, 1-MSPS ADCs, eight UARTs, four SPIs, four I2Cs, USB Host/Device, and up to 27 timers
• EK-LM4F232 Development Kit
– ARM® Cortex™-LX4F232
– Color OLED display
– USB OTG
– A micro SD card, a coin cell battery
– A temperature sensor
– A three axis


2. ARM Cortex-M4 – TM4F
• Connectivity features:
– CAN, USB Device, SPI/SSI, I2C, UARTs
• High-performance analog integration
– Two 1 MSPS 12-bit ADCs
– Analog and digital comparators
• Best-in-class power consumption
– As low as 370 μA/MHz
– 500 μs wakeup from low-power modes
– RTC currents as low as 1.7 μA
• Solid roadmap
– Higher speeds, ultra-low power
– Larger memory
2. ARM Cortex-M4 - LM4F120H5QR
• LM4F120H5QR package


2. ARM Cortex-M4 - LM4F120H5QR
• LM4F120H5QR has 6 GPIO blocks,
supporting up to 43 IO pins
– Port A: 8 bits
– Port B : 8 bits
– Port C : 8 bits
– Port D : 8 bits
– Port E : 6 bits
– Port F : 5 bits
• GPIO pad configuration
– Weak pull-up or pull-down resistors
– 2-mA, 4-mA, and 8-mA pad drive
– Slew rate control for 8-mA pad drive
– Open drain enables
– Digital input enables



2. ARM Cortex-M4 - LM4F120H5QR
• 256 KB Flash memory
– Single-cycle to 40 MHz
– Pre-fetch buffer and speculative branch improve performance above 40 MHz
• 32 KB single-cycle SRAM with bit-banding
• Internal ROM loaded with StellarisWare software
– Stellaris Peripheral Driver Library
– Stellaris Boot Loader
– Advanced Encryption Standard (AES) cryptography tables
– Cyclic Redundancy Check (CRC) error detection functionality
• 2 KB EEPROM (fast, saves board space)
– Wear-leveled 500K program/erase cycles
– 10-year data retention
– 4 clock cycle read time


2. ARM Cortex-M4 - LM4F120H5QR
• Clock and reset



2. ARM Cortex-M4 - LM4F120H5QR



2. ARM Cortex-M4 - LM4F120H5QR
USB Device Signals
GPIO Pin   Pin Function   USB Device
PD4        USB0DM         D-
PD5        USB0DP         D+

Stellaris® In-Circuit Debug Interface (ICDI) Signals
GPIO Pin   Pin Function
PC0        TCK/SWCLK
PC1        TMS/SWDIO
PC2        TDI
PC3        TDO/SWO

Virtual COM Port Signals
GPIO Pin   Pin Function
PA0        U0RX
PA1        U0TX


2. ARM Cortex-M4 - LM4F120H5QR
Virtual COM Port Signals

GPIO Pin Pin Function


PA0 U0RX
PA1 U0TX



2. ARM Cortex-M4 – STM32F4
• Features:
– 180 MHz/225 DMIPS Cortex-M4
– Single-cycle DSP MAC and floating point unit
– Memory accelerator
– Graphic accelerator
– Multi DMA controllers
– SDRAM interface support
– Ultra-low dynamic power in Run mode: 260 μA/MHz at 180 MHz


2. ARM Cortex-M4 – STM32F4
• Block diagram



2. ARM Cortex-M4 – STM32F4
• Development kit: Discovery kit for STM32F407
– STM32F407VGT6 microcontroller
– 168 MHz/210 DMIPS
– DSP MAC and floating point unit
– 1 MB Flash, 192 KB RAM
– On-board ST-LINK/V2
– Power supply: 3 V and 5 V
– 3-axis accelerometer
– Audio sensor, omni-directional digital microphone
– Audio DAC with integrated class D speaker driver
– Eight LEDs
– Two push buttons (user and reset)


2. ARM Cortex-M4 – STM32F4
• Package LQFP100
• GPIO
– Port A: 16 bit
– Port B: 16 bit
– Port C: 16 bit
– Port D: 16 bit
– Port E: 16 bit
• Each pin can sink or source up to ±8 mA, except PC13, PC14 and PC15, which can sink or source up to ±3 mA


2. ARM Cortex-M4 – STM32F4
Typical application with an 8 MHz crystal

Typical application with a 32.768 kHz crystal



2. ARM Cortex-M4 – STM32F4
• Reset circuit



3. ARM Programming
• Using Assembly
– for small projects
– can get the best optimization, smallest memory size
– increase development time, easy to make mistakes
• Using C
– easier for implementing complex operations
– larger memory size
– able to include assembly code (inline assembler)
– Tools: RealView Development Suite (RVDS), KEIL RealView
Microcontroller Development Kit, Code Composer, IAR



3. ARM Programming
• Typical development flow



3. ARM Programming – Simple program
This simple program contains the initial SP value and the initial PC value, sets up the registers, and then does the required calculation in a loop.

STACK_TOP EQU 0x20002000          ; Constant for SP starting value

          AREA |Header Code|, CODE
          DCD  STACK_TOP          ; Stack top
          DCD  Start              ; Reset vector
          ENTRY                   ; Indicate program execution starts here
Start                             ; Start of main program, initialize registers
          MOV  r0, #10            ; Starting loop counter value
          MOV  r1, #0             ; Starting result
loop                              ; Calculate 10+9+8+...+1
          ADD  r1, r0             ; R1 = R1 + R0
          SUBS r0, #1             ; Decrement R0, update flags ("S" suffix)
          BNE  loop               ; If result not zero, jump to loop
                                  ; Result is now in R1
deadloop  B    deadloop           ; Infinite loop
          END                     ; End of file
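To see what the loop computes without a board or simulator, the register operations can be mirrored in Python. This is only an illustrative model, not ARM code: the variables `r0` and `r1` stand in for the registers, the function name `sum_countdown` is made up, and the check on `r0 == 0` plays the role of the Z flag tested by BNE.

```python
def sum_countdown(start: int = 10) -> int:
    """Mirror the assembly loop: add start + (start-1) + ... + 1."""
    if start < 1:
        raise ValueError("start must be at least 1")
    r0 = start  # MOV r0, #10  (loop counter)
    r1 = 0      # MOV r1, #0   (running result)
    while True:
        r1 += r0        # ADD  r1, r0
        r0 -= 1         # SUBS r0, #1 (sets Z when r0 reaches 0)
        if r0 == 0:     # BNE loop falls through once Z is set
            break
    return r1


print(sum_countdown(10))  # prints 55
```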



3. ARM Programming - Simple program
• Assemble the assembly code
– armasm --cpu cortex-m3 -o test1.o test1.s
• Link to executable image
– armlink --rw_base 0x20000000 --ro_base 0x0 --map -o test1.elf test1.o
• Create the binary image
– fromelf --bin --output test1.bin test1.elf
• generate a disassembled code list file
– fromelf -c --output test1.list test1.elf



3. ARM Programming
• Using EQU to define constants
NVIC_IRQ_SETEN0  EQU 0xE000E100
NVIC_IRQ0_ENABLE EQU 0x1
    LDR R0, =NVIC_IRQ_SETEN0     ; Load the register address into R0
    MOV R1, #NVIC_IRQ0_ENABLE    ; Move immediate data to register
    STR R1, [R0]                 ; Enable IRQ 0 by writing R1 to address in R0
• Using DCI to code an instruction
    DCI 0xBE00                   ; Breakpoint (BKPT 0), a 16-bit instruction
• Using DCB and DCD to define binary data
MY_NUMBER
    DCD 0x12345678
HELLO_TXT
    DCB "Hello\n",0              ; Null-terminated string


3. ARM Programming – Moving data
• Data transfers can be of one of the following types:
– Moving data between register and register
– Moving data between memory and register
– Moving data between special register and register
– Moving an immediate data value into a register

    MOV R8, R3          ; Move data from register R3 to register R8
    MOV R0, #0x12       ; Set R0 = 0x12 (hexadecimal)
    MOV R1, #'A'        ; Set R1 = ASCII character A
    MRS R0, PSR         ; Read processor status word into R0
    MSR CONTROL, R1     ; Write value of R1 into CONTROL register
    LDR R0, =address1   ; R0 set to 0x4001
    ...
address1
0x4000: MOV R0, R1      ; address1 contains program code


3. ARM Programming – Using Stack
• Stack PUSH and POP
subroutine_1
    PUSH {R0-R7, R12, R14}   ; Save registers
    …                        ; Do your processing
    POP {R0-R7, R12, R14}    ; Restore registers
    BX R14                   ; Return to calling function

• Link register (LR or R14)
main                         ; Main program
    BL function1             ; Call function1 using Branch with Link instruction.
                             ; PC = function1 and
                             ; LR = the next instruction in main

function1
    …                        ; Program code for function 1
    BX LR                    ; Return


3. ARM Programming – Special Register
• Special registers can only be accessed via the MSR and MRS instructions

    MRS <reg>, <special_reg>   ; Read special register
    MSR <special_reg>, <reg>   ; Write to special register

• APSR can be changed by using the MSR instruction, but EPSR and IPSR are read-only

    MRS r0, APSR   ; Read flag state into R0
    MRS r0, IPSR   ; Read exception/interrupt state
    MRS r0, EPSR   ; Read execution state
    MSR APSR, r0   ; Write flag state
    MRS r0, PSR    ; Read the combined program status word
    MSR PSR, r0    ; Write the combined program status word


3. ARM Programming – Special Register
• To access the CONTROL register, the MRS and MSR instructions are used:
    MRS r0, CONTROL   ; Read CONTROL register into R0
    MSR CONTROL, r0   ; Write R0 into CONTROL register


3. ARM Programming
• 16-Bit Load and Store Instructions



3. ARM Programming
• 16-Bit Branch Instructions



3. ARM Programming – Arithmetic Instructions



3. ARM Programming – IF-THEN
• The IF-THEN (IT) instructions allow up to four succeeding instructions
(called an IT block) to be conditionally executed.
• They are in the following formats:

    IT<x> <cond>
    IT<x><y> <cond>
    IT<x><y><z> <cond>
where:
• <x> specifies the execution condition for the second instruction
• <y> specifies the execution condition for the third instruction
• <z> specifies the execution condition for the fourth instruction


3. ARM Programming – IF-THEN



3. ARM Programming – IF-THEN
• An example of a simple conditional execution
    if (R1<R2) then
        R2=R2-R1
        R2=R2/2
    else
        R1=R1-R2
        R1=R1/2
• In assembly:
    CMP     R1, R2   ; If R1 < R2 (less than)
    ITTEE   LT       ; then execute instructions 1 and 2 (indicated by T)
                     ; else execute instructions 3 and 4 (indicated by E)
    SUBLT.W R2,R1    ; 1st instruction
    LSRLT.W R2,#1    ; 2nd instruction
    SUBGE.W R1,R2    ; 3rd instruction (notice the GE is opposite of LT)
    LSRGE.W R1,#1    ; 4th instruction
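The effect of the IT block can be traced with a small Python model. It is a sketch under stated assumptions: registers are modelled as 32-bit values, LT is treated as a signed comparison, and LSR as a logical right shift. The names `it_example` and `to_signed` are made up for illustration.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value: int) -> int:
    """Interpret a 32-bit pattern as a signed two's-complement number."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def it_example(r1: int, r2: int):
    """Model CMP R1, R2 followed by the ITTEE LT block."""
    r1 &= MASK32
    r2 &= MASK32
    if to_signed(r1) < to_signed(r2):  # CMP R1, R2 with condition LT
        r2 = (r2 - r1) & MASK32        # SUBLT.W R2, R1
        r2 >>= 1                       # LSRLT.W R2, #1
    else:                              # condition GE
        r1 = (r1 - r2) & MASK32        # SUBGE.W R1, R2
        r1 >>= 1                       # LSRGE.W R1, #1
    return r1, r2


print(it_example(2, 10))  # prints (2, 4)
print(it_example(10, 2))  # prints (4, 2)
```

Only one of the two branches ever changes a register, which is exactly what the T and E slots of the IT instruction express.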
3. ARM Programming – Using Data Memory
STACK_TOP EQU 0x20002000          ; Constant for SP starting value
          AREA |Header Code|, CODE
          DCD  STACK_TOP          ; SP initial value
          DCD  Start              ; Reset vector
          ENTRY
Start                             ; Start of main program, initialize registers
          MOV  r0, #10            ; Starting loop counter value
          MOV  r1, #0             ; Starting result. Calculate 10+9+8+...+1
loop      ADD  r1, r0             ; R1 = R1 + R0
          SUBS r0, #1             ; Decrement R0, update flags ("S" suffix)
          BNE  loop               ; If result not zero, jump to loop; result is now in R1
          LDR  r0,=MyData1        ; Put address of MyData1 into R0
          STR  r1,[r0]            ; Store the result in MyData1
deadloop  B    deadloop           ; Infinite loop
          AREA |Header Data|, DATA
          ALIGN 4
MyData1   DCD  0                  ; Destination of calculation result
MyData2   DCD  0
          END                     ; End of file
3. ARM Programming
• A Low-Cost Test Environment for Outputting Text Messages
– UART interface is common output method to send messages to a
console
– Hyper-Terminal program can be used as a console



3. ARM Programming
• A simple routine to output a character through UART
UART0_BASE EQU 0x4000C000
UART0_FLAG EQU UART0_BASE+0x018
UART0_DATA EQU UART0_BASE+0x000
Putc                              ; Subroutine to send a character via UART
                                  ; Input R0 = character to send
          PUSH {R1,R2, LR}        ; Save registers
          LDR  R1,=UART0_FLAG
PutcWaitLoop
          LDR  R2,[R1]            ; Get status flag
          TST  R2, #0x20          ; Check transmit buffer full flag bit
          BNE  PutcWaitLoop       ; If busy then loop
          LDR  R1,=UART0_DATA     ; otherwise
          STRB R0, [R1]           ; Output data to transmit buffer
          POP  {R1,R2, PC}        ; Return

The register addresses and bit definitions here are just examples
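The polling pattern used by Putc (wait while the "transmit buffer full" bit is set, then write the byte) can be tried out against a simulated peripheral. Everything below is hypothetical: `FakeUart` is a stand-in invented for this sketch, and only the 0x20 flag bit is taken from the routine above, which is itself just an example definition.

```python
class FakeUart:
    """Simulated UART whose transmit buffer stays full for a few polls."""

    TX_FULL = 0x20  # same example bit as tested by TST R2, #0x20

    def __init__(self, busy_polls: int):
        self._busy_polls = busy_polls
        self.sent = bytearray()

    def read_flag(self) -> int:
        # Report "buffer full" until the configured number of polls is used up.
        if self._busy_polls > 0:
            self._busy_polls -= 1
            return self.TX_FULL
        return 0

    def write_data(self, byte: int) -> None:
        self.sent.append(byte & 0xFF)


def putc(uart: FakeUart, ch: str) -> None:
    """Busy-wait until the transmit buffer has room, then send one character."""
    while uart.read_flag() & FakeUart.TX_FULL:  # PutcWaitLoop
        pass
    uart.write_data(ord(ch))                    # STRB R0, [R1]


uart = FakeUart(busy_polls=3)
for ch in "Hi":
    putc(uart, ch)
print(uart.sent.decode("ascii"))  # prints Hi
```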



TI’s ARM Cortex-M Development Kit
• LM3S9B96 development Kit
• Stellaris LM3S9B96 MCU with fully-integrated
Ethernet, CAN, and USB OTG/Host/Device
• Bright 3.5" QVGA LCD touch-screen display
• Navigation POT switch and select pushbuttons
• Integrated Interchip Sound (I2S) Audio Interface

• The Tiva C Series EK-TM4C123GXL LaunchPad Evaluation Kit
• A TM4C123G LaunchPad Evaluation board
• On-board In-Circuit Debug Interface (ICDI)
• USB Micro-B plug to USB-A plug cable
• Preloaded RGB quickstart application
• ReadMe First quick-start guide


Assignments
1. Write a program to move 10 words from 0x20000000 to 0x3000000.
2. Write a program to read STATUS register and write to 0x20000004
3. Write a program to write a value in 0x30000000 to CONTROL register
4. Write a subroutine to perform a function 40*X + 50
5. Write a subroutine to convert 10 words of data from big endian to little endian.
6. Write a program as pseudo code below:
if (R0 equal R1) then {
R3 = R4 + R5
R3 = R3 / 2 }
else {
R3 = R6 + R7
R3 = R3 / 2
}



Assignments
• Design a circuit described as follows:
– Using Cortex-M4 processor LM4F120H5QR
– Port A connects to 8 single LEDs
– Port B connects to 8 buttons
– Write a program to control 8 LEDs by 8 buttons



Assignments
• Design a circuit described as follows:
– Using Cortex-M4 processor STM32F407VGT6
– Port A connects to a character LCD
– Port B connects to 3 buttons START, STOP, CLEAR
– Write a program to control as follows:
• START: start to count number in millisecond
• STOP: stop to count
• CLEAR: clear the number to zero



Ho Chi Minh City University of Technology
Department of Electrical and Electronics

ECE391 Computer System Engineering


Chapter 5:
x86 Assembly Language
1. Introduction to Assembly Language
2. Basic instructions
3. Branching and looping
4. Procedures

1. Introduction to Assembly language
A Hierarchy of Languages

Assembly and Machine Language
⚫Machine language
⚫Native to a processor: executed directly by hardware
⚫Instructions consist of binary code: 1s and 0s
⚫Assembly language
⚫A programming language that uses symbolic names to represent
operations, registers and memory locations.
⚫Slightly higher-level language
⚫Readability of instructions is better than machine language
⚫One-to-one correspondence with machine language instructions
⚫Assemblers: translate assembly to machine code
⚫Compilers: translate high-level programs to machine code
⚫Either directly, or
⚫Indirectly via an assembler

Compiler and Assembler

Instructions and Machine Language
⚫Each command of a program is called an instruction
(it instructs the computer what to do).
⚫Computers only deal with binary data, hence the
instructions must be in binary format (0s and 1s) .
⚫The set of all instructions (in binary form) makes up
the computer's machine language. This is also
referred to as the instruction set.

Instruction Fields
⚫Machine language instructions usually are made up
of several fields. Each field specifies different
information for the computer. The major two fields
are:
⚫Opcode field which stands for operation code and it
specifies the particular operation that is to be
performed.
⚫Each operation has its unique opcode.
⚫Operands fields which specify where to get the
source and destination operands for the operation
specified by the opcode.
⚫The source/destination of operands can be a constant,
the memory or one of the general-purpose registers.

Assembly vs. Machine Code

Translating Languages
English: D is assigned the sum of A times B plus 10.

High-Level Language: D = A * B + 10

A statement in a high-level language is typically translated
into several machine-level instructions

Intel Assembly Language:    Intel Machine Language:
mov eax, A                  A1 00404000
mul B                       F7 25 00404004
add eax, 10                 83 C0 0A
mov D, eax                  A3 00404008
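The four-instruction translation above can be traced step by step in C. This is only a sketch: the function name compute_d and the use of unsigned 32-bit values are assumptions made for illustration, and the high half of the product that mul leaves in EDX is ignored.

```c
#include <stdint.h>

/* Hypothetical trace of D = A * B + 10, one C statement per instruction. */
uint32_t compute_d(uint32_t a, uint32_t b)
{
    uint32_t eax;

    eax = a;          /* mov eax, A                                  */
    eax = eax * b;    /* mul B       (low 32 bits of the product)    */
    eax = eax + 10;   /* add eax, 10                                 */
    return eax;       /* mov D, eax  (the caller stores it into D)   */
}
```

For example, compute_d(3, 4) follows the same path as the assembly and yields 22.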

Mapping Between Assembly Language and HLL
⚫Translating HLL programs to machine language
programs is not a one-to-one mapping
⚫A HLL instruction (usually called a statement) will
be translated to one or more machine language
instructions

Advantages of High-Level Languages
⚫Program development is faster
⚫High-level statements: fewer instructions to code
⚫Program maintenance is easier
⚫For the same reasons as above
⚫Programs are portable
⚫Contain few machine-dependent details
⚫ Can be used with little or no modifications on different machines
⚫Compiler translates the program
to the target machine language
⚫However, Assembly language programs are not
portable

Why Learn Assembly Language?
⚫Accessibility to system hardware
⚫Assembly Language is useful for implementing system software
⚫Also useful for small embedded system applications

⚫Space and Time efficiency
⚫Understanding sources of program inefficiency
⚫Tuning program performance
⚫Writing compact code

⚫Writing assembly programs gives the computer designer
the needed deep understanding of the instruction set and
how to design one
⚫To be able to write compilers for HLLs, we need to be
expert with the machine language. Assembly programming
provides this experience
Assembly vs. High-Level Languages
❖Some representative types of applications:

Assembler
⚫Software tools are needed for editing, assembling,
linking, and debugging assembly language programs
⚫An assembler is a program that converts source-code
programs written in assembly language into object
files in machine language
⚫Popular assemblers have emerged over the years for
the Intel family of processors. These include …
⚫MASM (Microsoft assembler): www.masm.com
⚫TASM (Turbo Assembler from Borland):
www.phatcode.net
⚫NASM (Netwide Assembler for both Windows and
Linux): www.nasm.us

Assemble and Link Process

[Figure: each source file goes through the assembler to produce an object file; the linker combines the object files with the link libraries into one executable file]

A project may consist of multiple source files
Assembler translates each source file separately into an object file
Linker links all object files together with link libraries

Debugger
⚫Allows you to trace the execution of a program
⚫Allows you to view code, memory, registers, etc.
⚫Example: 32-bit Windows debugger,
http://www.windbg.org/

2. Basic Instructions
⚫Moving Data
mov Destination, Source
⚫Operand Types
⚫Immediate: Constant integer data
⚫Example: mov eax, 0x400
⚫Like C constant
⚫Encoded with 1, 2, or 4 bytes
⚫Register:
⚫Example: mov eax, ebx
⚫ax, bx, cx, dx: 16-bit registers
⚫eax, ebx, ecx, edx: 32-bit registers
⚫Memory:
⚫Example: mov eax, [ebx]
⚫Various other “address modes”

[Figure: the eight 32-bit general-purpose registers: eax, ebx, ecx, edx, esi, edi, esp, ebp]

mov Operand Combinations
Source  Destination  Example                     C Analog
Imm     Reg          mov eax, 0x4                temp = 0x4;
Imm     Mem          mov dword ptr [eax], -147   *p = -147;
Reg     Reg          mov edx, eax                temp2 = temp1;
Reg     Mem          mov [edx], eax              *p = temp;
Mem     Reg          mov edx, [eax]              temp = *p;

⚫Cannot do memory-memory transfers with a single
instruction

Using Simple Addressing Modes
C function                   ASM function
void swap(int a, int b)      swap:
{                                mov eax, a
    int t0 = a;                  mov ebx, b
    int t1 = b;                  mov a, ebx
    a = t1;                      mov b, eax
    b = t0;                      ret
}
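Because C passes arguments by value, a swap that the caller can observe is usually written with pointers. The sketch below is one possible form, not the slide's own code: the name swap_ints is hypothetical, and the comments only indicate which kind of mov each statement resembles.

```c
/* Hypothetical helper: exchanges the two ints that xp and yp point to. */
void swap_ints(int *xp, int *yp)
{
    int t0 = *xp;   /* load from memory into a register, like mov eax, [mem] */
    int t1 = *yp;   /* load from memory into a register, like mov ebx, [mem] */
    *xp = t1;       /* store register to memory, like mov [mem], ebx         */
    *yp = t0;       /* store register to memory, like mov [mem], eax         */
}
```

Two temporaries are needed for the same reason the assembly needs two registers: there is no memory-to-memory mov.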

Exchange data
⚫Swap Value1 and Value2
⚫xchg eax, ebx ;
=> Takes 2 clock cycles

⚫mov ecx, eax ;
⚫mov eax, ebx ;
⚫mov ebx, ecx ;
=> Takes 3 clock cycles

Some Arithmetic Operations
Format              Computation
⚫Two Operand Instructions
add Dest, Src       Dest = Dest + Src
sub Dest, Src       Dest = Dest - Src
sal Dest, k         Dest = Dest << k      Also called shl
sar Dest, k         Dest = Dest >> k      Arithmetic
shr Dest, k         Dest = Dest >> k      Logical
(k is an immediate value or the contents of cl)
xor Dest, Src       Dest = Dest ^ Src
and Dest, Src       Dest = Dest & Src
or  Dest, Src       Dest = Dest | Src
⚫One Operand Instructions
inc Dest            Dest = Dest + 1
dec Dest            Dest = Dest - 1
neg Dest            Dest = -Dest
not Dest            Dest = ~Dest
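The table writes both sar and shr as Dest >> k; they differ only in what fills the vacated high bits. A minimal C sketch of the difference follows. The names shr32 and sar32 are hypothetical, and sar32 avoids right-shifting a negative value directly because C leaves that result implementation-defined.

```c
#include <stdint.h>

/* shr: logical right shift, vacated high bits are filled with 0. */
uint32_t shr32(uint32_t dest, unsigned k)
{
    return dest >> k;
}

/* sar: arithmetic right shift, vacated high bits copy the sign bit.
   Requires 0 <= k <= 31. For a negative dest, ~dest is non-negative,
   so shifting it and complementing again keeps the sign. */
int32_t sar32(int32_t dest, unsigned k)
{
    if (dest >= 0)
        return dest >> k;
    return ~(~dest >> k);
}
```

With the bit pattern FFFFFFF8h and k = 1, shr32 gives 7FFFFFFCh while sar32 treats the value as -8 and gives -4.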
Multiplication / Division
⚫Signed number: imul / idiv
⚫Unsigned number: mul / div
⚫mul Src (one operand; the product is twice as wide as the operands)
⚫If Src is 1 byte: AX = AL * Src
⚫If Src is 2 bytes: DX:AX = AX * Src
⚫If Src is 4 bytes: EDX:EAX = EAX * Src
⚫If Src is 8 bytes: RDX:RAX = RAX * Src
⚫imul Dest, Src                Dest = Dest * Src
⚫imul Dest, Src, Immediate     Dest = Src * Immediate
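The one-operand mul with a 4-byte source can be modelled in C with a 64-bit intermediate. This is a sketch under the assumption that the register pair is represented by a small struct; the names mul_result and mul32 are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of the EDX:EAX register pair after "mul Src". */
struct mul_result {
    uint32_t eax;   /* low 32 bits of the product  */
    uint32_t edx;   /* high 32 bits of the product */
};

/* Unsigned 32 x 32 -> 64 bit multiply, split the way mul stores it. */
struct mul_result mul32(uint32_t eax, uint32_t src)
{
    uint64_t product = (uint64_t)eax * src;
    struct mul_result r;

    r.eax = (uint32_t)product;
    r.edx = (uint32_t)(product >> 32);
    return r;
}
```

Multiplying FFFFFFFFh by 2 leaves FFFFFFFEh in the eax field and 1 in the edx field, which is why code that only reads EAX after mul can silently lose the high half.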

Example for Arithmetic Expressions
C function ASM function
int arith(int x, int y, int z)
{
    int t1 = x + y;
    int t2 = z + t1;
    int t3 = x + 4;
    int t4 = y * 48;
    int t5 = t3 + t4;
    int rval = t2 * t5;
    return rval;
}
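A multiplication by a constant such as y * 48 does not have to use a multiply instruction; it can be built from the add and shift operations listed earlier, since 48 = 3 * 16. The sketch below shows one such decomposition. The function name times48 is hypothetical, unsigned arithmetic is used to keep overflow well defined, and a real compiler may choose a different sequence.

```c
#include <stdint.h>

/* y * 48 computed as (y + 2*y) << 4, i.e. (y * 3) * 16. */
uint32_t times48(uint32_t y)
{
    uint32_t t = y + 2u * y;   /* y * 3: one add of y and a doubled y */
    return t << 4;             /* multiply by 16 with a left shift    */
}
```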

Another Example
C function ASM function
int logical(int x, int y)
{
    int t1 = x ^ y;
    int t2 = t1 >> 17;
    int mask = (1 << 13) - 7;
    int rval = t2 & mask;
    return rval;
}

Differences among MASM, GAS,
and NASM
⚫MASM, the Microsoft Assembler. It outputs OMF files
(but Microsoft's linker can convert them to win32 format).
It supports a massive and clunky assembly language.
Memory addressing is not intuitive. The directives
required to set up a program make programming
unpleasant.
⚫GAS, the GNU assembler. This uses the rather ugly AT&T-
style syntax so many people do not like it; however, you
can configure it to use and understand the Intel-style
syntax. It was designed to be part of the back end of the
GNU compiler collection (gcc).
⚫NASM, the "Netwide Assembler." It is free, small, and
best of all it can output zillions of different types of object
files. The language is much more sensible than MASM in
many respects.

Differences among MASM, GAS,
and NASM
⚫GAS uses % to prefix registers
⚫GAS is source(s) first, destination last; MASM and NASM go
the other way.
⚫GAS denotes operand sizes on instructions (with b, w, l
suffixes), rather than on operands
⚫GAS uses $ for immediates, but also for addresses of variables.
⚫GAS puts rep/repe/repne/repz/repnz prefixes on separate lines
from the instructions they modify
⚫MASM tries to simplify things for the programmer but makes
headaches instead: it tries to "remember" segments, variable
sizes and so on. The result is a requirement for stupid ASSUME
directives, and the inability to tell what an instruction does by
looking at it (you have to go look for declarations; e.g. dw vs.
equ).
⚫MASM writes FPU registers as ST(0), ST(1), etc.
⚫NASM treats labels case-sensitively; MASM is case-insensitive.

MASM vs. GAS
Operation                         MASM                      GAS
Operand order                     mov destination, source   mov source, destination
Register to register              mov ebx, esi              movl %esi, %ebx
Immediate to register             mov al, 10                movb $10, %al
Variable t to register            mov eax, t                movl t, %eax
Move immediate byte value 10      mov byte ptr [edx], 10    movb $10, (%edx)
into memory pointed to by edx
Add into esi the value in         add esi, [eax+ecx*8]      addl (%eax,%ecx,8), %esi
memory ecx quadwords past the
cell pointed to by eax
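Both spellings in the last row, [eax+ecx*8] and (%eax,%ecx,8), name the same address: base plus index times scale. A small C sketch of that computation follows; the function name effective_address is hypothetical and registers are modelled as plain 32-bit values.

```c
#include <stdint.h>

/* Address named by [base + index*scale] in MASM or (base,index,scale)
   in GAS. On x86 the scale factor is 1, 2, 4 or 8. */
uint32_t effective_address(uint32_t base, uint32_t index, uint32_t scale)
{
    return base + index * scale;
}
```

With eax = 1000h and ecx = 3, the operand refers to address 1018h, that is, the fourth quadword starting at 1000h.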

(v) FLAG Register
o The flags register contains bits that show the status of some
activities
o Instructions that involve comparison and arithmetic change
the flag status; some later instructions test the value of a
specific flag bit to decide the next action

Bit:   15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
Flag:   -  -  -  -  O  D  I  T  S  Z  -  A  -  P  -  C

- 9 of its 16 bits indicate the current status of the computer
and the results of processing
- the above diagram shows the stated 9 bits
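Software that inspects a saved copy of the flags register usually does so with one mask per bit. The sketch below assumes the standard bit positions (CF 0, PF 2, AF 4, ZF 6, SF 7, TF 8, IF 9, DF 10, OF 11); the constant and function names are hypothetical.

```c
#include <stdint.h>

/* One mask per status or control flag in the 16-bit FLAGS register. */
enum {
    FLAG_CF = 1u << 0,    /* carry           */
    FLAG_PF = 1u << 2,    /* parity          */
    FLAG_AF = 1u << 4,    /* auxiliary carry */
    FLAG_ZF = 1u << 6,    /* zero            */
    FLAG_SF = 1u << 7,    /* sign            */
    FLAG_TF = 1u << 8,    /* trap            */
    FLAG_IF = 1u << 9,    /* interrupt       */
    FLAG_DF = 1u << 10,   /* direction       */
    FLAG_OF = 1u << 11    /* overflow        */
};

/* Returns 1 if the flag selected by mask is set in flags, otherwise 0. */
int flag_is_set(uint16_t flags, unsigned mask)
{
    return (flags & mask) != 0;
}
```

For a saved value of 0841h, the bits for OF, ZF and CF are set and all other flags are clear.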

Add, Sub, Inc, Dec Operations
Operation Syntax Example

Addition add destination, source add ax, cx

Subtraction sub destination, source sub eax, ecx

Increment inc destination inc ecx

Decrement dec destination dec al

⚫These operations can affect the EFLAGS register (inc and dec leave CF unchanged):


⚫SF: sign flag
⚫ZF: zero flag
⚫CF: carry flag
⚫OF: overflow flag
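How the four flags follow from an addition can be sketched in C for the 16-bit case. This is an illustrative model, not processor documentation: the struct add_flags and the function add16 are hypothetical names, and only SF, ZF, CF and OF are computed.

```c
#include <stdint.h>

/* Hypothetical record of a 16-bit add: the result and four flags. */
struct add_flags {
    uint16_t result;
    int sf, zf, cf, of;
};

/* Models "add dest, src" on 16-bit operands. */
struct add_flags add16(uint16_t dest, uint16_t src)
{
    uint32_t wide = (uint32_t)dest + src;   /* keeps the carry in bit 16 */
    struct add_flags r;

    r.result = (uint16_t)wide;
    r.sf = (r.result >> 15) & 1;            /* copy of the top result bit */
    r.zf = (r.result == 0);
    r.cf = (int)((wide >> 16) & 1);         /* unsigned overflow */
    /* Signed overflow: both operands have the same sign and the
       result has the opposite sign. */
    r.of = ((~(dest ^ src) & (dest ^ r.result)) >> 15) & 1;
    return r;
}
```

Adding 1 to 7FFFh sets SF and OF but not CF, while adding 1 to FFFFh sets ZF and CF but not OF, which shows that CF and OF report different kinds of overflow.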

Bit:   15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
Flag:   -  -  -  -  O  D  I  T  S  Z  -  A  -  P  -  C

OF (overflow): indicates overflow of the high-order (leftmost) bit following arithmetic
DF (direction): determines left or right direction for moving or comparing string
(character) data
IF (interrupt): indicates whether external interrupts such as keyboard entry are to be
processed or ignored
TF (trap): permits operation of the processor in single-step mode. Usually used in the
“debugging” process
SF (sign): contains the resulting sign of an arithmetic operation (0 = +ve, 1 = -ve)
ZF (zero): indicates the result of an arithmetic or comparison operation (0 = nonzero;
1 = zero result)
AF (auxiliary carry): contains a carry out of bit 3 into bit 4 in an arithmetic operation,
for specialized arithmetic
PF (parity): indicates the number of 1-bits that result from an operation. An even
number of bits causes so-called even parity and an odd number causes odd parity
CF (carry): contains carries from a high-order (leftmost) bit following an arithmetic
operation; also, contains the content of the last bit of a shift or rotate operation.
Quiz
⚫Determine the value of registers after the instruction is
executed
Before                   Instruction     After
BX: FF 75                mov bx, cx      BX: ?
CX: 01 A2                                CX: ?
AX: 00 75                add ax, cx      AX: ?
CX: 01 A2                                CX: ?
                                         SF: ? ZF: ? CF: ? OF: ?
EAX: 00 00 00 75         sub ecx, eax    EAX: ?
ECX: 00 00 01 A2                         ECX: ?
                                         SF: ? ZF: ? CF: ? OF: ?
ECX: 00 00 01 A2         inc ecx         ECX: ?
                                         SF: ? ZF: ? CF: ? OF: ?
BX: 00 01                dec bx          BX: ?
                                         SF: ? ZF: ? CF: ? OF: ?
3. Branching and Looping
⚫Unconditional Jumps
⚫jmp StatementLabel
⚫Examples
jmp quit ; exit from program
.
.
quit: INVOKE ExitProcess, 0 ; exit with return code 0
⚫Indirect jump
⚫jmp edx
⚫jmp DWORD PTR [ebx]

3. Branching and Looping
⚫Conditional jump
⚫jnz : jump if not zero
⚫jns : jump if not sign (result not negative)
⚫Comparison instruction
⚫cmp operand1, operand2
⚫This instruction affects AF, CF, OF, PF, SF, and ZF
⚫Examples:
cmp eax, 356
cmp pattern, 0d3a6h
cmp bh, '$'
cmp 100, total ; => illegal: the first operand cannot be an immediate

3. Branching and Looping
⚫Appropriate for use after comparison of unsigned operands

Branching and Looping
⚫Appropriate for use after comparison of signed
operands

3. Branching and Looping
⚫Other conditional jumps

Branching and Looping
⚫Example:

cmp eax, 100
ja bigger        ; unsigned comparison: jump if above

cmp ax, 100
jg bigger        ; signed comparison: jump if greater
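The choice between ja and jg matters because the same bit pattern can compare differently as an unsigned and as a signed number. The C sketch below models the two jump conditions; the function names are hypothetical and the operands are 32 bits wide.

```c
#include <stdint.h>

/* Models "cmp eax, value" followed by ja: taken when eax is above
   value in an unsigned comparison. */
int jump_if_above(uint32_t eax, uint32_t value)
{
    return eax > value;
}

/* Models "cmp eax, value" followed by jg: taken when eax is greater
   than value in a signed comparison. */
int jump_if_greater(int32_t eax, int32_t value)
{
    return eax > value;
}
```

The pattern FFFFFFFFh is above 100 when read as unsigned, but read as signed it is -1 and therefore not greater than 100.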

Branching and Looping
⚫Example:
if value < 10 then
add 1 to smallCount;
else
add 1 to largeCount;
end if;

               cmp ebx, 10          ; value < 10 ?
               jnl elseLarge
               inc smallCount       ; add 1 to smallCount
               jmp endValueCheck
elseLarge:     inc largeCount       ; add 1 to largeCount
endValueCheck:

Branching and Looping
if (total >= 100) or (count = 10) then
add value to total;
end if;
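A compound "or" condition is usually coded with two comparisons: the first jumps straight to the action if it succeeds, the second skips the action if it fails. The C sketch below mirrors that jump structure with goto. The function name, the label names and the instruction choices in the comments are assumptions for illustration, not the slide's own listing.

```c
/* Returns the updated total for:
   if (total >= 100) or (count = 10) then add value to total. */
int add_if_either(int total, int count, int value)
{
    if (total >= 100) goto addValue;     /* cmp total, 100 / jge addValue   */
    if (count != 10) goto endAddCheck;   /* cmp count, 10 / jne endAddCheck */
addValue:
    total += value;                      /* add value to total              */
endAddCheck:
    return total;
}
```

The second test is never evaluated when the first one succeeds, which is the same short-circuit behaviour the pair of conditional jumps produces.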

Looping
while continuation condition loop
... { body of loop }
end while;

whileSum:    cmp sum, 1000      ; sum < 1000?
             jnl endWhileSum    ; exit loop if not
             .                  ; body of loop
             .
             .
             jmp whileSum       ; go check condition again
endWhileSum:

Looping
while (sum < 1000) loop
... { body of loop }
end while;

while:       .                  ; code to check Boolean expression
             .
             .
body:        .                  ; loop body
             .
             .
             jmp while          ; go check condition again
endWhile:

Looping
while (sum < 1000) and (count <= 24) loop
... { body of loop }
end while;
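For an "and" condition, each comparison exits the loop as soon as it fails, and the body runs only if both pass. The C sketch below mirrors that structure with goto. The loop body used here (add 100 to sum, add 1 to count) is an assumption chosen only so that the loop does something; the function and label names are hypothetical too.

```c
/* Runs: while (sum < 1000) and (count <= 24) loop ... end while;
   and returns how many times the body executed. */
int run_loop(int sum, int count)
{
    int iterations = 0;

whileLoop:
    if (sum >= 1000) goto endWhile;   /* cmp sum, 1000 / jnl endWhile  */
    if (count > 24) goto endWhile;    /* cmp count, 24 / jnle endWhile */
    sum += 100;                       /* assumed loop body             */
    count += 1;
    iterations += 1;
    goto whileLoop;                   /* jmp whileLoop                 */
endWhile:
    return iterations;
}
```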

Procedures
⚫Call a procedure
⚫call destination
⚫Return from procedure
⚫ret
⚫Example: square root procedure

SqRt := 0;
while SqRt*SqRt ≤ Nbr loop
    add 1 to SqRt;
end while;
subtract 1 from SqRt;
The x86 Stack
⚫push source: stores source on the top of the stack
⚫pop destination: loads destination from the top of the stack
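On 32-bit x86 the stack grows toward lower addresses: push decrements ESP by 4 and then stores, pop loads and then increments ESP by 4. The C sketch below models this with a small byte array. The struct machine, its size and the function names are hypothetical, and no overflow or underflow checking is done.

```c
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64

/* Hypothetical model: a small stack area and the ESP register,
   kept as an offset into stack[]. */
struct machine {
    uint8_t  stack[STACK_BYTES];
    uint32_t esp;
};

/* An empty stack has ESP just past the highest stack byte. */
void machine_init(struct machine *m)
{
    m->esp = STACK_BYTES;
}

/* push source: ESP = ESP - 4, then store source at [ESP]. */
void push32(struct machine *m, uint32_t source)
{
    m->esp -= 4;
    memcpy(&m->stack[m->esp], &source, 4);
}

/* pop destination: load destination from [ESP], then ESP = ESP + 4. */
uint32_t pop32(struct machine *m)
{
    uint32_t destination;

    memcpy(&destination, &m->stack[m->esp], 4);
    m->esp += 4;
    return destination;
}
```

Values come back in the reverse order they were pushed, which is why a procedure that saves ebx and then ecx must restore ecx first.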

Procedure example
; procedure to compute integer square root of number Nbr
; Nbr is passed to the procedure in EAX
; The square root SqRt is returned in EAX
; Other registers are unchanged.
Root:       push ebx            ; save registers
            push ecx
            mov  ebx, 0         ; SqRt := 0
WhileLE:    mov  ecx, ebx       ; copy SqRt
            imul ecx, ebx       ; SqRt*SqRt
            cmp  ecx, eax       ; SqRt*SqRt <= Nbr ?
            jnle EndWhileLE     ; exit if not
            inc  ebx            ; add 1 to SqRt
            jmp  WhileLE        ; repeat
EndWhileLE:
            dec  ebx            ; subtract 1 from SqRt
            mov  eax, ebx       ; return SqRt in EAX
            pop  ecx            ; restore registers
            pop  ebx
            ret                 ; return
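The same linear search can be written in C to check the procedure's results by hand. This is a sketch rather than a translation: the name int_sqrt is hypothetical, the argument is treated as unsigned, and the square is computed in 64 bits so that it cannot overflow, which the 32-bit imul in the assembly does not guarantee for very large inputs.

```c
#include <stdint.h>

/* Largest sq_rt such that sq_rt * sq_rt <= nbr, found by counting up. */
uint32_t int_sqrt(uint32_t nbr)
{
    uint32_t sq_rt = 0;                        /* SqRt := 0             */

    while ((uint64_t)sq_rt * sq_rt <= nbr)     /* SqRt*SqRt <= Nbr ?    */
        sq_rt++;                               /* add 1 to SqRt         */
    return sq_rt - 1;                          /* subtract 1 from SqRt  */
}
```

For 16 the loop stops at 5 and returns 4; for 17 it also returns 4, since the result is rounded down.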
Procedures
⚫Using parameter values passed on stack

; add two doublewords passed on the stack
; return the sum in the EAX register
Add2:       push ebp            ; save EBP
            mov  ebp, esp       ; establish stack frame
            mov  eax, [ebp+8]   ; copy second parameter value
            add  eax, [ebp+12]  ; add first parameter value
            pop  ebp            ; restore EBP
            ret                 ; return

Procedures
⚫Recursive procedure
⚫Towers of Hanoi puzzle

Recursive Procedures
⚫Tower of Hanoi puzzle

procedure Move(NbrDisks, Source, Destination, Spare);
begin
    if NbrDisks = 1 then
        display “Move disk from”, Source, “to”, Destination
    else
        Move(NbrDisks - 1, Source, Spare, Destination);
        Move(1, Source, Destination, Spare);
        Move(NbrDisks - 1, Spare, Destination, Source);
    end if;
end procedure Move;
begin {main program}
    prompt for and input Number;
    Move(Number, ‘A’, ‘B’, ‘C’);
end;
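The recursive structure of Move carries over to C almost line for line. In the sketch below the display step is replaced by a counter so that the result can be checked (n disks need 2^n - 1 moves); that replacement and the name hanoi_moves are assumptions made for illustration, and the function expects at least one disk.

```c
/* Follows the Move pseudocode; returns the number of single-disk moves.
   nbr_disks must be at least 1. */
unsigned long hanoi_moves(unsigned nbr_disks, char source,
                          char destination, char spare)
{
    if (nbr_disks == 1)
        return 1;   /* "Move disk from source to destination" */

    return hanoi_moves(nbr_disks - 1, source, spare, destination)
         + hanoi_moves(1, source, destination, spare)
         + hanoi_moves(nbr_disks - 1, spare, destination, source);
}
```

Each call pushes a new set of parameters and a return address, so the depth of the recursion, and with it the stack usage, grows with the number of disks.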

Ho Chi Minh City University of Technology
Department of Electrical and Electronics

ECE391 Computer System Engineering


Chapter 6:
x86 Interrupts and Handlers
1. Purpose of interrupts
2. Interrupts and Exceptions
3. Interrupt vector
4. Exception Handling
5. Interrupt Handling

Chapter 6 1
Text book
⚫D. P. Bovet, M. Cesati, Understanding the Linux Kernel,
3rd edition, O'Reilly, 2005, ISBN 0-596-00565-2
⚫Chapter 4: Interrupts and Exceptions
⚫Page 131

1. The Purpose of Interrupts
⚫Interrupts are useful when interfacing I/O devices with low
data-transfer rates, like a keyboard or a mouse, in which case
polling the device wastes valuable processing time
⚫The peripheral interrupts the normal application execution,
requesting to send or receive data.
⚫The processor jumps to a special program called Interrupt
Service Routine to service the peripheral
⚫After the processor services the peripheral, the execution of
the interrupted program continues.

BASIC INTERRUPT TERMINOLOGY
⚫Interrupt pins: Set of pins used in hardware interrupts
⚫Interrupt Service Routine (ISR) or Interrupt handler:
code used for handling a specific interrupt
⚫Interrupt priority: In systems with more than one
interrupt input, some interrupts have a higher priority
than others
⚫They are serviced first if multiple interrupts are triggered
simultaneously
⚫Interrupt vector: Code placed on the bus by the
interrupting device that selects the address (segment
and offset) of the specific interrupt service routine
⚫Interrupt Masking: Ignoring (disabling) an interrupt
⚫Non-Maskable Interrupt: Interrupt that cannot be
ignored (e.g. power-down)

Interrupt processing flow
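[Flowchart: the main program runs until there is an interrupt request (N: keep running; Y: continue). If the interrupt is accepted (N: keep running; Y: continue), the processor gets the interrupt vector, saves the PC and jumps to the ISR; the PC is loaded again to return to the main program]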

2. Interrupts and Exceptions
⚫“Interrupts and exceptions are events that indicate that
a condition exists somewhere in the system, the processor,
or within the currently executing program or task that
requires the attention of a processor.”
⚫“When an interrupt is received or an exception is detected,
the currently running procedure or task is suspended while
the processor executes an interrupt or exception handler.
When execution of the handler is complete, the processor
resumes execution of the interrupted procedure or task.”
⚫“The processor receives interrupts from two sources:
⚫External (hardware generated) interrupts.

⚫Software-generated interrupts.”

Difference between Interrupt and
Exception?
⚫Interrupts and Exceptions both alter program flow

⚫Interrupts:
⚫typically indicate events from external hardware.
⚫clear the Interrupt Flag (IF - talked about later),
Exceptions do not.

⚫Exceptions:
⚫typically indicate error conditions internally
⚫Generated by CPU

Difference between Interrupt and
Exception?
⚫Interrupts
⚫Maskable interrupts: interrupt request (IRQ) from
devices
⚫Non-maskable interrupt (NMI): hardware error,
recognized by CPU

⚫Exceptions:
⚫Fault - recoverable - pushed EIP points to the
faulting instruction
⚫Trap - recoverable - pushed EIP points to the
instruction following the trapping instruction
⚫Abort - unrecoverable - may not be able to save EIP
where abort occurred

Exception types
⚫Type 0: (fault) Divide error – Division overflow or division by zero
⚫Type 1: (fault or trap) Single step or Trap – After the execution of each
instruction when trap flag set
⚫Type 2: NMI Hardware Interrupt – ‘1’ in the NMI pin
⚫Type 3: (trap) One-byte Interrupt – INT3 instruction (used for breakpoints)
⚫Type 4: (trap) Overflow – INTO instruction with an overflow flag
⚫Type 5: (fault) BOUND – Register contents out-of-bounds
⚫Type 6: (fault) Invalid Opcode – Undefined opcode occurred in program
⚫Type 7: (fault) Coprocessor not available – MSW indicates a coprocessor
⚫Type 8: (abort) Double Fault – Two separate interrupts occur during the same
instruction
⚫Type 9: (abort) Coprocessor Segment Overrun – Coprocessor call operand
exceeds FFFFH
⚫Type 10: (fault) Invalid Task State Segment – TSS invalid (probably not
initialized)

Exception types
⚫Type 11: (fault) Segment not present – Descriptor P bit indicates
segment not present or invalid
⚫Type 12: (fault) Stack Segment Overrun – Stack segment not
present or exceeded
⚫Type 13: (fault) General Protection – Protection violation in 286
(general protection fault)
⚫Type 14: (fault) Page Fault – 80386 and above
⚫Type 15: reserved by Intel
⚫Type 16: (fault) Coprocessor Error – ERROR΄ = ‘0’ (80386 and
above)
⚫Type 17: (fault) Alignment Check – Word/Doubleword data
addressed at odd location (486 and above)
⚫Type 18: (abort) Machine Check – Memory Management
interrupt (Pentium and above)
⚫Type 19: (fault) SIMD floating point exception, error condition
on a floating-point operation.
Hardware Interrupts – Interrupt pins and timing
⚫x86 Interrupt Pins
⚫INTR: Interrupt Request. Activated by a peripheral device to interrupt the processor.
⚫ Level triggered. Activated with a logic 1.
⚫/INTA: Interrupt Acknowledge. Activated by the processor to inform the interrupting
device the interrupt request (INTR) is accepted.
⚫ Level triggered. Activated with a logic 0.
⚫NMI: Non-Maskable Interrupt. Used for major system faults such as parity errors and
power failures.
⚫ Edge triggered. Activated with a positive edge (0 to 1) transition.
⚫ Must remain at logic 1, until it is accepted by the processor.
⚫ Before the 0 to 1 transition, NMI must be at logic 0 for at least 2 clock cycles.
⚫ No need for interrupt acknowledgement.

Quiz
1. What are the purposes of interrupts?
2. Describe the interrupt processing flow.
3. What are the differences between an interrupt and an
exception?
4. Describe the x86 interrupt pins.
5. List the types of interrupts and give examples.
6. List the types of exceptions and give examples.

3. Interrupt Vectors
⚫An interrupt vector is the memory location of an interrupt
handler, which prioritizes interrupts and saves them in a queue if
more than one interrupt is waiting to be handled.
⚫The processor uses the interrupt vector to determine the address
of the ISR of the interrupting device.
⚫Operations:
⚫An interrupt is a signal from a device attached to a computer, or
from a program within the computer, that tells the OS to stop and
decide what to do next.
⚫When an interrupt is generated, the OS saves its execution state by
means of a context switch, a procedure that a computer processor
follows to change from one task to another while ensuring that the
tasks do not conflict.
⚫Once the OS has saved the execution state, it starts to execute the
interrupt handler at the interrupt vector.

The Intel x86 Vector Interrupts: Real Mode (16-bit)
⚫In the 8088/8086 processor as well as in the 80386/
80486/ Pentium processors operating in Real Mode (16-bit
operation), the interrupt vector is an index into the
Interrupt Vector Table.
⚫The Interrupt Vector Table occupies the address range
from 00000H to 003FFH (the first 1024 bytes in the
memory map).
⚫Each entry in the Interrupt Vector Table is 4 bytes
long:
⚫The first two represent the offset address and the last
two the segment address of the ISR.
⚫The first 5 vectors are reserved by Intel to be used by
the processor.
⚫The vectors 5 to 255 are free to be used by the user.
The Intel x86 Vector Interrupts: Protected Mode (32-bit)
⚫In the 80386/80486/Pentium processors operating in the Protected
Mode (32-bit operation), the interrupt vector is an index into the
Interrupt Descriptor Table.
⚫The Interrupt Descriptor Table can be located anywhere in the
memory.
⚫Its starting address is held in the Interrupt Descriptor Table
Register (IDTR).
⚫Each entry in the Interrupt Descriptor Table is 8 bytes long:
⚫Four bytes represent the 32-bit offset address, two the segment
selector and the rest information such as the privilege level.
⚫The first 32 vectors (0 to 31) are reserved by Intel to be used by the
processor.
⚫The vectors 32 to 255 are free to be used by the user.

Circuits for generating Interrupt Vectors
(Real mode)

Interrupt Vector: FFH

Interrupt Vector: any

Interrupt Vector - Example
⚫Draw a circuit diagram to show how a device with interrupt
vector 4CH can be connected on an 8088 microprocessor system.
⚫Answer:
⚫ The peripheral device activates the INTR line
⚫ The processor responds by activating the INTA signal
⚫ The NAND gate enables the 74LS244 octal buffer
⚫ the number 4CH appears on the data bus
⚫ The processor reads the data bus to get the interrupt vector

Example
⚫Draw a circuit diagram to show how a device with
interrupt vector 58H can be connected on an 8088
microprocessor system.

Interrupt Vector Table – Real Mode (16-bit) Example
⚫Using the Interrupt Vector Table shown below, determine the address of the
ISR of a device with interrupt vector 42H.
⚫Answer: Address in table = 4 X 42H = 108H
⚫ (Multiply by 4 since each entry is 4 bytes)
⚫ Offset Low = [108] = 2A, Offset High = [109] = 33
⚫ Segment Low = [10A] = 3C, Segment High = [10B] = 4A
⚫ Address = 4A3C:332A = 4A3C0 + 332A = 4D6EAH
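The calculation in this answer can be written as a few small C functions. This is an illustrative sketch: the function names are hypothetical, the four table bytes are passed in as an array in the order they appear in memory, and address wrap-around above 1 MB is not modelled.

```c
#include <stdint.h>

/* Byte address of a vector's 4-byte entry in the Interrupt Vector Table. */
uint32_t ivt_entry_address(uint8_t vector)
{
    return 4u * vector;
}

/* Real-mode physical address of segment:offset (segment * 16 + offset). */
uint32_t real_mode_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* ISR address from the four entry bytes, stored low byte first:
   offset low, offset high, segment low, segment high. */
uint32_t isr_address(const uint8_t entry[4])
{
    uint16_t offset  = (uint16_t)(entry[0] | (entry[1] << 8));
    uint16_t segment = (uint16_t)(entry[2] | (entry[3] << 8));

    return real_mode_address(segment, offset);
}
```

With vector 42H the entry is at 108H, and the bytes 2A, 33, 3C, 4A give 4A3C:332A, that is 4D6EAH, matching the worked answer.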

Interrupt Vector Table – Real Mode (16-bit) Example
⚫Write a sequence of instructions that initialize vector 40H to point to the
ISR “isr40”.
⚫Answer: Address in table = 4 * 40H = 100H
⚫Set ds to 0 since the Interrupt Vector Table begins at 00000H
⚫Get the offset address of the ISR using the Offset directive and store it in the
addresses 100H and 101H
⚫Get the segment address of the ISR using the Segment directive and store it in the
addresses 102H and 103H

    push ax                 ; save registers on the stack
    push ds
    mov ax, 0               ; set ds to 0 to point to the interrupt vector table
    mov ds, ax
    mov ax, offset isr40    ; get the offset address of the ISR and
    mov [0100h], ax         ; store it at address 0100h (4 × 40h = 100h)
    mov ax, seg isr40       ; get the segment address of the ISR and
    mov [0102h], ax         ; store it at address 0102h
    pop ds                  ; restore registers from the stack
    pop ax
Example
1. Determine the address of ISR of a device with the
interrupt vector F9h
2. Write a sequence of instructions that initialize vector
42H to point to the ISR “isr42”.

Expanding Interrupt to seven request lines
IR0΄  IR1΄  IR2΄  IR3΄  IR4΄  IR5΄  IR6΄  Vector
1     1     1     1     1     1     0     FEH
1     1     1     1     1     0     1     FDH
1     1     1     1     0     1     1     FBH
1     1     1     0     1     1     1     F7H
1     1     0     1     1     1     1     EFH
1     0     1     1     1     1     1     DFH
0     1     1     1     1     1     1     BFH

Interrupt Vectors
⚫The Interrupt Vector contains the address of the interrupt
service routine
⚫The Interrupt Vector Table is located in the first 1024 bytes of
memory at address 000000H-0003FFH.
⚫It contains 256 different 4-byte interrupt vectors, grouped in 18
types
⚫000H: Type 0 (Divide error)
⚫004H: Type 1 (Single-step)
⚫008H: Type 2 (NMI)
⚫00CH: Type 3 (1-byte breakpoint)
⚫010H: Type 4 (Overflow)
⚫014H: Type 5 (BOUND)
⚫018H: Type 6 (Undefined opcode)
⚫01CH: Type 7 (Coprocessor not available)
⚫020H: Type 8 (Double fault)
⚫024H: Type 9 (Coprocessor segment overrun)
⚫028H: Type 10 (Invalid task state segment)
⚫02CH: Type 11 (Segment not present)
⚫030H: Type 12 (Stack segment overrun)
⚫034H: Type 13 (General protection)
⚫038H: Type 14 (Page fault)
⚫03CH: Type 15 (Unassigned)
⚫040H: Type 16 (Coprocessor error)
⚫044H-07CH: Type 17-31 (Reserved)
⚫080H: Type 32-255 (User)

Interrupt Descriptor Table
(Protected mode)
⚫The ‘entry-point’ to the interrupt-handler is located
via the Interrupt Descriptor Table (IDT)
⚫IDT: “gate descriptors”
⚫Segment selector + offset for handler
⚫Descriptor Privilege Level (DPL)
⚫Gates (slightly different ways of entering kernel)
⚫Task gate: includes TSS to transfer to (not used by Linux)
⚫Interrupt gate: disables further interrupts
⚫Trap gate: further interrupts still allowed
How is the IDT found?
⚫There is a specific register which points at the base (0th
entry) of the IDT. The IDT Register is named IDTR ;)
⚫When interrupt/exception occurs, the hardware
automatically
⚫consults the IDTR
⚫finds the appropriate offset in the IDT
⚫pushes the saved state onto the stack
⚫changes EIP to the address of the interrupt handler, as read from
the IDT entry (interrupt descriptor).

IDTR Format

⚫The upper 32 bits of the register specify the linear
address where the IDT is stored. The lower 16 bits
specify the limit of the table (its size in bytes minus 1).
⚫Special instructions used to load a value into the
register or store the value out to memory
⚫LIDT - Load 6 bytes from memory into IDTR
⚫SIDT - Store 6 bytes of IDTR to memory
⚫Structured the same way as the GDT
⚫Also, WinDbg displays upper 32 bits as idtr, and lower
parts as idtl.
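The 6 bytes that LIDT reads and SIDT writes hold the 16-bit field first and the 32-bit base after it, both little-endian. The C sketch below builds that memory image byte by byte; the function name idtr_pack and the example base address are hypothetical, and the 16-bit field is treated as the table limit (size in bytes minus 1).

```c
#include <stdint.h>

/* Fills the 6-byte memory operand used by LIDT/SIDT:
   bytes 0-1 = 16-bit limit, bytes 2-5 = 32-bit linear base address. */
void idtr_pack(uint8_t image[6], uint32_t base, uint16_t limit)
{
    image[0] = (uint8_t)(limit & 0xFF);
    image[1] = (uint8_t)(limit >> 8);
    image[2] = (uint8_t)(base & 0xFF);
    image[3] = (uint8_t)((base >> 8) & 0xFF);
    image[4] = (uint8_t)((base >> 16) & 0xFF);
    image[5] = (uint8_t)(base >> 24);
}
```

A full table of 256 eight-byte descriptors occupies 2048 bytes, so its limit field would be 07FFH.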

IDTR Usage
⚫Relation between the IDTR and IDT

Interrupt Descriptors
⚫The descriptors in the IDT describe one of three
gate types
⚫Trap Gate
⚫Task Gate
⚫Interrupt Gate
⚫“The only difference between an interrupt gate and a trap
gate is the way the processor handles the IF flag in the
EFLAGS register.“ Discussed later, but from this you can
infer that a Trap Exception isn’t related to a Trap Gate. Since
there’s the difference between where EIP points for trap vs
interrupt exceptions.
⚫Gates are used in the IDT to facilitate control flow
transfers between privilege levels

Trap Gate Descriptor
IDT Gate Descriptors
Note that the two halves of the offset form a 32 bit address

Descriptors not in use should have P = 0

Task Gate Descriptor

Descriptors not in use should have P = 0

Interrupt Gate Descriptor

Note that the two halves of the offset form a 32 bit address.

Descriptors not in use should have P = 0

Descriptor Descriptions
⚫The DPL is again the Descriptor Privilege Level. And it is
only checked when a descriptor is accessed by a software
interrupt, in which case it is only allowed if CPL <= DPL
(ignored on hardware interrupts)
⚫Note that the descriptor specifies a segment selector and
32 bit address. Why that looks like a “logical address” aka
“far pointer” to me!
⚫D flag specifies whether you’re jumping into a 16 or 32 bit
segment.
⚫P (Present) flag
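The fields discussed above can be seen together in a C view of one 8-byte interrupt or trap gate. This is a sketch based on the 32-bit protected-mode layout: the struct and function names are hypothetical, and the struct is assumed to be laid out without padding, which holds for these field sizes on common compilers.

```c
#include <stdint.h>

/* Hypothetical C view of a 32-bit interrupt or trap gate descriptor. */
struct gate_descriptor {
    uint16_t offset_low;    /* handler offset, bits 0-15                */
    uint16_t selector;      /* code segment selector                    */
    uint8_t  reserved;      /* not used, kept at 0                      */
    uint8_t  type_attr;     /* bit 7: P, bits 5-6: DPL, bits 0-4: type  */
    uint16_t offset_high;   /* handler offset, bits 16-31               */
};

/* Joins the two halves of the offset into one 32-bit address. */
uint32_t gate_offset(const struct gate_descriptor *g)
{
    return ((uint32_t)g->offset_high << 16) | g->offset_low;
}

/* P flag: 1 if the descriptor is in use. */
int gate_present(const struct gate_descriptor *g)
{
    return (g->type_attr >> 7) & 1;
}

/* Descriptor Privilege Level, 0 to 3. */
unsigned gate_dpl(const struct gate_descriptor *g)
{
    return (g->type_attr >> 5) & 3u;
}

/* A software interrupt may use the gate only if CPL <= DPL. */
int software_int_allowed(const struct gate_descriptor *g, unsigned cpl)
{
    return cpl <= gate_dpl(g);
}
```

A gate with DPL 3 can be reached by INT n from user code at CPL 3, while a gate with DPL 0 cannot; hardware interrupts skip this check.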

IDT Relation to Segments
Interrupt Procedure Call

The Intel x86 Interrupt Software Instructions
⚫All x86 processors provide the following instructions related to
interrupts:
⚫INT nn: Interrupt. Run the ISR pointed by vector nn.
⚫ INT 0 is reserved for the Divide Error
⚫ INT 1 is reserved for Single Step operation
⚫ INT 2 is reserved for the NMI pin
⚫ INT 3 is reserved for setting a Breakpoint
⚫ INT 4 is reserved for Overflow (same as the INTO (Interrupt on Overflow)
instruction)
⚫CLI: Clear Interrupt Flag. IF is set to 0, thus interrupts are
disabled.
⚫STI: Set Interrupt Flag. IF is set to 1, thus interrupts are enabled.
⚫IRET: Return from interrupt. This is the last instruction in the ISR
(Real Mode only). It pops from the stack the Flag register, the IP
and the CS.
⚫ After returning from an ISR the interrupts are enabled, since the initial
value of the flag register is popped from the stack.
⚫IRETD: Return from interrupt. This is the last instruction in the
ISR (Protected Mode only). It pops from the stack the Flag register,
the EIP and the CS.
Interrupt Hardware
[Figure: legacy PC design (for single-processor systems). IRQ lines from devices such as Ethernet, SCSI disk, real-time clock, keyboard controller and programmable interval timer feed a slave and a master PIC (8259); the PIC output drives the INTR pin of the x86 CPU]

● I/O devices have (unique or shared) Interrupt Request
Lines (IRQs)
● IRQs are mapped by special hardware to interrupt
vectors, and passed to the CPU
● This hardware is called a Programmable Interrupt
Controller (PIC)
The 8259A Programmable Interrupt Controller
⚫Adds 8 vectored priority encoded interrupts to the microprocessor
⚫Can be expanded without additional hardware to accept up to 64 IRQ
(one 8259A master and eight slaves)
⚫Requires 4 wait states to be connected to a x386
⚫D0-D7: Bidirectional data connections
⚫IR0-IR7: Interrupt request inputs
⚫WR΄: Write input strobe
⚫RD΄: Read input connects to the IORC΄signal
⚫INT: Output, connects to μP INTR pin
⚫INTA΄: Input, connects to μP INTA΄ pin
⚫A0: Command word select
⚫CS΄: Chip select input
⚫SP/EN΄: Slave program/enable buffer pin
⚫CAS0-CAS2: Outputs from master to slave for cascading multiple 8259A
chips

Connecting a single 8259A controller

Multiple Logical Processors
[Figure: multi-core CPU in which CPU 0 and CPU 1 each have a local APIC, both connected to an I/O APIC]

Advanced Programmable Interrupt Controller is needed to
perform ‘routing’ of I/O requests from peripherals to
CPUs

(The legacy PICs are masked when the APICs are enabled)
4. Exception Handling
Exception handlers have a standard structure consisting
of three steps:
1. Save the contents of most registers in the Kernel Mode
stack (this part is coded in assembly language).
2. Handle the exception by means of a high-level C
function.
3. Exit from the handler by means of the
ret_from_exception( ) function.

Software Interrupts
⚫Traps: (self-interrupt!)
⚫Single step mode
⚫Calls to Operating System (INT 21H - x86, SC – PPC)
⚫Exceptions:
⚫Divide by zero
⚫Memory protection fault

Exception handling functions
⚫Most exception handling functions invoke the do_trap()
function to store the hardware error code and the
exception vector in the process descriptor of current, and
then send a suitable signal to that process.

⚫When the C function that implements the exception
handling terminates, the code performs a jmp instruction
to the ret_from_exception( ) function

⚫The current process takes care of the signal right after the
termination of the exception handler.

5. Interrupt Handling
1. Save state
⚫Disable interrupts for the duration of the ISR or allow
it to be interrupted too?
⚫Save program counter
⚫Save flags
⚫Save register values?
2. Jump to interrupt service routine
⚫Location obtained by interrupt vector
3. Process interrupt
4. Restore state
⚫Load PC, flags, registers etc.
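The four steps above can be walked through in a small C model that uses a table of function pointers as the vector table. Everything in it is an assumption made for illustration: the struct cpu_model, the function handle_interrupt, and the example routine timer_isr with its counter timer_ticks are hypothetical, and on a real processor the state would be saved on the stack rather than in local variables.

```c
#include <stdint.h>

#define VECTOR_COUNT 256

typedef void (*isr_t)(void);

/* Hypothetical processor model: program counter, flags, vector table. */
struct cpu_model {
    uint32_t pc;
    uint32_t flags;
    isr_t    vector_table[VECTOR_COUNT];
};

/* Example service routine: counts how many times it was invoked. */
int timer_ticks = 0;

void timer_isr(void)
{
    timer_ticks++;
}

void handle_interrupt(struct cpu_model *cpu, uint8_t vector)
{
    /* 1. Save state. */
    uint32_t saved_pc    = cpu->pc;
    uint32_t saved_flags = cpu->flags;

    /* 2. Location of the service routine obtained by interrupt vector. */
    isr_t isr = cpu->vector_table[vector];

    /* 3. Process the interrupt (skipped if no routine is installed). */
    if (isr != 0)
        isr();

    /* 4. Restore state. */
    cpu->pc    = saved_pc;
    cpu->flags = saved_flags;
}
```

After installing timer_isr at some vector and calling handle_interrupt for it, the counter has advanced by one while pc and flags hold the values they had before the interrupt.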

I/O Interrupt handling
⚫The hardware circuits and the software functions are
used to handle an interrupt.

Hardware to Software

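[Figure: IRQs enter the PIC (which holds the interrupt mask); the PIC raises INTR and passes vector N to the CPU over the memory bus; the CPU uses idtr to locate the IDT (entries 0 to 255), whose entry N points to the handler]
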
Assigning IRQs to Devices
⚫IRQ assignment is hardware-dependent
⚫Sometimes it’s hardwired, sometimes it’s set physically,
sometimes it’s programmable
⚫PCI bus usually assigns IRQs at boot
⚫Some IRQs are fixed by the architecture
⚫IRQ0: Interval timer
⚫IRQ2: Cascade pin for 8259A
⚫Linux device drivers request IRQs when the device is opened
⚫Note: especially useful for dynamically-loaded drivers, such as for
USB or PCMCIA devices
⚫Two devices that aren’t used at the same time can share an IRQ,
even if the hardware doesn’t support simultaneous sharing
Assigning Vectors to IRQs

⚫Vector: index (0-255) into interrupt descriptor table


⚫Vectors usually IRQ# + 32
⚫Below 32 reserved for non-maskable intr & exceptions
⚫Maskable interrupts can be assigned as needed
⚫Vector 128 used for syscall
⚫Vectors 251-255 used for inter-processor interrupt (IPI)
Interrupt Handling
⚫Processor handles a total of 256 interrupts
⚫0-31 are used by machine or reserved
⚫32-255 are user definable
⚫0 – Divide error, goes to first descriptor in
IDT
⚫1 – Debug
⚫8 – Double Fault
⚫12 – Stack Segment fault
⚫13 – General Protection Fault
⚫14 – Page Fault

I/O interrupt handling
⚫IRQ assignment to I/O devices

82C55 Keyboard Interrupt Circuit

Quiz
1. Describe processing steps for exception handling.
2. Describe processing steps for interrupt handling.
3. What is the main difference between exception
handling and interrupt handling?
4. Which vectors are reserved for non-maskable interrupts &
exceptions?

Chapter 6 50
Ho Chi Minh City University of Technology
Department of Electrical and Electronics

ECE391 Computer System Engineering


Chapter 7:
Memory Management
1. Memory Segmentation
2. Protection
3. Virtual Memory and Paging
4. Memory Management Problem

Chapter 7 1
Memory Management
⚫Multi User Operating Systems
⚫Ease of Programming
⚫Process Mobility in the Address Space
⚫Multiprocess Context switching
⚫Protection across Processes
⚫Intra process protection: Separation of Code, Data and Stack
⚫Inter process protection
⚫Virtual Memory
⚫4GB address space for every process
[Slide annotations: "Ensured by Segmentation" beside the upper items, "Ensured by Paging" beside the lower items]

Chapter 7 2
1. Memory Segmentation
Ease Of Programming

Source:
    if (j>k)
        max = j
    else
        max = k

Code_Segment:
    mov EAX, [0]
    mov EBX, [4]
    cmp EAX,EBX
    jle 0x7 //Label_1
    mov [8], EAX
    jmp 0x5 //Label_2
    Label_1: mov [8], EBX
    Label_2: ….

Data Segment:
    0: // Allocated for j
    4: // Allocated for k
    8: // Allocated for max

⚫Code and Data segments are separate and both assumed to start from 0
⚫Every Memory Data Access should add the value stored in Data Segment Register by default.
⚫Segment Register (Data): 2100
⚫Address of j: 2100
⚫Address of k: 2104
⚫Address of max: 2108

Main Memory:
0000 Operating System (Kernel)
0700 Other User Process
0900 Our Code Segment
1900 Vacant Space
2100 Our Data Segment
2300 Vacant Space
2500
Chapter 7 3
1. Memory Segmentation
Process Mobility
(The source, Code_Segment and Data Segment listings are the same as on the Ease Of Programming slide.)
⚫A new process needs a segment of size 260. The space is available but not contiguous
⚫Our Data Segment is moved from 2100 to 2300, so Segment Register (Data) changes from 2100 to 2300
⚫Address of j: 2100 -> 2300
⚫Address of k: 2104 -> 2304
⚫Address of max: 2108 -> 2308

Main Memory (after the move):
0000 Operating System (Kernel)
0700 Other User Process
0900 Our Code Segment
1900 New User Process
2160 Vacant Space
2300 Our Data Segment
2500
Chapter 7 4
Multiple Segments
⚫The segment register can change its values to point to different
segments at different times.
⚫The x86 architecture provides additional segment registers to access
multiple data segments at the same time.
⚫DS, ES, FS and GS
⚫x86 also supports a separate Stack Segment Register (SS) and a Code
Segment Register (CS).
⚫By default a segment register is fixed for every instruction, for all
the memory accesses performed by it. For example, all data accessed by
a MOV instruction takes DS as the default segment register.
⚫A segment override prefix is attached to an instruction to
change the segment register it uses for memory data access.

Chapter 7 5
Multiple Segments
⚫Multiple segments provide hardware enforced protection of code,
data structures, and programs and tasks.

Chapter 7 6
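Multiple Segments
[Figure: memory map starting at 0000 with segment boundaries at 0500, 1500, 2500 and 3500, labelled DS, CS, SS and ES. In this example the DS segment starts at 0500 and the ES segment at 3500.]
⚫mov [10], eax
- this will move the contents of the eax register to memory location 0510
- Opcode: 0x89 0x05 0x10
⚫mov [ES:10], eax
- this will move the contents of the eax register to memory location 3510
- Opcode: 0x26 0x89 0x05 0x10
- "0x26" is the segment override prefix.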
Chapter 7 7
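Multiprocess Context switching
[Figure: memory holds separate CS, DS and SS segments for Process 1 and for Process 2. While Process 1 is in execution, the CS, DS and SS registers point to the Process 1 segments; while Process 2 is in execution, they point to the Process 2 segments.]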

Chapter 7 8
Operating mode
⚫Intel processors run in five modes of operation
⚫Real Mode: 20-bit segmented memory address space, no
support for memory protection, multitasking, or code privilege
levels
⚫Protected Mode: allows system software to use features such
as virtual memory, paging and safe multi-tasking
⚫Virtual 8086 mode is used to run 8086 compatible programs
concurrently with other protected mode programs
⚫Long mode– Extended Memory model for 64-bits
⚫Compatibility mode – execute 32 bit code in 64-bit mode
without recompilation. No access to 64-bit address space
⚫64-bit mode – 64-bit OS accessing 64-bit address space and
64-bit registers
⚫System Management mode (80386): provide for handling
system-wide functions like power management, system
hardware control, or proprietary OEM designed code

Chapter 7 9
Segmentation in Real mode
⚫The 16-bit segment selector in the segment register is
interpreted as the most significant 16 bits of a linear 20-bit
address, called a segment address, of which the remaining
four least significant bits are all zeros.
⚫The segment address is always added to a 16-bit offset in
the instruction to yield a linear address, which is the same
as physical address in this mode.
⚫For instance, the segmented address 06EFh:1234h has a
segment selector of 06EFh, representing a segment
address of 06EF0h, to which we add the offset, yielding
the linear address 06EF0h + 1234h = 08124h.

Chapter 7 10
Real Mode - Memory Addressing
•(Segment << 4) + offset = 20-bit EA
•Segment size is a fixed 64K
•Example: DS = 0x1004, mov [0x1000], EAX
•The mov will store the content of EAX at 0x10040 + 0x1000 = 0x11040
•Why this stuff? - To get 1 MB addressing using 16-bit Segment Registers
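
The real-mode address calculation can be checked with a few lines of C. This is a minimal sketch; the function names are hypothetical, and 20-bit wraparound at the top of memory is deliberately not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Real mode: linear (= physical) address = segment * 16 + offset.
   No 20-bit wraparound is modelled here. */
uint32_t real_mode_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Re-checks the two worked examples from the slides. */
void check_slide_examples(void)
{
    assert(real_mode_address(0x06EF, 0x1234) == 0x08124u);
    assert(real_mode_address(0x1004, 0x1000) == 0x11040u);
}
```

Because the segment is shifted by only 4 bits, many different segment:offset pairs name the same physical byte, for example 06EFh:1234h and 0812h:0004h.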
Chapter 7 11
Protected Mode Addressing
⚫ mov [DS:1000], EAX
⚫ Let value of DS be 0x10. This is used to select
a segment descriptor in a descriptor table.
⚫The segment descriptor contains information
about the base address of the segment, to
which 1000 is added to get the effective
address.
⚫The value stored in DS is called a selector.
⚫Henceforth we discuss protected mode.
Chapter 7 12
Protected Mode Addressing
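[Figure: the logical address consists of a SELECTOR and an OFFSET. The SELECTOR picks a Segment Descriptor from the Descriptor Table; the Base Address in that descriptor is added to the OFFSET to form the Linear Address.]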
Chapter 7 13
2. Protection
⚫In protected mode, the x86 processor provides a
protection mechanism that operates at both the segment
level and the page level.
⚫The protection checks that are performed fall into
the following categories:
⚫Limit checks.
⚫Type checks.
⚫Privilege level checks.

Chapter 7 14
2.1 Limit Checking
⚫The limit field of a segment descriptor prevents
programs or procedures from addressing memory
locations outside the segment.
⚫The effective value of the limit depends on the
setting of the G (granularity) flag .
⚫For data segments, the limit also depends on the E
(expansion direction) flag and the B (default stack
pointer size and/or upper bound) flag.
⚫The E flag is one of the bits in the type field when
the segment descriptor is for a data segment type.

Chapter 7 15
Intra and Inter process Protection
•A process always executes from its Code segment. It should
not execute code from an adjoining Data or Stack area,
or from any other code area.
•A stack should not overgrow into adjoining segments
•Every segment is specified a start address and limit.
Architecture checks if limit is not exceeded.
[Animated figure: memory map with boundaries at 500, 1000, 1500 and 2000 and segments CS, ES and SS; limit is 500. The examples checked against the limit are:
jmp CS:250 //This is fine
jmp CS:501 //This is a violation!!!
mov [ES:498], AX //This is fine
mov [ES:498], EAX //This is a violation!!!
PUSH and POP of AX and EAX with SP at 2 or at 498, each marked as fine or as a violation.]


Chapter 7 16
Interprocess Protection
⚫Process 1 should be prevented from loading CS with a value
that would let it access the code of Process 2
⚫Similarly for the DS, SS, ES, FS and GS
⚫Privilege levels: [0-3] assigned to each segment.
⚫0: Highest privilege
⚫3: Lowest privilege
[Figure: memory map containing Process 1 CS, Process 1 DS, Process 2 CS, Process 2 SS, Process 2 DS and Process 1 SS, with the CS, DS and SS registers pointing to the segments of the running process.]

Chapter 7 17
2.1. Limit Checking
⚫When the G flag is clear (byte granularity)
⚫The effective limit is the value of the 20-bit limit field
in the segment descriptor.
⚫The limit ranges from 0 to FFFFFH (1 MByte).
⚫When the G flag is set (4-KByte page granularity),
⚫The processor scales the value in the limit field by a
factor of 2^12 (4 KBytes).
⚫In this case, the effective limit ranges from FFFH (4
KBytes) to FFFFFFFFH (4 GBytes).
⚫Note that when scaling is used (G flag is set),
⚫the lower 12 bits of a segment offset (address) are not
checked against the limit;
⚫for example, note that if the segment limit is 0, offsets
0 through FFFH are still valid.
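
The effect of the G flag can be expressed as a small function. This is a sketch with a hypothetical name; it covers only the scaling rule described above (the low 12 bits are filled with ones when G is set) and ignores expand-down segments.

```c
#include <assert.h>
#include <stdint.h>

/* Effective limit from the 20-bit limit field and the G flag. */
uint32_t effective_limit(uint32_t limit_field, int g_flag)
{
    assert(limit_field <= 0xFFFFFu);            /* the field is 20 bits wide */
    if (g_flag)
        return (limit_field << 12) | 0xFFFu;    /* 4-KByte granularity */
    return limit_field;                         /* byte granularity */
}
```

A limit field of 0 with G set gives an effective limit of FFFH, which is why offsets 0 through FFFH remain valid in that case.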

Chapter 7 18
2.1 Limit Checking
⚫The processor causes a general-protection exception
any time an attempt is made to access the following
addresses in a segment:
⚫A byte at an offset greater than the effective limit
⚫A word at an offset greater than the (effective-limit – 1)
⚫A doubleword at an offset greater than the (effective-
limit – 3)
⚫A quadword at an offset greater than the (effective-
limit – 7)
⚫A double quadword at an offset greater than the
(effective limit – 15)
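
All five rules above reduce to one test: the last byte touched by the access must not lie beyond the effective limit. A minimal sketch for an ordinary (expand-up) segment, with a hypothetical function name:

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if an access of `size` bytes at `offset` stays within the
   effective limit, 0 if it would raise a general-protection exception. */
int access_within_limit(uint32_t offset, uint32_t size, uint32_t eff_limit)
{
    assert(size >= 1 && size <= 16);
    /* Last byte touched; computed in 64 bits so that it cannot wrap. */
    uint64_t last_byte = (uint64_t)offset + size - 1;
    return last_byte <= eff_limit;
}
```

For an effective limit of 500, a word at offset 499 is accepted, while a word at offset 500 or a doubleword at offset 498 is rejected.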

Chapter 7 19
2.2 Type Checking
⚫Segment descriptors contain type information in two
places:
⚫The S (descriptor type) flag.
⚫The type field.
⚫The processor uses this information to detect
programming errors that result in an attempt to use a
segment or gate in an incorrect or unintended
manner.

Chapter 7 20
2.2 Type Checking
⚫Code and data segment types

Chapter 7 21
Types of non-system segment
descriptors
⚫System bit S = 1
⚫000 – Data, Read only
⚫001 – Data, Read/Write
⚫010 – expand down, Read only
⚫011 – expand down, Read/Write
⚫100 – Code, Execute only
⚫101 – Code, Execute/Read
⚫110 – Conforming Code, Execute only
⚫111 - Conforming Code, Execute/Read
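
The table above can be decoded with a lookup array and two bit tests. This sketch follows the 3-bit encoding exactly as listed on the slide (top bit: code vs. data, lowest bit: the read or write permission); the function names are hypothetical.

```c
#include <assert.h>

/* Names in the same order as the slide's 3-bit type values 000..111. */
static const char *const type_names[8] = {
    "Data, Read only",
    "Data, Read/Write",
    "expand down, Read only",
    "expand down, Read/Write",
    "Code, Execute only",
    "Code, Execute/Read",
    "Conforming Code, Execute only",
    "Conforming Code, Execute/Read",
};

const char *segment_type_name(unsigned type_bits)
{
    assert(type_bits < 8);
    return type_names[type_bits];
}

/* The top bit of the 3-bit value separates code (1) from data (0). */
int type_is_code(unsigned type_bits)
{
    return (type_bits & 4u) != 0;
}

/* Only data segments can be writable; the lowest bit enables writes. */
int type_is_writable(unsigned type_bits)
{
    return !type_is_code(type_bits) && (type_bits & 1u) != 0;
}
```

A type check such as "is this segment writable?" before a store is then a single call, for example type_is_writable(1) for a Read/Write data segment.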

Chapter 7 22
2.3 Privilege Checking
⚫The processor uses privilege levels to prevent a program or task
operating at a lesser privilege level from accessing a segment
with a greater privilege, except under controlled situations.
⚫When the processor detects a privilege level violation, it
generates a general-protection exception

Chapter 7 23
Protection Implementation
⚫Every segment is associated with a descriptor
stored in a descriptor table.
⚫The privilege level of any segment is stored in
its descriptor.
⚫The descriptor table is maintained in memory
and the starting location of the table is pointed
to by a Descriptor Table Register (DTR).
⚫The segment register stores a selector, which acts as an
index into this table.

Chapter 7 24
Structure of a Descriptor

Chapter 7 25
Descriptor Privilege Level
⚫Privilege levels apply to entire segments
⚫The privilege level is defined in the segment
descriptor
⚫The privilege level of the code segment
determines the Current Privilege Level (CPL)

Chapter 7 26
Privilege levels and Protection
⚫Every segment has an associated privilege
level and hence any code segment will have an
associated privilege level.
⚫The CPL (Current Privilege Level) of a process
is the privilege level of the code segment whose
code it is currently executing.
⚫A process can access segments that have
privilege levels numerically greater than or
equal to (less privileged than) its CPL.
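
The access rule in the last bullet is a single comparison. The sketch below uses a hypothetical function name and implements only the rule as stated on the slide; the processor additionally takes the requested privilege level (RPL) of the selector into account, which is left out here.

```c
#include <assert.h>

/* Simplified rule from the slide: access to a data segment is allowed
   when the segment's DPL is numerically >= the CPL.
   (The selector's RPL, which the hardware also checks, is ignored.) */
int data_access_allowed(unsigned cpl, unsigned dpl)
{
    assert(cpl <= 3 && dpl <= 3);
    return dpl >= cpl;
}
```

Kernel code at CPL 0 can therefore reach every data segment, while application code at CPL 3 can reach only segments whose privilege level is 3.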

Chapter 7 27
Privilege levels
⚫The need is to prevent
⚫Users from interfering with one another
⚫Users from examining secure data
⚫Program bugs from damaging other programs
⚫Program bugs from damaging data
⚫Malicious attempts to compromise system integrity
⚫Accidental damage to data

Chapter 7 28
Privilege Protection
⚫Continuous checking by the processor on
whether the application is privileged enough to
⚫Type 1: Execute certain instructions
⚫Type 2: Reference data other than its own
⚫Type 3: Transfer control to code other than its own
⚫To manage this, every segment has a privilege
level called the DPL (Descriptor Privilege
Level), stored in bits 45 and 46 of its descriptor

Chapter 7 29
3. Virtual memory
⚫What if we wanted more RAM than we
had available? For example, we might have
1 MB of RAM. What if we wanted 10 MB? How
could we manage?
⚫One way to extend the amount of
memory accessible by a program is to use
disk. This idea of extending memory is
called virtual memory. It's called
"virtual" only because it's not RAM.
⚫The real problem with disk is that it's
really, really slow to access. The advantage
of disk is it's easy to get lots of disk space
for a small cost.
⚫Still, because disk is so slow to access, we
want to avoid accessing disk
unnecessarily.

Chapter 7 30
Uses of Virtual Memory
⚫Initially, virtual memory meant the idea of using disk
to extend RAM
⚫Virtual memory was used as a means of memory
protection.
⚫Every program uses a range of addresses called the
address space.
⚫Virtual memory can help prevent programs from
interfering with other programs.
⚫Virtual memory can also help programs to cooperate,
and share memory.

Chapter 7 31
Virtual address space

⚫A virtual address space is the set of ranges of virtual
addresses that an operating system makes available to a
process

Chapter 7 32
Virtual Memory and Paging
⚫It is enough if the next instruction to be
executed and the data it needs are available
in memory.
⚫The complete code and data segments need not be
available.
⚫Paging is used to realize this.
⚫By using segmentation the processor calculates
a 32-bit linear address, which paging then translates.

Chapter 7 33
Paging fundamentals
⚫A page is a sequence of N bytes where N is a
power of 2.
⚫In 32-bit x86 processors (80386 and later; the 8088/8086 have no paging):
⚫Each page is 4096 bytes
⚫Physical RAM is divided into page frames (like photo frames),
each of which is also 4096 bytes.
⚫A page is copied into the page frame, if needed and
removed to accommodate some other page.
⚫By this, a 4 GB code can run on a 128MB physical
memory
⚫In modern processors:
⚫Page sizes are at least 4K in size and maybe as large as
64 K or more.

Chapter 7 34
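Protected Mode Addressing with paging
Linear address: DIR (10 bits) | TABLE (10 bits) | OFFSET (12 bits)
[Figure: CR3 REG points to the PAGE DIRECTORY; DIR selects a DIR ENTRY, which points to a PAGE TABLE; TABLE selects a PG TBL ENTRY, which points to the PAGE FRAME; OFFSET selects the PHYS ADDRS within the frame.]
⚫The page directory and each page table are 4 KB in size, with 4 bytes per entry
⚫If 20 bits are used as a single level paging then the page table alone is 4 MB
(2^20 entries of 4 bytes), which is inefficient. So two level paging.
⚫Develop the page table on demand
⚫TLBs are used to improve performance
⚫Dirty bit accommodated in each page entry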
Chapter 7 35
Paging - Example
⚫Suppose your program generated the following virtual
address F0F0F0F0hex (which is 1111 0000 1111 0000 1111 0000
1111 0000 two). How would you translate this to a physical
address?
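
The first step of the translation is to split the address into the 10-bit DIR, 10-bit TABLE and 12-bit OFFSET fields. The sketch below does only that split; the struct and function names are hypothetical, and the final physical address cannot be computed here because it depends on page directory and page table contents that the slide does not give.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t dir;     /* bits 31..22: index into the page directory */
    uint32_t table;   /* bits 21..12: index into the page table */
    uint32_t offset;  /* bits 11..0 : byte offset within the page */
} paging_fields;

paging_fields split_linear_address(uint32_t linear)
{
    paging_fields f;
    f.dir    = (linear >> 22) & 0x3FFu;
    f.table  = (linear >> 12) & 0x3FFu;
    f.offset = linear & 0xFFFu;
    return f;
}

/* Checks the split of the address used in the slide's example. */
void check_slide_example(void)
{
    paging_fields f = split_linear_address(0xF0F0F0F0u);
    assert(f.dir == 0x3C3u);     /* 11 1100 0011 */
    assert(f.table == 0x30Fu);   /* 11 0000 1111 */
    assert(f.offset == 0x0F0u);  /* 0000 1111 0000 */
}
```

For F0F0F0F0hex, entry 3C3hex of the page directory selects the page table, entry 30Fhex of that table selects the page frame, and 0F0hex is the offset inside the frame.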

Chapter 7 36
Protected Mode Addressing - Paging entries

PDE: Page Directory Entry


PTE: Page Table Entry

Chapter 7 37
Page hit vs. Page fault
⚫If the valid bit (present bit) is 1, then the virtual page
is in RAM, and you can get the physical page from the
PTE. This is called a page hit, and is basically the
same as a cache hit.
⚫If the valid bit is 0, the page is not in RAM, and the
20 bit physical page is meaningless. This means, we
must get the disk page corresponding to the virtual
page from disk and place it into a page in RAM. This
is called a page fault.
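
The hit/fault decision can be sketched as a lookup in a single page table. This is a simplified illustration: lookup_page is a hypothetical name, only bit 0 (present) and the upper 20 bits (page frame) of each 32-bit entry are used, and the page directory level is skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_PRESENT 0x1u   /* bit 0 of a page table entry: valid/present */

/* Looks up `linear` in one 1024-entry page table.
   Returns 1 on a page hit and stores the physical address in *physical.
   Returns 0 on a page fault; *physical is left untouched. */
int lookup_page(const uint32_t *page_table, uint32_t linear, uint32_t *physical)
{
    assert(page_table != NULL && physical != NULL);

    uint32_t pte = page_table[(linear >> 12) & 0x3FFu];
    if ((pte & PTE_PRESENT) == 0)
        return 0;   /* page fault: the page must be brought in from disk */

    /* Page hit: upper 20 bits give the frame, low 12 bits the offset. */
    *physical = (pte & 0xFFFFF000u) | (linear & 0xFFFu);
    return 1;
}
```

On a fault the operating system would load the page from disk, fill in the entry with the present bit set, and retry the access; that part is outside this sketch.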

Chapter 7 38
Mapping segments to pages
⚫The segmentation and paging mechanisms provided by the architecture support a
wide variety of approaches to memory management.
⚫When segmentation and paging are combined, segments can be
mapped to pages in several ways.
⚫To implement a flat (unsegmented) addressing environment, for
example, all the code, data, and stack modules can be mapped to one
or more large segments (up to 4 GBytes) that share the same range of
linear addresses

Chapter 7 39
4. Memory management problems
1. Always provide memory allocation for pointers
⚫Instructions:
⚫malloc (C function):
⚫p = malloc( 200 * sizeof(char) );
⚫buffer = (char*) malloc (i+1);

⚫new (C++ operator):
⚫p = new char;
⚫my_class = new Myclass* [size1];

⚫Example:

int *x;             // no memory allocated for *x
*x = 5;             // => error

int *x = new int;   // memory allocated for *x
*x = 5;             // => OK

Chapter 7 40
Memory management problems
⚫Example
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main() {
    char name[100];
    char *description;
    strcpy(name, "Zara Ali");

    /* allocate memory dynamically */
    description = malloc( 200 * sizeof(char) );
    if( description == NULL ) {
        fprintf(stderr, "Error - unable to allocate required memory\n");
        return 1; /* do not use the NULL pointer below */
    }
    strcpy( description, "Zara ali a DPS student in class 10th");
    printf("Name = %s\n", name );
    printf("Description: %s\n", description );

    /* release the memory before exiting */
    free(description);
    return 0;
}

Chapter 7 41
Memory management problems
2. Free memory after using
⚫Instructions:
⚫delete (C++ operator)
⚫delete p;

⚫free (C function)
⚫free(p);

⚫Example:

void f(int n)
{ int* array = calloc(n, sizeof(int));
//do_some_work(array);
free(array);
}

Chapter 7 42
Memory management problems
⚫Example
#include <iostream> // std::cout
#include <new> // ::operator new

struct MyClass {
int data[100];
MyClass() {std::cout << "constructed [" << this << "]\n";}
};
int main () {
std::cout << "1: ";
MyClass * p1 = new MyClass;
// allocates memory by calling: operator new (sizeof(MyClass))
// and then constructs an object at the newly allocated space

std::cout << "2: ";


MyClass * p2 = new (std::nothrow) MyClass;
delete p1;
delete p2;
return 0;
}
Chapter 7 43
Memory management problems
3. Resizing memory allocation
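⚫Function: realloc()
⚫tmp = realloc( description, 100 * sizeof(char) );
⚫Assign the result to a temporary pointer first: if realloc() fails it
returns NULL and the original block stays allocated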

Chapter 7 44
⚫Example
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main() {
    char name[100];
    char *description;
    char *bigger;
    strcpy(name, "Zara Ali");
    description = malloc( 30 * sizeof(char) ); /* allocate memory dynamically */
    if( description == NULL ) {
        fprintf(stderr, "Error - unable to allocate required memory\n");
        return 1;
    }
    strcpy( description, "Zara ali a DPS student.");
    /* suppose you want to store bigger description */
    bigger = realloc( description, 100 * sizeof(char) );
    if( bigger == NULL ) {
        fprintf(stderr, "Error - unable to allocate required memory\n");
        free(description); /* the original block is still valid and must be freed */
        return 1;
    }
    description = bigger;
    strcat( description, " She is in class 10th");
    printf("Name = %s\n", name );
    printf("Description: %s\n", description );

    /* release memory using free() function */
    free(description);
    return 0;
}
Chapter 7 45
Memory management problems
4. Free allocated memory of an array
⚫delete[] each element of the array, then delete[] the array
pointer
⚫Example

Problem:
Passenger **p = new Passenger*[100];
for(int i=0;i<100;i++)
    p[i] = new Passenger[50];
//After using p
delete[] p;            // leaks the 100 inner arrays

Solution:
Passenger **p = new Passenger*[100];
for(int i=0;i<100;i++)
    p[i] = new Passenger[50];
//After using p
for(int i=0;i<100;i++)
    delete[] p[i];     // each element was allocated with new[]
delete[] p;

Chapter 7 46
