Microcontrollers in FPGAs

Tomas Södergård
University of Vaasa
Contents

Finite state machine
Design of instructions
Architecture
Register file
Hardware aspects of MCUs
Comparison of microcontrollers
PicoBlaze, Nios II and ATmega328P
Conclusions
Finite state machine

Moore Machine
Output only dependent on current state (Pedroni 2004: 159)
Mealy Machine
Output dependent on current state and external input.
Synchronisation (Zwolinski 2000: 82)
Clock
Reset
Programmable state machine
General purpose FSM (Meyer-Baese 2007: 537, Chu 2008: 324-326)
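
The difference between the two machine types can be sketched in VHDL with a rising-edge detector that has one Moore and one Mealy output. This is a minimal sketch, not taken from the cited books: the entity name edge_detect, its ports and the state names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: rising-edge detector with a Moore and a Mealy output.
entity edge_detect is
  port (clk, reset : in  std_logic;
        level      : in  std_logic;   -- external input
        tick_moore : out std_logic;   -- depends on the current state only
        tick_mealy : out std_logic);  -- depends on current state and input
end edge_detect;

architecture sketch of edge_detect is
  type state_type is (s_zero, s_edge, s_one);
  signal state : state_type;
begin
  -- State register: synchronised by the clock, initialised by reset.
  process (clk, reset)
  begin
    if reset = '1' then
      state <= s_zero;
    elsif rising_edge(clk) then
      case state is
        when s_zero =>
          if level = '1' then state <= s_edge; end if;
        when s_edge =>
          if level = '1' then state <= s_one; else state <= s_zero; end if;
        when s_one =>
          if level = '0' then state <= s_zero; end if;
      end case;
    end if;
  end process;

  -- Moore output: a function of the state alone.
  tick_moore <= '1' when state = s_edge else '0';
  -- Mealy output: a function of the state and the external input.
  tick_mealy <= '1' when (state = s_zero and level = '1') else '0';
end sketch;
```

The Mealy output reacts in the same clock cycle as the input change, while the Moore output appears one clock cycle later and lasts exactly one cycle.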
Programmable state machine

[Figure: block diagram of a programmable state machine with control unit, program memory, ALU and data memory]
Instructions

Operations

ALU operations   Data move   Branch
- Add            - Move      - Compare
- Mul            - Push      - Jump
- Not            - Pop       - Loop
Addressing modes

Addressing modes describe how the operands for an operation are located. (Meyer-Baese 2007: 544)

Implied addressing (Meyer-Baese 2007: 544-545)


Location is implicitly defined
No operands in the instruction

Immediate addressing (Meyer-Baese 2007: 546)


One operand in the instruction
The operand is a constant
Addressing modes

Register addressing (Meyer-Baese 2007: 546-547)


Data is fetched from fast CPU registers
Used for ALU operations in most RISC machines

Memory addressing (Meyer-Baese 2007: 547-549)


Direct addressing
Additional register needed due to instruction size
In base addressing the additional register contains a constant that is
added to the constant in the instruction.
In page addressing the additional register contains the most significant
bits of the address. The full address is obtained by concatenation.
Indirect addressing
The additional register contains the full address
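
The memory addressing modes above differ only in how the effective address is formed. A minimal combinational sketch follows. The entity name eff_addr, the mode encoding, the 8-bit constant field and the 16-bit address width are all assumptions made for illustration and do not come from the cited book.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical effective-address unit; widths and mode encoding are assumed.
entity eff_addr is
  port (mode : in  std_logic_vector(1 downto 0);
        imm  : in  unsigned(7 downto 0);    -- constant field of the instruction
        areg : in  unsigned(15 downto 0);   -- additional address register
        addr : out unsigned(15 downto 0));  -- effective address
end eff_addr;

architecture sketch of eff_addr is
begin
  process (mode, imm, areg)
  begin
    case mode is
      -- Direct: the constant alone is the address (only reaches 256 locations).
      when "00"   => addr <= resize(imm, 16);
      -- Base: the register content is added to the constant.
      when "01"   => addr <= areg + resize(imm, 16);
      -- Page: the register supplies the most significant bits (concatenation).
      when "10"   => addr <= areg(7 downto 0) & imm;
      -- Indirect: the register contains the full address.
      when others => addr <= areg;
    end case;
  end process;
end sketch;
```

Base addressing costs an adder, whereas page addressing only needs wiring, which may matter when logic elements are scarce.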
Data flow

An instruction contains at least the first of the following:


Operation code
Operands
Result location

Parameters affecting the instruction size


Number of operations
Number of operands
Memory size
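
As a rough estimate, and assuming a fixed-length encoding in which every operand field addresses the same number of locations, the three parameters combine as follows. The symbols N_op (number of operations), k (number of operand fields) and M (number of addressable locations per operand) are introduced here for illustration only.

```latex
W_{\text{instr}} \approx \lceil \log_2 N_{\text{op}} \rceil + k \, \lceil \log_2 M \rceil
% Hypothetical example: N_op = 32 operations, k = 2 operands, M = 16 registers
% W_instr ~ 5 + 2 * 4 = 13 bits
```

An immediate constant needs a field as wide as the constant itself, so real instruction widths are usually larger than this estimate.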
Zero address CPU: Stack machine

No operands in the instruction


All operations are performed on the two top elements of the
stack

Code example:
Push #5
Push #3
Add
Pop Reg1

(Meyer-Baese 2007: 552-553)
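
The behaviour of the code example can be sketched as a small hardware stack. This is an assumed, simplified model: the entity name stack_alu, the 2-bit operation encoding, the stack depth of 8 and the 8-bit word width are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical zero-address datapath: all operations work on the stack top.
entity stack_alu is
  port (clk, reset : in  std_logic;
        op         : in  std_logic_vector(1 downto 0); -- 00 nop, 01 push, 10 add, 11 pop
        imm        : in  unsigned(7 downto 0);         -- constant for Push #imm
        top        : out unsigned(7 downto 0));        -- current top of stack
end stack_alu;

architecture sketch of stack_alu is
  type stack_type is array (0 to 7) of unsigned(7 downto 0);
  signal stack : stack_type;
  signal sp    : integer range 0 to 8;  -- number of elements on the stack
begin
  process (clk, reset)
  begin
    if reset = '1' then
      sp <= 0;
    elsif rising_edge(clk) then
      case op is
        when "01" =>                    -- Push #imm
          if sp < 8 then
            stack(sp) <= imm;
            sp <= sp + 1;
          end if;
        when "10" =>                    -- Add: two top elements replaced by their sum
          if sp >= 2 then
            stack(sp - 2) <= stack(sp - 2) + stack(sp - 1);
            sp <= sp - 1;
          end if;
        when "11" =>                    -- Pop: discard the top element
          if sp >= 1 then
            sp <= sp - 1;
          end if;
        when others => null;
      end case;
    end if;
  end process;

  top <= stack(sp - 1) when sp > 0 else (others => '0');
end sketch;
```

Applying Push #5, Push #3 and Add on three successive clock edges leaves the sum on top of the stack. The destination of Pop (Reg1 in the example) would capture the top output before the pop takes effect.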


One address CPU: Accumulator machine

One operand in the instruction


The second operand is the value of the accumulator
The destination is the accumulator

Code example
Load #5
Add #3
Store Reg1

(Meyer-Baese 2007: 553-554)
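
A minimal sketch of the corresponding datapath, assuming an 8-bit accumulator and a hypothetical 2-bit operation encoding. The entity name acc_machine and its ports are illustrative and not taken from the cited book.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical one-address datapath: the accumulator is the implicit
-- second operand and the implicit destination.
entity acc_machine is
  port (clk, reset : in  std_logic;
        op         : in  std_logic_vector(1 downto 0); -- 00 nop, 01 load, 10 add, 11 store
        imm        : in  unsigned(7 downto 0);         -- the single operand
        reg1       : out unsigned(7 downto 0));        -- target of Store Reg1
end acc_machine;

architecture sketch of acc_machine is
  signal acc : unsigned(7 downto 0);
begin
  process (clk, reset)
  begin
    if reset = '1' then
      acc  <= (others => '0');
      reg1 <= (others => '0');
    elsif rising_edge(clk) then
      case op is
        when "01"   => acc  <= imm;        -- Load #imm
        when "10"   => acc  <= acc + imm;  -- Add #imm
        when "11"   => reg1 <= acc;        -- Store Reg1
        when others => null;
      end case;
    end if;
  end process;
end sketch;
```

Only one operand field is needed per instruction, which keeps the instruction word short at the cost of extra load and store instructions.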


Two address CPU

The instruction contains two operands


The destination of the result is the location of the first operand

Code examples
Example 1:
Move Reg1, #5
Add Reg1, #3

Example 2:
Move Reg2, #5
Move Reg1, #3
Add Reg1, Reg2

(Meyer-Baese 2007: 555)


Three address CPU

The instruction contains three addresses


Destination and sources can be specified separately

Code examples
Example 1:
Move Reg2, #5
Move Reg3, #3
Add Reg1, Reg2, Reg3

Example 2:
Add Reg1, #5, #3

(Meyer-Baese 2007: 555-556)


Architecture

Von Neumann Architecture
Shared data and program memory = One bus
Harvard Architecture
Separate data and program memory = Two buses
Super Harvard Architecture
Separate X and Y data memories and separate program memory = Three buses
Fast cache registers for immediate results
(Meyer-Baese 2007: 558)
[Figure: CPU connected to one shared data and program memory (Von Neumann), to separate data and program memories (Harvard), and to data X, data Y and program memories (Super Harvard)]
Register file

Two dimensional bit array
Has a mechanism for storing data to the register file
Has a mechanism for reading data from the register file
Consumes many logic elements in an FPGA
The register file in the example discussed on the following pages holds 16 registers of 8 bits (8x16) and consumes 211 LEs (Meyer-Baese 2007: 560)
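VHDL register file example

Entity declaration (Meyer-Baese 2007: 560)

library ieee;
use ieee.std_logic_1164.all;

entity reg_file is
  generic (W : integer := 7;    -- highest bit index of a word (8-bit words)
           N : integer := 15);  -- highest register index (16 registers)
  port (clk, reg_ena : in  std_logic;
        data         : in  std_logic_vector(W downto 0);  -- write data
        rd, rs, rt   : in  integer range 0 to 15;         -- destination and source indices
        s, t         : out std_logic_vector(W downto 0)); -- read ports
end reg_file;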
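VHDL register file example
Architecture: type declarations and write process (Meyer-Baese 2007: 560)

architecture fpga of reg_file is
  subtype bitw is std_logic_vector(W downto 0);
  type SLV_NxW is array (0 to N) of bitw;
  signal r : SLV_NxW;
begin

  -- Write port: register 0 is never written, so it always reads as zero.
  -- Assumption: reg_ena gates the write; the slide declares the port
  -- but does not use it.
  Mux: process
  begin
    wait until clk = '1';
    if reg_ena = '1' and rd > 0 then
      r(rd) <= data;
    end if;
  end process Mux;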
VHDL register file example

Architecture: Demux for outputs (Meyer-Baese 2007: 560)

  -- Read ports: index 0 returns all zeros, any other index returns r(index).
  Demux: process (r, rs, rt)
  begin
    if rs > 0 then
      s <= r(rs);
    else
      s <= (others => '0');
    end if;
    if rt > 0 then
      t <= r(rt);
    else
      t <= (others => '0');
    end if;
  end process Demux;

end fpga;
FSM vs PSM (Chu 2008: 324)

FSM                                  PSM
Special purpose                      General purpose
State register                       Program counter (PC)
Generates certain output based on    Generates outputs based on
simple logic                         encoding and decoding
Next state can be specified freely   Next state is normally an
                                     incrementation of the PC.
                                     Exceptions are branch instructions.
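
The next-state rule of the PSM can be sketched as a program counter with a branch input. This is a minimal sketch under assumed names: the entity prog_counter, the ports branch and target, and the address width AW are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical program counter: the "state register" of a PSM.
entity prog_counter is
  generic (AW : integer := 10);  -- address width, assumed
  port (clk, reset : in  std_logic;
        branch     : in  std_logic;                  -- '1' for a taken branch
        target     : in  unsigned(AW - 1 downto 0);  -- branch destination
        pc         : out unsigned(AW - 1 downto 0));
end prog_counter;

architecture sketch of prog_counter is
  signal pc_reg : unsigned(AW - 1 downto 0);
begin
  process (clk, reset)
  begin
    if reset = '1' then
      pc_reg <= (others => '0');
    elsif rising_edge(clk) then
      if branch = '1' then
        pc_reg <= target;      -- exception: branch instruction
      else
        pc_reg <= pc_reg + 1;  -- normal case: increment the PC
      end if;
    end if;
  end process;

  pc <= pc_reg;
end sketch;
```

In a special purpose FSM the same register would instead be driven by a case statement with a freely chosen next state for every state.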
Structural aspects for FPGAs

Harvard Architecture better for FPGA MCUs


Reason: Memory size more limited (and slower)

Data flow (Meyer-Baese 2007: 556-557)


A more complex instruction implies:
Easier assembly programming
More complicated C compiler development
Longer instruction
Fewer instructions needed
Lower speed
Larger constant in immediate addressing
Comparison of microcontrollers

Parameter       PicoBlaze    Nios II      ATmega328P
Architecture    Harvard      Harvard      Harvard
Register file   16 x 8 bit   32 x 32 bit  32 x 8 bit
Clk/instr.      2            1            1-2
Instr. count    57           256          131
Data mem.       64 B         ?            2 kB
Instr. width    18 bit       32 bit       ?
LE count        ~200         >700         -
Data flow       2 address    3 address    2 address

(Chu 2008: 323, 326-327, 329, 332-337; Altera Nios II/e; Altera Nios II/f; Altera 2011: 3, 11-12; Atmega328P: 1, 8; Moshovos 2007)
Recently developed MCU

Article published in September 2011 by Martin Schoeberl.

Properties:
Name: Leros
16-bit microcontroller
Accumulator machine (one address CPU)
200 LEs
2-stage pipeline: fetch and decode
2 clock cycles/instruction
Portable: successfully tested in Altera and Xilinx devices
Assembler available
Conclusions: Useful technology?

Area optimisation
Algorithms like FFT may consume fewer resources, but consequently become slower. (Meyer-Baese 2007: 537)
Main purpose of FPGA technology is processing speed?

Reuse of code
Controller and datapath partitioning (Zwolinski 2000: 160)
General vs special purpose state machine (Chu 2008: 324)

Complexity
Moves some of the complexity of VHDL (or Verilog) to the compiler
Conclusions: Useful technology?

Speed
No parallelism anymore
Backwards development?

Especially useful when:


Part of a larger circuit
Multi-controller systems that perform simpler tasks
Sources
Atmega 328P. 8-bit Microcontroller with 4/8/16/32K Bytes In-System Programmable
Flash [online] [cited 17.11.2011] Available from Internet: URL
http://www.atmel.com/dyn/resources/prod_documents/doc8271.pdf

AVR assembly. Beginners introduction to AVR assembler [online] [cited 17.11.2011] Available from Internet: URL
http://www.avr-asm-tutorial.net/avr_en/beginner/index.html

Altera Nios II (2011). Processor Architecture. [online] [cited 18.11.2011] URL:
http://www.altera.com/literature/hb/nios2/n2cpu_nii51002.pdf

Altera Nios II/e Core. Economy. [online] [cited 18.11.2011] URL:
http://www.altera.com/devices/processor/nios2/cores/economy/ni2-economy-core.html
Altera Nios II/f Core. Fast for Performance Critical Applications [online] [cited 18.11.2011]. URL:
http://www.altera.com/devices/processor/nios2/cores/fast/ni2-fast-core.html

Chu, Pong P. (2008). FPGA Prototyping by VHDL Examples. Ohio: Wiley.

Meyer-Baese, U. (2007). Digital Signal Processing with Field Programmable Gate Arrays. 3. Edition. Heidelberg: Springer.

Moshovos, Andreas (2007). Using Assembly Language to Write Programs. [online] [cited 18.11.2011]. Available from Internet. URL:
http://www.eecg.toronto.edu/~moshovos/ECE243-2009/lec5%20-%20Intro%20to%20Assembly.htm
Schoeberl, Martin (2011). Leros: A Tiny Microcontroller for FPGAs. Field Programmable Logic and Applications (FPL), 2011 International Conference. 10-14.

Zwolinski, Mark (2000). Digital System Design with VHDL. 2. Edition. Essex: Pearson
Education Limited.
