
MicroBlaze Softcore and Digilent S3 FPGA Demonstration Board

Tutorial
Computer Electronics
1st Semester, 2010
1 Introduction
This tutorial introduces the MicroBlaze (MB) softcore processor [2] for Xilinx Field
Programmable Gate Array (FPGA) devices [3] and gives an example of how this core can be
used to implement image processing algorithms. It also describes in detail how to attach
several peripherals to the basic MB architecture, so that the processor can interoperate
with the modules of a demonstration board.
The system is designed with the Xilinx ISE and Embedded Development Kit (EDK) tools [4],
version 10.1.03. The implementation targets the Digilent S3 starter kit board [1], which
embeds a Xilinx Spartan 3 FPGA (part XC3S1000-4).
Before going through this tutorial, students are advised to:
- understand the internal components and organization of FPGA devices;
- read thoroughly about the MB processor architecture and its supported instructions;
- become familiar with the Xilinx EDK environment;
- know the basic concepts of C and assembly language;
- know the basic concepts of the VHDL hardware description language.
After completing this tutorial, students are expected to know how to:
- implement a MB processor with their own configuration;
- design their own peripherals;
- efficiently partition an algorithm into software and hardware components for an FPGA
system.
This tutorial is organized as follows. Section 2 gives the background on the MB softcore and
the hardware modules. Section 3 concludes with an introductory example configuration
of a MB based system.
2 Preliminaries
This section introduces the basic concepts of the FPGA-powered Digilent S3 starter kit
board and the MB soft processor.
2.1 Digilent S3 starter kit board
This board integrates several devices whose functionality can be exploited by the
configuration loaded into the embedded FPGA. Among these devices are a 50 MHz clock
generator; SRAM; VGA, PS/2, and serial ports; and LEDs, switches and buttons. These devices
are statically wired to the FPGA pads, so to control them the user configuration must drive
the proper digital signals on the corresponding FPGA pads. For full details on this board,
students may refer to [1].
2.2 Microblaze softcore
Softcores speed up the development of digital systems. These (optimized) cores are
usually provided by the device suppliers with several configuration options that users can
set to suit their needs. This avoids the long development time and cost of a dedicated
solution by shifting the effort from hardware design to software design. The time from
design to implementation is typically very short, since developers are given user-friendly,
wizard-like tools that export the most suitable configuration for their applications. The
MB softcore is no exception. It is provided by Xilinx, the FPGA supplier, together with a
complete programming environment for configuring it. This environment is the EDK, which
supports not only the MB softcore but also the PowerPC, a processor available in some
Xilinx FPGA platforms.
The MB softcore is a Reduced Instruction Set Computer (RISC) processor implemented
with the FPGA internal resources (arithmetic, logic and memory). The resources it uses
depend on the configuration. Several peripherals can also be attached to this processor,
allowing, e.g., enhanced I/O communication or dedicated computation of critical routines.
MB has a word length of 32 bits, has 32 registers, and can be configured to use cache memory
for data and instructions. Its micro-architecture is depicted in Figure 1. The maximum
operating frequency of the MB depends on the configuration and on the available resources.
More complex configurations, or configurations with a shorter pipeline, may have a longer
critical path and thus a lower frequency. Also, with fewer free resources there are fewer
options to place the MB in the device, so routing is harder and, consequently, the operating
frequency is lower. Nevertheless, the maximum operating frequency is usually between 50 MHz
and 100 MHz. The MB allows two different pipeline depths (3 or 5 stages), depending on the
user's goal (resource saving or high performance). Conflicts in the pipeline are resolved
with stalls, and the pipeline is refilled after taken branches. Figure 2 depicts the
insertion of stall cycles in the pipeline. MicroBlaze uses a Big-Endian, bit-reversed data
organization in which the most significant byte is stored at the lowest memory address
(see Figure 3). For more details about the MB softcore, students should refer to [2].
[Figure: page 10 of the MicroBlaze Processor Reference Guide, UG081 (v9.0), Chapter 1:
MicroBlaze Architecture, reproduced as an image. Its content is transcribed below.]
Overview
The MicroBlaze embedded processor soft core is a reduced instruction set computer (RISC)
optimized for implementation in Xilinx Field Programmable Gate Arrays (FPGAs). Figure 1-1
shows a functional block diagram of the MicroBlaze core.
Features
The MicroBlaze soft core processor is highly configurable, allowing you to select a specific
set of features required by your design.
The fixed feature set of the processor includes:
- Thirty-two 32-bit general purpose registers
- 32-bit instruction word with three operands and two addressing modes
- 32-bit address bus
- Single issue pipeline
In addition to these fixed features, the MicroBlaze processor is parameterized to allow
selective enabling of additional functionality. Older (deprecated) versions of MicroBlaze
support a subset of the optional features described in this manual. Only the latest
(preferred) version of MicroBlaze (v7.10) supports all options.
Xilinx recommends that all new designs use the latest preferred version of the MicroBlaze
processor.
Table 1-1, page 11 provides an overview of the configurable features by MicroBlaze versions.
Figure 1-1: MicroBlaze Core Block Diagram
[Block diagram. Instruction-side bus interface: IPLB, IOPB, ILMB, IXCL_M, IXCL_S.
Data-side bus interface: DPLB, DOPB, DLMB, DXCL_M, DXCL_S, MFSL 0..15 or DWFSL 0..15,
SFSL 0..15 or DRFSL 0..15. Core blocks: Program Counter, Instruction Buffer, Instruction
Decode, Register File (32 x 32b), ALU, Shift, Barrel Shift, Multiplier, Divider, FPU,
Special Purpose Registers, I-Cache, D-Cache, Bus IF, and the Memory Management Unit (MMU)
with ITLB, DTLB and UTLB. A legend marks the optional MicroBlaze features.]
Figure 1: MB internal micro-architecture.
[Figure: page 44 of the MicroBlaze Processor Reference Guide, UG081 (v9.0), Chapter 1:
MicroBlaze Architecture, reproduced as an image. Its content is transcribed below.]
Pipeline Architecture
MicroBlaze instruction execution is pipelined. For most instructions, each stage takes one
clock cycle to complete. Consequently, the number of clock cycles necessary for a specific
instruction to complete is equal to the number of pipeline stages, and one instruction is
completed on every cycle. A few instructions require multiple clock cycles in the execute
stage to complete. This is achieved by stalling the pipeline.
When executing from slower memory, instruction fetches may take multiple cycles. This
additional latency directly affects the efficiency of the pipeline. MicroBlaze implements an
instruction prefetch buffer that reduces the impact of such multi-cycle instruction memory
latency. While the pipeline is stalled by a multi-cycle instruction in the execution stage,
the prefetch buffer continues to load sequential instructions. When the pipeline resumes
execution, the fetch stage can load new instructions directly from the prefetch buffer
instead of waiting for the instruction memory access to complete.
Three Stage Pipeline
When area optimization is enabled, the pipeline is divided into three stages to minimize
hardware cost: Fetch, Decode, and Execute.
               cycle 1  cycle 2  cycle 3  cycle 4  cycle 5  cycle 6  cycle 7
instruction 1  Fetch    Decode   Execute
instruction 2           Fetch    Decode   Execute  Execute  Execute
instruction 3                    Fetch    Decode   Stall    Stall    Execute
Five Stage Pipeline
When area optimization is disabled, the pipeline is divided into five stages to maximize
performance: Fetch (IF), Decode (OF), Execute (EX), Access Memory (MEM), and Writeback
(WB).
               cycle 1  cycle 2  cycle 3  cycle 4  cycle 5  cycle 6  cycle 7  cycle 8  cycle 9
instruction 1  IF       OF       EX       MEM      WB
instruction 2           IF       OF       EX       MEM      MEM      MEM      WB
instruction 3                    IF       OF       EX       Stall    Stall    MEM      WB
Branches
Normally the instructions in the fetch and decode stages (as well as prefetch buffer) are
flushed when executing a taken branch. The fetch pipeline stage is then reloaded with a new
instruction from the calculated branch address. A taken branch in MicroBlaze takes three
clock cycles to execute, two of which are required for refilling the pipeline. To reduce
this latency overhead, MicroBlaze supports branches with delay slots.
Figure 2: Pipeline stages and conflict resolution.
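The two timing diagrams in Figure 2 can be checked with a small calculation. The sketch below is only an illustration: pipeline_cycles is a hypothetical helper, not part of any Xilinx library, and it assumes an in-order pipeline that starts empty, takes no branches, and in which every multi-cycle instruction occupies the same stage (Execute in the three-stage diagram, MEM in the five-stage one).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of any Xilinx API.
 * depth:       number of pipeline stages (3 or 5 for MicroBlaze).
 * busy_cycles: for each instruction, the number of cycles it holds the
 *              bottleneck stage (1 for an ordinary single-cycle instruction).
 * Returns the cycle in which the last instruction completes.
 * Assumptions: in-order pipeline, initially empty, no taken branches, and
 * all multi-cycle instructions stall in the same stage. */
unsigned pipeline_cycles(unsigned depth, const unsigned *busy_cycles, size_t n)
{
    assert(depth >= 1);
    if (n == 0)
        return 0;

    unsigned total = depth - 1;      /* cycles spent in the other stages */
    for (size_t i = 0; i < n; i++) {
        assert(busy_cycles[i] >= 1);
        total += busy_cycles[i];     /* time this instruction holds the bottleneck stage */
    }
    return total;
}
```

For the three instructions of Figure 2, the second one holds its stage for three cycles, so busy_cycles is {1, 3, 1}. Under the stated assumptions the helper gives 7 cycles for the three-stage pipeline and 9 cycles for the five-stage pipeline, which matches the last column of each diagram.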
[Figure: page 12 of the MicroBlaze Processor Reference Guide, UG081 (v9.0), Chapter 1:
MicroBlaze Architecture, reproduced as an image. Its content is transcribed below.]
Data Types and Endianness
MicroBlaze uses Big-Endian bit-reversed format to represent data. The hardware supported
data types for MicroBlaze are word, half word, and byte. The bit and byte organization for
each type is shown in the following tables.
Table 1-2: Word Data Type
Byte address       n       n+1     n+2     n+3
Byte label         0       1       2       3
Byte significance  MSByte                  LSByte
Bit label          0                       31
Bit significance   MSBit                   LSBit
Table 1-3: Half Word Data Type
Byte address       n       n+1
Byte label         0       1
Byte significance  MSByte  LSByte
Bit label          0       15
Bit significance   MSBit   LSBit
Table 1-4: Byte Data Type
Byte address       n
Bit label          0       7
Bit significance   MSBit   LSBit
Figure 3: MB endianness
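The layout of Figure 3 can be illustrated in portable C. The following is a minimal sketch whose function names (store_be32, load_be32, mb_bit) are made up for this example; it only shows how a 32-bit word maps onto byte addresses n to n+3 and how the MicroBlaze bit labels relate to the usual "bit 0 is the least significant bit" convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit word at bytes[0..3] in big-endian order: the most
 * significant byte goes to the lowest address (byte label 0). */
void store_be32(uint8_t *bytes, uint32_t word)
{
    assert(bytes != NULL);
    bytes[0] = (uint8_t)(word >> 24);  /* MSByte at address n     */
    bytes[1] = (uint8_t)(word >> 16);
    bytes[2] = (uint8_t)(word >> 8);
    bytes[3] = (uint8_t)(word);        /* LSByte at address n + 3 */
}

/* Read the word back from the same layout. */
uint32_t load_be32(const uint8_t *bytes)
{
    assert(bytes != NULL);
    return ((uint32_t)bytes[0] << 24) | ((uint32_t)bytes[1] << 16) |
           ((uint32_t)bytes[2] << 8)  |  (uint32_t)bytes[3];
}

/* Bit labels are reversed: MicroBlaze bit 0 is the most significant bit
 * of the word and bit 31 the least significant. Return the value (0 or 1)
 * of the bit that carries the MicroBlaze label `label`. */
unsigned mb_bit(uint32_t word, unsigned label)
{
    assert(label < 32);
    return (unsigned)((word >> (31u - label)) & 1u);
}
```

Storing 0x12345678 with store_be32 places 0x12 at address n and 0x78 at address n+3, and mb_bit(0x80000000, 0) returns 1 because label 0 names the most significant bit.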
3 My first MicroBlaze implementation
This section guides you through the implementation of a complete example that uses
the MB soft processor and some peripherals in the EDK environment. Some files are
provided with this tutorial to make your path easier.
Creating an EDK project
a) Create a folder to contain your project. Use only standard letter and number
characters in its name. Avoid creating this folder in a location with a very long path; in
other words, keep it near the root of your file system. You can name this folder as
you like. In this tutorial we refer to it as project_path.
b) Open a command prompt and run xps. The EDK environment will open.
c) You will be asked how you would like to create your project. Select
Base System Builder wizard and press OK.
d) You will be asked for your project name and path. Browse to the project_path folder and
name the *.xmp file mb.xmp. You can use any other name instead of mb.xmp. However, to make
this tutorial easier to follow, we suggest that you use the names given in the tutorial.
Press OK.
e) Now select I would like to create a new design and press Next.
f) Now you have to select the board that you will use. The EDK environment comes with a
set of predefined boards. Selecting one of them does some of the work for you,
namely the FPGA pad mapping. Unfortunately, your board is not among them,
hence you must choose I would like to create a system for a custom board and
press Next.
g) Now you have to select the FPGA type. Check the label on the FPGA of your board for these
details. You should have an xc3s1000-4ft256 FPGA, in other words a Spartan 3 architecture
with an ft256 package and speed grade -4. This FPGA has no stepping. Select the
MicroBlaze processor option and press Next.
h) Now you have to choose some of the configurable characteristics of the MB architecture.
The first one is the operating frequency. The FPGA input clock can be adjusted to
several operating frequencies. In this project no adjustment is needed, so we will
use the board reference clock (50 MHz) directly as the MB clock. Thus we must
set the processor-bus clock frequency to the same value as the reference clock frequency.
We will set the MB reset signal to be active high (active when it is 1). We need neither
debug hardware, nor cache, nor floating point support. For the local memory we will select
32 KB, which should be enough for our simple introductory example. Press Next. You can
always change these settings after the wizard has finished.
i) For now we will not add any I/O device. Press Next.
j) We will not add any peripheral for now either. Press Next. You will be warned that you are
configuring an architecture that has no outputs, which makes sense since a processor with
no outputs is not good for anything. Do not worry: ignore the warning, because we will
add I/O devices and peripherals just after the wizard has finished.
Figure 4: MB diagram after project creation.
k) You can choose whether you want a sample C application, in this case a memory test.
We do not need any sample, since this tutorial provides one for you. Press Next.
l) You are now informed of the main MB characteristics as well as the address ranges that
will be used to store data and instructions. Press Generate.
m) The wizard will inform you that everything completed successfully (if not, carefully
repeat the previous steps). It lists the configuration files and asks whether you want to
save the settings for use in future projects. You can save the settings file. Press Finish.
The wizard will also warn you that you are not using a board already defined in EDK, hence
you must pay attention to the FPGA pin assignments and configurations. Do not worry, since
this tutorial will tell you how to do that.
n) We are done for now. If you select the Block Diagram1 tab you should see
something like Figure 4. In this diagram you can identify the MB connected to a block of
memory for data and instructions and the corresponding memory controllers. The controllers
connect to the MB through the Local Memory Bus (LMB). You can also identify the Processor
Local Bus (PLB), which has nothing connected for now.
Using a peripheral
a) Now that we have a MB project, we must use it to do something. For now, "something"
means "make the LEDs blink". To do this we will use a General Purpose I/O (GPIO)
peripheral to get signals out of the MB and connect it to the LEDs on your board. Some
peripherals, including the GPIO, are already available in the EDK environment. In the
IP Catalog tab on the left part of your screen you can find the GPIO peripheral in the
General Purpose IO folder. Expand it, right-click XPS General Purpose IO, and
select Add IP. You can rename the peripheral as you like. In this tutorial we
rename it to leds_output_gpio. Note that you can always right-click a component in
the IP Catalog and select View PDF datasheet.
b) The GPIO has been added to our project. Now we need to make the right connections
in both the Bus Interfaces and Ports tabs, and reserve a range in the address space for the
MB to communicate with this peripheral. The GPIO interface is a slave on the PLB. In the
Bus Interfaces tab, expand the recently added GPIO and select the bus to connect
to, in this case the PLB bus of our system, which should be named mb_plb.
c) The peripheral is now connected to the bus. Select the Ports tab to connect
the ports. In this tab we manage the interconnections that are not grouped into bus
structures. Besides the system's components, we also have the external ports, which for now
are just a reset and a clock signal. Expand the external ports. We can rename the ports
as well as the interconnection nets as we like. In this tutorial we rename
sys_rst_pin to io06 and sys_clk_pin to clock. We will call the net that connects to the
io06 pin sn, and the net connected to clock sclk2dcm.
d) The wizard created a system component, the clock_generator, which is used to control the
clock properties and ensure that the clock is stable throughout the whole FPGA area. We
connect the sclk2dcm net to the input of this component (CLKIN), and rename the net
connected to the output (CLKOUT0) to sclk. The net sclk is the one that we will use to
distribute the clock signal to all other components.
e) Since we renamed the clock and reset nets, we have to expand all the other components
and connect them to the new names. You need to update the clock signal to sclk in dlmb,
ilmb, mb_plb, and proc_sys_reset_0. The reset signal needs to be updated to sn in
proc_sys_reset_0.
f) We are done with the clock and reset nets. Now check the ports of the GPIO peripheral.
There are several ports, but we need only one: GPIO_d_out, since it is a registered output.
The other ports are bidirectional or use tri-state connections, features that we are not
interested in. Let's configure the GPIO before connecting it. Right-click the peripheral
and select Configure IP. Since we have 8 LEDs, we need a GPIO width of 8 bits. We need
only one channel for the communication, so we do not need to enable channel 2. Also,
we are not using interrupts. Now we need to configure channel 1. Update the channel
1 options according to our application: the GPIO is output only. The default
output values can be set to anything.
g) Let's connect our GPIO to the external ports, so that we can reach the LEDs.
Create a net connected to GPIO_d_out and name it led_out.
h) Add an external port to the system by clicking Add External Port at the top right
of the environment window. Rename the port to led_out and its connection net to
led_out, the same name you used for the GPIO net. Set the direction to output and the
range to 7:0, matching the 8-bit width of the GPIO. This completes the port connections.
i) Now we need to generate a valid address range for the GPIO so that it can be
accessed by software. Select the Addresses tab and click Generate Addresses. This will
automatically generate a valid address range, and since we have no special requirements
we can use it.
j) One last step remains to have our hardware set: assign the correct FPGA pins to our
external ports and set the operating frequency constraints. For this, select the Project
tab on the left side of the environment window. The file mb.ucf contains
the pin assignment and constraint information. Double-click this file. Its content
was automatically generated during project creation. Since we have changed or renamed
the external ports since then, you can remove everything from this file. A mb.ucf file
is provided with this tutorial. Open the provided file and copy only the
lines related to our system's external ports to the mb.ucf file in your EDK project (control
and debugging signals, and the clock signal constraints).
k) We are done with the hardware specification of our system, so we can generate
all the software libraries that suit our hardware implementation. For this, click the menu
Software->Generate Libraries and BSPs. This will update the files in the
project_path/microblaze_0 folder. Software examples for the peripherals you selected can be
found in the project_path/microblaze_0/libsrc folder. In practice, however, you only need
the include files in project_path/microblaze_0/include, which the EDK environment adds to
your compilation path automatically. We suggest that you take a look at these include files,
since this will help you understand the routines and macros that we will use in our
application.
l) Now we can implement the software. In project_path, create a folder named
running_leds, which is the name of our application. In project_path/running_leds,
create the folder src, which will hold the source of our software application. Copy the
main_leds.c file provided with this tutorial to this folder.
m) In the EDK environment select the Applications tab. Double-click Add software
Application.... Give the software project the same name that was used for the application
folder, running_leds, and click OK. Right-click the sources of the recently created software
project and select Add existing files. Browse to the main_leds.c file in
project_path/running_leds/src. Take a thorough look at this source file. It is commented
and is intended to be self-explanatory. Also take a look at the include files it uses, which
are in the project_path/microblaze_0/include folder.
n) Right-click the software project and select Mark to Initialize BRAMs. This project
will now be used to initialize the MB when the FPGA configuration file is loaded to the FPGA.
Right-click the software project and build the project.
o) Right-click your application and select Set Compiler Options.... In the Debug and
Optimization tab set the compiler optimization to No optimization. This prevents
the compiler from removing instructions that have no visible effect, such as the delay
loops in this sample application. Without the delay loops the LEDs would change too quickly
for you to see the application running. Rebuild the project after changing this option. For
other applications you may use the O2 optimization level for higher performance, which is
not critical in this application.
p) Now we have everything ready to generate the FPGA configuration file. For this, select
Hardware->Generate Bitstream. This takes a while, since the synthesis, mapping and
placement of the project onto the FPGA resources are carried out.
[Figure: block diagram. Inside the FPGA, the MicroBlaze is the master of the PLB, to which
the General Purpose I/O is attached as a slave; the GPIO drives the LEDs through the FPGA
pads (external nets). The MicroBlaze is also connected to the FSL peripheral through two
FSL buses (internal nets): on mb2peripheral the MicroBlaze is the master and the peripheral
the slave, and on peripheral2mb the peripheral is the master and the MicroBlaze the slave.]
Figure 5: MB based FSL example overview.
q) Before configuring the device we need to give the configuration tool the correct
procedure to configure the design successfully. This procedure is defined in the iMPACT
Command file in the Project tab. Remove the content of this file and replace it with the
content of the download.cmd file provided with this tutorial.
r) Select Device Configuration->Update Bitstream to load the software application into the
configuration file, and select Device Configuration->Download Bitstream to program the
device. Voilà, now you should have the LEDs blinking. If not, review all the previous steps.
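The behaviour described in the steps above (a rotating LED pattern separated by delay loops) can be sketched as follows. This is not the main_leds.c file provided with the tutorial, whose content is not reproduced here; it is a host-side illustration under stated assumptions. The pointer data_reg is a hypothetical stand-in for the GPIO data register, whose real address comes from the generated include files, and all function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate an 8-bit LED pattern left by one position (bit 7 wraps to bit 0). */
uint8_t rotate_left8(uint8_t pattern)
{
    return (uint8_t)((pattern << 1) | (pattern >> 7));
}

/* Busy-wait delay. The counter is volatile, so the loop is kept even when
 * the compiler optimizes; the tutorial instead disables optimization. */
void delay(uint32_t iterations)
{
    for (volatile uint32_t i = 0; i < iterations; i++) {
        /* intentionally empty */
    }
}

/* Write `steps` successive patterns to a memory-mapped data register.
 * `data_reg` is a hypothetical stand-in for the GPIO data register.
 * Returns the pattern that would be displayed next. */
uint8_t run_leds(volatile uint32_t *data_reg, uint8_t start,
                 unsigned steps, uint32_t delay_iterations)
{
    assert(data_reg != NULL);
    uint8_t pattern = start;
    for (unsigned s = 0; s < steps; s++) {
        *data_reg = pattern;             /* drive the eight LEDs        */
        delay(delay_iterations);         /* keep the pattern visible    */
        pattern = rotate_left8(pattern); /* move the lit LED one place  */
    }
    return pattern;
}
```

On a PC the function can be exercised by passing the address of an ordinary volatile uint32_t variable as data_reg. Declaring the delay counter volatile is an alternative to the No optimization setting of step o): it keeps the delay loop in place at any optimization level.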
Creating a peripheral
a) We already know how to use a peripheral on the PLB bus. Now let us create our own
peripheral. We will not use the PLB for it. Instead, we will create a peripheral based on
the Fast Simplex Link (FSL) bus, which is simpler. Figure 5 shows an
overview of the implementation we aim for. The FSL bus works as a FIFO queue
and has exactly one master and one slave. Thus, it connects
only two components. The master is the component that sends data and the
slave is the component that receives data. We will use the wizard to create the peripheral.
To run the wizard select Hardware->Create or Import Peripheral. Press Next.
b) Select the option Create templates for new peripheral. Press Next.
c) You will be asked where you want your peripheral to be stored. You can store it
in a common EDK repository, or keep it stored locally, only for
the current project. Select the latter: To an XPS project. Press Next.
d) Enter the name of your peripheral. In this tutorial we name the peripheral fsl_example.
Press Next.
e) Select the FSL bus to attach the peripheral to. Press Next.
f) You will be asked for the size of the input and output FIFO queues. Select one 32-bit word
for both input and output. Select Output FSL interface. Since the peripheral has both input
and output, it will be a master and a slave at the same time.
g) You will be asked for your preferred hardware description language and whether you want
to follow a design flow different from the EDK's. In this tutorial we use VHDL
and follow the EDK's flow. Keep only the last option selected, which generates a
software template for our new peripheral. Press Next.
h) Some information is required for your software template. You can enter any function
description you like. In this tutorial we keep new FSL core. Press Next.
i) The tool will congratulate you on creating your first peripheral. Press Finish.
j) Take a look at project_path/pcores. The generated core is defined in this folder. You can
find the VHDL description of the device in the hdl/vhdl folder. The software template
can be found in project_path/driver. Inside your peripheral folder there is a src folder
that contains a sample application. Both the VHDL and the software files generated by the
wizard are commented and are intended to be self-explanatory. The generated template
is an accumulator of as many consecutive inputs as fit in the FSL bus FIFO queue depth
(with our options, the queue has only one position). With our options, only one input is
accumulated, hence the output is the same as the input.
k) Now we will add the peripheral to our design. At the end of the IP Catalog tab you will
find your recently created peripheral. Right-click it and select Add IP.
l) You may have noticed that there is no FSL bus in the system yet, so we need to add one.
In the Bus and Bridge section of the IP Catalog you will find the FSL bus. Add two FSL buses:
one to carry data from the peripheral to the MB and another one to carry data
from the MB to the peripheral. Rename one of the FSL buses to peripheral2mb and the other
one to mb2peripheral.
m) In the Bus Interfaces tab connect the master port of the peripheral (MFSL) to the
peripheral2mb FSL bus, since on this bus it is the peripheral that sends data. Connect the
slave port to the other FSL bus.
n) Now we need to connect the FSL buses to the MB. In the Bus Interfaces tab expand
microblaze_0. You will not find any port to connect the FSL buses to, which means that
we have to configure the MB to use FSL buses. For this, right-click
microblaze_0 and select Configure IP. Go to the Buses tab and set the number of FSL
links to 1. Click OK. Now the FSL ports are available. Connect the mb2peripheral
FSL bus to the master port (MFSL0) and the other FSL bus to the slave port.
o) In the Ports tab we need to connect the clock and reset ports of both FSL buses to
the system clock and reset nets (sclk and sn, respectively). In
the Addresses tab we do not need to update anything, since the simplified FSL communication
relies on dedicated instructions of the MB instruction set to handle the FSL data
transactions.
p) This completes the procedure to attach the FSL peripheral. We must select
Software->Generate Libraries and BSPs so that the tools update the header files that we can
use in our software application.
q) Now you can update your software application to use your FSL peripheral. Check
project_path/driver for a template and pay attention to the comments. You
can also change the hardware description as you like by modifying the VHDL file in
project_path/pcores.
r) We suggest that you implement the same running_leds application, but performing the
rotational shift operation in your new peripheral instead of in software.
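Before modifying the VHDL, the intended data flow can be prototyped on a PC. The sketch below is a hypothetical host-side model, not the generated driver template and not an FSL API: fsl_model, fsl_write, fsl_read and peripheral_step are names invented for this illustration. It assumes each FSL bus holds a single 32-bit word, as selected in the wizard, and that the peripheral rotates the low 8 bits of each word it receives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one FSL bus configured with a one-word FIFO. */
typedef struct {
    uint32_t word;
    bool     full;
} fsl_model;

/* Master side: returns false (nothing written) if the FIFO is full. */
bool fsl_write(fsl_model *bus, uint32_t word)
{
    assert(bus != NULL);
    if (bus->full)
        return false;
    bus->word = word;
    bus->full = true;
    return true;
}

/* Slave side: returns false (nothing read) if the FIFO is empty. */
bool fsl_read(fsl_model *bus, uint32_t *word)
{
    assert(bus != NULL && word != NULL);
    if (!bus->full)
        return false;
    *word = bus->word;
    bus->full = false;
    return true;
}

/* Model of the modified peripheral: take one word from mb2peripheral,
 * rotate its low 8 bits left by one, and send the result on peripheral2mb.
 * Nothing is consumed if the output FIFO has no room. */
bool peripheral_step(fsl_model *mb2peripheral, fsl_model *peripheral2mb)
{
    assert(mb2peripheral != NULL && peripheral2mb != NULL);
    uint32_t in;
    if (peripheral2mb->full || !fsl_read(mb2peripheral, &in))
        return false;
    uint32_t leds = in & 0xFFu;
    uint32_t out  = ((leds << 1) | (leds >> 7)) & 0xFFu;
    return fsl_write(peripheral2mb, out);
}
```

In this model the MB side writes the current LED pattern to mb2peripheral, calls peripheral_step to stand in for the hardware, and reads the rotated pattern from peripheral2mb. On the real system the write and read are performed with the dedicated FSL instructions, as shown in the generated template in project_path/driver.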
References
[1] Xilinx Inc. Digilent S3 Starterkit board. http://digilentinc.com/Data/Products/S3BOARD/S3BOARD_RM.pdf.
[2] Xilinx Inc. Microblaze Reference Manual, version 10.1. http://www.xilinx.com/support/documentation/sw_manuals/mb_ref_guide.pdf.
[3] Xilinx Inc. Xilinx FPGA Documentation. http://www.xilinx.com/support/documentation/index.htm.
[4] Xilinx Inc. Xilinx ISE and EDK tools. http://www.xilinx.com/support/download/index.htm.