[Figure 3: Nano Processor Core Architecture. Diagram labels: Xilinx FPGA;
Core nP; Control; Custom Instruction Set; SRAM; Address Register (AR);
Program Counter (PC); 11 Bit Address Bus.]

Resource                  IOB   CLB
Address Register           11     -
Instruction Register        5     -
Page Address Register       3     -
Address Multiplexer         -    11
Program Counter             -    12
Accumulator                 -     9
Control Logic               2     8
Total                      21    40

Table 2: Resource Utilization of Nano Processor Core.
The data path size for the nP core is eight bits, the width of the
attached SRAM. The various register sizes are established as a result
of this 8-bit data width. The nano processor consists of five registers:

  Instruction Register (IR),
  Page Address Register (PAR),
  Program Counter (PC),
  Address Register (AR),
  Accumulator (A).

To conserve resources, the IR, PAR, and the AR are all stored in Xilinx
IOB flip-flops (Figure 3). Under the current architecture, the IR
contains five bits and the PAR contains three bits. Five IR bits allow
up to 32 unique instructions, and three PAR bits allow up to eight
different pages (256-byte pages). For the Xilinx implementation, both
registers can be mapped into IOBs to conserve available registers and
logic.

The program counter (PC) and the address register (AR) are both eleven
bits wide, allowing for a 2K address space.

As stated previously, the core nP consumes 40 Xilinx CLBs, with
resources divided among the functional units as described in Table 2.
The goal in this design is to minimize the logic necessary for control
in order to leave valuable reconfigurable logic for custom hardware.

3.3 Instruction Set
As stated previously, the nP core instruction set consists of six
standard instructions. To simplify execution, the nano processor has a
fixed instruction length of two bytes. Each instruction contains only
two parts: an instruction opcode and one operand reference. The operand
reference is split into two parts: the page address (3 bits), which
specifies which of the eight 256-byte pages the reference belongs to,
and the page offset, an eight-bit offset value within the specified
page.

The first byte contains the instruction opcode in the lower five bits
and the page address in the upper three bits. The second byte contains
the page offset (Figure 4).
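The two-byte instruction format described above can be sketched in Python as follows. This is only an illustration of the bit layout stated in the text (opcode in bits 4-0 of the first byte, page address in bits 7-5, page offset in the second byte); the function names `encode` and `decode` are hypothetical and are not part of the nano assembler.

```python
def encode(opcode, address):
    """Pack a 5-bit opcode and an 11-bit operand address into two bytes."""
    if not 0 <= opcode < 32:
        raise ValueError("opcode must fit in 5 bits")
    if not 0 <= address < 2048:
        raise ValueError("address must fit in 11 bits")
    page = address >> 8        # 3-bit page address (PAR)
    offset = address & 0xFF    # 8-bit offset within the 256-byte page
    return bytes([(page << 5) | opcode, offset])


def decode(word):
    """Return (opcode, address) recovered from a two-byte instruction."""
    first, offset = word
    opcode = first & 0x1F      # lower five bits
    page = first >> 5          # upper three bits
    return opcode, (page << 8) | offset


# Example: opcode 0x02 referencing address 0x123 (page 1, offset 0x23).
word = encode(0x02, 0x123)
print(word.hex())              # 2223
print(decode(word))            # (2, 291)
```

Keeping the page address in the first byte means the full 11-bit operand address is available as soon as the second byte has been fetched.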
IEEE Workshop on FPGAs for Custom Computing Machines, Napa, CA, April 10-13, 1994, pg. 23-30. 5
        Byte 1              Byte 2
  +-------+----------+  +------------+
  |  PAR  |  OPCODE  |  |   OFFSET   |
  +-------+----------+  +------------+
   7       4        0    7          0

Figure 4: Nano Processor Instruction.

The nano processor has a three-stage instruction cycle:

  Instruction Fetch (IF),
  Instruction Decode (ID),
  Execution cycle (EX).

Instruction                                  Mnemonic  EX stage
STore Accumulator to memory                  STR       mem[AR] <- A
LoaD accumulator from memory                 LD        A <- mem[AR]
LoaD accumulator from memory + C             LDC       A <- mem[AR]+C
ADd memory to accumulator with Carry         ADC       A <- A+C+mem[AR]
SuBtract memory from accumulator - C         SBB       A <- A-C-mem[AR]
Jump to new location at No Carry             JNC       PC <- AR (if C=0)

Table 3: EX stage for Nano Processor instructions.
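As a rough illustration of the three-stage cycle and the EX-stage operations in Table 3, the following Python sketch models one instruction per call to `step`. The class name `NanoProcessor` and the helper `_set` are hypothetical. The opcode values are those of the instruction definition file in Figure 5. The paper does not state how the carry flag C is updated, so the rule used here (C takes the carry or borrow out of the 8-bit result of LDC, ADC, and SBB, and is left unchanged by LD, STR, and JNC) is an assumption.

```python
# Opcode values taken from the instruction definition file (Figure 5).
STR, LD, LDC, ADC, SBB, JNC = 0x07, 0x02, 0x03, 0x01, 0x00, 0x05


class NanoProcessor:
    def __init__(self, program):
        self.mem = bytearray(2048)            # 2K space (11-bit PC and AR)
        self.mem[:len(program)] = program
        self.pc = self.ar = self.a = self.c = 0

    def step(self):
        # IF: first byte holds the PAR (bits 7-5) and the opcode (bits 4-0).
        first = self.mem[self.pc]
        ir, par = first & 0x1F, first >> 5
        self.pc = (self.pc + 1) & 0x7FF
        # ID: second byte is the page offset; AR is PAR followed by offset.
        self.ar = (par << 8) | self.mem[self.pc]
        self.pc = (self.pc + 1) & 0x7FF
        # EX: operations as listed in Table 3.
        m = self.mem[self.ar]
        if ir == STR:
            self.mem[self.ar] = self.a
        elif ir == LD:
            self.a = m                        # ASSUMPTION: C unchanged
        elif ir == LDC:
            self._set(m + self.c)
        elif ir == ADC:
            self._set(self.a + self.c + m)
        elif ir == SBB:
            self._set(self.a - self.c - m)
        elif ir == JNC:
            if self.c == 0:
                self.pc = self.ar
        else:
            raise NotImplementedError("slot reserved for a custom instruction")

    def _set(self, result):
        # ASSUMPTION: C receives the carry/borrow out of the 8-bit result.
        self.a = result & 0xFF
        self.c = 1 if result < 0 or result > 0xFF else 0


# Usage: LD 0x010; ADC 0x011; STR 0x012; JNC 0x000, with data at 0x010.
program = (bytes([0x02, 0x10, 0x01, 0x11, 0x07, 0x12, 0x05, 0x00])
           + bytes(8) + bytes([0xF0, 0x20, 0x00]))
cpu = NanoProcessor(program)
for _ in range(4):
    cpu.step()
print(hex(cpu.a), cpu.c, hex(cpu.mem[0x12]), cpu.pc)   # 0x10 1 0x10 8
```

The unhandled opcode branch mirrors the idea that the remaining 26 opcode slots are left to custom instruction modules.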
The IF stage performs two primary operations. First, it loads the
instruction register and the page address register with the first byte
of the instruction specified by the PC. Second, it increments the
program counter.

  stage IF:
    IR  <- mem[PC], bits 0-4
    PAR <- mem[PC], bits 5-7
    PC  <- PC + 1

The ID stage fetches the second byte of the instruction word (page
offset) and calculates the address of the referenced operand (specified
by the PAR and the page offset). In addition, it increments the PC to
prepare for the next instruction.

  stage ID:
    AR <- mem[PC] + PAR
    PC <- PC + 1

The EX stage performs the desired function on the operand specified by
the opcode. Although five instruction register bits allow for 32 unique
instructions, the core nP implements only six instructions and leaves
the extra instruction slots available for custom instructions. The
basic operation of the EX stage is as follows:

  stage EX:
    A <- A op mem[AR]

The six basic instructions are described in Table 3. This limited
instruction set contains all the necessary features to implement a
larger and more complicated instruction set, while minimizing the
required control logic.

3.4 Instruction Set Augmentation
As stated earlier, custom functionality for the nP is provided through
custom instructions. The custom instructions, along with the six
instructions provided with the core nP, provide a custom instruction
set for each nP. Although an nP can operate without any custom
instructions, the nP is intended to be extended with custom
instructions on the available reconfigurable hardware.

Custom instructions are developed as separate modules using
conventional schematic entry or synthesis methods. Instruction modules
interface with the nP core by having access to nP core registers and
control signals. Each custom instruction module must decode the IR
register during the ID stage to detect the instruction reference.
During the EX stage, the instruction may make use of the operand
reference on the 8-bit data bus.

With the instruction set defined, the nano assembler is used to
generate the program files. The nano assembler is a flexible assembler
that includes instruction definition support for custom instructions.
Before any program can be written, the instruction definitions must be
built. The instructions are defined using the .INST assembler
directive. Although the instructions can be defined in each program, it
is best to write an include file that has all unique instruction
definitions for an individual nP configuration. This ensures that all
instruction calls for the same configuration are the same. The
following parameters must be defined for each instruction: instruction
name, opcode, and instruction length. An example instruction definition
for the core nP instructions defined above is shown in Figure 5.

  ; SAMPLE INSTRUCTION DEFINITION FILE
  ; test.inc
  ;
  ; .INST = COMPILER DIRECTIVE
  ; (INSTRUCTION DEFINITION)
  ; .INST "<name>", <opcode>, <opcode length>
  .INST "STR", 0x07, 0x0001
  .INST "LD", 0x02, 0x0001
  .INST "LDC", 0x03, 0x0001
  .INST "ADC", 0x01, 0x0001
  .INST "SBB", 0x00, 0x0001
  .INST "JNC", 0x05, 0x0001

Figure 5: Example Instruction Definition.

After the instructions are defined, a conventional assembly language
program can be written for the new processor. Conventional assembler
directives, labels, macros and commands can then be added to obtain a
functional program. Figure 6 is a code segment that shows how the
defined instructions are used to implement a simple counter.

3.5 Performance
In order to optimize performance, the design goal was to minimize the
system cycle time. Because of the synchronous nature of the design, the
cycle speed is limited by the slowest unit in any of the three cycles.
Using the -125 speed grade and Xilinx's APR with no optimizations, the
slowest signal in the control logic is approximately 30 ns, for a
system cycle speed of 33 MHz. The nP will operate at 11 MIPS under this
configuration. Maximum system clock is estimated at 75 MHz using -230
speed grade parts and routing optimizations.

[Figure 7: X2 Layout. Diagram labels: two Xilinx 3090 FPGAs, each with
an SRAM; DRAM; ADC; DAC; MIDI; PC Interface.]
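The figures quoted in Section 3.5 can be checked with a short calculation. The sketch below assumes that each of the three stages (IF, ID, EX) takes exactly one system clock, which is what the quoted 33 MHz and 11 MIPS imply; the function names are hypothetical. The 25 MIPS value for the estimated 75 MHz clock is derived under the same assumption and is not stated in the paper.

```python
STAGES_PER_INSTRUCTION = 3   # IF, ID, EX; one clock per stage is assumed


def clock_mhz(critical_path_ns):
    """Maximum clock rate in MHz for a given slowest-signal delay in ns."""
    return 1000.0 / critical_path_ns


def mips(clock_rate_mhz):
    """Instruction rate in MIPS when every instruction takes three clocks."""
    return clock_rate_mhz / STAGES_PER_INSTRUCTION


base_clock = clock_mhz(30.0)          # about 33.3 MHz
print(round(base_clock, 1))           # 33.3
print(round(mips(base_clock), 1))     # 11.1
print(mips(75.0))                     # 25.0
```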
  ; program test.nsm
  .include test.inc

  loop_back:
      ld   temp
      adc  one
      str  temp

      sbb  count
      jnc  stop

      adc  zero
      jnc  loop_back
  stop:
      jnc  stop

  ; data definitions
  one:   .db 0x01
  zero:  .db 0x00

  count: .db 0xdd
  temp:  .db 0x00

Figure 6: Sample nP Code.

4 Nano Processor Applications
A number of custom Nano Processors have been implemented on
reconfigurable systems with encouraging results. A good example of how
the Nano Processor operates on a reconfigurable system is the National
Technologies Inc. X2 sound card. The X2 is a small reconfigurable logic
system with the external components necessary to implement a 16-bit
stereo sound card on a PC system. Specifically, the card includes two
Xilinx 3090 FPGAs, two 32K x 8 SRAMs, 1 Mb DRAM, a 16-bit stereo Codec,
and a PC interface (Figure 7).

Although the X2 offers two reprogrammable FPGAs for general purpose
reconfigurable systems, it was specifically designed for a versatile PC
sound card system. The on-board FPGAs allow for multiple hardware
realizations of sound related algorithms as well as control over the
data acquisition. Currently, a number of unique configurations run on
the system for a wide variety of audio applications. A subset of these
configurations includes those using the Nano Processor as the core
processing unit (Figure 8).

The audio interface is a Nano Processor configuration that implements
custom instructions and logic to interface 48 kHz stereo audio data to
and from the PC as well as asynchronous MIDI (Musical Instrument
Digital Interface) data. It includes several software modules that
change the functionality of the interface system. The saturating mixer
is a Nano Processor configuration that mixes multiple audio data files.
Running on the X2 sound card, the saturating mixer executes 240 times
faster than a 486-33 PC. This configuration is used with special audio
editing tools to speed up audio editing features. A number of other
audio editing effects and acquisition configurations are under
development that take advantage of nP versatility. Each custom
processor has the same core
instruction set yet employs different custom instructions unique to its
application. The audio interface processor has custom instructions to
efficiently handle audio data transfers as well as external device
control. The saturating mixer includes a custom multiply and accumulate
instruction and other special-purpose signal processing functionality.

[Figure 8: X2 Nano Processor Configurations. Diagram labels: Mixer, ...,
Executable #m; Configuration #n.]

4.1 Audio Interface
The audio interface is a custom nP configuration designed to control a
complex multi-media sound card. The card has three major functions that
must be carefully integrated:

  Transfer of stereo 48 kHz PCM audio data between ADC/DAC and PC,
  Handle all asynchronous data transfer to and from the external MIDI
  port,
  Control external synthesis engine.

To appropriately handle the data transfer and Codec control, five
modules were added to the core nP (Figure 9):

  MIDI Interface,
  Codec Interface,
  PC Interface,
  Synthesis Interface,
  Memory Interface.

[Figure 9: X2 Audio Interface Configuration. Diagram labels: External
SRAM; Codec Input Interface; PC Input Interface; PAR; IR; Accumulator;
C; Control; High Address Register.]

Each module interfaces with an external device attached to the nP, and
contains the custom functionality necessary to independently handle the
interface. Associated with each hardware module is a set of
instructions used to control and read the interface.

The MIDI interface handles the interface to the serial UART used for
MIDI data transfer. The interface must be responsible for receiving and
transmitting asynchronous data at 32 kbits/sec. The interface
implements a custom UART that operates independently of the nP. The nP
includes instructions to poll the incoming data port, send a data byte,
and control the function of the MIDI interface. All overhead associated
with the interface is encapsulated in the MIDI hardware module.

The Codec interface must control the external ADC/DAC and send it the
appropriate data. This interface implements eight input ports dedicated
to the ADC/DAC. Four 8-bit registers buffer the two incoming 16-bit
audio data words, and four 8-bit registers buffer the two outgoing
audio data words. The interface must have the ability to change the
various modes of the ADC/DAC, and adjust data flow appropriately.

The PC interface must handle PC requests for data in a timely fashion,
and receive data from the PC at audio data rates. Similar to the Codec
interface, the PC interface uses four 8-bit input registers and four
8-bit output registers. Custom port read and write instructions
automatically control a six-byte FIFO that is used to buffer data to
and from the PC. Interfacing with these ports requires only simple PC
port-read and port-write functions.

The Synthesis interface controls the operation of the wavetable
synthesis engine. The wavetable load instruction used for this
interface automatically loads a specific wavetable in the DRAM with an
incoming data packet. In addition, special-purpose control registers
are used to modify the synthesis behavior.

The memory interface buffers incoming and outgoing audio data on the
32K x 8 SRAM used for the nP program memory. Because the nP core can
only address 2K, an extra high address register is added to address
higher pages in memory. The nP program is stored in the low 2K, and the
upper 30K is used for audio data buffering. Custom instructions are
available that set this high address register and access data using
it.

The individual interfaces allow custom control for each module in the
system. Unique control of these interfaces is available through unique
custom instructions. The operation of these interfaces is dependent
upon the software system associated with them. This allows for flexible
control over the interface without redesigning the nP.

4.2 Interface Operating System
The audio interface nP offers all the hardware capability necessary to
control the external devices simultaneously. Although the hardware for
the interfaces is available, software modules must be present to
control each interface. Software modules allow custom control of the
interfaces to tailor the hardware to the specific needs of the user.

Currently, there are five software modules that run on the audio
interface. Other software modules may be available in the future to
allow further control over the processor. The five software modules
differ in the control over the PC and Codec interfaces. For varying
audio data formats, each interface must transfer data differently. Each
of the five software modules changes the control of the interfaces to
adapt the card to the appropriate data format. The five data formats
are as follows:

  16-bit stereo (in/out),
  16-bit mono (in/out),
  8-bit stereo (in/out),
  8-bit mono (in/out),
  dual channel 16-bit mono (in/out).

Reconfigurable processors with custom instructions are an effective way
of implementing reconfigurable logic systems. Reconfigurable processors
offer a more flexible development environment than conventional
reconfigurable systems while offering similar high levels of
performance.
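To give a feel for why each of the five data formats listed in Section 4.2 needs its own transfer control, the following Python sketch estimates the one-direction data rate for each format. It assumes every format runs at the 48 kHz sample rate mentioned for the audio interface, and it treats "dual channel 16-bit mono" as two independent 16-bit streams; neither assumption is confirmed by the paper, and the names `FORMATS` and `bytes_per_second` are hypothetical.

```python
SAMPLE_RATE_HZ = 48_000   # ASSUMPTION: all formats use the 48 kHz rate

# format name -> (bits per sample, number of channels)
FORMATS = {
    "16-bit stereo": (16, 2),
    "16-bit mono": (16, 1),
    "8-bit stereo": (8, 2),
    "8-bit mono": (8, 1),
    "dual channel 16-bit mono": (16, 2),   # ASSUMPTION: two mono streams
}


def bytes_per_second(bits, channels, rate=SAMPLE_RATE_HZ):
    """Bytes moved per second in one direction for a given format."""
    return (bits // 8) * channels * rate


for name, (bits, channels) in FORMATS.items():
    print(f"{name}: {bytes_per_second(bits, channels)} bytes/s")
```

Under these assumptions the byte rate varies by a factor of four between the formats (48000 to 192000 bytes/s), which is consistent with each format needing different control of the four 8-bit input and output registers on the PC and Codec interfaces.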