
Directives for Performance optimization

Affecting latency (but also throughput):
– Default behavior:
  • Function-level parallelism and
  • statement (operator) parallelism,
  • but not loop parallelism: loops are kept rolled by default.
  • Controlled with the latency directive.
– On-demand directives:
  • Loop unrolling
  • Loop flattening
  • Loop merging

Affecting throughput (mainly):
– Pipelining directive
  • Applied to operations in functions or loops.
  • Implies unrolling the internal loops (affects latency).
– Dataflow directive
  • Only in the top-level function
  • Similar to pipelining between functions or loops (coarse-grain pipelining).
  • Infers buffers: ping-pong memories (RAM) or FIFOs
– Array partitioning and reshaping directives
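As an illustration of the pipelining, unrolling and array partitioning directives, here is a minimal sketch of one loop written two ways. The function names and LEN are hypothetical; the pragma spellings follow the Vivado HLS `#pragma HLS ...` form, and a regular C compiler ignores them, so the code also runs as plain C.

```c
#define LEN 8

/* Pipelined loop: a new iteration can start every clock cycle (II=1).
 * This mainly improves throughput; the loop body is not replicated. */
int dot_pipelined(const int a[LEN], const int b[LEN]) {
    int acc = 0;
dot_loop:
    for (int i = 0; i < LEN; i++) {
#pragma HLS PIPELINE II=1
        acc += a[i] * b[i];
    }
    return acc;
}

/* Fully unrolled loop: all LEN products may be computed in parallel,
 * which reduces latency at the cost of replicated multipliers.
 * The arrays are partitioned so every element can be read in the same cycle. */
int dot_unrolled(const int a[LEN], const int b[LEN]) {
#pragma HLS ARRAY_PARTITION variable=a complete dim=1
#pragma HLS ARRAY_PARTITION variable=b complete dim=1
    int acc = 0;
sum_loop:
    for (int i = 0; i < LEN; i++) {
#pragma HLS UNROLL
        acc += a[i] * b[i];
    }
    return acc;
}
```

Both versions compute the same result in C simulation; only the hardware generated by Vivado HLS differs.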
Directives for Area optimization

Use bit-accurate types
– Not a directive, but highly important.
– In C++: use ap_int and ap_fixed.
– Be careful with integer promotion.
Binding
– Configure the effort and/or operator minimization.
Allocation directive
– Limits the number of resource (operator) instances.
Resource directive
– Specifies a specific core for an operator (e.g., a pipelined multiplier).
Inline directive
– Optimizes area by eliminating the hierarchy between functions.
– Similar to loop flattening.
Already seen for performance:
– Loop unrolling
– Loop merging
– Loop flattening
– Array mapping
  • Horizontal or vertical array merging
  • But also array partitioning and reshaping
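The integer promotion warning can be seen in plain C. The sketch below uses `uint8_t` as a stand-in for an 8-bit ap_int (the ap_int headers are not assumed to be available); the function names are hypothetical. In C, operands narrower than `int` are promoted to `int` before arithmetic, so results can be wider, or differ, from what an 8-bit datapath would produce.

```c
#include <stdint.h>

/* a + b is evaluated as int, so the carry out of bit 7 is kept. */
int promoted_sum(uint8_t a, uint8_t b) {
    return a + b;
}

/* Assigning back to an 8-bit type truncates modulo 256,
 * which is what an 8-bit adder would produce. */
uint8_t wrapped_sum(uint8_t a, uint8_t b) {
    uint8_t s = (uint8_t)(a + b);
    return s;
}

/* Pitfall: ~x is computed on the promoted int, so its upper bits are set
 * and the comparison with an 8-bit constant fails. */
int invert_matches(uint8_t x, uint8_t expected) {
    return (~x) == expected;
}

/* Casting the complement back to 8 bits gives the intended comparison. */
int invert_matches_cast(uint8_t x, uint8_t expected) {
    return (uint8_t)(~x) == expected;
}
```

For area, the practical point is to make the intended width explicit (a cast or a bit-accurate type) rather than relying on the promoted `int` width.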
Vivado HLS IO Options

Vivado HLS has four types of IO


1. Data ports created by the original top-level C function arguments
2. IO protocol signals added at the Block-Level
3. IO protocol signals added at the Port-Level
4. IO protocol signals added externally as Pcore Interfaces
Data Ports
– These are the function arguments/parameters
Block-Level Interfaces (optional, but HIGHLY RECOMMENDED; added by default)
– An interface protocol which is added at the block level
– Controls the addition of block level control ports: start, idle, done, and ready
Port-Level interfaces (optional)
– IO interface protocols added to the individual function arguments
Pcore interfaces (optional)
– Added as external adapters when exported as an IP
IO Level Protocols 22- © Copyright 2015 Xilinx
I/O Interfaces

Adder example (the data ports come from the C function arguments):

#include "adders.h"

int adders(int in1, int in2, int *sum) {
  int temp;
  *sum = in1 + in2 + *sum;
  temp = in1 + in2;
  return temp;
}

[Figure: synthesis of the adders example into a sequential block. Ports shown:
 – Basic ports (clock, reset, clock enable): ap_clk, ap_rst, ap_ce
 – Block protocol ap_ctrl_hs: ap_start, ap_done, ap_idle, plus ap_return for the return value
 – in1 with port protocol ap_hs: in1, in1_ap_vld, in1_ap_ack
 – in2 with port protocol ap_fifo: in2, in2_read, in2_empty_n
 – sum with port protocol ap_bus: sum_address, sum_size, sum_datain, sum_dataout, sum_req_din, sum_req_full_n, sum_req_write, sum_rsp_empty_n, sum_rsp_read
 – Pcore: the ports can be wrapped by adapters (custom adapter, AXI4LiteS, AXI4, AXI4Stream)]
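The protocols in the figure are selected with INTERFACE directives. A minimal sketch, assuming the pragma form of the directive: the block-level protocol goes on `port=return`, and two of the port-level protocols from the figure are applied to `in1` and `sum`. `in2` is left with its default protocol here, and the `adders.h` header is omitted so the block is self-contained. A normal C compiler ignores the pragmas.

```c
/* Adder example annotated with interface directives (pragma form). */
int adders(int in1, int in2, int *sum) {
#pragma HLS INTERFACE ap_ctrl_hs port=return   /* block level: ap_start/ap_done/ap_idle */
#pragma HLS INTERFACE ap_hs port=in1           /* port level: in1_ap_vld / in1_ap_ack */
#pragma HLS INTERFACE ap_bus port=sum          /* port level: bus-style request/response */
    int temp;
    *sum = in1 + in2 + *sum;   /* sum is both read and written: an inout pointer */
    temp = in1 + in2;
    return temp;               /* appears on ap_return */
}
```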
Interface Types

Multiple interface protocols
– Not every combination of C argument type and port protocol is supported
– Implementing a specific IO protocol may require modifying the code

[Figure: table of the IO protocols available for each C argument type. Key: I = input, IO = inout, O = output, D = default interface. Protocol groups: no IO protocol; wire handshake protocols; memory protocols (RAM, FIFO); bus protocols.]

Block Level Protocol

Block-level protocols can be applied to the return port, but the port can also be omitted and just the function name specified.
Port-Level Interfaces

The AXI4 interfaces supported by Vivado HLS include:
– AXI4-Stream (axis)
  • Specify on input arguments or output arguments only, not on input/output arguments
– AXI4 master (m_axi)
  • Specify on arrays and pointers (and references in C++) only. You can group multiple arguments into the same AXI4 master interface using the bundle option
– AXI4-Lite (s_axilite)
  • Specify on any type of argument except arrays. You can group multiple arguments into the same AXI4-Lite interface using the bundle option
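The bundle option can be sketched with a multiply-accumulate block like the HLS_MACC peripheral used later in this section. The function body and signature below are assumptions made for illustration; only the bundle name mirrors the driver's `Hls_macc_periph_bus`. All scalar arguments and the block-level control (`port=return`) are grouped into one AXI4-Lite slave, so the processor drives the block entirely through registers.

```c
/* Hypothetical multiply-accumulate block with one AXI4-Lite interface. */
void hls_macc(int a, int b, int *accum, int accum_clr) {
#pragma HLS INTERFACE s_axilite port=return bundle=HLS_MACC_PERIPH_BUS
#pragma HLS INTERFACE s_axilite port=a bundle=HLS_MACC_PERIPH_BUS
#pragma HLS INTERFACE s_axilite port=b bundle=HLS_MACC_PERIPH_BUS
#pragma HLS INTERFACE s_axilite port=accum bundle=HLS_MACC_PERIPH_BUS
#pragma HLS INTERFACE s_axilite port=accum_clr bundle=HLS_MACC_PERIPH_BUS
    static int accum_reg = 0;   /* static: keeps its value between calls */
    if (accum_clr)
        accum_reg = 0;
    accum_reg += a * b;
    *accum = accum_reg;
}
```

With `port=return` in the same bundle, the ap_start/ap_done/ap_idle/ap_ready bits become a control register in that AXI4-Lite slave, which is what the generated C driver reads.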

Creating Processor System © Copyright 2015 Xilinx
Interface Modes

[Table: interface modes supported for each C argument type (legend: supported, D = default interface, not supported). Only the default entries are recoverable:]

Argument type                        Direction    Default interface mode
Scalar (pass-by-value)               Input        ap_none
Scalar (pass-by-value)               Return       ap_ctrl_hs
Pointer or reference (by reference)  I            ap_none
Pointer or reference (by reference)  IO           ap_ovld
Pointer or reference (by reference)  O            ap_vld
Array (pass-by-reference)            I, IO, O     ap_memory

Interface modes covered by the table: ap_ctrl_none, ap_ctrl_hs, ap_ctrl_chain, axis, s_axilite, m_axi, ap_none, ap_stable, ap_ack, ap_vld, ap_ovld, ap_hs, ap_memory, bram, ap_fifo, ap_bus.

Native AXI Interfaces
– AXI4 Slave Lite and AXI4 Master are supported by the INTERFACE directive
– Provided in the RTL after synthesis
– Supported by C/RTL co-simulation
– Supported for Verilog and VHDL
BRAM Memory Interface
– Identical IO protocol to ap_memory
– Bundled differently in IP Integrator
  • Provides easier integration with memories that have a BRAM interface
Coding Considerations

The vast majority of C, C++ and SystemC is supported
– Provided it is statically defined at compile time
– If it is not defined until run time, it won't be synthesizable
Any of the three variants of C can be used
– If C is used, Vivado HLS expects the file extension to be .c
– For C++ and SystemC, it expects the file extension .cpp
Static and const have a great impact on array implementation
– Const implies a ROM
– Statics are initialized at “start up” only

fir_coeff.h:
-1, -20, -19, 84, 271, 370, 271, 88, -19, -20, -1,

fir.c:
data_t fir (
  data_t x
  ) {
  static data_t shift_reg[N];
  acc_t acc;
  int i;

  const coef_t c[N+1]={
#include "fir_coeff.h"
  };

  acc=0;
  mac_loop: for (i=N-1;i>=0;i--) {
    if (i==0) {
      acc+=x*c[0];
      shift_reg[0]=x;
    } else {
      shift_reg[i]=shift_reg[i-1];
      acc+=shift_reg[i]*c[i];
    }
  }
  return acc;
}
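The effect of `static` and `const` can be checked in C simulation by feeding the filter an impulse (a 1 followed by zeros): because `shift_reg` is static, the 1 travels along the delay line across calls and the outputs reproduce the coefficients one by one. This sketch assumes `data_t`, `coef_t` and `acc_t` are plain `int` and that N is 10 (fir.h is not shown; 10 is what makes the 11 listed coefficients fill `c[N+1]`). As in the slide, the loop reads only `c[0]` to `c[N-1]`.

```c
/* Assumed definitions; the real fir.h is not shown on the slide. */
typedef int data_t;
typedef int coef_t;
typedef int acc_t;
#define N 10

data_t fir(data_t x) {
    /* static: the delay line keeps its contents between calls (registers/RAM) */
    static data_t shift_reg[N];
    /* const: the coefficients never change, so HLS can implement them as a ROM */
    const coef_t c[N + 1] = {-1, -20, -19, 84, 271, 370, 271, 88, -19, -20, -1};
    acc_t acc = 0;
    int i;

mac_loop:
    for (i = N - 1; i >= 0; i--) {
        if (i == 0) {
            acc += x * c[0];
            shift_reg[0] = x;
        } else {
            shift_reg[i] = shift_reg[i - 1];
            acc += shift_reg[i] * c[i];
        }
    }
    return acc;
}
```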
Coding Considerations

Unsupported: System calls
– Most C system calls have no hardware counterpart and are not synthesizable
  • printf(), getc(), time(), …
Unsupported: Dynamic memory allocation
– Not allowed, since it would require constructing (or destroying) hardware at run time
– malloc(), alloc() and free() are not synthesizable
Unsupported: General pointer casting
– Casting a pointer to a different type is not allowed in the general case
– Pointer casting is allowed between native C integer types
Unsupported: Recursive functions
– Code re-entrance indirectly uses dynamic memory allocation
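A common way to keep debug output such as printf() in code that must also be synthesized is to guard it with the `__SYNTHESIS__` macro, which Vivado HLS defines during synthesis (the same macro used in the malloc workaround of this section). A minimal sketch; `saturate_add` is a hypothetical function.

```c
#include <stdio.h>

/* Adds two values and clips the result at max.
 * The printf() exists only in C simulation: when Vivado HLS synthesizes
 * the function, __SYNTHESIS__ is defined and the call is compiled out. */
int saturate_add(int a, int b, int max) {
    int s = a + b;
    if (s > max) {
#ifndef __SYNTHESIS__
        printf("saturate_add: %d + %d clipped to %d\n", a, b, max);
#endif
        s = max;
    }
    return s;
}
```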

Coding Considerations

Structs as pointers
– Structs are implemented differently when passed as pointers or by value
Pointer arithmetic
– Only supported on the interface when using ap_bus
– With an ap_fifo interface, all reads and writes must be sequential

#include "bus.h"
void foo (int *d) {
  static int acc = 0;
  int i;

  for (i=0;i<4;i++) {
    acc += *(d+i+1);   // pointer arithmetic
    *(d+i) = acc;      // pointer arithmetic
  }
}

Converting malloc pointers
– Dynamic memory allocation is not supported
– Use this style as a workaround:

// Internal image buffers
// (__SYNTHESIS__ is defined by Vivado HLS during synthesis)
#ifndef __SYNTHESIS__
my_type *yuv = (my_type *)malloc(N*sizeof(my_type));
my_type *scale = (my_type *)malloc(N*sizeof(my_type));
#else // Workaround malloc() calls w/o changing rest of code
my_type _yuv[N];
my_type _scale[N];
my_type *yuv = _yuv;      // the array decays to a pointer to its first element
my_type *scale = _scale;
#endif
res = *yuv;
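The workaround can be completed into a runnable function. In this sketch the kernel `scale_sum`, the element type `my_type` (assumed to be `int`) and N = 16 are hypothetical. In C simulation the buffers come from malloc() and are released with free(); under synthesis the fixed-size arrays are used, while the rest of the code only ever sees the pointers `yuv` and `scale`.

```c
#include <stdlib.h>

#define N 16
typedef int my_type;   /* assumed element type */

/* Copies the input, scales it by k and returns the sum of the scaled values. */
my_type scale_sum(const my_type in[N], my_type k) {
#ifndef __SYNTHESIS__
    /* C simulation: buffers on the heap */
    my_type *yuv = (my_type *)malloc(N * sizeof(my_type));
    my_type *scale = (my_type *)malloc(N * sizeof(my_type));
    if (yuv == NULL || scale == NULL) {
        free(yuv);
        free(scale);
        return 0;
    }
#else
    /* Synthesis: fixed-size arrays replace malloc(), same pointer names */
    my_type _yuv[N];
    my_type _scale[N];
    my_type *yuv = _yuv;
    my_type *scale = _scale;
#endif
    my_type res = 0;
    for (int i = 0; i < N; i++) {
        yuv[i] = in[i];
        scale[i] = yuv[i] * k;
        res += scale[i];
    }
#ifndef __SYNTHESIS__
    free(yuv);
    free(scale);
#endif
    return res;
}
```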
Coding Considerations

Multi-access pointers
– Are read from or written to multiple times during a single function call
– This can result in incorrect hardware
– This cannot be verified with a test bench: only the final pointer value is passed to the test bench (intermediate values can only be seen by using printf)
– DO NOT USE THEM unless there is no other option
– If you do, you must declare them volatile:
  • C compilers will not optimize away the IO accesses
  • The compiler assumes the pointed-to value may change outside the scope of the function and leaves the accesses in place
  • In this example, Vivado HLS will create a design with 4 reads and 2 writes
  • Without volatile:
    – C compilers will optimize the accesses to 1 read and 1 write
    – Vivado HLS will create a design with 1 read and 1 write

void fifo (volatile int *d_o,
           volatile int *d_i) {
  static int acc = 0;
  int cnt;

  for (cnt=0;cnt<4;cnt++) {
    acc += *d_i;        // *d_i is read multiple times
    if (cnt%2==1) {
      *d_o = acc;       // *d_o is written to multiple times
    }
  }
}
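Because a plain C test bench only sees the final value behind `d_o`, one way to reason about the volatile version is to model each pointer access as one transfer on a stream. The sketch below is a hypothetical software model, not Vivado HLS code: every `*d_i` read pops the next input word, every `*d_o` write pushes an output word, and `acc` is passed explicitly instead of being static so the model has no hidden state. It shows the 4 reads and 2 writes the hardware performs.

```c
/* Hypothetical stream types used only for this model. */
typedef struct {
    const int *data;
    int pos;          /* number of words read so far */
} in_stream;

typedef struct {
    int *data;
    int pos;          /* number of words written so far */
} out_stream;

static int stream_read(in_stream *s) { return s->data[s->pos++]; }
static void stream_write(out_stream *s, int v) { s->data[s->pos++] = v; }

/* Mirrors fifo(): one read per iteration, one write after every second read. */
void fifo_model(out_stream *d_o, in_stream *d_i, int *acc) {
    for (int cnt = 0; cnt < 4; cnt++) {
        *acc += stream_read(d_i);
        if (cnt % 2 == 1)
            stream_write(d_o, *acc);
    }
}
```

For the input stream 1, 2, 3, 4 the model emits the running sums 3 and 10; a non-volatile version collapsed to 1 read and 1 write would not produce this sequence.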
Generate the hardware accelerator

Select Solution > Export RTL


Select IP Catalog, System Generator for Vivado, or PCore for EDK
Click Configuration… if you want to change the version number or other information
– The default is v1_00_a
Click OK
– The ip directory is generated in the impl folder of the current solution, under the current project directory
– RTL code is generated for both Verilog and VHDL, each in its own folder

Generated impl Directory

[Figure: contents of the generated impl directory, annotated: generated Verilog RTL files; generated VHDL RTL files; point the IP Catalog to the ip directory; the file used by IP Integrator; the directory used by XSDK; the header file for the slave interfaces]

C Driver API for AXI4-Lite Interface

u32 XHls_macc_IsDone(XHls_macc *InstancePtr) {
  u32 Data;

  Xil_AssertNonvoid(InstancePtr != NULL);
  Xil_AssertNonvoid(InstancePtr->IsReady == XIL_COMPONENT_IS_READY);

  Data = XHls_macc_ReadReg(InstancePtr->Hls_macc_periph_bus_BaseAddress, XHLS_MACC_HLS_MACC_PERIPH_BUS_ADDR_AP_CTRL);
  return (Data >> 1) & 0x1;
}

u32 XHls_macc_IsIdle(XHls_macc *InstancePtr) {
  u32 Data;

  Xil_AssertNonvoid(InstancePtr != NULL);
  Xil_AssertNonvoid(InstancePtr->IsReady == XIL_COMPONENT_IS_READY);

  Data = XHls_macc_ReadReg(InstancePtr->Hls_macc_periph_bus_BaseAddress, XHLS_MACC_HLS_MACC_PERIPH_BUS_ADDR_AP_CTRL);
  return (Data >> 2) & 0x1;
}

u32 XHls_macc_IsReady(XHls_macc *InstancePtr) {
  u32 Data;

  Xil_AssertNonvoid(InstancePtr != NULL);
  Xil_AssertNonvoid(InstancePtr->IsReady == XIL_COMPONENT_IS_READY);

  Data = XHls_macc_ReadReg(InstancePtr->Hls_macc_periph_bus_BaseAddress, XHLS_MACC_HLS_MACC_PERIPH_BUS_ADDR_AP_CTRL);
  // check ap_start to see if the pcore is ready for next input
  return !(Data & 0x1);
}

// HLS_MACC_PERIPH_BUS
// 0x00 : Control signals
//        bit 0  - ap_start (Read/Write/COH)
//        bit 1  - ap_done (Read/COR)
//        bit 2  - ap_idle (Read)
//        bit 3  - ap_ready (Read)
//        bit 7  - auto_restart (Read/Write)
//        others - reserved
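The driver functions above are thin wrappers that read the control register at offset 0x00 and extract one bit. The same decoding can be written as host-testable helpers operating on a register value that has already been read; the helper names are hypothetical, while the bit positions come from the register map comment above. On the target, the value would come from `XHls_macc_ReadReg()`.

```c
#include <stdint.h>

/* Bit positions of the control register (offset 0x00), from the register map. */
#define AP_START_BIT      0
#define AP_DONE_BIT       1
#define AP_IDLE_BIT       2
#define AP_READY_BIT      3
#define AUTO_RESTART_BIT  7

static uint32_t ctrl_bit(uint32_t reg, unsigned bit) { return (reg >> bit) & 0x1u; }

/* Same tests as XHls_macc_IsDone() and XHls_macc_IsIdle() */
uint32_t ctrl_is_done(uint32_t reg) { return ctrl_bit(reg, AP_DONE_BIT); }
uint32_t ctrl_is_idle(uint32_t reg) { return ctrl_bit(reg, AP_IDLE_BIT); }

/* Same test as XHls_macc_IsReady(): ready for new input when ap_start is 0 */
uint32_t ctrl_is_ready(uint32_t reg) { return !ctrl_bit(reg, AP_START_BIT); }

/* A control word that sets ap_start, optionally together with auto_restart */
uint32_t ctrl_start_value(int auto_restart) {
    return (1u << AP_START_BIT) | ((auto_restart ? 1u : 0u) << AUTO_RESTART_BIT);
}
```

Note that ap_done is clear-on-read (COR), so on real hardware each read of the register can change what the next read returns.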
Use of software drivers to control peripherals

Simple example for the XGpioPs device:
1. Include the driver header, to access the device's configuration, initialization, read and write functions. Check the header documentation for each peripheral!
2. Include the xparameters header. It defines the devices' names (e.g., XPAR_PS7_GPIO_0_DEVICE_ID) and useful system parameters.
3. Define the data structs for the device's configuration, handler and status.
4. Recover the configuration settings for the device.
5. Initialize the device with the configuration settings.
6. Use the device (write data to it and/or read data from it).

#include "xgpiops.h"
#include "xparameters.h"

XGpioPs_Config *GPIO_Config;
XGpioPs my_Gpio;
int Status;

GPIO_Config = XGpioPs_LookupConfig(XPAR_PS7_GPIO_0_DEVICE_ID);
Status = XGpioPs_CfgInitialize(&my_Gpio, GPIO_Config, GPIO_Config->BaseAddr);
XGpioPs_SetDirectionPin(&my_Gpio, 7, 1);
XGpioPs_WritePin(&my_Gpio, 7, 1);

Code&Figs. from: Zynq_ZedBoard_Vivado_Workshop2014.1: Exercise 2


Embedded System Design on Zynq using Vivado

Create a new Vivado project, or open an existing project
Invoke IP Integrator
Construct (or modify) the hardware portion of the embedded design by adding the IP-XACT hardware accelerator created in Vivado HLS
Create (or update) the top-level HDL wrapper
Synthesize any non-embedded components and implement in Vivado
Export the hardware description, and launch XSDK
Create a new software board support package and application projects in XSDK
Compile the software with the GNU cross-compiler in XSDK
Download the programmable logic’s completed bitstream using Xilinx Tools > Program FPGA in XSDK
Use XSDK to download and execute the program (the ELF file)

Use of software drivers to control peripherals

Simple example for the XGpioPs device (the procedure steps are marked in the code):

#include <stdio.h>
#include "platform.h"
#include "xgpiops.h"      // 1. Driver header: configuration, initialization, read and write functions
#include "xparameters.h"  // 2. Device names and useful system parameters

int main() {
  // 3. Data structs for the device's configuration, handler and status
  XGpioPs_Config *GPIO_Config;
  XGpioPs my_Gpio;
  int Status;

  init_platform();
  printf("Exercise 02\n\r");

  // 4. Recover the configuration settings for the device
  GPIO_Config = XGpioPs_LookupConfig(XPAR_PS7_GPIO_0_DEVICE_ID);
  // 5. Initialize the device with the configuration settings
  Status = XGpioPs_CfgInitialize(&my_Gpio, GPIO_Config,
                                 GPIO_Config->BaseAddr);
  XGpioPs_SetDirectionPin(&my_Gpio, 7, 1);

  while(1) {
    // 6. Use the device: toggle pin 7 with a busy-wait delay in between
    XGpioPs_WritePin(&my_Gpio, 7, 0);
    for (Status=0; Status< 2000; Status++) {
      print(".");
    }
    XGpioPs_WritePin(&my_Gpio, 7, 1);
    for (Status=0; Status< 2000; Status++) {
      print(".");
    }
  }
  return 0;
}

Code&Figs. from: Zynq_ZedBoard_Vivado_Workshop2014.1: Exercise 2
Software Drivers: simple example (no interrupts), PL peripheral

Simple example for the XGpio device:

#include <stdio.h>
#include "platform.h"
#include "xgpio.h"
#include "xparameters.h"

int main()
{
  // Declaration of the config struct
  XGpio_Config *GPIO_Config;
  // Declaration of the GPIO instance struct
  XGpio my_Gpio;
  // Declare some variables that we will use later
  int Status;
  unsigned int DIP_value;
  unsigned int LED_value;

  init_platform();
  printf("Exercise 5\n\r");

  // Look up the config information and store it in the struct "GPIO_Config"
  GPIO_Config = XGpio_LookupConfig(XPAR_AXI_GPIO_0_DEVICE_ID);
  // Initialise the GPIO using a reference to the my_Gpio struct,
  // the struct "GPIO_Config", and also a base address value.
  Status = XGpio_CfgInitialize(&my_Gpio, GPIO_Config, GPIO_Config->BaseAddress);

  // Set the direction of the bits in the GPIO.
  // The lower (LSB) 8 bits of the GPIO are for the DIP switches (inputs).
  // The upper (MSB) 8 bits of the GPIO are for the LEDs (outputs).
  XGpio_SetDataDirection(&my_Gpio, 1, 0x00FF);

  // Turn all the LEDs on, and write the value to the GPIO
  LED_value = 0xFF << 8;
  XGpio_DiscreteWrite(&my_Gpio, 1, LED_value);

  // Go around in a loop forever
  while (1)
  {
    // Read from the GPIO to determine the position of the DIP switches
    DIP_value = XGpio_DiscreteRead(&my_Gpio, 1);
    // Mask the upper 8 bits, so that the LED value from the previous iteration is not re-read
    DIP_value = DIP_value & 0x00FF;
    // Assign a value to the LED_value variable, adjusting it as necessary
    LED_value = DIP_value << 8;
    // Print the values of the variables to the UART to help us debug
    printf("DIP = 0x%04X, LED = 0x%04X\n\r", DIP_value, LED_value);
    // Write the value back to the GPIO
    XGpio_DiscreteWrite(&my_Gpio, 1, LED_value);
  }

  // Technically we should never reach this far!
  return (0);
}

Code from: Zynq_ZedBoard_Vivado_Workshop2014.1: Exercise 5
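The bit manipulation inside the loop can be isolated and tested on a host, away from the XGpio driver. In this sketch the function name and macros are hypothetical; the layout (DIP switches on bits 7..0, LEDs on bits 15..8 of channel 1) is the one configured with `XGpio_SetDataDirection(&my_Gpio, 1, 0x00FF)`.

```c
#include <stdint.h>

#define DIP_MASK  0x00FFu   /* bits 7..0: DIP switches (inputs) */
#define LED_SHIFT 8         /* bits 15..8: LEDs (outputs) */

/* Computes the word written back to channel 1 from the word read from it. */
uint32_t led_word_from_gpio(uint32_t gpio_read) {
    uint32_t dip = gpio_read & DIP_MASK;   /* drop the LED bits that are read back */
    return dip << LED_SHIFT;               /* copy each switch state onto its LED */
}
```

Masking first matters because reading the channel also returns the LED bits written in the previous iteration.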


Software Drivers: using interrupts

Interrupt-related headers:

#include "xparameters.h"   // Parameter definitions for processor peripherals
#include "xscugic.h"       // Processor interrupt controller device driver
#include "XHls_macc.h"     // Device driver for HLS HW block
#include "xil_exception.h" // Exception handler functions. Not necessary in
                           // some versions, as it is also included from xscugic.h

Figures from: Zynq_ZedBoard_Vivado_Workshop2014.1: Exercise 8


Software Drivers: using interrupts

[Figure: the HLS_MACC block's interrupt connected to a PL-to-PS interrupt input; handler = hls_macc_isr]

Code from: UG871, Chapter 10: Using IP with Zynq
Software Drivers: Interrupt example for the XScuTimer (I)

#include <stdio.h>
#include "platform.h"
#include "xscutimer.h"
#include "xparameters.h"
#include "xscugic.h"

#define INTERRUPT_COUNT_TIMEOUT_VALUE 50

// Function prototypes
static void my_timer_interrupt_handler(void *CallBackRef);

// Global variables
// volatile: modified inside the interrupt handler and polled in main()
volatile int InterruptCounter = 0;

int main()
{
  init_platform();

  // Declare variables that we'll use later
  int Status;

  // Declare two structs. One for the Timer instance, and
  // the other for the timer's config information
  XScuTimer my_Timer;
  XScuTimer_Config *Timer_Config;

  // Declare two structs. One for the General Interrupt
  // Controller (GIC) instance, and the other for config information
  XScuGic my_Gic;
  XScuGic_Config *Gic_Config;

  // Look up the config information for the GIC
  Gic_Config = XScuGic_LookupConfig(XPAR_PS7_SCUGIC_0_DEVICE_ID);
  // Initialise the GIC using the config information
  Status = XScuGic_CfgInitialize(&my_Gic, Gic_Config, Gic_Config->CpuBaseAddress);

  // Look up the config information for the timer
  Timer_Config = XScuTimer_LookupConfig(XPAR_PS7_SCUTIMER_0_DEVICE_ID);
  // Initialise the timer using the config information
  Status = XScuTimer_CfgInitialize(&my_Timer, Timer_Config, Timer_Config->BaseAddr);

  // 1. Initialize exception handling on the ARM processor
  Xil_ExceptionInit();

  // 2. Connect the supplied Xilinx general interrupt handler
  // to the interrupt handling logic in the processor.
  // All interrupts go through the interrupt controller, so the
  // ARM processor has to first "ask" the interrupt controller
  // which peripheral generated the interrupt. The handler that
  // does this is supplied by Xilinx and is called "XScuGic_InterruptHandler"
  Xil_ExceptionRegisterHandler(XIL_EXCEPTION_ID_IRQ_INT,
      (Xil_ExceptionHandler)XScuGic_InterruptHandler, &my_Gic);

  // 3. Assign (connect) our interrupt handler for our Timer
  Status = XScuGic_Connect(&my_Gic, XPAR_SCUTIMER_INTR,
      (Xil_ExceptionHandler)my_timer_interrupt_handler, (void *)&my_Timer);

  // 4. Enable the interrupt *input* on the GIC for the timer's interrupt
  XScuGic_Enable(&my_Gic, XPAR_SCUTIMER_INTR);

  // 5. Enable the interrupt *output* in the timer.
  XScuTimer_EnableInterrupt(&my_Timer);

  // 6. Enable interrupts in the ARM Processor.
  Xil_ExceptionEnable();

Code from: Zynq_ZedBoard_Vivado_Workshop2014.1: Exercise 8
Software Drivers: Interrupt example for the XScuTimer (II)
  // Load the timer with a value that represents one second of real time
  // HINT: The SCU Timer is clocked at half the frequency of the CPU.
  XScuTimer_LoadTimer(&my_Timer, XPAR_PS7_CORTEXA9_0_CPU_CLK_FREQ_HZ / 2);
  // Enable auto-reload mode on the timer. When it expires, it reloads
  // the original value automatically. This means that the timing interval
  // is never skewed by the time taken for the interrupt handler to run
  XScuTimer_EnableAutoReload(&my_Timer);
  // Start the SCU timer running (it counts down)
  XScuTimer_Start(&my_Timer);

  // Create an infinite loop of nothing-ness
  while(1) {
    // There's nothing in here, the processor will just sit doing nothing.
    // The only way we'll see messages on the UART is if there's an interrupt.
    // In a real application, this is where the rest of our code would be.

    // Check whether we've serviced INTERRUPT_COUNT_TIMEOUT_VALUE interrupts
    if (InterruptCounter >= INTERRUPT_COUNT_TIMEOUT_VALUE) {
      // Break out of the while loop
      break;
    }
  }

  // Print a message to the UART to show that we've made it out of the while loop
  printf("If we see this message, then we've broken out of the while loop\n\r");
  // Disable interrupts in the Processor.
  Xil_ExceptionDisable();
  // Disconnect the interrupt for the Timer.
  XScuGic_Disconnect(&my_Gic, XPAR_SCUTIMER_INTR);

  cleanup_platform();
  return 0;
}

static void my_timer_interrupt_handler(void *CallBackRef)
{
  // The Xilinx drivers automatically pass an instance of
  // the peripheral which generated the interrupt into this
  // function, using the special parameter called "CallBackRef".
  // We will locally declare an instance of the timer, and assign
  // it to CallBackRef. You'll see why in a minute.
  XScuTimer *my_Timer_LOCAL = (XScuTimer *) CallBackRef;

  // Here we'll check to see if the timer counter has expired.
  // Technically speaking, this check is not necessary.
  // We know that the timer has expired because that's the
  // reason we're in this handler in the first place!
  // However, this is an example of how a callback reference
  // can be used as a pointer to the instance of the timer
  // that expired. If we had two timers then we could use the same
  // handler for both, and the "CallBackRef" would always tell us
  // which timer had generated the interrupt.
  if (XScuTimer_IsExpired(my_Timer_LOCAL))
  {
    // Clear the interrupt flag in the timer, so we don't service
    // the same interrupt twice.
    XScuTimer_ClearInterruptStatus(my_Timer_LOCAL);
    // Increment a counter so that we know how many interrupts
    // have been generated. The counter is a global variable
    InterruptCounter++;
    // Print something to the UART to show that we're in the interrupt handler
    printf("\n\r** This message comes from the interrupt handler! (%d) **\n\r\n\n\r", InterruptCounter);
    // Check whether we've reached the defined number of interrupts
    if (InterruptCounter >= INTERRUPT_COUNT_TIMEOUT_VALUE) {
      // Stop the timer from automatically reloading, so
      // that we don't get any more interrupts.
      // We couldn't do this if we didn't have the CallBackRef
      XScuTimer_DisableAutoReload(my_Timer_LOCAL);
    }
  }
}

Code from: Zynq_ZedBoard_Vivado_Workshop2014.1: Exercise 8
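The load value passed to `XScuTimer_LoadTimer()` follows from the hint that the SCU timer runs at half the CPU clock: one second is `CPU_CLK_FREQ_HZ / 2` ticks. A small host-testable sketch that generalizes this to other periods; the function name is hypothetical, and the 666,666,687 Hz used in the usage note below is only an example value (check `XPAR_PS7_CORTEXA9_0_CPU_CLK_FREQ_HZ` in xparameters.h for the real one).

```c
#include <stdint.h>

/* Approximate SCU timer load value for a period in milliseconds.
 * The timer is clocked at cpu_clk_hz / 2; a 64-bit intermediate avoids
 * overflow in the multiplication. The result must still fit in 32 bits. */
uint32_t scutimer_load_for_ms(uint32_t cpu_clk_hz, uint32_t period_ms) {
    uint64_t ticks_per_second = cpu_clk_hz / 2u;
    return (uint32_t)(ticks_per_second * period_ms / 1000u);
}
```

For example, with a 666,666,687 Hz CPU clock, `scutimer_load_for_ms(666666687u, 1000u)` gives 333,333,343, the same value as the one-second expression in the code above.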
Software Drivers: Interrupt
