
UVM and Emulation: How to Get Your Ultimate Testbench Acceleration Speed-up


Hans van der Schoot & Ahmed Yehia
{hans_vanderschoot,ahmed_yehia}@mentor.com

DUAL-TOP FRAMEWORK FOR TESTBENCH ACCELERATION

1. Employ two separated HVL and HDL sides
2. Model all timed testbench code for synthesis on the HDL side, leaving the HVL side untimed
3. Devise a transaction-level, function-based communication interface between HVL and HDL sides

[Figure: Acceleratable UVM Testbench. HVL Top holds the Test Controller, Stimulus, Scoreboard, Coverage, and the Driver, Monitor, and Responder/Slave proxy classes. HDL Top holds the Driver, Monitor, and Responder BFMs (Interface/Module), connected to the DUT through Pin IFs. Layers shown: Testbench Layer, Transactor Layer, RTL Layer.]

[Figure: Acceleratable UVM Agent. Proxy class and BFM interface/module pairs for Driver and Monitor, communicating through emulator tasks/functions/pipes.]

Single Unified Testbench for Simulation and Acceleration

• Untimed transactions between testbench, proxy models and transactors
• Choice of transaction transport use models: remote task & function calls (SV Connect, SCEMI DPI-C) or SCEMI Pipes
• Timed signal level activity between DUT and BFM

TESTBENCH ACCELERATION PERFORMANCE DEMYSTIFIED

t[total] = t[HVL] + t[HVL-HDL] + t[HDL]

[Figure: Host Workstation running the HVL Side (SV/UVM: Testbench, Proxy Model), HVL-HDL Channels (SV Virtual Interfaces), and the Emulator (or simulator) running the HDL Side (SV XRTL: BFM Model, DUT).]

Throughput: t[HDL] / t[total]
H/W Bound: t[HDL] / t[total] >> (t[HVL] + t[HVL-HDL]) / t[total]
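The time breakdown above can be made concrete with a few lines of arithmetic. The following is a minimal sketch, not part of the original poster: the function names, the timings, and the `margin` factor standing in for ">>" are all assumptions chosen for illustration.

```python
def throughput(t_hvl, t_comm, t_hdl):
    """Fraction of the run spent in the emulator: t[HDL] / t[total]."""
    t_total = t_hvl + t_comm + t_hdl  # t[total] = t[HVL] + t[HVL-HDL] + t[HDL]
    return t_hdl / t_total


def is_hw_bound(t_hvl, t_comm, t_hdl, margin=10.0):
    """True when t[HDL] dominates the software and communication time.

    'margin' is an assumed numeric stand-in for the '>>' in the H/W-bound
    condition; the poster does not give a threshold.
    """
    return t_hdl >= margin * (t_hvl + t_comm)


# Hypothetical timings in seconds, for illustration only.
sw_heavy = throughput(t_hvl=60.0, t_comm=30.0, t_hdl=10.0)
hw_heavy = throughput(t_hvl=2.0, t_comm=3.0, t_hdl=95.0)

print(f"software-heavy run: throughput = {sw_heavy:.2f}")
print(f"hardware-heavy run: throughput = {hw_heavy:.2f}")
print("hardware-heavy run is H/W bound:", is_hw_bound(2.0, 3.0, 95.0))
```

Dividing both sides of the H/W-bound condition by t[total] does not change it, so the sketch compares the raw times directly.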

TACKLING TESTBENCH ACCELERATION PERFORMANCE

t[HVL]: S/W time
• Workstation executing HVL-side testbench threads causing emulated design clocks to be stopped
• Profile conventional simulation runs to analyze the testbench portion (discount the RTL DUT to be allotted to the emulator)
• Code SV/UVM testbench for simulation performance, not just functionality
• Maximize constraint solver performance
• Move assertions and coverage models to the HDL-side where applicable and capacity is not of concern

t[HVL-HDL]: S/W-H/W communication time
• Workstation-emulator context-switching and data transfer
• Profile acceleration runs
• Choose an efficient HVL-HDL communication scheme appropriate for the application at hand
• For reactive applications (with instantaneous transfer of control and data), use VIF-based inbound and outbound functions and tasks, or equivalently, DPI-C export and import functions
• For streaming interfaces (with producer and consumer decoupled), use SCEMI transaction pipes (e.g. audio, video, Ethernet, etc.)
• Maximize H/W-S/W concurrency

t[HDL]: H/W time
• Emulator executing HDL-side RTL DUT along with BFMs and clock/reset generators
• Optimize for best possible emulator clock frequency, which is a function of the combinational logic critical path
• Typical optimization considerations to achieve higher emulator frequency and level out capacity are critical path analysis, clock utilization (inactive edges, edge alignment), h/w parallelism, etc.
• Use automated performance and capacity advisor technology
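To see why the choice of communication scheme matters for t[HVL-HDL], a toy cost model can help. This sketch is not from the poster. It assumes that every HVL-HDL crossing pays a fixed context-switch overhead and that a streaming pipe lets several transactions cross together in one batch. The function name and all numbers are hypothetical.

```python
def comm_time(n_transactions, per_crossing_overhead, per_item_transfer, batch_size=1):
    """Toy estimate of t[HVL-HDL] in seconds.

    Assumed model: each workstation-emulator crossing costs a fixed
    overhead, and each transaction additionally costs a transfer time.
    """
    crossings = -(-n_transactions // batch_size)  # ceiling division
    return crossings * per_crossing_overhead + n_transactions * per_item_transfer


# Hypothetical figures: 1000 transactions, 100 us per crossing, 1 us per item.
reactive = comm_time(1000, per_crossing_overhead=1e-4, per_item_transfer=1e-6)
streaming = comm_time(1000, per_crossing_overhead=1e-4, per_item_transfer=1e-6,
                      batch_size=100)

print(f"one crossing per transaction: {reactive * 1e3:.1f} ms")
print(f"batches of 100 transactions:  {streaming * 1e3:.1f} ms")
```

Under these assumptions the fixed crossing overhead dominates, which is the intuition behind decoupling producer and consumer on streaming interfaces. Reactive interfaces cannot batch this way because each call needs its result immediately.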

import ahb_types_pkg::*;

class ahb_seq_item extends uvm_sequence_item;
  ...
  rand bit we;
  rand bit [31:0] addr;
  rand bit [31:0] data;
  rand int delay;
  rand bit error;

  constraint ahb_seq_item_delay_c {
    delay < 100;
  }
  ...
endclass

class ahb_driver extends uvm_driver #(ahb_seq_item);
  ...
  // Virtual interface pointer from testbench side to HDL-side BFM
  virtual ahb_driver_bfm bfm;
  ...
  virtual task run_phase(uvm_phase phase);
    bfm.wait_for_reset();
    forever begin
      seq_item_port.get_next_item(req);
      // Time consuming task called directly from the testbench side to the HDL side
      bfm.drive(req.we, req.addr, req.data, ...);
      seq_item_port.item_done();
    end
  endtask
endclass

The UVM driver wiggles the DUT pins indirectly, no longer directly.

interface ahb_driver_bfm(ahb_if pins);
  ...
  task wait_for_reset();
    ...
  endtask

  task drive(bit we, bit [31:0] addr, data, ...);
    @(posedge pins.clk);
    // Drive request on pin i/f
    ...
  endtask
  ...
endinterface

Flexible modeling options:
• Separate read/write calls
• Separate address/data transfers (possibly forked in parallel)
• Reactive vs. streaming communication

import ahb_types_pkg::*;

class ahb_monitor extends uvm_monitor;
  ...
  virtual ahb_monitor_bfm bfm;
  ...
  function void connect_phase(uvm_phase phase);
    ...
    // Assigning the back-pointer in the build or connect phase
    bfm.proxy = this;
  endfunction

  task run_phase(uvm_phase phase);
    // Time consuming FSM initiated from the testbench side via 0-time function call
    bfm.run();
  endtask

  function void write(ahb_seq_item_s req_s);
    req.from_struct(req_s);
    ap.write(req);
  endfunction
  ...
endclass

interface ahb_monitor_bfm (ahb_if pins);
  ...
  // Package import of back-pointer class type
  import ahb_pkg::ahb_monitor;
  ahb_monitor proxy;

  function void run();
    -> start;
  endfunction

  initial begin
    ...
    @(start);
    @(negedge pins.clk);
    monitor_daemon();
  end

  task monitor_daemon();
    forever begin
      // Sample next request on pin i/f
      ...
      // One-way outbound function call via back pointer from BFM
      // back to monitor proxy instance in testbench
      proxy.write(req_s);
    end
  endtask
endinterface

class pmu_monitor extends uvm_monitor;
  ...
  uvm_event pmu_st_80_e;

  // PMU monitor receives notification from HDL BFM and triggers
  // corresponding uvm_event
  function void trigger_pmu_state_80();
    if (pmu_st_80_e == null)
      pmu_st_80_e = m_config.ev_pool.get("pmu_st_80");
    pmu_st_80_e.trigger();
  endfunction
  ...
endclass

interface bfm;
  ...
  pmu_kg::pmu_monitor proxy;

  // HDL BFM waits on PMU state and notifies HVL domain via proxy
  // backpointer trigger
  always begin
    wait(hdl_top.cluster_0.core_1.pmu.current_state == 8'h80);
    @(posedge clk);
    proxy.trigger_pmu_state_80();
  end
endinterface

class pmu_sanity_test extends pmu_test_base;
  ...
  uvm_event pmu_st_80_e;

  task run();
    // Reset, init and register settings
    ...
    // Power sequences
    ...
    // Ping on PMU state
    // Test waits on uvm_event trigger
    if (pmu_st_80_e == null)
      pmu_st_80_e = ev_pool.get("pmu_st_80");
    pmu_st_80_e.wait_ptrigger();
    ...
  endtask
endclass

SV/UVM Testbench Acceleration Case Studies

Design                            Size (gates)   Simulation Time (hrs)   Acceleration Time (secs)   Speed-up
Application Processor             +200M          151                     3060                       177X
Network Switch                    34M            16 ½                    240                        245X
Graphics Sub-System               8M             86 ½                    635                        491X
Mobile Display Processor          1.2M           5                       46                         399X
Memory Controller                 1.1M           5                       308                        60X
Face Recognition Engine           1M             ½                       6.58                       128X
Wireless Multi-Media Sub-System   1M             53                      658                        288X
Raid Engine Controller I          25M            13                      174                        268X
Raid Engine Controller II         25M            15.5                    327                        171X

UVM & Emulation, DVCon India 2015, HvdS
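The speed-up column of the case-study table can be cross-checked from the two time columns. The sketch below assumes that speed-up is simply simulation time divided by acceleration time after converting hours to seconds. The poster does not state how the column was derived, so this formula and the helper name `speedup` are assumptions. The three rows used are copied from the table.

```python
def speedup(sim_hours, accel_seconds):
    """Assumed definition: simulation time over acceleration time, both in seconds."""
    return sim_hours * 3600.0 / accel_seconds


# (simulation hours, acceleration seconds, reported speed-up) taken from the table.
cases = {
    "Application Processor": (151, 3060, 177),
    "Graphics Sub-System": (86.5, 635, 491),
    "Raid Engine Controller II": (15.5, 327, 171),
}

for name, (hours, seconds, reported) in cases.items():
    computed = speedup(hours, seconds)
    print(f"{name}: computed {computed:.1f}X, reported {reported}X")
```

Not every row need reproduce exactly under this definition, since the reported times are rounded and the measurement details are not given.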

© Accellera Systems Initiative
