ACCELERATING SOC LEVEL FUNCTIONAL
VERIFICATION CLOSURE USING
CO-EMULATION TECHNIQUES

Kavya R, Shrinidhi S Rao, Ruchi Misra, Sanjoy Saha, Alok Kumar, Garima Srivatsava
Samsung Semiconductor India Research, Bengaluru, India
kavyar.77@samsung.com, shrinidhi.r@samsung.com, ruchi.misra@samsung.com, sanjoy.s@samsung.com, kumar.alok@samsung.com
Abstract—The verification and validation life cycle of an SoC goes through multiple methodologies to flush out all bugs in the design. Among these, SoC design verification through simulation is the first step and is often the bottleneck owing to long simulation run times. In an SoC whose RTL integrates processing units of various IPs, along with a dedicated CPU core for debug, run times are long owing to the activity of these multiple processing elements. Compared to simulation, emulation methodologies run faster but often lack the verification capabilities of simulation. We try to bridge this trade-off by leveraging emulation capabilities in simulation. This methodology for verifying an SoC with multiple CPUs enables faster run times and a better verification life cycle. In this paper, we describe how a reduction in run time can be achieved for UVM and C level testbenches through the creation of a reusable testbench on a hardware acceleration platform. We have observed a 92 percent performance gain in our experiments and suggest adopting this methodology for faster and more efficient debug in future projects.

Index Terms—SoC, hardware acceleration, C based test bench, reusable
I. INTRODUCTION

Time to market is meaningful only if we ensure a bug-free product on time. If a quality product is released and wins the market, the quantity shipped also grows proportionally. To release a high-quality SoC on time, a huge amount of verification effort and design-cycle time is required, which in turn increases the complexity and size of the testbench. Adopting a UVM testbench, i.e. a reusable environment, does not by itself address other needs such as the ability to run and debug a large number of tests in a short amount of time. With such high complexity, the speed of simulation becomes a bottleneck, as modern SoCs grow more complex in terms of supported features and gate count. Reducing the number of simulation tests to meet verification timelines would pose a threat to verification quality. The verification effort therefore needs to be integrated with emulation techniques to eliminate the long simulation times and heavy disk consumption.

Fig. 1. Typical SoC with integrated Debug environment

As shown in Figure 1, the SoC consists of multiple masters which include processing elements within them. The SoC has a debug architecture which is used to perform debug accesses to these masters, which will be referred to as debug targets. With growing complexity, modern SoCs can have multiple processing elements, all of which are debug targets whose port connections and functionality have to be verified in the SoC context. Debug refers to features that enable observation or modification of the state of the design. Features used for debug include the ability to read and modify register values of processors. Debug frequently involves halting execution once a failure has been observed and collecting state information retrospectively to investigate the problem. The debug port shown in Figure 1 implements a bridge between the external protocol and various on-chip protocols. This bridge provides a flexible and scalable solution by powering up individual debug targets, irrespective of the activity of other targets in the SoC.
II. TESTBENCH

The test bench for verification of the SoC subsystem, involving multiple heterogeneous processing elements along with the debug environment, is written in C as per the DUT specification and verification intent. The C code is compiled with the gcc compiler to generate a hex image, which is loaded into memory via backdoor. Upon reset release, the processing elements and the debug master start executing it. This C-based test bench is supported by UVM sequences that initialize and configure signals and systems before the chip boots up. The stimulus that controls the simulation flow and verifies the various functionalities is written in the UVM-based testbench. The simulation is then run on a conventional RTL simulator. Simulations are often plagued by long run times and repeated iterations for debug fixes, which adversely affects time to market.
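As a concrete illustration of this flow, the fragment below is a minimal sketch, with hypothetical names (reg_txn, boot_config_seq, u_boot_mem, rst_n, config_done) rather than the actual project code: the gcc-generated hex image is backdoor-loaded into a boot memory model, a UVM sequence configures the system before boot, and reset release then lets the processing elements fetch the loaded code.

`include "uvm_macros.svh"
import uvm_pkg::*;

// Hypothetical register-access item used by the pre-boot configuration sequence.
class reg_txn extends uvm_sequence_item;
  rand bit [31:0] addr, data;
  rand bit        write;
  `uvm_object_utils(reg_txn)
  function new(string name = "reg_txn"); super.new(name); endfunction
endclass

// UVM sequence that programs clocks and debug enables before the chip boots.
class boot_config_seq extends uvm_sequence #(reg_txn);
  `uvm_object_utils(boot_config_seq)
  function new(string name = "boot_config_seq"); super.new(name); endfunction
  virtual task body();
    reg_txn txn;
    `uvm_do_with(txn, { addr == 32'h1000_0000; data == 32'h1; write == 1; })
  endtask
endclass

// In the testbench top (pure-simulation flow):
// initial begin
//   $readmemh("test_program.hex", u_boot_mem.mem); // backdoor load of the compiled C image
//   wait (config_done);                            // set once boot_config_seq has finished
//   rst_n = 1'b1;                                  // reset release: CPUs start executing
// end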
III. VERIFICATION METHODOLOGIES AND THEIR CHALLENGES

It is observed that almost 60-70 percent of design-cycle effort is absorbed by functional verification and debugging. So it is very important to make the verification phase of the design cycle faster and more efficient. To reduce the time spent on verification, we always look for a better verification environment and better tools, but no matter how fast the advanced methods we use are, software simulators suffer from their inherently sequential nature. This sequential, time-consuming behaviour of simulators pushes us to adopt a better methodology to overcome these hurdles, which are the bottlenecks of the functional verification phase.

The alternate method of verification is emulation. "Emulation" refers to the process of mapping an entire design into a hardware platform to further increase performance. In emulation, the design is compiled and run on an emulator which mimics the working of a real chip. An emulator does not require a constant connection between the testbench environment and the hardware platform. Since there is no continuous communication with the testbench, emulation run times are on the order of a few minutes to hours, providing a significant gain in run times. Even with the constant connection to the testbench environment eliminated, some hardware platforms still expect on-demand testbench access for activities such as displaying messages, reporting assertion-failure data and loading new memory data, so that progress and problems are visible on the simulator console.

But DV engineers cannot always rely on emulation runs, since the user does not have complete control over the stimulus to the DUT. Emulation has no testbench, which is the major disadvantage for DV engineers wanting to port their tests to emulation; in addition, the verification process involves scripting with the Trace32 tool, which has a significant learning curve.

Compared to emulation, test bench acceleration techniques provide a comparable gain in run time, along with absolute control over the test bench and stimulus. This makes the approach very attractive to DV engineers.

A. Acceleration Methodologies

To ease verification for DV engineers, hardware acceleration techniques have been introduced. Acceleration techniques address the performance shortcomings of conventional simulation. Two approaches are used:
1) Signal based Acceleration [Co-Simulation]
2) Transaction based Acceleration [Co-Emulation]

Signal based Acceleration: The exchange of signal values back and forth between the testbench (workstation) and the emulator (hardware platform) is termed signal based acceleration. The signal synchronization must occur on every clock cycle. A testbench that uses behavioural verification models to drive and sample the design interfaces requires signal based acceleration. The verification components such as drivers, monitors and agents are termed Bus Functional Models (BFMs), and they have to stay in sync between TB and DUT on every cycle.

Fig. 2. Co-Simulation Environment

As Figure 2 illustrates:
• Transactions are single clock cycle based.
• BFMs (drivers, monitors, etc.) are present in the TB.
• Hundreds to thousands of signal and event updates pass between the TB in the simulator and the DUT in the emulator.

Transaction based Acceleration: TBA exchanges only high-level transaction data between the testbench (workstation) and the emulator (hardware platform), at less frequent intervals. TBA splits the verification environment into two parts: a hardware platform that controls the design interfaces on every clock, and a software platform where the high-level generation and checking occur less frequently. TBA implements the low-level functions in hardware and the high-level functions on the workstation. It increases performance by requiring less frequent synchronization and by offering the option of buffering transactions to raise performance further.
Fig. 3. Co-Emulation Environment

As Figure 3 illustrates:
• Communication with the emulator is transaction based.
• BFMs (drivers, monitors, etc.) are emulated along with the RTL DUT.
• The testbench passes multi-clock-cycle transactions to the BFMs in the emulator (see the sketch after this list).
• Clocks are generated in the emulator.
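To make the multi-cycle-transaction point concrete, the following is a minimal sketch of a synthesizable BFM, assuming an illustrative APB-like pin set (simple_bus_bfm, psel, penable, pwrite, paddr, pwdata) that is not taken from the paper: one call from the HVL side is expanded into several pin-level clock cycles inside the emulator, so only a single HVL-to-HDL exchange is needed per transaction.

// Minimal sketch of a transaction-level BFM (names illustrative; real emulator flows
// may additionally require vendor-specific synthesis pragmas on such tasks).
interface simple_bus_bfm (input logic clk);
  logic        psel, penable, pwrite;
  logic [31:0] paddr, pwdata;

  // One HVL-side call -> three pin-level clock cycles executed entirely in the emulator.
  task automatic write(input logic [31:0] addr, input logic [31:0] data);
    @(posedge clk);
    psel  <= 1'b1; pwrite <= 1'b1; paddr <= addr; pwdata <= data;
    @(posedge clk);
    penable <= 1'b1;
    @(posedge clk);
    psel <= 1'b0; penable <= 1'b0;
  endtask
endinterface

Because the clocked loop lives entirely on the hardware side, the workstation is only involved once per transaction rather than once per clock.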
IV. CASE STUDY

As a rule of thumb, the environment can be optimally partitioned between hardware and software: the synthesizable code is moved to the emulator (hardware), and the non-synthesizable code remains in the software environment. This is termed Co-Emulation or Transaction Based Acceleration.

• Co-Emulation enables us to leverage the best of both worlds, namely hardware providing the expected speed and software providing the desired flexibility and ease of use of the advanced methodologies.

V. METHODOLOGY

A. Updated TB architecture

The foremost requirement for the updated architecture is to have separate HVL and HDL top-level module hierarchies, as shown in Figure 4. The software environment involves the HVL domain, which contains the non-synthesizable code such as the logic present in transactions, scoreboards, etc. The hardware platform includes the HDL domain, which contains the synthesizable logic, i.e. the logic present in the driver BFMs, monitor BFMs, DUT, and clock and reset generators.

Fig. 4. Simulation vs Acceleration

• Our simulation, which consists of C and UVM, can be accelerated to run many times faster than on the simulation platform. This is achieved by bifurcating the testbench into software (the stimulus generation and scoreboard parts), which is non-synthesizable, and hardware (the BFMs that interact with the DUT through signal-level activity).
• Two separate domains are created [updated architecture]: a non-synthesizable, untimed hardware verification language [HVL] domain and a synthesizable hardware description language [HDL] domain (see the sketch after this list).
• All the synthesizable code is moved out of the testbench for emulator synthesis into the HDL domain with the help of BFMs, while all the untimed and behavioural modelling stays in the HVL domain.
• Accelerated Verification IP (AVIP) enables hardware-accelerated simulation by providing the connection to the DUT. It is the approach used in our case study, essentially an interface that serves as the communication channel between the HVL and HDL domains.
• The software testbench is simulated in the software simulator on a conventional server, whereas the hardware testbench along with the DUT runs on the emulator.
• Since the emulator runs much faster than the simulator, the communication between them is kept intermittent in order to synchronize the data flow. This enhances the performance of verification on the hardware acceleration platform (emulator) over the simulator.
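The following is a minimal sketch of the dual-top arrangement described above, assuming illustrative names (hdl_top, hvl_top, soc_dut, and the simple_bus_bfm interface from the earlier sketch) rather than the actual project hierarchy; real emulator flows typically generate clocks and resets with vendor-provided primitives instead of a plain always block.

// hdl_top: everything synthesizable for the emulator (DUT, BFMs, clock/reset).
module hdl_top;
  logic clk = 1'b0, rst_n = 1'b0;
  always #5 clk = ~clk;              // clock generated on the hardware side

  simple_bus_bfm bus_if (.clk(clk)); // synthesizable driver/monitor BFM interface

  soc_dut dut (                      // hypothetical DUT wrapper
    .clk(clk), .rst_n(rst_n),
    .psel(bus_if.psel), .penable(bus_if.penable), .pwrite(bus_if.pwrite),
    .paddr(bus_if.paddr), .pwdata(bus_if.pwdata));
endmodule

// hvl_top: untimed UVM world; it reaches the hardware side only through the BFM interface.
module hvl_top;
  import uvm_pkg::*;
  initial begin
    uvm_config_db#(virtual simple_bus_bfm)::set(null, "*", "bus_vif", hdl_top.bus_if);
    run_test();                      // UVM test/env/scoreboards stay on the workstation
  end
endmodule

Keeping the clock generator and BFMs inside hdl_top is what lets the emulator run the timed portion at hardware speed while the HVL side stays untimed.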
B. Acceleration of C test bench

AVIPs establish communication from HDL to HVL and vice versa through interface tasks and functions, using a funnel-based approach. The classes on the HVL side, which act as proxies to the BFM interfaces, call the appropriate tasks and functions declared inside the BFMs via virtual interface handles to drive and sample the DUT signals.

Fig. 5. Modified Test bench flow

Figure 5 shows the transactions between the test bench and the DUT, with the AVIPs and DPI handling the communication. All the proxy BFM code is accelerated through the AVIPs: the code is made synthesizable and loaded on the emulator. After the AVIPs are hooked up, this part of the test bench remains synthesizable.
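Below is a minimal sketch of such an HVL-side proxy, reusing the hypothetical reg_txn item and simple_bus_bfm interface from the earlier sketches (the class name and configuration key are illustrative, not the actual AVIP code): the proxy holds only a virtual interface handle and delegates all pin-level activity to the BFM task running in the emulator.

// HVL-side driver proxy: no timing or pin wiggling here, only transaction-level calls.
class bus_driver_proxy extends uvm_driver #(reg_txn);
  `uvm_component_utils(bus_driver_proxy)

  virtual simple_bus_bfm vif;   // handle to the BFM instantiated in hdl_top / emulator

  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction

  function void build_phase(uvm_phase phase);
    if (!uvm_config_db#(virtual simple_bus_bfm)::get(this, "", "bus_vif", vif))
      `uvm_fatal("NOVIF", "BFM virtual interface not set")
  endfunction

  task run_phase(uvm_phase phase);
    forever begin
      seq_item_port.get_next_item(req);
      vif.write(req.addr, req.data);   // one transaction-level crossing into the HDL domain
      seq_item_port.item_done();
    end
  endtask
endclass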
A UVM testbench provides much higher control over the stimulus to the DUT compared to a C-based testbench. The DV engineer can have some of the C functions call complex UVM code that applies stimulus to probe and stress the design. Portions of these functions can also be synthesized and loaded onto the emulator. This enables faster execution of the DPI calls, while giving higher control over test execution compared to a pure C-based TB.

Direct Programming Interface (DPI): DPI allows direct inter-language function calls between SystemVerilog and C/C++. DPI allows the transfer of data between the two domains through function arguments and return values. Functions made visible across the SystemVerilog and C/C++ boundary are declared as import and export functions. Figure 6 and Figure 7 show examples of DPI usage.

Fig. 6. Code guidelines with DPI-C

Fig. 7. Code guidelines with BFMs
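Since the code of Figures 6 and 7 is not reproduced here, the fragment below is an illustrative DPI sketch in the same spirit, with hypothetical names (c_get_next_cmd, sv_run_stress_seq, stress_seq, env.bus_sqr) that are not the project's actual functions: the C test fetches commands for SystemVerilog through an imported function and launches a complex UVM stress sequence through an exported task.

// SystemVerilog side of the DPI boundary (placed in the scope that owns the UVM env).
import "DPI-C" context function int c_get_next_cmd(output int addr, output int data);
export "DPI-C" task sv_run_stress_seq;

task sv_run_stress_seq(input int iterations);
  // Called from the C test: fires a complex UVM stress sequence on the bus sequencer.
  stress_seq seq = stress_seq::type_id::create("seq");
  seq.num_iter = iterations;
  seq.start(env.bus_sqr);
endtask

/* Matching C prototypes (kept as comments to stay in one language):
     int c_get_next_cmd(int *addr, int *data);    // implemented in the C test code
     int sv_run_stress_seq(int iterations);       // exported SV task, callable from C
*/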
C. Issues faced during porting of simulation to acceleration

• Wait, force and assign statements are untimed constructs that are not supported by the accelerator by default; the affected hierarchies have to be exported using vendor-provided tasks such as $exportread, $exportfrcrel and $exportevent (see the sketch after this list).
• The hierarchical paths used in the testbench need to be updated to the paths seen by the emulator, which are often different from the paths seen by the simulator.
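As an illustration of the first point, the sketch below shows one common way such a construct can be reworked, using hypothetical names (irq_bfm, wait_for_irq) rather than the project code: instead of class code waiting directly on a DUT signal, the wait is wrapped in a clocked task inside the synthesizable BFM and the HVL side simply calls it.

// Clock-synchronous wait lives on the HDL side, where the emulator can execute it.
interface irq_bfm (input logic clk, input logic irq);
  task automatic wait_for_irq();
    do @(posedge clk); while (!irq);
  endtask
endinterface

// HVL side: no untimed wait on DUT signals, just a transaction-level call:
//   vif.wait_for_irq();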
RESULT

In our experiments with test bench acceleration, we observed approximately 92% acceleration in run time compared to traditional simulation methodologies, across 6 categories of tests. These tests range from simple register access tests to complex stress scenarios. A summary of the results is captured in Figure 8.

Fig. 8. Experimental Results

CONCLUSION

In this paper we presented our case study, taking a test bench that is a mixture of C- and UVM-based tests and partitioning it into hardware and software portions. We explained the mechanism of the software and hardware interaction and shared the performance improvement observed with the hardware acceleration technique, which enhanced the overall performance and productivity and enabled faster verification closure.