
2013 14th International Workshop on Microprocessor Test and Verification

Verification Methodology of Heterogeneous DSP+ARM Multicore Processors for Multi-core System on Chip
David Brier, Rama Venkatasubramanian, Sowmya Rangarajan, Abhishek Arun,
David Thompson and Neelima Muralidharan
Multicore and DSP Development Group,
Embedded Processors, Texas Instruments, Dallas, TX USA
dbrier@ti.com

Abstract—Processor complexity continues to evolve, with new architectures more complex and more tightly intertwined with the systems in which they operate than previous generations. Magnifying the individual processor complexity is the need to create heterogeneous processor clusters which contain multiple heterogeneous processors (ARM and DSP) with multiple levels of caches. These processor clusters must be validated for functionality and memory coherency across all levels of cache. Management of the verification process for these processor clusters has likewise grown in complexity, impacting the creation and management of tests; of particular interest are the C and assembly code driven tests which are the primary methods addressed in this paper. Lessons in test creation from UVM, software coding and other previous test management methods are combined to permit automated generation of test suites for processor sub-systems. Key elements of these methodologies are detailed in this paper.

I. INTRODUCTION

Heterogeneous multicore SOCs that use a combination of DSP and ARM multicore processors are extensively used in many application markets, including but not limited to basestations, multimedia gateway applications and high performance computing [1]. Extensive integration at the System on Chip (SOC) level leads to extremely complex heterogeneous processor clusters with multiple levels of cache. The Texas Instruments KeyStone II Multicore Architecture [2] provides a unified platform for integrating ARM and C66x DSP processing cores [3] along with both hardware/firmware based application-specific acceleration and high performance I/Os. From a design verification standpoint, the development team is expected to validate the functionality of these complex multicore processors in very short development cycles. This demands a methodical approach to Design Verification of the heterogeneous processor clusters. Fig. 1 illustrates the KeyStone II System on Chip architecture from Texas Instruments [2], and this paper presents the verification methodology for the heterogeneous multiprocessor cluster in the context of such a multicore SOC.

Fig. 1. Block diagram of the KeyStone II System on Chip from Texas Instruments [2].

SystemVerilog has been extended with the Universal Verification Methodology (UVM), providing a platform approach to the verification of complex IPs which is both reusable and capable of expansion and modification, thus making it highly desirable for reusable IP in large SOCs.

In considering a verification methodology for large multi-core designs, where not only the number of cores in each SOC may vary but the cores themselves are heterogeneous (ARM or DSP), the lessons which UVM provides can be extended into the practices for creating the test code for the CPUs which provide system stimulus for verification. Given the objective of a "UVM-like" structuring of the code which will run in simulation on the cores, several issues must be addressed. Where UVM models are rich in features, some containing many methods in their classes, code which runs on a CPU in a simulation must perform its task as efficiently as possible. The choice is therefore made to follow the example of UVM in the creation of library elements and in the phasing of the test code, so that test cases using multiple code segments across varying numbers and types of CPUs may be automated.

Large numbers of test cases are written for verification of the CPU based sub-system of an SOC. This code in many cases contains elements which are similar for each of the CPUs and in many instances may run on different types of processors. Identifying "library functions" is an important step in this methodology, just as it is in implementing an effective UVM-like flow. In addition to the lessons which can be taken from UVM, some typical software design practices can also be adopted and leveraged in performing sub-system verification. Generators create the API functionality, driven by the Test Control File for the test case; these allow maintainability, as do APIs in software development, but only generate sufficient code to perform the configuration operations which are needed to support the specific setup.

The final aspect of the methodology is the application of the test code onto the various sub-system configurations which are specified using System Verilog Configurations (Build Configurations). In addition to the sub-system configurations which are specified for SOCs, the configuration is also modified for efficient verification, reducing test case run times by depopulating CPU slots to increase simulation performance and by performing conditional code builds to reduce the number of operations which any processor must execute. A management system therefore automates the generation of tests for the sub-system: it takes information provided by the test case creator about the requirements of each test case and automatically matches these to the configurations which are specified using the System Verilog Configuration. This automation allows aggregating code from existing test cases into newly created test cases in order to build compound tests, while enhancing maintainability and improving the utilization of existing test cases across various platforms. This methodology adds another dimension to the randomization of the verification process by being capable of randomly populating/depopulating processor slots and, in a constrained random method, combining test cases into single simulations by applying different test cases to each CPU.
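The conditional code builds mentioned above can be realized with ordinary C preprocessor guards; a minimal sketch follows, in which the macro and function names are invented for illustration rather than taken from the paper.

    /* Hypothetical conditional code build: the thread for CPU slot 2
       is compiled only when the chosen Build Configuration populates
       that slot (SLOT2_POPULATED and run_slot2_thread() are
       illustrative names, not the paper's actual code). */
    extern void run_slot2_thread(void);

    void start_threads(void)
    {
    #ifdef SLOT2_POPULATED
        run_slot2_thread();   /* runs only in builds with slot 2 present */
    #endif
        /* a depopulated slot contributes no operations for its CPU */
    }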
II. TEST CONSTRUCTION

Mimicking the structure of a UVM test, each test has multiple phases which the simulation passes through to complete the setup, testing and reporting. Unlike UVM, which utilizes the software simulator to pass events from one model to another, test code which runs on a processor sub-system must rely on the CPU passing flags to a mailbox to communicate with the other CPUs in the simulation model. Flags must also be passed to the UVM portion of the testbench in order to synchronize events which may be driven by BFMs and to provide error management and simulation termination commands. The test case code based phases are:

• Initialization Phase: init()
• Configuration Phase: config()
• Pre-Run Phase: prerun()
• Test Run Phase: run()
• Post Run Phase: postrun()
• Result Extraction Phase: extract()
• Test Report Phase: report()

Each of the phases represents discrete activities that are accomplished in that phase, with the Pre-Run and Post Run phases being the general synchronization points in the test. Initialization for the different processor types is addressed in init() and config(), and in many cases run() is capable of running on either a DSP or an ARM class processor. The characteristics of these tests are captured at the time the test is created, to be utilized in the test allocation process.
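As an illustration, a minimal C sketch of this phase structure follows. The mailbox addresses, flag values and empty phase bodies are assumptions made for the example; the paper does not show its actual test code.

    #include <stdint.h>

    /* Assumed memory-mapped mailbox words used for the flag handshake
       between CPUs and the UVM portion of the testbench. */
    #define MBOX_READY ((volatile uint32_t *)0x10000000u)
    #define MBOX_GO    ((volatile uint32_t *)0x10000004u)
    #define MBOX_DONE  ((volatile uint32_t *)0x10000008u)
    #define MBOX_ALL   ((volatile uint32_t *)0x1000000Cu)

    static void init(void)    { /* per-CPU setup: stack, caches, MMU */ }
    static void config(void)  { /* generated DUT configuration writes */ }
    static void run(void)     { /* the test payload for this thread */ }
    static void extract(void) { /* pull results and error data from the DUT */ }
    static void report(void)  { /* post pass/fail flags to the testbench */ }

    static void prerun(void)  /* general synchronization point */
    {
        *MBOX_READY = 1u;          /* flag this CPU as ready           */
        while (*MBOX_GO == 0u) ;   /* wait for the testbench release   */
    }

    static void postrun(void) /* general synchronization point */
    {
        *MBOX_DONE = 1u;           /* flag completion of run()          */
        while (*MBOX_ALL == 0u) ;  /* wait until all CPUs have finished */
    }

    int main(void)
    {
        init();
        config();
        prerun();
        run();
        postrun();
        extract();
        report();
        return 0;
    }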
III. LIBRARIES

Libraries for testing are built and managed to assure that the code contained within exists only once amongst all of the libraries and is available to all test cases which might have need of that code. There are three libraries with clearly defined objectives. Critical to the control of testing quality is the assurance of the uniqueness of a code unit. This is contradicted when code is dispersed into the individual test case directories, where slight modifications may be made in order to customize the code to a specific application. The need for these slight deviations can be eliminated by planning the code units to meet a more generic, distributed use model. Functional library components utilized to build test cases leverage this centralization to enhance repeatability as well as the maintainability of all tests. Having a single instance of a well thought out code unit used for different test cases not only reduces the effort involved in creating test cases but also permits inheritance of enhancements and corrections which occur during the test development time.

A. Functions Library

Functions which manipulate the DUT in such a manner as to cause an observable action to occur are loaded within the Functions Library. The code within this library is divided into individual sub-directories which are organized by the code's target platform.

• General Purpose – code which is capable of running on any CPU.
• DSP Specific – code which utilizes features that are found in the DSP platforms.
• ARM Specific – code which utilizes features that are found in the ARM platforms.

Further delineation is necessary when considering code within one of the above categories; code which runs on an ARM class processor may need to be further subdivided by the type of ARM, i.e. ARM9, and even by whether the processor contains certain extensions.

Elements contained within the code library are used as building blocks for the test cases, which are described by a Test Control File that can aggregate code from the library to control the DUT in such a manner as to create a sequence of events which must complete properly to produce a measurable and correct result. Each of the code divisions mentioned above may contain code which only manipulates the DUT, code which checks the state of the DUT thus making the results visible, or code which both manipulates and checks when precise timing or interleaving of events is required for the testing of specific functions or interconnects.

1) Manipulators
Library code which causes a certain state to exist in the target is classified as a manipulator. The manipulation may be loading a memory with a data pattern, setting an interface into a certain state, or causing a sequence of state changes on an interface. Manipulators may also load a particular value into a register or set of registers, which in turn may cause some activity in the DUT to take place, with the objective of producing a result which may be checked for correct behavior.
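A manipulator of the simplest kind might be sketched in C as follows; the function name, pattern and parameters are illustrative only.

    #include <stdint.h>

    /* Sketch of a manipulator code unit: load a memory region with a
       deterministic data pattern to establish a known state. */
    void fill_pattern(volatile uint32_t *base, uint32_t words, uint32_t seed)
    {
        for (uint32_t i = 0; i < words; i++)
            base[i] = seed ^ i;   /* deterministic, address-dependent pattern */
    }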

2) Checkers
Checkers can have a wide range of actions, such as simply checking that data exists in a memory location, that the data matches a specific pattern, or that it matches a sequence of patterns which can be used as proof that an event has occurred. These building block functions are normally used to observe behavior in a DUT which has been induced by a Manipulator. Checkers also produce error data which may be used by a reporting function to present a detailed error report for the test.

3) Manipulators with Checking
For complex interactions where specific manipulations and checking are required, code is written which performs these interactions. Many of these applications contain tight loops which perform many manipulations, each of which must be followed immediately by a checking operation. Code which is categorized into this bin should most often represent the entire test.
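Such a manipulate-then-check unit might be sketched as the following tight loop; the log_error() helper is hypothetical.

    #include <stdint.h>

    /* Sketch of a manipulator-with-checking unit: every write is
       followed immediately by a read-back check (log_error() is an
       illustrative reporting hook, not the paper's actual API). */
    extern void log_error(volatile uint32_t *addr, uint32_t exp, uint32_t got);

    int write_then_check(volatile uint32_t *base, uint32_t words, uint32_t seed)
    {
        int errors = 0;
        for (uint32_t i = 0; i < words; i++) {
            uint32_t exp = seed ^ i;
            base[i] = exp;              /* manipulate                   */
            uint32_t got = base[i];     /* check immediately afterwards */
            if (got != exp) {
                log_error(&base[i], exp, got);  /* error data for report() */
                errors++;
            }
        }
        return errors;
    }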
B. Test Case Library

Test Cases are maintained in a directory structure which divides the library by sub-module target and then by operation, so that tests which have a similar goal are contained in one directory. These Test Case files are also assigned a structured name which encodes the function; they are named in the manner:

<function_code><sequence_number>

The individual test cases in this library do not contain any executable code; the Test Control File contains the list of threads and specifies the code which makes up those threads, thus defining the test case. The file also contains the classification of the test case, permitting tests to be called by classification. Some of the classifications are identified as:

1) Smoke
Smoke tests are those which give a basic indication that the DUT is working properly. They are not exhaustive but are chosen because they run quickly and provide a check of all of the portions of the DUT.

2) Functional
Functional tests comprise the bulk of the testing; these are the test cases which obtain the functional coverage. They are similar to the Smoke tests, the primary difference most often being run time, since anything more than a cursory examination of the functionality of the DUT consumes large amounts of simulation time.

3) Performance
Performance tests are used to determine the capabilities of the DUT in executing a particular operation, be that an algorithm, the transfer of data or some other operation which has meaning in the evaluation of the DUT while performing typical application operations.

4) Benchmark
Benchmark tests represent common benchmarks such as Dhrystone, Whetstone, etc. These tests are utilized to generate standardized performance metrics and are not generally run for design coverage.

C. Conversion Library

Verification of the sub-modules within the DUT is accomplished at the individual IP unit level; the primary functional coverage for the DUT is performed at the sub-module level in a full UVM environment. These tests emulate the interface activity between the sub-module and the other sub-modules with which it interacts. Every bus in this methodology has a protocol monitor attached for checking proper transaction activity, and the monitor is thus capable of recording transactions. By selecting transaction recordings which can be translated into CPU driven bus transactions, C code can be generated which, when run on a CPU, generates the same sequence of transactions on the bus pertinent to the test target. These tests cannot reproduce the precise sequence timing that the unit level testing achieves, and they cannot guarantee that the bus sequences will not have another access sequence interleaved, but the sequence to the target will remain consistent, thus making it possible to port many sequential event tests to the DUT.
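As a sketch, the generated C might take the following straight-line form, where the addresses and data stand in for whatever the protocol monitor actually recorded.

    #include <stdint.h>

    #define REG32(addr) (*(volatile uint32_t *)(addr))

    /* Sketch of conversion-generated replay code: each logged bus
       transaction becomes one volatile access, so the CPU reproduces
       the same sequence toward the target (addresses and data are
       placeholders for the recorded values). */
    void replay_recorded_sequence(void)
    {
        REG32(0x02000010u) = 0xCAFE0001u;   /* recorded write #1 */
        REG32(0x02000014u) = 0x00000003u;   /* recorded write #2 */
        (void)REG32(0x02000010u);           /* recorded read  #1 */
    }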
IV. TEST CASE MANAGEMENT AND SIMULATION BUILD ENVIRONMENT

Several issues arise with a large number of test cases and a multicore sub-system: some tests require a particular type of CPU to run, and certain positions in the architecture may physically have different functions available. The intent of a test may also be lost through project attrition as team members are rotated off, thus demanding a robust DV environment. When different sub-system architectures are required for new SOCs, testing may be required which was not done before; with an automated methodology for test case assignment, the environment comprehends the differences and locations of specific elements in the sub-system and can allocate the appropriate testing to each CPU based on its type, the location of the core in the processor cluster and its interaction with other cores in the cluster.

A. Automated Test Allocation

Each test case directory must contain a Test Control File which provides information about the test case, such as the type of CPU that it can run on, whether it can run simultaneously on multiple CPUs, positional restrictions within the hierarchy, the source code file, and parameters which require modification from the default configurations specified in the config() phase of the simulation. This Test Control File is created at the time of test case creation and captures the intent of the writer for use in allocating tests across multiple Build Configurations. It permits allocation onto multiple processors, or the code to be used in compound tests, as present in the different builds which are tested.
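The paper does not give the Test Control File syntax; purely as an illustration, the kind of information it captures could be modeled in C as follows, with every field name invented for the example.

    /* Hypothetical model of Test Control File contents; the actual
       file format is not specified in the paper. */
    struct test_control {
        const char *cpu_type;         /* "DSP", "ARM" or "ANY"               */
        int         multi_cpu_ok;     /* may run on several CPUs at once     */
        const char *position;         /* positional restriction in hierarchy */
        const char *source_files;     /* code units composing each thread    */
        const char *config_overrides; /* parameters changed from defaults    */
    };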
Each test case may be comprised of code which is common to multiple tests, or code which is written specifically for the test case. Preference is given to the re-use of test code where feasible, but the methodology permits unique code creation and inclusion in test cases.
The test allocation methodology permits Test Cases to call out the specific combinations which are permitted, from limiting them to one Test Case, one Configuration and one Build, up to multiple combinations, all controlled by the test case developer, who is capable of completely describing any limitations in the Test Control File and the Test Plan.

1) Test Case
As discussed previously, the Test Case is a control file which stipulates the code which comprises a thread in the test, which processor slot it will run on, and the resources required to support the test.

2) Configuration (Setup)
A Configuration setup is a collection of files which identify a unique setup condition for a parameter in the DUT. For example, L1D Cache size would be one configuration which would be stipulated. These may be aggregated and crossed with the Test Case to create multiple Test Case Scenarios.
3) Build Configuration
The Build Configuration controls the presence of resources within the DUT utilizing a System Verilog configuration file. System Verilog configurations can be used to control which processor slot is populated with an RTL version of the processor or with a BFM which stubs out all the signals. At the SOC level, individual modules may be eliminated from the simulation or module types may be exchanged. The objective is to reduce the simulation image size, which improves performance of the simulations without modifying the structure of the design at the netlist level.

This is also used to release various combinations of Builds for different end product architectures. This method of control permits the utilization of the crosses between the Test Case, Configuration and Build; these are controlled by the Test Plan and the Test Generator.

B. Test Plan

The Test Plan is used to dictate the legal combinations of the Test Cases and the Configurations (Setup). Each Test Case may support multiple Configurations, and the method of stipulation permits specifying ranges of Configurations using wild cards, simplifying the specification process.
The Test Plan is used to generate a crossing list of the tests which shall be run; this listing is exhaustive for the combinations which are legally possible based upon the inputs to the Test Plan.

Under configuration control and change tracking, a history of the Test Plan development is maintained for verification tracking purposes. This allows auditing of the testing which has been performed on any processor sub-system, and also permits running additional tests on a sub-system which has previously been verified and updating the tracking to give a complete history of the verification process.

C. Test Generation

Test generation is a multi-step process; the methodology is scripted, thus guaranteeing repeatability of the process.

1) Test List
Processing the Test Plan database generates a Test List according to the crossing rules placed on the Test Plan; it is a complete listing of all the Test Cases crossed with the possible Configurations for the DUT. The list output when processing the Test Plan is a combination of the test case name and the configurations with which that test case is crossed.

2) Generation of Tests
The inputs to the Test Generator build a test that performs the functions called out in the Test Control File, with the device configurations applied on the build configuration, based upon three categories:

• Test Case (Test Control File)
• Configuration
• Build

Each category permits configuration of the test being generated to target specific attributes of the DUT. The Test Generator pulls the test code specified in the Test Control File from the code library into a build directory for each thread specified. The generator also utilizes the parameters set in the Configuration files called out for the test and generates the code needed to set these configuration parameters in the DUT, saving that into the build directory, and lastly maps the test code onto the build configuration(s) specified in the Build parameters.

The Test Generator performs an audit of the Test Case, Configuration and Build parameters to assure that the test specification is capable of running. For each mapping to a Build, the generator audits the code called out in the threads to assure that the compute resources are present in that build for the code to run. These checks include the number of processors (one for each thread), the type of processor and the hierarchical location of the processor.

These audits are in place to assure that the crosses made in combination with the Test Plan processing and the Test Control File produce tests which will run properly on the Build configuration being targeted.

3) Test List
The Test Generator outputs two items: the database containing the test code, which is a compilation of code copied from the code libraries based upon the thread definitions contained in the Test Control File together with the generated configuration code called out in the configuration specification in the Test Plan; and the Test List itself. This list will produce working simulations for verification purposes, as the generator has audited resources and configurations against the test case requirements, thus eliminating tests which cannot pass on the Build because of missing resources.

The final Test List is comprised of a listing of the tests created following the naming convention, enabling the simulation environment to pull the databases and allocate the tests which have been called across the server farm. The naming convention of the tests is:

test_case-config1-config2…config(n)-build_config
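Purely as an illustration (every name component here is invented), a cache test case crossed with two configurations on one build might appear in the Test List as:

    l1d_rw_0001-l1d_32k-prefetch_on-build_4dsp2arm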

Thus each test has a unique name and can be mapped into the database which tracks the simulation status of each test for a given design build.

V. TESTING PHILOSOPHY

Verification of large complex designs represents a huge load on the simulation environment, to the point where simulation performance is significantly degraded. Performing many of the functional tests at the unit level reduces the number of test cases which must be created and run at the device level. Even the CPU Cluster must be broken down into unit tests and sub-system testing at the Cluster level, and since simulation at the Cluster level still represents a significant simulation load, the Cluster is likewise verified in parts.

[Figure: the CPU Cluster (cluster_1), showing DSP cores and ARM cores connected through the SOC L3.]
The figure above shows the makeup of the CPU Cluster. Build configurations are used in simulations to depopulate specific blocks within the Cluster to enhance simulation performance. Verification performed at the unit level for the sub-modules instantiated in the Cluster is leveraged to reduce the amount of Cluster level verification required. This reduces the primary verification at the Processor Cluster level to coverage of the sub-module interconnect and functional coverage of events which require interactions between multiple sub-modules in the Processor Cluster. The objective is to provide thorough coverage of the Processor Cluster while minimizing the time it takes to develop functional coverage. This is done through analysis of the design, identification of coverage points and careful planning of the required resources for each test case created.

Lessons learned in the DV environment which drove the creation of the UVM methodology within System Verilog are applied to CPU driven, code based verification; the structuring of the test environment as well as the test generation process applies many of the items which UVM incorporates to the code based verification.

Test code is stored in a Library, and the test generator pulls the code for each individual test directly from that code library to prevent copying of code into test case directories. This enhances the maintainability of the test code as well as providing a more formal structure which can be used to audit the completeness of the tests and the propagation of revisions to a particular module of code which is used in multiple test cases. One motivation: it has not been uncommon in non-library based environments to discover that code from one test case has been copied to other test cases, an error or deficiency found, and that change not propagated to the other test cases which utilize the same code. A common code library using base functions as test building blocks is one of the primary methods of test development in UVM; adopting the common library approach strengthens the verification process in an environment driven by code running on CPUs in the simulation by assuring the propagation of updates to the code units.

One significant departure to note is that in UVM most modules are feature rich, the classes used having multiple methods which are utilized for manipulation of the class, whereas the code written to run on the CPUs is written as simply as possible for simulation performance purposes. Even when writing at such a primitive level, the flexibility of the code unit is taken into consideration, with the objective of making the code as flexible as possible.

A. Tests

Tests are created to target specific features within the design; as previously mentioned, basic functionality is not the primary objective of the testing at the higher levels of integration, and coverage of some functionality is performed as a means to an end.
1) Interconnect Tests
The first test objective at the higher hierarchical levels is bus interconnect testing; this is accomplished by targeting specific memory or register locations throughout the hierarchy. Coding library modules in a manner that permits retargeting of payload addresses and/or transfer sizes allows the same modules, and even threads, to be utilized to test many interconnects within the design. Payload addresses may be mapped into the L1D Cache, an EMIF, or a System Memory location. The interconnect testing can be accomplished using the same code for each location, and threads can be modified in the individual test cases to customize them for the characteristics of the bus being verified.

A single instantiation of a master element may be placed in each of the slots connecting to the Switch Slave Ports in separate Build configurations, until all ports have been verified. Each of these represents a new test case which is an adaptation of existing test cases. Replication of test cases, deriving executable code from a library, maximizes reuse of code and enhances consistency across all test cases using a specific code module.
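Such a retargetable test might be sketched in C as follows, with the payload base and size supplied by the test case rather than hard-coded; the names and pattern are illustrative.

    #include <stdint.h>

    /* Sketch of a retargetable interconnect test: the same code unit
       is aimed at L1D, an EMIF or system memory simply by changing
       the payload base and size passed in by the test case. */
    int interconnect_test(volatile uint32_t *payload, uint32_t words)
    {
        for (uint32_t i = 0; i < words; i++)
            payload[i] = 0xA5A50000u | i;      /* drive writes across the bus */
        for (uint32_t i = 0; i < words; i++)
            if (payload[i] != (0xA5A50000u | i))
                return -1;                     /* read-back miscompare */
        return 0;
    }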
2) Interaction Tests
Interaction between masters must be verified at the Processor Cluster level; this testing utilizes the same library code that was used for interconnect testing. Many of the threads which comprised Interconnect test cases can be utilized for Interaction testing; in many cases the Test Control File for the test case will have multiple threads assigned to different CPUs in the Build. Each of the threads may also be the same sequence of code units; the Test Generator is capable of generating code which places the payload for each thread in different memory spaces, in the same memory but at different addresses, or at the same addresses, thus generating different bus and target conflict scenarios.
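As a sketch under the same assumptions as the previous example, two threads generated from one code unit might be aimed at separate or aliased payload addresses to create the conflict scenarios described; the addresses are illustrative.

    #include <stdint.h>

    extern int interconnect_test(volatile uint32_t *payload, uint32_t words);

    /* Thread for CPU 0: payload in a shared memory region. */
    int thread_cpu0(void)
    {
        return interconnect_test((volatile uint32_t *)0x0C000000u, 64u);
    }

    /* Thread for CPU 1: same memory, different addresses; using the
       same base as CPU 0 instead would force same-address conflicts. */
    int thread_cpu1(void)
    {
        return interconnect_test((volatile uint32_t *)0x0C000100u, 64u);
    }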

3) Complex/Functional Tests
Performance and Benchmarking tests are specified in the same manner as all other tests; they are treated by the Test Generator in a manner similar to that of all of the preceding tests.

In addition, some functional tests which were performed at the sub-module unit level may need to be run at the higher levels of the design, driven by code run on a CPU. These tests are translated into C code taken from the logging monitors which are connected to the unit level master bus. These monitors log bus writes and reads, which in turn represent the writes and reads that a CPU would perform. Conversion of these types of tests is optimal for sequential operations, as the timing of bus transactions from a CPU will normally be quite different from that of a transaction master in a unit level test.

Conversely, as bus transactions are recorded by the bus monitors, it is possible to replay CPU generated transactions in the unit level verification environment in order to add test cases which uncover additional design coverage at the sub-module level.

B. Test Environment Utilization

Capabilities of the test environment enable the specification of tests to be run with the following controls:

• Test Case
  o Specific Test Case
  o Regexp Range of Test Cases
  o All Test Cases
• Configuration
  o Specific Configuration(s)
  o Regexp Range of Configurations
  o All Applicable Configurations
• Build
  o Specific Build
  o Regexp Range of Builds
  o All Builds

The deterministic nature of the Test Generation allows reliable generation of tests; with this environment a Test Case may be regressed against all Configurations and Builds it may map to, and when a new Build is derived for a new product, it may be regressed against all Test Cases and Configurations which it is capable of supporting.

VI. CONCLUSION

As the complexity of multicore designs increases, they require more judicious application of compute resources, achieved by creating test environments which can automate the mix of architecture and test case to generate complete testing with lower expenditure of manpower. In facilitating the reuse of IP, it is incumbent on the design and verification teams to establish methodologies which capture intent and, in as automated a manner as possible, apply that intent to designs.

This methodology provides the capture of intent and the application of that intent when generating the complete test from the test case files and applying those tests to the appropriate architectures. Properly applied and maintained, the databases created in the design and verification phase of the first sub-system can greatly enhance the yield of a design and verification team, thus reducing development time and increasing quality.

REFERENCES

[1] "The Case for Heterogeneous Multi-Core SoCs," http://chipdesignmag.com/display.php?articleId=2090
[2] http://processors.wiki.ti.com/index.php/Keystone_II_multicore
[3] Damodaran, R., et al., "A 1.25GHz 0.8W C66x DSP Core in 40nm CMOS," VLSI Design (VLSID), 2012 25th International Conference on, pp. 286-291, 7-11 Jan. 2012.
[4] Bishnupriya Bhattacharya et al., Advanced Verification Topics, Cadence Design Systems.