NetSpeed GEMINI: A Scalable, Coherent, Network-on-Chip Solution

The last few decades have seen a massive growth in the number of CPU cores, computing clusters and
other IP blocks in an SoC. This massive growth along with the need for complex chip integration has
driven the need for sophisticated interconnects. SoC architects have employed a variety of methods, from
buses to crossbars to hand-crafted NoCs built from Lego-like blocks, with varying degrees of success.
The increase in the number of agents accessing a critical resource like memory has also meant that shared
data needs to be managed to ensure cache coherency. This coherency can be achieved either through a
software-based coherency solution or a hardware-based coherency solution. Factors including
performance, power, and time-to-market make hardware-based coherency the preferred solution.

However, existing hardware-based coherency solutions have two key limitations on performance and
scalability. First, coherency systems are usually fixed configurations, which means they cannot adapt to
your system requirements. They may be over-designed or under-performing. Second, to manage the
complex on-chip communications, they employ separate interconnects for coherent and non-coherent
traffic. This creates unnecessary floor planning obstacles, prevents efficient resource sharing, requires
multiple interconnect methodologies, and requires additional hardware support to allow the traffic to
interact. NetSpeed GEMINI addresses these issues effectively through a unique scalable coherency
architecture and a sophisticated fabric that handles both coherent & non-coherent traffic.

NetSpeed GEMINI is the second product in the family of NoC IPs from NetSpeed Systems. It is a
high-performance, scalable, coherent NoC solution. It supports all three levels of coherency (cache
coherent, I/O coherent, and non-coherent traffic) in a single NoC. NetSpeed GEMINI provides full cache
coherency for small and large systems, from 1 to 64 coherent CPU clusters and 1 to 200 I/O and non-cached
agents. NetSpeed GEMINI NoCs deliver high performance and significant time-to-market advantages to SoC
designers for a wide range of markets, from mobile and networking to high-performance computing.

Complexity in multi-core SoCs has increased dramatically over the last few years: CPU cores and other
compute agents like GPUs and DSPs have grown in both number and complexity. In these SoCs, access to
memory is the critical performance bottleneck. To address it, modern SoCs have adopted multiple layers
of memory caching, from local or cluster-level L1 and L2 caches to a system-level L3 cache. As the
number of caches increases, keeping these caches coherent with each other and with main memory has
also become more difficult.

Cache coherence is addressed through two main techniques: software-based coherency and hardware-based
coherency. In the software-based coherency model, the programmer is tasked with maintaining memory
coherency, dealing with stale memory and invalidating cache & memory lines. Hardware-based coherency
uses a coherency protocol and hardware support to maintain coherency in the system automatically.
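
To make the software-based model concrete, the following is a minimal sketch of two agents with private write-back caches sharing one memory. The `ToyCache` class and its `flush` and `invalidate` methods are hypothetical illustrations, not part of any NetSpeed product or real cache API; they stand in for the explicit maintenance operations a programmer would have to issue.

```python
class ToyCache:
    """Write-back cache private to one agent (hypothetical toy model)."""

    def __init__(self, memory):
        self.memory = memory      # shared backing store: addr -> value
        self.lines = {}           # privately cached copies: addr -> value

    def read(self, addr):
        # A hit returns the cached copy even if memory has changed since.
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # Write-back policy: only the private copy is updated.
        self.lines[addr] = value

    def flush(self, addr):
        # Software-issued clean: push the private copy out to memory.
        self.memory[addr] = self.lines[addr]

    def invalidate(self, addr):
        # Software-issued invalidate: drop the (possibly stale) copy.
        self.lines.pop(addr, None)


memory = {0x100: 1}
producer, consumer = ToyCache(memory), ToyCache(memory)

consumer.read(0x100)              # consumer caches the old value
producer.write(0x100, 2)          # new value lives only in producer's cache
stale = consumer.read(0x100)      # no maintenance yet: old value is returned

producer.flush(0x100)             # software step 1: producer cleans the line
consumer.invalidate(0x100)        # software step 2: consumer drops its copy
fresh = consumer.read(0x100)      # line is re-fetched from memory

print(stale, fresh)
```

In this toy model the consumer sees the new value only after both maintenance calls. Under hardware-based coherency, the protocol performs the equivalent steps without any such calls in the program.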

The complexity of software-based coherency grows with the number of agents as well as the kinds of
agents in an SoC. With the increasing use of heterogeneous architectures & sophisticated workloads,
software-based coherency solutions do not scale. As shown in the figure below, this has led to an
increasing percentage of software costs incurred in developing systems. Hardware-based coherency, on
the other hand, has three distinct advantages:
a. Reduces Power: Caches do not need to be flushed when passing data between agents.
b. Increases System Performance: Sharing data requires no additional software overhead, and fine-grain
sharing is possible.
c. Reduces Software Complexity: Coherency is transparent to the software, allowing direct sharing of
data without the need for software maintenance.

NetSpeed GEMINI is the second product in the family of NetSpeed's Network-on-Chip IP products. It is a
fully cache-coherent, high-performance NoC IP. NetSpeed GEMINI uses an innovative directory-based
approach to address the issue of scalability in multicore and multi-cluster SoC systems, so SoC
architects can build both small and large coherent interconnect systems. Using GEMINI, architects can
connect anywhere from 1 to 64 fully cache-coherent CPU clusters, GPU blocks and other coherent compute
blocks. It also supports 1 to 200 I/O-coherent and non-coherent agents. Currently, NetSpeed GEMINI
supports AMBA 4 agents, with a future revision planned to support AMBA 5.

NetSpeed GEMINI uses the underlying NetSpeed NoC technology, allowing it to deliver a customized NoC
for any given SoC specification. Many traditional approaches separate coherent and non-coherent
traffic, which leads to inefficient resource sharing and requires additional hardware to bridge the two
interconnects. NetSpeed GEMINI, on the other hand, handles both coherent and non-coherent traffic
seamlessly in a single underlying fabric. It also uses a number of proven algorithms to optimize the SoC
interconnect, providing a high-performance, coherent Network-on-Chip solution. Finally, NetSpeed GEMINI
uses graph theory and formal techniques to ensure that there are no protocol-level or network-level
deadlocks in the entire system.
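
The graph-theoretic part of such a check can be illustrated with a generic cycle search over a dependency graph. This is a minimal sketch under stated assumptions, not NetSpeed's algorithm: the function name `find_cycle` and the channel names are hypothetical, and each edge means "a message holding this resource may wait on that one". A design whose graph contains a cycle can deadlock; an acyclic graph cannot deadlock through these dependencies.

```python
def find_cycle(deps):
    """Return one cycle in a dependency graph as a list of nodes, or None.

    deps maps each resource to the resources it may wait on.
    """
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, done
    nodes = set(deps) | {d for targets in deps.values() for d in targets}
    color = {n: WHITE for n in nodes}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in deps.get(node, ()):
            if color[nxt] == GREY:        # back edge closes a cycle
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for n in sorted(nodes):               # sorted for a deterministic result
        if color[n] == WHITE:
            cycle = visit(n)
            if cycle:
                return cycle
    return None


# Hypothetical channels: responses that can wait on requests close a loop.
unsafe = {"req_ch": ["snoop_ch"], "snoop_ch": ["resp_ch"], "resp_ch": ["req_ch"]}
# Same flow with responses always able to drain: no loop.
safe = {"req_ch": ["snoop_ch"], "snoop_ch": ["resp_ch"], "resp_ch": []}

print(find_cycle(unsafe))
print(find_cycle(safe))
```

The recursive search is adequate for small illustrative graphs; a production tool would need an iterative traversal and a far richer dependency model.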

NetSpeed GEMINI is configured and optimized using NocStudio, a NoC architecture exploration platform
and design compiler. NocStudio takes detailed user specifications & uses machine learning algorithms to
identify the ideal topology while solving complex SoC issues like QoS & deadlock avoidance. The
NetSpeed GEMINI design flow uses placement-aware optimizations to tailor the topology, and channel and
buffer sizing is fully heterogeneous. Broadly, the NocStudio design flow has three main steps:
1. SPECIFY: NocStudio takes high-level SoC specifications like components & their connectivity,
performance requirements (bandwidth, latency, power), coherency requirements (coherency bandwidth,
protocol, participation level) and other SoC requirements like Quality of Service (QoS).
2. OPTIMIZE: NocStudio performs many optimizations to construct the on-chip network.
a. Coherency Controller Optimization: Based on the coherency bandwidth requirements, NocStudio
automatically determines the number of coherency controllers needed in the system, as well as the other
GEMINI coherency IP blocks needed for the SoC, like the NCB (Non-cache Bridge) and DVM.
b. Automatic Topology Generation: Based on floorplan, connectivity & performance specifications,
NocStudio converges on a suitable NoC topology, such as a bus, a mesh or even a heterogeneous
topology. Routes for the various flows between IP blocks are selected during NoC configuration to reduce
latency, meet bandwidth requirements, and minimize power and area.
c. Layer Optimization: NetSpeed GEMINI supports up to 8 physical layers & 32 virtual networks. These
layers & networks are fully heterogeneous and are optimized to meet end-to-end requirements.
A cycle-aware performance simulator is available to characterize the performance of the NoC.
3. GENERATE: The final step in the design flow is used to generate synthesizable RTL along with C++
functional models, detailed performance statistics and sanity verification test benches.
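
As a rough illustration of the coherency controller optimization in step 2, the sketch below estimates a controller count from aggregate coherent lookup demand. Everything here is an assumption made for illustration: the `CoherentAgent` type, the `controllers_needed` function, the per-controller rate and the headroom factor are hypothetical and do not describe NocStudio's actual inputs or algorithm.

```python
import math
from dataclasses import dataclass


@dataclass
class CoherentAgent:
    name: str
    lookups_per_cycle: float   # assumed peak coherent lookup demand


def controllers_needed(agents, lookups_per_controller=1.0, headroom=0.8):
    """Estimate controller count from aggregate coherent lookup demand.

    headroom is the assumed fraction of a controller's peak lookup rate
    that the design is willing to load (illustrative value).
    """
    demand = sum(a.lookups_per_cycle for a in agents)
    usable = lookups_per_controller * headroom
    # At least one controller is needed even for negligible demand.
    return max(1, math.ceil(demand / usable))


agents = [
    CoherentAgent("cpu_cluster0", 0.5),
    CoherentAgent("cpu_cluster1", 0.5),
    CoherentAgent("gpu", 1.0),
    CoherentAgent("dsp", 0.25),
]
print(controllers_needed(agents))
```

With these assumed numbers the aggregate demand is 2.25 lookups per cycle against 0.8 usable lookups per controller, so the estimate is three controllers. A real flow would also have to account for address interleaving, traffic burstiness and placement.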

[Figure: Design Flow]
1. SCALABLE ARCHITECTURE: NetSpeed GEMINI achieves scalability along multiple design dimensions.
a. Coherency Bandwidth: The number of coherency controllers needed for an SoC is determined
automatically, based on the coherent bandwidth needed for the system. Employing multiple coherency
controllers enables more coherent lookups per cycle.
b. Directory Structure: The directory structure used in GEMINI is a unique, scalable directory. Typical
directories grow on the order of O(n^2) with the number of agents n, as more entries are needed and each
entry must track more caches. However, NetSpeed's directory solution grows close to linearly with an
increasing number of entries and agents. The GEMINI directory is built to reduce power by limiting the
number of associative ways, while using advanced directory encodings and management to maintain peak
performance levels.
c. Underlying Interconnect Architecture: The NoC underlying NetSpeed GEMINI scales with increasing
traffic in the SoC. This is achieved through the use of multiple physical layers in the NoC.
2. SPECIALIZED ACCELERATORS: NetSpeed GEMINI includes an accelerator for ordered coherent traffic called
the Non-cache Bridge. It achieves higher ordered throughput by performing coherent lookups in parallel
while ensuring completion occurs in the specified order. GEMINI also includes hardware support for
Distributed Virtual Memory (DVM), enabling memory management operations to be distributed to all
required agents.
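
The directory growth described in item 1b can be made concrete with a textbook full-map (bit-vector) directory, in which every entry holds one presence bit per cache. The sketch below is a simplified model under stated assumptions, not a description of the GEMINI directory: the function name, the 2 state bits per entry and the 8192 lines per cache are illustrative, tag storage is ignored, and the directory is assumed to hold one entry for every line that the caches can hold.

```python
def full_map_directory_bits(num_caches, lines_per_cache, state_bits=2):
    """Storage in bits for a full-map directory (tags ignored).

    Entries scale with total cache capacity, and each entry carries one
    presence bit per cache, so storage grows roughly with num_caches**2.
    """
    entries = num_caches * lines_per_cache
    bits_per_entry = num_caches + state_bits
    return entries * bits_per_entry


for n in (4, 16, 64):
    bits = full_map_directory_bits(n, lines_per_cache=8192)
    print(f"{n:2d} caches: {bits // 8 // 1024} KiB")
```

Going from 4 to 64 caches is a 16x increase in agents, but in this model the storage grows from 24 KiB to 4224 KiB, a factor of 176. That is the kind of super-linear growth that compact directory encodings aim to avoid.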

1. NOC PLATFORM: The unique architecture of NetSpeed GEMINI allows it to scale performance to match
both the growing number of IP blocks and increasing design complexity. This allows NetSpeed IP to be
used as a NoC platform for entire product families. The underlying hardware elements of NetSpeed GEMINI,
like the coherency controller, coherency directory & router modules, are designed to support high
throughput with a low footprint & power. Using these elements, efficient NoCs can be built for a variety
of SoCs, from mobile to enterprise networking and high-performance computing.
2. CORRECT-BY-CONSTRUCTION: GEMINI uses patent-pending algorithms to design NoCs that are
correct-by-construction. It uses graph theory & formal techniques to ensure that there are no cycles in
the entire message dependency chain. It captures dependencies from protocol requirements, traffic flows,
and user specification. The combined dependency specification is used to ensure full deadlock avoidance
at both the protocol & network levels.
3. USER-CONFIGURABILITY: Many existing coherency solutions are fixed-configuration solutions, leading to
system designs that may be under-performing or over-designed. NetSpeed GEMINI, however, is a fully
configurable & customizable coherent NoC IP. NetSpeed GEMINI is configured and optimized using
NocStudio, a NoC architecture exploration platform. Using NocStudio, SoC designers can describe their
interconnect specifications at a high level, such as floorplan, connectivity, bandwidth and latency. In a
user-controlled and automated design environment, a number of interconnect design choices can be
rapidly generated, evaluated and benchmarked.

The growing number of computing blocks in an SoC, increasing design complexity and the paradigm shift
towards hardware-driven coherency have created a need for scalable, coherent interconnect solutions.
NetSpeed GEMINI effectively addresses these needs. It uses a number of proven algorithms to optimize
interconnects, providing a scalable, high-performance, correct-by-construction Network-on-Chip
solution. NetSpeed GEMINI's coherency architecture is based on an innovative directory that scales the
number of coherency modules depending on high-level SoC specifications while dramatically reducing
area and power.

