Abstract—An open-source memory compiler has recently drawn the attention of academic researchers involved in Static Random Access Memory (SRAM) design. Since simulation results for circuits implemented with practical Process Design Kits (PDKs) have rarely been reported so far, the effectiveness and reliability of the compiler for circuit implementation still need to be further verified. This paper presents an SRAM circuit implemented in a 0.6 µm/±2.5 V CMOS process with the compiler framework. We present pre- and post-layout simulation results with respect to the functionality and various performance parameters of our SRAM circuit, and the simulation results are thoroughly analyzed. A summary for our SRAM circuit demonstrates the effectiveness and reliability of the compiler for designing an SRAM in a fabricable technology process, showing a promising result for the compiler to be verified on a fabricated chip.

Keywords-memory compiler, PDK, SRAM circuit, post-layout simulation

I. INTRODUCTION

Modern IC designs pay more and more attention to the SRAM circuit, as it plays a significant role in optimizing system performance [1] [2]. Specifically, for the System-on-Chip (SoC), Application-Specific Integrated Circuit (ASIC), and microprocessor, SRAM is a critical embedded component that drastically affects the overall area and power as well as speed [3] [4]. In the past decades, several memory-oriented commercial tools have emerged to customize SRAM designs. However, these tools have the following limitations: they serve as a black box to users, who cannot see the details; they lack flexibility, as users cannot modify the configuration of the base cells; and they depend on the technology process, so their portability to other processes is poor. Most importantly, for academic researchers involved in memory-related projects, licensing is a great obstacle due to its high cost. Although several non-commercial tools are also available, they either can only generate simple memory structures or provide no public releases of their methodologies and design flows.

Aiming to overcome the above limitations, in 2016, Guthaus et al. developed an open-source memory compiler [5], referred to as OpenRAM, to meet a variety of requirements for the sizes, configurations, and technologies in SRAM designs. OpenRAM provides a framework that incorporates multiple methodologies for design generation (Verilog model, SPICE netlist, GDSII layout data), characterization (timing, power), and verification (DRC, LVS, PEX). Its feasibility is demonstrated by SRAM designs in a generic 45 nm technology (FreePDK45) and a fabricable 0.5 µm technology (SCMOS). Based on the OpenRAM project, the authors of [6] proposed a novel differential single-port 12T bitcell with improved read reliability for high-density and high-speed SRAMs, where component configurations can be freely customized, and SRAMs can be easily realized with the help of OpenRAM.

OpenRAM was subsequently improved by several works. The authors of [7] proposed a replica-based timing speculative SRAM to detect read timing failures and to protect against incorrect write operations, while remaining energy- and area-efficient. Aiming to optimize high-performance word line driver topologies for SRAMs, the work [8] proposed an analytical optimization methodology using a delay model that includes gate delay, wire resistance, and wire capacitance. The work [9] proposed a word-line buffer insertion optimization algorithm that places and sizes the buffers to reduce word line and overall SRAM delay. To enhance the throughput and flexibility of SRAM, the work [10] proposed a parameterized bitcell that can support any combination of read, write, and read-write ports, which in turn makes OpenRAM suitable for multi-port configurations and high-performance systems.

Although OpenRAM has brought inspiring results to academic researchers, its portability across technology nodes and its effectiveness and reliability for SRAM design still need to be further verified with a practical PDK. On the one hand, the existing works and progress are all from the same research group, very few practical PDKs or fabricable technology processes have been applied, and the FreePDK45 process is non-fabricable; the effectiveness and reliability of OpenRAM are therefore less adequately demonstrated than those of a mature commercial memory compiler. On the other hand, simulation results for SRAM circuits designed with a practical PDK by OpenRAM have rarely been reported so far. To this end, we implement an SRAM circuit in a 0.6 µm/±2.5 V CMOS process with the OpenRAM framework. After an SRAM design is generated by OpenRAM, we import our design into Cadence for circuit simulation and layout verification with the tool interface that the framework provides. In this paper, we present pre- and post-layout simulation results with respect to the functionality and various performance parameters of our SRAM circuit. The simulation results are thoroughly analyzed, which shows a correct function and a
good performance. Furthermore, a summary of the simulation results and an OpenRAM report demonstrates the effectiveness and reliability of the OpenRAM framework for designing an SRAM in a fabricable technology process. Our realized SRAM circuit also contributes to the further verification of the OpenRAM framework by silicon measurements, after a chip is fabricated.

The rest of the paper is organized as follows. Section II mainly introduces an SRAM architecture and the methodologies used in OpenRAM. Section III describes our design flow for an SRAM circuit based on the OpenRAM framework. Section IV presents pre- and post-layout simulation results of an SRAM circuit for the functionality and various performance parameters. Finally, Section V concludes this work.

II. OPENRAM FRAMEWORK

In this section, we mainly introduce an SRAM architecture and the methodology associated with the circuit implementation in OpenRAM.

A. An SRAM Architecture in OpenRAM

As shown in Fig. 1, the SRAM architecture is composed of a bitcell array and peripheral circuit arrays including control logic, address decoder, wordline driver, precharge, column multiplexer, write driver, and sense amplifier. The SRAM shown has a 4-bit address and 8-bit input/output data. Note that the number of bits for address and data can be set arbitrarily in OpenRAM. A bitcell array with word number m and word size n has m×n bitcells. Fig. 1 (a) shows a 16×8 bitcell array, where the horizontal lines are wordlines and the vertical lines are bitlines; each bitcell placed in a tile has two bitlines.

Figure 1. An SRAM architecture in OpenRAM.

As seen in Fig. 1 (b), OpenRAM adopts a 6T cell (circuit in black), the bitcell most commonly used in SRAM, where two inverters (N1, P1, and N2, P2) are cross-coupled, and the output of each inverter connects to an access transistor (N3 or N4). The wordline rwwl connects to the gates of the access transistors and is shared by the bitcells in the same row. The bitlines, rwbl and its complement rwbl_bar, also connect to the access transistors and are shared by the bitcells in the same column. The default 6T cell is a single-port (rw) bitcell, as it supports only one read or one write operation per access. By adding the circuit in red for a write-only (w) port and the circuit in blue for a read-only (r) port, a 6T cell can be extended to a multi-port bitcell (rw/w, rw/r, and w/r). The newest version of OpenRAM provides multi-port bitcells so that an SRAM circuit can be accessed simultaneously by multiple read or write operations; this feature is very useful in high-performance SRAMs for multi-core microprocessors. Although OpenRAM provides the user with a parameterized bitcell, it can be replaced with a custom-designed one for performance improvement and area efficiency.

Address codes are decoded by an address decoder, and then the wordline driver drives the row select signal so that the associated wordline is asserted. Fig. 1 (c) shows a 4-16 decoder that can select the associated row from 16 rows. The precharge circuit (see Fig. 1 (d)) is used to charge the bitlines of a bitcell to VDD in a read operation. The column multiplexer (see Fig. 1 (e)) is used to select the associated word when there are many words in a row; it is an optional module in the architecture and is usually used in multi-bank SRAM implementations. The write driver (see Fig. 1 (f)) is used to drive the input data into memory during a write operation, and each bit has one driver. The sense amplifier (see Fig. 1 (g)) is a latch-type amplifier used to sense the voltage swing on the bitlines during a read operation. The more sensitive it is to the voltage swing, the shorter the read access time of the SRAM.

Control logic and replica bitline receive the external signals: clock signal clk0, chip select signal csb0, and write enable signal web0. csb0 and web0 are both active low, and the SRAM circuit works only if csb0 is at low voltage. Control logic adopts the Replica Bit Line (RBL) technique to optimize the Sense Amplifier Enable (SAE) signal timing and to prevent read failures.

For a write operation: In a clock cycle, at the positive clock edge, if both csb0 and web0 are at low voltage (GND), the address and input data are captured, and then the input data are sent to the bitlines. When the associated row is selected by the address decoder and the wordline goes to high voltage (VDD), the data are written into the memory cells with delays from the negative clock edge.

For a read operation: In a clock cycle, at the positive clock edge, if csb0 is at low voltage and web0 is at high voltage, the address is captured. Then, the bitlines are precharged to VDD and the associated wordline goes to high voltage as in the write operation. Assume that the storage value in a bitcell (see 6T cell in Fig. 1 (b)) is 1, i.e., the values at nodes Q and Q_bar are 1 and 0, respectively. The bitline rwbl remains at VDD through the current path from P1 to N3, whereas the bitline rwbl_bar discharges to GND through the current path from N4 to N2. Thus, storage value 1 is passed to the bitlines by making rwbl 1 and rwbl_bar 0, respectively. If the storage value is 0, due to the inverse electrical behavior, rwbl and rwbl_bar would be 0 and 1, respectively. Next, sense
amplifier can sense the values on the bitlines as long as there is a minor voltage difference. Finally, the stored data in all accessed bitcells are read out simultaneously, one word at a time. Likewise, the output data also have delays from the negative clock edge.

B. Methodology Used in OpenRAM for an SRAM Design

The methodology used in OpenRAM for an SRAM design is shown in Fig. 2. A technology library for the PDK needs to be available before the circuit implementation. It consists of netlist and layout files for base cells, SPICE device models for different corners, a tech configuration file in Python, and other Python files for PDK environment setup. The user can set up the library for a specific technology based on the reference implementations for the FreePDK45 and SCMOS technologies. Base cells are the essential elements that constitute an SRAM; they can be premade cells from the foundry or handmade cells by the user. SPICE device models are used in a characterizer for circuit simulation. The tech configuration file is used to set up the GDS layer map, DRC/LVS rules, analytical characterization parameters, and SPICE simulation parameters. The SRAM design is defined in a configuration file where the user specifies parameters associated with the word size, the number of words, port types for the bitcell configuration, PVT (process corner, voltage, temperature) for characterization, and verification tools.

An SRAM is generated from the technology library and configuration file by a Python-implemented memory compiler framework. The generated results consist of logical, physical, and timing/power modules, a datasheet for the design report, a log file, and a copy of the configuration file. Specifically, the characterizer uses an analytical model or a SPICE model to generate the timing/power modules. Analytical characterization is the default, as it estimates timing and power with reasonable accuracy and speeds up the process. The memory characterizer finally uses a netlist with the extracted parasitics to perform post-layout simulation and obtain more accurate timing/power modules.

In this work, we implement a single-port (rw) 96-bit SRAM circuit with a 4-bit address (16 words) and 6-bit input/output data (6-bit word size) in a 0.6 µm/±2.5 V CMOS process. As we apply our SRAM circuit to a Programmable Logic Array (PLA), a large memory size is not our main concern. Besides, this work aims to demonstrate the effectiveness and reliability of OpenRAM in generating SRAM designs.

Our design requirements are defined in the configuration file, as described in the previous section. The main challenges are the design migration of base cells from the reference technologies (FreePDK45 or SCMOS) to our technology, and the modification of the tech configuration file. A custom-designed library including netlists and layouts is implemented for base cells such as the 6T bitcell, D-type flip-flop, sense amplifier, write driver, and tri-gate. A replica bitcell and a dummy bitcell are also designed because the RBL technique is used in OpenRAM to improve read reliability. During the dynamic generation of the SRAM circuit, the replica bitcell array is fixed to output a 0 value, and the dummy bitcell array is placed with its bitlines disconnected for wordline load and lithography regularity. If a multi-port bitcell configuration is applied, then the corresponding bitcell, replica, and dummy bitcells (1rw/1r, 1rw/1w) need to be designed as well.
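As a rough illustration of the configuration file mentioned in this section, the sketch below shows what a Python configuration for a 16-word, 6-bit single-port SRAM might look like. The option names mirror those found in OpenRAM's publicly distributed example configurations and should be checked against the installed version. The technology name `my_cmos_06um`, the 5.0 V supply value (taken here as the span of the ±2.5 V rails), the corner, and the temperature are assumptions made only for illustration; they are not values reported in this paper.

```python
# Hypothetical OpenRAM-style configuration for a 16 x 6 single-port SRAM.
# Option names follow OpenRAM example configs; verify them against your version.

word_size = 6    # bits per word (6-bit input/output data)
num_words = 16   # number of words (4-bit address)

# Single read-write port, no dedicated read-only or write-only ports.
num_rw_ports = 1
num_r_ports = 0
num_w_ports = 0

# ASSUMPTION: placeholder technology name, not the authors' actual library name.
tech_name = "my_cmos_06um"

# ASSUMPTION: illustrative PVT settings for characterization.
process_corners = ["TT"]
supply_voltages = [5.0]
temperatures = [25]

output_path = "temp"
output_name = "sram_{0}_{1}_{2}".format(word_size, num_words, tech_name)

# Derived quantities, useful as a sanity check on the settings above.
total_bits = word_size * num_words            # memory capacity in bits
addr_bits = (num_words - 1).bit_length()      # address width in bits
```

With these settings the derived capacity is 96 bits and the address width is 4 bits, matching the SRAM size targeted in this work.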
Figure 3. Bitcell layouts in (a) SCMOS 0.35 µm technology and in (b) our 0.6 µm technology.
Figure 6. Pre- and post-layout simulation results for write delay (a) and read delay (b) of an SRAM circuit.
feedback of the cross-coupled inverters of an SRAM bitcell (see the 6T bitcell in Fig. 1), and then biasing the bitcell ports with appropriate voltages. Next, the voltage at storage node Q is swept while the voltage at the other storage node Q_bar is monitored. SNM is simply the side of the largest square embedded between the voltage transfer characteristic curves of the inverters.
Simulation setup and the obtained results for SNM are shown in Fig. 7. HSNM is 1.2 V, RSNM is 0.6 V, and WSNM is 1.8 V with supply voltages of ±2.5 V. Because the layout of the bitcell is carefully designed under mismatch and symmetry constraints, the parasitic effect on our bitcell circuit is very small. Therefore, the pre-layout and post-layout simulation results are identical.
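The "largest embedded square" definition of SNM can be sketched numerically as follows. This is a minimal illustration and not the simulation setup used in this paper: the piecewise-linear inverter curve and all function names are hypothetical, and the voltages are expressed on a 0 to vdd scale. For each point on the mirrored transfer curve, the code walks along the 45-degree diagonal until it meets the other curve; the diagonal step is the side of the square anchored at that point, and the largest such side in the smaller lobe is taken as the SNM.

```python
def piecewise_linear_vtc(vdd, v_low, v_high):
    """Hypothetical inverter VTC: vdd below v_low, 0 above v_high, linear between."""
    def vtc(v):
        if v <= v_low:
            return vdd
        if v >= v_high:
            return 0.0
        return vdd * (v_high - v) / (v_high - v_low)
    return vtc


def lobe_square_side(f_a, f_b, vdd, samples=2000, iters=60):
    """Side of the largest square in the lobe bounded by y = f_a(x) and x = f_b(y)."""
    best = 0.0
    for i in range(samples + 1):
        y = vdd * i / samples
        x = f_b(y)                      # point (x, y) on the mirrored curve
        if f_a(x) - y <= 0.0:           # point is not on the boundary of this lobe
            continue
        lo, hi = 0.0, vdd               # bisection on the diagonal step s
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if f_a(x + mid) - (y + mid) > 0.0:
                lo = mid                # corner (x+s, y+s) still below y = f_a(x)
            else:
                hi = mid
        best = max(best, lo)
    return best


def static_noise_margin(f1, f2, vdd):
    """SNM is the smaller of the two lobe squares of the butterfly plot."""
    return min(lobe_square_side(f1, f2, vdd), lobe_square_side(f2, f1, vdd))


if __name__ == "__main__":
    vdd = 5.0  # ASSUMPTION: total span of the +/-2.5 V supplies
    inv = piecewise_linear_vtc(vdd, 2.0, 3.0)
    print(static_noise_margin(inv, inv, vdd))
```

A steeper inverter transition yields a larger square, which is why the margin approaches half the supply span for a near-ideal inverter.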
C. Summary and Analysis
Results for the maximum working frequency, area, and power, along with the performance parameters, are summarized in Table I. They are obtained from pre-layout and post-layout simulation. We also include results from an OpenRAM report, which are compared with the simulation results to discuss the OpenRAM methodology.
TABLE I. SUMMARY FOR OUR SRAM CIRCUIT

                                          Specification
Item                        OpenRAM Report    Pre Sim.    Post-layout Sim.
Max. Working Freq. (MHz)    357               71.4        66.7
Area (mm²)                  0.38              n/a         0.38
Leakage Power (mW)          13.06             18.84       18.84
Total Power (mW)            13.09             19.12       19.06
Write Delay (ns)            n/a               5.2         5.7
Read Delay (ns)             1.4               5.7         6.7
Write Time (ns)             n/a               380.2       380.7
Read Time (ns)              376.4             380.7       381.7
HSNM@±2.5V (V)              n/a               1.2         1.2
RSNM@±2.5V (V)              n/a               0.6         0.6
WSNM@±2.5V (V)              n/a               1.8         1.8
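To make the comparison between the OpenRAM report and the simulations easier to read, the short sketch below converts the maximum working frequencies in Table I into clock periods and computes the relative deviation of the report from the post-layout simulation. The numbers are copied from Table I; the dictionary layout and function names are illustrative choices, not part of OpenRAM.

```python
# Values copied from Table I: (OpenRAM report, pre-layout sim, post-layout sim).
TABLE_I = {
    "max_freq_mhz": (357.0, 71.4, 66.7),
    "total_power_mw": (13.09, 19.12, 19.06),
    "read_delay_ns": (1.4, 5.7, 6.7),
}


def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def relative_deviation(estimate, reference):
    """Signed deviation of an estimate from a reference, as a fraction."""
    return (estimate - reference) / reference


def compare_report_to_post_layout(table):
    """Deviation of the OpenRAM report from post-layout simulation, per item."""
    return {
        name: relative_deviation(report, post)
        for name, (report, _pre, post) in table.items()
    }


if __name__ == "__main__":
    for freq in TABLE_I["max_freq_mhz"]:
        print(f"{freq} MHz -> {period_ns(freq):.1f} ns")
    for name, dev in compare_report_to_post_layout(TABLE_I).items():
        print(f"{name}: {dev:+.1%}")
```

A negative deviation means the report value is lower than the post-layout value, as is the case for the read delay and total power in Table I.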