
3D-IC BISR for Stacked Memories Using Cross-Die Spares

Chun-Chuan Chi (1,2), Yung-Fa Chou (2), Ding-Ming Kwai (2), Yu-Ying Hsiao (2), Cheng-Wen Wu (1,2), Yu-Tsao Hsing (3), Li-Ming Denq (3), Tsung-Hsiang Lin (3)

(1) Department of Electrical Engineering, National Tsing-Hua University, Hsinchu, Taiwan
{ccchi, cww}@larc.ee.nthu.edu.tw

(2) Information and Communications Research Laboratories, Industrial Technology Research Institute, Hsinchu, Taiwan
{yfchou, dmkwai, yuyinghsiao}@itri.org.tw

(3) HOY Technologies, Hsinchu, Taiwan
{john.hsing, taros.denq, sean.lin}@hoy-tech.com
Abstract
3D ICs based on Through-Silicon Vias (TSVs) enable the stacking of logic and memory dies to manufacture chips with higher performance, lower power, and smaller form factor. To improve the yield of the memory dies in 3D ICs, this paper proposes a Built-In Self-Repair (BISR) architecture which allows the sharing of spares between different layers of dies. The corresponding pre-bond (before the memory dies are bonded together) and post-bond (after the memory dies are bonded together) test flow is presented as well. In order to maximize the yield gain introduced by the cross-die spares, a die matching algorithm is proposed to determine which dies should be stacked together, so that the spare sharing can be most efficient. Experimental results show that the area overhead of the proposed BISR circuit is only 2.43%, which can be smaller if larger logic and memory dies are adopted, and the yield gain achieved by cross-die spare sharing can be up to 23%.
1 Introduction
Through-Silicon Via (TSV) is an emerging process technology that provides inter-die connections through the silicon substrate. TSVs are manufactured by drilling through a silicon substrate and filling the holes with metal, such as copper or tungsten, so that they can provide high-density, low-latency, and low-power vertical interconnects between dies [1–6].
TSVs enable three-dimensional ICs (3D ICs) that integrate multiple dies into a single chip. Owing to the shorter vertical inter-die connections, 3D ICs can offer benefits such as higher performance, lower power, and smaller form factor. In addition, 3D ICs enable heterogeneous integration, allowing each individual die to be manufactured in a different process technology. One of the most promising 3D integration paradigms is to stack processor and memory dies together, which is effective in addressing the memory wall problem that limits processor performance [7–14].
Recently, several techniques have been proposed to improve the yield of 3D-stacked memories. [15] presents a redundancy scheme that can be shared between different memory dies, thereby increasing the yield of memory stacks. [16] proposes a die matching algorithm that selects which memory dies should be stacked together to maximize the stack yield, under the assumption that redundancy can be shared between different memory dies; the algorithm addresses only two-die stacks. [17] tries to combine two bad dies into a good stack, so that the number of memory products that can be shipped is increased. However, these prior works do not discuss the details of the BISR architecture.
This paper focuses on the repair of memory dies in processor-memory stacks. We propose a 3D BISR architecture that adopts a cross-die spare scheme, allowing spares to be shared among memory dies. The hardware is identical for each memory die, independent of the layer in which the die is located. Based on the proposed 3D BISR, a pre-bond and post-bond test/repair flow is also presented to improve the yield of memory stacks. To share the spares efficiently between dies, we propose a heuristic die matching algorithm that can quickly determine which dies should be stacked together so that the number of good stacks is as large as possible.
The rest of this paper is organized as follows. Section 2 presents
the target 3D memory and the proposed BISR architecture; then
the corresponding pre-bond and post-bond test/repair flow is detailed. Section 3 describes the proposed die matching algorithm,
which is essential to make the spare sharing effective. Experi-
mental results on area costs of the proposed BISR and the yield
gain brought by cross-die spares are shown in Section 4. Finally,
Section 5 concludes this paper.
978-1-4577-2081-9/12/$26.00 ©2012 IEEE
2 3D BISR Architecture
2.1 The Target 3D Memory
The target 3D memory in this paper is shown in Figure 1, in
which the bottom die is assumed to be a logic die, and several
memory dies are stacked on it. In a typical paradigm, the logic
die can be a processor; the memory dies can be either SRAM
or DRAM. The interconnects between dies are implemented by
TSVs; external I/Os are assumed to be located on the bottom
logic die.
Figure 1: The target 3D memory in a logic-memory stack.
The access of the memory is controlled by Die 0, which generates
memory enable signals and broadcasts control signals to memory
dies. Only one memory die is enabled and responds to the control
signals at a time, since address and data buses are shared. Each
memory die consists of a main memory which is partitioned into
several banks, as well as a small amount of spare memory (not
shown in Figure 1).
2.2 3D BISR
Figure 2 shows the proposed BISR architecture for the memory dies in a logic-memory stack. Each memory die contains a dedicated local Built-In Self-Test (BIST) circuit and a Built-In Redundancy Analysis (BIRA) circuit. The BIST is responsible for generating test patterns for the main and spare memories on the die and comparing test responses with expected results to locate fault sites. Based on the test results from the BIST, the BIRA performs a built-in redundancy analysis (RA) algorithm and determines how many and what types of spares are required to repair the faults in the main memory. The Repair Sig. registers temporarily store the repair signatures generated by the BIRA, which indicate the addresses of fault locations and the types of spares required to repair those faults. The signatures are shifted out after testing is finished. The added test-related hardware is identical for each memory die and is independent of the layer at which the die is stacked.
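To make the contents of a repair signature concrete, the following Python sketch models one signature as a small record and packs it into an integer, as it might be shifted out of the Repair Sig. registers. The class names, field names, and bit layout are hypothetical illustrations, not the register format of the actual design.

```python
from dataclasses import dataclass
from enum import Enum


class SpareType(Enum):
    """Hypothetical encoding of the spare type requested by a local BIRA."""
    ROW = 0
    COLUMN = 1


@dataclass(frozen=True)
class RepairSignature:
    """Hypothetical repair signature: source die, faulty address, spare type."""
    die_id: int
    address: int
    spare_type: SpareType

    def pack(self, addr_bits: int = 8) -> int:
        """Pack as [die_id | spare_type | address] (an assumed layout) for serial shift-out."""
        if not 0 <= self.address < (1 << addr_bits):
            raise ValueError("address does not fit in addr_bits")
        return (self.die_id << (addr_bits + 1)) | (self.spare_type.value << addr_bits) | self.address


sig = RepairSignature(die_id=1, address=0x2A, spare_type=SpareType.COLUMN)
print(sig.pack())  # 810
```

Packing into a single integer mirrors the idea that signatures leave the die serially; a real design would size the fields to its own row and column address widths.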
On the bottom die (Die 0), there is a Global Spare Assignment Unit (GSAU), which receives repair signatures from the memory dies and assigns spares globally. The scope of the GSAU covers the entire memory stack: since all repair signatures generated by the local BIRAs are transferred to the GSAU after testing, it knows which dies have spares available and which dies need spares from other dies. Therefore, the GSAU can allocate spares on all memory dies to repair faults across dies; that is, using one die's spares to fix another die's faults is allowed. It should be noted that we assume the main and spare memories in every memory die meet the timing specification. Hence, using spares on one die to repair faults on another die does not introduce timing problems. This assumption is reasonable, because all memory dies in the 3D IC must meet the specified timing in order to enable random access.
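The global assignment performed by the GSAU can be sketched as follows. This Python function is a simplified model under our own assumptions: each request asks for one spare row or one spare column, a request is served by the requesting die's own spares first and then by any other die in the stack, and the function name and data layout are hypothetical. Because any spare of a given type can serve any die, this greedy order fails only when a spare type is oversubscribed across the whole stack.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: die id -> {"row": free spare rows, "col": free spare columns}.
SparePool = Dict[int, Dict[str, int]]


def assign_spares(available: SparePool,
                  requests: List[Tuple[int, str]]) -> Optional[List[Tuple[int, str, int]]]:
    """Assign one spare per (requesting_die, spare_type) request.

    Returns (requesting_die, spare_type, donor_die) tuples, or None when the
    stack runs out of spares of a requested type.
    """
    pool = {die: dict(spares) for die, spares in available.items()}  # work on a copy
    assignment = []
    for die, kind in requests:
        # Local spares first, then the other dies in the stack (cross-die sharing).
        candidates = [die] + sorted(d for d in pool if d != die)
        donor = next((d for d in candidates if pool[d][kind] > 0), None)
        if donor is None:
            return None  # more spares of this type requested than the stack holds
        pool[donor][kind] -= 1
        assignment.append((die, kind, donor))
    return assignment


stack = {1: {"row": 2, "col": 1}, 2: {"row": 2, "col": 1}}
print(assign_spares(stack, [(1, "col"), (1, "col"), (2, "row")]))
# [(1, 'col', 1), (1, 'col', 2), (2, 'row', 2)]
```

In the example, Die 1 needs two spare columns but owns only one, so its second column request is served by Die 2.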
Figure 2: The proposed 3D BISR architecture.
The function of the Address Remapping Unit (ARU) is to remap faulty addresses to spare addresses according to the spare assignment results from the GSAU, so that the faulty memory is repaired. The Remap Test Unit (RTU) tests the memory after address remapping to check whether the remapping is correct. Another purpose of this RTU test is to access the memory dies from Die 0, so that faults on the TSVs that serve as address buses, data buses, and other control signals can be detected.
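The remapping performed by the ARU amounts to a lookup from faulty addresses to spare locations that may reside on another die. The sketch below models only row repair, keyed by (die, row address); the class and method names are hypothetical, and a hardware ARU would instead compare incoming addresses against stored fault addresses rather than use a software dictionary.

```python
from typing import Dict, Tuple

# (die, region, row): region is "main" for the original array or "spare" for a spare row.
Location = Tuple[int, str, int]


class AddressRemapUnit:
    """Hypothetical row-only model of the ARU's remapping table."""

    def __init__(self) -> None:
        # (requesting die, faulty row) -> (donor die, spare row index)
        self._table: Dict[Tuple[int, int], Tuple[int, int]] = {}

    def program(self, die: int, faulty_row: int, donor_die: int, spare_row: int) -> None:
        """Store one entry of the GSAU's spare assignment."""
        self._table[(die, faulty_row)] = (donor_die, spare_row)

    def translate(self, die: int, row: int) -> Location:
        """Redirect a faulty row to its spare row (possibly on another die)."""
        if (die, row) in self._table:
            donor, spare = self._table[(die, row)]
            return (donor, "spare", spare)
        return (die, "main", row)


aru = AddressRemapUnit()
aru.program(die=1, faulty_row=0x35, donor_die=2, spare_row=0)
print(aru.translate(1, 0x35))  # (2, 'spare', 0): served by a spare row on Die 2
print(aru.translate(1, 0x36))  # (1, 'main', 54): fault-free rows are untouched
```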
2.3 Test Flow
Figure 3: Pre- and post-bond test/repair flow: (a) pre-bond; (b) post-bond.
Based on the BISR architecture presented above, this subsection presents a corresponding pre-bond and post-bond test/repair flow, which aims to maximize the yield of stacked memories. Figure 3 (a) shows the pre-bond test flow, which is a typical memory test flow with BIRA. The BIST and BIRA cooperate to classify memory dies into Good Dies and Bad Dies. Since spares can be shared between dies in the proposed scheme, a die is considered a Bad Die only if its faults cannot be repaired even with all the spares in the memory stack. For example, if a memory die contains 4 spare rows and the number of memory dies in a stack is 2, then a die is considered a Bad Die only if its faults cannot be repaired by 4 × 2 = 8 spare rows.
Note that some of the Good Dies here are only possibly repairable, because they may require spares from other dies to be repaired. We propose an algorithm that selects which dies should be bonded together to share spares efficiently across dies and maximize the yield of memory stacks; it is detailed in Section 3.
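The pre-bond Good/Bad decision can be written as a check against the spare budget of a whole stack. In this Python sketch we assume the RA result can be summarized by how many spare rows and columns the die's faults require (`rows_needed`, `cols_needed`, hypothetical names); the function also returns the remaining-spare vector used later by the die matching algorithm, where a negative entry means spares must be borrowed from other dies.

```python
from typing import Tuple


def classify_die(rows_needed: int, cols_needed: int,
                 spare_rows: int, spare_cols: int, k: int) -> Tuple[str, Tuple[int, int]]:
    """Pre-bond classification of one memory die under cross-die sparing (a sketch).

    A die is Bad only if its requirement exceeds the spares of a full stack of
    k dies (k * spare_rows rows, k * spare_cols columns).
    """
    remaining = (spare_rows - rows_needed, spare_cols - cols_needed)
    good = rows_needed <= k * spare_rows and cols_needed <= k * spare_cols
    return ("Good" if good else "Bad"), remaining


# Example from the text: 4 spare rows per die, 2 dies per stack -> budget of 8 rows.
print(classify_die(8, 0, spare_rows=4, spare_cols=0, k=2))  # ('Good', (-4, 0))
print(classify_die(9, 0, spare_rows=4, spare_cols=0, k=2))  # ('Bad', (-5, 0))
```

A Good Die with a negative entry, such as (-4, 0) above, is one of the "possibly repairable" dies: it can be repaired only if it is stacked with dies that have spare rows to lend.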
After pre-bond testing and stacking, a post-bond test flow is performed, as shown in Figure 3 (b). This post-bond test detects the faults introduced by 3D process steps, such as wafer thinning and bonding. During post-bond testing, all memory dies are tested in parallel, and if any memory die is irreparable, the test operations stop. On the other hand, if all memory dies are still Good Dies after all of the BIST circuits finish testing, the GSAU allocates spares to repair faults according to the repair signatures from every memory die. If the number of requested spares exceeds the total number of available spares in the memory stack, the 3D memory is considered irreparable; otherwise, the address remapping configuration is stored, and an optional test can be performed to check whether the remapping is successful as well as to detect faults on the TSVs that interconnect the logic and memory dies.
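The decision sequence of the post-bond flow described above can be summarized by the following Python sketch. It assumes each die reports whether its BIST/BIRA still classifies it as good, plus the numbers of spare rows and columns it requests; the function name, argument layout, and return strings are hypothetical.

```python
from typing import List, Tuple

# Per-die post-bond result: (still_good, spare_rows_requested, spare_cols_requested)
DieResult = Tuple[bool, int, int]


def post_bond_decision(results: List[DieResult], stack_rows: int, stack_cols: int) -> str:
    """Stop on any irreparable die, then check requests against the stack's spare budget."""
    if not all(good for good, _, _ in results):
        return "irreparable"  # a die failed its own BIST/BIRA: testing stops
    rows = sum(r for _, r, _ in results)
    cols = sum(c for _, _, c in results)
    if rows > stack_rows or cols > stack_cols:
        return "irreparable"  # the GSAU cannot cover the requests with the stack's spares
    return "store remap configuration"  # optionally followed by the RTU test


# Two dies with 3 spare rows and 3 spare columns each (6 rows and 6 columns in the stack).
print(post_bond_decision([(True, 4, 1), (True, 2, 2)], stack_rows=6, stack_cols=6))
```

Here Die 1 needs four spare rows but owns three; the stack still holds enough rows in total, so the configuration is stored.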
3 Die Matching Algorithm
Figure 4: Optimization flow of the proposed die matching algorithm.
After pre-bond testing, a die matching algorithm is required to select which dies should be bonded together, so that the spares in different dies can be shared efficiently and the stack yield can be maximized. The optimization flow of the proposed die matching algorithm is shown in Figure 4.
A two-valued vector (# spare rows, # spare columns) is attached to each die; it represents the spares remaining on that die after pre-bond testing, where a negative value indicates how many spares of that type the die must obtain from other dies. The input of the algorithm is a set of dies with their corresponding two-valued vectors and a predefined number of dies per stack, k; the output is a set of die stacks, each of which has two non-negative values in its overall two-valued vector, defined as the sum of the individual vectors of the dies in the stack.
At the beginning of the optimization flow, the dies are categorized into 4 bins according to their two-valued vectors. Bin1
contains the dies which have both spare rows and columns avail-
able (the amount is indicated by positive values); Bin2 contains
the dies which have only spare rows available, and need spare
columns from other dies to be repaired (the required amount is
indicated by a negative value); Bin3 contains the dies which have
only spare columns available, and need spare rows from other
dies to be repaired; Bin4 contains the dies which have run out of
all spares on themselves, and need both spare rows and columns
from other dies to be repaired.
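The two-valued vectors, the stack condition, and the four bins can be expressed in a few lines of Python. This is a sketch under one assumption the text leaves open: a zero entry is treated as "no spares needed", so, for example, (0, 0) falls into Bin1 and (-1, 0) into Bin3. The function names are hypothetical.

```python
from typing import Iterable, Tuple

Vector = Tuple[int, int]  # (remaining spare rows, remaining spare columns)


def stack_vector(dies: Iterable[Vector]) -> Vector:
    """Overall vector of a stack: the element-wise sum of its dies' vectors."""
    rows = cols = 0
    for r, c in dies:
        rows += r
        cols += c
    return (rows, cols)


def is_valid_stack(dies: Iterable[Vector]) -> bool:
    """A stack is repairable when both entries of its overall vector are non-negative."""
    rows, cols = stack_vector(dies)
    return rows >= 0 and cols >= 0


def bin_of(v: Vector) -> int:
    """Bin1: (+, +), Bin2: (+, -), Bin3: (-, +), Bin4: (-, -); zero counts as '+' (assumed)."""
    rows_ok, cols_ok = v[0] >= 0, v[1] >= 0
    if rows_ok and cols_ok:
        return 1
    if rows_ok:
        return 2
    if cols_ok:
        return 3
    return 4


print(stack_vector([(2, -1), (-1, 3)]))                          # (1, 2)
print(is_valid_stack([(0, -2), (1, 1)]))                          # False: overall vector is (1, -1)
print([bin_of(v) for v in [(1, 1), (2, -1), (-1, 2), (-1, -1)]])  # [1, 2, 3, 4]
```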
The basic idea of the flow in Figure 4 is to keep the resulting overall vector free of negative values at every step. Whenever the resulting vector contains a negative value, the newly stacked die is removed, and a die from another bin is stacked instead, chosen according to the vector just obtained. For example, if that vector is (+, -), indicating a positive value in the first position and a negative value in the second, then a die from Bin3, which has a vector of (-, +), is stacked; if the vector is (-, -), then a die from Bin1 with a vector of (+, +) is stacked, and so on. The choice is made in an attempt to neutralize the negative values. After the neutralization, the algorithm tries to stack a Bin4 die.
In Figure 4, the circle labeled “Stack a Bin2 die & a Bin3 die” represents the process that exhaustively searches for a combination of a Bin2 die and a Bin3 die whose overall two-die vector is (+, +). This process is performed when the algorithm needs to stack a Bin1 die but Bin1 is already empty. The matching process stops when Bin1 is empty and no Bin2-Bin3 combination with a (+, +) vector remains.
4 Experimental Results
4.1 Area Cost of BISR
We have implemented an example design, a three-die stack with a logic die at the bottom (Die 0) and two SRAM dies (Dies 1 and 2) stacked on it, to evaluate the area cost of the proposed 3D BISR. Each memory die consists of an 8k × 64 main memory and three spare columns.
The area result is listed in Table 1. For each memory die, the area
introduced by the 3D BISR occupies 2.3% (excluding the area
of spares). Compared to the entire logic-memory stack, the 3D
BISR hardware represents approximately 2.43%.
The area of Die 0 (the logic die) is set to the same value as that of the memory dies. Although this value is unrealistically small compared to a real processor, it is sufficient to estimate the BISR area overhead. Since the BISR hardware does not grow with the area of the logic die, the area overhead is expected to be smaller if a more realistic, larger processor die is adopted. Similarly, if the size of the main memory increases, the area percentage occupied by the BIST and BIRA will be smaller.
        Functional Circuits   GSAU     ARU      RTU      Area
        (μm²)                 (μm²)    (μm²)    (μm²)    Overhead
Die 0   1.78M                 20k      7k       21k      2.70%

        Main Memory           Spares   BIST     BIRA     Area
        (μm²)                 (μm²)    (μm²)    (μm²)    Overhead
Die 1   1.78M                 410k     39k      2k       2.30%
Die 2   1.78M                 410k     39k      2k       2.30%

Total area overhead: 2.43%

Table 1: Area costs of the proposed 3D BISR in 0.13 μm technology.
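The percentages in Table 1 can be reproduced by dividing the BISR area by the functional area (logic circuits or main memory) with the spares left out of both numerator and denominator. That interpretation is our inference from the numbers, consistent with the note that the per-die figure excludes spares; the short Python check below recomputes all three values.

```python
# Areas in square micrometres, taken from Table 1.
die0 = {"functional": 1.78e6, "GSAU": 20e3, "ARU": 7e3, "RTU": 21e3}
mem_die = {"main": 1.78e6, "spares": 410e3, "BIST": 39e3, "BIRA": 2e3}


def percent(part: float, whole: float) -> float:
    return 100.0 * part / whole


die0_bisr = die0["GSAU"] + die0["ARU"] + die0["RTU"]  # 48k
mem_bisr = mem_die["BIST"] + mem_die["BIRA"]           # 41k (spares excluded)

die0_overhead = percent(die0_bisr, die0["functional"])
mem_overhead = percent(mem_bisr, mem_die["main"])
# Whole stack: Die 0 plus two memory dies, relative to their functional areas.
total_overhead = percent(die0_bisr + 2 * mem_bisr, die0["functional"] + 2 * mem_die["main"])

print(f"{die0_overhead:.2f}% {mem_overhead:.2f}% {total_overhead:.2f}%")  # 2.70% 2.30% 2.43%
```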
4.2 Yield Gain
To evaluate the yield gain achieved by cross-die spares and the proposed die matching algorithm, the following experiments are conducted. Consider 1,000 memory dies, each with a capacity of 8k × 64 (row address: 8 bits, column address: 5 bits, word length: 64 bits). The memory dies are used as inputs to an RA simulator, in which the Essential Spare Pivoting (ESP) RA algorithm [18] is used. Given a defect density, the RA simulator outputs 1,000 dies, each having a two-valued vector representing its remaining spare rows and columns after defect injection (based on the specified defect density) and RA simulation.
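The per-die part of this experiment can be sketched in Python as fault injection followed by redundancy analysis that reports the two-valued vector. For brevity the sketch substitutes a simple greedy "most faults first" allocation for the ESP algorithm [18], takes a fault count instead of a defect density, and works at (row address, column address) granularity; all of these choices, and the function names, are assumptions for illustration.

```python
import random
from collections import Counter
from typing import Set, Tuple

ROWS, COLS = 2 ** 8, 2 ** 5  # 8-bit row address, 5-bit column address (8k words)
Fault = Tuple[int, int]      # (row address, column address)


def inject_faults(n_faults: int, rng: random.Random) -> Set[Fault]:
    """Place n_faults random faults (duplicates collapse, so the set may be smaller)."""
    return {(rng.randrange(ROWS), rng.randrange(COLS)) for _ in range(n_faults)}


def greedy_ra(faults: Set[Fault], spare_rows: int, spare_cols: int) -> Tuple[int, int]:
    """Cover all faults with rows/columns; return (spare rows left, spare columns left).

    Greedy stand-in for ESP. Negative entries count spares to borrow from other dies.
    """
    remaining = set(faults)
    rows_used = cols_used = 0
    while remaining:
        row, n_row = Counter(r for r, _ in remaining).most_common(1)[0]
        col, n_col = Counter(c for _, c in remaining).most_common(1)[0]
        have_row, have_col = rows_used < spare_rows, cols_used < spare_cols
        if have_row != have_col:
            use_row = have_row           # only one spare type is still local: use it
        else:
            use_row = n_row >= n_col     # otherwise cover the line with more faults
        if use_row:
            remaining = {f for f in remaining if f[0] != row}
            rows_used += 1
        else:
            remaining = {f for f in remaining if f[1] != col}
            cols_used += 1
    return (spare_rows - rows_used, spare_cols - cols_used)


rng = random.Random(2012)
vectors = [greedy_ra(inject_faults(6, rng), 3, 3) for _ in range(1000)]  # 3R3C dies
```

The resulting list of vectors has the same shape as the RA simulator's output described above and could be fed to a die matching routine.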
These 1,000 dies output from the RA simulator are then the inputs of the proposed die matching algorithm. They are categorized into 4 bins according to their two-valued vectors, and the optimization procedure shown in Figure 4 is applied to the 4 bins. In the experiment, we assume each memory stack consists of 4 dies. Since defects are injected randomly, the experiment described above is performed 100 times for each given defect density, and the results are averaged. Figure 5 shows the average yield gain, where the yield gain is defined as the extra number of good memory stacks that can be obtained by using cross-die spares.
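The yield-gain metric can be computed per trial and averaged as in the following Python sketch. It assumes that, without cross-die spares, a stack is good only if each of its k dies is self-repairable (both vector entries non-negative), so the baseline is the number of such dies divided by k; the number of stacks formed with sharing is passed in as a count. The function names are hypothetical, and the result is the extra number of good stacks, as in the definition above.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Vector = Tuple[int, int]  # (remaining spare rows, remaining spare columns)


def stacks_without_sharing(vectors: Sequence[Vector], k: int) -> int:
    """Assumed baseline: only dies that repair themselves can form good k-die stacks."""
    self_repairable = sum(1 for rows, cols in vectors if rows >= 0 and cols >= 0)
    return self_repairable // k


def yield_gain(vectors: Sequence[Vector], k: int, stacks_with_sharing: int) -> int:
    """Extra good stacks obtained by cross-die spares in one trial."""
    return stacks_with_sharing - stacks_without_sharing(vectors, k)


def average_gain(trials: List[Tuple[Sequence[Vector], int]], k: int) -> float:
    """Average over repeated trials (e.g. 100 random defect injections)."""
    return mean(yield_gain(vectors, k, n) for vectors, n in trials)


dies = [(1, 1), (0, 2), (-1, -1), (2, -1)]
print(yield_gain(dies, k=2, stacks_with_sharing=2))  # 1
```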
Figure 5: Average yield gain introduced by cross-die spares.
From Figure 5, it can be seen that the yield gain is largest for the 3R3C case (each memory die has 3 spare rows and 3 spare columns), reaching up to 23%. For the other two cases, the yield gain is lower and varies between 0 and 10%, because the spares are so limited that many memory dies cannot be repaired even if cross-die spares are used. A larger yield gain is also observed at higher defect densities.
5 Conclusion
In this paper, we propose a BISR architecture for 3D memories in a logic-memory stack. The BISR adopts cross-die spares to improve the overall yield of the 3D memory. The corresponding pre-bond and post-bond test/repair flow is also presented. In addition, a die matching algorithm is proposed, which selects which dies should be bonded together to maximize the memory stack yield. The experimental results show that the area cost of the proposed BISR is only 2.43%, which can be smaller if larger logic and memory dies are adopted. Using the proposed die matching algorithm, the yield gain introduced by cross-die spares can be up to 23%, and the yield improvement is larger at higher defect densities. Therefore, the proposed technique is especially effective for shortening the yield ramp-up period and accelerating time-to-market.
References
[1] Kaustav Banerjee et al. 3-D ICs: A Novel Chip Design for Improving Deep-Submicrometer Interconnect Performance and Systems-on-Chip Integration. Proceedings of the IEEE, 89(5):602–633, May 2001.
[2] Bart Swinnen et al. 3D Integration by Cu-Cu Thermo-Compression Bonding of Extremely Thinned Bulk-Si Die Containing 10 μm Pitch Through-Si Vias. In Proceedings IEEE International Electron Devices Meeting (IEDM), pages 1–4, May 2006.
[3] Robert S. Patti. Three-Dimensional Integrated Circuits and the Future of System-on-Chip Designs. Proceedings of the IEEE, 94(6):1214–1224, June 2006.
[4] Eric Beyne and Bart Swinnen. 3D System Integration Technologies. In Proceedings of IEEE International Conference on Integrated Circuit Design and Technology (ICICDT), pages 1–3, June 2007.
[5] Philip Garrou, Christopher Bower, and Peter Ramm, editors. Handbook of 3D Integration Technology and Applications of 3D Integrated Circuits. Wiley-VCH, Weinheim, Germany, August 2008.
[6] Jan Van Olmen et al. 3D Stacked IC Demonstration using a Through Silicon Via First Approach. In Proceedings IEEE International Electron Devices Meeting (IEDM), pages 1–4, December 2008.
[7] Christianto C. Liu et al. Bridging the Processor-Memory Performance Gap with 3D IC Technology. IEEE Design & Test of Computers, 22(6):556–564, November/December 2005.
[8] Philip Jacob et al. Predicting the Performance of a 3D Processor-Memory Chip Stack. IEEE Design & Test of Computers, 22(6):540–547, November/December 2005.
[9] Bryan Black et al. Die Stacking (3D) Microarchitecture. In Proceedings IEEE/ACM International Symposium on Microarchitecture, pages 469–479, December 2006.
[10] Gabriel H. Loh, Yuan Xie, and Bryan Black. Processor Design in 3D Die-Stacking Technologies. IEEE Micro, 27(3):31–48, May/June 2007.
[11] Gabriel H. Loh. 3D-Stacked Memory Architectures for Multi-Core Processors. In Proceedings IEEE International Symposium on Computer Architecture (ISCA), pages 453–464, June 2008.
[12] Philip Jacob et al. Mitigating Memory Wall Effects in High-Clock-Rate and Multicore CMOS 3-D Processor Memory Stacks. Proceedings of the IEEE, 97(1):108–122, January 2009.
[13] Yangyang Pan and Tong Zhang. Improving VLIW Processor Performance using Three-Dimensional (3D) DRAM Stacking. In Proceedings IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), pages 38–45, July 2009.
[14] Dong Hyuk Woo et al. An Optimized 3D-Stacked Memory Architecture by Exploiting Excessive, High-Density TSV Bandwidth. In Proceedings IEEE International Symposium on High Performance Computer Architecture (HPCA), pages 1–12, January 2010.
[15] Che-Wei Chou et al. Yield-Enhancement Techniques for 3D Random Access Memories. In Proceedings IEEE International Symposium on VLSI Design Automation and Test (VLSI-DAT), pages 104–107, April 2010.
[16] Li Jiang et al. Yield Enhancement for 3D-Stacked Memory by Redundancy Sharing across Dies. In Proceedings International Conference on Computer-Aided Design (ICCAD), pages 230–234, November 2010.
[17] Yung-Fa Chou et al. Yield Enhancement by Bad-Die Recycling and Stacking With Though-Silicon Vias. IEEE Transactions on VLSI Systems, 19(8):1346–1356, August 2011.
[18] Chih-Tsun Huang et al. Built-In Redundancy Analysis for Memory Yield Improvement. IEEE Transactions on Reliability, 52(4):386–399, December 2003.
