For applications requiring modular designs, where heterogeneous integration of multiple die must be interconnected in a reworkable manner, chip-to-chip and chip-to-wafer 3D assemblies are useful. Chip stacking using thin die with TSVs and conventional pitch C4 interconnects may dramatically increase density of memory products with only incremental changes to die, package and final assembly. Further increases in density can be realized by decreasing the volume of the joining metallurgy and using thermo-compression techniques to reduce stacked inter-die spacing down to a few microns [4]. Since the Cu wiring density achieved using standard CMOS BEOL techniques is orders of magnitude higher than that of typical first-level packages, silicon carriers increase in-plane interconnection density between multiple die, assuming the I/O pitch of these die can be reduced simultaneously. Bond and assembly of multiple chips on a silicon carrier have been undertaken using a variety of solder metallurgies on die having dense microbump arrays (25 micron diameter at 50 micron pitch) with excellent yield and reliability [3]. Such techniques may be extended down to pitches of tens of microns. Planarity, solder volume and chip placement accuracy issues may limit further reductions. Examples of chip stacks and silicon carrier assemblies are shown in Figure 2.

Figure 2 a) A 3D stack comprising a full thickness chip, a 90 micron thick Si carrier, and a ceramic package, and b) SEM of a 2-chip stack showing TSVs, thin joining metallurgy and the resulting small inter-die gap.

1.2 Wafer Scale Integration
Wafer-scale integration typically involves the joining and interconnection of two wafers that already have active circuits and devices. The use of such wafer-to-wafer joining preserves the physical integrity of the wafer and therefore enables additional conventional semiconductor processing (typically back-end-of-line operations) after the wafer-joining step. In wafer-scale integration, the wafers can be joined in either a face-to-face or face-to-back orientation; however, both types of builds typically require wafer thinning, aligned wafer-to-wafer bonding, and formation of wafer-to-wafer interconnect. These processes are generally performed using materials that are compatible with future back-end-of-line processing.

Wafer-scale integration can be used in combination with through-silicon vias (TSVs). As one example, face-to-face wafer-scale integration can be enabled by copper-to-copper bonding of interconnects (Figure 3). Copper metallurgy can be used to simultaneously form mechanical and electrical connections, and is compatible with additional back-end processing. A full thickness top wafer is fabricated with both circuits and deep-via (deeper than the final top-wafer thickness target) structures. This wafer is flipped over and aligned to the bottom wafer, so the copper patterns used at the join of the two wafers must incorporate mirroring in the design. After aligned bonding, the top wafer is thinned from the backside to convert the deep-via structures into through-silicon vias, which are used to provide I/O escapes from the bonded stack.

Figure 3 Schematic illustration of face-to-face wafer-scale assembly enabled by copper-to-copper bonding of interconnects

Because wafer-scale integration enables further processing of wafers after the join step, process flows can also be developed which first join wafers mechanically and then electrically at a later step. One example of this in practice today enables very high densities of interlayer interconnections. It uses a glass-handle wafer to stack a silicon-on-insulator (SOI) top-circuit layer onto an underlying bottom-circuit wafer in a face-to-back orientation, as illustrated schematically in Figure 4.

Figure 4 Schematic illustration showing steps used to stack a silicon-on-insulator (SOI) top-circuit layer onto an underlying bottom-circuit wafer in a face-to-back orientation

2. 3D Advantages and Opportunities
Vertical interconnects, in a number of embodiments, have been shown to provide a number of distinct advantages to high performance chips. Each application has a different requirement for via pitch and count, electrical performance, and operating conditions. These requirements dictate which of the 3D schemes is most economically suited to the use.

2.1 Wirelength Reduction
3DI is commonly cited as a means of reducing lateral wire length. With the introduction of multiple active silicon layers, a 3DI 2-3 micron tall vertical via with resistance of a few milliohms, inductance of under 1 pH, and capacitance of a couple fF replaces a lateral wire of tens to perhaps hundreds of microns. Figure 5 is a histogram of wirelengths from a recent 90nm-generation microprocessor subunit, assuming wirelength reduction factors ranging from 1X to 0.5X. As the wire lengths decrease, so does the demand for wire buffers, also shown in Figure 5. Overall, power dissipated in interconnects and buffers scales monotonically with the wire length-reduction factor, and is shown in Figure 6.
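As a rough illustration of why interconnect power tracks the wire length-reduction factor, the sketch below assumes that wire capacitance is proportional to wire length and that switching power follows the standard dynamic-power expression P = a·C·V²·f. The capacitance per micron, supply voltage, clock frequency, activity factor and baseline net length are all illustrative assumptions rather than values from this paper, and buffer power is not modeled.

```python
def wire_switching_power(length_um, cap_per_um_f=0.2e-15, vdd=1.0,
                         freq_hz=2e9, activity=0.1):
    """Dynamic power (W) of one wire, assuming capacitance scales with length.

    All default parameter values are illustrative assumptions.
    """
    capacitance = cap_per_um_f * length_um  # farads
    return activity * capacitance * vdd ** 2 * freq_hz


baseline_length_um = 500.0  # hypothetical 2D net length

for factor in (1.0, 0.9, 0.8, 0.7):
    power = wire_switching_power(baseline_length_um * factor)
    print(f"reduction factor {factor:.2f}: {power * 1e6:.2f} uW")
```

Under these assumptions the wire power is simply proportional to the reduction factor; the additional savings from removed buffers would come on top of this.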
[Figure 5 plot: buffer count (per bin) versus length bin (microns), for length reduction factors from 1.0X to 0.70X, and total buffers (thousands) versus 3D average net length reduction factor]

[Figure 6 plot: wire power savings and buffer power savings, power (W) versus average net length reduction]

specialized accelerator functions separate from the cores; security and network functions are two common operations addressed by specialized hardware. These trends combine to significantly increase the need for larger cache capacity to support the many different threads running on a chip, as well as increasing the overall bandwidth requirements for the cache and memory systems.
proportional to the reciprocal of a root (frequently, the square root) of the capacity of the cache, and is workload dependent. The bus utilization is a product of the TE (which is the “service time” for a miss) and the miss rate. The nonlinearities in the TE effects arise when the utilization gets pushed too hard.

Cache capacity and bandwidth can be thought of as “mutually fungible” entities. If the cache can be made larger, less bandwidth is required; if more bandwidth is made available, we can live with a smaller cache. It is always better to have more of both, however: big caches with lots of bandwidth. 3D affords us this opportunity. Obviously, 3D enables more cache capacity directly. Cache “planes” can be readily stacked upon a system footprint. Less obvious is that within the 3D stack, much higher bandwidth (between planes) is achievable as well, much higher than would have been possible had the planes been laid out on a 2D package.

To wire between chips on a 2D package requires “Manhattan” x and y wiring having dimensions on the scale of the chip size. These wires can be quite long: several centimeters, or even inches. If areas to be connected between chips are spatially co-located, then when put into 3D the connections can be primarily vertical (in the z dimension). In this particular case, busses between the adjacent layers in a cache hierarchy have the potential of requiring very little x or y displacement. Short busses like this (now perhaps only 10s or 100s of microns) run much faster, and at lower relative power. In addition, if specific 2D layouts can be anticipated (to minimize the x and y motion required in moving between planes), wiring blockages may be eliminated. This will enable a denser thru-via grid, which allows the busses to be very wide as well. This enables removal of much of the Trailing Edge, and its pernicious effects.

2.4 Low Voltage and Power Savings
The performance of many high performance processors is limited not by raw capability but by the ability to supply sufficient power or remove the consequential heat. One effective means of improving efficiency is the use of reduced-voltage supplies. Since the signal swing in CMOS logic is determined by the supply voltage, reducing this voltage decreases the energy supplied by a logic gate to and discharged from the wiring and [...] drain-induced barrier lowering, decreasing leakage. Passive energy per cycle thus decreases for some range of decreasing supply voltage [6][7]. Figure 8 illustrates this relationship between both active and passive energy per cycle and supply voltage.

Low-voltage operation comes at the cost of increased gate delay, however, due to the reduced on-current available to charge each gate output node. This increased gate delay degrades performance and allows leakage power to integrate over a longer period, eventually causing passive energy to rise again at sufficiently low supply values, as shown in Figure 8.

Circuit architecture can capture the improved efficiency of low-voltage operation while maintaining performance by compensating for delay. Parallelism can offset delay by increasing the number of circuits performing a given task, dividing the workload [8][9]. Since not all algorithms can be cast into a perfect parallel implementation, the percentage increase in circuit count required to maintain performance is typically larger than the percentage increase in delay, commonly represented by a power-law relationship with coefficient α. In a system characterized by an α value of 1.4, for example, circuit count must increase by a factor of 2.6 for each doubling of delay. The price of power efficiency constrained by constant performance is therefore an increase in chip area. Power-area tradeoffs can be difficult to accept in planar implementations. 3D integration offers options. In one, specific system blocks such as SRAM caches could be moved to their own layer, freeing area for increased logic count. In another, multiple cores with dedicated local memory could be stacked in multiple layers.

A key obstacle toward realizing robust low-voltage design is process variability, particularly as it impacts threshold voltage. Such variability can be at least partly countered by adjusting threshold voltage dynamically through body bias (in bulk and partially-depleted SOI) or a backgate (in fully-depleted SOI). Implementing threshold-adjusting biases with fine granularity across a chip is a difficult design challenge, however, requiring generation of many individual voltages. 3D integration provides an elegant solution to this problem, enabling individual, adjustable voltage converters to be placed in their own layer directly above the zones of the logic chip where they are required.
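The power-law relationship between gate delay and circuit count quoted in this section can be checked numerically. The sketch below assumes the form N = d^α, with α treated as the exponent, d the factor by which gate delay increases, and N the factor by which circuit count must grow to hold performance constant. This functional form and the function name are assumptions chosen to be consistent with the α = 1.4 example; the text does not state the formula explicitly.

```python
def circuit_count_factor(delay_factor, alpha=1.4):
    """Factor by which circuit count must grow when delay grows by delay_factor.

    Assumes count scales as delay_factor ** alpha; alpha = 1 would correspond
    to perfectly parallelizable work.
    """
    return delay_factor ** alpha


for d in (1.5, 2.0, 4.0):
    print(f"delay x{d}: circuit count x{circuit_count_factor(d):.2f}")
```

With α = 1.4, a doubling of delay gives a factor of about 2.6, which matches the figure quoted in the text; with α = 1 the same doubling would require only twice the circuits.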
density of 2 W/mm2 would give a prohibitive temperature drop of
over 50 C per chip layer for a “typical” case, improved thermal
design can manage this. Nevertheless, care must be taken that hot
spots on different chips in a stack do not overlap. For DRAM
chips with power densities in the range of 0.01 W/mm2, stacks of
many chips may be possible through careful thermal design.
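The per-layer temperature drops discussed here follow from multiplying power density by the area-normalized thermal resistance of the layer stack. The sketch below applies that one-dimensional estimate to the approximate totals listed in Table 1 (more than 25 C-mm2/W for the “typical” stack and about 5 C-mm2/W for the “improved” one). It assumes uniform heat flux with no lateral spreading, and the function and constant names are hypothetical.

```python
def temperature_drop_c(power_density_w_per_mm2, resistance_c_mm2_per_w):
    """One-dimensional temperature drop (C) across one chip layer stack."""
    return power_density_w_per_mm2 * resistance_c_mm2_per_w


TYPICAL = 25.0   # C-mm^2/W, lower bound from Table 1 (solder balls)
IMPROVED = 5.0   # C-mm^2/W, approximate value from Table 1 (Cu balls, thin dielectric)

for label, density in (("logic, 2 W/mm^2", 2.0), ("DRAM, 0.01 W/mm^2", 0.01)):
    typical = temperature_drop_c(density, TYPICAL)
    improved = temperature_drop_c(density, IMPROVED)
    print(f"{label}: typical {typical:.2f} C, improved {improved:.2f} C per layer")
```

At 2 W/mm2 the “typical” resistance reproduces the drop of roughly 50 C per layer cited above, while at DRAM-like power densities the drop per layer is a fraction of a degree, which is why tall memory stacks are thermally plausible.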
Table 1 Approximate area-normalized thermal resistance for layers in stacked chip structures.

Structure                                       Thermal resistance (C-mm2/W)
PbSn solder balls 100 µm tall, 15% coverage     16
Cu balls 20 µm tall, 20% coverage               0.3
200 µm thick Si wafer                           1.6
10 µm thick SiO2 layer                          8
Total “typical” (solder balls)                  >25
Total “improved” (Cu balls, thin dielectric)    ~5

3.2 Test Approaches
The Automated Test Equipment (ATE) required to test advanced 3DI wafers in the future will be no different than that needed for testing standard, non-bonded 2D wafers today. However, for some 3DI bonding processes, the 3DI wafers will be aligned and bonded before metallization of the topmost layer, with the process repeated for each additional layer [11]. Without metallization of each IC layer prior to bonding, testing individual layers prior to bonding will not be possible. Contact pads, ESD, I/O, and test structures on individual layers will then not be necessary, and can be placed on a single dedicated layer, the Peripheral and Test Layer (PTL). The debate remains open as to where the PTL should be placed in the stack, where the contact pads should be located (topmost layer or backside of substrate), and how the signals should be routed through the 3DI assembly. If the PTL is placed first on the stack and the contact pads are located on the backside of the substrate, there will be significant opportunities for developing new test Front-End-Hardware (FEH), methodologies and processes for improving 3DI quality, throughput and yield. Locating the contact pads on the backside of the substrate will enable the continuous monitoring of the bonding process as well as testing of the assembly prior to completion. If the contact pads are located on the topmost layer, as is done with standard, non-bonded 2D wafers today, test will continue to be the last step in the process, and the FEH, methodologies, processes and value added by test will remain largely unchanged.

3.3 EDA Enablements for 3D
A fundamental shift in the technology has occurred beyond 90nm CMOS, where the interconnect resistance has been increasing significantly enough to cause a repeater explosion problem. This problem translates into not only significant area overhead but also power, as repeaters are among the leakiest circuit topologies. 3D technology has the potential of easing the challenge of repeater explosion (Figure 9).

Figure 9 Repeater explosion due to metal resistance increase with CMOS scaling

In order to exploit the full potential of 3D technology, new challenges in the area of physical design [17][18], thermal analysis [15][16], and system level design and analysis need to be addressed [13]. 3D interconnects have the potential of significantly reducing critical path delays, which are typically between memory and the interfacing logic.

Figure 10 Sweet spot of 3D partitioning when considering 3D through via impact

New tools that consider thermally aware physical design implementations, most importantly at the architecture and SoC level, are crucial to the success of 3D, as thermal issues are exacerbated in 3D implementations [12]. To justify the cost and complexity overhead of 3D technology, it is essential to study the benefit of 3D early in the design cycle. This requires strong linkage between architecture level analysis tools and 3D physical planning tools. Most of the advantages of 3D will be utilized with new system architectures and physical implementations. Therefore, the tools to aid 3D implementation must also operate at the higher level, in addition to the 3D place and route algorithms that have been proposed in the literature before. In fact, in our view, the benefits from 3D place and route will be limited, since current 2D designs do a fairly good job of optimizing the critical path distance. There is a very strong need for 3D architectural and physical planning tools that operate in the domain of thermal, physical, and performance analysis in order to yield an optimized system implementation in 3D technology [19][20][21][22]. Most of the studies reporting huge benefits from 3D for wire length [12] do not adequately consider the physical impact of vertical vias. It is crucial to consider the impact of vertical vias on the physical design of ICs, from the point of view of area, latency, and thermal impact. Figure 10 shows that the sweet spot of partitioning for 3D implementation lies at the
unit level (where a unit is a large logical entity such as floating point logic or instruction decode logic, etc.) and beyond, when considering the via impact.

In addition to 3D design and implementation tools, there are important and challenging issues in 3D test and yield that must be addressed as well. It is well known that yield has a quadratic dependency on die size, and a linear dependency on chip count at a given die size. 3D designs may incur some yield loss due to vertical vias, and may gain some yield due to density. One of the benefits of 3D is that this technology is compatible with the known-good-die practices, a known contributor to cost reduction and test simplification.

4. References
[1] Takahashi, K. et al., “Process Integration of 3D Chip Stack with Vertical Interconnection,” Proceedings of the 54th Electronic Components and Technology Conference, Las Vegas, NV, pp. 601-609, 2004.
[2] Bower, C.A. et al., “High Density Vertical Interconnects for 3D Integration of Silicon Integrated Circuits,” Proceedings of the 56th Electronic Components and Technology Conference, San Diego, CA, pp. 399-403, 2006.
[3] Wright, S.L. et al., “Characterization of Micro-bump C4 Interconnects for Si-Carrier SOP Applications,” Proceedings 56th Electronic Components and Technology Conference, San Diego, CA, pp. 633-640, 2006.
[4] Sakuma, K. et al., “3D Chip Stacking Technology with Low-Volume Lead-Free Interconnections,” to be published in the Proceedings 57th Electronic Components and Technology Conference, Reno, NV, May 29 – June 1, 2007.
[5] Emma, P.G., “How Bandwidth Works in Computers,” Chapter 11 in High Performance Energy Efficient Microprocessor Design, edited by V.G. Oklobdzija and R. Krishnamurthy, published by Springer, Feb. 2006.
[6] S. Hanson, et al., “Ultralow-voltage minimum-energy CMOS,” IBM J. Res. and Dev., vol. 50, no. 4/5, pp. 469-488, 2006.
[7] A. Bryant, et al., “Low power CMOS at Vdd=4KT/q,” Proceedings of the Device Research Conference (Notre Dame, IN), June 2001, pp. 22-23, 2001.
[8] H.P. Hofstee, “Power-constrained microprocessor design,” Proc. 2002 IEEE Int. Conf. on Computer Design: VLSI in Computers and Processors (San Jose, CA), Sept. 2002, pp. 14-16, 2002.
[9] H.P. Hofstee, “Power efficient processor architecture and the cell processor,” Proc. 11th Int. Symp. on High-Performance Computer Architecture (San Francisco, CA), Feb. 2005, pp. 258-262, 2005.
[10] B. Dang, et al., “Integrated Thermal-Fluidic I/O Interconnects for an On-Chip Microchannel Heat Sink,” Electron Device Letters, Vol. 27, pp. 117-119, Feb. 2006.
[11] K. Bernstein, et al., “Introduction to 3D Integration,” ISSCC '06 Tutorial 3, February 2006.
[12] J. Cong, et al., “An Automated Design Flow for 3D Microarchitecture Evaluation,” Proc. of Asia Pacific DAC 2006, pp. 384-389, 2006.
[13] A. Rahman, et al., “Wire length distribution of 3-D ICs,” Proc. of IEEE Intl. conference on interconnect technology 1999, pp. 671-678, 1999.
[14] S. Das, et al., “Design Tools for 3-D ICs,” Proc. of Asia-Pacific DAC 2003, pp. 53-56, 2003.
[15] J. Cong, et al., “A thermal driven floorplanning algorithm for 3-D ICs,” Proc. of ICCAD 2004, pp. 306-313, 2004.
[16] J. Cong, et al., “Thermal driven multi-level routing for 3-D ICs,” Proc. of Asia-Pacific DAC 2005, pp. 121-126, 2005.
[17] C. Ababei, et al., “Placement and Routing in 3D ICs,” IEEE Design & Test, Nov-Dec. 2005, pp. 520-531, 2005.
[18] S.K. Lim, “Physical Design for 3D system on package,” IEEE Design & Test, Nov-Dec. 2005, pp. 532-539, 2005.
[19] G.L. Loi, et al., “A Thermally aware performance analysis of vertically integrated (3D) processor memory Hierarchy,” Proceedings, 43rd Design Automation Conf. (DAC), pp. 991-996, 2006.
[20] O. Ozturk, et al., “Optimal Topology Exploration for Application-Specific 3D Architectures,” Proc. of Asia Pacific DAC 2006, pp. 390-395, 2006.
[21] J. Kim, et al., “A Novel Dimensionally-Decomposed Router for On-Chip Communication in 3D Architectures,” to appear in Proc of International Symposium on Computer Architecture (ISCA) 2007.
[22] W.-L. Hung, et al., “Interconnect and Thermal-aware Floorplanning for 3D Microprocessors,” International Symposium on Quality Electronic Design (ISQED), 2006, pp. 98-104, 2006.