Rakesh Anigundi (1), Hongbin Sun (2), Jian-Qiang Lu (1), Ken Rose (1), and Tong Zhang (1)
(1) Rensselaer Polytechnic Institute, Troy, NY, USA
(2) Xi’an Jiaotong University, Xi’an, Shaanxi, P. R. China
Abstract: Motivated by increasingly promising three-dimensional (3D) integration technologies, this paper reports an architecture design of 3D integrated dynamic RAM (DRAM). To accommodate the potentially significant pitch mismatch between DRAM word-line/bit-line and through silicon vias (TSVs) for 3D integration, this paper presents two modestly different coarse-grained inter-sub-array 3D DRAM architecture partitioning strategies. Furthermore, to mitigate the potential yield loss induced by 3D integration, we propose an inter-die inter-sub-array redundancy repair approach to improve the memory repair success rate. For the purpose of evaluation, we modified CACTI 5 to support the proposed coarse-grained 3D partitioning strategies. Estimation results show that, for the realization of a 1Gb DRAM with 8 banks and 256-bit data I/O, such 3D DRAM design strategies can effectively reduce the silicon area, access latency, and energy consumption compared with 3D packaging with wire bonding and conventional 2D design. We further developed a memory redundancy repair simulator to demonstrate the effectiveness of the proposed inter-die inter-sub-array redundancy repair approach.

Keywords: 3D integration, DRAM

I. INTRODUCTION

Three-dimensional (3D) integration refers to a family of integration technologies including packaging-based 3D integration such as system-in-package (SiP) and package-on-package (PoP), die-to-die and die-to-wafer 3D integration, and wafer-level back-end-of-the-line (BEOL)-compatible 3D integration. Because of its many compelling advantages, particularly its massive inter-die interconnect bandwidth and low cost potential, wafer-level BEOL-compatible 3D integration using through silicon vias (TSVs) has received tremendous recent attention and experienced significant development. This work is interested in 3D DRAM architecture design using wafer-level BEOL-compatible 3D integration technology. The motivation is two-fold: (i) Performance of high capacity DRAM is increasingly dominated by global interconnect, while one of the major advantages of 3D integration is the potential to largely reduce global interconnect length. (ii) Performance of computer systems is increasingly limited by memory access bandwidth and latency. 3D integration of processor and DRAM appears to be one of the most promising solutions to address this issue, where appropriate 3D DRAM architecture design is indispensable.

3D SRAM design has been addressed in [2], where word-line and bit-line 3D partitioning has been explored to improve SRAM speed/energy performance and evaluated based on CACTI 3, an earlier version of the widely used memory modeling tool CACTI. A so-called “true” 3D DRAM design strategy has been announced by Tezzaron Corporation [4], where DRAM cell arrays and DRAM peripheral circuits are located on separate dies so that high-performance logic die(s) can be used to implement DRAM peripheral circuits to improve the speed and reduce overall silicon area. Both prior works [2, 4] essentially use intra-sub-array 3D partitioning, in which each memory word-line and/or bit-line associates with at least one TSV. Although impressive improvement may be achieved, such intra-sub-array 3D partitioning tends to demand fabricating a relatively large number of TSVs with small pitch. This may put an increasingly stringent constraint on TSV fabrication as the technology scales down, particularly for DRAMs.

With the objective to relax the TSV fabrication constraints, this paper focuses on coarse-grained inter-sub-array 3D partitioning, in which only the address and/or data I/O wires of each memory sub-array associate with TSVs and the entire memory sub-array, including cell array and peripheral circuits, remains exactly the same as in 2D design. In particular, we present two modestly different inter-sub-array 3D partitioning strategies that represent different TSV complexity vs.
memory access energy trade-off. Moreover, due to the lack of known-good-die (KGD) in wafer-level 3D integration, yield improvement is a critical issue. In this regard, we propose to extend conventional 2D memory redundancy repair with 3D inter-die inter-sub-array redundancy repair to largely improve the redundancy repair success rate in 3D DRAMs. For the purpose of evaluation, we modified CACTI 5, the latest version of CACTI, to support the proposed inter-sub-array 3D partitioning strategies, and used it to show their effectiveness on the design of a 1Gb DRAM with 8 banks and 256-bit data I/O. Finally, we developed a memory defect modeling and redundancy repair simulator to demonstrate the effectiveness of the proposed inter-die inter-sub-array redundancy repair approach.

II. BACKGROUND AND PRIOR WORK

A. 3D Integration Technology

In general, 3D integration refers to a variety of technologies which provide electrical connectivity between multiple stacked active device planes. Various 3D integration technologies are currently pursued and can be divided into three categories:

(a) 3D packaging technology: It is enabled by wire bonding, flip-chip bonding, and thinned die-to-die bonding. As the most mature 3D integration technology, it is already being used in many commercial products, noticeably in cell phones. Its major limitation is very low inter-die interconnect density (e.g., only a few hundred inter-die bonding wires) compared to the other emerging 3D integration technologies.

(b) Transistor build-up 3D technology: It forms transistors layer by layer, on poly-silicon films, or on single-crystal silicon films. Although a drastically high vertical interconnect density can be realized, it is not readily compatible with existing fabrication processes and is subject to severe process temperature constraints that tend to degrade the circuit electrical performance.

(c) Monolithic, wafer-level, back-end-of-the-line (BEOL) compatible 3D technology: It is enabled by wafer alignment, bonding, thinning and inter-wafer interconnections. Realized by TSVs, inter-die interconnects can have very high density, e.g., Patti demonstrated the feasibility of fabricating TSVs with 4µm pitch.

Wafer-level BEOL-compatible 3D integration appears to be the most promising option for high-volume production of highly integrated systems. From an IC design perspective, it can provide several distinct advantages, including integration of disparate technologies, massive inter-die bandwidth and wirelength reduction. Therefore, this work focuses on the use of wafer-level BEOL-compatible 3D integration technology to design 3D integrated DRAM.

B. DRAM Architecture

In current design practice, high capacity DRAMs have a hierarchical architecture consisting of banks, sub-banks, and sub-arrays. With its own address and data bus, each DRAM bank can be accessed independently from the other banks. Each bank is divided into sub-banks, and the data are read/written from/to one sub-bank during each memory access. Each sub-bank is further divided into sub-arrays, and each sub-array contains an indivisible array of DRAM cells surrounded by supporting peripheral circuits such as word-line decoders/drivers, sense amplifiers (SAs), and output drivers. Detailed discussions on modern DRAM circuit and architecture design are available in the literature.

DRAM cells, peripheral circuits, and interconnect, including H-tree outside/inside banks and sub-banks and the associated buffers, altogether determine the overall DRAM performance, including silicon area, access latency, and energy consumption. As the technology scales, interconnect tends to play an increasingly important role, particularly for high capacity DRAMs. CACTI 5 is the latest version of the widely recognized CACTI tool that can integrally optimize and estimate SRAM/DRAM cache and memory access time, cycle time, area, leakage, and dynamic power. For the purpose of demonstration, using a 1Gb DRAM as an example, we use CACTI 5 to estimate the area, latency and energy and the contributions of different components at the 65nm technology node, as shown in Fig. 1. These results clearly show the significant role of interconnect in determining the overall DRAM performance. Because 3D integration could reduce global interconnect, it is expected that a 3D integrated DRAM may achieve better overall performance than its conventional 2D counterpart.

Fig. 1. Estimated results of (a) area, (b) access latency, and (c) energy consumption for an example conventional 1Gb 2D DRAM using CACTI 5 at 65nm technology node based on ITRS projection.
[Figure: per-component breakdown. (a) Area: memory cells, H-tree, wordline stitching, peripheral circuits. (b) Access latency: H-tree, decoders, bitlines, SA and others. (c) Energy consumption: H-tree, bitlines, SAs, decoders and others.]

C. 3D Memory Integration

3D integration of digital memory has been considered in prior work [2, 4, 10–14], mainly driven by the great potential of using 3D memory-processor integration to improve overall computing system performance. One option for 3D memory integration is to directly stack several memory dies connected with high bandwidth through-silicon vias, where all the memory dies are designed separately using the conventional 2D SRAM or commodity DRAM design practice. Such direct memory stacking has been assumed in [12–14]. Intuitively, although this option requires almost no changes on the memory circuit and architecture level, it may not be able to exploit the potential benefit of 3D integration to its full extent.

Tsai et al. [2] evaluated two 3D SRAM design strategies, which partition word-line and bit-line in the 3D domain, respectively, and explored the corresponding 3D SRAM performance space by modifying CACTI 3. The results have also been used to study 3D memory-processor integration. Loh investigated the potential of 3D memory-processor integration by using a so-called “true” 3D DRAM design strategy announced by Tezzaron Corporation (see [4]). The key feature is to put DRAM cell arrays and DRAM peripheral circuits on separate dies so that high-performance logic die(s) can be used to implement DRAM peripheral circuits to improve the speed and reduce overall silicon area.

The 3D memory design strategies explored in [2, 4] essentially used intra-sub-array 3D partitioning where each memory word-line and/or bit-line associates with at least one TSV. This requires the fabrication of a relatively large number of TSVs with small pitch, which tends to put an increasingly stringent constraint on TSV pitch as the technology scales down, particularly for DRAMs.

III. PROPOSED 3D DRAM ARCHITECTURE DESIGN

The objective of this work is to develop a scalable design solution that can seamlessly accommodate a potentially significant mismatch between the ever shrinking DRAM word-line/bit-line pitch and TSV pitch. Therefore, compared with prior work [2, 4], this work targets much more coarse-grained partitioning in the 3D domain. The prior work [2, 4] essentially uses intra-sub-array 3D partitioning (i.e., splitting each individual memory sub-array, including the memory cells and peripheral circuits, across several dies). This work moves the 3D partitioning up to the inter-sub-array level, leading to a much lower number of TSVs and less stringent constraints on TSV pitch. Moreover, as we will show later, it may enable a higher degree of flexibility to carry out redundant row/column repair and hence improve the overall 3D memory yield. Following the above intuitive discussions, this section will present two possible realizations of inter-sub-array 3D DRAM partitioning and a simple inter-die redundancy repair technique.

A. Inter-Sub-Array 3D Partitioning

Fig. 2 illustrates the top view of a 3D memory with inter-sub-array 3D partitioning, which looks exactly the same as its 2D counterpart except that each leaf is a 3D sub-array set. Let n denote the number of memory dies being integrated. Hence each 3D sub-array set contains n individual 2D sub-arrays, and each 2D sub-array contains the memory cells and its own complete peripheral circuits such as decoders, drivers, and SAs. Within each 3D sub-array set, all the 2D sub-arrays only share address and data input/output through TSVs, which clearly requires a much lower number of TSVs than the intra-sub-array 3D partitioning. The global address/data routing outside and inside each bank is simply distributed across all the n layers through TSVs. Compared with finer-grained 3D partitioning, such inter-sub-array 3D partitioning tends to have two advantages, including
• Since each individual memory sub-array is retained in one layer, we can simply keep exactly the same sub-array circuit design as in the current practice, and the operational characteristics of each sub-array are insensitive to the parasitics of TSVs.
• It does not affect the critical redundant row/column repair for defect tolerance within each 2D sub-array. In fact, as we will show later, the SLDA design option can enable a more flexible use of redundancy repair within each sub-array set to improve the defect tolerance.

Fig. 2. Illustrated top view of a 3D DRAM with inter-sub-array 3D partitioning.
[Figure: four banks, each containing several 3D sub-array sets, connected by H-tree routing to the TSV I/Os.]

Let N_data and N_add denote the data access and address input bandwidth of each 3D sub-array set, and recall that n denotes the number of memory layers. Without loss of generality, we assume N_data and N_add are divisible by n. Fig. 3 (a) and (b) show two different options in realizing each 3D sub-array set, which are referred to as single-layer data access (SLDA) and multiple-layer data access (MLDA), respectively. In both options, the N_add-bit address bus is uniformly distributed across all the n layers and is shared by all the n 2D sub-arrays within the same 3D sub-array set through a TSV bundle, and the N_data-bit data bus is uniformly distributed across all the n layers outside the sub-array set, as illustrated in Fig. 3. The essential difference between these two options is that, in SLDA only one 2D sub-array handles the read/write of all N_data bits at a time, while all the n 2D sub-arrays participate in the write/read operations in MLDA, i.e., each 2D sub-array handles the read/write of N_data/n bits. In SLDA, we use log2(n) address bits to determine which layer is activated to handle the present read/write operation.

These two design options, SLDA and MLDA, have different TSV complexity vs. energy consumption trade-offs: the SLDA design option demands the realization of N_data TSVs across the n layers to distribute the N_data-bit output from each layer across all the n layers, while the MLDA design option does not incur any TSVs on the data bus. On the other hand, only one layer is activated at a time in SLDA, while all the n layers are activated for each access in MLDA, leading to much higher energy consumed inside each sub-array set.

Fig. 3. Illustration of a 3D sub-array set with 4 layers using (a) single-layer data access (SLDA) in which only one sub-array is activated at a time, and (b) multi-layer data access (MLDA) in which all the sub-arrays are activated.
[Figure: in both (a) and (b), each layer holds a memory cell array with peripheral circuits and carries N_add/4 bits of the distributed N_add-bit address bus, joined by a TSV bundle, and N_data/4 bits of the distributed N_data-bit data bus; (a) additionally has an N_data TSV bundle on the data side.]

B. 3D DRAM Redundancy Repair

Since very aggressive design rules are typically applied to DRAM design, DRAM tends to be much more prone to manufacturing and even post-manufacturing defects compared to other types of integrated circuits. Hence it is a common practice to use redundancy repair to improve the DRAM yield. Meanwhile, for wafer-level BEOL-compatible 3D integration, potential yield loss is certainly a concern, since KGD technology cannot be used as in 3D packaging. For digital memories, redundant row/column repair has been widely used to realize defect tolerance and hence improve the yield, and various memory testing and repair analysis methods have been well studied in the open literature. In current 2D design practice, each redundant row or column is used by one memory sub-array or shared by a few spatially adjacent memory sub-arrays. It is reasonable to expect that sharing redundant rows/columns within the 3D domain may further improve the defect tolerance and hence boost the overall 3D memory yield.

In current 2D design practice, redundancy repair occurs within each individual memory sub-array or across a few spatially adjacent sub-arrays. Generally speaking, redundancy repair across more sub-arrays can achieve higher repair success rates but may result in higher interconnect cost and even an overall memory access latency penalty due to the larger area to be covered. Intuitively, if we consider the redundancy repair vertically rather than horizontally across several memory sub-arrays, in our 3D DRAM design, we can still achieve higher repair success rates and at the same time largely reduce the interconnect cost. This motivates us to propose redundancy repair across all the DRAM sub-arrays within each 3D sub-array set, as illustrated in Fig. 4, in which we can use one redundant row in layer 2 to repair one defective row in layer 1 if all the redundant rows in layer 1 have been used up.

In order to support such an inter-sub-array redundancy repair strategy within each 3D sub-array set, the sub-arrays within the same sub-array set should not all be accessed at the same time. Therefore, we can only apply the SLDA design option in this context. In case no redundant row/column in one sub-array has been used to repair another sub-array, every time we only need to activate one sub-array to perform the read/write operation. Otherwise, since we may not know beforehand which layer is responsible for the present memory operation, we must activate all the sub-arrays. Clearly, in this context we trade memory access energy consumption for enabling such inter-die inter-sub-array redundancy repair to further improve the memory defect tolerance.

Fig. 4. Illustration of inter-sub-array redundancy repair, which enables the use of one redundant row in layer 2 to repair one defective row in layer 1 if all the redundant rows in layer 1 have been used up.
[Figure: layer 1 and layer 2 on a shared address bus, showing defect cells, redundant rows, and the repair path between layers.]

IV. EXPERIMENTAL RESULTS

In this work, we use a 1Gb DRAM with 8 banks and 256-bit data I/O as a test vehicle to evaluate the above presented 3D DRAM design approaches. We modified CACTI 5 to support the 3D DRAM architecture design approaches discussed in the previous section. Because of the use of inter-sub-array 3D partitioning, all the banks and sub-banks are uniformly distributed over all the DRAM layers, as illustrated in Fig. 2. Hence this modified CACTI tool assumes a 3D bank → 3D sub-bank → 3D sub-array set hierarchy. Furthermore, as pointed out earlier, such coarse-grained inter-sub-array 3D partitioning demands a very small number of TSVs, hence it can greatly relax the TSV size/pitch constraints. A TSV size of 10µm×10µm is used, and we assume its resistance can be ignored due to its relatively large size.

For the purpose of comparison, we also evaluate two other design options, including the conventional 2D design and 3D packaging of separate DRAM dies without the use of TSVs (referred to as 3D die packaging). For the purpose of simplicity, we did not take into account the effects of the bonding wire parasitics in 3D die packaging on the delay and energy analysis. Since the parasitics in 3D die packaging are likely to introduce significant delay, our analysis below represents a pessimistic estimation of the benefit of TSV-based 3D integration over 3D die packaging. For all the design options, we set the CACTI area vs. speed trade-off parameter as 50%:50%, i.e., CACTI will weight the area and speed performance metrics equally during its internal optimization process.

Fig. 5 shows the estimated 3D 1Gb DRAM footprint, access latency, and energy consumption using the proposed SLDA and MLDA design options and 3D die packaging under the 65nm technology node based on ITRS projection. These results are also compared against the conventional 2D design. The obvious advantages of 3D design over 2D design are expected because of the dominant impact of global routing in DRAM as illustrated in Fig. 1. The advantage of the proposed design options using wafer-level 3D integration over 3D die packaging is mainly due to the fact that the global routing is distributed over all the DRAM dies in the proposed design options while all the DRAM dies in 3D die packaging are completely separate and have their own individual global routing. Since the MLDA design option incurs fewer TSVs and correspondingly each individual sub-array has less data output compared with the SLDA design option, MLDA can realize slightly better area and access latency performance, as shown in Fig. 5(a) and (b). On the other hand, within one 3D sub-array set, MLDA activates all the sub-arrays while SLDA only activates one sub-array, hence SLDA leads to less energy consumption, as shown in Fig. 5(c).

Fig. 5. Estimated results of (a) footprint, (b) access latency, and (c) energy consumption for the 1Gb DRAM design using different design approaches at 65nm node based on ITRS projection.
[Figure: bar charts for 2-layer, 4-layer, and 8-layer designs comparing SLDA, MLDA, and 3D die packaging. (a) Footprint (mm^2), with the 2D design at 92 mm^2. (b) Access latency (ns), with the 2D design at 26 ns. (c) Energy (nJ), with the 2D design at 2.2 nJ.]

Finally, we developed a redundancy repair simulator to evaluate the proposed inter-die inter-sub-array redundancy repair approach. We only consider random DRAM cell defects with a defect density of 0.05%, and use the repair-most redundancy analysis approach that always allocates the spare row/column to repair the memory row/column with the maximum number of defects. We further assume that the entire defect map for each sub-array is available to ensure the highest repair success rate. In our simulation, we simulated a total of 1500 3D sub-array sets, where each 3D sub-array set has four 2D sub-arrays and the size of each 2D sub-array is 1024 × 256. Fig. 6 shows the simulation results, which clearly demonstrate the advantage of enabling such an inter-die inter-sub-array redundancy repair strategy.

Fig. 6. Simulated results of repair success rates.
[Figure: repair success rate (0% to 100%) versus redundancy (row, col) from (48, 48) to (54, 54), without and with inter-die inter-sub-array redundancy repair.]

V. SUMMARY AND CONCLUSIONS

This paper investigated the architecture design of 3D integrated DRAM using wafer-level 3D integration technology. The key is to focus on coarse-grained inter-sub-array 3D memory partitioning in order to accommodate the potentially significant pitch mismatch between DRAM word-line/bit-line and TSVs. Two inter-sub-array 3D partitioning approaches are presented, targeting different TSV complexity vs. memory access energy trade-offs. Moreover, an inter-die inter-sub-array redundancy repair approach has been proposed to improve the 3D DRAM redundancy repair efficiency. CACTI 5 has been modified to evaluate the proposed 3D partitioning strategies, and estimation results show encouraging performance advantages over direct 3D die packaging and conventional 2D design. Finally, a memory redundancy repair simulator has been developed to demonstrate the higher repair success rate achieved by the proposed inter-die inter-sub-array redundancy repair approach.

REFERENCES

J.-Q. Lu, K. Rose, and S. Vitkavage, “3D Integration: Why, What, Who, When?,” Future Fab International (http://www.future-fab.com/), pp. 25–27, July 2007.
Y.-F. Tsai, F. Wang, Y. Xie, N. Vijaykrishnan, and M. J. Irwin, “Design Space Exploration for 3-D Cache,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 16, pp. 444–455, April 2008.
CACTI: An integrated cache and memory access time, cycle time, area, leakage, and dynamic power model. http://www.hpl.hp.com/research/cacti/.
Tezzaron Semiconductors, 3D Stacked DRAM. http://www.tachyonsemi.com/memory/Overview 3D DRAM.htm.
T. Claasen, “An Industry Perspective on Current and Future State of the Art in System-on-Chip (SoC) Technology,” Proceedings of the IEEE, vol. 94, pp. 1121–1137, June 2006.
J.-Q. Lu, T. S. Cale, and R. J. Gutmann, “Wafer-level three-dimensional hyper-integration technology using dielectric adhesive wafer bonding,” Materials for Information Technology: Devices, Interconnects and Packaging (Eds. E. Zschech, C. Whelan, T. Mikolajick), pp. 386–397, Springer-Verlag (London) Ltd, August 2005.
R. Patti, “Three-dimensional integrated circuits and the future of system-on-chip designs,” Proceedings of the IEEE, vol. 94, pp. 1214–1224, June 2006.
K. Itoh, VLSI Memory Chip Design, Springer, 2001.
S. Thoziyoor, N. Muralimanohar, J. H. Ahn, and N. Jouppi, “CACTI 5,” Advanced Architecture Laboratory, HP Laboratories, HPL-2007-167, http://www.hpl.hp.com/techreports/2008/HPL-2008-20.html.
G. H. Loh, “3D-Stacked Memory Architectures for Multi-Core Processors,” in Proceedings of the 35th ACM/IEEE International Symposium on Computer Architecture, 2008.
G. H. Loh, Y. Xie, and B. Black, “Processor Design in 3D Die-Stacking Technologies,” IEEE Micro, vol. 27, pp. 31–48, May-June 2007.
C. C. Liu, I. Ganusov, M. Burtscher, and S. Tiwari, “Bridging the Processor-Memory Performance Gap with 3D IC Technology,” IEEE Design and Test of Computers, vol. 22, pp. 556–564, November-December 2005.
B. Black et al., “Die Stacking (3D) Microarchitecture,” in Proc. of Annual IEEE/ACM International Symposium on Microarchitecture, pp. 469–479, 2006.
T. Kgil et al., “PicoServer: Using 3D Stacking Technology to Enable a Compact Energy Efficient Chip Multiprocessor,” in Proceedings of the 12th Symp. on Architectural Support for Programming Languages and Operating Systems, Oct. 2006.
K. Chakraborty and P. Mazumder, Fault-Tolerance and Reliability Techniques for High-Density Random-Access Memories, Prentice Hall, 2002.
M. Tarr, D. Boudreau, and R. Murphy, “Defect analysis system speeds test and repair of redundant memories,” Electronics, pp. 175–179, Jan. 1984.

978-1-4244-2953-0/09/$25.00 ©2009 IEEE
10th Int'l Symposium on Quality Electronic Design
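The redundancy repair experiment of Section IV can be illustrated with a minimal Python sketch. It is not the authors' simulator: the function names (repair_most, repair_set, random_defects, success_rates) are hypothetical, each sub-array receives a fixed number of random defective cells instead of a per-cell defect density, and inter-die repair is modeled under the simplifying assumption that all spare rows and columns in a 3D sub-array set form one shared pool. The greedy rule follows the repair-most idea described in the text: always spend a spare on the row or column that currently has the most defects.

```python
import random
from collections import Counter


def repair_most(defect_maps, spare_rows, spare_cols):
    """Greedy repair-most allocation over sub-arrays sharing one spare pool.

    defect_maps: list of sets of (row, col) defective cells, one per sub-array.
    Returns True if every defective cell is covered by a spare row or column.
    """
    remaining = [set(cells) for cells in defect_maps]
    while any(remaining):
        best = None  # (defect_count, kind, layer, index)
        for layer, cells in enumerate(remaining):
            if spare_rows > 0:
                for r, count in Counter(r for r, _ in cells).items():
                    if best is None or count > best[0]:
                        best = (count, "row", layer, r)
            if spare_cols > 0:
                for c, count in Counter(c for _, c in cells).items():
                    if best is None or count > best[0]:
                        best = (count, "col", layer, c)
        if best is None:
            return False  # defects remain but no spare is left
        _, kind, layer, index = best
        if kind == "row":
            remaining[layer] = {(r, c) for r, c in remaining[layer] if r != index}
            spare_rows -= 1
        else:
            remaining[layer] = {(r, c) for r, c in remaining[layer] if c != index}
            spare_cols -= 1
    return True


def repair_set(defect_maps, spare_rows, spare_cols, inter_die):
    """Repair one 3D sub-array set; spare counts are per 2D sub-array."""
    if inter_die:
        # Assumption: spares of all layers are pooled across the set.
        n = len(defect_maps)
        return repair_most(defect_maps, n * spare_rows, n * spare_cols)
    # Conventional case: each sub-array may only use its own spares.
    return all(repair_most([cells], spare_rows, spare_cols) for cells in defect_maps)


def random_defects(rows, cols, n_defects, rng):
    """Pick n_defects distinct defective cells uniformly at random."""
    return {divmod(cell, cols) for cell in rng.sample(range(rows * cols), n_defects)}


def success_rates(n_sets, n_layers, rows, cols, n_defects,
                  spare_rows, spare_cols, seed=0):
    """Return (rate without inter-die repair, rate with inter-die repair)."""
    rng = random.Random(seed)
    local = shared = 0
    for _ in range(n_sets):
        maps = [random_defects(rows, cols, n_defects, rng) for _ in range(n_layers)]
        local += repair_set(maps, spare_rows, spare_cols, inter_die=False)
        shared += repair_set(maps, spare_rows, spare_cols, inter_die=True)
    return local / n_sets, shared / n_sets
```

For a quick experiment, a call such as success_rates(20, 4, 64, 32, 8, 3, 3) runs in well under a second. The configuration in the text (1500 sets of four 1024 × 256 sub-arrays, where a 0.05% defect density corresponds to about 131 defective cells per sub-array on average) is much slower in pure Python, and this simplified model should not be expected to reproduce the values of Fig. 6.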