You are on page 1of 6

Buffer Insertion for Clock Delay and Skew Minimization

X. Zeng1, 2, D. Zhou1, Wei Li1

Department of Electrical and Computer Engineering University of North Carolina at Charlotte, NC 28223 of Electronic Engineering Fudan University, Shanghai 200433, China

Abstract  Buffer insertion is an effective approach to achieve both minimal clock signal delay and skew in high speed VLSI circuit design. In this paper, we develop an optimal buffer insertion and sizing scheme. Particularly, due to the buffer to buffer delay is a convex function of buffer positions in a clock tree, we show that the minimal clock delay can be obtained by equalizing derivatives of this convex function, and the minimal skew can be obtained by equalizing delay functions of different source to sink paths. Based on this theory, we further develop a three-stage method to initially insert buffers in a given clock routing tree, minimize delay by optimizing buffer positions, and minimize skew by buffer level augment and buffer size refinement. The presented algorithm achieves both minimal delay and skew in real clock tree design. 1. Introduction In the coming century, the clock rate of microprocessors will approach multi-GHz. This trend will last as process technology moves into the deep submicron level. The drastically increased requirement for high performance and high speed VLSI circuits has posed challenges to the design of high speed clock network, where clock delay and skew minimization has been a critical problem. Clock delay and skew can be optimized by good routing strategy [1][2] and effective buffer insertion [3][5]. Early research [1][2] have proposed algorithms to minimize skew by properly balance the length of the path from source to sinks. To further reduce the clock signal delay and transmission line effect, buffer insertion is necessary. The objective of buffer insertion is to find the proper buffer numbers, sizes and placement in a given clock tree. The fix-position algorithms proposed in [6][7] insert buffers at either branching nodes in a clock tree or directly after the branching nodes. However, the optimal buffer positions for minimal delay do not necessarily reside on those places as shown by our optimal buffer position theory in this paper. Algorithms proposed in [6][8] insert the same number of buffers in each sourceto-sink path. They further assume the buffers at the same level have the same size. Undoubtedly this buffer insertion strategy helps to reduce skew sensitivity to process variations [9], but this strategy requires an almost balanced routing tree topology. Otherwise, it can not effectively reduce the skew to any specified toler* This research is supported in part by AirForce Office of Scientific Research grant F49620-96-1-0341, NSF NYI Award MIP9457402 and project 69806004 supported by NSFC.

ance. The balanced buffer insertion scheme [10] attempts to partition the clock tree into several subtrees such that every subtree has equal path-length and all source-to-sink paths have an equal number of levels. This scheme requires an equal path length routing tree. In many real design situation, it is impossible to construct an equal-path-length tree or well-balanced tree in a routing plane if there exist pre-placed cell modules [15]. Furthermore, an equalpath-length clock tree does not guarantee the minimal delay and skew due to the fact that circuit delay depends not only on the path length but also on the circuit topology [16]. In practical IC design, the delay and skew are usually determined by the system specification. It is unnecessary to achieve zero skew [11][12]. This paper endeavors to solve the optimal buffer insertion problem from both theoretical and practical design points of view. In contrast to the previous research, we aim to establish our buffer insertion theories, methodologies and algorithms based on real circuit design settings, where the clock routing tree is neither well balanced nor equal-length. 2. Definitions and Delay Model We start by giving a synopsis of the basic definitions which will be used throughout this paper. 2.1 Definition A tree is denoted by T(V,E), where V is the set of nodes and E is the set of edges. Definition 1 (Unbuffered routing tree): An unbuffered routing tree (Figure 1(a)) is a directed binary tree T(V,E), which consists of wire set E and node set V={{s0} ∪ SI ∪ IN}, where s0 is the unique source node, IN is the set of branching nodes (or internal nodes) and SI is the set of sink nodes. Definition 2 (Buffered routing tree): After buffers have been inserted into the routing tree, the tree is called a buffered routing tree BT (Figure 1(b)), where the inserted buffers become the new nodes, called buffer nodes in the tree. The node set V of BT is then indicated by {{s0} ∪ SI ∪ IN ∪ BN}, where BN is the set of buffer nodes. Definition 3 (Wire): A wire ev (ev ∈ E) is a directed edge (u,v) (u,v ∈ V), where the signal propagates from u to v. Node u is called the parent of node v, and node v is called the child of node u. Definition 4 (Path): A path P(v0,vn) from node v0 to node vn in tree T is a sequence of nodes v0, v1, v2,..., vn∈V and wires e v1 , e v ,..., e v ∈V such that e v links v and v , where e v ≠ e v , i-1 i 2 n i i j ∀i, j, i ≠ j . Definition 5 (Buffer layer): In a clock tree T, if k buffers are inserted into a source-to-sink path P(s0,si), we say that there are klayer (k-level) of buffers on path P(s0,si). From source s0 to sink si, the j-th (j=1,2,...,k) buffer is called j-th layer (level) buffer. 2.2 Delay Model The delay model used in this paper is given in Figure 2. A

2) Delay Minimization Without adding more buffers or changing buffer sizes. The constant parameters. The objective of Figure 2: Delay model (a) A distributed RC line.(3) is used for the approximation. and then to find the optimized buffer positions and sizes. and le and we are the length and width of the wire e. respectively. r e ( le ) = r s ⋅ le ⁄ we = r 0 le c e ( l e ) = c a l e w e + 2c f ( l e + w e ) = ( c a w e + 2c f ) l e = c 0 l e where le >> we. in the following. bi+1 τ (v 0.. clock delay is minimized by moving buffer positions. their positions and sizes such that clock delay and skew are minimized. respectively. (b) The equivalent π-model.e v . The algorithm generates a buffered clock tree called BT. 3. and C(Tv) is the lumped capacitance of the subtree Tv rooted at node v. Buffer delay τb is given by the following equation. respectively. Input T0.(4) is the well known Elmore delay [4]. where r ev and c e v are the resistance and capacitance of the wire ev. one strategy is to fix the number of buffers first.Three-stage clock delay and skew optimization approach. The buffer is modeled by a standard RC network (Figure 2(c. We only need to consider this k-level buffer insertion problem since we can change the value of k from 1 to a desired number and repeat the same procedure to find the optimal value of k.v n) = ∑ τ ( ev ) i n (4) Eqn. . Therefore. (d) Buffer model. area capacitance and fringe capacitance of a unit-width unit-length wire segment. To solve the above defined buffer insertion problem. Resistance re and capacitance ce of wire e are given by Eqns. ce (3) τ ( e v ) = r e ⋅  -----v + C ( T v )  v  2 Supposing a path P(v0. where db. (4). respectively.d)). (1) and (2). we actually do not need to run this procedure many times. We call this problem a k-level buffer insertion problem. terms r0 and c0 respectively denote the resistance and the capacitance of an unit-length wire. which is further modeled by the equivalent π-model [13] as shown in Figure 2(b) (assuming properly segmented). RC re ce/2 (a) Buffer db cb (c) (d) (b) ce/2 rb Initial Buffer Insertion BT Delay minimization by buffer position optimization (1) (2) BTD Skew minimization by adding buffers and changing buffer size Output BTDS Figure 3. Figure 3 sketches our proposed three-stage buffer insertion approach for the k-level buffer insertion problem. Considering the number of buffers is upper bounded by a very small number in practical application.. determine the number of the inserted buffers. wire in a clock tree is modeled by a distributed RC line (Figure 2(a)). k is assumed to be a fixed value. Eqn. the path delay of P(v0. (c) A clock buffer. we first formulate the buffer insertion optimization problem. and then present the three-stage approach to achieve both delay and skew minimization. A clock tree before and after buffer insertion. 1) Initial Buffer Insertion The initial buffer insertion algorithm inserts k level of buffers in each source-to-sink path for a given unbuffered clock tree T0. e v . Buffer Insertion Problem: Given a routed clock tree T0 and the buffer size range from the maximum value rbmax to the minimum value is given by Eqn. rb and cb are buffer’s intrinsic delay. 2 n bj u v ev s0 s0 : . For calculating the delay of wire ev which enters node v from its parent node. Delay and Skew Minimization by Buffer Insertion In this section. are the sheet consisting of a number of n wires e v1. b1 (a) Unbuffered routing tree (b) Buffered routing tree Figure 1.. k i=1 : : bi : . say k buffers in each path.. and db is taken as a nonzero value in our discussion and experiment. τb = d b + r b ⋅ C b (5) where Cb is buffer’s load capacitance. rs ca and cf. output resistance and input capacitance. From the above definition.

special treatment should be applied to ensure continuity of the delay function. In the following. To illustrate the complexity of this problem.(8). (10) f ′1 ( x1 ) = f ′2 ( x2 ) The position of buffer Bj for minimal delay is therefore given by β 2 – β 1 + 2α 2 L j (11) 2 ( α1 + α2 ) We have the following optimal buffer position theory for delay minimization. an optimal solution of determining the number and position of buffers for driving a long uniform wire was developed in the previous work [14]. we analyze this problem using the circuit theory. the following equation holds.. vm+1 Bj+1 . 4. We minimize clock tree delay in a path by path fashion.. v m ) = α 1 x 1 + β 1 x 1 + Γ 1 (7) Similarly. which is a quadratic function of variable x1. si . and if necessary. vm and vm+q denote the three buffer nodes. v m ).1. Combining Eqns. x 1 = -------------------------------------5. as shown in Figure 4. for τ ( v 0. 5. the delay τ ( v m. when a buffer is moved over a branching node. However. according to The- . Suppose buffer Bj locates between vm-1 and vm+1.. x1 + x2 = L j (6) Using the delay model described in Section 2. The result of this stage is a tree with minimal delay and skew. add more buffers to minimize skew. However. The following theorem was proved in [16] for this algorithm. We first derive a theory of the optimal buffer position and then show the need for splitting a buffer when minimizing the path delay. 3) Skew Minimization In this stage. vm-1. consider one portion of a path P(s0. In this subsection. v m + q ) = α 2 x 2 + β 2 x 2 + Γ 2 2 (8) Note that parameters α 1. A buffered path. can be written as a function of x1 in Eqn. v1. (7) and (8) we obtain τ ( v 0. the sizes and positions of the buffers on the minimum delay path are kept unchanged. the path delay of a source-to-sink path achieves its minimum when the derivatives of all delay functions between two consecutive buffers on the path are the same. clock tree delay minimization by buffer placement is challenged by the fact that buffer position in one path affects the performance of the other paths. orem 2 buffer Bj will be moved continuously to the right along the wire in order to reduce the path delay. 5. s0 Bj-1 v0 v1 . vm+q-1.. if the optimal position is not in between the two adjacent branching from source s0 to sink si in a buffered clock tree. v m + q ) from buffer Bj to buffer Bj+1 can be written as a function of x2 in Eqn... In figure 5.1. From Theorem 2. a buffer can be continuously moved to its optimal position for delay minimization. Γ 2 are constants for the given path. v1 Bj B j1 Bj2 vm’ vm-1 vm . We use buffer sizing.. Bj+1 .1 Single Path Delay Minimization We now investigate how to achieve the optimal position when one buffer is moved between two branching nodes and all other buffers are fixed at their current position.. when it is moved down through a branching node to maintain the continuity of the delay function.. vm-1 vm vm+q-1 vm+q v0 vm+q-1 vm+q Figure 4... The wire length of e(vm-1. Bj-1 . we establish a theory for the optimal placement of buffers in a single path for achieving the minimal delay. called BTDS.. before buffer splitting.. v m + q ) the delay portion contributed by buffer Bj’s output resistance r b j and input capacitance c b j is given by τ 1 = R p c b + r b ( C p1 + C p2 ) (12) j j where RP is the resistance of path P(v0. Based on this theory. In such a case. v m + q ) in Eqn. i.. we present details of the above three procedures. In the following sections. vm+1). (9). the delay x1 x2 Figure 5. vm) and e(vm. Initial Buffer Insertion The initial buffer insertion procedure was presented in our previous paper [16]. v2.vm). Between buffer Bj-1 and Bj there are m – 1 ( m ≥ 1 ) branching nodes. f 2 ( x 2 ) = τ ( v m. further movement needs to be treated specifically. x1 Bj vm+1 x2 . All the buffers are chosen the same middle size.this stage is to generate a minimal delay buffered tree called BTD. the buffer may need to be moved over branching nodes.. Especially. vm+2.. Γ 1.2 Buffer Splitting Theory Consider a generic case shown in Figure 5. Theorem 2 (Optimal Buffer Position Theorem): Given a buffered clock tree. vm+1) is a constant Lj in the given routing tree.. Here we briefly describe it for the purpose of a self-contained paper. β 1.. the delay function is no longer a continuous function of the buffer movement. v m + q ) = α1 x1 + β1 x1 + Γ1 + α2 ( L j – x1 ) + β2 ( L j – x1 ) + Γ2 2 2 (9) The minimum delay of f ( x 1 ) is achieved when f ′ ( x 1 ) = 0 .. we discuss how to split a buffer into two smaller buffers.1 Buffer Position Optimization for Delay Minimization τ ( v 0.... from buffer Bj-1 to buffer Bj. respectively.. 2 f 1 ( x 1 ) = τ ( v 0. 5.. we start by examining the problem of buffer placement in a single path in a buffered tree.(7). Without loss of generality.e. Let v0. the delay from v0 to vm+q. Delay Minimization by Buffer Position Optimization To achieve the minimum delay. Theorem 1: The proposed level-by-level initial buffer insertion procedure inserts the same number of buffers in each sourceto-sink path. β 2. But when the buffer hit the branching node v m ′ . α 2. i. f ( x 1 ) = τ ( v 0.. Between buffer Bj and Bj+1 there are q – 1 ( q ≥ 1 ) branching nodes vm+1.e. CP1 and CP2 are . Hence. The procedure is a top-down (from source to sinks) and level-by-level method to insert k level of buffers in each source-to-sink path. Buffer splitting technique. Let x1 and x2 denote the wire length of e(vm-1. an iterative algorithm is consequently proposed to minimize the delay of a buffered clock tree.

si) by moving buffers on this path.1.2. buffer B11 can not be moved. end while. ∆X : buffer moving step. Buffer Bi1 can not be moved towards branching node v4. Let Pi denote the i-th path to be minimized and τ P be the delay of i path P . but can not be moved up through the branching nodes and will be split into two buffers when moved down through the branching nodes. which will increase the previously optimized minimum Repeat Step1 and Step2 until τ min can not be further reduced. 5. Procedure PathDelayMinimization (P(s0. In this example.W b j1 j j2 j j1 j2 j 1+γ 1+γ where L b . v m + q ) the delay portion τ 2 contributed by buffer Bj2’s input capacitance c b and output resistance r b is j2 j2 Table 1. L b . B2.2. we develop an iterative algorithm for single path delay minimization. Buffer Bi2 can be moved between v4 and v5.. i Step1. Let Pmin denote the minimal delay path and τ min denote the minimal delay. else τ ← Calculate path delay. if buffer Bi is moved down ∆X . L b = L b = L b (16) W b = ----------. B. P(s0. if τ < τ then min Move buffer Bi up ∆X . W b . and W b .lumped capacitance of the two subtrees rooted at node v m ′ . Sort all paths in BT (buffered tree) according to their delay in an ascending order τ P1 ( τ min ) < … < τ Pi < … < τ Pq . Suppose we optimize path P( Figure 6 illustrates the above constraints. Step2. Input: Path P(s0. 6. To ensure the delay monotonously decreasing and the convergence of the algorithm.3 Algorithm for Single Path Delay Minimization Using Theorem 2 and 3. it is split into two buffers Bj1 and Bj2. Otherwise. After the tree delay minimization we obtain a buffered tree with minimal delay. When the last buffer on the path is tested.(15). Theorem 3 (Buffer Splitting Theorem): In a buffered clock tree. For i = 1 to q Minimize delay of Pi using the single path delay minimization algorithm with buffer moving constraints (discussed in Section 5.1 Tree Delay Minimization Algorithm Suppose there are q paths in a buffered clock tree.2 Moving Constraints for Monotonous Delay Reduction Suppose Pi is the current path for delay minimization. move_flag ← 1. To ensure this .(14) and Eqn. After buffer Bj is moved onto branches 1 and 2 directly after the branching node v m ′ . since such a movement will increase the capacitive load of the subtree rooted at buffer B11. cb = cb + cb (14) where c b j1 j j1 j2 r b ( C P1 + C P2 ) = r b C P2 j j2 (15) τ ← Calculate path delay. j j1 j2 Bj1 and Bj2. k) τ min ← Calculate path delay. C. it can not be moved. The algorithm is as follows.W b . τ 2 should equal to τ 1 . Output: Optimized path P(s0. tree delay minimization is much more difficult to handle than single path delay minimization. end if. the algorithm goes back to the first buffer and starts a new iteration until the delay can not be further reduced. three constraints are enforced in the buffer moving process: A. γ 1 W b = ----------. if it is split into two buffers with their sizes satisfying Eqn. From source to sink on path Pi. Step3... s1) is the minimal delay path Pmin. To ensure the continuity. the single path delay minimization (SPDM) algorithm is given in Table 1. According to the buffer moving constraints. called BTD. L b denote the transistor gate length of buffer Bj. On path Pi other buffers which are not in case A and B. but can not be moved over v4. The algorithm repeatedly selects a buffer on the path from source to sink and tests if the path delay can be reduced by changing buffer position. An effective algorithm should be developed to ensure the convergence of the minimization process. Bk on this path from source s0 to sink si). while (move_flag = 1) move_flag ← 0.. (16). 5. all path delays in the clock tree remain the same. and will be split into two buffers when moved through v5. because one path minimization affects the performance of the others which may have already been minimized. end for. move_flag ← 1. can be moved both up and down. However. the first buffer which is not on path Pmin can not be moved down along the path. move_flag ← 1.2). if buffer Bi is moved up ∆X . The tree delay minimization algorithm optimizes the tree delay in a path by path fashion using the single path delay minimization algorithm (SPDM). We obtain the size relationship of the two split buffers in Eqn. W b j j1 j2 denote their gate width. end Procedure.2 Delay Minimization for Clock Tree The algorithm for tree delay minimization is based on the single path delay minimization. If the delay is reduced. if τ < τ min then Move buffer Bi up ∆X . (16). when a buffer is moved down over a branching node. Formally. Buffer Bi1 can only be moved towards branching node v1. it stays in its original (There’s k buffers B1. for i ← 1 to k do τ 2 = R p ( c b + c b ) + r b C P2 j1 j2 j2 (13) is the input capacitance of buffer Bj1. end if. 5. but can not be moved through v1. Skew Minimization Strategy Skew minimization is achieved by increasing buffer size or adding more buffers (if necessary) to the paths such that delays on these paths decrease towards the minimum delay. If a buffer on path Pi is also on path Pmin. 5. Algorithm of single path delay minimization. Suppose C P1 = γ C P2 . For τ ( v 0. which results in the following two conditions in Eqn. the buffer is moved to the new position.

And we assume buffer intrinsic delay varies directly as its output resistance. (17) holds. Table 3 presents results of delay and skew before and after buffer insertion. Otherwise. Since buffer B11 is on the minimal delay path and the size of buffer Bi1 has been optimized. (18). the procedure increases the candidate buffer size one by one to the maximum until the current path delay is smaller than the minimal delay of the tree. where routing obstacles exist. Results and Conclusions The presented algorithm was implemented in C language on SPARC 20 workstation. which could be done sizing for the skew optimization. Apply the tree delay minimization algorithm to optimize the positions of the buffers on the candidate subpath. i. specific buffer sizing algorithm is given below. For all of the tested cases the skew is reduced to within the scope of 100ps which is necessary for running the clock network at multi-GHz. Step 3. 6. Buffer sizing has been done for path P(s0. . Step 1. For example. A four-step strategy is proposed for the buffer adding algorithm.sq v4 Bi2 B23 v2 Pmin s1 Figure 6: Delay minimization by buffer position optimization. Different number of the inserted buffer layers were tested and the optimal buffer layers were found. Along this subpath. Eqn. The above algorithm minimizes skew path by path. 300 and 500. Repeat Step1 and Step 2 until the skew specification is met. Bqm vi Bi1 s0 B11 s2 s1 Figure 7: Buffer sizing for skew minimization. on subpath P(vi. The clock routing is generated using the algorithm in [15].si) only buffers Bij and Bim are the candidate buffers. besides the routing trees are not well balanced and not equal length. rbj denotes the buffer’s output resistance. 7. suppose there are m candidate buffers for buffer sizing. If Eqn. We Bqj sq Bij Pi Bim si Step 2. 1 i q n–1 ˜ ∑ ( r bj – r bmin )C j + ( r bn – r bn )C n = τ P – τ min i (18) j=1 s0 B11 Bi1 v1 B12 v5 Bim si v3 B24 s2 where n – 1 ( 1 ≤ n ≤ m ) is the number of buffers having been ˜ increased to the maximum size r bmin and r bn is the size of the current buffer under consideration (labeled as the n-th buffer).1 Skew Minimization Algorithm Step 1. the sizes and positions of the buffers on the minimal delay path in BTD are kept unchanged. only its intrinsic delay and output resistance change. Apply the buffer sizing algorithm to optimize the sizes of the buffers on the subpath. In each test case.3 Adding Buffer Strategy When all buffers in the candidate subpath are increased to their maximum size and the skew is still beyond the specification. Note that due to the buffer structure is the cascaded inverters [17]. all the buffers on that path will have their sizes and positions fixed such that later optimized paths will not increase the delay of the previously optimized paths. changing buffer size is enough to meet the skew specification. For i = 2 to q On the current path Pi. the size of this current buffer is determined by Eqn.(17). Repeat Step 1 to Step 3 until the tolerable skew is within the specification. respectively. si) is the current path under consideration of sizing. In the table. P(s0. the routing trees have the number of sinks 50. 6. Once a path is optimized.e. Step 3. We tested a 6mm x 6mm chip and a 1cm x 1cm chip. strategy do not increase the minimum delay. Step 4. Apply the Layer Defined Initial Buffer Insertion algorithm to add one more layer buffer to the candidate subpath.2 Buffer Sizing Procedure Suppose there are m candidate buffers on the subpath for sizing. Term Cj denotes the lumped capacitance of the subtree rooted at the j-th buffer. the summation of the Elmore delays of the subtrees and the buffer delays along a source-to-sink path. the delay is measured as the signal propagation time from the source to the sinks. The results show that the reduction of both delay and skew using the proposed method are in one order of magnitude. additional buffers are added to the subpath. In this paper. Sort all the paths in BTD according to their delay in an ascending order τ P ( τ min ) < … < τ P < … < τ P .. find the subpath for buffer sizing or buffer adding. from top to bottom. when buffer size is changed. we derived an optimal buffer placement theory for delay minimization based on the Elmore delay model. and τ skew is the specific skew tolerance. s1) is the minimal delay path. If buffer sizing suffices. (17) tests whether the delay of the considered path can be reduced to satisfy the specified skew tolerance by setting all the candidate buffers to the maximum buffer size rbmin. its input capacitance remains the same. Suppose P(s0. Step 2. j Pmin ∑ m 6. Table 2 shows the technology parameters used in our experiment. Then. more buffers need to be added into this path. in Figure 7. Reduce the delay of path Pi with respect to the minimum delay τ min by sizing or adding buffers. ( r bj – r bmin )C j ≥ τ p – τ min – τ skew i (17) =1 In Eqn. 100. sq). Also we will setup several constraints for buffer sizing and buffer adding to guarantee the convergence of the algorithm.

pp.63 12. [6] S. “Performance optimization of VLSI interconnect layout”.14 Ω/u Wire Width 10 µm Area Capacitance 0. [8] Y.10 660. 1995 [11] F.25 8. Chen and D. Proc. Cong and G. Haksu Kim for his valuable discussions and providing us with the test input examples.P.27 InitiallyIns erted Buffer Layer 4 4 5 4 4 5 4 4 After Delay&Skew Minimization Delay Skew (τ) (τs) 707. on Computer-Aided Design. 1999. International Journal of High Speed Electronics and Systems. Franklin.62 169.97 87. Zhou. “Clock skew optimization”. Solid-State Circuits.573-579.08 fF/µm2 Fringe Capacitance 0. [12] J.32 58.70 66. Zeng. 1998. Conf.39 12.1).06 2.55 17. References [1] M. Dhar and M.46 27. “Effective Buffer insertion of Clock Tree for High-Speed VLSI Circuits”. He. it can be easily used for a more general case where the signal arriving time to each individual sink is specified to be different. “On the Optimal Drivers of HighSpeed Low Power ICs”. 1990. N. 55-63. pp.C. Computer. M.70 790. pp.29 47. pp.32-40.38 1106. Dai.57 50 100 Chip1 300 500 50 100 Chip2 300 500 *all delay and skew data are in ps. on Computer-Aided Design. Since our approach is carried out in the manner of path by path. Kim and D. [15] H. [16] D. A. “Optimal Wire Sizing and Buffer Insertion for Low Power and a Generalized Delay Model”.87 430. Conf. 1994. “Skew and Delay Optimization for Reliable Buffered Clock Trees”. van Ginneken. W.11 889.Y. Koh and P. pp.11 Chip Sink# Before Buffer Insertion Delay Skew (τs0) (τ0) 4422. “An Effective Buffer Insertion Algorithm for High-Speed Clock Network”.41 16. 23:291-300. Zhou. P. vol. 945-951. “Algorithms for VLSI Physical Design Automation.82 72.50 10. [7] L.05 9932. Pillage. July 1990. “An Automatic Clock Tree Design System for High-Speed VLSI Designs: planar clock routing with the treatment of obstacles”. 39. 556-562. vol.24 2. [3] B. “Clock routing for high-performance ICs”. Longest Path Length (cm) 2.06 11445. Lillis. Jan.P.60 1248. 27th ACM IEEE Design Automation Conference. Elmore. Pullela. Li and X. pp. Wu and N. 21:1-94.1995. Impv2 = τ s ⁄ τ s 0 show that the optimal buffer placement for delay minimization is achieved when all delay functions have the same derivative values.25 52. [2] A. “High-performance clock routing based on recursive geometric matching”. [13] N. European on ComputerAided Design.18 Delay&Skew Improvement Impv1 Impv2 6. submitted to IEEE Design Automation Conference.01 pF MinBuffer input C 0. Wong. pp. [4] W. 138143. 26(no. IEEE J.31 710. “The Transient Response of Damped Linear Network with Particular Regard to Wideband Amplifier.53 917. A. Jackson. L. Applied Physics. J. Srinivasan and E. C. 491-496. Consequently.53 9.219-223.48 674.12 22.P. 1999 to appear. pp. International Symposium on Circuits and Systems. Anceau.” J. pp. P. [5] J. pp.K.01 2.71 24.26 16. IEEE Trans. 7(no. H.39 3.89 84. S. IEEE J. of Solid-State Circuits. July 1992.30 891. [10] Joe G. [9] J. 1991. Proc.27 48. Proc. 1990.29 659.08 5668. International Symposium on Circuits and Systems. we also derived a formula to calculate the optimal size of the inserted buffers in the meantime.18 1325. Fishburn. INTEGRATION. the VLSI Journal.1993.39 3. we developed a delay minimization algorithm which gives the minimal clock delay.332-337. 287-303. “An algorithm for zero-skew clock tree routing with buffer insertion”. Xi and Wayne W.49 27055. IEEE Int. Omar and L.19 36485. Liu.T. “Buffer placement in distributed RCtree networks for minimal Elmore delay”. Cong.39 124. Menezes. vol. [17] D.03 fF/µm MaxBuffer output R 5 Ω MinBuffer output R 100 Ω MaxBuffer input C 0. A skew minimization scheme to minimize the skew of the clock signal is further expatiated. Sherwani. pp. Sherwani. J. Microelectronics Journal. “Optimum Buffer Circuits for Driving Long Uniform Lines”.96 14727. 400-402. Kluwer Academic Publishers. 1996. Zhou and X. pp.A. ACM/IEEE Design Automation Conference.56 17. Kahng. Sheet Resistance 0.65 14794.F. Lin.41 7. IEEE Int. [14] S.11 3. 1991.55 831. 1996. vol. B. 1948. Kuh.01 pF MaxBuffer intrinsic delay 100 ps MinBuffer intrinsic delay 30 ps Table 3: Results before and after buffer insertion. Acknowledgment The authors wish to thank Mr. pp.24 2. C. . 865-868. Impv1 = τ 0 ⁄ τ . 19.32 3. To achieve the minimal skew. Robins. “Buffer Insertion and Sizing Under Process Variations for Low Power Clock Distribution”.2). 1982.Table 2: Technology parameters. 51-56. Chen and T.” 2nd edition. “A synchronous approach for clocking VLSI systems”.K. Madden. SC-17.42 3. 28th ACM IEEE Design Automation Conference.