I. INTRODUCTION
Fig. 1. Example of a symmetrical clock tree.
As semiconductor process technology scales down to 10nm and below, designing the clock distribution network (CDN) becomes even more challenging due to on-chip variation (OCV) effects [1]. Moreover, the CDN accounts for more than 40% of processor power [2]. The clock tree is the most common CDN structure because of its simplicity. Clock tree design in VLSI, also called clock tree synthesis, inserts clock drivers between the clock source pin and multiple receiver pins, places the drivers at optimized locations, and routes the clock nets.

It is well known that the H-tree is a classic top-level CDN architecture whose paths have nearly equal geometric lengths, which makes it effective against OCV. However, the H-tree does not account for an uneven distribution of sinks, and its large wire length cost prevents it from minimizing wire capacitance [3]. Compared with the traditional H-tree, the symmetrical tree-like structure has a shorter wire length and a broader scope of application, and it also inherits the robustness of the H-tree. Another influential algorithm in the traditional tree style is Deferred Merge Embedding (DME), which targets minimum wire length and zero clock skew [4]. Compared with the classic DME tree, the symmetrical tree-like structure does not need to change its delay model to obtain low skew when the technology library is updated. The symmetrical clock tree structure was proposed in [5], [6], as shown in Fig. 1. However, these works do not explicitly add obstacle avoidance to this tree style. In addition, the fan-out of their clock trees is limited, which can easily increase the number of clock tree levels.

To overcome the drawbacks of the previous works, we propose an obstacle-aware symmetrical clock tree algorithm that handles more practical layout situations. Our multiple fan-out matching algorithm and buffer insertion strategy also improve the results on the ISPD benchmarks [7]. By using our tree synthesis algorithm, hybrid CDN structures such as clock mesh and clock spine can also be implemented easily owing to its low skew.

II. SYMMETRICAL CLOCK TREE

A. Problem formulation

Given: a set of $N$ clock sinks $S = \{s_1, s_2, \ldots, s_N\}$ with locations $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$, a library of buffers, a clock skew constraint, a clock slew rate constraint, and a capacitance constraint.

Problem: obtain a symmetrical clock tree with appropriate level planning, topology generation, buffering, and routing resources such that the given design constraints are likely to be satisfied.

The sinks represent the registers in the layout design. Skew is caused by differences in clock arrival times at the sinks. Let $r$ be the root, $N$ the number of clock pins, and $l_i$ a leaf, and let $d(r, l_i)$ denote the delay from $r$ to $l_i$. Clock skew is defined as

$$\mathrm{Skew} = \max_{1 \le i, j \le N} \left| d(r, l_i) - d(r, l_j) \right|$$

Fig. 2 shows our symmetrical clock tree design flow. The detailed approaches are discussed in the following subsections.

[Flow diagram; recovered block labels: Tree Architecture Planning, Matching Construction, Cluster, Buffer Strategy, Wire Snaking for balance tree]
Fig. 2. The proposed algorithmic flow.

B. Tree architecture planning

The number of tree levels can be determined from the number of sinks. We have factorized the number of sinks N, including
GMA phase, and this arc ensures that the three TRRs intersect.

TABLE I. BUFFER TABLE
Driver Receiver Fanout Spacing
buffer1 buffer1 2 d1
buffer1 buffer2 2 d2
buffer2 buffer1 2 d3
buffer2 buffer2 2 d4
[Figure; recovered label: Manhattan rings]
Fig. 4. Cells merging example.

E. Buffering strategy

[Plot: slew (ps), 0 to 70, versus spacing, 0 to 1,200,000]
Fig. 6. Slew increases as spacing increases (the wire type is 0, the fan-out number is set to 2, and one buffer1 drives two buffer2).
TABLE III. EXPERIMENTAL RESULTS FOR IBM BENCHMARKS.

III. EXPERIMENTAL RESULTS

The proposed approach is implemented in standard C++ on a PC workstation with an Intel Core i7-3537U CPU. Four benchmarks from the ISPD09 clock network synthesis contest are used to test our symmetrical clock tree synthesis algorithm [7]. For comparison, NGSPICE simulation based on 45nm process technology is used to evaluate the quality of results. We also ran experiments on benchmark circuits r1-r5 [4]. The clock skew and resource usage results are shown in Table II and Table III. Compared with Shih’s approach [5], we obtain a 17.2% reduction in skew while using 24.5% less capacitance on average. The routing graph of ispd09f11 is shown in Fig. 9.

TABLE II. EXPERIMENTAL RESULTS FOR ISPD09 BENCHMARKS.
                      Shih’s Approach [5]     Our Approach
Benchmark    Sinks    Skew (ps)  Cap (fF)     Skew (ps)  Cap (fF)
ispd09f11    121      0.110      95749        0.093      73595
ispd09f12    117      0.051      96609        0.038      72211
ispd09f21    117      2.321      108755       2.076      81453
ispd09f22    91       1.160      69696        1.986      58721

IV. CONCLUSION

Our obstacle-aware symmetrical clock tree algorithm is proposed to achieve better results under the constraints of the benchmark circuits. We have also considered multiple fan-out matching, merging, embedding, and buffer insertion. This clock tree structure provides the ability to resist OCV,

ACKNOWLEDGMENTS

This work is supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (under Grant XDA06010402).

REFERENCES

[1] D. Wang, X. Du, L. Yin, C. Lin, H. Ma, W. Ren, H. Wang, X. Wang, S. Xie, L. Wang et al., “MaPU: A novel mathematical computing architecture,” in IEEE International Symposium on High Performance Computer Architecture (HPCA), 2016, pp. 457–468.
[2] J. M. Rabaey, A. P. Chandrakasan, and B. Nikolic, Digital Integrated Circuits. Englewood Cliffs: Prentice Hall, 2002, vol. 2.
[3] D.-J. Lee and I. L. Markov, “Contango: Integrated optimization of SoC clock networks,” VLSI Design, vol. 2011, 2011.
[4] R. S. Tsay, “Exact zero skew,” in IEEE International Conference on Computer-Aided Design (ICCAD-91), Digest of Technical Papers, 1991, pp. 336–339.
[5] X. W. Shih and Y. W. Chang, “Fast timing-model independent buffered clock-tree synthesis,” in Design Automation Conference, 2010, pp. 80–85.
[6] J. T. Yan, M. C. Huang, and Z. W. Chen, “Top-down-based symmetrical buffered clock routing,” in Proceedings of the Great Lakes Symposium on VLSI, 2012, pp. 75–78.
[7] C. N. Sze, P. Restle, G. J. Nam, and C. Alpert, “ISPD2009 clock network synthesis contest,” in International Symposium on Physical Design (ISPD), San Diego, California, USA, March–April 2009, pp. 149–150.
[8] A. B. Kahng, J. Lienig, I. L. Markov, and J. Hu, VLSI Physical Design: From Graph Partitioning to Timing Closure. Springer Science & Business Media, 2011.
[9] H. Qian, P. J. Restle, J. N. Kozhaya, and C. L. Gunion, “Subtractive router for tree-driven-grid clocks,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 31, no. 6, pp. 868–877, 2012.