
Obstacle-aware symmetrical clock tree construction

Meng Liu, Zhiwei Zhang, Wenqin Sun, Donglin Wang


Institute of Automation
Chinese Academy of Sciences
University of Chinese Academy of Sciences

Abstract—High-performance chip design has long been a central topic in the integrated circuit (IC) field. Clock design plays a critical role in chip performance and power consumption. A regular clock layout has long been regarded as an ideal way to improve timing results. In this paper, we propose a symmetrical clock tree synthesis algorithm for top-level design that covers tree architecture planning, matching, merging and embedding. We also integrate buffer insertion and obstacle processing into the algorithm flow. In NGSPICE simulations of benchmark circuits, our skew decreases by 17.2% on average while using 24.5% less capacitance on average, compared with other symmetrical clock trees.

I. INTRODUCTION
Fig. 1. Example of a symmetrical clock tree.
As semiconductor process technology scales down to 10 nm and below, designing the clock distribution network (CDN) becomes even more challenging because of on-chip variation (OCV) effects [1]. Moreover, the CDN accounts for more than 40% of processor power [2]. The clock tree is the most common CDN structure because of its simplicity. Clock tree design in VLSI, also called clock tree synthesis, inserts clock drivers between the clock source pin and multiple receiver pins, places the drivers at optimized locations, and routes the clock nets.

It is well known that the H-tree is a classic top-level CDN architecture whose paths have nearly equal geometric lengths, which makes it effective against OCV. However, the H-tree does not account for uneven sink distributions, and its large wire length prevents it from minimizing wire capacitance [3]. Compared with the traditional H-tree, the symmetrical tree-like structure has shorter wire length and a broader scope of application, while inheriting the robustness of the H-tree. Another influential algorithm in the traditional tree style is Deferred Merge Embedding (DME), which aims at minimum wire length and zero clock skew [4]. Compared with the classic DME tree, the symmetrical tree-like structure does not need a different delay model to achieve low skew when the technology library is updated. The symmetrical clock tree structure was proposed in [5], [6], as shown in Fig. 1. However, these works do not explicitly add obstacle avoidance to this tree style. In addition, the fan-out number of their clock trees is limited, which easily increases the number of clock tree levels.

To overcome the drawbacks of previous works, we propose an obstacle-aware symmetrical clock tree algorithm for more practical layout situations. The multiple fan-out matching algorithm and buffer insertion strategy we introduce also improve results on the ISPD benchmarks [7]. With our tree synthesis algorithm, hybrid CDN structures such as clock mesh and clock spine can also be implemented easily because of the tree's low skew.

II. SYMMETRICAL CLOCK TREE

A. Problem formulation

Given: a set of N clock sinks S = {s_1, s_2, ..., s_N} with locations (x_1, y_1), (x_2, y_2), ..., (x_N, y_N), a library of buffers, a clock skew constraint, a clock slew rate constraint and a capacitance constraint.

Problem: obtain a symmetrical clock tree with appropriate level planning, topology generation, buffering and routing resources such that the given design constraints are likely to be satisfied.

The sinks represent the registers in the layout design. Clock skew is caused by differences in clock arrival times at the sinks. Let r be the root, N be the number of clock pins, and l_i be a leaf. Clock skew is defined as

$$\mathrm{Skew} = \max_{1 \le i, j \le N} \left| d(r, l_i) - d(r, l_j) \right|$$

where d(r, l_i) is the clock delay from r to leaf l_i.

Fig. 2 shows our symmetrical clock tree design flow. The detailed approaches are discussed in the following subsections.

[Fig. 2 blocks: Tree Architecture Planning; Matching; Cluster Construction; Buffer Strategy; Wire Snaking for balance tree]
Fig. 2. The proposed algorithmic flow.

B. Tree architecture planning

The number of tree levels can be determined from the number of sinks. We factorize the number of sinks N into primes, including

978-1-5090-6389-5/17/$31.00 ©2017 IEEE 515


2, 3, 5 and 7. Compared with Shih's approach [5], our approach is more flexible in extending the fan-out number according to buffer driving capability. Our decision procedure is shown in Algorithm 1. For example, 30 can be factorized as 5 × 3 × 2, which means that a clock tree with 30 sinks can be planned as three levels: the first level uses five-fan-out tree merging, while the latter two use trinary and binary tree merging, respectively.

Algorithm 1 Tree Branch Planning
Input: number of sinks n, max fan-out m
Output: prime number list p, adjusted number of sinks an
1: define function f(x) returning the prime list pnl of x and the unfactored remainder of x:
2:   for i = 2 → m do
3:     while x mod i = 0 do
4:       x ← x / i
5:       push back i to pnl
6:     end while
7:   end for
8: if n = 1 then
9:   push back 1 to prime number list p, an ← 1
10: else
11:   an ← n, (p, t) ← f(an)
12:   while t > m do ▷ an still has a prime factor larger than m
13:     clear prime number list p
14:     an ← an + 1
15:     (p, t) ← f(an)
16:   end while
17: end if

C. Greedy matching algorithm

Reasonable clustering can effectively reduce the wire length cost. Edmonds' matching algorithm for binary tree merging has been discussed in [5]. However, [5] also points out the deficiency of this approach for trinary or more complex tree merging: the matching problem becomes NP-complete. Moreover, other related works do not discuss this situation.

This problem can be written as

$$\min \max_{1 \le k \le M} D(C_k),$$

where D(C_k) is the maximum internal distance of cluster C_k and M is the number of clusters; that is, the largest internal distance over all clusters should be as small as possible. This is a bottleneck problem. In our work, a greedy, iterative algorithm for multiple fan-out tree merging is given in Algorithm 2. We describe trinary tree merging as an example. First, we build a weight table for the cell points (for brevity, sink nodes and root nodes are both called cell points), where the weight between two cell points is their Manhattan distance. The four quadrants are divided into eight direction vectors, as shown in Fig. 3(a). Then, along a given vector direction, the cell point furthest from the center of gravity of cell set S is chosen as the initial point. From the initial point, we find the second point with the smallest weight. From the second point, we find the third point with the smallest weight, excluding the initial point. These three cells form a cluster and are removed from S. We then repeat this process on the updated S, partitioning one new cluster per iteration, until the last cluster is partitioned, as shown in Fig. 3(b). We call this algorithm the Greedy Matching Algorithm (GMA). Based on GMA, matching with more cells per cluster can be solved easily.

Algorithm 2 Greedy Matching Algorithm
Input: cell set S with size N, merging points number n, weight table W
Output: clusters of cells
1: cluster number M = N/n
2: direction vectors V = {v_1, v_2, ..., v_8}
3: for v ∈ V do
4:   S' ← S
5:   while S' is not empty do
6:     find the boundary cell s_b of S' along v
7:     find the second cell s_2 with the smallest weight from s_b
8:     find the third cell s_3 with the smallest weight from s_2, excluding s_b
9:     make a cluster of the three cells
10:    delete the three cells from S'
11:  end while
12:  record the clusters C_1, ..., C_M and max{D(C_1), ..., D(C_M)} for v
13: end for
14: choose the direction whose result minimizes max{D(C_1), ..., D(C_M)}

[Fig. 3 panels: (a) Eight vector directions; (b) Greedy matching result]
Fig. 3. Greedy matching result.

D. Node merging and embedding

Node merging can be done after the tree plan and the matching strategy have been computed. The tilted rectangular region (TRR) is used to represent potential embedding positions for tree nodes. A TRR is a collection of points within a fixed distance of a Manhattan arc.

The algorithm framework includes two phases. The first phase is bottom-up and determines all possible locations of the internal nodes of the topology. The second phase, called the top-down or embedding phase, determines the exact locations of the internal nodes [8]. Based on the TRR model, we can easily find the intersection, or merging segment, between two potential sinks, as shown in Fig. 4(a), where the gray region represents a TRR. Additionally, our merging process considers multiple fan-out trees to reduce the number of tree levels. In this situation, the number of sinks is not limited to an integer multiple of two or three.

The following presents three-point merging as an example, shown in Fig. 4(b). Each node builds its TRR based on the same Manhattan arc, which is computed in the

GMA phase, and this arc ensures that the three TRRs have a common intersection.

TABLE I. BUFFER TABLE
Driver     Receiver   Fanout   Spacing
buffer1    buffer1    2        d1
buffer1    buffer2    2        d2
buffer2    buffer1    2        d3
buffer2    buffer2    2        d4

[Fig. 4 panels: (a) Two points merging; (b) Three points merging; label: Manhattan rings]
Fig. 4. Cells merging example.
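Algorithm 1 in Section II-B can be sketched in Python as follows. This is an illustrative reading, not the authors' C++ implementation: it assumes that f(x) also returns the unfactored remainder, and it orders the factors from largest to smallest only to reproduce the 30 = 5 × 3 × 2 example, since the paper does not state an ordering rule. The function names are hypothetical.

```python
def prime_factors_upto(x, m):
    """f(x) of Algorithm 1: divide out factors 2..m.

    Returns (factors, remainder). A remainder > 1 means x has a prime
    factor larger than m, so it cannot be built from fan-outs <= m.
    """
    factors = []
    for i in range(2, m + 1):
        while x % i == 0:  # divisibility test (x mod i = 0)
            x //= i
            factors.append(i)
    return factors, x


def plan_tree_branches(n, m):
    """Return (fan-out per level, adjusted number of sinks an)."""
    if n == 1:
        return [1], 1
    an = n
    factors, rest = prime_factors_upto(an, m)
    while rest > 1:  # pad with extra sinks until an factors into primes <= m
        an += 1
        factors, rest = prime_factors_upto(an, m)
    # Assumed ordering: largest fan-out first, matching the 30 = 5 x 3 x 2 example
    return sorted(factors, reverse=True), an


print(plan_tree_branches(30, 7))  # ([5, 3, 2], 30)
```

With m = 7, for instance, 11 sinks cannot be split into fan-outs of at most 7, so the sketch pads the count to 12 and plans levels of 3, 2 and 2.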
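The Greedy Matching Algorithm of Section II-C can likewise be sketched for clusters of k cells. This is a hypothetical illustration under stated assumptions: cells are (x, y) points, the weight table W is replaced by Manhattan distances computed on demand, "furthest along v" is taken as the largest projection onto v relative to the center of gravity, and ties go to the lowest cell index. The names greedy_clusters and greedy_matching are ours.

```python
from itertools import combinations

# Eight direction vectors covering the four quadrants (cf. Fig. 3(a)).
DIRECTIONS = [(1, 0), (1, 1), (0, 1), (-1, 1),
              (-1, 0), (-1, -1), (0, -1), (1, -1)]


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def diameter(cluster, pts):
    """Maximum internal Manhattan distance of one cluster."""
    return max((manhattan(pts[i], pts[j]) for i, j in combinations(cluster, 2)),
               default=0)


def greedy_clusters(pts, k, v):
    """Partition point indices into clusters of (up to) k cells for direction v."""
    remaining = set(range(len(pts)))
    clusters = []
    while remaining:
        cx = sum(pts[i][0] for i in remaining) / len(remaining)
        cy = sum(pts[i][1] for i in remaining) / len(remaining)
        # Boundary cell: furthest from the center of gravity along v
        cur = max(sorted(remaining),
                  key=lambda i: (pts[i][0] - cx) * v[0] + (pts[i][1] - cy) * v[1])
        cluster = [cur]
        remaining.remove(cur)
        while len(cluster) < k and remaining:
            # Next cell: smallest weight (Manhattan distance) from the previous one
            cur = min(sorted(remaining), key=lambda i: manhattan(pts[cur], pts[i]))
            cluster.append(cur)
            remaining.remove(cur)
        clusters.append(cluster)
    return clusters


def greedy_matching(pts, k=3):
    """Try all eight directions; keep the partition with the smallest worst diameter."""
    best = None
    for v in DIRECTIONS:
        clusters = greedy_clusters(pts, k, v)
        cost = max(diameter(c, pts) for c in clusters)
        if best is None or cost < best[0]:
            best = (cost, clusters)
    return best
```

On six cells forming two tight groups of three, such as (0, 0), (1, 0), (0, 1) and (10, 10), (11, 10), (10, 11), the sketch recovers the two groups with a worst-case cluster diameter of 2.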
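For the merging step of Section II-D, the sketch below finds where the TRRs of several children first intersect. It rests on two simplifying assumptions that the paper does not spell out: every child is treated as a point, and all children connect to the merge point with the same wire length r, as a symmetrical tree wants. It uses the standard 45-degree rotation u = x + y, w = x − y, under which a TRR becomes an axis-aligned rectangle. It is not necessarily the authors' exact procedure, and the function names are hypothetical.

```python
def to_rot(x, y):
    """45-degree rotation: Manhattan distance becomes Chebyshev distance."""
    return x + y, x - y


def from_rot(u, w):
    return (u + w) / 2, (u - w) / 2


def merge_equal_radius(points):
    """Smallest common wire length r from every child point to a merge point,
    plus the endpoints (in x-y) of the region where all radius-r TRRs meet."""
    us, ws = zip(*(to_rot(x, y) for x, y in points))
    # In (u, w) space the TRR of a point with radius r is the square
    # |u - u_i| <= r, |w - w_i| <= r, so all squares meet once r covers
    # half of the larger coordinate spread.
    r = max(max(us) - min(us), max(ws) - min(ws)) / 2
    u_lo, u_hi = max(us) - r, min(us) + r
    w_lo, w_hi = max(ws) - r, min(ws) + r
    # At this minimal r at least one interval collapses, so the intersection is
    # a Manhattan arc (a 45-degree segment in x-y) or a single point.
    return r, from_rot(u_lo, w_lo), from_rot(u_hi, w_hi)
```

For the three cells (0, 0), (4, 0) and (2, 4), the sketch returns r = 3 and the single merge point (2, 1), which lies at Manhattan distance 3 from each cell; for two cells (0, 0) and (2, 2) it returns the merging segment from (0, 2) to (2, 0).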

E. Buffering strategy

To meet the slew-rate constraint, suitable buffers must be inserted on the clock tree paths. To construct a symmetrical clock tree, the paths at each level must have the same number and types of buffers. In other words, we are concerned with the buffer strategy used at each level. Thanks to this balanced structure, skew can be reduced easily.

Since slew is our main concern, we build a buffer look-up table through NGSPICE simulation for different fan-outs and different driver/receiver buffer combinations, as shown in Fig. 5. Starting from the clock root, the buffer insertion strategy, including the maximum spacing and the buffer types, is determined for each level from its fan-out number through dynamic programming.

[Fig. 6 plot: slew (ps) versus spacing]
Fig. 6. Slew increases with spacing (wire type 0, fan-out 2, one buffer1 driving two buffer2).

[Fig. 7 plot: capacitance (fF) versus spacing]
Fig. 7. Capacitance increases with spacing (wire type 0, fan-out 2, one buffer1 driving two buffer2).
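The per-level buffer selection can be illustrated with a small dynamic program. This is a hypothetical sketch: the spacing values stand in for the symbolic d1 to d4 of Table I, the per-buffer costs are invented placeholders, each level uses a single buffer type, and intermediate repeaters along long edges are not modelled. For each buffer type, the program keeps the cheapest feasible assignment of types from the root down to the current level.

```python
# HYPOTHETICAL look-up data: Table I gives the spacings only as d1..d4,
# so these numbers and the per-buffer costs are placeholders (fan-out 2 only).
MAX_SPACING = {  # (driver, receiver, fan-out) -> maximum wire length
    ("buffer1", "buffer1", 2): 300, ("buffer1", "buffer2", 2): 250,
    ("buffer2", "buffer1", 2): 600, ("buffer2", "buffer2", 2): 500,
}
COST = {"buffer1": 1.0, "buffer2": 2.0}


def plan_buffers(fanouts, lengths, types=("buffer1", "buffer2")):
    """Choose one buffer type per level (same on every path) at minimum cost.

    fanouts[l]: fan-out of each node at level l (root = level 0)
    lengths[l]: wire length from a level-l node to each of its children
    Returns (total cost, types for levels 0..len(fanouts)) or None.
    """
    count = [1]
    for f in fanouts:
        count.append(count[-1] * f)  # number of nodes at each level
    # best[t]: cheapest feasible (cost, plan) whose current level uses type t
    best = {t: (COST[t] * count[0], [t]) for t in types}
    for lvl in range(1, len(count)):
        nxt = {}
        for t in types:
            options = []
            for d, (cost, plan) in best.items():
                key = (d, t, fanouts[lvl - 1])
                # Feasible only if driver d can reach receivers t at this spacing
                if key in MAX_SPACING and lengths[lvl - 1] <= MAX_SPACING[key]:
                    options.append((cost + COST[t] * count[lvl], plan + [t]))
            if options:
                nxt[t] = min(options)
        best = nxt
    return min(best.values()) if best else None


print(plan_buffers([2, 2], [550, 280]))  # (8.0, ['buffer2', 'buffer1', 'buffer1'])
```

Under these placeholder spacings only buffer2 can drive the 550-unit root edge into buffer1, which fixes the first two levels; the cheaper buffer1 is then feasible for the 280-unit edge below.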
According to the simulation results, we plotted the slew and capacitance curves shown in Fig. 6 and Fig. 7. The characteristic curves for other fan-outs and wire types can be obtained in the same way. Table I gives an example of the simulation data for a fan-out of two, including the driver buffer type, the receiver buffer type, the fan-out number and the maximum spacing. From the spacing value, the buffer positions along a path can be determined under the given constraints.

Fig. 5. Examples of buffer driving simulation.

F. Obstacle-aware routing

After the tree topology and buffering strategy have been determined, the next step is routing. In our work, rectangular obstacles in the layout are considered; among related works [5], [6], we are the first to introduce obstacle-aware routing. Obstacle-aware routing means that a path from parent to child must avoid the obstacles that intersect its shortest path, and the buffers on that path must find suitable locations, as shown in Fig. 8. First, we identify all wires that intersect obstacles and put the four corner points of each obstacle, together with the child and parent points, into a table of path points.

We also build a direction vector to assist the process. Beginning from the parent, we find the nearest point and set it as the next path point; each subsequent path point is determined based on the direction vector. The obstacle-aware routing algorithm is shown in Algorithm 3. The inserted buffers can then be placed along the scheduled path, and buffer overlaps can be avoided. The paths from the same parent are balanced using wire snaking [9].
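The routing procedure described above, formalized as Algorithm 3, can be approximated by the following sketch. It is a simplified, hypothetical reading: legs are straight segments between waypoints, obstacles are open axis-aligned rectangles (x0, y0, x1, y1), and around each blocking obstacle the path goes to the corner nearest the current point, then along the obstacle boundary to the corner nearest the child. New legs are not re-checked against other obstacles, and turning legs into rectilinear wires is left out.

```python
import math


def seg_hits_rect(p, q, rect):
    """True if segment p-q passes through the open interior of rect = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = rect
    t_lo, t_hi = 0.0, 1.0
    for start, delta, lo, hi in ((p[0], q[0] - p[0], x0, x1),
                                 (p[1], q[1] - p[1], y0, y1)):
        if delta == 0:
            if not lo < start < hi:
                return False
        else:
            a, b = sorted(((lo - start) / delta, (hi - start) / delta))
            t_lo, t_hi = max(t_lo, a), min(t_hi, b)
    return t_lo < t_hi


def corners(rect):
    x0, y0, x1, y1 = rect
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]  # in boundary order


def path_length(pts):
    return sum(math.dist(u, v) for u, v in zip(pts, pts[1:]))


def boundary_walk(cs, a, b):
    """Corners from index a to index b along the shorter way round the rectangle."""
    fwd = [cs[(a + k) % 4] for k in range((b - a) % 4 + 1)]
    bwd = [cs[(a - k) % 4] for k in range((a - b) % 4 + 1)]
    return fwd if path_length(fwd) <= path_length(bwd) else bwd


def route(p, c, obstacles):
    """Waypoints from parent p to child c, detouring round blocking obstacles."""
    blocking = sorted((r for r in obstacles if seg_hits_rect(p, c, r)),
                      key=lambda r: min(math.dist(p, k) for k in corners(r)))
    path, cur = [p], p
    for rect in blocking:
        cs = corners(rect)
        # For an axis-aligned rectangle, the straight leg from an outside point
        # to its nearest corner does not enter the rectangle's interior.
        entry = min(range(4), key=lambda i: math.dist(cur, cs[i]))
        leave = min(range(4), key=lambda i: math.dist(c, cs[i]))
        walk = boundary_walk(cs, entry, leave)
        path.extend(walk)
        cur = walk[-1]
    path.append(c)
    return path
```

For a parent at (0, 0), a child at (10, 0) and one obstacle (4, −1, 6, 1), the sketch returns the path (0, 0), (4, −1), (6, −1), (10, 0), and none of its legs enters the obstacle.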

[Fig. 8 labels: Parent; Child; Buffer location]
Fig. 8. Obstacle-aware routing.

TABLE III. EXPERIMENTAL RESULTS FOR IBM BENCHMARKS.
                       Shih's Approach [5]      Our Approach
Benchmark   Sinks    Skew (ps)   Cap (fF)     Skew (ps)   Cap (fF)
r1          267      1.510       13829        0.323       10778
r2          598      1.770       31056        0.338       30572
r3          862      2.310       44188        0.407       40903
r4          1903     2.540       98450        1.386       88721
r5          3101     3.010       171228       2.532       158721

Algorithm 3 Obstacle-aware Routing
Input: parent point p, child point c, obstacle set obs
Output: vector of path points path_point
1: define direction vector s from p to c
2: for i = 1 → obs.size() do
3:   if obs[i] intersects s then push back obs[i] to new_obs
4: end for
5: sort new_obs by distance from p
6: push back p to path_point, parent_now ← p
7: for i = 1 → new_obs.size() do
8:   find the nearest point np of new_obs[i] from parent_now
9:   parent_now ← np
10:  push back parent_now to path_point
11:  find the next np of new_obs[i] based on s
12:  repeat the above steps
13: end for
14: push back c to path_point

III. EXPERIMENTAL RESULTS

The proposed approach is implemented in standard C++ on a PC workstation with an Intel Core i7-3537U CPU. Four benchmarks from the ISPD09 clock network synthesis contest are used to test our symmetrical clock tree synthesis algorithm [7]. For comparison, NGSPICE simulation based on 45 nm process technology is used to evaluate the quality of results. Additionally, we ran experiments on benchmark circuits r1–r5 [4]. The clock skew and resource usage results are shown in Table II and Table III. Compared with Shih's approach [5], we obtain a 17.2% reduction in skew while using 24.5% less capacitance on average. The routing graph of ispd09f11 is shown in Fig. 9.

TABLE II. EXPERIMENTAL RESULTS FOR ISPD09 BENCHMARKS.
                       Shih's Approach [5]      Our Approach
Benchmark   Sinks    Skew (ps)   Cap (fF)     Skew (ps)   Cap (fF)
ispd09f11   121      0.110       95749        0.093       73595
ispd09f12   117      0.051       96609        0.038       72211
ispd09f21   117      2.321       108755       2.076       81453
ispd09f22   91       1.160       69696        1.986       58721

Fig. 9. Simulation graph of the ispd09f11 benchmark (red points are sinks; blue points are buffers).

IV. CONCLUSION

We propose an obstacle-aware symmetrical clock tree algorithm that achieves better results under the constraints of the benchmark circuits. We also handle multiple fan-out matching, merging, embedding and buffer insertion. This clock tree structure is able to resist OCV, while the number of logic levels can be greatly reduced. By using this tree as the top-level clock design, designers can easily build an entire regular CDN structure such as a clock mesh or spines. In future research, we will take the routing rules of more advanced semiconductor technologies into account.

ACKNOWLEDGMENTS

This work is supported by the Strategic Priority Research Program of Chinese Academy of Sciences (under Grant XDA-06010402).

REFERENCES

[1] D. Wang, X. Du, L. Yin, C. Lin, H. Ma, W. Ren, H. Wang, X. Wang, S. Xie, L. Wang et al., "MaPU: A novel mathematical computing architecture," in 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 2016, pp. 457–468.
[2] J. M. Rabaey, A. P. Chandrakasan, and B. Nikolic, Digital Integrated Circuits. Prentice Hall, Englewood Cliffs, 2002, vol. 2.
[3] D.-J. Lee and I. L. Markov, "Contango: Integrated optimization of SoC clock networks," VLSI Design, vol. 2011, 2011.
[4] R. S. Tsay, "Exact zero skew," in IEEE International Conference on Computer-Aided Design (ICCAD-91), Digest of Technical Papers, 1991, pp. 336–339.
[5] X. W. Shih and Y. W. Chang, "Fast timing-model independent buffered clock-tree synthesis," in Design Automation Conference, 2010, pp. 80–85.
[6] J. T. Yan, M. C. Huang, and Z. W. Chen, "Top-down-based symmetrical buffered clock routing," in Proceedings of the Great Lakes Symposium on VLSI, 2012, pp. 75–78.
[7] C. N. Sze, P. Restle, G. J. Nam, and C. Alpert, "ISPD2009 clock network synthesis contest," in International Symposium on Physical Design (ISPD 2009), San Diego, California, USA, March 29 - April, 2009, pp. 149–150.
[8] A. B. Kahng, J. Lienig, I. L. Markov, and J. Hu, VLSI Physical Design: From Graph Partitioning to Timing Closure. Springer Science & Business Media, 2011.
[9] H. Qian, P. J. Restle, J. N. Kozhaya, and C. L. Gunion, "Subtractive router for tree-driven-grid clocks," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 31, no. 6, pp. 868–877, 2012.
