
DAYANANDA SAGAR COLLEGE OF

ENGINEERING
(An Autonomous Institute affiliated to VTU, Approved by AICTE & ISO 9001:2008 Certified,
Accredited by National Assessment & Accreditation Council (NAAC) with ‘A’ grade)
Shavige Malleshwara Hills, Kumaraswamy Layout, Bangalore-560078

Department of Electronics and Communication Engineering


Alternate Assessment Tool -- 2
Report on

“CLOCK SKEW OPTIMIZATION FOR VOLTAGE VARIATION”


VLSI DESIGN AND EMBEDDED SYSTEMS

SUBMITTED BY
Sanjay H (1DS20LVS04)
VIDYA SAGAR DODDAMANI (1DS20LVS10)
II SEMESTER, M.TECH
Dept. of Electronics & Communication Engineering
Dayananda Sagar College of Engineering
Bangalore-560078

SUBMITTED TO
Dr. Dinesha P
Associate Professor
Dept. of Electronics & Communication Engineering
Dayananda Sagar College of Engineering
Bangalore-560078
(2020-2022)

CONTENTS

Sl No   TITLE
1       INTRODUCTION
2       WIDE-DIVERGENCE BUFFER
3       EXPERIMENTAL RESULTS
        CONCLUSION

CHAPTER 1

INTRODUCTION

In synchronous designs, a clock signal must be delivered to all memory elements, such as
flip-flops. In modern designs, there can be millions of flip-flops that all need to be synchronized.
The clock skew, defined as the difference in clock arrival times, imposes important constraints on
system performance. Some industrial cases may require up to 30% of the design turnaround time to
fix clock skew.

Although clock skew minimization has been studied extensively, the problem remains difficult to
resolve for the following reasons. To reduce power dissipation, modern designs adopt many advanced
voltage scaling techniques, such as dynamic voltage scaling (DVS) and dynamic voltage and frequency
scaling (DVFS). These low-power techniques dynamically change clock arrival times and therefore
complicate the clock skew problem. In addition, the growing impact of voltage fluctuation due to
ground bounce or voltage drop leads to severe clock skew uncertainty across operating
environments. To satisfy clock skew constraints under different voltage variations, multi-corner
timing analysis is adopted to capture the worst-case effects of variation. Normally, the maximum
voltage variation is assumed to be as high as 10% of the supply voltage, which can lead to as much
as 11% delay variation.

To address the clock skew optimization problem, many previous works proposed buffer insertion
or sizing methodologies. One work proposed generating the clock network by fusing several
auxiliary trees to tolerate supply voltage variation. Another proposed a clock tree resynthesis
methodology to calculate clock schedules for a synthesized clock tree. Aiming at the multi-corner
skew problem, a further work proposed global LP-based buffer insertion/removal combined with local
machine-learning-based buffer sizing/shifting to optimize clock skew variation in multi-corner
multi-mode (MCMM) designs.

For most previous works, the basic methodology for tackling the skew problem is buffer
insertion/replacement. Although a library may contain clock buffers of many different sizes, the
multi-corner skew optimization problem is still very challenging. It is known that optimizing skew
at one corner may cause a violation at another corner. As a result, skew optimization may require
iterating among the various corners, which can take a very long time, may insert very long buffer
chains, or may even fail to satisfy the skew constraints.

Fig. 1. Example of inserting long buffer chains during timing fixing.

(a) Assume we have two traditional buffers and one WDB. The initial clock tree has a 56 ps timing
violation in corner 1.

(b) If only traditional buffers are used, timing closure is very difficult to reach and long
buffer chains are inserted.

(c) Using the WDB, the buffer chains can be shortened significantly.

Fig. 1 shows an example of the long buffer chains inserted when minimizing the clock skew for
different corners. Assume the library has two traditional buffers whose delays in corner 1/corner 2
are 30/20 ps and 13/10 ps, respectively. In Fig. 1(a), assume the clock skew is 56 ps in corner 1
and 0 ps in corner 2. Fixing the skew in both corners using traditional buffers is shown in
Fig. 1(b): a total of 42 buffers are inserted in the buffer chains, and the clock latency is 420 ps
and 280 ps in the two corners, respectively. However, if a new clock buffer whose delays under the
two voltages are 40 ps and 20 ps is used, the total number of inserted buffers is reduced to 12,
and the clock latency is also reduced to 160 ps and 80 ps, as shown in Fig. 1(c).
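
The arithmetic behind Fig. 1 can be checked with a short brute-force search. The sketch below is one reading that reproduces the quoted numbers, not necessarily the procedure used in the original work. It assumes that one branch receives a chain of the slower buffer, the other branch a chain of the 13/10 buffer, and that the two chains must differ by 56 ps in corner 1 and by 0 ps in corner 2. The names `balance_chains` and `report` and the search bound are illustrative.

```python
from typing import Optional, Tuple

Delay = Tuple[int, int]  # (corner 1 delay, corner 2 delay) in ps


def balance_chains(slow: Delay, fast: Delay, skew: Delay,
                   limit: int = 200) -> Optional[Tuple[int, int]]:
    """Return the shortest (n_slow, n_fast) chain lengths whose delay
    difference equals `skew` in both corners, or None if none is found."""
    for total in range(1, limit + 1):
        for n_slow in range(total + 1):
            n_fast = total - n_slow
            d1 = n_slow * slow[0] - n_fast * fast[0]
            d2 = n_slow * slow[1] - n_fast * fast[1]
            if (d1, d2) == skew:
                return n_slow, n_fast
    return None


def report(slow: Delay, fast: Delay, skew: Delay = (56, 0)):
    """Total buffer count and per-corner latency of the slow chain."""
    n_slow, n_fast = balance_chains(slow, fast, skew)
    latency = (n_slow * slow[0], n_slow * slow[1])
    return n_slow + n_fast, latency


traditional = report(slow=(30, 20), fast=(13, 10))  # only library buffers
with_wdb = report(slow=(40, 20), fast=(13, 10))     # WDB on the slow branch
print(traditional, with_wdb)
```

Under these assumptions the traditional case needs 14 buffers of the 30/20 type against 28 of the 13/10 type, while the WDB case needs 4 WDBs against 8 of the 13/10 type, which matches the totals of 42 and 12 in the text.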

This observation leads to the concept of the wide-divergence buffer (WDB), whose delay under
the low voltage (the worst corner) is about 2 to 3 times its delay under the high voltage (the best
corner). Note that in 28nm technology, the delay of a clock buffer in the worst corner is about 1
to 2 times its delay in the best corner. The first contribution of this paper is to propose the use
of WDBs in clock tree synthesis. We also show how a WDB can be implemented using analog design
techniques. Unlike traditional analog design, which needs to satisfy precise and strict design
constraints, a WDB is used for optimization, so strict constraints are not necessary. As a result,
even if an analog WDB is not precisely implemented, it will not lead to functional errors but may
only affect the skew optimization result.

The design of a WDB requires determining an important parameter called the reference
difference, which greatly affects the delay discrepancy between two different voltages. Since the
skew requirements of each design are different, we may need several different WDBs, each with its
own reference difference. In this paper, given a design and its skew constraints, we also propose
an efficient algorithm to find “ideal” reference differences for WDBs. We first describe how to
optimize reference differences using integer linear programming (ILP); then we propose algorithms
and theorems to speed up the ILP algorithm. Experimental results demonstrate that the use of WDBs
yields on average a 54.96% reduction in total negative slack (TNS) compared with the traditional
buffered tree.

CHAPTER 2
WIDE-DIVERGENCE BUFFER
A WDB has a wide discrepancy in delay under different voltages. In this section, we describe
one way of implementing a WDB in detail.

Design
Fig. 2 (a) shows a possible design of a WDB. The design consists of two parts: a Vref
generator and a WDB buffer. Since Iref in Fig. 2 (a) is a gate current, which can be very small, a
long wire from the Vref generator to a WDB buffer will not affect the functionality. As a result,
one Vref generator can be shared by many WDB buffers if wire length is not an issue.

Fig. 2. (a) Proposed wide-divergence buffer


(b) Reference voltage generator with voltage biasing circuit

In Fig. 2 (a), our design of a WDB buffer requires transistor Mn to work in saturation mode at
low voltage so that the gate voltage (Vref) of Mn can limit the current (If). Transistor Mn
therefore serves as a current limiter constraining the amount of current. Many existing works
propose ways to generate low-voltage, high-accuracy reference voltages. In this work, we adopt a
reference voltage generator with a voltage biasing circuit, as shown in Fig. 2 (b). The biasing
circuit consists of an ideal current source and a set of resistors. From Fig. 2 (b), the output
voltage Vref is equal to Vdd − Iideal×Rideal.
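
As a small numerical illustration of this relation, the sketch below evaluates Vref at the two supply voltages used in the simulations reported in this chapter (0.9 V and 1.1 V). The current and resistance values are hypothetical, chosen only so that their product is 0.2 V; they are not taken from the original design.

```python
def vref(vdd: float, i_ideal: float, r_ideal: float) -> float:
    """Reference voltage of the biasing circuit: Vref = Vdd - Iideal * Rideal."""
    return vdd - i_ideal * r_ideal


# Hypothetical component values chosen so that Iideal * Rideal = 0.2 V.
I_IDEAL = 20e-6  # amperes (assumed)
R_IDEAL = 10e3   # ohms (assumed)

for vdd in (0.9, 1.1):
    print(f"Vdd = {vdd:.1f} V -> Vref = {vref(vdd, I_IDEAL, R_IDEAL):.2f} V")
```

The drop across the resistance is the same at both supplies, so Vref follows Vdd at a fixed offset.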

We define the reference difference Vdiff as Vdiff = Iideal×Rideal, which can be designed to be
insensitive to the variation of Vdd. The value of Vdiff is the difference between Vdd and Vref and
is important in the following discussion. Since Vref changes with supply variation, instead of
finding an appropriate Vref, we attempt to find an appropriate reference difference Vdiff, which is
independent of the variation, for a WDB.

We have implemented the design in Fig. 2 and compared it with a traditional clock buffer. In the
following, we present the simulation results. Fig. 3 shows the delay of a clock buffer and a WDB
under different supply voltages. The x-axis shows the supply voltage and the y-axis shows the
delay. Both buffers are assumed to have the same size and output loading, and the Vdiff of the WDB
is assumed to be 0.2 V. In Fig. 3, the delays of the traditional buffer at 0.9 V and 1.1 V are
107.65 ps and 64.32 ps, respectively, while the delays of the WDB at 0.9 V and 1.1 V are 199.63 ps
and 75.20 ps, respectively. Fig. 4 shows the delays of WDBs with Vdiff from 0.1 V to 0.3 V, where
the x-axis shows Vdiff and the y-axis shows the delay. At high voltage, the delay is 71.63 ps when
Vdiff is 0.1 V and 89.75 ps when Vdiff is 0.3 V. At low voltage, the delay grows dramatically: when
Vdiff is 0.1 V, 0.2 V, and 0.3 V, the delays are 134.48 ps, 199.63 ps, and 578.65 ps, respectively.
The results show that the low-voltage delay is highly sensitive to Vdiff. Finally, Fig. 5 (a) shows
the simulation waveform of a WDB working at 0.9 V with Vdiff equal to 0.2 V.
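
The divergence implied by these measurements can be made explicit by dividing each low-voltage delay by the matching high-voltage delay. The sketch below uses only the delay values quoted above and assumes that "low" and "high" voltage in Fig. 4 refer to the same 0.9 V and 1.1 V supplies as in Fig. 3. The dictionary layout and the name `divergence` are illustrative.

```python
# Delays in ps as quoted in the text: (low-voltage delay, high-voltage delay).
delays = {
    "traditional buffer": (107.65, 64.32),
    "WDB, Vdiff = 0.1 V": (134.48, 71.63),
    "WDB, Vdiff = 0.2 V": (199.63, 75.20),
    "WDB, Vdiff = 0.3 V": (578.65, 89.75),
}


def divergence(low_v_delay: float, high_v_delay: float) -> float:
    """Ratio of the worst-corner delay to the best-corner delay."""
    return low_v_delay / high_v_delay


ratios = {name: divergence(*pair) for name, pair in delays.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

The traditional buffer comes out at roughly 1.7 times and the WDB with Vdiff = 0.2 V at roughly 2.7 times, in line with the "1 to 2 times" and "2 to 3 times" ranges stated in the introduction. The ratio rises steeply with Vdiff.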

CUSTOMIZE WDBS IN DESIGN


Since each design may have its own skew constraints, it is best to customize WDBs for each
design. Since WDBs are used to replace traditional buffers, we also need to know where WDBs will be
placed. In this section, we discuss how to find the WDB placement and the Vdiff of each WDB, i.e.,
the locations of WDBs in a clock tree and their best Vdiff for clock skew optimization. For ease of
discussion, let us assume that the optimization objective is to minimize the total negative slack
of a design. We first describe a naïve approach using an exhaustive ILP algorithm; then we describe
a heuristic to resolve the WDB customization problem.

Problem Formulation:
In the WDB customization problem, the inputs include a synthesized clock tree and the delay
information of the data paths and clock paths under each given voltage. The objective is to place
WDBs at appropriate positions with appropriate Vdiff values so that the total negative slack (TNS)
is as small as possible. Saying that a WDB is placed at a buffer position is equivalent to saying
that a WDB replaces the buffer at that position.
Fig. 3. The delay comparison between a traditional buffer and a WDB

Fig. 4. The delay of a WDB under different Vdiff

Fig. 5. Waveform simulation for (a) a WDB and (b) a traditional buffer

Integer Linear Programming Approach
The WDB customization problem can be formulated as an ILP problem using the following equations.
First, all the notations are summarized in Table 1.
Table 1. The notations used in our ILP formulation.

Our optimization goal is to minimize TNS across the different voltage corners. The cost function in
our ILP formulation is described in EQ (1).

EQ (1)

Fig. 6. Experimental framework


The slack $S_{i,j,c}$ after WDB assignment can be calculated as EQ (2).

$S_{i,j,c} = P - DD_{i,j,c} + (AT'_{j,c} - AT'_{i,c})$    EQ (2)
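
The following sketch evaluates EQ (2) on a toy example and sums the negative slacks. Since the notation table is not reproduced here, it assumes that P is the clock period, DD is the data-path delay from flip-flop i to flip-flop j in corner c, and AT' is the clock arrival time after WDB assignment. TNS is taken in its usual sense as the sum of negative slacks over all paths and corners; the exact form of the cost function is not assumed. All numbers are made up for illustration.

```python
from typing import Dict, Tuple

Path = Tuple[str, str]  # (launching flip-flop i, capturing flip-flop j)


def slack(period: float, data_delay: float,
          at_launch: float, at_capture: float) -> float:
    """EQ (2): S = P - DD + (AT'_j - AT'_i), all values in ps."""
    return period - data_delay + (at_capture - at_launch)


def total_negative_slack(period: float,
                         data_delay: Dict[str, Dict[Path, float]],
                         arrival: Dict[str, Dict[str, float]]) -> float:
    """Sum of negative slacks over all paths and corners (always <= 0)."""
    tns = 0.0
    for corner, paths in data_delay.items():
        for (i, j), dd in paths.items():
            s = slack(period, dd, arrival[corner][i], arrival[corner][j])
            tns += min(0.0, s)
    return tns


# Made-up data for two corners and two paths, purely illustrative.
PERIOD = 1000.0
data_delay = {
    "c1": {("ff1", "ff2"): 1040.0, ("ff2", "ff3"): 900.0},
    "c2": {("ff1", "ff2"): 700.0, ("ff2", "ff3"): 650.0},
}
arrival = {
    "c1": {"ff1": 100.0, "ff2": 110.0, "ff3": 90.0},
    "c2": {"ff1": 60.0, "ff2": 65.0, "ff3": 55.0},
}
tns = total_negative_slack(PERIOD, data_delay, arrival)
print(tns)
```

In this toy data only the ff1 to ff2 path in corner c1 violates timing. Delaying the clock at ff2 in that corner would help this path but reduce the slack of the ff2 to ff3 path, which is the kind of trade-off the ILP has to balance across corners.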

EQ (3)

With the ILP formulation, we can obtain several appropriate Vdiff values for WDBs and their appropriate positions
for placement. In our experiment, we assume only two types of WDBs are needed. Although the ILP
formulation provides a near-optimum solution, it is too time-consuming when there is a large number of
possible buffer nodes in a clock tree that can be replaced by a WDB. In the following, we propose a
heuristic method.

Iterative Heuristic
In this section, we propose a fast heuristic to speed up the ILP process. The basic idea is as follows.
We use the term candidate set for the subset of buffer nodes that we are allowed to replace with
WDBs. Our algorithm starts with an initial candidate set. In each iteration, we first perform ILP on the current
candidate set. The ILP chooses a subset of the current candidate set based on the cost function. We then
add other buffers to this subset to form a new candidate set. A buffer node that is removed from the
candidate set will not be added back in later iterations. In the algorithm, the level of a buffer is the
distance from the node to the root of the tree. The initial candidate set is formed from the buffers in
several selected levels. In each iteration, we add nodes from other levels that have not been selected
before. In addition, a node will not be added if it satisfies the following lemma.
Lemma: Consider a non-leaf node N that is not a candidate in the current iteration. If all children of node
N are candidates but none is selected by the ILP in the current iteration, then N need not be added in the next iteration.
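
The loop described above can be sketched as follows. This is a minimal illustration of the candidate-set update and the lemma, not the original implementation: the ILP itself is replaced by a placeholder callable `solve_ilp`, and the names `iterative_selection`, `level_schedule`, and `stub_solver` are hypothetical. The order in which levels are visited is passed in explicitly because the text does not specify it.

```python
from typing import Callable, Dict, List, Set

Solver = Callable[[Set[str]], Set[str]]  # candidates -> nodes chosen for WDBs


def iterative_selection(children: Dict[str, List[str]],
                        level: Dict[str, int],
                        level_schedule: List[Set[int]],
                        solve_ilp: Solver) -> Set[str]:
    """Re-run the solver on a candidate set that is rebuilt every iteration."""
    candidates = {n for n, lv in level.items() if lv in level_schedule[0]}
    seen = set(candidates)  # nodes that have ever been candidates
    selected = solve_ilp(candidates)
    for levels in level_schedule[1:]:
        added = set()
        for node, lv in level.items():
            if lv not in levels or node in seen:
                continue
            kids = children.get(node, [])
            # Lemma: skip a non-leaf node whose children were all candidates
            # in this iteration and were all rejected by the solver.
            if kids and all(k in candidates and k not in selected for k in kids):
                continue
            added.add(node)
        candidates = selected | added  # rejected candidates are dropped for good
        seen |= added
        selected = solve_ilp(candidates)
    return selected


# Toy tree: r -> (a, b), a -> (a1, a2), b -> (b1, b2).
children = {"r": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
level = {"r": 0, "a": 1, "b": 1, "a1": 2, "a2": 2, "b1": 2, "b2": 2}

calls = []


def stub_solver(cands: Set[str]) -> Set[str]:
    """Stand-in for the ILP: keeps candidates from a fixed 'useful' set."""
    calls.append(set(cands))
    return cands & {"a1", "b"}


result = iterative_selection(children, level, [{2}, {1}], stub_solver)
print(calls, result)
```

In the toy run the first solve sees the four leaf-level buffers and keeps only a1. In the second iteration node a is added because one of its children was selected, while node b is skipped by the lemma because both of its children were candidates and were rejected. The second solve therefore runs on two candidates instead of six, which is where the runtime saving comes from.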

CHAPTER 3
EXPERIMENTAL RESULTS

Our experimental flow is shown in Fig. 6. The clock tree of a design is generated by a commercial
tool with TSMC 28nm technology. Then, we extract the clock timing information and the clock tree
structure from the STA report of PrimeTime™ as inputs to our WDB optimization framework. We
generate the WDB libraries with Hspice™. All the experiments are run on a Linux-based
workstation with a 3.5 GHz AMD CPU and 8 GB of memory.
Table 2. Experimental Results

We have implemented the ILP algorithm and the iterative ILP algorithm (IT-ILP) and run them
on a set of ISPD benchmarks and one industrial circuit. The experimental results are shown in Table 2.
The first column shows the name of the benchmark, column two gives its number of buffers, and
column three gives its number of flip-flops. Column four shows the initial TNS calculated from the
initial clock tree synthesized by the commercial tool. Columns five to eight show the TNS, number
of candidates, and runtime for ILP, and columns nine to twelve show those for IT-ILP. Note that
IND1 is a partial clock tree inside a real commercial IC, and significant effort had already been
spent to improve its TNS using the available buffers. With the iterative ILP method, we can reduce
the number of candidates to 2660 and finish the IT-ILP computation in 30 minutes, with only 119 ns
of TNS. On average, the use of WDBs achieves a 54.96% TNS improvement compared with the use of
traditional buffers.

Conclusion

This paper proposes using wide-divergence buffers to achieve timing closure while also
shortening clock buffer chains. We propose a new buffer design that enlarges the delay variation
caused by voltage fluctuation. Furthermore, we propose an ILP-based approach that performs WDB
optimization (including WDB insertion and Vdiff assignment) to minimize the total negative slack
under constraints. Experimental results show that the TNS of the benchmarks can be reduced by
54.96% on average.
