
Enhanced timing closure using latches

Vijay Bhargava, Gourav Kapoor and Syed Shakir Iqbal, Freescale Semiconductor - June 24,
2014

The efficiency of modern SoC timing closure depends critically on the effectiveness of the timing
fixes and their implementation. As we scale down to deep submicron technologies, architectural
complexity continues to increase and timing closure is no longer limited to tool optimization. The
multiple manual iterations that the STA and implementation teams go through become a very
important part of the design cycle. In addition, with the increased number of signoff corners and
with cells showing greater variation across corners, even standard setup and hold fixing can turn
out to be a major challenge in some cases. In particular, hold fixing is one of the most crucial
parts of timing fixes.

Setup can always be relaxed by lowering the clock frequency, but a hold violation cannot be traded
away in this manner and must be fixed completely. In this paper, we discuss some specific timing
scenarios where, compared with conventional timing fixes, using latches to fix timing turns out to
be a much more beneficial strategy.

Hold Fixing and Latches

Hold timing fixes are often done by increasing the delay of the affected data paths. The
implementation tool or design engineer either makes the path logic slower or adds buffers in order
to meet hold. However, there are often scenarios where the designer cannot afford to insert extra
hold buffers or downsize the data path enough to meet the hold requirements. Such scenarios are
very common with DFT architectures where, during SHIFT operation, the clock skew and the uncommon
clock path can be very large because functionally asynchronous domains are stitched together.


Figure 1. A typical lockup latch insertion scenario in scan path between two different
domains with large skew resulting in hold criticality

To close timing on such paths, DFT uses dedicated architectural timing latches called lockup
latches to take care of clock skew and the associated hold (Figure 1). A lockup latch is a level
sensitive element used to ease hold timing without interfering with the functionality of the state
machine of the design. Lockup latches provide robustness against undesired variations in clock
skew and are inserted in scan paths with very large skew or uncommon clock paths.

The practical use of lockup latches is mostly limited to scan shift mode timing closure, but this
is not the end of their application. With due care, they can also be used very effectively in
functional hold timing closure. As discussed in later sections, using lockup latches in functional
paths can save thousands of hold buffers.

In the following sections we will discuss certain scenarios where lockup latches can be used to fix
functional timing more efficiently.

Functional Timing Closure Using Latches


Traditionally, data path buffering is the most basic approach to fixing functional hold violations,
particularly if there is little or no scope for slowing the data path through cell resizing.
However, there are multiple scenarios in a design where simple data path buffering is not an option,
even if area and power are of no major concern. The general expectation that a hold critical path
will have a relaxed setup is not always true. We will now discuss certain scenarios that
demonstrate how insertion of a latch can help in functional timing closure.

Case 1: Intentional Clock Skews

Let us consider the scenario shown in Figure 2. Flops A, B and C are built at a much lower clock
latency than flop D, leading to a significant hold violation from A, B and C to D.


Figure 2. A typical scenario of intentional clock skew introduction resulting in hold critical
path at endpoint.

The intent behind this clock skew might have been to allow a larger setup capturing window in order
to meet the requirements of a very large combinational path which has been optimized to be as fast
as possible.

In this case the designer can neither insert data path buffers nor slow the data path near the
endpoint, as that would result in a setup violation. Instead, buffers have to be inserted near the
start points, at nodes that see only the short data paths responsible for the hold violations. The
number of buffers required for such hold fixing depends on the start point group size, the size of
the hold violation and the technology. Furthermore, the designer cannot reduce the clock skew, as
it was introduced intentionally to help setup.

For example, let us suppose we have 128 start point flip flops that are hold critical by 10ps to
200ps, and the maximum delay of the technology specific buffer is, say, 50ps (in the best corner,
since the path is hold critical). Resolving such a scenario takes 300 to 500 buffers on average.
Moreover, there can be many such start point groups, which escalates the buffer count further.
However, by inserting a latch within the data path, the designer can make use of latch borrowing to
manage both setup and hold in this design, as shown in Figure 3.
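
The buffer estimate in the example above can be reproduced with a rough calculation. The sketch below is only illustrative: it assumes the 128 violations are spread evenly between 10ps and 200ps, that every start point needs its own private buffer chain, and that every buffer contributes the full 50ps. The function names are hypothetical and not taken from any tool.

```python
import math


def buffers_needed(violation_ps, buffer_delay_ps):
    """Smallest number of buffers whose combined delay covers the violation."""
    return math.ceil(violation_ps / buffer_delay_ps)


def estimate_buffer_count(n_flops, min_viol_ps, max_viol_ps, buffer_delay_ps):
    """Total buffers for a start point group.

    Assumption: violations are evenly spaced between min and max, and each
    start point gets its own buffer chain (no sharing between paths).
    """
    span = max_viol_ps - min_viol_ps
    total = 0
    for i in range(n_flops):
        violation = min_viol_ps + span * i / (n_flops - 1)
        total += buffers_needed(violation, buffer_delay_ps)
    return total


if __name__ == "__main__":
    # Numbers from the example: 128 flops, 10ps to 200ps, 50ps per buffer.
    print(estimate_buffer_count(128, 10, 200, 50))
```

Under these assumptions the count falls inside the 300 to 500 range quoted above. Real buffers rarely deliver their maximum delay, so the practical count tends toward the upper end.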


Figure 3. Hold criticality introduced by intentional clock skew fixed by inserting a latch in
timing path

The negative level sensitive latch allows latch borrowing to preserve a full cycle setup path from
flop A/B/C to flop D while keeping the same clock skew. In addition, the hold check from the
launching flops A/B/C is now timed against the clock edge at the latch instead of the capture
clock, which relaxes it. To gain the maximum benefit, the clock skew between the launch flop and
the inserted latch should be kept as low as possible. Moreover, by choosing where the latch is
placed within the combinational path, the setup and hold checks from flop A to the latch and from
the latch to flop D can also be adjusted easily.
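
The effect of the latch on the two checks can be illustrated with a simplified slack model. This is a minimal sketch, not the authors' analysis: it assumes an ideal latch with zero delay, zero clock-to-Q delay, a 50% duty cycle, and a latch clock aligned with the launch clock, so that the negative level sensitive latch is opaque for the first half cycle and transparent for the second. All names and numbers are hypothetical.

```python
def slacks_flop_to_flop(period, skew, d_min, d_max, t_setup=0.0, t_hold=0.0):
    """Setup and hold slack for a plain flop-to-flop path.

    skew is capture clock latency minus launch clock latency (ps).
    """
    setup = period + skew - d_max - t_setup
    hold = d_min - skew - t_hold
    return setup, hold


def slacks_with_latch(period, skew, d1_min, d1_max, d2_min, d2_max,
                      t_setup=0.0, t_hold=0.0):
    """Same endpoint checks with a negative level sensitive latch mid-path.

    d1_* are delays from the launch flop to the latch, d2_* from the latch
    to the capture flop. The latch opens at period / 2, so data cannot leave
    it earlier; data arriving later flows through (time borrowing).
    Assumption: d1_max is below the period, so the latch does not close
    before the data arrives.
    """
    open_edge = period / 2
    earliest_out = max(d1_min, open_edge)
    latest_out = max(d1_max, open_edge)
    setup = period + skew - (latest_out + d2_max) - t_setup
    hold = earliest_out + d2_min - skew - t_hold
    return setup, hold


if __name__ == "__main__":
    # 1000ps period, 300ps intentional skew, 100ps short path, 1200ps long path.
    print(slacks_flop_to_flop(1000, 300, d_min=100, d_max=1200))
    # Latch placed 700ps into the long path and 50ps into the short path.
    print(slacks_with_latch(1000, 300, d1_min=50, d1_max=700,
                            d2_min=50, d2_max=500))
```

In this model the long path keeps its setup slack because it borrows through the transparent latch, while the short path can no longer reach flop D before the latch opens at the half cycle.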


Case 2: Cross Corner Variations

Let us consider another scenario, shown in Figure 4. Flop A is once again built at a much lower
clock latency than flop D, leading to a significant hold violation from A to D. Unlike the previous
case, this scenario can be resolved with end point buffering. However, in some cases insertion of
data path buffers can still fail to resolve such violations, especially when analyzing multiple
corners.


Figure 4. A typical example of cross corner variations creating both setup and hold critical
paths

Clock cells are generally more robust and show less variation, so the clock skew is least affected
across different corners. Data path cells, on the other hand, are generally much less robust and
more prone to variation as a result of power and area saving, and their delays can vary by about 3x
between the best and worst corners.

If the clock skew is very high and the delay scaling of the implementation technology between the
best and worst corners is poor, the same path can become hold critical in the best corner while
being setup critical in the worst. In such cases, rather than adding a chain of buffers to delay
the data path, a simple latch can be used. The latch not only reduces the number of buffers
involved; because it functionally guarantees a half cycle retain time, the fix is also far less
exposed to the variation of buffer delay across corners. Figure 5 shows how the latch based fix
can help resolve the timing challenges across multiple corners.
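
The way buffering fails across corners can be sketched numerically. The model below is a rough illustration under stated assumptions: the clock skew is identical in both corners, every data path cell (including inserted buffers) is slower in the worst corner by a single scale factor of 3, and the setup and hold requirements include any margin. The class, function names and numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class CornerPath:
    period: int        # clock period (ps)
    skew: int          # capture minus launch clock latency (ps), same in both corners
    d_best: int        # data path delay in the best corner (ps)
    scale: int         # worst corner delay / best corner delay for data cells
    buf_best: int      # delay of one hold buffer in the best corner (ps)
    t_setup: int       # setup requirement including margin (ps)
    t_hold: int        # hold requirement including margin (ps)


def corner_slacks(p, n_buf):
    """Return (best corner hold slack, worst corner setup slack) with n_buf buffers."""
    data_best = p.d_best + n_buf * p.buf_best
    data_worst = p.scale * data_best
    hold_best = data_best - p.skew - p.t_hold
    setup_worst = p.period + p.skew - data_worst - p.t_setup
    return hold_best, setup_worst


def buffers_to_fix_hold(p):
    """Smallest buffer count that fixes best corner hold, or None if that count
    breaks worst corner setup. Adding more buffers only makes setup worse, so
    the smallest count is the only candidate worth checking."""
    hold0, _ = corner_slacks(p, 0)
    n_buf = max(0, math.ceil(-hold0 / p.buf_best))
    _, setup = corner_slacks(p, n_buf)
    return n_buf if setup >= 0 else None


if __name__ == "__main__":
    tight = CornerPath(period=1000, skew=450, d_best=440, scale=3,
                       buf_best=50, t_setup=100, t_hold=100)
    print(corner_slacks(tight, 0))      # hold fails in best, setup barely met in worst
    print(buffers_to_fix_hold(tight))   # no buffer count satisfies both corners
```

Every picosecond of buffer delay added for the best corner costs roughly three picoseconds of setup in the worst corner, which is why a path with little worst corner setup slack cannot be closed by buffering alone.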


Figure 5. Cross corner variation problem solved by inserting latch in timing path


Figure 6. Timing waveform for scenario in Figure 5

Challenges in using Latches for Timing Closure

So far we have seen how functional timing closure can be made easier by using lockup latches to fix
hold in certain scenarios that are hard to close with conventional methods. However, there are
certain issues that need to be taken care of when applying latch insertion in the design.



Insertion of a lockup latch in the scan shift path is part of the standard design flow across the
industry. Inserting one in a functional path raises the following additional concerns.

a. Critical Setup Path

The data path in a scan shift path is mostly a simple flop to flop path, timed at a very relaxed
shift test clock frequency. This path is thus inherently relaxed in terms of setup and lends
itself to the use of lockup latches. A functional path, on the other hand, generally consists of
multiple levels of logic and is very likely to be timed at the functional clock frequency. This
raises the question of whether we can afford to tighten a full cycle setup path by half a cycle.

b. Functional ECO Feasibility

Since the functional path has multiple levels of complicated logic within it, inserting a latch in
the data path reduces the scope for future functional ECOs on the new path. The data path after the
latch can be almost unusable for ECOs. Thus, special care needs to be taken when placing the latch
within the data path.

c. LEC Debug

Insertion of a latch in the data path is bound to introduce non-equivalences that need to be
debugged through LEC. For the scan shift path, since the use of lockup latches is standard
practice, this exercise is much more mature and easier. LEC debug of a functional path, by
contrast, is much more complicated than that of a simple scan shift path. Hence the use of latches
in functional timing fixes is bound to make LEC more difficult.

d. Timing Subtlety

The case of a latch in a functional timing path is not as simple as in the shift path. As stated
above, a latch can only be inserted in a timing path if it does not disturb the state machine's
functionality. Some of the common timing intricacies involved are:

i. Half cycle paths. Special care needs to be taken if half-cycle paths are being formed. A
positive level sensitive latch cannot be inserted in a path from a negative edge-triggered flop to
a positive edge-triggered flop, and vice versa, as it will cause the state machine to go into an
invalid state.
ii. Different clock domains. For the insertion of a latch, we have to be sure that the clocks at
the launching and capturing flops, under all functional scenarios, run at the same frequency or,
at most, that the start-point clock runs faster. If this condition is violated, the state machine
will go into an invalid state. This is not a problem in shift mode, since all the clocks run at
the same frequency during shift operation.
iii. Maximum permissible skew. The skew between the launch and capture flop clocks cannot be more
than half a cycle of the fastest operating frequency of the launch and capture flops. If this is
not the case, even the insertion of a latch will not solve the problem.
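
The three conditions above can be collected into a simple screening check applied before a latch fix is attempted. The sketch below is only an illustration: the record fields and function name are hypothetical, the "vice versa" case in (i) is read as a negative level sensitive latch on a path from a positive edge-triggered to a negative edge-triggered flop, and the frequencies supplied are assumed to describe a single functional scenario, so the check would have to be repeated for every scenario.

```python
from dataclasses import dataclass


@dataclass
class LatchCandidate:
    launch_edge: str         # "pos" or "neg" edge-triggered launch flop
    capture_edge: str        # "pos" or "neg" edge-triggered capture flop
    latch_level: str         # "high" (positive level sensitive) or "low"
    launch_freq_mhz: float   # launch clock frequency in this scenario
    capture_freq_mhz: float  # capture clock frequency in this scenario
    skew_ps: float           # skew between launch and capture clocks


def latch_fix_problems(c):
    """Return the list of violated conditions; an empty list means the path
    passes this screening (it is not a substitute for STA or LEC)."""
    problems = []

    # (i) Half-cycle path combined with a latch polarity that breaks it.
    blocked = {("neg", "pos", "high"), ("pos", "neg", "low")}
    if (c.launch_edge, c.capture_edge, c.latch_level) in blocked:
        problems.append("half-cycle path with incompatible latch polarity")

    # (ii) Launch clock must run at the same frequency as capture, or faster.
    if c.launch_freq_mhz < c.capture_freq_mhz:
        problems.append("capture clock faster than launch clock")

    # (iii) Skew must not exceed half a cycle of the fastest clock.
    fastest_mhz = max(c.launch_freq_mhz, c.capture_freq_mhz)
    half_cycle_ps = 0.5 * 1e6 / fastest_mhz   # period in ps is 1e6 / MHz
    if c.skew_ps > half_cycle_ps:
        problems.append("skew exceeds half cycle of fastest clock")

    return problems


if __name__ == "__main__":
    ok = LatchCandidate("pos", "pos", "low", 500.0, 500.0, 300.0)
    bad = LatchCandidate("pos", "pos", "low", 250.0, 500.0, 1200.0)
    print(latch_fix_problems(ok))
    print(latch_fix_problems(bad))
```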

As stated above, there are certain architecture driven complexities and prerequisites for using
latches in functional paths, so we cannot use them everywhere. Where they cannot be used, buffer
insertion remains the way to go. Also, a latch occupies more area than a buffer, so there may be
scenarios where using a latch does not give much saving. However, when there are issues relating
to cross corner variations, we should invest in using latches, as no other option offers greater
returns.

Conclusion

In this paper we presented an enhanced methodology for fixing functional hold in a design using
latches. We presented multiple scenarios where this practice can give better results and discussed
the pros and cons associated with its implementation. We conclude that applying lockup latches to
functional timing fixes can yield good results, and that the associated issues can be addressed by
more mature tools that implement such fixes themselves more intelligently.

