Professional Documents
Culture Documents
Table 2. Achievable Clock period (ns) using flip-flops without any time-borrowing(Critical Path), theoretical minimum with
time-borrowing (MCR), pulsed latches (PL) and clock skew (CS) subject to different minimum delay assumptions.
clocks). On average, clock periods are reduced by 29%, same sequential elements. Such loops do not present a prob-
which matches fairly closely with prior work [4] (see third lem for clock skew, which uses edge triggered flip-flops,
last row in the table). The MCR column does not consider however, they do present a problem for latches, and limit
short path delay (hold-time) violations and thus, it represents the performance improvement for some latch circuits. With
a floor on the achievable clock period. 60% short path (min) delays, the pulsed latch results degrade
Moving onto the results that consider hold-time viola- further – the short path delays become so small, it is diffi-
tions. The columns grouped under 80% represent results cult, using a single pulse width, to achieve a performance
for the case of minimum path delays set to be 80% of max- improvement.
imum path delays. Observe that the performance results After observing that short paths have a considerable neg-
degrade considerably when hold-times constraints must be ative impact on both pulsed latch and clock skew-based opti-
honored, and underscores the necessity of considering such mization, we investigated the extent of delay padding (i.e. hold-
constraints. Pulsed latches provide 5% performance im- time repair) needed to improve the clock period results. Specif-
provement, on average, relative to the traditional flow (flip- ically, we investigated how each circuit’s clock period would
flops with single clock); clock skew with arbitrarily many be affected if it were possible to pad x% of its paths1 . The
clocks provides 10% improvement. Similar results are ob- aggregated results, normalized to the clock period without
served when minimum delays are set to 70% of maximum any short path repairs, are given in Fig. 4. The rightmost bar
delays. Essentially, with 70% min delays, 5% clock period indicates that about 3% additional reduction in clock period
improvement can be achieved, on average, by latch conver- would be achieved if it were possible to lengthen about 10%
sion on a routed design, without requiring any delay padding of paths. Circuit-by-circuit results could not be included due
(hold-time violation repair). to page limits, however, we did see reductions of up to 8%
for some circuits with 10% of paths delay-padded.
Comparing the results for pulsed latches and clock skew The results above are based on placement/routing so-
with 70% min delays, we observe identical clock periods lutions generated in a manner that was unaware of pulsed
for many circuits. Moreover, we found that latch-based op- latch-based optimization. We wondered whether improved
timization with a single clock achieved 90% of the clock pe- results could be achieved if the placement and routing tools
riod reductions possible with clock skew (with an arbitrary were aware of latches? VPR gauges the criticality of a con-
number of clocks) for 23 of the 27 circuits. A more detailed nection using its slack ratio, which is the ratio of the worst-
analysis revealed that the circuits for which pulsed latches
did poorly compared to clock skew contained what we term 1 For this analysis, we removed the 6 circuits that already achieved the
as self loops: combinational paths that begin and end at the optimum clock period using latch-based optimization.
1.005
1
7. CONCLUSIONS AND FUTURE WORK
0.995 In this paper, we considered performance optimization us-
ing pulsed latches for FPGAs. We showed that most of the
od
0.99
ClockPerio
0.985
gains of using skewed clocks can be realized by a post place-
0.98
0.975
and-route latch-based optimization that has no area or power
0.97 drawbacks, and that exploits existing features in commercial
0.965 FPGAs. However, short paths still limit the possible gains
0.96 of pulsed latches, and integration with the place-and-route
0.955 algorithms could yield even better results.
None 3% 5% 10% Results in Fig. 5 and Fig. 4 show that padding short paths
%ofConnectionsFixed can give considerable benefits, while driving VPR with cycle-
slacks gives varied results. It is clear that optimizing the
Fig. 4. Clock period as a result of extending 3%, 5%, or MCR throughout place and route is a flawed approach if
short paths are not considered. This calls for an integrated
10% of the total number of short paths under 70% minimum approach that exploits the time-borrowing benefits of pulsed
delay assumptions. latches and mitigates the short paths that hinder the benefits
of pulsed latches.
PostP&ROptimization Integrated+PostP&ROptimization
1.03 8. REFERENCES
1.02
1.01
d
ClockPeriod