
Unit 4

System Timing

BYU ECEn 320

Know Your Unit Prefixes


Prefix   Symbol  Decimal  Binary  |  Prefix   Symbol  Decimal
deka-    D       10^1     --      |  deci-    d       10^-1
hecto-   h       10^2     --      |  centi-   c       10^-2
kilo-    k, K    10^3     2^10    |  milli-   m       10^-3
mega-    M       10^6     2^20    |  micro-   µ       10^-6
giga-    G       10^9     2^30    |  nano-    n       10^-9
tera-    T       10^12    2^40    |  pico-    p       10^-12
peta-    P       10^15    2^50    |  femto-   f       10^-15
exa-     E       10^18    2^60    |  atto-    a       10^-18
zetta-   Z       10^21    2^70    |  zepto-   z       10^-21
yotta-   Y       10^24    2^80    |  yocto-   y       10^-24
Note: k = 10^3 and K = 2^10

Synchronous System Timing

Clock Jitter, Setup Time, Hold Time


Clock signals

• Very important with most sequential circuits


– State variables change state at clock edge.

Sample Clock Specifications

Xilinx FPGA

Virtex 2.5V FPGA Datasheet


Xilinx Corporation
Version 1.3, 1999

Micron SDRAM

256Mb SDRAM Datasheet


Micron Corporation
2002


Clock Jitter
• Jitter is the phase variation in a clock signal caused by
noise, data patterns, or other sources, where the variation
occurs at a frequency greater than a few tens of hertz.
• Slower changes in phase due to temperature, voltage, and
other physical changes are usually referred to as "wander."
• Period Jitter: The short-term variations in the clock period.

[Figure: clock waveform showing the ideal period Tideal, with the clock edge arriving up to Tjitter early (−Tjitter) or late (+Tjitter)]

Measurement of Clock Jitter
• The variation in every sampled clock period is plotted as a
histogram of the number of periods with a given length. The
histogram is approximately normally distributed.
• The outer curve is the accumulated count (in this case, 865,000
samples). Peak-to-peak period jitter is 56.2 ps.
• The inner curve is the latest sampled count of 1000 periods. Peak-
to-peak period jitter is 34.2 ps.
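
A minimal sketch of the statistics behind such a histogram, assuming the measured clock periods are available as a plain list in picoseconds. The function name and the sample values are illustrative only and are not taken from the oscilloscope capture on the slide.

```python
from statistics import mean, pstdev

def period_jitter_stats(periods_ps):
    """Return (mean period, peak-to-peak jitter, RMS jitter), all in ps."""
    avg = mean(periods_ps)
    peak_to_peak = max(periods_ps) - min(periods_ps)  # widest spread seen
    rms = pstdev(periods_ps)                          # standard deviation of the period
    return avg, peak_to_peak, rms

# Hypothetical measurements of a nominal 10 ns (10,000 ps) clock.
samples_ps = [10000.0, 10010.0, 9990.0, 10005.0, 9995.0]
avg_ps, p2p_ps, rms_ps = period_jitter_stats(samples_ps)
print(f"mean = {avg_ps:.1f} ps, peak-to-peak = {p2p_ps:.1f} ps, RMS = {rms_ps:.2f} ps")
```

Peak-to-peak jitter can only grow as more periods are accumulated, which is consistent with the accumulated count showing 56.2 ps while the latest 1000 periods show 34.2 ps.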


Clock Jitter Example

Spartan-II 2.5V FPGA Family


DC and Switching Characteristics
Xilinx Corporation
Version 2.6, August 2002

D flip-flop timing parameters

• Clock-to-q propagation delay (from CLK)


• Setup time (D before CLK)
• Hold time (D after CLK)


Xilinx Flip-Flop Variations


Xilinx Flip-Flop Timing

Setup and Hold Times
There is a timing "window" around the clocking event during which the
input must remain stable and unchanged in order to be recognized.

[Timing diagram: Input and Clock, with Tsu before the clock edge and Th after it]

Setup Time (Tsu, Ts)
Minimum time before the clocking event by
which the input must be stable

Hold Time (Th)
Minimum time after the clocking event during
which the input must remain stable
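
The window defined above can be written as a simple interval test. This is a sketch only: the function name is hypothetical, all times are in nanoseconds on a common time axis, and the 20 ns / 5 ns values are used purely as example numbers.

```python
def input_is_valid(t_change_ns, t_edge_ns, t_su_ns, t_h_ns):
    """Return True if an input transition at t_change_ns stays outside
    the setup/hold window around the clock edge at t_edge_ns."""
    window_start = t_edge_ns - t_su_ns  # input must be stable from here...
    window_end = t_edge_ns + t_h_ns     # ...until here
    return not (window_start < t_change_ns < window_end)

# Clock edge at 100 ns, example Tsu = 20 ns and Th = 5 ns: window is 80..105 ns.
print(input_is_valid(75, 100, 20, 5))   # change well before the window
print(input_is_valid(90, 100, 20, 5))   # change inside the setup region
print(input_is_valid(103, 100, 20, 5))  # change inside the hold region
```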


Setup and Hold Time Violations

Invalid!

Input value changes after the setup time. The input is not stable long
enough before the clock edge.

[Timing diagram: Input changing inside the Tsu window before the Clock edge]

Setup and Hold Time Violations

Invalid!

Input value changes before the hold time has elapsed. The input is not
stable long enough after the clock edge.

[Timing diagram: Input changing inside the Th window after the Clock edge]


Taking your friend to the train . . .


If the train leaves at 8:00 (the event) and you live 20
minutes away from the station, when should you
leave your house?

At 7:40! (“setup time” is 20 minutes before the event)

If you leave after 7:40, you will miss the train. If you
leave before 7:40, you should have enough time to
get to the station before it leaves.

Helping your friend onto the train . . .
Once the train has started, your friend needs help
staying on the train. If he does not have continuous
help for five minutes, he will fall off the train.

How long should you help your friend stay on the train after the
train has left the station?

At least five minutes (“hold time”), or until 8:05 at the earliest.

Without your help for the full 5 minutes, your friend will fall off
the train.

Negative Hold Time

The valid region or "window" associated with the clock event does not
have to be centered around the clock edge. The region can be to the
right or left of the clock edge when the setup or hold times are negative.

[Timing diagram: Input and Clock, with both Tsu and Th (negative hold time) ending before the clock edge]

When the hold time is negative, the valid region is to the left of the
clock edge.

This allows the input to change slightly before the clock edge without
disturbing the operation of the flip-flop.

Negative Setup Time

The valid region or "window" associated with the clock event does not
have to be centered around the clock edge. The region can be to the
right or left of the clock edge when the setup or hold times are negative.

[Timing diagram: Input and Clock, with both Tsu (negative setup time) and Th starting after the clock edge]

When the setup time is negative, the valid region is to the right of the
clock edge.

This allows the input to change slightly after the clock edge without
disturbing the operation of the flip-flop.

Note: you cannot have both a negative setup time and a negative hold time!

Clock-to-Q Propagation Delay


The output of a flip-flop does not change instantaneously at the
clock edge. The change in output occurs after a propagation
delay through the flip-flop.

[Timing diagram: Clk and Q, with the clock-to-Q delay Tcq labeled Tplh for the rising output transition and Tphl for the falling one]

The propagation delay is usually different for the
low-to-high and high-to-low transitions.

Timing Specifications

74LS74 Positive Edge-Triggered D Flip-Flop
• Setup time: Tsu = 20 ns
• Hold time: Th = 5 ns
• Minimum clock width: Tw = 25 ns
• Propagation delays (low to high, high to low, max and typical):
– Tplh = 25 ns (max), 13 ns (typical)
– Tphl = 40 ns (max), 25 ns (typical)

All measurements are made from the clocking event,
that is, the rising edge of the clock.

Cascaded Flip-Flops
[Figure: two D flip-flops in cascade (IN → Q0 → Q1) on a common CLK, with a timing diagram of Clock, IN, Q0, and Q1]

Cascaded Flip-Flops
Are the Setup and Hold Times met?
Q0: Input is IN
Setup and hold times are met if the input, IN, does not change
within the valid region or window.

Cascaded Flip-Flops
Are the Setup and Hold Times met?
Q1: Input is Q0
Wait! Q0 is changing right at the clock edge. Won’t this violate the
hold time of Q1?
Does it violate the setup time of Q1?

Cascaded Flip-Flops
Are the Setup and Hold times of Q1 met?

[Timing diagram: IN, Q0, and Q1; Q0 changes Tcq after the clock edge, after the hold window Th of Q1 has ended]

Yes, as long as Tplh > Th and Tphl > Th.
Do we use Tplh min or max?

Cascaded Flip-Flops
Are the Setup and Hold times of Q1 met?

[Timing diagram: IN, Q0, and Q1; Q0 changes Tcq after the clock edge, before the hold window Th of Q1 has ended]

If Tplh < Th or Tphl < Th, there is a hold time violation!

Cascading Flip-Flops
• Flip-flop families are designed to guarantee that
Tcq(min) > Th
• You can safely mix flip-flops within a family
• If you mix flip-flop families, you need to make sure
that Tcq(min)_fam1 > Th_fam2 and Tcq(min)_fam2 > Th_fam1
• You cannot solve this kind of problem by slowing
down the clock.
• The problem occurs when you mix fast flip-flops with
slow ones: the short propagation delay of the fast flip-flop
may be less than the hold time of the slow one.
– Artificial delays may need to be added

Hold-Time Margin
[Figure: flip-flop Q0 drives combinational logic (Comb) whose output Q0’ feeds flip-flop Q1; both share CLK]

• Tcq_min + Tcomb_min ≥ Thold
• Hold-Time Margin
= Tcq_min + Tcomb_min − Thold
• Hold-Time Margin with clock skew (discussed next lecture)
= Tcq_min + Tcomb_min − Thold − Tskew
• Jitter has no effect on hold-time margin, since hold-time
margin is not a function of the clock period.
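
The hold-time margin formulas above reduce to one subtraction. The sketch below is an illustration under assumed numbers (a 10 ns minimum clock-to-Q delay, no combinational logic, a 5 ns hold time); the function name is hypothetical and all values are in nanoseconds.

```python
def hold_margin_ns(tcq_min, tcomb_min, t_hold, t_skew=0.0):
    """Hold-time margin; a negative result means a hold-time violation."""
    return tcq_min + tcomb_min - t_hold - t_skew

# Directly cascaded flip-flops (no combinational logic between them).
print(hold_margin_ns(tcq_min=10, tcomb_min=0, t_hold=5))            # positive: safe
# The same path with 7 ns of clock skew in the direction of the data.
print(hold_margin_ns(tcq_min=10, tcomb_min=0, t_hold=5, t_skew=7))  # negative: violation
```

Note that the clock period does not appear anywhere in the function, which is why slowing the clock cannot repair a hold-time violation.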

Cascaded Flip-Flops
How fast can you clock this circuit?
[Figure: two cascaded D flip-flops (IN → Q0 → Q1) on a common CLK; the timing diagram marks Tcq on Q0 and Tsetup before the next clock edge, which together bound Tclk]

Tclk > Tcq + Tsetup
Do we use Tp min or max?

Synchronous Circuit
How fast can you clock this circuit?
[Figure: two D flip-flops on a common CLK with an inverter between them (IN → Q0 → Q0’ → Q1); the timing diagram marks Tcq on Q0, Tpinv on Q0’, and Tsetup before the next clock edge]

Tclk > Tcq + Tpinv + Tsetup

Setup-Time Margin
• Tclk ≥ Tcq + Tcomb + Tsetup
• Setup-Time Margin = Tclk − (Tcq + Tcomb + Tsetup)
– Provides timing margin for unexpected circumstances
– Marginal components, engineering errors, brownouts
• Clock Jitter
– Must account for worst-case clock jitter
– Tclkmin = Tclk − Tmaxjitter (the shortest period the circuit actually sees)
– Tclk ≥ Tcq + Tcomb + Tsetup + Tmaxjitter
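
As a rough worked example of the setup-time relations above, the sketch below computes the minimum nominal clock period and the resulting setup-time margin. The function names and the numeric values (including the 0.5 ns jitter figure) are assumptions chosen for illustration; all times are in nanoseconds.

```python
def min_clock_period_ns(tcq_max, tcomb_max, t_setup, t_jitter_max=0.0):
    """Smallest nominal clock period that still meets setup time."""
    return tcq_max + tcomb_max + t_setup + t_jitter_max

def setup_margin_ns(tclk, tcq_max, tcomb_max, t_setup, t_jitter_max=0.0):
    """Setup-time margin at a given nominal period (negative = violation)."""
    return tclk - min_clock_period_ns(tcq_max, tcomb_max, t_setup, t_jitter_max)

t_min = min_clock_period_ns(tcq_max=40, tcomb_max=22, t_setup=20, t_jitter_max=0.5)
margin = setup_margin_ns(100, tcq_max=40, tcomb_max=22, t_setup=20, t_jitter_max=0.5)
print(f"minimum period = {t_min} ns, margin at 100 ns = {margin} ns")
```

Unlike the hold-time case, worst-case (maximum) delays are used here because the slowest path determines whether data arrives before the next edge.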


Synchronous Design Methodology

• All loops have at least one flip-flop in the loop.
• All flip-flops are clocked by the same common clock
– Asynchronous preset/clear is not used (except for
initialization)
– The clock is not gated
• All inputs to the design are synchronous to the
clock

[Figure: block diagram of synchronous inputs, combinational logic, and flip-flops]

Synchronous Circuit

[Figure: block diagram of synchronous inputs, combinational logic, and flip-flops]

• There may be many flip-flops in the design
• There will be many wire/gate paths from flip-flop output
to flip-flop input
• Minimum Clock Period: Tclk > Tcq(max) + Tdelay(max) + Tsetup
• Task: Identify worst-case flip-flop to flip-flop timing path

State Machine Example


Flip-Flop Timing
  Th      5 ns
  Tsu     20 ns
  Tplh    25 ns (max), 10 ns (min)
  Tphl    40 ns (max), 20 ns (min)
Input Timing
  Tinput  35 ns (max), 25 ns (min)
Gate Timing
  Tplh    22 ns (max), 10 ns (min)
  Tphl    15 ns (max), 8 ns (min)

[Figure: state machine with input X, state flip-flops A and B, combinational gates, output Z, and a common CLK]

Feedback path (flip-flop output, through the input-forming logic, back to a flip-flop input):
Tp_fl + Tp_ifl + Tsu = 40 + 22 + 22 + 20 = 104 ns
(worst-case flip-flop delay of 40 ns, two gate delays of 22 ns each, and a 20 ns setup time)

Input path (input X, through the input-forming logic, to a flip-flop input):
Tinput + Tp_ifl + Tsu = 35 + 22 + 22 + 20 = 99 ns

Feedback Path: 104 ns
Input Path: 99 ns
Critical Path: 104 ns (9.6 MHz)

Output Timing
Output path (flip-flop output, through the output-forming logic, to Z):
Tp_fl + Tp_ofl = 40 + 22 = 62 ns

Output Path: 62 ns
Critical Path: 104 ns (9.6 MHz)
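
The path arithmetic of this example can be checked with a few lines of code. This is a sketch that takes the worst-case numbers from the timing table and assumes, as the slide arithmetic (22 + 22) suggests, two gate levels in the input-forming logic and one in the output-forming logic; the variable names are illustrative.

```python
# Worst-case (maximum) delays from the timing table, in ns.
FF_TP_MAX = 40      # flip-flop Tphl (max), the slower of the two transitions
GATE_TP_MAX = 22    # gate Tplh (max), the slower of the two transitions
T_SETUP = 20
T_INPUT_MAX = 35

paths_ns = {
    "feedback": FF_TP_MAX + 2 * GATE_TP_MAX + T_SETUP,   # flip-flop to flip-flop
    "input": T_INPUT_MAX + 2 * GATE_TP_MAX + T_SETUP,    # X to flip-flop
    "output": FF_TP_MAX + GATE_TP_MAX,                   # flip-flop to Z
}

critical_name = max(paths_ns, key=paths_ns.get)
critical_ns = paths_ns[critical_name]
f_max_mhz = 1000.0 / critical_ns  # 1 / ns expressed in MHz

print(paths_ns)
print(f"critical path: {critical_name}, {critical_ns} ns, {f_max_mhz:.1f} MHz")
```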

Static Timing Analysis
• Static timing analysis tools are used to identify
worst-case signal delays
– Identifies every combinational path in the circuit
– Calculates timing on each path
– Identifies the worst-case design path
• Delay file: delay of each independent net (not
combinational paths)
– .dly file in the project implementation directory
• Trace file: delay of each combinational path
– Must run trce command on placed and routed design
trce -a -v vendingmachine.ncd
– .twr output file from trce command


Delay File Example


Mon Sep 16 10:50:12 2002

File: vendingmachine.dly

The 20 Worst Net Delays are:
-------------------------------
| Max Delay (ns) | Netname    |
-------------------------------
       6.698       N103
       6.300       N106
       6.259       N104
       6.184       N105
       5.437       clkdivide<19>
       3.427       loadswitches
       2.986       clkdivide<13>
       2.977       clkdivide<14>
       2.536       switches_d<0>
       2.200       anodes_2_OBUF
       2.184       switches_d<1>
       1.986       cathodes_6_OBUF
       1.888       cathodes_5_OBUF
       1.886       anodes_3_OBUF
       1.871       switches_d<3>
       1.866       switches_d<2>
       1.865       digitselect_d<0>
       1.861       switches_d<4>
       1.825       digitselect_d<1>
       1.746       GLOBAL_LOGIC0
---------------------------------

Net detail (net name, driving pin, then delay in ns to each load pin):

N103
    u1_u1.DOA3
    6.698  cathodes_1_OBUF.F1
    6.472  cathodes_1_OBUF.G4
    6.422  cathodes_0_OBUF.F4
    6.683  cathodes_0_OBUF.G1
    6.493  cathodes_5_OBUF.G4
    6.188  cathodes_4_OBUF.F4
    6.441  cathodes_4_OBUF.G1

clkdivide<19>
    clkdivide<18>.YQ
    5.437  slowclkbuf.IN
    0.963  clkdivide<18>.G3

Trace File Example
--------------------------------------------------------------------------------
Delay: 18.621ns (data path)
Source: u1_u1.A
Destination: cathodes<6>
Data Path Delay: 18.621ns (Levels of Logic = 3)
Source Clock: CLK_BUFGP rising

Data Path: u1_u1.A to cathodes<6>


Location Delay type Delay(ns)
Physical Resource
Logical Resource(s)
------------------------------------------------- -------------------
RAMB4_R2C1.DOA3 Tbcko 3.985 u1_u1
u1_u1.A
CLB_R10C42.S0.G1 net (fanout=7) 6.441 N103
CLB_R10C42.S0.Y Tilo 0.653 cathodes_4_OBUF
Mrom_cathodes_inst_lut4_6
P148.O net (fanout=1) 1.986 cathodes_6_OBUF
P148.PAD Tioop 5.556 cathodes<6>
cathodes_6_OBUF
cathodes<6>
------------------------------------------------- ---------------------------
Total 18.621ns (10.194ns logic, 8.427ns route)
(54.7% logic, 45.3% route)


Trace File Example

Setup/Hold to clock clk
---------------+------------+------------+
               |  Setup to  |  Hold to   |
Source Pad     | clk (edge) | clk (edge) |
---------------+------------+------------+
switches<0>    |    0.478(R)|    0.001(R)|
switches<1>    |    0.301(R)|    0.179(R)|
switches<2>    |    0.436(R)|    0.043(R)|
switches<3>    |    0.300(R)|    0.180(R)|
switches<4>    |    0.355(R)|    0.124(R)|
---------------+------------+------------+

Clock clk to Pad
---------------+------------+
               | clk (edge) |
Destination Pad|   to PAD   |
---------------+------------+
anodes<0>      |   13.130(R)|
anodes<1>      |   12.500(R)|
anodes<2>      |   13.710(R)|
anodes<3>      |   13.432(R)|
cathodes<0>    |   19.761(R)|
cathodes<1>    |   20.084(R)|
cathodes<2>    |   18.920(R)|
cathodes<3>    |   19.131(R)|
cathodes<4>    |   19.582(R)|
cathodes<5>    |   20.406(R)|
cathodes<6>    |   20.452(R)|
---------------+------------+

Clock Skew

• Proper operation of synchronous systems
requires that all registered elements are clocked
at the same time
• Sometimes this is not possible: the clock seen
at one flip-flop may be slightly delayed with
respect to the clock at another flip-flop
• This relative delay of the clock is called clock skew

Causes of Clock Skew
• Natural delays in clock wiring
• Capacitance on clock line
• Artificial delay due to improper design
• Wiring delays between chips


Three Cases of Skew


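• Skew is in the same direction as dataflow
  (the first flip-flop is clocked by CLK0; the second by CLK1, which is CLK0 delayed by the skew δ)
• Skew is in the opposite direction of dataflow
  (the second flip-flop is clocked by CLK0; the first by CLK1, which is CLK0 delayed by the skew δ)
• Type of skew is not known
  (both flip-flops are clocked from a clock network)
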
Clock Skew (Type 1)
[Figure: cascaded flip-flops IN → Q0 → Q1; CLK0 clocks the first flip-flop and CLK1, delayed by the skew δ, clocks the second]

• Clock CLK1 is a delayed version of CLK0
• Clock skew = tskew

Assuming the setup and hold times for Q0 are met, are the setup and hold
times for Q1 met?

[Timing diagram: CLK0, IN, Q0, and CLK1 (CLK0 delayed by Tskew), with the Tsu and Th windows of Q1 drawn around each CLK1 edge]

Clock Skew (Type 1)

[Timing diagram: CLK0, IN, Q0 (changing Tcq after each CLK0 edge), and CLK1 (delayed by Tskew) with its Ts and Th windows]

Hold Time Margin = Tcq − Thold − Tskew > 0
Should we use Tcq(min) or Tcq(max)?

Hold Time Margin = Tcq(min) − Thold − Tskew > 0
How should combinational propagation delays be handled?

Clock Skew and Hold Time Margin


• Hold-time margin with combinational delay and skew in the
direction of data:
Hold Time Margin =
Tcq(min) + Tcomb(min) − Thold − Tskew(max) > 0
• The first two terms give the minimum time after the clock edge
at which a D input changes
• The hold time is the earliest time after the clock edge that the
input may change
• Clock skew subtracts from the available
hold-time margin
• Compensating for clock skew:
– Longer flip-flop propagation delay
– Explicit combinational delays
– Shorter (even negative) flip-flop hold times

Very Long Clock Skew

[Timing diagram: CLK0, IN, Q0, and CLK1 delayed by a very long Tskew, with the Ts and Th windows of Q1 falling well after Q0 has changed]

The clock skew can be so long that setup and hold times
are met. What is wrong with this condition?

Clock Skew and Clock Rate


[Figure: cascaded flip-flops IN → Q0 → Q1, with CLK1 delayed from CLK0 by the skew δ; the timing diagram marks Tclk, Tcq, Ts, Th, and Tskew]

Assuming hold times are met, how does clock skew affect the
minimum clock period (and so the maximum clock rate) of this circuit?

Clock Skew and Clock Rate

Setup Time Margin = Tclk − (Tcq + Tsetup − Tskew) > 0
Tclk > Tcq + Tsetup − Tskew

Summary of Effects of Skew
Skew is in the same direction as data
  Hold-time margin  = Tcq(min) + Tcomb(min) − Thold − Tskew(max)              (decreases)
  Setup-time margin = Tclk − (Tcq(max) + Tcomb(max) + Tsetup − Tskew(max))    (increases)
  Max frequency     = 1 / (Tcq(max) + Tcomb(max) + Tsetup − Tskew(max))       (increases)

Skew is in the opposite direction of data
  Hold-time margin  = Tcq(min) + Tcomb(min) − Thold + Tskew(max)              (increases)
  Setup-time margin = Tclk − (Tcq(max) + Tcomb(max) + Tsetup + Tskew(max))    (decreases)
  Max frequency     = 1 / (Tcq(max) + Tcomb(max) + Tsetup + Tskew(max))       (decreases)

Skew type not specified or unknown: assume the worst case
  Hold-time margin  = Tcq(min) + Tcomb(min) − Thold − Tskew(max)              (decreases)
  Setup-time margin = Tclk − (Tcq(max) + Tcomb(max) + Tsetup + Tskew(max))    (decreases)
  Max frequency     = 1 / (Tcq(max) + Tcomb(max) + Tsetup + Tskew(max))       (decreases)
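
The three cases in the summary differ only in the sign applied to the skew term. The following sketch encodes that rule; the function name, the `direction` strings, and the example numbers are assumptions made for illustration, and all times are in nanoseconds.

```python
def skew_margins(tclk, tcq_min, tcq_max, tcomb_min, tcomb_max,
                 t_setup, t_hold, t_skew_max, direction):
    """Return (hold_margin, setup_margin) for a given skew direction.

    direction: "same" (skew follows the data), "opposite", or "unknown"
    (unknown assumes the worst case for both margins).
    """
    if direction == "same":
        hold_sign, setup_sign = -1, -1      # hurts hold, helps setup
    elif direction == "opposite":
        hold_sign, setup_sign = +1, +1      # helps hold, hurts setup
    elif direction == "unknown":
        hold_sign, setup_sign = -1, +1      # hurts both
    else:
        raise ValueError(f"unknown direction: {direction!r}")
    hold = tcq_min + tcomb_min - t_hold + hold_sign * t_skew_max
    setup = tclk - (tcq_max + tcomb_max + t_setup + setup_sign * t_skew_max)
    return hold, setup

example = dict(tclk=100, tcq_min=10, tcq_max=40, tcomb_min=8, tcomb_max=22,
               t_setup=20, t_hold=5, t_skew_max=3)
for case in ("same", "opposite", "unknown"):
    print(case, skew_margins(direction=case, **example))
```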


Example of bad clock distribution

Clock distribution in ASICs

• This is what a typical ASIC router will do if you
don’t lay out the clock by hand.

“Clock-tree” solution

• Often laid out by hand
• Wide, fast metal (low R ⇒ short RC time constant)

Clock Skew in FPGAs
• Wiring delays in FPGAs are relatively long
– Wiring delay much longer than gate delay
– Custom ICs: gate delays dominate wiring delay
• FPGAs use a segmented routing structure
– Each wire consumes multiple routing segments
– Routing segments connected by routing boxes: switches & gates
– Each wire segment and routing box adds delay
• It is very difficult to guarantee that the clock
arrives at all flip-flops at the same time

FPGA Routing Example

[Figure: FPGA routing fabric in which routing boxes join local interconnect, global lines, and long lines]

Clock Skew in FPGAs
• Solution: Provide a dedicated low-skew clock wire
– Single wire routed throughout entire device
– Delay of wire carefully controlled (same at each FF)
– High-speed (short rise and fall time)
• Spartan 2 (3) FPGA
– Provides 4 (8) global clock wires
– Must use the BUFG primitive to use a clock wire


Clock Skew in FPGAs


• Timing reports indicate high-skew clock net
WARNING:Timing:2554 - Clock nets using non-dedicated resources were found in
this design. Clock skew on these resources will not be automatically
addressed during path analysis. To create a timing report that analyzes
clock skew for these paths, run trce with the '-skew' option.

The following clock nets use non-dedicated resources:
slowclock

• Go back and fix the design so that a low-skew
BUFG drives the clock

DLL-Based Clock Design

Xilinx AppNote: xapp462


Conventional Clock Driver

[Figure: CLK_PIN → IBUFG → BUFG → CLK]

• Delay is introduced in the BUFG driver
(output of BUFG is delayed from output of IBUFG)
• This common delay does not increase clock skew
between internal flip-flops.
• However, there will be considerable skew
between flip-flops outside of the FPGA and
flip-flops inside the FPGA

Xilinx Delay-Locked Loop ( DLL )
• Used to synchronize two clocks
– Minimize internal clock skew
– Remove internal clock buffer delays
• Operates using feedback
– Compares the phase of the input clock and feedback
clock
– Adds delay to ensure CLKFB and CLKIN are in phase

Using a Delay-Locked Loop


[Figure: CLK_PIN → IBUFG → CLKDLL → BUFG → CLK, with the BUFG output fed back to the CLKDLL]

• The delay of an IBUFG has been characterized and is compensated
for in the DLL.
• Using feedback from the BUFG output, the CLKDLL adds the appropriate
delay to ensure that the input of the IBUFG (clk_pin) and the output of
the BUFG (clk) are in phase (i.e., the delays of both the BUFG and the
IBUFG are hidden).
• The output clock has a 50% duty cycle and very low jitter.

Using a Single DLL

[Figure: CLKIN_IN → IBUFG → DLL (CLKIN, CLKFB, RST, CLK0, LOCKED); CLK0 → BUFG → CLK to the rest of the FPGA and back to CLKFB; RST_IN drives RST; LOCKED drives LOCKED_OUT and a chain of four D flip-flops with Set inputs, whose first D input is tied to ‘0’ and whose last output is RST]

• This synchronizes CLK_IN with CLK. That gives control over setup
times for external inputs.
• RST_IN: tie to a push button. The DLL won’t start the locking
process until Rst is low.
• LOCKED is asserted when the DLL has acquired the lock on the
input clock. Until then, force Rst high.
• The Sets are asynchronous: you want them to work even before
the clock does.
• This ensures the main system is reset for about 4 cycles after the DCM locks.
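
A small behavioral model of the reset chain may make the "about 4 cycles" figure concrete. This is a sketch under one stated assumption that is not spelled out on the slide: the asynchronous Set inputs are taken to be active whenever LOCKED is low. The function name and the LOCKED sequence are illustrative.

```python
def reset_after_lock(locked_per_edge, stages=4):
    """Model a chain of flip-flops with asynchronous Set and D tied to '0'.

    locked_per_edge: value of LOCKED (0 or 1) at each rising clock edge.
    Returns the value of RST (output of the last stage) after each edge.
    """
    q = [1] * stages                 # Sets hold every stage high before lock
    rst_trace = []
    for locked in locked_per_edge:
        if not locked:
            q = [1] * stages         # asynchronous Set overrides the clock
        else:
            q = [0] + q[:-1]         # shift a '0' in from the first D input
        rst_trace.append(q[-1])
    return rst_trace

# LOCKED rises at the third clock edge in this made-up sequence.
trace = reset_after_lock([0, 0, 1, 1, 1, 1, 1, 1])
print(trace)
```

In this model RST stays high until the fourth clock edge after LOCKED rises, so the rest of the system sees a clean, synchronously released reset.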


DLL Notes
• The DLL may take several thousand clock cycles to
synchronize clocks.
– Output clocks may introduce glitches, spikes, and other spurious
movement
– The LOCKED output pin indicates that the DLL has locked
synchronization
– You must allow proper starting of flip-flops driven by DLL by using
LOCKED signal
• Input clock frequency must be stable and fall within
specified frequency range
• Resetting DLL:
– DLL must be reset when input frequency changes
– DLL must be reset when device is reconfigured
• The phase and duty cycle can be controlled with
attributes
– CLK0, CLK90, CLK180, and CLK270 output pins
Dealing with Late Starting Clocks
[Figure: a chain of D flip-flops, with the first D input tied to ‘1’, generates the DLL RST; CLKIN_IN → IBUFG → DLL CLKIN; CLK0 → BUFG → on-chip circuits and CLKFB; LOCKED → reset circuitry]

• These FFs power up to the low state.
• When FPGA configuration is done, the I/O buffers turn on.
However, the input clock pin starts toggling a little late.
• Reset the DLL until after the clock starts so the DLL can
acquire the lock properly.
• The DLL must be held in reset until the input clock is stable.

Clock Division and Multiplication

[Figure: DLL generating CLK, CLK2x, and CLK/16 (CLKDV_DIVIDE=16), distributed through BUFGs]

• Custom outputs for a 2x clock multiplier and for
programmable clock division (divide by 1.5 to 16)
• Multiple DLLs are needed for further multiplication/division

Coarse Phase Adjustment Using DLLs
• Each DLL has many outputs including:
– CLK0: the “normal” clock output
– CLK90: 90 degree phase delay
– CLK180: 180 degree phase delay
– CLK270: 270 degree phase delay
• Useful in special contexts
• Not useful for “globally synchronous” designs


The First Attempt at Clocking the SDRAM

[Figure: CLKIN drives the FPGA core and is also sent through an OBUF to the SDRAM]

The Results (not good)

Setup and Hold times are not met at the SDRAM. The problem is there is
skew between the on-chip clock and the clock that goes to the SDRAM.


Removing Board-Level Skew

• Clock leaving chip is synchronized to input clock
(feedback delay includes board wiring)
• Internal clock is synchronized to input clock

[Figure: FPGA with two clock paths, one feeding the SDRAM with board-level feedback]

Dual DLLs

• One DLL gets feedback from off-chip
• Generates clock for off-chip devices
• Second DLL generates clock for on-chip
• Both add enough delay to achieve ideal clock alignment
• Entire board now behaves as one large globally synchronous system
• Clk-to-Q times for signals leaving FPGA are large, but all
synchronous devices share one clock
• This solves many problems

[Figure: dual-DLL circuit. CLK_IN → IBUFG → CLKBUF feeds the CLKIN of both DLLs. EXTDLL: CLK0 (EXTCLK0) → OBUF → SCLK to off-chip circuits; SCLK_FB from off-chip circuits → IBUF → FBBUF → CLKFB; LOCKED → EXTLOCK, which produces INTRST for the second DLL. INTDLL: CLK0 → BUFG → CLK to on-chip circuits and to CLKFB; LOCKED → LOCKED_OUT to reset circuitry.]

Simulation of Dual DLL
[Simulation waveform with six marked events:]
1. clk_in and sclk_fb are not aligned.
2. DLL advances sclk so clk_in and sclk_fb are aligned.
3. External DLL locks, starts internal DLL.
4. clk_in is aligned with both clk and sclk_fb.
5. Internal DLL locks.
6. Reset falls, count starts counting.

Instancing The Building Blocks


library ieee;
use ieee.std_logic_1164.ALL;
use ieee.numeric_std.ALL;

-- UNISIM provides the component definitions. It also provides simulation
-- capability for your design that is ignored in synthesis.
library UNISIM;
use UNISIM.Vcomponents.ALL;

-- The components below are defined for you in UNISIM.Vcomponents;
-- the declarations are shown here for reference only.
component IBUFG
  port ( I : in  std_logic;
         O : out std_logic );
end component;

component BUFG
  port ( I : in  std_logic;
         O : out std_logic );
end component;

component CLKDLL
  port (
    clkin  : in  std_logic;
    clkfb  : in  std_logic;
    rst    : in  std_logic;
    clk0   : out std_logic;
    clkdv  : out std_logic;
    locked : out std_logic
  );
end component;

Metastability


The Bistable Element


• The simplest sequential circuit
• Two stable states
– One state variable, say, Q

[Figure: two cross-coupled inverters shown in each of the two stable states, with the HIGH and LOW levels swapped between them]

Analog Analysis
• Assume pure CMOS thresholds, 5V rail
• Theoretical threshold center is 2.5 V


Dynamic Analysis

[Plot: transfer curve of the top inverter and the transfer curve of the bottom inverter, reflected so both share the axes Vin1 = Vout2 (horizontal) and Vout1 = Vin2 (vertical)]

Points of crossing indicate stable states

Dynamic Analysis

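[Plot: the same transfer curves on axes Vin1 = Vout2 and Vout1 = Vin2, with numbered steps 1 to 4 tracing successive passes around the loop]

Vout1 = T(T(T(Vin1)))
where the innermost T(Vin1) is Vout1 and T(T(Vin1)) is Vout2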

Metastable State
• Metastability is inherent in any bistable circuit

[Plot: inverter transfer curves on axes Vin1 = Vout2 and Vout1 = Vin2, crossing at three points; the middle crossing is at 2.5 V on both axes]

• Two stable points, one metastable point

Metastability


Another look at metastability

Why Study Metastability?
• All real systems are subject to it
– Problems are caused by “asynchronous inputs” that do
not meet flip-flop setup and hold times.
• Especially severe in high-speed systems
– since clock periods are so short, “metastability
resolution time” can be longer than one clock period.
• Many digital designers, products, and
companies have been burned by this
phenomenon.


Flip-Flop Decision Window

If D is not stable during the decision window, then metastability may occur.

[Timing diagram: decision window around the clock edge, with the output changing tcq after the edge]

Metastability resolution time

[Timing diagram: output settling a time tr after the normal tcq]

tr (resolution time) is the extra time needed to resolve the logic state

Metastability Resolution Time


• Various manufacturers use various definitions of tr
– Extra time beyond tcq to resolve metastable output
– Total time after clock edge to resolve it
• Pay attention to the examples to ensure you
understand which it is…

Metastability Window
[Timing diagram: clock period Tclk containing the metastability window T0 near the clock edge, with tcq and hold marked]

Assume that data arrives uniformly over the clock cycle, Tclk.
The probability that data will arrive during T0 in a clock period Tclk
(the probability of a metastable event happening) is:

P = T0 / Tclk = T0 · f

Metastability Resolution Time

P = e^(−tr/τ)
The probability of a metastable event lasting longer than some time, tr
τ: “Resolving Time Constant”

[Plot: P versus tr, starting at 1 for tr = 0; linear on a semi-log plot]

How Resolution Time is Measured
[Plot: output voltage between VOLmax and VOHmin versus time after tcqMax, with failure counts per mask showing an exponential decay of failures]

A digital oscilloscope is used to count failure events by zones
(masks). The width of each mask represents a time unit for
comparing events at different times. The tallies of these
masks reveal the population decay rate. The number of masks
should be chosen so that enough of the decay is observed to
estimate its rate.

Analysis of Failure Counts

[Plot: failure counts versus time on a semi-log scale, with fitted slope b/a]

−1/τ = b/a

Mean Time Between Failure

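Error Rate =
  (frequency of asynchronous events)
  × (probability of a metastable event happening)
  × (probability of a metastable event lasting longer than some time, tr)

where
  frequency of asynchronous events = a
  probability of a metastable event happening: P = T0 · f
  probability of it lasting longer than tr: P = e^(−tr/τ)

Error Rate = a · T0 · f · e^(−tr/τ)

MTBF = 1 / Error Rate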

Flip-Flop Metastability Failure Time


MTBF(tr) = e^(tr/τ) / (T0 · f · a)
MTBF(tr) = mean-time between synchronizer failures,
where a failure occurs if metastability persists
beyond time tr after Tcq.

f frequency of flip-flop clock


a number of asynchronous input changes per second
T0 Metastability time window (describes the likelihood of
going metastable)
τ Resolving time constant (describes the speed at which
the metastable condition is being resolved)

Typical flip-flop metastability parameters

MTBF(tr) = e^(tr/τ) / (T0 · f · a)

τ and T0 are flip-flop dependent constants

Xilinx Metastability Measurements

XAPP 094

• T0 = 0.1 · 10^-9
– XC4000-3 IOB: τ = 0.062 ns
– XC4000-3 CLB: τ = 0.051 ns
MTBF Metastability Example
• 10 MHz microprocessor clock
• Asynchronous input changes 100,000 times/second
• T0 = 0.4 s (for 74LS74)
• τ = 1.5 ns (for 74LS74)
• Output must be stable 80 ns after Tcq

MTBF(tr) = e^(tr/τ) / (T0 · f · a)
MTBF(80 ns) = e^(80/1.5) / (0.4 · 10^7 · 10^5) = 3.6 · 10^11 sec
(~100 centuries)

Note: if you ship 10,000 copies of your system, about one will fail
every year
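
The example above can be reproduced numerically. This is a minimal sketch of the MTBF formula with the slide's values plugged in; the function and variable names are illustrative, and T0 is taken to be in seconds so that the result comes out in seconds.

```python
import math

def mtbf_seconds(t_r, tau, t0, f_clk, a):
    """MTBF(tr) = e^(tr/tau) / (T0 * f * a), with times in seconds and rates in Hz."""
    return math.exp(t_r / tau) / (t0 * f_clk * a)

SECONDS_PER_YEAR = 365.25 * 24 * 3600

mtbf_s = mtbf_seconds(t_r=80e-9, tau=1.5e-9, t0=0.4, f_clk=10e6, a=100e3)
mtbf_years = mtbf_s / SECONDS_PER_YEAR
print(f"MTBF = {mtbf_s:.2e} s = {mtbf_years:,.0f} years")
```

Because tr sits in the exponent, small changes in the available resolution time change the MTBF by orders of magnitude, while T0, f, and a only scale it linearly.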

Is 1000 years enough?


• If MTBF = 1000 years and you ship
52,000 copies of the product, then some
system experiences a mysterious failure
every week.
• Real-world MTBFs must be much higher.
• How to get better MTBFs?
– Use faster flip-flops
– Wait for multiple clock ticks to get a longer
metastability resolution time (i.e. increase tr)
• Waiting longer usually doesn’t hurt performance
• …unless there is a critical “round-trip” handshake.
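
The fleet arithmetic behind the "one failure every week" claim is a single division. The sketch below assumes that units fail independently at a constant rate, so failure rates simply add across the fleet; the function name is illustrative.

```python
def fleet_failures_per_year(mtbf_years, units_shipped):
    """Expected failures per year across a fleet of identical, independent units."""
    return units_shipped / mtbf_years

failures = fleet_failures_per_year(mtbf_years=1000.0, units_shipped=52_000)
weeks_between_failures = 52.0 / failures
print(f"{failures:.0f} failures/year, one every {weeks_between_failures:.0f} week(s)")
```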

Asynchronous Inputs and Synchronizers


Asynchronous inputs
• Not all inputs are synchronized with the clock
– Keystrokes
– Switches & buttons
– Sensor inputs
– Interrupt signals
– Asynchronous communication protocols (UART, etc.)
• Asynchronous inputs must be synchronized with
system clock before being used within the
system
– Asynchronous inputs can cause data integrity
problems if the signals are not synchronized properly.

Synchronization Problems
• A flip-flop output may not operate
correctly when setup and hold times are
not met.
– Output may enter a metastable state
– Time in this state is theoretically unbounded
(although probability decreases exponentially
with time)
– Some gates may interpret metastable state
as a “1” while others will interpret it as a “0”


A One-Stage Synchronizer

Metastability Resolution Time
• Max tr - Maximum metastability resolution time
• Maximum time that the output can remain
metastable without causing synchronizer failure
– The flip-flop may be metastable for a short time and
return to normal before being sampled by the next FF

tr < setup-time margin
tr < Tclk − tcq(max) − tcomb(max) − tsetup
tr(max) = setup-time margin

[Timing diagram: clock and ff-out waveforms; the clock period is divided into tcq, tcomb, the setup-time margin (tr), and tsetup]

A Two-Stage Synchronizer

• If FF1 resolves in less than tr = tclk − tcq(max) − tsetup, then only META goes metastable.
• SYNCIN is valid early in the clock cycle, giving maximal setup
margin to the synchronous system.
• Easy to calculate the probability of “synchronizer failure” (FF1
still metastable when META sampled)
• This is the recommended synchronizer circuit for most cases.
MTBF Metastability Example
- Asynchronous interrupt to a microprocessor system
- 10 MHz system clock
- Use the following synchronizer circuit
- Use 74LS74 parts

Step 1: compute Max tr = setup time margin to FF2


tr = tclk − tcq(max) − tsetup = 100 ns − 10 ns − 10 ns = 80 ns

MTBF Metastability Example

Step 2: Compute MTBF
• 10 MHz microprocessor clock
• Asynchronous input changes 100,000 times/second
• T0 = 0.4 (for 74LS74)
• τ = 1.5 ns (for 74LS74)

$$\mathrm{MTBF}(t_r) = \frac{e^{t_r/\tau}}{T_0 \cdot f \cdot a}$$

$$\mathrm{MTBF}(80\ \text{ns}) = \frac{e^{80/1.5}}{0.4 \cdot 10^{7} \cdot 10^{5}} = 3.6 \cdot 10^{11}\ \text{sec}$$
(~100 centuries)

Note: if you ship 10,000 copies of your system, one will fail
every year
MTBF Metastability Example #2

Increase clock rate to 16 MHz

tr = tclk − tcq(max) − tcomb − tsetup = 62.5 ns − 10 ns − 0 ns − 10 ns = 42.5 ns

$$\mathrm{MTBF}(42.5\ \text{ns}) = \frac{e^{42.5/1.5}}{0.4 \cdot 1.6 \cdot 10^{7} \cdot 10^{5}} = 3.1\ \text{sec}\ !$$
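
The two examples can be rechecked numerically. The sketch below evaluates tr and MTBF at 10 MHz and 16 MHz with the 74LS74 figures from the slides; the function and constant names are illustrative, and T0 is assumed to be in seconds.

```python
import math

def resolution_time(f_clk_hz, t_cq_max, t_setup, t_comb=0.0):
    """Resolution time tr = tclk - tcq(max) - tcomb - tsetup, in seconds."""
    return 1.0 / f_clk_hz - t_cq_max - t_comb - t_setup

def mtbf(t_r, tau, t0, f_clk_hz, a_hz):
    """MTBF(tr) = exp(tr / tau) / (T0 * f * a), in seconds."""
    return math.exp(t_r / tau) / (t0 * f_clk_hz * a_hz)

# 74LS74 values used in the examples (T0 assumed to be in seconds).
TAU = 1.5e-9      # s
T0 = 0.4          # s
A = 1e5           # asynchronous input changes per second
T_CQ = 10e-9      # s
T_SETUP = 10e-9   # s

for f_clk in (10e6, 16e6):
    t_r = resolution_time(f_clk, T_CQ, T_SETUP)
    print(f"{f_clk / 1e6:.0f} MHz: tr = {t_r * 1e9:.1f} ns, "
          f"MTBF = {mtbf(t_r, TAU, T0, f_clk, A):.2g} s")
```

Because tr sits in the exponent, cutting it from 80 ns to 42.5 ns costs about eleven orders of magnitude of MTBF, which is far more than the change in f alone would suggest.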


Multiple-cycle synchronizer

tr = n · tclk − tcq(max) − tsetup

$$\mathrm{MTBF}(t_r) = \frac{e^{(n \cdot t_{clk} - t_{cq(\max)} - t_{setup})/\tau}}{T_0 \cdot f \cdot a}$$
What is wrong with this circuit?
Multiple-cycle synchronizer

Clock-skew problem


Deskewed multiple-cycle synchronizer

• Necessary in really high-speed systems


• DSYNCIN is valid for almost an entire clock period

Cascaded Synchronizer
[Circuit diagram: ASYNCIN passes through a chain of flip-flops MET1, MET2, ..., METn, all clocked by CLOCK]

Skew problem can be eliminated by using cascaded synchronizers

Probability of failure is the product of the failure probabilities of the n+1 flip-flops

$$\mathrm{MTBF}(t_r) = \frac{e^{n \cdot t_r/\tau}}{T_0 \cdot f \cdot a} = \frac{e^{n \cdot (t_{clk} - t_{cq(\max)} - t_{setup})/\tau}}{T_0 \cdot f \cdot a}$$

Note that this is not as effective as the multiple-cycle synchronizer,
since the setup time and clock-to-q time must be subtracted for each flip-flop
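
To see how much the per-stage overhead costs, the sketch below compares the multiple-cycle formula (tr = n · tclk − tcq − tsetup) with the cascaded formula for the same n. It assumes a 16 MHz clock and the 74LS74 figures used in the earlier examples, keeps f equal to the system clock in both formulas as the slides do, and uses illustrative names throughout.

```python
import math

TAU, T0, A = 1.5e-9, 0.4, 1e5        # 74LS74 figures (T0 assumed in seconds)
F_CLK = 16e6                          # Hz, assumed system clock
T_CLK = 1.0 / F_CLK
T_CQ = T_SETUP = 10e-9                # s

def mtbf_multicycle(n):
    """One long wait: tr = n*tclk - tcq - tsetup."""
    t_r = n * T_CLK - T_CQ - T_SETUP
    return math.exp(t_r / TAU) / (T0 * F_CLK * A)

def mtbf_cascaded(n):
    """n short waits: tcq and tsetup are paid in every stage."""
    t_r = T_CLK - T_CQ - T_SETUP
    return math.exp(n * t_r / TAU) / (T0 * F_CLK * A)

for n in (1, 2, 3):
    print(n, f"{mtbf_multicycle(n):.2g}", f"{mtbf_cascaded(n):.2g}")
```

The exponents differ by (n − 1)(tcq + tsetup)/τ, so the two agree for n = 1 and the multiple-cycle form pulls ahead by a factor of e^((n−1)(tcq+tsetup)/τ) as n grows.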

Multiple Clock Domains


• A clock domain is defined as all synchronous logic and
signals driven by a single clock or multiple derived clocks
that have constant phase relationships to the primary
clock, such as
– Inverted clocks
– Half frequency clocks
• A signal driven from one clock domain arrives as an
asynchronous signal in the destination-clock domain,
possibly violating the destination flip-flop setup or hold
times, and causing it to enter a metastable condition.
• Static timing analysis tools can only operate within a
clock domain, not across multiple clock domains.
• Paths that cross clock domains are false paths (paths
that cannot be analyzed by static timing tools); any
logic in this path must be carefully crafted and verified.
Synchronizers Between Different Clock
Domains

[Block diagram: two synchronous domains, clocked by CLK1 and CLK2, connected through two synchronizers]

Synchronization Objectives:
• Don’t miss any events, or drop any data
• Don’t duplicate any events, or data
• Maintain event/data order

Level Synchronization
[Timing diagram: clk1, in, clk2, out]

A level signal synchronized with clk1 is translated into a level signal synchronized with clk2.
The exact delay from the signal in to out is not specified.

[Circuit diagram: synchronizer clocked by CLK2, with signals IN, META, and OUT]
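
A behavioral sketch of the level synchronizer, assuming the usual two-flip-flop structure suggested by the IN, META, and OUT labels. It models only the clocked shifting, not metastability itself, and the class and method names are illustrative.

```python
class TwoFlopSynchronizer:
    """Behavioral model: IN -> META -> OUT, both flip-flops clocked by clk2."""

    def __init__(self):
        self.meta = 0
        self.out = 0

    def clk2_edge(self, in_level):
        # Both flip-flops update together, so OUT takes the old META value.
        self.out, self.meta = self.meta, in_level
        return self.out

sync = TwoFlopSynchronizer()
samples = [0, 1, 1, 1, 0, 0, 0]           # IN as seen at successive clk2 edges
outputs = [sync.clk2_edge(s) for s in samples]
print(outputs)
```

In this model OUT follows IN two clk2 edges later. On real hardware the first edge may or may not capture a change that lands near it, which is why the exact delay is not specified.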

Pulse Synchronization
[Timing diagram: clk1, in, clk2, out]

A pulsed signal synchronized with clk1 is translated into a pulsed signal synchronized with clk2.
The exact delay from the signal in to out is not specified.

[Circuit diagram: pulse catcher (clocked by CLK1, with EN and RST) followed by a synchronizer (clocked by CLK2), with signals IN, META, and OUT]

Understanding the Failure Modes


• Divergence of an Asynchronous Signal
A divergence of an asynchronous signal can cause
functional errors
• Convergence of Asynchronous Signals
The combinational logic with asynchronous inputs
can cause glitches that are caught by the
synchronizer, creating functional errors.
• Convergence of Synchronized Signals
Once synchronization is completed, the structures
beyond the synchronizers still matter. The design
must ensure that the synchronized signals do not
converge—reconvergence can create functional
errors.
Divergence of an Asynchronous Signal

SYNC1 ≠ SYNC2


Even Worse Divergent Paths

• Combinational delays to the two synchronizers are likely to be different.
Example of the Problem
One-hot State Machine

[Diagram: one-hot state machine with states A and B; signals in, CLOCK, and IN (ASYNC)]

1. in is asserted asynchronously.
2. in changes near the clock edge, and both flip-flops go metastable.
3. State A is reset, but state B is not set.
4. State machine is stuck in an invalid state. Game over!
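
The failure in steps 1 to 4 can be enumerated. The sketch below assumes a simple next-state rule (leave A and enter B when in is 1), which is a guess at the logic in the figure rather than a transcription of it, and lets each flip-flop resolve the asynchronous input independently.

```python
from itertools import product

def next_state(a, b, in_seen_by_a, in_seen_by_b):
    """Assumed one-hot next-state logic: leave A, and enter B, when in is 1."""
    next_a = a and not in_seen_by_a
    next_b = b or (a and in_seen_by_b)
    return int(next_a), int(next_b)

# Machine starts in state A (A=1, B=0).
for seen_a, seen_b in product((0, 1), repeat=2):
    a, b = next_state(1, 0, seen_a, seen_b)
    status = "valid" if a + b == 1 else "INVALID"
    print(f"A sees in={seen_a}, B sees in={seen_b} -> A={a} B={b} ({status})")
```

Whenever the two flip-flops disagree about in, the result has either no hot state or two hot states; the first of these is the case described in step 3.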

The Solution to this Problem

in is captured by a single synchronizer, then it is propagated to the state machine.

[Diagram: IN (ASYNC) feeds a synchronizer clocked by CLOCK, whose output in drives the one-hot state machine with states A and B]
The Way to Do It

• One synchronizer per input


• Carefully locate the synchronization points in a
system.


Convergence of Asynchronous Signals

[Diagram: Combinational Logic and Synchronizer blocks; clocks CLKA and CLKB]

Combinational logic can cause glitches that are caught by the synchronizer, creating functional errors.
Timing cannot be checked by a static timing checker.

The Way to Do It

[Diagram: Combinational Logic and Synchronizer blocks; clocks CLKA and CLKB]

Add a flip-flop to filter out the glitches.
Register all signals before crossing clock boundaries.
Timing can be checked by static timing analyzer.


Convergence of Synchronized Signals


[Diagram: X and Y (CLKA) each pass through a synchronizer to become Xsync and Ysync (CLKB), which feed combinational logic]

The logic beyond the synchronizers still matters: the propagation delay through the synchronizers is not predictable, so correlation among signals crossing clock domains is lost.
The relative timing (order of changes) that exists between X and Y does not exist between Xsync and Ysync.
Simultaneity of Xsync and Ysync cannot be assured during synchronization.
Sequence of Xsync and Ysync cannot be assured during synchronization if X and Y change concurrently.
The Way to Do It

[Diagram: Combinational Logic and Synchronizer blocks; clocks CLKA and CLKB]

Consolidate signals before they are sent to reduce the number of registers
and eliminate the issue of relative timing between synchronized signals.
When planning the design, it's important to eliminate the need for these
arrival-order (timing) dependencies between synchronized signals.

Multiple Clock Domain Design


• Identify all of the clock domains in your circuit.
• Identify signals that go from one clock domain to another.
Try to limit the number of signals that cross boundaries.
• Check for faulty structures (CAD tools can help):
– Divergence of an Asynchronous Signal
– Convergence of Asynchronous Signals
– Convergence of Synchronized Signals
• For each crossing signal, determine what kind of
synchronizer is needed.
• For data bus crossings, do not synchronize data. Instead,
define a handshake protocol, and only synchronize the
control signals.

Transferring Data
Across Different Clock Domains


Multi-Clock Systems
• With increasing system integration, systems
operating with multiple I/O standards using
multiple asynchronous clock frequencies are
becoming more common.
• Because of the asynchronous nature of these
designs, passing data or control signals between
logic operating on different clock frequencies
presents a special set of problems, often
dependent on the clock frequencies.
• Errors are nearly impossible to detect in
simulation and easily missed in product validation.

Transferring Data from ACLK to BCLK
[Diagram: data lines Data0, Data1, and Data2 passing from the ACLK domain to the BCLK domain]

Why won’t this work?


• What happens if ACLK is faster than BCLK?
• What happens if ACLK is slower than BCLK?
• What happens if wire delays on data signals are not equal?
• Can these problems be solved by putting synchronizers in the data path?


Transferring Data from ACLK to BCLK

Basic structure of data transfer circuit from ACLK domain to BCLK domain.
If you put a synchronizer in the control path the data path will work properly.

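[Circuit diagram. Control path: ALOAD drives a pulse catcher (ACLK; EN, RST) producing NEWBYTE, followed by a synchronizer (BCLK; FF1, FF2) with intermediate signal META and output BLOAD. Data path: ADATA[7:0] into AREG (CE = ALOAD), SDATA[7:0] into BREG (CE = BLOAD), output BDATA[7:0].]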
BLOAD Circuit Timing
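tr = Tbclk − tcq − tsetup

[Circuit diagram: pulse catcher (ACLK; EN, RST) producing NEWBYTE, followed by synchronizer flip-flops FF1 and FF2 (BCLK) producing META and BLOAD]

[Timing diagram: ACLK, ALOAD, SDATA[7:0], NEWBYTE, BCLK, META, BLOAD, BDATA[7:0]; each BCLK period is divided into tcq, tr, and tsetup]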

Or else ... META Resolves Low


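tr = Tbclk − tcq − tsetup

[Circuit diagram: pulse catcher (ACLK; EN, RST) producing NEWBYTE, followed by synchronizer flip-flops FF1 and FF2 (BCLK) producing META and BLOAD]

[Timing diagram: ACLK, ALOAD, SDATA[7:0], NEWBYTE, BCLK, META, BLOAD, BDATA[7:0]; one BCLK period is divided into tcq, tr, and tsetup, and two points are marked "woops"]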
Timing Problems
• ALOAD comes too early
• NEWBYTE does not get set, second ALOAD is lost
• BDATA gets the wrong data
• First data is missed

[Timing diagram: ACLK, ALOAD, SDATA[7:0], NEWBYTE, BCLK, META, BLOAD, BDATA[7:0]; one BCLK period is divided into tcq, tr, and tsetup, and two points are marked "woops"]

Resolving Timing Problems


• Increase the minimum time between ALOADs to handle the
worst-case timing (homework problem)
– Reduces the bandwidth to 1 byte / 10 ACLK periods.
• Increase the number of bytes transferred during each
BLOAD signal
– Transfer twice as much data across clock domains in same time.
– Requires half as many BLOAD signals.
– Increases bandwidth to 2 bytes / 10 ACLK periods.
• Use a handshake protocol with feedback (an Acknowledge)
to ensure no data is lost or duplicated.
– New ALOAD would not be generated until after the acknowledge.
– May slightly reduce bandwidth due to overhead of the handshaking.

Transfer More Data at a Time
Buffer two bytes of Adata before transferring both data
bytes between ACLK and BCLK domains
[Circuit diagram: ADATA[7:0] feeds AREG1 (CE = ALOAD1) and AREG2 (CE = ALOAD2) in the ACLK domain, followed by a double buffer and then BREG1 and BREG2 (CE = BLOAD) in the BCLK domain]

BLOAD generated half as often (only after ALOAD2).
ALOAD1 can happen in parallel with BLOAD because of double buffer.

Transferring Data Across Clock Domains


[Block diagram: ALOAD and ACLK on the sending side, BLOAD and BCLK on the receiving side, with SDATA and ACK between them]

• The synchronizer gives a signal time to settle after a setup- or hold-time
violation, making it very unlikely that an undetermined signal level propagates to the
destination module.
• The handshake protocol maintains signal levels long enough to ensure that the
system does not miss signal events or wrongly interpret them as multiple events.
• Normally, the circuit synchronizes only handshake signals that signify the validity
of data being transferred to the destination clock domain.
• Once the handshake signals transfer to the destination-clock domain, the system
clocks the data directly to the destination module.

Handshake Protocol
The handshake protocol must ensure that
• the data holds stable long enough for circuitry to sample
it once in the destination-clock domain
• a new data-valid signal cannot be asserted until the
destination has acknowledged that the first data valid
signal was received
• data transfer works independent of both TX and RX clock
frequencies
[Block diagram: Domain A (TX Data) and Domain B (RX Data), each with Data, Valid, and Ack signals]

Two-Phase Handshake Protocol

• Req and Ack are signaled by edges, not levels


• Positive and negative edge event driven
• Fast
• Works with any clock frequency

Four-Phase Handshake Protocol

• Req and Ack are signaled by levels


• Levels must be restored before next transfer
• Slower and more complex than two-phase
• Works with any clock frequency


Four-Phase Handshake Protocol


Transfer of relatively low-bandwidth signals can be done
using a four-phase handshaking protocol
1. Sender drives data on to bus and provides a signal indicating that
data is available (req).
2. Receiver signals that the data has been received (ack).
3. Sender de-asserts data and control signal
4. Receiver de-asserts ack signaling that it is ready for more data

[Timing diagram: Req (data valid) with transitions 1 and 3, Ack (data ack) with transitions 2 and 4, and the valid-data window]
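
The four steps above can be sketched as two small state machines that run on unrelated clocks. This is a simplified model under stated assumptions: time is an integer tick count, the clock periods are hypothetical, and synchronizer latency is left out, so each side sees the other's level at its own clock edge.

```python
def run_four_phase(items, a_period=3, b_period=7, max_time=10_000):
    """Move items from sender to receiver using level-based req/ack."""
    req = ack = False
    bus = None
    pending = list(items)
    received = []
    for t in range(max_time):
        if t % a_period == 0:               # sender clock edge
            if not req and not ack and pending:
                bus = pending.pop(0)        # step 1: drive data, assert req
                req = True
            elif req and ack:
                req = False                 # step 3: release data and req
                bus = None
        if t % b_period == 0:               # receiver clock edge
            if req and not ack:
                received.append(bus)        # step 2: take data, assert ack
                ack = True
            elif not req and ack:
                ack = False                 # step 4: ready for more data
        if not pending and not req and not ack:
            break
    return received

print(run_four_phase([10, 20, 30]))
print(run_four_phase([10, 20, 30], a_period=7, b_period=3))
```

Each item is delivered exactly once and in order whichever side is faster, because neither side advances until it has seen the other's level change.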

Four-Phase Level-Handshake Protocol
[Block diagram: sender FSM and receiver FSM; REQ passes through a synchronizer to become REQsync, and ACK passes through a synchronizer to become ACKsync; the data bus between the two enabled (EN) registers is not synchronized]

The Two-Phase Handshake “Toggle” Protocol

• By using the change in the handshake’s signal
level and not the level itself to communicate
through the synchronizer, the system
immediately readies itself for another
transaction.
• The deassertion acknowledgment occurs without
the need for a second round trip to restore all
control signals to their proper logical states, as
a four-phase round trip requires.
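
A minimal sketch of the toggle idea, under the same kind of simplifying assumptions as before: integer time ticks, hypothetical clock periods, and no synchronizer latency. Here data is outstanding whenever req and ack differ, so every edge of either signal carries meaning and nothing has to be returned to a rest level.

```python
def run_two_phase(items, a_period=3, b_period=7, max_time=10_000):
    """Move items from sender to receiver using edge-based (toggle) req/ack."""
    req = ack = False
    bus = None
    pending = list(items)
    received = []
    for t in range(max_time):
        if t % a_period == 0 and req == ack and pending:
            bus = pending.pop(0)
            req = not req          # any edge on req announces new data
        if t % b_period == 0 and req != ack:
            received.append(bus)
            ack = not ack          # any edge on ack confirms receipt
        if not pending and req == ack:
            break
    return received

print(run_two_phase([10, 20, 30]))
print(run_two_phase([10, 20, 30], a_period=7, b_period=3))
```

The sender may start the next transfer as soon as req equals ack again, which is one signal edge per side per item.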

The Two-Phase Handshake “Toggle” Protocol

[Timing diagram: SDATA, ALOAD, BLOAD, ACK, BDATA]

Toggle-Handshake Protocol Circuit

[Circuit diagram: signals BLOAD, ALOAD, and ACK]
Using FIFOs to Transfer Data
• Most high-speed data transfers occur in bursts
rather than evenly spaced data streams
– Large blocks of data arriving at random times
– Often modeled with a Poisson distribution: $p(x) = \frac{e^{-\lambda}\lambda^{x}}{x!}$
– Example: Disk & Video I/O
• It is difficult to transfer such data using the
simple circuit described earlier
– Can’t keep up during burst data mode
– Is sitting idle when no data transfer is occurring
• First-In First-Out (FIFO) memories are often
used to transfer such data
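
The Poisson formula above can be evaluated directly. The sketch below also computes the chance that more than k items arrive in one interval, which gives a rough feel for how burst size relates to buffer depth. The arrival rate and the depths are hypothetical, and a real sizing analysis would also have to account for the rate at which the buffer drains.

```python
import math

def poisson_pmf(x, lam):
    """p(x) = exp(-lam) * lam**x / x!"""
    return math.exp(-lam) * lam ** x / math.factorial(x)

def prob_more_than(k, lam):
    """Probability that more than k items arrive in one interval."""
    return 1.0 - sum(poisson_pmf(x, lam) for x in range(k + 1))

LAM = 4.0   # hypothetical mean number of arrivals per interval
for depth in (4, 8, 16):
    print(depth, prob_more_than(depth, LAM))
```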


First-In First-Out (FIFO)


• FIFO is implemented with memory
– Depth of FIFO is limited
– Analysis must be performed to ensure overflow does not occur
• Two ports for FIFO memory
– Write port: Port for writing data into the FIFO (via a write pointer)
– Read port: Port for reading data from the FIFO (via a read pointer)
• Control Signals:
– FIFO empty: indicates to the read port that there is no data
available in the FIFO
– FIFO full: indicates to the write port that the FIFO is full
• Synchronous FIFO: Read and Write port accessed with a
common clock
• Asynchronous FIFO: Different clocks can be used for the
read and write ports

FIFO-Handshake Protocols
• Due to bursty data and the difference in clock speeds, a
latency-absorbing FIFO often acts as a data buffer
between different clock domains.
• In this situation, the FIFO empty and full conditions
perform the handshaking.
• Asynchronous FIFOs must pass the read and write
pointers between clock domains (write clock/read clock).
• Because the pointers contain multiple bits, they can
introduce a race condition through the synchronizer. To
avoid this problem, you must implement the input and
output pointers as Gray Code counters to ensure that
only one bit changes at a time.
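
A short sketch of the Gray-code property the pointers rely on, using the standard binary-reflected code. The 4-bit pointer width is a hypothetical choice for illustration.

```python
def to_gray(n):
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)

def from_gray(g):
    """Inverse conversion, back to plain binary."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

BITS = 4                      # hypothetical pointer width
SIZE = 1 << BITS
for i in range(SIZE):
    cur, nxt = to_gray(i), to_gray((i + 1) % SIZE)
    assert bin(cur ^ nxt).count("1") == 1   # exactly one bit flips, even on wrap
    assert from_gray(cur) == i
```

Since only one bit changes per increment, a pointer sampled in the other clock domain during a transition reads as either the old value or the new one, never as an unrelated value.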


Synchronizing FIFO
• Valid data on in (out)
when ivalid (ovalid) is true
• FIFO (output client) able
to accept data on in (out)
when iready (oready) is
true
• Data exchanged on input
(output) when ivalid &
iready (ovalid & oready)
• iready = not full
• oready = not empty

clkin domain clkout domain

FIFO Full and Empty
• Head points to next location from which to read
• Tail points to next location to which to write
• FIFO empty when head = tail and no data in FIFO
• FIFO full when head = tail and all locations full
• Hard to tell the two apart
• Simple solution – leave one location empty
– FIFO full when (tail+1) = head (gray coded increment)
• Need to compare head (clkout domain) with tail
(clkin domain) to compute full and empty
– Requires synchronizers to get head in clkin domain and
tail in clkout domain.
– Gray code these two counters
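
The "leave one location empty" rule can be sketched in software. This is a single-clock model for illustration only: it uses plain binary pointers and omits the Gray coding and synchronizers that an asynchronous FIFO needs, and the class name is hypothetical.

```python
class RingFifo:
    """FIFO that keeps one slot unused so full and empty are distinguishable."""

    def __init__(self, depth):
        self.mem = [None] * depth
        self.depth = depth
        self.head = 0   # next location to read
        self.tail = 0   # next location to write

    def empty(self):
        return self.head == self.tail

    def full(self):
        return (self.tail + 1) % self.depth == self.head

    def write(self, value):
        if self.full():
            return False
        self.mem[self.tail] = value
        self.tail = (self.tail + 1) % self.depth
        return True

    def read(self):
        if self.empty():
            return None
        value = self.mem[self.head]
        self.head = (self.head + 1) % self.depth
        return value

fifo = RingFifo(4)
accepted = [fifo.write(v) for v in "abcd"]   # a 4-slot FIFO holds only 3 items
print(accepted, fifo.read(), fifo.full())
```

The cost of the simple rule is one unused location: the fourth write is refused even though a slot is physically free.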

BlockRAM FIFO

[Block diagram: a BRAM with a write port (DataIn, WE, write address counter with INC, Write Clock) and a read port (DataOut, read address counter with INC, Read Clock); synchronization and control logic produces FIFO Full and FIFO Data Present]
Common Pitfalls
• In level-handshake protocols, you must initiate a second round trip
to restore the handshake signals to their original logical levels
in preparation for the next handshake, which doubles the protocol
latency.
• Clocking of the data, as well as the strobe signal, through the
synchronizers causes a race condition. Only put synchronizers on the
control signals, not the data lines.
• A designer must neither assume nor require any fixed relationship
between the source- and destination-clock frequencies.
• Simulation cannot hope to duplicate the infinite number of clock and
signal-edge relationships that are possible in a clock-domain-
crossing design.
• When inspecting a clock-domain-crossing design, you must fully
analyze two cases.
– First, assume one extreme clock-frequency relationship (10 to one, for
example) and manually analyze the behavior of the implementation on a
timing diagram for at least two consecutive transactions.
– Then, repeat the analysis, assuming the opposite clock-frequency
relationship.


Homework Unit 4
1. What is the maximum frequency at which a synchronous system can operate, given:
Tcq (clock-to-q prop time) = 8 ns (min) and 10 ns (max),
Tsetup = 3 ns, Thold = 5 ns,
Tprop (total combinational logic prop time) = 12 ns (min) and 21 ns (max),
Tskew = 1 ns max.

2. What are the setup-time and hold-time margins for the second flip-flop in the circuit below if Tcq is 8 ns (min) and 10 ns (max), Tsetup is 3 ns, and Thold is 5 ns? Tskew is 1 ns max. Assume the period of the clock is tclk.

[Circuit diagram: Clk Network]

3. For the circuit below, if asyncin changes 10^6 times per second, the clock frequency is 333 MHz, setup time is 0.5 ns, clock-to-q prop time is 1 ns, T0 = 10^-10 sec and τ = .051 ns, what is the mean time to failure of this circuit? Is it acceptable if you are planning to sell 100,000 of them?

4. For the same parameters as above, how many flip-flops are needed below to create a synchronizer with an MTBF of a million years?

[Circuit diagram: ASYNCIN, ..., CLOCK]
Homework Unit 4
5. Consider the pulse synchronizer circuit below.

[Circuit diagram: pulse synchronizer with signals IN, TRAP, META, and OUT; EN and RST inputs; clocks CLK1 and CLK2]

a. Show how you could alter the circuit to produce a two clk2-cycle pulse in response to a single cycle pulse at the input.

b. Suppose the MTBF of the synchronizer was not enough. Show how you could alter the circuit to increase the MTBF.

c. Considering the worst-case scenario, what is the maximum rate (or minimum period) of input pulses that can be handled by this circuit, in terms of Tclk1, Tclk2, Tsetup, Thold, and clock-to-q time Tcq?

Hint: The worst-case scenario is when Tclk2 > Tclk1. Trapped pulse TRAP comes barely too late to meet the setup and hold times of the middle flip-flop, which goes metastable. META eventually resolves low, and it takes another clk2 cycle to get META to go high.