Session12 1

Dr. Sathish Shet K
Associate Professor
BITS Pilani Department of Electrical & Electronics Engineering
Bengaluru (Off-Campus)
BITS Pilani
Pilani|Dubai|Goa|Hyderabad
CAD for IC Design

Lecture 12
Outline of the Topics
Hardware Models for High-Level Synthesis
Hardware Allocation and Assignment
Scheduling and Algorithms
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956

High-level synthesis
High-level synthesis is the process of mapping a behavioral
description at the algorithmic level to a structural description
in terms of functional units, memory elements and
interconnections (e.g. multiplexers and buses).
The functional units normally implement one or more
elementary operations like addition, multiplication, etc.
This step in the design process can be made visible in Gajski's
Y-chart.

High-level synthesis as a transition in
Gajski's Y-chart.

Hardware Models for High-level
Synthesis
Logic circuits interact with their environments by means of input
and output signals.
If the outputs of a circuit are observed for different patterns of
input signals and it turns out that the outputs only depend on
the current inputs (but not on the previous inputs), the circuit is
called combinational.
The function of such a circuit can be completely described by a
truth table that presents the value of the output signals for each
possible combination of input signal values.
If, on the other hand, the output signals are not uniquely defined
for a combination of input signal values, the circuit is called
sequential.

This means that the state of the circuit influences the output
values, implying that the circuit has an internal memory.
Sequential circuits are divided into two groups: synchronous or
clocked circuits, where state transitions can only happen at
regular moments defined by one or more clock signals, and
asynchronous circuits, where state transitions can occur at
arbitrary moments.
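
The distinction can be illustrated with a small sketch. The full adder and the toggle flip-flop below are examples chosen for illustration, not circuits taken from the slides: the first is fully described by its truth table, the second returns different outputs for the same input because it stores a state.

```python
from itertools import product

def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Combinational: outputs depend only on the current inputs."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

# The truth table lists the outputs for every input combination.
truth_table = {bits: full_adder(*bits) for bits in product((0, 1), repeat=3)}

class ToggleFlipFlop:
    """Synchronous sequential: the output also depends on the stored state."""
    def __init__(self) -> None:
        self.state = 0

    def clock(self, t: int) -> int:
        # State transitions happen only when clock() is called (a clock edge).
        if t:
            self.state ^= 1
        return self.state

ff = ToggleFlipFlop()
outputs = [ff.clock(1) for _ in range(3)]  # same input, different outputs
```

Here `outputs` alternates although the input is always 1, which no truth table over the inputs alone could describe.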

Hardware for Computations, Data
Storage, and Interconnection
An essential hardware component is the functional unit (FU).
This is a combinational or sequential logic circuit that realizes
some Boolean function, such as an adder, a multiplier or an
arithmetic logic unit (ALU). The figure presents the symbol to be
used for an FU.

Hardware components that can be used by a high-level synthesis system: a
functional unit (a), a register (b), a 4-input multiplexer (c), a configuration with a
bus (d), and a three-state driver (e).

Another essential component inherent to synchronous logic is the
register, which makes it possible to store data in the circuit (see
the figure on the previous slide).
Both FUs and registers normally operate on words, which means
that each input or output is actually realized by a number of
signals, each carrying one bit.
The number of bits in a word is called the word length.

Data, Control, and Clocks
It is a common practice to divide signals in a logic circuit into
two groups: data signals and control signals. Informally, data
signals carry the "operands" of the functional units.
Control signals regulate the transfer of data signals between
hardware components: the enable input of a three-state buffer,
the control inputs of a multiplexer, the signals that select the
function of an ALU, the address signals for a register file, etc.
The hardware components interconnected by wires carrying data
signals form the so-called data path.
The data path is the part of the logic circuit where the actual
computations are performed and their results are stored.

In synchronous circuits, the notion of the system clock is essential.
The duration of a computation on an FU can be expressed in multiples
of the system clock period. Such a period is also called a control
step or a cycle.
Some high-level synthesis systems work with a hardware model where
all computations are done in one clock period.
In a more general hardware model, computations are allowed to last
several clock periods. These are so-called multicycle operations.
The opposite, performing more than one computation in one clock
period, is called (operation) chaining.
Multicycle operations normally occupy an FU for the total duration of
the computation.
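
A minimal sketch of how multicycle operations and chaining relate to the clock period. The clock period and the delays per operation type are assumed values for illustration only.

```python
import math

CLOCK_PERIOD_NS = 10.0  # assumed system clock period

# Assumed delays per operation type, in nanoseconds.
DELAY_NS = {"add": 4.0, "mul": 18.0}

def control_steps(op: str) -> int:
    """Number of clock periods an FU is occupied by one operation."""
    return math.ceil(DELAY_NS[op] / CLOCK_PERIOD_NS)

def can_chain(ops: list[str]) -> bool:
    """True if the operations, executed back to back, fit in one clock period."""
    return sum(DELAY_NS[op] for op in ops) <= CLOCK_PERIOD_NS
```

With these numbers a multiplication is a multicycle operation taking two control steps, while two additions can be chained in a single step.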

Internal Representation of the Input
Algorithm
The algorithm to be synthesized by the high-level synthesis system has to be
described in some way.
The usual way of doing this is in textual form, by means of a formal language.
Whatever language is used, the textual form is not appropriate for the
representation of the algorithm during the process of synthesis.
One would especially be interested in the representation of the parallelism
present in the algorithm.
It is therefore necessary to parse the text and transform it into a structured
internal representation.
It is almost generally agreed that this representation should be graph based.
The graph that is used to represent an algorithm is called a data-flow graph
(DFG).
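
As a rough illustration of the parsing step, the sketch below turns a hypothetical three-statement program, written in Python syntax, into a list of DFG nodes using the standard `ast` module. The program text and the node format are assumptions, not the representation used by any particular synthesis system.

```python
import ast

SOURCE = """
x = a * b
y = c + d
z = x + y
"""

def build_dfg(source: str) -> list[tuple[str, str, tuple[str, ...]]]:
    """Return one (result, operator, operands) node per assignment."""
    nodes = []
    for stmt in ast.parse(source).body:
        # This sketch only handles `name = name <op> name`.
        assert isinstance(stmt, ast.Assign) and isinstance(stmt.value, ast.BinOp)
        target = stmt.targets[0].id
        op = type(stmt.value.op).__name__          # e.g. "Mult", "Add"
        operands = (stmt.value.left.id, stmt.value.right.id)
        nodes.append((target, op, operands))
    return nodes

dfg = build_dfg(SOURCE)
produced = {target for target, _, _ in dfg}
# Operands not produced by any node are the primary inputs.
inputs = sorted({v for _, _, ops in dfg for v in ops} - produced)
```

The nodes for `x` and `y` share no edge, so the graph makes explicit that they can be computed in parallel, which the textual order does not show.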

Data-flow graphs are closely related to signal-flow graphs, which
are traditionally used in the field of digital signal processing
(DSP).
The special requirements of DSP, such as the repetitive
application of the same algorithm to data arriving at fixed
intervals, have led to the development of specific DFGs for the
synthesis of DSP algorithms.
An important class of DSP algorithms, characterized by the
absence of computations that are controlled by data-dependent
conditions (such as if-then-else and while constructs), is
formed by the synchronous data-flow graphs.

Simple Data Flow
A data-flow graph is a directed graph G(V, E). The set of nodes V
is subdivided into computational nodes, where actual
computations are performed, input and output nodes for the
communication with the outer world, and conditional nodes,
where data-dependent decisions are taken.
Computational nodes are either atomic or composite. Atomic
nodes perform elementary computations (e.g. additions);
composite nodes are DFGs themselves and therefore allow the
representation of hierarchy. If one wants to allow recursively
defined computations at the data-flow level, recursive
computations can be expressed using composite nodes.

A Simple Program
x <- a*b
y <- c+d
z <- x+y
The variables a, b, c and d are inputs, x and y are the labels of intermediate
edges and z is an output. The DFG is shown in Figure (a).
The rest of the figure illustrates the token flow. All inputs produce their tokens
simultaneously at, say, time 0 (Figure (b)). As they arrive simultaneously,
they are immediately consumed by the multiplication and addition nodes
that will compute x and y respectively. Assuming that an addition is ready
earlier than a multiplication, the input tokens for the addition node that will
compute z arrive at distinct moments in time (Figures (c) and (d)). When both are
present, they will be consumed and some time later the output token z will
be computed (Figure (e)).
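
The token flow can be sketched as a small timing simulation. It assumes, as the description of the token flow suggests, that x is produced by a multiplication and that y and z are produced by additions; the delays of 2 and 1 time units are made-up values.

```python
# Assumed node delays in arbitrary time units (an addition is faster).
DELAY = {"mul": 2, "add": 1}

# node -> (operation, input edges); a, b, c, d are primary inputs.
NODES = {
    "x": ("mul", ("a", "b")),
    "y": ("add", ("c", "d")),
    "z": ("add", ("x", "y")),
}

def token_times(nodes, input_time=0):
    """Time at which a token appears on each edge of the DFG."""
    ready = {name: input_time for name in "abcd"}
    pending = dict(nodes)
    while pending:
        for name, (op, ins) in list(pending.items()):
            if all(i in ready for i in ins):
                # A node fires once tokens are present on all its inputs.
                fire = max(ready[i] for i in ins)
                ready[name] = fire + DELAY[op]
                del pending[name]
    return ready

times = token_times(NODES)
```

The token on y is available before the token on x, so the node computing z has to wait for the later of its two inputs before it can fire.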

Conditional Data Flow
Conditional computations require the use of two special
conditional nodes: the selector node and the distributor node.
They are shown in the figure.
Both of them are characterized by a horizontal input that can only
carry Boolean tokens. Boolean tokens can be produced by
computational nodes that e.g. perform a comparison such as
"less than or equal to" (≤).
A selector node has two inputs labeled true and false, one of
which will be selected depending on the value of the token on
the horizontal input.
In a similar way, the horizontal input selects one of the two
outputs labeled true and false of a distributor node.

The firing rules for a selector node (a,b) and
a distributor node (c,d).

Conditional nodes have firing rules that slightly differ from the
firing rule for a computational node.
In the case of a selector node, a node can fire if:
– there is a token with value true on the horizontal input and a
token at the input labeled true; in this case, the latter token will
be propagated to the output.
– there is a token with value false on the horizontal input and a
token at the input labeled false; in this case, the latter token
will be propagated to the output.

A distributor node can fire when both its horizontal and "vertical"
inputs contain a token.
However, a token is produced at only one output: if the value of
the token at the horizontal input is true the input token is gated
to the output labeled true; it is gated to the output labeled false,
otherwise.
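
The firing rules can be written down as two small functions. This is a sketch: modelling an absent token as `None` and the function names are choices made for this example.

```python
from typing import Optional

Token = Optional[int]  # None models "no token present on this edge"

def selector(ctrl: Optional[bool], on_true: Token, on_false: Token) -> Token:
    """Return the propagated token, or None if the node cannot fire."""
    if ctrl is True and on_true is not None:
        return on_true
    if ctrl is False and on_false is not None:
        return on_false
    return None

def distributor(ctrl: Optional[bool], data: Token) -> tuple[Token, Token]:
    """Return (token on true output, token on false output)."""
    if ctrl is None or data is None:
        return None, None          # cannot fire yet
    return (data, None) if ctrl else (None, data)
```

Note that the selector does not need a token on the input it does not select, whereas the distributor needs both of its inputs.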

Conditional nodes can be used to represent if-then-else
constructs, by using combinations of distributor and selector
nodes that receive the same horizontal input.
An example of a conditional program fragment and a
corresponding DFG is shown in the figure.
if (a > b)
    c <- a-b;
else
    c <- b-a;

DFG representations of the program fragment using conditional nodes
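
One plausible wiring of the fragment, sketched in code. Since the figure is not reproduced here, the exact arrangement of distributors and the selector is an assumption; `None` stands for an edge that carries no token.

```python
def abs_diff_dfg(a: int, b: int) -> int:
    cond = a > b                               # comparison node yields a Boolean token

    # Distributors gate each operand to the branch chosen by `cond`.
    a_true, a_false = (a, None) if cond else (None, a)
    b_true, b_false = (b, None) if cond else (None, b)

    # Only the branch that received tokens can fire.
    c_true = a_true - b_true if cond else None     # c <- a-b
    c_false = None if cond else b_false - a_false  # c <- b-a

    # The selector forwards the token from the branch named by `cond`.
    return c_true if cond else c_false
```

All distributors and the selector receive the same Boolean token, so exactly one subtraction fires per evaluation.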

Iterative Data Flow
Combinations of selector and distributor nodes can also be used
for the representation of iterative constructs. This can be done
in many ways. However, a structured representation makes it
easier to recognize these constructs in the DFG. A simple
while loop is shown below as an example of an iterative construct.
while (a > b)
    a <- a-b;
Its possible representation as a DFG is given in the figure.

A DFG representation of the program
fragment

Allocation, Assignment and Scheduling
The main issue in high-level synthesis is the mapping of the internal
description of some algorithm to a hardware configuration that
obeys the hardware model of the synthesis system.
Scheduling is the task of determining the instants at which the
execution of the operations in the DFG will start.
Assignment maps each operation in the DFG to a specific functional
unit on which the operation will be executed. Assignment is also
concerned with mapping storage values to specific memory
elements and data transfers to interconnection structures. A
storage value is an intermediate result produced by the data path that
needs to be stored until no operation will make use of the value
anymore. Assignment is also called binding.

Allocation (or "resource allocation") simply reserves the
hardware resources that will be necessary to realize the
algorithm. So, it determines that x units of resource type A, y
units of resource type B, etc. will be used, without specifying
which unit will execute which operation. Another term used
for this task is module selection.
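
To make the three tasks concrete, the sketch below writes out a hypothetical allocation, schedule and assignment for a small DFG with operations x = a*b, y = c+d, z = x+y, and checks that they agree with each other. All names and numbers are illustrative assumptions.

```python
# Hypothetical results for a DFG with operations x = a*b, y = c+d, z = x+y.
allocation = {"multiplier": 1, "adder": 1}          # how many units of each type
schedule = {"x": 0, "y": 0, "z": 2}                 # start control step per operation
duration = {"x": 2, "y": 1, "z": 1}                 # control steps each operation takes
binding = {"x": ("multiplier", 0), "y": ("adder", 0), "z": ("adder", 0)}

def is_consistent(allocation, schedule, duration, binding) -> bool:
    """Check that bound units exist and no unit runs two operations at once."""
    busy = set()
    for op, (fu_type, index) in binding.items():
        if index >= allocation.get(fu_type, 0):
            return False                             # unit was never allocated
        for step in range(schedule[op], schedule[op] + duration[op]):
            if (fu_type, index, step) in busy:
                return False                         # unit already occupied
            busy.add((fu_type, index, step))
    return True
```

The check only covers resource conflicts; data dependencies between operations would have to be verified separately against the DFG.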

ASAP (As Soon As Possible)
ASAP is a scheduling algorithm commonly used in high-level synthesis (HLS) to
schedule operations in a dataflow graph onto hardware resources to meet
timing constraints and optimize performance.
At a high level, the ASAP scheduling algorithm works in HLS as follows:
Dataflow Graph Representation:
The first step in HLS is to represent the algorithm or design as a dataflow
graph. This graph shows the operations and their dependencies.
Node Scheduling:
The ASAP scheduling algorithm works by scheduling operations on the graph
nodes based on their earliest possible start times.
The earliest possible start time for an operation is determined by the
availability of its input data and the hardware resources (functional units)
available.

Calculate ASAP Times:
Calculate the ASAP time for each node (operation) in the
dataflow graph. The ASAP time is the earliest time at which an
operation can start execution without violating any data
dependencies.
Priority Queue:
Create a priority queue (or list) of operations based on their
ASAP times. The operation with the smallest ASAP time is
given the highest priority.

Schedule Operations:
While there are operations in the priority queue, perform the
following steps:
a. Select the operation with the highest priority (smallest ASAP
time).
b. Assign the operation to an available functional unit
(hardware resource).
c. Update the ASAP times of the operations dependent on the
scheduled operation based on its completion time.

Repeat:
– Repeat the scheduling process until all operations are scheduled or until a timing
constraint is met.
Resource Sharing:
– In ASAP scheduling, resources are shared among operations. If a functional unit is
available and multiple operations can use it, the algorithm selects the operation with the
smallest ASAP time.
Output:
The final schedule provides the order of execution for each
operation on the hardware resources, ensuring that the design
meets the desired timing constraints.
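
The steps above can be sketched as follows. The example graph, the durations and the number of units are assumptions; the first function computes plain ASAP times without resource limits, the second follows the priority-queue description and places operations on a limited number of functional units.

```python
import heapq

# op -> (FU type, predecessors); durations in control steps are assumed values.
OPS = {
    "x": ("mul", []),
    "y": ("add", []),
    "w": ("add", []),
    "z": ("add", ["x", "y"]),
}
DURATION = {"mul": 2, "add": 1}

def asap_times(ops):
    """Earliest start step of each operation, ignoring resource limits."""
    start = {}
    def visit(op):
        if op not in start:
            _, preds = ops[op]
            start[op] = max((visit(p) + DURATION[ops[p][0]] for p in preds), default=0)
        return start[op]
    for op in ops:
        visit(op)
    return start

def schedule(ops, units):
    """Schedule by ascending ASAP time on a limited number of units per type."""
    asap = asap_times(ops)
    queue = [(t, op) for op, t in asap.items()]
    heapq.heapify(queue)
    free_at = {fu: [0] * n for fu, n in units.items()}   # next free step per unit
    start, finish = {}, {}
    while queue:
        _, op = heapq.heappop(queue)
        fu, preds = ops[op]
        data_ready = max((finish[p] for p in preds), default=0)
        unit = min(range(len(free_at[fu])), key=free_at[fu].__getitem__)
        start[op] = max(data_ready, free_at[fu][unit])
        finish[op] = start[op] + DURATION[fu]
        free_at[fu][unit] = finish[op]
    return start
```

With one multiplier and one adder, `y` is delayed by one step because `w` occupies the only adder first; with two adders both start at step 0. Because every duration is at least one step, a predecessor always leaves the queue before its successors.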

ASAP scheduling aims to maximize parallelism and minimize the
critical path in the design, which is essential for optimizing
performance in hardware designs.
However, it doesn't necessarily optimize resource utilization, so
additional scheduling and resource allocation steps may be
required for that purpose.
Overall, the ASAP scheduling algorithm is a fundamental
component of high-level synthesis, and it helps translate a
high-level algorithm description into an efficient hardware
design by scheduling operations to achieve the best possible
performance while meeting timing constraints.

Mobility-based Scheduling
Mobility-based scheduling is a scheduling approach used in
high-level synthesis (HLS) to optimize the placement and
scheduling of operations in a way that maximizes resource
sharing and minimizes hardware usage.
It aims to reduce the area and power consumption of the
synthesized design. In mobility-based scheduling, operations
are scheduled based on their mobility, which is a measure of
how much freedom there is in choosing their execution times.
Operations with higher mobility can be scheduled more
flexibly.

Mobility-Based Scheduling Algorithm:
Operation Graph: Start with a directed acyclic graph (DAG) representation
of the design, similar to the ASAP scheduling algorithm.
Operation Mobility: Calculate the mobility of each operation. Mobility is
typically defined as the difference between the latest time an operation can
start (based on its data dependencies and timing constraints) and the earliest
time it can start. High mobility indicates more flexibility in scheduling.
Topological Sort: Perform a topological sort of the operation graph.
Scheduling Iteration: For each operation in the topological order, select the
operation with the highest mobility. In the case of ties, select the operation
with the smallest latency.
Assign Time: Assign a time slot for the selected operation, ensuring that it
satisfies the data dependencies and any timing constraints.
Repeat: Continue the scheduling iteration until all operations are
scheduled.
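
The mobility calculation can be sketched as the difference between an ALAP (as late as possible) and an ASAP start time. The example graph, the unit duration of every operation and the latency bound of three control steps are assumptions made for illustration.

```python
# op -> predecessors; every operation is assumed to take one control step.
PREDS = {"m1": [], "m2": [], "a1": ["m1", "m2"], "a2": [], "a3": ["a1", "a2"]}

def asap(preds):
    start = {}
    for op in preds:                       # dict is listed in topological order
        start[op] = max((start[p] + 1 for p in preds[op]), default=0)
    return start

def alap(preds, latency):
    succs = {op: [] for op in preds}
    for op, ps in preds.items():
        for p in ps:
            succs[p].append(op)
    start = {}
    for op in reversed(list(preds)):       # reverse topological order
        start[op] = min((start[s] - 1 for s in succs[op]), default=latency - 1)
    return start

def mobility(preds, latency):
    early, late = asap(preds), alap(preds, latency)
    return {op: late[op] - early[op] for op in preds}

mob = mobility(PREDS, latency=3)
```

Operations on the longest path (`m1`, `m2`, `a1`, `a3`) have mobility 0 and must start at their ASAP time, while `a2` can be placed in either of two control steps.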
