
Architecting for VLSI Implementation

Presenter : Chandra Shekhar

Director
CEERI
Pilani – 333 031
(Rajasthan)


Logic Specification vs. Implementation

Logic Specification and Logic Implementation are two different things.

Logic Specification precedes Logic Implementation.




For a particular Logic Specification, there are many different possible Logic Implementations.

These Logic Implementations may differ widely in cost, speed of operation and power consumption.

Logic Specification is also called Behavioural Description of logic.

Logic Implementation is also called Structural Description of logic.
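
As a minimal illustration (not taken from the slides), the Python sketch below pairs one behavioural description with two structural descriptions. The 3-input majority function and all function names are assumptions chosen only for this example.

```python
from itertools import product


def spec_majority(a, b, c):
    """Behavioural description: output is 1 when at least two inputs are 1."""
    return int(a + b + c >= 2)


def and2(x, y):
    return x & y


def or2(x, y):
    return x | y


def impl_sum_of_products(a, b, c):
    """Structural description 1: a.b + b.c + a.c (3 AND gates, 2 OR gates)."""
    return or2(or2(and2(a, b), and2(b, c)), and2(a, c))


def impl_factored(a, b, c):
    """Structural description 2: a.(b + c) + b.c (2 AND gates, 2 OR gates)."""
    return or2(and2(a, or2(b, c)), and2(b, c))


def equivalent(impl, spec):
    """Check an implementation against the specification for every input."""
    return all(impl(*v) == spec(*v) for v in product((0, 1), repeat=3))
```

Both implementations satisfy the same specification, yet they use a different number of gates, which is the kind of cost difference the slide refers to.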


© CEERI, Pilani


Specifying Logic

How do we specify logic?

1. Through Boolean Expressions.

2. Through Truth Tables.

3. Through Natural Language Statements.

4. Through Programming Language Statements.

5. Through Behavioural Description Constructs of Hardware Description Languages, e.g. the process statement.


Implementing Logic

How do you efficiently implement logic given the constraints on

Speed of Operation.


Power Consumption.


Design Time.


Design Cost.


Product Cost.


Upgradability.


The strategic planning and selection of an optimal approach for implementing logic is typically called architecting or architecture design.


Specifying and Implementing Logic

Example Logic Specification

Z = A + B + C + D + E

Architecture #1 (Combinational) for Logic Implementation

[Figure: block diagram with inputs A, B, C, D, E, four "+" blocks, and output Z.]

Architecture #2 (Combinational) for Logic Implementation

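[Figure: block diagram with inputs A, B, C, D, E, four "+" blocks, and output Z, arranged differently from Architecture #1.]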
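
Two combinational arrangements of the same "+" blocks can differ in delay even when they use the same number of blocks. The sketch below is a hedged illustration, assuming the example specification is a five-input sum; the chain and tree arrangements and the function names are assumptions, and they may not match the exact wiring of the two figures.

```python
def chain_sum(values):
    """Add the inputs one after another.

    Returns (result, adder_count, depth). In a chain every adder waits
    for the previous one, so the depth equals the adder count.
    """
    total = values[0]
    adders = 0
    for v in values[1:]:
        total += v
        adders += 1
    return total, adders, adders


def tree_sum(values):
    """Add the inputs pairwise, level by level.

    Returns (result, adder_count, depth), where depth is the number of
    adder levels between the inputs and the output.
    """
    level = list(values)
    adders = 0
    depth = 0
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(level[i] + level[i + 1])
            adders += 1
        if len(level) % 2:
            nxt.append(level[-1])  # odd element passes through to the next level
        level = nxt
        depth += 1
    return level[0], adders, depth
```

For five inputs both arrangements need four adders, but the chain has four adder delays on its longest path while the tree has three.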

Architecture #3 (Sequential) for Logic Implementation

[Figure: block diagram with inputs A, B, C, D, E, a Select signal, a 2:1 Mux, a single "+" block, a register R, and a Control block.]


Sequential Architectures

Characteristics of Sequential Architectures:

They need storage elements besides combinational logic.




They need a sequence of steps to implement the full logic specification.




The next step should be taken only when the logic function of the previous step has been completed and its result saved.

The stepping can be asynchronous, self-timed or synchronous (with a timing signal called the clock).

Depending on the stepping method selected, sequential architectures are correspondingly asynchronous, self-timed or synchronous.
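
The characteristics above can be sketched as a small behavioural model: one shared adder, one storage element, and a controller that steps through the inputs. This is a hypothetical illustration; the function name, the loop standing in for the controller, and the initial clearing of R are assumptions, not details taken from the Architecture #3 figure.

```python
def sequential_sum(inputs):
    """Compute a sum with a single adder reused over several steps.

    Returns (result, steps). The register R holds the partial result
    between steps, so each step starts only after the previous result
    has been saved.
    """
    r = 0      # storage element (register R), assumed cleared before the first step
    steps = 0
    for select in range(len(inputs)):  # the controller steps through the inputs
        operand = inputs[select]       # input selection
        r = r + operand                # single adder; result saved back into R
        steps += 1
    return r, steps
```

Compared with a combinational arrangement, this uses one adder instead of four for five inputs, at the price of five steps and a register.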

Architecture #4 (Pipelined; Synchronous) for Logic Implementation

[Figure: pipelined block diagram with inputs A, B, C, D, "+" blocks, and output Z.]

Architecture #5 (Pipelined; Synchronous) for Logic Implementation

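[Figure: pipelined block diagram with inputs A, B, C, D, "+" blocks, and output Z, arranged differently from Architecture #4.]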


Pipelined Architectures

Characteristics of Pipelined Architectures:

They increase the sustained throughput of logic function computation (roughly by a factor of n for an n-stage pipelined architecture).

They do not reduce the delay of computation of the logic function.

Their cost is higher because of the pipeline registers they need.




They can be coarse-grained or fine-grained.


The pipeline can be balanced (all pipeline stages have identical delays) or unbalanced (different pipeline stages have different delays).
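
The throughput and delay claims above can be checked with a simple timing model. The sketch below is illustrative only: it assumes the clock period equals the slowest stage delay and ignores pipeline register overhead, and the function names are hypothetical.

```python
def unpipelined_time(n_items, stage_delays):
    """Total time when each item passes through all the logic before the next starts."""
    return n_items * sum(stage_delays)


def pipelined_time(n_items, stage_delays):
    """Total time for a synchronous pipeline.

    The clock period is set by the slowest stage. The first result
    appears after one clock per stage, then one result per clock.
    """
    clock = max(stage_delays)
    return (len(stage_delays) + n_items - 1) * clock


def speedup(n_items, stage_delays):
    """Ratio of unpipelined to pipelined time for the same workload."""
    return unpipelined_time(n_items, stage_delays) / pipelined_time(n_items, stage_delays)
```

With a balanced 4-stage pipeline and many items, the speedup approaches 4, while a single item still takes the full delay of all four stages. With unbalanced stage delays such as 1, 1, 2, the slowest stage sets the clock and the speedup stays below 2.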


Other Architectural Choices

Parallel Combinational Architectures.




Parallel Sequential Architectures.




Parallel Pipelined Architectures.




Mixed Architectures.


Architecture #6 (Control-Programmable; Sequential) for Logic Implementation

[Figure: block diagram with inputs A, B, C, D, E, a Select signal, a 2:1 Mux, an ALU with an Op_Select input, a register R, and a Control block.]


Control-Programmable Sequential Architectures

Characteristics of Control-Programmable Sequential Architectures:

They have a fixed execution unit, but a programmable controller.




By appropriately programming the controller, any logic function can be implemented.

A popular choice for control programming is micro-programming via a Writable Control Store (WCS).
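
A rough sketch of the idea, under stated assumptions: the execution unit below is a fixed ALU with a result register R, and the "control store" is a list of control words, each an (op_select, select) pair. The operation names, the control-word layout and the example programs are all hypothetical.

```python
import operator

# Fixed execution unit: the set of operations the ALU can perform on (R, operand).
ALU_OPS = {
    "PASS": lambda r, x: x,   # load the operand into R
    "ADD": operator.add,
    "AND": operator.and_,
    "OR": operator.or_,
    "XOR": operator.xor,
}


def run_microprogram(control_store, inputs):
    """Step through the control store, one control word per step.

    Each control word is (op_select, select): op_select picks the ALU
    operation and select picks which input feeds the ALU.
    """
    r = 0
    for op_select, select in control_store:
        r = ALU_OPS[op_select](r, inputs[select])
    return r


# Two different control programs for the same hardware.
SUM_PROGRAM = [("PASS", "A"), ("ADD", "B"), ("ADD", "C"), ("ADD", "D"), ("ADD", "E")]
XOR_PROGRAM = [("PASS", "A"), ("XOR", "B"), ("XOR", "C"), ("XOR", "D"), ("XOR", "E")]
```

Rewriting only the control store changes the logic function computed, while the execution unit stays untouched.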

Architecture #7 (Instruction-set Based; Programmable; Sequential) for Logic Implementation

The von Neumann architecture of a general-purpose stored-program digital computer (CISC).

[Figure: block diagram of a Memory connected to a CPU.]


CPU Block Diagram

[Figure: CPU block diagram showing a Clock Generator, Bus Controller, State Sequencer, Control Generator, Instruction Decoder, Instruction Register, and an Execution Unit containing the Register Bank, MAR, PC, ALU and MDR.]


Instruction-Set Based Architectures

Characteristics of Instruction-Set Based Architectures:

They completely decouple the hardware implementation from the logic specification (the user's logic specification).

Each instruction in the instruction set specifies a soft gate (or virtual gate) with an appropriate logic function and its connectivity to other ‘soft gates’ (through operand address specification).

A sequence of instructions (a program) can therefore be translated into an equivalent logic network of ‘soft gates’ (or a netlist of ‘soft gates’).

The equivalent logic network of ‘virtual gates’ (‘soft gates’) can be easily modified by changing the order of instructions in the program (instruction sequence), by changing the operand addresses, or both.

The implementation of each ‘soft logic gate’ (instruction) using hardware logic is done by the CPU.

A user implements his logic specification using only ‘soft gates’.

A Random Access Memory (RAM) is used to store the logic specification’s implementation in terms of ‘soft gates’, including the logical values of all the circuit nodes in the equivalent logic network of ‘soft gates’.

The instruction set (which defines the ‘soft gates’) acts as a hardware-software interface for the implementation of the user-specified logic function.

The hardware implementation of the logic function of each instruction (‘soft gate’) is decided by the CPU architect/designer (and is therefore beyond the control of the programmer).

The ‘soft gates’ implementation of the user’s logic specification is completely under the control of the programmer.
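
The ‘soft gate’ view can be sketched with a toy interpreter. This is a hedged illustration, not a model of any real instruction set: the three-gate instruction set, the (op, dest, src1, src2) instruction format, and the full-adder program are assumptions made for the example.

```python
# The "instruction set": each opcode is one type of soft gate.
GATES = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
}


def run(program, ram):
    """Execute a program as a netlist of soft gates.

    Each instruction (op, dest, src1, src2) is one soft gate: the operand
    addresses src1 and src2 wire its inputs, and dest names its output
    node. RAM holds the logical value of every node in the network.
    """
    ram = dict(ram)  # work on a copy so the caller's RAM image is unchanged
    for op, dest, src1, src2 in program:
        ram[dest] = GATES[op](ram[src1], ram[src2])
    return ram


# A 1-bit full adder expressed as five soft gates.
FULL_ADDER = [
    ("XOR", "t1", "a", "b"),
    ("XOR", "sum", "t1", "cin"),
    ("AND", "t2", "a", "b"),
    ("AND", "t3", "t1", "cin"),
    ("OR", "cout", "t2", "t3"),
]
```

Changing an operand address in `FULL_ADDER` rewires the network, and reordering or replacing instructions changes its gates, all without touching `run`, which plays the role of the CPU here.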


CISC Architectures (Register-Memory Architectures)

Characteristics of CISC Architectures:

Feature a large variety of addressing modes to address the memory operand (for implementing data structures in the memory, i.e. convenient specification of interconnections amongst ‘soft gates’).

Typically 2 operands per instruction, up to one of which can be in memory (the other is in a general-purpose register).

Most instructions can use most of the addressing modes.





Benefits of CISC Architectures

Excellent support for data structuring and program structuring at the assembly language level.

Compact object code.





Disadvantages of CISC Architectures

Variable instruction lengths and many different instruction formats greatly increase the complexity of CPU implementation (the instruction decoding and control generation part of the CPU).

Widely varying clock cycle counts for completion of different instructions make pipelining difficult.

Increased complexity of the control part, which occupies a large part of the chip area (crowding out the execution unit).

Increased complexity of the control part also becomes a speed bottleneck.

Architecture #8 (Instruction-set Based; Programmable; Sequential) for Logic Implementation

The Harvard architecture of a general-purpose stored-program digital computer (used in DSPs).

[Figure: block diagram of a CPU connected to a separate Instruction Memory and Data Memory.]


Benefits of Harvard Architectures

Reduced clock cycle counts for completion of instructions due to concurrent fetching of operands and instructions (overlapped implementation of two ‘soft gates’ by the CPU).

Increased throughput as a result.




Architecture #9 (Instruction-set Based; Programmable; Pipelined) for Logic Implementation

The RISC architecture of a general-purpose stored-program digital computer.


Pipelined RISC Architecture

[Figure: five-stage pipelined RISC datapath with the stages Instruction Fetch, Instruction Decode / Register Fetch, Compute Address / Execute, Memory Access, and Write Back. Labelled blocks include PC, +4, Instruction Memory, IR, Registers, S-Ex, Imm, A, Data Memory and LMD.]


RISC Architectures (Register-Register Architectures)

Characteristics of RISC Architectures:

A reduced instruction set featuring only very frequently used instructions, encoded in a few simple, fixed-field instruction formats (fewer types of ‘soft gates’).

Typically only register operands (higher speed of interconnections between ‘soft gates’).

These drastically reduce the complexity of the control part, thereby releasing chip area for more resources in the execution unit, including larger register files.

Also, the CPU becomes easier to pipeline, leading to increased speed (overlapped implementation of several ‘soft gates’ by the CPU) and throughput.


Key Features of RISC Architectures

Load-Store architectures:

Only Load and Store instructions can transfer data from and to memory, using a few simple addressing modes.

All other instructions operate only on register operands, typically 2 source operands and 1 destination operand.

Simplified instruction decoding.

Drastic reduction in the complexity of the control part.




Easier pipelining of instruction execution.


A much larger fraction of chip area becomes available for execution unit resources (e.g. a larger register file, more powerful operational units, more buses), which can lead to enhanced performance.


Architectural Evolution of CPUs

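Generation 1

[Timing diagram: Instruction 1, Instruction 2 and Instruction 3 execute one after another along the time axis, each as F D E.]

Generation 2

[Timing diagram: Instruction 1, Instruction 2 and Instruction 3 each pass through F D E on their own row along the time axis.]

Legend: F = Fetch Instruction, D = Decode, E = Execute.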



Generation 3

[Timing diagram: six instructions, each passing through F D A R E W on its own row along the time axis.]

Legend: F = Fetch Instruction, D = Decode, A = Address Calculation, R = Read Operands, E = Execute, W = Write Result.



Generation 4

[Timing diagram: twelve instructions, each passing through F D A R E W on its own row along the time axis.]

Legend: F = Fetch Instruction, D = Decode, A = Address Calculation, R = Read Operands, E = Execute, W = Write Result.


Generation 5

[Timing diagram (Dataflow Model): twelve instructions on separate rows along the time axis, each with several consecutive E steps of varying length before W.]

Legend: F = Fetch Instruction, D = Decode, A = Address Calculation, R = Read Operands, E = Execute, W = Write Result.


Throughput and Performance Evolution

Throughput depends upon:

How many bits does a microprocessor process simultaneously?

4, 8, 16, 32, 64 bits (Improvement = 16 times)

How many clock cycles does it take to complete 1 instruction (cycles per instruction, or CPI)?

8, 1, 1/4 (Improvement = 32 times)

What is the maximum clock speed at which the processor can run?

0.5 MHz (in 1971) to 3.5 GHz (in 2004) (Improvement = 7000 times)

Performance ∝ (Operand Bit-width) × (1 / CPI) × f (clock frequency)

Total Improvement = 16 × 32 × 7000 = 3.584 million times
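
The arithmetic behind the total can be reproduced in a few lines. This sketch only restates the figures quoted on the slides; the variable names are assumptions.

```python
# Improvement factors, using the endpoints quoted on the slides.
bit_width_gain = 64 // 4          # operand width: 4 bits to 64 bits
cpi_gain = 8 / (1 / 4)            # cycles per instruction: 8 down to 1/4
clock_gain = 3.5e9 / 0.5e6        # clock: 0.5 MHz (1971) to 3.5 GHz (2004)

# Performance is taken as proportional to bit-width x (1/CPI) x clock frequency,
# so the individual gains multiply.
total_gain = bit_width_gain * cpi_gain * clock_gain
```

The three factors multiply because the performance relation is a product of the three terms.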


Contributing Factors to Increased Throughput

1. Increase of operand bit-width (from 4 bits to 64 bits): a direct consequence of feature size reduction and chip size increase of MOS technologies.

2. Reduction of CPI: due to architectural innovations and pipelining (including multiple pipelines running concurrently).

3. Increase of clock frequency:

Due to architectural innovations and pipelining.

Due to feature size reduction of MOS technologies.





SoC and Embedded System Design

SoC and Embedded System Design represents the convergence of hardware and software design.

Besides digital functions, a SoC typically also integrates some analog and/or mixed-signal and/or RF functions on a single chip.

The boundary between what functions must necessarily be done in analog (or can be better done in analog) and what functions are better done as digital has been fairly clear and stable for quite some time.

However, only more recently has the boundary between digital functions better done in hardware and those better done in software been sought to be defined, from the speed-power-cost, time-to-market and system-upgradability points of view of the proposed solution.

Hardware vs. Software Decision

It needs reminding that software is actually implemented through a hardware architecture (that of the processor), with the processor’s instruction set defining the hardware-software boundary/interface.

The logic functionality of the instruction set is realized in hardware, whereas higher-level logic functionality is realized in software (using a sequence of instructions from the instruction set).

Software provides a more flexible way of performing logic functions. A change in the sequence of instructions or a change in the operands of instructions changes the logic function. However, this flexibility comes at a cost in terms of speed and power.

Memory provides the means of building a soft logic network (represented by software) as opposed to the hard logic network (represented by hardware).

Each software logic gate receives its configuration as well as its inputs from memory via the memory bus and stores its result in memory via the memory bus.

A hardware logic gate, by contrast, receives its inputs directly from the output of a preceding hardware logic gate over a short wire.

Thus, there is typically an overhead of four memory transfers per logic operation when using software logic gates as opposed to hardware logic gates.
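
The count of four can be made concrete with a small model that tallies every transfer over the memory bus. This is a sketch under assumptions: a two-input software gate, one memory word per instruction, and the hypothetical names `CountingMemory` and `software_gate`.

```python
class CountingMemory:
    """A memory model that counts every transfer over the memory bus."""

    def __init__(self, contents):
        self.cells = dict(contents)
        self.transfers = 0

    def read(self, addr):
        self.transfers += 1
        return self.cells[addr]

    def write(self, addr, value):
        self.transfers += 1
        self.cells[addr] = value


GATES = {"AND": lambda a, b: a & b, "OR": lambda a, b: a | b}


def software_gate(mem, instr_addr):
    """Evaluate one two-input software logic gate held in memory."""
    op, src1, src2, dest = mem.read(instr_addr)  # transfer 1: gate configuration
    a = mem.read(src1)                           # transfer 2: first input
    b = mem.read(src2)                           # transfer 3: second input
    mem.write(dest, GATES[op](a, b))             # transfer 4: result
```

A hardware gate performing the same AND would need none of these bus transfers, since its inputs arrive over short wires from the preceding gates.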

These memory transfers occur over the memory bus and the buses internal to the memory and are, therefore, very slow as well as power consuming, owing to the large capacitances associated with these buses.

So, software logic, though very flexible, is both slow and very power consuming.

Besides, there isn’t much concurrency in software logic. Classical von Neumann CPU architectures of software logic have no concurrency.

Pipelined RISC architectures process instructions in an overlapped manner and hence have a concurrency equal to the number of stages in the pipeline.
Superscalars (with multiple pipelines) have still higher concurrencies (roughly 10-15). However, this is nowhere close to the concurrency in hardware logic, which can be massive.

For these reasons, software logic provides a very flexible but slow and power-hungry logic implementation option, whereas hardware logic provides a totally rigid but fast and low-power logic implementation option.

Software logic design is faster, and its implementation is less expensive, in many situations.

Hence, one needs to carefully partition one’s system logic into software logic and hardware logic.
Very often in the past, performance (speed) has been the sole criterion for deciding what portion of the system logic should be implemented in hardware.

More recently, in the context of battery-operated portable/hand-held devices, power consumption has emerged as an additional criterion for deciding which system logic is to be realized in hardware.


Architecture #10 (Application Specific Instruction-set Based; Programmable; Non-pipelined/Pipelined) for Logic Implementation

Besides the standard hardware and software options, there is another option, the Application Specific Instruction Set Processor (ASIP), that draws upon the strengths of both hardware logic and software logic to provide a solution that optimally mixes the benefits of both approaches in the context of a given application or class of applications.

This logic implementation is application specific, using a programmable processor, usually for embedded systems applications.


ASIP Architectures

ASIPs (Application Specific Instruction-Set Processors) fill the gap between two kinds of architectures for electronic system design:

1. General-purpose Instruction-set + CPU based system design: where neither the instruction set nor the CPU architecture is tailored for the application.

Thus, while this approach affords full flexibility, performance may be inadequate and power consumption excessive.

2. Application-specific dedicated hardware designs: where the architecture and design are optimized for performance and power, but there is no flexibility.


Example ASIP Block Diagram

[Figure: ASIP block diagram. Labelled blocks: Host PC, Comm. Controller, Program Memory, PC Logic, Fetched Parameter Register, Instruction Register, Address Unit, Instruction Decoder and Control Sequencer, Parameter RAM, Output Reg., Output Controller, Data RAM, Addr. Gen., Speech Sample RAM, Temporary Registers, Floating-Point Adder-Sub, Floating-Point Multiplier, Multi-Function Unit.]


Reconfigurable Computing

Another interesting and potentially useful area with a bearing on embedded systems is reconfigurable computing. So far, by and large, logic reconfiguration has been provided by software running on a hard architecture. However, with the integration of FPGA technology into SoC-based embedded systems, the hardware architecture is no longer that hard. It can be reconfigured, providing major advantages in speed and power consumption.

This adds one more dimension to programming: hardware programming (e.g. architectural configuration / reconfiguration).

FPGA blocks of SoCs or on SoC platforms provide a low-cost means of implementing Application Specific Instruction Set Processor (ASIP) ideas and, indeed, dynamically reconfigurable instruction sets and implementation architectures, particularly where there are repetitive functions / long-running loops.

This holds immense potential for enhancing the speed and reducing the power consumption of single-function or multi-function / multi-standard hand-held devices.


Acknowledgment

I wish to thank my colleagues of the IC Design Group at CEERI, Pilani for their continued support, interest in larger perspectives, and enthusiasm for visualizing the scenarios of the future in order to select new directions for R&D efforts.
