
SYSTEM ON CHIP

Mr. A. B. Shinde
Lecturer,
Department of Electronics Engg.
P.V.P.I.T. Budhgaon.
SOC BASICS

System-on-a-chip or system on chip (SoC or SOC) refers to integrating all components of a computer or other electronic system into a single integrated circuit (chip).

It may contain digital, analog, mixed-signal, and often radio-frequency functions, all on one chip.

Microcontrollers typically have under 100 KB of RAM (often just a few kilobytes) and often really are single-chip systems; whereas the term SoC is typically used for more powerful processors, capable of running software such as Windows or Linux, which need external memory chips (flash, RAM) to be useful, and which are used with various external peripherals.
SOC BASICS CONT

Many interesting systems are too complex to fit on just one chip built with a process optimized for just one of the system's tasks.

When it is not feasible to construct an SoC for a particular application, an alternative is a system in package (SiP) comprising a number of chips in a single package.

In large volumes, SoC is believed to be more cost effective than SiP, because its packaging is simpler.

The SoC chip includes processors and numerous digital peripherals, and comes in a ball grid package with lower and upper connections.
A TYPICAL SOC CONSISTS OF:
One microcontroller, microprocessor or DSP core(s). Some SoCs, called multiprocessor systems-on-chip (MPSoC), include more than one processor core.

Memory blocks including a selection of ROM, RAM, EEPROM and Flash.

Timing sources including oscillators and phase-locked loops.

Peripherals including counter-timers, real-time timers and power-on reset generators.

External interfaces including industry standards such as USB, FireWire, Ethernet, USART, SPI.
A TYPICAL SOC CONSISTS OF: CONT

Analog interfaces including ADCs and DACs.

Voltage regulators and power management circuits.

These blocks are connected by either a proprietary or industry-standard bus such as the AMBA bus from ARM.

DMA controllers route data directly between external interfaces and memory, bypassing the processor core and thereby increasing the data throughput of the SoC.
DESIGN FLOW OF SOC
DESIGN FLOW OF SOC:
An SoC consists of both the hardware described above, and the software that controls the microcontroller, microprocessor or DSP cores, peripherals and interfaces.

The design flow for an SoC aims to develop this hardware and software in parallel.
DESIGN FLOW OF SOC: CONT

Most SoCs are developed from pre-qualified hardware blocks for the hardware elements described above, together with the software drivers that control their operation.

A key step in the design flow is emulation: the hardware is mapped onto an emulation platform based on an FPGA that mimics the behavior of the SoC, and the software modules are loaded into the memory of the emulation platform. Once programmed, the emulation platform enables the hardware and software of the SoC to be tested and debugged at close to its full operational speed.

After emulation, the hardware of the SoC follows the place and route phase of the design of an integrated circuit before it is fabricated.

Chips are verified for logical correctness before being sent to the foundry. This process is called functional verification. Verilog and VHDL are typical hardware description languages used for verification.
SIMD
SINGLE INSTRUCTION MULTIPLE DATA
SIMD (SINGLE INSTRUCTION, MULTIPLE DATA)
In computing, SIMD (Single Instruction, Multiple Data; "vector instructions") is a technique employed to achieve data-level parallelism.

Supercomputers popular in the 1980s, such as the Cray X-MP, were called "vector processors".

The first era of SIMD machines was characterized by supercomputers such as the Thinking Machines CM-1 and CM-2. These machines had many limited-functionality processors that would work in parallel. For example, each of 64,000 processors in a Thinking Machines CM-2 would execute the same instruction at the same time, so that you could do 64,000 multiplies on 64,000 pairs of numbers at a time.
SIMD (SINGLE INSTRUCTION, MULTIPLE DATA)

A SIMD machine exploits a property of the data stream called data parallelism.
SIMD (SINGLE INSTRUCTION, MULTIPLE DATA)

A very important class of architectures in the history of computation, single-instruction/multiple-data machines are capable of applying the exact same instruction stream to multiple streams of data simultaneously. This type of architecture is perfectly suited to achieving very high processing rates, as the data can be split into many different independent pieces, and the multiple instruction units can all operate on them at the same time.
SIMD TYPES
Synchronous (lock-step): These types of systems are generally considered to be synchronous, meaning that they are built in such a way as to guarantee that all instruction units will receive the same instruction at the same time, and thus all will potentially be able to execute the same operation simultaneously.

Deterministic SIMD architectures: These are deterministic because, at any one point in time, there is only one instruction being executed, even though multiple units may be executing it. So, every time the same program is run on the same data, using the same number of execution units, exactly the same result is guaranteed at every step in the process.

Well-suited to instruction/operation-level parallelism: The "single" in single-instruction doesn't mean that there's only one instruction unit, as it does in SISD, but rather that there's only one instruction stream, and this instruction stream is executed by multiple processing units on different pieces of data, all at the same time, thus achieving parallelism.
SIMD (ADVANTAGES)
Consider an application where the same value is being added (or subtracted) to a large number of data points, a common operation in many multimedia applications. One example would be changing the brightness of an image. To change the brightness, the R, G and B values are read from memory, a value is added to (or subtracted from) them, and the resulting values are written back out to memory.

The data is understood to be in blocks, and a number of values can be loaded all at once. Instead of a series of instructions saying "get this pixel, now get the next pixel", a SIMD processor will have a single instruction that effectively says "get lots of pixels". This can take much less time than "getting" each pixel individually, as with a traditional CPU design.

If the SIMD system works by loading up eight data points at once, the add operation being applied to the data will happen to all eight values at the same time.
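As a rough sketch of this idea (my illustration, not from the slide; the function and buffer names are made up), x86 SSE2 intrinsics let one saturating-add instruction brighten 16 pixel bytes at a time:

#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: add 'delta' to every 8-bit channel value,
 * 16 bytes per SIMD instruction, saturating at 255. */
void brighten(uint8_t *pixels, size_t n, uint8_t delta)
{
    __m128i d = _mm_set1_epi8((char)delta);          /* broadcast delta into all 16 lanes */
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m128i p = _mm_loadu_si128((const __m128i *)(pixels + i));
        p = _mm_adds_epu8(p, d);                      /* one instruction, 16 additions */
        _mm_storeu_si128((__m128i *)(pixels + i), p);
    }
    for (; i < n; i++) {                              /* scalar tail for leftover bytes */
        unsigned v = pixels[i] + delta;
        pixels[i] = (v > 255) ? 255 : (uint8_t)v;
    }
}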
SIMD (DISADVANTAGES)
Not all algorithms can be vectorized.

Implementing an algorithm with SIMD instructions usually requires human labor; most compilers don't generate SIMD instructions from a typical C program, for instance.

Programming with particular SIMD instruction sets can involve numerous low-level challenges: it imposes restrictions on data alignment; gathering data into SIMD registers and scattering it to the correct destination locations is tricky and can be inefficient; and specific instructions like rotations or three-operand addition are missing from some SIMD instruction sets.
SIMD HARDWARE

Small-scale (64- or 128-bit) SIMD became popular on general-purpose CPUs in the early 1990s, continuing through 1997 and later with Motion Video Instructions (MVI) for Alpha.

SIMD instructions can be found, to one degree or another, on most CPUs.

The instruction set of the SPUs in the Cell processor, co-developed by IBM, Sony and Toshiba, is heavily SIMD based. NXP, founded by Philips, developed several SIMD processors named Xetal.
SIMD SOFTWARE
SIMD instructions are widely used to process 3D graphics, although modern graphics cards with embedded SIMD have largely taken over this task from the CPU.

Adoption of SIMD systems in personal computer software was at first slow, due to a number of problems. SIMD on x86 had a slow start. Apple Computer had somewhat more success, even though they entered the SIMD market later than the rest. Additionally, many of the systems that would benefit from SIMD were supplied by Apple itself; Apple was the dominant purchaser of PowerPC chips from IBM.

SIMD within a register, or SWAR, is a range of techniques and tricks used for performing SIMD in general-purpose registers on hardware that doesn't provide any direct support for SIMD instructions. This can be used to exploit parallelism in certain algorithms even on hardware that does not support SIMD directly.
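To make the SWAR idea concrete, here is a small sketch (my addition, not from the slide) of a classic trick: adding four packed 8-bit values held in an ordinary 32-bit register, with carries prevented from crossing byte boundaries.

#include <stdint.h>

/* Add corresponding bytes of x and y (modulo 256) using only plain
 * 32-bit integer operations: the low 7 bits of each byte are added
 * with carries confined to bit 7, then the true high bits are fixed
 * up with an XOR. */
static uint32_t add_bytes_swar(uint32_t x, uint32_t y)
{
    uint32_t low  = (x & 0x7F7F7F7Fu) + (y & 0x7F7F7F7Fu); /* no inter-byte carry */
    uint32_t high = (x ^ y) & 0x80808080u;                 /* restore each byte's top bit */
    return low ^ high;
}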
SISD
SINGLE INSTRUCTION SINGLE DATA
SISD (SINGLE INSTRUCTION, SINGLE DATA )
This is the oldest style of computer architecture, and still one of the most important: all personal computers fit within this category.

Single instruction refers to the fact that there is only one instruction stream being acted on by the CPU during any one clock tick; single data means, analogously, that one and only one data stream is being employed as input during any one clock tick.
SISD (SINGLE INSTRUCTION, SINGLE DATA )

In computing, SISD (Single Instruction, Single Data) is a term referring to a computer architecture in which a single processor, a uniprocessor, executes a single instruction stream, to operate on data stored in a single memory. This corresponds to the von Neumann architecture.

According to Michael J. Flynn, SISD can have concurrent processing characteristics. Instruction fetching and pipelined execution of instructions are common examples found in most modern SISD computers.
CHARACTERISTICS OF SISD
Serial: Instructions are executed one after the other, in lock-step; this type of sequential execution is commonly called serial, as opposed to parallel, in which multiple instructions may be processed simultaneously.

Deterministic: Because each instruction has a unique place in the execution stream, and thus a unique time during which it and it alone is being processed, the entire execution is said to be deterministic, meaning that you (can potentially) know exactly what is happening at all times, and, ideally, you can exactly recreate the process, step by step, at any later time.

Examples: all personal computers, all single-instruction-unit-CPU workstations, mini-computers, and mainframes.
MIMD
MULTIPLE INSTRUCTION MULTIPLE DATA
MIMD (MULTIPLE INSTRUCTION, MULTIPLE DATA )
In computing, MIMD (Multiple Instruction stream, Multiple Data stream) is a technique employed to achieve parallelism.

Machines using MIMD have a number of processors that function asynchronously and independently.

At any time, different processors may be executing different instructions on different pieces of data. MIMD architectures may be used in a number of application areas such as computer-aided design/computer-aided manufacturing, simulation, modeling, and as communication switches.
MIMD (MULTIPLE INSTRUCTION, MULTIPLE DATA )
MIMD machines can be of either shared memory or distributed memory categories.

Shared memory machines may be of the bus-based, extended, or hierarchical type.

Distributed memory machines may have hypercube or mesh interconnection schemes.
MIMD (MULTIPLE INSTRUCTION, MULTIPLE DATA )

Many believe that the next major advances in computational capabilities will be enabled by this approach to parallelism, which provides for multiple instruction streams simultaneously applied to multiple data streams.
MIMD (MULTIPLE INSTRUCTION, MULTIPLE DATA )
The most general of all of the major categories, a MIMD machine is capable of being programmed to operate as if it were in fact any of the four.

Synchronous or asynchronous: MIMD instruction streams can potentially be executed either synchronously or asynchronously, i.e., either in tightly controlled lock-step or in a more loosely bound "do your own thing" mode.

Deterministic or non-deterministic: MIMD systems are potentially capable of deterministic behavior, that is, of reproducing the exact same set of processing steps every time a program is run on the same data.

Well-suited to block, loop, or subroutine level parallelism: The more code each processor in an MIMD assembly is given domain over, the more efficiently the entire system will operate, in general.

Multiple Instruction or Single Program: MIMD-style systems are capable of running in true multiple-instruction mode, with every processor doing something different, or every processor can be given the same code; this latter case is called SPMD, "Single Program, Multiple Data", and is a generalization of SIMD-style parallelism.
MIMD : SHARED MEMORY MODEL
The processors are all connected to a "globally available" memory, via either software or hardware means. The operating system usually maintains its memory coherence.

Bus-based: MIMD machines with shared memory have processors which share a common, central memory. Here all processors are attached to a bus which connects them to memory. This setup works only up to the point where there is too much contention on the bus.

Hierarchical: MIMD machines with hierarchical shared memory use a hierarchy of buses to give processors access to each other's memory. Processors on different boards may communicate through inter-nodal buses. Buses support communication between boards. With this type of architecture, the machine may support over a thousand processors.
MIMD : DISTRIBUTED MEMORY MODEL
In distributed memory MIMD machines, each processor has its own individual memory. Each processor has no direct knowledge about the other processors' memory.

For data to be shared, it must be passed from one processor to another as a message. Since there is no shared memory, contention is not as great a problem with these machines.

It is not economically feasible to connect a large number of processors directly to each other. A way to avoid this multitude of direct connections is to connect each processor to just a few others.

The amount of time required for processors to perform simple message routing can be substantial. Systems were designed to reduce this time loss; hypercube and mesh are two of the popular interconnection schemes.
MIMD : DISTRIBUTED MEMORY MODEL
Interconnection schemes:

Hypercube interconnection network: In an MIMD distributed memory machine with a hypercube system interconnection network containing four processors, a processor and a memory module are placed at each vertex of a square. The diameter of the system is the minimum number of steps it takes for one processor to send a message to the processor that is the farthest away. So, for example, in a hypercube system with eight processors, with each processor and memory module placed at a vertex of a cube, the diameter is 3. In general, for a system that contains 2^N processors with each processor directly connected to N other processors, the diameter of the system is N.

Mesh interconnection network: In an MIMD distributed memory machine with a mesh interconnection network, processors are placed in a two-dimensional grid. Each processor is connected to its four immediate neighbors. Wrap-around connections may be provided at the edges of the mesh. One advantage of the mesh interconnection network over the hypercube is that the mesh system need not be configured in powers of two.
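As a small aside (my addition, not from the slide), the 2^N-processor/diameter-N property follows from labeling each hypercube node with an N-bit address: neighbors differ in exactly one bit, so the minimum number of hops between two nodes is the Hamming distance between their labels, and the worst case is N.

#include <stdint.h>

/* Minimum hop count between two hypercube nodes, each identified by
 * an N-bit address; the maximum over all pairs is the diameter N. */
static unsigned hypercube_hops(uint32_t node_a, uint32_t node_b)
{
    uint32_t diff = node_a ^ node_b;   /* bits where the labels disagree */
    unsigned hops = 0;
    while (diff) {
        hops += diff & 1u;             /* count one hop per differing bit */
        diff >>= 1;
    }
    return hops;
}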
MISD
MULTIPLE INSTRUCTIONS SINGLE DATA
MISD (MULTIPLE INSTRUCTIONS, SINGLE DATA)
In computing, MISD (Multiple Instruction, Single Data) is a type of parallel computing architecture where many functional units perform different operations on the same data. Pipeline architectures belong to this type.

Fault-tolerant computers executing the same instructions redundantly in order to detect and mask errors, in a manner known as task replication, may be considered to belong to this type.

Not many instances of this architecture exist, as MIMD and SIMD are often more appropriate for common data parallel techniques.
MISD (MULTIPLE INSTRUCTIONS, SINGLE DATA)
An example of a MISD process is one carried out routinely at the United Nations. When a delegate speaks in a language of his/her choice, the speech is simultaneously translated into a number of other languages for the benefit of the other delegates present. Thus the delegate's speech (a single data stream) is being processed by a number of translators (processors), yielding different results.
MISD (MULTIPLE INSTRUCTIONS, SINGLE DATA)

This category was included more for the sake of completeness than to identify a working group of actual computer systems.

MISD examples:
Multiple frequency filters operating on a single signal stream.
Multiple cryptography algorithms attempting to crack a single coded message.

Both of these are examples of this type of processing, where multiple, independent instruction streams are applied simultaneously to a single data stream.
PIPELINING
PIPELINING

In computing, a pipeline is a set of data processing elements connected in series, so that the output of one element is the input of the next one.

The elements of a pipeline are often executed in parallel or in time-sliced fashion; in that case, some amount of buffer storage is often inserted between elements.
PIPELINING (CONCEPT AND MOTIVATION)
Consider the assembly of a car. Assume that certain steps in the assembly line are to install the engine, install the hood, and install the wheels.

A car on the assembly line can have only one of the three steps done at once. After the car has its engine installed, it moves on to having its hood installed, leaving the engine installation facilities available for the next car. The first car then moves on to wheel installation, the second car to hood installation, and a third car begins to have its engine installed.

If engine installation takes 20 minutes, hood installation takes 5 minutes, and wheel installation takes 10 minutes, then finishing all three cars when only one car can be worked on at once would take 105 minutes. On the other hand, using the assembly line, the total time to complete all three is 75 minutes. At this point, additional cars will come off the assembly line.
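Working out the numbers of the example (my summary; the general form below assumes the 20-minute engine step is the bottleneck stage):

Sequential: 3 x (20 + 5 + 10) = 105 min
Pipelined:  (3 - 1) x 20 + (20 + 5 + 10) = 75 min

That is, (number of cars - 1) times the slowest stage, plus the time to finish one car.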
PIPELINING (COSTS, DRAWBACKS, AND BENEFITS)

As the assembly line example shows, pipelining doesn't decrease the time for a single datum to be processed; it only increases the throughput of the system when processing a stream of data. Deep pipelining leads to increased latency, the time required for a signal to propagate through a full pipe.

A pipelined system typically requires more resources (circuit elements, processing units, computer memory, etc.) than one that executes one batch at a time, because its stages cannot reuse the resources of a previous stage. Moreover, pipelining may increase the time it takes for an instruction to finish.
PIPELINING (IMPLEMENTATIONS)
Buffered, synchronous pipelines: Conventional microprocessors are synchronous circuits that use buffered, synchronous pipelines. In these pipelines, "pipeline registers" are inserted in between pipeline stages, and are clocked synchronously.

Buffered, asynchronous pipelines: Asynchronous pipelines are used in asynchronous circuits, and have their pipeline registers clocked asynchronously. Generally speaking, they use a request/acknowledge system, wherein each stage can detect when it is "finished". When a stage is finished and the next stage has sent it a "request" signal, the stage sends an "acknowledge" signal to the next stage, and a "request" signal to the previous stage. When a stage receives an "acknowledge" signal, it clocks its input registers, thus reading in the data from the previous stage.

Unbuffered pipelines: Unbuffered pipelines, called "wave pipelines", do not have registers in between pipeline stages. Instead, the delays in the pipeline are "balanced" so that, for each stage, the difference between the first stabilized output data and the last is minimized.
PIPELINING (COMPUTER-RELATED)
Instruction pipelines, such as the classic RISC pipeline, are used in processors to allow overlapping execution of multiple instructions with the same circuitry. The circuitry is usually divided up into stages, including instruction decoding, arithmetic, and register fetching stages, wherein each stage processes one instruction at a time.

Graphics pipelines, found in most graphics cards, consist of multiple arithmetic units, or complete CPUs, that implement the various stages of common rendering operations.

Software pipelines consist of multiple processes arranged so that the output stream of one process is automatically and promptly fed as the input stream of the next one. Unix pipelines are the classical implementation of this concept.
INSTRUCTION PIPELINE
An instruction pipeline is a technique used in the design of computers and other digital electronic devices to increase their instruction throughput (the number of instructions that can be executed in a unit of time).

The fundamental idea is to split the processing of a computer instruction into a series of independent steps, with storage at the end of each step. This allows the computer's control circuitry to issue instructions at the processing rate of the slowest step, which is much faster than the time needed to perform all steps at once.

For example, the classic RISC pipeline is broken into five stages, with a set of flip-flops between each stage:
Instruction fetch
Instruction decode and register fetch
Execute
Memory access
Register write back
PIPELINING (ADVANTAGES AND DISADVANTAGES)
Pipelining does not help in all cases. An instruction pipeline is said to be fully pipelined if it can accept a new instruction every clock cycle. A pipeline that is not fully pipelined has wait cycles that delay the progress of the pipeline.

Advantages of pipelining:
The cycle time of the processor is reduced, thus increasing the instruction issue rate in most cases.
Some combinational circuits such as adders or multipliers can be made faster by adding more circuitry. If pipelining is used instead, it can save circuitry.

Disadvantages of pipelining:
A non-pipelined processor executes only a single instruction at a time. This prevents branch delays and problems with serial instructions being executed concurrently. Consequently the design is simpler and cheaper to manufacture.
The instruction latency in a non-pipelined processor is slightly lower than in a pipelined equivalent. This is due to the fact that extra flip-flops must be added to the data path of a pipelined processor.
A non-pipelined processor will have a stable instruction bandwidth. The performance of a pipelined processor is much harder to predict and may vary more widely between different programs.
PARALLEL COMPUTING
PARALLEL COMPUTING
Parallel computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently ("in parallel").

There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism.

Parallelism has been employed for many years, mainly in high-performance computing, but interest in it has grown lately due to the physical constraints preventing frequency scaling. As power consumption by computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multicore processors.
PARALLEL COMPUTING
Traditionally, computer software has been written for serial computation. To solve a problem, an algorithm is constructed and implemented as a serial stream of instructions. These instructions are executed on a central processing unit on one computer. Only one instruction may execute at a time; after that instruction is finished, the next is executed.

Parallel computing, on the other hand, uses multiple processing elements simultaneously to solve a problem. This is accomplished by breaking the problem into independent parts so that each processing element can execute its part of the algorithm simultaneously with the others.

The processing elements can be diverse and include resources such as a single computer with multiple processors, several networked computers, specialized hardware, or any combination of the above.
TYPES OF PARALLELISM
Bit-level parallelism: From the advent of VLSI in the 1970s until about 1986, speed-up in computer architecture was driven by doubling the computer word size, the amount of information the processor can manipulate per cycle. Increasing the word size reduces the number of instructions the processor must execute to perform an operation on variables whose sizes are greater than the length of the word.

Instruction-level parallelism: A computer program is, in essence, a stream of instructions executed by a processor. These instructions can be re-ordered and combined into groups which are then executed in parallel without changing the result of the program. This is known as instruction-level parallelism.

Data parallelism: Data parallelism is parallelism inherent in program loops, which focuses on distributing the data across different computing nodes to be processed in parallel.

Task parallelism: Task parallelism is the characteristic of a parallel program that "entirely different calculations can be performed on either the same or different sets of data". This contrasts with data parallelism, where the same calculation is performed on the same or different sets of data.
BIT-LEVEL PARALLELISM
Bit-level parallelism is a form of parallel computing based on increasing processor word size. Increasing the word size reduces the number of instructions the processor must execute in order to perform an operation on variables whose sizes are greater than the length of the word.

For example, consider a case where an 8-bit processor must add two 16-bit integers. The processor must first add the 8 lower-order bits from each integer, then add the 8 higher-order bits, requiring two instructions to complete a single operation. A 16-bit processor would be able to complete the operation with a single instruction.

Historically, 4-bit microprocessors were replaced with 8-bit, then 16-bit, then 32-bit microprocessors. This trend generally came to an end with the introduction of 32-bit processors, which have been a standard in general-purpose computing for two decades. Only recently, with the advent of x86-64 architectures, have 64-bit processors become commonplace.
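A minimal sketch of the 8-bit example above (my illustration, not from the slide): the 16-bit sum is formed from two 8-bit additions, with the carry out of the low halves propagated into the high halves.

#include <stdint.h>

/* Emulate a 16-bit addition using only 8-bit operations, the way an
 * 8-bit processor would: add the low bytes first, then add the high
 * bytes plus the carry from the low-byte addition. */
static uint16_t add16_on_8bit(uint16_t a, uint16_t b)
{
    uint8_t lo    = (uint8_t)(a & 0xFF) + (uint8_t)(b & 0xFF);      /* first instruction  */
    uint8_t carry = (lo < (uint8_t)(a & 0xFF)) ? 1 : 0;             /* carry flag */
    uint8_t hi    = (uint8_t)(a >> 8) + (uint8_t)(b >> 8) + carry;  /* second instruction */
    return ((uint16_t)hi << 8) | lo;
}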
INSTRUCTION LEVEL PARALLELISM
Instruction-level parallelism (ILP) is a measure of how many of the operations in a computer program can be performed simultaneously. Consider the following program:

1. e = a + b
2. f = c + d
3. g = e * f

Here, operation 3 depends on the results of operations 1 and 2, so it cannot be calculated until both of them are completed. However, operations 1 and 2 do not depend on any other operation, so they can be calculated simultaneously. If we assume that each operation can be completed in one unit of time, then these three instructions can be completed in a total of two units of time, giving an ILP of 3/2.
INSTRUCTION LEVEL PARALLELISM: CONT

A goal of compiler and processor designers is to identify and take advantage of as much ILP as possible. Ordinary programs are typically written under a sequential execution model where instructions execute one after the other and in the order specified by the programmer. ILP allows the compiler and the processor to overlap the execution of multiple instructions or even to change the order in which instructions are executed.

How much ILP exists in programs is very application specific. In certain fields, such as graphics and scientific computing, the amount can be very large. However, workloads such as cryptography exhibit much less parallelism.
DATA PARALLELISM
Data parallelism (also known as loop-level parallelism) is a form of parallelization of computing across multiple processors in parallel computing environments. Data parallelism focuses on distributing the data across different parallel computing nodes.

In a multiprocessor system executing a single set of instructions (SIMD), data parallelism is achieved when each processor performs the same task on different pieces of distributed data. In some situations, a single execution thread controls operations on all pieces of data.
DATA PARALLELISM
For instance, consider a 2-processor system (CPUs A and B) in a parallel environment, and we wish to do a task on some data d. It is possible to tell CPU A to do that task on one part of d and CPU B on another part simultaneously, thereby reducing the duration of the execution. The data can be assigned using conditional statements.

As a specific example, consider adding two matrices. In a data-parallel implementation, CPU A could add all elements from the top half of the matrices, while CPU B could add all elements from the bottom half of the matrices. Since the two processors work in parallel, the job of performing matrix addition would take one half the time of performing the same operation in serial using one CPU alone.
TASK PARALLELISM
Task parallelism (also known as function parallelism and control parallelism) is a form of parallelization of computer code across multiple processors in parallel computing environments. Task parallelism focuses on distributing execution processes (threads) across different parallel computing nodes.

In a multiprocessor system, task parallelism is achieved when each processor executes a different thread (or process) on the same or different data. The threads may execute the same or different code. In the general case, different execution threads communicate with one another as they work. Communication usually takes place to pass data from one thread to the next as part of a workflow.
TASK PARALLELISM: CONT
As a simple example, if we are running code on a 2-processor system (CPUs "a" and "b") in a parallel environment and we wish to do tasks "A" and "B", it is possible to tell CPU "a" to do task "A" and CPU "b" to do task "B" simultaneously, thereby reducing the runtime of the execution. The tasks can be assigned using conditional statements.

Task parallelism emphasizes the distributed (parallelized) nature of the processing (i.e. threads), as opposed to the data (data parallelism).
SYSTEM DESIGN ISSUES IN SOCS
The design of a SoC has goals similar to those of an embedded design. The designed system will be used in a well-specified environment, and has to fulfill strict requirements. Some requirements are clearly defined by the application, like the functional requirements of an algorithm, e.g. the decoding of an MPEG-1 Layer 3 data stream, which covers certain quality restrictions.

The environment poses other requirements: e.g. minimizing the cost, footprint, or power consumption. However, due to the flexibility of a SoC design, achieving the set goals involves analyzing a multi-dimensional design space.

The degrees of freedom stem from the process element types and characteristics, their allocation, the mapping of functional elements to the process elements, their interconnection with busses and their scheduling.

A SoC design has to deal with a wide range of abstraction levels: it starts with a functional description at system level, where major function blocks are defined and no timing information is given.
ABSTRACTION LEVELS IN SOC DESIGN

The goal of the SoC design paradigm is to manage the immense number of design decisions in hardware/software co-design. This is only possible through the well-defined flow of design steps described above.
EMBEDDED SYSTEM
EMBEDDED SYSTEM
An embedded system is a special-purpose computer system designed to perform one or a few dedicated functions, often with real-time computing constraints. It is usually embedded as part of a complete device including hardware and mechanical parts. In contrast, a general-purpose computer, such as a personal computer, can do many different tasks depending on programming.

Embedded systems control many of the common devices in use today.

(Figure: the internals of a Netgear ADSL modem/router, a modern example of an embedded system. Labelled parts include a microprocessor (4), RAM (6), and flash memory (7).)
EMBEDDED SYSTEM: CONT
Physically, embedded systems range from portable devices such as digital watches and MP4 players, to large stationary installations like traffic lights, factory controllers, or the systems controlling nuclear power plants.

Complexity varies from low, with a single microcontroller chip, to very high with multiple units, peripherals and networks mounted inside a large chassis or enclosure.

In general, "embedded system" is not an exactly defined term, as many systems have some element of programmability. For example, handheld computers share some elements with embedded systems, such as the operating systems and microprocessors which power them, but are not truly embedded systems, because they allow different applications to be loaded and peripherals to be connected.
INTRO TO EMBEDDED SYSTEM DESIGN
MICROCONTROLLERS
A microcontroller is essentially a small and self-sufficient computer on a chip, used to control devices.

It has all the memory and I/O it needs on board, and is not expandable (no external bus interface).

Characteristics of a microcontroller:
Low cost, on the order of $1
Low speed, on the order of 10 kHz to 20 MHz
Low power, extremely low power in sleep mode
Small architecture, usually an 8-bit architecture
Small memory size, but usually enough for the type of application it is intended for; onboard flash
Limited I/O, but again, enough for the type of application it is intended for
MICROPROCESSORS
A microprocessor is fundamentally a collection of on/off switches laid out over silicon in order to perform computations.

Characteristics of a microprocessor:
High cost, anywhere between $20 and $200 or more
High speed, on the order of 100 MHz to 4 GHz
High power consumption, lots of heat
Large architecture, 32-bit, and recently 64-bit architecture
Large memory size, onboard flash and cache, with an external bus interface for greater memory usage
Lots of I/O and peripherals, though microprocessors tend to be short on general-purpose I/O
HARVARD ARCHITECTURE
Harvard architecture refers to a memory structure where the processor is connected to two different memory banks via two sets of buses.

This is to provide the processor with two distinct data paths, one for instructions and one for data.

Through this scheme, the CPU can read both an instruction and data from the respective memory banks at the same time.

This inherent independence increases the throughput of the machine by enabling it to always pre-fetch the next instruction. The cost of such a system is complexity in hardware. It is commonly used in DSPs.
VON-NEUMANN MACHINE
A Von Neumann machine, in contrast to the Harvard architecture, provides one data path (bus) for both instructions and data.

As a result, the CPU can either be fetching an instruction from memory, or reading/writing data to it.

Other than less complexity of hardware, it allows for using a single, sequential memory.

Today's processing speeds vastly outpace memory access times, so we employ a very fast but small amount of memory (cache) local to the processor. Modern processors employ a Harvard architecture to read from two caches, for instructions and data, while at the same time using a Von Neumann architecture to access external memory.
LITTLE VS. BIG ENDIAN

Although numbers are always displayed in the same way, they are not stored in the same way in memory.

Big-endian machines store the most significant byte of data in the lowest memory address.

Little-endian machines, on the other hand, store the least significant byte of data in the lowest memory address.
LITTLE VS. BIG ENDIAN: CONT

The Intel family of microprocessors and processors from Digital Equipment Corporation use little-endian mode.

Architectures from Sun, IBM, and Motorola are big-endian.

Architectures such as PowerPC, MIPS, and Intel's IA-64 are bi-endian, supporting either mode.

Unfortunately both methods are in prevalent use today, and neither method is superior to the other.
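A small sketch (my addition, not from the slide) showing how byte order becomes visible when a multi-byte value is examined one byte at a time:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t value = 0x11223344;                  /* same number on every machine */
    const uint8_t *bytes = (const uint8_t *)&value;

    /* On a little-endian machine the lowest address holds 0x44 (least
     * significant byte); on a big-endian machine it holds 0x11. */
    if (bytes[0] == 0x44)
        printf("little-endian\n");
    else if (bytes[0] == 0x11)
        printf("big-endian\n");
    return 0;
}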
PROGRAM COUNTER (PC)

The program counter is a 16- or 32-bit register which contains the address of the next instruction to be executed.

The PC automatically increments to the next sequential memory location every time an instruction is fetched.

Branch, jump, and interrupt operations load the program counter with an address other than the next sequential location.

During reset, the PC is loaded from a pre-defined memory location to signify the starting address of the code.
RESET VECTOR
The significance of the reset vector is that it points the processor to the memory address which contains the firmware's first instruction. Without the reset vector, the processor would not know where to begin execution.

Upon reset, the processor loads the program counter (PC) with the reset vector value from a pre-defined memory location. On the CPU08 architecture, this is at location $FFFE:$FFFF.

A common mistake occurs during the debug phase, when the reset vector is not strictly necessary: the developer takes it for granted and doesn't program it into the final image. As a result, the processor doesn't start up on the final product.
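As a hedged sketch of how a reset vector is commonly provided in C (the section name and the GCC-style attribute are placeholders; real placement is toolchain- and device-specific, driven by the linker script):

/* Hypothetical example: place a pointer to the startup routine in a
 * dedicated section that the linker script maps to the reset vector
 * address. Nothing here is specific to any particular part. */
void startup(void);                        /* firmware entry point */

__attribute__((section(".reset_vector")))
void (*const reset_vector)(void) = startup;

void startup(void)
{
    /* initialize stack, RAM, clocks and peripherals, then run the application */
    for (;;) { }
}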
STACK POINTER (SP)
The stack pointer (SP), much like the reset vector, is required at boot time for many processors.

Some processors, in particular 8-bit microcontrollers, automatically provide the stack pointer by resetting it to a predefined value.

On a higher-end processor, the stack pointer is usually read from a non-volatile memory location, much like the reset vector.

For example, on a ColdFire microprocessor, the first sixteen bytes of memory must be programmed as follows:
0x00000000: Reset Vector
0x00000008: Stack Pointer
COP WATCHDOG TIMER
The Computer Operating Properly (COP) module is a component of modern processors which provides a mechanism to help software recover from runaway code.

The COP, also known as the watchdog timer, is a free-running counter that generates a reset if it runs up to a pre-defined value and overflows.

In order to prevent a watchdog reset, the user code must clear the COP counter periodically.

The COP can be disabled through register settings; even though this is not good practice for a final firmware release, it is a prudent strategy during the course of debug.
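A hedged sketch of servicing a watchdog from the main loop (the COP_RESET register address and the write values are hypothetical; every part defines its own service sequence):

#include <stdint.h>

/* Hypothetical memory-mapped COP/watchdog service register. */
#define COP_RESET (*(volatile uint8_t *)0xFFFF0000u)

static void cop_kick(void)
{
    COP_RESET = 0x55;   /* many parts require a fixed write sequence */
    COP_RESET = 0xAA;   /* writing it periodically restarts the counter */
}

void main_loop(void)
{
    for (;;) {
        /* ... do one slice of application work ... */
        cop_kick();     /* clear the COP counter before it overflows */
    }
}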
THE INFINITE LOOP

Embedded systems, unlike a PC, never exit an application.

They idle through an infinite loop waiting for an event to happen in the form of an interrupt, or a pre-scheduled task.

In order to save power, some processors enter special sleep or wait modes instead of idling through an infinite loop, and come out of this mode upon either a timer or an external interrupt.
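A minimal sketch of this idle pattern (my illustration; enter_sleep_mode() and event_pending() are hypothetical helpers standing in for whatever sleep instruction and event source the target provides):

/* Hypothetical helpers, provided by the BSP or vendor headers on a real part. */
extern void enter_sleep_mode(void);   /* e.g. a WAIT/WFI-style instruction */
extern int  event_pending(void);      /* set by an ISR or a scheduler tick */
extern void handle_event(void);

int main(void)
{
    /* ... one-time hardware and peripheral initialization ... */
    for (;;) {                        /* the infinite loop: never returns */
        if (event_pending())
            handle_event();
        else
            enter_sleep_mode();       /* sleep until a timer or external interrupt */
    }
}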
INTERRUPTS
Interrupts are mostly hardware mechanisms which tell the program an event has occurred.

They happen at any time, and are therefore asynchronous to program flow.

They require special handling by the processor, and are ultimately handled by a corresponding interrupt service routine (ISR).

They need to be handled quickly: take too much time servicing an interrupt, and you may miss another interrupt.
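One common way to keep an ISR quick, sketched below (the interrupt attribute and the UART event are placeholders; the exact ISR syntax is compiler- and device-specific), is to set a flag in the ISR and do the real work back in the infinite loop:

#include <stdint.h>

static volatile uint8_t rx_ready;     /* shared between ISR and main loop */

/* Keep the ISR short: record that the event happened and return. */
__attribute__((interrupt))
void uart_rx_isr(void)
{
    /* read/clear the hardware status register here ... */
    rx_ready = 1;
}

void main_loop(void)
{
    for (;;) {
        if (rx_ready) {               /* deferred work runs at normal priority */
            rx_ready = 0;
            /* ... process the received data ... */
        }
    }
}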
DESIGNING AN EMBEDDED SYSTEM
Proposal
Definition
Technology Selection
Budgeting (Time, Human, Financial)
Material and Development tool purchase
Schematic Capture & PCB board design
Firmware Development & Debug
Hardware Manufacturing
Testing: In-Situ, Environmental
Certification: CE
Firmware Release
Documentation
Ongoing Support
SYSTEM DESIGN CYCLE
The purpose of the design cycle is to remind and guide the developer to work within a framework proven to keep you on track and on budget.

There are numerous design cycle methodologies, of which the following are most popular:
The Spaghetti Model
The Waterfall Model
Top-down versus Bottom-up
Spiral Model
GANTT charts
SYSTEM DESIGN CYCLE:
THE WATERFALL MODEL

Waterfall is a software development model in which development is seen as flowing steadily through the phases of:
Requirements Analysis
Design
Implementation
Testing
Integration
Maintenance

Advantages are good progress tracking due to clear milestones.

Disadvantages are its inflexibility, making it difficult to respond to changing customer needs / market conditions.
SYSTEM DESIGN CYCLE:
TOP-DOWN VERSUS BOTTOM-UP

The top-down model analyses the overall functionality of a system, without going into details. Each successive iteration of this process then designs individual pieces of the system in greater detail.

The bottom-up model, in contrast, defines the individual pieces of the system in great detail. These individual components are then interfaced together to form a larger system.
SYSTEM DESIGN CYCLE:
THE SPIRAL MODEL

Modern software design practices such as the spiral model employ both top-down and bottom-up techniques. It is widely used in the industry today.

For a GUI application, for example, the spiral model would contend that:
You first start off with a rough sketch of the user interface (simple buttons and icons)
Make the underlying application work
Only then start adding features, and in a final stage spruce up the buttons and icons
SYSTEM DESIGN CYCLE:
GANTT CHART

A GANTT chart is simply a type of bar chart which shows the interrelationships of how projects and schedules progress over time.
DESIGN METRICS

Metrics to consider in designing an embedded system:
Unit cost: can be a combination of the cost to manufacture hardware plus licensing fees
NRE costs: non-recurring engineering costs
Size: the physical dimensions of the system
Power consumption: battery, power supply, wattage, current consumption, etc.
Performance: the throughput of the system, its response time, and computation power
Safety, fault-tolerance, field-upgradeability, ruggedness, maintenance, ease of use, ease of installation, etc.
SCHEMATIC CAPTURE

PCB LAYOUT

PCB BOARD
ANY QUESTIONS?

shindesir.pvp@gmail.com