SYSTEM ON CHIP


Mr. A. B. Shinde
Lecturer,
Department of Electronics Engg. P.V.P.I.T. Budhgaon.


SOC BASICS

System-on-a-chip or system on chip (SoC or SOC) refers to integrating all components of a computer or other electronic system into a single integrated circuit (chip). It may contain digital, analog, mixed-signal, and often radio-frequency functions – all on one chip.
Microcontrollers typically have under 100 KB of RAM (often just a few kilobytes) and often really are single-chip systems, whereas the term SoC is typically used for more powerful processors, capable of running software such as Windows or Linux, which need external memory chips (flash, RAM) to be useful, and which are used with various external peripherals.


SOC BASICS CONT…

Many interesting systems are too complex to fit on just one chip built with a process optimized for just one of the system's tasks.
When it is not feasible to construct an SoC for a particular application, an alternative is a system in package (SiP) comprising a number of chips in a single package. In large volumes, SoC is believed to be more cost effective than SiP, because its packaging is simpler.
The SoC chip includes processors and numerous digital peripherals, and comes in a ball grid array (BGA) package with lower and upper connections.


A TYPICAL SOC CONSISTS OF:

One microcontroller, microprocessor or DSP core(s). Some SoCs – called multiprocessor System-on-Chip (MPSoC) – include more than one processor core.
Memory blocks including a selection of ROM, RAM, EEPROM and Flash.
Timing sources including oscillators and phase-locked loops.

Peripherals including counter-timers, real-time timers and power-on reset generators.
External interfaces including industry standards such as USB, FireWire, Ethernet, USART, SPI.


A TYPICAL SOC CONSISTS OF: CONT…

Analog interfaces including ADCs and DACs.
Voltage regulators and power management circuits.

These blocks are connected by either a proprietary or an industry-standard bus such as the AMBA bus from ARM. DMA controllers route data directly between external interfaces and memory, bypassing the processor core and thereby increasing the data throughput of the SoC.


DESIGN FLOW OF SOC:

An SoC consists of both the hardware described above, and the software that controls the microcontroller, microprocessor or DSP cores, peripherals and interfaces. The design flow for an SoC aims to develop this hardware and software in parallel.


DESIGN FLOW OF SOC: CONT…

Most SoCs are developed from pre-qualified hardware blocks for the hardware elements described above, together with the software drivers that control their operation.
A key step in the design flow is emulation: the hardware is mapped onto an emulation platform based on an FPGA that mimics the behavior of the SoC, and the software modules are loaded into the memory of the emulation platform. Once programmed, the emulation platform enables the hardware and software of the SoC to be tested and debugged at close to its full operational speed. After emulation, the hardware of the SoC follows the place-and-route phase of integrated circuit design before it is fabricated.

Chips are verified for logical correctness before being sent to the foundry. This process is called functional verification. Verilog and VHDL are typical hardware description languages used for verification.


SIMD
SINGLE INSTRUCTION MULTIPLE DATA

SIMD (SINGLE INSTRUCTION, MULTIPLE DATA)

In computing, SIMD (Single Instruction, Multiple Data; "vector instructions") is a technique employed to achieve data level parallelism.

Supercomputers popular in the 1980s, such as the Cray X-MP, were called "vector processors".

The first era of SIMD machines was characterized by supercomputers such as the Thinking Machines CM-1 and CM-2. These machines had many limited-functionality processors that would work in parallel. For example, each of the 64,000 processors in a Thinking Machines CM-2 would execute the same instruction at the same time, so that you could do 64,000 multiplies on 64,000 pairs of numbers at a time.


SIMD (SINGLE INSTRUCTION, MULTIPLE DATA)

A SIMD machine exploits a property of the data stream called data parallelism.


SIMD (SINGLE INSTRUCTION, MULTIPLE DATA)

A very important class of architectures in the history of computation, single-instruction/multiple-data machines are capable of applying the exact same instruction stream to multiple streams of data simultaneously. This type of architecture is perfectly suited to achieving very high processing rates, as the data can be split into many different independent pieces, and the multiple processing units can all operate on them at the same time.


SIMD TYPES

Synchronous (lock-step): These types of systems are generally considered to be synchronous, meaning that they are built in such a way as to guarantee that all instruction units will receive the same instruction at the same time, and thus all will potentially be able to execute the same operation simultaneously.

Deterministic SIMD architectures: These are deterministic because, at any one point in time, there is only one instruction being executed, even though multiple units may be executing it. So, every time the same program is run on the same data, using the same number of execution units, exactly the same result is guaranteed at every step in the process.

Well-suited to instruction/operation level parallelism: The "single" in single-instruction doesn't mean that there's only one instruction unit, as it does in SISD, but rather that there's only one instruction stream, and this instruction stream is executed by multiple processing units on different pieces of data, all at the same time, thus achieving parallelism.


SIMD (ADVANTAGES)

A typical application is one where the same value is being added to (or subtracted from) a large number of data points, a common operation in many multimedia applications.

One example would be changing the brightness of an image. To change the brightness, the R, G and B values are read from memory, a value is added to (or subtracted from) them, and the resulting values are written back out to memory.

The data is understood to be in blocks, and a number of values can be loaded all at once.

Instead of a series of instructions saying "get this pixel, now get the next pixel", a SIMD processor will have a single instruction that effectively says "get lots of pixels". This can take much less time than "getting" each pixel individually, as with a traditional CPU design.

If the SIMD system works by loading up eight data points at once, the add operation being applied to the data will happen to all eight values at the same time.
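As a concrete illustration (added here, not part of the original slides), the C sketch below contrasts the two approaches using x86 SSE2 intrinsics; the instruction set choice and the function names are assumptions, and the SIMD version assumes the buffer length is a multiple of 16.

```c
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stdint.h>

/* Scalar version: one colour component per loop iteration. */
void brighten_scalar(uint8_t *px, int n, uint8_t delta)
{
    for (int i = 0; i < n; i++) {
        int v = px[i] + delta;
        px[i] = (v > 255) ? 255 : (uint8_t)v;         /* clamp at white */
    }
}

/* SIMD version: 16 components per instruction (n assumed to be a multiple of 16). */
void brighten_simd(uint8_t *px, int n, uint8_t delta)
{
    __m128i d = _mm_set1_epi8((char)delta);
    for (int i = 0; i < n; i += 16) {
        __m128i v = _mm_loadu_si128((__m128i *)(px + i));
        v = _mm_adds_epu8(v, d);                      /* saturating add on 16 bytes at once */
        _mm_storeu_si128((__m128i *)(px + i), v);
    }
}
```

The saturating add plays the role of the clamp in the scalar loop; the "eight data points at once" of the text becomes sixteen bytes here simply because an SSE register is 128 bits wide.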

SIMD (DISADVANTAGES)
 

Not all algorithms can be vectorized. Implementing an algorithm with SIMD instructions usually requires human labor; most compilers don't generate SIMD instructions from a typical C program, for instance. Programming with particular SIMD instruction sets can involve numerous low-level challenges.

It has restrictions on data alignment. Gathering data into SIMD registers and scattering it to the correct destination locations is tricky and can be inefficient. Specific instructions like rotations or three-operand addition aren't in some SIMD instruction sets.


SIMD HARDWARE

Small-scale (64- or 128-bit) SIMD became popular on general-purpose CPUs in the early 1990s, continuing through 1997 and later with the Motion Video Instructions (MVI) for Alpha. SIMD instructions can now be found, to one degree or another, on most CPUs.

The Cell processor, co-developed by IBM, Sony and Toshiba, has SPUs whose instruction set is heavily SIMD-based. NXP, founded by Philips, developed several SIMD processors named Xetal.


SIMD SOFTWARE

 

SIMD instructions are widely used to process 3D graphics, although modern graphics cards with embedded SIMD have largely taken over this task from the CPU.

Adoption of SIMD systems in personal computer software was at first slow, due to a number of problems; SIMD on x86 had a slow start. Apple Computer had somewhat more success, even though it entered the SIMD market later than the rest, partly because many of the systems that would benefit from SIMD were supplied by Apple itself (Apple was the dominant purchaser of PowerPC chips from IBM).

SIMD within a register, or SWAR, is a range of techniques and tricks used for performing SIMD in general-purpose registers on hardware that doesn't provide any direct support for SIMD instructions. This can be used to exploit parallelism in certain algorithms even on such hardware.
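The following C sketch (an illustration added here, not from the slides) shows the flavour of SWAR: it adds four packed 8-bit values held in an ordinary 32-bit register, masking off the top bit of every byte so that a carry can never spill into the neighbouring byte.

```c
#include <stdint.h>
#include <stdio.h>

static uint32_t add_bytes_swar(uint32_t x, uint32_t y)
{
    const uint32_t H = 0x80808080u;          /* the top bit of each of the four bytes     */
    uint32_t low = (x & ~H) + (y & ~H);      /* add the low 7 bits of every byte at once  */
    return low ^ ((x ^ y) & H);              /* fold the top bits back in, without carry  */
}

int main(void)
{
    /* bytes 01 02 03 FF  +  01 01 01 02  ->  02 03 04 01 (each byte wraps mod 256) */
    printf("%08x\n", add_bytes_swar(0x010203FFu, 0x01010102u));   /* prints 02030401 */
    return 0;
}
```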


SISD
SINGLE INSTRUCTION SINGLE DATA

SISD (SINGLE INSTRUCTION, SINGLE DATA )

This is the oldest style of computer architecture, and still one of the most important: all personal computers fit within this category. Single instruction refers to the fact that there is only one instruction stream being acted on by the CPU during any one clock tick; single data means, analogously, that one and only one data stream is being employed as input during any one clock tick.


SISD (SINGLE INSTRUCTION, SINGLE DATA )

In computing, SISD (Single Instruction, Single Data) is a term referring to a computer architecture in which a single processor, a uniprocessor, executes a single instruction stream, to operate on data stored in a single memory. This corresponds to the von Neumann architecture.
According to Michael J. Flynn, SISD can have concurrent processing characteristics. Instruction fetching and pipelined execution of instructions are common examples found in most modern SISD computers.


CHARACTERISTICS OF SISD

Serial: Instructions are executed one after the other, in lock-step; this type of sequential execution is commonly called serial, as opposed to parallel, in which multiple instructions may be processed simultaneously.

Deterministic: Because each instruction has a unique place in the execution stream, and thus a unique time during which it and it alone is being processed, the entire execution is said to be deterministic, meaning that you (can potentially) know exactly what is happening at all times, and, ideally, you can exactly recreate the process, step by step, at any later time.

Examples:
  

all personal computers, all single-instruction-unit-CPU workstations, mini-computers, and mainframes.


MIMD
MULTIPLE INSTRUCTION MULTIPLE DATA

MIMD (MULTIPLE INSTRUCTION, MULTIPLE DATA )

In computing, MIMD (Multiple Instruction stream, Multiple Data stream) is a technique employed to achieve parallelism. Machines using MIMD have a number of processors that function asynchronously and independently. At any time, different processors may be executing different instructions on different pieces of data. MIMD architectures may be used in a number of application areas such as computer-aided design/computer-aided manufacturing, simulation, modeling, and as communication switches.


MIMD (MULTIPLE INSTRUCTION, MULTIPLE DATA )

MIMD machines can be of either the shared-memory or the distributed-memory category.

Shared-memory machines may be of the bus-based, extended, or hierarchical type. Distributed-memory machines may have hypercube or mesh interconnection schemes.


MIMD (MULTIPLE INSTRUCTION, MULTIPLE DATA )

Many believe that the next major advances in computational capabilities will be enabled by this approach to parallelism, which provides for multiple instruction streams simultaneously applied to multiple data streams.


MIMD (MULTIPLE INSTRUCTION, MULTIPLE DATA )

The most general of all of the major categories, a MIMD machine is capable of being programmed to operate as if it were in fact any of the four categories.

Synchronous or asynchronous: MIMD instruction streams can potentially be executed either synchronously or asynchronously, i.e., either in tightly controlled lock-step or in a more loosely bound "do your own thing" mode.

Deterministic or non-deterministic: MIMD systems are potentially capable of deterministic behavior, that is, of reproducing the exact same set of processing steps every time a program is run on the same data.

Well-suited to block, loop, or subroutine level parallelism: The more code each processor in an MIMD assembly is given domain over, the more efficiently the entire system will operate, in general.

Multiple Instruction or Single Program: MIMD-style systems are capable of running in true "multiple-instruction" mode, with every processor doing something different, or every processor can be given the same code; this latter case is called SPMD, "Single Program Multiple Data", and is a generalization of SIMD-style parallelism.


MIMD : SHARED MEMORY MODEL

The processors are all connected to a "globally available" memory, via either software or hardware means. The operating system usually maintains memory coherence.

Bus-based:
MIMD machines with shared memory have processors which share a common, central memory. Here all processors are attached to a bus which connects them to memory. This setup is called bus-based shared memory; it works well only up to the point where there is too much contention on the bus.

Hierarchical:
MIMD machines with hierarchical shared memory use a hierarchy of buses to give processors access to each other's memory.  Processors on different boards may communicate through internodal buses.  Buses support communication between boards.  With this type of architecture, the machine may support over a thousand processors.


MIMD : DISTRIBUTED MEMORY MODEL


In distributed-memory MIMD machines, each processor has its own individual memory. A processor has no direct knowledge of any other processor's memory. For data to be shared, it must be passed from one processor to another as a message. Since there is no shared memory, contention is not as great a problem with these machines. It is not economically feasible to connect a large number of processors directly to each other, so a way to avoid this multitude of direct connections is to connect each processor to just a few others. The amount of time required for processors to perform simple message routing can be substantial, and systems were designed to reduce this time loss; hypercube and mesh are two of the popular interconnection schemes.


MIMD : DISTRIBUTED MEMORY MODEL
 

Interconnection schemes:

Hypercube interconnection network:
In an MIMD distributed-memory machine with a hypercube interconnection network containing four processors, a processor and a memory module are placed at each vertex of a square. The diameter of the system is the minimum number of steps it takes for one processor to send a message to the processor that is farthest away. So, for example, in a hypercube system with eight processors, where each processor and memory module is placed at a vertex of a cube, the diameter is 3. In general, for a system that contains 2^N processors, with each processor directly connected to N other processors, the diameter of the system is N.
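As a small illustration (added here), hypercube nodes are conventionally numbered so that two nodes are neighbours exactly when their binary labels differ in one bit; the number of hops between two nodes is then the Hamming distance between their labels, and the diameter of a 2^N-node cube is N.

```c
#include <stdio.h>

/* Hop count between two hypercube nodes = Hamming distance of their labels. */
static int hypercube_hops(unsigned a, unsigned b)
{
    unsigned diff = a ^ b;       /* bits in which the two labels differ */
    int hops = 0;
    while (diff) {
        hops += diff & 1u;
        diff >>= 1;
    }
    return hops;
}

int main(void)
{
    /* 8-processor cube (N = 3): nodes 0 (000) and 7 (111) are farthest apart. */
    printf("diameter example: %d hops\n", hypercube_hops(0u, 7u));   /* prints 3 */
    return 0;
}
```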

Mesh interconnection network:
In an MIMD distributed-memory machine with a mesh interconnection network, processors are placed in a two-dimensional grid. Each processor is connected to its four immediate neighbors. Wrap-around connections may be provided at the edges of the mesh. One advantage of the mesh interconnection network over the hypercube is that the mesh system need not be configured in powers of two.

MISD
MULTIPLE INSTRUCTIONS SINGLE DATA

MISD (MULTIPLE INSTRUCTIONS, SINGLE DATA)

 

In computing, MISD (Multiple Instruction, Single Data) is a type of parallel computing architecture where many functional units perform different operations on the same data. Pipeline architectures belong to this type. Fault-tolerant computers executing the same instructions redundantly in order to detect and mask errors, in a manner known as task replication, may be considered to belong to this type. Not many instances of this architecture exist, as MIMD and SIMD are often more appropriate for common data parallel techniques.


MISD (MULTIPLE INSTRUCTIONS, SINGLE DATA)

"I thought of another example of a MISD process that is carried out routinely at [the] United Nations. When a delegate speaks in a language of his/her choice, his speech is simultaneously translated into a number of other languages for the benefit of other delegates present. Thus the delegate's speech (a single data) is being processed by a number of translators (processors) yielding different results."


MISD (MULTIPLE INSTRUCTIONS, SINGLE DATA)

This category was included more for the sake of completeness than to identify a working group of actual computer systems.

MISD Examples:
 

Multiple frequency filters operating on a single signal stream.
Multiple cryptography algorithms attempting to crack a single coded message.

Both of these are examples of this type of processing where multiple, independent instruction streams are applied simultaneously to a single data stream.



PIPELINING

In computing, a pipeline is a set of data processing elements connected in series, so that the output of one element is the input of the next one.  The elements of a pipeline are often executed in parallel or in time-sliced fashion; in that case, some amount of buffer storage is often inserted between elements.


PIPELINING (CONCEPT AND MOTIVATION)

Consider the assembly of a car: assume that certain steps in the assembly line are to install the engine, install the hood, and install the wheels. A car on the assembly line can have only one of the three steps done at once. After the car has its engine installed, it moves on to having its hood installed, leaving the engine installation facilities available for the next car. The first car then moves on to wheel installation, the second car to hood installation, and a third car begins to have its engine installed.
If engine installation takes 20 minutes, hood installation takes 5 minutes, and wheel installation takes 10 minutes, then finishing all three cars when only one car can be worked on at a time would take 105 minutes (3 × 35). Using the assembly line, the first car is finished after 35 minutes and each subsequent car follows 20 minutes (the slowest stage) behind the one before it, so the total time to complete all three is 35 + 20 + 20 = 75 minutes. At this point, additional cars will come off the assembly line at 20-minute intervals.


PIPELINING (COSTS, DRAWBACKS, AND BENEFITS)

As the assembly line example shows, pipelining doesn't decrease the time for a single datum to be processed; it only increases the throughput of the system when processing a stream of data. Deep pipelining increases latency - the time required for a signal to propagate through a full pipe.
A pipelined system typically requires more resources (circuit elements, processing units, computer memory, etc.) than one that executes one batch at a time, because its stages cannot reuse the resources of a previous stage. Moreover, pipelining may increase the time it takes for an instruction to finish.


PIPELINING (IMPLEMENTATIONS)

Buffered, synchronous pipelines: Conventional microprocessors are synchronous circuits that use buffered, synchronous pipelines. In these pipelines, "pipeline registers" are inserted in between pipeline stages, and are clocked synchronously.

Buffered, asynchronous pipelines: Asynchronous pipelines are used in asynchronous circuits, and have their pipeline registers clocked asynchronously. Generally speaking, they use a request/acknowledge system, wherein each stage can detect when it is "finished". When a stage is finished and the next stage has sent it a "request" signal, the stage sends an "acknowledge" signal to the next stage, and a "request" signal to the previous stage. When a stage receives an "acknowledge" signal, it clocks its input registers, thus reading in the data from the previous stage.

Unbuffered pipelines: Unbuffered pipelines, called "wave pipelines", do not have registers in between pipeline stages. Instead, the delays in the pipeline are "balanced" so that, for each stage, the difference between the first stabilized output data and the last is minimized.

PIPELINING (COMPUTER-RELATED)

Instruction pipelines, such as the classic RISC pipeline, which are used in processors to allow overlapping execution of multiple instructions with the same circuitry. The circuitry is usually divided up into stages, including instruction decoding, arithmetic, and register fetching stages, wherein each stage processes one instruction at a time.

Graphics pipelines, found in most graphics cards, which consist of multiple arithmetic units, or complete CPUs, that implement the various stages of common rendering operations.

Software pipelines, consisting of multiple processes arranged so that the output stream of one process is automatically and promptly fed as the input stream of the next one. Unix pipelines are the classical implementation of this concept.


INSTRUCTION PIPELINE

An instruction pipeline is a technique used in the design of computers and other digital electronic devices to increase their instruction throughput (the number of instructions that can be executed in a unit of time). The fundamental idea is to split the processing of a computer instruction into a series of independent steps, with storage at the end of each step. This allows the computer's control circuitry to issue instructions at the processing rate of the slowest step, which is much faster than the time needed to perform all steps at once. For example, the classic RISC pipeline is broken into five stages with a set of flip-flops between each stage:

Instruction fetch
Instruction decode and register fetch
Execute
Memory access
Register write back


PIPELINING (ADVANTAGES AND DISADVANTAGES)

Pipelining does not help in all cases. An instruction pipeline is said to be fully pipelined if it can accept a new instruction every clock cycle. A pipeline that is not fully pipelined has wait cycles that delay the progress of the pipeline. Advantages of Pipelining:
The cycle time of the processor is reduced, thus increasing the instruction issue rate in most cases. Some combinational circuits such as adders or multipliers can be made faster by adding more circuitry; if pipelining is used instead, it can save circuitry.

Disadvantages of Pipelining:
A non-pipelined processor executes only a single instruction at a time. This prevents branch delays and problems with serial instructions being executed concurrently. Consequently the design is simpler and cheaper to manufacture.  The instruction latency in a non-pipelined processor is slightly lower than in a pipelined equivalent. This is due to the fact that extra flip flops must be added to the data path of a pipelined processor.  A non-pipelined processor will have a stable instruction bandwidth. The performance of a pipelined processor is much harder to predict and may vary more widely between different programs.



PARALLEL COMPUTING

Parallel computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently ("in parallel"). There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism.

Parallelism has been employed for many years, mainly in high-performance computing, but interest in it has grown lately due to the physical constraints preventing frequency scaling. As power consumption by computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multicore processors.


PARALLEL COMPUTING

Traditionally, computer software has been written for serial computation. To solve a problem, an algorithm is constructed and implemented as a serial stream of instructions. These instructions are executed on a central processing unit on one computer. Only one instruction may execute at a time—after that instruction is finished, the next is executed.

Parallel computing, on the other hand, uses multiple processing elements simultaneously to solve a problem.
This is accomplished by breaking the problem into independent parts so that each processing element can execute its part of the algorithm simultaneously with the others. The processing elements can be diverse and include resources such as a single computer with multiple processors, several networked computers, specialized hardware, or any combination of the above.


TYPES OF PARALLELISM

Bit-level parallelism:

From the advent of VLSI in the 1970s until about 1986, speed-up in computer architecture was driven by doubling computer word size— the amount of information the processor can manipulate per cycle. Increasing the word size reduces the number of instructions the processor must execute to perform an operation on variables whose sizes are greater than the length of the word.

Instruction-level parallelism:

A computer program is, in essence, a stream of instructions executed by a processor. These instructions can be re-ordered and combined into groups which are then executed in parallel without changing the result of the program. This is known as instruction-level parallelism.

Data parallelism:

Data parallelism is parallelism inherent in program loops, which focuses on distributing the data across different computing nodes to be processed in parallel.

Task parallelism:

Task parallelism is the characteristic of a parallel program that "entirely different calculations can be performed on either the same or different sets of data". This contrasts with data parallelism, where the same calculation is performed on the same or different sets of data.

BIT-LEVEL PARALLELISM
 

Bit-level parallelism is a form of parallel computing based on increasing processor word size. Increasing the word size reduces the number of instructions the processor must execute in order to perform an operation on variables whose sizes are greater than the length of the word.
For example, consider a case where an 8-bit processor must add two 16-bit integers. The processor must first add the 8 lower-order bits from each integer, then add the 8 higher-order bits (plus the carry), requiring two instructions to complete a single operation. A 16-bit processor would be able to complete the operation with a single instruction.
Historically, 4-bit microprocessors were replaced with 8-bit, then 16-bit, then 32-bit microprocessors. This trend generally came to an end with the introduction of 32-bit processors, which were a standard in general-purpose computing for two decades. Only recently, with the advent of x86-64 architectures, have 64-bit processors become commonplace.
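The two-instruction sequence can be sketched in C as follows (illustrative only; `add16_on_8bit` is a made-up name that mimics the ADD/ADC pair an 8-bit CPU would execute):

```c
#include <stdint.h>
#include <stdio.h>

/* What an 8-bit CPU must do to add two 16-bit integers:
 * add the low bytes first, then the high bytes plus the carry. */
static uint16_t add16_on_8bit(uint16_t a, uint16_t b)
{
    uint8_t lo    = (uint8_t)(a & 0xFF) + (uint8_t)(b & 0xFF);       /* 1st instruction (ADD)      */
    uint8_t carry = lo < (uint8_t)(a & 0xFF);                        /* carry out of the low byte  */
    uint8_t hi    = (uint8_t)(a >> 8) + (uint8_t)(b >> 8) + carry;   /* 2nd instruction (ADC)      */
    return (uint16_t)(((uint16_t)hi << 8) | lo);
}

int main(void)
{
    printf("%u\n", add16_on_8bit(300u, 500u));   /* prints 800 */
    return 0;
}
```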


INSTRUCTION LEVEL PARALLELISM

Instruction-level parallelism (ILP) is a measure of how many of the operations in a computer program can be performed simultaneously. Consider the following program:

1. e = a + b
2. f = c + d
3. g = e * f

Here, operation 3 depends on the results of operations 1 and 2, so it cannot be calculated until both of them are completed. However, operations 1 and 2 do not depend on any other operation, so they can be calculated simultaneously. If we assume that each operation can be completed in one unit of time, then these three instructions can be completed in a total of two units of time, giving an ILP of 3/2.
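Written as C code (purely illustrative), the same three operations and their dependencies look like this:

```c
/* A superscalar processor (or an optimizing compiler) may issue (1) and (2)
 * in the same cycle, since they are independent; (3) must wait for both. */
int compute(int a, int b, int c, int d)
{
    int e = a + b;    /* (1) independent                 */
    int f = c + d;    /* (2) independent of (1)          */
    int g = e * f;    /* (3) depends on (1) and (2)      */
    return g;
}
```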


INSTRUCTION LEVEL PARALLELISM: CONT…

A goal of compiler and processor designers is to identify and take advantage of as much ILP as possible. Ordinary programs are typically written under a sequential execution model where instructions execute one after the other and in the order specified by the programmer. ILP allows the compiler and the processor to overlap the execution of multiple instructions or even to change the order in which instructions are executed.
How much ILP exists in programs is very application-specific. In certain fields, such as graphics and scientific computing, the amount can be very large. However, workloads such as cryptography exhibit much less parallelism.


DATA PARALLELISM
Data parallelism (also known as loop-level parallelism) is a form of parallelization of computing across multiple processors in parallel computing environments.  Data parallelism focuses on distributing the data across different parallel computing nodes.
 

In a multiprocessor system executing a single set of instructions (SIMD), data parallelism is achieved when each processor performs the same task on different pieces of distributed data. In some situations, a single execution thread controls operations on all pieces of data.


DATA PARALLELISM


For instance, consider a 2-processor system (CPUs A and B) in a parallel environment, and suppose we wish to do a task on some data 'd'. It is possible to tell CPU A to do that task on one part of 'd' and CPU B on another part simultaneously, thereby reducing the duration of the execution. The data can be assigned using conditional statements.
As a specific example, consider adding two matrices. In a data-parallel implementation, CPU A could add all elements from the top half of the matrices, while CPU B could add all elements from the bottom half. Since the two processors work in parallel, the job of performing matrix addition would take half the time of performing the same operation serially using one CPU alone.
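A minimal POSIX-threads sketch of this matrix-addition split (added for illustration; the slides do not prescribe any particular API, and the names and matrix size are invented):

```c
#include <pthread.h>
#include <stdio.h>

#define N 4                                    /* small matrix for the sketch */

static int A[N][N], B[N][N], C[N][N];

struct slice { int first_row, last_row; };     /* the rows handled by one "CPU" */

/* Both threads run the SAME function on DIFFERENT rows: data parallelism. */
static void *add_rows(void *arg)
{
    const struct slice *s = arg;
    for (int i = s->first_row; i < s->last_row; i++)
        for (int j = 0; j < N; j++)
            C[i][j] = A[i][j] + B[i][j];
    return NULL;
}

int main(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) { A[i][j] = 1; B[i][j] = 2; }

    struct slice top = {0, N / 2}, bottom = {N / 2, N};
    pthread_t cpu_a, cpu_b;

    pthread_create(&cpu_a, NULL, add_rows, &top);      /* "CPU A": top half    */
    pthread_create(&cpu_b, NULL, add_rows, &bottom);   /* "CPU B": bottom half */
    pthread_join(cpu_a, NULL);
    pthread_join(cpu_b, NULL);

    printf("C[0][0] = %d\n", C[0][0]);                 /* prints 3 */
    return 0;
}
```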


TASK PARALLELISM

Task parallelism (also known as function parallelism and control parallelism) is a form of parallelization of computer code across multiple processors in parallel computing environments. Task parallelism focuses on distributing execution processes (threads) across different parallel computing nodes. In a multiprocessor system, task parallelism is achieved when each processor executes a different thread (or process) on the same or different data. The threads may execute the same or different code. In the general case, different execution threads communicate with one another as they work. Communication takes place usually to pass data from one thread to the next as part of a workflow.


TASK PARALLELISM: CONT…
As a simple example, if we are running code on a 2-processor system (CPUs "a" and "b") in a parallel environment and we wish to do tasks "A" and "B", it is possible to tell CPU "a" to do task "A" and CPU "b" to do task "B" simultaneously, thereby reducing the runtime of the execution. The tasks can be assigned using conditional statements.
 

Task parallelism emphasizes the distributed (parallelized) nature of the processing (i.e. threads), as opposed to the data (data parallelism).
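For contrast with the data-parallel sketch earlier, here is a minimal POSIX-threads illustration (added here) in which the two threads run different code; the task bodies are placeholders:

```c
#include <pthread.h>
#include <stdio.h>

/* Each thread runs DIFFERENT code: task parallelism. */
static void *task_A(void *arg) { (void)arg; puts("CPU \"a\" running task A"); return NULL; }
static void *task_B(void *arg) { (void)arg; puts("CPU \"b\" running task B"); return NULL; }

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, task_A, NULL);
    pthread_create(&b, NULL, task_B, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}
```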


SYSTEM DESIGN ISSUES IN SOCS
 

The design of an SoC has similar goals to an embedded design. The designed system will be used in a well-specified environment and has to fulfill strict requirements. Some requirements are clearly defined by the application, like the functional requirements of an algorithm, e.g. the decoding of an MPEG-1 Layer 3 data stream, which covers certain quality restrictions. The environment poses other requirements, e.g. minimizing the cost, footprint, or power consumption.
However, due to the flexibility of an SoC design, achieving the set goals involves analyzing a multi-dimensional design space. The degrees of freedom stem from the processing element types and their characteristics, their allocation, the mapping of functional elements onto the processing elements, their interconnection with buses, and their scheduling.
An SoC design has to deal with a wide range of abstraction levels: it starts with a functional description at the system level, where the major function blocks are defined and no timing information is given.


ABSTRACTION LEVELS IN SOC DESIGN

The goal of the SoC design paradigm is to manage the immense number of design decisions involved in hardware-software co-design. This is only possible with a well-defined flow of design steps.



EMBEDDED SYSTEM

[Figure: the internals of a Netgear ADSL modem/router, a modern example of an embedded system. Labelled parts include a microprocessor (4), RAM (6), and flash memory (7).]

An embedded system is a special-purpose computer system designed to perform one or a few dedicated functions, often with real-time computing constraints. It is usually embedded as part of a complete device including hardware and mechanical parts. In contrast, a general-purpose computer, such as a personal computer, can do many different tasks depending on programming. Embedded systems control many of the common devices in use today.

EMBEDDED SYSTEM: CONT…

Physically, embedded systems range from portable devices such as digital watches and MP4 players, to large stationary installations like traffic lights, factory controllers, or the systems controlling nuclear power plants. Complexity varies from low, with a single microcontroller chip, to very high with multiple units, peripherals and networks mounted inside a large chassis or enclosure. In general, "embedded system" is not an exactly defined term, as many systems have some element of programmability. For example, handheld computers share some elements with embedded systems — such as the operating systems and microprocessors which power them — but are not truly embedded systems, because they allow different applications to be loaded and peripherals to be connected.


INTRO TO EMBEDDED SYSTEM DESIGN

MICROCONTROLLERS

A Microcontroller is essentially a small and self-sufficient computer on a chip, used to control devices.

  

It has all the memory and I/O it needs on board.
It is not expandable – no external bus interface.
Characteristics of a Microcontroller:
• Low cost, on the order of $1
• Low speed, on the order of 10 KHz – 20 MHz
• Low power, extremely low power in sleep mode
• Small architecture, usually an 8-bit architecture
• Small memory size, but usually enough for the type of application it is intended for. Onboard Flash.
• Limited I/O, but again, enough for the type of application it is intended for

MICROPROCESSORS

A Microprocessor is fundamentally a collection of on/off switches laid out over silicon in order to perform computations.

Characteristics of a Microprocessor:
• High cost, anywhere between $20 - $200 or more!
• High speed, on the order of 100 MHz – 4 GHz
• High power consumption, lots of heat
• Large architecture, 32-bit, and recently 64-bit architecture
• Large memory size, onboard flash and cache, with an external bus interface for greater memory usage
• Lots of I/O and peripherals, though Microprocessors tend to be short on general-purpose I/O


HARVARD ARCHITECTURE

Harvard Architecture refers to a memory structure where the processor is connected to two different memory banks via two sets of buses. This provides the processor with two distinct data paths, one for instructions and one for data. Through this scheme, the CPU can read both an instruction and data from the respective memory banks at the same time. This inherent independence increases the throughput of the machine by enabling it to always pre-fetch the next instruction. The cost of such a system is complexity in hardware. It is commonly used in DSPs.


VON-NEUMANN MACHINE

A Von Neumann machine, in contrast to the Harvard Architecture, provides one data path (bus) for both instructions and data. As a result, the CPU can either be fetching an instruction from memory or reading/writing data to it. Besides lower hardware complexity, it allows the use of a single, sequential memory.

Today's processing speeds vastly outpace memory access times, so a very fast but small amount of memory (cache) is employed local to the processor. Modern processors employ a Harvard architecture to read from separate instruction and data caches, while at the same time using a Von Neumann architecture to access external memory.


LITTLE VS. BIG ENDIAN
Although numbers are always displayed in the same way, they are not stored in the same way in memory.
Big-Endian machines store the most significant byte of data in the lowest memory address.
Little-Endian machines, on the other hand, store the least significant byte of data in the lowest memory address.


LITTLE VS. BIG ENDIAN: CONT…

The Intel family of microprocessors and processors from Digital Equipment Corporation use Little-Endian mode, whereas architectures from Sun, IBM, and Motorola are Big-Endian.

Architectures such as PowerPC, MIPS, and Intel's IA-64 are bi-endian, supporting either mode. Unfortunately both methods are in prevalent use today, and neither method is superior to the other.
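A small, self-contained C program (added for illustration) that inspects how a 32-bit value is laid out in memory and reports the machine's byte order:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    uint32_t value = 0x11223344u;
    uint8_t bytes[4];

    memcpy(bytes, &value, sizeof value);      /* look at the value's memory layout */

    /* Little-Endian machines store 44 33 22 11 (least significant byte first);
     * Big-Endian machines store 11 22 33 44 (most significant byte first).   */
    printf("%02x %02x %02x %02x -> %s-endian\n",
           bytes[0], bytes[1], bytes[2], bytes[3],
           (bytes[0] == 0x44) ? "little" : "big");
    return 0;
}
```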

PROGRAM COUNTER (PC)

The Program Counter is a 16- or 32-bit register which contains the address of the next instruction to be executed.
The PC automatically increments to the next sequential memory location every time an instruction is fetched Branch, jump, and interrupt operations load the Program Counter with an address other than the next sequential location

During reset, the PC is loaded from a pre-defined memory location to signify the starting address of the code


RESET VECTOR

The significance of the reset vector is that it points the processor to the memory address which contains the firmware's first instruction. Without the reset vector, the processor would not know where to begin execution.

Upon reset, the processor loads the Program Counter (PC) with the reset vector value from a pre-defined memory location
On the CPU08 architecture, this is at location $FFFE:$FFFF.
A common mistake which occurs during the debug phase – when the reset vector is not strictly necessary – is that the developer takes it for granted and doesn't program it into the final image. As a result, the processor doesn't start up on the final product.

 


STACK POINTER (SP)

The Stack Pointer (SP), much like the reset vector, is required at boot time for many processors

Some processors, in particular the 8-bit microcontrollers automatically provide the stack pointer by resetting it to a predefined value
On a higher-end processor, the stack pointer is usually read from a non-volatile memory location, much like the reset vector. For example, on a ColdFire microprocessor the first eight bytes of memory must be programmed as follows: 0x00000000: initial Stack Pointer; 0x00000004: Reset Vector (initial Program Counter).
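A minimal C sketch of such a table for a 68K/ColdFire-style part (illustrative only: the symbol names, the `.vectors` section, and the GCC-style attribute are assumptions, and a matching linker script must place the array at address 0x00000000 on a 32-bit target):

```c
#include <stdint.h>

extern void _start(void);        /* reset handler / firmware entry point (assumed name)        */
extern uint32_t __stack_top;     /* top-of-RAM symbol, assumed to come from the linker script  */

__attribute__((section(".vectors"), used))
static void *const vector_table[2] = {
    &__stack_top,                /* offset 0x0: initial stack pointer                          */
    (void *)_start,              /* offset 0x4: reset vector (initial program counter)         */
};
```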


COP WATCHDOG TIMER

The Computer Operating Properly (COP) module is a component of modern processors which provides a mechanism to help software recover from runaway code. The COP, also known as the Watchdog Timer, is a free-running counter that generates a reset if it runs up to a pre-defined value and overflows.

In order to prevent a watchdog reset, the user code must clear the COP counter periodically.
The COP can be disabled through register settings; even though this is not good practice for a final firmware release, it is a prudent strategy through the course of debug.
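A minimal sketch of "feeding" the COP from C (added here for illustration; the register address, macro names, and feed value are invented, and the real ones come from the device's reference manual):

```c
#include <stdint.h>

/* Hypothetical COP/watchdog control register and feed value. */
#define COP_CTL   (*(volatile uint8_t *)0xFFFF0003u)
#define COP_FEED  0x55u

void cop_feed(void)
{
    COP_CTL = COP_FEED;          /* clear the free-running COP counter */
}

/* Typical use: call cop_feed() once per pass of the main loop, always more
 * often than the COP timeout; if runaway code stops calling it, the counter
 * overflows and the part resets. */
```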


THE INFINITE LOOP

Embedded systems, unlike a PC, never "exit" an application. They idle through an infinite loop, waiting for an event to happen in the form of an interrupt or a pre-scheduled task. In order to save power, some processors enter special sleep or wait modes instead of idling through an infinite loop, but they will come out of this mode upon either a timer or an external interrupt.
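A skeletal C main loop of this kind might look as follows (the helper functions are hypothetical placeholders for real board-support code):

```c
#include <stdbool.h>

static volatile bool tick;               /* set by a timer interrupt */

static void system_init(void)          { /* clocks, GPIO, peripherals, COP setup */ }
static void run_scheduled_tasks(void)  { /* the application's periodic work      */ }
static void enter_low_power_mode(void) { /* e.g. a WAIT/STOP/WFI instruction     */ }

int main(void)
{
    system_init();
    for (;;) {                           /* the infinite loop: the firmware never exits */
        if (tick) {
            tick = false;
            run_scheduled_tasks();
        }
        enter_low_power_mode();          /* sleep until a timer or external interrupt */
    }
}
```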


INTERRUPTS

Interrupts are mostly hardware mechanisms which tell the program an event has occurred. They happen at any time, and are therefore asynchronous to program flow. They require special handling by the processor, and are ultimately handled by a corresponding Interrupt Service Routine (ISR).

Interrupts need to be handled quickly: take too much time servicing an interrupt, and you may miss another one.
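A generic C sketch of that rule (the ISR name is invented, and the way the routine actually gets registered in the vector table or marked as an interrupt handler is toolchain-specific):

```c
#include <stdbool.h>

volatile bool adc_done;          /* shared with the main loop: must be volatile */

/* On real hardware this routine would be installed in the vector table and,
 * depending on the toolchain, marked with a vendor-specific interrupt
 * attribute or #pragma. */
void adc_complete_isr(void)
{
    adc_done = true;             /* record the event ...                        */
    /* ... and return quickly; the lengthy processing happens in the main loop,
       so other pending interrupts are not missed while this one is serviced. */
}
```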


DESIGNING AN EMBEDDED SYSTEM




      

Proposal
Definition
Technology Selection
Budgeting (Time, Human, Financial)
Material and Development tool purchase
Schematic Capture & PCB board design
Firmware Development & Debug
Hardware Manufacturing
Testing: In-Situ, Environmental
Certification: CE
Firmware Release
Documentation
Ongoing Support


SYSTEM DESIGN CYCLE

The purpose of the design cycle is to remind and guide the developer to work within a framework proven to keep a project on track and on budget. There are numerous design cycle methodologies, of which the following are the most popular:

The Spaghetti Model
The Waterfall Model
Top-down versus Bottom-up
The Spiral Model
GANTT charts


SYSTEM DESIGN CYCLE:
THE WATERFALL MODEL

Waterfall is a software development model in which development is seen flowing steadily through the phases of

   

Requirement Analysis
Design
Implementation
Testing
Integration
Maintenance


Advantages are good progress tracking due to clear milestones. Disadvantages are its inflexibility, making it difficult to respond to changing customer needs / market conditions.


SYSTEM DESIGN CYCLE:
TOP-DOWN VERSUS BOTTOM-UP

The Top-Down Model analyses the overall functionality of a system, without going into details

Each successive iteration of this process then designs individual pieces of the system in greater detail

The Bottom-Up Model in contrast defines the individual pieces of the system in great detail

These individual components are then interfaced together to form a larger system

SYSTEM DESIGN CYCLE:
THE SPIRAL MODEL

Modern software design practices such as the Spiral Model employ both top-down and bottom-up techniques. It is widely used in the industry today. For a GUI application, for example, the Spiral Model would contend that:

 

You first start off with a rough sketch of the user interface (simple buttons & icons)
Make the underlying application work
Only then start adding features, and in a final stage spruce up the buttons & icons


SYSTEM DESIGN CYCLE:
GANTT CHART

A GANTT chart is simply a type of bar chart which shows the interrelationships of how projects and schedules progress over time.


DESIGN METRICS

Metrics to consider in designing an Embedded System
 

  

Unit Cost: can be a combination of the cost to manufacture the hardware + licensing fees
NRE Costs: Non-Recurring Engineering costs
Size: the physical dimensions of the system
Power Consumption: battery, power supply, wattage, current consumption, etc.
Performance: the throughput of the system, its response time, and computation power
Safety, fault-tolerance, field-upgradeability, ruggedness, maintenance, ease of use, ease of installation, etc.


SCHEMATIC CAPTURE


PCB LAYOUT


PCB BOARD


ANY

shindesir.pvp@gmail.com
