
Embedded Systems: A Contemporary Design Tool
James K. Peckol, Univ. of Washington
ISBN: 978-0-471-72180-2
Chapter 4 – Memories and the Memory Subsystem

E. Sisinni – Digital Systems for Signal Processing

Classifying memory

The term memory is generic; there are many different kinds of
memory, each with its strengths and weaknesses.

RAM - Random Access Memory. As the name suggests, any
location in memory is visible for immediate access rather than
having to sequence through predecessor locations. The times for a
read operation and a write operation are comparable. It may be
organized as bits, bytes, or words.
ROM – Read Only Memory. During normal operation, a ROM can only
be read. Like RAM, any location in memory is visible for immediate
access rather than having to sequence through predecessor
locations. The read operation is orders of magnitude faster than a
write operation. Like the RAM, the ROM may be organized as bits,
bytes, or words.


Classifying memory - RAM


DRAM - Dynamic RAM. A simple memory cell design with bit
storage implemented using a stored charge mechanism. The stored
charge can leak away if it is not repeatedly restored. These devices
are used for larger memory systems. I/O is asynchronous with
respect to any external system clocks.

SRAM – Static RAM. A more complex memory cell design with bit
storage implemented using a latch-type mechanism. The stored
data does not have to be refreshed. I/O is asynchronous with
respect to any external system clocks.

SDRAM – Synchronous DRAM.
SDRAM synchronizes all
addresses, data, and control
signals to the system clock and
allows much higher data transfer
rates than asynchronous transfers.

Classifying memory - ROM

PROM – Programmable ROM. A PROM is typically programmed
using a purposely designed device. The memory can be
programmed only one time.

EPROM – Erasable PROM. Like PROM, a programming device is needed. Erasure,
so that it can be reprogrammed, is done by placing the device under
ultraviolet light for a specified time interval.

EEPROM – Electrically Erasable PROM. Erasure is done electrically
via the programming device.
FLASH - A kind of EEPROM. It can be reprogrammed in situ.



A general memory interface

Memory = Array

For each index that is
accessed, the
corresponding stored value
appears on the output
(Read access).
Conversely, if one provides
an index and an input
value, the data will be
stored at the
corresponding indexed
location (Write access).
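
The array view above can be sketched in C as follows. This is a minimal illustration only: the size `MEM_WORDS` and the function names `mem_read` and `mem_write` are assumptions chosen for the example, not part of the original material.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 16u              /* hypothetical size, for illustration only */

static uint8_t mem[MEM_WORDS];     /* the memory: an array of stored values */

/* Read access: an index goes in, the stored value comes out. */
uint8_t mem_read(uint32_t index)
{
    assert(index < MEM_WORDS);
    return mem[index];
}

/* Write access: an index and an input value go in; the value is stored. */
void mem_write(uint32_t index, uint8_t value)
{
    assert(index < MEM_WORDS);
    mem[index] = value;
}
```

Calling `mem_write(3, 0xA5)` and then `mem_read(3)` returns the value that was stored at index 3.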


A general memory interface


The physical model
requires a bit more
work.
A memory interface
generally requires three
categories of signals:
address, data, and
control.
Address signals are
inputs to the memory,
data can be either an
input or an output, and
the control signals are
generally inputs.


A general memory interface


All of the different memory types require both address and data
signals. They differ in the number and the nature of the necessary
control signals.


Memories: terminology


Access time: the time to access a word in memory.
Cycle time: the time interval from the start of one read or write
operation until the start of the next.


Memories: terminology



Bandwidth - is a measure of the word
transmission rate to and from memory via the
memory bus (≠ memory timings).
Latency - the amount of time required to
access the first of a sequence of words.
Block Size - A block is a logical view placed
on a collection of words in memory.
Block Access Time - gives a measure of the
time to access an entire block from the start
of a read.
Page - A page is a logical view placed on
larger collections of words in memory.
– Pages are generally comprised of blocks; the
size of a page can be given in words or in
blocks.


Memory architecture

Independent of
memory type, the
typical memory chip
appears as:
– The vertical and
horizontal dimensions
are usually very
similar, for an aspect
ratio of unity.
– Multiple words are
stored in each row
and selected
simultaneously
– A column decoder is
added to select the
desired word from a
row.
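
A rough sketch of how one address could be split between the row decoder and the column decoder. The field widths (`ROW_BITS`, `COL_BITS`) and the choice of using the low-order bits for the column are assumptions made for this example; real devices differ.

```c
#include <assert.h>
#include <stdint.h>

#define COL_BITS 2u   /* hypothetical: 4 words per row */
#define ROW_BITS 4u   /* hypothetical: 16 rows */

typedef struct {
    uint32_t row;     /* selects one row: all its words are read together */
    uint32_t col;     /* selects the desired word from that row */
} cell_select_t;

cell_select_t split_address(uint32_t addr)
{
    assert(addr < (1u << (ROW_BITS + COL_BITS)));
    cell_select_t s;
    s.row = addr >> COL_BITS;                 /* upper bits drive the row decoder */
    s.col = addr & ((1u << COL_BITS) - 1u);   /* lower bits drive the column decoder */
    return s;
}
```

For the 6-bit address 0x2D (binary 101101) this gives row 11 and column 1.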


Memory architecture


Larger memories start to suffer excess delay along bit and word
lines. A third dimension is added to the address space to solve this
problem:


ROM overview


Within a memory word, a transistor is used to connect a bit to ground
(a bit left as a floating connection is read as a logical ‘1’).


ROM – Read operation


A value is read from a ROM by asserting one of the row lines.


Non-volatile Read-Write Memories

Virtually identical in structure to ROMs.
Selective enabling/disabling of transistors is accomplished through
modifications to threshold voltage. This is accomplished through a
floating gate.
– Applying a high voltage (15 to 20 V) between the source and the gate-drain
creates a high electric field and causes avalanche injection to occur.
– Hot electrons traverse the first oxide and get trapped on the floating gate,
leaving it negatively charged.
– This increases the threshold voltage to ~7 V. Applying 5 V to the gate
does not permit the device to turn on.


SRAM - Overview


A high-level interface to the SRAM is very similar to that for the
ROM. There are six transistors per cell: two in each of the two
cross-coupled buffers, plus two access transistors that enable the
cell for read and write.


SRAM – Read & Write

Typical timing for a read and a write operation:
OE_ determines direction: Hi = Write, Lo = Read.
Writes are dangerous! Be careful!
Double signaling: OE_ Hi, RW_ Lo.

[Timing diagram. Write timing: CS_, D (Data In), A (Write Address), with Write Setup Time and Write Hold Time marked. Read timing: D (High Z, then Data Out), A (Read Address), OE_, RW_, with Read Access Time marked.]

DRAM - Overview



In the DRAM there is only one transistor per cell!
The read operation destroys the stored information! The sensed and
amplified value is placed back onto the bit line (a restore or rewrite operation).
Write operation: the capacitor is charged if a logical 1 is to be stored and
discharged if a logical 0 is to be stored.


DRAM – Read & Write

CAS or Column
Address Strobe is a
clock used in
dynamic memories
to control the input
of column
addresses to the
memory.

RAS or Row
Address Strobe is a
clock used in
dynamic memories
to control the input
of row addresses to
the memory


The memory map


The memory map lists the
addresses in memory allocated to
each portion of the application.
Usually ROM holds words that are
not expected to change at
runtime. RAM is the space
available to hold data, among
other things.
If the design is using memory
mapped I/O, then not all of physical
memory will be available for
data or code.
Virtual memory makes it possible
for the required code and data
space to exceed total available
primary memory.

The memory map


We need glue logic for address decoding.

Address range    Device
0x0000-0x0FFF    4K RAM0
0x1000-0x1FFF    4K RAM1
0x2000-0x2FFF    4K RAM2
0x3000-0xFFFF    Vacant
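
The decoding that the glue logic performs for this map can be sketched in C. Since each region is 4K, the top four address bits (A15-A12) are enough to pick the device. The type and function names below are hypothetical and only model the behavior; in hardware this would be a small decoder, not software.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { SEL_RAM0, SEL_RAM1, SEL_RAM2, SEL_NONE } chip_select_t;

/* Models the address decoder: A15..A12 pick the 4K region. */
chip_select_t decode_address(uint16_t addr)
{
    switch (addr >> 12) {
    case 0x0: return SEL_RAM0;     /* 0x0000-0x0FFF */
    case 0x1: return SEL_RAM1;     /* 0x1000-0x1FFF */
    case 0x2: return SEL_RAM2;     /* 0x2000-0x2FFF */
    default:  return SEL_NONE;     /* 0x3000-0xFFFF is vacant */
    }
}

/* A11..A0 go straight to the selected 4K device. */
uint16_t chip_offset(uint16_t addr)
{
    assert(decode_address(addr) != SEL_NONE);
    return addr & 0x0FFFu;
}
```

For example, address 0x1234 selects RAM1 at offset 0x234, while 0x3000 selects nothing.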

Full I/O decoding


Full I/O decoding involves checking every single line (i.e., all bits) of
the address bus (and, where applicable, the I/O R/W signal) to determine if a
device is selected or not. With full I/O decoding, each hardware
register is mapped to a unique I/O port address.

Full address decoding is very efficient in its use of the available I/O
address space (one I/O address per hardware register), but is
often impractical because of the excessive hardware
needed to implement it.


Full I/O decoding
[Figure: full address decoding example with 2K, 2K, and 8K devices.]


Partial I/O decoding


Partial I/O decoding checks only a few lines (i.e., bits) of the
address bus (and, where applicable, the I/O R/W signal) to determine if a
device is selected or not.
There are caveats to such simple decoding:
Ghost addresses
– Since not all the address bus lines are decoded, a device can respond
to several different I/O addresses but, more importantly, several devices
can respond to the same ghost address (which may lead to a bus conflict,
see below).

Bus conflict
– This is a short circuit between two
or more devices trying to drive the
DATA bus at the same time.
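
Ghost addresses and the resulting conflicts can be shown with a small C model. The idea of describing a decoder by a base address and a mask of decoded bits is an assumption made for this sketch, as are all function names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Full decoding: every address bit is compared, so one address selects. */
bool full_select(uint16_t addr, uint16_t base)
{
    return addr == base;
}

/* Partial decoding: only the bits set in mask are compared. */
bool partial_select(uint16_t addr, uint16_t base, uint16_t mask)
{
    assert(mask != 0);             /* at least one line must be decoded */
    return (addr & mask) == (base & mask);
}

/* Counts how many of the 65536 addresses select the device. */
uint32_t count_aliases(uint16_t base, uint16_t mask)
{
    uint32_t n = 0;
    for (uint32_t a = 0; a <= 0xFFFFu; a++) {
        if (partial_select((uint16_t)a, base, mask)) {
            n++;
        }
    }
    return n;
}
```

Decoding only A15-A12 (mask 0xF000) makes a device answer at 4096 addresses instead of one. If one device decodes only A15 and another only A14, address 0xC000 selects both, which is exactly the bus conflict case.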


Partial I/O decoding


An example: SRAM design


A system specification requires an SRAM system that can store up
to 4 K 16-bit words. However, the largest memory device available is
1 K (1024) by 8-bit words, so the design will require eight of the
smaller memory devices: two sets of four.
In the worst case, to support 4 K 16-bit words, 12 address lines and
16 data lines are required. If sufficient lines are available on the uP,
the design is straightforward.
Let's assume that such is not the case and that only 8 address
lines and 8 data lines are available. Under such a restriction, two
address transfers and two data transfers will be necessary to
complete a single transaction.
Ten address bits are needed to identify each cell in the device. Next,
one must be able to identify which of the 1 K blocks to read from or
write to. Two additional address bits enable such a selection to be
made. These four combinations can be used to activate the chip select
(CS) control and the output enable (OE).
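
The sizing arithmetic above can be checked with a short C sketch. The helper names `chips_needed` and `address_bits` are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Devices needed to build a words x width memory from chip_words x chip_width parts. */
uint32_t chips_needed(uint32_t words, uint32_t width,
                      uint32_t chip_words, uint32_t chip_width)
{
    assert(chip_words > 0 && chip_width > 0);
    uint32_t per_bank = (width + chip_width - 1u) / chip_width;  /* side by side for word width */
    uint32_t banks    = (words + chip_words - 1u) / chip_words;  /* stacked for depth */
    return per_bank * banks;
}

/* Smallest number of address bits that can index n locations. */
uint32_t address_bits(uint32_t n)
{
    assert(n <= (1u << 31));
    uint32_t bits = 0;
    while ((1u << bits) < n) {
        bits++;
    }
    return bits;
}
```

With 4096 words of 16 bits built from 1024 x 8 devices, `chips_needed` gives 8, `address_bits(4096)` gives 12, `address_bits(1024)` gives 10, and the 4 blocks need `address_bits(4)` = 2 select bits.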

An example: SRAM design
[Figure: block diagram of the SRAM system. Signal labels: Data Bus D7-D0, Address Bus D7-D0, R/W, GPIOs.]


An example: WRITE operation



Since the uP only supports eight address lines, the full address is
built up in two transfers on the address bus.
Each address and data byte is stored in a register.
Each address/data transfer is accompanied by a strobe signal. After
the data has been stored in the data latches, the write command is
issued.
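
The write sequence can be modeled in C as a rough behavioral sketch. One 16-bit array stands in for the eight devices, and the register and function names (`strobe_addr_lo`, `write_command`, and so on) are assumptions for illustration, not signals from the original design.

```c
#include <assert.h>
#include <stdint.h>

#define WORDS 4096u                      /* 4 K words, as in the specification */

static uint16_t sram[WORDS];             /* stands in for the eight 1 K x 8 devices */
static uint8_t addr_lo, addr_hi;         /* address registers */
static uint8_t data_lo, data_hi;         /* data registers */

/* Each strobe latches one byte from the 8-bit bus into a register. */
void strobe_addr_lo(uint8_t bus) { addr_lo = bus; }
void strobe_addr_hi(uint8_t bus) { addr_hi = bus & 0x0Fu; }   /* only A11..A8 are used */
void strobe_data_lo(uint8_t bus) { data_lo = bus; }
void strobe_data_hi(uint8_t bus) { data_hi = bus; }

static uint16_t latched_address(void)
{
    return (uint16_t)((addr_hi << 8) | addr_lo);
}

/* Issued only after both address bytes and both data bytes are latched. */
void write_command(void)
{
    uint16_t a = latched_address();
    assert(a < WORDS);
    sram[a] = (uint16_t)((data_hi << 8) | data_lo);
}

/* A read uses the same latched address. */
uint16_t read_command(void)
{
    return sram[latched_address()];
}
```

Writing 0xBEEF to address 0xA34 therefore takes four strobed transfers (0x34, 0x0A, 0xEF, 0xBE) followed by one write command.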


An example: READ operation


To execute a read operation, the desired memory address is
selected, as was done during the write operation. The proper chip
select signal, combined with the state of the read line, begins the
read process on the selected memory block.
For a read operation, one must disable the outputs of the data
latches and enable the memory output drivers.


An example: the whole picture

A multiplexed
implementation is the
more common
architecture: sharing
one set of bus lines
between the two
functions (address
and data).

Under such
circumstances, the
address and data
registers are
necessary for
temporary storage.


Accessing the I/O



External devices are accessed by means of registers
External devices are almost always connected not directly to the
system bus but to an INTERFACE.
Registers in the interface allow for a wide range of possibilities for
the designer to determine how it is to interface to the bus.
An interface typically consists of three registers:
– Control Register- the setting of
which will determine if the interface
is to send or receive.
– Data Register – for the data
element to be transmitted or to hold
a data element received.
– Status Register – used to obtain
information about the “status”
(diagnostic) of the I/O device

[Figure: I/O interface containing the control register and the data and status registers, connected to the input/output device.]


Memory mapped I/O

I/O Devices and memory share the same address space.
– Each I/O Device (Interface) is assigned a unique set of addresses.
– When the processor places a particular address on the address lines, the
device recognizing this address responds to the commands on the control
lines.
– The processor requests either a read or a write operation, and the
requested data is transferred over the data lines.
– Any machine instruction that can access memory can be used to transfer
data to/from I/O devices.
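
In C, memory mapped registers are usually reached through `volatile` pointers, so that ordinary loads and stores perform the I/O. The sketch below assumes a hypothetical three-register layout and a hypothetical ready flag; a plain object stands in for the device so the code can run anywhere.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register layout; real devices define their own. */
typedef struct {
    volatile uint8_t control;
    volatile uint8_t status;
    volatile uint8_t data;
} io_regs_t;

#define STATUS_READY 0x01u          /* hypothetical "ready" flag */

/* On real hardware dev would point at a fixed address taken from the
   memory map. Here a plain object stands in for the device. */
static io_regs_t fake_device;
static io_regs_t *const dev = &fake_device;

/* Output: an ordinary store writes the data register.
   Returns 0 on success, -1 when the device is not ready. */
int io_write(uint8_t byte)
{
    assert(dev != 0);
    if ((dev->status & STATUS_READY) == 0) {
        return -1;
    }
    dev->data = byte;
    return 0;
}
```

The `volatile` qualifier tells the compiler that every access must really happen, since the device can change the register contents on its own.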
[Figure: memory and I/O share one address space (0000 to FFFF) over common address, data, and control lines. The I/O interface contains an address decoder, control circuits, and data and status registers, and connects to the input/output device (peripheral).]


Memory mapped Input


Memory mapped Output


Port Mapped I/O

Memory and I/O…
– Occupy different “spaces”
– Are accessed by unique instructions


Differentiated by instructions (Memory vs I/O instructions)
I/O instructions
– move data between a specified I/O address (“port”) and a CPU register
(e.g., the accumulator)
– IN port – inputs data from a device
– OUT port – outputs data to a device

Typically, access to memory
and I/O uses the same address
bus and data bus.
A dedicated control bus signal
differentiates a “memory cycle”
from an “I/O cycle”.

[Figure: two separate address spaces, Memory (0000 to FFFF) and I/O (0000 to FFFF).]


Memory subsystem architecture

Memory hierarchy: the metric is based on speed and storage
capacity (and cost!).




At the top is secondary memory: the slowest, largest, and least
expensive.
In the middle is main or primary memory.
At the bottom are the smallest, fastest memories
called cache memory.
CPU registers are sometimes included in the ranking
as higher speed memory than cache.


Caching


Cache is a small, fast memory that temporarily holds copies of blocks of
data and program instructions from the main memory.
A Harvard architecture will internally support both an icache
(instruction cache) and a dcache (data cache).


Locality


Program execution generally occurs either
sequentially or in small loops with a small
number of instructions. With respect to the
entire program, actual execution takes
place within a small window that moves
forward through the program: sequential
locality of reference.
Spatial locality suggests that a future
access of a resource, a memory address
in this case, is going to be physically near
one previously accessed.
Temporal locality suggests that a future
access of a resource (again, a memory
address) is going to be temporally near
one recently accessed.


Cache systems


Cache memory is organized into several levels.
The application program begins executing and encounters a need
for a piece of data or an instruction. To locate that item, first the
cache is checked.
If the item is found, there is a cache hit.
If the item is not found, there has been a cache miss and the item
must be obtained from somewhere else.






How do we know when something is not in the cache?
Where do we go to find something if it is not in the cache?
What if it's not there?
How do we know if there is room left in the cache?
How do we know if information in the cache was modified?
How do we select the block to replace?


Cache systems – Direct mapping

The main memory page size is set equal to the cache size;
therefore, each page will contain a corresponding number of blocks.
[Figure: direct mapping. The cache and each main memory page contain the same block addresses (Block address 0, Block address 1, ...).]


Cache systems – Direct mapping – An example
The specifications:
• The cache and main memory will store 32-bit words.
• The cache size will be 64 K words (256 KB -> 18-bit byte addresses).
• The cache will be organized as 128 0.5 K word blocks.
• The cache will implement a direct mapped replacement algorithm.
• Memory addresses will be 32 bits.
• Main memory size will be 128 M words.
• Main memory will be organized as 2 K pages (128M/64K);
– page size = cache size -> each page will hold 128 blocks.


Cache systems – Direct mapping – An example
Address Interpretation in the Cache Context:
• Each data or instruction word is 32 bits (4B) long; bits A1 and A0
identify a byte within a word.
• Each block contains 512 words. Address bits A10-A2 identify a word
in a block.
• The block address within the cache is identified by address bits A17-A11.
These bits are called the index into cache and also
correspond to the block's address within a main memory page.
• Bits A31-A18 identify which main memory page the block came
from. This value is called the tag. These values will be stored in a
data structure called a tag table and are used when testing to see if
the needed word is in the cache.
Use for search in the cache!
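
The address interpretation can be sketched in C with shifts and masks. Since A17 already belongs to the index, the tag is taken here as the bits above it (A31-A18); that reading, together with the struct and function names, is an assumption of this example.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t byte;    /* A1-A0:   byte within the word            */
    uint32_t word;    /* A10-A2:  word within the block (0-511)   */
    uint32_t index;   /* A17-A11: block within the cache (0-127)  */
    uint32_t tag;     /* A31-A18: main memory page (assumed)      */
} cache_addr_t;

cache_addr_t split_cache_address(uint32_t addr)
{
    cache_addr_t f;
    f.byte  = addr & 0x3u;             /* 2 bits */
    f.word  = (addr >> 2) & 0x1FFu;    /* 9 bits */
    f.index = (addr >> 11) & 0x7Fu;    /* 7 bits */
    f.tag   = addr >> 18;              /* remaining upper bits */
    assert(f.word < 512u && f.index < 128u);
    return f;
}
```

The field widths add up as expected: 2 + 9 + 7 = 18 bits address one byte in the 64 K word cache, and the remaining bits name the page.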


The TAG table

The tag table contains one record for each block in the cache (i.e.,
for the current design, 128 entries). Typical information contained in
each record includes:
– TAG: A subset of bits from the main memory address identifying the
page (in main memory) where the block originated.
– VALID BIT: A flag indicating whether the corresponding block contains
valid data (i.e. just memorized). If the valid bit is TRUE, the block must
be checked for changes before overwriting it.
– DIRTY BIT: A flag indicating whether the corresponding block contains
data that has been modified. Cache and main memory must be
coherent. The write through approach propagates any data change
immediately to main memory; the delayed write approach assumes that
if a piece of data changed once, it may change again in the near future.
Thus, time can be saved by not performing (potentially) multiple write
operations to the same data.
– TIME: when the block was brought into the cache or when it was last
accessed
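
A tag table for the direct mapped case might look like the following C sketch. The TIME field is left out because direct mapping does not need it to choose a block, and all names (`tag_record_t`, `cache_hit`, `cache_fill`, `cache_mark_dirty`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_BLOCKS 128u            /* one record per cache block */

typedef struct {
    uint32_t tag;                  /* main memory page of the cached block */
    bool     valid;                /* block holds valid data */
    bool     dirty;                /* block modified since it was loaded */
} tag_record_t;

static tag_record_t tag_table[NUM_BLOCKS];

/* Hit when the record selected by index is valid and its tag matches. */
bool cache_hit(uint32_t index, uint32_t tag)
{
    assert(index < NUM_BLOCKS);
    return tag_table[index].valid && tag_table[index].tag == tag;
}

/* Load a block on a miss. Returns true when the old block was valid and
   dirty, i.e. it must be written back to main memory first (delayed write). */
bool cache_fill(uint32_t index, uint32_t tag)
{
    assert(index < NUM_BLOCKS);
    bool write_back = tag_table[index].valid && tag_table[index].dirty;
    tag_table[index].tag   = tag;
    tag_table[index].valid = true;
    tag_table[index].dirty = false;
    return write_back;
}

/* A write to a cached block marks it dirty under the delayed write approach. */
void cache_mark_dirty(uint32_t index)
{
    assert(index < NUM_BLOCKS);
    tag_table[index].dirty = true;
}
```

With write through, the dirty flag would not be needed, because main memory is updated on every write.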


Cache systems – Associative mapping

A new block can be placed anywhere in the cache. An associative
search is then executed to locate it. Such an algorithm searches by
content rather than by address.
To find a word in the cache, the tag and block portions of the
memory address specify the target for the associative search.

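[Figure: associative mapping. A main memory block (Block address 0, Block address 1, ...) may be placed at any block address in the cache; the tag and block fields of the address are used for the search in the cache.]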


Cache systems – Associative mapping


With the associative mapping
algorithm, time is added
as one of the components
of the tag table record.
Two of the commonly used
algorithms applying
temporal locality are
Least Recently Used
(LRU) and Most
Recently Used (MRU).
A third algorithm selects
and removes a block at
random.
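
A minimal sketch of least recently used replacement in a small fully associative cache. The cache size `WAYS`, the logical clock `tick`, and the linear search (hardware would compare all tags in parallel) are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define WAYS 4u                      /* hypothetical number of cache blocks */

typedef struct {
    uint32_t tag;
    uint32_t last_used;              /* TIME field: tick of the last access */
    int      valid;
} assoc_record_t;

static assoc_record_t table[WAYS];
static uint32_t tick;                /* simple logical clock */

/* Returns 1 on a hit, 0 on a miss. On a miss the block is loaded into a
   free slot, or into the least recently used slot when the cache is full. */
int assoc_access(uint32_t tag)
{
    uint32_t victim = 0;
    tick++;
    for (uint32_t i = 0; i < WAYS; i++) {
        if (table[i].valid && table[i].tag == tag) {
            table[i].last_used = tick;          /* refresh on every access */
            return 1;
        }
    }
    for (uint32_t i = 0; i < WAYS; i++) {
        if (!table[i].valid) {                  /* free slot: use it */
            victim = i;
            break;
        }
        if (table[i].last_used < table[victim].last_used) {
            victim = i;                         /* older than current candidate */
        }
    }
    assert(victim < WAYS);
    table[victim].tag = tag;
    table[victim].valid = 1;
    table[victim].last_used = tick;
    return 0;
}
```

After accessing blocks 1, 2, 3, 4 and then 1 again, loading block 5 evicts block 2, since it is the one that has gone unused the longest.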


Static vs Dynamic memory allocation


Static memory allocation: The required
memory space for a declared variable is
allocated at compilation time. The
program “knows” the actual data location.
Dynamic memory allocation: Memory is
assigned during run time. The program
“knows” only a pointer to the actual data
location. Memory requests are satisfied
by allocating portions from a large pool of
memory called the heap. At any given
time, some parts of the heap are in use,
while some are "free" (unused) and thus
available for future allocations. Several
issues complicate implementation, such
as internal and external fragmentation.
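
The two kinds of allocation look like this in C. The names `samples`, `static_buffer`, and `make_buffer` are illustrative only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Static: size and location are fixed before the program runs. */
static int samples[8];

int *static_buffer(void)
{
    return samples;
}

/* Dynamic: the size is chosen at run time and the block comes from the
   heap. The caller only holds a pointer and must release it with free(). */
int *make_buffer(size_t count)
{
    assert(count > 0);
    int *buf = malloc(count * sizeof *buf);
    if (buf != NULL) {
        memset(buf, 0, count * sizeof *buf);
    }
    return buf;                      /* NULL when the heap cannot satisfy the request */
}
```

Many embedded designs avoid `malloc` after start-up precisely because of the fragmentation issues mentioned above; a failed request must always be handled.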


Dynamic memory allocation



Dynamic means we allocate memory at runtime.
How do we manage main memory to accommodate:
– programs larger than main memory
– multiple processes in main memory (process=program+data+stack=an
instance or invocation of a program)
– multiple programs in main memory
Overlays: one of two or more pieces of code (or data) that can be loaded to
a pre-determined memory region on demand at runtime. Initially, each
overlay is stored in ROM/Flash, just like ordinary code/data. During runtime,
an overlay can be copied to a known address in RAM and executed there
when required. This can later be replaced by another overlay when
required.
Swapping: the system remains resident in memory and further assumes
that only a single user program is resident in memory at a time. One
program but many processes; save the context and swap among them!
Multiprogramming: permits one to run multiple programs in the same
memory space; we need an OS or at least a dispatcher

Process vs Program


A process is a program in execution; it is often called a job or task
Program is static, just a bunch of bytes
No one-to-one mapping between processes and programs
– can have multiple processes of the same program
– one process can invoke multiple programs

A process consists of (at least):
– an address space
– the code and the data for the running
program
– an execution stack and stack pointer (SP)
(traces state of procedure calls made)
– the program counter (PC), indicating the next instruction
– general-purpose processor registers and their values
– a set of OS resources (open files, network connections, sound
channels, …)


Testing memories – Data lines


Assumption: the design of chips is correct and they contain no
internal manufacturing defects (ignore soft errors).
To test for the stuck-at-1 condition, a pattern of all 0's is written to a
memory address and followed by a read operation from the same
address. For the stuck-at-0 condition, the converse applies: a pattern
of all 1's is written and read back. If the same
data is read as was written, then a stuck-at fault does not exist on
any of the data lines.
A bridge fault connects two (or more) data lines; the actual voltage
level depends on the relative strengths of the driving signals. The
assumption here is that each of those signals (D0 and D1) will share
a common value.
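
The stuck-at test on the data lines can be sketched in C. Because no hardware is available here, a simulated 8-bit location with injectable faults (`stuck_at_1`, `stuck_at_0`) stands in for the memory; these names and the fault model are assumptions of the example.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t cell;            /* simulated memory location */
static uint8_t stuck_at_1;      /* injected faults: data lines forced high */
static uint8_t stuck_at_0;      /* injected faults: data lines forced low  */

static void bus_write(uint8_t v)
{
    cell = v;
}

static uint8_t bus_read(void)
{
    assert((stuck_at_1 & stuck_at_0) == 0);   /* a line cannot be stuck both ways */
    return (uint8_t)((cell | stuck_at_1) & (uint8_t)~stuck_at_0);
}

/* Returns a bit mask of faulty data lines; zero means no stuck-at fault. */
uint8_t test_data_lines(void)
{
    uint8_t bad = 0;

    bus_write(0x00u);                 /* all 0s: a stuck-at-1 line reads back 1 */
    bad |= bus_read();

    bus_write(0xFFu);                 /* all 1s: a stuck-at-0 line reads back 0 */
    bad |= (uint8_t)~bus_read();

    return bad;
}
```

With D2 forced high the test returns 0x04, and with D7 forced low it returns 0x80. Detecting bridge faults needs additional patterns, such as a walking 1.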


Testing memories – Address lines


In the presence of a stuck-at address line fault, two different memory
addresses are mapped to the same location. An address bit, A0, is
selected as the bit under test. Next, a data pattern, e.g. ...0000, is
chosen and that data is written to memory address ...xxx0. A
different data pattern, say ...1111, is then selected and written to
memory address ...xxx1.
The contents of the two locations are then read. If there is a stuck-at
fault on A0, both addresses will be mapped to the same location and
the same data will be read from the two different addresses.
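
The address line test can be sketched the same way. The simulated memory and the injected fault `stuck_low_mask` (address lines forced to 0) are assumptions of this example; a line stuck at 1 would alias the two addresses in the same manner.

```c
#include <assert.h>
#include <stdint.h>

#define ADDR_BITS 4u
#define MEM_SIZE  (1u << ADDR_BITS)

static uint8_t  mem[MEM_SIZE];     /* simulated memory */
static uint32_t stuck_low_mask;    /* injected fault: address lines forced to 0 */

static uint32_t effective(uint32_t addr)
{
    return addr & ~stuck_low_mask & (MEM_SIZE - 1u);
}

static void mem_write(uint32_t a, uint8_t v) { mem[effective(a)] = v; }
static uint8_t mem_read(uint32_t a)          { return mem[effective(a)]; }

/* Returns 1 when address line `bit` appears stuck, 0 otherwise. */
int address_line_stuck(uint32_t bit)
{
    assert(bit < ADDR_BITS);
    uint32_t a0 = 0u;               /* bit under test low  */
    uint32_t a1 = 1u << bit;        /* bit under test high */

    mem_write(a0, 0x00u);           /* first pattern  */
    mem_write(a1, 0xFFu);           /* second pattern */

    /* With a stuck line both addresses alias, so both reads
       return the pattern that was written last. */
    return mem_read(a0) == mem_read(a1);
}
```

Repeating the call for each bit position checks every address line in turn.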


Testing memories – Address lines


Assume that a test for a bridge between address bits A0 and A1 is
conducted. Any address of the form ...xxxx01 is selected as the test address.
Next, a background data pattern (e.g. ...1111) is written to the two
possible aliased addresses: ...xxxx00 and ...xxxx11. A different
data pattern (...0000, for example) is then written to the test
address.
Finally, all three addresses are read. The test may have any of four
possible outcomes: there is no bridge fault, the logical 1 in the test
address dominates, the logical 0 dominates, or neither dominates (in such
a case all three patterns are affected).


Testing memories - ROM


The ROM stores a particular set of data. If the data are incorrect, the
device is considered to have a failure. Thus, the testing strategy
must address the stuck-at and bridging faults as well as ensuring
that the correct data has been stored.
An effective method for testing ROM memories that can address all
of these issues is based on the CRC or cyclic redundancy check
and is known as signature analysis.
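
A software sketch of the idea: compute a CRC over the ROM contents and compare it with a stored reference signature. The slides do not name a polynomial; the one used here (0x1021 with initial value 0xFFFF, commonly called CRC-16/CCITT-FALSE) is an assumption, as are the function names. Hardware signature analysis typically uses a shift register with feedback rather than a software loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16 over the ROM image (polynomial 0x1021, initial value 0xFFFF). */
uint16_t rom_signature(const uint8_t *rom, size_t len)
{
    uint16_t crc = 0xFFFFu;
    assert(rom != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(rom[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u) {
                crc = (uint16_t)((crc << 1) ^ 0x1021u);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }
    return crc;
}

/* The ROM passes when its signature matches the expected reference value. */
int rom_is_good(const uint8_t *rom, size_t len, uint16_t expected)
{
    return rom_signature(rom, len) == expected;
}
```

For the nine ASCII bytes "123456789" this CRC variant yields 0x29B1, the usual check value; any single-bit change in the data produces a different signature.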
