
BCS-29

Advanced Computer Architecture


External Memory
Types of External Memory
Magnetic Disk
Floppy Disk
Hard Disk
Removable
Optical
CD-ROM
CD-Recordable (CD-R)
CD-R/W
DVD
Magnetic Tape
PD/SSD



Floppy Disk
• The first floppy disks were invented and made by IBM.
• The first disks had a diameter of 8 inches (203 mm), followed by 5¼-inch (133 mm) and then 3½-inch (90 mm) formats
• Capacity
• 360 KB
• 1.2 MB
• 1.44 MB



Magnetic Disk
• Disk substrate coated with magnetizable material
(iron oxide…rust)
• Substrate used to be aluminium
• Now glass
• Improved surface uniformity
• Increases reliability
• Reduction in surface defects
• Reduced read/write errors
• Lower fly heights (see later)
• Better stiffness
• Better shock/damage resistance



Disk Medium Materials
• Aluminum with a deposit of magnetic material
• Some disks also use glass platters
• E.g., newer IBM/Hitachi products
• Better surface uniformity and stiffness but harder to
deposit magnetic material
• Anti-Ferromagnetically Coupled media
• Uses two magnetic layers of opposite polarity to reinforce
the orientation.
• Can provide higher densities but at higher manufacturing
complexity



Read and Write Mechanisms
• Recording & retrieval via conductive coil called a head
• May be single read/write head or separate ones
• During read/write, head is stationary, platter rotates
• Write
• Current through coil produces magnetic field
• Pulses sent to head
• Magnetic pattern recorded on surface below
• Read (traditional)
• Magnetic field moving relative to coil produces current
• Coil is the same for read and write
• Read (contemporary)
• Separate read head, close to write head
• Partially shielded magnetoresistive (MR) sensor
• Electrical resistance depends on direction of magnetic field
• High frequency operation
• Higher storage density and speed



Inductive Write MR Read



New Recording Technologies
• Longitudinal recording is now expected to extend
above 100 Gb/sq-in

• Perpendicular recording is expected to extend to
1 Tb/sq-in

• Beyond that:
• Heat-assisted magnetic recording (HAMR)



Data Organization and Formatting
• Concentric rings or tracks
• Gaps between tracks
• Reduce gap to increase capacity
• Same number of bits per track (variable packing density)
• Constant angular velocity
• Tracks divided into sectors
• Minimum block size is one sector
• May have more than one sector per block



Storage Density
• Determines both capacity and performance
• Density Metrics
• Linear density (bits/inch, or BPI)
• Track density (tracks/inch, or TPI)
• Areal density = BPI × TPI
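As a rough illustration of how the two density metrics multiply into areal density; the numbers below are assumed values, not figures from any particular drive:

    # Hypothetical example: areal density from linear and track density
    bpi = 900_000        # linear density, bits per inch (assumed value)
    tpi = 250_000        # track density, tracks per inch (assumed value)

    areal_density = bpi * tpi            # bits per square inch
    print(f"Areal density = {areal_density / 1e9:.1f} Gb/sq-in")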



Disk Data Layout



Disk Rotation
• Bit near centre of rotating disk passes fixed point slower than
bit on outside of disk
• Increase spacing between bits in different tracks
• Rotate disk at constant angular velocity (CAV)
• Gives pie shaped sectors and concentric tracks
• Individual tracks and sectors addressable
• Move head to given track and wait for given sector
• Waste of space on outer tracks
• Lower data density
• Can use zones to increase capacity
• Each zone has fixed bits per track
• More complex circuitry



Disk Layout Methods Diagram



Winchester Disk Format (Seagate ST506)



HDD Organization

(Diagram: arm assembly, arm, head, spindle, platter, track, and cylinder)



HDD Organization
• Typical configurations seen in disks today
• Platter diameters: 3.7”, 3.3”, 2.6”
• RPMs: 5400, 7200, 10000, 15000
• 0.5-1% variation in the RPM during operation
• Number of platters: 1-5
• Mobile disks can be as small as 0.75”
• Power proportional to: (# platters) × (RPM)^2.8 × (diameter)^4.6 (see the sketch after this list)
• Tradeoff in the drive-design
• Read/write head
• Reading – Faraday’s Law
• Writing – Magnetic Induction
• Data-channel
• Encoding/decoding of data to/from magnetic phase changes
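A minimal sketch of the power-scaling relation above; the proportionality constant is arbitrary, so only the ratio between two assumed configurations is meaningful:

    # Relative power from the proportionality P ~ platters * RPM^2.8 * diameter^4.6
    def relative_power(platters, rpm, diameter_in):
        return platters * (rpm ** 2.8) * (diameter_in ** 4.6)

    # Compare two assumed drive configurations
    p_7200  = relative_power(platters=4, rpm=7200,  diameter_in=3.3)
    p_15000 = relative_power(platters=4, rpm=15000, diameter_in=2.6)
    print(f"15k RPM drive uses ~{p_15000 / p_7200:.1f}x the power of the 7.2k drive")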



Tracks and Sectors
• Bits are grouped into sectors
• Typical sector-size = 512 B of data
• Sector also has overhead information
• Error Correcting Codes (ECC)
• Servo fields to properly position the head
Internal Data Rate (IDR)
• Rate at which data can be read from or written to the
physical media
• Expressed in MB/s
• IDR is determined by
• BPI
• Platter-diameter
• RPM
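A back-of-the-envelope sketch of how those three factors combine into a data rate. This assumes the head is over the outermost track and ignores formatting overhead, and the input values are assumed rather than taken from a real drive:

    import math

    # Rough internal data rate: bits passing under the head per second
    def idr_mb_per_s(bpi, diameter_in, rpm):
        track_len_in = math.pi * diameter_in          # outer-track circumference (inches)
        bits_per_rev = bpi * track_len_in             # bits stored on that track
        bits_per_s = bits_per_rev * (rpm / 60.0)      # revolutions per second
        return bits_per_s / (8 * 1e6)                 # convert to MB/s

    print(f"~{idr_mb_per_s(bpi=900_000, diameter_in=3.3, rpm=7200):.0f} MB/s (illustrative)")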



Formatting
Low Level Formatting (Physical or true formatting)
1. It is done at the factory level. (In low level
formatting all the data stored on the disk is lost as
the disk is physically formatted)
2. It magnetically divides the disk into tracks and
sectors.
3. Basic addressing information is written to each
sector of each cylinder.
4. It checks for bad sectors and maps them out.



Formatting
High Level Formatting
1. It is done with the help of the OS.
2. The high-level format program scans the disk for tracks and sectors marked
bad during low-level formatting. The scanning program performs five
retries to read the tracks or sectors. If the tracks are still unreadable, the
area is noted as a bad cluster in the FAT.
3. After scanning the entire disk, the drive heads return to the first sector of
the partition and write the MBR. Immediately in the next sector the 1st copy of
the FAT is written, and after that the 2nd copy of the FAT is written. Initially the FATs are
blank except for the bad-cluster marks found in the initial scan.
4. After the 2nd copy of the FAT, a blank root directory is created.



FAT (File Allocation Table) File System
1. Developed by Microsoft for MS-DOS and MS-Windows 95/98/Me
2. The FAT is located near the start of a bootable volume, just after the boot record
3. Two important functions of the FAT:
• contains allocation information (in the form of linked lists)
• indicates which allocation units are free
4. It is simple and reliable. Two identical copies of the FAT are used.
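A minimal sketch of how the linked-list allocation information is followed to collect a file's clusters. The table below is a made-up in-memory FAT, and END is only a stand-in for the real end-of-chain marker value:

    END = -1  # stand-in for the FAT end-of-chain marker

    # Hypothetical FAT: key = cluster number, value = next cluster in the file
    fat = {2: 5, 5: 6, 6: 9, 9: END,      # file A occupies clusters 2 -> 5 -> 6 -> 9
           3: 4, 4: END}                  # file B occupies clusters 3 -> 4

    def file_clusters(start_cluster):
        """Walk the FAT chain from a file's starting cluster."""
        chain = []
        cluster = start_cluster
        while cluster != END:
            chain.append(cluster)
            cluster = fat[cluster]
        return chain

    print(file_clusters(2))   # [2, 5, 6, 9]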

Structure of FAT



New Technology File System (NTFS)
Structure of NTFS

1. Used by Windows NT, XP, 2000, Server 2003, Server 2008, and Windows Vista
2. NTFS provides better performance, security, compatibility, and
extensibility than FAT
3. Read, search, write, and recovery operations are fast.
4. The Master File Table (MFT) contains information about all files and folders.
It is the first file on an NTFS volume.
5. The partition boot sector spans the first 16 sectors (starting at sector 0). It is the first information on an NTFS volume.



Features of NTFS

1. It allows you to encrypt files and automatically decrypt them as
they are read.
2. Supports long file names up to 255 characters
3. Supports file sizes up to 2 TB
4. For keeping track of clusters it uses a B-tree directory
5. More reliable file system compared to FAT
6. Allows large partition sizes, i.e., more than 4 GB
7. Built-in file compression facility
8. Improved security and access control, deciding who can
perform what sorts of operations on various data within the file
system



Addressing and Finding Sectors
• Must be able to identify start of track and sector
• Format disk
• Additional information not available to user
• Marks tracks and sectors
• Seek time
• Moving head to correct track
• (Rotational) latency
• Waiting for data to rotate under head
• Access time = Seek + Latency
• Transfer rate
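A small sketch of the access-time relation above, using typical-looking but assumed numbers (average rotational latency is taken as half a revolution):

    # Access time = seek time + rotational latency (+ transfer time for the data itself)
    avg_seek_ms = 4.0                      # assumed average seek time
    rpm = 7200
    avg_latency_ms = 0.5 * 60_000 / rpm    # half a revolution, in ms
    transfer_mb_per_s = 150.0              # assumed sustained transfer rate
    block_kb = 4

    transfer_ms = (block_kb / 1024) / transfer_mb_per_s * 1000
    access_ms = avg_seek_ms + avg_latency_ms + transfer_ms
    print(f"latency {avg_latency_ms:.2f} ms, total access ~{access_ms:.2f} ms")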



Timing of Disk I/O Transfer



RAID
• Redundant Array of Inexpensive Disks
• Basic idea is to connect multiple disks together to
provide
• large storage capacity
• faster access when reading data
• redundant data
• Many different levels of RAID systems
• differing levels of redundancy, error checking, capacity,
and cost
Striping

• Take file data and map it to different disks


• Allows for reading data in parallel

(Diagram: file data split into blocks 0–3, placed on Disk 0 through Disk 3)


Parity
• Way to do error checking and correction
• Add up all the bits that are 1
• if even number, set parity bit to 0
• if odd number, set parity bit to 1
• To actually implement this, do an exclusive OR of
all the bits being considered
• Consider the following 2 bytes
byte parity
10110011 1
01101010 0
• If a single bit is bad, it is possible to correct it
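A small sketch of the XOR parity idea, using the first byte from the table above; the correction step assumes the position of the failing bit is already known:

    def parity_bit(bits):
        """XOR of all bits: 0 if an even number of 1s, 1 if odd."""
        p = 0
        for b in bits:
            p ^= b
        return p

    data = [1, 0, 1, 1, 0, 0, 1, 1]        # the byte 10110011 from the table
    stored_parity = parity_bit(data)        # 1, matching the table above

    # Single-bit correction when the failing position is known (say, bit 5)
    corrupted = data.copy()
    corrupted[5] ^= 1                                   # simulate a bad bit
    if parity_bit(corrupted) != stored_parity:          # mismatch -> error present
        corrupted[5] ^= 1                               # flip the known-bad bit back
    print(corrupted == data)                            # True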
Mirroring
• Keep two copies of data on two separate disks
• Gives good error recovery
• if some data is lost, get it from the other source
• Expensive
• requires twice as many disks
• Write performance can be slow
• have to write data to two different spots
• Read performance is enhanced
• can read data from file in parallel
RAID Level-0
• Often called striping
• Break a file into blocks of data
• Stripe the blocks across disks in the system
• Simple to implement
• disk = file block % number of disks
• sector = file block / number of disks
• provides no redundancy or error detection
• important to consider because lots of disks means low
Mean Time To Failure (MTTF)
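A minimal sketch of the block-to-disk mapping given by the two formulas above (modulo picks the disk, integer division picks the sector):

    def raid0_place(file_block, num_disks):
        """Map a file block number to (disk, sector) under simple striping."""
        disk = file_block % num_disks
        sector = file_block // num_disks
        return disk, sector

    # With 2 disks, blocks 0-4 land as in the diagram on the next slide
    for block in range(5):
        print(block, raid0_place(block, num_disks=2))
    # (0,0) (1,0) (0,1) (1,1) (0,2)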
RAID Level-0

(Diagram: file blocks 0–4 striped across two disks; Disk 0 holds blocks 0, 2, 4 in sectors 0–2 and Disk 1 holds blocks 1, 3 in sectors 0–1)
RAID Level-1

• A complete file is stored on a single disk


• A second disk contains an exact copy of the file
• Provides complete redundancy of data
• Read performance can be improved
• file data can be read in parallel
• Write performance suffers
• must write the data out twice
• Most expensive RAID implementation
• requires twice as much storage space
RAID Level-1

(Diagram: file blocks 0–4 written identically to Disk 0 and Disk 1, each in sectors 0–4)
RAID Level-2
• Stripes data across disks similar to Level-0
• difference is data is bit interleaved instead of block
interleaved
• Uses ECC to monitor correctness of information on
disk
• Multiple disks record the ECC information to
determine which disk is in fault
• A parity disk is then used to reconstruct corrupted
or lost data
RAID Level-2

(Diagram: file blocks bit-interleaved across two data disks, with two ECC disks and a parity disk)
RAID Level-2
• Reconstructing data
• assume data striped across eight disks
• correct data: 10011010
• parity: 0
• data read: 10011110
• if we can determine that disk 2 is in error
• just use read data and parity to know which bit to flip
RAID Level-2
• Requires fewer disks than Level-1 to provide
redundancy
• Still needs quite a few more disks
• for 10 data disks need 4 check disks plus parity disk
• Big problem is performance
• must read data plus ECC code from other disks
• for a write, have to modify data, ECC, and parity disks
• Another big problem is only one read at a time
• while a read of a single block can be done in parallel
• multiple blocks from multiple files can’t be read
because of the bit-interleaved placement of data
RAID Level-3
• One big problem with Level-2 are the disks needed to
detect which disk had an error
• Modern disks can already determine if there is an
error
• using ECC codes with each sector
• So just need to include a parity disk
• if a sector is bad, the disk itself tells us, and use the parity
disk to correct it
RAID Level-4
• Big problem with Level-2 and Level-3 is the bit
interleaving
• to access a single file block of data, must access all the
disks
• allows good parallelism for a single access but doesn’t
allow multiple I/O’s
• Level-4 interleaves file blocks
• allows multiple small I/O’s to be done at once
RAID Level-4
• Still use a single disk for parity
• Now the parity is calculated over data from multiple
blocks
• Level-2,3 calculate it over a single block
• If an error detected, need to read other blocks on
other disks to reconstruct data
Level-4 vs. Level-2,3
(Diagram: four transfer units a–d, each split into parts 0–3 across four disks. Level-3 stores a0 b0 c0 d0 on one disk, a1 b1 c1 d1 on the next, and so on, plus a parity disk; Level-4 stores all of a (a0–a3) on one disk, all of b on the next, and so on, plus a parity disk)
RAID Level-4
• Reads are simple to understand
• want to read block A, read it from disk 0
• if there is an error, read in blocks B,C, D, and parity
block and calculate correct data
• What about writes?
• it looks like a write still requires access to 4 data disks
to recalculate the parity data
• not true, can use the following formula
• new parity = (old data xor new data) xor old parity
• a write requires 2 reads and 2 writes
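A small sketch of the parity-update formula above, working on whole bytes with Python's ^ (XOR) operator; the byte values are arbitrary:

    old_data   = 0b10110011
    new_data   = 0b01101010
    old_parity = 0b11001100   # parity over this block's stripe (assumed value)

    # new parity = (old data XOR new data) XOR old parity
    new_parity = (old_data ^ new_data) ^ old_parity
    print(f"{new_parity:08b}")

    # Only the old data block and the old parity block had to be read:
    # two reads (old data, old parity) and two writes (new data, new parity).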
RAID Level-4
• Doing multiple small reads is now faster than before
• However, writes are still very slow
• this is because of calculating and writing the parity blocks
• Also, only one write is allowed at a time
• all writes must access the check disk so other writes have
to wait
RAID Level-5
• Level-5 stripes file data and check data over all the
disks
• no longer a single check disk
• no more write bottleneck
• Drastically improves the performance of multiple
writes
• they can now be done in parallel
• Slightly improves reads
• one more disk to use for reading
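One possible way to rotate the check information is sketched below. This assumes a simple rotation with disks numbered from 0; real implementations differ in the exact placement:

    def raid5_parity_disk(stripe, num_disks):
        """Disk holding the parity block for a given stripe (simple rotation)."""
        return (num_disks - 1 - stripe) % num_disks

    # With 5 disks, the parity for stripes 0..5 rotates across all disks
    for stripe in range(6):
        print(f"stripe {stripe}: parity on disk {raid5_parity_disk(stripe, 5)}")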
RAID Level-5
(Diagram: under Level-4, disks 1–4 hold data sectors S0–S5 and disk 5 is a dedicated check disk; under Level-5, data and check information for S0–S5 are spread across all five disks)
RAID Level-5

• Notice that for Level-4, a write to sector 0 on disk 2
and sector 1 on disk 3 both require a write to disk
5 for check information
• In Level-5, a write to sector 0 on disk 2 and sector
1 on disk 3 require writes to different disks for
check information (disks 5 and 4, respectively)
• Best of all worlds
• read and write performance close to that of RAID Level-
1
• requires as much disk space as Levels-3,4
RAID Level-10
• Combine Level-0 and Level-1
• Stripe a file's data across multiple disks
• gives great read/write performance
• Mirror each stripe onto a second disk
• gives the best redundancy
• The most high performance system
• The most expensive system
Optical Storage CD-ROM
• Originally for audio
• 650 Mbytes, giving over 70 minutes of audio
• Polycarbonate coated with highly reflective coat,
usually aluminium
• Data stored as pits
• Read by reflecting laser
• Constant packing density
• Constant linear velocity



CD Operation



CD-ROM Drive Speeds
• Audio is single speed
• Constant linear velocity
• 1.2 m/s
• Track (spiral) is 5.27 km long
• Gives 4391 seconds = 73.2 minutes
• Other speeds are quoted as multiples
• e.g. 24x
• Quoted figure is maximum drive can achieve
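A quick check of the playing-time arithmetic above (constant linear velocity of 1.2 m/s over a 5.27 km spiral):

    track_length_m = 5.27e3     # spiral track length
    velocity_m_s = 1.2          # constant linear velocity (1x audio speed)

    seconds = track_length_m / velocity_m_s
    print(f"{seconds:.1f} s = {seconds / 60:.1f} minutes")   # ~4391.7 s, ~73.2 minutes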



CD-ROM Format

• Mode 0 = blank data field
• Mode 1 = 2048 bytes of data + error correction
• Mode 2 = 2336 bytes of data



Virtual Memory
• To facilitate the use of memory hierarchies, the memory
addresses normally generated by modern processors
executing application programs are not physical addresses,
but are rather virtual addresses of data items and
instructions.
• Physical addresses, of course, are used to reference the
available locations in the real physical memory of a system.
• Virtual addresses must be mapped to physical addresses
before they can be used.



Virtual to Physical Mapping

• The mapping from virtual to physical addresses can be
formally defined as follows:

f_t(v) = m, if m ∈ M has been allocated to store the data identified by virtual address v
f_t(v) = Ø, if the data identified by v is missing in M

• The mapping returns a physical address if a memory hit
occurs. If there is a memory miss, the referenced item
has not yet been brought into primary memory.



Mapping Efficiency
• The efficiency with which the virtual to physical mapping can
be accomplished significantly affects the performance of the
system.
• Efficient implementations are more difficult in multiprocessor
systems where additional problems such as coherence,
protection, and consistency must be addressed.



Virtual Memory Models
• Private Virtual Memory
• In this scheme, each processor has a separate virtual
address space, but all processors share the same physical
address space.
• Advantages:
• Small processor address space
• Protection on a per-page or per-process basis
• Private memory maps, which require no locking
• Disadvantages
• The synonym problem – different virtual addresses in different/same
virtual spaces point to the same physical page
• The same virtual address in different virtual spaces may point to
different pages in physical memory



Virtual Memory Models
• Shared Virtual Memory
• All processors share a single shared virtual address space, with each
processor being given a portion of it.
• Some of the virtual addresses can be shared by multiple processors.
• Advantages:
• All addresses are unique
• Synonyms are not allowed
• Disadvantages
• Processors must be capable of generating large virtual addresses (usually > 32
bits)
• Since the page table is shared, mutual exclusion must be used to guarantee
atomic updates
• Segmentation must be used to confine each process to its own address space
• The address translation process is slower than with private (per processor) virtual
memory



Memory Allocation
• Both the virtual address space and the physical
address space are divided into fixed-length pieces.
• In the virtual address space these pieces are called pages.
• In the physical address space they are called page frames.
• The purpose of memory allocation is to allocate
pages of virtual memory using the page frames of
physical memory.
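A minimal sketch of that idea, with a free-frame list standing in for physical memory and a dictionary as the page-to-frame map; the 4 KB page size and all names here are assumptions for illustration:

    PAGE_SIZE = 4096                 # assumed page / page-frame size in bytes

    free_frames = [0, 1, 2, 3]       # page frames of physical memory still unused
    page_map = {}                    # virtual page number -> allocated page frame

    def allocate_page(virtual_page):
        """Back a virtual page with a free physical page frame."""
        frame = free_frames.pop(0)
        page_map[virtual_page] = frame
        return frame

    allocate_page(5)                 # virtual page 5 -> frame 0
    allocate_page(9)                 # virtual page 9 -> frame 1
    print(page_map)                  # {5: 0, 9: 1}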



Address Translation Mechanisms
• [Virtual to physical] address translation requires use
of a translation map.
• The virtual address can be used with a hash function to
locate the translation map (which is stored in the cache,
an associative memory, or in main memory).
• The translation map is comprised of a translation
lookaside buffer, or TLB (usually in associative memory)
and a page table (or tables). The virtual address is first
sought in the TLB, and if that search succeeds, no further
translation is necessary. Otherwise, the page table(s)
must be referenced to obtain the translation result.
• If the virtual address cannot be translated to a physical
address because the required page is not present in
primary memory, a page fault is reported.
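A sketch of that lookup order (TLB first, then the page table, then a page fault), using plain dictionaries as stand-ins for the real hardware structures; all names and values are illustrative:

    PAGE_SIZE = 4096

    tlb = {7: 42}                     # virtual page -> page frame (recent translations)
    page_table = {7: 42, 8: 13}       # full translation map kept in main memory

    class PageFault(Exception):
        pass

    def translate(virtual_address):
        vpn, offset = divmod(virtual_address, PAGE_SIZE)
        if vpn in tlb:                        # TLB hit: no further translation needed
            frame = tlb[vpn]
        elif vpn in page_table:               # TLB miss: consult the page table(s)
            frame = page_table[vpn]
            tlb[vpn] = frame                  # cache the translation
        else:                                 # page not in primary memory
            raise PageFault(vpn)
        return frame * PAGE_SIZE + offset

    print(hex(translate(8 * PAGE_SIZE + 0x10)))   # page-table path -> 0xd010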



Centralized vs. Distributed Shared Memory

• Centralized shared-memory multiprocessor or
symmetric shared-memory multiprocessor (SMP)
• Multiple processors connected to a single
centralized memory – since all processors see the
same memory organization → uniform memory
access (UMA)
• Shared-memory because all processors can access
the entire memory address space
• Can centralized memory emerge as a bandwidth
bottleneck? – not if you have large caches and
employ fewer than a dozen processors



SMPs or Centralized Shared-Memory



Distributed Memory Multiprocessors
• For higher scalability, memory is distributed among
processors → distributed memory multiprocessors
• If one processor can directly address the memory local to
another processor, the address space is shared → distributed
shared-memory (DSM) multiprocessor
• If memories are strictly local, we need messages to
communicate data → cluster of computers or multicomputers
• Non-uniform memory architecture (NUMA) since local
memory has lower latency than remote memory



Distributed Memory Multiprocessors



Categories of Parallel Computers (MIMD)

• Considering their architecture only, there are two
main categories of parallel computers:
• systems with shared common memories, and
• systems with unshared distributed memories.
Shared-Memory Multiprocessors
• Shared-memory multiprocessor models:
• Uniform-memory-access (UMA)
• Nonuniform-memory-access (NUMA)
• Cache-only memory architecture (COMA)
• These systems differ in how the memory and
peripheral resources are shared or distributed.



Categories of Parallel Computers (MIMD)



The UMA Model
• Physical memory uniformly shared by all processors, with equal access
time to all words.
• Processors may have local cache memories.
• Peripherals also shared in some fashion.
• Tightly coupled systems use a common bus, crossbar, or multistage
network to connect processors, peripherals, and memories.
• Many manufacturers have multiprocessor (MP) extensions of uniprocessor
(UP) product lines.
• Synchronization and communication among processors achieved through shared
variables in common memory.
• Symmetric MP systems – all processors have access to all peripherals, and any
processor can run the OS and I/O device drivers.
• Asymmetric MP systems – not all peripherals accessible by all processors; kernel
runs only on selected processors (master); others are called attached processors
(AP).



The UMA Multiprocessor Model

(Diagram: processors P1 … Pn connected through a system interconnect (bus, crossbar, or multistage network) to I/O and shared-memory modules SM1 … SMm)



Performance Calculation

• Consider two loops. The first loop adds corresponding elements of two N-
element vectors to yield a third vector. The second loop sums elements of
the third vector. Assume each add/assign operation takes 1 cycle, and
ignore time spent on other actions (e.g. loop counter
incrementing/testing, instruction fetch, etc.). Assume interprocessor
communication requires k cycles.
• On a sequential system, each loop will require N cycles, for a total of 2N
cycles of processor time.
• On an M-processor system, we can partition each loop into M parts, each
having L = N / M add/assigns requiring L cycles. The total time required is
thus 2L. This leaves us with M partial sums that must be totaled.
• Computing the final sum from the M partial sums requires l = log2(M)
additions, each requiring k cycles (to access a non-local term) and 1 cycle
(for the add/assign), for a total of l × (k + 1) cycles.
• The parallel computation thus requires
2N / M + (k + 1) log2(M) cycles.



Performance Calculation
• Assume N = 2^20.
• Sequential execution requires 2N = 2^21 cycles.
• If processor synchronization requires k = 200 cycles,
and we have M = 256 processors, parallel execution
requires
2N / M + (k + 1) log2(M)
= 2^21 / 2^8 + 201 × 8
= 2^13 + 1608 = 9800 cycles
• Comparing results, the parallel solution is 214 times
faster than the sequential, with the best theoretical
speedup being 256 (since there are 256 processors).
Thus the efficiency of the parallel solution is 214 /
256 = 83.6 %.
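A short sketch that reproduces the numbers above from the formula 2N/M + (k + 1) log2(M):

    import math

    def parallel_cycles(N, M, k):
        return 2 * N / M + (k + 1) * math.log2(M)

    N, M, k = 2**20, 256, 200
    sequential = 2 * N                       # 2^21 cycles
    parallel = parallel_cycles(N, M, k)      # 8192 + 1608 = 9800 cycles

    speedup = sequential / parallel
    efficiency = speedup / M
    print(f"parallel = {parallel:.0f} cycles, speedup = {speedup:.0f}x, "
          f"efficiency = {efficiency:.1%}")  # ~214x, ~83.6%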



The NUMA Model
• Shared memories, but access time depends on the location of the data
item.
• The shared memory is distributed among the processors as local
memories, but each of these is still accessible by all processors (with
varying access times).
• Memory access is fastest from the locally-connected processor, with the
interconnection network adding delays for other processor accesses.
• Additionally, there may be global memory in a multiprocessor system,
with two separate interconnection networks, one for clusters of
processors and their cluster memories, and another for the global shared
memories.



Shared Local Memories

• The shared memory is physically distributed to all processors as local
memories. The collection of all local memories forms a global address
space accessible by all processors.
• It is faster for a processor to access its own local memory. Access to
remote memory attached to other processors takes longer due to the
added delay through the interconnection network.

(Diagram: local memories LM1 … LMn attached to processors P1 … Pn, connected through an interconnection network)



Hierarchical Cluster Model
• In a hierarchically structured multiprocessor, the processors are divided into
several clusters. Each cluster is itself a UMA or a NUMA multiprocessor.
• The clusters are connected to global shared-memory (GSM) modules. The entire system is
considered a NUMA multiprocessor.
• All processors belonging to the same cluster are allowed to uniformly access the
cluster shared-memory (CSM) modules.

(Diagram: clusters of processors P with cluster shared memories (CSM), each cluster using its own cluster interconnection network (CIN), and all clusters connected through a global interconnect network to the global shared memories (GSM))



The COMA Model
• In the COMA (Cache-Only Memory Architecture) model, processors only
have cache memories; the caches, taken together, form a global address
space.
• Each cache has an associated directory that aids remote machines in their
lookups; hierarchical directories may exist in machines based on this
model.
• Initial data placement is not critical, as cache blocks will eventually
migrate to where they are needed.
(Diagram: processors P, each with a cache C and a directory D, connected by an interconnection network)



Distributed-Memory Multicomputers
• This system consists of multiple computers, often called nodes, inter-
connected by a message-passing network. Each node is an autonomous
computer consisting of a processor, local memory and sometimes
attached disks or I/O peripherals.
• The message-passing network provides point-to-point static connections
among the nodes.
• All local memories are private and are accessible only by local processors.
For this reason, traditional multicomputers have also been called no-
remote-memory-access (NORMA) machines.
• Internode communication is carried out by passing messages through the
static connection network. With advances in interconnection and network
technologies, this model of computing has gained importance.



Distributed-Memory Multicomputers

P P

M M

M P P M
Message-passing
interconnection
network
M P P M

P P

M M
