CSS490 Fundamentals

Textbook Ch1 Instructor: Munehiro Fukuda

These slides were compiled from the course textbook and the reference books.
Winter, 2004 CSS490 Fundamentals 1

Parallel vs. Distributed Systems

Memory
  Parallel systems: Tightly coupled shared memory (UMA, NUMA)
  Distributed systems: Distributed memory (message passing, RPC, and/or use of distributed shared memory)
Control
  Parallel systems: Global clock control (SIMD, MIMD)
  Distributed systems: No global clock control (synchronization algorithms needed)
Processor interconnection
  Parallel systems: Order of Tbps (bus, mesh, tree, mesh of tree, and hypercube(-related) network)
  Distributed systems: Order of Gbps (Ethernet (bus), token ring and SCI (ring), Myrinet (switching network))
Main focus
  Parallel systems: Performance (scientific computing)
  Distributed systems: Performance (cost and scalability), reliability/availability, information/resource sharing

Milestones in Distributed Computing Systems
Period: Milestone (examples)
  1945-1950s: Loading monitor
  1950s-1960s: Batch system
  1960s: Multiprogramming
  1960s-1970s: Time sharing systems (Multics, IBM360)
  1969-1973: WAN and LAN (ARPAnet, Ethernet)
  1960s-early 1980s: Minicomputers (PDP, VAX)
  Early 1980s: Workstations (Alto)
  1980s-present: Workstation/Server models (Sprite, V-system)
  1990s: Clusters (Beowulf)
  Late 1990s: Grid computing (Globus, Legion)

System Models 

  - Minicomputer model
  - Workstation model
  - Workstation-server model
  - Processor-pool model
  - Cluster model
  - Grid computing


Minicomputer Model


[Figure: minicomputers connected via the ARPAnet]

Extension of time sharing system
  - A user must log on to his/her home minicomputer.
  - Thereafter, he/she can log on to a remote machine by telnet.
Resource sharing
  - Database
  - High-performance devices

Workstation Model
[Figure: workstations connected by a 100Gbps LAN]

Process migration
  - A user first logs on to his/her personal workstation.
  - If there are idle remote workstations, a heavy job may migrate to one of them.
Problems:
  - How to find an idle workstation
  - How to migrate a job
  - What if a user logs on to the remote machine

Workstation-Server Model
[Figure: client workstations and server minicomputers (file server, http server, cycle server) connected by a 100Gbps LAN]

Client workstations
  - Diskless workstations
  - Graphic/interactive applications are processed locally.
  - All file, print, http and even cycle computation requests are sent to servers.
Server minicomputers
  - Each minicomputer is dedicated to one or more different types of services.
Client-server model of communication
  - RPC (Remote Procedure Call)
  - RMI (Remote Method Invocation)
  - A client process calls a server process's function.
  - No process migration invoked
  - Example: NFS
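The client-server call pattern above can be sketched in a few lines. This is a minimal in-process illustration, not a real RPC library: the procedure table, the JSON message layout, and the names `server_dispatch` and `ClientStub` are all assumptions made for the example, and the "network" is just a function call.

```python
import json

# Hypothetical server-side procedure; the name is illustrative only.
def add(a, b):
    return a + b

# Table of procedures the server is willing to run on behalf of clients.
PROCEDURES = {"add": add}

def server_dispatch(request_bytes):
    """Unmarshal a request, run the named procedure, marshal the reply."""
    request = json.loads(request_bytes.decode("utf-8"))
    func = PROCEDURES[request["proc"]]
    result = func(*request["args"])
    return json.dumps({"result": result}).encode("utf-8")

class ClientStub:
    """Makes a remote procedure look like a local call."""

    def __init__(self, transport):
        # transport is any callable that takes request bytes and returns
        # reply bytes; here it stands in for the LAN.
        self._transport = transport

    def call(self, proc, *args):
        request = json.dumps({"proc": proc, "args": list(args)}).encode("utf-8")
        reply = self._transport(request)
        return json.loads(reply.decode("utf-8"))["result"]

stub = ClientStub(transport=server_dispatch)
print(stub.call("add", 2, 3))
```

The client only sees `stub.call(...)`; marshalling and dispatch are hidden, and the job itself never moves, which matches "no process migration invoked".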



Processor-Pool Model 

[Figure: terminals and a pool of servers (Server 1 through Server N) connected by a 100Gbps LAN]

Clients:
  - They log in at one of the terminals (diskless workstations or X terminals).
  - All services are dispatched to servers.
Servers:
  - The necessary number of processors is allocated to each user from the pool.
Better utilization but less interactivity


Cluster Model
[Figure: client workstations on a 100Gbps LAN; http servers 1 through N, running on a master node and slave nodes 1 through N, interconnected by a 1Gbps SAN]

Client
  - Takes a client-server model
Server
  - Consists of many PC/workstations connected to a high-speed network.
  - Puts more focus on performance: serves requests in parallel.

Grid Computing 



[Figure: supercomputers, clusters, and minicomputers connected by a high-speed information highway]

Goal
  - Collect computing power of supercomputers and clusters sparsely located over the nation and make it available as if it were the electric grid
Distributed Supercomputing
  - Very large problems needing lots of CPU, memory, etc.
High-Throughput Computing
  - Harnessing many idle resources
On-Demand Computing
  - Remote resources integrated with local computation
Data-intensive Computing
  - Using distributed data
Collaborative Computing
  - Support communication among multiple parties

Reasons for Distributed Computing Systems 

Inherently distributed applications
  - Distributed DB, worldwide airline reservation, banking system
Information sharing among distributed users
  - CSCW or groupware
Resource sharing
  - Sharing DB/expensive hardware and controlling remote lab. devices
Better cost-performance ratio / Performance
  - Emergence of Gbit network and high-speed/cheap MPUs
  - Effective for coarse-grained or embarrassingly parallel applications
Reliability
  - Non-stopping (availability) and voting features
Scalability
  - Loosely coupled connection and hot plug-in
Flexibility
  - Reconfigure the system to meet users' requirements

Network vs. Distributed Operating Systems

SSI (Single System Image)
  Network OS: No (ssh, sftp, no view of remote memory)
  Distributed OS: Yes (process migration, NFS, DSM (distributed shared memory))
Autonomy
  Network OS: High (local OS at each computer, no global job coordination)
  Distributed OS: Low (a single system-wide OS, global job coordination)
Fault Tolerance
  Network OS: Unavailability grows as faulty machines increase.
  Distributed OS: Unavailability remains small even if faulty machines increase.

Issues in Distributed Computing System

Transparency (=SSI) 

Access transparency
  - Memory access: DSM
  - Function call: RPC and RMI
Location transparency
  - File naming: NFS
  - Domain naming: DNS (Still location concerned.)
Migration transparency
  - Automatic state capturing and migration
Concurrency transparency
  - Event ordering: Message delivery and memory consistency
Other transparency:
  - Failure, Replication, Performance, and Scaling

Issues in Distributed Computing System


Reliability

Faults
  - Fail stop
  - Byzantine failure
Fault avoidance
  - The more machines involved, the less avoidance capability
Fault tolerance
  - Redundancy techniques
      - K-fault tolerance needs K + 1 replicas.
      - K-Byzantine failures need 2K + 1 replicas.
  - Distributed control
      - Avoiding a complete fail stop
Fault detection and recovery
  - Atomic transaction
  - Stateless servers
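The replica counts above can be illustrated with a small voting sketch. With fail-stop faults one surviving replica is enough, so K + 1 replicas tolerate K failures. With Byzantine faults a faulty replica may answer wrongly, so the correct replies must form a strict majority: out of 2K + 1 replies, at most K are wrong and at least K + 1 agree. The function names below are hypothetical and the replies are plain values rather than network messages.

```python
from collections import Counter

def replicas_needed(k, byzantine=False):
    """Number of replicas needed to tolerate k faulty replicas."""
    return 2 * k + 1 if byzantine else k + 1

def majority_vote(replies):
    """Return the value reported by a strict majority of replicas."""
    value, count = Counter(replies).most_common(1)[0]
    if count > len(replies) // 2:
        return value
    raise ValueError("no strict majority among the replies")

# K = 2 Byzantine replicas out of 2K + 1 = 5: the three correct
# replies (42) still outvote the two arbitrary ones.
replies = [42, 42, 42, 7, 99]
print(replicas_needed(2, byzantine=True), majority_vote(replies))
```

If a third replica also answered wrongly, no value would reach a strict majority and `majority_vote` would raise, which is why 2K replicas are not enough.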


Flexibility

  - Ease of modification
  - Ease of enhancement

[Figure: with a monolithic kernel (Unix), user applications on each machine run on top of a monolithic kernel; with a microkernel (Mach), user applications and daemons (file, name, paging) run on top of a microkernel; the machines are connected by a network]

Performance/Scalability

Unlike parallel systems, distributed systems involve OS intervention and a slow network medium for data transfer.
  - Send messages in a batch: avoid OS intervention for every message transfer.
  - Cache data: avoid repeating the same data transfer.
  - Minimize data copying: avoid OS intervention (= zero-copy messaging).
  - Avoid centralized entities and algorithms: avoid network saturation.
  - Perform post operations on the client side: avoid heavy traffic between clients and servers.
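The first item, sending messages in a batch, can be sketched as follows. This is only an illustration under stated assumptions: `BatchingSender` is a hypothetical class, and its `send` callable stands in for one costly OS/network transfer.

```python
class BatchingSender:
    """Buffers messages and hands them to `send` one batch at a time."""

    def __init__(self, send, batch_size):
        self._send = send              # called once per batch (one transfer)
        self._batch_size = batch_size
        self._pending = []

    def submit(self, message):
        self._pending.append(message)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self):
        """Send whatever is buffered, if anything."""
        if self._pending:
            self._send(list(self._pending))
            self._pending.clear()

# Record each transfer instead of touching a real network.
transfers = []
sender = BatchingSender(send=transfers.append, batch_size=4)
for i in range(10):
    sender.submit(i)
sender.flush()  # push out the final partial batch

print(len(transfers))
```

Ten messages cost three transfers here instead of ten. The trade-off is added latency for messages that wait in the buffer.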



Heterogeneity

Data and instruction formats depend on each machine architecture.
  - If a system consists of K different machine types, we need K-1 pieces of translation software on each machine.
  - If we have architecture-independent standard data/instruction formats, each machine needs only one piece of translation software, to and from the standard.
  - Example: Java and the Java virtual machine
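The idea of an architecture-independent standard format can be sketched with Python's `struct` module. The record layout (one 4-byte integer and one 8-byte double) and the function names are assumptions for illustration; the `!` prefix selects network (big-endian) byte order with fixed sizes, so the bytes are the same whatever the sender's native format.

```python
import struct

# Assumed standard wire format: network byte order,
# 4-byte signed int followed by an 8-byte IEEE 754 double.
RECORD_FORMAT = "!id"

def to_standard(count, value):
    """Translate native values into the standard wire format."""
    return struct.pack(RECORD_FORMAT, count, value)

def from_standard(data):
    """Translate the standard wire format back into native values."""
    return struct.unpack(RECORD_FORMAT, data)

wire = to_standard(7, 2.5)
print(len(wire), from_standard(wire))
```

Each machine type implements only this one pair of translators, instead of one translator per other machine type.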


Security

Lack of a single point of control
Security concerns:
  - Messages may be stolen by an intruder.
  - Messages may be plagiarized by an intruder.
  - Messages may be changed by an intruder.
Cryptography is the only known practical method.
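As one hedged illustration of a cryptographic defense, the sketch below addresses only the third concern, a message changed in transit, using a keyed hash (HMAC) from the Python standard library. The key and message are made-up values, and how the two parties agree on the key is assumed away. Hiding the contents from an intruder would additionally require encryption, which is not shown.

```python
import hashlib
import hmac

def sign(key, message):
    """Compute an authentication tag over the message with a shared key."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(key, message, tag):
    """Check the tag in constant time; False means the message was altered."""
    return hmac.compare_digest(sign(key, message), tag)

key = b"shared-secret-for-illustration"   # hypothetical pre-shared key
message = b"transfer 100 to account 42"   # hypothetical message
tag = sign(key, message)

print(verify(key, message, tag))
print(verify(key, b"transfer 900 to account 42", tag))
```

The receiver recomputes the tag with the shared key. An intruder who alters the message cannot produce a matching tag without that key.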

Distributed Computing Environment
[Figure: DCE architecture. Components shown:]
  - DCE Applications
  - Threads
  - RPC
  - Distributed Time Service
  - Name
  - Security
  - Distributed File Service
  - Various operating systems and networking

Exercises (No turn-in)
1. In what respect are distributed computing systems superior to parallel systems?
2. In what respect are parallel systems superior to distributed computing systems?
3. Discuss the difference between the workstation-server and the processor-pool model from the availability viewpoint.
4. Discuss the difference between the processor-pool and the cluster model from the performance viewpoint.
5. What is a Byzantine failure? Why do we need 2K + 1 replicas for this type of failure?
6. Discuss the pros and cons of a microkernel.
7. Why can we avoid OS intervention by zero copy?
