Distributed System Patterns

Lesson 02: Distributed System Patterns
Phillip J. Windley, Ph.D.
CS462 – Large-Scale Distributed Systems

Contents
00 Coupling
01 Distributed System Design Axes
02 Rethinking Distributed
03 Distributed Architectures
04 Conclusion

Coupling
One of the most important properties of a distributed system is how tightly or loosely coupled the processing is.

Coupling
Coupling refers to the degree to which two or more processes are interdependent.
We say two things are tightly coupled when they are
interdependent and loosely coupled when they are independent.
Tight and loose coupling are not binary positions, but rather
relative terms. Any given architectural choice might make two
processes more or less coupled.

Distributed Systems and Loose Coupling
Generally, when designing distributed systems, anything that makes the system more loosely coupled is desirable. But no useful software system can be built where all the components are completely decoupled.

If two processes are completely independent they can successfully operate without any kind of coordination with the other process.
Whenever we introduce dependencies, the processes must coordinate their activities. This requires some kind of communication between the two processes. This can result in wasted computation, delays due to latency, and computational errors.
Of course, we can’t accomplish most interesting computations without some coupling. Good distributed architectures accomplish their tasks with a minimum of coupling. Good distributed system architects choose architectures that minimize the coupling necessary to get the job done.

System Level Coupling

Coupling can occur on multiple levels within a system. The following table gives some of the choices that can be made at different levels of a system that increase or decrease the level of coupling.

Level                        More Tightly Coupled     More Loosely Coupled
Physical connection          Direct                   Intermediary
Communication style          Synchronous              Asynchronous
Type system                  Strong type system       Weak type system
Interaction pattern          Remote procedure call    Messages
Process logic control        Central design           Independent teams
Data schema                  Normalized               Denormalized
Service discovery/binding    Static                   Dynamic
Platform dependencies        Dependent/specified      Independent/guided

Adapted from Enterprise SOA: Service-Oriented Architecture Best Practices

Types of Coupling - Logical
In Event-Based Programming, Ted Faison identifies three flavors of coupling: logical, type, and signature.

Logical coupling occurs when two processes share information or make assumptions about the other. When this happens, one process can have an effect on the other even when they share no data.
For example, suppose both processes share an algorithm for calculating sales tax. Changes to one process will affect the other despite the fact that there is no computational artifact (code, API, etc.) that links them. This can occur even if one makes assumptions about how the other computes sales tax.
Logical coupling can be and should be avoided because it adds nothing to the computation and is a potential source of logical errors and system failure.
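
The sales tax example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `OrderingProcess` and `BillingProcess` and the 7% and 8% rates are hypothetical and do not come from the lesson. Each process carries its own copy of the tax rule, so no code or API links them, yet changing one silently breaks agreement with the other.

```python
from decimal import Decimal

CENT = Decimal("0.01")


class OrderingProcess:
    """Hypothetical process that embeds its own copy of the tax rule."""

    def __init__(self, tax_rate: Decimal = Decimal("0.07")) -> None:
        self.tax_rate = tax_rate

    def total(self, subtotal: Decimal) -> Decimal:
        return (subtotal * (1 + self.tax_rate)).quantize(CENT)


class BillingProcess:
    """Hypothetical process that assumes the same rule, with no shared artifact."""

    def __init__(self, tax_rate: Decimal = Decimal("0.07")) -> None:
        self.tax_rate = tax_rate

    def total(self, subtotal: Decimal) -> Decimal:
        return (subtotal * (1 + self.tax_rate)).quantize(CENT)


def totals_agree(ordering: OrderingProcess, billing: BillingProcess,
                 subtotal: Decimal) -> bool:
    """True only while both processes happen to apply the same rule."""
    return ordering.total(subtotal) == billing.total(subtotal)
```

Nothing in the code forces the two rates to stay in step, which is exactly why this kind of coupling is easy to miss.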

Types of Coupling - Type
Type coupling occurs when one process references the external interface or (worse) internal model of another.

Type coupling is one of the most common forms of coupling in distributed systems. External interfaces abound and distributed systems use those interfaces to make requests of and give commands to other processes.
External interfaces, called APIs, provide a contract that defines the interprocess communication. Coupling occurs because the calling process has to know the syntax and semantics of the API and is thus dependent on it.
Another, more insidious, form of type coupling occurs when processes share a data model. For example, two processes may have direct access to a database and use it as shared memory, linking them. Shared memory requires careful and complex coordination and is often a source of logical coupling through shared semantics.

Types of Coupling - Platform
Platform coupling is a special case of type coupling. Platforms
include not only popular language platforms like the Java JVM and
the .Net CLR, but also frameworks for other languages. Platforms
create coupling through common components and assumed or
defined interface patterns.
This kind of coupling can provide advantages in the form of
reduced programming effort and abstractions to common
interaction patterns.
But relying on platforms to solve interaction patterns also limits the
kinds of processes that can participate. As an example, if you build
a distributed system on the JVM and use RMI for interprocess
communication, processes implemented in non-JVM-based
languages can’t easily interact with the system as first-class
citizens.

Types of Coupling - Signature
Signature coupling occurs when one process indirectly and dynamically causes another process to take action.*

For distributed systems, the primary difference between type and signature coupling is that the latter refers to run-time considerations. For example, a service you depend on may be down or non-performant.
Messaging interfaces are also a form of signature coupling since they are more dynamic than request-response or RPC-style interfaces. Messages don’t require the same level of interface dependency as an API.
In general, signature coupling is preferable to type coupling since two processes coupled by messaging know much less about the internal semantics and capabilities of the other. Dynamic interaction also allows for programmatic self-healing of faults, a desirable property of distributed systems.

* Faison defines signature coupling relative to object-oriented programming. I’ve recast it in light of more general-purpose distributed system concepts.
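
To make the contrast with API-style coupling concrete, here is a minimal in-process sketch of messaging. It assumes a hypothetical `MessageBus` class and topic names invented for illustration; a real system would use a message broker rather than a Python dictionary. The sender knows only a topic name and a message shape, not which processes receive the message or what they do with it.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Message = dict
Handler = Callable[[Message], None]


class MessageBus:
    """Hypothetical in-process bus standing in for a real message broker."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # Receivers attach themselves at run time; the sender is not changed.
        self._handlers[topic].append(handler)

    def publish(self, topic: str, message: Message) -> int:
        """Deliver to every current subscriber; return how many handled it."""
        delivered = 0
        for handler in list(self._handlers.get(topic, [])):
            try:
                handler(message)
                delivered += 1
            except Exception:
                # A failing receiver does not stop delivery to the others.
                continue
        return delivered
```

Because subscribers are bound dynamically, a receiver can be added, removed, or replaced without touching the publisher. Catching a failed handler and carrying on is one simple, assumed way a system might tolerate a faulty participant.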

Design Axes
The term distributed system can refer to systems with vastly different design choices.

Distributed Systems Use Many Processes
Distributed systems are made from two or more processes that are interconnected to achieve a specific purpose. Design involves choosing the process architecture, their interconnections, and controlling data.

We saw in Why Distributed Systems?[01] that a distributed system is one made up of a number of processes that are coordinating their efforts to achieve a specific goal or offer a specific service.
The key principle is that even though multiple processes are being used, to the end user of the computation, it appears as if a single computer were being used.
Achieving this requires careful design. There are myriad design choices a distributed system architect can follow in achieving precise system goals.

Where is the processing done?
What do we mean by process? We’ve been circumspect to this point, merely referring to processes. A process can be many things.
A process might be an entire CPU in an embedded system with a
simple, single-threaded OS. A process might be an OS process or
thread. And of course, they can be further virtualized by things like
hypervisors or the Java Virtual Machine (JVM).
Processes can be running next to each other on the same CPU, on
different cores on the same chip, or spread out across machines in
the same data center or around the world.

How are processes interconnected?
Put another way, what is the topology of the distributed system?
We can arrange the processing so that it flows along a pipeline of processes. We can have a central controller yielding a star topology. We can arrange processing in a stack where communication flows up and down. We might create a hierarchy or tree. We can fully connect each of the processes so that they can all speak together.
None of these are inherently better than the others. The best
choice depends on the problem that you’re solving.
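
One way to compare topologies is to write each one down as a list of links between processes. The sketch below does that for three of the arrangements mentioned above; the function names and node labels are illustrative assumptions, not part of the lesson.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Link = Tuple[str, str]


def pipeline(nodes: Sequence[str]) -> List[Link]:
    """Each process is linked only to the next one in line."""
    return list(zip(nodes, nodes[1:]))


def star(center: str, leaves: Sequence[str]) -> List[Link]:
    """A central controller is linked to every other process."""
    return [(center, leaf) for leaf in leaves]


def fully_connected(nodes: Sequence[str]) -> List[Link]:
    """Every process is linked directly to every other process."""
    return list(combinations(nodes, 2))
```

For n processes, a pipeline or a star needs n - 1 links, while a fully connected arrangement needs n(n - 1)/2, which is one reason the right choice depends on the problem.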

What is the communication style?
There are numerous ways that processes can exchange information. Each has implications for scalability, reliability, and maintainability of the system.
RPC, or remote procedure call, is the most straightforward, but creates tight coupling that can affect system performance and code understandability.
Request-response is the primary communication style of the Web and its underlying HTTP protocol. Request-response is typically synchronous. Request-response and RESTful are often treated as synonymous, but we’ll see they’re different.
Messaging is the most general and can be synchronous or
asynchronous. Events are special types of messages that notify of
a state change.
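
The difference between a synchronous call and asynchronous messaging can be sketched with Python's standard `queue` and `threading` modules. This is an in-process stand-in under stated assumptions: the function names (`handle`, `call_sync`, `start_worker`, `send_async`) are hypothetical, and real processes would communicate over a network rather than through in-memory queues.

```python
import queue
import threading


def handle(request: str) -> str:
    """The work done by the receiving process (illustrative only)."""
    return request.upper()


def call_sync(request: str) -> str:
    # Synchronous style: the caller blocks until the reply is available.
    return handle(request)


def start_worker(inbox: queue.Queue, outbox: queue.Queue) -> threading.Thread:
    """Start a receiver that processes messages whenever they arrive."""
    def run() -> None:
        while True:
            message = inbox.get()
            if message is None:  # sentinel: stop the worker
                break
            outbox.put(handle(message))

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker


def send_async(inbox: queue.Queue, message: str) -> None:
    # Asynchronous style: enqueue the message and return immediately.
    inbox.put(message)
```

With `send_async` the sender continues without waiting, and any results are collected later from the outbox. The price is that the sender must cope with replies that arrive late or not at all.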

Where is information stored?
Distributed systems are significantly simpler when they don’t have to worry about state or information storage. Of course, most interesting systems have state.
Parallel processing achieves impressive performance by carefully
managing where information is stored and when. In addition, many
parallel processing processes are stateless.
Google, Facebook, and other large systems split stored information
into logical chunks (called sharding) and route requests to the
process with the right chunk.
Microservice architectures denormalize data to locate it with the
processes that use it.
Poor information management leads to poor performance and tight
coupling.
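
A minimal sketch of sharding, assuming a simple hash-modulo routing rule: each key is hashed to pick the chunk that owns it, so a request for that key always goes to the same place. The `ShardedStore` class and the key format are hypothetical, and the sketch makes no claim about how any particular large system routes requests.

```python
import hashlib
from typing import Any, Dict, List, Optional


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Hypothetical store; each dict stands in for a separate storage process."""

    def __init__(self, num_shards: int) -> None:
        self._shards: List[Dict[str, Any]] = [{} for _ in range(num_shards)]

    def put(self, key: str, value: Any) -> None:
        self._shards[shard_for(key, len(self._shards))][key] = value

    def get(self, key: str) -> Optional[Any]:
        return self._shards[shard_for(key, len(self._shards))].get(key)

    def shard_sizes(self) -> List[int]:
        return [len(shard) for shard in self._shards]
```

A stable hash is used instead of Python's built-in `hash()` because the latter is randomized per interpreter run for strings, so separate processes would disagree about where a key lives. Note that hash-modulo routing moves most keys when the number of shards changes, which is a known limitation of this simple scheme.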

Rethinking Distributed
The common nomenclature for relating distributed and decentralized computing is limited. Let’s redefine it.

Centralized, Decentralized, & Distributed
Traditionally centralized, decentralized, and distributed topologies have been arranged linearly as shown above. Occasionally you’ll see distributed and decentralized swapped.
This section presents a different view of these important concepts.

Distributed Doesn’t Belong with Centralized
A better way to think about the categorization is to use different axes.
Centralized and decentralized are opposites,
indicating the degree to which the components
are under the control of a single entity or multiple
entities. A central control point could be logical or
abstract so long as it is able to effectively
coordinate nodes in the system.
The second axis measures the degree to which
the components are co-located or distributed. Co-
location could be either physical or logical
depending on the context and level of
abstraction.

Introducing Heterarchy
Hierarchy is a familiar word that is used frequently to describe different kinds of organization. Heterarchy refers to a related, familiar concept, even if it is an unfamiliar word.

Hierarchy is an arrangement of items where any given item can be above or below others. Hierarchy depends on ranking and invokes concepts such as superior, inferior, subordinate, order, rank, and level.
In contrast, a heterarchy is an arrangement where items are not ranked and all, theoretically, play an equal role: they are peers. Peers in a heterarchy may be related and connected to each other in different ways.
Most interesting systems, like social and biological systems, are mixtures of hierarchy and heterarchy. For example, cities are largely heterarchical in organization, but contain many hierarchical structures, including the city government, businesses, and families.

A Third Way to Classify
Computer processes can be arranged hierarchically or heterarchically.
Heterarchical and hierarchical are different than
distributed, co-located, centralized, or
decentralized, providing a third axis in our
classification matrix.
Computer systems can be centralized and yet
heterarchical (e.g. Facebook’s OpenGraph
model) or decentralized and yet hierarchical (e.g.
DNS). We’ll explore these concepts more in
future lessons.

Heterarchical Computing
Computing processes can be arranged as peers, in a heterarchy.

Heterarchical organization of a distributed system depends on processes that are not hardwired for a particular function. Different nodes in the computation should be able to perform different functions depending on what needs to be done.
Heterarchy also requires the freedom to bypass. Specifically, if the connections between the nodes make some nodes intermediaries or gatekeepers, then the nodes in the system cannot perform as peers. Any intermediaries must be transparent to the participants in the computation.
Heterarchical systems are sometimes called peer-to-peer or fully connected.

Interconnection Patterns
The way processes communicate has a big impact on performance, reliability, and scalability.

Pipelines and Trees
Parallel computation performs the same operations on similarly structured data. Processes performing these operations have limited, structured communication needs.
Parallel computation thus lends itself to process organization as some form of tree or pipeline. Supercomputers and big data computations are usually organized as pipelines or trees.
If you’ve played with UNIX pipes and filters, you probably realize
that many of the filters can execute concurrently (although they
often don’t) to quickly achieve the desired result.
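
The pipes-and-filters structure can be sketched with Python generators, where each stage consumes the output of the previous one. The stage names below are illustrative assumptions loosely modeled on UNIX tools. Note that generators run in a single thread, interleaving work one item at a time, so this shows the pipeline structure rather than true concurrent execution.

```python
from typing import Iterable, Iterator


def read_lines(text: str) -> Iterator[str]:
    """Source stage: emit one line at a time."""
    yield from text.splitlines()


def grep(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Filter stage: pass along only the lines containing the pattern."""
    return (line for line in lines if pattern in line)


def to_upper(lines: Iterable[str]) -> Iterator[str]:
    """Transform stage: upper-case each line it receives."""
    return (line.upper() for line in lines)
```

Stages are composed by nesting, for example `to_upper(grep("err", read_lines(log_text)))`. Each stage needs to know only the shape of the items flowing through it, which is the limited, structured communication described above.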

Client-Server
One of the most widely used distributed architectures is the client-server architecture. Mobile computing is the latest incarnation of this model.
The server is accessed by many remotely-located clients. The server provides access to resources like data and processing. The server is often responsible for application workflow and the model integrity of that workflow.
The client may do very little processing, called thin-client, or may be responsible for the bulk of the application logic, called thick-client.
A weather app, for example, may be primarily about display and some user interaction. The hard work of producing the weather forecast is done somewhere else.
A camera app, on the other hand, is doing most of the work on the local machine with only ancillary remote interactions.

Multi-Tiered Architecture
Multi-tiered architectures are an extension of the client-server model, expanding the server into distributed parts.
[Diagram labels: Browser, API, Presentation Tier, Application Tier, Data Tier]
The tiers represent key components of the model-view-controller model, with each part having its own layer.
The processes in each tier can be optimized and configured for a specific task. The computers supporting them can be sized differently. Scaling, management, security, and reliability can be handled differently at each tier.
Tiered architectures must be carefully built to maintain performance since each layer adds latency because of increased network and overhead delays.
Within a given tier, processes might be distributed across machines or across different data centers. The usual way of scaling within a tier involves some form of horizontal scaling.
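
The point about latency can be made concrete with a rough back-of-the-envelope model. It assumes that a request passes through every tier in order and that the reply returns along the same path, with a fixed network delay per hop. The function name and all the millisecond figures are hypothetical; real systems add queuing, serialization, and retries that this ignores.

```python
from typing import Sequence


def end_to_end_latency_ms(tier_processing_ms: Sequence[int], hop_ms: int) -> int:
    """Estimate the latency of one request under the simple model above.

    tier_processing_ms: assumed processing time spent in each tier.
    hop_ms: assumed one-way network delay between adjacent layers.
    """
    one_way_hops = len(tier_processing_ms)  # one hop into each tier
    return sum(tier_processing_ms) + 2 * one_way_hops * hop_ms
```

With three tiers taking 5, 20, and 10 ms and 2 ms per hop, the estimate is 35 + 12 = 47 ms. Adding a fourth tier that takes 10 ms raises it to 45 + 16 = 61 ms, so the new layer costs its own processing time plus two more hops.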

Horizontal Scaling
Horizontal scaling is applicable where processes are parametrically similar.
For example, Web servers can be scaled by making copies of the Web server and routing requests via a load balancer to any one of N servers. Scaling involves adding more servers of the same type.
Horizontal scaling sometimes involves routing requests with specific properties to a specific server. For example, Web servers frequently do session pinning, where a specific user, identified by session ID, is sent to the same server on subsequent requests.
Similarly, sharding sends data requests to a specific database server based on specific properties of the data request.
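
A minimal sketch of the two routing behaviors described above, assuming a hypothetical `LoadBalancer` class and made-up server names. Requests without a session are spread round-robin across the servers. Requests with a session ID are pinned by hashing the ID, which is one assumed way to implement pinning; production load balancers often use cookies or lookup tables instead.

```python
import hashlib
from itertools import cycle
from typing import Optional, Sequence


class LoadBalancer:
    """Hypothetical balancer over N interchangeable servers."""

    def __init__(self, servers: Sequence[str]) -> None:
        self._servers = list(servers)
        self._round_robin = cycle(self._servers)

    def route(self, session_id: Optional[str] = None) -> str:
        if session_id is None:
            # No session: any server will do, so take the next one in turn.
            return next(self._round_robin)
        # Session pinning: the same session ID always maps to the same server.
        digest = hashlib.sha256(session_id.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._servers)
        return self._servers[index]
```

Scaling out under this sketch means constructing the balancer with a longer server list. Pinned sessions may then land on a different server, which is one reason session state is often kept outside the Web servers.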

Peer to Peer
The nodes in a peer-to-peer (P2P) system are not distinguished from each other in capability or interconnectedness.
That doesn’t imply they might not play different roles or run
different algorithms.
The Internet (not the Web) is the largest single example of a P2P
system. Any node on the Internet can communicate with any other
node.
Building P2P systems without any central coordinator is a
challenging task and will comprise a significant portion of this
class.


Conclusion

Summary and Review
This lesson has looked at several important concepts that will guide us in our explorations.
Coupling is at the heart of what makes distributed systems hard to build and tricky to operate. No useful system can avoid it, but we can reduce it.
There are multiple design axes that affect the behavior, performance, scalability, and cost of a distributed system. We will use these as we design distributed systems.
The word distributed is used as a catchall for systems that are decentralized, heterarchical, or merely distributed. Yet these terms mean very different things. We can design systems that have any or all of these properties.
In doing so there are numerous interconnection patterns we can choose from. This choice has significant impact on the operation of our system. Consequently, we’ll review each of the architectural patterns (except for the multi-tier model) in detail in coming lessons.

Credits
Photos and Diagrams:
- Tied Shoelaces (https://www.flickr.com/photos/sobriquet/314024495), CC BY-NC-SA 2.0
- Computer Scrap (https://www.flickr.com/photos/investingingold/7361094822), CC BY 2.0
- Fully-Connected Topology (https://commons.wikimedia.org/wiki/File:NetworkTopology-FullyConnected.png), Public Domain
- 3D Tin Can Phones (https://www.flickr.com/photos/86530412@N02/8210762750/), CC BY 2.0
- Stack of HDD’s (https://www.flickr.com/photos/ervins_strauhmanis/9961198294), CC BY 2.0
- Centralized, decentralized, distributed (https://en.wikipedia.org/wiki/File:Centralised-decentralised-distributed.png), CC BY-SA 3.0
- iOS 8 Icons (https://www.flickr.com/photos/microsiervos/15398845851), CC BY 2.0
- Chord network (https://commons.wikimedia.org/wiki/File:Chord_network.png), CC BY-SA 3.0
