
Lesson 02: Distributed System Patterns
Phillip J. Windley, Ph.D.
CS462 – Large-Scale Distributed Systems

Contents

00 Coupling
01 Distributed System Design Axes
02 Rethinking Distributed
03 Distributed Architectures
04 Conclusion

Coupling

One of the most important properties of a distributed system is how tightly or loosely coupled the processing is.

Coupling

Coupling refers to the degree to which two or more processes are interdependent.

We say two things are tightly coupled when they are interdependent and loosely coupled when they are independent.

Tight and loose coupling are not binary positions, but rather relative terms. Any given architectural choice might make two processes more or less coupled.


Distributed Systems and Loose Coupling

Generally when designing distributed systems anything that makes the system more loosely coupled is desirable. But, no useful software system can be built where all the components are completely decoupled.

If two processes are completely independent they can successfully operate without any kind of coordination with the other process.

Whenever we introduce dependencies, the processes must coordinate their activities. This requires some kind of communication between the two processes. This can result in wasted computation, delays due to latency, and computational errors.

Of course, we can’t accomplish most interesting computations without some coupling. Good distributed architectures accomplish their tasks with a minimum of coupling. Good distributed system architects choose architectures that minimize the coupling necessary to get the job done.


System Level Coupling

Coupling can occur on multiple levels within a system. The following table lists some of the choices that can be made at different levels of a system to increase or decrease the level of coupling.

Level                      More Tightly Coupled    More Loosely Coupled
-------------------------  ----------------------  ----------------------
Physical connection        Direct                  Intermediary
Communication style        Synchronous             Asynchronous
Type system                Strong type system      Weak type system
Interaction pattern        Remote procedure call   Messages
Process logic control      Central design          Independent teams
Data schema                Normalized              Denormalized
Service discovery/binding  Static                  Dynamic
Platform dependencies      Dependent/specified     Independent/guided


​Adapted from Enterprise SOA: Service-Oriented Architecture Best Practices
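
Two rows of the table (physical connection and communication style) can be illustrated with a small sketch. This is only an in-process stand-in, not a real distributed system: `price_service`, `place_order_direct`, and `start_worker` are hypothetical names, and a thread with a `queue.Queue` plays the role of a separate process behind an intermediary.

```python
import queue
import threading


def price_service(order):
    """Hypothetical stand-in for a remote pricing service."""
    return order["quantity"] * order["unit_price_cents"]


def place_order_direct(order):
    """Tighter coupling: a direct, synchronous call.

    The caller blocks until the service answers and fails if it is down.
    """
    return price_service(order)


def start_worker(inbox, results):
    """Looser coupling: the service reads from an intermediary queue."""

    def run():
        while True:
            order = inbox.get()
            if order is None:  # sentinel value: stop the worker
                break
            results.append(price_service(order))

    worker = threading.Thread(target=run)
    worker.start()
    return worker


order = {"quantity": 3, "unit_price_cents": 250}

# Direct and synchronous: the result comes back to the caller immediately.
direct_total = place_order_direct(order)

# Intermediary and asynchronous: the sender only knows about the queue.
inbox = queue.Queue()
results = []
worker = start_worker(inbox, results)
inbox.put(order)  # returns without waiting for the service
inbox.put(None)
worker.join()
```

In the second form the sender neither waits for nor refers to the service, so the service could be replaced, restarted, or scaled without changing the sender.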


Types of Coupling - Logical

In Event-Based Programming, Ted Faison identifies three flavors of coupling: logical, type, and signature.

Logical coupling occurs when two processes share information or make assumptions about the other. When this happens, one process can have an effect on the other even when they share no data.

For example, suppose both processes share an algorithm for calculating sales tax. Changes to one process will affect the other despite the fact that there is no computational artifact (code, API, etc.) that links them. This can occur even if one makes assumptions about how the other computes sales tax.

Logical coupling can be and should be avoided because it adds nothing to the computation and is a potential source of logical errors and system failure.
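
The sales tax example can be sketched as follows. Everything here is hypothetical (the function names, the 7% and 8% rates, the rounding rule); the point is only that two processes carrying private copies of the same rule share no code or API, yet a change to one silently breaks agreement with the other.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def checkout_tax(subtotal):
    """Process A's private copy of the tax rule (hypothetical 7% rate)."""
    return (subtotal * Decimal("0.07")).quantize(CENT, rounding=ROUND_HALF_UP)


def invoicing_tax(subtotal):
    """Process B's private copy of the same rule. Nothing links it to A."""
    return (subtotal * Decimal("0.07")).quantize(CENT, rounding=ROUND_HALF_UP)


def checkout_tax_after_change(subtotal):
    """Process A after its team updates the rate to a hypothetical 8%."""
    return (subtotal * Decimal("0.08")).quantize(CENT, rounding=ROUND_HALF_UP)


subtotal = Decimal("100.00")

# Before the change the two processes agree on the tax owed.
agree_before = checkout_tax(subtotal) == invoicing_tax(subtotal)

# After the change they disagree, and no compiler or interface check
# reports the problem.
agree_after = checkout_tax_after_change(subtotal) == invoicing_tax(subtotal)
```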


Types of Coupling - Type

Type coupling occurs when one process references the external interface or (worse) internal model of another.

Type coupling is one of the most common forms of coupling in distributed systems. External interfaces abound and distributed systems use those interfaces to make requests of and give commands to other processes.

External interfaces, called APIs, provide a contract that defines the interprocess communication. Coupling occurs because the calling process has to know the syntax and semantics of the API and is thus dependent on it.

Another, more insidious, form of type coupling occurs when processes share a data model. For example, two processes may have direct access to a database and use it as shared memory, linking them. Shared memory requires careful and complex coordination and is often a source of logical coupling through shared semantics.
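
A minimal sketch of the shared data model problem, using an in-memory SQLite database as the shared store. The table, column, and function names (`orders`, `total_cents`, `billing_read_total`, `ordering_migrate`) are assumptions made for illustration. One process changes what it considers its own internal model, and the other, which reads the same table directly, breaks.

```python
import sqlite3

# A database that two processes both access directly.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER)")
db.execute("INSERT INTO orders (total_cents) VALUES (1999)")


def billing_read_total(conn, order_id):
    """Billing process: depends on the orders table's column names."""
    row = conn.execute(
        "SELECT total_cents FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    return row[0]


def ordering_migrate(conn):
    """Ordering process: changes its internal model to store dollars."""
    conn.execute(
        "CREATE TABLE orders_v2 (id INTEGER PRIMARY KEY, total_dollars REAL)"
    )
    conn.execute(
        "INSERT INTO orders_v2 SELECT id, total_cents / 100.0 FROM orders"
    )
    conn.execute("DROP TABLE orders")
    conn.execute("ALTER TABLE orders_v2 RENAME TO orders")


total_before = billing_read_total(db, 1)

ordering_migrate(db)

try:
    billing_read_total(db, 1)
    billing_broke = False
except sqlite3.OperationalError:
    # The column billing relied on no longer exists.
    billing_broke = True
```

Had billing asked the ordering process for the total through an API, the migration could have stayed hidden behind that contract.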


Types of Coupling - Platform

Platform coupling is a special case of type coupling. Platforms include not only popular language platforms like the Java JVM and the .NET CLR, but also frameworks for other languages. Platforms create coupling through common components and assumed or defined interface patterns.

This kind of coupling can provide advantages in the form of reduced programming effort and abstractions to common interaction patterns.

But relying on platforms to solve interaction patterns also limits the kinds of processes that can participate. As an example, if you build a distributed system on the JVM and use RMI for interprocess communication, processes implemented in non-JVM-based languages can’t easily interact with the system as first-class citizens.


Types of Coupling - Signature

Signature coupling occurs when one process indirectly and dynamically causes another process to take action.*

For distributed systems, the primary difference between type and signature coupling is that the latter refers to run-time considerations. For example, a service you depend on may be down or non-performant.

Messaging interfaces are also a form of signature coupling since they are more dynamic than request-response or RPC-style interfaces. Messages don’t require the same level of interface dependency as an API.

In general, signature coupling is preferable to type coupling since two processes coupled by messaging know much less about the internal semantics and capabilities of the other. Dynamic interaction also allows for programmatic self-healing of faults, a desirable property of distributed systems.

* Faison defines signature coupling relative to object-oriented programming. I’ve recast it in light of more general-purpose distributed system concepts.



Design Axes

The term distributed system can refer to systems with vastly different design choices.

Distributed Systems Use Many Processes

Distributed systems are made from two or more processes that are interconnected to achieve a specific purpose. Design involves choosing the process architecture, their interconnections, and controlling data.

We saw in Why Distributed Systems?[01] that a distributed system is one made up of a number of processes that are coordinating their efforts to achieve a specific goal or offer a specific service.

The key principle is that even though multiple processes are being used, to the end user of the computation it appears as if a single computer were being used.

Achieving this requires careful design. There are myriad design choices a distributed system architect can follow in achieving precise system goals.


Where is the processing done?

What do we mean by process? We’ve been circumspect to this point, merely referring to processes. A process can be many things.

A process might be an entire CPU in an embedded system with a simple, single-threaded OS. A process might be an OS process or thread. And of course, they can be further virtualized by things like hypervisors or the Java Virtual Machine (JVM).

Processes can be running next to each other on the same CPU, on different cores on the same chip, or spread out across machines in the same data center or around the world.


How are processes interconnected?

Put another way, what is the topology of the distributed system?

We can arrange the processing so that it flows along a pipeline of processes. We can have a central controller, yielding a star topology. We can arrange processing in a stack where communication flows up and down. We might create a hierarchy or tree. We can fully connect the processes so that each can communicate directly with every other.

None of these is inherently better than the others. The best choice depends on the problem that you’re solving.
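
One way to compare some of these topologies is to write each as a list of links between processes. The sketch below does this for the pipeline, star, and fully connected arrangements; the function names and the process labels `p0` to `p3` are illustrative assumptions.

```python
from itertools import combinations


def pipeline(nodes):
    """Each process talks only to the next one in line."""
    return list(zip(nodes, nodes[1:]))


def star(center, leaves):
    """Every process talks only to a central controller."""
    return [(center, leaf) for leaf in leaves]


def fully_connected(nodes):
    """Every process has a direct link to every other process."""
    return list(combinations(nodes, 2))


nodes = ["p0", "p1", "p2", "p3"]

pipeline_links = pipeline(nodes)
star_links = star(nodes[0], nodes[1:])
mesh_links = fully_connected(nodes)
```

For n processes, the pipeline and star each need n - 1 links, while the fully connected arrangement needs n(n - 1)/2, which is one reason the best topology depends on the problem.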


What is the communication style?

There are numerous ways that processes can exchange information. Each has implications for scalability, reliability, and maintainability of the system.

RPC, or remote procedure call, is the most straightforward, but creates tight coupling that can affect system performance and code understandability.

Request-response is the primary communication style of the Web and its underlying HTTP protocol. Request-response is typically synchronous. Request-response and RESTful are often treated as synonymous, but we’ll see they’re different.

Messaging is the most general and can be synchronous or asynchronous. Events are special types of messages that notify of a state change.
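
The idea of an event as a message that announces a state change can be sketched with a tiny publish/subscribe bus. This is an in-process illustration under simplifying assumptions: delivery here is synchronous and in memory, whereas a real system would typically put a network or message broker in between. `EventBus` and the event name `order.shipped` are hypothetical.

```python
from collections import defaultdict


class EventBus:
    """Minimal publish/subscribe hub for illustration only."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        """Register a handler to be told about one kind of state change."""
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        """Notify every subscriber; the publisher does not know who they are."""
        handlers = list(self._subscribers.get(event_type, []))
        for handler in handlers:
            handler(payload)
        return len(handlers)


bus = EventBus()
emails = []
audit_log = []

# Two independent listeners react to the same state change.
bus.subscribe("order.shipped", lambda e: emails.append(e["order_id"]))
bus.subscribe("order.shipped", lambda e: audit_log.append(("shipped", e["order_id"])))

delivered = bus.publish("order.shipped", {"order_id": 17})
ignored = bus.publish("order.cancelled", {"order_id": 17})
```

Unlike an RPC, the publisher does not name a recipient or wait for a particular answer; listeners can be added or removed without changing it.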


Where is information stored?

Distributed systems are significantly simpler when they don’t have to worry about state or information storage. Of course, most interesting systems have state.

Parallel processing achieves impressive performance by carefully managing where information is stored and when. In addition, many of the processes in a parallel computation are stateless.

Google, Facebook, and other large systems split stored information into logical chunks (called sharding) and route requests to the process with the right chunk.

Microservice architectures denormalize data to locate it with the processes that use it.

Poor information management leads to poor performance and tight coupling.
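
Routing a request to "the process with the right chunk" can be sketched as hashing a key to pick a shard. This is one simple scheme among several and is not a description of how Google or Facebook do it; the shard names and key format are assumptions.

```python
import hashlib

SHARDS = ["db-0", "db-1", "db-2", "db-3"]  # hypothetical shard servers


def shard_for(key, shard_count):
    """Map a key to a shard index using a stable hash.

    hashlib is used instead of the built-in hash() because hash() of a
    string varies between Python processes.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def route(key):
    """Return the shard server responsible for this key."""
    return SHARDS[shard_for(key, len(SHARDS))]


first = route("user:alice")
second = route("user:alice")  # the same key always lands on the same shard
```

A known weakness of plain modulo hashing is that changing the number of shards reassigns most keys, which is why larger systems often use schemes such as consistent hashing instead.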



Rethinking Distributed

The common nomenclature for relating distributed and decentralized computing is limited. Let’s redefine it.

Centralized, Decentralized, & Distributed

Traditionally centralized, decentralized, and distributed topologies have been arranged linearly as shown above. Occasionally you’ll see distributed and decentralized swapped.

This section presents a different view of these important concepts.


Distributed Doesn’t Belong with Centralized

A better way to think about the categorization is to use different axes.

Centralized and decentralized are opposites, indicating the degree to which the components are under the control of a single entity or multiple entities. A central control point could be logical or abstract so long as it is able to effectively coordinate nodes in the system.

The second axis measures the degree to which the components are co-located or distributed. Co-location could be either physical or logical depending on the context and level of abstraction.


Introducing Heterarchy

Hierarchy is a familiar word that is used frequently to describe different kinds of organization. Heterarchy refers to a related, familiar concept, even if it is an unfamiliar word.

Hierarchy is an arrangement of items where any given item can be above or below others. Hierarchy depends on ranking and invokes concepts such as superior, inferior, subordinate, order, rank, and level.

In contrast, a heterarchy is an arrangement where items are not ranked and all, theoretically, play an equal role: they are peers. Peers in a heterarchy may be related and connected to each other in different ways.

Most interesting systems, like social and biological systems, are mixtures of hierarchy and heterarchy. For example, cities are largely heterarchical in organization, but contain many hierarchical structures, including the city government, businesses, and families.


A Third Way to Classify

Computer processes can be arranged hierarchically or heterarchically.

Whether a system is heterarchical or hierarchical is a separate question from whether it is distributed or co-located, or centralized or decentralized. It provides a third axis in our classification matrix.

Computer systems can be centralized and yet heterarchical (e.g. Facebook’s OpenGraph model) or decentralized and yet hierarchical (e.g. DNS). We’ll explore these concepts more in future lessons.
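
As a rough sketch, the three axes can be written down as independent fields of a record. The lesson describes each axis as a matter of degree; the two-valued enums below are a simplification, and all type and field names are assumptions. The location of the two examples is left unset because the lesson does not state it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Control(Enum):
    CENTRALIZED = "centralized"
    DECENTRALIZED = "decentralized"


class Location(Enum):
    CO_LOCATED = "co-located"
    DISTRIBUTED = "distributed"


class Arrangement(Enum):
    HIERARCHICAL = "hierarchical"
    HETERARCHICAL = "heterarchical"


@dataclass(frozen=True)
class Classification:
    name: str
    control: Control
    arrangement: Arrangement
    location: Optional[Location] = None  # None: not stated in the lesson


EXAMPLES = [
    Classification(
        "Facebook OpenGraph model", Control.CENTRALIZED, Arrangement.HETERARCHICAL
    ),
    Classification("DNS", Control.DECENTRALIZED, Arrangement.HIERARCHICAL),
]


def describe(c):
    """Summarize where a system sits on each axis that is known."""
    parts = [c.control.value, c.arrangement.value]
    if c.location is not None:
        parts.append(c.location.value)
    return f"{c.name}: " + ", ".join(parts)


summaries = [describe(c) for c in EXAMPLES]
```

Because the fields are independent, any combination of values is expressible, which is the point of treating them as separate axes rather than positions on one line.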


Heterarchical Computing

Computing processes can be arranged as peers, in a heterarchy.

Heterarchical organization of a distributed system depends on processes that are not hard-wired for a particular function. Different nodes in the computation should be able to perform different functions depending on what needs to be done.

Heterarchy also requires the freedom to bypass. Specifically, if the connections between the nodes make some nodes intermediaries or gatekeepers, then the nodes in the system cannot perform as peers. Any intermediaries must be transparent to the participants in the computation.

Heterarchical systems are sometimes called peer-to-peer or fully-connected.



Interconnection Patterns

The way processes communicate has a big impact on performance, reliability, and scalability.

Pipelines and Trees

Parallel computation performs the same operations on similarly structured data. Processes performing these operations have limited, structured communication needs.

Parallel computation thus lends itself to process organization as some form of tree or pipeline. Supercomputers and big data computations are usually organized as pipelines or trees.

If you’ve played with UNIX pipes and filters, you probably realize that many of the filters can execute concurrently (although they often don’t) to quickly achieve the desired result.
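
A tree organization can be illustrated with a pairwise reduction. In the sketch below, the combinations within one level do not depend on each other, so separate processes could compute them in parallel; the code itself runs them sequentially. The function name `tree_reduce` is an assumption.

```python
from operator import add


def tree_reduce(values, combine):
    """Combine values pairwise, level by level, like a tree of processes.

    Returns the final result and the number of levels (the tree depth).
    """
    level = list(values)
    if not level:
        raise ValueError("tree_reduce needs at least one value")
    depth = 0
    while len(level) > 1:
        # Each pair in this level is independent of the others.
        next_level = [
            combine(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2 == 1:
            next_level.append(level[-1])  # odd element passes through
        level = next_level
        depth += 1
    return level[0], depth


total, depth = tree_reduce(range(1, 9), add)
```

Eight inputs are combined in three levels rather than seven sequential steps, which is where the speedup would come from if each level ran in parallel.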


Client-Server

One of the most widely used distributed architectures is the client-server architecture. Mobile computing is the latest incarnation of this model.

The server is accessed by many remotely located clients. The server provides access to resources like data and processing. The server is often responsible for application workflow and the model integrity of that workflow.

The client may do very little processing, called thin-client, or may be responsible for the bulk of the application logic, called thick-client.

A weather app, for example, may be primarily about display and some user interaction. The hard work of producing the weather forecast is done somewhere else.

A camera app, on the other hand, is doing most of the work on the local machine with only ancillary remote interactions.


Multi-Tiered Architecture

Multi-tiered architectures are an extension of the client-server model, expanding the server into distributed parts.

[Figure: tier diagram with the labels Browser, API, Presentation Tier, Application Tier, and Data Tier]

The tiers represent key components of the model-view-controller model, with each part having its own layer.

The processes in each tier can be optimized and configured for a specific task. The computers supporting them can be sized differently. Scaling, management, security, and reliability can be handled differently at each tier.

Tiered architectures must be carefully built to maintain performance since each layer adds latency because of increased network and overhead delays.

Within a given tier, processes might be distributed across machines or across different data centers. The usual way of scaling within a tier involves some form of horizontal scaling.


Horizontal Scaling

Horizontal scaling is applicable where processes are parametrically similar.

For example, Web servers can be scaled by making copies of the Web server and routing requests via a load balancer to any one of N servers. Scaling involves adding more servers of the same type.

Horizontal scaling sometimes involves routing requests with specific properties to a specific server. For example, Web servers frequently do session pinning, where a specific user, identified by session ID, is sent to the same server on subsequent requests.

Similarly, sharding sends data requests to a specific database server based on specific properties of the data request.
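
The load balancer and session pinning described above can be sketched together. This is a minimal in-memory illustration, assuming round-robin selection (the lesson does not specify a policy) and hypothetical server names; a real load balancer would also handle server failures and session expiry.

```python
from itertools import cycle


class LoadBalancer:
    """Round-robin routing with optional session pinning (illustrative)."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._next_server = cycle(servers)
        self._pinned = {}  # session ID -> server

    def route(self, session_id=None):
        """Pick a server; requests with a known session reuse its server."""
        if session_id is not None and session_id in self._pinned:
            return self._pinned[session_id]
        server = next(self._next_server)
        if session_id is not None:
            self._pinned[session_id] = server
        return server


lb = LoadBalancer(["web-1", "web-2", "web-3"])

a = lb.route()        # anonymous request, first server in rotation
b = lb.route("s42")   # new session, pinned to the next server
c = lb.route()        # rotation continues
d = lb.route("s42")   # same session returns to its pinned server
```

Adding capacity in this sketch means passing a longer server list, which matches the idea that scaling involves adding more servers of the same type.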


Peer to Peer

The nodes in a peer-to-peer (P2P) system are not distinguished from each other in capability or interconnectedness.

That doesn’t mean they can’t play different roles or run different algorithms.

The Internet (not the Web) is the largest single example of a P2P system. Any node on the Internet can communicate with any other node.

Building P2P systems without any central coordinator is a challenging task and will comprise a significant portion of this class.



Conclusion

Summary & Review
Credits

Summary and Review

This lesson has looked at several important concepts that will guide us in our explorations.

Coupling is at the heart of what makes distributed systems hard to build and tricky to operate. No useful system can avoid it, but we can reduce it.

There are multiple design axes that affect the behavior, performance, scalability, and cost of a distributed system. We will use these as we design distributed systems.

The word distributed is used as a catchall for systems that are decentralized, heterarchical, or merely distributed. Yet these terms mean very different things. We can design systems that have any or all of these properties.

In doing so there are numerous interconnection patterns we can choose from. This choice has a significant impact on the operation of our system. Consequently, we’ll review each of the architectural patterns (except for the multi-tier model) in detail in coming lessons.


Credits

Photos and Diagrams:

- Tied Shoelaces (https://www.flickr.com/photos/sobriquet/314024495), CC BY-NC-SA 2.0
- Computer Scrap (https://www.flickr.com/photos/investingingold/7361094822), CC BY 2.0
- Fully-Connected Topology (https://commons.wikimedia.org/wiki/File:NetworkTopology-FullyConnected.png), Public Domain
- 3D Tin Can Phones (https://www.flickr.com/photos/86530412@N02/8210762750/), CC BY 2.0
- Stack of HDD’s (https://www.flickr.com/photos/ervins_strauhmanis/9961198294), CC BY 2.0
- Centralized, decentralized, distributed (https://en.wikipedia.org/wiki/File:Centralised-decentralised-distributed.png), CC BY-SA 3.0
- iOS 8 Icons (https://www.flickr.com/photos/microsiervos/15398845851), CC BY 2.0
- Chord network (https://commons.wikimedia.org/wiki/File:Chord_network.png), CC BY-SA 3.0
