
2023

DESIGNING A DISTRIBUTED SYSTEM


ABSTRACT
Designing a distributed system poses numerous challenges for computer science and engineering. This abstract briefly summarizes the difficulties, factors, and important principles involved in the design of distributed systems. Distributed systems are defined by the use of many interconnected nodes, frequently spread across different geographical regions, that cooperate on computation and data sharing. The primary goals in designing such systems are to maximize resource efficiency while ensuring dependability, scalability, and fault tolerance.
The following are some of the basic principles of designing distributed systems covered in this abstract:
1. Architecture Selection: The architectural pattern that is chosen, such as client-server, peer-to-peer, or microservices, is a key factor in determining how the system behaves and performs.

2. Communication Protocols: Distributed components must communicate effectively with one another. Protocols such as HTTP, MQTT, or custom ones must be chosen carefully on the basis of the system's unique requirements.
3. Data Consistency and Replication: Managing data across dispersed nodes while ensuring consistency is one of the main challenges. Replication, sharding, and consensus techniques are used for this.

4. Scalability and load balancing: Planning for growing system load is a part of designing for
scalability. Incoming requests are evenly distributed among nodes using load balancing
techniques.

5. Resource Management: For the best system performance, it's essential to effectively
manage computational resources like CPU, memory, and storage.

6. Coordination: Various distributed algorithms are used to coordinate actions among nodes, including consensus methods (like Paxos or Raft) and distributed locking systems.

7. Testing and Simulation: To assess system behavior in a variety of situations and edge
cases, rigorous testing and simulation are required.
1. ACID: DESIRABLE PROPERTIES OF A TRANSACTION
ACID (Atomicity, Consistency, Isolation, Durability) describes the desired properties of a transaction in database management systems. Each of these properties is briefly described below:
1. Atomicity:
A transaction must be treated as a single, indivisible unit of work in order to be considered atomic. This means that either all of a transaction's modifications are saved to the database, or none of them are: a transaction is all or nothing.
For instance, when money is transferred between two bank accounts, atomicity ensures that the debit from one account and the credit to the other either both take effect or neither does. To preserve the database's consistency, the entire transaction is rolled back if either operation fails.
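As a minimal sketch of atomicity, the following Python snippet uses the standard sqlite3 module; the table and account names are purely illustrative.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

try:
    # Both statements belong to a single transaction: all or nothing.
    conn.execute("UPDATE accounts SET balance = balance - 30 WHERE name = 'alice'")
    conn.execute("UPDATE accounts SET balance = balance + 30 WHERE name = 'bob'")
    conn.commit()      # both changes become permanent together
except sqlite3.Error:
    conn.rollback()    # on any failure, the debit is undone along with everything else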
2. Consistency:
Consistency guarantees that a transaction moves the database from one valid state to another. In other words, it ensures that a transaction will not violate the integrity constraints and rules defined for the database.
For example, if a database constraint requires that each user have a distinct email address, a transaction that tries to insert a new user with an existing email address will not be allowed. This keeps the data consistent.
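A minimal sketch of such an integrity constraint, again using Python's sqlite3 module; the users table is hypothetical.

import sqlite3

conn = sqlite3.connect(":memory:")
# The UNIQUE constraint encodes the rule: every user must have a distinct email.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
conn.commit()

try:
    # A duplicate email violates the constraint, so the database rejects it.
    conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
    conn.commit()
except sqlite3.IntegrityError as exc:
    conn.rollback()
    print("rejected:", exc)   # the data stays in a consistent state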
3. Isolation:
Isolation ensures that one transaction's execution is separate from other transactions' executions: transactions should not interfere with one another. This property guards against issues such as dirty reads, non-repeatable reads, and phantom reads.
For instance, if one transaction is updating a record while another is reading the same record, isolation ensures that the reading transaction sees a consistent, non-intermediate state of the data.
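The following sketch illustrates the idea with two SQLite connections standing in for two concurrent transactions; the schema is hypothetical, and SQLite's default journal mode is assumed.

import os, sqlite3, tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
setup = sqlite3.connect(path)
setup.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
setup.execute("INSERT INTO accounts VALUES (1, 100)")
setup.commit()
setup.close()

writer = sqlite3.connect(path)   # one transaction updates the row...
reader = sqlite3.connect(path)   # ...while another reads it

writer.execute("UPDATE accounts SET balance = 0 WHERE id = 1")   # not committed yet

# The reader sees the last committed value (100), not the in-flight change,
# so no dirty read occurs.
print(reader.execute("SELECT balance FROM accounts WHERE id = 1").fetchone())

writer.rollback()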
4. Durability:
When a transaction is committed, durability ensures that its effects on the database will endure despite any subsequent system failure. In other words, once you receive confirmation that your transaction (for example, a purchase) has completed, you can be sure the changes are permanent.
For instance, if you receive a confirmation message that an email was sent, durability ensures that it will not be lost even if the email server crashes just after you send it.
2. DISTRIBUTED FILE SYSTEM

A distributed file system (DFS) is a file system that runs on several machines and enables users and programs to access and share files as if they were located on a single centralized file system. In networked environments, distributed file systems are frequently used to enable scalable and dependable access to data across several devices. A brief overview of a distributed file system is given below.

The essential features of a distributed file system are:


1. Scalability:
Distributed file systems should be able to handle a large number of users and an expanding volume of data efficiently; their design must take scalability into account.
2. Transparency:
Offering users and applications transparency is one of a networked file system's main
objectives. This means that users and programs shouldn't be aware of the files' actual
locations or the system's inherent complexity.
3. Security:
Security mechanisms, such as access control and encryption, are crucial to protect data in
a distributed environment.
4. Fault Tolerance:
DFSs are designed to be fault-tolerant: even if some of the network's nodes malfunction, the system should still be able to function. This frequently involves redundancy and data replication.
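As a rough sketch of the replication idea, the snippet below reads a file from the first reachable replica; the mount points are hypothetical and not tied to any particular DFS.

from pathlib import Path

# Hypothetical mount points for three replicas of the same file.
REPLICAS = [Path("/mnt/node1/data/report.txt"),
            Path("/mnt/node2/data/report.txt"),
            Path("/mnt/node3/data/report.txt")]

def read_with_failover(replicas):
    """Return the file contents from the first replica that responds."""
    for replica in replicas:
        try:
            return replica.read_bytes()
        except OSError:
            continue   # node down or file missing: fall back to the next copy
    raise OSError("all replicas unavailable")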
Here is the architecture of a distributed file system:
[Figure: end users on a local area network access a DFS server, which stores data in local storage and in cloud storage.]
3. DOMAIN NAME SERVICE

An essential part of the internet is the Domain Name Service (DNS), which converts human-
friendly domain names like www.example.com into IP addresses like 192.0.2.1 so that
computers can find and interact with one another. For email, online browsing, and many other
internet applications, DNS is essential.
Here's a brief overview of DNS:

• Requests from Users:
When a user types a domain name into a web browser, such as www.example.com, the computer initiates a DNS lookup to discover the corresponding IP address.
• Local DNS Resolver:
The user's computer initially looks through the cache of its local DNS resolver. It utilizes
that address if it locates the IP address there (based on recent lookups). If not, it moves
on to the following phase.
• Recursive DNS Server:
If the IP address is not already in the cache, the user's computer asks a recursive DNS server, often provided by its Internet Service Provider (ISP). This server may already have the IP address on file; otherwise, the lookup continues.
• Root DNS Servers:
If the recursive server lacks the necessary data, it contacts one of the root DNS servers. There are 13 root servers (letters A to M) in existence. These servers keep track of the authoritative DNS servers for top-level domains (TLDs, such as .com, .org, and .net).
• TLD DNS Servers:
The root server directs the recursive server to the proper DNS server for the TLD (for example, the .com DNS server for www.example.com).
• Authoritative DNS Server:
The TLD DNS server directs the recursive server to the authoritative DNS server for the given domain (example.com). The final IP address for the domain is stored on the authoritative DNS server.
• User Response:
The authoritative DNS server returns the IP address to the recursive server, which then sends it to the user's computer. The computer can now connect to the web server hosting www.example.com using the IP address.
• Caching:
To speed up subsequent visits to the same domain, the recursive server and the user's computer store the IP address for later use.
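As a small illustration, a program can trigger this whole resolution chain with a single standard-library call in Python; the stub resolver and the recursive server described above do the work behind it.

import socket

# Resolve a host name to its IP addresses. Behind this call, the operating
# system's resolver consults its cache and then a recursive DNS server, which
# walks the root -> TLD -> authoritative chain if necessary.
for family, _, _, _, sockaddr in socket.getaddrinfo("www.example.com", 80,
                                                    proto=socket.IPPROTO_TCP):
    print(family.name, sockaddr[0])   # e.g. "AF_INET 93.184.216.34"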
Here is a brief architecture of the DNS:
[Figure: the user sends the query "google.com?" to the DNS server, receives the IP address 74.125.68.102, and then sends the request to the web server hosting google.com.]
4. INTER-PROCESS COMMUNICATION
A key idea in computer science and operating systems is inter-process communication (IPC),
which permits communication between several processes. Building sophisticated systems
where different processes must cooperate or share resources requires the use of IPC.
Let's examine IPC using examples of typical IPC mechanisms:
A. PIPE:
A pipe is a unidirectional data channel; a two-way data link between two processes can be established using two pipes. Pipes use standard input and output techniques and are available on all POSIX systems as well as on Windows. (A minimal sketch using pipes appears after this list.)
B. SOCKET:
A socket is an endpoint for sending or receiving data over a network. This holds true whether the data is transmitted between processes on the same computer or between machines connected to the same network. Sockets are widely used by operating systems for inter-process communication.
C. FILE:
A file is a data record that can be stored on disk or retrieved from a file server on demand. A file can be accessed by as many processes as needed. Every operating system uses files to store data.
D. SIGNAL:
In restricted circumstances, signals can be useful for inter-process communication. They are system messages sent from one process to another. Signals are typically used to convey commands or events between processes rather than to transfer data.
E. SHARED MEMORY
Shared memory is memory that multiple processes can access at the same time, which allows those processes to communicate with one another. Shared memory is available on Windows and on all POSIX systems.
F. MESSAGE QUEUE
A message queue can be read from and written to by many processes that are not otherwise connected to one another. Messages are kept in the queue until they are picked up by their intended receiver. Most operating systems provide message queues, which are highly useful for inter-process communication.
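As a minimal sketch of the pipe mechanism described in item A, the snippet below uses Python's multiprocessing module to create a unidirectional channel between a parent and a child process; the message text is arbitrary.

from multiprocessing import Process, Pipe

def worker(recv_end):
    # Child process: read a message from its end of the one-way pipe.
    print("child received:", recv_end.recv())
    recv_end.close()

if __name__ == "__main__":
    # duplex=False creates a unidirectional channel: recv_end <- send_end.
    recv_end, send_end = Pipe(duplex=False)
    child = Process(target=worker, args=(recv_end,))
    child.start()
    send_end.send("hello from the parent process")   # parent writes into the pipe
    send_end.close()
    child.join()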
5. REMOTE PROCEDURE CALL
Remote Procedure Call (RPC) is a powerful paradigm for designing distributed systems: it allows a program on one computer to call procedures or functions on a remote machine as if they were local. Using this technique, distributed applications can be written so that a remote call looks just like a local function call.
Here is how RPC works:
a) RPC Model
The remote procedure call (RPC) model is comparable to a local procedure call model. In the local model, the caller places the procedure's arguments in a predetermined location, such as a result register.
b) Marshalling and Unmarshalling:
When a client calls a method on the proxy object, the parameters are marshalled (serialized) into a format that can be transferred over the network; this information also includes a reference identifying the target procedure or object. On the server side, the received data is unmarshalled (deserialized) and the function is called on the real server object.
c) RPC Authentication:
In many cases, neither the server nor the caller needs to verify who is making the call. Some network services, such as the Network File System (NFS), demand stronger security, though. RPC authentication provides a certain level of protection for such services.
d) Programming in RPC
Remote procedure calls can be made from any language. Remote Procedure Call (RPC)
protocol is generally used to communicate between processes on different
workstations. However, RPC works just as well for communication between different
processes on the same workstation.
e) Stub:
A stub or proxy is a piece of code that is present on the client side. The stub creates a message, containing the procedure call and its parameters, that can be sent over the network. It handles the details of network connectivity.
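To make the stub and marshalling ideas concrete, here is a minimal sketch using Python's built-in xmlrpc modules; the port number and the add function are arbitrary choices, and the server and client would normally run as separate processes or on separate machines.

# Server side: exposes an ordinary function as a remote procedure.
from xmlrpc.server import SimpleXMLRPCServer

def add(a, b):
    return a + b

server = SimpleXMLRPCServer(("localhost", 8000), allow_none=True)
server.register_function(add, "add")   # server-side dispatch to the real function
server.serve_forever()

# Client side: the ServerProxy object plays the role of the stub.
import xmlrpc.client

proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
# The arguments are marshalled into an XML request, sent over HTTP, and the
# result is unmarshalled back into a Python value.
print(proxy.add(2, 3))   # prints 5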
