Blockchain client types

January 26, 2022

There are four basic blockchain client types, each of which has a different use within the context

of a network. Understanding the differences between the four types can help you choose the right

one for your needs.

Blockchains are decentralized networks that require many parties to sync with them in a peer-to-

peer manner. Each blockchain may support multiple software implementations, or clients, that

enable users to take part. A user can run one of these software implementations in several

configurations, called client types or nodes, and these nodes are the parties that make up the

decentralized network.

Typically, each node stores its own copy of the blockchain and keeps track of incoming

transactions to have an up-to-date view of the network. These nodes are necessary to ensure the

correctness of the chain, prevent malicious activity from occurring, and maintain decentralized

consensus.

However, for some users the storage and computing requirements of blockchains can prove a

barrier to entry — and it is technically difficult to run nodes that require a full download of the

blockchain. Thus, new ways are needed to trustlessly sync with the chain in a secure fashion.

The four basic client types of a blockchain network are:

• Full nodes

• Archive (or archival) nodes

• Light clients

• Stateless clients

Each has its own characteristics, and each can be used in a different way within the context of a

blockchain network.

Full nodes

The most common type of node is a full node. Full nodes store all the blockchain data on disk

and enforce the rules of the network, which includes participating in block

validation, receiving and verifying all transactions, and generally serving the network with data.

Full nodes must also store a copy of the state, a data structure that holds the status of users in the

network, such as the UTXO set in Bitcoin or all the accounts and balances in Ethereum (Ethereum

commits to its state with a modified Merkle Patricia Trie, while Bitcoin block headers commit to

each block's transactions with a Merkle tree).
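
To make the two state models concrete, here is a minimal sketch in Python. The class names `UtxoState` and `AccountState` and their methods are hypothetical, and signatures, fees, and scripts are left out entirely. It only illustrates how a UTXO set and an account/balance map each record who owns what.

```python
from dataclasses import dataclass, field


@dataclass
class UtxoState:
    """Bitcoin-style state: unspent outputs keyed by (txid, output index)."""
    utxos: dict = field(default_factory=dict)  # (txid, index) -> (owner, amount)

    def apply(self, txid, inputs, outputs):
        # Every input must be a distinct, existing unspent output.
        if len(set(inputs)) != len(inputs):
            raise ValueError("duplicate input")
        if any(ref not in self.utxos for ref in inputs):
            raise ValueError("input is missing or already spent")
        spent = sum(self.utxos[ref][1] for ref in inputs)
        if sum(amount for _, amount in outputs) > spent:
            raise ValueError("outputs exceed inputs")
        for ref in inputs:
            del self.utxos[ref]
        for index, (owner, amount) in enumerate(outputs):
            self.utxos[(txid, index)] = (owner, amount)

    def balance(self, owner):
        return sum(a for o, a in self.utxos.values() if o == owner)


@dataclass
class AccountState:
    """Ethereum-style state: a map from account to balance."""
    balances: dict = field(default_factory=dict)

    def apply(self, sender, recipient, amount):
        if self.balances.get(sender, 0) < amount:
            raise ValueError("insufficient balance")
        self.balances[sender] -= amount
        self.balances[recipient] = self.balances.get(recipient, 0) + amount


# The same payment (alice pays bob 4) expressed in both models.
utxo = UtxoState({("genesis", 0): ("alice", 10)})
utxo.apply("tx1", [("genesis", 0)], [("bob", 4), ("alice", 6)])

acct = AccountState({"alice": 10})
acct.apply("alice", "bob", 4)
```

In the UTXO model a balance is derived by summing outputs, whereas the account model stores it directly. This is one reason the two kinds of chain organize their state differently.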

Full nodes are distinct from miners: miners select and order transactions from the

data received by nodes and then perform the mining process to solve cryptographic puzzles.

While clients in general must follow a formal specification, a given network can be open to

different client implementations. For example, the Ethereum network consists mostly of Geth

and Parity nodes (mostly Geth), and eth2 will support a large variety of client implementations

including Prysm, Lighthouse, and Lodestar.

However, full nodes must keep track of a significant quantity of information, requiring large

amounts of storage and bandwidth to operate (SSDs must be used due to the volume of

read/write operations). For example, as of early October 2020, the Bitcoin blockchain took up

around 300 GB, and the Ethereum blockchain used around 500 GB on Geth.

Moreover, although full nodes require 24/7 uptime and a high level of technical knowledge to maintain,

there is generally no direct economic incentive to run one. As a result, many users

running full nodes are businesses such as exchanges or infrastructure providers who rely on the

other benefits of full nodes.

While running a full node is technically challenging, there are benefits to doing so. First, running

a full node is the most secure way to access a network. It guarantees maximum self-sovereignty

because you can trustlessly verify that all network rules are being followed. By running a full

node you also improve the decentralization and overall health of the network by acting as a data

provider and protecting other clients from being tricked by malicious nodes or miners.

Full nodes help secure the network by verifying all transactions rather than just those that are

relevant to them, and further secure the network by alerting other client types of invalid blocks.

In some networks, certain other client types rely on full nodes to verify transactions and cannot

access the blockchain without connecting to a full node.

In some networks you may also be able to receive rewards for running a full node. For example,

Celo aims to address the lack of economic incentives for operating a full node: its network

allows individuals who run non-validating full nodes to set gateway fees for answering requests

and forwarding transactions on behalf of other types of clients. We hope to see further

experimentation to incentivize participants to run full nodes in future blockchain networks.

Archival nodes

While full nodes already store a large amount of data, archival nodes take storage even further by

retaining everything included in a full node along with an archive of the historical states of the

chain.

Full node | Archival node

Stores current balance of any account in chain | Stores full balance history of any account in chain

Stores state of last few blocks in network | Stores history of every state change in network

Has information needed to re-compute historical network data | Has historical network data stored, does not need to re-compute to find

An archival node could be described as a full node with a massive amount of cached historical

data. However, and importantly, an archival node does not provide any more validation or

security than a full node.

As of October 2020, archival nodes on Ethereum occupy more than 5.3 TB of data. With such a

large volume of data, syncing an archival node can take approximately two

weeks. By using an infrastructure-as-a-service product, that time may be dramatically

reduced: the Coinbase Cloud ETH archival node is production-ready in a few hours. Only a

very small number of archival nodes are actually run on the network, due to the effort of spinning

up an archival node on one’s own, and they are typically run by entities such as block explorers,

data analytics companies, and infrastructure providers.

Light clients

Because of their heavy storage requirements and the uptime they need, most users choose not

to run full nodes. Light clients, however, improve the accessibility of blockchain networks for

resource-constrained devices, providing high security while requiring little computing power.

As a low-resource node, a light client allows users to sync with a blockchain in a

cryptographically secure manner without having to store the whole blockchain. Light clients can

be used to find out the state of an account, check that a transaction was confirmed, or watch for

logged events.

Light clients operate by downloading and verifying a chain of block headers and requesting any

other relevant information, such as transaction data, from full nodes. The header is the smallest

unit that forms a chain, and each header refers back to the previous block’s header. The block

header stores a condensed version of information in the block, including the hash of the previous

block, the timestamp, and the Merkle tree root.
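
The linking of headers described above can be sketched in a few lines of Python. This is a simplified illustration, not any real client's format: the `Header` fields, the single SHA-256 digest, and the function name `verify_header_chain` are all assumptions, and a real light client would also check proof of work or validator signatures for each header.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Header:
    prev_hash: bytes     # hash of the previous block's header
    timestamp: int
    merkle_root: bytes   # commitment to the block's contents

    def digest(self) -> bytes:
        payload = self.prev_hash + self.timestamp.to_bytes(8, "big") + self.merkle_root
        return hashlib.sha256(payload).digest()


def verify_header_chain(headers, trusted_genesis_hash):
    """Return True if the chain starts at the trusted genesis and every
    header refers back to the digest of the header before it."""
    if not headers or headers[0].digest() != trusted_genesis_hash:
        return False
    for parent, child in zip(headers, headers[1:]):
        if child.prev_hash != parent.digest():
            return False
    return True


# Build a three-header chain, then tamper with the middle header.
genesis = Header(b"\x00" * 32, 0, b"\x11" * 32)
h1 = Header(genesis.digest(), 1, b"\x22" * 32)
h2 = Header(h1.digest(), 2, b"\x33" * 32)
chain = [genesis, h1, h2]
tampered = [genesis, Header(h1.prev_hash, 99, h1.merkle_root), h2]
```

Changing any field of a header changes its digest, so the next header no longer points at it and the check fails.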

This Merkle tree root is a representation of the state of a block and the set of all transactions. It

could be regarded as a fingerprint of information about the block. The goal of a light client is to

verify and archive the headers, and verify received information against the Merkle tree. Only the

specific portion of the state that is relevant to the light client needs to be verified, and proofs

received from full nodes can be verified against the Merkle root in the block header.
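
As a rough sketch of how a proof is checked against a Merkle root, consider the following Python example. It assumes a plain binary SHA-256 tree that duplicates the last node on odd-sized levels. Real chains differ in hashing and encoding details, and the function names are illustrative only.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root_and_proof(leaves, index):
    """Full-node side: build the root over `leaves` (non-empty) and a proof
    for leaves[index]."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:                 # duplicate the last node on odd levels
            level.append(level[-1])
        sibling = index ^ 1                # neighbour within the same pair
        proof.append((level[sibling], sibling < index))  # (hash, sibling_is_left)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof


def verify_proof(leaf, proof, root):
    """Light-client side: recompute the path from the leaf up to the root."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


txs = [b"tx-a", b"tx-b", b"tx-c", b"tx-d", b"tx-e"]
root, proof = merkle_root_and_proof(txs, 2)   # proof for b"tx-c"
```

The light client needs only the leaf, the proof, and the root from a header it already trusts. The proof grows with the logarithm of the number of leaves, not with the size of the block.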

While light clients do not need to be run constantly, they must connect to intermediary full nodes

to request data and interact with the blockchain. Verification is trust-minimized: proofs can be

verified regardless of whether the light clients trust the full node.

In Bitcoin, the method above is known as simplified payment verification (SPV). SPV clients trust downloaded

headers as long as they belong to the longest chain. For any given transaction, full nodes provide

light clients with an SPV proof and a Merkle path to the transaction in the tree as the data needed

to verify the transaction. This method can be used for cross-chain interactions such as bridges or

sidechains.

Light clients are well suited for low-capacity users, such as those using smartphones or browser

extensions, because they are able to maintain a high-security assurance about the state of a chain.

While light clients do not write data to the network, they do make blockchains more accessible to

a variety of other users.

Light client designs

The design space of a light client is enormous and there is always room for improvement and

more features. Light clients can borrow techniques from cryptography and distributed systems to

construct complex yet innovative solutions.

Below are some examples of cutting-edge light client designs.

Celo’s ultralight client Plumo uses a mix of different cryptography techniques to achieve

lightweight validation. In general, SPV verification for proof of stake networks is expensive:

users need to verify that two-thirds of the validating stake has signed the block behind each

header, and blocks occur frequently.
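
The stake check itself is simple once the signatures have been verified. Here is a hedged Python sketch: the function name `has_quorum` is hypothetical, stakes are assumed to be integers, and whether the threshold is "at least" or "strictly more than" two-thirds varies by protocol.

```python
from fractions import Fraction


def has_quorum(stakes, signers, threshold=Fraction(2, 3)):
    """Return True if `signers` hold at least `threshold` of the total stake.

    stakes:  dict mapping validator id -> integer stake
    signers: validator ids whose signatures were already verified
    """
    total = sum(stakes.values())
    # Count each signer once, and ignore ids that are not validators.
    signed = sum(stakes[v] for v in set(signers) if v in stakes)
    return total > 0 and Fraction(signed, total) >= threshold


stakes = {"v1": 40, "v2": 30, "v3": 20, "v4": 10}
accepted = has_quorum(stakes, ["v1", "v2"])   # 70 of 100 signed
rejected = has_quorum(stakes, ["v1", "v3"])   # 60 of 100 signed
```

The expensive part in practice is verifying the individual signatures for every header, which is what the designs discussed here try to reduce.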

Celo has improved on this by using epoch-based syncing, whereby only one header is downloaded

per epoch. In Celo, the validator set changes only once per epoch, and an epoch is one day, so

the load on light clients is drastically reduced: they need to verify headers only once a

day rather than once per block.
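
A quick back-of-the-envelope calculation shows the scale of the saving. The one-day epoch comes from the text above. The 5-second block time is an assumption used only for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def headers_to_verify(days, block_time_seconds, per_epoch):
    """Headers a light client checks over `days`, with one-day epochs."""
    if per_epoch:
        return days                                   # one header per epoch
    return days * SECONDS_PER_DAY // block_time_seconds  # one header per block


# Assumed 5-second block time, over a 30-day month.
every_block = headers_to_verify(30, 5, per_epoch=False)
epoch_only = headers_to_verify(30, 5, per_epoch=True)
```

Under these assumptions the client checks 30 headers a month instead of 518,400.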

Cryptographic primitives such as BLS signatures can be used to aggregate the validators’

signatures, and SNARKs (proofs used to verify the correctness of a computation without

having to execute it yourself) can be submitted by full nodes to prove that the light client

protocol was followed. This process consists of checking the signatures of the last header of each epoch plus

any validator set changes. Using SNARKs, one could (relatively) quickly prove validator set

changes over the span of months.


Light clients are also an active area of research. For example, the storage and bandwidth

requirements of SPV scale linearly with the length of the chain, and can still be a burden in a

larger blockchain. Flyclient is an efficient solution for light client block header verification. It

improves on a protocol called Non-Interactive Proofs of Proof of Work by being compatible with

variable difficulty and hashrates. It also involves short inclusion proofs, which are 10 times

smaller than previous solutions. Flyclient operates by downloading only a logarithmic number of

block headers (instead of having to download every one) while storing only a single block header

between executions.
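
To get a feel for the difference between linear and logarithmic growth, the following Python snippet compares the two. It is only a growth-rate comparison under a made-up constant (`samples_per_level`). It does not model Flyclient's actual sampling, whose header counts depend on its security parameters.

```python
def headers_linear(chain_length):
    """An SPV-style client downloads every header."""
    return chain_length


def headers_logarithmic(chain_length, samples_per_level=1):
    """Illustrative logarithmic count: ceil(log2(n)) times a constant."""
    levels = max(1, (chain_length - 1).bit_length())  # ceil(log2(n)) for n >= 2
    return samples_per_level * levels


chain_length = 1_000_000
linear = headers_linear(chain_length)
logarithmic = headers_logarithmic(chain_length)
```

For a chain of one million blocks, the linear count is one million headers while the logarithmic term is 20, before any protocol-specific constant is applied.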

With Flyclient, one can prove the whole chain is valid using as little information as possible,

enabling easier cross-chain interoperability in decentralized protocols that require light client

verification. Zcash specifically plans to use Flyclient research to implement a ZEC-ETH bridge

(tZEC will implement a light client verification of the Zcash blockchain inside an Ethereum

smart contract).

Stateless clients

Blockchains depend on a shared state that corresponds to the values in a block at any given time.

As explained earlier, the state changes after transactions are executed, and is typically stored in a

tree data structure such as a Merkle tree or Merkle Patricia Trie. However, the state can become

very large, and rebuilding the tree for the purposes of verification can be expensive. This can

make node sync times very long, making nodes harder to operate and ultimately decreasing how

many nodes are run.

A research initiative called “Stateless Ethereum” aims to make nodes in Ethereum easier and

faster to spin up by requiring the bare minimum amount of information to ensure the validity of

the state. This could enable nodes to begin functioning in minutes rather than days, which could

serve as an enormous improvement on the status quo.

The most traditional way to sync a node is the full sync method, which syncs

the chain starting from the genesis block. Alternatively, fast sync could be used, which starts requesting

blocks from a trusted checkpoint and then switches to full sync as soon as it catches up.

The closest iteration of Stateless Ethereum in research has been the exploration of a “beam sync”

mode, which only pulls the data it needs to execute changes to the state, rather than downloading

the whole state.

In beam sync, clients begin watching and executing transactions as they happen, and request a

witness (proof) for each block covering any information they do not have. A client can then gradually

rely more on its locally computed state as it builds up its own history of transactions.
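
The on-demand behaviour can be sketched as a small cache in Python. This is a toy model, not the beam sync protocol: `BeamSyncState` and `fetch_witness` are hypothetical names, the "witness" is just a balance lookup, and a real client would verify each witness against the state root before using it.

```python
class BeamSyncState:
    """Toy state store that fetches missing entries on demand."""

    def __init__(self, fetch_witness):
        self.local = {}                      # state built up locally so far
        self.fetch_witness = fetch_witness   # hypothetical network request
        self.remote_requests = 0

    def get(self, key):
        if key not in self.local:            # only ask peers for what is missing
            self.local[key] = self.fetch_witness(key)
            self.remote_requests += 1
        return self.local[key]

    def execute_transfer(self, sender, recipient, amount):
        if self.get(sender) < amount:
            raise ValueError("insufficient balance")
        self.local[sender] = self.get(sender) - amount
        self.local[recipient] = self.get(recipient) + amount


# Stand-in for the rest of the network's state.
remote = {"alice": 10, "bob": 0, "carol": 5}
state = BeamSyncState(remote.__getitem__)
state.execute_transfer("alice", "bob", 3)   # two remote requests
state.execute_transfer("bob", "alice", 1)   # served from local state
```

The second transfer touches only accounts the client has already seen, so it needs no further requests. This is the sense in which the client relies more on local state over time.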

It is worth noting that statelessness is a spectrum: a truly stateless client would not store any

state itself; instead it would store only the latest transactions, together with witnesses, to execute

the next block.

In practice, there will probably be a spectrum of stateful nodes, some providing full information

and some receiving selected portions of it. For example, full state nodes would compute a

witness and attach it to a block; partial state nodes would only keep state for a few blocks, or

would simply watch the state relevant to them and request the rest of the data from witnesses.

(Zero state nodes would rely entirely on witnesses to verify blocks.)

Fraud/validity proofs and data availability

Most light clients operate on the assumption that the majority of miners/validators are honest,

and simply check that the miners/validators have supported a given block rather than verifying

the block themselves. However, a set of malicious nodes might be able to attack light clients and

submit invalid blocks.

One way to protect against such behavior is to introduce a system of alerts so honest full nodes

can report an invalid block to light clients. Specifically, fraud proofs can be used to report

dishonest behavior and also relax the honest-majority assumption. If a verifying node

processes a block and finds that it is invalid, it can create a “fraud proof” containing information

from the block and Merkle tree to convince any light client that the block is invalid.

Light clients could simply take this proof and verify the block themselves, even if they are given

no other data. With fraud proofs, light clients have full assurance about the state of a blockchain,

and are provided with a better security model as long as there is at least one honest node (1 of

N). In a stateless validation setting, light clients would need to verify individual blocks only if

they hear alarms (and where the alarms are verifiable).
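
The idea of a verifiable alarm can be illustrated with a toy fraud proof in Python. Everything here is a simplifying assumption: the `FraudProof` fields, the single balance transfer, and the absence of Merkle proofs. In a real system, Merkle proofs would tie `pre_state` and `claimed_post` to the state roots in the block headers.

```python
from dataclasses import dataclass


@dataclass
class FraudProof:
    pre_state: dict      # balances touched by the transfer, before execution
    transfer: tuple      # (sender, recipient, amount)
    claimed_post: dict   # balances the block producer claims afterwards


def apply_transfer(state, transfer):
    sender, recipient, amount = transfer
    if amount < 0 or state.get(sender, 0) < amount:
        raise ValueError("invalid transfer")
    new_state = dict(state)
    new_state[sender] = new_state[sender] - amount
    new_state[recipient] = new_state.get(recipient, 0) + amount
    return new_state


def proves_fraud(proof):
    """Return True if re-executing the transfer contradicts the block's claim."""
    try:
        expected = apply_transfer(proof.pre_state, proof.transfer)
    except ValueError:
        return True      # the block included a transfer that cannot execute
    return expected != proof.claimed_post


false_alarm = FraudProof({"alice": 10, "bob": 0}, ("alice", "bob", 4),
                         {"alice": 6, "bob": 4})
real_fraud = FraudProof({"alice": 10, "bob": 0}, ("alice", "bob", 4),
                        {"alice": 6, "bob": 40})
```

Because the light client re-executes the disputed step itself, it does not need to trust whoever raised the alarm. A false alarm simply fails to verify.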

However, what happens if an attacker creates an invalid block but does not release data about it

(called the data availability problem)? Fishermen — actors who check for invalid blocks —

would not have enough data to prove that the block is invalid. Furthermore, the resulting game

between fisherman and attacker could become complicated, as the attacker could publish the data

at any time if accused of bad behavior.

One solution is to create “proofs” of data availability using erasure codes (e.g. Reed-Solomon

codes), a coding technique that allows a piece of information to be divided into

many pieces (codes) but reconstructed with only a subset of the pieces. Using erasure codes,

light clients would be able to check the data availability of a block probabilistically by

downloading only certain chunks of data.
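
The "reconstruct from a subset" property can be demonstrated with a small Reed-Solomon-style sketch in Python. It treats the data as evaluations of a polynomial over a prime field and uses Lagrange interpolation. The modulus, the function names, and the symbol layout are illustrative choices rather than any production encoding, and the code requires Python 3.8 or later for the modular inverse via `pow`.

```python
P = 2**31 - 1  # prime modulus for the field (an illustrative choice)


def interpolate(points, x):
    """Evaluate at `x` the unique polynomial through `points`, modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, n):
    """Extend k data symbols to n coded symbols (x, y); the first k are the data."""
    points = list(enumerate(data))
    return [(x, interpolate(points, x)) for x in range(n)]


def reconstruct(shares, k):
    """Recover the k data symbols from any k coded symbols."""
    if len(shares) < k:
        raise ValueError("not enough symbols to reconstruct")
    subset = shares[:k]
    return [interpolate(subset, x) for x in range(k)]


data = [104, 105, 33, 7]                  # k = 4 symbols, each smaller than P
coded = encode(data, 8)                   # n = 8 symbols, half are redundancy
survivors = [coded[1], coded[4], coded[6], coded[7]]   # any 4 of the 8
recovered = reconstruct(survivors, 4)
```

Any four of the eight symbols determine the same degree-3 polynomial, so the original data can be recovered even though half of the symbols are missing.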


Another source of improvement is using SNARKs or STARKs to create validity proofs, which

are cryptographically verifiable proofs that allow block producers to prove to clients that a block

satisfies some arbitrarily complex conditions. The light client would simply need to download

the header, verify the proof, and then randomly sample some Merkle tree branches of erasure-

coded data for data availability checks.
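
The strength of random sampling can be estimated with a short calculation. The sketch below assumes a code that doubles the data, so an attacker must withhold more than half of the chunks to block reconstruction, and it assumes independent uniform samples. Both are modelling assumptions, and the function names are hypothetical.

```python
def max_fooled_probability(samples, available_fraction=0.5):
    """Upper bound on the chance that all `samples` random chunk requests
    succeed when at most `available_fraction` of the chunks are available."""
    return available_fraction ** samples


def samples_needed(target, available_fraction=0.5):
    """Smallest number of samples that pushes the bound to `target` or below."""
    samples = 0
    while max_fooled_probability(samples, available_fraction) > target:
        samples += 1
    return samples


after_ten = max_fooled_probability(10)     # 1 / 1024
for_one_in_a_billion = samples_needed(1e-9)
```

Under these assumptions, ten samples leave less than a 0.1% chance of being fooled, and 30 samples push the bound below one in a billion. The light client downloads only a small fraction of the block in either case.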

Conclusion

It is clear that a wide ecosystem of client types is required to serve a variety of blockchain users

and use cases, and to maintain truly healthy blockchain networks.

While full nodes must exist for decentralized blockchain networks, the barriers to entry remain

high, and not every user can run a full node.

Light and stateless clients are therefore necessary to improve the accessibility and

decentralization of blockchains by increasing participation — and are more convenient for most

users. The easier validation is, the greater the chance that new nodes can sync with a chain,

which makes the network more resilient to attacks.

The future of blockchain clients is exciting. As new research is implemented, we will see designs

that are drastically more functional, performant, and accessible. Novel cryptographic tools such

as SNARKs and STARKs will accelerate the progress of light clients and lead to improvements

in areas such as sharding and cross-chain protocols — or simply enable use cases that have not

yet been imagined.

As we advance these systems further by developing different technologies and adopting new

trust models, our definition of validation and even decentralization may change. Lightweight

verification has already enabled more robust social coordination with 1 of N trust models, and

we now realize that not everyone is required to validate everything in a blockchain.


And so, perhaps some day, anyone with a smartphone and internet connection will be securely

connected to blockchain networks, and we will all have access to a truly global financial system.

Query & Transact

Anyone building products and services with blockchain data needs access to reliable read/write

nodes, as nodes are the access points into the entire ecosystem. But developing and managing

decentralized and resilient node infrastructure in-house is not a simple task — especially when

trying to support a diverse range of blockchain protocols. Relying on a provider that rate-limits

data usage, or only supports a few networks, is not an option for many businesses that anticipate

rapid growth.

Query & Transact (QT) by Coinbase Cloud is an infrastructure product designed for companies and

entrepreneurs who face the challenges of developing and managing decentralized and resilient

node infrastructure as they build secure Web 3.0 applications. QT provides a robust link between

off-chain systems and blockchain networks, making it significantly easier for companies to add

blockchain support and expand their protocol coverage without investing to develop in-house

capabilities.

“Whether you’re an established company looking to free up engineering resources, or you’re a

team just getting started in the blockchain space and you want to build something with secure

and reliable access to these chains, QT clusters make it fast and easy to build on any of these

blockchains. We’re thankful to our early QT customers who helped make this a better product

than anything that exists in the ecosystem.”

— Joe Lallouz, Coinbase Cloud

In addition to offering full nodes, we’re proud to offer archival nodes as part of QT clusters. QT

Archival includes complete block-by-block information about the state of the network, data

not included in a full node’s ledger. Data and machine learning (ML) companies can make use of

archival nodes without the hassle and expense of maintaining them in-house.
