There are four basic blockchain client types, each of which has a different use within the context
of a network. Determining the differences between the four types can help you choose the right
Blockchains are decentralized networks that require many parties to sync with them in a peer-to-
peer manner. Each blockchain may support multiple software implementations, or clients, that
enable users to take part. A user may run one of these implementations in one of several
configurations, called client types or nodes, and these nodes are the parties that make up the
decentralized network.
Typically, each node stores its own copy of the blockchain and keeps track of incoming
transactions to have an up-to-date view of the network. These nodes are necessary to ensure the
correctness of the chain, prevent malicious activity from occurring, and maintain decentralized
consensus.
However, for some users the storage and computing requirements of blockchains can prove a
barrier to entry — and it is technically difficult to run nodes that require a full download of the
blockchain. Thus, new ways are needed to trustlessly sync with the chain in a secure fashion.
Full nodes
Archival nodes
Light clients
Stateless clients
Each has its own characteristics, and each can be used in a different way within the context of a
blockchain network.
Full nodes
The most standard type of node is a full node. Full nodes store all the blockchain data on disk
and verify the rules of the network — which include tasks such as participating in block
validation, receiving and verifying all transactions, and generally serving the network with data.
Full nodes must also store a copy of the state, a data structure that holds the status of users in the
network — such as the UTXO set in Bitcoin, or all the accounts and balances in Ethereum (in
which the state is respectively represented as a Merkle tree and a modified Merkle Patricia Trie).
Full nodes are distinct from miners, as miners simply reorder or remove transactions from the
data received by nodes and then perform the mining process to solve cryptographic puzzles.
While clients in general must follow a formal specification, a given network can be open to
different client implementations. For example, the Ethereum network consists mostly of Geth
and Parity nodes (mostly Geth), and eth2 will support a large variety of client implementations
However, full nodes must keep track of a significant quantity of information, requiring large
amounts of storage and bandwidth to operate (SSDs must be used due to the volume of
read/write operations). For example, as of early October 2020, the Bitcoin blockchain took up
around 300 GB, and the Ethereum blockchain used around 500 GB on Geth.
Moreover, although full nodes require 24/7 uptime and a high level of technical knowledge to
maintain, there is generally no direct economic incentive to run one. As a result, many users
running full nodes are businesses such as exchanges or infrastructure providers who rely on the
While running a full node is technically challenging, there are benefits to doing so. First, running
a full node is the most secure way to access a network. It guarantees maximum self-sovereignty
because you can trustlessly verify that all network rules are being followed. By running a full
node you also improve the decentralization and overall health of the network by acting as a data
provider and protecting other clients from being tricked by malicious nodes or miners.
Full nodes help secure the network by verifying all transactions rather than just those that are
relevant to them, and further secure the network by alerting other client types of invalid blocks.
In some networks, certain other client types rely on full nodes to verify transactions and cannot
In some networks you may also be able to receive rewards for running a full node. For example,
Celo aims to address the lack of economic incentives for operating a full node: its network
allows individuals who run non-validating full nodes to set gateway fees for answering requests
and forwarding transactions on behalf of other types of clients. We hope to see further
Archival nodes
While full nodes already store a large amount of data, archival nodes take storage even further by
retaining everything included in a full node along with an archive of the historical states of the
chain.
Full node | Archival node
Stores current balance of any account in chain | Stores full balance history of any account in chain
Stores state of last few blocks in network | Stores history of every state change in network
Has information needed to re-compute historical network data | Has historical network data stored, does not need to re-compute to find
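The contrast in the comparison above can be sketched with a toy model. This is only an illustration under simplifying assumptions: the names Transfer, FullNode, ArchivalNode, and balance_at are hypothetical, balances are plain integers, and the full node here keeps only the latest state rather than the last few blocks.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    sender: str
    recipient: str
    amount: int


def apply_block(balances, block):
    """Return a new balance map after applying one block of transfers."""
    new = dict(balances)
    for t in block:
        new[t.sender] = new.get(t.sender, 0) - t.amount
        new[t.recipient] = new.get(t.recipient, 0) + t.amount
    return new


class FullNode:
    """Toy full node: keeps every block, but only the latest state."""

    def __init__(self, genesis):
        self.genesis = dict(genesis)
        self.blocks = []
        self.state = dict(genesis)

    def add_block(self, block):
        self.blocks.append(block)
        self.state = apply_block(self.state, block)

    def balance_at(self, account, height):
        # Historical query: re-compute by replaying blocks from genesis.
        state = self.genesis
        for block in self.blocks[:height]:
            state = apply_block(state, block)
        return state.get(account, 0)


class ArchivalNode(FullNode):
    """Toy archival node: additionally caches the state after every block."""

    def __init__(self, genesis):
        super().__init__(genesis)
        self.history = [dict(genesis)]  # history[h] is the state at height h

    def add_block(self, block):
        super().add_block(block)
        self.history.append(dict(self.state))

    def balance_at(self, account, height):
        # Historical query: a direct lookup, no replay needed.
        return self.history[height].get(account, 0)
```

Both nodes return the same answers. The archival node trades extra storage for faster historical lookups, which is consistent with the point that it adds no extra validation.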
An archival node could be described as a full node with a massive amount of cached historical
data. However, and importantly, an archival node does not provide any more validation or
As of October 2020, archival nodes on Ethereum occupy more than 5.3 TB of data. With such a
large volume of data, to sync an archival node on a network should take approximately two
reduced — the Coinbase Cloud ETH archival node is production-ready in a few hours. Only a
very small number of archival nodes are actually run on the network, due to the lift of spinning
up an archival node on one’s own, and they are typically run by entities such as block explorers,
Light clients
Because full nodes demand substantial storage and constant uptime, most users choose not to run
them. Light clients, however, improve the accessibility of blockchain networks for
resource-constrained devices, offering strong security guarantees while requiring little computing power.
As a low-resource node, a light client allows users to sync with a blockchain in a
cryptographically secure manner without having to store the whole blockchain. Light clients can
be used to find out the state of an account, check that a transaction was confirmed, or watch for
logged events.
Light clients operate by downloading and verifying a chain of block headers and requesting any
other relevant information, such as transaction data, from full nodes. The header is the smallest
unit that forms a chain, and each header refers back to the previous block’s header. The block
header stores a condensed version of information in the block, including the hash of the previous
This Merkle tree root is a representation of the state of a block and the set of all transactions. It
could be regarded as a fingerprint of information about the block. The goal of a light client is to
verify and archive the headers, and verify received information against the Merkle tree. Only the
specific portion of the state that is relevant to the light client needs to be verified, and proofs
received from full nodes can be verified against the Merkle root in the block header.
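As a minimal sketch of how a proof can be checked against a Merkle root, the example below builds a simple binary Merkle tree and verifies an inclusion proof. It assumes single SHA-256 hashing and duplication of the last node on odd levels. Real networks differ (Bitcoin uses double SHA-256, Ethereum uses a modified Merkle Patricia Trie), and the function names here are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    """Compute the root of a binary Merkle tree over a non-empty leaf list."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def merkle_proof(leaves, index):
    """Return the path for leaves[index] as (sibling_hash, sibling_is_left) pairs."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        sibling = index ^ 1  # the neighbour within the same pair
        proof.append((level[sibling], sibling < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof


def verify_proof(leaf, proof, root):
    """What a light client does: rebuild the root from the leaf and the path."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root
```

The light client only needs the root (taken from a header it has already verified), the leaf, and a path whose length grows with the logarithm of the number of leaves. It does not need to trust the full node that supplied the path.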
While light clients do not need to be run constantly, they must connect to intermediary full nodes
to request data and interact with the blockchain. Verification is trust-minimized: proofs can be
verified regardless of whether the light clients trust the full node.
In Bitcoin, the method above is known as simplified payment verification (SPV). SPV clients trust
downloaded headers as long as they belong to the longest chain. For any given transaction, full
nodes provide light clients with an SPV proof, including the Merkle path to the transaction in the
tree, as the data needed to verify the transaction. This method can also be used for cross-chain
interactions such as bridges or sidechains.
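The header-chain part of this check can be sketched as follows. This is a simplified illustration, not Bitcoin's actual header format: the Header fields and the helper names are hypothetical, and proof-of-work validation and cumulative-work comparison, which a real SPV client also performs, are omitted.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Header:
    prev_hash: bytes    # hash of the previous block's header
    merkle_root: bytes  # fingerprint of the block's transactions
    nonce: int = 0

    def hash(self) -> bytes:
        payload = self.prev_hash + self.merkle_root + self.nonce.to_bytes(8, "big")
        return hashlib.sha256(payload).digest()


def headers_are_linked(headers):
    """Check that each header refers back to the hash of the one before it."""
    return all(cur.prev_hash == prev.hash() for prev, cur in zip(headers, headers[1:]))


def pick_longest(chains):
    """Among candidate header chains, keep the longest one that links correctly."""
    valid = [chain for chain in chains if headers_are_linked(chain)]
    return max(valid, key=len) if valid else None
```

Once a header chain has been accepted, the Merkle root in the relevant header is what a Merkle path for a transaction is checked against.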
Light clients are well suited for low-capacity users, such as those using smartphones or browser
extensions, because they are able to maintain a high-security assurance about the state of a chain.
While light clients do not write data to the network, they do make blockchains more accessible to
The design space of a light client is enormous and there is always room for improvement and
more features. Light clients can borrow techniques from cryptography and distributed systems to
Celo’s ultralight client Plumo uses a mix of different cryptography techniques to achieve
lightweight validation. In general, SPV verification for proof of stake networks is expensive:
users need to verify that two-thirds of the validating stake has signed on a block for a given
Celo has improved on this, using epoch-based syncing whereby only one header is downloaded
per epoch. In Celo, the validator set changes only once per epoch, and an epoch is one day, so
the load on light clients is already drastically reduced: they need to verify headers only once a
Cryptographic primitives such as BLS signatures can be used to aggregate all of a validator’s
signatures, and SNARKs — proofs used to verify the correctness of a computation without
having to execute it yourself — can be submitted from full nodes to prove the light client
protocol. This process consists of checking the signatures of the last header of each epoch plus
any validator set changes. Using SNARKs, one could (relatively) quickly prove validator set
requirements scale linearly with the chain length of an SPV proof, and can still be a burden in a
larger blockchain. Flyclient is an efficient solution for light client block header verification. It
improves on a protocol called Non-Interactive Proofs of Proof of Work by being compatible with
variable difficulty and hashrates. It also involves short inclusion proofs, which are 10 times
smaller than previous solutions. Flyclient operates by downloading only a logarithmic number of
block headers (instead of having to download every one) while storing only a single block header
between executions.
With Flyclient, one can prove the whole chain is valid using as little information as possible,
enabling easier cross-chain interoperability in decentralized protocols that require light client
verification. ZCash specifically plans to use Flyclient research to implement a ZEC-ETH bridge
(tZEC will implement a light client verification of the ZCash blockchain inside an Ethereum
smart contract).
Stateless clients
Blockchains depend on a shared state that corresponds to the values in a block at any given time.
As explained earlier, the state changes after transactions are executed, and is typically stored in a
tree data structure such as a Merkle tree or Merkle Patricia Trie. However, the state can become
very large, and rebuilding the tree for the purposes of verification can be expensive. This can
make node sync times very long, making nodes harder to operate and ultimately decreasing how
A research initiative called “Stateless Ethereum” aims to make nodes in Ethereum easier and
faster to spin up by requiring the bare minimum amount of information to ensure the validity of
the state. This could enable nodes to begin functioning in minutes rather than days, which could
The most traditional way to sync a node is by using the full sync method, which involves starting
at the genesis block to sync. Alternatively, fast sync could be used, which starts requesting
blocks from a trusted checkpoint and then switches to full sync as soon as it catches up.
The closest iteration of Stateless Ethereum in research has been the exploration of a “beam sync”
mode, which only pulls the data it needs to execute changes to the state, rather than downloading
In beam sync, clients begin watching and executing transactions as they happen, requesting a
witness (proof) with each block for any information they do not have. A client can then gradually
rely more on its locally computed state as it builds up its own history of transactions.
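The on-demand behaviour described above can be sketched as a small cache. This is a toy model under loud assumptions: BeamSyncClient and fetch_from_peer are hypothetical names, state is a map of integer balances, and the fetched values are not checked against a state root, which a real client would do using the witness.

```python
class BeamSyncClient:
    """Toy client that fetches only the state it needs, when it needs it."""

    def __init__(self, fetch_from_peer):
        self.local_state = {}         # starts empty, fills in over time
        self.fetch = fetch_from_peer  # hypothetical callback: account -> balance
        self.requests = 0             # how many times a peer had to be asked

    def _load(self, account):
        if account not in self.local_state:
            # Missing locally: request it (a real client would verify a witness).
            self.local_state[account] = self.fetch(account)
            self.requests += 1
        return self.local_state[account]

    def execute(self, sender, recipient, amount):
        """Execute a transfer, pulling in any state that is not yet local."""
        if self._load(sender) < amount:
            raise ValueError("insufficient balance")
        self._load(recipient)
        self.local_state[sender] -= amount
        self.local_state[recipient] += amount
```

In this sketch, transfers between accounts that are already cached trigger no further requests, which mirrors how the client relies more on its own state over time.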
It is worth noting that statelessness is a spectrum: a truly stateless client would not store any
state itself; instead it would store only the latest transactions, together with witnesses, to execute
In practice, there will probably be a spectrum of stateful nodes, some providing full information
and some receiving selected portions of it. For example, full state nodes would compute a
witness and attach it to a block; partial state nodes would only keep state for a few blocks, or
would simply watch the state relevant to them and request the rest of the data from witnesses.
Most light clients operate on the assumption that the majority of miners/validators are honest,
and simply check that the miners/validators have supported a given block rather than verifying
the block themselves. However, a set of malicious nodes might be able to attack light clients and
One way to protect against such behavior is to introduce a system of alerts so honest full nodes
can report an invalid block to light clients. Specifically, fraud proofs can be used to report
dishonest behavior and additionally weaken the honest majority assumption. If a verifying node
processes a block and finds that it is invalid, it can create a “fraud proof” containing information
from the block and Merkle tree to convince any light client that the block is invalid.
Light clients could simply take this proof and verify the block themselves, even if they are given
no other data. With fraud proofs, light clients have full assurance about the state of a blockchain,
and are provided with a better security model as long as there is at least one honest node (1 of
N). In a stateless validation setting, light clients would need to verify individual blocks only if
However, what happens if an attacker creates an invalid block but does not release data about it
(called the data availability problem)? Fishermen — actors who check for invalid blocks —
would not have enough data to prove that the block is invalid. Furthermore, the resulting game
between fisherman and attacker could become complicated, as the attacker could publish the data
One solution is to create “proofs” of data availability by the use of erasure codes (e.g.
Reed-Solomon codes), an encoding technique that allows a piece of information to be divided into
many pieces (codes) but reconstructed with only a subset of the pieces. Using erasure codes,
light clients would be able to prove the data availability of a block probabilistically by
are cryptographically verifiable proofs that allow block producers to prove to clients that a block
satisfies some arbitrarily complex conditions. The light client would simply need to download
the header, verify the proof, and then randomly sample some Merkle tree branches of erasure-
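The probabilistic nature of random sampling can be illustrated with a short calculation. This is a sketch under stated assumptions, not a description of any specific protocol: it assumes an attacker must withhold at least some fraction of the erasure-coded pieces to prevent reconstruction, and that each sample is an independent, uniformly random piece. The function names are hypothetical.

```python
def undetected_probability(withheld_fraction: float, samples: int) -> float:
    """Chance that every one of `samples` random picks hits an available piece."""
    if not 0 < withheld_fraction <= 1:
        raise ValueError("withheld_fraction must be in (0, 1]")
    return (1 - withheld_fraction) ** samples


def samples_needed(withheld_fraction: float, target: float) -> int:
    """Smallest number of samples that pushes the miss probability to `target` or below."""
    if not 0 < target < 1:
        raise ValueError("target must be in (0, 1)")
    k = 0
    while undetected_probability(withheld_fraction, k) > target:
        k += 1
    return k
```

Under these assumptions, if half of the pieces must be withheld to block reconstruction, each additional sample halves the chance that the withholding goes unnoticed, so a light client needs only a modest number of samples.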
Conclusion
It is clear that a wide ecosystem of client types is required to serve a variety of blockchain users
While full nodes must exist for decentralized blockchain networks, the barriers to entry remain
Light and stateless clients are therefore necessary to improve the accessibility and
decentralization of blockchains by increasing participation — and are more convenient for most
users. The easier validation is, the greater the chance that new nodes can sync with a chain,
The future of blockchain clients is exciting. As new research is implemented, we will see designs
that are drastically more functional, performant, and accessible. Novel cryptographic tools such
as SNARKs and STARKs will accelerate the progress of light clients and lead to improvements
in areas such as sharding and cross-chain protocols — or simply enable use cases that have not
As we progress these systems further, by developing different technologies and adopting new
trust models, our definition of validation and even decentralization may change. Lightweight
verification has already enabled more robust social coordination with 1 of N trust models, and
connected to blockchain networks, and we will all have access to a truly global financial system.
Anyone building products and services with blockchain data needs access to reliable read/write
nodes, as nodes are the access points into the entire ecosystem. But developing and managing
decentralized and resilient node infrastructure in-house is not a simple task — especially when
trying to support a diverse range of blockchain protocols. Relying on a provider that rate-limits
data usage, or only supports a few networks, is not an option for many businesses that anticipate
rapid growth.
Query & Transact by Coinbase Cloud is an infrastructure product designed for companies and
entrepreneurs who face the challenges of developing and managing decentralized and resilient
node infrastructure as they build secure Web 3.0 applications. QT provides a robust link between
off-chain systems and blockchain networks, making it significantly easier for companies to add
blockchain support and expand their protocol coverage without investing to develop in-house
capabilities.
team just getting started in the blockchain space and you want to build something with secure
and reliable access to these chains, QT clusters make it fast and easy to build on any of these
blockchains. We’re thankful to our early QT customers who helped make this a better product
In addition to offering full nodes, we’re proud to offer archival nodes as part of QT clusters. QT
Archival includes complete block-by-block information about the state of the network — data
not included in a full node’s ledger. Data and machine learning (ML) companies can make use of
archival nodes without the hassle and expense of maintaining them in-house.