Dmitrii Zhelezov
Draft v0.6
Abstract
The mission of Subsquid is to bring scalable data APIs to Web 3.0 by transforming a traditional SaaS model into a DAO with open participation. The Subsquid DAO replaces the standard approach of redundant monolithic network nodes with heterogeneous workers. The workers are incentivized and are subject to on-chain accountability and democracy.
Disclaimer
This Subsquid DAO Whitepaper is for information purposes only. In particular,
the roadmap section and all token-related aspects are subject to change due to
the nature of the regulatory environment. The information set forth below may
not be exhaustive and does not imply any elements of a contractual relationship.
This Whitepaper does not constitute a prospectus or offer document of any
sort and is not intended to constitute an offer of securities or a solicitation for
investment in securities in any jurisdiction. Upon receipt of this version of the
Subsquid DAO Whitepaper, all versions bearing a lower version number or an
earlier date of issue and having been supplied prior to this document become
void. For additional inquiries, please contact us via email contact@subsquid.io.
1 Motivation
Decentralized applications require complex and versatile backends in order to compete with traditional Web 2.0 services and deliver a comparable user experience. Popular websites like uniswap.org already see 10M+ monthly visits, totalling at least 100M+ backend hits, and thus require highly scalable infrastructure to serve the actual on-chain data and analytics.
Another challenge facing Web 3.0 developers is the increasing logical complexity of consumer-facing decentralized apps. For example, Uniswap[1] now offers analytics covering prices, trading volumes and other metrics. However, such data is not readily available on-chain, and so a dedicated indexing/aggregating middleware is needed.
TheGraph[2] is one of the first projects to tackle the two problems above on Ethereum. The centralized solution has seen a convincing product-market fit.
It remains to be seen if the decentralized network will be able to keep up with
the increasing load and demand for customization. Indeed, introducing new
features to distributed protocols is known to be hard and cumbersome, as the
majority of the network participants must simultaneously upgrade the software
in order to avoid conflicts.
2 Subsquid DAO
The mission of Subsquid is to unlock the complexity and expressiveness of Web 2.0 APIs in a decentralized way, relying on accountability and governance instead of redundancy of services provided by anonymous network nodes running identical software (see e.g. TheGraph[2]). We envision a distributed infrastructure of loosely coupled services run by bonded workers with social reputation and fully auditable logs. The first iteration is similar to a traditional SaaS service and is fully controlled by Subsquid.io. As the community around the project grows, the key governance and technical decisions are handed over to the Subsquid DAO with on-chain democracy. The final goal is to fulfill the following high-level guiding principles:
• There is a free market for each service provided by the Subsquid DAO and
there are no entry barriers to participate in the DAO
• All service providers are responsible for fulfilling the agreed SLAs, and are
subject to bond slashing in case of violating the commitments (subject to
arbitration)
• The Subsquid DAO adds value by providing a marketplace for different roles on the platform. SQD token holders control the development, arbitration and execution of the platform via the governance process
3 Decentralization trade-offs
The trend we see in Web 3.0 departs from the crypto-anarchist, fully anonymous, peer-to-peer design and is being transformed into hybrid forms.
Oftentimes consumer-facing domains like uniswap.org are controlled by
a centralized and well known entity backed by traditional investment firms.
At the same time the website backend is powered by a smart contract and
thus censorship-resistant and decentralized to the extent guaranteed by the
blockchain.
Such a design can hardly withstand a nation-state level attack, and indeed consumer-facing dApps may not consider such risks at all. Instead, the legal entities controlling the domain names tend to be fully compliant with the local regulators1. We believe that such hybrid centralized and fully compliant web pages/mobile apps with decentralized backends (sometimes called “CeDApps”) will likely dominate the Web 3.0 scene for the foreseeable future.
For ‘hard’ (deterministic) on-chain data, a trustless decentralized service imposes the following requirements:
• The data must be copied over across multiple nodes to be highly available
• If any computation is performed on the data, it must be deterministic
• Either there is a cryptographic proof of correctness, or a quorum of independent nodes is hit with the same consumer query and the validity is determined by majority rule. Note that the consensus guarantees are far weaker than those of the underlying blockchain anyway.
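The majority-rule quorum described above can be sketched as follows; the node callables and the three-node quorum are illustrative stand-ins for independent network operators, not part of the protocol:

```python
from collections import Counter

def quorum_query(nodes, query):
    """Send the same query to every node and accept the majority answer.

    `nodes` is a list of callables standing in for independent query nodes;
    in a real deployment these would be network calls to distinct operators.
    """
    answers = [node(query) for node in nodes]
    winner, votes = Counter(answers).most_common(1)[0]
    if votes <= len(nodes) // 2:
        raise RuntimeError("no majority: nodes disagree")
    return winner

# Three illustrative nodes, one of them faulty: the majority still wins.
honest = lambda q: "0x42"
faulty = lambda q: "0xff"
result = quorum_query([honest, honest, faulty], "balanceOf(alice)")
```

Note that the majority rule only tolerates a minority of faulty nodes, which is precisely why the guarantees are weaker than those of the underlying chain.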
1 […] Electronic Money Institution license by the U.K. Financial Conduct Authority, even though the protocol is fully autonomous and is controlled by Ethereum smart contracts.
As an example of ‘soft’ data, suppose one wants to build a decentralized Twitter service. In order to deliver a comparable user experience, the decentralized API should provide methods to search over the text data and to show recommendations and trending hashtags. The API is expected to output recommendations for each user. However, the output of the recommendation algorithm will differ across query nodes, which makes it problematic to reach a quorum consensus. Even if such replication is technically possible, the amount of resources required makes quorum consensus prohibitively expensive. A more viable approach is to optimistically assume that a single query node outputs sound results. On top of that, the consumer can hire an independent and reputable attester, who analyzes the query node outputs and checks that the service meets the pre-defined SLAs. If provable misbehaviour is found, the query node owner can be punished (e.g. by slashing a collateral); otherwise, subjective reputation is going to be the main factor in the long run.
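The optimistic attestation scheme above can be sketched as follows; the log format, the latency-based SLA and the `reference_fn` re-computation are illustrative assumptions, not a specification:

```python
def audit_responses(log, sla_max_latency_ms, reference_fn):
    """Attester sketch: scan a query node's response log and report SLA
    violations. `log` entries are (query, response, latency_ms) tuples;
    `reference_fn` independently recomputes the expected response.
    """
    violations = []
    for query, response, latency_ms in log:
        if latency_ms > sla_max_latency_ms:
            violations.append((query, "latency"))
        if reference_fn(query) != response:
            violations.append((query, "wrong result"))
    return violations

# Illustrative log: one slow response and one incorrect one.
sample_log = [("q1", "a", 50), ("q2", "b", 900), ("q3", "x", 10)]
reference = {"q1": "a", "q2": "b", "q3": "c"}.get
violations = audit_responses(sample_log, sla_max_latency_ms=500, reference_fn=reference)
```

In practice the attester would only spot-check a sample of the log, since recomputing every response defeats the point of outsourcing the work.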
2 […] 80% of the traffic. We expect that the remaining 20% of the traffic will come from thousands of smaller, less popular applications.
Figure 1: Left: each node in the network is an end-to-end middleware from data
ingestion to a GraphQL endpoint. Right: Multiple lightweight Hydra processors
share a smaller pool of Hydra indexers.
hashes. At later stages, one can eliminate trust entirely by attaching zk-based proofs of correctness. Again, such proofs are hardly feasible when applied to API responses serving block data after arbitrary transformations.
1. The consumer issues an on-chain request with the project distributed via IPFS storage [4], defines service-level agreements (SLAs) and locks a certain amount of SQD tokens (or the equivalent in another currency) as collateral for future API requests. SQD tokens will be used to cover the fees requested by the workers running the required data pipeline and serving the requests.
2. The consumer gets the bids from the query node Operators and chooses one or multiple. This step is optional if there are query node Operators with fixed quotes known to the consumer in advance.
3. The chosen Operators accept the request and hire or provision an indexer, a processor and a gateway.
4. The query node Operator reports an API endpoint together with the way to access monitoring metrics and logs.
5. The query node Operator is paid as defined in the API request (e.g. per request or via a subscription fee) from the funds locked by the Consumer. A certain percentage of the fee is burned. If the fee currency is not SQD, the payment is converted to SQD before burning.
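The settlement in step 5 can be sketched as follows; the 10% burn rate and the exchange rate used below are illustrative values, not parameters fixed by this paper:

```python
def settle_payment(fee, currency, burn_pct, sqd_rate):
    """Settle an Operator's fee per the flow above: a fixed percentage of
    the fee is burned, and non-SQD fees are converted at `sqd_rate`
    (SQD per unit of `currency`) before burning.

    Returns (operator_payout, sqd_burned); the payout stays in the
    original fee currency.
    """
    burn = fee * burn_pct
    payout = fee - burn
    sqd_burned = burn if currency == "SQD" else burn * sqd_rate
    return payout, sqd_burned

# A 100 SQD fee with an illustrative 10% burn: 90 SQD to the Operator.
payout, burned = settle_payment(100.0, "SQD", burn_pct=0.10, sqd_rate=1.0)
```

Burning a share of every fee in SQD ties the token's supply to actual platform usage regardless of which currency consumers pay in.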
The fee rebates to a worker depend on the amount of SQD the worker has bonded. A de facto standard way to apportion the budget fairly is to distribute the fee according to the Cobb-Douglas production function, as pioneered by the 0x project[5]. A rational actor then chooses a stake aligned with the expected workload, so receiving a higher load of requests requires a higher stake.
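A minimal sketch of the Cobb-Douglas split: worker i receives pool * (f_i/F)^α * (s_i/S)^(1−α), where f_i is the fees it generated and s_i its bonded stake. The exponent α = 0.5 is an illustrative choice, not a value prescribed here:

```python
def cobb_douglas_rewards(pool, fees, stakes, alpha=0.5):
    """Split a reward `pool` among workers using the Cobb-Douglas rule:
    worker i gets pool * (f_i/F)**alpha * (s_i/S)**(1-alpha), where F and
    S are the total fees and total stake. A worker maximizes its reward
    when its stake share matches its fee share.
    """
    F, S = sum(fees), sum(stakes)
    return [pool * (f / F) ** alpha * (s / S) ** (1 - alpha)
            for f, s in zip(fees, stakes)]

# Two workers generate equal fees, but the first bonds far less stake,
# so it earns less than its fee share alone would suggest.
rewards = cobb_douglas_rewards(100.0, fees=[50, 50], stakes=[10, 80])
```

This is exactly the alignment property claimed above: under-staking relative to one's workload strictly reduces the rebate.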
• Gateway operators. Operate API gateways and fulfill the requests from
the consumer dApps.
• Analytics providers. Similar to gateway operators, but with a focus on expensive one-off analytics queries on the data and dashboarding
• Processors & Storage providers. Transform the input data from different sources as defined by the on-chain request from the API consumer
• Indexers. Index raw on-chain data and provide API access via an indexer gateway
• Node operators. Run blockchain nodes
During the initial (centralized) phase Subsquid.io is going to provide services
for all the roles above.
5.4 Arbitrators
Most interactions between the consumer and the final gateway, as well as the data flows between the network participants, occur off-chain. However, every response from an actor in the system is signed with a key whose public counterpart is registered on-chain, and the signed responses remain publicly available for a considerable amount of time (via IPFS).
We expect that the API consumers (that is, dApp owners) will regularly audit
the authenticated logs as part of their standard routine. If there is an error
in the response or any internal malfunctioning, a claim is issued on-chain. An
open claim is then processed via arbitration by the technical council elected
through the governance. We expect that provable data manipulation cases will
be extremely rare and most cases will be caused by software bugs. In case a
software bug is discovered, the reporter will be rewarded from the bug bounty
fund of the DAO.
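A tamper-evident response log of the kind such audits rely on can be sketched with a hash chain; the HMAC tag below is a stdlib stand-in for the asymmetric signatures that would actually be verified against the on-chain-registered public keys:

```python
import hashlib
import hmac

def append_entry(log, secret_key, response):
    """Append a response to a tamper-evident log: each entry commits to
    the previous entry's digest, so edits to any earlier entry break the
    chain. The HMAC tag stands in for a real asymmetric signature.
    """
    prev = log[-1]["digest"] if log else b"\x00" * 32
    digest = hashlib.sha256(prev + response.encode()).digest()
    tag = hmac.new(secret_key, digest, hashlib.sha256).hexdigest()
    log.append({"response": response, "digest": digest, "tag": tag})

def verify_log(log, secret_key):
    """Auditor side: recompute the whole chain and check every tag."""
    prev = b"\x00" * 32
    for entry in log:
        digest = hashlib.sha256(prev + entry["response"].encode()).digest()
        expected = hmac.new(secret_key, digest, hashlib.sha256).hexdigest()
        if digest != entry["digest"] or not hmac.compare_digest(expected, entry["tag"]):
            return False
        prev = digest
    return True

audit_log, key = [], b"worker-signing-key"  # illustrative key
append_entry(audit_log, key, "response-1")
append_entry(audit_log, key, "response-2")
```

Because each entry commits to its predecessor, an auditor who spot-checks the latest entry implicitly pins down the entire history of responses.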
6 Query-node as an oracle
Oftentimes the query node API would benefit greatly from accessing external
data, such as an external price feed. A traditional way is to use an oracle feed
for delivering external data through a pluggable pallet (e.g. ChainLink pallet
for Substrate [6]). The external oracle data is then available via event data
picked up by the Hydra Indexers and Processors.
Figure 2: Oracle data can be exposed as runtime event data and captured by
the Hydra Processor.
This approach is limited to the data feeds provided by the oracle services
and even aggregating historical oracle data can be problematic.
A dedicated Subsquid Pallet will make Hydra-based Query Node data available on-chain (see Figure 3). This gives end users much more flexibility, as the Query Node data can be enriched with aggregations and any other on-chain data, while preserving the integrity provided by the original oracle feeds.
Figure 3: The Subsquid pallet brings the flexibility of Hydra endpoints on-chain via off-chain workers.
7 SQD Token
7.1 Token use-cases
The SQD token is the main unit of accounting within the Subsquid DAO. It has multiple use-cases and can be described as a hybrid payment, governance and work token.
• Payment medium. SQD is used to collect the fees and distribute across
the DAO treasury and the various worker roles for the provided services.
• Work token. In order to take up a role, one must stake SQD tokens; the rewards are distributed based on the stake and the actual amount of work done (as calculated by the Cobb-Douglas formula).
• Delegation. SQD token holders can delegate their tokens to any role to
get a share of the fees.
Figure 4: SQD allocation breakdown
8 Roadmap
Phase 0: Q3 2020. Hydra: a query-node framework
Hydra is a query-node framework bootstrapped at the Hackusama hackathon in July 2020, where it was awarded first prize in the infrastructure track. By defining the database schema and data transformation rules, one can develop a fully customized GraphQL API for a Substrate chain of choice.
After a year of development, Hydra has evolved into a collection of query-node services and tools:
• Hydra Indexer + Hydra Indexer Gateway: ingests raw blockchain data, indexes it and exposes it via an expressive GraphQL API
• Hydra Processor: extracts the relevant data from Hydra Indexers, transforms it and loads it into the query node storage
• Hydra CLI: code generation for the query node server, including the database schema and an Apollo-based GraphQL server with extensive filtering support
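As an illustration of how a client might talk to a Hydra Indexer Gateway, the snippet below builds a GraphQL request body; the `substrateEvent` field, the filter syntax and the event name are assumptions for illustration, not the exact Hydra schema:

```python
import json

# Hypothetical query for recent transfer events; field and filter names
# are illustrative, not taken from the actual Hydra GraphQL schema.
QUERY = """
query RecentTransfers($limit: Int!) {
  substrateEvent(where: {name: {_eq: "balances.Transfer"}}, limit: $limit) {
    blockNumber
    name
    params
  }
}
"""

def build_request(limit):
    """Build the JSON body a client would POST to the gateway's GraphQL
    endpoint, with the page size passed as a GraphQL variable."""
    return json.dumps({"query": QUERY, "variables": {"limit": limit}})

body = build_request(5)
```

The point of the design is that such queries hit a pre-indexed store rather than the chain node itself, which is what makes the API expressive and fast.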
7. Subsocial
8. Joystream (Sumer Testnet)
9 Competitive advantages
9.1 Fast development cycles and local testability
Using a Hydra Indexer is as simple as pasting a URL into the config file. It relieves developers of the resource-intensive task of fetching the historical data from the chain, which may take hours or even days, not to mention handling runtime upgrades and maintenance. The lightweight processing pipeline enables easy debugging and fast iterations.
References
[1] Uniswap info. https://info.uniswap.org, last accessed on 30/07/21.
[2] TheGraph. https://thegraph.com, last accessed on 30/07/21.
[3] TheGraph Curators. https://thegraph.academy/curators/definition/, last accessed on 30/07/21.
[4] Interplanetary File System. https://ipfs.io/, last accessed on 30/07/21.
[5] Research on protocol fees and liquidity incentives. https://gov.0x.org/t/research-on-protocol-fees-and-liquidity-incentives/340, last accessed on 30/07/21.
[6] Chainlink-Polkadot. https://github.com/smartcontractkit/chainlink-polkadot, last accessed on 30/07/21.