
Subsquid DAO Whitepaper

Dmitrii Zhelezov
Draft v0.6

Abstract
The mission of Subsquid is to bring scalable data APIs to Web 3.0 by
transforming a traditional SaaS model into a DAO with open participa-
tion. The Subsquid DAO replaces the standard approach of redundant
monolithic network nodes with heterogeneous workers. The workers are
incentivized and are subject to on-chain accountability and democracy.

Disclaimer
This Subsquid DAO Whitepaper is for information purposes only. In particular,
the roadmap section and all token-related aspects are subject to change due to
the nature of the regulatory environment. The information set forth below may
not be exhaustive and does not imply any elements of a contractual relationship.
This Whitepaper does not constitute a prospectus or offer document of any
sort and is not intended to constitute an offer of securities or a solicitation for
investment in securities in any jurisdiction. Upon receipt of this version of the
Subsquid DAO Whitepaper, all versions bearing a lower version number or an
earlier date of issue and having been supplied prior to this document become
void. For additional inquiries, please contact us via email contact@subsquid.io.

1 Motivation
Decentralized applications require a complex and versatile backend in order to compete with traditional Web 2.0 services and deliver a comparable user experience. Popular websites like uniswap.org already see 10M+ monthly visits, totalling at least 100M backend hits, and thus require a highly scalable infrastructure to serve the actual on-chain data and analytics.
Another challenge facing Web 3.0 developers is the increasing logical complexity of consumer-facing decentralized apps. For example, Uniswap[1] now offers analytics covering prices, trading volumes and other metrics. However, such data is not readily available on-chain, so a dedicated indexing/aggregating middleware is needed.
TheGraph[2] is one of the first projects to tackle the two problems above on Ethereum. Its centralized solution has seen convincing product-market fit.

It remains to be seen if the decentralized network will be able to keep up with
the increasing load and demand for customization. Indeed, introducing new
features to distributed protocols is known to be hard and cumbersome, as the
majority of the network participants must simultaneously upgrade the software
in order to avoid conflicts.

2 Subsquid DAO
The mission of Subsquid is to unlock the complexity and expressiveness of Web
2.0 APIs in a decentralized way, relying on accountability and governance in-
stead of redundancy of the services provided by anonymous network nodes run-
ning identical software (see e.g. TheGraph[2]). We envision a distributed in-
frastructure of loosely coupled services run by bonded workers with social rep-
utation and fully auditable logs. The first iteration is similar to a traditional
SaaS service and is fully controlled by Subsquid.io. As the community around
the project grows, the key governance and technical decisions are handed over
to the Subsquid DAO with on-chain democracy. The final goal is to fulfill the following high-level guiding principles:
• There is a free market for each service provided by the Subsquid DAO and
there are no entry barriers to participate in the DAO
• All service providers are responsible for fulfilling the agreed SLAs, and are
subject to bond slashing in case of violating the commitments (subject to
arbitration)
• The Subsquid DAO adds value by providing a marketplace for different
roles on the platform. SQD token holders control the development, arbi-
tration and execution of the platform via the governance process

Such “decentralization via governance” relies on electing a council, a small group of reputable representatives responsible for the operations and development of the DAO. The council members are expected to have significant technical knowledge and experience, yet all decisions are fully transparent and are constantly assessed by the community. This form of governance has been proven to work well by projects such as MakerDAO, Yearn and Sushi. The most notable example of a complex middleware governed by a DAO is the Substrate-based project Joystream, which has managed to build a dedicated tech-driven community maintaining a video platform.

3 Decentralization trade-offs
The trend we see in Web 3.0 is a departure from the fully anonymous, peer-to-peer crypto-anarchist design toward hybrid forms.
Oftentimes consumer-facing domains like uniswap.org are controlled by a centralized and well-known entity backed by traditional investment firms.

At the same time the website backend is powered by a smart contract and
thus censorship-resistant and decentralized to the extent guaranteed by the
blockchain.
Such a design can hardly withstand a nation-state-level attack, and indeed consumer-facing dApps may not consider such risks at all. Instead, the legal entities controlling the domain names tend to be fully compliant with the local regulators¹. We believe that such hybrid setups of centralized, fully compliant web pages/mobile apps with a decentralized backend (sometimes called “CeDApps”) will likely dominate the Web 3.0 scene in the near future.

A fully permissionless and decentralized backend must have the highest level of replication across the network nodes and thus inherently has the following limitations:

• The data must be copied across multiple nodes to be highly available
• Any computation performed on the data must be deterministic
• Either there is a cryptographic proof of correctness, or a quorum of independent nodes is hit with the same consumer query and validity is determined by majority rule. Note that such consensus guarantees are in any case far weaker than those of the underlying blockchain.

In fact, there is a trade-off between the expressiveness of a service (that is, the complexity of the data transformation scripts and the data sources) and its verifiability.
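To make the majority rule concrete, here is a minimal TypeScript sketch: the same query is fanned out to several independent nodes and the most common answer wins. The fetchFromNode helper and the plain-text comparison of responses are illustrative assumptions, not part of any specified protocol.

```typescript
// Minimal majority-rule sketch: send the same query to several
// independent nodes and accept the most common response.
// `fetchFromNode` is a hypothetical helper standing in for a real client.
async function fetchFromNode(nodeUrl: string, query: string): Promise<string> {
  const res = await fetch(nodeUrl, { method: "POST", body: query });
  return res.text();
}

async function queryWithQuorum(nodes: string[], query: string): Promise<string> {
  const answers = await Promise.all(nodes.map((n) => fetchFromNode(n, query)));
  const counts = new Map<string, number>();
  for (const a of answers) counts.set(a, (counts.get(a) ?? 0) + 1);
  // Pick the answer returned by the largest number of nodes.
  const [winner, votes] = [...counts.entries()].sort((a, b) => b[1] - a[1])[0];
  if (votes <= nodes.length / 2) throw new Error("no majority among nodes");
  return winner;
}
```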
On the left end of the spectrum we have “hard” data with a high replication rate, such as any data stored on-chain. Such data is guaranteed to be immutable and available from any node, as dictated by the consensus rules. At the right end of the spectrum one can place “soft” data, which may be available only at a single node and is hard to replicate: for example, real-time data streams or the output of a non-deterministic algorithm (e.g. content discovery engines).

To illustrate, account balances in a crypto wallet app are a good example of “hard” data: one can instantly reconnect to any node in the network and get the same result (many wallets even check consistency across random nodes in the background).
¹ For example, one of the largest peer-to-peer lending protocols, Aave, has been granted an Electronic Money Institution license by the U.K. Financial Conduct Authority, even though the protocol is fully autonomous and controlled by Ethereum smart contracts.

As an example of “soft” data, assume one is willing to build a decentralized Twitter service. In order to deliver a comparable user experience, the decentralized API should provide methods to search over the text data, show recommendations and surface trending hashtags. The API is expected to output recommendations for each user. However, the output of the algorithm is going to differ between query nodes, which makes it problematic to reach a quorum consensus. Even if such replication is technically possible, the amount of resources required makes quorum consensus prohibitively expensive. A more viable approach is to optimistically assume that a single query node outputs sound results. On top of that, the consumer can hire an independent and reputable attester, who analyzes the query node outputs and checks that the service meets the pre-defined SLAs. If a provable misbehaviour is found, the query node owner can be punished (e.g. by slashing a collateral); otherwise, subjective reputation is going to be the main factor in the long run.
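As a rough illustration of what an attester's SLA check might look like, here is a sketch over a hypothetical response log; the record and SLA shapes are invented for this example and are not a specification.

```typescript
// Hypothetical SLA check an attester might run over a worker's
// published response log (all shapes are illustrative, not a spec).
interface ResponseRecord {
  latencyMs: number;
  ok: boolean; // whether the response was well-formed and correct
}

interface Sla {
  maxP95LatencyMs: number;
  maxErrorRate: number; // e.g. 0.01 for 1%
}

function meetsSla(log: ResponseRecord[], sla: Sla): boolean {
  if (log.length === 0) return false; // nothing to attest
  const errorRate = log.filter((r) => !r.ok).length / log.length;
  const latencies = log.map((r) => r.latencyMs).sort((a, b) => a - b);
  const p95 = latencies[Math.floor(0.95 * (latencies.length - 1))];
  return errorRate <= sla.maxErrorRate && p95 <= sla.maxP95LatencyMs;
}
```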

4 Fat Indexers: serving the long tail of APIs


The key design decision of Hydra is the introduction of a dedicated indexing
service for providing access to the raw blockchain data. A traditional monolithic
approach to building a decentralized query node network assumes that the query
nodes are run by the operators as a black box with little to no customization.
Each network node provides an end-to-end pipeline which ingests the blockchain
data from a blockchain node, performs the necessary data transformations and
also serves the data through a GraphQL endpoint. Scaling such a network is problematic, as such nodes require beefy hardware to hold an index and at the same time keep up with the API traffic.
In fact, with a limited number of indexing nodes in the network, there are high opportunity costs for indexing new API pipelines (subgraphs in TheGraph parlance). To mitigate this issue, TheGraph relies on a forward-looking selection by Curators [3], making it hard for small and lesser-known projects to get indexed.
Subsquid/Hydra decouples the heavy indexing part from the rest of the pipeline, as illustrated in Figure 1. We expect that each indexer can easily serve at least ten processors, and each processor can replicate the data to at least ten edge gateways, thus achieving a 1:100 indexer-to-API ratio. This makes the intermediary data transformation and API-serving steps cheap, opening up the API market for the long tail of dApps².
At the same time, the apparent centralization around Hydra Indexers is mitigated by the ability to quickly verify the indexer output. Unlike the output of the final API, the indexer serves the on-chain data, which may be batched across different blocks but is never transformed. Thus, it is easy to verify by comparing block
² Here we assume that dApp usage will follow a power law, with the top dApps capturing 80% of the traffic. We expect that the remaining 20% of the traffic will come from thousands of smaller and less popular applications.

Figure 1: Left: each node in the network is an end-to-end middleware from data
ingestion to a GraphQL endpoint. Right: Multiple lightweight Hydra processors
share a smaller pool of Hydra indexers.

hashes. At later stages, one can eliminate trust completely by attaching zk-based proofs of correctness. Again, such proofs are hardly feasible when applied to API responses that serve the block data after arbitrary transformations.
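A minimal sketch of such a spot check might look as follows; both hash fetchers are hypothetical helpers standing in for an indexer client and a trusted node client.

```typescript
// Spot-check sketch: compare block hashes reported by an indexer
// against those of a trusted node at a sample of heights.
// Both fetchers are hypothetical helpers, not a real client API.
async function spotCheckIndexer(
  getIndexerHash: (height: number) => Promise<string>,
  getNodeHash: (height: number) => Promise<string>,
  sampleHeights: number[],
): Promise<boolean> {
  for (const h of sampleHeights) {
    const [indexed, canonical] = await Promise.all([
      getIndexerHash(h),
      getNodeHash(h),
    ]);
    if (indexed !== canonical) return false; // provable mismatch
  }
  return true; // all sampled blocks match the canonical chain
}
```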

5 Subsquid DAO Roles


5.1 Consumers
The workflow for requesting an API from the Subsquid DAO is as follows:

1. The consumer issues an on-chain request with the project distributed via IPFS storage [4], defines service-level agreements (SLAs) and locks a certain amount of SQD tokens (or the equivalent in another currency) as a collateral for future API requests. SQD tokens are used to cover the fees requested by the workers running the required data pipeline and serving the requests (an illustrative request shape is sketched after this list).
2. The consumer gets bids from the query node Operators and chooses one or multiple. This step is optional if there are query node Operators with fixed quotes known to the consumer in advance.
3. The chosen Operators accept the request and hire or provision an indexer, a processor and a gateway.
4. The query node Operator reports an API endpoint together with a way to access monitoring metrics and logs.

5. The query node Operator is paid as defined in the API request (e.g. per request or via a subscription fee) from the funds locked by the Consumer. A certain percentage of the fee is burned. If the fee currency is different from SQD, the payment currency is converted to SQD before burning.
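For illustration only, an on-chain request could carry fields along the following lines; the names and types are hypothetical and the actual format is subject to change.

```typescript
// Illustrative shape of an on-chain API request (all field names are
// hypothetical; the actual on-chain format is subject to change).
interface ApiRequest {
  projectCid: string;        // IPFS CID of the project definition
  sla: {
    maxP95LatencyMs: number; // latency bound the operator commits to
    minUptimePercent: number;
  };
  collateralSqd: bigint;     // SQD locked to cover future API fees
  feeModel: "per-request" | "subscription";
}
```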

The fee rebates to a worker depend on the amount of SQD bonded by the worker. A de facto standard way to apportion the budget fairly is to distribute the fee as defined by the Cobb-Douglas production function, as pioneered by the 0x project[5]. A rational actor would then choose a stake aligned with the expected workload, so handling a higher request load requires a higher stake.
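A minimal sketch of a Cobb-Douglas split of this general form is shown below; the exponent alpha would be a governance-set parameter (0.5 here is purely illustrative, and the exact conventions of the 0x design[5] may differ). Note that individual payouts computed this way need not exhaust the pool; how any remainder is handled is left to the protocol.

```typescript
// Cobb-Douglas fee split (in the spirit of the 0x design [5]):
//   payout_i = F * (s_i / S)^alpha * (w_i / W)^(1 - alpha)
// where F is the fee pool, s_i the worker's bonded stake (S total),
// and w_i the work performed, e.g. requests served (W total).
function cobbDouglasPayout(
  feePool: number,
  stake: number,
  totalStake: number,
  work: number,
  totalWork: number,
  alpha = 0.5, // illustrative value; in practice set by governance
): number {
  return (
    feePool *
    Math.pow(stake / totalStake, alpha) *
    Math.pow(work / totalWork, 1 - alpha)
  );
}
```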

5.2 Query Node Operators


Query node operators are responsible for fulfilling incoming API requests within the specified budget. A query node operator manages the whole query node stack, “hiring” the indexers, processors and gateway operators needed to meet the required SLAs. One can think of query node operators as independent SaaS companies operating within the DAO, which may or may not operate their own workers.
We expect that the query node operators will accumulate sufficient expertise and reputation within the DAO, so that the bidding for API requests will be automated. The amount of staked SQD, the level of transparency, the quality of reporting and the degree of decentralization (whether the node operator controls the full stack or hires independent workers) will likely be the deciding factors for consumers when hiring a node operator.
Subsquid.io will be the first query node operator.

5.3 Service Provider Roles


Our decentralization via governance approach allows us to gradually shift from
the current centralized deployments to a distributed infrastructure controlled
by the Subsquid DAO. We envision that complex APIs will require a layered infrastructure served by workers in multiple roles.
During the first phase, the following worker roles will be present in the system:

• Gateway operators. Operate API gateways and fulfill the requests from
the consumer dApps.
• Analytics providers. Similar to gateway operators, but with a focus on
expensive one-off analytics queries on the data and dashboarding
• Processors & Storage providers. Transform the input data from dif-
ferent sources as defined by the on-chain request from the API consumer
• Indexers. Index raw on-chain data and provide API access via the indexer gateway
• Node operators. Run blockchain nodes

During the initial (centralized) phase Subsquid.io is going to provide services
for all the roles above.

5.4 Arbitrators
Most interactions between the consumer and the final gateway, as well as the data flows between the network participants, occur off-chain. However, every response from an actor in the system is signed with a key registered on-chain and remains publicly available for a considerable amount of time (via IPFS). We expect that the API consumers (that is, the dApp owners) will regularly audit the authenticated logs as part of their standard routine. If there is an error in the response or any internal malfunction, a claim is issued on-chain. An open claim is then processed via arbitration by the technical council elected through governance. We expect that provable data manipulation cases will be extremely rare and that most cases will be caused by software bugs. If a software bug is discovered, the reporter is rewarded from the bug bounty fund of the DAO.
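As a sketch of what auditing an authenticated log entry could involve, the snippet below verifies an Ed25519 signature with Node's built-in crypto module; the log-entry shape is hypothetical.

```typescript
import { createPublicKey, verify } from "node:crypto";

// Hypothetical shape of an archived, signed response (illustrative only).
interface LogEntry {
  responseBody: Buffer;    // raw response bytes as archived on IPFS
  signature: Buffer;       // worker's Ed25519 signature over responseBody
  workerPubKeyPem: string; // public key registered on-chain, PEM-encoded
}

// Returns true iff the response was really produced by the registered key.
function isAuthentic(entry: LogEntry): boolean {
  const key = createPublicKey(entry.workerPubKeyPem);
  // For Ed25519, the digest algorithm argument must be null.
  return verify(null, entry.responseBody, key, entry.signature);
}
```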

6 Query-node as an oracle
Oftentimes the query node API would benefit greatly from access to external data, such as an external price feed. The traditional way is to deliver the external data through an oracle feed exposed by a pluggable pallet (e.g. the ChainLink pallet for Substrate [6]). The external oracle data is then available via event data picked up by the Hydra Indexers and Processors.

Figure 2: Oracle data can be exposed as runtime event data and captured by
the Hydra Processor.
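For illustration, a processor mapping that captures such an oracle event might look roughly like this; the event and store interfaces are invented for this sketch and do not reflect Hydra's actual APIs.

```typescript
// Illustrative processor mapping for an oracle price event.
// The event and store shapes below are hypothetical, not Hydra's API.
interface OraclePriceEvent {
  pair: string;      // e.g. "DOT/USD"
  price: bigint;     // fixed-point price emitted by the oracle pallet
  blockHeight: number;
}

interface Store {
  save(row: { pair: string; price: bigint; blockHeight: number }): Promise<void>;
}

// Persist each feed value so the API can later serve historical
// aggregations on top of the raw oracle data.
export async function handleOraclePrice(
  store: Store,
  event: OraclePriceEvent,
): Promise<void> {
  await store.save({
    pair: event.pair,
    price: event.price,
    blockHeight: event.blockHeight,
  });
}
```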

This approach is limited to the data feeds provided by the oracle services
and even aggregating historical oracle data can be problematic.
A dedicated Subsquid Pallet will bring Hydra-based Query Node data on-chain (see Figure 3). This gives a lot more flexibility to the end users, as the Query Node data can be enriched with aggregations and any other on-chain data, while preserving the integrity provided by the original oracle feeds.

Figure 3: The Subsquid pallet brings the flexibility of Hydra endpoints on-chain via off-chain workers.

7 SQD Token
7.1 Token use-cases
The SQD token is the main unit of accounting within the Subsquid DAO. It has multiple use-cases and can be described as a hybrid payment, governance and work token.

• Payment medium. SQD is used to collect fees and distribute them across the DAO treasury and the various worker roles for the services provided.

• Work token. In order to take up a role, one must stake SQD tokens; the rewards are distributed based on the stake and the actual amount of work done (as calculated by the Cobb-Douglas formula).
• Delegation. SQD token holders can delegate their tokens to any role to
get a share of the fees.

• Governance. SQD token holders periodically vote to elect a governing council responsible for the development of the protocol, the economic parameters and dispute resolution.

7.2 Token allocation


The total supply of 1.337B SQD tokens is split between the Subsquid DAO
(Community) and the operational part. The non-community allocation is set
aside to fund the development of the project until the DAO is fully operational.
See Figure 4.

Figure 4: SQD allocation breakdown

8 Roadmap
Phase 0: Q3 2020. Hydra: a query-node framework
Hydra is a query node framework bootstrapped at the Hackusama hackathon in July 2020, where it was awarded first prize in the infrastructure track. By defining the database schema and the data transformation rules, one can develop a fully customized GraphQL API for a Substrate chain of choice.
After a year of development, Hydra has evolved into a collection of query-node services and tools:
• Hydra Indexer + Hydra Indexer Gateway: ingests raw blockchain data, indexes it and exposes it via an expressive GraphQL API
• Hydra Processor: extracts the relevant data from Hydra Indexers, transforms it and loads it into the query node storage
• Hydra CLI: code generation for the query node server, including the database schema and an Apollo-based GraphQL server with extensive filtering support

Phase 1: Q3 2021. Support of Hydra Indexers by Subsquid


The most resource-intensive part of the Hydra pipeline is the Hydra Indexer,
as it keeps a full index of the blockchain data. Subsquid has deployed Hydra
Indexers for the following key Substrate chains (and counting):
1. Polkadot
2. Kusama
3. Karura
4. Edgeware
5. Equilibrium
6. Robonomics

7. Subsocial
8. Joystream (Sumer Testnet)

Phase 2: Q4 2021. Query-Node-as-a-Service by Subsquid


The current phase extends Subsquid's support of Hydra services. Apart from using the publicly available indexers, it will be possible to deploy a production-ready Hydra-powered Query Node using a simple CLI tool. There is no need to host the infrastructure, and the transition from a local development environment to a production-ready one is just a matter of running a few CLI commands.

Phase 3: Q1 2022. Community-run Indexers


During the third phase we will be opening Indexer roles to the community.
Community-run Hydra Indexers will be used by Subsquid.io. Community-run
indexers will be rewarded through grants from the Subsquid DAO treasury.

Phase 4: Q3 2022. Subsquid DAO testnet. Onboarding Query Node operators

During this phase, query node operators will be manually selected from the community members to bootstrap the DAO. On-chain API requests are live on testnet. Query Node operators are rewarded through grants from the Subsquid DAO treasury.

Phase 5: Q4 2022. SQD delegation, query node data on-chain

• The worker roles will receive organic rewards for fulfilling requests from the query node operators

• The delegators receive a fee share


• A percentage of the fees is burned by the Subsquid DAO
• Query node data can be accessed by parachains as an oracle feed (e.g. via
integration with Kylin network or other oracle solutions) or via integration
with a Subsquid pallet.

9 Competitive advantages
9.1 Fast development cycles and local testability
Using a Hydra Indexer is as simple as pasting a URL into the config file. It relieves developers of the resource-intensive task of fetching the historical data from the chain, which may take hours or even days, not to mention handling runtime upgrades and maintenance. The lightweight processing pipeline enables easy debugging and fast iterations.

9.2 Focus on type-safety and advanced type system


With Hydra, the vast majority of ETL errors are caught at compile time. We believe that maintainable enterprise-grade APIs require the strong and expressive type systems offered by modern languages. Our schema definition language supports algebraic types, interfaces, custom JSON fields and entity relations. The additional code generation tooling guarantees data consistency even across runtime upgrades. The most common data deserialization errors are caught at compile time, unlocking truly complex use-cases. For example, the Hydra schema powering Joystream Atlas has 50+ entities.
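To give a feel for the kind of typed entities this enables, here is a hand-written TypeScript sketch (not Hydra's actual generated code) combining an algebraic type, an entity relation and a custom JSON field.

```typescript
// Hand-written sketch of the kind of typed entities codegen produces;
// not Hydra's actual output, just an illustration of the type safety.
type VideoAsset =
  | { kind: "ipfs"; cid: string }  // algebraic (union) type: each
  | { kind: "http"; url: string }; // variant is checked at compile time

interface Channel {
  id: string;
  title: string;
}

interface Video {
  id: string;
  channel: Channel;                  // entity relation
  asset: VideoAsset;
  metadata: Record<string, unknown>; // custom JSON field
}

// Accessing a variant field without narrowing the union first
// would be a compile-time error, not a runtime crash:
function assetLocation(a: VideoAsset): string {
  return a.kind === "ipfs" ? a.cid : a.url;
}
```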

9.3 Multi-role architecture vs monolithic nodes


Our approach allows arbitrary extensions, including the potential inclusion of arbitrary data sources such as off-chain and/or oracle data. Our solution resembles a modern loosely-coupled microservices architecture, in contrast to the monolithic approach used by the competitors. This is achieved through hybrid on- and off-chain accountability of the query node operators providing the API access.

9.4 Better resource utilisation


Our indexers can be used as a standalone explorer-like service or pipelined with a Hydra processor. Since many processors can share the same indexer, there is no need to run a resource-intensive archival node per project, which eliminates the redundant per-project indexing.

9.5 Flexible payment method


The customer is free to choose the payment currency: it can be SQD or any other currency approved by governance as legal tender for the Subsquid DAO. However, the protocol fee is always converted to SQD and then burned.

References
[1] Uniswap Info. https://info.uniswap.org, last accessed on 30/07/21.
[2] TheGraph. https://thegraph.com, last accessed on 30/07/21.
[3] TheGraph Curators. https://thegraph.academy/curators/definition/, last accessed on 30/07/21.
[4] Interplanetary File System. https://ipfs.io/, last accessed on 30/07/21.
[5] Research on protocol fees and liquidity incentives. https://gov.0x.org/t/research-on-protocol-fees-and-liquidity-incentives/340, last accessed on 30/07/21.
[6] Chainlink-Polkadot. https://github.com/smartcontractkit/chainlink-polkadot, last accessed on 30/07/21.

