Arweave Lightpaper

Version 0.9

Samuel Williams
William Jones
April 24, 2018

Contents

1 Introduction
2 Background
3 Motivation
4 Technology
    4.1 Blockweave
    4.2 Proof of Access
    4.3 Wildfire
    4.4 Blockshadows
    4.5 Democratic Content Policy
    4.6 Discussion
        4.6.1 Storage Pools
5 Building Apps
    5.1 Client-Server Architecture
    5.2 Serverless Architecture
    5.3 Event Based
    5.4 Trustless and Provable
6 Use Cases
    6.1 Authenticity
7 Conclusion

Abstract

Typical blockchains have several major, well-known problems with data storage. These problems require new third-party protocols to be integrated on top of existing blockchains, as fees are too high for on-chain storage to be feasible. Therefore, with typical blockchains there is always going to be a cost to access content, and content is never stored permanently. As the demand for data storage grows exponentially, a decentralized, low-cost data storage protocol that can scale becomes a necessity.

In this work we present Arweave, a protocol built on a new blockchain-like structure called the blockweave. The blockweave is a platform designed to provide scalable on-chain storage in a cost-efficient manner for the very first time. As the amount of data stored in the system increases, the amount of hashing needed for consensus decreases, thus reducing the cost of
storing data. The protocol's existing REST API makes it trivially simple to build decentralised applications on top of the blockweave, reflecting Arweave's focus on the developer community and their ability to drive adoption of emerging and novel protocols.

In this paper, we also introduce novel concepts such as block-shadowing, a flexibly sized transaction block distribution algorithm that improves on the 'sharding' techniques currently used by other blockchains; a self-optimising network topology; and a new consensus mechanism called proof of access.

1 Introduction

In this information age we often succumb to the illusion that because information is readily available, it can never be altered or lost. This is foundationally untrue [7]. While, in the internet, we have built a monumental system of decentralised information dissemination, we have yet to build a corresponding system of permanent knowledge storage. Modern history is full of examples of the destruction and loss of vital information, from fires at libraries and archives [9, 10, 3, 8], to book burning in authoritarian states [12, 11].

When we look up information on the internet, we are depending on being allowed access to centralised stores of that data. Access to the servers that hold this information can be revoked by their owners at any time. Similarly, as serving information on the internet requires the paying of server and upkeep costs, websites can often simply disappear when funds are no longer available. Further still, a number of governments are taking increasing steps to censor and remove access to politically sensitive information on the internet [13, 5, 4]. The same applies to media and news organizations: where we once held physical and irrevocable copies, we now simply access the information and then discard it. It has become commonplace for media organisations to update the contents of their articles over time. While this provides a number of advantages over the previous system, most prominently the ability to disseminate real-time updates about unfolding situations, it also allows important context to be lost or become obscured.

2 Background

All blockchain innovations sit on the shoulders of giants, including Bitcoin itself, a symphony of data structures, distributed networking and cryptography. We too have sought to further the space, solving specific shortcomings of existing blockchain networks, namely storage, and along the way finding a novel approach to transaction speeds. Most blockchain technologies today insist that a "full node" must maintain a copy of the entire blockchain in order to verify future transactions. While the Merkle data structures that make this possible are in and of themselves an impressive feat and add a layer of unparalleled security, we feel that some performance enhancements around this process could reduce the burden of synchronization for a full node. We present in section 4 several technologies that address block, node, and wallet
synchronization.

The full blockchain requirement is perhaps even more of a hindrance for existing blockchain technologies when it comes to storing data. In the case of Ethereum, a decentralised world computer, storage is incredibly costly using its native token. Arweave's primary motivation is to make permanent, immutable storage a reality, in the same way it is represented in Ethereum. However, high fees make this storage increasingly impractical. While it is possible to store data on Ethereum, previous attempts have been impractical due to data storage costs.

Other blockchain technologies have focused on improving consensus algorithms between nodes, notably Stellar Lumens, and dPoS architectures such as Ark and Neo. While this may improve transaction speeds, the burden of storage still remains the long-term hurdle many of these networks will face. By focusing on solving storage first, we have found several performance enhancements that can be applied to facilitate high-throughput currency transactions.

3 Motivation

We have designed and implemented a blockchain network where permanent, low-cost storage is a reality. Weaving storage access into consensus, combined with novel approaches to transaction bundling and arbitrarily sized blocks, creates a high-throughput cryptocurrency that improves on other cryptocurrencies like Bitcoin [10] and Ethereum [12].

In the past, archives (internet or otherwise) have typically been maintained by a single institution (or even individual), making them vulnerable to two primary forms of manipulation. The first of these is the modification of documents during their storage [2]. The second is that the documents could have been forged or modified prior to their entry into storage [1]. For example, many works attributed to Socrates are believed to have been penned by his disciples [6]. Arweave solves both of these problems. Once a document is stored on the weave, it is cryptographically linked with every other block on the weave. This ensures that any attempt to change the contents of the document will be detected and rejected by the network. In this way, no subversion of the information on the weave is possible. Arweave is a browsable sister network to the internet, providing the long-term, permanent data storage features that the internet desperately needs but currently lacks.

A critical component of the Arweave system is designed for developers to easily build applications that interface with, create, and use data from the network. These apps, built with a language-agnostic REST API, will each act as a node that listens to the network. The functions of these apps will be wide and varied, from decentralised and immutable social networks to discussion websites and news aggregators. In order to submit information to the weave, a small number of tokens will be required. These tokens will be used to pay miners for their work in maintaining the weave and network, as well as disincentivizing the propagation of spam. This
represents a great improvement over typical centralized storage systems. Similarly, it empowers individuals to ensure that the information they personally care about will be perpetuated into the future. The incentive to maintain the weave also increases as the network and documents reinforce the value of the tokens. As these effects compound, we expect Arweave tokens to become a valuable asset for the information age, inseparably and intrinsically linked to a vast trove of important documents.

4 Technology

Arweave is built on four core technologies that work together to create low-cost, high-throughput, permanent storage on a new blockchain. These innovations are:

• Blockweave
• Proof of Access
• Wildfire
• Blockshadows

While these technologies are intertwined, each plays a pivotal role in creating a new type of network suited for both fast transactions and low-cost permanent storage.

4.1 Blockweave

A well-known property of most blockchains is that every block must be stored to participate in validating transactions as a "full node". This is not the case with Arweave.

Instead, Arweave introduces two new concepts that allow nodes to fulfil key network functions without possessing the whole chain. The first of these concepts is the block hash list, a list of the hashes of all previous blocks. This allows old blocks to be verified, and potential new blocks to be evaluated effectively. The second of these concepts is the wallet list, a list of all active wallets in the system. This allows transactions to be verified without possessing the block in which the wallet's last transaction was included. Using the block hash list and wallet list, which are synchronized by the network and available for download by miners, nodes are able to join the network and participate in mining the weave almost immediately.

Further, instead of having each miner verify the entire block structure from the genesis block to the current block when they join the network, Arweave uses a system of 'ongoing verification'. When miners join the Arweave network, they download the current block and retrieve the block hash list and wallet list from it. Since these lists have been continuously verified through the ongoing progress of each block, new miners can start participating immediately without verifying the entire weave themselves. Full weave verification is of course available to any node that wishes to perform it. In this way, miners do not need to find the previous transaction associated with a wallet in order to verify a new transaction. Instead, miners simply need to verify that the transaction has been appropriately signed by the wallet owner's private key. To prevent recall block forging attacks, the hash of the block hash list is
distributed with every new block.

Figure 1: An illustration of the blockweave data structure, demonstrating the link to both the previous block and the recall block.

4.2 Proof of Access

Arweave's consensus mechanism is based on proof of access (PoA) and proof of work (PoW). While typical PoW systems only depend on the previous block in order to generate each successive block, the PoA algorithm incorporates data from a randomly chosen previous block. Combined with the blockweave data structure, this means miners do not need to store all blocks (forming a blockchain), but rather can store any previous blocks, incentivised by PoA and wildfire, forming a weave of blocks: a blockweave. The 'recall block' to incorporate into the next block is chosen by taking the hash of the current block and calculating its modulus with respect to the current block height.

The transactions in the recall block are hashed alongside those found in the current block in order to generate the next block. When a miner finds an appropriate hash, they distribute the new block along with the recall block to other members of the network. This allows the other members of the network, even those without their own copy of the recall block, to independently verify that the new block is valid.

4.3 Wildfire

As a data storage system, Arweave requires not only the ability to store large amounts of information, but also to provide access to that data in the most expedient manner possible. Further, an important part of the Arweave system is costless access to data at the point of request. Consequently, Arweave has an added layer of incentives to encourage miners to share data freely.

Wildfire is a system that solves the problem of data sharing in a decentralised network by making the rapid fulfilment of data requests on the network a necessary part of participation. Wildfire works by creating a ranking system local to each node that determines how quickly new blocks and transactions are distributed to peers, based on how quickly they respond to requests and accept data from others. Peers are served in the order of their rank, with poorly performing peers being blacklisted from the network entirely. Peers are financially incentivised to stay well positioned in each other's rankings
so that they can spend the largest amount of time efficiently mining.

Figure 2: Illustration of the wildfire system. Each node ranks its peers based on how favourably these peers have behaved to them previously.

This strongly encourages nodes in the system to behave in the most friendly manner possible to other peers, without cost to those who are receiving the data, even those who may potentially be making one-time requests. Even further, it creates a network topology that adapts to the most efficient routes for global distribution, as connections that allow fast transfer of new data around the system are preferred. In practice, the wildfire mechanism builds a network topology that maps the underlying physical connection substrate of the internet, adapting to changes in its architecture over time. Overall, the wildfire system ensures high-speed distribution of new blocks and keeps data available with short latency.
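
As a minimal sketch of the local ranking idea in this section, the Python below keeps a per-node table of peers, orders them by how quickly they have responded, and blacklists slow peers. The class and field names, the smoothing rule and the five-second cut-off are all assumptions made for illustration; the paper does not specify how rank is computed or when a peer is blacklisted.

```python
from dataclasses import dataclass, field


@dataclass
class PeerStats:
    """Locally observed behaviour of one peer (illustrative fields only)."""
    avg_response_s: float = 0.0  # smoothed response time in seconds
    samples: int = 0             # number of observations recorded so far


@dataclass
class WildfireRanking:
    """Hypothetical per-node peer ranking; not the actual Arweave implementation."""
    smoothing: float = 0.5          # assumed weight given to the newest observation
    blacklist_after_s: float = 5.0  # assumed cut-off for "poorly performing" peers
    peers: dict = field(default_factory=dict)
    blacklist: set = field(default_factory=set)

    def record_response(self, peer: str, seconds: float) -> None:
        """Update a peer's smoothed response time after it answers a request."""
        stats = self.peers.setdefault(peer, PeerStats())
        if stats.samples == 0:
            stats.avg_response_s = seconds
        else:
            stats.avg_response_s = (
                self.smoothing * seconds
                + (1 - self.smoothing) * stats.avg_response_s
            )
        stats.samples += 1
        # In this sketch a blacklisted peer is never reinstated.
        if stats.avg_response_s > self.blacklist_after_s:
            self.blacklist.add(peer)

    def distribution_order(self) -> list:
        """Peers in the order they would be sent new blocks and transactions."""
        ranked = [p for p in self.peers if p not in self.blacklist]
        return sorted(ranked, key=lambda p: self.peers[p].avg_response_s)
```

A node using this sketch would call `record_response` whenever a peer answers a request and consult `distribution_order` when it has a new block or transaction to share, so that faster peers hear about new data first.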

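
Returning to the recall-block rule of Section 4.2 (the hash of the current block taken modulo the current block height), a small Python sketch may make the selection concrete. SHA-256, the big-endian integer interpretation, the length-prefixed encoding and all function names are assumptions chosen for this example; the paper does not state which hash function or serialisation the protocol uses.

```python
import hashlib


def block_hash(block_bytes: bytes) -> bytes:
    """Stand-in block hash; SHA-256 is an assumption made for this sketch."""
    return hashlib.sha256(block_bytes).digest()


def recall_block_height(current_hash: bytes, current_height: int) -> int:
    """Height of the recall block: current block hash modulo current height."""
    if current_height < 1:
        raise ValueError("a recall block needs at least one earlier block")
    return int.from_bytes(current_hash, "big") % current_height


def candidate_hash(recall_txs, current_txs, nonce: int) -> bytes:
    """Hash recall-block and current-block transactions together with a nonce.

    Each transaction is length-prefixed so that different transaction lists
    cannot serialise to the same byte string (an illustrative encoding).
    """
    hasher = hashlib.sha256()
    for tx in list(recall_txs) + list(current_txs):
        hasher.update(len(tx).to_bytes(8, "big"))
        hasher.update(tx)
    hasher.update(nonce.to_bytes(8, "big"))
    return hasher.digest()
```

Because the result of the modulus is always smaller than the current height, the selected block is always an earlier one, and a miner without that block's transactions cannot compute `candidate_hash` at all.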
4.4 Blockshadows

In a traditional blockchain system, when a new block is mined, the entire block is distributed to every node in the network, no matter how much of the block data a node already possesses. This is not only an enormous waste of data, but significantly slows down the rate at which a network can come to consensus about a block. Arweave therefore introduces a new technology, blockshadows, that not only minimises this waste of data, but enables fast block consensus and massive transaction throughput.

Blockshadowing works by partially decoupling transactions from blocks, and only sending between nodes a minimal block "shadow" that allows peers to reconstruct a full block, instead of transmitting the full block itself. These blockshadows specifically contain a hash of the wallet list and hash list, and in place of the transactions inside a block, only contain a list of transaction hashes. From this information (likely only a few kilobytes), a node that already holds all of the transactions inside the block and an up-to-date hash list and wallet list can reconstruct an entire block of almost arbitrary size. To facilitate this, nodes will also immediately share transactions with one another, but only attempt to place transactions inside a block once they have a high certainty that other nodes in the network also have the transaction.

The result of this blockshadowing system is a fast and flexible block distribution system that allows transactions to be processed as fast as they can be distributed around the network, and consensus about blocks to be achieved at near network speed. Further, this system ensures transaction fees do not increase dramatically when network usage is high. The theoretical limit on transaction throughput on an optimistic 100 Mbps network is around 5000 transactions per second.

4.5 Democratic Content Policy

To support the freedom of individual participants in the network to control what content they store, and to allow the network as a whole to democratically reject content that is widely reviled, the Arweave software provides a blacklisting system. Each node maintains an (optional) blacklist containing, for example, the hashes or substrings of certain data that it doesn't wish to ever store, and will never write to disk content that matches this. These blacklists can be built by individuals or collaboratively, or can be imported from other sources.

At a local level, these blacklists allow nodes to control their own content, but the sum of these local rejections also creates network-wide content rejection. Content that is rejected by more than half the network will not only be rejected by each of those individual nodes, but will also be rejected by the wider network as a whole. This creates a democratic network-wide content rejection system that can merge blacklists across a variety of cultures and opinions into a tiny, specific blacklist of content that is universally reviled. This near-universal, democratic blacklist shields the network from outside censorship by a small number of actors while still
allowing it the freedom to protect itself in a democratic manner.

4.6 Discussion

4.6.1 Storage Pools

One potential theoretical attack against an Arweave network that has become extraordinarily large is that miners may work co-operatively to maintain a single copy of the weave, which they all access to retrieve recall blocks. While this kind of behaviour may at first seem problematic, this is not in fact the case. If such 'storage pools' were employed by a large proportion of the miners, the incentive for other miners to store rare blocks increases. This is because if the centralised stores become unavailable, miners with a copy of the rare blocks will be highly likely to receive the reward when that block becomes the recall block in the future. This self-interested behaviour provides a risk-offsetting function to the network, which scales as the potential for data loss (caused by centralised storage pools) grows.

5 Building Apps

Applications using the weave can be built using a simple REST API. The REST endpoints are HTTP and access the network directly, such that any Arweave wallet is capable of reading and writing data. The client only needs to bring their Arweave wallet to a website through a Chrome extension or native application with Arweave wallet integration, in order to read or write data from/to the network. There are several architectures that can be built on top of the weave.

5.1 Client-Server Architecture

Traditional web or native applications have a client-server architecture. A server running in the cloud will be "Arweave enabled", interacting with one or more Arweave nodes, reading and writing data on behalf of clients. These services can be websites with clients as visitors, or they can be native applications passing client requests to a server operated by the developers. These servers will need to maintain a float of AR tokens in order to ensure that requests for writing data can be processed. Reading data from the weave, however, is still free using this architecture.

Monetization potential for this architecture is simple. A developer will need to accrue more value through advertising, monthly subscriptions or direct payments for a wrapper "credit" within their application than the amount of AR tokens they are utilizing to power their storage. There are many applications for permanent immutable storage. Examples include storing quantum-resistant, encrypted legal case files, identity records or medical records. While some legislation needs to accommodate sensitive information storage, geographical boundaries and the right to be forgotten, this can also be somewhat mitigated through encryption and key management. Several revenue-generating models can be layered on top of the weave, with the primary value proposition being permanent immutable storage on-chain.
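
The token float described in this section can be illustrated with a toy, in-memory model. This is only a sketch under stated assumptions: the class name, the flat per-write fee, the use of a SHA-256 digest as a record identifier and the dictionary standing in for the weave are all hypothetical, and no real Arweave endpoint is called.

```python
import hashlib
from typing import Dict, Optional


class ArweaveEnabledServer:
    """Toy model of the server in Section 5.1; every name here is hypothetical."""

    def __init__(self, token_float: int, fee_per_write: int) -> None:
        self.token_float = token_float      # AR held by the server, in arbitrary units
        self.fee_per_write = fee_per_write  # assumed flat fee charged for each write
        self._weave: Dict[str, bytes] = {}  # in-memory stand-in for the weave

    def write(self, data: bytes) -> Optional[str]:
        """Store data for a client, paying the fee out of the server's float."""
        if self.token_float < self.fee_per_write:
            return None  # float exhausted: the write cannot be processed
        self.token_float -= self.fee_per_write
        record_id = hashlib.sha256(data).hexdigest()  # illustrative identifier
        self._weave[record_id] = data
        return record_id

    def read(self, record_id: str) -> Optional[bytes]:
        """Reads are free and never touch the float."""
        return self._weave.get(record_id)
```

In this model a service stays viable only while its income tops up `token_float` at least as fast as writes drain it, which is the monetization condition stated above.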

5.2 Serverless Architecture

Applications can live on the weave itself, accessed by a client through an Arweave-enabled browser. Due to the ubiquity of browsers and proliferation of web technology, it makes most sense to store these applications as standard frontend web applications using HTML/CSS/JS. However, if the client's native application included an interpreter/parser for other languages, such as LLVM bytecode or a scripting language like Python, such applications could run on the client and perhaps benefit from the same upgradability found in web applications.

Developers will not only be able to deploy serverless applications to Arweave; these applications will also be able to write persistent and provable state to the network. Since Arweave does not impose a particular data structure, developers are free to store their data in the format that makes the most sense for them. If the application is best served by a highly optimized Merkle structure such as the one found in the Ethereum Virtual Machine (EVM), it can be easily implemented on the weave. If more text-blob-style storage is what the developer is looking for, this is trivial as well.

Serverless applications are extremely interesting as they can write their own data. Layering on distributed computation will, for example, allow the training of neural networks to store their results, possibly sharing their resultant models with other nets.

5.3 Event Based

In the early days of Twitter, there was a thriving ecosystem of cottage industry applications and developers building on top of the "firehose" APIs that were streaming tweets to anyone willing to pay for access. This is not the case anymore, and in the wake of the Facebook Cambridge Analytica fiasco, many "trusted partners" of these services that provided data analytics to their clients are being arbitrarily shut off.

Arweave is a decentralised network of public data and thus can never censor data access or the data itself, with the exception of democratically rejected content. This means that developers are free to build on top of Arweave and can listen for incoming data using the REST API. As events are triggered, the listeners will fire the appropriate function calls of the clients subscribed to those events. Developers need not fear being throttled or shut down, as the network is incentivised to provide them with reliable access to the data feed.

5.4 Trustless and Provable

Application architectures can be designed so that storing information that must be guaranteed tamper-proof is easy to implement. Additionally, provably fair runtime code can be stored on the weave and interpreted directly by the client. Using the transaction ID of the content, the client can verify the payload from the weave prior to computation and be guaranteed that the code they are running is both trustless and provably fair, i.e. it is the same code that other clients are running. This opens up interesting possibilities for trustless random number generators and other oracle-based services, perhaps serving other blockchain networks.

6 Use Cases

Permanent storage has several use cases. One example is regulation that requires documents to be archived for a certain number of years. Provable media reporting, academic research and immutable records are becoming increasingly important in our modern world of echo chambers and proliferation of fake news.

6.1 Authenticity

Too often the legal system is tied up with litigation over the authenticity of documents. Arweave solves this problem by providing an indefinite and verifiable store of any digital content from an author. In 2017, the state of Delaware ruled that blockchain evidence is admissible in court proceedings. These records could dramatically speed up disputes over artistic attribution and intellectual property battles. The effects are twofold for the creative economy, allowing artists to license their work to others instantly and avoid frivolous litigation.

7 Conclusion

We have presented a new blockchain network powering low-cost immutable data storage and a high-throughput cryptocurrency. The Arweave protocol is made possible through the use of a new blockchain-like data structure called the blockweave; flexible-size transaction block distribution via blockshadowing; a new consensus mechanism, called proof of access, that reduces dependency on proof of work; and a self-optimising network topology called wildfire. Much like the Bitcoin network, our technical advancements in isolation are not terribly complex; however, when combined to form the whole of the network, the emergent behavior is extremely powerful. We have seen from our testnet results that secure, reliable and immutable data storage is possible on a public, permissionless and decentralised network protocol. In addition to data storage, arbitrary-size blocks make a secure high-throughput cryptocurrency possible without having to resort to complicated consensus mechanisms such as dBFT or dPoS.

Arweave is tightly woven into the fabric of the internet through its REST API, and several revenue-generating businesses are being built using the Arweave mainnet. Bridges between Arweave and other popular cryptocurrencies, secure computation, and smart contract protocols will enable a low-cost and permanent data store to be easily integrated into the technology stack of decentralised applications. A fully globalized world of information and financial exchange requires permanent records. Through a combination of cryptography and distributed systems, we have provided the basis for those permanent recordings. We hope Arweave will become an essential companion to existing internet protocols such as the world wide web, working with
others to build a more open and transparent future.

References

[1] The National Archives: Investigation into forged documents discovered amongst authentic public records. http://discovery.nationalarchives.gov.uk/details/r/C16525.

[2] North's ex-secretary tells of altering memos. http://www.nytimes.com/1989/03/23/us/north-s-ex-secretary-tells-of-altering-memos.html.

[3] The patent fire of 1836. http://patent.laws.com/patent-act-of-1836/patent-act-of-1836-patent-fire-of-1836.

[4] Mustafa Akgul and Melih Kirlidog. Internet censorship in Turkey. Internet Policy Review, 4(2):1–22, 2015.

[5] Fernando Baez. A universal history of the destruction of books: From ancient Sumer to modern Iraq. Atlas Books, 2008.

[6] Anton-Hermann Chroust. Socrates–a source problem. The New Scholasticism, 19(1):48–72, 1945.

[7] Anne Frank and Storm Jameson. Anne Frank's diary. Vallentine, Mitchell, 1971.

[8] Brewster Kahle. Fire update: Lost many cameras, 20 boxes. No one hurt., 2013. https://blog.archive.org/2013/11/06/scanning-center-fire-please-help-rebuild/.

[9] Birmingham Public Libraries. Notes on the history of the Birmingham Public Libraries, 1861-1961. Birmingham Public Libraries Birmingham, 1962.

[10] Satoshi Nakamoto. Bitcoin: A peer-to-peer electronic cash system, 2008.

[11] Jonathan Rose. The holocaust and the book: destruction and preservation. Univ of Massachusetts Press, 2008.

[12] Gavin Wood. Ethereum: A secure decentralised generalised transaction ledger. Ethereum Project Yellow Paper, 151, 2014.

[13] Xueyang Xu, Z. Morley Mao, and J. Alex Halderman. Internet Censorship in China: Where Does the Filtering Occur?, pages 133–142. Springer Berlin Heidelberg, Berlin, Heidelberg, 2011.
