
DataChain

By James Spann
The problem

 Data is locked away by high barriers


 Storage issues
We believe that data should be open access

Image from: https://commons.wikimedia.org/wiki/File:Open_Access_PLoS.svg
DataChain
A decentralized option for researchers to share their research and contribute to their communities (no cash required!)
What is a Blockchain?

• Decentralized

• Immutable and Secure way to store data


• Referencing capabilities
• Peer to Peer

Image from:
https://crackinterviewtoday.wordpress.com/tag/linked-lists/

Image from:
https://commons.wikimedia.org/wiki/File:Bitcoin_Block_Data.svg
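The "immutable" and "referencing" properties above can be illustrated with a minimal hash-linked list. This is only a sketch, not DataChain's actual block format: the field names (`index`, `payload`, `prev_hash`, `hash`) and the helper functions are hypothetical, and SHA-256 is assumed as the hash function.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # assumed placeholder for "no previous block"


def block_hash(index, payload, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    body = json.dumps(
        {"index": index, "payload": payload, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain, payload):
    """Append a block that references the hash of the current last block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    block = {
        "index": index,
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": block_hash(index, payload, prev_hash),
    }
    chain.append(block)
    return block


def is_valid(chain):
    """Recompute every hash and check every back-reference."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i > 0 else GENESIS_PREV
        if block["prev_hash"] != expected_prev:
            return False
        recomputed = block_hash(block["index"], block["payload"], block["prev_hash"])
        if block["hash"] != recomputed:
            return False
    return True
```

Because each block's hash covers the previous block's hash, editing an earlier payload makes `is_valid` fail unless every later block is rewritten as well.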
How does it work?

 Wallet creation
 Nodes with data (models and datasets) activate and broadcast their information to other nodes on the network
 These nodes verify the integrity of data
 If the data looks good, the node takes a hash and sends it out to all the other nodes. Once 51% consensus is
achieved amongst all the nodes, we can
 Data is stacked on top and we repeat
 Unlike a traditional Blockchain we do not use currency as an incentive for verifying data
 Data is stored in a reversible format
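The verify-then-vote steps above can be sketched as follows. This is a simplified illustration under stated assumptions: each node checks the data it received against a broadcast SHA-256 hash, and the 51% figure from the slide is applied as a simple share of approving nodes. The function names are hypothetical, and networking is left out entirely.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 hex digest of the raw data (hash function is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def node_vote(received_data: bytes, claimed_hash: str) -> bool:
    """A node approves only if its copy of the data matches the broadcast hash."""
    return digest(received_data) == claimed_hash


def reaches_consensus(votes, threshold=0.51):
    """True when the share of approving nodes meets the threshold."""
    if not votes:
        return False
    return sum(votes) / len(votes) >= threshold


# Example: four nodes receive the broadcast, one copy was corrupted in transit.
data = b"college scorecard model weights"
claimed = digest(data)
copies = [data, data, data, b"corrupted copy"]
votes = [node_vote(copy, claimed) for copy in copies]
accepted = reaches_consensus(votes)
```

Here three of four nodes approve, so the data would be accepted. An even split would fall below the 51% threshold and be rejected.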
What have we solved with this solution?

 Allows knowledge to be spread without barriers


 Allows data to be sharded and replicated so it can be preserved despite a change of opinion (e.g. Trump and NASA climate data, MIT Borgs)
 Avoid storage issues of centralization
 Prevents two people from working on the same thing
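The sharding and replication point above can be sketched as follows. This only illustrates the general idea, not how DataChain places data: fixed-size shards, round-robin placement, and the function names are all assumptions made for the example.

```python
def shard(data: bytes, shard_size: int):
    """Split data into fixed-size pieces (the last piece may be shorter)."""
    return [data[i:i + shard_size] for i in range(0, len(data), shard_size)]


def place_shards(shards, node_ids, replicas=2):
    """Assign each shard to `replicas` distinct nodes, round-robin."""
    if replicas > len(node_ids):
        raise ValueError("need at least as many nodes as replicas")
    storage = {node: {} for node in node_ids}
    for idx, piece in enumerate(shards):
        for r in range(replicas):
            node = node_ids[(idx + r) % len(node_ids)]
            storage[node][idx] = piece
    return storage


def reassemble(storage, shard_count, offline=()):
    """Rebuild the data from online nodes; return None if any shard is lost."""
    recovered = {}
    for node, held in storage.items():
        if node in offline:
            continue
        recovered.update(held)
    if len(recovered) < shard_count:
        return None
    return b"".join(recovered[i] for i in range(shard_count))
```

With two replicas per shard, the data survives any single node going offline. Losing more nodes than there are replicas can make a shard unrecoverable.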
An example:

 As an example, we created a TensorFlow model that predicts the location of a college in the US based on information such as
 Bytecode nature allows us to use this as a translator to other libraries

Image from: https://www.kdnuggets.com/2017/12/tensorflow-short-term-stocks-prediction.html

Data from:
https://catalog.data.gov/dataset/college-scorecard
Future Work

 Harnessing the CPU power of all the devices on the network to train larger models
 Rewriting the codebase in Go or C++ rather than Python
 Adding a DNS for easier discovery
 Creating a key access system for patented models
 Extending the interface beyond
 Block Explorer so data can be looked at without running a node
 Pulling gradients directly from the TensorFlow graph
Thank you
Any questions?
