
Hyperledger v1 Ledger

High-level Design

Objectives

Support v1 endorsement/consensus model - separation of simulation (chaincode execution) and block commit
Endorsement/simulation (chaincode execution) can be performed on a subset of peers.
Parallel execution of chaincode (concurrency)
Improved scalability

Embed transaction read/write sets on the blockchain (input-version and postimage)


Immutability, Auditing, Provenance

Optimize data storage for blockchain use pattern


New file-based ledger for improved performance
Continue using RocksDB for indexes to optimize ledger queries

Support pluggable data stores and rich query language


Challenging given the first objective: most databases do not support the simulation and
read/write set requirements. Limitations are likely. Next priority for investigation...

Ledger - Current work focus

KV-ledger
(High level components)
Block storage
Stores and retrieves blocks
Assumes blocks arrive in exact sequence
Queries supported
Retrieve blocks by block-hash and block-number
Scan a range of blocks between two block numbers
Retrieve Transaction by txId

Transactions execution
Simulates transactions and produces ReadWriteSet (Endorser)
Queries/Updates GetKey/SetKey/GetKeyRange
Validates and applies ReadWriteSet (Committer)
Key version based validation (MVCC)
Read-only queries
GetKey/GetKeyRange

Filesystem-based Block Storage

Blocks are stored in file segments

Each file segment contains

File segment header (version etc.)


A sequence of varint-encoded length of block-bytes followed by block-bytes

RocksDB contains block indexes to support common queries

Default segment size 64 MB

Index block by hash, Index block by number, Index transaction by Id


Value of index is a file-offset-pointer
Potentially encode the starting block number in the segment file name, include a segment-specific block index at
the end of each segment file, and use blockNumber_tranId for the transaction ID, so that you can easily jump
to the segment file given a block number or transaction ID, without needing an external blockNum or txId
index (would still need an external blockHash index)
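
As a minimal sketch of the segment layout described above, the following Python models one segment as an in-memory byte buffer, with a dictionary standing in for the RocksDB block index. The `SegmentWriter` class, the `SEGv1` header bytes, and the function names are hypothetical and chosen only for illustration. The varint is the common base-128 encoding, which the slides do not specify.

```python
from typing import Dict, Tuple


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (low 7 bits first)."""
    if n < 0:
        raise ValueError("varint requires a non-negative integer")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a varint starting at pos; return (value, position after it)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


class SegmentWriter:
    """Appends length-prefixed blocks to one segment and records index entries."""

    def __init__(self, seg_no: int, header: bytes = b"SEGv1"):
        self.seg_no = seg_no
        self.buf = bytearray(header)  # hypothetical segment header
        # blockNum -> (segNo, offset); a stand-in for the RocksDB block index
        self.index_by_num: Dict[int, Tuple[int, int]] = {}

    def append_block(self, block_num: int, block_bytes: bytes) -> int:
        offset = len(self.buf)
        self.buf += encode_varint(len(block_bytes))
        self.buf += block_bytes
        self.index_by_num[block_num] = (self.seg_no, offset)
        return offset


def read_block(segment: bytes, offset: int) -> bytes:
    """Read one block given the file-offset-pointer stored in the index."""
    length, start = decode_varint(segment, offset)
    return segment[start:start + length]
```

Looking up a block by number is then an index read followed by `read_block` at the stored offset. Indexes by block hash and txId would follow the same pattern with different keys.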

Usage

Raw ledger: stores batches of raw transactions to be committed

Final validated ledger: stores committed blocks of valid transactions

[Figure: the RocksDB block index maps blockHash, blockNum, and txId each to SegNo + offset; file segment seg-1 holds the file segment header followed by Block-1 length, Block-1, Block-2 length, Block-2.]

Filesystem-based Block Storage


Pros
Blocks arrive in a sequential order resulting in efficient append-only workload
Avoids the write amplification associated with RocksDB and other storage solutions
Becomes more feasible to move large numbers of blocks in bulk, for example when a new
peer comes online (move entire files instead of reading/writing N blocks).

Cons
Custom block data management on file system
Need to maintain sanity of file segments and consistency between block files and RocksDB
indexes
Need utilities to validate that block files and RocksDB are in sync, and to re-build indexes
as needed

Logical structure of a RWSet


Block {
  "Transactions" : [
    {
      "Id" : txUUID1,
      "Invoke" : "Method(arg1, arg2, .., argN)",
      "TxRWSet" : [
        { "Chaincode" : ccId,
          "Reads" : [{"key" : key1, "version" : v1}], // if a Tx performs both a read and a write on a key, the key appears only in Writes
          "Writes" : [{"key" : key2, "version" : v2, "value" : bytes1}] // a missing value indicates a delete operation
        } // end chaincode RWSet
      ] // end TxRWSet
    }, // end transaction with "Id" txUUID1
    { }, // another transaction
  ] // end Transactions
} // end Block

JSON syntax only for conceptual representation


Data is serialized in binary representation - sorted order of ccIds and sorted keys within chaincode
Notes:
Need to add chaincode version. It will be used for auditing, and perhaps for commit validation as well, especially upon chaincode upgrade. Need to go
through all upgrade scenarios, e.g. ensure simulation was done on the latest chaincode version available.
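
The sorted ordering mentioned above can be sketched as follows. This is an illustrative model only: the dataclass names and the `canonicalize` helper are hypothetical, and the actual binary serialization is not shown. The point is that sorting chaincode IDs, and keys within each chaincode, gives every peer the same ordering for the same read/write set.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class KVRead:
    key: str
    version: int


@dataclass
class KVWrite:
    key: str
    version: int
    value: Optional[bytes]  # None models a missing value, i.e. a delete


@dataclass
class ChaincodeRWSet:
    chaincode: str
    reads: List[KVRead] = field(default_factory=list)
    writes: List[KVWrite] = field(default_factory=list)


def canonicalize(tx_rwset: List[ChaincodeRWSet]) -> List[ChaincodeRWSet]:
    """Return a copy ordered by ccId, with reads and writes ordered by key."""
    return [
        ChaincodeRWSet(
            cc.chaincode,
            sorted(cc.reads, key=lambda r: r.key),
            sorted(cc.writes, key=lambda w: w.key),
        )
        for cc in sorted(tx_rwset, key=lambda c: c.chaincode)
    ]
```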

Transaction execution - Version maintenance

Version maintenance
It should be possible to detect whether a key has changed between the simulation and commit phases of a
transaction (MVCC validation)
Versioning scheme for a unique version per key - two options:
Incrementing numbers (initial implementation)
txID of the last committed transaction that updated the key (implement with config option and compare)
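
A minimal sketch of the first option (incrementing numbers) follows. It assumes, for illustration only, that state is a dictionary of key to (version, value), that a key never written has version 0, and that a delete keeps the entry with a `None` value so the version keeps increasing. The function names are hypothetical.

```python
def current_version(state, key):
    """Version of the latest committed write to key, or 0 if never written."""
    entry = state.get(key)
    return entry[0] if entry is not None else 0


def validate_and_commit(state, reads, writes):
    """MVCC check followed by commit for a single transaction.

    reads:  {key: version observed during simulation}
    writes: {key: (version observed during simulation, new value or None)}
    Returns True if the transaction was committed, False if it was stale.
    """
    observed = list(reads.items()) + [(k, ver) for k, (ver, _) in writes.items()]
    for key, seen in observed:
        if current_version(state, key) != seen:
            return False  # key changed between simulation and commit
    for key, (seen, value) in writes.items():
        state[key] = (seen + 1, value)  # value None acts as a delete marker
    return True
```

A transaction simulated against version 1 of a key is rejected if another transaction has since committed version 2 of that key.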

Pro/Cons of using TxId as version identifier


Pro
Does not require introducing a new concept (e.g., auto-incrementing number for each key
separately)
Consistent with popular bitcoin transaction structure
(key + version) is equivalent to an input; (key + newValue) is equivalent to a UTXO output
Provides built-in provenance: a pointer to the prior transaction for this key, which can easily be
traversed backwards to track the full history of a key over time
Separate fork ID not required in PoW for uniqueness

Cons
Transaction IDs are significantly longer than incrementing numbers (txIds may be 32 bytes if a crypto
hash of the contents is used) in the case of pbft
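
The provenance point above can be illustrated with a short sketch. It assumes a hypothetical in-memory `tx_log` that maps each txId to the writes it committed, where each write records the txId that served as the key's previous version (`None` for the first write). Following these pointers backwards yields the full history of a key.

```python
def key_history(tx_log, key, latest_tx_id):
    """Walk backwards from the latest writer of key to its first writer.

    tx_log: {txId: {key: (previous txId or None, value or None)}}
    Returns a list of (txId, value) pairs, newest first.
    """
    history = []
    tx_id = latest_tx_id
    while tx_id is not None:
        prev_tx_id, value = tx_log[tx_id][key]
        history.append((tx_id, value))
        tx_id = prev_tx_id
    return history
```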

Transaction execution - Simulation (Chaincode execution)

Transaction simulation

RocksDB contains latest state index for fast simulation queries

A scheme for simulating a transaction on a consistent copy of the data

Index by composite key (ccId:keyId)


If chaincodes are limited in number, use a separate column family per chaincode (configurable?)
Collocating keys of a chaincode gives faster transaction simulation, particularly for range scan queries
Latest value encoded as [version:deleteMarker:latestValueBytes(if present)]
Value bytes can be a file-offset-pointer to block storage for very large values (configurable default - over 1 MB?)
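
One possible byte layout for the state index is sketched below. The slides give only the logical form [version:deleteMarker:latestValueBytes], so the concrete choices here are assumptions: an 8-byte big-endian version, a 1-byte delete marker, and a zero byte separating ccId from keyId in the composite key. All function names are hypothetical.

```python
import struct

_HEADER = struct.Struct(">QB")  # 8-byte version, 1-byte delete marker


def state_key(cc_id: str, key_id: str) -> bytes:
    """Composite key; a shared ccId prefix keeps a chaincode's keys adjacent."""
    return cc_id.encode("utf-8") + b"\x00" + key_id.encode("utf-8")


def encode_state_value(version: int, deleted: bool, value: bytes = b"") -> bytes:
    """Pack version and delete marker, followed by the value if present."""
    body = b"" if deleted else value
    return _HEADER.pack(version, 1 if deleted else 0) + body


def decode_state_value(raw: bytes):
    """Return (version, deleted, value); value is None for a deleted key."""
    version, marker = _HEADER.unpack_from(raw, 0)
    deleted = bool(marker)
    value = None if deleted else raw[_HEADER.size:]
    return version, deleted, value
```

Because keys of one chaincode share a prefix, they sort next to each other in a byte-ordered store, which is what makes range scans within a chaincode cheap.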

Tx simulation to perform on a stable snapshot, supporting concurrency (initial two options):

Locking based concurrency control (initial implementation)


Read locks on RocksDB state by simulator(s) and write lock during commit
Snapshot based concurrency control (implement with config option and compare under load)
Create a RocksDB snapshot and simulate on the snapshot
Does not prevent concurrent commit of new blocks
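
The snapshot-based option can be sketched as follows. A dictionary copy stands in for a RocksDB snapshot here, and the `TxSimulator` class with its `get_key`/`set_key` methods is a hypothetical illustration rather than the ledger API. As a simplifying assumption of this sketch, reads always come from the snapshot and do not see the transaction's own pending writes.

```python
def take_snapshot(state):
    """Stand-in for a store snapshot: a frozen view of key -> (version, value)."""
    return dict(state)


class TxSimulator:
    """Runs reads against a snapshot and records a read/write set."""

    def __init__(self, snapshot):
        self._snapshot = snapshot
        self.reads = {}   # key -> version observed
        self.writes = {}  # key -> new value (None would mean delete)

    def get_key(self, key):
        # Version 0 marks a key that has never been written (sketch assumption).
        version, value = self._snapshot.get(key, (0, None))
        self.reads.setdefault(key, version)
        return value

    def set_key(self, key, value):
        # Writes are buffered only; the snapshot and live state are untouched.
        self.writes[key] = value
```

A commit that lands after the snapshot is taken does not disturb the simulation. The stale read is caught later, when the recorded version is compared with the committed one.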

[Figure: the RocksDB state index maps ccId+keyId to version+deleteMarker+latestValueBytes.]

This is a simulation runtime optimization. Alternatively, state key index could point to ledger block/transaction
storage write set, and we could read values from there as the single source of truth, but would not be as efficient.
Bitcoin uses a similar index in LevelDB for unspent transactions.

Transaction execution - Validation/Commit


Committing peer choreography
Receive batches of transactions from consensus (ordering service)
Call Validation System Chaincode (VSCC) to ensure endorsement policy has been fulfilled
Call ledger to perform Multiversion Concurrency Control (MVCC) check; remove invalid
transactions; build block of remaining valid transactions
Initial implementation with sequential validation
Extend to parallel validation of transactions in a block
Using a lock manager that maintains one lock for each key (acquire locks
sequentially and, once all the locks are acquired, start performing validation)
Split transactions into conflict-free batches by dependency analysis and perform
validation in parallel
Call Committer System Chaincode (CSCC) via gossip to ensure final blocks are same
across peers
Call ledger to commit validated block to file-based storage and update RocksDB indexes
Notes: Also need to validate that transaction id has not already been used.
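
The ledger-side steps above can be sketched as follows, under stated assumptions. Transactions are plain dictionaries with `id`, `reads` ({key: version}), and `writes` ({key: (version, value)}), state maps key to (version, value) with version 0 for unwritten keys, and VSCC, CSCC, and file storage are omitted. `validate_batch` models the sequential initial implementation. `split_conflict_free` is one hypothetical, order-preserving way to group transactions for parallel validation, not necessarily the intended dependency analysis.

```python
def validate_batch(state, used_tx_ids, batch):
    """Sequential MVCC validation; returns (valid transactions, pending updates)."""
    pending, seen_ids, valid = {}, set(), []

    def version_of(key):
        # Earlier valid transactions in the same batch count as committed.
        if key in pending:
            return pending[key][0]
        return state.get(key, (0, None))[0]

    for tx in batch:
        if tx["id"] in used_tx_ids or tx["id"] in seen_ids:
            continue  # transaction id has already been used
        observed = list(tx["reads"].items())
        observed += [(k, ver) for k, (ver, _) in tx["writes"].items()]
        if any(version_of(k) != ver for k, ver in observed):
            continue  # stale read: drop the transaction
        for key, (ver, value) in tx["writes"].items():
            pending[key] = (ver + 1, value)
        seen_ids.add(tx["id"])
        valid.append(tx)
    return valid, pending


def commit_block(state, used_tx_ids, valid, pending):
    """Apply the validated updates and remember the committed transaction ids."""
    state.update(pending)
    used_tx_ids.update(tx["id"] for tx in valid)


def split_conflict_free(batch):
    """Group consecutive transactions whose read and write keys do not conflict."""
    groups, current, cur_reads, cur_writes = [], [], set(), set()
    for tx in batch:
        reads, writes = set(tx["reads"]), set(tx["writes"])
        conflict = (writes & (cur_reads | cur_writes)) or (reads & cur_writes)
        if conflict and current:
            groups.append(current)
            current, cur_reads, cur_writes = [], set(), set()
        current.append(tx)
        cur_reads |= reads
        cur_writes |= writes
    if current:
        groups.append(current)
    return groups
```

Keeping validation separate from `commit_block` mirrors the choreography: state changes only once the block of valid transactions is final.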

* Blue steps call ledger APIs