SQL Era
Started around 1970 with E. F. Codd's paper "A Relational Model of Data for Large Shared Data Banks"
o IBM -> System R -> DB2
o Microsoft -> Microsoft SQL Server
o Open Source -> MySQL and PostgreSQL
o Relational Software -> Oracle
NoSQL Era
The name "NoSQL" was first used by Carlo Strozzi in 1998 for his lightweight, open-source relational DB.
Eric Evans, an employee of Rackspace, reintroduced the term in 2009.
What is NoSQL?
o Distributed
o No joins
o No schema
o No attempt to provide ACID guarantees
Examples of NoSQL
Key-value DB: Like a big hash table. One key and one value.
o Dynamo (Amazon)
o Redis
o Voldemort (LinkedIn)
Column (family) based DB: Stores data by columns.
o BigTable (Google)
o Cassandra (Facebook)
Document based DB: Document based. Versioned.
o MongoDB
o CouchDB
Graph DB: Graph-based database. Think about relationships.
o Neo4j
o FlockDB (Twitter)
"Anything which needs agreement will eventually fail at scale." - Werner Vogels, CTO of Amazon
Other Systems
Dynamo: Design Considerations
o Trades off strong consistency for availability.
o Conflict resolution is executed during reads instead of writes, so the store is always writeable.
1. Incremental Scalability
Problem: Partition data over the set of servers to achieve incremental scalability.
Given a key, you want to figure out which machine stores the data.
The simple way is a modulo arithmetic approach: key mod N
What if you add a machine? Will key mod (N+1) still point to the machine that holds the data?
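The question above can be checked with a small experiment. The sketch below is only an illustration: the key names, the use of MD5 as the hash, and the helper names are assumptions, not part of Dynamo. It counts how many keys land on a different server when N grows from 4 to 5 under key mod N placement.

```python
import hashlib

def stable_hash(key: str) -> int:
    """Deterministic hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")

def server_for(key: str, n_servers: int) -> int:
    """Naive placement: hash(key) mod N."""
    return stable_hash(key) % n_servers

def fraction_moved(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys whose server changes when N goes from old_n to new_n."""
    moved = sum(1 for k in keys if server_for(k, old_n) != server_for(k, new_n))
    return moved / len(keys)

# Hypothetical key names, used only for the demonstration.
keys = [f"user:{i}" for i in range(10_000)]
print(f"moved when going from 4 to 5 servers: {fraction_moved(keys, 4, 5):.0%}")
```

A key keeps its server only when hash mod 4 equals hash mod 5, which holds for 4 of every 20 hash values, so roughly 80% of the keys move. Adding one machine therefore reshuffles most of the data, which is the opposite of incremental scalability.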
1. Incremental Scalability
What if a node (server) becomes unavailable? How do you distribute the load?
What if a node (server) becomes available again? How do you distribute the load?
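One common answer to these questions is consistent hashing, the partitioning approach described in the Dynamo paper. The sketch below is a simplified illustration under stated assumptions: the class name HashRing, the node names and the MD5 hash are hypothetical, and virtual nodes are left out. Nodes and keys are hashed onto the same ring, and a key belongs to the first node found clockwise from its position.

```python
import bisect
import hashlib

def ring_position(name: str) -> int:
    """Map a node name or key to a position on the ring."""
    return int.from_bytes(hashlib.md5(name.encode("utf-8")).digest(), "big")

class HashRing:
    """Minimal consistent-hash ring (no virtual nodes, illustration only)."""

    def __init__(self, nodes):
        # Sorted list of (position, node name) pairs.
        self._ring = sorted((ring_position(n), n) for n in nodes)

    def owner(self, key: str) -> str:
        """First node clockwise from the key's position, wrapping around."""
        positions = [p for p, _ in self._ring]
        i = bisect.bisect_right(positions, ring_position(key)) % len(self._ring)
        return self._ring[i][1]

    def without(self, node: str) -> "HashRing":
        """A new ring with one node removed (e.g. the node became unavailable)."""
        return HashRing([n for _, n in self._ring if n != node])

full = HashRing(["A", "B", "C", "D"])
reduced = full.without("C")
keys = [f"user:{i}" for i in range(1_000)]
moved = [k for k in keys if full.owner(k) != reduced.owner(k)]
# Only keys that were stored on C change owner; everything else stays put.
print(all(full.owner(k) == "C" for k in moved))
```

When a node leaves, only its keys move, and they move to its clockwise successor. When it returns, it takes back the same range. A plain ring like this can split load unevenly, which is why real deployments usually place each server at many ring positions.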
2. Availability
Problem: The system must remain available during server and network failures.
Solution: Replication
Each data item is replicated at N hosts.
The coordinator node for a key replicates it at the N-1 clockwise successor nodes on the ring.
Preference list: the list of nodes responsible for storing a particular key.
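The replication rule above can be sketched as a function that walks the ring clockwise from a key. This is a minimal illustration, not Dynamo's implementation: the function name preference_list, the node names and the MD5 hash are assumptions, and each physical node appears on the ring only once.

```python
import bisect
import hashlib

def ring_position(name: str) -> int:
    """Map a node name or key to a position on the ring."""
    return int.from_bytes(hashlib.md5(name.encode("utf-8")).digest(), "big")

def preference_list(key: str, nodes, n: int):
    """Coordinator plus its N-1 clockwise successors for the given key.

    Assumes node names are unique. If fewer than n nodes exist,
    all of them are returned.
    """
    ring = sorted((ring_position(node), node) for node in nodes)
    positions = [p for p, _ in ring]
    # The coordinator is the first node clockwise from the key's position.
    start = bisect.bisect_right(positions, ring_position(key)) % len(ring)
    count = min(n, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]

# Hypothetical cluster of five nodes with N = 3.
print(preference_list("cart:42", ["A", "B", "C", "D", "E"], 3))
```

The first entry is the coordinator and the remaining entries hold the replicas, so any node that knows the ring can compute where a key lives without asking a central directory.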
3. Eventual Consistency
The system provides eventual consistency: updates are propagated to all replicas asynchronously.
3. Eventual Consistency
A put() call may return to its caller before the update has been applied at all the replicas.
A get() call may return several versions of the same object.
Challenge: an object can have distinct version sub-histories, which the system will need to reconcile later.
Solution: vector clocks, which capture causality between different versions of the same object.
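A vector clock can be pictured as one counter per node that has coordinated a write to the object. The sketch below is a simplified illustration: the function names and the node names Sx, Sy and Sz are chosen for the example, and clocks are plain dictionaries. It shows how two versions are classified as ordered or concurrent.

```python
def increment(clock: dict, node: str) -> dict:
    """Return a copy of the clock with this node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated

def descends(a: dict, b: dict) -> bool:
    """True if version a has seen every update that version b has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def compare(a: dict, b: dict) -> str:
    """Classify the causal relationship between two versions."""
    a_ge_b, b_ge_a = descends(a, b), descends(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "a supersedes b"
    if b_ge_a:
        return "b supersedes a"
    return "concurrent"  # divergent sub-histories: must be reconciled

def merge(a: dict, b: dict) -> dict:
    """Clock for a reconciled version: element-wise maximum of both clocks."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}

d1 = increment({}, "Sx")             # first write, handled by Sx
d2 = increment(d1, "Sx")             # second write, again handled by Sx
d3 = increment(d2, "Sy")             # update of d2 handled by Sy
d4 = increment(d2, "Sz")             # different update of d2 handled by Sz
print(compare(d2, d1))               # d2 replaces d1
print(compare(d3, d4))               # neither has seen the other's update
d5 = increment(merge(d3, d4), "Sx")  # client reconciles, Sx coordinates the write
print(d5)
```

Versions that one clock supersedes can be dropped automatically. Concurrent versions are both returned by get(), and the reconciled result is written back with a merged clock.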
4. Handling Failures
Network and server failures must be handled so that read and write operations keep executing without being blocked or delayed.
Conclusion
o Stands out among other databases in terms of availability
o Can still work when multiple nodes fail
o Robust failure management
o No single point of failure
o N, R and W can be tuned, a distinguishing feature of Dynamo, so applications can meet their desired performance, durability and consistency SLAs
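The effect of tuning N, R and W can be illustrated with a brute-force check. The sketch assumes the usual meaning of the parameters: N is the number of replicas, R is the number of replicas that must answer a read, and W is the number that must acknowledge a write. The function name is hypothetical. It tests whether every possible read set shares at least one replica with every possible write set.

```python
from itertools import combinations

def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """True if any R replicas and any W replicas out of N share a member."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )

# Try a few illustrative settings for N = 3.
for r, w in [(2, 2), (1, 3), (3, 1), (1, 1)]:
    print(f"N=3 R={r} W={w} overlap={quorums_always_overlap(3, r, w)}")
```

Overlap is guaranteed exactly when R + W > N, so a read then reaches at least one replica that took the latest acknowledged write. Lower values of R or W give up that guarantee in exchange for lower latency and higher availability.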