Scalable Data Storage Getting You Down? To The Cloud!

Web 2.0 Expo SF 2011
Mike Male, Mike Pcnko, Dek Smh, Paul Lhrop

THE CAST
MIKE MALONE
INFRASTRUCTURE ENGINEER
@MJMALONE
MIKE PANCHENKO
@MIHASYA
DEREK SMITH
@DSMITTS
PAUL LATHROP
OPERATIONS
@GREYTALYN
SIMPLEGEO
We originally began as a mobile
gaming startup, but quickly
discovered that the location services
and infrastructure needed to support
our ideas didn’t exist. So we took
matters into our own hands and
began building it ourselves.
Mt Gaig Joe Stump

CSO & co-founder CTO & co-founder
THE STACK
[Architecture diagram. Labels: www, gnop, AWS RDS, AWS ELB, auth/proxy, HTTP, data centers, API servers, reads, writes, queues, record storage, index storage, geocoder, reverse geocoder, GeoIP, pushpin, Apache Cassandra]
BUT WHY NOT
POSTGIS?
DATABASES
WHAT ARE THEY GOOD FOR?
DATA STORAGE
Durably persist system state
CONSTRAINT MANAGEMENT
Enforce data integrity constraints
EFFICIENT ACCESS
Organize data and implement access methods for efficient
retrieval and summarization
DATA INDEPENDENCE
Data independence shields clients from the details
of the storage system and its data structures
LOGICAL DATA INDEPENDENCE

Clients that operate on a subset of the attributes in a data set should
not be affected later when new attributes are added
PHYSICAL DATA INDEPENDENCE
Clients that interact with a logical schema remain the same despite
physical data structure changes like
• File organization
• Compression
• Indexing strategy
TRANSACTIONAL RELATIONAL
DATABASE SYSTEMS
HIGH DEGREE OF DATA INDEPENDENCE
Logical structure: SQL Data Definition Language
Physical structure: Managed by the DBMS
OTHER GOODIES
They’re theoretically pure, well understood, and mostly
standardized behind a relatively clean abstraction
They provide robust contracts that make it easy to reason
about the structure and nature of the data they contain
They’re ubiquitous, battle hardened, robust, durable, etc.
ACID
These terms are not formally defined - they’re a
framework, not mathematical axioms
ATOMICITY
Either all of a transaction’s actions are visible to another transaction, or none are
CONSISTENCY
Application-specific constraints must be met for transaction to succeed
ISOLATION
Two concurrent transactions will not see one another’s updates while “in flight”
DURABILITY
The updates made to the database in a committed transaction will be visible to
future transactions
ACID HELPS
ACID is a sort-of-formal contract that makes it
easy to reason about your data, and that’s good
IT DOES SOMETHING HARD FOR YOU

With ACID, you’re guaranteed to maintain a persistent global
state as long as you’ve defined proper constraints and your
logical transactions result in a valid system state
CAP THEOREM
At PODC 2000 Eric Brewer told us there were three
desirable DB characteristics. But we can only have two.
CONSISTENCY
Every node in the system contains the same data (e.g., replicas are
never out of date)
AVAILABILITY
Every request to a non-failing node in the system returns a response
PARTITION TOLERANCE
System properties (consistency and/or availability) hold even when
the system is partitioned and messages between nodes are lost
CAP THEOREM IN 30 SECONDS
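[Diagram sequence with three actors: CLIENT, SERVER, REPLICA]
Normal case: the client sends a write to the server, the server replicates it to the replica, the replica acks, and the server accepts the write.
Failure case 1: the client sends a write, replication to the replica FAILS, and the write is denied. UNAVAILABLE!
Failure case 2: the client sends a write, replication to the replica FAILS, and the write is accepted anyway. INCONSISTENT!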
ACID HURTS
Certain aspects of ACID encourage (require?)
implementors to do “bad things”
Unfortunately, ANSI SQL’s definition of isolation...

relies in subtle ways on an assumption that a locking scheme is
used for concurrency control, as opposed to an optimistic or
multi-version concurrency scheme. This implies that the
proposed semantics are ill-defined.
Joseph M. Hellerstein and Michael Stonebraker
Anatomy of a Database System
BALANCE
IT’S A QUESTION OF VALUES
For traditional databases CAP consistency is the holy grail: it’s
maximized at the expense of availability and partition
tolerance
At scale, failures happen: when you’re doing something a
million times a second a one-in-a-million failure happens every
second
We’re witnessing the birth of a new religion...
• CAP consistency is a luxury that must be sacrificed at scale in order to
maintain availability when faced with failures
NETWORK INDEPENDENCE
A distributed system must also manage the
network - if it doesn’t, the client has to
CLIENT APPLICATIONS ARE LEFT TO HANDLE

Partitioning data across multiple machines
Working with loosely defined replication semantics
Detecting, routing around, and correcting network and
hardware failures
WHAT’S WRONG
WITH MYSQL..?
TRADITIONAL RELATIONAL DATABASES
They are from an era (er, one of the eras) when Big Iron was
the answer to scaling up
In general, the network was not considered part of the system
NEXT GENERATION DATABASES
Deconstructing and decoupling the beast
Trying to create a loosely coupled structured storage system
• Something that the current generation of database systems never
quite accomplished
UNDERSTANDING
CASSANDRA
APACHE CASSANDRA
A DISTRIBUTED STRUCTURED STORAGE SYSTEM
EMPHASIZING
Extremely large data sets
High transaction volumes
High value data that necessitates high availability
TO USE CASSANDRA EFFECTIVELY IT HELPS TO

UNDERSTAND WHAT’S GOING ON BEHIND THE SCENES
APACHE CASSANDRA
A DISTRIBUTED HASH TABLE WITH SOME TRICKS
Peer-to-peer architecture with no distinguished nodes, and
therefore no single points of failure
Gossip-based cluster management
Generic distributed data placement strategy maps data to nodes
• Pluggable partitioning
• Pluggable replication strategy
Quorum based consistency, tunable on a per-request basis
Keys map to sparse, multi-dimensional sorted maps
Append-only commit log and SSTables for efficient disk utilization
NETWORK MODEL
DYNAMO INSPIRED
CONSISTENT HASHING
Simple random partitioning mechanism for distribution
Low fuss online rebalancing when operational requirements
change
GOSSIP PROTOCOL
Simple decentralized cluster configuration and fault detection
Core protocol for determining cluster membership and
providing resilience to partial system failure
CONSISTENT HASHING
Improves on the shortcomings of modulo-based hashing
hash(alice) % 3
=> 23 % 3
=> 2
[Diagram: nodes 1, 2, 3]
CONSISTENT HASHING
With modulo hashing, a change in the number of
nodes reshuffles the entire data set
hash(alice) % 4
=> 23 % 4
=> 3
[Diagram: nodes 1, 2, 3, 4]
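As a rough illustration of the reshuffling problem, the sketch below counts how many hash values change owner when a cluster grows from 3 to 4 nodes under modulo placement. The helper name `moved_fraction` and the evenly spread integer hash values are assumptions made for the example, not something from the talk.

```python
def moved_fraction(hashes, old_nodes, new_nodes):
    """Fraction of hash values whose owner changes under `hash % node_count`."""
    hashes = list(hashes)
    moved = sum(1 for h in hashes if h % old_nodes != h % new_nodes)
    return moved / len(hashes)

# The slide's example: hash(alice) = 23 moves from slot 2 to slot 3.
alice_before = 23 % 3
alice_after = 23 % 4

# For evenly spread hash values, three quarters of the keys change owner.
fraction = moved_fraction(range(12000), 3, 4)
print(alice_before, alice_after, fraction)
```

Not literally every key moves, but most do, which is what makes modulo placement painful when the node count changes.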
CONSISTENT HASHING
Instead the range of the hash function is mapped to a
ring, with each node responsible for a segment
hash(alice) => 23
[Diagram: ring with node tokens 0, 42, 84]
CONSISTENT HASHING
When nodes are added (or removed) most of the data
mappings remain the same
hash(alice) => 23
[Diagram: ring with node tokens 0, 42, 64, 84]
CONSISTENT HASHING
Rebalancing the ring requires a minimal amount of
data shuffling
hash(alice) => 23
[Diagram: rebalanced ring with node tokens 0, 32, 64, 96]
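The ring lookup described on these slides can be sketched in a few lines. This is a minimal illustration, assuming each node owns the arc that ends at its token and that hash values fall in 0..127; the `Ring` class is hypothetical and not Cassandra's implementation.

```python
import bisect

class Ring:
    """Consistent-hash ring: each node owns the arc ending at its token."""

    def __init__(self, tokens):
        self.tokens = sorted(tokens)

    def owner(self, h):
        # First token at or after h, wrapping around to the smallest token.
        i = bisect.bisect_left(self.tokens, h)
        return self.tokens[i % len(self.tokens)]

before = Ring([0, 42, 84])
after = Ring([0, 42, 64, 84])   # one node added at token 64

print(before.owner(23))  # 42
print(after.owner(23))   # 42

# Only the hash values between the new token and its predecessor change owner.
moved = [h for h in range(128) if before.owner(h) != after.owner(h)]
print(len(moved), "of 128 hash values moved")
```

Adding the node at token 64 only affects the arc it takes over; every other mapping stays put.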
GOSSIP
DISSEMINATES CLUSTER MEMBERSHIP AND
RELATED CONTROL STATE
Gossip is initiated by an interval timer
At each gossip tick a node will
• Randomly select a live node in the cluster, sending it a gossip message
• Attempt to contact cluster members that were previously marked as
down
If the gossip message is unacknowledged for some period of
time (statistically adjusted based on the inter-arrival time of
previous messages) the remote node is marked as down
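The "statistically adjusted" timeout can be illustrated with a much simpler stand-in: suspect a peer once its silence exceeds the mean inter-arrival time plus a few standard deviations. The `HeartbeatMonitor` class and the `tolerance` parameter are assumptions for this sketch; it is not the failure detector Cassandra ships.

```python
import statistics

class HeartbeatMonitor:
    """Tracks heartbeat arrival times for one remote node."""

    def __init__(self, tolerance=3.0):
        self.tolerance = tolerance   # how many standard deviations of slack
        self.last_seen = None
        self.intervals = []

    def heartbeat(self, now):
        if self.last_seen is not None:
            self.intervals.append(now - self.last_seen)
        self.last_seen = now

    def is_down(self, now):
        if len(self.intervals) < 2:
            return False             # not enough history to judge
        mean = statistics.mean(self.intervals)
        spread = statistics.stdev(self.intervals)
        return (now - self.last_seen) > mean + self.tolerance * spread

monitor = HeartbeatMonitor()
for t in [0.0, 1.0, 2.1, 3.0, 4.1]:
    monitor.heartbeat(t)

up_at_5 = not monitor.is_down(5.0)   # 0.9s of silence: still fine
down_at_6 = monitor.is_down(6.0)     # 1.9s of silence: marked down
```

A peer with jittery heartbeats gets a longer grace period than one that has been perfectly regular, which is the point of adapting to observed inter-arrival times.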
REPLICATION
REPLICATION FACTOR Determines how many
copies of each piece of data are created in the
system
RF=3
hash(alice) => 23
[Diagram: ring with node tokens 0, 32, 64, 96; the key is stored on three nodes]
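One simple way to pick the RF replicas is to start at the key's owner and keep walking clockwise. The sketch below assumes that placement rule (the replication strategy is pluggable, so others are possible) and uses a hypothetical `replicas` helper.

```python
import bisect

def replicas(tokens, h, rf):
    """Tokens of the `rf` nodes that hold hash `h`, walking clockwise from the owner."""
    ring = sorted(tokens)
    start = bisect.bisect_left(ring, h)
    return [ring[(start + i) % len(ring)] for i in range(min(rf, len(ring)))]

alice_replicas = replicas([0, 32, 64, 96], 23, 3)
print(alice_replicas)  # [32, 64, 96]
```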
CONSISTENCY MODEL
DYNAMO INSPIRED
QUORUM-BASED CONSISTENCY
W=2 (write)
R=2 (read)
W+R>N
Consistent
hash(alice) => 23
[Diagram: ring with node tokens 0, 32, 64, 96]
TUNABLE CONSISTENCY
WRITES
ZERO DON’T BOTHER WAITING FOR A RESPONSE
ANY WAIT FOR SOME NODE (NOT NECESSARILY A REPLICA) TO RESPOND
ONE WAIT FOR ONE REPLICA TO RESPOND
QUORUM WAIT FOR A QUORUM (N/2+1) TO RESPOND
ALL WAIT FOR ALL N REPLICAS TO RESPOND

TUNABLE CONSISTENCY
READS
ONE WAIT FOR ONE REPLICA TO RESPOND
QUORUM WAIT FOR A QUORUM (N/2+1) TO RESPOND
ALL WAIT FOR ALL N REPLICAS TO RESPOND
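The levels above boil down to how many responses a coordinator waits for. A minimal sketch, assuming a hypothetical `required_acks` helper and integer division for the quorum size:

```python
def required_acks(level, n):
    """Responses to wait for at a consistency level, given n replicas."""
    levels = {
        "ZERO": 0,              # writes only: fire and forget
        "ANY": 1,               # writes only: any node, not necessarily a replica
        "ONE": 1,
        "QUORUM": n // 2 + 1,
        "ALL": n,
    }
    return levels[level]

print(required_acks("QUORUM", 3))  # 2
print(required_acks("QUORUM", 4))  # 3
```

With N=3, QUORUM writes and QUORUM reads give W=2 and R=2, which satisfies W+R>N.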

CONSISTENCY MODEL
DYNAMO INSPIRED
READ REPAIR
HINTED HANDOFF
ANTI-ENTROPY
W=2
[Diagram: a write in which one replica fails]
CONSISTENCY MODEL
DYNAMO INSPIRED
READ REPAIR Asynchronously checks replicas during
reads and repairs any inconsistencies
HINTED HANDOFF
ANTI-ENTROPY
W=2
[Diagram: write, then read + fix]
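Read repair can be sketched as: read from the replicas, keep the value with the newest timestamp, and push it back to any replica that was stale. The sketch below does the repair synchronously for brevity (the slide describes it as asynchronous), and modelling each replica as a plain dict is an assumption.

```python
def read_with_repair(replicas, key):
    """replicas: list of dicts mapping key -> (value, timestamp)."""
    answers = [r[key] for r in replicas if key in r]
    if not answers:
        return None
    newest = max(answers, key=lambda vt: vt[1])
    for r in replicas:
        if r.get(key) != newest:
            r[key] = newest          # bring the stale or missing replica up to date
    return newest[0]

a = {"alice/city": ("St. Louis", 2)}
b = {"alice/city": ("Boulder", 1)}   # stale copy (illustrative data)
c = {}                               # missed the write entirely
value = read_with_repair([a, b, c], "alice/city")
```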
CONSISTENCY MODEL
DYNAMO INSPIRED
READ REPAIR
HINTED HANDOFF Sends failed writes to another node
with a hint to re-replicate when the failed node returns
ANTI-ENTROPY
wre
plica
CONSISTENCY MODEL
DYNAMO INSPIRED
READ REPAIR
HINTED HANDOFF Sends failed writes to another node
with a hint to re-replicate when the failed node returns
ANTI-ENTROPY
[Diagram: check, repair]
CONSISTENCY MODEL
DYNAMO INSPIRED
READ REPAIR
HINTED HANDOFF
ANTI-ENTROPY Manual repair process where nodes
generate Merkle trees (hash trees) to detect and
repair data inconsistencies
[Diagram: repair]
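The Merkle tree comparison behind anti-entropy can be sketched as follows: each replica hashes fixed buckets of its data into a tree, and two replicas compare trees from the root down, descending only into subtrees whose hashes differ. The bucket layout, SHA-256, and the function names are assumptions for this illustration.

```python
import hashlib

def _h(data):
    return hashlib.sha256(data).digest()

def build_tree(buckets):
    """buckets: list of bytes (length a power of two). Returns levels, leaves first."""
    level = [_h(b) for b in buckets]
    levels = [level]
    while len(level) > 1:
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def differing_buckets(a, b):
    """Walk two trees from the root, descending only where hashes differ."""
    top = len(a) - 1
    suspects = [0] if a[top][0] != b[top][0] else []
    for depth in range(top - 1, -1, -1):
        suspects = [child for i in suspects for child in (2 * i, 2 * i + 1)
                    if a[depth][child] != b[depth][child]]
    return suspects

local = build_tree([b"a", b"b", b"c", b"d"])
remote = build_tree([b"a", b"b", b"X", b"d"])
out_of_sync = differing_buckets(local, remote)
```

Only the buckets that come back from `differing_buckets` need to be streamed between replicas; identical subtrees are skipped after a single hash comparison.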
DATA MODEL
BIGTABLE INSPIRED
SPARSE MATRIX it’s a hash-map (associative array):
a simple, versatile data structure
SCHEMA-FREE data model, introduces new freedom
and new responsibilities
COLUMN FAMILIES blend row-oriented and column-
oriented structure, providing a high level mechanism
for clients to manage on-disk and inter-node data
locality
DATA MODEL
TERMINOLOGY
KEYSPACE A named collection of column families
(similar to a “database” in MySQL); you only need one and
you can mostly ignore it
COLUMN FAMILY A named mapping of keys to rows
ROW A named sorted map of columns or supercolumns
COLUMN A <name, value, timestamp> triple
SUPERCOLUMN A named collection of columns, for
people who want to get fancy
DATA MODEL
{
  “users”: {                                  <- column family
    “alice”: {                                <- key (identifies the row)
      “city”: [“St. Louis”, 1287040737182],   <- columns (name, value, timestamp)
      “name”: [“Alice”, 1287080340940],
    },
    ...
  },
  “locations”: {
  },
  ...
}
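Because every column carries a timestamp, conflicting writes to the same column can be settled by keeping the newest one. The sketch below models a row as a dict of name -> (value, timestamp) and uses a hypothetical `write_column` helper; letting the later arrival win on a timestamp tie is an assumption made for the example.

```python
def write_column(row, name, value, timestamp):
    """Apply a write to a row, keeping whichever value has the newest timestamp."""
    current = row.get(name)
    if current is None or timestamp >= current[1]:
        row[name] = (value, timestamp)

users = {"alice": {}}
write_column(users["alice"], "city", "St. Louis", 1287040737182)
write_column(users["alice"], "name", "Alice", 1287080340940)

# A delayed write with an older timestamp does not clobber the newer value.
write_column(users["alice"], "city", "Somewhere Else", 1287040000000)
print(users["alice"]["city"])  # ('St. Louis', 1287040737182)
```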
IT’S A DISTRIBUTED HASH TABLE
WITH A TWIST...
COLUMNS IN A ROW ARE STORED TOGETHER ON ONE NODE,
IDENTIFIED BY <keyspace, key>
{
  “users”: {                                  <- column family
    “alice”: {                                <- key
      “city”: [“St. Louis”, 1287040737182],   <- columns
      “name”: [“Alice”, 1287080340940],
    },
    ...
  },
}
...
[Diagram: ring with the keys alice and bob placed at hashed positions; labels s3b, 3e8]
HASH TABLE
SUPPORTED QUERIES
EXACT MATCH
RANGE
PROXIMITY
ANYTHING THAT’S NOT
EXACT MATCH
COLUMNS
SUPPORTED QUERIES
EXACT MATCH
RANGE
PROXIMITY
{
  “users”: {
    “alice”: {
      “city”: [“St. Louis”, 1287040737182],
      “friend-1”: [“Bob”, 1287080340940],     <- friends
      “friend-2”: [“Joe”, 1287080340940],
      “friend-3”: [“Meg”, 1287080340940],
      “name”: [“Alice”, 1287080340940],
    },
    ...
  }
}
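Since the columns within a row are kept sorted by name, a range query over column names is a pair of binary searches. The sketch below pulls out the "friend-" columns from the row above; `slice_columns` is a hypothetical helper, not a client library call.

```python
import bisect

def slice_columns(row, start, finish):
    """Columns whose names fall in [start, finish), in name order."""
    names = sorted(row)
    lo = bisect.bisect_left(names, start)
    hi = bisect.bisect_left(names, finish)
    return [(name, row[name]) for name in names[lo:hi]]

alice = {
    "city": ("St. Louis", 1287040737182),
    "friend-1": ("Bob", 1287080340940),
    "friend-2": ("Joe", 1287080340940),
    "friend-3": ("Meg", 1287080340940),
    "name": ("Alice", 1287080340940),
}

# "." sorts immediately after "-", so this range covers every "friend-" column.
friends = slice_columns(alice, "friend-", "friend.")
friend_names = [value for _, (value, _) in friends]
print(friend_names)  # ['Bob', 'Joe', 'Meg']
```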
LOG-STRUCTURED MERGE
MEMTABLES are in-memory data structures that
contain newly written data
COMMIT LOGS are append-only files where new
data is durably written
SSTABLES are serialized memtables, persisted to
disk
COMPACTION periodically merges multiple
SSTables to improve system performance
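The write path above can be sketched as a toy log-structured store: append to a commit log, update an in-memory table, flush that table to an immutable sorted table when it fills up, and occasionally merge the flushed tables. `TinyLSM` is a teaching sketch with in-memory lists standing in for files; every name in it is an assumption.

```python
class TinyLSM:
    def __init__(self, memtable_limit=2):
        self.commit_log = []      # stand-in for an append-only file
        self.memtable = {}        # newly written data
        self.sstables = []        # flushed tables, oldest first, each sorted by key
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.commit_log.append((key, value))   # log first, for durability
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log = []      # flushed data no longer needs the log

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest table wins
            for k, v in table:
                if k == key:
                    return v
        return None

    def compact(self):
        merged = {}
        for table in self.sstables:            # later tables overwrite earlier ones
            merged.update(table)
        self.sstables = [sorted(merged.items())]

db = TinyLSM()
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
    db.put(key, value)
tables_before = len(db.sstables)
db.compact()
```

Writes never modify a flushed table in place; reads pay for that by checking several tables, which is the cost compaction keeps in check.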
CASSANDRA
CONCEPTUAL SUMMARY...
IT’S A DISTRIBUTED HASH TABLE
Gossip based peer-to-peer “ring” with no distinguished nodes and no
single point of failure
Consistent hashing distributes workload, and a simple replication
strategy provides fault tolerance and improved throughput
WITH TUNABLE CONSISTENCY
Based on quorum protocol to ensure consistency
And simple repair mechanisms to stay available during partial system
failures
AND A SIMPLE, SCHEMA-FREE DATA MODEL
It’s just a key-value store
Whose values are multi-dimensional sorted maps
ADVANCED CASSANDRA
- A case study -
SPATIAL DATA IN A DHT
A FIRST PASS
THE ORDER PRESERVING PARTITIONER
CASSANDRA’S PARTITIONING
STRATEGY IS PLUGGABLE
Partitioner maps keys to nodes
Random partitioner destroys locality by hashing
Order preserving partitioner retains locality, storing
keys in natural lexicographical order around ring
[Diagram: ring labeled a, h, m, u, z with the keys alice, bob, and sam placed in order]
ORDER PRESERVING PARTITIONER
EXACT MATCH
RANGE
On a single dimension
? PROXIMITY
SPATIAL DATA
IT’S INHERENTLY MULTIDIMENSIONAL
[Diagram: a point x at (2, 2) on a grid with axes labeled 1 and 2]
DIMENSIONALITY REDUCTION
WITH SPACE-FILLING CURVES
[Diagram: a Z-curve visiting quadrants 1, 2, 3, 4]
Z-CURVE
SECOND ITERATION
Z-VALUE: 14 (for the point marked x)
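A Z-value is computed by interleaving the bits of the two coordinates. The sketch below assumes non-negative integer grid coordinates with x occupying the even bit positions; the slide's own axis and numbering convention is not recoverable from the text, so this is illustrative only.

```python
def z_value(x, y, bits=16):
    """Interleave the bits of x and y (x in even positions, y in odd positions)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

# The four cells of a 2x2 grid are visited in a "Z" shape.
first_iteration = [z_value(x, y) for y in (0, 1) for x in (0, 1)]
print(first_iteration)  # [0, 1, 2, 3]
```

Cells that share a long Z-value prefix are close together in space, which is what makes a sorted one-dimensional key usable as a spatial index.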
GEOHASH
SIMPLE TO COMPUTE
Interleave the bits of decimal coordinates
(equivalent to binary encoding of pre-order
traversal!)
Base32 encode the result
AWESOME CHARACTERISTICS
Arbitrary precision
Human readable
Sorts lexicographically
01101 => e
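The two steps above can be sketched directly: repeatedly halve the longitude and latitude ranges, emitting one bit per halving (longitude first), and Base32-encode each group of five bits. The function below follows the public geohash convention; the function and variable names are this sketch's own.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat, lon, precision=8):
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    chars, value, count, use_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1   # upper half: emit a 1 bit
            rng[0] = mid
        else:
            value <<= 1                # lower half: emit a 0 bit
            rng[1] = mid
        use_lon = not use_lon          # alternate longitude and latitude bits
        count += 1
        if count == 5:                 # five bits make one Base32 character
            chars.append(BASE32[value])
            value, count = 0, 0
    return "".join(chars)

print(BASE32[0b01101])                     # e
print(geohash(38.6554420, -90.2992910))    # 9yzgcjn0
```

Truncating a geohash gives a coarser cell that contains the original point, so a prefix scan over sorted keys returns everything in that cell.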
DATA MODEL
{
  “record-index”: {
    “9yzgcjn0:moonrise hotel”: {              <- key: <geohash>:<id>
      “”: [“”, 1287040737182],
    },
    ...
  },
  “records”: {
    “moonrise hotel”: {
      “latitude”: [“38.6554420”, 1287040737182],
      “longitude”: [“-90.2992910”, 1287040737182],
      ...
    }
  }
}
BOUNDING BOX
E.G., MULTIDIMENSIONAL RANGE
Gie ﬆuff  bg box! Gie 2  3
1 2
3 4
Gie 4  5
SPATIAL DATA
STILL MULTIDIMENSIONAL
DIMENSIONALITY REDUCTION ISN’T PERFECT
Clients must
• Pre-process to compose multiple queries
• Post-process to filter and merge results
Degenerate cases can be bad, particularly for nearest-neighbor
queries
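The pre-process and post-process steps can be sketched with the crudest possible plan: scan the single Z-range between the box's two corners, then filter out the candidates that fall outside the box. This relies on the Z-value growing with each coordinate. A real client would split the box into several tighter ranges; the index layout and helper names here are assumptions.

```python
import bisect

def z_value(x, y, bits=16):
    """Interleave the bits of x and y (x in even positions, y in odd positions)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

def bounding_box_query(index, x_min, y_min, x_max, y_max):
    """index: list of (z, (x, y)) pairs sorted by z."""
    zs = [z for z, _ in index]
    lo = bisect.bisect_left(zs, z_value(x_min, y_min))    # pre-process: one range
    hi = bisect.bisect_right(zs, z_value(x_max, y_max))
    candidates = [point for _, point in index[lo:hi]]
    hits = [p for p in candidates                          # post-process: filter
            if x_min <= p[0] <= x_max and y_min <= p[1] <= y_max]
    return hits, len(candidates)

points = [(1, 1), (2, 2), (3, 1), (0, 3), (3, 3), (5, 0)]
index = sorted((z_value(x, y), (x, y)) for x, y in points)
hits, scanned = bounding_box_query(index, 1, 1, 3, 3)
```

Here five candidates are scanned to return four hits; in degenerate cases the covering range contains far more false positives, which is the imperfection the slide refers to.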
Z-CURVE LOCALITY
[Diagram sequence: points marked x on the Z-curve grid, then an x surrounded by neighboring points marked o]
THE WORLD
IS NOT BALANCED
Credit: C. Mayhew & R. Simmon (NASA/GSFC), NOAA/NGDC, DMSP Digital Archive

TOO MUCH LOCALITY
[Diagram, built up over four slides: quadrants 1 to 4 with SAN FRANCISCO concentrated in one of them. Speech bubbles from the nodes: “I’m sad.” “I’m bored.” “Me too.” “Let’s play xbox.”]

A TURNING POINT
HELLO, DRAWING BOARD
SURVEY OF DISTRIBUTED P2P INDEXING
An overlay-dependent index works directly with nodes of the
peer-to-peer network, defining its own overlay
An over-DHT index overlays a more sophisticated data
structure on top of a peer-to-peer distributed hash table
ANOTHER LOOK AT POSTGIS
MIGHT WORK, BUT
The relational transaction management system (which we’d
want to change) and access methods (which we’d have to
change) are tightly coupled (necessarily?) to other parts of
the system
Could work at a higher level and treat PostGIS as a black box
• Now we’re back to implementing a peer-to-peer network with failure
recovery, fault detection, etc... and Cassandra already had all that.
• It’s probably clear by now that I think these problems are more
difficult than actually storing structured data on disk
LET’S TAKE A STEP BACK
EARTH
EARTH, TREE, RING
DATA MODEL
{
  “record-index”: {
    “layer-name:37.875, -90:40.25, -101.25”: {
      “38.6554420, -90.2992910:moonrise hotel”: [“”, 1287040737182],
      ...
    },
  },
  “record-index-meta”: {
    “layer-name:37.875, -90:40.25, -101.25”: {
      “split”: [“false”, 1287040737182],
    },
    “layer-name:37.875, -90:42.265, -101.25”: {
      “split”: [“true”, 1287040737182],
      “child-left”: [“layer-name:37.875, -90:40.25, -101.25”, 1287040737182],
      “child-right”: [“layer-name:40.25, -90:42.265, -101.25”, 1287040737182],
    }
  }
}
SPLITTING
IT’S PRETTY MUCH JUST A CONCURRENT TREE
Splitting shouldn’t lock the tree for reads or writes and failures
shouldn’t cause corruption
• Splits are optimistic, idempotent, and fail-forward
• Instead of locking, writes are replicated to the splitting node and the
relevant child[ren] while a split operation is taking place
• Cleanup occurs after the split is completed and all interested nodes are
aware that the split has occurred
• Cassandra writes are idempotent, so splits are too - if a split fails, it is
simply retried
Split size: a tunable knob for balancing locality and distributedness
The other hard problem with concurrent trees is rebalancing - we
just don’t do it! (more on this later)
THE ROOT IS HOT
MIGHT BE A DEAL BREAKER
For a tree to be useful, it has to be traversed
• Typically, tree traversal starts at the root
• Root is the only discoverable node in our tree
Traversing through the root meant reading the root for every
read or write below it - unacceptable
• Lots of academic solutions - most promising was a skip graph, but
that required O(n log(n)) data - also unacceptable
• Minimum tree depth was proposed, but then you just get multiple hot
spots at your minimum depth nodes
BACK TO THE BOOKS
LOTS OF ACADEMIC WORK ON THIS TOPIC
But academia is obsessed with provable, deterministic,
asymptotically optimal algorithms
And we only need something that is probably fast enough
most of the time (for some value of “probably” and “most of
the time”)
• And if the probably good enough algorithm is, you know... tractable...
one might even consider it qualitatively better!
- THE ROOT -
A HOT SPOT AND A SPOF
[Diagram: “We have” vs. “We want”]
THINKING HOLISTICALLY
WE OBSERVED THAT
Once a node in the tree exists, it doesn’t go away
Node state may change, but that state only really matters
locally - thinking a node is a leaf when it really has children is
not fatal
SO... WHAT IF WE JUST CACHED NODES THAT
WERE OBSERVED IN THE SYSTEM!?
CACHE IT
STUPID SIMPLE SOLUTION
Keep an LRU cache of nodes that have been traversed
Start traversals at the most selective relevant node
If that node doesn’t satisfy you, traverse up the tree
Along with your result set, return a list of nodes that were
traversed so the caller can add them to its cache
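The caching scheme above can be sketched with an LRU map of node ids. The sketch assumes each tree node is identified by its path from the root as a string of 0s and 1s (an empty string for the root), so the "most selective relevant node" for a target is its longest cached prefix. `NodeCache` and the path encoding are hypothetical, not the authors' code.

```python
from collections import OrderedDict

class NodeCache:
    """LRU cache of tree node ids observed during earlier traversals."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.seen = OrderedDict()

    def add(self, node_ids):
        """Remember the nodes a traversal reported back to the caller."""
        for node_id in node_ids:
            self.seen[node_id] = True
            self.seen.move_to_end(node_id)
        while len(self.seen) > self.capacity:
            self.seen.popitem(last=False)       # evict the least recently used

    def start_for(self, target_path):
        """Deepest cached node on the way to the target, else the root."""
        for depth in range(len(target_path), 0, -1):
            prefix = target_path[:depth]
            if prefix in self.seen:
                self.seen.move_to_end(prefix)   # mark as recently used
                return prefix
        return ""

cache = NodeCache(capacity=2)
cold_start = cache.start_for("0110")   # nothing cached: start at the root
cache.add(["", "0", "01"])             # nodes returned by a previous traversal
warm_start = cache.start_for("0110")   # skip straight to the cached node "01"
```

If the starting node turns out not to cover the query (for example, a nearest-neighbor search that spills into a sibling), the caller walks up toward the root from there, as the slide describes.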
TRAVERSAL
NEAREST NEIGHBOR
[Diagram sequence: a query point x with nearby points marked o]
KEY CHARACTERISTICS
PERFORMANCE
Best case on the happy path (everything cached) has zero
read overhead
Worst case, with nothing cached, O(log(n)) read overhead
RE-BALANCING SEEMS UNNECESSARY!
Makes worst case more worser, but so far so good
DISTRIBUTED TREE
SUPPORTED QUERIES
EXACT MATCH
RANGE
PROXIMITY
SOMETHING ELSE I HAVEN’T
EVEN HEARD OF
DISTRIBUTED TREE
SUPPORTED QUERIES
EXACT MATCH
RANGE
PROXIMITY
SOMETHING ELSE I HAVEN’T
EVEN HEARD OF
IN MULTIPLE DIMENSIONS!
LIFE OF A REQUEST
THE BIRDS ‘N THE BEES
[Diagram, built up step by step: an ELB in front of two identical stacks, each with gate, service, cass, worker pool, and index]
ELB: load balancing; AWS service
gate: authentication; forwarding
service: business logic - basic validation
cass: record storage
worker pool: business logic - storage/indexing
index: awesome sauce for querying
ELB
• Traffic management
• Control which AZs are serving traffic
• Upgrades without downtime
• Able to remove an AZ, upgrade, test, replace
• API-level failure scenarios
• Periodically runs healthchecks on nodes
• Removes nodes that fail
GATE
• Basic auth
• HTTP proxy to specific services
• Services are independent of one another
• Auth is decoupled from business logic
• First line of defense
• Very fast, very cheap
• Keeps services from being overwhelmed
by poorly authenticated requests
RABBITMQ
• Decouple accepting writes from
performing the heavy lifting
• Don’t block client while we write to db/index
• Flexibility in the event of degradation
further down the stack
• Queues can hold a lot, and can keep
accepting writes throughout an incident
• Heterogeneous consumers - pass the same
message through multiple code paths easily
AWS
EC2
• Security groups
• Static IPs
• Choose your data center
• Choose your instance type
• On-demand vs. Reserved
ELASTIC BLOCK STORE
• Storage devices that can be anywhere from 1GB to 1TB
• Snapshotting
• Automatic replication
• Mobility
ELASTIC LOAD BALANCING
• Distribute incoming traffic
• Automatic scaling
• Health checks
SIMPLE STORAGE SERVICE
• Files can be up to 5 TB each
• Unlimited number of files
• Individual read/write access credentials
RELATIONAL DATABASE SERVICE
• MySQL in the cloud
• Manages replication
• Specify instance types based on need
• Snapshots
CONFIGURATION
MANAGEMENT
WHY IS THIS NECESSARY?
• Easier DevOps integration
• Reusable modules
• Infrastructure as code
• One word: WINNING
POSSIBLE OPTIONS
SAMPLE MANIFEST
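# /root/learning-manifests/apache2.pp

# Install the Apache package.
package { 'apache2':
  ensure => present,
}

# Manage the main config file and notify the service when it changes.
file { '/etc/apache2/apache2.conf':
  ensure => file,
  mode   => '0600',
  notify => Service['apache2'],
  source => '/root/learning-manifests/apache2.conf',
}

# Keep the service running, enabled at boot, and restarted on config changes.
service { 'apache2':
  ensure    => running,
  enable    => true,
  subscribe => File['/etc/apache2/apache2.conf'],
}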
AN EXAMPLE
TERMINAL
CONTINUOUS
INTEGRATION
GET ‘ER DONE
• Revision control
• Automate build process
• Automate testing process
• Automate deployment
Local code changes should result in production
deployments.
DON’T FORGET TO DEBIANIZE
• All codebases must be debianized
• Open source project not debianized yet? Fork the repo and
do it yourself!
• Take the time to teach others
• Debian directories can easily be reused after a simple search
and replace
REPOMAN
HTTPS://GITHUB.COM/SYNACK/REPOMAN.GIT
repoman upload myrepo sg-example.tar.gz
repoman promote development/sg-example staging

HUDSON
MAINTAINING MULTIPLE
ENVIRONMENTS
• Run unit tests in a development environment
• Promote to staging
• Run system tests in a staging environment
• Run consumption tests in a staging environment
• Promote to production
Congratz, you have now just automated yourself
out of a job.
THE MONEY MAKER
Github Plugin
TYING IT ALL TOGETHER
TERMINAL
FLUME
Flume is a distributed, reliable and available
service for efficiently collecting, aggregating and
moving large amounts of log data.
syslog on steroids
DATA-FLOWS
AGENTS
Physical Host   Logical Node     Source                                    Sink
                twitter_stream   twitter(“dsmitts”, “mypassword”, “url”)   agentSink(35853)
i-192df98       tail             tail(“/var/log/nginx/access.log”)         agentSink(35853)
                hdfs_writer      collectorSource(35853)                    collectorSink("hdfs://namenode.sg.com/bogus/", "logs")
RELIABILITY
END-TO-END
STORE ON FAILURE
BEST EFFORT
GETTIN’ JIGGY WIT IT
Custom Decorators
HOW DO WE USE IT?
PERSONAL EXPERIENCES
• #flume
• Automation was gnarly
• It’s never a good day when Eclipse is involved
• Resource hog (at first)
We’re building kick-ass tools for devops to index,
transport, and consume data connected to a location
MARCH 2011
