Professional Documents
Culture Documents
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber Google, Inc. OSDI 2006
Google Scale
Lots of data
Copies
of the web, satellite data, user data, email and USENET, Subversion backing store
afford it if there was one Might not have made appropriate design choices
Firm believers in the End-to-End argument 450,000 machines (NYTimes estimate, June 14th 2006
2
Building Blocks
Scheduler (Google WorkQueue) Google Filesystem Chubby Lock service Two other pieces helpful but not required
Sawzall
MapReduce
Chubby
{lock/file/name} service Coarse-grained locks, can store small amount of data in a lock 5 replicas, need a majority vote to be active Also an OSDI 06 Paper
<Row, Column, Timestamp> triple for key - lookup, insert, and delete API Arbitrary columns on a row-by-row basis
Column family:qualifier. Family is heavyweight, qualifier lightweight Column-oriented physical store- rows are sparse! No table-wide integrity constraints No multirow transactions
SSTable
Immutable, sorted file of key-value pairs Chunks of data plus an index
Index
64K block
64K block
64K block
Index
Tablet
Contains some range of rows of the table Built out of multiple SSTables
Start:aardvark
64K block 64K block
End:apple
SSTable 64K block 64K block 64K block SSTable
Index
Index
Table
Multiple tablets make up the table SSTables can be shared Tablets do not overlap, SSTables can overlap
Tablet Tablet apple
aardvark
apple_two_E
boat
SSTable SSTable
SSTable SSTable
Finding a tablet
Stores: Key: table id + end row, Data: location Cached at clients, which may detect data to be incorrect
Servers
Tablet servers manage tablets, multiple tablets per server. Each tablet is 100-200 MB
Each
tablet lives at only one server Tablet server splits tablets that get too big
11
Masters Tasks
Chubby gives lease on lock, must be renewed periodically Server loses lock if it gets disconnected
Master
Finds
live servers (scan chubby directory) Communicates with servers to find assigned tablets Scans metadata table to find all tablets
Keeps track of unassigned tablets, assigns them Metadata root from chubby, other metadata tablets assigned before scanning.
13
Metadata Management
Master handles
table
Tablet servers directly update metadata on tablet split, then notify master
lost
14
Editing a table
May contain deletion entries to handle updates Group commit on log: collect multiple updates before log flush
tablet log
Delete
Insert Delete Insert
SSTable SSTable
GFS
15
Compactions
Merging compaction
Reduce
Major compaction
Merging
compaction that results in only one SSTable No deletion records, only live data
16
Locality Groups
Avoid mingling data, e.g. page contents and page metadata Can keep some groups all in memory
bitmap on keyvalue hash, used to overestimate which records exist avoid searching SSTable if bit not set Major compaction (with concurrent updates) Minor compaction (to catch up with updates) without any concurrent updates Load on new server without requiring any recovery action
Tablet movement
17
Log Handling
tablet movement when server fails, tablets divided among multiple servers
can cause heavy scan load by each such server optimization to avoid multiple separate scans: sort log by (table, rowname, LSN), so logs for a tablet are clustered, then distribute
two separate logs, one active at a time can have duplicates between these two
18
Immutability
caching, sharing across GFS etc no need for concurrency control SSTables of a tablet recorded in METADATA table Garbage collection of SSTables done by master On tablet split, split tables can start off quickly on shared SSTables, splitting them lazily
Microbenchmarks
20
21
Application at Google
22
Lessons learned
Interesting point- only implement some of the requirements, since the last is probably not needed Many types of failure possible Big systems need proper systems-level monitoring Value simple designs
23