
The Google File System

(GFS)
Introduction
• Special Assumptions
• Consistency Model
• System Design
• System Interactions
• Fault Tolerance
• (Results)
Assumptions
• Component failures are the norm, not the exception
• Files are BIG
• Large streaming reads / small random reads
• Large sequential writes (appends)
• Many clients appending to the same file concurrently
• High sustained bandwidth
Consistency Model
• Consistent: All clients see the same data, whichever replica they read.
• Defined: Consistent, and you see exactly what your mutation wrote.
• Undefined: Consistent, but the data may be mingled fragments from concurrent mutations.
How do Apps Deal?
• After failures, some regions of files may be inconsistent
• Must do some checking of data:
– Application level checksums
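One common form of application-level checking is self-validating records: each record carries its own length and checksum, so a reader can detect padding or corrupted regions and skip them. The record format below is illustrative, not from GFS:

```python
import zlib

def pack_record(payload: bytes) -> bytes:
    # Prefix each record with its length and a CRC32 of the payload,
    # so a reader can verify it independently of the file system.
    header = len(payload).to_bytes(4, "big") + zlib.crc32(payload).to_bytes(4, "big")
    return header + payload

def unpack_record(buf: bytes, offset: int):
    # Returns (payload, next_offset), or None if the region at `offset`
    # is not a valid record (padding, garbage, or corruption).
    length = int.from_bytes(buf[offset:offset + 4], "big")
    crc = int.from_bytes(buf[offset + 4:offset + 8], "big")
    payload = buf[offset + 8:offset + 8 + length]
    if len(payload) != length or zlib.crc32(payload) != crc:
        return None
    return payload, offset + 8 + length
```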
Single Master Architecture
• Good:
– Has global knowledge
• Can make intelligent placement/replication
decisions.
• Bad:
– Becomes a bottleneck
• Must limit its involvement in reads/writes
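An intelligent placement decision can be sketched as: prefer lightly loaded chunkservers, and never put two replicas on the same rack, so one rack failure cannot lose all copies. This is a toy heuristic, not the paper's full policy:

```python
def place_replicas(servers, n=3):
    # servers: list of (server_id, rack_id, disk_utilization).
    # Greedily pick the least-utilized servers, one rack each.
    chosen, racks = [], set()
    for sid, rack, util in sorted(servers, key=lambda s: s[2]):
        if rack not in racks:
            chosen.append(sid)
            racks.add(rack)
        if len(chosen) == n:
            break
    return chosen
```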
Architecture
• Master
– Keeps track of everything

• Chunk Servers
– Where the data lives
– Each chunk is 64 MB
• Block sizes on typical file systems are ~8 KB
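With a fixed chunk size, a client can translate a byte offset into a chunk index locally, and only ask the master where that chunk lives. A minimal sketch using the 64 MB size above:

```python
CHUNK_SIZE = 64 * 1024 * 1024  # fixed 64 MB chunks

def chunk_coords(offset: int):
    # Map a file byte offset to (chunk index, offset within that chunk).
    return offset // CHUNK_SIZE, offset % CHUNK_SIZE
```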
Let The Master Rule
• Namespace Locking
• Replica placement
• Creation
• (Garbage Collection)
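Namespace locking in GFS takes read locks on every ancestor of a path and a read or write lock on the path itself, which lets unrelated operations in the same directory proceed concurrently. A sketch of computing the lock set (helper name is illustrative):

```python
def locks_needed(path: str, write: bool):
    # Read locks on each ancestor directory, plus a read or write
    # lock on the leaf, e.g. /home/user/file takes read locks on
    # /home and /home/user and a write lock on the file itself.
    parts = path.strip("/").split("/")
    ancestors = ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]
    leaf = "/" + "/".join(parts)
    return [(p, "read") for p in ancestors] + [(leaf, "write" if write else "read")]
```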
Metadata
• In Memory
– Fast
– Limited space
• Chunk Locations
– No persistent record
• Op Log
– Every change to metadata
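The op log idea can be sketched as an append-only record of metadata mutations: log first, then apply, so a restarted master rebuilds its in-memory state by replay. This toy keeps the log in memory; real GFS persists and replicates it, and uses checkpoints to keep replay short. Class and field names are illustrative:

```python
import json

class MetadataStore:
    # Toy in-memory namespace rebuilt by replaying an operation log.
    def __init__(self):
        self.files = {}   # path -> list of chunk handles
        self.log = []     # durable in real GFS; in-memory here

    def apply(self, op):
        if op["type"] == "create":
            self.files[op["path"]] = []
        elif op["type"] == "add_chunk":
            self.files[op["path"]].append(op["handle"])

    def mutate(self, op):
        # Log the change before applying it: recovery replays the log.
        self.log.append(json.dumps(op))
        self.apply(op)

    def recover(self):
        # Rebuild all metadata from the log after a restart.
        self.files = {}
        for line in self.log:
            self.apply(json.loads(line))
```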
System Interactions
• To write:
– Ask master for chunk locations (client caches them)
– Push data to all chunks (to a buffer)
– Send write request to primary
– Primary applies changes, assigning them a serial order
– Primary forwards the request to secondaries (same order)
– Secondaries apply changes and confirm to the primary
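The write path above separates data flow from control flow: data is pushed to every replica's buffer first, then the primary picks a serial order that all replicas apply identically. A toy model (real GFS uses leases and RPCs):

```python
class Replica:
    def __init__(self):
        self.buffer = {}   # data pushed ahead of the write request
        self.chunk = []    # applied mutations, in the primary's order

    def push(self, data_id, data):
        self.buffer[data_id] = data

    def apply(self, serial_no, data_id):
        self.chunk.append((serial_no, self.buffer.pop(data_id)))

def write(primary, secondaries, data_id, data):
    for r in [primary] + secondaries:
        r.push(data_id, data)            # data flow: buffer on all replicas
    serial = len(primary.chunk)          # control flow: primary picks order
    primary.apply(serial, data_id)
    for s in secondaries:
        s.apply(serial, data_id)         # same serial order everywhere
    return all(s.chunk[-1] == primary.chunk[-1] for s in secondaries)
```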
Record Append
• Atomic
• Allows for multiple writers
• May leave inconsistent regions (padding, duplicates)
between successful appends
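A common way applications cope: a failed append that the client retries can leave an extra copy of the record in the chunk, so each record carries a unique id and readers discard duplicates. A sketch of that pattern, not the GFS API:

```python
def append_with_retry(chunk, record, attempt_ok):
    # Toy model: on a failed attempt the record may still land on some
    # replicas; the client retries, leaving the earlier copy behind.
    for ok in attempt_ok:
        chunk.append(record)
        if ok:
            return

def read_unique(chunk):
    # Readers filter duplicates using the per-record unique id.
    seen, out = set(), []
    for rid, payload in chunk:
        if rid not in seen:
            seen.add(rid)
            out.append(payload)
    return out
```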
Fault Tolerance
• Restore state fast
• Copies, Copies, Copies
• Checksums for data integrity
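Data integrity on chunkservers can be sketched as per-block checksums: GFS keeps a 32-bit checksum for each 64 KB block of a chunk and verifies blocks on every read; a mismatch means corruption, so the read errors out and the replica is re-created from another copy. This toy uses CRC32:

```python
import zlib

BLOCK = 64 * 1024  # checksum granularity: each 64 KB block of a chunk

def checksum_chunk(data: bytes):
    # One 32-bit checksum per 64 KB block.
    return [zlib.crc32(data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)]

def verify_read(data: bytes, sums) -> bool:
    # False means corruption: report an error and re-replicate the chunk.
    return checksum_chunk(data) == sums
```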
Results summary
• When you build a file system around the
specific applications that use it, it works
well.
