The Google File System
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung
SOSP ’03
CSC 456 Operating Systems
Seminar Presentation (11/11/2010)
Elif Eyigöz, Walter Lasecki
Outline
Background
Architecture
Master
Chunkserver
Write/Read Algorithm
Namespace management / Locking
Reliability
Replication in Chunkservers
Replication in Master
Conclusion
Comparison with AFS and NFS
Background – Introduction
Google
Search engine: Huge workload
Applications: Lots of data being processed
Extreme scale: 100 TB of storage across thousands of disks on over a
thousand machines, accessed by hundreds of clients
AFS, NFS versus GFS
Read/write, Cache, Replication, Consistency, Faults
Background – Motivational Facts
Extreme scale:
Thousands of commodity-class PCs
Multiple clusters distributed worldwide
Thousands of queries served per second
One query reads hundreds of MB of data
One query consumes tens of billions of CPU cycles
Google stores dozens of copies of the entire web
Goal: Large, distributed, highly fault tolerant file system
Background – Design Observations
Component failures are normal
Fault tolerance and automatic recovery are needed
Huge files are common (multi-GB)
Assumptions about I/O operations and block sizes must be revisited
Record appends are more prevalent than random writes
Appending is the focus of performance optimization and atomicity
guarantees
Architecture
The Role of the Master
Metadata server: maintains all file system metadata
File namespace
File to chunk mapping
Chunk location information
All metadata is kept in the master's memory (fast)
Namespace and file-to-chunk mapping: an operation log provides persistence and recovery
Location of chunks and their replicas:
the master keeps no persistent record (no disk I/O)
it polls chunkservers instead (master as monitor, no extra cost)
Master as metadata server
Monitor
Heartbeat messages to detect the state of each chunkserver
Communicate with chunkservers periodically
Centralized Controller
System-wide activities
Manage chunk replication in the whole system
Master
Single master
Simplify the design of the system
Control chunk placement using global knowledge
Potential bottleneck?
Avoiding bottleneck
Clients do not read/write file data through the master
Clients cache metadata (e.g., chunk handles, locations)
Client typically asks for multiple chunks in the same request
Master can also include the information for chunks immediately
following those requested
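The bottleneck-avoidance scheme above can be sketched as a small client-side metadata cache. This is a minimal sketch, not the paper's implementation: the 60-second TTL and the `master_lookup` callback are illustrative assumptions.

```python
import time

CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB GFS chunk size
CACHE_TTL = 60.0                # seconds an entry stays fresh (illustrative value)

class ClientMetadataCache:
    """Caches (filename, chunk index) -> chunk handle + replica locations."""

    def __init__(self, master_lookup):
        # master_lookup stands in for the RPC to the master
        self.master_lookup = master_lookup
        self.cache = {}

    def locate(self, filename, byte_offset):
        chunk_index = byte_offset // CHUNK_SIZE  # client derives this locally
        key = (filename, chunk_index)
        hit = self.cache.get(key)
        if hit is not None and time.time() - hit[1] < CACHE_TTL:
            return hit[0]                        # fresh hit: no master round trip
        info = self.master_lookup(filename, chunk_index)
        self.cache[key] = (info, time.time())
        return info
```

Repeated reads within the same chunk never touch the master until the cached entry expires, which is what keeps the single master off the data path.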
Caching
Metadata
Cached on client after being retrieved from the Master
Only kept for a period of time to prevent stale data
File caching
Clients
never cache file data
Chunkservers never cache file data (Linux's buffer cache already does so for local files)
File working sets too large
Simplifies cache coherence
Chunkserver
Large chunk size
64 MB: much larger than typical file system blocks
Advantages:
reduces client-master interaction (one request can cover many operations on a chunk)
reduces network overhead (client can keep a persistent TCP connection to the chunkserver)
reduces the size of metadata stored on master
Disadvantages:
internal fragmentation (solution: lazy space allocation)
large chunks can become hot spots:
(solutions: higher replication factor, staggering application start
time, client-to-client communication)
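The metadata advantage can be made concrete with back-of-the-envelope arithmetic, assuming roughly 64 bytes of metadata per chunk (the paper states "less than 64 bytes"):

```python
CHUNK_SIZE = 64 * 1024 * 1024     # 64 MB chunk
META_PER_CHUNK = 64               # paper: less than 64 bytes of metadata per chunk

def master_metadata_footprint(total_bytes):
    """Rough upper bound on master memory used to track total_bytes of data."""
    n_chunks = -(-total_bytes // CHUNK_SIZE)      # ceiling division
    return n_chunks, n_chunks * META_PER_CHUNK

# The 100 TB cluster from the background slides:
chunks, meta_bytes = master_metadata_footprint(100 * 1024**4)
```

About 1.6 million chunks and only ~100 MB of metadata for 100 TB of data, which is why the entire state fits comfortably in one master's memory.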
Write/Read Algorithm
Write Algorithm (step-by-step figures, not reproduced here)
Read Algorithm (step-by-step figures, not reproduced here)
Namespace Management
Namespace
How the file system represents and manages the hierarchy of files and
directories
User's perspective: /dir, /dir/file1, /dir/file2
File system's perspective: dir is a directory; file1 and file2 are files
Namespace Management
Traditional file systems
Per-directory data structure that lists all files in the directory
Hierarchy: based on parent-child tree
User's perspective: /dir, /dir/file1, /dir/file2 (dir is a directory; file1, file2 are files)
File system's perspective: a parent-child tree, with dir as the parent and file1, file2 as its children
Namespace Management
Google File System
No such per-directory data structure
Lookup table mapping full pathname to metadata
Hierarchy: based on full pathname
User's perspective: /dir, /dir/file1, /dir/file2 (dir is a directory; file1, file2 are files)
File system's perspective: the same full pathnames /dir, /dir/file1, /dir/file2, used as keys in the lookup table
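The flat lookup table can be sketched as a single dictionary keyed by full pathname. This is a toy model, not GFS code; note how listing a directory becomes a scan, since no per-directory child list exists:

```python
class GFSNamespace:
    """Flat lookup table mapping full pathnames to metadata;
    no per-directory child lists are maintained."""

    def __init__(self):
        self.table = {"/": {"type": "dir"}}

    def create(self, path, is_dir=False):
        self.table[path] = {"type": "dir" if is_dir else "file"}

    def lookup(self, path):
        return self.table.get(path)

    def list_dir(self, dirpath):
        # With no per-directory structure, listing means scanning keys
        # that share the prefix and have no further path component.
        prefix = dirpath.rstrip("/") + "/"
        return sorted(p for p in self.table
                      if p.startswith(prefix) and "/" not in p[len(prefix):])
```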
Namespace Management
What’s the advantage of the namespace management in
Google File System?
Locking
Transactional operations
Deadlock prevention
Locking – Transactional Operation
Traditional file systems
Locking on a certain data structure (e.g., a mutex)
Example: Create new file under a directory
e.g., create /dir/file3
lock (dir->mutex)
create file3
add file3 to dir
unlock (dir->mutex)
Only one creation in a given directory at a time
Reason: the whole directory must be locked to
prevent simultaneous creation of files with the same name
add the new file into the directory's data structure
Locking – Transactional Operation
Google File System
Locking on the pathname
Example: Create new file under a directory
e.g., create /dir/file3, /dir/file4, ……
Operation 1:              Operation 2:
read_lock (/dir)          read_lock (/dir)
write_lock (/dir/file3)   write_lock (/dir/file4)
create file3              create file4
unlock (/dir/file3)       unlock (/dir/file4)
unlock (/dir)             unlock (/dir)
Allows concurrent mutations in the same directory
Key: only a read lock on the directory. Why is that enough?
Locking the full pathname locks the new file's name before the file is created, which prevents simultaneous creation of files with the same name
With no per-directory data structure, there is no directory list to update when adding a new file
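The pathname locking scheme can be sketched as follows; `locks_for_create` and `compatible` are illustrative helpers, not GFS functions:

```python
def locks_for_create(path):
    """Locks needed to create `path`: read locks on every ancestor
    directory name, plus a write lock on the full pathname itself."""
    parts = path.strip("/").split("/")
    ancestors = ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]
    return [("read", a) for a in ancestors] + [("write", path)]

def compatible(locks_a, locks_b):
    """Two lock sets can run concurrently iff no name is write-locked
    by one operation while locked (read or write) by the other."""
    def writes(ls): return {n for kind, n in ls if kind == "write"}
    def names(ls):  return {n for _, n in ls}
    return writes(locks_a).isdisjoint(names(locks_b)) and \
           writes(locks_b).isdisjoint(names(locks_a))
```

Creating /dir/file3 and /dir/file4 concurrently is allowed (both only read-lock /dir), while two creations of /dir/file3 conflict on the write lock and serialize.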
Locking – Deadlock Prevention
Traditional file systems (Linux)
Hierarchy: based on parent-child tree
Ordering for resources: parent, child
Ordering for requests : lock parent first, then lock child
Problem: e.g., the rename operation
Operation 1: mv /root/dir1/file1 /root/dir2/file2
Operation 2: mv /root/dir1/file2 /root/dir2/file1
(namespace tree: root has children dir1 and dir2, which hold file1 and file2)
First step of rename: lock dir1 and dir2
But parent-child ordering cannot decide which one should be locked first
Linux therefore allows only one rename operation at a time per file system
Locking – Deadlock Prevention
Google File System
Hierarchy: based on full pathname
Based on consistent total order
First ordered by level in the namespace tree
Lexicographically within the same level
Example: rename operation
Operation 1 : mv /root/dir1/file1 /root/dir2/file2
Operation 2 : mv /root/dir1/file2 /root/dir2/file1
First step of rename: lock dir1 and dir2
Level order : dir1 and dir2 are in the same level, so then
Lexicographical order : Lock dir1 first
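The consistent total order amounts to sorting lock names by depth, then lexicographically; any two operations then acquire shared names in the same sequence, so deadlock is impossible. A minimal sketch:

```python
def lock_order(paths):
    """Consistent total order over lock names: first by depth in the
    namespace tree, then lexicographically within the same level.
    Acquiring locks in this order makes deadlock impossible."""
    return sorted(set(paths), key=lambda p: (p.count("/"), p))
```

For the rename example, both operations acquire /root, then dir1, then dir2, in the same order, so neither can hold dir2 while waiting for dir1.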
Replication in Chunkservers
Replication
Create redundancy
Each chunk is replicated on multiple chunkservers
By default, store three replicas
Replication of chunks
Each mutation is performed at all the chunk’s replicas
Split the replication into two flows
Data flow : replicate the data
Control flow : replicate the write requests and their order
Replication in Chunkservers
Data flow (Step 3)
Client pushes data to all replicas
Data is pushed linearly along a chain of
chunkservers in a pipelined fashion.
e.g., Client → Sec. A → Primary → Sec. B
Once a chunkserver receives some data, it
starts forwarding immediately
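Why pipelined forwarding matters can be shown with an idealized timing model (the paper's B/T + RL estimate); the function and its parameters are a sketch, ignoring congestion and variable latency:

```python
def push_times(data_bytes, num_replicas, link_bw, per_hop_latency):
    """Idealized transfer times: pipelined forwarding along a chain
    vs. sending the full payload to each replica in turn.
    link_bw is in bytes/second, per_hop_latency in seconds."""
    pipelined = data_bytes / link_bw + (num_replicas - 1) * per_hop_latency
    serial = num_replicas * (data_bytes / link_bw)
    return pipelined, serial

# 1 MB pushed to 3 replicas over 100 Mbps (12.5 MB/s) links with 1 ms hops
pipelined, serial = push_times(1e6, 3, 12.5e6, 0.001)
```

Pipelining gives roughly 82 ms instead of 240 ms for three replicas, because each chunkserver forwards bytes as soon as they arrive rather than after the full payload lands.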
Replication in Chunkservers
Control flow
(Step 4) After the data flow, the client sends a write
request to the primary
The primary assigns consecutive serial
numbers to all the write requests it receives
(Step 5) The primary forwards the write
requests and the serial number order to all
secondary replicas
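Steps 4 and 5 can be sketched as a primary that stamps each write with the next serial number and secondaries that apply writes in exactly that order. This is a toy model; a real replica would buffer out-of-order arrivals:

```python
class Primary:
    """Stamps each incoming write with the next serial number,
    defining a single mutation order for all replicas."""
    def __init__(self):
        self.next_serial = 0

    def order(self, request):
        serial = self.next_serial
        self.next_serial += 1
        return serial, request

class Secondary:
    """Applies mutations strictly in the primary's serial order."""
    def __init__(self):
        self.applied = []

    def apply(self, serial, request):
        # This sketch simply insists on in-order delivery.
        assert serial == len(self.applied)
        self.applied.append(request)
```

Because every secondary applies the same serial order, all replicas of a chunk end up with the same mutation history even when clients issue writes concurrently.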
Replication in Master
Replication of master state
Log of all changes made to metadata
Periodic checkpoints of the log
Log and checkpoints are replicated on multiple machines
If master fails, another replica would be activated as the new master
“Shadow” masters provide read-only service
Not mirrors: they may lag the master slightly
They enhance read throughput and availability for some files
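The checkpoint-plus-log recovery scheme can be sketched as follows; the dictionary-of-metadata model is an illustrative simplification:

```python
class MasterState:
    """Master metadata made recoverable: a periodic checkpoint plus a
    log of every mutation since that checkpoint. In GFS both the log
    and the checkpoints are replicated on multiple machines."""
    def __init__(self):
        self.metadata = {}
        self.log = []          # mutations since the last checkpoint
        self.checkpoint = {}   # snapshot taken at checkpoint time

    def mutate(self, key, value):
        self.log.append((key, value))   # log first, then apply
        self.metadata[key] = value

    def take_checkpoint(self):
        self.checkpoint = dict(self.metadata)
        self.log = []

    def recover(self):
        """What a newly activated replica reconstructs: the latest
        checkpoint with the logged mutations replayed on top."""
        state = dict(self.checkpoint)
        for key, value in self.log:
            state[key] = value
        return state
```

Checkpointing keeps recovery fast: a new master replays only the mutations logged since the last checkpoint rather than the entire history.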
Conclusion
AFS, NFS and GFS
Read/write, Cache, Replication, Consistency, Faults
The design is driven by Google’s application workloads and
technological environment
Advantages
Knowing the workload makes it clear how to improve performance
The file system is designed specifically for Google
Disadvantages
May not be suitable for adoption in other systems
The single master may still be a potential bottleneck
References
[1] The Google File System, SOSP ’03
[2] Presentation slides of GFS at SOSP ’03
[3] “Operating System Concepts”, 8th edition, 2009
[4] Linux 2.6.20
Thank you!