CASE STUDY OF DISTRIBUTED SYSTEM
TOPIC: Google File System
DOS: 18 April 2012
Submitted To: Mr. Mandeep Singh
Submitted By: Navneet Singh
Roll No.: RKE005B06
Reg. No.: 3050070091
Class: BT MT(CSE)
Section: KE005
Introduction
Google is a multi-billion dollar company and one of the major power players on the World Wide Web and beyond. The company relies on a distributed computing system to provide users with the infrastructure they need to access, create and alter data. Google uses the Google File System (GFS) to organize and manipulate huge files and to give application developers the research and development resources they require. The GFS is unique to Google and is not for sale, but it could serve as a model for file systems in organizations with similar needs.
Google File System
GFS was designed with the following assumptions and objectives:
Component failure is the norm, since the system is built from inexpensive commodity hardware. Files are typically large and numerous; multi-GB files in particular are managed efficiently, while smaller files are still supported. Reads and writes fall into two main patterns: large streaming accesses and small random accesses. High sustained bandwidth is prioritized over low latency. Files are modified mostly by appending new data rather than overwriting existing data.
In addition to the usual operations such as create, delete, open, close, read, and write, GFS provides two unique operations: snapshot and record append. Snapshot creates a copy of a file or a directory tree at low cost, while record append allows multiple clients to append data to the same file concurrently while guaranteeing the atomicity of each append. Files on the GFS tend to be very large, usually in the multi-gigabyte (GB) range. Accessing and manipulating files that large would take up a lot of the network's bandwidth (the capacity of a system to move data from one location to another). The GFS addresses this problem by breaking files up into chunks of 64 megabytes (MB) each. Every chunk receives a unique 64-bit identification number called a chunk handle. While the GFS can process smaller files, its developers did not optimize the system for those kinds of tasks.
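To make the chunking idea concrete, the short Python sketch below (illustrative only, not Google's code) shows how a file could be split into fixed 64-MB chunks, each tagged with a 64-bit chunk handle. In the real GFS the master assigns handles; a random value is used here purely for illustration.

    import secrets

    CHUNK_SIZE = 64 * 1024 * 1024  # the fixed 64-MB GFS chunk size

    def new_chunk_handle() -> int:
        # A random 64-bit identifier standing in for a globally unique chunk
        # handle (in GFS the master assigns these; random is for illustration).
        return secrets.randbits(64)

    def split_into_chunks(data: bytes):
        # Yield (chunk_handle, chunk_bytes) pairs covering the whole file.
        for offset in range(0, len(data), CHUNK_SIZE):
            yield new_chunk_handle(), data[offset:offset + CHUNK_SIZE]

    # A 200-MB file maps to four chunks; the last chunk is only partly full.
    for handle, chunk in split_into_chunks(bytes(200 * 1024 * 1024)):
        print(f"handle={handle:016x} size={len(chunk)} bytes")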
Google File System Architecture
Google organized the GFS into clusters of computers. A cluster is simply a network of computers, and each cluster might contain hundreds or even thousands of machines. Within a GFS cluster there are three kinds of entities: clients, master servers and chunk servers.
Client
A client is any entity that makes a file request. Requests range from retrieving and manipulating existing files to creating new files on the system. Clients can be other computers or computer applications; you can think of them as the customers of the GFS.
Master Server
The master server acts as the coordinator for the cluster. The master's duties include maintaining an operation log, which keeps track of the activities of the master's cluster. The operation log helps keep service interruptions to a minimum: if the master server crashes, a replacement server that has monitored the operation log can take its place. The master server also keeps track of metadata, the information that describes chunks. The metadata tells the master server to which files the chunks belong and where they fit within the overall file. Upon startup, the master polls all the chunk servers in its cluster, and the chunk servers respond by reporting the contents of their inventories.
Chunk Server
Chunk servers are the workhorses of the GFS. They are responsible for storing the 64-MB file chunks. The chunk servers do not send chunks to the master server; instead, they send requested chunks directly to the client. The GFS copies every chunk multiple times and stores the copies on different chunk servers. Each copy is called a replica. By default, the GFS makes three replicas per chunk, but users can change this setting and make more or fewer replicas if desired.
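A rough Python sketch of the metadata the master might keep in memory is given below. The names (MasterMetadata, register_chunk, locate) are invented for this example; it illustrates the idea of the file-to-chunk and chunk-to-location mappings, not Google's actual data structures.

    from dataclasses import dataclass, field
    from typing import Dict, List

    REPLICATION_FACTOR = 3  # GFS default: three replicas per chunk

    @dataclass
    class MasterMetadata:
        # file path -> ordered list of 64-bit chunk handles
        file_to_chunks: Dict[str, List[int]] = field(default_factory=dict)
        # chunk handle -> addresses of chunk servers holding a replica
        chunk_locations: Dict[int, List[str]] = field(default_factory=dict)

        def register_chunk(self, path: str, handle: int, servers: List[str]) -> None:
            # Record a new chunk for a file and where its replicas live.
            self.file_to_chunks.setdefault(path, []).append(handle)
            self.chunk_locations[handle] = servers[:REPLICATION_FACTOR]

        def locate(self, path: str, chunk_index: int) -> List[str]:
            # Answer a client lookup: which servers hold chunk N of this file?
            handle = self.file_to_chunks[path][chunk_index]
            return self.chunk_locations[handle]

    # Example: the master learns chunk locations from chunk server reports at startup.
    meta = MasterMetadata()
    meta.register_chunk("/logs/web.log", 0x1A2B, ["cs1:7000", "cs2:7000", "cs3:7000"])
    print(meta.locate("/logs/web.log", 0))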
Architectural Design
[Figure: GFS architecture. The application/GFS client asks the GFS master for chunk locations and requests chunk data directly from a GFS chunkserver, which stores chunks as files in its local Linux file system.]
Google File System Working
File requests follow a standard workflow. A read request is simple: the client asks the master server where it can find a particular file on the system, and the master responds with the location of the primary replica of the relevant chunk. The primary replica holds a lease from the master server for the chunk in question. If no replica currently holds a lease, the master server designates one replica as the primary. It does this by comparing the IP address of the client to the addresses of the chunk servers holding the replicas and choosing the chunk server closest to the client. Write requests are a little more complicated. The client still sends a request to the master server, which replies with the locations of the primary and secondary replicas. The client stores this information in a memory cache, so that if it needs to refer to the same replica later on it can bypass the master server. The client then pushes the write data to all the replicas, starting with the closest replica and ending with the furthest one; it does not matter whether the closest replica is a primary or a secondary. Google compares this data delivery method to a pipeline.
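The read path described above can be sketched in Python as follows. The master and chunkserver objects are assumed stand-ins (the real GFS client library is not public); the point is that the master is asked only on a cache miss and that chunk data flows directly from a chunk server to the client.

    CHUNK_SIZE = 64 * 1024 * 1024

    class GFSClient:
        def __init__(self, master, chunkservers):
            self.master = master              # exposes locate(path, chunk_index) -> [addresses]
            self.chunkservers = chunkservers  # address -> chunk server stand-in
            self.location_cache = {}          # (path, chunk index) -> replica addresses

        def read(self, path, offset, length):
            index = offset // CHUNK_SIZE
            key = (path, index)
            if key not in self.location_cache:
                # The master is contacted only on a cache miss.
                self.location_cache[key] = self.master.locate(path, index)
            # Fetch the bytes directly from a replica; the chunk data itself
            # never passes through the master.
            server = self.chunkservers[self.location_cache[key][0]]
            return server.read_chunk(path, index, offset % CHUNK_SIZE, length)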
Once the replicas receive the data, the primary replica begins to assign consecutive serial numbers to each change to the file. Changes are called mutations. The serial numbers instruct the replicas on how to order each mutation. The primary then applies the mutations in sequential order to its own data. Then it sends a write request to the secondary replicas, which follow the same application process. If everything works as it should, all the replicas across the cluster incorporate the new data. The secondary replicas report back to the primary once the application process is over.
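The ordering step can be sketched like this (Python; PrimaryReplica, SecondaryReplica and their methods are invented for the example and only illustrate the serial-number mechanism, not Google's implementation):

    class SecondaryReplica:
        def __init__(self):
            self.data = bytearray()

        def apply(self, ordered_mutations):
            # Apply mutations in the serial order chosen by the primary, then acknowledge.
            for _, payload in sorted(ordered_mutations):
                self.data.extend(payload)
            return True

    class PrimaryReplica:
        def __init__(self, secondaries):
            self.data = bytearray()
            self.next_serial = 0
            self.secondaries = secondaries

        def write(self, mutations):
            ordered = []
            for payload in mutations:
                ordered.append((self.next_serial, payload))
                self.data.extend(payload)   # the primary applies each mutation in serial order
                self.next_serial += 1
            # Forward the same ordered list to every secondary and collect acknowledgements.
            return all(s.apply(ordered) for s in self.secondaries)

    # Usage: after a successful write, every replica holds identical data.
    secondaries = [SecondaryReplica(), SecondaryReplica()]
    primary = PrimaryReplica(secondaries)
    ok = primary.write([b"record-1", b"record-2"])
    print(ok, primary.data == secondaries[0].data == secondaries[1].data)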
Google File System Hardware
Google says little about the hardware it currently uses to run the GFS, other than that it is a collection of cheap, off-the-shelf Linux servers. But in an official GFS report, Google revealed the specifications of the equipment it used to run benchmarking tests on GFS performance. The test equipment included one master server, two master replicas, 16 clients and 16 chunkservers. All of them used the same hardware with the same specifications, and they all ran the Linux operating system. Each had dual 1.4-gigahertz Pentium III processors, 2 GB of memory and two 80-GB hard drives. In comparison, several vendors currently offer consumer PCs that are more than twice as powerful as the servers Google used in its tests; the GFS developers showed that the system could work efficiently using modest equipment. The network connecting the machines consisted of 100-megabit-per-second (Mbps) full-duplex Ethernet connections and two Hewlett-Packard 2524 network switches. The GFS developers connected the 16 client machines to one switch and the other 19 machines to the other switch, and linked the two switches together with a 1-gigabit-per-second (Gbps) connection.