
Essay: Google File System (GFS): Overview, Advantages, & Design

Published: 1 December 2020*
Last Modified: 22 July 2024



Created by Xueqi Zhao, last modified on Mar 02, 2018

Introduction

GFS (Google File System) is a proprietary file system that Google designed to store vast amounts of search data. It is a scalable distributed file system for large, distributed applications that access large volumes of data. It runs on cheap commodity hardware, treats server failure as the norm rather than the exception, and provides fault tolerance, which greatly reduces system cost while preserving reliability and availability.

GFS is the cornerstone of Google's cloud storage: other storage systems such as Google Bigtable, Google Megastore and Google Percolator are built directly or indirectly on top of it. In addition, Google's large-scale batch system MapReduce uses GFS for the input and output of massive data sets.

Design

Assumptions

GFS was designed around the following assumptions:

The system is built from many inexpensive commodity machines rather than large servers, so component failures are routine and the system must tolerate them.

The workload consists of both large streaming reads and small random reads.

Writes are mainly large, sequential writes that append data to files. Small random writes are supported, but their efficiency is not guaranteed.

Multiple clients must be able to append to and read the same file at the same time.

High sustained bandwidth is more important than low latency. GFS primarily serves high-speed bulk processing of large amounts of data rather than low-latency individual reads or writes.

Interface

GFS provides a familiar file system interface but does not implement a standard API such as POSIX. In addition to the usual file operations, GFS supports snapshot and record append operations. Snapshot creates a copy of a file or a directory tree, and record append allows multiple clients to append to the same file simultaneously without extra locking.
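The idea behind record append, where the file system rather than the caller chooses the write offset, can be illustrated with a small in-memory sketch. Everything here is hypothetical: the class AppendOnlyFile, its methods, and the helper run_concurrent_appends are invented for illustration, and the internal lock is only a stand-in for whatever serialization the real system performs. Replication, padding and retries are not modeled.

```python
import threading


class AppendOnlyFile:
    """Toy in-memory file whose appends are atomic from the caller's view."""

    def __init__(self) -> None:
        self._data = bytearray()
        self._lock = threading.Lock()  # internal; callers never lock anything

    def record_append(self, record: bytes) -> int:
        """Append record at an offset chosen by the file and return that offset."""
        with self._lock:
            offset = len(self._data)
            self._data.extend(record)
            return offset

    def read(self, offset: int, length: int) -> bytes:
        with self._lock:
            return bytes(self._data[offset:offset + length])


def run_concurrent_appends(num_writers: int, records_per_writer: int) -> AppendOnlyFile:
    """Start several writer threads that all append to one shared file."""
    shared = AppendOnlyFile()

    def writer(writer_id: int) -> None:
        for i in range(records_per_writer):
            shared.record_append(f"w{writer_id}-r{i};".encode())

    threads = [threading.Thread(target=writer, args=(n,)) for n in range(num_writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared
```

Because each writer only calls record_append and learns the offset afterwards, records from different writers may interleave in any order, but no record is ever split or overwritten.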

Architecture

As shown in Figure 1, a GFS cluster consists of a single master and a large number of chunkservers, and is accessed by many clients.

Each of these is typically a commodity Linux machine (not a large server-class machine) running a user-level server process. A chunkserver and a client can be deployed on the same machine.

Files are divided into fixed-size chunks. Each chunk is identified by an immutable, globally unique 64-bit chunk handle that is assigned when the chunk is created. Chunkservers store chunks as Linux files, and chunk data is read or written by specifying a chunk handle and a byte range. By default each chunk is stored as three replicas, and users can set different replication levels for different regions of the file namespace.
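Because chunks have a fixed size, translating a byte range within a file into per-chunk byte ranges is simple integer arithmetic. The following is a minimal sketch of that translation. The function name split_into_chunk_ranges is hypothetical, and the 64 MB default is the chunk size reported in the GFS paper, which this essay does not state.

```python
from typing import List, Tuple

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, per the GFS paper (not stated in this essay)


def split_into_chunk_ranges(offset: int, length: int,
                            chunk_size: int = CHUNK_SIZE) -> List[Tuple[int, int, int]]:
    """Map a file byte range to (chunk_index, start_in_chunk, end_in_chunk) pieces.

    end_in_chunk is exclusive. The chunk index would then be resolved to a
    chunk handle by the master; that lookup is outside this sketch.
    """
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    pieces = []
    position = offset
    end = offset + length
    while position < end:
        chunk_index = position // chunk_size
        start_in_chunk = position % chunk_size
        # Stop at the end of the request or the end of this chunk, whichever is first.
        bytes_here = min(end - position, chunk_size - start_in_chunk)
        pieces.append((chunk_index, start_in_chunk, start_in_chunk + bytes_here))
        position += bytes_here
    return pieces
```

For example, with a toy chunk size of 4 bytes, a 3-byte read starting at offset 3 touches the last byte of chunk 0 and the first two bytes of chunk 1.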

The master stores and manages all file system metadata, including the namespace, access control information, the mapping from files to chunks, and the current locations of chunks. The master also controls system-wide activities such as chunk lease management, garbage collection and chunk migration. The master periodically communicates with every chunkserver through HeartBeat messages, passing instructions and collecting state.
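The two mappings held by the master, and the way HeartBeat messages keep chunk locations current, can be sketched with plain dictionaries. This is an illustrative toy under stated assumptions: the class MasterMetadata, its field names, and the shape of the heartbeat payload (a server id plus the set of chunk handles it holds) are all hypothetical, and the namespace, access control and leases are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class MasterMetadata:
    # File path -> ordered list of chunk handles making up the file.
    file_to_chunks: Dict[str, List[int]] = field(default_factory=dict)
    # Chunk handle -> ids of chunkservers currently reporting a replica.
    chunk_locations: Dict[int, Set[str]] = field(default_factory=dict)

    def handle_heartbeat(self, server_id: str, held_chunks: Set[int]) -> None:
        """Update replica locations from one chunkserver's report (assumed format)."""
        for handle in held_chunks:
            self.chunk_locations.setdefault(handle, set()).add(server_id)
        # Forget this server for any chunk it no longer reports.
        for handle, servers in self.chunk_locations.items():
            if handle not in held_chunks:
                servers.discard(server_id)

    def locate(self, path: str, chunk_index: int) -> Tuple[int, Set[str]]:
        """Return the chunk handle and replica locations for one chunk of a file."""
        handle = self.file_to_chunks[path][chunk_index]
        return handle, set(self.chunk_locations.get(handle, set()))
```

A client would call something like locate once per chunk and then talk to a chunkserver directly, which keeps file data off the master.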

GFS client code is linked into each application. It implements the file system API and communicates with the master and the chunkservers on behalf of the application.

GFS log system

The operation log contains the history of critical metadata changes. It is the only persistent record of metadata, and it also serves as the logical timeline that defines the order of concurrent operations. The log must be stored reliably, and a metadata change is made visible to clients only after it has been made persistent. GFS replicates the log on multiple remote machines, and the master responds to a client operation only after the corresponding log record has been written to disk both locally and remotely. The master batches several log records together before writing them, which reduces the impact of disk writes and replication on overall system performance.
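The rule that a client is answered only after its log record is durable everywhere, combined with batching, can be sketched as follows. This is a simplified illustration, not the GFS implementation: the class ReplicatedLog, the batch_size parameter and the on_durable callback are hypothetical names, and in-memory lists stand in for the local and remote disks.

```python
from typing import Callable, List, Tuple


class ReplicatedLog:
    """Toy operation log: batches records, then acknowledges after all copies are written."""

    def __init__(self, num_remotes: int, batch_size: int) -> None:
        self.local: List[str] = []  # stand-in for the master's local disk
        self.remotes: List[List[str]] = [[] for _ in range(num_remotes)]
        self.batch_size = batch_size
        self.flush_count = 0
        self._pending: List[Tuple[str, Callable[[], None]]] = []

    def submit(self, record: str, on_durable: Callable[[], None]) -> None:
        """Queue a record; on_durable fires only once the record is on every copy."""
        self._pending.append((record, on_durable))
        if len(self._pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Write all pending records to every copy in one batch, then acknowledge."""
        if not self._pending:
            return
        batch, self._pending = self._pending, []
        records = [record for record, _ in batch]
        self.local.extend(records)
        for remote in self.remotes:
            remote.extend(records)
        self.flush_count += 1
        # Only now is it safe to reply to the clients that issued these operations.
        for _, on_durable in batch:
            on_durable()
```

With a batch size of 3, three submitted operations cost a single flush instead of three, at the price of the first two callers waiting slightly longer for their acknowledgement.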

GFS disaster recovery

During disaster recovery, the master restores the file system to its most recent state by replaying the operation log. To shorten startup time, the log that must be replayed should be as short as possible, so the master creates a checkpoint whenever the log grows beyond a certain size. The master can then recover by loading the latest checkpoint from disk and replaying only the limited number of log records written after it. The checkpoint is stored in a compact B-tree-like form that can be mapped directly into memory and used for namespace lookups without extra parsing. Master recovery therefore requires only the latest checkpoint and the subsequent log files.
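Checkpoint-plus-replay recovery can be sketched in a few lines. This is an illustrative toy under stated assumptions: the record format (operation, path, chunk handle), the three operations, and the function names are all hypothetical, and a plain dictionary stands in for the B-tree-like checkpoint structure.

```python
from typing import Dict, List, Tuple

Namespace = Dict[str, List[int]]   # file path -> chunk handles
LogRecord = Tuple[str, str, int]   # (operation, path, chunk handle)


def apply_record(state: Namespace, record: LogRecord) -> None:
    """Apply one logged metadata change to the in-memory state."""
    op, path, handle = record
    if op == "create":
        state[path] = []
    elif op == "add_chunk":
        state[path].append(handle)
    elif op == "delete":
        state.pop(path, None)
    else:
        raise ValueError(f"unknown operation: {op}")


def take_checkpoint(state: Namespace, log_length: int) -> Tuple[Namespace, int]:
    """Copy the state and remember how many log records it already reflects."""
    return ({path: list(chunks) for path, chunks in state.items()}, log_length)


def recover(checkpoint: Tuple[Namespace, int], log: List[LogRecord]) -> Namespace:
    """Rebuild state from the checkpoint, replaying only the records after it."""
    snapshot, records_covered = checkpoint
    state = {path: list(chunks) for path, chunks in snapshot.items()}
    for record in log[records_covered:]:
        apply_record(state, record)
    return state
```

Recovering from a checkpoint gives the same state as replaying the whole log from an empty state, but touches only the log suffix, which is why startup time depends on how recently a checkpoint was taken.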

Performance

At the time the GFS paper was written, the largest GFS clusters had over 1000 storage nodes and over 300 TB of disk storage, and were heavily accessed by hundreds of clients on distinct machines on a continuous basis.

Reference

Ghemawat, S., Gobioff, H., & Leung, S.-T. (2003). The Google File System. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP '03) (pp. 29-43).


* This essay may have been previously published on EssaySauce.com and/or Essay.uk.com at an earlier date than indicated.
