Bigtable: A Distributed Storage System For Structured Data

ERIC JOHN S. DAILISAN
PINMALUDPOD, URDANETA CITY, PANGASINAN
EJSDDDDD@GMAIL.COM
09152227260
INTRODUCTION
Traditional relational databases present a view that is composed of multiple tables, each with rows and named columns. Queries, mostly performed in SQL (Structured Query Language), allow one to extract specific columns from a row where certain conditions are met (e.g., a column has a specific value). Moreover, one can perform queries across multiple tables (this is the "relational" part of a relational database). For example, a table of students may include a student's name, ID number, and contact information. A table of grades may include a student's ID number, course number, and grade. We can construct a query that extracts grades by name by searching for the ID number in the student table and then matching that ID number in the grade table. Moreover, with traditional databases, we expect ACID guarantees: that transactions will be atomic, consistent, isolated, and durable. As we saw when we studied distributed transactions, it is impossible to guarantee consistency while providing high availability and network partition tolerance. This makes ACID databases unattractive for highly distributed environments and led to the emergence of alternate data stores that target high availability and high performance. Here, we will look at the structure and capabilities of Bigtable.

BODY
Google Bigtable is a distributed, column-oriented data store created by Google Inc. to handle very large amounts of structured data associated with the company's Internet search and Web services operations.

Bigtable was designed to support applications requiring massive scalability; from its first iteration, the technology was intended to be used with petabytes of data. The database was designed to be deployed on clustered systems and uses a simple data model that Google has described as "a sparse, distributed, persistent multidimensional sorted map." Data is assembled in order by row key, and the map is indexed by row key, column key, and timestamp. Compression algorithms help achieve high capacity.

Cloud Bigtable is exposed to applications through multiple client libraries, including a supported extension to the Apache HBase library for Java. As a result, it integrates with the existing Apache ecosystem of open-source Big Data software.

Cloud Bigtable's powerful back-end servers offer several key advantages over a self-managed HBase installation:

Incredible scalability
Cloud Bigtable scales in direct proportion to the number of machines in your cluster. A self-managed HBase installation has a design bottleneck that limits the performance after a certain threshold is reached. Cloud Bigtable does not have this bottleneck, so you can scale your cluster up to handle more reads and writes.

Simple Administration
Cloud Bigtable handles upgrades and restarts transparently, and it automatically maintains high data durability. To replicate your data, simply add a second cluster to your instance, and replication starts automatically. No more managing masters or regions; just design your table schemas, and Cloud Bigtable will handle the rest for you.

Cluster resizing without downtime
You can increase the size of a Cloud Bigtable cluster for a few hours to handle a large load, then reduce the cluster's size again, all without any downtime. After you change a cluster's size, it typically takes just a few minutes under load for Cloud Bigtable to balance performance across all of the nodes in your cluster.

Open Source
Although Bigtable itself is a proprietary Google service rather than an open-source project, Cloud Bigtable is built to work with open-source software and established interfaces, which is a major advantage. For example, the Apache HBase API, one of the most widely used interfaces for this kind of data store, is supported, so organizations that already use products like HBase should find it simple to set up Bigtable for their data.

Security
As data volumes grow, concerns about data security grow with them. Bigtable offers replicated storage and encrypts data, which should help allay these concerns. Customers can also rely on Google's long-standing experience of handling the privacy and security of large amounts of data.

Maturity
Because Bigtable has been used internally for a significant period of time by a data giant like Google, it can promise a high level of stability and maturity to its users. It is not at all comparable to a new and untested product, and may well score favourably on many fronts when compared to longstanding players in the arena. Because of its internal use, customers can also be sure of its continued availability and enhancement. Google also lists several of its service partners, including Pythian, CCRi and Sungard, as companies that can build platforms to support a faster transition to Bigtable.

Cloud Bigtable is ideal for applications that need very high throughput and scalability for non-structured key/value data, where each value is typically no larger than 10 MB. Cloud Bigtable also excels as a storage engine for batch MapReduce operations, stream processing/analytics, and machine-learning applications.

You can use Cloud Bigtable to store and query all of the following types of data:
Time-series Data, such as CPU and memory usage over time for multiple servers.
Marketing Data, such as purchase histories and customer preferences.
Financial Data, such as transaction histories, stock prices, and currency exchange rates.
Internet of Things Data, such as usage reports from energy meters and home appliances.
Graph Data, such as information about how users are connected to one another.

To store the underlying data for each of your tables, Cloud Bigtable shards the data into multiple tablets (not a typo: tablets and tables are different things), where each tablet contains a contiguous range of rows within the table.

And here's the important thing when it comes to tablets: they can be reassigned to different nodes in your cluster, on demand, allowing Cloud Bigtable to scale and re-balance seamlessly as your use patterns change.

Expected Performance
Under typical workloads, Cloud Bigtable delivers highly predictable performance. According to the official documentation, the performance you can expect from each node in your Cloud Bigtable cluster depends on which type of storage your cluster uses.

In general, a cluster's performance increases linearly as you add nodes to the cluster. For example, if you create an SSD cluster with 10 nodes, the cluster can support up to 100,000 QPS (queries per second) for a typical read-only or write-only workload, assuming that each row contains 1 KB of data.

Load Balancing
Each Cloud Bigtable zone is managed by a master process, which balances workload and data volume within clusters. The master splits busier/larger tablets in half and merges less-accessed/smaller tablets together, redistributing them between nodes as needed. If a certain tablet gets a spike of traffic, the master will split the tablet in two, then move one of the new tablets to another node. Cloud Bigtable manages all of the splitting, merging, and rebalancing automatically, saving users the effort of manually administering their tablets. The documentation page "Understanding Cloud Bigtable Performance" provides more details about this process.

Supported Data Types
Cloud Bigtable treats all data as raw byte strings for most purposes. The only time Cloud Bigtable tries to determine the type is for increment operations, where the target must be a 64-bit integer encoded as an 8-byte big-endian value.

Memory and disk usage
The following sections describe how several components of Cloud Bigtable affect memory and disk usage for your instance.

Empty cells
Empty cells in a Cloud Bigtable table do not take up any space. Each row is essentially a collection of key/value entries, where the key is a combination of the column family, column qualifier and timestamp. If a row does not include a value for a specific key, the key/value entry is simply not present.

Column qualifiers
Column qualifiers take up space in a row, since each column qualifier used in a row is stored in that row. As a result, it is often efficient to use column qualifiers as data. In the Prezzy example from the Cloud Bigtable documentation, the column qualifiers in the follows family are the usernames of followed users; the key/value entry for these columns is simply a placeholder value.

Compactions
Cloud Bigtable periodically rewrites your tables to remove deleted entries, and to reorganize your data so that reads and writes are more efficient. This process is known as a compaction. There are no configuration settings for compactions; Cloud Bigtable compacts your data automatically.

Mutations and deletions
Mutations, or changes, to a row take up extra storage space, because Cloud Bigtable stores mutations sequentially and compacts them only periodically. When Cloud Bigtable compacts a table, it removes values that are no longer needed. If you update the value in a cell, both the original value and the new value will be stored on disk for some amount of time until the data is compacted.

Deletions also take up extra storage space, at least in the short term, because deletions are actually a specialized type of mutation. Until the table is compacted, a deletion uses extra storage rather than freeing up space.
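
The points made under "Supported Data Types" and "Memory and disk usage" can be illustrated with a small toy model. The sketch below is not how Cloud Bigtable is implemented: the ToyTable class, its method names, the "family:qualifier" column strings, and the usernames are all assumptions made up for illustration. It only mimics the described behaviour, namely that empty cells store nothing, that updates and deletions add entries until a compaction runs, and that counters are 8-byte big-endian integers.

```python
import struct


class ToyTable:
    """Toy model only: an append-only log of (row, column, timestamp, value)."""

    def __init__(self):
        self.entries = []  # a value of None marks a deletion

    def write(self, row, column, ts, value):
        self.entries.append((row, column, ts, value))

    def delete(self, row, column, ts):
        # A deletion is stored as one more mutation, so it adds an entry.
        self.entries.append((row, column, ts, None))

    def read(self, row, column):
        versions = [(ts, v) for r, c, ts, v in self.entries
                    if (r, c) == (row, column)]
        if not versions:
            return None  # empty cell: no entry exists, so nothing is stored
        return max(versions, key=lambda pair: pair[0])[1]

    def increment(self, row, column, ts, amount):
        # Counters are 64-bit integers stored as 8 big-endian bytes.
        current = self.read(row, column)
        number = struct.unpack(">q", current)[0] if current else 0
        self.write(row, column, ts, struct.pack(">q", number + amount))

    def compact(self):
        # Keep the newest version of each cell and drop deleted cells.
        newest = {}
        for r, c, ts, v in self.entries:
            if (r, c) not in newest or ts > newest[(r, c)][0]:
                newest[(r, c)] = (ts, v)
        self.entries = sorted(
            (r, c, ts, v) for (r, c), (ts, v) in newest.items()
            if v is not None
        )


table = ToyTable()
table.write("gwashington", "follows:jadams", 1, b"1")
table.write("gwashington", "follows:jadams", 2, b"1")    # update: both versions kept
table.write("gwashington", "follows:tjefferson", 3, b"1")
table.delete("gwashington", "follows:tjefferson", 4)     # deletion adds an entry
table.increment("gwashington", "stats:logins", 5, 1)
table.increment("gwashington", "stats:logins", 6, 41)

entries_before = len(table.entries)   # 6 entries before compaction
table.compact()
entries_after = len(table.entries)    # 2 entries after compaction
logins = struct.unpack(">q", table.read("gwashington", "stats:logins"))[0]
print(entries_before, entries_after, logins)  # prints: 6 2 42
```

Note that the qualifier itself ("jadams", "tjefferson") carries the information here, while the stored value is only a placeholder, which is the "column qualifiers as data" idea described above.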
Data compression
Cloud Bigtable compresses your data automatically using an intelligent algorithm. You cannot configure compression settings for your table. However, it is useful to know how to store data so that it can be compressed efficiently:

• Random data cannot be compressed as efficiently as patterned data. Patterned data includes text, such as the page you're reading right now.
• Compression works best if identical values are near each other, either in the same row or in adjoining rows. If you arrange your row keys so that rows with identical chunks of data are next to each other, the data can be compressed efficiently.

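
The two guidelines on data compression can be tried out with a general-purpose compressor. The sketch below uses Python's zlib module as a stand-in; this is an assumption, since the text does not say which algorithm Cloud Bigtable uses, so the numbers only illustrate the principle. The second experiment relies on the fact that zlib can only reuse data it has seen within roughly the last 32 KB, which is specific to zlib.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so that runs are repeatable


def random_bytes(n):
    """Return n pseudo-random bytes (a stand-in for incompressible data)."""
    return bytes(rng.getrandbits(8) for _ in range(n))


# 1) Random versus patterned data of the same length.
patterned = b"Cloud Bigtable stores rows sorted by row key. " * 200
noise = random_bytes(len(patterned))
patterned_size = len(zlib.compress(patterned))
noise_size = len(zlib.compress(noise))

# 2) The same three 20 KB chunks, each stored twice:
#    identical chunks next to each other versus far apart.
a, b, c = (random_bytes(20_000) for _ in range(3))
adjacent = a + a + b + b + c + c
scattered = a + b + c + a + b + c
adjacent_size = len(zlib.compress(adjacent))
scattered_size = len(zlib.compress(scattered))

print("patterned:", patterned_size, "random:", noise_size)
print("adjacent:", adjacent_size, "scattered:", scattered_size)
```

In this sketch the patterned text shrinks to a small fraction of its size while the random bytes do not shrink at all, and the layout with identical chunks side by side compresses to roughly half the size of the scattered layout.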
Cloud Bigtable and other storage options
Cloud Bigtable is not a relational database; it does not support SQL queries or joins, nor does it support multi-row transactions. Also, it is not a good solution for storing less than 1 TB of data.

• If you need full SQL support for an online transaction processing (OLTP) system, consider Cloud Spanner or Cloud SQL.
• If you need interactive querying in an online analytical processing (OLAP) system, consider BigQuery.
• If you need to store immutable blobs larger than 10 MB, such as large images or movies, consider Cloud Storage.
• If you need to store highly structured objects in a document database, with support for ACID transactions and SQL-like queries, consider Cloud Datastore.
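
The guidance above amounts to a short checklist, which can be written down as a small helper. This is only a sketch: the function name suggest_storage and its parameters are hypothetical, and the rules are nothing more than the bullet points restated as code.

```python
def suggest_storage(needs_full_sql=False, needs_olap=False,
                    max_blob_mb=0, needs_document_acid=False):
    """Return the alternatives to Cloud Bigtable that the list above suggests.

    An empty list means that none of the listed cases applies.
    """
    suggestions = []
    if needs_full_sql:
        # Full SQL support for an OLTP system.
        suggestions.extend(["Cloud Spanner", "Cloud SQL"])
    if needs_olap:
        # Interactive querying in an OLAP system.
        suggestions.append("BigQuery")
    if max_blob_mb > 10:
        # Immutable blobs larger than 10 MB.
        suggestions.append("Cloud Storage")
    if needs_document_acid:
        # Structured documents with ACID transactions and SQL-like queries.
        suggestions.append("Cloud Datastore")
    return suggestions


print(suggest_storage(needs_olap=True))                      # ['BigQuery']
print(suggest_storage(needs_full_sql=True, max_blob_mb=50))
```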
CONCLUSION
The authors of the original Bigtable paper summarize their work as follows: "We have described Bigtable, a distributed system for storing structured data at Google... Our users like the performance and high availability provided by the Bigtable implementation, and that they can scale the capacity of their clusters by simply adding more machines to the system as their resource demands change over time... Finally, we have found that there are significant advantages to building our own storage solution at Google. We have gotten a substantial amount of flexibility from designing our own data model for Bigtable."
