This history starts in 1970 with the publication of “A Relational Model of Data for Large Shared Data Banks,” an academic paper by Edgar F. Codd. That original paper introduced a beautiful way to model data: you build a bunch of cross-linked tables and store any piece of data just once. Such a database could answer any question, as long as the answer was stored somewhere within it. Disk space would be used efficiently, at a time when storage was expensive. It was marvelous. It was the future.

The first commercial implementation arrived in the late 1970s, and during the ’80s and ’90s relational databases grew increasingly dominant, delivering rich indexes to make any query efficient. Table joins, read operations that pull separate records together into one result, and transactions, combinations of reads and especially writes across the database that need to happen together, were essential. SQL, the Structured Query Language, became the language of data, and software developers learned to use it to ask for what they wanted and let the database decide how to deliver it. Strict guarantees were engineered in to prevent surprises.
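To make those two terms concrete, here is a minimal sketch using Python’s built-in sqlite3 module; the customers/orders schema and the values in it are invented for illustration, not taken from any system mentioned here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        total REAL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 2, 40.0);
""")

# A join: one declarative query pulls rows from both tables together;
# the database, not the application, decides how to execute it.
for name, total in conn.execute("""
    SELECT customers.name, orders.total
    FROM orders JOIN customers ON customers.id = orders.customer_id
"""):
    print(name, total)

# A transaction: the two writes below commit together or not at all.
# sqlite3 wraps the `with` block in BEGIN ... COMMIT / ROLLBACK.
try:
    with conn:
        conn.execute("UPDATE orders SET total = total + 5 WHERE id = 10")
        conn.execute("INSERT INTO orders VALUES (12, 1, 15.0)")
except sqlite3.Error:
    pass  # on any error, both statements are rolled back
conn.close()
```

The point of the declarative style is visible in the SELECT: the developer states which records belong together, and the engine chooses the access path, typically using those rich indexes.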

And in the first decade of the new millennium, for many business models, that all went out the window. Relational databases architected around the assumption of running on a single machine lacked something that became essential with the advent of the internet: they were painfully difficult to scale out. The volume of data that can be created by millions or billions of networked humans and devices is more than any single server can handle.

When the workload grows so heavy that no single computer can bear the load, when the most expensive hardware on the market would be brought to its knees by the weight of an application, the only path forward is to move from a single database server to a cluster of database nodes working in concert. For a legacy SQL database architected to run on a single server, this was a painful process, requiring a massive investment of time and often sacrificing many of the features that brought developers to these databases in the first place.

By the late 2000s, SQL databases were still extremely popular, but for those who needed scale there were other options: NoSQL had arrived on the scene. Google Bigtable, HDFS, and Cassandra are a few examples. These NoSQL systems were built to scale out easily and to tolerate node failures with minimal disruption, but they came with compromises in functionality: typically a lack of joins and transactions, or limited indexes. These were shortcomings that developers had to constantly engineer their way around. Scale became cheap, but relational guarantees didn’t come with it.
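As one illustration of what engineering around a missing join can look like, here is a hedged sketch in Python, with a plain dict standing in for a key-value store; every key and field name is invented, and nothing here is drawn from any specific NoSQL system’s API.

```python
# Without a JOIN, the application fetches related records itself
# and stitches them together by hand.
kv = {
    "customer:1": {"name": "Ada"},
    "customer:2": {"name": "Grace"},
    "order:10": {"customer_id": 1, "total": 25.0},
    "order:11": {"customer_id": 2, "total": 40.0},
}

def orders_with_customer_names(store):
    """A hand-rolled join: one pass over the orders, plus an extra
    lookup per order to find the matching customer record."""
    joined = []
    for key, order in store.items():
        if not key.startswith("order:"):
            continue
        customer = store["customer:%d" % order["customer_id"]]
        joined.append((customer["name"], order["total"]))
    return joined

print(orders_with_customer_names(kv))
```

Every such lookup is logic the database engine used to own: correctness, consistency during concurrent writes, and efficiency all become the application developer’s problem.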

The legacy SQL databases have tried to fill the gap in the years since with add-on features to help reduce the pain of scaling out. At the same time, NoSQL systems have been building out a subset of their missing SQL functionality. But none of these were architected from the ground up to deliver what we might call distributed SQL. And that’s where CockroachDB comes in.
