Cluster Virtualization and Multi-Tenant CockroachDB
(As an analogy, CockroachDB’s cluster virtualization virtualizes CockroachDB SQL in much the same way that containers or VMs virtualize hosted servers.)
Today (Summer 2023), cluster virtualization is only available inside the CockroachCloud Serverless product. However, we eventually wish to evolve CockroachDB to serve all application traffic using cluster virtualization, including in CockroachCloud Dedicated and in licensed self-hosted CockroachDB deployments.
In the words of our CTO, “Virtual clusters is the way CockroachDB should have been designed from the start.”
This also means that we are now focusing our development on maximizing the application developer experience on top of cluster virtualization.
Care must be taken to distinguish the internal product architecture, discussed here, from the ability to actually run two or more virtual clusters side-by-side:
Cockroach Labs would retain the exclusive right to define more than one virtual cluster side-by-side on a shared storage cluster, via the Serverless product offering.
In CockroachCloud Dedicated and in self-hosted deployments, applications will be able to use a single pre-defined virtual cluster, without the capability to define additional tenants.
Overview of run-time components
Summary table
| What's virtualized | New name for the virtualized logical concept | Previous terminology | New name for the physical infrastructure |
|---|---|---|---|
| The CockroachDB cluster service, as a whole | NEW: “Virtual cluster”, or alternatively “logical cluster” | “Cluster” | N/A: the underlying infrastructure is not visible to end-users any more. |
| Run-time state for a (virtual) cluster | “VC servers/pods” | “Servers/pods” | NEW: “Shared storage servers/pods” |
| On-disk state for a (virtual) cluster | NEW: “VC-specific data” or “virtual keyspace” | “CockroachDB data” | NEW: “Shared storage data” |

NEW: the SQL interface used to administer other virtual clusters = “system interface” (previously “system tenant”).
Beware of the difference between “shared storage cluster” (the deployed system) and “system interface” (the logical cluster an administrator connects to, to create additional virtual clusters).

| Role | Name | “Instance” | “Server” | “Pod” |
|---|---|---|---|---|
| Routes SQL clients to the right server | “SQL proxy” | “SQL proxy instance” | “SQL proxy server” | “SQL proxy pod” |
| Runs SQL queries | “SQL” or “SQL gateway” | “SQL instance” | “SQL server”, or “SQL-only server” to highlight that the server contains no KV instance | “SQL pod” (implies “SQL-only server”) |
| Runs KV queries | “KV components” (plural) | “KV instance” | “KV server”, but the term is inclusive of mixed servers; we don't yet support KV-only servers | N/A, we don't currently run KV-only servers |
| Stores data for multiple virtual clusters, 1 unit | | | NEW: “Shared storage server” | NEW: “Shared storage pod” |
| Runs both SQL and KV queries | | | NEW: “Mixed SQL/KV servers” | NEW: “Mixed SQL/KV pods” |
| Stores data for all virtual clusters, fleet of all servers | | | NEW: “Shared storage cluster” | NEW: “Shared storage cluster” |
We also use the word “node” to designate either a unix process or a Docker container, when the distinction does not matter.
This complete fleet of “all the things” is named a Serverless host cluster.
Architectural terms
SQL Proxy
Role:
“Instance”: a run-time realization of a data structure in the source code. Think: class vs object.
TCP/UDP ports are attached to instances.
“Server”: a unix process started from an executable file. Contains diverse instances.
CPU/memory/IOPS accounting commonly happens here.
“Pod”: a container, a kind of reduced virtual machine that can be managed by Kubernetes.
For example:
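As a minimal sketch, the instance-vs-server distinction can be pictured with the following Go code; the types are purely illustrative and are not taken from the actual CockroachDB source:

```go
package main

import "fmt"

// SQLInstance is an "instance": a run-time realization of a data
// structure in the source code (the "object" to the source's "class").
// TCP/UDP ports are attached at this level.
type SQLInstance struct {
	TenantName string
	Port       int
}

// Server is a "server": one unix process started from an executable
// file. It contains diverse instances, and CPU/memory/IOPS accounting
// commonly happens at this level.
type Server struct {
	Instances []*SQLInstance
}

func main() {
	// One process ("server") hosting two instances side-by-side.
	s := &Server{Instances: []*SQLInstance{
		{TenantName: "system", Port: 26257},
		{TenantName: "app", Port: 26258},
	}}
	for _, inst := range s.Instances {
		fmt.Printf("instance for %q listening on :%d\n", inst.TenantName, inst.Port)
	}
}
```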
We use the word “Node” when the distinction between “server” and “pod” does not matter.
SQL
Role (collective):
“KV instance”: Accepts and serves KV requests for SQL instances. This does exist.
“KV-only server”: This does not exist yet: we have not yet built the capability to run a process containing only a KV instance.
N × KV instances
This gives a total of 2N or 3N instances able to run services inside the same 'demo' process.
From the perspective of the users of 'cockroach demo', such a server process has two interfaces:
This could (hypothetically) be used to create additional virtual clusters inside the demo process.
There's currently some UX misdesign, in that the existence of two separate virtual clusters is not apparent to the user of cockroach demo. We know about this shortcoming and it should get fixed at some point.
We’ve called this the “Serverless Host Cluster”, often simplified to “host cluster”, and this includes:
Logical concepts
The essence of cluster virtualization is to introduce logical boundaries inside a shared architecture: for separate billing, for running client apps side-by-side, for avoiding interference, and so on. So we also need words to designate those things that have received logical boundaries.
These concepts exist on a different semantic level than the run-time “deployment” aspects covered above. Hence the need for a separate
vocabulary.
Virtual CockroachDB clusters
To the extent that CockroachDB is perceived to serve a “database product” to end-users, the new architecture creates a virtualization of this
product.
Datacenter hosting went from physical machines to virtual machines (VMs) running on a shared physical infrastructure.
Memory architectures have this same split between physical addressing (corresponding to hardware) and virtual addressing (multiple
logical address spaces using shared hardware, coordinated by MMUs).
Operating systems enable sharing physical processing units (cores) to present virtual processing units (threads) to software.
The architecture shares a physical cluster (a set of interconnected shared storage servers) to produce the illusion of many virtual clusters
for end-users.
We're going to call the owner of a virtual cluster and its adjacent data a tenant.
This “owner” abstraction exists beyond the CC Serverless infrastructure: when our self-hosted customers ask us to deploy multi-tenancy in their infrastructure, it's because they want to split ownership of a physical cluster between multiple sub-organizations.
A tenant really owns an adjacent constellation of data that is not shared with other tenants, including:
The tenant-specific keyspace, which defines the virtual CockroachDB cluster in KV; also called the virtual keyspace.
The tenant-specific log files.
The tenant-specific heap, profile and goroutine dumps.
The tenant-specific crash dumps.
The tenant-specific exported traces.
The tenant-specific debug zips.
The tenant-specific backups and exports.
The tenant-specific metrics.
The state of a virtual cluster is the collection of all the related tenant-specific data.
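As a sketch only, this per-tenant constellation could be summarized with a Go type; the type and field names below are hypothetical, chosen to mirror the list above, and do not come from the CockroachDB source:

```go
package tenant

// VirtualClusterState collects the tenant-specific data that, taken
// together, constitutes the state of one virtual cluster. Each field
// corresponds to one item in the list above.
type VirtualClusterState struct {
	VirtualKeyspace []byte   // the tenant-specific keyspace in KV
	LogFiles        []string // tenant-specific log files
	HeapProfiles    []string // heap, profile, and goroutine dumps
	CrashDumps      []string // tenant-specific crash dumps
	ExportedTraces  []string // tenant-specific exported traces
	DebugZips       []string // tenant-specific debug zips
	Backups         []string // tenant-specific backups and exports
	Metrics         []string // tenant-specific metrics
}
```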
In other words, our architecture (currently) implies that a SQL-only server corresponds to exactly one tenant, the one that owns the virtual
cluster served by that SQL server.
We are thus tempted to equate the phrases “tenant server” = “SQL-only server” = “virtual cluster server/service”.
However, consider that next to SQL nodes (servers and pods), a deployment would also run other pods that are specific to just one tenant;
for example, a Prometheus pod and a log collector.
We'll name the fleet of run-time nodes (servers and pods) that are serving just one tenant the tenant nodes (servers and pods). This includes the SQL-only servers but also the other tenant-specific services needed to serve a virtual cluster.
This was not the only possible choice; we could have chosen to design an API separate from SQL that exists “outside” of the virtual cluster APIs. But here we are.
So we need a word to designate that virtual cluster. To follow established terminology, we will call this the system interface.
Today, the term “system interface” largely overlaps with “shared storage cluster” because, implementation-wise, we have chosen to give
SQL semantics to the keyspace that does not use a VC prefix. However, this choice may be revisited in the future, such that we mandate a
VC prefix for all logical clusters including the system cluster. Should such plans materialize, the system interface would be supported by a
virtual cluster too. It is thus useful to be disciplined about distinguishing the term “system interface”, which designates a SQL
interface, and “shared storage cluster”, which designates the set of interconnected storage servers.
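A simplified sketch of this keyspace split follows; the real CockroachDB key encoding is more elaborate, and the helper functions below are hypothetical:

```go
package keyspace

import "encoding/binary"

// MakeVCKey places a key inside one virtual cluster's keyspace by
// adding a VC (tenant) prefix. This is a simplification of the real
// encoding, but it captures the idea of a "virtual keyspace".
func MakeVCKey(vcID uint64, key []byte) []byte {
	prefix := make([]byte, 8)
	binary.BigEndian.PutUint64(prefix, vcID)
	return append(prefix, key...)
}

// SystemKey returns the key unprefixed: today, the keyspace without a
// VC prefix carries the SQL semantics of the system interface, which
// is why "system interface" and "shared storage cluster" currently
// overlap so much.
func SystemKey(key []byte) []byte {
	return key
}
```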
This system interface and all its “own” data also has an owner, which in the context of CC is Cockroach Labs itself. The owner of the system
interface is the system tenant.
That's what our current “mixed SQL/KV servers” are about. They contain:
However, this is not the only way we can do this. In fact, we could also make a plan to enable running SQL instances for the system
interface in a separate SQL-only server.
Generally, we'll call any server that contains at least one SQL instance for the system interface, a system server. Our current shared
storage servers are also system servers; our future SQL-only servers with system cluster capability will be system servers too.
Our unit tests also run many SQL instances side-by-side, including multiple SQL instances that operate on system clusters; inside the
context of tests, these are system instances.
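As a sketch, a test might run a system instance next to a tenant SQL instance roughly like this; the helper names and signatures only approximate CockroachDB's test utilities and should be treated as assumptions, not verbatim API:

```go
package multitenant_test

import (
	"context"
	"testing"

	// Import paths follow the CockroachDB repo layout; the exact helper
	// signatures below are approximations.
	"github.com/cockroachdb/cockroach/pkg/base"
	"github.com/cockroachdb/cockroach/pkg/roachpb"
	"github.com/cockroachdb/cockroach/pkg/testutils/serverutils"
)

func TestSystemAndTenantInstancesSideBySide(t *testing.T) {
	// Start a server whose SQL instance operates on the system cluster:
	// a "system instance" in the vocabulary above.
	s, sysDB, _ := serverutils.StartServer(t, base.TestServerArgs{})
	defer s.Stopper().Stop(context.Background())

	// Start a second SQL instance, in the same test process, serving a
	// separate virtual cluster.
	_, tenantDB := serverutils.StartTenant(t, s, base.TestTenantArgs{
		TenantID: roachpb.MustMakeTenantID(10),
	})

	_ = sysDB    // SQL connection to the system interface
	_ = tenantDB // SQL connection to the virtual cluster
}
```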
At run-time:
The SQL proxy node(s) (server(s) and pod(s)), which routes SQL client apps to their own virtual cluster.
The shared storage/DB nodes (servers and pods).
The networked shared storage/DB cluster, as a fleet of nodes.
The run-time state of the system interface.
On disk:
The aggregate state of all virtual clusters stored on a single storage cluster.