Jun 3
Written By John Ryan
(6 minute read)
Although not intended as a Snowflake data warehouse tutorial, this article will explain what
Snowflake is, which platforms it supports, and the key aspects of this groundbreaking
technology.
It is possible to register and create an account within minutes. The account includes $400 of
free credit, which is enough to store a terabyte of data and run a small data warehouse for
nearly two weeks, on a system that will support a small team of developers.
Finally, in addition to scaling up for larger data volumes, it’s also possible to automatically
scale out to support massive numbers of users. The diagram below illustrates how the
Snowflake multi-cluster feature automatically scales out and then back in during the day, and
the user is only charged for the time the clusters are actually running.
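The billing effect of scaling out and back in can be sketched with a toy model. This is a simplified illustration, not Snowflake's actual scaling policy: the hourly load figures, the users_per_cluster capacity and the cluster limits are all hypothetical values chosen for the example.

```python
import math

def clusters_needed(concurrent_users: int, users_per_cluster: int,
                    min_clusters: int = 1, max_clusters: int = 4) -> int:
    """Clusters running for a given load, clamped to the configured range."""
    wanted = math.ceil(concurrent_users / users_per_cluster)
    return max(min_clusters, min(max_clusters, wanted))

def billed_cluster_hours(hourly_users: list[int], users_per_cluster: int,
                         min_clusters: int = 1, max_clusters: int = 4) -> int:
    """Total cluster-hours charged when clusters only run while needed."""
    return sum(
        clusters_needed(u, users_per_cluster, min_clusters, max_clusters)
        for u in hourly_users
    )

# Hypothetical concurrent-user counts, one entry per hour.
day = [5, 5, 40, 90, 120, 60, 10, 5]

# Assumed capacity of 30 concurrent users per cluster.
running = [clusters_needed(u, 30) for u in day]
print(running)                        # [1, 1, 2, 3, 4, 2, 1, 1]
print(billed_cluster_hours(day, 30))  # 15
print(4 * len(day))                   # 32 cluster-hours if all four ran all day
```

In this made-up example the elastic configuration is charged for 15 cluster-hours, against 32 if four clusters were left running for the whole period.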
Is Snowflake an MPP database?
MPP stands for Massively Parallel Processing, and is a database architecture successfully
deployed by Teradata and Netezza. Unlike traditional Symmetric Multi-Processing (SMP)
hardware which runs a number of CPUs in a single machine, the MPP architecture deploys a
cluster of independently running machines, with data distributed across the system. In
addition to the ability to handle massive data volumes, this means it supports a scale out
architecture, as additional nodes can be added to the cluster, although this can take from
hours to days to deploy.
EPP stands for Elastic Parallel Processing, and was pioneered by Snowflake Computing. This
uses a number of independently running MPP clusters connected to a shared data pool. This
architecture has the advantage that new clusters can be started within seconds, to elastically
grow or shrink resources as needed.
1. Cloud Service Layer: “The brains” of the operation. This provides connectivity to the
database and handles infrastructure, transaction management, SQL performance optimisation,
security and metadata.
The layers of the architecture work transparently to service end user SQL queries, although it
is possible to start and suspend virtual warehouses manually.
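As a minimal sketch of the manual option, the helper below builds the ALTER WAREHOUSE statements used to suspend or resume a virtual warehouse. The function name and the DEV_WH warehouse are hypothetical, and the sketch only produces the SQL text; actually running it would need a Snowflake session, which is not shown here.

```python
import re

# Conservative check for an unquoted identifier, so the name can be
# placed in the statement without quoting.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")

def warehouse_statement(name: str, action: str) -> str:
    """Return the SQL text to suspend or resume a virtual warehouse."""
    action = action.upper()
    if action not in ("SUSPEND", "RESUME"):
        raise ValueError("action must be 'suspend' or 'resume'")
    if not _IDENTIFIER.match(name):
        raise ValueError(f"not a simple warehouse identifier: {name!r}")
    return f"ALTER WAREHOUSE {name} {action}"

# DEV_WH is a made-up warehouse name for illustration.
print(warehouse_statement("DEV_WH", "suspend"))  # ALTER WAREHOUSE DEV_WH SUSPEND
print(warehouse_statement("DEV_WH", "resume"))   # ALTER WAREHOUSE DEV_WH RESUME
```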
Storage is charged separately as a pass-through cost from the underlying provider, and on
AWS works out at around $23 per terabyte per month. This means it’s possible to store a 10
terabyte data warehouse for around $230 per month. In reality, as Snowflake applies
columnar compression to the data, it’s likely that storage will work out much cheaper on
Snowflake than (for example) S3.
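The storage arithmetic above is easy to reproduce. The sketch below uses the $23 per terabyte per month figure quoted for AWS. The compression_ratio parameter is an assumption added for illustration, since the actual ratio depends entirely on the data.

```python
# Price quoted in the article for AWS: about $23 per terabyte per month.
PRICE_PER_TB_MONTH_USD = 23.0

def monthly_storage_cost(raw_tb: float, compression_ratio: float = 1.0) -> float:
    """Estimated monthly storage cost in USD.

    compression_ratio is raw size divided by compressed size (hypothetical
    input); 1.0 means the data is stored uncompressed.
    """
    if raw_tb < 0 or compression_ratio <= 0:
        raise ValueError("raw_tb must be >= 0 and compression_ratio > 0")
    return raw_tb / compression_ratio * PRICE_PER_TB_MONTH_USD

print(monthly_storage_cost(10))       # 230.0
print(monthly_storage_cost(10, 2.0))  # 115.0, assuming 2:1 compression
```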
Unlike legacy data warehouses, Snowflake supports both structured and semi-structured data
including JSON, Avro and Parquet, and these can be directly queried using SQL. Unlike
Hadoop, Snowflake scales compute and storage resources independently, and is therefore a
far more cost-effective platform for a data lake.
In his excellent article, Tripp Smith explains the benefits of the EPP Snowflake architecture,
which can deliver savings of up to 300:1 on storage compared to Hadoop or MPP platforms.
I was lucky enough to attend a meeting with the founders, where the French-born
founder Thierry Cruanes explained in a full French accent how difficult it was to pronounce
the name of his previous company, Oracle. At least now, he joked, people could understand
“Snowflake”.
The advantages of cloud-based data warehousing have been extensively reviewed. The main
advantages of Snowflake over traditional on-premise solutions are:
In terms of the disadvantages, there is not much to write about. Customers on legacy Oracle,
Netezza, Teradata or IBM platforms will need to migrate to Snowflake, and this should be
considered as part of an overall cloud strategy. Otherwise, there are no significant drawbacks
for a data warehouse platform.
Disclaimer: The opinions expressed on this site are entirely my own, and will not necessarily
reflect those of my employer.