
A&I Data Services Big Data Solution

Snowflake: Data Warehouse for the Cloud


A data warehouse is a central repository that collects data from various sources and
applications and helps businesses derive meaningful insights. Conventional data warehouse
systems struggle to scale with the variety and volume of data generated today, which makes rapid
analytics and insight difficult to achieve at scale.
Snowflake is a data warehouse designed for the cloud. It is offered as Software-as-a-Service
(SaaS) and supports standard ANSI SQL. Its multi-cluster, shared-data architecture makes it faster,
easier to use, and more flexible than traditional data warehouse offerings. Snowflake supports both
structured and semi-structured data, such as relational tables, CSV files, JSON, Avro, and Parquet.
Unlike traditional systems, there is no hardware or software for users to install or manage.
Compute, storage, and cloud services are separated so that each can scale independently.
Database Storage: When data is loaded, Snowflake reorganizes it into its internal optimized,
compressed, encrypted, columnar format and stores it as objects in the cloud provider's object
storage, such as AWS S3 or Azure Blob Storage.
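To illustrate why a columnar layout helps, the sketch below pivots a few row records into one list per column and compresses each column with run-length encoding. It is illustrative only: Snowflake's internal storage format is not public and uses its own compression, so run-length encoding is a stand-in, and the sample rows and the helpers `to_columns` and `run_length_encode` are hypothetical.

```python
from itertools import groupby

# Hypothetical sample rows; a row layout stores each record together.
ROWS = [
    {"region": "EU", "status": "shipped", "qty": 3},
    {"region": "EU", "status": "shipped", "qty": 1},
    {"region": "EU", "status": "shipped", "qty": 2},
    {"region": "US", "status": "shipped", "qty": 5},
    {"region": "US", "status": "pending", "qty": 1},
]


def to_columns(rows):
    """Pivot row records into one list of values per column (columnar layout)."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Compress a column into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


columns = to_columns(ROWS)
print(run_length_encode(columns["region"]))  # [('EU', 3), ('US', 2)]
print(run_length_encode(columns["status"]))  # [('shipped', 4), ('pending', 1)]
print(sum(columns["qty"]))                   # 12: the aggregate touches only the qty column
```

Storing values of one column contiguously means repeated values sit next to each other, which is what makes simple schemes like run-length encoding effective, and a query that aggregates one column does not need to read the others.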
Query Processing: Snowflake processes queries using virtual warehouses. Virtual warehouses
are independent compute clusters that share no resources with one another, so the workload on one
warehouse does not affect the performance of another. A warehouse provides the resources, such
as CPU, memory, and temporary storage, required to run queries, perform DML operations, and
load data. Warehouses come in different sizes (T-shirt sizes) based on the number of servers per
cluster, and each size consumes a pre-defined number of credits per hour, billed per second (with a
60-second minimum each time a warehouse starts). Multi-cluster virtual warehouses can be
enabled and auto-scaled by specifying the minimum and maximum number of clusters to use.
There are two scaling policies:
1. Standard (default) scaling policy: This policy favors performance over conserving credits. It
starts an additional cluster as soon as queries begin to queue, and shuts a cluster down after 2 to 3
consecutive checks, performed at 1-minute intervals, show that its load can be handled by the
remaining clusters.
2. Economy scaling policy: This policy conserves credits by keeping the running clusters fully
loaded, at some cost to performance. A new cluster is started only if the query load is estimated
to be enough to keep it busy for at least 6 minutes, and a cluster is shut down after 5 to 6
consecutive checks at 1-minute intervals.
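To make the trade-off between the two scaling policies concrete, here is a minimal simulation, assuming one check per minute and a query load measured in "clusters' worth" of work. The thresholds (start on any queuing vs. only when at least 6 busy minutes are projected; shut down after 2 vs. 5 idle checks) come from the policy descriptions above, taking the lower end of each range. The projected busy time uses future demand as a stand-in for Snowflake's internal estimate, and the shutdown test is simplified to "load is below the number of running clusters". The credit rates follow Snowflake's published rates for standard warehouses (X-Small = 1 credit/hour, doubling with each size). `ScalingPolicy`, `simulate`, and `credits_used` are hypothetical names, not a Snowflake API.

```python
from dataclasses import dataclass

# Credits per hour for one cluster of each standard warehouse size.
CREDITS_PER_HOUR = {
    "X-Small": 1, "Small": 2, "Medium": 4, "Large": 8,
    "X-Large": 16, "2X-Large": 32, "3X-Large": 64, "4X-Large": 128,
}


@dataclass(frozen=True)
class ScalingPolicy:
    name: str
    min_busy_minutes: int  # projected busy time required before adding a cluster
    idle_checks: int       # consecutive 1-minute checks before removing a cluster


# Simplified, hypothetical stand-ins for the two documented policies.
STANDARD = ScalingPolicy("STANDARD", min_busy_minutes=1, idle_checks=2)
ECONOMY = ScalingPolicy("ECONOMY", min_busy_minutes=6, idle_checks=5)


def minutes_over_capacity(demand, t, capacity):
    """Minutes, starting at t, during which demand stays above capacity.
    Looks ahead at future demand as a stand-in for Snowflake's own estimate."""
    n = 0
    while t + n < len(demand) and demand[t + n] > capacity:
        n += 1
    return n


def simulate(policy, demand, min_clusters=1, max_clusters=3):
    """Return the number of running clusters after each 1-minute check."""
    running, idle, history = min_clusters, 0, []
    for t, load in enumerate(demand):
        if (load > running and running < max_clusters
                and minutes_over_capacity(demand, t, running) >= policy.min_busy_minutes):
            running += 1  # queries are queuing and the policy allows another cluster
            idle = 0
        elif load < running and running > min_clusters:
            idle += 1     # another consecutive check with spare capacity
            if idle >= policy.idle_checks:
                running -= 1
                idle = 0
        else:
            idle = 0
        history.append(running)
    return history


def credits_used(history, size):
    """Credits consumed when each history entry represents one minute of running clusters."""
    return sum(history) * CREDITS_PER_HOUR[size] / 60


burst = [1, 2, 2, 1, 1, 1, 1, 1]  # a short two-minute spike
print(simulate(STANDARD, burst))                                 # [1, 2, 2, 2, 1, 1, 1, 1]
print(simulate(ECONOMY, burst))                                  # [1, 1, 1, 1, 1, 1, 1, 1]
print(round(credits_used(simulate(STANDARD, burst), "Medium"), 3))  # 0.733
print(round(credits_used(simulate(ECONOMY, burst), "Medium"), 3))   # 0.533
```

For a short spike, the Standard policy adds a cluster immediately and keeps it for two idle checks, while the Economy policy lets the spike queue on the existing cluster and spends fewer credits; a sustained load of 6 minutes or more would make the Economy policy scale out as well.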
Cloud Services: The cloud services layer sits above storage and compute and coordinates
authentication, metadata management, access control, and query parsing and optimization.
Snowflake supports multi-factor authentication (MFA) and federated authentication with single
sign-on (SSO). All communication is protected with TLS (Transport Layer Security). The Time
Travel feature uses the configured data retention period to let users query or restore data as it
existed earlier, and the Fail-safe feature protects against data loss during catastrophic events.


Snowflake is hosted on AWS and Azure (and, more recently, on Google Cloud Platform).
Snowflake encrypts data at rest and in transit by default and is compatible with a wide range of BI
and ETL tools. Structured and semi-structured data can be loaded into Snowflake and queried
using SQL. Query results are persisted in a result cache, so a repeated identical query can return
the cached result instead of being re-executed, which reduces execution time.
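The result-cache behaviour can be modelled with a small sketch. The reuse rules below are assumptions drawn from Snowflake's documentation rather than from this document: a persisted result is reused only when the query text is identical and the underlying table data has not changed, it expires 24 hours after its last use, and each reuse extends it up to a maximum of 31 days from the first execution. `ResultCache` and its `run` method are hypothetical; table changes are modelled as a version tuple, and the clock is injectable so the expiry logic can be tested.

```python
import time


class ResultCache:
    """Toy model of a persisted query-result cache (hypothetical, not a Snowflake API)."""

    TTL_SECONDS = 24 * 60 * 60           # reuse window, reset on each cache hit
    MAX_AGE_SECONDS = 31 * 24 * 60 * 60  # hard cap measured from the first execution

    def __init__(self, clock=time.time):
        self._clock = clock
        self._entries = {}  # (sql, table_versions) -> (result, created, last_used)

    def run(self, sql, table_versions, execute):
        """Return (result, was_cached). Calls execute() only on a miss or expiry."""
        key = (sql, table_versions)  # changed table data produces a different key
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None:
            result, created, last_used = entry
            if now - last_used <= self.TTL_SECONDS and now - created <= self.MAX_AGE_SECONDS:
                self._entries[key] = (result, created, now)  # hit: extend the window
                return result, True
            del self._entries[key]  # expired entry
        result = execute()
        self._entries[key] = (result, now, now)
        return result, False


# Usage with a fake clock and a hypothetical query.
now = [0.0]
cache = ResultCache(clock=lambda: now[0])
calls = []


def execute():
    calls.append(1)
    return [("EU", 42)]


sql = "SELECT region, SUM(qty) FROM orders GROUP BY region"
print(cache.run(sql, (1,), execute))  # ([('EU', 42)], False): executed
print(cache.run(sql, (1,), execute))  # ([('EU', 42)], True): served from cache
print(cache.run(sql, (2,), execute))  # ([('EU', 42)], False): table changed, re-executed
```

Keying on both the query text and the table versions captures the two conditions that must hold for a result to be reused; the fake clock lets the 24-hour expiry be checked without waiting.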
The Time Travel retention period is 1 day by default and can be extended up to 90 days (on
Enterprise Edition or higher) at additional storage cost. With Time Travel, dropped tables can be
restored. Fail-safe retains data for a further 7 days after the Time Travel retention period ends;
data in Fail-safe can be recovered only by Snowflake. Snowflake covers the core needs of a data
warehouse solution, and it can also natively ingest unstructured data. Service management and
support are handled entirely by Snowflake. A main strength of Snowflake is virtually unlimited
user concurrency, since additional warehouses and clusters can be added to absorb load. High
concurrency, zero-copy cloning, and Time Travel are among Snowflake's key features.
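The interaction between the Time Travel retention period and Fail-safe can be expressed as a small function that, given how many days ago data was changed or dropped, reports which recovery path still applies. The retention limits (1 day by default, up to 90 days) and the 7-day Fail-safe period come from the text above. The sketch assumes a permanent table (transient and temporary tables have no Fail-safe period) and treats each window as half-open; `recovery_option` is a hypothetical helper, not a Snowflake API.

```python
DEFAULT_RETENTION_DAYS = 1  # Time Travel default
MAX_RETENTION_DAYS = 90     # maximum extended retention
FAIL_SAFE_DAYS = 7          # fixed period after Time Travel ends (permanent tables)


def recovery_option(days_since_change, retention_days=DEFAULT_RETENTION_DAYS):
    """Classify how data changed or dropped `days_since_change` days ago can be recovered."""
    if not 0 <= retention_days <= MAX_RETENTION_DAYS:
        raise ValueError("retention_days must be between 0 and 90")
    if days_since_change < 0:
        raise ValueError("days_since_change cannot be negative")
    if days_since_change < retention_days:
        return "time travel"  # self-service: query past data or restore dropped tables
    if days_since_change < retention_days + FAIL_SAFE_DAYS:
        return "fail-safe"    # recoverable only by Snowflake
    return "purged"           # no longer recoverable


print(recovery_option(0.5))                    # time travel
print(recovery_option(3))                      # fail-safe
print(recovery_option(10))                     # purged
print(recovery_option(30, retention_days=90))  # time travel
```

With the default 1-day retention, data is self-service recoverable for one day, then sits in Fail-safe for another seven, so the total protection window for a permanent table is `retention_days + 7` days.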

Snowflake_POC_9Aug18.pptx
