
What will you learn?

- Getting Started (AWS, Azure, GCP)
- Snowflake Architecture
- Loading Data & Copy Options
- Load from AWS, Azure, and GCP
- Snowpipe
- Unstructured Data
- Performance Optimization
- Table Types
- Time Travel & Fail Safe
- Zero-Copy Cloning
- Streams & Scheduling Tasks
- Materialized Views
- Data Sharing
- Data Sampling
- Data Masking
- Access Management
- Partner Connect
- Visualizations
- Best Practices
Contents

- Getting Started
- Snowflake Architecture
- Multi-Clustering
- Data Warehousing
- Cloud Computing
- Snowflake Editions
- Snowflake Pricing
- Snowflake Roles
- Loading Data
- Performance Optimization
- Snowpipe
- Fail Safe and Time Travel
- Table Types
- Zero Copy Cloning
- Swapping
- Data Sharing
- Data Sampling
- Tasks & Streams
- Materialized Views
- Data Masking
- Access Control
- Snowflake & Other Tools
- Best Practices

Getting Started
Snowflake Architecture
Snowflake - Architecture

- SaaS offering on the cloud

- No maintenance & administrative overhead costs

- Snowflake separates its architecture into the following 3 layers:

a. Cloud Services
b. Query Processing (Compute)
c. Database Storage

- Snowflake architecture is a hybrid of the "shared-disk" and "shared-nothing" database architectures.

Shared Disk Architecture

• Scalability is limited.
• Hard to maintain data consistency across the cluster.
• Bottleneck of communication with the shared disk.
Snowflake - Architecture & Layers
What is Snowflake?

Snowflake is a cloud data warehouse platform, offered as Software as a Service (SaaS).

This advanced platform is used for data warehousing, data engineering, data application development, data science, data analytics, data lakes, and for securely sharing and consuming shared data.

Snowflake is highly scalable and supports a near-unlimited number of concurrent workloads.

The Snowflake architecture is a hybrid model of shared-disk and shared-nothing database architecture.
Shared-Disk Architecture

A distributed computing architecture used in traditional databases; it consists of a single storage layer accessible by all cluster nodes.

In this architecture, the nodes share the same disk devices but have their own private memory and CPU. Each node communicates with the central data storage layer to source data and process it.
Shared Nothing Architecture

It is quite the opposite of shared-disk architecture and consists of distributed cluster nodes, each with its own CPU, disk storage, and memory. The advantage here is that the data can be divided and stored across the cluster nodes.

- It scales processing and compute together.
- It moves data storage close to compute.
- Data distributed across the cluster requires shuffling between nodes.
- Performance is heavily dependent on how data is distributed across the nodes in the system.
- Compute can't be sized independently of storage.

Shared nothing architecture is built using a group of independent servers, with each server taking a predefined workload. If, for example, a number of servers are in the cluster, the total workload is divided by the number of servers, and each server caters to a specific workload.

The biggest disadvantage of shared nothing architecture is that it requires careful application partitioning, and no dynamic addition of nodes is possible. Adding a node would require complete redeployment, so it is not a scalable solution.
Snowflake Architecture

CLOUD SERVICES - Brain of the system: managing infrastructure, access control, security, optimizer, metadata, etc.

QUERY PROCESSING - Muscle of the system: virtual warehouses perform MPP (Massively Parallel Processing).

STORAGE - Hybrid columnar storage, saved in blobs.
Snowflake Architecture

Snowflake is natively built for the cloud and comes with a unique multi-cluster shared data architecture. This advanced architecture has been designed to deliver the performance, elasticity, scalability, and concurrency demanded by modern organizations.

Snowflake architecture is a combination of shared-disk (SD) and shared-nothing (SN) database architecture.

Like shared-disk architectures, Snowflake uses a central data repository for persisted data and makes the data accessible from all nodes in the platform. Like shared-nothing architectures, Snowflake executes queries using MPP (massively parallel processing) compute clusters, where each node in the cluster stores a portion of the entire data set locally.

The main reason behind following the hybrid architecture is to offer data management simplicity to its customers using shared-disk architecture, and high performance and scalability using shared-nothing architecture.

Snowflake architecture consists of three key layers: the Cloud Services layer, the Query Processing layer, and the Database Storage layer. Let's understand how each layer works.
Database Storage Layer:

Each time data is loaded into Snowflake, Snowflake organizes the data into an internal optimized, columnar, and compressed format, and then stores this data in cloud storage.

Snowflake takes care of multiple aspects of data storage, including file size, compression, data structure, statistics, metadata, and much more.

The objects stored in Snowflake are not directly accessible or visible to customers. Users can only access the data stored in Snowflake by running SQL query operations on it.
Query Processing Layer:

This is the layer where query execution is performed. Snowflake uses virtual warehouses to process queries. Every virtual warehouse is an MPP (massively parallel processing) compute cluster and consists of multiple compute nodes allotted by Snowflake from a cloud provider.

In Snowflake, each virtual warehouse is independent and does not rely on or share resources with other virtual warehouses. This enables virtual warehouses to scale independently without affecting the performance of other warehouses.
Cloud Services:

The cloud services layer contains a group of services that coordinate activities on the Snowflake cloud warehouse platform. These services integrate the different components of Snowflake to execute user requests, from login to query dispatch.

This layer handles all other services in Snowflake, including sessions, authentication, SQL compilation, encryption, etc.
Web Interface (Classic Web Interface)

Once you have logged into the Snowflake web-based graphical interface, you can create and manage all Snowflake objects: virtual warehouses, databases, and all database objects.

You can also use the interface to load limited amounts of data into tables, execute ad hoc queries and other DML/DDL operations, and view past queries.

The interface is also where you can change your Snowflake user password and specify other preferences, such as your email address.
Additionally, if you have the required administrator roles, you can perform administrative tasks in the interface, such as creating and managing users. For more information about the administrative tasks you can perform, see the Snowflake documentation.

The main areas of the interface:
• Databases Page
• Warehouses Page
• Worksheet Page
• History Page
• Help Menu
• User Menu
Databases Page

Lists the databases you have created or have privileges to access.

Tasks:
• Create, clone, or drop a database.
• Transfer ownership of a database to a different role.
Warehouses Page

Tasks you can perform on this page:

 Create or drop a warehouse.
 Suspend or resume a warehouse.
 Configure a warehouse.
 Transfer ownership of a warehouse to a different role.
Worksheet Page

Provides a powerful interface for entering and submitting SQL queries, performing DDL and DML operations, and viewing the results side-by-side as your queries/operations complete.

Tasks:
• Run queries and other DDL/DML operations in a worksheet, or load SQL script files.
• Open concurrent worksheets, each with its own separate session.
• Save and reopen worksheets.
• Log out of Snowflake or switch roles within a worksheet, as well as refresh your browser, without losing your work:
  • If you log out of Snowflake, any active queries stop running.
  • If you're in the middle of running queries when you refresh, they will resume running when the refresh is completed.
• Resize the current warehouse to increase or decrease the compute resources utilized for executing your queries and DML statements.
• Export the result for a selected statement (if the result is still available).
History Page

This page allows you to view and drill into the details of all queries executed in the last 14 days. It displays a historical listing of queries, including queries executed from SnowSQL or other SQL clients.

Tasks:
• Filter queries displayed on the page.
• Scroll through the list of displayed queries. The list includes (up to) 100 queries; at the bottom of the list, if more queries are available, you can continue searching.
• Abort a query that has not completed yet.
• View the details for a query, including its result. Query results are available for a 24-hour period; this limit is not adjustable.
• Change the displayed columns, such as status, SQL text, ID, warehouse, and start and end time, by clicking any of the column headers.
Help Menu

• To access this menu, click the Help icon in the upper right.
• From the dropdown menu, choose one of the following actions:
  • View the Snowflake Documentation in a new browser tab/window.
  • Visit the Support Portal in a new browser tab/window.
  • Download the Snowflake clients by opening a dialog box where you can:
    • Download the Snowflake CLI client (SnowSQL) and ODBC driver.
    • View download info for the Snowflake JDBC driver, Python components, Node.js driver, and Snowflake Connector for Spark.
  • Show the help panel with context-sensitive help for the current page.
User Menu

• You can change your password or security role for the session (if you have multiple roles assigned to you). For more information about security roles and how they influence the objects you can see in the interface and the tasks you can perform, see Access Control in Snowflake.
• You can also use this dropdown to:
  • Switch languages for the user session (if additional languages have been enabled for your account).
  • Set your email address for notifications (if you are an account administrator).
  • Log out (close your current session and exit the classic web interface).
  • Determine the organization, edition, cloud platform, and region of the Snowflake account you are logged into.
Virtual Warehouse Sizes

Size  Credits/hour
XS    1
S     2
M     4
L     8
XL    16
4XL   128

(Credits per hour double with each size step.)
Multi-Clustering

As more and more queries hit a warehouse, queries start to queue. With auto-scaling, Snowflake starts additional clusters of the same size to handle the load.

Auto-Scaling: when should an additional cluster be started?
Snowflake supports two ways to scale warehouses (see the sketch below):

 Scale up by resizing a warehouse.
 Scale out by adding clusters to a multi-cluster warehouse (requires Snowflake Enterprise Edition or higher).

In other words:

 Horizontal scaling (scaling out) means that you scale by adding more machines to your pool of resources.
 Vertical scaling (scaling up) means that you scale by adding more power (CPU, RAM) to an existing machine.
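A minimal sketch of both approaches, assuming a hypothetical warehouse named MY_WH:

-- Scale up: resize the warehouse for more complex queries
ALTER WAREHOUSE MY_WH SET WAREHOUSE_SIZE = 'LARGE';

-- Scale out: turn MY_WH into a multi-cluster warehouse for more concurrent queries
-- (requires Enterprise Edition or higher)
ALTER WAREHOUSE MY_WH SET MIN_CLUSTER_COUNT = 1 MAX_CLUSTER_COUNT = 3;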
Scaling policy

 Standard: favors starting additional warehouses.
 Economy: favors conserving credits rather than starting additional warehouses.
Scaling policy in Snowflake

To help control the credits consumed by a multi-cluster warehouse running in Auto-scale mode, Snowflake provides scaling policies, which are used to determine when to start or shut down a cluster.

The scaling policy for a multi-cluster warehouse only applies if it is running in Auto-scale mode.
Scaling policy

Standard (default)
 Description: prevents/minimizes queuing by favoring starting additional clusters over conserving credits.
 Cluster starts: immediately when either a query is queued or the system detects that there are more queries than can be executed by the currently available clusters.
 Cluster shuts down: after 2 to 3 consecutive successful checks (performed at 1-minute intervals), which determine whether the load on the least-loaded cluster could be redistributed to the other clusters.

Economy
 Description: conserves credits by favoring keeping running clusters fully loaded rather than starting additional clusters. May result in queries being queued and taking longer to complete.
 Cluster starts: only if the system estimates there's enough query load to keep the cluster busy for at least 6 minutes.
 Cluster shuts down: after 5 to 6 consecutive successful checks.
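The policy can be set per warehouse; a hedged sketch (warehouse name hypothetical):

-- Favor conserving credits over minimizing queuing
ALTER WAREHOUSE MY_WH SET SCALING_POLICY = 'ECONOMY';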
Data Warehousing
What is a data warehouse?

A data warehouse is a database that is used for reporting and data analysis. Traditional data warehouses consolidate data from different sources (e.g., HR data, sales data) for this purpose.

ETL = Extract, Transform & Load: data is extracted from the sources, transformed, and loaded into the data warehouse.

Different layers:

- Staging area: raw data landed from the data sources (HR data, sales data, etc.)
- Data transformation / data integration
- Production / access layer: consumed by reporting, data science, and other apps
Cloud Computing
Cloud Computing

Why Cloud Computing?

In your own data center you have to manage everything yourself:

• Infrastructure
• Security
• Electricity
• Software/hardware upgrades

With Software-as-a-Service, this is managed for you.

Responsibilities in the Software-as-a-Service model:

- Customer (Application): creating databases, tables, etc.
- Snowflake (Software & Data, Software-as-a-Service): managing data storage, virtual warehouses, upgrades, metadata, etc.
- Cloud provider (AWS, Azure, GCP): operating system, virtual machines, physical servers, physical storage.
Snowflake Editions

- Standard: introductory level
- Enterprise: additional features for the needs of large-scale enterprises
- Business Critical: even higher levels of data protection for organizations with extremely sensitive data
- Virtual Private Snowflake (VPS): highest level of security
Snowflake Editions

Standard
 Complete DWH
 Automatic data encryption
 Time travel up to 1 day
 Disaster recovery for 7 days beyond time travel
 Secure data share
 Premier support 24/7

Enterprise
 All Standard features
 Multi-cluster warehouses
 Time travel up to 90 days
 Materialized views
 Search Optimization
 Column-level security

Business Critical
 All Enterprise features
 Additional security features such as data encryption everywhere
 Extended support
 Database failover and disaster recovery

Virtual Private Snowflake (VPS)
 All Business Critical features
 Dedicated virtual servers and a completely separate Snowflake environment
Snowflake Pricing

Compute
 Charged for active warehouses per hour
 Depending on the size of the warehouse
 Billed by second (minimum of 1 min)
 Charged in Snowflake credits; consumed credits are converted to $/€

Storage
 Monthly storage fees
 Based on average storage used per month
 Cost calculated after compression
 Depends on the cloud provider
Snowflake Pricing

Price per credit (Region: EU (Frankfurt), Platform: AWS):

 Standard: $2.70 / credit
 Enterprise: $4.00 / credit
 Business Critical: $5.40 / credit
 Virtual Private Snowflake: contact Snowflake
Snowflake Pricing

On Demand Storage ($40/TB; pay only for what you use):
 Scenario 1: 100 GB of storage used → 0.1 TB × $40 = $4
 Scenario 2: 800 GB of storage used → 0.8 TB × $40 = $32

Capacity Storage ($23/TB; we think we need 1 TB, paid up front):
 Scenario 1: 100 GB of storage used → 1 TB × $23 = $23
 Scenario 2: 800 GB of storage used → 1 TB × $23 = $23

Region: US East (Northern Virginia), Platform: AWS
Snowflake Pricing

On Demand vs. Capacity Storage:

 Start with On Demand storage.
 Once you are sure about your usage, switch to Capacity storage.
Snowflake Roles

Key Elements

Some key elements of access control in Snowflake are as follows:

- Securable object: an entity to which privileges can be granted. Access will be refused unless a grant by the admin allows it.
- Role: an entity to which privileges can be assigned. Note that roles can also be granted to other roles, forming a hierarchy.
- Privilege: a defined level of access to an object. To manage the granularity of access allowed, multiple separate privileges might be employed.
- User: an identity recognized by Snowflake, whether it's affiliated with a person or a program.
Snowflake Roles

Role hierarchy:

ACCOUNTADMIN
  SECURITYADMIN
    USERADMIN
  SYSADMIN
    Custom Role 1
    Custom Role 2
      Custom Role 3
PUBLIC (bottom of the hierarchy)
Snowflake Roles

ACCOUNTADMIN
 Top-level role in the system
 SYSADMIN and SECURITYADMIN are granted to ACCOUNTADMIN
 Should be granted only to a limited number of users

SECURITYADMIN
 The USERADMIN role is granted to SECURITYADMIN
 Can manage users and roles
 Can manage any object grant globally

SYSADMIN
 Can create warehouses, databases, schemas (and more objects)
 Recommended that all custom roles are assigned to SYSADMIN

USERADMIN
 Dedicated to user and role management only
 Can create users and roles

PUBLIC
 Automatically granted to every user
 Can create its own objects like every other role (available to every other user/role)

Custom role creation and grant commands:


CREATE ROLE IF NOT EXISTS TEST_ROLE;

GRANT ALL PRIVILEGES ON DATABASE DEMO_DB TO ROLE TEST_ROLE;
GRANT ALL PRIVILEGES ON SCHEMA EMPLOYEE TO ROLE TEST_ROLE;
GRANT ALL PRIVILEGES ON TABLE A TO ROLE TEST_ROLE;
GRANT ALL PRIVILEGES ON TABLE EMPLOYEE.A TO ROLE TEST_ROLE;

USE ROLE TEST_ROLE;

GRANT ROLE TEST_ROLE TO USER VEERENDRA;

CREATE OR REPLACE USER TEST_USER PASSWORD = 'ABC123' DEFAULT_ROLE = 'PUBLIC' MUST_CHANGE_PASSWORD = TRUE;

// GRANT PRIVILEGES TO ROLE
-- Grants one or more access privileges on a securable object to a role.
-- The privileges that can be granted are object-specific and are grouped into the following categories:
--   Global privileges
--   Privileges for account objects (resource monitors, virtual warehouses, and databases)
--   Privileges for schemas
--   Privileges for schema objects (tables, views, stages, file formats, UDFs, and sequences)

// GRANT ROLE
-- Assigns a role to a user or another role:
--   Granting a role to another role creates a "parent-child" relationship between the roles (also referred to as a role hierarchy).
--   Granting a role to a user enables the user to perform all operations allowed by the role (through the access privileges granted to the role).

SHOW GRANTS;
Loading Data

BULK LOADING
 Most frequent method
 Uses warehouses
 Loading from stages
 COPY command
 Transformations possible

CONTINUOUS LOADING
 Designed to load small volumes of data
 Automatically, once files are added to stages
 Latest results for analysis
 Snowpipe (serverless feature)
Understanding Stages

 Not to be confused with the staging area of a data warehouse
 A stage is the location of data files from which data can be loaded

Two types: External Stage and Internal Stage.
Understanding Stages

External Stage
 External cloud provider:
  S3
  Google Cloud Platform
  Microsoft Azure
 Database object created in a schema
 CREATE STAGE (URL, access settings)
 Note: additional costs may apply if region/platform differs

Internal Stage
 Local storage maintained by Snowflake
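A minimal sketch of creating an external stage (database, schema, bucket URL, and credentials are placeholders):

-- Database object that points to an S3 bucket
CREATE OR REPLACE STAGE MANAGE_DB.EXTERNAL_STAGES.MY_S3_STAGE
  URL = 's3://my-bucket/csv/'
  CREDENTIALS = (AWS_KEY_ID = '<key>' AWS_SECRET_KEY = '<secret>');

-- List the files in the stage
LIST @MANAGE_DB.EXTERNAL_STAGES.MY_S3_STAGE;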
Copy Options

COPY INTO <table_name>
FROM externalStage
FILES = ( '<file_name>', '<file_name2>' )
FILE_FORMAT = <file_format_name>
[ copyOptions ]
Copy Options

COPY INTO <table_name>
FROM externalStage
FILES = ( '<file_name>', '<file_name2>' )
FILE_FORMAT = <file_format_name>
ON_ERROR = CONTINUE
Copy Options

COPY INTO <table_name>
FROM externalStage
FILES = ( '<file_name>', '<file_name2>' )
FILE_FORMAT = <file_format_name>
VALIDATION_MODE = RETURN_n_ROWS | RETURN_ERRORS

 Validates the data files instead of loading them
 RETURN_N_ROWS (e.g. RETURN_10_ROWS): validates and returns the specified number of rows; fails at the first error encountered
 RETURN_ERRORS: returns all errors in the COPY command
Copy Options

COPY INTO <table_name>
FROM externalStage
FILES = ( '<file_name>', '<file_name2>' )
FILE_FORMAT = <file_format_name>
SIZE_LIMIT = num

 Specifies the maximum size (in bytes) of data loaded in that command (at least one file is always loaded)
 When the threshold is exceeded, the COPY operation stops loading
 The threshold is checked for each file
 DEFAULT: null (no size limit)
Copy Options

COPY INTO <table_name>
FROM externalStage
FILES = ( '<file_name>', '<file_name2>' )
FILE_FORMAT = <file_format_name>
RETURN_FAILED_ONLY = TRUE | FALSE

 Specifies whether to return only files that have failed to load in the statement result
 DEFAULT: FALSE
Copy Options

COPY INTO <table_name>
FROM externalStage
FILES = ( '<file_name>', '<file_name2>' )
FILE_FORMAT = <file_format_name>
TRUNCATECOLUMNS = TRUE | FALSE

 Specifies whether to truncate text strings that exceed the target column length
 TRUE = strings are automatically truncated to the target column length
 FALSE = COPY produces an error if a loaded string exceeds the target column length
 DEFAULT: FALSE
Copy Options

COPY INTO <table_name>
FROM externalStage
FILES = ( '<file_name>', '<file_name2>' )
FILE_FORMAT = <file_format_name>
FORCE = TRUE | FALSE

 Specifies to load all files, regardless of whether they've been loaded previously and have not changed since they were loaded
 Note that this option reloads files, potentially duplicating data in a table
Copy Options

COPY INTO <table_name>
FROM externalStage
FILES = ( '<file_name>', '<file_name2>' )
FILE_FORMAT = <file_format_name>
PURGE = TRUE | FALSE

 Specifies whether to remove the data files from the stage automatically after the data is loaded successfully
 DEFAULT: FALSE
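Putting several options together, a hedged sketch (table, stage, and file names are hypothetical):

COPY INTO OUR_FIRST_DB.PUBLIC.ORDERS
FROM @MANAGE_DB.EXTERNAL_STAGES.MY_S3_STAGE
  FILES = ('orders.csv')
  FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = ',' SKIP_HEADER = 1)
  ON_ERROR = CONTINUE
  PURGE = TRUE;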
Load Unstructured Data

1. Create a stage
2. Load the raw data into a column of type VARIANT
3. Analyse & parse the data
4. Flatten & load into the target table
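A minimal sketch of these steps for JSON data (table, stage, and JSON field names are hypothetical):

-- 2. Load the raw JSON into a VARIANT column
CREATE OR REPLACE TABLE OUR_FIRST_DB.PUBLIC.JSON_RAW (raw_file VARIANT);

COPY INTO OUR_FIRST_DB.PUBLIC.JSON_RAW
FROM @MANAGE_DB.EXTERNAL_STAGES.MY_S3_STAGE
  FILES = ('people.json')
  FILE_FORMAT = (TYPE = JSON);

-- 3. Analyse & parse individual attributes
SELECT raw_file:first_name::STRING AS first_name
FROM OUR_FIRST_DB.PUBLIC.JSON_RAW;

-- 4. Flatten nested arrays & load into the target table
SELECT raw_file:first_name::STRING AS first_name,
       f.value:language::STRING    AS language
FROM OUR_FIRST_DB.PUBLIC.JSON_RAW,
     TABLE(FLATTEN(input => raw_file:spoken_languages)) f;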
Performance Optimization

In traditional databases, performance optimization means:

 Add indexes, primary keys
 Create table partitions
 Analyze the query execution plan
 Remove unnecessary full table scans

How does it work in Snowflake?

 Micro-partitions are managed automatically

What is our job?

 Assigning appropriate data types
 Sizing virtual warehouses
 Cluster keys
Performance aspects

 Dedicated virtual warehouses: separated according to different workloads
 Scaling up: for known patterns of high workload
 Scaling out: dynamically for unknown patterns of workload
 Maximize cache usage: automatic caching can be maximized
 Cluster keys: for large tables
Dedicated virtual warehouses

Data sources feed the platform via ETL/ELT; separate virtual warehouses serve different consumer groups such as BI/reporting, marketing, data science, and database administrators.
Dedicated virtual warehouses

Identify & classify
 Identify & classify groups of workloads/users
 e.g. BI team, data science team, marketing department

Create dedicated virtual warehouses
 For every class of workload, create a warehouse & assign the users
Considerations

Not too many VWs
 Avoid underutilization

Refine classifications
 Work patterns can change
Considerations

 If you use at least Enterprise Edition, all warehouses should be multi-cluster
 Minimum: default should be 1
 Maximum: can be very high

Let's practice!
Scaling Up/Down

 Changing the size of the virtual warehouse depending on different workloads in different periods

Use cases:
 ETL at certain times (for example between 4pm and 8pm)
 Special business events with more workload

 NOTE: the common scenario is increased query complexity, NOT more users (then scaling out would be better)
Scaling Out

 Scaling Up: increasing the size of virtual warehouses; for more complex queries.
 Scaling Out: using additional warehouses / multi-cluster warehouses; for more concurrent users/queries.

 Handles performance problems related to large numbers of concurrent users
 Automates the process if you have a fluctuating number of users
Caching

 Automatic process to speed up queries
 If a query is executed twice, results are cached and can be re-used
 Results are cached for 24 hours or until the underlying data has changed

What can we do?

 Ensure that similar queries go to the same warehouse
 Example: a team of data scientists runs similar queries, so they should all use the same warehouse
Clustering in Snowflake

 Snowflake automatically maintains these cluster keys
 In general, Snowflake produces well-clustered tables
 Cluster keys are not always ideal and can change over time
 You can manually customize these cluster keys
What is a cluster key?

 A subset of columns (or expressions) used to co-locate the data in micro-partitions
 For large tables this improves the scan efficiency of our queries
What is a cluster key? An example

Consider a table with columns Event Date, Event ID, Customers, and City. In insertion order, the rows are spread over three micro-partitions:

Micro-partition 1:
2021-03-12  134584  …  …
2021-12-04  134586  …  …
2021-11-04  134588  …  …

Micro-partition 2:
2021-04-05  134589  …  …
2021-06-07  134594  …  …
2021-07-03  134597  …  …

Micro-partition 3:
2021-03-04  134598  …  …
2021-08-03  134599  …  …
2021-08-04  134601  …  …

Now run:

SELECT COUNT(*)
WHERE Event_Date > '2021-07-01'
AND Event_Date < '2021-08-01'

All partitions need to be scanned! The date range of every micro-partition overlaps July 2021, so none of them can be pruned.

With Event Date as the cluster key, the rows are reorganized so that each micro-partition holds a contiguous date range:

Micro-partition 1:
2021-03-04  134598  …  …
2021-03-12  134584  …  …
2021-04-05  134589  …  …

Micro-partition 2:
2021-06-07  134594  …  …
2021-07-03  134597  …  …
2021-08-03  134599  …  …

Micro-partition 3:
2021-08-04  134601  …  …
2021-11-04  134588  …  …
2021-12-04  134586  …  …

Now only micro-partition 2 overlaps July 2021, so only that partition needs to be scanned.
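A hedged sketch of clustering the example table (table name hypothetical):

ALTER TABLE EVENTS CLUSTER BY (EVENT_DATE);

-- Check how well the table is clustered on that key
SELECT SYSTEM$CLUSTERING_INFORMATION('EVENTS', '(EVENT_DATE)');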
When to cluster?

 Clustering is not for all tables
 Mainly very large tables of multiple terabytes can benefit
How to cluster?

 Choose columns that are used most frequently in WHERE clauses (often date columns for event tables)
 If you typically use filters on two columns, the table can also benefit from two cluster keys
 Columns that are frequently used in joins
 A large enough number of distinct values to enable effective pruning, yet small enough to allow effective grouping of rows in the same micro-partitions
Clustering in Snowflake

CREATE TABLE <name> ... CLUSTER BY ( <column1> [ , <column2> ... ] )

CREATE TABLE <name> ... CLUSTER BY ( <expression> )

ALTER TABLE <name> CLUSTER BY ( <expr1> [ , <expr2> ... ] )

ALTER TABLE <name> DROP CLUSTERING KEY
Snowpipe

What is Snowpipe?

 Enables loading as soon as a file appears in a bucket
 Useful if data needs to be available immediately for analysis
 Snowpipe uses serverless features instead of warehouses

How it works: a file lands in an S3 bucket → an S3 notification triggers Snowpipe → a serverless COPY loads the data into the Snowflake DB.
Setting up Snowpipe (see the sketch below)

1. Create a stage: to have the connection
2. Test the COPY command: to make sure it works
3. Create a pipe: a pipe is an object wrapping the COPY command
4. Set up the S3 notification: to trigger Snowpipe
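A minimal sketch of steps 3 and 4 (pipe, table, and stage names are hypothetical):

CREATE OR REPLACE PIPE MANAGE_DB.PIPES.EMPLOYEE_PIPE
  AUTO_INGEST = TRUE
AS
COPY INTO OUR_FIRST_DB.PUBLIC.EMPLOYEES
FROM @MANAGE_DB.EXTERNAL_STAGES.MY_S3_STAGE;

-- DESC PIPE shows the notification_channel to use for the S3 event notification
DESC PIPE MANAGE_DB.PIPES.EMPLOYEE_PIPE;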
Fail Safe and Time Travel

Time Travel

 Standard: time travel up to 1 day
 Enterprise, Business Critical, Virtual Private: time travel up to 90 days

RETENTION PERIOD DEFAULT = 1 day
Fail Safe

 Protection of historical data in case of disaster
 Non-configurable 7-day period for permanent tables
 Period starts immediately after the Time Travel period ends
 No user interaction & recoverable only by Snowflake
 Contributes to storage cost
Continuous Data Protection Lifecycle

 Current Data Storage: access and query data, etc.
 Time Travel (1-90 days): access and query historical data with SELECT … AT | BEFORE; restore objects with UNDROP
 Fail Safe (transient: 0 days, permanent: 7 days): no user operations; recovery beyond Time Travel only by Snowflake support
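A hedged sketch of Time Travel in SQL (table name hypothetical, query ID elided):

-- Query the table as it was 5 minutes ago
SELECT * FROM OUR_FIRST_DB.PUBLIC.CUSTOMERS AT (OFFSET => -60*5);

-- Query the table as it was before a given statement ran
SELECT * FROM OUR_FIRST_DB.PUBLIC.CUSTOMERS BEFORE (STATEMENT => '<query_id>');

-- Restore a dropped table within the retention period
UNDROP TABLE OUR_FIRST_DB.PUBLIC.CUSTOMERS;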
Table Types

Permanent (CREATE TABLE)
 For permanent data; exists until dropped
 Time Travel retention period: 0-90 days
 Fail Safe: yes

Transient (CREATE TRANSIENT TABLE)
 Only for data that does not need to be protected; exists until dropped
 Time Travel retention period: 0-1 day
 Fail Safe: no
 Useful for managing storage cost

Temporary (CREATE TEMPORARY TABLE)
 Non-permanent data; exists only in the session
 Time Travel retention period: 0-1 day
 Fail Safe: no
Table type notes

 These types are also available for other database objects (database, schema, etc.)
 For temporary tables there are no naming conflicts with permanent/transient tables: other tables with the same name will be effectively hidden!
Zero Copy Cloning

Zero-Copy Cloning

 Create copies of a database, a schema, or a table
 The cloned object is independent from the original
 Easy to copy all metadata & improved storage management
 Useful for creating backups and for development purposes

CREATE TABLE <table_name> ...
CLONE <source_table_name>
BEFORE ( TIMESTAMP => <timestamp> )
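For example, a hedged sketch cloning a production table into a development copy (names hypothetical):

-- Metadata-only operation: no data is physically copied
CREATE TABLE OUR_FIRST_DB.PUBLIC.CUSTOMERS_DEV
CLONE OUR_FIRST_DB.PUBLIC.CUSTOMERS;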
Swapping

Swapping Tables

 Use-case: promote a development table into the production table

ALTER TABLE <table_name> ...
SWAP WITH <target_table_name>

ALTER SCHEMA <schema_name> ...
SWAP WITH <target_schema_name>
Swapping Tables

 Like cloning, swapping is a metadata operation: only the metadata of the two tables is exchanged, no data is copied.
Data Sharing

 Data sharing is usually a rather complicated process
 In Snowflake: data sharing without an actual copy of the data, and the shared data is always up to date
 Shared data can be consumed with the consumer's own compute resources
 Non-Snowflake users can also access shared data through a reader account
Data Sharing

 Account 1 (producer) shares data read-only with Account 2 (consumer)
 The consumer queries the shared data with its own compute resources
 For non-Snowflake users, the producer creates a reader account; the reader account uses the producer's compute resources
Sharing with non-Snowflake users (see the sketch below)

1. New reader account: an independent instance with its own URL & own compute resources
2. Share the data: share database & table
3. Create users: as administrator, create users & roles
4. Create database: in the reader account, create a database from the share
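A hedged sketch of the share itself (share, database, and table names are hypothetical):

CREATE OR REPLACE SHARE ORDERS_SHARE;

GRANT USAGE ON DATABASE OUR_FIRST_DB TO SHARE ORDERS_SHARE;
GRANT USAGE ON SCHEMA OUR_FIRST_DB.PUBLIC TO SHARE ORDERS_SHARE;
GRANT SELECT ON TABLE OUR_FIRST_DB.PUBLIC.ORDERS TO SHARE ORDERS_SHARE;

-- Add the consumer (or reader) account to the share
ALTER SHARE ORDERS_SHARE ADD ACCOUNT = <consumer_account>;

-- In the consumer account: create a database from the share
CREATE DATABASE ORDERS_FROM_SHARE FROM SHARE <producer_account>.ORDERS_SHARE;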
Data Sampling

Why sampling? Instead of querying, say, 10 TB, take a SAMPLE and work with, say, 500 GB.

 Use-cases: query development, data analysis, etc.
 Faster & more cost-efficient (less compute resources)
Data Sampling Methods

ROW or BERNOULLI method
 Every row is chosen with percentage p
 More "randomness"
 Better suited for smaller tables

BLOCK or SYSTEM method
 Every block is chosen with percentage p
 More effective processing
 Better suited for larger tables
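A minimal sketch of both methods (table name hypothetical):

-- ROW / BERNOULLI: every row is chosen with probability 10%
SELECT * FROM OUR_FIRST_DB.PUBLIC.ORDERS SAMPLE ROW (10) SEED (42);

-- BLOCK / SYSTEM: every micro-partition is chosen with probability 10%
SELECT * FROM OUR_FIRST_DB.PUBLIC.ORDERS SAMPLE SYSTEM (10) SEED (42);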


Tasks & Streams

Scheduling Tasks

 Tasks can be used to schedule SQL statements
 Standalone tasks and trees of tasks

Workflow: understand tasks → create tasks → schedule tasks → trees of tasks → check the task history
Tree of Tasks

 A tree starts with a root task (e.g. with child tasks A and B, which in turn can have children C-F)
 Every task has one parent

CREATE TASK ...
AFTER <parent task>
AS …

ALTER TASK ...
ADD AFTER <parent task>
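A hedged sketch of a scheduled root task with one child task (all names hypothetical):

CREATE OR REPLACE TASK TASK_ROOT
  WAREHOUSE = MY_WH
  SCHEDULE = 'USING CRON 0 8 * * * UTC'  -- every day at 8:00 UTC
AS
INSERT INTO ORDERS_STAGING SELECT * FROM ORDERS_RAW;

CREATE OR REPLACE TASK TASK_CHILD
  WAREHOUSE = MY_WH
  AFTER TASK_ROOT
AS
DELETE FROM ORDERS_RAW;

-- Tasks are created suspended; resume children before the root
ALTER TASK TASK_CHILD RESUME;
ALTER TASK TASK_ROOT RESUME;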
Streams

A stream object sits on top of a table and records the DML changes (INSERT, UPDATE, DELETE) made to it.

Querying a stream returns the changed rows plus metadata columns:

 METADATA$ACTION
 METADATA$ISUPDATE
 METADATA$ROW_ID
Streams

CREATE STREAM <stream name>
ON TABLE <table name>

SELECT * FROM <stream name>
Streams

In a typical pipeline (data sources → raw data → data integration → access layer → reporting, data science, and other apps), streams capture the changes (e.g. INSERTs) in one layer so they can be propagated to the next.

A stream is an object that records (DML-)changes made to a table. This process is called change data capture (CDC).
Types of streams

 STANDARD: INSERT, UPDATE, DELETE
 APPEND-ONLY: INSERT only

Syntax:

CREATE STREAM <stream name>
ON TABLE <table name>
APPEND_ONLY = TRUE
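A minimal sketch of creating and consuming a stream (tables and columns are hypothetical):

CREATE OR REPLACE TABLE SALES_RAW (id INT, product STRING, price NUMBER);
CREATE OR REPLACE TABLE SALES_FINAL (id INT, product STRING, price NUMBER);

CREATE OR REPLACE STREAM SALES_STREAM ON TABLE SALES_RAW;

-- DML on the base table is recorded by the stream
INSERT INTO SALES_RAW VALUES (1, 'Lemon', 2.5);

-- Inspect the captured changes (including the METADATA$ columns)
SELECT * FROM SALES_STREAM;

-- Consuming the stream in a DML statement advances its offset
INSERT INTO SALES_FINAL
SELECT id, product, price FROM SALES_STREAM
WHERE METADATA$ACTION = 'INSERT';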
Materialized Views

Materialized views

 We have a view that is queried frequently and that takes a long time to be processed:

× Bad user experience
× More compute consumption

 We can create a materialized view to solve that problem
What is a materialized view?

 Use any SELECT statement to create the MV (see the sketch below)
 The results are stored in a separate table, which is updated automatically based on the base table
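A hedged sketch (view and table names hypothetical; materialized views require Enterprise Edition, see the limitations below):

CREATE OR REPLACE MATERIALIZED VIEW ORDERS_MV AS
SELECT customer_id, SUM(amount) AS total_amount
FROM OUR_FIRST_DB.PUBLIC.ORDERS
GROUP BY customer_id;

-- Check the automatic refreshes (and thus the maintenance cost)
SELECT * FROM TABLE(INFORMATION_SCHEMA.MATERIALIZED_VIEW_REFRESH_HISTORY());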
When to use MVs?

 Weigh the benefits against the maintenance costs
 The view would take a long time to be processed and is used frequently
 The underlying data changes infrequently and on a rather irregular basis
When to use MVs?

If the data is updated on a very regular basis, using tasks & streams could be a better alternative:

 Alternative (streams & tasks): a stream object on the underlying table captures the changes, and a task with a MERGE statement applies them to the target view/table.
When to use MVs?

 Don't use a materialized view if the data changes are very frequent
 Keep the maintenance cost in mind
 Consider leveraging tasks (& streams) instead

Limitations

 Only available for Enterprise Edition

× Joins (including self-joins) are not supported
× Limited amount of aggregation functions; the supported ones are:
  APPROX_COUNT_DISTINCT (HLL), AVG (except when used in PIVOT), BITAND_AGG, BITOR_AGG, BITXOR_AGG, COUNT, MIN, MAX, STDDEV, STDDEV_POP, STDDEV_SAMP, SUM, VARIANCE (VARIANCE_SAMP, VAR_SAMP), VARIANCE_POP (VAR_POP)
× UDFs
× HAVING clauses
× ORDER BY clause
× LIMIT clause
Data Masking

Data Masking = Column-level Security
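The deck gives no example here, but a typical hedged sketch of a masking policy (policy, role, table, and column names are hypothetical):

-- Mask email addresses for everyone except a privileged role
CREATE OR REPLACE MASKING POLICY EMAIL_MASK AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('ANALYST_FULL') THEN val
    ELSE '*****'
  END;

-- Apply the policy to a column
ALTER TABLE OUR_FIRST_DB.PUBLIC.CUSTOMERS
  MODIFY COLUMN email SET MASKING POLICY EMAIL_MASK;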
Access Control

 Who can access and perform operations on objects in Snowflake
 Two aspects of access control combined:

Discretionary Access Control (DAC): each object has an owner who can grant access to that object.

Role-based Access Control (RBAC): access privileges are assigned to roles, which are in turn assigned to users.
Access Control

A user creates an object; the role that creates it becomes its owner. Privileges on the object are granted to other roles, and roles are granted to users:

GRANT <privilege> ON <object> TO <role>
GRANT <role> TO <user>
Securable objects

Account
  User
  Role
  Warehouse
  Other account objects
  Database
    Schema
      Table
      View
      Stage
      Integration
      Other schema objects
Access Control

 Every object is owned by a single role (which can have multiple users)
 The owner (role) has all privileges per default
Key concepts

 USER: people or systems
 ROLE: entity to which privileges are granted (role hierarchy)
 PRIVILEGE: level of access to an object (SELECT, DROP, CREATE, etc.)
 SECURABLE OBJECT: object to which privileges can be granted (database, table, warehouse, etc.)
Snowflake Roles

Role hierarchy:

ACCOUNTADMIN
  SECURITYADMIN
    USERADMIN
  SYSADMIN
    Custom Role 1
    Custom Role 2
      Custom Role 3
PUBLIC (bottom of the hierarchy)
ACCOUNTADMIN

Top-level role:
 Manage & view all objects
 All configurations on account level
 Account operations (create reader account, billing, etc.)
 The first user will have this role assigned
 Initial setup & managing account-level objects

Best practices:
 Very controlled assignment strongly recommended!
 Multi-factor authentication
 At least two users should be assigned to that role
 Avoid creating objects with that role unless you have to
ACCOUNTADMIN

 Account admin tab
 Billing & usage
 Reader accounts
 Multi-factor authentication
 Create other users
SECURITYADMIN

 The USERADMIN role is granted to SECURITYADMIN
 Can manage users and roles
 Can manage any object grant globally
SECURITYADMIN

Manages grants across custom role hierarchies, e.g. a Sales Admin Role above a Sales Role, and an HR Admin Role above an HR Role.
SYSADMIN

 Create & manage objects
 Create & manage warehouses, databases, tables, etc.
 Custom roles should be assigned to the SYSADMIN role as the parent
 Then this role also has the ability to grant privileges on warehouses, databases, and other objects to the custom roles
SYSADMIN

Example with custom roles (Sales Admin Role → Sales Role, HR Admin Role → HR Role), as sketched below:

 Create a virtual warehouse & grant it to the custom roles
 Create a database and table & grant them to the custom roles
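A hedged sketch of that example (warehouse, database, and role names are hypothetical):

USE ROLE SYSADMIN;

CREATE WAREHOUSE SALES_WH WITH WAREHOUSE_SIZE = 'XSMALL';
CREATE DATABASE SALES_DB;

GRANT USAGE ON WAREHOUSE SALES_WH TO ROLE SALES_ROLE;
GRANT USAGE ON DATABASE SALES_DB TO ROLE SALES_ROLE;

-- Attach the custom role hierarchy to SYSADMIN as the parent
GRANT ROLE SALES_ADMIN_ROLE TO ROLE SYSADMIN;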
Custom roles

 Customize roles to our needs & create our own hierarchies (e.g. Sales Admin Role → Sales Role, HR Admin Role → HR Role)
 Custom roles are usually created by SECURITYADMIN
 Should be leading up to the SYSADMIN role
USERADMIN

 Create users & roles (user & role management)
 Not for granting privileges (only on the objects that it owns)
PUBLIC

 Least privileged role (bottom of the hierarchy)
 Every user is automatically assigned to this role
 Can own objects
 These objects are then available to everyone
Snowflake & Other Tools

Snowflake & other tools

Around the warehouse, many third-party tools connect: ETL/ELT tools on the ingestion side, and BI/reporting, marketing, and data science tools on the consumption side.
Snowflake & other tools

 Easily create trial accounts with Snowflake partners
 Convenient option for trying 3rd-party tools

Categories:
 ETL/data integration tools: moving & transforming data
 Machine learning & data science tools
 Security & governance
Best Practices

Best practices

 Virtual warehouses
 Table design
 Monitoring
 Retention period
Virtual warehouse

 Best Practice #1 – Enable Auto-Suspend
 Best Practice #2 – Enable Auto-Resume
 Best Practice #3 – Set appropriate timeouts (see the sketch below)

Suggested auto-suspend timeouts by workload:
 ETL / Data Loading: immediately
 BI / SELECT queries: 10 min
 DevOps / Data Science: 5 min
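A hedged sketch of applying these settings (warehouse name hypothetical):

ALTER WAREHOUSE ETL_WH SET
  AUTO_SUSPEND = 60    -- seconds; suspend almost immediately after loading
  AUTO_RESUME = TRUE;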


Table design

 Best Practice #1 – Appropriate table type:
  Staging tables: transient
  Productive tables: permanent
  Development tables: transient
Table design

 Best Practice #1 – Appropriate table type
 Best Practice #2 – Appropriate data type
 Best Practice #3 – Set cluster keys only if necessary:
  Large table
  Most query time is spent on table scans
  Dimensions
Retention period

 Best Practice #1 – Staging database: 0 days (transient)
 Best Practice #2 – Production: 4-7 days (1 day minimum)
 Best Practice #3 – Large, high-churn tables: 0 days (transient), as sketched below

Example storage consumed by a high-churn table:
 Active: 20 GB
 Time Travel: 400 GB
 Fail Safe: 2.8 TB
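A hedged sketch of applying these retention settings (all names hypothetical):

-- Staging / high-churn: transient table, minimal time travel, no fail safe
CREATE OR REPLACE TRANSIENT TABLE STAGING_DB.PUBLIC.ORDERS_STG (id INT, amount NUMBER);
ALTER TABLE STAGING_DB.PUBLIC.ORDERS_STG SET DATA_RETENTION_TIME_IN_DAYS = 0;

-- Production: keep a few days of time travel
ALTER TABLE PROD_DB.PUBLIC.ORDERS SET DATA_RETENTION_TIME_IN_DAYS = 7;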
