
Access Control Framework
• Snowflake provides granular control over access to objects — who can access what objects,
what operations can be performed on those objects, and who can create or alter access
control policies.
• Access control privileges determine who can access and perform operations on specific
objects in Snowflake.
• Snowflake’s approach to access control combines aspects from both of the following models:
a) Discretionary Access Control (DAC): Each object has an owner, who can in turn grant
access to that object.
b) Role-based Access Control (RBAC): Access privileges are assigned to roles, which are in
turn assigned to users.
• The key concepts for understanding access control in Snowflake are:
a) Securable object: An entity to which access can be granted. Unless allowed by a grant,
access will be denied.
b) Role: An entity to which privileges can be granted. Roles are in turn assigned to users. Note
that roles can also be assigned to other roles, creating a role hierarchy.
c) Privilege: A defined level of access to an object. Multiple distinct privileges may be used to
control the granularity of access granted.
d) User: A user identity recognized by Snowflake, whether associated with a person or
program.
Access Control Framework

Snowflake's handling of access control differs from a pure user-based access control model, in which rights and privileges are assigned
directly to users or groups of users. In Snowflake, access to securable objects is granted via privileges assigned to roles, which are in turn
assigned to other roles or users (as sketched below). This combination gives Snowflake a significant amount of flexibility and control.
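A minimal sketch of this flow; the role, user, and object names below are hypothetical, not from the source:

-- DAC: the owner of sales_db.public.orders can grant access to it.
-- RBAC: privileges go to a role, and the role goes to a user.
create role analyst_role;
grant usage on database sales_db to role analyst_role;
grant usage on schema sales_db.public to role analyst_role;
grant select on table sales_db.public.orders to role analyst_role;
grant role analyst_role to user jsmith;        -- jsmith now inherits the role's privileges
grant role analyst_role to role manager_role;  -- roles can also be granted to roles (hierarchy)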
Micro Partitions and Clustering
Clustering
• Typically, the data stored in tables is sorted/ordered along natural dimensions (e.g., date, region). This natural clustering is a key factor in query performance.
• As data is inserted/loaded into a table, clustering metadata is collected and recorded for each micro-partition created during the process. This helps prevent
unnecessary scans.
• Snowflake maintains clustering metadata for the micro-partitions in a table, including:
• The total number of micro-partitions that comprise the table.
• The number of micro-partitions containing values that overlap with each other (in a specified subset of table columns).
• The depth of the overlapping micro-partitions.
• Any data landing in Snowflake goes through the following operations:
1. Divide and map the incoming data into micro-partitions using the ordering of the data as it is inserted/loaded.
2. Compress the data.
3. Capture and store metadata.
Benefits of micro-partitions
1- Snowflake manages micro-partitioning automatically, so we do not need to define or maintain it up front (see the sketch after this list).
2- Micro-partitions are 50-500 MB in size before Snowflake applies compression; this enables efficient, fine-grained pruning for faster queries.
3- Micro-partitions can overlap in their range of values, which prevents skew.
4- Columnar storage enables efficient scanning of only the columns referenced in a query.
5- Columns are compressed individually within each micro-partition.
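A minimal sketch illustrating benefit 1 (the table and stage names are hypothetical): no partitioning DDL exists or is needed, because micro-partitions are created automatically as data lands.

create table customer (id int, f_name string, l_name string, dob date, active boolean, city string);
copy into customer from @my_stage/customer.csv file_format = (type = csv);
-- Snowflake divides the incoming rows into micro-partitions in insert order,
-- compresses each column, and records per-partition min/max metadata.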
Micro Partitions and Clustering
Clustering depth
• The clustering depth for a populated table measures the average depth (1 or greater) of the overlapping micro-partitions for specified columns in a
table. The smaller the average depth, the better clustered the table is with regards to the specified columns.

• Example: a customer table (id, f_name, l_name, dob, active, city) whose 20 rows landed in five micro-partitions in insert order, with these min/max dob ranges per micro-partition:

Partition1: dob 2/5/1995 (min) to 9/19/1997 (max)
Partition2: dob 10/19/1997 (min) to 2/15/2003 (max)
Partition3: dob 3/2/2003 (min) to 3/21/2003 (max)
Partition4: dob 4/4/2003 (min) to 4/17/2003 (max)
Partition5: dob 4/20/2003 (min) to 8/25/2008 (max)

select * from customer where year(dob) = 1995
-- only Partition1 can contain 1995, so 1995 has 0 overlap

select * from customer where year(dob) = 1997
-- Partitions 1 and 2 can contain 1997, so 1997 has 1 overlap and a depth of 2

select * from customer where year(dob) = 2003
-- Partitions 2 through 5 can contain 2003, so 2003 has 3 overlaps and a depth of 4
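The depth for this example can be checked directly with SYSTEM$CLUSTERING_DEPTH (introduced below); a sketch, assuming the example table is in fact named customer:

select system$clustering_depth('customer', '(dob)');
-- returns the average overlap depth of the table's micro-partitions for the dob column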
Micro Partitions and Clustering
Clustering depth
• As the number of overlapping micro-partitions decreases, the overlap depth decreases.
• When there is no overlap in the range of values across all micro-partitions, the micro-partitions are considered to be in a constant state (i.e. they cannot
be improved by clustering).
• SYSTEM$CLUSTERING_DEPTH( '<table_name>' , '( <col1> [ , <col2> ... ] )' [ , '<predicate>' ] )
• Computes the average depth of the table according to the specified columns (or the clustering key defined for the table). The average depth of a
populated table (i.e. a table containing data) is always 1 or more. The smaller the average depth, the better clustered the table is with regards to the
specified columns.
• SYSTEM$CLUSTERING_INFORMATION( '<table_name>' , '( <col1> [ , <col2> ... ] )' )
• Returns clustering information, including average clustering depth, for a table based on one or more columns in the table.
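For example, checking the candidate column o_orderdate on a hypothetical orders table:

select system$clustering_depth('orders', '(o_orderdate)');
-- scalar: the average overlap depth for o_orderdate
select system$clustering_information('orders', '(o_orderdate)');
-- JSON including total_partition_count, average_overlaps, average_depth,
-- and a partition_depth_histogram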
• To improve the clustering of the underlying table micro-partitions, you can always manually sort rows on key table columns and re-insert them into
the table; however, performing these tasks can be cumbersome and expensive. Instead, Snowflake supports automating these tasks by designating
one or more table columns/expressions as a clustering key for the table. A table with a clustering key defined is considered to be clustered.
• In particular, to see performance improvements from a clustering key, a table has to be large enough to consist of a sufficiently large number of micro-
partitions, and the column(s) defined in the clustering key have to provide sufficient filtering to select a subset of these micro-partitions.
• In general, tables in the multi-terabyte (TB) range will experience the most benefit from clustering, particularly if DML is performed
regularly/continually on these tables.
• A clustering key is a subset of columns in a table (or expressions on a table) that are explicitly designated to co-locate the data in the table in the same
micro-partitions. This is useful for very large tables where the ordering was not ideal (at the time the data was inserted/loaded) or extensive DML has
caused the table’s natural clustering to degrade.
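A minimal sketch of designating a clustering key (table and column names are hypothetical):

-- at creation time:
create table orders (o_orderkey int, o_orderdate date, o_region string)
  cluster by (o_orderdate, o_region);
-- on an existing table:
alter table orders cluster by (o_orderdate);
-- to remove the clustering key:
alter table orders drop clustering key;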
Micro Partitions and Clustering
• Some general indicators that can help determine whether to define a clustering key for a table include:
• Queries on the table are running slower than expected or have noticeably degraded over time.
• The clustering depth for the table is large.
• Benefits of using cluster key:
1- Improved scan efficiency and pruning.
2- Better column compression than tables with no clustering.
3- No additional administration is required; all future maintenance is performed by Snowflake.
• The compute resources used to perform clustering consume credits. As such, you should cluster only when queries will benefit substantially from the
clustering; clustering effectively pre-sorts the data along the clustering key, so it helps wherever a sort on those columns would otherwise be required.
• The more frequently a table is queried, the more benefit clustering provides. However, the more frequently a table changes, the more expensive it is to
keep it clustered. Therefore, clustering is generally most cost-effective for tables that are queried frequently and do not change frequently.
• The number of distinct values (i.e. cardinality) in a column/expression is a critical aspect of selecting it as a clustering key. It is important to choose a
clustering key that has:
• A large enough number of distinct values to enable effective pruning on the table.
• A small enough number of distinct values to allow Snowflake to effectively group rows in the same micro-partitions.
• A column with very low cardinality (e.g. a column that indicates only whether a person is male or female) might yield only minimal pruning. At the other
extreme, a column with very high cardinality (e.g. a column containing UUID or nanosecond timestamp values) is also typically not a good candidate to use
as a clustering key directly.
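A common remedy for the high-cardinality case is to cluster on an expression that reduces cardinality rather than on the raw column; a sketch with a hypothetical events table:

-- a nanosecond timestamp is too high-cardinality to cluster on directly,
-- so cluster on its date portion instead:
alter table events cluster by (to_date(event_ts));
-- similarly, a UUID column can be bucketed by a prefix:
-- alter table events cluster by (substr(uuid_col, 1, 4));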
• Average depth = (sum of the overlap depths of the micro-partitions) / (number of micro-partitions).
• With Automatic Clustering enabled, whenever data is inserted, Snowflake creates new micro-partitions and reclusters the table as needed.
• Defining a clustering key simply flags to Snowflake that reclustering should be performed for the table; Snowflake then reclusters it in the background. Automatic Clustering can be suspended and resumed per table:
alter table t2_order_proirity suspend recluster;
alter table t2_order_proirity resume recluster;
• Clustering is not supported for external tables.
