Best Practices for Using Tableau with Snowflake
BY ALAN ELDRIDGE, ET AL.

What's inside:
Introduction
What is Tableau?
What is Snowflake?
What you don't have to worry about with Snowflake
Creating efficient Tableau workbooks
Connecting to Snowflake
Working with semi-structured data
Working with Snowflake Time Travel
Working with Snowflake Data Sharing
Implementing role-based security

Introduction

It is a widely recognized truth today that data is an increasingly valuable asset for organizations. Greater and greater volumes of data are being generated and captured across systems at increasing levels of granularity. At the same time, more users are demanding access to this data to answer questions and gain business insight.

Snowflake and Tableau are leading technologies addressing these two challenges: Snowflake provides a near-limitless platform for cloud-built data warehousing, and Tableau provides a highly intuitive, self-service data analytics platform.

The objective of this whitepaper is to help you make the most of both of these highly complementary products. It is intended for Tableau users who are new to Snowflake, Snowflake users who are new to Tableau, and of course any users who are new to both.

What is Tableau?

Tableau Software is a business intelligence solution that integrates data analysis and reporting into a continuous visual analysis process, one that lets everyday business users quickly explore data and shift views on the fly to follow their train of thought. Tableau combines data exploration, visualization, reporting and dashboarding into an application that is easy to learn and use.
Tableau's solution set consists of three main products:

* Tableau Desktop—the end-user tool for data analysis and dashboard creation. It can be used on its own or in conjunction with Tableau Server/Tableau Online.
* Tableau Server—the server platform that provides services for collaboration and content sharing. This can be deployed on-premises or in the cloud (on Amazon's AWS, Microsoft's Azure or Google's GCP).
* Tableau Online—a software-as-a-service version of Tableau Server.

Either working standalone with Tableau Desktop, or by publishing content to Tableau Server or Tableau Online, users can directly work with data stored in Snowflake data warehouses. Tableau can integrate into your existing data infrastructure, whether on-premises or in the cloud.

What is Snowflake?

Snowflake is an analytic data warehouse provided as software-as-a-service (SaaS). Snowflake provides a data warehouse that is faster, easier to use and far more flexible than traditional data warehouse offerings.

Snowflake's data warehouse is not built on an existing database or big-data software platform such as Hadoop. The Snowflake data warehouse uses a new SQL database engine with a unique architecture designed for the cloud. To the user, Snowflake has many similarities to other enterprise data warehouses, but it also has additional functionality and unique capabilities.

DATA WAREHOUSE AS A CLOUD SERVICE

Snowflake's data warehouse is a true SaaS offering. More specifically:

* There is no hardware (virtual or physical) for you to select, install, configure or manage.
* There is no software for you to install, configure or manage.
* Ongoing maintenance, management and tuning are handled by Snowflake.

Snowflake runs completely on cloud infrastructure.
All components of Snowflake's service (other than an optional command-line client) run in a secure public cloud infrastructure. Snowflake was originally built on the Amazon Web Services (AWS) cloud infrastructure: Snowflake uses virtual compute instances provided by AWS EC2 (Elastic Compute Cloud) for its compute needs and AWS S3 (Simple Storage Service) for persistent storage of data. In addition, as Snowflake continues to serve enterprises of all sizes across all industries, more and more customers consider AWS and other cloud infrastructure providers when moving to the cloud. Snowflake has responded, and is now also available on Microsoft Azure.

Snowflake is not a packaged software offering that can be installed by a user. Snowflake manages all aspects of software installation and updates. Snowflake cannot be run on private cloud infrastructures (on-premises or hosted).

SNOWFLAKE ARCHITECTURE

Snowflake's architecture is a hybrid of traditional shared-disk database architectures and shared-nothing database architectures. Similar to shared-disk architectures, Snowflake uses a central data repository for persisted data that is accessible from all compute nodes in the data warehouse. But similar to shared-nothing architectures, Snowflake processes queries using MPP (massively parallel processing) compute clusters where each node in the cluster stores a portion of the entire data set locally. This approach offers the data management simplicity of a shared-disk architecture, but with the performance and scale-out benefits of a shared-nothing architecture.

Snowflake's unique architecture consists of three key layers:

* Cloud Services
* Query Processing
* Database Storage

DATABASE STORAGE

When data is loaded into Snowflake, Snowflake reorganizes that data into its internal optimized, compressed, columnar format.
Snowflake stores this optimized data using Amazon Web Services' S3 (Simple Storage Service) cloud storage or Azure Blob Storage. Snowflake manages all aspects of how this data is stored in AWS or Azure—the organization, file size, structure, compression, metadata, statistics and other aspects of data storage. The data objects stored by Snowflake are not directly visible or accessible by customers; they are accessible only through SQL query operations run using Snowflake.

QUERY PROCESSING

Query execution is performed in the processing layer. Snowflake processes queries using virtual warehouses. Each virtual warehouse is an MPP compute cluster composed of multiple compute nodes allocated by Snowflake from Amazon EC2 or Azure Virtual Machines. Each virtual warehouse is an independent compute cluster that does not share compute resources with other virtual warehouses. As a result, each virtual warehouse has no impact on the performance of other virtual warehouses. For more information, see virtual warehouses in the Snowflake online documentation.

CLOUD SERVICES

The cloud services layer is a collection of services that coordinate activities across Snowflake. These services tie together all of the different components of Snowflake in order to process user requests, from login to query dispatch. The cloud services layer also runs on compute instances provisioned by Snowflake.

Among the services in this layer:

* Authentication
* Infrastructure management
* Metadata management
* Query parsing and optimization
* Access control

What you don't have to worry about with Snowflake

Because Snowflake is a cloud-built data warehouse as a service, there are lots of things you don't need to worry about compared to a traditional on-premises solution:

* Installing, provisioning and maintaining hardware and software
  – Snowflake is a cloud-built data warehouse as a service. All you need to do is create an account and load some data. You can then just connect from Tableau and start querying. We even provide free usage to help you get started if you are new to Snowflake!
* Working out the capacity of your data warehouse
  – Snowflake is a fully elastic platform, so it can scale to handle all of your data and all of your users. You can adjust the size of your warehouses (the layer that does the query execution) up and down on the fly to handle peaks and lulls in your data usage. You can even turn your warehouses completely off to save money when you are not using them.
* Learning new tools and a new query language
  – Snowflake is a fully SQL-compliant data warehouse, so all the skills and tools you already have (such as Tableau) will easily connect. We provide connectors for ODBC, JDBC, Python, Spark and Node.js, as well as web-based and command-line interfaces.
  – Business users increasingly need to work with both traditionally structured data (e.g., data in VARCHAR, INT and DATE columns in tables) as well as semi-structured data in formats such as XML, JSON and Parquet. Snowflake provides a special data type called a VARIANT that allows you to simply load your semi-structured data and then query it the same way you would query traditionally structured data—via SQL.
* Optimizing and maintaining your data
  – Snowflake is a highly scalable, columnar data platform that allows users to run analytic queries quickly and easily. It does not require you to manage how your data is indexed or distributed across partitions. In Snowflake, this is all transparently managed by the platform.
  – Snowflake also provides inherent data protection capabilities, so you don't need to worry about snapshots, backups or other administrative tasks such as running VACUUM jobs.
* Securing your data
  – Security is critical for all cloud-based systems, and Snowflake has you covered on this front. All data is transparently encrypted when it is loaded into Snowflake, and it is kept encrypted at all times when at rest and in transit.
  – If your business requirements include working with data that requires HIPAA, PII or PCI compliance, Snowflake can also support these validations with the Enterprise for Sensitive Data edition.
* Sharing data
  – Snowflake has revolutionized how organizations distribute and consume shared data. Snowflake's unique architecture enables live data sharing between Snowflake accounts without copying and moving data sets. Data providers enable secure data shares to their data consumers, who can view the data and seamlessly combine it with their own data sources. When a data provider adds to or updates data, consumers always see the most current version.

All of these features free up your time and allow you to focus your attention on using Tableau to view and understand your data.

Creating efficient Tableau workbooks

Not every recommendation in this whitepaper is unique to using Tableau with Snowflake. There are many best practice approaches for creating efficient workbooks that are common across all data sources, and a great resource for this information is the "Best Practices for Designing Efficient Tableau Workbooks" whitepaper.

That whitepaper contains too much to repeat here, but the key points are:

* Keep it simple—The majority of performance problems are caused by inefficient workbook design (e.g., too many charts on one dashboard, or trying to show too much data at once). Allow your users to incrementally drill down to details, rather than trying to show everything and then filter.
* Less is more—Work with only the data you need.
The fewer rows and columns you work with, the faster your queries will execute. Also, display only the data you need: the fewer marks you draw, the faster your workbooks will render.

* Don't try to do it all yourself—The query generator in Tableau is one of the most efficient on the market, so trust it to create the queries for you. Certainly, you should understand how things in a Tableau worksheet impact the queries generated, but the less you fight against the tool (e.g., through custom SQL), the better your experience will be.

Connecting to Snowflake

USE THE RIGHT CONNECTOR

To connect to Snowflake, Tableau uses the Snowflake-provided ODBC driver. To use the Snowflake connector, you should ensure a recent version of this driver is installed alongside Tableau.

[Screenshot of the Tableau connection dialog.]

LIVE VS. EXTRACTED CONNECTIONS

Because Snowflake is a high-performance, analytic data warehouse, it allows users to run live connections over very large data volumes and still maintain acceptable query response times by selecting an appropriately sized warehouse. With Snowflake there is less need to use Tableau extracts to improve query performance than on other, traditional data platforms.

Tableau extracts can be complementary to Snowflake when:

* Users require an offline data cache that can be used without a connection to Snowflake; or
* Users create aggregated extracts to act as summarized data caches. This was initially proposed by Jeff Feng in a Tableau Conference presentation (Cold, Warm, & Hot Data: Applying a Three-Tiered Hadoop Data Strategy) as an approach to working with large, slow data lakes.
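The aggregated-extract pattern above can be sketched as a summary view in Snowflake that the extract is then built from. The view, table and column names below are hypothetical, not part of this paper's examples:

```sql
-- Hypothetical summary view: pre-aggregates a large fact table so a
-- Tableau extract built on it stays small and fast to refresh.
CREATE OR REPLACE VIEW SALES_DAILY_SUMMARY_VW AS
SELECT ORDER_DATE,
       MARKET_SEGMENT,
       COUNT(*)         AS ORDER_COUNT,
       SUM(TOTAL_PRICE) AS TOTAL_REVENUE
FROM   SALES_FACT
GROUP  BY ORDER_DATE, MARKET_SEGMENT;
```

An extract of such a view can serve the summarized "warm" layer, while a live connection to the detail table remains available for drill-down.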
However, because Snowflake can handle large volumes of structured and semi-structured data while still providing analytic query performance, you could use it to provide both the cold and warm layers.

CUSTOM SQL

Sometimes users new to Tableau try to bring old-world techniques to their workbooks, such as creating data sources using hand-written SQL statements. In many cases this is counterproductive, as Tableau can generate much more efficient queries when we simply define the join relationships between the tables and let the query engine write SQL specific to the view being created. However, there are times when specifying joins in the data connection window does not offer all the flexibility you need to define the relationships in your data.

Creating a data connection using a hand-written SQL statement can be very powerful. But in some cases, custom SQL can actually reduce performance. This is because, in contrast to defining joins, custom SQL is never deconstructed and is always executed atomically. This means no join culling occurs, and we can end up with situations where the database is asked to process the whole query, possibly multiple times.

Example

Consider the following, very simple Tableau worksheet. It shows the number of records in the TPCH_SF1 sample schema for each month, broken down by market segment.

[Worksheet screenshot.]

If our underlying data model uses the recommended approach of linking tables, the Tableau query engine produces the following optimal query that joins only the tables needed and returns just the columns we are displaying:

SELECT "Customers"."C_MKTSEGMENT" AS "C_MKTSEGMENT",
  COUNT(DISTINCT "Orders"."O_ORDERKEY") AS "ctd:O_ORDERKEY:ok",
  DATE_TRUNC('MONTH', "Orders"."O_ORDERDATE") AS "tmn:O_ORDERDATE:ok"
FROM "TPCH1"."LINEITEM" "LineItem"
  INNER JOIN "TPCH1"."ORDERS" "Orders" ON ("LineItem"."L_ORDERKEY" = "Orders"."O_ORDERKEY")
  INNER JOIN "TPCH1"."CUSTOMER" "Customers" ON ("Orders"."O_CUSTKEY" = "Customers"."C_CUSTKEY")
GROUP BY 1, 3

This results in the following optimal query plan.

[Query plan screenshot.]

What happens, though, if we simulate an example of embedding business logic in the data model? Using custom SQL there are two approaches we can take. The preferred approach is to isolate the custom SQL to just the part of the model that is affected and keep the rest of the schema as join definitions. This is what the following example does, simply by replacing the LINEITEM table with a custom SQL statement.

In this case, the following query is generated by Tableau (the custom SQL is highlighted for easy identification):

SELECT "Customers"."C_MKTSEGMENT" AS "C_MKTSEGMENT",
  COUNT(DISTINCT "Orders"."O_ORDERKEY") AS "ctd:O_ORDERKEY:ok",
  DATE_TRUNC('MONTH', "Orders"."O_ORDERDATE") AS "tmn:O_ORDERDATE:ok"
FROM (
  SELECT * FROM TPCH1.LINEITEM
) "LineItem"
  INNER JOIN "TPCH1"."ORDERS" "Orders" ON ("LineItem"."L_ORDERKEY" = "Orders"."O_ORDERKEY")
  INNER JOIN "TPCH1"."CUSTOMER" "Customers" ON ("Orders"."O_CUSTKEY" = "Customers"."C_CUSTKEY")
GROUP BY 1, 3

(Note: it is important that you don't include a terminating semi-colon in your custom SQL, as this will cause an error.)

As you can see, the custom SQL is not decomposed, but because its scope is just the LINEITEM table, Tableau can cull (eliminate) the joins to the unneeded tables. The Snowflake optimizer then manages to parse this into an optimal query plan, identical to the initial example.

The worst-case result happens when we encapsulate the entire data model (with all table joins) into a single, monolithic custom SQL statement.
This is often what users who are new to Tableau will do, especially if they have an existing query statement they have used in another tool.

Again, the custom SQL is not decomposed, so all the Tableau query engine does is wrap the custom SQL in a surrounding SELECT statement. This means there is no join culling, and Snowflake is required to join all the tables together before the required subset of data is selected (again, the custom SQL is highlighted for readability):

SELECT "Bad Custom SQL"."C_MKTSEGMENT" AS "C_MKTSEGMENT",
  COUNT(DISTINCT "Bad Custom SQL"."O_ORDERKEY") AS "ctd:O_ORDERKEY:ok",
  DATE_TRUNC('MONTH', "Bad Custom SQL"."O_ORDERDATE") AS "tmn:O_ORDERDATE:ok"
FROM (
  SELECT *
  FROM "TPCH1"."LINEITEM" "LineItem"
    INNER JOIN "TPCH1"."ORDERS" "Orders" ON ("LineItem"."L_ORDERKEY" = "Orders"."O_ORDERKEY")
    INNER JOIN "TPCH1"."CUSTOMER" "Customers" ON ("Orders"."O_CUSTKEY" = "Customers"."C_CUSTKEY")
    INNER JOIN "TPCH1"."PART" "Parts" ON ("LineItem"."L_PARTKEY" = "Parts"."P_PARTKEY")
    INNER JOIN "TPCH1"."PARTSUPP" "PARTSUPP" ON ("Parts"."P_PARTKEY" = "PARTSUPP"."PS_PARTKEY")
    INNER JOIN "TPCH1"."SUPPLIER" "Suppliers" ON ("PARTSUPP"."PS_SUPPKEY" = "Suppliers"."S_SUPPKEY")
    INNER JOIN "TPCH1"."NATION" "CustNation" ON ("Customers"."C_NATIONKEY" = "CustNation"."N_NATIONKEY")
    INNER JOIN "TPCH1"."REGION" "CustRegion" ON ("CustNation"."N_REGIONKEY" = "CustRegion"."R_REGIONKEY")
) "Bad Custom SQL"
GROUP BY 1, 3

As the following query plan shows, the Snowflake optimizer can parse this only so far, and a less efficient query is performed.

[Query plan screenshot.]

This last approach is even more inefficient, as this entire query plan needs to be run for potentially every query in the dashboard. In this example, there is only one. But if there were multiple data-driven zones (e.g., multiple charts, filters, legends, etc.), then each could potentially run this entire query wrapped in its own select clause.
If this were the case, it would be preferable to follow the Tableau recommended practices and create the data connection as a Tableau extract (data volumes permitting). This would require the complex query to run only once, and then all subsequent queries would run against the extract.

INITIAL SQL

As just discussed, one of the limitations of custom SQL is that it can be run multiple times for a single dashboard. One way to avoid this is to use initial SQL to create a temporary table which will then be the selected table in your query. Because initial SQL is executed only once, when the workbook is opened (as opposed to every time the visualization is changed, as for custom SQL), this could significantly improve performance in some cases. However, the flip side is that the data populated into the temp table will be static for the duration of the session, even if the data in the underlying tables changes.

Example

Using the example above, instead of placing the entire query in a custom SQL statement, we could use it in an initial SQL block and instantiate a temporary table:

CREATE OR REPLACE TEMP TABLE TPCH1.FOO AS
SELECT *
FROM "TPCH1"."LINEITEM" "LineItem"
  INNER JOIN "TPCH1"."ORDERS" "Orders" ON ("LineItem"."L_ORDERKEY" = "Orders"."O_ORDERKEY")
  INNER JOIN "TPCH1"."CUSTOMER" "Customers" ON ("Orders"."O_CUSTKEY" = "Customers"."C_CUSTKEY")
  INNER JOIN "TPCH1"."PART" "Parts" ON ("LineItem"."L_PARTKEY" = "Parts"."P_PARTKEY")
  INNER JOIN "TPCH1"."PARTSUPP" "PARTSUPP" ON ("Parts"."P_PARTKEY" = "PARTSUPP"."PS_PARTKEY")
  INNER JOIN "TPCH1"."SUPPLIER" "Suppliers" ON ("PARTSUPP"."PS_SUPPKEY" = "Suppliers"."S_SUPPKEY")
  INNER JOIN "TPCH1"."NATION" "CustNation" ON ("Customers"."C_NATIONKEY" = "CustNation"."N_NATIONKEY")
  INNER JOIN "TPCH1"."REGION" "CustRegion" ON ("CustNation"."N_REGIONKEY" = "CustRegion"."R_REGIONKEY")

We then simply select the FOO table as our data source, and Tableau generates the following query:

SELECT "FOO"."C_MKTSEGMENT" AS "C_MKTSEGMENT",
  COUNT(DISTINCT "FOO"."O_ORDERKEY") AS "ctd:O_ORDERKEY:ok",
  DATE_TRUNC('MONTH', "FOO"."O_ORDERDATE") AS "tmn:O_ORDERDATE:ok"
FROM "TPCH1"."FOO" "FOO"
GROUP BY 1, 3

This has a very simple query plan and fast execution time, as the hard work has been done in the creation of the temp table. However, the data returned will not reflect changes to the underlying fact tables until we start a new session and the temp table is recreated.

Note that on Tableau Server an administrator can set a restriction on initial SQL so that it will not run. You might need to check that this is allowed in your environment if you are planning to publish your workbook to share with others. Also, note that temp tables take up additional space in Snowflake that will contribute to the account storage charges, but they are ephemeral, so this is generally not significant.

VIEWS

Views are another alternative to using custom SQL. Instead of materializing the result into a temp table, which is potentially a time-consuming action to perform when the workbook is initially opened, you could define a view:

CREATE OR REPLACE VIEW FOO_VW AS
SELECT *
FROM "TPCH1"."LINEITEM" "LineItem"
  INNER JOIN "TPCH1"."ORDERS" "Orders" ON ("LineItem"."L_ORDERKEY" = "Orders"."O_ORDERKEY")
  INNER JOIN "TPCH1"."CUSTOMER" "Customers" ON ("Orders"."O_CUSTKEY" = "Customers"."C_CUSTKEY")
  INNER JOIN "TPCH1"."PART" "Parts" ON ("LineItem"."L_PARTKEY" = "Parts"."P_PARTKEY")
  INNER JOIN "TPCH1"."PARTSUPP" "PARTSUPP" ON ("Parts"."P_PARTKEY" = "PARTSUPP"."PS_PARTKEY")
  INNER JOIN "TPCH1"."SUPPLIER" "Suppliers" ON ("PARTSUPP"."PS_SUPPKEY" = "Suppliers"."S_SUPPKEY")
  INNER JOIN "TPCH1"."NATION" "CustNation" ON ("Customers"."C_NATIONKEY" = "CustNation"."N_NATIONKEY")
  INNER JOIN "TPCH1"."REGION" "CustRegion" ON ("CustNation"."N_REGIONKEY" = "CustRegion"."R_REGIONKEY")

In Tableau, we then use the view as our data source. The query generated by Tableau is simple, as it is with the temp table approach:

SELECT "FOO_VW"."C_MKTSEGMENT" AS "C_MKTSEGMENT",
  COUNT(DISTINCT "FOO_VW"."O_ORDERKEY") AS "ctd:O_ORDERKEY:ok",
  DATE_TRUNC('MONTH', "FOO_VW"."O_ORDERDATE") AS "tmn:O_ORDERDATE:ok"
FROM "TPCH1"."FOO_VW" "FOO_VW"
GROUP BY 1, 3

However, this takes longer to run and has a less efficient query plan, as the view needs to be evaluated at query time. The benefit is that you will always see up-to-date data in your results.

All of the above approaches are valid, but you should select the most appropriate solution based on your particular needs for performance, data freshness, maintainability and reusability.

Working with semi-structured data

Today, business users find themselves working with data in multiple forms from numerous sources. This includes an ever-expanding amount of machine-generated data from applications, sensors, mobile devices, etc. Increasingly, these data are provided in semi-structured data formats such as JSON, Avro, ORC, Parquet and XML that have flexible schemas. These semi-structured data formats do not conform to the standards of traditionally structured data, but instead contain tags or other types of mark-up that identify individual, distinct elements within the data:

{
  "city": {
    "coord": { "lat": -37.813999, "lon": 144.963318 },
    "country": "AUS",
    "findname": "MELBOURNE",
    "id": 2158177,
    "name": "Melbourne",
    "zoom": 5
  },
  "main": {
    "humidity": 3,
    "pressure": 1010,
    "temp": 302.84,
    "temp_max": 305.15,
    "temp_min": 301.25
  },
  "time": 1514426494,
  "weather": [
    { "description": "light rain", "icon": "10d", "id": 500, "main": "Rain" }
  ],
  "wind": { "deg": 350, "speed": 4.1 }
}

Two of the key attributes that distinguish semi-structured data from structured data are nested data structures and the lack of a fixed schema:

* Unlike structured data, which represents data as a flat table, semi-structured data can contain n-level hierarchies of nested information.
* Structured data requires a fixed schema that is defined before the data can be loaded and queried in a relational database system.
Semi-structured data does not require a prior definition of a schema and can constantly evolve; i.e., new attributes can be added at any time.

Tableau has introduced support for directly reading JSON data files. However, broader support for semi-structured data can be achieved by first loading the data into Snowflake. Snowflake provides native support for semi-structured data, including:

* Flexible-schema data types for loading semi-structured data without transformation
* Direct ingestion of JSON, Avro, ORC, Parquet and XML file formats
* Automatic conversion of data to Snowflake's optimized internal storage format
* Database optimization for fast and efficient querying of semi-structured data

THE VARIANT DATA TYPE

Rather than requiring semi-structured data to be parsed and transformed into a traditional schema of single-value columns, Snowflake stores semi-structured data in a single column of a special type: VARIANT. Each VARIANT column can contain an entire semi-structured object consisting of multiple key-value pairs:

SELECT * FROM SNOWFLAKE_SAMPLE_DATA.WEATHER.WEATHER_14_TOTAL;

[Result screenshot.]

The VARIANT type does not simply store the semi-structured object as a text string; rather, the individual keys and their values are stored in a columnarized format, just like normal columns in a relational table. This means that storage and query performance for operations on data in a VARIANT column are very similar to storage and query performance for data in a normal relational column.

Note that the maximum number of key-value pairs that will be columnarized for a single VARIANT column is 1,000. If your semi-structured data has more than 1,000 key-value pairs, you may benefit from spreading the data across multiple VARIANT columns.
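The loading step that produces such a VARIANT column can be sketched as follows. The table name, stage name and file name here are hypothetical placeholders, not part of this paper's examples:

```sql
-- Hypothetical example: land raw JSON documents in a single VARIANT column.
CREATE OR REPLACE TABLE WEATHER_RAW (V VARIANT);

-- "@my_stage" and "weather.json" are placeholders for your own stage/files.
COPY INTO WEATHER_RAW
FROM @my_stage/weather.json
FILE_FORMAT = (TYPE = 'JSON');
```

Each JSON document then arrives as one row, queryable with the path syntax shown below.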
Additionally, each VARIANT entry is limited to a maximum size of 16MB of compressed data.

The individual key-value pairs can be queried directly from VARIANT columns with a minor extension to traditional SQL syntax:

SELECT V:time::TIMESTAMP AS TIME,
  V:city.name::VARCHAR AS CITY,
  V:city.country::VARCHAR AS COUNTRY,
  (V:main.temp_max - 273.15)::FLOAT AS TEMP_MAX,
  (V:main.temp_min - 273.15)::FLOAT AS TEMP_MIN,
  V:weather[0].main::VARCHAR AS WEATHER_MAIN
FROM SNOWFLAKE_SAMPLE_DATA.WEATHER.WEATHER_14_TOTAL;

[Result screenshot.]

ACCESSING SEMI-STRUCTURED DATA FROM TABLEAU

Unfortunately, Tableau does not automatically recognize the VARIANT data type, so it doesn't automatically create queries with the SQL extensions outlined above. This means we need to manually create the SQL necessary for accessing the data in it.

As outlined earlier in this paper, one way to access semi-structured data is to use custom SQL. The best practices identified there should be followed in this case—specifically, don't use a monolithic statement that joins across multiple tables. Instead, use a discrete statement to reference the key-value pairs from the VARIANT column, and then join that custom SQL "table" with the other regular tables. Also remember to set "assume referential integrity" so the Tableau query generator can cull tables when they are not required.

Example

To use the WEATHER_14_TOTAL table in the sample WEATHER schema, we can create a custom SQL statement.
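The custom SQL statement appears as a screenshot in the original whitepaper; based on the fields queried above, it would look something like the following sketch (the aliases are illustrative):

```sql
-- Sketch of a discrete custom SQL "table" for Tableau that flattens the
-- VARIANT column into ordinary, typed columns. Aliases are illustrative.
SELECT V:time::TIMESTAMP       AS TIME,
       V:city.name::VARCHAR    AS CITY,
       V:city.country::VARCHAR AS COUNTRY,
       (V:main.temp_max - 273.15)::FLOAT AS TEMP_MAX,
       (V:main.temp_min - 273.15)::FLOAT AS TEMP_MIN,
       V:weather[0].main::VARCHAR AS WEATHER_MAIN
FROM SNOWFLAKE_SAMPLE_DATA.WEATHER.WEATHER_14_TOTAL
```

Note that, per the earlier caution, the statement carries no terminating semi-colon when pasted into Tableau's custom SQL dialog.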
This data source can then be used to create vizzes and dashboards as if the user were connected to a traditional structured schema, with similar performance. In this example, the WEATHER_14_TOTAL table has ~66M records, and response times in Tableau using an X-SMALL warehouse were quite acceptable.

Of course, as outlined earlier, this SQL could also be used in a DB view, or in an initial SQL statement to create a temp table—the specific approach you use should be dictated by your needs. Alternatively, as a way to ensure governance and a consistent view of the JSON data, you can always publish the data source to Tableau Server (or Tableau Online) for reuse across multiple users and workbooks.

PASS-THROUGH SQL

Tableau provides a number of functions that allow us to pass raw SQL fragments through to the underlying data source for processing. These are often referred to as RAWSQL functions:
http://onlinehelp.tableau.com/current/pro/desktop/en-us/functions_functions_passthrough.html

We can use these to dynamically extract values from a VARIANT field. For example:

RAWSQL_STR("%1:city.country::string", [V])

This SQL fragment is passed through to Snowflake without alteration. Here is a query from a calculation based on RAWSQL statements (the fragments are highlighted):

SELECT ("V":city.country::string) AS "ID",
  AVG(("V":main.temp_max - 273.15)::float) AS "avg:temp_max:ok"
FROM "WEATHER"."WEATHER_14_TOTAL" "WEATHER_14_TOTAL"
GROUP BY 1

The advantage of this approach is that the Tableau author can extract specific VARIANT values without needing to create an entire custom SQL table statement.

ELT

While there are benefits to keeping your semi-structured data in VARIANT data types, there are also limitations, as outlined above. If your data schema is well defined and static, you may benefit from performing some ELT (extract-load-transform) transformations on your data to convert it into a traditional data schema.

There are multiple ways to do this. One is to use SQL statements directly in Snowflake (for more detail on the supported semi-structured data functions, see https://docs.snowflake.net/manuals/sql-reference/functions-semistructured.html#semi-structured-data-functions). If you need to conditionally separate the data into multiple tables, you can use Snowflake's multi-table insert statement to improve performance:
https://docs.snowflake.net/manuals/sql-reference/sql/insert-multi-table.html

Another approach is to use third-party ETL/ELT tools from Snowflake partners such as Informatica, Matillion, Alooma and Alteryx (to name a few). Note that if you are dealing with large volumes of data, you will want to use tools that can take advantage of in-DB processing. This means that the data processing will be pushed down into Snowflake, which greatly increases workflow performance by eliminating the need to transfer massive amounts of data out of Snowflake to the ETL tool, manipulate it locally, and then push it back into Snowflake.

Working with Snowflake Time Travel

Snowflake Time Travel is a powerful feature that lets users access historical data. This enables many powerful capabilities, but in the case of Tableau it allows users to query data in the past that has since been updated or deleted.

When any data manipulation operations are performed on a table (e.g., INSERT, UPDATE, DELETE, etc.), Snowflake retains previous versions of the table data for a defined period of time. This enables querying earlier versions of the data using an AT or BEFORE clause.

Example

The following query selects historical data from a table as of the date and time represented by the specified timestamp:

SELECT *
FROM my_table AT(timestamp => 'Mon, 01 May 2015 16:20:00 -0700'::timestamp);

The following query selects historical data from a table as of five minutes ago:

SELECT *
FROM my_table AT(offset => -60*5);

The following query selects historical data from a table up to, but not including, any changes made by the specified statement:

SELECT *
FROM my_table BEFORE(statement => '<statement_id>');

These features are included standard for all accounts (i.e., no additional licensing is required). However, standard Time Travel retention is one day; Extended Time Travel (up to 90 days) requires Snowflake Enterprise Edition (or higher). In addition, Time Travel requires additional data storage, which has associated fees.

ACCESSING TIME TRAVEL DATA FROM TABLEAU

As with semi-structured data, Tableau does not automatically understand how to create Time Travel queries in Snowflake, so we must use custom SQL. Again, we recommend you follow best practices by limiting the scope of your custom SQL to the table(s) affected.

Example

The following Tableau data source shows a copy of the TPCH_SF1 schema, with Time Travel enabled. We want to be able to query the LineItem table and use the data it contained at an arbitrary point in the past. To do this, we use a custom SQL statement for just the LineItem table.
In this case, we set an AT clause for the Lineitem table and use a parameter to pass in the Time Travel time value.

Using this query in a worksheet results in the following query running in Snowflake (the custom SQL is highlighted):

SELECT "Customers"."C_MKTSEGMENT" AS "C_MKTSEGMENT",
  COUNT(DISTINCT "Orders"."O_ORDERKEY") AS "ctd:O_ORDERKEY:ok",
  DATE_TRUNC('MONTH', "Orders"."O_ORDERDATE") AS "tmn:O_ORDERDATE:ok"
FROM (
  SELECT * FROM TPCH1.LINEITEM AT(TIMESTAMP => '2018-01-01 12:00:00.967'::TIMESTAMP_NTZ)
) "Lineitem"
  INNER JOIN "TPCH1"."ORDERS" "Orders" ON ("Lineitem"."L_ORDERKEY" = "Orders"."O_ORDERKEY")
  INNER JOIN "TPCH1"."CUSTOMER" "Customers" ON ("Orders"."O_CUSTKEY" = "Customers"."C_CUSTKEY")
GROUP BY 1, 3

Thanks to the parameter, it is easy for the user to select different as-at times. Again, this SQL could also be used in a DB view or in an initial SQL statement to create a temp table. The specific approach you use should be dictated by your needs.

Working with Snowflake Data Sharing

Snowflake Data Sharing makes it possible to directly share data in real time, and in a secure, governed and scalable way, from the Snowflake cloud data warehouse. Organizations can use it to share data internally across lines of business, or even externally with customers and partners, with almost no friction or effort. This is an entirely new way of connecting users to data that doesn't require transmission, and it significantly reduces the traditional pain points of storage duplication and latency.
Instead of transmitting data, Snowflake Data Sharing enables "data consumers" to directly access read-only copies of live data that is in a "data provider's" account.

The obvious advantage of this for Tableau users is the ability to query data shared by a provider and to know it is always up to date. No ongoing data administration is required by the consumer.

Example

To access shared data, you first view the available inbound shares:

show shares;

You can then create a database from the inbound share and apply appropriate privileges to the necessary roles:

// Create a database from the share
create or replace database SNOWFLAKE_SAMPLE_DATA from share SNOWFLAKE.SHARED_SAMPLES;

// Grant permissions to others
grant imported privileges on database SNOWFLAKE_SAMPLE_DATA to role public;

Users can then access the database objects as if they were local. The tables and views appear in Tableau the same as any other.

Implementing role-based security

A common requirement when accessing data is to implement role-based security, where the data returned in a Tableau viz is restricted either by row, by column or by both. There are multiple ways to achieve this with Tableau and Snowflake, and a key determinant of the right solution is how reusable you need it to be.

The following sections explain how to implement both Tableau-only and generic solutions.

SETTING UP THE DATA ACCESS RULES

To restrict the data available to a user, we need a table to define the user context and the data elements we wish to allow access to.
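In its simplest form, such an access-rule table is a mapping from user names to permitted keys. A minimal sketch (the table mirrors the REGION_SECURITY example that follows; the row values are illustrative):

```sql
-- A minimal access-rule table (a sketch; the row values are illustrative).
create table TPCH1.REGION_SECURITY (
    RS_USER      varchar,  -- the user context, e.g. a Tableau or Snowflake login
    RS_REGIONKEY number    -- key of a region this user is allowed to see
);

insert into TPCH1.REGION_SECURITY (RS_USER, RS_REGIONKEY)
values ('ALAN', 0), ('ALAN', 1), ('MARY', 2);
```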
Example

Using the TPCH_SF1 schema in the sample data, we create a simple REGION_SECURITY table that maps each user to the region key(s) they may see. We can then easily link this table via the REGION and NATION tables to the CUSTOMER table.

In Tableau we now see different results returned for each of the three RS_USER values.

If we needed to create access rules across multiple dimensions of our data (e.g. in addition to region, we also want to restrict access by product brand), then we would create multiple access-rule tables and connect them appropriately into our data schema.

PASSING IN THE USER CONTEXT

To make the above model secure, we need to enforce the restriction so that a viewing user can only see the result set permitted to their user context, i.e. where the RS_USER field is equal to the viewing user's username. We need to pass this username from Tableau into Snowflake, and the way we do this depends on whether we are creating a Tableau-only solution or something that is generic and will work for any data tool.

TABLEAU-ONLY SOLUTION

If we are building a Tableau-only solution, we have the ability to define some of our security logic at the Tableau layer. This can be an advantage in some scenarios, particularly when the viz author does not have permission to modify the database schema or add new objects (e.g. views), as would be the case if you were consuming a data share from another account.

Additionally, if the interactor users in Tableau (i.e. users accessing the viz via Tableau Server, as opposed to the author creating it in Tableau Desktop) do not use individual logins for Snowflake, then you must create the solution using this approach, because the Snowflake variable CURRENT_USER will be the same for all user sessions.

Example

To enforce the restriction, we create a data source filter in Tableau that restricts the result set to rows where the RS_USER field matches the Tableau function USERNAME(), which returns the login of the current user.
The result is only the data permitted for the viewing user. Note that there are no worksheet-level filters, as the filter is enforced on the data source.

The USERNAME() function is evaluated in Tableau and then pushed through to Snowflake as a literal, as you can see in the resulting query:

SELECT 'ALAN' AS "Calculation_38812177424095891107",
  "CustNation"."N_NAME" AS "N_NAME"
FROM "TPCH1"."NATION" "CustNation"
  INNER JOIN "TPCH1"."REGION" "CustRegion" ON ("CustNation"."N_REGIONKEY" = "CustRegion"."R_REGIONKEY")
  INNER JOIN "TPCH1"."REGION_SECURITY" "REGION_SECURITY" ON ("CustRegion"."R_REGIONKEY" = "REGION_SECURITY"."RS_REGIONKEY")
WHERE (UPPER("REGION_SECURITY"."RS_USER") = 'ALAN')
GROUP BY 2

To enforce this filter and prevent a workbook author from editing or removing it, you should publish the data source to Tableau Server and create the workbook using the published data source.

SOLUTION FOR ANY DATA TOOL

While the above solution is simple to implement, it only works for environments where all access is via Tableau, as the security logic is implemented in the Tableau layer, not the database. To make the solution truly robust and applicable for users connecting via any tool, we need to push the logic down into Snowflake. How we do this depends on whether each user has their own Snowflake login or whether all queries are sent to Snowflake via a common login.

Example

Continuing the example above, rather than using a data source filter in Tableau, we will create a view in Snowflake:

create or replace secure view "TPCH1"."SECURE_REGION_VW" as
select R_REGIONKEY, R_NAME, R_COMMENT, RS_USER
from "TPCH1"."REGION" "CustRegion"
inner join "TPCH1"."REGION_SECURITY" "REGION_SECURITY"
on ("CustRegion"."R_REGIONKEY" = "REGION_SECURITY"."RS_REGIONKEY")
where (UPPER("REGION_SECURITY"."RS_USER") = UPPER(CURRENT_USER));

In this example we are using the Snowflake variable CURRENT_USER, but we could equally use CURRENT_ROLE if your solution needed to be more scalable. Clearly this requires each viewing user to log in to Snowflake with their own credentials; you can enforce this when publishing the workbook to Tableau Server by setting the authentication type to "prompt user."

If you are using "embedded password," then CURRENT_USER and CURRENT_ROLE will be the same for all user sessions. In this case we need to pass in the viewing user name via an initial SQL block (for example, set viewing_user = [TableauServerUser];, using Tableau's initial SQL parameter substitution).

The view would then reference this variable:

create or replace secure view "TPCH1"."SECURE_REGION_VW" as
select R_REGIONKEY, R_NAME, R_COMMENT, RS_USER
from "TPCH1"."REGION" "CustRegion"
inner join "TPCH1"."REGION_SECURITY" "REGION_SECURITY"
on ("CustRegion"."R_REGIONKEY" = "REGION_SECURITY"."RS_REGIONKEY")
where (UPPER("REGION_SECURITY"."RS_USER") = UPPER($VIEWING_USER));

This then enforces the filter on the result set returned to Tableau, based on the Tableau user name.

There is one final step required to enforce the security rules specified in the SECURE_REGION_VW view.
We need to enforce referential integrity in the schema. Without this, if you don't use SECURE_REGION_VW in your query, join culling could drop this table from the query and the security would be bypassed.

If your data permits, you can create constraints between the tables in Snowflake (see https://docs.snowflake.net/manuals/sql-reference/sql/create-table-constraint.html for details), or you can simply uncheck "Assume Referential Integrity" in Tableau.

Using custom aggregations

Snowflake provides a number of custom aggregation functions outside the ANSI SQL specification. Some of these are specific to working with semi-structured data, while others are useful for approximating results (e.g. cardinality, similarity, frequency, etc.) when you are working over very large data volumes. A full list of the available aggregation functions can be found in the Snowflake online documentation: https://docs.snowflake.net/manuals/sql-reference/functions-aggregation.html

To use these functions in Tableau, you can leverage Tableau's pass-through functions: http://onlinehelp.tableau.com/current/pro/desktop/en-us/functions_functions_passthrough.html

Example

Snowflake provides a custom aggregation function, APPROX_COUNT_DISTINCT, which uses HyperLogLog to return an approximation of the distinct cardinality of a field. To use this function, we can create a calculated field that leverages the appropriate RAWSQL function. When we use this calculation in a Tableau viz, Tableau produces the following query in Snowflake:

SELECT COUNT(DISTINCT "Orders"."O_ORDERKEY") AS "ctd:O_ORDERKEY:ok",
  APPROX_COUNT_DISTINCT("Orders"."O_ORDERKEY") AS "usr:Calculation_1944218056180731900:ok",
  DATE_PART('YEAR', "Orders"."O_ORDERDATE")
  AS "yr:O_ORDERDATE:ok"
FROM "TPCH_SF1"."LINEITEM" "Lineitem"
  INNER JOIN "TPCH_SF1"."ORDERS" "Orders" ON ("Lineitem"."L_ORDERKEY" = "Orders"."O_ORDERKEY")
GROUP BY 3

Scaling Snowflake warehouses

Snowflake supports two ways to scale warehouses:

• Scale up by resizing a warehouse.
• Scale out by adding clusters to a warehouse (requires Snowflake Enterprise Edition or higher).

WAREHOUSE RESIZING IMPROVES PERFORMANCE

Resizing a warehouse generally improves query performance, particularly for larger, more complex queries. It can also help reduce the queuing that occurs if a warehouse does not have enough servers to process all the queries that are submitted concurrently. Note that warehouse resizing is not intended for handling concurrency issues; instead, use additional warehouses to handle the workload, or use a multi-cluster warehouse if this feature is available for your account.

The number of servers required to process a query depends on the size and complexity of the query. For the most part, queries scale linearly with warehouse size, particularly for larger, more complex queries:

• When considering factors that impact query processing, the overall data size of the tables being queried has more impact than the number of rows.
• Filtering in a query using predicates also has an impact on processing, as does the number of joins/tables in the query.

Snowflake supports resizing a warehouse at any time, even while running. If a query is running slowly and you have additional queries of similar size and complexity that you want to run on the same warehouse, you might choose to resize the warehouse while it is running.
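Resizing is a single statement, and it can be issued while the warehouse is running; a sketch, assuming a warehouse named tableau_wh:

```sql
-- Resize a running warehouse (a sketch; the warehouse name is illustrative).
-- Queries already executing keep their current servers; only queued and new
-- queries use the resized warehouse.
alter warehouse tableau_wh set warehouse_size = 'LARGE';
```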
However, note the following:

• As stated earlier about warehouse size, larger is not necessarily faster; for smaller, basic queries that are already executing quickly, you may not see any significant improvement after resizing.
• Resizing a running warehouse does not impact queries that are already being processed by the warehouse; the additional servers are only used for queued and new queries.
• Decreasing the size of a running warehouse removes servers from the warehouse. When the servers are removed, the cache associated with the servers is dropped, which can impact performance in the same way that suspending the warehouse can impact performance after it is resumed. Keep this in mind when choosing whether to decrease the size of a running warehouse or keep it at the current size. In other words, there is a trade-off between saving credits and maintaining the server cache.

Example

To demonstrate the scale-up capability of Snowflake, the following dashboard was created against the TPCH-SF10 sample data schema and published to Tableau Server. Note that this dashboard was intentionally designed to ensure multiple queries would be initiated in the underlying DBMS.

The data source for the workbook was also configured with an initial SQL statement to disable query result caching within Snowflake via the USE_CACHED_RESULT session parameter. Result caching (which we cover in more detail later) would make every query after the first one extremely fast, because the query results do not need to be recomputed and are just read from cache in < 1 sec.

TabJolt was then installed on a Windows PC to act as a load generator, invoking the dashboard with a :refresh=y parameter to prevent caching within Tableau Server.
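The initial SQL used to disable Snowflake's result cache for these tests might look like the following sketch (it runs once when Tableau opens the connection):

```sql
-- Initial SQL on the Tableau data source: disable Snowflake's result cache so
-- each test run actually recomputes its queries.
alter session set USE_CACHED_RESULT = false;
```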
A series of test runs were then performed, running the above dashboard against different Snowflake warehouse sizes from XS to XL. All tests were run for five minutes with a single simulated user using the InteractVizLoadTest script, and the total number of completed tests and average test response times were recorded.

As you can see, as we increase the size of the warehouse (from XS up to XL) the average query time decreases, because there are more compute resources available. The improvement tails off from L to XL as we reach a point where compute resources are no longer a bottleneck.

You can also see that in the first run of the query at each warehouse size (the orange dot in the chart below), data needs to be read from the underlying table storage into the warehouse cache. Subsequent runs for each warehouse size can then read data entirely from the warehouse cache, resulting in improved performance.

To achieve the best results, try to execute relatively homogeneous queries (size, complexity, data sets, etc.) on the same warehouse; executing queries of widely varying size and/or complexity on the same warehouse makes it more difficult to analyze warehouse load, which can make it more difficult to select the best size to match the size, composition and number of queries in your workload.

MULTI-CLUSTER WAREHOUSES IMPROVE CONCURRENCY

Multi-cluster warehouses enable you to scale warehouse resources to manage your user and query concurrency needs as they change, such as during peak and off hours.

By default, a warehouse consists of a single cluster of servers that determines the total resources available to the warehouse for executing queries. As queries are submitted to a warehouse, the warehouse allocates server resources to each query and begins executing the queries.
If resources are insufficient to execute all the queries submitted to the warehouse, Snowflake queues the additional queries until the necessary resources become available. With multi-cluster warehouses, Snowflake supports allocating, either statically or dynamically, a larger pool of resources to each warehouse.

When deciding whether to use multi-cluster warehouses, and when choosing the number of clusters per warehouse, consider the following:

• Unless you have a specific requirement for running in maximized mode, configure multi-cluster warehouses for auto-scaling. This enables Snowflake to automatically start and stop clusters as needed. Snowflake starts new clusters when it detects queries are queuing, and it stops clusters when it determines that fewer clusters are sufficient to run the current query load.
• There are two scaling policies available:
  - Standard scaling minimizes queuing by starting new clusters as soon as queries start to queue. Clusters shut down after 2-3 successful load-redistribution checks, which are performed at one-minute intervals.
  - Economy scaling conserves credits by keeping running clusters fully loaded rather than starting additional clusters.

In our tests, as we scale up the warehouse size (XS -> S -> M), the total number of tests completed increases by 81% and the average test response time decreases by 44%. This is unsurprising, as we have increased the number of servers running queries from one to two to four. Check out the chart on the left.

However, when we keep the number of servers constant by running four XS clusters, two S clusters or a single M cluster (each of which has four servers), we see that the multi-cluster warehouse approach has significantly better scalability characteristics: 55% more completed tests, with a faster average test response time. Check out the chart on the right.
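Creating an auto-scaling multi-cluster warehouse like the ones tested above is also a single statement; a sketch, with illustrative name and limits:

```sql
-- Auto-scaling multi-cluster warehouse (a sketch; requires Enterprise Edition
-- or higher; the name, cluster counts and timeouts are illustrative).
create warehouse tableau_mc_wh
  warehouse_size    = 'XSMALL'
  min_cluster_count = 1
  max_cluster_count = 4
  scaling_policy    = 'STANDARD'
  auto_suspend      = 300
  auto_resume       = true;
```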
Caching

The combination of Tableau and Snowflake has caching at multiple levels. The more effective we can make our caching, the more efficient our environment will be: queries and workbooks will return results faster, and less load will be applied to the server layers, leading to greater scalability.

TABLEAU CACHING

Tableau has caching in both Tableau Desktop and Tableau Server, which can significantly reduce the amount of rendering, calculation and querying needed to display views to users. For Tableau Server in particular, caching is critical for achieving concurrent user scalability.

PRESENTATION LAYER

Caching at the presentation layer is relevant for Tableau Server only. In Tableau Server, multiple end users view and interact with views via a browser or mobile device. Depending on the capability of the browser and the complexity of the view, rendering will be done either on the server or on the client.

BROWSER

In the case of client-side rendering, the client browser downloads a JavaScript "viz client" that can render a Tableau viz. It then requests the initial data package for the view from the server, called the bootstrap response. This package includes the layout of the view and the data for the marks to be displayed. Together, they are called the view model. With this data, the viz client can then draw the view locally in the client's browser.

As the user explores the view, simple interactions (e.g. tooltips, highlighting and selecting marks) can be handled locally by the viz client using the local data cache. This means no communication with the server is required, and the result is a fast screen update. This also reduces the compute load on the server, which helps with Tableau Server scalability. More complex interactions (e.g. changing a parameter or filter) cannot be handled locally, so the update is sent to the server and a new view model is returned to update the local data cache.
Not all vizzes can be rendered client-side. There are some complexities that will force Tableau to use server-side rendering:

• if the user's browser does not support the HTML 5 <canvas> element;
• if a view uses the polygon mark type or the page history feature; or
• if a view is determined to be overly complex. The complexity of the viz is determined by the number of marks, rows, columns and more.

TILES

In the case of server-side rendering, the view is rendered on the server as a set of static image "tiles." These tiles are sent to the browser, and a much simpler viz client assembles them into the final view. The viz client monitors the session for user interaction (e.g. hovering for a tooltip, selecting marks, highlighting, interacting with filters, etc.), and if anything needs to be drawn, it sends a request back to the server for new tiles. No data is stored on the client, so all interactions require a server round trip.

To help with performance, the tile images are persisted on Tableau Server in the tile cache. In the event the same tile is requested, it can be served from the cache instead of being re-rendered. Note that the tile cache can be used only if the subsequent request is for the exact same view. This means requests for the same view from different-sized browser windows can result in different tiles being rendered. One of the very easy performance optimizations you can make is to set your dashboards to fixed size instead of automatic. This means the view requests will always be the same size, irrespective of the browser window dimensions, allowing a better hit rate for the tile cache.

SESSION AND BOOTSTRAP RESPONSE

When rendering a viz,
we need the view layout instructions and mark data, referred to as the view model, before we can do anything. The VizQL Server generates the model based on several factors, including the requesting user credentials and the size of the viewing window. It computes the view of the viz (i.e. relevant header and axis layouts, label positions, legends, filters, marks and such) and then sends it to the renderer (either the browser for client-side rendering, or the VizQL Server for server-side rendering).

The VizQL Server on Tableau Server maintains a copy of the view model for each user session. This is initially the default view of the viz, but as users interact with the view in their browser, it is updated to reflect the view state: highlights, filters, selections, etc. A dashboard can reference multiple view models, one for each worksheet used. The model includes the results from local calculations (e.g. table calcs, reference lines, forecasts, clustering, etc.) and the visual layout (how many rows/columns to display for small multiples and crosstabs, the number and interval of axis ticks/grid lines to draw, the number and location of mark labels to be shown, etc.).

Generating view models can be computationally intensive, so we want to avoid this whenever possible. The initial models for views, called bootstrap responses, are persisted on Tableau Server by the Cache Server. When a request comes in for a view, we check the cache to see if the bootstrap response already exists; if it does, we serve it up without having to recompute and reserialize it.

Not all views can use the bootstrap response cache, though. Using the following capabilities will prevent the bootstrap response from being cached:

• if a view uses relative date filters;
• if a view uses user-specific filters; or
• if a view uses published Data Server connections.
Like the tile cache, the bootstrap response is specific to the view size, so requests for the same view from browser windows with different sizes will produce different bootstrap responses. Once again, an easy performance optimization is to set your dashboards to fixed size instead of automatic. This means the view requests will always be the same size, irrespective of the browser window dimensions, allowing a better hit rate for the bootstrap response cache.

Maintaining the view model is important, as it means the VizQL Server doesn't need to recompute the view state every time the view is rendered. The models are created and cached by the VizQL Server, and it does its best to share results across user sessions where possible. The key factors in whether a view model can be shared are:

• The size of the viz display area: models can be shared only across sessions where the size of the viz is the same. As already stated above, setting your dashboards to a fixed size will benefit the model cache in much the same way as it benefits the bootstrap response and tile caches, allowing greater reuse and lowering the workload on the server.
• Whether the selections/filters match: if the user is changing filters, parameters, doing drill-down/up, etc., the model is shared only with sessions that have a matching view state. Try not to publish workbooks with "show selections" checked, as this can reduce the likelihood that different sessions will match.
• The credentials used to connect to the data source: if users are prompted for credentials to connect to the data source, then the model can be shared only across user sessions with the same credentials. Use this capability with caution, as it can reduce the effectiveness of the model cache.
• Whether user filtering is used: if the workbook contains user filters or has calculations containing functions such as USERNAME() or ISMEMBEROF(), then the model is not shared with any other user sessions.
Use these functions with caution, as they can significantly reduce the effectiveness of the model cache.

The session models are not persistent and, by default, expire after 30 minutes of inactivity (this can be changed) in order to recycle memory on the server. If you find that your sessions are expiring before you are truly finished with a view, consider increasing the VizQL session timeout setting.

ANALYTICS LAYER

The analytics layer covers the data manipulations and calculations performed on the underlying data. The objective of caching at this level is to avoid unnecessary computations and to reduce the number of queries sent to the underlying data sources.

The caching in this layer applies to both Tableau Desktop and Tableau Server; however, there are differences in the caching mechanisms between the two tools. These differences are in line with the focus on a single-user (Desktop) versus multi-user (Server) environment.

ABSTRACT QUERY

When a user interacts with a view, many physical queries may be generated. Rather than blindly executing them all, Tableau groups them into a query batch and decompiles them to see if there are any optimizations that can be achieved. This could involve removing duplicate queries, combining multiple similar queries into a single statement, or even eliminating queries where the results of one can be derived from the results of another. Tableau maintains a cache of query results indexed by the logical structure of the query, and this is checked to see if it is necessary to proceed further to the native query.

Tableau incorporates logic to not cache the results of:

• queries that return large result sets (too big for the cache);
• queries that execute quickly (faster to run the query than check the cache);
• queries that use user filters; or
• queries that use relative date filters.
DATA LAYER

The data layer addresses the native connection between Tableau and the data sources. Caching at this level is about persisting the results of queries for reuse in the future. It also determines the nature of the connection to the data source: whether we are using live connections or the Tableau data engine (replaced with Hyper in Tableau 10.5 and later).

NATIVE QUERY

The native query cache is similar to the abstract query cache, but instead of being indexed by the logical query structure, it is keyed by the actual query statement. Multiple abstract queries can resolve to the same native query. Like the abstract query cache, Tableau incorporates logic to not cache the results of:

• queries that return large result sets (too big for the cache);
• queries that execute quickly (faster to run the query than check the cache);
• queries that have user filters; or
• queries that use relative date filters.

SNOWFLAKE CACHING

RESULT CACHING

When a query is executed, the result is persisted in Snowflake for a period of time (currently 24 hours), at the end of which the result is purged from the system. A persisted result is available for reuse by another query, as long as the user executing the query has the necessary access privileges for the table(s) used in the query and all of the following conditions have been met:
Each time the persisted result fora query is reused, Snowflake resets the 24-hour retention period for the result, up to a maximum of 31 dave from the dete anc tine that the query was frst evecuis. After 21 days, the result purees, snd the next time the query s submitted, a new result is returned and persisted. Result reuse is controled by the session osremsier USE CACHED RESULT. By Gefeult the psremeter is enabled but can be overridden at the account, user and session level if desired. Note that the result cache in Snowflake will contain very similar data as the native query cache in “Tablesu; however, they operate indevendently WAREHOUSE CACHING. Each warehouse, when running, maintains a cache of table data accessed as queries are processed by the warehouse, This enables improved performance for subsequent queries if they are able to read from the cache instead of from the table(s)in the query. The size ofthe cache is determined by the numberof servers in the warehouse: i. the larger the warehouse (and, therefore, the number of servers in the warehouse, the larger the cache. Tris cache is dropped when the warehouse is suspended, which may result in slower inital performance fer some queries after the wershouse is resumed. As the resumed warshouse runs and processes more queries, the cache Is rebut, and queries that are able to take acvantage of tne cache: ‘will experience improved performance. Keep this in ming when deciding whether to suspend 2 warehouse or leave it running, In other words consider the trade-off betieen saving credits by suspending 2 warehouse versus maintaining the cache of data from previous queries to help with performance, {Pevweeon cuves west PaACTCEs FOR USING TABLEAU WITH SHOWFLAKE. wb ys snowflake Other performance considerations CONSTRAINTS Constraints define integrity and consistency rules for data stared in tables. 
Snowflake provides support for constraints as defined in the ANSI SQL standard, as well as some extensions for compatibility with other databases, such as Oracle. Constraints are provided primarily for data modeling purposes and compatibility with other databases, as well as to support client tools that utilize constraints.

It is recommended that you define constraints between tables you intend to use in Tableau. Tableau uses constraints to perform join culling (join elimination), which can improve the performance of generated queries. If you cannot define constraints, be sure to set "Assume Referential Integrity" in the Tableau data source to allow the query generator to cull unneeded joins.

EFFECT OF READ-ONLY PERMISSIONS

Having the ability to create temp tables is important for Tableau users: when available, they are used in multiple situations (e.g. complex filters, actions, sets, etc.). When they are unavailable (e.g. if you are working with a shared database via a read-only account), Tableau will try to use alternate query structures, but these can be less efficient and in extreme cases can cause errors.

Example

Consider the following visualization: a scatter plot showing Order Total Price vs.
Supply Cost for each Customer Name in the TPCH-SF1 sample data set. We have lassoed a number of the points and created a set, which is then used on the color shelf to show IN/OUT.

When we create the set, Tableau creates a temp table (in the background) to hold the list of dimensions specified in the set (in this case, the customer name):

CREATE LOCAL TEMPORARY TABLE "#Tableau_37_2_Filter" (
  "X_C_NAME" VARCHAR NOT NULL,
  "X_Tableau_join_flag" BIGINT NOT NULL
) ON COMMIT PRESERVE ROWS

We then populate this temp table with the selected values:

INSERT INTO "#Tableau_37_2_Filter" ("X_C_NAME", "X_Tableau_join_flag")
VALUES (?, ?)

Finally, the following query is then executed to generate the required result set:

SELECT "Customers"."C_NAME" AS "C_NAME",
  (CASE WHEN (CASE WHEN (NOT ("Filter_1"."X_Tableau_join_flag" IS NULL)) THEN 1 ELSE 0 END) = 1 THEN 1 ELSE 0 END) AS "InOut(Set_1)",
  SUM("Orders"."O_TOTALPRICE") AS "sum:O_TOTALPRICE:ok",
  SUM("PARTSUPP"."PS_SUPPLYCOST") AS "sum:PS_SUPPLYCOST:ok"
FROM "TPCH1"."LINEITEM" "Lineitem"
  INNER JOIN "TPCH1"."ORDERS" "Orders" ON ("Lineitem"."L_ORDERKEY" = "Orders"."O_ORDERKEY")
  INNER JOIN "TPCH1"."CUSTOMER" "Customers" ON ("Orders"."O_CUSTKEY" = "Customers"."C_CUSTKEY")
  INNER JOIN "TPCH1"."PART" "Parts" ON ("Lineitem"."L_PARTKEY" = "Parts"."P_PARTKEY")
  INNER JOIN "TPCH1"."PARTSUPP" "PARTSUPP" ON ("Parts"."P_PARTKEY" = "PARTSUPP"."PS_PARTKEY")
  LEFT JOIN "#Tableau_37_2_Filter" "Filter_1" ON ("Customers"."C_NAME" = "Filter_1"."X_C_NAME")
GROUP BY 1, 2

If a user does not have the right to create temporary tables in the target database or schema, Tableau will instead create a WHERE ... IN clause in the query to specify the set members:

SELECT "Customers"."C_NAME" AS "C_NAME"
FROM "TPCH_SF1"."CUSTOMER" "Customers"
WHERE ("Customers"."C_NAME" IN ('Customer#000000388', 'Customer#000000412', 'Customer#000000679', 'Customer#000001522', 'Customer#000001948', 'Customer#000001995', 'Customer#000002266', 'Customer#000002548', 'Customer#000003019', 'Customer#000003433', 'Customer#000003451', 'Customer#000147362', 'Customer#000249548'))
GROUP BY 1
ORDER BY 1 ASC

However, this approach is subject to the limitations of the query parser, and you will encounter errors if you try to create a set with more than 16,384 marks:

SQL compilation error: error line 3 at position 31 maximum number of expressions in a list exceeded, expected at most 16,384, got 40,483

Unable to properly calculate the domain for the field 'Set 1'. Displayed data may be incorrect.

This error could be overcome by converting the data connection from a live connection to an extract; however, this is a viable approach only if the data volumes are not massive.

Measuring performance

IN TABLEAU

PERFORMANCE RECORDER

The first place you should look is the Performance Recording feature of Tableau Desktop and Server. You enable this feature from the Help menu.

Start performance recording, then open your workbook. Interact with it as if you were an end user, and when you feel you have gathered enough data, go back into the Help menu and stop recording.
Another Tableau Desktop window will open at this point with the data captured.

You can now identify the actions in the workbook that take the most time; for example, in the above image, the selected query takes 0.8 seconds to complete. Clicking on the bar shows the text of the query being executed. As the output of the performance recorder is a Tableau workbook, you can create additional views to explore this information in other ways.

You should use this information to identify those sections of a workbook that are the best candidates for review; i.e., where you get the best improvement for the time you spend. More information on interpreting these recordings can be found by clicking the following link: https://tabsoft.co/1RdF420

LOGS

DESKTOP LOGS

In Tableau, you can find detailed information on what Tableau is doing in the log files. The default location for Desktop is C:\Users\\Documents\My Tableau Repository\Logs\log.txt.

This file is quite verbose and is written as JSON-encoded text, but if you search for "begin-query" or "end-query" you can find the query string being passed to the data source. Looking at the "end-query" log record will also show you the time the query took to run and the number of records that were returned to Tableau:

{"ts":"2015-05-24T12:25:41.226","pid":6460,"tid":"1674","sev":"info","k":"end-query","v":{"protocol":"4308fb0","query":"SELECT [DimProductCategory].[ProductCategoryName] AS [none:ProductCategoryName:nk],\n [DimProductSubcategory].[ProductSubcategoryName] AS [none:ProductSubcategoryName:nk],\n SUM(CAST(([FactSales].[ReturnQuantity]) as BIGINT)) AS [sum:ReturnQuantity:ok],\n SUM([FactSales].[SalesAmount]) AS [sum:SalesAmount:ok]\nFROM [dbo].[FactSales] [FactSales]\n INNER JOIN [dbo].[DimProduct] [DimProduct] ON ([FactSales].[ProductKey] = [DimProduct].[ProductKey])\n INNER JOIN [dbo].[DimProductSubcategory] [DimProductSubcategory] ON ([DimProduct].[ProductSubcategoryKey] = [DimProductSubcategory].[ProductSubcategoryKey])\n INNER JOIN [dbo].[DimProductCategory] [DimProductCategory] ON ([DimProductSubcategory].[ProductCategoryKey] = [DimProductCategory].[ProductCategoryKey])\nGROUP BY [DimProductCategory].[ProductCategoryName],\n [DimProductSubcategory].[ProductSubcategoryName]","rows":32,"elapsed":0.951}}

Since Tableau 10.1, JSON files are a supported data source, so you can also open the log files with Tableau for easier analysis. You can analyze the time it takes for each event, what queries are being run and how much data they are returning.

Another useful tool for analyzing Tableau Desktop performance is the Tableau Log Viewer (TLV). This cross-platform tool allows you to easily view, filter and search the records from Tableau's log files. A powerful feature of TLV is that it supports live monitoring of Tableau Desktop, so you can see in real time the log entries created by your actions in Tableau.

This tool is made available as-is by Tableau and can be downloaded from GitHub: http://bit.ly/2rQJecu

Additionally, you could load the JSON logs into Snowflake and take advantage of the VARIANT data type to help with your analysis.

SERVER LOGS

If you are looking at Tableau Server, the logs are in C:\ProgramData\Tableau\Tableau Server\data\tabsvc\\Logs. In most cases, you will be interested in the VizQL Server log files. If you do not have console access to the Tableau Server machine, you can download a ZIP archive of the log files from the Tableau Server status page.

Just like the Tableau Desktop log files, these are JSON-formatted text files, so you can open them in Tableau Desktop or can read them in their raw form.
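As a sketch of the Snowflake-side approach suggested above—assuming the log lines have been loaded into a hypothetical single-column VARIANT table named tableau_logs—the slowest queries can be pulled straight out of the logs. The field names follow the common layout of Tableau's end-query records; verify them against your Tableau version:

```sql
-- Hypothetical table holding one JSON log record per row
CREATE OR REPLACE TABLE tableau_logs (record VARIANT);

-- A load would typically look like:
--   COPY INTO tableau_logs FROM @log_stage FILE_FORMAT = (TYPE = 'JSON');

SELECT record:ts::TIMESTAMP    AS event_time,
       record:v:query::STRING  AS query_text,
       record:v:rows::INTEGER  AS rows_returned,
       record:v:elapsed::FLOAT AS elapsed_seconds
FROM tableau_logs
WHERE record:k::STRING = 'end-query'   -- only completed-query events
ORDER BY elapsed_seconds DESC
LIMIT 20;
```

Because VARIANT fields are queried lazily, this works without defining a schema for the log records up front.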
However, because the information about user sessions and actions is spread across multiple files (corresponding to the various services of Tableau Server), you may prefer to use a powerful tool called Logshark to analyze Server logs. Logshark is a command-line utility that you can run against Tableau logs to generate a set of workbooks that provide insights into system performance, content usage and error investigation. You can use Logshark to visualize, investigate and solve issues with Tableau at your convenience.

You can find out more about Logshark at the following link: http://tabsoft.co/2rQKBoS

And you can download it from GitHub: http://bit.ly/2rQS0qn

Additionally, you could load the JSON logs into Snowflake and take advantage of the VARIANT data type to help with your analysis.

SERVER PERFORMANCE VIEWS

Tableau Server comes with several views to help administrators monitor activity on Tableau Server. The views are located in the Analysis table on the server's Maintenance page. They can provide valuable information on the performance of individual workbooks, helping you focus your attention on those that are performing poorly.

More information on these views can be found at the following link: http://tabsoft.co/1RjCCL2

Additionally, custom administrative views can be created by connecting to the PostgreSQL database that makes up part of the Tableau repository. Instructions can be found here: http://tabsoft.co/1RjCACR

TABMON

TabMon is an open source cluster monitor for Tableau Server that allows you to collect performance statistics over time. TabMon is community-supported, and the full source code is released under the MIT open source license.
TabMon records system health and application metrics out of the box. It collects built-in metrics like Windows Perfmon, Java Health and Java MBean (JMX) counters on Tableau Server machines across a network. You can use TabMon to monitor physical (CPU, RAM), network and hard-disk usage. You can track cache-hit ratio, request latency, active sessions and much more. It displays the data in a clean, unified structure, making it easy to visualize the data in Tableau Desktop.

TabMon gives you full control over which metrics to collect and which machines to monitor, no scripting or coding required. All you need to know is the machine and the metric name. TabMon can run both remotely and independently of your cluster. You can monitor, aggregate and analyze the health of your cluster(s) from any computer on your network with almost no added load to your production machines.

You can find more information on TabMon here: http://bit.ly/1ULFe

TABJOLT

While not exactly a performance monitoring tool, TabJolt is very useful to help you determine if your problems are related to platform capacity issues. TabJolt is particularly useful for testing Snowflake multi-cluster warehouse scenarios when you want to simulate a load generated by multiple users.

TabJolt is a "point-and-run" load and performance testing tool specifically designed to work easily with Tableau Server. Unlike traditional load-testing tools, TabJolt can automatically drive load against your Tableau Server without script development or maintenance. Because TabJolt is aware of Tableau's presentation model, it can automatically load visualizations and interpret possible interactions during execution. This allows you to just point TabJolt to one or more workbooks on your server and automatically load and execute interactions on the Tableau views.
TabJolt also collects key metrics including average response time, throughput and 95th percentile response time, and captures Windows performance metrics for correlation.

Of course, even with TabJolt, users should have sufficient knowledge of Tableau Server's architecture. Treating Tableau Server as a black box for load testing is not recommended and will likely yield results that aren't in line with your expectations.

You can find more information on TabJolt here: http://bit.ly/1ULEgi

IN SNOWFLAKE

Snowflake has several built-in features that allow you to monitor performance of the queries generated by Tableau and to link them back to specific workbooks, data connections and users.

SNOWFLAKE INFORMATION SCHEMA

The Snowflake Information Schema (aka "Data Dictionary") is a set of system-defined views and functions that provide extensive information about the objects created in your account. The Snowflake Information Schema is based on the SQL-92 ANSI Standard Information Schema, but also includes views and functions that are specific to Snowflake.

SNOWFLAKE QUERY HISTORY

Within the Information Schema is a set of tables and table functions that can be used to retrieve information on queries that have been run during the past 7 days: https://docs.snowflake.net/manuals/sql-reference/functions/query_history

These functions can return detailed information on the execution timing and profile of queries, which can then be used to determine if your warehouse sizes are appropriate. You can see information on the start time, end time, query duration, number of rows returned and the data volume scanned. You can also see the amount of time queries spend queued vs. executing, which is useful information when considering adding multi-cluster scaling to your warehouses.
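For instance, a minimal sketch of pulling recent timings from the QUERY_HISTORY table function (the column list is abbreviated; run this with a role that can see the relevant queries):

```sql
SELECT query_id,
       query_text,
       warehouse_name,
       start_time,
       total_elapsed_time,    -- milliseconds, end to end
       queued_overload_time,  -- time spent queued due to warehouse load
       rows_produced
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY(RESULT_LIMIT => 100))
ORDER BY total_elapsed_time DESC;
```

Sorting by elapsed time (or by the queued columns) quickly surfaces candidates for warehouse resizing or multi-cluster scaling.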
A useful way to access this information is through the query_history table functions via Tableau.

The query history for an account can also be accessed from the Snowflake web interface. This page displays all queries executed in the last 14 days, including queries executed by you in previous sessions and queries executed by other users. This view shows query duration broken down by queueing, compilation and execution, and is also useful for identifying queries that are resolved through the result cache.

By clicking on the hyperlink in the Query ID column, you can drill through to more detailed statistics for the selected query. This is also where you have access to the query profile.

SNOWFLAKE QUERY PROFILE

Query Profile is a powerful tool for understanding the mechanics of queries. Implemented as a page in the Snowflake web interface, it provides a graphical tree view, as well as detailed execution statistics, for the major components of the processing plan for a query. It is designed to help troubleshoot issues in SQL query expressions that commonly cause performance bottlenecks.

To access Query Profile:

• Go to the Worksheet or History page in the web interface.
• Click on the Query ID for a completed query.
• In the query details, click the Profile tab.

To help you analyze query performance, the detail panel provides two classes of profiling information:

EXECUTION TIME

Execution time provides information about "where the time was spent" during the processing of a query.
Time spent can be broken down into the following categories, displayed in the following order:

• Processing—time spent on data processing by the CPU
• Local Disk IO—time when the processing was blocked by local disk access
• Remote Disk IO—time when the processing was blocked by remote disk access
• Network Communication—time when the processing was waiting for network data transfer
• Synchronization—various synchronization activities between participating processes
• Initialization—time spent setting up the query processing

STATISTICS

A major source of information provided in the detail panel consists of various statistics, grouped in the following sections:

• IO—information about the input-output operations performed during the query:
  – SCAN PROGRESS—the percentage of data scanned for a given table so far
  – BYTES SCANNED—the number of bytes scanned so far
  – PERCENTAGE SCANNED FROM CACHE—the percentage of data scanned from the local disk cache
  – BYTES WRITTEN—bytes written (e.g., when loading into a table)
  – BYTES WRITTEN TO RESULT—bytes written to a result object
  – BYTES READ FROM RESULT—bytes read from a result object
  – EXTERNAL BYTES SCANNED—bytes read from an external object (e.g., a stage)

• DML—statistics for Data Manipulation Language (DML) queries:
  – NUMBER OF ROWS INSERTED—number of rows inserted into a table (or tables)
  – NUMBER OF ROWS UPDATED—number of rows updated in a table
  – NUMBER OF ROWS DELETED—number of rows deleted from a table
  – NUMBER OF ROWS UNLOADED—number of rows unloaded during data export
  – NUMBER OF BYTES DELETED—number of bytes deleted from a table

• Pruning—information on the effects of table pruning:
  – PARTITIONS SCANNED—number of partitions scanned so far
  – PARTITIONS TOTAL—total number of partitions in a given table

• Spilling—information about disk usage for operations where intermediate results do not fit in memory:
  – BYTES SPILLED TO LOCAL STORAGE—volume of data spilled to local disk
  – BYTES SPILLED TO REMOTE STORAGE—volume of data spilled to remote disk

• Network—network communication:
  – BYTES SENT OVER THE NETWORK—amount of data sent over the network

This information can help you identify common problems that occur with SQL queries.

"EXPLODING" JOINS

Two common mistakes users make are joining tables without providing a join condition (resulting in a "Cartesian product") and getting the join condition incorrect (resulting in records from one table matching multiple records from another table). For such queries, the Join operator produces significantly (often by orders of magnitude) more tuples than it consumes. This can be observed by looking at the number of records produced by a Join operator, and typically is also reflected in the Join operator consuming a lot of time.

The following example shows input in the hundreds of records but output in the hundreds of thousands:

select t1.c1, t2.c2
from t1
join t2 on t1.c1 = t2.c1 and t1.c2 = t2.c2;

The recommended action for this problem is to review the join conditions defined in the Tableau data connection screen and correct any omissions.

QUERIES TOO LARGE TO FIT IN MEMORY

For some operations (e.g., duplicate elimination for a huge data set), the amount of memory available for the servers used to execute the operation might not be sufficient to hold intermediate results. As a result, the query processing engine will start spilling the data to a local disk. If the local disk space is insufficient, the spilled data is then saved to remote disks.

This spilling can have a profound effect on query performance (especially if data is spilled to a remote disk). To alleviate this, we recommend:

• Using a larger warehouse (effectively increasing the available memory/local disk space for the operation), and/or
• Processing data in smaller batches

INEFFICIENT PRUNING

Snowflake collects rich statistics on data, allowing it to avoid reading unneeded parts of a table based on the query filters. However, for this to have an effect, the data storage order needs to be correlated with the query filter attributes.

The efficiency of pruning can be observed by comparing the Partitions scanned and Partitions total statistics in the TableScan operators. If the former is a small fraction of the latter, pruning is efficient; if not, the pruning did not have an effect. Of course, pruning can help only for queries that actually filter out a significant amount of data.

If the pruning statistics do not show data reduction, but there is a Filter operator above TableScan that filters out a number of records, this might signal that a different data organization might be beneficial for this query.

LINKING PERFORMANCE DATA BETWEEN TABLEAU AND SNOWFLAKE

One of the challenges users will face when examining performance is connecting queries that run in Snowflake with the workbooks and user sessions in Tableau that generated them. One useful way to do this is to set the QUERY_TAG session variable using an initial SQL statement.

Tableau provides parameters that can be inserted into initial SQL statements that will pass through attributes from the Tableau environment into Snowflake:

• TableauServerUser—The user name of the current server user. Use when setting up impersonation on the server. Returns an empty string if the user is not signed in to Tableau Server.
• TableauServerUserFull—The user name and domain of the current server user. Use when setting up impersonation on the server. Returns an empty string if the user is not signed in to Tableau Server.
• TableauApp—The name of the Tableau application (e.g., Tableau Desktop Professional, Tableau Server).
• TableauVersion—The version of the Tableau application.
• WorkbookName—The name of the Tableau workbook (e.g., Financial Analysis). Use only in workbooks with an embedded data source.

If you use the TableauServerUser or TableauServerUserFull parameter in an initial SQL statement, you will create a dedicated connection in Tableau that can't be shared with other users. This will also restrict cache sharing, which can enhance security but may slow performance.

For more detailed information, see the Tableau documentation here: http://onlinehelp.tableau.com/current/pro/des...initialsql.html

Example

Using an initial SQL block, the following statement is used to set the QUERY_TAG session variable:

ALTER SESSION SET QUERY_TAG = '[WorkbookName];[TableauApp];[TableauServerUser]';

The resulting value is then available from the Query History table function and can be used to attribute queries to specific workbooks and users.

Find out more

Snowflake is the only data warehouse built for the cloud, enabling the data-driven enterprise with instant elasticity, secure data sharing and per-second pricing. Snowflake combines the power of data warehousing, the flexibility of big data platforms and the elasticity of the cloud at a fraction of the cost of traditional solutions. Snowflake: Your data, no limits. Find out more at snowflake.net.
