
Is your data source up to it?

The bottom line is that a Tableau viz will only ever be as fast as the underlying
data source. It won't magically turbo-boost an ageing relational database server or put a
rocket up your warehouse's derriere. Check your views and queries in the source first and,
if necessary, extract the data to a Tableau Data Extract (.tde). There's much more info
about why you should consider this option in this great article by Tom Brown.

Avoid using custom SQL

When you connect to multiple tables in a database, Tableau writes a query that is
optimized for the data source. Writing custom SQL instead can cause Tableau's
connection to slow down.

Optimize calculated fields

Here are a few best-practice suggestions from Tableau programming wizards. It's
unlikely that any one of these will suddenly solve all the problems in a slow viz, but every
little helps.

Avoid string calculations if you can; they take much more computing time than
mathematical operations.

Correction: in Tableau 7, both IF variants evaluate both sides of the conditional
result, i.e. they compute the ELSE side even when the condition is true and the THEN
side even when it is false. In Tableau 8 this is fixed, so IF should in fact be faster than
IIF. Apologies to anyone who changed anything based on this tip, and thanks to JD and
JM for pointing this out.

Avoid lots of CASE statements

Use WINDOW_SUM rather than TOTAL

Use MIN() rather than ATTR() when aggregating a dimension

For nested calculations, create the calculated fields separately and then
combine them in additional calculated fields.

If you use an extract, push row-level calculations down to the source by materializing the calculations when you optimize the extract.

Quick filters
Too many quick filters will slow you down, particularly if you set them to use Only Relevant
Values and you've got lots of discrete lists.

Use the right granular level of data

Suppose your database contains 5 years of data at a transactional level.
That will be messy, and Tableau Desktop cannot usefully present all of that detail at once.
Pre-aggregate your data and roll it up to days or months.
If you do want to visualize the data at the individual-transaction level, connect
to your data with a context filter so that only a slice of the data is visualized (e.g. the most recent
3 months or quarter).
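As a sketch of that pre-aggregation step, assuming made-up transaction rows (a real pipeline would do this in the database or an ETL tool before Tableau connects):

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction-level rows: (transaction date, amount).
transactions = [
    (date(2015, 1, 3), 120.0),
    (date(2015, 1, 17), 80.0),
    (date(2015, 2, 5), 200.0),
]

# Roll the transactions up to one row per month before handing them to Tableau.
monthly = defaultdict(float)
for txn_date, amount in transactions:
    monthly[(txn_date.year, txn_date.month)] += amount

print(sorted(monthly.items()))  # [((2015, 1), 200.0), ((2015, 2), 200.0)]
```

Three transaction rows collapse to two monthly rows; on 5 years of real data the reduction is what keeps the viz responsive.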
If a workbook works well in Tableau Desktop but is slow when viewed on Server:

Make sure you install Server on a 64-bit OS!

The server should be dedicated to Tableau; make sure it's not competing for
resources with anything else.

Tableau Server can run on fairly modest hardware, but of course, more is always better.

Ensure that you're caching the data if you don't need live data.

Schedule data source refreshes overnight or at times when users are not logged in.

Drop the timeout for inactive sessions down from the default 30 minutes

Identify usage patterns and try some demand management if there are very high
peak times

Tweak the number of concurrent processes depending on the usage of the
server. For example, if things slow down when lots of users are logged in, increase the
number of VizQL processes. There is a very good article on Tableau's website that
describes the different configuration options here.

When publishing, don't use the Show sheets as tabs option.

You are using a 64-bit OS, aren't you?

Whats a Tableau Extract?

An extract in Tableau is a special type of database, optimised for use by Tableau. It's
super fast, portable, and a great way to handle large data sets. But how do you create
one, and what creative ways are there to make use of them?

Firstly, how do you create an extract?

This bit is simple, assuming you are using Tableau Desktop. After you have connected
to your data, go to the DATA menu and choose EXTRACT DATA, then accept the
defaults on the dialog box (although more on this later). Tableau will ask where you want
to save the extract; choose any location to save the file, although Tableau will probably
direct you towards My Tableau Repository | Datasources, which is just fine too!
Now wait for the extract to be created. How long you'll wait depends on the database
technology you're using, network speed, data volumes, that sort of thing.
You'll know it's done when the data source icon changes: it will have another database
icon behind it, presumably representing a copy, which is exactly what an extract is.

So, that's how to create one (well, the basics anyway), but WHY would you want to?

Extract reason #1: Performance

Extracts are FAST. They are awesomely fast with a million rows, and they only slow
down to amazingly fast with 100 million rows.
But extracts are not just for big data sets. Once you're familiar with the Tableau interface,
even a 1-second delay when using the interface can be annoying; you get used to the
speed and responsiveness quickly. I recommend using extracts for any data set which
is anything other than super-responsive.
For example:
If you have 50,000 rows in Excel, it's likely to be annoyingly slow. An extract solves this.
If you're using a remote SQL Server, create an extract over lunch, and you'll have
instant response times when you get back.

Extract reason #2: Offline analysis

Extracts are files that you store on your PC (file extension .tde). So of course you can
take one with you on holiday. If you just can't sit by the pool without dreaming about
histograms and scatterplots, then extracts could be for you.
They work on planes too.

Extract reason #3: Accessing additional functionality

Some database technologies have restrictions that stop Tableau communicating with
them as it would like. The most common problem caused by this (in my experience)
is the absence of COUNT DISTINCT when using Excel as a data source.
To see this in action, connect to an Excel source, and then drag a dimension out using
the RIGHT MOUSE BUTTON. When you drop this, you'll get to choose from the
following options:

Which is no good for me!! I'm trying to get COUNT DISTINCT.

But if I change to an extract...

Now I get COUNT DISTINCT, which allows some really insightful visualisations to be built.
The same holds true if you are trying to get a MEDIAN value for a measure; this is not
available from Excel, so you'll need an extract.


Extract reason #4: Creating packaged workbooks

Packaged workbooks are a fantastic bit of Tableau functionality, allowing you to create
a distributable file which can be an interactive visualisation. This can be opened by
users of the FREE Tableau Reader, which you can download here.
BUT you can't package data which is held on remote servers, so you can't package
data from SQL Server, Oracle, etc.
UNLESS you create an extract first. Then you'll have no problems.

Extract reason #5: Publishing to Tableau Public

You can't use packaged workbooks without an extract, as described above, and neither can
you use Tableau Public. If you want to publish data to the web, you'll have to use an extract.

Extract reason #6: Data security

Here's a more subtle use case. Suppose you work at a hospital, and you are REALLY
NOT ALLOWED to share patient-level information, yet you want to create a packaged
workbook from a data source which does contain this information. What do you do?
Here's where the extra functionality of the data extract dialog comes into play.
When you extract data, you'll see this window. Notice the button at the bottom: it
removes from the extract all the dimensions and measures which you have NOT used in
ANY viz.

The extract you have then created can be packaged, safe in the knowledge that any
fields you did not want to be visible are not even in the extract, so they can't be seen.
Using this same dialog, you can restrict the ROWS (rather than columns) which are
included in your data set by using the filter section. Consider using a relative date filter
for an extract which contains (say) the last 3 weeks of data.

Extract Reason #7: Double aggregations

This is my personal favorite.

Suppose you have timesheet data which has one row per employee/day. You want to
know the average hours recorded per MONTH.
To solve this one, you first need to aggregate the data by employee/month, and then
produce a view which then averages the data by month (hence the double aggregation).
Extracts address this problem as they can be used to perform the first level of
aggregation, providing a new data source over which to run the Average calculation.

This step is performed in the extract dialog window, by aggregating the data for visible
dimensions and then rolling up dates to month.

You can also use this feature to massively reduce the size of the extract file by Hiding
dimensions in the data window before you take this step.
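The double aggregation can be sketched in plain Python. The timesheet rows below are invented for illustration; the two loops stand in for the extract-dialog roll-up and the AVG computed in the view:

```python
from collections import defaultdict

# Hypothetical timesheet rows: (employee, month, hours recorded on one day).
rows = [
    ("ann", "2015-01", 8), ("ann", "2015-01", 7),
    ("bob", "2015-01", 9),
    ("ann", "2015-02", 8),
]

# First aggregation: total hours per employee per month
# (what rolling the extract up to employee/month does).
per_employee_month = defaultdict(int)
for emp, month, hours in rows:
    per_employee_month[(emp, month)] += hours

# Second aggregation: average those monthly totals per month (done in the view).
by_month = defaultdict(list)
for (emp, month), total in per_employee_month.items():
    by_month[month].append(total)

avg_per_month = {m: sum(v) / len(v) for m, v in by_month.items()}
print(avg_per_month)  # {'2015-01': 12.0, '2015-02': 8.0}
```

Ann logs 15 hours and Bob 9 hours in January, so the average of the monthly totals is 12.0, which is exactly the number the view over the pre-aggregated extract returns.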

How to publish an unpopulated #Tableau extract.

Occasionally you'll run into a scenario which calls for a Big (with a capital B) extract. You
know your Tableau Server is beefy enough to handle the creation of the extract, but your
little laptop just doesn't have enough hard disk space.
How can you get your workbook (which needs that Big extract) published to Tableau Server
without having to create the darn extract on your machine?
It turns out doing so is pretty easy.
We're going to work with a Big ol' data source in this example. It points at a SQL Server
which contains about a gazillion rows (give or take a bajillion).
The first thing we're going to do is create a simple Calculated Expression which returns
today's date:

Next, it's time to create our extract on the data source; however, we're going to add a magic
filter to the extract (you read that right: it's magic).

After clicking Add, choose to base your filter on the Calculated Expression (What time is
it?) you just added. Note that a single value is returned by the filter, and it's today's date.
Select that value, and Exclude it.

That's about it. You can now jump back out to the Extract Data dialog, eyeball your new
filter, and then create the extract. The filter you've just set up (excluding rows where Today ==
March 8) will prevent any rows from being returned for the rest of the day:

After the extract process has completed, check out its properties. See, zero rows!:

At this point, you'll want to publish the workbook and extract to your Tableau Server; make
sure to choose your refresh schedule!
When tomorrow rolls around on your Server, the nice little filter you've created will no longer
match anything, and the floodgates will open: your Server will have to deal with the
gazillion-ish rows that are waiting to populate the extract.
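The logic of the magic filter can be modelled in a few lines of Python. The dates mirror the example above; row_passes_filter is a stand-in for Tableau's filter evaluation, not a real API:

```python
from datetime import date

def row_passes_filter(extract_creation_date, today=None):
    """The 'magic' exclude filter: a row survives unless today's date
    equals the single value captured when the filter was defined."""
    today = today or date.today()
    return today != extract_creation_date

built_on = date(2016, 3, 8)

# On the day the extract is created, every row is excluded -> zero-row extract.
print(row_passes_filter(built_on, today=date(2016, 3, 8)))  # False

# When the scheduled refresh runs the next day, the exclusion no longer
# matches, so every row flows into the extract on the Server.
print(row_passes_filter(built_on, today=date(2016, 3, 9)))  # True
```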

Tableau on Tour - Optimising Dashboard Performance - Mrunal

You need to start thinking about performance right from the start of your design. If you leave it to the end,
it is probably too late.
Basic principles (it sounds like I'm being a parent; I'm just being practical):

Everything in moderation


If it isn't fast in the database, it won't be fast in Tableau (unless you are using Extracts)


If it isn't fast in Desktop, it won't be fast in Server

.tdes handle up to a few hundred million rows of data; they don't replace your data warehousing solutions.
Flat files are opened in a temporary location, so on their own they don't make anything faster; Tableau is using RAM.
Use an extract to apply indexing.
Server will only beat Desktop when you are hitting the server cache (remember folks, server caching has
improved a lot in v9)
4 major processes in desktop:


Connect to data


Native connection vs generic ODBC (use the native driver so it is fast and robust)
i. Slow loads could be due to a lack of referential integrity
ii. Custom SQL is passed through as-is by Tableau and prevents optimizations such as join culling


Executing Query


Aggregations, Calculated Fields and Filters


Calcs: use Booleans instead of IF where possible; remove string manipulation and DATEPART()


Filters are often the culprit of slow performance


Computing Layout


Marks, Table Calcs and Sorting


Adding labels, and working out whether the labels overlap, is likely to take a long time


Table Calcs are happening locally so consider pushing back to the data source


Computing Quick Filters


If something isn't likely to change, avoid having to populate the list of filter options. Dropdown and
wildcard filters are better as they don't need to be pre-populated.

Visual Pipeline
Query > Data > Layout > Render

Query: query the database, cache the results


Data: local data joins (e.g. Tableau's location data joining with your data set), local calcs, local
filters, totals, forecasting, table calcs, second-pass filters, sort


Layout: lay out views, compute legends, encode marks


Render: marks, selection, highlighting, labels

Parallel aggregations in v9 really make a difference

External query cache (aka persistent query cache): the cache is written to disk.
Multiple data engines have helped, but Query Fusion will assist by working out the common
dimensions/aggregations and then working out locally what data is needed for each visualisation.

The visual pipeline lets you reason about what is happening.

Putting the measure on the Level of Detail shelf helps with the speed of interactivity.

Mrunal uses 144 million rows of flight data to explore performance issues

He shows a full list for filtering (expensive) and three quick filters (all having to be queried at each stage).

Relative date filters or range-of-date filters are faster than date-part filters.
Using views for filters improves performance, and the use of dashboard actions makes life faster.

Adding a parameterised filter to the data source moves it up in the order of operations, making your data
source smaller, sooner.
Mrunal and I will disagree about which is the better user experience, filters or actions. When
labelled well, I personally think dashboard actions make for a much better experience and keep you
focused on the dashboard rather than the tool.

Aggregate to Visible Dimensions is a great data-granularity saver. Hide All Unused Fields makes the data
set thinner.

We here at Tableau are very proud of how easy it is to see and understand data with Tableau.
Once you get started, it's intuitive to dive deeper by adding more and more fields, formulae, and
calculations to a simple visualization, until it becomes slower and slower to render. In a world
where two-second response times can lose an audience, performance is crucial.

So where do I start?
So how can you make your dashboards run faster? Your first step is to identify the problem spots
by running and interpreting your performance recording. The performance recorder is every
Tableau speed demon's ticket to the fast lane. The performance recorder can pinpoint slow
worksheets, slow queries, and long render times on a dashboard. It even shows the query text,
allowing you to work with your database team on optimizing at the database level.

Now that you know which views or data connections are slowing you down, below are six tips to
make those dashboards more performant. For each tip, weve listed the most common causes of
performance degradation as well as some quick solutions.

1. Your data strategy drives performance

Extracts are typically much faster to work with than a live data source, and are especially great
for prototyping. The key is to use domain-specific cuts of your data. The Data Engine is not
intended to be a replacement for a data warehouse. Rather, its meant to be a supplement for fast
prototyping and data discovery.
Since an extract is a columnar store, the wider the data set, the slower the query time.

Minimize the number of fields based on the analysis being performed. Use the
hide all unused fields option to remove unused columns from a data source.

Minimize the number of records. Use extract filters to keep only the data you need.

Optimize extracts to speed up future queries by materializing calculations,
removing columns, and using accelerated views.

Keep in mind: extracts are not always the long-term solution.
The typical extent of an extract is between 500 million and one
billion rows; your mileage will vary. When querying against
constantly-refreshing data, a live connection often makes more
sense when operationalizing the view.
For more information on data extracts, check out these additional resources:
Video: Using and Refreshing Extracts
Online Help for Extracting Data
Understanding Tableau Data Extracts (three-part series)

2. Reduce the marks (data points) in your view
When data is highly granular, Tableau must render and precisely place each element. Each mark
represents a batch that Tableau must parse. More marks create more batches; drawing 1,000
points on a graph is more difficult than drawing three bars in a chart.

Large crosstabs with a bevy of quick filters can cause increased load times when you try to view
all the rows and dimensions on a Tableau view.
Excessive marks (think: data points) on a view also reduce the visual analytics value. Large,
slow, manual table scans can cause information overload and make it harder to see and
understand your data.
Heres how you can avoid this trap:

Practice guided analytics. There's no need to fit everything you plan to
show into a single view. Compile related views and connect them with action
filters to travel from overview to highly-granular views at the speed of thought.
Remove unneeded dimensions from the detail shelf.

Explore. Try displaying your data in different types of views.

3. Limit your filters by number and type

Filtering in Tableau is extremely powerful and expressive. However, inefficient and excessive
filters are one of the most common causes of poorly performing workbooks and dashboards.
Note: showing the filter dialog requires Tableau to load its members and may create extra
queries, especially if the filtered dimension is not in the view.

Reduce the number of filters in use. Excessive filters on a view will
create a more complex query, which takes longer to return results. Double-check
your filters and remove any that aren't necessary.

Use an include filter. Exclude filters load the entire domain of a dimension,
while include filters do not. An include filter runs much faster than an exclude
filter, especially for dimensions with many members.

Use a continuous date filter. Continuous date filters (relative and
range-of-date filters) can take advantage of the indexing properties in your
database and are faster than discrete date filters.

Use Boolean or numeric filters. Computers process integers and Booleans
(t/f) much faster than strings.

Use parameters and action filters. These reduce the query load (and work
across data sources).

4. Optimize and materialize your calculations


Perform calculations in the database. Wherever possible, especially on
production views, perform calculations in the database to reduce overhead in
Tableau. Aggregate calculations are great for calculated fields in Tableau.
Perform row-level calculations in the database when you can.

Reduce the number of nested calculations. Just like Russian nesting
dolls, unpacking a calculation and then building it up takes longer for each extra layer.

Reduce the granularity of LOD or table calculations in the view. The
more granular the calculation, the longer it takes.
LODs: look at the number of unique dimension members in the calculation.
Table calculations: the more marks in the view, the longer they take to calculate.

Where possible, use MIN or MAX instead of AVG. AVG requires more
processing than MIN or MAX. Often rows will be duplicated and display the
same result with MIN, MAX, or AVG.

Make groups with calculations. Like include filters, calculated groups load
only the named members of the domain, whereas Tableau's group function loads
the entire domain.

Use Booleans or numeric calculations instead of string calculations.
Computers can process integers and Booleans (t/f) much faster than strings.

5. Take advantage of Tableau's query optimization

Blend on low-granularity dimensions. The more members in a blend, the
longer it takes. Blending aggregates the data to the level of the relationship.
Blends are not meant to replace row-level joins.

Minimize joined tables. Lots of joins take lots of time. If you find yourself
creating a data connection with many joined tables, it may be faster to
materialize the view in the database.

Assume referential integrity if your database is configured this way.

Remove custom SQL. Tableau can take advantage of database
optimizations when the data connection does not use custom SQL.

6. Clean up your workbooks!

Reduce dashboard scope. Excess worksheets on a dashboard can impact performance.

Delete or consolidate unused worksheets and data sources. A clean
workbook is a happy workbook.


There are some good reasons to look at the actual queries Tableau sends to a live relational database:
1. Performance is not good and your DBA needs to know what is happening so they can
optimize. This goes for live connections and slow extract generation.
2. Your viz still isn't performing well enough even with extracts, but you don't
understand the TQL language you see in the Performance Recorder (it's okay, no one
does!). Seeing the same logic in SQL can help you understand what exactly is going on.
3. You just want to marvel at the amazing ability of the VizQL engine to translate your
actions into SQL. You should check out what LOD calculations, Sets, or calculated
fields look like sometime, just to marvel at what is going on.
Read on for how to accomplish #1.
The best method for testing queries is using Tableau Desktop. It has two benefits over
Tableau Server: (1) you can very quickly clear out your log files from My Documents\My
Tableau Repository\Logs; (2) with a single VizQL session going, only your actions are
written into the logs. Server logs have multiple processes, and you don't want to go
deleting them to single things out.
Note: you should always test with a live connection that is embedded in the
workbook, i.e. don't use a source that is published to Data Server. This will show
you the SQL queries. If you use a published data source, you'll see an XML
theoretical query in the Desktop logs and Performance Recording. If you use an
extract, you'll see TQL, the proprietary Extract Engine query language that is not
documented anywhere.
One method to see the queries that are sent is to use the Performance
Recorder, documented well on the Tableau website. There is also a document on interpreting
the performance recording. The only thing I'll add is that the query is often cut off in the viz,
so you should do See Underlying Data and then copy the results into a text editor or your
SQL management tool of choice.

The other method is to look at the Tableau logs. As mentioned earlier, the logs for Desktop
live in My Documents\My Tableau Repository\Logs. You want to look at the ones
labeled tabprotosrv to see the details of the querying, although sometimes the main log has the
basic query and elapsed time. Since Tableau 8.2 or so, the logs are in JSON format for ease
of processing with machine-reading tools. I recommend using a tool that cleans up the JSON
and makes it easy to read through. The equivalent logs for Tableau Server live
in C:\ProgramData\Tableau\Tableau Server\data\tabsvc\logs\vizqlserver\ or the
equivalent of that in the zipped-up log files (they will be directly under vizqlserver).
1. Close Tableau Desktop.
2. Delete all the log files in the Logs directory.
3. Open up Tableau Desktop. Do whatever action gets you to the query where you see the slowness.
4. Stop doing anything in Tableau and minimize it.
5. Go to the log file and look for the latest end-query.
You want to look for the keys that say end-query. These show the full query that was sent
and the elapsed time. You can take that query and put it into your SQL tools (using EXPLAIN
or other tools) to see why in particular it is running slow. You can only affect the query
Tableau generates by changing what you show in the viz, and sometimes a slight change
can make all the difference. However, most people then go to their DBA(s) and have them
work out optimizations, whether that is doing maintenance, adding indexes, or creating
views to move more of the logic down into the database.
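As a rough sketch of scanning those logs programmatically, here are a couple of made-up lines in the general shape of the JSON entries. The exact key names used here (k, v, elapsed) are assumptions about the log format, so verify them against your own files:

```python
import json

# Two invented log lines roughly shaped like Tableau's JSON log entries;
# real entries carry many more keys.
log_lines = [
    '{"ts": "2016-01-01T10:00:00", "k": "begin-query", "v": {"query": "SELECT ..."}}',
    '{"ts": "2016-01-01T10:00:03", "k": "end-query", "v": {"query": "SELECT ...", "elapsed": 3.2}}',
]

# Collect only end-query events, which hold the full query text and timing.
slow_queries = []
for line in log_lines:
    entry = json.loads(line)
    if entry.get("k") == "end-query":
        slow_queries.append((entry["v"]["elapsed"], entry["v"]["query"]))

# Longest-running queries first, ready to paste into EXPLAIN.
for elapsed, query in sorted(slow_queries, reverse=True):
    print(f"{elapsed}s: {query}")
```

In practice you would read the tabprotosrv files from the Logs directory instead of a hard-coded list.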

Optimizing Custom SQL

Custom SQL consistently leads to particularly poorly optimized queries, because Tableau
must wrap its queries around the ENTIRE Custom SQL query every time. As the Tableau
queries get more complex, the optimizer in the database system tends to have more trouble
creating optimal query execution plans, and thus things get slower.
There are two solutions:
1. Translate your Custom SQL into a set of JOINs and the appropriate filters in Tableau.
Some of the things you've built into the SQL you may be able to do in Calculated Fields or
aggregations in Tableau. This is okay, and can often significantly increase
performance and flexibility.
2. Put the Custom SQL into the database as a View. Tableau will see the view as a
regular table, and then send a simpler query more quickly. This can be useful when using
the Custom SQL as the basis for an Extract. It also gives an opportunity for the DBA
to optimize where they can.

You've come to this page because you want to know how to make Tableau perform as
efficiently as possible. Performance always starts in Desktop, where you connect to data and
build the worksheets and dashboards that you will then publish to Tableau Server. Tableau
Server runs the same VizQL engine as Desktop, so if it's slow in Desktop, it will be slow
on Server.

Tableau Desktop
The most complete guide to everything Tableau performance is Alan Eldridge's
magnificent Best Practices for Designing Efficient Tableau Workbooks: Third Edition. It's
unlikely that anything said anywhere else is not in this guide, but it is a hefty PDF tome. If you
want to become an expert, it is essential reading.

Database Connections / Join Culling

The most efficient way to set up your data for Tableau is a standard star schema with all
INNER JOINs, because it allows you to use Assume Referential Integrity.
This is particularly relevant for MPP databases, which also need their own proper
configuration to run efficiently.
If your database has all of its primary and foreign key relationships defined, Tableau can also
cull unnecessary tables out of its queries (I believe INNER JOINs are still preferable).

Custom SQL

Custom SQL is the slowest way to go, because Tableau includes the entire Custom SQL query
in every subsequent query that it writes. If you can find a way to replace Custom SQL, even
if that means putting it into the database as a View, it is worth doing.
If you want to see what queries are happening so you can optimize them at the database
level, here is a guide.

Big Data Systems

If you are using a big data system, make sure to think about how to pre-aggregate overview
data, rather than making that system repeatedly process roll-ups live. Both extracts and live
connections can be used together for optimal performance.
Slalom has a nice guide to how to optimize Redshift with Tableau.
You might need to put a TDC file in place to get the best performance configurations for
your given system.
Vertica Performance in Tableau & ODBC Customization
ODBC Modifications for Big Data Systems
Russell Christopher put together an amazing tool for creating the TDC file correctly.

Designing for Performance

You should really read Alan's guide, but here are my quick tips:
1. Watch and understand the Dashboard Best Practices video. Design by the Guided
Analytics philosophy: use visualizations at the high level to filter down to the lower
levels of detail.
2. Hide your lowest detail levels, particularly any long lists, using the Exclude All
Values option in actions. This is easier on your eyes, and far easier on your database.
3. If you are embedding Tableau, take out all the fancy stuff and branding and put it in
your portal. The more stuff you have, the more Tableau has to draw.
4. If you find yourself scrolling and scrolling, rethink that visualization. There must be
some other way of looking at things to help identify what is important to drill into.
5. Minimize your cross-tabs of text. Keep them for the end, once you have filtered.
6. Use Actions for drilldown, but make sure to optimize them.

Tableau Server
Tableau Server's performance is determined by two aspects: (1) did you build things to
perform well in Desktop? (2) is there adequate hardware for Tableau Server to run
efficiently? For (1), see above. For (2), see the sections below.
Regarding configuring the processes on Tableau Server, the defaults are the defaults for a
reason. In most cases, two of any process per worker node is all you need.

Hardware / Virtualization
General thoughts on Tableau in a virtualized environment

Disk I/O / IOPS

The one thing you'll notice when reading about AWS or Azure is that disk read and write
speed, or IOPS, is an important limiting factor in Tableau Server performance. There is no
minimum disk I/O recommendation from Tableau, other than a minimum of 10,000 RPM
magnetic drives. However, in many cases we have seen slow IOPS be the hidden culprit
when otherwise fast-running workbooks end up dragging when published to Server.
If you are using extracts / TDEs, you need to be especially concerned with disk I/O. Extract
creation requires writing to disk quite intensely, and extracts are memory-mapped and
loaded into RAM as needed when used as a data source, so read speed matters as well. If
you are going for the fastest possible configuration, SSDs definitely win. Otherwise, fast,
local storage is the best solution. Things like SANs should be avoided unless you absolutely
know they are fast and dedicated to the Tableau Server.

Amazon's AWS and Tableau Whitepaper
Yes, Tableau Server can run really fast in AWS. It can also run really slowly; you need the
right hardware setup. Russell Christopher's explorations are a guide to finding that optimal
setup (and yes, you should read them in this order):
1. Which EC2 Instance Type Should I run Tableau Server On? Part One
2. Which AWS EC2 Instance Type Should I run Tableau Server On? Part Two

3. Studying Tableau Performance Characteristics on AWS EC2

4. Tableau Server Performance on AWS EC2
5. Comparing Tableau Server performance on EC2: The c3 vs. c4 bake-off
6. AWS EC2 General Purpose (gp2) disks and Tableau Server: mostly awesome

Tableau's official KB article on running Tableau Server on Azure
My thoughts on the question of Tableau on Azure:
Choosing the right Azure box may be even more challenging than in AWS. Once again,
Russell has explored the options:
1. Yes, Tableau can go fast on Windows Azure
2. Can Tableau Server go faster on Windows Azure now?
3. Tableau Server: I wanna go fast on Windows Azure

Look and Feel / Minimizing Load Times

When using the JavaScript API, you can filter and set parameter values before the viz loads.
In Tableau 9.3 and beyond, sheets in a dashboard load as they come in, rather than
waiting for the whole dashboard to render. This improves the user experience and lets
people start working as soon as something has loaded.

Load and Performance Testing

TabJolt is a tool built by Tableau (but not supported by Tableau Tech Support) for load testing
a Tableau Server. Use it with your own workbooks to get an accurate picture of the number of
concurrent users that will saturate your cluster. It is available on GitHub.
Russell has a great set of blog posts on how to use TabJolt, in the following approximate order:

2. The Mondo Tableau Server TabJolt Series Part 2

3. The Mondo Tableau Server TabJolt Series Part 3
4. Load Testing Tableau Server 9 with TabJolt on EC2
5. Customizing Tableau Tabjolt Load Tests
6. Tableau Server TabJolt Testing The Light Load
7. Tableau Server TabJolt Testing The Heavy Load

TabMon is a tool built by Tableau (but not supported by Tableau Tech Support) to monitor a
Tableau Server cluster, capturing both JMX console information and Windows Performance
Monitor metrics. It is available on GitHub.
Russell Christopher's explorations into how to use TabMon:
1. TabMon is like a FitBit for your Tableau Server
2. TabMon on YouTube
3. Monitoring Tableau: TabMon is a #Tableau Server Detective


When you have a lot of Tableau extracts (TDEs), you want them to be as small as possible
for both generation and performance in a workbook. If you are deploying extracts to a lot of
different sites, you may have built an initial extract that is filtered down for Customer A on

Site A, but want to publish it to Site B and have it filtered for Customer B. It's not too difficult, although it does require working with the XML a bit.
Tableau filters are applied roughly in this order:
1. Filters on extract (applied as a WHERE clause on the query used to pull the data for
the extract)
2. Data Source Filters (applied as WHERE clause at all times)
3. Context Filters
4. Filters that are set to Apply to all Sheets Using This Datasource
5. Filters that only apply to a given worksheet
Often I see Data Source Filters being used on an extract, which has the same effect for the end user, but the extract itself will still pull all of the information for all of the customers. The only place an Extract filter can be set is in the dialog that pops up when you create the extract.

In the workbook XML, there is a section of extract tags that give the information about how
the extract will be regenerated.
For example, if I've put a filter on Region to only show South, the following appears in the XML:


<filter class='categorical' column='[Region]'>
  <groupfilter function='member' level='[Region]' member='&quot;South&quot;'
    user:ui-domain='database' user:ui-enumeration='inclusive' user:ui-marker='enumerate' />
</filter>


Now, it's pretty obvious that the member attribute is all that needs to change to alter the definition of the filter.
Here's what it looks like if you select more than one option:

<filter class='categorical' column='[Region]'>
  <groupfilter function='union' user:ui-domain='database' user:ui-enumeration='inclusive'>
    <groupfilter function='member' level='[Region]' member='&quot;South&quot;' />
    <groupfilter function='member' level='[Region]' member='&quot;West&quot;' />
  </groupfilter>
</filter>

How do we work with this if we want to publish our extracts to the Server using the REST API (or tabcmd)? In that case, you should be saving the extract as a TDSX (right-click on the data source, choose Add to Saved Data Sources, then select Tableau Packaged Data Source).
A TDSX file is encoded using ZIP, and contains both the TDS that tells how to build the extract and the TDE file itself (see this previous post on how to open up and modify these files).

The TDS inside the TDSX will have the same filter and groupfilter sections which, if you modify them and pack everything back up as a TDSX, will be used the next time the extract is refreshed (including once it is published to Server).
I don't have filter modification built into the tableau_rest_api library, but clearly you can build out a programmatic process to do all of this.
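As a sketch of that programmatic process: since a TDSX is just a ZIP, a small Python script can rewrite the member value in the embedded TDS and repack everything. The retarget_extract_filter helper and the file names below are my own illustration, not part of any Tableau library; the string being replaced follows the groupfilter XML shown above.

```python
import zipfile

def retarget_extract_filter(tdsx_in, tdsx_out, old_member, new_member):
    """Copy a TDSX, rewriting the member value of a categorical
    extract filter inside the embedded .tds file.
    (Hypothetical helper; names are illustrative.)"""
    with zipfile.ZipFile(tdsx_in) as src, \
         zipfile.ZipFile(tdsx_out, 'w', zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename.endswith('.tds'):
                # member values are XML-escaped strings, e.g. member='&quot;South&quot;'
                xml = data.decode('utf-8')
                xml = xml.replace("member='&quot;{}&quot;'".format(old_member),
                                  "member='&quot;{}&quot;'".format(new_member))
                data = xml.encode('utf-8')
            dst.writestr(item, data)  # the .tde and other entries pass through untouched
```

For example, retarget_extract_filter('customer_a.tdsx', 'customer_b.tdsx', 'Customer A', 'Customer B') would produce a copy whose next extract refresh pulls only Customer B's rows, which you could then publish to Site B with the REST API or tabcmd.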


When you are trying to maximize performance in Tableau, particularly on a live connection,
sometimes the smallest changes can make a big difference. All of your choices in Tableau
Desktop eventually end up as a real live SQL query, which the database will have to
interpret. The simpler the query, the easier the interpretation, and in most cases the quicker
the results.

Tableau's Dashboard Actions are amazing, and in the newer versions there is a quick little
Use as filter button on each sheet in a Dashboard. This creates an Action in the
Dashboard->Actions menu which is set to All Fields down at the bottom. This is incredibly
convenient from a creation standpoint; however, it means that the selected values for every
single dimension in the Source Sheet will be passed along as filters in the WHERE clause of
the eventual SQL query. This includes categorical information which you are displaying: if you are showing Product Category, Product Sub-Category, and Product ID, all three will be sent in the eventual query.
Particularly when you are getting down to granular details, you really only need the most granular piece of information to be passed into the WHERE clause. For optimal performance, you really only want to pass in values for fields that are indexed in the database. In the previous example, presuming that a Product ID can only belong to one Category and Sub-Category, setting the Action to Selected Fields and choosing Product ID would simplify the query sent; hopefully Product ID is indexed, and thus you get an incredibly quick lookup.


How do I show a user only the data I want them to see in Tableau? The answer is
not Permissions, which only affect what Tableau Server content a user can see. It is Row
Level Security, which requires setup both in your Tableau workbooks and in your database.
To be secure and perform well, you must use a Live Connection to your database. Extracts
will be too large if you join a security table that is one-to-many to your data table (we say
they blow up to dissuade people). While you can simulate row-level security using Data
Blending, it is not fully secure and it also can run into performance issues, so it is best to
avoid that method.
Since you need a live connection and a particular setup in your database, it's worth
exploring the requirements and the best practices on the database side.

The Basics
Fully programmatic Row Level Security has been possible in Tableau for a long time, using
what I will refer to as the JOIN Method. It requires a security table (or set of tables, or a view) which includes the Tableau username as one column, and a security_key column that is also present in the data table. The security table must have one row per security_key, so that when JOINed it will not cause any row duplication, which would cause incorrect aggregations.
I'm generalizing here to the bare minimum; I'll talk about all of the possibilities of what the security table can and should be further down. For our initial discussion, you can think of it as a single table.
There is a rare case where the data is mapped one-to-one with individual users. In that case, you don't need the extra security table and can simply filter the username field in the data table. If you have this situation, lucky you! Most organizations have more complex mappings, where a user belongs to an organization or to multiple groups and so on. When you have a security table involved, there are two methods to achieve the necessary filtering.

The JOIN Method

The basics of the JOIN Method (as laid out in the Tableau KB article, second section) are:
1. Connect to Data Table d
2. INNER JOIN Security Table s ON d.security_key = s.security_key
3. Create a Calculated Field in Tableau using [s.tableau_username] = USERNAME()
4. Put the Calculated Field on as a Data Source Filter, set to True
5. Publish the Data Source to Tableau Server
6. All user workbooks must connect to the Published Data Source. Now the data source
will be filtered to the logged in user, and they cannot get around the data security
model, owing to the data source filter.
You can represent this in SQL like the following (although VizQL will produce whatever
complex queries it needs, this is the basic form):

SELECT d.*
FROM data d
INNER JOIN security s ON d.key = s.key
WHERE s.username = USERNAME()

Why does this not cause row duplication?

For those used to Tableau Data Extracts, the obvious issue with this method is that if your security table has multiple entries per key, you will end up with many duplicate rows. And in truth, if your live database is not optimizing the queries correctly, it might go through that process to get your end result. But in most cases, the database processes the query in an order that actually makes this more efficient.
Your initial JOINs in Tableau are just telling VizQL about the relationships; the database never actually calculates this combined view:

With your row level security calculation in place, there will always be a WHERE clause that filters the Security table. The RDBMS should process this first, which reduces the rows down to just the entries for that username.

Now that there is only one (or a very few) keys remaining to JOIN on, the database filters down the Data table based on those remaining keys from the security table.

Even if a given user has more than one match in the data table, as long as the relationship remains one-to-one AFTER filtering down on the security side, you still won't have row duplication.
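This behavior is easy to sandbox. Here is a minimal sketch using SQLite from Python; the table names and sample rows are my own toy data, and a bound parameter stands in for the value Tableau's USERNAME() would inject:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE data (key TEXT, amount INTEGER);
    CREATE TABLE security (key TEXT, username TEXT);
    -- two users share the 'acme' key, so an unfiltered JOIN would duplicate rows
    INSERT INTO security VALUES ('acme', 'alice'), ('acme', 'bob'), ('zenith', 'bob');
    INSERT INTO data VALUES ('acme', 10), ('acme', 20), ('zenith', 30);
""")

def rows_for(username):
    # the WHERE clause reduces security to one row per key BEFORE the JOIN matters
    return conn.execute("""
        SELECT d.key, d.amount
        FROM data d
        INNER JOIN security s ON d.key = s.key
        WHERE s.username = ?
    """, (username,)).fetchall()

# with no username filter, each 'acme' data row appears once per matching security row
unfiltered = conn.execute(
    "SELECT d.key FROM data d INNER JOIN security s ON d.key = s.key").fetchall()
print(len(unfiltered))            # 5 rows: the two acme data rows are doubled
print(sorted(rows_for('alice')))  # only the two acme rows, no duplication
```

The filtered query returns each data row exactly once, because after the username filter the security table is back to one row per key.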

How does it perform?

I tested on my local PostgreSQL instance and this method can produce very fast results. For it to be fast, you must INDEX all of the filtered fields properly and set up the correct Primary Key / Foreign Key relationships where you can. The query optimizer does the following: (1) filters the security table down by the username column, using the INDEX on that column; (2) uses the remaining key (just one row) to quickly JOIN, removing all rows from the data table that don't belong. Since the key field is indexed in both tables, this is also a very fast process.
Will it perform this well in other RDBMSs? I'm hoping to follow up with a few other systems to make sure there isn't something being missed, but if the optimizer is decent, it should work just fine. The database optimizer shouldn't be trying to JOIN all the data in both tables first and then filter down; if it is, then you may need to investigate how to force different optimizer behavior (hinting and so forth).
I also tested whether either of the following queries changed the optimizer logic. The answer
was no: at least in PostgreSQL, the query optimizer recognized that these were all
equivalent, and performed the exact same operations.
SELECT d.*
FROM data d
INNER JOIN security s ON d.key = s.key AND s.username = USERNAME();

SELECT d.*
FROM data d
INNER JOIN (
    SELECT s.key
    FROM security s
    WHERE s.username = USERNAME()
) s ON d.key = s.key;

The WHERE Method

The other way of filtering down rows in SQL is through a WHERE clause, rather than a JOIN.
You can put a SELECT statement in the WHERE clause (a sub-select) which returns only
values necessary to filter the main table. Logically, this is exactly the same as INNER JOINing; however, it's very likely that the database optimizer will use a different set of steps to compute the result set.

There is currently no way to do this method in Tableau's Data Joins screen. Prior to Tableau 9.3, you can only do it using Custom SQL and JavaScript API parameters, which requires embedding into another page (see the post about Parameters with Stored Procedures; it's exactly the same idea). With Tableau 9.3 you can use Initial SQL with Parameters.
The basic idea, rendered in SQL, is as follows:
SELECT d.*
FROM data d
WHERE d.key IN (
    SELECT s.key
    FROM security s
    WHERE s.username = USERNAME()
)
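You can check the logical equivalence of the two methods in a scratch database. A minimal SQLite sketch (my own toy tables, with a bound parameter standing in for USERNAME()):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE data (key TEXT, amount INTEGER);
    CREATE TABLE security (key TEXT, username TEXT);
    INSERT INTO security VALUES ('acme', 'alice'), ('zenith', 'bob');
    INSERT INTO data VALUES ('acme', 10), ('acme', 20), ('zenith', 30);
""")

JOIN_METHOD = """
    SELECT d.key, d.amount FROM data d
    INNER JOIN security s ON d.key = s.key
    WHERE s.username = ?"""

WHERE_METHOD = """
    SELECT d.key, d.amount FROM data d
    WHERE d.key IN (SELECT s.key FROM security s WHERE s.username = ?)"""

for user in ('alice', 'bob'):
    joined = sorted(conn.execute(JOIN_METHOD, (user,)).fetchall())
    filtered = sorted(conn.execute(WHERE_METHOD, (user,)).fetchall())
    assert joined == filtered  # same result set either way
print("both methods agree")
```

The two forms return identical rows for every user; only the execution plan the optimizer chooses may differ.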

How does it perform?

Using the same tables in PostgreSQL, the WHERE Method does appear to be slightly more efficient. I do mean slightly: using EXPLAIN, the total cost for the WHERE Method scored about 10 points lower than the JOIN Method (I don't fully understand PostgreSQL costing, but lower is better). I ran the queries over and over, and the lowest times came from the WHERE Method, but the average time was about the same for either. My data set was around 3.5 million records, with a security table of around 4,000 rows. As mentioned earlier, I plan to test on some other RDBMS systems to see if there is more benefit or difference in optimization. Russell's look at Row Level Security in SQL Server 2016 leads me to believe the WHERE Method might provide more benefit on SQL Server. The optimizer is where RDBMS systems really vary, so YMMV.

Recommendation: Use the JOIN Method unless the WHERE Method is easy
The standard recommended JOIN Method for Row Level Security in Tableau works just fine, given you have a security table to JOIN to, as described in the next section. While the WHERE Method may give slight performance gains (and it is probably more natural to SQL experts), in a properly optimized database there's no reason not to use the JOIN Method.

Exceptions to this recommendation, which take advantage of features in particular RDBMS systems to do the WHERE Method internally:

In Oracle, if you have VPD set up for the database users, you can use Initial SQL in Tableau 9.3 to take advantage of the existing security filtering.

In SQL Server 2016, you can set up Row Level Security in the database, based on the database user. Tableau's Impersonate User functionality will set the user correctly for you, and SQL Server 2016 will do the filtering. See Russell's exploration here.

In previous versions of SQL Server, you can create Views with the row level security WHERE clauses built in that reference the current user's role, then give users access only to those specific Views. In this case, Impersonate User will work. It is a similar concept to how SQL Server 2016 works, but the DBA must create every View they want to expose instead of having the security logic applied automatically to every table or view. You can reference the username of the user being impersonated using the SYSTEM_USER constant value, per Stack Overflow. Each View would require a WHERE clause that includes a reference like [Entitlements Table].[Username] = SYSTEM_USER.

Note that these require the usernames to exist on the database as users with permissions /
roles in the database. This is more likely in an internal organization use case than in an
externally facing Tableau deployment. If the usernames are not database users, but simply
exist in the data, then you can implement a solution starting with Tableau 9.3 using Initial
SQL with Parameters.

Building the Security Table

At the beginning, we assumed there was a single security table to be JOINed or used in the
WHERE clause. Now let's think about what that security table should consist of and how we can get there from whatever we are starting with.
Starting with the end in mind, whatever we do needs to result in a one-to-one relationship
between a column in the data table and a column in the security table.
Standard database design practices mean you rarely have a single table that fits these criteria. Most data isn't necessarily mapped one-to-one to a single user; it's very likely the data security is organized by role, organization name, or both. A normalized set of tables might look like:
1. Users table
2. Roles table
3. Users-Roles mapping table

4. Organization / Customer table. We'll assume it has a one-to-one relationship with the Users table.
To get to the security table mentioned previously, you'd actually do something like:
SELECT u.username, r.role_name, o.org_name
FROM roles r
INNER JOIN users_roles_map m ON r.role_id = m.role_id
INNER JOIN users u ON m.user_id = u.user_id
INNER JOIN organization o ON u.org_id = o.org_id
Certainly, you can do this in Tableau's JOINs dialog. But it's a lot to ask of an end user to set up every time, when the logic should always stay the same. I would always recommend creating a View, possibly even a Materialized or Indexed View depending on your RDBMS, so that you only have to JOIN that single security_table view to the data table in Tableau.
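Sketched in SQLite (the schema and sample rows are illustrative only), the View wraps the four-table JOIN once, so Tableau only ever sees security_table:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE users (user_id INTEGER, username TEXT, org_id INTEGER);
    CREATE TABLE roles (role_id INTEGER, role_name TEXT);
    CREATE TABLE users_roles_map (user_id INTEGER, role_id INTEGER);
    CREATE TABLE organization (org_id INTEGER, org_name TEXT);

    INSERT INTO users VALUES (1, 'alice', 100), (2, 'bob', 200);
    INSERT INTO roles VALUES (10, 'analyst'), (11, 'admin');
    INSERT INTO users_roles_map VALUES (1, 10), (2, 11);
    INSERT INTO organization VALUES (100, 'acme'), (200, 'zenith');

    -- the single view Tableau JOINs to, instead of four tables every time
    CREATE VIEW security_table AS
    SELECT u.username, r.role_name, o.org_name AS security_key
    FROM roles r
    INNER JOIN users_roles_map m ON r.role_id = m.role_id
    INNER JOIN users u ON m.user_id = u.user_id
    INNER JOIN organization o ON u.org_id = o.org_id;
""")

print(conn.execute("SELECT * FROM security_table ORDER BY username").fetchall())
```

The JOIN logic now lives in one place in the database; data stewards can change it without touching any workbook.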

Hierarchical Filters
At some point in life, we all have to face the fact that we answer to someone. Most organizations have a hierarchy, and often there is the desire that those higher in the hierarchy can access the data of all of those who report to them. Recommendations start to vary on how to deal with hierarchies, because your particular needs may be different, and RDBMS systems have often solved the challenges of hierarchies with their own proprietary features.
Utilizing Tableau Groups and a Flattened Security Table

The standard Tableau Row Level Security calculated field looks like
[Username Field] = USERNAME()
But this calculation can include IF/THEN logic, and there is also a function, ISMEMBEROF('group name'), for testing whether a user is a member of a Tableau Group. Using this, you can construct hierarchical filters, like so:
IF [Manager Username Field] = USERNAME() THEN 1 ELSE 0 END
IF [Username Field] = USERNAME() THEN 1 ELSE 0 END

You'll notice that the outputs need to be numeric; this is due to the nature of IF/THEN calculations in Tableau.
The idea here is that a given user in the security table will have additional columns that include the username key of the user above them in the hierarchy. For each level of hierarchy, an additional column is necessary. Obviously there is some limit to the levels of hierarchy you can reasonably define this way; you may have to decide on a mapping process that simplifies the actual hierarchy down to a set number of levels for Tableau to handle. You also need a mechanism for synchronizing Tableau Group membership from the database (using the REST API, with some type of mapping from the database).

Unlimited levels of hierarchy

The standard pattern for representing hierarchy in a relational database is a long table with only two columns: User ID and the user's manager's User ID. You can JOIN this table to itself to create the flattened view of the hierarchy.
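A toy version of that self-JOIN in SQLite (my own illustrative table), flattening two levels of management into columns:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    -- the standard adjacency-list pattern: each user points at their manager
    CREATE TABLE reports_to (user_id TEXT, manager_id TEXT);
    INSERT INTO reports_to VALUES ('carol', 'bob'), ('bob', 'alice'), ('alice', NULL);
""")

# one self-JOIN per hierarchy level to flatten it into columns
flat = conn.execute("""
    SELECT r1.user_id,
           r1.manager_id AS level1_manager,
           r2.manager_id AS level2_manager
    FROM reports_to r1
    LEFT JOIN reports_to r2 ON r1.manager_id = r2.user_id
    ORDER BY r1.user_id
""").fetchall()
print(flat)
```

Each additional level of hierarchy needs one more LEFT JOIN of the table to itself, which is exactly why the column-per-level approach has a practical depth limit.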

Common Table Expressions (CTEs)

Rather than hand-writing each self-JOIN (one for each level of hierarchy), many systems now allow Common Table Expressions (CTEs), which can recursively JOIN until the last level has been joined. Tableau cannot deal with CTEs directly; you can't even put them in as Custom SQL, due to how VizQL uses the Custom SQL query. If you need a CTE to get to your flattened security table, you need to make it a View. Following the earlier advice, you should combine this CTE view with any other details you need into a single View to join with your data table in Tableau.
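Here is a hedged sketch of that pattern in SQLite, which supports WITH RECURSIVE; the table and view names are illustrative. The CTE walks up the chain so each user is paired with every ancestor, and wrapping it in a View makes the result consumable by Tableau:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE reports_to (user_id TEXT, manager_id TEXT);
    INSERT INTO reports_to VALUES
        ('dave', 'carol'), ('carol', 'bob'), ('bob', 'alice'), ('alice', NULL);

    -- a View wrapping the recursive CTE, since Tableau cannot consume CTEs directly
    CREATE VIEW security_flat AS
    WITH RECURSIVE chain(user_id, ancestor) AS (
        SELECT user_id, manager_id FROM reports_to WHERE manager_id IS NOT NULL
        UNION ALL
        SELECT c.user_id, r.manager_id
        FROM chain c
        JOIN reports_to r ON c.ancestor = r.user_id
        WHERE r.manager_id IS NOT NULL
    )
    SELECT * FROM chain;
""")

# everyone whose data 'alice' (the top of the chain) may see
print(conn.execute(
    "SELECT user_id FROM security_flat WHERE ancestor = 'alice' ORDER BY user_id"
).fetchall())
```

Unlike the column-per-level self-JOIN, the recursive form handles arbitrary depth without schema changes; Tableau just filters security_flat on the ancestor column.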

Security Functions, Stored Procedures, etc.

As mentioned, different RDBMS systems have other ways of handling hierarchy calculations.
SQL Server, for example, has a hierarchyID type and related functions for handling
relationships defined this way. You may have defined functions or stored procedures that do all of the security processing logic. To use them in Tableau, you'll need to use either the Parameters with Stored Procedures method or Initial SQL in Tableau 9.3.


The best analogy I've come up with is that making a great Tableau data source is like making a great bronze sculpture. You make the original model that you keep for yourself in your workshop, and then you cast new statues each time the outside world needs one. This process allows you to make changes to the original model, or even make a new model, without replacing the existing statues (until necessary).
When used correctly, Data Sources, Extracts, and Tableau Data Server facilitate quick access
to data with minimal effort. However, using the correct process and order can save
tremendous amounts of time on initial setup and any subsequent changes. A fully thought
out process will look something like this:

Making the Model: Creating a Data Source

Every Tableau workbook begins with Connect to Data. When sculpting our data source, we want to create a .TWB file that contains only data sources, without any viz. The TWB can contain multiple data sources if necessary, and holds all the connection information needed to publish a live connection or create an extract (TDE).

1. Open Tableau Desktop

2. Connect to your file or database.
3. Join your tables
4. Leave the connection live; don't extract yet
5. Manage your metadata:

Rename fields
Create Calculations
Create Hierarchies
Hide fields: hide anything empty or not used by anyone, for improved extract generation performance
Set default properties (number format, comments)
Assign Data Types and Geographic Roles
Build parameters

6. Depending on your version, you have two options:

9.3 and after: publish the workbook to a Project dedicated to Data Source Revisions.

9.2 and before: save the TWB, following your Revision Control Process to check the changes in.

Organizing your workshop: Revision Control Process

The essential aspect of Revision Control is tracking the changes you make so you can go
back in case of a mistake, or just see who made a particular change.

9.3 and after

In Tableau Server 9.3, Revision Control can be turned on for workbooks. Keep separate workbooks for your data sources, then publish them to a Project dedicated to Data Source Revisions. A workbook needs at least one viz or filter before it can be published; make something simple and easy.

9.2 and before

Prior to 9.3, Tableau Server does not track changes to overwritten TWB or TDS files, so you have a choice in your revision process if using an earlier version. All you need is a method for distinguishing between versions and a way to check files in and out.
If your method is a simple shared drive, keep the first part of the file name the same and add _timestamp (and possibly _initials) to the filenames to keep them distinct. It is also
suggested that you publish the updated data source to a Tableau Server Project, visible only
to Data Stewards, using the same naming conventions.
If using SharePoint, checking the file back into SharePoint should archive the previous
version and keep the most recent check-in as the current file.
Industry-standard revision control platforms like CVS, SVN, etc. can all be used for the same purpose.
Optional: Casting a Test Statue

Many organizations require a QA process before making any change to the verified data
sources that end user workbooks depend on. In this case, the data steward will first publish
the new data source (creating an extract if that will be the final result) to a QA Project on
Tableau Server. Then the QA team will open a copy of existing workbooks that connect to the original data source in Desktop, and use the Replace Data Source functionality to confirm that the updated data source will not cause any issues when it replaces the original data source. Once QA has confirmed, they report their approval back to the Data Steward. QA does not move the data source into the Verified Data Source Project.

Casting the final statue: Publishing to Verified Data Source Project


1. Open the approved data source:

9.3 and later: connect to the Published Data Source workbook.

9.2 and before: open the latest TWB from the Revision Control location, the one that QA approved.

2. If the final data source will be a Tableau Data Extract, follow these steps:

Add a Create Empty Extract parameter and calculation, per this article.

Set the Create Empty Extract parameter to Yes, then take an extract from the local copy, adding the Create Empty Extract filter per the article so that the extract generates with no data very quickly.

Set the Create Empty Extract parameter to No before publishing.

3. Publish the data source (live or extract) to the appropriate Project on the Tableau
Server, keeping the same name as was originally published (take out any revision
tracking endings). Overwrite if you are updating an existing data source so existing
workbook references continue to work. This Project should have permissions that
allow Data Stewards to Publish, and Business Users to Connect. In 9.3 and after: make sure that Update workbook to use the published data source is NOT SELECTED. You don't want to connect the Data Source workbook to the version you just published.
4. Extract Only: To get the data in the extract generated, go into the Data Sources page on Tableau Server, select the data source, and then choose the Permissions menu option. Choose Scheduled Tasks and then run the extract refresh right away.
5. Business Users should connect to the published datasource via the Tableau Server
connection in Tableau Desktop to begin their own explorations of the data.

7 tips to improve Tableau

workbook performance
You have invested in Tableau - great! I think you have boarded the right ship and you are on your way to giving yourself and your organization new and easier insight into your data. But all the greatest insights in the world can be overlooked if you are met, time after time, by slow-loading information.
I think Tableau can do incredible visualizations of enormous amounts of data in no time, and I keep getting impressed. But with that being said, we do have customers experiencing performance troubles with big data samples. So here is my take on performance issues with Tableau.
There is definitely a technical angle to performance improvement, and I'll get to that later, but where I usually see the biggest performance improvements is when time is invested in understanding the Tableau philosophy - so this is me preaching the word.
Tableau is a data analysis and visualization tool that brings analytics to everyone. This means that we can do analytics at the speed of thought, automate them, customize them to the receiver and answer ad hoc inquiries. But with great power comes great responsibility :) As an analyst working with big datasets, you should take a stand and consider the following when creating analytics in Tableau:
What is the question you want to answer?

Instead of visualizing every possible figure and combination of variables in the dataset, your job is to
give the end-user easy access to the answers to his or her questions - often less is more.
Who is the receiver of the report?
As the production and customization of your reports are easier than ever, utilize it! Don't make one
master dashboard/workbook to service the entire organization. The combination of all the
organization's data sources and all the different questions you need to answer is bound to give you
poor performance.
What's possible with the data at hand?
Take a good look at the data and the data structure. What are your options, Tableau-wise, with the data at hand? Plan your analysis before you start and make the necessary adjustments to the data, or talk to your DBA and ask him/her to help you out. This way you won't end up down some narrow dark alley where your only way out is extremely complex table calcs or crazy parameter filters.
Avoid reproducing old Excel reports in Tableau. Use the Tableau functionalities to reinvent the report and give the end-user easier access to insights. Reproducing old Excel reports is often a hassle and can force you to make solutions that are less than efficient and tough on performance.
Don't make Tableau a data extracting tool where your only wish is to make a list or a table of all your
records for the past month, quarter or year.
Don't build Tableau in Tableau. If the end-user should have every thinkable option to combine and analyze data, with no pre-thought put into the report, then the end-user needs a Tableau Desktop license and not a dashboard with all the Desktop functionalities built in. Give him or her the fantastic experience of data discovery with Tableau Desktop.
But enough of me hearing myself talk about Tableau philosophy - now to the technical part of
improving your workbook performance.

If you have already considered and implemented the above and your workbook is still not performing as it should, there are a couple of things you should consider:
1. Your workbook will never perform better than the underlying database
Working with big datasets will often mean you are connected to a database and not just Excel files. Tableau does not import the data but queries the database and gets an answer back. If the database is not responding fast enough, you'll end up with a slow workbook. Try to make a query directly in the database against the same view/table (or ask your database guy to do it) and see if the response there is just as slow. If that's the case, your performance issue should be solved in the database and not in Tableau. The issue could be related to an extensive number of joins, or tables not optimized for joining. The solution could be indexing the tables, or creating a new table instead of the view with the underlying joins.
2. Data connection
If your database is performing but your workbook is not you should take a look at your data
connection. A couple of suggestions for better data connections could be:
If you are not already using an extract, do it! A Tableau Data Extract (.tde) can really improve performance compared to a live connection for many big datasets.
If possible, always use native connections.
Limit or avoid the use of custom SQL.
Are you blending? Data blending in Tableau is very powerful, but it can also be a performance killer. Be aware of the level at which you are blending (you want to blend at as high a level as possible) and also the amount of data. If possible, you should always choose joining over blending.
3. Data size
We all love huge amounts of data, and Tableau is a great tool to handle it, but if you experience performance issues you should consider eliminating unnecessary data. Keep only the data that you need to answer your questions, by making partial extracts or by adding a data source filter. Do you
really need all 100 columns? And all the records for the past 10 years?

4. Performance recording
Tableau has a built-in performance recorder that you can utilize. The performance recording outputs a Tableau report showing you the query time for all your different elements. This is a good starting point when diagnosing your workbook.
5. Filters and Parameters
We all love quick filters, but use them wisely. Don't plaster your dashboard with quick filters - it makes your viz more complicated, it makes it slower and it doesn't look good. As a rule of thumb,
just try to keep your visualizations simple and informative and use dashboard actions to improve
interaction, insights and performance. Same goes with parameters - use them when needed and not
just because you can.
6. Calculations
Are you making very complex calculated fields? On huge datasets this can affect performance. String calculations especially can drain performance, but so can complicated date manipulations and calculations. You should consider making some of your complex calculations in the data layer, before the data enters Tableau. Another solution could be to utilize the Tableau Data Extract and the optimize option, which "saves" the values of your calculations for future queries.
7. Layout
Keep your dashboards simple, not only for performance but also for the user experience. Stick to a maximum of 4 sheets on your dashboard, keep the number of quick filters and parameters down, and use action filters. Consider the complexity of your graphs. Do you really need a scatter plot with 100,000 marks? Does it give the end user valuable insights? Probably not, and it's bad for your performance.
Implementing, or at least keeping in mind, the above will hopefully improve performance for your workbooks, and also improve the insights that you deliver to your end users, which of course is the ultimate goal.