
Superset Documentation

Apache Superset Dev

Dec 05, 2019


CONTENTS

1 Superset Resources

2 Apache Software Foundation Resources

3 Overview
3.1 Features
3.2 Databases
3.3 Screenshots
3.4 Contents
3.5 Indices and tables


Apache Superset (incubating) is a modern, enterprise-ready business intelligence web application.

Important: Disclaimer: Apache Superset is an effort undergoing incubation at The Apache Software Foundation
(ASF), sponsored by the Apache Incubator. Incubation is required of all newly accepted projects until a further review
indicates that the infrastructure, communications, and decision making process have stabilized in a manner
consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the
completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.

Note: Apache Superset, Superset, Apache, the Apache feather logo, and the Apache Superset project logo are either
registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.

CHAPTER ONE

SUPERSET RESOURCES

• Superset’s GitHub; note that we use GitHub for issue tracking


• Superset’s contribution guidelines and code of conduct on Github.
• Our mailing list archives. To subscribe, send an email to dev-subscribe@superset.apache.org
• Join our Slack



CHAPTER TWO

APACHE SOFTWARE FOUNDATION RESOURCES

• The Apache Software Foundation Website


• Current Events
• License
• Thanks to the ASF’s sponsors
• Sponsor Apache!



CHAPTER THREE

OVERVIEW

3.1 Features

• A rich set of data visualizations


• An easy-to-use interface for exploring and visualizing data
• Create and share dashboards
• Enterprise-ready authentication with integration with major authentication providers (database, OpenID, LDAP,
OAuth & REMOTE_USER through Flask AppBuilder)
• An extensible, high-granularity security/permission model allowing intricate rules on who can access individual
features and the dataset
• A simple semantic layer, allowing users to control how data sources are displayed in the UI by defining which
fields should show up in which drop-down and which aggregation and function metrics are made available to
the user
• Integration with most SQL-speaking RDBMS through SQLAlchemy
• Deep integration with Druid.io

3.2 Databases

The following RDBMS are currently supported:


• Amazon Athena
• Amazon Redshift
• Apache Drill
• Apache Druid
• Apache Hive
• Apache Impala
• Apache Kylin
• Apache Pinot
• Apache Spark SQL
• BigQuery
• ClickHouse
• Elasticsearch
• Exasol
• Google Sheets
• Greenplum
• IBM Db2
• MySQL
• Oracle
• PostgreSQL
• Presto
• Snowflake
• SQLite
• SQL Server
• Teradata
• Vertica
• Hana
Other database engines with a proper DB-API driver and SQLAlchemy dialect should be supported as well.


3.3 Screenshots

[Screenshots: the Explore view for a chart (datasource, chart type, time range, and query controls), the SQL Lab query editor, and a deck.gl demo dashboard with Scatterplot, Screen Grid, Hexagons, Polygons, Arcs, and Path layers.]

3.4 Contents

3.4.1 Installation & Configuration

Getting Started

Superset has deprecated support for Python 2.* and supports only Python ~=3.6 to take advantage of newer
Python features and reduce the burden of supporting previous versions. We run our test suite against 3.6, but
3.7 is fully supported as well.

Cloud-native!

Superset is designed to be highly available. It is “cloud-native” as it has been designed to scale out in large, distributed
environments, and works well inside containers. While you can easily test drive Superset on a modest setup or
simply on your laptop, there’s virtually no limit around scaling out the platform. Superset is also cloud-native in
the sense that it is flexible and lets you choose your web server (Gunicorn, Nginx, Apache), your metadata database
engine (MySQL, Postgres, MariaDB, . . . ), your message queue (Redis, RabbitMQ, SQS, . . . ), your results
backend (S3, Redis, Memcached, . . . ), and your caching layer (Memcached, Redis, . . . ). It works well with services like
NewRelic, StatsD and DataDog, and can run analytic workloads against most popular database technologies.
Superset is battle tested in large environments with hundreds of concurrent users. Airbnb’s production environment
runs inside Kubernetes and serves 600+ daily active users viewing over 100K charts a day.
The Superset web server and the Superset Celery workers (optional) are stateless, so you can scale out by running on
as many servers as needed.

Start with Docker

Note: The Docker-related files and documentation are actively maintained and managed by the core committers
working on the project. Help and contributions around Docker are welcomed!

If you know Docker, you're in luck: there is a shortcut to initialize a development environment:

git clone https://github.com/apache/incubator-superset/
cd incubator-superset
# you can run this command every time you need to start Superset now:
docker-compose up

After several minutes for the initialization to finish, you can open a browser and view http://localhost:8088
to start your journey.
From there, the container server will reload on modification of the Superset Python and JavaScript source code. Don’t
forget to reload the page to take the new frontend into account though.
See also CONTRIBUTING.md#building for an alternative way of serving the frontend.
It is currently not recommended to run docker-compose in production.
If you are attempting to build on a Mac and it exits with code 137, you need to increase your Docker resources. OSX
instructions: https://docs.docker.com/docker-for-mac/#advanced (search for memory).

Or, if you’re curious and want to install Superset from the bottom up, then go ahead. See also docker/README.md.

OS dependencies

Superset stores database connection information in its metadata database. For that purpose, we use the
cryptography Python library to encrypt connection passwords. Unfortunately, this library has OS-level dependencies.
You may want to attempt the next step (“Superset installation and initialization”) and come back to this step if you
encounter an error.
Here’s how to install them:
For Debian and Ubuntu, the following command will ensure that the required dependencies are installed:

sudo apt-get install build-essential libssl-dev libffi-dev python-dev python-pip libsasl2-dev libldap2-dev

Ubuntu 18.04 If you have python3.6 installed alongside python2.7, as is the default on Ubuntu 18.04 LTS, also run this
command:

sudo apt-get install build-essential libssl-dev libffi-dev python3.6-dev python-pip libsasl2-dev libldap2-dev

otherwise the build for cryptography fails.


For Fedora and RHEL-derivatives, the following command will ensure that the required dependencies are
installed:

sudo yum upgrade python-setuptools
sudo yum install gcc gcc-c++ libffi-devel python-devel python-pip python-wheel openssl-devel cyrus-sasl-devel openldap-devel

Mac OS X If possible, you should upgrade to the latest version of OS X as issues are more likely to be resolved for
that version. You will likely need the latest version of XCode available for your installed version of OS X. You should
also install the XCode command line tools:
xcode-select --install

System python is not recommended. Homebrew’s python also ships with pip:

brew install pkg-config libffi openssl python

env LDFLAGS="-L$(brew --prefix openssl)/lib" CFLAGS="-I$(brew --prefix openssl)/include" pip install cryptography==2.4.2

Windows isn’t officially supported at this point, but if you want to attempt it, download get-pip.py, and run
python get-pip.py which may need admin access. Then run the following:
C:\> pip install cryptography

# You may also have to create C:\Temp


C:\> md C:\Temp

Python virtualenv

It is recommended to install Superset inside a virtualenv. Python 3 already ships virtualenv. But if it’s not installed
in your environment for some reason, you can install it via the package for your operating system, otherwise you
can install it from pip:

pip install virtualenv

You can create and activate a virtualenv by:

# virtualenv is shipped in Python 3.6+ as venv instead of pyvenv.


# See https://docs.python.org/3.6/library/venv.html
python3 -m venv venv
. venv/bin/activate

On Windows the syntax for activating it is a bit different:

venv\Scripts\activate

Once you activated your virtualenv everything you are doing is confined inside the virtualenv. To exit a virtualenv just
type deactivate.

Python’s setup tools and pip

Put all the chances on your side by getting the very latest pip and setuptools libraries:

pip install --upgrade setuptools pip

Superset installation and initialization

Follow these few simple steps to install Superset:

# Install superset
pip install apache-superset

# Initialize the database
superset db upgrade

# Create an admin user (you will be prompted to set a username, first and last name
# before setting a password)
export FLASK_APP=superset
flask fab create-admin

# Load some data to play with
superset load_examples

# Create default roles and permissions
superset init

# To start a development web server on port 8088, use -p to bind to another port
superset run -p 8088 --with-threads --reload --debugger

After installation, you should be able to point your browser to the right hostname:port http://localhost:8088, log in using
the credentials you entered while creating the admin account, and navigate to Menu -> Admin -> Refresh Metadata.
This action should bring in all of your datasources for Superset to be aware of, and they should show up in Menu ->
Datasources, from where you can start playing with your data!


A proper WSGI HTTP Server

While you can set up Superset to run on Nginx or Apache, many use Gunicorn, preferably in async mode, which
allows for impressive concurrency and is fairly easy to install and configure. Please refer to the documentation
of your preferred technology to set up this Flask WSGI application in a way that works well in your environment.
Here’s an async setup known to work well in production:
gunicorn \
-w 10 \
-k gevent \
--timeout 120 \
-b 0.0.0.0:6666 \
--limit-request-line 0 \
--limit-request-field_size 0 \
--statsd-host localhost:8125 \
"superset.app:create_app()"

Refer to the Gunicorn documentation for more information.


Note that the development web server (superset run or flask run) is not intended for production use.
If not using gunicorn, you may want to disable the use of flask-compress by setting
ENABLE_FLASK_COMPRESS = False in your superset_config.py

Flask-AppBuilder Permissions

By default, every time the Flask-AppBuilder (FAB) app is initialized, the permissions and views are added
automatically to the backend and associated with the ‘Admin’ role. The issue, however, is that when you are
running multiple concurrent workers, this creates a lot of contention and race conditions when defining
permissions and views.
To alleviate this issue, the automatic updating of permissions can be disabled by setting FAB_UPDATE_PERMS
= False (defaults to True).
In a production environment, initialization could take on the following form:

superset init
gunicorn -w 10 ... superset:app

Configuration behind a load balancer

If you are running Superset behind a load balancer or reverse proxy (e.g. NGINX or ELB on AWS), you may
need to utilise a healthcheck endpoint so that your load balancer knows if your Superset instance is running.
This is provided at /health, which will return a 200 response containing “OK” if the webserver is running.
If the load balancer is inserting X-Forwarded-For/X-Forwarded-Proto headers, you should set ENABLE_PROXY_FIX
= True in the Superset config file to extract and use the headers.
In case the reverse proxy is used for providing SSL encryption, an explicit definition of the X-Forwarded-Proto
header may be required. For the Apache webserver this can be set as follows:

RequestHeader set X-Forwarded-Proto "https"
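
On the Superset side, the proxy-related setting mentioned above goes into superset_config.py; a minimal sketch:

# superset_config.py: trust the X-Forwarded-* headers set by the proxy
ENABLE_PROXY_FIX = True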

Configuration

To configure your application, you need to create a file (module) superset_config.py and make sure it is in
your PYTHONPATH. Here are some of the parameters you can copy / paste in that configuration module:

#---------------------------------------------------------
# Superset specific config
#---------------------------------------------------------
ROW_LIMIT = 5000

SUPERSET_WEBSERVER_PORT = 8088
#---------------------------------------------------------

#---------------------------------------------------------
# Flask App Builder configuration
#---------------------------------------------------------
# Your App secret key
SECRET_KEY = '\2\1thisismyscretkey\1\2\e\y\y\h'

# The SQLAlchemy connection string to your database backend


# This connection defines the path to the database that stores your
# superset metadata (slices, connections, tables, dashboards, ...).
# Note that the connection information to connect to the datasources
# you want to explore are managed directly in the web UI
SQLALCHEMY_DATABASE_URI = 'sqlite:////path/to/superset.db'

# Flask-WTF flag for CSRF


WTF_CSRF_ENABLED = True
# Add endpoints that need to be exempt from CSRF protection
WTF_CSRF_EXEMPT_LIST=[]
# A CSRF token that expires in 1 year
WTF_CSRF_TIME_LIMIT = 60 * 60 * 24 * 365

# Set this API key to enable Mapbox visualizations


MAPBOX_API_KEY = ''

All the parameters and default values defined in https://github.com/apache/incubator-superset/blob/master/superset/config.py
can be altered in your local superset_config.py. Administrators will want to read through the file to
understand what can be configured locally as well as the default values in place.
Since superset_config.py acts as a Flask configuration module, it can be used to alter the settings of Flask
itself, as well as Flask extensions like flask-wtf, flask-cache, flask-migrate, and flask-appbuilder.
Flask App Builder, the web framework used by Superset, offers many configuration settings.
Please consult the Flask App Builder Documentation for more information on how to configure it.
Make sure to change:
• SQLALCHEMY_DATABASE_URI, by default it is stored at ~/.superset/superset.db
• SECRET_KEY, to a long random string
In case you need to exempt endpoints from CSRF, e.g. you are running a custom auth postback endpoint, you can
add them to WTF_CSRF_EXEMPT_LIST:

WTF_CSRF_EXEMPT_LIST = ['']

Database dependencies

Superset does not ship bundled with connectivity to databases, except for SQLite, which is part of the Python standard
library. You’ll need to install the required packages for the database you want to use as your metadata database as
well as the packages needed to connect to the databases you want to access through Superset.
Here’s a list of some of the recommended packages.

database          | pypi package                                                      | SQLAlchemy URI prefix
Amazon Athena     | pip install "PyAthenaJDBC>1.0.9"                                  | awsathena+jdbc://
Amazon Athena     | pip install "PyAthena>1.2.0"                                      | awsathena+rest://
Amazon Redshift   | pip install sqlalchemy-redshift                                   | redshift+psycopg2://
Apache Drill      | pip install sqlalchemy-drill                                      | For the REST API: drill+sadrill:// For JDBC: drill+jdbc://
Apache Druid      | pip install pydruid                                               | druid://
Apache Hive       | pip install pyhive                                                | hive://
Apache Impala     | pip install impyla                                                | impala://
Apache Kylin      | pip install kylinpy                                               | kylin://
Apache Pinot      | pip install pinotdb                                               | pinot+http://CONTROLLER:5436/query?server=http://CONTROLLER:5983/
Apache Spark SQL  | pip install pyhive                                                | jdbc+hive://
BigQuery          | pip install pybigquery                                            | bigquery://
ClickHouse        | pip install sqlalchemy-clickhouse                                 |
Elasticsearch     | pip install elasticsearch-dbapi                                   | elasticsearch+http://
Exasol            | pip install sqlalchemy-exasol                                     | exa+pyodbc://
Google Sheets     | pip install gsheetsdb                                             | gsheets://
IBM Db2           | pip install ibm_db_sa                                             | db2+ibm_db://
MySQL             | pip install mysqlclient                                           | mysql://
Oracle            | pip install cx_Oracle                                             | oracle://
PostgreSQL        | pip install psycopg2                                              | postgresql+psycopg2://
Presto            | pip install pyhive                                                | presto://
Snowflake         | pip install snowflake-sqlalchemy                                  | snowflake://
SQLite            |                                                                   | sqlite://
SQL Server        | pip install pymssql                                               | mssql://
Teradata          | pip install sqlalchemy-teradata                                   | teradata://
Vertica           | pip install sqlalchemy-vertica-python                             | vertica+vertica_python://
Hana              | pip install hdbcli sqlalchemy-hana or pip install superset[hana]  | hana://

Note that many other databases are supported, the main criteria being the existence of a functional
SQLAlchemy dialect and Python driver. Googling the keyword sqlalchemy in addition to a keyword that
describes the database you want to connect to should get you to the right place.

Hana

The connection string for Hana looks like this

hana://{username}:{password}@{host}:{port}

(AWS) Athena

The connection string for Athena looks like this

awsathena+jdbc://{aws_access_key_id}:{aws_secret_access_key}@athena.{region_name}.amazonaws.com/{schema_name}?s3_staging_dir={s3_staging_dir}&...

Where you need to escape/encode at least the s3_staging_dir, i.e.,

s3://... -> s3%3A//...
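
If you build the URI programmatically, one way to encode the staging directory is with Python's standard library; a minimal sketch (bucket, region and schema are placeholders, not defaults):

from urllib.parse import quote_plus

# Placeholder values; substitute your own region, schema and bucket.
s3_staging_dir = "s3://my-bucket/athena-results/"

uri = (
    "awsathena+rest://{aws_access_key_id}:{aws_secret_access_key}"
    "@athena.us-east-1.amazonaws.com/my_schema"
    "?s3_staging_dir=" + quote_plus(s3_staging_dir)
)
# quote_plus() turns "s3://..." into "s3%3A%2F%2F...", which is safe inside a query string.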

You can also use the PyAthena library (no Java required) like this

awsathena+rest://{aws_access_key_id}:{aws_secret_access_key}@athena.{region_name}.amazonaws.com/{schema_name}?s3_staging_dir={s3_staging_dir}&...

See PyAthena.

(Google) BigQuery

The connection string for BigQuery looks like this

bigquery://{project_id}

Additionally, you will need to configure authentication via a Service Account. Create your Service Account via the
Google Cloud Platform control panel, provide it access to the appropriate BigQuery datasets, and download the
JSON configuration file for the service account. In Superset, add a JSON blob to the “Secure Extra” field in the
database configuration page with the following format:
{
    "credentials_info": <contents of credentials JSON file>
}

The resulting file should have this structure

{
"credentials_info": {
"type": "service_account",
"project_id": "...",
"private_key_id": "...",
"private_key": "...",
"client_email": "...",
"client_id": "...",
"auth_uri": "...",
"token_uri": "...",
"auth_provider_x509_cert_url": "...",
"client_x509_cert_url": "...",
}
}


You should then be able to connect to your BigQuery datasets.


To be able to upload data, e.g. sample data, the python library pandas_gbq is required.
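
If you prefer to generate the “Secure Extra” blob from the downloaded key file rather than editing it by hand, a minimal sketch (the file path is illustrative):

import json

# Path to the service account key downloaded from the GCP console (illustrative).
with open("/path/to/service_account.json") as f:
    credentials_info = json.load(f)

# Paste the printed JSON into the "Secure Extra" field of the database configuration.
print(json.dumps({"credentials_info": credentials_info}, indent=2))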

Elasticsearch

The connection string for Elasticsearch looks like this

elasticsearch+http://{user}:{password}@{host}:9200/

Using HTTPS

elasticsearch+https://{user}:{password}@{host}:9200/

Elasticsearch has a default limit of 10000 rows, so you can increase this limit on your cluster or set Superset’s row limit
in the config:
ROW_LIMIT = 10000

You can query multiple indices in SQL Lab, for example:

select timestamp, agent from "logstash-*"

But, to use visualizations for multiple indices you need to create an alias index on your cluster

POST /_aliases
{
"actions" : [
{ "add" : { "index" : "logstash-**", "alias" : "logstash_all" } }
]
}

Then register your table with the alias name logstash_all

Snowflake

The connection string for Snowflake looks like this

snowflake://{user}:{password}@{account}.{region}/{database}?role={role}&warehouse=
˓→{warehouse}

The schema is not necessary in the connection string, as it is defined per table/query. The role and warehouse can
be omitted if defaults are defined for the user, i.e.
snowflake://{user}:{password}@{account}.{region}/{database}
Make sure the user has privileges to access and use all required databases/schemas/tables/views/warehouses, as the
Snowflake SQLAlchemy engine does not test for user rights during engine creation.
See Snowflake SQLAlchemy.

Teradata

The connection string for Teradata looks like this


teradata://{user}:{password}@{host}

Note: It’s required to have Teradata ODBC drivers installed and environment variables configured for the
sqlalchemy dialect to work properly. Teradata ODBC drivers are available here: https://downloads.teradata.com/download/connectivity/odbc-driver/linux
Required environment variables:

export ODBCINI=/.../teradata/client/ODBC_64/odbc.ini
export ODBCINST=/.../teradata/client/ODBC_64/odbcinst.ini

See Teradata SQLAlchemy.

Apache Drill

At the time of writing, the SQLAlchemy Dialect is not available on pypi and must be downloaded here:
SQLAlchemy Drill
Alternatively, you can install it completely from the command line as follows:

git clone https://github.com/JohnOmernik/sqlalchemy-drill


cd sqlalchemy-drill
python3 setup.py install

Once that is done, you can connect to Drill in two ways, either via the REST interface or by JDBC. If you are
connecting via JDBC, you must have the Drill JDBC Driver installed.
The basic connection string for Drill looks like this

drill+sadrill://{username}:{password}@{host}:{port}/{storage_plugin}?use_ssl=True

If you are using JDBC to connect to Drill, the connection string looks like this:

drill+jdbc://{username}:{password}@{host}:{port}/{storage_plugin}

For a complete tutorial about how to use Apache Drill with Superset, see this tutorial: Visualize Anything with
Superset and Drill

Caching

Superset uses Flask-Cache for caching purposes. Configuring your caching backend is as easy as providing a
CACHE_CONFIG constant in your superset_config.py that complies with the Flask-Cache specifications.
Flask-Cache supports multiple caching backends (Redis, Memcached, SimpleCache (in-memory), or the local filesystem).
If you are going to use Memcached, please use the pylibmc client library as python-memcached does not
handle storing binary data correctly. If you use Redis, please install the redis Python package:
pip install redis

For setting your timeouts, this is done in the Superset metadata and goes up the “timeout searchpath”, from your
slice configuration, to your data source’s configuration, to your database’s and ultimately falls back into your global
default defined in CACHE_CONFIG.


CACHE_CONFIG = {
'CACHE_TYPE': 'redis',
'CACHE_DEFAULT_TIMEOUT': 60 * 60 * 24, # 1 day default (in secs)
'CACHE_KEY_PREFIX': 'superset_results',
'CACHE_REDIS_URL': 'redis://localhost:6379/0',
}

It is also possible to pass a custom cache initialization function in the config to handle additional caching use cases.
The function must return an object that is compatible with the Flask-Cache API.
from custom_caching import CustomCache

def init_cache(app):
"""Takes an app instance and returns a custom cache backend"""
config = {
'CACHE_DEFAULT_TIMEOUT': 60 * 60 * 24, # 1 day default (in secs)
'CACHE_KEY_PREFIX': 'superset_results',
}
return CustomCache(app, config)

CACHE_CONFIG = init_cache

Superset has a Celery task that will periodically warm up the cache based on different strategies. To use it, add the
following to the CELERYBEAT_SCHEDULE section in config.py:
CELERYBEAT_SCHEDULE = {
'cache-warmup-hourly': {
'task': 'cache-warmup',
'schedule': crontab(minute=0, hour='*'), # hourly
'kwargs': {
'strategy_name': 'top_n_dashboards',
'top_n': 5,
'since': '7 days ago',
},
},
}

This will cache all the charts in the top 5 most popular dashboards every hour. For other strategies, check the
superset/tasks/cache.py file.

Deeper SQLAlchemy integration

It is possible to tweak the database connection information using the parameters exposed by SQLAlchemy. In the
Database edit view, you will find an extra field as a JSON blob.


This JSON string contains extra configuration elements. The engine_params object gets unpacked into the
sqlalchemy.create_engine call, while the metadata_params get unpacked into the sqlalchemy.MetaData call.
Refer to the SQLAlchemy docs for more information.
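
To make that unpacking concrete, a rough sketch of what those two keys map to (the URI and parameter values are purely illustrative, not defaults):

from sqlalchemy import MetaData, create_engine

extra = {
    "metadata_params": {},
    "engine_params": {
        "pool_size": 10,
        "connect_args": {"connect_timeout": 10},
    },
}

# Conceptually, Superset does roughly the following with the blob:
engine = create_engine(
    "postgresql+psycopg2://user:password@host/dbname",  # illustrative URI
    **extra["engine_params"],
)
metadata = MetaData(bind=engine, **extra["metadata_params"])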

Note: If you’re using CTAS in SQL Lab with PostgreSQL, take a look at Create Table As (CTAS) for specific
engine_params.

Schemas (Postgres & Redshift)

Postgres and Redshift, as well as other databases, use the concept of schema as a logical entity on top of the database.
For Superset to connect to a specific schema, there’s a schema parameter you can set in the table form.

External Password store for SQLAlchemy connections

It is possible to use an external store for your database passwords. This is useful if you are running a custom secret
distribution framework and do not wish to store secrets in Superset’s meta database.
Example: Write a function that takes a single argument of type sqla.engine.url and returns the password for
the given connection string. Then set SQLALCHEMY_CUSTOM_PASSWORD_STORE in your config file to point to
that function.
def example_lookup_password(url):
    secret = ...  # get the password from your external framework for the given url
    return secret

SQLALCHEMY_CUSTOM_PASSWORD_STORE = example_lookup_password

A common pattern is to use environment variables to make secrets available.


SQLALCHEMY_CUSTOM_PASSWORD_STORE can also be used for that purpose.

def example_password_as_env_var(url):
    # assuming the uri looks like
    # mysql://localhost?superset_user:{SUPERSET_PASSWORD}
    return url.password.format(**os.environ)

SQLALCHEMY_CUSTOM_PASSWORD_STORE = example_password_as_env_var


SSL Access to databases

This example worked with a MySQL database that requires SSL. The configuration may differ with other backends.
This is what was put in the extra parameter
{
"metadata_params": {},
"engine_params": {
"connect_args":{
"sslmode":"require",
"sslrootcert": "/path/to/my/pem"
}
}
}

Druid

• From the UI, enter the information about your clusters in the Sources -> Druid Clusters menu by hitting
the + sign.
• Once the Druid cluster connection information is entered, hit the Sources -> Refresh Druid Metadata menu item
to populate
• Navigate to your datasources
Note that you can run the superset refresh_druid command to refresh the metadata from your Druid cluster(s)

Presto

By default Superset assumes the most recent version of Presto is being used when querying the datasource. If you’re
using an older version of presto, you can configure it in the extra parameter:
{
"version": "0.123"
}

Exasol

The connection string for Exasol looks like this

exa+pyodbc://{user}:{password}@{host}

Note: It’s required to have Exasol ODBC drivers installed for the sqlalchemy dialect to work properly. Exasol
ODBC drivers are available here: https://www.exasol.com/portal/display/DOWNLOAD/Exasol+Download+Section
Example config (odbcinst.ini can be left empty):

$ cat $/.../path/to/odbc.ini
[EXAODBC]
DRIVER = /.../path/to/driver/EXASOL_driver.so
EXAHOST = host:8563
EXASCHEMA = main

See SQLAlchemy for Exasol.


CORS

The extra CORS dependency must be installed:

superset[cors]

The following keys in superset_config.py can be specified to configure CORS:
• ENABLE_CORS: Must be set to True in order to enable CORS
• CORS_OPTIONS: options passed to Flask-CORS (documentation: https://flask-cors.corydolphin.com/en/latest/api.html#extension)
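
A minimal superset_config.py sketch (the origin shown is illustrative; any keyword accepted by Flask-CORS can go in CORS_OPTIONS):

ENABLE_CORS = True
CORS_OPTIONS = {
    # These options are passed straight to Flask-CORS.
    'supports_credentials': True,
    'origins': ['https://dashboards.example.com'],  # illustrative origin
}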

Domain Sharding

Chrome allows up to 6 open connections per domain at a time. When there are more than 6 slices in a dashboard, a lot
of fetch requests are queued up and wait for the next available socket. PR 5039 adds domain sharding to Superset,
and this feature will be enabled by configuration only (by default Superset doesn’t allow cross-domain requests).
• SUPERSET_WEBSERVER_DOMAINS: list of allowed hostnames for the domain sharding feature. Default None.
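
A minimal sketch of the setting (hostnames are illustrative; all of them must resolve to this Superset instance):

SUPERSET_WEBSERVER_DOMAINS = [
    'superset-1.example.com',
    'superset-2.example.com',
    'superset-3.example.com',
]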

Middleware

Superset allows you to add your own middleware. To add your own middleware, update the
ADDITIONAL_MIDDLEWARE key in your superset_config.py. ADDITIONAL_MIDDLEWARE should be a list
of your additional middleware classes.
For example, to use AUTH_REMOTE_USER from behind a proxy server like nginx, you have to add a simple
middleware class to add the value of HTTP_X_PROXY_REMOTE_USER (or any other custom header from the proxy) to
Gunicorn’s REMOTE_USER environment variable:
class RemoteUserMiddleware(object):
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        user = environ.pop('HTTP_X_PROXY_REMOTE_USER', None)
        environ['REMOTE_USER'] = user
        return self.app(environ, start_response)

ADDITIONAL_MIDDLEWARE = [RemoteUserMiddleware, ]

Adapted from http://flask.pocoo.org/snippets/69/

Event Logging

Superset by default logs special action events in its database. These logs can be accessed in the UI by navigating to
“Security” -> “Action Log”. You can freely customize these logs by implementing your own event log class.
Example of a simple JSON to Stdout class:

import json

# AbstractEventLogger is provided by Superset
class JSONStdOutEventLogger(AbstractEventLogger):

    def log(self, user_id, action, *args, **kwargs):
        records = kwargs.get('records', list())
        dashboard_id = kwargs.get('dashboard_id')
        slice_id = kwargs.get('slice_id')
        duration_ms = kwargs.get('duration_ms')
        referrer = kwargs.get('referrer')

        for record in records:
            log = dict(
                action=action,
                json=record,
                dashboard_id=dashboard_id,
                slice_id=slice_id,
                duration_ms=duration_ms,
                referrer=referrer,
                user_id=user_id,
            )
            print(json.dumps(log))

Then in Superset’s config, pass an instance of the logger type you want to use:
EVENT_LOGGER = JSONStdOutEventLogger()

Upgrading

Upgrading should be as straightforward as running:

pip install apache-superset --upgrade


superset db upgrade
superset init

We recommend following standard best practices when upgrading Superset, such as taking a database backup prior
to the upgrade, upgrading a staging environment prior to upgrading production, and upgrading production while
fewer users are active on the platform.

Note: Some upgrades may contain backward-incompatible changes or require scheduling downtime. When that is
the case, contributors attach notes in UPDATING.md in the repository. It’s recommended to review this file prior to
running an upgrade.

Celery Tasks

On large analytic databases, it’s common to run queries that execute for minutes or hours. To enable support for long
running queries that execute beyond the typical web request’s timeout (30-60 seconds), it is necessary to configure
an asynchronous backend for Superset which consists of:
• one or many Superset workers (implemented as Celery workers), which can be started with the
celery worker command; run celery worker --help to view the related options.
• a celery broker (message queue) for which we recommend using Redis or RabbitMQ
• a results backend that defines where the worker will persist the query results
Configuring Celery requires defining a CELERY_CONFIG in your superset_config.py. Both the worker and
web server processes should have the same configuration.
class CeleryConfig(object):
    BROKER_URL = 'redis://localhost:6379/0'
    CELERY_IMPORTS = (
        'superset.sql_lab',
        'superset.tasks',
    )
    CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'
    CELERYD_LOG_LEVEL = 'DEBUG'
    CELERYD_PREFETCH_MULTIPLIER = 10
    CELERY_ACKS_LATE = True
    CELERY_ANNOTATIONS = {
        'sql_lab.get_sql_results': {
            'rate_limit': '100/s',
        },
        'email_reports.send': {
            'rate_limit': '1/s',
            'time_limit': 120,
            'soft_time_limit': 150,
            'ignore_result': True,
        },
    }
    CELERYBEAT_SCHEDULE = {
        'email_reports.schedule_hourly': {
            'task': 'email_reports.schedule_hourly',
            'schedule': crontab(minute=1, hour='*'),
        },
    }

CELERY_CONFIG = CeleryConfig

• To start a Celery worker to leverage the configuration run:

celery worker --app=superset.tasks.celery_app:app --pool=prefork -O fair -c 4

• To start a job which schedules periodic background jobs, run

celery beat --app=superset.tasks.celery_app:app

To set up a results backend, you need to pass an instance of a derivative of werkzeug.contrib.cache.BaseCache
to the RESULTS_BACKEND configuration key in your superset_config.py. It’s possible to use
Memcached, Redis, S3 (https://pypi.python.org/pypi/s3werkzeugcache), memory or the file system (in a
single-server-type setup or for testing), or to write your own caching interface. Your superset_config.py may look
something like:
# On S3
from s3cache.s3cache import S3Cache
S3_CACHE_BUCKET = 'foobar-superset'
S3_CACHE_KEY_PREFIX = 'sql_lab_result'
RESULTS_BACKEND = S3Cache(S3_CACHE_BUCKET, S3_CACHE_KEY_PREFIX)

# On Redis
from werkzeug.contrib.cache import RedisCache
RESULTS_BACKEND = RedisCache(
host='localhost', port=6379, key_prefix='superset_results')

For performance gains, MessagePack and PyArrow are now used for results serialization. This can be disabled
by setting RESULTS_BACKEND_USE_MSGPACK = False in your configuration, should any issues arise.
Please clear your existing results cache store when upgrading an existing environment.


Important notes
• It is important that all the worker nodes and web servers in the Superset cluster share a common metadata
database. This means that SQLite will not work in this context since it has limited support for concurrency and
typically lives on the local file system.
• There should only be one instance of celery beat running in your entire setup. If not, background
jobs can get scheduled multiple times resulting in weird behaviors like duplicate delivery of reports,
higher than expected load / traffic etc.
• SQL Lab will only run your queries asynchronously if you enable “Asynchronous Query Execution” in your
database settings.

Email Reports

Email reports allow users to schedule email reports for:
• chart and dashboard visualizations (attachment or inline)
• chart data (CSV attachment or inline table)
Setup
Make sure you enable email reports in your configuration file

ENABLE_SCHEDULED_EMAIL_REPORTS = True

Now you will find two new items in the navigation bar that allow you to schedule email reports
• Manage -> Dashboard Emails
• Manage -> Chart Email Schedules
Schedules are defined in crontab format and each schedule can have a list of recipients (all of them can receive a
single mail, or separate mails). For audit purposes, all outgoing mails can have a mandatory bcc.
In order for schedules to get picked up, you need to configure a celery worker and a celery beat (see section above “Celery Tasks”).
Your celery configuration also needs an entry email_reports.schedule_hourly for
CELERYBEAT_SCHEDULE.
To send emails you need to configure SMTP settings in your configuration file. e.g.

EMAIL_NOTIFICATIONS = True

SMTP_HOST = "email-smtp.eu-west-1.amazonaws.com"
SMTP_STARTTLS = True
SMTP_SSL = False
SMTP_USER = "smtp_username"
SMTP_PORT = 25
SMTP_PASSWORD = os.environ.get("SMTP_PASSWORD")
SMTP_MAIL_FROM = "insights@komoot.com"

To render dashboards you need to install a local browser on your Superset instance:
• geckodriver and Firefox are preferred
• chromedriver is a good option too
You need to adjust the EMAIL_REPORTS_WEBDRIVER accordingly in your configuration.
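
A minimal superset_config.py sketch for the webdriver settings (values are illustrative; pick the driver matching the browser you installed):

EMAIL_REPORTS_WEBDRIVER = 'firefox'           # or 'chrome' if you installed chromedriver
WEBDRIVER_BASEURL = 'http://localhost:8088/'  # URL the worker uses to reach Superset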
You also need to specify on behalf of which username to render the dashboards. In general, dashboards and charts
are not accessible to unauthorized requests, which is why the worker needs to take over the credentials of an existing
user to take a snapshot.


EMAIL_REPORTS_USER = 'username_with_permission_to_access_dashboards'

Important notes
• Be mindful of the concurrency setting for celery (using -c 4). Selenium/webdriver instances can consume a
lot of CPU / memory on your servers.
• In some cases, if you notice a lot of leaked geckodriver processes, try running your celery processes
with
celery worker --pool=prefork --max-tasks-per-child=128 ...

• It is recommended to run separate workers for sql_lab and email_reports tasks. This can be done by using the
queue field in CELERY_ANNOTATIONS.
• Adjust WEBDRIVER_BASEURL in your config if celery workers can’t access superset via its default value
http://0.0.0.0:8080/ (notice the port number 8080, many other setups use port 8088).

SQL Lab

SQL Lab is a powerful SQL IDE that works with all SQLAlchemy compatible databases. By default, queries are
executed in the scope of a web request, so they may eventually time out as queries exceed the maximum duration of
a web request in your environment, whether that be a reverse proxy or the Superset server itself. In such cases, it is
preferred to use Celery to run the queries in the background. Please follow the examples/notes mentioned above to
get your Celery setup working.
Also note that SQL Lab supports Jinja templating in queries and that it’s possible to overload the default Jinja
context in your environment by defining the JINJA_CONTEXT_ADDONS in your superset configuration.
Objects referenced in this dictionary are made available for users to use in their SQL.
JINJA_CONTEXT_ADDONS = {
'my_crazy_macro': lambda x: x*2,
}

SQL Lab also includes a live query validation feature with pluggable backends. You can configure which validation
implementation is used with which database engine by adding a block like the following to your config.py:
FEATURE_FLAGS = {
'SQL_VALIDATORS_BY_ENGINE': {
'presto': 'PrestoDBSQLValidator',
}
}

The available validators and names can be found in sql_validators/.


Scheduling queries
You can optionally allow your users to schedule queries directly in SQL Lab. This is done by adding extra
metadata to saved queries, which are then picked up by an external scheduler (like Apache Airflow,
https://airflow.apache.org/).
To allow scheduled queries, add the following to your config.py:
FEATURE_FLAGS = {
    # Configuration for scheduling queries from SQL Lab. This information is
    # collected when the user clicks "Schedule query", and saved into the `extra`
    # field of saved queries.
    # See: https://github.com/mozilla-services/react-jsonschema-form
    'SCHEDULED_QUERIES': {
        'JSONSCHEMA': {
            'title': 'Schedule',
            'description': (
                'In order to schedule a query, you need to specify when it '
                'should start running, when it should stop running, and how '
                'often it should run. You can also optionally specify '
                'dependencies that should be met before the query is '
                'executed. Please read the documentation for best practices '
                'and more information on how to specify dependencies.'
            ),
            'type': 'object',
            'properties': {
                'output_table': {
                    'type': 'string',
                    'title': 'Output table name',
                },
                'start_date': {
                    'type': 'string',
                    'title': 'Start date',
                    # date-time is parsed using the chrono library, see
                    # https://www.npmjs.com/package/chrono-node#usage
                    'format': 'date-time',
                    'default': 'tomorrow at 9am',
                },
                'end_date': {
                    'type': 'string',
                    'title': 'End date',
                    # date-time is parsed using the chrono library, see
                    # https://www.npmjs.com/package/chrono-node#usage
                    'format': 'date-time',
                    'default': '9am in 30 days',
                },
                'schedule_interval': {
                    'type': 'string',
                    'title': 'Schedule interval',
                },
                'dependencies': {
                    'type': 'array',
                    'title': 'Dependencies',
                    'items': {
                        'type': 'string',
                    },
                },
            },
        },
        'UISCHEMA': {
            'schedule_interval': {
                'ui:placeholder': '@daily, @weekly, etc.',
            },
            'dependencies': {
                'ui:help': (
                    'Check the documentation for the correct format when '
                    'defining dependencies.'
                ),
            },
        },
        'VALIDATION': [
            # ensure that start_date <= end_date
            {
                'name': 'less_equal',
                'arguments': ['start_date', 'end_date'],
                'message': 'End date cannot be before start date',
                # this is where the error message is shown
                'container': 'end_date',
            },
        ],
        # link to the scheduler; this example links to an Airflow pipeline
        # that uses the query id and the output table as its name
        'linkback': (
            'https://airflow.example.com/admin/airflow/tree?'
            'dag_id=query_${id}_${extra_json.schedule_info.output_table}'
        ),
    },
}

This feature flag is based on react-jsonschema-form (https://github.com/mozilla-services/react-jsonschema-form),
and will add a button called “Schedule Query” to SQL Lab. When the button is clicked, a modal will show up
where the user can add the metadata required for scheduling the query.
This information can then be retrieved from the endpoint /savedqueryviewapi/api/read and used to schedule the
queries that have scheduled_queries in their JSON metadata. For schedulers other than Airflow, additional fields can
be easily added to the configuration file above.
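
A rough sketch of an external scheduler polling that endpoint (the response field names such as result and extra_json are assumptions; verify them against your Superset version, and add whatever authentication your deployment requires):

import json

import requests

SUPERSET_URL = 'http://localhost:8088'  # adjust to your deployment

session = requests.Session()
# ...authenticate the session against your Superset instance here...

resp = session.get(SUPERSET_URL + '/savedqueryviewapi/api/read')
resp.raise_for_status()

for saved_query in resp.json().get('result', []):
    extra = json.loads(saved_query.get('extra_json') or '{}')
    schedule_info = extra.get('schedule_info')
    if schedule_info:
        # e.g. create or update an Airflow DAG named after the query id and output table
        print(saved_query.get('id'), schedule_info)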

Celery Flower

Flower is a web-based tool for monitoring the Celery cluster, which you can install from pip:

pip install flower

and run via:

celery flower --app=superset.tasks.celery_app:app

Building from source

More advanced users may want to build Superset from sources. That would be the case if you fork the project
to add features specific to your environment. See CONTRIBUTING.md#setup-local-environment-for-development.

Blueprints

Blueprints are Flask’s reusable apps. Superset allows you to specify an array of Blueprints in your
superset_config module. Here’s an example of how this can work with a simple Blueprint. By doing so,
you can expect Superset to serve a page that says “OK” at the /simple_page url. This can allow you to run other
things such as custom data visualization applications alongside Superset, on the same server.
from flask import Blueprint

simple_page = Blueprint('simple_page', __name__,
                        template_folder='templates')

@simple_page.route('/', defaults={'page': 'index'})
@simple_page.route('/<page>')
def show(page):
    return "Ok"

BLUEPRINTS = [simple_page]

StatsD logging

Superset is instrumented to log events to StatsD if desired. Most endpoints hit are logged as well as key events like
query start and end in SQL Lab.
To setup StatsD logging, it’s a matter of configuring the logger in your superset_config.py.

from superset.stats_logger import StatsdStatsLogger


STATS_LOGGER = StatsdStatsLogger(host='localhost', port=8125, prefix='superset')

Note that it’s also possible to implement your own logger by deriving superset.stats_logger.BaseStatsLogger.
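
A minimal sketch of such a logger, assuming BaseStatsLogger exposes incr, decr and timing hooks like the bundled StatsdStatsLogger does (check superset.stats_logger in your version for the exact method set and constructor signature):

from superset.stats_logger import BaseStatsLogger

class PrintStatsLogger(BaseStatsLogger):
    """Dev-only logger that prints metrics to stdout instead of emitting StatsD."""

    def incr(self, key):
        print('[stats] incr', key)

    def decr(self, key):
        print('[stats] decr', key)

    def timing(self, key, value):
        print('[stats] timing', key, value)

STATS_LOGGER = PrintStatsLogger()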

Install Superset with helm in Kubernetes

You can install Superset into Kubernetes with Helm (https://helm.sh/). The chart is located in install/helm.
To install Superset into your Kubernetes cluster:

helm upgrade --install superset ./install/helm/superset

Note that the above command will install Superset into the default namespace of your Kubernetes cluster.

Custom OAuth2 configuration

Beyond the FAB-supported providers (GitHub, Twitter, LinkedIn, Google, Azure), it’s easy to connect Superset with other
OAuth2 Authorization Server implementations that support “code” authorization.
The first step: Configure authorization in Superset superset_config.py.
AUTH_TYPE = AUTH_OAUTH
OAUTH_PROVIDERS = [
    {
        'name': 'egaSSO',
        'token_key': 'access_token',  # Name of the token in the response of access_token_url
        'icon': 'fa-address-card',    # Icon for the provider
        'remote_app': {
            'consumer_key': 'myClientId',   # Client Id (Identify Superset application)
            'consumer_secret': 'MySecret',  # Secret for this Client Id (Identify Superset application)
            'request_token_params': {
                'scope': 'read'             # Scope for the Authorization
            },
            'access_token_method': 'POST',  # HTTP Method to call access_token_url
            'access_token_params': {        # Additional parameters for calls to access_token_url
                'client_id': 'myClientId'
            },
            'access_token_headers': {       # Additional headers for calls to access_token_url
                'Authorization': 'Basic Base64EncodedClientIdAndSecret'
            },
            'base_url': 'https://myAuthorizationServer/oauth2AuthorizationServer/',
            'access_token_url': 'https://myAuthorizationServer/oauth2AuthorizationServer/token',
            'authorize_url': 'https://myAuthorizationServer/oauth2AuthorizationServer/authorize'
        }
    }
]

# Will allow user self registration, allowing to create Flask users from Authorized User
AUTH_USER_REGISTRATION = True

# The default user self registration role
AUTH_USER_REGISTRATION_ROLE = "Public"

Second step: Create a CustomSsoSecurityManager that extends SupersetSecurityManager and overrides
oauth_user_info:

import logging

from superset.security import SupersetSecurityManager

class CustomSsoSecurityManager(SupersetSecurityManager):

    def oauth_user_info(self, provider, response=None):
        logging.debug("Oauth2 provider: {0}.".format(provider))
        if provider == 'egaSSO':
            # As an example, this line requests a GET to base_url + 'userDetails' with Bearer Authentication,
            # and expects the authorization server to check the token and respond with the user details
            me = self.appbuilder.sm.oauth_remotes[provider].get('userDetails').data
            logging.debug("user_data: {0}".format(me))
            return {'name': me['name'], 'email': me['email'], 'id': me['user_name'],
                    'username': me['user_name'], 'first_name': '', 'last_name': ''}

        ...

This file must be located in the same directory as superset_config.py with the name
custom_sso_security_manager.py.
Then we can add these two lines to superset_config.py:

from custom_sso_security_manager import CustomSsoSecurityManager


CUSTOM_SECURITY_MANAGER = CustomSsoSecurityManager

Feature Flags

Because of a wide variety of users, Superset has some features that are not enabled by default. For example, some
users have stronger security restrictions, while some others may not. So Superset allows users to enable or disable
some features by config. For feature owners, this means you can add optional functionality to Superset that will
only affect a subset of users.
You can enable or disable features with flags from superset_config.py:


DEFAULT_FEATURE_FLAGS = {
'CLIENT_CACHE': False,
'ENABLE_EXPLORE_JSON_CSRF_PROTECTION': False,
'PRESTO_EXPAND_DATA': False,
}

Here is a list of flags and descriptions:


• ENABLE_EXPLORE_JSON_CSRF_PROTECTION
– For some security concerns, you may need to enforce CSRF protection on all query requests to the
explore_json endpoint. In Superset, we use flask-csrf to add CSRF protection for all POST requests, but
this protection doesn’t apply to the GET method.
– When ENABLE_EXPLORE_JSON_CSRF_PROTECTION is set to true, your users cannot make GET
requests to explore_json. The default value for this feature is False (the current behavior): explore_json accepts
both GET and POST requests. See PR 7935 for more details.
• PRESTO_EXPAND_DATA
– When this feature is enabled, nested types in Presto will be expanded into extra columns and/or arrays.
This is experimental, and doesn’t work with all nested types.

SIP-15

SIP-15 aims to ensure that time intervals are handled in a consistent and transparent manner for both the Druid and
SQLAlchemy connectors.
Prior to SIP-15, SQLAlchemy used inclusive endpoints; however, these may behave like exclusive endpoints for string columns
(due to lexicographical ordering) if no formatting was defined and the column formatting did not conform to an ISO
8601 date-time (refer to the SIP for details).
To remedy this, rather than having to define the date/time format for every non-ISO 8601 date-time column, one can
define a default column mapping on a per-database level via the extra parameter:
{
"python_date_format_by_column_name": {
"ds": "%Y-%m-%d"
}
}

New deployments
All new Superset deployments should enable SIP-15 via,

SIP_15_ENABLED = True

Existing deployments
Given that it is not apparent whether the chart creator was aware of the time range inconsistencies (and adjusted the
endpoints accordingly), changing the behavior of all charts is overly aggressive. Instead SIP-15 provides a soft
transition allowing producers (chart owners) to see the impact of the proposed change and adjust their charts
accordingly.
Prior to enabling SIP-15, existing deployments should communicate to their users the impact of the change and define
a grace period end date (exclusive of course) after which all charts will conform to the [start, end) interval, i.e.,
from datetime import date

SIP_15_ENABLED = True
SIP_15_GRACE_PERIOD_END = date(<YYYY>, <MM>, <DD>)

To aid with transparency the current endpoint behavior is explicitly called out in the chart time range (post SIP-15 this
will be [start, end) for all connectors and databases). One can override the defaults on a per database level via the
extra parameter
{
"time_range_endpoints": ["inclusive", "inclusive"]
}

Note in a future release the interim SIP-15 logic will be removed (including the time_grain_endpoints
form-data field) via a code change and Alembic migration.

3.4.2 Tutorials

Creating your first dashboard

This tutorial targets someone who wants to create charts and dashboards in Superset. We’ll show you how to
connect Superset to a new database and configure a table in that database for analysis. You’ll also explore the
data you’ve exposed and add a visualization to a dashboard so that you get a feel for the end-to-end user
experience.

Connecting to a new database

We assume you already have a database configured and can connect to it from the instance on which you’re running
Superset. If you’re just testing Superset and want to explore sample data, you can load some sample PostgreSQL
datasets into a fresh DB, or configure the example weather data we use here.
Under the Sources menu, select the Databases option:

On the resulting page, click on the green plus sign, near the top right:


You can configure a number of advanced options on this page, but for this walkthrough, you’ll only need to do two
things:
1. Name your database connection:

2. Provide the SQLAlchemy Connection URI and test the connection:

This example shows the connection string for our test weather database. As noted in the text below the URI, you
should refer to the SQLAlchemy documentation on creating new connection URIs for your target database.
Click the Test Connection button to confirm things work end to end. Once Superset can successfully connect and
authenticate, you should see a popup like this:

Moreover, you should also see the list of tables Superset can read from the schema you’re connected to, at the bottom
of the page:

If the connection looks good, save the configuration by clicking the Save button at the bottom of the page:

Adding a new table

Now that you’ve configured a database, you’ll need to add specific tables to Superset that you’d like to query.

Under the Sources menu, select the Tables option:

On the resulting page, click on the green plus sign, near the top left:

You only need a few pieces of information to add a new table to Superset:
• The name of the table

• The target database from the Database drop-down menu (i.e. the one you just added above)

• Optionally, the database schema. If the table exists in the “default” schema (e.g. the public schema in
PostgreSQL or Redshift), you can leave the schema field blank.
Click on the Save button to save the configuration:

When redirected back to the list of tables, you should see a message indicating that your table was created:


This message also directs you to edit the table configuration. We’ll edit a limited portion of the configuration now -
just to get you started - and leave the rest for a more advanced tutorial.
Click on the edit button next to the table you’ve created:

On the resulting page, click on the List Table Column tab. Here, you’ll define the way you can use specific columns
of your table when exploring your data. We’ll run through these options to describe their purpose:
• If you want users to group metrics by a specific field, mark it as Groupable.
• If you need to filter on a specific field, mark it as Filterable.
• Is this field something you’d like to get the distinct count of? Check the Count Distinct box.
• Is this a metric you want to sum, or get basic summary statistics for? The Sum, Min, and Max columns will
help.
• The is temporal field should be checked for any date or time fields. We’ll cover how this manifests itself in
analyses in a moment.
Here’s how we’ve configured fields for the weather data. Even for measures like the weather measurements
(precipitation, snowfall, etc.), it’s ideal to group and filter by these values:

As with the configurations above, click the Save button to save these settings.

Exploring your data

To start exploring your data, simply click on the table name you just created in the list of available tables:


By default, you’ll be presented with a Table View:

Let’s walk through a basic query to get the count of all records in our table. First, we’ll need to change the Since filter
to capture the range of our data. You can use simple phrases to apply these filters, like “3 years ago”:

The upper limit for time, the Until filter, defaults to “now”, which may or may not be what you want.
Look for the Metrics section under the GROUP BY header, and start typing “Count” - you’ll see a list of metrics
matching what you type:

Select the COUNT(*) metric, then click the green Query button near the top of the explore:


You’ll see your results in the table:

Let’s group this by the weather_description field to get the count of records by the type of weather recorded by adding
it to the Group by section:

and run the query:

Let’s find a more useful data point: the top 10 times and places that recorded the highest temperature in 2015.
We replace weather_description with latitude, longitude and measurement_date in the Group by section:

And replace COUNT(*) with max measurement_flag:

The max measurement_flag metric was created when we checked the box under Max and next to the measurement_flag field, indicating that this field was numeric and that we wanted to find its maximum value when grouped by specific fields.
In our case, measurement_flag is the value of the measurement taken, which clearly depends on the type of measurement (the researchers recorded different values for precipitation and temperature). Therefore, we must filter our query only on records where the weather_description is equal to “Maximum temperature”, which we do in the Filters section at the bottom of the explore:


Finally, since we only care about the top 10 measurements, we limit our results to 10 records using the Row limit option under the Options header:

We click Query and get the following results:

In this dataset, the maximum temperature is recorded in tenths of a degree Celsius. The top value of 1370, measured
in the middle of Nevada, is equal to 137 C, or roughly 278 degrees F. It’s unlikely this value was correctly recorded.
We’ve already been able to investigate some outliers with Superset, but this just scratches the surface of what we
can do.
You may want to do a couple more things with this measure:
• The default formatting shows values like 1.37k, which may be difficult for some users to read. It’s likely you may want to see the full, comma-separated value. You can change the formatting of any measure by editing its config (Edit Table Config > List Sql Metric > Edit Metric > D3Format); see the short example after this list.
• Moreover, you may want to see the temperature measurements in plain degrees C, not tenths of a degree. Or
you may want to convert the temperature to degrees Fahrenheit. You can change the SQL that gets executed
against the database, baking the logic into the measure itself (Edit Table Config > List Sql Metric > Edit Metric
> SQL Expression)
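For reference, the D3Format field accepts standard d3-format strings. The patterns below are illustrative examples of that format, not Superset defaults:

,d     formats 1370 as 1,370   (comma-grouped integer)
.1f    formats 1370 as 1370.0  (fixed point, one decimal place)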
For now, though, let’s create a better visualization of these data and add it to a dashboard. We change the Chart Type to “Distribution - Bar Chart”:


Our filter on Maximum temperature measurements was retained, but the query and formatting options are dependent
on the chart type, so you’ll have to set the values again:

You should note the extensive formatting options for this chart: the ability to set axis labels, margins, ticks, etc. To
make the data presentable to a broad audience, you’ll want to apply many of these to slices that end up in
dashboards. For now, though, we run our query and get the following chart:


Creating a slice and dashboard

This view might be interesting to researchers, so let’s save it. In Superset, a saved query is called a Slice.
To create a slice, click the Save as button near the top-left of the explore:

A popup should appear, asking you to name the slice, and optionally add it to a dashboard. Since we haven’t yet
created any dashboards, we can create one and immediately add our slice to it. Let’s do it:


Click Save, which will direct you back to your original query. We see that our slice and dashboard were successfully
created:

Let’s check out our new dashboard. We click on the Dashboards menu:

and find the dashboard we just created:

Things seem to have worked - our slice is here!


But it’s a bit smaller than we might like. Luckily, you can adjust the size of slices in a dashboard by clicking,
holding and dragging the bottom-right corner to your desired dimensions:

After adjusting the size, you’ll be asked to click on the icon near the top-right of the dashboard to save the new
configuration.
Congrats! You’ve successfully linked, analyzed, and visualized data in Superset. There are a wealth of other table
configuration and visualization options, so please start exploring and creating slices and dashboards of your own.

Exploring data with Apache Superset

In this tutorial, we will introduce key concepts in Apache Superset through the exploration of a real dataset which
contains the flights made by employees of a UK-based organization in 2011. The following information about each
flight is given:
• The traveller’s department. For the purposes of this tutorial the departments have been renamed Orange,
Yellow and Purple.
• The cost of the ticket.


• The travel class (Economy, Premium Economy, Business and First Class).
• Whether the ticket was a single or return.
• The date of travel.
• Information about the origin and destination.
• The distance between the origin and destination, in kilometers (km).

Enabling Upload a CSV Functionality

You may need to enable the functionality to upload a CSV to your database. The following section explains how to
enable this functionality for the examples database.
In the top menu, select Sources → Databases. Find the examples database in the list and select the edit record button.

Within the Edit Database page, check the Allow Csv Upload checkbox.
Finally, save by selecting Save at the bottom of the page.

Obtaining and loading the data

Download the data for this tutorial to your computer from Github.
In the top menu, select Sources → Upload a CSV.


Then, enter the Table name as tutorial_flights and select the CSV file from your computer.


Next enter the text Travel Date into the Parse Dates field.


Leaving all the other options in their default settings, select Save at the bottom of the page.

Table Visualization

In this section, we’ll create our first visualization: a table to show the number of flights and cost per travel class.
To create a new chart, select New → Chart.


Once in the Create a new chart dialogue, select tutorial_flights from the Choose a datasource dropdown.


Next, select the visualization type as Table.


Then, select Create new chart to go into the chart view.
By default, Apache Superset only shows the last week of data: in our example, we want to look at all the data in
the dataset. No problem - within the Time section, remove the filter on Time range by selecting on Last week then
changing the selection to No filter, with a final OK to confirm your selection.


Now, we want to specify the rows in our table by using the Group by option. Since in this example, we want to
understand different Travel Classes, we select Travel Class in this menu.
Next, we can specify the metrics we would like to see in our table with the Metrics option. Count(*), which represents
the number of rows in the table (in this case corresponding to the number of flights since we have a row per flight),
is already there. To add cost, within Metrics, select Cost. Keep the default aggregation option, which is to sum the column.


Finally, select Run Query to see the results of the table.


Congratulations, you have created your first visualization in Apache Superset!


To save the visualization, click on Save in the top left of the screen. Select the Save as option, and enter the chart
name as Tutorial Table (you will be able to find it again through the Charts screen, accessible in the top menu).
Similarly, select Add to new dashboard and enter Tutorial Dashboard. Finally, select Save & go to dashboard.


Dashboard basics

Next, we are going to explore the dashboard interface. If you’ve followed the previous section, you should already
have the dashboard open. Otherwise, you can navigate to the dashboard by selecting Dashboards on the top menu,
then Tutorial dashboard from the list of dashboards.
On this dashboard you should see the table you created in the previous section. Select Edit dashboard and then
hover over the table. By selecting the bottom right hand corner of the table (the cursor will change too), you can
resize it by dragging and dropping.


Finally, save your changes by selecting Save changes in the top right.

Pivot Table

In this section, we will extend our analysis using a more complex visualization, Pivot Table. By the end of this section,
you will have created a table that shows the monthly spend on flights for the first six months, by department, by travel
class.
As before, create a new visualization by selecting New → Chart on the top menu. Choose tutorial_flights again as a
datasource, then click on the visualization type to get to the visualization menu. Select the Pivot Table visualization
(you can filter by entering text in the search box) and then Create a new chart.
In the Time section, keep the Time Column as Travel Date (this is selected automatically as we only have one time column in our dataset). Then set the Time Grain to month, as daily data would be too granular to see patterns in. Next, set the time range to the first six months of 2011 by clicking on Last week in the Time Range section, then, under Custom, selecting a Start / end of 1st January 2011 and 30th June 2011 respectively, either by entering the dates directly or by using the calendar widget (by selecting the month name and then the year, you can move more quickly to far away dates).

Next, within the Query section, remove the default COUNT(*) and add Cost, keeping the default SUM aggregate.
Note that Apache Superset will indicate the type of the metric by the symbol on the left hand column of the list (ABC
for string, # for number, a clock face for time, etc.).
In Group by select Time: this will automatically use the Time Column and Time Grain selections we defined in the
Time section.
Within Columns, select first Department and then Travel Class. All set – let’s Run Query to see some data!


You should see months in the rows and Department and Travel Class in the columns. To get this in our
dashboard, select Save, name the chart Tutorial Pivot and using Add chart to existing dashboard select Tutorial
Dashboard, and then finally Save & go to dashboard.

Line Chart

In this section, we are going to create a line chart to understand the average price of a ticket by month across the
entire dataset. As before, select New → Chart, and then tutorial_flights as the datasource and Line Chart as the
visualization type.
In the Time section, as before, keep the Time Column as Travel Date and Time Grain as month, but this time, for the Time range, select No filter as we want to look at the entire dataset.
Within Metrics, remove the default COUNT(*) and add Cost. This time, we want to change how this column is
aggregated to show the mean value: we can do this by selecting AVG in the aggregate dropdown.


Next, select Run Query to show the data on the chart.
How does this look? Well, we can see that the average cost goes up in December. However, perhaps it doesn’t make sense to combine both single and return tickets, but rather show two separate lines for each ticket type.
Let’s do this by selecting Ticket Single or Return in the Group by box, and then selecting Run Query again. Nice! We can see that on average single tickets are cheaper than returns and that the big spike in December is caused by return tickets.
Our chart is looking pretty good already, but let’s customize some more by going to the Customize tab on the left hand pane. Within this pane, try changing the Color Scheme, removing the range filter by selecting No in the Show Range Filter drop down, and adding some labels using X Axis Label and Y Axis Label.

Once you’re done, Save as Tutorial Line Chart, use Add chart to existing dashboard to add this chart to the previous
ones on the Tutorial Dashboard and then Save & go to dashboard.

Markup

In this section, we will add some text to our dashboard. If you’re not there already, navigate to the dashboard by selecting Dashboards on the top menu, then Tutorial dashboard from the list of dashboards. Go into edit mode by selecting Edit dashboard.
Within the Insert components pane, drag and drop a Markdown box on the dashboard. Look for the blue lines which
indicate the anchor where the box will go.


Now, to edit the text, select the box. You can enter text, in markdown format (see this Markdown Cheatsheet
for more information about this format). You can toggle between Edit and Preview using the menu on the top of
the box.

To exit, select any other part of the dashboard. Finally, don’t forget to keep your changes using Save changes.

Filter box

In this section, you will learn how to add a filter to your dashboard. Specifically, we will create a filter that allows
us to look at those flights that depart from a particular country.
A filter box visualization can be created as any other visualization by selecting New → Chart, and then tutorial_flights
as the datasource and Filter Box as the visualization type.
First of all, in the Time section, remove the filter from the Time range selection by selecting No filter.


Next, in Filters Configurations first add a new filter by selecting the plus sign and then edit the newly created filter by
selecting the pencil icon.
For our use case, it makes most sense to present a list of countries in alphabetical order. First, enter the column as
Origin Country and keep all other options the same and then select Run Query. This gives us a preview of our filter.
Next, remove the date filter by unchecking the Date Filter checkbox.

Finally, select Save, name the chart as Tutorial Filter, add the chart to our existing Tutorial Dashboard and then Save
& go to dashboard. Once on the Dashboard, try using the filter to show only those flights that departed from the
United Kingdom – you will see the filter is applied to all of the other visualizations on the dashboard.

Publishing your dashboard

If you have followed all of the steps outlined in the previous section, you should have a dashboard that looks like the
below. If you would like, you can rearrange the elements of the dashboard by selecting Edit dashboard and dragging
and dropping.
If you would like to make your dashboard available to other users, simply select Draft next to the title of your
dashboard on the top left to change your dashboard to be in Published state. You can also favorite this dashboard by
selecting the star.


Taking your dashboard further

In the following sections, we will look at more advanced Apache Superset topics.

Annotations

Annotations allow you to add additional context to your chart. In this section, we will add an annotation to the
Tutorial Line Chart we made in a previous section. Specifically, we will add the dates when some flights were
cancelled by the UK’s Civil Aviation Authority in response to the eruption of the Grímsvötn volcano in Iceland
(23-25 May 2011).
First, add an annotation layer by navigating to Manage → Annotation Layers. Add a new annotation layer by
selecting the green plus sign to add a new record. Enter the name Volcanic Eruptions and save. We can use this
layer to refer to a number of different annotations.
Next, add an annotation by navigating to Manage → Annotations and then create a new annotation by selecting the
green plus sign. Then, select the Volcanic Eruptions layer, add a short description Grímsvötn and the eruption dates
(23-25 May 2011) before finally saving.


Then, navigate to the line chart by going to Charts then selecting Tutorial Line Chart from the list. Next, go to the
Annotations and Layers section and select Add Annotation Layer. Within this dialogue:
• name the layer as Volcanic Eruptions
• change the Annotation Layer Type to Event
• set the Annotation Source as Superset annotation
• specify the Annotation Layer as Volcanic Eruptions


Select Apply to see your annotation shown on the chart.

If you wish, you can change how your annotation looks by changing the settings in the Display configuration
section. Otherwise, select OK and finally Save to save your chart. If you keep the default selection to overwrite
the chart, your annotation will be saved to the chart and also appear automatically in the Tutorial Dashboard.


Advanced Analytics

In this section, we are going to explore the Advanced Analytics feature of Apache Superset that allows you to apply
additional transformations to your data. The three types of transformation are:
Moving Average: Select a rolling window [1], and then apply a calculation on it (mean, sum or standard deviation). The fourth option, cumsum, calculates the cumulative sum of the series [2].
Time Comparison: Shift your data in time and, optionally, apply a calculation to compare the shifted data with your actual data (e.g. calculate the absolute difference between the two).
Python Functions: Resample your data using one of a variety of methods [3].

Setting up the base chart

In this section, we’re going to set up a base chart which we can then apply the different Advanced Analytics features
to. Start off by creating a new chart using the same tutorial_flights datasource and the Line Chart visualization type.
Within the Time section, set the Time Range as 1st October 2011 and 31st October 2011.
Next, in the Query section, change the Metrics to the sum of Cost. Select Run Query to show the chart. You should see the total cost per day for October 2011.

Finally, save the visualization as Tutorial Advanced Analytics Base, adding it to the Tutorial Dashboard.

Rolling mean

There is quite a lot of variation in the data, which makes it difficult to identify any trend. One approach we can take is to show instead a rolling average of the time series. To do this, in the Moving Average subsection of Advanced Analytics, select mean in the Rolling box and enter 7 into both Periods and Min Periods. The period is the length of the rolling period expressed as a multiple of the Time Grain. In our example, the Time Grain is day, so the rolling period is 7 days, such that on the 7th October 2011 the value shown would correspond to the first seven days of October 2011. Lastly, by specifying Min Periods as 7, we ensure that our mean is always calculated on 7 days and we avoid any ramp up period.
After displaying the chart by selecting Run Query, you will see that the data is less variable and that the series starts later, as the ramp up period is excluded.

[1] See the Pandas rolling method documentation for more information.
[2] See the Pandas cumsum method documentation for more information.
[3] See the Pandas resample method documentation for more information.

Save the chart as Tutorial Rolling Mean and add it to the Tutorial Dashboard.
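For reference, the Pandas rolling method cited in the footnotes is what performs this kind of computation. A minimal, self-contained sketch of the equivalent calculation on made-up daily data (the values and dates are illustrative only):

import pandas as pd

# 30 days of made-up daily costs for October 2011
s = pd.Series(range(1, 31),
              index=pd.date_range("2011-10-01", periods=30, freq="D"))

# 7-day rolling mean; min_periods=7 leaves the first 6 days as NaN,
# which is why the plotted series starts later (no ramp up period).
rolling_mean = s.rolling(window=7, min_periods=7).mean()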

Time Comparison

In this section, we will compare values in our time series to the value a week before. Start off by opening the
Tutorial Advanced Analytics Base chart, by going to Charts in the top menu and then selecting the visualization
name in the list (alternatively, find the chart in the Tutorial Dashboard and select Explore chart from the menu
for that visualization).
Next, in the Time Comparison subsection of Advanced Analytics, enter the Time Shift by typing in “minus 1 week”
(note this box accepts input in natural language). Run Query to see the new chart, which has an additional series with
the same values, shifted a week back in time.


Then, change the Calculation type to Absolute difference and select Run Query. We can now see only one series
again, this time showing the difference between the two series we saw previously.
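Conceptually, a one-week Time Shift followed by the Absolute difference calculation corresponds to something like the following Pandas sketch. This is an illustration of the idea on made-up daily data, not Superset's actual implementation:

import pandas as pd

s = pd.Series(range(1, 31),
              index=pd.date_range("2011-10-01", periods=30, freq="D"))

# Shift the daily series back by 7 periods (one week) and take the
# absolute difference against the original values.
abs_diff = (s - s.shift(7)).abs()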

Save the chart as Tutorial Time Comparison and add it to the Tutorial Dashboard.

Resampling the data

In this section, we’ll resample the data so that rather than having daily data we have weekly data. As in the previous
section, reopen the Tutorial Advanced Analytics Base chart.
Next, in the Python Functions subsection of Advanced Analytics, enter 7D, corresponding to seven days, in the Rule
and median as the Method and show the chart by selecting Run Query.


Note that now we have a single data point every 7 days. In our case, the value shown corresponds to the median value within the seven daily data points. For more information on the meaning of the various options in this section, refer to the Pandas documentation.
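Again for reference, the 7D rule with the median method maps onto the Pandas resample call mentioned above; a minimal sketch on made-up data:

import pandas as pd

s = pd.Series(range(1, 31),
              index=pd.date_range("2011-10-01", periods=30, freq="D"))

# One data point per 7-day bucket, taking the median of the daily values.
weekly_median = s.resample("7D").median()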
Lastly, save your chart as Tutorial Resample and add it to the Tutorial Dashboard. Go to the tutorial dashboard to
see the four charts side by side and compare the different outputs.

3.4.3 Security

Security in Superset is handled by Flask AppBuilder (FAB). FAB is a “Simple and rapid application development framework, built on top of Flask”. FAB provides authentication, user management, permissions and roles. Please read its Security documentation.

Provided Roles

Superset ships with a set of roles that are handled by Superset itself. You can assume that these roles will stay up-to-date as Superset evolves. Even though it’s possible for Admin users to do so, it is not recommended that you alter these roles in any way by removing or adding permissions to them, as these roles will be re-synchronized to their original values as you run your next superset init command.
Since it’s not recommended to alter the roles described here, it’s right to assume that your security strategy
should be to compose user access based on these base roles and roles that you create. For instance you
could create a role Financial Analyst that would be made of a set of permissions to a set of data sources
(tables) and/or databases. Users would then be granted Gamma, Financial Analyst, and perhaps sql_lab.

Admin

Admins have all possible rights, including granting or revoking rights from other users and altering other people’s
slices and dashboards.


Alpha

Alpha users have access to all data sources, but they cannot grant or revoke access from other users. They are also
limited to altering the objects that they own. Alpha users can add and alter data sources.

Gamma

Gamma users have limited access. They can only consume data coming from data sources they have been given
access to through another complementary role. They only have access to view the slices and dashboards made
from data sources that they have access to. Currently Gamma users are not able to alter or add data sources. We
assume that they are mostly content consumers, though they can create slices and dashboards.
Also note that when Gamma users look at the dashboards and slices list view, they will only see the objects that they
have access to.

sql_lab

The sql_lab role grants access to SQL Lab. Note that while Admin users have access to all databases by
default, both Alpha and Gamma users need to be given access on a per database basis.

Public

It’s possible to allow logged out users to access some Superset features.
By setting PUBLIC_ROLE_LIKE_GAMMA = True in your superset_config.py, you grant public role the
same set of permissions as for the GAMMA role. This is useful if one wants to enable anonymous users to
view dashboards. Explicit grant on specific datasets is still required, meaning that you need to edit the Public
role and add the Public data sources to the role manually.
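For example, the setting described above goes in your superset_config.py (the flag name is exactly as stated in this section; the file location depends on how your deployment is configured):

# superset_config.py
# Grant the Public role the same permissions as the Gamma role, so that
# anonymous (logged out) users can view dashboards whose data sources
# have been explicitly added to the Public role.
PUBLIC_ROLE_LIKE_GAMMA = True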

Managing Gamma per data source access

Here’s how to provide users access to only specific datasets. First make sure the users with limited access
have [only] the Gamma role assigned to them. Second, create a new role (Menu -> Security -> List
Roles) and click the + sign.

This new window allows you to give this new role a name, attribute it to users and select the tables in the Permissions dropdown. To select the data sources you want to associate with this role, simply click on the dropdown and use the typeahead to search for your table names.


You can then confirm with your Gamma users that they see the objects (dashboards and slices) associated with the
tables related to their roles.

Customizing

The permissions exposed by FAB are very granular and allow for a great level of customization. FAB creates many permissions automagically for each model that is created (can_add, can_delete, can_show, can_edit, ...) as well as for each view. On top of that, Superset can expose more granular permissions like all_datasource_access.
We do not recommend altering the 3 base roles as there are a set of assumptions that Superset is built upon. It is
possible though for you to create your own roles, and union them to existing ones.

Permissions

Roles are composed of a set of permissions, and Superset has many categories of permissions. Here are the
different categories of permissions:
• Model & action: models are entities like Dashboard, Slice, or User. Each model has a fixed set of
permissions, like can_edit, can_show, can_delete, can_list, can_add, and so on. By adding
can_delete on Dashboard to a role, and granting that role to a user, this user will be able to delete
dashboards.
• Views: views are individual web pages, like the explore view or the SQL Lab view. When granted to a
user, he/she will see that view in its menu items, and be able to load that page.
• Data source: For each data source, a permission is created. If the user does not have the all_datasource_access permission granted, the user will only be able to see Slices or explore the data sources that are granted to them.
• Database: Granting access to a database allows the user to access all data sources within that database, and will enable the user to query that database in SQL Lab, provided that the SQL Lab specific permissions have been granted to the user.

Restricting access to a subset of data sources

The best way to go is probably to give users the Gamma role plus one or many other roles that would add access to specific data sources. We recommend that you create individual roles for each access profile. Say people in your finance department might have access to a set of databases and data sources, and these permissions can be consolidated in a single role. Users with this profile then need to be attributed Gamma as a foundation to the models and views they can access, and that Finance role that is a collection of permissions to data objects.
One user can have many roles, so a finance executive could be granted Gamma, Finance, and perhaps another Executive role that gathers a set of data sources that power dashboards only made available to executives. When looking at their dashboard list, this user will only see the list of dashboards they have access to, based on the roles and permissions that were attributed.

3.4.4 SQL Lab

SQL Lab is a modern, feature-rich SQL IDE written in React.


Feature Overview

• Connects to just about any database backend


• A multi-tab environment to work on multiple queries at a time
• A smooth flow to visualize your query results using Superset’s rich visualization capabilities
• Browse database metadata: tables, columns, indexes, partitions
• Support for long-running queries
– uses the Celery distributed queue to dispatch query handling to workers
– supports defining a “results backend” to persist query results
• A search engine to find queries executed in the past
• Supports templating using the Jinja templating language which allows for using macros in your SQL code

Extra features

• Hit alt + enter as a keyboard shortcut to run your query

Templating with Jinja

SELECT *
FROM some_table
WHERE partition_key = '{{ presto.first_latest_partition('some_table') }}'

Templating unleashes the power and capabilities of a programming language within your SQL code.
Templates can also be used to write generic queries that are parameterized so they can be re-used easily.


Available macros

We expose certain modules from Python’s standard library in Superset’s Jinja context:
• time: time
• datetime: datetime.datetime
• uuid: uuid
• random: random
• relativedelta: dateutil.relativedelta.relativedelta
Jinja’s builtin filters can also be applied where needed.
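For instance, assuming the datetime and relativedelta names exposed above, a templated query in SQL Lab could compute a relative date inline. The table and column names here are placeholders:

SELECT *
FROM some_table
WHERE created_at >= '{{ (datetime.now() - relativedelta(days=7)).isoformat() }}'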

Extending macros

As mentioned in the Installation & Configuration documentation, it’s possible for administrators to expose more macros in their environment using the configuration variable JINJA_CONTEXT_ADDONS. All objects referenced in this dictionary will become available for users to integrate in their queries in SQL Lab.
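A minimal sketch of what such a configuration might look like in superset_config.py; the helper name my_date_macro and its return value are purely hypothetical:

# superset_config.py
def my_date_macro():
    # Hypothetical helper that templates can call as {{ my_date_macro() }}
    return "'2011-01-01'"

JINJA_CONTEXT_ADDONS = {
    'my_date_macro': my_date_macro,
}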

Query cost estimation

Some databases support EXPLAIN queries that allow users to estimate the cost of queries before executing them. Currently, Presto is supported in SQL Lab. To enable query cost estimation, add the following keys to the “Extra” field in the database configuration:
{
    "version": "0.319",
    "cost_estimate_enabled": true
    ...
}

Here, “version” should be the version of your Presto cluster. Support for this functionality was introduced in Presto
0.319.
You also need to enable the feature flag in your superset_config.py, and you can optionally specify a custom formatter.
Eg:
from typing import Dict, List

def presto_query_cost_formatter(cost_estimate: List[Dict[str, float]]) -> List[Dict[str, str]]:
    """
    Format cost estimate returned by Presto.

    :param cost_estimate: JSON estimate from Presto
    :return: Human readable cost estimate
    """
    # Convert cost to dollars based on CPU and network cost. These coefficients
    # are just examples, they need to be estimated based on your infrastructure.
    cpu_coefficient = 2e-12
    network_coefficient = 1e-12

    cost = 0
    for row in cost_estimate:
        cost += row.get("cpuCost", 0) * cpu_coefficient
        cost += row.get("networkCost", 0) * network_coefficient

    return [{"Cost": f"US$ {cost:.2f}"}]


DEFAULT_FEATURE_FLAGS = {
    "ESTIMATE_QUERY_COST": True,
    "QUERY_COST_FORMATTERS_BY_ENGINE": {"presto": presto_query_cost_formatter},
}

Create Table As (CTAS)

You can use CREATE TABLE AS SELECT ... statements in SQL Lab. This feature can be toggled on and off at the database configuration level.
Note that CREATE TABLE .. belongs to the SQL DDL category. On PostgreSQL specifically, DDL is transactional, which means that to properly use this feature you have to set autocommit to true on your engine parameters:
{
    ...
    "engine_params": {"isolation_level": "AUTOCOMMIT"},
    ...
}

3.4.5 Visualizations Gallery

This section is a gallery of screenshots showcasing a range of Superset visualizations, including big-number charts, time-series charts, map visualizations, the Birth Names dashboard and a world population chart based on sum_SP_POP_TOTL. The images are not reproduced in this text version.

3.4.6 Druid

Superset has a native connector to Druid and a majority of Druid’s features are accessible through Superset.

Note: Druid now supports SQL and can be accessed through Superset’s SQLAlchemy connector. The long-term
vision is to deprecate the Druid native REST connector and query Druid exclusively through the SQL interface.

Aggregations

Common aggregations or Druid metrics can be defined and used in Superset. The first and simpler use case is to use the checkbox matrix exposed in your datasource’s edit view (Sources -> Druid Datasources -> [your datasource] -> Edit -> [tab] List Druid Column). Clicking the GroupBy and Filterable checkboxes will make the column appear in the related dropdowns while in explore view. Checking Count Distinct, Min, Max or Sum will result in creating new metrics that will appear in the List Druid Metric tab upon saving the datasource. By editing these metrics, you’ll notice that their json element corresponds to a Druid aggregation definition. You can create your own aggregations manually from the List Druid Metric tab following the Druid documentation.


Post-Aggregations

Druid supports post aggregation and this works in Superset. All you have to do is create a metric, much like
you would create an aggregation manually, but specify postagg as a Metric Type. You then have to provide
a valid json post-aggregation definition (as specified in the Druid docs) in the Json field.
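For illustration, a valid Druid post-aggregation definition for the Json field might look like the following arithmetic post-aggregator; the metric and field names below are hypothetical, and the Druid docs describe the full specification:

{
  "type": "arithmetic",
  "name": "avg_cost_per_row",
  "fn": "/",
  "fields": [
    {"type": "fieldAccess", "fieldName": "total_cost"},
    {"type": "fieldAccess", "fieldName": "count"}
  ]
}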

Unsupported Features

Note: Unclear at this point, this section of the documentation could use some input.

3.4.7 Misc

Visualization Tools

The data is visualized via slices. These slices are visual components built with D3.js. Some components accept optional or required inputs.

Country Map Tools

This tool is used in slices to visualize numeric or string values by region, province or department of a country. To use it, you need the ISO 3166-2 code of each region, province or department.


ISO 3166-2 is part of the ISO 3166 standard published by the International Organization for Standardization (ISO), and defines codes for identifying the principal subdivisions (e.g., provinces or states) of all countries coded in ISO 3166-1.
The purpose of ISO 3166-2 is to establish an international standard of short and unique alphanumeric codes to
represent the relevant administrative divisions and dependent territories of all countries in a more convenient and less
ambiguous form than their full names. Each complete ISO 3166-2 code consists of two parts, separated by a hyphen:
[1]
The first part is the ISO 3166-1 alpha-2 code of the country; The second part is a string of up to three alphanumeric
characters, which is usually obtained from national sources and stems from coding systems already in use in the
country concerned, but may also be developed by the ISO itself.

List of Countries

• Belgium

ISO Name of region


BE-BRU Bruxelles
BE-VAN Antwerpen
BE-VLI Limburg
BE-VOV Oost-Vlaanderen
BE-VBR Vlaams Brabant
BE-VWV West-Vlaanderen
BE-WBR Brabant Wallon
BE-WHT Hainaut
BE-WLG Liège
BE-WLX Luxembourg
BE-WNA Namur

• Brazil


ISO Name of region


BR-AC Acre
BR-AL Alagoas
BR-AP Amapá
BR-AM Amazonas
BR-BA Bahia
BR-CE Ceará
BR-DF Distrito Federal
BR-ES Espírito Santo
BR-GO Goiás
BR-MA Maranhão
BR-MS Mato Grosso do Sul
BR-MT Mato Grosso
BR-MG Minas Gerais
BR-PA Pará
BR-PB Paraíba
BR-PR Paraná
BR-PE Pernambuco
BR-PI Piauí
BR-RJ Rio de Janeiro
BR-RN Rio Grande do Norte
BR-RS Rio Grande do Sul
BR-RO Rondônia
BR-RR Roraima
BR-SP São Paulo
BR-SC Santa Catarina
BR-SE Sergipe
BR-TO Tocantins

• China

ISO Name of region


CN-34 Anhui
CN-11 Beijing
CN-50 Chongqing
CN-35 Fujian
CN-62 Gansu
CN-44 Guangdong
CN-45 Guangxi
CN-52 Guizhou
CN-46 Hainan
CN-13 Hebei
CN-23 Heilongjiang
CN-41 Henan
CN-42 Hubei
CN-43 Hunan
CN-32 Jiangsu
CN-36 Jiangxi
CN-22 Jilin
CN-21 Liaoning
CN-15 Nei Mongol
CN-64 Ningxia Hui
CN-63 Qinghai
CN-61 Shaanxi
CN-37 Shandong
CN-31 Shanghai
CN-14 Shanxi
CN-51 Sichuan
CN-12 Tianjin
CN-65 Xinjiang Uygur
CN-54 Xizang
CN-53 Yunnan
CN-33 Zhejiang
CN-71 Taiwan
CN-91 Hong Kong
CN-92 Macao

• Egypt

ISO Name of region


EG-DK Ad Daqahliyah
EG-BA Al Bahr al Ahmar
EG-BH Al Buhayrah
EG-FYM Al Fayyum
EG-GH Al Gharbiyah
EG-ALX Al Iskandariyah
EG-IS Al Isma iliyah
EG-GZ Al Jizah
EG-MNF Al Minufiyah
EG-MN Al Minya
EG-C Al Qahirah
EG-KB Al Qalyubiyah
EG-LX Al Uqsur
EG-WAD Al Wadi al Jadid
EG-SUZ As Suways
EG-SHR Ash Sharqiyah
EG-ASN Aswan
EG-AST Asyut
EG-BNS Bani Suwayf
EG-PTS Bur Sa id
EG-DT Dumyat
EG-JS Janub Sina’
EG-KFS Kafr ash Shaykh
EG-MT Matrouh
EG-KN Qina
EG-SIN Shamal Sina’
EG-SHG Suhaj

• France


ISO Name of region


FR-67 Bas-Rhin
FR-68 Haut-Rhin
FR-24 Dordogne
FR-33 Gironde
FR-40 Landes
FR-47 Lot-et-Garonne
FR-64 Pyrénées-Atlantiques
FR-03 Allier
FR-15 Cantal
FR-43 Haute-Loire
FR-63 Puy-de-Dôme
FR-91 Essonne
FR-92 Hauts-de-Seine
FR-75 Paris
FR-77 Seine-et-Marne
FR-93 Seine-Saint-Denis
FR-95 Val-d’Oise
FR-94 Val-de-Marne
FR-78 Yvelines
FR-14 Calvados
FR-50 Manche
FR-61 Orne
FR-21 Côte-d’Or
FR-58 Nièvre
FR-71 Saône-et-Loire
FR-89 Yonne
FR-22 Côtes-d’Armor
FR-29 Finistère
FR-35 Ille-et-Vilaine
FR-56 Morbihan
FR-18 Cher
FR-28 Eure-et-Loir
FR-37 Indre-et-Loire
FR-36 Indre
FR-41 Loir-et-Cher
FR-45 Loiret
FR-08 Ardennes
FR-10 Aube
FR-52 Haute-Marne
FR-51 Marne
FR-2A Corse-du-Sud
FR-2B Haute-Corse
FR-25 Doubs
FR-70 Haute-Saône
FR-39 Jura
FR-90 Territoire de Belfort
FR-27 Eure
FR-76 Seine-Maritime
FR-11 Aude
FR-30 Gard
FR-34 Hérault
FR-48 Lozère
FR-66 Pyrénées-Orientales
FR-19 Corrèze
FR-23 Creuse
FR-87 Haute-Vienne
FR-54 Meurthe-et-Moselle
FR-55 Meuse
FR-57 Moselle
FR-88 Vosges
FR-09 Ariège
FR-12 Aveyron
FR-32 Gers
FR-31 Haute-Garonne
FR-65 Hautes-Pyrénées
FR-46 Lot
FR-82 Tarn-et-Garonne
FR-81 Tarn
FR-59 Nord
FR-62 Pas-de-Calais
FR-44 Loire-Atlantique
FR-49 Maine-et-Loire
FR-53 Mayenne
FR-72 Sarthe
FR-85 Vendée
FR-02 Aisne
FR-60 Oise
FR-80 Somme
FR-17 Charente-Maritime
FR-16 Charente
FR-79 Deux-Sèvres
FR-86 Vienne
FR-04 Alpes-de-Haute-Provence
FR-06 Alpes-Maritimes
FR-13 Bouches-du-Rhône
FR-05 Hautes-Alpes
FR-83 Var
FR-84 Vaucluse
FR-01 Ain
FR-07 Ardèche
FR-26 Drôme
FR-74 Haute-Savoie
FR-38 Isère
FR-42 Loire
FR-69 Rhône
FR-73 Savoie

• Germany


ISO Name of region


DE-BW Baden-Württemberg
DE-BY Bayern
DE-BE Berlin
DE-BB Brandenburg
DE-HB Bremen
DE-HH Hamburg
DE-HE Hessen
DE-MV Mecklenburg-Vorpommern
DE-NI Niedersachsen
DE-NW Nordrhein-Westfalen
DE-RP Rheinland-Pfalz
DE-SL Saarland
DE-ST Sachsen-Anhalt
DE-SN Sachsen
DE-SH Schleswig-Holstein
DE-TH Thüringen

• Italy

ISO Name of region


IT-CH Chieti
IT-AQ L’Aquila
IT-PE Pescara
IT-TE Teramo
IT-BA Bari
IT-BT Barletta-Andria-Trani
IT-BR Brindisi
IT-FG Foggia
IT-LE Lecce
IT-TA Taranto
IT-MT Matera
IT-PZ Potenza
IT-CZ Catanzaro
IT-CS Cosenza
IT-KR Crotone
IT-RC Reggio Di Calabria
IT-VV Vibo Valentia
IT-AV Avellino
IT-BN Benevento
IT-CE Caserta
IT-NA Napoli
IT-SA Salerno
IT-BO Bologna
IT-FE Ferrara
IT-FC Forli’ - Cesena
IT-MO Modena
IT-PR Parma
IT-PC Piacenza
IT-RA Ravenna
IT-RE Reggio Nell’Emilia
IT-RN Rimini
IT-GO Gorizia
IT-PN Pordenone
IT-TS Trieste
IT-UD Udine
IT-FR Frosinone
IT-LT Latina
IT-RI Rieti
IT-RM Roma
IT-VT Viterbo
IT-GE Genova
IT-IM Imperia
IT-SP La Spezia
IT-SV Savona
IT-BG Bergamo
IT-BS Brescia
IT-CO Como
IT-CR Cremona
IT-LC Lecco
IT-LO Lodi
IT-MN Mantua
IT-MI Milano
IT-MB Monza and Brianza
IT-PV Pavia
IT-SO Sondrio
IT-VA Varese
IT-AN Ancona
IT-AP Ascoli Piceno
IT-FM Fermo
IT-MC Macerata
IT-PU Pesaro E Urbino
IT-CB Campobasso
IT-IS Isernia
IT-AL Alessandria
IT-AT Asti
IT-BI Biella
IT-CN Cuneo
IT-NO Novara
IT-TO Torino
IT-VB Verbano-Cusio-Ossola
IT-VC Vercelli
IT-CA Cagliari
IT-CI Carbonia-Iglesias
IT-VS Medio Campidano
IT-NU Nuoro
IT-OG Ogliastra
IT-OT Olbia-Tempio
IT-OR Oristano
IT-SS Sassari
IT-AG Agrigento
IT-CL Caltanissetta
IT-CT Catania
IT-EN Enna
IT-ME Messina
IT-PA Palermo
IT-RG Ragusa
IT-SR Syracuse
IT-TP Trapani
IT-AR Arezzo
IT-FI Florence
IT-GR Grosseto
IT-LI Livorno
IT-LU Lucca
IT-MS Massa Carrara
IT-PI Pisa
IT-PT Pistoia
IT-PO Prato
IT-SI Siena
IT-BZ Bolzano
IT-TN Trento
IT-PG Perugia
IT-TR Terni
IT-AO Aosta
IT-BL Belluno
IT-PD Padua
IT-RO Rovigo
IT-TV Treviso
IT-VE Venezia
IT-VR Verona
IT-VI Vicenza

• Japan

ISO Name of region


JP-01 Hokkaido
JP-02 Aomori
JP-03 Iwate
JP-04 Miyagi
JP-05 Akita
JP-06 Yamagata
JP-07 Fukushima
JP-08 Ibaraki
JP-09 Tochigi
JP-10 Gunma
JP-11 Saitama
JP-12 Chiba
JP-13 Tokyo
JP-14 Kanagawa
JP-15 Niigata
JP-16 Toyama
JP-17 Ishikawa
JP-18 Fukui
JP-19 Yamanashi
JP-20 Nagano
JP-21 Gifu
JP-22 Shizuoka
JP-23 Aichi
JP-24 Mie
JP-25 Shiga
JP-26 Kyoto
JP-27 Osaka
JP-28 Hyogo
JP-29 Nara
JP-30 Wakayama
JP-31 Tottori
JP-32 Shimane
JP-33 Okayama
JP-34 Hiroshima
JP-35 Yamaguchi
JP-36 Tokushima
JP-37 Kagawa
JP-38 Ehime
JP-39 Kochi
JP-40 Fukuoka
JP-41 Saga
JP-42 Nagasaki
JP-43 Kumamoto
JP-44 Oita
JP-45 Miyazaki
JP-46 Kagoshima
JP-47 Okinawa

• Korea


ISO Name of region


KR-11 Seoul
KR-26 Busan
KR-27 Daegu
KR-28 Incheon
KR-29 Gwangju
KR-30 Daejeon
KR-31 Ulsan
KR-41 Gyeonggi
KR-42 Gangwon
KR-43 Chungbuk
KR-44 Chungnam
KR-45 Jeonbuk
KR-46 Jeonnam
KR-47 Gyeongbuk
KR-48 Gyeongnam
KR-49 Jeju
KR-50 Sejong

• Liechtenstein

ISO Name of region


LI-01 Balzers
LI-02 Eschen
LI-03 Gamprin
LI-04 Mauren
LI-05 Planken
LI-06 Ruggell
LI-07 Schaan
LI-08 Schellenberg
LI-09 Triesen
LI-10 Triesenberg
LI-11 Vaduz

• Morocco

ISO Name of region


MA-BES Ben Slimane
MA-KHO Khouribga
MA-SET Settat
MA-JDI El Jadida
MA-SAF Safi
MA-BOM Boulemane
MA-FES Fès
MA-SEF Sefrou
MA-MOU Zouagha-Moulay Yacoub
MA-KEN Kénitra
MA-SIK Sidi Kacem
MA-CAS Casablanca
MA-MOH Mohammedia
MA-ASZ Assa-Zag
MA-GUE Guelmim
MA-TNT Tan-Tan
MA-TAT Tata
MA-LAA Laâyoune
MA-HAO Al Haouz
MA-CHI Chichaoua
MA-KES El Kelaâ des Sraghna
MA-ESI Essaouira
MA-MMD Marrakech
MA-HAJ El Hajeb
MA-ERR Errachidia
MA-IFR Ifrane
MA-KHN Khénifra
MA-MEK Meknès
MA-BER Berkane Taourirt
MA-FIG Figuig
MA-JRA Jerada
MA-NAD Nador
MA-OUJ Oujda Angad
MA-KHE Khémisset
MA-RAB Rabat
MA-SAL Salé
MA-SKH Skhirate-Témara
MA-AGD Agadir-Ida ou Tanane
MA-CHT Chtouka-Aït Baha
MA-INE Inezgane-Aït Melloul
MA-OUA Ouarzazate
MA-TAR Taroudannt
MA-TIZ Tiznit
MA-ZAG Zagora
MA-AZI Azilal
MA-BEM Béni Mellal
MA-CHE Chefchaouen
MA-FAH Fahs Anjra
MA-LAR Larache
MA-TET Tétouan
MA-TNG Tanger-Assilah
MA-HOC Al Hoceïma
MA-TAO Taounate
MA-TAZ Taza

• Netherlands


ISO Name of region


NL-DR Drenthe
NL-FL Flevoland
NL-FR Friesland
NL-GE Gelderland
NL-GR Groningen
NL-YS IJsselmeer
NL-LI Limburg
NL-NB Noord-Brabant
NL-NH Noord-Holland
NL-OV Overijssel
NL-UT Utrecht
NL-ZE Zeeland
NL-ZM Zeeuwse meren
NL-ZH Zuid-Holland

• Russian

ISO Name of region


RU-AD Adygey
RU-ALT Altay
RU-AMU Amur
RU-ARK Arkhangel’sk
RU-AST Astrakhan’
RU-BA Bashkortostan
RU-BEL Belgorod
RU-BRY Bryansk
RU-BU Buryat
RU-CE Chechnya
RU-CHE Chelyabinsk
RU-CHU Chukot
RU-CU Chuvash
RU-SPE City of St. Petersburg
RU-DA Dagestan
RU-AL Gorno-Altay
RU-IN Ingush
RU-IRK Irkutsk
RU-IVA Ivanovo
RU-KB Kabardin-Balkar
RU-KGD Kaliningrad
RU-KL Kalmyk
RU-KLU Kaluga
RU-KAM Kamchatka
RU-KC Karachay-Cherkess
RU-KR Karelia
RU-KEM Kemerovo
RU-KHA Khabarovsk
RU-KK Khakass
RU-KHM Khanty-Mansiy
RU-KIR Kirov
RU-KO Komi
RU-KOS Kostroma
RU-KDA Krasnodar
RU-KYA Krasnoyarsk
RU-KGN Kurgan
RU-KRS Kursk
RU-LEN Leningrad
RU-LIP Lipetsk
RU-MAG Maga Buryatdan
RU-ME Mariy-El
RU-MO Mordovia
RU-MOW Moscow City
RU-MOS Moskva
RU-MUR Murmansk
RU-NEN Nenets
RU-NIZ Nizhegorod
RU-SE North Ossetia
RU-NGR Novgorod
RU-NVS Novosibirsk
RU-OMS Omsk
RU-ORL Orel
RU-ORE Orenburg
RU-PNZ Penza
RU-PER Perm’
RU-PRI Primor’ye
RU-PSK Pskov
RU-ROS Rostov
RU-RYA Ryazan’
RU-SAK Sakhalin
RU-SA Sakha
RU-SAM Samara
RU-SAR Saratov
RU-SMO Smolensk
RU-STA Stavropol’
RU-SVE Sverdlovsk
RU-TAM Tambov
RU-TA Tatarstan
RU-TOM Tomsk
RU-TUL Tula
RU-TY Tuva
RU-TVE Tver’
RU-TYU Tyumen’
RU-UD Udmurt
RU-ULY Ul’yanovsk
RU-VLA Vladimir
RU-VGG Volgograd
RU-VLG Vologda
RU-VOR Voronezh
RU-YAN Yamal-Nenets
RU-YAR Yaroslavl’
RU-YEV Yevrey
RU-ZAB Zabaykal’ye

• Singapore

Id Name of region
205 Singapore

• Spain

ISO Name of region


ES-AL Almería
ES-CA Cádiz
ES-CO Córdoba
ES-GR Granada
ES-H Huelva
ES-J Jaén
ES-MA Málaga
ES-SE Sevilla
ES-HU Huesca
ES-TE Teruel
ES-Z Zaragoza
ES-S3 Cantabria
ES-AB Albacete
ES-CR Ciudad Real
ES-CU Cuenca
ES-GU Guadalajara
ES-TO Toledo
ES-AV Ávila
ES-BU Burgos
ES-LE León
ES-P Palencia
ES-SA Salamanca
ES-SG Segovia
ES-SO Soria
ES-VA Valladolid
ES-ZA Zamora
ES-B Barcelona
ES-GI Girona
ES-L Lleida
ES-T Tarragona
ES-CE Ceuta
ES-ML Melilla
ES-M5 Madrid
ES-NA7 Navarra
ES-A Alicante
ES-CS Castellón
ES-V Valencia
ES-BA Badajoz
ES-CC Cáceres
ES-C A Coruña
ES-LU Lugo
ES-OR Ourense
ES-PO Pontevedra
ES-PM Baleares
ES-GC Las Palmas
ES-TF Santa Cruz de Tenerife
ES-LO4 La Rioja
ES-VI Álava
ES-SS Guipúzcoa
ES-BI Vizcaya
ES-O2 Asturias
ES-MU6 Murcia

• Switzerland

ISO Name of region


CH-AG Aargau
CH-AR Appenzell Ausserrhoden
CH-AI Appenzell Innerrhoden
CH-BL Basel-Landschaft
CH-BS Basel-Stadt
CH-BE Bern
CH-FR Freiburg
CH-GE Genf
CH-GL Glarus
CH-GR Graubünden
CH-JU Jura
CH-LU Luzern
CH-NE Neuenburg
CH-NW Nidwalden
CH-OW Obwalden
CH-SH Schaffhausen
CH-SZ Schwyz
CH-SO Solothurn
CH-SG St. Gallen
CH-TI Tessin
CH-TG Thurgau
CH-UR Uri
CH-VD Waadt
CH-VS Wallis
CH-ZG Zug
CH-ZH Zürich

• Uk

ISO Name of region


GB-BDG Barking and Dagenham
GB-BAS Bath and North East Somerset
GB-BDF Bedfordshire
GB-WBK Berkshire
GB-BEX Bexley
GB-BBD Blackburn with Darwen
GB-BMH Bournemouth
GB-BEN Brent
GB-BNH Brighton and Hove
GB-BST Bristol
GB-BRY Bromley
GB-BKM Buckinghamshire
GB-CAM Cambridgeshire
GB-CMD Camden
GB-CHS Cheshire
GB-CON Cornwall
GB-CRY Croydon
GB-CMA Cumbria
GB-DAL Darlington
GB-DBY Derbyshire
GB-DER Derby
GB-DEV Devon
GB-DOR Dorset
GB-DUR Durham
GB-EAL Ealing
GB-ERY East Riding of Yorkshire
GB-ESX East Sussex
GB-ENF Enfield
GB-ESS Essex
GB-GLS Gloucestershire
GB-GRE Greenwich
GB-HCK Hackney
GB-HAL Halton
GB-HMF Hammersmith and Fulham
GB-HAM Hampshire
GB-HRY Haringey
GB-HRW Harrow
GB-HPL Hartlepool
GB-HAV Havering
GB-HRT Herefordshire
GB-HEF Hertfordshire
GB-HIL Hillingdon
GB-HNS Hounslow
GB-IOW Isle of Wight
GB-ISL Islington
GB-KEC Kensington and Chelsea
GB-KEN Kent
GB-KHL Kingston upon Hull
GB-KTT Kingston upon Thames
GB-LBH Lambeth
GB-LAN Lancashire
GB-LEC Leicestershire
GB-LCE Leicester
GB-LEW Lewisham
GB-LIN Lincolnshire
GB-LND London
GB-LUT Luton
GB-MAN Manchester
GB-MDW Medway
GB-MER Merseyside
GB-MRT Merton
GB-MDB Middlesbrough
GB-MIK Milton Keynes
GB-NWM Newham
GB-NFK Norfolk
GB-NEL North East Lincolnshire
GB-NLN North Lincolnshire
GB-NSM North Somerset
GB-NYK North Yorkshire
GB-NTH Northamptonshire
GB-NBL Northumberland
GB-NTT Nottinghamshire
GB-NGM Nottingham
GB-OXF Oxfordshire
GB-PTE Peterborough
GB-PLY Plymouth
GB-POL Poole
GB-POR Portsmouth
GB-RDB Redbridge
GB-RCC Redcar and Cleveland
GB-RIC Richmond upon Thames
GB-RUT Rutland
GB-SHR Shropshire
GB-SOM Somerset
GB-SGC South Gloucestershire
GB-SY South Yorkshire
GB-STH Southampton
GB-SOS Southend-on-Sea
GB-SWK Southwark
GB-STS Staffordshire
GB-STT Stockton-on-Tees
GB-STE Stoke-on-Trent
GB-SFK Suffolk
GB-SRY Surrey
GB-STN Sutton
GB-SWD Swindon
GB-TFW Telford and Wrekin
GB-THR Thurrock
GB-TOB Torbay
GB-TWH Tower Hamlets
GB-TAW Tyne and Wear
GB-WFT Waltham Forest
GB-WND Wandsworth
GB-WRT Warrington
GB-WAR Warwickshire
GB-WM West Midlands
GB-WSX West Sussex
GB-WY West Yorkshire
GB-WSM Westminster
GB-WIL Wiltshire
GB-WOR Worcestershire
GB-YOR York
GB-ANT Antrim
GB-ARD Ards
GB-ARM Armagh
GB-BLA Ballymena
GB-BLY Ballymoney
GB-BNB Banbridge
GB-BFS Belfast
GB-CKF Carrickfergus
GB-CSR Castlereagh
GB-CLR Coleraine
GB-CKT Cookstown
GB-CGV Craigavon
GB-DRY Derry
GB-DOW Down
GB-DGN Dungannon
GB-FER Fermanagh
GB-LRN Larne
GB-LMV Limavady
GB-LSB Lisburn
GB-MFT Magherafelt
GB-MYL Moyle
GB-NYM Newry and Mourne
GB-NTA Newtownabbey
GB-NDN North Down
GB-OMH Omagh
GB-STB Strabane
GB-ABD Aberdeenshire
GB-ABE Aberdeen
GB-ANS Angus
GB-AGB Argyll and Bute
GB-CLK Clackmannanshire
GB-DGY Dumfries and Galloway
GB-DND Dundee
GB-EAY East Ayrshire
GB-EDU East Dunbartonshire
GB-ELN East Lothian
GB-ERW East Renfrewshire
GB-EDH Edinburgh
GB-ELS Eilean Siar
GB-FAL Falkirk
GB-FIF Fife
GB-GLG Glasgow
GB-HLD Highland
GB-IVC Inverclyde
GB-MLN Midlothian
GB-MRY Moray
GB-NAY North Ayrshire
GB-NLK North Lanarkshire
GB-ORK Orkney Islands
GB-PKN Perthshire and Kinross
GB-RFW Renfrewshire
GB-SCB Scottish Borders
GB-ZET Shetland Islands
GB-SAY South Ayrshire
GB-SLK South Lanarkshire
GB-STG Stirling
GB-WDU West Dunbartonshire
GB-WLN West Lothian
GB-AGY Anglesey
GB-BGW Blaenau Gwent
GB-BGE Bridgend
GB-CAY Caerphilly
GB-CRF Cardiff
GB-CMN Carmarthenshire
GB-CGN Ceredigion
GB-CWY Conwy
GB-DEN Denbighshire
GB-FLN Flintshire
GB-GWN Gwynedd
GB-MTY Merthyr Tydfil
GB-MON Monmouthshire
GB-NTL Neath Port Talbot
GB-NWP Newport
GB-PEM Pembrokeshire
GB-POW Powys
GB-RCT Rhondda
GB-SWA Swansea
GB-TOF Torfaen
GB-VGL Vale of Glamorgan
GB-WRX Wrexham

• Ukraine


ISO Name of region


UA-71 Cherkasy
UA-74 Chernihiv
UA-77 Chernivtsi
UA-43 Crimea
UA-12 Dnipropetrovs’k
UA-14 Donets’k
UA-26 Ivano-Frankivs’k
UA-63 Kharkiv
UA-65 Kherson
UA-68 Khmel’nyts’kyy
UA-30 Kiev City
UA-32 Kiev
UA-35 Kirovohrad
UA-46 L’viv
UA-09 Luhans’k
UA-48 Mykolayiv
UA-51 Odessa
UA-53 Poltava
UA-56 Rivne
UA-40 Sevastopol’
UA-59 Sumy
UA-61 Ternopil’
UA-21 Transcarpathia
UA-05 Vinnytsya
UA-07 Volyn
UA-23 Zaporizhzhya
UA-18 Zhytomyr

• Usa

ISO Name of region


US-AL Alabama
US-AK Alaska
US-AZ Arizona
US-AR Arkansas
US-CA California
US-CO Colorado
US-CT Connecticut
US-DE Delaware
US-DC District of Columbia
US-FL Florida
US-GA Georgia
US-HI Hawaii
US-ID Idaho
US-IL Illinois
US-IN Indiana
US-IA Iowa
US-KS Kansas
US-KY Kentucky
US-LA Louisiana
US-ME Maine
US-MD Maryland
US-MA Massachusetts
US-MI Michigan
US-MN Minnesota
US-MS Mississippi
US-MO Missouri
US-MT Montana
US-NE Nebraska
US-NV Nevada
US-NH New Hampshire
US-NJ New Jersey
US-NM New Mexico
US-NY New York
US-NC North Carolina
US-ND North Dakota
US-OH Ohio
US-OK Oklahoma
US-OR Oregon
US-PA Pennsylvania
US-RI Rhode Island
US-SC South Carolina
US-SD South Dakota
US-TN Tennessee
US-TX Texas
US-UT Utah
US-VT Vermont
US-VA Virginia
US-WA Washington
US-WV West Virginia
US-WI Wisconsin
US-WY Wyoming

Need to add a new Country?

To add a new country to the Country Map tool, follow these steps:
1. You need a shapefile which contains the data for your map. You can get this file from this site: https://www.diva-gis.org/gdata
2. You need to add the ISO 3166-2 code, in a column named ISO, for every record in your file. This is important because it is the norm used for mapping your data onto the geojson file.
3. You need to convert the shapefile to a geojson file. This can be done with the ogr2ogr tool: https://www.gdal.org/ogr2ogr.html
4. Put your geojson file in the folder superset/assets/src/visualizations/CountryMap/countries with the name nameofyourcountries.geojson
5. You can reduce the size of the geojson file on this site: https://mapshaper.org/
6. Go to the file superset/assets/src/explore/controls.jsx


7. Add your country to the ‘select_country’ component. Example:

select_country: {
  type: 'SelectControl',
  label: 'Country Name Type',
  default: 'France',
  choices: [
    'Belgium',
    'Brazil',
    'China',
    'Egypt',
    'France',
    'Germany',
    'Italy',
    'Japan',
    'Korea',
    'Morocco',
    'Netherlands',
    'Russia',
    'Singapore',
    'Spain',
    'Uk',
    'Usa',
  ].map(s => [s, s]),
  description: 'The name of country that Superset should display',
},
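For step 3 above, a typical ogr2ogr invocation to convert a shapefile to GeoJSON might look like the following; the input and output file names are placeholders for your own files:

ogr2ogr -f GeoJSON mynewcountry.geojson mynewcountry.shp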

Videos

Note: This section of the documentation has yet to be filled in.

Importing and Exporting Datasources

The superset cli allows you to import and export datasources from and to YAML. Datasources include both databases
and druid clusters. The data is expected to be organized in the following hierarchy:
.
databases
| database_1
| | table_1
| | | columns
| | | | column_1
| | | | column_2
| | | | ... (more columns)
| | | metrics
| | | | metric_1
| | | | metric_2
| | | | ... (more metrics)
| | ... (more tables)
| ... (more databases)
druid_clusters
| cluster_1
| | datasource_1
| | | columns
| | | | column_1
| | | | column_2
| | | | ... (more columns)
| | | metrics
| | | | metric_1
| | | | metric_2
| | | | ... (more metrics)
| | ... (more datasources)
| ... (more clusters)

Exporting Datasources to YAML

You can print your current datasources to stdout by running:


superset export_datasources

To save your datasources to a file run:

superset export_datasources -f <filename>

By default, default (null) values will be omitted. Use the -d flag to include them. If you want back references to
be included (e.g. a column to include the table id it belongs to) use the -b flag.
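Putting the flags described above together, a full export command might look like this (the output file name is a placeholder):

superset export_datasources -f datasources.yaml -d -b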
Alternatively, you can export datasources using the UI:
1. Open Sources -> Databases to export all tables associated to a single or multiple databases. (Tables for one or
more tables, Druid Clusters for clusters, Druid Datasources for datasources)
2. Select the items you would like to export
3. Click Actions -> Export to YAML
4. If you want to import an item that you exported through the UI, you will need to nest it inside its parent element, e.g. a database needs to be nested under databases, and a table needs to be nested inside a database element.

Exporting the complete supported YAML schema

In order to obtain an exhaustive list of all fields you can import using the YAML import run:

superset export_datasource_schema

Again, you can use the -b flag to include back references.

Importing Datasources from YAML

In order to import datasources from one or more YAML files, run:

superset import_datasources -p <path or filename>

If you supply a path, all files ending with *.yaml or *.yml will be parsed. You can apply additional flags, e.g.:

superset import_datasources -p <path> -r


The -r flag will search the supplied path recursively.
The sync flag -s takes parameters in order to sync the supplied elements with your file. Be careful, this can delete the contents of your meta database. Example:

superset import_datasources -p <path / filename> -s columns,metrics

This will sync all metrics and columns for all datasources found in the <path / filename> in the Superset meta database. This means columns and metrics not specified in YAML will be deleted. If you added tables to columns,metrics, those would be synchronised as well.
If you don’t supply the sync flag (-s), importing will only add and update (override) fields. E.g. you can add a verbose_name to the column ds in the table random_time_series from the example datasets by saving the following YAML to a file and then running the import_datasources command.
databases:
- database_name: main
  tables:
  - table_name: random_time_series
    columns:
    - column_name: ds
      verbose_name: datetime

3.4.8 FAQ

Can I query/join multiple tables at one time?

Not directly, no. A Superset SQLAlchemy datasource can only be a single table or a view.
When working with tables, the solution would be to materialize a table that contains all the fields needed for your
analysis, most likely through some scheduled batch process.
A view is a simple logical layer that abstracts an arbitrary SQL query as a virtual table. This can allow you to join and union multiple tables, and to apply some transformations using arbitrary SQL expressions. The limitation there is your database performance, as Superset effectively will run a query on top of your query (view). A good practice may be to limit yourself to joining your main large table to one or many small tables only, and avoid using GROUP BY where possible, as Superset will do its own GROUP BY and doing the work twice might slow down performance.
Whether you use a table or a view, the important factor is whether your database is fast enough to serve it in an
interactive fashion to provide a good user experience in Superset.

How BIG can my data source be?

It can be gigantic! As mentioned above, the main criterion is whether your database can execute queries and return results in a time frame that is acceptable to your users. Many distributed databases out there can execute queries that scan through terabytes in an interactive fashion.

How do I create my own visualization?

We are planning on making it easier to add new visualizations to the framework. In the meantime, we've tagged a few pull requests as examples of how to contribute new visualizations:



https://github.com/airbnb/superset/issues?q=label%3Aexample+is%3Aclosed


Can I upload and visualize csv data?

Yes, using the Upload a CSV button under the Sources menu item. This brings up a form that allows you to specify the required information. After creating the table from CSV, it can then be loaded like any other on the Sources -> Tables page.

Why are my queries timing out?

There are many reasons that may cause a long-running query to time out:
• For long-running queries from SQL Lab, by default Superset allows them to run for up to 6 hours before they are killed by celery. If you want to increase the time allowed for a running query, you can specify the timeout in the configuration. For example:
SQLLAB_ASYNC_TIME_LIMIT_SEC = 60 * 60 * 6
• Superset runs on the gunicorn web server, which may time out web requests. If you want to increase the default (50), you can specify the timeout when starting the web server with the -t flag, which is expressed in seconds.
superset runserver -t 300
• If you are seeing timeouts (504 Gateway Time-out) when loading a dashboard or exploring a slice, you are probably behind a gateway or proxy server (such as Nginx). If it did not receive a timely response from the Superset server (which is processing long queries), these web servers will send a 504 status code to clients directly. Superset has a client-side timeout limit to address this issue. If a query didn't come back within the client-side timeout (60 seconds by default), Superset will display a warning message to avoid a gateway timeout error. If you have a longer gateway timeout limit, you can change the timeout settings in superset_config.py (a combined sketch of these settings follows the list):
SUPERSET_WEBSERVER_TIMEOUT = 60
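Taken together, the timeout-related settings mentioned above live in superset_config.py. A minimal sketch with illustrative values, not recommendations:

# superset_config.py -- illustrative values only; tune them for your deployment.

# Maximum run time for asynchronous SQL Lab queries before celery kills them
# (6 hours, matching the default described above).
SQLLAB_ASYNC_TIME_LIMIT_SEC = 60 * 60 * 6

# Client-side timeout the Superset UI uses to warn users before a gateway or
# proxy (e.g. Nginx) returns a 504. Keep this below your proxy's timeout.
SUPERSET_WEBSERVER_TIMEOUT = 60

# The gunicorn request timeout is set separately on the command line,
# e.g. `superset runserver -t 300` (seconds).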

Why is the map not visible in the mapbox visualization?

You need to register at mapbox.com, get an API key, and configure it as MAPBOX_API_KEY in superset_config.py.

How to add dynamic filters to a dashboard?

It’s easy: use the Filter Box widget, build a slice, and add it to your
dashboard.
The Filter Box widget allows you to define a query to populate dropdowns that can be used for filtering. To build
the list of distinct values, we run a query, and sort the result by the metric you provide, sorting descending.
The widget also has a checkbox Date Filter, which enables time filtering capabilities to your dashboard.
After checking the box and refreshing, you’ll see a from and a to dropdown show up.
By default, the filtering will be applied to all the slices that are built on top of a datasource that shares the column
name that the filter is based on. It’s also a requirement for that column to be checked as “filterable” in the column tab
of the table editor.
But what if you don't want certain widgets to get filtered on your dashboard? You can do that by editing your dashboard, and in the form, edit the JSON Metadata field, more specifically the filter_immune_slices key, which receives an array of sliceIds that should never be affected by any dashboard level filtering.

{
    "filter_immune_slices": [324, 65, 92],
    "expanded_slices": {},
    "filter_immune_slice_fields": {
        "177": ["country_name", "__time_range"],
        "32": ["__time_range"]
    },
    "timed_refresh_immune_slices": [324]
}

In the json blob above, slices 324, 65 and 92 won’t be affected by any dashboard level filtering.
Now note the filter_immune_slice_fields key. This one allows you to be more specific and define for a
specific slice_id, which filter fields should be disregarded.
Note the use of the __time_range keyword, which is reserved for dealing with the time boundary filtering mentioned above.
But what happens with filtering when dealing with slices coming from different tables or databases? If the column
name is shared, the filter will be applied, it’s as simple as that.

How to limit the timed refresh on a dashboard?

By default, the dashboard timed refresh feature allows you to automatically re-query every slice on a dashboard according to a set schedule. Sometimes, however, you won't want all of the slices to be refreshed - especially if some data is slow moving or some slices run heavy queries. To exclude specific slices from the timed refresh process, add the
timed_refresh_immune_slices key to the dashboard JSON Metadata field:
{
    "filter_immune_slices": [],
    "expanded_slices": {},
    "filter_immune_slice_fields": {},
    "timed_refresh_immune_slices": [324]
}

In the example above, if a timed refresh is set for the dashboard, then every slice except 324 will be
automatically re-queried on schedule.
Slice refresh will also be staggered over the specified period. You can turn off this staggering by setting stagger_refresh to false, and modify the stagger period by setting stagger_time to a value in milliseconds in the JSON Metadata field:
{
    "stagger_refresh": false,
    "stagger_time": 2500
}

Here, the entire dashboard will refresh at once if periodic refresh is on. The stagger time of 2.5 seconds is ignored.

Why does ‘flask fab’ or superset freeze/hang/not respond when started (my home directory is NFS mounted)?

By default, superset creates and uses a sqlite database at ~/.superset/superset.db. Sqlite is known not to work well when used on NFS due to its broken file locking implementation on NFS.
You can override this path using the SUPERSET_HOME environment variable.


Another workaround is to change where superset stores the sqlite database by adding SQLALCHEMY_DATABASE_URI = 'sqlite:////new/location/superset.db' in superset_config.py (create the file if needed), then adding the directory where superset_config.py lives to the PYTHONPATH environment variable (e.g. export PYTHONPATH=/opt/logs/sandbox/airbnb/).
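Put together, a minimal superset_config.py for this workaround might look like the sketch below; the path is only an example and should point at any non-NFS location:

# superset_config.py -- place this file in a directory that is on PYTHONPATH.
# Note the four slashes: 'sqlite:///' plus the leading '/' of an absolute path.
SQLALCHEMY_DATABASE_URI = 'sqlite:////new/location/superset.db'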

What if the table schema changed?

Table schemas evolve, and Superset needs to reflect that. It's pretty common in the life cycle of a dashboard to want to add a new dimension or metric. To get Superset to discover your new columns, all you have to do is to go to Menu -> Sources -> Tables, click the edit icon next to the table whose schema has changed, and hit Save from the Detail tab. Behind the scenes, the new columns will get merged in. Following this, you may want to re-edit the table afterwards to configure the Column tab, check the appropriate boxes and save again.

How do I go about developing a new visualization type?

Here's an example as a Github PR with comments that describe what the different sections of the code do: https://github.com/airbnb/superset/pull/3013

What database engine can I use as a backend for Superset?

To clarify, the database backend is an OLTP database used by Superset to store its internal information like
your list of users, slices and dashboard definitions.
Superset is tested using Mysql, Postgresql and Sqlite for its backend. It's recommended you install Superset on one of these database servers for production.
Using column-store, non-OLTP databases like Vertica, Redshift or Presto as a database backend simply won't work, as these databases are not designed for this type of workload. Installation on Oracle, Microsoft SQL Server, or other OLTP databases may work but isn't tested.
Please note that pretty much any database that has a SqlAlchemy integration should work perfectly fine as a datasource for Superset, just not as the OLTP backend.

How can I configure OAuth authentication and authorization?

You can take a look at this Flask-AppBuilder configuration example.
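To give a rough idea of what such a configuration looks like, here is a minimal sketch of a superset_config.py enabling Google OAuth. It assumes a Flask-AppBuilder release whose OAuth support uses the consumer_key / consumer_secret style of provider definition; key names vary between Flask-AppBuilder versions, so treat this as illustrative and refer to the linked example for your installed version:

# superset_config.py -- illustrative sketch only, not the official example.
from flask_appbuilder.security.manager import AUTH_OAUTH

AUTH_TYPE = AUTH_OAUTH

# Optionally auto-register users on first login with a default role.
AUTH_USER_REGISTRATION = True
AUTH_USER_REGISTRATION_ROLE = "Gamma"

OAUTH_PROVIDERS = [
    {
        "name": "google",
        "icon": "fa-google",
        "token_key": "access_token",
        "remote_app": {
            # Hypothetical placeholders -- use your own OAuth credentials.
            "consumer_key": "GOOGLE_CLIENT_ID",
            "consumer_secret": "GOOGLE_CLIENT_SECRET",
            "base_url": "https://www.googleapis.com/oauth2/v2/",
            "request_token_params": {"scope": "email profile"},
            "request_token_url": None,
            "access_token_url": "https://accounts.google.com/o/oauth2/token",
            "authorize_url": "https://accounts.google.com/o/oauth2/auth",
        },
    }
]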

How can I set a default filter on my dashboard?

Easy. Simply apply the filter and save the dashboard while the filter is active.

How do I get Superset to refresh the schema of my table?

When adding columns to a table, you can have Superset detect and merge the new columns in by using the "Refresh Metadata" action in the Sources -> Tables page. Simply check the box next to the tables whose schema you want refreshed, and click Actions -> Refresh Metadata.


Is there a way to force the use of specific colors?

It is possible on a per-dashboard basis by providing a mapping of labels to colors in the JSON Metadata
attribute using the label_colors key.
{
    "label_colors": {
        "Girls": "#FF69B4",
        "Boys": "#ADD8E6"
    }
}

Does Superset work with [insert database engine here]?

The community over time has curated a list of databases that work well with Superset in the Database dependencies section of the docs. Database engines not listed on this page may work too. We rely on the community to contribute to this knowledge base.
For a database engine to be supported in Superset through the SQLAlchemy connector, it requires having a Python-compliant SQLAlchemy dialect as well as a DBAPI driver defined. Databases that have limited SQL support may work as well. For instance, it's possible to connect to Druid through the SQLAlchemy connector even though Druid does not support joins and subqueries. Another key element for a database to be supported is the Superset Database Engine Specification interface. This interface allows for defining database-specific configurations and logic that go beyond the SQLAlchemy and DBAPI scope. This includes features like (a minimal sketch follows the list below):
• date-related SQL functions that allow Superset to fetch different time granularities when running time-series queries
• whether the engine supports subqueries. If false, Superset may run 2-phase queries to compensate for the
limitation
• methods around processing logs and inferring the percentage of completion of a query
• technicalities as to how to handle cursors and connections if the driver is not standard DBAPI
• more, read the code for more details
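As a very rough illustration of the shape of such a spec, here is a minimal sketch. The class and attribute names follow the pattern used in the superset/db_engine_specs/ package, but they change between Superset releases, so treat every name below as an assumption and check it against the code of your installed version:

# Minimal, illustrative engine spec sketch for a hypothetical 'mydb' dialect.
from superset.db_engine_specs.base import BaseEngineSpec


class MyDatabaseEngineSpec(BaseEngineSpec):
    engine = "mydb"  # must match the SQLAlchemy dialect name

    # SQL templates used to truncate a timestamp column to a time grain
    # when Superset builds time-series queries.
    _time_grain_functions = {
        None: "{col}",
        "PT1M": "DATE_TRUNC('minute', {col})",
        "PT1H": "DATE_TRUNC('hour', {col})",
        "P1D": "DATE_TRUNC('day', {col})",
    }

    @classmethod
    def convert_dttm(cls, target_type, dttm):
        # How to render a Python datetime as a SQL literal for this engine.
        return "'{}'".format(dttm.isoformat())

    @classmethod
    def epoch_to_dttm(cls):
        # SQL expression converting an epoch-seconds column to a timestamp.
        return "from_unixtime({col})"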
Beyond the SQLAlchemy connector, it's also possible, though much more involved, to extend Superset and write your own connector. The only example of this at the moment is the Druid connector, which is getting superseded by Druid's growing SQL support and the recent availability of a DBAPI and SQLAlchemy driver. If the database you are considering integrating has any kind of SQL support, it's probably preferable to go the SQLAlchemy route.
Note that for a native connector to be possible, the database needs to have support for running OLAP-type queries and should be able to do things that are typical in basic SQL:
• aggregate data
• apply filters (==, !=, >, <, >=, <=, IN, . . . )
• apply HAVING-type filters
• be schema-aware, expose columns and types

3.5 Indices and tables

• genindex
• modindex
• search