
Introduction

The following is a summary of how to access different formats of data stored in an external
object store. You can copy and modify the example queries below to access your own datasets.
For simplicity, the included datasets are set up so that they do not require credentials, but it is
highly recommended that you use credentials to access your own datasets.

You can use similar SQL to access your own external object store. Simply replace the following:

- LOCATION - Replace with the location of your object store. The location must begin
with /s3/ (Amazon S3) or /az/ (Azure Blob Storage).
- USER or ACCESS_ID - Add the user name for your external object store.
- PASSWORD or ACCESS_KEY - Add the password of the user on your external object
store.
- Uncomment the EXTERNAL SECURITY clause as necessary.

When modifying these queries to access your own data, your external object store must be
configured to allow access from the Vantage environment. Provide your credentials in USER and
PASSWORD (used in the CREATE AUTHORIZATION command) or in ACCESS_ID and
ACCESS_KEY (used in the READ_NOS command).

Accessing External Object Storage


There are two ways to read data from an external object store:

1. READ_NOS
READ_NOS allows you to do the following:

- Perform an ad hoc query on CSV- and JSON-format data in place on an external object store
- Examine the schema of Parquet-formatted data
- Bypass creating a foreign table in the database

2. Foreign Tables
Users with the CREATE TABLE privilege can create a foreign table inside the database, point this
virtual table to an external storage location, and use SQL to translate the external data into a form
useful for business. Using a foreign table gives you the ability to:

- Load external data into the database
- Join external data to data stored in the database
- Filter the data
- Use views to simplify how the data appears to your users

Data read through a foreign table is not automatically stored on disk, and the data can only be
seen by that query.

Data can be loaded into the database by selecting from READ_NOS or a foreign table in a
CREATE TABLE AS … WITH DATA statement.
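As a minimal sketch, loading CSV data straight from READ_NOS into a permanent table might look like the following. The target table name sensor_data_local is a placeholder, not part of the examples below; the LOCATION matches the CSV dataset used later in this guide:

```sql
-- Illustrative sketch: persist the output of READ_NOS in one step.
-- The table name sensor_data_local is a placeholder.
CREATE TABLE sensor_data_local AS (
SELECT payload..* FROM READ_NOS (
ON ( SELECT CAST( NULL AS DATASET STORAGE FORMAT CSV ) )
USING
LOCATION ('/s3/s3.amazonaws.com/trial-datasets/IndoorSensor/')
) AS D
) WITH DATA
NO PRIMARY INDEX;
```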

When accessing your own data


Create an authorization object to contain the credentials for your external object store, and
uncomment the EXTERNAL SECURITY clauses in the statements below to use it.
CREATE AUTHORIZATION InvAuth
AS INVOKER TRUSTED
USER 'ACCESS_KEY_ID'
PASSWORD 'SECRET_ACCESS_KEY';

NOS on CSV Files

Accessing CSV Data Stored on Amazon S3 with READ_NOS
Select data from external object store using READ_NOS:
SELECT TOP 2 payload..* FROM READ_NOS(
ON ( SELECT CAST( NULL AS DATASET STORAGE FORMAT CSV ) ) USING
LOCATION('/s3/s3.amazonaws.com/trial-datasets/IndoorSensor/')
--ACCESS_ID ( 'ACCESS_KEY_ID' )
--ACCESS_KEY ( 'SECRET_ACCESS_KEY' )
) AS D;

NOTE: For CSV payloads, payload.* also works, just as it does for JSON.

Accessing CSV Data Stored on Amazon S3 with CREATE FOREIGN TABLE
Create a foreign table:
CREATE FOREIGN TABLE sample_csv
--, EXTERNAL SECURITY INVOKER TRUSTED InvAuth
( Location VARCHAR(2048) CHARACTER SET UNICODE CASESPECIFIC,
Payload DATASET INLINE LENGTH 64000 STORAGE FORMAT CSV
)
USING (LOCATION('/s3/s3.amazonaws.com/trial-datasets/IndoorSensor/'));

View some data using the foreign table:


SELECT TOP 2 CAST(payload AS VARCHAR(64000))
FROM sample_csv;
Import data into Vantage from CSV data stored on Amazon S3
To persist the data from an external object store we can use a CREATE TABLE AS statement as
follows:

First, we need a schema to apply to the data:


CREATE CSV SCHEMA sample_csv_schema AS
'{"field_delimiter":",","field_names":["date", "time", "epoch", "moteid",
"temperature", "humidity", "light", "voltage"]}';

Create a foreign table that uses that schema:


CREATE FOREIGN TABLE sample_csv_ft
( Location VARCHAR(2048) CHARACTER SET UNICODE CASESPECIFIC,
Payload DATASET INLINE LENGTH 64000 STORAGE FORMAT CSV WITH SCHEMA
sample_csv_schema
)
USING (LOCATION('/s3/s3.amazonaws.com/trial-datasets/IndoorSensor/data.csv'));

SELECT TOP 10 CAST(payload AS VARCHAR(64000)) FROM sample_csv_ft;

SELECT TOP 10 payload..* FROM sample_csv_ft;

Create a view that splits out the CSV into individual columns:
REPLACE VIEW sample_csv_view
AS
(SELECT
CAST(payload.."date" AS DATE FORMAT 'YYYY-MM-DD') sensdate,
CAST(payload.."time" AS TIME(6) FORMAT 'HH:MI:SSDS(F)') senstime,
CAST(payload..epoch AS INTEGER) epoch,
CAST(payload..moteid AS INTEGER) moteid,
CAST(payload..temperature AS FLOAT) ( FORMAT '-ZZZ9.99') temperature,
CAST(payload..humidity AS FLOAT) ( FORMAT '-ZZZ9.99') humidity,
CAST(payload..light AS FLOAT) ( FORMAT '-ZZZ9.99') light,
CAST(payload..voltage AS FLOAT) ( FORMAT '-ZZZ9.99') voltage,
CAST(payload.."date" || ' ' || payload.."time" AS TIMESTAMP FORMAT
'YYYY-MM-DDBHH:MI:SSDS(F)') sensdatetime
FROM sample_csv_ft);

SELECT TOP 2 *
FROM sample_csv_view;
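With the view in place, the CREATE TABLE AS statement mentioned at the start of this section can persist the data into a native table. The target table name sample_csv_local is illustrative:

```sql
-- Persist the CSV data into a native table by selecting from the view.
CREATE TABLE sample_csv_local AS (
SELECT * FROM sample_csv_view
) WITH DATA
NO PRIMARY INDEX;
```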
NOS on JSON Files

Accessing JSON Data Stored on Amazon S3 with READ_NOS
Select data from external object store using READ_NOS:
SELECT TOP 2 payload.* FROM READ_NOS (
ON ( SELECT CAST( NULL AS JSON ) )
USING
LOCATION ('/s3/s3.amazonaws.com/trial-datasets/EVCarBattery/')
--ACCESS_ID ( 'ACCESS_KEY_ID' )
--ACCESS_KEY ( 'SECRET_ACCESS_KEY' )
) AS D;

NOTE: payload..* does not work for JSON payloads; use payload.* instead.

Accessing JSON Data Stored on Amazon S3 with CREATE FOREIGN TABLE
Create a foreign table:
CREATE FOREIGN TABLE sample_json
--, EXTERNAL SECURITY INVOKER TRUSTED InvAuth
(
LOCATION VARCHAR(2048) CHARACTER SET UNICODE CASESPECIFIC ,
Payload JSON INLINE LENGTH 32000 CHARACTER SET UNICODE )
USING(
LOCATION('/S3/s3.amazonaws.com/trial-datasets/EVCarBattery/') );

View some data using the foreign table:


SELECT TOP 2 * FROM sample_json;
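As with CSV, a view can surface individual JSON attributes as columns. A sketch, with hypothetical key names (vin, model) standing in for whatever keys your JSON payload actually contains:

```sql
-- Hypothetical sketch: vin and model are placeholder JSON keys;
-- substitute the actual attribute names in your payload.
REPLACE VIEW sample_json_view AS (
SELECT
CAST(Payload.vin AS VARCHAR(20)) vin,
CAST(Payload.model AS VARCHAR(40)) model
FROM sample_json);
```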
NOS on Parquet Files

Exploring Parquet Data Stored on Amazon S3 with READ_NOS
See what files are in the Parquet bucket:
SELECT location(CHAR(255)), ObjectLength
FROM read_nos (
ON (SELECT CAST(NULL AS DATASET INLINE LENGTH 64000 STORAGE FORMAT CSV))
USING
LOCATION ('/s3/s3.amazonaws.com/trial-datasets/SalesOffload/')
RETURNTYPE ('NOSREAD_KEYS')
--ACCESS_ID ( 'ACCESS_KEY_ID' )
--ACCESS_KEY ( 'SECRET_ACCESS_KEY' )
) AS D
ORDER BY 1;

See what the schema of the data is (specify a single file, assuming all files in the bucket share
the same format):
SELECT * FROM READ_NOS (
USING
LOCATION
('/s3/s3.amazonaws.com/trial-datasets/SalesOffload/2010/1/object_33_0_1.parquet')
RETURNTYPE ('NOSREAD_PARQUET_SCHEMA')
FULLSCAN ('TRUE')
--ACCESS_ID ( 'ACCESS_KEY_ID' )
--ACCESS_KEY ( 'SECRET_ACCESS_KEY' )
) AS D;
Accessing Parquet Data Stored on Amazon S3 with CREATE FOREIGN TABLE
Create a foreign table:


CREATE FOREIGN TABLE sample_parquet
--, EXTERNAL SECURITY INVOKER TRUSTED InvAuth
(
Location VARCHAR(2048) CHARACTER SET UNICODE CASESPECIFIC,
TheYear INTEGER,
TheMonth INTEGER,
sales_date DATE FORMAT 'YY/MM/DD',
customer_id INTEGER,
store_id INTEGER,
basket_id INTEGER,
product_id INTEGER,
sales_quantity INTEGER,
discount_amount FLOAT FORMAT '-ZZZ9.99'
)
USING (
LOCATION ('/s3/s3.amazonaws.com/trial-datasets/SalesOffload')
STOREDAS ('PARQUET'))
NO PRIMARY INDEX
PARTITION BY COLUMN;

View some data using the foreign table:


SELECT TOP 2 * FROM sample_parquet;

We want the data to look like a native table, so let's put a view on top:
REPLACE VIEW sales_fact_offload_v AS (
SELECT
sales_date,
customer_id,
store_id,
basket_id,
product_id,
sales_quantity,
discount_amount
FROM sample_parquet);

SELECT TOP 10 *
FROM sales_fact_offload_v;
Let's optimize the foreign table and view for efficient access:
DROP TABLE sales_fact_offload;

CREATE FOREIGN TABLE sales_fact_offload (
Location VARCHAR(2048) CHARACTER SET UNICODE CASESPECIFIC,
TheYear INTEGER,
TheMonth INTEGER,
sales_date DATE FORMAT 'YY/MM/DD',
customer_id INTEGER,
store_id INTEGER,
basket_id INTEGER,
product_id INTEGER,
sales_quantity INTEGER,
discount_amount FLOAT FORMAT '-ZZZ9.99'
)
USING(
LOCATION ('/s3/s3.amazonaws.com/trial-datasets/SalesOffload')
PATHPATTERN ('$dir1/$year/$month')
STOREDAS ('PARQUET')
)
NO PRIMARY INDEX
PARTITION BY COLUMN;

We have re-defined our foreign table to include a PATHPATTERN clause. When looking at
historical data by date, this allows us to read only the files we need!

Now let's re-create our user-friendly view to take advantage of this path filtering:
REPLACE VIEW sales_fact_offload_v AS (
SELECT
CAST($path.$year AS CHAR(4)) sales_year,
CAST($path.$month AS CHAR(2)) sales_month,
sales_date,
customer_id,
store_id,
basket_id,
product_id,
sales_quantity,
discount_amount
FROM sales_fact_offload);

SELECT TOP 10 *
FROM sales_fact_offload_v
WHERE sales_year = '2010'
AND sales_month = '9';
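Because the PATHPATTERN columns prune which objects are read, persisting a single month into a native table scans only that month's files. A sketch, where the table name sales_2010_09 is illustrative:

```sql
-- Persist one month of the offloaded data; the path filters limit
-- the scan to the 2010/9 objects.
CREATE TABLE sales_2010_09 AS (
SELECT * FROM sales_fact_offload_v
WHERE sales_year = '2010'
AND sales_month = '9'
) WITH DATA
NO PRIMARY INDEX;
```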
Clean-up
Drop the objects we created in our own database schema.
DROP AUTHORIZATION InvAuth;
DROP TABLE sample_csv;
DROP VIEW sample_csv_view;
DROP TABLE sample_csv_ft;
DROP CSV SCHEMA sample_csv_schema;
DROP TABLE sample_json;
DROP TABLE sample_parquet;
DROP TABLE sales_fact_offload;
DROP VIEW sales_fact_offload_v;
