The following is a summary of how to access different formats of data stored in an external object store. You can copy and modify the example queries below to access your own datasets. For simplicity, the included datasets are set up so that no credentials are needed, but it is strongly recommended that you use credentials when accessing your own datasets.
You can use similar SQL to access your own external object store. Simply replace the following:
LOCATION - Replace with the location of your object store. The location must begin with /s3/ (Amazon) or /az/ (Azure).
USER or ACCESS_ID - Add the user name for your external object store.
PASSWORD or ACCESS_KEY - Add the password of that user on your external object store.
Uncomment the EXTERNAL SECURITY clause as necessary.
When modifying these queries to access your own data, your external object store must be configured to allow access from the Vantage environment. Provide your credentials in USER and PASSWORD (used in the CREATE AUTHORIZATION command) or in ACCESS_ID and ACCESS_KEY (used in the READ_NOS command).
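As a minimal sketch, an authorization object for those credentials might be created as follows. The object name InvAuth matches the one dropped in the Clean-up section at the end; the credential values are placeholders you must replace:

```sql
-- Store object-store credentials in an authorization object.
-- Replace the placeholder values with your own access ID and key.
CREATE AUTHORIZATION InvAuth
AS DEFINER TRUSTED
USER 'YOUR_ACCESS_KEY_ID'
PASSWORD 'YOUR_SECRET_ACCESS_KEY';
```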
1. READ_NOS
READ_NOS allows you to do the following:
Perform an ad hoc query on data that is in CSV and JSON formats with the data in-place
on an external object store
Examine the schema of PARQUET formatted data
Bypass creating a foreign table in the database
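For example, an ad hoc query over CSV data in place might look like the following sketch (the bucket location is hypothetical; uncomment the credential parameters for a secured bucket):

```sql
-- Query CSV objects in place; no table needs to exist in the database.
SELECT TOP 5 *
FROM READ_NOS (
  USING
    LOCATION ('/s3/s3.amazonaws.com/your-bucket/your-data/')  -- hypothetical
    RETURNTYPE ('NOSREAD_RECORD')
  --  ACCESS_ID  ('ACCESS_KEY_ID')
  --  ACCESS_KEY ('SECRET_ACCESS_KEY')
) AS d;
```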
2. Foreign Tables
Users with the CREATE TABLE privilege can create a foreign table inside the database, point this virtual table to an external storage location, and use SQL to translate the external data into a form useful for business. A foreign table lets you query the external data in place with standard SQL. Data can also be loaded into the database by selecting from READ_NOS or a foreign table in a CREATE TABLE AS … WITH DATA statement.
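A minimal sketch of both steps, assuming a hypothetical CSV location. The foreign table name sample_csv_ft matches the one queried below, and sample_csv matches the table dropped in the Clean-up section; by default, a CSV foreign table exposes the raw rows through a Payload column:

```sql
-- A foreign table pointing at external CSV data (location is hypothetical).
CREATE FOREIGN TABLE sample_csv_ft
USING (
  LOCATION ('/s3/s3.amazonaws.com/your-bucket/sample-csv/')
);

-- Materialize the external data into a regular database table.
CREATE TABLE sample_csv AS (
  SELECT * FROM sample_csv_ft
) WITH DATA;
```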
Create a view that splits out the CSV into individual columns:
REPLACE VIEW sample_csv_view AS (
SELECT
  CAST(payload.."date" AS DATE FORMAT 'YYYY-MM-DD') sensdate,
  CAST(payload.."time" AS TIME(6) FORMAT 'HH:MI:SSDS(F)') senstime,
  CAST(payload..epoch AS INTEGER) epoch,
  CAST(payload..moteid AS INTEGER) moteid,
  CAST(payload..temperature AS FLOAT) (FORMAT '-ZZZ9.99') temperature,
  CAST(payload..humidity AS FLOAT) (FORMAT '-ZZZ9.99') humidity,
  CAST(payload..light AS FLOAT) (FORMAT '-ZZZ9.99') light,
  CAST(payload..voltage AS FLOAT) (FORMAT '-ZZZ9.99') voltage,
  CAST(payload.."date" || ' ' || payload.."time" AS TIMESTAMP FORMAT 'YYYY-MM-DDBHH:MI:SSDS(F)') sensdatetime
FROM sample_csv_ft);
SELECT TOP 2 *
FROM sample_csv_view;
NOS on JSON Files
Examine the schema of the data (specify one file, assuming all files in the bucket are formatted the same way):
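For JSON files, a sketch of such a schema check can use the NOSREAD_SCHEMA return type, which infers column names and types from the data (the location below is hypothetical):

```sql
-- Infer column names and types from a JSON object in the bucket.
SELECT * FROM READ_NOS (
  USING
    LOCATION ('/s3/s3.amazonaws.com/your-bucket/sample-json/')  -- hypothetical
    RETURNTYPE ('NOSREAD_SCHEMA')
) AS d;
```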
Accessing Parquet Data Stored on Amazon S3 with
CREATE FOREIGN TABLE
Let's take a look at some of the data in the Parquet bucket:
SELECT location(CHAR(255)), ObjectLength
FROM READ_NOS (
  ON (SELECT CAST(NULL AS DATASET INLINE LENGTH 64000 STORAGE FORMAT CSV))
  USING
    LOCATION ('/s3/s3.amazonaws.com/trial-datasets/SalesOffload')
    RETURNTYPE ('NOSREAD_KEYS')
) AS d
ORDER BY 1;
Let's take a look at one of the files to get a better understanding of the file format:
SELECT * FROM READ_NOS (
  USING
    LOCATION ('/s3/s3.amazonaws.com/trial-datasets/SalesOffload/2010/1/object_33_0_1.parquet')
    RETURNTYPE ('NOSREAD_PARQUET_SCHEMA')
    FULLSCAN ('TRUE')
) AS d;
We want the data to look like a native table, so let's put a view on top:
REPLACE VIEW sales_fact_offload_v AS (
SELECT
  sales_date,
  customer_id,
  store_id,
  basket_id,
  product_id,
  sales_quantity,
  discount_amount
FROM sample_parquet);

SELECT TOP 10 *
FROM sales_fact_offload_v;
Let's optimize the foreign table and view for efficient access:
DROP TABLE sales_fact_offload;
We have re-defined our foreign table to include a PATHPATTERN clause. When looking at
historical data by date, this allows us to read only the files we need!
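The re-definition itself does not appear above; the following is a minimal sketch of what it might look like. The column names come from the view below, but the column types, the path-variable names, and the trailing table options are assumptions:

```sql
-- Re-create the foreign table with a PATHPATTERN so that the year/month
-- components of each object's path can be used to prune which files are read.
-- Column types here are assumptions inferred from the view definitions.
CREATE FOREIGN TABLE sales_fact_offload (
  Location VARCHAR(2048) CHARACTER SET UNICODE CASESPECIFIC,
  sales_date DATE,
  customer_id INTEGER,
  store_id INTEGER,
  basket_id BIGINT,
  product_id INTEGER,
  sales_quantity INTEGER,
  discount_amount DECIMAL(9,2)
)
USING (
  LOCATION ('/s3/s3.amazonaws.com/trial-datasets/SalesOffload')
  PATHPATTERN ('$dir/$year/$month/$file')
  STOREDAS ('PARQUET')
)
NO PRIMARY INDEX
PARTITION BY COLUMN;
```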
Now let's re-create our user-friendly view that allows for this path filtering…
REPLACE VIEW sales_fact_offload_v AS (
SELECT
  CAST($path.$year AS CHAR(4)) sales_year,
  CAST($path.$month AS CHAR(2)) sales_month,
  sales_date,
  customer_id,
  store_id,
  basket_id,
  product_id,
  sales_quantity,
  discount_amount
FROM sales_fact_offload);
SELECT TOP 10 *
FROM sales_fact_offload_v
WHERE sales_year = '2010'
AND sales_month = '9';
Clean-up
Drop the objects we created in our own database schema.
DROP AUTHORIZATION InvAuth;
DROP TABLE sample_csv;
DROP VIEW sample_csv_view;
DROP TABLE sample_csv_ft;
DROP CSV SCHEMA sample_csv_schema;
DROP TABLE sample_json;
DROP TABLE sample_parquet;