CONTENTS
Introduction
Installing and Starting Up
First SQL Queries
INTRODUCTION
DRILL_ARGS - -u jdbc:drill:zk=local
Calculating Drill classpath...
oct 26, 2015 9:33:46 AM org.glassfish.jersey.server.ApplicationHandler initialize
INFO: Initiating Jersey application, version Jersey: 2.8 2014-04-29 01:25:26...
apache drill 1.2.0
"json ain't no thang"
0: jdbc:drill:zk=local>
Drill is now ready to query data. To check if it really works, use the
following query:
VALUES(CURRENT_DATE);
Result:
+---------------+
|    columns    |
+---------------+
| [HelloWorld!] |
+---------------+
SQL SYNTAX FOR APACHE DRILL
DZONE, INC.
DZONE.COM
The unusual part of this query is its FROM clause. Instead of a table
name, it contains a reference to the file to be accessed. The term dfs
identifies one of the supported storage plugins; this particular plugin
indicates that a file in the local file system is accessed. The storage
plugin is followed by a file specification consisting of the directory
and the file name. Note that the file specification must be enclosed in
backticks, not single quotes. The file contains only one line of data,
so only one row is returned. Because no column names are specified, the
column name columns is used.
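As a rough illustration of the behavior described above, the following Python sketch models how each line of a text file without column names ends up as one row with a single array-valued column called columns. It is not Drill code: the function name read_text_rows and the comma delimiter are assumptions made for the example.

```python
def read_text_rows(lines, delimiter=","):
    """Model each line of a header-less text file as one row whose only
    column, named "columns", holds the list of delimited fields.
    The delimiter is an assumption for this sketch."""
    rows = []
    for line in lines:
        # Strip the line ending, then split the line into its fields.
        fields = line.rstrip("\r\n").split(delimiter)
        rows.append({"columns": fields})
    return rows


# A file with a single line yields a single row.
rows = read_text_rows(["HelloWorld!\n"])
print(rows)  # [{'columns': ['HelloWorld!']}]
```

A line with several delimited fields would produce a longer list inside the same single column.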
Result:
+------+-----------+
| enr  | lastname  |
+------+-----------+
| 6    | Manzarek  |
| 8    | Young     |
| 15   | Metheny   |
+------+-----------+
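{ "number": 6, "name": "Manzarek", "initials": "R", "street": "Haseltine Lane", "town": "Phoenix" }
{ "number": 8, "name": "Young", "initials": "N", "street": "Brownstreet", "mobile": 1234567 }
{ "number": 15, "name": "Metheny", "initials": "M", "province": "South" }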
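When the records in a JSON file do not all carry the same keys, Drill presents them as one flat table and shows null where a record lacks a value. The Python sketch below models that behavior for the three employee records. It is an illustration only, not Drill's implementation, and the function name to_flat_table is hypothetical.

```python
import json


def to_flat_table(json_lines):
    """Turn one-JSON-object-per-line input into (columns, rows).
    Columns are collected in order of first appearance; a value that a
    record does not have becomes None (shown as null in a result)."""
    records = [json.loads(line) for line in json_lines]
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    rows = [[record.get(column) for column in columns] for record in records]
    return columns, rows


employees = [
    '{"number": 6, "name": "Manzarek", "initials": "R", '
    '"street": "Haseltine Lane", "town": "Phoenix"}',
    '{"number": 8, "name": "Young", "initials": "N", '
    '"street": "Brownstreet", "mobile": 1234567}',
    '{"number": 15, "name": "Metheny", "initials": "M", "province": "South"}',
]

columns, rows = to_flat_table(employees)
print(columns)
for row in rows:
    print(row)
```

The third row comes out with None for street, town, and mobile, which corresponds to the null values Drill reports for employee 15.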
The JSON data structure is turned into a flat table, and for the missing
values in specific columns, the null value is presented. Result:
+---------+-----------+-----------+-----------------+----------+----------+-----------+
| number  | name      | initials  | street          | town     | mobile   | province  |
+---------+-----------+-----------+-----------------+----------+----------+-----------+
| 6       | Manzarek  | R         | Haseltine Lane  | Phoenix  | null     | null      |
| 8       | Young     | N         | Brownstreet     | null     | 1234567  | null      |
| 15      | Metheny   | M         | null            | null     | null     | South     |
+---------+-----------+-----------+-----------------+----------+----------+-----------+
QUERYING NESTED DATA STRUCTURES
Many data sources, such as MongoDB, Hadoop with AVRO, and JSON
files, contain nested data structures. In relational terminology they
would be called columns within columns. To address these nested
columns, a specific syntax is introduced. Several examples are used to
illustrate this syntax.
QUERYING ARRAYS
Create a file with the following content (note that each employee can
work for several projects) and call it EmployeesArrays.json:
{ "number": 8, "projects": [ "ACP3", "FGTR" ] }
{ "number": 15, "projects": [ "ACP3", "HHGT", "X456" ] }
Result:
+---------+-------------------------+
| number  | projects                |
+---------+-------------------------+
| 8       | [ACP3,FGTR]             |
| 15      | [ACP3,HHGT,X456]        |
+---------+-------------------------+
Manzarek,
R },
Haseltine Lane,
80,
1234KK,
Stratford } }
In each row (so for each employee), the projects column contains a set of
project values. To see them as separate values, use the FLATTEN function:
SELECT FLATTEN(projects) AS project, number AS enr
FROM dfs.`\MyDirectory\EmployeesArrays.json`;
In the result, each row contains a separate project value and the
number of the employee to whom the project value belongs:
Young,
N },
Brownstreet,
80,
ZH,
Boston } }
+----------+------+
| project  | enr  |
+----------+------+
| ACP3     | 8    |
| FGTR     | 8    |
| ACP3     | 15   |
| HHGT     | 15   |
| X456     | 15   |
+----------+------+
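To make the effect of FLATTEN concrete, here is a minimal Python sketch of the same transformation: every element of an array column becomes its own row, and the other columns are repeated. This is only a model of the semantics; the helper name flatten and its parameters are hypothetical, and the two records mirror the EmployeesArrays.json data.

```python
def flatten(rows, array_column, alias):
    """Emit one output row per element of rows[i][array_column],
    copying the remaining columns and storing the element under alias."""
    flattened = []
    for row in rows:
        for element in row[array_column]:
            # Copy every column except the array itself.
            new_row = {k: v for k, v in row.items() if k != array_column}
            new_row[alias] = element
            flattened.append(new_row)
    return flattened


employees = [
    {"number": 8, "projects": ["ACP3", "FGTR"]},
    {"number": 15, "projects": ["ACP3", "HHGT", "X456"]},
]

flat = flatten(employees, "projects", "project")
for row in flat:
    print(row["project"], row["number"])
```

Two input rows with two and three projects produce five output rows, one per project.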
Metheny,
M,
45 } }
Result:
+--------+------------------------------+
| owner  | cars                         |
+--------+------------------------------+
| 1      | {key:Ford,value:3}           |
| 1      | {key:BMW,value:2}            |
| 1      | {key:Ferrari,value:1}        |
| 2      | {key:BMW,value:4}            |
| 2      | {key:GM,value:5}             |
+--------+------------------------------+
Result:
+------+-------------------+---------------+
| enr  | NumberOfProjects  | ContainsFGTR  |
+------+-------------------+---------------+
| 8    | 2                 | true          |
| 15   | 3                 | false         |
+------+-------------------+---------------+
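The two derived columns in this result correspond to the REPEATED_COUNT and REPEATED_CONTAINS functions listed later in this card. The query that produced the result is not reproduced here, so the following Python sketch is only a simplified model: it assumes REPEATED_COUNT counts the array elements and REPEATED_CONTAINS tests for an exact match.

```python
def repeated_count(values):
    """Number of elements in an array column."""
    return len(values)


def repeated_contains(values, wanted):
    """True if the array holds the wanted value (exact match only
    in this sketch)."""
    return wanted in values


employees = [
    {"number": 8, "projects": ["ACP3", "FGTR"]},
    {"number": 15, "projects": ["ACP3", "HHGT", "X456"]},
]

summary = [
    {
        "enr": e["number"],
        "NumberOfProjects": repeated_count(e["projects"]),
        "ContainsFGTR": repeated_contains(e["projects"], "FGTR"),
    }
    for e in employees
]
for row in summary:
    print(row)
```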
Result:
+---------+---------+---------+---------+
| number  | EXPR$1  | EXPR$2  | EXPR$3  |
+---------+---------+---------+---------+
| 8       | ACP3    | FGTR    | null    |
| 15      | ACP3    | HHGT    | X456    |
+---------+---------+---------+---------+
Apache Drill supports all the query features to be expected from a SQL
product. The next example shows how the special functions can be
combined with more traditional joins and window functions. Create a
file with the following content and call it EmployeesProjects.json.
Each line in this file indicates how many hours an employee has
worked on a project on a specific day.
{"enr": 8, "project": "ACP3", "date": "2015-10-01", "hours": 4}
{"enr": 8, "project": "ACP3", "date": "2015-10-04", "hours": 5}
{"enr": 8, "project": "FGTR", "date": "2015-10-02", "hours": 2}
{"enr": 15, "project": "ACP3", "date": "2015-10-01", "hours": 7}
{"enr": 15, "project": "ACP3", "date": "2015-10-03", "hours": 5}
{"enr": 15, "project": "HHGT", "date": "2015-10-01", "hours": 4}
{"enr": 15, "project": "HHGT", "date": "2015-10-05", "hours": 2}
{"enr": 15, "project": "HHGT", "date": "2015-10-07", "hours": 8}
{"enr": 15, "project": "X456", "date": "2015-10-01", "hours": 6}
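The query for this example is not reproduced here, so the following Python sketch is only a guess at its logic, inferred from the result table (columns pdate, enr, ename, project, hours, sum_hours): an outer join from the employees to the hours data, plus a windowed SUM of hours per day and employee. The employee names come from the earlier examples; the function name hours_report is hypothetical.

```python
from collections import defaultdict

employees = {6: "Manzarek", 8: "Young", 15: "Metheny"}

hours_log = [
    {"enr": 8, "project": "ACP3", "date": "2015-10-01", "hours": 4},
    {"enr": 8, "project": "ACP3", "date": "2015-10-04", "hours": 5},
    {"enr": 8, "project": "FGTR", "date": "2015-10-02", "hours": 2},
    {"enr": 15, "project": "ACP3", "date": "2015-10-01", "hours": 7},
    {"enr": 15, "project": "ACP3", "date": "2015-10-03", "hours": 5},
    {"enr": 15, "project": "HHGT", "date": "2015-10-01", "hours": 4},
    {"enr": 15, "project": "HHGT", "date": "2015-10-05", "hours": 2},
    {"enr": 15, "project": "HHGT", "date": "2015-10-07", "hours": 8},
    {"enr": 15, "project": "X456", "date": "2015-10-01", "hours": 6},
]


def hours_report(employees, hours_log):
    # Windowed SUM: total hours per (date, employee) partition.
    totals = defaultdict(int)
    for entry in hours_log:
        totals[(entry["date"], entry["enr"])] += entry["hours"]

    report = []
    seen = set()
    for entry in hours_log:
        seen.add(entry["enr"])
        report.append({
            "pdate": entry["date"],
            "enr": entry["enr"],
            "ename": employees[entry["enr"]],
            "project": entry["project"],
            "hours": entry["hours"],
            "sum_hours": totals[(entry["date"], entry["enr"])],
        })
    # Outer join: employees without logged hours still appear, with None.
    for enr, ename in employees.items():
        if enr not in seen:
            report.append({"pdate": None, "enr": enr, "ename": ename,
                           "project": None, "hours": None, "sum_hours": None})
    return report


report = hours_report(employees, hours_log)
for row in report:
    print(row)
```

Every detail row keeps its own hours value while sum_hours repeats the partition total, which is what distinguishes a window function from a GROUP BY aggregate. The sketch does not sort its output.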
Result:
+-------------+------+-----------+----------+--------+------------+
| pdate       | enr  | ename     | project  | hours  | sum_hours  |
+-------------+------+-----------+----------+--------+------------+
| 2015-10-01  | 15   | Metheny   | ACP3     | 7      | 17         |
| 2015-10-01  | 15   | Metheny   | HHGT     | 4      | 17         |
| 2015-10-01  | 15   | Metheny   | X456     | 6      | 17         |
| 2015-10-01  | 8    | Young     | ACP3     | 4      | 4          |
| 2015-10-02  | 8    | Young     | FGTR     | 2      | 2          |
| 2015-10-03  | 15   | Metheny   | ACP3     | 5      | 5          |
| 2015-10-04  | 8    | Young     | ACP3     | 5      | 5          |
| 2015-10-05  | 15   | Metheny   | HHGT     | 2      | 2          |
| 2015-10-07  | 15   | Metheny   | HHGT     | 8      | 8          |
| null        | 6    | Manzarek  | null     | null   | null       |
+-------------+------+-----------+----------+--------+------------+
QUERYING MAPS WITH DATA
{ "owner": 1,
  "cars": { "Ford": 3,
            "BMW": 2,
            "Ferrari": 1 }
}
{ "owner": 2,
  "cars": { "BMW": 4,
            "GM": 5 }
}
Result:
+---------------------------------------------------------------+
| cars                                                          |
+---------------------------------------------------------------+
| [{key:Ford,value:3},{key:BMW,value:2},{key:Ferrari,value:1}]  |
| [{key:BMW,value:4},{key:GM,value:5}]                          |
+---------------------------------------------------------------+
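The result above pairs every key of the cars map with its value. A minimal Python sketch of that reshaping, and of what happens when the resulting array is flattened afterwards, might look as follows. It models the semantics only; the helper name kvgen simply mirrors the KVGEN function, and the data is the two owner records shown for this example.

```python
def kvgen(mapping):
    """Turn a map into a list of {key, value} pairs, one per entry."""
    return [{"key": key, "value": value} for key, value in mapping.items()]


owners = [
    {"owner": 1, "cars": {"Ford": 3, "BMW": 2, "Ferrari": 1}},
    {"owner": 2, "cars": {"BMW": 4, "GM": 5}},
]

# Step 1: each map becomes an array of key/value pairs.
as_arrays = [kvgen(o["cars"]) for o in owners]
print(as_arrays[1])  # [{'key': 'BMW', 'value': 4}, {'key': 'GM', 'value': 5}]

# Step 2: flattening that array gives one row per owner and car brand.
flat = [
    {"owner": o["owner"], "cars": pair}
    for o in owners
    for pair in kvgen(o["cars"])
]
for row in flat:
    print(row)
```

Because the keys (the car brands) become data values, brands that are not known in advance can be queried like any other column.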
The effect of the kvgen function is that the car data inside the cars
QUERY STATEMENT
SELECT
VALUES
EXPLAIN

DEFINITIONS OF SQL STATEMENTS FOR WORKING WITH SCHEMAS AND SESSIONS
SQL STATEMENT      DEFINITION
CREATE TABLE
CREATE VIEW
DROP TABLE
DROP VIEW
DESCRIBE
SHOW DATABASES     SHOW DATABASES
SHOW SCHEMAS
SHOW FILES         SHOW FILES
                   [ FROM <filesystem> . <directory name> |
                     IN <filesystem> . <directory name> ]
SHOW TABLES        SHOW TABLES
USE                USE <schema name>
ALTER SESSION
ALTER SYSTEM

DATA TYPES
The following data types are supported by Drill and can be used when
converting the data types of values.
DATA TYPE                     DESCRIPTION
BIGINT                        8-byte signed integer in the range
                              -9,223,372,036,854,775,808 to
                              9,223,372,036,854,775,807.
                              Example: 9223372036854775807
BINARY                        Example: B@e6d9eb7
BOOLEAN                       True or false.
                              Example: true
DATE                          Years, months, and days in YYYY-MM-DD format
                              since 4713 BC.
                              Example: 2015-12-30
DECIMAL(p,s), DEC(p,s),       38-digit precision number (precision is p,
or NUMERIC(p,s)*              and scale is s). For example, DECIMAL(6,2) is
                              1234.56 (4 digits before and 2 digits after
                              the decimal point).
FLOAT                         4-byte floating point number.
                              Example: 0.456
DOUBLE or DOUBLE PRECISION    8-byte floating point number,
                              precision-scalable.
                              Example: 0.456
INTEGER or INT                Example: 2147483646
INTERVAL                      A day-time or year-month interval.
SMALLINT                      This data type is not supported in version 1.2.
                              Example: 32000
TIME                          Example: 22:55:55.23
TIMESTAMP                     Example: 2015-12-30 22:55:55.23
CHARACTER VARYING,
CHARACTER, CHAR, or VARCHAR

The processing logic of all the scalar functions can easily be tested
by using the VALUES statement. For example, with the following
statement the CBRT function can be tested:
VALUES(CBRT(64));
Result:
+---------+
| EXPR$0  |
+---------+
| 4.0     |
+---------+

DATE/TIME FUNCTION        DATA TYPE OF RESULT
AGE(x [, y ] )            INTERVALDAY or INTERVALYEAR
CURRENT_DATE              DATE
CURRENT_TIME              TIME
CURRENT_TIMESTAMP         TIMESTAMP
DATE_ADD(x,y)             DATE, TIMESTAMP
DATE_PART(x,y)            DOUBLE
DATE_SUB(x,y)             DATE, TIMESTAMP
EXTRACT(x FROM y)         DOUBLE
LOCALTIME                 TIME
LOCALTIMESTAMP            TIMESTAMP
NOW()                     TIMESTAMP
TIMEOFDAY()               VARCHAR
UNIX_TIMESTAMP( [x] )     BIGINT

NUMERIC FUNCTION          DATA TYPE OF RESULT    DEFINITION
ABS(x)                    Data type of x
CBRT(x)                   FLOAT8
CEIL(x) or CEILING(x)     Data type of x
DEGREES(x)                FLOAT8
E()                       FLOAT8                 Returns 2.718281828459045.
EXP(x)                    FLOAT8
FLOOR(x)                  Data type of x
LOG(x)                    FLOAT8
LOG(x, y)                 FLOAT8
LOG10(x)                  FLOAT8
LSHIFT(x, y)              Data type of x
MOD(x, y)                 FLOAT8
NEGATIVE(x)               Data type of x
PI                        FLOAT8                 Returns pi.
POW(x, y)                 FLOAT8
RADIANS                   FLOAT8
RAND                      FLOAT8
ROUND(x)                  Data type of x
ROUND(x, y)               DECIMAL
RSHIFT(x, y)              Data type of x
SIGN(x)                   INT
SQRT(x)                   Data type of x
TRUNC(x [ , y ] )         Data type of x
TRUNC(x, y)               DECIMAL

STRING FUNCTION           DATA TYPE OF RESULT
BYTE_SUBSTR(x,y [, z ] )  BINARY or VARCHAR
CHAR_LENGTH(x)            INTEGER
CONCAT(x,y)               VARCHAR
INITCAP(x)                VARCHAR
LENGTH(x)                 INTEGER
LOWER(x)                  VARCHAR
STRING FUNCTION           DATA TYPE OF RESULT
LPAD(x,y [ , z ] )        VARCHAR
LTRIM(x)                  VARCHAR
POSITION(x IN y)          INTEGER
REGEXP_REPLACE(x,y,z)     VARCHAR
RPAD(x,y,z)               VARCHAR
RTRIM(x)                  VARCHAR
STRPOS(x,y)               INTEGER
SUBSTR(x,y,z)             VARCHAR
TRIM(x)                   VARCHAR
UPPER(x)                  VARCHAR

DATA TYPE CONVERSION
FUNCTION                  DATA TYPE OF RESULT
CAST(x AS y)              Data type of y
CONVERT_TO(x,y)           Data type of y
CONVERT_FROM(x,y)         Data type of y

AGGREGATE FUNCTION        DATA TYPE OF RESULT
AVG(x)                    ... integer-type argument; DOUBLE for a
                          floating-point argument; otherwise the data
                          type of x
COUNT(*)                  BIGINT
COUNT([DISTINCT] x)       BIGINT
MAX(x)                    Data type of x
MIN(x)                    Data type of x
SUM(x)                    BIGINT for SMALLINT or INTEGER arguments;
                          DECIMAL for BIGINT arguments; DOUBLE for
                          floating-point arguments; otherwise the data
                          type of x

NULL HANDLING FUNCTION    DATA TYPE OF RESULT
COALESCE(x,y [ , y ]... ) Data type of y
NULLIF(x,y)               Data type of x

The syntax definition for all the window functions in the table below is
as follows:
WINDOW FUNCTION               DATA TYPE OF RESULT
COUNT( { * | x } ) OVER (y)   BIGINT
DENSE_RANK() OVER (y)         BIGINT
PERCENT_RANK() OVER (y)       DOUBLE PRECISION
RANK() OVER (y)               BIGINT
ROW_NUMBER() OVER (y)         BIGINT
FIRST_VALUE(x) OVER (y)       Data type of x
LAST_VALUE(x) OVER (y)        Data type of x

FUNCTION                  DATA TYPE OF RESULT
FLATTEN(x)                -
KVGEN(x)                  VARCHAR
REPEATED_CONTAINS(x,y)    BOOLEAN
REPEATED_COUNT(x)         INTEGER
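The ranking window functions named in this card (ROW_NUMBER, RANK, DENSE_RANK, and PERCENT_RANK) differ mainly in how they treat ties. The Python sketch below models them for a single partition ordered ascending, following standard SQL semantics. That Drill matches these semantics exactly is an assumption here, and both the function name rank_rows and the sample values are hypothetical.

```python
def rank_rows(values):
    """Compute row_number, rank, dense_rank and percent_rank for one
    partition, ordered ascending by value (standard SQL semantics)."""
    ordered = sorted(values)
    total = len(ordered)
    result = []
    rank = 0
    dense_rank = 0
    previous = object()  # sentinel that never equals a real value
    for position, value in enumerate(ordered, start=1):
        if value != previous:
            rank = position   # rank skips ahead after a group of ties
            dense_rank += 1   # dense_rank never leaves gaps
            previous = value
        percent_rank = 0.0 if total == 1 else (rank - 1) / (total - 1)
        result.append({
            "value": value,
            "row_number": position,
            "rank": rank,
            "dense_rank": dense_rank,
            "percent_rank": percent_rank,
        })
    return result


ranked = rank_rows([5, 2, 8, 5])
for row in ranked:
    print(row)
```

For the tied value 5, row_number still assigns distinct numbers (2 and 3), rank gives both rows 2 and then jumps to 4, and dense_rank gives both rows 2 and continues with 3.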
DZONE, INC.
150 PRESTON EXECUTIVE DR.
CARY, NC 27513
888.678.0399
919.678.0300
DZone communities deliver over 6 million pages each month to more than 3.3 million software
developers, architects and decision makers. DZone offers something for everyone, including news,
tutorials, cheat sheets, research guides, feature articles, source code and more.
REFCARDZ FEEDBACK WELCOME
refcardz@dzone.com
SPONSORSHIP OPPORTUNITIES
sales@dzone.com
Copyright 2015 DZone, Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or
transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission
of the publisher.
VERSION 1.0
$7.95