
6/1/23, 10:12 PRQL Language Book

Introduction
PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL
replacement. Like SQL, it’s readable, explicit and declarative. Unlike SQL, it forms a logical
pipeline of transformations, and supports abstractions such as variables and functions. It can
be used with any database that uses SQL, since it transpiles to SQL.

Let’s get started with an example:

PRQL

from employees
filter start_date > @2021-01-01               # Clear date syntax
derive [                                      # `derive` adds columns / variables
  gross_salary = salary + (tax ?? 0),         # Terse coalesce
  gross_cost = gross_salary + benefits_cost,  # Variables can use other variables
]
filter gross_cost > 0
group [title, country] (                      # `group` runs a pipeline over each group
  aggregate [                                 # `aggregate` reduces each group to a value
    average gross_salary,
    sum_gross_cost = sum gross_cost,          # `=` sets a column name
  ]
)
filter sum_gross_cost > 100000                # `filter` replaces both of SQL's `WHERE` & `HAVING`
derive id = f"{title}_{country}"              # F-strings like python
derive country_code = s"LEFT(country, 2)"     # S-strings allow using SQL as an escape hatch
sort [sum_gross_cost, -country]               # `-country` means descending order
take 1..20                                    # Range expressions (also valid here as `take 20`)

SQL

WITH table_1 AS (
  SELECT
    title,
    country,
    salary + COALESCE(tax, 0) + benefits_cost AS _expr_0,
    salary + COALESCE(tax, 0) AS _expr_1
  FROM
    employees
  WHERE
    start_date > DATE '2021-01-01'
)
SELECT
  title,
  country,
  AVG(_expr_1),
  SUM(_expr_0) AS sum_gross_cost,
  CONCAT(title, '_', country) AS id,
  LEFT(country, 2) AS country_code
FROM
  table_1
WHERE
  _expr_0 > 0
GROUP BY
  title,
  country
HAVING
  SUM(_expr_0) > 100000
ORDER BY
  sum_gross_cost,
  country DESC
LIMIT
  20

https://prql-lang.org/book/print.html 1/90

As you can see, PRQL is a linear pipeline of transformations — each line of the query is a
transformation of the previous line’s result.

You can see that in SQL, operations do not follow one another, which makes it hard to
compose larger queries.


Pipelines

The simplest pipeline


The simplest pipeline is just:

PRQL

from employees

SQL

SELECT
  *
FROM
  employees

Adding transformations
We can add additional lines; each one transforms the result:

PRQL

from employees
derive gross_salary = (salary + payroll_tax)

SQL

SELECT
  *,
  salary + payroll_tax AS gross_salary
FROM
  employees

…and so on:

from employees
derive gross_salary = (salary + payroll_tax)
sort gross_salary

Compiling to SQL
When compiling to SQL, the PRQL compiler will try to represent as many transforms as possible
with a single SELECT statement. When necessary it will “overflow” using CTEs (common table
expressions):

from e = employees
derive gross_salary = (salary + payroll_tax)
sort gross_salary
take 10
join d = department [==dept_no]
select [e.name, gross_salary, d.name]

See also
Syntax


Functions
Functions are a fundamental abstraction in PRQL — they allow us to run code in many places
that we’ve written once. This reduces the number of errors in our code, makes our code more
readable, and simplifies making changes.

Functions have two types of parameters:

1. Positional parameters, which require an argument.
2. Named parameters, which optionally take an argument, otherwise using their default
   value.

So this function is named fahrenheit_to_celsius and has one parameter temp :

PRQL

func fahrenheit_to_celsius temp -> (temp - 32) / 1.8

from cities
derive temp_c = (fahrenheit_to_celsius temp_f)

SQL

SELECT
  *,
  (temp_f - 32) / 1.8 AS temp_c
FROM
  cities

This function is named interp , and has two positional parameters named higher and x , and
one named parameter named lower which takes a default argument of 0 . It calculates the
proportion of the distance that x is between lower and higher .

PRQL

func interp lower:0 higher x -> (x - lower) / (higher - lower)

from students
derive [
  sat_proportion_1 = (interp 1600 sat_score),
  sat_proportion_2 = (interp lower:0 1600 sat_score),
]

SQL

SELECT
  *,
  (sat_score - 0) / 1600 AS sat_proportion_1,
  (sat_score - 0) / 1600 AS sat_proportion_2
FROM
  students
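
To make the arithmetic concrete, here is a minimal Python sketch of the same calculation. It is only an illustration of what interp computes, not how the PRQL compiler evaluates it; the Python function and the sample score of 1200 are assumptions made for this example.

```python
def interp(higher, x, lower=0):
    """Return how far x lies between lower and higher, as a proportion."""
    return (x - lower) / (higher - lower)


# Positional arguments only: lower falls back to its default of 0.
sat_proportion_1 = interp(1600, 1200)
# The same call with the named argument spelled out.
sat_proportion_2 = interp(1600, 1200, lower=0)
print(sat_proportion_1, sat_proportion_2)
```

Both calls give the same proportion, which mirrors the two identical SQL expressions above.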


Piping
Consistent with the principles of PRQL, it’s possible to pipe values into functions, which makes
composing many functions more readable. When piping a value into a function, the value is
passed as an argument to the final positional parameter of the function. Here’s the same result
as the examples above with an alternative construction:

PRQL

func interp lower:0 higher x -> (x - lower) / (higher - lower)

from students
derive [
  sat_proportion_1 = (sat_score | interp 1600),
  sat_proportion_2 = (sat_score | interp lower:0 1600),
]

SQL

SELECT
  *,
  (sat_score - 0) / 1600 AS sat_proportion_1,
  (sat_score - 0) / 1600 AS sat_proportion_2
FROM
  students

and

PRQL

func fahrenheit_to_celsius temp -> (temp - 32) / 1.8

from cities
derive temp_c = (temp_f | fahrenheit_to_celsius)

SQL

SELECT
  *,
  (temp_f - 32) / 1.8 AS temp_c
FROM
  cities

We can combine a chain of functions, which makes logic more readable:

PRQL

func fahrenheit_to_celsius temp -> (temp - 32) / 1.8
func interp lower:0 higher x -> (x - lower) / (higher - lower)

from kettles
derive boiling_proportion = (temp_c | fahrenheit_to_celsius | interp 100)

SQL

SELECT
  *,
  ((temp_c - 32) / 1.8 - 0) / 100 AS boiling_proportion
FROM
  kettles
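
The piping rule (the piped value becomes the final positional argument) can be sketched in Python as ordinary left-to-right function application. The pipe helper and the input value 212 are hypothetical, introduced only for this illustration; PRQL resolves pipes at compile time rather than calling functions like this.

```python
from functools import reduce
import math


def fahrenheit_to_celsius(temp):
    return (temp - 32) / 1.8


def interp(higher, x, lower=0):
    return (x - lower) / (higher - lower)


def pipe(value, *steps):
    """Feed value through each step in order, left to right."""
    return reduce(lambda acc, step: step(acc), steps, value)


# Counterpart of `temp | fahrenheit_to_celsius | interp 100`:
# the piped value fills the last positional parameter (x) of interp.
boiling_proportion = pipe(
    212,
    fahrenheit_to_celsius,
    lambda x: interp(100, x),
)

# The same computation written as nested calls.
nested = interp(100, fahrenheit_to_celsius(212))
```

The piped and nested forms compute the same value; the piped form simply reads in the order the steps are applied.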


Roadmap

Late binding

Currently, functions require a binding to variables in scope; they can’t late-bind to column
names; so for example:

func return price -> (price - dividend) / price_yesterday

…isn’t yet a valid function, and instead would need to be:

func return price dividend price_yesterday -> (price - dividend) / (price_yesterday)

(which makes functions not useful in this case)


Tables
We can create a table — similar to a CTE in SQL — with table :

PRQL

table top_50 = (
  from employees
  sort salary
  take 50
  aggregate [total_salary = sum salary]
)

from top_50      # Starts a new pipeline

SQL

WITH table_0 AS (
  SELECT
    salary
  FROM
    employees
  ORDER BY
    salary
  LIMIT
    50
), top_50 AS (
  SELECT
    SUM(salary) AS total_salary
  FROM
    table_0
)
SELECT
  total_salary
FROM
  top_50

Note

The table expression requires surrounding parentheses. Without parentheses, the compiler
wouldn’t be able to evaluate where the expression stopped and the main pipeline started.

We can even place a whole CTE in an s-string, enabling us to use features which PRQL doesn’t
yet support.


PRQL

table grouping = s"""
  SELECT SUM(a)
  FROM tbl
  GROUP BY
    GROUPING SETS
    ((b, c, d), (d), (b, d))
"""

from grouping

SQL

WITH table_0 AS (
  SELECT
    SUM(a)
  FROM
    tbl
  GROUP BY
    GROUPING SETS ((b, c, d), (d), (b, d))
),
grouping AS (
  SELECT
    *
  FROM
    table_0 AS table_1
)
SELECT
  *
FROM
  grouping

Info

In PRQL table s are far less common than CTEs are in SQL, since a linear series of CTEs can
be represented with a single pipeline.

Syntax    Usage                      Example

|         Pipelines                  from employees | select first_name
=         Assigns & Aliases          from e = employees
                                     derive total = (sum salary)
:         Named args & Parameters    interp lower:0 1600 sat_score
[]        Lists                      select [id, amount]
()        Precedence                 derive celsius = (fahrenheit - 32) / 1.8
'' & ""   Strings                    derive name = 'Mary'
` `       Quoted identifiers         select `first name`
#         Comments                   # A comment
@         Dates & Times              @2021-01-01
==        Equality                   filter [a == b, c != d, e > f]
==        Self-equality in join      join s=salaries [==id]
->        Function definitions       func add a b -> a + b
+/-       Sort order                 sort [-amount, +date]
??        Coalesce                   amount ?? 0

Pipes
Pipes — the connection between transforms that make up a pipeline — can be either line
breaks or a pipe character ( | ).

In almost all situations, line-breaks pipe the result of a line’s transform into the transform on
the following line. For example, the filter transform operates on the result of from
employees (which is just the employees table), and the select transform operates on the
result of the filter transform.

PRQL

from employees
filter department == "Product"
select [first_name, last_name]

SQL

SELECT
  first_name,
  last_name
FROM
  employees
WHERE
  department = 'Product'

In the place of a line-break, it’s also possible to use the | character to pipe results, such that
this is equivalent:

PRQL

from employees | filter department == "Product" | select [first_name, last_name]

SQL

SELECT
  first_name,
  last_name
FROM
  employees
WHERE
  department = 'Product'


A line-break doesn’t create a pipeline in a couple of cases:

within a list (e.g. the derive examples below),


when the following line is a new statement, which starts with a keyword of func , table
or from .

Lists
Lists are represented with [] , and can span multiple lines. A final trailing comma is optional.

PRQL

from numbers
derive [x = 1, y = 2]
derive [
  a = x,
  b = y
]
derive [
  c = a,
  d = b,
]

SQL

SELECT
  *,
  1 AS x,
  2 AS y,
  1 AS a,
  2 AS b,
  1 AS c,
  2 AS d
FROM
  numbers

Most transforms can take either a list or a single item, so these are equivalent:

PRQL

from employees
select [first_name]

SQL

SELECT
  first_name
FROM
  employees

PRQL

from employees
select first_name

SQL

SELECT
  first_name
FROM
  employees


Parentheses
Parentheses —  () — are used to give precedence to inner expressions, as is the case in
almost all languages / math.

In particular, parentheses are used to nest pipelines for transforms such as group and
window , which take a pipeline. Here, the aggregate pipeline is applied to each group of
unique title and country values.

PRQL

from employees
group [title, country] (
  aggregate [
    average salary,
    ct = count
  ]
)

SQL

SELECT
  title,
  country,
  AVG(salary),
  COUNT(*) AS ct
FROM
  employees
GROUP BY
  title,
  country

Comments
Comments are represented by # . Currently only single line comments exist.

PRQL

from employees  # Comment 1
# Comment 2
aggregate [average salary]

SQL

SELECT
  AVG(salary)
FROM
  employees

Quoted identifiers
To use identifiers that are otherwise invalid, surround them with backticks. Depending on the
dialect, these will remain as backticks or be converted to double-quotes.


PRQL

prql target:sql.mysql
from employees
select `first name`

SQL

SELECT
  `first name`
FROM
  employees

PRQL

prql target:sql.postgres
from employees
select `first name`

SQL

SELECT
  "first name"
FROM
  employees

BigQuery also uses backticks to surround project & dataset names (even if valid identifiers) in
the SELECT statement:

PRQL

prql target:sql.bigquery
from `project-foo.dataset.table`
join `project-bar.dataset.table` [==col_bax]

SQL

SELECT
  `project-foo.dataset.table`.*,
  `project-bar.dataset.table`.*
FROM
  `project-foo.dataset.table`
  JOIN `project-bar.dataset.table` ON `project-foo.dataset.table`.col_bax = `project-bar.dataset.table`.col_bax

Parameters
PRQL will retain parameters like $1 in SQL output, which can then be supplied to the SQL
query:

PRQL

from employees
filter id == $1

SQL

SELECT
  *
FROM
  employees
WHERE
  id = $1


Query header: Target dialect & Version

Target dialect
PRQL allows specifying a target dialect at the top of the query, which allows PRQL to compile to
a database-specific SQL flavor.

Examples

PRQL

prql target:sql.postgres

from employees
sort age
take 10

SQL

SELECT
  *
FROM
  employees
ORDER BY
  age
LIMIT
  10

PRQL

prql target:sql.mssql

from employees
sort age
take 10

SQL

SELECT
  TOP (10) *
FROM
  employees
ORDER BY
  age

Supported dialects

Note

Note that dialect support is early — most differences are not implemented, and most
dialects’ implementations are identical to generic ’s. Contributions are very welcome.

sql.ansi


sql.bigquery
sql.clickhouse
sql.generic
sql.hive
sql.mssql
sql.mysql
sql.postgres
sql.sqlite
sql.snowflake

Version
PRQL allows specifying a version of the language in the PRQL header, like:

PRQL

prql version:"0.3"

from employees

SQL

SELECT
  *
FROM
  employees

This has two roles, one of which is implemented:

The compiler will raise an error if the compiler is older than the query version. This
prevents confusing errors when queries use newer features of the language but the
compiler hasn’t yet been upgraded.
The compiler will compile for the major version of the query. This allows the language to
evolve without breaking existing queries, or forcing multiple installations of the compiler.
This isn’t yet implemented, but is a gating feature for PRQL 1.0.


Transforms
PRQL queries are a pipeline of transformations (“transforms”), where each transform takes the
previous result and adjusts it in some way, before passing it on to the next transform.

Because PRQL focuses on modularity, we have far fewer transforms than SQL, each one
fulfilling a specific purpose. That’s often referred to as “orthogonality”.

These are the currently available transforms:

Transform   Purpose                                                                SQL Equivalent

from        Starts from a table                                                    FROM
derive      Computes new columns                                                   SELECT *, ... AS ...
select      Picks & computes columns                                               SELECT ... AS ...
filter      Picks rows based on their values                                       WHERE , HAVING , QUALIFY
sort        Orders rows based on the values of columns                             ORDER BY
join        Adds columns from another table, matching rows based on a condition    JOIN
take        Picks rows based on their position                                     TOP , LIMIT , OFFSET
group       Partitions rows into groups and applies a pipeline to each of them     GROUP BY , PARTITION BY
aggregate   Summarizes many rows into one row                                      SELECT foo(...)
window      Applies a pipeline to overlapping segments of rows                     OVER , ROWS , RANGE


Aggregate
Summarizes many rows into one row.

When applied:

without group , it produces one row from the whole table,


within a group pipeline, it produces one row from each group.

aggregate [{expression or assign operations}]

Note

Currently, all declared aggregation functions are min , max , count , average , stddev ,
avg , sum and count_distinct . We are in the process of filling out std lib.

Examples
PRQL

from employees
aggregate [
  average salary,
  ct = count
]

SQL

SELECT
  AVG(salary),
  COUNT(*) AS ct
FROM
  employees

PRQL

from employees
group [title, country] (
  aggregate [
    average salary,
    ct = count
  ]
)

SQL

SELECT
  title,
  country,
  AVG(salary),
  COUNT(*) AS ct
FROM
  employees
GROUP BY
  title,
  country


Aggregate is required
Unlike in SQL, using an aggregation function in derive or select (or any other transform
except aggregate ) will not trigger aggregation. By default, PRQL will interpret such
functions as window functions:

PRQL

from employees
derive [avg_sal = average salary]

SQL

SELECT
  *,
  AVG(salary) OVER () AS avg_sal
FROM
  employees

This ensures that derive does not manipulate the number of rows, but only ever adds a
column. For more information, see window transform.
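
The difference in row counts can be sketched in Python. This models only the shape of the results and is not how PRQL or the database computes them; the sample rows and the helper names aggregate_avg and derive_avg are assumptions for this example.

```python
rows = [
    {"name": "a", "salary": 100},
    {"name": "b", "salary": 200},
    {"name": "c", "salary": 300},
]


def average(values):
    values = list(values)
    return sum(values) / len(values)


def aggregate_avg(rows):
    """Like `aggregate [average salary]`: many rows in, one row out."""
    return [{"avg_sal": average(r["salary"] for r in rows)}]


def derive_avg(rows):
    """Like `derive [avg_sal = average salary]`: every row kept, one column added."""
    avg = average(r["salary"] for r in rows)
    return [{**r, "avg_sal": avg} for r in rows]


print(len(aggregate_avg(rows)), len(derive_avg(rows)))
```

The aggregate version collapses the table to a single row, while the derive version keeps all three rows and repeats the average on each, as AVG(salary) OVER () does.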


Derive
Computes one or more new columns.

derive [
{new_name} = {expression},
# or
{expression}
]

Examples
PRQL

from employees
derive gross_salary = salary + payroll_tax

SQL

SELECT
  *,
  salary + payroll_tax AS gross_salary
FROM
  employees

PRQL

from employees
derive [
  gross_salary = salary + payroll_tax,
  gross_cost = gross_salary + benefits_cost
]

SQL

SELECT
  *,
  salary + payroll_tax AS gross_salary,
  salary + payroll_tax + benefits_cost AS gross_cost
FROM
  employees


Filter
Picks rows based on their values.

filter {boolean_expression}

Examples
PRQL

from employees
filter age > 25

SQL

SELECT
  *
FROM
  employees
WHERE
  age > 25

PRQL

from employees
filter (age | in 25..40)

SQL

SELECT
  *
FROM
  employees
WHERE
  age BETWEEN 25
  AND 40


From
Specifies a data source.

from {table_reference}

Examples
PRQL

from employees

SQL

SELECT
  *
FROM
  employees

To introduce an alias, use an assign expression:

PRQL

from e = employees
select e.first_name

SQL

SELECT
  first_name
FROM
  employees AS e


Group
Partitions the rows into groups and applies a pipeline to each of the groups.

group [{key_columns}] {pipeline}

The partitioning of groups is determined by the key_column s (first argument).

The most conventional use of group is with aggregate :

PRQL

from employees
group [title, country] (
  aggregate [
    average salary,
    ct = count
  ]
)

SQL

SELECT
  title,
  country,
  AVG(salary),
  COUNT(*) AS ct
FROM
  employees
GROUP BY
  title,
  country

In concept, a transform in context of a group does the same transformation to the group as it
would to the table — for example finding the employee who joined first across the whole table:

PRQL

from employees
sort join_date
take 1

SQL

SELECT
  *
FROM
  employees
ORDER BY
  join_date
LIMIT
  1

To find the employee who joined first in each department, it’s exactly the same pipeline, but
within a group expression:


PRQL

from employees
group role (
  sort join_date  # taken from above
  take 1
)

SQL

WITH table_1 AS (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY role
      ORDER BY
        join_date
    ) AS _expr_0
  FROM
    employees
)
SELECT
  *
FROM
  table_1
WHERE
  _expr_0 <= 1


Join
Adds columns from another table, matching rows based on a condition.

join side:{inner|left|right|full} {table} {[conditions]}

Parameters
side decides which rows to include, defaulting to inner .
Table reference
List of conditions
The result of the join operation is a cartesian (cross) product of rows from both tables,
which is then filtered to match all of these conditions.
If a column name is the same in both tables, the condition can be expressed with only ==col .
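
The "cross product, then filter" description can be sketched in Python. This covers only the default inner side and is a conceptual model, not what a database actually executes; the sample tables and the inner_join helper are assumptions for this example.

```python
from itertools import product

employees = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bo"}]
positions = [
    {"employee_id": 1, "title": "Engineer"},
    {"employee_id": 3, "title": "Analyst"},
]


def inner_join(left, right, conditions):
    """Cross product of both tables, keeping pairs that satisfy every condition."""
    return [
        {**l, **r}
        for l, r in product(left, right)
        if all(cond(l, r) for cond in conditions)
    ]


# Counterpart of `join positions [employees.id==positions.employee_id]`.
joined = inner_join(
    employees,
    positions,
    [lambda e, p: e["id"] == p["employee_id"]],
)
print(joined)
```

Only the pair with matching ids survives the filter; rows without a match on either side are dropped, which is what side:left, side:right and side:full would change.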

Examples
PRQL

from employees
join side:left positions [employees.id==positions.employee_id]

SQL

SELECT
  employees.*,
  positions.*
FROM
  employees
  LEFT JOIN positions ON employees.id = positions.employee_id

PRQL

from employees
join side:left p=positions [employees.id==p.employee_id]

SQL

SELECT
  employees.*,
  p.*
FROM
  employees
  LEFT JOIN positions AS p ON employees.id = p.employee_id


Self equality operator


If the join conditions are of the form left.x == right.x , we can use the “self equality operator”:

PRQL

from employees
join positions [==emp_no]

SQL

SELECT
  employees.*,
  positions.*
FROM
  employees
  JOIN positions ON employees.emp_no = positions.emp_no


Select
Picks and computes columns.

select [
{new_name} = {expression},
# or
{expression}
]

Examples
PRQL

from employees
select name = f"{first_name} {last_name}"

SQL

SELECT
  CONCAT(first_name, ' ', last_name) AS name
FROM
  employees

PRQL

from employees
select [
  name = f"{first_name} {last_name}",
  age_eoy = dob - @2022-12-31,
]

SQL

SELECT
  CONCAT(first_name, ' ', last_name) AS name,
  dob - DATE '2022-12-31' AS age_eoy
FROM
  employees

PRQL

from employees
select first_name

SQL

SELECT
  first_name
FROM
  employees

PRQL

from e=employees
select [e.first_name, e.last_name]

SQL

SELECT
  first_name,
  last_name
FROM
  employees AS e


Note

In the final example above, the e representing the table / namespace is no longer available
after the select statement. For example, this would raise an error:

from e=employees
select e.first_name
filter e.first_name == "Fred" # Can't find `e.first_name`

To refer to the e.first_name column in subsequent transforms, either refer to it using
first_name , or if it requires a different name, assign one in the select statement:

PRQL

from e=employees
select fname = e.first_name
filter fname == "Fred"

SQL

WITH table_1 AS (
  SELECT
    first_name AS fname
  FROM
    employees AS e
)
SELECT
  fname
FROM
  table_1
WHERE
  fname = 'Fred'


Concat & Union


Note

concat & union are currently experimental and may have bugs; please report any as
GitHub Issues.

Concat
concat concatenates two tables together, like UNION ALL in SQL. The number of rows is
always the sum of the number of rows from the two input tables.

PRQL

from employees_1
concat employees_2

SQL

(
  SELECT
    *
  FROM
    employees_1
)
UNION
ALL
SELECT
  *
FROM
  employees_2

Union
union takes the union of rows, where duplicates are discarded (using the definition of union
from set logic), like UNION DISTINCT in SQL. If all rows are different between the tables, this is
synonymous with concat ; if there are duplicate rows it will produce fewer rows.


PRQL

from employees_1
union employees_2

SQL

(
  SELECT
    *
  FROM
    employees_1
)
UNION
DISTINCT
SELECT
  *
FROM
  employees_2
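
The difference between the two transforms can be sketched in Python, treating each table as a list of row tuples. This is a model of the row-count behaviour only; the sample rows and the helper functions are assumptions for this example.

```python
employees_1 = [("Ann", "eng"), ("Bo", "ops")]
employees_2 = [("Bo", "ops"), ("Cy", "eng")]


def concat(a, b):
    """Like UNION ALL: every row from both tables is kept."""
    return a + b


def union(a, b):
    """Like UNION DISTINCT: duplicate rows are discarded."""
    seen = set()
    result = []
    for row in a + b:
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


print(len(concat(employees_1, employees_2)), len(union(employees_1, employees_2)))
```

With one row shared between the tables, concat returns four rows and union returns three; if no rows were shared, both would return the same rows.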

Roadmap
We’d also like to implement the set operations of intersect and difference .


Sort
Orders rows based on the values of one or more columns.

sort [{direction}{column}]

Parameters
One column or a list of columns to sort by
Each column can be prefixed with:
+ , for ascending order, the default
- , for descending order
When using prefixes, even a single column needs to be in a list or parentheses.
(Otherwise, sort -foo is parsed as a subtraction between sort and foo .)

Examples
PRQL

from employees
sort age

SQL

SELECT
  *
FROM
  employees
ORDER BY
  age

PRQL

from employees
sort [-age]

SQL

SELECT
  *
FROM
  employees
ORDER BY
  age DESC


PRQL

from employees
sort [age, -tenure, +salary]

SQL

SELECT
  *
FROM
  employees
ORDER BY
  age,
  tenure DESC,
  salary

We can also use expressions:

PRQL

from employees
sort [s"substr({first_name}, 2, 5)"]

SQL

WITH table_1 AS (
  SELECT
    *,
    substr(first_name, 2, 5) AS _expr_0
  FROM
    employees
  ORDER BY
    _expr_0
)
SELECT
  *
FROM
  table_1

Notes

Ordering guarantees

Most DBs will persist ordering through most transforms; for example, you can expect this
result to be ordered by tenure .


PRQL

from employees
sort tenure
derive name = f"{first_name} {last_name}"

SQL

SELECT
  *,
  CONCAT(first_name, ' ', last_name) AS name
FROM
  employees
ORDER BY
  tenure

But:

This is an implementation detail of the DB. If there are instances where this doesn’t hold,
please open an issue, and we’ll consider how to manage it.
Some transforms which change the existence of rows, such as join or group , won’t
persist ordering; for example:

PRQL

from employees
sort tenure
join locations [==employee_id]

SQL

WITH table_1 AS (
  SELECT
    *
  FROM
    employees
  ORDER BY
    tenure
)
SELECT
  table_1.*,
  locations.*
FROM
  table_1
  JOIN locations ON table_1.employee_id = locations.employee_id

See Issue #1363 for more details.


Take
Picks rows based on their position.

take {n|range}

See Ranges for more details on how ranges work.

Examples
PRQL

from employees
take 10

SQL

SELECT
  *
FROM
  employees
LIMIT
  10

PRQL

from orders
sort [-value, date]
take 101..110

SQL

SELECT
  *
FROM
  orders
ORDER BY
  value DESC,
  date
LIMIT
  10 OFFSET 100
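
The relationship between a range and the generated LIMIT and OFFSET in the example above can be worked through in Python. This sketch assumes a closed, 1-based, inclusive range (which is what the example output implies) and does not cover open-ended ranges; take_to_limit_offset is a hypothetical helper, not part of the compiler.

```python
def take_to_limit_offset(start, end):
    """Map an inclusive, 1-based range start..end to a (LIMIT, OFFSET) pair."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    limit = end - start + 1  # number of rows inside the range
    offset = start - 1       # rows skipped before the range begins
    return limit, offset


print(take_to_limit_offset(101, 110))
print(take_to_limit_offset(1, 20))
```

So take 101..110 skips the first 100 rows and keeps the next 10, and a range starting at 1 needs no offset, which is why take 1..20 behaves like take 20.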


Window
Applies a pipeline to segments of rows, producing one output value for every input value.

window rows:{range} range:{range} expanding:false rolling:0 {pipeline}

For each row, the segment over which the pipeline is applied is determined by one of:

rows , which takes a range of rows relative to the current row position.
0 references the current row.
range , which takes a range of values relative to current row value.

The bounds of the range are inclusive. If a bound is omitted, the segment will extend until the
edge of the table or group.

For ease of use, there are two flags that override rows or range :

expanding:true is an alias for rows:..0 . A sum using this window is also known as
“cumulative sum”.
rolling:n is an alias for rows:(-n+1)..0 , where n is an integer. This will include the last n
values, including the current row. An average using this window is also known as a Simple
Moving Average.

Some examples:

Expression       Meaning

rows:0..2        current row plus two following
rows:-2..0       two preceding rows plus current row
rolling:3        (same as previous)
rows:-2..4       two preceding rows plus current row plus four following rows
rows:..0         all rows from the start of the table up to & including current row
expanding:true   (same as previous)
rows:0..         current row and all following rows until the end of the table
rows:..          all rows, which is the same as not having window at all
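
The way the flags resolve to row offsets, and what a sum over such a window produces, can be sketched in Python. This models only rows-based windows over a plain list, with None standing for an omitted bound; window_bounds and windowed_sum are hypothetical helpers, not PRQL's implementation.

```python
def window_bounds(rows=None, expanding=False, rolling=0):
    """Resolve the flags to (start, end) row offsets; None means unbounded."""
    if expanding:
        return (None, 0)          # rows:..0
    if rolling:
        return (-rolling + 1, 0)  # rows:(-n+1)..0
    return rows if rows is not None else (None, None)


def windowed_sum(values, bounds):
    """Sum each row's segment; bounds are inclusive and clipped to the table."""
    start, end = bounds
    last = len(values) - 1
    result = []
    for i in range(len(values)):
        lo = 0 if start is None else max(0, i + start)
        hi = last if end is None else min(last, i + end)
        result.append(sum(values[lo:hi + 1]))
    return result


values = [1, 2, 3, 4]
rolling_3 = windowed_sum(values, window_bounds(rolling=3))
cumulative = windowed_sum(values, window_bounds(expanding=True))
following_2 = windowed_sum(values, window_bounds(rows=(0, 2)))
```

Every input row yields exactly one output value, and segments near the edges simply contain fewer rows.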


Example
PRQL

from employees
group employee_id (
  sort month
  window rolling:12 (
    derive [trail_12_m_comp = sum paycheck]
  )
)

SQL

SELECT
  *,
  SUM(paycheck) OVER (
    PARTITION BY employee_id
    ORDER BY
      month ROWS BETWEEN 11 PRECEDING
      AND CURRENT ROW
  ) AS trail_12_m_comp
FROM
  employees

PRQL

from orders
sort day
window rows:-3..3 (
  derive [centered_weekly_average = average value]
)
group [order_month] (
  sort day
  window expanding:true (
    derive [monthly_running_total = sum value]
  )
)

SQL

SELECT
  *,
  AVG(value) OVER (
    ORDER BY
      day ROWS BETWEEN 3 PRECEDING
      AND 3 FOLLOWING
  ) AS centered_weekly_average,
  SUM(value) OVER (
    PARTITION BY order_month
    ORDER BY
      day ROWS BETWEEN UNBOUNDED PRECEDING
      AND CURRENT ROW
  ) AS monthly_running_total
FROM
  orders

Windowing by default
If you use window functions without the window transform, they will be applied to the whole table.
Unlike in SQL, they will remain window functions and will not trigger aggregation.


PRQL

from employees
sort age
derive rnk = rank

SQL

SELECT
  *,
  RANK() OVER (
    ORDER BY
      age ROWS BETWEEN UNBOUNDED PRECEDING
      AND UNBOUNDED FOLLOWING
  ) AS rnk
FROM
  employees
ORDER BY
  age

You can also apply only group :

PRQL

from employees
group department (
  sort age
  derive rnk = rank
)

SQL

SELECT
  *,
  RANK() OVER (
    PARTITION BY department
    ORDER BY
      age ROWS BETWEEN UNBOUNDED PRECEDING
      AND UNBOUNDED FOLLOWING
  ) AS rnk
FROM
  employees

Window functions as first class citizens


There are no limitations on where windowed expressions can be used:


PRQL

from employees
filter salary < (average salary)

SQL

WITH table_1 AS (
  SELECT
    *,
    AVG(salary) OVER () AS _expr_0
  FROM
    employees
)
SELECT
  *
FROM
  table_1
WHERE
  salary < _expr_0


Coalesce
We can coalesce values with the ?? operator. Coalescing takes either the first value or, if that
value is null, the second value.

PRQL

from orders
derive amount ?? 0

SQL

SELECT
  *,
  COALESCE(amount, 0)
FROM
  orders
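
The rule can be sketched in Python, with None standing in for a SQL null. This is an illustration of the semantics only; the coalesce helper and the sample amounts are assumptions for this example.

```python
def coalesce(first, second):
    """Return first unless it is None (standing in for null), else second."""
    return second if first is None else first


amounts = [10, None, 0]
# Counterpart of `derive amount ?? 0` applied to each row.
filled = [coalesce(amount, 0) for amount in amounts]
print(filled)
```

Only a null triggers the fallback: a real value of 0 is kept as it is rather than being treated as missing.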


Dates & Times


PRQL uses @ followed by a string to represent dates & times. This is less verbose than SQL’s
approach of TIMESTAMP '2004-10-19 10:23:54' and more explicit than SQL’s implicit option of
just using a string '2004-10-19 10:23:54' .

Note

Currently PRQL passes strings which can be compiled straight through to the database, so
many compatible string formats may work, but we may refine this in the future to aid
compatibility across databases. We’ll always support the canonical ISO8601 format
described below.

Dates
Dates are represented by @{yyyy-mm-dd} — a @ followed by the date format.

PRQL

from employees
derive age_at_year_end = (@2022-12-31 - dob)

SQL

SELECT
  *,
  DATE '2022-12-31' - dob AS age_at_year_end
FROM
  employees

Times
Times are represented by @{HH:mm:ss.SSS±Z} with any parts not supplied being rounded to
zero, including the timezone, which is represented by +HH:mm , -HH:mm or Z . This is consistent
with the ISO8601 time format.


PRQL

from orders
derive should_have_shipped_today = (order_time < @08:30)

SQL

SELECT
  *,
  order_time < TIME '08:30' AS should_have_shipped_today
FROM
  orders

Timestamps
Timestamps are represented by @{yyyy-mm-ddTHH:mm:ss.SSS±Z} / @{date}T{time} , with any
time parts not supplied being rounded to zero, including the timezone, which is represented by
+HH:mm , -HH:mm or Z . This is  @ followed by the ISO8601 datetime format, which uses T to
separate date & time.

PRQL

from commits
derive first_prql_commit = @2020-01-01T13:19:55-0800

SQL

SELECT
  *,
  TIMESTAMP '2020-01-01T13:19:55-0800' AS first_prql_commit
FROM
  commits

Intervals
Intervals are represented by {N}{periods} , such as 2years or 10minutes , without a space.

Note

These aren’t the same as ISO8601, because we evaluated P3Y6M4DT12H30M5S to be difficult
to understand, but we could support a simplified form if there’s demand for it. We don’t
currently support compound expressions, for example 2years10months , but most DBs will
allow 2years + 10months . Please raise an issue if this is inconvenient.


PRQL

from projects
derive first_check_in = start + 10days

SQL

SELECT
  *,
  start + INTERVAL 10 DAY AS first_check_in
FROM
  projects

Examples
Here’s a fuller list of examples:

@20221231 is forbidden: it must contain full punctuation ( - and : ),
@2022-12-31 is a date
@2022-12 or @2022 are forbidden: SQL can’t express a month, only a date
@16:54:32.123456 is a time
@16:54:32 , @16:54 , @16 are all allowed, expressing @16:54:32.000000 ,
@16:54:00.000000 , @16:00:00.000000 respectively
@2022-12-31T16:54:32.123456 is a timestamp without timezone
@2022-12-31T16:54:32.123456Z is a timestamp in UTC
@2022-12-31T16:54+02 is a timestamp in UTC+2
@2022-12-31T16:54+02:00 and @2022-12-31T16:54+02 are datetimes in UTC+2
@16:54+02 is forbidden: time is always local, so it cannot have a timezone
@2022-12-31+02 is forbidden: date is always local, so it cannot have a timezone
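
The rules in this list can be summarised as a small classifier. The Python sketch below is written only from the examples above and is not PRQL's actual parser; the regular expressions and the classify function are assumptions, and real parsing may accept or reject forms that this sketch does not.

```python
import re

# Patterns inferred from the examples in this section (assumed, not PRQL's grammar).
DATE = r"\d{4}-\d{2}-\d{2}"
TIME = r"\d{2}(?::\d{2}(?::\d{2}(?:\.\d{1,6})?)?)?"
TZ = r"(?:Z|[+-]\d{2}(?::?\d{2})?)"


def classify(literal):
    """Return 'date', 'time', 'timestamp' or 'forbidden' for an @-literal."""
    if not literal.startswith("@"):
        return "forbidden"
    body = literal[1:]
    if re.fullmatch(DATE, body):
        return "date"
    if re.fullmatch(TIME, body):
        return "time"
    # Only a full date joined to a time with T may carry a timezone.
    if re.fullmatch(f"{DATE}T{TIME}{TZ}?", body):
        return "timestamp"
    return "forbidden"


for example in ["@2022-12-31", "@16:54", "@2022-12-31T16:54+02", "@2022-12"]:
    print(example, classify(example))
```

Because a timezone is only matched after a date-plus-time, literals such as @16:54+02 and @2022-12-31+02 fall through to the forbidden case.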

Roadmap

Datetimes

Datetimes are supported by some databases (e.g. MySQL, BigQuery) in addition to timestamps.
When we have type annotations, these will be represented by a timestamp annotated as a
datetime:

derive pi_day = @2017-03-14T15:09:26.535898<datetime>

These are some examples we can then add:


@2022-12-31T16:54<datetime> is datetime without timezone
@2022-12-31<datetime> is forbidden: datetime must specify time
@16:54<datetime> is forbidden: datetime must specify date


Distinct
PRQL doesn’t have a specific distinct keyword. Instead, use group and take 1 :

PRQL

from employees
select department
group department (
  take 1
)

SQL

SELECT
  DISTINCT department
FROM
  employees

This also works without a linebreak:

PRQL

from employees
select department
group department (take 1)

SQL

SELECT
  DISTINCT department
FROM
  employees

Selecting from each group


We can select a single row from each group by combining group with sort and take 1:


PRQL SQL
# youngest employee from each department WITH table_1 AS (
from employees SELECT
group department ( *,
sort age ROW_NUMBER() OVER (
take 1 PARTITION BY department
) ORDER BY
age
) AS _expr_0
FROM
employees
)
SELECT
*
FROM
table_1
WHERE
_expr_0 <= 1
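
To make the generated SQL easier to follow, here is a rough Python analogue of what it computes: rows are numbered within each department in age order, and only row number 1 is kept. The function name and the sample rows are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter


def first_per_group(rows, group_key, sort_key):
    """Mimic ROW_NUMBER() OVER (PARTITION BY group_key ORDER BY sort_key) <= 1."""
    ordered = sorted(rows, key=itemgetter(group_key, sort_key))
    kept = []
    for _, members in groupby(ordered, key=itemgetter(group_key)):
        for row_number, row in enumerate(members, start=1):
            if row_number <= 1:  # the same filter as `_expr_0 <= 1`
                kept.append(row)
    return kept


employees = [
    {"name": "Ann", "department": "eng", "age": 41},
    {"name": "Bo", "department": "eng", "age": 29},
    {"name": "Cy", "department": "ops", "age": 35},
]
youngest = first_per_group(employees, "department", "age")
print([row["name"] for row in youngest])
```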

Roadmap
When using Postgres dialect, we are planning to compile:

# youngest employee from each department


from employees
group department (
sort age
take 1
)

… to …

SELECT DISTINCT ON (department) *


FROM employees
ORDER BY department, age


F-Strings
f-strings are a readable approach to building new strings from existing strings. Currently PRQL
supports this for concatenating strings:

PRQL SQL
from employees SELECT
select full_name = f"{first_name} CONCAT(first_name, ' ', last_name) AS
{last_name}" full_name
FROM
employees

This can be much easier to read for longer strings, relative to the SQL approach:

PRQL SQL
from web SELECT
select url = f"http{tls}://www.{domain}. CONCAT(
{tld}/{page}" 'http',
tls,
'://www.',
domain,
'.',
tld,
'/',
page
) AS url
FROM
web
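
The translation shown in the two tables above can be approximated in a few lines of Python. This is a sketch under the assumption that every {…} holds a plain column name; fstring_to_concat is a hypothetical helper, not part of any PRQL library.

```python
import re


def fstring_to_concat(template):
    """Turn the body of an f-string into a SQL CONCAT(...) call."""
    # re.split with a capture group alternates literal text and column names.
    parts = re.split(r"\{([^{}]+)\}", template)
    args = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            args.append(part)  # interpolated column, emitted as-is
        elif part:
            args.append("'" + part.replace("'", "''") + "'")  # quoted literal
    return "CONCAT(" + ", ".join(args) + ")"


print(fstring_to_concat("{first_name} {last_name}"))
print(fstring_to_concat("http{tls}://www.{domain}.{tld}/{page}"))
```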

Roadmap
In the future, f-strings may incorporate string formatting such as datetimes, numbers, and
padding. If there’s a feature that would be helpful, please post an issue.


Null handling
SQL has an unconventional way of handling NULL values, since it treats them as unknown
values. As a result, in SQL:

NULL is not a value indicating a missing entry, but a placeholder for anything possible,
NULL = NULL evaluates to NULL , since one cannot know if one unknown is equal to
another unknown,
NULL <> NULL evaluates to NULL , using the same logic,
to check if a value is NULL , SQL introduces IS NULL and IS NOT NULL operators,
DISTINCT column may return multiple NULL values.

For more information, check out the Postgres documentation.

PRQL, on the other hand, treats null as a value, which means that:

null == null evaluates to true ,


null != null evaluates to false ,
distinct column cannot contain multiple null values.

PRQL SQL
from employees SELECT
filter first_name == null *
filter null != last_name FROM
employees
WHERE
first_name IS NULL
AND last_name IS NOT NULL

Note that PRQL doesn’t change how NULL is compared between columns, for example in joins.
(PRQL compiles to SQL and so can’t change the behavior of the database).
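
The difference between the two models can be illustrated in Python, with None standing in for NULL. The function names are hypothetical, and compile_comparison only mirrors the rewriting seen in the example above; it is not the compiler's code.

```python
def sql_equals(a, b):
    """SQL semantics: comparing with NULL (None here) yields NULL, not a boolean."""
    if a is None or b is None:
        return None
    return a == b


def prql_equals(a, b):
    """PRQL semantics: null is an ordinary value, so null == null is true."""
    return a == b


def compile_comparison(left, op, right):
    """Render `left op right` as SQL text; op is '==' or '!='."""
    if left == "null":
        left, right = right, left  # `null != x` is treated like `x != null`
    if right == "null":
        return f"{left} IS NULL" if op == "==" else f"{left} IS NOT NULL"
    return f"{left} {'=' if op == '==' else '<>'} {right}"


print(sql_equals(None, None), prql_equals(None, None))
print(compile_comparison("null", "!=", "last_name"))
```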

For more context or to provide feedback check out the discussion on issue #99.


Ranges
PRQL has a concise range syntax start..end . If only one of start & end is supplied, the
range is open on the empty side.

Ranges can be used in filters with the in function, with any type of literal, including dates:

PRQL SQL
from events SELECT
filter (date | in @1776-07-04..@1787-09- *,
17) latitude >= 0 AS is_northern
filter (magnitude | in 50..100) FROM
derive is_northern = (latitude | in 0..) events
WHERE
date BETWEEN DATE '1776-07-04'
AND DATE '1787-09-17'
AND magnitude BETWEEN 50
AND 100

Like in SQL, ranges are inclusive.
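
As a rough sketch of the mapping shown above, the following Python function renders an in filter as SQL text. The closed and start-only cases follow the example; the end-only case ( <= ) is an assumption made by symmetry, and in_range_to_sql is a hypothetical name.

```python
def in_range_to_sql(column, start=None, end=None):
    """Render `column | in start..end` as a SQL condition (inclusive bounds)."""
    if start is not None and end is not None:
        return f"{column} BETWEEN {start} AND {end}"
    if start is not None:
        return f"{column} >= {start}"  # e.g. `in 0..`
    if end is not None:
        return f"{column} <= {end}"  # assumed: mirror image of the case above
    raise ValueError("a range needs at least one bound")


print(in_range_to_sql("magnitude", 50, 100))
print(in_range_to_sql("latitude", start=0))
```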

As discussed in the take docs, ranges can also be used in take :

PRQL SQL
from orders SELECT
sort [-value, date] *
take 101..110 FROM
orders
ORDER BY
value DESC,
date
LIMIT
10 OFFSET 100
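
The arithmetic behind LIMIT 10 OFFSET 100 is worth spelling out: take counts rows from 1 and includes both ends. A minimal Python sketch, assuming that an open end means "no LIMIT" (the function name is hypothetical):

```python
def take_to_limit_offset(start=None, end=None):
    """Map a 1-based inclusive `take start..end` to (limit, offset).

    A limit of None means no LIMIT clause is needed.
    """
    first = 1 if start is None else start
    if first < 1:
        raise ValueError("take ranges are 1-based")
    offset = first - 1
    limit = None if end is None else max(end - first + 1, 0)
    return limit, offset


print(take_to_limit_offset(101, 110))
```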

Note

Half-open ranges are generally less intuitive to read than a simple >= or <= operator.


Roadmap
We’d like to use ranges for other types, such as whether an object is in an array or list literal.


S-Strings
An s-string inserts SQL directly, as an escape hatch when there’s something that PRQL doesn’t
yet implement. For example, PRQL has no version() function, but Postgres provides one that
returns the server version, so to use it we write an s-string:

PRQL SQL
from my_table SELECT
select db_version = s"version()" version() AS db_version
FROM
my_table

We can embed columns in an s-string using braces. For example, PRQL’s standard library
defines the average function as:

func average column -> s"AVG({column})"

So this compiles using the function:

PRQL SQL
from employees SELECT
aggregate [average salary] AVG(salary)
FROM
employees

Here’s an example of a more involved use of an s-string:

PRQL SQL
from de=dept_emp SELECT
join s=salaries side:left [ de.*,
(s.emp_no == de.emp_no), s.*
s"""({s.from_date}, {s.to_date}) FROM
OVERLAPS dept_emp AS de
({de.from_date}, {de.to_date})""" LEFT JOIN salaries AS s ON s.emp_no =
] de.emp_no
AND (s.from_date, s.to_date) OVERLAPS
(de.from_date, de.to_date)

For those who have used Python, s-strings are similar to Python’s f-strings, but the result is SQL
code, rather than a string literal. For example, a Python f-string of f"average({col})" would
produce "average(salary)" , with quotes; while in PRQL, s"average({col})" produces
average(salary) , without quotes.

We can also use s-strings to produce a full table:

PRQL

from s"SELECT DISTINCT ON first_name, id, age FROM employees ORDER BY age ASC"
join s = s"SELECT * FROM salaries" [==id]

SQL

WITH table_2 AS (
  SELECT
    DISTINCT ON first_name,
    id,
    age
  FROM
    employees
  ORDER BY
    age ASC
),
table_3 AS (
  SELECT
    *
  FROM
    salaries
)
SELECT
  table_0.*,
  table_1.*
FROM
  table_2 AS table_0
  JOIN table_3 AS table_1 ON table_0.id = table_1.id

Note

S-strings in user code are intended as an escape-hatch for an unimplemented feature. If we
often need s-strings to express something, that’s a sign we should implement it in PRQL or
PRQL’s stdlib.

Braces
To output braces from an s-string, use double braces:


PRQL

from employees
derive [
  has_valid_title = s"regexp_contains(title, '([a-z0-9]*-){{2,}}')"
]

SQL

SELECT
  *,
  regexp_contains(title, '([a-z0-9]*-){2,}') AS has_valid_title
FROM
  employees
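
Python's str.format happens to use the same doubling convention, so it makes a convenient analogy for readers who know it. The snippet below is only that analogy (with a hypothetical {column} placeholder added to show interpolation alongside escaping); it is not how the PRQL compiler is implemented.

```python
# Doubled braces are emitted as single literal braces; {column} is interpolated.
template = "regexp_contains({column}, '([a-z0-9]*-){{2,}}')"
rendered = template.format(column="title")
print(rendered)
```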

Precedence
The PRQL compiler simply places a literal copy of each variable into the resulting string, which
means we may get surprising behavior when the variable has multiple terms and the s-string
isn’t parenthesized.

In this toy example, the salary + benefits / 365 gets precedence wrong:

PRQL SQL
from employees SELECT
derive [ *,
gross_salary = salary + benefits, salary + benefits AS gross_salary,
daily_rate = s"{gross_salary} / 365" salary + benefits / 365 AS daily_rate
] FROM
employees

Instead, we’d need to put the numerator {gross_salary} in parentheses:

PRQL SQL
from employees SELECT
derive [ *,
gross_salary = salary + benefits, salary + benefits AS gross_salary,
daily_rate = s"({gross_salary}) / 365" (salary + benefits) / 365 AS
] daily_rate
FROM
employees
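
The effect of literal substitution can be reproduced with plain text templating. The Python sketch below uses str.format as a stand-in for the compiler's substitution (an analogy, not its implementation) and then evaluates both results with made-up numbers to show how far apart they are.

```python
def render_s_string(template, **variables):
    """Paste each variable's expression text into the template, unparenthesized."""
    return template.format(**variables)


gross_salary = "salary + benefits"
wrong = render_s_string("{gross_salary} / 365", gross_salary=gross_salary)
right = render_s_string("({gross_salary}) / 365", gross_salary=gross_salary)

# Evaluate both expressions with sample values (illustrative numbers only).
values = {"salary": 3650, "benefits": 365}
wrong_value = eval(wrong, {"__builtins__": {}}, values)
right_value = eval(right, {"__builtins__": {}}, values)
print(wrong, "=", wrong_value)
print(right, "=", right_value)
```

Without parentheses only benefits is divided by 365, which is the precedence error shown in the first table.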


Strings
Strings in PRQL can use either single or double quotes:

PRQL SQL
from my_table SELECT
select x = "hello world" 'hello world' AS x
FROM
my_table

PRQL SQL
from my_table SELECT
select x = 'hello world' 'hello world' AS x
FROM
my_table

To quote a string containing quotes, either use the “other” type of quote, or use three-or-more
quotes, and close with the same number.

PRQL SQL
from my_table SELECT
select x = '"hello world"' '"hello world"' AS x
FROM
my_table

PRQL SQL
from my_table SELECT
select x = """I said "hello world"!""" 'I said "hello world"!' AS x
FROM
my_table

PRQL SQL
from my_table SELECT
select x = """""I said """hello 'I said """hello world"""!' AS x
world"""!""""" FROM
my_table


Note

Currently PRQL does not adjust escape characters.

Warning

Currently PRQL allows multiline strings with either a single character or multiple character
quotes. This may change for strings using a single character quote in future versions.


Switch
Note

switch is currently experimental and may change behavior in the near future

PRQL uses switch for both SQL’s CASE and IF statements. Here’s an example:

PRQL SQL
from employees SELECT
derive distance = switch [ *,
city == "Calgary" -> 0, CASE
city == "Edmonton" -> 300, WHEN city = 'Calgary' THEN 0
] WHEN city = 'Edmonton' THEN 300
ELSE NULL
END AS distance
FROM
employees

If no condition is met, the result is null . To set a default, use a true condition as the last case:

PRQL SQL
from employees SELECT
derive distance = switch [ *,
city == "Calgary" -> 0, CASE
city == "Edmonton" -> 300, WHEN city = 'Calgary' THEN 0
true -> "Unknown", WHEN city = 'Edmonton' THEN 300
] ELSE 'Unknown'
END AS distance
FROM
employees
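
The evaluation rule described above (first matching case wins, null when nothing matches, true as a catch-all) can be modelled in a few lines of Python. The helper names are hypothetical and None stands in for null.

```python
def switch(cases):
    """Return the value of the first case whose condition holds, else None."""
    for condition, value in cases:
        if condition:
            return value
    return None


def distance(city):
    return switch([(city == "Calgary", 0), (city == "Edmonton", 300)])


def distance_with_default(city):
    # A final `true` condition always matches, so it acts as the ELSE branch.
    return switch([(city == "Calgary", 0), (city == "Edmonton", 300), (True, "Unknown")])


print(distance("Edmonton"), distance("Toronto"), distance_with_default("Toronto"))
```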


Standard Library
The standard library currently contains commonly used SQL functions. It’s not
yet as broad as we’d like, and we’re very open to expanding it.

Currently s-strings are an escape-hatch for any function that isn’t in our standard library. If you
find yourself using them for something frequently, raise an issue and we’ll add it to the stdlib.

Note

Currently the stdlib implementation doesn’t support different DB implementations itself;
those need to be built deeper into the compiler. We’ll resolve this at some point. Until then,
we’ll only add functions here that are broadly supported by most DBs.

Here’s the source of the current PRQL std :


# Aggregate Functions
func min <scalar|column> column -> null
func max <scalar|column> column -> null
func sum <scalar|column> column -> null
func avg <scalar|column> column -> null
func stddev <scalar|column> column -> null
func average <scalar|column> column -> null
func count <scalar|column> non_null:s"*" -> null
# TODO: Possibly make this into `count distinct:true` (or like `distinct:` as an
# abbreviation of that?)
func count_distinct <scalar|column> column -> null

# Window functions
func lag<column> offset column -> null
func lead<column> offset column -> null
func first<column> offset column -> null
func last<column> offset column -> null
func rank<column> -> null
func rank_dense<column> -> null
func row_number<column> -> null

# Other functions
func round<scalar> n_digits column -> null
func as<scalar> `noresolve.type` column -> null
func in<bool> pattern value -> null

# Transform type definitions


func from<table> `default_db.source`<table> -> null
func select<table> columns<column> tbl<table> -> null
func filter<table> condition<bool> tbl<table> -> null
func derive<table> columns<column> tbl<table> -> null
func aggregate<table> a<column> tbl<table> -> null
func sort<table> by tbl<table> -> null
func take<table> expr tbl<table> -> null
func join<table> `default_db.with`<table> filter `noresolve.side`:inner tbl<table>
-> null
func concat<table> `default_db.bottom`<table> top<table> -> null
func union<table> `default_db.bottom`<table> top<table> -> (
top | concat _param.bottom | group [`*`] (take 1)
)
func group<table> by pipeline tbl<table> -> null
func window<table> rows:0..0 range:0..0 expanding:false rolling:0 pipeline
tbl<table> -> null

And a couple of examples:


PRQL

from employees
derive [
  gross_salary = (salary + payroll_tax | as int),
  gross_salary_rounded = (gross_salary | round 0),
  time = s"NOW()",  # an s-string, given no `now` function exists in PRQL
]

SQL

SELECT
  *,
  CAST(salary + payroll_tax AS int) AS gross_salary,
  ROUND(CAST(salary + payroll_tax AS int), 0) AS gross_salary_rounded,
  NOW() AS time
FROM
  employees


Java (prql-java)
prql-java offers Java bindings to the prql-compiler Rust library. It exposes a Java native
method public static native String toSql(String query) .

Installation

<dependency>
<groupId>org.prqllang</groupId>
<artifactId>prql-java</artifactId>
<version>${PRQL_VERSION}</version>
</dependency>

Usage

import org.prqllang.prql4j.PrqlCompiler;

class Main {
public static void main(String[] args) {
String sql = PrqlCompiler.toSql("from table");
System.out.println(sql);
}
}


Javascript (prql-js)
JavaScript bindings for prql-compiler . Check out https://prql-lang.org for more context.

Installation

npm install prql-js

Usage
Currently these functions are exposed

function compile(prql_string) # returns CompileResult
function to_sql(prql_string) # returns SQL string
function to_json(prql_string) # returns JSON string (needs JSON.parse() to get the JSON)

From NodeJS

const prql = require("prql-js");

// `compile` lives on the imported module, so call it through `prql`.
const { sql, error } = prql.compile(`from employees | select first_name`);

console.log(sql);
// handle error as well...


From a Browser

<html>
<head>
<script src="./node_modules/prql-js/dist/web/prql_js.js"></script>
<script>
const { compile } = wasm_bindgen;

async function run() {


await wasm_bindgen("./node_modules/prql-js/dist/web/prql_js_bg.wasm");
const { sql, error } = compile("from employees | select first_name");

console.log(sql);
// handle error as well...
}

run();
</script>
</head>

<body></body>
</html>

From a Framework or a Bundler

import compile from "prql-js/dist/bundler";

const { sql, error } = compile(`from employees | select first_name`);


console.log(sql);
// handle error as well...

Notes

This uses wasm-pack to generate bindings [1].

[1] Though we would be very open to other approaches, and used trunk successfully in a
rust-driven approach to this, RIP prql-web .

Development
Build:


npm run build

This builds Node, bundler and web packages in the dist path.

Test:

wasm-pack test --firefox


Python (prql-python)

Installation
pip install prql-python

Usage

import prql_python as prql

prql_query = """
from employees
join salaries [==emp_id]
group [dept_id, gender] (
aggregate [
avg_salary = average salary
]
)
"""

sql = prql.to_sql(prql_query)


R (prqlr)
R bindings for prql-compiler . Check out https://eitsupi.github.io/prqlr/ for more context.

Note

prqlr is generously maintained by @eitsupi in the eitsupi/prqlr repo.

Installation

install.packages("prqlr", repos = "https://eitsupi.r-universe.dev")

Usage

library(prqlr)

"
from employees
join salaries [==emp_id]
group [dept_id, gender] (
aggregate [
avg_salary = average salary
]
)
" |>
prql_to_sql()


Rust (prql-compiler)

Installation

cargo new myproject


cd myproject
cargo add prql-compiler

Usage
cargo run

src/main.rs

use prql_compiler::compile;

fn main() {
let prql = "from employees | select [name,age] ";
let sql = compile(prql).unwrap();
println!("{:?}", sql.replace("\n", " "));
}

Cargo.toml

[package]
name = "myproject"
version = "0.1.0"
edition = "2021"

[dependencies]
prql-compiler = "0.2.2"


Internals
This chapter explains PRQL’s semantics: how expressions are interpreted and their meaning.
It’s intended for advanced users and compiler contributors.


Name resolving
Because PRQL primarily handles relational data, it has specialized scoping rules for referencing
columns.

Scopes
In PRQL’s compiler, a scope is the collection of all names one can reference from a specific
point in the program.

In PRQL, a name in the scope is composed of a namespace and a variable name, separated
by a dot, similar to SQL. Namespaces can contain many dots, but variable names
cannot.

Example

Name my_table.some_column is a variable some_column from namespace my_table .

Name foo.bar.baz is a variable baz from namespace foo.bar .

When processing a query, a scope is maintained and updated for each point in the query.

It starts with only the namespace std , which is the standard library. It contains common functions
like sum or count , along with all transform functions such as derive and group .

In pipelines (or rather in transform functions), the scope is also injected with the namespaces of
tables that have been referenced with from or join transforms. Each of these namespaces
contains all the columns of its table and possibly a wildcard variable, which matches any
variable (see the algorithm below). Within transforms, there is also a special unnamed
namespace called a “frame”, which contains the columns of the current table the
transform is operating on.

Resolving
For each ident we want to resolve, we search the scope’s items in order. One of three things
can happen:


Scope contains an exact match, e.g. a name that matches in namespace and the variable
name.

Scope does not contain an exact match, but the ident did not specify a namespace, so we
can match a namespace that contains a * wildcard. If there’s a single namespace, the
matched namespace is also updated to contain this new variable name.

Otherwise, nothing is matched and an error is raised.
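
The three cases can be sketched in Python as follows. This is a reading of the description above, not the compiler's code: the scope is modelled as a dict from namespace to a set of names, an unqualified ident is allowed to match exactly in any namespace (searched in order), and several wildcard namespaces are treated as an error, which is an assumption since the text does not cover that case.

```python
class ResolveError(Exception):
    pass


def resolve(scope, ident):
    """Resolve ident against scope (namespace -> set of names, '*' is a wildcard).

    Returns the (namespace, name) pair that matched.
    """
    namespace, _, name = ident.rpartition(".")

    # Case 1: exact match, searching the scope's items in order.
    for ns, names in scope.items():
        if name in names and (not namespace or ns == namespace):
            return ns, name

    # Case 2: no namespace given, so a wildcard namespace may absorb the name.
    if not namespace:
        candidates = [ns for ns, names in scope.items() if "*" in names]
        if len(candidates) == 1:
            scope[candidates[0]].add(name)  # remember the newly inferred column
            return candidates[0], name

    # Case 3: nothing matched (or, by assumption, the wildcard match is ambiguous).
    raise ResolveError(f"unknown name: {ident}")


scope = {"std": {"sum", "count"}, "employees": {"*"}, "d": {"title"}}
print(resolve(scope, "d.title"))
print(resolve(scope, "first_name"))
```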

Translating to SQL
When translating into a SQL statement which references only one table, there is no need to
reference column names with a table prefix.

PRQL SQL
from employees SELECT
select first_name first_name
FROM
employees

But when there are multiple tables and we don’t have complete knowledge of all table
columns, a column without a prefix (i.e. first_name ) may actually reside in multiple tables.
Because of this, we have to use table prefixes for all column names.

PRQL SQL
from employees SELECT
derive [first_name, dept_id] employees.first_name,
join d=departments [==dept_id] d.title
select [first_name, d.title] FROM
employees
JOIN departments AS d ON
employees.dept_id = d.dept_id

As you can see, employees.first_name now needs a table prefix, to prevent conflicts with a
potential column of the same name in the departments table. Similarly, d.title needs the
table prefix.


Functions

Function call
The major distinction between PRQL and today’s conventional programming languages such as
C or Python is the function call syntax. It consists of the function name followed by arguments
separated by whitespace.

function_name arg1 arg2 arg3

If one of the arguments is also a function call, it must be encased in parentheses, so we know
where arguments of inner function end and the arguments of outer function start.

outer_func arg_1 (inner_func arg_a arg_b) arg_2

Pipeline
There is an alternative way of calling functions: using a pipeline. Regardless of whether the
pipeline is delimited by the pipe symbol | or a new line, the pipeline is equivalent to passing the
result of each function as the last argument of the next function.

a | foo 3 | bar 'hello' 'world' | baz

… is equivalent to …

baz (bar 'hello' 'world' (foo 3 a))
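
The same rewriting can be demonstrated in Python, where it amounts to a left fold. In this sketch each stage is a tuple of a function and its leading arguments, and the running value is appended as the last argument; foo , bar and baz are stand-ins that just record how they were called.

```python
from functools import reduce


def pipeline(value, *stages):
    """Apply stages left to right, passing the running value as the LAST argument."""
    return reduce(lambda acc, stage: stage[0](*stage[1:], acc), stages, value)


def foo(n, x):
    return f"(foo {n} {x})"


def bar(s1, s2, x):
    return f"(bar {s1!r} {s2!r} {x})"


def baz(x):
    return f"(baz {x})"


piped = pipeline("a", (foo, 3), (bar, "hello", "world"), (baz,))
nested = baz(bar("hello", "world", foo(3, "a")))
print(piped)
```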

As you may have noticed, transforms are regular functions too!


PRQL SQL
from employees SELECT
filter age > 50 *
sort name FROM
employees
WHERE
age > 50
ORDER BY
name

… is equivalent to …

PRQL SQL
from employees | filter age > 50 | sort SELECT
name *
FROM
employees
WHERE
age > 50
ORDER BY
name

… is equivalent to …

PRQL SQL
filter age > 50 (from employees) | sort SELECT
name *
FROM
employees
WHERE
age > 50
ORDER BY
name

… is equivalent to …


PRQL SQL
sort name (filter age > 50 (from SELECT
employees)) *
FROM
employees
WHERE
age > 50
ORDER BY
name

As you can see, the first example with pipeline notation is much easier to comprehend,
compared to the last one with the regular function call notation. This is why it is recommended
to use pipelines for nested function calls that are 3 or more levels deep.

Currying and late binding


In PRQL, functions are first-class citizens. As cool as that sounds, we need simpler terms to
explain it. In essence, it means that we can operate on functions as with any other value.


Syntax highlighting
PRQL contains multiple grammar definitions to enable tools to highlight PRQL code. These are
all intended to provide as good an experience as the grammar supports. Please raise any
shortcomings in a GitHub issue.

The definitions are somewhat scattered around the codebase; this page serves as an index.

Lezer — used by CodeMirror editors. The PRQL file is at prql-lezer/README.md .

Handlebars — currently duplicated:

The book: book/highlight-prql.js
The website (outside of the book & playground):
website/themes/prql-theme/static/plugins/highlight/prql.js

Textmate — used by the VSCode Extension. It’s in the prql-vscode repo in
prql-vscode/syntaxes/prql.tmLanguage.json .

Monarch — used by the Monaco editor, which we use for the Playground. The grammar is
at playground/src/workbench/prql-syntax.js .

While the pest grammar at prql-compiler/src/prql.pest isn’t used for syntax highlighting,
it’s the arbiter of truth given it currently powers the PRQL compiler.


dbt-prql

Original docs at https://github.com/prql/dbt-prql

dbt-prql allows writing PRQL in dbt models. This combines the benefits of PRQL’s power &
simplicity within queries, with dbt’s version control, lineage & testing across queries.

Once dbt-prql is installed, dbt commands compile PRQL between {% prql %} & {% endprql
%} jinja tags to SQL as part of dbt’s compilation. No additional config is required.

Examples

Simple example

{% prql %}
from employees
filter (age | in 20..30)
{% endprql %}

…would appear to dbt as:

SELECT
employees.*
FROM
employees
WHERE
age BETWEEN 20
AND 30


Less simple example

{% prql %}
from {{ source('salesforce', 'in_process') }}
derive expected_sales = probability * value
join {{ ref('team', 'team_sales') }} [==name]
group name (
aggregate (expected_sales)
)
{% endprql %}

…would appear to dbt as:

SELECT
name,
{{ source('salesforce', 'in_process') }}.probability * {{ source('salesforce',
'in_process') }}.value AS expected_sales
FROM
{{ source('salesforce', 'in_process') }}
JOIN {{ ref('team', 'team_sales') }} USING(name)
GROUP BY
name

…and then dbt will compile the source and ref s to a full SQL query.

Replacing macros

dbt’s use of macros has saved many of us many lines of code, and even saved some people
some time. But imperatively programming text generation with code like if not loop.last is
not our highest calling. It’s the “necessary” part rather than beautiful part of dbt.

Here’s the canonical example of macros in the dbt documentation:

{%- set payment_methods = ["bank_transfer", "credit_card", "gift_card"] -%}

select
order_id,
{%- for payment_method in payment_methods %}
sum(case when payment_method = '{{payment_method}}' then amount end) as
{{payment_method}}_amount
{%- if not loop.last %},{% endif -%}
{% endfor %}
from {{ ref('raw_payments') }}
group by 1

Here’s that model using PRQL1, including the prql jinja tags.


{% prql %}
func filter_amount method -> s"sum(case when payment_method = '{method}' then
amount end) as {method}_amount"

from {{ ref('raw_payments') }}
group order_id (
aggregate [
filter_amount bank_transfer,
filter_amount credit_card,
filter_amount gift_card,
]
)
{% endprql %}

As well as the query being simpler in its final form, writing in PRQL also gives us live feedback
around any errors, on every keystroke. Though there’s much more to come, check out the
current version on PRQL Playground.

What it does
When dbt compiles models to SQL queries:

Any text in a dbt model between {% prql %} and {% endprql %} tags is compiled from
PRQL to SQL before being passed to dbt.
The PRQL compiler passes text contained between {{ & }} through to dbt without
modification, which allows us to embed jinja expressions in PRQL. (This was added to
PRQL specifically for this use-case.)
dbt will then compile the resulting model into its final form of raw SQL, and dispatch it to
the database, as per usual.
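
The first step in the list above can be pictured with a short Python sketch that finds each tagged block and swaps in compiled SQL. It is an illustration only: dbt-prql's real mechanism is not shown in this chapter, the regular expression is an assumption, and compile_prql is a hypothetical callable supplied by the caller (in practice it would wrap the PRQL compiler).

```python
import re

# Matches {% prql %} ... {% endprql %}, tolerating extra whitespace in the tags.
PRQL_BLOCK = re.compile(r"\{%\s*prql\s*%\}(.*?)\{%\s*endprql\s*%\}", re.DOTALL)


def compile_prql_blocks(model_text, compile_prql):
    """Replace every tagged PRQL block with the SQL returned by compile_prql."""
    return PRQL_BLOCK.sub(lambda match: compile_prql(match.group(1).strip()), model_text)


def fake_compiler(prql_source):
    # Stand-in for a real compiler: just marks the text it was given.
    return "<" + prql_source.upper() + ">"


model = "select 1;\n{% prql %}\nfrom employees\n{% endprql %}\n"
print(compile_prql_blocks(model, fake_compiler))
```

Text outside the tags is left untouched, so ordinary SQL and jinja in the same model still reach dbt unchanged.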

There’s no config needed in the dbt project; this works automatically on any dbt command (e.g.
dbt run ) assuming dbt-prql is installed.

Installation

pip install dbt-prql


Current state
Currently this is new, but fairly feature-complete. It’s enthusiastically supported — if there are
any problems, please open an issue.


Jupyter

Original docs at https://pyprql.readthedocs.io/en/latest/magic_readme.html

Work with pandas and PRQL in an IPython terminal or Jupyter notebook.

Implementation
This is a thin wrapper around the fantastic IPython-sql magic. Roughly speaking, all we do is
parse PRQL to SQL and pass that through to ipython-sql . Full documentation of the
supported features is available at their repository. Here, we document those places where we
differ from them, plus those features we think you are most likely to find useful.

Usage

Installation

If you have already installed PyPRQL into your environment, then you should be good to go!
We bundle in IPython and pandas , though you’ll need to install Jupyter separately. If you
haven’t installed PyPRQL, that’s as simple as:

pip install pyprql

Set Up

Open up either an IPython terminal or Jupyter notebook. First, we need to load the
extension and connect to a database.

In [1]: %load_ext pyprql.magic


Connecting a database

We have two options for connecting a database

1. Create an in-memory DB. This is the easiest way to get started.

In [2]: %prql duckdb:///:memory:

However, in-memory databases start off empty! So, we need to add some data. We have
two options:

We can easily add a pandas dataframe to the DuckDB database like so:

In [3]: %prql --persist df

where df is a pandas dataframe. This adds a table named df to the in-memory
DuckDB instance.

Or download a CSV and query it directly, with DuckDB:

!wget https://github.com/graphql-compose/graphql-compose-examples/blob/master/examples/northwind/data/csv/products.csv

…and then from products.csv will work.

2. Connect to an existing database

When connecting to a database, pass the connection string as an argument to the line
magic %prql . The connection string needs to be in SQLAlchemy format, so any
connection supported by SQLAlchemy is supported by the magic. Additional connection
parameters can be passed as a dictionary using the --connection_arguments flag to
the %prql line magic. We ship with the necessary extensions to use DuckDB as the
backend, and here connect to an in-memory database.

Querying

Now, let’s do a query! By default, PRQLMagic always returns the results as a dataframe, and
always prints the results. The results of the previous query are accessible in the _ variable.

These examples are based on the products.csv example above.


In [4]: %%prql
...: from p = products.csv
...: filter supplierID == 1

Done.
Returning data to local variable _
productID productName supplierID categoryID quantityPerUnit
unitPrice unitsInStock unitsOnOrder reorderLevel discontinued
0 1 Chai 1 1 10 boxes x 20 bags
18.0 39 0 10 0
1 2 Chang 1 1 24 - 12 oz bottles
19.0 17 40 25 0
2 3 Aniseed Syrup 1 2 12 - 550 ml bottles
10.0 13 70 25 0

In [5]: %%prql
...: from p = products.csv
...: group categoryID (
...: aggregate [average unitPrice]
...: )

Done.
Returning data to local variable _
categoryID avg("unitPrice")
0 1 37.979167
1 2 23.062500
2 7 32.370000
3 6 54.006667
4 8 20.682500
5 4 28.730000
6 3 25.160000
7 5 20.250000

We can capture the results into a different variable like so:

In [6]: %%prql results <<


...: from p = products.csv
...: aggregate [min unitsInStock, max unitsInStock]

Done.
Returning data to local variable results
min("unitsInStock") max("unitsInStock")
0 0 125

Now, the output of the query is saved to results .


Prefect
Because Prefect is native Python, it’s extremely easy to integrate with PRQL.

With a Postgres Task, replace:

PostgresExecute.run(..., query=sql)

…with…

PostgresExecute.run(..., query=pyprql.to_sql(prql))

We’re big fans of Prefect, and if there is anything that would make the integration easier, please
open an issue.


Examples
These examples are rewritten from other languages such as SQL. They try to express real-
world problems in PRQL, covering most of the language features. We are looking for different
use-cases of data transformation, be it database queries, semantic business modeling or data
cleaning.

If you want to help, translate some of your queries to PRQL and open a PR to add them here!


PRQL SQL
from employees WITH table_1 AS (
filter country == "USA" SELECT
# Each line transforms the previous title,
result. country,
derive [ salary + payroll_tax + benefits_cost
# This adds columns / variables. AS _expr_0,
gross_salary = salary + payroll_tax, salary + payroll_tax AS _expr_1,
gross_cost = gross_salary + salary
benefits_cost # Variables can use other FROM
variables. employees
] WHERE
filter gross_cost > 0 country = 'USA'
group [title, country] ( )
# For each group use a nested pipeline SELECT
aggregate [ title,
# Aggregate each group to a single row country,
average salary, AVG(salary),
average gross_salary, AVG(_expr_1),
sum salary, SUM(salary),
sum gross_salary, SUM(_expr_1),
average gross_cost, AVG(_expr_0),
sum_gross_cost = sum gross_cost, SUM(_expr_0) AS sum_gross_cost,
ct = count, COUNT(*) AS ct
] FROM
) table_1
sort sum_gross_cost WHERE
filter ct > 200 _expr_0 > 0
take 20 GROUP BY
title,
country
HAVING
COUNT(*) > 200
ORDER BY
sum_gross_cost
LIMIT
20


PRQL SQL
from employees WITH table_1 AS (
group [emp_no] ( SELECT
aggregate [ AVG(salary) AS _expr_0,
emp_salary = average salary # emp_no
average salary resolves to "AVG(salary)" FROM
(from stdlib) employees
] GROUP BY
) emp_no
join titles [==emp_no] )
group [title] ( SELECT
aggregate [ AVG(table_1._expr_0) / 1000 AS
avg_salary = average emp_salary salary_k,
] AVG(table_1._expr_0) / 1000 * 1000 AS
) salary
select salary_k = avg_salary / 1000 # FROM
avg_salary should resolve to table_1
"AVG(emp_salary)" JOIN titles ON table_1.emp_no =
take 10 # titles.emp_no
induces new SELECT GROUP BY
derive salary = salary_k * 1000 # titles.title
salary_k should not resolve to LIMIT
"avg_salary / 1000" 10


Single item is coerced into a list


PRQL SQL
from employees SELECT
select salary salary
FROM
employees

Same as above but with salary in a list:

PRQL:

    from employees
    select [salary]

SQL:

    SELECT
      salary
    FROM
      employees
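
The coercion shown in the two examples above can be pictured with a small Python sketch. This is only an illustration of the idea, not how the PRQL compiler is implemented; `coerce_to_list` and `render_select` are hypothetical helper names introduced here.

```python
from typing import Any, List


def coerce_to_list(arg: Any) -> List[Any]:
    """Wrap a bare item in a one-element list; leave lists unchanged."""
    return arg if isinstance(arg, list) else [arg]


def render_select(columns: Any, table: str) -> str:
    """Hypothetical renderer: builds a SELECT from one column or a list of columns."""
    cols = coerce_to_list(columns)
    return "SELECT " + ", ".join(cols) + " FROM " + table


# A bare column name and a one-element list produce the same statement.
single = render_select("salary", "employees")
listed = render_select(["salary"], "employees")
```

Under this sketch, `select salary` and `select [salary]` are indistinguishable after coercion, which matches the identical SQL in the two examples.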

Multiple items
PRQL:

    from employees
    derive [
      gross_salary = salary + payroll_tax,
      gross_cost = gross_salary + benefits_cost
    ]

SQL:

    SELECT
      *,
      salary + payroll_tax AS gross_salary,
      salary + payroll_tax + benefits_cost AS gross_cost
    FROM
      employees

Same as above but split into two lines:


PRQL:

    from employees
    derive gross_salary = salary + payroll_tax
    derive gross_cost = gross_salary + benefits_cost

SQL:

    SELECT
      *,
      salary + payroll_tax AS gross_salary,
      salary + payroll_tax + benefits_cost AS gross_cost
    FROM
      employees
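
Both forms of `derive` compile to the same SQL, in which the reference to `gross_salary` inside `gross_cost` is inlined as `salary + payroll_tax`. As a rough check, the sketch below runs that SQL with Python's built-in `sqlite3` module. The table layout and the single row of data are made up for illustration.

```python
import sqlite3

# In-memory database with a hypothetical one-row employees table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "name TEXT, salary INTEGER, payroll_tax INTEGER, benefits_cost INTEGER)"
)
conn.execute("INSERT INTO employees VALUES ('a', 100, 20, 5)")

# The SQL from the example above, unchanged apart from whitespace.
sql = """
SELECT
  *,
  salary + payroll_tax AS gross_salary,
  salary + payroll_tax + benefits_cost AS gross_cost
FROM
  employees
"""
rows = conn.execute(sql).fetchall()
# Each row holds the original columns followed by gross_salary and gross_cost.
```

With this toy row, `gross_salary` is 100 + 20 and `gross_cost` adds the 5 of `benefits_cost` on top, so the inlined expression gives the same value that a chained variable would.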


PRQL:

    table newest_employees = (
      from employees
      sort tenure
      take 50
    )

    table average_salaries = (
      from salaries
      group country (
        aggregate average_country_salary = (average salary)
      )
    )

    from newest_employees
    join average_salaries [==country]
    select [name, salary, average_country_salary]

SQL:

    WITH average_salaries AS (
      SELECT
        country,
        AVG(salary) AS average_country_salary
      FROM
        salaries
      GROUP BY
        country
    ),
    newest_employees AS (
      SELECT
        *
      FROM
        employees
      ORDER BY
        tenure
      LIMIT
        50
    )
    SELECT
      newest_employees.name,
      newest_employees.salary,
      average_salaries.average_country_salary
    FROM
      newest_employees
      JOIN average_salaries ON newest_employees.country = average_salaries.country


Employees
These are homework tasks on the employees database.

Clone and init the database (requires a local PostgreSQL instance):

psql -U postgres -c 'CREATE DATABASE employees;'


git clone https://github.com/vrajmohan/pgsql-sample-data.git
psql -U postgres -d employees -f pgsql-sample-data/employee/employees.dump

Execute a PRQL query:

cd prql-compiler
cargo run compile examples/employees/average-title-salary.prql | psql -U postgres -d employees

Task 1

Rank the employee titles according to the average salary for each department.

My solution:

1. For each employee, find their average salary.
2. Join employees with their departments and titles (duplicating employees for each of their titles and departments).
3. Group by department and title, aggregating the average salary.
4. Join with departments to get the department name.


PRQL:

    from salaries
    group [emp_no] (
      aggregate [emp_salary = average salary]
    )
    join t=titles [==emp_no]
    join dept_emp side:left [==emp_no]
    group [dept_emp.dept_no, t.title] (
      aggregate [avg_salary = average emp_salary]
    )
    join departments [==dept_no]
    select [dept_name, title, avg_salary]

SQL:

    WITH table_1 AS (
      SELECT
        AVG(salary) AS _expr_0,
        emp_no
      FROM
        salaries
      GROUP BY
        emp_no
    ),
    table_2 AS (
      SELECT
        t.title,
        AVG(table_1._expr_0) AS avg_salary,
        dept_emp.dept_no
      FROM
        table_1
        JOIN titles AS t ON table_1.emp_no = t.emp_no
        LEFT JOIN dept_emp ON table_1.emp_no = dept_emp.emp_no
      GROUP BY
        dept_emp.dept_no,
        t.title
    )
    SELECT
      departments.dept_name,
      table_2.title,
      table_2.avg_salary
    FROM
      table_2
      JOIN departments ON table_2.dept_no = departments.dept_no
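
To see the Task 1 steps at work without the full sample database, the compiled SQL can be run on a few made-up rows. The sketch below uses Python's built-in `sqlite3` module instead of PostgreSQL. The column sets are trimmed to what the query needs and the data is invented, so treat it as an illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, heavily reduced versions of the sample tables.
conn.executescript("""
CREATE TABLE salaries (emp_no INTEGER, salary INTEGER);
CREATE TABLE titles (emp_no INTEGER, title TEXT);
CREATE TABLE dept_emp (emp_no INTEGER, dept_no TEXT);
CREATE TABLE departments (dept_no TEXT, dept_name TEXT);
INSERT INTO salaries VALUES (1, 100), (1, 200), (2, 300), (3, 500);
INSERT INTO titles VALUES (1, 'Engineer'), (2, 'Engineer'), (3, 'Manager');
INSERT INTO dept_emp VALUES (1, 'd1'), (2, 'd1'), (3, 'd1');
INSERT INTO departments VALUES ('d1', 'Dev');
""")

# The SQL shown for Task 1, unchanged apart from whitespace.
sql = """
WITH table_1 AS (
  SELECT AVG(salary) AS _expr_0, emp_no
  FROM salaries
  GROUP BY emp_no
),
table_2 AS (
  SELECT t.title, AVG(table_1._expr_0) AS avg_salary, dept_emp.dept_no
  FROM table_1
    JOIN titles AS t ON table_1.emp_no = t.emp_no
    LEFT JOIN dept_emp ON table_1.emp_no = dept_emp.emp_no
  GROUP BY dept_emp.dept_no, t.title
)
SELECT departments.dept_name, table_2.title, table_2.avg_salary
FROM table_2
  JOIN departments ON table_2.dept_no = departments.dept_no
"""
# The query has no ORDER BY, so sort in Python for a stable result.
rows = sorted(conn.execute(sql).fetchall())
```

Employee 1 averages 150 and employee 2 averages 300, so the Engineer group in department Dev averages 225. The single Manager keeps an average of 500.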

Task 2

Estimate the distribution of salaries and gender for each department.


PRQL:

    from e=employees
    join salaries [==emp_no]
    group [e.emp_no, e.gender] (
      aggregate [
        emp_salary = average salaries.salary
      ]
    )
    join de=dept_emp [==emp_no] side:left
    group [de.dept_no, gender] (
      aggregate [
        salary_avg = average emp_salary,
        salary_sd = stddev emp_salary,
      ]
    )
    join departments [==dept_no]
    select [dept_name, gender, salary_avg, salary_sd]

SQL:

    WITH table_1 AS (
      SELECT
        e.gender,
        AVG(salaries.salary) AS _expr_0,
        e.emp_no
      FROM
        employees AS e
        JOIN salaries ON e.emp_no = salaries.emp_no
      GROUP BY
        e.emp_no,
        e.gender
    ),
    table_2 AS (
      SELECT
        table_1.gender,
        AVG(table_1._expr_0) AS salary_avg,
        STDDEV(table_1._expr_0) AS salary_sd,
        de.dept_no
      FROM
        table_1
        LEFT JOIN dept_emp AS de ON table_1.emp_no = de.emp_no
      GROUP BY
        de.dept_no,
        table_1.gender
    )
    SELECT
      departments.dept_name,
      table_2.gender,
      table_2.salary_avg,
      table_2.salary_sd
    FROM
      table_2
      JOIN departments ON table_2.dept_no = departments.dept_no

Task 3

Estimate the distribution of salaries and gender for each manager.


PRQL:

    from e=employees
    join salaries [==emp_no]
    group [e.emp_no, e.gender] (
      aggregate [
        emp_salary = average salaries.salary
      ]
    )
    join de=dept_emp [==emp_no]
    join dm=dept_manager [
      (dm.dept_no == de.dept_no) and s"(de.from_date, de.to_date) OVERLAPS (dm.from_date, dm.to_date)"
    ]
    group [dm.emp_no, gender] (
      aggregate [
        salary_avg = average emp_salary,
        salary_sd = stddev emp_salary
      ]
    )
    derive mng_no = emp_no
    join managers=employees [==emp_no]
    derive mng_name = s"managers.first_name || ' ' || managers.last_name"
    select [mng_name, managers.gender, salary_avg, salary_sd]

SQL:

    WITH table_1 AS (
      SELECT
        e.gender,
        AVG(salaries.salary) AS _expr_0,
        e.emp_no
      FROM
        employees AS e
        JOIN salaries ON e.emp_no = salaries.emp_no
      GROUP BY
        e.emp_no,
        e.gender
    ),
    table_2 AS (
      SELECT
        AVG(table_1._expr_0) AS salary_avg,
        STDDEV(table_1._expr_0) AS salary_sd,
        dm.emp_no
      FROM
        table_1
        JOIN dept_emp AS de ON table_1.emp_no = de.emp_no
        JOIN dept_manager AS dm ON dm.dept_no = de.dept_no
        AND (de.from_date, de.to_date) OVERLAPS (dm.from_date, dm.to_date)
      GROUP BY
        dm.emp_no,
        table_1.gender
    )
    SELECT
      managers.first_name || ' ' || managers.last_name AS mng_name,
      managers.gender,
      table_2.salary_avg,
      table_2.salary_sd
    FROM
      table_2
      JOIN employees AS managers ON table_2.emp_no = managers.emp_no
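
The join to `dept_manager` relies on the SQL `OVERLAPS` predicate, passed through an s-string, to pair each employee only with managers whose tenure in the department intersects their own. The Python sketch below approximates that predicate. It assumes every range has `start < end` and is treated as half-open, which is how PostgreSQL documents the common case. The rows are made up.

```python
from datetime import date


def overlaps(a_start, a_end, b_start, b_end):
    """Approximate (a_start, a_end) OVERLAPS (b_start, b_end).

    Assumes start < end for both ranges and treats each range as the
    half-open interval [start, end), so ranges that only share an
    endpoint do not overlap.
    """
    return a_start < b_end and b_start < a_end


# Hypothetical rows: (emp_no, dept_no, from_date, to_date)
dept_emp = [
    (1, "d1", date(2020, 1, 1), date(2021, 1, 1)),
    (2, "d1", date(2021, 1, 1), date(2022, 1, 1)),
]
dept_manager = [
    (9, "d1", date(2019, 1, 1), date(2020, 6, 1)),
]

# Same join condition as in the query: equal dept_no and overlapping dates.
pairs = [
    (de[0], dm[0])
    for de in dept_emp
    for dm in dept_manager
    if de[1] == dm[1] and overlaps(de[2], de[3], dm[2], dm[3])
]
```

Here only employee 1 is paired with manager 9, because employee 2 joined the department after that manager's term had ended.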

Task 4

Find the distributions of titles, salaries and genders for each department.


PRQL:

    from de=dept_emp
    join s=salaries side:left [
      (s.emp_no == de.emp_no),
      s"({s.from_date}, {s.to_date}) OVERLAPS ({de.from_date}, {de.to_date})"
    ]
    group [de.emp_no, de.dept_no] (
      aggregate salary = (average s.salary)
    )
    join employees [==emp_no]
    join titles [==emp_no]
    select [dept_no, salary, employees.gender, titles.title]

SQL:

    WITH table_1 AS (
      SELECT
        de.dept_no,
        AVG(s.salary) AS salary,
        de.emp_no
      FROM
        dept_emp AS de
        LEFT JOIN salaries AS s ON s.emp_no = de.emp_no
        AND (s.from_date, s.to_date) OVERLAPS (de.from_date, de.to_date)
      GROUP BY
        de.emp_no,
        de.dept_no
    )
    SELECT
      table_1.dept_no,
      table_1.salary,
      employees.gender,
      titles.title
    FROM
      table_1
      JOIN employees ON table_1.emp_no = employees.emp_no
      JOIN titles ON table_1.emp_no = titles.emp_no
