PRQL Language Book

Introduction
PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL
replacement. Like SQL, it’s readable, explicit and declarative. Unlike SQL, it forms a logical
pipeline of transformations, and supports abstractions such as variables and functions. It can
be used with any database that uses SQL, since it transpiles to SQL.
Let’s get started with an example:
PRQL:
from employees
filter start_date > @2021-01-01               # Clear date syntax
derive [                                      # `derive` adds columns / variables
  gross_salary = salary + (tax ?? 0),         # Terse coalesce
  gross_cost = gross_salary + benefits_cost,  # Variables can use other variables
]
filter gross_cost > 0
group [title, country] (                      # `group` runs a pipeline over each group
  aggregate [                                 # `aggregate` reduces each group to a value
    average gross_salary,
    sum_gross_cost = sum gross_cost,          # `=` sets a column name
  ]
)
filter sum_gross_cost > 100000                # `filter` replaces both of SQL's `WHERE` & `HAVING`
derive id = f"{title}_{country}"              # F-strings like python
derive country_code = s"LEFT(country, 2)"     # S-strings allow using SQL as an escape hatch
sort [sum_gross_cost, -country]               # `-country` means descending order
take 1..20                                    # Range expressions (also valid here as `take 20`)

SQL:
WITH table_1 AS (
  SELECT
    title,
    country,
    salary + COALESCE(tax, 0) + benefits_cost AS _expr_0,
    salary + COALESCE(tax, 0) AS _expr_1
  FROM
    employees
  WHERE
    start_date > DATE '2021-01-01'
)
SELECT
  title,
  country,
  AVG(_expr_1),
  SUM(_expr_0) AS sum_gross_cost,
  CONCAT(title, '_', country) AS id,
  LEFT(country, 2) AS country_code
FROM
  table_1
WHERE
  _expr_0 > 0
GROUP BY
  title,
  country
HAVING
  SUM(_expr_0) > 100000
ORDER BY
  sum_gross_cost,
  country DESC
LIMIT
  20
As you can see, PRQL is a linear pipeline of transformations — each line of the query is a
transformation of the previous line’s result.
You can see that in SQL, operations do not follow one another, which makes it hard to
compose larger queries.
Pipelines
The simplest pipeline

The simplest pipeline is just:
PRQL:
from employees

SQL:
SELECT
  *
FROM
  employees
Adding transformations
We can add additional lines; each one transforms the result of the previous line:
PRQL:
from employees
derive gross_salary = (salary + payroll_tax)

SQL:
SELECT
  *,
  salary + payroll_tax AS gross_salary
FROM
  employees
…and so on:
from employees
derive gross_salary = (salary + payroll_tax)
sort gross_salary
Compiling to SQL
When compiling to SQL, the PRQL compiler will try to represent as many transforms as possible
with a single SELECT statement. When necessary it will “overflow” using CTEs (common table
expressions):
from e = employees
derive gross_salary = (salary + payroll_tax)
sort gross_salary
take 10
join d = department [==dept_no]
select [e.name, gross_salary, d.name]
See also
Syntax
Functions
Functions are a fundamental abstraction in PRQL: they let us write code once and run it in
many places. This reduces the number of errors in our code, makes our code more
readable, and simplifies making changes.
Functions have two types of parameters:
1. Positional parameters, which require an argument.

2. Named parameters, which optionally take an argument, otherwise using their default
value.
So this function is named fahrenheit_to_celsius and has one parameter temp :
PRQL:
func fahrenheit_to_celsius temp -> (temp - 32) / 1.8

from cities
derive temp_c = (fahrenheit_to_celsius temp_f)

SQL:
SELECT
  *,
  (temp_f - 32) / 1.8 AS temp_c
FROM
  cities
This function is named interp , and has two positional parameters named higher and x , and
one named parameter named lower which takes a default argument of 0 . It calculates the
proportion of the distance that x is between lower and higher .
PRQL:
func interp lower:0 higher x -> (x - lower) / (higher - lower)

from students
derive [
  sat_proportion_1 = (interp 1600 sat_score),
  sat_proportion_2 = (interp lower:0 1600 sat_score),
]

SQL:
SELECT
  *,
  (sat_score - 0) / 1600 AS sat_proportion_1,
  (sat_score - 0) / 1600 AS sat_proportion_2
FROM
  students
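For readers more familiar with general-purpose languages, the split between positional and named parameters can be sketched in Python. This is only an analogy, not how the PRQL compiler works: the positional parameters become required arguments, and the named parameter becomes a keyword argument with a default. The score of 1200 is an illustrative value.

```python
def interp(higher, x, lower=0):
    """Proportion of the distance that x is between lower and higher."""
    return (x - lower) / (higher - lower)


# Positional arguments only; `lower` falls back to its default of 0.
sat_proportion_1 = interp(1600, 1200)

# The same call with the named parameter given explicitly.
sat_proportion_2 = interp(1600, 1200, lower=0)
```

Both calls compute the same value, mirroring how the two derived columns above compile to identical SQL.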
Piping
Consistent with the principles of PRQL, it’s possible to pipe values into functions, which makes
composing many functions more readable. When piping a value into a function, the value is
passed as an argument to the final positional parameter of the function. Here’s the same result
as the examples above with an alternative construction:
PRQL:
func interp lower:0 higher x -> (x - lower) / (higher - lower)

from students
derive [
  sat_proportion_1 = (sat_score | interp 1600),
  sat_proportion_2 = (sat_score | interp lower:0 1600),
]

SQL:
SELECT
  *,
  (sat_score - 0) / 1600 AS sat_proportion_1,
  (sat_score - 0) / 1600 AS sat_proportion_2
FROM
  students
and
PRQL:
func fahrenheit_to_celsius temp -> (temp - 32) / 1.8

from cities
derive temp_c = (temp_f | fahrenheit_to_celsius)

SQL:
SELECT
  *,
  (temp_f - 32) / 1.8 AS temp_c
FROM
  cities
We can combine a chain of functions, which makes logic more readable:
PRQL:
func fahrenheit_to_celsius temp -> (temp - 32) / 1.8
func interp lower:0 higher x -> (x - lower) / (higher - lower)

from kettles
derive boiling_proportion = (temp_c | fahrenheit_to_celsius | interp 100)

SQL:
SELECT
  *,
  ((temp_c - 32) / 1.8 - 0) / 100 AS boiling_proportion
FROM
  kettles
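The piping rule (the piped value becomes the final positional argument) can be modelled in a few lines of Python. This is a hypothetical sketch for intuition only: the `pipe` helper and the sample input of 212 are assumptions, not part of PRQL.

```python
def fahrenheit_to_celsius(temp):
    return (temp - 32) / 1.8


def interp(higher, x, lower=0):
    return (x - lower) / (higher - lower)


def pipe(value, *steps):
    """Feed `value` through each step in order.

    Each step is a tuple (func, *leading_args); the piped value is passed
    as the final positional argument, after the leading arguments.
    """
    for func, *leading in steps:
        value = func(*leading, value)
    return value


# Equivalent in spirit to: (212 | fahrenheit_to_celsius | interp 100)
boiling_proportion = pipe(212, (fahrenheit_to_celsius,), (interp, 100))
```

Here `interp` receives 100 as `higher` and the piped temperature as `x`, which is the same argument order as in the PRQL example.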
Roadmap
Late binding
Currently, functions require a binding to variables in scope; they can’t late-bind to column
names; so for example:
func return price -> (price - dividend) / price_yesterday
…isn’t yet a valid function, and would instead need to be:
func return price dividend price_yesterday -> (price - dividend) / (price_yesterday)
(which means functions aren’t useful in this case)
Tables
We can create a table — similar to a CTE in SQL — with table :
PRQL:
table top_50 = (
  from employees
  sort salary
  take 50
  aggregate [total_salary = sum salary]
)

from top_50      # Starts a new pipeline

SQL:
WITH table_0 AS (
  SELECT
    salary
  FROM
    employees
  ORDER BY
    salary
  LIMIT
    50
), top_50 AS (
  SELECT
    SUM(salary) AS total_salary
  FROM
    table_0
)
SELECT
  total_salary
FROM
  top_50
Note
The table expression requires surrounding parentheses. Without parentheses, the compiler
wouldn’t be able to evaluate where the expression stopped and the main pipeline started.
We can even place a whole CTE in an s-string, enabling us to use features which PRQL doesn’t
yet support.
PRQL:
table grouping = s"""
  SELECT SUM(a)
  FROM tbl
  GROUP BY
    GROUPING SETS
    ((b, c, d), (d), (b, d))
"""

from grouping

SQL:
WITH table_0 AS (
  SELECT
    SUM(a)
  FROM
    tbl
  GROUP BY
    GROUPING SETS ((b, c, d), (d), (b, d))
),
grouping AS (
  SELECT
    *
  FROM
    table_0 AS table_1
)
SELECT
  *
FROM
  grouping
Info
In PRQL, tables are far less common than CTEs are in SQL, since a linear series of CTEs can
be represented with a single pipeline.
Syntax     Usage                      Example
|          Pipelines                  from employees | select first_name
=          Assigns & Aliases          from e = employees
                                      derive total = (sum salary)
:          Named args & Parameters    interp lower:0 1600 sat_score
[]         Lists                      select [id, amount]
()         Precedence                 derive celsius = (fahrenheit - 32) / 1.8
'' & ""    Strings                    derive name = 'Mary'
` `        Quoted identifiers         select `first name`
#          Comments                   # A comment
@          Dates & Times              @2021-01-01
==         Equality                   filter [a == b, c != d, e > f]
==         Self-equality in join      join s=salaries [==id]
->         Function definitions       func add a b -> a + b
+/-        Sort order                 sort [-amount, +date]
??         Coalesce                   amount ?? 0
Pipes
Pipes — the connection between transforms that make up a pipeline — can be either line
breaks or a pipe character ( | ).
In almost all situations, line-breaks pipe the result of a line’s transform into the transform on
the following line. For example, the filter transform operates on the result of from
employees (which is just the employees table), and the select transform operates on the
result of the filter transform.
PRQL:
from employees
filter department == "Product"
select [first_name, last_name]

SQL:
SELECT
  first_name,
  last_name
FROM
  employees
WHERE
  department = 'Product'
In the place of a line-break, it’s also possible to use the | character to pipe results, such that
this is equivalent:
PRQL:
from employees | filter department == "Product" | select [first_name, last_name]

SQL:
SELECT
  first_name,
  last_name
FROM
  employees
WHERE
  department = 'Product'
A line-break doesn’t create a pipeline in a couple of cases:
within a list (e.g. the derive examples below),

when the following line is a new statement, which starts with a keyword of func , table
or from .
Lists
Lists are represented with [] , and can span multiple lines. A final trailing comma is optional.
PRQL:
from numbers
derive [x = 1, y = 2]
derive [
  a = x,
  b = y
]
derive [
  c = a,
  d = b,
]

SQL:
SELECT
  *,
  1 AS x,
  2 AS y,
  1 AS a,
  2 AS b,
  1 AS c,
  2 AS d
FROM
  numbers
Most transforms can take either a list or a single item, so these are equivalent:
PRQL:
from employees
select [first_name]

SQL:
SELECT
  first_name
FROM
  employees

PRQL:
from employees
select first_name

SQL:
SELECT
  first_name
FROM
  employees
Parentheses
Parentheses — () — are used to give precedence to inner expressions, as is the case in
almost all languages / math.
In particular, parentheses are used to nest pipelines for transforms such as group and
window , which take a pipeline. Here, the aggregate pipeline is applied to each group of
unique title and country values.
PRQL:
from employees
group [title, country] (
  aggregate [
    average salary,
    ct = count
  ]
)

SQL:
SELECT
  title,
  country,
  AVG(salary),
  COUNT(*) AS ct
FROM
  employees
GROUP BY
  title,
  country
Comments
Comments are represented by # . Currently only single line comments exist.
PRQL:
from employees  # Comment 1
# Comment 2
aggregate [average salary]

SQL:
SELECT
  AVG(salary)
FROM
  employees
Quoted identifiers
To use identifiers that are otherwise invalid, surround them with backticks. Depending on the
dialect, these will remain as backticks or be converted to double-quotes.
PRQL:
prql target:sql.mysql
from employees
select `first name`

SQL:
SELECT
  `first name`
FROM
  employees

PRQL:
prql target:sql.postgres
from employees
select `first name`

SQL:
SELECT
  "first name"
FROM
  employees
BigQuery also uses backticks to surround project & dataset names (even if valid identifiers) in
the SELECT statement:
PRQL:
prql target:sql.bigquery
from `project-foo.dataset.table`
join `project-bar.dataset.table` [==col_bax]

SQL:
SELECT
  `project-foo.dataset.table`.*,
  `project-bar.dataset.table`.*
FROM
  `project-foo.dataset.table`
  JOIN `project-bar.dataset.table` ON `project-foo.dataset.table`.col_bax = `project-bar.dataset.table`.col_bax
Parameters
PRQL will retain parameters like $1 in SQL output, which can then be supplied to the SQL
query:
PRQL:
from employees
filter id == $1

SQL:
SELECT
  *
FROM
  employees
WHERE
  id = $1
Query header: Target dialect & Version
Target dialect
PRQL allows specifying a target dialect at the top of the query, which allows PRQL to compile to
a database-specific SQL flavor.
Examples
PRQL:
prql target:sql.postgres

from employees
sort age
take 10

SQL:
SELECT
  *
FROM
  employees
ORDER BY
  age
LIMIT
  10

PRQL:
prql target:sql.mssql

from employees
sort age
take 10

SQL:
SELECT
  TOP (10) *
FROM
  employees
ORDER BY
  age
Supported dialects
Note
Note that dialect support is early — most differences are not implemented, and most
dialects’ implementations are identical to generic ’s. Contributions are very welcome.
sql.ansi
sql.bigquery
sql.clickhouse
sql.generic
sql.hive
sql.mssql
sql.mysql
sql.postgres
sql.sqlite
sql.snowflake
Version
PRQL allows specifying a version of the language in the PRQL header, like:
PRQL:
prql version:"0.3"

from employees

SQL:
SELECT
  *
FROM
  employees
This has two roles, one of which is implemented:
The compiler will raise an error if the compiler is older than the query version. This
prevents confusing errors when queries use newer features of the language but the
compiler hasn’t yet been upgraded.
The compiler will compile for the major version of the query. This allows the language to
evolve without breaking existing queries, or forcing multiple installations of the compiler.
This isn’t yet implemented, but is a gating feature for PRQL 1.0.
Transforms
PRQL queries are a pipeline of transformations (“transforms”), where each transform takes the
previous result and adjusts it in some way, before passing it on to the next transform.
Because PRQL focuses on modularity, we have far fewer transforms than SQL, each one
fulfilling a specific purpose. That’s often referred to as “orthogonality”.
These are the currently available transforms:
Transform    Purpose                                               SQL Equivalent
from         Starts from a table                                   FROM
derive       Computes new columns                                  SELECT *, ... AS ...
select       Picks & computes columns                              SELECT ... AS ...
filter       Picks rows based on their values                      WHERE, HAVING, QUALIFY
sort         Orders rows based on the values of columns            ORDER BY
join         Adds columns from another table, matching             JOIN
             rows based on a condition
take         Picks rows based on their position                    TOP, LIMIT, OFFSET
group        Partitions rows into groups and applies a             GROUP BY, PARTITION BY
             pipeline to each of them
aggregate    Summarizes many rows into one row                     SELECT foo(...)
window       Applies a pipeline to overlapping segments            OVER, ROWS, RANGE
             of rows
Aggregate
Summarizes many rows into one row.
When applied:
without group , it produces one row from the whole table,

within a group pipeline, it produces one row from each group.
aggregate [{expression or assign operations}]
Note
Currently, the declared aggregation functions are min , max , count , average , stddev ,
avg , sum and count_distinct . We are in the process of filling out the standard library.
Examples
PRQL:
from employees
aggregate [
  average salary,
  ct = count
]

SQL:
SELECT
  AVG(salary),
  COUNT(*) AS ct
FROM
  employees
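PRQL:
from employees
group [title, country] (
  aggregate [
    average salary,
    ct = count
  ]
)

SQL:
SELECT
  title,
  country,
  AVG(salary),
  COUNT(*) AS ct
FROM
  employees
GROUP BY
  title,
  country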
Aggregate is required
Unlike in SQL, using an aggregation function in derive or select (or any other transform
except aggregate ) will not trigger aggregation. By default, PRQL will interpret such
functions as window functions:
PRQL:
from employees
derive [avg_sal = average salary]

SQL:
SELECT
  *,
  AVG(salary) OVER () AS avg_sal
FROM
  employees
This ensures that derive does not manipulate the number of rows, but only ever adds a
column. For more information, see window transform.
Derive
Computes one or more new columns.
derive [
{new_name} = {expression},
# or
{expression}
]
Examples
PRQL:
from employees
derive gross_salary = salary + payroll_tax

SQL:
SELECT
  *,
  salary + payroll_tax AS gross_salary
FROM
  employees

PRQL:
from employees
derive [
  gross_salary = salary + payroll_tax,
  gross_cost = gross_salary + benefits_cost
]

SQL:
SELECT
  *,
  salary + payroll_tax AS gross_salary,
  salary + payroll_tax + benefits_cost AS gross_cost
FROM
  employees
Filter
Picks rows based on their values.
filter {boolean_expression}
Examples
PRQL:
from employees
filter age > 25

SQL:
SELECT
  *
FROM
  employees
WHERE
  age > 25

PRQL:
from employees
filter (age | in 25..40)

SQL:
SELECT
  *
FROM
  employees
WHERE
  age BETWEEN 25
  AND 40
From
Specifies a data source.
from {table_reference}
Examples
PRQL:
from employees

SQL:
SELECT
  *
FROM
  employees
To introduce an alias, use an assign expression:
PRQL:
from e = employees
select e.first_name

SQL:
SELECT
  first_name
FROM
  employees AS e
Group
Partitions the rows into groups and applies a pipeline to each of the groups.
group [{key_columns}] {pipeline}
The partitioning of groups is determined by the key_columns (first argument).
The most conventional use of group is with aggregate :
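PRQL:
from employees
group [title, country] (
  aggregate [
    average salary,
    ct = count
  ]
)

SQL:
SELECT
  title,
  country,
  AVG(salary),
  COUNT(*) AS ct
FROM
  employees
GROUP BY
  title,
  country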
In concept, a transform in context of a group does the same transformation to the group as it
would to the table — for example finding the employee who joined first across the whole table:
PRQL:
from employees
sort join_date
take 1

SQL:
SELECT
  *
FROM
  employees
ORDER BY
  join_date
LIMIT
  1
To find the employee who joined first in each department, it’s exactly the same pipeline, but
within a group expression:
PRQL:
from employees
group role (
  sort join_date  # taken from above
  take 1
)

SQL:
WITH table_1 AS (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY role
      ORDER BY
        join_date
    ) AS _expr_0
  FROM
    employees
)
SELECT
  *
FROM
  table_1
WHERE
  _expr_0 <= 1
Join
Adds columns from another table, matching rows based on a condition.
join side:{inner|left|right|full} {table} {[conditions]}
Parameters
side decides which rows to include, defaulting to inner .
Table reference
List of conditions
The result of the join operation is a cartesian (cross) product of rows from both tables,
which is then filtered to match all of these conditions.
If the column name is the same in both tables, the condition can be written as just ==col .
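The description of a join as a filtered cross product can be made concrete with a small Python model. This is a conceptual sketch only: the sample rows and the `inner_join` helper are assumptions for illustration, it covers only the default inner side, and real databases do not materialise the full product.

```python
from itertools import product

employees = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
positions = [
    {"employee_id": 1, "title": "Engineer"},
    {"employee_id": 3, "title": "Analyst"},
]


def inner_join(left, right, conditions):
    """Cross product of both tables, keeping pairs that satisfy every condition."""
    return [
        {**l, **r}
        for l, r in product(left, right)
        if all(cond(l, r) for cond in conditions)
    ]


# Models: join positions [employees.id==positions.employee_id]
joined = inner_join(
    employees,
    positions,
    [lambda e, p: e["id"] == p["employee_id"]],
)
```

Only the pair with matching ids survives; employee 2 and position 3 have no partner and are dropped, as an inner join would drop them.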
Examples
PRQL:
from employees
join side:left positions [employees.id==positions.employee_id]

SQL:
SELECT
  employees.*,
  positions.*
FROM
  employees
  LEFT JOIN positions ON employees.id = positions.employee_id

PRQL:
from employees
join side:left p=positions [employees.id==p.employee_id]

SQL:
SELECT
  employees.*,
  p.*
FROM
  employees
  LEFT JOIN positions AS p ON employees.id = p.employee_id
Self equality operator

If the join conditions are of the form left.x == right.x , we can use the self-equality operator:
PRQL:
from employees
join positions [==emp_no]

SQL:
SELECT
  employees.*,
  positions.*
FROM
  employees
  JOIN positions ON employees.emp_no = positions.emp_no
Select
Picks and computes columns.
select [
{new_name} = {expression},
# or
{expression}
]
Examples
PRQL:
from employees
select name = f"{first_name} {last_name}"

SQL:
SELECT
  CONCAT(first_name, ' ', last_name) AS name
FROM
  employees

PRQL:
from employees
select [
  name = f"{first_name} {last_name}",
  age_eoy = dob - @2022-12-31,
]

SQL:
SELECT
  CONCAT(first_name, ' ', last_name) AS name,
  dob - DATE '2022-12-31' AS age_eoy
FROM
  employees
PRQL SQL
FROM
employees
PRQL:
from e=employees
select [e.first_name, e.last_name]

SQL:
SELECT
  first_name,
  last_name
FROM
  employees AS e
Note
In the final example above, the e representing the table / namespace is no longer available
after the select statement. For example, this would raise an error:
from e=employees
select e.first_name
filter e.first_name == "Fred" # Can't find `e.first_name`
To refer to the e.first_name column in subsequent transforms, either refer to it using
first_name , or if it requires a different name, assign one in the select statement:
PRQL:
from e=employees
select fname = e.first_name
filter fname == "Fred"

SQL:
WITH table_1 AS (
  SELECT
    first_name AS fname
  FROM
    employees AS e
)
SELECT
  fname
FROM
  table_1
WHERE
  fname = 'Fred'
Concat & Union

Note
concat & union are currently experimental and may have bugs; please report any as
GitHub Issues.
Concat
concat concatenates two tables together, like UNION ALL in SQL. The number of rows is
always the sum of the number of rows from the two input tables.
PRQL:
from employees_1
concat employees_2

SQL:
(
  SELECT
    *
  FROM
    employees_1
)
UNION ALL
SELECT
  *
FROM
  employees_2
Union
union takes the union of rows, where duplicates are discarded (using the definition of union
from set logic), like UNION DISTINCT in SQL. If all rows are different between the tables, this is
synonymous with concat ; if there are duplicate rows it will produce fewer rows.
PRQL:
from employees_1
union employees_2

SQL:
(
  SELECT
    *
  FROM
    employees_1
)
UNION DISTINCT
SELECT
  *
FROM
  employees_2
Roadmap
We’d also like to implement the set operations of intersect and difference .
Sort
Orders rows based on the values of one or more columns.
sort [{direction}{column}]
Parameters
One column or a list of columns to sort by
Each column can be prefixed with:
+ , for ascending order, the default
- , for descending order
When using prefixes, even a single column needs to be in a list or parentheses.
(Otherwise, sort -foo is parsed as a subtraction between sort and foo .)
Examples
PRQL:
from employees
sort age

SQL:
SELECT
  *
FROM
  employees
ORDER BY
  age

PRQL:
from employees
sort [-age]

SQL:
SELECT
  *
FROM
  employees
ORDER BY
  age DESC

PRQL:
from employees
sort [age, -tenure, +salary]

SQL:
SELECT
  *
FROM
  employees
ORDER BY
  age,
  tenure DESC,
  salary

We can also use expressions:
PRQL:
from employees
sort [s"substr({first_name}, 2, 5)"]

SQL:
WITH table_1 AS (
  SELECT
    *,
    substr(first_name, 2, 5) AS _expr_0
  FROM
    employees
  ORDER BY
    _expr_0
)
SELECT
  *
FROM
  table_1
Notes
Ordering guarantees
Most DBs will persist ordering through most transforms; for example, you can expect this
result to be ordered by tenure .
PRQL:
from employees
sort tenure
derive name = f"{first_name} {last_name}"

SQL:
SELECT
  *,
  CONCAT(first_name, ' ', last_name) AS name
FROM
  employees
ORDER BY
  tenure
But:
This is an implementation detail of the DB. If there are instances where this doesn’t hold,
please open an issue, and we’ll consider how to manage it.
Some transforms which change the existence of rows, such as join or group , won’t
persist ordering; for example:
PRQL:
from employees
sort tenure
join locations [==employee_id]

SQL:
WITH table_1 AS (
  SELECT
    *
  FROM
    employees
  ORDER BY
    tenure
)
SELECT
  table_1.*,
  locations.*
FROM
  table_1
  JOIN locations ON table_1.employee_id = locations.employee_id
See Issue #1363 for more details.
Take
Picks rows based on their position.
take {n|range}
See Ranges for more details on how ranges work.
Examples
PRQL:
from employees
take 10

SQL:
SELECT
  *
FROM
  employees
LIMIT
  10

PRQL:
from orders
sort [-value, date]
take 101..110

SQL:
SELECT
  *
FROM
  orders
ORDER BY
  value DESC,
  date
LIMIT
  10 OFFSET 100
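The arithmetic behind the range example can be checked with a short Python sketch. It assumes only what the example shows, namely that a range is 1-based and inclusive at both ends; the helper name is hypothetical and this is not the compiler's code.

```python
def take_to_limit_offset(start, end):
    """Translate an inclusive, 1-based range start..end into (LIMIT, OFFSET)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return end - start + 1, start - 1


# take 101..110 keeps rows 101 through 110: ten rows after skipping one hundred.
limit, offset = take_to_limit_offset(101, 110)
```

With these inputs `limit` is 10 and `offset` is 100, matching the `LIMIT 10 OFFSET 100` in the SQL above.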
Window
Applies a pipeline to segments of rows, producing one output value for every input value.
window rows:{range} range:{range} expanding:false rolling:0 {pipeline}
For each row, the segment over which the pipeline is applied is determined by one of:
rows , which takes a range of rows relative to the current row position.
0 references the current row.
range , which takes a range of values relative to current row value.
The bounds of the range are inclusive. If a bound is omitted, the segment will extend until the
edge of the table or group.
For ease of use, there are two flags that override rows or range :
expanding:true is an alias for rows:..0 . A sum using this window is also known as
“cumulative sum”.
rolling:n is an alias for rows:(-n+1)..0 , where n is an integer. This will include the last n
values, including the current row. An average using this window is also known as a Simple
Moving Average.
Some examples:
Expression        Meaning
rows:0..2         current row plus two following
rows:-2..0        two preceding rows plus current row
rolling:3         (same as previous)
rows:-2..4        two preceding rows plus current row plus four following rows
rows:..0          all rows from the start of the table up to & including current row
expanding:true    (same as previous)
rows:0..          current row and all following rows until the end of the table
rows:..           all rows, which is the same as not having window at all
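To see how the `expanding` and `rolling` flags reduce to row ranges, here is a minimal Python model. It is a sketch under stated assumptions: the `frame` and `window_sum` helpers are hypothetical, `None` stands for an omitted bound, and it models only `rows` ranges over a single ordered list.

```python
def frame(rows=(None, None), expanding=False, rolling=0):
    """Return (start, end) offsets relative to the current row; None means unbounded."""
    if expanding:
        return (None, 0)          # alias for rows:..0
    if rolling:
        return (-rolling + 1, 0)  # alias for rows:(-n+1)..0
    return rows


def window_sum(values, bounds):
    """Sum the inclusive segment around each row, clipped to the table edges."""
    start, end = bounds
    n = len(values)
    out = []
    for i in range(n):
        lo = 0 if start is None else max(0, i + start)
        hi = n if end is None else max(0, min(n, i + end + 1))
        out.append(sum(values[lo:hi]))
    return out


paychecks = [1, 2, 3, 4]
rolling_3 = window_sum(paychecks, frame(rolling=3))
cumulative = window_sum(paychecks, frame(expanding=True))
```

`rolling_3` sums at most the last three values at each row, and `cumulative` is the running total described above as a cumulative sum.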
Example
PRQL:
from employees
group employee_id (
  sort month
  window rolling:12 (
    derive [trail_12_m_comp = sum paycheck]
  )
)

SQL:
SELECT
  *,
  SUM(paycheck) OVER (
    PARTITION BY employee_id
    ORDER BY
      month ROWS BETWEEN 11 PRECEDING
      AND CURRENT ROW
  ) AS trail_12_m_comp
FROM
  employees

PRQL:
from orders
sort day
window rows:-3..3 (
  derive [centered_weekly_average = average value]
)
group [order_month] (
  sort day
  window expanding:true (
    derive [monthly_running_total = sum value]
  )
)

SQL:
SELECT
  *,
  AVG(value) OVER (
    ORDER BY
      day ROWS BETWEEN 3 PRECEDING
      AND 3 FOLLOWING
  ) AS centered_weekly_average,
  SUM(value) OVER (
    PARTITION BY order_month
    ORDER BY
      day ROWS BETWEEN UNBOUNDED PRECEDING
      AND CURRENT ROW
  ) AS monthly_running_total
FROM
  orders
Windowing by default
If you use window functions without a window transform, they will be applied to the whole table.
Unlike in SQL, they will remain window functions and will not trigger aggregation.
PRQL:
from employees
sort age
derive rnk = rank

SQL:
SELECT
  *,
  RANK() OVER (
    ORDER BY
      age ROWS BETWEEN UNBOUNDED PRECEDING
      AND UNBOUNDED FOLLOWING
  ) AS rnk
FROM
  employees
ORDER BY
  age
You can also apply only group :
PRQL:
from employees
group department (
  sort age
  derive rnk = rank
)

SQL:
SELECT
  *,
  RANK() OVER (
    PARTITION BY department
    ORDER BY
      age ROWS BETWEEN UNBOUNDED PRECEDING
      AND UNBOUNDED FOLLOWING
  ) AS rnk
FROM
  employees
Window functions as first class citizens

There are no limitations on where windowed expressions can be used:
PRQL:
from employees
filter salary < (average salary)

SQL:
WITH table_1 AS (
  SELECT
    *,
    AVG(salary) OVER () AS _expr_0
  FROM
    employees
)
SELECT
  *
FROM
  table_1
WHERE
  salary < _expr_0
Coalesce
We can coalesce values with the ?? operator. Coalescing takes either the first value or, if that
value is null, the second value.
PRQL:
from orders
derive amount ?? 0

SQL:
SELECT
  *,
  COALESCE(amount, 0)
FROM
  orders
Dates & Times

PRQL uses @ followed by a string to represent dates & times. This is less verbose than SQL’s
approach of TIMESTAMP '2004-10-19 10:23:54' and more explicit than SQL’s implicit option of
just using a string '2004-10-19 10:23:54' .
Note
Currently PRQL passes strings that can be compiled straight through to the database, so
many compatible string formats may work, but we may refine this in the future to aid
compatibility across databases. We’ll always support the canonical ISO8601 format
described below.
Dates
Dates are represented by @{yyyy-mm-dd} — a @ followed by the date format.
PRQL:
from employees
derive age_at_year_end = (@2022-12-31 - dob)

SQL:
SELECT
  *,
  DATE '2022-12-31' - dob AS age_at_year_end
FROM
  employees
Times
Times are represented by @{HH:mm:ss.SSS±Z} with any parts not supplied being rounded to
zero, including the timezone, which is represented by +HH:mm , -HH:mm or Z . This is consistent
with the ISO8601 time format.
PRQL:
from orders
derive should_have_shipped_today = (order_time < @08:30)

SQL:
SELECT
  *,
  order_time < TIME '08:30' AS should_have_shipped_today
FROM
  orders
Timestamps
Timestamps are represented by @{yyyy-mm-ddTHH:mm:ss.SSS±Z} / @{date}T{time} , with any
time parts not supplied being rounded to zero, including the timezone, which is represented by
+HH:mm , -HH:mm or Z . This is @ followed by the ISO8601 datetime format, which uses T to
separate date & time.
PRQL:
from commits
derive first_prql_commit = @2020-01-01T13:19:55-0800

SQL:
SELECT
  *,
  TIMESTAMP '2020-01-01T13:19:55-0800' AS first_prql_commit
FROM
  commits
Intervals
Intervals are represented by {N}{periods} , such as 2years or 10minutes , without a space.
Note
These aren’t the same as ISO8601, because we evaluated P3Y6M4DT12H30M5S to be difficult
to understand, but we could support a simplified form if there’s demand for it. We don’t
currently support compound expressions, for example 2years10months , but most DBs will
allow 2years + 10months . Please raise an issue if this is inconvenient.
PRQL:
from projects
derive first_check_in = start + 10days

SQL:
SELECT
  *,
  start + INTERVAL 10 DAY AS first_check_in
FROM
  projects
Examples
Here’s a fuller list of examples:
@20221231 is forbidden — it must contain full punctuation ( - and : ),

@2022-12-31 is a date
@2022-12 or @2022 are forbidden — SQL can’t express a month, only a date
@16:54:32.123456 is a time
@16:54:32 , @16:54 , @16 are all allowed, expressing @16:54:32.000000 ,
@16:54:00.000000 , @16:00:00.000000 respectively
@2022-12-31T16:54:32.123456 is a timestamp without timezone
@2022-12-31T16:54:32.123456Z is a timestamp in UTC
@2022-12-31T16:54+02 is a timestamp in UTC+2
@2022-12-31T16:54+02:00 and @2022-12-31T16:54+02 are datetimes in UTC+2
@16:54+02 is forbidden — time is always local, so it cannot have a timezone
@2022-12-31+02 is forbidden — date is always local, so it cannot have a timezone
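The rules in this list can be approximated with three regular expressions. The Python sketch below is an illustration of the listed cases only, not PRQL's actual parser: the `classify` helper and the patterns are assumptions, and it checks shape only (it would accept a month of 13).

```python
import re

DATE = r"\d{4}-\d{2}-\d{2}"
# Hours are required; minutes, seconds and fractional seconds are optional.
TIME = r"\d{2}(:\d{2}(:\d{2}(\.\d{1,6})?)?)?"
# Z, or an offset such as +02, +02:00 or -0800.
ZONE = r"(Z|[+-]\d{2}(:?\d{2})?)"


def classify(literal):
    """Return 'date', 'time', 'timestamp' or 'forbidden' for an @-literal."""
    if not literal.startswith("@"):
        return "forbidden"
    body = literal[1:]
    if re.fullmatch(DATE, body):
        return "date"
    if re.fullmatch(TIME, body):
        return "time"
    # Only timestamps may carry a timezone.
    if re.fullmatch(DATE + "T" + TIME + ZONE + "?", body):
        return "timestamp"
    return "forbidden"


kinds = {lit: classify(lit) for lit in ["@2022-12-31", "@16:54", "@16:54+02"]}
```

Dates and times never match the zone pattern, which is how this sketch rejects `@16:54+02` and `@2022-12-31+02`.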
Roadmap
Datetimes
Datetimes are supported by some databases (e.g. MySQL, BigQuery) in addition to timestamps.
When we have type annotations, these will be represented by a timestamp annotated as a
datetime:
derive pi_day = @2017-03-14T15:09:26.535898<datetime>
These are some examples we can then add:
@2022-12-31T16:54<datetime> is datetime without timezone

@2022-12-31<datetime> is forbidden — datetime must specify time
@16:54<datetime> is forbidden — datetime must specify date
Distinct
PRQL doesn’t have a specific distinct keyword. Instead, use group and take 1 :
PRQL:
from employees
select department
group department (
  take 1
)

SQL:
SELECT
  DISTINCT department
FROM
  employees

This also works without a linebreak:
PRQL:
from employees
select department
group department (take 1)

SQL:
SELECT
  DISTINCT department
FROM
  employees
Selecting from each group

We can select a single row from each group by combining group , sort and take :
PRQL:
# youngest employee from each department
from employees
group department (
  sort age
  take 1
)

SQL:
WITH table_1 AS (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY department
      ORDER BY
        age
    ) AS _expr_0
  FROM
    employees
)
SELECT
  *
FROM
  table_1
WHERE
  _expr_0 <= 1
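The effect of the generated `ROW_NUMBER()` filter can be mimicked in Python: order the rows, then keep the first row seen for each group. This is a sketch with made-up sample data and a hypothetical helper name; ties are broken by input order here, whereas SQL leaves the choice among ties unspecified.

```python
employees = [
    {"name": "Ada", "department": "Eng", "age": 36},
    {"name": "Alan", "department": "Eng", "age": 41},
    {"name": "Grace", "department": "Ops", "age": 45},
    {"name": "Edsger", "department": "Ops", "age": 30},
]


def take_first_per_group(rows, key, sort_by):
    """Mimic ROW_NUMBER() OVER (PARTITION BY key ORDER BY sort_by) <= 1."""
    first = {}
    for row in sorted(rows, key=lambda r: r[sort_by]):
        first.setdefault(row[key], row)  # keep only the first row seen per group
    return list(first.values())


youngest = take_first_per_group(employees, "department", "age")
```

The result holds one row per department, namely the youngest employee in each.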
Roadmap
When using the Postgres dialect, we are planning to compile:
# youngest employee from each department

from employees
group department (
sort age
take 1
)
… to …
SELECT DISTINCT ON (department) *

FROM employees
ORDER BY department, age
F-Strings
f-strings are a readable approach to building new strings from existing strings. Currently PRQL
supports this for concatenating strings:
PRQL:
from employees
select full_name = f"{first_name} {last_name}"

SQL:
SELECT
  CONCAT(first_name, ' ', last_name) AS full_name
FROM
  employees
This can be much easier to read for longer strings, relative to the SQL approach:
PRQL:
from web
select url = f"http{tls}://www.{domain}.{tld}/{page}"

SQL:
SELECT
  CONCAT(
    'http',
    tls,
    '://www.',
    domain,
    '.',
    tld,
    '/',
    page
  ) AS url
FROM
  web
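The mapping from an f-string to `CONCAT` arguments can be illustrated with Python's standard `string.Formatter`, which splits a template into literal text and field names. This is a hypothetical sketch of the idea, not the PRQL compiler's implementation, and the helper name is an assumption.

```python
from string import Formatter


def fstring_to_concat(template):
    """Turn literal text into quoted SQL strings and {fields} into bare column names."""
    args = []
    for literal, field, _spec, _conversion in Formatter().parse(template):
        if literal:
            # Double any single quotes so the literal stays a valid SQL string.
            args.append("'" + literal.replace("'", "''") + "'")
        if field is not None:
            args.append(field)
    return "CONCAT(" + ", ".join(args) + ")"


sql = fstring_to_concat("http{tls}://www.{domain}.{tld}/{page}")
```

The resulting argument list alternates literals and columns in the same order as the SQL shown above.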
Roadmap
In the future, f-strings may incorporate string formatting such as datetimes, numbers, and
padding. If there’s a feature that would be helpful, please post an issue.
Null handling
SQL has an unconventional way of handling NULL values, since it treats them as unknown
values. As a result, in SQL:
NULL is not a value indicating a missing entry, but a placeholder for anything possible,
NULL = NULL evaluates to NULL , since one cannot know if one unknown is equal to
another unknown,
NULL <> NULL evaluates to NULL , using same logic,
to check if a value is NULL , SQL introduces IS NULL and IS NOT NULL operators,
DISTINCT column may return multiple NULL values.
For more information, check out the Postgres documentation.
PRQL, on the other hand, treats null as a value, which means that:
null == null evaluates to true ,

null != null evaluates to false ,
distinct column cannot contain multiple null values.
PRQL:
from employees
filter first_name == null
filter null != last_name

SQL:
SELECT
  *
FROM
  employees
WHERE
  first_name IS NULL
  AND last_name IS NOT NULL
Note that PRQL doesn’t change how NULL is compared between columns, for example in joins.
(PRQL compiles to SQL and so can’t change the behavior of the database).
For more context or to provide feedback check out the discussion on issue #99.
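The difference between the two models can be sketched in Python, using `None` for null. This is an illustration only: the three helper functions are hypothetical, and per the note above the PRQL behaviour applies to comparisons against the null literal, not to comparisons between columns.

```python
def sql_equals(a, b):
    """SQL three-valued logic: comparing with NULL yields NULL (None here)."""
    if a is None or b is None:
        return None
    return a == b


def prql_equals(a, b):
    """PRQL treats null as a value, so null == null is true."""
    return a == b


def compile_null_comparison(column, op):
    """Render `column == null` or `column != null` as in the example above."""
    if op == "==":
        return f"{column} IS NULL"
    if op == "!=":
        return f"{column} IS NOT NULL"
    raise ValueError("only == and != are modelled")


where_clause = " AND ".join([
    compile_null_comparison("first_name", "=="),
    compile_null_comparison("last_name", "!="),
])
```

`where_clause` reproduces the `WHERE` condition in the compiled SQL of the example.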
Ranges
PRQL has a concise range syntax start..end . If only one of start & end is supplied, the
range is open on the empty side.
Ranges can be used in filters with the in function, with any type of literal, including dates:
PRQL:
from events
filter (date | in @1776-07-04..@1787-09-17)
filter (magnitude | in 50..100)
derive is_northern = (latitude | in 0..)

SQL:
SELECT
  *,
  latitude >= 0 AS is_northern
FROM
  events
WHERE
  date BETWEEN DATE '1776-07-04'
  AND DATE '1787-09-17'
  AND magnitude BETWEEN 50
  AND 100
Like in SQL, ranges are inclusive.
As discussed in the take docs, ranges can also be used in take :
PRQL:
from orders
sort [-value, date]
take 101..110

SQL:
SELECT
  *
FROM
  orders
ORDER BY
  value DESC,
  date
LIMIT
  10 OFFSET 100
Note
Half-open ranges are generally less intuitive to read than a simple >= or <= operator.
Roadmap
We’d like to use ranges for other types, such as whether an object is in an array or list literal.
S-Strings
An s-string inserts SQL directly, as an escape hatch when there’s something that PRQL doesn’t
yet implement. For example, PRQL has no function that returns the Postgres version, so to
call Postgres’s version() function we use an s-string:
PRQL:
from my_table
select db_version = s"version()"

SQL:
SELECT
  version() AS db_version
FROM
  my_table
We can embed columns in an s-string using braces. For example, PRQL’s standard library
defines the average function as:
func average column -> s"AVG({column})"
So this compiles using the function:
PRQL:
from employees
aggregate [average salary]

SQL:
SELECT
  AVG(salary)
FROM
  employees
Here’s an example of a more involved use of an s-string:
PRQL:
from de=dept_emp
join s=salaries side:left [
  (s.emp_no == de.emp_no),
  s"""({s.from_date}, {s.to_date})
  OVERLAPS
  ({de.from_date}, {de.to_date})"""
]

SQL:
SELECT
  de.*,
  s.*
FROM
  dept_emp AS de
  LEFT JOIN salaries AS s ON s.emp_no = de.emp_no
  AND (s.from_date, s.to_date) OVERLAPS (de.from_date, de.to_date)
For those who have used Python, s-strings are similar to Python’s f-strings, but the result is SQL
code, rather than a string literal. For example, with col set to salary, a Python f-string of
f"average({col})" would produce "average(salary)" , with quotes; while in PRQL,
s"average({col})" produces average(salary) , without quotes.
We can also use s-strings to produce a full table:
PRQL:
from s"SELECT DISTINCT ON first_name, id, age FROM employees ORDER BY age ASC"
join s = s"SELECT * FROM salaries" [==id]

SQL:
WITH table_2 AS (
  SELECT
    DISTINCT ON first_name,
    id,
    age
  FROM
    employees
  ORDER BY
    age ASC
),
table_3 AS (
  SELECT
    *
  FROM
    salaries
)
SELECT
  table_0.*,
  table_1.*
FROM
  table_2 AS table_0
  JOIN table_3 AS table_1 ON table_0.id = table_1.id
Note
S-strings in user code are intended as an escape-hatch for an unimplemented feature. If we

often need s-strings to express something, that’s a sign we should implement it in PRQL or
PRQL’s stdlib.
Braces
To output braces from an s-string, use double braces:
PRQL SQL
derive [ *,
has_valid_title = regexp_contains(title, '([a-z0-9]*-)
s"regexp_contains(title, '([a-z0-9]*-) {2,}') AS has_valid_title
{{2,}}')" FROM
] employees
Precedence
The PRQL compiler simply places a literal copy of each variable into the resulting string, which
means we may get surprising behavior when the variable has multiple terms and the s-string
isn’t parenthesized.
In this toy example, the salary + benefits / 365 gets precedence wrong:
PRQL SQL
derive [ *,
gross_salary = salary + benefits, salary + benefits AS gross_salary,
daily_rate = s"{gross_salary} / 365" salary + benefits / 365 AS daily_rate
] FROM
employees
Instead, we’d need to put the numerator {gross_salary} in parentheses:
PRQL SQL
derive [ *,
gross_salary = salary + benefits, salary + benefits AS gross_salary,
daily_rate = s"({gross_salary}) / 365" (salary + benefits) / 365 AS
] daily_rate
FROM
employees
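The literal-copy behavior can be imitated in Python, whose str.format happens to share the same brace conventions ({name} for substitution, {{ and }} for literal braces). This is only an analogy under that assumption, not how the PRQL compiler is implemented, and render_s_string is a hypothetical name.

```python
def render_s_string(template: str, **columns: str) -> str:
    """Paste each {name} into the template verbatim; {{ and }} yield literal braces."""
    return template.format(**columns)


gross_salary = "salary + benefits"

# Without parentheses, the pasted expression mixes with the division.
print(render_s_string("{gross_salary} / 365", gross_salary=gross_salary))
# salary + benefits / 365

# Parenthesizing the placeholder keeps the intended grouping.
print(render_s_string("({gross_salary}) / 365", gross_salary=gross_salary))
# (salary + benefits) / 365

# Doubled braces survive as single braces in the output.
print(render_s_string("regexp_contains(title, '([a-z0-9]*-){{2,}}')"))
# regexp_contains(title, '([a-z0-9]*-){2,}')
```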
Strings
Strings in PRQL can use either single or double quotes:
PRQL SQL
select x = "hello world" 'hello world' AS x
FROM
my_table
PRQL SQL
select x = 'hello world' 'hello world' AS x
FROM
my_table
To quote a string containing quotes, either use the “other” type of quote, or use three-or-more
quotes, and close with the same number.
PRQL SQL
select x = '"hello world"' '"hello world"' AS x
FROM
my_table
PRQL SQL
select x = """I said "hello world"!""" 'I said "hello world"!' AS x
FROM
my_table
PRQL SQL
select x = """""I said """hello 'I said """hello world"""!' AS x
world"""!""""" FROM
my_table
Note
Currently PRQL does not adjust escape characters.
Warning
Currently PRQL allows multiline strings with either a single character or multiple character
quotes. This may change for strings using a single character quote in future versions.
Switch
Note
switch is currently experimental and may change behavior in the near future
PRQL uses switch for both SQL’s CASE and IF statements. Here’s an example:
PRQL SQL
derive distance = switch [ *,
city == "Calgary" -> 0, CASE
city == "Edmonton" -> 300, WHEN city = 'Calgary' THEN 0
] WHEN city = 'Edmonton' THEN 300
ELSE NULL
END AS distance
FROM
employees
If no condition is met, the value takes a null value. To set a default, use a true condition:
PRQL SQL
derive distance = switch [ *,
city == "Calgary" -> 0, CASE
city == "Edmonton" -> 300, WHEN city = 'Calgary' THEN 0
true -> "Unknown", WHEN city = 'Edmonton' THEN 300
] ELSE 'Unknown'
END AS distance
FROM
employees
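The shape of this translation can be sketched in Python. The function below is a hypothetical illustration, not the compiler's code: it takes conditions that are already written as SQL, treats a condition of true as the default, and otherwise falls back to ELSE NULL, as in the two examples above.

```python
from typing import List, Tuple


def switch_to_case(arms: List[Tuple[str, str]]) -> str:
    """Render (condition, value) arms as a single-line SQL CASE expression."""
    parts = ["CASE"]
    default = "NULL"  # used when no arm has a `true` condition
    for condition, value in arms:
        if condition == "true":
            # A `true` condition always matches, so it acts as the ELSE branch
            # and any later arms could never be reached.
            default = value
            break
        parts.append(f"WHEN {condition} THEN {value}")
    parts.append(f"ELSE {default}")
    parts.append("END")
    return " ".join(parts)


print(switch_to_case([
    ("city = 'Calgary'", "0"),
    ("city = 'Edmonton'", "300"),
    ("true", "'Unknown'"),
]))
# CASE WHEN city = 'Calgary' THEN 0 WHEN city = 'Edmonton' THEN 300 ELSE 'Unknown' END
```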
Standard Library
The standard library currently contains commonly used functions that are used in SQL. It’s not
yet as broad as we’d like, and we’re very open to expanding it.
Currently s-strings are an escape-hatch for any function that isn’t in our standard library. If we
find ourselves using them for something frequently, raise an issue and we’ll add it to the stdlib.
Note
Currently the stdlib implementation doesn’t support different DB implementations itself;

those need to be built deeper into the compiler. We’ll resolve this at some point. Until then,
we’ll only add functions here that are broadly supported by most DBs.
Here’s the source of the current PRQL std :
# Aggregate Functions
func min <scalar|column> column -> null
func max <scalar|column> column -> null
func sum <scalar|column> column -> null
func avg <scalar|column> column -> null
func stddev <scalar|column> column -> null
func average <scalar|column> column -> null
func count <scalar|column> non_null:s"*" -> null
# TODO: Possibly make this into `count distinct:true` (or like `distinct:` as an
# abbreviation of that?)
func count_distinct <scalar|column> column -> null
# Window functions
func lag<column> offset column -> null
func lead<column> offset column -> null
func first<column> offset column -> null
func last<column> offset column -> null
func rank<column> -> null
func rank_dense<column> -> null
func row_number<column> -> null
# Other functions
func round<scalar> n_digits column -> null
func as<scalar> `noresolve.type` column -> null
func in<bool> pattern value -> null
# Transform type definitions

func from<table> `default_db.source`<table> -> null
func select<table> columns<column> tbl<table> -> null
func filter<table> condition<bool> tbl<table> -> null
func derive<table> columns<column> tbl<table> -> null
func aggregate<table> a<column> tbl<table> -> null
func sort<table> by tbl<table> -> null
func take<table> expr tbl<table> -> null
func join<table> `default_db.with`<table> filter `noresolve.side`:inner tbl<table>
-> null
func concat<table> `default_db.bottom`<table> top<table> -> null
func union<table> `default_db.bottom`<table> top<table> -> (
top | concat _param.bottom | group [`*`] (take 1)
)
func group<table> by pipeline tbl<table> -> null
func window<table> rows:0..0 range:0..0 expanding:false rolling:0 pipeline
tbl<table> -> null
And a couple of examples:
PRQL SQL
derive [ *,
gross_salary = (salary + payroll_tax | CAST(salary + payroll_tax AS int) AS
as int), gross_salary,
gross_salary_rounded = (gross_salary | ROUND(CAST(salary + payroll_tax AS
round 0), int), 0) AS gross_salary_rounded,
time = s"NOW()", # an s-string, given NOW() AS time
no `now` function exists in PRQL FROM
] employees
Java (prql-java)
prql-java offers Java bindings to the prql-compiler Rust library. It exposes a Java native
method public static native String toSql(String query) .
Installation
<dependency>
<groupId>org.prqllang</groupId>
<artifactId>prql-java</artifactId>
<version>${PRQL_VERSION}</version>
</dependency>
Usage
import org.prqllang.prql4j.PrqlCompiler;
class Main {
public static void main(String[] args) {
String sql = PrqlCompiler.toSql("from table");
System.out.println(sql);
}
}
Javascript (prql-js)
JavaScript bindings for prql-compiler . Check out https://prql-lang.org for more context.
Installation
npm install prql-js
Usage
Currently these functions are exposed:
function compile(prql_string) # returns CompileResult
function to_sql(prql_string) # returns SQL string
function to_json(prql_string) # returns JSON string (needs JSON.parse() to get the JSON)
From NodeJS
const prql = require("prql-js");
const { sql, error } = prql.compile(`from employees | select first_name`);
console.log(sql);
// handle error as well...
From a Browser
<html>
<head>
<script src="./node_modules/prql-js/dist/web/prql_js.js"></script>
<script>
const { compile } = wasm_bindgen;
async function run() {

await wasm_bindgen("./node_modules/prql-js/dist/web/prql_js_bg.wasm");
const { sql, error } = compile("from employees | select first_name");
console.log(sql);
}
run();
</script>
</head>
<body></body>
</html>
From a Framework or a Bundler
import compile from "prql-js/dist/bundler";
const { sql, error } = compile(`from employees | select first_name`);

console.log(sql);
Notes
This uses wasm-pack to generate bindings, though we would be very open to other approaches,
and used trunk successfully in a rust-driven approach to this (RIP prql-web ).
Development
Build:
npm run build
This builds Node, bundler and web packages in the dist path.
Test:
wasm-pack test --firefox
Python (prql-python)
Installation
pip install prql-python
Usage
import prql_python as prql
prql_query = """
from employees
join salaries [==emp_id]
group [dept_id, gender] (
aggregate [
avg_salary = average salary
]
)
"""
sql = prql.to_sql(prql_query)
R (prqlr)
R bindings for prql-compiler . Check out https://eitsupi.github.io/prqlr/ for more context.
Note
prqlr is generously maintained by @eitsupi in the eitsupi/prqlr repo.
Installation
install.packages("prqlr", repos = "https://eitsupi.r-universe.dev")
Usage
library(prqlr)
"
from employees
join salaries [==emp_id]
group [dept_id, gender] (
aggregate [
avg_salary = average salary
]
)
" |>
prql_to_sql()
Rust (prql-compiler)
Installation
cargo new myproject

cd myproject
cargo add prql-compiler
Usage
cargo run
src/main.rs
use prql_compiler::compile;
fn main() {
let prql = "from employees | select [name,age] ";
let sql = compile(prql).unwrap();
println!("{:?}", sql.replace("\n", " "));
}
Cargo.toml
[package]
name = "myproject"
version = "0.1.0"
edition = "2021"
[dependencies]
prql-compiler = "0.2.2"
Internals
This chapter explains PRQL’s semantics: how expressions are interpreted and their meaning.
It’s intended for advanced users and compiler contributors.
Name resolving
Because PRQL primarily handles relational data, it has specialized scoping rules for referencing
columns.
Scopes
In PRQL’s compiler, a scope is the collection of all names one can reference from a specific
point in the program.
In PRQL, names in the scope are composed from namespace and variable name which are
separated by a dot, similar to SQL. Namespaces can contain many dots, but variable names
cannot.
Example
Name my_table.some_column is a variable some_column from namespace my_table .
Name foo.bar.baz is a variable baz from namespace foo.bar .
When processing a query, a scope is maintained and updated for each point in the query.
It starts with only the namespace std , which is the standard library. It contains common functions
like sum or count , along with all transform functions such as derive and group .
In pipelines (or rather in transform functions), scope is also injected with namespaces of tables
which may have been referenced with from or join transforms. These namespaces contain
simply all the columns of the table and possibly a wildcard variable, which matches any
variable (see the algorithm below). Within transforms, there is also a special namespace that
does not have a name. It is called a “frame” and it contains columns of the current table the
transform is operating on.
Resolving
For each ident we want to resolve, we search the scope’s items in order. One of three things
can happen:
The scope contains an exact match, i.e. a name that matches in both the namespace and the
variable name.
The scope does not contain an exact match, but the ident did not specify a namespace, so we
can match a namespace that contains a * wildcard. If there’s a single such namespace, the
matched namespace is also updated to contain this new variable name.
Otherwise, nothing is matched and an error is raised.
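The three cases can be sketched in Python as follows. This is a simplified model with hypothetical names (resolve, NameResolutionError): a scope is an ordered dict from namespace to a set of variable names, where "*" stands for the wildcard. When several wildcard namespaces exist, the sketch assumes the first one in scope order wins and leaves the scope unchanged; the text above does not specify that case.

```python
from typing import Dict, Set


class NameResolutionError(Exception):
    """Raised when an ident matches nothing in the scope."""


def resolve(scope: Dict[str, Set[str]], ident: str) -> str:
    """Return the fully qualified name for `ident`, updating `scope` if needed."""
    # Namespaces may contain dots; the variable name is the part after the last dot.
    namespace, _, name = ident.rpartition(".")

    if namespace:
        # Case 1 (qualified ident): both namespace and variable name must match.
        if name in scope.get(namespace, set()):
            return f"{namespace}.{name}"
        raise NameResolutionError(ident)

    # Case 1 (unqualified ident): first namespace, in order, that holds the name.
    for ns, names in scope.items():
        if name in names:
            return f"{ns}.{name}"

    # Case 2: no exact match, so fall back to namespaces holding a wildcard.
    wildcards = [ns for ns, names in scope.items() if "*" in names]
    if wildcards:
        if len(wildcards) == 1:
            # A single candidate: remember the new variable in that namespace.
            scope[wildcards[0]].add(name)
        return f"{wildcards[0]}.{name}"

    # Case 3: nothing matched.
    raise NameResolutionError(ident)


scope = {"std": {"sum", "count"}, "employees": {"*"}}
print(resolve(scope, "sum"))         # std.sum
print(resolve(scope, "first_name"))  # employees.first_name
```

After the second call, the employees namespace in scope also contains first_name, which mirrors the update described in the second case.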
Translating to SQL
When translating into a SQL statement which references only one table, there is no need to
qualify column names with a table prefix.
PRQL SQL
FROM
employees
But when there are multiple tables and we don’t have complete knowledge of all table
columns, a column without a prefix (i.e. first_name ) may actually reside in multiple tables.
Because of this, we have to use table prefixes for all column names.
PRQL SQL
derive [first_name, dept_id] employees.first_name,
join d=departments [==dept_id] d.title
select [first_name, d.title] FROM
employees
JOIN departments AS d ON
employees.dept_id = d.dept_id
As you can see, employees.first_name now needs a table prefix, to prevent conflicts with a
potential column of the same name in the departments table. Similarly, d.title needs the
table prefix.
Functions
Function call
The major distinction between PRQL and today’s conventional programming languages such as
C or Python is the function call syntax. It consists of the function name followed by arguments
separated by whitespace.
function_name arg1 arg2 arg3
If one of the arguments is also a function call, it must be encased in parentheses, so we know
where arguments of inner function end and the arguments of outer function start.
outer_func arg_1 (inner_func arg_a, arg_b) arg_2
Pipeline
There is an alternative way of calling functions: using a pipeline. Regardless of whether the
pipeline is delimited by the pipe symbol | or a new line, the pipeline is equivalent to passing the
result of each function as the last argument of the next function.
a | foo 3 | bar 'hello' 'world' | baz
… is equivalent to …
baz (bar 'hello' 'world' (foo 3 a))
As you may have noticed, transforms are regular functions too!
PRQL:
from employees
filter age > 50
sort name
SQL:
SELECT
  *
FROM
  employees
WHERE
  age > 50
ORDER BY
  name
PRQL:
from employees | filter age > 50 | sort name
SQL:
SELECT
  *
FROM
  employees
WHERE
  age > 50
ORDER BY
  name
PRQL:
filter age > 50 (from employees) | sort name
SQL:
SELECT
  *
FROM
  employees
WHERE
  age > 50
ORDER BY
  name
PRQL:
sort name (filter age > 50 (from employees))
SQL:
SELECT
  *
FROM
  employees
WHERE
  age > 50
ORDER BY
  name
As you can see, the first example with pipeline notation is much easier to comprehend,
compared to the last one with the regular function call notation. This is why it is recommended
to use pipelines for nested function calls that are 3 or more levels deep.
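The equivalence between the two notations can be sketched as a small Python rewrite. The function desugar_pipeline is a hypothetical toy: it splits stages naively on | and newlines, so it ignores quoting and nesting and only suits simple examples like the ones above.

```python
import re


def desugar_pipeline(pipeline: str) -> str:
    """Rewrite `a | f x | g` as the nested call `g (f x a)` (toy model)."""
    stages = [s.strip() for s in re.split(r"[|\n]", pipeline) if s.strip()]
    if not stages:
        raise ValueError("empty pipeline")

    result = stages[0]
    for stage in stages[1:]:
        # A multi-word result is itself a function call, so it must be wrapped in
        # parentheses before becoming the last argument of the next function.
        arg = f"({result})" if " " in result else result
        result = f"{stage} {arg}"
    return result


print(desugar_pipeline("a | foo 3 | bar 'hello' 'world' | baz"))
# baz (bar 'hello' 'world' (foo 3 a))
print(desugar_pipeline("from employees\nfilter age > 50\nsort name"))
# sort name (filter age > 50 (from employees))
```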
Currying and late binding

In PRQL, functions are first class citizens. As cool as that sounds, we need simpler terms to
explain it. In essence, it means that we can operate with functions as with any other value.
Syntax highlighting
PRQL contains multiple grammar definitions to enable tools to highlight PRQL code. These are
all intended to provide as good an experience as the grammar supports. Please raise any
shortcomings in a GitHub issue.
The definitions are somewhat scattered around the codebase; this page serves as an index.
Lezer: used by CodeMirror editors. The PRQL file is at prql-lezer/README.md .
Handlebars: currently duplicated:
The book: book/highlight-prql.js
The website (outside of the book & playground): website/themes/prql-theme/static/plugins/highlight/prql.js
Textmate: used by the VSCode Extension. It’s in the prql-vscode repo in prql-vscode/syntaxes/prql.tmLanguage.json .
Monarch: used by the Monaco editor, which we use for the Playground. The grammar is
at playground/src/workbench/prql-syntax.js .
While the pest grammar at prql-compiler/src/prql.pest isn’t used for syntax highlighting,
it’s the arbiter of truth given it currently powers the PRQL compiler.
dbt-prql
Original docs at https://github.com/prql/dbt-prql
dbt-prql allows writing PRQL in dbt models. This combines the benefits of PRQL’s power &
simplicity within queries, with dbt’s version control, lineage & testing across queries.
Once dbt-prql is installed, dbt commands compile PRQL between {% prql %} & {% endprql %}
jinja tags to SQL as part of dbt’s compilation. No additional config is required.
Examples
Simple example
{% prql %}
from employees
filter (age | in 20..30)
{% endprql %}
…would appear to dbt as:
SELECT
employees.*
FROM
employees
WHERE
age BETWEEN 20
AND 30
Less simple example
{% prql %}
from {{ source('salesforce', 'in_process') }}
derive expected_sales = probability * value
join {{ ref('team', 'team_sales') }} [==name]
group name (
aggregate (expected_sales)
)
{% endprql %}
…would appear to dbt as:
SELECT
name,
{{ source('salesforce', 'in_process') }}.probability * {{ source('salesforce',
'in_process') }}.value AS expected_sales
FROM
{{ source('salesforce', 'in_process') }}
JOIN {{ ref('team', 'team_sales') }} USING(name)
GROUP BY
name
…and then dbt will compile the source and ref s to a full SQL query.
Replacing macros
dbt’s use of macros has saved many of us many lines of code, and even saved some people
some time. But imperatively programming text generation with code like if not loop.last is
not our highest calling. It’s the “necessary” part rather than the beautiful part of dbt.
Here’s the canonical example of macros in the dbt documentation:
{%- set payment_methods = ["bank_transfer", "credit_card", "gift_card"] -%}
select
order_id,
{%- for payment_method in payment_methods %}
sum(case when payment_method = '{{payment_method}}' then amount end) as
{{payment_method}}_amount
{%- if not loop.last %},{% endif -%}
{% endfor %}
from {{ ref('raw_payments') }}
group by 1
Here’s that model using PRQL, including the prql jinja tags.
{% prql %}
func filter_amount method -> s"sum(case when payment_method = '{method}' then
amount end) as {method}_amount"
from {{ ref('raw_payments') }}
group order_id (
aggregate [
filter_amount bank_transfer,
filter_amount credit_card,
filter_amount gift_card,
]
)
{% endprql %}
As well as the query being simpler in its final form, writing in PRQL also gives us live feedback
around any errors, on every keystroke. Though there’s much more to come, check out the
current version on PRQL Playground.
What it does
When dbt compiles models to SQL queries:
Any text in a dbt model between {% prql %} and {% endprql %} tags is compiled from
PRQL to SQL before being passed to dbt.
The PRQL compiler passes any text enclosed in {{ and }} through to dbt without
modification, which allows us to embed jinja expressions in PRQL. (This was added to
PRQL specifically for this use-case.)
dbt will then compile the resulting model into its final form of raw SQL, and dispatch it to
the database, as per usual.
There’s no config needed in the dbt project; this works automatically on any dbt command (e.g.
dbt run ) assuming dbt-prql is installed.
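The first step above, swapping each tagged block for compiled SQL, can be sketched in Python. This is not dbt-prql's actual implementation: compile_prql_blocks and fake_to_sql are hypothetical names, and the stand-in compiler only reports how many PRQL lines it received. A real setup would pass a PRQL compiler function, such as the to_sql function of prql-python shown earlier.

```python
import re
from typing import Callable

# Matches {% prql %} ... {% endprql %}, tolerating extra whitespace inside the tags.
PRQL_BLOCK = re.compile(r"\{%\s*prql\s*%\}(.*?)\{%\s*endprql\s*%\}", re.DOTALL)


def compile_prql_blocks(model: str, to_sql: Callable[[str], str]) -> str:
    """Replace every tagged PRQL block in a model with the SQL produced by to_sql."""
    return PRQL_BLOCK.sub(lambda m: to_sql(m.group(1).strip()), model)


def fake_to_sql(prql: str) -> str:
    # Hypothetical stand-in for a real PRQL compiler.
    return f"/* compiled from {len(prql.splitlines())} PRQL line(s) */ SELECT 1"


model = """-- header
{% prql %}
from employees
filter (age | in 20..30)
{% endprql %}
-- footer"""

print(compile_prql_blocks(model, fake_to_sql))
```

Text outside the tags is left untouched, so the remaining jinja in the model is still available for dbt to compile afterwards.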
Installation
pip install dbt-prql
Current state
Currently this is new, but fairly feature-complete. It’s enthusiastically supported — if there are
any problems, please open an issue.
Jupyter
Original docs at https://pyprql.readthedocs.io/en/latest/magic_readme.html
Work with pandas and PRQL in an IPython terminal or Jupyter notebook.
Implementation
This is a thin wrapper around the fantastic IPython-sql magic. Roughly speaking, all we do is
parse PRQL to SQL and pass that through to ipython-sql . Full documentation of the
supported features is available at their repository. Here, we document those places where we
differ from them, plus those features we think you are most likely to find useful.
Usage
Installation
If you have already installed PyPRQL into your environment, then you should be good to go!
We bundle in IPython and pandas , though you’ll need to install Jupyter separately. If you
haven’t installed PyPRQL, that’s as simple as:
pip install pyprql
Set Up
Open up either an IPython terminal or Jupyter notebook. First, we need to load the
extension and connect to a database.
In [1]: %load_ext pyprql.magic
Connecting a database
We have two options for connecting a database
1. Create an in-memory DB. This is the easiest way to get started.
In [2]: %prql duckdb:///:memory:
However, in-memory databases start off empty! So, we need to add some data. We have
two options:
We can easily add a pandas dataframe to the DuckDB database like so:
In [3]: %prql --persist df
where df is a pandas dataframe. This adds a table named df to the in-memory

DuckDB instance.
Or download a CSV and query it directly, with DuckDB:
!wget https://github.com/graphql-compose/graphql-compose-examples/blob/master/examples/northwind/data/csv/products.csv
…and then from products.csv will work.
2. Connect to an existing database
When connecting to a database, pass the connection string as an argument to the line
magic %prql . The connection string needs to be in SQLAlchemy format, so any
connection supported by SQLAlchemy is supported by the magic. Additional connection
parameters can be passed as a dictionary using the --connection_arguments flag to the
%prql line magic. We ship with the necessary extensions to use DuckDB as the
backend, and here connect to an in-memory database.
Querying
Now, let’s do a query! By default, PRQLMagic always returns the results as a dataframe, and
always prints the results. The results of the previous query are accessible in the _ variable.
These examples are based on the products.csv example above.
In [4]: %%prql
...: from p = products.csv
...: filter supplierID == 1
Done.
Returning data to local variable _
productID productName supplierID categoryID quantityPerUnit
unitPrice unitsInStock unitsOnOrder reorderLevel discontinued
0 1 Chai 1 1 10 boxes x 20 bags
18.0 39 0 10 0
1 2 Chang 1 1 24 - 12 oz bottles
19.0 17 40 25 0
2 3 Aniseed Syrup 1 2 12 - 550 ml bottles
10.0 13 70 25 0
In [5]: %%prql
...: group categoryID (
...: aggregate [average unitPrice]
...: )
Done.
Returning data to local variable _
categoryID avg("unitPrice")
0 1 37.979167
1 2 23.062500
2 7 32.370000
3 6 54.006667
4 8 20.682500
5 4 28.730000
6 3 25.160000
7 5 20.250000
We can capture the results into a different variable like so:
In [6]: %%prql results <<

...: aggregate [min unitsInStock, max unitsInStock]
Done.
Returning data to local variable results
min("unitsInStock") max("unitsInStock")
0 0 125
Now, the output of the query is saved to results .
Prefect
Because Prefect is native Python, it’s extremely easy to integrate with PRQL.
With a Postgres Task, replace:
PostgresExecute.run(..., query=sql)
…with…
PostgresExecute.run(..., query=pyprql.to_sql(prql))
We’re big fans of Prefect, and if there is anything that would make the integration easier, please
open an issue.
Examples
These examples are rewritten from other languages such as SQL. They try to express real-
world problems in PRQL, covering most of the language features. We are looking for different
use-cases of data transformation, be it database queries, semantic business modeling or data
cleaning.
If you want to help, translate some of your queries to PRQL and open a PR to add them here!
PRQL SQL
filter country == "USA" SELECT
# Each line transforms the previous title,
result. country,
derive [ salary + payroll_tax + benefits_cost
# This adds columns / variables. AS _expr_0,
gross_salary = salary + payroll_tax, salary + payroll_tax AS _expr_1,
gross_cost = gross_salary + salary
benefits_cost # Variables can use other FROM
variables. employees
] WHERE
filter gross_cost > 0 country = 'USA'
group [title, country] ( )
# For each group use a nested pipeline SELECT
aggregate [ title,
# Aggregate each group to a single row country,
average gross_salary, AVG(_expr_1),
sum salary, SUM(salary),
sum gross_salary, SUM(_expr_1),
average gross_cost, AVG(_expr_0),
sum_gross_cost = sum gross_cost, SUM(_expr_0) AS sum_gross_cost,
ct = count, COUNT(*) AS ct
] FROM
) table_1
sort sum_gross_cost WHERE
filter ct > 200 _expr_0 > 0
take 20 GROUP BY
title,
country
HAVING
COUNT(*) > 200
ORDER BY
sum_gross_cost
LIMIT
20
PRQL SQL
group [emp_no] ( SELECT
aggregate [ AVG(salary) AS _expr_0,
emp_salary = average salary # emp_no
average salary resolves to "AVG(salary)" FROM
(from stdlib) employees
] GROUP BY
) emp_no
join titles [==emp_no] )
group [title] ( SELECT
aggregate [ AVG(table_1._expr_0) / 1000 AS
avg_salary = average emp_salary salary_k,
] AVG(table_1._expr_0) / 1000 * 1000 AS
) salary
select salary_k = avg_salary / 1000 # FROM
avg_salary should resolve to table_1
"AVG(emp_salary)" JOIN titles ON table_1.emp_no =
take 10 # titles.emp_no
induces new SELECT GROUP BY
derive salary = salary_k * 1000 # titles.title
salary_k should not resolve to LIMIT
"avg_salary / 1000" 10
Single item is coerced into a list

PRQL SQL
select salary salary
FROM
employees
Same as above but with salary in a list:
PRQL SQL
select [salary] salary
FROM
employees
Multiple items
PRQL SQL
derive [ *,
gross_salary = salary + payroll_tax, salary + payroll_tax AS gross_salary,
gross_cost = gross_salary + salary + payroll_tax + benefits_cost
] FROM
employees
Same as above but split into two lines:
PRQL SQL
derive gross_salary = salary + *,
payroll_tax salary + payroll_tax AS gross_salary,
derive gross_cost = gross_salary + salary + payroll_tax + benefits_cost
FROM
employees
PRQL SQL
table newest_employees = ( WITH average_salaries AS (
sort tenure country,
take 50 AVG(salary) AS
) average_country_salary
FROM
table average_salaries = ( salaries
from salaries GROUP BY
group country ( country
aggregate average_country_salary = ),
(average salary) newest_employees AS (
) SELECT
) *
FROM
from newest_employees employees
join average_salaries [==country] ORDER BY
select [name, salary, tenure
average_country_salary] LIMIT
50
)
SELECT
newest_employees.name,
newest_employees.salary,
average_salaries.average_country_salary
FROM
newest_employees
JOIN average_salaries ON
newest_employees.country =
average_salaries.country
Employees
These are homework tasks on the employees database.
Clone and init the database (requires a local PostgreSQL instance):
psql -U postgres -c 'CREATE DATABASE employees;'

git clone https://github.com/vrajmohan/pgsql-sample-data.git
psql -U postgres -d employees -f pgsql-sample-data/employee/employees.dump
Execute a PRQL query:
cd prql-compiler
cargo run compile examples/employees/average-title-salary.prql | psql -U postgres
-d employees
Task 1
Rank the employee titles according to the average salary for each department.
My solution:
for each employee, find their average salary,

join employees with their departments and titles (duplicating employees for each of their
titles and departments)
group by department and title, aggregating average salary
join with department to get department name
PRQL SQL
from salaries WITH table_1 AS (
group [emp_no] ( SELECT
aggregate [emp_salary = average AVG(salary) AS _expr_0,
salary] emp_no
) FROM
join t=titles [==emp_no] salaries
join dept_emp side:left [==emp_no] GROUP BY
group [dept_emp.dept_no, t.title] ( emp_no
aggregate [avg_salary = average ),
emp_salary] table_2 AS (
) SELECT
join departments [==dept_no] t.title,
select [dept_name, title, avg_salary] AVG(table_1._expr_0) AS avg_salary,
dept_emp.dept_no
FROM
table_1
JOIN titles AS t ON table_1.emp_no =
t.emp_no
LEFT JOIN dept_emp ON table_1.emp_no
= dept_emp.emp_no
GROUP BY
dept_emp.dept_no,
t.title
)
SELECT
departments.dept_name,
table_2.title,
table_2.avg_salary
FROM
table_2
JOIN departments ON table_2.dept_no =
departments.dept_no
Task 2
Estimate distribution of salaries and gender for each department.
PRQL SQL
join salaries [==emp_no] SELECT
group [e.emp_no, e.gender] ( e.gender,
aggregate [ AVG(salaries.salary) AS _expr_0,
emp_salary = average salaries.salary e.emp_no
] FROM
) employees AS e
join de=dept_emp [==emp_no] side:left JOIN salaries ON e.emp_no =
group [de.dept_no, gender] ( salaries.emp_no
aggregate [ GROUP BY
salary_avg = average emp_salary, e.emp_no,
salary_sd = stddev emp_salary, e.gender
] ),
) table_2 AS (
join departments [==dept_no] SELECT
select [dept_name, gender, salary_avg, table_1.gender,
salary_sd] AVG(table_1._expr_0) AS salary_avg,
STDDEV(table_1._expr_0) AS
salary_sd,
de.dept_no
FROM
table_1
LEFT JOIN dept_emp AS de ON
table_1.emp_no = de.emp_no
GROUP BY
de.dept_no,
table_1.gender
)
SELECT
departments.dept_name,
table_2.gender,
table_2.salary_avg,
table_2.salary_sd
FROM
table_2
JOIN departments ON table_2.dept_no =
departments.dept_no
Task 3
Estimate distribution of salaries and gender for each manager.
PRQL SQL
join salaries [==emp_no] SELECT
group [e.emp_no, e.gender] ( e.gender,
aggregate [ AVG(salaries.salary) AS _expr_0,
emp_salary = average salaries.salary e.emp_no
] FROM
) employees AS e
join de=dept_emp [==emp_no] JOIN salaries ON e.emp_no =
join dm=dept_manager [ salaries.emp_no
(dm.dept_no == de.dept_no) and s" GROUP BY
(de.from_date, de.to_date) OVERLAPS e.emp_no,
(dm.from_date, dm.to_date)" e.gender
] ),
group [dm.emp_no, gender] ( table_2 AS (
aggregate [ SELECT
salary_avg = average emp_salary, AVG(table_1._expr_0) AS salary_avg,
salary_sd = stddev emp_salary STDDEV(table_1._expr_0) AS
] salary_sd,
) dm.emp_no
derive mng_no = emp_no FROM
join managers=employees [==emp_no] table_1
derive mng_name = s"managers.first_name JOIN dept_emp AS de ON
|| ' ' || managers.last_name" table_1.emp_no = de.emp_no
select [mng_name, managers.gender, JOIN dept_manager AS dm ON
salary_avg, salary_sd] dm.dept_no = de.dept_no
AND (de.from_date, de.to_date)
OVERLAPS (dm.from_date, dm.to_date)
GROUP BY
dm.emp_no,
table_1.gender
)
SELECT
managers.first_name || ' ' ||
managers.last_name AS mng_name,
managers.gender,
table_2.salary_avg,
table_2.salary_sd
FROM
table_2
JOIN employees AS managers ON
table_2.emp_no = managers.emp_no
Task 4
Find distributions of titles, salaries and genders for each department.
PRQL SQL
from de=dept_emp WITH table_1 AS (
join s=salaries side:left [ SELECT
(s.emp_no == de.emp_no), de.dept_no,
s"({s.from_date}, {s.to_date}) AVG(s.salary) AS salary,
OVERLAPS ({de.from_date}, {de.to_date})" de.emp_no
] FROM
group [de.emp_no, de.dept_no] ( dept_emp AS de
aggregate salary = (average s.salary) LEFT JOIN salaries AS s ON s.emp_no
) = de.emp_no
join employees [==emp_no] AND (s.from_date, s.to_date)
join titles [==emp_no] OVERLAPS (de.from_date, de.to_date)
select [dept_no, salary, GROUP BY
employees.gender, titles.title] de.emp_no,
de.dept_no
)
SELECT
table_1.dept_no,
table_1.salary,
employees.gender,
titles.title
FROM
table_1
JOIN employees ON table_1.emp_no =
employees.emp_no
JOIN titles ON table_1.emp_no =
titles.emp_no
