
DuckDB

Data Import

Data Export

Querying

Query Plan

Types
Data Types - DuckDB

Python
Install
Installing the Python Client - DuckDB

pip install duckdb==0.9.2

Issues

Connect

Querying
Executing SQL in Python - DuckDB

Types
Types API - DuckDB

Interoperability

Pandas

Arrow
SQL on Apache Arrow - DuckDB

Resources:
https://duckdb.org/docs/guides/python/sql_on_arrow
https://duckdb.org/2021/12/03/duck-arrow.html

Tables

Examples
Streaming

Examples

import duckdb
import pyarrow.dataset as ds

# Read the dataset, partitioned into year/month folders
nyc_dataset = ds.dataset('nyc-taxi/', partitioning=["year", "month"])

# Get a database connection
con = duckdb.connect()

# DuckDB resolves the Python variable nyc_dataset by name
query = con.execute("SELECT * FROM nyc_dataset")

# DuckDB's queries can now produce a Record Batch Reader
chunk_size = 1_000_000
record_batch_reader = query.fetch_record_batch(chunk_size)

# Which means we can stream the whole query per batch.
# Loop through the results. A StopIteration exception is thrown when
# the RecordBatchReader is empty
while True:
    try:
        # Process a single chunk here (just printing as an example)
        chunk = record_batch_reader.read_next_batch()
        print(chunk.to_pandas())
    except StopIteration:
        print('Already fetched all batches')
        break

DuckDB can consume Arrow stream objects, unlike pandas.

DuckDB’s query optimizer can automatically push down filters and projections into the Arrow scan, so only the columns and rows a query needs are read.

Projection pushdown

Filter pushdown

Benchmark
DuckDB quacks Arrow: A zero-copy da…

DuckDB runs queries in parallel across threads, unlike pandas.

Polars

API
Python Client API - DuckDB
SQL
SQL Introduction - DuckDB

Types

Statements

Functions
Functions - DuckDB

Aggregate Functions
Aggregate Functions - DuckDB

Window
Window Functions - DuckDB

Configuration

Jupyter
Jupyter Notebooks - DuckDB
