
Solving PostgreSQL wicked problems

Alexander Korotkov

Oriole DB Inc.

2021

Alexander Korotkov Solving PostgreSQL wicked problems 1 / 40


PostgreSQL has two sides



The bright side of PostgreSQL



PostgreSQL – one of the most popular DBMSes (according to db-engines.com)

PostgreSQL – strong trend (https://db-engines.com/en/ranking_trend/system/PostgreSQL)

PostgreSQL – most loved RDBMS (according to the Stack Overflow 2020 survey)

The dark side of PostgreSQL



Criticism of PostgreSQL (1/2)

https://eng.uber.com/postgres-to-mysql-migration/

Criticism of PostgreSQL (2/2)

https://medium.com/@rbranson/10-things-i-hate-about-postgresql-20dbab8c2791


10 wicked problems of PostgreSQL

Problem name                                          Known for   Work started   Resolution

1. Wraparound                                         20 years    15 years ago   Still WIP
2. Failover Will Probably Lose Data                   20 years    16 years ago   Still WIP
3. Inefficient Replication That Spreads Corruption    10 years    8 years ago    Still WIP
4. MVCC Garbage Frequently Painful                    20 years    19 years ago   Abandoned
5. Process-Per-Connection = Pain at Scale             20 years    3 years ago    Abandoned
6. Primary Key Index is a Space Hog                   13 years    -              Not started
7. Major Version Upgrades Can Require Downtime        21 years    16 years ago   Still WIP
8. Somewhat Cumbersome Replication Setup              10 years    9 years ago    Still WIP
9. Ridiculous No-Planner-Hints Dogma                  20 years    11 years ago   Extension
10. No Block Compression                              12 years    11 years ago   Still WIP
* Scalability on modern hardware




The exciting moment

▶ The PostgreSQL community has proven to be brilliant at solving non-design issues, delivering a fantastic product to the market.
▶ As a result, PostgreSQL has had a strong upward trend for many years.
▶ At the same time, the PostgreSQL community appears to be dysfunctional in solving design issues, attracting severe criticism. Nevertheless, the critics have not yet broken the upward trend.
▶ It appears to be a unique moment for a PostgreSQL redesign!


How could we solve the PostgreSQL
wicked problems?





Traditional buffer management

[Figure: in-memory copies of pages (1', 2', 3', 5', 6') and on-disk pages 1 to 7, connected through a buffer mapping table]

▶ Each page access requires a lookup in the buffer mapping data structure.
▶ Each B-tree key lookup takes multiple buffer mapping lookups.
▶ Accessing cached data doesn’t scale on modern hardware.



Solution: Dual pointers

[Figure: in-memory tree pages 2, 3, 5, 7 with pointers to each other and to on-disk pages 1 to 7]

▶ An in-memory page refers to either an in-memory or an on-disk page.
▶ Cached data is accessed without buffer mapping lookups.
▶ Good scalability!
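
The difference between the two designs can be sketched as a toy model in Python. This is an illustration only, not OrioleDB or PostgreSQL code: `Page`, `BufferPool`, `find_mapped`, `find_direct` and the `load_from_disk` callback are hypothetical names, and a plain dict stands in for the shared buffer mapping table.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Page:
    keys: list                                     # separator keys, or leaf keys
    children: list = field(default_factory=list)   # empty for a leaf page


class BufferPool:
    """Toy stand-in for a buffer mapping table: block number -> page."""

    def __init__(self, pages):
        self.pages = pages
        self.lookups = 0

    def get(self, blkno):
        self.lookups += 1          # every page access pays for a mapping lookup
        return self.pages[blkno]


def find_mapped(pool, root_blkno, key):
    """Traditional descent: children are block numbers resolved via the pool."""
    page = pool.get(root_blkno)
    while page.children:
        blkno = page.children[bisect.bisect_right(page.keys, key)]
        page = pool.get(blkno)
    return key in page.keys


def find_direct(root, key, load_from_disk):
    """Dual pointers: a child is either a Page (in memory) or an int (on disk)."""
    page = root
    while page.children:
        i = bisect.bisect_right(page.keys, key)
        child = page.children[i]
        if isinstance(child, int):     # on-disk pointer: load once, then relink
            child = load_from_disk(child)
            page.children[i] = child
        page = child                   # in-memory pointer: just follow it
    return key in page.keys
```

In this model, one key lookup in a two-level tree costs two calls to `BufferPool.get`, while the direct descent touches no shared lookup structure once the pages are linked in memory.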


PostgreSQL MVCC = bloat + write-amplification

▶ New and old row versions share the same heap.
▶ Non-HOT updates cause index bloat.



Solution: undo log for both pages and rows

[Figure: primary storage with row1, row2, row3, row4; undo log with row-level chains (row2v1; row1v1, row1v2) and a page-level record (page: row1, row4)]

▶ Old row versions form chains in the undo log.
▶ Page-level chains evict deleted rows from primary storage.
▶ Only indexes with changed values are updated.
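
The row-level part of this idea can be sketched as follows. It is a minimal toy model under simplifying assumptions, not the OrioleDB implementation: `RowVersion`, `Table`, and the use of a single commit sequence number (`csn`) for visibility are all hypothetical choices made for illustration, and page-level undo is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowVersion:
    value: str
    csn: int                        # commit sequence number that wrote this version
    undo_ptr: Optional[int] = None  # index of the previous version in the undo log


class Table:
    def __init__(self):
        self.page = {}              # primary storage: key -> latest version only
        self.undo = []              # undo log: old versions, chained via undo_ptr
        self.csn = 0

    def write(self, key, value):
        self.csn += 1
        old = self.page.get(key)
        ptr = None
        if old is not None:         # the old version moves out of the page
            self.undo.append(old)
            ptr = len(self.undo) - 1
        self.page[key] = RowVersion(value, self.csn, ptr)
        return self.csn

    def read(self, key, snapshot_csn):
        """Return the value visible to a snapshot, walking the undo chain."""
        version = self.page.get(key)
        while version is not None and version.csn > snapshot_csn:
            if version.undo_ptr is None:
                version = None
            else:
                version = self.undo[version.undo_ptr]
        return None if version is None else version.value
```

However many times a row is updated, the page keeps a single entry for it; readers with older snapshots pay the cost of walking the chain, and the undo log can be truncated once no snapshot needs it.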


Block-level WAL

[Figure: Index #1, Heap and Index #2 each write their block changes to the WAL]

▶ Huge WAL traffic.
▶ Problems with parallel apply.
▶ Not suitable for multi-master replication.




Solution: row-level WAL

[Figure: Index #1, Heap and Index #2 above a single WAL]

▶ Very compact.
▶ Apply can be parallelized.
▶ Suitable for multi-master replication (row-level conflicts, not block-level).
▶ Recovery needs structurally consistent checkpoints.
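
Why row-level records parallelize well can be shown with a small sketch. The record layout `(op, key, value)` and the functions `partition`, `apply_records` and `parallel_apply` are assumptions made for this example; the sketch also ignores transaction boundaries and secondary indexes, which a real apply process has to handle.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def partition(wal, n_workers):
    """Split row-level records by key; per-key log order is preserved."""
    queues = [[] for _ in range(n_workers)]
    for record in wal:
        key = record[1]
        queues[zlib.crc32(key.encode()) % n_workers].append(record)
    return queues


def apply_records(records):
    """Apply one queue of records, in log order, to a private set of rows."""
    rows = {}
    for op, key, value in records:
        if op == "delete":
            rows.pop(key, None)
        else:                        # "insert" or "update"
            rows[key] = value
    return rows


def parallel_apply(wal, n_workers=4):
    result = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for part in pool.map(apply_records, partition(wal, n_workers)):
            result.update(part)      # key sets are disjoint, so nothing collides
    return result
```

Records for the same key always land in the same queue, so workers never need to coordinate. With block-level records, changes to unrelated rows can touch the same block, which is what makes this kind of partitioning hard.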
Row-level WAL based multimaster

[Figure: three OrioleDB instances, each with its own storage and WAL, connected by Raft replication]


Copy-on-write checkpoints (1/4)

[Figure: in-memory tree pages 2, 3, 5, 7; disk: 1 2 3 4 5 6 7]

Copy-on-write checkpoints (2/4)

[Figure: page 7 is modified in memory (7*); disk: 1 2 3 4 5 6 7]

Copy-on-write checkpoints (3/4)

[Figure: in-memory tree pages 1*, 2, 3*, 5, 7*; disk: 1 2 3 4 5 6 7 7* 3* 1*]

Copy-on-write checkpoints (4/4)

[Figure: in-memory tree pages 1*, 2, 3*, 5, 7*; disk: 2 4 5 6 7* 3* 1*]
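
One possible reading of the four steps above, as a toy model: a modified page is written to a fresh location, each ancestor is rewritten because its child pointer changed, and the old images are freed only after the new root is on disk. `Disk`, `Node` and `checkpoint` are hypothetical names for this sketch and do not come from OrioleDB.

```python
class Disk:
    def __init__(self):
        self.slots = {}             # location -> page image
        self.next_loc = 0

    def write_new(self, image):
        loc = self.next_loc         # a live location is never overwritten
        self.next_loc += 1
        self.slots[loc] = image
        return loc


class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.loc = None             # location of the last checkpointed image
        self.dirty = True


def checkpoint(root, disk):
    """Write changed pages to fresh locations, then free the old images."""
    old_locs = []

    def flush(node):
        moved = [flush(child) for child in node.children]
        if not (node.dirty or any(moved)):
            return False            # unchanged subtree keeps its location
        if node.loc is not None:
            old_locs.append(node.loc)
        image = (node.name, tuple(child.loc for child in node.children))
        node.loc = disk.write_new(image)
        node.dirty = False
        return True

    flush(root)                     # children first, the root is written last
    for loc in old_locs:            # the previous checkpoint is no longer needed
        del disk.slots[loc]
    return root.loc
```

For a seven-page tree where only page 7 is dirty, this writes 7*, 3* and 1* in that order and then frees the old 7, 3 and 1, matching the disk contents shown in steps 3/4 and 4/4. Until the new root is written, the previous checkpoint stays intact on disk, which is what makes it structurally consistent.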


What do we need from PostgreSQL extendability?

[Figure: PostgreSQL server architecture. File system: data files, WAL files, log files. Backend: connection, parser, rewriter, planner, executor. Background processes: autovacuum, background writer, checkpointer, WAL writer, and others. The OrioleDB extension sits inside the server and keeps its own OrioleDB data files and OrioleDB undo files in the file system.]

▶ Extended table AM.
▶ Custom TOAST handlers.
▶ Custom row identifiers.
▶ Custom error cleanup.
▶ Recovery & checkpointer hooks.
▶ Snapshot hooks.
▶ Some other miscellaneous hooks.

Total: a 1K-line patch to PostgreSQL core.

OrioleDB = PostgreSQL redesign

PostgreSQL                               OrioleDB

Block-level WAL                          Row-level WAL
Buffer mapping                           Direct page links
Buffer locking                           Lock-less access
Bloat-prone MVCC                         Undo log
Cumbersome block-level WAL replication   Raft-based multimaster replication of row-level WAL


OrioleDB’s answer to 10 wicked problems of PostgreSQL

Problem name                                          Solution

1. Wraparound                                         Native 64-bit transaction ids
2. Failover Will Probably Lose Data                   Multimaster replication
3. Inefficient Replication That Spreads Corruption    Row-level replication
4. MVCC Garbage Frequently Painful                    Non-persistent undo log
5. Process-Per-Connection = Pain at Scale             Migration to multithread model
6. Primary Key Index is a Space Hog                   Index-organized tables
7. Major Version Upgrades Can Require Downtime        Multimaster + per-node upgrade
8. Somewhat Cumbersome Replication Setup              Simple setup of Raft-based multimaster
9. Ridiculous No-Planner-Hints Dogma                  In-core planner hints
10. No Block Compression                              Block-level compression
* Scalability on modern hardware
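
To give a sense of scale for the first row, here is a back-of-the-envelope calculation. Stock PostgreSQL uses 32-bit transaction ids; the write rate of 100,000 transactions per second is an assumption chosen for illustration only, and `seconds_to_exhaust` is a hypothetical helper.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_YEAR = 365 * 24 * 3600


def seconds_to_exhaust(id_bits, tps):
    """Seconds until a counter of `id_bits` bits runs out at `tps` ids per second."""
    return 2 ** id_bits / tps


tps = 100_000  # assumed sustained rate of id-consuming transactions

hours_32 = seconds_to_exhaust(32, tps) / SECONDS_PER_HOUR
years_64 = seconds_to_exhaust(64, tps) / SECONDS_PER_YEAR

print(f"32-bit ids last about {hours_32:.1f} hours")
print(f"64-bit ids last about {years_64:.2e} years")
```

At such a rate a 32-bit id space is consumed in well under a day, so old ids must be reclaimed continuously, while a 64-bit space lasts for millions of years.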


Let’s do some benchmarks! (https://gist.github.com/akorotkov/f5e98ba5805c42ee18bf945b30cc3d67)

OrioleDB benchmark: read-only scalability

[Figure: Read-only scalability test, PostgreSQL vs OrioleDB. 1 minute of pgbench script reading 9 random values of 100M. X axis: # clients (0 to 250); Y axis: TPS (0 to 800000).]

OrioleDB: 4X higher TPS!

OrioleDB benchmark: read-write scalability, in-memory case

[Figure: Read-write scalability test, PostgreSQL vs OrioleDB. 1 minute of pgbench TPC-B-like transactions wrapped into a stored procedure. X axis: # clients (0 to 250); Y axis: TPS (0 to 400000).]

OrioleDB: 3.5X higher TPS!

OrioleDB benchmark: read-write scalability, external storage case

[Figure: pgbench -s 20000 -j $n -c $n -M prepared on odb-node02; mean of 3 3-minute runs with shared_buffers = 32GB(128GB), max_connections = 2500. Series: pgsql-read-write, orioledb-read-write, orioledb-read-write-block-device. X axis: # clients (0 to 2000); Y axis: TPS (0 to 120000).]

OrioleDB: up to 50X higher TPS!

OrioleDB benchmark: read-write scalability, Intel Optane persistent memory

[Figure: pgbench -s 20000 -j $n -c $n -M prepared -f read-write-proc.sql on node03; 5-minute run with shared_buffers = 32GB, max_connections = 2500. Series: pgsql, orioledb-fsdax, orioledb-devdax. X axis: # clients (0 to 2000); Y axis: TPS (0 to 200000).]

OrioleDB: up to 50X higher TPS!

OrioleDB benchmark: write-amplification & bloat test: CPU

[Figure, two panels, PostgreSQL vs OrioleDB. Throughput: TPS (0 to 800000) over seconds (400 to 1600). CPU usage: usage, % (0 to 100) over seconds (400 to 1600).]

OrioleDB: 5X higher TPS! 2.3X less CPU/TPS!

OrioleDB benchmark: write-amplification & bloat test: IO

[Figure, two panels, PostgreSQL vs OrioleDB. Throughput: TPS (0 to 800000) over seconds (400 to 1600). IO load: IOPS (0 to 35000) over seconds (0 to 1750).]

OrioleDB: 5X higher TPS! 22X less IO/TPS!

OrioleDB benchmark: write-amplification & bloat test: space

[Figure: Space used, PostgreSQL vs OrioleDB. X axis: seconds (400 to 1600); Y axis: GB (0 to 80).]

OrioleDB: no bloat!


OrioleDB benchmark: taxi workload (1/3): read

[Figure: Disk read, PostgreSQL vs OrioleDB. X axis: seconds (0 to 3500); Y axis: IOPS (0 to 175).]

OrioleDB: 9X less read IOPS!

OrioleDB benchmark: taxi workload (2/3): write

[Figure: Disk write, PostgreSQL vs OrioleDB. X axis: seconds (0 to 3500); Y axis: IOPS (0 to 350).]

OrioleDB: 4.5X less write IOPS!

OrioleDB benchmark: taxi workload (3/3): space

[Figure: Space used, PostgreSQL vs OrioleDB. X axis: seconds (0 to 3500); Y axis: GB (0 to 40).]

OrioleDB: 8X less space usage!

OrioleDB = Solution of wicked PostgreSQL problems + extraordinary performance


Roadmap

▶ Basic engine features ✓
▶ Table AM interface implementation ✓
▶ Data compression ✓
▶ Undo log ✓
▶ TOAST support ✓
▶ Parallel row-level replication ✓
▶ Partial and expression indexes ✓
Initial release
▶ GiST/GIN analogues


OrioleDB status

▶ Release is scheduled for December 1st, 2021;
▶ https://github.com/orioledb/orioledb;
▶ If you need more explanation, don’t hesitate to make pull requests.