You are on page 1of 2

UUID Primary Key in Postgres, What Insert

Performance Impact?
Ask Question

19
3
I am wondering about the performance impact of using a non-sequential UUID as the primary key in a table
that will become quite large in PosgreSQL.

In DBMS's that use clustered storage for table records it is a given that using a UUID is going to increase the
cost of inserts due to having to read from disk to find the data page into which to perform the insert, once the
table is too big to hold in memory. As I understand it, Postgres does not maintain row clustering on inserts, so
I imagine that in Postgres using a UUID PK does not hurt the performance of that insert.

But I would think that it makes the insert into the index that the primary key constraint creates much more
expensive once the table is large, because it will have to constantly be read from disk to update the index on
insertion of new data. Whereas with a sequential key the index will only be updated at the tip which will
always be in memory.

Assuming that I understand the performance impact on the index correctly, is there any way to remedy that or
are UUIDs simply not a good PK on a large, un-partitioned table?

database performance postgresql indexing uuid

shareimprove this question


edited Dec 11 '15 at 19:12

Dmitry Savinkov
3,59421528
asked Dec 11 '15 at 18:28

philofinfinitejest
2,9131820
add a comment
2 Answers
activeoldest votes

14
As I understand it, Postgres does not maintain row clustering on inserts
Correct at the moment. Unfortunately.

so I imagine that in Postgres using a UUID PK does not hurt the performance of that insert.
It still does have a performance cost because of the need to maintain the PK, and because the inserted tuple is
bigger.

 The uuid is 4 times as wide as a typical 32-bit integer synthetic key, so the row to write is 12 bytes bigger
and you can fit fewer rows into a given amount of RAM
 The b-tree index that implements the primary key will be 4x as large (vs a 32-bit key), taking longer to
search and requiring more memory to cache. It also needs more frequent page splits.
 Writes will tend to be random within indexes, not appends to hot, recently accessed rows
is there any way to remedy [the performance impact on the index] or are UUIDs simply not a good PK on a
large, un-partitioned table?
If you need a UUID key, you need a UUID key. You shouldn't use one if you don't require one, but if you
cannot rely on a central source of synthetic keys and there is no suitable natural key to use, it's still the way to
go.

Partitioning won't help much unless you can confine writes to one partition. Also, you won't be able to
usefully use constraint exclusion on searches for the key if writing only to one partition at a time, so you'll still
have to search all the partitions' indexes for a key when doing queries. I can only see it being useful if your
UUID forms part of a composite key and you can partition on the other part of the composite key.

shareimprove this answer


answered Dec 12 '15 at 10:27

Craig Ringer
203k36425551
 2
Why is it "unfortunate" that Postgres doesn't store the data clustered? I have never understood why that would be
the default way of storing a relational table. There are some cases where it is helpful, but as a general default I don't
buy it. But then maybe I have been working too long with Oracle where index organized tables (Oracle's equivalent
to a clustered index) are the exception. – a_horse_with_no_name Dec 12 '15 at 10:33
 1
I don't think it should be the default; heaps often have better properties. But index-organized tables are great for
read-mostly tables that're joined on a lot. It's a pain CLUSTERing a table all the time. You can still create big fat
indexes for use with index-only scans, but that's a waste of disk and I/O. HOT updates in PostgreSQL help, albeit
at the price of needing wasted space in each page to use to write updated tuples. BRIN indexes would benefit
greatly from attempts to direct writes near similar values, and will hopefully also make it easier to implement such a
feature. – Craig Ringer Dec 12 '15 at 10:38
add a comment
1
It should be mentioned that you will get more WALs generated if you have btree index on UUID column
with full_page_writes option enabled. This happens because of UUID randomness - the values are not
sequential so each insert is likely to touch completely new leaf index leaf page. You can read more in On the
impact of full-page writes article.

You might also like