You are on page 1of 13

Postgres DBA Interview Questions

Postgres architecture & internals:

1. How connection is established in postgres?


The supervisor process is called postmaster and listens at a specified TCP/IP port for incoming
connections. Whenever a connection request comes, it spawns a new backend process. Those
backend processes communicate with each other and with other processes of the
instance using semaphores and shared memory to ensure data integrity throughout concurrent
data access.

Once a connection is established, the client process can send a query to the backend process it’s
connected to. The backend process parses the query, creates an execution plan, executes the plan,
and returns the retrieved rows to the client by transmitting them over the established connection.

2. Explain the postgres architecture?

3. What are some important background processes in postgres?


BGWriter ->  It writes dirty/modified buffers to disk
WalWriter- > WAL buffers are written out to disk at every transaction commit. The default value
is -1.
SysLogger:Error Reporting and Logging
Checkpoint
Stats Collector -> It collects and reports information about database activities. The permanent
statistics are stored in pg_catalog schema in the global  subdirectory.
Archiver ->. Setting up the database in Archive mode means to capture the WAL data of each
segment file once it is filled and save that data somewhere before the segment file is recycled for
reuse.

4. What are the memory components in postgres?


Shared Memory:
shared_buffer ->
 Sets the number of shared memory buffers used by the database server
 Each buffer is 8K bytes
 Minimum value must be 16 and at least 2 x max_connections
 Default setting is managed by dynatune
 6% – 25% of available memory is a good general guideline
 You may find better results keeping the setting relatively low and using the operating
system cache more instead
 
wal_buffer ->
 Number of disk-page buffers allocated in shared memory for WAL data
 Each buffer is 8K bytes
 Needs to be only large enough to hold the amount of WAL data created by a typical
transaction since the WAL data is flushed out to disk upon every transaction commit
 Minimum allowed value is 4
 Default setting is -1 (auto-tuned)
clog buffer

Private Memory used by each server process:


Temp_buffer -> for temp table operations.
work_mem ->
 Amount of memory in KB to be used by internal sorts and hash tables beforeswitching to
temporary disk files
 Minimum allowed value is 64 KB
 It is set in KB and the default is managed by dynatune
 Increasing the work_mem often helps in faster sorting
 work_mem settings can also be changed on a per session basis
 
maintenaince_work_mem ->
 Maximum memory in KB to be used in maintenance operations such as
 VACUUM, CREATE INDEX, and ALTER TABLE ADD FOREIGN KEY − Minimum
allowed value is 1024 KB
 It is set in KB and the default is managed by dynatune
− Performance for vacuuming and restoring database dumps can be improved
 by increasing this value
autovacuum_work_mem ->
 Maximum amount of memory to be used by each autovacuum worker process
 Default value is -1, indicates that maintenance_work_mem to be used instead

5. When wal writer write data to wal segement?


 
6. When bgwriter writes data to disk?

6. What are some wal related parameter in postgrs.conf file?


max_wal_size – > It is the total size of wal segment(soft limit). If the this size is filled, then
checkpoint will occurs.
wal_keep_segment
wal_keep_size =
max_wal_sender = 10 (maximum number of wal sender process)
wal_level =

7. What is checkpoint? When checkpoint happens in postgres?


A checkpoint is a point in the transaction log sequence at which all data files have been updated
to reflect the information in the log. All data files will be flushed to disk.
1. pg_start_backup,
2. CREATE DATABASE,
3. pg_ctl stop|restart,
4. pg_stop_backup,
5. When we issue checkpoint command manually.

For periodic checkpoints below parameter play important role.


 checkpoint_timeout = 5min
 max_wal_size = 1GB
A checkpoint is begun every checkpoint_timeout seconds, or if max_wal_size is about to be
exceeded, whichever comes first. The default settings are 5 minutes and 1 GB, respectivel
With these (default) values, PostgreSQL will trigger a CHECKPOINT every 5 minutes, or after
the WAL grows to about 1GB on disk

8. Let’s say your work_mem is 4MB. However your query is doing heavy sorting, for which
work_mem required might be more than 4MB. So will the query execution be successful?
Yes it will be successful, Because it utilized the work_mem fully, it started using temp files.

9. What are the different phases of statement processing?

10. Which parameters controls the behaviour of BGWriter?


bgwriter_delay – size of sleep delay when number of processed buffers exceeded.
bgwriter_lru_maxpages – number of processed buffers after bgwriter delays.
bgwriter_lru_multiplier – multiplier used by bgwriter to calculate how many buffers need to be
cleaned out in the next round.
These settings used to make bgwriter more or less aggressive – lower values of maxpages and
multiplier will make bgwriter lazier, and higher maxpages and multiplier with low delays will
make bgwriter more diligent.

11. Does postgres support direct i/o.?

12. How authentication happens in postgres?


pg_hba.conf file is used defining host based authentication methods. In this file, we define the
details like host,database, ip address, user and method.
Different types of methods in pg_hba.conf file are .
 Trust: for this option users can connect to the database without specifying a password.
When using this option one should be cautious.
 Reject: This option rejects a connection to a database(s) for a user for a particular record
in the file.
 Password: this option prompts the user for a password before connecting to the database.
When this method is specified the password is not encrypted between the client and the
database.
 Md5: this option prompts the user for a password before connecting to the database.
When this method is specified the client is required to supply a double-MD5-hashed
password for authentication.
 rver.
 Ident – Obtain the operating system user name of the client by contacting the ident server
on the client and check if it matches the requested database user name. Ident
authentication can only be used on TCP/IP connections. When specified for local
connections, peer authentication will be used instead.
If you are doing any changes to pg_hba.conf file, then you need to reload /restart the cluster for
the changes to take place.

13. What is the significance of pg_ident.conf file?


Just like we have os authenticated db users in oracle. Here in postgres also we have similar
concept. We can provide mapping of os user and postgres db user inside pg_ident.conf file.

14. What is this wal_level parameter, different values of wal_level?


 Wal_level determines how much information is written to the WAL
 The default value is replica, adds logging required for WAL archiving as well as
information required to run read-only queries on a standby server
 The value logical is used to add information required for logical decoding
 The value minimal, removes all logging except the information required to recover from
a crash or immediate shutdown
 This parameter can only be set at server start

15. What is visibility map in postgres?


Every heap relation (i.e table/index) have a visibility map associated with them.
Every visibility map has 2 bits per page.
The first bit, if set, indicates that the page is all-visible (means those pages need not to be
vacuumed)
The second bit, if set means, all tuples on this page has been frozen. (No need to vacuum)
Note – > Visibility map bits are set by VACUUM operation. And if data is modified, bits will be
cleared.
This condition helps in index only scan.

16. What is free space mapping (FSM) in postgres?


Each table/index has a Free space mapping file. It keeps information about which pages are free.
The VACUUM process also updates the Free Space Map and using VACUUM FULL we can
recover those free spaces.

17. What is initial fork?

18. What is TOAST?


19. What happens in the background during vacuuming process?

20. What is transaction id wraparound? How to overcome it.

21. What is the significance of search_path in postgres?\


 
22. What is multi version concurrency control( MVCC)? 
MVCC helps in currency control by maintaining multiple copies of same tuple, so that read and
write operations same tuple doesnt impacts.  Update in postgres means new row will be inserted
and old row will be invalided( we can say delete followed by insert). And these old tuples can be
visible to the other transactions depending upon the isolation level.
 
22. What are the advantages and disadvantages of MVCC?
Advantage is read consistency.  Because of mvcc readers are writers don’t block each other. If a
a row has been modified by a user, but not committed. And if another user try to access that row,
Then he will get the old committed data.

Disadvantage is , it causes bloating. Also it needs more data to keep multiple versions of the
data. If your database doing lot of DML activities, then postgres need to keep all the old
transaction records also(updated or deleted) . And maintenance activity like vaccum need to be
done remove the dead tuples.

23. Difference between pg_log, pg_clog, pg_xlog?

24. What is ctid?


It is similar to row_id in oracle.
The physical location of the row version within its table. Note that although the ctid can be used
to locate the row version very quickly, a row’s ctid will change if it is updated or moved
by VACUUM FULL.

25. What is oid?


Every row in postgres will have a object identifier called oid.
 
26. Difference between oid and relfilenode?

27. Difference between postgres.conf and postgres.auto.conf file?


postgres.conf is the configuration file of the postgres cluster. But when we do any config
changes using alter system command, then those parameters are added in postgres.auto.conf file.
When postgres starts , it will first read postgres.conf and then it will read postgres.auto.conf  file.

28. What is timeline in postgres?

29. what is full page write in postgres?


full_page_write parameter is set to on ( default value).
When this parameter is on, the PostgreSQL server writes the entire content of each disk page to
WAL during the first modification of that page after a checkpoint. This is needed because dirty
write that is in process during an operating system crash might be only partially completed,
leading to an on-disk page that contains a mix of old and new data. The row-level change data
normally stored in WAL will not be enough to completely restore such a page during post-crash
recovery. Storing the full page image guarantees that the page can be correctly restored.

ADMINISTRATION:
1. What is the default port in postgres?
Default is 5432.

2. What is the default block size in postgres? Can we set different block size?

3. What is a tablespace?
A tablespace is used to map a logical name to a physical location on disk. In the simplest word,
we can understand tablespace as a location on the disk where all database objects like table and
indexes are stored.
It is also important to note that the name of the tablespace must not start with pg_, as these are
reserved for the system tablespaces.

Tablespace list can be found using below command:


postgres=# select * from pg_tablespace;
(OR)
postgres=# \db+

4. What are the tablespaces created by default after installing postgres cluster?
Pg_global – > PGDATA/global – > used for cluster wide table and system catalog
Pg_default – > PGDATA/base directory – > it stores databases and relations
Whenever any user creates any table or index, it will be created under pg_default and this is the
default_tablespace setting.
However you can change the default_tablespace setting using below one.
alter system set default_tablespace=ts_postgres;

5. What is the use of temporary tablespace? 


It is used to store temporary objects like temporary table etc. But If we don’t set temp_tablespace
parameter explicitly then , temporary objects will be created under pg_default tablespace.
postgres=# SELECT name, setting FROM pg_settings where name=’temp_tablespaces’;

6. What are the default databases created after setting up postgres cluster? 
postgres=# select datname from pg_database;
datname
———–
postgres
template1
template0
 
7. What is the significance of template0 and template1 databases? What the difference
between these two?
Whenever we create a new database , it creates is using the template1 database. And if you
modify anything in template1 database like creating new extension in template1 database, then
these changes will be reflected to other created databases.

Template0 – > No changes are allowed to this database. If you messed up your template1
database, then you can revert them by creating a new one from template0.

In the below screenshot, you can see for template1, datallowconn is true, But template0 it is
false.

 
8. What are common database object names in postgres compare to industry terms.
 

9. Difference between user and schema in postgres?


If you are aware of oracle db, then you know that both user and schema are same. But in postgres
both are different.

A schema is a collection of database objects. But a user is used to connect to postgres cluster. A
single user can be used to connect to any database in the cluster (provider permission is give).
However schema is local to that particular database.
Below diagram should clear your doubts
|-------------------------------------------|---|
| PostgreSQL instance | |
|-------------------------------------------| U |
| Database 1 | Database 2 | S |
|---------------------|---------------------| E |
| Schema 1 | Schema 2 | Schema 1 | Schema 2 | R |
|----------|----------|----------|----------| S |
| t1,t2,t3 | t1,t2,t3 | t1,t2,t3 | t1,t2,t3 | |
-------------------------------------------------

10. Difference between role and user in postgres? Can we convert a role to user?
Role and user are almost same in postgres, only difference is , a role cannot login , But a user
can. We can say like a user is  role with login privilege.
And yes we can convert the role to a user.
alter role <role_name> nologin;
 
11. Difference mode to stop a postgres cluster using pg_ctl?
Postgres has 3 shutdown modes.
 smart –> It is the normal mode . i.e  shutdown will wait for all existing connection to be
terminated And it might take a lot of time. And no new connections will be allowed
during this.
 fast -> It terminates all existing sessions and performs checkpoint . It is comparably
quick. But if there is lot of trnsactions that need to written to disk, then checkpoint might
take lot of time.
 immediate– > It will abort the instance. I.e it will terminate all existing session and
shutdown the instance without performing checkpoint. It is the quickest . But upon
instance startup , recovery will happen.
 
So if you have lot of pending transactions in your database, then the best way is perform
checkpoint manually and then stop the server.
postgress#CHECKPOINT; – > Run as super user
enterprisedb$ pg_ctl stop -m fast
 
12. Can you give comparison of shutdown modes between oracle and postgres?
ORACLE POSTGRES
shutdown normal smart
shutdown immediate fast
shutdown abort immediate

13. Can we stop a particular database in postgres cluster?


No we cannot stop/start a particular database in postgres. We can only shutdown the postgres
cluster.

14.  What are foreign data wrappers? What is its use?


 
2. What are the different datatypes in postgres?
Boolean
char,vchar,
int,float(n)
uuid
date

2. what is the maximum file size of table or index in postgres? Can we increase that ?
Max size is 1GB.  If a table size is big, then it can spread across multiple files.
 
4. Is there a way in which we can rebuild/reorg a table online to release free space?
As we know vacuum full is used to rebuild table and it releases free space to operating system.
However this method, puts an exclusive lock on the table.
So we can use pg_repack extension to rebuild a table online.

4. How can i check the version of postgres?

4. What is the latest version of postgres in market?


Postgres 14.

5. While dropping a postgres database i am getting error? What might be the issue?
If you want to drop a database , then you need to fire the command, after connecting to a
different database in the same postgres cluster with superuser privilege.

5. What are some key difference between oracle and postgres?

5. Between postgres and nosql database like mongodb , which one is better?

5. What is table partitioning in postgres? What are the advantages?

5. How you monitor long running queries in postgres?

5. How to kill a session in postgres?

5. What is an extension in postgres? Which extensions you have used ?


 
5. What are the popular tools for managing backup and recovery in postgres?
edb bart , barman etc

5. How can we use connection pooling in postgres?


pgbouncer is used for connection pooling

5. Which utility is used to upgrade postgres cluster?


pg_upgrade utility is used to upgrade postgres version.

5. How can we encrypt specific columns in postgres?


pgcrypto extension can be used

5. What is the use of pgbench utility?


5. What is difference between pg_cancel_backend vs pg_terminate_backend?

 
MAINTENANCE:
1. What is vacuuming in postgres?
Whenever tuples are deleted or becomes obsolete due to update, then they are not removed
physicallly. These tuples are known as dead tuples. So vacuuming can be used to clear those
dead tuples and release the space.

2. Difference between vacuum and vacuum full?


Vacuum(without full)
 Vacuum without full option, will reclaim the space and re-use it. However that free space
is not returned to operating system.
 This will not put any exclusive lock on the system. So impact on the normal read write of
the database.
 We can run this in parallel also. called as parallel vacuuming

Vacuum(with full)
 Vacuum with full option, will reclaim more space and release the space to operating
system.
 However we need additional storage , as it will rewrite the entire  data to new disk file
without any extra space.
 Also it will put exlusive lock on the table for which vaccuming is in progress.
 This method is much slower.

3. What do you mean by vacuum freeze? And when we use it?

4. What is auto vaccuming in postgres?


Autovacuum triggers when dead tuples in a table reaches 20 percent of. table and it analyze the
table when it reaches 10 percent of a table.
Below four parameters control the behaviour of the autovacumming.
autovacuum_vacuum_scale_factor = 0.2;
autovacuum_analyze_scale_factor = 0.1;
autovacuum_vacuum_threshold (integer)50
autovacuum_analyze_threshold (integer)50

5. How can we increase the performance of auto vacuuming?


autovacuum_max_workers parameter helps

6. How can i disable autovacuuming?

7. What is the data fragmentation or bloating in postgres?

8. Difference between normal transaction id and frozen transaction id?

9. How statistics were updates in postgres?


10. What CLUSTER command does? When and how it helps.?
As we know the table data may or may not be in a ordered fashion. However in index, the entries
are in ordered fashion. So we ca sort the data inside table to be in ordered fashion by refering the
index.
command is – > CLUSTER <table_name> USING < index_name>
In oracle , we will this clustering factor. clustering factor is low means, table data are is orderly
manner.
If we are accessing only single row,  then this clustering factor doesn’t impact the performance.
However you are accessing multiple rows, using index, then clustering factor will improve the
performance.
NOTE – > CLUSTER will put exclusive lock on the table during this activity.

11. What is reindexing?

12. Let’s says autovacuuming already running in background and You again ran the
vaccum command manually? What will happen?
In this case, postgres will terminate the autovacuum worker process and give priority to the
manual vacuum command. i.e autovacuuming will never block the normal manual database
operations.

13. Can i disable autovacuum for few tables in the database?


Yes you can do with below command.
alter table test2 set( autovacuum_enabled = off);
 
REPLICATION:
 
5. How can you check whether the database is primary or standby (replication).
Method 1 :
We can check using below query. If the output is t means true i.e it is standby (slave server). If
false means its primary or master.
select pg_is_in_recovery();

Method 2:
You can run below query.
select * from pg_stat_replication;
 
5. I have a master and slave setup with streaming replication. Now I want to break the
replication and open the standby server for some testing purpose and once the testing is
done, the server again need to be a standby of that primary with replication enabled. How
can i do that?
pg_rewind.
 
MONITORING:
BACKUP & RECOVERY:

5. Explain different backup methods in postgres?


Reference – https://www.tutorialdba.com/p/postgresql-backup-recovery-overview.html

 
5. Difference between pg_dump and pg_dumpall?
Both are logical backup utilities.
pg_dump -> Used for taking backup of databases, schema,table .backup format is like sql, tar file
or directory
pg_dumpall -> Used for taking backup of complete cluster. backup format is sql .

1. What pg_basebackup does internally?

1. Explain how postgres does crash recovery?


In case of crash of postgres instance, when the instance starts again, it will reply the wal file and
recover the instance.

1. Can we take incremental backup using postgres? and how?


You can use third party tools like BART and barman.
INDEXES:

1. How can i rebuild all the indexes of a database in one command ?


As a super user run the below command.
REINDEX DATABASE <DB_NAME>;

2. What happens internally when we rebuild indexes?


 
3. Different type of indexes present in postgres ?
 BTREE
 HASH
 GIST
 BRIN
 GIN

4.  Does create index puts lock on the table? If yes , then any workaround to avoid this
locking?
Yes create index command puts exclusive lock on the table . So you can
add CONCURRENTLY keyword, which will avoid the exclusive lock on the table.

5. What is index only scan in postgres?


PATCH & UPGRADE:
PERFORMANCE RELATED:
 
TROUBLESHOOTING:
 
postgres DBA Interview questions - DBACLASS DBACLASS

You might also like