Session: C09
Tue Oct 25 5:30-6:30
This session will dive into the internals of DB2 UDB, including details of the latest versions of DB2 on the Unix, Windows, and Linux platforms (V8.2 and beyond). Details such as record formats, page formats, index algorithms, memory management and tuning, storage management, bufferpool algorithms, logging, and the process and threading design will be covered in depth. As each concept is explained,
key hints, tips and best practice information will be provided. This will enable DBAs and System Administrators to fully exploit the functions and features of DB2 UDB.
In this first (of two) parts, the focus will be on process and thread management, as well as logging, buffering and memory management.
Agenda
Part I
Architecture Overview
Process/Thread Model
Base Processing Model
Concentrator
Hints/Tips/Best Practices
Memory Management, Buffering, Logging
Shared and Private Memory Heaps
Sorting
Buffer Pools
Logging
Hints/Tips/Best Practices
Part II
Storage Architecture
System Managed Storage (SMS) Tablespaces
Database Managed Storage (DMS) Tablespaces
Automatic Storage
Hints/Tips/Best Practices
Data Management
Tables, Records, Indexes
Page Format, Space Management
Multi-Dimensional Clustering
Hints/Tips/Best Practices
As you can tell from this agenda, I'm going to focus on the lower levels of the system (i.e. the components below the SQL processor). The basic format is: explain some concepts, follow up with hints/tips related to those concepts, explain more concepts, more hints/tips, and so on.
Some administrivia related to registry variables and configuration parameters (which are often referred to in the talk):
Registry variables are usually referred to with syntax like DB2_STRIPED_CONTAINERS=ON. To set a registry variable, do the following:
db2set DB2_STRIPED_CONTAINERS=ON
Configuration parameters are usually referred to in italics. To view configuration parameter values use:
db2 get database manager configuration
db2 get database configuration for <dbname>
I'll first provide a high-level narrative of the processing flow throughout the database server.
Architecture Overview
[Slide diagram: clients connect to the DB2 server, where a coordinator agent and parallel subagents run across all CPUs, supported by a package cache, log buffers, buffer pools, a logger, prefetchers, and page cleaners driving the I/O subsystem. Slide callouts:]
Parallelism: SQL and Utilities; Intra- & Inter-Partition Parallelism; Cost-based Optimizer with Query Rewrite; Dynamic throttling based on load
Parallel SMP Exploitation: All CPUs exploited through OS threads and processes
Very Large Memory Exploitation: 64-bit Support
I/O Buffering: Multiple Buffer Pools
I/O Subsystem: Asynchronous, Parallel I/O; Automatic, Intelligent Data Striping with Parallel I/O; Big-block I/O; Scatter/Gather I/O
Each circle in the box is an EDU (Engine Dispatchable Unit). EDUs are implemented as threads on Windows (all within a single process) and as processes on Linux and UNIX. Each application is assigned a dedicated coordinator agent (by default; more on this later), which coordinates the processing for that application and communicates with it. Applications can also be assigned a set of subagents which work together on individual SQL requests (e.g. sharing a sort) so as to fully exploit SMP machines. All agents are managed with a pooling algorithm which minimizes EDU creations/destructions. The circles above the cloud are application programs (either local or remote) that are linked with DB2's client library (the client and server versions can differ, within limits). Local clients talk to their coordinator agent via shared memory and semaphores. Remote clients use TCP/IP, SNA, or IPX/SPX.
The prefetchers' main duty is to ensure that agents doing scans never wait for disk I/O. Agents send asynchronous read-ahead requests to a common prefetch queue, and the prefetchers use big-block or scatter-read I/Os to bring the requested pages into the bufferpool. Data is striped across the disks to enable the prefetchers to drive multiple disks simultaneously.
The page cleaners' main duty is to ensure that an agent bringing a page into the buffer pool never needs to flush a dirty page to disk to free up a slot; this extra I/O would unnecessarily increase response time. The page cleaners are background EDUs which, under certain conditions (described later), wake up and "clean" (flush to disk) dirty pages.
The ARIES* recovery method is used (generally recognized as the most advanced in the industry). Agents updating a record in the database update the associated page (of course) and write a log record containing the information necessary to either redo or undo the change. (Various techniques, including XOR logging, are used to minimize the amount of data logged.) Neither the page nor the log buffer is flushed to disk immediately (to optimize performance). The logger and bufferpool manager cooperate to implement a WAL (Write-Ahead Logging) protocol that ensures no dirty page makes it to disk before its associated log record. The only I/O that is always required per transaction is a force of the log at COMMIT time.
* C. Mohan, D. Haderle, B. Lindsay, H. Pirahesh, P. Schwarz. ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging. ACM Transactions on Database Systems, 17(1), 1992.
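To make the WAL protocol concrete, here's a minimal Python sketch of the two invariants described above: the log record for a page's newest change must be durable before the page itself reaches disk, and COMMIT forces the log. All names and structures are illustrative; this is not DB2's actual implementation.

```python
# Minimal sketch of the Write-Ahead Logging (WAL) invariant.
# Illustrative only - not DB2's actual internals.

class WALManager:
    def __init__(self):
        self.log_buffer = []     # log records not yet on disk, as (lsn, record)
        self.flushed_lsn = 0     # highest LSN known to be durable

    def append(self, lsn, record):
        self.log_buffer.append((lsn, record))

    def flush_up_to(self, lsn):
        # Force all log records with LSN <= lsn to disk (simulated).
        self.log_buffer = [(l, r) for (l, r) in self.log_buffer if l > lsn]
        self.flushed_lsn = max(self.flushed_lsn, lsn)

def write_dirty_page(page, wal):
    # WAL rule: the log record describing the newest change to the
    # page must be durable before the page itself is written out.
    if page["page_lsn"] > wal.flushed_lsn:
        wal.flush_up_to(page["page_lsn"])
    page["on_disk"] = True

def commit(txn_last_lsn, wal):
    # The only I/O always required per transaction: force the log at COMMIT.
    wal.flush_up_to(txn_last_lsn)

wal = WALManager()
wal.append(1, "update page 7")
wal.append(2, "update page 9")
page7 = {"page_lsn": 1, "on_disk": False}
write_dirty_page(page7, wal)   # forces the log through LSN 1 first
commit(2, wal)                 # forces the rest of the log
```

Note how a dirty-page write never outruns the log: the bufferpool manager asks the logger to force the log up to the page's LSN before the page I/O proceeds.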
Architecture Overview : DPF Feature
[Slide diagram: clients connect to one of several database partitions; partitions communicate over shared memory or a high-speed interconnect, each with its own CPUs. Slide callouts:]
Shared Nothing Architecture: allows virtually unlimited scalability
Partitions are logical: any number of partitions can be created on a single physical machine (works extremely well with NUMA architectures); each partition owns its resources (buffer pool, locks, disks, ...)
Avoids common limits on scalability: no need for a distributed lock manager or buffer coherence protocols; no need to attach disks to multiple machines
Virtually everything runs in parallel across nodes: SQL (queries, inserts, updates, deletes); Utilities (Backup, Restore, Load, Index Create, Reorg); optimized by a global optimizer
Partitions communicate only necessary tuples: using shared memory (same machine); using high speed communications (different machines)
In an MPP environment, the processing architecture described on the previous slide is extended across all database partitions (aka nodes). Clients connect to one node, and that node is where the coordinator agent will reside. The subagent pool, however, extends across all nodes that have been involved in SQL requests issued by the client.
Data for a given table is partitioned across the nodes automatically and transparently by DB2, based on a key hashing algorithm. Within each node, the operation of the node (e.g. prefetching, page cleaning, etc.) is identical to that described on the previous chart.
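The key-hashing scheme can be illustrated with a toy sketch. DB2 actually hashes the distribution key into a much larger partitioning map; the map size, hash function, and node names below are simplified illustrations, not DB2's real algorithm.

```python
# Toy illustration of hash-based table partitioning across database
# partitions (nodes). The map size and hash used here are simplified.

MAP_SIZE = 16          # illustrative; DB2's partitioning map is much larger
NODES = [0, 1, 2, 3]   # four database partitions

# The partitioning map assigns each hash bucket to a node, round-robin.
partition_map = [NODES[i % len(NODES)] for i in range(MAP_SIZE)]

def target_node(partition_key):
    """Return the node that owns rows with this partitioning key."""
    bucket = hash(partition_key) % MAP_SIZE
    return partition_map[bucket]

# Every request routes a given key to the same node, transparently:
rows = ["cust-001", "cust-002", "cust-003"]
placement = {k: target_node(k) for k in rows}
```

The indirection through a map (rather than hashing directly to a node number) is what lets data be redistributed by editing the map when partitions are added, without changing the hash of every key.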
Agenda
Part I
Architecture Overview
Process/Thread Model
Base Processing Model
Concentrator
Hints/Tips/Best Practices
Memory Management, Buffering, Logging
Shared and Private Memory Heaps
Sorting
Buffer Pools
Logging
Hints/Tips/Best Practices
Part II
Storage Architecture
System Managed Storage (SMS) Tablespaces
Database Managed Storage (DMS) Tablespaces
Hints/Tips/Best Practices
Data Management
Tables, Records, Indexes
Page Format, Space Management
Multi-Dimensional Clustering
Hints/Tips/Best Practices
I'll now get into more details on how DB2 uses processes and threads.
Process/Thread Organization
Processing Model : Detailed View
[Slide diagram, with EDUs grouped per-instance, per-application, and per-database, plus idle, pooled agents and subagents. Shown: the logging EDUs (db2loggr, db2loggw) writing to the log disks; the deadlock detector (db2dlock); the prefetchers (db2pfchr) issuing parallel, big-block read requests from the data disks into the buffer pool(s); and the page cleaners (db2pclnr) issuing parallel page-write requests to the data disks.]
This chart, through its animation, takes you through the startup and database activation process, step by step.
Process/Thread Organization
Processing Model : Detailed View
[Slide diagram, continued: SQL requests arrive from clients over shared memory & semaphores, TCP/IP, Named Pipes, NetBIOS, SNA, or IPX/SPX, and are accepted by the listeners. The UDB server maintains an instance-level idle agent pool; per-database EDUs include the deadlock detector and page cleaners, writing to the log disks and data disks. EDU groupings: per-instance, per-application, per-database, and idle, pooled agents or subagents.]
This chart shows the algorithm used to select (or create) a new subagent if one is needed to execute a new SQL statement on the leftmost application.
As indicated, the algorithm strongly prefers re-using existing processes/threads over creating new ones.
Processing Model : Hints/Tips
Ensure agents are "stolen" and "created" much less frequently than they are "assigned" from the pool
A high OS "context switch" rate may also indicate that the steal or creation rate is too high
[Slide diagram, connection concentrator: N client connections are served by K coordinator agents over the communications link. Requests arrive over shared memory & semaphores, TCP/IP, Named Pipes, NetBIOS, SNA, or IPX/SPX and are accepted by the listeners; new transactions and SQL within a transaction are routed by the dispatchers (db2disp) to database-level coordinator agents and their subagents, backed by the instance-level idle agent pool, the logging subsystem, buffer pool(s), deadlock detector, page cleaners, prefetchers, bufferpools, log disks, and data disks.]
For each new transaction, the dispatcher will:
Select an agent from the idle pool, or,
Create a new agent (if within the configured limit)
Otherwise, queue the request
When a new transaction starts, the dispatchers (there can be more than one) try to find or create an agent to work on it (the brown arrows indicate this process).
If there is no free idle agent, or an agent cannot be created (because the configured limit on the number of coordinator agents has been reached), the request is queued.
When a coordinator agent working on behalf of a particular transaction becomes available (because the transaction ends), the agent will then serve the next transaction (regardless of connection) on the queue. If the queue is empty, it will wait for a request to appear.
Note that when Application Groups are present (discussed in previous speaker notes) there is one queue per application group, and each agent is associated with a particular application group.
Agents first look to their own application group for new transaction requests.
To ensure no single application group 'hogs' the system resources, there is a mechanism which allows agents to migrate across application groups over time.
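The find/create/queue logic above can be sketched in a few lines of Python. This is a simplified illustration of the dispatcher's decision, not DB2's actual code; the class and method names are invented for the example, and the application-group refinements are omitted.

```python
# Sketch of the dispatcher's agent-selection logic: reuse an idle agent
# if possible, create one if under the configured limit, otherwise
# queue the transaction. Illustrative only - not DB2's internals.
from collections import deque

class Dispatcher:
    def __init__(self, max_coordagents):
        self.max_coordagents = max_coordagents
        self.idle_pool = deque()   # pooled, idle coordinator agents
        self.active = 0            # agents currently assigned to transactions
        self.queue = deque()       # transactions waiting for an agent

    def new_transaction(self, txn):
        if self.idle_pool:         # cheapest path: reuse a pooled agent
            agent = self.idle_pool.popleft()
        elif self.active + len(self.idle_pool) < self.max_coordagents:
            agent = f"agent-{self.active}"   # create a new EDU (expensive)
        else:
            self.queue.append(txn)  # at the limit: queue the request
            return None
        self.active += 1
        return agent

    def transaction_done(self, agent):
        self.active -= 1
        if self.queue:              # serve the next queued transaction,
            self.queue.popleft()    # regardless of which connection sent it
            self.active += 1
            return agent            # same EDU - no create/destroy
        self.idle_pool.append(agent)  # otherwise return to the idle pool
        return None

# Example: one configured coordinator agent, two concurrent transactions.
d = Dispatcher(max_coordagents=1)
first = d.new_transaction("txn-1")    # creates the only agent
queued = d.new_transaction("txn-2")   # limit reached: queued (returns None)
reused = d.transaction_done(first)    # the same agent picks up txn-2
```

The point of the sketch is the cost ordering the slide describes: assignment from the pool is cheap, creation is expensive, and queuing bounds the total number of EDUs.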
Processing Model : Application Groups V8
[Slide diagram: clients reach the UDB server over shared memory & semaphores, TCP/IP, Named Pipes, NetBIOS, SNA, or IPX/SPX. The instance level holds the listeners and the idle agent pool; coordinator agents feed application-group-level subagents (db2agntp), new in V8, backed by the logging subsystem, buffer pool(s), deadlock detector, page cleaners, log disks, and data disks.]
This chart illustrates the processing model with 2 Application Groups in effect.
Note that in this environment, the database-level idle agent pool is actually comprised of 2 separate idle agent pools - one per Application Group. The other pools (e.g. application level, instance level) are independent of this.
Processing Model : Application Groups V8
[Slide diagram, continued: coordinator agents feed application-group-level subagents (db2agntp), with the logging subsystem, buffer pool(s), prefetchers, deadlock detector, and page cleaners below.]
Application groups are created transparently by DB2 on demand, as applications connect to a database
This chart shows the relationship between the Application Group Heap, the Application Control Heap, and the Application Group Shared Memory segment.
Concentrator & App Group : Hints/Tips
Try to ensure that applications are evenly divided amongst application groups
To avoid wasting memory in one of the application groups
For example, if you expect 200 peak applications at any given time, consider setting
APPGROUP_MEM_SZ / APP_CTL_HEAP_SZ to 100 (or 50 or 200)
This will ensure 2 ( or 4 or 1) 'fully populated' application groups
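The sizing rule above is simple integer arithmetic; here's a quick sanity check using the numbers from the example. The function name is invented for illustration; the ratio APPGROUP_MEM_SZ / APP_CTL_HEAP_SZ is the number of applications per group, as described in the slide.

```python
# Sanity check of the application-group sizing rule: the number of
# 'fully populated' application groups is peak applications divided by
# applications-per-group (APPGROUP_MEM_SZ / APP_CTL_HEAP_SZ).
import math

def num_app_groups(peak_apps, apps_per_group):
    # Round up: a partially filled group still allocates a full
    # application-group shared memory region (wasting memory).
    return math.ceil(peak_apps / apps_per_group)

peak = 200
assert num_app_groups(peak, 100) == 2
assert num_app_groups(peak, 50) == 4
assert num_app_groups(peak, 200) == 1
assert num_app_groups(peak, 150) == 2   # uneven: the 2nd group is mostly empty
```

The last line shows why the slide recommends ratios that divide the peak evenly: 150 apps per group still allocates two full groups for 200 applications.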
The next part of the session focuses on memory management, including buffer management and log management.
Memory Model
Instance Shared Memory
FCM buffers (fcm_num_buffers)
Monitor heap (mon_heap_sz)
This chart describes the various types of memory that exist in a partition, the main heaps and usages of the memory, as well as the configuration parameters which control the sizes of these heaps.
maxappls is a database configuration parameter that sets an upper limit on the number of applications that can connect to a database. maxagents is a database manager configuration parameter that sets an upper limit on the total number of agents in a partition.
All EDUs in a partition are attached to Instance Shared Memory. All EDUs doing work within a database are attached to that database's Database Shared Memory. All EDUs working on behalf of a particular application are attached to an Application Shared Memory region for that application; this type of shared memory is only allocated if intra- or inter-partition parallelism is enabled. In addition, all EDUs working on behalf of a particular application are attached to the Application Group Shared Memory region for the Application Group that application is a member of. Application Groups and Application Group Shared Memory are not used if neither intra-partition parallelism, inter-partition parallelism, nor the concentrator is enabled. Finally, each EDU also has its own private memory.
[Slide fragment: Database Shared Memory regions number 1 ... numdb; agents number 1 ... maxagents. Each coordinator agent / local client pair shares Communications Memory: the agent/client comm area (aslheapsz).]
This chart extends the previous chart, showing how some of the additional segments fit in.
Memory Model : Shared Segments Example
Instance Shared Memory: includes FCM (Fast Communication Manager) buffers
Application Group Shared Memory (V8): Application Group Heap (appgroup_mem_sz, groupheap_ratio)
Application Shared Memory: Application Control Heap (app_ctl_heap_sz)
Agent Private Memory (1 ... maxappls): private sorts (sortheap, sheapthres), application heap (applheapsz), agent stack (agent_stack_sz), query heap (query_heap_sz)
Database Connection Information
Database server = DB2/6000 8.1.5
SQL authorization ID = HURAS
m 1133903926 0x590ed661 --rw------- huras build huras build 18 8126464 88892 139862 13:17:54 13:17:54 13:17:17
m 181403700 0xffffffff --rw------- huras build huras build 8 268435456 151086 151086 13:17:41 13:17:41 13:17:41
m 3801128 0xffffffff --rw------- huras build huras build 2 131072 92296 151086 13:17:41 13:17:41 13:17:41
m 824442949 0x590ed674 --rw-rw-rw- huras build huras build 19 140665792 116770 72352 13:17:54 13:17:55 8:38:05
/home/huras> db2stop
SQL1064N DB2STOP processing was successful.
/home/huras> db2 update dbm cfg using intra_parallel on
DB20000I The UPDATE DATABASE MANAGER CONFIGURATION command completed successfully.
/home/huras> db2start
SQL1063N DB2START processing was successful.
/home/huras> db2 connect to mydb
Database Connection Information
Database server = DB2/6000 8.1.5
SQL authorization ID = HURAS
Local database alias = MYDB
/home/huras> ipcs -ma | grep huras
m 1133510710 0x590ed661 --rw------- huras build huras build 18 11534336 74900 74900 13:16:37 13:16:39 13:16:36
m 1966123 0xffffffff --rw------- huras build huras build 8 268435456 161484 161484 13:16:39 13:16:39 13:16:39
m 3670056 0xffffffff --rw------- huras build huras build 1 82051072 161484 161484 13:16:39 13:16:39 13:16:39
m 181272628 0xffffffff --rw------- huras build huras build 2 131072 92294 161484 13:16:39 13:16:39 13:16:39
m 824442949 0x590ed674 --rw-rw-rw- huras build huras build 19 140665792 116770 74902 13:16:39 13:16:39 8:38:05
This chart provides an example, on a UNIX platform, illustrating what segments get created, when, and how you can identify the segments using the UNIX ipcs command.
Memory Model : Shr Segments Example ...
/home/huras> db2 alter bufferpool ibmdefaultbp size 100000
SQL20189W The buffer pool operation (CREATE/ALTER) will not take effect until
the next database startup due to insufficient memory. SQLSTATE=01657
/home/huras> db2 terminate
DB20000I The TERMINATE command completed successfully.
/home/huras> db2 connect to mydb
Database Connection Information
Database server = DB2/6000 8.1.5
SQL authorization ID = HURAS
Local database alias = MYDB
/home/huras> ipcs -ma | head
IPC status from /dev/mem as of Mon Mar 15 13:37:11 EST 2004
T ID KEY MODE OWNER GRP CREATOR CGRP NATTCH SEGSZ CPID LPID ATIME DTIME CTIME
/home/huras> ipcs -ma | grep huras
m 1133510710 0x590ed661 --rw------- huras build huras build 18 11534336 74900 74900 13:16:37 13:16:39 13:16:36
m 1966123 0xffffffff --rw------- huras build huras build 8 268435456 161484 161484 13:16:39 13:16:39 13:16:39
m 1332477970 0xffffffff --rw------- huras build huras build 8 268435456 161484 161484 13:16:39 13:16:39 13:16:39
m 3670056 0xffffffff --rw------- huras build huras build 1 82051072 161484 161484 13:16:39 13:16:39 13:16:39
m 181272628 0xffffffff --rw------- huras build huras build 2 131072 92294 161484 13:16:39 13:16:39 13:16:39
m 824442949 0x590ed674 --rw-rw-rw- huras build huras build 19 140665792 116770 74902 13:16:39 13:16:39 8:38:05
/home/huras> db2mtrk -d
Tracking Memory on: 2004/03/15 at 13:12:52
Notes:
1. One of -i -d -p must be specified.
2. The -w and -m flags are optional. An invocation of the application is invalid if both flags are specified.
3. The -m flag reports the maximum allowable size for a given heap while the -w flag reports the largest amount
of memory allocated from a given heap at some point in its history.
Usage scenarios:
db2mtrk -i -d
Report current memory usage for instance and all databases
db2mtrk -i -p -m
Report maximum allowable size for instance and agent private memory
db2mtrk -p -r 1 5
Report agent private memory five times at one second intervals
Heap Legend:
When running in normal mode (i.e. -v flag not specified) heaps are named using the following codes:
Details on a very useful new command to track internal DB2 memory usage - the db2mtrk command.
Heaps and Memory : Hints/Tips
Use db2 get snapshot ... to help determine if adjustment is needed, e.g.:
Heap : db2 get snapshot for ...
locklist : database on <dbname> | grep "esc"
pckcachesz : database on <dbname> | grep "Package cache"
sortheap, sheapthres : database on <dbname> | grep "ort"
sheapthres_shr : database manager | grep "ort"
catalogcache_sz : database on <dbname> | grep "Catalog cache"
appgroup_mem_sz : all applications | grep "Total shared"
applheapsz : all applications | grep "Total private"
In deferred (aka lazy) memory allocation schemes, memory requests do not require backing paging or swap space until the memory is actually touched (i.e. used). On such systems, therefore, there is little penalty to erring on the high side when setting heap values. The only thing to be aware of is that, when allocating a region of shared memory (Database Shared Memory, for example), DB2 will try to allocate a region large enough to accommodate all the contained heaps. If that size is too large to fit in the available address space (segment registers on AIX), the allocation will fail. This is usually not a problem on 64-bit instances, but can be on 32-bit instances. If this happens, simply reduce the artificially high heap settings, and try again.
Here's some of the output the db2 get snapshot ... grep ... commands listed will show:
The Application Group Heap's chief purpose is to cache SQL access plans. It is managed as a cache.
The Application Control Heap contains internal control structures (e.g. TQs). Note that it is NOT a cache. Exhaustion of the memory can return errors (SQL0973). Increase the Application Control Heap size if these occur.
If the Number of Applications per Application Group is too small, you may be wasting the memory in the Shared SQL Work Area. If it is too large, contention may become an issue. A number between 50 and 150 is recommended.
... but try to keep application control heap size and number of applications per group constant. New Config Parm Settings:
appgroup_mem_sz = (38000 + 154*78) = 50012
app_ctl_heap_sz = 50012 / 78 = 641
groupheap_ratio = 38000 / 50012 = 76
http://www-1.ibm.com/support/docview.wss?rs=71&context=SSEPGG&uid=swg21179841&loc=en_US&cs=utf-8&lang=en
http://www-1.ibm.com/support/docview.wss?rs=71&context=SSEPGG&q1=application+heap+memory+usage&uid=swg21175378&loc=en_US&cs=utf-8&lang=en+en
The next 3 charts illustrate the DB2 advancements in dynamic memory tuning that have occurred so far in the V8 timeframe. These are illustrated through an example - a backup command that requires more memory than is currently assigned to the utility heap.
As shown, prior to V8, the backup command will fail, and the database will have to be shut down and reactivated before the utility heap can be enlarged.
Overall Database Memory Tuning : V8.1
Example scenario
> db2 backup database <dbname>
SQL2009C There is not enough memory available to run the utility.
> db2 update db cfg for <dbname> using util_heap_sz 60000
> db2 backup database <dbname>
In V8, with dynamic heaps, the utility heap can be enlarged without shutting down the database - a major improvement. However, explicit administrator action is still required before the backup command can succeed.
Overall Database Memory Tuning : V8.2
In V8.2, heaps can automatically grow themselves, without intervention. Further, the database memory segment from which the utility heap is allocated, can also grow automatically (on certain platforms - AIX and Windows).
So, in V8.2, the backup command will automatically cause the utility heap to be enlarged, and will succeed.
Database Memory Tuning : Futures
[Slide diagram: a Memory Tuner redistributes Database Memory among Buffer Pool 1, Buffer Pool 2, Lock List, Package Cache, Shared Sorts, DB Heap, Log Buffer, Catalog Cache, and the Utility Heap. Components shown include a MIMO control algorithm, a memory model builder, a greedy (constraint) algorithm, and a statistics collector. Callouts: memory is automatically balanced to achieve optimal performance as the workload changes, as opposed to purely on demand.]
Bufferpools and I/O
Database Shared Memory
Lock List
Bufferpool(s)
Pkg Cache
Shared Sorts
DB Heap
Utility Heap
I/O
Disks
Bufferpool fundamentals.
Dynamic Bufferpool Operations
Now you can alter bufferpools dynamically, without shutting down the database
ALTER BUFFERPOOL bufferpool-name [IMMEDIATE | DEFERRED] [NODE node-number] SIZE n
ALTER BUFFERPOOL bufferpool-name [NOT EXTENDED STORAGE | EXTENDED STORAGE]
ALTER BUFFERPOOL bufferpool-name ADD NODEGROUP nodegroup-name
CREATE BUFFERPOOL bufferpool-name [IMMEDIATE | DEFERRED] ...
DROP BUFFERPOOL bufferpool-name ...
One of the key features of DB2's algorithm here is that the internal hash tables used to keep track of the bufferpool's pages are proportionally adjusted, based on the resizing specified on the ALTER command. This is important to prevent excessive CPU consumption and contention as the bufferpool grows.
Note that dropping a bufferpool has always had immediate semantics.
Bufferpools : Prefetching I/O
[Slide diagram: agents place asynchronous I/O prefetch requests on the prefetch queue(s); prefetchers read pages into the buffer pool(s); page cleaners write dirty pages out.]
Agents send prefetch requests to the prefetch queue(s) during planned prefetching (e.g. tablescans), sequential detection (e.g. scan through a clustered index), and list prefetch (e.g. sorted list of pages gathered through an index scan)
Log Writer: the logger can trigger the page cleaners when available log disk space is getting low, or when the target recovery window is exceeded (SOFTMAX). This is termed an LSN gap trigger.
The page cleaners trigger themselves if the proportion of dirty pages exceeds the target (CHNGPGS_THRESH). This is termed a threshold trigger.
Agents: dirty steal from agent (rare); page read from agent (common with OLTP workloads)
Page cleaners: dirty page writes
This chart shows the specific types of I/O that are used to bring pages into, and out of, bufferpools. The arrows beside each process/thread indicate the type of I/O used by that process/thread; note that a down arrow is a write operation and an up arrow is a read.
The block region of a bufferpool is an optional reservation of a certain set of contiguous pages in the bufferpool. The pages in this region are managed on a 'block' basis, rather than on an individual page basis. That is, the buffer manager will try to keep consecutive blocks of pages available in this region, and will try to use such a consecutive block of pages to satisfy prefetch requests that require a large block of pages that are consecutive on disk. This allows the I/O of such a block of pages to be done in a single large block I/O, which is generally more efficient than the alternatives (e.g. a scattered read operation which reads consecutive pages from disk into discontiguous pages in memory).
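The two page-cleaner triggers described above (threshold and LSN gap) reduce to simple predicates. Here's a Python sketch of their shape; the default values shown match the documented defaults for chngpgs_thresh (60%) and softmax (100%), but the bookkeeping is illustrative, not DB2's internals.

```python
# Sketch of the two page-cleaner trigger conditions.
# The comparison shapes follow the text; illustrative only.

def threshold_trigger(dirty_pages, total_pages, chngpgs_thresh=60):
    """Cleaners trigger themselves when the dirty-page percentage
    exceeds CHNGPGS_THRESH (default 60%)."""
    return 100.0 * dirty_pages / total_pages > chngpgs_thresh

def lsn_gap_trigger(current_lsn, oldest_dirty_page_lsn,
                    logfile_bytes, softmax_percent=100):
    """The logger triggers the cleaners when the log span pinned by
    dirty pages exceeds SOFTMAX percent of one log file: flushing old
    dirty pages bounds the crash-recovery window."""
    gap = current_lsn - oldest_dirty_page_lsn
    return gap > logfile_bytes * softmax_percent / 100

assert threshold_trigger(dirty_pages=650, total_pages=1000)   # 65% > 60%
assert not threshold_trigger(dirty_pages=100, total_pages=1000)
assert lsn_gap_trigger(5_000_000, 1_000_000, logfile_bytes=2_000_000)
```

The first trigger protects read response time (keep enough clean slots); the second protects recovery time (don't let the oldest dirty page fall too far behind the log head).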
Bufferpool : Hints/Tips
A single bufferpool is often the best choice
Bufferpool hints/tips.
Direct I/O
Many file systems now support “Direct I/O”
Bypasses the file system’s buffercache
Combines performance benefits of RAW with the usability benefits of
file systems
Examples: AIX Concurrent I/O, Veritas Quick I/O
[Slide diagram: agents, prefetchers, and page cleaners accessing the DB2 bufferpools, shown with and without a filesystem buffercache layer between the bufferpools and disk.]
Concurrent I/O on AIX (aka CIO) is generally preferred over its predecessor, DIO.
Direct I/O Enhancements in 8.2 V8.2
The CREATE / ALTER mechanism for enabling direct I/O is strongly recommended over the others.
Note: Temps are supported through the mount option. DDL support is coming in 8.2.2.
The Logging Subsystem
[Slide diagram: agents place log write requests, and log read requests (for rollback), against the log buffer; db2loggr and db2loggw service the log buffer and the online log files in <database dir>/SQLOGDIR (S0000000.LOG, S0000001.LOG, S0000002.LOG, etc). db2logmgr (V8.2) archives full log files to disk, tape, or TSM. The buffer pool(s) are shown alongside.]
The elements of DB2's processing architecture that are devoted to log management are highlighted in blue.
Logging : Key Facts
Changes to regular data and index pages are written to the log buffer in memory
BLOBs and LONG VARCHARs use shadow paging (data is not logged unless Log Retain is used and the LOB column is defined to be logged, i.e. doesn't use the NOT LOGGED option)
The changes to BLOB and LONG VARCHAR allocation pages are logged, as regular data pages are
Pages from the log buffer are regularly forced to the online log files on disk by the db2loggw
The db2loggw tries to always keep a large block I/O outstanding against the log device
However, there are times when the db2loggw may force specific individual pages or groups of pages to disk
When log records are not being generated quickly enough for large blocks of contiguous pages from the log buffer to always be ready for writing, the db2loggw can write smaller groups of pages
If a dirty buffer pool page is written to disk, the db2loggw will first write the log pages containing the log records associated with the dirty page (if they're not already on disk)
On COMMIT (or after mincommit transactions COMMIT), the db2loggw will write all log pages associated with the transaction(s), if they're not already on disk
When log archiving is enabled, each online log file is archived by the db2logmgr after it becomes full
Archival devices supported include disk, TSM, and tape
By default, online log files (those containing log records for active transactions or dirty pages - i.e. those needed in the event of crash recovery) cannot be overwritten
In this case, the total amount of active log space cannot exceed the total amount of online log space configured
Active log space = #bytes in the log stream from the first log record written by the oldest active transaction, or the log record corresponding to the oldest dirty page in the bufferpool (whichever was written first), to the end of the log
When infinite logging is enabled, archived log files can be immediately overwritten with new log data
If a rollback occurs that requires the overwritten log data, the archived log file will be retrieved
This chart provides a high level overview of the logging subsystem within DB2.
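The active-log-space definition above can be written down directly as a formula over log sequence numbers. The function name and byte offsets below are illustrative.

```python
# Direct restatement of the active log space definition: from the first
# log record of the oldest active transaction, or the log record for
# the oldest dirty bufferpool page (whichever is earlier), to the end
# of the log. Values are illustrative byte offsets (LSNs).

def active_log_space(end_of_log_lsn, oldest_txn_first_lsn,
                     oldest_dirty_page_lsn):
    return end_of_log_lsn - min(oldest_txn_first_lsn, oldest_dirty_page_lsn)

# A long-running transaction, or an old dirty page, pins log space:
assert active_log_space(10_000, 2_000, 6_000) == 8_000
assert active_log_space(10_000, 9_000, 1_000) == 9_000
```

This is why both aggressive page cleaning (advancing the oldest dirty page) and short transactions (advancing the oldest active transaction) keep active log space small.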
Logging : Key Parameters & Hints/Tips
logfilsiz, logprimary, logsecond : determine online log disk space allocation
Larger sizes can buffer log I/Os more effectively - both writes and reads (for rollbacks) - and prevent agents from waiting on log I/O
Recommendations
Use the num_log_buffer_full and num_log_data_found_in_buffer snapshot monitor elements to determine if the log buffer is too small
Other recommendations
Use the following snapshot monitor elements to determine if the I/O subsystem is a bottleneck:
log_write_time
log_read_time
num_log_write_io
num_log_read_io
log_writes
log_reads
Use ALTER/CREATE TABLE ... NOT LOGGED INITIALLY to turn off logging for the table during a given transaction
Avoid circular logging unless you can accept data loss in media failure scenarios, or can recover your data through other means
Note that when the NOT LOGGED INITIALLY clause of ALTER TABLE or CREATE TABLE is used, no logging of the records inserted/updated/deleted in that table takes place during the transaction. However, at COMMIT, we ensure all changed pages are flushed to disk, to ensure recoverability.
This capability can be helpful in reducing log space requirements when populating large tables.
It can also, in some situations, help increase performance. However, the logging benefit must be weighed against the page flushing drawback. Transactions which make a large number of changes to a small number of pages are more likely to gain performance advantages because the page I/O would likely be less than the log I/O that would result if the NOT LOGGED INITIALLY clause was not used.
Note that the pagecleaners can perform much of the page I/O in the background before COMMIT. Consider (perhaps temporarily) setting the chngpgs_thresh/num_iocleaners configuration parameters to lower/higher values in order to make the cleaners more aggressive.
Note, also, that there are some important recoverability considerations with NOT LOGGED INITIALLY tables. Read about these in the Administration Guide before using this capability. (There's a pointer to online UDB document on the last chart).
Logging : Snapshot Parameter Details
The following parameters can help one determine if the log disk(s) are sufficient. (These parameters allow one to determine average read/write I/O time, and average read/write I/O size.)
The following parameters can help one determine if the log buffer is too small (num_log_data_found_in_buffer, together with num_log_read_io, can give the hit and miss ratios for rollback).
Details on how some of the snapshot elements can be very useful in monitoring and evaluating the performance of the logging subsystem.
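The rollback hit ratio mentioned above is a simple ratio of the two snapshot elements; a sketch (the snapshot values below are illustrative, and the interpretation - buffer hits vs. disk reads - follows the text):

```python
# Rollback log-read hit ratio from the snapshot elements above:
# reads satisfied from the log buffer vs. reads that went to disk.

def log_buffer_hit_ratio(num_log_data_found_in_buffer, num_log_read_io):
    total = num_log_data_found_in_buffer + num_log_read_io
    return num_log_data_found_in_buffer / total if total else 1.0

# 900 rollback reads found in the log buffer, 100 required disk I/O:
assert log_buffer_hit_ratio(900, 100) == 0.9
```

A persistently low ratio during rollbacks suggests the log buffer may be too small for the workload's undo traffic.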
Infinite Logging
Allows space used by archived active logs to be overwritten with new log data
One is no longer limited by the size of the primary log (logprimary x logfilsiz)
Infinite logging was added in V8.1, and is aimed at providing tolerance to the occasional errant transaction that requires an excessive amount of log space.
Infinite Logging Usage Considerations
Rollback and crash recovery may require (relatively slow) retrieval of archived logs
Watch for:
Long running applications that do a few updates and hang so they never commit or end
Runaway transactions - eg. caused by SQL issued in error
A warning is written to the Administration Notification log when current units of work exceed the primary log allocation
It's very important to note that you should NOT design transactions to exploit log space that exceeds that of the configured online log. If such transactions decide to rollback, the ensuing undo operation will require potentially lengthy log retrieve operations.
Again, this feature is really designed to handle the exceptional errant transaction, not the general case.
Logging : More Hints/Tips
Use the log throttling configuration parameters (added in V8 FP2) to prevent 'runaway' transactions:
max_log
Maximum active log space consumed by one transaction as a percent of primary log space
Has a minimum value of 0 and a maximum value of 100
A value of 0 means that the control is not in use
Dynamic configuration parameter
num_log_span
Number of active log files a single transaction is allowed to span
Has a minimum value of 0 and a maximum value of 65535
A value of 0 means that the control is not in use
Dynamic configuration parameter
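The two throttling controls above reduce to straightforward checks. The following Python sketch shows their shape; the function names are invented, the "0 disables the control" convention matches the parameter descriptions above, and the actual enforcement inside DB2 is more involved.

```python
# Sketch of the two log-throttling checks. A value of 0 disables
# either control, as described above. Illustrative only.

def exceeds_max_log(txn_log_bytes, primary_log_bytes, max_log_percent):
    """max_log: cap one transaction's active log space as a percent
    of primary log space (0 = control not in use)."""
    if max_log_percent == 0:
        return False
    return txn_log_bytes > primary_log_bytes * max_log_percent / 100

def exceeds_num_log_span(first_logfile, current_logfile, num_log_span):
    """num_log_span: cap the number of active log files one
    transaction may span (0 = control not in use)."""
    if num_log_span == 0:
        return False
    return current_logfile - first_logfile + 1 > num_log_span

assert exceeds_max_log(600, 1000, 50)        # 600 bytes > 50% of 1000
assert not exceeds_max_log(600, 1000, 0)     # control disabled
assert exceeds_num_log_span(3, 7, 4)         # spans 5 log files > 4
```

Either check firing is what lets DB2 stop a runaway transaction before it pins an excessive amount of active log space.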