
®

IBM Software Group


Platform: DB2 for Linux, UNIX, Windows

DB2 UDB Internals : The Deep Dive, Part 1

Matt Huras, DE, IBM


huras@ca.ibm.com

Session: C09
Tue Oct 25 5:30-6:30

This session will dive into the internals of DB2 UDB in depth, including details of the latest versions of DB2 on the UNIX, Windows and Linux platforms (V8.2 and beyond). Details such as record formats, page formats, index algorithms, memory management and tuning, storage management, bufferpool algorithms, logging, and the process and threading design will be covered in depth. As each concept is explained,
key hints, tips and best practice information will be provided. This will enable DBAs and System Administrators to fully exploit the functions and features of DB2 UDB.
In this first (of two) parts, the focus will be on process and thread management, as well as logging, buffering and memory management.
Agenda
Part I
Architecture Overview
Process/Thread Model
Base Processing Model
Concentrator
Hints/Tips/Best Practices
Memory Management, Buffering, Logging
Shared and Private Memory Heaps
Sorting
Buffer Pools
Logging
Hints/Tips/Best Practices

Part II
Storage Architecture
System Managed Storage (SMS) Tablespaces
Database Managed Storage (DMS) Tablespaces
Automatic Storage
Hints/Tips/Best Practices
Data Management
Tables, Records, Indexes
Page Format, Space Management
Multi-Dimensional Clustering
Hints/Tips/Best Practices

The first part of the presentation will cover:


- the process/thread model
- the concentrator
- memory management
- buffering
- logging

As you can tell from this agenda, I'm going to focus on the lower layers of the system (i.e. the components below the SQL processor). The basic format is: explain some concepts, follow up with hints/tips related to those concepts, explain more concepts, more hints/tips, and so on.

Some administrivia related to registry variables and configuration parameters (which are often referred to in the talk):

Registry variables are usually referred to with syntax like DB2_STRIPED_CONTAINERS=ON. To set a registry variable, do the following:
db2set DB2_STRIPED_CONTAINERS=ON

Configuration parameters are usually referred to in italics. To view configuration parameter values use:
db2 get database manager configuration
db2 get database configuration for <dbname>

To set a database manager configuration parameter, use, for example:


db2 update database manager configuration using num_poolagents 1000

To set a database configuration parameter, use, for example:


db2 update database configuration for <dbname> using locklist 10000
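
To list the registry variables that are currently set (the -all option shows all levels), you can use:

db2set -all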
Agenda
Part I
Architecture Overview
Process/Thread Model
Base Processing Model
Concentrator
Hints/Tips/Best Practices
Memory Management, Buffering, Logging
Shared and Private Memory Heaps
Sorting
Buffer Pools
Logging
Hints/Tips/Best Practices

Part II
Storage Architecture
System Managed Storage (SMS) Tablespaces
Database Managed Storage (DMS) Tablespaces
Hints/Tips/Best Practices
Data Management
Tables, Records, Indexes
Page Format, Space Management
Multi-Dimensional Clustering
Hints/Tips/Best Practices

I'll first provide a high level narrative of the processing flow throughout the database server.
Architecture Overview
(Slide diagram: clients connect to the DB2 server, where a coordinator agent and parallel subagents run across the CPUs, with the logger, prefetchers and page cleaners performing I/O against the log buffers, buffer pools and the I/O subsystem.)

Key callouts from the diagram:
- Parallelism: SQL and utilities; intra- and inter-partition parallelism; cost-based optimizer with query rewrite; dynamic throttling based on load
- Parallel SMP exploitation: all CPUs exploited through OS threads and processes
- Very large memory exploitation: 64-bit support
- I/O buffering: log buffers and buffer pools; multiple buffer pools
- I/O subsystem: asynchronous, parallel I/O; automatic, intelligent data striping with parallel I/O; big-block I/O; scatter/gather I/O

Each circle in the box is an EDU (Engine Dispatchable Unit). EDUs are implemented as threads on Windows (all within a single process) and as processes on Linux and UNIX. Each application is assigned a dedicated coordinator agent (by default - more on this later), which coordinates the processing for that application and communicates with it. Applications can also be assigned a set of subagents which work together on individual SQL requests (e.g. sharing a sort) so as to fully exploit SMP machines. All agents are managed with a pooling algorithm which minimizes EDU creations/destructions. The circles above the cloud are application programs (either local or remote) that are linked with DB2's client library (the client and server versions can differ, within limits). Local clients talk to their coordinator agent via shared memory and semaphores. Remote clients use TCP/IP, SNA or IPX/SPX.
The prefetchers' main duty is to ensure agents doing scans never wait for disk I/O. Agents send asynchronous read-ahead requests to a common prefetch queue, and the prefetchers use big-block or scatter-read I/Os to bring the requested pages into the bufferpool. Data is striped across the disks to enable the prefetchers to drive multiple disks simultaneously.
The page cleaners' main duty is to ensure agents trying to bring a page into the buffer pool never need to flush a dirty page to disk to free up a slot. This extra I/O would unnecessarily increase response time. The page cleaners are background EDUs which, under certain conditions (described later), wake up and "clean" (flush to disk) dirty pages.
The ARIES recovery method is used (generally recognized as the most advanced in the industry). Agents updating a record in the database update the associated page (of course), and write a log record containing information necessary to either redo or undo the change. (Various techniques, including XOR logging, are used to minimize the amount of data logged.) Neither the page nor the log buffer is flushed to disk immediately (to optimize performance). The logger and bufferpool manager cooperate to implement a WAL (Write Ahead Logging) protocol that ensures any dirty page
does not make it to disk before its associated log record. The only I/O that is always required per transaction is a force of the log at COMMIT time.
* C. Mohan, D. Haderle, B. Lindsay, H. Pirahesh, P. Schwarz. ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging. ACM Transactions on Database Systems, 17(1), 1992.
Architecture Overview : DPF Feature
Shared Nothing Architecture Allows Virtually Unlimited Scalability

Partitions are Logical
  Any number of partitions can be created on a single physical machine (works extremely well with NUMA architectures)
  Each partition owns its resources (buffer pool, locks, disks, ...)
  Avoids common limits on scalability:
    No need for a distributed lock manager or buffer coherence protocols
    No need to attach disks to multiple machines

Virtually Everything Runs in Parallel Across Nodes
  SQL: queries, inserts, updates, deletes
  Utilities: Backup, Restore, Load, Index Create, Reorg
  Optimized by a global optimizer

Partitions Communicate Only Necessary Tuples
  Using shared memory (same machine)
  Using high speed communications (different machines)

Applications See a Single Database View

(Slide diagram: clients connect to Partition 1, Partition 2, ... Partition N, each running on its own set of CPUs.)

In an MPP environment, the processing architecture described on the previous slide is extended across all database partitions (aka nodes). Clients connect into one node, and that node is where the coordinator agent will reside. The subagent pool, however, extends across all nodes that have been involved in SQL requests issued by the client.
Data for a given table is partitioned across the nodes automatically and transparently by DB2, based on a key hashing algorithm. Within each node, the operation of the node (e.g. prefetching, page cleaning, etc.) is identical to that described on the previous chart.
Agenda
Part I
Architecture Overview
Process/Thread Model
Base Processing Model
Concentrator
Hints/Tips/Best Practices
Memory Management, Buffering, Logging
Shared and Private Memory Heaps
Sorting
Buffer Pools
Logging
Hints/Tips/Best Practices

Part II
Storage Architecture
System Managed Storage (SMS) Tablespaces
Database Managed Storage (DMS) Tablespaces
Hints/Tips/Best Practices
Data Management
Tables, Records, Indexes
Page Format, Space Management
Multi-Dimensional Clustering
Hints/Tips/Best Practices

I'll now get into more details on how DB2 uses processes and threads.
Process/Thread Organization
Processing Model : Detailed View

(Slide diagram, with elements keyed as per-instance, per-database, per-application, or idle/pooled agent or subagent. Clients link with the UDB Client Library and communicate with the server over shared memory and semaphores, TCP/IP, Named Pipes, NetBIOS, SNA or IPX/SPX. At the instance level, listeners (db2ipccm, db2tcpcm) accept connections and an idle agent pool holds pooled db2agent EDUs. At the database level, coordinator agents (db2agent) and subagents (db2agntp, active or idle; V8) process SQL; the logging subsystem (db2loggr, db2loggw) writes the log buffer to the log disks; prefetchers (db2pfchr) service asynchronous prefetch requests with parallel, big-block reads into the buffer pool(s); page cleaners (db2pclnr) issue parallel page writes to the data disks; db2dlock is the deadlock detector.)

This chart, through its animation, takes you through the start-up and database activation process, step by step.
Process/Thread Organization
Processing Model : Detailed View

(Slide diagram, animation step: numbered steps 1-5 trace how a subagent is obtained for a new SQL statement issued by the leftmost application, showing the instance-level idle agent pool, the database-level coordinator agents and active/idle subagents (V8), and the "Create" and "Steal" paths.)

This chart shows the algorithm used to select (or create) a new subagent if one is needed to execute a new SQL statement on the leftmost application.
As indicated, the algorithm gives strong preference towards re-using existing processes/threads and avoiding their creation.
Processing Model : Hints/Tips

Minimize agent "steals" and "creations" by increasing num_poolagents


num_poolagents defines an upper limit for the total number of idle agents instance wide
To monitor steals and creations, use:

db2 get snapshot for database manager | grep "Agents"

And ensure agents are "stolen" and "created" much less frequently than agents are
"assigned" from the pool
A high OS "context switch" rate may also indicate that the steal or creation rate is too high

"Prime" the idle agent pool on db2start by setting num_initagents

Control SMP parallelism by:


Setting intra_parallel to YES to enable intra-partition parallelism
Using dft_degree to specify default degree of intra-partition parallelism for SQL
Individual users can override via DEGREE bind/precompile option or
CURRENT DEGREE special register
Value of ANY means UDB decides (based on hardware and the type of SQL request)
Using max_querydegree to specify an instance-wide upper limit
See the Admin Guide for details on enabling utility parallelism
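
For illustration only (the values below are examples, not recommendations; note that a change to intra_parallel only takes effect at the next db2start):

db2 update database manager configuration using intra_parallel yes
db2 update database configuration for <dbname> using dft_degree ANY
db2 update database manager configuration using max_querydegree 8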

Sample output from the snapshot command:


Agents registered = 2
Agents waiting for a token = 0
Agents assigned from pool = 2
Agents created from empty pool = 3
Agents stolen from another application = 0
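
If steals and creations are high relative to pool assignments, the pool can be enlarged and primed; a sketch with purely illustrative values (num_poolagents and num_initagents are database manager configuration parameters; depending on release, a db2stop/db2start may be needed for the change to take effect):

db2 update database manager configuration using num_poolagents 200 num_initagents 100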
Connection Concentrator V8

(Slide diagram: without the concentrator, N client connections over the communications link require N coordinator agents and f(N) subagents; with the concentrator, the same N connections are served by only K coordinator agents and f(K) subagents (K much smaller than N), in front of the same page cleaners, prefetchers and bufferpools.)

Enable this by setting MAX_CONNECTIONS > MAX_COORDAGENTS, as in this


example:
db2 update dbm cfg using max_connections 10000 max_coordagents 100

The connection concentrator is new technology in V8.1.


Its purpose is to dramatically reduce resource consumption (primarily memory) on the server, for a given user population.
It works by serving a very large set of database connections through a much smaller set of DB2 coordinator agents.
The coordinator agents service new transaction requests from these connections on a first-come, first-served (FCFS) basis (see the next slide's speaker notes for details).
Processing Model : Concentrator V8

(Slide diagram: with the concentrator enabled (V8), new transactions - and the SQL within each transaction - arrive through the listeners and are handed to one or more dispatchers (db2disp). A dispatcher selects an agent from an idle pool, creates a new agent if within the configured limit, or otherwise queues the request for the database-level coordinator agents. The rest of the engine - subagents, logging subsystem, buffer pool(s), prefetchers, page cleaners, deadlock detector, log disks and data disks - is unchanged.)

When a new transaction starts, the dispatchers (there can be more than one) try to find/create an agent to work on it (the brown arrows indicate this process).
If there is no free idle agent, or an agent cannot be created (because the configured limit on the number of coordinator agents has been reached), the request is queued.
When a coordinator agent working on behalf of a particular transaction becomes available (because the transaction ends), the agent will then serve the next transaction (regardless of connection) on the queue. If the queue is empty, it will wait for a request to appear.
Note that when Application Groups are present (discussed in previous speaker notes) there is 1 queue per application group, and each agent is associated with a particular application group.
Agents first look to their own application groups for new transaction requests.
To ensure no single application group 'hogs' the system resources, there is a mechanism which allows agents to migrate across application groups over time.
Processing Model : Application Groups V8

(Slide diagram: the processing model with two Application Groups in effect (V8). Each group has its own Application Group Shared Memory region containing a Shared SQL Work Area, and its own coordinator agents and subagents (db2agntp) at the application group level. The instance-level listeners and idle agent pool, and the database-level logging subsystem, buffer pool(s), prefetchers, page cleaners, deadlock detector, log disks and data disks, are shared as before.)

This chart illustrates the processing model with 2 Application Groups in effect.
Note that in this environment, the database-level idle agent pool is actually comprised of 2 separate idle agent pools - one per Application Group. Note that the other pools (e.g. application level, instance level) are independent of this.
Processing Model : Application Groups V8

Benefits of application groups
  Potential for more efficient memory utilization
  SQL workspaces and other memory structures are shared across applications

Application groups are created transparently by DB2 on demand, as applications connect to a database
  Number of applications per group is appgroup_mem_sz / app_ctl_heap_sz
  DB2 starts off with 1 application group
  The first (appgroup_mem_sz / app_ctl_heap_sz) connections are placed in this initial group
  If these connections do not disconnect, the next connection will cause a new application group to be created, and this connection will be placed in the new application group

appgroup_mem_sz : database configuration parameter that defines the size of the Application Group Shared Memory region for a single Application Group (i.e. one of the shaded rectangles in the diagram)
app_ctl_heap_sz : database configuration parameter that helps define the Application Control Heap for a single application
(more on these later)

(Slide diagram: the same processing-model picture as the previous chart, with two Application Group Shared Memory regions, each containing a Shared SQL Work Area.)

This chart shows the relationship between the Application Group Heap, the Application Control Heap, and the Application Group Shared Memory segment.
Concentrator & App Group : Hints/Tips

In very high user populations, consider using the concentrator


Enable this by setting max_connections > max_coordagents
This can be useful when the number of simultaneous applications requires more resources than
is available, resulting in excessive context switching and/or swapping/paging, but,...
It can also decrease fairness (e.g. a long transaction can "hog" an agent)

Avoid using V7 clients when using the concentrator


The concentrator does not concentrate V7 clients

Target for at least 50 to 150 applications per application group


Ensure that APPGROUP_MEM_SZ / APP_CTL_HEAP_SZ is at least 50 to 150

Try to ensure that applications are evenly divided amongst application groups
To avoid wasting memory in one of the application groups
For example, if you expect 200 peak applications at any given time, consider setting
APPGROUP_MEM_SZ / APP_CTL_HEAP_SZ to 100 (or 50 or 200)
This will ensure 2 ( or 4 or 1) 'fully populated' application groups

This chart provides some tips for configuring the concentrator.
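
As a sketch only (all values below are illustrative, not recommendations), enabling the concentrator and sizing for roughly 100 applications per application group might look like:

db2 update dbm cfg using max_connections 10000 max_coordagents 100
db2 update db cfg for <dbname> using appgroup_mem_sz 20000 app_ctl_heap_sz 200

With these settings, appgroup_mem_sz / app_ctl_heap_sz = 100 applications per group, within the 50 to 150 range suggested above.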


Agenda
Part I
Architecture Overview
Process/Thread Model
Base Processing Model
Concentrator
Hints/Tips/Best Practices
Memory Management, Buffering, Logging
Shared and Private Memory Heaps
Buffer Pools
Logging
Hints/Tips/Best Practices

Part II
Storage Architecture
System Managed Storage (SMS) Tablespaces
Database Managed Storage (DMS) Tablespaces
Hints/Tips/Best Practices
Data Management
Tables, Records, Indexes
Page Format, Space Management
Multi-Dimensional Clustering
Hints/Tips/Best Practices

The next part of the session focuses on memory management, including buffer management and log management.
Memory Model
Instance Shared Memory
  FCM buffers (fcm_num_buffers)
  Monitor heap (mon_heap_sz)

Database Shared Memory (1 ... numdb)
  Overall region size (database_memory)
  Bufferpools (buffpage or ALTER BUFFERPOOL ...)
  Lock list (locklist)
  Package cache (pckcachesz)
  Shared sorts (sortheap, sheapthres_shr)
  Database heap (dbheap)
  Log buffer (logbufsz)
  Catalog cache (catalogcache_sz)
  Utility heap (util_heap_sz)

Application Group Shared Memory
  Application Group Heap (Shared SQL Work Area) (appgroup_mem_sz, groupheap_ratio)
  Application Shared Memory (1 ... maxappls)
    Application Control Heap - contains internal control structures (e.g. Table Queues) (app_ctl_heap_sz, groupheap_ratio)

Agent Private Memory (1 ... maxagents)
  Private sorts (sortheap, sheapthres)
  Application heap (applheapsz)
  Agent stack (agent_stack_sz)
  Query heap (query_heap_sz)
  Statement heap (stmtheap)
  Statistics heap (stat_heap_sz)

This chart describes the various types of memory that exist in a partition, the main heaps and uses of that memory, as well as the configuration parameters which control the sizes of these heaps.
maxappls is a database configuration parameter that sets an upper limit on the number of applications that can connect to a database. maxagents is a database manager configuration parameter that sets an upper limit on the total number of agents in a partition.
All EDUs in a partition are attached to Instance Shared Memory. All EDUs doing work within a database are attached to that database's Database Shared Memory. All EDUs working on behalf of a particular application are attached to an Application Shared Memory region for that application. This type of shared memory is only allocated if intra- or inter-partition parallelism is enabled. In addition, all EDUs working on behalf of a particular application are attached to the Application Group Shared Memory region for the Application Group that application is a member of. Application Groups
and Application Group Shared Memory are not used if neither intra-partition parallelism, inter-partition parallelism, nor the concentrator is enabled. Finally, each EDU also has its own private memory.

There a few special types of shared memory not shown:


Agent / Local Application Shared Memory - Attached to by coordinator agents servicing local applications, and the application. Used for SQL request/response communications.
UDF / Agent Shared Memory - Attached to by agents running a fenced UDF or Stored Procedure. Used as a communications area.
Extended Buffer Pool (ESTORE) - A typically huge (>>4Gb) region of shared memory used as an extended bufferpool. Agents/Prefetchers/Pagecleaners are not permanently attached to it, but rather, attach to individual segments within it, as needed.
db2trc shared segment - Attached to by all processes/threads associated with a given instance of DB2. Used for problem diagnosis purposes.
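
A quick way to review the current settings of the heaps listed on this chart is simply to filter the configuration commands introduced earlier, for example:

db2 get database manager configuration | grep -i heap
db2 get database configuration for <dbname> | grep -i heap

Actual usage, as opposed to configured sizes, is reported by db2mtrk and the snapshot monitor, both described later.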
Memory Model : Continued
Instance Shared Memory
  Database Shared Memory (1 ... numdb)
    Application Group Shared Memory
      Application Shared Memory (V8)
      Agent Private Memory (1 ... maxagents)
    1 ... maxappls

Coordinator Agent / Local Client Communications Memory
  Agent/client comm area (aslheapsz)
  1 ... number of local clients

Others
  Trace shared segment (one per instance)
  Fenced mode process segments (one per fenced mode process)

This chart extends the previous chart, showing how some of the additional segments fit in.
Memory Model : Shared Segments Example

Memory regions annotated on the slide:
  Instance Shared Segment - includes FCM (Fast Communication Manager) buffers
  Database Shared Segment (V8) - overall region size (database_memory): bufferpools, lock list, package cache, shared sorts, database heap, log buffer, catalog cache, utility heap (1 ... numdb)
  App Group Shared Segment (V8) - Application Group Heap Memory (appgroup_mem_sz, groupheap_ratio); Application Shared Memory / Application Control Heap (app_ctl_heap_sz)
  Local App / Agent Shared Segment
  db2trc Shared Segment
  Agent Private Memory (1 ... maxagents) - private sorts, application heap, agent stack, query heap, statement heap, statistics heap

/home/huras> db2start
SQL1063N DB2START processing was successful.
/home/huras> db2 connect to mydb

   Database Connection Information

 Database server       = DB2/6000 8.1.5
 SQL authorization ID  = HURAS
 Local database alias  = MYDB

/home/huras> ipcs -ma | head
IPC status from /dev/mem as of Mon Mar 15 13:37:11 EST 2004
T ID KEY MODE OWNER GRP CREATOR CGRP NATTCH SEGSZ CPID LPID ATIME DTIME CTIME
/home/huras> ipcs -ma | grep huras
m 1133903926 0x590ed661 --rw------- huras build huras build 18 8126464 88892 139862 13:17:54 13:17:54 13:17:17
m 181403700 0xffffffff --rw------- huras build huras build 8 268435456 151086 151086 13:17:41 13:17:41 13:17:41
m 3801128 0xffffffff --rw------- huras build huras build 2 131072 92296 151086 13:17:41 13:17:41 13:17:41
m 824442949 0x590ed674 --rw-rw-rw- huras build huras build 19 140665792 116770 72352 13:17:54 13:17:55 8:38:05

/home/huras> db2stop
SQL1063N DB2STOP processing was successful.
/home/huras> db2 update dbm cfg using intra_parallel on
DB20000I The UPDATE DATABASE MANAGER CONFIGURATION command completed successfully.
/home/huras> db2start
SQL1063N DB2START processing was successful.
/home/huras> db2 connect to mydb

   Database Connection Information

 Database server       = DB2/6000 8.1.5
 SQL authorization ID  = HURAS
 Local database alias  = MYDB

/home/huras> ipcs -ma | grep huras
m 1133510710 0x590ed661 --rw------- huras build huras build 18 11534336 74900 74900 13:16:37 13:16:39 13:16:36
m 1966123 0xffffffff --rw------- huras build huras build 8 268435456 161484 161484 13:16:39 13:16:39 13:16:39
m 3670056 0xffffffff --rw------- huras build huras build 1 82051072 161484 161484 13:16:39 13:16:39 13:16:39
m 181272628 0xffffffff --rw------- huras build huras build 2 131072 92294 161484 13:16:39 13:16:39 13:16:39
m 824442949 0x590ed674 --rw-rw-rw- huras build huras build 19 140665792 116770 74902 13:16:39 13:16:39 8:38:05

This chart provides an example, on a UNIX platform, illustrating what segments get created, and when, and how you can show and recognize the segments using the UNIX ipcs command.
Memory Model : Shr Segments Example ...
/home/huras> db2 alter bufferpool ibmdefaultbp size 100000
SQL20189W The buffer pool operation (CREATE/ALTER) will not take effect until
the next database startup due to insufficient memory. SQLSTATE=01657
/home/huras> db2 terminate
DB20000I The TERMINATE command completed successfully.
/home/huras> db2 connect to mydb
Database Connection Information
Database server = DB2/6000 8.1.5
SQL authorization ID = HURAS
Local database alias = MYDB
/home/huras> ipcs -ma | head
IPC status from /dev/mem as of Mon Mar 15 13:37:11 EST 2004
T ID KEY MODE OWNER GRP CREATOR CGRP NATTCH SEGSZ CPID LPID ATIME DTIME CTIME
/home/huras> ipcs -ma | grep huras

m 1133510710 0x590ed661 --rw------- huras build huras build 18 11534336 74900 74900 13:16:37 13:16:39 13:16:36
m 1966123 0xffffffff --rw------- huras build huras build 8 268435456 161484 161484 13:16:39 13:16:39 13:16:39
m 1332477970 0xffffffff --rw------- huras build huras build 8 268435456 161484 161484 13:16:39 13:16:39 13:16:39
m 3670056 0xffffffff --rw------- huras build huras build 1 82051072 161484 161484 13:16:39 13:16:39 13:16:39
m 181272628 0xffffffff --rw------- huras build huras build 2 131072 92294 161484 13:16:39 13:16:39 13:16:39
m 824442949 0x590ed674 --rw-rw-rw- huras build huras build 19 140665792 116770 74902 13:16:39 13:16:39 8:38:05

/home/huras> db2 get db cfg show detail | grep memory


Size of database shared memory (4KB) (DATABASE_MEMORY) = AUTOMATIC(110720)

/home/huras> db2mtrk -d
Tracking Memory on: 2004/03/15 at 13:12:52

Memory for database: MYDB

utilh pckcacheh catcacheh bph bph bph bph


16.0K 160.0K 80.0K 4.1M 592.0K 336.0K 208.0K

bph lockh dbh other


144.0K 480.0K 3.1M 0

The previous example, continued.


Memory Model : Shr Segments Example ...
/home/huras> db2mtrk -h
Usage: db2mtrk -i | -d | -p [-m | -w] [-v] [-r interval [count]] [-h]

-i Display instance level memory usage


-d Display database level memory usage
-p Display agent private memory usage
-m Display maximum usage information
-w Display watermark usage information
-v Display verbose memory usage information
-r Run in repeat mode
interval Amount of seconds to wait between reports
count Number of reports to generate before quitting
-h Display this help screen

Notes:
1. One of -i -d -p must be specified.
2. The -w and -m flags are optional. An invocation of the application is invalid if both flags are specified.
3. The -m flag reports the maximum allowable size for a given heap while the -w flag reports the largest amount
of memory allocated from a given heap at some point in its history.

Usage scenarios:
db2mtrk -i -d
Report current memory usage for instance and all databases

db2mtrk -i -p -m
Report maximum allowable size for instance and agent private memory

db2mtrk -p -r 1 5
Report agent private memory five times at one second intervals

Heap Legend:
When running in normal mode (i.e. -v flag not specified) heaps are named using the following codes:

appctlh - Application Control Heap lockh - Lock Manager Heap


apph - Application Heap monh - Database Monitor Heap
bph - Buffer Pool Heap other - Other Memory
catcacheh - Catalog Cache Heap pckcacheh - Package Cache
dbh - Database Heap queryh - Query Heap
dlfmh - DFM Heap stath - Statistics Heap

Details on a very useful new command to track internal DB2 memory usage - the db2mtrk command.
Heaps and Memory : Hints/Tips
Use db2 get snapshot ... to help determine if adjustment is needed, e.g.:

  Heap                    db2 get snapshot for ...
  locklist                database on <dbname> | grep "esc"
  pckcachesz              database on <dbname> | grep "Package cache"
  sortheap, sheapthres,   database on <dbname> | grep "ort"
  sheapthres_shr          database manager | grep "ort"
  catalogcache_sz         database on <dbname> | grep "Catalog cache"
  appgroup_mem_sz         all applications | grep "Total shared"
  applheapsz              all applications | grep "Total private"

Other useful memory tracking tools:


db2mtrk -- convenient command line front end to the snapshot command
db2 get snapshot for ... database / database manager / application now lists memory information for all
private and shared heaps -- current usage, maximum allowed, and high water mark
Memory visualizer -- GUI front end

Use the documentation:


Administration Guide Performance: good parameter descriptions and tuning advice
System Monitor Guide and Reference: how to monitor ; more tuning advice
http://www-306.ibm.com/software/data/db2/udb/support/manualsv8.html#V8PDF

In deferred (aka lazy) memory allocation schemes, memory requests do not require backing paging or swap space until the memory is actually touched (i.e. used). On such systems, therefore, there is little penalty to erring on the high side when setting heap values. The only thing to be aware of is that, when allocating a region of shared memory (Database Shared Memory for example), DB2 will try to allocate a region large enough to accommodate all the contained heaps. If that size is too large to fit in the available address space (segment registers on AIX), the allocation will fail. This is usually not a problem on 64-bit instances, but can be on 32-bit instances. If this happens, simply reduce the artificially high heap settings, and try again.

Here's some of the output the db2 get snapshot ... grep ... commands listed will show:

Agents currently waiting on locks = 0


Lock escalations = 0
Exclusive lock escalations = 0

Package cache lookups = 59


Package cache inserts = 59
Package cache overflows = 0
Package cache high water mark (Bytes) = 130784

Total Private Sort heap allocated = 0


Total Shared Sort heap allocated = 0
Shared Sort heap high water mark = 0
Total sorts = 0
Sort overflows = 0
Active sorts = 0

Private Sort heap allocated = 0


Private Sort heap high water mark = 0
Piped sorts requested = 0
Piped sorts accepted = 0
Catalog cache lookups = 4
Catalog cache inserts = 4
Catalog cache overflows = 0
Catalog cache high water mark = 0

Total shared overflows = 0


Total shared section inserts = 0
Total shared section lookups = 0

Total private overflows = 0


Total private section inserts = 0
Total private section lookups = 0
Memory Tuning Example : Access Plan Caching

Package Cache Tuning

Get information about the health of the package cache:
  db2 get snapshot for database on <dbname> | grep "Package cache"

  Package cache lookups   = 500
  Package cache inserts   = 211
  Package cache overflows = 100

Calculate the package cache hit ratio:
  Package cache hit rate = 1 - 211/500 = 0.58

Take action: increase pckcachesz
  db2 update db cfg for <dbname> using pckcachesz <new size>

(Slide diagram: the Package Cache lives in Database Shared Memory and is sized by pckcachesz; misses result in catalog look-ups and SQL compilations/optimizations.)

The Package Cache caches commonly used SQL access plans. (Note: access plans stored here cannot be directly executed by an agent because, for example, they do not contain memory allocations for any temporary memory required. Before an access plan can be executed, it must be copied into a SQL Workspace.)

No additional notes on this slide.
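
The hit-ratio arithmetic above can also be scripted; a minimal sketch, assuming a UNIX shell, the db2 CLP in the PATH, and the snapshot output format shown above (the database name MYDB is just an example):

db2 get snapshot for database on MYDB | awk -F= '
  /Package cache lookups/ { lookups = $2 }
  /Package cache inserts/ { inserts = $2 }
  END { if (lookups > 0) printf "Package cache hit ratio = %.2f\n", 1 - inserts/lookups }'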


Memory Tuning Example: SQL Work Area Tuning
Shared SQL Work Area Tuning

Get information about the health of the shared SQL work area:
  db2 get snapshot for database on <dbname> | grep "Total shared"

  Total shared overflows       = 100
  Total shared section inserts = 155
  Total shared section lookups = 400

Calculate the cache hit ratio:
  Shared SQL work area hit rate = 1 - 155/400 = 0.61

Take action: increase the application group heap size ...
  db2 update db cfg for <dbname> using ??

To be executed, an access plan must exist in the SQL Work Area for the application. So, before a SQL statement is executed, DB2 checks to see if the access plan is already cached in the SQL Work Area. If not, it will be copied from the Package Cache.

Note: this area of memory (the Shared SQL Work Area inside the Application Group Shared Memory) is termed the "Application Group Heap". The chief use of memory in this heap is the Shared SQL Work Area cache.

No additional notes on this slide.


Memory Tuning Example: SQL Work Area Tuning

3 key internal settings ...


Application Group Heap Size
Application Control Heap Size
Number of Applications per Application Group

... controlled by 3 external configuration parameters


appgroup_mem_sz
groupheap_ratio
app_ctl_heap_sz

No additional notes on this slide.


Memory Tuning Example: SQL Work Area Tuning
Application Group Heap Size = appgroup_mem_sz * (groupheap_ratio / 100)

Application Control Heap Size = app_ctl_heap_sz * ((100 - groupheap_ratio) / 100)

Number of Applications per Application Group = appgroup_mem_sz / app_ctl_heap_sz

The Application Group Heap's chief purpose is to cache SQL access plans (the Shared SQL Work Area). It is managed as a cache.

The Application Control Heap contains internal control structures (e.g. Table Queues). Note that it is NOT a cache; exhaustion of this memory can return errors (SQL0973). Increase the Application Control Heap Size if these occur.

If the Number of Applications per Application Group is too small, you may be wasting the memory in the Shared SQL Work Area. If it is too large, contention may become an issue. A number between 50 and 150 is recommended.

When tuning one of ...


Application Group Heap Size
Application Control Heap Size
Number of Applications per Application Group

... keep the other 2 constant !

No additional notes on this slide.


Memory Tuning Example: SQL Work Area Tuning
Shared SQL Work Area Tuning

Get information about the health of the shared SQL work area:
  db2 get snapshot for database on <dbname> | grep "Total shared"

  Total shared overflows       = 100
  Total shared section inserts = 155
  Total shared section lookups = 400

Calculate the cache hit ratio:
  Shared SQL work area hit rate = 1 - 155/400 = 0.61

Current config parm settings:
  appgroup_mem_sz = 40000
  app_ctl_heap_sz = 512
  groupheap_ratio = 70

Yields:
  Application Group Heap Size   = 40000 * 0.7 = 28000
  Application Control Heap Size = 512 * 0.3 = 154 (approximately)
  # Apps per Group              = 40000 / 512 = 78

Take action: increase the application group heap size ...
  db2 update db cfg for <dbname> using appgroup_mem_sz ... groupheap_ratio ... app_ctl_heap_sz ...
... but try to keep the application control heap size and the number of applications per group constant.

Desired state:
  Application Group Heap Size   = 38000 (increase by 10000)
  Application Control Heap Size = 154 (no change)
  # Apps per Group              = 78 (no change)

New config parm settings:
  appgroup_mem_sz = 38000 + 154*78 = 50012
  app_ctl_heap_sz = 50012 / 78     = 641
  groupheap_ratio = 38000 / 50012  = 76 (per cent)

No additional notes on this slide.


V8 Memory Tuning Tech Notes

http://www-1.ibm.com/support/docview.wss?rs=71&context=SSEPGG&uid=swg21179841&loc=en_US&cs=utf-8&lang=en

http://www-1.ibm.com/support/docview.wss?rs=71&context=SSEPGG&q1=application+heap+memory+usage&
uid=swg21175378&loc=en_US&cs=utf-8&lang=en+en

No additional notes on this slide.


Overall Database Memory Tuning : Pre V8

(Slide diagram: the database memory regions - Buffer Pool 1, Buffer Pool 2, Lock List, Package Cache, Shared Sorts, DB Heap, Log Buffer, Catalog Cache, Utility Heap.)

Size of each memory region is fixed at database activation time
  Any change requires a complete database shutdown
Example scenario
  > db2 backup database <dbname>
  SQL2009C There is not enough memory available to run the utility.
  > db2 force application all
  > db2 deactivate database <dbname>
  > db2 update db cfg for <dbname> using util_heap_sz 60000
  > db2 activate database <dbname>
  > db2 backup database <dbname>

The next 3 charts illustrate the DB2 advancements in dynamic memory tuning that have occurred so far in the V8 timeframe. These are illustrated through an example - a backup command that requires more memory than is currently assigned to the utility heap.
As shown, prior to V8, the backup command will fail, and the database will have to be shutdown and reactivated before the utility heap can be enlarged.
Overall Database Memory Tuning : V8.1

(Slide diagram: the same database memory regions, now inside a Database Memory envelope with headroom for growth.)

Size of memory regions can be explicitly enlarged or reduced dynamically
  No database shutdown required
  Database memory configuration parameter provides "headroom" for growth
    Setting database_memory to AUTOMATIC results in ~20% room for growth
    Database memory itself is not dynamically growable
Example scenario
  > db2 backup database <dbname>
  SQL2009C There is not enough memory available to run the utility.
  > db2 update db cfg for <dbname> using util_heap_sz 60000
  > db2 backup database <dbname>

In V8, with dynamic heaps, the utility heap can be enlarged without shutting down the database - a major improvement. However, explicit administrator action is still required before the backup command can succeed.
Overall Database Memory Tuning : V8.2

(Slide diagram: the same database memory regions inside the Database Memory envelope, with headroom for growth.)

Size of memory regions can be automatically enlarged and reduced dynamically, on demand
  Memory region cfg settings now denote a guaranteed minimum amount of memory that will be available for a particular purpose (no longer a maximum)
  Memory regions will automatically grow into the "headroom" region on demand
Database memory (i.e. headroom) can automatically grow
  On threaded platforms or those with lazy paging allocation policies (e.g. Windows, AIX*)
Example scenario (V8.2)
  > db2 backup database <dbname>

In V8.2, heaps can automatically grow themselves, without intervention. Further, the database memory segment from which the utility heap is allocated, can also grow automatically (on certain platforms - AIX and Windows).
So, in V8.2, the backup command will automatically cause the utility heap to be enlarged, and will succeed.
Database Memory Tuning : Futures
(Slide diagram: the database memory regions inside the Database Memory envelope with headroom for growth, plus a Memory Tuner sitting between the DB2 clients and the DB2 UDB engine - a MIMO control algorithm with a memory model builder, greedy/accurate (constraint) entry sizing, fixed-step oscillation reduction and a statistics collector, trading saved system time against benefit.)

Size of memory regions dynamically and automatically balanced to achieve optimal performance as the workload changes
  As opposed to purely on demand
  Using built-in, automatic memory costing
Database memory (i.e. headroom) can automatically grow and shrink

Bufferpools and I/O
(Slide diagram: Database Shared Memory contains the Lock List, Bufferpool(s), Package Cache, Shared Sorts, DB Heap and Utility Heap; the bufferpools perform I/O against the disks.)

Each tablespace is assigned a bufferpool
  Multiple tablespaces can be assigned to a single bufferpool
  The page sizes of a tablespace and its bufferpool must match

As a rule of thumb, start with about 75% of main memory devoted to bufferpools, assuming a dedicated OLTP database server
  Use a smaller number for BI/DSS workloads (e.g. 35%)

Bufferpool fundamentals.
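
A simple illustration of these fundamentals (the names, sizes and path are hypothetical); the bufferpool and the tablespace that uses it must share a page size:

db2 "CREATE BUFFERPOOL bp8k SIZE 50000 PAGESIZE 8K"
db2 "CREATE TABLESPACE data8k PAGESIZE 8K MANAGED BY SYSTEM USING ('/db2/data8k') BUFFERPOOL bp8k"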
Dynamic Bufferpool Operations
Now you can alter bufferpools dynamically, without shutting down the
database
>>-ALTER--BUFFERPOOL--bufferpool-name--------------------------->
+IMMEDIATE+
>-----+-+---------+--+--------------------+---SIZE--n--+----><
| +DEFERRED-+ '-NODE--node-number--' |
| |
+-+-NOT EXTENDED STORAGE-+-----------------------+
| '-EXTENDED STORAGE-----' |
'-ADD NODEGROUP--nodegroup-name------------------'
+IMMEDIATE+
>>-CREATE--BUFFERPOOL--bufferpool-name---+---------+------------> ...
+DEFERRED-+
>>-DROP----BUFFERPOOL--bufferpool-name--------------------------> ...

With the IMMEDIATE option, the change is effective on commit


When reclaiming memory (reducing size or dropping), the wait for pages to become
available occurs immediately (on ALTER/DROP invocation)
Memory is not available for other use until a COMMIT is done
Old behaviour is achieved using the DEFERRED option

Rolling forward now replays bufferpool operations


Can significantly improve recovery speed

One of the key features of DB2's algorithm here is that the internal hashing tables used to keep track of the bufferpool's pages are proportionally adjusted, based on the resizing specified on the ALTER command. This is important to prevent excessive CPU consumption and contention as the bufferpool grows.
Note that dropping a bufferpool has always had immediate semantics.
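
For example (bufferpool name and size are illustrative only), resizing the default bufferpool without a database restart:

db2 "ALTER BUFFERPOOL IBMDEFAULTBP IMMEDIATE SIZE 200000"

Under the CLP's default auto-commit the change takes effect right away; otherwise, as noted above, it becomes effective on COMMIT.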
Bufferpools : Prefetching I/O

(Slide diagram: agents place asynchronous prefetch I/O requests on a queue; the prefetchers read pages into the page region or the block region of the buffer pool(s).)

Agents send prefetch requests to the prefetch queue(s) during planned prefetching (e.g. tablescans), sequential detection (e.g. a scan through a clustered index), and list prefetch (e.g. a sorted list of pages gathered through an index scan).

Normally, big-block reads into discontiguous bufferpool pages exploit available interfaces when appropriate (e.g. readv on AIX, scattered read on Windows). On other platforms the big block is first read into a temporary buffer, and each page is then copied individually into the bufferpool.

When a bufferpool is configured for block access, big-block I/Os are performed directly into contiguous space in that bufferpool, when such contiguous space can be found. This is typically more efficient than either of the two alternatives above.

More details on read operations.


Bufferpools : Page Cleaner Triggers

(Slide diagram: agents, the log writer and the page cleaners themselves can all trigger page cleaning of the buffer pool(s).)

Agents can trigger the page cleaners when they perform dirty steals and when flushing objects (for example, during not-logged operations, e.g. create index, NOT LOGGED INITIALLY transactions).

The logger can trigger the page cleaners when available log disk space is getting low, or when the target recovery window is exceeded (SOFTMAX). This is termed an LSN gap trigger.

The page cleaners trigger themselves if the proportion of dirty pages exceeds the target (CHNGPGS_THRESH). This is termed a threshold trigger.

More details on write operations.


An example of how log space can get low, and how page cleaning can help:
Suppose there is a hot page - a page that is accessed very frequently
If some agent updates this page, it will likely stay buffered, and won't be flushed to disk by the page replacement algorithm
There will be a log record written to the log for this update
By default, DB2 needs to keep these log records in the active log directory on disk, so that they can be used to recover updates in the event of a crash
So, DB2 can't overwrite log files containing such log records,.. unless it knows all those transactions have ended, and any updated pages have been written to disk --> this is where page cleaning can help
Bufferpools : Direct Agent Reads/Writes

(Slide diagram: the specific I/O paths into and out of the bufferpool(s) - agents perform page reads (common with OLTP workloads) and, rarely, dirty-steal writes; prefetchers perform scattered reads (prefetch requests) and big-block reads into the page and block regions; page cleaners perform dirty page writes.)

This chart shows the specific type of I/O that is used to bring pages into and out of, bufferpools. The arrows beside each process/thread indicate the type of I/O used by that process/thread:
Note that a down arrow is a write operation and an up arrow is a read.
The block region of a bufferpool is an optional reservation of a certain set of contiguous pages in the bufferpool. The pages in this region are managed on a 'block' basis, rather than on an individual page basis. That is, the buffer manager will try to keep consecutive blocks of pages available in this region, and will try to use such a consecutive block of pages to satisfy prefetch requests that require a large
block of pages that are consecutive on disk. This allows the I/O of such a block of pages to be done in a single large block I/O, which is generally more efficient than the alternatives (eg. a scattered read operation which reads consecutive pages from disk to discontiguous pages in memory).
Bufferpool : Hints/Tips
A single bufferpool is often the best choice

Other than its size, a single bufferpool needs little/no tuning


It uses an optimized clock algorithm for aging pages, and it ages pages independently from other bufferpools
Several techniques are used to optimize the hit ratio, for example:
Important pages are favoured (EMP, SMP, index pages)
Pages not required in the future are placed on "hate stacks" which are used to quickly identify victims

Some cases where multiple bufferpools can help


Consider giving tables with real-time requirements their own dedicated bufferpool in order to bound response time
Consider giving tables that are always appended to (eg. journal or history tables) a small dedicated bufferpool
If access frequencies are well-known, consider fine-tuning hit rates by assigning different bufferpools to
different objects

Consider defining a block region of a bufferpool when there is a significant sequential I/O component in the workload (e.g. table scans, index scans)

Limit the block region to 50% of the bufferpool


Typical settings are in the 10-20% range
Note: a block-based bufferpool will have less of an opportunity to improve performance on platforms that have good vector read implementations (e.g. AIX, Windows)

Bufferpool hints/tips.
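
A block region is reserved when the bufferpool is created or altered with the NUMBLOCKPAGES and BLOCKSIZE options; a sketch with purely illustrative numbers (15% of the pool reserved as blocks of 32 pages, which would suit tablespaces with an extent size of 32):

db2 "CREATE BUFFERPOOL bpseq SIZE 100000 PAGESIZE 4K NUMBLOCKPAGES 15000 BLOCKSIZE 32"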
Direct I/O
Many file systems now support "Direct I/O"
  Bypasses the file system's buffer cache
  Combines the performance benefits of RAW with the usability benefits of file systems
  Examples: AIX Concurrent I/O, Veritas Quick I/O

(Slide diagram: without Direct I/O, agents, prefetchers and page cleaners reach tables and indexes through both the DB2 bufferpools and the filesystem buffercache; with Direct I/O, the filesystem buffercache is bypassed.)

Concurrent I/O on AIX (aka CIO) is generally preferred over its predecessor 'DIO'.
Direct I/O Enhancements in 8.2 V8.2

NO FILE SYSTEM CACHING enables Direct I/O for particular tablespaces
  CREATE TABLESPACE <tablespace name> ... NO FILE SYSTEM CACHING
  ALTER TABLESPACE <tablespace name> NO FILE SYSTEM CACHING

Mechanisms to control the use of file system buffering
  File system mount option, e.g.: mount -o cio <fs name>
  Registry variables, e.g.: DB2_DIRECT_IO, DB2NTNOCACHE  <- deprecated
  CREATE/ALTER TABLESPACE  <- recommended

The above list is in approximate order of precedence; however, the rules are complex
  The surest way to determine which is in effect is to use the tablespace snapshot:
    GET SNAPSHOT FOR TABLESPACES ON <dbname>
      Tablespace Page size (bytes) = 4096
      ...
      File system caching = No

CIO/DIO white paper: http://www3.software.ibm.com/ibmdl/pub/software/dw/dm/db2/dm-0408lee/CIO-article.pdf

The CREATE / ALTER mechanism for enabling direct I/O is strongly recommended over the others.
Note: Temps are supported through the mount option. DDL support is coming in 8.2.2.
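
A complete, purely illustrative DDL example of the recommended approach (names, container path and size are hypothetical), followed by the snapshot check:

db2 "CREATE TABLESPACE nofsc_ts PAGESIZE 4K MANAGED BY DATABASE USING (FILE '/db2/cont1' 25600) NO FILE SYSTEM CACHING"
db2 "ALTER TABLESPACE USERSPACE1 NO FILE SYSTEM CACHING"
db2 get snapshot for tablespaces on <dbname> | grep "File system caching"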
The Logging Subsystem

(Slide diagram: agents send log write requests, and log read requests (for rollback), against the log buffer; db2loggr and db2loggw manage the log buffer and write it to the online log files in <database dir>/SQLOGDIR (S0000000.LOG, S0000001.LOG, S0000002.LOG, etc.); in V8.2, db2logmgr archives log files to disk, tape or TSM.)

The elements of the processing architecture of DB2 that are devoted to log management, are highlighted in blue.
Logging : Key Facts
Changes to regular data and index pages are written to log buffer in memory
BLOBs and LONG VARCHARs use shadow paging; their data is not logged unless Log Retain is in effect and the LOB column is defined to be logged (i.e. does not use the NOT LOGGED option)
The changes to BLOB and LONG VARCHAR allocation pages are logged as regular data pages are

Pages from the log buffer are regularly forced to the online log files on disk by the db2loggw
The db2loggw tries to always keep a large block I/O outstanding against the log device
However, there are times when the db2loggw may force specific individual pages or groups of pages to disk

When log records are not being generated quickly enough for large blocks of contiguous pages from the log buffer to always be ready for writing, db2loggw can write smaller groups of pages
If a dirty buffer pool page is written to disk, the db2loggw will first write the log pages containing the log records associated with the dirty page (if they're not
already on disk)
On COMMIT (or after mincommit transactions COMMIT), the db2loggw will write all log pages associated with the transaction(s), if they're not already on
disk

DB2 offers two logging retention strategy choices


"Circular" (log eventually wraps around and overwrites initial log file, aka No Log Retention)
"Log Retain" (enables log archiving and Roll Forward Recovery)

When log archiving is enabled, each online log file is archived by the db2logmgr after it becomes full
Archival devices supported include disk, TSM, tape

By default, online log files (those containing log records for active transactions or dirty pages - ie. those
needed in the event of crash recovery) cannot be overwritten
In this case, the total amount of active log space cannot exceed the total amount of online log space configured
Active log space = #bytes in log stream from first log record written by oldest active transaction or log record corresponding to the
oldest dirty page in the bufferpool (whichever was written first) to the end of the log

When infinite logging is enabled, archived log files can be immediately overwritten with new log data
If a rollback occurs that requires the overwritten log data, the archived log file will be retrieved

This chart provides a high level overview of the logging subsystem within DB2.
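
For illustration (the archive path is hypothetical), log retention/archiving can be enabled with either the pre-V8.2 logretain parameter or the V8.2 logarchmeth1 parameter; a full backup is then required, because the database is placed in backup pending state:

db2 update db cfg for <dbname> using logretain recovery
    (or, in V8.2: db2 update db cfg for <dbname> using logarchmeth1 DISK:/db2/archlogs)
db2 backup database <dbname>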
Logging : Key Parameters & Hints/Tips
logfilsiz, logprimary, logsecond : determine online log disk space allocation

logfilsiz : size, in 4 KB pages, of each log file


logprimary : number of primary log files (disk space for primary log files is allocated at database activation)
logsecond : maximum number of secondary log files (disk space for secondary log files is allocated on demand)

Total online log space is limited by (logfilsiz * (logprimary + logsecond) * 4K bytes)


Recommendations
Try to configure enough online log space so that it always exceeds active log space
Use infinite logging to handle exceptional cases where active log space exceeds this limit (eg. errant transactions)
Use the tot_log_used_top snapshot monitor element to monitor the high water mark for active log space

logbufsz : the size of the in-memory log buffer

Larger sizes can buffer log I/Os more effectively - both writes and reads (for rollbacks) and prevent agents from waiting on log I/O
Recommendations
Use the num_log_buffer_full, and num_log_data_found_in_buffer, snapshot monitor elements to determine if the log buffer is too
small

Other recommendations
Use the following snapshot monitor elements to determine if the I/O subsystem is a bottleneck
log_write_time
log_read_time
num_log_write_io
num_log_read_io
log_writes
log_reads
Use ALTER/CREATE TABLE ... NOT LOGGED INITIALLY to turn off logging for the table during a given transaction
Avoid circular logging unless you can accept data loss in media failure scenarios, or can recover your data through other means

Note that when the NOT LOGGED INITIALLY clause of ALTER TABLE or CREATE TABLE is used, no logging of the records inserted/updated/deleted in that table takes place during the transaction. However, at COMMIT, we ensure all changed pages are flushed to disk, to ensure recoverability.

This capability can be helpful in reducing log space requirements when populating large tables.

It can also, in some situations, help increase performance. However, the logging benefit must be weighed against the page flushing drawback. Transactions which make a large number of changes to a small number of pages are more likely to gain performance advantages because the page I/O would likely be less than the log I/O that would result if the NOT LOGGED INITIALLY clause was not used.

Note that the page cleaners can perform much of the page I/O in the background before COMMIT. Consider (perhaps temporarily) setting the chngpgs_thresh/num_iocleaners configuration parameters to lower/higher values in order to make the cleaners more aggressive.

Note, also, that there are some important recoverability considerations with NOT LOGGED INITIALLY tables. Read about these in the Administration Guide before using this capability. (There's a pointer to online UDB document on the last chart).
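
As an illustration only (the values are examples, not recommendations): 10 primary and 20 secondary log files of 10000 4-KB pages each (about 1.2 GB of online log space), with a 256-page log buffer:

db2 update db cfg for <dbname> using logfilsiz 10000 logprimary 10 logsecond 20 logbufsz 256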
Logging : Snapshot Parameter Details
The following parameters can help one in determining if the log disk(s) are sufficient. (These parameters
will allow one to determine average read/write I/O time, and average read/write I/O size. )

log_write_time : total elapsed write time spent in the logger waiting on and performing log page writes
log_read_time : total elapsed read time spent by the logger waiting on and performing log page reads
num_log_write_io : number of I/Os issued for writing log data
num_log_read_io : number of I/Os issued for reading log data
log_writes : total number of log pages written
log_reads : total number of log pages read

The following parameters can help one in determining if the log buffer is too small
(num_log_data_found_in_buffer, together with num_log_read_io can give the hit and miss ratios for
rollback).

num_log_buffer_full : number of times agents have to wait (during the copy of log records into the log buffer) for some log data to be written to disk. This is incremented per agent per incident; that is, if the buffer is full and 2 agents want to write log data while the buffer is full, its value is incremented by 2.
num_log_data_found_in_buffer : number of times log data is found in the buffer when an agent is reading log records (thus avoiding the need to read from disk)

Details on how some of the snapshot elements can be very useful in monitoring and evaluating the performance of the logging subsystem.
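
As a sketch of the arithmetic (the numbers below are made up, not taken from a real snapshot):

  average log write time = log_write_time / num_log_write_io        e.g. 2.5 s / 1000 I/Os = 2.5 ms per write
  average log write size = (log_writes / num_log_write_io) * 4 KB   e.g. (4000 pages / 1000 I/Os) * 4 KB = 16 KB per write
  rollback read hit ratio (one reasonable formulation) = num_log_data_found_in_buffer / (num_log_data_found_in_buffer + num_log_read_io)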
Infinite Logging

Allows space used by archived active logs to be overwritten with new log data

These logs are retrieved in the event of ROLLBACK

There's no need to worry about an occasional "run-away" transaction causing an


outage due to a log full condition

An active unit of work can span an infinite number of logs

One is no longer limited by the size of the primary log (logprimary x logfilsiz)

Enable this using:

>--UPDATE DB CFG-+--------------+-USING LOGSECOND -1


+-FOR <dbname>-+

Infinite logging was added in V8.1, and is aimed at providing tolerance to the occasional errant transaction that requires an excessive amount of log space.
Infinite Logging Usage Considerations

Log archival must be enabled

This is not a license to design massive transactions ; intended use is an


'insurance policy' against errant/run-away transactions that would otherwise
cause log full conditions

Rollback and crash recovery may require (relatively slow) retrieval of archived logs

Watch for:

Long running applications that do a few updates and hang so they never commit or end
Runaway transactions - eg. caused by SQL issued in error
A warning is written to the Administration Notification log when current units of work exceed
primary log allocation

It's very important to note that you should NOT design transactions to exploit log space that exceeds that of the configured online log. If such transactions decide to rollback, the ensuing undo operation will require potentially lengthy log retrieve operations.
Again, this feature is really designed to handle the exceptional errant transaction, not the general case.
Logging : More Hints/Tips
Use the log throttling configuration parameters (added V8 FP2) to
prevent 'runaway' transactions:

max_log
Maximum active log space consumed by one transaction as a percent of primary log space
Has a minimum value of 0 and a maximum value of 100
A value of 0 means that the control is not in use
Dynamic configuration parameter

num_log_span
Number of active log files a single transaction is allowed to span
Has a minimum value of 0 and a maximum value of 65535
A value of 0 means that the control is not in use
Dynamic configuration parameter

What happens when a transaction violates either ?


Transaction rolled back
Application forced off the database
However, if DB2_FORCE_APP_ON_MAX_LOG registry variable is set to FALSE (Default =
TRUE)
SQL0964N is returned.
Opportunity to issue COMMIT or ROLLBACK

These new log throttling parameters were added in V8 FP2.
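
For example (the values are illustrative only): limit any single transaction to 50% of the primary log space and to spanning at most 4 active log files, and have SQL0964N returned rather than forcing the application:

db2 update db cfg for <dbname> using max_log 50 num_log_span 4
db2set DB2_FORCE_APP_ON_MAX_LOG=FALSE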


Summary
You should now have an appreciation of UDB's internal architecture
and how to use that knowledge to tune an installation

Lots more at:


http://www.ibm.com/software/data/db2/

Emails are welcome


huras@ca.ibm.com

V8 Memory Tuning Tech Notes


http://www-1.ibm.com/support/docview.wss?rs=71&context=SSEPGG&uid=swg21179841&loc=en_US&cs=utf-8&lang=en
http://www-1.ibm.com/support/docview.wss?rs=71&context=SSEPGG&q1=application+heap+memory+usage&
uid=swg21175378&loc=en_US&cs=utf-8&lang=en+en

Recommended AIX Levels


Minimum recommended maintenance has changed to 5.1 ML6 , 5.2 ML06 , 5.3 ML02
More at:
http://www-1.ibm.com/support/docview.wss?rs=71&uid=swg21165448

Concurrent I/O White Paper


CIO/DIO white paper:
http://www3.software.ibm.com/ibmdl/pub/software/dw/dm/db2/dm-0408lee/CIO-article.pdf

Thanks for your time!


®

IBM Software Group


Platform: DB2 for Linux, UNIX, Windows

DB2 UDB Internals : The Deep Dive, Part 1

Matt Huras, DE, IBM


huras@ca.ibm.com

Session: C09
Tue Oct 25 5:30-6:30

