A PEEK INSIDE ORACLE ASM METADATA

Luca Canali, CERN, Jan 2006.

OVERVIEW
Oracle ASM (Automatic Storage Management) is a new feature of Oracle 10g that streamlines storage management and provisioning. ASM provides volume and (cluster) filesystem management in which the I/O subsystem is handled directly by the Oracle kernel [Ref 1,2]. With Oracle 10g and ASM it is possible to build a scalable and highly available storage infrastructure on low-cost hardware [Ref 3]. A typical example is SATA disks with Fibre Channel controllers arranged in a SAN. A scalable architecture built on low-cost hardware, combining ASM and Oracle 10g RAC on Linux, is deployed at CERN [Ref 4].

Oracle documentation and whitepapers [Ref 1,2,3,5] provide the information needed to set up Oracle ASM instances and to configure storage with them. Configuration details and performance metrics are exposed via a few V$ views; other options are the command line interface asmcmd (10g R2) and the graphical interface of OEM. Metadata, however, are partially hidden from the end user: the mapping between physical storage, ASM allocation units, and database files is not completely exposed via V$ views. The author has found that it is nevertheless possible to query this information via undocumented X$ tables. For example, it is possible to determine the exact physical location on disk of each extent (or of the mirror copies of extents) for each file allocated on ASM and, if needed, to access the data directly via the OS.

This kind of information can be useful to Oracle practitioners who want to extend their knowledge of the inner workings of ASM or to diagnose hotspots and ASM rebalancing issues [Ref 6]. Direct access to ASM files (data peeking), possibly automated in a small ‘utility’, for educational purposes or for emergency data rescue, is a possible extension of the findings documented here.

ORACLE ASM RECAP
ASM organizes database storage using the following components. Each LUN is mapped as a disk (possibly using asmlib on Linux). Disks are grouped together into disk groups. Each disk group can be segmented into one or more fail groups. Typically 2 fail groups are created for a normal redundancy disk group; ASM then takes care of writing mirrored data in two copies allocated in different fail groups [Ref 5]. Oracle files are allocated by ASM from the pool of storage defined by the disk groups, as specified by the DBA. ASM takes care of mirroring and striping the data, following the S.A.M.E. concept [Ref 7]. A typical configuration [Ref 1-5] uses 2 disk groups: a data disk group (datafiles, redo logs, spfile, tempfile and controlfile) and a recovery disk group for the flash recovery area (archive logs, disk backups, and the multiplexed members of redo logs and controlfile). The disk groups are configured via SQL commands [Ref 4].

The following V$ views expose the configuration and some usage statistics, such as the number of physical reads and writes. All these views are accessible from the ASM instance (some of them display no records when queried from the database instance).
View Name              Based on                 Description
V$ASM_DISKGROUP        X$KFGRP                  performs disk discovery and lists disk groups
V$ASM_DISKGROUP_STAT   X$KFGRP_STAT             lists disk groups
V$ASM_DISK             X$KFDSK, X$KFKID         performs disk discovery and lists disks + usage metrics
V$ASM_DISK_STAT        X$KFDSK_STAT, X$KFKID    lists disks + usage metrics
V$ASM_FILE             X$KFFIL                  lists ASM files (1 row per file)
V$ASM_ALIAS            X$KFALS                  lists ASM aliases (files, directories)
V$ASM_CLIENT           X$KFTMTA                 lists DB instances connected to ASM
V$ASM_OPERATION        X$KFGMG                  lists running rebalancing operations
N.A.                   X$KFFXP                  extent mapping table for ASM files

From the table above we can see that the V$ASM_* views are based on X$KF* tables (i.e. X$ tables with KF as a prefix). There are more such tables that are not used to build V$ASM_* views: X$KFFXP, X$KFDAT, X$KFCBH, X$KFCCE, X$KFBH, X$KFDPARTNER, X$KFCLLE. Note: the findings reported here are based on querying the documented dictionary views V$FIXED_VIEW_DEFINITION and V$FIXED_TABLE. By querying the undocumented X$ tables listed above the author has found that the extent mapping table for ASM is contained in X$KFFXP (see also Ref 7 and 8).

ASM ALLOCATION TABLE
X$KFFXP contains the physical allocation table for each ASM file, that is, the mapping between ASM files (identified by columns NUMBER_KFFXP and COMPOUND_KFFXP) and their location on disk. Space on disk (identified by GROUP_KFFXP, DISK_KFFXP) is segmented in 1MB chunks called allocation units (column AU_KFFXP). ASM files are correspondingly allocated in extents that are mapped to the disk allocation units. When mirroring is used, each extent is allocated to 2 or 3 allocation units (2-way or 3-way mirroring).

X$KFFXP DESCRIPTION
By querying X$KFFXP on a test database running ASM 10g R2 and RAC, the following description for X$KFFXP has been speculated:

Column Name      Description
ADDR             table address/identifier
INDX             row identifier
INST_ID          instance number (RAC)
NUMBER_KFFXP     ASM file number. Join with v$asm_file and v$asm_alias
COMPOUND_KFFXP   File identifier. Join with compound_index in v$asm_file
INCARN_KFFXP     File incarnation id. Join with incarnation in v$asm_file
PXN_KFFXP        Extent number per file
XNUM_KFFXP       Logical extent number per file (mirrored extents have the same value)
GROUP_KFFXP      ASM disk group number. Join with v$asm_disk and v$asm_diskgroup
DISK_KFFXP       Disk number where the extent is allocated. Join with v$asm_disk
AU_KFFXP         Relative position of the allocation unit from the beginning of the disk. The allocation unit size (1 MB) is found in v$asm_diskgroup
LXN_KFFXP        0, 1 used to identify primary/mirror extent, 2 identifies file header allocation unit (hypothesis)
FLAGS_KFFXP      N.K.
CHK_KFFXP        N.K.

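The AU_KFFXP and LXN_KFFXP descriptions above amount to a simple address calculation, sketched here in Python. This is only an illustration under the assumptions stated in the table: a 1 MB allocation unit, LXN_KFFXP values 0 and 1 for primary and mirror copies, and the hypothesised value 2 for file header allocation units. The names au_byte_offset and describe_copy are hypothetical helpers introduced for this sketch, not part of any Oracle interface.

```python
AU_SIZE_BYTES = 1024 * 1024  # assumption: 1 MB allocation units, as on the test system


def au_byte_offset(au_number, au_size=AU_SIZE_BYTES):
    """Return the byte offset of an allocation unit from the start of its disk."""
    if au_number < 0:
        raise ValueError("allocation unit number must be non-negative")
    return au_number * au_size


def describe_copy(lxn):
    """Interpret LXN_KFFXP following the speculated column description."""
    labels = {
        0: "primary extent",
        1: "mirror extent",
        2: "file header allocation unit (hypothesis)",
    }
    return labels.get(lxn, "unknown")


# Example: an extent copy stored in allocation unit 10 of a disk, with LXN_KFFXP=1
print(au_byte_offset(10), describe_copy(1))
```

The offset is expressed in bytes, so it can be passed to any tool that reads the disk directly at a given position.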
EXAMPLE 1 – DIRECT FILE RETRIEVAL FROM ASM DISKS
The server parameter file (spfile) for the database instance is stored in an ASM diskgroup. This is a small file, useful to illustrate how to use the X$KFFXP table.

1. We find the disk group and file number of the database spfile:

sys@+ASM1> select GROUP_NUMBER, FILE_NUMBER, BYTES from v$asm_file where type='PARAMETERFILE';

GROUP_NUMBER FILE_NUMBER      BYTES
------------ ----------- ----------
           1         267       3584

2. We find the number and location of the extents where the spfile is written:

sys@+ASM1> select DISK_KFFXP, AU_KFFXP, PXN_KFFXP, XNUM_KFFXP, LXN_KFFXP from x$kffxp where GROUP_KFFXP=1 and NUMBER_KFFXP=267;

DISK_KFFXP   AU_KFFXP  PXN_KFFXP XNUM_KFFXP  LXN_KFFXP
---------- ---------- ---------- ---------- ----------
        24       3820          0          0          0
         0        176          1          0          1

3. From steps 1 and 2 above we know that the spfile is 3584 bytes long and is stored in 2 mirrored extents: one on disk 0, the other on disk 24 (in the same disk group). We can find the OS path of the disks with the following query (note that the test system was on Linux using asmlib):

sys@+ASM1> select failgroup, disk_number, path from v$asm_disk where GROUP_NUMBER=1 and DISK_NUMBER in (0,24);

FAILGROUP  DISK_NUMBER PATH
---------- ----------- --------------------
FG1                 24 ORCL:ITSTOR08_2_EXT
FG2                  0 ORCL:ITSTOR11_10_EXT

4. We can now confirm with OS commands that the mapping is correct. ‘dd’ allows the sysadmin to read the disks directly (bs=1M is the block size, while skip=176 means that the command starts reading at the offset 176M). Using the disk name found in step 3 (only disk 0 is demonstrated here) and the offsets found in step 2, we can confirm that the spfile data is at the expected physical location.

$ dd if=/dev/oracleasm/disks/ITSTOR11_10_EXT bs=1M count=1 skip=176 | strings | head -4
test12.__db_cache_size=1476395008
test11.__db_cache_size=1476395008
test12.__java_pool_size=16777216
test11.__java_pool_size=16777216

We can see that the first 4 lines of the spfile are printed out, as expected.

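The same read can be expressed in a few lines of Python, which may be convenient when automating this kind of data peeking. The sketch below is an illustration only: read_allocation_unit and printable_strings are hypothetical helper names, the 1 MB allocation unit size is the one observed on the test system, and reading a real ASM disk would require suitable OS privileges.

```python
import re

AU_SIZE_BYTES = 1024 * 1024  # assumption: 1 MB allocation units


def read_allocation_unit(path, au_number, au_size=AU_SIZE_BYTES):
    """Read one allocation unit, like dd bs=<au_size> count=1 skip=<au_number>."""
    with open(path, "rb") as disk:
        disk.seek(au_number * au_size)
        return disk.read(au_size)


def printable_strings(data, min_len=4):
    """Return runs of printable ASCII characters, roughly like strings(1)."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]
```

With the values found in Example 1, printable_strings(read_allocation_unit("/dev/oracleasm/disks/ITSTOR11_10_EXT", 176))[:4] would correspond to a dd pipeline with bs=1M, count=1 and skip=176 followed by strings and head -4.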
EXAMPLE 2 – 100MB DATAFILE EXTENT ALLOCATION
This example demonstrates how to query the datafile allocation map on ASM for a disk group with normal redundancy (2-way mirroring). A tablespace with 1 datafile of size 100MB has been created in an ASM disk group composed of 4 disks arranged in two fail groups: disks 20 and 24 (fail group FG1), disks 7 and 8 (fail group FG2). The disk group used for this test has ‘normal redundancy’, i.e. 2-way mirroring. The space allocation mapping (mapping between ASM file extents and allocation units on disk) for such a file can be queried like this:

select DISK_KFFXP, AU_KFFXP, PXN_KFFXP, XNUM_KFFXP, LXN_KFFXP from x$kffxp where GROUP_KFFXP=1 and NUMBER_KFFXP=271;

DISK_KFFXP   AU_KFFXP  PXN_KFFXP XNUM_KFFXP  LXN_KFFXP
---------- ---------- ---------- ---------- ----------
        20      13682          0          0          0
         7      14358          1          0          1
        24      13699          2          1          0
         8      14364          3          1          1
         8      14366          4          2          0
        24      13678          5          2          1
. . .
         8      14490        198         99          0
        24      13750        199         99          1
        20      13791        200        100          0
         7      14493        201        100          1
         7      14384          0 2147483648          0
        24      13714          1 2147483648          1
     65534 4294967294          2 2147483648          2

205 rows selected.

We can see that 205 allocation units of 1MB each are listed for the tablespace created with a datafile of 100MB nominal size. The actual datafile size is 101MB (the extra megabyte is allocated by Oracle for tablespace internal structures, independently of the use of ASM); with 2-way mirroring this accounts for 101x2=202 allocation units. The remaining 3 allocation units are listed in the last 3 rows of the x$kffxp output and are identifiable by having XNUM_KFFXP=2147483648. These are most likely file headers and/or metadata (this conclusion still has to be investigated further).

One of the goals of ASM is to provide uniform allocation of datafiles across the available spindles. This is a key feature of ASM and aims at maximizing performance (throughput, IOPS), lowering latency and increasing HA. The following query investigates how the extents are allocated across the four available disks in the test disk group used for this example.

sys@+ASM1> select DISK_KFFXP, count(*) from x$kffxp where GROUP_KFFXP=1 and NUMBER_KFFXP=271 and XNUM_KFFXP!=2147483648 group by DISK_KFFXP;

DISK_KFFXP   COUNT(*)
---------- ----------
         7         51
         8         50
        20         51
        24         50

We can see that space is allocated in a uniform way between the different disks (and fail groups). The primary and mirror extents (identified by the value of LXN_KFFXP) are also mixed uniformly on each disk, as a count grouped by DISK_KFFXP and LXN_KFFXP shows. Note: Oracle only has to read the primary extent of a mirrored pair of ASM-allocated extents for read operations (while both extents need to be accessed for a write operation). Therefore a uniform primary/mirror extent allocation provides better performance for read operations.

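The allocation unit accounting of Example 2 can be checked with a few lines of arithmetic. The Python sketch below only restates numbers reported in the example (101 MB actual file size, 1 MB allocation units, 2-way mirroring, 3 extra rows with XNUM_KFFXP=2147483648, and the per-disk counts); the function name expected_rows is hypothetical, and treating the 3 extra rows as a fixed per-file constant is an assumption based on this single observation.

```python
MIRROR_COPIES = 2         # normal redundancy: 2-way mirroring
ACTUAL_DATAFILE_MB = 101  # 100 MB nominal size + 1 MB of tablespace internal structures
EXTRA_ROWS = 3            # rows with XNUM_KFFXP=2147483648 observed in the example


def expected_rows(datafile_mb, au_mb=1, copies=MIRROR_COPIES, extra_rows=EXTRA_ROWS):
    """Rows expected in X$KFFXP: one per extent copy, plus the extra rows."""
    extents = datafile_mb // au_mb
    return extents * copies + extra_rows


# Per-disk counts reported in Example 2 (disk number -> allocation units)
per_disk = {7: 51, 8: 50, 20: 51, 24: 50}

total_rows = expected_rows(ACTUAL_DATAFILE_MB)
mirrored_aus = sum(per_disk.values())
imbalance = max(per_disk.values()) - min(per_disk.values())
print(total_rows, mirrored_aus, imbalance)
```

The three values correspond to the 205 rows selected, the 202 allocation units holding the mirrored data, and a difference of at most one allocation unit between any two disks.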
The breakdown of primary and mirror extents per disk can be obtained with the following query:

select DISK_KFFXP, LXN_KFFXP, count(*) from x$kffxp where GROUP_KFFXP=1 and NUMBER_KFFXP=271 and XNUM_KFFXP!=2147483648 group by DISK_KFFXP, LXN_KFFXP order by DISK_KFFXP, LXN_KFFXP;

DISK_KFFXP  LXN_KFFXP   COUNT(*)
---------- ---------- ----------
         7          0         25
         7          1         26
         8          0         25
         8          1         25
        20          0         26
        20          1         25
        24          0         25
        24          1         25

EXAMPLE 3 – DISKGROUP REBALANCING
This example illustrates the outcome of an ‘online’ disk group enlargement operation: four additional disks have been added to the disk group used in example 2 above (disks 2, 3, 0, 1 are added) and the disk group has then been rebalanced (see SQL below).

alter diskgroup test1_datadg1 add
  failgroup fg1 disk 'ORCL:ITSTOR08_1_EXT','ORCL:ITSTOR08_3_EXT'
  failgroup fg2 disk 'ORCL:ITSTOR11_3_EXT','ORCL:ITSTOR11_4_EXT';

alter diskgroup test1_datadg1 rebalance power 4;

After this operation the ASM space allocation map for the test datafile has been queried again. The outcome, shown below, is that the file is again uniformly spread over all the disks available in the disk group, as expected.

select a.failgroup, a.disk_number, count(*) N#_AU
from x$kffxp x join v$asm_disk a
  on x.GROUP_KFFXP=a.group_number and x.disk_kffxp=a.disk_number
where GROUP_KFFXP=1 and NUMBER_KFFXP=271 and XNUM_KFFXP!=2147483648
group by a.failgroup, a.disk_number
order by a.failgroup, a.disk_number;

FAILGROUP  DISK_NUMBER      N#_AU
---------- ----------- ----------
FG1                  2         25
FG1                  3         25
FG1                 20         26
FG1                 24         25
FG2                  0         25
FG2                  1         26
FG2                  7         25
FG2                  8         25

We can see that the datafile, after the online redefinition of the disk group, is spread uniformly across the disks and correctly mirrored across the two fail groups, as expected.

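The two properties claimed for the rebalanced layout in Example 3 (uniform spread over the disks, one copy of every extent in each fail group) can be verified directly on the query output. The Python sketch below hard-codes the eight result rows reported for the test system; the helper names totals_by_failgroup and spread are hypothetical, and the expected figure of 101 allocation units per fail group assumes the 101 MB file size and 1 MB allocation units of Example 2.

```python
from collections import defaultdict

# (fail group, disk number, allocation units) as reported after the rebalance
ROWS = [
    ("FG1", 2, 25), ("FG1", 3, 25), ("FG1", 20, 26), ("FG1", 24, 25),
    ("FG2", 0, 25), ("FG2", 1, 26), ("FG2", 7, 25), ("FG2", 8, 25),
]


def totals_by_failgroup(rows):
    """Sum the allocation units held by each fail group."""
    totals = defaultdict(int)
    for failgroup, _disk, au_count in rows:
        totals[failgroup] += au_count
    return dict(totals)


def spread(rows):
    """Difference between the most and the least loaded disk."""
    counts = [au_count for _failgroup, _disk, au_count in rows]
    return max(counts) - min(counts)


print(totals_by_failgroup(ROWS), spread(ROWS))
```

Each fail group holds 101 allocation units, i.e. one full copy of the 101 extents, and no disk differs from another by more than one allocation unit.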
SUMMARY
Oracle ASM is a powerful and easy to use volume manager and filesystem for Oracle 10g and 10g RAC. Configuration details and performance metrics of the configured ASM disks and disk groups are exposed via V$ASM_* views. However, the space allocation mapping (ASM file extents to disk allocation unit mapping) is not fully documented. This paper details how queries on the X$KFFXP internal table can be used to work around this limitation. A set of working examples has been discussed to demonstrate the findings and to directly explore some inner workings of ASM.

As expected, it was found that datafiles are automatically spread over the available disks in a disk group, that mirroring is taken care of by ASM and is done at the extent level (as opposed to the volume-level mirroring found in many other volume managers), and that ‘online’ disk additions to a disk group make it possible to spread datafiles uniformly over a large number of spindles in a ‘transparent’ way and can be used to improve performance and possibly reduce the impact of ‘hot spots’. Rebalancing operations have been demonstrated to (re)distribute datafile extents uniformly over the available disks. The ASM rebalancing algorithm apparently does not use workload metrics (from v$asm_disk_stat) to spread datafiles (for example to spread apart hot parts of the datafiles) but seems to use a simpler ‘round robin’ algorithm.

A few open points remain to be investigated, such as the role of the ‘extra 3 allocation units’ allocated for each datafile, which were documented in example 2 (see rows where XNUM_KFFXP=2147483648). A few additional X$KF* tables have been identified (see above), but their purposes have not yet been documented.

REFERENCES
1. N. Vengurlekar, 2005, http://www.oracle.com/technology/products/database/asm/pdf/asm_10gr2_bptwp_sept05.pdf
2. A. Shakian, OOW 2005, take the guesswork out of db tuning, http://www.oracle.com/pls/wocprod/docs/page/ocom/technology/products/database/asm/pdf/take%20the%20guesswork%20out%20of%20db%20tuning%2001-06.pdf
3. J Loaiza and S Lee, OTN 2005, http://www.oracle.com/technology/deploy/availability/pdf/1262_Loaiza_WP.pdf
4. L. Canali, scalable Oracle 10g architecture, https://twiki.cern.ch/twiki/pub/PSSGroup/HAandPerf/Architecture_description_Feb05.pdf
5. Oracle 10g administrator’s guide, “Using Automatic Storage Management”
6. Metalink, Bug N. 4306135
7. J Loaiza, Optimal storage configuration made easy, OTN 1999, http://www.oracle.com/technology/deploy/availability/pdf/oow2000_same.pdf
8. S. Rognes, 2004, posting to oracle-l mailing list
