
Differences in df and du on Oracle Cluster File System (OCFS2)

and Orphan Files


by Jeff Hunter, Sr. Database Administrator

Contents

1. Overview
2. Troubleshooting Steps
3. Upgrade OCFS2
4. Nightly Check Script for Orphan Files
5. About the Author

Overview

Recently, it was noticed that the df and du commands were displaying different results for two OCFS2 file systems
on all clustered database nodes in an Oracle RAC 10g configuration.

LUN                               Mount Point   Purpose                 OCFS2 Kernel Driver   OCFS2 Tools   OCFS2 Console
/dev/iscsi/thingdbcrsvol1/part1   /u02          Oracle CRS Components   1.4.2-1.el5           1.4.2-1.el5   1.4.2-1.el5
/dev/iscsi/thingdbfravol1/part1   /u03          Flash Recovery Area     1.4.2-1.el5           1.4.2-1.el5   1.4.2-1.el5

For example:

[oracle@thing1 ~]$ df -k
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/VolGroup01-LogVol00
33708824 17234092 14734752 54% /
/dev/hda1 101086 12188 83679 13% /boot
tmpfs 1036988 0 1036988 0% /dev/shm
domo:Public 4799457152 1876358656 2923098496 40% /domo
/dev/sda1 10485728 337920 10147808 4% /u02
/dev/sdb1 943714272 326912416 616801856 35% /u03

[oracle@thing1 ~]$ du -sk /u03


78463904 /u03

In the above example, the difference between df and du on the /u03 cluster file system is roughly 248 GB
(326912416 KB - 78463904 KB = 248448512 KB)!
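
A quick arithmetic check of that figure, using the 1K block counts reported by df -k and du -sk above and treating
1 GB as one million 1K blocks:

[oracle@thing1 ~]$ echo "$(( (326912416 - 78463904) / 1000 / 1000 )) GB unaccounted for"
248 GB unaccounted for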

The problem appeared to be isolated to the /u03 cluster file system which is used exclusively for the global Flash
Recovery Area (FRA). The files stored in the FRA include RMAN backups, archived redo logs, and flashback
database logs:

/u03/flash_recovery_area/thingdb/archivelog
/u03/flash_recovery_area/thingdb/autobackup
/u03/flash_recovery_area/thingdb/backupset
/u03/flash_recovery_area/thingdb/flashback

These directories are all managed by the Oracle RDBMS software — no files in the FRA (or /u03 in general) are
manually added or removed.

Researching this problem yielded the following My Oracle Support notes:


 OCFS2: df and du commands display different results [ID 558824.1]

Discusses orphaned files or an invalid cluster size as possible causes.

 OCFS2: Disk Space is not Released After Deleting Many Files [ID 468923.1]

Contains the tests used to determine that the difference between the df and du commands was indeed caused by
orphaned files and not by an invalid cluster size.

 OCFS2 1.4.1 not removing orphaned files after deletion [ID 806554.1]

Discusses a bug in OCFS2 that leaves some deleted files in the orphan directory
(the //orphan_dir name space in OCFS2).

The solution as described in Note ID 806554.1 is to upgrade OCFS2 to version 1.4.4 or higher. At the time of
this writing, the latest version of OCFS2 was 1.4.7-1.

After several days of research, it was determined that orphan files on the OCFS2 cluster file system were responsible
for the significant difference between the df and du commands. OCFS2 was apparently leaving some deleted files in
the orphan directory (the //orphan_dir name space in OCFS2) after they were deleted.

Note ID 468923.1 on the My Oracle Support website explains that despite deleting files and/or directories on an
OCFS2 cluster file system, it is possible that orphaned files may exist. These are files that have been deleted, but are
being accessed by running processes on one or more OCFS2 cluster nodes.

When an object (a file or directory) is deleted, the file system unlinks the object entry from its existing directory and
links it as an entry in that cluster node's orphan directory (the //orphan_dir name space in OCFS2). When the
object is eventually no longer in use anywhere in the cluster, the file system frees its inode along with all disk space
associated with it.
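
This "deleted but still open" behavior is not unique to OCFS2 and is easy to reproduce. The following is a minimal
sketch (the file name and size are made up for illustration) showing how a file removed while a process still holds it
open keeps consuming disk space, so df and du diverge until the holder exits:

# Create a 1 GB file, hold it open from a background process, then delete it.
[oracle@thing1 ~]$ dd if=/dev/zero of=/u03/demo_orphan.dat bs=1M count=1024
[oracle@thing1 ~]$ sleep 600 < /u03/demo_orphan.dat &
[oracle@thing1 ~]$ rm /u03/demo_orphan.dat

# du no longer sees the file, but df still counts its blocks ...
[oracle@thing1 ~]$ du -sk /u03
[oracle@thing1 ~]$ df -k /u03

# ... until the holding process goes away and the inode is finally freed.
[oracle@thing1 ~]$ kill %1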

So it appears that, in order to completely avoid creating orphaned files, the application/database should guarantee
that files are not held open by other processes at the time they are deleted. Again, the problem seemed to be
isolated to the /u03 file system, which was used exclusively for the global Flash Recovery Area and managed solely
by the Oracle RDBMS software. No files on this file system were being manually added or deleted.

The orphaned files in /u03 added up significantly given they were RMAN backups, archived logs, flashback logs, etc.
The initial plan was to manually remove all orphan files using /sbin/fsck.ocfs2 -fy
/dev/iscsi/thingdbfravol1/part1. Since one of the prerequisites is to unmount the file system(s), a
production outage had to be scheduled.

It was later found that a bug within OCFS2 was leaving some deleted files in the orphan directory. The solution was
to upgrade the OCFS2 configuration to version 1.4.4 or higher. At the time of this writing, the latest version of OCFS2
was 1.4.7-1.

This article discusses some of the troubleshooting steps that were involved in resolving OCFS2 not removing orphan
files.

Troubleshooting Steps

Reboot Cluster Nodes

During the initial troubleshooting phase, all nodes in the cluster were rebooted to determine what effect, if any, that would have.

[root@thing1 ~]# reboot

[root@thing2 ~]# reboot

When the nodes came back online, the problem appeared to be resolved and df and du were reporting the same disk
space usage. It was a short-lived victory, however; in less than 24 hours, df and du were once again reporting stark
differences on the /u03 cluster file system.

Identify Holders Associated with the OCFS2 "//orphan_dir" Name Space

Testing resumed with the assumption that the cluster file system(s) were having an issue removing orphaned files.
Note ID 468923.1 on the My Oracle Support website included directions to identify which node, application, or user
(holders) were associated with //orphan_dir name space entries (if any). To determine whether or not the orphaned
files were truly being held by a process (from one of the nodes in the RAC), the following was run from all OCFS2
clustered nodes:

[root@thing1 ~]# find /proc -name fd -exec ls -l {} \; | grep deleted

[root@thing2 ~]# find /proc -name fd -exec ls -l {} \; | grep deleted

Similarly, the lsof command can be used to produce the same results:

[root@thing1 ~]# lsof | grep -i deleted

[root@thing2 ~]# lsof | grep -i deleted
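
Because both commands report every deleted-but-open file on a node, regardless of file system, it can help to narrow
the output down to the cluster file systems in question. A small sketch (assuming passwordless ssh for root between
the nodes, and using lsof +L1, which lists open files whose link count has dropped to zero):

# Check both cluster nodes for deleted files still held open under /u02 or /u03.
for node in thing1 thing2; do
    echo "=== ${node} ==="
    ssh root@${node} "lsof +L1 2>/dev/null | grep -E '/u0[23]'"
done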

Although the find /proc and lsof commands both produced output, the files listed didn't appear to have anything
to do with the files on either of the clustered file systems (/u02 or /u03). It was now evident that while excessive
orphan files did indeed exist on the /u03 cluster file system, no processes were identified as holding a lock on them.

Query "//orphan_dir" Name Space using "debugfs.ocfs2"

The next round of troubleshooting involved querying the //orphan_dir name space in OCFS2 using
the debugfs.ocfs2 command. The general syntax used to query the //orphan_dir name space is:

debugfs.ocfs2 -R "ls -l //orphan_dir:<OCFS2_SLOT_NUM>" <DISK_DEVICE_NAME>

Where:

 OCFS2_SLOT_NUM

Four-digit OCFS2 slot number that specifies which node to check using debugfs.ocfs2. For example,
cluster node 1 (thing1) will be slot number 0000, cluster node 2 (thing2) will be slot number 0001, cluster
node 3 (thing3) will be slot number 0002, and in general node n will be slot number LPAD(n-1, 4, '0').

 DISK_DEVICE_NAME

Name of the disk device. For example: /dev/iscsi/thingdbfravol1/part1.

For example:

Node 1 - (thing1)

[root@thing1 ~]# /sbin/debugfs.ocfs2 -R "ls -l //orphan_dir:0000" /dev/iscsi/thingdbfravol1/part1


24 drwxr-xr-x 2 0 0 3896 18-Aug-2010 10:27 .
18 drwxr-xr-x 6 0 0 3896 27-Oct-2009 19:48 ..
83865735 -rw-r----- 0 501 501 2147483648 13-Aug-2010 00:14 00000000
83865734 -rw-r----- 0 501 501 2147483648 13-Aug-2010 00:13 00000000
83865720 -rw-r----- 0 501 501 2147483648 12-Apr-2010 00:09 00000000
83865719 -rw-r----- 0 501 501 2147483648 12-Apr-2010 00:08 00000000
83865718 -rw-r----- 0 501 501 2147483648 12-Apr-2010 00:08 00000000

... <snip> ...


Node 2 - (thing2)

[root@thing1 ~]# /sbin/debugfs.ocfs2 -R "ls -l //orphan_dir:0001" /dev/iscsi/thingdbfravol1/part1


25 drwxr-xr-x 2 0 0 3896 18-Aug-2010 10:27 .
18 drwxr-xr-x 6 0 0 3896 27-Oct-2009 19:48 ..

The above output shows that cluster node 1 (thing1) has entries (files) associated with the //orphan_dir name
space on /u03, meaning they are orphan files. The file name in the last column of the debugfs.ocfs2 output
(truncated to 00000000 in the listing above) is a hexadecimal value that encodes the entry's inode number, which
appears in decimal in the first column (e.g. 83865735). Note that cluster node 2 (thing2) has no entries in the
//orphan_dir name space, indicating there are no orphan files for that node. This makes sense since the nightly
RMAN process is only run from the first node (thing1).
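
To quantify how much space the orphan entries are holding without reading through the listings, the sizes in the
output can be summed. A quick sketch, assuming the ls -l output format shown above where the sixth column is the
file size in bytes:

# Sum up the orphan entries in node 1's slot (0000) on the FRA volume.
/sbin/debugfs.ocfs2 -R "ls -l //orphan_dir:0000" /dev/iscsi/thingdbfravol1/part1 2>/dev/null |
    awk '/-rw/ { files++; bytes += $6 }
         END   { printf "%d orphan file(s) holding %.1f GB\n", files, bytes / 1024 / 1024 / 1024 }'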

Manually Remove Orphan Files using "fsck.ocfs2"

By the time it was identified that orphan files were not being removed from the OCFS2 cluster file system, immediate
action was required to clear out the entries found in the //orphan_dir name space and allow the file system to free
their inodes and the disk space associated with them. The size and number of orphan files were considerable given
they were RMAN backups and archived logs.

To manually clear all orphaned files, schedule an outage on all nodes in the cluster. Bring down the clustered
database and all Oracle RAC services, unmount the OCFS2 file system(s) that contain the orphan files to be removed,
and run the fsck.ocfs2 command from all nodes in the cluster as follows:

[root@thing1 ~]# umount /u03


[root@thing2 ~]# umount /u03

[root@thing1 ~]# /sbin/fsck.ocfs2 -fy /dev/iscsi/thingdbfravol1/part1


[root@thing2 ~]# /sbin/fsck.ocfs2 -fy /dev/iscsi/thingdbfravol1/part1

After removing all orphan files, mount the OCFS2 cluster file systems and restart all Oracle RAC services.
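
Pulled together, the sequence on each node looks roughly like the following. This is only a sketch: it assumes a
clustered database named thingdb registered with CRS and that srvctl and crsctl (standard Oracle 10g tooling) are
available on the path; adapt the names to the actual environment.

# 1. Stop the database and Oracle Clusterware.
[root@thing1 ~]# su - oracle -c "srvctl stop database -d thingdb"
[root@thing1 ~]# crsctl stop crs

# 2. Unmount the affected OCFS2 file system.
[root@thing1 ~]# umount /u03

# 3. Check the file system and clear the orphan entries (run from each node as shown above).
[root@thing1 ~]# /sbin/fsck.ocfs2 -fy /dev/iscsi/thingdbfravol1/part1

# 4. Remount, restart Clusterware, and bring the database back up.
[root@thing1 ~]# mount /u03
[root@thing1 ~]# crsctl start crs
[root@thing1 ~]# su - oracle -c "srvctl start database -d thingdb"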

Upgrade OCFS2

The permanent solution in this case was to upgrade OCFS2 to version 1.4.4 or higher according to Note ID 806554.1
from the My Oracle Support website. At the time of this writing, the latest version of OCFS2 was 1.4.7-1.

Upgrading from OCFS2 version 1.4.2-1 to 1.4.7-1 does not require any on-disk format change. At a minimum, it is a
simple kernel driver update which means the upgrade could be performed in a rolling manner. While this would avoid
a cluster-wide outage, a full outage was still scheduled since it was unknown if cleaning out the orphan files
using fsck.ocfs2 could be performed on a disk device that still had other cluster instances mounting it.
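
Before and after the upgrade, the OCFS2 versions actually installed and loaded on each node can be verified with
standard tooling:

# RPM packages (ocfs2-tools, ocfs2console, and the kernel-specific ocfs2 module package)
[root@thing1 ~]# rpm -qa | grep -i ocfs2

# Version of the OCFS2 kernel module built for the running kernel
[root@thing1 ~]# modinfo ocfs2 | grep -i version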

The following paper provides step-by-step instructions to upgrade an installation of Oracle Cluster File System 2
(OCFS2) 1.4 on the Linux platform.

   Upgrading OCFS2 - 1.4

Nightly Check Script for Orphan Files

To ensure orphan files are not being held on an OCFS2 cluster file system, the following script should be scheduled to
run nightly from all nodes in the cluster.

The purpose of this script is to identify any orphan files in an OCFS2 cluster file system and warn the DBA. The
script queries the //orphan_dir name space using the debugfs.ocfs2 command from a clustered node to determine
the number of orphaned files on that node.

This script should be scheduled to run on a nightly basis through CRON as the root user account.

   ocfs2_check_orphaned_files.ksh

The ocfs2_check_orphaned_files.ksh script takes three parameters:

 DISK_DEVICE_NAME

Name of the disk device. For example: /dev/iscsi/thingdbfravol1/part1.

 OCFS2_SLOT_NUM

Four-digit OCFS2 slot number that specifies which node to check using debugfs.ocfs2. For example,
cluster node 1 will be slot number 0000, cluster node 2 will be slot number 0001, cluster node 3 will be slot
number 0002, and in general node n will be slot number LPAD(n-1, 4, '0').

 OCFS2_ORPHAN_FILE_COUNT_THRESHOLD

Maximum number of orphaned files that can exist in the provided OCFS2 file system before this script issues
a warning email.

For example, the following is scheduled nightly from two nodes in an Oracle RAC cluster, namely thing1 and thing2.
Note that node 1 (thing1) passes in slot number 0000 while node 2 (thing2) passes in slot number 0001. The third
parameter (5) specifies that at most five orphan files can exist in the provided OCFS2 cluster file system before the
script issues a warning email.

Node 1 - (thing1)
ocfs2_check_orphaned_files.ksh /dev/iscsi/thingdbcrsvol1/part1 0000 5
ocfs2_check_orphaned_files.ksh /dev/iscsi/thingdbfravol1/part1 0000 5

Node 2 - (thing2)
ocfs2_check_orphaned_files.ksh /dev/iscsi/thingdbcrsvol1/part1 0001 5
ocfs2_check_orphaned_files.ksh /dev/iscsi/thingdbfravol1/part1 0001 5
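
The actual script is available at the link above; a minimal sketch of the same idea is shown below. It assumes mailx
is available for notifications and uses a hypothetical recipient address. The sketch counts the regular file entries in
the given slot's //orphan_dir and mails a warning when the count exceeds the threshold.

#!/bin/ksh
# ocfs2_check_orphaned_files.ksh (illustrative sketch)
# Usage: ocfs2_check_orphaned_files.ksh <DISK_DEVICE_NAME> <OCFS2_SLOT_NUM> <THRESHOLD>

DISK_DEVICE_NAME=$1
OCFS2_SLOT_NUM=$2
THRESHOLD=$3
MAIL_TO="dba@example.com"        # hypothetical recipient

# Count regular file entries in this node's //orphan_dir slot,
# skipping the "." and ".." directory entries.
ORPHAN_COUNT=$(/sbin/debugfs.ocfs2 -R "ls -l //orphan_dir:${OCFS2_SLOT_NUM}" \
               ${DISK_DEVICE_NAME} 2>/dev/null | grep -c '^[0-9].* -rw')

if [ ${ORPHAN_COUNT} -gt ${THRESHOLD} ]; then
    echo "WARNING: ${ORPHAN_COUNT} orphan file(s) found on ${DISK_DEVICE_NAME} (slot ${OCFS2_SLOT_NUM})" |
        mailx -s "OCFS2 orphan file check: $(hostname)" ${MAIL_TO}
fi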

About the Author

Jeffrey Hunter is an Oracle Certified Professional, Java Development Certified Professional, Author, and an Oracle
ACE. Jeff currently works as a Senior Database Administrator for The DBA Zone, Inc. located in Pittsburgh,
Pennsylvania. His work includes advanced performance tuning, Java and PL/SQL programming, developing high
availability solutions, capacity planning, database security, and physical / logical database design in a UNIX, Linux,
and Windows server environment. Jeff's other interests include mathematical encryption theory, programming
language processors (compilers and interpreters) in Java and C, LDAP, writing web-based database administration
tools, and of course Linux. He has been a Sr. Database Administrator and Software Engineer for over 18 years and
maintains his own website at http://www.iDevelopment.info. Jeff graduated from Stanislaus State University in
Turlock, California, with a Bachelor's degree in Computer Science.
