Professional Documents
Culture Documents
In this Document
Purpose
Questions and Answers
GENERAL
DOWNLOAD AND INSTALL
CONFIGURE
O2CB CLUSTER SERVICE
FORMAT
RESIZE
MOUNT
ORACLE RAC
MIGRATE DATA FROM OCFS (RELEASE 1) TO OCFS2
COREUTILS
EXPORTING VIA NFS
TROUBLESHOOTING
LIMITS
SYSTEM FILES
HEARTBEAT
QUORUM AND FENCING
NOVELL'S SLES9 and SLES10
RELEASE 1.2
UPGRADE TO THE LATEST RELEASE
PROCESSES
BUILD RPMS FOR HOTFIX KERNELS
BACKUP SUPER BLOCK
CONFIGURING CLUSTER TIMEOUTS
ENTERPRISE LINUX 5
References
Applies to:
Purpose
This Metalink Note duplicates the OCFS2 FAQ that can be found at the following URL:
http://oss.oracle.com/projects/ocfs2/dist/documentation/v1.2/ocfs2_faq.html
The intent is to make the OCFS2 FAQ searchable by Metalink when customers are creating SRs and also for Oracle Support
analysts when researching OCFS2 issues.
This note will be compared periodically against the master document on the oss.oracle.com website and will be updated as needed
to remain accurate with that document.
GENERAL
# cat /proc/fs/ocfs2/version
OCFS2 1.2.8 Tue Feb 12 20:22:48 EST 2008 (build 9c7ae8bb50ef6d8791df2912775adcc5)
kernel.panic = 60
(The kernel.panic sysctl, typically set in /etc/sysctl.conf, specifies the number of seconds after a panic before the node automatically reboots.)
For Novell's SLES9, use yast to upgrade to the latest SP3 kernel to get the required modules installed. Also, install the ocfs2-tools
and ocfs2console packages.
For Novell's SLES10, install ocfs2-tools and ocfs2console packages. For Red Hat's RHEL4 and RHEL5, download and install the
appropriate module package and the two tools packages, ocfs2-tools and ocfs2console. Appropriate module refers to one matching
the kernel version, flavor and architecture. Flavor refers to smp, hugemem, etc.
# uname -r
2.6.9-22.0.1.ELsmp
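As a hedged illustration, installing the matching packages on RHEL4 would look something like the following; the package file names are placeholders and depend on the kernel version, flavor and architecture reported by uname -r above:
# rpm -Uvh ocfs2-2.6.9-22.0.1.ELsmp-1.2.8-1.el4.i686.rpm \
           ocfs2-tools-1.2.7-1.el4.i386.rpm \
           ocfs2console-1.2.7-1.el4.i386.rpm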
12. What modules are installed with the OCFS2 1.2 package?
* configfs.ko
* ocfs2.ko
* ocfs2_dlm.ko
* ocfs2_dlmfs.ko
* ocfs2_nodemanager.ko
* debugfs
The kernel shipped along with Enterprise Linux 5 includes configfs.ko and debugfs.ko.
13. What tools are installed with the ocfs2-tools 1.2 package?
* mkfs.ocfs2
* fsck.ocfs2
* tunefs.ocfs2
* debugfs.ocfs2
* mount.ocfs2
* mounted.ocfs2
* ocfs2cdsl
* ocfs2_hb_ctl
* o2cb_ctl
* o2cb - init service to start/stop the cluster
* ocfs2 - init service to mount/umount ocfs2 volumes
* ocfs2console - installed with the console package
CONFIGURE
17. What should the node name be and should it be related to the IP address?
The node name needs to match the hostname. The IP address need not be the one associated with that hostname. As in, any valid
IP address on that node can be used. OCFS2 will not attempt to match the node name (hostname) with the specified IP address.
18. How do I modify the IP address, port or any other information specified in cluster.conf?
While one can use ocfs2console to add nodes dynamically to a running cluster, any other modifications require the cluster to be
offlined. Stop the cluster on all nodes, edit /etc/ocfs2/cluster.conf on one and copy to the rest, and restart the cluster on all nodes.
Always ensure that cluster.conf is the same on all the nodes in the cluster.
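As a sketch, a two-node /etc/ocfs2/cluster.conf typically looks like the following. The node names, IP addresses and cluster name are placeholders; node names must match the hostnames and the parameter lines must be indented (with a tab in the actual file):
cluster:
        node_count = 2
        name = mycluster

node:
        ip_port = 7777
        ip_address = 192.168.1.101
        number = 0
        name = node1
        cluster = mycluster

node:
        ip_port = 7777
        ip_address = 192.168.1.102
        number = 1
        name = node2
        cluster = mycluster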
Notice the "-i" argument is not required as the cluster is not online.
# /etc/init.d/o2cb configure
Enter 'y' if you want the service to load on boot, and provide the name of the cluster (as listed in /etc/ocfs2/cluster.conf) and the cluster timeouts.
22. How do I start the cluster service?
* To load the modules, do:
# /etc/init.d/o2cb load
* To online the cluster, do:
# /etc/init.d/o2cb online [cluster_name]
If you have configured the cluster to load on boot, you could combine the two as follows:
# /etc/init.d/o2cb start [cluster_name]
The cluster name is not required if you have specified the name during configuration.
To stop the cluster service, offline the cluster and unload the modules:
# /etc/init.d/o2cb offline [cluster_name]
# /etc/init.d/o2cb unload
If you have configured the cluster to load on boot, you could combine the two as follows:
# /etc/init.d/o2cb stop [cluster_name]
The cluster name is not required if you have specified the name during configuration.
To check the status of the cluster, do:
# /etc/init.d/o2cb status
FORMAT
Formatting with only a volume label and the device specified uses the default block and cluster sizes, which are computed based upon the size of the volume. Explicitly specifying the block size, cluster size and number of node slots formats the volume accordingly, for example for 4 nodes with a 4K block size and a 32K cluster size. Sketches of both invocations follow.
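The following is a minimal sketch of the two invocations just described; the device name /dev/sdX and the label "oracle_home" are placeholders:
# mkfs.ocfs2 -L "oracle_home" /dev/sdX
# mkfs.ocfs2 -b 4K -C 32K -N 4 -L "oracle_home" /dev/sdX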
28. What does the number of node slots during format refer to?
The number of node slots specifies the number of nodes that can concurrently mount the volume. This number is specified during
format and can be increased using tunefs.ocfs2. This number cannot be decreased.
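For example (a sketch; the device name is a placeholder), to increase the number of node slots to 8 on an unmounted volume:
# tunefs.ocfs2 -N 8 /dev/sdX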
29. What should I consider when determining the number of node slots?
OCFS2 allocates system files, like the Journal, for each node slot. So as not to waste space, one should specify a number in the ballpark of the actual number of nodes. Also, as this number can be increased later, there is no need to specify a number much larger than the number of nodes one plans to mount the volume on.
30. Does the number of node slots have to be the same for all volumes?
No. This number can be different for each volume.
RESIZE
36. Short of reboot, how do I get the other nodes in the cluster to see the resized partition?
Use blockdev(8) to rescan the partition table of the device on the other nodes in the cluster.
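For example (a sketch, with the device name as a placeholder):
# blockdev --rereadpt /dev/sdX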
37. What is the tunefs.ocfs2 syntax for resizing the file system?
To grow a file system to the end of the resized partition, do:
# tunefs.ocfs2 -S /dev/sdX
38. Can the OCFS2 file system be grown while the file system is in use?
No. tunefs.ocfs2 1.2.2 only allows offline resize. i.e., the file system cannot be mounted on any node in the cluster. The online resize
capability will be added later.
MOUNT
The _netdev mount option indicates that the device needs to be mounted only after the network is up.
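A hedged /etc/fstab entry using this option (the device name and mount point are placeholders) might look like:
/dev/sdX   /u01   ocfs2   _netdev,defaults   0 0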
# /etc/init.d/o2cb configure
# mount
# cat /etc/mtab
# cat /proc/mounts
# /etc/init.d/ocfs2 status
# cat /proc/fs/ocfs2_dlm/*/stat
local=60624, remote=1, unknown=0, key=0x8619a8da
ORACLE RAC
* Also, as OCFS2 does not currently support shared writeable mmap, the health check (GIMH) file $ORACLE_HOME/dbs/hc_ORACLESID.dat and the ASM file $ASM_HOME/dbs/ab_ORACLESID.dat should be symlinked to the local filesystem. We expect to support shared writeable mmap in the OCFS2 1.4 release.
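A hedged sketch of the symlinking workaround just described; the local directory and the ORACLESID portion of the file name are placeholders:
# mv $ORACLE_HOME/dbs/hc_ORACLESID.dat /local/dir/hc_ORACLESID.dat
# ln -s /local/dir/hc_ORACLESID.dat $ORACLE_HOME/dbs/hc_ORACLESID.dat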
50. Does that mean I cannot have my data file and Oracle home on the same volume?
Yes. The Oracle data files, redo-logs, etc. should never be on the same volume as the distribution (including the trace logs, like alert.log).
MIGRATE DATA FROM OCFS (RELEASE 1) TO OCFS2
53. Can OCFS volumes and OCFS2 volumes be mounted on the same machine simultaneously?
No. OCFS only works on 2.4 linux kernels (Red Hat's AS2.1/EL3 and SuSE's SLES8). OCFS2, on the other hand, only works on the
2.6 kernels (RHEL4, SLES9 and SLES10).
56. What is the quickest way to move data from OCFS to OCFS2?
Quickest would mean having to perform the minimal number of copies. If you have a current backup on a non-OCFS volume accessible from the 2.6 kernel install, then all you would need to do is restore the backup on the OCFS2 volume(s). If you do not have a backup but have a setup in which the system containing the OCFS2 volumes can access the disks containing the OCFS volume, you can use the FSCat tools to extract data from the OCFS volume and copy it onto OCFS2.
COREUTILS
57. Like with OCFS (Release 1), do I need to use o_direct enabled tools to perform cp, mv, tar, etc.?
No. OCFS2 does not need the o_direct enabled tools. The file system allows processes to open files in both o_direct and buffered modes concurrently.
TROUBLESHOOTING
# debugfs.ocfs2 -l
To totally turn off tracing the SUPER bit, as in, turn off tracing even if some other bit is enabled for the same, do:
The first thing to note is the Lockres, which is the lockname. The dlm identifies resources using locknames. A lockname is a combination of a lock type (S superblock, M metadata, D filedata, R rename, W readwrite), the inode number and the generation.
To get the inode number and generation from a lockname, do:
One could also provide the inode number instead of the lockname.
The first is the Metadata lock, then the Data lock and lastly the ReadWrite lock for the same resource.
The DLM supports 3 lock modes: NL no lock, PR protected read and EX exclusive.
If you have a dlm hang, the resource to look for would be one with the "Busy" flag set.
The next step would be to query the dlm for the lock resource.
To do dlm debugging, first one needs to know the dlm domain, which matches the volume UUID.
# echo "stats" | debugfs.ocfs2 -n /dev/sdX | grep UUID: | while read a b ; do echo $b ; done
82DA8137A49A47E4B187F74E09FBBB4B
Then do:
For example:
It shows that the lock is mastered by node 75 and that node 79 has been granted a PR lock on the resource.
LIMITS
SYSTEM FILES
65. What are system files?
System files are used to store standard filesystem metadata like bitmaps, journals, etc. Storing this information in files in a directory
allows OCFS2 to be extensible. These system files can be accessed using debugfs.ocfs2. To list the system files, do:
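One way to do this (a sketch; /dev/sdX is a placeholder for the OCFS2 device, and the double slashes // refer to the system directory) is to pipe the ls command into debugfs.ocfs2:
# echo "ls -l //" | debugfs.ocfs2 -n /dev/sdX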
HEARTBEAT
72. How does one check the current active O2CB_HEARTBEAT_THRESHOLD value?
# cat /proc/fs/ocfs2_nodemanager/hb_dead_threshold
7
# cat /proc/cmdline
* it sees an odd number of heartbeating nodes and has network connectivity to more than half of them.
OR,
* it sees an even number of heartbeating nodes and has network connectivity to at least half of them *and* has connectivity to the
heartbeating node with the lowest node number.
78. How does a node decide that it has connectivity with another?
When a node sees another come to life via heartbeating it will try and establish a TCP connection to that newly live node. It
considers that other node connected as long as the TCP connection persists and the connection is not idle for
O2CB_IDLE_TIMEOUT_MS. Once that TCP connection is closed or idle it will not be reestablished until heartbeat thinks the other
node has died and come back alive.
80. How can one prevent a node from panicking when one shuts down the other node in a 2-node cluster?
This typically means that the network is being shut down before all the OCFS2 volumes are unmounted. Ensure the ocfs2 init script is enabled. This script ensures that the OCFS2 volumes are umounted before the network is shut down. To check whether the service is enabled, do:
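For example (a sketch):
# chkconfig --list o2cb
# chkconfig --list ocfs2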
81. How does one list out the startup and shutdown ordering of the OCFS2 related services?
On RHEL4, the startup order in runlevel 3 is:
# cd /etc/rc3.d
# ls S*ocfs2* S*o2cb* S*network*
S10network S24o2cb S25ocfs2
and the shutdown order:
# cd /etc/rc6.d
# ls K*ocfs2* K*o2cb* K*network*
K19ocfs2 K20o2cb K90network
On SLES9/SLES10, the startup order is:
# cd /etc/init.d/rc3.d
# ls S*ocfs2* S*o2cb* S*network*
S05network S07o2cb S08ocfs2
and the shutdown order:
# cd /etc/init.d/rc3.d
# ls K*ocfs2* K*o2cb* K*network*
K14ocfs2 K15o2cb K17network
Please note that the default ordering in the ocfs2 scripts only includes the network service and not any shared-device specific service, like iscsi. If one is using iscsi or any shared device requiring a service to be started and shut down, please ensure that that service starts before and shuts down after the ocfs2 init service.
NOVELL'S SLES9 and SLES10
82. Why are OCFS2 packages for SLES9 and SLES10 not made available on oss.oracle.com?
OCFS2 packages for SLES9 and SLES10 are available directly from Novell as part of the kernel. The same is true for the various Asianux distributions and for Ubuntu. As OCFS2 is now part of the mainline kernel, we expect more distributions to bundle the product with the kernel.
83. What versions of OCFS2 are available with SLES9 and how do they match with the Red Hat versions available on
oss.oracle.com?
As Novell and Oracle ship OCFS2 on different schedules, the package versions do not match. We expect this to resolve itself over time as the number of patch fixes decreases. Novell is shipping two SLES9 releases, viz., SP2 and SP3.
* The latest kernel with the SP2 release is 2.6.5-7.202.7. It ships with OCFS2 1.0.8.
* The latest kernel with the SP3 release is 2.6.5-7.283. It ships with OCFS2 1.2.3. Please contact Novell to get the latest OCFS2
modules on SLES9 SP3.
RELEASE 1.2
UPGRADE TO THE LATEST RELEASE
* Download the latest ocfs2-tools and ocfs2console for the target platform and the appropriate ocfs2 module package for the kernel version, flavor and architecture. (For more, refer to the "Download and Install" section above.)
# /etc/init.d/o2cb offline
# /etc/init.d/o2cb unload
# /etc/init.d/o2cb configure
* At this stage one could either reboot the node or simply restart the cluster and mount the volume.
93. Can I do a rolling upgrade from 1.2.7 to 1.2.8 or 1.2.9 on EL4 and EL5?
Yes. OCFS2 1.2.7, 1.2.8 and 1.2.9 are fully compatible. Users upgrading to 1.2.8/9 from 1.2.5/1.2.6 can expect the same
behaviour as described above for upgrading to 1.2.7.
94. After upgrade I am getting the following error on mount "mount.ocfs2: Invalid argument while mounting /dev/sda6 on /ocfs".
Do "dmesg | tail". If you see the error:
it means that you are trying to use the 1.2 tools and 1.0 modules. Ensure that you have unloaded the 1.0 modules and installed and
loaded the 1.2 modules. Use modinfo to determine the version of the module installed and/or loaded.
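For example (a sketch):
# modinfo ocfs2 | grep -i version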
SELinux: initialized (dev configfs, type configfs), not configured for labeling audit(1139964740.184:2): avc: denied { mount } for ...
The above error indicates that you have SELinux activated. A bug in SELinux does not allow configfs to mount. Disable SELinux by setting "SELINUX=disabled" in /etc/selinux/config. The change takes effect on reboot.
PROCESSES
[o2net]
One per node. Is a workqueue thread started when the cluster is brought online and stopped when offline. It handles the network
communication for all threads. It gets the list of active nodes from the o2hb thread and sets up tcp/ip communication channels with
each active node. It sends regular keepalive packets to detect any interruption on the channels.
[user_dlm]
One per node. Is a workqueue thread started when dlmfs is loaded and stopped on unload. (dlmfs is an in-memory file system
which allows user space processes to access the dlm in kernel to lock and unlock resources.) Handles lock downconverts when
requested by other nodes.
[ocfs2_wq]
One per node. Is a workqueue thread started when ocfs2 module is loaded and stopped on unload. Handles blockable file system
tasks like truncate log flush, orphan dir recovery and local alloc recovery, which involve taking dlm locks. Various code paths queue
tasks to this thread. For example, ocfs2rec queues orphan dir recovery so that while the task is kicked off as part of recovery, its
completion does not affect the recovery time.
[o2hb-14C29A7392]
One per heartbeat device. Is a kernel thread started when the heartbeat region is populated in configfs and stopped when it is removed. It writes every 2 secs to its block in the heartbeat region to indicate to the other nodes that the node is alive. It also reads the region to maintain a nodemap of live nodes. It notifies o2net and the dlm of any changes in the nodemap.
[ocfs2vote-0]
One per mount. Is a kernel thread started when a volume is mounted and stopped on umount. It downgrades locks when requested by other nodes in response to blocking ASTs (BASTs). It also fixes up the dentry cache in response to files unlinked or renamed on other nodes.
[dlm_thread]
One per dlm domain. Is a kernel thread started when a dlm domain is created and stopped when destroyed. This is the core dlm
which maintains the list of lock resources and handles the cluster locking infrastructure.
[dlm_reco_thread]
One per dlm domain. Is a kernel thread which handles dlm recovery whenever a node dies. If the node is the dlm recovery master,
it remasters all the locks owned by the dead node.
[dlm_wq]
One per dlm domain. Is a workqueue thread. o2net queues dlm tasks on this thread.
[kjournald]
One per mount. Is used as OCFS2 uses JBD for journalling.
[ocfs2cmt-0]
One per mount. Is a kernel thread started when a volume is mounted and stopped on umount. Works in conjunction with kjournald.
[ocfs2rec-0]
Is started whenever another node needs to be recovered. This could be either on mount, when it discovers a dirty journal, or during operation, when hb detects a dead node. ocfs2rec handles the file system recovery and it runs after the dlm has finished its recovery.
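A hedged way to see which of the threads described above are currently running on a node:
# ps -e -o pid,comm | grep -E 'o2net|o2hb|user_dlm|ocfs2|dlm_|kjournald'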
BUILD RPMS FOR HOTFIX KERNELS
* Download and install all the kernel-devel packages for the hotfix kernel.
* Download and untar the OCFS2 source tarball.
# cd /tmp
# wget http://oss.oracle.com/projects/ocfs2/dist/files/source/v1.2/ocfs2-1.2.3.tar.gz
# tar -zxvf ocfs2-1.2.3.tar.gz
# cd ocfs2-1.2.3
# cat ~/.rpmmacros
%_topdir /home/jdoe/rpms
%_tmppath /home/jdoe/rpms/tmp
%_sourcedir /home/jdoe/rpms/SOURCES
%_specdir /home/jdoe/rpms/SPECS
%_srcrpmdir /home/jdoe/rpms/SRPMS
%_rpmdir /home/jdoe/rpms/RPMS
%_builddir /home/jdoe/rpms/BUILD
* Ensure you have all kernel-*-devel packages installed for the kernel version you wish to build for. If so, then the following
command will list it as a possible target.
# ./vendor/rhel4/kernel.guess targets
rhel4_2.6.9-67.EL_rpm
rhel4_2.6.9-67.0.1.EL_rpm
rhel4_2.6.9-55.0.12.EL_rpm
# ./configure --with-kernel=/usr/src/kernels/2.6.9-67.EL-i686
# make rhel4_2.6.9-67.EL_rpm
BACKUP SUPER BLOCK
102. How do I detect whether the super blocks are backed up on a device?
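One hedged way to check is to dump the superblock stats with debugfs.ocfs2 and look for a backup-super entry in the feature compat line; the device name is a placeholder:
# debugfs.ocfs2 -R "stats" /dev/sdX | grep -i "feature compat"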
103. How do I backup the super block on a device formatted by an older mkfs.ocfs2?
tunefs.ocfs2 1.2.3 or later can attempt to retroactively backup the super block.
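A sketch of the invocation, run with the volume unmounted; the device name is a placeholder:
# tunefs.ocfs2 --backup-super /dev/sdX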
If the operation fails because the backup super block locations are in use, use the verify_backup_super script to list the objects using those blocks.
# ./verify_backup_super /dev/sdX
Locating inodes using blocks 262144 1048576 4194304 on device /dev/sdX
Block# Inode Block Offset
262144 27 65058
1048576 Unused
4194304 4161791 25
Matching inodes to object names
27 //journal:0003
4161791 /src/kernel/linux-2.6.19/drivers/scsi/BusLogic.c
* If the object happens to be user created, move that object temporarily to another volume before re-attempting the operation. However, this will not work if one or more blocks are being used by a system file (shown starting with double slashes //), say, a journal.
# fsck.ocfs2 -f -r 2 /dev/sdX
[RECOVER_BACKUP_SUPERBLOCK] Recover superblock information from backup block#1048576? n
Checking OCFS2 filesystem in /dev/sdX
label: myvolume
uuid: 4d 1d 1f f3 24 01 4d 3f 82 4c e2 67 0c b2 94 f3
number of blocks: 13107196
bytes per block: 4096
number of clusters: 13107196
bytes per cluster: 4096
max slots: 4
CONFIGURING CLUSTER TIMEOUTS
105. List and describe all the configurable timeouts in the O2CB cluster stack?
OCFS2 1.2.5 has 4 different configurable O2CB cluster timeouts:
* O2CB_HEARTBEAT_THRESHOLD - The Disk Heartbeat timeout is the number of two-second iterations before a node is considered dead. The formula used to convert the timeout in seconds to the number of iterations is: O2CB_HEARTBEAT_THRESHOLD = (timeout in seconds / 2) + 1. For example, to specify a 60 sec timeout, set it to 31. For 120 secs, set it to 61. The current default for this timeout is 60 secs (O2CB_HEARTBEAT_THRESHOLD = 31). In releases 1.2.5 and earlier, it was 12 secs (O2CB_HEARTBEAT_THRESHOLD = 7).
* O2CB_IDLE_TIMEOUT_MS - The Network Idle timeout specifies the time in milliseconds before a network connection is considered dead. The current default for this timeout is 30000 ms. In releases 1.2.5 and earlier, it was 10000 ms.
* O2CB_KEEPALIVE_DELAY_MS - The Network Keepalive specifies the maximum delay in milliseconds before a keepalive packet is sent. That is, a keepalive packet is sent if a network connection between two nodes is silent for this duration. If the other node is alive and connected, it is expected to respond. The current default for this timeout is 2000 ms. In releases 1.2.5 and earlier, it was 5000 ms.
* O2CB_RECONNECT_DELAY_MS - The Network Reconnect specifies the minimum delay in milliseconds between connection attempts. The default has always been 2000 ms.
107. What are the current defaults for the cluster timeouts?
The timeouts were updated in the 1.2.6 release to the following:
O2CB_HEARTBEAT_THRESHOLD = 31
O2CB_IDLE_TIMEOUT_MS = 30000
O2CB_KEEPALIVE_DELAY_MS = 2000
O2CB_RECONNECT_DELAY_MS = 2000
108. Can one change these timeout values in a round robin fashion?
No. The o2net handshake protocol ensures that all the timeout values for both nodes are consistent and fails if any value differs. This failed connection results in a failed mount, the reason for which is always listed in dmesg.
# /etc/init.d/o2cb status
Module "configfs": Loaded
Filesystem "configfs": Mounted
Module "ocfs2_nodemanager": Loaded
Module "ocfs2_dlm": Loaded
Module "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster mycluster: Online
Heartbeat dead threshold: 31
Network idle timeout: 30000
Network keepalive delay: 2000
Network reconnect delay: 2000
Checking O2CB heartbeat: Not active
# cat /etc/sysconfig/o2cb
#
# This is a configuration file for automatic startup of the O2CB
# driver. It is generated by running /etc/init.d/o2cb configure.
# Please use that method to modify this file
#
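Below the header, the file typically contains entries along these lines (a hedged sketch; the values shown are the defaults discussed above, and the cluster name is a placeholder):
O2CB_ENABLED=true
O2CB_BOOTCLUSTER=mycluster
O2CB_HEARTBEAT_THRESHOLD=31
O2CB_IDLE_TIMEOUT_MS=30000
O2CB_KEEPALIVE_DELAY_MS=2000
O2CB_RECONNECT_DELAY_MS=2000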
ENTERPRISE LINUX 5
112. What are the changes in EL5 as compared to EL4 as it pertains to OCFS2?
The in-memory filesystems, configfs and debugfs, have different mountpoints. configfs is mounted at /sys/kernel/config instead of /config, while debugfs is mounted at /sys/kernel/debug instead of /debug. (dlmfs still mounts at the old mountpoint, /dlm.)
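A hedged way to confirm the mountpoints in use on a running node:
# grep -E 'configfs|debugfs|dlmfs' /proc/mounts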
References
http://kerneltrap.org/node/4394
http://oss.oracle.com/bugzilla/show_bug.cgi?id=741
http://lwn.net/Articles/166954/
http://oss.oracle.com/projects/ocfs2-tools/dist/files/extras/verify_backup_super
Keywords