Subject: OCFS2 - FREQUENTLY ASKED QUESTIONS
Doc ID: 391771.1
Type: FAQ
Status: PUBLISHED
Modified Date: 10-JUN-2009

In this Document:
Purpose
Questions and Answers
  GENERAL
  DOWNLOAD AND INSTALL
  CONFIGURE
  O2CB CLUSTER SERVICE
  FORMAT
  RESIZE
  MOUNT
  ORACLE RAC
  MIGRATE DATA FROM OCFS (RELEASE 1) TO OCFS2
  COREUTILS
  EXPORTING VIA NFS
  TROUBLESHOOTING
  LIMITS
  SYSTEM FILES
  HEARTBEAT
  QUORUM AND FENCING
  NOVELL'S SLES9 and SLES10
  RELEASE 1.2
  UPGRADE TO THE LATEST RELEASE
  PROCESSES
  BUILD RPMS FOR HOTFIX KERNELS
  BACKUP SUPER BLOCK
  CONFIGURING CLUSTER TIMEOUTS
  ENTERPRISE LINUX 5
References

Applies to:
Linux Kernel - Version: 1.2
Oracle Server - Enterprise Edition - Version: 9.2.0.1 to 10.2.0.1
IBM zSeries Based Linux
Linux Itanium
Linux x86-64
IBM Power Based Linux
Linux x86

Purpose
This Metalink Note duplicates the OCFS2 FAQ that can be found at the following URL:
http://oss.oracle.com/projects/ocfs2/dist/documentation/v1.2/ocfs2_faq.html
The intent is to make the OCFS2 FAQ searchable in Metalink when customers are creating SRs, and also to assist Oracle Support analysts when researching OCFS2 issues. This note will be compared periodically against the master document on the oss.oracle.com website and will be updated as needed to remain accurate with that document. Users of this note are encouraged to refer to http://oss.oracle.com/projects/ocfs2/dist/documentation/v1.2/ocfs2_faq.html for the latest updates.

Questions and Answers

GENERAL

1. How do I get started?
* Download and install the module and tools rpms.
* Create cluster.conf and propagate to all nodes.
* Configure and start the O2CB cluster service.
* Format the volume.
* Mount the volume.

2. How do I know the version number running?
# cat /proc/fs/ocfs2/version
OCFS2 1.2.8 Tue Feb 12 20:22:48 EST 2008 (build 9c7ae8bb50ef6d8791df2912775adcc5)

3. How do I configure my system to auto-reboot after a panic?
To auto-reboot the system 60 secs after a panic, do:
# echo 60 > /proc/sys/kernel/panic
To enable the above on every reboot, add the following to /etc/sysctl.conf:
kernel.panic = 60

DOWNLOAD AND INSTALL

4. Where do I get the packages from?
For Oracle Enterprise Linux 4 and 5, use the up2date command as follows:
# up2date --install ocfs2-tools ocfs2console
# up2date --install ocfs2-`uname -r`
For Novell's SLES9, use yast to upgrade to the latest SP3 kernel to get the required modules installed. Also, install ocfs2-tools and ocfs2console packages. For Novell's SLES10, install the ocfs2-tools and ocfs2console packages. For Red Hat's RHEL4 and RHEL5, download and install the appropriate module package and the two tools packages, ocfs2-tools and ocfs2console. Appropriate module refers to one matching the kernel version, flavor and architecture. Flavor refers to smp, hugemem, etc.

5. What are the latest versions of the OCFS2 packages?
The latest module package version is 1.2.9-1 for both Enterprise Linux 4 and 5. The latest tools/console package version is 1.2.7-1 for both Enterprise Linux 4 and 5.

6. How do I interpret the package name ocfs2-2.6.9-22.0.1.ELsmp-1.2.1-1.i686.rpm?
The package name is comprised of multiple parts separated by '-'.
* ocfs2 - Package name
* 2.6.9-22.0.1.ELsmp - Kernel version and flavor
* 1.2.1 - Package version
* 1 - Package subversion
* i686 - Architecture

7. How do I know which package to install on my box?
After one identifies the package name and version to install, one still needs to determine the kernel version, flavor and architecture. To know the kernel version and flavor, do:
# uname -r
2.6.9-22.0.1.ELsmp
To know the architecture, do:
# rpm -qf /boot/vmlinuz-`uname -r` --queryformat "%{ARCH}\n"
i686

8. Why can't I use uname -p to determine the kernel architecture?
uname -p does not always provide the exact kernel architecture. Case in point the RHEL3 kernels on x86_64. Even though Red Hat has two different kernel architectures available for this port, ia32e and x86_64, uname -p identifies both as the generic x86_64.

9. How do I install the rpms?
First install the tools and console packages:
# rpm -Uvh ocfs2-tools-1.2.1-1.i386.rpm ocfs2console-1.2.1-1.i386.rpm
Then install the appropriate kernel module package:
# rpm -Uvh ocfs2-2.6.9-22.0.1.ELsmp-1.2.1-1.i686.rpm
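To verify what is already installed before or after the steps above, a hedged check; the package names and versions in the sample output are only an illustration of the naming scheme described in question 6 and will differ on your system:
# rpm -qa | grep -i ocfs2
ocfs2-tools-1.2.7-1
ocfs2console-1.2.7-1
ocfs2-2.6.9-22.0.1.ELsmp-1.2.9-1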

10. Do I need to install the console?
No, the console is not required but recommended for ease-of-use.

11. What are the dependencies for installing ocfs2console?
ocfs2console requires e2fsprogs, glib2 2.2.3 or later, vte 0.11.10 or later, pygtk2 (RHEL4) or python-gtk (SLES9) 1.99.16 or later, python 2.3 or later and ocfs2-tools.

12. What modules are installed with the OCFS2 1.2 package?
* configfs.ko
* ocfs2.ko
* ocfs2_dlm.ko
* ocfs2_dlmfs.ko
* ocfs2_nodemanager.ko
* debugfs.ko
The kernel shipped along with Enterprise Linux 5 includes configfs.ko and debugfs.ko.

13. What tools are installed with the ocfs2-tools 1.2 package?
* mkfs.ocfs2
* fsck.ocfs2
* tunefs.ocfs2
* debugfs.ocfs2
* mounted.ocfs2
* mount.ocfs2
* ocfs2cdsl
* ocfs2_hb_ctl
* o2cb_ctl
* o2cb - init service to start/stop the cluster
* ocfs2 - init service to mount/umount ocfs2 volumes
* ocfs2console - installed with the console package

14. What is debugfs and is it related to debugfs.ocfs2?
debugfs is an in-memory filesystem developed by Greg Kroah-Hartman. It is useful for debugging as it allows kernel space to easily export data to userspace. It is currently being used by OCFS2 to dump the list of filesystem locks and could be used for more in the future. It is bundled with OCFS2 as the various distributions are currently not bundling it. While debugfs and debugfs.ocfs2 are unrelated in general, the latter is used as the front-end for the debugging info provided by the former.

CONFIGURE

15. How do I populate /etc/ocfs2/cluster.conf?
If you have installed the console, use it to create this configuration file. For details, refer to the user's guide. If you do not have the console installed, check the Appendix in the User's guide for a sample cluster.conf and the details of all the components (a minimal sample is also sketched below). Do not forget to copy this file to all the nodes in the cluster. If you ever edit this file on any node, ensure the other nodes are updated as well.
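A minimal sketch of the file for a hypothetical two-node cluster named mycluster; the node names must match the hostnames (see question 17 below), the addresses are made-up private interconnect addresses, and the layout (stanza headers at the first column, parameters indented, stanzas separated by a blank line) follows what the console generates:

node:
	ip_port = 7777
	ip_address = 192.168.1.101
	number = 0
	name = node1
	cluster = mycluster

node:
	ip_port = 7777
	ip_address = 192.168.1.102
	number = 1
	name = node2
	cluster = mycluster

cluster:
	node_count = 2
	name = mycluster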

16. Should the IP interconnect be public or private?
Using a private interconnect is recommended. While OCFS2 does not take much bandwidth, it does require the nodes to be alive on the network and sends regular keepalive packets to ensure that they are. To avoid a network delay being interpreted as a node disappearing on the net, which could lead to a node self-fencing, a private interconnect is recommended. One could use the same interconnect for Oracle RAC and OCFS2.

17. What should the node name be and should it be related to the IP address?
The node name needs to match the hostname. The IP address need not be the one associated with that hostname. That is, any valid IP address on that node can be used. OCFS2 will not attempt to match the node name (hostname) with the specified IP address.

18. How do I modify the IP address, port or any other information specified in cluster.conf?
While one can use ocfs2console to add nodes dynamically to a running cluster, any other modifications require the cluster to be offlined. Stop the cluster on all nodes, edit /etc/ocfs2/cluster.conf on one node and copy it to the rest, then restart the cluster on all nodes. Always ensure that cluster.conf is the same on all the nodes in the cluster.

19. How do I add a new node to an online cluster?
You can use the console to add a new node. However, you will need to explicitly add the new node on all the online nodes. That is, adding on one node and propagating to the other nodes is not sufficient. If the operation fails, it will most likely be due to bug#741. In that case, you can use the o2cb_ctl utility on all online nodes as follows:
# o2cb_ctl -C -i -n NODENAME -t node -a number=NODENUM -a ip_address=IPADDR -a ip_port=IPPORT -a cluster=CLUSTERNAME

20. How do I add a new node to an offline cluster?
You can either use the console, use o2cb_ctl, or simply hand edit cluster.conf. Then either use the console to propagate it to all nodes or hand copy using scp or any other tool. The o2cb_ctl command to do the same is:
# o2cb_ctl -C -n NODENAME -t node -a number=NODENUM -a ip_address=IPADDR -a ip_port=IPPORT -a cluster=CLUSTERNAME
Notice the "-i" argument is not required as the cluster is not online.

O2CB CLUSTER SERVICE

21. How do I configure the cluster service?
# /etc/init.d/o2cb configure
Enter 'y' if you want the service to load on boot, and provide the name of the cluster (as listed in /etc/ocfs2/cluster.conf) and the cluster timeouts.

22. How do I start the cluster service?
* To load the modules, do:
# /etc/init.d/o2cb load
* To online it, do:
# /etc/init.d/o2cb online [cluster_name]
If you have configured the cluster to load on boot, you could combine the two as follows:
# /etc/init.d/o2cb start [cluster_name]
The cluster name is not required if you have specified the name during configuration.

23. How do I stop the cluster service?
* To offline it, do:
# /etc/init.d/o2cb offline [cluster_name]
* To unload the modules, do:
# /etc/init.d/o2cb unload
If you have configured the cluster to load on boot, you could combine the two as follows:
# /etc/init.d/o2cb stop [cluster_name]
The cluster name is not required if you have specified the name during configuration.

24. How can I learn the status of the cluster?
To learn the status of the cluster, do:
# /etc/init.d/o2cb status

25. I am unable to get the cluster online. What could be wrong?
Check whether the node name in the cluster.conf exactly matches the hostname. One of the nodes in the cluster.conf needs to be in the cluster for the cluster to be online.

FORMAT

26. How do I format a volume?
You could either use the console or use mkfs.ocfs2 directly to format the volume. For console, refer to the user's guide.
# mkfs.ocfs2 -L "oracle_home" /dev/sdX
The above formats the volume with default block and cluster sizes, which are computed based upon the size of the volume.
# mkfs.ocfs2 -b 4k -C 32K -L "oracle_home" -N 4 /dev/sdX
The above formats the volume for 4 nodes with a 4K block size and a 32K cluster size.

27. Should I partition a disk before formatting?
Yes, partitioning is recommended even if one is planning to use the entire disk for ocfs2. Apart from the fact that partitioned disks are less likely to be "reused" by mistake, some features like mount-by-label only work with partitioned volumes. Use fdisk or parted or any other tool for the task.

28. What block size should I use?
A block size is the smallest unit of space addressable by the file system. OCFS2 supports block sizes of 512 bytes, 1K, 2K and 4K. The block size cannot be changed after the format. For most volume sizes, a 4K size is recommended. On the other hand, the 512 bytes block is never recommended.

29. What cluster size should I use?
A cluster size is the smallest unit of space allocated to a file to hold the data. OCFS2 supports cluster sizes of 4K, 8K, 16K, 32K, 64K, 128K, 256K, 512K and 1M. For database volumes, a cluster size of 128K or larger is recommended. For Oracle home, 32K to 64K.

30. Any advantage of labelling the volumes?
As in a shared disk environment the disk name (/dev/sdX) for a particular device could be different on different nodes, labelling becomes a must for easy identification. You could also use labels to identify volumes during mount.
# mount -L "label" /dir
The volume label is changeable using the tunefs.ocfs2 utility.

31. What does the number of node slots during format refer to?
The number of node slots specifies the number of nodes that can concurrently mount the volume. This number is specified during format and can be increased using tunefs.ocfs2. This number cannot be decreased.

32. What should I consider when determining the number of node slots?
OCFS2 allocates system files, like Journal, for each node slot. So as not to waste space, one should specify a number within the ballpark of the actual number of nodes. Also, as this number can be increased, there is no need to specify a much larger number than one plans for mounting the volume.

33. Does the number of node slots have to be the same for all volumes?
No. This number can be different for each volume.

RESIZE

34. Can OCFS2 file systems be grown in size?
Yes, you can grow an OCFS2 file system using tunefs.ocfs2. It should be noted that the tool will only resize the file system and not the underlying partition. You can use fdisk(8) (or any appropriate tool for your disk array) to resize the partition.

35. What do I need to know to use fdisk(8) to resize the partition?
To grow a partition using fdisk(8), you will have to delete it and recreate it with a larger size. When recreating it, ensure you specify the same starting disk cylinder as before and an ending disk cylinder that is greater than the existing one. Otherwise, not only will the resize operation fail, but you may lose your entire file system. Backup your data before performing this task.

36. Short of reboot, how do I get the other nodes in the cluster to see the resized partition?
Use blockdev(8) to rescan the partition table of the device on the other nodes in the cluster.
# blockdev --rereadpt /dev/sdX

37. What is the tunefs.ocfs2 syntax for resizing the file system?
To grow a file system to the end of the resized partition, do:
# tunefs.ocfs2 -S /dev/sdX
For more, refer to the tunefs.ocfs2 manpage.
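A consolidated, hedged sketch of the offline grow procedure covered by questions 34 to 37; the device name and mountpoint are illustrative, and the volume must stay unmounted on every node until the resize completes:
* Umount the volume on all nodes:
# umount /u01
* Resize the partition (same starting cylinder, larger ending cylinder):
# fdisk /dev/sdX
* Make the other nodes re-read the partition table:
# blockdev --rereadpt /dev/sdX
* Grow the file system to the end of the partition:
# tunefs.ocfs2 -S /dev/sdX
* Remount on all nodes:
# mount -t ocfs2 /dev/sdX /u01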

38. Can the OCFS2 file system be shrunk in size?
No. We have no current plans on providing this functionality. However, if you find this feature useful, file an enhancement request on bugzilla listing your reasons for the same.

39. Can the OCFS2 file system be grown while the file system is in use?
No, tunefs.ocfs2 1.2 only allows offline resize, i.e., the file system cannot be mounted on any node in the cluster. The online resize capability will be added later.

MOUNT

40. How do I mount the volume?
You could either use the console or use mount directly. For console, refer to the user's guide.
# mount -t ocfs2 /dev/sdX /dir
The above command will mount device /dev/sdX on directory /dir.

41. How do I mount by label?
To mount by label do:
# mount -L "label" /dir

42. What entry do I add to /etc/fstab to mount an ocfs2 volume?
Add the following:
/dev/sdX /dir ocfs2 _netdev 0 0
The _netdev option indicates that the device needs to be mounted after the network is up.

43. What do I need to do to mount OCFS2 volumes on boot?
* Enable o2cb service using:
# chkconfig --add o2cb
* Enable ocfs2 service using:
# chkconfig --add ocfs2
* Configure o2cb to load on boot using:
# /etc/init.d/o2cb configure
* Add entries into /etc/fstab as follows:
/dev/sdX /dir ocfs2 _netdev 0 0
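A hedged variant of the fstab entry from question 42, using a volume label (question 41) instead of the device name so the entry stays valid even if device names differ across nodes; the label and mountpoint are made-up, and this relies on the same label resolution that mount -L uses:
LABEL=oracle_home /u01 ocfs2 _netdev 0 0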

44. How do I know my volume is mounted?
* Enter mount without arguments, or,
# mount
* List /etc/mtab, or,
# cat /etc/mtab
* List /proc/mounts, or,
# cat /proc/mounts
* Run the ocfs2 service.
# /etc/init.d/ocfs2 status
The mount command reads /etc/mtab to show the information.

45. What are the /config and /dlm mountpoints for?
OCFS2 comes bundled with two in-memory filesystems, configfs and ocfs2_dlmfs. configfs is used by the ocfs2 tools to communicate to the in-kernel node manager the list of nodes in the cluster and to the in-kernel heartbeat thread the resource to heartbeat on. ocfs2_dlmfs is used by ocfs2 tools to communicate with the in-kernel dlm to take and release clusterwide locks on resources.

46. Why does it take so much time to mount the volume?
It takes around 5 secs for a volume to mount. It does so to let the heartbeat thread stabilize. In a later release, we plan to add support for a global heartbeat, which will make most mounts instant.

47. Why does it take so much time to umount the volume?
During umount, the dlm has to migrate all the mastered lockres' to another node in the cluster. In 1.2, the lockres migration is a synchronous operation. We are looking into making it asynchronous so as to reduce the time it takes to migrate the lockres'. (While we have improved this performance in 1.2.4, the task of asynchronously migrating lockres' has been pushed to the 1.4 release.)
To find the number of lockres in all dlm domains, do:
# cat /proc/fs/ocfs2_dlm/*/stat
local=60624, remote=1, unknown=0, key=0x8619a8da
local refers to locally mastered lockres'.

ORACLE RAC

48. Any special flags to run Oracle RAC?
OCFS2 volumes containing the Voting diskfile (CRS), Cluster registry (OCR), Data files, Redo logs, Archive logs and Control files must be mounted with the datavolume and nointr mount options. The datavolume option ensures that the Oracle processes open these files with the o_direct flag. The nointr option ensures that the ios are not interrupted by signals.
# mount -o datavolume,nointr -t ocfs2 /dev/sda1 /u01/db

49. What about the volume containing Oracle home?
The Oracle home volume should be mounted normally, that is, without the datavolume and nointr mount options. These mount options are only relevant for the Oracle files listed above.
# mount -t ocfs2 /dev/sdb1 /software/orahome
Also, as OCFS2 does not currently support shared writeable mmap, the health check (GIMH) file $ORACLE_HOME/dbs/hc_ORACLESID.dat and the ASM file $ASM_HOME/dbs/ab_ORACLESID.dat should be symlinked to the local filesystem. We expect to support shared writeable mmap in the OCFS2 1.4 release.

50. Does that mean I cannot have my data file and Oracle home on the same volume?
Yes. The volume containing the Oracle data files, redo-logs, etc. should never be on the same volume as the distribution (including the trace logs like, alert.log).

51. Any other information I should be aware of?
The 1.2.3 release of OCFS2 does not update the modification time on the inode across the cluster for non-extending writes. However, the time will be locally updated in the cached inodes. This leads to one observing different times (ls -l) for the same file on different nodes on the cluster. While this does not affect most uses of the filesystem, as one variably changes the file size during write, the one usage where this is most commonly experienced is with Oracle datafiles and redologs. This is because Oracle rarely resizes these files and thus almost all writes are non-extending. In OCFS2 1.4, we intend to fix this by updating modification times for all writes while providing an opt-out mount option (nocmtime) for users who would prefer to avoid the performance overhead associated with this feature.
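Tying together questions 48 and 49, a hedged /etc/fstab sketch showing the two kinds of volumes side by side; the device names and mountpoints are illustrative only:
/dev/sda1 /u01/db ocfs2 _netdev,datavolume,nointr 0 0
/dev/sdb1 /software/orahome ocfs2 _netdev 0 0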

MIGRATE DATA FROM OCFS (RELEASE 1) TO OCFS2

52. Can I mount OCFS volumes as OCFS2?
No. OCFS and OCFS2 are not on-disk compatible. We had to break the compatibility in order to add many of the new features. At the same time, we have added enough flexibility in the new disk layout so as to maintain backward compatibility in the future.

53. Can OCFS volumes and OCFS2 volumes be mounted on the same machine simultaneously?
No. OCFS only works on 2.4 linux kernels (Red Hat's AS2.1/EL3 and SuSE's SLES8). OCFS2, on the other hand, only works on the 2.6 kernels (RHEL4, SLES9 and SLES10).

54. Can I access my OCFS volume on 2.6 kernels (SLES9/SLES10/RHEL4)?
Yes, you can access the OCFS volume on 2.6 kernels using the FSCat tools, fsls and fscp. These tools can access the OCFS volumes at the device layer, to list and copy the files to another filesystem. FSCat tools are available on oss.oracle.com.

55. Can I in-place convert my OCFS volume to OCFS2?
No. The on-disk layout of OCFS and OCFS2 are sufficiently different that it would require a third disk (as a temporary buffer) in order to in-place upgrade the volume. With that in mind, it was decided not to develop such a tool but instead provide tools to copy data from OCFS without one having to mount it.

56. What is the quickest way to move data from OCFS to OCFS2?
Quickest would mean having to perform the minimal number of copies. If you have the current backup on a non-OCFS volume accessible from the 2.6 kernel install, then all you would need to do is to restore the backup on the OCFS2 volume(s). If you do not have a backup but have a setup in which the system containing the OCFS2 volumes can access the disks containing the OCFS volume, you can use the FSCat tools to extract data from the OCFS volume and copy onto OCFS2.

COREUTILS

57. Like with OCFS (Release 1), do I need to use o_direct enabled tools to perform cp, mv, tar, etc.?
No. OCFS2 does not need the o_direct enabled tools. The file system allows processes to open files in both o_direct and buffered mode concurrently.

EXPORTING VIA NFS

58. Can I export an OCFS2 file system via NFS?
Yes, you can export files on OCFS2 via the standard Linux NFS server. Please note that only NFS version 3 and above will work. In practice, this means clients need to be running a 2.4.x kernel or above.

59. Is there no solution for the NFS v2 clients?
NFS v2 clients can work if the server exports the volumes with the no_subtree_check option. However, this has some security implications that are documented in the exports manpage.

TROUBLESHOOTING

60. How do I enable and disable filesystem tracing?
To list all the debug bits along with their statuses, do:
# debugfs.ocfs2 -l
To enable tracing the bit SUPER, do:
# debugfs.ocfs2 -l SUPER allow
To disable tracing the bit SUPER, do:
# debugfs.ocfs2 -l SUPER off
To totally turn off tracing the SUPER bit, as in, turn off tracing even if some other bit is enabled for the same, do:
# debugfs.ocfs2 -l SUPER deny
To enable heartbeat tracing, do:
# debugfs.ocfs2 -l HEARTBEAT ENTRY EXIT allow
To disable heartbeat tracing, do:
# debugfs.ocfs2 -l HEARTBEAT off ENTRY EXIT deny

61. How do I get a list of filesystem locks and their statuses?
OCFS2 1.0.9+ has this feature. To get this list, do:
* Mount debugfs at /debug (EL4) or /sys/kernel/debug (EL5).
# mount -t debugfs debugfs /debug
- OR -
# mount -t debugfs debugfs /sys/kernel/debug
* Dump the locks.
# echo "fs_locks" | debugfs.ocfs2 /dev/sdX >/tmp/fslocks
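If you want the debugfs mount from question 61 to persist across reboots, a standard fstab line can be used; shown here for the EL5 mountpoint (use /debug on EL4):
debugfs /sys/kernel/debug debugfs defaults 0 0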

62. How do I read the fs_locks output?
Let's look at a sample output:
Lockres: M000000000000000006672078b84822 Mode: Protected Read
Flags: Initialized Attached
RO Holders: 0 EX Holders: 0
Pending Action: None Pending Unlock Action: None
Requested Mode: Protected Read Blocking Mode: Invalid
First thing to note is the Lockres, which is the lockname. The dlm identifies resources using locknames. A lockname is a combination of a lock type (S superblock, M metadata, D filedata, R rename, W readwrite), inode number and generation.
To get the inode number and generation from the lockname, do:
# echo "stat <M000000000000000006672078b84822>" | debugfs.ocfs2 -n /dev/sdX
Inode: 419616 Mode: 0666 Generation: 2025343010 (0x78b84822) ...
To map the lockname to a directory entry, do:
# echo "locate <M000000000000000006672078b84822>" | debugfs.ocfs2 -n /dev/sdX
419616 /linux-2.6.15/arch/i386/kernel/semaphore.c
One could also provide the inode number instead of the lockname.
# echo "locate <419616>" | debugfs.ocfs2 -n /dev/sdX
419616 /linux-2.6.15/arch/i386/kernel/semaphore.c
To get a lockname from a directory entry, do:
# echo "encode /linux-2.6.15/arch/i386/kernel/semaphore.c" | debugfs.ocfs2 -n /dev/sdX
M000000000000000006672078b84822 D000000000000000006672078b84822 W000000000000000006672078b84822
The first is the Metadata lock, then the Data lock and last the ReadWrite lock for the same resource.
The DLM supports 3 lock modes: NL no lock, PR protected read and EX exclusive. If you have a dlm hang, the resource to look for would be one with the "Busy" flag set. The next step would be to query the dlm for the lock resource.
Note: The dlm debugging is still a work in progress. To do dlm debugging, first one needs to know the dlm domain, which matches the volume UUID.
# echo "stats" | debugfs.ocfs2 -n /dev/sdX | grep UUID: | while read a b ; do echo $b ; done
82DA8137A49A47E4B187F74E09FBBB4B
Then do:
# echo R dlm_domain lockname > /proc/fs/ocfs2_dlm/debug
For example:
# echo R 82DA8137A49A47E4B187F74E09FBBB4B M000000000000000006672078b84822 > /proc/fs/ocfs2_dlm/debug
# dmesg | tail
struct dlm_ctxt: 82DA8137A49A47E4B187F74E09FBBB4B, node=79, key=965960985
lockres: M000000000000000006672078b84822, owner=75, state=0 last used: 0, on purge list: no
granted queue:
type=3, conv=-1, node=79, cookie=11673330234144325711, ast=(empty=y,pend=n), bast=(empty=y,pend=n)
converting queue:
blocked queue:
It shows that the lock is mastered by node 75 and that node 79 has been granted a PR lock on the resource. This is just to give a flavor of dlm debugging.

LIMITS

63. Is there a limit to the number of subdirectories in a directory?
Yes. OCFS2 currently allows up to 32000 subdirectories. While this limit could be increased, we will not be doing it till we implement some kind of efficient name lookup (htree, etc.).

64. Is there a limit to the size of an ocfs2 file system?
Yes, current software addresses block numbers with 32 bits. So the file system device is limited to (2 ^ 32) * blocksize (see mkfs -b). With a 4KB block size this amounts to a 16TB file system. This block addressing limit will be relaxed in future software. At that point the limit becomes addressing clusters of 1MB each with 32 bits, which leads to a 4PB file system.

SYSTEM FILES

65. What are system files?
System files are used to store standard filesystem metadata like bitmaps, journals, etc. Storing this information in files in a directory allows OCFS2 to be extensible. These system files can be accessed using debugfs.ocfs2. To list the system files, do:
# echo "ls -l //" | debugfs.ocfs2 -n /dev/sdX
18 16 1 2 .
18 16 2 2 ..
19 24 10 1 bad_blocks
20 32 18 1 global_inode_alloc
21 20 8 1 slot_map
22 24 9 1 heartbeat
23 28 13 1 global_bitmap
24 28 15 2 orphan_dir:0000
25 32 17 1 extent_alloc:0000
26 28 16 1 inode_alloc:0000
27 24 12 1 journal:0000
28 28 16 1 local_alloc:0000
29 3796 17 1 truncate_log:0000
The first column lists the block number.

66. Why do some files have numbers at the end?
There are two types of files, global and local. Global files are for all the nodes, while local files, like journal:0000, are node specific. The set of local files used by a node is determined by the slot mapping of that node. The numbers at the end of the system file name are the slot#. To list the slot maps, do:
# echo "slotmap" | debugfs.ocfs2 -n /dev/sdX
Slot# Node#
0 39
1 40
2 41
3 42

HEARTBEAT

67. How does the disk heartbeat work?
Every node writes every two secs to its block in the heartbeat system file. The block offset is equal to its global node number. So node 0 writes to the first block, node 1 to the second, etc. All the nodes also read the heartbeat sysfile every two secs. As long as the timestamp is changing, that node is deemed alive.

68. When is a node deemed dead?
An active node is deemed dead if it does not update its timestamp for O2CB_HEARTBEAT_THRESHOLD (default=31) loops. Once a node is deemed dead, the surviving node which manages to cluster lock the dead node's journal recovers it by replaying the journal.

69. What about self fencing?
A node self-fences if it fails to update its timestamp for ((O2CB_HEARTBEAT_THRESHOLD - 1) * 2) secs. The [o2hb-xx] kernel thread, after every timestamp write, sets a timer to panic the system after that duration. If the next timestamp is written within that duration, as it should be, it first cancels that timer before setting up a new one. This way it ensures the system will self fence if for some reason the [o2hb-x] kernel thread is unable to update the timestamp and thus be deemed dead by other nodes in the cluster.

70. How can one change the parameter value of O2CB_HEARTBEAT_THRESHOLD?
This parameter value could be changed by adding it to /etc/sysconfig/o2cb and RESTARTING the O2CB cluster. This value should be the SAME on ALL the nodes in the cluster.

71. What should one set O2CB_HEARTBEAT_THRESHOLD to?
It should be set to the timeout value of the io layer. Most multipath solutions have a timeout ranging from 60 secs to 120 secs. For 60 secs, set it to 31. For 120 secs, set it to 61.
O2CB_HEARTBEAT_THRESHOLD = (((timeout in secs) / 2) + 1)

72. How does one check the current active O2CB_HEARTBEAT_THRESHOLD value?
# cat /proc/fs/ocfs2_nodemanager/hb_dead_threshold
7
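A hedged worked example of questions 70 and 71: for a multipath io timeout of 120 secs, (120 / 2) + 1 = 61, so /etc/sysconfig/o2cb on every node would carry the line below before the O2CB cluster is restarted. (On OCFS2 1.2.5 and later the same value is set via service o2cb configure instead, as covered in the CONFIGURING CLUSTER TIMEOUTS section.)
O2CB_HEARTBEAT_THRESHOLD=61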

73. What if a node umounts a volume?
During umount, the node will broadcast to all the nodes that have mounted that volume to drop that node from their node maps. As the journal is shutdown before this broadcast, any node crash after this point is ignored as there is no need for recovery.

74. Why do I encounter "Kernel panic - not syncing: ocfs2 is very sorry to be fencing this system by panicing" whenever I run a heavy io load?
We have encountered a bug with the default CFQ io scheduler which causes a process doing heavy io to temporarily starve out other processes. While this is not fatal for most environments, it is for OCFS2 as we expect the hb thread to be reading from and writing to the hb area at least once every 12 secs (default). This bug has been addressed by Red Hat in RHEL4 U4 (2.6.9-42.EL) and Novell in SLES9 SP3 (2.6.5-7.257). If you wish to use the DEADLINE io scheduler, you could do so by appending "elevator=deadline" to the kernel command line as follows:
* For SLES9, edit the command line in /boot/grub/menu.lst.
title Linux 2.6.5-7.244-bigsmp (with deadline)
kernel (hd0,4)/boot/vmlinuz-2.6.5-7.244-bigsmp root=/dev/sda5 vga=0x314 selinux=0 splash=silent resume=/dev/sda3 elevator=deadline showopts console=tty0 console=ttyS0,115200 noexec=off
initrd (hd0,4)/boot/initrd-2.6.5-7.244-bigsmp
* For RHEL4, edit the command line in /boot/grub/grub.conf:
title Red Hat Enterprise Linux AS (2.6.9-22.EL) (with deadline)
root (hd0,0)
kernel /vmlinuz-2.6.9-22.EL ro root=LABEL=/ console=ttyS0,115200 console=tty0 elevator=deadline noexec=off
initrd /initrd-2.6.9-22.EL.img
To see the current kernel command line, do:
# cat /proc/cmdline
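On kernels that expose the elevator via sysfs, the io scheduler for a given disk can also be checked, and switched at runtime, without editing the boot loader. A hedged sketch; the device name is illustrative and the change does not persist across reboots:
# cat /sys/block/sdb/queue/scheduler
noop anticipatory deadline [cfq]
# echo deadline > /sys/block/sdb/queue/scheduler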

QUORUM AND FENCING

75. What is a quorum?
A quorum is a designation given to a group of nodes in a cluster which are still allowed to operate on shared storage. It comes up when there is a failure in the cluster which breaks the nodes up into groups which can communicate in their groups and with the shared storage but not between groups.

76. How does OCFS2's cluster services define a quorum?
The quorum decision is made by a single node based on the number of other nodes that are considered alive by heartbeating and the number of other nodes that are reachable via the network. A node has quorum when:
* it sees an odd number of heartbeating nodes and has network connectivity to more than half of them, or,
* it sees an even number of heartbeating nodes and has network connectivity to at least half of them *and* has connectivity to the heartbeating node with the lowest node number.

77. What is fencing?
Fencing is the act of forcefully removing a node from a cluster. A node with OCFS2 mounted will fence itself when it realizes that it doesn't have quorum in a degraded cluster. It does this so that other nodes won't get stuck trying to access its resources. Currently OCFS2 will panic the machine when it realizes it has to fence itself off from the cluster. As described above, it will do this when it sees more nodes heartbeating than it has connectivity to and fails the quorum test.
Due to user reports of nodes hanging during fencing, OCFS2 1.2.5 no longer uses "panic" for fencing. Instead, by default, it uses "machine restart". This should not only prevent nodes from hanging during fencing but also allow nodes to quickly restart and rejoin the cluster. While this change is internal in nature, we are documenting this so as to make users aware that they are no longer going to see the familiar panic stack trace during fencing. Instead they will see the message "*** ocfs2 is very sorry to be fencing this system by restarting ***" and that too probably only as part of the messages captured on the netdump/netconsole server. If perchance the user wishes to use panic to fence (maybe to see the familiar oops stack trace or on the advise of customer support to diagnose frequent reboots), one can do so by issuing the following command after the O2CB cluster is online.
# echo 1 > /proc/fs/ocfs2_nodemanager/fence_method
Please note that this change is local to a node.

78. How does a node decide that it has connectivity with another?
When a node sees another come to life via heartbeating, it will try and establish a TCP connection to that newly live node. It considers that other node connected as long as the TCP connection persists and the connection is not idle for O2CB_IDLE_TIMEOUT_MS. Once that TCP connection is closed or idle, it will not be reestablished until heartbeat thinks the other node has died and come back alive.

79. How long does the quorum process take?
First a node will realize that it doesn't have connectivity with another node. This can happen immediately if the connection is closed, but can take a maximum of O2CB_IDLE_TIMEOUT_MS idle time. Then the node must wait long enough to give heartbeating a chance to declare the node dead. It does this by waiting two iterations longer than the number of iterations needed to consider a node dead (see the Heartbeat section of this FAQ). The current default of 31 iterations of 2 seconds results in waiting for 33 iterations or 66 seconds. By default, a maximum of 96 seconds can pass from the time a network fault occurs until a node fences itself.

80. How can one avoid a node from panic-ing when one shuts down the other node in a 2-node cluster?
This typically means that the network is shutting down before all the OCFS2 volumes are being umounted. Ensure the ocfs2 init script is enabled. This script ensures that the OCFS2 volumes are umounted before the network is shutdown. To check whether the service is enabled, do:
# chkconfig --list ocfs2
ocfs2 0:off 1:off 2:on 3:on 4:on 5:on 6:off

81. How does one list out the startup and shutdown ordering of the OCFS2 related services?
* To list the startup order for runlevel 3 on RHEL4, do:
# cd /etc/rc3.d
# ls S*ocfs2* S*o2cb* S*network*
S10network S24o2cb S25ocfs2
* To list the shutdown order on RHEL4, do:
# cd /etc/rc6.d
# ls K*ocfs2* K*o2cb* K*network*
K19ocfs2 K20o2cb K90network
* To list the startup order for runlevel 3 on SLES9/SLES10, do:
# cd /etc/init.d/rc3.d
# ls S*ocfs2* S*o2cb* S*network*
S05network S07o2cb S08ocfs2
* To list the shutdown order on SLES9/SLES10, do:
# cd /etc/init.d/rc3.d
# ls K*ocfs2* K*o2cb* K*network*
K14ocfs2 K15o2cb K17network
Please note that the default ordering in the ocfs2 scripts only includes the network service and not any shared-device specific service, like iscsi. If one is using iscsi or any shared device requiring a service to be started and shutdown, please ensure that that service runs before, and shuts down after, the ocfs2 init service.

NOVELL'S SLES9 and SLES10

82. Why are OCFS2 packages for SLES9 and SLES10 not made available on oss.oracle.com?
OCFS2 packages for SLES9 and SLES10 are available directly from Novell as part of the kernel. Same is true for the various Asianux distributions and for ubuntu. As OCFS2 is now part of the mainline kernel, we expect more distributions to bundle the product with the kernel.

83. What versions of OCFS2 are available with SLES9 and how do they match with the Red Hat versions available on oss.oracle.com?
As both Novell and Oracle ship OCFS2 on different schedules, the package versions do not match. We expect this to resolve itself over time as the number of patch fixes reduce. Novell is shipping two SLES9 releases, viz., SP2 and SP3.
* The latest kernel with the SP2 release is 2.6.5-7.202.7. It ships with OCFS2 1.2.1.
* The latest kernel with the SP3 release is 2.6.5-7.283. It ships with OCFS2 1.2.3.
Please contact Novell to get the latest OCFS2 modules on SLES9 SP3.

84. What versions of OCFS2 are available with SLES10?
SLES10 is currently shipping OCFS2 1.2.3. SLES10 SP1 is currently shipping 1.2.5-1.

RELEASE 1.2

85. What is new in OCFS2 1.2?
OCFS2 1.2 has two new features:
* It is endian-safe. With this release, one can mount the same volume concurrently on x86, x86-64, ia64 and the big endian architectures ppc64 and s390x.
* It supports readonly mounts. The fs uses this feature to auto remount ro when encountering on-disk corruptions (instead of panic-ing).

UPGRADE TO THE LATEST RELEASE

86. How do I upgrade to the latest release?
* Download the latest ocfs2-tools and ocfs2console for the target platform and the appropriate ocfs2 module package for the kernel version, flavor and architecture. (For more, refer to the "Download and Install" section above.)
* Umount all OCFS2 volumes.
# umount -at ocfs2
* Shutdown the cluster and unload the modules.
# /etc/init.d/o2cb offline
# /etc/init.d/o2cb unload
* If required, upgrade the tools and console.
# rpm -Uvh ocfs2-tools-1.2.2-1.i386.rpm ocfs2console-1.2.2-1.i386.rpm
* Upgrade the module.
# rpm -Uvh ocfs2-2.6.9-42.0.3.ELsmp-1.2.4-2.i686.rpm
* Ensure init services ocfs2 and o2cb are enabled.
# chkconfig --add o2cb
# chkconfig --add ocfs2
* To check whether the services are enabled, do:
# chkconfig --list o2cb
o2cb 0:off 1:off 2:on 3:on 4:on 5:on 6:off
# chkconfig --list ocfs2
ocfs2 0:off 1:off 2:on 3:on 4:on 5:on 6:off
* To update the cluster timeouts, do:
# /etc/init.d/o2cb configure
* At this stage one could either reboot the node or simply restart the cluster and mount the volume.

87. Do I need to re-make the volume when upgrading?
No. OCFS2 1.2 is fully on-disk compatible with 1.0.

88. Do I need to upgrade anything else?
Yes, the tools need to be upgraded to ocfs2-tools 1.2. ocfs2-tools 1.0 will not work with OCFS2 1.2 nor will 1.2 tools work with 1.0 modules.

89. Can I do a rolling upgrade from 1.2.3 to 1.2.4?
No. The network protocol had to be updated in 1.2.4 to allow for proper reference counting of lockres' across the cluster. This fix was necessary to fix races encountered during lockres purge and migrate. Effectively, one cannot run 1.2.4 on one node while another node is still on an earlier release (1.2.3 or older).

90. Can I do a rolling upgrade from 1.2.4 to 1.2.5?
No. The network protocol had to be updated in 1.2.5 to ensure all nodes were using the same O2CB timeouts. Effectively, one cannot run 1.2.5 on one node while another node is still on an earlier release. (For the record, the protocol remained the same between 1.2.0 and 1.2.3 before changing in 1.2.4 and 1.2.5.)

91. Can I do a rolling upgrade from 1.2.5 or 1.2.6 to 1.2.7 on EL4?
Yes. However, there is a catch. While the network protocol is fully compatible across the two releases, the default cluster timeouts are not. So if you were using the default timeouts, you will have to specifically set those timeouts on the new nodes using the service o2cb configure command. dmesg will indicate the differing timeout values. Use service o2cb status to review current timeouts. Users that are not careful with the above are likely to encounter failed mounts on the upgraded node.

92. Can I do a rolling upgrade from 1.2.6 to 1.2.7 on EL5?
Yes. The network protocol is fully compatible across both releases.

93. Can I do a rolling upgrade from 1.2.7 to 1.2.8 or 1.2.9 on EL4 and EL5?
Yes. 1.2.7, 1.2.8 and 1.2.9 are fully compatible. Users upgrading to 1.2.8/9 from 1.2.5/1.2.6 can expect the same behaviour as described above for upgrading to 1.2.7.

94. After upgrade I am getting the following error on mount "mount.ocfs2: Invalid argument while mounting /dev/sda6 on /ocfs".
Do "dmesg | tail". If you see the error:
ocfs2_parse_options:523 ERROR: Unrecognized mount option "heartbeat=local" or missing value
it means that you are trying to use the 1.2 tools with the 1.0 modules. Ensure that you have unloaded the 1.0 modules and installed and loaded the 1.2 modules. Use modinfo to determine the version of the module installed and/or loaded.
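A hedged quick-check related to question 94: the first command shows what module is installed on disk, the second (from question 2) shows what is actually loaded; after the upgrade both should report a 1.2 release.
# modinfo ocfs2
# cat /proc/fs/ocfs2/version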

95. The cluster fails to load. What do I do?
Check "dmesg | tail" for any relevant errors. One common error is as follows:
SELinux: initialized (dev configfs, type configfs), not configured for labeling audit(1139964740.184:2): avc: denied { mount } for ...
The above error indicates that you have SELinux activated. A bug in SELinux does not allow configfs to mount. Disable SELinux by setting "SELINUX=disabled" in /etc/selinux/config. The change is activated on reboot.

PROCESSES

96. List and describe all OCFS2 threads?
[o2net] - One per node. It is a workqueue thread started when the cluster is brought online and stopped when it is offlined. It handles the network communication for all threads. It gets the list of active nodes from the o2hb thread and sets up tcp/ip communication channels with each active node. It sends regular keepalive packets to detect any interruption on the channels.
[user_dlm] - One per node. It is a workqueue thread started when dlmfs is loaded and stopped on unload. (dlmfs is an in-memory file system which allows user space processes to access the dlm in kernel to lock and unlock resources.) It handles lock downconverts when requested by other nodes.
[ocfs2_wq] - One per node. It is a workqueue thread started when the ocfs2 module is loaded and stopped on unload. It handles blockable file system tasks like truncate log flush, orphan dir recovery and local alloc recovery, which involve taking dlm locks. Various code paths queue tasks to this thread. For example, ocfs2rec queues orphan dir recovery so that while the task is kicked off as part of recovery, its completion does not affect the recovery time.
[o2hb-14C29A7392] - One per heartbeat device. It is a kernel thread started when the heartbeat region is populated in configfs and stopped when it is removed. It writes every 2 secs to its block in the heartbeat region to indicate to other nodes that that node is alive. It also reads the region to maintain a nodemap of live nodes. It notifies o2net and dlm of any changes in the nodemap.
[ocfs2vote-0] - One per mount. It is a kernel thread started when a volume is mounted and stopped on umount. It downgrades locks when requested by other nodes in response to blocking ASTs (BASTs). It also fixes up the dentry cache in response to files unlinked or renamed on other nodes.
[dlm_thread] - One per dlm domain. It is a kernel thread started when a dlm domain is created and stopped when it is destroyed. This is the core dlm which maintains the list of lock resources and handles the cluster locking infrastructure.
[dlm_reco_thread] - One per dlm domain. It is a kernel thread which handles dlm recovery whenever a node dies. If the node is the dlm recovery master, it remasters all the locks owned by the dead node.
[dlm_wq] - One per dlm domain. It is a workqueue thread. o2net queues dlm tasks on this thread.
[kjournald] - One per mount. It is used as OCFS2 uses JBD for journalling.
[ocfs2cmt-0] - One per mount. It is a kernel thread started when a volume is mounted and stopped on umount. It works in conjunction with kjournald.
[ocfs2rec-0] - It is started whenever another node needs to be recovered. This could be either on mount when it discovers a dirty journal, or during operation when hb detects a dead node. ocfs2rec handles the file system recovery and it runs after the dlm has finished its recovery.

BUILD RPMS FOR HOTFIX KERNELS

97. How to build OCFS2 packages for a hotfix kernel?
* Download and install all the kernel-devel packages for the hotfix kernel.
* Ensure rpmbuild is installed and ~/.rpmmacros contains the proper links.
# cat ~/.rpmmacros
%_topdir /home/jdoe/rpms
%_tmppath /home/jdoe/rpms/tmp
%_sourcedir /home/jdoe/rpms/SOURCES
%_specdir /home/jdoe/rpms/SPECS
%_srcrpmdir /home/jdoe/rpms/SRPMS
%_rpmdir /home/jdoe/rpms/RPMS
%_builddir /home/jdoe/rpms/BUILD
* Download and untar the OCFS2 source tarball.
# cd /tmp
# wget http://oss.oracle.com/projects/ocfs2/dist/files/source/v1.2/ocfs2-1.2.3.tar.gz
# tar -zxvf ocfs2-1.2.3.tar.gz
# cd ocfs2-1.2.3
* Ensure you have all the kernel-*-devel packages installed for the kernel version you wish to build for. If so, the following command will list it as a possible target.
# ./vendor/rhel4/kernel.guess targets
rhel4_2.6.9-55.0.12.EL_rpm
rhel4_2.6.9-67.0.4.EL_rpm
* Configure and make.
# ./configure --with-kernel=/usr/src/kernels/2.6.9-67.0.4.EL-i686
# make rhel4_2.6.9-67.0.4.EL_rpm
The packages will be in %_rpmdir.

98. Are the self-built packages officially supported by Oracle Support?
No, Oracle Support does not provide support for self-built modules. If you wish official support, contact Oracle via Support or the ocfs2-users mailing list with the link to the hotfix kernel (kernel-devel and kernel-src rpms).

BACKUP SUPER BLOCK

99. What is a Backup Super block?
A backup super block is a copy of the super block. As the super block is typically located close to the start of the device, it is susceptible to being overwritten, say, by an errant write (dd if=file of=/dev/sdX). As the super block stores critical information that is hard to recreate, it becomes important to backup the block and use it when the super block gets corrupted.

100. How does one enable this feature?
mkfs.ocfs2 1.2.3 or later automatically backs up super blocks on devices larger than 1G. One can disable this by using the --no-backup-super option. It should be noted that the super block is not backed up on devices smaller than 1G.

101. How do I detect whether the super blocks are backed up on a device?
# debugfs.ocfs2 -R "stats" /dev/sdX | grep "Feature Compat"
Feature Compat: 1 BackupSuper

102. Where are the backup super blocks located?
In OCFS2, the super blocks are backed up to blocks at the 1G, 4G, 16G, 64G, 256G and 1T byte offsets. The actual number of backups depends on the size of the device.

103. How do I backup the super block on a device formatted by an older mkfs.ocfs2?
tunefs.ocfs2 1.2.3 or later can attempt to retroactively backup the super block.
# tunefs.ocfs2 --backup-super /dev/sdX
tunefs.ocfs2 1.2.3
Adding backup superblock for the volume
Proceed (y/N): y
Backed up Superblock.
Wrote Superblock
However, it is quite possible that one or more backup locations are in use by the file system. tunefs.ocfs2 backs up the block only if all the backup locations are unused.
# tunefs.ocfs2 --backup-super /dev/sdX
tunefs.ocfs2 1.2.3
tunefs.ocfs2: block 262144 is in use.
tunefs.ocfs2: block 4194304 is in use.
tunefs.ocfs2: Cannot enable backup superblock as backup blocks are in use
If so, use the verify_backup_super script to list out the objects using these blocks.
# ./verify_backup_super /dev/sdX
Locating inodes using blocks 262144 1048576 4194304 on device /dev/sdX
Block#    Inode     Block Offset
262144    27        65058
1048576   Unused
4194304   4161791   25
Matching inodes to object names
27        //journal:0003
4161791   /src/kernel/linux-2.6.19/drivers/scsi/BusLogic.c
This will not work if one or more blocks are being used by a system file (shown starting with double slashes //), say, a journal. If the object happens to be user created, move that object temporarily to another volume before re-attempting the operation.

104. How do I ask fsck.ocfs2 to use a backup super block?
To recover a volume using the second backup super block, do:
# fsck.ocfs2 -f -r 2 /dev/sdX
[RECOVER_BACKUP_SUPERBLOCK] Recover superblock information from backup block#1048576? n
Checking OCFS2 filesystem in /dev/sdX
label: myvolume
uuid: 4d 1d 1f f3 24 01 4d 3f 82 4c e2 67 0c b2 94 f3
number of blocks: 13107196
bytes per block: 4096
number of clusters: 13107196
bytes per cluster: 4096
max slots: 4
/dev/sdX was run with -f, check forced.
Pass 0a: Checking cluster allocation chains
Pass 0b: Checking inode allocation chains
Pass 0c: Checking extent block allocation chains
Pass 1: Checking inodes and blocks.
Pass 2: Checking directory entries.
Pass 3: Checking directory connectivity.
Pass 4a: checking for orphaned inodes
Pass 4b: Checking inodes link counts.
All passes succeeded.
For more, refer to the man pages.
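A short worked check of where those block numbers come from, assuming the 4 KB block size shown in the fsck output above: dividing the byte offsets from question 102 by the block size gives the block numbers reported by tunefs.ocfs2 and verify_backup_super, and -r 2 selects the backup at the 4G offset.
1G / 4096 = 262144
4G / 4096 = 1048576
16G / 4096 = 4194304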

CONFIGURING CLUSTER TIMEOUTS

105. List and describe all the configurable timeouts in the O2CB cluster stack?
OCFS2 1.2.5 has 4 different configurable O2CB cluster timeouts:
* O2CB_HEARTBEAT_THRESHOLD - The Disk Heartbeat timeout is the number of two second iterations before a node is considered dead. The exact formula used to convert the timeout in seconds to the number of iterations is as follows:
O2CB_HEARTBEAT_THRESHOLD = (((timeout in seconds) / 2) + 1)
For e.g., to specify a 60 sec timeout, set it to 31. For 120 secs, set it to 61. The current default for this timeout is 60 secs (O2CB_HEARTBEAT_THRESHOLD = 31). In releases 1.2.5 and earlier, it was 12 secs (O2CB_HEARTBEAT_THRESHOLD = 7).
* O2CB_IDLE_TIMEOUT_MS - The Network Idle timeout specifies the time in milliseconds before a network connection is considered dead. The current default for this timeout is 30000 ms. In releases 1.2.5 and earlier, it was 10000 ms.
* O2CB_KEEPALIVE_DELAY_MS - The Network Keepalive specifies the maximum delay in milliseconds before a keepalive packet is sent. As in, a keepalive packet is sent if a network connection between two nodes is silent for this duration. If the other node is alive and is connected, it is expected to respond. The current default for this timeout is 2000 ms. In releases 1.2.5 and earlier, it was 5000 ms.
* O2CB_RECONNECT_DELAY_MS - The Network Reconnect specifies the minimum delay in milliseconds between connection attempts. The default has always been 2000 ms.

106. What are the recommended timeout values?
As timeout values depend on the hardware being used, there is no one set of recommended values. For e.g., users of multipath io should set the disk heartbeat threshold to at least 60 secs, if not 120 secs. Similarly, users of network bonding should set the network idle timeout to at least 30 secs, if not 60 secs.

107. What are the current defaults for the cluster timeouts?
The timeouts were updated in the 1.2.6 release to the following:
O2CB_HEARTBEAT_THRESHOLD = 31
O2CB_IDLE_TIMEOUT_MS = 30000
O2CB_KEEPALIVE_DELAY_MS = 2000
O2CB_RECONNECT_DELAY_MS = 2000

108. Can one change these timeout values in a round robin fashion?
No. The o2net handshake protocol ensures that all the timeout values for both the nodes are consistent and fails if any value differs. This failed connection results in a failed mount, the reason for which is always listed in dmesg.

109. How does one set these O2CB timeouts?
Umount all OCFS2 volumes and shutdown the O2CB cluster. If not already, upgrade to OCFS2 1.2.5+ and Tools 1.2.4+. Then use o2cb configure to set the new values. Do the same on all nodes. Start mounting volumes only after the timeouts have been set on all nodes.
# service o2cb configure
Configuring the O2CB driver.
This will configure the on-boot properties of the O2CB driver. The following questions will determine whether the driver is loaded on boot. The current values will be shown in brackets ('[]'). Hitting <ENTER> without typing an answer will keep that current value. Ctrl-C will abort.
Load O2CB driver on boot (y/n) [n]: y
Cluster to start on boot (Enter "none" to clear) []: mycluster
Specify heartbeat dead threshold (>=7) [7]: 31
Specify network idle timeout in ms (>=5000) [10000]: 30000
Specify network keepalive delay in ms (>=1000) [5000]: 2000
Specify network reconnect delay in ms (>=2000) [2000]: 2000
Writing O2CB configuration: OK
Starting O2CB cluster mycluster: OK

110. How to find the O2CB timeout values in effect?
# /etc/init.d/o2cb status
Module "configfs": Loaded
Filesystem "configfs": Mounted
Module "ocfs2_nodemanager": Loaded
Module "ocfs2_dlm": Loaded
Module "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster mycluster: Online
Heartbeat dead threshold: 31
Network idle timeout: 30000
Network keepalive delay: 2000
Network reconnect delay: 2000
Checking O2CB heartbeat: Not active

111. Where are the O2CB timeout values stored?
# cat /etc/sysconfig/o2cb
#
# This is a configuration file for automatic startup of the O2CB
# driver. It is generated by running /etc/init.d/o2cb configure.
# Please use that method to modify this file.
#
# O2CB_ENABLED: 'true' means to load the driver on boot.
O2CB_ENABLED=true
# O2CB_BOOTCLUSTER: If not empty, the name of a cluster to start.
O2CB_BOOTCLUSTER=mycluster
# O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead.
O2CB_HEARTBEAT_THRESHOLD=31
# O2CB_IDLE_TIMEOUT_MS: Time in ms before a network connection is considered dead.
O2CB_IDLE_TIMEOUT_MS=30000
# O2CB_KEEPALIVE_DELAY_MS: Max time in ms before a keepalive packet is sent
O2CB_KEEPALIVE_DELAY_MS=2000
# O2CB_RECONNECT_DELAY_MS: Min time in ms between connection attempts
O2CB_RECONNECT_DELAY_MS=2000

ENTERPRISE LINUX 5

112. What are the changes in EL5 as compared to EL4 as it pertains to OCFS2?
The in-memory filesystems, configfs and debugfs, have different mountpoints. configfs is mounted at /sys/kernel/config, instead of /config, while debugfs is mounted at /sys/kernel/debug, instead of /debug. (dlmfs still mounts at the old mountpoint /dlm.)

References
http://kerneltrap.org/node/4394
http://oss.oracle.com/bugzilla/show_bug.cgi?id=741
http://lwn.net/Articles/166954/
http://oss.oracle.com/projects/ocfs2-tools/dist/files/extras/verify_backup_super

Keywords: FENCE, OCFS2, FREQUENTLY ASKED QUESTIONS, HEARTBEAT
