
Troubleshooting for when robotic drives go into AVR mode and backups halt with a pending mount request

Problem

Robotic drives are going into AVR mode, and backups are halting with a pending mount request.
Error

TLD(0) unavailable: initialization failed: Control daemon connect or protocol error


Solution

This problem is most often the result of a communication failure. There are two NetBackup daemons for robotic control: one runs on the machine with robotic control, and the other runs on each machine that has drives in the robot. For example, for a TLD robot the two daemons are tldcd (runs on the server with robotic control) and tldd (runs on each server with drives in the robot). When communication between them is lost, the drives change from TLD control to AVR control, and jobs go into a pending mount state rather than failing. This behavior is deliberate: if network communication between the two servers fails only briefly, the jobs do not need to fail and can wait until the connection comes back up. At times, however, the underlying cause is more severe.

The media server's system log will contain errors such as these:


Dec 4 08:54:36 host01 tldd[260]: TLD(0) unavailable: initialization failed:
Control daemon connect or protocol error
Dec 4 08:56:41 host01 tldd[858]: TLD(0) [858] unable to connect to tldcd on
host02: Error 0 (0)

The above errors are what cause the drives in a robot to go into AVR control mode: the two daemons are unable to communicate.
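A quick way to confirm whether the two daemons can reach each other is to test the tldcd listening port (13711/tcp, per the /etc/services list later in this article) from the media server. The helper below is a sketch that relies on bash's /dev/tcp redirection (an assumption: it requires bash, not a plain POSIX shell); the hostnames match the log example above.

```shell
# Sketch: test whether the media server (host01, running tldd) can reach
# the tldcd listening port on the robotic control host (host02).
port_open() {
  # bash opens a TCP connection when redirecting to /dev/tcp/HOST/PORT
  if (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null; then
    echo open
  else
    echo closed
  fi
}

# Run on host01 against the robotic control host:
port_open host02 13711
```

If this reports closed while the network is otherwise up, suspect routing, name resolution, or a hung tldcd rather than a total network outage.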

It is not possible to give a single cause or a single solution.

Some common causes are:

1. Network connectivity has failed outright. In this case, the network must be restored.

2. One or both machines have multiple network interfaces that cannot route to or resolve each other. In this case, change the routing so that a request can reach its destination. Adding the proper host names to the /etc/hosts file has resolved this in some situations.
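A hypothetical example of pinning name resolution in /etc/hosts on two multi-homed servers follows. The addresses below are placeholders, not from this article; use the address of the interface the peer actually connects through.

```shell
# On host01 (media server, runs tldd), /etc/hosts would gain a line like:
#   192.168.10.22   host02
# On host02 (robotic control host, runs tldcd):
#   192.168.10.21   host01
#
# After editing, confirm each side resolves the other to the intended
# address (getent consults /etc/hosts according to nsswitch.conf):
getent hosts host02
```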

3. The tldcd daemon has entered an uninterruptible state or is hung, making it unable to reply to tldd. In this case, shut down the media management daemons by running
/usr/openv/volmgr/bin/stopltid.
Next, run /usr/openv/volmgr/bin/vmps to get the process ID (PID) of the tldcd daemon and run
a kill command on it. If that does not work, use kill -9. If the process still cannot be killed, the server
will have to be rebooted. To restart the daemons, run /usr/openv/volmgr/bin/ltid.

Note: The daemon does not time out because it is hung on a system call; this is outside an application's ability to control.
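The restart sequence in step 3 can be sketched as below. The paths are the standard volmgr locations quoted above; the awk column used to extract the PID is an assumption (it presumes vmps prints ps-style lines with the PID in the second field), so confirm it against your own vmps output before scripting this.

```shell
# Stop the media management daemons
/usr/openv/volmgr/bin/stopltid

# Find the tldcd PID (assumption: vmps output is ps-style, PID in column 2)
pid=$(/usr/openv/volmgr/bin/vmps | awk '/tldcd/ {print $2; exit}')

kill "$pid"                                    # try a normal TERM first
sleep 5
kill -0 "$pid" 2>/dev/null && kill -9 "$pid"   # escalate only if still alive

# Restart the daemons
/usr/openv/volmgr/bin/ltid
```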

4. The /etc/services file is missing the correct entries on one or both servers. The following entries should be present in /etc/services:
# Media Manager services #
vmd     13701/tcp       vmd
acsd    13702/tcp       acsd
tl8cd   13705/tcp       tl8cd
odld    13706/tcp       odld
tldcd   13711/tcp       tldcd
tl4d    13713/tcp       tl4d
tshd    13715/tcp       tshd
tlmd    13716/tcp       tlmd
tlhcd   13717/tcp       tlhcd
rsmd    13719/tcp       rsmd
# End Media Manager services #

Note: This problem can occur not only between two different servers, but also on a single server that runs both daemons.
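The presence of the entries listed above can be checked with a small script. This is a sketch: check_services takes the path of a services-format file as an argument so it can be tested against a copy; run it against /etc/services on each server and fix any line it reports missing.

```shell
# Report any Media Manager entry from the list above that is absent
# from the given services file.
check_services() {
  file=$1
  for entry in vmd:13701 acsd:13702 tl8cd:13705 odld:13706 \
               tldcd:13711 tl4d:13713 tshd:13715 tlmd:13716 \
               tlhcd:13717 rsmd:13719; do
    name=${entry%%:*}
    port=${entry#*:}
    grep -q "^${name}[[:space:]]\{1,\}${port}/tcp" "$file" ||
      echo "MISSING: ${name} ${port}/tcp"
  done
}

check_services /etc/services
```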
