System Administration Toolkit: Migrating and moving UNIX directory trees


Martin Brown July 25, 2006

Occasionally, you need to copy an entire UNIX® directory tree, either between locations on
the same system or between different systems. There are many ways to do this, but not all of
them preserve the right amount of information or work across different systems. This article
discusses the options available on UNIX and how best to make them work.

About this series


The typical UNIX® administrator has a key range of utilities, tricks, and systems he or she uses
regularly to aid in the process of administration. There are key utilities, command-line chains, and
scripts that are used to simplify different processes. Some of these tools come with the operating
system, but a majority of the tricks come through years of experience and a desire to ease the
system administrator's life. The focus of this series is on getting the most from the available tools
across a range of different UNIX environments, including methods of simplifying administration in a
heterogeneous environment.

Using cp
The standard cp command can copy entire directory trees if you use the -r command-line option to
recurse into subdirectories. However, what -r does with non-standard files (anything other than a
regular file or directory) is unspecified. Some UNIX variants and the GNU cp tool support the -R
option, which correctly copies named pipes, links, and other files.

At the simplest level, the cp command can copy one directory to a new directory with a different
name (see Listing 1).

Listing 1. cp command -- copying one directory to a new directory with a different name
$ cp -r srcdir destdir

You should, however, be careful when you specify source files and target locations with the cp
command, as the way they are handled can have a significant effect on the results. For example,
let's assume that you want to copy the directory /home/mc to the directory /export/home/mc. If
/export/home/mc does not exist, then Listing 2 copies the directory /home/mc to /export/home/mc.

Listing 2. Specifying source files and target locations with the cp command
$ cp -r /home/mc /export/home/mc

If, however, /export/home/mc already exists, then Listing 2 copies the directory /home/mc into the
directory, creating the new directory /export/home/mc/mc.
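
The difference is easy to reproduce in a scratch directory. The following is a minimal sketch, assuming a POSIX shell with mktemp available; the paths under $work are hypothetical stand-ins for /home/mc and /export/home/mc, and -R is used in place of -r on the assumption that your cp supports it.

```shell
# Scratch area; removed again when the script exits.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir -p "$work/home/mc" "$work/export/home"
echo "hello" > "$work/home/mc/notes.txt"

# First run: the target does not exist, so it becomes a copy of the source.
cp -R "$work/home/mc" "$work/export/home/mc"

# Second run: the target now exists, so the source lands inside it.
cp -R "$work/home/mc" "$work/export/home/mc"

# Show the resulting layout, including the nested mc/mc directory.
find "$work/export/home/mc" -print
```

Running the same command twice is therefore not harmless: the second run silently creates a nested copy.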

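To copy the contents of one directory into an existing directory, select the files in the source
directory, as shown in Listing 3. Note that the shell's * pattern does not match names that begin
with a dot, so hidden files at the top level of the source directory are not copied.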

Listing 3. Copying the contents of one directory into an existing directory


$ cp -r /home/mc/* /export/home/mc

Another very useful cp option is -p, which retains the permissions and modification times of each
file. Ownership is retained as well, provided that you have sufficient privileges (typically, that
you are running as root) to assign files to other users.
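
If you also need the hidden files at the top of the source directory, one common approach is to name the source as "directory/." instead of using a wildcard. The following is a sketch under that assumption, using -R and -p together; the directory and file names are hypothetical, and the behavior shown is that of GNU and BSD cp rather than something the article guarantees for every variant.

```shell
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir -p "$work/src/sub" "$work/dest"
echo "alias ll='ls -l'" > "$work/src/.bash_aliases"
echo "data" > "$work/src/sub/file.txt"
chmod 600 "$work/src/.bash_aliases"

# "src/." names the directory's own contents, so dot files are copied too;
# -p asks cp to keep permissions and timestamps (ownership needs privilege).
cp -Rp "$work/src/." "$work/dest"

ls -la "$work/dest"
```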

Using tar
The tar command was originally developed for archiving files to tape (literally, tape archive). For
example, you might copy the files in the current directory to a tape using the command in Listing 4.

Listing 4. Copying the files in the current directory to tape using tar
$ tar cf /dev/rmt0 .

Listing 4 can be dissected as follows:

• The c option creates a new archive.
• The f option uses the next argument on the command line as the name of the archive to write.
In this case, that is the first raw tape device (/dev/rmt0). You could equally name a regular file
here to create a tar file containing the same information.
• The . tells tar to add the current directory, and every file and directory below it, to the
archive.

However, rather than copying files and a directory structure to a tape, you can also use tar to copy
them into a file. Even more usefully, you can write the archive to standard output and, using a pipe,
have a second tar read it from standard input and extract it, which copies the files from one
location to another. The tar command is also generally more reliable at copying and recreating
non-standard file types on systems where the cp command does not support the -R command-line
option.

For example, Listing 5 shows how to copy the files from the current directory to an existing
directory.

Listing 5. Copying the files from the current directory to an existing directory
$ tar cf - . | (cd DIR && tar xf -)

Listing 5 can be dissected as follows:

• tar cf - . creates a new archive, to standard output, of the files in the current directory.
• cd DIR changes the directory. Note that this directory should exist before you start copying
files into it.
• tar xf - extracts the files from the standard input.
• The parentheses run the two commands inside them in a subshell, so they are effectively treated
as one command rather than two: the change of directory happens before the archive is extracted
and does not affect your current shell.
• The pipe between the two (|) feeds the standard output from the first tar into the standard
input of the second, effectively copying the files into, and then out of, an archive file that is
never written to disk.
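
The pieces dissected above can be wrapped into a small shell function. This is only a sketch: the name copy_tree and the test directories are hypothetical, and the function adds two safety checks that the one-line version does not have.

```shell
# copy_tree SRC DEST: copy everything under SRC into the existing directory DEST.
copy_tree() {
    src=$1
    dest=$2
    if [ ! -d "$dest" ]; then
        echo "copy_tree: $dest does not exist" >&2
        return 1
    fi
    # Each side runs in its own subshell, so neither cd affects the caller.
    # "&&" stops tar from running in the wrong directory if a cd fails.
    (cd "$src" && tar cf - .) | (cd "$dest" && tar xf -)
}

# Try it on a scratch tree that includes a hidden file.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/src/sub" "$work/dest"
echo "one" > "$work/src/.hidden"
echo "two" > "$work/src/sub/file.txt"

copy_tree "$work/src" "$work/dest"
```

Because the source is named as "." the hidden file is copied along with everything else.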

The tar command retains the full path of the files included in the archive if you specify the path
explicitly. Listing 6 copies files into the archive with their absolute path, which means that, with
a tar that keeps the leading slash, they cannot be extracted anywhere but back to their original
location.

Listing 6. Specifying the path explicitly


$ tar cf myhome.tar /home/mc

Some tar variants include support to strip off the leading forward slash, enabling you to extract the
files anywhere. To ensure you can always put the files where you want, you should add files from
the current directory, as shown in Listing 7.

Listing 7. Adding files from the current directory


$ cd /home/mc
$ tar cf myhome.tar .
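
Many tar implementations, GNU tar among them, can also change directory for you with the -C option, which avoids the separate cd and lets you keep the archive outside the tree being archived. The sketch below assumes a tar with -C support; the paths are hypothetical, and the t option is used to list the stored names without extracting anything.

```shell
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/home/mc"
echo "export PATH" > "$work/home/mc/.bash_profile"

# -C makes tar change directory before it reads ".", so member names are
# stored relative to that directory and the archive itself lives elsewhere.
tar cf "$work/myhome.tar" -C "$work/home/mc" .

# t lists the archive without extracting it; no name should start with "/".
tar tf "$work/myhome.tar"
```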

The tar command has an advantage over cp, in that you can monitor the transfer of files as they
are copied between the source and destination by adding the v command-line option to switch on
verbose mode. Generally, it is best to use this on the portion of the command that is extracting
files instead of creating them, as it ensures that the files have been copied properly, rather than
confirming that they have been read properly (see Listing 8).

Listing 8. Adding the v command-line option


$ tar cf - . | (cd /tmp/mc && tar xvf -)
./
./.bash_aliases
./.bash_history
./.bash_path
./.bash_profile
./.bash_vars
./.bashrc
./xmlsimple.pl
./rest.xml
...

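Note that if the tar supplied with your system has problems with long pathnames, then it probably
does not support the newer tar format. The original tar format stores each name in a 100-character
field; the newer POSIX (ustar) format allows longer paths. GNU tar supports the newer format and has
no problems with long, or very deep, pathnames.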

By default, most tar variants correctly copy and recreate files and directories with the same
ownership and permission information. However, some variants adjust this information when the
files are extracted, depending on whether you are running as the root user: typically, only root
can restore the original ownership, and an ordinary user's umask is applied to the permissions. You
can ask tar to preserve permissions by using the p option (see Listing 9); preserving ownership
additionally requires that the extracting tar run as root.

Listing 9. Using the p option


$ tar cpf - . | (cd /tmp/mc && tar xvpf -)
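
To see what the p option protects against, you can extract under a deliberately restrictive umask. This is a sketch with hypothetical file names; it assumes a tar, such as GNU tar, in which p on extraction restores the recorded mode regardless of the umask.

```shell
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/src" "$work/dest"
echo "shared" > "$work/src/shared.txt"
chmod 666 "$work/src/shared.txt"

# The umask is set only inside the extracting subshell. On its own, 077
# would strip the group and other bits from newly created files; with p,
# tar restores the mode recorded in the archive instead.
(cd "$work/src" && tar cpf - .) | (umask 077; cd "$work/dest" && tar xpf -)

ls -l "$work/dest/shared.txt"
```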

Finally, you can also create a new directory for the files to be copied into, by extending the second
half of the command (see Listing 10).

Listing 10. Creating a new directory for the files to be copied into
$ tar cpf - . | (mkdir /tmp/mc && cd /tmp/mc && tar xvpf -)

On its own, tar is a very useful tool for copying files and directory structures. However, it really
comes into its own when you use it to copy files over a network. Before you look at that trick, you'll
use the same basic method with another archiving tool, cpio.

Using cpio
The cpio tool is similar to the tar tool, but rather than accepting a file or directory specification, you
must supply it with a list of files. This can be more practical if you only want to copy specific files.
For example, to create a cpio archive containing specific directories, you might use Listing 11.

Listing 11. Creating a cpio archive containing specific directories
$ ls -d ./dira/* ./dirc/* | cpio -ov > diranc.cpio

The ls portion of this command outputs a list of the files (in this case, the contents of the two
directories) to be copied. The -d option and the wildcards make ls print each entry with its
directory path, one per line, which is the form cpio needs to locate the files. The latter half is
the cpio command that copies them into the archive. It uses two options:

• The o option copies files out to an archive.
• The v option displays a list of files as they are copied, which is useful for verification.

The actual archive is created by redirecting the output of cpio into a new file.

The above command is limited, in that it will only copy in files that are explicitly listed. The best
way to copy in an entire directory is to use the find command (see Listing 12).

Listing 12. Using the find command to copy in an entire directory


$ find . |cpio -ov >archive.cpio

To extract files from a cpio archive, use the i command-line option. You should also use the d
option to ensure that any directories in the archive that do not exist in the destination structure
are recreated. By using the two together, you can copy from one directory to another, as shown in
Listing 13.

Listing 13. Using the i and d options together


$ find . | cpio -ov | (cd /tmp/mc && cpio -idv)
.
./.bash_aliases
./.bash_history
./.bash_path
./.bash_profile
./.bash_vars
./.bashrc
./xmlsimple.pl
./rest.xml
46 blocks
.
.bash_aliases
.bash_history
.bash_path
.bash_profile
.bash_vars
.bashrc
xmlsimple.pl
rest.xml
46 blocks

Because you use verbose mode in both portions of the command, you see each file as it is archived
and again as it is extracted, and the block counts let you confirm whether the size of the archive
created and extracted is identical. In this case, both operations used 46 blocks.

Note that cpio will not overwrite files on the destination if they have the same, or newer,
modification time. Use the u option if you want to replace them unconditionally.
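
For a purely local copy, cpio also has a pass-through mode (the p option) that reads the file list and writes the files straight into a target directory, so no second cpio is needed. The following is a sketch assuming a cpio that supports -p, -d, and -m, as GNU cpio does; the directory names are hypothetical.

```shell
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/src/sub" "$work/dest"
echo "one" > "$work/src/.bashrc"
echo "two" > "$work/src/sub/rest.xml"

# -p: pass-through (copy files straight to the target directory)
# -d: create directories as needed
# -m: keep the original modification times
# find's -depth lists a directory's contents before the directory itself.
(cd "$work/src" && find . -depth -print | cpio -pdm "$work/dest")
```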

Copying over a network


An obvious way of transferring files over a network within UNIX is to use Network File System
(NFS) to mount the remote directory and then copy between the local and the mounted directories.
That is a straightforward solution, but it is not always possible or practical.

One of the simplest ways to copy files over a network is to use tar or cpio to create an archive
file, which you can then transfer over a network. The method has some advantages, such as the
flexibility of how and when you copy the files, but also has disadvantages, including the complexity
of the copy process and the disk space requirement to store a complete duplicate of the files, both
when you create the archive on the source and when you copy the archive to the destination.

As you've seen, it's straightforward to create an archive:

Listing 14. Creating an archive


$ tar cf mydir.tar .

You can then copy the file over using whatever method is appropriate, for example, copy the file
over with cp and NFS, or transfer to a remote system with FTP or SFTP.
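
Whichever transfer method you choose, it is worth confirming that the archive arrived intact before you extract it. The sketch below simulates the whole round trip locally: a plain cp stands in for the real transfer, the paths are hypothetical, and the standard cksum utility is used for the comparison.

```shell
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/mydir" "$work/remote/restore"
echo "payload" > "$work/mydir/data.txt"

# Create the archive outside the tree being archived.
(cd "$work/mydir" && tar cf "$work/mydir.tar" .)

# Stand-in for the real transfer (cp over NFS, FTP, SFTP, and so on).
cp "$work/mydir.tar" "$work/remote/mydir.tar"

# cksum prints a CRC and a byte count; both copies should report the same pair.
sum_src=$(cksum < "$work/mydir.tar")
sum_dst=$(cksum < "$work/remote/mydir.tar")

if [ "$sum_src" = "$sum_dst" ]; then
    (cd "$work/remote/restore" && tar xf ../mydir.tar)
else
    echo "checksum mismatch, archive not extracted" >&2
fi
```

In a real migration you would run cksum once on each system and compare the two results by eye or in a script.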

The archive file method, however, is not a particularly efficient method. You can improve the
efficiency by using compression.

Using compression
If you are creating an archive with cpio or tar and are copying the file to a destination over a slow
link (for example, a WAN or the Internet, rather than a LAN environment), then you can save time
by compressing the archive file before transfer. Which compression format is right depends on the
level of compression you want and on the tools available at both ends.

Compressing an archive is straightforward. You can do it after the archive has been created, as
shown in Listing 15.

Listing 15. Compressing after archive creation


$ tar cf mydir.tar .
$ bzip2 mydir.tar

You can also do it by using a pipe to generate a compressed version of the archive (see Listing
16).

Listing 16. Using a pipe to generate a compressed version of the archive


$ tar cf - .| bzip2 >mydir.tar.bz2

The method in Listing 16 has the benefit that it works with all versions of tar, cpio, or any other
archive tool. It also works across a range of different platforms, where different variants of tar
might or might not support inline compression. If you have a version of GNU tar installed, you can
compress using Gzip by using the z command-line option to tar (see Listing 17).

Listing 17. Using the z command-line option to tar


$ tar zcf mydir.tar.gz .
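
Extraction works the same way in reverse: the decompressor writes the archive to standard output and tar reads it from standard input. The sketch below uses gzip on the assumption that it is the most widely installed compressor; bzip2 accepts the same -c and -d options. The paths are hypothetical.

```shell
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/mydir" "$work/restore"
echo "payload" > "$work/mydir/data.txt"

# Create: tar writes to standard output, the compressor writes the file.
(cd "$work/mydir" && tar cf - . | gzip -c > "$work/mydir.tar.gz")

# Extract: the decompressor writes to standard output, tar reads standard input.
(cd "$work/restore" && gzip -dc "$work/mydir.tar.gz" | tar xf -)
```

Because neither tar is asked to compress anything itself, this form does not depend on the z option being available.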

Another alternative for copying directories between systems is to use the pipe solution shown in
Listing 16, but then use a remote shell tool as the destination.

Copying directly over a network


You can copy directly over a network by piping the output from a typical tar or cpio command
through a remote shell, such as remote shell (rsh) or secure shell (ssh). Which of the two
you use depends on what is available in your environment. The former, rsh, is a
standard remote shell system that offers basic authentication, but no encryption, while the
latter, ssh, offers both authentication and encryption of the data.

Both methods use the same basic command-line structure (see Listing 18).

Listing 18. Copying directly over a network


$ tar cf - ./*|rsh remotehost tar xf - -C /remotedir

This command is similar to the local tar pipeline, except that the destination tar command is
executed on the remote system. It works because the pipe carries the archive from the local tar,
through the remote shell connection, to the standard input of the remote tar.

Remember that depending on your remote shell configuration, you might need to enter a password
during the process to authenticate on the remote machine. The same process can also be used
with ssh. Listing 19 specifies a user/host combination.

Listing 19. Specifying a user/host combination for authentication on the remote machine
$ tar cf - ./*|ssh user@remotehost tar xf - -C /remotedir

For better performance over slow links, you should use compression, as shown in Listing 20.

Listing 20. Using compression when copying directly over a network


$ tar czf - ./*|ssh user@remotehost tar xzf - -C /remotedir
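
The listings above pass ./* to tar, which skips hidden files at the top level, and rely on the remote tar understanding -C. A variant that avoids both is to archive "." and have the remote side change directory itself. The function below is a sketch only: the name push_tree is hypothetical, the target directory must already exist, and the quoting assumes a target path without single quotes. To keep the example runnable without a second machine, "sh -c" stands in for the remote shell.

```shell
# push_tree SRC DEST COMMAND...: copy everything under SRC into DEST, where
# COMMAND is whatever runs a command string on the target, for example
# "ssh user@remotehost". DEST must already exist on the target.
push_tree() {
    src=$1
    dest=$2
    shift 2
    (cd "$src" && tar cf - .) | "$@" "cd '$dest' && tar xpf -"
}

# Local dry run: "sh -c" stands in for the remote shell.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/src" "$work/dest"
echo "one" > "$work/src/.bashrc"
push_tree "$work/src" "$work/dest" sh -c

# Against a real host (not run here):
#   push_tree /home/mc /remotedir ssh user@remotehost
```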

Both rsh and ssh also have simpler command-line cousins that can make the process of copying
to a remote system even more straightforward. For example, with rcp, the cousin to rsh, you
would use Listing 21.

Listing 21. Copying to a remote system with rcp


$ rcp -r filename remotehost:/remotedir

You must use the -r command-line option to copy directories recursively.

The scp command, cousin to ssh, uses the same structure (see Listing 22).

Listing 22. Using scp


$ scp -r filename remotehost:/remotedir

Synchronizing over a network


All of the above solutions have been concerned with copying files, both locally and over a network.
However, they all copy the entire directory structure each time, even when that is not necessary.
Sometimes, you only need to copy the files that have changed since the last time you performed a
copy, which is essentially synchronization rather than a complete re-copy.

If you are using tar or cpio, then you can achieve a time-based synchronization by explicitly
specifying the files that you want to include in the archive. For example, if you are running a
synchronization job through cron, then you can use a command like this to create an archive that
only contains files changed within the last day (see Listing 23).

Listing 23. Creating an archive that only contains the files changed within the last day
$ tar cf archive.tar `find . -mtime -1 -type f`

The find command finds files whose modification time falls within the last day (-mtime -1). I only
select regular files (-type f), because if you include directories, then tar includes all files
within each directory and the archive ends up containing more than you want.
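
Passing the file list on the command line breaks if a name contains spaces or if the list grows too long. One alternative, sketched below, is to write the list to a file and have tar read it with -T, and to compare against a marker file instead of a fixed one-day window. This assumes a tar that supports -T (GNU tar does); the marker file name last-sync and the other paths are hypothetical.

```shell
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/tree"
echo "old" > "$work/tree/old.txt"
echo "new" > "$work/tree/new file.txt"

# Pretend the last synchronization ran on 1 July 2006 and that old.txt
# predates it; "new file.txt" keeps its current (later) timestamp.
touch -t 200601010000 "$work/tree/old.txt"
touch -t 200607010000 "$work/last-sync"

# List regular files modified after the marker, one name per line, and let
# tar read that list with -T instead of expanding it on the command line.
(cd "$work/tree" && find . -type f -newer "$work/last-sync" -print) > "$work/changed.list"
(cd "$work/tree" && tar cf "$work/changed.tar" -T "$work/changed.list")

# Record this run so the next one only picks up later changes.
touch "$work/last-sync"

tar tf "$work/changed.tar"
```

Keeping the archive, the list, and the marker outside the tree being scanned prevents them from being picked up as changed files on the next run.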

For a more robust synchronization, you can use the rsync tool, a free software utility that can
efficiently exchange files over the network. The rsync tool can be an effective way of copying and
synchronizing files, especially over slower links.
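
As a brief illustration, the sketch below synchronizes one scratch directory into another with rsync. It assumes rsync is installed; the directory and file names are hypothetical, and the remote form is shown only as a comment because it needs a second host.

```shell
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/src" "$work/dest"
echo "one" > "$work/src/.bashrc"
echo "stale" > "$work/dest/removed-from-source.txt"

# -a (archive mode) recurses and keeps permissions, times, and links.
# --delete removes files from the target that no longer exist in the source.
# The trailing slash on the source means "the contents of src", not src itself.
rsync -a --delete "$work/src/" "$work/dest/"

# The same options work across a network (not run here):
#   rsync -az -e ssh /home/mc/ user@remotehost:/remotedir/
```

Use --delete with care: it makes the target an exact mirror, so anything that exists only on the target is removed.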

Summary
There is a wide range of tools and choices available to you when you copy files and directory
trees in UNIX, whether on the same system or between systems over any kind of network. Which
tool you use depends on your exact situation and environment. I tend to use tar, because it is the
most compatible tool across the range of different UNIX systems that I use. For users in Linux®
environments, the scp tool, which is a standard component on most Linux distributions, might be
more appropriate.


© Copyright IBM Corporation 2006 (www.ibm.com/legal/copytrade.shtml)
Trademarks (www.ibm.com/developerworks/ibm/trademarks/)
