Mysteries of fbackup Revealed

One of the most frequently asked questions about fbackup and frecover, second only to "Is this thing
working correctly?", is "How much of my data can fit on a tape?". This question is more often phrased as
"How come when I back-up my 400MB filesystem, it doesn't fit on my 8.0GB Data-Compression DDS?"

To fully understand the answer to this question, it is helpful to understand at least a high-level view of
fbackup's format and algorithms, and helpful to have the tools (see "New Tool" below) to find a more
specific answer.

Records and Blocks

The commercial world is accustomed to logical records being combined into blocks, and blocks
physically written to tape or disk. Typically these records are about 80 bytes, as in a document. If a
blocking factor of 100 were used, then a block size of about 8K bytes would be written to tape, followed
by an interrecord gap.

Terminology can get a little confusing when adapting these terms to fbackup, because fbackup has its
own notion of records and blocks. Its starting point is a block, which it assumes consists of an
unknown amount of raw data (known in the commercial world as logical records). Its block size currently
is 1024 bytes. Further, fbackup combines these blocks into its own grouping, which hereafter is referred
to as an fbackup record. It uses the blocksperrecord option to specify the number of logical blocks
that define an fbackup record. Using the default blocksperrecord of 16, an fbackup record would be
16k bytes, and would be internally structured as 16 logical 1k blocks. Each of these 16k fbackup records
would be written to the output device with an individual 16k write(2) call.

Beyond this level, how the output device handles the fbackup-record sized output is device dependent,
and relatively transparent to fbackup. If fbackup writes a 16k record to a file on an 8k-block filesystem,
the record would occupy two complete 8k HFS (High Performance File System) blocks on disk. If fbackup
writes a 16k record to half-inch magnetic tape or DDS, the record would occupy 16k contiguous bytes
on the tape followed by an interrecord gap, considered one tape record, plus low-level tape format
overhead.
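
The arithmetic above can be sketched in a few lines of Python. This is only an illustration using the
figures given in the text (1024-byte blocks, a default blocksperrecord of 16, an 8k-block filesystem);
the function names are hypothetical and are not part of fbackup.

```python
BLOCK_SIZE = 1024  # bytes per fbackup logical block, per the text


def record_size(blocks_per_record=16):
    """Bytes in one fbackup record, i.e. the size of one write(2) call."""
    return blocks_per_record * BLOCK_SIZE


def filesystem_blocks(record_bytes, fs_block_size=8192):
    """Whole filesystem blocks needed to hold one record (rounded up)."""
    return -(-record_bytes // fs_block_size)


print(record_size())                     # 16384
print(filesystem_blocks(record_size()))  # 2
```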

The purpose of this block and record arrangement in fbackup is to keep logical structures small while
keeping I/O (tape record) sizes large. This allows file data to be broken into manageable pieces, while the
large I/O size translates to faster throughput. With the exception of I/O transactions to the backup device,
fbackup deals almost exclusively with 1k blocks of file data, header data and trailer data (statistics kept by
fbackup about the file, also known as data about data or "meta-data").

All 1k blocks of data and meta-data are stored in memory in the order they will be written out, regardless
of the order they were read-in. Since file boundaries do not necessarily occur on record boundaries, large
files could occupy many records (or even many volumes), and many small files could occupy single
records.
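
To see how file boundaries fall relative to record boundaries, the following sketch lays files out as
consecutive 1k blocks and reports which records each file touches. It assumes one header block and one
trailer block per file, with file data rounded up to whole blocks, and it ignores volume headers,
checkpoints and fast search marks. The function names are hypothetical; this is not fbackup's code.

```python
BLOCK_SIZE = 1024  # bytes per logical block, per the text


def blocks_for_file(size_bytes):
    """1 header block + data rounded up to whole blocks + 1 trailer block."""
    data_blocks = -(-size_bytes // BLOCK_SIZE)
    return 1 + data_blocks + 1


def record_spans(file_sizes, blocks_per_record=16):
    """Return (first_record, last_record) for each file, in backup order."""
    spans = []
    next_block = 0
    for size in file_sizes:
        count = blocks_for_file(size)
        first = next_block // blocks_per_record
        last = (next_block + count - 1) // blocks_per_record
        spans.append((first, last))
        next_block += count
    return spans


# Two 2k files share record 0; a 100,000-byte file then spills across records 0-6.
print(record_spans([2048, 2048, 100_000]))  # [(0, 0), (0, 0), (0, 6)]
```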

The Internals of fbackup

The fbackup utility consists of three executables: /etc/fbackup, /etc/fbackuprdr and /etc/fbackupwrtr. The
fbackup executable is the main controlling program, and it invokes separate processes to do reading and
writing during the backup. It is important to note that the reader and writer subprocesses of fbackup
(/etc/fbackuprdr and /etc/fbackupwrtr) are invoked with their absolute pathnames, regardless of where the
fbackup executable resides. Before testing any version of fbackup, make sure that all three updated
executables are in the /etc directory, or they will not be correctly invoked.

Before the subprocesses do any real work, the main process reads the include and exclude options or
graph file entries, and searches the included directories recursively to generate a complete listing of all
files which are to be backed-up in this session. This list is completely generated and sorted before any
backup data is written to tape.

At startup time, the fbackup process sets-up a shared-memory "ring" area and a "pad" area to
communicate data and control between processes, and invokes one fbackupwrtr and one or more
fbackuprdr processes. The ring is divided into multiple fbackup records that normally correspond to
written output records (see the format section, and the Fast Search Mark exception below). The pad
section contains groups of status structures which give either the status of each of the reader processes or
the status of each of the records in the ring. The number of records stored simultaneously in the ring and
the number of reader processes invoked are configurable in the "-c config" file optionally specified to
fbackup.

After setting-up shared memory, and invoking the child processes, fbackup searches the included
directories for files, and generates the complete list of files which are to be backed-up in this session.
After recursively reading all directories, the filename list is sorted, and the excluded files are removed
from the list.

The general flow of the data for each file is that the main process finds the next file to backup, finds free
space in the next record in the ring, generates and writes header information to the ring, and chooses a
reader process to read data to asynchronously fill the in-memory record. Once the main process has set-
up the control structure in the pad area to indicate to the reader what to read and where to put the data (the
next space sequentially available in the ring), it sends a signal to the chosen reader process to wake it up.
The main process can now asynchronously continue this loop for the next file in the list. Each signalled
reader process reads its data into its designated area in the ring, then sends a signal to the main
process indicating completion, and the main process modifies statuses accordingly. The main process
also writes a trailer block to the ring to indicate the file's final backup status. It follows this trailer block
with the next file's header block (in the same record if there is space), and the algorithm repeats.

Once the main process identifies that an entire ring record has been completely filled with information
(header, trailer and file data from the various reader processes), a signal is sent to the writer process
indicating that there is work to be done. One entire record is read from the ring and written to the output
device, and a signal is sent to the main process to indicate completion. Again, all statuses are checked
and acted upon by the main process as appropriate.

The Format

The format of an fbackup volume incorporates slight variations based on the device and HP-UX release
used. The device is significant in that when writing to tape devices the backup is checkpointed with
checkpoint records and EOF markers, whereas these are not written to non-tape devices (the reasoning
here is left as an exercise to the reader). For the purposes of this discussion, I will refer to the output of
fbackup as on a tape device, as it is the most common usage; for non-tape devices, remove all references
to EOF markers and checkpoint records.

Each tape (or "volume") contains structures in the following order:

    BOT                             -- fbackup must start at the beginning of tape
    ANSI Standard label  (1k)       -- placeholder -- label info currently not used
    EOF
    Volume Header        (2k)       -- Information can be seen with 'frecover -V'
    EOF
    Cumulative Index     (variable) -- seen with 'frecover -I'
    EOF
    Checkpoint Record    (62 bytes)
    Data
    EOF
    Checkpoint
    Data
        .
        .
        .

    EOF
    EOF

Here, "Data" is the collection of file data and meta-data described in the Records and Blocks section. For
each file, there is a triplet of header, file data, and trailer included in this "Data".

Volume Overhead

The ANSI label contains the string "ANSII standard label not yet implemented", and serves no real
function currently.

The volume header contains many of the details about the backup as a whole, including the
information specified in the "-c config" file at backup time. It also includes information about how often
the tape has been used, a unique identifying stamp (time and machine based), and the sequence number of
this backup volume ("tape"). Most of this info is used to appropriately configure frecover at recovery
time.

The index lists all files which were to be backed-up (and is in the order in which the files were backed-up,
which is a lexicographic sort of all of the filenames), and the number of the tape on which the file is
assumed to reside. It is important to note that at the start of each tape, fbackup assumes that all remaining
files could fit on the current tape, so the index will list all remaining files as being stored on it. If the
backup contains multiple tapes, then only the index on the very last tape will have all correct volume
numbers.

Data

Our fbackup tape already has over (3k + index_size) bytes on it, and we haven't started writing file data to
tape yet. As mentioned earlier, each file (or directory or special file, etc) backed-up causes three
structures to be written-out: a file header, file data, and a file trailer. The file header contains the file
name, many of the fields from the stat(2) structure, along with information from the backup process itself,
such as the file's sequence number, an identification tag for the backup, and a checksum to ensure the
validity of the header block. Most headers are one 1k block in size, but headers have variable-width
fields which can grow into multiple header blocks if necessary. File trailers are also one 1k block in size,
and contain some information about the file -- most important of which is the file status flag (see "Active
Files" below), and have a checksum similar to that of the header. Checksums are calculated for the file
headers and trailers but not for the file data itself; checksum errors indicate problems with the fbackup
structures, but may or may not indicate errors with the surrounding data.

The file data lies between the file header and file trailer blocks, and is just the raw data from the file
itself, null-filled at the end to round up to a whole number of 1k blocks.

For those keeping track, for each file this means that in addition to the file data itself, 1k is stored in
header info, 1k in trailer info, and the data is rounded up to the next 1k boundary, giving an average of
2.5k overhead for each file in the backup (2k of meta-data and an average 0.5k unused in the last logical
block of a file's data).
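
The per-file accounting above can be written out as a small calculation. This is a sketch under the
common case described in the text (one 1k header block and one 1k trailer block per file); headers that
grow into multiple blocks are covered only by the optional argument, and the function names are
hypothetical.

```python
BLOCK_SIZE = 1024  # bytes per logical block, per the text


def tape_bytes_for_file(size_bytes, header_blocks=1, trailer_blocks=1):
    """Bytes one file occupies in the backup: header + padded data + trailer."""
    data_blocks = -(-size_bytes // BLOCK_SIZE)  # round up to whole 1k blocks
    return (header_blocks + data_blocks + trailer_blocks) * BLOCK_SIZE


def overhead_bytes(size_bytes):
    """Bytes written beyond the file's own data."""
    return tape_bytes_for_file(size_bytes) - size_bytes


print(tape_bytes_for_file(0))     # 2048: a file with no data still costs 2k
print(tape_bytes_for_file(2048))  # 4096: 2k of data becomes 4k in the backup
print(overhead_bytes(1))          # 3071: a 1-byte file wastes almost 3k
```

Summing tape_bytes_for_file over every file in a filesystem gives a rough lower bound on the
uncompressed backup size, before the volume header, index and checkpoint records are added.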

Special Notes on hardware-compression Devices

If fbackup writes to a tape device in compression mode, the number of bytes physically written
is determined by the tape drive's firmware, depends heavily on the data itself, and is unknown to the
application, fbackup. To fbackup, I/O to a compression device appears transparently like I/O to a
half-inch magnetic tape drive, regardless of the number of bytes actually written to tape after
compression. As of this writing, there is no practical way to determine how much of a tape has been
used in compression mode, other than finding that it is full.

Since fbackup makes no special considerations for compressed data or devices with built-in hardware data
compression, there can be a significant performance issue if someone is trying to back-up compressed
data to a drive with built-in compression. Most compression drives' algorithms will attempt to compress
already-compressed data; this can slow-down the drive's throughput considerably, causing a backup to
take more space and time than if the drive were not used in compressed mode. If you are backing-up large
amounts of compressed data, you will generally get better performance if the compressed data is backed-
up separately, with the compressed data going to a drive in non-compressed mode, and the non-
compressed data going to a drive in compressed mode. As always, a small amount of experimentation
will show which is the best backup method for any particular set-up.

Checkpoint Records

The checkpoint records are used as sanity checks during the backup to recover from bad tape conditions.
In the event of error, the tape can be rewound to the last checkpoint ("last known good spot on the tape").
From the information in the checkpoint record, fbackup can determine where it was in the backup (which
file it was working on, and how much of the file was backed up) when the checkpoint record was written,
and can continue the backup from this state at the start of the next tape so that no data is lost. This
process is known as making the volume "salvageable". Checkpoint records are written between the
records, and their frequency can be controlled using the "checkpointfreq" value in the "-c config" file.
Checkpoint records are 62 bytes long, preceded by an EOF marker, and can affect performance similarly
to fast search marks.

Fast Search Marks

DDS offers set marks or fast search marks which can be used by fbackup to locate files more quickly.
Although they take a negligible amount of space on the tape, they could affect backup performance
because they do not occur on ring record boundaries, as do checkpoints, and could take the tape out of its
normal streaming mode to write the tape marker. Most of the time, header and file data are being grouped
into contiguous records, but if a fast search mark needs to be written in the middle of a record (at the start
of a file's header), the pending record is split; the first part is written to tape, then the fast search mark and
finally the remaining part of the record. Also, because the SCSI driver takes the drive out of
immediate-reporting mode to write any tape mark, interrupting the normal flow of the backup to write a
fast search mark can take up to four seconds per mark (this extra time is particular to the SCSI DDS
driver, and hopefully will be lessened in future drivers).

Choose a worst-case scenario: fbackup is writing-out 20,000 2k files with a fast-search mark in front of
each file. After the initial volume headers, this will generate one 4k record (2k file data, 1k header, 1k
trailer), a fast search mark, another 4k record, another fast search mark, and so on. Without fast-search
marks, this data could be written to tape in 128k records, rather than being broken into smaller blocks.
The performance considerations are far more severe: a SCSI DDS could take up to 4 seconds to put the
drive into synchronous mode and write each of these marks. 20,000 marks at 4 seconds each is 80,000
seconds, or an extra 22 hours of backup time, for a 40MB backup that would otherwise complete in well
under an hour.

Typically however, the extra overhead in space and time to do all of this is minimal relative to the amount
of time and space used by the actual data files in an average system backup. As always, your mileage
may vary: if you are backing-up many small files, use more files between fast search marks; with larger
files, use fewer files between fast search marks.
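
The worst-case arithmetic, and the effect of spacing the marks further apart, can be checked with a
short calculation. This sketch assumes the 4-second worst case quoted above applies to every mark; the
function and parameter names (including files_per_mark) are hypothetical and are not fbackup option
names.

```python
def fast_search_mark_hours(num_files, files_per_mark=1, seconds_per_mark=4.0):
    """Worst-case extra backup time, in hours, spent writing fast search marks."""
    marks = -(-num_files // files_per_mark)  # one mark per group of files, rounded up
    return marks * seconds_per_mark / 3600.0


print(round(fast_search_mark_hours(20000), 1))       # 22.2 (a mark before every file)
print(round(fast_search_mark_hours(20000, 200), 2))  # 0.11 (a mark every 200 files)
```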

Special files

Special files are a particular overhead point for backups: they have no file data associated with them,
so they take up only one inode slot in the file system, but take up 2k of space on the backup (one header
and one trailer block).

Inaccessible files

Since the filename list is generated at the start of the backup, there is a considerable window for changes
to the filesystem before the file data is actually backed-up. During this time, if the file is removed or
made unreadable, it will be skipped, and no further information about this file will be written to tape
except the listing in the index. If there are additional tapes in this backup, the indices at the start of
following tapes will not show this file as written to any numbered volume; however, there will be no index
indication if the skipped file appears on the last tape in the backup.

A special case of a file not being accessible is enforcement-mode file locking. The fbackup process
checks to see if a file is locked with enforcement-mode locking before trying to backup each file. If the
file is enforcement-mode locked at the time of backup, fbackup loops until the retry limit is reached or the
file is unlocked. If the retry limit is reached, the file is skipped (since attempts to read the file would hang
the backup), similarly to the scenario where the file doesn't exist or is unreadable.

Active files

When writing a file to tape, fbackup attempts to make reasonably sure that a good copy is written to the
backup. This is done by checking the time stamp when starting to read the file, and checking the
timestamp again while continuing to read the file data. If the file's modification timestamp has changed
while the data was being written to tape, fbackup writes a "BAD" status flag to the file's trailer, and re-
reads the file, writing the file to tape again.
This process is repeated until the limits specified in the "-c config" file are reached (the default is 5
retries). The impact on tape space is that if a file is active (i.e. its modification time changes during the
backup), it can consume up to (default) 5 times its normal space on the tape, as fbackup writes
consecutive bad backup copies to the tape.
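
The timestamp check can be illustrated with a simplified sketch. It is not fbackup's implementation:
it compares the modification time only before and after each full read, it treats the limit as a total
number of copies (an assumption, since the exact retry accounting is not spelled out here), and the
function name is hypothetical.

```python
import os


def backup_copy_attempts(path, max_attempts=5):
    """Read a file up to max_attempts times; return the status of each copy."""
    statuses = []
    for _ in range(max_attempts):
        before = os.stat(path).st_mtime_ns
        with open(path, "rb") as f:
            while f.read(1024):  # stand-in for copying 1k blocks to the backup
                pass
        after = os.stat(path).st_mtime_ns
        if after == before:
            statuses.append("GOOD")  # file was quiet while it was being read
            break
        statuses.append("BAD")       # file changed during the read; try again
    return statuses
```

Each entry in the returned list corresponds to one copy of the file written to the backup, so a file
that never settles produces max_attempts copies, all marked "BAD".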

Sparse Files

Sparse files can be created in the HP-UX filesystem in such a way as to conserve disk space when the file
contains large amounts of null data. A sparse file can be written by using lseek(2) to point beyond the end
of a file, then writing data to the new end of file. All blocks between the old end of the file and the new
end of file are considered null-filled, and the file system will not allocate actual data blocks to hold the
null data. Although this type of file conserves disk space, utilities will not detect that the files are
sparse -- reads to the sparse areas of the file will correctly return blocks of nulls.
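
The lseek(2) technique can be tried with a short sketch. It assumes a POSIX system where st_blocks is
reported in 512-byte units and a filesystem that supports holes; the helper names are hypothetical.
The allocated size is not shown as an expected value because it varies by filesystem.

```python
import os
import tempfile


def make_sparse_file(path, hole_bytes, tail=b"end"):
    """Seek past the end of a new, empty file and write, leaving a hole."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.lseek(fd, hole_bytes, os.SEEK_SET)
        os.write(fd, tail)
    finally:
        os.close(fd)


def logical_and_allocated(path):
    """Return (size seen by readers, bytes actually allocated on disk)."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sparse.dat")
        make_sparse_file(path, 100 * 1024 * 1024)
        size, allocated = logical_and_allocated(path)
        print("logical size:", size)        # what a backup utility will read
        print("allocated on disk:", allocated)
```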

Utilities like fbackup will treat these files without any regard to their sparse qualities; if the file is 100
megabytes of nulls, fbackup will write-out 100MB of null data, even if it only took 32k of space on disk.
For this reason, sparse files pose a hidden danger: files which are larger than the file system in which
they reside. Copying such files into the filesystem (using normal utilities like cp, tar, cpio,
fbackup/frecover) frequently leads to unexpectedly full filesystems. To generate sparse files wherever
possible, use the -s option to frecover.

1/19/94
