x CONCEPTS:
=====================
Branch Priority:
================
Branch Mode:
============
Whiteouts:
==========
A whiteout removes a file name from the namespace. Whiteouts are needed when
one attempts to remove a file on a read-only branch.
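
For example, suppose branch 0 is writable, branch 1 is read-only, and the
file foo exists only on branch 1:

    ./b0/
    ./b1/
    ./b1/foo

The union then shows:

    ./union/
    ./union/foo

Because branch 1 is read-only, removing foo from the union cannot delete
./b1/foo. Instead, Unionfs creates a whiteout named .wh.foo in branch 0,
which hides the name foo from the union:

    ./b0/
    ./b0/.wh.foo
    ./b1/
    ./b1/foo
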
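
The lookup rule illustrated above can be modeled in a few lines. This is a
userspace sketch, not Unionfs code: each branch is assumed to be a set of
names in one directory, branches are listed in priority order (branch 0
first), and the helpers whiteout_name and union_view are hypothetical. The
rule for a whiteout and a real entry in the same branch is also an
assumption.

```python
# Userspace model of whiteout handling; not Unionfs code.
WH_PREFIX = ".wh."  # whiteout prefix, as in ./b0/.wh.foo


def whiteout_name(name: str) -> str:
    """Return the whiteout entry that masks `name` (hypothetical helper)."""
    return WH_PREFIX + name


def union_view(branches: list[set[str]]) -> set[str]:
    """Merge one directory across branches, branch 0 (highest priority) first.

    The first branch that holds either a name or its whiteout decides whether
    the name is visible. Whiteout entries themselves are never shown.
    """
    decided: dict[str, bool] = {}
    for entries in branches:
        whiteouts = {e[len(WH_PREFIX):] for e in entries if e.startswith(WH_PREFIX)}
        names = {e for e in entries if not e.startswith(WH_PREFIX)}
        # Assumption: a whiteout wins over a real entry in the same branch.
        for name in whiteouts:
            decided.setdefault(name, False)
        for name in names:
            decided.setdefault(name, True)
    return {name for name, visible in decided.items() if visible}


if __name__ == "__main__":
    b0: set[str] = set()   # writable branch
    b1 = {"foo"}           # read-only branch
    print(sorted(union_view([b0, b1])))   # foo is visible
    b0.add(whiteout_name("foo"))          # "rm foo" in the union
    print(sorted(union_view([b0, b1])))   # foo is hidden
```

Using setdefault means a decision made by a higher-priority branch is never
overridden by a lower one, which is what lets ./b0/.wh.foo mask ./b1/foo.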
Opaque Directories:
===================
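Suppose branch 0 is writable, branch 1 is read-only, and branch 1 contains
a directory /a holding a file /a/f. Now suppose we remove that directory
from the union:

    rm -fr a

Because branch 1 is not writable, we cannot physically remove the file /a/f
or the directory /a. So instead, we create a whiteout in branch 0 named
/.wh.a, masking out the name "a" from branch 1. Next, let's say we try to
create a directory named "a" as follows:

    mkdir a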

The problem now is that if you run "ls" in the union, Unionfs will perform
its normal directory name unification for *all* directories named "a" in
all branches. This will cause the file /a/f from branch 1 to reappear in
the union's namespace, which violates Unix semantics: a newly created
directory must be empty.

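
The failure can be reproduced with a small userspace model. The sketch is
hypothetical: each branch is a dict from directory path to its set of
entries, and the opaque set records (branch index, path) pairs at which
merging stops. That stop-at-opaque rule is an assumption about how an opaque
directory behaves, not code taken from Unionfs.

```python
# Userspace model of directory unification; not Unionfs code.
from typing import Dict, FrozenSet, List, Set, Tuple

Branch = Dict[str, Set[str]]  # directory path -> entries in that directory


def merge_dir(branches: List[Branch], path: str,
              opaque: FrozenSet[Tuple[int, str]] = frozenset()) -> Set[str]:
    """Unify directory `path` across branches, branch 0 first.

    Assumed rule: once an opaque copy of the directory has been merged,
    lower-priority branches are not consulted.
    """
    merged: Set[str] = set()
    for index, branch in enumerate(branches):
        if path in branch:
            merged |= branch[path]
            if (index, path) in opaque:
                break
    return merged


if __name__ == "__main__":
    # State after "rm -fr a" followed by "mkdir a" (whiteouts omitted).
    b0: Branch = {"/a": set()}   # the new, empty directory
    b1: Branch = {"/a": {"f"}}   # the read-only original still holds f
    print(merge_dir([b0, b1], "/a"))                           # f reappears
    print(merge_dir([b0, b1], "/a", frozenset({(0, "/a")})))   # stays empty
```

Plain unification returns {'f'} for the new directory; stopping the merge at
branch 0 keeps it empty, as Unix semantics require.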
Duplicate Elimination:
======================

Unlinking:
==========

Copyup:
=======
Cache Coherency:
================
Unionfs users often want to be able to modify files and directories directly
on the lower branches, and have those changes be visible at the Unionfs
level. This means that data (e.g., pages) and meta-data (dentries, inodes,
open files, etc.) have to be synchronized between the upper and lower
layers. In other words, the newest changes from a layer below have to be
propagated to the Unionfs layer above. If the two layers are not in sync, a
cache incoherency ensues, which could lead to application failures and even
oopses. The Linux kernel, however, has a rather limited set of mechanisms
to ensure this inter-layer cache coherency, so Unionfs has to do most of
the hard work on its own.

Maintaining Invariants:
-----------------------

The way Unionfs ensures cache coherency is as follows. At each entry point
to a Unionfs file system method, we call a utility function to validate the
primary objects of this method. Generally, we call unionfs_file_revalidate
on open files, and __unionfs_d_revalidate_chain on dentries (which also
validates inodes). These utility functions check to see whether the upper
Unionfs object is in sync with any of the lower objects that it represents.
The checks we perform include whether the Unionfs superblock has a newer
generation number, and whether any of the lower objects' mtimes or ctimes
are newer than those of the Unionfs object. (Note: generation numbers
change when branch-management commands are issued, so maintaining cache
coherency is also very important for branch management.) If we determine
that any Unionfs object is no longer in sync with its lower counterparts,
we rebuild that object much as we do for branch management.
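
The two staleness checks described above can be sketched as a userspace
model. Everything here is hypothetical: the LowerObject and UpperObject
classes, the is_stale and revalidate helpers, and the choice to record the
newest lower times at rebuild are illustrative assumptions; the real entry
points named in the text are unionfs_file_revalidate and
__unionfs_d_revalidate_chain.

```python
# Userspace model of the revalidation check; not Unionfs code.
from dataclasses import dataclass, field
from typing import List


@dataclass
class LowerObject:
    mtime: float
    ctime: float


@dataclass
class UpperObject:
    generation: int   # superblock generation when last built
    mtime: float      # lower times recorded when last built
    ctime: float
    lowers: List[LowerObject] = field(default_factory=list)


def is_stale(upper: UpperObject, sb_generation: int) -> bool:
    """True if the superblock generation or any lower mtime/ctime is newer."""
    if sb_generation > upper.generation:   # branch management happened
        return True
    return any(lo.mtime > upper.mtime or lo.ctime > upper.ctime
               for lo in upper.lowers)


def revalidate(upper: UpperObject, sb_generation: int) -> bool:
    """Rebuild `upper` from its lower objects if stale; return True if rebuilt."""
    if not is_stale(upper, sb_generation):
        return False
    upper.generation = sb_generation
    upper.mtime = max((lo.mtime for lo in upper.lowers), default=upper.mtime)
    upper.ctime = max((lo.ctime for lo in upper.lowers), default=upper.ctime)
    return True
```

Calling revalidate at the start of every method, as the text describes,
means a stale object is rebuilt before the method uses it.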

While rebuilding Unionfs's objects, we also purge any page mappings and
truncate inode pages (see fs/unionfs/dentry.c:purge_inode_data). This
ensures that Unionfs will re-read the newer data from the lower branches.
We perform this purging only if the Unionfs operation in question is a
reading operation; if Unionfs is performing a data-writing operation (e.g.,
->write, ->commit_write, etc.), then we do NOT flush the lower
mappings/pages, because (1) a self-deadlock could occur and (2) the upper
Unionfs pages are considered more authoritative anyway, as they are newer
and will overwrite any lower pages.
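
The purge rule can be reduced to a small decision function. This sketch is
a userspace model: the Op enum, the DATA_WRITE_OPS classification, and the
purge_if_reading helper are hypothetical, and a dict of page index to bytes
stands in for the page cache.

```python
# Userspace model of the purge decision; not Unionfs code.
from enum import Enum, auto
from typing import Dict


class Op(Enum):
    READ = auto()
    READPAGE = auto()
    WRITE = auto()
    COMMIT_WRITE = auto()


# Hypothetical classification: operations that write file data.
DATA_WRITE_OPS = frozenset({Op.WRITE, Op.COMMIT_WRITE})


def purge_if_reading(cached_pages: Dict[int, bytes], op: Op) -> bool:
    """Drop cached pages while rebuilding, unless `op` writes data.

    Returns True if the cached pages were purged.
    """
    if op in DATA_WRITE_OPS:
        # Keep the pages: purging here could self-deadlock, and the upper
        # pages are newer and will overwrite the lower ones.
        return False
    cached_pages.clear()  # the next read must fetch data from the lower branch
    return True
```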

Implementation:
---------------

Limitations:
------------

Our implementation works as long as a user process causes Unionfs to be
called, directly or indirectly, even if only to do ->d_revalidate: in that
case, we will have purged the current Unionfs data, and the process will
see the new data. For example, a process that continually re-reads the same
file's data will see the NEW data as soon as the lower file has changed,
upon the next read(2) syscall (even if the file is still open!). However,
this does not work when the process re-reads the open file's data via
mmap(2), unless the user unmaps/closes the file and remaps/reopens it. Once
we respond to ->readpage(s), the kernel maps the page into the process's
address space, and there does not appear to be a way to force the kernel to
invalidate those pages/mappings and force the process to re-issue
->readpage. If there is a way to invalidate active mappings and force a
->readpage, please let us know (invalidate_inode_pages2 doesn't do the
trick).

Certain file systems have microsecond (or finer) granularity for inode
times, and asynchronous actions could cause those times to change after a
small delay. In such cases, Unionfs may see a changed inode time that
differs by only a tiny fraction of a second: such a change may be a false
positive indication that the lower object has changed, whereas if Unionfs
waits a little longer, that false indication will not be seen. (These false
positives are harmless, because at most they cause Unionfs to revalidate an
object that needs no revalidation and to print a debugging message that
clutters the console/logs.) Therefore, to minimize the chances of these
situations, we delay the detection of changed times by a few seconds, an
interval called UNIONFS_MIN_CC_TIME (which defaults to 3 seconds, as does
NFS). This means that we detect the change only a few seconds later, and
only if the time change persists in the lower file object. This delayed
detection has an added performance benefit: it reduces the number of times
that Unionfs has to revalidate objects when there is a lot of concurrent
activity on both the upper and lower objects for the same file(s). Lastly,
this delayed time attribute detection is similar to how NFS clients operate
(e.g., acregmin).