For over 30 years people have worked with hierarchical file systems: file systems based on directories and files.
These file systems have proven their use, but today, with gigabytes of storage and millions of files,
managing files with directories is becoming increasingly difficult. This research presents an alternative to
hierarchy based file systems. At the basis of this system is a unified approach to all properties of a file.
Combining this approach with a drag and drop user interface creates an alternative that is as usable as
directories, while at the same time going well beyond the expressive powers found in hierarchy based file
systems. The end result is a file system which makes working with files a lot easier for the user. The emphasis
of this system lies on the user and not the operating system; therefore files like shared libraries do not
show up in this system and should be stored by other means. The implementation is an abstraction
layer above a hierarchy based file system, and the two work together in such a way that a high level of
backwards compatibility is achieved, so current programs are not rendered useless. This setup has been tested
on users and the results indicate that this system is a valid alternative to hierarchical file systems.
Database File System
O. Gorter
Foreword
This document is my graduation report for the University of Twente and is all about the Database File System
(DBFS), my graduation project. Because this document is mainly written for the University, some content might
not be relevant to all readers. For those only interested in what the DBFS is, I recommend reading the chapters
Introduction and Database File System Overview.
The document assumes the reader has a fair knowledge of current computer systems and related terms. As
a reference, most terms are explained in the chapter List of Used Terms and Abbreviations; related
research and documents are in the chapter References.
All of the research in this document is done from a user and user interface point of view. This is different from
most research on file systems, and explains why, compared to these writings, seemingly important information
is left out, while almost trivial points are discussed in depth.
For those who obtained this document digitally, it is available in two versions, one optimized for screen reading
and one optimized for print (dbfs-screen.pdf and dbfs-paper.pdf respectively). The screen version has a 4:3
layout with a slightly larger font and is definitely recommended when reading from a monitor.
Thanks
For all their advice, support, help and faith, I would like to thank Hans Scholten; Betsy van Dijk; Pierre Jansen;
all the others in the DIES group; my flatmates; and my family.
Also, special thanks go out to all the usability testers, whom I will only mention by first name, but you know
who you are.
Albert
Bas
Coen
Edgar
Epco
Erwin
Gerben
Harry
Igor
Ivo
Janneke
Joris
Jeshua
Marc
Martin
Stefan
Tim
Rutger
Contents
Abstract
Foreword
1 Introduction
1.1 Relevance
2 Hierarchy Based File Systems Theory
2.1 Properties of Hierarchy Based File Systems
3 Database File System Theory
3.1 Relevance
3.2 Categorisation
4 Database File System Overview
4.1 User Interaction
5 Database File System Internals
5.1 Server
5.2 Client
5.3 Graphical User Interface
6 Usability Testing
6.1 Objective
6.2 Method
6.3 Results
7 Conclusions
7.1 Future Work and Recommendations
References
Related Work
Related Software
Other References
Index
List of Used Terms and Abbreviations
A Assignment Database File System
B Dutch Abstract
C Test 1 Email
D Test 2 Arrange
E Interview 1 Email Normal
F Interview 1 Email Dbfs
G Interview 2 Arrange
H DBFS Source File Listing
1 Introduction
File access and file management is something we do on our PCs every day; lots of computer time is spent
browsing our directories and opening and saving files, or worse, finding files. The basis for this system was laid
down over 30 years ago, and since graphical user interfaces became mainstream not much has changed. Yet
computer hardware has become increasingly powerful and limitations that existed are no longer there. Still,
the biggest change in our file interface is preview (thumbnail) rendering in the file manager.
A new file system can introduce better metaphors for working with files and can make use of advanced GUI
techniques not available when hierarchical file systems came into use. It can bring the focus of the file system
to the user, instead of the computer, and in doing so change how we think about files and the whole computer.
It can be an enabler for a new and more up-to-date user-oriented computer interface.
And this is exactly what this research is about: trying to bring file management to the user. It does so by providing
a search based file interface, based on file meta-data, and it introduces keywords in favor of directories. Being
user oriented means only storing documents and not system files like shared libraries. Documents are all
files the user is interested in; this can be an MS Word document, but also images, music and more.
Because the system searches and modifies meta-data, all meta-data can be treated equally, meaning that
security, ownership and sharing are just as easy to manipulate as the file-name or keywords. And without
directories the system does away with locations; instead it can categorize documents in a more powerful way.
Without locations on a file system, we can have applications that just save everything you do. The save button
can be completely removed from every interface element of the computer, doing away with the dualistic nature
of how we use a computer today; creating a system where there is no longer a difference between what you see on
your screen and what is stored on the hard disk.
1.1 Relevance
In the References we see more works that try to extend the file system with searching. And it is a very
relevant idea, because lately it is sometimes easier to find things on the enormous internet, using Google, than
on your own hard drive. You can always use a search, but it is slow and not very ubiquitous.
This is known by the two major OS software vendors, Microsoft and Apple: both announced a more integrated
search using some sort of database. But we will have to wait a while until the new products are available.
But already we can see what a non-hierarchical approach to files can do when we look at specialized applications
like iPhoto or Adobe Photoshop Album for digital photo management, or iTunes for digital music.
2 Hierarchy Based File Systems Theory
This chapter is an overview of how today's (hierarchical) file systems work from a high level point of view.
Where appropriate some forward references are given to the database file system. Four properties of hierarchy
based file systems are discussed, with an emphasis on their weak points. But we start with a description of what
a hierarchy based file system actually is.
Hierarchy based file systems are created by directories and files. A directory is an object which has a name and
can contain directories and files. Files are objects that also have a name and contain data. This data can be
anything and is only relevant for the application that uses this data; the file system does not impose what the
data must look like, in fact, this data is not shown in the file system. We use a file system in order to keep track
of this data, by keeping track of our files. We keep track of our files by knowing their name and the directory in
which they reside.
This type of file system creates a two dimensional space laid out as a tree like structure. This structure is
created from directory names and depth of directories. (Also see Figure 2.1.) By choosing useful names for
directories, we effectively create a categorisation over sub-hierarchies (sub-trees) and the files they contain. For
example, a digital picture from a certain user on an MS Windows system would typically be found somewhere
below: /Documents and Settings/username/My Pictures/. We can trace back why the picture would be
stored there: the picture is a document, therefore it should be in Documents and Settings; the picture is from
username, therefore it should be in username; and it should be in My Pictures because it is a picture.
Figure 2.1 A typical MS Windows XP hierarchy. (Tree diagram: C: at the root; Documents And Settings, Program
Files and Winnt below it; User1 and User2 below Documents And Settings; My Documents and My Pictures below
a user directory.)
Modern hierarchy based file systems like NTFS, HFS+ and also ext, ReiserFS and more, have lots of optimisations
regarding how files are stored and retrieved, using techniques like journalling, binary balanced trees,
hot-file tables and more. These techniques enable these systems to perform optimally, but on the outside they
all use the same hierarchical approach. This chapter will only discuss the basic properties that come from a
hierarchical approach, not the properties that come from their implementations, especially because
the techniques used in these implementations will not be invalidated when moving to another type of file system;
for instance, most database implementations are also highly optimized, perhaps even more so than file systems.
2.1 Properties of Hierarchy Based File Systems
2.1.1 URLs
Using directories and files, hierarchy based file systems create a unique name for every file, referred to as the path
or, as we will do here, the URL. This is one of the strongest points of a hierarchy based file system. An example
of a URL is /Documents and Settings/username/My Documents/University/Final Project/report.doc. A
URL is a clear means of identifying one file, and one file only. Hierarchy based file systems are based on URLs, or
create URLs, depending on your point of view. This property comes from the fact that inside a directory there
can be only one object (file or directory) with a certain name. Otherwise a URL could point to two or more files
at the same time, without a clear way to know which file it actually refers to. URLs are a feature lost in the database
file system, at least to some extent. There is more about this in the discussion of the database file system, see
Database File System Theory.
It is important to note that URLs are very useful and part of the reason why hierarchy based file systems are
designed the way they are. They give both computers and humans a way to refer to files uniquely. The next
three properties to be discussed are very much connected to the URL creating property, but instead of being
purely an advantage, they are the main motivation for a new file system.
2.1.2 Hiding
Directories hide what is inside of them. Directories were designed to work this way; it is what keeps the file
system tidy and arranged and therefore usable. But there is a downside: because directories hide what is inside
of them, there could be an endless set of sub-directories inside a directory. There is no way of knowing what is
inside a directory until you traverse into it. The consequence is that, if categorisation using directory names
fails, or is unclear, then a file can be hidden. There are tools to locate such lost files, but whether they are able
to help depends on the situation.
2.1.3 Locations
Directories create locations; this enables users to create meaningful locations to store certain files. But many
directories means many locations, which might mean too many locations, which makes it hard to find the right
location to store documents. Also, documents can be stored at only one location, meaning that you need to find
the right location if you want to find a certain document.
To make matters worse, most systems don't put emphasis on locations anymore; they emphasize the next
property, hierarchy, instead. The problem here is that locations are easier (less abstract) to understand than
hierarchy, especially for less computer literate people. Stressing locations can be done by not allowing users to
create two windows on the same directory, opening sub-directories in new windows and presenting one directory
always in the same window at the same location with the same layout. This way a user recognizes his documents
directory not only by its URL, but also by its window layout and position.
The reason for the shift to hierarchy and away from locations is the ever growing size of the file system and the
amount of files we store. With too many locations it is hard to identify locations. Notably good systems that
placed lots of emphasis on locations were Mac OS 9 (and below) and RISC OS. Also earlier versions of Microsoft
Windows placed more emphasis on locations than they do today.
2.1.4 Hierarchy
Directories create a hierarchy and we do our best to create hierarchies that split up sub-hierarchies using
meaningful properties; in doing so we are categorizing our files. But not everything can be fitted inside a hierarchy
and the larger part of the hierarchy is created backwards: because hierarchies are a must, we impose
hierarchy on our files. For example: an MS Word document is stored (on an MS Windows system) in /Documents
and Settings/username/My Documents/name.doc. Even though this is the most logical place to store such
a document in a hierarchy based system, there is too much hierarchy imposed on this location: the document
is a subset of Documents and Settings but not so much a subset of username, let alone Documents
and Settings/username. Arguably the hierarchy should have been laid out like /username/Documents and
Settings/My Documents/name.doc, because Documents and Settings is more a subset of username than the
other way around. And where do we store programs only for one user? Clearly programs do not fall in the
category Documents and Settings. So we cannot store them at Documents and Settings/username. Instead
we need to create a new username categorisation somewhere else where we do store programs, like Program
Files/username.
The problem of imposing hierarchy becomes even clearer when thinking about other file properties like security,
ownership, encryption and sharing. If we want to share a file with another user, perhaps even over a network,
most of the time we must move (or copy) the file to a public location (directory) and set the rights to the
file correctly. Because the concepts of ownership and sharing work through the hierarchy, we need to create
a different hierarchy to prevent all our files from becoming public all at once. Two hierarchies to store our
files means keeping track of them becomes even harder. This principle of splitting the hierarchy based on some
properties of what is inside the sub-hierarchies is not always effective; sometimes we have documents that easily
fit inside both sub-hierarchies. Which one is the best hierarchy to place that file? Probably neither. Properties
of files just are not one dimensional and how we would like to categorize these dimensions depends on our
point of view. When sharing files, we want those files to be placed somewhere in the hierarchy where they are
actually shared. But when trying to retrieve certain files, we would like them to be in the most logical place in
the hierarchy. Unfortunately these two views on where the files should be stored are irreconcilable.
There is a good reason why categorizing our files with directories has its shortcomings. The properties over
which we are trying to categorize are all different kinds of properties. In the example /Documents and
Settings/username/My Documents/University/Final Project/report.doc, the first categorisation is made over
the type of the file (Documents and Settings), the second categorisation is made over the owner of the file
(username), then we categorize again over the type of file (My Documents), and finally we categorize twice over
the role of our file. And only the last part of this categorisation is a truly hierarchical relationship.
A last issue with hierarchy is its abstract nature; also see the previous property of locations. If we want to keep
our file system organized we must create hierarchies, preferably meaningful ones, in order to keep track of our
documents. This is a difficult task for less computer literate people. It is hard to understand that a directory
can contain a directory. The reason this is difficult is mostly that there is no real life example that has
somewhat the same properties. A house contains a closet that contains a box that contains a photo album that
contains a certain photo. A hard-drive, by contrast, contains a box that contains a box that contains a box that
contains a certain photo. And even when there are boxes inside other boxes, the first box would be a big box.
Today most systems take a middle road between locations and hierarchy. They treat certain directories as locations
and from there create hierarchies if necessary. Examples are the My Documents and My Pictures directories on
an MS Windows system. These two stand out with their own icon, and when such a directory is opened it has
its own theme.
3 Database File System Theory
In the previous chapter we discussed current (hierarchical) file systems. Four properties were analysed that
are inherent to hierarchy based file systems. The final conclusion was that those file systems impose too much
hierarchy without a choice; the hierarchy forces categorization over different properties that don't have hierarchical
relationships with each other. This chapter presents the idea of the database file system in a high level,
non-technical overview, explaining the overall design and workings. The main differences with hierarchical file
systems will be pointed out.
The DBFS avoids imposing hierarchy by storing all files in one big data store, or database, hence the name
Database File System. It stores files without any restrictions on the files; multiple files can be stored with the
exact same meta-data. It is almost like storing all files in one directory, but without the need for unique names.
To retrieve files, the big store of files can be reduced by telling the system what files to look for, like all files
that were modified today, or all files called report. The queries on the system can include any sort of meta-data
that is associated with files. This introduces a new powerful feature to a file system: you can retrieve files
independent of the perspective you took when storing them.
A little example to explain this some more. Suppose you are looking for a file: if you remember you edited your
file last week, you can look for all files edited last week; if you remember giving it a certain property, you can
look for all files that have that property; if you remember you made someone else the owner of the file, you can
look for all files owned by that owner; if you remember at least some part of the file-name, you look for all files
containing that part in their file-name; or you can use any combination of the above to look for your file.
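The kind of retrieval described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the DBFS implementation: the FileRecord type, its field names and the helper functions are all hypothetical, and the data store is a plain in-memory list.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class FileRecord:
    """Hypothetical metadata record; the field names are illustrative only."""
    name: str
    owner: str
    modified: date
    keywords: set = field(default_factory=set)


def name_contains(part):
    """Match files whose name contains the given fragment (case-insensitive)."""
    return lambda f: part.lower() in f.name.lower()


def owned_by(owner):
    """Match files owned by the given user."""
    return lambda f: f.owner == owner


def modified_since(day):
    """Match files modified on or after the given day."""
    return lambda f: f.modified >= day


def search(files, *conditions):
    """Return the files that satisfy every condition (a logical AND)."""
    return [f for f in files if all(c(f) for c in conditions)]


# A fixed "today" keeps the example reproducible.
today = date(2004, 6, 14)
last_week = today - timedelta(days=7)

store = [
    FileRecord("report.doc", "alice", date(2004, 6, 10)),
    FileRecord("holiday.jpg", "alice", date(2004, 1, 3)),
    FileRecord("report-draft.doc", "bob", date(2004, 6, 12)),
]

by_name = search(store, name_contains("report"))
combined = search(store, name_contains("report"),
                  modified_since(last_week), owned_by("bob"))
print([f.name for f in by_name])
print([f.name for f in combined])
```

Each remembered fact becomes one condition, and conditions can be combined freely, which is the point of retrieving files independent of the perspective taken when storing them.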
Because the DBFS does not use directories anymore, there are no more custom properties you can categorize files
on. To reintroduce this the DBFS uses keywords: a file can have zero or more keywords, and the keywords can be
used in a search. Keywords can be seen as the new directories. Keywords are a superset of directories in view
of their capabilities; keywords can do what directories can and more. More on this later in this chapter.
From here on, the data store of files will be called a view, just like any subset from this store of files is called
a view. And a search or query will be called a filter. A view is created by a filter, and every view has a filter;
basically, a filter defines the files you are looking at, hence a view. The reason to use filter rather than search or
query is that search and query sound too much like a single-shot action, though the terms are almost synonymous.
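To make the relation between views and filters concrete, here is a minimal Python sketch. The View class and its refine method are assumptions made for illustration; they are not taken from the DBFS source. The sketch shows why a filter is more than a single-shot search: the view keeps its filter and re-evaluates it, so it follows changes in the store.

```python
class View:
    """A set of files defined by a filter over a shared store (hypothetical API)."""

    def __init__(self, store, file_filter=None):
        self.store = store
        # With no filter, the view is the whole data store.
        self.file_filter = file_filter or (lambda f: True)

    def files(self):
        """Evaluate the filter against the store each time it is asked for."""
        return [f for f in self.store if self.file_filter(f)]

    def refine(self, extra_filter):
        """Create a sub-view: its filter is this view's filter plus one more."""
        base = self.file_filter
        return View(self.store, lambda f: base(f) and extra_filter(f))


store = [
    {"name": "report.doc", "type": "document"},
    {"name": "photo.jpg", "type": "image"},
]
everything = View(store)
documents = everything.refine(lambda f: f["type"] == "document")

# Because a view keeps its filter rather than a result list, it follows the store.
store.append({"name": "notes.txt", "type": "document"})
print([f["name"] for f in documents.files()])
```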
3.1 Relevance
In comparison to a hierarchy based file system, the DBFS is much more powerful in how to store and retrieve
files. But it does sacrifice the notion of URLs. The DBFS can produce URLs by using unique file identifiers, much
like inodes, but not by using symbolic identifiers, like the path in a hierarchy based file system.
The DBFS can get away with this limitation, because it serves a different goal than today's file systems. The
DBFS is targeted at the user by only storing documents (i.e. files the user is interested in). You could see it as a
document retrieval system. Consequently it does not store system files like shared libraries, configuration files
and others. These files should be stored using APIs, for instance using a hierarchy based file system.
For the DBFS to perform optimally, it is not so much files that should be stored; instead documents should be
stored. Lots of programs today use multiple files as one document. A few examples: an IDE uses multiple
files as source and header files (and more), but all these files are related and form one document. Movies are
often stored as multiple files, a part one and part two, and two subtitle files, one for each part; again, all files
are related and form one document. Applications (especially under MS Windows) typically come with a whole
bunch of files, but none of these files make sense unless in the context of the application; again, an application
is one unit and should be treated as such.
It is not that all files should be stored as one large file, but more that closely related files should be treated as
one unit. And the DBFS should provide the means to do so.
3.2 Categorisation
The DBFS categorizes files on any property they have, which creates a multi-dimensional categorization, as
opposed to hierarchy based file systems that have a one-dimensional categorization applied multiple times.
In the DBFS only some categorizations are hierarchical, namely where there are hierarchical relationships (types
and keywords). In a hierarchy based file system every categorization is hierarchical, even if there is no hierarchical
relationship. It is important to realize that pushing categorizations into a hierarchy decreases the usefulness of
the categorization. The way the DBFS categorizes is called a faceted system.
With a simple example we can explain a faceted system and its powers over a hierarchical system. Let's say
we are looking at carrots and oranges. They share the properties that they are both edible and orange, but the
first is a vegetable and the second is a fruit. Also both could be from Europe, but the first is probably from the
Netherlands and the second from Spain. All these properties have no relationship between them: being orange
has nothing to do with being a vegetable. Only Spain and Europe have a relationship, which is a hierarchical
relationship because Spain is part of Europe.
In a faceted system we can create a categorization on both the carrot and the orange in a very natural way,
such that we can ask the system for an orange vegetable and we see a carrot. Or ask for a vegetable from the
Netherlands and see a carrot. But in a hierarchy based categorization, there is a fixed order of the properties
and only when we traverse this order can we know about the properties of an object. If the first categorization
is on fruit or vegetable, then it is impossible to retrieve all edible things from Europe. Or if we are making an
orange salad, it is impossible to retrieve all orange edible things.
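The carrot and orange example can be written down as a small faceted lookup. The sketch below is an illustration under assumptions: the dictionaries, the facet names and the query function are invented for this example. Every facet is matched independently, and only the origin facet uses a hierarchical relationship (Spain and the Netherlands are part of Europe).

```python
# Each item carries independent facets; none of them is "above" the others.
ITEMS = {
    "carrot": {"edible": True, "colour": "orange",
               "kind": "vegetable", "origin": "Netherlands"},
    "orange": {"edible": True, "colour": "orange",
               "kind": "fruit", "origin": "Spain"},
}

# The only hierarchical relationship in the example: a place is part of a region.
PART_OF = {"Netherlands": "Europe", "Spain": "Europe"}


def located_in(place, region):
    """True if place equals region or lies somewhere inside it."""
    while place is not None:
        if place == region:
            return True
        place = PART_OF.get(place)
    return False


def query(**facets):
    """Return the names of all items matching every requested facet."""
    matches = []
    for name, props in ITEMS.items():
        if all(
            located_in(props["origin"], value) if facet == "origin"
            else props[facet] == value
            for facet, value in facets.items()
        ):
            matches.append(name)
    return sorted(matches)


print(query(colour="orange", kind="vegetable"))
print(query(kind="vegetable", origin="Netherlands"))
print(query(edible=True, origin="Europe"))
print(query(edible=True, colour="orange"))
```

The last two queries are the ones the text calls impossible in a hierarchy whose first categorization is fruit or vegetable; here they need no special treatment because no facet has to be traversed before another.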
The main difference between a hierarchy based system and a faceted system (like the DBFS) is that hierarchy
based systems are made to store things in some (reasonable) logical location, whereas a faceted system is made
to categorize and find things. Hierarchy based systems are what we use in the physical world to categorize things.
In a supermarket, oranges would be stored in the fruit department, and carrots in the vegetable department.
But the fruit and vegetable departments are in the biological food department, a hierarchical ordering on the
role of the product.
The reason we use such a system is that physical objects can only reside in one place at a time, whereas this
limitation does not hold for virtual objects, like files. So there is no reason to limit a file system to a hierarchical
system, when a faceted system is more powerful, and could be considered a super-set of hierarchical systems.
This is why the beginning of this chapter stated that keywords are like a super-set of directories: a location in
a hierarchical system is defined by its elements in the hierarchy; using these same elements to query a faceted
system yields the same results.
In the example of /Documents and Settings/username/My Documents/University/Final Project/, this
location in a hierarchy based file system is the same as querying the DBFS for all Documents from username where
the keywords are University and Final Project. Only there is no hiding: when the /Final Project/ directory
contains more directories, these show up as directories in the hierarchy based file system, but their contents
show up in the DBFS.
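A short Python sketch of this equivalence follows. The file records, the extra keyword Drafts and the query function are hypothetical; the sketch assumes that a file which would have lived in a sub-directory of /Final Project/ simply carries one more keyword. Requiring all path elements as keywords then returns that file too, which is the absence of hiding described above.

```python
files = [
    {"name": "report.doc", "type": "document", "owner": "username",
     "keywords": {"University", "Final Project"}},
    # Would sit in a hypothetical sub-directory "Drafts" in a hierarchy.
    {"name": "chapter1.doc", "type": "document", "owner": "username",
     "keywords": {"University", "Final Project", "Drafts"}},
    {"name": "notes.txt", "type": "document", "owner": "username",
     "keywords": {"University"}},
]


def query(files, file_type, owner, keywords):
    """Return names of files of this type and owner carrying all given keywords."""
    wanted = set(keywords)
    return [
        f["name"]
        for f in files
        if f["type"] == file_type
        and f["owner"] == owner
        and wanted <= f["keywords"]  # every path element must be present
    ]


# Elements of the path /.../username/My Documents/University/Final Project/
result = query(files, "document", "username", ["University", "Final Project"])
print(result)
```

Note that this sketch combines the keywords with a logical and, because every element of a path must hold at once; it makes no claim about how the kdbfs widgets combine keywords.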
4 Database File System Overview
In the previous chapter we discussed the theory behind the DBFS. In this chapter a high-level overview is given
of the current implementation used in this research. It will start at the bottom and end at the GUI that has
been implemented in KDE.
The DBFS has been implemented as a daemon service for Unix-like systems, which integrates an SQL library and
accepts connections from clients. The clients are the open-file and save-file dialogs in the open-source desktop
environment KDE, together with a standalone file manager, called kdbfs, which replaces Konqueror. Running
this setup of KDE gives the user the impression that there is no hierarchy based file system, only the new
database file system.
The daemon service is called dbfsd and runs in the background. It does not actually store files; it only stores
references to files on the hierarchy based file system. The dbfsd tries to work together with the underlying
hierarchy, such that a high level of backwards compatibility is achieved. In the current implementation it only
supports a few pieces of meta-data: file-name, file-type, file-size, modification-date and keywords. And the server
is only meant to serve one user, but every user can run their own instance.
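As a rough illustration of what storing references plus meta-data in an SQL database could look like, the sketch below uses Python's sqlite3 module with an in-memory database. Everything here is an assumption: the text does not say which SQL library dbfsd integrates, and the table and column names are invented. The schema only mirrors the five pieces of meta-data listed above, and it places no uniqueness constraint on the file-name, since the DBFS allows files with identical meta-data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (
    id        INTEGER PRIMARY KEY,  -- unique identifier, comparable to an inode
    path      TEXT NOT NULL,        -- reference into the hierarchy based file system
    file_name TEXT NOT NULL,        -- deliberately not UNIQUE
    file_type TEXT NOT NULL,
    file_size INTEGER NOT NULL,
    modified  TEXT NOT NULL         -- modification date as ISO 8601 text
);
CREATE TABLE keywords (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE file_keywords (
    file_id    INTEGER NOT NULL REFERENCES files(id),
    keyword_id INTEGER NOT NULL REFERENCES keywords(id),
    PRIMARY KEY (file_id, keyword_id)
);
""")

# Register one file reference and attach a keyword to it.
cur = conn.execute(
    "INSERT INTO files (path, file_name, file_type, file_size, modified) "
    "VALUES (?, ?, ?, ?, ?)",
    ("/home/user/docs/report.doc", "report.doc", "document", 20480, "2004-06-10"),
)
file_id = cur.lastrowid
cur = conn.execute("INSERT INTO keywords (name) VALUES (?)", ("University",))
keyword_id = cur.lastrowid
conn.execute("INSERT INTO file_keywords VALUES (?, ?)", (file_id, keyword_id))

# Retrieve the references of all files that carry the keyword.
rows = conn.execute(
    """
    SELECT f.file_name, f.path
    FROM files f
    JOIN file_keywords fk ON fk.file_id = f.id
    JOIN keywords k ON k.id = fk.keyword_id
    WHERE k.name = ?
    """,
    ("University",),
).fetchall()
print(rows)
```

Because only the path is stored, an application can still open the file through the normal hierarchy, which fits the backwards compatibility described above.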
The dbfsd can be configured using the .dbfs/dbfs.conf file in the user's home directory. The main purpose of
this file is to tell the server what directories to scan and where certain new files go, according to their file-type. It
can also be set to ignore certain directories or files. Which mime-type to use for which file extension is configured
in .dbfs/mime.conf. A log goes to .dbfs/dbfsd.log and the actual database is written to .dbfs/db.db. The
next chapter will go much deeper into its implementation.
Figure 4.1 Overview of the new KDE. (Diagram with the components KDE, dbfsd and Hierarchical File System,
and a File Access path.)
What the user sees when using the KDE implementation from this research is a normally functioning system,
until the user accesses an open-file or save-file dialog. These differ fundamentally because they use the DBFS. But
because the dbfsd does not actually store files, only their references, a KDE application sees and uses normal
files as if there was no DBFS, even though the user might see a different file system. This is important because the
KDE applications do not need to change in order to work with the DBFS. (Also see Figure 4.1.)
4.1 User Interaction
The main kdbfs application is shown in Figure 4.2; with this application the user can look up and manage files
in the DBFS. This application replaces Konqueror, which is the file manager of KDE in a hierarchy based system.
(It should be noted that Konqueror is also the internet browser and more; because it is very modular in setup,
these functionalities have not been disabled.) The kdbfs application reflects how the DBFS works internally,
with filters and views, as can be seen in the figure.
Figure 4.2 The kdbfs application. The number 1 is the view, numbers 2
through 5 are filters.
Whenever the user manipulates a filter, the view follows the filter immediately, providing direct feedback. And
because the view updates in the background, the user can continue to manipulate the filters, even when thousands
of files show up. The user can manipulate how the view is rendered using the few buttons just above the view,
which toggle the zoom level; overall layout; and sorting on name, date or size. Right next to these buttons is a
search field (number 5 in Figure 4.2), which searches the file-name by manipulating a file-name filter. Files in
the view can also be renamed.
4.1.1 Filters
Just above we already mentioned the name-filter, which is implemented as a search field. All other filters that
kdbfs offers are implemented as widgets, located next to the view. These filter-widgets can be hidden or shown
by the buttons at the very top of the application. The current implementation has only three of these
filter-widgets: a general main-type widget (numbers 2 and 3 in Figure 4.2); a keyword widget (number 4); and a
date widget (not shown in the figure).
The general main-type widget has two functions. First it can select on one or more of the main file-types there
are in the system, like documents or images and more (number 2). But it also supports saving the current filter
(number 3). This means that the user can save a view he created and quickly retrieve that view, without
having to click around to recreate the accompanying filter. Moreover, after using a stored view, the rest of the
filters can be used to create sub-views on the stored view.
The date widget selects a date range, which in turn selects all files that have a modification date inside that
range. Unfortunately the current implementation is not optimal: it is just two calendars on which the
user can click. A better widget would display one zoomable calendar, on which ranges can
be created by clicking and dragging. Also, it is not possible to select on creation date or last
access date.
The keyword widget is probably the most important one, because it supports user defined categorisations. The
user can create new keywords, and rename or delete existing ones. The user can also drag keywords around to
create hierarchical relationships between keywords. If a keyword is selected, the view will show all files which
have that keyword associated with them. Multiple keywords can be selected, and the view will show all files
which have at least one of the selected keywords associated with them (an or operation). When a keyword is
selected that has keywords beneath it in a hierarchical relation, the created filter will be as if the
selected keyword and all the keywords beneath it had been selected.
To add keywords to files, the user has a few options. One or more keywords can be dragged onto one file or
a selection of files; these keywords are then added to the meta-data of those files. The user can also drag one or
more files onto a keyword. Or the user can drag one or more files onto an empty space in the keyword widget, and a
new keyword will be created, which the user must give a name. Another function of the keyword widget is the
inverted mode. In this mode the whole filter is wrapped in a not. This is useful when categorizing lots of
files, because the files already categorized will disappear from the view.
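The keyword selection rules described above (an or over the selected keywords, implicit selection of the keywords beneath a selected one, and the inverted mode) can be sketched in ocaml as follows. This is a minimal sketch and not the kdbfs implementation: the filter type is a reduced copy of the one given in chapter 5, and keyword_node, expand and filter_of_selection are hypothetical names. Treating an empty selection as Empty is also an assumption.

```ocaml
(* Reduced copy of the filter constructors needed here; the full type is
   given in chapter 5. *)
type filter =
  | Empty
  | Keyword of string
  | Not of filter
  | Or of filter * filter

(* Hypothetical keyword tree: a keyword and the keywords beneath it. *)
type keyword_node = { name : string; children : keyword_node list }

(* A selected keyword stands for itself and for everything beneath it. *)
let rec expand node =
  node.name :: List.concat_map expand node.children

(* Or together all selected keywords (with their descendants) and wrap the
   result in a Not when the widget is in inverted mode.
   Assumption: an empty selection yields Empty. *)
let filter_of_selection ?(inverted = false) selected =
  let names = List.concat_map expand selected in
  let f =
    match names with
    | [] -> Empty
    | first :: rest ->
      List.fold_left (fun acc k -> Or (acc, Keyword k)) (Keyword first) rest
  in
  if inverted then Not f else f
```

Selecting a keyword study that has thesis beneath it would then yield Or (Keyword "study", Keyword "thesis"), and the same selection in inverted mode would yield that filter wrapped in Not.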
The behaviours of the filter-widgets feel natural, but they were chosen quite arbitrarily during the development of
the kdbfs application. The dbfs supports much more powerful filters than can be created using the widgets
described above, but this implementation has been chosen in order to keep the system simple. To really get a
feel for how the filters work together, the reader should click around in the application himself.
More on the usability of the dbfs and the gui can be found in Chapter 6.
4.1.2 Dialogs
The new open-file dialog is quite similar to the kdbfs application, except that most buttons to manipulate the
keywords and the view have been removed, keeping the focus of the dialog on opening files and not on manipulating
file meta-data. For kde applications that tell the dialog which file types they can open, the dialog displays only
those files by setting an appropriate filter. This can be disabled using a little checkbox located at the bottom
of the dialog. Also see Figure 4.3.
The new save-file dialog is completely different from the original. There is no need for an extensive dialog,
because the dbfs does not use locations. The user can enter a name, optionally add one or more keywords to
the new file, and press Save. For those kde applications that tell the dialog what file-type to save to, the user
can leave out the extension. Also see Figure 4.4.
It is unfortunate that kde is not very focused on meta-data, and that not all applications tell the dialogs what types
they can open, or what type they will use to save. This can be confusing because the dbfs save-file dialog only
asks for a file-name, not a file type (in the form of a file-extension). Happily the KOffice suite fully supports
file-types when saving or opening. But when, for instance, Konqueror saves a file from the internet, it does not
relay its type to the dialog, and the user has to append an extension to the name manually. This shortcoming is
not permanent, and will be resolved if kde gets better file-type support or works together with the dbfs more.
Figure 4.3 An open-file dialog.
Figure 4.4 A save-file dialog; the keyword field can be toggled on or off.
5 Database File System Internals
Readers not interested in the technical design, internals and implementation of the dbfs can safely skip this
chapter. In fact, the sections containing code can also be safely skipped for those not interested in that much
detail.
The dbfs has been designed to be client-server oriented, where a client is a user interface to the dbfs, and the
server is responsible for all the housekeeping and does all the work. The motivation behind this design is that
the users should never have to press refresh: clients register views with the server, and from there on the server
knows what a client is looking at. If the view of the client needs to be updated, the server tells the client to do
so. Updates are necessary when a client sets a new filter for a view, but also when another client renames a file,
or does any other meta-data manipulation. This scenario implies two other design aspects: the communication
between the client and server is asynchronous, and the server and the client are both multi-threaded.
The dbfs is mainly written in ocaml, but on the client side there are different apis to interface with the system.
There are four low level apis: for ocaml, c, c++, and Objective-c. There is also a high level api for kde that
includes widgets and controllers.
The rest of this chapter will discuss various design details and a few implementation details. We will start with
the server. An overview of the system is given in Figure 5.1.
[Figure: block diagram with the labels Operating System, File System, Database File System, Client, Server,
Crawler, Configuration, Initialization, Database, SQL, File, view and Filter.]
Figure 5.1 Overview of the dbfs.
5.1 Server
The server has two main responsibilities. First, it fills and keeps track of all the views clients have registered with
it, and sends updates to the clients that need them. Second, it keeps in sync with the underlying hierarchy based file
system, where it renames and deletes files when necessary.
The server keeps a sql database (see Data Querying) of all the files the user is interested in. There is a
crawler module in the server that fills and updates the database with files from the underlying file system. When
a client creates a view, the filter that accompanies the view is translated into a sql query. Every time the filter
or the database is changed, the sql query is run against the database. A set of files is created from the results
of the query; this set is compared to the old set and any differences (added or removed files) are transmitted
to the client. The same mechanism is used for meta-data like keywords and custom stored filters.
Clients connect to the server using either tcp/ip or unix domain sockets. After a connection is
established, there is a protocol in place that defines the communication between the server and the client (see
Protocol and Views). As mentioned before, this protocol is completely asynchronous to allow either the
server or the client to initiate communications.
5.1.1 Synchronizing with the Hierarchical File System
The server keeps in sync with the underlying file system using a crawler module. This module periodically
indexes the configured directories and all their subdirectories; any new file is added to the database, and an
already existing file is updated so that, for example, its modification date stays in sync.
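The decision the crawler makes for each scanned file (add it when it is new, update it when its modification date changed) can be sketched as below. This is only an illustration under stated assumptions: plan_sync and action are hypothetical names, and the sketch keys known files on their path, whereas the crawler table in the database schema is keyed on inode.

```ocaml
(* What the crawler should do with one scanned file. *)
type action =
  | Add of string      (* path is not yet in the database *)
  | Update of string   (* path is known, but its modification date differs *)

(* [known] maps a path to the modification date stored in the database;
   [scanned] is the (path, modification date) list produced by one crawl.
   Files that are known and unchanged produce no action. *)
let plan_sync (known : (string, float) Hashtbl.t) scanned =
  List.filter_map
    (fun (path, mtime) ->
       match Hashtbl.find_opt known path with
       | None -> Some (Add path)
       | Some stored when stored <> mtime -> Some (Update path)
       | Some _ -> None)
    scanned
```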
The synchronisation also goes the other way around; when the user renames a file using the dbfs, the server
will rename that file on the underlying file system. Because there is no need for unique file names in the dbfs
but there is on the regular file system, the server uses a special scheme that appends a number to the filename
when conflicts arise. Suppose that in the dbfs there are two files called report, which are both msword documents and
are both stored in the same directory on the underlying file system. The first file will be called report.doc and
the second report-1.doc. A third file will be called report-2.doc, and so on.
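The numbering scheme can be sketched as a small function. This is a minimal sketch, not the code of the server: unique_name is a hypothetical name, and exists is an assumed predicate that tells whether a file name is already taken in the target directory.

```ocaml
(* Return the first free name among base.ext, base-1.ext, base-2.ext, ...
   [exists] is a hypothetical predicate: is this name already taken? *)
let unique_name ~exists base ext =
  let candidate n =
    if n = 0 then base ^ "." ^ ext
    else Printf.sprintf "%s-%d.%s" base n ext
  in
  let rec first_free n =
    let name = candidate n in
    if exists name then first_free (n + 1) else name
  in
  first_free 0
```

With report.doc and report-1.doc already present, unique_name applied to "report" and "doc" yields report-2.doc, matching the example above.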
When using the dbfs to save files, the server will use the file-type to determine where to save the file in the
underlying file system, so that in a standard configuration msword documents will be stored in Documents.
The same algorithm as discussed above is used to create unique file names.
Lastly, when using the dbfs to delete files, the server will also delete the file from the underlying system. It
should be noted, though, that there is an option that prevents the server from making any changes except saves. Only
in this mode is the dbfs suboptimal: the crawler will keep the dbfs in sync, and will therefore add files deleted from the
dbfs as new files and undo renames made in the dbfs in favor of the original name stored in the underlying
file system.
It is rather important to keep in sync with the underlying system using the scheme just discussed, because
it makes it possible to use the dbfs while maintaining full backward compatibility, as has been done
during this research using kde.
5.1.2 Files and Filters
First we will discuss two fundamental types in the system: files and filters. After that we will continue on to see
how the major processes take place inside the server.
But before we start, a few notes. From here on there will be some code blocks in the text. Because the
implementation of the dbfs is in ocaml, not many readers will be familiar with the language used in these
code blocks. Still, reading through these blocks and their accompanying texts should give a general idea of how
the server is implemented and how the data flows through the system. Therefore the author encourages readers
to just read on.
Whenever a code block corresponds to an implementation file, that file will be mentioned. A complete listing of
the source files can be found in Appendix H.
Files are represented by the following ocaml record type (defined in common/file.ml):
type file = {
  fid: int; (* unique *)
  version: int; (* increment when file info changes, caching optimization *)
  name: string;
  date: float;
  size: int;
  file: string;
  mime: mime;
};;
Where fid is a globally unique identifier for a file, much like an inode on a unix file system. The rest of the fields
are pretty much self-explanatory except for the version field, which is explained below.
In the dbfs there are two notions of files. First, the entity that is represented by the file record, which is
a representation of meta-data; and second, a file as a piece of data, which is referenced by a url using the underlying
hierarchical file system (stored in the file field of the file record). This distinction is rather faint, as the files
used in the dbfs are basically a wrapper around the files stored on the underlying hierarchical file system. Still
the reader should be aware of this distinction at this level.
The version field is used as a caching optimization; every time some property of the file changes, the version
is incremented. This is useful because files are processed using FileSets, which are binary trees holding files,
implemented using the ocaml Set module. The files are indexed over a total ordering, which uses the file's fid
and version. The end result is that we can compare FileSets with each other to see what has changed without
making this an expensive operation. This is important, as we will see later in Protocol and Views.
(* module that can order files *)
module File_ord = struct
  type t = file
  let compare f1 f2 =
    let c = f1.fid - f2.fid in if c = 0 then f1.version - f2.version else c
end
(* ordered set of files module *)
module FileSet = Set.Make(File_ord);;
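To illustrate why the version takes part in the ordering, the following self-contained sketch shows that a file whose version has been incremented shows up as both removed (the old version) and added (the new version) when two sets are compared. The record is reduced to three fields for brevity, changes is a hypothetical helper, and Int.compare is used here instead of the subtraction in the listing above.

```ocaml
(* Reduced file record: only the fields the ordering looks at, plus a name. *)
type file = { fid : int; version : int; name : string }

module File_ord = struct
  type t = file
  (* Order on fid first, then on version. *)
  let compare f1 f2 =
    let c = Int.compare f1.fid f2.fid in
    if c = 0 then Int.compare f1.version f2.version else c
end

module FileSet = Set.Make (File_ord)

(* Files that left the set and files that entered it. *)
let changes old_set new_set =
  (FileSet.diff old_set new_set, FileSet.diff new_set old_set)
```

Because a renamed file keeps its fid but gets a new version, a client that receives both sets can replace the stale entry without the server having to compare every field.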
A filter is represented by the following ocaml type (defined in common/filter.ml):
type filter =
    Empty
  | All
  | Type of string * string
  | Date of float * float
  | Size of int * int
  | Name of string
  | Keyword of string
  | Not of filter
  | And of filter * filter
  | Or of filter * filter
;;
A filter that selects every file that has the keyword university and is a msword document is constructed
as follows:
let filter = And (Type ("application", "msword"), Keyword "university");;
But filters can also be parsed from strings:
let filter = filter_of_string
  "type \"application\" \"msword\" and keyword \"university\"";;
Which results in a filter identical to the first one.
Filters can be converted into strings using string_of_filter and converted to a (partial) sql query using
sql_of_filter, which we will see in action in Data Querying.
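How a filter might be turned into a partial sql query can be sketched as a recursive function over the filter type. This is not the sql_of_filter from common/filter.ml, whose source is not shown here. The quoting helper quote, the use of like for names, the constants 0 and 1 for Empty and All, and the sub-select over the files_keywords and keywords tables (taken from the schema in Figure 5.2) are all assumptions of this sketch, and its output is more heavily quoted and parenthesised than the query string shown in Data Querying.

```ocaml
type filter =
  | Empty
  | All
  | Type of string * string
  | Date of float * float
  | Size of int * int
  | Name of string
  | Keyword of string
  | Not of filter
  | And of filter * filter
  | Or of filter * filter

(* Quote a string as a sql literal, doubling embedded single quotes. *)
let quote s =
  "'" ^ String.concat "''" (String.split_on_char '\'' s) ^ "'"

(* Translate a filter into a where-clause fragment (hypothetical sketch). *)
let rec sql_of_filter = function
  | Empty -> "0"
  | All -> "1"
  | Type (base, special) ->
    Printf.sprintf "(base=%s and special=%s)" (quote base) (quote special)
  | Date (lo, hi) -> Printf.sprintf "(date>=%f and date<=%f)" lo hi
  | Size (lo, hi) -> Printf.sprintf "(size>=%d and size<=%d)" lo hi
  | Name n -> Printf.sprintf "(name like %s)" (quote ("%" ^ n ^ "%"))
  | Keyword k ->
    "(fid in (select fid from files_keywords join keywords "
    ^ "on keywords.kid = files_keywords.kid where keywords.name="
    ^ quote k ^ "))"
  | Not f -> Printf.sprintf "(not %s)" (sql_of_filter f)
  | And (a, b) -> Printf.sprintf "(%s and %s)" (sql_of_filter a) (sql_of_filter b)
  | Or (a, b) -> Printf.sprintf "(%s or %s)" (sql_of_filter a) (sql_of_filter b)
```

Wrapping every sub-expression in parentheses keeps the precedence of and, or and not explicit, at the cost of a more verbose query.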
5.1.3 Protocol and Views
When a client starts communications with the server, a view is created for this client. A view has three
components: a FileSet, which holds all the files in the current view; two sets of meta-data, namely KeywordSet
and CustomSet, representing the keywords and the custom stored queries in the system; and a filter, which is used
to select the files for the FileSet from the database (see Data Querying).
After this is set up, the client typically activates the asynchronous mode of communication by sending Set_callback
and sets a new filter using Set_filter. In the meantime the server will respond to Set_callback by
transmitting the current set of files (using Files) and the keywords and custom stored queries (using Keywords
and Customs respectively). These protocol elements are implemented using ocaml types and are communicated
using the Marshal module from ocaml. The protocol type a client can transmit to the server (implemented in
common/protocol.ml):
(** server command type, client sends these to server *)
type server_command =
    Set_filter of filter
  | Get_files
  | Add_file_with_keywords of string * keywordlist
  | Change_file of file
  | Delete_file of file
  | Get_keywords
  | Add_keyword of keyword
  | Delete_keyword of keyword
  | Change_keyword of keyword
  | Set_keywords_to_files of keywordlist * filelist
  | Add_keyword_with_files of keyword * filelist
  | Remove_keywords_from_files of filelist * filelist
  | Get_customs
  | Add_custom of custom
  | Delete_custom of custom
  | Change_custom of custom
  | Set_callback
  | Remove_callback
  | Set_incremental
  | Set_no_incremental
  | Read_dir of string
;;
and the client receives these types from the server:
(** client command type, server sends these to client, mostly as a response *)
type client_command =
    Updated_files of FileSet.t * FileSet.t
  | Files of FileSet.t
  | Keywords of KeywordSet.t
  | Customs of CustomSet.t
  | Added_file of string
  | Ok
;;
The actual transmission is done using functions like read_server_command:
let read_server_command inc =
  (Marshal.from_channel inc : server_command);;
Which reads from an input channel and yields a server_command. This all comes together in the main loop
of the server where it listens for incoming commands and responds to the client accordingly (implemented in
server/server.ml):
method run =
  let rec loop () =
    let command = read_server_command _in in
    begin match command with
      (*FILTER/FILELIST HANDLING*)
      Set_filter f ->
        debug ~file:"server.ml" ("Set filter command: " ^
          (string_of_filter f));
        let r, a = _view#set_filter f in
        self#send_command (Updated_files (r, a))
    | Get_files ->
        ...
    | _ ->
        info ~file:"server.ml" "command not implemented"
    end;
    loop () in
  try
    loop ()
  with
    _ -> info ~file:"server.ml" "client disconnected";
         _view#remove_observer (self :> v observer)
And here we can see why we use FileSets as a store for our files. Instead of sending all files to the client as a
response to Set_filter, the server sends only the removed and added files (self#send_command (Updated_files
(r, a))). How these files are acquired is discussed in the next section.
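The marshalled transmission used by the protocol can be tried out in isolation with the round trip below. It is a sketch under stated assumptions: command is a reduced stand-in for server_command, write_command and round_trip are hypothetical helpers, and a temporary file takes the place of the socket.

```ocaml
(* Reduced command type, standing in for server_command. *)
type command =
  | Set_callback
  | Get_files
  | Read_dir of string

(* Marshal one command onto an output channel and flush it. *)
let write_command oc (c : command) =
  Marshal.to_channel oc c [];
  flush oc

(* Counterpart of read_server_command for the reduced type. *)
let read_command ic = (Marshal.from_channel ic : command)

(* Round trip through a temporary file instead of a socket. *)
let round_trip c =
  let path = Filename.temp_file "dbfs" ".bin" in
  let oc = open_out_bin path in
  write_command oc c;
  close_out oc;
  let ic = open_in_bin path in
  let c' = read_command ic in
  close_in ic;
  Sys.remove path;
  c'
```

Note that Marshal.from_channel is not type safe: the type annotation only tells the compiler what to expect, so both ends must be built against the same protocol type.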
5.1.4 Data Querying
The server uses the sqlite sql database to store and query the files in the system. The schema for the database
is shown in Figure 5.2.
files
  fid INT KEY
  version INT
  name CHAR
  date INT
  size INT
  file CHAR
  base CHAR
  special CHAR
keywords
  kid INT KEY
  version INT
  name CHAR
  rank INT
  parent INT
files_keywords
  fid INT
  kid INT
customs
  cid INT KEY
  version INT
  name CHAR
  filter CHAR
  rank INT
crawler
  inode INT KEY
  fid INT
  rank INT
  timestamp FLOAT
Figure 5.2 Database schema as used by the dbfs.
To fill a view's FileSet with files, the server translates the filter from that view into a sql query. This query
is run against the database and the results are translated into a FileSet. The whole process is described in
more detail below.
A filter that will select all the msword documents, type "application" "msword":
let filter = filter_of_string("type \"application\" \"msword\"");;
This is translated into the following sql query and processed in the database like this:
let sql = sql_of_filter(filter);;
(* sql = "base=application and special=msword" *)
let new_files = files_of_query ("SELECT * FROM files WHERE " ^ sql);;
Where the function files_of_query is implemented as (defined in server/files.ml):
(** executes a query and returns a FileSet of files which might be empty *)
let files_of_query q =
  verbose ~file:"files.ml" ("files query: " ^ q);
  Mutex.lock db_mutex;
  try
    let vm = compile_simple db q in
    let rec loop () =
      try
        let v = file_of_array (step vm "") in
        (* verbose (string_of_file v); *)
        FileSet.add v (loop ())
      with
        Sqlite_done ->
          Mutex.unlock db_mutex;
          FileSet.empty
    in
    loop ()
  with
    Sqlite_error s ->
      Mutex.unlock db_mutex;
      warning ~file:"files.ml" ("Error (" ^ s ^ "): " ^ q);
      FileSet.empty;;
After the new FileSet is constructed it is compared to the old FileSet and the updated files, if any, are sent to the
client (see Protocol and Views), as seen in the following code (implemented in server/view.ml):
let new_files = files_of_query ("SELECT * FROM files WHERE " ^ q) in
let inter = FileSet.inter new_files _files in
let removed = FileSet.diff _files inter in
let added = FileSet.diff new_files inter in
This process of filling the view is done on two events: one, when the client sets a new filter on the view (using
Set_filter), or two, when any client changes the database. In both cases the server will respond with an
Updated_files if the view has changed.
5.1.5 Server Management
In this section we will discuss a few constructs used inside the server implementation that are important enough
to be mentioned here.
Throughout the code we can see Mutex.lock and Mutex.unlock, which are used to make the server thread
safe on database access and communications. This is needed because simultaneous access to the database can
lead to incorrect results, and simultaneous access to the sending mechanisms can lead to incorrect data being
transmitted.
We also see constructs like debug ~file:"database.ml" "initializing", which are logger commands. The
logger is implemented in common/log.ml and can write its log to arbitrary output descriptors. A log level can
be set to exclude messages from the log. By default only warning and fatal messages are logged.
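A logger with this calling convention could look roughly like the sketch below. It is not the code of common/log.ml: the ordering of the levels, the names threshold, output and log_at, and the line format are assumptions made for illustration. Only the ~file labelled argument and the default of logging warning and fatal messages follow the text.

```ocaml
type level = Debug | Verbose | Info | Warning | Fatal

(* Assumed ordering, from least to most severe. *)
let rank = function
  | Debug -> 0 | Verbose -> 1 | Info -> 2 | Warning -> 3 | Fatal -> 4

let label = function
  | Debug -> "debug" | Verbose -> "verbose" | Info -> "info"
  | Warning -> "warning" | Fatal -> "fatal"

(* Messages below this level are dropped; the default keeps only
   warning and fatal messages. *)
let threshold = ref Warning

(* Where log lines go; replaceable, so the log can be redirected. *)
let output : (string -> unit) ref = ref prerr_endline

let log_at level ~file msg =
  if rank level >= rank !threshold then
    !output (Printf.sprintf "[%s] %s: %s" (label level) file msg)

let debug = log_at Debug
let verbose = log_at Verbose
let info = log_at Info
let warning = log_at Warning
let fatal = log_at Fatal
```

With these definitions, debug ~file:"database.ml" "initializing" is silently dropped until the threshold is lowered to Debug.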
To process events internally, the server is constructed using objects, and these communicate with each other by
inheriting from either observer or observable or both (ocaml can do multiple inheritance). These classes
(implemented in common/observer.ml) provide an observer/observable programming construct.
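The observer/observable construct can be sketched with ocaml classes as follows. The class and method names below (update, add_observer, notify, recorder) are assumptions chosen for illustration; only remove_observer and the coercion to an observer type appear in the server code shown earlier, and the actual classes in common/observer.ml may differ.

```ocaml
(* An observer reacts to values of type 'a. *)
class virtual ['a] observer = object
  method virtual update : 'a -> unit
end

(* An observable keeps a list of observers and notifies each of them. *)
class ['a] observable = object
  val mutable observers : 'a observer list = []
  method add_observer (o : 'a observer) = observers <- o :: observers
  method remove_observer (o : 'a observer) =
    observers <- List.filter (fun x -> Oo.id x <> Oo.id o) observers
  method notify (v : 'a) = List.iter (fun o -> o#update v) observers
end

(* Example observer that records every event it receives. *)
class recorder = object
  inherit [string] observer
  val mutable events : string list = []
  method update e = events <- e :: events
  method events = List.rev events
end
```

An object with more methods than update has to be coerced before it is registered, for example subject#add_observer (r :> string observer), which is the same kind of coercion the server applies to self when it removes itself from the view.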
When an object inside the server also represents a thread, it inherits from a thread class (implemented in
common/threadobjects.ml). This provides basic thread management, including shutting threads down on
command. The object only needs to re-implement the run method which will be called after the thread is
started, and the thread will stop when the run method exits.
The server accepts a few command line options and reads the rest from a configuration file. These
configuration variables can be accessed using various data access functions like string_of_config or bool_of_config
(implemented in common/config.ml). These options are initialized in the init module of the server (implemented
in server/init.ml), and the configuration file is read using the parse_files function.
5.2 Client
As we have seen in the previous section, the server does all the hard work. The client only mirrors the view,
while the server keeps the view updated. The main task of the client is to expose the internals of the system
as a set of easily accessible programming functions.
This is why there are four apis for the dbfs. Of these, the c like apis have been implemented as shared
libraries; including a header file and linking to the library is enough to implement a dbfs client. Because
the client is only a thin client, we will not discuss it further here. There is a programmer's reference and a
programming example, distributed with the dbfs implementation.
5.3 Graphical User Interface
The main interface that has been implemented for this research is kde based. The implementation files for the
dbfs client are all inside the kde library libkio (in /kdelibs/kio/kfile/). All programs that use open-file
or save-file dialogs link against this library, and will automatically use the new dialogs after a recompile. (The
recompile is necessary due to a few awkward design decisions in the original kde implementation, which make
the new implementation source-compatible but not binary compatible.) The widgets and support objects
created for the dbfs are also implemented in libkio and used by the dialogs and the kdbfs application.
Following are a few important implementations of the kde dbfs
List of Used Terms and Abbreviations