
CO620 RESEARCH PROJECT

TECHNICAL REPORT

A File System in Erlang

James Forward

jf352@kent.ac.uk

supervised by
Professor Simon Thompson

March 30, 2017


A File System in Erlang

James Forward
Department of Computer Science
University of Kent
Canterbury, England
jf352@kent.ac.uk

Abstract—Although there is no lack of file systems, in either number or variety, new file systems must still be written to keep up to date with changing software and hardware requirements. One such project is the HydrOS project, developed at the University of Kent, which is a multikernel OS, written entirely in Erlang, designed to withstand complete software failure. Much like any other OS, it requires a file system in order to store and retrieve data. This report details the background, design, development and functionality of HydroFS, a file system written entirely in Erlang for HydrOS, and how it can be extended in the future to take full advantage of storage space.

I. INTRODUCTION

As Dominic Giampaolo writes in his book Practical File System Design, a guide to designing and implementing file systems, "The main purpose of computers is to create, manipulate, store, and retrieve data" [1]. Because of this, every computer Operating System (OS) needs a file system to be able to fulfill the function of permanently storing and retrieving data from a storage medium. Although there are many file systems already in existence, and most consumers have little to worry about when deciding which file system to store their everyday data on, when developing a new OS a choice has to be made: either adopt one of these and restrict the functionality of the OS by abiding by its limitations, or develop a new file system which, whilst requiring much more development time than a simple driver, provides complete control over the storage and retrieval mechanisms that the OS can take full advantage of.

This report focuses on a proposed solution for HydrOS, a research OS developed at the University of Kent. It will begin by providing a background to the language that the solution (and HydrOS itself) is written in, and an explanation of how the OS functions and what sets it apart from traditional OSs. It will then move on to explain the motivation behind the decision to develop a new file system rather than rely on an already existing one, by weighing up the pros and cons of both possibilities. It will then establish the aims of the proposed file system, as well as the implications these aims have on performance and storage space. The report will then discuss the development of the file system over time, and give a complete explanation of how the file system functions within the HydrOS environment. Finally, it will discuss how the structure of the file system lays the groundwork for the further development mentioned in the aims, with the goal of fully achieving the stated aims and delivering a unique Erlang-based file system which can be utilised inside HydrOS.

II. BACKGROUND

A. Erlang

Erlang is a concurrent, functional programming language developed in the 1980s by Ericsson.

Written by a small division of employees at Ericsson with the intent to improve the way that telecom applications are developed [2], Erlang has since become a widely adopted language across a variety of companies: Amazon for their SimpleDB distributed database, Facebook for their WhatsApp messaging service (although WhatsApp adopted Erlang before the takeover), and Yahoo for the bookmarking service "Delicious", among many others.

The reasons for Erlang's adoption in these areas are the features which separate it from other languages. Its ability to be both fault-tolerant and highly scalable is incredibly useful in the contexts in which it is usually deployed. Beyond those settings, however, it also has merit as the foundation of an operating system.

B. HydrOS

The Erlang BEAM - the virtual machine on top of which Erlang code is processed - also includes typical operating system functionality such as process management, memory management and scheduling. These features, alongside the scalability and fault-tolerance of Erlang already mentioned, led to its adoption in HydrOS, a research OS developed at the University of Kent which intends to utilise these factors to provide a system capable of tolerating both software and partial hardware failure.

The operating system uses the multikernel model to "seperate a multicore computer into a set of individual computing nodes, each capable of withstanding the failure of the others" [3]. Each node has its own kernel, and memory is not shared between the nodes. Data and tasks are, therefore, shared using Erlang's message passing functionality instead. Tasks are distributed across the cores, with each of the nodes acting together as a virtual network.

C. The need for a file system

Like all operating systems, HydrOS needs to let the user permanently store and retrieve data. Since the OS is written from the ground up, and is not a fork of an already existing OS, it does not by default include compatibility with previously existing file systems such as NTFS, EXT4, FAT, or any other.

This means that, to accomplish the aim of permanent data manipulation, one of two possibilities has to be realised. The first possibility is the creation of a custom driver for an already existing file system. The second is the creation of an entirely new file system, in addition to a driver for the operating system to support it. This section of the report will consider the pros and cons of both cases, and come to a conclusion which explains why, ultimately, developing a new file system was the chosen path. The report will then explain the way in which the file system is designed.

The advantages of creating a driver for an already existing file system are immediately clear.

Firstly, by doing so, already existing data storage mediums will not need reformatting (or other reconfiguring) in order to be able to access the files stored on the disk. Not requiring the user to reformat a storage medium in order to be able to use it as part of a functioning system is a major convenience - it requires less set-up time, and less effort.

Secondly, the development time of writing a file system driver, as opposed to designing a brand new file system (and then having to write a driver for it), is much lower. Once the driver has been completed, assuming it was done correctly, OS development can turn to other matters, such as actually utilising the file system in built-in and custom-made applications, designing a graphical user interface for the user to be able to use it, and so on.

This is not to say that writing a device driver is a perfect solution, however. Since the developer has had no say in the design, existing file systems may not be able to take advantage of features of the OS, or of the modern hardware running it. For example, FAT32 would not be a good fit for someone storing 4K movies due to its 4GB file size limit. EXT4 would not be a good fit for someone concerned about security, as even when "securely deleting" files, sensitive metadata remains on the disk [4]. Developing a custom file system allows complete control over what features are needed, as there is no need to comply with previously existing limitations.

Besides the ability to implement features decided ahead of time, another positive of developing a file system from scratch is that, as development goes on, experimentation can lead to innovation that was otherwise not considered. Since, as Giampaolo points out, there isn't a single "correct" way of writing a file system [1], the possibilities in this project are endless. Also, the learning experience gained from designing and implementing a custom file system is much greater than taking somebody else's work and adapting it.

Because the time scale of the research project meant that results were not required "as soon as possible", the option to design a new file system was chosen for the learning experience, the flexibility, and the desire to create a unique file system to match the unique OS.

III. AIMS

The aims of the project can be described in two groups - 'core' and 'extended'.

Core aims are the fundamental aims of the file system that are implemented by the end of the development cycle. Extended aims are not necessarily in the scope of the project due to time restrictions, but are future goals for the file system, drawing upon the functionality proposed and the design provided in this report.

The core aims of HydroFS are as follows:

• Provide a user-level interface module to the file system, and a back-end file server module to serve requests.
• Maintain a file table which dynamically grows as needed.
• Store and retrieve blocks of data from disk.
• Store and retrieve files that exceed sector size from a disk, using a file name as a reference.

The extended aims of HydroFS are:

• Provide a directory system.
• Provide extra functionality such as copying files.
• Provide a search functionality for files.
• Provide an immutable file system, where multiple revisions of a file only use extra space to save differences, rather than wasting space.
• Utilise multiple kernels for improved performance.

IV. DEVELOPMENT

A. Researching design

File systems can be implemented in a number of ways. Arguably the most important decision to make, as it will have the largest impact on performance and efficiency, is how the blocks are stored on disk. The most common methods are contiguous, linked, and indexed allocation of blocks. Each of these methods comes with its own set of advantages and disadvantages.

Contiguous allocation refers to storing the blocks of a file sequentially. Such systems operate alongside a file table, which contains information about what sector the first block is stored at, and the file's length. So, for example, a file whose first block is stored at sector 600, and which is 14 blocks long, will occupy sectors 600 to 613. This method has a number of advantages. As Edwin Reilly mentions in his book Concise Encyclopedia of Computer Science, it makes directly accessing specific parts of a file stored on disk rapid [5]. A downside, however, is that creating files and expanding existing ones is difficult. If there is another file a few sectors in front, growing the file requires moving the file to a new area of the disk and updating the file table. It can also lead to external fragmentation: despite the disk having enough space to hold a new file of, for example, 3GB, this 3GB is spread across the disk rather than being a free, contiguous space [6].

Linked allocation is where blocks are not necessarily stored contiguously, and each stored block points to the next one. An advantage of this is that fragmentation is not an issue, since the file system can store files that are larger than the largest contiguous free space, which makes disk utilisation more efficient. Linked allocation systems are also easy to implement [7]. A disadvantage of linked allocation is that directly accessing a specific part of a file on disk is impossible, as read requests must be sequential (since the sector that stores a given block cannot simply be calculated, nor is it explicitly stored anywhere) [8]. Read speeds may therefore be impacted.

Indexed allocation refers to a system where, like linked allocation systems, files can be stored non-contiguously. Instead of each block pointing to the next one, however, a single block contains the list of pointers. An advantage of this is that, like linked allocation, fragmentation is not an issue. Secondly, it also allows direct access, since the blocks are explicitly declared in the inode. However, the pointer overhead of this system is larger than that of the linked allocation system [9].

Ultimately, the decision was made to employ a linked allocation system. Firstly, the ease of implementation makes this a good choice for this project. Secondly, the lack of direct access to blocks for manipulation purposes is not an issue, since the file system is aiming towards immutability (meaning no blocks are directly edited).

B. Compiling and running

Upon beginning the research into how the file system would be developed, the first thing to do was to understand how HydrOS itself is compiled and run. As the HydrOS project website mentions, crosstool-ng is first required to build a toolchain for the x86_64-unknown-elf architecture, and the OS can then be run using bochs [3][10][11][12].

This is relatively straightforward, with both compiled using gcc's make and the appropriate flags. The enable-smp flag is required to simulate a symmetric multiprocessing (SMP) machine [13] and the enable-x86-64 flag is required to add support for the x86-64 instruction set [14] - both features which HydrOS requires. Once these requirements are satisfied, the OS can be compiled and run (assuming Erlang/OTP 19.0.2 or higher is installed on the system) by downloading the source code and running make sim in the project source's root directory.

C. Issues encountered during development

Throughout the research project, a number of issues arose. Firstly, for the majority of the time spent on the research project, it was not possible to get the OS to run under the virtualisation software Qemu/KVM, which would have significantly improved the performance. The reasons for this are unclear, since by the end compiling and running Qemu/KVM worked, as did running HydrOS under it (through the use of virt-manager).

This problem did not affect the development significantly, however, as another virtualisation package, bochs, ran HydrOS. The performance of HydrOS under bochs was often frustrating but did not seriously damage development.

Secondly, the entire project being implemented inside a custom OS made testing and debugging very difficult. Since the keyboard buffer would often lock, using in-line commands on the OS' terminal input often failed. Debugging methods therefore had to be implemented inside the written modules in order to test the implemented functions.

D. System Architecture

HydroFS is implemented using four Erlang modules: fs, os_file_server, null and init_file_server. fs.erl is the module the user directly interacts with to store and retrieve files. os_file_server is the back-end module which contains the file system logic. null is a small module which checks if a binary is null (used for checking free sectors), and, if required, can create a 512-byte binary full of 0s (which could be useful for secure delete operations). init_file_server is a small module which starts the file server upon booting the OS.
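To give a concrete picture of the role the null module plays, a minimal sketch of such a module is shown below. The module and function names here are illustrative assumptions and are not taken from the HydroFS source.

-module(null_sketch).
-export([is_null/1, zero_block/0]).

%% True when every byte of the binary is zero - the check used to
%% decide whether a sector on disk is free.
is_null(Bin) ->
    Bin =:= <<0:(byte_size(Bin) * 8)>>.

%% A 512-byte binary consisting entirely of zeros, matching the
%% sector size, which could be written over a sector as part of a
%% secure delete operation.
zero_block() ->
    <<0:(512 * 8)>>.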
In terms of the file system, the user-level code and file server back-end sit on top of HydrOS, which contains the ISA driver and the kernel that directly operate the hardware.

[Figure: The file system structure. Blue background = handled by HydroFS code, green = handled by HydrOS.]

E. Front-end and back-end

The detailed code for both the user-level interface and the file server can be found in the appendix. This segment of the report will detail how they function generally, rather than explaining each line of code.

1) User-level interface: The first thing considered was how the user would interact with the system. HydrOS currently exists as a terminal-only OS, so there is no graphical interaction. Despite this, it is still necessary that the user can read and write data in the simplest way possible.
The interface was implemented as fs.erl, which is stored in, and compiled from, the stdlib directory.
To write a file/term (the two terms are interchangeable in the context of this file system), the user simply calls from the command line:

fs:write(Term, Filename)

[Figure: Writing a file in HydrOS using HydroFS.]

This function passes a message to the back-end file server (described below), with the file to write as well as the filename to write to. The file server then handles the logic, and the file is written.
To read a term from the file system, the user calls:

fs:read(Filename)

[Figure: Reading a file in HydrOS using HydroFS.]

2) Back-end file server: This is initialised at boot from the init_file_server.erl module in the os_init directory. Upon startup, it checks if a file table exists on the disk and, if so, loads it into memory - otherwise, it creates a new file table. The file server functions by using message passing to receive requests from the user interface, such as "read" and "write", and operates accordingly. It is in this module that the file table, and the logic which determines what blocks to read from and write to, is contained.

One benefit of the Erlang language and the file system's design is that, due to the server-client model, changes to the logic of the back-end have no impact on the user-level interface. This means that the interface and back-end can be developed asynchronously. In fact, using the same code from the earliest commit, the file system read and write functionality in fs.erl still works - despite the fact that at the time it was designed for reading/writing to memory rather than disk.

The file server functions by spawning a process which continually listens for messages passed to it by the user-level interface module. The process carries three parameters, which operate as follows:
• Filetable - The list of files and where they are stored
• Num - The next sector to attempt to write to
• Bucket - The current "bucket" the user is in. This is explained later in the report.
The interaction between the user-level interface and the file server works as follows. The server spawns, and listens for requests - currently, there are 13 possible options (a few of which are only for debugging purposes). This is the server's listening functionality:

receive
    {msg, write, File, Filename, Caller} ->
        %% handle the write request; the remaining message
        %% types are matched by further clauses (elided here)
        ok
end.

F. File table

The file table consists of two parts. The first is the "file table info", which sits at sector 0 of the disk. This is a small file containing the starting sector of the file table (by default set to sector 1), and the total size of the file table binary. Upon starting up, the file server checks whether the "file table info" sector is blank or contains information. In the former case, this is a new file table which must be created, and the server proceeds to create a new binary. Otherwise, it loads the information about the file table into memory and reads the file table starting at the defined sector, taking account of the total size of the file table if it exceeds 512 bytes - in which case it loads the appropriate number of following sectors and recombines them into a single file table, which is loaded into memory.

The file table is kept in memory, as opposed to being kept on disk and only read as required, to allow extremely high-speed access. This is in part inspired by the "bullet server" aspect of the file system of the distributed OS Amoeba [15], which also has a focus on immutability. Unlike that system, however, HydroFS does not store the entire file system in RAM - only the file table itself.
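To illustrate the start-up path just described, the following is a minimal sketch of how the file table could be recovered from disk when the server boots. It is not the project's code: os_isa:read/1 is assumed here as the read counterpart of the driver's write function, the 32-bit layout of the "file table info" sector is an assumption, and the table is assumed to be serialised with term_to_binary.

-module(file_table_sketch).
-export([load_file_table/0]).

-define(SECTOR_SIZE, 512).

%% Read the "file table info" sector (sector 0). If it is blank, no
%% file table exists yet; otherwise it names the starting sector of
%% the file table and the table's total size in bytes.
load_file_table() ->
    Info = os_isa:read(0),                    %% assumed driver read call
    case is_blank(Info) of
        true ->
            new_file_table;
        false ->
            %% assumed encoding: start sector, then total size
            <<Start:32, Size:32, _/binary>> = Info,
            Sectors = (Size + ?SECTOR_SIZE - 1) div ?SECTOR_SIZE,
            Raw = read_sectors(Start, Sectors, []),
            <<Table:Size/binary, _/binary>> = Raw,
            binary_to_term(Table)             %% back to the list of entries
    end.

is_blank(Bin) ->
    Bin =:= <<0:(byte_size(Bin) * 8)>>.

%% Read N consecutive sectors and recombine them into a single binary.
read_sectors(_Sector, 0, Acc) ->
    list_to_binary(lists:reverse(Acc));
read_sectors(Sector, N, Acc) ->
    read_sectors(Sector + 1, N - 1, [os_isa:read(Sector) | Acc]).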

The file table itself is a small list of file entries, which contain as little information about the file as is required to function. Currently, each entry contains the file name, the first sector to read, and the total size of the file. The following shows an example of a file entry:
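As an illustration, an entry of the kind described here can be thought of as a tuple of the file name, the first sector and the size in bytes - for example (the exact representation used by os_file_server may differ):

{"helloworld.txt", 352, 2740}    %% {Filename, FirstSector, SizeInBytes}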
This equates to a file named "helloworld.txt", which is 2740 bytes in size, and whose first sector is sector 352.
The file table grows in size if the list exceeds 512 bytes, and the file table info sector is automatically updated and rewritten in this case. Because of this, besides the arbitrary choice of starting to write user-level files at sector 300, there is theoretically no limit on the number of files that this system can handle. Of course, because the file table needs to be loaded into RAM, memory limitations may become an issue if an incredibly large number of files need to be stored. However, since this would likely happen on a system with lots of RAM in the first place, it should not cause any significant limitations.

G. Storing and retrieving files

To write data to the disk, the write function in the os_isa module in the os_drivers directory is used. This module is not part of the HydroFS project; the functionality for basic storage I/O was created by the developer of HydrOS. The function takes two parameters - the sector number to write to (with the cylinder address pre-defined in the module), and a binary of exactly 512 bytes. As such, passing a normal term to this function does not write data to the disk.

In os_file_server, a function called convfull exists to "convert" a term into a 512-byte binary. It does this by converting the term into a binary, and then padding the data by adding 0s to the end using a recursive function, padit. The way this function works is detailed in the Appendix.
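As a sketch of the idea (rather than a reproduction of the actual convfull and padit code), the conversion could look like the following:

-module(conv_sketch).
-export([convfull/1]).

%% Convert an arbitrary term into a binary of exactly 512 bytes by
%% serialising it and padding with trailing zeros.
convfull(Term) ->
    padit(term_to_binary(Term)).

%% Recursively append a zero byte until the binary fills a sector.
padit(Bin) when byte_size(Bin) >= 512 ->
    Bin;
padit(Bin) ->
    padit(<<Bin/binary, 0>>).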

The ability to store and retrieve files that exceed the 512-byte size of a sector has been implemented, marking all core aims of the project as achieved. As already mentioned, files are stored in a "linked list" structure - the sectors that contain data also contain pointers, located at the end of the block, which tell the file server which sectors to load next. This carries on until a block's pointer bytes (which are always the final 4) contain all 0s, which act as an END/STOP flag.

1) Storing a file:

In HydroFS, storing a file uses an algorithm to calculate which sectors to store its binary in, as well as how to organise the pointers. The process is relatively straightforward. Firstly, the user requests to store a file using the fs:write(Term, Filename) function. This sends a message containing these parameters to the file server, which then converts the term into a binary, pads it to the nearest multiple of 508 bytes, and breaks the binary into multiple 508-byte smaller binaries. The number of binaries produced from this is then stored in a local term. The server then checks the disk for the next free sectors (the same number as there are binaries), encodes these sector numbers as 4-byte binaries, concatenates them with the data blocks accordingly, and then writes them to these sectors. A new file entry, as defined earlier, is then added to the file table in memory as a tuple containing the file name, the first sector number, and the size of the term. The file table is then also written to disk (and the file table information sector updated accordingly), to ensure that on successive boots the file is still accessible by the file name.

For example, consider a list of 231 integers. The size of the list, due to the numbers involved, is 1162 bytes. The diagram on the following page shows how HydroFS calculates where to store the file's blocks, and how they are stored.
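To make the block-building step of this write path concrete, the sketch below pads the serialised term to a multiple of 508 bytes, splits it into 508-byte chunks, and pairs each chunk with a 4-byte pointer to the sector holding the next chunk (all zeros for the final block). The names are illustrative, and the free-sector search, the file table update and the driver call are deliberately left out.

-module(write_sketch).
-export([build_blocks/2]).

%% Given a term and the list of free sectors chosen for it, return
%% {Sector, Block} pairs in which each Block is exactly 512 bytes:
%% 508 bytes of data followed by a 4-byte pointer to the next sector
%% in the chain (0 marks the end of the file).
build_blocks(Term, Sectors) ->
    Chunks = split_508(pad_508(term_to_binary(Term))),
    Nexts = tl(Sectors) ++ [0],
    [{S, <<Chunk/binary, Next:32>>}
     || {S, Chunk, Next} <- lists:zip3(Sectors, Chunks, Nexts)].

%% Pad the binary with zeros up to the nearest multiple of 508 bytes.
pad_508(Bin) when byte_size(Bin) rem 508 =:= 0 ->
    Bin;
pad_508(Bin) ->
    pad_508(<<Bin/binary, 0>>).

%% Break the padded binary into 508-byte chunks.
split_508(<<>>) ->
    [];
split_508(<<Chunk:508/binary, Rest/binary>>) ->
    [Chunk | split_508(Rest)].

For the 1162-byte example above, the padded binary is 1524 bytes, which splits into three 508-byte chunks and therefore occupies three sectors.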

[Full-page figure: HydroFS' file-writing method.]

2) Reading a file:

[Figure: HydroFS' loading method.]

The diagram above shows how the example file mentioned in the File Table section would be loaded by the file system using the linked list system. The file table contains a pointer to the first sector - in this case, sector 352. The data contents of sector 352 are loaded, and the last 4 bytes are examined. In this case, the block points to sector 353 next, so the final four bytes are 0,0,1,97, which is the four-byte binary representation of that number. Sector 353 would contain 0,0,1,98, so sector 354 would be loaded next. The process continues until sector 357's contents are loaded, where the final four bytes are 0,0,0,0, and so the process stops. The file server then extracts the data portion of each loaded block (the first 508 bytes), combines them into one single binary, converts this back into a usable term, and passes it to the user through the user-level interface fs.
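A sketch of this loading loop is shown below. As with the earlier sketches, os_isa:read/1 is assumed to be the read counterpart of the driver's write function, the entry format follows the file table description, and none of the names are taken from the HydroFS source.

-module(read_sketch).
-export([read_file/1]).

%% Takes a file table entry {Name, FirstSector, Size} and returns
%% the stored term.
read_file({_Name, FirstSector, Size}) ->
    Raw = follow_chain(FirstSector, []),
    %% The chain is padded up to whole 508-byte blocks, so the size
    %% recorded in the file table is used to trim it back down.
    <<Data:Size/binary, _Padding/binary>> = Raw,
    binary_to_term(Data).

%% Collect the 508-byte data portion of each block, following the
%% 4-byte pointer at the end of the block until it is all zeros.
follow_chain(Sector, Acc) ->
    <<Data:508/binary, Next:32>> = os_isa:read(Sector),
    case Next of
        0 -> list_to_binary(lists:reverse([Data | Acc]));
        _ -> follow_chain(Next, [Data | Acc])
    end.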

H. Categorisation of files

Creating a directory-capable file system proved to be a particularly difficult challenge throughout the course of the development, and was sidelined as a feature. However, an alternative system to the orthodox method of handling file organisation was eventually implemented.

HydroFS does not contain a "typical" directory structure. There are no records for directories, in the way that a typical file system such as NTFS or FAT32 uses. HydroFS's file system is completely flat, and is similar in design to Amazon S3's file system, which contains a "bucket" system for organising and grouping files. This is handled entirely by string joining and parsing, and does not require any additional records on the disk to accomplish.

When HydroFS's server process spawns upon the OS booting, it is passed a string containing a '/'. The '/' character acts as a delimiter for separating buckets, sub-buckets, and files. Therefore, at the start of launching the OS, the user is in the root folder.

Unlike folders in a traditional file system, which are created independently from files, buckets are created and destroyed dynamically, whenever files are created or deleted.

For example, if the user decided to group files under a bucket called "files", the user would then go into that bucket by calling the following code:

fs:cb("/files")

This function is capable of processing both absolute and relative paths - if prefixed by a "/", the path is considered absolute. Otherwise, the new bucket is appended to the working directory. In this instance, if the user was in the root bucket ("/"), they could also just type "files" as a parameter, without the preceding "/". To store a text file containing "Hello World!", currently in memory as Erlang term X, under the file name "mydoc.txt", the user would call:

fs:write(X, "mydoc.txt")

The file table entry would contain the string "/files/mydoc.txt", as the bucket is appended to the beginning of the file name (with the placement of delimiters calculated by the server beforehand).

Whilst buckets primarily work as a replacement for directories, they are also capable of being used for primitive search functionality. The following is an example list of files stored on the file system:
• "/odds/1.txt"
• "/evens/powsoftwo/primes/2.txt"
• "/odds/primes/triples/3.txt"
• "/powsoftwo/evens/4.txt"
• "/primes/odds/5.txt"
• "/triples/evens/6.txt"

The files are organised based on the number in the filename (the contents are unimportant for explaining this functionality).

It is possible to find all files matching certain criteria by calling the fs:tagged function, which takes a list of strings as a parameter.

Suppose the user wanted to filter all files that can be categorised as "odds". The user would call fs:tagged(["odds"]), and the file server would process the instruction. Every file that is saved with a matching bucket name of "odds" would have its full path returned. The order of the buckets does not matter, so despite "primes" coming before "odds" in the case of 5.txt, its path ("/primes/odds/5.txt") would be returned alongside the paths for 1.txt and 3.txt. The user can also filter on multiple buckets at once. Suppose the user wants to filter all files marked as "odds" as well as "primes". The user would call fs:tagged(["odds","primes"]) or fs:tagged(["primes","odds"]), and the paths of both 3.txt and 5.txt would be returned.
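A sketch of how this matching could be carried out over the in-memory file table is shown below; the {Path, FirstSector, Size} entry shape follows the earlier file table description, and the function names are illustrative assumptions rather than the project's own.

-module(tag_sketch).
-export([tagged/2]).

%% Return the full paths of all file table entries whose bucket path
%% contains every requested tag, in any order.
tagged(Tags, Filetable) ->
    [Path || {Path, _First, _Size} <- Filetable, has_tags(Path, Tags)].

has_tags(Path, Tags) ->
    Buckets = string:tokens(Path, "/"),
    lists:all(fun(Tag) -> lists:member(Tag, Buckets) end, Tags).

With the example list above, tagged(["odds", "primes"], Filetable) would return the paths of 3.txt and 5.txt, regardless of the order of the tags.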

I. Copying

The user can also copy a file so that it can be accessed from multiple directories. The user calls the fs:cp(Filename, Bucket) command, and a new file table entry will be added for the appropriate bucket.

J. Immutability and the Patching System

As of the time of this report, this aspect has not been implemented, as time limitations placed the feature outside the scope of the project. That said, the file system's design, using the linked list loading system, has laid the groundwork for it.

The patching system allows files to be revised and updated, with the differences between the versions saved as patches, rather than as entirely new files in their own right. To the user, this is completely transparent, as the file server logic interprets them as complete files to begin with, so no additional patching tools are required.
To give an example of how this is planned to work, consider the following scenario.

[Figure: HydroFS' patching method.]

In the diagram above, a file of 6 blocks is saved to sectors 352, 353, 354, 355, 356, and 357. In this instance, the file is completely contiguous, and the linked list loading method appears to have no impact on loading. However, three different revisions of the file are then made. The first revision has extra data added into the file, with nothing removed. The amount of data being added is between 508 and 1016 bytes, requiring 2 blocks to store (including pointer bytes).
The first new block is saved at the next available sector, which is sector 358, and contains the first 508 bytes of the extra data. The second block is saved at the sector after that, 359. The final four bytes of sector 358 point to sector 359, and the final four bytes of sector 359 point back to sector 357. A new file table entry is added, with a description of which pointer should be overridden (in this case, 356 no longer points to 357 but to 358, so the patch would be 356,358).
Since sectors 352, 353, 354, 355, 356 and 357 are not actually modified, storing these blocks again in new blocks would be a waste of storage space. Using this method, only 2 new blocks are taken by the first revision, rather than 6, meaning that when this method is implemented there would be a 66.6% saving in storage space in this scenario.

The second revision of the file is the same as the first, but with additional data occurring at the end of the block. In this case, sector 357 would no longer be the end of the file, so the pointer data is overridden in the file table, pointing to sector 360 instead. Sector 360's final four bytes are all 0, so the file would terminate there.
The final revision includes both changes of the first and second revision, so both overrides are listed in the file table. This revision uses no extra data blocks itself, so in this particular instance, excluding the negligible space used by the file table entry, the saving is 100%.
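Although the patching system is not implemented, the linked-list loader only needs a small extension to support the scheme described above. The sketch below is one possible shape for that extension, with assumed names and os_isa:read/1 again standing in for the driver: the revision's list of {Sector, NewNext} overrides is consulted before following a block's stored pointer.

-module(patch_sketch).
-export([load_revision/2]).

%% Overrides is the revision's list of {Sector, NewNext} pairs,
%% e.g. [{356, 358}] for the first revision described above.
load_revision(FirstSector, Overrides) ->
    load_revision(FirstSector, Overrides, []).

load_revision(Sector, Overrides, Acc) ->
    <<Data:508/binary, Stored:32>> = os_isa:read(Sector),
    %% A patch entry replaces the pointer stored on disk, so the
    %% original blocks never need to be rewritten.
    Next = proplists:get_value(Sector, Overrides, Stored),
    case Next of
        0 -> list_to_binary(lists:reverse([Data | Acc]));
        _ -> load_revision(Next, Overrides, [Data | Acc])
    end.

The second revision would instead carry the pair {357, 360}, and the final revision would list both overrides.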
The diagram below shows what each file table entry will end up loading.

[Figure: HydroFS' loading method.]

V. RESULTS

All of the core features that the project aimed to implement were successfully implemented. In addition to this, a few extra features were implemented too, such as a directory structure (really a "bucket" structure) for organising files, the ability to move between paths, and the ability to copy a file from one location to another.

Time is recorded in seconds, using the system's runtime to measure the time differential. The figures were chosen based on the size of one block in HydroFS (a 508-byte file is 512 bytes when considering the pointer bytes).

Results are shown running the system in bochs, rather than Qemu, since running the OS with the functioning file system on Qemu was not possible. On one computer, the system's I/O was not functioning due to compatibility problems, and on another there was no keyboard input available. These issues were not encountered when running the system through bochs, which was what was used for development testing throughout the process, since the ability to run the OS at all under Qemu was only possible near the end of the project.

It should be noted that the "seconds" are seconds as measured by the OS under bochs, as this was the only available "benchmark" tool. Running the OS under bochs leads to a significant performance reduction, and so is not indicative of how the file system would run directly on hardware or through Qemu. Time was recorded running different numbers of cores for the OS.

A. 2 cores

Write Performance (Seconds) - 2 cores
Size (bytes)    Measured Time (s)
508             0.001115957
1016            0.500850896
2032            1.501131014
4064            2.001798172
8128            3.003183701

Read Performance (Seconds) - 2 cores
Size (bytes)    Measured Time (s)
508             0.000061747
1016            0.000097891
2032            0.000158133
4064            0.000284639
8128            0.000540663

B. 4 cores

Write Performance (Seconds) - 4 cores
Size (bytes)    Measured Time (s)
508             0.000311531
1016            0.000401273
2032            0.000578193
4064            0.001984555
8128            0.003784495

Read Performance (Seconds) - 4 cores
Size (bytes)    Measured Time (s)
508             0.000032051
1016            0.000069231
2032            0.000121795
4064            0.000394859
8128            0.000884598

C. 8 cores

Write Performance (Seconds) - 8 cores
Size (bytes)    Measured Time (s)
508             0.000154005
1016            0.000202007
2032            0.000286009
4064            0.000466015
8128            0.000828105

Read Performance (Seconds) - 8 cores
Size (bytes)    Measured Time (s)
508             0.000016001
1016            0.000036003
2032            0.000064005
4064            0.000112007
8128            0.000224015

Creating files bigger than 8128 bytes was not possible during testing, due to the small amount of memory per core.

Under operational circumstances the file system could, theoretically, provide around 57MB/s read speed in bochs when running 8 cores. The ISA driver, however, is currently capable of a maximum read/write speed of 16MB/s, so this would be the limiting factor; as such, the ISA driver needs to be improved.

It is unclear whether the better results when running on multiple cores are due to the performance degradation of bochs (as the OS running generally faster means that the system time value is more accurate), or due to the design of HydrOS as a multikernel OS. Further research is needed in this area, once the OS can be run fully under Qemu/KVM.

VI. CONCLUSIONS

In conclusion, the project has been successful. The project has been able to provide a functioning file system with some advanced functionality. It has laid the groundwork for the implementation of a truly immutable file system, and can be developed from here into a fully-featured file system.
There are still a number of areas which could be improved, most notably the read/write performance of HydroFS. Currently, the speeds are not up to scratch for everyday use; however, this is mostly due to the limitations of the current ISA driver. The design of the file system can also potentially be developed further, in order to reduce the storage space used by files with multiple versions, using a patching system.

ACKNOWLEDGMENT

The author is grateful to the project supervisor, Simon Thompson, who has on numerous occasions provided great help in focusing the direction of the project in meetings. The author would also like to acknowledge Sam Williams, the developer of HydrOS, for his advice on the project, and for creating the OS.

REFERENCES

[1] D. Giampaolo. Practical File System Design, 3rd ed. San Francisco: Morgan Kaufmann Publishers, 1999. Pages 1 to 31.

[2] J. Armstrong. Making Reliable Distributed Systems in the Presence of Software Errors. PhD thesis, The Royal Institute of Technology, Department of Microelectronics and Information Technology, Stockholm, Sweden, December 2003. Pages 1 to 7.

[3] S. Williams. HydrOS Project. Hydros-project.org. N.p., Web.

[4] J. Corbet. Securely Deleting Files From Ext4 Filesystems. Lwn.net. N.p., 2011. Web.

[5] E. Reilly. Concise Encyclopedia of Computer Science, 1st ed. Chichester: John Wiley, 2004. Pages 580-582.

[6] J. Valvano. Embedded Microcomputer Systems. Cengage Learning, 2012. Pages 519 to 524.

[7] R. Khurana. Operating System (For Anna). New Delhi: Vikas Publishing House, 2011. Pages 191 to 215.

[8] Basics of OS, UNIX, and Shell Programming. New Delhi: Tata McGraw-Hill, 2007. Pages 20 to 31.

[9] I. Dhotre. Operating Systems. Technical Publications, 2008. Pages 10-1 to 10-12.

[10] "Start [Crosstool-NG]". Crosstool-ng.org. N.p., Web.

[11] M. Schultz. Sample files for building crosstool-NG cross-compilers for the Xinu platform. GitHub repository, https://github.com/xinu-os/ct-ng-samples

[12] Bochs: The Open Source IA-32 Emulation Project (Home Page). Bochs.sourceforge.net. N.p., Web.

[13] Simulating A Symmetric Multiprocessor (SMP) Machine. Bochs.sourceforge.net. N.p., Web.

[14] Compiling Bochs. Bochs.sourceforge.net. N.p., Web.

[15] P. K. Sinha. Distributed Operating Systems, 1st ed. New Delhi: Prentice-Hall of India, 2001. Pages 642-659.

[16] W. Gatliff. An Introduction To The GNU Compiler And Linker, 1st ed. 2002. N.p., Web.

APPENDIX A.
DIAGRAMS

Figure 1: HydroFS’ file-loading method.

Figure 2: HydroFS’ file-writing method.

Figure 3: How a ”Patched” revision would load.

Figure 4: How "Patched" versions would look when reconstructed.

APPENDIX B.
SCREENSHOTS

Figure 5: Writing a file to HydrOS with HydroFS.

Figure 6: A file being written to HydrOS through HydroFS.

Figure 7: Reading a file using HydroFS.

APPENDIX C.
TEST RESULTS

Write Performance (Seconds) - 2 cores
Size (bytes)    Measured Time (s)
508             0.001115957
1016            0.500850896
2032            1.501131014
4064            2.001798172
8128            3.003183701

Read Performance (Seconds) - 2 cores
Size (bytes)    Measured Time (s)
508             0.000061747
1016            0.000097891
2032            0.000158133
4064            0.000284639
8128            0.000540663

Figure 8: Dual-Core performance results

Write Performance (Seconds) - 4 cores
Size (bytes)    Measured Time (s)
508             0.000311531
1016            0.000401273
2032            0.000578193
4064            0.001984555
8128            0.003784495

Read Performance (Seconds) - 4 cores
Size (bytes)    Measured Time (s)
508             0.000032051
1016            0.000069231
2032            0.000121795
4064            0.000394859
8128            0.000884598

Figure 9: Quad-Core performance results

Write Performance (Seconds) - 8 cores
Size (bytes)    Measured Time (s)
508             0.000154005
1016            0.000202007
2032            0.000286009
4064            0.000466015
8128            0.000828105

Read Performance (Seconds) - 8 cores
Size (bytes)    Measured Time (s)
508             0.000016001
1016            0.000036003
2032            0.000064005
4064            0.000112007
8128            0.000224015

Figure 10: Octa-Core performance results

