A SECURE STEGANOGRAPHIC FILE SYSTEM WITH


NON-DUPLICATING PROPERTIES
by
IAN DAVID ELLEFSEN
DISSERTATION
submitted in the fulfilment
of the requirements for the degree
MAGISTER SCIENTIAE
in
INFORMATION TECHNOLOGY
in the
FACULTY OF SCIENCE
at the
UNIVERSITY OF JOHANNESBURG
SUPERVISOR: PROFESSOR SH VON SOLMS
CO-SUPERVISOR: MR WJC VAN STADEN
NOVEMBER 2008
Contents
List of Figures vii
List of Tables ix
List of Listings xi
Notation and Definitions xiii
Summary xv
I File Systems, Cryptography, and Steganography 1
1 Introduction 3
1.1 Introduction 3
1.2 Problem Statement 4
1.3 Goals 4
1.4 Structure of this Dissertation 5
1.4.1 Terminology Used in this Dissertation 6
1.5 Conclusion 7

2 File Systems 9
2.1 Introduction 9
2.2 The Disk 10
2.2.1 The Physical Disk 10
2.2.2 The Logical Disk 12
2.3 File System Layers 12
2.4 Basic File System Abstractions 14
2.4.1 Files 15
2.4.2 Directories 15
2.5 File System Structures 16
2.5.1 File System Descriptor 16
2.5.2 Storage Management 17
2.5.3 File Control Block 20
2.5.4 Directory Entries 22
2.6 File System Operations 23
2.6.1 POSIX Compliance 23
2.6.2 Read and Write Operations 23
2.6.3 System Operations 24
2.6.4 File Operations 25
2.6.5 Directory Operations 28
2.7 Virtual File System 29
2.8 Filesystem in Userspace (FUSE) 31
2.9 Summary 31
2.10 Conclusion 32
3 Cryptography 35
3.1 Introduction 35
3.2 Basic Concepts 36
3.3 Symmetric Encryption 37
3.3.1 Substitution Boxes 38
3.3.2 Data Encryption Standard (DES) 40
3.3.3 Serpent 43
3.4 Block Cipher Modes 45
3.4.1 Electronic Codebook Mode 45
3.4.2 Cipher Block Chaining Mode 46
3.4.3 ECB versus CBC 47
3.5 Asymmetric Encryption 48
3.5.1 RSA Encryption 49
3.6 Cryptographic Hash Functions 51
3.6.1 Message Integrity Codes 52
3.6.2 Message Authentication Codes 53
3.6.3 Birthday Attack 54
3.6.4 Secure Hash Algorithm (SHA) 56
3.7 Summary 59
3.8 Conclusion 60
4 Steganography and Steganographic File Systems 63
4.1 Introduction 63
4.2 Steganography 64
4.2.1 Terminology 64
4.2.2 Historic Steganography 65
4.2.3 Currency Protection Mechanisms 65
4.2.4 Copyright Protection Mechanisms 67
4.3 Digital Steganography 67
4.3.1 Image Steganography 68
4.3.2 Image Steganography Example 68
4.3.3 Audio Steganography 70
4.3.4 Least Significant Bit (LSB) Attacks 70
4.4 Cryptographic File Systems 72
4.4.1 The Cryptographic File System - CFS 72
4.4.2 Cryptfs 73
4.4.3 Linux Cryptoloop Driver 74
4.5 Steganographic File Systems 75
4.5.1 File System Assumptions 76
4.5.2 Anderson, Needham and Shamir 76
4.5.3 McDonald and Kuhn 77
4.5.4 Pang, Tan, and Zhou 78
4.6 Summary 79
4.7 Conclusion 81

II SSFS: The Secure Steganographic File System 83

5 SSFS: File System Implementation 85
5.1 Introduction 85
5.2 Definitions 86
5.3 Problems with Existing Implementations 87
5.3.1 McDonald and Kuhn 87
5.3.2 Pang, Tan, and Zhou 88
5.4 Aim 90
5.4.1 The Need for a Steganographic File System 93
5.4.2 Limitations of a Steganographic File System 94
5.5 Basic Construction 95
5.5.1 Modes of Operation 96
5.5.2 The Host File System 96
5.5.3 The Hidden File System 97
5.5.4 Logical and Physical View 98
5.5.5 Operational Scenario 99
5.6 Summary 100
5.7 Conclusion 102
6 File System Structures for SSFS 105
6.1 Introduction 105
6.2 File System Structures 106
6.2.1 Superblock 106
6.2.2 TMap Array 109
6.2.3 Translation Map 110
6.2.4 Inode Table 112
6.2.5 Files and Directories 115
6.3 File System Initialisation 116
6.3.1 Host File System Initialisation 116
6.3.2 Hidden File System Initialisation 118
6.4 Summary 126
6.5 Conclusion 127

7 File System Operations for SSFS 129
7.1 Introduction 129
7.2 Layered File System Operations 130
7.3 Low-Level Operations 131
7.3.1 Read and Write Operations Overview 131
7.4 Intermediate-Level Operations 133
7.4.1 Logical-Physical Translation Operation 133
7.4.2 Translation-Map Operations 134
7.4.3 Inode Operations 136
7.5 High-Level Operations 140
7.5.1 Directory Operations 140
7.5.2 File Operations 146
7.6 Summary 150
7.7 Conclusion 151

8 File System Security for SSFS 153
8.1 Introduction 153
8.2 Security Overview 153
8.2.1 Security through Information Hiding 154
8.2.2 Security through Cryptography 155
8.3 Data Cryptography 156
8.3.1 Choice of Algorithm 156
8.4 Cryptographic Layer 157
8.4.1 Transparent Encryption 158
8.5 File System Data Encryption Scheme 160
8.5.1 Data Classes 160
8.5.2 Interactions 160
8.6 Encryption Hierarchy 161
8.6.1 Initialisation Vectors (IV) 162
8.6.2 Operational Scenario 164
8.7 Performance Concerns 164
8.8 Summary 166
8.9 Conclusion 167

9 Dynamic Reallocation 169
9.1 Introduction 169
9.2 Overview 170
9.2.1 Other Possible Collision Avoidance Techniques 171
9.2.2 Operational Scenario 171
9.2.3 Process Overview 172
9.3 Operational Details 173
9.3.1 Access to Hidden File System Structures 174
9.3.2 Write Redirection 174
9.3.3 Hidden Data Reallocation 177
9.3.4 Reallocation Categories 179
9.3.5 Sacrificial versus Preserving 181
9.4 Summary 183
9.5 Conclusion 184

10 Steganographic File System Performance 185
10.1 Introduction 185
10.2 Hidden File System Performance 186
10.2.1 Hidden Data Fragmentation 186
10.3 Host File System Performance 187
10.3.1 Dynamic Reallocation Performance 187
10.3.2 Dynamic Reallocation Code Profiles 189
10.4 Summary 190
10.5 Conclusion 193

11 Conclusion 195
11.1 Introduction 195
11.2 Contribution 195
11.3 Contribution of SSFS 197
11.4 Future Work 198
11.5 Conclusion 199

Appendices 200
A SSFS Implementation 203
A.1 Introduction 203
A.2 Host File System 203
A.3 Hidden File System 204
A.4 Screenshots 206
A.5 Conclusion 208

Bibliography 215
List of Figures
2.1 Physical disk 11
2.2 Logical disk 12
2.3 File system layers 13
2.4 A file system bitmap 18
2.5 A simple inode 20
2.6 An inode with levels of indirection 21
2.7 Virtual file system overview 30

3.1 Symmetric encryption 38
3.2 DES S1 substitution box [18] 39
3.3 DES encryption algorithm flow 41
3.4 Electronic codebook mode 46
3.5 Cipher block chaining mode 47
3.6 Comparison between ECB and CBC modes 48
3.7 Asymmetric encryption 50
3.8 File verification using a message authentication code 53

4.1 Example of an EURion constellation 66
4.2 Image steganography example 69
4.3 CFS design architecture [14] 72
4.4 Cryptfs design architecture [14] 74
4.5 Linux Cryptoloop driver architecture [14] 75

5.1 Simple host file system layout 97
5.2 Hidden file system logical layout 98
5.3 Hidden and host file system integration 99
5.4 Steganographic file system operational scenario 101

7.1 File system operation layers 130
7.2 Simple read and write operation overview 132
7.3 Logical to physical translation operation 133
7.4 Block allocation 135
7.5 Creating a directory 142

8.1 Cryptographic layer 158
8.2 Transparent encryption 159
8.3 Initialisation vector hierarchy 163

9.1 Write operation execution redirection 175
9.2 Function modified with reallocation methods 176
9.3 Reallocation black-box functions 177
9.4 Reallocation categories 179

10.1 Optimised versus unoptimised dynamic reallocation 188
10.2 Code profile of unoptimised dynamic reallocation 191
10.3 Code profile of optimised dynamic reallocation 192

A.1 Using the makehfs utility 209
A.2 Starting the hidden file system shell 209
A.3 The hsh commands 210
A.4 Directory listing with hsh 210
A.5 File creation with hsh 211
A.6 Displaying the contents of a file 211
A.7 Deleting a file with hsh 212
A.8 Deleting a directory with hsh 212
A.9 Creating a file with fsh 213
A.10 Dynamic reallocation in fsh 213
List of Tables
3.1 Basic cryptographic terms 36
3.2 Basic cryptographic functions 37

4.1 Basic steganographic terms 65

5.1 SSFS definitions 86

6.1 Calculation of the size of the Translation Map 121
6.2 Calculation of the size of the Inode Table 124
Notation and Definitions
x ⊕ y           Bitwise Exclusive OR (XOR) of x and y.
x ⊞ y           Bitwise Addition of x and y.
x ≫ y           Bitwise Right Shift of x by y bits.
x ≪ y           Bitwise Left Shift of x by y bits.
¬x              Unary Negation of x.
x ∧ y           Bitwise And of x and y.
x ∨ y           Bitwise Or of x and y.
|S|             The size of set S.
⌈x⌉             Ceiling function that returns the smallest integer ≥ x.
⌊x⌋             Floor function that returns the largest integer ≤ x.
f : A → B       A function f mapping A to B.
gcd(a, b)       A function to compute the Greatest Common Divisor of two non-zero integers a and b.
bit             A binary digit, either 0 or 1.
byte            A set of 8 bits.
kibibyte (KiB)  A kilo binary byte, where 1 KiB = 2^10 bytes = 1024 bytes.
mebibyte (MiB)  A mega binary byte, where 1 MiB = 2^20 bytes = 1024 KiB.
gibibyte (GiB)  A giga binary byte, where 1 GiB = 2^30 bytes = 1024 MiB.
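The bitwise notation above corresponds directly to operators found in most programming languages. As an illustrative sketch (the sample values and the 4-bit width are my own choices, made to keep the arithmetic visible), the definitions can be checked in Python:

```python
# Sketch: the dissertation's bitwise notation expressed as Python operators.
# Values are 4-bit for readability; shifts and negation are truncated to
# that width, matching the fixed-width registers the notation assumes.
import math

x, y, WIDTH = 0b1100, 0b1010, 4

assert x ^ y == 0b0110                # x XOR y
assert (x + y) % 2**WIDTH == 0b0110   # bitwise addition modulo 2^4
assert x >> 2 == 0b0011               # right shift of x by 2 bits
assert (x << 1) % 2**WIDTH == 0b1000  # left shift, truncated to 4 bits
assert (~x) % 2**WIDTH == 0b0011      # unary negation within 4 bits
assert x & y == 0b1000                # bitwise AND
assert x | y == 0b1110                # bitwise OR
assert len({1, 2, 3}) == 3            # |S|: the size of a set
assert math.ceil(2.1) == 3 and math.floor(2.9) == 2
assert math.gcd(12, 18) == 6
assert 2**10 == 1024                  # 1 KiB in bytes
```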
List of Listings
6.1 Superblock structure 107
6.2 Definition of the TMap Array 110
6.3 Translation Map structures 111
6.4 Inode Table entry 113
6.5 Directory Entry structure 115
Summary
This dissertation investigates the possibility of a steganographic file system which does not have to duplicate hidden data in order to avoid "collisions" between the hidden and non-hidden data. This will ensure the consistency of the hidden data, and avoid unnecessary data duplication, while at the same time providing an acceptable level of information security.

The dissertation will critically analyse a number of existing steganographic file systems in order to determine the problems which are faced by this field. These problems will then be addressed, which will allow for the definition of a possible solution.

In order to provide a more complete understanding of the implementation discussed in the latter part of this dissertation, a number of background concepts are discussed. This includes a discussion of file systems, cryptography, and steganography, each of which contributes to the body of knowledge required for later chapters.

The latter part of this dissertation outlines the Secure Steganographic File System (SSFS). This implementation will attempt to effectively manage the storage of hidden data which is embedded within a host file system. The dissertation will outline how SSFS will allow fragments of hidden data to exist in any physical location on a storage device, while still maintaining a consistent file system structure.

The dissertation will then critically analyse the impact of such a system, by examining the impact on the host file system's performance. This will allow the feasibility of such a system to be demonstrated.
Keywords
Information Security, Cryptography, Information Hiding, Steganography,
File Systems, Steganographic File Systems.
Part I
File Systems, Cryptography,
and Steganography
1.2 Problem Statement
The application of steganography as a viable method for hiding large amounts of information has always been limited. Traditional steganographic techniques, such as image and audio steganography [1, 41], can hide only a very limited amount of information, depending on the total size of the cover-file.

Steganographic file systems strive to solve this problem by allowing relatively large amounts of data to be hidden within an existing host file system [33]. This, however, presents a new set of challenges. By hiding data within an existing file system, a conflict arises between the hidden and non-hidden data. This results in "collisions" when the host file system attempts to write to a physical block which contains hidden data. This in turn results in hidden data being overwritten, and thus calls the consistency of that data into question.

In this dissertation we attempt to address this issue by defining a steganographic file system which has the ability to dynamically reallocate fragments of hidden data to any free physical location on a storage device. This will allow hidden data to be effectively embedded within a host file system, and thus guarantee the consistency of that data.

This dissertation is primarily concerned with the construction of a dynamically reallocating steganographic file system. As such, this study does not specifically go into details of data consistency following the incorrect shutdown of the file system, and assumes a computing environment in which data is always written to storage media in a consistent manner.

In the following section we will discuss the goals which we wish to achieve with our steganographic file system implementation.
1.3 Goals
In order to address the problems outlined above, we present our steganographic file system implementation, which we call the Secure Steganographic File System (SSFS). To allow SSFS to have the required impact, we present the design goals listed below. These goals are outlined in greater detail in chapter 5.

Security - hidden data must remain protected from attack through the use of effective security mechanisms.
Consistency - hidden data which is retrieved from the hidden file system must be the same as the original data which was stored.

Transparency - operation of the host file system must not be impacted in a significant way.

Backward Compatibility - SSFS must remain backward compatible with the host file system implementation.

Dynamic Reallocation - hidden data must be locatable from any physical location. The physical location of the hidden data must be allowed to change.

These goals are achieved by SSFS through the construction of a compound file system, where the implementation contains both the host file system and hidden file system components. The interactions between these two components are carefully orchestrated to allow for seamless integration.
In the following section we will outline the structure of this dissertation.
1.4 Structure of this Dissertation
We will now outline the structure of this dissertation by briefly discussing
the content of each of the chapters which will follow.
Chapter 2 is the first of the background chapters. In this chapter we
present an overview of traditional storage media, and discuss a number of
concepts relating to the construction of a file system. Concepts introduced
in chapter 2 are used throughout the later chapters.
Chapter 3 discusses the concepts relating to cryptography as a method of providing information security. We discuss a number of cryptographic techniques and algorithms. These concepts are used primarily in chapter 8 to describe the implementation's security scheme.
Chapter 4 introduces steganography as a method of providing information hiding. This chapter discusses both traditional steganographic techniques and steganographic file systems. An important aspect of this chapter is the distinction between a cryptographic and a steganographic file system. Concepts discussed in this chapter are used throughout the following chapters. This chapter will conclude the background discussion.
Chapter 5 is the first of the implementation chapters, where we discuss the implementation details for SSFS. This chapter is concerned with critically analysing a number of existing implementations. We then present our aim for SSFS. We go on to discuss the basic construction of SSFS, which will lay the framework for the following chapters.
Chapter 6 discusses the control structures which are used within SSFS in order to embed hidden data within the host file system. The structures discussed in this chapter are used extensively throughout the following chapters.
Chapter 7 discusses the hidden file system operations as a mechanism for
interacting with the hidden data. In this chapter we define the operational
layers used by SSFS in order to allow for easy interaction with the hidden
file system structures. This is of particular importance for chapter 8 as the
security scheme will extend these layers.
Chapter 8 is concerned with the security scheme used by SSFS in order to provide information security. Hidden data is encrypted using multiple initialisation vectors to allow for maximum security. This chapter will also define the transparent encryption layer, which will allow hidden data to be encrypted and decrypted transparently, as needed. The security scheme works in tandem with the dynamic reallocation mechanism discussed in chapter 9 in order to allow encrypted data to be reallocated by the host file system.
Chapter 9 defines the dynamic reallocation mechanism which is used by
SSFS to avoid "collisions" between the hidden and non-hidden data. This
chapter utilises almost all of the concepts from the previous chapters in order
to describe the dynamic reallocation process.
Chapter 10 discusses the performance impact of the dynamic reallocation
mechanism on the host file system. This will allow us to critically analyse
the effectiveness of SSFS.
Finally, in chapter 11 we reflect on the content of this dissertation and discuss the contribution made by SSFS.
1.4.1 Terminology Used in this Dissertation
In this dissertation, a UNIX approach is taken when discussing various concepts. The UNIX environment has a long-standing, accepted set of terms used to describe various aspects of the operating system. These terms are generally easy to understand and provide a good basis for development.
An advantage of utilising a UNIX approach is that many UNIX-like operating systems are open source. This allows the source code, along with large amounts of documentation, to be freely accessed.

Using this approach as a platform, we can describe many different concepts in this dissertation in a consistent and accepted manner.
1.5 Conclusion
In this chapter we introduced the content of this dissertation. In section 1.2
we discussed the problem statement which will define the theme for the
following chapters. Section 1.3 presented an overview of the goals which we
hope to achieve with SSFS. Finally in section 1.4 we outlined the structure
of this dissertation.
In the following chapter we will start the discussion with the first of the
background chapters, which will deal with file systems.
Chapter 2
File Systems
2.1 Introduction
As the storage capability of physical devices grew from only a few megabytes to hundreds of gigabytes in modern computer systems, a need arose for data to be organised in a consistent, logical way. Modern file systems need to provide many more features in today's security-centric world as compared to file systems that existed only decades ago. However, although the functional requirements for a file system have changed, the original design concepts are still in use today, and form the basis for many modern file systems.
In this chapter the discussion primarily focuses on UNIX-based file systems, and as such uses terminology that is specific to UNIX-based file systems. References to other types of file systems are included for interest. This is done as subsequent chapters will make reference to UNIX-styled operating systems and related UNIX-specific terminology.
In this chapter we discuss different core components of a file system.
Firstly we discuss the hard disk drive as a storage medium for a file system.
We then introduce four file system layers that can describe the interaction
between the file system metadata, the data and the physical disk. Two basic
file system abstractions, files and directories, are then discussed as a method
for organising data within the file system. We then go on to introduce a
number of structures, which are implemented within a file system in order
to organise data. Finally we introduce a number of file system operations
and conclude with an overview of a Virtual File System. These will all be
discussed in the following sections.
2.2 The Disk
Information is usually stored permanently on secondary memory. Primary memory refers to the Random Access Memory (RAM) or the cache memory that is physically on the Central Processing Unit (CPU); this memory is usually made up of fast, volatile memory that is accessed by the CPU when a program is executed. In fact, all data needed for processing has to be stored in primary memory. The volatility of primary memory does not lend itself to the long-term storage of information: if a system should be powered down, all the information that is stored in primary memory would be lost [50]. All permanent data needs to be stored in secondary memory, such as a hard disk drive. A management structure such as a file system needs to be in place to effectively manage the storage and retrieval of information that is stored on secondary memory. A hard disk drive can be viewed as either logical or physical; these views are discussed below.
2.2.1 The Physical Disk
Physically, a hard disk drive is made up of one or more magnetic platters, a number of read and write heads, and control circuitry that allows the computer to interface with the hard disk drive. Data is stored magnetically on the platters, which are coated in a magnetic substance. The control circuitry allows information to be stored on and retrieved from the magnetic platters. Data is stored on the disk in physical blocks or sectors on the magnetic platters; a sector is the smallest unit of data that the hard disk will read or write [23]. The size of the blocks on the hard disk drive is usually 512 bytes, and thus all data that is stored on the disk is written in sequences of 512-byte blocks. Blocks on the hard disk drive are ordered in the following way, as seen in figure 2.1:
Sectors are the blocks themselves.

Clusters are two or more adjacent sectors.

Tracks are concentric rings of sectors on the disk. Blocks that are physically located next to each other are said to be in the same track.

Cylinders (not seen in figure 2.1) are groups of sectors that are located on the different platters but are directly underneath each other. The cylinder will refer to every block in this grouping.
Figure 2.1: Physical disk - hard disk drive platter
Figure 2.2: Logical disk - linear array of blocks
The physical location of data on the disk is addressed using Cylinder Head
Sector (CHS) addressing, where data is located by referring to the cylinder
where the data exists, the head number that will read or write the data, and
the sector or block in the cylinder where the data can be located.
2.2.2 The Logical Disk
It would become very difficult to continually refer to a disk block using its
physical address, which may be dependent on a particular manufacturer's
specification. So there is a generic view of the hard disk drive, called the
logical disk. The logical disk can be simply viewed as a linear array (see
figure 2.2) of equally sized blocks [23]. The Logical Block Address (LBA) is
the number of a logical block within the logical disk array.
The LBA allows system designers to reference a storage location in a simple, consistent manner, regardless of the physical construction of the hard disk drive. This is achieved through the use of methods which convert logical block addresses into physical block addresses.
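The translation between the two addressing schemes can be illustrated with the commonly used CHS-to-LBA formula, where sectors are numbered from 1. The geometry constants below are assumptions chosen for illustration, not values taken from the dissertation or from any particular drive:

```python
# Illustrative CHS <-> LBA conversion (a sketch, not from the dissertation).
# Assumed geometry: 16 heads, 63 sectors per track; sector numbers are 1-based.
HEADS = 16
SECTORS_PER_TRACK = 63

def chs_to_lba(cylinder, head, sector):
    """Convert a (cylinder, head, sector) address to a logical block address."""
    return (cylinder * HEADS + head) * SECTORS_PER_TRACK + (sector - 1)

def lba_to_chs(lba):
    """Convert a logical block address back to (cylinder, head, sector)."""
    cylinder, remainder = divmod(lba, HEADS * SECTORS_PER_TRACK)
    head, sector_from_zero = divmod(remainder, SECTORS_PER_TRACK)
    return cylinder, head, sector_from_zero + 1

# Block 0 of the logical disk is cylinder 0, head 0, sector 1.
assert chs_to_lba(0, 0, 1) == 0
# The two conversions are inverses of each other.
assert lba_to_chs(chs_to_lba(2, 5, 17)) == (2, 5, 17)
```

Hiding this arithmetic behind the logical view is precisely what lets the file system treat the disk as a simple linear array of blocks.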
Now that we have discussed the physical and logical construction of the
hard disk drive, we will discuss a number of file system layers in the next
section. The file system layers are a high-level overview of the interaction
between the file system implementation and the hard disk drive.
2.3 File System Layers
The file system layers are a high-level overview of the different functional components that collectively form a working file system. The lower layers would usually be implemented in the operating system's kernel, and the higher levels would be implemented in what we would know as the file system implementation.

Figure 2.3: File system layers
Silberschatz, Galvin, and Gagne [50] point out that the higher levels are extended by functionality defined in the lower levels. Information flows through the different layers until it is at a point where the data can be written directly to the disk. The four functional layers that are used when interacting with a file system are I/O Control, the Basic File System, the File-Organisation Module, and the Logical File System. These file system layers interact as shown in figure 2.3. We will now discuss each of these functional layers in detail.
I/O Control

Silberschatz, Galvin, and Gagne [50] define the lowest file system level as I/O Control. This level is responsible for interacting with the hardware devices through the use of device drivers which communicate with the hardware controller in order to retrieve or store data on the device.
Basic File System

The Basic File System is simply responsible for passing generic read and write commands to the I/O Control level [50]. A generic command would be used to reference a particular physical block (using an addressing method such as CHS) to access data that is stored on the disk.
File-Organisation Module
The File-Organisation Module layer is responsible for converting the logical
position of data on the disk to a physical address which then can be used
to access the data. This layer will also manage a list of disk blocks which
are currently being used by the file system, called the allocated blocks, and
those which are not being used, called unallocated blocks.
Logical File System
The Logical File System is responsible for managing the metadata for the
files and directories. This is the layer with which the user and application
programs would interact. Metadata, such as the human-readable name of a
file or directory, would be translated to block addresses in this level to be
passed to the File-Organisation Module. This level would be responsible for
allocating and managing file system structures that are defined in the lower
levels [50].
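As a hypothetical sketch (the class and method names here are my own invention, not part of any file system discussed in this dissertation), the four layers can be modelled as a chain of objects, each translating a read request one step closer to the device:

```python
# Hypothetical sketch of the four file system layers as a call chain.
# Each layer translates the request and delegates to the layer below it.

class IOControl:
    """Lowest layer: stands in for the device driver (device simulated here)."""
    def __init__(self, num_blocks, block_size=512):
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]
    def read_physical(self, pba):
        return self.blocks[pba]

class BasicFileSystem:
    """Issues generic block read commands to I/O Control."""
    def __init__(self, io):
        self.io = io
    def read_block(self, pba):
        return self.io.read_physical(pba)

class FileOrganisationModule:
    """Maps logical block numbers to physical block addresses."""
    def __init__(self, bfs, logical_to_physical):
        self.bfs = bfs
        self.map = logical_to_physical
    def read_logical(self, lba):
        return self.bfs.read_block(self.map[lba])

class LogicalFileSystem:
    """Resolves human-readable names to logical blocks via metadata."""
    def __init__(self, fom, directory):
        self.fom = fom
        self.directory = directory   # name -> list of logical block numbers
    def read_file(self, name):
        return b"".join(self.fom.read_logical(lba)
                        for lba in self.directory[name])

# A read request flows top-down through all four layers.
io = IOControl(num_blocks=8)
io.blocks[5] = b"hello".ljust(512, b"\x00")
fs = LogicalFileSystem(
    FileOrganisationModule(BasicFileSystem(io), {0: 5}),
    {"greeting.txt": [0]})
assert fs.read_file("greeting.txt").rstrip(b"\x00") == b"hello"
```

The point of the sketch is the direction of dependency: each layer knows only about the one directly beneath it, which is what allows, for example, the physical device to be swapped out without touching the name-resolution logic.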
The file system layers discussed above describe the interaction between the user data and the hard disk drive. We will now discuss files and directories. These two data types can be used to build up an organisational structure, which is essential to the operation of a file system.
2.4 Basic File System Abstractions
There are two basic properties that are common to every file system:
1. The ability to store information; this is usually achieved by storing
information in files.
2. The ability for files to be organised into a directory structure. This
provides a hierarchical organisation of all the information on disk.
The primary purpose of the file system is to manage the organisation of
files and directories, and implement mechanisms that facilitate the fast and
efficient storage and retrieval of the data on the disk.
2.4.1 Files
The most basic file system object is the file. Giampaolo [23] refers to the fact that all information on the file system is stored in some sort of file. Files generally do not have a system-defined structure, and are viewed by the file system simply as a "stream of bytes" which needs to be written to or read from the disk [23].
The meaning of the data within a file is designed and interpreted by the
creator of the file. Different content-types such as audio, video, or text are
in essence all a stream of bytes, and are all managed in the same way by the
file system. Files usually have a collection of attributes, commonly referred
to as metadata. Simply put, metadata is data about data.
Very little of the information that is contained in the metadata is useful
to the operation of the file system, except metadata indicating the size of
the file. Metadata is used to serve as an interface between the raw data
contained in the file system and the human operator.
2.4.2 Directories
Directories are used to organise data on the disk into an organisational structure. Originally there was no need for a complex hierarchical directory structure, because the small sizes of disk drives prevented the storage of a large number of files. As disk drives grew in size, a need arose to organise files on the disk drive in a logical way to allow the operator to quickly find and access the data. The traditional hierarchical directory structure evolved from this need, as it allows files to be efficiently organised.
A directory can contain sub-directories and files. Love [31] explains that directories in the UNIX file system are simply modified files. UNIX directories are files that contain a list of inodes for associated child sub-directories and files. There are a number of different methods that a file system can use to handle the organisational structure of the file system; each of these methods will have an impact on the overall performance of the file system. As a result, directory structures are usually designed to be managed using an abstract data structure such as a tree to allow for quick traversal of the directory structure. For example, the Linux Ext2 file system uses B-Trees, and the Mac OS HFS+ file system uses B*-Trees (see the discussion on Multiway Search Trees on page 19) to manage its directory structure [23].
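The idea that a UNIX directory is "just a file" containing a list of inode references can be sketched as follows. The record layout here (a 4-byte inode number followed by a fixed-width name) is a simplified illustration of my own, not the on-disk format of any real file system:

```python
# Illustrative sketch: a directory as a file whose contents are a packed
# list of (inode number, name) records. Field sizes are invented.
import struct

ENTRY = struct.Struct("<I28s")   # 4-byte inode number + 28-byte name field

def pack_dir(entries):
    """Serialise (inode, name) pairs into a directory file's contents."""
    return b"".join(ENTRY.pack(ino, name.encode())
                    for ino, name in entries)

def read_dir(data):
    """Parse a directory file back into (inode, name) pairs."""
    out = []
    for offset in range(0, len(data), ENTRY.size):
        ino, raw_name = ENTRY.unpack_from(data, offset)
        out.append((ino, raw_name.rstrip(b"\x00").decode()))
    return out

# "." and ".." conventionally refer to this directory and its parent.
data = pack_dir([(2, "."), (2, ".."), (11, "notes.txt"), (12, "src")])
assert read_dir(data)[2] == (11, "notes.txt")
```

Because the directory is itself stored as a file, the same read and write machinery used for ordinary file data can maintain the directory hierarchy; only the interpretation of the bytes differs.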
In order for files and directories to be organised in a meaningful way, there needs to be a method of referencing the physical location of the underlying data on the storage device. In the following section we will discuss a number of generic file system structures which are used to build up the "structure" of the files and directories on the file system.
2.5 File System Structures
A file system in its most basic form can store, retrieve, and organise information in a logical way. In order for this to be achieved, a set of basic information needs to be maintained. This information usually takes the form of a number of data structures that exist within the file system, which are used to manage, coordinate, and reference data. A file system will have to reference and maintain these file system structures for every operation which can be performed on the data.

Every file system implementation has a different set of data structures that it will maintain. The following generic file system structures are discussed below: the file system descriptor, the storage map used for storage management, file control blocks, and directory entries. These basic structures are found in one form or another on most file systems, although they may differ in design and implementation.
2. 5.1 File System Descriptor
The file system descriptor will contain the most basic set of information that
can be used to describe and reference all other structures within the file
system. When a file system is created, all the basic on-disk structures are
defined and the physical position of these structures on the disk is determined.
Once the on-disk structures have been created the file system will then record
their physical location in the file system descriptor. Traditionally the file
system descriptor is called the superblock within UNIX file systems; it is also
called the Master File Table in the NTFS file system [50). For the purposes
of this dissertation, we will use the term "superblock" to refer to the file
system descriptor.
The superblock must include all attributes of the file system to allow data
to be retrieved. This may include the total number of usable blocks within
the file system, the number of blocks that are currently in use, pointers to
any storage maps, and pointers to any file control blocks [50]. Although
there will be a number of similarities between the design of the superblock
on different file systems, the contents of the superblock are determined by the file system designers.
A common component of most superblocks is some form of consistency information; this is used to mark whether certain operations and structures in the file system have been stored correctly, and to determine if a consistency check needs to be run. The superblock will also contain generic metadata about the file system, such as an identifying name, or any "Magic Numbers"¹.
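As an illustration, the sketch below (purely illustrative; the constant and function names are our own, not taken from any real file system) decodes the example magic number given in footnote 1 into its ASCII representation, and shows how such a constant could serve as a simple consistency check on a superblock field:

```python
# Decode a file system "magic number" into its ASCII representation.
# 0x53424C4B is the illustrative value from footnote 1; real file
# systems define their own constants in their superblock formats.
SUPERBLOCK_MAGIC = 0x53424C4B

def magic_to_ascii(magic: int) -> str:
    """Split a 32-bit magic number into four bytes and decode as ASCII."""
    return magic.to_bytes(4, byteorder="big").decode("ascii")

def check_superblock(magic_field: int) -> bool:
    """A minimal consistency check: does the on-disk field match?"""
    return magic_field == SUPERBLOCK_MAGIC

print(magic_to_ascii(SUPERBLOCK_MAGIC))  # SBLK
```

A mismatching magic field would indicate a corrupted or foreign superblock, and would typically trigger the more intensive consistency checks discussed later in this chapter.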
In the following section we will discuss storage management, which will allow physical disk blocks to be allocated and deallocated within the file system.
2.5.2 Storage Management
All the blocks on a disk which are allocated to a particular file system need
to be mapped in some way. This mapping will allow the file system to record
which physical blocks are currently in use. The file system will use the storage
map in conjunction with an allocation policy to determine where data should
be positioned; this is usually done to minimise fragmentation.
File system blocks can be organised in a number of different ways in
order to facilitate the structure of the data. Listed below are a number of
commonly used methods for organising the physical file system data.
Block - a file system block. A file system block is the smallest unit of data which the file system will process, and can be different from the physical disk block size.
Extent - a number of contiguous file system blocks. An extent will usually be represented by a start address, and a total length.
¹Magic Numbers are simply constant numbers that can be used to identify data structures, to provide a simple method of consistency checking, or to differentiate between versions of data structures. An example of a magic number could be 0x53424C4B, which could be represented in ASCII as SBLK.
Blockrun - another term for an extent.
Allocation Group - a number of contiguous extents or blocks. Usually a file system can be broken up into a number of equal-sized allocation groups. As in the case of Ext3, each allocation group is regarded as a "mini-file system", each with a set of corresponding file system structures.
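To make these units concrete, the following sketch (the structures are hypothetical and not taken from any real file system) shows one way an extent and an allocation group might be represented in memory:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Extent:
    """A run of contiguous file system blocks: start address plus length."""
    start: int   # first file system block number in the run
    length: int  # number of contiguous blocks

    def blocks(self) -> range:
        """Enumerate the block numbers covered by this extent."""
        return range(self.start, self.start + self.length)

@dataclass
class AllocationGroup:
    """A fixed-size region of the disk managed as a 'mini-file system'."""
    first_block: int
    block_count: int
    free_extents: List[Extent] = field(default_factory=list)

# A 4096-block disk split into four equal allocation groups, each
# initially holding a single free extent covering the whole group:
groups = [AllocationGroup(first_block=i * 1024, block_count=1024,
                          free_extents=[Extent(i * 1024, 1024)])
          for i in range(4)]
print(groups[2].first_block)  # 2048
```

Representing free space per allocation group, rather than globally, is what allows each group to be managed independently, as the Ext3 example above suggests.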
In the following sections we will discuss a number of methods that can be
used to manage the file system blocks. These techniques allow the file system
to manage which of the file system blocks are either allocated or unallocated.
Storage Bitmap
The simplest approach for storage management is to use a bitmap. This approach was used in early file system design as it is very simple to implement, and easy to understand. A storage bitmap represents the entire physical device as a linear array of file system blocks, as seen in figure 2.4, with each physical block having a corresponding bit in the storage bitmap. Each bit is either 0 when the block is not allocated, or 1 when the block is allocated [34].
Although a bitmap is a very simple solution to mapping file system blocks,
it can be inefficient. If the bitmap is implemented as a linear array of bits,
then it is subject to the same searching constraints as normal linear arrays.
The "worst-case" running time for searching a linear array can have a time
complexity of O(n) [29].
Figure 2.4: A file system bitmap (shaded blocks are allocated, unshaded blocks are unallocated)
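The bitmap operations described above can be sketched as follows (a toy Python illustration with hypothetical method names; a real file system would operate on raw disk blocks). Note that the first-fit search is a plain linear scan, exhibiting the O(n) worst case just mentioned:

```python
class StorageBitmap:
    """A toy storage bitmap: one bit per file system block (1 = allocated)."""

    def __init__(self, n_blocks: int):
        self.bits = bytearray((n_blocks + 7) // 8)  # packed, 8 blocks per byte
        self.n_blocks = n_blocks

    def is_allocated(self, block: int) -> bool:
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def allocate(self, block: int) -> None:
        self.bits[block // 8] |= 1 << (block % 8)

    def free(self, block: int) -> None:
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF

    def find_free(self) -> int:
        """Linear first-fit scan: O(n) in the worst case, as noted above."""
        for block in range(self.n_blocks):
            if not self.is_allocated(block):
                return block
        raise OSError("no space left on device")

bm = StorageBitmap(64)
bm.allocate(0)
bm.allocate(1)
print(bm.find_free())  # 2
```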
Silberschatz, Galvin, and Gagne [50] explain that modern processors im-
plement "bit-manipulation" instructions which allow the bitmap operations
to be implemented in a very efficient manner, and this allows a bitmap implementation to gain a major performance advantage. However, they also
point out that there will only be an advantage when the bitmap is kept in
memory, and as seen above, this is not always possible because of the storage
requirements of larger hard disk drives.
Multiway Search Tree Implementation
Another approach for storage management is to use a more complex data structure, such as a multiway search tree, for example a B-Tree or a B+Tree. The XFS² file system implementation utilises B+Trees to manage the storage blocks on a physical device. XFS manages disk blocks in allocation groups, using two B+Trees to manage the free space within each allocation group [38]. Both B+Trees store a sorted array of free space extents, where the first is sorted by block offset, and the second is sorted by the size of the extent. This will allow free space to be located near a particular physical block offset [38].
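The idea of maintaining the same set of free extents in two sort orders can be sketched as follows. This is a simplified illustration using sorted lists rather than B+Trees, and is not XFS's actual on-disk format; the point is only that one ordering supports best-fit lookups by size while the other supports locality lookups by offset:

```python
import bisect

# Free extents as (start, length) pairs: a simplified stand-in for the
# two per-allocation-group B+Trees described above.
free_extents = [(0, 4), (10, 16), (40, 2), (100, 8)]

by_offset = sorted(free_extents)                    # keyed by block offset
by_size = sorted(free_extents, key=lambda e: e[1])  # keyed by extent length

def find_by_size(length_needed: int):
    """Smallest free extent that satisfies the request (best fit)."""
    sizes = [e[1] for e in by_size]
    i = bisect.bisect_left(sizes, length_needed)
    return by_size[i] if i < len(by_size) else None

def find_near(offset: int):
    """Free extent starting at or after a target offset (locality)."""
    starts = [e[0] for e in by_offset]
    i = bisect.bisect_left(starts, offset)
    return by_offset[i] if i < len(by_offset) else None

print(find_by_size(6))  # (100, 8)
print(find_near(35))    # (40, 2)
```

In a real implementation both indexes are updated together on every allocation and deallocation, which is why a balanced tree with O(log n) updates is preferred over sorted arrays.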
Allocation Policies
Allocation policies are implemented in most file systems in order to allocate
blocks in the most contiguous way, where all the data relating to a file is
stored as sequentially as possible on the disk. This usually involves trying to
locate a set of contiguous unallocated file system blocks in the storage map.
Smith and Seltzer [51] point out that a typical UNIX file system will have a performance degradation of about 15% after two years of operation, due to file fragmentation. Most allocation policies aim to increase the locality of reference of data in order to minimise seek times, and to improve the overall layout of the data on the disk [34]. FFS³ will simply place data from the same file within the same allocation group [34], thereby increasing locality of reference.
Modern hard drives can be very large; a 500GiB hard drive is not un-
common. A storage map may become extremely large; this must be a con-
sideration because the storage map will need to be searched for free blocks.
Through efficient use of a storage management structure, and an efficient al-
location policy, free space can be located quickly and this can greatly improve
the performance of a file system.
²XFS - a file system implementation created and maintained by Silicon Graphics, Inc. (SGI).
³FFS - the Fast File System for UNIX, used in the 4.3BSD family of operating systems.
In the following sections we will discuss the File Control Blocks and the
Directory Entries as a mechanism for referencing specific files and directories
within the file system.
2.5.3 File Control Block
Figure 2.5: A simple inode (an inode number and file attributes, together with a block list of direct block references mapping to data blocks on the physical disk)
The file control block is one of the most important structures that can exist within the file system, as it is responsible for describing the location of a file on the disk, and for storing any of the file's metadata. Traditionally the file control block is called an "inode" on UNIX systems [34]. The design of an
inode is of critical importance because specific attributes of the file system are
defined by the inodes, such as the maximum amount of disk space that can
be allocated to a single file [23]. With files becoming larger to accommodate
data such as video and audio, inodes need to be designed in such a way that
allows the file system to address large amounts of data.
A simple inode could have a structure similar to that shown in figure 2.5,
where the inode would contain metadata relating to a particular file. Refer-
ences to the physical location of the file data would be stored in a "block list".
A block list is an array of disk blocks where the data an inode references is
located.
To allow file systems to store large amounts of data, a level of "indirection" is introduced into the inode structure. For example, consider a file system that has a block size of 1024 bytes, and assume an inode that directly references eight disk blocks in a single structure. A single inode, and therefore a file, can only reference a maximum of 8192 bytes, or 8KiB, of disk space; this is not nearly sufficient for modern computing.
A file system implementation will introduce a level of indirection into the
structure of an inode, to increase the number of disk blocks that it can refer
to, and as a result increase the maximum file size. An inode will reference
a number of "direct" blocks, and then will have a reference to a number of
"indirect" blocks, which in turn will reference a number of "direct" blocks.
In most cases an inode will also reference "double-indirect" blocks which will
reference a number of "indirect" blocks. In rare cases there can be a third level of indirection, and an inode can reference "triple-indirect" blocks. The relationship between direct, indirect, and double-indirect blocks can be seen in figure 2.6.
Figure 2.6: An inode with levels of indirection (direct blocks reference data blocks directly; an indirect block references a further set of direct blocks; a double-indirect block references a set of indirect blocks)
The levels of indirection can dramatically increase the maximum file size. For example, again assuming a file system block size of 1024 bytes, and assuming an inode that references eight "direct" blocks and eight "indirect" blocks, each of which will in turn reference eight direct blocks: the direct blocks would reference 8KiB, and each of the eight indirect blocks would reference a further 8KiB, so the total amount of disk space that a single inode could reference would be 72KiB. This could be increased in a similar way by using "double-indirect" blocks.
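This arithmetic generalises easily. The following sketch (the function is our own, not part of any file system) computes the maximum file size for a given inode layout; note that the number of pointers per indirect block is a parameter of the example, since a real 1024-byte block could hold far more than eight pointers:

```python
def max_file_size(block_size: int, n_direct: int,
                  ptrs_per_indirect: int, n_indirect: int = 1,
                  n_double_indirect: int = 0) -> int:
    """Maximum bytes addressable by one inode with the given layout."""
    blocks = n_direct
    blocks += n_indirect * ptrs_per_indirect
    # Each double-indirect block references a full set of indirect blocks.
    blocks += n_double_indirect * ptrs_per_indirect * ptrs_per_indirect
    return blocks * block_size

# The chapter's example: 1024-byte blocks, eight direct blocks, and eight
# indirect blocks each referencing eight direct blocks.
size = max_file_size(block_size=1024, n_direct=8,
                     ptrs_per_indirect=8, n_indirect=8)
print(size // 1024)  # 72 (KiB)
```

The quadratic contribution of double-indirect blocks is why even modest levels of indirection let an inode address files orders of magnitude larger than its direct blocks alone.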
UNIX file systems will store inodes in a "table of inodes" that will be
located somewhere on the disk. Inodes can be stored in one large single ta-
ble, but most UNIX file systems will store inodes for a particular allocation
group in a separate table. This would improve performance of the file sys-
tem by increasing locality of reference between the file and any associated
management structures. The NTFS file system manages file metadata in a different way: the Master File Table stores all the metadata for a file in a relational database, with an entry for each file on the system [50]. This is a more complex method of handling file metadata, but has all the advantages of a relational database, such as easy indexing and complex queries.
In the following section we will discuss the Directory Entries as a method
for managing a hierarchical directory structure, which is vital for the man-
agement of data on the physical disk.
2.5.4 Directory Entries
Directory Entries are used to manage the directory structure on the disk. Different file system implementations will use different types of abstract structures to manage the file system directory hierarchy. Directory structures can be maintained using anything from simple arrays to complex trees, each offering a different set of advantages to the overall directory structure.
The simplest approach is to view directory entries as a special type of file; the directory entries themselves are simply a linear list of the sub-directories and files that exist within the current working directory [23]. This approach can become inefficient when a directory contains a large number of files.
Giampaolo [23] points out that another approach is to store directory entries in a tree structure such as a B-Tree, B+Tree or a B*Tree. Carrano and Savitch [9] explain that a B-Tree is a balanced multiway search tree of order m, where each node in the tree can have up to 2m children. Giampaolo [23] goes on to explain that a B-Tree will allow the file system to store a key for each directory entry, which will allow for the directory structure to be traversed quickly.
Every file system, regardless of structure, requires a root directory, usually referred to simply as the root, which contains a number of files and directories. The root is represented by a forward slash ('/') on UNIX systems, and by a letter designation ('C:\') on Microsoft Windows platforms. It serves as a "mounting point" from which the rest of the directory structure can be referenced.
In order for the operating system to interact with the file system implementation, a number of file system operations must be supplied by the operating system. These operations allow access to the file system structures
and the data which is stored on the storage device. A number of generic file
system operations are discussed in the following sections.
2.6 File System Operations
The file system needs to provide mechanisms to manage data that it contains;
this is achieved through the use of a number of operations which the file
system provides. The most basic of the file system operations are the read and write operations; all other file system operations are simply combinations of read and write operations.
2.6.1 POSIX Compliance
The Portable Operating System Interface (POSIX) is a standard that is
maintained by the IEEE and The Open Group, which allows for interoper-
ability between different operating systems. This is achieved by requiring
compliant operating systems to implement a standard set of system calls
and system utilities [24]. Part of the requirements for POSIX compliance is the implementation of standard File and Directory Operations, which are
discussed below. The interested reader is referred to the Single UNIX Spec-
ification [24] for more information regarding the POSIX interface.
2.6.2 Read and Write Operations
The read and write operations are the two most basic operations that the
file system must support. Both the read and write operations will handle the
translation from the logical addressing used in normal file system operations
to the physical addresses on the hard disk drive. Giampaolo [23] points
out that all file systems need to implement these low-level operations and
furthermore implement more advanced features that extend the functionality
of the file system.
File system operations can be divided into different categories, namely
System Operations, File Operations, and Directory Operations. System op-
erations provide the operating system access to the file system. File and
directory operations act on the data within the file system. These different
types of operations are discussed in the following sections.
2.6.3 System Operations
The file system must support a number of basic system operations that manage the creation of the file system (called initialisation), the initial access of the file system (called mounting), and the shutdown of the file system (called unmounting). Each of these operations is described below.
Initialisation
The initialisation operation controls the creation of the file system. This
operation is responsible for the creation and set-up of all the file system
structures that are going to be used during normal file system operation.
The superblock, any storage maps, file control blocks, and all associated file
system information is gathered and stored on the storage device where the
file system will be located [23]. The root directory also needs to be created
during this operation, which will allow data to be created and accessed at a
later stage.
Once all the file system structures have been created, the location of the
structures is recorded in the superblock, and from this point the file system is
ready to be used. As Giampaolo [23] points out, the initialisation of modern
file systems is done by user programs, and not the file system itself.
Mounting
The mounting operation is performed whenever the file system is initially
exposed to the operating system. During this operation, the superblock is
read into primary memory, and any required access control mechanisms will
be created. The operating system will be able to access the storage maps,
and the file metadata.
The mount operation will usually attempt to run a consistency check on
the structure of the file system. Should a basic consistency check fail, then
a more intensive check will be performed on the file system. A consistency
check is usually performed on the file system if it was not unmounted cleanly
from the operating system.
Unmounting
The unmounting operation will cleanly detach the file system from the op-
erating system and release any resources the file system is utilising. The
unmount operation will flush any blocks that are waiting in the block cache
to the storage media, and update any of the file system structures that have
been changed during the normal operation of the file system.
Once the file system's structures and data have been written to disk, the
unmounting operation will mark a flag in the superblock to indicate the file
system was unmounted cleanly, and finally the superblock is flushed to disk.
We will now discuss the file operations which are used to support the storage
and retrieval of files from the file system implementation.
2.6.4 File Operations
The file system needs to support a variety of operations which are used to
perform a number of actions on files. The reading and writing operations will
extend the file system's read and write operations and will operate directly on
the block space where the file exists. The create(), delete() and open()
file operations will operate on the file's metadata. By combining operations,
the file system can create complex operations, such as a move or a rename
operation. A file can be regarded as an abstract data type and as such the
file system needs to provide generic operations that will act on files regardless
of their internal structure [50]. These generic file operations are described
below.
Creating a file
Silberschatz, Galvin, and Gagne [50] state that there are two steps involved
in creating a new file. Firstly, allocating space on the hard disk drive for the
file, and secondly, modifying the directory entry where this file will exist in
order to reflect the new file.
Allocating space for the file involves finding unallocated space in the
storage map to house the file, then modifying the storage map in order to
reflect that the storage space is now allocated, and then allocating or creating
a file control block for the new file. Once the file has been allocated, the file's
parent directory entry must then be modified to reflect the new file. Once all
the metadata has been created and written to the disk, data can be written
to the underlying storage device.
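The two steps described above can be sketched in miniature. This is an in-memory illustration only; the structure and function names are hypothetical, and a real file system would perform these updates against on-disk structures:

```python
# A miniature in-memory sketch of the two file-creation steps:
# (1) allocate space in the storage map and build a file control block,
# (2) record the new file in its parent directory entry.
free_blocks = set(range(16))   # storage map: unallocated block numbers
inodes = {}                    # inode number -> file control block
directory = {}                 # filename -> inode number
next_inode = [1]

def create_file(name: str, n_blocks: int) -> int:
    if name in directory:
        raise FileExistsError(name)
    # Step 1: find unallocated space and mark it as allocated.
    allocated = sorted(free_blocks)[:n_blocks]
    if len(allocated) < n_blocks:
        raise OSError("no space left on device")
    free_blocks.difference_update(allocated)
    ino = next_inode[0]
    next_inode[0] += 1
    inodes[ino] = {"size": 0, "blocks": allocated}  # the file control block
    # Step 2: modify the parent directory entry to reflect the new file.
    directory[name] = ino
    return ino

ino = create_file("notes.txt", 3)
print(directory["notes.txt"], inodes[ino]["blocks"])  # 1 [0, 1, 2]
```

Only after both steps complete can data safely be written to the allocated blocks, which is why real file systems take care over the ordering of these metadata updates.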
Deleting a file
Removal of a file from the file system is the reverse of file creation. There are again two steps involved: unallocating the storage space where the file existed, and removing any metadata related to the file. The storage map then needs to be modified; unlike file creation, storage space now needs to be marked as unallocated, which will allow the file system to reallocate the storage space to any new files that are created at a later stage.
Removal of the metadata belonging to the file involves removing the file
reference from the directory entry, and removing, or unallocating, the file
control block which references the file.
Most file systems do not remove the actual file data, but simply allow the blocks where the file existed to be overwritten at a later stage. This is done for performance reasons: removing only the file metadata is a much less intensive operation than removing all of the file's data. However, this is a very insecure method of removal; when this deletion method is in use, it is possible for deleted data to be recovered from the file system [10].
More secure file systems "zeroize"⁴ the blocks where the file existed, making it more difficult for the information to be recovered at a later stage. However, this will greatly decrease the performance of a file system.
Opening a file
Opening a file is usually achieved through a system call that will instruct
the operating system to access the file system and create a pointer to the
file. Any controls on the file, such as whether a file is read-only or not,
will be controlled through the operating system's interface. A file is usually
opened using a POSIX-compliant open() command on UNIX systems. The
operating system will then request the file system to return the relevant
metadata for the file. A file pointer will then be created which will allow
user processes to interact with the file.
⁴"Zeroize" refers to the process of writing zero (0x00) to every byte where a file existed.
The operating system maintains a table of all open files and the associated
control mechanisms. Should a process request a file pointer that does not
point to a valid file, or points to a closed file, the operating system will
generate an error, which will be passed to the process.
Reading a file
The file read operation is required to read the data from the underlying
device and store the result in a buffer that will be returned to the invoking
process.
The read() operation is POSIX-compliant on UNIX systems. The file read operation that is provided by the operating system is simpler than the file write operation, because none of the on-disk structures are modified [23].
The operating system will store a position indicator for each open file, which will be used by the read operation to retrieve data, and also to indicate if the end of the file has been reached. This allows the operating system to
read streams of data.
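This read loop can be illustrated using Python's os module, which wraps the corresponding POSIX calls (the file path and chunk size below are arbitrary choices for the example):

```python
import os
import tempfile

# Write a small test file, then read it back through the POSIX-style
# os.open()/os.read() wrappers. The kernel advances the file position
# after each read; a return value of b"" signals end of file.
path = os.path.join(tempfile.mkdtemp(), "example.dat")
with open(path, "wb") as f:
    f.write(b"abcdefghij")

fd = os.open(path, os.O_RDONLY)
chunks = []
while True:
    chunk = os.read(fd, 4)   # read up to 4 bytes from the current offset
    if chunk == b"":         # the position indicator reached end of file
        break
    chunks.append(chunk)
os.close(fd)
print(chunks)  # [b'abcd', b'efgh', b'ij']
```

Note that the caller never supplies an offset: the per-open-file position indicator maintained by the operating system is what turns repeated read() calls into a stream.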
Writing a file
The file write operation is more complex than the read operation, because it
needs to handle many different situations, for instance, appending data to a
file may require that the file's metadata be expanded, and as Giampaolo [23]
points out, even modifications to the superblock may be required.
The write() operation is a POSIX-compliant system operation. The most basic form of the write operation allows data to be written to an existing file; should a file not exist, it will then be created by the operating system.
This operation needs to handle many different situations, such as when
the file needs to grow beyond the size of the blocks that are currently allocated
to it; the file system will then need to allocate more blocks to the file. This
process needs to be handled with care, as many of the file system's structures
may need to be modified. Firstly the file system needs to find space in the
storage map and mark the blocks as allocated. The file's control block will
need to be modified, specifically the direct and indirect block addresses in
order to allocate file system blocks to the file. The file's metadata will then
need to be updated to reflect the new size of the file.
In the following section we will discuss directory operations. Directories are most easily viewed as special types of files, and thus directory operations are closely related to file operations in design.
2.6.5 Directory Operations
Directory operations are very closely related to file operations because both sets of operations act on the same generic type of data; however, there are differences in the way in which files and directories need to be handled. Operations such as creating a directory are generally more complicated because of the hierarchical directory structure that needs to be maintained.
Creating a directory
Giampaolo [23] argues that creating a directory is a more complex operation
than creating a file. In UNIX systems both files and directories have inodes
to store metadata; different file systems will use similar methods to store
metadata. As a result, the creation or allocation of an inode is very similar
to the creation of a file. However, a directory needs to be initialised, and
the more complex the structure used to manage the directory hierarchy, the
more complex this initialisation will be.
Together with initialising the directory structure for the newly created
directory, the parent directory entry will also need to be modified in order
to maintain the hierarchy correctly. This operation needs to be handled with care because it is fundamental to the creation of a hierarchical file system [23].
Deleting a directory
The directory deletion operation is very similar to the operation for deleting a file; however, care needs to be taken to manage the items which the directory contains. This is an implementation-dependent approach, as every file system will handle this situation in a very different way. The most common solution is to only allow a directory to be deleted if it has no dependencies. Another solution is to recursively delete anything that is contained in the directory; this can be a very expensive and time-consuming operation if the parent directory contains many other files and directories.
Opening a directory
The directory open command is fairly simple: the operating system will request the file system to open the directory using the POSIX opendir() command. Just as the open() command provides access to the contents of a file, the opendir() command must provide access to the contents of the directory [23]. Internally this operation needs to provide a mechanism for the operating system to access the directory entry that refers to a directory. Again, if a simple internal directory structure is used, then this is a fairly trivial operation.
Reading a directory
As Giampaolo [23] discusses, the operation which reads the contents of a directory operates together with the directory open command to provide the directory listing, usually achieved by issuing a POSIX-compliant readdir() command. The main purpose of the directory reading operation is to provide a convenient method of enumerating the directory contents.
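The opendir()/readdir()/closedir() loop can be illustrated using Python's os.scandir(), which exposes the analogous interface (the directory contents below are constructed purely for the example):

```python
import os
import tempfile

# Build a small directory to enumerate, then list its entries in a loop
# analogous to the POSIX opendir()/readdir()/closedir() sequence.
d = tempfile.mkdtemp()
for name in ("a.txt", "b.txt"):
    open(os.path.join(d, name), "w").close()
os.mkdir(os.path.join(d, "sub"))

entries = []
with os.scandir(d) as it:      # opendir()
    for entry in it:           # each iteration ~ one readdir() call
        entries.append((entry.name, entry.is_dir()))
# closedir() happens implicitly when the with-block exits

print(sorted(entries))  # [('a.txt', False), ('b.txt', False), ('sub', True)]
```

As in the POSIX interface, the entries are yielded one at a time in no guaranteed order; any sorting is the caller's responsibility.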
Writing a directory
The directory writing operation does not manifest itself as a single operation
in most operating systems, and refers to the processes involved in updat-
ing a directory entry to reflect a newly created entity, such as a file or a
sub-directory. This operation again varies in complexity depending on the
underlying structure of the directory entries, the more complex the structure,
the more complex the handling of the directory entries. Such is the case in
a system that utilises a complex structure such as a B-Tree; the directory
write operation will need to ensure that the tree remains balanced, and as
a result may need to perform rotations, and rebalancing operations on the
directory structure.
Most modern operating systems aim for a level of interoperability between
multiple file system implementations. In the following section we will dis-
cuss the Virtual File System as a method for providing an abstraction layer
to allow multiple file systems to transparently interact within an operating
system.
2.7 Virtual File System
The goal of most modern operating systems is to provide the ability for its
users to access a large variety of different file systems, thus allowing maxi-
mum interoperability between systems. A Virtual File System (VFS) is an
Figure 2.7: Virtual file system overview (the VFS sits between the operating system kernel and the individual file system implementations)
abstraction layer that sits between a file system implementation and the op-
erating system kernel. The VFS provides a generic interface that is utilised
by the kernel, and then relays the commands to the file system implementation. The operating system does not differentiate between files on different file systems, because all operations are performed indirectly through the VFS. The VFS is sometimes called the Vnode layer.
The VFS can be easily extended to include new file systems, and this does not require modification to the kernel of the operating system. In order for the VFS to utilise a file system, an interface between the open(), read(), and write() commands of the VFS and the file system's corresponding commands must be provided. This allows an operating system to transparently access many different file systems regardless of their implementation, as outlined in figure 2.7.
An extension to the VFS layer is the concept of Vnode stacking [26, 61], which allows modules to be inserted into the VFS interface to transparently extend the abilities of a file system. Function calls that are passed through the VFS layer are then passed through any number of stackable layers until the actual file system implementation interacts with the disk. An example of such a stackable file system is WrapFS [61], which wraps onto a directory on an existing file system, and can be used to provide additional features such as transparent encryption or compression. In the following section we will discuss the Filesystem in Userspace.
2.8 Filesystem in Userspace (FUSE)
Filesystem in Userspace (FUSE) [53] is an extension of the operating system
kernel which allows a file system to be implemented in userspace. This allows
file systems to be easily developed without having a direct interface with the
operating system kernel. FUSE is currently part of the Linux kernel, and is available on a number of platforms, such as FreeBSD and Mac OS X. File systems implemented using FUSE are highly portable and can be used on any operating system which includes the FUSE kernel extension.
FUSE file system implementations communicate with the operating sys-
tem kernel through the FUSE libraries, which will in turn communicate with
the kernel's VFS layer. FUSE file systems allow the easy expansion of the
overall functionality of the kernel.
FUSE can be used to create a powerful file system implementation, such as the NTFS-3G driver. This driver offers a fully stable NTFS file system implementation which operates completely in userspace.
The interaction between the kernelspace and userspace does introduce a
performance impact, as such, FUSE file systems are not considered to be as
efficient as native implementations. However, steps can be taken to maximise
performance and produce a high performance file system. In the following
section we will present a summary of this chapter.
2.9 Summary
In this chapter we covered the following topics:
The Disk - in which we described the layout and organisation of a
physical device on which a file system structure is created.
File System Layers - where we discussed a number of conceptual
layers which can be used to describe the component parts of a file
system implementation.
File System Abstractions - where we discussed the basic abstract stor-
age containers for a file system, namely:
- Files - as a container for storing raw streams of bytes.
- Directories - as a container for creating a hierarchical organisa-
tional structure containing files and other directories.
File System Structures - in which we discussed a number of file system control structures which are used to house the file system metadata, namely:
- File System Descriptor - which stores file system metadata, such as the file system size.
- Storage Map - which is used to mark allocated and unallocated file system blocks.
- File Control Blocks - which store metadata concerning the files.
- Directory Entries - which store metadata concerning the directories.
File System Operations - which describe the operations which are performed by the file system in order to manipulate metadata, files and directories, namely:
- POSIX Compliance - a set of standard operations which allow the file system implementation to interface with different operating systems.
- Read and Write Operations - the operations provided by the operating system, which allow the file system implementation to interact with the physical device.
- System Operations - operations which interact with the file system metadata.
- File Operations - operations which interact with files.
- Directory Operations - operations which interact with directories.
Virtual File System an abstraction layer provided by the operating
system which allows the interaction of multiple file system implemen-
tations.
Filesystem in Userspace- an operating system kernel extension which
allows file system to be implemented in userspace.
2.10 Conclusion
In this chapter we discussed the basic concept of a file system. We discussed
the structure of the low-level storage media. We then went on to introduce
how a file system can be conceptualised as a number of interacting layers,
which can be used to control the flow of data through the file system, even-
tually resulting in the permanent storage of the data on the storage media.
The discussion continued with the introduction of the file system struc-
tures that are used to control the storage and retrieval of the data on the
disk. We then commented on some of the operations that are found in the
file system in order to act on the stored data. Lastly we discussed the Virtual File System and the Filesystem in Userspace, which allows a file system to be implemented in userspace.
This chapter, along with chapter 3 and chapter 4, forms the foundation which is used in later chapters. Concepts introduced in this chapter are used
extensively throughout chapters 6 and 7 in order to describe the component
parts of the steganographic file system.
Many different types of information systems rely on cryptography to pro-
vide a level of information security. In the following chapter we will cover
many different aspects relating to cryptography which will be referred to
throughout the remaining chapters.
Chapter 3
Cryptography
3.1 Introduction
Cryptography plays a very important role in modern society. With an ever
increasing amount of personal information being stored on computer systems,
and transmitted over the Internet, mechanisms need to be in place to ensure
that this data remains secure. This need to secure information has always
played an important role in human history; from the simple substitution
ciphers made famous by the Romans, to a new age of quantum cryptography,
there has always been a need to keep information secure.
In this chapter we discuss some basic cryptographic principles and then go
on to further discuss some specific cryptographic techniques and algorithms.
Firstly, in section 3.2 we discuss some basic cryptographic terms that will be used to form a basis for the cryptographic techniques that are discussed in the later sections. We go on in section 3.3 to discuss symmetric encryption techniques by introducing some basic theory, and then go on in sections 3.3.2 and 3.3.3 to discuss the DES and Serpent algorithms respectively.
DES is a good example of a cryptosystem, because it has been in use
for a number of years and its properties are fully understood. Serpent is
a modern cryptosystem which was created using elements of the DES algo-
rithm. Although there are many different encryption algorithms; DES and
Serpent provide a complete overview of the design elements of symmetric
cryptosystems.
We then go on to discuss two different block cipher modes in section 3.4,
namely electronic codebook mode in section 3.4.1, and cipher block chaining
mode in section 3.4.2. This is then followed by a comparison of these two
techniques in section 3.4.3. These two techniques provide a good understand-
ing of block cipher modes. All other block cipher modes extend the basic
principles that will be discussed.
A discussion of asymmetric encryption is then presented in section 3.5,
again presenting some theory, which is then followed by a brief discussion of
RSA encryption in section 3.5.1. Finally, in section 3.6 we introduce cryptographic hash functions with a discussion of message integrity codes in section 3.6.1 and message authentication codes in section 3.6.2. We continue
this discussion with the Birthday Attack in section 3.6.3, and conclude this
chapter with a discussion of the SHA-1 algorithm in section 3.6.4.
Although there are many different cryptographic systems in use today, the ones that will be discussed in the following sections provide a good understanding of the principles that are employed in many different cryptosystems.
3.2 Basic Concepts
Throughout this chapter certain terms will be used in order to describe the
discussed cryptographic systems. These terms are in line with those outlined
by Schneier [46], and are described in table 3.1.
Plaintext     - The unencrypted message or data
Ciphertext    - The encrypted message or data
Cryptosystem  - The cryptographic system used for encryption
Key           - The key that is used to facilitate encryption

Table 3.1: Basic cryptographic terms
A cryptosystem is simply a collection of mathematical functions that allow plaintext to be obfuscated into ciphertext; this process is called encryption. A reverse function is usually defined that can return the ciphertext to the original plaintext; this process is called decryption. In order to control the process and to provide security, a key is provided. To ensure that the message can only be decrypted by an authorised person, the key or part of the key needs to remain secret.
Along with the basic terms used to describe a cryptosystem, a set of basic
functions are also defined in order to mathematically describe a cryptosystem.
These basic functions are described in table 3.2.
P     - Plaintext
C     - Ciphertext
E()   - Encryption function
D()   - Decryption function
Ek()  - Encryption function using key k
Dk()  - Decryption function using key k

Table 3.2: Basic cryptographic functions
In the following sections we will discuss Symmetric Encryption and Asym-
metric Encryption. These two encryption schemes provide the basis for the
standard encryption algorithms that are in use today. We will then go on to
discuss different algorithms for each of the two schemes.
3.3 Symmetric Encryption
Symmetric encryption refers to the family of cryptosystems that utilise a
"shared secret" approach to data encryption. The shared secret usually takes
the form of an encryption key, or a passphrase that is used to control the
encryption process.
Symmetric cryptosystems have two different forms, namely stream ciphers
and block ciphers. Stream ciphers are used to encrypt a single character at a
time, opposed to block ciphers that are used to encrypt a block or a number of
characters at a time. We will only be concerned with block ciphers because
of the prominent role they play in cryptographic and steganographic file
systems, which will be discussed in later sections.
A block cipher that encrypts 128-bits at a time is said to have a 16-byte block size. Symmetric cryptosystems have the form seen in equation 3.1, and as seen in figure 3.1.

    Ek(P) = C
    Dk(C) = P        (3.1)
Symmetric cryptosystems have an inherent weakness; the key has to re-
main secret for the cryptosystem to be effective. If the key is compromised
in any way then the validity of the encrypted data can no longer be assured.
Figure 3.1: Symmetric encryption
Therefore all parties involved in the encryption process need to take adequate
steps to ensure that the encryption key remains secure.
Schneier [46] outlines how a symmetric cryptosystem would be used to
encrypt and decrypt data:
1. Alice and Bob agree on a cryptosystem that will be used to encrypt
the data.
2. Alice and Bob agree on a key k.
3. Alice encrypts the data with the selected cryptosystem and the key k.
4. Alice sends the encrypted data to Bob.
5. Bob decrypts the data using the selected cryptosystem and the key k.
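The five steps above can be sketched with a toy XOR cipher standing in for the agreed cryptosystem. Everything here (the cipher, the key, the message) is purely illustrative; a repeating-key XOR is of course not a secure cryptosystem.

```python
# A toy symmetric cryptosystem: E_k and D_k are the same XOR
# operation. Illustrative only; NOT a secure cipher.
from itertools import cycle

def toy_encrypt(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

toy_decrypt = toy_encrypt          # XOR is its own inverse

# Steps 1-2: Alice and Bob agree on the cryptosystem and a key k.
k = b"shared-secret"
# Step 3: Alice encrypts the data with the cryptosystem and key k.
ciphertext = toy_encrypt(b"meet at noon", k)
# Steps 4-5: Alice sends the ciphertext; Bob decrypts with the same k.
print(toy_decrypt(ciphertext, k))  # -> b'meet at noon'
```

Because the same key k is used for both operations, step 2 (agreeing on k securely) is the most fragile part of the protocol, as noted above.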
Now that the principles of symmetric encryption have been discussed, we will move on to substitution boxes. Substitution boxes are used in many different symmetric cryptosystems as a method of securely encrypting data, as in the DES algorithm discussed later.
3.3.1 Substitution Boxes
Substitution boxes, or S-Boxes, are used in block ciphers to perform a substitution of bits; it is argued that S-Boxes are what give block ciphers their security, because the substitution is a non-linear step in the encryption process [46]. S-Boxes are discussed because of the important role they play in
Figure 3.2: DES S1 substitution box [18] (showing the bit block, outer bits, inner bits, and output bits)
symmetric cryptosystems, specifically in the DES and Serpent algorithms that will be discussed below. An S-Box can be represented as a function resembling equation 3.2, where S1 is a substitution box, bi are the inner bits, bo are the outer bits, and bsub is the result of the substitution.

    bsub = S1(bo, bi)        (3.2)
In figure 3.2, S1 represents the first S-Box from the DES block cipher. This S-Box is implemented as a 4 x 16 matrix; each of the rows represents a number from 0 to 3, and each column represents a number from 0 to 15. The original bit block is a binary number consisting of 6 bits, where b0, b1, ..., b5 represent each individual bit, and b0 is the least significant bit.
The complete original bit block in figure 3.2 is 001101b. The outer bits are obtained by selecting b5 and b0; the most significant bit and the least significant bit. In this case the outer bits form the number 01b. The inner bits are obtained by selecting b1, b2, b3, and b4; the bits between the most significant and least significant bits. The inner bits in this case form the number 0110b.
The substitution bits would be the result of the function S1(01b, 0110b), which would produce 1101b. The result of this substitution would be combined with other substitution operations to form a step in the encryption process.
Substitution using S-Boxes is an integral step in the encryption process of both the DES and Serpent algorithms discussed below.
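The S1 lookup in the worked example above can be sketched in Python, using the S1 table from FIPS PUB 46-3 and the bit-numbering convention of figure 3.2 (b0 is the least significant bit):

```python
# DES S-Box S1, from FIPS PUB 46-3: a 4 x 16 table indexed by the
# (outer bits, inner bits) of a 6-bit input block.
S1 = [
    [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7],
    [0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8],
    [4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0],
    [15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13],
]

def s1_substitute(block6: int) -> int:
    """Substitute a 6-bit block using S1.

    The outer bits (most and least significant) select the row;
    the four inner bits select the column.
    """
    outer = ((block6 >> 5) << 1) | (block6 & 1)   # row: 0..3
    inner = (block6 >> 1) & 0b1111                # column: 0..15
    return S1[outer][inner]

# The worked example from figure 3.2: input 001101b.
# Outer bits 01b (row 1), inner bits 0110b (column 6) -> 1101b.
print(format(s1_substitute(0b001101), '04b'))  # -> 1101
```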
The DES algorithm was one of the first to be widely adopted by institu-
tions to secure data. Although other encryption algorithms existed, there was
no standardisation which limited the commercial use of cryptography. The
acceptance of DES as a standard allowed for more commercial applications
of cryptography.
The Serpent algorithm that is also discussed below uses principles of the
DES algorithm to create a cryptosystem that meets the needs for modern
data security. Serpent was an AES finalist, and as such was designed to
provide a fast and secure data encryption algorithm. Both DES and Serpent
are good examples of cryptosystems that are widely in use today.
3.3.2 Data Encryption Standard (DES)
History of DES
Schneier [46] explains that in the early 1970s the use of non-military cryptography was not standardised; although a number of cryptographic algorithms existed, they were all different and could not be used to interchange encrypted data, which limited their commercial use. The Data Encryption Standard (DES) became a United States federal standard in 1973, and was used to encrypt "non-classified" government data. The American National Standards Institute (ANSI) then adopted DES for commercial use, and eventually many different industries started to utilise DES as the preferred method for securing data. The DES algorithm itself was derived from the IBM Lucifer algorithm that was developed in the early 1970s [52].
DES Encryption Algorithm
The full DES algorithm is outlined in the Federal Information Processing Standards Publication 46-3 [18], and operates on 64-bit blocks using a 64-bit key. In order to encrypt a block of plaintext, a number of operations are applied to the block in order to produce the ciphertext block. Firstly a block of plaintext is permuted using what is known as the Initial Permutation (IP), which simply rearranges the bits of the plaintext block. This permuted block then goes through 16 iterations of a key-dependent calculation in order to obtain the preoutput, which is then finally permuted using the Inverse Initial Permutation. This can be seen graphically in figure 3.3.
The basic encryption algorithm for a single block of plaintext is shown in equation 3.3. The 64-bit plaintext block is broken up into two 32-bit blocks,
Figure 3.3: DES encryption algorithm flow
called L and R, representing the left and right 32-bits of the plaintext. For each iteration of the encryption function, L and R are combined with a unique key (Kn) to generate a new L and R which are used in subsequent iterations to finally produce the preoutput for the current block.
For a description of the notation used below, please refer to "Notation"
on page xiii.
    Kn = KS(n, KEY)
    Ln = Rn-1
    Rn = Ln-1 ⊕ f(Rn-1, Kn)        where n = 1, 2, ..., 16        (3.3)
In equation 3.3, KS is the "key schedule" function, which is used to produce the unique key Kn for the current iteration (n). The "cipher function" (f) is used to encrypt a 32-bit block using the unique key. Ln and Rn are the blocks generated for the current iteration. The key schedule and the cipher function will be discussed below. The complete DES encryption algorithm can be seen in algorithm 1.
The cipher function (f) and the key schedule function (KS) will now be
discussed below. These two functions are discussed because of the important
role that they play in the functioning of the DES algorithm. The inter-
ested reader is referred to FIPS PUB 46-3 [18] for more detailed information
regarding these and other elements of the DES algorithm.
The Cipher Function (f)
The cipher function is used during each of the 16 iterations of the DES
encryption algorithm. The cipher function accepts two input blocks, a 32-
bit block and a 48-bit key, and will produce a 32-bit ciphered block. Firstly
the cipher function creates a 48-bit block from the 32-bit input block through
Input: inputBlock - a block of plaintext.
Input: key - a secret key.
Output: outputBlock - a block of ciphertext.

LR ← InitialPermutation(inputBlock);
L0 ← left 32 bits of LR (bits 63-32);
R0 ← right 32 bits of LR (bits 31-0);
for n ← 1 to 16 do
    Kn ← KeySchedule(n, key);
    Ln ← Rn-1;
    Rn ← Ln-1 ⊕ CipherFunction(Rn-1, Kn);
end
outputBlock ← InverseInitialPermutation(L16, R16);

Algorithm 1: DES Encryption Algorithm [18]
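The Feistel structure of Algorithm 1 can be sketched with hypothetical stand-ins for the key schedule and cipher function; the point of the sketch is that this structure decrypts by running the rounds in reverse, whatever f is. It is a structural sketch only, not real DES: the real permutations, S-Boxes, E function and key schedule from FIPS PUB 46-3 are omitted.

```python
# A generic 16-round Feistel network in the shape of Algorithm 1.
# toy_ks and toy_f are hypothetical stand-ins, NOT the DES functions.
def toy_ks(n: int, key: int) -> int:
    return (key * n) & 0xFFFFFFFF           # hypothetical round key

def toy_f(r: int, k: int) -> int:
    return (r ^ k) & 0xFFFFFFFF             # hypothetical cipher function

def feistel_encrypt(block64: int, key: int, rounds: int = 16) -> int:
    left = (block64 >> 32) & 0xFFFFFFFF
    right = block64 & 0xFFFFFFFF
    for n in range(1, rounds + 1):          # Ln = Rn-1; Rn = Ln-1 ^ f(...)
        left, right = right, left ^ toy_f(right, toy_ks(n, key))
    return (left << 32) | right

def feistel_decrypt(block64: int, key: int, rounds: int = 16) -> int:
    left = (block64 >> 32) & 0xFFFFFFFF
    right = block64 & 0xFFFFFFFF
    for n in range(rounds, 0, -1):          # apply round keys in reverse
        left, right = right ^ toy_f(left, toy_ks(n, key)), left
    return (left << 32) | right

c = feistel_encrypt(0x0123456789ABCDEF, 0xDEADBEEF)
print(hex(feistel_decrypt(c, 0xDEADBEEF)))  # -> 0x123456789abcdef
```

Each round only XORs the output of f into one half, so decryption never needs to invert f itself; this is what allows DES to use a non-invertible cipher function.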
the use of what is referred to as the E function - this is simply a number of bit selections on the original block to produce a new permuted 48-bit block.
The new 48-bit block is then added to the 48-bit key using bitwise modulo-2 addition (XOR); this will produce a single 48-bit block. This 48-bit block is then broken up into eight 6-bit blocks, which are each passed through one of eight Substitution Boxes to produce eight 4-bit blocks. These eight 4-bit blocks are then combined to form a single 32-bit block, which is then passed through the permutation function, which is simply a permutation of the bits of the block, to produce the final 32-bit output block for the current iteration.
The Key Schedule Function (KS)
The purpose of the key schedule function is to generate a key for each of the 16 iterations of the DES encryption algorithm. The key schedule algorithm firstly creates two 28-bit blocks, called C and D. This is achieved using the Permuted Choice 1 function. Depending on the current iteration of the cipher function, C and D are left-shifted either one or two places. Again depending on the current iteration of the cipher function, C and D will either go on to another round of shifting or be passed to the Permuted Choice 2 function, which will form the completed 48-bit key to be used within the cipher function. For a more comprehensive description of the functions used within the DES encryption algorithm, the interested reader is referred to FIPS PUB 46-3 [18].
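As a small illustration of the shifting step, the per-round left-shift amounts given in FIPS PUB 46-3, together with a 28-bit rotation helper (the helper name is ours, not from the standard), can be written as follows:

```python
# Left-shift amounts for the 16 rounds of the DES key schedule,
# as listed in FIPS PUB 46-3.
SHIFTS = [1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1]

def rotate_left_28(value: int, amount: int) -> int:
    """Rotate a 28-bit register (C or D) left by `amount` bits."""
    return ((value << amount) | (value >> (28 - amount))) & 0x0FFFFFFF

# The shifts total 28, so after all 16 rounds the C and D registers
# have rotated through a full cycle and return to their start values.
print(sum(SHIFTS))  # -> 28
```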
The DES algorithm is a relatively slow method of encrypting data. In
order to meet the demands of modern computing a new encryption algorithm
was needed which would be just as secure, and would encrypt data in a much
more efficient way. One of these new algorithms is the Serpent algorithm,
which will be discussed below.
3.3.3 Serpent
History of Serpent
The Serpent algorithm was introduced in 1998 as a candidate for the Ad-
vanced Encryption Standard, which was organised by the US National In-
stitute of Standards and Technology (NIST) to find a successor algorithm
for the DES algorithm. The design for AES required that the new algo-
rithm should be faster and more secure than Triple DES [2]. The Serpent
algorithm was initially designed to build upon elements of DES, because of
the well-understood nature of the DES algorithm; specifically the original
Serpent algorithm used the S-Boxes from DES.
Serpent Encryption Algorithm
The Serpent algorithm operates on a 128-bit block of plaintext with a 256-bit key, although the key size can be any length between 64-bits and 256-bits; a shorter key is padded so that the key used in the encryption is always 256-bits [2]. The Serpent algorithm will encrypt a 128-bit block of plaintext to a 128-bit block of ciphertext using 32 iterations of the round function, which will be discussed later, using a different 128-bit key in each of the iterations.
As Anderson, Biham, and Knudsen [2] explain, the Serpent algorithm will operate on a number of input blocks of plaintext. Firstly, the initial permutation (IP) will be applied to a block of plaintext (P), which will produce a 128-bit block B0 which will be used in the first of the 32 iterations of the round function (R). Each of the iterations of the round function will produce a block (Bi) that will be used in the following iteration, where i is the current iteration. Finally a final permutation (FP) is applied to the last block produced by the round function (R), which will produce the 128-bit block of ciphertext. The algorithm can be described as seen in equation 3.4.
    B0 = IP(P)
    Bi+1 = R(Bi)        where i = 0, 1, ..., 31
    C = FP(B32)        (3.4)
The Round Function (R)
Each of the 32 iterations of the round function produces a round output by applying a single S-Box per iteration in parallel. As Anderson et al. [2] explain, R0 would use S0, where S0 is the first S-Box and R0 is the first iteration. It follows that R1 would use S1 during the next iteration, and so on. The S-Boxes produce a 4-bit output from a 4-bit input, and as stated above the S-Boxes are applied in parallel: during iteration i, Si would operate on bits 0 to 3 of the input block, concurrently operate on bits 4 to 7 of the same input, and so on. The results of these independent operations are combined to produce the final output that will be used in the next round.
As in the implementation provided for AES, Serpent utilised a set of eight S-Boxes that were generated from the standard eight DES S-Boxes. As a result, for an iteration i, the S-Box that will be applied would be S(i mod 8) [3].
The round function is described in equation 3.5, where L is a Linear Transformation, Si is the S-Box for the current iteration, and Ki is the key that is used in the current iteration.

    Ri(X) = L(Si(X ⊕ Ki))            where i = 0, 1, ..., 30
    Ri(X) = Si(X ⊕ Ki) ⊕ K32        where i = 31        (3.5)
Decryption is achieved by applying the inverses of the S-Boxes in the reverse order, with the inverse of the Linear Transformation, and using the keys of the round function in reverse order.
The Linear Transformation is simply a permutation of the bits of an input block to produce a permuted output block. It is implemented using a number of bitwise shifts and XOR operations. For a detailed description of the linear transformation, and a more detailed discussion of the Serpent algorithm, the interested reader is referred to Anderson et al. [2].
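The "S-Box applied in parallel" idea from the round function can be sketched nibble-by-nibble. The 16-entry table below is an illustrative 4-bit S-box (a permutation of 0..15), not taken verbatim from the Serpent specification:

```python
# Applying one 4-bit S-box across all 32 nibbles of a 128-bit block,
# as in a single Serpent round. SBOX is an illustrative permutation
# of 0..15, not an authoritative Serpent table.
SBOX = [3, 8, 15, 1, 10, 6, 5, 11, 14, 13, 4, 2, 7, 0, 9, 12]

def apply_sbox_128(block128: int) -> int:
    out = 0
    for nibble in range(32):                  # 32 nibbles = 128 bits
        x = (block128 >> (4 * nibble)) & 0xF  # extract one 4-bit value
        out |= SBOX[x] << (4 * nibble)        # substitute it in place
    return out
```

Decryption applies the inverse table in the same nibble-parallel fashion, which is why each S-box must be a permutation of the sixteen 4-bit values.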
In order for the block ciphers that are discussed above to operate effi-
ciently there are a number of different methods that can be used to encrypt
sequential blocks. Discussed below are two of the most common block cipher
modes; namely Electronic Codebook Mode (ECB) and Cipher Block Chain-
ing Mode ( CBC). A comparison between ECB and CBC is then presented in
order to graphically demonstrate the differences between the two modes.
3.4 Block Cipher Modes
Symmetric block ciphers use different modes of operation that can be used
to encrypt data. As Schneier [46] states, the chosen mode depends on the
application. All block ciphers operate on "blocks" of data: the plaintext P is broken up into equal-sized blocks depending on the block size of the cipher being used. For the purposes of the following discussion, P is considered to be the complete plaintext, n is considered to be the number of plaintext blocks, and P1, P2, ..., Pn are considered to be the blocks of the plaintext; the size of Pn may be smaller than the cipher's block size. The same is assumed for the ciphertext C. Two common block cipher modes are discussed in the following sections, namely Electronic Codebook Mode (ECB) and Cipher Block Chaining Mode (CBC). There are other block cipher modes; however, they all follow similar principles in their approach to encrypting blocks. The interested reader is referred to Schneier [46] for more information regarding these and other block cipher modes.
3.4.1 Electronic Codebook Mode
Electronic Codebook Mode (ECB) is the simplest way to utilise a symmetric block cipher. Every block of the plaintext is encrypted with the key to produce the output. ECB mode follows directly from the traditional definition of a symmetric cipher, and as such can be defined as seen in equation 3.6.

    Ek(Pi) = Ci        where i ∈ {1 ... n}
    Dk(Ci) = Pi        where i ∈ {1 ... n}        (3.6)
ECB can be implemented in a very efficient manner because it can be
calculated in parallel; this can be seen from figure 3.4 [15]. As a result Ci
can be calculated directly from Pi. This does expose an undesirable feature
of ECB mode; Schneier [46] points out that if enough of the original plaintext
blocks and their corresponding ciphertext blocks are known, then parts of the
messages can be decrypted, even if the key is not known. This is particularly
Figure 3.4: Electronic codebook mode
true for messages that have a regular structure, such as the headers of an
email message. For example, if it is known that the plaintext "foobar" in encrypted form is "0x4f653a018fcd", then every occurrence of that block of ciphertext can be replaced with the corresponding block of plaintext. In the
following section we will introduce Cipher Block Chaining mode as a method
for improving on ECB mode.
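The block-repetition leak described above is easy to demonstrate. Here a hypothetical toy E_k (a byte-wise XOR, not a real block cipher) encrypts two identical 8-byte plaintext blocks in ECB fashion:

```python
# ECB's weakness: identical plaintext blocks yield identical
# ciphertext blocks. toy_block_encrypt is a hypothetical stand-in
# for any E_k; it is NOT a secure cipher.
def toy_block_encrypt(block: bytes, key: int) -> bytes:
    return bytes((b ^ key) & 0xFF for b in block)

def ecb_encrypt(plaintext: bytes, key: int, bs: int = 8) -> bytes:
    blocks = [plaintext[i:i + bs] for i in range(0, len(plaintext), bs)]
    return b"".join(toy_block_encrypt(b, key) for b in blocks)

p = b"foobar00foobar00"        # two identical 8-byte blocks
c = ecb_encrypt(p, 0x5A)
print(c[:8] == c[8:16])        # -> True: the repetition leaks
```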
3.4.2 Cipher Block Chaining Mode
Cipher Block Chaining mode implements a feedback mechanism where the
first block of the plaintext is XOR'ed with an Initialisation Vector and then
encrypted, the resulting block of ciphertext is XOR'ed with the next block of
plaintext and so on, until all the plaintext blocks have been encrypted [46].
CBC mode is defined as shown in equation 3.7.
    Ek(P1 ⊕ IV) = C1
    Ek(Pi ⊕ Ci-1) = Ci        where i ∈ {2 ... n}
    Dk(C1) ⊕ IV = P1
    Dk(Ci) ⊕ Ci-1 = Pi        where i ∈ {2 ... n}        (3.7)
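Equation 3.7 can be sketched directly. The toy_encrypt below is a hypothetical stand-in for E_k (a XOR toy cipher, conveniently its own inverse), so the code illustrates the chaining rather than any real cipher:

```python
# A sketch of CBC mode per equation 3.7, chaining blocks with XOR.
# toy_encrypt stands in for E_k; it is a toy, not a real block cipher.
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def toy_encrypt(block: bytes, key: bytes) -> bytes:
    return xor_bytes(block, key)            # stands in for E_k

toy_decrypt = toy_encrypt                   # XOR is self-inverse (D_k)

def cbc_encrypt(blocks, key, iv):
    out, prev = [], iv
    for p in blocks:
        c = toy_encrypt(xor_bytes(p, prev), key)  # C_i = E_k(P_i ^ C_i-1)
        out.append(c)
        prev = c
    return out

def cbc_decrypt(blocks, key, iv):
    out, prev = [], iv
    for c in blocks:
        out.append(xor_bytes(toy_decrypt(c, key), prev))  # P_i = D_k(C_i) ^ C_i-1
        prev = c
    return out

key, iv = b"KKKKKKKK", b"12345678"
blocks = [b"AAAAAAAA", b"AAAAAAAA"]
ct = cbc_encrypt(blocks, key, iv)
print(ct[0] == ct[1])   # -> False: identical plaintext blocks now differ
```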
CBC mode introduces the IV, an Initialisation Vector that will be XOR'ed with the first block of the plaintext to start the chaining. CBC mode is explained graphically in figure 3.5. Dworkin [15] states that the IV does not need to remain secret, but needs to be randomly generated; this will ensure that the chaining is not predictable in any way. Schneier [46]
Figure 3.5: Cipher block chaining mode
discusses two problems that can occur with CBC mode: padding and error propagation.
Padding refers to the fact that most plaintext data does not divide cleanly into the block size of the cipher, and the last block will need to be padded in order to allow the ciphertext for the block to be produced. Error propagation results from the chaining: if the first ciphertext block becomes corrupt, then the entire resulting ciphertext will be corrupt.
3.4.3 ECB versus CBC
The effect of encrypting data in ECB mode can be clearly seen in figure 3.6. The image that is produced when the original image is encrypted in ECB mode is still discernible; this is a result of the same key being used on each of the plaintext blocks. This clearly demonstrates the weakness of encrypting data in ECB mode when the image is compared to an image encrypted using CBC mode, which results in only noise (random pixels of colour) being produced.
We produced figure 3.6 by taking the original vectorised image of the
Darwin OS mascot and converting it to a Portable Pixmap image (ppm),
which is a very simple bitmap image format. The ppm version of the image
was encrypted using the mcrypt [32] utility, firstly in ECB mode, and then
again in CBC mode. Both images were encrypted using the DES algorithm,
and in both cases the key used was "hexley".
Figure 3.6: Comparison between ECB and CBC modes (panels: Original Image [28], ECB Encryption, CBC Encryption)
The shared secret approach to data encryption used in symmetric encryption does present a problem regarding key management: if the key is compromised, then the entire encryption scheme is no longer valid. Asymmetric encryption, discussed in the next section, provides a scheme that allows data to be encrypted using a "composite key". This scheme is used for digital certificates and digital signatures, where authenticity can be guaranteed.
3.5 Asymmetric Encryption
Asymmetric encryption refers to the family of cryptosystems that use one key for encryption and a different key for decryption; it is also called public-key cryptography. This type of cryptosystem was first described by Whitfield Diffie and Martin Hellman in 1976 [12, 46]. In this section we will explain asymmetric encryption, and then we will discuss RSA encryption as an example of an asymmetric cryptosystem.
Asymmetric encryption differs from symmetric encryption in that each party involved in the transmission of encrypted data has two different keys: a public-key and a private-key, collectively called a key-pair. The public-key is distributed freely, while the private-key remains secret. Asymmetric cryptosystems allow data that is encrypted by the public-key to be decrypted only with the corresponding private-key, as seen in figure 3.7. These cryptosystems have the form shown in equation 3.8, where kprivate and kpublic are elements of the same key-pair.
    Ekpublic(P) = C
    Dkprivate(C) = P        (3.8)
Asymmetric cryptosystems rely heavily on large random prime numbers in order to generate the key-pair, usually in the region of thousands of bits long. The security of these cryptosystems is assured through the computational complexity of prime factorisation on modern computers. It is considered nearly impossible for a standalone modern desktop computer to factorise a significantly large number into its two prime components. However, through the use of large distributed computing projects, sometimes involving thousands of computers, the time taken to find the component prime numbers can be reduced.
Schneier [46] again outlines how an asymmetric cryptosystem would be
used to encrypt and decrypt data:
1. Alice and Bob both generate a key-pair, consisting of a public-key and
a private-key.
2. Alice and Bob now agree on an asymmetric cryptosystem that will be
used to encrypt the data.
3. Bob sends Alice his public-key.
4. Alice encrypts the data using the selected cryptosystem with Bob's
public-key.
5. Alice sends the encrypted data to Bob.
6. Bob decrypts the data using his private-key.
In order to further discuss asymmetric encryption, RSA Encryption will
be discussed below. RSA is a well-understood and widely used asymmetric
cryptosystem.
3.5.1 RSA Encryption
History of RSA
The RSA Encryption algorithm was first introduced in 1978 and is named
after its inventors Rivest, Shamir, and Adleman [44]. As Schneier [46] points
out, it is a very popular public-private key algorithm because it is very easy
to understand and implement.
Figure 3.7: Asymmetric encryption
RSA Encryption Algorithm
The RSA algorithm [44] makes use of trap-door functions in order to provide strength to the cryptosystem; specifically, RSA makes use of the inability of computers to quickly factorise large numbers into prime numbers.
RSA makes use of two pairs of numbers, the public-key pair (e, n), and
the private-key pair (d, n), where d, e, and n are three positive integer num-
bers. To encrypt a message M, which is represented by a sequence of integer
numbers, M is raised to the power of e and then the ciphertext C is the
remainder of Me when divided by n. The decryption of the ciphertext C, is
simply C raised to the power of d, and then the plaintext results from the
remainder of Cd being divided by n. These functions are formally defined in
equation 3.9.
    Ekpublic(M) = M^e mod n = C
    Dkprivate(C) = C^d mod n = M        (3.9)
The key-pairs are chosen in such a way that they are related to two very large prime numbers. Firstly, n is defined as the product of two large random primes p and q. The integer d is chosen to be a large number that is relatively prime¹ to (p - 1) * (q - 1). Finally e is chosen to be the multiplicative inverse² of d modulo (p - 1) * (q - 1). The definitions for n, d, and e can be seen in equation 3.10.

¹ Relatively Prime - When gcd(a, b) = 1 then a and b are relatively prime [22].
² Multiplicative Inverse - When a * b = 1 then a is the multiplicative inverse of b.
    n = p * q
    gcd(d, (p - 1) * (q - 1)) = 1
    e * d ≡ 1 mod (p - 1) * (q - 1)        (3.10)
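A toy key-pair satisfying equation 3.10 can be built in Python (3.8+ for the modular inverse via pow). The primes here are deliberately tiny and would offer no security in practice:

```python
# A toy RSA key-pair following equation 3.10, using the tiny primes
# p = 61 and q = 53; real keys use primes hundreds of digits long.
from math import gcd

p, q = 61, 53
n = p * q                        # n = 3233
phi = (p - 1) * (q - 1)          # (p-1)*(q-1) = 3120
d = 2753                         # chosen relatively prime to phi
assert gcd(d, phi) == 1
e = pow(d, -1, phi)              # multiplicative inverse of d mod phi
assert (e * d) % phi == 1        # equation 3.10 holds

M = 65                           # the message, as an integer < n
C = pow(M, e, n)                 # C = M^e mod n (encrypt, public key)
assert pow(C, d, n) == M         # C^d mod n recovers M (private key)
```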
There are often cases where the consistency or authenticity of data must
be determined. The use of cryptographic hash functions, discussed in the
following section, allows data to be represented by a unique hash which
would allow the consistency or authenticity of data to be checked.
3.6 Cryptographic Hash Functions
Cryptographic hash functions form a specific family of cryptographic ciphers
that aim at producing a specific output for a specific input. The output that
is produced is called the hash-value or message digest, and the function that
is used to produce the hash-value from an input is called the hash-function.
The input for a cryptographic hash-function is referred to as the message.
All cryptographic hash functions will take an arbitrary sized input and
produce a fixed-length output. Given a cryptographic hash function h with
a domain D and a range R, the mapping between the domain and the range
is shown in equation 3.11.
    h : D → R        where |D| > |R|        (3.11)

The size of the domain (|D|) is always greater than the size of the mapped range (|R|), and due to the fact that the hash-function h can accept any arbitrarily sized input, this implies that the function is many-to-one [35]. Menezes
et al. [35] point out that this implies that a hash-function will contain
collisions; which are identical outputs for unique inputs. One of the design
aims of a cryptographic hash function is to minimise the probability that a
collision will occur in real world applications. The fact that a cryptographic
hash function will produce collisions can be used as the basis for an attack
to compromise the integrity of the hashed message; this feature is discussed
in section 3.6.3.
A cryptographic hash function will always produce a standard length
output, which is known as the bitlength of the hash-function. If a hash-
function produces an output that consists of m-bits then the hash-function
will have a bitlength of m [35].
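The fixed-length property is easy to observe directly. The short check below hashes inputs of very different sizes with SHA-1 (discussed in section 3.6.4, where m = 160) and confirms that the digest length never changes.

```python
# Inputs of wildly different sizes always map to a 160-bit SHA-1 digest.
import hashlib

for message in [b"", b"short", b"a much longer input message" * 1000]:
    digest = hashlib.sha1(message).hexdigest()
    assert len(digest) * 4 == 160   # 40 hex characters = 160 bits
```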
There are two main groups of cryptographic hash functions, namely keyed
and unkeyed. Keyed hash functions will generate a hash-value using a secret
key, and an input message. This type of hash-function is used to generate
what is known as a Message Authentication Code (MAC) which can be used
to verify the source and integrity of a message.
An unkeyed hash-function generates a hash-value based solely upon the
input message without the use of a secret key; this type of hash-function is
used to generate what is known as a Message Integrity Code (MIC) [35]. This
type of hash-value is used to verify the integrity of a message. These two
types of hash-functions are discussed briefly in section 3.6.1 and section 3.6.2.
MICs and MACs are not used to secure messages, only to provide mech-
anisms to verify the integrity of a message. An example of the use of hash-
values to provide a mechanism to verify integrity can be seen on many FTP
services across the Internet. Files that can be downloaded from a particular
website or FTP server are often distributed with their corresponding hash-
values, which can be used to verify that a downloaded file is the same as the
original file. If the file was corrupted during download then a hash-value for
the file would not match the hash-value of the original file.
3.6.1 Message Integrity Codes
A Message Integrity Code (MIC) function will generate a hash-value based
solely on the input message. In the ideal case the output that is produced is
unique to the input message; however, collisions can be produced, as discussed
above. Menezes et al. [35] point out that there are two types of MICs namely:
1. One-Way Hash Functions - where finding an input message that
hashes to the output hash-value is computationally difficult.
2. Collision Resistant Hash Functions -- where finding two messages that
hash to the same hash-value is computationally difficult.
The use of MICs to verify the integrity of a message can be seen in fig-
ure 3.8, where the file ubuntu-7 .10-dvd-i386. iso has been provided along
with the file MD5SUMS, which contains the original MD5 [43] hash value for a
file as generated by the distributor. In order to verify the integrity of the file,
the program md5sum [13] is used to calculate the hash-value of the locally stored
copy of the file and compare it to the hash-value provided by the distributor.
If the hash-value of the file matches the hash-value provided then we can
say with relative certainty that the two files are identical.
rootmachine:/hash# ls
MD5SUMS ubuntu-7.10-dvd-i386.iso
rootmachine:/hash# cat MD5SUMS
b5d9aaa45af862b4c804530734216a15 *ubuntu-7.10-dvd-i386.iso
rootmachine:/hash# md5sum -c MD5SUMS
ubuntu-7.10-dvd-i386.iso: OK
rootmachine:/hash#
Figure 3.8: File verification using a message integrity code
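The check performed by md5sum -c above can be sketched in a few lines, assuming a checksum line in the format shown in figure 3.8 ("<hash> *<filename>"); the filename in the example is hypothetical.

```python
# Sketch of the integrity check performed by `md5sum -c`, assuming the
# checksum line format "<hash> *<filename>" shown in figure 3.8.
import hashlib

def verify(checksum_line: str, file_bytes: bytes) -> bool:
    expected_hash, _name = checksum_line.split(maxsplit=1)
    actual_hash = hashlib.md5(file_bytes).hexdigest()
    return actual_hash == expected_hash

# Hypothetical checksum entry for a file containing the bytes b"hello".
line = "5d41402abc4b2a76b9719d911017c592 *hello.txt"
assert verify(line, b"hello")          # matching file: OK
assert not verify(line, b"corrupted")  # a corrupted download is detected
```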
3.6.2 Message Authentication Codes
A Message Authentication Code (MAC) is a hash-value that is used to ensure
the integrity and source of a message. MAC hash-functions accept a
message and a secret key. MACs can be generated in a number of ways:
either using a symmetric block cipher, or by using a Message Integrity Code
that is combined with the secret key. In both cases a hash-value is produced
that can be used to verify the integrity of a message.
Block Cipher MAC
The simplest approach to generating a MAC is to use a block cipher to
encrypt a message using a specific block cipher mode (see section 3.4), and
then use the last block of the ciphered message as the MAC [46]. Provided
both parties involved in the authentication of the message use the same secret
key to generate the MAC, the same result will be achieved.
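The chaining structure of this construction can be sketched as follows. The "block cipher" below is a deliberately insecure toy stand-in used only to show the interface; a real implementation would substitute a cipher such as DES or Serpent from section 3.3.

```python
# Structural sketch of a block-cipher MAC: encrypt in CBC mode and keep
# only the last ciphertext block. toy_encrypt_block is NOT a real cipher;
# it only illustrates the shape of the computation.
BLOCK = 8

def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    # Toy stand-in for a real block cipher (insecure by design).
    return bytes(b ^ k for b, k in zip(block, key))

def cbc_mac(key: bytes, message: bytes) -> bytes:
    message += b"\x00" * (-len(message) % BLOCK)   # zero-pad to block size
    prev = b"\x00" * BLOCK                         # zero initialisation vector
    for i in range(0, len(message), BLOCK):
        mixed = bytes(m ^ p for m, p in zip(message[i:i + BLOCK], prev))
        prev = toy_encrypt_block(key, mixed)       # chain into the next block
    return prev                                    # the last block is the MAC

key = b"8bytekey"   # hypothetical shared secret
assert cbc_mac(key, b"same message") == cbc_mac(key, b"same message")
assert cbc_mac(key, b"same message") != cbc_mac(key, b"diff message")
```

Both parties holding the same key compute the same last block, so comparing the two MAC values verifies the message, as described above.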
Message Integrity Code MAC
This again is a simple approach to the generation of a MAC; this method
simply involves combining the input message and the secret key to generate
a hash-value using a MIC (see section 3.6.1). Again, providing that both
parties involved in the authentication of a message use the same secret key
and the same algorithm to generate the hash-value then the authenticity of
the message can be verified.
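As a sketch of this idea: in practice the combination of a MIC and a secret key is standardised as HMAC, since naive concatenation of key and message is vulnerable to length-extension attacks for many hash functions. The key and message below are hypothetical examples.

```python
# MIC-based MAC sketch using HMAC, the standardised way of combining a
# hash function with a secret key. Key and message are hypothetical.
import hmac
import hashlib

key = b"shared-secret-key"
message = b"transfer 100 to alice"

mac = hmac.new(key, message, hashlib.sha1).hexdigest()

# The receiver, holding the same key, recomputes the MAC and compares.
expected = hmac.new(key, message, hashlib.sha1).hexdigest()
assert hmac.compare_digest(mac, expected)

# A tampered message (or a wrong key) produces a different MAC.
forged = hmac.new(key, b"transfer 100 to mallory", hashlib.sha1).hexdigest()
assert mac != forged
```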
Hash functions suffer from collisions because of the limited output range.
As such they are subject to an attack known as the birthday attack, where
the probability that a collision can occur can be calculated. The birthday
paradox is discussed below.
3.6.3 Birthday Attack
The birthday attack is one of the most common attacks used on
cryptographic hash functions, and exploits the fact that a hash-function will
generate collisions for two or more distinct inputs. It is named after the
Birthday Paradox, which is a standard statistical distribution problem. In
order to fully explain the birthday attack, the birthday paradox is briefly
explained.
Definition 3.1 Combinatoric definitions [35]

1. Let m, n ∈ N, where m ≥ n. Then m^(n) is:

       m^(n) = m(m - 1)(m - 2)...(m - n + 1) = m! / (m - n)!           (3.12)

   This is the falling factorial, which will count the number of permutations
   of m distinct objects when n of those objects are chosen.

2. Let m, n ∈ N, where m ≥ n. Then the Stirling number of the second
   kind, {m n}, is:

       {m n} = (1/n!) * Σ_{k=0}^{n} (-1)^k C(n, k) (n - k)^m           (3.13)

   This counts the number of ways to partition m objects into n non-empty
   subsets.
The definitions presented in equations 3.12 and 3.13 are standard combinatoric
functions that deal with counting the permutations and partitioning
of a given number of objects; these functions are used in the following theorems
to explain the classical occupancy problem and the birthday paradox.
Theorem 3.1 Classical occupancy problem [35}
A bucket contains m balls that are numbered 1 through m. If n balls are
drawn from the bucket one at a time, their number listed, and then returned
to the bucket (i.e. with replacement), then the probability that exactly t
different balls have been drawn is,
    f(m, n, t) = {n t} * m^(t) / m^n,    where 1 ≤ t ≤ n               (3.14)
The classical occupancy problem gives the probability that exactly t distinct
values occur when n samples are drawn, with replacement, from a set of m
values. The birthday paradox follows from this, as is seen in equation 3.15.
Theorem 3.2 Birthday paradox {35}
A bucket contains m balls that are numbered 1 through m. If n balls are
drawn from the bucket one at a time, their number listed, and then returned
to the bucket (i.e. with replacement), then the probability of at least one
coincidence is,

    g(m, n) = 1 - f(m, n, n) = 1 - m^(n) / m^n,    where 1 ≤ n ≤ m    (3.15)

As m → ∞, n = O(√m) (the upper asymptotic bound).
The birthday paradox can be demonstrated by the following example.
Consider a situation where there is a large group of people, and you would
like to calculate the number of people required such that there is a greater
than fifty-percent chance that at least one person has the same birthday as
you do; solving 1 - (364/365)^n > 0.5 gives n ≥ 253. Now consider a
situation where you would like to calculate the number of people required
such that there is a greater than fifty-percent chance that there is at
least one coincidence, that is, that some 2 people share the same birthday.
This situation can be calculated using the birthday paradox, and is simply
g(365, 23) ≈ 0.507 (see theorem 3.2). Therefore you would need to have at
least 23 people together to have a greater than fifty-percent chance of at
least one coincidence.
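The values in this example can be checked numerically straight from theorem 3.2, where g(m, n) = 1 - m^(n)/m^n:

```python
# Numerical check of the birthday paradox values used above.
from math import perm   # perm(m, n) is the falling factorial m^(n)

def g(m, n):
    """Probability of at least one coincidence among n draws from m values."""
    return 1 - perm(m, n) / m**n

assert round(g(365, 23), 3) == 0.507   # 23 people: just over fifty percent
assert g(365, 22) < 0.5                # 22 people are not enough
```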
The birthday attack follows directly from the birthday paradox; consider
the following situation as outlined by Schneier [46]. Finding a message that
hashes to a given n-bit hash-value would require calculating the hash-values
of about 2^n random messages. However, finding two messages that have
the same hash-value would only require calculating the hash-values of about
2^(n/2) messages.
An attacker could theoretically generate multiple messages, each with
different minor changes; if enough messages are generated then there could be
a case where two of those messages generate the same hash-value. This
could be exploited by an attacker, allowing a legitimate message to be
exchanged for a false message that has an identical hash-value and would
therefore be considered to be a valid message.
We will now discuss an implementation of a cryptographic hash function.
The Secure Hash Algorithm (SHA) is a standard algorithm for calculating
hash values. There are many variations of the SHA, each giving a slightly
different hash value. SHA-1 will be discussed in the following section.
3.6.4 Secure Hash Algorithm (SHA)
SHA Algorithm
There are four SHA algorithms that are outlined in the Federal Information
Processing Standards Publication 180-2 [16]; these are SHA-1, SHA-256,
SHA-384, and SHA-512. All of the algorithms are similar in design and
produce different output bitlengths. The SHA-1 algorithm will be discussed
below.
The SHA-1 algorithm operates on 512-bit blocks of a message and will
produce a 160-bit message digest. The SHA algorithm is broken up into two
stages; these are Pre-processing, and Hash Computation.
SHA-1 Pre-processing
The SHA-1 algorithm can accept an input message (M) of arbitrary length,
although the input message is padded so the total bitlength of the message
will always be a multiple of 512-bits. The message is then broken up into N
512-bit blocks that will be operated on by the Hash Computation phase of
the algorithm; these blocks can be represented as M^(1), M^(2), ..., M^(N). The
Hash Computation phase uses an iterative algorithm to produce the final
message digest. For each iteration, a hash block (H) is produced and will be
used in the next iteration, and so on.
The hash block that is used in the first iteration is called the initial hash
value, which will be used to produce H^(1). The interested reader is referred
to FIPS PUB 180-2 [16] for more detailed information on the initial hash
value.
SHA-1 Hash Functions
The Secure Hash Standard (SHS) defines a number of functions which are
used in each iteration to calculate the hash-value. There are a number of
different functions that are defined for each version of the SHA algorithm.
Discussed below are the functions that relate to the generation of a SHA-1
hash. Firstly the SHS defines a set of primary functions that are used in
the hash computation. A circular left shift (rotation) function is defined, as
seen in equation 3.16, where ROTL^n(x) represents a circular left shift of the
32-bit word x by n positions.

    ROTL^n(x) = (x << n) ∨ (x >> (32 - n))                             (3.16)

Input: M - The input message.
Input: N - Number of message blocks.
Output: OutputHash - The hash-value.
/* Operate on every input message block */
for i ← 1 to N do
    /* Initialise the temporary variables with words from the previous
       iteration */
    a ← H_0^(i-1);  b ← H_1^(i-1);  c ← H_2^(i-1);  d ← H_3^(i-1);  e ← H_4^(i-1);
    for t ← 0 to 79 do
        /* Perform 80 iterations of the cipher functions for this
           iteration */
        T ← ROTL^5(a) + f_t(b, c, d) + e + K_t + W_t;
        e ← d;
        d ← c;
        c ← ROTL^30(b);
        b ← a;
        a ← T;
    end
    /* Create the hash block that will be used in the next iteration */
    H_0^(i) ← a + H_0^(i-1);
    H_1^(i) ← b + H_1^(i-1);
    H_2^(i) ← c + H_2^(i-1);
    H_3^(i) ← d + H_3^(i-1);
    H_4^(i) ← e + H_4^(i-1);
end
/* Return the hash block that was generated for the final iteration;
   this is the message digest of the input message */
OutputHash ← H^(N);

Algorithm 2: Complete SHA-1 algorithm (all additions are modulo 2^32)
SHS also defines a set of eighty functions, f_0, f_1, ..., f_79, which have the
form f_t(x, y, z). The particular form of f_t differs depending on the number
of the current iteration. These functions are used during each iteration of
the hash computation and are defined in equation 3.17.

    f_t(x, y, z) =
        Ch(x, y, z)     = (x ∧ y) ⊕ (¬x ∧ z)                0 ≤ t ≤ 19
        Parity(x, y, z) = x ⊕ y ⊕ z                         20 ≤ t ≤ 39
        Maj(x, y, z)    = (x ∧ y) ⊕ (x ∧ z) ⊕ (y ∧ z)       40 ≤ t ≤ 59
        Parity(x, y, z) = x ⊕ y ⊕ z                         60 ≤ t ≤ 79
                                                                       (3.17)

SHA-1 Constants
The SHS defines a set of eighty constant values that are used in the hash
computation, labelled K_t where t ∈ {0, 1, ..., 79}. These constant values are
shown in equation 3.18.

    K_t =
        0x5a827999        0 ≤ t ≤ 19
        0x6ed9eba1        20 ≤ t ≤ 39
        0x8f1bbcdc        40 ≤ t ≤ 59
        0xca62c1d6        60 ≤ t ≤ 79
                                                                       (3.18)

SHA-1 Hash Computation
The hash computation phase operates on each of the N blocks of the in-
put message. For each iteration of the hash computation, firstly a message
schedule (W) must be created. During the hash computation it is necessary
to reference certain 32-bit words within a larger structure. The SHS defines
the following notation to perform this reference: for example, M_n^(i) refers to
the nth 32-bit word in the ith message block.
The message schedule is made up of eighty 32-bit values; each of these 32-
bit values is referenced as Wt, where t = 0, 1, ... , 79. The message schedule
is constructed as follows:
    W_t = { M_t^(i)                                              0 ≤ t ≤ 15
          { ROTL^1(W_{t-3} ⊕ W_{t-8} ⊕ W_{t-14} ⊕ W_{t-16})     16 ≤ t ≤ 79
The complete SHA-1 algorithm can be described as seen in algorithm 2.
The interested reader is referred to FIPS PUB 180-2 [16] for more detailed
information concerning the SHA family of hash functions.
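As a sketch, algorithm 2 together with the pre-processing, functions, and constants above translates almost line-for-line into the following Python implementation; all additions are taken modulo 2^32, as required by FIPS PUB 180-2, and the result is verified against the standard library.

```python
# Minimal SHA-1 implementation following algorithm 2 and equations
# 3.16-3.18, verified against Python's hashlib.
import hashlib
import struct

def rotl(x, n):
    # Circular left shift of a 32-bit word x by n positions (equation 3.16).
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def sha1(message: bytes) -> str:
    # Pre-processing: append a 1 bit (0x80), pad with zeros, then append the
    # original bit-length so the total length is a multiple of 512 bits.
    bitlen = len(message) * 8
    message += b"\x80"
    message += b"\x00" * ((56 - len(message)) % 64)
    message += struct.pack(">Q", bitlen)

    # Initial hash value H^(0) from FIPS PUB 180-2.
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]

    # Hash computation: operate on each 512-bit block M^(i).
    for i in range(0, len(message), 64):
        # Message schedule W_0, ..., W_79.
        w = list(struct.unpack(">16I", message[i:i + 64]))
        for t in range(16, 80):
            w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))

        a, b, c, d, e = h
        for t in range(80):  # the 80 iterations of algorithm 2
            if t <= 19:
                f, k = (b & c) ^ (~b & d), 0x5A827999           # Ch
            elif t <= 39:
                f, k = b ^ c ^ d, 0x6ED9EBA1                    # Parity
            elif t <= 59:
                f, k = (b & c) ^ (b & d) ^ (c & d), 0x8F1BBCDC  # Maj
            else:
                f, k = b ^ c ^ d, 0xCA62C1D6                    # Parity
            T = (rotl(a, 5) + f + e + k + w[t]) & 0xFFFFFFFF
            a, b, c, d, e = T, a, rotl(b, 30), c, d

        # Create the hash block used in the next iteration (mod 2^32).
        h = [(x + y) & 0xFFFFFFFF for x, y in zip(h, (a, b, c, d, e))]

    return "".join(f"{word:08x}" for word in h)

assert sha1(b"abc") == hashlib.sha1(b"abc").hexdigest()
```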
3.7 Summary
In this chapter we covered the following cryptographic concepts:

- Basic Concepts - where we discussed a number of concepts common to
  many different cryptographic systems.

- Symmetric Encryption - where we discussed the family of cryptosystems
  which use a single "shared secret". We discussed a number of concepts
  and algorithms, namely:

  - Substitution Boxes - a method which is used in many different
    symmetric cryptosystems to securely encrypt data.

  - Data Encryption Standard (DES) - a historically widely used
    symmetric encryption algorithm.

  - Serpent - a modern symmetric encryption algorithm which is
    faster and more secure than DES.

- Block Cipher Modes - which are used in conjunction with a symmetric
  cryptosystem to ensure data security. We discussed the following block
  cipher modes:

  - Electronic Codebook Mode - a simple cipher mode which encrypts
    each block independently under the same key.

  - Cipher Block Chaining Mode - a cipher mode in which the encryption
    of each block depends on the preceding encrypted block.

  - ECB versus CBC - where we compared the two discussed block
    cipher modes to show the differences between them.

- Asymmetric Encryption - where we discussed the family of cryptosystems
  which use different keys for encryption and decryption. We discussed
  the following cryptosystem:

  - RSA Encryption - a widely used asymmetric cryptosystem, used
    primarily for digital certificates.

- Cryptographic Hash Functions - a family of cryptosystems which produce
  a fixed-length output for a particular input. We discussed the following
  concepts:

  - Message Integrity Codes - used to verify the integrity of a message.

  - Message Authentication Codes - used to verify the authenticity
    of a message.

  - Birthday Attack - a widely used attack on cryptographic hash
    functions.

  - Secure Hash Algorithm (SHA-1) - a commonly used cryptographic
    hash algorithm.
3.8 Conclusion
Cryptography is used extensively in modern information systems to provide
both secure communication and secure storage of data. Although many
different algorithms and techniques exist, they all strive towards the same
goals: to ensure that data remains secure, and to provide that security
quickly and efficiently.
In this chapter we discussed some basic cryptographic theory and tech-
niques. We started the discussion with some basic cryptographic concepts
that can be used to describe cryptosystems. This was followed with a dis-
cussion of symmetric encryption with an overview of the DES and Serpent
algorithms. We then went on to describe different techniques that are used
to encrypt data used in symmetric block ciphers with specific block cipher
modes.
We then went on to describe asymmetric cryptosystems and this was
followed with a brief discussion of RSA encryption. Finally we discussed
cryptographic hash functions and their specific forms, namely Message Au-
thentication Codes, and Message Integrity Codes. The Birthday Attack was
then discussed along with a brief discussion of the SHA-1 algorithm.
In the following chapter we will discuss steganography and steganographic
file systems. Concepts and algorithms introduced in this chapter are used
throughout the following chapters in order to explain different aspects of
information security. Steganography in particular relies heavily on cryptog-
raphy to ensure information security. As such this chapter plays a vital role
in the understanding of the following chapters.
Chapter 4
Steganography and
Steganographic File Systems
4.1 Introduction
Data hiding techniques are becoming more prominent as more digital media
becomes available. Steganography can be used for many different situations
where data of some form needs to be hidden. Applications of steganogra-
phy can be found from intellectual property protection to currency anti-
counterfeiting software. As increasingly more personal information is found
in a digital form there is a need to protect that information, and steganog-
raphy offers a method of hiding data that provides plausible deniability that
particular hidden data never existed.
Steganography as a concept is not a new one. Historically there have
been many different attempts to ensure that information remains hidden,
as this increases security. These techniques have been adapted for use in a
modern world where digital information can be hidden within other digital
information to such a degree that it can become almost undetectable. As
security of information becomes increasingly important in modern society,
steganography can play an important role in ensuring that data remains
secure.
In this chapter we will introduce steganography and steganographic file
systems and discuss some applications and methods thereof. In section 4.2
we will introduce steganography and some non-digital applications, in order
to demonstrate the extent steganography plays in our day to day lives. In
section 4.3 we will discuss image and audio steganography. We will then
introduce cryptographic file systems in section 4.4, and then discuss some
implementations. We will then introduce steganographic file systems in sec-
tion 4.5, and finally we will discuss some implementations.
4.2 Steganography
Steganography is a component of information hiding and literally means
"covered writing". Generally it refers to the hiding of information within
other information. Where cryptography is the art of obscuring information,
steganography is the art of obscuring the presence of information [41]. Mod-
ern steganographic techniques often use cryptographic algorithms to further
obfuscate information before it is hidden.
In the modern world, our personal information must remain secure. It is
the responsibility of the user to ensure that private data will remain private.
This is especially true in the United States, where border officials are allowed
to search the laptop computers of travellers without a warrant, and based
purely on suspicion [42, 48, 55]. Steganography and data hiding techniques
can play an important role in ensuring that our personal information can
remain secure.
Steganography has many wide ranging applications, from paper water-
marking, currency protection mechanisms, to intellectual property and copy-
right protection mechanisms such as digital watermarking and digital finger-
printing, or simply to provide anonymity or allow for covert communications.
We will now discuss terminology that is used to describe steganographic sys-
tems, and we will then discuss some historic uses of information hiding.
4.2.1 Terminology
Like cryptography, steganography utilises a number of standard terms in
order to describe the different components of a steganographic system. The
data that is going to be hidden is called the embedded data, the file in which
the data is to be hidden is called the cover-file. Depending on the type of
data the cover-file contains, it can be referred to as the cover-text, cover-
image, cover-audio, cover-video, or generically as the cover-object. The file
produced from the steganographic process is called the stego-object, or can
be referred to as the stego-text, stego-image, stego-video, or stego-audio.
Finally the key used to control the steganographic process is called the stego-
key [41, 1]. The terms are summarised in table 4.1.
A stego-object is referred to as steganographically strong if it is impos-
sible to detect the presence of the embedded data [36]. In the following
section we will discuss historic uses for information hiding, and some modern
non-digital applications of steganography.
4.2.2 Historic Steganography
Steganography has historically been used in many different forms to either
protect a message, or to verify the authenticity of a message. Early cryp-
tographic techniques were not particularly sophisticated, and often took the
form of a simple shifting cipher, such as the Caesar cipher. It was difficult
to distribute the keys from one far-flung outpost to another; as a result, if
messages or keys were intercepted it was a fairly trivial task to decode
them.
Hiding the existence of messages became an attractive solution for the
secure transportation of important messages. Techniques included writing
messages on wax tablets, or shaving the head of a slave, tattooing the message
on the head of the slave, and then sending the slave with the message once his
hair had grown back [41]. These techniques would ensure that if a message-
carrier was intercepted, the message would have a greater chance of not being
located.
Paper watermarking has long been used to verify authenticity of paper
documents, and although not effective today, paper watermarks are still seen
on paper currency and official documents. Paper watermarks are created
during the milling of the paper, and are used to hide simple information, in
the fibre of the paper [41].
4.2.3 Currency Protection Mechanisms
Currency incorporates many different steganographic objects in order to com-
bat counterfeiting [6], by embedding information that proves that a note or
Cover-file        An unassuming file that data will be hidden in
Embedded data     The data that will be hidden in the cover-file
Stego-object      The result of the steganographic process
Stego-key         A key used to control the steganographic process

Table 4.1: Basic steganographic terms
Figure 4.1: Example of an EURion constellation
coin is authentic. Most paper currency will incorporate many different
anti-counterfeiting elements such as:

- Paper watermarks - a paper watermark embedded into the fabric of the
  paper currency.

- Colour changing ink - ink that will change colour based on the viewing
  angle.

- Moire patterns - geometric patterns that will appear blurred if the
  note is duplicated using normal computer equipment.

- Raised ink - special ink that is raised off the surface of the note; this
  cannot be produced with standard computer equipment.

- Security strips - metallic strips that are embedded into the fabric of
  the note.

- Micro-writing - very small writing that will not be clear if the note
  is duplicated.

- EURion constellations - a geometric shape consisting of five 1mm
  circles (see figure 4.1) that can be detected by anti-counterfeiting
  software [30, 37].

- UV ink - ink that will only become visible under Ultra Violet light.
Currency protection mechanisms are a closely guarded secret of the coun-
try issuing the note. There are probably many more anti-counterfeiting ob-
jects that are built into the design of a note that the general public is not
aware of.
4.2.4 Copyright Protection Mechanisms
Steganography can be used to protect digital media, such as images, audio, or
video. This is done by embedding a digital watermark or digital fingerprint
in the digital content. The term Digital Watermark was first described by
Tirkel et al. [54] in their paper Electronic Watermark.
Copyrighted digital data such as images, audio and video, or any other
digital data that can be transmitted electronically, can be watermarked in
order to control its distribution, or to prove the legal ownership. Digital
Watermarks strive to be persistent in nature. Ideally a watermark should
still be detectable even after the data has been manipulated; in the case of
an image, even after the image has undergone a number of transformations
[41].
Another use for digital watermarking is as a Digital Rights Management
(DRM) system. Peinado et al. [40] explain that distribution information can
be included within video in order to only allow authorised people to view
the media. Wu et al. [59] explain how hidden information can be embedded
into medical images to prevent any form of tampering.
Digital watermarking has many applications as we use more and more
digital data in our daily lives. Steganography has wide ranging digital ap-
plications, which will be discussed in the following section. We will discuss
methods of image and audio steganography. We will then discuss methods
of attacking steganographic systems that utilise the least significant bit.
4.3 Digital Steganography
Digital steganography utilises digital data to hide other digital data. Com-
mon types of cover-files that are used are images and audio. Steganography
can be applied to any digital data, as long as the underlying structure is well-
understood. In this section, image steganography and audio steganography
are discussed, as they are the techniques most likely to be encountered.
4.3.1 Image Steganography
Image steganography uses an image as the cover-file and is probably the
most well known of all the steganographic techniques.
Image steganographic techniques encode embedded data within the pixel
information of the cover-image. In a simple system, a single bit of the em-
bedded data can be stored in the Least Significant Bit of a single pixel from
the cover-image. This has the effect of changing the colour value of a pixel
by at most one, which does not produce any visible change in the cover-image.
For example, assume that a cover-image uses 24-bits to store the colour
information for a single pixel, with 8-bits representing each of the colour
channels of the pixel, namely Red, Green, and Blue. Then 3-bits of the
embedded data can be stored in each pixel of the cover-image, a bit in the
red channel, a bit in the blue channel, and a bit in the green channel. The
number of least significant bits used can be increased, but this could produce
a stego-object which may appear visibly different from the cover-image. The
maximum size of the embedded data is therefore dictated by the size of the
cover-file [36, 5].
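The single-bit-per-channel scheme described above can be sketched as follows; for simplicity the cover data here is a hypothetical flat list of 8-bit channel values (R, G, B, ...) rather than a real image file.

```python
# Sketch of least-significant-bit embedding over a flat list of 8-bit
# channel values. A real tool would read and write actual image pixels.

def embed_lsb(channels, data):
    """Hide `data` (bytes) in the LSBs of `channels`, one bit per value."""
    bits = [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]
    if len(bits) > len(channels):
        raise ValueError("embedded data too large for cover-file")
    stego = list(channels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit   # overwrite only the LSB
    return stego

def extract_lsb(channels, length):
    """Recover `length` bytes from the LSBs of `channels`."""
    out = bytearray()
    for i in range(length):
        byte = 0
        for value in channels[i * 8:(i + 1) * 8]:
            byte = (byte << 1) | (value & 1)
        out.append(byte)
    return bytes(out)

cover = [200, 13, 57, 255, 0, 128] * 20   # hypothetical channel data
stego = embed_lsb(cover, b"hi")
assert extract_lsb(stego, 2) == b"hi"
# Each channel value changes by at most 1, so the image looks unchanged.
assert all(abs(a - b) <= 1 for a, b in zip(cover, stego))
```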
In order to obscure the presence of the embedded data further, a pseudo-
random sequence of pixels from the cover-image is chosen. This is controlled
through the use of a stego-key. The stego-object that is produced will be
the same size as the original cover-file [1].
However the least significant bit approach also has limitations; the em-
bedded data is not completely strong, as the embedded data can be lost if the
stego-image undergoes a transformation, such as a rotation. The embedded
data is not likely to survive operations such as JPEG compression, as this will
rewrite the pixel information of the cover-image, effectively destroying any
embedded data [1, 41]. As Francia and Gomez [19] explain, this can be an
effective method for destroying any steganographic content in the cover-file.
4.3.2 Image Steganography Example
We created Figure 4.2 using the steghide [27] application. Stego-objects
are created by steghide using a pseudo-random sequence of parts of the
cover-image. Graph theory is then used to find which of these parts, when
exchanged, will have the effect of encoding the embedded data. The colour
values are not changed in the resulting stego-object, which the author of the
application claims will make it resistant to standard steganographic detection
methods. However this approach will create a stego-object that is larger in
Original cover-image [56] - MD5 hash: edfa4c0babbba4de75b746600aec78ce
Stego-image - MD5 hash: fb09dc81f3fbb4b74f0b2dfca8527fcb
Embedded data [57] - MD5 hash: 7e4890ea23e10bcd82a720362b65296e

Figure 4.2: Image steganography example
size than if the least significant bit approach is used. The interested reader
is referred to the Steghide Manual [27] for more detailed information.
As can be seen from figure 4.2 the cover-image and the resulting stego-
image are visually indistinguishable, although the MD5 [43] hash of the two
images are different.
4.3.3 Audio Steganography
Another very interesting application of steganography is audio steganogra-
phy. Audio steganographic techniques attempt to hide information within an
audio file, while not changing the perceivable output [5]. With the ever in-
creasing sale of digital audio on the Internet, and the demand from recording
companies to ensure that digital music cannot be pirated; steganography can
be used to provide an effective Digital Rights Management (DRM) solution.
One method of audio steganography is the use of echo hiding. As Gruhl,
Lu, and Bender [25] explain, the human auditory system is more sensitive
than the other human senses. It is generally difficult to embed data within
cover-audio because of the large audio range that humans can perceive, both
in terms of frequency and power. They go on to explain that there are "holes"
in the human auditory system that can be exploited to encode steganographic
data, and not change the audible output.
Echo hiding operates by introducing an echo into the cover-file to hide
embedded data. Two different length echoes are used to encode 0 and 1, and
thus allowing binary data to be hidden. This data hiding technique relies
on the fact that if the original sound and an echo are close enough together,
humans will not distinguish between the two distinct sounds, but will only
hear a single "compound" sound. The interested reader is referred to "Echo
Hiding" [25] for more information.
4.3.4 Least Significant Bit (LSB) Attacks
LSB steganography is the most common form of steganography used, and as
such there are a number of methods for detecting steganographic content.
As Westfeld and Pfitzmann [58] explain, there are two general types of LSB
detection methods: visual attacks and statistical attacks. Both of these types
of attacks will be discussed below.
4.3. DIGITAL STEGANOGRAPHY 71
Visual Attacks
Visual attacks are performed manually for any steganographic method that
modifies the LSB of a cover-image. This type of detection works best on
greyscale images, where the embedded data is encoded sequentially within
the cover-image. A standard visual attack works by mapping the LSB of
each pixel in the stego-object.
The resulting map will contain 1 if the LSB of a pixel was 1 and 0 if
the LSB of a pixel was 0. The likelihood that an image contains embedded
data can be determined by visually analysing the amount of "noise" that is
present in the least significant bits of the cover-image.
However, Westfeld and Pfitzmann [58] go on to explain that most LSB
steganographic techniques hide data very carefully so as to avoid detection.
The interested reader is referred to Westfeld and Pfitzmann [58], pp. 64-68 for
more information.
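The mapping step can be sketched in a few lines of Python. The sequential embedder below is the naive technique that a visual attack exposes; pixel values are assumed to be greyscale integers in 0-255.

```python
def lsb_map(pixels):
    """Extract the least significant bit of each pixel value."""
    return [p & 1 for p in pixels]

def embed_sequential(pixels, message_bits):
    """Naively overwrite the LSBs of the first len(message_bits) pixels."""
    out = list(pixels)
    for i, bit in enumerate(message_bits):
        out[i] = (out[i] & ~1) | bit
    return out
```

Rendering the map as a black-and-white image makes the embedded region stand out: natural image content leaves correlated, structured bits, while embedded (typically encrypted) data looks like uniform noise.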
Statistical Attacks
Statistical attacks generally calculate the probability that embedded data of
a certain length is hidden in an object. Such attacks can be automated to
calculate this probability quickly and accurately.
Westfeld and Pfitzmann [58] present a statistical attack that uses pairs
of values (PoVs), which are pixel values that differ only in the least significant
bit, to calculate the probability that embedded data exists. Fridrich, Goljan,
and Du [21] explain that this approach works well for data that is embedded
sequentially, but does not produce accurate results for randomly embedded
data. The interested reader is referred to Westfeld and Pfitzmann [58],
pp. 68-71 for more information.
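A sketch of the idea in Python: sequential embedding tends to equalise the frequencies of each pair of values (2i, 2i+1), so a chi-square statistic comparing the observed count of even values against the pair average collapses toward zero when data has been embedded. Only the statistic is computed here; the p-value calculation from Westfeld and Pfitzmann's test is omitted.

```python
def chi_square_pov(pixels):
    """Chi-square statistic over pairs of values (2i, 2i+1) of an
    8-bit image; values near zero indicate equalised pairs, which is
    characteristic of sequential LSB embedding."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    stat = 0.0
    for i in range(128):
        expected = (hist[2 * i] + hist[2 * i + 1]) / 2
        if expected > 0:
            stat += (hist[2 * i] - expected) ** 2 / expected
    return stat
```

For example, a cover with a skewed pair (many 2s, few 3s) yields a large statistic, while an image whose LSBs have been overwritten with uniform message bits yields a statistic near zero.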
Fridrich, Goljan, and Du [21] propose a statistical steganographic attack
that calculates the length of potential embedded data, and therefore the
existence of embedded data. The stego-image is divided into groups of pixels,
which are then quantified to eliminate excess noise. The groups of pixels are
then divided into regular, singular, and unusable groups through the application
of a flipping function. The flipping function simply negates the LSB of a
pixel value. This technique calculates how the number of regular and singular
groups changes with increasing embedded data lengths. For more information
the interested reader is referred to Fridrich et al. [21].
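A minimal sketch of the classification step: the flipping function negates each LSB, and a group is classified by whether flipping increases or decreases a "noise" measure. The discrimination function below (sum of absolute differences between neighbouring pixels) follows the general shape of the one used by Fridrich et al., but details such as flipping masks and group sizes are omitted.

```python
def flip(group):
    """Flipping function: negate the LSB of each pixel value."""
    return [p ^ 1 for p in group]

def smoothness(group):
    """Discrimination function: total variation within the group."""
    return sum(abs(a - b) for a, b in zip(group, group[1:]))

def classify(group):
    """Regular if flipping adds noise, singular if it removes noise."""
    before, after = smoothness(group), smoothness(flip(group))
    if after > before:
        return "regular"
    if after < before:
        return "singular"
    return "unusable"
```

In an image with little embedded data, regular groups strongly outnumber singular ones; as the amount of embedded data grows, the two counts converge, which is what the length estimate is derived from.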
We will now discuss cryptographic file systems as a method for obfus-
cating data on the hard disk, by discussing some implementations thereof.
Figure 4.3: CFS design architecture [14]
Cryptographic file systems obfuscate data but do not hide the presence of
data. We will then discuss steganographic file systems in order to contrast
the application of each of these two different types of file system.
4.4 Cryptographic File Systems
The goal of cryptographic file systems is to provide transparent encryption
and decryption for user data. Cryptographic file systems are usually imple-
mented as an encryption layer on an existing host file system, as is the case
with CFS and Cryptfs discussed below. These file systems do not maintain
a traditional file system of their own, but rely completely on the host file
system to provide low-level access to the data. Other cryptographic
implementations, such as the Linux Cryptoloop driver, also discussed below,
manage the encryption and decryption of raw data, which allows for indirect
creation of a cryptographic file system through the use of other userspace
tools.
4.4.1 The Cryptographic File System - CFS
Blaze [7] was one of the first to propose creating a file system that transpar-
ently encrypted and decrypted user data. The Cryptographic File System
(CFS) was created in order to demonstrate these techniques. Transparency
is achieved by limiting the amount of human-interaction with the "crypto-
graphic housekeeping" that usually occurs when trying to encrypt data.
Normal user tools that are available for encrypting data involve a fair
amount of interaction with the human operator. The UNIX mcrypt utility
can be used to encrypt data streams. The user however needs to specify a
number of parameter arguments, such as the encryption algorithm to utilise,
the keysize, and the keymode. CFS tries to eliminate a large amount of user
interaction through the use of transparency.
This is achieved by introducing encryption and decryption routines di-
rectly into the file system implementation. CFS creates a "virtual" file sys-
tem on the host machine, through which a particular user can interact with
their encrypted files. CFS is implemented as an interface between a UNIX
file system and encrypted user data. Blaze [7] explains that a user can access
their encrypted data by using a userspace tool to issue an "attach" command
on an encrypted directory using an encryption key. CFS will then encrypt
and decrypt user data as needed. When a user is done with the file system,
a "detach" command is issued and CFS will detach the encrypted directory
from the virtual file system.
The CFS virtual file system is simply a directory that contains all the
encrypted data on the host file system; this allows CFS to operate completely
independently of the host machine. CFS is implemented as a userspace daemon
that interacts with the file system using a modified NFS [45] server. CFS
provides a number of userspace tools that can be used to interact with the
CFS daemon and the virtual file system.
The CFS virtual file system is created using the cmkdir command; this
creates an initial directory that will contain the encrypted data. Interaction
with CFS is initiated using the cattach command; this command will
instruct the CFS daemon to mount the virtual file system within the host
Vnode sub-system. Because the CFS daemon is a modified NFS server, the
standard NFS client is used to control access to the CFS daemon. All of the
transparent encryption and decryption is then handled by the CFS daemon.
In order for a virtual file system to be unmounted, the cdetach command is
used.
Data in CFS is encrypted using the DES algorithm (see section 3.3.2,
page 40); however, there are different encryption algorithms available. The
interested reader is referred to the article authored by Blaze [7] for more
specific implementation details.
4.4.2 Cryptfs
Cryptfs [60] is another implementation of a transparent cryptographic file
system. Like CFS it is implemented as a virtual file system that uses an
existing file system as a host. Unlike CFS, Cryptfs is implemented as a
loadable kernel module. This allows Cryptfs to have better performance and
security as all internal workings are protected within the kernelspace of the
Figure 4.4: Cryptfs design architecture [14]
operating system. A userspace tool is provided with Cryptfs in order for the
user to access and manage encrypted data.
Cryptfs is implemented as a stackable Vnode (see section 2.7, on page 29)
level file system; this allows the Cryptfs kernel module to extend the
functionality of any existing file system. Data is encrypted using the Blowfish
[47] algorithm. The interested reader is referred to the article authored by
Zadok, Badulescu, and Shender [60] for specific implementation details.
4.4.3 Linux Cryptoloop Driver
The Linux Cryptoloop driver uses the Linux loopback driver to allow a file
to be mounted as a block device [11]. This allows a file system to be cre-
ated within a normal file. Interaction with the loopback device is done via
the standard Linux VFS layer [23]. The Cryptoloop device adds a level of
transparent encryption to the standard loopback driver.
The Cryptoloop device uses the Linux CryptoAPI to provide many dif-
ferent encryption algorithms for the underlying file. The underlying file is
initially mounted using a userspace application, which will also initialise the
encryption key for the Cryptoloop device. After the initial mounting operation,
the underlying file is interacted with as if it were a normal block device,
with the Cryptoloop driver providing transparent encryption and decryption
of data. A userspace tool is also used to unmount the device and release any
kernel resources the Cryptoloop driver is utilising.
Once the underlying file has been created and mounted via the Cryptoloop
driver, standard file system creation tools can be used to create a complete file
system within the encrypted file. The standard file system implementation
will then handle the storage of data, and the Cryptoloop driver will handle
the underlying encryption.
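The layering can be illustrated with a toy Python class: a regular file is addressed as fixed-size blocks, and every block is transparently encrypted on write and decrypted on read. A SHA-256-derived XOR keystream stands in here for the real CryptoAPI ciphers, and the class name and block size are illustrative only.

```python
import hashlib

BLOCK = 512  # toy sector size

class ToyCryptoLoop:
    """A file addressed as encrypted blocks, loopback-style."""
    def __init__(self, path, key):
        self.f = open(path, "r+b")
        self.key = key

    def _keystream(self, block_no):
        # Per-block keystream derived from the key and block number.
        stream = b""
        counter = 0
        while len(stream) < BLOCK:
            stream += hashlib.sha256(
                self.key + block_no.to_bytes(8, "big")
                + counter.to_bytes(8, "big")).digest()
            counter += 1
        return stream[:BLOCK]

    def write_block(self, n, data):
        ks = self._keystream(n)
        self.f.seek(n * BLOCK)
        self.f.write(bytes(a ^ b for a, b in zip(data, ks)))
        self.f.flush()

    def read_block(self, n):
        self.f.seek(n * BLOCK)
        raw = self.f.read(BLOCK)
        ks = self._keystream(n)
        return bytes(a ^ b for a, b in zip(raw, ks))
```

A file system created on top of such a device sees only plaintext blocks, while the backing file on disk holds only ciphertext; this mirrors how standard file system creation tools operate unchanged above the Cryptoloop layer.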
Figure 4.5: Linux Cryptoloop driver architecture [14]
Although the Cryptoloop driver is generally considered a secure method
of transparent encryption, an exploit was discovered in 2005 which allows for
watermarked data to be detected within the underlying encrypted file [49].
4.5 Steganographic File Systems
Steganographic file systems are file system implementations that strive to
hide the presence of data within the structure of an existing file system.
Cryptographic file systems obscure data through cryptographic algorithms,
but never deny the presence of the encrypted data. Steganographic file sys-
tems however obscure data, usually through cryptography and data hiding
techniques to provide plausible deniability.
Plausible deniability is a feature exhibited by steganographic file
systems, in that it allows the existence of data to be denied. This allows
sensitive data to be hidden from adversaries, for example to thwart industrial
espionage, or to protect trade secrets.
File system steganography can be seen as low-level steganography, while
image and audio steganography (see section 4.2) can be seen as high-level
steganography. High-level steganography makes it possible to almost com-
pletely hide the presence of embedded data to such a degree that it becomes
almost impossible to detect. This is achievable because the structure of the
cover-file is well known. Low-level steganography is much less precise: in
the case of a file system, no assumptions can be made about the structure of
existing data.
In order to fully discuss different file system implementations we need to
make a number of assumptions about the structure of a file system. These
assumptions will be discussed below.
4.5.1 File System Assumptions
In order to fully discuss steganographic file systems a number of assumptions
need to be made about the cover file system. The cover file system is the
existing file system where data will be hidden. This file system will contain,
in some form or another, a storage map which will mark allocated file system
blocks. The size of a file system block is defined by the cover file
system; it will usually be 1024 bytes (1 KiB) or larger.
Existing data in the cover file system must be considered to be raw data,
as no assumptions can be made about the structure or makeup of the data.
As an extension to this, any allocated file system block must be considered to
be "completely allocated". A steganographic file system is confined to hiding
data within the unallocated cover file system blocks, while at the same time
not inhibiting the normal operation of the cover file system.
As a result, steganographic file systems rely heavily on encryption
algorithms to obfuscate the presence of hidden data. Ultimately, hidden data
is allocated in such a way as to make it appear to be the result of normal file
system operations, such as the continual creation and deletion of files.
In the sections below we will discuss three different proposed methods for
creating a steganographic file system; these will be critiqued in the following
chapter. Each of the discussed methods uses the above assumptions in order
to hide data within the structure of a host file system.
4.5.2 Anderson, Needham and Shamir
Two different steganographic file system implementations were presented by
Anderson, Needham, and Shamir [4]. They propose that a steganographic
file system should provide plausible deniability while also securing the hidden
data. Their two proposed solutions are discussed below.
Method I
The first method proposed by Anderson et al. [4] utilises a number of
random cover-files in order to hide embedded data. The embedded data is
hidden in an exclusive or (XOR) of a subset of the random cover-files, which
is chosen with a password P.
Assume that the file F is to be the embedded data, and the user specifies
a stego-key P that has a bitlength of k. Then suppose the complete set of
random cover-files is C_0, C_1, ..., C_(k-1). The subset of cover-files is then
obtained by selecting C_j if the jth bit of P is one. They go on to explain
that the subset of cover-files is combined using a bitwise XOR to produce
C_XOR. C_XOR is then XOR'ed with F, and the result of this is then XOR'ed with
a cover-file, C_j, from the original subset.
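Under the assumptions above, the scheme can be sketched in Python. The cover-file size, key length, and the choice of which subset member absorbs the update are illustrative; the essential property is that after embedding, the XOR of the subset selected by P equals the embedded file F.

```python
import secrets

SIZE = 16  # illustrative cover-file size in bytes

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def embed(covers, key_bits, F):
    """Modify one selected cover so the XOR of the subset equals F."""
    idx = [i for i, bit in enumerate(key_bits) if bit]
    c_xor = bytes(SIZE)
    for i in idx:
        c_xor = xor_bytes(c_xor, covers[i])
    j = idx[0]  # any member of the subset may absorb the change
    covers[j] = xor_bytes(covers[j], xor_bytes(c_xor, F))

def extract(covers, key_bits):
    """XOR the subset selected by the key bits to recover F."""
    out = bytes(SIZE)
    for i, bit in enumerate(key_bits):
        if bit:
            out = xor_bytes(out, covers[i])
    return out

key_bits = [1, 0, 1, 1, 0, 1, 0, 1]  # bits of the stego-key P, k = 8
covers = [secrets.token_bytes(SIZE) for _ in key_bits]
F = b"hidden file data"              # 16 bytes of embedded data
embed(covers, key_bits, F)
assert extract(covers, key_bits) == F
```

Without the correct P, an adversary does not know which subset of cover-files to XOR, and each cover-file on its own remains indistinguishable from random data.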
Anderson et al. [4] then go on to extend this method to include multiple
security levels. This system relies on the existence of cover-files on the cover
file system, which could potentially give away the use of this method.
Method II
In this method Anderson, Needham, and Shamir [4] propose that a whole
disk is filled with pseudo-random bits and the embedded data is stored at
some pseudo-random location on the disk. They go on to explain that this
approach is subject to the Birthday Paradox (see section 3.6.3, on page 54)
and that collisions are likely to occur after √n disk blocks have been written,
assuming that there are n disk blocks in total. This implies that a disk is
considered full once only a fraction of the total blocks has been written.
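This can be checked with a quick simulation (the block count, seed, and trial count below are arbitrary). For n blocks, the expected number of distinct writes before the first collision is roughly √(πn/2), i.e. about 1.25√n, so a 10,000-block disk collides after roughly 125 writes on average:

```python
import random

def writes_until_collision(n_blocks, rng):
    """Write pseudo-random blocks until one is chosen a second time."""
    used = set()
    while True:
        b = rng.randrange(n_blocks)
        if b in used:
            return len(used)  # distinct blocks written before the collision
        used.add(b)

rng = random.Random(42)
trials = [writes_until_collision(10_000, rng) for _ in range(500)]
avg = sum(trials) / len(trials)  # close to 1.25 * sqrt(10_000) = 125
```

In other words, without countermeasures the disk must be treated as full after only a small fraction of its blocks have been used.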
Their solution is to write the embedded data at two or more pseudo-random
locations on the disk. This has the effect of reducing the possibility
that the embedded data will be overwritten, and thus increasing the total
number of disk blocks that can be used. The interested reader is
referred to Anderson et al. [4] for more information.
4.5.3 McDonald and Kuhn
McDonald and Kuhn [33] propose a system that is inspired by the second
method presented by Anderson et al. [4] (see section 4.5.2). Their method
uses a modified version of the Ext2 [8] file system to store embedded in-
formation within unused file system blocks. Their design, called StegFS, is
implemented as a Linux file system that is backward compatible with Ext2.
It incorporates fifteen different security levels and features all of the elements
of a standard UNIX file system, such as a directory structure that contains
directories and files, and hard and soft links. They make no attempt to hide
the existence of their file system from the trained user, but provide plausible
deniability through the use of encryption and a number of security levels.
The backward compatibility of their file system with Ext2 allows for the
file system driver to be removed from the system and allow non-hidden files
to be accessed using the normal file system driver. This ability allows further
deniability because the hidden data will just appear to be unused disk blocks
that have been overwritten.
Access to the hidden data is controlled through a number of userspace
tools. These tools give access to a particular security level using a specific
password. Each of the fifteen security levels are accessible as directories
under the root directory. McDonald and Kuhn [33] explain that normal file
system operations are performed exactly as if the file system were a generic
Ext2 file system; this however opens up the possibility that hidden data will
be overwritten.
To counter this, StegFS replicates the hidden data on disk, which allows
the hidden data to be recovered. This does not, however, guarantee that
hidden data, including all of its replications, will never be completely
overwritten. In this case, an error is returned to the user and a file system
repair tool can be used to clean up any remaining remnants of the hidden data.
StegFS Structures
In order for StegFS to reference hidden files within a security level, a Block
Table and inode structure are used. The Block Table controls the blocks that
are allocated to hidden files. For every allocated block there is a 128-byte
Block Entry structure, which contains magic numbers, checksum values, an
initialisation vector, and an associated inode number.
The inode number in the Block Entry will reference an inode structure.
The StegFS inode structure is similar to the Ext2 inode structure and contains
12 direct blocks, one indirect block, one double indirect block, and one
triple indirect block. Unlike the Ext2 inode, the StegFS inode contains
references to all the replicated versions of the hidden file. This allows StegFS
to retrieve the hidden data even if one of the replicated versions has been
overwritten.
4.5.4 Pang, Tan, and Zhou
Pang, Tan, and Zhou [39] propose a steganographic file system that strives
to minimise processing and storage overhead. To completely hide the
embedded data, any metadata, such as inode tables and usage statistics,
is embedded within the hidden data and encrypted as a single object on the
disk.
Block allocation is managed with a single storage bitmap that manages
both the hidden and non-hidden data. To try to minimise the overhead
from block replication, as seen in section 4.5.3, hidden blocks are marked as
allocated in the storage bitmap; this prevents these blocks from being
allocated to non-hidden files during normal operation.
Marking blocks allocated to hidden data in the same storage bitmap as
non-hidden data could, however, betray the location of the hidden data. To
combat this, random values are written to every block during initialisation,
and a number of abandoned blocks are introduced into the file system. These
abandoned blocks are marked as allocated in the bitmap but only contain
random values. This is done to obfuscate the presence of actual hidden data.
Together with abandoned blocks, a number of dummy files are introduced
throughout the file system. These files are periodically allocated and deal-
located, resulting in the storage bitmap periodically changing. This is done
to prevent "snapshots" of the block bitmap betraying the location of any
hidden data.
No metadata is stored separately from the hidden files; as such, hidden files
are located using a hash value of the file name combined with an access
key. The access key will only provide access to files that were created by a
specific user. Allocated hidden files also have the ability to hold on to free
blocks: when a file is truncated, its inode can still reference the blocks that
would normally no longer be allocated to the file. This further obfuscates
the hidden file, and makes it difficult to distinguish a hidden file from an
abandoned block or a dummy file.
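The lookup step can be sketched as follows. Pang, Tan, and Zhou only specify that a file's location is derived from a hash of the file name combined with an access key; the use of SHA-256 and the 8-byte truncation here are assumptions for illustration.

```python
import hashlib

def locate_file(filename, access_key, n_blocks):
    """Map (filename, access key) deterministically to a block number."""
    digest = hashlib.sha256(filename.encode() + access_key).digest()
    return int.from_bytes(digest[:8], "big") % n_blocks
```

Because no directory or inode table exists outside the hidden data itself, a user without the access key cannot even enumerate which blocks belong to hidden files.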
4.6 Summary
This chapter was concerned with different steganographic concepts. The
following concepts were discussed in this chapter:
Steganography - discussed as an overview of steganography, in which
we introduced a number of different steganographic concepts, including:
- Terminology - where domain-specific terminology was introduced to
the reader, which is used throughout this and later chapters.
- Historic Steganography - we gave a brief overview of the historic
use of information hiding.
- Currency Protection Mechanism - we discussed this concept as
an example of a non-digital use of steganography.
- Copyright Protection Mechanism - in this section we discussed
the use of steganography as a method for providing digital rights
management (DRM).
Digital Steganography - in this section we discussed the popular digital
applications of steganography, including:
- Image Steganography - arguably the most well-known applica-
tion of information hiding. In this section we discussed the hiding
of information in images.
- Audio Steganography - where we discussed the hiding of infor-
mation in audio data.
- Least Significant Bit Attacks - we discussed methods of attacking
steganographic content based on the least significant bit approach.
Cryptographic File Systems - we discussed the family of file systems
which transparently encrypt user data; this was discussed as a precursor
to later sections. We introduced the following implementations:
- The Cryptographic File System - CFS - in which we discussed
this early cryptographic file system implementation.
- Cryptfs - a cryptographic file system which is implemented as a
loadable kernel module.
- Linux Cryptoloop Driver - an implementation of the Linux loop-
back driver which adds a cryptographic layer.
Steganographic File Systems - where we discussed the family of file
systems which attempt to embed data within a file system implementation;
we discussed the following:
- File System Assumptions - we introduced a number of assumptions
which relate to the cover file system.
- Anderson, Needham, and Shamir - the first to describe a steganographic
file system, laying down the framework for other implementations.
- McDonald and Kuhn - a steganographic file system implementation
which is more closely modelled on a traditional file system.
- Pang, Tan, and Zhou - a steganographic file system implementation
which operates in a slightly different way, using dummy and
abandoned blocks.
4.7 Conclusion
The ability to hide information in digital data is an important aspect of the
electronic data that we interact with. We interact with steganographic
techniques on an almost daily basis, probably without ever realising it.
Steganography is gradually finding its way into many different forms of digital
data. As this data becomes more accessible through the growing use of the
Internet, a need is developing for data to be secured, and steganography
offers techniques for doing so.
There are often situations where cryptography alone is simply not sufficient
to secure data, such as top-secret military information. Steganography offers
another level of protection against such sensitive data being compromised.
In this chapter we introduced the concepts and applications of steganography.
In section 4.2 we explained how steganography was historically used to
hide important messages, and how data hiding techniques are used in
non-digital data, such as currency. We then introduced some digital techniques
for steganography in section 4.3, namely image steganography and audio
steganography.
In section 4.4 we explained some concepts of transparent cryptographic
file systems and explained some implementations. Finally we introduced
steganographic file systems in section 4.5, and discussed how data hiding is
achieved in some file system implementations.
Concepts introduced in this chapter are fundamental to chapters 5, 6,
and 7. Steganographic concepts and terms are used extensively throughout
the following chapters in order to describe certain components. The concepts
introduced in chapters 2, 3, and 4 form the basis for the remaining chapters.
In the following chapter we will discuss the design and implementation of a
Secure Steganographic File System.
Part II
SSFS: The Secure
Steganographic File System
Chapter 5
SSFS: File System Implementation
5.1 Introduction
A steganographic file system can be used to ensure the security of informa-
tion, not only through conventional encryption mechanisms, but by allowing
data to be hidden from unauthorised access. Large amounts of information
can be stored in a secure manner within the structure of a host file system,
which provides an advantage over traditional image or audio steganography.
The implementation of a steganographic file system requires that a num-
ber of different aspects be addressed, such as the duplication of hidden data.
A steganographic file system requires careful interaction between the
so-called "hidden" and "non-hidden" data in order to maximise the overall
performance and reliability of the system. Additional information security
features must be addressed, such as the use of cryptography, in order to
provide a solution which can effectively secure steganographic content.
In order to address the problems with existing steganographic file system
implementations, this chapter introduces our Secure Steganographic File
System (SSFS).
Firstly, in section 5.2 we define a number of terms which will be
used in this and later chapters to describe components of the steganographic
file system. We go on in section 5.3 to discuss the problems with existing
steganographic file system implementations.
In section 5.4 we introduce the aim of this file system implementation by
discussing a number of aspects which must be addressed in order to achieve
a non-duplicating steganographic file system. Finally, in section 5.5 we discuss
the basic construction of the steganographic file system with respect to the
interaction of the different components.
System - A high-level computing environment which contains an operating
system which allows access to a block device (such as a hard disk drive),
usually through the use of a Kernel API.

Host File System - A file system implementation that contains (in some
form) a superblock, storage map, and file and directory control blocks. The
host file system is used as a container for the hidden file system.

Hidden File System - A file system that will reference hidden data within
the host file system. The hidden file system is embedded within the host
file system.

Non-Hidden Data - Data that is stored on the host file system.

Hidden Data - Data that is stored on the hidden file system.

Shell - A command line interface (CLI) which allows an operator to interact
with the file system using human-understandable commands.

Table 5.1: SSFS definitions
5.2 Definitions
In order to fully discuss the implementation of our steganographic file sys-
tem, a number of concepts must be defined; these can be seen in table 5.1.
These concepts will be used throughout the following chapters to describe
certain components of the steganographic file system implementation and
are essential to describe the interactions between different components of the overall
system.
5.3 Problems with Existing Implementations
In order to fully understand the proposed steganographic file system, we
will now critically evaluate some problems with the existing steganographic
file systems which SSFS will try to address. We will evaluate the
implementations of McDonald and Kuhn, and of Pang, Tan, and Zhou, which were
introduced in sections 4.5.3 and 4.5.4.
5.3.1 McDonald and Kuhn
The steganographic file system implementation by McDonald and Kuhn (see
section 4.5.3 on page 77) stores hidden data in a backward compatible file
system implementation. The major drawback of this particular implementation
is the duplication of steganographic data which has to occur in order to
avoid collisions between hidden data and non-hidden data. Hidden data is
encrypted; as such, the contents are only accessible with the correct passphrase.
However the existence of the hidden data can be betrayed through a combi-
nation of the data duplication and a low-level examination of the physical
file system blocks.
In order to limit the exposure of hidden data, and therefore limit the risk
of detection, it would be advantageous to avoid data duplication altogether.
Data duplication will create instances where an exact copy of the hidden
data is stored in two or more physical locations on a device.
In order for a steganographic file system to effectively hide data, the
hidden data is masked to appear as random artefacts from the multiple ma-
nipulations of a disk block. During the normal operation of a file system,
the possibility that unused data in two or more unallocated blocks will be
exactly the same is extremely low.
Steganographic content can thus be detected by physically examining the file
system and finding two or more unallocated blocks which contain the same
random data. If this condition is met, then it can be said with relative certainty
that a host file system contains steganographic data. However, a passphrase
will still need to be obtained in order to actually view the steganographic
content, which is computationally expensive.
It would be advantageous to avoid data duplication in order to eliminate
the possibility that hidden data could be detected by examination of the file
system structure.
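The detection attack described above amounts to a duplicate scan over the unallocated blocks, which can be sketched in a few lines (the block input format and hash choice are illustrative):

```python
import hashlib
from collections import defaultdict

def find_duplicate_unallocated(blocks):
    """Group unallocated blocks by content hash; any group of two or
    more identical 'random-looking' blocks suggests replicated
    hidden data."""
    groups = defaultdict(list)
    for block_no, data in blocks:  # (block number, raw bytes) pairs
        groups[hashlib.sha256(data).hexdigest()].append(block_no)
    return [nums for nums in groups.values() if len(nums) > 1]
```

A non-duplicating design gives this scan nothing to find, because no two unallocated blocks ever share the same content as a result of hiding data.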
Summary
Presented below is a summary which outlines the findings of the discussion
above.
The file system implementation is backward compatible with the host
file system.

In order to avoid collisions between the hidden and non-hidden data,
duplication of the hidden data is used. This has the effect that:
- Two or more physical blocks will contain the same steganographic
content.
- The possibility that the steganographic content can be detected by
low-level examination of the device is increased.

The file system contains its own set of control structures, which are also
duplicated.

The control structures reference all duplicates of the hidden data.
In the following section we will evaluate the implementation described
by Pang, Tan, and Zhou.
5.3.2 Pang, Tan, and Zhou
This particular implementation, discussed in section 4.5.4 on page 78, does not
suffer from the duplication of data which exists in the previously discussed
implementation. The authors however do not make any attempt to hide the
fact that a steganographic file system is in place. There is no distinction
between the host and hidden file system; there is only a single file system
which can hide data. This will have performance benefits but will not have
the inherent plausible deniability aspects of utilising a cover file system.
5.3. PROBLEMS WITH EXISTING IMPLEMENTATIONS 89
In order to manage the storage of data in this implementation, all the
storage information is stored in a common location. This however could
provide a single point of failure for the entire system: should the storage object
become corrupt, all data, hidden and non-hidden, will be lost. The conventional
file system design of distributing the file system metadata structures to
different physical locations minimises the risk of file system data becoming
corrupt through a single failure.
The use of abandoned and dummy blocks is an efficient method of
obscuring the location of hidden data, but this does not ensure that data will
remain hidden. This implementation relies on strong cryptography in order
to ensure that data will remain secure. The lack of a cover file system does
betray the location of hidden data, as the exact physical positions must be
explicitly marked in a shared storage map. Given this, the implementation
does not create a steganographic file system in the purest sense of the concept.
Summary
Below we present a summary of the above discussion.
All control information for the hidden data is stored in a single object.

The storage map marks physical blocks allocated to hidden and non-hidden
data.

Blocks allocated to hidden data are not automatically reclaimed when
they are no longer used, in order to obscure the location of the hidden
data.

Abandoned and dummy blocks are used to obscure the presence of the
hidden data.
- These blocks are moved around the file system in order to obscure
the location of the hidden data.

Lack of a cover file system could betray the presence of the hidden
data.
In the following section, the aim of the proposed implementation will
be discussed, together with the aspects which a steganographic file system
should possess.
5.4 Aim
The aim for SSFS is to provide a mechanism to hide arbitrary data within
an unassuming regular file system. There are a number of areas which need
to be addressed when constructing a steganographic file system. These are
listed and discussed below.
Security - hidden data must remain protected from attack. This can
be achieved through the use of cryptography, providing confidentiality,
integrity, and availability.

Consistency - hidden data that is read from the steganographic file
system must be the same as the initial data that was stored.

Transparency - normal operation of the host file system should not
be impacted by the embedded hidden file system.

Backward Compatibility - the implementation should be backward
compatible with the cover file system, thus providing plausible deniability.

Dynamic Reallocation - to allow the hidden data to be reallocated
to any free physical block, in order to avoid data duplication, thus
avoiding collisions between the hidden and non-hidden data.
The aim of the following chapters is to describe the SSFS implementation, which provides a secure and convenient mechanism for embedding arbitrary user data within a host file system. A further aim is to present an implementation which is free from having to duplicate data in order to avoid collisions, as described in the previous chapter (see section 4.5, page 75). The proposed solution is to provide a dynamic reallocation mechanism that will transparently reallocate hidden data.
This implementation will take the form of a file system within a file sys-
tem; this will allow for efficient storage and provide the foundation for dy-
namic reallocation.
This steganographic file system will focus on modifying an existing file
system implementation in order to support embedded data. This will also
allow for the host file system to be accessed by a standard file system driver.
Backward compatibility with the original host file system driver must be
maintained in order to effectively obscure the hidden data, and to provide a
plausible deniability feature.
In the following sections we will discuss each of the goals for SSFS stated above: security, consistency, transparency, backward compatibility, and dynamic reallocation.
Security
Information security is achieved through a combination of information hiding
and cryptography, to provide confidentiality, integrity, and availability.
The security of the user data is achieved through the use of cryptogra-
phy, which allows the file system to restrict unauthorised access to the hidden
data. Only by supplying the correct passphrase will access to the data be
allowed. A strong cryptographic algorithm will ensure that the user data
cannot be accessed or modified without knowledge of the passphrase, thus providing confidentiality and integrity. Availability is assured through the file system implementation, which will manage and ensure access to hidden data when required.
The choice of algorithm is important, as this will greatly influence the
overall performance of the file system. A good example of a cryptographic
algorithm to use is the Serpent algorithm, as discussed in section 3.3.3 on
page 43. Recall that the Serpent algorithm is designed to be a fast, secure, modern cryptographic algorithm, and is thus well suited to this kind of application; however, any modern cryptographic algorithm, such as Rijndael [17], would provide a good basis for data encryption. The Serpent algorithm was chosen for this implementation because it is not patented and is in the public domain, with no restrictions on its use.
Consistency
It is important for the data contained within the file system, hidden and non-hidden, to remain consistent. This is especially important for hidden data, which exists in an environment that supports dynamic reallocation. Data contained in the hidden file system could remain unchanged even after multiple reallocations. Precautions must be taken with both the metadata and the user data to ensure that they remain consistent even after reallocations have taken place.
Data consistency is difficult to achieve without mechanisms such as a journal. There is always the possibility that a catastrophic event, such as a hard drive failure, will render user data inaccessible; this is unavoidable.
Throughout the development of a steganographic file system, great care
needs to be taken to ensure that all the data remains consistent.
Transparency
Transparency refers to the ability of the host file system and hidden file system to interoperate on the same physical device without interfering with each other. This allows the user of the host file system to be unaware of the existence of the hidden file system, or of the hidden data. The mechanism provided within the file system implementation must support transparent access to data in both the host and the hidden file systems.
The operation of the host file system must appear to be completely normal; it must behave as it would under normal circumstances. This is an important consideration when dynamic reallocation of the hidden data is taken into account, because of the interaction which must take place between the host and hidden file systems. Any interaction between the two must be designed so as not to produce any behaviour that would indicate that another file system is in operation.
Backward Compatibility
Backward compatibility refers to the ability of the steganographic file system
implementation to remain compatible with a standard driver for the host file system. Because the host file system is derived from an existing file system, a standard driver for the original file system must be able to access the data stored within the host file system.
For example, if the host file system is constructed from the FAT file system, then a standard FAT driver should be able to read and write to it as if it were a normal FAT file system. This is achieved by ensuring that the steganographic file system implementation does not modify existing file system structures, while maintaining interoperability between the host and hidden file systems.
Backward compatibility has the effect of aiding plausible deniability by
allowing a user to access the steganographic file system with the standard
file system driver. This process allows the user to deny the existence of the
steganographic content, because as far as the standard file system driver is
concerned, the steganographic content does not exist.
The process of accessing the steganographic file system with the standard file system driver will result in rendering the hidden data permanently inaccessible, as the steganographic file system implementation will no longer be present to ensure interoperability between the host and hidden file systems. This can be useful if a user is requested to access data whilst under duress, as the presence of the hidden data can effectively be denied.
Dynamic Reallocation
Dynamic reallocation is the ability of the file system to automatically reallocate hidden data to a different physical block. This occurs when the host file system requires a physical block containing hidden data for non-hidden data; this will be discussed later in this chapter.
This provides a method of avoiding duplication of the hidden data as a means of collision avoidance: hidden data is simply moved to a different physical location and the hidden file system control structures are updated. Dynamic reallocation allows the file system to effectively mask the presence of the hidden data, as no duplicates are stored, which limits the overall exposure of the hidden data.
In the following section we will discuss the need for a steganographic file system, and the role which it can play in securing information.
5.4.1 The Need for a Steganographic File System
As more of our personal information moves into the electronic realm and is distributed via email and the Internet, there is a growing need for that information to be protected. If our personal information falls into the wrong hands, we open ourselves up to issues such as identity theft and fraud.
A steganographic file system adds another layer to conventional information security in a transparent and convenient way. This will give individuals the confidence to store personal and sensitive information on a computer system, because there is no longer the fear that this information can be obtained without their express permission and knowledge.
Information hiding techniques can be used to allow not only the data con-
tent to be obfuscated, but also the presence of the data. Data contained
within a steganographic file system can only be revealed with the express
permission of the user.
Information which is stored on a computer is not inherently secure from attack. A resourceful user can gain access to almost any data stored on a computer with varying degrees of effort. Cryptography allows us to secure our information from outside attack, but the evidence of encrypted data is clearly visible. Information hiding allows the existence of data to be denied, giving users greater control over access to their data.
Information security is of utmost importance as we move towards a cyber-existence. As our presence on the Internet increases, users must take adequate steps to ensure that personal information remains secure. A steganographic file system can be used to hide important information, giving us the ability to deny the existence of our information; this will ensure that sensitive information can remain almost completely secure.
In the following section we will discuss a number of limitations which exist for steganographic file systems.
5.4.2 Limitations of a Steganographic File System
Unlike image and audio steganography (discussed in the previous chapter), file system steganography does not allow data to be completely hidden from a forensic examination of a hard disk drive; this is inherent in the way in which data is stored on the physical disk. Steganographic file systems draw their strength from an effective organisation of the hidden data on the physical device, and from the use of cryptographic algorithms.
Steganographic file systems aim to store hidden data in such a way as to allow it to appear to be random artefacts from normal file system use. Most modern file systems do not remove the data associated with a file when it is deleted; they only mark the corresponding file system blocks as unallocated and reclaim the associated file control blocks. This improves the performance of the file system, but it leads to artefacts, or remnants of deleted files, developing in the unallocated blocks over time.
Embedding data in the unallocated blocks of a host file system introduces a limitation: host file system blocks must be considered completely allocated. For example, should the host file system have a block size of 1024 bytes but only write 128 bytes to a block, the whole block must be marked as allocated, even though there are theoretically 896 unallocated bytes.
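The arithmetic above can be sketched as follows. This is illustrative code only, not part of the SSFS implementation:

```c
/* Number of whole file system blocks needed to hold a file. */
unsigned int blocks_needed(unsigned int file_size, unsigned int block_size)
{
    return (file_size + block_size - 1) / block_size;
}

/* Slack bytes left unused in the final block of a file.
   For the example above: slack_bytes(128, 1024) == 896. */
unsigned int slack_bytes(unsigned int file_size, unsigned int block_size)
{
    unsigned int rem = file_size % block_size;
    return rem == 0 ? 0 : block_size - rem;
}
```

Because the hidden file system sees only allocated and unallocated blocks, these slack bytes are invisible to it, as discussed next.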
This file slack space is not used by the steganographic file system, as the host file system and the hidden file system are generally two separate entities, designed to have only very minor interactions, if any. The structure of the data contained in the host file system is not exposed to the hidden file system implementation; this data is considered raw in nature. The high-level construction of the files and directories in the host file system is not parsed by the hidden file system; the physical blocks they occupy are simply considered allocated.
Although storing hidden data in the slack space would increase the overall storage capacity of the hidden file system, a large administrative overhead would be introduced, as hidden data would no longer be contained in fixed-size file system blocks. This would affect the encryption and dynamic reallocation mechanisms, as "file system blocks" of unequal size would introduce large performance and administrative limitations.
In order to maximise performance, hidden file system blocks are all considered to be equally sized, which allows them to be reallocated easily, thus eliminating data duplication. However, this affects the total maximum size of the hidden file system.
The analysis and investigation of forensic techniques relating to the detection of SSFS is not the main focus of this dissertation, and therefore does not fit into the scope of this document. However, forensic analysis of the hidden file system would allow forensic examiners to detect steganographic content on a physical device. This would be advantageous in preventing the abuse of a steganographic file system through the storage of illegal data.
In the following section we will discuss the basic construction of SSFS.
This section will serve as an overview for the detailed construction which will
be presented in later sections. The following section will help to introduce
concepts and aspects of the overall design.
5.5 Basic Construction
To facilitate a detailed discussion of a steganographic file system, this section
will introduce a number of concepts to help understand how the host and
hidden file systems will interact. The steganographic file system described
in this dissertation is implemented in a Linux environment. This is because Linux is open and fairly well documented; as such, a number of terms and concepts used in this dissertation are based on those found in a UNIX-type environment.
In order to achieve a non-duplicating steganographic file system, a file
system within a file system approach will be taken. The steganographic file
system is divided into two parts, namely the host file system and the hidden
file system. We assume that there are a number of hooks into the kernel API
96 SSFS: FILE SYSTEM IMPLEMENTATION
to provide a number of low-level functions such as reading and writing file
system blocks. Each of these two parts will be discussed below.
5.5.1 Modes of Operation
There are two different high-level modes of operation that must be considered
when interacting with a steganographic file system. These are host-only
mode and hidden-only mode. These different modes operate on either the
hidden or non-hidden data on the block device. Both these modes will be
discussed below.
Host-Only Mode
This is what would be considered normal operation of a file system. A shell would allow the operator to access only non-hidden data; this is the mode in which the system would operate on a day-to-day basis. The host-only mode will, however, require access into the hidden file system in order to facilitate the dynamic reallocation mechanisms. In terms of access to data, only non-hidden data will be exposed to the user while in this mode.
Hidden-Only Mode
This mode would be used to access the hidden data contained within the hidden file system. Access would be controlled through a dedicated command line interface, and through access control mechanisms such as a passphrase. The hidden data can only be accessed with pre-existing knowledge of the passphrase.
Access to the hidden and non-hidden data is separated in order to clearly define the separate roles the data will play in the overall system. The hidden data will be kept more secure if access to it is tightly controlled, which is why the interactions between the two modes are kept to a minimum at all times.
5.5.2 The Host File System
The host file system is derived from an existing file system implementation,
such as the Ext2 or FAT file system. The host file system implementation
Host File System Layout (blocks 0, 1, ..., n):
| Superblock | Storage Bitmap | Inode Bitmap | Inode Table | User Data | ... |
Figure 5.1: Simple host file system layout
must remain backward-compatible with the original file system implementation; this is done to provide a level of plausible deniability. Should the steganographic file system come under scrutiny, the host file system will appear to be the original implementation. For example, if the host file system is constructed from the FAT file system, then a normal FAT file system driver should be able to access all the non-hidden data as if it were a normal instance of the host file system.
The steganographic implementation will take advantage of the way in
which the host file system stores data on the physical device. This will allow
the hidden data to be embedded within the unallocated blocks of the host
file system. For the purposes of the following chapters, a simple host file system will be used. This file system has the characteristics listed below, and these structures are arranged on disk as seen in figure 5.1.
- Superblock - to manage control information about the file system. The superblock will have a size of 1 file system block.
- Storage Bitmap - to mark the allocated blocks within the file system. The storage bitmap will have a variable size, depending on the size of the physical device.
- Inode Storage Bitmap - to mark the allocated inodes within the inode table. The inode bitmap will have a variable size, depending on the size of the physical device.
- Inode Table - to store the file and directory control blocks. The inode table will have a variable size, depending on the size of the physical device.
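As a sketch of how a storage bitmap of this kind is typically consulted (illustrative code, not the actual driver), one bit marks the allocation state of each file system block:

```c
/* Test whether block b is marked allocated in the storage bitmap.
   Bit (b % 8) of byte (b / 8) corresponds to file system block b. */
int block_is_allocated(const unsigned char *bitmap, unsigned int b)
{
    return (bitmap[b / 8] >> (b % 8)) & 1;
}

/* Mark block b as allocated in the storage bitmap. */
void block_set_allocated(unsigned char *bitmap, unsigned int b)
{
    bitmap[b / 8] |= (unsigned char)(1u << (b % 8));
}
```

The hidden file system consults exactly this information to find unallocated host blocks in which hidden data may be embedded.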
5.5.3 The Hidden File System
The hidden file system is the component of the steganographic file system
which is used to store and reference the hidden data, which is embedded
Hidden File System Logical Layout (blocks 0, 1, ..., n):
| Superblock | Translation Map | Inode Table | User Data | ... |
Figure 5.2: Hidden file system logical layout
within the unallocated blocks of the host file system. The hidden file system
is a complete file system in its own right, and contains the following metadata
structures:
- Superblock - to manage basic storage information about the hidden file system.
- Translation Map - to facilitate storage and dynamic reallocation.
- Inode Table - to store and manage the file and directory control blocks.
Another aspect of the hidden file system is that every part has to be
reallocatable within the host file system, except for the superblock. Normal
operation of the host file system will not be hampered in any way. The
hidden file system therefore has two different views; the Logical View and
the Physical View. These two views will be discussed in the following sec-
tion. The "logical" layout of the hidden file system is shown graphically in
figure 5.2.
5.5.4 Logical and Physical View
The logical and physical views of the hidden file system are used to facilitate the dynamic reallocation of the hidden data, while providing a consistent way to store and reference the hidden data. The primary mechanism through which this is achieved is the Translation Map (which will be discussed in the following chapter), which stores a paired value for each allocated block within the hidden file system. This paired value allows hidden data to be stored at a consistent "logical" position in the hidden file system, while in actuality it can be stored at any "physical" position within the host file system.
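The paired mapping can be sketched as follows. This is a simplified in-memory illustration (the actual on-disk Translation Map structure is described in the following chapter); the names and the sentinel value are assumptions made for this sketch:

```c
#define TMAP_UNALLOCATED 0xFFFFFFFFu  /* sentinel: no physical block assigned */

/* In-memory sketch of the paired mapping: physical[logical] gives the
   host physical block currently holding that logical hidden block. */
typedef struct {
    unsigned int *physical;    /* indexed by logical hidden block number */
    unsigned int  num_blocks;  /* number of logical hidden blocks */
} tmap_t;

/* Resolve a logical hidden block to its current physical host block. */
unsigned int tmap_lookup(const tmap_t *map, unsigned int logical)
{
    if (logical >= map->num_blocks)
        return TMAP_UNALLOCATED;
    return map->physical[logical];
}

/* Dynamic reallocation only needs to repoint a single entry; the
   logical position of the hidden data never changes. */
void tmap_reallocate(tmap_t *map, unsigned int logical,
                     unsigned int new_physical)
{
    if (logical < map->num_blocks)
        map->physical[logical] = new_physical;
}
```

Note how reallocation leaves the logical view untouched, which is precisely what allows hidden files to be read back consistently after their blocks have moved.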
The integration between the hidden and host file systems is also achieved through the use of the logical and physical views. As seen in figure 5.3, data within the hidden file system is logically allocated in a contiguous manner, but the actual physical position of the data can be in any of the unallocated blocks within the host file system.
Figure 5.3: Hidden and host file system integration
In the following section an operational scenario is presented in order to
demonstrate how the hidden and host file systems will operate with a dy-
namic reallocation mechanism in place. The following section is presented
in a high-level manner in order to simply demonstrate the operation of the
steganographic file system.
5.5.5 Operational Scenario
In order to demonstrate the operation of a steganographic file system, an op-
erational scenario will now be presented. This scenario will demonstrate the
features and requirements of the host and hidden file systems. This scenario
is a simplistic overview of the operation of the steganographic file system.
All the steps described below are demonstrated graphically in figure 5.4 on
page 101.
- Alice initialises a steganographic file system on a block device. This process initialises the host and the hidden file system structures on the disk.
- Alice now stores data on the host file system (File A); this would be considered normal operation of the host file system.
- Alice now stores data on the hidden file system (Hidden File B); this hidden data is stored within the unallocated blocks of the host file system.
- Alice now stores data on the host file system (File C). The host file system will still consider the blocks where hidden data is contained as unallocated, and would overwrite the hidden data with the new data. The following events must now occur:
  1. The file system detects that there is steganographic content in the unallocated blocks where the data is to be written.
  2. The hidden data is moved to a new location within the host file system.
  3. The host file system operation can now continue as normal.
- Alice now wishes to access the hidden data that was previously stored on the hidden file system (Hidden File B). The steganographic file system must return the correct data to Alice, even if the hidden data was reallocated.
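The reallocation events in the scenario above can be sketched as a hypothetical pre-write hook. All names here are illustrative assumptions for this sketch, not SSFS's actual interface; the real mechanism is described in the chapter on dynamic reallocation:

```c
#define NBLOCKS 8   /* toy device size for illustration */

static int hidden_at[NBLOCKS];   /* 1 if the block currently holds hidden data */
static int host_used[NBLOCKS];   /* 1 if the block is allocated to host data */

/* Find an unallocated host block holding no hidden data, or -1 if none. */
static int find_free_block(void)
{
    for (int i = 0; i < NBLOCKS; i++)
        if (!host_used[i] && !hidden_at[i])
            return i;
    return -1;
}

/* Hypothetical hook invoked before the host writes physical block p. */
int host_pre_write(int p)
{
    if (hidden_at[p]) {              /* 1. detect steganographic content   */
        int q = find_free_block();
        if (q < 0)
            return -1;               /*    no room: a policy decision, cf.
                                           Preserving/Sacrificial modes    */
        hidden_at[q] = 1;            /* 2. move hidden data to a new block */
        hidden_at[p] = 0;            /*    and update hidden structures    */
    }
    host_used[p] = 1;                /* 3. host operation continues        */
    return 0;
}
```

From the host's point of view nothing unusual happened: the block it asked for is simply written, which is the transparency property described earlier.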
5.6 Summary
In this chapter we introduced SSFS, by discussing existing implementations,
introducing our aim for our implementation, and discussing the basic con-
struction. We covered the following sections:
- Definitions - where we discussed a number of concepts which will be used throughout the following chapters to describe components of the steganographic file system.
- Problems with Existing Implementations - where we critically discussed the problems with the following existing file system implementations:
  - McDonald and Kuhn
  - Pan, Tan, and Zhou
- Aim - where we outline the aim for this steganographic file system. We then discussed the following:
  - The Need for a Steganographic File System - where we outline the need for a steganographic file system in modern computer systems.
Figure 5.4: Steganographic file system operational scenario (initialisation, host and hidden file additions, and hidden file access; legend: allocated host block, allocated hidden block, dynamic reallocation)
  - Limitations of a Steganographic File System - where we discuss a number of limitations of steganographic file systems.
- Basic Construction - in this section we outline the basic construction concepts for the steganographic file system implementation. We then discussed the following concepts:
  - Modes of Operation - where we discussed how the host and hidden file systems will access their data.
  - The Host File System - in this section we discussed the basic construction and operation of the host file system.
  - The Hidden File System - where we discussed the basic layout for the hidden file system.
  - Logical and Physical View - in this section we defined the differences between the logical and physical view of the physical device as used by the host and hidden file systems.
  - Operational Scenario - in which we presented an operational scenario in order to describe the workings of the steganographic file system.
5.7 Conclusion
Information security plays an important role within the framework of our
modern lives. As more of our information is transmitted electronically, there
is a growing threat to our security. Traditional information security mechanisms, such as cryptography, are becoming less effective in securing our personal information. Steganography can be used to add another layer of protection to our information, by hiding the presence of our data.
A steganographic file system allows a large amount of data to be stored
on a physical device. This allows the existence of our data to only be revealed
with our express permission. This gives us the ability to store our personal
information with confidence that it will not be discovered.
In this chapter we discussed the design of a steganographic file system,
which will allow information to be stored within the unallocated blocks of
the host file system. In section 5.2 we introduce a number of terms used to
describe the components of a steganographic file system. We then go on to
discuss the problems with existing systems, and in section 5.4 we discuss the aim for this steganographic file system by introducing a number of different aspects which the file system must satisfy. Finally, in section 5.5, we give an overview of the basic operation of the steganographic file system, in order to demonstrate how the component parts interact.
This chapter is the basis for chapters 6 and 7, which are fundamental
for the discussion on dynamic reallocation in chapter 9. In the following
two chapters we will discuss a number of file system structures and their
implementation; this will allow us to define the working framework for SSFS.
Chapter 6
File System Structures for SSFS
6.1 Introduction
The design of the file system structures plays an important role for defining
the layout of the steganographic data on the disk. These structures will be
modified by file system operations in order to manage all aspects of the hidden
data on the physical device. The effectiveness of the underlying file system
structures will play an important role in the management and performance
of the file system.
SSFS requires a well-defined data structure that can effectively manage
and reference hidden data. This allows for operations such as encryption and
dynamic reallocation to be applied at a later stage.
In this chapter we outline the structures which will be used in SSFS in
order for the hidden file system component to effectively store and retrieve
data. We will also discuss the initialisation of these structures, with emphasis
on the limitations which the host file system will introduce.
In section 6.2 we discuss the construction of the structures used to store and reference data within the hidden file system, describing each internal field in detail. We continue in section 6.3 by discussing the initialisation of these structures, as this plays an important role in determining the initial state of the file system.
106 FILE SYSTEM STRUCTURES FOR SSFS
6.2 File System Structures
The hidden file system consists of a number of different structures which
control the layout of the hidden data within the file system. These structures
will be similar to the corresponding structures which are found in normal
file systems. The construction of these structures will be discussed in the
following sections.
The structures discussed in the following sections are of vital importance, as they play a role in all aspects of managing the hidden data within the hidden file system; their construction must always be geared towards this end. The structures discussed will be the Superblock, the TMap Array, the Translation Map, the Inode Table and its entries, and the Directory Entries.
6.2.1 Superblock
The superblock is responsible for storing basic control information about the
hidden file system, such as the block size, the location of other important
structures, and management information, such as the number of available
blocks.
The superblock is the only structure in the hidden file system that must remain in a constant position within the host file system. Normally a file system's superblock is stored in the first physical block, and the superblock of the hidden file system is no exception: it will be stored in the first physical block. At first glance this presents a problem, as the first file system block already contains the host file system's superblock.
In practice, however, the host file system does not use the entire first block to house its superblock. The host file system reserves the entire first disk block for its superblock structure; depending on the host file system block size this can be anywhere from 1024 to 8192 bytes. The host superblock only occupies a small portion of that, allowing the hidden file system's superblock to be stored in the slack space directly after it.
The superblock is stored in the first logical hidden block (logical position
0) in the hidden file system. Logical position 0 is the only logical block
which must be the same as the physical block (i.e. logical block 0 always maps to physical block 0). The structure of the hidden superblock can
be seen in listing 6.1, followed by a description of each of the elements of the
structure.
Each field in the superblock is used to store and manage information
concerning the overall hidden file system. Every structure that is stored in
the hidden file system must be locatable through the superblock. Each of
the fields in the superblock will now be discussed.
Listing 6.1: Superblock structure

    /* Define the magic numbers for the superblock */
    #define HIDDEN_FS_SPBLK_MAGIC  0x48535042 /* 'HSPB' */
    #define HIDDEN_FS_SPBLK_MAGIC2 0x48454e44 /* 'HEND' */

    typedef struct hiddenfs_superblock
    {
        unsigned int magic;      /* set to HIDDEN_FS_SPBLK_MAGIC */
        unsigned int flags;
        unsigned int inode_number;
        unsigned int inode_table_start;
        unsigned int inode_table_size;
        unsigned int tmap_start;
        unsigned int tmap_size;
        unsigned int num_blocks;
        unsigned int num_blocks_used;
        unsigned int root_inode;
        unsigned int block_size;
        unsigned int iv;
        unsigned int magic2;     /* set to HIDDEN_FS_SPBLK_MAGIC2 */
    } hiddenfs_superblock;
Superblock Control and Consistency Fields
The first fields of interest are the magic numbers magic and magic2. These
32-bit numbers are used to check the consistency of the superblock. Should
the magic number not match the predefined magic constants, then the file
system will fail a consistency check. Although this is a very rudimentary way
to check file system consistency, it does provide a simple and quick method of
determining if the superblock has become corrupt. In a dynamically changing
environment, such as a file system, this is of utmost importance.
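The check described here amounts to comparing both fields, as read back from disk, against the constants defined in listing 6.1. A minimal sketch:

```c
#define HIDDEN_FS_SPBLK_MAGIC  0x48535042u  /* 'HSPB' */
#define HIDDEN_FS_SPBLK_MAGIC2 0x48454e44u  /* 'HEND' */

/* Returns 1 if both superblock magic numbers are intact, 0 otherwise.
   A corrupt (or absent) hidden superblock fails this consistency check. */
int superblock_consistent(unsigned int magic, unsigned int magic2)
{
    return magic == HIDDEN_FS_SPBLK_MAGIC
        && magic2 == HIDDEN_FS_SPBLK_MAGIC2;
}
```

Placing one magic number at each end of the structure means that a partial overwrite of the superblock is also likely to be detected, since both boundaries must survive intact.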
The next field is the flags field. This field stores file system flags which
will control the operation of the hidden file system. There are two flags
which are defined for the hidden file system; these are Preserving mode and
Sacrificial mode. These two modes will be discussed in detail in the following
chapter. These flags control how the steganographic file system will behave
should there be no unallocated host file system blocks available for dynamic
reallocation.
Superblock Inode Table Fields
The next fields store information about the inode table. The number of inodes available to the hidden file system is stored in inode_number. The logical hidden file system block where the inode table can be located is stored in inode_table_start. The size of the inode table is stored in inode_table_size; this value holds the number of hidden file system blocks allocated to the inode table. These fields allow the inode table to be located, which in turn allows a particular inode entry to be located.
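Locating a particular inode from these fields is a simple address calculation. The sketch below assumes a fixed on-disk inode size; inode_size is an illustrative parameter, not a field of the superblock:

```c
/* Logical hidden block containing inode i, given the table's starting
   block from the superblock (inode_table_start). */
unsigned int inode_block(unsigned int i, unsigned int inode_size,
                         unsigned int block_size,
                         unsigned int inode_table_start)
{
    return inode_table_start + (i * inode_size) / block_size;
}

/* Byte offset of inode i within that block. */
unsigned int inode_block_offset(unsigned int i, unsigned int inode_size,
                                unsigned int block_size)
{
    return (i * inode_size) % block_size;
}
```

Note that the result is a logical hidden block number; the Translation Map must still be consulted to find the block's current physical position in the host file system.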
Superblock Translation Map Fields
The Translation Map information is stored next in the superblock. The
fields tmap_start and tmap_size store the logical starting block of the
Translation Map and the number of hidden file system blocks allocated
to the Translation Map respectively. The Translation Map allows the
hidden file system to translate between logical hidden file system blocks and
physical host blocks.
Superblock Root Directory Field
The inode number of the root directory is stored in root_inode. This will
allow the hidden file system to locate the root directory which is used to
store any subsequent hidden files and directories. The root directory is
created during file system initialisation, which will be discussed in the following
section.
Additional Superblock Control Fields
The next field is block_size, which stores the byte size of each file system
block. This value will be the same for both the hidden and host file systems
in order to maintain maximum interoperability. The following field, iv,
6.2. FILE SYSTEMS STRUCTURES 109
stores a 32-bit initialisation vector that will be used to control an encryption
algorithm in order to access various parts of the file system.
One may notice that there are very few "metadata" fields in the superblock,
such as the name of the volume, which one would generally expect to find
there. This is done for two reasons: firstly, to keep the size of the superblock
as small as possible by only storing essential information; and secondly, to
make the superblock less conspicuous. This should help to slightly obscure
the superblock from detection.
The superblock has a size of 52 bytes. Recall that the superblock is stored
directly after the host superblock, in the same physical block; the small
size of the hidden file system's superblock makes this possible.
The next structure that will be discussed is the TMap Array. This struc-
ture is considered to be part of the superblock and it allows the Translation
Map, which is discussed in a later section, to be located in any physical
location on the device.
6.2.2 TMap Array
The TMap Array is a very simple structure that lists all the physical blocks
that are allocated to the Translation Map, which will be discussed in the
following section.
The TMap Array is stored directly after the hidden superblock, in the
same physical block, so that its position is always consistent. This is done
so that the Translation Map can always be located, even if it has been real-
located on disk. Conceptually the TMap Array can be seen as part of the
hidden superblock.
The physical positions of the blocks allocated to the Translation Map are
stored within the TMap Array. The Translation Map may occupy any of
the unallocated physical blocks, and must be locatable by both the hidden
file system and the host file system in order for dynamic reallocation to take
place. The physical blocks allocated to the Translation Map need not be
contiguous; the sole purpose of the TMap Array is to allow the Translation
Map to be located.
For every physical block allocated to the Translation Map, a single integer
value is required in the TMap Array. For instance, if the file system block
size is 1 KiB and the Translation Map requires 16 KiB, then the TMap Array
will require sixteen 4-byte integers (assuming a 32-bit architecture) and will
have a total size of 64 bytes. The size, however, depends on the overall size
of the hidden file system and the file system block size. The determination
of the exact size will be discussed in the following section.
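The sizing rule in the example above can be written out as a small calculation; the function names are illustrative, and a 4-byte integer per TMap Array entry is assumed, as in the text.

```c
#include <assert.h>

/* Number of blocks the Translation Map occupies, rounded up to whole blocks. */
static unsigned int tmap_blocks(unsigned int tmap_bytes, unsigned int block_size)
{
    return (tmap_bytes + block_size - 1) / block_size;
}

/* Bytes needed by the TMap Array: one 4-byte integer per Translation Map block. */
static unsigned int tmap_array_bytes(unsigned int tmap_bytes, unsigned int block_size)
{
    return tmap_blocks(tmap_bytes, block_size) * 4;
}
```

For the example above, tmap_blocks(16384, 1024) gives 16 blocks, and tmap_array_bytes gives 64 bytes.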
The definition of the TMap Array can be seen in listing 6.2, where the
byte size of the array is obtained from line 15 of listing 6.1.
Listing 6.2: Definition of the TMap Array
1 unsigned int tmap_array[hiddenfs_superblock.tmap_size];
As can be seen from the above listing, the TMap Array is simply an array
of integer values, one for each of the blocks that are allocated to the Trans-
lation Map. In the following section the Translation Map will be discussed,
this structure will allow hidden data to be stored in a logically contiguous
manner, yet be located at any physical location.
6.2.3 Translation Map
The Translation Map provides the ability for data to be dynamically reallo-
cated within the hidden file system. It allows hidden data to be stored in a
constant logical order, but located at any physical location within the host
file system.
The Translation Map is a structure which maps logical blocks to physical
locations. All hidden data is organised in terms of its logical position
within the hidden file system; when data needs to be stored or retrieved, the
Translation Map is used to perform the translation between the logical and
the physical position on disk.
The structure of the Translation Map is seen in listing 6.3. Conceptually the
Translation Map is made up of an array of hiddenfs_tmap_entry structures
(as seen on line 7), one entry for each block allocated to the hidden file system.
The Translation Map also provides the storage map for the hidden file
system; by using the allocated member (line 3 of listing 6.3) any unallocated
blocks can be located. This allows the Translation Map to play the dual role
of providing the location of a physical block for a particular logical location,
and marking allocated logical blocks.
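The dual role described above can be sketched as follows, assuming the entry layout of listing 6.3 with the allocated flag flattened to a single byte; the function names are illustrative.

```c
#include <assert.h>

/* Translation Map entry, following listing 6.3 (allocated flag as one byte). */
typedef struct {
    unsigned char allocated;  /* 0 = free, 1 = allocated                   */
    unsigned int  entry;      /* physical block backing this logical block */
} tmap_entry;

/* Role 1: translate a logical block to its physical block (-1 if not mapped). */
static long logical_to_physical(const tmap_entry *tmap, unsigned int num_blocks,
                                unsigned int logical)
{
    if (logical >= num_blocks || !tmap[logical].allocated)
        return -1;
    return (long)tmap[logical].entry;
}

/* Role 2: storage map -- find the first unallocated logical block (-1 if full). */
static long find_free_logical(const tmap_entry *tmap, unsigned int num_blocks)
{
    for (unsigned int i = 0; i < num_blocks; i++)
        if (!tmap[i].allocated)
            return (long)i;
    return -1;
}
```

Every block access in the hidden file system goes through a lookup of this kind, which is what makes the physical position of hidden data freely movable.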
Listing 6.3: Translation Map structures
1 typedef struct hiddenfs_tmap_entry
2 {
3 unsigned char allocated[1];
4 unsigned int entry;
5 } hiddenfs_tmap_entry;

7 hiddenfs_tmap_entry hiddenfs_tmap[superblock.num_blocks];
Each Translation Map entry is a 5-byte structure that contains two fields.
The first field, allocated, is used to mark whether a particular entry is
allocated within the hidden file system. The second field, entry, is used
to store the physical block that is mapped to a particular Translation Map
entry.
The Translation Map itself is simply an array of hiddenfs_tmap_entry
structures. There is one entry for every possible hidden file system block.
The size of this structure will depend on the number of blocks allocated to
the hidden file system; the size of the hidden file system therefore needs to
be constrained to ensure that the Translation Map does not grow too large.
The 5-byte size of the Translation Map entries means that the structure as
a whole will not be "block-aligned": a Translation Map entry may straddle
a block boundary. This was done to obscure the Translation Map slightly.
Generally, structures are designed to fall neatly within a file system block
so that they are simple to locate. By allowing the Translation Map not to
be block aligned, its overall structure is obscured.
The TMap Array and Translation Map could be implemented as a B-Tree,
or an equivalent data structure. This would have the benefit of decreasing the
time required to search for an entry in the Translation Map. Instead, the
Translation Map is implemented as a linear array of Translation Map Entries
in order to minimise the storage requirement on the physical device.
In the following section the structures relating to the Inode Table will be
discussed. These structures are used to reference blocks that are allocated
to files and directories, and to manage any related metadata.
6.2.4 Inode Table
The Inode table is used to store the hidden file and directory metadata. The
Inode table is a collection of a number of inode entries, each one can be used
to store information about a file or a directory.
Each inode is constructed from a number of smaller structures, which can
be seen in listing 6.4 (on page 113), and each element of these structures is
described below. An inode entry always references the logical position of
data within the hidden file system; this means the inode structure need not
be changed when the hidden data is reallocated.
The blocks that are allocated to an inode entry are stored in extents (see
listing 6.4, line 1). An extent stores allocated blocks using a starting position
and a length; a single extent can reference a large number of consecutive
blocks. The inode structure also makes provision for indirect and
double-indirect blocks (see listing 6.4, lines 10 and 11), which further increases
the allowable size of a file.
As with a normal UNIX file system, the indirect blocks will reference a
logical block within the hidden file system, which will contain a number of
extents. The double-indirect block will reference another logical disk block
which will contain a list of references to indirect blocks, which will in turn
reference a number of extents. Indirect and double-indirect blocks are only
likely to be used when the file system is heavily fragmented.
The size of an inode entry is an important factor in determining the
overall size of the inode table and the performance of finding and accessing
an inode within it. The size of an inode entry will be a power of 2; this
ensures that inode entries fit cleanly into the host file system blocks.
For example, if an inode entry has a size of 128 bytes, and the file system has
a block size of 1024 bytes, then 8 inode entries will fit cleanly into a single file
system block, with no overlap into the next block. This allows a particular
inode entry to be located quickly.
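With power-of-2 sizing, locating an inode reduces to a pair of integer divisions. The helper below is a hypothetical sketch, not taken from the SSFS sources, using the example sizes above.

```c
#include <assert.h>

/* Locate an inode entry inside the inode table: which table block holds it,
   and at what byte offset within that block. Assumes entry_size divides
   block_size, which the power-of-2 sizing guarantees. */
static void locate_inode(unsigned int inode_num, unsigned int entry_size,
                         unsigned int block_size,
                         unsigned int *block, unsigned int *offset)
{
    unsigned int per_block = block_size / entry_size;   /* e.g. 1024/128 = 8 */
    *block  = inode_num / per_block;
    *offset = (inode_num % per_block) * entry_size;
}
```

For instance, with 128-byte entries and 1024-byte blocks, inode 10 lives in table block 1 at byte offset 256.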
In order to mark whether a particular inode is allocated in the inode table,
the Most Significant Bit (MSB) of the inode_number (see listing 6.4, line 21)
is set to 0 if the inode entry is unallocated or 1 if the inode entry is allocated.
This substitutes for a separate storage map marking allocated and
unallocated inode entries.
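The MSB marking can be sketched with three small helpers (illustrative names, assuming a 32-bit inode_number):

```c
#include <assert.h>

#define INODE_ALLOC_BIT 0x80000000u  /* MSB of the 32-bit inode_number */

static unsigned int inode_mark_allocated(unsigned int inode_number)
{
    return inode_number | INODE_ALLOC_BIT;    /* set MSB: entry in use */
}

static unsigned int inode_mark_free(unsigned int inode_number)
{
    return inode_number & ~INODE_ALLOC_BIT;   /* clear MSB: entry free */
}

static int inode_is_allocated(unsigned int inode_number)
{
    return (inode_number & INODE_ALLOC_BIT) != 0;
}
```

The remaining 31 bits still carry the inode number itself, so no extra storage is needed for the allocation flag.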
Listing 6.4: Inode Table entry
1 typedef struct hiddenfs_extent
2 {
3 unsigned int start;
4 unsigned int length;
5 } hiddenfs_extent;

7 typedef struct hiddenfs_inode_data
8 {
9 hiddenfs_extent direct[DIRECT_BLOCKS];
10 unsigned int indirect;
11 unsigned int double_indirect;
12 unsigned int size;
13 unsigned int padding;
14 } hiddenfs_inode_data;

16 typedef struct hiddenfs_inode
17 {
18 unsigned int magic;
19 unsigned int mode;
20 unsigned int key;
21 unsigned int inode_number;

23 hiddenfs_inode_data data;
24 } hiddenfs_inode;
Inode Extent Structure Fields
The first hiddenfs_extent structure, as seen on line 1 of listing 6.4, is used
to store block information which will be used in an inode entry. Recall that
an extent will store a "list" of blocks in terms of consecutive logical blocks
from where they start, and then the number of blocks that follow the starting
block to form the extent. The start field is used to specify the start of the
extent, and the length field is used to store the length of the extent.
Extents allow the file system to reference a large number of blocks using
a relatively small amount of space. For example, 1000 file system blocks can
be referenced by a single extent structure (if the logical blocks are contiguous).
However, if there is significant file system fragmentation, multiple extents
may be needed to represent the same number of allocated blocks. A
single extent structure has a size of 8 bytes.
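Counting the blocks covered by a set of extents illustrates this compactness; the struct follows listing 6.4, while the counting function is an illustrative addition.

```c
#include <assert.h>

/* Extent structure as in listing 6.4. */
typedef struct {
    unsigned int start;   /* first logical block of the extent */
    unsigned int length;  /* number of consecutive blocks      */
} hiddenfs_extent;

/* Total number of blocks referenced by an array of extents. */
static unsigned int extent_blocks(const hiddenfs_extent *ext, unsigned int n)
{
    unsigned int total = 0;
    for (unsigned int i = 0; i < n; i++)
        total += ext[i].length;
    return total;
}
```

A single 8-byte extent {0, 1000} covers 1000 blocks; a fragmented file needs proportionally more extents for the same coverage.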
Inode Data Structure Fields
The next structure to be discussed is the hiddenf s_inode_data structure,
as seen on line 7. This structure is used to reference the blocks allocated
to hidden data. The direct field is used to directly reference extents from
within the inode entry itself. This allows the file system to quickly find the
blocks that are allocated to the inode.
The next field is the indirect field; this field stores the logical location
of an indirect file system block. This indirect block is used to store a number
of extents that are associated to this inode. The indirect block is only used
if the inode structure has run out of directly stored extents.
The next field is the double_indirect field; this will store a logical location
of a double-indirect block that will be used to store extents. This field
is only used if the inode has run out of storage in both the direct and indirect
blocks. Assuming a file system block size of 1024 bytes, an indirect
block can reference 128 extents. A double-indirect block requires a 4-byte
integer (assuming a 32-bit architecture) to store a reference to a single
indirect block, so a double-indirect block can reference 256 indirect
blocks. Each indirect block will in turn reference 128 extents. This allows
a double-indirect block to reference 32768 extents. Technically this amount
of addressable storage is not required, as the hidden file system will typically
never be this large; however, it does allow for expandability.
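The capacities quoted above follow directly from the sizes involved. The helpers below are illustrative, assuming 8-byte extents and 4-byte block references as in the text.

```c
#include <assert.h>

/* Extents held by one indirect block: one 8-byte extent per slot. */
static unsigned int extents_per_indirect(unsigned int block_size)
{
    return block_size / 8;
}

/* Extents reachable via one double-indirect block: each 4-byte reference
   names an indirect block, which in turn holds extents. */
static unsigned int extents_per_double_indirect(unsigned int block_size)
{
    return (block_size / 4) * extents_per_indirect(block_size);
}
```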
The size of the data referenced by the inode is stored in size. This is
simply the byte count of the data that is referenced by the extents stored
in the inode. The padding field is used to bring the whole inode structure
to the required byte size; it can be filled with random values to obscure the
structure of the inode.
Inode Structure Fields
The hiddenfs_inode structure stores metadata about the file or directory
which a particular inode references. The first field, magic, is used to mark
the start of the inode structure, and to provide a method of validating the
consistency of an inode entry, using a magic number. The next field, mode, is
used to mark whether a particular inode is reserved for a file or for a directory
by storing a corresponding flag value.
The key field is used to store the encryption key that will be used to
encrypt and decrypt the data referenced by this inode.
Each inode has a number used to reference it within the inode table; this
is stored in the inode_number field. As discussed above, the MSB of this field
is used to mark whether a particular inode entry is allocated or not.
Finally the data field is used to reference a hiddenfs_inode_data struc-
ture (discussed above) which is used to store the extent information associ-
ated with a particular inode.
Inodes form the control structure for files and directories, allowing the
hidden file system to retrieve and maintain a directory hierarchy. The
particular file and directory structures for the hidden file system will be discussed
in the following section.
Listing 6.5: Directory Entry structure
1 typedef struct hiddenfs_directory
2 {
3 unsigned int inode_number;
4 unsigned char name_length;
5 char name[];
6 } hiddenfs_directory;
6.2.5 Files and Directories
Files and directories are both regarded as streams of arbitrary bytes which
are stored in the hidden file system. Directories have a regular structure, as
seen in listing 6.5. The sub-directories and files in a directory are stored as
a list of hiddenfs_directory structures, each of which is stored sequentially
in a hidden file system block.
To navigate the directory structure, two special directory entries must be
provided for each directory: the root and the parent. The root directory
entry is designated by "."; this entry simply points to the inode number of
this directory itself. The parent directory entry is designated by ".."; this
entry points to the inode number of the parent directory. In this way the
hierarchical directory structure is constructed and can be navigated.
The first field in the hiddenfs_directory structure is the inode_number
field. This field stores the inode number of the item that this directory entry
references. The name_length field stores the length of the item's name.
Finally, the name field holds the name of this particular entry, stored as a
linear array of characters.
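A name lookup over such a block can be sketched as follows. This is an assumption-laden illustration: it takes the on-disk layout to be the fields of listing 6.5 packed back to back (4-byte inode number, 1-byte length, then the name bytes), with a zero inode number terminating the list, and it assumes 4-byte unsigned ints.

```c
#include <assert.h>
#include <string.h>

/* Scan a block of packed directory entries for a name; returns the inode
   number, or 0 if the name is not found. */
static unsigned int dir_lookup(const unsigned char *block, unsigned int block_len,
                               const char *name)
{
    unsigned int pos = 0, want = (unsigned int)strlen(name);
    while (pos + 5 <= block_len) {
        unsigned int inode;
        memcpy(&inode, block + pos, 4);            /* inode_number */
        unsigned char len = block[pos + 4];        /* name_length  */
        if (inode == 0 || pos + 5 + len > block_len)
            break;                                 /* end of entries */
        if (len == want && memcmp(block + pos + 5, name, len) == 0)
            return inode;
        pos += 5 + len;                            /* next entry */
    }
    return 0;
}
```

Because entries are variable-length, each step advances by 5 bytes plus the entry's name length; no per-entry padding is assumed.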
In order for the hidden file system to operate, these structures need to
be initialised. There are a number of considerations that need to be made
during initialisation. Initialisation of each of the structures will be discussed
in the following section.
6.3 File System Initialisation
A crucial part of the operation of the hidden file system is correct initialisa-
tion within the host file system. Initialisation will involve constructing all of
the above mentioned structures and writing them out to the block device.
During initialisation all of the parameters of the hidden file system, such as
its overall size, are calculated. This phase also determines the overall
operational constants of the hidden file system. Initialisation takes place
in two parts, namely host file system initialisation and hidden file system
initialisation; the host file system initialisation takes place first, followed by
the hidden file system initialisation. Both are discussed below.
6.3.1 Host File System Initialisation
Ideally the host file system is derived from an existing file system
implementation, and is backward compatible with the existing file system drivers. This
allows data stored on the host file system to be accessed using a standard file
system driver.
6.3. FILE SYSTEM INITIALISATION 117
This adds a level of security: the block device will appear to contain only
the host file system, and the hidden data will appear to be remnants of
normal day-to-day activity, so it is unlikely that a normal user would suspect
that hidden data exists.
Initialisation of the host file system can be taken directly from the original
host file system creation utility; the resulting host file system structures must
be laid out on the disk as they would be if it were a standalone file system.
The way in which the host file system structures are placed on the block
device must be well understood, so that the position of host data can be
used to embed the hidden file system structures.
There are, however, a number of factors that will influence the hidden file
system initialisation:

- The size of the host's superblock, as this has a direct effect on the
  overall structure of the hidden file system.
- The host file system block size, as the hidden file system uses the same
  block size.
- The number of blocks that exist in the host file system.
- The structure that marks unallocated and allocated blocks.
For the purpose of this discussion, a simple host file system with the
structures listed below is assumed, with the structures stored on the block
device as seen in figure 5.1 on page 97:

- A superblock
- A storage bitmap
- An Inode bitmap
- An Inode Table
In the following section we will discuss the initialisation of the hidden
file system. In order to construct the hidden file system's structures on the
physical device, the positioning of the host file system structures must be
understood, so that the exact locations for the hidden file system structures
can be determined.
6.3.2 Hidden File System Initialisation
Initialisation of the hidden file system relies heavily on a clear understanding
of how the host file system stores its structures on the physical device. This
allows the hidden file system structures to be embedded within the host
file system. Initialisation of the hidden file system is partitioned into four
separate stages:

- Superblock Initialisation
- Translation Map Initialisation
- Inode Initialisation
- Root Directory Initialisation
Each of the above-mentioned stages is crucial to setting up the overall
operating environment for effective storage of hidden data; they are discussed
in the following sections.
Superblock Initialisation
The superblock is the only structure that must be stored in a consistent
location. Initialisation of the superblock will determine a number of param-
eters for the hidden file system, such as the number of allocated blocks, the
location of the Translation Map, and the location of the inode table.
There are a number of considerations to be made when initialising
the superblock, such as the maximum number of blocks that
can be allocated to the hidden file system. This value is limited to a
percentage of the total number of blocks in the host file system, for instance
5%. Limiting the hidden file system's size is done so that the existence of the
hidden data can be obscured. If there were no limit and the hidden file system
could grow to fill all the unallocated blocks within the host file system, a
conflict would develop between the hidden data and the non-hidden data;
this will be discussed in the following chapter.
The superblock will be used to store the location and size of the Trans-
lation Map, the location and size of the Inode Table, and the inode number
of the Root Directory. However, these values can only be set once the corre-
sponding structures have been initialised.
Initialising the superblock is a simple operation. The following steps must
be taken:
1. Allocate memory for the superblock structure.
2. Set the magic numbers in the superblock.
3. Set the block size for the file system in the superblock.
4. Set the flags in the superblock.
5. Calculate the number of blocks that will be available in the file system.
Algorithm 3 shows the basic steps needed to initialise the superblock.
Only the first few entries need to be initialised, as the rest of the superblock
structure will be constructed during the initialisation of the remaining struc-
tures. The superblock will be written to the disk once the rest of the struc-
tures have been initialised. The values for the magic numbers can be seen in
listing 6.1 on page 107.
Hiddenfs.Superblock.magic ← HIDDEN_FS_SPBLK_MAGIC;
Hiddenfs.Superblock.magic2 ← HIDDEN_FS_SPBLK_MAGIC2;
Hiddenfs.Superblock.flags ← set the flags to control access to the data;
Algorithm 3: Hidden file system superblock initialisation
The constant values seen below are set during the hidden superblock
initialisation. These constants are used throughout the initialisation of the
other structures in order to determine overall sizes and physical positions.
As seen below, hiddenFS and hostFS represent the hidden and host file systems
respectively. The size of the hidden file system is determined by LIMIT. The
number of file system blocks in a particular file system is represented by
numblocks, and the file system block size is represented by blocksize.
hiddenFS_numblocks = hostFS_numblocks × LIMIT
hiddenFS_blocksize = hostFS_blocksize
hiddenFS_size = hiddenFS_numblocks × hiddenFS_blocksize        (6.1)

Translation Map and TMap Array Initialisation
The Translation Map provides the ability for the hidden file system to dy-
namically reallocate data, by providing the translation between the logical
and physical view of the file system. Each block that can be allocated in the
hidden file system requires an entry in the Translation Map.
The Translation Map is implemented as an array of Translation Map
entries. The structure of the Translation Map entry can be seen in listing 6.3.
Each entry has a size of 5-bytes; this implies that considerations need to be
made to store the Translation Map on the block device.
The Translation Map is also used to manage the free space within the
hidden file system, by utilising the allocated byte (seen on line 3 of listing 6.3).
By marking this byte as either 0 or 1, the corresponding logical block is
considered to be unallocated or allocated respectively.
Although the Translation Map provides the mechanism for dynamic
reallocation of the hidden data, it needs to be dynamically reallocatable itself.
This is achieved through the use of the TMap Array. The TMap Array
provides a static reference to every physical block that is allocated to the
Translation Map. It is simply an array of integer values, where each integer
gives the physical location of one block of the Translation Map. The TMap
Array is stored directly after the superblock structure, in the first block, and
is therefore not reallocatable. It is considered to be part of the complete
superblock structure.
The byte sizes of the Translation Map and the TMap Array can be seen in
equation 6.2, where translationMapEntry, translationMap and TMap represent
a Translation Map Entry, the Translation Map, and the TMap Array respectively.
translationMapEntry_size = 5 bytes
translationMap_size = hiddenFS_numblocks × translationMapEntry_size
TMap_size = ⌈translationMap_size / hiddenFS_blocksize⌉ × sizeof(integer)        (6.2)
This does present a limitation on the hidden file system, in that the file
system block size has a direct impact on the overall size of the hidden file
system. This situation arises because the first file system block has to store a
number of different structures: the host superblock, the hidden superblock,
and the TMap Array.
The effect that this has on the hidden file system can be demonstrated
by performing the calculation seen below. First, assume that the file
system block size is 1024 bytes, the host superblock has a size of 124 bytes, and
the hidden superblock has a size of 56 bytes. The size of the TMap Array is
then limited to the remaining bytes in the first block.
This is calculated by subtracting the size of the host superblock and the
hidden superblock from the size of the file system block. The remaining value
is then divided by the size of a TMap Array entry (a single 4-byte integer
per entry). This gives the maximum number of blocks that can be allocated
to the Translation Map, by extension the number of Translation Map Entries,
and thus the total allowable size of the hidden file system. The calculation
can be seen in table 6.1 and is represented in equation 6.3.
SuperblockTotal_size = hiddenFS_superblock + hostFS_superblock
MaxTMap_size = ⌊(hiddenFS_blocksize − SuperblockTotal_size) / sizeof(integer)⌋
TMap_size ≤ MaxTMap_size        (6.3)

File system block size in bytes                            1024
Subtract host superblock size in bytes                     −124
Subtract hidden superblock size in bytes                    −56
Number of bytes remaining in 1st block                      844
Maximum blocks allowed for Translation Map       ⌊844 ÷ 4⌋ = 211
Total size for the Translation Map           211 × 1024 = 216064
Total number of Translation Map Entries     ⌊216064 ÷ 5⌋ = 43212
Total file system size                    43212 × 1024 ≈ 42 MiB

Table 6.1: Calculation of the size of the Translation Map
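The calculation in table 6.1 can be checked mechanically; the superblock sizes are inputs, and the function names are illustrative.

```c
#include <assert.h>

/* Maximum blocks the Translation Map may occupy: the bytes left in the first
   block after both superblocks, divided by the 4-byte TMap Array entry size. */
static unsigned int max_tmap_blocks(unsigned int block_size,
                                    unsigned int host_sb, unsigned int hidden_sb)
{
    return (block_size - host_sb - hidden_sb) / 4;
}

/* Maximum Translation Map entries: total Translation Map bytes divided by the
   5-byte entry size; this bounds the hidden file system size in blocks. */
static unsigned int max_tmap_entries(unsigned int block_size,
                                     unsigned int host_sb, unsigned int hidden_sb)
{
    return (max_tmap_blocks(block_size, host_sb, hidden_sb) * block_size) / 5;
}
```

With the sizes from the table, max_tmap_blocks(1024, 124, 56) yields 211 and max_tmap_entries yields 43212.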
In order to initialise the Translation Map, the following steps have to be
taken; these steps can be seen in algorithm 4:
1. Allocate memory for the Translation Map structure.
2. Calculate the size (S) of the Translation Map.
(a) Calculate the number of blocks (N) required to hold the
Translation Map.
TranslationMap ← allocate memory for TranslationMap;
/* Calculate the size of the Translation Map */
TranslationMap.size ← Hiddenfs.Superblock.NumberBlocks × 5;
/* Calculate the number of blocks for the Translation Map */
NumberBlocks ← ⌊TranslationMap.size / Hiddenfs.Superblock.BlockSize⌋;
MaximumSize ← ⌊(BlockSize − Host.Superblock − Hiddenfs.Superblock) / 4⌋;
/* Check to see if the number of blocks is allowable */
if NumberBlocks > MaximumSize then
    NumberBlocks ← MaximumSize;
    TranslationMap.size ← NumberBlocks × 5;
end
/* Allocate an array of integers for TMapArray */
TMapArray ← allocate NumberBlocks integers for TMapArray;
/* Mark the first entry as allocated - for the Superblock */
TranslationMap[0] ← allocated;
for i = 1 to NumberBlocks do
    /* Mark each block allocated to the Translation Map. When
       the entry is allocated, a physical block must be found in
       the host file system to become the storage location for
       this particular entry */
    TranslationMap[i] ← mark as allocated and find a physical block;
    TMapArray[i] ← record physical location of block;
end
Hiddenfs.Superblock ← record location and size of the Translation Map;
Algorithm 4: Translation Map initialisation
(b) Check that the number of blocks does not exceed the maximum
allowed.
3. Allocate an array of integers for the TMap Array.
4. Allocate memory for each Translation Map entry in the Translation Map.
5. Mark the first Translation Map entry as allocated for the Superblock
(Logical block 0).
6. Mark the next N blocks as allocated for the Translation Map.
(a) For each Translation Map entry to be allocated find an unallocated
physical block in the host file system to store this block.
7. Record the physical location of the N Translation Map blocks in the
TMap Array.
Inode Initialisation
Once the Superblock, Translation Map, and the TMap Array have been
initialised, the inode table for the hidden file system must be initialised. The
inodes in the hidden file system are allocated statically; this means that space
is reserved during initialisation for all of the inodes that could exist in the
file system.
When choosing the number of inodes to store in the inode table, a trade-off
must be considered between storage requirement and number of inodes. The
more inodes that exist in the inode table, the greater the number of files and
directories that can exist within the file system; but the more inode entries
there are, the larger the inode table will be, and hence the greater the storage
requirement.
The size of the inode table can be calculated by dividing the number
of blocks in the file system by the ratio of file system blocks to inodes,
and then multiplying the result by the number of bytes per inode entry.
For example, if the file system contains 3276 blocks, and there is 1 inode for
every 4 disk blocks, there will be 819 inode entries in the inode table. If
each inode entry is 128 bytes, then the total size of the inode table will be
104832 bytes. This calculation is demonstrated in table 6.2 and presented in
equation 6.4.
InodeEntry_size = 128 bytes
InodeTable_entries = ⌊hiddenFS_numblocks ÷ 4⌋
InodeTable_size = InodeTable_entries × InodeEntry_size        (6.4)

Number of hidden file system blocks                3276 blocks
Size of an inode entry                               128 bytes
Number of inode entries in inode table        ⌊3276 ÷ 4⌋ = 819
Size of inode table                  819 × 128 = 104832 bytes

Table 6.2: Calculation of the size of the Inode Table
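The sizing in table 6.2 reduces to two one-line functions. These are illustrative; the 1-in-4 ratio and 128-byte entry size are the values chosen in the text.

```c
#include <assert.h>

/* One inode entry for every four hidden file system blocks. */
static unsigned int inode_table_entries(unsigned int num_blocks)
{
    return num_blocks / 4;
}

/* Inode table size in bytes, at 128 bytes per inode entry. */
static unsigned int inode_table_bytes(unsigned int num_blocks)
{
    return inode_table_entries(num_blocks) * 128;
}
```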
To strike a balance, the number of inodes is chosen to be one inode for
every four disk blocks; this is the same ratio as is used in the Ext2 file
system.
The inode table has to be dynamically reallocatable; this is achieved by
allocating blocks for the inode table via the Translation Map. The inode
table will therefore appear to be contiguous within the "logical" file system,
but it can be located in any "physical" location within the host file system.
Once the number of inodes, the size, and the location of the inode table
have been calculated, they are stored in the superblock. The inode table will
then be locatable by the file system.
The following steps, outlined in algorithm 5, must be taken in order to
initialise the inode table:
1. Calculate the number of inodes for this file system, using 1 inode entry
for every 4 file system blocks.
2. Calculate the number of file system blocks that the inode table will
occupy.
3. Allocate memory for each Inode Entry in the Inode Table.
4. Allocate file system blocks in the Translation Map for each block the
inode table will occupy.
5. Record the start, the size, and the number of inode entries of the inode
table in the superblock.
6.3. FILE SYSTEM INITIALISATION
/* Calculate the number of inodes */
NumberInodes ← HiddenFS.Superblock.NumberBlocks / 4;
/* Calculate the number of blocks for the Inode Table */
size ← NumberInodes * 128;
NumberBlocks ← ⌈size / BlockSize⌉;
/* Allocate memory for the inode table */
InodeTable ← allocate memory for Inode Table;
Allocate NumberBlocks logical blocks for the Inode Table from the
Translation Map;
Mark Inode Table attributes in the Hidden Superblock;

Algorithm 5: Inode Table initialisation
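Algorithm 5 can be sketched in C as follows. The structure and field names below are illustrative assumptions for this sketch, not the actual SSFS implementation; allocating the table's logical blocks from the Translation Map is elided.

```c
#include <stdlib.h>

#define INODE_ENTRY_SIZE 128   /* bytes per inode entry (listing 6.4) */

/* Illustrative subset of the hidden superblock fields. */
struct hidden_superblock {
    unsigned number_blocks;       /* total hidden file system blocks    */
    unsigned block_size;          /* file system block size, in bytes   */
    unsigned inode_count;         /* entries in the inode table         */
    unsigned inode_table_blocks;  /* blocks occupied by the inode table */
};

/* Initialise the inode table: one inode for every four hidden blocks.
   Marking the table's attributes in the superblock is done here;
   the Translation Map allocation step is omitted from the sketch. */
unsigned char *init_inode_table(struct hidden_superblock *sb)
{
    sb->inode_count = sb->number_blocks / 4;
    unsigned size = sb->inode_count * INODE_ENTRY_SIZE;
    /* ceiling division: number of blocks the table occupies */
    sb->inode_table_blocks = (size + sb->block_size - 1) / sb->block_size;
    return calloc(sb->inode_count, INODE_ENTRY_SIZE);
}
```

With the 3276-block file system of table 6.2 and an assumed 4096-byte block size, this yields 819 inode entries occupying 104832 bytes.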
Root Directory Initialisation
Once the inode table has been initialised, the root directory can be created.
The root directory is required so that the hidden file system has an initial
directory to work with.
The root directory requires that a directory entry (see listing 6.5, on
page 115) and an inode (see listing 6.4, on page 113) be allocated. There is
no structural difference between the directory entry for the root and any other
directory entry in the hidden file system; there is only a semantic difference,
in that the root and parent directory entries (" . " and " .. ") both point to
the inode that is allocated to the root directory. This is done because the
root directory does not have a parent; it ensures that a user can never
navigate to the "parent" of the root directory (which does not exist).
The inode that is allocated to the root is a normal inode entry. Once
the inode has been allocated, the number of the particular inode is recorded
in the superblock so that the root can be located. The root directory must
be dynamically reallocatable; this is achieved by allocating blocks for the
root directory via the Translation Map.
Initialising the root directory is quite simple; the following steps must be
taken:
1. Allocate an inode from the inode table for the root directory; this should
be inode 1.
2. Create a directory entry for the root directory (.) that references inode
number 1.
3. Create a directory entry for the parent directory ( .. ) that references
inode number 1.
4. Allocate a logical block from the Translation Map for the root directory.
5. Mark the root directory inode number in the superblock.
6. Write the root directory to the physical disk using the Translation Map
to obtain the physical location.
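The first three steps above can be sketched in C. The directory entry layout used here (a fixed-size name buffer and these field names) is an illustrative assumption; listing 6.5 defines the actual variable-length structure.

```c
#include <string.h>

#define ROOT_INODE 1  /* the root directory's inode number */

/* Simplified directory entry; listing 6.5 uses a variable-length name. */
struct dir_entry {
    unsigned inode_number;
    unsigned char name_len;
    char name[16];
};

static void make_entry(struct dir_entry *e, unsigned ino, const char *name)
{
    e->inode_number = ino;
    e->name_len = (unsigned char)strlen(name);
    memcpy(e->name, name, e->name_len + 1);
}

/* The root is its own parent: "." and ".." reference the same inode,
   so a user can never navigate above the root. */
void init_root_directory(struct dir_entry entries[2])
{
    make_entry(&entries[0], ROOT_INODE, ".");
    make_entry(&entries[1], ROOT_INODE, "..");
}
```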
Once the Superblock, Translation Map, Inode Table, and root directory
have been created, all of the structures are written to the disk. The Trans-
lation Map will provide the physical locations where allocated logical hidden
file system blocks should be written.
6.4 Summary
In this chapter we discussed the following concepts:

File System Structures - where we introduce the control structures
which will be used in SSFS.

- Superblock - in this section we describe the superblock, which
contains the file system metadata.
- TMap Array - where we describe the structure of the TMap
Array, which allows the Translation Map to be located on the
physical device.
- Translation Map - in this section we discuss the Translation Map,
which will house the logical to physical mappings for SSFS.
- Inode Table - in which we discuss the Inode Table, used to store
inodes.
- Files and Directories - where we discuss the inodes and directory
entries, used to store metadata about files and directories
respectively.

File System Initialisation - in this section we discuss the initialisation
of the host and hidden file system structures.

- Host File System Initialisation - in this section we discuss a
simple host file system implementation.
- Hidden File System Initialisation - where we discuss the initialisation
of each of the hidden file system structures, with emphasis
on the physical limitations imposed by the host file system.
6.5 Conclusion
The effective implementation of a steganographic file system relies on the
effective design of the structures which will support the storage and retrieval
of hidden data. Designing these structures in such a way as to enable
data to be reallocated allows problems such as data collisions to be avoided.
Steganographic file systems must be designed to provide convenient and
transparent security for hidden data, as well as an adequate level of plausible
deniability; this is facilitated by the file system's metadata structures.
In this chapter we discussed the framework for a steganographic file sys-
tem implementation by discussing the structures that will be used to man-
age and store hidden data. These structures will allow for convenient and
transparent integration with a host file system, giving the hidden file system
maximum flexibility.
In section 6.2 we discussed the construction of the structures that will
be used to store and manage hidden data within the hidden file system. In
section 6.3.1 we discussed the initialisation of these structures in order to
allow the hidden file system to enter an operational state.
In the following chapter we will discuss the file system operations. These
operations will operate upon the file system structures in order to allow data
to be stored and retrieved.
Chapter 7
File System Operations for
SSFS
7.1 Introduction
In order for the steganographic file system to operate, a number of data
operations must be defined to allow a user to interact with the file system
data structures. These operations are used to store and retrieve various forms
of data from within the file system. There are two different generic types of
operations, those which operate on metadata, and those which operate on
data. When a user requests access to data, there are a number of interactions
between the file system's components.
In this chapter we will discuss the file system operations. We introduce
a number of the file system's operational layers, which are used to group
file system operations into operational categories. Each layer will interact
with the other layers in order to achieve the storage and retrieval of data.
Firstly we will introduce a layered approach for the file system operations
in section 7.2. We will go on to discuss the low-level operations in section
7.3; this is where the lowest level of functionality is defined, and is primarily
concerned with input and output to the physical disk. We will then discuss
the intermediate-level operations in section 7.4 and the role which they play
in maintaining the file system's metadata. Finally we will discuss the high-
level operations in section 7.5 as a mechanism for the storage and retrieval
of data.
[Figure: the disk is accessed by the low-level operations, which serve the
intermediate-level operations, which in turn serve the high-level operations]

Figure 7.1: File system operation layers
7.2 Layered File System Operations
The steganographic file system uses operations which are designed in layers of
functionality. This is done to provide interoperability between the "logical"
and "physical" layouts of the file system, and in order to provide modularity.
There are three main operational layers, namely the low-level, the
intermediate-level, and the high-level operations. The interaction between
these layers can be seen in figure 7.1.
The low-level operations are concerned with operating on the "physical"
location of the data on the disk. These operations are usually provided by the
operating system kernel API, and provide low-level functionality such
as moving the heads of the hard disk drive, spinning the platters, and reading
and writing to the physical device. The low-level operations are discussed in
the following section.
The intermediate-level operations are concerned with the logical place-
ment of data, and the translation between the logical and physical locations.
This layer is implemented within the file system implementation and inter-
acts with both the low and high level operations. This layer also includes
the security mechanisms in the form of cryptographic routines, which will be
covered in the following chapter.
The high-level operations allow for user interactions, and are concerned
only with the logical position of the hidden data within the steganographic file
system. This layer contains all the operations relating to the manipulation
of files and directories. This layer also provides an interface for a human user
to interact with the file system, usually via a command shell.
The interaction between the operational layers can be seen graphically in
figure 7.1.
7.3 Low-Level Operations
All operations that are performed on the file system are defined in terms of
a number of low-level operations on the physical storage medium. The most
basic low-level operations are that of reading and writing single blocks to a
particular physical location. Every other file system operation relies on the
ability to read and write to the physical disk.
The read operation is very simple, and takes the form of a function that
will read a specified number of bytes from a location on the physical medium
into primary memory. The write operation works in the same way; it simply
writes a specified number of bytes from primary memory to a location on
the physical medium. These two operations are usually provided by the
operating system kernel API.
The design of the steganographic file system does not require any modification
to these functions, and they can be used as provided by the kernel API. In
order to fully discuss the other, higher-level file system operations, an overview
of the read and write operations as provided by a kernel API is given
below.
7.3.1 Read and Write Operations Overview
The most basic read and write operation that is required is that which reads
a number of bytes from a physical position on a physical device, such as a
hard disk drive. A file system will most often read and write blocks that
are of a consistent size, which would normally be the file system block size.
The operation of the read and write functions can be seen graphically in
figure 7.2.
The file system implementation will define a number of situations in which
to utilise the kernel read and write operations. These operations are defined
in order to simplify the access of data from the physical medium.
[Figure: the state of a run of disk blocks before and after data is read
and written]

Figure 7.2: Simple read and write operation overview
1. Read or write a number of bytes less than the file system block size
- this operation will be used when the file system reads or writes to
a file system structure that has a number of defined elements, such as
the inode table. This will allow the file system to write directly to the
physical location on the disk, which will increase performance.
2. Read or write an entire file system block - this will be the most com-
mon I/0 operation. When a file or a directory is accessed on the
physical device, the file system will access each file system block that
is allocated to the file or directory individually, which will either be
stored to or retrieved from primary memory.
3. Read or write a stream of file system blocks - this operation will
read or write a stream of bytes from the physical device. This will
allow access to files and directories which are stored across a number of
different file system blocks. This operation will usually take the form of
a function which will read multiple file system blocks from the physical
device.
In the following section we will discuss the intermediate-level operations
as a mechanism for accessing file system metadata.
[Figure: the Translation Map mapping blocks of the logical disk to their
physical locations]

Figure 7.3: Logical to physical translation operation
7.4 Intermediate-Level Operations
The intermediate-level operations are concerned with the translation be-
tween the logical and physical layouts of the steganographic file system, and
the modification of the metadata structures. These operations accept instruc-
tions from the high-level operations and invoke the required low-level oper-
ations after a translation or encryption has taken place. The intermediate-
level operations also provide the security mechanisms so that data encryption
can be transparent.
7.4.1 Logical-Physical Translation Operation
The process of performing the translation between the logical file system
address and the physical locations allows data to be stored and retrieved
from the physical medium. The translation is achieved by interacting with
the Translation Map (see section 6.2.3, on page 110).
As described in the previous chapter, the Translation Map consists of a
list of logical addresses and associated physical locations. The translation is
achieved by simply returning the physical address for a particular logical
location. When a high-level operation requests that a logical block is allocated
to a file system structure, a free physical block is located through interaction
with the host file system's storage map, and the mapping between the logical
and physical locations is created in the Translation Map.
This gives the ability for the physical location of the data to change
without affecting the logical position within the hidden component of the
steganographic file system. This forms the basis for dynamic reallocation
and will be discussed further in a later chapter. In the following section we
will discuss the translation-map operations and their use in free block
management. The logical to physical translation operation can be seen
graphically in figure 7.3.
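The translation itself is a direct table lookup. A sketch in C, assuming the five-byte entry layout of listing 6.3 (one allocation byte followed by four bytes of physical location); the function name and error convention are illustrative.

```c
#include <stdint.h>

/* One Translation Map entry: an allocation flag and the physical
   block backing the corresponding logical block (listing 6.3). */
struct tmap_entry {
    uint8_t  allocated;
    uint32_t physical_block;
};

/* Translate a logical block number to its physical location.
   Returns 0 on success, -1 if the logical block is unallocated. */
int tmap_translate(const struct tmap_entry *map, uint32_t logical,
                   uint32_t *physical)
{
    if (!map[logical].allocated)
        return -1;
    *physical = map[logical].physical_block;
    return 0;
}
```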
7.4.2 Translation-Map Operations
The Translation Map plays the dual role of providing the translation mech-
anism for the logical to physical translation, and marking logical file system
blocks as either allocated or unallocated. The allocation and deallocation of
blocks will be discussed below.
Block Allocation
Block allocation is a vital part of the steganographic file system implementa-
tion. Every block that is allocated in the Translation Map has an associated
Translation Map Entry, as seen in listing 6.3 on page 111. As discussed in
the previous chapter, the first byte is used to mark the entry as allocated
and the next 4-bytes are used to mark the physical location which is allo-
cated to the Translation Map Entry. Each Translation Map entry represents
a single logical block within the steganographic file system, where the first
Translation Map Entry represents the first logical file system block.
Allocation of a hidden file system block involves a number of steps, and
is demonstrated graphically in figure 7.4. We provide the steps as follows:
1. Find a free logical block in the Translation Map; this is a free block in
the hidden file system.
2. Acquire a write lock on the host file system's storage map.
(a) Find a free physical block in the host file system.
3. Release the write lock on the host file system's storage map.
[Figure: Step 1: find a free block in the Translation Map; Step 2: find a
free block on the physical disk; Step 3: the physical location is stored in
the Translation Map]

Figure 7.4: Block allocation
4. Map the physical block to the logical block in the Translation Map.
5. Return the logical block address to the calling function.
Allocating a logical block is a simple operation, which involves searching
through the Translation Map and finding an unallocated block. The
unallocated block is then marked as allocated, and a free physical block is
located in order to complete the mapping.
A free physical block is located by interacting with the storage map of the
host file system. This is either achieved through direct interaction with the
storage map structure of the host file system, or by using functions provided
by the host file system in order to locate a free physical block. Once a
physical block is located it is stored in the Translation Map Entry for the
newly allocated logical block. Finally the logical block address is returned to
the function that requested that a new block be allocated.
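This search can be sketched as follows. Locating the free physical block via the host file system's storage map (under the write lock) is assumed to have happened already, so the physical block number is simply passed in; the names are illustrative.

```c
#include <stdint.h>

struct tmap_entry {
    uint8_t  allocated;       /* non-zero if the logical block is in use  */
    uint32_t physical_block;  /* host block mapped to this logical block  */
};

/* Allocate a logical block: find an unallocated Translation Map entry,
   record the physical mapping, and return the logical block number,
   or -1 if the hidden file system is full. */
long tmap_alloc_block(struct tmap_entry *map, uint32_t map_len,
                      uint32_t free_physical_block)
{
    for (uint32_t i = 0; i < map_len; i++) {
        if (!map[i].allocated) {
            map[i].allocated = 1;
            map[i].physical_block = free_physical_block;
            return (long)i;  /* logical block address for the caller */
        }
    }
    return -1;
}
```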
It is important to note that the storage map of the host file system is
not modified in any way by this operation; it is only used to find a free
physical block. By not marking the physical blocks as allocated in the host
file system's storage map, the physical blocks remain available for allocation
within the host file system at a later stage.
When this event occurs the dynamic reallocation policy will come into play;
this will be discussed in detail in a later chapter.
In the following section block deallocation will be discussed. This oper-
ation is used when a file or directory no longer needs to utilise a particular
logical block.
Block Deallocation
Block deallocation occurs when a logical block is no longer needed by a file
system object. The block must be marked as unallocated and then can be
reused for a different object at a later stage. Block deallocation is a very
simple operation, and is accomplished in the following two steps:
1. Mark the logical block as unallocated.
2. Write the value "zero" to the entry field of the Translation Map Entry.
By marking the Translation Map Entry as unallocated, the stegano-
graphic file system will consider the block for allocation at a later stage.
As a security precaution, the value "zero" is written to the entry field of the
Translation Map Entry. This is done to ensure that any residual data that
is remaining in the physical block cannot be referenced back to the hidden
data in any way. In order to securely deallocate a block, the hidden file
system removes all traces of redundant data. If the mapping in the Translation
Map were to persist after the block was no longer allocated, there would be a
possibility that the redundant user data could be exploited. To ensure that
this cannot occur, the deallocated data is completely removed.
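Both deallocation steps can be sketched as a single zeroing operation over the entry, assuming the same illustrative entry layout as before:

```c
#include <stdint.h>
#include <string.h>

struct tmap_entry {
    uint8_t  allocated;
    uint32_t physical_block;
};

/* Deallocate a logical block: clearing the whole entry both marks it
   unallocated and removes the residual physical mapping, so nothing
   can be referenced back to the hidden data. */
void tmap_free_block(struct tmap_entry *map, uint32_t logical)
{
    memset(&map[logical], 0, sizeof map[logical]);
}
```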
These block allocation methods play an important role when physical and
logical blocks need to be allocated to files and directories. In the following
section we will discuss the inode operations, which control the set of blocks
which are allocated to a file or directory.
7.4.3 Inode Operations
The inodes are used to store metadata concerning files and directories. Every
file or directory needs to have an associated inode in the inode table. An
inode will have to be allocated, modified or deallocated depending on the
operation requested by the file system. These operations will be discussed
below.
Allocate Inode
Each inode entry in the inode table has a structure as seen in listing 6.4 on
page 113, with each entry being 128 bytes in size. All of the inode entries are
pre-allocated during initialisation of the inode table; this allows free inode
entries to be located and allocated to a file or directory quickly.
Inode entries hold items of metadata for files and directories within the
file system. The metadata items are kept to a minimum within the file system
implementation, in order to reduce the amount of information that can be
referenced back to a file, which will eliminate a number of security risks. As
such only the most basic information is recorded within the inode entry.
The process of allocating an inode entry is performed in three stages:
1. Initialise the metadata variables within the inode structure.
2. Allocate a number of logical blocks to the inode structure.
3. Write the inode entry to the physical device.
Firstly, the metadata entries have to be initialised; the magic, mode, key,
and inode_number fields are initialised as described in the latter part of
section 6.2.4. These metadata fields allow the file system to recognise the
inode entry and to decrypt the associated data if requested.
Secondly a number of logical file system blocks must be allocated to the
inode entry. Files and directories need to be stored in logical blocks within
the file system, and the inode entry is where these blocks are stored and
managed by the file system implementation. As discussed in the previous
chapter, the inode entry contains a number of extents which can directly
store a number of contiguous block references.
The number of logical blocks that will be needed for a file or directory can
be determined as follows, where Osize is the byte size of the object, Bfs is the
file system block size, and Nblocks is the number of file system blocks required
to store the object:

    ⌈Osize / Bfs⌉ = Nblocks    (7.1)
Once the number of blocks required has been calculated the block allo-
cation methods which were discussed above are invoked in order to allocate
the logical blocks within the Translation Map.
Finally the inode entry is written to its proper location within the inode
table. This can be done directly, as the size of the inode entry divides
cleanly into the file system block size, which ensures that an inode entry
within the inode table never straddles a block boundary. This allows the
exact location of the inode entry within the inode table on the physical disk
to be easily calculated.
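Equation 7.1 and the placement argument above can be expressed as two small helper functions; a 4096-byte block size is assumed in the example, and the function names are illustrative.

```c
#include <stdint.h>

#define INODE_ENTRY_SIZE 128  /* from listing 6.4 */

/* Equation 7.1: blocks needed to store an object of object_size bytes. */
uint32_t blocks_needed(uint32_t object_size, uint32_t block_size)
{
    return (object_size + block_size - 1) / block_size;  /* ceiling */
}

/* Byte offset of an inode entry inside the inode table; exact because
   128 divides the block size, so entries never straddle block
   boundaries. */
uint32_t inode_offset(uint32_t inode_number)
{
    return inode_number * INODE_ENTRY_SIZE;
}
```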
Modifying an Inode
Modification of an inode entry will occur when the file or directory associated
with the particular inode is modified, such as when data is appended, or
deleted from the data stream of a file or directory.
In such a situation the number of blocks which are allocated to the object
may either be increased or decreased, and the overall size of the object must
be increased or decreased appropriately.
This operation does not require any other modification to the inode entry,
which results from only storing a minimum amount of information within the
inode entry structure.
The following two cases will require a modification to the inode structure:
1. Blocks are required to be appended to the data stream.
2. Blocks are required to be removed from the data stream.
Blocks are appended or removed from the inode structure in the same way
in which they were added when the inode structure was created. In the case
of an addition to the inode structure, the modification function will interact
with the block allocation function in order to allocate a new set of logical
blocks to the inode. This operation will maintain the logical to physical
mapping within the Translation Map.
When blocks are required to be removed from the inode structure, again
the modification function will interact with the block deallocation function in
order to reclaim the logical blocks within the Translation Map. In both cases
the size of the file system object as represented by the size (see listing 6.4
on page 113) field in the inode entry structure has to be modified in order to
reflect the correct byte size of the object.
Deallocate Inode
Inode deallocation will occur when the file system is requested to delete a
file or directory, in which case the associated inode will need to be reclaimed.
The inode will then be available for later allocation by a different file system
object.
A number of steps must be taken to ensure the inode has been deallocated
securely. If the inode was simply marked as unallocated but the metadata
remained intact, then information concerning deleted files and directories
could be obtained through examination of the remains of the inode. In order
to securely deallocate an inode, all the metadata must be overwritten; this
can be done by writing "zero" to the entire inode structure. This will ensure
the metadata associated with a deleted object will not remain within the file
system.
There are therefore a number of steps required in order to securely remove
an inode from the inode table; these steps are listed below.
1. Mark the inode as unallocated.
2. Deallocate the blocks that are allocated to the inode entry.
3. Write "zero" to the entire inode entry structure.
4. Write the zero filled structure to the inode table.
In order to deallocate the blocks which are allocated to the inode entry,
the inode deallocation function will interact with the block deallocation func-
tion. Once this has occurred the blocks which were allocated to this inode
entry will be available for later reallocation by another file system object.
Finally the unallocated inode structure is written to the correct physi-
cal location within the inode table. This will allow the inode entry to be
reallocated to another file system object sometime in the future.
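Steps 3 and 4 amount to overwriting the entry in the in-memory inode table before it is flushed to disk. A sketch, with the table held as a flat byte array; the function name is an assumption of this sketch.

```c
#include <stdint.h>
#include <string.h>

#define INODE_ENTRY_SIZE 128

/* Zero an inode entry so that no metadata about the deleted object
   remains; writing the zeroed entry back to its physical location is
   then performed by the low-level block operations. */
void inode_deallocate(uint8_t *inode_table, uint32_t inode_number)
{
    memset(inode_table + (size_t)inode_number * INODE_ENTRY_SIZE, 0,
           INODE_ENTRY_SIZE);
}
```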
The intermediate-level operations discussed above provide the ability to
interact with the file system's control structures. In the following section we
will discuss the file and directory operations which form the high-level oper-
ations. These operations will interact with the intermediate-level operations
in order to maintain the files and directories within the steganographic file
system.
7.5 High-Level Operations
The high-level operations are concerned with operating on the logical
positions of files and directories within the hidden component of the
steganographic file system. These operations rely on the intermediate-level
operations, discussed in the previous section, to perform the translation between
the logical file system position and physical locations on the physical medium.
Restricting the high-level operations to only operate on the logical
positioning of data allows the steganographic implementation to freely
reallocate the underlying physical location of the hidden data, while maintaining
a consistent logical position. This allows data to be easily located within the
hidden file system regardless of the underlying physical location.
In the following sections we will discuss the file and directory operations,
which are the two sets of high-level operations that allow a human user to
interact with the data which is stored within the file system.
7.5.1 Directory Operations
Directories are used to form the hierarchical structure of data within the file
system. As such their existence is used to give an organisational structure
to the file system. The operations on a directory can be seen as simplified
file operations; this is because a directory, in its simplest form, is a stream of
bytes, not unlike that of a file.
A directory is made up of a linear list of directory entries, called a direc-
tory stream. Each directory entry has a structure as was seen in listing 6.5
on page 115, where each entry has a reference to a related inode number
and a variable-length name. The inode number will
either reference another directory or a file.
An important element of every directory is that it must contain the
directory entries for the root and the parent, designated by '.' and ' .. '. The
root entry will reference the inode number for the particular directory entry,
and the parent entry will reference the inode number for the directory which
is the hierarchical parent of the current directory.
The special case is the so-called "root directory", which is the first
directory entry created during file system initialisation and acts
as the parent for all subsequently created files and directories. The root
has the "root entry" and the "parent entry" both referencing the root itself.
This is done to prevent a user from attempting to navigate to a non-existent
directory above the root.
In the following sections we will discuss the operations that can be per-
formed on a directory. These operations allow the user to create and maintain
the hierarchical organisational structure.
Creating a directory
The process of creating a directory allows a hierarchical organisational struc-
ture to be established within the file system, and can be used to organise
files and other directories into a logical structure.
As discussed in the previous chapter, each directory is constructed of a
number of directory entries, as described in listing 6.5 on page 115.
Each directory entry is used to describe a particular item in the directory.
Each entry has a variable-length name and an associated inode number.
In order to create a directory there are two elements of the overall
directory structure which need to be considered: the parent directory entry and
the new directory entry that is to be created. The parent directory will be
the directory where the new directory is to be contained. In order to reflect
the new directory entry within the directory structure a reference to the new
directory must be appended to the directory stream of the parent. This will
require modification to the inode entry of the parent, and may require that
new logical blocks be allocated.
Firstly, the new directory entry structure must be created and written to
the disk; this will require that an inode be allocated, and logical blocks be
allocated to the inode. Once this has been accomplished the root and parent
directory entries are created: the root being the inode number of the inode
which is allocated to this new directory entry, and the parent being the inode
number of the inode allocated to the parent directory, designated by ' . ' and
' .. ' respectively. This will allow the directory structure to be traversed by
a user at a later stage.
Once the new directory entry has been created and written to the disk, the
parent's directory entry can be modified in order to reflect the new directory
entry. This is achieved by appending a new directory entry structure to the
parent's directory entry. The name of the new directory and the inode which
was allocated to it are then appended to the parent's directory. The name of
the new directory is simply a human understandable representation of the
directory to allow a user to easily identify it.
[Figure: creating a directory: the new directory entry is created, logical
blocks are allocated from the Translation Map, an inode is allocated from
the Inode Table, the parent directory stream is modified, and the result is
written to disk]

Figure 7.5: Creating a directory
The steps involved in creating a new directory can therefore be sum-
marised as follows, and represented graphically in figure 7.5:
1. Create the new directory entry
(a) Allocate a set of logical blocks to hold the directory entry.
(b) Allocate an inode entry in the inode table for the new directory
entry.
2. Modify the parent's directory entry to reflect the new directory entry
by appending the name and inode number of the new directory.
Reading from a directory
Reading the contents of a directory occurs when a user requests that a
directory listing be retrieved, such as by using the ls command on a UNIX
system, or when a user requests an item in a directory by its name, in which
case the associated inode number must be retrieved. In both cases the
directory entries will need to be parsed in order to extract the required
information.
Each directory entry is a variable-sized structure, and the directory entries
within the directory stream are not kept in any particular sorted order, which
must be taken into account when parsing them. The directory data
stream specified by the allocated blocks in the inode will contain multiple
directory entries, one for each of the objects stored in a particular
directory. The offset of each directory entry is not fixed, and therefore in
order to locate a particular directory entry every entry preceding it must first
be analysed.
Within the directory entry structure, the size field specifies the character
size of the name field (see listing 6.5), and the inode_number field will contain
the inode number of the associated inode. By iterating through the list of
directory entries and by keeping track of the size of each entry, the exact
offset of a particular entry within the directory stream can be calculated.
In order to retrieve a directory listing every directory entry in the direc-
tory stream must be analysed. This is simply a matter of iterating through
the entire directory stream and returning each directory entry in turn.
Listed below are the generic operations that must be performed in order
to either locate a particular directory entry or retrieve a directory listing.
1. Read the directory stream into primary memory, using the inode
allocated to the directory in order to locate the allocated blocks.¹
2. Iterate through the directory stream, keeping track of the overall size
of each directory entry.
In order to retrieve a directory listing each directory entry in the
directory stream is analysed and returned.
To retrieve a particular directory entry each preceding entry must
be iterated over, and then the particular entry returned.
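The iteration described above can be sketched as follows. This is an illustrative sketch only: the entry layout used here (a 4-byte little-endian inode number, a 1-byte name length, then the name bytes) is a simplified stand-in for the actual structure defined in listing 6.5, and the helper names are hypothetical.

```python
import struct

# Hypothetical simplified entry layout (stand-in for listing 6.5):
# 4-byte little-endian inode number, 1-byte name length, then the name bytes.
def pack_entry(inode_number, name):
    data = name.encode()
    return struct.pack("<IB", inode_number, len(data)) + data

def iterate_entries(stream):
    """Yield (offset, inode_number, name) for each variable-sized entry."""
    offset = 0
    while offset < len(stream):
        inode_number, size = struct.unpack_from("<IB", stream, offset)
        name = stream[offset + 5 : offset + 5 + size].decode()
        yield offset, inode_number, name
        offset += 5 + size      # running size total gives the next entry's offset

def list_directory(stream):
    """Retrieve a directory listing by analysing every entry in turn."""
    return [name for _, _, name in iterate_entries(stream)]

def lookup(stream, wanted):
    """Return the inode number for `wanted`, scanning every preceding entry."""
    for _, inode_number, name in iterate_entries(stream):
        if name == wanted:
            return inode_number
    return None

stream = pack_entry(7, "notes.txt") + pack_entry(12, "src") + pack_entry(3, "a.out")
print(list_directory(stream))   # ['notes.txt', 'src', 'a.out']
print(lookup(stream, "src"))    # 12
```

Because the entries are variable-sized and unsorted, there is no way to jump directly to the n-th entry; every preceding entry must be stepped over, exactly as in the generic operations above.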
Writing to a directory
Writing to a directory stream will occur when a file or directory is created.
Each new file and directory must be contained within a parent directory, in
this way building up the hierarchical structure. Every new file or directory
must therefore have a unique directory entry in order for it to be locatable
within the directory structure. This is done by appending a new directory
entry to the directory stream of the parent directory.
In the event of the directory stream not being able to accommodate a new
directory entry, additional blocks will have to be allocated to the directory
stream. This is achieved by locating a free logical block and appending it to
the directory's inode entry.
¹ There are obvious memory concerns; however, only directory streams which contain a
large number of directory entries will present a significant problem. This can be mitigated
by reading only a single file system block at a time into primary memory.
This is a relatively simple operation because the directory stream of the
parent does not need to be iterated over, as discussed in the previous section.
The overall size of the parent's directory stream is known, and can be
retrieved from the associated inode entry. All that is required is to create the
directory entry for the new file system object and then append that entry to
the directory stream.
In order to create the directory entry for a sub-directory, the process
discussed in the section above is followed. To create a directory entry for a
file, a similar process is taken and will be discussed in the following section.
Once the new directory entry has been created then it can be appended to the
parent's directory stream. This may involve more file system blocks needing
to be appended to the parent's inode in order to house the new directory
entry. Finally the size of the parent's directory stream is then modified in
the parent's inode entry.
The process used to write to a directory stream is summarised below.
1. Obtain the size of the particular directory stream from the correspond-
ing inode entry.
2. Read the parent's directory stream into primary memory.
3. Create the new directory entry.
The structure of a directory entry for a file or directory is identical.
To determine if a particular directory entry corresponds to a file
or directory the mode field (see listing 6.4, line 19, on page 113)
in the corresponding inode entry is examined.
4. Append the new directory entry to the parent's directory stream; this will
increase the overall size of the stream.
5. Allocate more logical file system blocks to the parent directory as
needed.
6. Modify the parent's inode entry to reflect any new allocated logical file
system blocks and the new size of the stream.
7. Write the parent's directory stream to the correct physical location on
the physical disk.
Deleting a directory
The deletion of a directory will occur when it is no longer needed by the file
system user. To remove a directory from the file system there are a number
of events that must occur in order to ensure a secure removal of data from the
file system. A directory cannot be removed if it contains other files or sub-
directories. A directory must be completely empty in order to be removed;
this is to ensure that the user will not mistakenly delete valid objects.
If a directory is empty then the file system will permit it to be removed.
Deleting a directory will require modification to the Translation Map, the
inode table, the parent directory, and of course the directory entry itself.
Firstly, the directory stream of the directory being removed must
be overwritten with either zero or random values. This is to ensure the secure
removal of the directory stream. If this is not performed then information
can be obtained from the remnants of the directory stream concerning the
files and sub-directories which it contained.
Once the directory data has been overwritten, the inode which was al-
located to the directory can be reclaimed, which will in turn reclaim the
logical blocks in the Translation Map which were allocated to this directory.
The inode and logical blocks will now be available for later reallocation by
subsequent file system objects.
Finally the directory stream of the parent directory must be modified in
order to no longer reflect the deleted directory. This is achieved by removing
the particular directory entry from the parent's directory stream. Removing
the directory entry is done by "shifting" all the succeeding directory entries
down, in effect removing the particular entry. The size of the parent
directory as reflected in the inode entry must then be modified.
To summarise the above process, in order to remove a directory entry
from the file system the following steps must be taken:
1. Ensure that the directory to be removed is empty.
2. Overwrite the directory stream with zeros.²
3. Reclaim the inode entry for this directory from the inode table.
The logical blocks allocated to this directory will be reclaimed.
² Only a single overwrite is used; however, the number of overwrite passes can be
increased to give further assurance that the directory stream cannot be recovered.
4. Remove the corresponding directory entry from the parent's directory
stream.
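Step 4, removing an entry by shifting the succeeding entries down, can be sketched as follows. The simplified entry layout (4-byte inode number, 1-byte name length, name bytes) is a hypothetical stand-in for the real structure in listing 6.5.

```python
import struct

# Simplified hypothetical entry layout: 4-byte little-endian inode number,
# 1-byte name length, then the name bytes.
def pack_entry(inode_number, name):
    data = name.encode()
    return struct.pack("<IB", inode_number, len(data)) + data

def remove_entry(stream, wanted):
    """Drop the entry for `wanted` by shifting all succeeding entries down."""
    offset = 0
    while offset < len(stream):
        inode_number, size = struct.unpack_from("<IB", stream, offset)
        entry_len = 5 + size
        if stream[offset + 5 : offset + 5 + size].decode() == wanted:
            # The succeeding entries slide into the gap; the stream shrinks.
            return stream[:offset] + stream[offset + entry_len :]
        offset += entry_len
    raise FileNotFoundError(wanted)

stream = pack_entry(7, "old") + pack_entry(9, "kept")
stream = remove_entry(stream, "old")
```

After the shift, the parent's inode entry would be updated with the reduced stream size, as described above.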
In this section we discussed the directory operations which formed the
basis for interacting with the directory structure within the file system. In
the following section a number of file operations will be discussed.
7.5.2 File Operations
Files are the raw streams of data that form the bulk of information that
is stored within the file system. Files have no discernible structure as far
as the file system is concerned; they are simply streams of bytes. Files
are described with inode entries and are stored within directories, and thus
require a directory entry.
Depending on the overall size of the file data, the file can be stored in
multiple file system blocks. The physical and logical location of these blocks
will differ, and need not be contiguous. It remains the aim of the file system to
manage the storage and retrieval of the blocks that are related to a particular
file when the human operator requests them.
Files differ only slightly from directories, in that directories have an im-
plied structure. The operations on files and directories are very similar, but
directory operations are generally more complex, because of the structure
that must be maintained.
In the following sections we will discuss a number of file operations, which
allow the human operator to interact with the file data that is stored on the
disk.
Creating a file
The process of creating a file is the basis for storing meaningful information
within the file system. Every file that is created has a number of associated
file system metadata structures. Files and directories are technically both
streams of bytes that are stored within the file system, the only difference is
that there is a structure that is imposed on the directories, where as there
are no such restrictions on file data.
For each file that is active within the file system, a set of logical blocks,
an inode entry, and a directory entry must be created. Files will normally
occupy more file system blocks than directories, and as such greater care
must be taken when allocating logical file system blocks, as indirect and
double-indirect blocks may have to be allocated within the inode.
To create a file, firstly an inode entry must be allocated to contain the
file's metadata. The inode is allocated through interaction with the inode al-
location functions which will allocate the inode within the inode table. When
files are initially created they do not have any file system blocks allocated to
them, and as such the overall size of the file remains zero. File system blocks
are added to the file's inode entry once data is written to the file.
Once the file has a valid inode entry, a directory entry must be created
in order to reference the file within the hierarchical directory structure. This
is a similar process as with the creation of a directory discussed above. A
directory entry for the file must be created within the directory stream of the
parent directory. The parent directory is simply the directory that is used
to house the file.
Once the directory entry has been created, the file is now available for
use within the file system. The process used to create a file is summarised
below.
If there are no longer any hidden file system blocks available for allocation
then the file cannot be created, and an error is returned to the user.
1. Allocate an inode to the new file. The file will initially have no
allocated file system blocks and a size of zero.
2. Create a directory entry for the file.
3. Append the directory entry to the directory stream of the parent.
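The three create-file steps can be sketched as below. This is a sketch only: the inode table, free-inode list, and parent directory are in-memory stand-ins for the real SSFS structures, and the error condition models exhaustion of the hidden file system blocks.

```python
def create_file(inode_table, free_inodes, parent, name):
    """Sketch of the three create-file steps over in-memory stand-ins."""
    if not free_inodes:                            # no hidden space remains
        raise OSError("no hidden file system blocks available")
    ino = free_inodes.pop()                        # step 1: allocate an inode
    inode_table[ino] = {"size": 0, "blocks": []}   # no blocks, size zero
    parent["entries"].append((ino, name))          # steps 2-3: directory entry
    return ino

inode_table, parent = {}, {"entries": []}
ino = create_file(inode_table, [5, 4, 3], parent, "report.txt")
print(ino, inode_table[ino])   # 3 {'size': 0, 'blocks': []}
```

The newly created file holds no data blocks until the first write, matching the observation above that file system blocks are only added to the inode once data is written.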
Writing to a file
Writing to a file assumes that there has already been a file created within the
file system which can be written to. The file may either have no size (a size
of "zero" reflected in the inode entry) or any arbitrary size. In both cases
the process of writing to a file is the same, and is accomplished in a number
of steps.
A file may, or may not, contain existing data; this data is stored at a
logical position within the steganographic file system, which is in turn stored
in a physical location on the hard disk. The bytes which make up the file
data are referred to as the file stream.
Firstly a number of file system blocks may need to be allocated to the
file; this will only need to occur if the data that is to be written to the file
will exceed the space available in the currently allocated blocks. In the case
of writing to a file which is zero bytes long, the
file system will always have to allocate new file system blocks to the inode
in order to store the new data. This will be achieved through interaction
with the inode modification methods which will in turn operate on the block
allocation methods to allocate new blocks to the file.
Once a set of blocks have been allocated to the file, new data can be
appended to the end of a file stream. For example, if the size of the current
file stream is N bytes, and the new data is M bytes long then the new data
will be written at byte position N + 1 in the file stream for a length of M
bytes, therefore increasing the overall size of the file stream to N + M bytes.
The inode entry for the file is then updated in order to reflect the new
size of the file. The inode is written to the correct position within the inode
table. The new data is then written to the physical disk.
The process of writing data to a file is summarised in the following steps.
1. Allocate more blocks to house the new data if needed.
2. Modify the inode entry to reflect the new blocks, and the increased size
of the file.
3. Write the inode entry to the inode table.
4. Append the new data to the existing file stream.
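The N + M growth described above can be sketched as follows. This is an illustrative sketch with a toy block size; the inode is a dictionary and allocated blocks are opaque placeholders rather than real SSFS structures.

```python
BLOCK_SIZE = 16   # toy file system block size for illustration

def write_append(inode, stream, data):
    """Append M bytes after the existing N bytes, growing the file to N + M."""
    n, m = inode["size"], len(data)
    stream = stream[:n] + data
    # Step 1: allocate blocks only if the existing ones cannot hold the data.
    blocks_needed = -(-(n + m) // BLOCK_SIZE)      # ceiling division
    while len(inode["blocks"]) < blocks_needed:
        inode["blocks"].append(object())           # stand-in for a real block
    inode["size"] = n + m                          # step 2: record the new size
    return stream

inode = {"size": 0, "blocks": []}
stream = write_append(inode, b"", b"hello world, hello disk")
print(inode["size"], len(inode["blocks"]))   # 23 2
```

Writing to a zero-byte file is just the N = 0 case: the allocation loop always runs at least once, matching the observation that new blocks must always be allocated for an empty file.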
Reading from a file
All data that is stored within the steganographic file system will have to be
accessed at some stage. This will involve the file contents being transferred
from the physical disk to the primary memory. The processes involved in
reading the file stream from the physical disk to the primary memory involve
an interaction with a number of different file system structures.
Normally a user will specify a file using the human-understandable name
that is stored in the directory entry. The inode number corresponding to the
file system entry will have to be extracted from the directory entry in order
to access the data stream.
A file will have a size that is specified by the inode entry. A user can
only read a number of bytes from the file that is less than, or equal to, the
overall byte size. This restriction will prevent the user from accessing data
that does not form part of the actual file contents.
File data is stored within logical blocks in the steganographic file system,
which is referenced to a physical location. The read functions will have
to interact with the Translation Map in order to obtain the exact physical
location for a particular logical block. The dynamic reallocation policy will
create a situation where the logical-to-physical mapping is not constant; as
such there must always be interaction with the Translation Map in order to
obtain the correct mappings.
Data from the file stream can be read in many different ways; normally
this will involve reading the file data from the physical disk into a buffer in
primary memory to be utilised by the user. The user will specify an offset
within a particular file and a number of bytes that should be read. The
file system will read those particular bytes from the physical disk and place
them into the user buffer. The read functions will interact with the low-level
input/output commands in order to achieve this. Care must be taken as the
encryption mechanisms within the steganographic file system will impact on
the way in which the bytes are read from the physical disk.
The process to read file data from the physical disk is summarised below.
1. Obtain the inode number associated with the file from the parent's
directory entry.
2. Read the file data specified by the inode into primary memory.
(a) Obtain the physical location for each logical file system block.
(b) Read the data for each physical block into primary memory.
3. Return the requested file data to the user.
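The read path can be sketched as below. This is a sketch only: the disk is a flat byte array, the Translation Map is a plain dictionary, and the size check models the restriction that a read may not extend past the file size recorded in the inode.

```python
BLOCK_SIZE = 4    # toy block size for illustration

# Physical "disk" as a flat byte array; the Translation Map gives the current
# physical block for every logical block, which can change under reallocation.
disk = bytearray(b"....datastream..")
translation_map = {0: 1, 1: 2}        # logical block -> physical block

def read_file(logical_blocks, size, offset, length):
    """Read `length` bytes at `offset`, capped at the file size from the inode."""
    length = min(length, size - offset)            # never read past the file size
    data = b""
    for logical in logical_blocks:                 # always consult the map
        physical = translation_map[logical]
        data += disk[physical * BLOCK_SIZE : (physical + 1) * BLOCK_SIZE]
    return data[offset : offset + length]

print(read_file([0, 1], size=8, offset=0, length=100))  # b'datastre'
```

Because the mapping is consulted on every read, the same code works unchanged after the dynamic reallocation policy has moved blocks to new physical locations.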
Deleting a file
Files are removed from the file system when the user no longer has a need
for them. An aspect of the steganographic file system is the mandatory
secure delete procedures within the file system. All data that is removed
from the file system must be completely removed. This includes all the file
data, and associated metadata. Normally, in order to improve performance,
file data is not removed from the file system, only marked as unallocated and
overwritten at a later stage with newer file data. This does present a security
risk, as deleted data can be recovered by examination of the physical disk.
In order to provide security and privacy, all data must be securely re-
moved. This process involves overwriting file data during deletion to ensure
that it cannot be recovered. This can however impact on performance, as a
large file will require a relatively large amount of time in order to overwrite
all of the file data. The performance impact is warranted to ensure security
and privacy.
A number of items must be considered when removing a file from the file
system. Firstly the blocks which were allocated to the file must be marked as
unallocated. These blocks can then be reallocated to new file system objects.
The inode entry corresponding to the file must be reclaimed; the inode entry
itself must also be securely removed to avoid any information about the file
being recovered. This can be achieved by either writing "zero" or random
values to the entire inode entry.
The directory entry corresponding to the file must be removed from the
parent's directory stream. This is done as discussed in the above sections.
Finally the file data must be securely removed by writing "zero" to every
physical location which was allocated to the file. This will prevent deleted
file data from being extracted from the steganographic file system.
To summarise the above process the following events must occur in order
to securely remove a file from the steganographic file system.
1. Deallocate the logical blocks allocated to this particular file.
2. Deallocate the inode entry corresponding to this file.
3. Write "zero" or "random values" to the inode entry to securely remove
the data it contains.
4. Remove the directory entry corresponding to the file from the parent's
directory stream.
5. Write "zero" or "random values" to the physical location where the file
was contained to securely remove the data.
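The secure-overwrite portion of the procedure (steps 3 and 5) can be sketched as follows. This is an illustrative sketch: the disk is a mutable byte array, and the random-fill option stands in for the "random values" alternative mentioned above.

```python
import os

def secure_delete(disk, physical_blocks, block_size, use_random=False):
    """Overwrite every block that held file data with zeros (or random bytes)."""
    for physical in physical_blocks:
        filler = os.urandom(block_size) if use_random else bytes(block_size)
        disk[physical * block_size : (physical + 1) * block_size] = filler

disk = bytearray(b"secretAAAsecretB")
secure_delete(disk, [0, 3], block_size=4)
print(disk)   # bytearray(b'\x00\x00\x00\x00etAAAsec\x00\x00\x00\x00')
```

The same routine serves for the inode entry, since it too occupies a known physical location that must be overwritten before being returned to the free pool.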
7.6 Summary
In this chapter we discussed the following concepts:
Layered File System Operations - where we introduced the layered
model which is used to classify the steganographic file system opera-
tions.
Low-Level Operations - in this section we describe the low-level operations;
these are the operations which interact with the physical device.
In this section we covered:
- Read and Write Operations - the operations which are used to
read and write data to the physical device.
Intermediate-Level Operations - where we discuss the intermediate-
level operations which are used to modify the hidden file system's meta-
data. In this section we covered:
- Logical-Physical Translation - these are the operations which
are used to perform the logical to physical translation via the
Translation Map.
- Translation-Map Operations - these are the operations which
support the logical to physical translation and control storage
management.
- Inode Operations - these operations are used to control the meta-
data for files and directories.
High-Level Operations - in this section we discuss the operations
which are used to interact with the user. We discuss the following
concepts:
- Directory Operations - operations which are used to interact with
the directory structure of the hidden file system.
- File Operations - operations which are used to interact with the
file data stored in the hidden file system.
7.7 Conclusion
In order for a user to interact with data in the steganographic file system
a number of operations must be defined. These operations define the
functionality of the file system, and ensure the secure storage and retrieval
of data. All of the operations discussed above require interaction with a
number of different file system layers to achieve the desired effect.
Firstly we introduced the file system layers which are used in order to
provide multiple layers of functionality. We then proceeded to discuss the low-
level operations and the role which they play in interaction with the physical
disk. We then discussed the intermediate-level operations with regards to the
maintenance of the file system metadata. Finally we discussed the high-level
operations, specifically the file and directory operations which allow the user
to interact with the data stored in the file system.
This chapter is presented in conjunction with chapter 6 in order to define
the layout and operation of the steganographic file system on the physical
device. The steganographic structures and operations are only concerned
with the embedding of the hidden data within the host file system. In order
for data to be securely hidden, the following chapter will present a scheme
for securing the hidden data through the use of cryptography.
Chapter 8
File System Security for SSFS
8.1 Introduction
Data security within the steganographic file system is achieved through two
primary techniques: information hiding and cryptography. These two ele-
ments work together in order to produce a security scheme which will ensure
data remains secure from attackers. There are a number of different aspects
which must be taken into consideration when implementing a data security
mechanism; these will be discussed in this chapter.
In this chapter we will be discussing the security scheme used by SSFS
which is used to ensure information security. This is achieved through the
use of cryptography. The encryption scheme must support the dynamic
reallocation policy which will be discussed in the following chapter.
Firstly we will give an overview of the security scheme with respect to
information hiding in section 8.2.1, and data cryptography in section 8.2.2. We
then discuss cryptography in section 8.3 with regards to how cryptographic
operations are implemented within the steganographic file system. The cryp-
tographic operational layer is then discussed, along with a discussion on
transparent encryption in section 8.4. The overall data encryption scheme
and encryption hierarchy are discussed in sections 8.5 and 8.6 respectively.
Finally a number of performance considerations are presented in section 8.7.
8.2 Security Overview
One of the primary goals for a steganographic file system is to provide a
high level of data security. Data is secured through two techniques, namely
information hiding and cryptography. Both techniques are used concurrently
in order to provide a security model which will ensure that data will remain
secure.
Information hiding capabilities are built into the structure of the stegano-
graphic file system as discussed in the previous chapters. This provides meth-
ods to ensure that data is securely hidden within the structure of a host file
system.
Cryptography is used to construct another layer of security which will
work in conjunction with the information hiding techniques in order to en-
sure the security of data. These two combined methods will act together to
provide a complete solution for data security.
Both the information hiding and cryptographic methods will be discussed
in the following sections.
8.2.1 Security through Information Hiding
Information hiding is the primary principle on which a steganographic file
system is based. This allows data to be hidden within the structure of a
host file system. As discussed in the previous chapters, data is hidden in the
unallocated blocks of a host file system which can only be accessed through
interaction with the steganographic file system implementation.
The management of hidden data is maintained by the steganographic file
system implementation. Data security is derived, in part, from the process
of storing data within the unallocated blocks of the host file system. During
normal interaction with the host file system, a user will not be aware of the
hidden data stored within the structure of the host file system. The user will
only be exposed to data which is stored within the host file system.
The presence of the steganographic component of the file system will only
be known to the user who created the steganographic file system. Hidden
data can only be accessed through the use of a dedicated command shell
which will allow access to the hidden file system component, provided that
the correct access controls are met.
In an unencrypted environment forensic examination of the physical de-
vice will reveal the presence of hidden data. Examination of all the physical
blocks will reveal that there is a large amount of structured data which is not
referenced by the host file system. A forensic examiner could reconstruct the
steganographic data which is stored in the unallocated file system blocks.
Classic steganography provides a much better cover medium, as data can
be "completely hidden" within the high-level structure of a picture or audio
file. It is however limited in the amount of data which can be hidden, a
picture can only contain a certain maximum amount of steganographic data
depending on the overall pixel dimensions.
A steganographic file system can contain a much larger amount of data,
but is limited by the low-level nature of the cover medium. The unallocated
blocks of a host file system are not the ideal place to store data, as they
can be easily overwritten due to the dynamic nature of a file system. Data
protection methods need to be put into place to ensure that data is not inad-
vertently overwritten. The ability to store a large amount of steganographic
data is the greatest appeal for a steganographic file system.
As discussed above, hiding data within the unallocated file system blocks
of a host file system provides very little protection for the data; an experienced
user or forensic examiner could easily extract the hidden data. Extra
security measures must come into play in order to provide complete security
for the hidden data. This is achieved through the use of cryptography, which
will ensure that hidden data remains secure, even if it is detected. The use of
cryptography to secure hidden data will be discussed in the following section.
8.2.2 Security through Cryptography
Information hiding and cryptography are used together within the stegano-
graphic file system in order to provide a greater level of security. As discussed
above, information hiding alone will not sufficiently obfuscate the presence
of steganographic data. If data is hidden in its plaintext form, it is a simple
matter to extract and reconstruct the information.
Cryptography is used to ensure that the hidden data will remain secure,
even if it is detected by a third party. The decrypted form of the hidden
data will only be accessible with the correct passphrase which is only known
to the owner of the data.
The steganographic file system allows for data to be transparently en-
crypted and decrypted when the correct passphrase is given. The file system
user will be unaware of the encryption process, which will allow for more
efficient access to the hidden information.
The hidden data is encrypted in such a way as not to be affected by
the dynamic reallocation of the hidden data, which will be discussed in the
following chapters. This ability stems from the logical organisation of the
hidden file system blocks within the physical device. This will allow data
to be decrypted and accessed regardless of the underlying organisation of the
physical blocks.
Introducing cryptographic routines to the file system implementation does
introduce a performance impact, as all data must be encrypted and decrypted
as it is requested, using a unique key. The security of the system warrants
the performance impact which cryptography introduces. This impact can be
minimised through the use of modern, efficient, cryptographic algorithms.
The performance concerns will be discussed in a later section.
In the following section we will discuss the use of cryptography within
the steganographic file system.
8.3 Data Cryptography
All data which is stored within the steganographic file system is transparently
encrypted and decrypted as required by the user. This allows the complex-
ity of managing data encryption to be the responsibility of the file system
implementation.
Data encryption is implemented as an operation within the intermedi-
ate layer (see section 7.4, on page 133). A cryptographic block cipher (see
section 3.3, on page 37) is used for the encryption process, as this simpli-
fies the overall implementation. Data is encrypted and decrypted as discrete
blocks of data of a particular size, which depends on the block size of the
cryptographic algorithm being used and the file system block size.
In this section we will discuss the choice of cryptographic algorithm as
this will play an important role in the overall performance of data access
within the file system.
8.3.1 Choice of Algorithm
The choice of cryptographic algorithm will have a direct impact on the over-
all performance of the file system implementation. Different aspects of the
algorithm will influence the design and construction of the file system. There
are a number of considerations which must be made when selecting a cryp-
tographic algorithm; these are listed below.
1. Block cipher - the cryptographic algorithm must be a block cipher.
This allows the file system to encrypt and decrypt "blocks", which can
be easily reallocated if needed.
2. Cryptographic block size - as discussed in the previous chapters,
a file system reads and writes data in discrete blocks, equivalent to the
file system block size, which is a multiple of the physical block size.
The cryptographic block size and the file system block size must be
cleanly divisible, to allow for interaction between the two components.
3. Performance - modern cryptographic algorithms are designed to
maximise performance; to encrypt and decrypt as fast as possible. This
will allow data to be encrypted and decrypted within the file system
very quickly and efficiently. This will have a direct impact on the overall
performance of the file system implementation.
4. Security - the cryptographic algorithm must provide an adequate
level of data security in terms of the strength of the overall crypto-
graphic cipher. As modern computers become more powerful, cryptographic
algorithms must become stronger to prevent unauthorised parties
from accessing the encrypted data. This aspect is dependent on
the overall strength of the user's passphrase, as a weak passphrase will
negate any security benefit the cryptographic algorithm provides.
The Serpent algorithm (see section 3.3.3, on page 43) is a good example
of a current cryptographic algorithm that satisfies the above requirements.
Any block cipher can be chosen to perform this task; the particular cipher
will be chosen in line with the functional requirements for the file system
implementation.
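Requirement 2, that the cryptographic block size cleanly divide the file system block size, can be illustrated as below. This is a sketch only: the keyed-XOR "cipher" is an insecure stand-in for a real block cipher such as Serpent, used purely to show a file system block being processed as a whole number of cipher blocks.

```python
# Toy 16-byte "block cipher" (keyed XOR; NOT secure) standing in for a real
# cipher such as Serpent, whose block size is also 16 bytes.
CIPHER_BLOCK = 16
FS_BLOCK = 64
assert FS_BLOCK % CIPHER_BLOCK == 0     # the sizes must be cleanly divisible

def xor_block(block, key):
    return bytes(b ^ k for b, k in zip(block, key))

def encrypt_fs_block(data, key):
    """Encrypt one file system block as FS_BLOCK // CIPHER_BLOCK cipher blocks."""
    assert len(data) == FS_BLOCK
    return b"".join(xor_block(data[i:i + CIPHER_BLOCK], key)
                    for i in range(0, FS_BLOCK, CIPHER_BLOCK))

key = bytes(range(CIPHER_BLOCK))
plain = bytes(FS_BLOCK)
cipher = encrypt_fs_block(plain, key)
assert encrypt_fs_block(cipher, key) == plain   # XOR is its own inverse
```

Because every file system block maps onto an exact number of cipher blocks, a block can be encrypted, decrypted, and physically reallocated independently of its neighbours, which is what requirement 1 demands.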
In the following section we will discuss the cryptographic layer of SSFS
with regards to the interaction with the overall file system implementation.
8.4 Cryptographic Layer
The cryptographic functions are implemented as an extension of the inter-
mediate layer, as discussed in section 7.4 on page 133. The aim for the cryp-
tographic layer is to implement a transparent encryption extension to the
file system, which will allow file system data to be encrypted and decrypted
without user interaction.
Figure 8.1: Cryptographic layer (the low-level, intermediate-level, cryptographic, and high-level operational layers shown in sequence)
A cryptographic block cipher operates on discrete sized blocks of data.
This will allow the cryptographic layer to exist within the intermediate layer,
which is also concerned with discretely sized blocks of file system data. As
discussed in the previous chapter, when data is stored within the file system,
data flows through each of the operational file system layers culminating in
storage on the physical device. When data is accessed from the file system,
it again flows through each layer until it is presented to the user. Each file
system operational layer operates on progressively smaller sized "portions"
of data.
As mentioned above the cryptographic layer is an extension of the inter-
mediate operational layer, when data flows through the intermediate opera-
tions it is either encrypted and written to disk, or decrypted for presentation
to the user. The flow of information through the file system operational
layers is shown graphically in figure 8.1.
In the following section we will discuss the concept of transparent en-
cryption and the importance which it plays in the overall data encryption
scheme.
8.4.1 Transparent Encryption
The concept of transparent encryption is important to the overall operation
of the steganographic file system implementation. Transparent encryption
allows the complexities of data encryption to be taken away from the user,
and allows the file system to manage all the encryption and decryption of
user data.
Figure 8.2: Transparent encryption (data passes between the intermediate layer and the disk through the transparent encryption operation; the user side remains unencrypted)
When a user is working with the steganographic file system, all data
which is requested will be encrypted or decrypted as it is accessed from the
physical device. The user will not have to interact with the cryptographic
operations in any way. The user will only interact with the plaintext form of
the data, with the lower levels of the file system implementation managing
the encryption and decryption process.
The transparent encryption mechanism operates on the user data. This
will allow the user to remain confident that the data which is being stored
within the steganographic file system will be secure. Ideally the crypto-
graphic algorithm which is used to encrypt and decrypt the user data should
be capable of operating as fast as possible, in order to maximise performance.
Data flow with a transparent encryption extension is demonstrated graph-
ically in figure 8.2.
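The transparent read and write path described above can be sketched in C as follows. This is a minimal illustration only: the repeating-key XOR cipher is a placeholder standing in for Serpent, and the in-memory device, block size, and function names are our own assumptions, not SSFS's actual interface.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 512                 /* hypothetical file system block size */

static uint8_t disk[BLOCK_SIZE * 16];  /* toy in-memory "physical device" */

/* Placeholder cipher: repeating-key XOR stands in for Serpent.  XOR is
 * its own inverse, so one routine both encrypts and decrypts. */
static void toy_cipher(uint8_t *buf, size_t len,
                       const uint8_t *iv, size_t iv_len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] ^= iv[i % iv_len];
}

/* Transparent write: the intermediate layer encrypts the block before
 * it reaches the device, so only ciphertext is ever stored. */
void transparent_write(int block, const uint8_t *plain,
                       const uint8_t *iv, size_t iv_len)
{
    uint8_t tmp[BLOCK_SIZE];
    memcpy(tmp, plain, BLOCK_SIZE);
    toy_cipher(tmp, BLOCK_SIZE, iv, iv_len);            /* encrypt */
    memcpy(disk + (size_t)block * BLOCK_SIZE, tmp, BLOCK_SIZE);
}

/* Transparent read: decrypt on the way back up, so the user only ever
 * sees plaintext and never interacts with the cryptography itself. */
void transparent_read(int block, uint8_t *plain,
                      const uint8_t *iv, size_t iv_len)
{
    memcpy(plain, disk + (size_t)block * BLOCK_SIZE, BLOCK_SIZE);
    toy_cipher(plain, BLOCK_SIZE, iv, iv_len);          /* decrypt */
}
```

A caller interacts only with `transparent_write` and `transparent_read`; the ciphertext never crosses the intermediate layer boundary, which is the transparency property described above.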
In the above sections we discussed the requirements and components of the cryptographic system which is used to manage the transparent data encryption. In the following section we will discuss the use of unique initialisation vectors to form an overall system with a greater level of security.
8.5 File System Data Encryption Scheme
In order to achieve a transparent encryption mechanism, discussed in the
above section, an encryption scheme for the file system implementation must
be developed. There are a number of aspects of the file system and the
cryptographic system which must be considered, in order to fully implement
a transparent encryption system.
8.5.1 Data Classes
All data which is stored within the file system is divided into two different
data classes, namely system data and user data. This division of data is done
in order to specify different realms of data, where access to the specific data
class can be controlled through the file system implementation.
System data consists of the Superblock, the TMap Array, and the Trans-
lation Map. User data consists of the Inode table, and the file and directory
data. Access to the encrypted data is managed by the file system imple-
mentation and controlled through the type of data class to which the data
belongs.
Access to the system data is not as tightly controlled as access to the user data. As discussed in the previous chapters, only a minimum of information is contained within the steganographic file system's metadata structures: enough to minimise the exposure of the user data while still providing adequate support for the interaction between the host and hidden file system implementations. Because this metadata exposes so little about the user data, the system data can safely be subject to looser controls.
All user data is encrypted with a unique initialisation vector, which will be discussed in detail in the following sections. System data must be available to the host file system in order to facilitate the dynamic reallocation process, which is why there are fewer restrictions on this data class.
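The classification can be sketched as a small policy in C. This is our own illustration (the names are not from SSFS), and the superblock is a partial exception: portions of it are encrypted, as discussed later in this chapter.

```c
#include <stdbool.h>

/* The two realms of data described above. */
enum data_class {
    SYSTEM_DATA,   /* Superblock, TMap Array, Translation Map */
    USER_DATA      /* inode table, file and directory data    */
};

/* User data is always stored encrypted; the reallocation-related
 * system structures stay readable so the host file system can use
 * them during dynamic reallocation. */
bool class_is_encrypted(enum data_class c)
{
    return c == USER_DATA;
}

/* Only system data is ever exposed to the host file system; user data
 * is reached solely through the hidden file system driver. */
bool host_may_access(enum data_class c)
{
    return c == SYSTEM_DATA;
}
```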
The interactions between the system and user data classes within the hidden file system will be discussed in the following section.
8.5.2 Interactions
The need for specific data classes can be seen by presenting the following
scenario. The file system implementation must be able to dynamically reallocate data within the hidden file system when the host file system requests
that a specific file system block be written to. In order to maintain security
the host file system must not have access to the user data stored within the
file system, yet the encrypted user data must be able to be reallocated when
requested.
To achieve this, the host file system will require access to the hidden
file system's Translation Map to determine if a particular file system block
contains any hidden file system data. As discussed in the previous chapters,
the Translation Map is itself reallocatable; as such the host file system will
require access to the hidden file system's TMap Array in order to locate the
exact position of the Translation Map.
The Translation Map and TMap Array must be accessible to the host
file system. By keeping a minimum of information within these metadata
structures, and ensuring that user data is strongly encrypted, there is only
a very limited risk presented to the user data. The worst case is that an
attacker can determine the location of the encrypted data, not its content or
construction. An inexperienced user should never be aware of the presence
of the hidden data.
It is important to note these interactions because of the direct impact which they have on the security of the hidden file system, and to ensure that user data remains strongly encrypted.
In the following section we will discuss the encryption hierarchy which is
used to ensure that user data will remain secure.
8.6 Encryption Hierarchy
The encryption hierarchy is used to define a security scheme which will en-
sure that user data will remain secure. The process utilises multiple randomly
generated initialisation vectors (IVs), on multiple levels, in order to encrypt
user data. This process is controlled with the master passphrase which the
user will specify when the file system is initialised. This process will ensure
that the exposure of the master passphrase, and therefore the user data, is reduced, and this will limit the possibility that the passphrase can be brute-forced¹.

¹Brute-forcing - the process of extracting a passphrase by trying different possibilities until the correct one is found. On average the correct passphrase is found after searching half the key-space.
The encryption hierarchy is formed by using the master passphrase when the file system is initially accessed; this gives access to another unique, randomly generated IV, which in turn gives access to the file system's metadata. This IV can then be used to obtain the IVs for specific items of user data. Using a number of different IVs to secure different aspects of the file system greatly increases its overall security.
This scheme is similar in design to the Derived Unique Key Per Transac-
tion (DUKPT) key management scheme. This allows hidden data to remain
secure even if one of the encryption keys is compromised. Although DUKPT
is normally used to secure transactions between two parties, the idea can be
adapted to allow SSFS to secure hidden data with a set of unique initialisa-
tion vectors.
In the following section we will discuss the initialisation vectors in detail
with regards to the overall encryption scheme, and their use in accessing
various aspects of the file system data.
8.6.1 Initialisation Vectors (IV)
Data which is stored within the hidden file system is encrypted with randomly
generated initialisation vectors (IVs). For the purposes of this discussion we
assume the use of the Serpent algorithm as discussed in section 3.3.3. The Serpent algorithm uses a 256-bit IV for encrypting data; the supplied IV can range in size from 64 bits to 256 bits, as any IV shorter than 256 bits is padded with zeros so that it is always 256 bits in size.
An IV of 256 bits gives 2^256 (or 1.15792089 × 10^77) different possible key combinations. This key-space is so large that it is impractical to randomly guess the correct key using modern computers.
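The padding rule and the size of the resulting key-space can be sketched as follows; `pad_iv` and `keyspace_size` are our own illustrative names, not part of SSFS.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IV_BITS  256
#define IV_BYTES (IV_BITS / 8)

/* Zero-pad a supplied IV of 64-256 bits (8-32 bytes) up to the full
 * 256 bits expected by the cipher.  The caller guarantees in_len <= 32. */
void pad_iv(const uint8_t *in, size_t in_len, uint8_t out[IV_BYTES])
{
    memset(out, 0, IV_BYTES);   /* trailing bytes padded with zero */
    memcpy(out, in, in_len);
}

/* Size of the 256-bit key-space: 2^256, about 1.158e77 combinations.
 * Computed by repeated doubling so no math library is needed. */
double keyspace_size(void)
{
    double s = 1.0;
    for (int i = 0; i < IV_BITS; i++)
        s *= 2.0;
    return s;
}
```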
A unique IV is generated for each item of data which is stored within the hidden file system. When the file system is created, the user will
specify their master passphrase which will be transformed into an IV to be
used to "unlock" the file system metadata. The inode entries can then be
accessed, which will in turn allow the directory structure and user data to
be accessed.
As discussed in the above chapters, there is no distinction between directories and files; both are considered to be forms of user data. When
user data is created, a randomly-generated IV is stored within the associated
inode entry (see listing 6.4 on page 113). This IV is then used to encrypt
and decrypt the user data transparently when requested.
The inode table is also encrypted transparently, using the key field within
the superblock. Portions of the superblock are encrypted with the master
passphrase. These encryption levels give complexity to the overall system,
making it very difficult for an attacker to forcibly gain access to the user data
as multiple layers of encryption will have to be overcome.
In order to facilitate the interaction between the host file system and
the hidden file system, the Translation Map and TMap Array will remain
unencrypted. This is to allow for dynamic reallocation to take place. This does not pose a significant security threat, as to an unaware user these structures will appear to be blocks of unrelated integers.
As can be seen in figure 8.3, the encryption hierarchy is formed through the interaction of multiple IVs. Firstly, the master passphrase is used to access the file system metadata. The superblock IV is then used to access the individual inode IVs. Finally, the user data is accessed on the physical device.
[Figure: the Master Passphrase gives access to the Superblock IV; the Superblock IV gives access to the Inode Table and its inode IVs, which in turn give access to the User Data on the physical device.]

Figure 8.3: Initialisation vector hierarchy
The IV hierarchy can also be described as seen below, where IV_mp is the master passphrase and IV_sb is the superblock IV. IV_[0,n] is a set of IVs which are used to encrypt and decrypt the user data UD_[0,n], such that IV_0 is used to encrypt UD_0, and IV_1 is used to encrypt UD_1. The number of unique items of user data, n, defines the size of the set of randomly generated IVs.
IV_mp → IV_sb → IV_[0,n] → UD_[0,n]
The master passphrase is used to access the superblock IV. The su-
perblock IV is then used to access the set of IVs relating to the unique
items of user data. This set of IVs is stored in the inode entries for each item of user data.
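The unlock chain can be sketched in C, again with a toy XOR cipher standing in for Serpent; the struct layout and function names are our own illustration of storing each level's IV locked under the level above it.

```c
#include <stdint.h>
#include <string.h>

#define IV_LEN 32   /* 256-bit IVs, as in the scheme above */

/* Toy symmetric cipher (XOR); being its own inverse, the same routine
 * both locks and unlocks a stored IV. */
static void xor_crypt(uint8_t *buf, const uint8_t key[IV_LEN])
{
    for (int i = 0; i < IV_LEN; i++)
        buf[i] ^= key[i];
}

/* Hypothetical on-disk view of the hierarchy: IV_sb is stored locked
 * under IV_mp, and an inode's IV is stored locked under IV_sb. */
struct iv_hierarchy {
    uint8_t locked_sb_iv[IV_LEN];     /* IV_sb locked by IV_mp */
    uint8_t locked_inode_iv[IV_LEN];  /* IV_0  locked by IV_sb */
};

/* Walk the chain IV_mp -> IV_sb -> IV_0. */
void unlock_chain(const struct iv_hierarchy *h,
                  const uint8_t iv_mp[IV_LEN],
                  uint8_t iv_sb[IV_LEN], uint8_t iv_inode[IV_LEN])
{
    memcpy(iv_sb, h->locked_sb_iv, IV_LEN);
    xor_crypt(iv_sb, iv_mp);          /* passphrase unlocks IV_sb   */

    memcpy(iv_inode, h->locked_inode_iv, IV_LEN);
    xor_crypt(iv_inode, iv_sb);       /* IV_sb unlocks the inode IV */
}
```

Compromising a single inode IV in such a chain exposes only that one item of user data, which is the property borrowed from DUKPT discussed above.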
In the following section we will present an operational scenario in order
to explain how different portions of the encryption hierarchy will interact in
order to secure the hidden user data.
8.6.2 Operational Scenario
In order to fully discuss the operation of the IVs with regards to the en-
cryption hierarchy, an operational scenario is presented below. When a user
initialises the file system, the master passphrase is specified. This is used to encrypt various portions of the file system metadata. The superblock IV, specified by the iv field in the superblock structure (see section 6.1 on
page 107), is used to encrypt the inode table and the inode entries. Each
inode entry contains a key field (see listing 6.4 on page 113), which is used
to encrypt the associated user data.
The user will request operations on the user data, and will always be
presented with the unencrypted plaintext form, provided that the correct master passphrase is specified when the hidden file system is initially accessed. When a user requests hidden data, the file system will use the master
passphrase to decrypt and access the superblock. The IV which is stored
in the superblock will then be used to decrypt the inode table, and finally
access the particular inode entry of the data being requested.
A particular inode entry contains a unique IV which will be used to
decrypt an item of hidden user data. Each item of hidden data will therefore be encrypted using a different IV, which will increase the overall complexity of the encryption scheme.
8.7 Performance Concerns
The overall performance of the steganographic file system must be kept in
mind, especially concerning the transparent encryption operation. There
should not be an excessive amount of time consumed by the encryption and decryption process. In the following section we will discuss the
transparent encryption operations with regards to various file system opera-
tions.
Accessing the file system metadata
Portions of the file system metadata are encrypted with different IVs, such as
portions of the superblock and the inode table. In order to access the inode
table, it must first be decrypted. All inode entries in the inode table will
be encrypted with the same IV. Access to an inode entry will therefore only require a single level of decryption.
The encryption of the inode entries will have an impact on the access of
the file and directory data, which will be discussed below.
Accessing files and directories
As discussed above, file and directory data is encrypted with a unique IV.
This will increase the overall security of the system as different portions of the
user data are encrypted independently of each other. This does not present
a large performance impact as access to a single file or directory will only
require one to two accesses to encrypted data.
As discussed in the previous chapters, directory streams contain a linear
list of all the objects which they contain, in the form of a human-readable
name and an associated inode number. When a directory stream is de-
crypted, access to this entire list will be given. This will allow the user to
navigate the directory structure. Directory streams are not very large in
comparison to other forms of user data; so as a result, a relatively small
amount of time is required to decrypt the directory stream.
Files will form the bulk of the user data which will be stored within the
file system. A relatively large amount of time will be required to decrypt the
file data stream. This can present a large performance impact as the size
of the file grows large. As discussed in the previous chapter, the user data
is stored in discretely sized blocks equal to the size of the file system block
size. User data which consumes a large amount of space is encrypted and
decrypted in these discrete blocks and therefore portions of the file can be
accessed as needed.
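The cost of such a partial read can be sketched with simple block arithmetic; the block size and helper names below are our own assumptions.

```c
#include <stddef.h>

#define FS_BLOCK 512   /* hypothetical file system block size */

/* Which block holds a given byte offset, and where inside it. */
size_t block_index(size_t offset)  { return offset / FS_BLOCK; }
size_t block_offset(size_t offset) { return offset % FS_BLOCK; }

/* Number of blocks that must be fetched and decrypted to serve a read
 * of len bytes at offset: proportional to the request size, not to
 * the total file size, because each block is encrypted independently. */
size_t blocks_touched(size_t offset, size_t len)
{
    if (len == 0)
        return 0;
    return block_index(offset + len - 1) - block_index(offset) + 1;
}
```

For example, a 4-byte read that straddles a block boundary touches two blocks, no matter how large the file is.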
The worst case scenario will be when there is access to multiple files and
directories in a single operation. This will require the file system to decrypt
data from multiple inode entries, which could introduce a large performance
impact as the number of files contained in the file system grows larger.
8.8 Summary
In this chapter we covered the following sections:
Security Overview - in which we gave an overview of the security scheme
as used by SSFS. This included a discussion on the following concepts:
Security through Information Hiding - in this section we discuss
information security which is provided by information hiding.
- Security through Cryptography - in this section we discuss information security which is provided by cryptography.
Data Cryptography - where we introduce cryptography as used by
SSFS to encrypt data.
- Choice of Algorithm - where we discuss the choice of crypto-
graphic algorithm to be used in SSFS, along with a discussion on
the categories used to choose such an algorithm.
Cryptographic Layer - where we discuss the cryptographic layer as an
extension to the intermediate-level operations.
- Transparent Encryption - where we discuss the ability for the
cryptographic layer to provide transparent encryption and decryp-
tion of hidden data.
File System Data Encryption Scheme - in this section we discussed
the scheme used by SSFS to classify the type of data which is to be
encrypted. This included a discussion on the following concepts:
- Data Classes - which are used to manage how certain forms of
hidden data within SSFS are encrypted, namely system data and
user data.
- Interactions - where we outline the interactions which have to occur between the host and hidden file systems, and which demonstrate the need for specific data classes.
Encryption Hierarchy - in this section we describe the hierarchy which is formed by making access to certain data types dependent on other data types. We discuss the following concepts in this section:
- Initialisation Vectors (IV) - which are used to form the encryption hierarchy by using unique IVs for each type of data.
- Operational Scenario - where we present an operational scenario
in order to describe the overall operation of the security scheme.
Performance Concerns - in this section we discuss a number of per-
formance concerns related to the encryption scheme, and the impact
which it will have on access to hidden files and directories.
8.9 Conclusion
The security of the data within the steganographic file system plays an im-
portant role in ensuring that a user can store data, confident that it will not
be compromised. This is achieved through the creation of a security scheme
which will offer an adequate level of data security while not compromising on
the overall performance. This balance is achieved through the use of modern
and efficient cryptographic algorithms, and specific methods of encrypting
data.
In section 8.2 we gave a security overview, in order to explain how data
security is achieved within the steganographic file system. We then go on in
section 8.3 to discuss data cryptography with particular focus on the require-
ment of a cryptographic algorithm. In section 8.4 we introduce and discuss
the cryptographic layer as an operational layer which is essential to perform-
ing transparent data encryption. We go on in sections 8.5 and 8.6 to discuss
the data encryption scheme and encryption hierarchy respectively, both of
which are used to provide a secure data encryption mechanism. Finally in
section 8.7 we address a number of performance concerns regarding the use
of data encryption.
Information security through cryptography allows us to confidently hide
information within the steganographic file system. In the following chapter
we will discuss dynamic reallocation to avoid "collisions" between the hidden
and non-hidden data, which forms the basis for the non-duplication ability
of the steganographic file system.
Chapter 9
Dynamic Reallocation
9.1 Introduction
In order to avoid duplication of hidden data and thus avoid data collisions,
a dynamic reallocation mechanism is introduced, which will give the ability
for hidden data to be automatically reallocated as needed by the host file
system. The hidden data reallocation mechanism will build upon the host
file system's existing write operation, which will check for, and reallocate
hidden data as needed from physical locations on a device.
The purpose of this chapter is to define the dynamic reallocation mech-
anism used by SSFS in order to avoid collisions between hidden and non-
hidden data.
In order to explain the dynamic reallocation process, an overview is presented in section 9.2, in which we introduce the operational processes. In section 9.3 we discuss the details of the dynamic reallocation process: access to various hidden file system structures in section 9.3.1, and the redirection of write operations in section 9.3.2. These two sections allow the host file system to execute the dynamic reallocation functions.
In section 9.3.3 we discuss the redirection process, followed in section 9.3.4 by the reallocation categories, which describe how the reallocation process should be handled. Finally, in section 9.3.5 we discuss the sacrificial and preserving operational modes, which are used to control the reallocation process when there are no longer any available unallocated file system blocks.
9.2 Overview
The dynamic reallocation mechanism provides the core functionality to avoid
unnecessary data duplication on the hidden file system. In simple terms, the dynamic reallocation mechanism will move hidden data away from physical locations requested by the host file system. The host file system will then have
the ability to operate unhindered, and unaware of the underlying hidden file
system.
The dynamic reallocation mechanisms will interact with the read and
write requests of the host file system, by redirecting these requests to a set
of reallocation procedures, in order to ensure that hidden data is reallocated
in a secure and reliable fashion.
As discussed in previous chapters, the hidden file system utilises a number of different on-disk structures in order to manage the storage of hidden data. The structures which are relevant to the dynamic reallocation process are listed below:
1. The Superblock - records the location of the important on-disk structures for the hidden file system.
2. The TMap Array - records the physical location of the Translation Map.
3. The Translation Map - records the mapping between the hidden file system logical blocks and physical locations.
The design of the hidden file system allows the physical blocks where
hidden data exists to be modified without a need to make significant changes
to the hidden file system control structures. When a block of hidden data
must be reallocated, only a modification to the Translation Map need occur.
The logical layout of the data within the hidden file system ensures that
the logical blocks allocated to hidden data need not change. The design of
the encryption scheme as discussed in the previous chapter will allow hidden
file system blocks to be reallocated without the need to re-encrypt the data.
The overall design of the hidden file system supports the dynamic realloca-
tion process, while minimising the amount of data modification needed for a
reallocation operation.
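This minimal-update property can be sketched in C as follows; the map layout is our own simplification of the Translation Map described in the earlier chapters.

```c
#include <stddef.h>

#define NO_MAPPING (-1)

/* Simplified in-memory Translation Map: entry i holds the physical
 * block currently backing hidden logical block i. */
struct translation_map {
    int physical[64];
    size_t length;
};

/* Reallocating a block of hidden data changes only this one mapping.
 * The logical block number stays fixed, so inodes and directory
 * entries need no update, and since each block is independently
 * encrypted, no re-encryption is required either. */
void remap(struct translation_map *tm, size_t logical, int new_physical)
{
    tm->physical[logical] = new_physical;
}

/* Does a physical block currently hold hidden data? */
int holds_hidden_data(const struct translation_map *tm, int physical)
{
    for (size_t i = 0; i < tm->length; i++)
        if (tm->physical[i] == physical)
            return 1;
    return 0;
}
```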
In the following section we will discuss other possible collision avoidance
techniques. We will then contrast that to the dynamic reallocation mecha-
nism chosen for our solution.
9.2.1 Other Possible Collision Avoidance Techniques
There are other possible solutions which can be used to avoid collisions be-
tween the hidden and non-hidden data. We will briefly discuss these pos-
sibilities and then discuss why the dynamic reallocation of the hidden data
was chosen as the most appropriate solution.
1. Reallocation of the non-hidden data - this would involve reallocating
the host file system data in order to allow the hidden data to be written
unhindered to the storage device. This is unacceptable as this would re-
quire extensive modifications to the host file system's implementation.
This is contrary to the design goals outlined in section 5.4 on page 90,
as this would hinder the backward compatibility with the original host
file system driver.
2. Utilise a shared storage map - use a single storage map for both the
hidden and non-hidden data. This again would require extensive mod-
ifications to the host file system implementation, and could be used to
easily identify physical blocks which contain hidden data.
Both of the above solutions were rejected because they require extensive modifications to the host file system implementation, which would then no longer be backward compatible with the original file system implementation. Dynamic reallocation of the hidden data ensures that the structure of the host file system remains intact; hidden data can then be stored and secured in a way which is detached from the host file system implementation.
In the following section we will present an operational scenario which will
demonstrate the principles used in the dynamic reallocation process.
9.2.2 Operational Scenario
In this section we will discuss an operational scenario in order to demon-
strate the basic principle of dynamic reallocation. In order for the dynamic
reallocation of hidden data to be achieved, a degree of interaction between
the hidden file system and the host file system must be introduced. These
interactions must be kept at a minimum in order to minimise the exposure
of hidden data.
Imagine hidden data which is stored within the hidden file system, at
some particular physical location, called block H. During some point in
time, the host file system implementation will request that its data be stored
in block H, which as stated before contains hidden data. The host file system is unaware that hidden data exists at that location, and as such considers
it a valid block for allocation. To avoid the hidden data being overwritten,
and the hidden file system object which it belongs to becoming unusable,
the dynamic reallocation mechanism is invoked to move the hidden data out
of the way. In order to allow the hidden file system to locate this block,
the control structures are updated to reflect the new physical location. This
operation allows both the host and hidden file system data to remain intact
and usable, without the need to duplicate the hidden file system data.
Recall that in order to locate the physical position of hidden data, the Translation Map is used to provide a mapping between the logical position within the hidden file system and the physical position on the device. Also
recall that the physical location of the hidden data is not recorded by the host
file system; as far as the host file system is concerned, all physical blocks are "available" for allocation. This forms the crux of the dynamic reallocation mechanism: when the host file system requests that a physical block be written to, any hidden data which might be stored in that physical block must be "reallocated" in order to preserve it. It is important to note
that hidden data must remain intact after a reallocation has occurred; this is
facilitated through the logical positioning of hidden data within the hidden
file system.
A process overview is presented in the following section, which will define
the basic operational process which is followed by the dynamic reallocation
mechanism.
9.2.3 Process Overview
As introduced above, in order for the dynamic reallocation process to take place, some interaction between the hidden data and the host file system must be allowed. The dynamic reallocation process can be summarised in the following steps:
1. Intercept and redirect the host file system's write request.
2. Check to see if the physical block which is requested above contains
hidden data. If it does ...
(a) Determine a new unallocated physical location where the hidden
data can be stored.
(b) Move the hidden data to the new location.
(c) Update the hidden file system's control structures.
3. Allow the host file system to write to the requested location.
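The steps above can be sketched as a toy `check_reallocation` routine in C. The in-memory device, map layout, and naive free-block scan are all our own simplifications; only the control flow mirrors the process described.

```c
#include <stdint.h>
#include <string.h>

#define BLK        512
#define NBLOCKS    16
#define NO_MAPPING (-1)

static uint8_t dev[NBLOCKS][BLK];  /* toy physical device           */
static int tmap[NBLOCKS];          /* hidden logical -> physical    */
static int host_used[NBLOCKS];     /* blocks the host has allocated */

/* Step 2(a): find an unallocated physical block.  The caller marks
 * the requested block as host-allocated first, so it is never chosen. */
static int find_free_block(void)
{
    for (int p = 0; p < NBLOCKS; p++) {
        int taken = host_used[p];
        for (int l = 0; l < NBLOCKS && !taken; l++)
            taken = (tmap[l] == p);
        if (!taken)
            return p;
    }
    return NO_MAPPING;  /* device full: handled by the modes of 9.3.5 */
}

/* Steps 2-2(c): invoked from the redirected write, before the host
 * writes to physical block pos. */
void check_reallocation(int pos)
{
    for (int l = 0; l < NBLOCKS; l++) {
        if (tmap[l] == pos) {                /* block holds hidden data */
            int np = find_free_block();      /* (a) new location        */
            if (np == NO_MAPPING)
                return;
            memcpy(dev[np], dev[pos], BLK);  /* (b) move the data       */
            tmap[l] = np;                    /* (c) update the map      */
        }
    }
}
```

After this routine returns, step 3 proceeds: the host's original write to `pos` goes ahead unhindered.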
As can be seen from the above process, the host file system will require a
level of interaction with the hidden data. In order to determine the physical
locations where hidden data exists, the host file system will require access to
the Translation Map. In order to access the Translation Map, the host file
system will require access to the TMap Array. In turn, in order to access the
TMap Array, access to the hidden file system's superblock will be required.
No other structures are required for dynamic reallocation; remember that
the host file system is not concerned with the structure of the underlying
hidden data, only the raw data itself. The design of the hidden file system
component allows for data to be referenced independently of the physical
location on the device.
Remember that all data within the hidden file system is referenced in
terms of the logical position of the data, and is mapped to a particular
physical location through the use of the Translation Map. This allows the
logical references to the hidden files and directories to remain consistent in
the hidden Inode Table, and the hidden Directory Entries. Any block of
hidden data can exist in any physical block in the device, as long as the
mapping between the hidden logical blocks, and the physical device blocks
is valid.
The individual blocks of hidden data are all encrypted independently of each other; this allows hidden data blocks to be moved freely within the physical device, without the need for any encryption or decryption to
take place. The security implications with regard to the encrypted hidden
data will be discussed in a later section. In the following section we will
discuss each of the operations in this process in detail.
9.3 Operational Details
To describe the entire dynamic reallocation process we will now discuss a
number of concepts which will allow hidden data to be safely and securely
reallocated. In the following section we will discuss host file system access to
the hidden file system control structures to facilitate the reallocation process.
9.3.1 Access to Hidden File System Structures
As discussed above, the dynamic reallocation methods within the host file
system will require access to a number of hidden file system structures. The
particular structures in question are the Superblock, the TMap Array, and
the Translation Map. The core function of the dynamic reallocation methods
is to modify the Translation Map when a hidden block is moved to a different
location. As mentioned above in order to locate the Translation Map on
the physical device (which itself is reallocatable), the TMap Array must be
accessed. Likewise in order to determine the length of the TMap Array,
particular fields of the hidden file system's superblock must be accessed.
The fields in question are the tmap_start and tmap_length fields, as seen in Listing 6.1 on page 107. We will now discuss the redirection of the
write operations in order to allow the dynamic reallocation mechanism to
operate.
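The structure access described in this section amounts to a two-hop lookup, sketched below in C; the field names tmap_start and tmap_length are taken from the text, while everything else is our own illustration.

```c
#include <stddef.h>

/* Minimal view of the hidden superblock fields used here (the real
 * layout appears in Listing 6.1). */
struct hidden_superblock {
    int tmap_start;    /* physical block where the TMap Array begins */
    int tmap_length;   /* number of entries in the TMap Array        */
};

/* The TMap Array (read from tmap_start on the device) lists the
 * physical blocks that currently hold the reallocatable Translation
 * Map.  Given a piece index, return that piece's physical block. */
int translation_map_block(const struct hidden_superblock *sb,
                          const int tmap_array[], int piece)
{
    if (piece < 0 || piece >= sb->tmap_length)
        return -1;   /* out of range */
    return tmap_array[piece];
}
```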
9.3.2 Write Redirection
When the host file system requests a write to a particular physical loca-
tion on the device, it will invoke a function which will actually
perform the write operation. For the purposes of the following discussion,
imagine that the write function as invoked by the host file system takes the
form of a function with the following prototype:
int write(int position, void* buffer, int length);
This write method will accept the physical position where the host data will be written, a memory buffer containing the data, and the length
(number of bytes) of the data in the buffer. This method will invoke the
kernel's write methods and write the data permanently to the specified
physical location on the device.
There is enough information contained within this function call to determine if the requested physical location contains any hidden data. This
process will be discussed in the following section.
The redirection of the write involves redirecting the execution of the write
operation, performing dynamic reallocation of hidden data, if required, and
then resuming the normal execution of the write operation, as seen in fig-
ure 9.1. It is important to note that only the hidden data is reallocated.
When the host file system requests that data is written to the device, it will
have priority over the physical location. This is specifically done in order to
minimise the processing which must occur when hidden data is to be reallo-
cated, and to keep modification of the host file system implementation to a
minimum. In the following section we will discuss the actual redirection of
the write operation.
[Figure: the host's write operation is diverted into the reallocation operation and then resumes normal execution.]

Figure 9.1: Write operation execution redirection
Write Operation Execution Redirection
The host file system's write operation will perform a number of operations
which will eventually result in the data being written to the physical device.
In order to ensure that hidden data is reallocated away from the requested
physical location, another operation must be added to the overall function.
This extra operational step will redirect the execution path of the write
operation towards the reallocation functions. In order to achieve this, the original write function is modified to execute the reallocation methods, as can be seen in figure 9.2.
As can be seen from figure 9.2, only a very minor modification (see line 6
of the Modified Function) is made to this simplistic write function in order
to allow the host file system to perform the reallocation. At this point of the
execution, the dynamic reallocation function is considered to be a "black-
box" function - in that we are not concerned with the operation of this
function. Once the dynamic reallocation process has complete, the write
operation will continue with an unhindered write to the physical device. The
purpose of having the dynamic reallocation functionally separate from the
host file system's write will be discussed in the following section
176 DYNAMIC REALLOCATION
Original Function                       Modified Function

 1  int write(dev, pos, buf, len) {      1  int write(device, pos, buf, len) {
 2    int ret = 0;                       2    int ret = 0;
 3    seek(dev, pos);                    3    seek(device, pos);
 4                                       4
 5                                       5    // reallocation if needed
 6                                       6    check_reallocation(device, pos);
 7                                       7
 8    // write to the device             8    // write to the device
 9    ret = write(dev, buf, len);        9    ret = write(device, buf, len);
10                                      10
11    return ret;                       11    return ret;
12  }                                   12  }

Figure 9.2: Function modified with reallocation methods
Black-box Reallocation
As introduced above, the dynamic reallocation methods are considered to be a black-box extension to the host file system implementation. This allows for a distinct separation between the host file system implementation and the dynamic reallocation methods. The separation is shown graphically in figure 9.3. The host file system therefore does not have direct access to the hidden file system control structures (which, as discussed above, are used to facilitate the reallocation process). This limits the exposure of the hidden data, and increases the overall security during the dynamic reallocation process.
The hidden file system control structures which are exposed during this process are not complete enough to extract specific hidden data from the hidden file system. The information which can be obtained from these structures can only expose the presence of hidden data, not its content. Recall that the hidden data, the inode table, and the directory entries are encrypted and are not directly exposed during this process. The data encryption, combined with the separation of the host file system implementation and the dynamic reallocation mechanisms, allows for secure reallocation of hidden data without the risk that the hidden data will be compromised.

Write redirection allows the dynamic reallocation mechanisms to be executed. In the following section we will discuss the hidden data reallocation process, which allows hidden data to be reallocated as needed when the host file system requests a write to a physical block.
[Figure omitted: the write operation (known implementation) invokes the reallocation operation as a black box (unknown implementation).]

Figure 9.3: Reallocation black-box functions
9.3.3 Hidden Data Reallocation
Hidden data reallocation forms the core of the dynamic reallocation mecha-
nisms. The dynamic reallocation process is used to determine if a physical
block contains hidden data, and if needed, move the hidden data it con-
tains to another unallocated physical block. In order to determine if a block
contains hidden data, the following process is followed:
1. Search the Translation Map to determine if a logical block maps to the particular physical block.

2. If such a mapping exists:

   (a) Determine the Reallocation Category, as this will affect how the reallocation must be handled.

   (b) Locate a free physical block which is unallocated in both the host and hidden file systems.

   (c) Reallocate the physical block to the new location based on the Reallocation Category; generally the following will occur:

       i. Move all the data from the specified physical block to the new location.

       ii. Update the mapping in the Translation Map to reflect the new physical location for the physical block.
(d) Update the hidden file system's control structure on the physical
device.
3. Continue execution of the write function.
The dynamic reallocation mechanism is deliberately designed to reallocate hidden data. This imparts a level of security to the hidden data in that the hidden data is not static; the physical position of the hidden data can change over time. This obscures the hidden data, making it difficult to determine its exact position, and thus increases the overall data security of SSFS. The reallocation of hidden data also ensures that when the host file system requests a physical block for non-hidden data, it depends only on the free blocks available to the host file system, and not on the position of the hidden data. This reinforces the separation of functionality between the host and hidden file systems. It is the responsibility of the hidden file system to manage hidden data, and the responsibility of the host file system to manage non-hidden data. This reduces the exposure of the hidden data and thus increases overall security.
The requirements for locating a free physical block will be discussed in the following section. This will be followed by a discussion of the Reallocation Categories as a method of describing the type of data which is contained in a physical block.
Searching for an Unallocated Block
In order for hidden data to be safely reallocated within the file system, an unallocated block must be located. A physical block must satisfy the following two constraints:

1. Be marked as unallocated in the host file system.

2. Not be mapped to any logical block in the Translation Map.

If these two constraints are not met, then there is a risk of overwriting data in one of the two file systems. These two constraints are derived from the way in which hidden data is stored within the file system. Recall that hidden data is stored in the unallocated blocks of the host file system. The physical blocks utilised by hidden data are not marked as allocated in the storage map of the host file system; they are only marked as mapped to a hidden logical block in the Translation Map. Therefore both of the above constraints must be satisfied by a physical block in order for it to be considered for reallocation.
The method of determining whether these two constraints are met depends on the construction of the host file system's storage map; however, the ability to search for a free block is provided by the host file system's implementation. It is then a simple case of comparing the result against the mappings in the Translation Map in order to determine if the block is suitable for reallocation.
In the following section we will discuss the Reallocation Categories as a mechanism for determining the appropriate course of action when reallocating a particular physical block.
9.3.4 Reallocation Categories
The reallocation category of a particular physical block refers to the type of data which exists in that block. The type of data can be determined by which logical block is mapped to the particular physical location. Consider figure 9.4, which shows the logical layout of the hidden file system and which reallocation category is used for different logical blocks. The figure represents the logical layout of the hidden file system where m blocks in total are allocated to the file system, and the Translation Map is n blocks long; the Translation Map therefore ends at block offset n + 1 and the hidden data continues from block offset n + 2.
Hidden Logical Layout (m blocks allocated in total; Translation Map of n blocks):

  block 0:            Superblock       ->  Superblock Category
  blocks 1 to n+1:    Translation Map  ->  Translation Map Category
  blocks n+2 to m:    Normal Data      ->  Normal Category

Figure 9.4: Reallocation categories
The reallocation of hidden data will fall into the following three categories:
- The Superblock category.

- The Translation Map category.

- The Normal Data category.
The reallocation categories are not to be confused with the hidden file system control structures of the same name. The name of the reallocation category describes the type of data which exists in a particular physical block. Depending on the reallocation category, the reallocation of the physical blocks must be handled in slightly different ways.
Superblock Reallocation Category
The Superblock Reallocation Category is used when the host file system requests a write to physical block 0 (the superblock). Remember that physical block 0 contains the host file system's superblock, the hidden file system's superblock, and the TMap Array. When there is a write to physical block 0, no reallocation takes place if and only if the data to be written to the physical block is equal to the byte size of the host file system's superblock. This category is used whenever the host file system wants to modify its superblock; as long as the requested write does not overwrite the hidden file system superblock or TMap Array, there is no need to perform a reallocation.
Translation Map Reallocation Category
The Translation Map Reallocation Category is used when the host file system requests a write to a physical location which is mapped to the Translation Map. For instance, if the Translation Map is n blocks in length, then when the host file system requests a write to a physical location which is mapped to a logical block in the range 1 → (n + 1), the Translation Map Reallocation Category will be used. Remember that the Translation Map itself is reallocatable, and is located using the TMap Array. When a reallocation of a block allocated to the Translation Map is required, the following must occur:

1. Move the data from the specified physical block to a new location.

2. Modify the logical to physical block mapping for this physical block in the Translation Map (even though the block is itself allocated to the Translation Map).

3. Update the physical location of the particular block allocated to the Translation Map in the TMap Array.
The above process will ensure that the Translation Map can always be
located regardless of the physical blocks which it occupies.
Normal Data Reallocation Category
The Normal Data Reallocation Category is used when the requested physical location is mapped to a logical block which is occupied by the hidden file system's Inode Table, Directory Entries, or File Data. For instance, if the hidden file system consists of m logical blocks, and the Translation Map is n blocks in length, then any physical block which maps to a logical block in the range (n + 2) → m falls into this category. This category will be used the most, as the bulk of the data in the hidden file system falls into this range. In order to reallocate data within this category, the following must occur:

1. Move the data from the specified physical block to the new location.

2. Modify the logical to physical mapping for this physical block in the Translation Map.
The above process ensures that normal hidden data can be located anywhere on the physical device through the Translation Map. In the following section we will discuss the sacrificial and preserving modes as mechanisms for determining how the dynamic reallocation mechanism behaves when there are no longer any free physical blocks available for reallocation.
9.3.5 Sacrificial versus Preserving
A problem that arises when reallocating hidden data to the unallocated blocks of the host file system is that eventually the number of available unallocated blocks will become exhausted. This arises from the fact that as hidden data is moved, the physical block becomes allocated by the host file system in order to store its own data. This is an inevitable side-effect of embedding the hidden data within the host file system. In order to prevent the hidden data from conflicting with the non-hidden data, two allocation modes are presented below, namely sacrificial and preserving. These two modes define how the reallocation mechanism should behave if the unallocated blocks within the file system become exhausted. Which of these modes comes into play depends on the preference of the hidden file system, as specified in the flags field of the hidden file system superblock (see listing 8, on page 107). In the following sections we will discuss the sacrificial and preserving modes.
Sacrificial Mode
The policy for allocating blocks in this mode is to give priority to the host file system's data. In this mode hidden data will be overwritten in favour of the non-hidden data, resulting in the hidden file system being destroyed. This can be useful if the hidden data is only to be stored for a limited period of time, or if data is to be hidden for "one-time" use only. The hidden file system will destroy itself as the number of allocated blocks in the host file system increases.
This mode will usually not be the ideal mode of operation, as hidden data will usually be required to be stored for a longer period of time. In the following section we present the Preserving Mode as a method of allocating blocks to the host file system so as to ensure that hidden data is never lost.
Preserving Mode
Preserving Mode gives priority to data stored in the hidden file system. When the number of unallocated blocks available to the host file system becomes equal to the number of blocks allocated to the hidden file system, this allocation mode is enforced. When this point is reached, no more writes to the physical device by the host file system are allowed. This ensures that the hidden data remains intact, which will generally be the preferred method of operation.
In order to minimise the possibility that either of the two allocation modes discussed above will come into play, the maximum number of blocks which the hidden file system can consume must be kept to a minimum. These policies will only ever come into play as the host file system becomes very full (which is generally not the case on most computer systems). However, by keeping the number of blocks available to the hidden file system down, the likelihood of these allocation modes coming into play is kept to a minimum.
9.4 Summary
In this chapter we covered the following sections:

- Overview - where we introduce the dynamic reallocation process. We cover the following concepts:

  - Operational Scenario - in this section we provide an operational scenario in order to demonstrate the dynamic reallocation process.

  - Process Overview - in this section we give an overview of the dynamic reallocation process, explaining the generic process used to perform the dynamic reallocation of hidden data.

- Operational Details - where we discuss the operational details of the dynamic reallocation process. We discuss the following concepts:

  - Access to Hidden File System Structures - in this section we discuss the need for the host file system to access the hidden file system control structures. This allows the host file system to determine if a particular physical block contains hidden data.

  - Write Redirection - where we discuss how the dynamic reallocation process is started. This is achieved by modifying the host file system's write operation to invoke the reallocation methods.

  - Hidden Data Reallocation - in this section we fully discuss the dynamic reallocation process.

  - Reallocation Categories - in this section we discuss how the dynamic reallocation process handles certain types of hidden data. This allows hidden data to be reallocated in the correct way depending on which category it falls into.

  - Sacrificial versus Preserving - where we discuss how the dynamic reallocation mechanism will operate if there are no longer any free physical blocks available for reallocation.
9.5 Conclusion
In this chapter we discussed the concept of dynamic reallocation as a mecha-
nism for the secure reallocation of hidden data by the host file system. This
provides a mechanism for the host file system to avoid collisions between
hidden and non-hidden data.
We introduce the concept of dynamic reallocation with an overview of the
reallocation process in section 9.2. We then go on in section 9.3 to discuss
the operational details. We discuss all aspects of the reallocation process, in-
cluding write operation redirection in section 9.3.2, hidden data reallocation
process in section 9.3.3, and the reallocation categories in section 9.3.3.
In the following chapter we will address the performance impact which
the dynamic reallocation mechanism will have on the operation of the host
file system. By analysing this impact we can make judgements concerning
the feasibility of a steganographic file system.
Chapter 10
Steganographic File System
Performance
10.1 Introduction
The performance of the steganographic file system will impact on the overall
feasibility of such a system. If the overall performance impact is too great,
then the existence of the hidden data can be betrayed, and the security of
the data will be jeopardised. In order to analyse the performance impact of
the steganographic file system, we will need to consider a number of different
factors, for both the hidden file system and the host file system.
In this chapter we will consider the factors impacting performance for the
hidden file system, in section 10.2, and the host file system, in section 10.3.
The results presented in this chapter were obtained through experimenta-
tion with our implementation of SSFS. This implementation of SSFS allowed
us to draw a number of conclusions concerning the performance and feasibil-
ity of the overall system.
The major concern for hidden file system performance is the impact which file fragmentation will have on the storage and retrieval of hidden data; these concerns can be negated by utilising an appropriate physical device, as will be discussed in the following sections. Host file system performance is most significantly impacted by the dynamic reallocation methods which are used to avoid data collisions and avoid data duplication. An analysis of this impact will be discussed in the following sections.
10.2 Hidden File System Performance
In this section we will discuss the performance considerations for the hidden file system. The only real consideration is that of file fragmentation. As a side-effect of the dynamic reallocation process, file fragmentation will inevitably impact hidden file system performance; this is, however, only a consideration on traditional hard disk drives. File fragmentation is the most significant external factor impacting hidden file system performance and is discussed below.
10.2.1 Hidden Data Fragmentation
One of the factors which will have the greatest impact on the performance of the steganographic file system is file fragmentation. Ideally, file data should be allocated contiguously, in adjacent physical blocks, in order to minimise the movement of the read/write heads. The greater the degree of file fragmentation, the more movements the read/write heads have to make in order to access the hidden data. This so-called seek time¹ can greatly impact on the performance of reading and writing hidden data.

Hidden file system fragmentation will increase more rapidly than on a normal file system implementation, as the dynamic reallocation mechanism is required to move hidden data to alternate physical locations as the host file system requires the physical blocks. This increased possibility of file fragmentation can impact on the access time of large hidden files, as the read/write heads will be required to make more movements across the magnetic platter to read the data into primary memory.
File fragmentation will only impact a hidden file system which is stored on a hard disk drive or removable diskette. It is not a consideration for storage devices which use "flash memory", such as a USB Flash Drive or a Solid State Disk; access to the physical blocks on these devices is controlled electrically, and therefore file fragmentation, and the associated seek time, is not a factor. The time required to access data stored on a physical block on a flash memory module is constant, and is usually a few microseconds.

¹Seek time - the amount of time taken for the read/write heads to move to the correct position on the platter to read or write data.

This property of flash memory suits SSFS especially well. By using a USB Flash Drive as the underlying physical medium for the hidden file system, the performance impact brought about by file fragmentation is eliminated.
In the following section we will discuss the impact of the dynamic reallo-
cation methods on the host file system.
10.3 Host File System Performance
The performance of the host file system is most significantly impacted by the dynamic reallocation methods which were introduced to ensure that there are no collisions between the hidden and non-hidden data. In order for SSFS to be feasible, this impact on the host file system must be kept to a minimum.

In order to quantify the impact of the dynamic reallocation methods on the host file system, we will discuss both an optimised and an unoptimised implementation in the following sections. This will show how the efficient reallocation of hidden data can be achieved, ensuring the feasibility of such a system.
10.3.1 Dynamic Reallocation Performance
The dynamic reallocation mechanisms introduce a performance impact on the host file system implementation. Recall that for every write operation to the physical device, the physical block which is to be written must be checked to see if it contains hidden data which must be reallocated. Searching the Translation Map constitutes the largest performance penalty; however, this can be minimised by implementing the Translation Map as a more efficient data structure, such as a Red-Black tree.
Figure 10.1 shows the effect which dynamic reallocation has on the host file system. These results were generated through interaction with SSFS using a virtual machine running Ubuntu 7.10 with a 2.6GHz CPU. An empty 100MiB disk image was used in each case. A number of files of random size were created on the host file system, and the time taken to create these files was recorded. In each case the graph shows the amount of time required to allocate a number of files. The line indicated in blue shows the original performance of the host file system, allocating 2000 files of random size in approximately 0.5 seconds.

The line indicated in black shows the amount of time taken when an unoptimised version of the dynamic reallocation methods is used, where the time taken to allocate 2000 files was approximately 4 seconds. This results from a linear search of the Translation Map, with a worst-case run-time of O(n). Recall that the Translation Map stores the logical to physical mapping as a set of paired values: an entry for the hidden logical address and a corresponding entry for the physical address. This linear increase in time is clearly unacceptable, as the performance impact of the dynamic reallocation mechanism would betray the presence of the hidden file system.
In order to improve on this result, an optimised version of the dynamic reallocation methods was introduced. This version utilises a Red-Black tree, indexed by the physical location on the device. This allows the Translation Map to be searched with a worst-case run-time of O(log n). This optimisation results in a dramatic improvement of the overall file system performance, taking only 0.8 seconds to allocate 2000 files. The effect of this optimisation will be discussed fully in the following section.
[Figure omitted: plot of seconds taken (0.0 to 4.0) against files created (0 to 2000) for three cases: with dynamic reallocation (unoptimised), with dynamic reallocation (optimised), and without dynamic reallocation.]

Figure 10.1: Optimised versus unoptimised dynamic reallocation
These results indicate that the performance impact of the dynamic reallocation mechanisms on the host file system is small enough to warrant the use of a hidden file system. As in the unoptimised case, the presence of the hidden file system, and thus the hidden data, could be betrayed by the performance impact. However, by optimising the dynamic reallocation methods, the hidden data can be stored without impacting on the overall host file system performance.
In the following section we will discuss the code profiles for the host file system. These allow us to observe how the hidden file system implementation impacts on the operation of the host file system. Again, these profiles were obtained through experimentation with our implementation of SSFS.
10.3.2 Dynamic Reallocation Code Profiles
In order to analyse the impact of the dynamic reallocation methods on the
host file system, consider the code profiles for the creation of 2000 randomly
sized files as seen in figure 10.2 on page 191, and figure 10.3 on page 192.
These profiles will be discussed in detail below.
These profiles were created using gprof (the GNU Profiler) [20]. This
allows us to examine each function call to determine which function is taking
the longest to execute. We can thus determine the impact of the dynamic
reallocation mechanism on the host file system implementation.
Unoptimised Code Profile
Figure 10.2 shows the amount of time taken by each function call when an unoptimised version of the dynamic reallocation mechanism is in use. The greatest impact is from the searchTmap function (indicated in red), which is responsible for searching the Translation Map for a particular logical-to-physical mapping. This profile reveals that this function consumes 93 percent of the total running time for this particular application. This is confirmed when examining the graph presented in figure 10.1, which shows the total running time required to allocate a particular number of files. This single function is responsible for the observed 800% increase in running time.

This performance impact is unacceptable, as normal operation of the host file system will be impeded, which is contrary to the stated design goals. However, the only other dynamic reallocation function which has a significant impact on the host file system is the hfs_write_pos function (indicated in blue). This function is responsible for writing both the non-hidden and reallocated hidden data to the physical disk. It does not have a significant impact on the overall host file system performance (determined by examining the total execution time for this function); therefore, if the searchTmap function can be optimised, the existence of the hidden file system becomes feasible. In the following section we will discuss the improved implementation of the dynamic reallocation mechanisms.
Optimised Code Profile
Figure 10.3 shows the amount of time taken by each function call once the searchTmap function has been optimised. The overall situation is dramatically improved: the searchTmap function no longer consumes almost all of the total running time. The optimisation consisted of modifying the searchTmap function to utilise a Red-Black tree to search for a particular logical to physical mapping, and as can be seen from figure 10.3, it no longer has such a large impact. The hfs_write_pos function (indicated in blue) is found just below the searchTmap function, and has an unchanged run time. This optimised code, as graphed in figure 10.1, introduces only a 0.3 second increase in running time, which is a significant improvement over the previous situation. This overhead is not detrimental to the overall performance of the host file system, which allows for the existence of the hidden file system.
By introducing this optimisation into the dynamic reallocation methods, there is an observable improvement in overall performance. By analysing the impact of the dynamic reallocation methods on the run-time of the host file system, the feasibility of embedding hidden data within the host file system is ensured. This allows hidden data to be securely stored within the host file system without impacting on the overall file system performance.
10.4 Summary
In this chapter we discussed the following concepts:

- Hidden File System Performance - in which we introduce the performance-impacting factors for the hidden file system. We discuss the following concepts:

  - Hidden Data Fragmentation - where we discuss file fragmentation with regard to its impact on the hidden file system.

- Host File System Performance - in this section we discuss how the storage of hidden data impacts on the operation of the host file system.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
97.65      1.66      1.66     63609    0.03     0.03  searchTmap
 0.59      1.67      0.01    447447    0.00     0.00  delete_from_list
 0.59      1.68      0.01    210373    0.00     0.00  cache_block_io
 0.59      1.69      0.01      2795    0.00     0.00  read_into_ents
 0.59      1.70      0.01      1311    0.01     0.18  flush_ents
 0.00      1.70      0.00   1121200    0.00     0.00  atomic_add
 0.00      1.70      0.00    534686    0.00     0.00  hash_lookup
 0.00      1.70      0.00    467114    0.00     0.00  block_lookup
 0.00      1.70      0.00    419043    0.00     0.00  add_to_head
 0.00      1.70      0.00    210373    0.00     0.00  system_time
 0.00      1.70      0.00    147197    0.00     0.00  compare_vnode
 0.00      1.70      0.00    141987    0.00     0.00  release_block
 0.00      1.70      0.00    141222    0.00     0.00  get_block
 0.00      1.70      0.00    114754    0.00     0.00  mark_blocks_dirty
 0.00      1.70      0.00     69528    0.00     0.00  hash_delete
 0.00      1.70      0.00     69528    0.00     0.00  hash_insert
 0.00      1.70      0.00     69528    0.00     0.00  new_hash_ent
 0.00      1.70      0.00     68384    0.00     0.00  cached_write
 0.00      1.70      0.00     68384    0.00     0.00  write_blocks
 0.00      1.70      0.00     67310    0.00     0.00  file_pos_to_disk_addr
 0.00      1.70      0.00     66384    0.00     0.00  acquire_sem
 0.00      1.70      0.00     66384    0.00     0.00  release_sem
 0.00      1.70      0.00     57945    0.00     0.00  hfs_write_pos
 0.00      1.70      0.00     49546    0.00     0.00  GetFreeRangeOfBits
 0.00      1.70      0.00     42543    0.00     0.00  update_inode
 0.00      1.70      0.00     37253    0.00     0.00  myfs_allocate_blocks
 0.00      1.70      0.00     37253    0.00     0.00  real_allocate_blocks
 0.00      1.70      0.00     29428    0.00     0.00  add_to_tail

Figure 10.2: Code profile of unoptimised dynamic reallocation
  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
30.00      0.06      0.06      1127   53.24   153.91  write_rand_data
20.00      0.10      0.04    208026    0.19     0.43  cache_block_io
15.00      0.13      0.03    530494    0.06     0.06  hash_lookup
 5.00      0.14      0.01    445902    0.02     0.02  delete_from_list
 5.00      0.15      0.01    416457    0.02     0.02  add_to_head
 5.00      0.16      0.01    113100    0.09     0.14  mark_blocks_dirty
 5.00      0.17      0.01     71116    0.14     0.14  hash_insert
 5.00      0.18      0.01     20459    0.49     5.60  myfs_write_data_stream
 5.00      0.19      0.01     18459    0.54     6.15  sys_write
 5.00      0.20      0.01      2940    3.40     3.40  get_ents
 0.00      0.20      0.00   1110314    0.00     0.00  atomic_add
 0.00      0.20      0.00    821605    0.00     0.00  intcmp
 0.00      0.20      0.00    461381    0.00     0.06  block_lookup
 0.00      0.20      0.00    208026    0.00     0.00  system_time
 0.00      0.20      0.00    149654    0.00     0.00  compare_vnode
 0.00      0.20      0.00    140255    0.00     0.10  release_block
 0.00      0.20      0.00    139494    0.00     0.43  get_block
 0.00      0.20      0.00     71116    0.00     0.00  hash_delete
 0.00      0.20      0.00     71116    0.00     0.00  new_hash_ent
 0.00      0.20      0.00     67769    0.00     0.43  cached_write
 0.00      0.20      0.00     67769    0.00     0.43  write_blocks
 0.00      0.20      0.00     66233    0.00     0.21  file_pos_to_disk_addr
 0.00      0.20      0.00     65769    0.00     0.00  acquire_sem
 0.00      0.20      0.00     65769    0.00     0.00  release_sem
 0.00      0.20      0.00     63359    0.00     0.00  searchTmap
 0.00      0.20      0.00     54949    0.00     0.00  hfs_write_pos
 0.00      0.20      0.00     48357    0.00     0.00  GetFreeRangeOfBits
 0.00      0.20      0.00     42045    0.00     0.68  update_inode

Figure 10.3: Code profile of optimised dynamic reallocation
  - Dynamic Reallocation Performance - in this section we analyse the performance of the dynamic reallocation methods by examining their impact on the host file system.

  - Dynamic Reallocation Code Profiles - where we analyse the code profiles for the host file system in order to examine the impact of the dynamic reallocation mechanism.

10.5 Conclusion
The performance of the steganographic file system plays an important role in the overall ability to securely store hidden data. The performance impact on the host file system due to dynamic reallocation must be kept to a minimum in order to keep the presence of the hidden data secure. This chapter analysed the impact of the dynamic reallocation methods on the host file system, and demonstrated the feasibility of such a system.

This chapter clearly demonstrates that hidden data can be embedded within a host file system without incurring a major performance impact. The design and implementation of the hidden file system as outlined in the preceding chapters allows the dynamic reallocation mechanism to be applied effectively, minimising the performance impact on the host file system.

The small impact of the dynamic reallocation methods on overall host file system operation allows for the operation of a steganographic file system. Hidden data can therefore be stored with confidence that it will not be discovered through an impact in performance.
In this chapter we discussed factors which impact the performance of
the hidden and host file systems. In section 10.2 we discussed the
performance concerns regarding the hidden file system, giving special attention
to hidden file fragmentation. This was followed by a discussion of the
factors impacting the host file system in section 10.3. Special concern was
given to the dynamic reallocation methods as a performance-impacting factor,
as they play a major role in the operation of the steganographic file
system.
Chapter 11
Conclusion
11.1 Introduction
In this chapter we will outline the content of this dissertation, which will
allow us to look forward and examine areas of future research.
Firstly, in section 11.2 we outline the contribution of each chapter of this
dissertation. We go on in section 11.3 to discuss the contribution of SSFS to
the field of information hiding. Finally we discuss areas of future research in
section 11.4.
11.2 Contribution
In this section we will outline the contribution of each of the chapters of the
dissertation.
Chapter 1 serves as an introduction to this dissertation. The chapter
briefly outlines the problem statement, introduces the secure steganographic
file system (SSFS), and presents an overview of what follows.
Chapter 2 serves to introduce the concepts relating to hard disk drives and
traditional file systems. The concepts which are introduced in this chapter
are used throughout chapters 5-9. The interaction between the disk and
the file system is the most important concept discussed in this chapter, as it
lays the framework for the following chapters.
Chapter 3 introduces cryptography as a mechanism for providing informa-
tion security. This chapter serves to provide a holistic overview of the many
different cryptographic techniques. However, the most relevant of the discussed
aspects are symmetric cryptosystems and block cipher modes.
These two concepts are referred to extensively in chapter 8 to describe the
security scheme for SSFS.
Chapter 4 discusses steganography and steganographic file systems. Most
notably, this chapter outlines a number of domain specific steganographic
terms which are used frequently throughout the following chapters. Another
important concept introduced in this chapter is the distinction between cryp-
tographic and steganographic file systems. This chapter also outlines a num-
ber of steganographic file system implementations which are referred to in
later chapters.
Chapter 5 is the first of the chapters which describe the implementation of
the secure steganographic file system (SSFS). This chapter outlines problems
with existing implementations, our aim for SSFS, and the basic construction
of such a file system. Important in this chapter is the definition of the
relationship between the hidden and host file systems, and the distinction
between the logical and physical views of the device. This chapter outlines
the framework and concepts for SSFS, which will be used extensively in later
chapters.
Chapter 6 discusses the control structures which are used by the hidden
file system component of SSFS in order to support the storage and retrieval
of data. The control structures which are discussed in this chapter are the
Superblock, the TMap Array, the Translation Map, the Inode Table, the
Directory Entries, and the File Streams. The initialisation of the above-
mentioned structures in relation to the host file system is then discussed. The
control structures discussed in this chapter are used extensively throughout
the remaining chapters to describe almost every component of SSFS.
Chapter 7 discusses the hidden file system operations as a mechanism for
interacting with the hidden data. This chapter defines a framework which
makes extensive reference to the structures discussed in chapter 6. This
chapter outlines the operational layers used by SSFS, which are used to
control the storage and retrieval of the hidden data. These operational layers
are of importance, as they are used in chapter 8 to allow for transparent
encryption of the hidden data.
Chapter 8 defines the security scheme for SSFS, and makes extensive use
of the cryptographic concepts discussed in chapter 3. This chapter makes
the important distinction between information security through information
hiding and information security through data encryption, both of which are
used in SSFS to provide a holistic security scheme.
Chapter 9 discusses the dynamic reallocation mechanism, which provides
the core of the "non-duplication" functionality of SSFS. This chapter makes
extensive reference to the structures described in chapter 6, and makes con-
tinuous use of concepts discussed in previous chapters. The dynamic realloca-
tion mechanism will allow hidden data to be reallocated to any physical block
as needed by the host file system. This mechanism has a significant impact
on the performance of the overall system, which is addressed in chapter 10.
Chapter 10 addresses the performance of SSFS, specifically the perfor-
mance impact of the dynamic reallocation mechanism on the host file sys-
tem. This chapter confirms that such a system is feasible, in that there is an
acceptable impact on the host file system.
In the following section we will discuss the wider contribution of SSFS as
a mechanism for providing information security.
11.3 Contribution of SSFS
Steganographic file systems provide a unique way to ensure information se-
curity. This is achieved through the use of both cryptography and steganog-
raphy in a single environment, which allows for the convenient storage and
retrieval of multiple items of hidden data.
The greatest problem which plagues all steganographic file systems is
how to handle the interaction between the hidden and non-hidden data. A
commonly used approach is to store multiple copies of the steganographic
content to avoid it being overwritten at some stage by non-hidden data.
There is, however, no guarantee that the hidden data will remain intact.
SSFS's use of the dynamic reallocation mechanism allows for hidden data
to be stored in a manner which will ensure that it will not be overwritten,
while still allowing the host file system to operate normally. Furthermore the
ability of SSFS to locate hidden data regardless of the underlying physical
layout of that data provides an interesting organisational mechanism which
could be extended to support many different types of file system.
Steganographic file systems, including SSFS, are subject to abuse through
the storage of illegal data. The methods and mechanisms of SSFS presented
throughout this dissertation can help forensic examiners to develop
techniques to detect hidden data.
In the following section we will discuss the future improvements which
could be made to SSFS.
11.4 Future Work
Multi-user environment SSFS at present only allows for a single user
to hide data using their master passphrase. The introduction of a multi-user
environment would allow multiple users to hide data within SSFS. "Multi-
user" need not imply multiple individuals utilising SSFS, only that multiple
passphrases exist which could be used to access different sets of hidden data.
This would allow for a greater level of security, as data could be hidden in
one of many different sets of files, thus making the detection of the hidden
data even more complex.
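The separation between sets of hidden data could be achieved by deriving an
independent master key from each passphrase, so that one passphrase reveals
nothing about data hidden under another. The sketch below uses a toy 64-bit
FNV-1a hash purely so the example is self-contained; a real implementation
would use a cryptographic hash or key-derivation function, as discussed in
chapter 3, and the function name here is illustrative, not part of SSFS.

```c
#include <stdint.h>

/* Toy key derivation: FNV-1a stands in for a cryptographic KDF here.
 * Each passphrase yields an independent master key, so each passphrase
 * can unlock a different set of hidden data. */
uint64_t derive_master_key(const char *passphrase)
{
    uint64_t h = 1469598103934665603ULL;        /* FNV-1a offset basis */
    for (; *passphrase; passphrase++) {
        h ^= (uint64_t)(unsigned char)*passphrase;
        h *= 1099511628211ULL;                  /* FNV-1a prime        */
    }
    return h;
}
```

Because each key is derived independently, possession of one passphrase gives
no information about the existence of data hidden under another, which is
precisely the property the multi-user extension requires.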
File permissions Once access to the hidden file system component has
been granted, SSFS does not control user access to files and directories; this
is a consequence of the single-user environment currently in use. The
implementation of user permissions would add an extra security layer when used
in conjunction with the multi-user environment mentioned above.
Optimisation of the Translation Map structure The Translation Map
structure is simple in nature: a linear list of logical-to-physical mappings.
Utilising an optimised structure, such as a B-tree, would improve the overall
performance of SSFS.
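The gain can be sketched as follows (the tmap_ent structure and function
names are illustrative, not taken from the SSFS sources): keeping the map
sorted on the logical block number allows bsearch to replace the linear
scan, reducing each lookup from O(n) to O(log n).

```c
#include <stdlib.h>

/* Illustrative Translation Map entry: one logical-to-physical mapping. */
typedef struct {
    unsigned long logical;   /* logical block number in the hidden file system */
    unsigned long physical;  /* physical block number on the device            */
} tmap_ent;

/* Linear scan over a simple list, as in the current design: O(n). */
tmap_ent *tmap_lookup_linear(tmap_ent *map, size_t n, unsigned long logical)
{
    for (size_t i = 0; i < n; i++)
        if (map[i].logical == logical)
            return &map[i];
    return NULL;
}

static int tmap_cmp(const void *a, const void *b)
{
    unsigned long x = ((const tmap_ent *)a)->logical;
    unsigned long y = ((const tmap_ent *)b)->logical;
    return (x > y) - (x < y);
}

/* Binary search over a map kept sorted on 'logical': O(log n). */
tmap_ent *tmap_lookup_sorted(tmap_ent *map, size_t n, unsigned long logical)
{
    tmap_ent key = { logical, 0 };
    return bsearch(&key, map, n, sizeof *map, tmap_cmp);
}
```

A B-tree achieves the same logarithmic lookup cost while also keeping
insertions and deletions cheap, which matters because dynamic reallocation
rewrites mappings continually.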
Forensic examination of the hidden file system The question of forensic
examination of a steganographic file system remains largely unexamined.
Detection and examination of steganographic file systems would allow
forensic examiners to reliably determine the existence of steganographic
content, and then apply other conventional methods to obtain the associated
passphrases. This area of research would prevent abuse of steganographic file
systems by enabling forensic examiners to detect steganographic content.
The use of a journal to ensure data consistency The consistency of
hidden and non-hidden data following the incorrect unmounting of the file
system is a specific area for further research and improvement. Structures
such as a file system journal would help guarantee that both hidden and
non-hidden data remain available regardless of the state of the storage media.
Revise Placement of TMap Array The TMap Array allows the
Translation Map to be located from any physical location on the storage
device. The current placement of this structure could adversely affect the
overall system. It would be advantageous to research methods to remove or
revise the TMap Array, allowing further refinement of the overall
system.
11.5 Conclusion
This chapter served to reflect back upon the content outlined within this
dissertation. We discussed the contribution each of the preceding chapters
made to the dissertation as a whole. We then went on to discuss the contribution
of SSFS to the field of information hiding. Finally, we discussed a number of
different areas of future research.
Appendices
Appendix A
SSFS Implementation
A.1 Introduction
In this appendix we will discuss the technical aspects of our implementation of
SSFS. Our implementation of SSFS was constructed with the C programming
language, using both Linux and MacOS X machines. The C programming
language was chosen because of its suitability to an implementation of this
type. Linux and MacOS X were chosen because both provide UNIX-type
environments. Both the Linux kernel and MacOS X's Darwin kernel are
open and well documented.
Recall that SSFS is constructed as a "compound file system", which con-
tains both a host and hidden file system. In the following sections we will
discuss the choices for the design and implementation of SSFS.
A.2 Host File System
A simple host file system was chosen for our implementation of SSFS in order
to provide a platform for testing and debugging. Giampaolo, in his book
Practical File System Design with the Be File System [23], describes such a
simple file system, and provides what he calls the File System Construction
Kit (FS-Kit), available online from his website: http://www.letterp.com/~dbg/.
FS-Kit provides a complete file system implementation which operates
on a disk image in userspace. This makes it an ideal platform for
experimentation, as it allows for testing and debugging in a convenient manner.
FS-Kit also provides an effective analogue for a kernel-level file system, as all
of the internal workings are identical to those of a normal kernel file system
implementation, except that it is implemented as a set of userspace utilities.
FS-Kit provides a number of userspace utilities to interact with the file
system; these are listed below:
makefs - this utility is used to initialise the file system.
fsh - this is used to access a file system shell in order to interact with
the file system implementation.
tstfs - this is used to perform "stress" tests on the file system.
In the following section we will discuss the implementation of the hidden
file system.
A.3 Hidden File System
For the hidden file system implementation we extensively added to FS-Kit
to allow steganographic content to be embedded. Although FS-Kit was used
as the host file system, the hidden file system implementation was written
"from scratch" and then integrated within the host file system.
To embed the hidden file system into the host file system implementation,
very few modifications had to be made to the FS-Kit implementation. These
modifications were only used to access the hidden file system initialisation
and dynamic reallocation routines. The bulk of the host file system source
code remained unaltered in order to allow for backward compatibility.
A number of userspace utilities were created in order to interact and
experiment with the hidden file system component of SSFS. These utilities
are listed below.
makehfs - this utility contains a modified version of makefs (discussed
in the previous section) which will initialise both the host and
hidden file systems.
hsh - this utility provides a dedicated shell used to interact with the
hidden file system component. The shell provides a number of com-
mands which will allow the user to create and modify files and direc-
tories.
As discussed above, the hidden file system component is a complete file
system implementation. The positions of the hidden file system's structures
and files are determined through interaction with the host file system's on-
disk structures.
Once the hidden data has been positioned on the physical disk, the dy-
namic reallocation mechanism will perform the necessary reallocations when
the host file system requests a write to a particular physical block.
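The essence of this interaction can be sketched with a toy in-memory model.
Everything below (block counts, structure names, the simple linear probe) is
illustrative only; the actual SSFS implementation performs these steps through
the Translation Map and allocation structures of chapters 6 and 9.

```c
#include <stdbool.h>
#include <string.h>

#define NBLOCKS 16
#define BSIZE   8

static char disk[NBLOCKS][BSIZE];   /* toy physical device                    */
static long tmap[NBLOCKS];          /* tmap[logical] = physical, -1 if unused */
static bool host_used[NBLOCKS];     /* physical blocks written by the host    */

void reset_model(void)
{
    for (long i = 0; i < NBLOCKS; i++) { tmap[i] = -1; host_used[i] = false; }
}

/* Which hidden logical block (if any) lives at this physical block? */
static long hidden_logical_at(long phys)
{
    for (long l = 0; l < NBLOCKS; l++)
        if (tmap[l] == phys)
            return l;
    return -1;
}

static long pick_free_physical(void)
{
    for (long p = 0; p < NBLOCKS; p++)
        if (!host_used[p] && hidden_logical_at(p) < 0)
            return p;
    return -1; /* a real implementation must handle a full disk */
}

/* Host-side block write: if the target holds hidden data, relocate that
 * data to a free block and update the Translation Map, then let the host
 * write proceed as normal. */
void host_write_block(long phys, const char *data)
{
    long logical = hidden_logical_at(phys);
    if (logical >= 0) {
        long dest = pick_free_physical();
        memcpy(disk[dest], disk[phys], BSIZE);  /* move the hidden data */
        tmap[logical] = dest;                   /* remap logical->dest  */
    }
    memcpy(disk[phys], data, BSIZE);            /* host write proceeds  */
    host_used[phys] = true;
}
```

The host file system never observes the relocation: from its point of view the
target block was simply free, while the hidden file system continues to reach
its data through the updated mapping.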
The hidden file system implementation manages all aspects of storage
and retrieval of the hidden data; this includes the translation between
the logical and physical locations of the hidden data. Recall that the logical-
to-physical mappings are handled through interaction with the Translation
Map, which is implemented within the hidden file system.
In the following section we will discuss the hidden file system creation
utility and the hidden file system command shell, which is used to interact
with the hidden file system.
The SSFS Creation Utility (makehfs)
The makehfs utility is used to create and initialise both the host and hidden
file system. The makehfs utility will firstly create the host file system using
the FS-Kit makefs utility, and then initialise and embed the hidden file
system.
The limits of the hidden file system are determined during its creation,
depending on the overall size of the host file system. This allows a reasonable
amount of space to be reserved for the steganographic content, without
significantly consuming the space available to the host file system.
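A sizing rule of this kind can be sketched as follows. The 10% ratio and the
clamping bounds are illustrative assumptions for the sketch; the actual values
used by makehfs are not reproduced here.

```c
/* Illustrative sizing rule: reserve a modest fraction of the host's blocks
 * for hidden data, clamped so that very small hosts still get a usable
 * hidden file system and very large hosts do not lose noticeable space. */
unsigned long hidden_fs_limit(unsigned long host_blocks)
{
    unsigned long limit = host_blocks / 10;  /* assumption: 10% of the host */
    if (limit < 64)
        limit = 64;                          /* floor: minimum usable size  */
    if (limit > 65536)
        limit = 65536;                       /* cap: keep host impact low   */
    return limit;
}
```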
In the following section we will discuss the hidden command shell, which
will allow the user to interact with the hidden file system.
The Hidden Command Shell (hsh)
The hidden command shell is used to provide a seamless interface between
the hidden data and the user. The transparent encryption and decryption of the
hidden data is managed by the hidden file system implementation, through
the hidden shell. The command shell will also allow the user to access hidden
data once it has been reallocated on the physical device.
The command shell provides the following user commands:
ls - displays a directory listing.
pwd - displays the current working directory.
mkdir - creates a new directory.
rmdir - removes an existing directory.
cd - allows the user to change the current working directory.
touch - creates a new file, with size 'zero'.
appendr - appends a specified number of random bytes to an existing
file.
cat - displays a 'hexdump' of an existing file.
rm - removes an existing file.
crandom - creates a file of random size, containing random data.
tstfs - performs a "stress test" on the hidden file system.
quit - terminates the hidden shell.
In the following section we will present a number of screenshots to demon-
strate the operation of SSFS.
A.4 Screenshots
In this section we present a number of screenshots which demonstrate
SSFS in operation. A description of each screenshot is given below. The
screenshots depict SSFS compiled with debugging information included,
which allows the operation of SSFS to be examined while it is running.
1. Figure A.1 on page 209 - this figure shows the operation of the
makehfs utility. Firstly the host file system is initialised, followed by
the initialisation of the hidden file system. As can be seen in this
figure, the limits of the hidden file system are determined by this utility.
2. Figure A.2 on page 209 - this figure shows the operation of the hsh
utility which allows the user to interact with the hidden file system. It
is implemented as an interactive shell which accepts UNIX-style
commands.
3. Figure A.3 on page 210 - this figure demonstrates all the commands
available to the user within hsh.
4. Figure A.4 on page 210 - this figure demonstrates a number of
commands within hsh. Firstly the ls command is used to obtain a directory
listing. The mkdir command is then used to create a new directory.
Finally, the ls command is used to display the new directory in the
directory hierarchy.
5. Figure A.5 on page 211 - this figure demonstrates the creation of a
file within the hidden file system. Firstly the cd command is used to
change into a new directory. The touch command is then used to create
a new file with size 'zero'. The file is then populated with random data
using the appendr command. Finally a directory listing is obtained
with the ls command.
6. Figure A.6 on page 211 - this figure demonstrates the ability to display
the contents of a file. A new file is created and 100 bytes of random
data is appended to it. The content of the new file is then displayed
using the cat command.
7. Figure A.7 on page 212 - this figure demonstrates the removal of a
file. The rm command is used to remove a file from the current working
directory.
8. Figure A.8 on page 212 - this figure demonstrates the removal of a
directory. The rmdir command is used to remove a directory from the
hidden file system. As can be seen, only an empty directory can be
removed. If the directory is not empty, an error is displayed.
9. Figure A.9 on page 213 - this figure demonstrates FS-Kit's fsh utility.
This utility provides a shell for the host file system, and it is used to
access non-hidden data. As can be seen from this figure, the fsh utility
has been modified to provide access to some of the hidden file system's
control structures.
10. Figure A.10 on page 213 - this figure demonstrates the dynamic
reallocation mechanism operating within the fsh utility. As can be seen
from this figure, as the host file system attempts to write to a number
of physical blocks which contain hidden data, the hidden data is
then reallocated. This figure also shows the identification of the
Reallocation Categories, and the modification of the Translation Map and
TMap Array.
A.5 Conclusion
The utilities presented in this appendix constitute a working prototype of
SSFS, which gives us the ability to experiment and obtain meaningful results.
The prototype consists of the following components: the host file system, the
hidden file system, and a number of userspace utilities. The results presented
in this dissertation were obtained through interaction with our prototype of
SSFS.
This implementation allows for convenient examination of the internal
workings of a steganographic file system, so that its strengths and
weaknesses can be easily assessed.
In this appendix we discussed the components of SSFS: the host and
hidden file systems. We went on to discuss a number of utilities which allow a
user to interact with the file system implementation. Finally, we presented a
number of screenshots which depict SSFS in operation.
Figure A.1: Initialising the host and hidden file system with the makehfs applica-
tion.
Figure A.2: Starting the hidden file system shell.
Figure A.3: Showing all the commands available to operate on the hidden file
system.
Figure A.4: Performing a directory listing with the ls command and creating a
directory with the mkdir command.
Figure A.5: Creating a file in the newly created directory, and then appending data
using the appendr command.
Figure A.6: Creating a file and displaying the contents of the file on the console.
Figure A.7: Deleting a file from the hidden file system, using the rm command.
Figure A.8: Attempting to delete a directory from the hidden file system, using the
rmdir command.
Figure A.9: Creating a file on the host file system, and then appending data to
that file.
Figure A.10: Showing the dynamic reallocation process, with identification of the
reallocation categories, and modification of the Translation Map.
Bibliography
[1] R. Anderson and F.A.P. Petitcolas. On The Limits of Steganography.
IEEE Journal of Selected Areas in Communications, 16:474-481, 1998.
doi: 10.1109/49.668971.

[2] R. Anderson, E. Biham, and L. Knudsen. Serpent: A New Block Cipher
Proposal. Proceedings of the 5th International Workshop on Fast Software
Encryption, Paris, France, LNCS 1372, pages 222-238, 1998.
doi: 10.1007/3-540-69710-1_15.

[3] R. Anderson, E. Biham, and L. Knudsen. Serpent: A Proposal for
the Advanced Encryption Standard. NIST AES Proposal, 1998. URL
http://www.cl.cam.ac.uk/~rja14/Papers/serpent.pdf.

[4] R. Anderson, R. Needham, and A. Shamir. The Steganographic File
System. In David Aucsmith, editor, Information Hiding, Second International
Workshop, IH'98, Portland, Oregon, USA, April 1998, Proceedings,
1998. doi: 10.1007/3-540-49380-8_6.

[5] W. Bender, D. Gruhl, N. Morimoto, and A. Lu. Techniques for data
hiding. IBM Systems Journal, 35:313-336, 1996. doi: 10.1147/sj.353.0313.

[6] W. Bender, F. J. Paiz, W. Butera, S. Pogreb, D. Gruhl, and R. Hwang.
Applications for data hiding. IBM Systems Journal, 39:547-568, 2000.
doi: 10.1147/sj.393.0547.

[7] M. Blaze. A Cryptographic File System for UNIX. In CCS '93: Proceedings
of the 1st ACM conference on Computer and communications
security, pages 9-16, New York, NY, USA, 1993. ACM Press. ISBN
0-89791-629-8. doi: 10.1145/168588.168590.

[8] R. Card, T. Ts'o, and S. Tweedie. Design and Implementation of the
Second Extended Filesystem. Proceedings of the First Dutch International
Symposium on Linux, pages 90-367, 1994. URL
http://web.mit.edu/tytso/www/linux/ext2intro.html.
[9] F. M. Carrano and W. Savitch. Data Structures and Abstractions with
Java. Prentice Hall, 2003. ISBN 0-13-017489-0.

[10] E. Casey. Digital Evidence and Computer Crime. Academic Press, 2004.
ISBN 0-12-163104-4.

[11] J. Corbet, A. Rubini, and G. Kroah-Hartman. Linux Device Drivers.
O'Reilly Media, Inc., third edition, 2005. ISBN 0-596-00590-3.

[12] W. Diffie and M. Hellman. New Directions in Cryptography. Information
Theory, IEEE Transactions on, 22(6):644-654, Nov 1976. ISSN
0018-9448. doi: 10.1109/TIT.1976.1055638.

[13] U. Drepper, S. Miller, and D. Madore. md5sum - Manual Page. UNIX
Man Page, September 2007.

[14] I. Dubrawsky. Cryptographic Filesystems, Part One: Design and
Implementation, March 2003. URL
http://www.securityfocus.com/infocus/1673.

[15] M. Dworkin. Recommendation for Block Cipher Modes of Operation.
NIST Special Publication 800-38A, 2001. URL
http://csrc.nist.gov/publications/nistpubs/800-38a/sp800-38a.pdf.

[16] FIPS PUB 180-2. Federal Information Processing Standards Publication
180-2, Secure Hash Standard, August 2002. URL
http://csrc.nist.gov/publications/fips/fips180-2/fips180-2.pdf.

[17] FIPS PUB 197. Federal Information Processing Standards Publication
197, Advanced Encryption Standard (AES), November 2001. URL
http://www.csrc.nist.gov/publications/fips/fips197/fips-197.pdf.

[18] FIPS PUB 46-3. Federal Information Processing Standards Publication
46-3, Data Encryption Standard, October 1999. URL
http://csrc.nist.gov/publications/fips/fips46-3/fips46-3.pdf.

[19] G. A. Francia and T. S. Gomez. Steganography Obliterator: An Attack
on the Least Significant Bits. In InfoSecCD '06: Proceedings of the
3rd annual conference on Information security curriculum development,
pages 85-91, New York, NY, USA, 2006. ACM Press. ISBN
1-59593-437-5. doi: 10.1145/1231047.1231066.
[20] Free Software Foundation. GNU Binutils. UNIX Man Page. URL
http://www.gnu.org/software/binutils/.

[21] J. Fridrich, M. Goljan, and R. Du. Reliable Detection of LSB Steganography
in Color and Grayscale Images. In MM&Sec '01: Proceedings
of the 2001 workshop on Multimedia and security, pages 27-30,
New York, NY, USA, 2001. ACM Press. ISBN 1-58113-393-6. doi:
10.1145/1232454.1232466.

[22] J. A. Gallian. Contemporary Abstract Algebra. Houghton Mifflin Company,
5th edition, 2002. ISBN 0-618-12214-1.

[23] D. Giampaolo. Practical File System Design with the Be File System.
Morgan Kaufmann Publishers, Inc., 1999. ISBN 1-55860-497-9. URL
http://www.letterp.com/~dbg/.

[24] The Open Group and IEEE. Single UNIX Specification Version 3. URL
http://www.unix.org/single_unix_specification/.

[25] D. Gruhl, A. Lu, and W. Bender. Echo Hiding. In R. Anderson,
editor, Information Hiding, First International Workshop, Isaac
Newton Institute, Cambridge, England, May 1996, volume 1174 of
LNCS, pages 295-315. Springer-Verlag, 1996. ISBN 3-540-61996-8. doi:
10.1007/3-540-61996-8_48.

[26] J. S. Heidemann and G. J. Popek. File-System Development with Stackable
Layers. ACM Transactions on Computer Systems, 12(1):58-89,
1994. ISSN 0734-2071. doi: 10.1145/174613.174616.

[27] S. Hetzl. Steghide - Manual Page. UNIX Man Page, May 2002. URL
http://steghide.sourceforge.net.

[28] J. Hooper. Hexley - DarwinOS Mascot. URL http://www.hexley.com.
Hexley DarwinOS Mascot Copyright 2000 by Jon Hooper. All Rights
Reserved.

[29] D. E. Knuth. Big Omicron and Big Omega and Big Theta. SIGACT
News, 8(2):18-24, 1976. ISSN 0163-5700. doi: 10.1145/1008328.1008329.

[30] M. Kuhn. The EURion Constellation, February 2002. URL
http://www.cl.cam.ac.uk/~mgk25/eurion.pdf.

[31] R. Love. Linux Kernel Development. Sams Publishing, 2004. ISBN
0-672-32512-8.
[32] N. Mavroyanopoulos. MCrypt - Manual Page. UNIX Man Page, May
2002. URL http://mcrypt.sourceforge.net.

[33] A. D. McDonald and M. G. Kuhn. StegFS: A Steganographic File System
for Linux. In Andreas Pfitzmann, editor, Information Hiding, Third
International Workshop, IH'99, Dresden, Germany, September/October
1999, Proceedings, volume 1768 of LNCS, pages 462-477. Springer-Verlag,
1999. ISBN 3-540-67182-X. doi: 10.1007/10719724_32.

[34] M. K. McKusick, W. N. Joy, S. J. Leffler, and R. S. Fabry. A Fast
File System for UNIX. ACM Transactions on Computer Systems, 2(3):
181-197, 1984. ISSN 0734-2071. doi: 10.1145/989.990.

[35] A. J. Menezes, P. C. van Oorschot, and S. A. Vanstone. Handbook of
Applied Cryptography. CRC Press, 1996. ISBN 0-84938-523-7. URL
http://www.cacr.math.uwaterloo.ca/hac/index.html.

[36] I.S. Moskowitz, G.E. Langdon, and L. Chang. A New Paradigm Hidden
in Steganography. In NSPW '00: Proceedings of the 2000 workshop on
New security paradigms, pages 41-50, New York, NY, USA, 2000. ACM
Press. ISBN 1-58113-260-3. doi: 10.1145/366173.366189.

[37] S.J. Murdoch. Software Detection of Currency, May 2004. URL
http://www.cl.cam.ac.uk/~sjm217/talks/ih04currency.pdf.

[38] B. Naujok. XFS Filesystem Structure Rev 2.0. Technical report, Silicon
Graphics, Inc, 2006. URL
http://oss.sgi.com/projects/xfs/papers/xfs_filesystem_structure.pdf.

[39] H. Pang, K. Tan, and X. Zhou. StegFS: A Steganographic File System.
In Data Engineering, 2003. Proceedings. 19th International Conference
on, pages 657-667, 5-8 March 2003. doi: 10.1109/ICDE.2003.1260829.

[40] M. Peinado, F.A.P. Petitcolas, and D. Kirovski. Digital Rights Management
for Digital Cinema. Multimedia Systems, 9:228-238, 2003. ISSN
0942-4962. doi: 10.1007/s00530-003-0094-3.

[41] F.A.P. Petitcolas, R.J. Anderson, and M.G. Kuhn. Information Hiding -
A Survey. Proceedings of the IEEE, 87(7):1062-1078, 1999. doi:
10.1109/5.771065.

[42] E. Michael Power, Jonathan Gilhen, and Roland L. Trope. Setting
Boundaries at Borders: Reconciling Laptop Searches and Privacy.
Security & Privacy, IEEE, 5(2):72-75, March-April 2007. ISSN 1540-7993.
doi: 10.1109/MSP.2007.40.
[43] R. Rivest. RFC1321: The MD5 Message-Digest Algorithm, 1992. URL
http://www.ietf.org/rfc/rfc1321.txt.

[44] R. L. Rivest, A. Shamir, and L. Adleman. A Method for Obtaining
Digital Signatures and Public-Key Cryptosystems. Communications of
the ACM, 21(2):120-126, 1978. ISSN 0001-0782. doi:
10.1145/359340.359342.

[45] R. Sandberg, D. Goldberg, S. Kleiman, D. Walsh, and B. Lyon. Design
and Implementation of the Sun Network Filesystem. Proceedings of the
Summer 1985 USENIX Conference, pages 119-130, 1985. URL
http://citeseer.ist.psu.edu/sandberg85design.html.

[46] B. Schneier. Applied Cryptography: Protocols, Algorithms, and Source
Code in C. John Wiley & Sons, Inc., 1994. ISBN 0-471-59756-2.

[47] B. Schneier. Description of a New Variable-Length Key, 64-Bit Block
Cipher (Blowfish). Fast Software Encryption, Cambridge Security
Workshop Proceedings (December 1993), pages 191-204, 1994. doi:
10.1007/3-540-58108-1_24.

[48] B. Schneier. Crossing Borders with Laptops and PDAs, May 2008. URL
http://www.schneier.com/essay-217.html.

[49] SecuriTeam. Linux Cryptoloop Watermark Exploit, May 2005. URL
http://www.securiteam.com/exploits/5UPOP1PFPM.html.

[50] A. Silberschatz, P. B. Galvin, and G. Gagne. Operating System Concepts
with Java, Sixth Edition. John Wiley & Sons, Inc., 2004. ISBN
0-471-48905-0.

[51] K. A. Smith and M. Seltzer. A Comparison of FFS Disk Allocation
Policies. In ATEC'96: Proceedings of the Annual Technical Conference on
USENIX 1996 Annual Technical Conference, Berkeley, CA, USA, 1996.
USENIX Association. URL
http://www.usenix.org/publications/library/proceedings/sd96/smith.html.

[52] D. R. Stinson. Cryptography Theory and Practice. Chapman &
Hall/CRC, 2002. ISBN 1-58488-206-9.

[53] M. Szeredi. FUSE: Filesystem in Userspace. Webpage. URL
http://fuse.sourceforge.net/.
[54] A. Z. Tirkel, G. A. Rankin, R. M. van Schyndel, W. J. Ho, N. R. A.
Mee, and C. F. Osborne. Electronic Watermark. In Digital Image
Computing, Technology and Applications (DICTA '93), pages 666-673,
Macquarie University, Sydney, 1993. URL
http://citeseer.ist.psu.edu/tirkel93electronic.html.

[55] R. L. Trope and E. M. Power. Lessons for laptops from the 18th century.
Security & Privacy, IEEE, 4(4):64-68, July-Aug. 2006. ISSN 1540-7993.
doi: 10.1109/MSP.2006.97.

[56] University of Southern California - Signal & Image Processing Institute
Image Database. 4.2.03 - Mandrill. URL
http://sipi.usc.edu/database/misc/4.2.03.tiff.

[57] University of Southern California - Signal & Image Processing Institute
Image Database. 5.1.09 - Moon Surface. URL
http://sipi.usc.edu/database/misc/5.1.09.tiff.

[58] A. Westfeld and A. Pfitzmann. Attacks on Steganographic Systems.
In Andreas Pfitzmann, editor, Information Hiding, Third International
Workshop, IH'99, Dresden, Germany, September/October 1999,
Proceedings, volume 1768 of LNCS, pages 61-76. Springer-Verlag, 1999.
ISBN 978-3-540-67182-4. doi: 10.1007/10719724_5.

[59] J.H.K. Wu, R. Chang, C. Chen, C. Wang, T. Kuo, W. Moon, and
D. Chen. Tamper Detection and Recovery for Medical Images Using
Near-lossless Information Hiding Technique. Journal of Digital Imaging,
0:59-76, 2007. doi: 10.1007/s10278-007-9011-1.

[60] E. Zadok, I. Badulescu, and A. Shender. Cryptfs: A Stackable Vnode
Level Encryption File System, 1998. URL
http://citeseer.ist.psu.edu/zadok98cryptfs.html.

[61] E. Zadok, I. Badulescu, and A. Shender. Extending File Systems Using
Stackable Templates. Proceedings of the Annual USENIX Technical
Conference, pages 57-70, June 1999. URL
http://www.usenix.org/events/usenix99/full_papers/zadok/zadok.pdf.