
6th International Conference on Internet Technology and Secured Transactions, 11-14 December 2011, Abu Dhabi, United Arab Emirates

A Decentralized Access Control Mechanism using Authorization Certificate for Distributed File Systems
Jumpei Arakawa, Koichi Sasada
Graduate School of Information Science and Technology, University of Tokyo, Japan
jumpei@nue.ci.i.u-tokyo.ac.jp, sasada@ci.i.u-tokyo.ac.jp

Abstract—Scalability and high availability are essential features when considering security as a foundation of cloud computing. However, existing centralized access control mechanisms are unable to satisfy these requirements. Here we propose a decentralized access control mechanism based on authorization certificates. We describe a method to decentralize the Certificate Revocation List (CRL) and a method to improve access control performance. We show evaluation results for the availability and scalability of our proposed mechanism through simulation and prototype implementation.

Keywords – security; access control; authorization certificate; decentralization; distributed databases

978-1-908320-00-1/11/$26.00 ©2011 IEEE

I. INTRODUCTION
Recent developments in cloud computing highlight the increasing importance of distributed file systems, because distributed file systems are foundational to the scalability and high availability of cloud computing, which provides computing resources as a service. An important feature is scale-out, i.e., the possibility of enhancing performance and capacity by increasing the number of nodes that comprise the system as the number of users or the volume of available resources grows. In addition, availability is directly linked to the quality of service, and eliminating any single point of failure is important in order to achieve high availability.
Thus, the two crucial conditions for the evaluation of distributed file systems are scale-out of performance and the absence of a single point of failure. In fact, many distributed file systems that satisfy these conditions have been proposed and developed [8][9].
An access control mechanism (ACM) controls accesses (operations including reading, writing, and deleting) to the resources (files and directories) of a file system. When a large volume of resources is utilized by many users, scalability and availability must be evaluated not only for the distributed file system but also for the entire system, including the ACM. When two or more systems provide one service in cooperation, a system with poor scalability might become a bottleneck and reduce the performance of the service. Likewise, a single point of failure in any one of these systems is a single point of failure for the service as a whole.
Based on the above, the scale-out of performance and the absence of a single point of failure are as important for ACMs as they are for distributed file systems. Therefore, we propose an ACM that is suited to distributed file systems in the age of cloud computing. The ACM we propose (hereafter “proposed mechanism”) is not centralized; rather, it is decentralized by using certificates. The reason for this is that centralized ACMs, as represented by mechanisms based on an Access Control List (ACL), face problems with availability and scalability. Although decentralized ACMs based on authorization certificates have been proposed to overcome this problem [1][3][4][5], the revocation of certificates by a Certificate Revocation List (CRL) is not supported in most existing studies; hence, the costs required for revocation or verification of authorization certificates have not been properly evaluated. Therefore, here we propose a method to decentralize the CRL by using consistent hashing. We evaluate the availability and scalability of the proposed mechanism through simulation and prototype implementation.

II. ISSUES
In this section, we explain a number of key problems that led us to the proposed mechanism. We divide ACMs into centralized and decentralized ACMs according to the following definitions.
Centralized ACM: An ACM in which all information on access control is stored on the side of the mechanism.
Decentralized ACM: An ACM in which only a small portion of the information on access control is stored on the side of the mechanism.

A. Centralized ACMs
Most ACMs in practical use (as opposed to the one in this study) can be categorized as centralized ACMs; the ACMs included as a standard feature in Linux and Windows are also centralized. Centralized ACMs are highly flexible and easy to implement because all information is managed on the side of the mechanism.
However, because all the information is managed on the side of the mechanism, the volume of information to be managed grows in proportion to the number of users or the number of resources. Compared to the decentralized ACMs described later, this is disadvantageous in terms of scalability.
In addition, centralized ACMs require the involvement of special users (e.g., administrator, resource owner) for user registration or modification of access privileges. This implies a concentration of management/operation costs, which poses a practical problem.
B. Decentralized ACMs
Decentralized ACMs have been proposed to solve the issues of centralized ACMs described above [1][6]. Decentralized ACMs do not require most access control information to be maintained on the side of the mechanism, because the information required for a user's access is presented to the mechanism at the time of access control. By using access control information that can be verified with a digital signature, called an authorization certificate, basic processing of access control is possible with only the authorization certificate presented by a user. For this reason, processing can be decentralized to multiple nodes, an arrangement that can be considered superior to centralized ACMs in terms of scalability.
In most existing studies, however, expiration is the only supported method to revoke authorization certificates. Even where a CRL is supported [4], its availability and scalability have not been properly evaluated.

III. DESIGN OF PROPOSED MECHANISM
In this section, we explain the design and core ideas of the proposed mechanism. Figure 1 presents an overview of the proposed mechanism, which comprises multiple equivalent Access Control (AC) nodes, each of which provides all functions necessary for access control. As indicated in the figure, some nodes of the distributed file system and of the ACM may be the same machines.

Figure 2. Behavior of access control mechanism based on certificates.
Figure 3. Delegation by using authorization certificates.

A. Access Control with Authorization Certificate
Before providing a detailed description of the certificate, we show the flow of access control using authorization certificates in Figure 2. As a premise, a user can access the distributed file system only through AC nodes; in other words, it is impossible to access the distributed file system directly without first going through the AC nodes.
When a user wants to access a resource in the distributed file system, s/he sends an operation request (e.g., file reading, directory creation) to an AC node along with the authorization certificate s/he owns. The AC node forwards the operation request to the target distributed file system only if 1) the certificate is valid, 2) the requested operation is allowed by the certificate, and 3) the user is authenticated as the owner of the presented certificate. When verifying the certificate in 1), the AC node refers to the CRL, communicating with other AC nodes as necessary.
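The three conditions checked by an AC node can be summarized as a single gate. The following is a minimal Python sketch under stated assumptions, not the yassac implementation: the `Certificate` fields are simplified, the CRL is modeled as a local set of revoked certificate IDs (in the proposed mechanism this lookup may require contacting other AC nodes), and `verify_signature` and `authenticate` are hypothetical callbacks standing in for the public-key operations.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Set, Tuple


@dataclass(frozen=True)
class Certificate:
    # Hypothetical, simplified certificate holding only what the gate needs.
    cert_id: str
    resources: FrozenSet[str]    # set of target resources
    operations: FrozenSet[str]   # set of permitted operations
    start: float                 # effective period: start time
    end: float                   # effective period: end time
    chain: Tuple[str, ...] = ()  # IDs of the chained (ancestor) certificates


def allow_request(cert: Certificate, operation: str, resource: str,
                  now: float, crl: Set[str],
                  verify_signature: Callable[[Certificate], bool],
                  authenticate: Callable[[Certificate], bool]) -> bool:
    """Forward the request only if conditions 1), 2), and 3) all hold."""
    # 1) Validity: inside the effective period, neither the certificate nor
    #    any chained certificate is revoked, and the issuer's signature verifies.
    in_period = cert.start <= now <= cert.end
    revoked = cert.cert_id in crl or any(c in crl for c in cert.chain)
    valid = in_period and not revoked and verify_signature(cert)
    # 2) Permission: the operation on this resource is granted by the
    #    certificate (exact match; directory inheritance is ignored here).
    permitted = operation in cert.operations and resource in cert.resources
    # 3) Authentication: the requester proves ownership of the certificate.
    #    Evaluated last, so it only runs when 1) and 2) already hold.
    return valid and permitted and authenticate(cert)


def accept(_cert: Certificate) -> bool:
    # Placeholder for real signature checks / owner authentication.
    return True


cert = Certificate("c2", frozenset({"/home/alice"}), frozenset({"mkdir"}),
                   0.0, 100.0, ("c1",))
print(allow_request(cert, "mkdir", "/home/alice", 50.0, set(), accept, accept))   # True
print(allow_request(cert, "mkdir", "/home/alice", 50.0, {"c1"}, accept, accept))  # False: c1 revoked
print(allow_request(cert, "rmdir", "/home/alice", 50.0, set(), accept, accept))   # False: not permitted
```

Checking revocation of the whole chain, not just the presented certificate, matters because revoking a parent certificate must also invalidate everything delegated from it.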

B. Delegation by using Authorization Certificates


The authorization certificate used in the proposed mechanism comprises the following items.
• ID
• Set of target resources
• Set of permitted operations
• Effective period (start time, end time)
• Chained authorization certificates
• Data for authentication (e.g., user’s public key)
• Issued time
• Signature

Figure 1. Proposed mechanism and distributed file system.
In the proposed mechanism, as in existing studies, a user can issue to another user a new authorization certificate based on a certificate s/he owns, granting that user a subset of the authority in the original certificate.
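The certificate items listed above, together with the subset rule for delegation, can be captured in a small data structure. The sketch below is an illustration under assumptions, not the authors' certificate format: the field types, the helpers `grants_subset` and `delegate`, and the convention that a child's chain is its parent's chain plus the parent's ID are all hypothetical, and signing by the AC node is omitted.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class AuthorizationCertificate:
    # One field per item in the list above; concrete types are assumptions.
    cert_id: str                  # ID
    resources: FrozenSet[str]     # set of target resources
    operations: FrozenSet[str]    # set of permitted operations
    start: float                  # effective period: start time
    end: float                    # effective period: end time
    chain: Tuple[str, ...]        # IDs of chained authorization certificates
    auth_data: bytes              # data for authentication, e.g. a public key
    issued_at: float              # issued time
    signature: bytes = b""        # set by the issuing AC node (not modeled)


def grants_subset(parent: AuthorizationCertificate,
                  child: AuthorizationCertificate) -> bool:
    """True if the child's authority does not exceed the parent's."""
    return (child.resources <= parent.resources
            and child.operations <= parent.operations
            and parent.start <= child.start <= child.end <= parent.end)


def delegate(parent: AuthorizationCertificate, cert_id: str,
             resources: Iterable[str], operations: Iterable[str],
             start: float, end: float, auth_data: bytes,
             now: float) -> AuthorizationCertificate:
    """Build an unsigned child certificate chained to the parent."""
    child = AuthorizationCertificate(
        cert_id, frozenset(resources), frozenset(operations), start, end,
        parent.chain + (parent.cert_id,), auth_data, now)
    if not grants_subset(parent, child):
        raise PermissionError("requested authority exceeds the parent certificate")
    return child  # an AC node would sign the child here before returning it


root = AuthorizationCertificate("c1", frozenset({"/home/alice"}),
                                frozenset({"read", "mkdir"}), 0.0, 100.0,
                                (), b"alice-key", 0.0)
child = delegate(root, "c2", ["/home/alice"], ["read"], 10.0, 50.0, b"bob-key", 5.0)
print(child.chain)  # ('c1',)
```

Resources are compared here by plain set inclusion; a check that honors the inheritance of authority used by the prototype would also have to accept a child resource located below a directory granted by the parent.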
In contrast to existing studies, in the proposed mechanism a new authorization certificate is always issued through an AC node. Each AC node stores a key pair in order to issue and verify certificates. The procedure for issuing a new authorization certificate by using the certificate Cp as the origin of delegation is indicated below.

Certificate Issuance:
Step 1 A user sends the parent certificate Cp and the information comprising the newly issued certificate (target resources, permitted operations, effective period, etc.) to the AC node.
Step 2 The AC node verifies the parent certificate Cp (details are described later).
Step 3 The AC node authenticates the user as the owner of the certificate Cp (e.g., by public key authentication).
Step 4 The AC node ensures that the requested privileges (target resources, permitted operations, effective period) are a subset of those in the parent certificate Cp.
Step 5 The AC node issues a new certificate C by adding a signature with its own private key.

Certificate Verification:
Step 1 The AC node confirms that the current time is within the effective period of the certificate.
Step 2 The AC node confirms that neither the certificate nor any of its chained certificates is included in the CRL (details are described later).
Step 3 The AC node verifies the signature by using the public key of the AC node that issued the certificate.

Based on the above, any user can set up access control within the scope of the authority s/he owns, without being limited to specific users such as an administrator or owner. Personnel management/operation costs can thus be decentralized.

C. Decentralization of ACM
In the case of access control based on authorization certificates, all necessary information is included in the certificate; therefore, it is essentially unnecessary for each AC node to retain information. Hence, each AC node can perform access control independently. With this feature, access control performance can be increased by adding AC nodes. The CRL, which holds revoked certificates, is the one exception to this property. The structure used to decentralize the CRL is explained below.
The CRL must be accessible from all AC nodes so that they can confirm that the authorization certificate sent by a user and maintained on the access control side has not been revoked. If the CRL were deployed on a specific node, that node would become a bottleneck and a single point of failure for the entire ACM, preventing the scale-out of performance.
In the proposed mechanism, therefore, the CRL is constructed across all AC nodes by using consistent hashing. An ID is assigned to each AC node, and the nodes are arranged over a circular hash space. A hash value is calculated from the certificate, the certificate is stored at the AC node closest to that value in the forward direction, and replicas are stored in the adjacent AC nodes. Each certificate is thus stored at k AC nodes, where k is a parameter indicating the number of redundancies (Figure 4).

Figure 4. Decentralized CRL by using consistent hashing (k = 3).

At this point, it is important to select which information of the certificate is used as the key for hashing. Uniform decentralization can be expected if the certificate ID itself is used as the key, but communications to verify the certificate might then occur frequently, because the certificates in one chain are scattered over different AC nodes. On the other hand, if the ID of the first (or an intermediate) certificate in the chain is used as the key, the decentralization might be biased to some extent, although the number of communications upon verification can be limited.
Each AC node possesses the IDs of the other AC nodes and all connection information in the form of a routing table (full-mesh network). Hence, each AC node can always look up the target AC node without extra communication. The procedure to revoke the certificate C is indicated below (for the case where the hash value of the certificate is 400 and k is three).

Certificate Revocation:
Step 1 A user sends the certificate C to be revoked and the certificate Cu that s/he owns to the AC node.
Step 2 The AC node transfers the subsequent processing to the AC node (ID = 450) closest to the hash value (= 400) of the certificate in the forward direction.
Step 3 The AC node verifies the certificate C.
Step 4 The AC node confirms that the certificate Cu is either the target certificate C itself or one of its chained certificates.
Step 5 The AC node requests the user to authenticate as the owner of the certificate Cu.
Step 6 The AC node adds the certificate C to its own CRL and, for replication, to the CRLs of the AC nodes with ID = 600 and 750.
Step 7 The AC node notifies the user that the revocation is complete.

As indicated above, there is no single point of failure because multiple nodes hold the CRL configuration information (certificates). The procedure to refer to the CRL from each AC node at the time of certificate verification is as follows.

Certificate Validation (reference to CRL):
Step 1 A user sends the certificate to the AC node (validator).
Step 2 The validator lists the AC nodes (holders) that hold the corresponding CRL entries, based on the IDs of the certificate and its chained certificates (including AC nodes that hold replicated certificates in their CRL).
Step 3 If any of the certificates to validate is included in the validator’s CRL, the validator returns failure to the user. If not, the validator removes the certificates covered by its own CRL from the set of certificates to validate.
Step 4 If there are no certificates left to validate, the validator returns success to the user.
Step 5 The validator selects one holder from among the holders and sends the certificates to validate to that holder.
Step 6 If any of the certificates is included in the holder’s CRL, the validator returns failure to the user. If not, the validator removes the certificates covered by that holder’s CRL from the set of certificates to validate.
Step 7 Return to Step 4.

By employing the above procedure, any certificate can be revoked through, and its revocation status confirmed by, any AC node (referential transparency of CRLs is achieved).

IV. PROTOTYPE IMPLEMENTATION
In this section, we explain the prototype implementation of the ACM proposed in the previous section.

A. System Structure
We implemented a prototype system of the proposed mechanism (yassac) as a PHP script on Apache. As the target distributed file system, we utilized yass, a distributed file system developed by the authors. Given that yass is also implemented as a single PHP script on Apache, the system was structured so that yassac and yass operate on the same computer. HTTP is used for communication between servers and between the server and client.

B. Capability Map (Data Structure)
As an internal representation within the server program, multiple authorization certificates retained by the same user are collectively expressed with a structure called a capability map (Figure 5). The capability map is a data structure that maps an ID identifying a resource (the key) to an integer that expresses the permitted operations as bit flags. By using the capability map, whether an operation is permitted can be determined by calculating the logical product (bitwise AND) of the bit expression of the requested operation and the capability flags retrieved from the capability map with the ID of the target resource of the request.

Figure 5. Capability map.

C. Performance improvements
1) Cache of Verified/Authenticated Certificate
Sending an authorization certificate with every operation request, verifying it, and authenticating the user (owner) of the certificate each time is inefficient. With yassac, therefore, a PHP session mechanism with log-in processing is employed prior to a series of operations, so that the authorization certificates to be utilized can be verified and authenticated at once. By retaining verified/authenticated authorization certificates (more precisely, data expressing them as a capability map) within the session for a certain period, high-speed access control is possible. However, the cache for these certificates is cleared not only when a certain period has passed but also as necessary, for example when a CRL update is detected.

2) Cache for Inheritance of Authority
In a file system, a hierarchical relationship exists among resources. For example, suppose that the resource abc is a directory and the resource def is included in that directory as a file or directory. In this case, the authority retained for the resource in the upper hierarchy must be applicable to the resource in the lower hierarchy. We call this property the “inheritance of authority.” With this property implemented, when a given resource is not included in the capability map, the upper resource is checked in the same way, and permission or refusal is eventually determined by repeated checking. However, if operations on the same resource are repeated, resolving the authority through the hierarchy every time is inefficient. Therefore, when the hierarchy of authority is resolved in yassac, the result is also added to the capability map as a cache. This speeds up access control for the same or nearby resources (Figure 6).

Figure 6. Caching for the inheritance of authority.

V. EVALUATION
The overhead of the proposed mechanism was measured with the prototype implementation, and scalability was evaluated by using both the prototype implementation and simulation.

A. Evaluation Environment
We used 20 Linux virtual nodes for evaluation. Each physical node has a 2.8-GHz Pentium 4 processor and 1 GB of memory and runs CentOS 5.3. The virtualization software Xen runs on the physical nodes, and CentOS 5.4 runs on each virtual node with 512 MB of memory. We also used Apache 2.2.14 with OpenSSL 0.9.8l and PHP 5.3.1. The physical nodes are connected via Gigabit Ethernet.

B. Overhead
First, we evaluated the performance of our prototype with a benchmark program on a single node in order to measure the overhead of access control without communication. The benchmark program, written in PHP, simply repeats directory creation by using an authorization certificate with the privilege of directory creation at the user’s root directory. To measure processing time, 300 nested directories were created five times and the average time
required to create the nth directory was obtained.
Four patterns were subject to benchmarking: [base], where the operation is directly executed without going through yassac; [opt0], where the operation is executed through yassac without optimization; [opt1], where the authorization certificate is cached in advance; and [opt2], where the cache for inheritance of authority is enabled in addition to the condition of [opt1]. The results are indicated in Figure 7.

Figure 7. Time required for directory creation (processing time [µs] versus directory depth, for base, opt0, opt1, and opt2).

As indicated in the figure, in the case of [opt0], the verification and authentication of certificates cause a large overhead. Most of this overhead was eliminated by caching verified/authenticated certificates. However, with the benchmark used, a new directory is created each time, and the authority for that resource itself is not directly described in the authorization certificate. Therefore, at each privilege check it was necessary to retrace the hierarchy of the resource until the resource whose privilege is described in the authorization certificate (the user’s root directory) was reached, so the overhead increases as the depth of the hierarchy increases. By enabling the cache for the inheritance of privilege, the privilege of the parent resource is cached in the capability map, the privilege can always be checked in constant time, and the overhead is sufficiently reduced.

C. Scalability
1) Communication Cost
Before explaining the scalability evaluation, we describe the number of communications (number of messages). For revocation of authorization certificates, the AC node that holds the relevant CRL can be identified from the certificate, so only one communication is required. To issue new authorization certificates, no communication with other AC nodes occurs (excluding the verification process). Thus, these processes require a constant number of communications regardless of the number of AC nodes.
Next, we consider the number of communications necessary for verification of authorization certificates. Communications to check the CRL could occur in the process of verifying authorization certificates. Thus, the number of communications required to check the CRL (confirmation of certificate revocation) must be evaluated as the communication cost of our proposed mechanism.
Here, the following three parameters that influence the number of communications required to check the CRL are considered.
• n: Number of AC nodes.
• k: Number of redundancies.
• d: Length of chained certificates.
In this case, the worst-case number of required communications is min(n − 1, d). Here, n − 1 is the number required to check the CRLs in all other AC nodes, and d is the number required to check all CRLs that correspond to the chained certificates when they are stored at different AC nodes.

2) Simulation
To obtain the expected value of the communication cost, we developed a simulation program for the number of communications that occur to confirm the revocation of certificates according to the procedure indicated in Section III. Assuming that certificates are evenly distributed over the AC nodes and that d randomly selected certificates are chained, we measured the number of AC nodes that must be contacted to confirm revocation in 1,000 trials and used the average value as the communication cost.
Figure 8 indicates the simulation results in the case where k is three.

Figure 8. Simulation of the number of communications (k = 3).

The number of communications increases with n and d. However, even in the case of n = 100 and d = 100, the average number of communications is 29.42, which is much smaller than the worst-case value (min(n − 1, d) = 99).
The number of communications can be reduced by increasing the number of redundancies, k. For example, in the case of k = n, the CRLs for all certificates are assigned to all AC nodes, eliminating the need for communication. However, when k is increased, the number of nodes that must be contacted to store replicas upon revocation of certificates increases.
Therefore, we consider using the ID of the mth certificate in the chain, rather than the ID of the certificate itself, as the key for decentralizing the CRL. In this case, the CRL entries of one certificate are decentralized to only
m AC nodes. Hence, the number of communications required to verify a certificate at depth m can be limited.

3) Measurement with Prototype Implementation
We also evaluated the performance of the proposed mechanism with the prototype system to verify the simulation results. In the prototype system, the ID of the first chained certificate (the user’s root certificate) was used as the key for decentralization (d = m = 1). In the benchmark program, an AC node was randomly selected and one directory was created. The server-side program had the same composition as that of [opt0], which was used to measure the overhead. The measured time covered the processing of the ACM on the server side: verification and authentication of the authorization certificate, and the authority check. For the measurement, 300 directories were created per client, and the average time required for one access control was obtained.
The results are shown in Figure 9. The simulated results under the same condition (Figure 10) are fairly similar to the actual measurements; therefore, the simulation is considered adequate.

Figure 9. Measurement results with the prototype (d = m = 1) (time for access control per node [ms] versus number of nodes, 1 to 15).

Figure 10. Simulated results (d = 1).

Based on the above-calculated amounts, simulation results, and actual measurements with the prototype system, the performance of the proposed mechanism is considered to scale out, because the increase in communication costs with the number of AC nodes is limited and controllable.

VI. CONCLUSION
We proposed a decentralized ACM without a single point of failure and with scale-out of performance, aimed at application to distributed file systems. We explained the design of decentralized CRLs using consistent hashing to achieve high availability and scalability. We also described a method to improve access control performance at the implementation level.
We evaluated the proposed mechanism by using a simulation program and a prototype implementation. The results show that our proposed mechanism provides high availability and scalability for access control.

REFERENCES
[1] Miltchev, S., Smith, J. M., Prevelakis, V., Keromytis, A., Ioannidis, S., “Decentralized Access Control in Distributed File Systems,” ACM Computing Surveys, Vol. 40, No. 3, Article 10, 2008.
[2] Kaminsky, M., Savvides, G., Mazières, D., Kaashoek, M. F., “Decentralized user authentication in a global file system,” Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP), pp. 60–73, 2003.
[3] Miltchev, S., Prevelakis, V., Ioannidis, J., Keromytis, A., Smith, J., “Secure and flexible global file sharing,” Proceedings of the Annual USENIX Technical Conference, Freenix Track, pp. 165–178, 2003.
[4] Levine, A., Prevelakis, V., Ioannidis, J., Ioannidis, S., Keromytis, A. D., “WebDAVA: An administrator-free approach to web file-sharing,” Proceedings of the IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE), Workshop on Distributed and Mobile Collaboration, pp. 59–64, 2003.
[5] Leung, A. W., Miller, E. L., Jones, S., “Scalable Security for Petascale Parallel File Systems,” Proceedings of the ACM/IEEE Conference on Supercomputing, Article No. 16, 2007.
[6] Regan, J., Jensen, C., “Capability file names: Separating authorization from user management in an Internet file system,” Proceedings of the USENIX Security Symposium, pp. 211–233, 2001.
[7] Crampton, J., Khambhammettu, H., “Delegation in role-based access control,” International Journal of Information Security, Vol. 7, No. 2, pp. 123–136, 2008.
[8] Tatebe, O., Hiraga, K., Soda, N., “Gfarm Grid File System,” New Generation Computing, Ohmsha, Ltd. and Springer, Vol. 28, No. 3, pp. 257–275, 2010.
[9] Weil, S. A., Brandt, S. A., Miller, E. L., Long, D. D. E., Maltzahn, C., “Ceph: A Scalable, High-Performance Distributed File System,” Proceedings of the 7th Symposium on Operating Systems Design and Implementation (OSDI), 2006.
