


IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 9, NO. 4, JULY/AUGUST 2012



Ensuring Distributed Accountability for Data Sharing in the Cloud
Smitha Sundareswaran, Anna C. Squicciarini, Member, IEEE, and Dan Lin
Abstract—Cloud computing enables highly scalable services to be easily consumed over the Internet on an as-needed basis. A major feature of cloud services is that users’ data are usually processed remotely on unknown machines that users do not own or operate. While users enjoy the convenience brought by this emerging technology, their fear of losing control of their own data (particularly financial and health data) can become a significant barrier to the wide adoption of cloud services. To address this problem, we propose a novel, highly decentralized information accountability framework to keep track of the actual usage of the users’ data in the cloud. In particular, we propose an object-centered approach that encloses our logging mechanism together with the users’ data and policies. We leverage the programmable capabilities of JAR files both to create a dynamic and traveling object and to ensure that any access to users’ data will trigger authentication and automated logging local to the JARs. To strengthen users’ control, we also provide distributed auditing mechanisms. We provide extensive experimental studies that demonstrate the efficiency and effectiveness of the proposed approaches.

Index Terms—Cloud computing, accountability, data sharing.

Cloud computing presents a new way to supplement the current consumption and delivery model for IT services based on the Internet, by providing for dynamically scalable and often virtualized resources as a service over the Internet. To date, there are a number of notable commercial and individual cloud computing services, including Amazon, Google, Microsoft, Yahoo, and Salesforce [19]. Details of the services provided are abstracted from the users, who no longer need to be experts in the technology infrastructure. Moreover, users may not know which machines actually process and host their data. While enjoying the convenience brought by this new technology, users also start worrying about losing control of their own data. The data processed on clouds are often outsourced, leading to a number of issues related to accountability, including the handling of personally identifiable information. Such fears are becoming a significant barrier to the wide adoption of cloud services [30].

To allay users’ concerns, it is essential to provide an effective mechanism for users to monitor the usage of their data in the cloud. For example, users need to be able to ensure that their data are handled according to the service-level agreements made at the time they sign on for services in the cloud. Conventional access control approaches developed for closed domains such as databases and operating systems, or approaches using a centralized server in distributed environments, are not suitable, due to the following features characterizing cloud environments. First, data handling can be outsourced by the direct cloud service provider (CSP) to other entities in the cloud, and these entities can also delegate the tasks to others, and so on. Second, entities are allowed to join and leave the cloud in a flexible manner. As a result, data handling in the cloud goes through a complex and dynamic hierarchical service chain which does not exist in conventional environments.

To overcome the above problems, we propose a novel approach, namely the Cloud Information Accountability (CIA) framework, based on the notion of information accountability [44]. Unlike privacy protection technologies, which are built on the hide-it-or-lose-it perspective, information accountability focuses on keeping the data usage transparent and trackable. Our proposed CIA framework provides end-to-end accountability in a highly distributed fashion. One of the main innovative features of the CIA framework lies in its ability to maintain lightweight and powerful accountability that combines aspects of access control, usage control, and authentication. By means of the CIA, data owners can not only track whether or not the service-level agreements are being honored, but also enforce access and usage control rules as needed. Associated with the accountability feature, we also develop two distinct modes for auditing: push mode and pull mode. The push mode refers to logs being periodically sent to the data owner or stakeholder, while the pull mode refers to an alternative approach whereby the user (or another authorized party) can retrieve the logs as needed.

The design of the CIA framework presents substantial challenges, including uniquely identifying CSPs, ensuring the reliability of the log, adapting to a highly decentralized infrastructure, etc. Our basic approach toward addressing these issues is to leverage and extend the programmable capability of JAR (Java ARchives) files to automatically log the usage of the users’ data by any entity in the cloud. Users will send their data along with any policies such as access control policies and logging policies that they want to

Published by the IEEE Computer Society


• S. Sundareswaran and A.C. Squicciarini are with the College of Information Sciences and Technology, The Pennsylvania State University, University Park, PA 16802. E-mail: {sus263, asquicciarini}
• D. Lin is with the Department of Computer Science, Missouri University of Science & Technology, Rolla, MO 65409. E-mail:
Manuscript received 30 June 2011; revised 31 Jan. 2012; accepted 17 Feb. 2012; published online 2 Mar. 2012.
For information on obtaining reprints of this article, please send e-mail to:, and reference IEEECS Log Number TDSC-2011-06-0169.
Digital Object Identifier no. 10.1109/TDSC.2012.26.
1545-5971/12/$31.00 © 2012 IEEE

enforce, enclosed in JAR files, to cloud service providers. Any access to the data will trigger an automated and authenticated logging mechanism local to the JARs. We refer to this type of enforcement as “strong binding” since the policies and the logging mechanism travel with the data. This strong binding exists even when copies of the JARs are created; thus, the user will have control over his data at any location. Such a decentralized logging mechanism meets the dynamic nature of the cloud but also imposes challenges on ensuring the integrity of the logging. To cope with this issue, we provide the JARs with a central point of contact which forms a link between them and the user. It records the error correction information sent by the JARs, which allows it to monitor the loss of any logs from any of the JARs. Moreover, if a JAR is not able to contact its central point, any access to its enclosed data will be denied.

Currently, we focus on image files since images represent a very common content type for end users and organizations (as is proven by the popularity of Flickr [14]) and are increasingly hosted in the cloud as part of the storage services offered by the utility computing paradigm featured by cloud computing. Further, images often reveal social and personal habits of users, or are used for archiving important files from organizations. In addition, our approach can handle personally identifiable information provided it is stored as image files (that is, files that contain an image of any textual content, for example, the SSN stored as a .jpg file).

We tested our CIA framework in a cloud testbed, the Emulab testbed [42], with Eucalyptus as middleware [41]. Our experiments demonstrate the efficiency, scalability and granularity of our approach. In addition, we also provide a detailed security analysis and discuss the reliability and strength of our architecture in the face of various nontrivial attacks, launched by malicious users or due to a compromised Java Runtime Environment (JRE).

In summary, our main contributions are as follows:
• We propose a novel automatic and enforceable logging mechanism in the cloud. To our knowledge, this is the first time a systematic approach to data accountability through the novel usage of JAR files is proposed.
• Our proposed architecture is platform independent and highly decentralized, in that it does not require any dedicated authentication or storage system in place.
• We go beyond traditional access control in that we provide a certain degree of usage control for the protected data after these are delivered to the receiver.
• We conduct experiments on a real cloud testbed. The results demonstrate the efficiency, scalability, and granularity of our approach. We also provide a detailed security analysis and discuss the reliability and strength of our architecture.

This paper is an extension of our previous conference paper [40]. We have made the following new contributions. First, we integrated integrity checks and the oblivious hashing (OH) technique into our system in order to strengthen the dependability of our system in case of a compromised JRE. We also updated the log records structure to provide additional guarantees of integrity and authenticity. Second, we extended the security analysis to cover more possible attack scenarios. Third, we report the results of new experiments and provide a thorough evaluation of the system performance. Fourth, we have added a detailed discussion on related works to give readers a better understanding of the background. Finally, we have improved the presentation by adding more examples and illustration graphs.

The rest of the paper is organized as follows: Section 2 discusses related work. Section 3 lays out our problem statement. Section 4 presents our proposed Cloud Information Accountability framework, and Sections 5 and 6 describe the detailed algorithms for the automated logging mechanism and auditing approaches, respectively. Section 7 presents a security analysis of our framework, followed by an experimental study in Section 8. Section 9 concludes the paper and outlines future research directions.

2 RELATED WORK
In this section, we first review related works addressing the privacy and security issues in the cloud. Then, we briefly discuss works which adopt techniques similar to our approach but serve different purposes.

2.1 Cloud Privacy and Security
Cloud computing has raised a range of important privacy and security issues [19], [25], [30]. Such issues are due to the fact that, in the cloud, users’ data and applications reside, at least for a certain amount of time, on the cloud cluster which is owned and maintained by a third party. Concerns arise since in the cloud it is not always clear to individuals why their personal information is requested or how it will be used or passed on to other parties. To date, little work has been done in this space, in particular with respect to accountability. Pearson et al. have proposed accountability mechanisms to address privacy concerns of end users [30] and then developed a privacy manager [31]. Their basic idea is that the user’s private data are sent to the cloud in an encrypted form, and the processing is done on the encrypted data. The output of the processing is deobfuscated by the privacy manager to reveal the correct result. However, the privacy manager provides only limited features in that it does not guarantee protection once the data are disclosed. In [7], the authors present a layered architecture for addressing the end-to-end trust management and accountability problem in federated systems. The authors’ focus is very different from ours, in that they mainly leverage trust relationships for accountability, along with authentication and anomaly detection. Further, their solution requires third-party services to complete the monitoring and focuses on lower level monitoring of system resources.

Researchers have investigated accountability mostly as a provable property through cryptographic mechanisms, particularly in the context of electronic commerce [10], [21]. A representative work in this area is given by [9]. The authors propose the usage of policies attached to the data and present a logic for accountability data in distributed settings. Similarly, Jagadeesan et al. recently proposed a logic for designing accountability-based distributed systems [20]. In [10], Crispo and Ruffo proposed an interesting approach

related to accountability in case of delegation. Delegation is complementary to our work, in that we do not aim at controlling the information workflow in the clouds. In summary, all these works stay at a theoretical level and do not include any algorithm for tasks like mandatory logging.

To the best of our knowledge, the only work proposing a distributed approach to accountability is from Lee and colleagues [22]. The authors have proposed an agent-based system specific to grid computing. Distributed jobs, along with the resource consumption at local machines, are tracked by static software agents. The notion of accountability policies in [22] is related to ours, but it is mainly focused on resource consumption and on tracking of subjobs processed at multiple computing nodes, rather than access control.

2.2 Other Related Techniques
With respect to Java-based techniques for security, our methods are related to self-defending objects (SDO) [17]. Self-defending objects are an extension of the object-oriented programming paradigm, where software objects that offer sensitive functions or hold sensitive data are responsible for protecting those functions/data. Similarly, we also extend the concepts of object-oriented programming. The key difference in our implementations is that the authors still rely on a centralized database to maintain the access records, while the items being protected are held as separate files. In previous work, we provided a Java-based approach to prevent privacy leakage from indexing [39], which could be integrated with the CIA framework proposed in this work since they build on related architectures.

In terms of authentication techniques, Appel and Felten [13] proposed the Proof-Carrying authentication (PCA) framework. The PCA includes a high order logic language that allows quantification over predicates, and focuses on access control for web services. While related to ours to the extent that it helps maintain safe, high-performance, mobile code, the PCA’s goal is highly different from our research, as it focuses on validating code, rather than monitoring content. Another work is by Mont et al., who proposed an approach for strongly coupling content with access control, using Identity-Based Encryption (IBE) [26]. We also leverage IBE techniques, but in a very different way. We do not rely on IBE to bind the content with the rules. Instead, we use it to provide strong guarantees for the encrypted content and the log files, such as protection against chosen plaintext and ciphertext attacks.

In addition, our work may look similar to works on secure data provenance [5], [6], [15], but in fact greatly differs from them in terms of goals, techniques, and application domains. Works on data provenance aim to guarantee data integrity by securing the data provenance. They ensure that no one can add or remove entries in the middle of a provenance chain without detection, so that data are correctly delivered to the receiver. Differently, our work is to provide data accountability, to monitor the usage of the data and ensure that any access to the data is tracked. Since it is in a distributed environment, we also log where the data go. However, this is not for verifying data integrity, but rather for auditing whether data receivers use the data following specified policies.

Along the lines of extended content protection, usage control [33] is being investigated as an extension of current access control mechanisms. Current efforts on usage control are primarily focused on conceptual analysis of usage control requirements and on languages to express constraints at various levels of granularity [32], [34]. While some notable results have been achieved in this respect [12], [34], thus far, there is no concrete contribution addressing the problem of usage constraints enforcement, especially in distributed settings [32]. The few existing solutions are partial [12], [28], [29], restricted to a single domain, and often specialized [3], [24], [46]. Finally, general outsourcing techniques have been investigated over the past few years [2], [38]. Although only [43] is specific to the cloud, some of the outsourcing protocols may also be applied in this realm. In this work, we do not cover issues of data storage security, which are a complementary aspect of the privacy issues.

3 PROBLEM STATEMENT
We begin this section by considering an illustrative example which serves as the basis of our problem statement and will be used throughout the paper to demonstrate the main features of our system.

Example 1. Alice, a professional photographer, plans to sell her photographs by using the SkyHigh Cloud Services. For her business in the cloud, she has the following requirements:
• Her photographs are downloaded only by users who have paid for her services.
• Potential buyers are allowed to view her pictures first before they make the payment to obtain the download right.
• Due to the nature of some of her works, only users from certain countries can view or download some sets of photographs.
• For some of her works, users are allowed to only view them for a limited time, so that the users cannot reproduce her work easily.
• In case any dispute arises with a client, she wants to have all the access information of that client.
• She wants to ensure that the cloud service providers of SkyHigh do not share her data with other service providers, so that the accountability provided for individual users can also be expected from the cloud service providers.

With the above scenario in mind, we identify the common requirements and develop several guidelines to achieve data accountability in the cloud. A user who subscribes to a certain cloud service usually needs to send his/her data as well as associated access control policies (if any) to the service provider. After the data are received by the cloud service provider, the service provider will have been granted access rights, such as read, write, and copy, on the data. Using conventional access control mechanisms, once the access rights are granted, the data will be fully available at the service provider. In order to track the actual usage of the

data, we aim to develop novel logging and auditing techniques which satisfy the following requirements:
1. The logging should be decentralized in order to adapt to the dynamic nature of the cloud. More specifically, log files should be tightly bound to the corresponding data being controlled, and require minimal infrastructural support from any server.
2. Every access to the user’s data should be correctly and automatically logged. This requires integrated techniques to authenticate the entity who accesses the data, and to verify and record the actual operations on the data as well as the time that the data have been accessed.
3. Log files should be reliable and tamper proof to avoid illegal insertion, deletion, and modification by malicious parties. Recovery mechanisms are also desirable to restore damaged log files caused by technical problems.
4. Log files should be sent back to their data owners periodically to inform them of the current usage of their data. More importantly, log files should be retrievable anytime by their data owners when needed, regardless of the location where the files are stored.
5. The proposed technique should not intrusively monitor data recipients’ systems, nor should it introduce heavy communication and computation overhead, which otherwise will hinder its feasibility and adoption in practice.

4 CLOUD INFORMATION ACCOUNTABILITY
In this section, we present an overview of the Cloud Information Accountability framework and discuss how the CIA framework meets the design requirements discussed in the previous section.

The Cloud Information Accountability framework proposed in this work conducts automated logging and distributed auditing of relevant access performed by any entity, carried out at any point of time at any cloud service provider. It has two major components: logger and log harmonizer.

4.1 Major Components
There are two major components of the CIA, the first being the logger, and the second being the log harmonizer. The logger is the component which is strongly coupled with the user’s data, so that it is downloaded when the data are accessed, and is copied whenever the data are copied. It handles a particular instance or copy of the user’s data and is responsible for logging access to that instance or copy. The log harmonizer forms the central component which allows the user access to the log files.

The logger is strongly coupled with the user’s data (either single or multiple data items). Its main tasks include automatically logging access to data items that it contains, encrypting the log record using the public key of the content owner, and periodically sending them to the log harmonizer. It may also be configured to ensure that access and usage control policies associated with the data are honored. For example, a data owner can specify that user X is only allowed to view but not to modify the data. The logger will control the data access even after it is downloaded by user X.

The logger requires only minimal support from the server (e.g., a valid Java virtual machine installed) in order to be deployed. The tight coupling between data and logger results in a highly distributed logging system, therefore meeting our first design requirement. Furthermore, since the logger does not need to be installed on any system or require any special support from the server, it is not very intrusive in its actions, thus satisfying our fifth requirement. Finally, the logger is also responsible for generating the error correction information for each log record and sending it to the log harmonizer. The error correction information combined with the encryption and authentication mechanism provides a robust and reliable recovery mechanism, therefore meeting the third requirement.

The log harmonizer is responsible for auditing. Being the trusted component, the log harmonizer generates the master key. It holds on to the decryption key for the IBE key pair, as it is responsible for decrypting the logs. Alternatively, the decryption can be carried out on the client end if the path between the log harmonizer and the client is not trusted. In this case, the harmonizer sends the key to the client in a secure key exchange.

It supports two auditing strategies: push and pull. Under the push strategy, the log file is pushed back to the data owner periodically in an automated fashion. The pull mode is an on-demand approach, whereby the log file is obtained by the data owner as often as requested. These two modes allow us to satisfy the aforementioned fourth design requirement. In case there exist multiple loggers for the same set of data items, the log harmonizer will merge log records from them before sending them back to the data owner. The log harmonizer is also responsible for handling log file corruption. In addition, the log harmonizer can itself carry out logging in addition to auditing. Separating the logging and auditing functions improves the performance. The logger and the log harmonizer are both implemented as lightweight and portable JAR files. The JAR file implementation provides automatic logging functions, which meets the second design requirement.

4.2 Data Flow
The overall CIA framework, combining data, users, logger and harmonizer, is sketched in Fig. 1. At the beginning, each user creates a pair of public and private keys based on Identity-Based Encryption [4] (step 1 in Fig. 1). This IBE scheme is a Weil-pairing-based IBE scheme, which protects us against one of the most prevalent attacks to our architecture as described in Section 7. Using the generated key, the user will create a logger component, which is a JAR file, to store its data items.

The JAR file includes a set of simple access control rules specifying whether and how the cloud servers, and possibly other data stakeholders (users, companies), are authorized to access the content itself. Then, he sends the JAR file to the cloud service provider that he subscribes to. To authenticate the CSP to the JAR (steps 3-5 in Fig. 1), we use OpenSSL-based certificates, wherein a trusted certificate authority certifies the CSP. In the event that the access is requested by a user, we employ SAML-based authentication [8], wherein a trusted identity provider issues certificates verifying the user’s identity based on his username.
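The merging step and the two auditing modes that Section 4.1 attributes to the log harmonizer can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the authors' Java implementation: the class name `LogHarmonizer`, the dictionary record layout, the logger names, and the one-hour push period are all hypothetical, records are assumed to be already decrypted, and each logger is assumed to deliver its records in time order.

```python
import heapq
from datetime import datetime, timedelta


def merge_logs(*logger_logs):
    """Merge several per-logger record lists into one time-ordered list.

    Assumes every input list is already sorted by its "time" field.
    """
    return list(heapq.merge(*logger_logs, key=lambda rec: rec["time"]))


class LogHarmonizer:
    """Hypothetical stand-in for the harmonizer's auditing role."""

    def __init__(self, push_period):
        self.push_period = push_period
        self.logger_logs = {}   # logger name -> records from that JAR copy
        self.last_push = None

    def receive(self, logger_name, records):
        # Records sent by one logger (one copy of the JAR).
        self.logger_logs.setdefault(logger_name, []).extend(records)

    def pull(self):
        # Pull mode: the owner asks for the merged log on demand.
        return merge_logs(*self.logger_logs.values())

    def push_if_due(self, now, deliver):
        # Push mode: hand the merged log to the owner once per period.
        if self.last_push is None or now - self.last_push >= self.push_period:
            deliver(self.pull())
            self.last_push = now
            return True
        return False


harmonizer = LogHarmonizer(push_period=timedelta(hours=1))
harmonizer.receive("jar-copy-1", [
    {"id": "Kronos", "action": "View", "time": "2011-05-29 16:52:30"},
    {"id": "Kronos", "action": "View", "time": "2011-05-29 18:00:00"},
])
harmonizer.receive("jar-copy-2", [
    {"id": "Kronos", "action": "Download", "time": "2011-05-29 17:10:00"},
])
for record in harmonizer.pull():
    print(record["time"], record["id"], record["action"])
```

Keeping `pull` as the single merge path and letting `push_if_due` call it means both modes return the same view of the logs; a real deployment would also have to decrypt the records and handle the error correction information, which this sketch leaves out.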

Once the authentication succeeds, the service provider (or the user) will be allowed to access the data enclosed in the JAR. Depending on the configuration settings defined at the time of creation, the JAR will provide usage control associated with logging, or will provide only logging functionality. As for the logging, each time there is an access to the data, the JAR will automatically generate a log record, encrypt it using the public key distributed by the data owner, and store it along with the data (step 6 in Fig. 1). The encryption of the log file prevents unauthorized changes to the file by attackers. The data owner could opt to reuse the same key pair for all JARs or create different key pairs for separate JARs. Using separate keys can enhance the security (detailed discussion is in Section 7) without introducing any overhead except in the initialization phase. In addition, some error correction information will be sent to the log harmonizer to handle possible log file corruption (step 7 in Fig. 1). To ensure trustworthiness of the logs, each record is signed by the entity accessing the content. Further, individual records are hashed together to create a chain structure, which makes it possible to quickly detect errors or missing records. The encrypted log files can later be decrypted and their integrity verified. They can be accessed by the data owner or other authorized stakeholders at any time for auditing purposes with the aid of the log harmonizer (step 8 in Fig. 1).

Fig. 1. Overview of the cloud information accountability framework.

As discussed in Section 7, our proposed framework prevents various attacks, for example by detecting illegal copies of users’ data. Note that our work is different from traditional logging methods which use encryption to protect log files. With only encryption, their logging mechanisms are neither automatic nor distributed. They require the data to stay within the boundaries of the centralized system for the logging to be possible, which is however not suitable in the cloud.

Example 2. Considering Example 1, Alice can enclose her photographs and access control policies in a JAR file and send the JAR file to the cloud service provider. With the aid of control associated logging (called AccessLog in Section 5.2), Alice will be able to enforce the first four requirements and record the actual data access. On a regular basis, the push-mode auditing mechanism will inform Alice about the activity on each of her photographs, as this allows her to keep track of her clients’ demographics and the usage of her data by the cloud service provider. In the event of some dispute with her clients, Alice can rely on the pull-mode auditing mechanism to obtain log records.

5 AUTOMATED LOGGING MECHANISM
In this section, we first elaborate on the automated logging mechanism and then present techniques to guarantee dependability.

5.1 The Logger Structure
We leverage the programmable capability of JARs to conduct automated logging. A logger component is a nested Java JAR file which stores a user’s data items and corresponding log files. As shown in Fig. 2, our proposed JAR file consists of one outer JAR enclosing one or more inner JARs.

Fig. 2. The structure of the JAR file.
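The chained log structure outlined in Section 4.2, where each record is signed and records are hashed together so that errors or missing records can be detected, can be illustrated with a small sketch. It is not the paper's implementation and rests on several assumptions: SHA-256 stands in for the collision-free hash function, an HMAC with a per-entity key stands in for the accessing entity's signature (a deployed system would use a public-key signature), records are left unencrypted, fields are assumed not to contain the `|` separator, and the entity name `Atlas` and all helper names are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LogRecord:
    entity_id: str   # ID
    action: str      # Act
    time: str        # T
    location: str    # Loc
    checksum: str    # hash over this record's content and all earlier records
    signature: str   # HMAC stand-in for the accessing entity's signature

    def content(self):
        return "|".join((self.entity_id, self.action, self.time, self.location))


def chain_checksum(content, previous):
    """Hash the new content together with every preceding record."""
    digest = hashlib.sha256(content.encode("utf-8"))
    for rec in reversed(previous):   # r_{i-1}, ..., r_1
        digest.update(("|" + rec.content() + "|" + rec.checksum).encode("utf-8"))
    return digest.hexdigest()


def sign(key, content, checksum):
    message = (content + "|" + checksum).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def append_record(log, key, entity_id, action, time, location):
    content = "|".join((entity_id, action, time, location))
    checksum = chain_checksum(content, log)
    signature = sign(key, content, checksum)
    log.append(LogRecord(entity_id, action, time, location, checksum, signature))


def first_invalid(log, keys):
    """Return the index of the first record that fails verification, or -1."""
    for i, rec in enumerate(log):
        expected = chain_checksum(rec.content(), log[:i])
        key = keys.get(rec.entity_id, b"")
        good_sig = hmac.compare_digest(rec.signature, sign(key, rec.content(), expected))
        if rec.checksum != expected or not good_sig:
            return i
    return -1


keys = {"Kronos": b"demo-key-1", "Atlas": b"demo-key-2"}
log = []
append_record(log, keys["Kronos"], "Kronos", "View", "2011-05-29 16:52:30", "USA")
append_record(log, keys["Atlas"], "Atlas", "Download", "2011-05-29 17:00:00", "USA")
append_record(log, keys["Kronos"], "Kronos", "View", "2011-05-30 09:00:00", "USA")

tampered = list(log)
tampered[1] = replace(log[1], action="View")   # alter one field of the second record
print(first_invalid(log, keys), first_invalid(tampered, keys))
```

Because every checksum covers all earlier records, changing or removing one record invalidates that position and everything after it, which is what lets an auditor locate the first point of damage; the cost is that this naive version rehashes the whole prefix for each record.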

The main responsibility of the outer JAR is to handle authentication of entities which want to access the data stored in the JAR file. In our context, the data owners may not know the exact CSPs that are going to handle the data. Hence, authentication is specified according to the servers’ functionality (which we assume to be known through a lookup service), rather than the server’s URL or identity. For example, a policy may state that Server X is allowed to download the data if it is a storage server. As discussed below, the outer JAR may also have the access control functionality to enforce the data owner’s requirements, specified as Java policies, on the usage of the data. A Java policy specifies which permissions are available for a particular piece of code in a Java application environment. The permissions expressed in the Java policy are in terms of File System Permissions. However, the data owner can specify the permissions in user-centric terms as opposed to the usual code-centric security offered by Java, using Java Authentication and Authorization Services. Moreover, the outer JAR is also in charge of selecting the correct inner JAR according to the identity of the entity who requests the data.

Example 3. Consider Example 1. Suppose that Alice’s photographs are classified into three categories according to the locations where the photos were taken. The three groups of photos are stored in three inner JARs J1, J2, and J3, respectively, associated with different access control policies. If some entities are allowed to access only one group of the photos, say J1, the outer JAR will just render the corresponding inner JAR to the entity based on the policy evaluation result.

Each inner JAR contains the encrypted data, class files to facilitate retrieval of log files and display enclosed data in a suitable format, and a log file for each encrypted item. We support two options:
• PureLog. Its main task is to record every access to the data. The log files are used for pure auditing purposes.
• AccessLog. It has two functions: logging actions and enforcing access control. In case an access request is denied, the JAR will record the time when the request is made. If the access request is granted, the JAR will additionally record the access information along with the duration for which the access is allowed.

The two kinds of logging modules allow the data owner to enforce certain access conditions either proactively (in case of AccessLogs) or reactively (in case of PureLogs). For example, services like billing may just need to use PureLogs. AccessLogs will be necessary for services which need to enforce service-level agreements such as limiting the visibility to some sensitive content at a given location.

To carry out these functions, the inner JAR contains a class file for writing the log records, another class file which corresponds with the log harmonizer, the encrypted data, a third class file for displaying or downloading the data (based on whether we have a PureLog or an AccessLog), and the public key of the IBE key pair that is necessary for encrypting the log records. No secret keys are ever stored in the system. The outer JAR may contain one or more inner JARs, in addition to a class file for authenticating the servers or the users, another class file for finding the correct inner JAR, and a third class file which checks the JVM’s validity using oblivious hashing. Further, a class file is used for managing the GUI for user authentication and the Java Policy.

5.2 Log Record Generation
Log records are generated by the logger component. Logging occurs at any access to the data in the JAR, and new log entries are appended sequentially, in order of creation, $LR = \langle r_1, \ldots, r_k \rangle$. Each record $r_i$ is encrypted individually and appended to the log file. In particular, a log record takes the following form:

$r_i = \langle ID, Act, T, Loc, h((ID, Act, T, Loc) \mid r_{i-1} \mid \ldots \mid r_1), sig \rangle$.

Here, $r_i$ indicates that an entity identified by $ID$ has performed an action $Act$ on the user’s data at time $T$ at location $Loc$. The component $h((ID, Act, T, Loc) \mid r_{i-1} \mid \ldots \mid r_1)$ corresponds to the checksum of the records preceding the newly inserted one, concatenated with the main content of the record itself (we use $\mid$ to denote concatenation). The checksum is computed using a collision-free hash function [37]. The component $sig$ denotes the signature of the record created by the server. If more than one file is handled by the same logger, an additional $ObjID$ field is added to each record. An example of a log record for a single file is shown below.

Example 4. Suppose that a cloud service provider with ID Kronos, located in USA, read the image in a JAR file (but did not download it) at 4:52 pm on May 20, 2011. The corresponding log record is $\langle$Kronos, View, 2011-05-29 16:52:30, USA, 45rftT024g, r94gm30130ff$\rangle$. The location is converted from the IP address for improved readability.

To ensure the correctness of the log records, we verify the access time, locations as well as actions. In particular, the time of access is determined using the Network Time Protocol (NTP) [35] to avoid suppression of the correct time by a malicious entity. The location of the cloud service provider can be determined using the IP address. The JAR can perform an IP lookup and use the range of the IP address to find the most probable location of the CSP. More advanced techniques for determining location can also be used [16]. Similarly, if a trusted time stamp management infrastructure can be set up or leveraged, it can be used to record the time stamp in the accountability log [1]. The most critical part is to log the actions on the users’ data. In the current system, we support four types of actions, i.e., Act has one of the following four values: view, download, timed_access, and Location-based_access. For each action, we propose a specific method to correctly record or enforce it depending on the type of the logging module, which are elaborated as follows:
• View. The entity (e.g., the cloud service provider) can only read the data but is not allowed to save a raw copy of it anywhere permanently. For this type of action, the PureLog will simply write a log record about the access, while the AccessLogs will enforce the action through the enclosed access control module. Recall that the data are encrypted and

stored in the inner JAR. When there is a view-only access request, the inner JAR will decrypt the data on the fly and create a temporary decrypted file. The decrypted file will then be displayed to the entity using the Java application viewer in case the file is displayed to a human user. Presenting the data in the Java application viewer disables the copying functions using right click or other hot keys such as PrintScreen. Further, to prevent the use of some screen capture software, the data will be hidden whenever the application viewer screen is out of focus. The content is displayed using the headless mode in Java on the command line when it is presented to a CSP.
- Download. The entity is allowed to save a raw copy of the data; no control is retained over this copy, and no log records are kept regarding access to it. If PureLog is adopted, the user’s data will be directly downloadable in a pure form using a link. When an entity clicks this download link, the JAR file associated with the data will decrypt the data and give it to the entity in raw form. In case of AccessLogs, the entire JAR file will be given to the entity. If the entity is a human user, he/she just needs to double click the JAR file to obtain the data. If the entity is a CSP, it can run a simple script to execute the JAR.
- Timed_access. This action is combined with the view-only access, and it indicates that the data are made available only for a certain period of time. The PureLog will just record the access starting time and its duration, while the AccessLog will enforce that the access is allowed only within the specified period of time. The duration for which the access is allowed is calculated using the Network Time Protocol. To enforce the limit on the duration, the AccessLog records the start time using the NTP, and then uses a timer to stop the access. Naturally, this type of access can be enforced only when it is combined with the View access right and not when it is combined with the Download.
- Location-based_access. In this case, the PureLog will record the location of the entities. The AccessLog will verify the location for each such access. The access is granted and the data are made available only to entities located at locations specified by the data owner.

5.3 Dependability of Logs

In this section, we discuss how we ensure the dependability of logs. In particular, we aim to prevent the following two types of attacks. First, an attacker may try to evade the auditing mechanism by storing the JARs remotely, corrupting the JAR, or trying to prevent them from communicating with the user. Second, the attacker may try to compromise the JRE used to run the JAR files.

5.3.1 JARs Availability

To protect against attacks perpetrated on offline JARs, the CIA includes a log harmonizer which has two main responsibilities: to deal with copies of JARs and to recover corrupted logs.

Each log harmonizer is in charge of copies of logger components containing the same set of data items. The harmonizer is implemented as a JAR file. It does not contain the user’s data items being audited, but consists of class files for both a server and a client process to allow it to communicate with its logger components. The harmonizer stores error correction information sent from its logger components, as well as the user’s IBE decryption key, to decrypt the log records and handle any duplicate records. Duplicate records result from copies of the user’s data JARs. Since the user’s data are strongly coupled with the logger component in a data JAR file, the logger will be copied together with the user’s data. Consequently, the new copy of the logger contains the old log records with respect to the usage of data in the original data JAR file. Such old log records are redundant and irrelevant to the new copy of the data. To present the data owner an integrated view, the harmonizer will merge log records from all copies of the data JARs by eliminating redundancy.

For recovery purposes, logger components are required to send error correction information to the harmonizer after writing each log record. Therefore, logger components always ping the harmonizer before they grant any access right. If the harmonizer is not reachable, the logger components will deny all access. In this way, the harmonizer helps prevent attacks which attempt to keep the data JARs offline for unnoticed usage. If the attacker took the data JAR offline after the harmonizer was pinged, the harmonizer still has the error correction information about this access and will quickly notice the missing record.

In case of corruption of JAR files, the harmonizer will recover the logs with the aid of the Reed-Solomon error correction code [45]. Specifically, each individual logging JAR, when created, contains a Reed-Solomon-based encoder. For every n symbols in the log file, n redundancy symbols are added to the log harmonizer in the form of bits. This creates an error correcting code of size 2n, which allows up to n symbol errors to be detected and up to n erased symbols (or n/2 symbol errors at unknown positions) to be corrected. We choose the Reed-Solomon code as it achieves equality in the Singleton Bound [36], making it a maximum distance separable code and hence leading to optimal error correction.

The log harmonizer is located at a known IP address. Typically, the harmonizer resides at the user’s end as part of his local machine, or alternatively, it can either be stored in a user’s desktop or in a proxy server.

5.3.2 Log Correctness

For the logs to be correctly recorded, it is essential that the JRE of the system on which the logger components are running remain unmodified. To verify the integrity of the logger component, we rely on a two-step process: 1) we repair the JRE before the logger is launched and any kind of access is given, so as to provide guarantees of integrity of the JRE; 2) we insert hash codes, which calculate the hash values of the program traces of the modules being executed by the logger component. This helps us detect modifications of the JRE once the logger component has been launched, and the hash values are useful to verify if the original code flow of execution is altered.

These tasks are carried out by the log harmonizer and the logger components in tandem with each other. The log

harmonizer is solely responsible for checking the integrity of the JRE on the systems on which the logger components exist before the execution of the logger components is started. Trusting this task to the log harmonizer allows us to remotely validate the system on which our infrastructure is working. The repair step is itself a two-step process where the harmonizer first recognizes the Operating System being used by the cloud machine and then tries to reinstall the JRE. The OS is identified using nmap commands. The JRE is reinstalled using commands such as sudo apt install for Linux-based systems or $ <jre>.exe [lang=] [s] [IEXPLORER=1] [MOZILLA=1] [INSTALLDIR=:] [STATIC=1] for Windows-based systems.

The logger and the log harmonizer work in tandem to carry out the integrity checks during runtime. These integrity checks are carried out using oblivious hashing (OH) [11]. OH works by adding additional hash codes into the programs being executed. The hash function is initialized at the beginning of the program, the hash value of the result variable is cleared, and the hash value is updated every time there is a variable assignment, branching, or looping. An example of how the hashing transforms the code is shown in Fig. 3.

Fig. 3. Oblivious hashing applied to the logger.

As shown, the hash code captures the computation results of each instruction and computes the oblivious-hash value as the computation proceeds. These hash codes are added to the logger components when they are created. They are present in both the inner and outer JARs. The log harmonizer stores the values for the hash computations. The values computed during execution are sent to it by the logger components. The log harmonizer proceeds to match these values against each other to verify if the JRE has been tampered with. If the JRE is tampered with, the execution values will not match. Adding OH to the logger components also adds an additional layer of security to them, in that any tampering of the logger components will also result in the OH values being corrupted.

6 END-TO-END AUDITING MECHANISM

In this section, we describe our distributed auditing mechanism including the algorithms for data owners to query the logs regarding their data.

6.1 Push and Pull Mode

To allow users to be timely and accurately informed about their data usage, our distributed logging mechanism is complemented by an innovative auditing mechanism. We support two complementary auditing modes: 1) push mode; 2) pull mode.

Push mode. In this mode, the logs are periodically pushed to the data owner (or auditor) by the harmonizer. The push action will be triggered by either of the following two events: one is that a certain period of time elapses according to the temporal timer inserted as part of the JAR file; the other is that the JAR file exceeds the size stipulated by the content owner at the time of creation. After the logs are sent to the data owner, the log files will be dumped, so as to free the space for future access logs. Along with the log files, the error correcting information for those logs is also dumped.

This push mode is the basic mode which can be adopted by both the PureLog and the AccessLog, regardless of whether there is a request from the data owner for the log files. This mode serves two essential functions in the logging architecture: 1) it ensures that the size of the log files does not explode and 2) it enables timely detection and correction of any loss or damage to the log files.

Concerning the latter function, we notice that the auditor, upon receiving the log file, will verify its cryptographic guarantees by checking the records’ integrity and authenticity. By construction of the records, the auditor will be able to quickly detect forgery of entries, using the checksum added to each and every record.

Pull mode. This mode allows auditors to retrieve the logs anytime when they want to check the recent access to their own data. The pull message consists simply of an FTP pull command, which can be issued from the command line. For naive users, a wizard comprising a batch file can be easily built. The request will be sent to the harmonizer, and the user will be informed of the data’s locations and obtain an integrated copy of the authentic and sealed log file.

6.2 Algorithms

Pushing or pulling strategies have interesting tradeoffs. The pushing strategy is beneficial when there are a large number of accesses to the data within a short period of time. In this case, if the data are not pushed out frequently enough, the log file may become very large, which may increase the cost of operations like copying data (see Section 8). The pushing mode may be preferred by data owners who are organizations and need to keep track of the data usage consistently over time. For such data owners, receiving the logs automatically can lighten the load of the data analyzers. The maximum size at which logs are pushed out is a parameter which can be easily configured while creating the logger component. The pull strategy is most needed when the data owner suspects some misuse of his data; the pull mode allows him to monitor the usage of his content immediately. A hybrid strategy can actually be implemented to benefit from the consistent information offered by the pushing mode and the convenience of the pull mode. Further, as discussed in Section 7, supporting both pushing and pulling modes helps protect from some nontrivial attacks.
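
The checksum that each log record carries (Section 5.2) is what lets the auditor detect forged or altered entries. The following is a minimal Java sketch of that check, not the authors' implementation. It assumes SHA-256 as the collision-free hash, represents records as plain strings, and omits the sig field and the IBE encryption. The class ChainedLogSketch, its methods, and the second sample record are hypothetical names and values introduced only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the per-record checksum; signature and encryption are omitted. */
class ChainedLogSketch {
    /** One record: main content (ID, Act, T, Loc) plus the checksum over it and all earlier records. */
    static final class Entry {
        final String content;
        final String checksum;
        Entry(String content, String checksum) { this.content = content; this.checksum = checksum; }
        String serialized() { return content + "," + checksum; }
    }

    final List<Entry> entries = new ArrayList<>();

    /** Hex-encoded SHA-256 (an assumption; the paper only requires a collision-free hash). */
    static String sha256Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(input.getBytes(StandardCharsets.UTF_8))) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Computes h((ID,Act,T,Loc) | r_{i-1} | ... | r_1) using the first 'count' entries of the log. */
    static String checksum(String content, List<Entry> log, int count) {
        StringBuilder sb = new StringBuilder(content);
        for (int j = count - 1; j >= 0; j--) {
            sb.append('|').append(log.get(j).serialized());
        }
        return sha256Hex(sb.toString());
    }

    /** Appends a new record whose checksum covers its content and every preceding record. */
    void append(String id, String act, String time, String loc) {
        String content = id + "," + act + "," + time + "," + loc;
        entries.add(new Entry(content, checksum(content, entries, entries.size())));
    }

    /** Returns the index of the first record whose checksum does not match, or -1 if all match. */
    static int firstInvalid(List<Entry> log) {
        for (int i = 0; i < log.size(); i++) {
            if (!checksum(log.get(i).content, log, i).equals(log.get(i).checksum)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ChainedLogSketch log = new ChainedLogSketch();
        log.append("Kronos", "View", "2011-05-29 16:52:30", "USA");
        log.append("Kronos", "Download", "2011-05-29 17:10:02", "USA"); // hypothetical second access
        System.out.println(firstInvalid(log.entries)); // prints -1 (log is intact)

        // Forge the first record's action while reusing its old checksum.
        Entry forged = new Entry("Kronos,Download,2011-05-29 16:52:30,USA",
                                 log.entries.get(0).checksum);
        log.entries.set(0, forged);
        System.out.println(firstInvalid(log.entries)); // prints 0 (forgery detected)
    }
}
```

Because every checksum covers all earlier records, altering or removing one record also invalidates the checksums of the records that follow it, which is why a single pass over the log is enough for the auditor.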

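The oblivious hashing idea from Section 5.3.2 (a hash that is initialized at program start and updated at every assignment and branch) can be illustrated with a small Java sketch. This is only an illustration under stated assumptions: the class ObliviousHashSketch, the method traceHash, the polynomial mixing step, and the tampered flag that simulates an altered branch are all hypothetical and are not taken from the paper or from [11].

```java
/** Hypothetical sketch of oblivious hashing; names and the mixing function are illustrative only. */
class ObliviousHashSketch {
    private long hash;

    /** Folds one observed value (an assignment result or a branch outcome) into the running hash. */
    private void absorb(long value) {
        hash = hash * 31 + value; // simple polynomial mixing, chosen only for illustration
    }

    /**
     * Instrumented routine: sums the positive entries and returns the hash of its own
     * execution trace. The 'tampered' flag simulates an attacker forcing the branch.
     */
    long traceHash(int[] values, boolean tampered) {
        hash = 0;                 // hash initialized at the beginning of the program
        int total = 0;
        absorb(total);            // variable assignment
        for (int v : values) {
            boolean take = tampered || v > 0;
            absorb(take ? 1 : 0); // branching outcome
            if (take) {
                total += v;
                absorb(total);    // variable assignment
            }
        }
        return hash;
    }

    public static void main(String[] args) {
        int[] input = {3, -2, 5};
        ObliviousHashSketch logger = new ObliviousHashSketch();
        long expected = logger.traceHash(input, false); // value the harmonizer would store
        long honest = logger.traceHash(input, false);   // value reported by an unmodified run
        long altered = logger.traceHash(input, true);   // value reported by a tampered run
        System.out.println(expected == honest);  // prints true
        System.out.println(expected == altered); // prints false
    }
}
```

In this sketch the comparison in main plays the role of the log harmonizer, which matches the value reported at run time against the stored one and treats a mismatch as evidence of tampering.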
The log retrieval algorithm for the Push and Pull modes is outlined in Fig. 4.

Fig. 4. Push and pull PureLog mode.

The algorithm presents logging and synchronization steps with the harmonizer in case of PureLog. First, the algorithm checks whether the size of the JAR has exceeded a stipulated size or the normal time between two consecutive dumps has elapsed. The size and time threshold for a dump are specified by the data owner at the time of creation of the JAR. The algorithm also checks whether the data owner has requested a dump of the log files. If none of these events has occurred, it proceeds to encrypt the record and write the error-correction information to the harmonizer.

The communication with the harmonizer begins with a simple handshake. If no response is received, the log file records an error. The data owner is then alerted through e-mails, if the JAR is configured to send error notifications. Once the handshake is completed, the communication with the harmonizer proceeds, using a TCP/IP protocol. If any of the aforementioned events (i.e., there is a request of the log file, or the size or time exceeds the threshold) has occurred, the JAR simply dumps the log files and resets all the variables, to make space for new records.

In case of AccessLog, the above algorithm is modified by adding an additional check after step 6. Precisely, the AccessLog checks whether the CSP accessing the log satisfies all the conditions specified in the policies pertaining to it. If the conditions are satisfied, access is granted; otherwise, access is denied. Irrespective of the access control outcome, the attempted access to the data in the JAR file will be logged.

Our auditing mechanism has two main advantages. First, it guarantees a high level of availability of the logs. Second, the use of the harmonizer minimizes the amount of workload for human users in going through long log files sent by different copies of JAR files. For a better understanding of the auditing mechanism, we present the following example.

Example 5. With reference to Example 1, Alice can specify that she wants to receive the log files once every week, as it will allow her to monitor the accesses to her photographs. Under this setting, once every week the JAR files will communicate with the harmonizer by pinging it. Once the ping is successful, the file transfer begins. On receiving the files, the harmonizer merges the logs and sends them to Alice. Besides receiving log information once every week, Alice can also request the log file anytime when needed. In this case, she just needs to send her pull request to the harmonizer, which will then ping all the other JARs with the “pull” variable set to 1. Once the message from the harmonizer is received, the JARs start transferring the log files back to the harmonizer.

7 SECURITY DISCUSSION

We now analyze possible attacks to our framework. Our analysis is based on a semihonest adversary model by assuming that a user does not release his master keys to unauthorized parties, while the attacker may try to learn extra information from the log files. We assume that attackers may have sufficient Java programming skills to disassemble a JAR file and prior knowledge of our CIA architecture. We first assume that the JVM is not corrupted, followed by a discussion on how to ensure that this assumption holds true.

7.1 Copying Attack

The most intuitive attack is that the attacker copies entire JAR files. The attacker may assume that doing so allows accessing the data in the JAR file without being noticed by the data owner. However, such an attack will be detected by our auditing mechanism. Recall that every JAR file is required to send log records to the harmonizer. In particular, with the push mode, the harmonizer will send the logs to data owners periodically. That is, even if the data

owner is not aware of the existence of the additional copies of its JAR files, he will still be able to receive log files from all existing copies. If attackers move copies of JARs to places where the harmonizer cannot connect, the copies of JARs will soon become inaccessible. This is because each JAR is required to write redundancy information to the harmonizer periodically. If the JAR cannot contact the harmonizer, the access to the content in the JAR will be disabled. Thus, the logger component provides more transparency than conventional log file encryption; it allows the data owner to detect when an attacker has created copies of a JAR, and it makes offline files inaccessible.

7.2 Disassembling Attack

Another possible attack is to disassemble the JAR file of the logger and then attempt to extract useful information out of it or spoil the log records in it. Given the ease of disassembling JAR files, this attack poses one of the most serious threats to our architecture. Since we cannot prevent an attacker from gaining possession of the JARs, we rely on the strength of the cryptographic schemes applied to preserve the integrity and confidentiality of the logs.

Once the JAR files are disassembled, the attacker is in possession of the public IBE key used for encrypting the log files, the encrypted log file itself, and the *.class files. Therefore, the attacker has to rely on learning the private key or subverting the encryption to read the log records.

To compromise the confidentiality of the log files, the attacker may try to identify which encrypted log records correspond to his actions by mounting a chosen plaintext attack to obtain some pairs of encrypted log records and plain texts. However, the adoption of the Weil Pairing algorithm ensures that the CIA framework has both chosen ciphertext security and chosen plaintext security in the random oracle model [4]. Therefore, the attacker will not be able to decrypt any data or log files in the disassembled JAR file. Even if the attacker is an authorized user, he can only access the actual content file, but he is not able to decrypt any other data including the log files, which are viewable only to the data owner (see footnote 1). From the disassembled JAR files, the attackers are not able to directly view the access control policies either, since the original source code is not included in the JAR files. If the attacker wants to infer access control policies, the only possible way is through analyzing the log file. This is, however, very hard to accomplish since, as mentioned earlier, log records are encrypted and breaking the encryption is computationally hard.

Footnote 1. Notice that we do not consider the attack on the log harmonizer component, since it is stored separately in either a secure proxy or at the user end and the attacker typically cannot access it. As a result, we assume that the attacker cannot extract the decryption keys from the log harmonizer.

Also, the attacker cannot modify the log files extracted from a disassembled JAR. Should the attacker erase or tamper with a record, the integrity checks added to each record of the log will not match at the time of verification (see Section 5.2 for the record structure and hash chain), revealing the error. Similarly, attackers will not be able to write fake records to log files without being detected, since they will need to sign with a valid key and the chain of hashes will not match. In addition, with the Reed-Solomon encoding used to create the redundancy for the log files, the log harmonizer can easily detect a corrupted record or log file.

Finally, the attacker may try to modify the Java classloader in the JARs in order to subvert the class files when they are being loaded. This attack is prevented by the sealing techniques offered by Java. Sealing ensures that all packages within the JAR file come from the same source code [27]. Sealing is one of the Java properties, which allows creating a signature that does not allow the code inside the JAR file to be changed. More importantly, this attack is stopped as the JARs check the classloader each time before granting any access right. If the classloader is found to be a custom classloader, the JARs will throw an exception and halt. Further, JAR files are signed for integrity at the time of creation, to prevent an attacker from writing to the JAR. Even if an attacker can read from it by disassembling it, he cannot “reassemble” it with modified packages. In case the attacker guesses or learns the data owner’s key from somewhere, all the JAR files using the same key will be compromised. Thus, using different IBE key pairs for different JAR files will be more secure and prevent such an attack.

7.3 Man-in-the-Middle Attack

An attacker may intercept messages during the authentication of a service provider with the certificate authority, and replay the messages in order to masquerade as a legitimate service provider. There are two points in time at which the attacker can replay the messages. One is after the actual service provider has completely disconnected and ended a session with the certificate authority. The other is when the actual service provider is disconnected but the session is not over, so the attacker may try to renegotiate the connection. The first type of attack will not succeed since the certificate typically has a time stamp which will become obsolete at the time point of reuse. The second type of attack will also fail since renegotiation is banned in the latest version of OpenSSL and cryptographic checks have been added.

7.4 Compromised JVM Attack

An attacker may try to compromise the JVM. To quickly detect and correct these issues, we discussed in Section 5.3 how to integrate oblivious hashing to guarantee the correctness of the JRE [11] and how to correct the JRE prior to execution, in case any error is detected. OH adds hash code to capture the computation results of each instruction and computes the oblivious-hash value as the computation proceeds. These two techniques allow for a first quick detection of errors due to a malicious JVM, therefore mitigating the risk of running subverted JARs. To further strengthen our solution, one can extend OH usage to guarantee the correctness of the class files loaded by the JVM.

8 PERFORMANCE STUDY

In this section, we first introduce the settings of the test environment and then present the performance study of our system.

8.1 Experimental Settings

We tested our CIA framework by setting up a small cloud, using the Emulab testbed [42]. In particular, the test environment consists of several OpenSSL-enabled servers:

one head node which is the certificate authority, and several computing nodes. Each of the servers is installed with Eucalyptus [41]. Eucalyptus is an open source cloud implementation for Linux-based systems. It is loosely based on Amazon EC2, therefore bringing the powerful functionalities of Amazon EC2 into the open source domain. We used Linux-based servers running Fedora 10 OS. Each server has a 64-bit Intel Quad Core Xeon E5530 processor, 4 GB RAM, and a 500 GB Hard Drive. Each of the servers is equipped to run the OpenJDK runtime environment with IcedTea6 1.8.2.

8.2 Experimental Results

In the experiments, we first examine the time taken to create a log file and then measure the overhead in the system. With respect to time, the overhead can occur at three points: during the authentication, during encryption of a log record, and during the merging of the logs. Also, with respect to storage overhead, we notice that our architecture is very lightweight, in that the only data to be stored are given by the actual files and the associated logs. Further, JARs act as a compressor of the files that they handle. In particular, as introduced in Section 3, multiple files can be handled by the same logger component. To this extent, we investigate whether a single logger component, used to handle more than one file, results in storage overhead.

8.2.1 Log Creation Time

In the first round of experiments, we are interested in finding out the time taken to create a log file when there are entities continuously accessing the data, causing continuous logging. Results are shown in Fig. 5. It is not surprising to see that the time to create a log file increases linearly with the size of the log file. Specifically, the time to create a 100 KB file is about 114.5 ms while the time to create a 1 MB file averages at 731 ms. With this experiment as the baseline, one can decide the amount of time to be specified between dumps, keeping other variables like space constraints or network traffic in mind.

Fig. 5. Time to create log files of different sizes.

8.2.2 Authentication Time

The next point at which overhead can occur is during the authentication of a CSP. If the time taken for this authentication is too long, it may become a bottleneck for accessing the enclosed data. To evaluate this, the head node issued OpenSSL certificates for the computing nodes and we measured the total time for the OpenSSL authentication to be completed and the certificate revocation to be checked. Considering one access at a time, we find that the authentication time averages around 920 ms, which indicates that not too much overhead is added during this phase. At present, the authentication takes place each time the CSP needs to access the data. The performance can be further improved by caching the certificates.

The time for authenticating an end user is about the same when we consider only the actions required by the JAR, viz., obtaining a SAML certificate and then evaluating it. This is because both the OpenSSL and the SAML certificates are handled in a similar fashion by the JAR. When we consider the user actions (i.e., submitting his username to the JAR), it averages at 1.2 minutes.

8.2.3 Time Taken to Perform Logging

This set of experiments studies the effect of log file size on the logging performance. We measure the average time taken to grant an access plus the time to write the corresponding log record. The time for granting any access to the data items in a JAR file includes the time to evaluate and enforce the applicable policies and to locate the requested data items.

In the experiment, we let multiple servers continuously access the same data JAR file for a minute and recorded the number of log records generated. Each access is just a view request and hence the time for executing the action is negligible. As a result, the average time to log an action is about 10 seconds, which includes the time taken by a user to double click the JAR or by a server to run the script to open the JAR. We also measured the log encryption time, which is about 300 ms (per record) and is seemingly unrelated to the log size.

8.2.4 Log Merging Time

To check if the log harmonizer can be a bottleneck, we measure the amount of time required to merge log files. In this experiment, we ensured that each of the log files had 10 to 25 percent of the records in common with one other. The exact number of records in common was random for each repetition of the experiment. The time was averaged over 10 repetitions. We tested the time to merge up to 70 log files of 100 KB, 300 KB, 500 KB, 700 KB, 900 KB, and 1 MB each. The results are shown in Fig. 6. We can observe that the time increases almost linearly with the number of files and the size of files, with the least time being taken for merging two 100 KB log files at 59 ms, while the time to merge 70 1 MB files was 2.35 minutes.

Fig. 6. Time to merge log files.
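
The merge step measured in Section 8.2.4 combines log files that share a fraction of their records, because a copied JAR carries the old records of the JAR it was copied from. A minimal Java sketch of such a merge is shown below. It is not the harmonizer's actual code: it assumes that records have already been decrypted into strings and that duplicates are textually identical, and the class LogMergeSketch, the method merge, and the sample records are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of merging logs from several JAR copies while dropping duplicates. */
class LogMergeSketch {
    /** Returns every distinct record once, in the order in which it was first seen. */
    static List<String> merge(List<List<String>> logsFromCopies) {
        Set<String> unique = new LinkedHashSet<>(); // keeps insertion order, rejects duplicates
        for (List<String> log : logsFromCopies) {
            unique.addAll(log);
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        // The copy inherits records r1 and r2 from the original JAR and adds r3.
        List<String> original = Arrays.asList("r1", "r2");
        List<String> copy = Arrays.asList("r1", "r2", "r3");
        System.out.println(merge(Arrays.asList(original, copy))); // prints [r1, r2, r3]
    }
}
```

A hash-based set makes each insertion take expected constant time, so the merge cost grows roughly with the total number of records, in line with the near-linear growth reported above.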

8.2.5 Size of the Data JAR Files
Finally, we investigate whether a single logger, used to handle more than one file, results in storage overhead. We measure the size of the loggers (JARs) by varying the number and size of data items held by them. We measured how the size of a logger containing 10 content files (i.e., images) of equal size grows as the file size increases. Intuitively, the larger the data items held by a logger, the larger the logger itself. The size of the logger grows from 3,500 to 4,035 KB when the size of the content items changes from 200 KB to 1 MB. Overall, due to the compression provided by JAR files, the size of the logger is dictated by the size of the largest files it contains. Notice that we purposely kept the log files small (less than 5 KB), so as to focus on the overhead added by having multiple content files in a single JAR. The results are shown in the figure captioned "Size of the logger component."

8.2.6 Overhead Added by JVM Integrity Checking
We investigate the overhead added both by the JRE installation/repair process and by the time taken for computation of hash codes. The time taken for JRE installation/repair averages around 6,500 ms. This time was measured by taking the system time stamp at the beginning and end of the installation/repair. To calculate the time overhead added by the hash codes, we simply measure the time taken for each hash calculation. This time is found to average around 9 ms. The number of hash commands varies with the size of the code; because the code does not change with the content, the number of hash commands remains constant.

9 CONCLUSION AND FUTURE RESEARCH
We proposed innovative approaches for automatically logging any access to the data in the cloud together with an auditing mechanism. Our approach allows the data owner not only to audit his content but also to enforce strong back-end protection if needed. Moreover, one of the main features of our work is that it enables the data owner to audit even those copies of his data that were made without his knowledge.

In the future, we plan to refine our approach to verify the integrity of the JRE and the authentication of JARs [23]. For example, we will investigate whether it is possible to leverage the notion of a secure JVM [18] being developed by IBM. This research is aimed at providing software tamper resistance to Java applications. In the long term, we plan to design a comprehensive and more generic object-oriented approach to facilitate autonomous protection of traveling content. We would like to support a variety of security policies, like indexing policies for text files, usage control for executables, and generic accountability and provenance controls.

REFERENCES
[1] P. Ammann and S. Jajodia, “Distributed Timestamp Generation in Planar Lattice Networks,” ACM Trans. Computer Systems, vol. 11, pp. 205-225, Aug. 1993.
[2] G. Ateniese, R. Burns, R. Curtmola, J. Herring, L. Kissner, Z. Peterson, and D. Song, “Provable Data Possession at Untrusted Stores,” Proc. ACM Conf. Computer and Comm. Security, pp. 598-609, 2007.
[3] E. Barka and A. Lakas, “Integrating Usage Control with SIP-Based Communications,” J. Computer Systems, Networks, and Comm., vol. 2008, pp. 1-8, 2008.
[4] D. Boneh and M.K. Franklin, “Identity-Based Encryption from the Weil Pairing,” Proc. Int’l Cryptology Conf. Advances in Cryptology, pp. 213-229, 2001.
[5] R. Bose and J. Frew, “Lineage Retrieval for Scientific Data Processing: A Survey,” ACM Computing Surveys, vol. 37, pp. 1-28, Mar. 2005.
[6] P. Buneman, A. Chapman, and J. Cheney, “Provenance Management in Curated Databases,” Proc. ACM SIGMOD Int’l Conf. Management of Data (SIGMOD ’06), pp. 539-550, 2006.
[7] B. Chun and A.C. Bavier, “Decentralized Trust Management and Accountability in Federated Systems,” Proc. Ann. Hawaii Int’l Conf. System Sciences (HICSS), 2004.
[8] OASIS Security Services Technical Committee, “Security Assertion Markup Language (saml) 2.0,” http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=security, 2012.
[9] R. Corin, S. Etalle, J.I. den Hartog, G. Lenzini, and I. Staicu, “A Logic for Auditing Accountability in Decentralized Systems,” Proc. IFIP TC1 WG1.7 Workshop Formal Aspects in Security and Trust, pp. 187-201, 2005.
[10] B. Crispo and G. Ruffo, “Reasoning about Accountability within Delegation,” Proc. Third Int’l Conf. Information and Comm. Security (ICICS), pp. 251-260, 2001.
[11] Y. Chen et al., “Oblivious Hashing: A Stealthy Software Integrity Verification Primitive,” Proc. Int’l Workshop Information Hiding, F. Petitcolas, ed., pp. 400-414, 2003.
[12] S. Etalle and W.H. Winsborough, “A Posteriori Compliance Control,” SACMAT ’07: Proc. 12th ACM Symp. Access Control Models and Technologies, pp. 11-20, 2007.
[13] X. Feng, Z. Ni, Z. Shao, and Y. Guo, “An Open Framework for Foundational Proof-Carrying Code,” Proc. ACM SIGPLAN Int’l Workshop Types in Languages Design and Implementation, pp. 67-78, 2007.
[14] Flickr, http://www., 2012.
[15] R. Hasan, R. Sion, and M. Winslett, “The Case of the Fake Picasso: Preventing History Forgery with Secure Provenance,” Proc. Seventh Conf. File and Storage Technologies, pp. 1-14, 2009.
[16] J. Hightower and G. Borriello, “Location Systems for Ubiquitous Computing,” Computer, vol. 34, no. 8, pp. 57-66, Aug. 2001.
[17] J.W. Holford, W.J. Caelli, and A.W. Rhodes, “Using Self-Defending Objects to Develop Security Aware Applications in Java,” Proc. 27th Australasian Conf. Computer Science, vol. 26, pp. 341-349, 2004.
[18] Trusted Java Virtual Machine IBM, http://www.almaden.ibm.com/cs/projects/jvm/, 2012.
[19] P.T. Jaeger, J. Lin, and J.M. Grimes, “Cloud Computing and Information Policy: Computing in a Policy Cloud?,” J. Information Technology and Politics, vol. 5, no. 3, pp. 269-283, 2009.
[20] R. Jagadeesan, A. Jeffrey, C. Pitcher, and J. Riely, “Towards a Theory of Accountability and Audit,” Proc. 14th European Conf. Research in Computer Security (ESORICS), pp. 152-167, 2009.
[21] R. Kailar, “Accountability in Electronic Commerce Protocols,” IEEE Trans. Software Eng., vol. 22, no. 5, pp. 313-328, May 1996.
[22] W. Lee, A. Cinzia Squicciarini, and E. Bertino, “The Design and Evaluation of Accountable Grid Computing System,” Proc. 29th IEEE Int’l Conf. Distributed Computing Systems (ICDCS ’09), pp. 145-154, 2009.

[23] J.H. Lin, R.L. Geiger, R.R. Smith, A.W. Chan, and S. Wanchoo, Method for Authenticating a Java Archive (jar) for Portable Devices, US Patent 6, July 2004.
[24] F. Martinelli and P. Mori, “On Usage Control for Grid Systems,” Future Generation Computer Systems, vol. 26, no. 7, pp. 1032-1042, 2010.
[25] T. Mather, S. Kumaraswamy, and S. Latif, Cloud Security and Privacy: An Enterprise Perspective on Risks and Compliance (Theory in Practice), first ed. O’Reilly, 2009.
[26] M.C. Mont, S. Pearson, and P. Bramhall, “Towards Accountable Management of Identity and Privacy: Sticky Policies and Enforceable Tracing Services,” Proc. Int’l Workshop Database and Expert Systems Applications (DEXA), pp. 377-382, 2003.
[27] S. Oaks, Java Security. O’Reilly, 2001.
[28] J. Park and R. Sandhu, “Towards Usage Control Models: Beyond Traditional Access Control,” SACMAT ’02: Proc. Seventh ACM Symp. Access Control Models and Technologies, pp. 57-64, 2002.
[29] J. Park and R. Sandhu, “The Uconabc Usage Control Model,” ACM Trans. Information and System Security, vol. 7, no. 1, pp. 128-174, 2004.
[30] S. Pearson and A. Charlesworth, “Accountability as a Way Forward for Privacy Protection in the Cloud,” Proc. First Int’l Conf. Cloud Computing, 2009.
[31] S. Pearson, Y. Shen, and M. Mowbray, “A Privacy Manager for Cloud Computing,” Proc. Int’l Conf. Cloud Computing (CloudCom), pp. 90-106, 2009.
[32] A. Pretschner, M. Hilty, and D. Basin, “Distributed Usage Control,” Comm. ACM, vol. 49, no. 9, pp. 39-44, Sept. 2006.
[33] A. Pretschner, M. Hilty, F. Schütz, C. Schaefer, and T. Walter, “Usage Control Enforcement: Present and Future,” IEEE Security & Privacy, vol. 6, no. 4, pp. 44-53, July/Aug. 2008.
[34] A. Pretschner, F. Schütz, C. Schaefer, and T. Walter, “Policy Evolution in Distributed Usage Control,” Electronic Notes Theoretical Computer Science, vol. 244, pp. 109-123, 2009.
[35] NTP: The Network Time Protocol, http://www.ntp., 2012.
[36] S. Roman, Coding and Information Theory. Springer-Verlag, 1992.
[37] B. Schneier, Applied Cryptography: Protocols, Algorithms, and Source Code in C. John Wiley & Sons, 1993.
[38] T.J.E. Schwarz and E.L. Miller, “Store, Forget, and Check: Using Algebraic Signatures to Check Remotely Administered Storage,” Proc. IEEE Int’l Conf. Distributed Systems, p. 12, 2006.
[39] A. Squicciarini, S. Sundareswaran, and D. Lin, “Preventing Information Leakage from Indexing in the Cloud,” Proc. IEEE Int’l Conf. Cloud Computing, 2010.
[40] S. Sundareswaran, A. Squicciarini, D. Lin, and S. Huang, “Promoting Distributed Accountability in the Cloud,” Proc. IEEE Int’l Conf. Cloud Computing, 2011.
[41] Eucalyptus Systems, http://www.eucalyptus., 2012.
[42] Emulab Network Emulation Testbed, www.emulab., 2012.
[43] Q. Wang, C. Wang, J. Li, K. Ren, and W. Lou, “Enabling Public Verifiability and Data Dynamics for Storage Security in Cloud Computing,” Proc. European Conf. Research in Computer Security (ESORICS), pp. 355-370, 2009.
[44] D.J. Weitzner, H. Abelson, T. Berners-Lee, J. Feigenbaum, J. Hendler, and G.J. Sussman, “Information Accountability,” Comm. ACM, vol. 51, no. 6, pp. 82-87, 2008.
[45] Reed-Solomon Codes and Their Applications, S.B. Wicker and V.K. Bhargava, ed. John Wiley & Sons, 1999.
[46] M. Xu, X. Jiang, R. Sandhu, and X. Zhang, “Towards a VMM-Based Usage Control Framework for OS Kernel Integrity Protection,” SACMAT ’07: Proc. 12th ACM Symp. Access Control Models and Technologies, pp. 71-80, 2007.

Smitha Sundareswaran received the bachelor’s degree in electronics and communications engineering in 2005 from Jawaharlal Nehru Technological University, Hyderabad, India. She is currently working toward the PhD degree in the College of Information Sciences and Technology at the Pennsylvania State University. Her research interests include policy formulation and management for distributed computing architectures.

Anna C. Squicciarini received the PhD degree in computer science from the University of Milan, Italy, in 2006. She is an assistant professor at the College of Information Science and Technology at the Pennsylvania State University. During the years of 2006-2007, she was a postdoctoral research associate at Purdue University. Her main interests include access control for distributed systems, privacy, security for Web 2.0 technologies and grid computing. She is the author or coauthor of more than 50 articles published in refereed journals, and in proceedings of international conferences and symposia. She is a member of the IEEE.

Dan Lin received the PhD degree from National University of Singapore in 2007. She was a postdoctoral research associate from 2007 to 2008. She is an assistant professor at Missouri University of Science and Technology. Her research interests cover many areas in the fields of database systems and information security.