Abstract: Cloud Computing is an emerging technology that provides a vast number of services to
users from remote server machines at low cost. Among these diverse services, data storage
is the main issue discussed by researchers in the field of Cloud Computing. Various
problems have arisen involving the security and privacy of stored data. Data leakage is a serious
threat to data privacy and a source of fear for cloud users who store their data on the cloud. Leading cloud
services like Google and Amazon, and social network sites, implement several techniques
to protect cloud storage from leakage. This paper implements a framework for protecting against data
leakage with the help of a Key-Mapped Data Anonymization method.
1. INTRODUCTION
Cloud Computing is a growing field that provides a vast number of services to
users through the Internet. Three types of services are offered by the Cloud Computing
environment, namely Platform as a Service (PaaS), Infrastructure as a Service (IaaS)
and Software as a Service (SaaS). Among these diverse services, data
storage is the main issue discussed by researchers in the field of Cloud
Computing. Data centralization and outsourcing have become a trend due to the
advent of the Cloud Computing environment [1].
The Cloud Computing environment offers on-demand services to users and allows them to
keep their important data on remote servers. The environment also allows organizations
such as companies and hospitals to store their data on remote storage for research and
public processing [2]. Advances in hardware, software and
infrastructure over the past years have raised the Cloud Computing environment into a
flourishing industry [3].
The main benefit of Cloud Computing to users is the ability to store and process
data online, on the fly, without any substantial investment [4]. The Cloud
Computing environment extends traditional distributed processing to offer
information processing and resources as a paid service over the Internet
[5]. Even though Cloud Computing reduces burdens such as scalability,
resources and costs, there is great fear about data privacy, caused mostly by
data leakage. Around 88% of users worry about their data on the cloud
and demand more privacy in today's Cloud Computing environment [6].
The data stored on cloud storage contains sensitive information called
Personally Identifiable Information (PII), which is easily disclosed through the identification of
special attributes called quasi-identifiers [7]. Quasi-identifiers are attributes that are jointly used
to identify personal information from various sources [2]. Several techniques have
been adopted to protect the privacy of data on the cloud and to prevent data leakage.
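The role of quasi-identifiers can be illustrated with a small linkage sketch. The records, the column names (age, sex, zip) and the helper link_records below are hypothetical and are not taken from the paper; the sketch only shows how attributes that are harmless on their own can jointly re-identify a row when matched against a second source.

```python
# Hypothetical "de-identified" release and a hypothetical public list with names.
released = [
    {"age": 39, "sex": "Male", "zip": "60601", "income": ">50K"},
    {"age": 50, "sex": "Female", "zip": "60601", "income": "<=50K"},
]
public = [
    {"name": "Alice", "age": 50, "sex": "Female", "zip": "60601"},
    {"name": "Bob", "age": 39, "sex": "Male", "zip": "60601"},
]
QUASI_IDENTIFIERS = ("age", "sex", "zip")  # assumed quasi-identifier columns


def link_records(released_rows, public_rows, keys):
    """Join two tables on quasi-identifier columns (a simple linkage attack)."""
    index = {}
    for person in public_rows:
        combo = tuple(person[k] for k in keys)
        index.setdefault(combo, []).append(person["name"])
    matches = []
    for row in released_rows:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # a unique match re-identifies the released row
            matches.append((names[0], row["income"]))
    return matches


reidentified = link_records(released, public, QUASI_IDENTIFIERS)
print(reidentified)
```

In this toy case every released row matches exactly one named person, which is the situation that generalization and pseudonymization are meant to prevent.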
Volume 8, Issue 3, 2020 http://aegaeum.com/ Page No: 1251
AEGAEUM JOURNAL ISSN NO: 0776-3808
The present paper constructs a framework for protecting the privacy of data on the
cloud and thereby preventing data leakage from the cloud. The proposed framework
uses a well-known data anonymization technique called data pseudonymization, which
generates fake data for each original data item along with a mapping key that
preserves the original information.
Section 2 of the present paper discusses the various data anonymization
techniques and Section 3 deals with the related work relevant to the present study. Section
4 discusses the proposed methodology and Section 5 discusses the
results obtained from the constructed framework. The last section concludes
the present study with future enhancements.
3. RELATED WORKS
Protecting against data leakage on the cloud is a major problem, and several
works have been carried out on protecting privacy and preventing data leakage.
Renu Sara George and Sabitha proposed a method that performs anonymization and
deanonymization in a secured enclave, thereby reducing the computing
power required on the client side [1].
G. Ateniese et al. devised a Provable Data Possession (PDP) framework model
with the help of RSA-homomorphic tags [9]. Gabriel Ghinita et al. proposed an
effective framework for privacy preservation [2]. Sweeney and Samarati
proposed the k-anonymity model for hiding PII in the original data
[10]. The Proofs of Retrievability (POR) method was formulated by Juels A and
Kaliski; it encodes files and inserts a set of random blocks called
sentinels [11].
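As an illustration of the k-anonymity idea mentioned above, the following minimal sketch measures k for a small table: k is the size of the smallest group of rows that share the same quasi-identifier values. The function name k_anonymity, the sample rows and the choice of age and sex as quasi-identifiers are assumptions made for this example only.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


# Hypothetical rows with an already generalized age column.
rows = [
    {"age": "30-39", "sex": "Male", "income": ">50K"},
    {"age": "30-39", "sex": "Male", "income": "<=50K"},
    {"age": "40-49", "sex": "Female", "income": "<=50K"},
    {"age": "40-49", "sex": "Female", "income": ">50K"},
    {"age": "50-59", "sex": "Female", "income": ">50K"},
]

k = k_anonymity(rows, ("age", "sex"))
print(k)  # 1, because the single 50-59/Female row is unique
```

A table is k-anonymous when this value is at least k; here the last row would have to be generalized further or suppressed to reach k = 2.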
In another work, G. Ateniese et al. formulated a highly efficient and secure PDP
method based purely on symmetric cryptography [12]. Q. Wang et al. constructed a
new approach that permits a Third Party Auditor (TPA) to verify the integrity of
dynamic data on cloud storage [13]. A remote integrity checking protocol with
public verifiability was proposed by Z. Hao et al. [14]. Data privacy is the greatest
concern of Cloud Computing users [6]. A decoupling process was carried
out by J.-S. Xu et al. for securing data on the cloud [15]. Safwan Mahmud Khan and
Kevin W. Hamlen proposed a process that decouples the data from its metadata to
maintain data privacy and thereby prevent data leakage [3].
... data, thereby maintaining the pseudonymized data on the cloud with a key-mapping
principle over the original to preserve its meaning.
4. METHODOLOGY
The present work proposes a framework in which the entire data is
pseudonymized with fake data, along with a mapped key for retaining the original
information. The pseudonymized data is stored on the cloud with fake details from which
the original information cannot be recovered in case the data is
leaked. The overall approach is depicted in Figure 1.
The result of the de-identification process shown in Figure 3 contains columns that do not
reveal any personal information to readers. The age column in Figure 3 still indirectly
conveys some information to readers and is generalized for further anonymization.
The data after age generalization is shown in Figure 4.
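The age generalization step can be sketched as a simple binning function. The paper does not state the range width it uses, so the 10-year width, the function name generalize_age and the sample ages below are assumptions for illustration.

```python
def generalize_age(age, width=10):
    """Map an exact age to a fixed-width range label, e.g. 39 -> '30-39'."""
    if age < 0:
        raise ValueError("age must be non-negative")
    low = (age // width) * width  # lower bound of the range containing age
    return f"{low}-{low + width - 1}"


ages = [39, 50, 38, 53, 28]  # hypothetical exact ages
generalized = [generalize_age(a) for a in ages]
print(generalized)  # ['30-39', '50-59', '30-39', '50-59', '20-29']
```

A wider range hides more detail about each individual but leaves less information for analysis, so the width is a trade-off that depends on the dataset.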
5.2 Generate Pseudonymized data and map with key on original data
The data shown in Figure 4 is already in anonymized form but still contains a few
columns that should be subjected to anonymization. For example, fnlwgt represents the final
weight of each row and may help to identify individual items. If that column were
removed, no column would be left in the dataset to preserve the information of each
row after the removal of PII elements.
The fnlwgt column is therefore not removed but hidden as a key that protects and preserves the
meaning of each row in the dataset. The fnlwgt is kept as a key and subjected to
pseudonymization, that is, the generation of equivalent fake values. The entire data after
pseudonymization is shown in Figure 5.
The fnlwgt column after pseudonymization contains fake values, as shown in Figure 5.
An equivalent mapping is also created against the original value of fnlwgt to
preserve the original information of each row. The mapping of the original fnlwgt to
the fake value is shown in Figure 6.
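A minimal sketch of this key-mapped pseudonymization is given below. It is not the authors' implementation: the function name pseudonymize_column, the use of random six-digit fake values and the sample rows are assumptions. Each original fnlwgt is replaced by a unique random fake value, and a separate mapping from fake value to original value is returned.

```python
import secrets


def pseudonymize_column(rows, column, digits=6):
    """Replace each value of `column` with a unique random fake value.

    Returns (pseudonymized_rows, mapping), where mapping is fake -> original.
    Assumes far fewer rows than 10 ** digits, so unique values can be found.
    """
    mapping = {}
    pseudonymized = []
    for row in rows:
        fake = secrets.randbelow(10 ** digits)
        # Redraw until the fake value is unused and differs from the original.
        while fake in mapping or fake == row[column]:
            fake = secrets.randbelow(10 ** digits)
        mapping[fake] = row[column]
        pseudonymized.append({**row, column: fake})  # copy, do not mutate input
    return pseudonymized, mapping


# Hypothetical rows after PII removal and age generalization.
rows = [
    {"age": "30-39", "fnlwgt": 77516, "income": "<=50K"},
    {"age": "50-59", "fnlwgt": 83311, "income": "<=50K"},
    {"age": "30-39", "fnlwgt": 215646, "income": ">50K"},
]

cloud_rows, key_mapping = pseudonymize_column(rows, "fnlwgt")
print(cloud_rows)
print(key_mapping)
```

Only cloud_rows would be uploaded in this sketch; key_mapping is the part that has to stay with the data owner.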
The entire data after full anonymization with the help of pseudonymization
(faking) does not reveal any sensitive or private information, because all columns
contain only general information. The mapped value is kept offline in order to preserve
the original information and meaning of the data after anonymization.
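How an offline mapping could be used to recover the original values is sketched below. The paper does not describe the storage format, so the JSON file, the file name fnlwgt_mapping.json, the helper names and the sample values are all assumptions for illustration.

```python
import json
import os
import tempfile


def save_mapping(mapping, path):
    """Store the fake -> original mapping in a local (offline) JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        # JSON object keys must be strings, so fake values are stored as text.
        json.dump({str(fake): original for fake, original in mapping.items()}, fh)


def restore_rows(cloud_rows, path, column="fnlwgt"):
    """Replace fake values in rows fetched from the cloud with the originals."""
    with open(path, encoding="utf-8") as fh:
        mapping = json.load(fh)
    return [{**row, column: mapping[str(row[column])]} for row in cloud_rows]


# Hypothetical mapping and pseudonymized rows as they would sit on the cloud.
mapping = {904512: 77516, 118230: 83311}
cloud_rows = [
    {"age": "30-39", "fnlwgt": 904512},
    {"age": "50-59", "fnlwgt": 118230},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "fnlwgt_mapping.json")
    save_mapping(mapping, path)
    restored = restore_rows(cloud_rows, path)

print(restored)
```

Without the mapping file, the rows held on the cloud carry only the fake key, which is the property the framework relies on if the cloud copy is leaked.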
6. CONCLUSIONS
The present work proposed a framework for protecting the privacy of
data in the event of data leakage from cloud storage. The work showed that
removing all PII and generalizing columns into ranges hides all sensitive information from
readers in case the data is leaked from the cloud. The present work
preserves information from the original data offline by maintaining a separate data
source. If larger keys have to be maintained to preserve the meaning,
the offline side needs more memory. Anonymizing the data for cloud storage
without maintaining any offline supporting resource is beyond the scope of the present
work.
REFERENCES
[1] Renu Sara George and Sabitha S, "Data Anonymization and Integrity Checking in
Cloud Computing", IEEE, 2013.
[2] Gabriel Ghinita et al., "Fast Data Anonymization with Low Information Loss", ACM
library, 2007, pp. 758-769.
[3] Safwan Mahmud Khan and Kevin W. Hamlen, "AnonymousCloud: A Data
Ownership Privacy Provider Framework in Cloud Computing", IEEE, 2012, pp. 170-176.
[4] M. Armbrust et al., "A view of cloud computing", Communications of the ACM
(CACM), vol. 53, no. 4, pp. 50–58, 2010.
[5] Amazon, "Amazon elastic compute cloud (Amazon EC2)",
http://aws.amazon.com/ec2, 2012.
[6] Fujitsu Research Institute, "Personal data in the cloud: A global survey of
consumer attitudes", http://www.fujitsu.com/global/news/publications/dataprivacy.html,
October 2010.
[7] A. Froomkin, "The Death of Privacy", Stanford Law Review, 52(5):1461–1543, 2000.
[8] Ruslan Korniichuk, "Easy-to-use GDPR guide for Data Scientist. Part 2/2",
https://medium.com/@korniichuk/gdpr-guide-2-7c399b44ba3, 2017.
[9] G. Ateniese et al., "Provable data possession at untrusted stores", Proceedings of
the 14th ACM Conference on Computer and Communications Security, CCS'07,
New York, USA, ACM, 2007, pp. 598-609.
[10] P. Samarati and L. Sweeney, "Generalizing Data to Provide Anonymity when
Disclosing Information (abstract)", PODS, 1998.
[11] Juels A and Kaliski BS, "PORs: proofs of retrievability for large files",
Cryptology ePrint Archive, June 2007, Report 2007/243.
[12] G. Ateniese et al., "Scalable and efficient provable data possession", Proceedings
of the 4th International Conference on Security and Privacy in Communication
Networks, Istanbul, Turkey, 2008.
[13] Q. Wang et al., "Enabling Public Auditability and Data Dynamics for Storage
Security in Cloud Computing", IEEE Transactions on Parallel and Distributed
Systems, Vol. 22, No. 5, May 2011.
[14] Z. Hao et al., "A Privacy-Preserving Remote Data Integrity Checking Protocol with
Data Dynamics and Public Verifiability", IEEE Transactions on Knowledge and
Data Engineering, Vol. 23, No. 9, September 2011.
[15] J.-S. Xu et al., "Secure document service for cloud computing", in Proceedings of
the 1st International Conference on Cloud Computing (CloudCom), 2009, pp. 541–546.
[16] M. J. Freedman and R. Morris, "Tarzan: A peer-to-peer anonymizing network
layer", in Proceedings of the 9th ACM Conference on Computer and
Communications Security (CCS), 2002, pp. 193–206.
[17] L. Sweeney, "k-Anonymity: A Model for Protecting Privacy", International Journal
on Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 10, no. 5, pp. 557-570,
2002.
[18] Q. Liu et al., "Secure and privacy preserving keyword searching for cloud
storage services", Journal of Network and Computer Applications (JNCA), vol. 35,
no. 3, pp. 927–933, 2012.
[19] adults.csv, https://www.kaggle.com/wenruliu/adult-income-dataset, 2017.