You are on page 1of 43

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

net/publication/329155467

HEAP: An Efficient and Fault-tolerant Authentication and Key Exchange


Protocol for Hadoop-assisted Big Data Platform

Preprint · November 2018

CITATIONS

6 authors, including:

Neeraj Kumar
Thapar Institute of Engineering and Technology (Deemed University), Patiala (Punjab), India
285 PUBLICATIONS   3,185 CITATIONS   

SEE PROFILE

Some of the authors of this publication are also working on these related projects:

Cryptographic Authentication Protocols View project

QoS-aware Energy Management Scheme for Self-sustainable Data Centers using Renewable Energy Sources View project

All content following this page was uploaded by Neeraj Kumar on 23 November 2018.

The user has requested enhancement of the downloaded file.


Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier ACCESS.2018.XXXXXXX

HEAP: An Efficient and Fault-tolerant


Authentication and Key Exchange
Protocol for Hadoop-assisted Big Data
Platform
DURBADAL CHATTARAJ1 , (Student Member, IEEE), MONALISA SARMA1 ,
ASHOK KUMAR DAS2 , (Senior Member, IEEE), NEERAJ KUMAR3 , (Senior Member, IEEE),
JOEL J. P. C. RODRIGUES4 , (Senior Member, IEEE), YOUNGHO PARK5 , (Member, IEEE)
1
Subir Chowdhury School of Quality and Reliability, Indian Institute of Technology, Kharagpur 721 302, India (e-mail: dchattaraj@iitkgp.ac.in,
monalisa@iitkgp.ac.in)
2
Center for Security, Theory and Algorithmic Research, International Institute of Information Technology, Hyderabad 500 032, India (e-mail:
iitkgp.akdas@gmail.com, ashok.das@iiit.ac.in)
3
Department of Computer Science and Engineering, Thapar University, Patiala 147 004, India (e-mail: neeraj.kumar@thapar.edu)
4
National Institute of Telecommunications (Inatel), Brazil; Instituto de Telecomunicações, Portugal; University of Fortaleza (UNIFOR), Brazil (e-mail:
joeljr@ieee.org)
5
School of Electronics Engineering, Kyungpook National University, Daegu 41566, Republic of Korea (e-mail: parkyh@knu.ac.kr)
Corresponding author: Youngho Park (parkyh@knu.ac.kr)
This work was supported in part by the Basic Science Research Program through the National Research Foundation of Korea funded by the
Ministry of Science, ICT & Future Planning (2017R1A2B1002147), in part by the BK21 Plus project funded by the Ministry of Education,
Korea (21A20131600011).

ABSTRACT Hadoop framework has been evolved to manage Big Data in Cloud. Hadoop Distributed File
System (HDFS) and MapReduce the vital components of this framework provide scalable and fault tolerant
Big Data storage and processing services at a lower cost. However, Hadoop does not provide any robust
authentication mechanism for principals’ authentication. In fact, the existing state of the art authentication
protocols are vulnerable to various security threats, such as man-in-the-middle, replay, password guessing,
stolen-verifier, previliged-insider, identity compromization, impersonation, denial-of-service, on/offline
dictionary, chosen plaintext, workstation compromization and server-side compromisation attacks. Beside
these threats, the state of the art mechanisms lack to address the server-side data integrity and confidentiality
issues. In addition to this, most of the existing authentication protocols follow a single-server based
user authentication strategy, which is, in fact, originates Single Point of Failure (SOF) and Single Point
of Vulnerability (SOV) issues. To address these limitations, in this work we propose a fault tolerant
authentication protocol suitable for Hadoop framework, which is called as HEAP (Efficient Authentication
Protocol for Hadoop). HEAP alleviates the major issues of the existing state of the art authentication
mechanisms, namely, operating system based authentication, password-based approach and delegated
token-based schemes, respectively, that are presently deployed in Hadoop. HEAP follows two-server based
authentication mechanism. HEAP authenticates the principal based on digital signature generation and
verification strategy utilizing both Advanced Encryption Standard (AES) and Elliptic Curve Cryptography
(ECC). The security analysis using both the formal security using the broadly-accepted Real-Or-Random
(ROR) model and informal (non-mathematical) security show that HEAP protects several well-known
attacks. In addition, the formal security verification using the widely-used Automated Validation of Internet
Security Protocols and Applications (AVISPA) ensures that HEAP is resilient against replay and man-
in-the-middle attacks. Finally, the performance study contemplates that the overheads incurred in HEAP
is reasonable and is also comparable to that of other existing state-of-art authentication protocols. High
security along with comparable overheads make HEAP to be robust and practical for a secure access to the
Big Data storage and processing services.

INDEX TERMS Cloud computing, authentication, key agreement, big data security, hadoop, formal
security, AVISPA.
VOLUME 6, 2018 1
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

I. INTRODUCTION [28], [29], one-time padding [13], [14], PKI (Public-Key


The existence of the digital universe is expanding by a factor Infrastructure) based approach [24], [31], implementation
of 300, from 130 exabytes to 40,000 exabytes, or 40 trillion of a Trusted Computing Platform (TCP) [15], combination
gigabytes (more than 5,200 gigabytes for every man, woman, of both password and possession based strategy [25], [26],
and child in 2020) from 2005 to 2020. From the recent time [30], authorization delegation based approach [22], [32],
until 2020, the digital universe will about double every two combined public and private key cryptography with random
years 1 . This of course, drag the attention of many researchers number generator based scheme [12], utilizing basic geome-
and practitioners in the field of Big Data storage and process- try structure based password storing [33] and Identity-Based
ing issue. To deal with this, various distributed file systems’ Authentication (IBA) scheme [23], respectively. However,
namely Hadoop Distributed File System (HDFS) [1], Google these explications are either expensive in terms of extra hard-
File System (GFS) [2], MooseFS 2 , zFS [3], Ceph [4], etc. ware cost or computationally intensive. Further, the existing
have evolved. However, among these, due to popularity, approaches [24], [25], [26], [30] enrol an end user (or service
simplicity and easy availability (open source), HDFS (the server) by asking his username and password (or service
principal component of Hadoop framework 3 ) is widely used server identity and secret key), where the username (or ser-
in industries and became the de facto standard platform for vice server identity) is used as the primary credential, which
Big Data storage. HDFS has been evolved to provide the is verified at the time of mutual authentication between user
storage service, where data is reliably kept in a distributed and service server respectively. In fact, selecting a username
fashion into different servers. In this connection, client stores (or service server identity) is not enough to be considered
Big Data into the geographically dispersed remote third as a strong private identifier. As a result, an adversary can
party servers through an insecure channel. This excavates easily incorporate different attacks, such as impersonation
several security concerns as the storage service access to be attacks and identity compromisation attacks by sniffing the
made over an insecure communication channel. Towards the username (or service server identity) from the insecure media
solution, a robust authentication mechanism is the preferred [10]. Moreover, these approaches are not considered the
solution. In this synergy, different authentication protocols user-side and service server-side identity untraceability and
have been proposed such as Kerberos 4 , OAuth 5 , OpenID anonymity properties. In spite of this, the existing password-
connect 6 , SAML 7 , etc. But for the sake of simplicity, based user enrollment strategy [30] which is currently incor-
scalability and applicability, operating system based secu- porated in Hadoop is vulnerable to password guessing, online
rity (i.e., password-based approach), Kerberos authentication or offline dictionary and stolen-verifier attacks. Additionally,
protocol (i.e., password with possession-based approach) and the existing approach [30] derive client’s secret key as the
delegated token based approaches are currently employed in hash value of its password. Therefore, the key will remain
Hadoop for enhancing its security [5] [6] [7] [8] [9]. same until client changes the current password. However,
To access Big Data storage or processing services over the changing this password needs updating the enrolled data
Internet, it is necessary for an end user (or service server) maintained by the KDC (or CRA) and this, in fact, invites
to initially enroll himself (or itself) with the Key Distribution many key rollover problems [10]. In addition to this, man-in-
Center (KDC) or a Centralized Registration Authority (CRA) the-middle, privileged-insider, denial-of-service, workstation
offline. In centralized registration mechanism, it is difficult compromisation, chosen plaintext and replay attacks are the
to update secret credentials of user and Hadoop clusters vis- key security threats that are not properly addressed in the
a-vis service servers dynamically. After enrollment, the end existing schemes [10].
user can access Big Data storage or processing services from In order to ensure mutual authentication and session key
the service server remotely over the Web. Usually, in such distribution between end user and service server (Namen-
a setting, the KDC (or CRA) stores the secret information ode or JobTracker), in possession based (also called token
of all the principals’ in its database, where a single point of based) approach, a trusted server distributes a token with
vulnerability and single point of failure makes the whole sys- large numbers of authentication parameters, that is, more
tem jeopardized [10]. In order to address these issues, many parameters are included into the constitution of an Authenti-
schemes have been reported in [5], [11], [12], [13], [14], cation Token (AT ) and authorization token (or Service Token
[15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], (ST )). Hence, AT and ST verification increases the over-
[26], [27], [28], [29], [30] that are based on different tech- head to the existing authorization server (or service server).
niques namely, smart card based approaches [18], [20], [27], In addition to this, tokens and session keys are stored into
user’s credential cache [30], [24], [25], [26] in the respective
1 http://www.emc.com/leadership/digital-universe/2012iview/ workstation, and each token has its own lifetime. So, it
executive-summary-a-universe-of.htm leads to workstation compromisation attack, disclosure of
2 MooseFS: Can Petabyte Storage be super efficient. https://moosefs.com/
session key as well as misuse of tokens. Moreover, an end
3 Apache Hadoop: https://hadoop.apache.org/
4 http://web.mit.edu/kerberos/
user blindly accepts the authentication services, that is, he
5 http://www.oauth.net/ completely rely on the trusted third party server (KDC’s AS)
6 http://openid.net/ issued shared secret session key without verifying the strong
7 saml.xml.org/ authenticity of the AS. Therefore, if the AS is compromised
2 VOLUME 6, 2018
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

by a malicious insider, a byzantine attack can be induced curity credentials and hardwares (smart card, biometric
into the system which can falsify the primitive operations scanner, smart mobile device, etc.).
and it can also lead to the wrong desires [10]. Nonetheless, 5) The proposed protocol should disseminate securely the
some mutual authentication schemes [24], [25], [26], [30] use session key between two communicating parties.
time synchronization for joint authentication between end 6) The proposed scheme should provide a mechanism to
user and the service servers. More precisely, all principals read, write and process the user’s Big Data securely in
in a realm must be synchronized with a centralized time Hadoop cluster.
server. In fact, this is an overhead for the implementation 7) The proposed protocol should support user and service
of the protocol. In addition to this, clock in a distributed server anonymity by hiding their original identities from
system may not be always synchronized, so it may cause a eavesdroppers and privileged-insiders.
replay attack for both the end user and the service server 8) The proposed scheme should have a provision to gen-
[34]. In spite of this, in the existing authentication approaches erate a fresh session key securely in each session to
[24], [25], [26], [30], a user blindly trusts the authentication mitigate the workstation compromisation attack.
server (AS) without verifying any cross parameters (e.g., 9) The proposed protocol should have the capability to
message authentication codes, server-side generated one-way store the dictionary of password securely at the server-
hash chain based one-time identifiers [16], digital signatures side to mitigate the offline dictionary, password guess-
[10], etc.) after receiving the authentication token. To the ing and stolen-verifier attacks.
best of our knowledge, there is no solution to verify the 10) The proposed scheme should able to establish the ses-
originality of AS except the timestamps and visualization of sion key between two communicating parties without
password (or session key protected authentication or service timestamps utilization.
token) [10]. Hence, this shortcoming opens a possibility of
impersonation attacks [34], where a compromised principle To fulfill the above objectives, a two-server based authen-
can falsify the basic operations of the authentication system. tication framework has been introduced. This framework is
In addition to this, in Hadoop, there is no provision to structured in such a way that it mitigates the single point
verify the data integrity and confidentiality after archiving of failure (SOF) and single point of vulnerability (SOV)
end user’s or organization’s Big Data into HDFS. Since, the issues. Further, the proposed framework resists various well
raw data blocks (constructed from the Big Data) are stored known security threats, such as, man-in-the-middle, replay,
into various Datanodes as plaintext format, it is easy for an privileged-insider, Denial-of-Service (DoS), chosen plain-
adversary to modify the content of the data blocks easily text, password guessing, identity compromisation, imperson-
[5], [6], [7], [8], [9], [35], [36], [37], [38], [39], [40], [41]. ation, stolen-verifier, server spoofing, offline dictionary and
Additionally, the result of the processed Big Data utilizing workstation compromisation attacks. According to the policy
MapReduce framework is stored into end user’s local file of the proposed framework a service provider can enrol any
system, so anybody can read this content. To the best of our number of Hadoop clusters vis-a-vis service servers online
knowledge, there is no solution exists to mitigate these issues. with the Key Distribution Center (KDC). In this framework,
the proposed KDC consists of three different servers. Among
A. MOTIVATION them, two are public servers (one server interacts with clients
To address the aforesaid issues and challenges of the exist- only and the other communicate with Big Data service
ing authentication schemes that assists security in Hadoop providers) and the other is private server. Initially, clients and
framework, we set the following objectives in the proposed Big Data service providers enrol themselves offline with the
scheme: private server. After offline registration, both client and ser-
1) The proposed protocol should have the capability to vice provider would eligible for online registration through
enroll the Hadoop cluster’s service server online (in- the respective KDC’s public server. As the private server
stead centralized deployment of service servers) with is hidden from universal access, it ensures the server-side
the authentication service provider by advocating the security. In the proposed framework, after online registration,
scalability issue. the service providers are able to enroll his service servers
2) The proposed protocol should prevent different well- (Namenode servers or Job Trackers) online with the KDC.
known attacks, such as man-in-the-middle, replay, So, the service server registration is simple and scalable in
denial-of-service, privileged-insider, impersonation, nature. Mean while, the clients’ are able to communicate
identity compromisation, ciphertext-only, sever- directly with the service servers’ (Namenode Servers or Job
spoofing and chosen plaintext attacks. Trackers) in a Hadoop cluster after establishing a secret key
3) The proposed scheme should have a fault-tolerant and with the KDC’s public server followed by a two-server based
dependable authentication architecture to address the mutual authentication process (we call it as single sign-on).
existing SOV and SOF issues. This single sign-on facility provides the access to any number
4) In the proposed scheme, the authentication task for both of service servers by accomplishing only one time authenti-
Big Data technology provider and user should be more cation with the KDC. As a solution to the server spoofing and
robust and user friendly advocating less usage of se- DoS attacks, we consider here a two-factor (password and
VOLUME 6, 2018 3
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

authorization token) based authentication strategy. To pre- proposed scheme is secure against man-in-the-middle
serve privacy, in our scheme, we make the original identity and replay attacks.
of users and service servers fully anonymous. To enhance • To enhance security, the proposed protocol has a facil-
the robustness and correctness of entity authentication, we ity to dynamically update user’s password and service
propose a new digital signature based entity verification server’s secret credentials online with the help of the
scheme utilizing both symmetric and asymmetric key cryp- authentication servers.
tography. As a remedy of chosen plaintext attacks, we use • Finally, the proposed scheme is user-friendly in nature
stateless Cipher Block Chaining (CBC) mode of symmetric and the user needs to remember only his/her identity
encryption and decryption strategy where a random nonce is and password to login into the proposed authentication
utilized as an Initial Vector (IV). To establish a secure ses- system.
sion between two communicating parties, we propose a new
pair-wise session key agreement policy using elliptic curve C. ROAD MAP OF THE PAPER
cryptography. As a solution to the client-side workstation The remainder of the paper is structured as follows. We
compromisation attacks, we store client’s secret information discuss the network model of HDFS in Section II. Section
indirectly into a private place (server-side) and later on it III presents the related work associated with the entity au-
can be fetched by the legitimate user’s only. Moreover, to thentication in Hadoop. The necessary related mathematical
check the integrity of data blocks which resides into various preliminaries are discussed in Section IV, which are help-
chunk servers or Datanodes in HDFS, a Hash-based Message ful for describing and analyzing the proposed protocol. In
Authentication Code (HMAC) based secure HDFS-read and Section V, we demonstrate the proposed scheme. Section
HDFS-write operations has been introduced. VI presents both formal security analysis using the widely-
accepted Real-or-Random (ROR) model and informal se-
B. RESEARCH CONTRIBUTIONS curity analysis of the proposed protocol. In Section VII,
we simulate the proposed protocol under the broadly-used
The major research contributions devised in this paper are
On-the-Fly Model Checker (OFMC) and SAT-based Model
listed below.
Checker (SATMC) backends by utilizing the AVISPA tool
• We propose a new secure and scalable enrollment and summarize the attack traces. Section VIII presents the
methodology to register a cluster of service servers with performance analysis of the proposed scheme. In Section IX,
the trusted third party server by eliminating traditional we elaborate few appealing features as the realizations of the
in-house (centralized) service server registration policy. proposed scheme. Finally, we conclude the paper in Section
• We then introduce a new fault tolerant authentication X.
framework to provide dependable authentication ser-
vices for remote clients. II. NETWORK MODEL OF HDFS
• Next, we propose a new digital signature based mutual Apache Hadoop 8 is an open source and provides a new way
authentication policy, where each principal is able to for storing and processing Big Data. It consists of two core
verify the legitimacy of other intended principals along components. The former one is File Store (FS) and later one
with the trusted third party on which both the principal is a Distributed Processing System (DPS). The FS is called
rely on. as HDFS 9 and the DPS is termed as MapReduce 10 . HDFS is
• We also introduce a new approach for security cre- a distributed file system designed for storing very large files
dentials distribution and replication policies in order to with streaming data access patterns, running on clusters of
mitigate server-side single point of failure and single commodity hardware. Files are divided into blocks (default
point of vulnerability issues. block size is 64 MB) and blocks are replicated and stored at
• To distribute the session key securely between two different chunk servers (also called slave servers). The basic
intended principals, we then propose an elliptic curve architecture of HDFS is shown in Figure 1.
cryptography based session key distribution policy by Note here, we have shown only two Namenode servers
utilizing the concept of in-memory caching. (NSs) and one JobTracker (JT) in Figure 1, but in practical
• In addition, we propose a mechanism to disseminate HDFS federation architecture 11 it cloud vary up to n number
the session key between two communicating entities of such servers. Intuitively, the three noteworthy classifica-
without compromising their identities. tions of machine roles in a single Hadoop Cluster (HC) are
• The extensive formal security inspection by utilizing de client machine (i.e., HDFS Client), Master Node (MN) (i.e.,
facto Real-Or-Random (ROR) model and the informal combination of Namenode and Job Tracker) and Slave Nodes
security analysis substantiate that the proposed protocol (SNs) (i.e., combination of Datanodes (DNs) and Task Track-
can address various well-known attacks against active
8 https://hadoop.apache.org/releases.html
and passive adversaries. 9 https://hortonworks.com/apache/hdfs/
• The formal security verification using the widely-used 10 https://hortonworks.com/apache/mapreduce/
AVISPA tool has been carried out for the proposed pro- 11 https://hadoop.apache.org/docs/current/hadoop-project-dist/
tocol, and the AVISPA simulation results assist that the hadoop-hdfs/Federation.html

4 VOLUME 6, 2018
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

FIGURE 1: System architecture of HDFS Federation (new release)

ers (TTs)) (see Figure 1). All these components are connected here is two fold: first, we discuss the state of the art au-
through a communication network. In this architecture, for thentication protocols and its variant that are actually studied
a single Hadoop Cluster say HCj , the MN regulates two in Hadoop (Big Data) platform and finally, we present the
useful functions i.e., reliable data storage using HDFS and recent development of Authenticated Key Exchange (AKE)
parallel computations utilizing MapReduce framework. The protocols in the domain of Cloud Computing platform as
Namenode manages and facilitates distributed data storage well as two-server based Password-assisted Authenticated
services wherein the JT administers the parallel processing Key Exchange (PAKE) schemes.
of stored data utilizing MapReduce (MR). Moreover, SNs are
responsible for streamline data blocks storing and running the A. STATE OF THE ART AKES FOR HADOOP (BIG DATA)
parallel computations over the stored data blocks. Each SN PLATFORM
runs both Datanode and TT daemon and receives instructions
A limited number of authentication and key exchange pro-
from their MN. The TT daemon is a slave to the JT and
tocols [5], [11], [13], [14], [15], [16], [17], [22], [23], [24],
the Datanode daemon acts as a slave to the Namenode.
[30], [32], [42] has been found in this category.
Client machine has Hadoop installed with all the cluster
settings, however, it is neither a master nor a slave. Rather, Shen et al. [15] have proposed a theoretical prototype
the role of the Client machine is to load data into the cluster, system combined with trusted platform support service. In
submit MapReduce jobs portraying how that data ought to be their scheme, they have used a Trusted Computing Platform
processed, and after that, retrieve or view the results of the (TCP) to resolve the process of authentication in Hadoop. In
job when it’s completed. TCP, the users identity is preserved and it is encrypted with
users personal key and this mechanism is integrated in the
hardware such as the BIOS and TPM. So it is very hard to
III. RELATED WORK decipher a user identity. The TCP is based on the Trusted
Hadoop framework has been evolved to manage a massive Platform Module (TPM). The TPM is used to safeguard the
volume of data in Cloud. However, Hadoop does not provide system from different kind of hardware and software attacks.
any robust authentication mechanism for principals’ authen- Authors have also pointed out the limitations in their scheme:
tication [5], [6], [7], [30], [34]. Since very few literature is (i) the stored data in the Datanodes will be decrypted when
available in this domain, the related work that is illustrated being accessed and will re-encrypted with different key after
VOLUME 6, 2018 5
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

being accessed, the performance of system will be reduced, scheme. The authors also did a comparative analysis between
(ii) in order to make the authentication system trusted, some their proposed user authentication service versus Nivethitha
information are need to be stored among Namenodes, Datan- et al. scheme and found out that their scheme requires less
odes and users, and finally (iii) TCB needs to fulfill many number of hash operations as compare to Nivethitha et al.’s
requirements of server-side and user-side, so it may be raised scheme.
bottleneck situation in the system. For users’ job authorization in Hadoop, an hash-based
Neuman et al. [30] proposed Kerberos authentication pro- (MD5 and SHA-1) delegated job token mechanism has been
tocol. In their proposed approach, a user first registers with reported in [5]. OAuth [22], OpenID connect [32] and SAML
the system to avail the services. In this scheme, all the [42] are the new evolving authorization delegation based
messages (i.e., authentication messages, service messages) approach and it has been prioritize over traditional Kerberos
are first encrypted using a shared secret key between two protocol for principals’ authorization and single sign-on ca-
parties, and then the two parties communicate with each pability incorporated in Hadoop.
other with encrypted forms of messages. It may be noted that Rahul and GireeshKumar [12] proposed a novel authenti-
in the Kerberos protocol, a password based approach with cation framework for Hadoop. Their framework uses cryp-
a token based strategy needs to be followed for principal tographic functions such as public key cryptography, private
authentication. According to the current practice, a user key cryptography, hash functions, and random number gen-
makes an authentication request to an authentication server erator. In this framework, they define a new key for each
(AS) by means of a plain text containing “username" [34]. client and authenticate all clients and services using this key.
In this context, an attacker can eavesdrop the “username" They claimed that their authentication framework offers user
and later expose himself to the AS as a legitimate user. In data protection, a new way of privilege separations, and basic
other word, an attacker can easily determine from the trans- security needs for data storing inside HDFS.
mitted message that which users are currently online. In this Sadasivam et al. [11] proposed a novel authentication pro-
situation, an attacker has scope to make man-in-the-middle tocol for Hadoop in cloud environment, where they have used
attacks and replay attacks [10]. Further, an eavesdropper can the basic properties of a triangle and modified two server-
make identity compromisation and impersonation attacks by based model to improve the security level of Hadoop clusters.
stealing the “username" if the channel is insecure [34], [43], In their scheme, they have interpreted and alienated the user
[44]. Moreover, the AS issues an authentication ticket (AT) given password using the authentication server and stored
to an end user after verifying only its “username" without in multiple back-end servers along with the corresponding
verifying user’s password or other security credentials [10]. username.
However, as “username" is not a confidential credential, there Kang and Zhang [23] proposed an Identity-Based Authen-
is an opportunity for an attacker to get multiple authenti- tication (IBA) scheme which is of short key size, identity-
cation tickets by simply sending a “username" to the AS. based, non-interactive. This scheme divides the sharing users
As a consequence, a cryptanalyst can decrypt the ciphertexts into the very same domain and in this domain relies on the
(i.e., ATs) using some knowledge about underlying user’s sharing global master key to exercise mutual authentication.
password. Thus, this scheme is vulnerable to Ciphertext-only Their IBA scheme can be enabled by an emerging crypto-
Attack (COA). To avert this challenges, a public key infras- graphic technique from the bilinear pairing (i.e., Weil and
tructure based Kerberos namely PKINIT [24] is reported and Tate pairing [45] and its security can be assured by the Bi-
deployed in Hadoop. But, it is not properly addresses the user linear Diffie-Hellman Problem (BDHP)). But the limitation
and service server’s privacy issues and other security threats of this scheme is, if the global master key is leaked, then the
[10]. total system will be jeopardized.
Nivethitha et al. [14] proposed an authentication scheme Sharma and Navdeti [6] listed various security mecha-
for Hadoop and it is based on the encryption mechanism nisms inside Apache Hadoop Stack. According to the au-
using one-time pad key. A random key is used to encrypt thors, most of the cases, the Kerberos approach is preferably
the password for secure transmission between the two servers used for delivering authentication services.
(Registration Server and Back-end Server). Authors has Srinivas et al. [17] proposed 2PBDC: a privacy-preserving
claimed that their protocol makes the Hadoop environment Big Data collection scheme in cloud environment utilizing
more secure as the new random key for encryption is gen- elliptic curve cryptography. The authors shows that 2PBDC
erated for each login. They also claimed that their scheme offers a better trade-off among the security and functionality
reduces the possibility to decrypt the cipher stored into the features, communication and computation overheads. Aujla
server for an adversary as it involves the knowledge about et al. [16] proposed SecSVA: Secure Storage, Verification,
the valid random key. Mrudula et al. [13] illustrated that and Auditing of Big Data in the Cloud Environment. The
Nivethitha et al.’s scheme is vulnerable to offline password authors presented an attribute-based secure data deduplica-
guessing attack and on success of it, an attacker can perform tion framework for data storage on the cloud, Kerberos-based
all major attacks on HDFS. They proposed a new authen- identity verification and authentication, and Merkle hash-
tication service for Hadoop framework which is light weight tree-based trusted third-party auditing on cloud.
and resists all major attacks as compared to Nivethitha et al.’s
6 VOLUME 6, 2018
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

B. STATE OF THE ART TWO-SERVER BASED PAKES Advanced Encryption Standard (AES) for both public and
AND AKES FOR CLOUD COMPUTING PLATFORM private key cryptography, respectively. The cryptographic
Karla and Sood [46] proposed cookie-based authentication hardness property related to Elliptic Curve Decisional Diffie-
and key exchange protocol for cloud and IoT environment. Hellman Problem (ECDDHP), Elliptic Curve Discrete Loga-
Later, Kumari et al. [47] pointed out the security flaws of rithm Problem (ECDLP) and Indistinguishability of Encryp-
Karla and Sood scheme. Yang et al. [48] proposed an au- tion scheme under Chosen Plaintext Attack (IND-CPA) are
thentication scheme in a cloud environment setting. However, briefly explained in the following subsections. The proposed
Chen et al. [49] pointed out the security pitfalls in Yang scheme also utilizes the collision-resistant cryptographic
et al.’s scheme [48] that it is vulnerable to insider and one-way hash function. This section provides a brief dis-
impersonation attacks. To withstand these security loopholes cussion about the aforesaid mathematical preliminaries as
in Yang et al.’s scheme, Chen et al. then designed a dynamic follows.
ID-based authentication scheme for cloud computing envi-
ronment, which is based on the elliptic curve cryptography A. INDISTINGUISHABILITY OF ENCRYPTION SCHEME
(ECC). Wang et al. [50] reviewed Chen et al.’s scheme [49], UNDER CPA
and proved that their scheme is vulnerable to offline password The indistinguishability of encryption scheme under chosen
guessing as well as impersonation attacks. In addition, it plaintext attack (IND-CPA) [61] is mathematically explained
was found that Chen et al.’s scheme does not provide user as follow:
anonymity and it also has clock synchronization problem.
Definition 1 (IND-CPA Secure): Assume SGL or
Later, Hao et al. [51] presented a time-bound ticket-based
M EL be the single or multiple eavesdropper/s
mutual authentication scheme for cloud computing. The pur-
respectively, and OLEK1 , OLEK2 , · · · , OLEKM
pose of using the time bound tickets is to reduce the server’s
be M different independent encryption oracles
processing overhead. Unfortunately, Jaidhar [52] identified
related to EK1 , EK2 , · · · , EKM encryption keys,
that Hao et al.’s scheme [51] is insecure against denial-of-
respectively. The advantage functions of SGL and
service attack during the password change phase. Wazid et al. IN D−CP A
M EL respectively, are defined as AdvE,SGL (K) =
[20] also proposed a provably secure user authentication and
|2 · P rob[SGL ← OLEK1 ; (p0 , p1 ←R SGL);
key agreement scheme for cloud computing environment.
µ ←R {0, 1}; τ ←R OLEK1 (pµ ) : SGL(τ ) = µ] − 1|,
Their scheme withstands the weaknesses of the existing IN D−CP A
and AdvE,M EL (K) = |2 · P rob[M EL ←
schemes and it also supports extra functionality features,
OLEK1 , OLEK2 , · · · , OLEKM ; (p0 , p1 ←R M EL);
such as user anonymity, and efficient password and biometric
µ ←R {0, 1}; τ1 ←R OLEK1 (pµ ), τ2 ←R
update phase in multi-server environment.
OLEK2 (pµ ), · · · , τM ←R OLEKM (ptµ ) : M EL(τ1 ,
Recently, Gope and Das [53] proposed an anonymous
τ2 , · · · , τM ) = µ] − 1|. We can say a symmetric cipher E is
mutual authentication scheme for ubiquitous mobile cloud
IND-CPA secure for the single or multiple eavesdropper/s
computing services, which allows a legitimate mobile cloud IN D−CP A IN D−CP A
setting if AdvE,SGL (K) (or AdvE,M EL (K))
user to enjoy n-times all the ubiquitous services in a secure
is negligible for the given security parameter K of any
and efficient way, where the value of n may differ based
probabilistic and polynomial time adversary SGL (or
on the principal he or she has paid for. In addition, Odelu
M EL).
et al. [21] reviewed Tsai-Lo’s scheme [54] and pointed out
From Definition 1, it is easy to proof that a deterministic
that their scheme does not provide the session-key security
encryption scheme is not IND-CPA secure [61]. Further,
and also strong user credentials’ privacy. To remove the
there exists five generic modes of symmetric encryption
security weaknesses found in Tsai-Lo’s scheme, Odelu et
scheme in the literature, namely Electronic Codebook (ECB),
al. designed a provably secure authentication scheme for
Output Feedback (OFB), Cipher Block Chaining (CBC), Ci-
distributed mobile cloud computing services. In addition to
pher Feedback (CFB) and Counter (CTR) respectively. From
this, various biometric and smartcard based multi-factor au-
these aforesaid modes, both ECB and stateful CBC modes
thentication protocols [55], [56], [57], [58], [59], [60], [61],
are not IND-CPA secure, particularly in stateful CBC mode
[62], [63] are found in the recent literature for multi-server
the value of Initialization Vector (IV ) remains constrained
environment. In spite of these approaches, various two-server
which is shared between the sender and receiver. But, in
based PAKE schemes [64], [65], [66], [67], [68] are evolved
stateless CBC mode, the IV value is chosen randomly for
to mitigate server-side dependability (by addressing single
each message block. Thus, we use AES with stateless CBC
point of failure and single point of vulnerability) and security
mode of encryption or decryption policy throughout this
issues.
paper so that it becomes IND-CPA secure [61].
IV. MATHEMATICAL PRELIMINARIES
The proposed authentication protocol is based on both asym- B. ONE-WAY HASH FUNCTION AND ITS PROPERTIES
metric and symmetric key cryptography. In this context, A one-way hash function h: {0, 1}∗ → {0, 1}l takes a binary
in this work, we use Elliptic Curve Cryptography (ECC) string of variable length input, say x ∈ {0, 1}∗ and results
and stateless CBC (cipher block chaining) mode of the a binary string h(x) ∈ {0, 1}l as an output of fixed length,
VOLUME 6, 2018 7
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

say l bits. The formal definition of h(·) is provided as follows V. THE PROPOSED PROTOCOL
[69] [10]. In this section, we discuss the proposed scheme in detail. We
Definition 2 (Collision-resistant one-way hash function): If call the proposed scheme as HEAP (Efficient Authentication
an adversary A’s advantage in finding collision in hash Protocol for Hadoop). The system architecture of HEAP is
HASH
outputs with the execution time t is denoted by AdvA (t), shown in Figure 2.
HASH
it is defined by AdvA (t) = P r[(x, y) ←R A: x 6= y
and h(x) = h(y)], where P r[E] is the probability of an A. SYSTEM MODEL
event E and (x, y) ←R A means the pair (x, y) is randomly Six types of principals are involved in the proposed sys-
chosen by A. By an (η, t)-adversary A attacking the collision tem model: 1) client (C), 2) Big Data Service Provider
resistance of h(·), it indicates that the execution time of A is (BDSP) 3) Namenode Server (N S) or Job Tracker (JT ),
HASH
at most t and that AdvA (t) ≤ η. 4) Client Management Server (CMS), 5) Namenode Man-
Examples of a one-way hash function include the Secure agement Server (N M S) and 6) Enrolment Server (ES).
Hash Standard (SHA-1) hashing algorithm and the stronger Both CM S and N M S are the public servers in two-server
SHA-256 hashing algorithm [70]. model, whereas ES is the private server. CM S is reachable
to Ci utilizing a client application instance say HCAj ,
C. ELLIPTIC CURVE AND ITS PROPERTIES where i ∈ {1, 2, 3, · · · , n}. N M S is reachable to BDSPj ’s
Suppose m, n ∈ Zp , where Zp = {0, 1, . . . , p−1} and p > 3 administrator using a server application instance say HSAk ,
is a prime [10]. A non-singular elliptic curve y 2 = x3 +mx+ where j ∈ {1, 2, 3, · · · , m}. Both CM S and N M S are
n over the finite field Zp is the set Ep (m, n) of solutions reachable to adversaries but, ES operates in the background
(x, y) ∈ Zp × Zp to the congruence and it is fully supervised internally by the respective system
administrator only. Thus, ES is fully trusted principal in the
y 2 ≡ x3 + mx + n (mod p),
network. To make the proposed system model fault-tolerant,
where m, n ∈ Zp such that 4m + 27n2 6= 0 (mod p), and
3
we distribute Ci ’s secret credentials into two servers (N M S
a point at infinity or zero point O. and ES) whereas disseminates BDSP ’s administrators and
Note that 4m3 + 27n2 6= 0 (mod p) is a necessary and their deployable service server’s (N S’s and JT ’s) private
sufficient condition to ensure a non-singular solution for the (secret) information into another pair of servers (CM S and
Eq. x3 + mx + n = 0 [71]. 4m3 + 27n2 = 0 (mod p) ES).
implies the elliptic curve is singular. Let P = (xP , yP ), Initially, BDSP ’s administrator needs to register all the
Q = (xQ , yQ ) ∈ Ep (m, n). Then xQ = xP and yQ = −yP service servers (N S and JT ) of his own Hadoop cluster
when P + Q = O. Also, P + O = O + P = P , for all with ES online. To do this, BDSP ’s administrator first
P ∈ Ep (m, n). Hasse’s theorem states that the number of enrol himself with ES and go through an authenticated key
points on Ep (m, n), denoted as #E, satisfies the following agreement procedure utilizing both N M S and CM S server.
inequality [72]: C enrol himself with ES during registration phase, but the
√ √ authenticated session key agreement task will be held by
p + 1 − 2 p ≤ #E ≤ p + 1 + 2 p.
both CM S and N M S. To prove the legitimacy of client
In other words, there are about p points on an elliptic curve C and BDSP ’s administrator, both need to give responses
Ep (m, n) over Zp . Also, Ep (m, n) forms a commutative or about three different challenges (specifically maintained by a
an abelian group under addition modulo p operation. two-step verification process utilizing user identity, password
• Elliptic curve point addition: Let P, Q ∈ Ep (m, n) be and digital signature) assisted by both the servers (CM S
two points on the elliptic curve. Then, R = (xR , yR ) = and N M S). The successful legitimacy checking provides
P + Q is calculated as follows [72]: a Big Data storage or processing service server ticket to C
and service server enrolment access privilege to BDSP ’s
xR = (λ2 − xP − xQ ) (mod p),
administrator. The provided service ticket will then give
yR = (λ(xP − xR ) − yP ) (mod p), access to the N S or JT after accomplishment of a mutual
( yQ −yP
xQ −xP (mod p), if P 6= −Q authentication and session key establishment process. The
where λ = 3xP 2 +m application instance HCAj will give access to the CM S for
2yP (mod p), if P = Q.
Ci including adversaries whereas HSAk will provide access
• Elliptic curve point scalar multiplication: In ECC, to N M S for BDSP ’s administrator including attackers.
multiplication is done as repeated additions. For exam- But, it is not possible for an adversary to access both CM S
ple, 5P = P + P + P + P + P , where P ∈ Ep (m, n). and N M S together utilizing a single application say HCAj
Definition 3 (ECDLP assumption): Given an elliptic curve or HSAk .
Ep (m, n) and two points R, S ∈ Ep (m, n), find an integer x
such that S = x · R. B. ADVERSARY MODEL
Definition 4 (ECDDHP assumption): Given a point R on an Presently, we have found three widely used threat mod-
elliptic curve Ep (m, n) and two other points x · R, y · R els in the literature such as, Dolev-Yao threat model (DY
∈ Ep (m, n), find (x · y) · R. model) [73], Canetti and Krawczyk adversary model (CK-
8 VOLUME 6, 2018
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

FIGURE 2: System architecture of HEAP

adversary model) [74], and Extended Canetti and Krawczyk between C and N S or C and JT during communication. We
threat model (eCK-adversary model) [75] to model active and also assume that the information stored at the C’s workstation
passive adversaries. However, we adopt DY model and CK- (mobile device) can be stolen by Adv and after obtaining
adversary model to study the proposed protocol. the stolen information, Adv can perform the stolen-verifier
Under DY model [73], an insecure channel between two and privileged-insider attacks. In addition, under the CK-
communicating parties has been modeled mathematically in adversary model, adversary Adv can have access some form
such a way that an adversary Adv can intercept, delete or of secret credentials including session key and session states
modify the exchanged messages. In addition to this, Adv between C and N S or C and JT . Under this assumption,
may insert a fake message into the communication media to the proposed protocol needs to show less security breech
disgust the normal operations between two communicating possibility of other entities’ (BDSP , CM S, N M S and ES)
parties. In the CK-adversary model [74] (the super set of secret credential due to the leakage of “session ephemeral
DY model), the adversary Adv not only eavesdrop, delete secrets" between C and N S or C and JT .
or modify the exchanged messages between two communi- However, in this study, we inspect several known security
cating parties but also having the access to the session keys threats such as, chosen plain-text, denial-of-service, man-in-
(short-term keys), long-term secret keys and session states the-middle, online password guessing, server compromisa-
of each party involves into the key agreement process. This tion, replay, privileged-insider, stolen-verifier, offline pass-
model ensures the security of the authenticated key agree- word guessing, workstation compromisation, server spoofing
ment protocol considering some sorts of security credentials and identity compromisation attacks considering both DY
(long-term and short-term) leakage and its impact on the and CK-adversary model.
security of other secret credentials.
We follow both DY and CK-adversary model in the pro- C. GENERAL OVERVIEW OF HEAP
posed protocol, where we assume HEAP-KDC is trusted for HEAP goes through five basic operations: (i) HEAP-KDC
both C and N S (or JT ). Further, it is assumed that CM S configuration, (ii) user enrollment, (iii) Big Data service
and N M S are semi-trusted, whereas ES is fully trusted provider registration, (iv) Hadoop Cluster vis-a-vis service
server. According to the policy of the DY model, any two server enrollment and (v) mutual authentication and session
parties such as C and N S or C and JT , are not consid- key agreement between user and service server. Two security
ered as trustworthy principals in the network. Therefore, in application instances namely HCAj and HSAk are running
this DY model, an adversary (active and passive) Adv can separately on user’s workstation and service provider work-
then eavesdrop, modify or delete the exchanged messages station to access a particular public server (CM S or N M S)
VOLUME 6, 2018 9
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

of the HEAP-KDC’s realm. More precisely, C accesses only C. Utilizing this two tickets, C establishes a secret session
CM S through the application HCA and the service provider with a service server (either N S or JT ) associated with a
accesses only N M S through the application HSA. particular cluster.
Initially, a Big Data Service Provider (BDSP ) (or specif- During two-server based mutual authentication and key
ically the BDSP’s administrator (BA)) needs to register him- agreement phase, the service provider (BA) enters his origi-
self with the ES through out-of-band channel (for example, nal identity and password to HSA. HSA computes a masked
a postal network) with his identity proof documents, service identity of the given identity. HSA then construct a digital
level agreement, service server details, etc. This entails to signature on the masked identity utilizing a random nonce
avail a one-time dummy identity, one-time dummy password and BA’s chosen private key. After that, HSA sends the
and a pass-phrase to the service provider. After receiving masked identity and the digital signature to N M S. In the
these parameters offline, the service provider enrolls himself same way, N M S construct a digital signature by sign-
online with the HEAP-KDC utilizing both N M S and HSA. ing its original identity with its private key. N M S sends
During online registration, the service provider needs to BA’s masked identity and BA’s digital signature along with
provide both the dummy identity and the dummy password N M S’s encrypted digital signature and N M S’s identity to
to N M S through HSA. This entails to get a user account CM S. After verifying the identities of both BDSP and
creation permission from N M S. In such a provision, the ser- N M S, CM S decrypt N M S’s signature. Thereafter, CM S
vice provider sends his information namely, masked identity, modifies both the digital signatures utilizing the previously
masked password, email identity and mobile number, etc. shared pass-phrases of both the parties (BDSP and N M S)
to both CM S and ES servers utilizing N M S via a secure and CM S’s private key. CM S encrypts the modified signa-
channel (utilizing pass-phrase as a key). After sending the tures using BDSP ’s masked password and N M S’s shared
service provider’s information to CM S and ES, N M S only key. CM S then sends both the encrypted signatures to
keeps the masked identity in its database. Similarly, an end N M S. N M S decrypts the respective signature and veri-
user C register himself with ES by taking the help of CM S fies the legitimacy of both BDSP and CM S utilizing the
and HCA. During this operation, C sends his transformed previously loaded security parameters and sends the other
identities, masked password, email identity and mobile num- signature to BDSP . After receiving the modified signature,
ber securely to N M S and ES servers via CM S. In this way, BDSP (or BA) checks the legitimacy of both N M S and
both user and service provider registration process has been CM S. Finally, using the random nonces and ECC, both
accomplished. BDSP and N M S establish a session key between them-
After accomplishment of registration task, the BDSP selves. In the same way, at the time of single sign-on process,
needs to keep his original identity and password with himself. C establishes a short-term key with CM S.
These two secrets are utilized at the time of service provider After establishment of a secure session with N S utilizing
login phase. After successfully logged in into its workstation HCA, C outsources its Big Data in terms of raw data blocks
(locally) using HSA, the BDSP can register his Hadoop and its replicas into several Datanodes or chunk servers under
Cluster vis-a-vis Big Data storage and processing service the supervision of N S. In such a provision, HCA supplies
servers (N Ss’ and JT s’) with HEAP-KDC followed by a the session key to the corresponding HDFS Client (HDCL) to
mutual authentication and key agreement process utilizing achieve secure and integrity-assisted HDFS-read and HDFS-
both N M S and CM S servers (two-server based authenti- write operations. Thus, it will protect the user’s confidential
cation). This process is scalable in nature, where any Big Big Data from the third party interception.
Data service providers can able to enroll his cluster’s service To make the proposed authentication protocol fault tol-
servers’ online with HEAP-KDC. BDSP securely enrolls erant in terms of security credentials’ replications, we keep
each service server to both CM S and ES through N M S the service providers and service servers credentials (mainly
by assigning a service server’s masked identity and a masked transformed identities and masked passwords) under CM S’s
password. custody whereas disseminate the C’s credentials to N M S.
Similarly, after registration, C needs to keep his user Mean while, all the security credentials information are
identity and password secret with himself for logging in replicated concurrently into ES server. In addition to this,
into its workstation. At the time of login, C needs to au- to transform the service provider’s identity and password,
thenticate himself in its workstation locally utilizing HCA two random secrets (one secret generated by N M S and
by providing his identity and password. After that, C goes other produced by the HSA) are embedded with the ser-
through a mutual authentication and key formation process vice provider’s original user identity and password first,
utilizing both CM S and N M S servers (dual server based and afterwards a cryptographic one-way hash function has
authentication). This entails a short-term key to C. Utilizing been applied with themselves. Similarly, C’s original user
this short-term key, C establishes a secure session with identity and password are encapsulated with CM S’s chosen
CM S. We call this process as single sign-on of client C. secret and HCA’s secret, respectively. Thus, the aforesaid
After the single sign-on task, CM S provides two encrypted mechanism leads to create a strong password for both the
tickets: (1) client ticket and 2) service server (either Big Data parties (service provider and C) as well as reduce the chance
storage service server or processing service server) ticket to of both single point of failure and single point of vulnerability
10 VOLUME 6, 2018
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

Notation Description
issues.
BDSP Big Data Service Provider
To discuss HEAP methodology, we use various notations BDSPID Original identity of BDSP
throughout the paper. The notations and their descriptions BDSPP W D Password of BDSP
C Client
are listed in Table 1. In addition to this, we make certain CID Original identity of C
assumptions for HEAP. These assumptions are described as CM S Client Management Server
follows: CertE Certificate of an entity E issued by TCCA
CM SID Public identity of CM S
ES Enrollment Server or Registration Server (RS)
1) Two security application instances say HCAi and ESID Public identity of ES
HSAk are running concurrently into two separate work- E(K : [msg]) A message msg is encrypted using the key K
stations (user’s and service provider’s workstations) Ep (m, n) An elliptic curve y 2 ≡ x3 + mx + n (mod p)
on the finite field Zp
which enable to access the public servers (i.e., CM S E A entity such as C, BDSP , CM S, N M S, N S or JT
and N M Ss) in a HEAP-KDC realm. G A generator point ∈ Ep (m, n) of some order (say, q)
2) An administrative application instances say, “h-admin" HCAj An application instance running on Ci ’s
workstation to access the CM S server
takes the responsibility of initial credentials’ (i.e., pri- HCAID Public identity for the application HCAj
vate keys, pass-phrases and public identities) configura- HSAk An application instance running on BDSPj ’s
tion in the key distribution center (HEAP-KDC) (also workstation to access the N M S server
HCAID Public identity for the application HCAj
see Figure 2).
HCj j th Hadoop cluster
3) A Trusted Central Certification Authority (TCCA) h(·) One way hash function such as (SHA-1 or SHA-2)
chooses a generator G on the elliptic curve Ep (m, n) of H1 (·) A one way hash function, where

order q, and selects two cryptographic hash functions H1 : {0, 1} → P
H2 (·) A one way hash function, where
say, H1 (·) and H2 (·). Further, the TCCA generates a H2 : P → {0, 1}∗
certificate CertE for an entity E. The entity E ran- JT Job Tracker
domly chooses sE ∈ Z∗q as private key and computes JTID Public identity of JT
KDC Key Distribution Center
the corresponding public key as QE = sE · G. KN M S,C A pass-phrase shared among N M S, ES and C
4) BDSP enrolls (i.e., online registration) himself with KCM S,N M S A shared symmetric key between CM S and N M S
CM S via N M S followed by an offline registration KCM S,N S A shared symmetric key between CM S and N S
KCM S,JT A shared symmetric key between CM S and JT
with ES. After registration, BDSP needs to login into KCM S,BDSP A pass-phrase shared among CM S, ES and BDSP
the system. After login, BDSP registers his Hadoop NMS Namenode Management Server
cluster vis-a-vis the service servers (N Ss’ and JT ’s) NS Namenode Server
N SID Public identity of N S
with HEAP-KDC. Note that at least one cluster needs N M SID Public identity of N M S
to be deployed with the KDC before initiating client PWD Password of C
registration. P A set of all points on the elliptic curve Ep (m, n)
QCM S Public key of CM S
5) C enrolls (i.e., online registration) himself with N M S QN M S Public key of N M S
via CM S followed by an offline registration with ES. QC Public key of C
After registration, C needs to login into the system for QN S Public key of N S
QBDSP Public key of BDSP
accessing the service servers (N S’s or JT ’s). q 160-bit prime
6) C and N S or C and JT are not considered as trusted RCCM S
i
A random number issued by CM S to transform
entity. They should mutually verify their legitimacy with CID and P W D
NMS
RBDSP A random number generated by N M S to transform
the help of both CM S and N M S. After verification,
BDSPID and BDSPP W D
either C and N S or C and JT become trusted to each SCM S Private key of CM S
other. SN M S Private key of N M S
7) CM S keeps masked identities of Cs, RC CM S
s and all SC Private key of C
i SN S Private key of N S
the secret credentials related to the Hadoop cluster vis- SBDSP Private key of BDSP
a-vis the service servers information whereas N M S SIDN j
S A shared pass-phrase between j th N S and CM S
NMS j
stores masked identities of BDSP s, RBDSP s and all SIDJT A shared pass-phrase between j th JT and CM S
SIDN M S,CM S or A shared pass-phrase between CM S and N M S
the secret credentials of clients (Cs). ES is having all SIDCM S,N M S
the secret information of Cs’, BDSP s’ and service T N SID A masked identity of N S
servers’. Note here, it is not permissible for HEAP- T JTID A masked identity of JT
TCCA Trusted Central Certification Authority
KDC’s server (CM S or N M S or ES) to store the
MT U A masked identity of C
secret credentials (mainly identity, password) of any MT U
0
A masked identity of BDSP
principals in a plaintext format. Zq A finite field consisting of 0, 1, · · · , (q − 1) elements
8) C or BDSP goes through a two-server based mutual
authentication to avail services from the service server TABLE 1: Notations and their meanings
or deploy a new cluster with HEAP-KDC.
9) Finally, ES does not available to any other entities (Cs,
N Ss and JT s) except N M S and CM S.
VOLUME 6, 2018 11
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

D. DETAILED DESCRIPTION OF HEAP tity (SBDSPID ), a synthetic password (SBDSPP W D ) and a


This section illustrates the detailed description of the pro- pass-phrase (KCM S,BDSP ) respectively, via the out-of-band
posed protocol phases as follows. channel to the BDSP ’s physical address. Before sending
these three security credentials to BA, ES securely sends
1) HEAP-KDC configuration SBDSPID and SBDSPP W D to the N M S for creating a
HEAP undergoes an initial configuration phase, where a Sys- synthetic account of BA. ES also sends securely the pass-
tem Administrator (SA) frames the Key Distribution Center phrase KCM S,BDSP to the CM S. Note that BA needs
(i.e., HEAP-KDC). In this phase, all public servers are pre- to use SBDSPID , SBDSPP W D and KCM S,BDSP only for
loaded with secret credentials administered by the SA. In this once to create his own profile into CM S with the help
regard, an admin process, called h-admin, runs at the time of N M S. Although, BA is permissible to use this pass-
of HEAP-KDC configuration under the supervision of the phrase KCM S,BDSP for his password updation and service
SA. This phase follows a public server registration process server registration process. Figure 3 summarizes the BA’s
to register both CM S and N M S with ES. In contrast, registration phase, which contains the following steps:
ES loads the public identities of CM S and N M S namely Step BDSPRG1: BA enters SBDSPID and SBDSPP W D
0
CM SID and N M SID in its database and share its public into HSA. HSA generates a random nonce say n1 .
0
identity with both the servers. In addition, the h-admin pro- HSA encrypts SBDSPID and n1 using SBDSPP W D
0
cess assigns three long-term shared secret symmetric keys as BDSP Dtl = E(SBDSPP W D : [SBDSPID , n1 ]).
0
between CM S and ES, N M S and ES, and CM S and HSA sends the message msgB1 = {N M SID , n1 ,
N M S as K(CM S,ES) , K(N M S,ES) and K(CM S,N M S) , re- BDSP Dtl } to the N M S.
spectively. Further, h-admin generates one pass-phrase say Step BDSPRG2: After receiving msgB1 , N M S decrypts
SID(CM S,N M S) . Finally, h-admin loads all these security BDSP Dtl and checks the availability of SBDSPID in
parameters into respective servers. After the completion of its database. If it exists then, BA is permissible to
h-admin process execution, the known security parameters create its own account into CM S and return msgB2 =
0
with the HEAP-KDC are summarized in Table 2. {N M SID , CM SID , n1 }, and goto Step BDSPRG3
CMS CM SID , N M SID , ESID , else, reject BA’s request.
HEAP-KDC SID(CM S,N M S) , K(CM S,ES) , K(CM S,N M S)
NMS CM SID , N M SID , ESID , Step BDSPRG3: BA enters its original identity as
SID(N M S,CM S) , K(N M S,ES) , K(CM S,N M S) BDSPID in HSA. HSA chooses a0 random secret d
ES CM SID , N M SID , ESID ,
K(CM S,ES) , K(N M S,ES) and transforms the BDSPID as T U = h(BDSPID ||
HCA CM SID , HCAID 0
Client and Service provider application HSA N M SID , HSAID d ||HSAID ). Using this transformed identity T U ,
HCA CM SID , HCAID both BA and N M S establishes a shared secret key
(SK(N M S,BDSP ) ) between themselves followed by a
TABLE 2: Summary of pre-loaded credentials into HEAP-
mutual authentication process utilizing the similar anal-
KDC after the execution of h-admin
ogy say “Initial key establishment between Ci and
F EAS" reported in DPTSAP [10].
Note that h-admin also assigns two public identities for
Step BDSPRG4: BA enters its new password as
both client application (HCA) and service provider ap-
BDSPP W D into HSA. HSA computes the masked
plication (HSA), say HCAID and HSAID , respectively.
password BDSPP∗ W D = h(h(d|| BDSPP W D )
HCAID is publicly available to CM S and HSAID is NMS
||RBDSP ). Note that at the time of shared secret
publicly available to N M S. Similarly, the public identity NMS
key establishment process RBDSP was generated by
of CM S is known to HCA whereas the public identity of
N M S and it has been delivered securely using the
N M S are known to HSA. These identities are also useful
key SK(N M S,BDSP ) . Further, both BA’s request and
for the key agreement process at the beginning of the client
N M S’s response messages using DPTSAP [10] are
(or Big Data service provider) registration and login phases,
represented as msgB3 and msgB4 , respectively in Fig-
respectively.
ure 3.
Step BDSPRG5: HSA encrypts BDSPP∗ W D
2) Big Data service provider registration 0
using K(CM S,BDSP ) as BDSP Dtl =
Suppose a Big Data Service Provider (BDSP ) wants to 0

deploy a new Hadoop Cluster (HCj ) for providing the E(K(CM S,BDSP ) : [BDSPP∗ W D , MT U ]), where
0
NMS
data storage and processing services via Internet. To enrol MT U = h(BDSPID || d|| HSAID || RBDSP ).
0
the HCj with HEAP-KDC, BDSP ’s Administrator (BA) HSA sends the message msgB5 = {N M SID , n2 ,
0
needs to register himself with ES offline (i.e., via the out- BDSP Dtl } to the N M S.
of-band channel or postal network) by giving the detail Step BDSPRG6:0 After receiving0
msgB5 , N M S broadcasts
about total number of service servers’, service types, Ser- both MT U and BDSP Dtl to both ES and CM S
0
vice Level Agreements (SLAs), service servers’ location servers (see msgB5.1 ). N M S keeps only MT U and
NMS
information, service servers’ subscription, payment docu- RBDSP in its database and deletes other informa-
0
ments, etc. This entails the BA to avail a synthetic iden- tions, wherein, after decrypting BDSP Dtl , both ES
12 VOLUME 6, 2018
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

BDSP with HSAj NMS CM S and ES

Input: SBDSPID , SBDSPP W D , KCM S,BDSP NMS


Input: CM SID , RBDSP Input: KCM S,BDSP

- compute
BDSP Dtl = E(SBDSPP W D :
0
[SBDSPID , n1 ])
0
msgB1 = {N M SID , n1 , BDSP Dtl }
−−−−−−−−−−−−−−−−−−− −−−−−−−−−→
0 ?
-check SBDSP = SBDSPID
ID
0
msgB2 = {N M SID , CM SID , n1 }
←−−−−−−−−−−−−−−−−−−−−−−−−−− −
- select a random secret d
- compute masked identity
0
T U = h(BDSPID ||d||HSAID )
0
msgB3 = DP T SAP − IKE : {T U }
−−−−−−−−−−−−−−−−−−−−−−−−−−−−−→
- symmetric key establishment
msgB4 = DP T SAP − IKE : {E(SKN M S,BDSP : [RN M S ])}
←−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−BDSP
−−−−−−−
- symmetric key establishment
NMS
-decrypt RBDSP
- compute
0
NMS )
MT U = h(BDSPID || d|| HSAID || RBDSP
and
0 0
BDSP Dtl = E(K(CM S,BDSP ) : [BDSPP∗ W D , MT U ])
0 0 0
msgB5 = {N M SID , n2 , MT U , BDSP Dtl }
−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−− −−→
- Keep
0
NMS
MT U and RBDSP
0 0
M sgB5.1 = {N M SID , MT U , BDSP Dtl }
−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−− −−→
- decrypt password
- store
0
MT U and BDSPP∗ W D
0
M sgB5.2 = {N M SID , MT U }
←−−−−−−−−−−−−−−−−−−−−−−−−
0 0
msgB6 = {N M SID , n2 , MT U }
←−−−−−−−−−−−−−−−−−−−−−−−−− −
- compute
d∗ = d ⊕ h(BDSPID ||BDSPP W D )

N M S = RN M S ⊕ h(d||BDSP
RBDSP BDSP P W D)

FI = h(BDSPID ||BDSPP W D )

K(CM NMS )
= K(CM S,BDSP ) ⊕ h(BDSPP W D ||RBDSP
S,BDSP )
N M S ||d)
BDP W = h(BDSPID ||HSAID ||BDSPP W D ||RBDSP

- HSAj stores d∗ , RBDSP
N M S , FI, K ∗
(CM S,BDSP )
and BDP W

into his database for future use

FIGURE 3: Summary of service provider registration

0
and CM S stores MT U and BDSPP∗ W D into their j = 1, 2, · · · , m). The proposed enrolment phase consists of
corresponding databases (see msgB5.2 ). Finally, N M S three activities: 1) service provider (BA) login, 2) mutual
sends a registration confirmation message as msgB6 to authentication and session key establishment between BA
BA through HSA and goto Step BDSPRG7. and N M S and 3) service servers registration. The detail
Step BDSPRG7: HSA computes d∗ = d⊕ h(BDSPID steps involved in this process are discussed as follows. For
N M S∗ NMS
||BDSPP W D ), RBDSP = RBDSP ⊕h(d simplicity, in this study, we assume a particular cluster say
||BDSPP W D ), FI = h(BDSPID || BDSPP W D ), HCj consists of a single Namenode server (N Sj ) and a

K(CM S,BDSP ) = K(CM S,BDSP ) ⊕h(BDSPP W D single Job Tracker server (JTj ), and N Sj is responsible to
NMS provide the Big Data storage services wherein JTj yields the
|| RBDSP ) and BDP W = h(BDSPID ||HSAID
NMS Big Data processing services to the remote user online.
||BDSPP W D || RBDSP ||d), and then stores these
information into HSA’s database. HSA will use these (A) Service provider login at workstation: After the
information at the time of BA’s login, service server completion of BA’s registration process, BA tries to login
registration and password updation phases. into the system using the server application HSA. Note
Note that after accomplishment of the registration process, here, before initiating service provider (BA) login into BA’s
BA needs to remember only two parameters BDSPID workstation, HSA loads all BA’s transform identities (that
and BDSPP W D to login into the system and then he can is, FI i , where i = 1, 2, · · · , k) into its browser cookie.
enrol any number of Hadoop Clusters (HC) online with A diagram summarizing several communication message
HEAP-KDC. Thus, the proposed service provider registra- exchanges between BA and N M S involved throughout the
tion scheme is user friendly and scalable in nature. The online service provider login, authenticated key establishment and
enrolment process of the HC driven by the BA is presented service server registration process are shown in Figure 9,
as follows. and it contains the following steps:
Step BDSPL1: BA enters his original identity BDSPID
3) Hadoop cluster registration and password BDSPP W D into HSA. HSA computes
Hadoop cluster registration vis-a-vis service servers enrol- FI ∗ = h(BDSPID ||BDSPP W D ) and checks this
ment phase is proposed to carry out the different activities entry exists in its cookie or not. If it exists then HSA
N M S∗
starting from service provider login to online registration of loads the respective d∗ , RBDSP ∗
, KCM S,BDSP and
service servers (mainly, all Namenode servers and JobTrack- BDP W entries for the same BA.
ers belongs to a particular Hadoop cluster say HCj , where Step BDSPL2: HSA computes d = d∗ ⊕ h(BDSPID
VOLUME 6, 2018 13
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

∗ 0
NMS NMS
||BDSPP W D ), RBDSP = RBDSP ⊕ h(d || function bdspSignPairGen[ ](MT U , SBDSP )
∗ {
BDSPP W D ) and K(CM S,BDSP ) = K(CM S,BDSP ) Input: BDSP/BA’s masked identity and BDSP ’s private key
NMS
⊕ h(RBDSP || BDSPP W D ), and goto Step BDSPL3. Output: Two ECC points as λ1BDSP and λ2BDSP
Domain Parameters: H2 (·), H1 (·), G, q
Else, BA can repeat Step BDSPL1 with another user 1. Select a pseudo-random number RBDSP ∈ Z∗q
identity and password. 2. Compute λ1BDSP := RBDSP · G
0
Step BDSPL3: HSA computes BDP W ∗ = h(BDSPID || 3. Compute λ2BDSP c := RBDSP · H2 (H1 (MT U )) +SBDSP ·
1
H2 (λBDSP ) (mod q)
HSAID || BDSPP W D || d) and checks if the condition
4. return RBDSP , λ1BDSP , λ2BDSP
BDP W ∗ = BDP W holds or not. If it holds, BA is }
treated as an authentic service provider. 0
c Note: BDSP digitally sign on both MT U and λ1BDSP using its
(B) Authenticated key agreement phase: After success- private key SBDSP .
ful logging in into the system, HSA initiate an authenticated
key formation process between BA/BDSP and N M S. The
FIGURE 4: BDSP ’s signature generation process
detail steps involved in this process are shown in Figure 9 and
are discussed as follows.
function nmsSignPairGen[ ](N M SID , SN M S )
Step MASKA1: HSA computes BDSPi masked iden- {
0
NMS Input: NMS’s identity and NMS’s private key
tity MT U = h(BDSPID ||d|| HSAID || RBDSP ). Output: Two ECC points µ1N M S and µ2N M S
Further, HSA generates two pseudo-random numbers Domain Parameters: H2 (·), H1 (·), G, q
0 1. Choose a pseudo-random number RN S ∈ Z∗q
(λ1BDSP and λ2BDSP ) utilizing MT U and different 2. Compute µ1N M S := RN M S · G
pre-loaded security domain parameters as the input to 3. Compute µ2N M S d := RN M S · H2 (H1 (N M SID )) + SN M S ·
a function say bdspSignPairGen[ ](·) (see Figure 4). 0 H2 (µ1N M S ) (mod q)
Step MASKA2: HSA forms the message M1 = {MT U , 4. return RN M S , µ1N M S , µ2N M S
}
λ1BDSP , λ2BDSP , N M SID , CertBDSP } and sends it
to N M S via a public channel. d Note: N M S digitally sign on both N M S 1
ID and µN M S using its
Step MASKA3: After receiving M1 , N M S searches its private key SN M S .
0
database to check the existence of MT U . If it finds the
same then N M S generates two pseudo-random num- FIGURE 5: N M S’s signature generation process
bers (µ1N M S , µ2N M S ) by taking N M SID and other do-
main parameters as the input to a function say nmsSign-
PairGen[ ](·) (see Figure 5). Step MASKA7: CM S constructs a message M3 =
Step MASKA4: N M S constructs a message M2 = {CM SID , N M SID , CertCM S , E(KCM S,N M S :
0

{N M SID , CM SID , λ2BDSP , E(KCM S,N M S : [λnew new
BDSP ]), E(BDSPP W D : [MT U , µN M S ])} and
0
[MT U , µ2N M S ])} and sends it to CM S via a public sends the same to N M S.
channel. Step MASKA8: N M S verifies the legitimacy of both
Step MASKA5: After receiving the message M2 , CM S BDSP and CM S utilizing the bcmsVerification(·)
0
searches both MT U and N M SID in its database. function shown in Figure 7. If the function
If both are exists then CM S understands that both bcmsVerification(·) returns “Accept", then N M S
BDSP and N M S are legitimate parties, and goto Step construct a session key SK = RN M S · λ1BDSP =
MASKA6; otherwise, rejects N M S’s request. RN M S 0 · RBDSP · G and a message M4 =
Step MASKA6: CM S loads both BDSP ’s pass- {MT U , N M SID , µ1N M S , CertCM S , CertN M S ,
0
phrase (KCM S,BDSP ) and N M S’s secret identity E(BDSPP∗ W D : [MT U , µnew N M S ]}. N M S sends
(SIDCM S,N M S ) from its database. CM S modifies M4 to BDSP , and goto Step MASKA9; otherwise, it
both BDSP ’s and N M S’s partial signatures (i.e., rejects BDSP ’s request.
µ2N M S and λ2BDSP ) using a function cmodifiedSign- Step MASKA9: After getting the message M4 , BDSP
PairGen[ ](·) (see Figure 6) as (1) µnew 1 2
N M S = µN M S +
verifies both N M S and CM S utilizing the following
SCM S · H2 (H1 (KCM S,BDSP )) (mod q) = RN M S · function ncmsVerification(·) (see Figure 8). If the
H2 (H1 (N M SID ))+ SN M S · H2 (µ1N M S )+ SCM S · H2 function ncmsVerification(·) returns “Accept" then
(H1 (KCM S,BDSP )) (mod q) and (2) λnew BDSP
2
= BSDP receives N M S’s response and constructs
λ2BDSP +SCM S ·H2 (H1 (SIDCM S,N M S )) (mod q) = a session key SKBDSP,N M S = RBDSP · µ1N M S =
0
RBDSP · H2 (H1 (MT U ) +SBDSP · H2 (λ1BDSP ) RBDSP ·RN M S ·G otherwise; rejects N M S’s response.
+SCM S · H2 (H1 (SIDCM S,N M S )) (mod q) and goto
Step MASKA7. Proof of correctness – In order to verify the legitimacy of
1 Note: CM S digitally sign on the messages say µ2
both BDSP and CM S, N M S needs to check λnew BDSP ·
N M S , KCM S,N M S G = λ2BDSP · G + QCM S · H2 (H1 (SIDN M S,CM S )). To
and µ1N M S using CMS’s private key SCM S .
2 Note: CM S digitally sign on the messages say MT U 0 , satisfy the verification condition, it must holds λ2BDSP ·
0
SIDCM S,N M S and λ1BDSP using its private key SCM S . G = λ1BDSP · H2 (H1 (MT U )) +QBDSP · H2 (λ1BDSP ) =
14 VOLUME 6, 2018
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

0
function cmodifiedSignPairGen[ ](λ2BDSP , E(KCM S,N M S : function ncmsVerif(E(BDSPP∗ W D : [MT U , µnew 1
N M S ]), µN M S ,
[N M SID , µ2N M S ])) CertN M S , CertCM S )
{ {
Input: Ci ’s partial signature and encrypted NMS’s partial signature Input: Encrypted CM S’s signed message for BDSP , N M S’s
Output: Both Ci ’s and NMS’s modified signature random nonce, N M S’s certificate, CM S’s certificate
Domain Parameters: H2 (·), H1 (·), G, q Output: Accept or Reject
Data: BDSPP∗ W D , KCM S,BDSP , KCM S,N M S , SIDN M S,CM S , Domain Parameters: H2 (·), H1 (·), G, q
0
N M SID , MT U
0
Data: BDSPP∗ W D , KCM S,BDSP , N M SID , MT U , λ1BDSP ,
00 0 0 2
λBDSP , CBDSP , RBDSP
if(MT U = MT U && N M SID = N M SID )
{ 1. Retrieve QCM S from CertCM S
1. µnew 2 /* CM S’s public key obtained from CM S’s certificate
N M S = µN M S + SCM S · H2 (H1 (KCM S,BDSP )) */
(mod q) 2. Retrieve QN M S from CertN M S
2. λnew 2
BDSP = λBDSP + SCM S · H2 (H1 (SIDCM S,N M S )) /* N M S’s public key obtained from N M S’s certificate
(mod q) */ 0
0 3. Retrieve µnew ∗ U , µnew
3. return µnew new N M S from E(BDSPP W D : [MT N M S ])
N M S , λBDSP , MT U 0
} /* Decrypt E(BDSPP∗ W D : [MT U , µnew N M S ])
else using BDSP − CM S’s shared secret key */
{ 4. Compute T empB new
4. return null // reject BDSP and N M S // 1 := µN M S · G
} 5. Compute T empB 1
2 := Y1 · µN M S + Y2 · QN M S + Y3 · QCM S
f
} if(T empB 1 = T emp B)
2
{
6. Compute SK := RBDSP · µ1N M S := RBDSP · RN M S · G
FIGURE 6: Signature updation process into CM S 7. return Accept //accept BDSP and CM S //
}
else
function bcmsVerif(E(KCM S,N M S : [λnew BDSP ]), CertBDSP ,
{
CertCM S ) 8. return Reject // reject BDSP and CM S //
{ }
Input: Encrypted CM S’s signed message for N M S, BDSP ’s }
certificate,
CM S’s certificate f Y = H (H ( N M S 1
Output: Accept or Reject 1 2 1 ID )), Y2 = H2 ( µN M S ) and Y3 = H2 (H1 (
Domain Parameters: H2 (·), H1 (·), G, q KCM S,BDSP ))
0
Data: KCM S,N M S , SIDN M S,CM S , N M SID , MT U , λ1BDSP ,
λ2BDSP , CertN M S , RN M S FIGURE 8: N M S and CM S verification process at BDSP
1. Retrieve QCM S from CertCM S
/* CM S’s public key obtained from CM S’s certificate
*/
2. Retrieve QBDSP from CertBDSP
/* BDSP ’s public key obtained from BDSP ’s HSA starts a new session with N M S. In this regards,
certificate */
3. Retrieve λnew new
BDSP from E(KCM S,N M S : [λBDSP ]) BDSP can securely enrol all the service servers mainly
/* Decrypt E(KCM S,N M S : [λnew
BDSP ]) using
N M S − CM S’s shared secret key */ N Sj (responsible for Big Data storage services) and JTj
4. Compute T empN new
1 := λBDSP · G (responsible for Big Data processing services) with HEAP-
5. Compute T empN 2 := λ2BDSP · G + QCM S · KDC via N M S. For simplicity, in this study, we assume
H2 (H1 (SIDN M S,CM S )) e
if(T empN N that the BDSP wants to configure a Hadoop Cluster HCj
1 = T emp2 )
{ which consists of only two service servers namely (1) N Sj :
6. Compute SK := RN M S · λ1BDSP := RN M S · RBDSP · G responsible for controlling both namespace management and
7. return Accept //accept BDSP and CM S //
} Big Data storage service activities and (2) JTj : responsible
else
{ for both task assignment and Big Data processing activities.
8. return Reject // reject BDSP and CM S // To initiate a service server registration process, BDSP sends
}
} an initial enrolment request for both N Sj and JTj to N M S
e λ2 1
via secure channel (using SKBDSP,N M S ). In this regard,
BDSP · G0 = λBDSP · X1 + QBDSP · X2 , where X1 =
N M S asks CM S to provide two shared symmetric keys
H2 (H1 (MT U )) and X2 = H2 (λ1BDSP )
for N Sj and JTj . CM S sends the keys in encrypted format
as Kns,jt = E(KCM S,BDSP : [KCM S,N S ||KCM S,JT ]).
FIGURE 7: BDSP and CM S verification process at N M S
Thereafter, as a response to the BDSP ’s request, N M S
NMS NMS
sends two random numbers namely RN Sj and RJTj along
0
RBDSP · H2 (H1 (MT U )) · G +SBDSP · H2 (λ1BDSP ) · with Kns,jt to BDSP . A diagram summarizing several com-
G and QCM S · H2 (H1 (SIDN M S,CM S )) = SCM S · munication message exchanges between BDSP and N M S
H2 (H1 (SIDN M S,CM S )) · G. Similarly, to verify the legit- involved throughout the service server registration process
imacy of both N M S and CM S, BDSP needs to verify are shown in Figure 10, and it contains the following steps:
µnew 1
N M S · G = Y1 · µN M S + Y2 · QN M S + Y3 · QCM S , Step SSRG1: BDSP enters the security credentials namely
where Y1 = H2 (H1 (N M SID )), Y2 = H2 (µ1N M S ) and the identity, symmetric key and synthetic password
Y3 = H2 (H1 (KCM S,BDSP )). To satisfy the condition, it for both the service servers (i.e., N Sj and JTj ) into
must satisfies Y1 · µ1N M S = Y1 · RN M S · G, Y2 · QN M S = HSA. HSA computes the masked identities for N Sj
Y2 · SN M S · G and Y3 · QCM S = Y3 · SCM S · G. and JTj as T N SID = h(N SID ||HSAID || rss1 )
(C) Service servers’ registration phase: After establish- and T JTID = h(JTID ||HSAID || rss2 ), and the
ment of SKBDSP,N M S between BA/BDSP and N M S, masked password for the same service servers as
VOLUME 6, 2018 15
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

FIGURE 9: Summary of authenticated key agreement process between BDSP/BA and N M S


0 0
Note: Here, T1 = {CertCM S , CertN M S }, Z1 = E(KCM S,N M S : [MT U , µ2N M S ]), Z2 = E(KCM S,N M S : [λnew ∗
0
BDSP ]), Z3 = E(BDSPP W D :
[MT U , µnew
NMS ]), Operation 1 , Operation 2 , Operation 3 , Operation 4 and Operation 5 : signifies execution of the function bdspSignPairGen[ ](·),
nmsSignPairGen[ ](·), cmodifiedSignPairGen[ ](·), bcmsVerif(·) and ncmsVerif(·), respectively (refer Figure 4, Figure 5, Figure 6, Figure 7 and Figure 8) and
OC1 , OC2 and OC3 : denotes outcome of bdspSignPairGen[ ](·), nmsSignPairGen[ ](·) and cmodifiedSignPairGen[ ](·) functions respectively.

j NMS j
SIDN S = h(N SP W D || rss1 || RN Sj ) and SIDJT = databases (after finding out the availability of masked
NMS server identity into their databases) and sends their
h(JTP W D ||rss2 ||RJTj ), respectively. Note that rss1
and rss2 are two random numbers chosen by HSA. acknowledgements to N M S.
Step SSRG2: 0HSA computes the masked identity of BDSP Step SSRG4: Upon getting the acknowledgements from
0
as MT U and construct a message M5 = {MT U , both the servers (CM S and ES), N M S checks
0 0 ?
N M SID , n3 , E(K(CM S,BDSP ) : [MT U , n3 , n3 = n3 . If the condition is satisfied then N M S con-
j structs a service servers’ (here, we consider two service
T N SID , KCM S,N S , SIDN S , T JT ID , KCM S,JT ,
0
SIDJT j
]), E(SKBDSP,N M S : [MT U , n3 ])} and servers i.e., N S and JT in a particular cluster) registra-
0
sends the same to N M S. tion completion message as M6 = {N M SID , MT U ,
Step SSRG3: After receiving the message0 M5 , n3 , E(SKBDSP,N M S : [N M SID , n3 ])} and sends it to
N M S checks the presence of MT U into BDSP .
0
Z5 = E(SKBDSP,N M S : [MT U , n3 ]) after Step SSRG5: Getting the message M6 , BDSP decrypts
decrypting the Z5 using the key SKBDSP,N M S . E(SKBDSP,N M S : [N M SID , n3 ]) and checks that
0 0 ?
If MT U exists in its database then BDSP is n3 = n3 . If the condition is satisfied then BDSP un-
treated as authentic service provider and N0 M S derstand the legitimacy of N M S and realized that the
0
broadcasts {MT U , E(K(CM S,BDSP ) : [MT U , n3 , service servers registration has been successfully ac-
T N SID , KCM S,N S , SIDN j complished with HEAP-KDC.
S , T JTID , KCM S,JT , 0

SIDJT j
])} to both CM S and ES. After receiving Step SSRG6: HSA computes N SID = N SID ⊕
0
0
E(K(CM S,BDSP ) : [MT U , n3 , T N SID , KCM S,N S , h(BDSPID || BDSPP W D ), JTID = JTID ⊕
0

SIDN j j h(BDSPID ||BDSPP W D ), N SjP W D = N SjP W D ⊕


S , T JTID , KCM S,JT , SIDJT ])}, both CM S 0
and ES decrypts it and updates their service server h(BDSPID ||BDSPP W D ), JTjP W D = JTjP W D ⊕

16 VOLUME 6, 2018
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

FIGURE 10: Summary of service server registration process


0 j j 0
Note: Here, Z4 = E(K(CM S,BDSP ) : [MT U , n3 , T N SID , SIDN S , T JTID , SIDJT ]) and Z5 = E(SKBDSP,N M S : [N M SID , MT U , n3 ]).

h(BDSPID || BDSPP W D ), rss∗1 = rss1 ⊕ storage services utilizing HDFS and JT is solely made for
h(BDSPID ||BDSPP W D ) and rss∗2 = rss2 ⊕ providing Big Data processing services using MapReduce
h(BDSPID ||BDSPP W D ), respectively. HSA stores to the end users’. Though, we restrict our discussion with
0 0 0 0
N SID , JTID , N SjP W D , JTjP W D , rss∗1 , rss∗2 and N S only, but one could simply apply the same proposed
HCjID information into its database for future use. mechanism for JT also.
Note that HCjID signifies here as the pre-deployed Remark 2: According to the policy of the proposed scheme,
Hadoop cluster’s identity and HSA assigns a random the online enrolment process of the service servers’ (N Ss’ or
value for it. JT s’ belongs to a particular cluster (HCj )) is simple, flexible
and scalable. Because, a service provider (BDSPj ) can add
After successful registration of the service servers, BDSP or remove any number of service servers’ (N Ss’ or JT s’)
configures the service servers and its client application to or form HEAP-KDC. However, to do this, the service
(HDCLj ) for the cluster HCj . In this regard, BDSP provider needs to enroll himself with the HEAP-KDC once
stores the secret credentials offline among the correspond- and keep his user identity and password secret.
ing service servers and HDCLj . More precisely N Sj
has T N SID = h(N SID || HSAID || rss1 ), KCM S,N S 4) User registration
j NMS
and SIDN S = h(N SP W D ||rss1 || RN Sj ); JTj has Initially, a user Ci needs to register himself with ES offline
T JTID = h(JTID ||HSAID ||rss2 ), KCM S,JT and (i.e., via the out-of-band channel or postal network) by giving
j NMS
SIDJT = h(JTP W D ||rss2 || RJT j
), and HDCLj has his identity proof, address proof and service subscription
both T N SID = h(N SID || HSAID || rss1 ) and T JTID = payment documents. This entails Ci to avail a dummy user-
h(JTID ||HSAID ||rss2 ) information, respectively. Thus, id (DCID ), dummy password (DP W D ) and a one-time key
completes the HCj ’s deployment process. Finally, BDSP (K(N M S,C) ) via the out-of-band channel to his physical
make the HCj online for providing the Big Data storage and address. Before sending these three security credentials to Ci ,
processing services to the end users’. ES securely sends DCID and DP W D to CM S for creating
Remark 1: In the service server registration phase, we present a dummy account of Ci . ES also sends securely the one-
an enrolment strategy considering two service servers (Na- time key K(N M S,C) to N M S. Note that Ci needs to use
menode Server (N S) and Job Tracker (JT )) belong to a par- DCID , DP W D and K(N M S,C) only for once to create his
ticular Hadoop cluster (HCj ). But, for simplicity and better own user profile into N M S with the help of CM S. But, Ci is
understanding, throughout this paper, we consider only one permissible to use the key K(N M S,C) (we also call it as pass-
service server to describe the other phases of the proposed phrase) at the time of session key formation and password
protocol. Note that N S is responsible for providing Big Data updation tasks. Figure 11 summarizes the user registration
VOLUME 6, 2018 17
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

Ci with HCAj CM S N M S and ES

Input: DCID , DP W D , K(N M S,C) CM S


Input: CM SID , RC Input: K(N M S,C)
i

- Compute
UDtl = E(DP W D :
[DCID , n1 ])
msgC1 = {CM SID , n1 , UDtl }
−−−−−−−−−−−−−−−−−−−−−−−→
0 ?
-Check DC = DCID
ID
msgC2 = {N M SID , CM SID , n1 }
←−−−−−−−−−−−−−−−−−−−−−−−−−− −
- Select a random secret ri
- Compute masked identity
T U = h(CID ||ri ||HCAID )
msgC3 = DP T SAP − IKE : {T U }
−−−−−−−−−−−−−−−−−−−−−−−−−−−−→
- Symmetric key establishment
CM S ])}
msgC4 = DP T SAP − IKE : {E(SK(CM S,C) : [RC
←−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−i−−−−−
- Symmetric key establishment
CM S
- Decrypt RC i
- Compute
MT U = h(CID ||ri ||HCAID ||RC CM S )
i
and
0

UDtl = E(K(N M S,C) : [P W D , MT U , EIDC , M BN OC ])
0
msgC5 = {CM SID , MT U , n2 , UDtl }
−−−−−−−−−−−−−−−−−−−−−−−−−−− −−→
- Keep
CM S
MT U and RCi
0
msgC5.1 = {CM SID , MT U , UDtl }

−−−−−−−−−−−−−−−−−−−−−−−−−−− →
- Decrypt password
- Store
MT U and P W D∗
msgC5.2 = {N M SID , ESID , MT U }
←−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
msgC6 = {CM SID , n2 , MT U }
←−−−−−−−−−−−−−−−−−−−−−−−− −
- Compute
ri∗ = ri ⊕ h(CID ||P W D)

CM S = RCM S ⊕ h(r ||P W D)
RCi Ci i

EI = h(CID ||P W D)

K(N CM S )
= K(N M S,C) ⊕ h(P W D||RC
M S,C) i

U SP Wi = h(CID ||HCAID ||P W D||ri )



- HCAj stores ri∗ , RC
CM S , EI, K ∗
i (N M S,C)
and U SP Wi

into his database for future use

FIGURE 11: Summary of user registration

phase and contains the following steps: of “secret key establishment" process [10]. Further,
both Ci ’s request and CM S’s response messages using
Step URG1: Ci enters DCID and DP W D into HCAj . DPTSAP [10] are represented as msgC3 and msgC4
HCAj generates a random nonce say n1 . HCAj respectively in Figure 11.
encrypts DCID and n1 using DP W D as UDtl = Step URG5: HCAj asks Ci to give his registered mobile
E(DP W D : [DCID , n1 ]). HCAj sends the message number (M BN OC ) and a valid email id. (EIDC ).
msgC1 = {CM SID , n1 , UDtl } to the CM S. After taking M BN OC and EIDC from Ci , HCAj
Step URG2: After receiving msgC1 , CM S decrypts UDtl encrypts MT U, P W D∗ M BN OC and EIDC us-
and checks the availability of DCID in its database. 0
ing KN M S,C as UDtl = E(K(N M S,C) : [P W D∗ ,
If it exists then CM S acknowledge with a message MT U, M BN OC ]), where MT U = h(CID ||ri
msgC2 = {N M SID , CM SID , n1 } to Ci and Ci is ||HCAID || RC CM S
). HCAj sends the message
i
permissible to create his own profile into N M S, and 0
msgC5 = {CM SID , MT U, n2 , UDtl } to the CM S.
goto Step URG3 else, reject Ci ’s request.
Step URG6: After receiving msgC5 , CM S broadcasts both
Step URG3: Ci enters a new user-id CID in HCAj . HCAj 0
MT U and UDtl to both ES and N M S servers re-
chooses a random secret ri and transforms the CID as CM S
spectively. CM S keeps only MT U and RC in its
T U = h(CID ||ri ||HCAID ). Using this transformed i
database and deletes other information, wherein, after
identity T U, both Ci and CM S establishes a shared 0
decrypting UDtl , both ES and N M S stores MT U,
secret key say SK(CM S,C) between themselves using
EIDC , M BN OC and P W D∗ into their corresponding
the similar analogy proposed as “Initial key establish-
databases. Finally, CM S sends a registration confir-
ment between Ci and F EAS" in DPTSAP [10]. It is
mation message to Ci through HCAj and goto Step
worth noting that HCAID is the public identity of client
URG7.
application and it is known to HCAj .
Step URG7: HCAj computes ri∗ = ri ⊕ h(CID
Step URG4: Ci enters his password P W D into HCAj . CM S ∗ CM S
||P W D), RC = RC ⊕h(ri ||P W D), EI =
HCAj computes the masked password P W D∗ = i

i

CM S h(CID || P W D), K(N M S,C) = K(N M S,C) ⊕h(P W D


h(h(ri || P W D)|| RC ) and go to Step URG5. Here, CM S
CM S
i
|| RC ) and U SP Wi = h(CID ||HCAID ||P W D
RCi was generated by CM S and it has been de- i
||ri ), and then stores these information into HCAj ’s
livered securely using the key SK(CM S,C) at the time
18 VOLUME 6, 2018
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

database. HCAj will use these information at the time SC · G. Similarly, SCM S and SN M S are private keys for
of user authentication and password updation phases. CM S and N M S and the corresponding public keys are
After successful registration of client Ci with HEAP- QCM S = SCM S · G and QN M S = SN M S · G, respec-
KDC, Ci needs to remember only two parameters CID and tively. Here, SC , SCM S and SN M S are belongs to Zq ∗ . All
P W D to access the services from any on-demand service these parties keep their private keys (i.e., SC , SCM S and
server (either N Sj or JTj ) belongs to a particular Hadoop SN M S ) secret, but disseminate the public keys QC , QCM S
cluster say HCj followed by a user login and authenticated and QN M S using CertE . The proposed key agreement and
key agreement phases discussed in the following sections. mutual authentication task between Ci and CM S are sum-
Thus, the proposed protocol is user friendly in nature. marized in Figure 12.
The detail steps involved in this process are discussed as
5) User login at workstation follows.
After the completion of user registration process, Ci tries to Step DKA1: HCAj computes Ci masked identity MT U =
login into the system using the client application HCAj . CM S
h(CID ||ri ||HCAID || RC i
). Further, HCAj gener-
Before initiating user login task into Ci ’s workstation, 1 2
ates two pseudo-random numbers (ψC i
and ψC i
) utiliz-
HCAj loads the user’s masked identities (EI i , where i = ing MT U and other pre-loaded security domain param-
1, 2, · · · , n) into browser cookies. The client-side login pro- eters as the input to a function say cSignPairGen[ ](·)
cess contains the following steps: (see Figure 13).
Step UL1: Ci enter his original identity CID and pass- Step DKA2: HCAj construct a message M C1 = {MT U,
1 2
word P W D into HCAj . HCAj computes EI ∗ = CM SID , ψC i
, ψC i
, CertC } and sends it to CM S via
h(CID ||P W D) and checks this entry exists in the cook- a public channel.
CM S ∗
ies or not. If it exists then HCAj loads ri∗ , RC i
and Step DKA3: After receiving M C1 , CM S searches its
U SP Wi entries for the same Ci . HCAj computes ri = database to check the existence of MT U. If it finds
CM S ∗
ri∗ ⊕ h(CID ||P W D) and RC CM S
i
= RC i
⊕ h(ri the same then CM S generates two pseudo-random
1 2
||P W D) and goto Step UL2. Else, client can repeat Step numbers (πCM S and πCM S ) by taking CM SID and
UL1 with other user identity and password. other domain parameters as the input to a function say
Step UL2: HCAj computes U SP Wi∗ = h(CID || HCAID cmsSignPairGen[ ](·) (see Figure 14).
|| P W D || ri ) and checks if the condition U SP Wi∗ = Step DKA4: CM S constructs a message M C2 =
2
U SP Wi holds or not. If it holds, Ci is treated as {CM SID , N M SID , MT U, ψC i
, E(KCM S,N M S :
2
authentic user. [CM SID || πCM S ])} and sends it to N M S via a public
After logged in into the client system, Ci needs to select channel.
the j th cluster say HCj from a list of Hadoop clusters Step DKA5: After receiving the message M C2 , N M S
(HC1 , HC2 , · · · , HCj , · · · , HCn ) for storing or processing searches both MT U and CM SID in its database. If
the Big Data. According to the selection, HCAj loads the both are exists then N M S understands that both Ci
respective client application say HDCLj belongs to the and CM S are legitimate parties, and goto Step DKA6;
HCj into the client workstation. In order to store or process otherwise, rejects CM S’s request.
Ci ’s Big Data securely, HCAj initiate a single sign-on and Step DKA6: N M S loads both Ci ’s pass-phrase (KN M S,C )
session key establishment task between Ci and HEAP-KDC and N M S’s secret identity (SIDCM S,N M S ) from its
as follows. database. N M S modifies both Ci ’s and CM S’s partial
2 2
signatures (i.e., πCM S and ψCi ) using a function nmod-

6) Single sign-on and dynamic key establishment


ifiedSignPairGen[ ](·) (see Figure 15) as (1) Vcnew 3 =
2
A two-server based Single sign-on (SSO) and Dynamic Key
πCM S + SN M S · H2 (H1 (KN M S,C )) (mod q) =
1
RCM S · H2 (H1 (CM SID ))+ SCM S · H2 (πCM S )+
Agreement (DKA) process is proposed to access Big Data
SN M S · H2 (H1 (KN M S,C )) (mod q) and (2)
storage services from a remote Namenode Server (N Sj ) 2
Vcmsnew 4 = ψC + SN M S · H2 (H1 (SIDCM S,N M S ))
(or access Big Data processing services from a remote Job i
1
(mod q) = RC · H2 (H1 (MT U) +SC · H2 (ψC )
Tracker (JTj )) of a specified Hadoop cluster say HCj . The i
+SN M S · H2 (H1 (SIDCM S,N M S )) (mod q) and goto
proposed two-server based SSO and dynamic key agreement
Step DKA7.
process establishes a dynamic key between Ci and HEAP-
Step DKA7: N M S constructs a message M C3 =
KDC followed by a mutual authentication process. The detail
{N M SID , CM SID , E(KCM S,N M S : [CM SID ||
steps involved in this tasks are discussed as follows.
Vcmsnew ]), MT U, E(P W D∗ : [MT U ||Vcnew ]),
Note here, the proposed SSO and dynamic key estab-
CertN M S } and sends the same to CM S.
lishment between Ci and CM S are based on ECC and
AES. The known pre-deployed security domain parameters 3 Note: N M S digitally sign on the messages say π 2
among HCAi , CM S and N M S are G, q, H1 (·) and H2 (·), CM S , KN M S,C and
1
πCM S using NMS’s private key SN M S .
respectively. Initially, HCAi selects a private key SC for 4 Note: N M S digitally sign on the messages say MT U ,
client Ci and computes the corresponding public key QC = SIDCM S,N M S and ψC 1 using its private key S
i
NMS.

VOLUME 6, 2018 19
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

FIGURE 12: Summary of mutual authentication and key establishment process between Ci and CM S
0 0
Note: T1 = {CertCM S , CertN M S }, Z4 = E(P W D∗ : [MT U ||Vcnew ]), Z5 = E(KCM S,N M S : [CM SID ||πCM 2
S ]), Z6 = E(KCM S,N M S :
[CM SID ||Vcmsnew ]), Opi , where i = {1, 2, 3, 4, 5}: signifies the execution of the function cSignPairGen[ ](·), cmsSignPairGen[ ](·), nmodifiedSignPair-
Gen[ ](·), cnmsVerif(·) and cmnmVerif(·), respectively (refer Figure 13, Figure 14, Figure 15, Figure 16 and Figure 17) and Oj , where j = {1, 2, 3}: denotes
the outcome of cSignPairGen[ ](·), cmsSignPairGen[ ](·) and nmodifiedSignPairGen[ ](·) functions.

Step DKA8: CM S verifies the legitimacy of both Ci function cSignPairGen[ ](MT U , SC )


{
and N M S utilizing the cnmsVerif(·) function shown Input: Client’s masked identity and Client’s private key
1 and ψ 2
Output: Two ECC points as ψC
in Figure 16. If the function cnmsVerif(·) returns “Ac- i Ci
Domain Parameters: H2 (·), H1 (·), G, q
cept", then CM S construct a session key SKC,CM S = ∗
Data: P W D , KN M S,C , CM SID , MT U
RCM S · ψC 1
= RCM S · RC · G and a message 1. Selects a pseudo-random number RC ∈ Z∗q
i
2. Compute ψC 1 := R · G
1 ∗ C
M C4 = {MT U, CM SID , πCM S , E(P W D : i
2 g := R · H (H (MT U )) + S · H (ψ 1 )
3. Compute ψC C 2 1 C 2 Ci
[MT U ||Vcnew ]), CertN M S , CertCM S }. CM S sends (mod q)
i

M C4 to Ci , and goto Step DKA9; otherwise, it rejects 4. return RC , ψC 1 , ψ2


i Ci
Ci ’s request. }
Step DKA9: After getting the message M C4 , Ci ver- g Note: C digitally sign on both MT U and ψ 1 using its private key
i Ci
ifies both N M S and CM S utilizing the func- SC .
tion cmnmVerif(·) (see Figure 17). If the function
cmnmVerif(·) returns “Accept" then Ci constructs a ses- FIGURE 13: Ci ’s signature generation process
1
sion key SKC,CM S = RC · πCM S = RCi · RCM S · G
otherwise; rejects N M S’s response.
After establishment of the dynamic key SKC,CM S 2 1
the verification condition, it must holds ψC i
· G = ψC i
·
between Ci and CM S, HCAj allows Ci to move further 1
H2 (H1 (MT U)) +QC ·H2 (ψCi ) = RC ·H2 (H1 (MT U))·G
1
for requesting service server ticket as follows. +SC · H2 (ψC i
) · G and QN M S · H2 (H1 (SIDN M S,CM S )) =
SN M S · H2 (H1 (SIDN M S,CM S )) · G. Similarly, to verify
Proof of correctness – In order to verify the legitimacy of the legitimacy of both N M S and CM S, Ci needs to verify
1
both Ci and N M S, CM S needs to check Vcmsnew · G = Vcnew · G = Y11 · πCM S + Y22 · QCM S + Y33 · QN M S .
2 1
ψC i
· G + QN M S · H2 (H1 (SIDN M S,CM S )). To satisfy To satisfy the condition, it must satisfies Y11 · πCM S =

20 VOLUME 6, 2018
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

function cmsSignPairGen[ ](CM SID , SCM S ) function cnmsVerif(E(KCM S,N M S :


{ [CM SID , Vcmsnew ]), CertC , CertN M S )
Input: CM S’s identity and CM S’s private key {
Output: Two ECC points πCM 1 2
S and πCM S Input: Modified and encrypted Ci ’s partial signature, Ci ’s certificate,
Domain Parameters: H2 (·), H1 (·), G, q N M S’s certificate
Data: KCM S,N M S , SIDN M S,CM S , N M SID , CM SID , MT U Output: Accept or Reject
1. Choose a pseudo-random number RCM S ∈ Z∗q Domain Parameters: H2 (·), H1 (·), G, q
1 Data: KCM S,N M S , SIDN M S,CM S , N M SID , MT U , ψC 1 ,
2. Compute πCM S := RCM S · G 2 , Cert
i
2 h ψC , R
3. Compute πCM S := RCM S · H2 (H1 (CM SID )) + SCM S · i
CM S CM S
1
H2 (πCM ) (mod q) 1. Retrieve QN M S = SN M S · G from CertN M S
S /* N M S’s public key obtained from N M S’s certificate
1
4. return RCM S , πCM 2
S , πCM S */
} 2. Retrieve QC from CertC
/* Ci ’s public key obtained from Ci ’s certificate */
h Note: CM S digitally sign on both CM S 1
ID and πCM S using its
3. Retrieve Vcmsnew from E(KCM S,N M S : [Vcmsnew ])
private key SCM S . /* Decrypt E(KCM S,N M S : [Vcmsnew ]) using
N M S − CM S’s shared secret key */
4. Compute T emp1 CM S := Vcmsnew · G
FIGURE 14: CM S’s signature generation process 5. Compute T empCM 2
S := 2
ψC i
· G + QN M S ·
H2 (H1 (SIDN M S,CM S )) i

2 , E(K
if(T empCM 1
S = T empCM S )
2
function nmodifiedSignPairGen[ ](ψC i
CM S,N M S : {
6. Compute SK := RCM S · ψC 1 := R
[CM SID , πCM 2
S ])) i
CM S · RC · G
{ 7. return Accept //accept Ci and N M S //
Input: Ci ’s partial signature and encrypted N M S’s partial signature }
Output: Both Ci ’s and N M S’s modified signature else
Domain Parameters: H2 (·), H1 (·), G, q {
Data: P W D∗ , KN M S,C , KCM S,N M S , SIDN M S,CM S , 8. return Reject // reject Ci and N M S //
N M SID , CM SID , MT U }
0 }
if(MT U ∗ = MT U && CM SID = CM SID )
{ i ψ 2 · G = ψ 1 · X + Q · X , where X
2 11 = H2 (H1 (MT U )),
1. Vcnew = πCM S + SN M S · H2 (H1 (KN M S,C )) (mod q) Ci Ci 11 C 22
2 + S X22 = H2 (ψC 1 ) and Q = S · G.
2. Vcmsnew = ψC i
N M S · H2 (H1 (SIDCM S,N M S )) i
C C
(mod q)
3. return Vcnew , Vcmsnew // accept Ci and CM S //
} FIGURE 16: Ci and N M S verification process at CM S
else
{
4. return null // reject Ci and CM S //
} function cmnmVerif(E(P W D∗ : [MT U , Vcnew ]), πCM 1
S,
} CertN M S , CertCM S )
{
Input: Modified and encrypted CM S’s partial signature, CM S’s
random
FIGURE 15: Signature updation process at N M S nonce, N M S’s certificate, CM S’s certificate
Output: Accept or Reject
Domain Parameters: H2 (·), H1 (·), G, q
Data: P W D∗ , KC,CM S , N M SID , MT U , ψC 1 ,
i
Y11 · RCM S · G, Y22 · QCM S = Y22 · SCM S · G and ψC2 , Cert , R
C C
i
Y33 · QN M S = Y33 · SN M S · G. 1. Retrieve QCM S from CertCM S
/* CM S’s public key obtained from CM S’s certificate
*/
2. Retrieve QN M S from CertN M S
7) Big Data storage service server ticket granting /* N M S’s public key obtained from N M S’s certificate
*/
In this process, Ci requests for a Big Data storage service 3. Retrieve Vcnew from E(P W D∗ : [MT U , Vcnew ])
ticket from CM S. Before doing so, HCAj provides a drop- /* Decrypt E(P W D∗ : [MT U , Vcnew ])
using Ci − CM S’s shared secret key */
down list from which Ci needs to select a particular Hadoop 4. Compute T empC 1 := Vcnew · G
cluster say HCj . After selecting the cluster, HCAj auto- 5. Compute T empC 1
2 := Y11 · πCM S + Y22 · QCM S + Y33 · QN M S
j
matically choose a Namenode Server (N Sj ) for client Ci . if(T empC C
1 = T emp2 )
Finally, HCAj initiates the service server ticket granting {
6. Compute SK := RC · πCM 1
process. A diagram summarizing the message exchanges S := RC · RCM S · G
7. return Accept //accept CM S and N M S //
between Ci and CM S involved throughout the service ticket }
else
granting process is shown in Figure 18, and it contains the {
following steps: 8. return Reject // reject CM S and N M S //
}
Step SSTG1: HCAj constructs a message M C5 = }
2
{MT U, T N SID , E(SKC,CM S : [MT U, n4 , ψC i
])} jY 1
11 = H2 (H1 (CM SID )), Y22 = H2 (πCM S ) and Y33 =
into Ci ’s workstation. HCAj sends M C5 to CM S. H2 (H1 (KC,CM S )).
Step SSTG2: After getting M C5 , CM S checks the pres-
2
ence of both MT U and ψC i
in its system cache. If both FIGURE 17: N M S and CM S verification process in Ci ’s
exists then CM S constructs two tokens (namely Client workstation.
Token (CT ) and Namenode Server Token (N ST ))
namely CT = Ctoken = E(SK(C,CM S) : [CM SID ,
T N SID , OT KCM S,C , n4 ]) and N ST = N Stoken = goto Step SSTG3. CM S issues a random nonce namely
E(KCM S,N S : [CM SID , MT U, OT KCM S,N S ) and OT KCM S,C for N Sj and encapsulate the same into
VOLUME 6, 2018 21
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

FIGURE 18: Summary of service ticket granting and session key agreement

Note: Here, Ctoken = E(SK(C,CM S) : [CM SID , T N SID , OT KCM S,C , n4 ]), Z7 = E(SKC,CM S : [MT U , n4 , ψC 2 ]), N S
i
token =
j
E(KCM S,N S : [CM SID , MT U , OT KCM S,N S ), OT KCM S,C = h(OT KCM S,N S || SIDN S ), Operation 6 , Operation 7 , Operation 8 , and
Operation9 : signifies the execution of the function cModifiedSignature(·), ccmsVerif(·), nsSigPairGen[ ](·) and nscmsVerif(·) (refer Figure 19, Figure 20,
Figure 21 and Figure 22), O4 and O5 : denotes the outcome of cModifiedSignature(·) and nsSigPairGen[ ](·) functions and OT KCM S,N S is a fresh nonce
issued by CM S.

N ST wherein CM S computes another random nonce 8) Session key agreement with service server
j
OT KCM S,C = h(OT KCM S,N S ||SIDN S ) and en- After receiving the service server token (N Stoken ) from
close it into CT . CM S, HCAj initiate a session key agreement between Ci
Step SSTG3: CM S sends a message as M C6 = and N Sj . A diagram summarizing the communication mes-
{MT U, T N SID , Ctoken , N Stoken } to Ci by acknowl- sage exchanges between Ci and N Sj involved throughout the
edging Ci ’s request message (M C5 ) via a public chan- session key agreement process is shown in Figure 18, and it
nel. contains the following steps:
Step SSTG4: Upon receiving the message M C6 , Ci checks
both nonces (that is n4 ∈ {Z7 , Ctoken }) are equals or Step SKABSSA1: HCAj decrypts the Ctoken and extracts
not (see Figure 18). If both are equals then Ci accepts OT KCM S,C . HCAj computes a modified signature
1
CM S’s response else, rejects CM S’s response. pair as {ψC i
, Vcc } using cModifiedSignature(·) function,
where Vcc = RC · H2 (H1 (MT U ||OT KCM S,C ))
1
Finally, after getting both the tokens (CT and N ST ) from +SC · H2 (ψC i
) (mod q) (see Figure 19) and goto Step
CM S, HCAj initiates a session key agreement process SKABSSA2.
with the Namenode Server (N Sj ) followed by a mutual Step SKABSSA2: HCAj constructs a message M C7 =
1
authentication process discussed below. {MT U, T N SID , ψC i
, Vcc , N Stoken , CertC } and
send the same to N Sj via a public channel.
Step SKABSSA3: After getting M C7 , N Sj decrypts
22 VOLUME 6, 2018
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

function cModifiedSignature(OT KCM S,C , SC ) function nscmsVerif(Ctoken , Pns , Vns , CertN S )


{ {
Input: CM S’s one-time key, Ci ’s private key Input: CM S’s issued authorization token, signature pair of N Sj ,
Output: Ci ’s modified signature pair N Sj ’s certificate
Domain Parameters: H2 (·), H1 (·), G, q Output: Accept or Reject
Data: ψC 1 , ψ2 , T N S
Ci ID , MT U , N Stoken , CertC , RC Domain Parameters: H2 (·), H1 (·), G, q
i
1. Compute Vcc := RC · H2 (H1 (MT U ||OT KCM S,C )) + SC · Data: T N SID , MT U , OT KCM S,C
1 ) (mod q) k 1. Retrieve QN S from CertN S
H2 (ψC i /* N Sj ’s public key obtained from N Sj ’s certificate */
1 ,V 0
2. return ψC 2. Compute T empC 1 0 := Vns · G
i
cc
} 3. Compute T empC 2 := Pns · H2 (H1 (T N SID ||OT KCM S,C )) +
QN S · H2 (Pns )
k Note: Here, C digitally sign on MT U , OT K 1 0
C0
CM S,C and ψCi if(T empC 1 = T emp2 )
i
using its private key SC , where OT KCM S,C = h(OT KCM S,N S || {
SIDN j 4. Compute SK := h(RC · Pns ||OT KCM S,C ) := h(RC ·
S ). RN S · G||OT KCM S,C )
5. return Accept // accept N Sj and CM S //
FIGURE 19: Signature updation process at Ci ’s workstation }
else
{
1 , V , Cert )
6. return Reject // reject N Sj and CM S //
function ccmsVerif(N Stoken , ψC i
cc C }
{ }
Input: CM S’s issued authorization token, Modified signature pair of
Ci , Ci ’s certificate
Output: Accept or Reject FIGURE 22: N Sj ’s signature verification and key formation
Domain Parameters: H2 (·), H1 (·), G, q
Data: KCM S,N S , T N SID , SIDN j at Ci ’s workstation
S
1. Retrieve QC from CertC
/* Ci ’s public key obtained from Ci ’s certificate */
2. Retrieve OT KCM S,N S from N Stoken
/* Decrypt N Stoken using CM S − N S’s shared secret key Ci ’s request.
*/
j
3. Compute OT KCM S,C := h(OT KCM S,N S ||SIDN S) Step SKABSSA4: N Sj computes OT KCM S,C =
4. Compute T empN 1
S := V
cc · G h(OT KCM S,N S ||SIDN j
) and verifies the legitimacy
S
5. Compute T empN 2
S := ψ 1 · H (H (MT U ||OT K
2 1 CM S,C )) +
1 )
Ci of both Ci and CM S by checking the following condi-
QC · H2 (ψC 1
i
0 tion: Vcc · G = ψC · H2 (H1 (MT U ||OT KCM S,C ))
if(MT U = MT U && T empN 1
S = T empN S )
2 1
i
{ +QC · H2 (ψCi ) (see Figure 20). If the condition is
6. Choose a pseudo-random number RN S ∈ Z∗q satisfied then N Sj realizes the legitimacy of client
7. Compute SK := h(RN S ·ψC 1 ||OT K
CM S,C ) := h(RN S ·
RC · G||OT KCM S,C )
i Ci and computes a session key SKC,N S = h(RN S ·
1
8. return Accept // accept Ci and CM S // ψC i
||OT KCM S,C ) = h(RN S · RC · G||OT KCM S,C ),
}
else and goto Step SKABSSA5, else reject Ci ’s request.
{ Step SKABSSA5: N Sj computes a signature pair as
9. return Reject // reject Ci and CM S //
} {Pns , Vns } using nsSigPairGen[ ](·) (see Figure 21).
}
N Sj constructs a message M C8 = {T N SID , MT U,
Pns , Vns , CertN S } and goto Step SKABSSA6.
FIGURE 20: Ci and CM S verification and key formation at
Step SKABSSA6: N Sj sends M C8 to Ci via a public
N Sj server
channel.
function nsSigPairGen[ ](T N SID , SN S , OT KCM S,C )
Step SKABSSA7: Upon receiving the message M C8 , Ci
{ checks the legitimacy of both N Sj and CM S by sub-
Input: N Sj ’s masked identity, N Sj ’s private key, CM S’s issued
random nonce stantiating the following condition: Vns · G = Pns ·
Output: Two ECC points as Pns and Vns H2 (H1 (T N SID ||OT KCM S,C )) +QN S ·H2 (Pns ) (see
Domain Parameters: H2 (·), H1 (·), G, q
Data: T N SID , CertN S , RN S nscmsVerif(·) function in Figure 22). If the condition is
1. Compute Pns := RN S · G satisfied then Ci realizes the legitimacy of both service
2. Compute Vns := RN S · H2 (H1 (T N SID ||OT KCM S,C )) +
SN S · H2 (Pns ) (mod q) m server N Sj and CM S, and computes a session key
3. return Pns , Vns SKC,N S = h(RC · Pns ||OT KCM S,C ) = h(RC ·
}
RN S · G||OT KCM S,C ), and goto Step SKABSSA8,
m Here, N S digitally sign on both T N S
j ID , OT KCM C,C and Pns otherwise; reject Ci ’s request.
using its private key SN S .
After establishment of the session key, HCAj redirects
FIGURE 21: Signature pair generation process at N Sj ’s client Ci to HDFS Client application instance say HDCLj
server to store (fresh write operation) or append or read Big Data
under the supervision of N Sj . By utilizing the aforesaid
process, Ci could establish a secure channel with JTj for
?
N Stoken and check MT U 0 = MT U utilizing processing Big Data.
ccmsVerif(·) function (see Figure 20). If it verifies
successfully then N Sj understands the legitimacy of Proof of correctness – In order to verify the legitimacy
client Ci and goto Step SKABSSA4; otherwise, reject of both Ci and CM S, N Sj needs to check Vcc · G =
VOLUME 6, 2018 23
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

1 1
ψC i
· H2 (H1 (MT U ||OT KCM S,C ))+ QC · H2 (ψC i
). To precisely, the data integrity of file F1 has been preserved),
1
satisfy the verification condition, it must holds ψCi = RC · G otherwise; Ci set the flag=“false". If the flag variable returns
and QC = SC · G. Similarly, to verify the legitimacy false for any j th block then Ci goes through the service
of both N Sj and CM S, Ci needs to verify Vns · G = provider’s Service Level Agreement (SLA) and takes proper
Pns · H2 (H1 (T N SID || OT KCM S,C )) +QN S · H2 (Pns ). To action, else Ci outsources map-reduce code [76] and flag to
satisfy the condition, it must satisfies Pns = RN S · G and JTj for Big Data processing as follows.
QN S = SN S · G.
11) Secure Big Data processing using MapReduce
9) Secure and integrity-assisted write or append operation in At the beginning of this phase, Ci establishes a session key
HDFS (say SKC,JT ) with JTj followed by a mutual authentication
Suppose Ci and N Sj are having the session key SKC,N S and key agreement process (see Sec. V-D8). After the key
between themselves. Now, say for instance, Ci has four files formation, Ci sends the map-reduce code [76] and the flag
(say f1 , f2 , f3 and f4 ) and each file is having 250 GB to JTj via secure channel. After getting these information,
(gigabytes) data. Ci wants to import these files from other JTj assigns different Task Trackers (TTs) or Datanodes to
external sources into HDFS, and it needs to create a Big File execute the map-reduce code for different blocks of Ci ’s
say F1 with the file size of 1 TB (terabytes). According to file (say F1 ). After successful execution of the code, JTj
the policy of HDFS, the file F1 should go through the HDFS- stores the result into Ci ’s local filesystem in encrypted format
Write task (see the basic HDFS-Write operation in Figure utilizing the key SKC,JT .
23).
12) Password change phase
In this regard, N Sj assigns several chunk servers or
Datanodes to HDFS Client (HDCLj ) for data stream- This section discuss about the password updation process of
ing via secure channel. In such a provision, HDCLj di- client Ci and BDSP ’s administrator. Assume, the client Ci
vides the file F1 into 1024×1024M B
= 16384 number of needs to change his password for security reasons. To do
64M B
blocks and write these blocks into several Datanodes along this, the following steps need to be executed. Note here, after
with the replicas (two replicas for each block). Before the execution of user login phase (refer Sec. V-D5), HCAj
CM S ∗ CM S
writing each block (ith block) including its replica into has the following information: RC i
= RC i
⊕h(ri
the chunk servers, HDCLj computes HM ACBL i
= ||P W D), MT U = h(CID ||ri ||HCAID || RCi ), ri∗ = CM S

i i th
h(KN M S,C ) (CID ||BLID ||h(BLdata )) of i block, where
ID
ri ⊕ h(CID || P W D), K(N M S,C) = K(N M S,C) ⊕h(P W D
CM S
BLiID represents the identity of ith block and BLidata || RC i
) and U SP W i = h(C ID ||HCAID ||P W D ||ri ).

denotes the data content (64 MB data) of the ith block. Step PCP1: Ci enters his identity CID and old password
HDCLj then stores both the ith block and HM ACBL i P old into HCAj and goto Step PCP2.
ID 0
to the respective chunk server for future activities (namely Step PCP2: HCAj locally verifies the condition U P Wi
? 0
integrity-assisted read and processing the Big Data). = U SP Wi , where U P Wi = h(CID ||HCAID || P old

If client Ci needs to append additional data in the existing ||ri ) and ri = ri ⊕ h(CID || P W Dold ). If the condition
file F1 in the near future, he needs to follow the similar is satisfied then HCAj asks Ci to enter his updated
strategy as write operation as discussed above. password and goto Step PCP3, otherwise; re-enter the
user id. and password again in HCAj .
10) Secure integrity-assisted read in HDFS Step PCP3: Ci enters a new password P new into HCAj .
In this phase, Ci wants to read his previously archived HCAj selects a random number rinew and computes
0
Big Data (here the file F1 ) from HDFS. The basic work- P new = h(h(rinew ||P new )||RC CM S
i
), where RC CM S
i
=
CM S ∗
flow of HDFS-Read operation is shown in Figure 24. Ac- RCi ⊕h(ri ||P W D), and goto Step PCP4.
00
cording to the policy of HDFS, after getting the block Step PCP4: HCAj computes UDtl = E(K(N M S,C) :
0
locations from N Sj via a secure channel followed by a [P new , MT U]) where K(N M S,C) = K(N ∗
M S,C)
session key (SKC,N S ) establishment process, Ci can di- CM S
⊕h(P W D || RCi ). HCAj select a random nonce
rectly read the blocks from the chunk servers or Datan-
n5 and constructs a password update message as
ode servers through HDCLj . Suppose Ci needs to read 00
Pwd_chg_msg = {CM SID , MT U, n5 , UDtl } (it is
the ith block of the file F1 . In this connection, HDCLj
i0
similar as the message msgC5 shown in Figure 11) and
loads the same block and computes HM ACBL ID
= goto Step PCP5.
0
h(KN M S,C ) (CID ||BLiID ||h(BLidata )). After that, HDCLj Step PCP5: HCAj sends Pwd_chg_msg to CM S.
i0 ? i
verifies that HM ACBL ID
= HM ACBL ID
. If the condition Step PCP6: After getting Pwd_chg_msg, CM S checks
is satisfied then Ci realizes the data integrity of ith block. MT U is an existing user or not. If user presents then
Similarly, after verifying the data integrity of all the blocks goto Step PCP7, otherwise; reject Ci ’s password change
belongs to F1 (here, F1 consists of 16384 blocks), Ci set a request.
00
boolean variable as flag =“true" which indicates that the file Step PCP7: CM S broadcasts both MT U and UDtl (similar
F1 is not being modified by any attackers or insiders (more as the message msgC5.1 shown in Figure 11) to ES and
24 VOLUME 6, 2018
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

FIGURE 23: Summary of HDFS-Write operation

00
N M S. After decrypting UDtl , both ES and N M S up- system cache. In a similar way, the same procedure can be
dates their user databases and broadcasts their acknowl- applied for other phases to protect replay attack.
edgement messages (same as the message msgC5.2 Remark 4: To proof the proposed HEAP-KDC is fault-
shown in Figure 11) to CM S. Thereafter, CM S sends a tolerant in terms of secret credentials storage, we distribute
response message (similar as the message msgC6 shown the credentials of clients (Ci ), service provider (BDSP ) and
in Figure 11) to Ci about the confirmation of password service servers (N Sj and JTj ) among different servers in the
updation request. following settings:
Step PCP8: After getting the response message from CM S,
∗ 1) CM S: 0It is permissible to store the masked identities
HCAj computes rinew = rinew ⊕ h(CID ||P new ),
(MT U s, T N SID s and T JTID s), masked passwords
K(N M S,C) = K(N M S,C) ⊕h(P new || RC
new∗ CM S
) and j j
new new
i
(SIDN S s, SIDJT s and BDSPP∗ W D s), pass-
U SP Wi = h(CID ||HCAID ||P ||rinew ), re-
phrases (KCM S,BDSP s) secret keys (KCM S,BDSP s,
spectively and stores them in its database for future use.
KCM S,N S s, KCM S,JT s) of service providers
Note that using the aforesaid mechanism, BDSP ’s admin- and service servers, the dictionary of password
istrator (BA) can update its password for security reasons transformation parameters of clients (RCCM S
s), and the
i
utilizing both N M S and ES servers and the service provider masked identities of clients (MT Us), respectively.
application HSAk .
Remark 3: In order to prevent replay attack between Ci 2) N M S: This server keeps the masked identities,
1
and N Sj , N Sj can temporarily keep ψC i
and Vcc in its pass-phrases and passwords of clients (MT Us,
system cache. The similar mechanism has been reported in KN M S,C s and P W D∗ ), and the dictionary of password
[18] and [10] can be applied for the replay attack protection. transformation parameters of both service providers
0
10
Suppose a similar message M C7 = {MT U, T N SID , ψC i
, NMS
and service servers (RBDSP s, RNNMS NMS
Sj s and RJTj ),
0
Vcc , N Stoken , CertC } has been received by N Sj . N Sj first respectively.
10 1 0
checks ψC i
= ψC i
and Vcc = Vcc . If both the conditions are
0
satisfied successfully then N Sj treats the message M C7 as 3) ES: It keeps all the secret credentials of both CM S and
0
replay message, otherwise; N Sj treats the message M C7 is N M S servers. In fact, at the time of online registration
1 10 0
fresh and N Sj updates ψC i
and Vcc with ψC i
and Vcc in its process, both CM S and N M S send their principal’s
VOLUME 6, 2018 25
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

FIGURE 24: Summary of HDFS-Read operation

secret credentials to ES. Note here, ES is a fully VI. SECURITY ANALYSIS


trusted server since it is not reachable online to any To prove that the proposed protocol (HEAP) is provably
other principals (except CM S and N M S) including secure, we analyze the security of HEAP by utilizing both
adversaries at the time of authentication process. formal and informal security analysis. Further, to pursue the
In such a settings, if CM S (or N M S) fails due to some security analysis, we also use the proposed threat model as
hardware failure and the server data is totally lost then it discussed in Section V-B.
could be rebuild from ES server. Similarly, if ES server
fails then the entire legacy data can be extracted from both A. FORMAL SECURITY ANALYSIS USING ROR MODEL
CM S and N M S servers to renovate a new ES. Thus, we In order to present formal security analysis of HEAP in detail,
can remark that for the current settings of HEAP-KDC, the we first discuss few terminologies of the widely-used Real-
proposed authentication framework is more dependable and Or-Random (ROR) model [77]. Thereafter, utilizing the same
fault-tolerant. model we substantiate that the proposed protocol (HEAP)
Remark 5: At the time of mutual authentication and single provides the session-key (SK) security against an adversary
sign-on process between Ci and CM S, suppose any one of Adv in Theorem 1.
the random nonce RC or RCM S is known to the adversary
Adv. In this connection it is obvious that Adv can com- 1) ROR Model
1 1
pute RC · πCM S or RCM S · ψCi , respectively. In such a In HEAP, we have four entities, namely, user Ci , CM S,
provision, to protect known session-specific temporary infor- N M S and N Sj (or Big Data processing service server JTj ).
mation attacks from Adv both Ci and CM S can compute The following attributes are involved in the ROR model.
1 CM S
the session key as SKC,CM S = h(RC · πCM S ||RCi ) = Participants. Suppose ΓsCi , ΓtCM S , ΓuN M S and ΓvN Sj (or
1 CM S
h(RCM S · ψCi ||RCi ). Therefore, the adversary Adv re- w
ΓJTj ) are the instances s, t, u and v (or w) of the principals
CM S
quires the knowledge of RC i
to compute the actual session Ci , CM S, N M S and N Sj (or JTj ), respectively. These
key between Ci and CM S. In the same way, BDSP and instances are also coined as the oracles.
N M S can compute the sesseion key SKBDSP,N M S utiliz- Accepted state. An instance, say Γs is said to be in
NMS
ing RBDSP . accepted state, if it reaches to an accept state after receiving
the last protocol message. The session identification (sid) is
26 VOLUME 6, 2018
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

constructed by concatenating all the communicated messages ing the semantic security of the proposed protocol (HEAP)
(sent and received messages) of the Γs for the current execut- is denoted and defined by AdvHEAPAKE
= |2 · P r[S] − 1|. If
AKE
ing session. AdvHEAP ≤ , for a sufficiently small  > 0, we say HEAP
Partnering. We say two instances Γs and Γt are partnered provides SK-security.
to each other if the following three conditions are satisfied Random oracle. Let us assume that the cryptographic one-
simultaneously: 1) both Γs and Γt are in accept state, 2) both way hash function h(·) is available to all the entities including
Γs and Γt mutually authenticate each other and also share the Adv. We model h(·) by a random oracle, say H [10], [78].
same sid, and 3) both Γs and Γt are mutual partners of each
other. 2) Security proof
Freshness. If the established session key SK between Ci Theorem 1 substantiates the semantic security of the pro-
and N Sj (or between Ci and JTj ) is not disclosed using the posed protocol (HEAP) under ROR model.
following Reveal(Γs ) query, ΓsCi or ΓvN Sj (or ΓwJTj ) is said Theorem 1: Let Adv be an adversary running in poly-
to be fresh. nomial time t against the proposed authentication pro-
Adversary. In ROR assumptions, all the message commu- tocol, HEAP in the ROR model, and D, qh , qs , |H|,
nications can be supervised by Adv including eavesdropping, ECDDHP
|D|, AdvAdv (t) and AdvEIN D−CP A (k) denote the
modifying, deleting, and inserting transmitted messages. In uniformly distributed password dictionary, the number of
addition, Adv can have the access of the following queries hash queries, Send(·) queries, the range space of h(·),
[78]: the length of D, the advantage of Adv in breaking ECD-
1) Execute(Γs , Γv ) – A passive attack is modeled uti- DHP and the advantage of Adv in breaking the IND-CPA
lizing this query wherein Adv can have access to the secure symmetric cipher E (provided in Definition 1), re-
transmitted messages between two legitimate parties. spectively, and AdvEIN D−CP A (K) = AdvE,SGL
IN D−CP A
(K) or
IN D−CP A
2) Send(Γs , M ) – This query is modeled as an active AdvE,M EL (K). Then, Adv’s advantage in breaking the
attack, wherein a message, say M can be transmitted semantic security of HEAP can be estimated as
to a participant instance, say Γs and also receives a AKE q2 q
s ECDDHP
response message. AdvHEAP ≤ h +2 + AdvAdv (t)
|H| |D|
3) Reveal(Γs ) – An adversary Adv discloses the current 
session key SK of Γs (and its partner) utilizing this +AdvEIN D−CP A (K) .
query. Proof 1: In order to proof the above theorem, we go through
4) CorruptSmartW orkstation(ΓsSWi ) – This query a sequence of six games, say GMj (j = 0, 1, 2, 3, 4, 5) as in
represents an active attack wherein HEAP-KDC’s au- [78], [19]. Let the initial game be GM0 and the final game
0
thentication token (Z4 ) and Ci ’s authorization tokens be GM5 . Assume that Si represents as an event wherein
(Ctoken and N Stoken ) are leaked to the adversary Adv Adv can successfully guess the bit cn in the T est(·) query
by compromising Ci ’s workstation. These information with respect to the game GMj . All the games are outlined as
leakage is modeled using this query to check the se- follows.
curity of the proposed protocol. It is reported in [61]
• Game GM0 : This is the initial game where the adver-
that CorruptSmartW orkstation(·) query fortifies the
sary Adv incorporates a real attack against HEAP under
weak-corruption model, where the temporary keys and
ROR model. Before starting of the game GM0 , Adv
internal credentials related to the participant instances
chooses a bit cn. Under the ROR assumptions, the initial
are not corrupted.
game GM0 and the actual protocol are identical to each
5) T est(Γs ) – Applying this query, the semantic security
other. Hence, it follows that
of the session key SK is being modeled adopting the
AKE
indistinguishability in ROR [77]. Before starting the AdvHEAP = |2 · P r[S0 ] − 1|. (1)
experiment, an unbiased coin cn needs to be flipped and
its result is only known to Adv. This result decides the • Game GM1 : In this game, Adv performs the eaves-
output of the T est query. If Adv executes this query, dropping attacks by running the Execute(·) query. Fi-
and also SK is fresh, Γs outputs SK when cn = 1 or nally, at the end of this game, Adv needs to call the
a random number in the same domain when cn = 0; T est(·) query. The result of the T est(·) query deter-
otherwise, it will produce the output as a null value (⊥). mines whether Adv obtains the real session key SK or
a random number. In HEAP, the session key between
Semantic security of session key. Under the ROR assump- Ci and N Sj is computed as SKC,N S = h(RC · Pns
tions, Adv needs to apprehend a participant instance’s real ||OT KCM S,C ) = h(RC · RN S · G ||OT KCM S,C ).
session key from a random key. To achieve this purpose, Adv Therefore, to reveal the session key SKC,N S , Adv
can execute several T est queries against either ΓsCi or ΓvN Sj needs the knowledge about two random secrets RC
(or ΓwJTj ). At the end of this experiment, Adv guesses a bit and RN S . Since we encapsulated these two secrets
0 0
cn , and he or she wins the game if cn = cn. Assume S is an inside the exchanged messages (see M C7 and M C8 in
event that Adv can win the game, Adv’s advantage in break- Section V-D8) indirectly, eavesdropping attacks against
VOLUME 6, 2018 27
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

these messages are not beneficial to determine SKC,N S . hard to derive SKC,N S due to the difficulty of solving
Therefore, the probability of winning GM1 by incorpo- ECDDHP (see Section IV). Therefore, it follows that
rating eavesdropping attacks by Adv is negligible. As a ECDDHP
result, we say that both the games GM0 and GM1 are |P r[S4 ] − P r[S3 ]| ≤ AdvA (t). (5)
indistinguishable. Thus, we infer that • Game GM5 : This is the final game and it is modeled
as an active attack, and the game GM4 is transformed
P r[S1 ] = P r[S0 ]. (2) into the final game GM5 . In this game, Adv compro-
mises the Ci ’s workstation and tries to compute the
• Game GM2 : We include the simulation of both Send(·) session key SKC,N S by stealing the session temporal
and H(·) queries into this game and transform the game secrets, such as OT KCM S,N S and OT KCM S,C =
j
GM1 to the game GM2 . This game is also modeled as h(OT KCM S,N S || SIDN S ) from N Stoken and Ctoken ,
an active attack. In this game, Adv eavesdrops all the ex- respectively. But, in the proposed protocol, for en-
changed messages (M C1 , M C2 , · · · , M C8 ). Accord- cryption or decryption we use IND-CPA secure sym-
ing to the policy of HEAP, we appended a random nonce metric cipher, such as stateless CBC mode of AES
with each communicating message (send and receive) analogy. Therefore, Adv requires the knowledge about
j
so that there will be no collision of hash outputs when SKC,CM S and SIDN S to decrypt both N Stoken
Adv simulates it utilizing the Send(·) query. Thus, the and Ctoken to get the parameters OT KCM S,N S and
birthday paradox results in the following inequality: OT KCM S,C . Thus, it follows that
qh2 |P r[S5 ] − P r[S4 ]| ≤ AdvEIN D−CP A (K). (6)
|P r[S2 ] − P r[S1 ]| ≤ . (3)
2|H|
Since all the queries are successfully simulated in the
• Game GM3 : This game is modeled as an active at- final game GM5 , Adv is left with only guessing the bit
tack wherein Adv tries to compute the current ses- cn for winning the game after the T est(·) query. Then,
sion key SKC,N S between Ci and N Sj by obtaining we have,
the other credentials (specifically, Vcnew and MT U)
1
from the guessed password (P W D) of Ci and the P r[S5 ] = . (7)
game GM2 is transformed into the game GM3 . Sup- 2
pose Adv eavesdrops all the exchanged messages M Ci From Eqs. (1) and (2), we have,
(i = 1, 2, · · · , 8) of the current session. We utilize Ci ’s 1 AKE 1
transformed password (P W D∗ = h(h(ri || P W D)|| · AdvHEAP = |P r[S0 ] − |
2 2
CM S
RC )) only in order to encrypt the CM S’s identifier 1
i
(i.e., Vcnew ) during the single sign-on and session key = |P r[S1 ] − |. (8)
2
establishment process, but Adv does not have any provi- From Eqs. (7) and (8), we have,
sions to check the transformed password of Ci directly
on the server-sides (both CM S and N Sj ). Therefore, 1 AKE
· AdvHEAP = |P r[S1 ] − P r[S5 ]|. (9)
even if Adv guesses Ci ’s password, but he or she has no 2
scope to verify it on the server-side. Moreover, to reveal According to the triangular inequality, we get the fol-
the actual password (P W D) of Ci correctly from client- lowing:
CM S
end, Adv requires the knowledge about RC i
and ri . |P r[S1 ] − P r[S5 ]| ≤ |P r[S1 ] − P r[S2 ]| +|P r[S2 ] −
In fact, if the authentication system has a provision P r[S5 ]| ≤ |P r[S1 ] − P r[S2 ]| +|P r[S2 ] − P r[S3 ]|
to check the limited number of incorrect passwords as +|P r[S3 ] − P r[S4 ]| +|P r[S4 ] − P r[S5 ]|.
inputs, we have the following result: Now from Eqs. (3), (4), (5) and (6), we get,
qs qh2 qs
|P r[S3 ] − P r[S2 ]| ≤ . (4) |P r[S1 ] − P r[S5 ]| ≤ + (10)
|D| 2|H| |D|
• Game GM4 : This game is imitated as an active at- +AdvAECDDHP
(t)
tack and the game GM3 is transformed into the game +AdvEIN D−CP A (K).
GM4 . In this game, Adv tries to compute the session
key SKC,N S by utilizing both the public information Thus, Eqs. (9) and (10) produce the following result:
1
(ψC i
= RC · G and Pns = RN S · G) of Ci and 1 AKE
N Sj and previously eavesdropped messages from the · AdvHEAP = |P r[S1 ] − P r[S5 ]| (11)
2
aforesaid discussed games. Since both Ci and N Sj can qh2 qs
compute the session key as SKC,N S = h(RC · Pns ≤ +
2|H| |D|
||OT KCM S,C ) = h(RC · RN S · G ||OT KCM S,C ) by ECDDHP
utilizing CM S server, it is obvious that having the +AdvA (t)
knowledge about ψC 1
i
and Pns , it is computationally +AdvEIN D−CP A (K).
28 VOLUME 6, 2018
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

new new new


Finally, after multiplying both sides of the above equa- not have any knowledge about RC , RN S and OT KCM S,C
tion (Eq. 11) by a factor of 2, we get the required result parameters for the session say St+1 then it is hard to compute
as follows: the future session key.
qh2 q
s
Additionally, some information related to Ci namely EI,
AKE ECDDHP
AdvHEAP ≤ +2 + AdvA (t) ri , KN M S,C and RC CM S
are stored into the workstation
|H| |D| i
 for verifying Ci ’s legitimacy at the time of user login and
+AdvEIN D−CP A (K) . password change phase. This parameters are stored either
in encrypted format or in a transformed manner using one
B. INFORMAL SECURITY ANALYSIS way hash function. Therefore, without having the knowledge
This section presents an informal security inspection of the about the keys and breaking the hardness property of cryp-
proposed protocol (HEAP) and shows it is resilient against tographic one-way hash function, it is impossible to get the
various other well-known attacks. This discussion are repre- parameters. Thus, we can remark that the proposed protocol
sented in the following propositions. protects disclosure of Ci ’s confidential information through
Proposition 1: HEAP is resilient against the privileged- workstation compromise attacks.
insider attacks. Proposition 3: HEAP is resilient against denial-of-service
Proof: According to the policy of the proposed protocol, attacks.
during user enrollment task HCAj asks Ci to give his Proof: In order to achieve user (or service provider) login,
or her user identity CID and password P W D. After get- HEAP utilizes workstation-based authentication mechanism
ting these parameters, HCAj transforms Ci ’s identity and without involving the HEAP-KDC. In this connection, Ci
CM S
password as MT U = h(CID ||ri ||HCAID || RC i
) and enters his identity (CID ) and password (P W D) to HCAj .
P W D∗ = h(h(P W D|| ri )||RC CM S
). HCA j encrypts both 0
i To verify the current user Ci , HCAj computes EI =
these transformed parameters and construct a message as
h(CID || P W D) and check it with EI in its cookie. If
msgC5 = {CM SID , MT U, n2 , E(K(N M S,C) : [P W D∗ ,
both matches then HCAj load ri∗ = ri ⊕ h(CID ||P W D),
MT U, M BN OC ])}. HCAj then sends the message to the CM S ∗ CM S
RC = RC ⊕h(ri ||P W D) and U SP Wi = h(CID
CM S. i i
||HCAID ||P W D ||ri ) from its database into Ci ’s work-
Let a privileged-insider user of the CM S, being an adver- CM S CM S ∗
station. HCAj then computes RC = RC ⊕ ⊕h(ri
sary Adv, receives the message msgC5 and tries to extract i i
||P W D), ri = ri∗ ⊕ h(CID ||P W D) and U SP Wi∗ =
the original identity of Ci from MT U. Even if Adv is having
CM S h(CID ||HCAID ||P W D ||ri ). After that, HCAj verifies
the knowledge about RC and HCAID , but it is still not ?
i
sufficient for Adv to trace Ci ’s actual identity without having the condition U SP Wi∗ = U SP Wi . If it holds then HCAj
the value of ri . CM S can not decrypt the masked password accepts Ci ; otherwise, HSAj treats Ci as illegitimate user.
P W D∗ = h(h(P W D|| ri )||RC CM S
) because it does not As the first step verification has done only on the client-
i
have the key KN M S,C . Further, suppose a privileged-insider side and HEAP-KDC does not involve into this process then
user of the N M S or ES, being an adversary Adv, gets we say that HEAP can resists server-side denial-of-service
the transformed identity and the password of Ci and tries attacks.
to extract the actual identity and password of Ci . But, the Proposition 4: HEAP provides privacy preserving data in-
adversary Adv can not disclose the original identity of Ci tegrity in Hadoop.
CM S
due to the lack of knowledge about RC i
and ri . In the same Proof: Form Proposition 8 and Assrt. 9, it is obvious that
way, Adv can not extract Ci ’s original password. Thus, the both Ci and N Sj (or JTj ) preserve their identities during
proposed protocol HEAP can protect the privileged-insider session key establishment and Big Data service access task.
attacks. Further, from Section V-D9 and Section V-D10, it could also
Proposition 2: HEAP protects Ci ’s private information be observe that HEAP stores (writes or appends) Ci ’s raw
against workstation compromise attacks. datablocks into several chunk servers (HDFS). During this
Proof: Let an adversary Adv controls Ci ’s workstation after process, HEAP computes a hashed MAC (HMAC) of each
successful accomplishment of Ci ’s tth session say St . In datablock and stores the HMAC along with the raw datablock
such a provision, Adv captures MT U, Ctoken , N Stoken , into Datanode servers.
0
1 2
ψC i
, ψC i
, Vcc , Pns , Vns , Z4 and T N SID parameters from Suppose an adversary Adv or a malicious insiders change
workstation credential cache. After gaining the knowledge the content of the raw datablock. In such a provision, if
about these parameters, Adv can not derive the future ses- Ci processes his Big Data (basically the intercepted and
new
sion key between Ci and N Sj say SKC,N S = h(RC · modified data content) utilizing MapReduce framework then
new new new 1new new
Pns ||OT KCM S,C ) = h(RN S · ψCi ||OT KCM S,C ). Be- he will not get the desired result. To overcome this problem,
cause, for the future session say St+1 , Ci chooses a fresh in our proposed protocol, at the time of auditing or reading
new
pseudo-random number say RC 6= RC and N Sj selects a the datablock, Ci would be able to check the integrity of each
new
fresh pseudo-random number say RN S 6= RN S , and CM S datablock utilizing his secret key (KN M S,C ) and HMAC.
new old
chooses a fresh nonce say OT KCM S,C 6= OT KCM S,C to After checking the integrity of each datablocks, Ci is permis-
new old
construct Ctoken 6= Ctoken . Since the adversary Adv does sible to process the Big Data utilizing JTj and it will lead Ci
VOLUME 6, 2018 29
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

CM S
to get the desired output. sary Adv does not have the knowledge about ri , RC i
, rss1
Proposition 5: HEAP is resilient against known session- and rss2 , so he cannot retrieve the original identities CID ,
specific temporary information attacks. N SID (or JTID ) from MT U and T N SID (or T JTID ), re-
Proof: During the session key establishment process between spectively. In a similar way, to protect BDSP/BA’s identity,
0
Ci and N Sj (or JTj ), suppose any one of the random nonce it is also stored as MT U = h(BDSPID || d|| HSAID ||
RC or RN S (or RJT ) is known to the adversary Adv. NMS
RBDSP ) into those servers. Thus completes the proof.
Therefore it is obvious that Adv can compute RC · Pns Proposition 8: HEAP supports user and Big Data service
1
or RN S · ψC i
, respectively. But, it is not sufficient for the servers anonymity.
adversary Adv to compute the session key SKC,N S = Proof: During user and service server registration process
1
h(RC · Pns ||OT KCM S,C ) = h(RN S · ψC i
||OT KCM S,C ) both Ci and N Sj (or JTj ) are enroled themselves with
without having the knowledge about OT KCM S,C . Further HEAP-KDC utilizing their masked identities namely MT U
j
to compute OT KCM S,C = h(OT KCM S,N S || SIDN S ), the and T N SID (or T JTID ). At the time of session key es-
j
adversary Adv also requires the knowledge of SIDN S and tablishment task, Ci computes its digital signature using
OT KCM S,N S parameters. Moreover, Adv is unable to ex- Ci ’s masked identity (MT U) and an application generated
tract these parameters from OT KCM S,C due to the one-way pseudo random number (RC ). The same way, N Sj (or JTj )
property of cryptographic hash function and cryptographic encapsulates T N SID and RN S to construct its digital signa-
hardness property associated with the stateless CBC mode of ture. Thereafter, the digital signature exchanges between Ci
AES encryption/decryption policy. It is also observed from and N Sj (or JTj ) via a public channel lead to establish the
Remark 5 that the proposed protocol alleviate the known session key between themselves. Due to the encapsulation
session-specific temporary information attacks. of the pseudo random number, these digital signatures vis-a-
Proposition 6: HEAP protects man-in-the-middle attacks. vis the identities of both Ci and N Sj (or JTj ) are used to
Proof: Suppose during session key establishment pro- be dynamic and it will change in every sessions. Thus, the
cess, an adversary Adv tries to impersonate a legitimate proposed scheme provides user and Big Data service server
client Ci or service server N Sj by eavesdropping the ex- anonymity.
changed messages say M C7 and M C8 . However, in the Proposition 9: HEAP assists untraceability of user and Big
proposed protocol, Ci authenticates both CM S and N Sj Data service servers.
1
by verifying two conditions as (1) Vcc · G = ψC i
· Proof: Suppose an adversary Adv eavesdrops the mes-
1
H2 (H1 (MT U ||OT KCM S,C ))+ QC · H2 (ψCi ) and (2) sage set {M C7 , M C8 } and tries to extract the original
j
OT KCM S,C = h(OT KCM S,N S ||SIDN S ), respectively identities of Ci and N Sj (or JTj ). In this connection,
CM S
utilizing both Ctoken and message M C8 . Similarly, N Sj Adv extracts MT U = h(CID ||ri ||HCAID || RC i
),
verifies the legitimacy of both CM S and Ci by checking T N SID = h(N SID ||HSAID || rss1 ) (or T JTID =
two conditions as (i) Vns · G = Pns · H2 (H1 (T N SID || h(JTID ||HSAID || rss2 )), Vcc = RC · H2 (H1 (MT U
1
OT KCM S,C )) +QN S · H2 (Pns ) and (ii) OT KCM S,C = ||OT KCM S,C )) +SC · H2 (ψC i
) (mod q) and Vns = RN S ·
j
h(OT KCM S,N S ||SIDN S ), respectively using N Stoken and
H2 (H1 (T N SID ||OT KCM S,C ))+ SN S ·H2 (Pns ) (mod q)
message M C7 . After validating the aforesaid conditions suc- parameters. Note here, these four parameters are implicitly
cessfully, Ci (or N Sj ) establishes the session key SKC,N S derived from the original identity of either user Ci or service
between themselves, otherwise; terminate the process. Since, server N Sj (or JTj ), respectively. The adversary Adv can
the adversary Adv does not have the knowledge about RC , not trace the actual identities of Ci and N Sj (or JTj ) due
j
RN S , OT KCM S,C , OT KCM S,N S , SIDN S and KCM S,N S
to the adoption of collision-resistant cryptographic one way
so, it is impossible to impersonate either Ci or N Sj . hash function towards the identity transformation. Thus, the
In addition to this, the simulation result of AVISPA based proposed scheme satisfies the untraceability property.
formal verification (see Section VII) is also substantiates that Proposition 10: HEAP is resilient against offline dictionary
the other phases of the proposed protocol is robust against attacks.
man-in-the-middle attacks. Thus, we remark that the pro- Proof: To make the proposed protocol resilient against offline
posed protocol is resilient against man-in-the-middle attacks. dictionary attacks, Ci transforms his actual password P W D
Proposition 7: HEAP is resilient against identity compromise using a system generated random secret ri , a server gener-
CM S
attacks. ated random nonce RC i
and two hash functions (h(·)) as
∗ CM S
Proof: In this attack, to protect both Ci ’s and N Sj ’s (or P W D = h(h(ri || P W D)|| RC i
), and later store this

JTj ) original identities (CID and T N SID (or T JTID )) masked password (P W D ) into N M S and ES servers.
from an adversary say Adv who controls either CM S Suppose a privileged insider acting as an adversary Adv
or N M S or ES, in the proposed scheme, the identities compromises the dictionary of passwords from N M S (or
are stored in a transformed manner into those servers. For ES) and tries to reveal the actual password of Ci incor-
example, Ci ’s original identity are stored as MT U = porating offline password guessing attacks. But, Adv can
CM S
h(CID ||ri ||HCAID || RC i
) whereas N Sj ’s (or JTj ) not extract Ci ’s actual password due to lack of knowledge
CM S
identity are stored as T N SID = h(N SID ||HSAID || rss1 ) about ri and RC i
parameters and usage of the collision-
(or T JTID = h(JTID ||HSAID || rss2 )). Since, the adver- resistant cryptographic one way hash function towards pass-
30 VOLUME 6, 2018
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

word transformation. The same technique has been followed Ci verifies both CM S and N Sj whereas N Sj checks the
to protect BDSP ’s original password from the offline pass- legitimacy of both CM S and Ci seperately before establish-
word guessing attacks. ment of the session key between themselves using the pro-
Proposition 11: HEAP is robust against ciphertext-only at- posed digital signature based verification strategy. Since, the
tacks (COA) on Ci ’s or BDSP/BA’s password. adversary Adv does not have OT KCM S,C , OT KCM S,N S ,
j
Proof: Suppose during the single sign-on process, a pas- SIDN S , SC , SN S , RC and RN S parameters he would not
sive adversary Adv listening the communication channel be able to generate a valid digital signature. Thus, we remark
between Ci and CM S for a particular session say St , and that the proposed protocol has ability to protect user and
eavesdrops the exchanged messages say M C1 , M C2 , M C3 , service server impersonation attacks.
M C4 . Adv repeats this process for multiple sessions say Proposition 14: Heap protects replay attacks.
St , S(t+1) , S(t+2) , · · · , S(t+n) and collects a set of messages Proof: To resist replay attacks during mutual authentication
←−−− ←−−− ←−−− ←−−− and single sign-on task, CM S can keep ψC 1
and ψC2
in its
say {M C1 , M C2 , M C3 , M C4 }. Form these set of messages, i i
←−−− ←−0 ←
−0 system cache memory temporarily. CM S initially verifies if
A picks M C4 and extracts Z4 . Note here, Z4 is a set of ψC1∗
= ψC 1
and ψC2∗
= ψC 2
. If both of these conditions are
ciphertexts that are directly associated with Ci ’s password. i i i i
valid then Ci ’s request message is treated as replay message,
←−0
Now, from Z4 the adversary Adv tries to find out the actual otherwise; CM S updates ψC 1
and ψC 2
with ψC 1∗
and ψC 2∗
i i i i
password of Ci . Since, Adv does not have any knowledge in its system cache memory. Further, in order to protect
CM S
about RC i
and ri , so it is hard to guess the password from the replay attacks during mutual authentication and cluster
the ciphertexts set. In the similar way, HEAP is also robust registration (service servers registration) phase, N M S can
against COA on BDSP/BA’s password. Thus, completes store λ1Ci and λ2Ci into its cache memory temporarily. In a
the proof. similar way, we can alleviate the replay attacks during session
Proposition 12: HEAP is robust against stolen-verifier at- key formation task between Ci and N Sj (or JTj ) (also see
tacks. Remark 3) and the other phases of the proposed protocol.
Proof: Suppose a privileged insider acting as an adver- Proposition 15: HEAP is resilient against server spoofing
sary Adv steals Ci ’s masked identity and Ci ’s transformed attacks.
CM S
password (i.e., MT U = h(CID ||ri ||HCAID || RC i
), Proof: To impersonate both the server i.e., CM S and N M S
∗ CM S to the end user Ci , an adversary Adv needs to generate the
P W D = h(h(ri || P W D)||RCi ) from N M S’s (or
2
ES’s) database and tries to login into a workstation 0
using valid Vcnew = πCM S + SN M S ·H2 (H1 (KN M S,C )) (mod q)
HCAj . In this regard, HCAj computes EI = h(MT U || for the message M C4 to satisfy the verification condition at
P W D∗ ) and search the same into the workstation’s cookies. user-side (Ci ) as T emp1C = T emp2C . It is obvious that the
Since, HCAj does not find such an entry into the cookies, adversary Adv cannot achieved without having the knowl-
it is obvious that HCAj rejects Adv’s request. In order edge of SCM S , SN M S and KC,CM S parameters. In a similar
to satisfy the aforesaid search condition successfully, Adv way, it is computationally intractable to impersonate both
needs the knowledge about the original user identity (CID ) the servers i.e., N M S and CM S to the Big Data service
and password (P W D) of Ci instead the masked identity provider BDSP . Further, to impersonate the server N Sj
and masked password of Ci . In the same way, the proposed to Ci , the adversary Adv needs the knowledge about SN S ,
j
scheme does not allow an adversary Adv to login into the OT KCM S,N S and SIDN S parameters to generate the valid
system by stealing BDSP/BA’s credentials from the server Vns . Thus, we can remark that HEAP can resists server
(N M S or ES). Hence, we can conclude that, the proposed spoofing attacks.
scheme protects stolen-verifier attacks. Proposition 16: HEAP protects server-side Single point Of
Proposition 13: HEAP is resilient against impersonation Failure (SOF) and Single point Of Vulnerability (SOV) is-
attacks. sues.
Proof: In order to access the Big Data storage and processing Proof: According to the proposed HEAP-KDC architecture,
services from the remote service server (N Sj or JTj ), an CM S interfaces with the clients whereas N M S interacts
adversary Adv initially requires the actual identity and pass- with the service providers and ES is reachable offline to both
word of Ci for sign in into the local workstation. In a similar client and service provider at the time of principal registration
way, to enrol the service servers of a Hadoop cluster, Adv process. Under the basic assumption of HEAP as discussed
needs the original identity and password of BDSP . Since, in Section V-C, the CM S server: which is the front server
Adv does not have these parameters so it is computationally to the client (Ci ): keeps the transformed identities (MT Us)
CM S
intractable to make a valid sign in request through HCAj or of clients, the dictionary of RC i
, the secert credentials
j j
HSAk . (SIDN S , SIDJT , KCM S,BDSP and BDSPP∗ W D ) of ser-
Further, during session key establishment, Adv does not vice providers (BDSP s) and service servers (N Sj or JTj s),
have any means to steal the Ci ’s (or N Sj ’s) information to wherein the N M S server: which is the front-end server
achieve mutual authentication in the presence of CM S. Be- to the service providers (BDSP s): stores the client (Ci )
cause, retrieval of OT KCM S,C from Ctoken is computation- secret credentials (P W D∗ s, KN M S,C ), the dictionary of
0
NMS
ally hard. Moreover, during this mutual authentication task RBDSP and the transformed identities (MT U s) of service
VOLUME 6, 2018 31
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

providers. In addition, ES keeps all the credentials of CM S tication system at mission critical situations.
and N M S servers in its custody for future use. In such a According to the proposed authentication architecture,
settings, two servers (CM S and N M S) are actively involved CM S has all the security credentials related to service
(two-server based handshaking) to achieve authentication of providers, Hadoop cluster vis-a-vis service servers informa-
Ci (or BDSP ) at the time of single-sign on (or service server tion and masked identities of users. N M S has all the security
registration) process. Since, the secret credentials of Ci (or credentials related to Ci , service providers masked identities
BDSP ) is distributed into two different servers, therefore and service servers masked identities. ES is having all the
it is resilient against SOV issue. Further, from Remark 4, secret credentials of each principals which is the union of
we can observe that the proposed authentication framework both CM S and N M S. Now, say for instance, if the
resists SOF issue. Thus, completes the proof. active CM S suddenly fails and all its security credentials
Proposition 17: HEAP provides both forward and backward are lost due to some hardware related issues then a stand-by
secrecy. CM S will be restored the whole system by configuring it
Proof: To measure the forward and backward secrecy of the from the back-up server say ES via a secure channel. In a
proposed protocol, we consider the simultaneous leakage of similar way, the failure of an active N M S can be restored
Ci ’s primary secret namely password P W D in the form of by a stand-by N M S. Further, if the trusted server say ES
Ctoken and its impact on the past and future session key fails then the maintenance engineer and system administrator
security. easily rebuild it from both active CM S and active N M S
Suppose in a particular session, to establish a session key servers. Note here, a trusted server ES (operates in an offline
between Ci and N Sj (or JTj ), a pseudo random number mode) is currently having all the secret credentials related to
say RC is chosen by Ci whereas another pseudo random each principal of the system including the master CM S and
number say RN S (or RJT ) is selected by N Sj . Although, N M S. It may also be noted that, we consider here only one
1
both ψC i
= RC ·G and Pns = RN S ·G are exchanged between server (CM S or N M S or ES) can fail at a particular point
Ci and N Sj via a public channel, it is computationally hard of time. Therefore, from the above discussion, we can remark
1
to reveal RC or RN S (or RJT ) from ψC i
or Pns (or Pjt ), that both maintenance engineer and system administrator
respectively due to the intractability of ECDLP (see Section operates under HEAP-KDC would be able to restore the
IV). Further, it is also impossible to compute RC · RN S · G or whole KDC in no time. Thus, the proposed protocol provides
1
RN S · RC · G after getting Pns = RN S · G or ψC i
= RC · G dependable authentication services.
via a public channel due to the intractability of ECDDHP Proposition 20: HEAP provides single sign-on facility.
(refer Section IV). However, the session key SKC,N S =
h(RC · Pns || OT KCM S,C ) = h(RN S · ψC 1
|| OT KCM S,C ) Proof: According to the proposed protocol policy, Ci can ac-
i
between Ci and N Sj has derived from RC , RN S , ψC 1
, Pns cess any service servers (that belongs to a particular Hadoop
i
and OT KCM S,C parameters and Ci ’s password P W D has cluster) after authenticating himself utilizing both CM S
nothing to do with this computation. Thus, our proposed and N M S servers (two-server based authentication). This
protocol achieves both forward and backward secrecy. authentication process is a one time task. After that, Ci can
Proposition 18: HEAP provides mutual authentication. make any numbers of service server request from CM S
Proof: During single sign-on process, Ci checks the le- throughout the session.
gitimacy of both CM S and N M S, and vice versa by
verifying the following three conditions: (i) MT U ∗ =
0
MT U && CM SID = CM SID , (ii) T emp1C = T emp2C VII. FORMAL SECURITY ANALYSIS USING AVISPA
and (iii) T empCM S = T emp2CM S as discussed in Step
1
In HEAP, we have two mutual authentication and session
DKA6, Step DKA8 and Step DKA9, respectively (refer Sec- key agreement tasks: 1) to register the service servers vis-
tion V-D6). Further, at the time of session key establishment a-vis the Hadoop cluster with HEAP-KDC online, service
between Ci and N Sj , both of them verify their legitimacy provider’s administrator (BA) needs to authenticate himself
along with CM S’s legitimacy
0
utilizing the following two to HEAP-KDC (utilizing both N M S and CM S servers) and
conditions: (a) MT U = MT U && T empN 1
S
= T empN 2
S
vice versa, and 2) to access the service server N Sj or JTj
C0 C0
and (b) T emp1 = T emp2 as elaborated in Step SKAB- from a distinct cluster, the end user Ci needs to authenticate
SSA4 and Step SKABSSA7, respectively (see Section V-D8. himself utilizing both CM S and N M S servers and vice
Thus, we can say that the proposed protocol achieves mutual versa. To achieve this two cases, we proposed two different
authentication. authentication strategies as discussed above. In order to val-
Proposition 19: HEAP provides dependable authentication idate these two proposed authentication protocols, we utilize
services. a well-known and widely used Internet security protocol
Proof: From Proposition 16, we can observe that the pro- verification tool, called AVISPA (Automated Validation of
posed authentication framework resolves the SOF and SOV Internet Security Protocols and Applications) [79], [80], [81].
issues of the key distribution center (HEAP-KDC). In such This tool is used to test whether a security protocol is safe
a setting, the failure or compromise of a server (CM S or against an active or passive adversary, such as man-in-the-
N M S or ES) can not increase the downtime of the authen- middle and replay attacks.
32 VOLUME 6, 2018
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

Currently, AVISPA tool version 1.1 12 is equipped with simulation results in Figure 26 and 27. The simulation results
four implicit back-end model checkers, namely i) On-the- show that the proposed two protocols in HEAP are safe from
fly Model-Checker (OFMC), ii) Constraint Logic based man-in-the-middle and replay attacks.
Attack Searcher (CL-AtSe), iii) SAT-based Model-Checker
(SATMC) and iv) Tree Automata based on Automatic Ap- VIII. PERFORMANCE ANALYSIS
proximations for the Analysis of Security Protocols (TA4SP). This section analyzes the performance of HEAP. Currently,
Further, each model checker is also equipped with different HEAP protocol consists of two modules: (1) Big Data service
state-of-the-art automatic analysis algorithms. The internal provider module and (2) client module. We evaluate the per-
hierarchy of AVISPA tool and its modules are shown in formance of HEAP based on these module and present them
Figure 25. The following steps are needed to simulate a in Table 4 and Table 5. During the performance analysis, we
security protocol in this tool: consider four different metrics as follows:
Step 1: The proposed protocol needs to be codified into 1) CPU usage or Computation Time (CT) in terms of
HLPSL (High Level Protocols Specification Language) seconds – It tells about the execution time or CPU
[81], where HLPSL is the de facto language according usage (in terms of seconds) of different cryptographic
to the specification of AVISPA. operations that we have used throughout the proposed
Step 2: Save the designed code into a file with hlpsl exten- protocol. For example, a cryptographic one-way hash
sion. For example, we save two of our proposed codes function (SHA-1), modular exponentiation, ECC scalar
(two authentication strategies) into two distinct files as multiplication, AES-128 bits stateless CBC mode of
Heap_BNMSfinal.hlpsl and SK_CNS.hlpsl. encryption or decryption. An approximate CT for the
Step 3: After creating the file, we only need to execute the aforesaid operations are taken form [63] and it is sum-
command called “avispa <space> <Filename.hlpsl> marized in Table 3. In the proposed protocol phases,
<space> −− <model checker name>". For example, we use logical XOR operation but it is not depicted
in our case we execute the following commands in in Table 3. Because, this operation takes a negligible
Ubuntu 14.04 LTS platform: amount of time say, Txor = 10−9 seconds as compare
to the other as mentioned in Table 3. Therefore, we do
1) durbadal@durbadal-rec: /Desktop/HEAP_CODE $
not consider Txor for CT computation for both service
avispa Heap_BNMSfinal.hlpsl −− satmc
provider and client modules (refer Table 4 and Table 5).
2) durbadal@durbadal-rec: /Desktop/HEAP_CODE $
2) Communication Overhead (CO) in terms of bits – Sup-
avispa SK_CNS.hlpsl −− ofmc
pose an entity say Ci sends a message Mi to another
Step 4: During the execution of the above command say entity say CM S and |Mi | represents the bit length of the
“avispa <space> <Filename.hlpsl> <space> −− message Mi by summing up the bit length of its individ-
<model checker name>", the AVISPA tool implicitly ual component. More precisely, if Mi = {cP 1 , · · · , cn }
translate the “XYZ.hlpsl" file into another file format then |Mi | = {|c1 | + · · · + |cn |} or |Mi | = j=1 (cj ).
n
called Intermediate Format (IF) using the HLPSL2IF For example, in single sign-on and dynamic key estab-
translator (see Figure 25). lishment phase, Ci sends M C1 = {MT U, CM SID ,
Step 5: Eventually, the IF file is given to each model checker ψC1 2
, ψC , CertC }. Here, |MT U |= 160 bits, |CM SID |
i i
and the model checker test the proposed protocol is safe = 32 bits, |ψC 1 2
| = 160 bits, |ψC | = 160 bits and |CertC |
i i
or unsafe or inconclusive. The IF file is the required = 160 bits. So, |M C1 | = (160+32+160+160+160) =
input file format of each aforementioned model checker 672 bits. Therefore, the communication overhead of
to test the designed protocol. message M C1 is 672 bits. Similarly, we calculate the
We have implemented the codes for two proposed pro- CO for all the other messages and are depicted in Table
tocols in HLPSL and save them into two different files 4 and Table 5. Note here, for ease of CO calculation,
namely Heap_BNMSfinal.hlpsl and SK_CNS.hlpsl. The de- we assume here: (i) the bit length of each certificate is
tailed description on HLPSL and various protocols imple- equal to the bit length of the public-key (i.e., 160 bits
mentations in AVISPA are available in [80], [81]. Under ECC key), (ii) the bit length of the masked identity is
Heap_BNMSfinal.hlpsl file, initially we specify basic roles 160 bits, (iii) |nonce| =32 bits, (iv) the bit length of the
of all the participants (BDSP , N M S and CM S) and server’s original identity is 32 bits and (v) the bit length
then make composite role for representing different cases of cipher-text or plain-text block is 128 bits (using AES-
or scenarios derived from the basic roles. Similarly, under 128 bit encryption/decryption policy).
SK_CNS.hlpsl, we present the basic roles for Ci , CM S, 3) Storage Cost (SC) in terms of bits – To achieve a par-
N M S and N Sj and construct the composite role (or session) ticular task (namely user registration, user login, etc.) in
involving all the participants. the proposed protocol, few security credentials are need
We simulate both the files using AVISPA under the widely- to be stored previously either in client-side or server-
used OFMC and SATMC back-ends and summarized the side or both. The storage cost tells about this pre-loaded
credentials in terms of bits. For example, to achieve
12 AVISPA Project: http://www.avispa-project.org/ user login, HCAi needs to load ri∗ and U SP Wi (total
VOLUME 6, 2018 33
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

FIGURE 25: Building blocks of AVISPA (Source:[79])

% OFMC % OFMC
% Version of 2006/02/13 % Version of 2006/02/13
SUMMARY SUMMARY
SAFE SAFE
DETAILS DETAILS
BOUNDED_NUMBER_OF_SESSIONS BOUNDED_NUMBER_OF_SESSIONS
PROTOCOL PROTOCOL
/opt/avispa−1.1/testsuite /opt/avispa−1.1/testsuite
/results/Heap_BNMSfinal.if /results/SK_CNS.if
GOAL GOAL
as_specified as_specified
BACKEND BACKEND
OFMC OFMC
COMMENTS COMMENTS
STATISTICS STATISTICS
parseTime: 0.00s parseTime: 0.00s
searchTime: 2.41s searchTime: 22.17s
visitedNodes: 859 nodes visitedNodes: 3637 nodes
depth: 10 plies depth: 12 plies

FIGURE 26: Analysis of results under the OFMC backend

|ri∗ | + |U SP W | = 320 bits) credentials into Ci ’s work- storage or processing service server ticket request, the
station. Similarly, to accomplish the service provider proposed protocol needs to store the masked identity of
login, HSAj needs to load d∗ and BDP W (total either N Sj or JTj (160 bits) into HCAi .
|d∗ | + |BDP W | = 320 bits) into BDSP ’s workstation. 4) Communication Rounds (CR) – A single communica-
We highlights the SC for each phase of the proposed tion round represents a one-way message transmission.
protocol in Table 4 and Table 5. Note here, to achieve In this regards, we compute CR for each phases of the
a successful user registration and single sign-on task, proposed protocol and it is shown in Table 4 and Table
N M S needs to keep (|KN M S,C |+ |DCID |+ |DP W D |), 5.
a total of (64 + 64 + 64) bits = 192 bits, and both Tables 4 and 5 show that the proposed protocol is quite
N M S and CM S need to store (|MT U |+ |P W D∗ |+ efficient in terms of communication, storage and computa-
CM S
|RC i
|+ |KN M S,C |), a total of (160+160+32+64) bits tion cost considering all the participants. Consider Ci with
= 416 bits, respectively. Further, to make a Big Data HCAi , for instance; it needs 0.00128 seconds and 0.17588
34 VOLUME 6, 2018
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

SUMMARY SUMMARY
SAFE SAFE

DETAILS DETAILS
BOUNDED_NUMBER_OF_SESSIONS BOUNDED_NUMBER_OF_SESSIONS
BOUNDED_SEARCH_DEPTH BOUNDED_SEARCH_DEPTH
BOUNDED_MESSAGE_DEPTH BOUNDED_MESSAGE_DEPTH

PROTOCOL PROTOCOL
Heap_BNMSfinal.if SK_CNS.if

GOAL GOAL
%% see the HLPSL specification.. %% see the HLPSL specification..

BACKEND BACKEND
SATMC SATMC

COMMENTS COMMENTS

STATISTICS STATISTICS
attackFound false boolean attackFound false boolean
upperBoundReached true boolean upperBoundReached true boolean
graphLeveledOff 2 steps graphLeveledOff 2 steps
satSolver zchaff solver satSolver zchaff solver
maxStepsNumber 30 steps maxStepsNumber 30 steps
stepsNumber 2 steps stepsNumber 2 steps
atomsNumber 0 atoms atomsNumber 0 atoms
clausesNumber 0 clauses clausesNumber 0 clauses
encodingTime 0.48 seconds encodingTime 0.16 seconds
solvingTime 0 seconds solvingTime 0 seconds
if2sateCompilationTime 0.28 seconds if2sateCompilationTime 0.8 seconds

ATTACK TRACE ATTACK TRACE


%% no attacks have been found.. %% no attacks have been found..

FIGURE 27: Analysis of results under the SATMC backend

Approximate computation
Operation Notations
cost (in seconds) and password change phases. In addition to this, a service
Hash (SHA-1) Thash 0.00032 provider can seamlessly enroll his own Hadoop cluster with
Encryption or Decryption TE /TD (= TS ) 0.0056 the HEAP-KDC by compromising only 33.6 ms (here, we
Modular exponentiation Tmod 0.0192 assume that the Hadoop cluster consists of a single storage
ECC scalar multiplication Tecsm 0.0171
service server and a single processing server). In spite of
this, we compare our scheme with the existing state of the
TABLE 3: Rough estimation of computation time reported in art authentication protocols as follows.
[63]
A. COMPARATIVE ANALYSIS
This section compares the proposed protocol HEAP with
seconds, a total of 0.17716 seconds ≈ 177 milliseconds (ms) other state of the art authentication strategies. For compari-
(refer Scenario2 from Table 5) to login into the system and son, we consider only login and authentication phases of the
validate the legitimacy of HEAP-KDC, respectively. After proposed protocol. Further, we consider three crucial perfor-
Ci ’s single sign-on (or establishment of the session key mance metrics namely communication overhead, CPU usage
with CM S), the Ci needs approximately 0.10072 seconds (or computation time) and storage cost as compared with
(1TE + 1TD + 4Tecsm + 1Tmod + 6Thash ) ≈ 101 ms to other schemes. The detail comparative analysis is discussed
access a particular service server (i.e., either N Sj or JTj ) as follows:
from any Hadoop cluster (which is registered with the HEAP-
KDC) followed by a mutual authentication and session key 1) Comparison of computation costs
agreement phase. According to the computational cost analy- In order to perform the computational cost analysis, we
sis of the proposed protocol, service provider enrollment and consider various schemes related to Big Data, Cloud and
password updation tasks will take ≈ 188 ms and ≈ 18.08 Internet of Thing (IoT) platforms, and are available in recent
ms and with the same cost, user achieves his registration literature [17] [46] [47] [62] [61] [63] [60] [57] [55] [56] [58]
VOLUME 6, 2018 35
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

Participants
Overall costs per phases
Protocol phases Performance metric (BDSP + N M S + CM S + ES)
(CO, CR, SC, CT)
Communication overhead (128+128+32)+(768)+(384+160+32+32)+(32+32+160) bits ++
Communication rounds 6++
Service provider registration (1888 bits, 6, 192 bits, 0.1877 sec)
Storage cost 192 bits
Computation cost 12Tecsm /(5Tecsm ) + 2Tmod + 3TE + 2TD + 3Txor + 9Thash /(4Thash )
Communication overhead None
Communication rounds None
Service provider login (0 bit, 0, 320 bits, 0.00128 sec)
Storage cost 320 bits
Computation cost 3Txor + 4Thash
Scenario1 : |M1 |+· · · +|M4 |= 3200 bits
Communication overhead
Scenario2 : |M1 |+· · · +|M4 |= 2560 bits
Authenticated key agreement Communication rounds 4 (3200 [or 2560] bits, 4, 416 bits,
Storage cost 160+160+64+32 bits = 416 bits
0.21008 [or 0.17588] sec)
Case1 : 15Tecsm /(7Tecsm ) + 3Tmod /(1Tmod ) + 3TE + 3TD + 15Thash /(11Thash )
Computation cost
Case2 : 15Tecsm /(9Tecsm ) + 3Tmod /(1TM od ) + 3TE + 3TD + 15Thash /(11Thash )
Communication overhead 160+32+32+(896+256)+160+32+32 bits
Communication rounds 2
Service server enrollment (1600 bits, 2, 0 bit, 0.0336 sec)
Storage cost None
Computation cost 2TE + 4TD
Communication overhead 32+160+32+384+32+32+160 bits ++
Communication rounds 2++
Password updation (832 bits, 2, 320 bits, 0.01808 sec)
Storage cost 320 bits
Computation cost 1TE + 2TD + 4Thash

TABLE 4: Performance of HEAP considering only Big Data service provider module

Note: X/(Y ) – It signifies that an entity does X computation in real time and Y computation in offline mode; Scenario1 – Dynamic public-private key-pair
and dynamic certificate usage for each principal (except ES) in each session; Scenario2 – Static public-private key-pair and certificate usage for a long
period of time; Case1 – first time login and session key establishment utilizing Scenario1 ; Case2 – next consecutive login and session key establishment
incorporating Scenario2 ; ++ For ease of calculation, we ignore internal broadcast messages among N M S, CM S and ES, respectively.

[59]. Further, we consider various two-server based PAKE Table 8. We can see that, our scheme is efficient in terms
protocols [66] [68] [65] [64] [67] for the analysis. Since its of memory usage (client’s workstation as well as server-
inception, the schemes [17] [46] [47] [62] [61] [63] [60] [57] side) than both Karla and Sood and Kumari et al. schemes.
[55] [56] [58] [59] follow single server based authentication Although, our scheme is lagging in terms of storage overhead
strategy whereas others [66] [68] [65] [64] [67] follow two- as compare with the existing state of the art two-server based
server based analogy. From Table 6, we can observe that our authentication strategies. The remaining schemes (shown in
proposed scheme is better than that of existing two-server Table 8) are not explicitly analyzed the storage overhead in
based approaches [66] [68] [65] [64] [67]. Mean while, the their works. So, we represent it as N/A in Table 8.
proposed scheme is also quite comparable with the traditional Besides all the above cost factors, the proposed scheme
single server based authentication approaches. supports several security and functional features (SFFs) as
compare to the other schemes. This SFFs are discussed as
2) Comparison of communication overheads follows.
In communication overhead analysis, we compare the pro-
posed protocol with the aforesaid schemes and summarize 4) Comparison of security and functional features
it in Table 7. It is easy to say that our scheme is efficient In this section, we discuss several Security and Functional
(only 3520 bits need to be transfered between Ci and HEAP- Features (SFFS) of the proposed protocol and compare it with
KDC for first time authentication or single-sign on and for other existing state of the art schemes. We summarize this
future single-sign on only 2880 bits are required) as compare discussion in Table 9. Form Table 9, it has been observed
to the two-server based authentication schemes [66] [68] [65] that the proposed scheme HEAP fulfills various security and
and [67] (needs 4480, 3840, 4480 and 4000 bits). Further, we functional features as compare with different state of the art
can observe that our scheme has the same CO compared with schemes.
Kumari et al. [62]. In fact, our scheme is quite admirable
in terms of the CO as compare with the single server based IX. DISCUSSIONS
strategy. The proposed scheme’s communication overhead is This section summarizes the major potentials of the pro-
equal to the CO of He and Wang’s scheme [59]. posed scheme. In this regard, various appealing features of
the proposed user identity and password-assisted two-server
3) Comparison of storage costs authentication and key exchange framework are highlighted
The memory usage (or pre-deployed secrets storage cost) as follows:
is involved to smooth execution of the proposed scheme 1) With the proposed enrolment strategy, a Big Data ser-
as compare to the aforementioned schemes are shown in vice provider’s administrator can securely deploy his
36 VOLUME 6, 2018
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

Participants
Overall costs per phases
Protocol phases Performance metric (Ci + N M S + CM S + ES)
(CO, CR, SC, CT)
Communication overhead (128+128+32)+(768)+(384+160+32+32)+(32+32+160) bits ∗∗
Communication rounds 6∗∗
User registration (1888 bits, 6, 192 bits, 0.1877 sec)
Storage cost 192 bits
12Tecsm /(5Tecsm ) + 2Tmod + 3TE + 2TD + 3Txor
Computation cost
+9Thash /(4Thash )
Communication overhead None
Communication rounds None
User login (0 bits, 0, 320 bits, 0.00128 sec)
Storage cost 320 bits
Computation cost 3Txor + 4Thash
Scenario1 : M C1 + · · · + M C4 = 3520 bits
Communication overhead
Scenario2 : M C1 + · · · + M C4 = 2880 bits
Mutual Authentication and (3520 [or 2880] bits, 4, 416 bits,
Communication rounds 4
Single sign-on 0.21008 [or 0.17588] sec)
Storage cost 160+160+64+32 bits
Case1 : 15Tecsm /(7Tecsm ) + 3Tmod /(1Tmod ) + 3TE + 3TD
+15Thash /(11Thash )
Computation cost
Case2 : 15Tecsm /(9Tecsm ) + 3Tmod /(1TM od ) + 3TE
+3TD + 15Thash /(11Thash )
Communication overhead 640+384+384+384 bits
Communication rounds 2
Service server ticket granting (1792 bits, 2, 0 bits, 0.01712 sec)
Storage cost None
Computation cost 2TE + 1TD + 1Thash
Communication overhead 160 + 160 + 160 + 160 + 384 + 160 + 800 bits
Session key agreement Communication rounds 2
(1984 bits, 2, 160 bits, 0.17936 sec)
with service server Storage cost 160 bits
5Tecsm /(1Tecsm ) + 1Tmod + 7Thash /(1Thash )+
Computation cost
6Tecsm /(2Tecsm ) + 1Tmod + 8Thash /(1Thash )
Communication overhead 32+160+32+384+32+32+160 bits
Communication rounds 2 ∗∗
Password updation (832 bits, 2, 320, 0.01808 sec)
Storage cost 320 bits
Computation cost 1TE + 2TD + 4Thash

TABLE 5: Performance of HEAP considering only Client module

Note: X/(Y ) – It signifies that an entity does X computation in real time and Y computation in offline mode; Scenario1 – Dynamic public-private key-pair
and dynamic certificate usage for each principal (except ES) in each session; Scenario2 – Static public-private key-pair and certificate usage for a long
period of time; Case1 – first time login and session key establishment utilizing Scenario1 ; Case2 – next consecutive login and session key establishment
incorporating Scenario2 ; ∗∗ For ease of calculation, we ignore internal broadcast messages among N M S, CM S and ES, respectively.

cluster vis-a-vis service servers online with authentica- tional user registration policy, where during session key
tion service provider (HEAP-KDC). In order to achieve establishment, a long-term secret key (password) is used
this, the administrator needs to login into the system to build a secret channel between user and Registration
using his identity and password only. Thus, the cluster Authority (RA). With this settings, it is very difficult
enrollment policy is scalable and user friendly in nature. for a user to change or update his long-term credential
2) The proposed HEAP-KDC is robust against single point i.e., password into RA (a centralized server) for security
of failure and single point of vulnerability. Further, reasons. Since, the proposed protocol has a provision
it could tolerate various well-known attacks. Hence, to update user’s password online and has an ability to
HEAP can able to provide dependable and secure au- recover the password utilizing a out-of-band channel
thentication service 24×7 to the customers. (i.e., postal network) or a valid email id (or registered
3) With the proposed two-server based single sign-on mobile number), it is obvious that the current settings
scheme, the client can access any number of Namenode of the proposed approach mitigates the key rollover
servers (N Sj s) or JobTrackers (JTj s) from a Hadoop problem.
cluster by logging in into the system only once. During 6) The utilization of 160 bits public/private key (ECC) and
single sign-on process, a secure mutual authentication 128 bits symmetric key (AES) as compared to 4096
process takes place which ensures the legitimacy of both bits key-size reported in [66], [68], [65], [64], [67]
Ci and the dual-server (CM S and N M S). needs lesser memory space (storage in terms of bits) and
4) The adoptation of two-factor authentication (password reduce communication overheads.
and delegated token) and dual-server based session key 7) The proposed two-server based authentication and key
establishment strategy makes the proposed authentica- exchange protocol does not have any compatibility hur-
tion system more robust, cost effective and user friendly. dle with the traditional single-server based approaches
5) The key rollover problem is a crucial issue for tradi- [46], [47], [17], [62], but yields better dependability

VOLUME 6, 2018 37
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

Computation cost A rough estimation of CPU utilization


State-of-the-art authentication schemes
(in mathematical notations) (in seconds)
Karla and Sood [46] 9Thash + 7Tescm 0.12258
S. Kumari et al. [47] 7Thash + 8Tescm 0.13904
Odelu et al. [60] 25Thash + 6TS + 1TF + 6Tecsm 0.1613
S. Kumari et al. [62] 16Thash + 8Tecsm + 2TBH 0.17612
Shen et al. [57] 17Thash + 6Tecsm 0.10804
Yoon and Yoo [55] 16Thash + 4Tecsm 0.07352
Mishra et al. [56] 1TBH + 16Thash 0.02222
Wu et al. [58] 13Thash + 2Tecsm + 4TS + 1TF 0.07786
He and Wang [59] 21Thash + 8Tescm 0.14352
Wazid et al. [61] 22Thash + 4TS + TF 0.04654
Jangirala et al. [63] 31Thash + 4TCM + TF 0.09542
Katz et al. [66] 21Tmod +  0.4032
Yi et al. [68] 32Tmod +  0.6144
JWX Protocol [65] 18Tmod +  0.3456
Yang et al. [64] 11Tmod /(4Tmod ) +  0.1728
Yi et al. [67] 14Tmod +  0.2688
Jangirala et al. [17] 12Thash + 8Tecsm + 6Tepa 0.27744
Proposed scheme 15Tecsm /(9Tecsm ) + 3Tmod /(1TM od ) 0.17716
+6TS + 19Thash /(11Thash )

TABLE 6: Summary of computation cost analysis

Note: Here, we consider TF = Tepa = TBH = TCM = Tescm ≈ 0.0171 seconds;  represents some other costs (exponentiation and hash operations cost) that
are involved in the particular scheme, but for ease of calculation, in this work, we consider average computation cost.

State-of-the-art authentication schemes Communication cost (in bits) State-of-the-art authentication schemes Storage cost (in bits)
Karla and Sood [46] 1280 Karla and Sood [46] 576
S. Kumari et al. [47] 1760 S. Kumari et al. [47] 480
Odelu et al. [60] 2944
Odelu et al. [60] N/A
S. Kumari et al. [62] 2880
Shen et al. [57] 1856 S. Kumari et al. [62] N/A
Yoon and Yoo [55] 2496 Shen et al. [57] N/A
Mishra et al. [56] 1152 Yoon and Yoo [55] N/A
Wu et al. [58] 1856 Mishra et al. [56] N/A
He and Wang [59] 3520 Wu et al. [58] N/A
Wazid et al. [61] 3232 He and Wang [59] N/A
Jangirala et al. [63] 1856 Wazid et al. [61] N/A
Katz et al. [66] 4480
Jangirala et al. [63] N/A
Yi et al. [68] 3840
JWX Protocol [65] 4480 Katz et al. [66] 384
Yang et al. [64] 3520 Yi et al. [68] 384
Yi et al. [67] 4000 JWX Protocol [65] 384
Jangirala et al. [17] 2496 Yang et al. [64] 384
Proposed scheme 2880 Yi et al. [67] 384
Jangirala et al. [17] N/A
Proposed scheme 416
TABLE 7: Summary of communication overhead analysis
TABLE 8: Summary of storage cost analysis

(fault-tolerant in terms of secret credentials distribution Note: Karla and Sood scheme considers 224 bits ECC but Kumari et al. and
and replication) and security (resists more well-known our proposed scheme adopt 160 bits ECC; N/A – the particular scheme does
not considers the storage cost during performance analysis.
attacks). The utilization of two separate application
instances say HCA and HSA divide the users domain
into two distinct categories, hence maintenance of both
client and service provider is easy. Further, HEAP- KDC’s server). In such a provision, it is very difficult to
KDC encloses a less number of verification parameters trace the on-going activities among end users’, HEAP-
(i.e., masked identity, pass-phrase and digital signature) KDC’s servers and service servers’.
inside CT and N ST , that decreases the verification cost
for Ci and N Sj (or JTj ), respectively X. CONCLUSION
8) The proposed protocol preserves both user and service In this paper, we proposed a new fault tolerant two-server au-
server (N S or JT ) identities from external as well as thentication and key agreement protocol (HEAP) for Hadoop
internal adversaries (malicious insider control HEAP- framework to access secure and privacy preserved Big Data
38 VOLUME 6, 2018
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

SFFs [46] [47] [60] [62] [57] [55] [56] [58] [59] [61] [63] [66] [68] [65] [64] [67] [17] HEAP
SF F1 7 3 3 3 3 7 3 3 3 3 3 N/A N/A N/A N/A N/A 3 3
SF F2 7 3 3 3 7 7 3 3 3 3 3 N/A N/A N/A N/A N/A 3 3
SF F3 7 3 3 3 3 3 3 7 3 3 3 3 3 3 3 3 N/A 3
SF F4 3 3 3 3 3 7 7 7 7 3 3 N/A N/A N/A N/A N/A 3 3
SF F5 3 3 3 3 3 3 3 3 7 3 3 N/A N/A N/A N/A N/A 3 3
SF F6 7 3 3 3 3 3 7 3 7 3 3 N/A N/A N/A N/A N/A 3 3
SF F7 7 3 3 3 7 3 3 3 3 3 3 3 3 3 3 3 3 3
SF F8 7 3 3 3 3 3 7 3 3 N/A N/A 7 7 7 7 7 3 3
SF F9 N/A N/A 3 3 3 7 3 3 7 N/A N/A N/A N/A N/A N/A N/A 7 3
SF F10 7 3 3 3 3 3 3 3 7 3 3 3 3 3 3 3 N/A 3
SF F11 7 7 3 3 3 3 3 7 7 3 3 7 7 7 7 7 N/A 3
SF F12 7 N/A N/A 3 3 3 7 3 N/A N/A 7 N/A N/A 7 N/A N/A N/A 3
SF F13 N/A 3 3 N/A N/A N/A N/A 7 N/A N/A N/A N/A N/A N/A N/A N/A N/A 3
SF F14 7 7 7 7 7 7 7 7 7 7 7 N/A N/A N/A N/A N/A 7 3
SF F15 7 7 7 7 7 7 7 7 7 7 7 3 3 3 3 3 7 3
SF F16 7 7 3 3 3 3 3 3 3 3 3 7 7 7 7 7 7 7
SF F17 3 3 3 7 N/A 3 N/A N/A 3 N/A N/A N/A N/A N/A N/A N/A N/A 3
SF F18 3 3 3 7 N/A N/A 3 N/A 3 3 3 N/A N/A N/A N/A N/A 3 3
SF F19 N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A 3
SF F20 N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A 3
SF F21 7 3 3 N/A N/A 7 N/A N/A N/A 3 N/A N/A N/A N/A N/A N/A 3 3
SF F22 7 7 7 3 7 7 7 3 7 3 3 3 3 7 7 3 7 3
SF F23 3 3 3 7 7 7 3 7 7 3 3 7 7 7 7 7 3 3
SF F24 7 7 3 7 3 3 N/A 3 3 N/A N/A N/A N/A N/A N/A N/A N/A 3
SF F25 7 7 3 7 7 7 3 7 7 3 3 N/A N/A N/A N/A N/A 3 3
SF F26 7 7 7 7 7 7 7 7 7 7 7 7 3 N/A N/A 3 7 3
SF F27 7 7 7 7 7 7 7 7 7 3 3 N/A 3 7 N/A 3 N/A 3
SF F28 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 3
SF F29 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 3

TABLE 9: Summary of comparison in terms of security and functionality features.

Note: SF F1 – Resists privileged-insider attacks; SF F2 – Provides user or device anonymity; SF F3 – Resists off-line password guessing attacks; SF F4
– Resists user impersonation attack; SF F5 – Resists replay attacks; SF F6 – Resists server impersonation attacks; SF F7 – Provides secure mutual
authentication; SF F8 – Provides forward secrecy; SF F9 – Resists known session-specific temporary information attacks; SF F10 – Provides session key
security; SF F11 – Provides freely password changing facility; SF F12 – Provides backward secrecy; SF F13 – Resists identity compromisation attacks;
SF F14 – Resists single point of failure issue; SF F15 – Resists single point of vulnerability issue; SF F16 – Extra hardware cost involved; SF F17 –
Resists stolen-verifier attacks; SF F18 – Resists man-in-the-middle attacks; SF F19 – Resists ciphertext-only attacks; SF F20 – Resists chosen plaintext
attacks; SF F21 – Provides service server anonymity; SF F22 – Provides formal verification using RoR model; SF F23 – Provides AVISPA-based protocol
verification; SF F24 – Resists server spoofing attacks; SF F25 – Resists denial of service attacks; SF F26 – Provides online updation of service server’s
secret credentials; SF F27 – Resists online dictionary attacks; SF F28 – Provides fault tolerant authentication framework; SF F29 – Provides single sign-on
facility;3– SFF is achieved; 7– SFF is not achieved; N/A – SFF is not analyzed in this scheme.

storage and processing services. To achieve this objective, security credentials of each principal in such a way that it
initially, Ci needs to login into its workstation using his makes the overall authentication system more dependable
identity and password. Then, the single sign-on mechanism and fault-tolerant.
vis-a-vis the session key formation task has been carried The rigorous security analysis of HEAP under de facto
out in which an end user Ci and CM S server establish a ROR simulation (formal analysis) and informal inspection
session secret key SKC,CM S between themselves with the shows that the proposed scheme is provably secure. More-
help of N M S server, followed by a mutual authentication over, the security of the proposed protocol (HEAP) is also
process. With this session secret key SKC,CM S , Ci can make verified utilizing the widely-used AVISPA protocol simulator
a service server (Big Data storage or Big Data processing tool. All these security analysis outputs shows that HEAP
service server) requests to CM S server through a secure is robust against active and passive adversary. In addition,
channel. In the next-level, CM S responses Ci with two the performance analysis evident that the proposed protocol
tokens (Ctoken and N Stoken or JTtoken ). Utilizing these two (HEAP) is effective in terms of computation, communication
tokens, both Ci and N Sj (or JTj ) are mutually authenticate and storage costs, and comparable with the existing state-of-
themselves and establish the session key for secure future art schemes.
communication. In the future, we plan to integrate HEAP with a real-
In this work, we proposed a new key distribution center world large scale cluster setting (or real-time Hadoop cluster)
(HEAP-KDC), where we can distribute and replicate the and try to re-calibrate it further to enhance the security and
VOLUME 6, 2018 39
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

performance with the real-world deployment. [15] Z. Shen, L. Li, F. Yan, X. Wu, “Cloud computing system based on
trusted computing platform,” Proceedings of International Conference on
Intelligent Computation Technology and Automation (ICICTA’10), vol. 1,
ACKNOWLEDGMENTS pp. 942–945, 2010.
The corresponding author would like to acknowledge Min- [16] G. S. Aujla, R. Chaudhary, N. Kumar, A. K. Das, J. J. Rodrigues,
istry of Human Resource Development, Government of India “Secsva: Secure storage, verification, and auditing of big data in the cloud
environment,” IEEE Communications Magazine, vol. 56, no. 1, pp. 78–85,
for providing financial support to carry out this research 2018.
work at Subir Chowdhury School of Quality and Relia- [17] J. Srinivas, A. K. Das, J. J. Rodrigues, “2PBDC: Privacy-Preserving Big-
bility, Indian Institute of Technology Kharagpur through data Collection in Cloud Environment,” Journal of Supercomputing,https:
//doi.org/10.1007/s11227-018-2605-1, pp. 1–30, 2018
Institute Fellowship category. This work was also partially [18] A. K. Das, “Analysis and improvement on an efficient biometric-based
supported by Finep/Funttel Grant No. 01.14.0231.00, un- remote user authentication scheme using smart cards,” IET Information
der the Radiocommunication Reference Center (Centro de Security, vol. 5, no. 3, pp. 145–151, 2011.
[19] S. Chatterjee, S. Roy, A. K. Das, S. Chattopadhyay, N. Kumar, A. V. Vasi-
Referência em Radicomunicações - CRR) project of the lakos, “Secure Biometric-Based Authentication Scheme using Chebyshev
National Institute of Telecommunications (Instituto Nacional Chaotic Map for Multi-Server Environment,” Transactions on Dependable
de Telecomunicações), Brazil, by National Funding from the and Secure Computing, vol. 15, no. 5, pp. 824–839, 2016.
[20] M. Wazid, A. K. Das, S. Kumari, X. Li, F. Wu, “Provably secure biometric-
FCT – Fundação para a Ciência e a Tecnologia through the based user authentication and key agreement scheme in cloud computing,”
UID/EEA/500008/2013 Project; and by Brazilian National Security and Communication Networks, vol. 9, no. 17, pp. 4103–4119,
Council for Research and Development (CNPq) via Grant 2016.
[21] V. Odelu, A. K. Das, S. Kumari, X. Huang, M. Wazid, “Provably secure
No. 309335/2017-5. We also thank the anonymous reviewers authenticated key agreement scheme for distributed mobile cloud comput-
for their valuable feedback on the paper which helped us to ing services,” Future Generation Computer Systems, vol. 68, pp. 74–88,
improve its quality and presentation. 2017.
[22] V. U. Srinivasan, R. Angal, A. Sondhi, “OAuth framework,” US Patent No.
8,935,757, Jan. 13, 2015.
REFERENCES [23] L. Kang, X. Zhang, “Identity-based authentication in cloud storage shar-
[1] K. Shvachko, H. Kuang, S. Radia, R. Chansler, “The hadoop distributed ing,” Proceedings of International Conference on Multimedia Information
file system,” Proceedings of twenty sixth symposium on Mass storage Networking and Security (MINES’10), pp. 851–855, 2010.
systems and technologies (MSST’10), pp. 1–10, 2010. [24] L. Zhu, B. Tung, “Public key cryptography for initial authentication in
[2] S. Ghemawat, H. Gobioff, S.-T. Leung, “The Google file system,” Proceed- Kerberos (PKINIT),” RFC-4556 (2006), https://tools.ietf.org/pdf/rfc4556.
ings of the nineteenth ACM symposium on Operating systems principles, pdf, Online accessed Feb. 10, 2017.
vol. 37, pp. 29–43, 2003. [25] G. Wettstein, J. Grosen, E. Rodriquez, “IDfusion, an open-architecture for
[3] O. Rodeh, A. Teperman, “zFS-a scalable distributed file system using Kerberos based authorization,” Proceedings of AFS and Kerberos Best
object disks,” Proceedings of Twentieth IEEE/Eleventh NASA Goddard Practices Workshop, Michigan, pp. 1–22, 2006.
Conference on Mass Storage Systems and Technologies (MSST’03), [26] J. Astorga, E. Jacob, M. Huarte, M. Higuero, “Ladon: end-to-end au-
pp. 207–218, 2003. thorisation support for resource-deprived environments,” IET Information
[4] S. A. Weil, S. A. Brandt, E. L. Miller, D. D. Long, C. Maltzahn, “Ceph: Security, vol. 6, no. 2, pp. 93–101, 2012.
A scalable, high-performance distributed file system,” Proceedings of the [27] N. Itoi, P. Honeyman, “Smartcard Integration with Kerberos V5,” Proceed-
Seventh symposium on Operating systems design and implementation, ings of USENIX Workshop on Smartcard Technology, pp. 1–12, 1999.
USENIX Association, pp. 307–320, 2006. [28] H.-Y. Chien, J.-K. Jan, Y.-M. Tseng, “An efficient and practical solution to
[5] O. O’Malley, “Integrating kerberos into apache hadoop,” Kerberos Con- remote authentication: smart card,” Computers & Security, vol. 21, no. 4,
ference, pp. 26–27, 2010. pp. 372–375, 2002.
[6] P. P. Sharma, C. P. Navdeti, “Securing big data hadoop: a review of security [29] X. Li, J. Niu, M. K. Khan, J. Liao, “An enhanced smart card based remote
issues, threats and solution,” Int. J. Comput. Sci. Inf. Technol., vol. 5, no .2, user password authentication scheme,” Journal of Network and Computer
pp. 2126–2131, 2014. Applications, vol. 36, no. 5, pp. 1365–1371, 2013.
[7] S. Jin, S. Yang, X. Zhu, H. Yin, “Design of a trusted file system based on [30] J. Kohl, C. Neuman, “The Kerberos network authentication service (V5),”
hadoop,” Proceedings of international conference on trustworthy comput- RFC 1510 (1993), https://tools.ietf.org/pdf/rfc1510.pdf, Online accessed
ing and services, pp. 673–680, 2012. Feb. 10, 2017.
[8] H. Zhou, Q. Wen, “A new solution of data security accessing for Hadoop [31] I. Downnard, “Public-key cryptography extensions into Kerberos,” IEEE
based on CP-ABE,” Proceedings of Fifth international conference on soft- Potentials, vol. 21, no. 5, pp. 30–34, 2002.
ware engineering and service science (ICSESS’14), pp. 525–528, 2014. [32] N. Sakimura, J. Bradley, M. Jones, B. de Medeiros, C. Mortimore,
[9] F. A. H. Jing, S. B. L. Renfa, T. C. T. Zhuo, “The research of the data “OpenID Connect Core 1.0 incorporating errata set 1,” The OpenID
security for Cloud disk based on the Hadoop framework,” Proceedings Foundation, specification, 2014.
of Fourth international conference on intelligent control and information [33] G. S. Sadasivam, K. A. Kumari, S. Rubika, “A novel authentication
processing (ICICIP’13), pp. 293–298, 2013. service for hadoop in cloud environment,” Proceedings of International
[10] D. Chattaraj, M. Sarma, A. K. Das, “A new two-server authentication and Conference on Cloud Computing in Emerging Markets (CCEM’12), pp.
key agreement protocol for accessing secure cloud services,” Computer 1–6, 2012.
Networks, vol. 131, pp. 144–164, 2018. [34] S. M. Bellovin, M. Merritt, “Limitations of the Kerberos authentication
[11] G. S. Sadasivam, K. A. Kumari, S. Rubika, “A novel authentication service system,” ACM SIGCOMM Computer Communication Review, vol. 20,
for Hadoop in cloud environment,” Proceedings of International Confer- no. 5, pp. 119–132, 1990.
ence on Cloud Computing in Emerging Markets (CCEM’12), pp. 1–6, [35] Z. Xu, K. M. Martin, “A Practical Deployment Framework for Use of
2012. Attribute-Based Encryption in Data Protection,” Proceedings of tenth
[12] P. Rahul, T. GireeshKumar, “A novel authentication framework for International Conference on High Performance Computing and Commu-
Hadoop,” Proceedings of Artificial Intelligence and Evolutionary Algo- nications, pp. 1593–1598, 2013.
rithms in Engineering Systems, pp. 333–340, 2015. [36] J. C. Cohen, S. Acharya, “Incorporating hardware trust mechanisms in
[13] M. Sarvabhatla, M. C. M. Reddy, C. S. Vorugunti, “A Secure and Light Apache Hadoop: To improve the integrity and confidentiality of data in a
Weight Authentication Service in Hadoop using One Time Pad,” Procedia distributed Apache Hadoop file system: An information technology infras-
Computer Science, vol. 50, pp. 81–86, 2015. tructure and software approach,” Proceedings of Globecom Workshops,
[14] N. Somu, A. Gangaa, V. S. Sriram, “Authentication service in hadoop pp. 769–774, 2012.
using one time pad,” Indian Journal of Science and Technology, vol. 7, [37] B. R. Chang, H. F. Tsai, Z.-Y. Lin, C.-M. Chen, “Access security on cloud
no. 4, pp. 56–62, 2014. computing implemented in hadoop system,” Proceedings of Fifth Interna-

40 VOLUME 6, 2018
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

tional Conference on Genetic and Evolutionary Computing (ICGEC’11), [59] D. He, D. Wang, “Robust biometrics-based authentication scheme for
pp. 77–80, 2011. multiserver environment,” IEEE Systems, vol. 9, no. 3, pp. 816–823, 2015.
[38] K. Hwang, D. Li, “Trusted cloud computing with secure resources and data [60] V. Odelu, A. K. Das, A. Goswami, “A secure biometrics-based multi-
coloring,” Internet Computing, vol. 14, no. 5, pp. 14–22, 2010. server authentication protocol using smart cards,” Transactions on Infor-
[39] W. B. Dai Yuefa, G. Yaqiang, Z. Quan, T. Chaojing, “Data security mation Forensics and Security, vol. 10, no. 9, pp. 1953–1966, 2015.
model for cloud computing,” Proceedings of International Workshop on [61] M. Wazid, A. K. Das, V. Odelu, N. Kumar, W. Susilo, “Secure remote
Information Security and Application (IWISA’09), pp. 141–144, 2009. user authenticated key establishment protocol for smart home environ-
[40] V. N. Inukollu, S. Arsi, S. R. Ravuri, “Security issues associated with big ment,” Transactions on Dependable and Secure Computing, 2017, DOI:
data in cloud computing,” International Journal of Network Security & Its 10.1109/TDSC.2017.2764083.
Applications, vol. 6, no. 3, pp. 45, 2014. [62] S. Kumari, X. Li, F. Wu, A. K. Das, K.-K. R. Choo, J. Shen, “Design
[41] M. R. Jam, L. M. Khanli, M. S. Javan, M. K. Akbari, “A survey on security of a provably secure biometrics-based multi-cloud-server authentication
of Hadoop,” Proceedings of Fourth International Conference on Computer scheme,” Future Generation Computer Systems, vol. 68, pp. 320–330,
and Knowledge Engineering (ICCKE’14), pp. 716–721, 2014. 2017.
[42] B. Campbell, C. Mortimore, M. Jones, “Security Assertion Markup [63] J. Srinivas, A. K. Das, M. Wazid, N. Kumar, “Anonymous lightweight
Language (SAML) 2.0 Profile for OAuth 2.0 Client Authentication and chaotic map-based authenticated key agreement protocol for industrial
Authorization Grants,” RFC No. 7522 (2015), Online accessed Feb. 10, internet of things,” Transactions on Dependable and Secure Computing,
2017. 2018, DOI: 10.1109/TDSC.2018.2857811.
[43] Y. Kirsal, O. Gemikonakli, “Further improvements to the kerberos timed [64] Y. Yang, R. H. Deng, F. Bao, “A practical password-based two-server
authentication protocol,” Proceedings of Novel Algorithms and Tech- authentication and key exchange system,” Transactions on Dependable and
niques in Telecommunications, Automation and Industrial Electronics, pp. Secure Computing, vol. 3, no. 2, pp. 105–114, 2006.
550–554, 2008. [65] H. Jin, D. S. Wong, Y. Xu, “An efficient password-only two-server authen-
[44] N. Abdelmajid, M. A. Hossain, S. Shepherd, K. Mahmoud, “Location- ticated key exchange system,” Proceedings of International Conference on
based kerberos authentication protocol,” Proceedings of Second Interna- Information and Communications Security, pp. 44–56, 2007.
tional Conference on Social Computing (SocialCom’10), pp. 1099–1104, [66] J. Katz, P. MacKenzie, G. Taban, V. Gligor, “Two-server password-only
2010. authenticated key exchange,” Computer and System Sciences, vol. 78,
[45] A. Joux, “The weil and tate pairings as building blocks for public key no. 2, pp. 651–669, 2012.
cryptosystems,” Proceedings of International Algorithmic Number Theory [67] X. Yi, S. Ling, H. Wang, “Efficient two-server password-only authen-
Symposium, pp. 20–32, 2002. ticated key exchange,” Transactions on Parallel & Distributed Systems,
[46] S. Kalra, S. K. Sood, “Secure authentication scheme for iot and cloud vol. 24, no. 9, pp. 1773–1782, 2013.
servers,” Pervasive and Mobile Computing, vol.24, pp. 210–223, 2015. [68] X. Yi, F. Hao, E. Bertino, “Id-based two-server password-authenticated
[47] S. Kumari, M. Karuppiah, A. K. Das, X. Li, F. Wu, N. Kumar, “A secure key exchange,” Proceedings of European Symposium on Research in
authentication scheme based on elliptic curve cryptography for iot and Computer Security, pp. 257–276, 2014.
cloud servers,” Journal of Supercomputing, 2017, DOI: 10.1007/s11227- [69] P. Sarkar, “A simple and generic construction of authenticated encryption
017-2048-0. with associated data,” Transactions on Information and System Security,
[48] J. H. Yang, P. Y. Lin, “An ID-Based User Authentication Scheme for vol. 13, no. 4, pp. 33, 2010.
Cloud Computing,” Proceedings of Tenth International Conference on [70] Secure Hash Standard, FIPS PUB 180-1, National Institute of Standards
Intelligent Information Hiding and Multimedia Signal Processing (IIH- and Technology (NIST), U.S. Department of Commerce, April 1995,
MSP’14), pp. 98–101, 2014. http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf. Accessed on
[49] T. H. Chen, H. L. Yeh, W. K. Shih, “An Advanced ECC Dynamic ID- September 2017.
Based Remote Mutual Authentication Scheme for Cloud Computing,” [71] R. W. D. Nickalls, “A new approach to solving the cubic: Cardan’s solution
Proceedings of fifth FTRA International Conference on Multimedia and revealed,” The Mathematical Gazette, vol. 77, no. 480, pp. 354–359, 1993.
Ubiquitous Engineering (MUE’11), pp. 155–159, 2011. [72] N. Koblitz, “Elliptic Curves Cryptosystems,” Mathematics of Computa-
[50] D. Wang, Y. Mei, C. G. Ma, Z. S. Cui, “Comments on an Advanced tion, vol. 48, no. 177, pp. 203–209, 1987.
Dynamic ID-Based Authentication Scheme for Cloud Computing,” Pro- [73] D. Dolev, A. Yao, “On the security of public key protocols,” Transactions
ceedings of International Conference on Web Information Systems and on information theory, vol. 29, no. 2, pp. 198–208, 1983.
Mining, pp. 246–253, 2012. [74] R. Canetti, H. Krawczyk, “Analysis of key-exchange protocols and their
[51] Z. Hao, S. Zhong, N. Yu, “A time-bound ticket-based mutual authenti- use for building secure channels,” Proceedings of International Conference
cation scheme for cloud computing,” International Journal of Computers, on the Theory and Applications of Cryptographic Techniques, pp. 453–
Communications and Control, vol. 6, no. 2, pp. 227–235, 2011. 474, 2001.
[52] C. D. Jaidhar, “Enhanced mutual authentication scheme for cloud architec- [75] C. J. Cremers, “Formally and Practically Relating the CK, CK-HMQV, and
ture,” Proceedings of Third International Advance Computing Conference eCK Security Models for Authenticated Key Exchange,” iACR Cryptology
(IACC’13), pp. 70–75, 2013. ePrint Archive, 2009.
[53] P. Gope, A. K. Das, “Robust Anonymous Mutual Authentication Scheme [76] J. Dean, S. Ghemawat, “Mapreduce: simplified data processing on large
for n-times Ubiquitous Mobile Cloud Computing Services,” Internet of clusters,” Communications of the ACM, vol. 51, no. 1, pp. 107–113, 2008.
Things Journal, vol. 4, no. 5, pp. 1764–1772, 2017. [77] M. Abdalla, P. Fouque, D. Pointcheval, “Password-based authenticated key
[54] J. L. Tsai, N. W. Lo, “A Privacy-Aware Authentication Scheme for Dis- exchange in the three-party setting,” Proceedings of Eighth International
tributed Mobile Cloud Computing Services,” IEEE Systems, vol. 9, no. 3, Workshop on Theory and Practice in Public Key Cryptography (PKC’05),
pp. 805–815, 2015. LNCS, vol. 3386, pp. 65–84, 2005.
[55] E.-J. Yoon, K.-Y. Yoo, “Robust biometrics-based multi-server authen- [78] C. C. Chang, H. D. Le, “A Provably Secure, Efficient and Flexible Authen-
tication with key agreement scheme for smart cards on elliptic curve tication Scheme for Ad hoc Wireless Sensor Networks,” Transactions on
cryptosystem,” Journal of supercomputing, vol. 63, no. 1, pp. 235–255, Wireless Communications, vol. 15, no. 1, pp. 357–366, 2016.
2013. [79] A. Armando, D. Basin, Y. Boichut, Y. Chevalier, L. Compagna, J. Cuéllar,
[56] D. Mishra, A. K. Das, S. Mukhopadhyay, “A secure user anonymity- P. H. Drielsma, P.-C. Héam, O. Kouchnarenko, J. Mantovani, et al., “The
preserving biometric-based multi-server authenticated key agreement AVISPA tool for the automated validation of internet security protocols
scheme using smart cards,” Expert Systems with Applications, vol. 41, and applications,” Proceedings of International Conference on Computer
no. 18, pp. 8129–8143, 2014. Aided Verification, pp. 281–285, 2005.
[57] H. Shen, C. Gao, D. He, L. Wu, “New biometrics-based authentication [80] L. Viganò, “Automated security protocol analysis with the AVISPA tool,”
scheme for multi-server environment in critical systems,” Journal of Am- Electronic Notes in Theoretical Computer Science, vol. 155, pp. 61–86,
bient Intelligence and Humanized Computing, vol. 6, no. 6, pp. 825–834, 2006.
2015. [81] D. von Oheimb, “The high-level protocol specification language HLPSL
[58] F. Wu, L. Xu, S. Kumari, X. Li, “A novel and provably secure biometrics- developed in the EU project AVISPA,” Proceedings of APPSEM’05 work-
based three-factor remote authentication scheme for mobile client–server shop, pp. 1–17, 2005.
networks,” Computers & Electrical Engineering, vol. 45, pp. 274–285,
2015.

VOLUME 6, 2018 41
D. Chattaraj et al.: HEAP: Efficient and Fault-tolerant Authentication and Key Exchange Protocol for Hadoop-assisted Big Data Platform

DURBADAL CHATTARAJ (S’17) received his Neeraj Kumar (M’16, SM’17) received the Ph.D.
B.Tech. and M.Tech degree in Computer Science degree in computer science and engineering from
and Engineering from West Bengal University of Shri Mata Vaishno Devi University, Katra (J&K),
Technology. He is currently pursuing his Ph.D. India, in 2009. He was a Post-Doctoral Research
degree from Subir Chowdhury School of Quality Fellow at Coventry University, Coventry, U.K. He
and Reliability, Indian Institute of Technology, is currently an Associate Professor with the De-
Kharagpur, India. His current research interests partment of Computer Science and Engineering,
include Cloud Data Security and Reliability, Cryp- Thapar University, Patiala, India. He has authored
tography, Cloud Computing, Big Data Analytics, more than 200 technical research papers pub-
Information Security and Dependable Computing. lished in leading journals and conferences from
He has authored 7 papers in international journals and conferences in the the IEEE, Elsevier, Springer, John Wiley, etc. Some of his research findings
above areas. are published in top cited journals such as the IEEE Transactions on Indus-
trial Electronics, IEEE Transactions on Dependable and Secure Computing,
IEEE Transactions on Intelligent Transportation Systems, IEEE Transac-
tions on Consumer Electronics, IEEE Network, IEEE Communications,
IEEE Wireless Communications, IEEE Internet of Things Journal and IEEE
Systems Journal. He has guided many research scholars leading to Ph.D.
MONALISA SARMA received her Ph.D. (2008) and M.E./M.Tech. He is in the editorial board of IEEE Communications
and MS (2003) in Computer Science and En-
Magazine, Journal of Network and Computer Applications (Elsevier) and
gineering from Indian Institute of Technology,
International Journal of Communication Systems (Wiley).
Kharagpur, and B.Tech degree in Computer Sci-
ence and Engineering (1994) from North East-
ern Regional Institute of Science and Technology,
Itanagar. She is currently an Assistant Profes-
sor with the Subir Chowdhury School of Qual-
ity and Reliability, Indian Institute of Technol-
ogy, Kharagpur, India. She has about 5 years of Joel J. P. C. Rodrigues [S’01, M’06, SM’06]
teaching experience and four years industrial experience (two years in Oil is professor at the National Institute of Telecom-
India Limited and two years in Siemens Corporate Technology, India). Her munications (Inatel), Brazil and senior researcher
areas of interest and research include Biometric Security, Cloud Computing at the Instituto de Telecomunicações, Portugal. He
Security and Dependability Analysis. has been professor at UBI, Portugal and visiting
professor at UNIFOR. He received the Academic
Title of Aggregated Professor in informatics en-
gineering from UBI, the Habilitation in computer
science and engineering from the University of
ASHOK KUMAR DAS (M’17–SM’18) received Haute Alsace, France, a PhD degree in informatics
a Ph.D. degree in computer science and engineer- engineering and an MSc degree from the UBI, and a five-year BSc degree
ing, an M.Tech. degree in computer science and (licentiate) in informatics engineering from the University of Coimbra,
data processing, and an M.Sc. degree in mathemat- Portugal. He is the leader of NetGNA Research Group, the President of the
ics from IIT Kharagpur, India. He is currently an scientific council at ParkUrbis âĂŞ Covilhã Science and Technology Park,
Associate Professor with the Center for Security, the Past-Chair of the IEEE ComSoc TCs on eHealth and on Communications
Theory and Algorithmic Research, International Software, Steering Committee member of the IEEE Life Sciences Technical
Institute of Information Technology, Hyderabad, Community. He is the Editor-in-Chief of the International Journal on E-
India. His current research interests include cryp- Health and Medical Communications, and an editorial board member of
tography, wireless sensor network security, hier- several journals. He has authored or coauthored over 650 papers in refereed
archical access control, security in vehicular ad hoc networks, smart grid, international journals and conferences, 3 books, and 2 patents. He is member
Internet of Things (IoT), Cyber-Physical Systems (CPS) and cloud com- of the Internet Society, an IARIA fellow, and a senior member ACM and
puting, and remote user authentication. He has authored over 175 papers IEEE.
in international journals and conferences in the above areas, including over
150 reputed journal papers. Some of his research findings are published in
top cited journals, such as the IEEE Transactions on Information Forensics
and Security, IEEE Transactions on Dependable and Secure Computing,
IEEE Transactions on Smart Grid, IEEE Internet of Things Journal, IEEE
Transactions on Industrial Informatics, IEEE Transactions on Vehicular YOUNGHO PARK (M’17) received his BS, MS,
Technology, IEEE Transactions on Consumer Electronics, IEEE Journal of and Ph. D degrees in electronic engineering,
Biomedical and Health Informatics (formerly IEEE Transactions on Infor- Kyungpook National University, Daegu, Korea in
mation Technology in Biomedicine), IEEE Consumer Electronics Magazine, 1989,1991, and 1995, respectively. He is currently
IEEE Access, IEEE Communications Magazine, Future Generation Com- a professor at the School of Electronics Engi-
puter Systems, Computers & Electrical Engineering, Computer Methods neering, Kyungpook National University. In 1996-
and Programs in Biomedicine, Computer Standards & Interfaces, Computer 2008, he was a professor at the School of Electron-
Networks, Expert Systems with Applications, and Journal of Network and ics and Electrical Engineering, Sangju National
Computer Applications. He was a recipient of the Institute Silver Medal from University, Korea. In 2003-2004, he was a visiting
IIT Kharagpur. He is on the editorial board of KSII Transactions on Internet scholar at the School of Electrical Engineering and
and Information Systems, International Journal of Internet Technology and Computer Science, Oregon State University, USA. His research interests
Secured Transactions (Inderscience), and Recent Advances in Communi- include computer networks, multimedia, and information security.
cations and Networking Technology, is a Guest Editor for Computers &
Electrical Engineering (Elsevier) for the special issue on Big data and IoT
in e-healthcare, and has served as a Program Committee Member in many
international conferences.

42 VOLUME 6, 2018

View publication stats

You might also like