
2019 International Conference on Automation, Computational and Technology Management (ICACTM)

Amity University

Enhancing Big Data Security using Elliptic Curve Cryptography
Shubhi Gupta (sgupta1@gn.amity.edu)
Swati Vashisht (svashisht@gn.amity.edu)
Divya Singh (dsingh@gn.amity.edu)
Pradeep Kushwaha (pkkushwaha@gn.amity.edu)
Department of CSE
Amity University, Greater Noida
Uttar Pradesh, India

Abstract--With advancing technology, the volume of data is increasing daily, and so is the daunting task of managing it. The present solutions to this problem, i.e., our present databases, are not long-term solutions. These data volumes need to be stored and retrieved safely. This paper presents an overview of security issues for big data. Big Data encompasses data configuration, distribution, and analysis that overcome the drawbacks of traditional data processing technology. Big data manages, stores, and acquires data in a speedy and cost-effective manner with the help of tools, technologies, and frameworks.

Keywords - Big data, MapReduce, Hadoop, security and privacy, big data analytics.

I. INTRODUCTION
Big data refers to large data sets that cannot be analyzed and managed by traditional processing systems. In big data, data sets grow to a size, scale, and rate of growth that traditional IT systems can no longer handle. Managing the data and garnering value from it is difficult. The primary difficulties are the acquisition, storage, searching, sharing, analytics, and visualization of data. As data sets evolve, the processes involved in leveraging the data evolve as well. Big data is often treated as synonymous with business intelligence, analytics, and data mining. The difference between big data and business analytics is that Big Data is about inductive statistics, whereas business analytics is about descriptive statistics.

Big Data did not emerge only recently, but an enormous amount of data has been recorded in the last two years alone. Big Data has its roots in science and medicine, where large and complex data sets have been studied for drug development, physics modelling, and other forms of research. From these roots, Big Data is now evolving into different fields.

Value extraction from data sets is easier than before. Still, Big Data is full of challenges, ranging from the technical to the conceptual to the operational, any of which can derail the ability to discover value and leverage what Big Data is all about.

II. LITERATURE SURVEY
As Big data is multi-dimensional, its four primary aspects are:

1. Volume.
There is only one descriptive word for big data when it comes to its size: large. Organizations can hold terabytes and even petabytes of information.

2. Variety.
Big data includes not only structured data but unstructured data as well. It could be anything from text, audio, and video to click streams, log files, and more.

3. Veracity.
The huge amounts of data available for processing are prone to statistical errors and misinterpretation of the collected information. Purity plays a critical role.

4. Velocity.
As the data is sensitive, the perishability of the data is an important concern. To retain its value, data should be streamed as it becomes available and also archived.

These 4Vs of Big Data play a big role in laying out the path to analytics, with each having essential value in the process of unearthing value. However, the 4Vs are not the only reason Big data is as complex as it is; another aspect is the processes that Big Data drives.

Tools:

A. Hadoop
Hadoop is a tool that has been used to deal with Big data for quite some time, but now more and more kinds of businesses are leveraging and exploring its capabilities. The Hadoop platform caters to large structured and unstructured data sets in big data processing that do not fit into tables. It enables clustering and targeting, and it supports analytics that are deep and computationally intensive.

Hadoop helps in managing the overheads associated with large data sets. In operation, when an organization's data are loaded into a Hadoop software platform, they are broken down into manageable pieces and then automatically distributed to different servers. As a result, there is no single place to go to access the data; the address where the data reside is tracked, and multiple copies of the data are created for safety. This enhances resiliency: if one server goes missing in operation due to

978-1-5386-8010-0/19/$31.00 ©2019 IEEE

some reason, that data can be replicated automatically from a known copy.

The Hadoop paradigm goes beyond only working with data. For example, a traditional centralized database system is limited to a large disk drive connected to a server-class system featuring multiple processors. In this case, analytics is limited by the performance of the disk and by the number of processors that can be brought to bear. With Hadoop clusters, every server participates in the processing of the data, because the work and the data are spread across the cluster. Each server in the cluster is assigned jobs and then operates on its own portion of the data independently. The results from each server are unified and then delivered as a whole. In Hadoop terminology this process is called MapReduce: the processing code is mapped to all the servers in the cluster, and the results are reduced to a single set. With this feature of Hadoop, complex computational questions can be handled by harnessing all of the available cluster processors to work in parallel.

B. Map Reduce
MapReduce is a programming model or software framework used in Apache Hadoop. Hadoop MapReduce provides a scalable, reliable, and fault-tolerant model in which large data sets are processed and analyzed in parallel on large multinode clusters of commodity hardware. Analysis and processing of data is done in two different steps: the Map phase and the Reduce phase. A MapReduce job first breaks and divides the input data into chunks, which are processed first by the Map phase and then by the Reduce phase. The sorted output of the Map phase becomes, with the help of Hadoop, the input to the Reduce phase, which performs the reduction in parallel. The file system stores these files. By default, the MapReduce framework gets its input data sets from HDFS. The two tasks need not be strictly sequential: as soon as the map activity for an assigned set is complete, the reduce activity can follow. There is no requirement that all map activities be completed before any reduce activity happens. Both mapping and reducing work on key-value pairs. The input data set is taken as key-value pairs, and the output is also generated in the form of key-value pairs. The output of the Map phase is called the intermediate result, which becomes the input to the Reduce phase.

C. Cryptography & Public key Cryptography
Cryptography is a branch of applied mathematics that aims to secure messages of any kind by enciphering them. Cryptographic algorithms use encryption keys, which are the elements that turn a general encryption algorithm into a specific method of encryption. Data integrity aims to verify the validity of the data contained in a given document [14]. A public key cryptosystem is an asymmetric cryptosystem in which the key consists of a public key and a private key. The public key, known to all, can be used to encrypt messages. Only a person who has the corresponding private key can decrypt the message. The aim of this study is to analyze the performance and security of different public key cryptosystems over fraud-prone networks for various applications such as image transmission, secure communication, e-messaging, and large data transmission [7].

III. PROBLEM STATEMENT
Data today are subject to security standards governed by compliance laws and regulations. The data could be financial, medical, or government intelligence, or an analytics set that needs protection. These may be the same data that IT managers already deal with, but Big Data analytics intermingles the data and cross-indexes it, which creates the need to secure it. IT managers should look for security solutions for the data stored in arrays used for Big Data analysis. The data should also be placed under access authorization checks.

Privacy
Privacy and security concerns about data gathered from enterprises are not new. However, the concept of Big data has brought some benefit in this regard. Network personnel rely on perimeter-based security mechanisms such as firewalls, but such measures cannot prevent unauthorized access to data once an intruder has entered the network.

Challenges
Big data has been in the IT market for some time now, but problems are still faced in assembling the data and then analyzing it. Companies store different types of data in different formats. Compiling and regularizing the data and removing irregularities without losing the information and its value is daunting and challenging.

Releasing information without authorization checks, changes to information, and denial of service are examples of security breaches. Security can be achieved by proper authentication, preventing unauthorized access, encryption, and audit trails. Some of the techniques used are:

• Authentication method
• File encryption method
• Access control
• Key management
• Logging method
• Secure communication method

IV. SOLUTION
Algorithms in action:
One of the first published public-key algorithms was Diffie-Hellman. Its security rests on the fact that computing discrete logarithms has never been easy, whereas computing exponents is. In Diffie-Hellman, the sender and receiver jointly derive a shared secret key over an insecure channel. They exchange some public information used to compute the key, yet recovering the key from this information alone remains difficult.

The RSA public key cryptosystem came shortly after Diffie-Hellman and is one of the oldest and most studied public key cryptosystems. It was the first able to sign as well


as encrypt. It works well with long keys and is widely accepted in e-commerce applications.

Elliptic Curve Cryptography (ECC), proposed by Neal Koblitz and Victor Miller, has been in use for security purposes such as key exchange and digital signatures. It operates on points (coordinates) of an elliptic curve, and a comparable level of security can be achieved with shorter keys.

NTRU works on the algebraic structures of polynomial rings. The hard problem underlying the algorithm is finding a short vector in a given lattice. It reduces polynomials with respect to two different moduli. It runs in less time than RSA, ECC, or other public key systems. As the computations are very simple, devices with restricted resources can also use it.

TABLE I. KEY SIZE RATIO OF ALL CRYPTOSYSTEMS

Diffie-Hellman   RSA        NTRU       ECC        Key size ratio
(bits)           (bits)     (bits)     (bits)     (DH:RSA:NTRU:ECC)
1024             1024       256        163        6:6:2:1
2048             2048       512        224        9:9:2:1
3072             3072       768        256        12:12:3:1
7680             7680       1920       384        20:20:5:1
15360            15360      3840       512        30:30:8:1

According to Table I, ECC keys are between one sixth and one thirtieth the size of Diffie-Hellman and RSA keys, and smaller than NTRU keys as well, which makes ECC the better choice in terms of key size.

V. PROPOSED ECC METHODOLOGY

We propose secure cloud storage for big data using the ECC algorithm. In the implementation, the big data set is divided into sequential data parts, based on blocks of the same data type or on IP-like (Internet Protocol) data packets, and the parts are named alphanumerically.

ECC Cryptographic System [6]:
In this type of public key cryptography, the user or the communicating device should have a pair of keys: a public key and a private key. To carry out the encryption and decryption process, a set of operations is performed on these keys. The underlying mathematical operation is defined over the elliptic curve $y^2 \equiv x^3 + ax + b \pmod{p}$ such that $4a^3 + 27b^2 \not\equiv 0 \pmod{p}$, where p is a large prime number and a and b are the coefficients that determine the elliptic curve points (x, y).

Operation:
1. Take a large prime number p and values for the coefficients a and b such that $4a^3 + 27b^2 \not\equiv 0 \pmod{p}$.
2. Consider the equation $y^2 \equiv x^3 + ax + b \pmod{p}$.
3. Take all values of y from 0 to p-1 and calculate $y^2 \bmod p$.
4. Take all values of x from 0 to p-1 and calculate $(x^3 + ax + b) \bmod p$.
5. Collect the values of y from step 3 that correspond to the values computed in step 4.
6. Collect all points (x, y) from step 5.
7. Input a chosen value of G, known here as the base point, which belongs to the points from step 6.
8. Calculate 2G, 3G, ... such that 2G = G + G, 3G = 2G + G, and so on, until a value iG is found (with i the least such positive integer) whose x-coordinate equals the x-coordinate of G and whose y-coordinate equals p minus the y-coordinate of G, i.e., iG = -G. From this, the order of G, called n, is computed as i + 1.
For instance, if p = 7 and G = (1, 3), the values 2G, 3G, etc. are calculated until 3G comes out to be (1, 4). Then the order n is 4.
This addition is done as follows. If point R = point P + point Q, the new point $(x_R, y_R)$ is:
$x_R = (\lambda^2 - x_P - x_Q) \bmod p$
$y_R = (\lambda (x_P - x_R) - y_P) \bmod p$
where
$\lambda = (y_Q - y_P) / (x_Q - x_P) \bmod p$ if $P \neq Q$, and
$\lambda = (3x_P^2 + a) / (2y_P) \bmod p$ if $P = Q$.
Division here means multiplication by the modular inverse modulo p.
9. Computation of the sender's (say, A) public key:
Choose a large integer $n_A$ that lies between 1 and n.
Compute $P_A = n_A G$.
10. Computation of the receiver's (say, B) public key:
Choose a large integer $n_B$ that lies between 1 and n.
Compute $P_B = n_B G$.
11. Encryption:
A encrypts the message with B's public key.
Let the plaintext $P_m$ belong to the point set computed in step 6.
Let k be a random integer that lies between 1 and n.
Compute the pair $(kG, P_m + kP_B)$.
12. Decryption:
Compute $n_B (kG)$.
Compute $(P_m + kP_B) - n_B (kG)$ to get $P_m$.

Fig. 1. An Elliptic curve.

The implementation applies ECC to a data string, which is an alphanumeric sequence. The encryption and decryption of a particular character is done by using its ASCII value.

For instance: a = 0,
b = -4,
Base point, G = (31, 6),
Sender's private key = 25,
Receiver's private key = 35,
Random key = 67.

This created over 200 different coordinates of an elliptic curve. The plaintext then becomes the coordinate stored at the index given by the corresponding ASCII value.
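
The operation steps and the worked example above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The prime p is not given in the text, so p = 211 is assumed here because the base point G = (31, 6) satisfies y^2 = x^3 - 4 modulo 211. The ordering of the point table and all function names (curve_points, neg, add, mul, encrypt, decrypt) are hypothetical, and the page-number and tilde encoding of the ciphertext is omitted.

```python
# Toy ECC sketch (requires Python 3.8+ for pow(x, -1, p)).
# a, b, G and the three keys come from the worked example in the text.
# ASSUMPTION: p = 211 is not stated in the text; it is chosen because
# G = (31, 6) satisfies y^2 = x^3 - 4 (mod 211).
P_MOD, A, B = 211, 0, -4
G = (31, 6)
N_A, N_B, K = 25, 35, 67  # sender key, receiver key, random key


def curve_points(p=P_MOD, a=A, b=B):
    """Steps 3-6: all affine points (x, y) with y^2 = x^3 + ax + b (mod p)."""
    return [(x, y) for x in range(p) for y in range(p)
            if (y * y - (x ** 3 + a * x + b)) % p == 0]


def neg(P, p=P_MOD):
    """Inverse of a point: (x, p - y); None is the point at infinity."""
    return None if P is None else (P[0], (-P[1]) % p)


def add(P, Q, p=P_MOD, a=A):
    """Step 8 addition rule, extended with the point at infinity (None)."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None  # P + (-P) is the point at infinity
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow((2 * y1) % p, -1, p) % p
    else:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)


def mul(n, P):
    """Scalar multiple nP by double-and-add."""
    R = None
    while n > 0:
        if n & 1:
            R = add(R, P)
        P = add(P, P)
        n >>= 1
    return R


def encrypt(Pm, PB, k=K):
    """Step 11: ciphertext pair (kG, Pm + k*PB)."""
    return mul(k, G), add(Pm, mul(k, PB))


def decrypt(cipher, nB):
    """Step 12: Pm = (Pm + k*PB) - nB*(kG)."""
    kG, masked = cipher
    return add(masked, neg(mul(nB, kG)))


points = curve_points()            # hypothetical lookup table, ordered by x then y
PA, PB = mul(N_A, G), mul(N_B, G)  # steps 9 and 10: public keys

text = "It works!!"
# Each character is mapped to the point stored at its ASCII index.
cipher = [encrypt(points[ord(ch)], PB) for ch in text]
plain = "".join(chr(points.index(decrypt(c, N_B))) for c in cipher)
print(plain)
```

Reusing one random key k for every character, as in the worked example, keeps the sketch short; in practice a fresh k would be drawn for each message, since a repeated k leaks the difference between plaintext points.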


• Encryption: A character is picked as plaintext. Its ASCII value is stored in an integer variable. The point on the elliptic curve corresponding to this integer is selected from the database. This point is then encrypted. The resulting point is mapped back to the database, where it corresponds to a new integer value. The new integer is then converted to a corresponding character that consists of two parts: a printable ASCII character, which acts as an index, and the page number to which that index belongs.

• Decryption: The encrypted character and the corresponding page number are selected, and the integer is calculated back from them. Reverse mapping converts the integer to a point, and the point is decrypted. The decrypted point is again looked up in the database to obtain an integer. The corresponding character is the plaintext character.

• Printable ASCII characters range from 32 to 126 only. If the encrypted character falls outside this range, an additional calculation is done: a tilde (~) is sent and the ASCII value is incremented by 32 so that it can be sent as a printable character. On the decryption side, the reverse calculation is done when a tilde is detected.

Plain Text: It works!!
Encrypted String: /~-~qI+RA##
Decrypted String: It works!!

VI. RESULT & CONCLUSION
The techniques of data handling in Big data were walked through. Security of this data is an important factor to look into. A survey of several cryptographic techniques that can be used to secure data analytics was presented. Further research would result in more practical solutions to secure big data sets. The cost issues in terms of time and money need to be addressed. The ECC algorithm addresses them efficiently with a relatively small key size, which makes it easy to implement with less complexity. Data access becomes secure. Implementation and maintenance become fairly easy. Reliability and scalability increase. Levels of service are guaranteed. The efficiency and security of the proposed scheme were assessed theoretically and by comparison with its peer algorithms. ECC is effective and feasible for protecting big data for cloud tenants.

REFERENCES
[1] Sangita Bansal, Dr. Ajay Rana, Department of Computer Science and Engineering, Amity University, Noida (U. P.), India, "Transitioning from relational databases to big data", International Journal of Advanced Research in Computer Science and Software Engineering, Volume 4, Issue 1, January 2014.
[2] Raghav Toshniwal, Kanishka Ghosh Dastidar, Ashok Nath, Department of Computer Science, St. Xavier's College (Autonomous), Kolkata, India, "Big Data Security issue and challenge", International Journal of Innovative Research in Advanced Engineering (IJIRAE), ISSN: 2349-2163, Issue 2, Volume 2 (February 2015).
[3] Venkata Narasimha Inukolu, Sailaja Arsi and Srinivassa Rao Ravuri, Department of Computer Engineering, Texas Tech University, USA; Department of Banking and Financial Services, Cognizant Technology Solution, India, International Journal of Network Security and its Application (IJNSA), Vol. 6, No. 3, May 2014.
[4] Big_Data_Analytics_for_Security_Intelligence.pdf
[5] Vinit Gopal Savant, Department of Computer Engineering, Pimpri Chinchwad College of Engineering, Pune, Maharashtra, India, vinitsawant06@gmail.com, "Approaches to Solve Big Data Security Issues and Comparative Study of Cryptographic Algorithms for Data Encryption", International Journal of Engineering Research and General Science, Volume 3, Issue 3, May-June 2015, ISSN 2091-2730.
[6] Gupta Shubhi, Department of Computer Science and Engineering, Amity University, Greater Noida, "Implementation of ECC using socket programming in Java", International Organization of Scientific Research (IOSR), Volume 16, Issue 4, Ver. I (Jul-Aug. 2014), pp. 87-89.
[7] Gupta Shubhi, Department of Computer Science and Engineering, Krishna Engineering College, "Key based performance analysis of different public key cryptosystems: a survey", International Journal of Advanced Research in Computer Science, Volume 3, No. 2.
[8] William Stallings, Cryptography and Network Security, 2nd edition, Prentice Hall publications.
[9] B. Schneier, Applied Cryptography, John Wiley publications and sons, 2nd edition, 1996.
[10] Victor Miller, "Uses of elliptic curves in cryptography", Advances in Cryptology, 1986.
[11] N. Koblitz, A Course in Number Theory and Cryptography.
[12] http://www.tutorialspoint.com/java/java_networking.
[13] Kohlekar Megha, Jadhav Anita, "Implementation of Elliptic Curve Cryptography on Text and Image", International Journal of Enterprise Computing and Business Systems, Vol. 1, Issue 2, July 2011.
[14] Diego F. de Carvalho, Rafael Chies, Andre P. Freire, Luciana A. F. Martimiano and Rudinei Goularte, "Video Steganography for Confidential Documents: Integrity, Privacy and Version Control", University of Sao Paulo, ICMC, Sao Carlos, SP, Brazil; State University of Maringa, Computing Department, Maringa, PR, Brazil.
